
Remote Control: Distributed Application Configuration, Management, and Visualization with Plush

Jeannie Albrecht – Williams College

Ryan Braud, Darren Dao, Nikolay Topilski, Christopher Tuttle, Alex C. Snoeren, and Amin Vahdat – University of California, San Diego

ABSTRACT

Support for distributed application management in large-scale networked environments remains in its early stages. Although a number of solutions exist for subtasks of application deployment, monitoring, maintenance, and visualization in distributed environments, few tools provide a unified framework for application management. Many of the existing tools address the management needs of a single type of application or service that runs in a specific environment, and these tools are not adaptable enough to be used for other applications or platforms. In this paper, we present the design and implementation of Plush, a fully configurable application management infrastructure designed to meet the general requirements of several different classes of distributed applications and execution environments. Plush allows developers to specifically define the flow of control needed by their computations using application building blocks. Through an extensible resource management interface, Plush supports execution in a variety of environments, including both live deployment platforms and emulated clusters. To gain an understanding of how Plush manages different classes of distributed applications, we take a closer look at specific applications and evaluate how Plush provides support for each.

Introduction

Managing distributed applications involves deploying, configuring, executing, and debugging software running on multiple computers simultaneously. Particularly for applications running on resources that are spread across the wide-area, distributed application management is a time-consuming and error-prone process. After the initial deployment of the software, the applications need mechanisms for detecting and recovering from the inevitable failures and problems endemic to distributed environments. To achieve availability and reliability, applications must be carefully monitored and controlled to ensure continued operation and sustained performance. Operators in charge of deploying and managing these applications face a daunting list of challenges: discovering and acquiring appropriate resources for hosting the application, distributing the necessary software, and appropriately configuring the resources (and re-configuring them if operating conditions change). It is not surprising, then, that a number of tools have been developed to address various aspects of the process in distributed environments, but no solution yet flexibly automates the application deployment and management process across all environments.

Presently, most researchers who want to evaluate their applications in wide-area distributed environments take one of three management approaches. On PlanetLab [6, 26], service operators address deployment and monitoring in an ad hoc, application-specific fashion using customized scripts. Grid researchers, on the other hand, leverage one or more toolkits (such as the Globus Toolkit [12]) for application development and deployment. These toolkits often require tight integration with not only the infrastructure, but the application itself. Hence, applications must be custom tailored for a given toolkit, and cannot easily be run in other environments. Similarly, system administrators who are responsible for configuring resources in machine-room settings often use remote execution tools such as cfengine [9] for managing and configuring networks of machines. As in the other two approaches, however, the configuration files are tailored to a specific environment and a particular set of resources, and thus are not easily extended to other platforms.

Motivated by the limitations of existing approaches, we believe that a unified set of abstractions for achieving availability, scalability, and fault tolerance can be applied to a broad range of distributed applications, shielding developers from some of the complexities of large-scale networked environments. The primary goal of our research is to understand these abstractions and define interfaces for specifying and managing distributed computations run in any execution environment. We are not trying to build another toolkit for managing distributed applications. Rather, we hope to define the way users think about their applications, regardless of their target platform. We took inspiration from classical operating systems like UNIX [28], which defined the standard abstractions for managing applications: files, processes, pipes, etc. For most users, communication with these abstractions is simplified through the use of a shell or command-line interpreter. Of course, distributed computations are both more difficult to specify, because of heterogeneous hardware and software bases, and more difficult to manage, because of failure conditions and variable host and network attributes. Further, many distributed testbeds do not provide global file system abstractions, which complicates data management.

To this end, we present Plush [27], a generic application management infrastructure that provides a unified set of abstractions for specifying, deploying, and monitoring distributed applications. Although Plush was initially designed to support applications running on PlanetLab [2], Plush now provides extensions that allow users to manage distributed applications in a variety of computing environments. Plush users describe distributed computations using an extensible application specification language. In contrast to other application management systems, however, the language allows users to customize various aspects of the deployment life cycle to fit the needs of an application and its target infrastructure. Users can, for example, specify a particular resource discovery service to use during application deployment. Plush also provides extensive failure management support to automatically adapt to failures in the application and the underlying computational infrastructure. Users interact with Plush through a simple command-line interface or a graphical user interface (GUI). Additionally, Plush exports an XML-RPC interface that allows users to programmatically integrate their applications with Plush if desired.
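Since XML-RPC requests are themselves XML documents, such integration can be driven from almost any language. The fragment below is only a sketch of what a call to the controller might look like; the method name (plush.loadApplication) and its parameter are assumptions for illustration and are not taken from the actual Plush XML-RPC interface.

    <?xml version="1.0"?>
    <!-- Hypothetical XML-RPC request to a Plush controller. The method
         name and parameter are illustrative assumptions, not the
         documented Plush API. -->
    <methodCall>
      <methodName>plush.loadApplication</methodName>
      <params>
        <param>
          <!-- Path to an application specification stored on the controller -->
          <value><string>file_distribution.xml</string></value>
        </param>
      </params>
    </methodCall>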

Plush provides abstractions for managing resource discovery and acquisition, software distribution, and process execution in a variety of distributed environments. Applications are specified using combinations of Plush application "building blocks" that define a custom control flow. Once an application is running, Plush monitors it for failures or application-level errors for the duration of its execution. Upon detecting a problem, Plush performs a number of user-configurable recovery actions, such as restarting the application, automatically reconfiguring it, or even searching for alternate resources. For applications requiring wide-area synchronization, Plush provides several efficient synchronization primitives in the form of partial barriers, which help applications achieve better performance and robustness in failure-prone environments [1].

The remainder of this paper discusses the architecture of Plush. We motivate the design in the next section by enumerating a set of general requirements for managing distributed applications. Subsequently, we present details about the design and implementation of Plush and then provide specific application case studies and uses of Plush. Finally, we discuss related work and conclude.

Application Management Requirements

To better understand the requirements of a distributed application management framework, we first consider how we might run a specific application in a widely-used distributed environment. In particular, we investigate the process of running SWORD [23], a publicly-available resource discovery service, on PlanetLab. SWORD uses a distributed hash table (DHT) for storing data, and aims to run on as many hosts as possible, as long as the hosts provide some minimum level of availability (since SWORD provides a service to other PlanetLab users). Before starting SWORD, we have to find and gain access to PlanetLab machines capable of hosting the service. Since SWORD is most concerned with reliability, it does not necessarily need powerful machines, but it must avoid nodes that frequently perform poorly over a relatively long time. We locate reliable machines using a tool like CoMon [25], which monitors resource usage on PlanetLab, and then we install the SWORD software on those machines.

This software installation involves downloading the SWORD software package on each host individually, unpacking the software, and installing any software dependencies, including a Java Runtime Environment. After the software has been installed on all of the selected machines, we start the SWORD execution. Recall that reliability is important to SWORD, so if an error or failure occurs at any point, we need to quickly detect it (perhaps using custom scripts and cron jobs) and restore the service to maintain high availability.

Running SWORD on PlanetLab is an example of a specific distributed application deployment. The low-level details of managing distributed applications in general largely depend on the characteristics of the target application and environment. For example, long-running services such as SWORD prefer reliable machines and attempt to dynamically recover from failures to ensure high availability. On the other hand, short-lived scientific parallel applications (e.g., EMAN [18]) prefer powerful machines with high bandwidth/low latency network connections. Long-term reliability is not a huge concern for these applications, since they have short execution times. At a high level, however, if we ignore the complexities associated with resource management, the requirements for managing distributed applications are largely similar for all applications and environments. Rather than reinvent the same infrastructure for each class separately, our goal is to identify common abstractions that support the execution of many types of distributed applications, and to build an application-management infrastructure that supports the general requirements of all applications. In this section, we identify these general requirements for distributed application management.

Specification. A generic application controller must allow application operators to customize the control flow for each application. This specification is an abstraction that describes distributed computations. A specification identifies all aspects of the execution and environment needed to successfully deploy, manage, and maintain an application, including the software required to run the application, the processes that will run on each machine, the resources required to achieve the desired performance, and any environment-specific execution parameters. User credentials for resources must also be included in the application specification in order to obtain access to resources. To manage complex multi-phased computations, such as scientific parallel applications, the specification must support application synchronization requirements. Similarly, distributing computations among pools of machines requires a way to specify a workflow – a collection of tasks that must be completed in a given order – within an application specification.

The complexity of distributed applications varies greatly from simple, single-process applications to elaborate, parallel applications. Thus the challenge is to define a specification language abstraction that provides enough expressibility for complex distributed applications, but is not too complicated for single-process computations. In short, the language must be simple enough for novice application developers to understand, yet expose enough advanced functionality to run complex scenarios.

Resource Discovery and Acquisition. Another key abstraction in distributed applications is resources. Put simply, resources are computing devices that are connected to a network and are capable of hosting an application. Because resources in distributed environments are often heterogeneous, application developers naturally want to find the resource set that best satisfies the demands of their application. Even if hardware is largely homogeneous, dynamic resource characteristics such as available bandwidth or CPU load can vary over time. The goal of resource discovery is to find the best current set of resources for the distributed application as described in the specification. In environments that support dynamic virtual machine instantiation [5, 30], these resources may not exist in advance. Thus, resource discovery in this case involves finding the appropriate physical machines to host the virtual machine configurations.

Resource discovery systems often interact directly with resource acquisition systems. Resource acquisition involves obtaining a lease or permission to use the desired resources. Depending on the execution environment, acquisition can take a number of forms. For example, to support advanced resource reservations as in a batch pool, resource acquisition is responsible for submitting a resource request and subsequently obtaining a lease from the scheduler. In virtual machine environments, resource acquisition may involve instantiating virtual machines, verifying their successful creation, and gathering the appropriate information (e.g., IP address, authentication keys) required for access. The challenge facing an application management framework is to provide a generic resource-management interface. Ultimately, the complexities associated with creating and gaining access to physical or virtual resources should be hidden from the application developer.

Deployment. Upon obtaining an appropriate set of resources, the application-deployment abstraction defines the steps required to prepare the resources with the correct software and data files, and run any necessary executables to start the application. This involves copying, unpacking, and installing the software on the target hosts. The application controller must support a variety of different file-transfer mechanisms for each environment, and should react to failures that occur during the transfer of software or in starting executables.

One important aspect of application deployment is configuring the requested number of resources with compatible versions of the software. Ensuring that a minimum number of resources are available and correctly configured for a computation may involve requesting new resources from the resource discovery and acquisition systems to compensate for failures that occur at startup. Further, many applications require some form of synchronization across hosts to guarantee that various phases of computation start at approximately the same time. Thus, the application controller must provide mechanisms for loose synchronization.

Maintenance. Perhaps the most difficult requirement for managing distributed applications is monitoring and maintaining an application after execution begins. Thus, another abstraction that the application controller must define is support for customizable application maintenance. One key aspect of maintenance is application and resource monitoring, which involves probing hosts for failure due to network outages or hardware malfunctions, and querying applications for indications of failure (often requiring hooks into application-specific code for observing the progress of an execution). Such monitoring allows for more specific error reporting and simplifies the debugging process.

In some cases, system failures may result in a situation where application requirements can no longer be met. For example, if an application is initially configured to be deployed on 50 resources, but only 48 can be contacted at a certain point in time, the application controller should adapt the application, if possible, and continue executing with only 48 machines. Similarly, different applications have different policies and requirements with respect to failure recovery. Some applications may be able to simply restart a failed process on a single host, while others may require the entire execution to abort in the case of failure. Thus, in addition to the other features previously described, the application controller should support a variety of options for failure recovery.


Plush: Design and Implementation

We now describe Plush, an extensible distributed application controller, designed to address the requirements of large-scale distributed application management discussed in the second section.

Figure 1a: The architecture of Plush. The user interface is shown above the rest of the architecture and contains methods for interacting with all boxes in the lower sub-systems of Plush. Boxes below the user interface and above the dotted line indicate objects defined within the application specification abstraction (application, component, workflow, process, and barrier blocks). Boxes below the line represent the core functional units of Plush (resource discovery and acquisition, resource manager, host monitor, barriers, file manager, processes, process monitor, communication fabric, and I/O and timers).

Figure 1b: Example file-distribution application comprised of application, component, process, and barrier blocks in Plush. Arrows indicate control-flow dependencies (i.e., X → Y implies that X must complete before Y starts). The application block (/app) contains two component blocks. Component block 1 (/app/comp1, "Senders") contains process block 1 (/app/comp1/proc1, prepare_files.pl), process block 2 (/app/comp1/proc2, join_overlay.pl), barrier block 1 (/app/comp1/barr1, bootstrap_barrier), and process block 3 (/app/comp1/proc3, send_files.pl). Component block 2 (/app/comp2, "Receivers") contains process block 1 (/app/comp2/proc1, join_overlay.pl), barrier block 1 (/app/comp2/barr1, bootstrap_barrier), and process block 2 (/app/comp2/proc2, receive_files.pl).

To directly monitor and control distributed applications, Plush itself must be distributed. Plush uses a client-server architecture, with clients running on each resource (e.g., machine) involved in the application. The Plush server, called the controller, interprets input from the user (i.e., the person running the application) and sends messages on behalf of the user over an overlay network (typically a tree) to Plush clients. The controller, typically run from the user's workstation, directs the flow of control throughout the life of the distributed application. The clients run alongside each application component across the network and perform actions based upon instructions received from the controller.

Figure 1a shows an overview of the Plush controller architecture. (The client architecture is symmetric to the controller with only minor differences in functionality.) The architecture consists of three main sub-systems: the application specification, core functional units, and user interface. Plush parses the application specification provided by the user and stores internal data structures and objects specific to the application being run. The core functional units then manipulate and act on the objects defined by the application specification to run the application. The functional units also store authentication information, monitor physical machines, handle event and timer actions, and maintain the communication infrastructure that enables the controller to query the status of the distributed application on the clients. The user interface provides the functionality needed to interact with the other parts of the architecture, allowing the user to maintain and manipulate the application during execution. In this section, we describe the design and implementation details of each of the Plush sub-systems. (Note that the components within the sub-systems are highlighted using boldface throughout the text in the remainder of this section.)

Application Specification

Developing a complete, yet accessible, application specification language was one of the principal challenges in this work. Our approach, which has evolved over the past three years, consists of combinations of five different abstractions:

1. Process blocks – Describe the processes executed on each machine involved in an application. The process abstraction includes runtime parameters, path variables, runtime environment details, file and process I/O information, and the specific commands needed to start a process on a remote machine.

2. Barrier blocks – Describe the barriers that are used to synchronize the various phases of execution within a distributed application.

3. Workflow blocks – Describe the flow of data in a distributed computation, including how the data should be processed. Workflow blocks may contain process and barrier blocks. For example, a workflow block might describe a set of input files over which a process or barrier block will iterate during execution.

4. Component blocks – Describe the groups of resources required to run the application. This includes expectations specific to a set of metrics for the target resources. In the case of compute nodes in a cluster, for example, these metrics might include maximum load requirements and minimum free memory requirements. Components also define required software configurations, installation instructions, and any authentication information needed to access the resources. Component blocks may contain workflow blocks, process blocks, and barrier blocks.

5. Application blocks – Describe high-level information about a distributed application. This includes one or many component blocks, as well as attributes to help automate failure recovery.

To better illustrate the use of these blocks in Plush, consider building the specification for a simple file-distribution application as shown in Figure 1b. This application consists of two groups of machines. One group, the senders, stores the files, and the second group, the receivers, attempts to retrieve the files from the senders. The goal of the application is to experiment with the use of an overlay network to send files from the senders to the receivers using some new file-distribution protocol. In this example, there are two phases of execution. In the first phase, all senders and receivers join the overlay before any transfers begin, and the senders must prepare the files for transfer before the receivers start receiving files. In the second phase, the receivers begin receiving the files. No new senders or receivers are allowed to join the network during the second phase.

The first step in building the corresponding Plush application specification for our new file-distribution protocol is to define an application block. The application block defines general characteristics about the application including liveness properties and failure detection and recovery options, which determine default failure recovery behavior. For this example, we choose the behavior "restart-on-failure," which attempts to restart the failed application instance on a single host, since it is not necessary to abort the entire application across all hosts if only a single failure occurs.

The application block also contains one or many component blocks that describe the groups of resources (i.e., machines) required to run the application. Our application consists of a set of senders and a set of receivers, and two separate component blocks describe the two groups of machines. The sender component block defines the location and installation instructions for the sender software, and includes authentication information to access the resources. Similarly, the receiver component block defines the receiver software package. In our example, it may be desirable to require that all machines in the sender group have a processor speed of at least 1 GHz, and each sender should have sufficient bandwidth for sending files to multiple receivers at once. These types of machine-specific requirements are included in the component blocks. Within each component block, a combination of workflow, process, and barrier blocks describe the distributed computation. (Although our example does not use workflow blocks, they are used in applications where data files must be distributed and iteratively processed.)
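As a rough sketch of how these pieces fit together (the element and attribute names below are our own illustrative assumptions, not necessarily the exact Plush specification syntax), the application block and sender component block for this example might begin as follows:

    <application_block id="app" name="file_distribution">
      <!-- Hypothetical attribute for the "restart-on-failure" recovery
           behavior chosen above. -->
      <failure_policy action="restart-on-failure" />

      <component_block id="comp1" name="Senders">
        <!-- Machine-specific requirements for the sender group. -->
        <resource_requirements min_cpu_mhz="1000" min_bandwidth_kbps="5000" />
        <!-- Location and installation instructions for the sender software;
             the URL is a placeholder. -->
        <software package="http://example.com/sender.tar.gz"
                  install="tar xzf sender.tar.gz" />
        <!-- Credentials used to access the resources (placeholder values). -->
        <authentication user="alice" keyfile="~/.ssh/id_rsa" />
        <!-- Process and barrier blocks for the senders are nested here. -->
      </component_block>

      <component_block id="comp2" name="Receivers">
        <!-- Analogous requirements, software, and blocks for the receivers. -->
      </component_block>
    </application_block>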

Plush process blocks describe the specific commands required to execute the application. Most process blocks depend on the successful installation of software packages defined in the component blocks. Users specify the commands required to start a given process, and actions to take upon process exit. The exit policies create a Plush process monitor that oversees the execution of a specific process. Our example has several process blocks. In the sender component, process blocks define processes for preparing the files, joining the overlay, and sending the files. Similarly, the receiver component contains process blocks for joining the overlay and receiving the files.

Some applications operate in phases, producing output files in early stages that are used as input files in later stages. To ensure all hosts start each phase of computation only after the previous phase completes, barrier blocks define loose synchronization semantics between process and workflow blocks. In our example, a barrier ensures that all receivers and senders join the overlay in phase one before beginning the file transfer in phase two. Note that although each barrier block is uniquely defined within a component block, it is possible for the same barrier to be referenced in multiple component blocks. In our example, the barrier blocks within each component block refer to the same barrier, which means that the application will wait for all receivers and senders to reach the barrier before allowing either component to start sending or receiving files.

In Figure 1b, the outer application block contains our two component blocks that run in parallel (since there are no arrows indicating control-flow dependencies between them). Within the component blocks, the different phases are separated by the bootstrap barrier that is defined by barrier block 1. Component block 1, which describes the senders, contains process blocks 1 and 2 that define perl scripts that run in parallel during phase one, synchronize on the barrier in barrier block 1, and then proceed to process block 3 in phase two, which sends the files. Component block 2, which describes the receivers, runs process block 1 in phase one, synchronizes on the barrier in barrier block 1, and then proceeds to process block 2 in phase two, which runs the process that receives the files. In our implementation, the blocks are represented by XML that is parsed by the Plush controller when the application is run. We show an example of the XML later.
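The fragment below sketches what the sender component's control flow might look like in that XML; the element names and predecessor syntax are illustrative assumptions rather than the exact Plush schema, but the structure mirrors component block 1 in Figure 1b.

    <component_block id="comp1" name="Senders">
      <!-- Phase one: prepare the files and join the overlay in parallel. -->
      <process_block id="proc1">
        <command>perl prepare_files.pl</command>
      </process_block>
      <process_block id="proc2">
        <command>perl join_overlay.pl</command>
      </process_block>

      <!-- All senders and receivers synchronize here before phase two. -->
      <barrier_block id="barr1" barrier="bootstrap_barrier">
        <predecessor block="proc1" />
        <predecessor block="proc2" />
      </barrier_block>

      <!-- Phase two: send the files once the barrier releases. -->
      <process_block id="proc3">
        <predecessor block="barr1" />
        <command>perl send_files.pl</command>
      </process_block>
    </component_block>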

We designed the Plush application specification to support a variety of execution patterns. With the blocks described above, Plush supports the arbitrary combination of processes, barriers, and workflows, provided that the flow of control between them forms a directed acyclic graph. Using predecessor tags in Plush, users specify the flow of control and define whether processes run in parallel or sequentially. Arrows between blocks in Figure 1b, for example, indicate the predecessor dependencies. (Process blocks 1 and 2 in component block 1 will run in parallel before blocking at the bootstrap barrier, and then the execution will continue on to process block 3 after the bootstrap barrier releases.) Internally, Plush stores the blocks in a hierarchical data structure, and references specific blocks in a manner similar to referencing absolute paths in a UNIX file system. Figure 1b shows the unique path names for each block from our file-distribution example. This naming abstraction also simplifies coordination among remote hosts. Each Plush client maintains an identical local copy of the application specification. Thus, for communication regarding control flow changes, the controller sends the clients messages indicating which "block" is currently being executed, and the clients update their local state information accordingly.

Core Functional Units

After parsing the block abstractions defined by the user within the application specification, Plush instantiates a set of core functional units to perform the operations required to configure and deploy the distributed application. Figure 1a shows these units as shaded boxes below the dotted line. The functional units manipulate the objects defined in the application specification to manage distributed applications. In this section, we describe the role of each of these units.

Starting at the highest level, the Plush resource discovery and acquisition unit uses the resource requirements in the component blocks to locate and create (if necessary) resources on behalf of the user. The resource discovery and acquisition unit is responsible for obtaining a valid set, called a matching, of resources that meet the application's demands. To determine this matching, Plush may either call an existing external service to construct a resource pool, such as SWORD or CoMon for PlanetLab, or use a statically defined resource pool based on information provided by the user. The Plush resource matcher then uses the resources in the resource pool to create a matching for the application. All hosts involved in an application run a Plush host monitor that periodically publishes information about the host. The resource discovery and acquisition unit may use this information to help find the best matching. Upon acquiring a resource, a Plush resource manager stores the lease, token, or any necessary user credential needed for accessing that resource to allow Plush to perform actions on behalf of the user in the future.
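Purely as an illustration of the second option (a statically defined pool; the element names are assumptions rather than Plush's actual input format), a user-supplied resource pool might be described as:

    <!-- Hypothetical static resource pool: each entry names a host the user
         already has access to and the credentials the controller should use. -->
    <resource_pool name="static_pool">
      <node hostname="host1.example.org" user="alice" />
      <node hostname="host2.example.org" user="alice" />
      <node hostname="host3.example.org" user="alice" />
    </resource_pool>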

The remaining functional units in Figure 1a are responsible for application deployment and maintenance. These units connect to resources, install required software, start the execution, and monitor the execution for failures. One important functional unit used for these operations is the Plush barrier manager, which provides advanced synchronization services for Plush and the application itself. In our experience, traditional barriers [17] are not well suited for volatile, wide-area network conditions; the semantics are simply too strict. Instead, Plush uses partial barriers, which are designed to perform better in volatile environments [1]. Partial barriers ensure that the execution makes forward progress in the face of failures, and improve performance in failure-prone environments using relaxed synchronization semantics.

The Plush file manager handles all files required by a distributed application. This unit contains information regarding software packages, file transfer methods, installation instructions, and workflow data files. The file manager is responsible for preparing the physical resources for execution using the information provided by the application specification. It monitors the status of file transfers and installations, and if it detects an error or failure, the controller is notified and the resource discovery and acquisition unit may be required to find a new host to replace the failed one.

Once the resources are prepared with the necessary software, the application deployment phase completes by starting the execution. This is accomplished by starting a number of processes on remote hosts. Plush processes are defined within process blocks in the application specification. A Plush process is an abstraction for standard UNIX processes that run on multiple hosts. Processes require information about the runtime environment needed for an execution including the working directory, path, environment variables, file I/O, and the command line arguments.

The two lowest layers of the Plush architecture consist of a communication fabric and the I/O and timer subsystems. The communication fabric handles passing and receiving messages among Plush overlay participants. Participants communicate over TCP connections. The default topology for a Plush overlay is a star, although we also provide support for tree topologies for increased scalability (discussed later in detail). In the case of a star topology, all clients connect directly to the controller, which allows for quick failure detection and recovery. The controller sends messages to the clients instructing them to perform certain actions. When the clients complete their tasks, they report back to the controller for further direction. The communication fabric at the controller knows what hosts are involved in a particular application instance, so that the appropriate messages reach all necessary hosts.

At the bottom of all of the other units is the Plush I/O and timer abstraction. As messages are received in the communication fabric, message handlers fire events. These events are sent to the I/O and timer layer and enter a queue. The event loop pulls events off the queue, and calls the appropriate event handler. Timers are a special type of event in Plush that fire at a predefined instant.

Fault Tolerance and Scalability

Two of the biggest challenges that we encountered during the design of Plush were being robust to failures and scaling to hundreds of machines spread across the wide-area. In this section we explore fault tolerance and scalability in Plush.

Fault Tolerance

Plush must be robust to the variety of failures that occur during application execution. When designing Plush, we aimed to provide the functionality needed to detect and recover from most failures without ever needing to involve the user running the application. Rather than enumerate all possible failures that may occur, we will discuss how Plush handles three common failure classes – process, host, and controller failures.

Process failures. When a remote host starts a process defined in a process block, Plush attaches a process monitor to the process. The role of the process monitor is to catch any signals raised by the process, and to react appropriately. When a process exits either due to successful completion or error, the process monitor sends a message to the controller indicating that the process has exited, and includes its exit status. Plush defines a default set of behaviors that occur in response to a variety of exit codes (although these can be overridden within an application specification). The default behaviors include ignoring the failure, restarting only the failed process, restarting the application, or aborting the entire application.

In addition to process failures, Plush also allows users to monitor the status of a process that is still running through a specific type of process monitor called a liveness monitor, whose goal is to detect misbehaving and unresponsive processes that get stuck in loops and never exit. This is especially useful in the case of long-running services that are not closely monitored by the user. To use the liveness monitor, the user specifies a script and a time interval in the process block of the application specification. The liveness monitor wakes up once per time interval and runs the script to test for the liveness of the application, returning either success or failure. If the test fails, the Plush client kills the process, causing the process monitor to be alerted and inform the controller.
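For instance (again with assumed element and attribute names, since we do not reproduce the exact schema here), a process block for a long-running service might override the default exit behavior and attach a liveness monitor:

    <process_block id="proc2">
      <command>perl join_overlay.pl</command>
      <!-- Hypothetical exit policy: restart only this process on a
           non-zero exit status, and do nothing special on success. -->
      <exit_policy on_failure="restart_process" on_success="continue" />
      <!-- Hypothetical liveness monitor: run the user-supplied script every
           60 seconds; if it reports failure, the Plush client kills the
           process, which alerts the process monitor and the controller. -->
      <liveness_monitor script="check_overlay.pl" interval="60" />
    </process_block>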

Remote host failures. Detecting and reacting to process failures is straightforward since the controller is able to communicate information to the client regarding the appropriate recovery action. When a host fails, however, recovering is more difficult. A host may fail for a number of reasons, including network outages, hardware problems, and power loss. Under all of these conditions, the goal of Plush is to quickly detect the problem and reconfigure the application with a new set of resources to continue execution. The Plush controller maintains a list of the last time successful communication occurred with each connected client. If the controller does not hear from a client within a specified time interval, the controller sends a ping to the client. If the controller does not receive a response from the client, we assume host failure. Reliable failure detection is an active area of research; while the simple technique we employ has been sufficient thus far, we intend to leverage advances in this space where appropriate.

There are three possible actions in response to a host failure: restart, rematch, and abort. By default, the controller tries all three actions in order. The first and easiest way to recover from a host failure is to simply reconnect and restart the application on the failed host. This technique works if the host experiences a temporary power or network outage, and is only unreachable for a short period of time. If the controller is unable to reconnect to the host, the next option is to rematch in an attempt to replace the failed host with a different host. In this case, Plush reruns the resource matcher to find a new machine. Depending on the application, the entire execution may need to be restarted across all hosts after the new host joins the control overlay, or the execution may only need to be started on the new host. If the controller is unable to find a new host to replace the failed host, Plush aborts the entire application.

In some applications, it is desirable to mark a host as failed when it becomes overloaded or experiences poor network connectivity. The Plush host monitor that runs on each machine is responsible for periodically informing the controller about each machine's status. If the controller determines that the performance is less than the application tolerates, it marks the host as failed and attempts to rematch. This functionality is a preference specified at startup. Although Plush currently monitors host-level metrics including load and free memory, the technique is easily extended to encompass sophisticated application-level expectations of host viability.

Controller failures. Because the controller is responsible for managing the flow of control across all connected clients, recovering from a failure at the controller is difficult. One solution is to use a simple primary-backup scheme, where multiple controllers increase reliability. All messages sent from the clients and primary controller are sent to the backup controllers as well. If a pre-determined amount of time passes and the backup controllers do not receive any messages from the primary, the primary is assumed to have failed. The first backup becomes the primary, and execution continues.

This strategy has several drawbacks. First, it causes extra messages to be sent over the network, which limits the scalability of Plush. Second, this approach does not perform well when a network partition occurs. During a network partition, multiple controllers may become the primary controller for subsets of the clients initially involved in the application. Once the network partition is resolved, it is difficult to reestablish consistency among all hosts. While we have implemented this architecture, we are currently exploring other possibilities.

Scalability

In addition to fault tolerance, an application controller designed for large-scale environments must scale to hundreds or even thousands of participants. Unfortunately there is a tradeoff between performance and scalability. The solutions that perform the best at moderate scale typically do not scale as well as solutions with lower performance. To balance scalability and performance, Plush provides users with two topological alternatives.

By default, all Plush clients connect directly to the controller forming a star topology. This architecture scales to approximately 300 remote hosts, limited by the number of file descriptors allowed per process on the controller machine in addition to the bandwidth, CPU, and latency required to communicate with all connected clients. The star topology is easy to maintain, since all clients connect directly to the controller. In the event of a host failure, only the failed host is affected. Further, the time required for the controller to exchange messages with clients is short due to the direct connections.

At larger scales, network and file descriptor limitations at the controller become a bottleneck. To address this, Plush also supports tree topologies. In an effort to reduce the number of hops between the clients and the controller, we construct "bushy" trees, where the depth of the tree is small and each node in the tree has many children. The controller is the root of the tree. The children of the root are chosen to be well-connected and historically reliable hosts whenever possible. Each child of the root acts as a "proxy controller" for the hosts connected to it. These proxy controllers send invitations and receive joins from other hosts, reducing the total number of messages sent back to the root controller. Important messages, such as failure notifications, are still sent back to the root controller. Using the tree topology, we have been able to use Plush to manage an application running on 1000 ModelNet [29] emulated hosts, as well as an application running on 500 PlanetLab clients. We believe that Plush has the ability to scale by another order of magnitude with the current design.

While the tree topology has many benefits over the star topology, it also introduces several new problems with respect to host failures and tree maintenance. In the star topology, a host failure is simple to recover from since it only involves one host. In the tree topology, however, if a non-leaf host fails, all children of the failed host must find a new parent. Depending on the number of hosts affected, a reconfiguration involving several hosts often has a significant impact on performance. Our current implementation tries to minimize the probability of this type of failure by making intelligent decisions during tree construction. For example, in the case of ModelNet, many virtual hosts (and Plush clients) reside on the same physical machine. When constructing the tree in Plush, only one client per physical machine connects directly to the controller and becomes the proxy controller. The remaining clients running on the same physical machine become children of the proxy controller. In the wide area, similar decisions are made by placing hosts that are geographically close together under the same parent. This decreases the number of hops and latency between leaf nodes and their parent, minimizing the chance of network failures.


Running An Application Using Plush

In this section, we will discuss how the architectural components of Plush interact to run a distributed application. When starting Plush, the user's workstation becomes the controller. The user submits an application specification to the Plush controller. The controller parses the specification, and internally creates the objects shown above the dotted line in Figure 1a.

Figure 2a: Nebula World View tab showing an application running on PlanetLab. Different colored dots indicate PlanetLab sites in various stages of execution.

Figure 2b: Nebula Application View tab displaying a Plush application specification.

After parsing the application specification, the controller runs the resource discovery and acquisition unit to find a suitable set of resources that meet the requirements specified in the component blocks. Upon locating the necessary resources, the resource manager stores the required access and authentication information. The controller then attempts to connect to each remote host. If the Plush client is not already running, the controller initiates a bootstrapping procedure to copy the Plush client binary to the remote host, and then uses SSH to connect to the remote host and start the client process. Once the client process is running, the controller establishes a TCP connection to the remote host, and transmits an INVITE message to the host to join the Plush overlay (which is either a star or tree as discussed previously).

If a Plush client agrees to run the application, the client sends a JOIN message back to the controller accepting the invitation. Next, the controller sends a PREPARE message to the new client, which contains a copy of the application specification (XML representation). The client parses the application specification, starts a local host monitor, sends a PREPARED message back to the controller, and waits for further instruction. Once enough hosts join the overlay and agree to run the application, the controller initiates the beginning of the application deployment stage by sending a GO message to all connected clients. The file managers then begin installing the requested software and preparing the hosts for execution.

In most applications, the controller instructs the hosts to begin execution after all hosts have completed the software installation. (Synchronizing the beginning of the execution is not required if the application does not need all hosts to start simultaneously.) Since each client has now created an exact copy of the controller's application specification, the controller and clients exchange messages about the application's progress using the block naming abstraction (i.e., /app/comp1/proc1) to identify the status of the execution. For barriers, a barrier manager running on the controller determines when it is appropriate for hosts to be released from the barriers.
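Purely as an illustration of this naming abstraction (Plush's actual wire format is not shown here), a progress update from a client might carry little more than a block path and a state:

    <!-- Hypothetical control message; the element and attribute names are
         illustrative only. -->
    <message type="STATUS" from="planetlab1.example.org">
      <block path="/app/comp1/barr1" state="reached" />
    </message>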

Upon detecting a failure, clients notify the controller, and the controller attempts to recover from it according to the actions enumerated in the user's application specification. Since many failures are application-specific, Plush exports optional callbacks to the application itself to determine the appropriate reaction for some failure conditions. When the application completes (or upon a user command), Plush stops all associated processes, transfers output data back to the controller's local disk if desired, performs user-specified cleanup actions on the resources, disconnects the resources from the overlay by closing the TCP connections, and stops the Plush client processes.

User Interface

Plush aims to support a variety of applications being run by users with a wide range of expertise in building and managing distributed applications. Thus, Plush provides three interfaces, each of which provides users with techniques for interacting with their applications. We describe the functionality of each user interface in this section.

Figure 1a shows the user interface above all other parts of Plush. In reality, the user interacts with every box shown in the figure through the user interface. For example, the user forces the resource discovery and acquisition unit to find a new set of resources using a terminal command. We designed Plush in this way to provide maximum control over the application. At any point, the user can override a default Plush behavior. The effect is a customizable application controller that supports a variety of distributed applications.

Graphical User Interface

In an effort to simplify the creation of application specifications and help visualize the status of executions running on resources around the world, we implemented a graphical user interface for Plush called Nebula. In particular, we designed Nebula (as shown in Figures 2a, 2b, and 3) to simplify the process of specifying and managing applications running across the PlanetLab wide-area testbed. Plush obtains data from the PlanetLab Central (PLC) database to determine what hosts a user has access to, and Nebula uses this information to plot the sites on the map. To start using Nebula, users have the option of building their Plush application specification from scratch or loading a preexisting XML document representing an application specification. Upon loading the application specification, the user runs the application by clicking the Run button from the Plush toolbar, which causes Plush to start locating and acquiring resources.

The main Nebula window contains four tabs that show different information about the user's application. In the "World View" tab, users see an image of a world map with colored dots indicating PlanetLab hosts. Different colored dots on the map indicate sites involved in the current application. In Figure 2a, the dots (which range from red to green on a user's screen) show PlanetLab sites involved in the current application. The grey dots are other available PlanetLab sites that are not currently being used by Plush. As the application proceeds through the different phases of execution, the sites change color, allowing the user to visualize the progress of their application. When failures occur, the impacted sites turn red, giving the user an immediate visual indication of the problem. Similarly, green dots indicate that the application is executing correctly. If a user wishes to establish an SSH connection directly to a particular resource, they can simply right-click on a host in the map and choose the SSH option from the pop-up menu. This opens a new tab in Nebula containing an SSH terminal to the host. Users can also mark hosts as failed by right-clicking and choosing the Fail option from the pop-up menu if they are able to determine failure more quickly than Plush's automated techniques. Failed hosts are completely removed from the execution.

Users retrieve more detailed usage statistics and monitoring information about specific hosts (such as CPU load, free memory, or bandwidth usage) by double-clicking on the individual sites in the map. This opens a second window that displays real-time graphs based on data retrieved from resource monitoring tools, as shown in the bottom right corner of Figure 2a. The second, smaller window displays a graph of the CPU or memory usage and the status of the application on each host. Plush currently provides built-in support for monitoring CoMon [25] data on PlanetLab machines, which is the source of the CPU and memory data. Additionally, if the user wishes to view the CPU usage or percentage of free memory available across all hosts, there is a menu item under the PlanetLab menu that changes the colors of the dots on the map such that red means high CPU usage or low free memory, and green indicates low CPU usage and high free memory. Users can also add hosts to and remove them from their PlanetLab account (or slice, in PlanetLab terminology) directly by highlighting regions of the map and choosing the appropriate menu option from the PlanetLab menu. Slices can also be renewed directly from Nebula.

Figure 3: Nebula Host View tab showing PlanetLab resources. This tab allows users to select multiple hosts at once and run shell commands on the selected resources. The text-box at the bottom shows the output from the command.

The second tab in the Nebula main window is the ‘‘Application View.’’ The Application View tab, shown in Figure 2b, allows users to build Plush application specifications using the blocks described previously. Alternatively, users may load an existing XML file describing an application specification by choosing the Load Application menu option under the File menu. There is also an option to save a new application specification to an XML file for later use. After creating or loading an application specification, the Run button located on the Application View tab starts the application.

The Plush blocks in the application specification change to green during the execution of the application to indicate progress. After an application begins execution, users have the option to ‘‘force’’ an application to skip ahead to the next phase of execution (which corresponds to releasing a synchronization barrier), or to abort an application to terminate execution across all resources. Once the application aborts or completes execution, the user may either save their application specification, disconnect from the Plush communication mesh, restart the same application, or load and run a new application by choosing the appropriate option from the File menu.

The third tab is the ‘‘Resource View’’ tab. This tab is blank until an application starts running. During execution, this tab lists the specific machines that are currently involved in the application. If failures occur during execution, the list of machines is updated dynamically, such that the Resource View tab always contains an accurate listing of the machines that are in use. The resources are separated into components, so that the user knows which resources are assigned to which tasks in their application.

The fourth tab in Nebula is called the ‘‘Host View’’ tab, shown in Figure 3. This tab contains a table that displays the hostnames of all available resources. In the right column, the status of each host is shown. The purpose of this tab is to give users another way to visualize the status of an executing application. The status of the host in the right column corresponds to the color of the dot in the ‘‘World View’’ tab. This tab also allows users to run shell commands simultaneously on several resources and view the output. As shown in Figure 3, users can select multiple hosts at once, run a command, and view the output in the text-box at the bottom of the window. Note that hosts do not have to be involved in an application in order to take advantage of this feature. Plush will connect to any available resources and run commands on behalf of the user. Just as in the World View tab, right-clicking on hosts in the Host View tab opens a pop-up menu that enables users to SSH directly to the hosts.

Command-line Interface

Motivated by the popularity and familiarity of the shell interface in UNIX, Plush further streamlines the develop-deploy-debug cycle for distributed application management through a simple command-line interface where users deploy, run, monitor, and debug their distributed applications running on hundreds of remote machines. Plush combines the functionality of a distributed shell with the power of an application controller to provide a robust execution environment for users to run their applications. From a user's standpoint, the Plush terminal looks like a shell. Plush supports several commands for monitoring the state of an execution, as well as commands for manipulating the application specification during execution. Table 1 shows some of the available commands.

Command           Description
load <file>       Load application specification
connect <host>    Connect to host and start client
disconnect        Close all connections and clients
info nodes        Print all resource information
info mesh         Print communication fabric status
info control      Print application control state
run               Start the application (after load)
shell <cmd>       Run ‘‘cmd’’ on all connected resources

Table 1: Sample Plush terminal commands.
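
For concreteness, a hypothetical terminal session might proceed as follows. The commands are those listed in Table 1; the prompt and file name are illustrative only and are not prescribed by Plush.

plush> load sword.xml
plush> run
plush> info nodes
plush> shell uptime
plush> disconnect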

Programmatic Interface

Many commands that are available via the Plush command-line interface are also exported via an XML-RPC interface, delivering functionality similar to the command-line for those who desire programmatic access. This allows Plush to be scripted and used for remote execution and automated application management, and also enables the use of external services for resource discovery, creation, and acquisition within Plush. In addition to the commands that Plush exports, external services and programs may also register themselves with Plush so that the controller can send callbacks to the XML-RPC client when various actions occur during the application's execution.

Figure 4 shows the Plush XML-RPC API. The functions shown in the PlushXmlRpcServer class are available to users who wish to access Plush programmatically in scripts, or for external resource discovery and acquisition services that need to add and remove resources from the Plush resource pool. The plushAddNode(HashMap) and plushRemoveNode(string) calls add and remove nodes from the resource pool, respectively. setXmlRpcClientUrl(string) registers XML-RPC clients for callbacks, while plushTestConnection() simply tests the connection to the Plush server and returns ‘‘Hello World.’’ The remaining function calls in the class mimic the behavior of the corresponding command-line operations.

Aside from resource discovery and acquisition services, the XML-RPC API allows for the implementation of different user interfaces for Plush. Since almost all of the Plush terminal commands are available as XML-RPC function calls, users are free to implement their own customized, environment-specific user interface without understanding or modifying the internals of the Plush implementation. This is beneficial because it gives users more flexibility to develop in the programming language of their choice. Most mainstream programming languages have support for XML-RPC, and hence users are able to develop interfaces for Plush in any language, provided that the chosen language is capable of handling XML-RPC. For example, Nebula is implemented in Java, and uses the XML-RPC interface shown in Figure 4 to interact with a Plush controller. To increase the functionality and simplify the development of these interfaces, the Plush XML-RPC server can make callbacks to programs that register with the Plush controller via setXmlRpcClientUrl(string). Some of the more common callback functions are shown at the bottom of Figure 4 in the PlushXmlRpcCallback class. Note that these callbacks are only useful if the programmatic client implements the corresponding functions.

class PlushXmlRpcServer extends XmlRpcServer {
    void plushAddNode(HashMap properties);
    void plushRemoveNode(string hostname);
    string plushTestConnection();
    void plushCreateResources();
    void plushLoadApp(string filename);
    void plushRunApp();
    void plushDisconnectApp(string hostname);
    void plushQuit();
    void plushFailHost(string hostname);
    void setXmlRpcClientUrl(string clientUrl);
}

class PlushXmlRpcCallback extends XmlRpcClient {
    void sendPlanetLabSlices();
    void sendSliceNodes(string slice);
    void sendAllPlanetLabNodes();
    void sendApplicationExit();
    void sendHostStatus(string host);
    void sendBlockStatus(string block);
    void sendResourceMatching(HashMap matching);
}

Figure 4: Plush XML-RPC API.
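
To make the API above concrete, the following sketch shows how an external program might drive a Plush controller over XML-RPC. It is illustrative only: the paper does not prescribe a client language, so we use Python's standard xmlrpc module; the controller URL, port, and host name are assumptions, while the method names come from Figure 4.

# Minimal sketch of a scripted Plush client (controller URL and port assumed).
from xmlrpc.client import ServerProxy

plush = ServerProxy("http://localhost:5555", allow_none=True)

print(plush.plushTestConnection())                 # expect "Hello World"
plush.setXmlRpcClientUrl("http://localhost:5556")  # where callbacks should be sent
plush.plushLoadApp("sword.xml")                    # load an application specification
plush.plushRunApp()                                # locate resources and start the run
plush.plushFailHost("planetlab1.example.org")      # manually mark a host as failed

Because XML-RPC hides the controller's internals, the same handful of calls works whether the script runs on the controller machine or elsewhere.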

Implementation Details

Plush is a publicly available software package consisting of over 60,000 lines of C++ code. Plush depends on several C++ libraries, including those provided by xmlrpc-c, curl, xml2, zlib, math, openssl, readline, curses, boost, and pthreads. The command-line interface also depends on packages for lex and yacc (we use flex and bison).

In addition to the main C++ codebase, Plush uses several simple perl scripts for interacting with the PlanetLab Central database and bootstrapping resources. Plush runs on most UNIX-based platforms, including Linux, FreeBSD, and Mac OS X, and a single Plush controller can manage clients running on different operating systems. The only prerequisite for using Plush on a resource is the ability to SSH to the resource. Currently Plush is being used to manage applications on PlanetLab, ModelNet, and Xen virtual machines [5] in our research cluster.

Nebula consists of approximately 25,000 lines of Java code. Nebula communicates with Plush using the XML-RPC interface. XML-RPC is implemented in Nebula using the Apache XML-RPC client and server packages. In addition, Nebula uses the JOGL implementation of the OpenGL graphics package for Java. Nebula runs in any computing environment that supports Java, including Windows, Linux, FreeBSD, and Mac OS X, among others. Note that since Nebula and Plush communicate solely via XML-RPC, it is not necessary to run Nebula on the same physical machine as the Plush controller.

<?xml_version="1.0"encoding="utf"−8?><plush>

<project_name="sword"><software_name="sword_software" type="tar">

<package_name="sword.tar" type="web"><path>http://plush.ucsd.edu/sword.tar</path><dest>sword.tar</dest>

</package></software><component_name="sword_participants">

<rspec><num_hosts_min="10" max="800"/>

</rspec><resources>

<resource_type="planetlab" group="ucsd_sword"/></resources><software_name="sword_software"/>

</component><application_block_name="sword_app_block" service="1"

reconnect_interval="300"><execution>

<component_block_name="participants"><component_name="sword_participants"/><process_block_name="sword">

<process_name="sword_run"><path>dd/planetlab/run−sword</path>

</process></process_block>

</component_block></execution>

</application_block></project>

<plush>Figure 5: SWORD application specification.

Usage Scenarios

One of the primary goals of our work is to build a generic application management framework that supports execution in any execution environment. This is mainly accomplished through the Plush resource abstraction. In Plush, resources are computing devices capable of hosting applications, such as physical machines, emulated hosts, or virtual machines. To show that Plush achieves this goal, in this section we take a closer look at specific uses of Plush in different distributed computing environments, including a live deployment testbed, an emulated network, and a cluster of virtual machines.

PlanetLab Live Deployment

To demonstrate Plush's ability to manage the live deployment of applications, we revisit our previous example from the second section and show how Plush manages SWORD [23] on PlanetLab. Recall that SWORD is a resource discovery service that relies on host monitors running on each PlanetLab machine to report information periodically about their resource usage. This data is stored in a DHT (distributed hash table) and later accessed by SWORD clients to respond to requests for groups of resources that have specific characteristics. SWORD is a service that helps PlanetLab users find the best set of resources based on the priorities and requirements specified, and is an example of a long-running Internet service.

The XML application specification for SWORD is shown in Figure 5. Note that this specification could be built using Nebula, in which case the user would never have to edit the XML directly. The top half of the specification in Figure 5 defines the SWORD software package and the component (resource group) required for the application. Notice that SWORD uses one component consisting of hosts assigned to the ucsd_sword PlanetLab slice.

An interesting feature of this component definition is the ‘‘num_hosts’’ tag. Since SWORD is a service that wants to run on as many nodes as possible, a range of acceptable values is used rather than a single number. In this case, as long as 10 hosts are available, Plush will continue managing SWORD. Since the max is set to 800, Plush will not look for more than 800 resources to host SWORD. Since PlanetLab contains fewer than 800 hosts, this means that SWORD will attempt to run on all PlanetLab resources.

The lower half of the application specification defines the application block, component block, and process block that describe the SWORD execution. The application block contains a few key features that help Plush react to failures more efficiently for long-running services. When defining the application block object for SWORD, we include special ‘‘service’’ and ‘‘reconnect_interval’’ attributes. The service attribute tells the Plush controller that SWORD is a long-running service and requires different default behaviors for initialization and failure recovery. For example, during application initialization the controller does not wait for all participants to install the software before starting all hosts simultaneously. Instead, the controller instructs individual clients to start the application as soon as they finish installing the software, since there is no reason to synchronize the execution across all hosts. Further, if a process fails when the service attribute has been specified, the controller attempts to restart SWORD on that host without aborting the entire application.

The reconnect_interval specifies the period of time the controller waits before rerunning the resource discovery and acquisition unit. For long-running services, hosts often fail and recover during execution. The reconnect_interval attribute tells the controller to check for new hosts that have come alive since the last run of the resource discovery unit. The controller also unsets any hosts that had previously been marked as ‘‘failed’’ at this time. This is the controller's way of ‘‘refreshing’’ the list of available hosts. The controller continues to search for new hosts until reaching the maximum num_hosts value, which is 800 in our case.

Evaluating Fault Tolerance

To demonstrate Plush's ability to automatically recover from host failures for long-running services, we ran SWORD on PlanetLab with 100 randomly chosen hosts, as shown in Figure 6. The host set includes machines behind DSL links as well as hosts from other continents. When Plush starts the application, the controller starts the Plush client on 100 randomly chosen PlanetLab machines, and they each begin downloading the SWORD software package (38 MB).

It takes approximately 1000 seconds for all hosts to successfully download, install, and start SWORD. At time t = 1250s, we kill the SWORD process on 20 randomly chosen hosts to simulate host failure. Normally, Plush would automatically try to restart the SWORD process on these hosts. However, we disabled this feature to simulate host failures and force a rematching. The remote Plush clients notify the controller that the hosts have failed, and the controller begins to find replacements for the failed machines. The replacement hosts join the Plush overlay and start downloading the SWORD software. As before, Plush chooses the replacements randomly, and low-bandwidth/high-latency links have a great impact on the time it takes to fully recover from the host failure. At t = 2200s, the service is restored on 100 machines.

Figure 6: SWORD running on 100 randomly chosen PlanetLab hosts. The plot shows the number of hosts running and failed versus elapsed time in seconds. At t=1250 seconds, we fail 20 hosts. The Plush controller finds new hosts, which start the Plush client process and begin downloading and installing the SWORD software. Service is fully restored at approximately t=2200 seconds.

Using Plush to manage long-running services like SWORD alleviates the burden of manually probing for failures and configuring/reconfiguring hosts. Further, Plush interfaces directly with the PlanetLab Central API, which means that users can automatically add hosts to their slice and renew their slice using Plush. This is beneficial since services typically want to run on as many PlanetLab hosts as possible, including any new hosts that come online. In addition, Plush simplifies the task of debugging problems by providing a single point of control for all connected PlanetLab hosts. Thus, if a user wants to view the memory consumption of their service across all connected hosts, a single Plush command retrieves this information, making it easier to maintain and monitor a service running on hundreds of resources scattered around the world.

ModelNet Emulation

Aside from PlanetLab resources, Plush also supports running applications on virtual hosts in emulated environments. In this section we discuss how Plush supports using ModelNet [29] emulated resources to host applications. In addition, we will discuss how a batch scheduler uses the Plush programmatic interface to perform remote job execution.

Mission is a simple batch scheduler used to manage the execution of jobs that run on ModelNet in our research cluster. ModelNet is a network emulation environment that consists of one or more Linux edge nodes and a set of FreeBSD core machines running a specialized ModelNet kernel. The code running on the edge hosts routes packets through the core machines, where the packets are subjected to the delay, bandwidth, and loss specified in a target topology. A single physical machine hosts multiple ‘‘virtual’’ IP addresses that act as emulated resources on the Linux edge hosts.

To set up the ModelNet computing environment with the target topology, two phases of execution are required: deploy and run. Before running any applications, the user must first deploy the desired topology on each physical machine, including the FreeBSD core. The deploy process essentially instantiates the emulated hosts and installs the desired topology on all machines. Then, after setting a few environment variables, the user is free to run applications on the emulated hosts using virtual IP addresses, just as applications are run on physical machines using real IP addresses.

A single ModelNet experiment typically consumes almost all of the computing resources available on the physical machines involved. Thus, when running an experiment, it is essential to restrict access to the machines so that only one experiment is running at a time. Further, there are a limited number of FreeBSD core machines running the ModelNet kernel available, and access to these hosts must also be arbitrated. Mission is a batch scheduler developed locally to help accomplish this goal by allowing the resources to be efficiently shared among multiple users. ModelNet users submit their jobs to the Mission job queue, and as the machines become available, Mission pulls jobs off the queue and runs them on behalf of the user, ensuring that no two jobs are run simultaneously.

A Mission job submission has two components: a Plush application specification and a resource directory file. For ModelNet, the directory file contains information about both the physical and virtual (emulated) resources on which the ModelNet experiment will run. In the resource directory file, some entries include two extra parameters, ‘‘vip’’ and ‘‘vn’’, which define the virtual IP address and virtual number (similar to a hostname) for the emulated resources. In addition to the directory file, which is used to populate the Plush resource pool, users also submit to the Mission server an application specification describing the application they wish to run on the emulated topology.

The application specification submitted to Mission contains two component blocks separated by a synchronization barrier. The first component block describes the processes that run on the physical machines during the deployment phase (where the emulated topology is instantiated). The second component block defines the processes associated with the target application. When the controller starts the Plush clients on the emulated hosts, it specifies extra command line arguments that are defined in the directory file by the ‘‘vip’’ and ‘‘vn’’ attributes. This sets the appropriate ModelNet environment variables, ensuring that all commands run on that client on behalf of the user inherit those settings.

When a user submits a Plush application specification and directory file to Mission, the Mission server parses the directory file to identify which resources are needed to host the application. When those resources become available for use, Mission starts a Plush controller on behalf of the user using the Plush XML-RPC interface. Mission passes Plush the directory file and application specification, and continues to interact throughout the execution of the application via XML-RPC. After Plush notifies Mission that the execution has ended, Mission kills the Plush process and reports back to the user with the results. Any terminal output that is generated is emailed to the user.
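
As a rough illustration of how an external scheduler such as Mission might drive a Plush controller through the calls of Figure 4, the sketch below loads a specification, starts the run, waits for the sendApplicationExit callback, and then shuts the controller down. It is a sketch only: the controller and callback URLs, the port numbers, the file name, and the mechanism Mission actually uses to hand Plush the directory file are assumptions not specified in the paper.

# Hypothetical scheduler-side driver for a Plush controller (URLs/ports assumed).
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

finished = threading.Event()

def send_application_exit():
    # Invoked by the controller when the managed application ends.
    finished.set()
    return 0

# Callback endpoint the controller will call back into.
callbacks = SimpleXMLRPCServer(("localhost", 5556), allow_none=True)
callbacks.register_function(send_application_exit, "sendApplicationExit")
threading.Thread(target=callbacks.serve_forever, daemon=True).start()

plush = ServerProxy("http://localhost:5555", allow_none=True)
plush.setXmlRpcClientUrl("http://localhost:5556")
plush.plushLoadApp("modelnet_job.xml")   # application specification from the job submission
plush.plushRunApp()

finished.wait()     # block until the sendApplicationExit callback arrives
plush.plushQuit()   # tear down the controller, as Mission does after a run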

Plush jobs are currently being submitted to Mission on a daily basis at UCSD. These jobs include experimental content distribution protocols, distributed model checking systems, and other distributed applications of varying complexity. Mission users benefit from Plush's automated execution capabilities. Users simply submit their jobs to Mission and receive an email when their task is complete. They do not have to spend time configuring their environment or starting the execution. Individual host errors that occur during execution are aggregated into one message and returned to the user in the email. Logfiles are collected in a public directory on a common file system and labeled with a job ID, so that users are free to inspect the output from individual hosts if desired.

Virtual Machine Deployment

In all of the examples discussed above, the pool of resources available to Plush is known at startup. In the PlanetLab examples, Plush uses slice information to determine the set of user-accessible hosts. For ModelNet, the emulated topology includes specific information about the virtual hosts to be created, and this information is passed to Plush in the directory file. We next describe how Plush manages applications in environments without fixed sets of machines, but rather with underlying capabilities to create and destroy resources on demand.

Shirako [16] is a utility computing framework. Through programmatic interfaces, Shirako allows users to create dynamic on-demand clusters of resources, including storage, network paths, physical servers, and virtual machines. Shirako is based on a resource leasing abstraction, enabling users to negotiate access to resources. Usher [21] is a virtual machine scheduling system for cluster environments. It allows users to create their own virtual machines or clusters. When a user requests a virtual machine, Usher uses data collected by virtual machine monitors to make informed decisions about when and where the virtual machine should run.

We have extended Plush to interface with both Shirako and Usher. Through its XML-RPC interface, Plush interacts with the Shirako and Usher servers. As resources are created and destroyed, the resource pool in Plush is updated to include the current set of leased resources. Using this dynamic resource pool, Plush manages applications running on potentially temporary virtual machines in the same way that applications are managed in static environments like PlanetLab. Thus, using the resource abstractions provided by Plush, users are able to run their applications on PlanetLab, ModelNet, or on clusters of virtual machines without ever having to worry about the underlying details of the environment.

<?xml_version="1.0" encoding="utf−8"?><plush>

<project_name="simple"><component_name="Group1">

<rspec><num_hosts>10</num_hosts><shirako>

<num_hosts>10</num_hosts><type>1</type><memory>200</memory><bandwidth>200</bandwidth><cpu>50</cpu><lease_length>600</lease_length><server>http://shirako.cs.duke.edu:20000</server>

</shirako></rspec><resources>

<resource_type="ssh" group="shirako"/></resources>

</component></project>

</plush>Figure 7: Plush component definition containing Shirako resources. The resource description contains a lease pa-

rameter which tells Shirako how long the user needs the resources.

To support dynamic resource creation and management, we augment the Plush application specification with a description of the desired virtual machines, as shown in Figure 7. Specifically, the Plush application specification needs to include information about the desired attributes of the resources so that this information can be passed on to either Shirako or Usher. Shirako and Usher currently create Xen [5] virtual machines (as indicated by the ‘‘type’’ flag in the resource description) with the CPU speed, memory, disk space, and maximum bandwidth specified in the resource request. As the Plush controller parses the application specification, it stores the resource description. Then, when the create resource command is issued, either via the terminal interface or programmatically through XML-RPC, Plush contacts the appropriate Shirako or Usher server and submits the resource request. Once the resources are ready for use, Plush is informed via an XML-RPC callback that also contains contact information about the new resources. This callback updates the Plush resource pool, and the user is free to start applications on the new resources by issuing the run command to the Plush controller.
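
The same flow can be exercised programmatically. The sketch below assumes a Plush controller listening at a hypothetical URL; plushCreateResources() asks Plush to submit the resource request, while an external acquisition service (or a script standing in for one) can place a newly leased machine into the resource pool with plushAddNode(HashMap). The property keys and host name shown are illustrative assumptions; the paper does not enumerate the expected fields.

# Sketch of driving Plush's dynamic resource creation over XML-RPC.
# The controller URL and the property keys passed to plushAddNode are assumptions.
from xmlrpc.client import ServerProxy

plush = ServerProxy("http://localhost:5555", allow_none=True)

# Ask the controller to submit the resource request described in the
# component's <shirako> block (Figure 7) to the Shirako or Usher server.
plush.plushCreateResources()

# An external acquisition service could later add a newly leased virtual
# machine to the Plush resource pool; an XML-RPC struct stands in for the
# HashMap argument from Figure 4.
plush.plushAddNode({"hostname": "vm01.example.org", "user": "plush"})

# ...and remove it again when its lease expires.
plush.plushRemoveNode("vm01.example.org")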

Though the integration of Plush and Usher is still in its early stages, Plush is being used by Shirako users regularly at Duke University. While Shirako multiplexes resources on behalf of users, it does not provide any abstractions or functionality for using the resources once they have been created. On the other hand, Plush provides abstractions for managing distributed applications on remote machines, but provides no support for multiplexing resources. A ‘‘resource’’ is merely an abstraction in Plush to describe a machine (physical or virtual) that can host a distributed application. Resources can be added to and removed from the application's resource pool, but Plush relies on external mechanisms (like Shirako and Usher) for the creation and destruction of resources.

The integration of Shirako and Plush allows users to seamlessly leverage the functionality of both systems. While Shirako provides a web interface for creating and destroying resources, it does not provide an interface for using the new resources, so Shirako users benefit from the interactivity provided by the Plush shell. Researchers at Duke are currently using Plush to orchestrate workflows of batch tasks and perform data staging for scientific applications, including BLAST [3], on virtual machine clusters managed by Shirako [14].

Related Work

The functionality that Plush provides is related to work in a variety of areas. With respect to remote job execution, there are several tools available that provide a subset of the features that Plush supports, including cfengine [9], gexec [10], and vxargs [20]. The difference between Plush and these tools is that Plush provides more than just remote job execution. Plush also supports mechanisms for failure recovery and automatic reconfiguration due to changing conditions. In general, the pluggable aspect of Plush allows for the use of existing tools for actions like resource discovery and allocation, which provides more advanced functionality than most remote job execution tools.

From the user's point of view, the Plush command-line is similar to distributed shell systems such as GridShell [31] and GCEShell [22]. These tools provide a user-friendly language abstraction layer that supports script processing. Both tools are designed to work in Grid environments. Plush provides functionality similar to that of GridShell and GCEShell, but unlike these tools, Plush works in a variety of environments.

In addition to remote job execution tools and distributed shells, projects like the PlanetLab Application Manager (appmanager) [15] and SmartFrog [13] focus specifically on managing distributed applications. appmanager is a tool for maintaining long-running services and does not provide support for short-lived executions. SmartFrog [13] is a framework for describing, deploying, and controlling distributed applications. It consists of a collection of daemons that manage distributed applications and a description language to describe the applications. Unlike Plush, SmartFrog is not a turnkey solution, but rather a framework for building configurable systems. Applications must adhere to a specific API to take advantage of SmartFrog's features.

There are also several commercially available products that perform functions similar to Plush. Namely, Opsware [24] and Appistry [4] provide software solutions for distributed application management. Opsware System 6 allows customers to visualize many aspects of their systems and automates software management of complex, multi-tiered applications. The Appistry Enterprise Application Fabric strives to deliver application scalability, dependability, and manageability in grid computing environments. In comparison to Plush, both of these tools focus more on enterprise application versioning and package management, and less on providing support for interacting with experimental distributed systems.

The Grid community has several application management projects with goals similar to Plush, including Condor [8] and GrADS/vGrADS [7]. Condor is a workload management system for compute-intensive jobs that is designed to deploy and manage distributed executions. Where Plush is designed to deploy and manage naturally distributed tasks with resources spread across several sites, Condor is optimized for leveraging underutilized cycles in desktop machines within an organization, where each job is parallelizable and compute-bound. GrADS/vGrADS [7] provides a set of programming tools and an execution environment for easing program development in computational grids. GrADS focuses specifically on applications where resource requirements change during execution. The task deployment process in GrADS is similar to that of Plush. Once the application starts execution, GrADS maintains resource requirements for compute-intensive scientific applications through a stop/migrate/restart cycle. Plush, on the other hand, supports a far broader range of recovery actions.

Within the realm of workflow management, there are tools that provide more advanced functionality than Plush. For example, GridFlow [11], Kepler [19], and the other tools described in [32] are designed for advanced workflow management in Grid environments. The main difference between these tools and Plush is that they focus solely on workflow management schemes. Thus they are not well suited for managing applications that do not contain workflows, such as long-running services.

Lastly, the Globus Toolkit [12] is a framework for building Grid systems and applications, and is perhaps the most widely used software package for Grid development. Some components of Globus provide functionality similar to Plush. With respect to our application specification language, the Globus Resource Specification Language (RSL) provides an abstract language for describing resources that is similar in design to our language. The Globus Resource Allocation Manager (GRAM) processes requests for resources, allocates the resources, and manages active jobs in Grid environments, providing much of the same functionality as Plush does. The biggest difference between Plush and Globus is that Plush provides a user-friendly shell interface where users directly interact with their applications. Globus, on the other hand, is a framework, and each application must use the APIs to create the desired functionality. In the future, we plan to integrate Plush with some of the Globus tools, such as GRAM and RSL. In this scenario, Plush will act as a front-end user interface for the tools available in Globus.

Conclusion

Plush is an extensible application control infrastructure designed to meet the demands of a variety of distributed applications. Plush provides abstractions for resource discovery and acquisition, software installation, process execution, and failure recovery in distributed environments. When an error is detected, Plush has the ability to perform several application-specific actions, including restarting the computation, finding a new set of resources, or attempting to adapt the application to continue execution and maintain liveness. In addition, Plush provides new relaxed synchronization primitives that help applications achieve good throughput even in unpredictable wide-area conditions where traditional synchronization primitives are too strict to be effective.

Plush is in daily use by researchers worldwide, and user feedback has been largely positive. Most users find Plush to be an ‘‘extremely useful tool’’ that provides a user-friendly interface to a powerful and adaptable application control infrastructure. Other users claim that Plush is ‘‘flexible enough to work across many administrative domains (something that typical scripts do not do).’’ Further, unlike many related tools, Plush does not require applications to adhere to a specific API, making it easy to run distributed applications in a variety of environments. Our users tell us that Plush is ‘‘fairly easy to get installed and setup on a new machine. The structure of the application specification largely makes sense and is easy to modify and adapt.’’

Although Plush has been in development for over three years now, we still have some features that need improvement. One important area for future enhancements is error reporting. Debugging applications is inherently difficult in distributed environments. We try to make it easier for researchers using Plush to locate and diagnose errors, but this is a difficult task. For example, one user says that ‘‘when things go wrong with the experiment, it's often difficult to figure out what happened. The debug output occasionally does not include enough information to find the source of the problem.’’ We are currently investigating ways to allow application-specific error reporting, and ultimately simplify the task of debugging distributed applications in volatile environments.

Author Biographies

Jeannie Albrecht is an Assistant Professor of Computer Science at Williams College in Williamstown, Massachusetts. She received her Ph.D. in Computer Science from the University of California, San Diego in June 2007 under the supervision of Amin Vahdat and Alex C. Snoeren.

Ryan Braud is a fourth-year doctoral student at the University of California, San Diego, where he works under the direction of Amin Vahdat in the Systems and Networking research group. He received his B.S. in Computer Science and Mathematics from the University of Maryland, College Park in 2004.

Darren Dao is a graduate student at the University of California, San Diego, where he works under the direction of Amin Vahdat in the Systems and Networking research group. He received his B.S. in Computer Science from the University of California, San Diego in 2006.

Nikolay Topilski is a graduate student at the University of California, San Diego, where he works under the direction of Amin Vahdat in the Systems and Networking research group. He received his B.S. in Computer Science from the University of California, San Diego in 2002.

Christopher Tuttle is a Software Engineer at Google in Mountain View, California. He received his M.S. in Computer Science from the University of California, San Diego in December 2005 under the supervision of Alex C. Snoeren.

Alex C. Snoeren is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, San Diego. He received his Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2003 under the supervision of Hari Balakrishnan and M. Frans Kaashoek.

Amin Vahdat is a Professor in the Department of Computer Science and Engineering and the Director of the Center for Networked Systems at the University of California, San Diego. He received his Ph.D. in Computer Science from the University of California, Berkeley in 1998 under the supervision of Thomas Anderson. Before joining UCSD in January 2004, he was on the faculty at Duke University from 1999-2003.

Bibliography

[1] Albrecht, J., C. Tuttle, A. C. Snoeren, and A. Vahdat, ‘‘Loose Synchronization for Large-Scale Networked Systems,’’ Proceedings of the USENIX Annual Technical Conference (USENIX), 2006.

[2] Albrecht, J., C. Tuttle, A. C. Snoeren, and A. Vahdat, ‘‘PlanetLab Application Management Using Plush,’’ ACM Operating Systems Review (OSR), Vol. 40, Num. 1, 2006.

[3] Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, ‘‘Basic Local Alignment Search Tool,’’ Journal of Molecular Biology, Vol. 215, 1990.

[4] Appistry, http://www.appistry.com/ .
[5] Barham, P., B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, ‘‘Xen and the Art of Virtualization,’’ Proceedings of the ACM Symposium on Operating System Principles (SOSP), 2003.

[6] Bavier, A., M. Bowman, B. Chun, D. Culler, S. Karlin, S. Muir, L. Peterson, T. Roscoe, T. Spalink, and M. Wawrzoniak, ‘‘Operating Systems Support for Planetary-Scale Network Services,’’ Proceedings of the ACM/USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2004.

[7] Berman, F., H. Casanova, A. Chien, K. Cooper, H. Dail, A. Dasgupta, W. Deng, J. Dongarra, L. Johnsson, K. Kennedy, C. Koelbel, B. Liu, X. Liu, A. Mandal, G. Marin, M. Mazina, J. Mellor-Crummey, C. Mendes, A. Olugbile, M. Patel, D. Reed, Z. Shi, O. Sievert, H. Xia, and A. YarKhan, ‘‘New Grid Scheduling and Rescheduling Methods in the GrADS Project,’’ International Journal of Parallel Programming (IJPP), Vol. 33, Num. 2-3, 2005.

[8] Bricker, A., M. Litzkow, and M. Livny, ‘‘Condor Technical Summary,’’ Technical Report 1069, University of Wisconsin-Madison, CS Department, 1991.

[9] Burgess, M., ‘‘Cfengine: A Site Configuration Engine,’’ USENIX Computing Systems, Vol. 8, Num. 3, 1995.

[10] Chun, B., gexec, http://www.theether.org/gexec/ .
[11] Cao, J., S. Jarvis, S. Saini, and G. Nudd, ‘‘GridFlow: Workflow Management for Grid Computing,’’ Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGrid), 2003.

[12] Foster, I., A Globus Toolkit Primer, 2005, http://www.globus.org/toolkit/docs/4.0/key/GT4_Primer_0.6.pdf .

[13] Goldsack, P., J. Guijarro, A. Lain, G. Mecheneau, P. Murray, and P. Toft, ‘‘SmartFrog: Configuration and Automatic Ignition of Distributed Applications,’’ HP Openview University Association Conference (HP OVUA), 2003.

[14] Grit, L., D. Irwin, V. Marupadi, P. Shivam, A. Yumerefendi, J. Chase, and J. Albrecht, ‘‘Harnessing Virtual Machine Resource Control for Job Management,’’ Proceedings of the Workshop on System-level Virtualization for High Performance Computing (HPCVirt), 2007.

[15] Huebsch, R., PlanetLab Application Manager, http://appmanager.berkeley.intel-research.net .

[16] Irwin, D., J. Chase, L. Grit, A. Yumerefendi, D. Becker, and K. G. Yocum, ‘‘Sharing Networked Resources with Brokered Leases,’’ Proceedings of the USENIX Annual Technical Conference (USENIX), 2006.

[17] Jordan, H. F., ‘‘A Special Purpose Architecture for Finite Element Analysis,’’ Proceedings of the International Conference on Parallel Processing (ICPP), 1978.

[18] Ludtke, S., P. Baldwin, and W. Chiu, ‘‘EMAN: Semiautomated Software for High-Resolution Single-Particle Reconstructions,’’ Journal of Structural Biology, Vol. 122, 1999.

[19] Ludäscher, B., I. Altintas, C. Berkley, D. Higgins, E. Jaeger-Frank, M. Jones, E. A. Lee, J. Tao, and Y. Zhao, ‘‘Scientific Workflow Management and the Kepler System,’’ Concurrency and Computation: Practice and Experience, Special Issue on Scientific Workflows (CC: P&E), Vol. 18, Num. 10, 2005.

[20] Mao, Y., vxargs, http://dharma.cis.upenn.edu/planetlab/vxargs/ .

[21] McNett, M., D. Gupta, A. Vahdat, and G. M. Voelker, ‘‘Usher: An Extensible Framework for Managing Clusters of Virtual Machines,’’ Proceedings of the USENIX Large Installation System Administration Conference (LISA), 2007.

[22] Nacar, M. A., M. Pierce, and G. C. Fox, ‘‘Developing a Secure Grid Computing Environment Shell Engine: Containers and Services,’’ Neural, Parallel, and Scientific Computations (NPSC), Vol. 12, 2004.

[23] Oppenheimer, D., J. Albrecht, D. Patterson, and A. Vahdat, ‘‘Design and Implementation Tradeoffs for Wide-Area Resource Discovery,’’ Proceedings of the IEEE Symposium on High Performance Distributed Computing (HPDC), 2005.

[24] Opsware, http://www.opsware.com/ .
[25] Park, K. and V. S. Pai, ‘‘CoMon: A Mostly-Scalable Monitoring System for PlanetLab,’’ ACM Operating Systems Review (OSR), Vol. 40, Num. 1, 2006.

[26] Peterson, L., T. Anderson, D. Culler, and T. Roscoe, ‘‘A Blueprint for Introducing Disruptive Technology into the Internet,’’ Proceedings of the ACM Workshop on Hot Topics in Networks (HotNets), 2002.

[27] Plush, http://plush.ucsd.edu .
[28] Ritchie, D. M. and K. Thompson, ‘‘The UNIX Time-Sharing System,’’ Communications of the Association for Computing Machinery (CACM), Vol. 17, Num. 7, 1974.

[29] Vahdat, A., K. Yocum, K. Walsh, P. Mahadevan, D. Kostic, J. Chase, and D. Becker, ‘‘Scalability and Accuracy in a Large-Scale Network Emulator,’’ Proceedings of the ACM/USENIX Symposium on Operating System Design and Implementation (OSDI), 2002.

[30] Waldspurger, C. A., ‘‘Memory Resource Management in VMware ESX Server,’’ Proceedings of the ACM/USENIX Symposium on Operating System Design and Implementation (OSDI), 2002.

[31] Walker, E., T. Minyard, and J. Boisseau, ‘‘GridShell: A Login Shell for Orchestrating and Coordinating Applications in a Grid Enabled Environment,’’ Proceedings of the International Conference on Computing, Communications and Control Technologies (CCCT), 2004.

[32] Yu, J. and R. Buyya, ‘‘A Taxonomy of Workflow Management Systems for Grid Computing,’’ Journal of Grid Computing (JGC), Vol. 3, Num. 3-4, 2005.
