

Parallel Unconcentrated Management for Applications (PUMA) Applied to Dialog Systems

Kevin Jesse, UC Davis

[email protected]

Austin Chau, UC Davis

[email protected]

Abstract
Dialog systems are versatile, providing services from weather, alarms, movie recommendations, traffic, advice, and more. Their functionality depends on the underlying micro-services, which are interchangeable applications running in real time. Since most modern dialog systems are hosted on network clusters, these micro-services must be able to scale in a distributed environment with minimal impact on performance and data integrity. Furthermore, micro-services must not compete for resources or impact the performance of other micro-services. This paper proposes a new framework that adopts existing distributed system managers to scale applications, like dialog systems, while maintaining programmability. To test the effectiveness of this new framework, PUMA, we developed a movie recommendation dialog service that runs exclusively on the proposed framework. This implementation is then put through a series of benchmarks that simulate high service usage in a variety of capacities. We expect PUMA to outperform existing designs and be ideal for applications with many independent micro-services such as dialog systems.

1 Introduction
Dialog systems are extremely useful because of the skills they present to a user. An interaction consists of several utterances that may elicit the help of several micro-services. These micro-services must scale and perform with minimal latency in order for the core service to return a timely response. These services often operate in the cloud, where the largest obstacles are availability, performance unpredictability, and auto scaling [Armbrust et al., 2010]. While these obstacles are central to the quality of the service, they are often neglected in favor of implementation simplicity. Novel dialog systems, like TickTock [Yu et al., 2015], are naively implemented to run on a single Linux instance, despite intentions to integrate with high throughput systems. We continue to see this trend as applications like dialog systems increase in complexity both computationally and as a service [Gustafson et al., 2000], [Levin et al., 1998], [Henderson et al., 2013]. Moreover, these systems are not built with scalability, availability, concurrency, and parallelized computation in mind at the application level. This needs to be addressed.

PUMA is inspired by Amazon Lambda and Elastic Load Balancing services. Complex dialog systems of today, like Alexa, are built on serverless platforms that scale instantly in response to triggers. Code runs in parallel for individual requests and triggers, scaling only to the size of the workload; this is extremely cost efficient [Villamizar et al., 2016]. While practical for simple and isolation-driven architectures, this platform suffers from several drawbacks: namely, the inherent complexity of porting existing code to a stateless server design, a lack of visibility into scaling and load balancing services, and a commitment to using AWS. Organizations are in need of an open source application scaler and load balancer that is IaaS/PaaS independent. Moreover, scaling monolithic servers running multiple services on demand is not feasible on traditional providers like RackSpace and AWS [Villamizar et al., 2015]. Scaling traditional monolithic applications continues to be a challenge because they typically offer a variety of services, some more popular than others. If popular or high-utilization services need to be scaled because of high demand, the entire set of services must also be scaled [Villamizar et al., 2015]. PUMA attempts to alleviate monolithic service scaling and load balancing difficulties by enabling service definition. PUMA proxies requests for application-defined micro-services that are scaled and balanced automatically. PUMA can run both on single instance servers and in a distributed manner using commodity nodes.

PUMA automatically provides distributed computing resources to the application, proxies micro-services, ensures availability, and provides computational resource responsibility. PUMA routes core functionality of standalone applications across distributed computing nodes. Nodes can execute requests and application-defined services in parallel and coordinate high intensity workloads by automatically spawning more nodes. Application-defined services are not limited to services accessed only by the end user, but extend to all areas of computation where distributed computing can be leveraged. Influenced by Google's MapReduce [Dean and Ghemawat, 2008], PUMA aims to enable data-segmented parallelized computation on dynamically spawnable CPU nodes. This framework prevents blocking of new services by ensuring enough computational resources exist for new requests. Thus, the CPU is available for requests, while heavy lifting such as distributed matrix factorization [Gemulla et al., 2011], neural network training [Dean et al., 2012], and classification tasks is left to scalable computing workers.

These workers are delegated by the load supervisor. PUMA leverages Envoy, an open source L7 service proxy and communication bus [Klein, 2016]. By using Envoy as the load supervisor, PUMA can efficiently manage a mesh of nodes and service endpoints. Envoy runs natively with any application language, providing a rich service architecture in a latency-optimized manner [Klein, 2016], ideal for real time applications such as dialog systems. Envoy also provides gRPC support, ideal for encapsulating code for remote execution [Klein, 2016]. PUMA is designed for a centralized, application-defined service architecture that is scalable and accessible, with API-like connectors to key features and services such as database operations and background service operations. With Envoy's abstracted upstream mesh and simple service architecture, PUMA does the heavy lifting by configuring Envoy to the application's specifications. Integration simplicity for existing architectures is a central focus of this work, and to evaluate this metric, PUMA was tested on an existing monolithic dialog system. This system utilizes several tightly coupled services that, when configured with PUMA, could be decentralized and optimized for the dynamic workload, running and scaling independently. This system was then tested on a series of benchmarks that measured latency, availability, adaptation to unpredictable workloads, and scalability.

Portability is a central focus of this work. Most applications are first developed as standalone applications, and when they are scaled to distributed computing environments like Amazon Lambda, significant modifications are required. Additionally, designing a new standalone service with future distributed computing in mind hurts deployment time and increases the expense of development. As distributed computing advances or service agreements change, legacy architectures embedded in applications must be supported. PUMA abstracts the distributed computing software from the application, and this fundamental design choice translates directly into easy maintainability throughout the product life cycle. Our framework achieves high portability and maintainability on our sample monolithic dialog system while increasing performance by scaling and balancing various application-defined services.

Some key questions in evaluating this framework are: what are the implementation costs, how difficult is programmability, and how much of an improvement does it offer over a traditional non-distributed micro-service implementation? Precisely, how many lines of code are required to port an existing monolithic application, and what is the independent service speedup? In deciding what metrics to use for programmability, we relied on operating systems techniques such as measuring the number of lines of modified code. Due to the ease of migration and implementation, this framework can be adopted in various domains for any application. To measure independent service speedup, we evaluate common data computations on PUMA that can simulate the potential speedup for more complex operations like matrix factorization.

Figure 1: PUMA System Framework. PUMA configures standalone applications to interface with application-defined services in Envoy. PUMA-configured applications serve requests by proxy to distributed computing nodes. Envoy load balances requests with commodity hardware to provide service availability. A lightweight PUMA Python library is available to provide API-like communication with application-defined services, thus providing parallel computing on a distributed mesh.

The contributions of this paper are:

• A framework for automatically configuring standalone monolithic applications into a set of application-defined micro-services capable of load balancing and scaling according to the workload

• A set of benchmarks that demonstrate the effectiveness of such a framework on computationally expensive tasks

• A ported monolithic style application with less than 15lines of altered code.

We published our source code1 and maintain a running instance2 of the ported chatbot application. Our team hopes to publish a Python library for application-defined PUMA configurations to automatically port standard Python applications to PUMA-enabled applications.

2 System Overview
The PUMA library is at the root of application portability. With the PUMA library imported into a standalone application, major functionality can be migrated to the service mesh. This requires well defined application service boundaries, but these can be specified by the application, further ensuring maximum flexibility during porting.

The PUMA interface extends Envoy by exposing proxy functionality and providing abstractions over Envoy features like load balancing and automatic computational scaling. With application service mappings defined through the PUMA library, independent services can be routed to computational nodes running on Envoy's native code base at the service endpoint.

1 https://github.com/kevinjesse/puma
2 http://puma251.ddns.net:8080


Code intended for gRPC streaming endpoints is encapsulated and forwarded, while complementary functionality like routing and load balancing for gRPC still occurs on the Envoy substrate. The database sub-interface translates database requests from locally defined database connections to a balanced Envoy service routing directly to DynamoDB. This is critical for frequent or large database queries, as the database service can scale according to usage in addition to the software logic, and thus does not become a common bottleneck for scaled software logic. The PUMA interface registers defined configurations of services from the PUMA lib and propagates those configurations to Envoy in a seamless fashion. PUMA provides application developers with the tools necessary to scale and balance the application, without requiring specific domain knowledge of L7 proxies, load balancing, and scaling technologies.

Envoy is a self contained process that can run alongside the application server. The Envoy mesh is completely transparent to the application: the application sends traditional GET and POST requests to and from the localhost, completely unaware of the underlying network topology [Klein, 2016]. This is important for several reasons: applications do not need to be configured with distributed computing in mind for portability. Moreover, with REST API-like communication, applications can be written in any language; service-oriented architectures tend to use multiple application frameworks and languages. While Envoy is typically a service-to-service communication system, PUMA uses Envoy as a front edge proxy. This brings the same benefits of observability, management, identical service discovery, load balancing, and scaling at first interaction with the workload. This interaction with the workload at the edge gives Envoy a distinct scaling advantage: there is little delay in recognizing when more computational resources are necessary and no spin-up time to expand the mesh, because the future resources are configured at start up. Lastly, the transparent Envoy mesh means scaling and library upgrades can be deployed across the infrastructure efficiently and quickly. This distributed mesh is composed of a variety of commodity hardware.
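To make this transparency concrete, the following is a minimal sketch of the request path from the application's side, assuming the requests package is available. The /sentiment route and port 8080 are taken from the paper's example configuration (Figure 4 and Section 3.2); the payload fields are illustrative assumptions, not PUMA's actual wire format.

import requests

# The application issues a plain HTTP POST to localhost; Envoy
# transparently proxies and load balances it across the mesh.
def call_sentiment_service(utterance: str) -> dict:
    resp = requests.post(
        "http://localhost:8080/sentiment",
        json={"text": utterance},
        timeout=5,  # real time dialog systems need bounded latency
    )
    resp.raise_for_status()
    return resp.json()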

Envoy relies on commodity hardware for the execution of its services. The Envoy master thread controls coordination, while worker threads accept connections and are bound to a set of resources. These threads run natively so that the architectural components stay out of the way and are optimized for the highest hardware performance.

Service expansion creates a potential bottleneck at database operations. Even for services that do not directly access the database, commodity nodes will eventually need information that resides elsewhere. This tail latency is dangerous, especially when sharing a structure such as a database. Because conscious structural decisions are needed here, we recommend using a database like DynamoDB, which is optimized for Envoy and has all of the advantages previously discussed.

The details of PUMA's configuration methods will demonstrate the ease of migrating an existing application to this system architecture.

3 Design
The design of PUMA was inspired by the Lambda service architecture used in the Alexa Prize. Novel dialog and multi-modal systems [Yu et al., 2015][Thomason et al., 2016] cannot easily be built on serverless platforms, and future concerns such as load balancing are often neglected in favor of feature development and integration. PUMA inherits many advantages of serverless applications through Envoy, while maintaining the flexibility and exclusiveness of monolithic cloud environments (IaaS/PaaS).

Figure 2: Serverless Model. Applications must be written for existing serverless platforms like AWS Lambda. Applications in serverless platforms can easily be scaled and load balanced while the cost of operation is less than traditional server hosts like EC2.

3.1 Serverless Design
Serverless computing denotes a special software architecture where application logic is executed in an environment without visible processes, servers, virtual machines, or operating systems [Stigler, 2018]. While these operating system and architectural principles still exist in the underlying hardware and virtualization, they are abstracted away so the developer can focus solely on writing deployable code. Service providers are responsible for efficiently serving all requests from their clients and do not have to run a permanent workload for a specific client. The application developer does not need to worry about scalability and load balancing, and the service provider can utilize their infrastructure more efficiently; a true win-win.

Serverless architecture has several advantages over traditional PaaS providers. Many PaaS providers do not guarantee automatic scaling, so the application developer has to scale the application manually, potentially still without meeting dynamic needs. PUMA solves this scaling issue within PaaS systems by leveraging Envoy. Moreover, PUMA maintains the performance and abstractions serverless architectures provide without any of the drawbacks.

Serverless architectures have some fundamental drawbacks. Long running applications or services with tail latency are often more expensive than a dedicated server or virtual machine [Baldini et al., 2017]. Serverless code is written for the platform it runs on, like AWS Lambda or Azure Functions, which introduces a learning curve and vendor lock-in. Additionally, when the application is completely dependent on a third-party provider, there is less control over the application, and changing the provider leads to significant in-application changes. As serverless costs change, it might be more affordable to revert the code base to other providers, a difficult and expensive procedure. In contrast, PUMA runs on any PaaS platform and is completely portable because it runs in a Docker container. When running applications in a serverless environment, service providers might run software from several different customers on the same physical server to utilize their resources more efficiently [Baldini et al., 2017]. Not only is the application not guaranteed to scale maximally according to its workload, but other customer applications and provider infrastructure bugs can lead to security vulnerabilities. PUMA can run in any PaaS or monolithic server and scale without restrictions as long as the underlying hardware is available. In practice, serverless platforms suffer from a cold start problem in which there is a delay to initialize internal resources when the application is offline or there have not been requests to the function for a while [Baldini et al., 2017]. For dialog systems and real time systems, delays within application functions are unacceptable [Patil et al., 2017]. In contrast, PUMA always runs a single master thread and spawns worker threads as needed, so response time is always instant. Lastly, FaaS services like AWS Lambda do not provide out-of-the-box tools to test functions locally; this incurs a cost for testing the application on AWS Lambda [Baldini et al., 2017]. Due to PUMA's design to run in a Docker container, testing and extensive logging can be done anywhere and do not incur additional costs.

PUMA avoids the aforementioned disadvantages of serverless platforms while providing most of the benefits: scaling and load balancing capabilities, micro-service design, and abstractions that allow developers to focus on code; additionally, it requires little modification to existing monolithic applications. This is accomplished by exposing key Envoy service endpoints.

3.2 Envoy
Envoy is an open-source L7 proxy and communication bus designed for modern service architectures, with the principle of accessible but transparent network resources. PUMA deploys Envoy as a service-to-service proxy with a front proxy to allow communication with the internet from specialized ports. We use these ports to communicate with PUMA from within the web application.

Envoy runs with a single master thread that coordinates a number of worker threads. These worker threads listen, filter, and forward connections. Connections through the Envoy proxy are bound to a single worker thread, allowing connections to be embarrassingly parallel with little effort necessary for coordination between worker threads. Envoy is 100% non-blocking, and the number of worker threads is typically configured to the number of hardware threads available on the machine.

Figure 3: PUMA Configured Envoy Mesh. PUMA-defined services at the PUMA library are configured in the Envoy front proxy mesh network. These services use Envoy listeners defined in the application using the PUMA library. Workers operate through a PUMA Flask application that maintains application-defined service logic.

When a connection is received on a listener, or worker thread, it can perform a variety of different proxy tasks: rate limiting, TLS client authentication, HTTP connection management, filtering, and forwarding.

Each Envoy deployment has an HTTP connection manager responsible for HTTP multiplexing. PUMA interfaces with Envoy through the HTTP connection manager on port 8080 for routes statically established in the configuration. As PUMA becomes more robust, we hope to dynamically establish routes on running deployments using Envoy's RDS API (Route Discovery Service), an optional API that Envoy calls to fetch route configurations dynamically. PUMA also leverages Envoy's HTTP routing, which efficiently routes incoming HTTP requests to upstream clusters, acquires a connection pool to a host in the upstream cluster, and forwards the request. In benchmarking PUMA and Envoy, the front proxy acquired connection pools for up to 250 cluster nodes. Furthermore, as PUMA becomes more feature rich, we hope to support priority based routing; while the current routing is sufficient for our real time system, other applications with more significant demands might benefit from priority routing. Exposing simple load balancing options is also in PUMA's future.

Currently PUMA relies on Envoy's simple round robin load balancing policy. This means each service defined in the PUMA lib gets an equal distribution of computational resources. With multi-modal systems of varying importance and impact on real time performance, round robin might be less than ideal. With PUMA we hope to integrate additional load balancer options: weighted least request, ring hash, Maglev, random, and original destination. Circuit breaking is another critical component of distributed systems that PUMA leverages through Envoy, although we do not plan to expose any configurable options for it. In contrast, global rate limiting configurations are available in PUMA, because various applications have vastly different use cases.


{
  "load_balancer": "round_robin",
  "logging": {
    "filter": "{...}",
    "config": "{...}",
    "name": "..."
  },
  "rate_limit": {
    "cluster_name": "..."
  },
  "dynamodb": {
    "config": "{...}"
  },
  "puma_services": {
    "service3": "/face",
    "service2": "/haptic",
    "service1": "/policy",
    "service5": "/db",
    "service4": "/sentiment"
  },
  "http_connection_manager": {
    "http_filters": [],
    "drain_timeout": "{...}",
    "rds": "{...}",
    "tracing": "{...}",
    "idle_timeout": "{...}",
    "route_config": "{...}"
  }
}

Figure 4: Example PUMA Core Configuration JSON. This configuration is passed to puma_config to define fundamental Envoy principles. The configuration is parsed and PUMA configures Envoy according to the definitions. Note that puma_services is the list of micro-services to be established with their respective URL access points on the localhost.

For example, a large number of hosts may forward to a small number of hosts with a low average request latency. This is a common issue when increasing low latency connections to a single database server. This use case exemplifies why horizontally scalable distributed databases like DynamoDB should be configured with the PUMA library, thus enabling seamless scaling and access to a single table over hundreds of Envoy mesh instances. Lastly, PUMA will expose basic asynchronous logging functionality and formats.

3.3 PUMA and Portability
The PUMA package is distributed in two parts: the PUMA library for application configuration and service definition, and PUMA Core, which contains the modifications to Envoy to accept such configurations.

PUMA Library
The PUMA library is a Python library on PyPi3, available as early as April 2018. It provides a set of functions used to configure PUMA Core in an application. These functions include: puma_config, puma_service, and puma_start.

3 https://pypi.python.org/pypi

puma_config
puma_config is a configuration method used to set PUMA Core configurations for logging, load balancing, databases, service definitions, rate limiting, and more. The only input parameter is a JSON string or Python dictionary with PUMA configuration keys and values. For example, a configuration for PUMA Core can look like the JSON in Figure 4.
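A minimal sketch of the corresponding call from application code follows. The dictionary keys mirror the Figure 4 example; the exact call shape is an assumption inferred from the description above (a single JSON or dictionary parameter).

import puma

# Sketch: configure PUMA Core with a subset of the Figure 4 keys.
config = {
    "load_balancer": "round_robin",
    "puma_services": {
        "service1": "/policy",
        "service2": "/haptic",
        "service3": "/face",
        "service4": "/sentiment",
        "service5": "/db",
    },
}
puma.puma_config(config)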

puma_service
puma_service is a code encapsulation method that takes Python source code encoded as a string and passes it to the corresponding service. Similar to puma_config, the encoded source code is assigned to a service key and loaded into the runtime of PUMA's Envoy Flask management app on port 8080. For PUMA data benchmarking, puma_service was used to define MapReduce in the service space.

puma_start
puma_start is a wrapper that starts the PUMA Flask app and launches the Envoy mesh. puma_start should be declared in the final steps of starting the application.
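Putting the three calls together, a hypothetical port might look like the sketch below. The binding of a source string to a service key follows the description of puma_service above, but the argument order and the handler body are illustrative assumptions, not the library's confirmed API.

import puma

# Sketch: register one service and start the mesh.
config = {"puma_services": {"service4": "/sentiment"}}
puma.puma_config(config)

# puma_service takes Python source encoded as a string, assigned to a
# service key and loaded into PUMA's Flask management app.
sentiment_source = """
def handle(payload):
    text = payload.get("text", "")
    return {"positive": "good" in text}
"""
puma.puma_service("service4", sentiment_source)

# Declared last: starts the PUMA Flask app and launches the Envoy mesh.
puma.puma_start()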

PUMA Core
PUMA Core is a PUMA-configured Envoy4 front proxy. PUMA Core contains modified configuration files and a master Flask5 application. The Flask application serves to enable configurations, routes, database connections, and more. All Envoy deployments rely on a master application, and PUMA is no different; however, the PUMA Core master application also maintains configurations and PUMA service definitions. With the defined services and the encapsulated source code from the PUMA lib, PUMA Core can deploy that source code to the corresponding services and service URLs. Finally, PUMA deploys the configured Envoy front proxy in a Docker container, starts the upstream mesh, and enables worker listeners on all commodity hardware. Ideally, service definitions would be declared dynamically in the Envoy configurations; however, because of the short development time frame, we have manually configured the maximum number of service definitions to be five.

Porting With PUMA
Porting with PUMA requires minor modifications to existing applications. PUMA Core configuration JSONs are variable in length, but at a minimum require at least one service definition; this amounts to a single line of code. Another modification to a traditional application is including the puma_start method. Encoding application Python code into a list requires approximately five lines of modified code. Most applications no longer need their server infrastructure code because it exists within PUMA, so applications typically become more lightweight; this is consistent with serverless migrations. Lastly, if URLs are defined differently as micro-services than in the traditional socket implementation, they need to be replaced with the correct PUMA service endpoints. Section 4.1, Porting An Existing Application, demonstrates the simplicity of porting an existing application and how our application lost weight.

3.4 Database Optimization
Traditional database connections to legacy databases are still supported in PUMA, though they are not conducive to optimal performance. Legacy database architectures are supported but may require Docker modifications to ensure that the proper libraries are available at deployment.

4 envoyproxy.io
5 flask.pocoo.org


The PUMA migration of the dialog movie recommendation application maintained the existing PostgreSQL database, but required the psycopg2 package at deployment in order to connect to the PostgreSQL database.

Low request latency to database structures is a challenge in distributed systems. Systems where embarrassingly parallel micro-services contend for the same structures present potential hazards and impede timely responses. While aggregating database structures by service helps, it does not scale with the PUMA deployment. Ideal databases for PUMA are DynamoDB6 [DeCandia et al., 2007] or MongoDB7, because of their ability to scale horizontally and their ease of configuration in Envoy by PUMA. Other database configurations should be considered if running Envoy on alternative elastic computing platforms such as Microsoft Azure8 and Google Compute Engine9. Databases like Google's BigTable [Chang et al., 2008] or Spanner [Corbett et al., 2013] are suitable for PUMA because they are capable of handling an influx of requests as Envoy scales. Configuring any of these database solutions does impact portability, but with DynamoDB the impact is minimal.

4 Implementation
PUMA was deployed by porting an existing monolithic application onto the Envoy mesh. Additionally, to determine whether the Envoy mesh would provide speedup in computationally expensive tasks, we benchmarked PUMA with a simple MapReduce implementation.

4.1 Porting An Existing Application
Porting an existing application like our monolithic socket-based Python server was incredibly simple. With only nine lines of additional code and 13 lines of retained logic, our application shed 49 lines of code. Appendix A demonstrates how using PUMA and moving to a pseudo-serverless architecture significantly reduces the responsibility of the developer by removing system code. With PUMA, the developer can focus on the application code, knowing their application can scale and load balance accordingly.

4.2 MapReduce
A branch of our PUMA system was created for the MapReduce benchmark. The mechanism and system framework are no different from the main PUMA implementation, except that some code is modified to aid the collection of experimental data.

The PUMA MapReduce system consists of the client API library and the server Envoy service. The client API library is responsible for providing the transparent function that the application will call. The function takes the sequential data and the mapping function as arguments. The library then chunks the data into chunks whose size is the square root of the number of items given, so that the size of each chunk is similar to the number of chunks.

6 aws.amazon.com/dynamodb
7 mongodb.com
8 azure.microsoft.com
9 cloud.google.com/compute

We recognize this to be a point of optimization. The library then packages each chunk of data together with the mapping function into a discrete JSON string, which is sent to the server Envoy endpoint via a RESTful API. The library then waits for all the responses to be returned, re-combines the chunks, and returns the result to the application.
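The chunking step can be sketched as follows; the JSON envelope fields ("func", "data") are hypothetical stand-ins for PUMA's actual wire format.

import json
import math

# Split the data into chunks of size ~sqrt(n), so that chunk size and
# chunk count are similar, as described above.
def chunk_data(data: list) -> list:
    size = max(1, int(math.sqrt(len(data))))
    return [data[i:i + size] for i in range(0, len(data), size)]

# Package each chunk, together with the mapping function's source,
# into a discrete JSON string ready to POST to the Envoy endpoint.
def package_chunks(chunks: list, func_source: str) -> list:
    return [json.dumps({"func": func_source, "data": c}) for c in chunks]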

The server endpoint of PUMA utilizes Envoy's port services and load balancing. RESTful requests from the PUMA client are received using the Python Flask library, which maps the requested URL endpoint to its respective service. Envoy routes each incoming request in a round robin fashion to a fixed number of service instances, which process the request and return the results as a JSON RESTful API response to the original sender.

For our MapReduce implementation, the mapping function is converted into JSON by obtaining the string representation of its source using the built-in inspect module's getsourcelines()10. The data is sent as a JSON array. On the server, the mapping function is converted to a local object using the exec() function, and is then called with the data parameters using eval()11. In future work, we intend to streamline this process and migrate away from using eval() due to its inherent security vulnerability, but for this paper and the simplicity of our implementation, we include the client and the server connection in our TCB.
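A sketch of the server side of this mechanism is shown below. The route ("/map") and payload fields ("func", "name", "data") are assumptions for illustration; as noted above, eval() is a known security risk and is tolerated here only because the client and server sit inside the TCB.

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/map", methods=["POST"])
def map_service():
    payload = request.get_json()
    namespace = {}
    # exec() rebuilds the mapping function from its transmitted source.
    exec(payload["func"], namespace)
    # eval() applies the function, looked up by name, to each item.
    results = [
        eval("{}({!r})".format(payload["name"], x), namespace)
        for x in payload["data"]
    ]
    return jsonify({"results": results})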

Figure 5: PUMA API Service. This figure describes the system diagram of the PUMA MapReduce API. Applications call a function provided by PUMA's API library, for instance puma_map(), where the API library chunks the data array into multiple smaller arrays. The data and mapping function are sent as JSON via a RESTful API to the target PUMA-Envoy service, where Envoy load balances the incoming requests onto respective compute instances. Once an instance has processed its data chunk, the result is sent back to the client's API library via JSON. The API library then rejoins the chunked data arrays and returns the result to the application.

5 Evaluation
A major goal of the PUMA system is to provide standalone applications transparent access to a distributed system framework. In order to measure the effectiveness of the PUMA system, we gauged the system on three categories: ease of adopting the PUMA library, throughput, and latency.

10 https://docs.python.org/3.5/library/inspect.html
11 https://docs.python.org/3.5/library/functions.html



5.1 Ease of Adoption
We qualitatively recorded the effort required to adapt a task or application to a distributed system using PUMA. Our points of consideration include the transparency of using the PUMA system, the lines of code that need to be changed, and the number of factors the application developer must consider when porting their apps onto a distributed system.

5.2 Throughput and Latency
We benchmarked the PUMA system on a MapReduce problem. MapReduce is a common problem both for standalone applications, such as using a for loop over a list of data, and for distributed system applications, for processing a large dataset on server clusters [Dean and Ghemawat, 2008]. A MapReduce problem also gauges the throughput and latency of our system via the size of the data being mapped and the complexity of the mapping function, respectively.

Benchmarking System
We created our distributed system on an AWS EC2 m5.24xlarge instance with 96 virtual CPUs and 384GB of memory [Amazon Web Services, 2018]. The distributed system is created by Envoy via docker-compose. We scaled our distributed system to 250 instances for our benchmarking service using Envoy, where each instance is simulated using child processes behind the scenes.

We ran two versions of the trial with different clients. The first trial was run with a 2016 15" MacBook Pro with an Intel Core i7 Skylake at 2.6GHz and 16GB of memory [Apple, 2017]. This simulates real-world usage of PUMA, where the developer of the standalone application is most likely running their application on their own computer. The second trial was run directly on the EC2 instance. By running the client directly on the machine, we eliminate the latency of going through the Wide Area Network from our results. The performance impact on Envoy and the distributed system is negligible due to the amount of resources on the EC2 instance.

Experimental Setup
We varied the data size of the mapping problem to benchmark the throughput of the system. We created MapReduce problems of 100 to 1,000,000 items, with step sizes of a factor of 10. Each item consists of the integer 100, each to be mapped by the factorial function. Each step size is run five times to reduce statistical variation.

We varied the integer to be mapped to benchmark the latency of the system. We vary the input integer for the factorial function, using 100, 500, 1,000, and 1,500. Each trial is run with a data size of 100,000. Each input integer trial is run five times to reduce statistical variation.

The total latency of the system is measured by recording the time taken from the creation of the data array by the client until the client has collected all results from PUMA and returned the sum of all the mapped factorials.
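A sketch of this measurement is given below; puma_map usage and the iterative factorial facl follow Figure 6, while run_trial is a hypothetical driver name.

import time
import puma

def facl(n: int) -> int:
    total = n
    for i in range(n - 1, 0, -1):
        total *= i
    return total

# The clock starts at data-array creation and stops once all mapped
# factorials have been collected and summed.
async def run_trial(data_size: int, factorial_input: int) -> float:
    start = time.perf_counter()
    data = [factorial_input] * data_size  # e.g., 100,000 copies of 100
    results = await puma.puma_map(facl, data)
    _ = sum(results)  # the sum of all the mapped factorials
    return time.perf_counter() - start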

import asyncio
import puma

async def app_function(
    data: list
) -> list:

    def facl(n):
        total = n
        for i in range(n - 1, 0, -1):
            total *= i
        return total

    result = await puma.puma_map(
        facl, data
    )
    return result

Figure 6: Parallelizing Data Operations with PUMA. PUMA is more than a service definition and configuration tool. PUMA can support computational tasks that are naturally advantageous to execute on a distributed system. With PUMA, a simple map-reduce function can leverage the entire Envoy mesh. Data operations are balanced and scaled concurrently with other services.

6 Results
We collected data according to our evaluation metrics to gauge the performance and effectiveness of PUMA.

6.1 Ease of Adoption
We measured PUMA's ease of adoption according to its transparency, the lines of code that need to be changed, and its abstraction of the factors that need to be considered when porting applications to distributed systems.

Transparency
PUMA enables transparent usage of the distributed system by abstracting communication with the servers, chunking of the data and functions, and gathering of server responses away from the API user, as seen in Figure 5 and Appendix A. Application services only have to wait for the PUMA library to return the results, as the data must go through the network stack. Figure 6 shows that the PUMA system uses asyncio, a modern built-in Python library that allows asynchronous tasks to be written like synchronous code12. The await keyword ensures code will only continue executing when all tasks within puma_map() are completed. The API is also designed to be as similar to its non-PUMA counterpart as possible. Application services may simply substitute map() from the Python built-in library with puma.puma_map(), and provide the correct configuration to PUMA, such as the address of the distributed system, and PUMA will transparently run the map function on the distributed system.

Lines of Code
Appendix A demonstrates that fewer than 9 additional lines of code need to be modified to migrate application logic to a micro-service architecture. Figure 6 shows that custom computational operations such as map-reduce require few modified lines of code to become a distributed system task.

12 https://docs.python.org/3.5/library/asyncio.html


A standard map(function, iterable, ...) function in Python requires a function type and an iterable type. Our puma_map(function, data) implementation of map has a very similar function signature. This means API users only have to change the function signature in order to use PUMA and the distributed Envoy mesh.
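As a sketch, the substitution amounts to changing one call site, assuming the same facl and data as in Figure 6:

import asyncio
import puma

def facl(n: int) -> int:
    total = n
    for i in range(n - 1, 0, -1):
        total *= i
    return total

data = [100] * 1000

# Standalone version: runs sequentially on the local machine.
sequential = list(map(facl, data))

# PUMA version: same signature shape, but the mapping is distributed
# across the Envoy mesh (must run inside an async context).
async def distributed() -> list:
    return await puma.puma_map(facl, data)

parallel = asyncio.get_event_loop().run_until_complete(distributed())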

Factors for Consideration
Migrating applications to a distributed environment, or converting existing applications to software as a service (SaaS), requires consideration of multiple factors in the application architecture. For instance, while most applications tend to be monolithic, SaaS such as our dialog system needs to be created as micro-services. Traditional applications assume data comes from a structure that has a complete view of the data, while SaaS runs on distributed systems where databases are stored fragmented across multiple clusters. In order to take advantage of a distributed system, SaaS developers must rethink their application architecture and distribute their application piece by piece as components [Golding, 2018]. SaaS developers must also understand the restraints of the respective distributed system and choose the application components carefully to avoid bottlenecks.

Our PUMA system abstracts away most of these considerations behind the API, allowing the application developer to convert components of their applications with ease. For instance, an application developer converting a map function can simply call puma_map() instead. The developer only needs to recognize components of their application that can be distributed, and the library manages the distribution and integration with the distributed system service. Load balancing and data consistency are handled by PUMA as well: PUMA manages load balancing via Envoy and data consistency by combining the returned data before passing the result to the user.

6.2 Throughput and Latency
We measured the performance of PUMA quantitatively using MapReduce with varying data size and input integer for the factorial function.

Varying Data Size
From our results, PUMA demonstrates computational speedup once a certain data size is reached. As seen in Figure 7a, the total latency graph for PUMA intersects the graph for not using PUMA between 10,000 and 100,000 items when WAN overhead is present, and between 100 and 1,000 items when WAN overhead is not present. This means that beyond the respective data size threshold, it is more efficient to run the MapReduce problem using PUMA on the distributed system than on the local machine. This is reasonable, since using distributed systems allows the MapReduce task to be run in parallel across a large number of computing instances.

The difference in the threshold where PUMA with a local client performs better than PUMA with a client over the WAN illustrates that the bottleneck of the system is the HTTP requests, as demonstrated in Figure 8. Since data is chunked into sqrt(datasize) arrays of similar size, the number of HTTP requests sent to the Envoy services grows exponentially.

Both blue lines for MapReduce without PUMA are strictly exponential, meaning the MapReduce algorithm has not introduced noise into the results. The different x-intercepts of the two lines illustrate the latency due to the difference in computing power between MapReduce running on the EC2 instance and running on a laptop.

Varying Factorial Integer
With the data size kept consistent, PUMA's speedup threshold is more prominent, as seen in Figure 7b. The PUMA systems, i.e., the orange lines, with and without WAN overhead are seen to perform better than without PUMA between factorials of 100 and 500. This is reasonable, since applications without PUMA have to run each computation one after another, whereas PUMA can run the mapping in parallel. The overhead of HTTP requests is also constant, since the number of HTTP requests made is constant.

The orange lines are seen to converge at a factorial of 1,000. This means that at high enough mapping complexity, the overhead of going through the wide area network becomes negligible relative to the latency of the actual data processing.

Both blue lines for MapReduce without PUMA are vertical transpositions of one another, illustrating that the MapReduce algorithm yields a predictable increase in total latency when the mapping complexity is increased. It also illustrates that differences in client computing power have a consistent effect at any mapping complexity.

WAN Effects
The results from varying MapReduce's data size and factorial integer have shown an overhead associated with HTTP requests. We plot the differences in total latency between trials with and without PUMA for both experimental parameters in Figure 8. These graphs demonstrate the change in network overhead with the change in data size and factorial integer.

As seen in Figure 8a, the difference in total latency increases exponentially as data size increases. Since the only difference between the two PUMA systems is the use of the wide area network, the observed difference in latency is attributed to network latency and its variability during the trials. This also explains the high fluctuation of the data. The difference in latency increases exponentially as the data size increases exponentially. This poses a potential problem for scalability as the data size grows even larger. However, since the issue lies with the number of network requests, optimizations that reduce network requests or intelligently balance the sending of resources should mitigate the issue. Having a dedicated connection between the client and PUMA, such as using a client within a company network, would also reduce the overhead.

Figure 8b further illustrates the effect of network overhead, and shows that the complexity of the mapped function does not increase the latency of the MapReduce task. The difference in total latency decreases slightly as the factorial integer increases. This means the difference in total latency is mostly independent of the complexity of the mapped function. The slight decrease can be attributed to the fact that at large mapped function complexity, the time difference becomes less significant against the total time needed to complete the algorithm.



(a) with Varying Data Size (b) with Varying Factorial Integer

Figure 7: Total Latency of Performing MapReduce with an Iterative Factorial Function. Both graphs show the average total latency in logarithmic scale for the trials with varying data sizes in Figure 7a and with varying factorial input integer in Figure 7b. The two blue lines represent the trials where MapReduce is run solely on the client machine. The two orange lines represent the trials where the algorithm is run using PUMA, i.e., where the mapping of individual data is done using the distributed system. The lines with triangles represent the trials where the client machine is part of the EC2 instance; thus there is no WAN overhead in the client-server communication. Exponential trend-lines are given for the orange lines. The lines with circles represent the trials where the client machine is the MacBook Pro; thus each request to PUMA must be sent via the WAN to reach the distributed system.

In addition to calculating the difference between using PUMA with different clients, we calculated the difference between running MapReduce on the EC2 instance and on a laptop. The green lines represent the difference in total latency due to the difference in computing power. Varying the data size yields a standard exponential graph, since the amount of data that needs to be processed increases exponentially. Varying the factorial integer yields a linear graph, since the amount of data to be processed is constant, but the complexity of each mapping increases.

7 Related Work
Our project involves multiple technologies within the area of distributed systems architecture. PUMA requires a distributed system framework to manage system resources. It also requires a data distribution mechanism to break apart computations on large datasets. Since PUMA's goal is to distribute standalone applications with minimal re-engineering, it needs to adapt the applications using interfaces or encapsulation. Below are some of the research efforts and tools in the community that can provide PUMA with ways to fulfill those requirements.

7.1 Distributed System Frameworks
A core part of a distributed system is the distributed system manager. Software such as Envoy and Amazon Elastic Load Balancer manages system resources over the network and distributes workload accordingly. It also serves as the interface between the application and the network, where common I/O such as database access is provided by the manager.

AWS Elastic Load Balancing is a scalable distributed system framework for the AWS cloud service. It manages incoming traffic either at the request level or the network level, then routes traffic to targeted network clients that are running identical or different micro-services [Amazon Web Services, 2018]. ELB balances incoming traffic within the network so that no one system is overloaded. This load balancer could provide a framework for managing our distributed system. However, our project does not focus on distributing incoming tasks within a network, but instead on distributing internal processes across clients for optimal computing.

Envoy is an enterprise-sized distributed system framework that runs and manages services across a distributed system [Klein, 2016]. It enables network distribution transparency by providing a communication mesh for applications over the network. While it is feature-rich and comprehensive, our project does not focus on managing or load balancing between services, but on how we can distribute workload at the application level using a load supervisor. However, its design for abstracting the network can be used to provide network distribution at the application level.

Chubby is a coarse-grained lock service designed to scale applications across networked machine clusters [Burrows, 2006]. For example, the Google File System uses Chubby to appoint master servers and as the root of its distributed data structures [Burrows, 2006].


(a) with Varying Data Size (b) with Varying Factorial Integer

Figure 8: Total Latency Penalty from WAN Overhead. Both graphs show the difference in total latency between trials with the client running on the EC2 instance and running on the local MacBook Pro, in logarithmic scale. Figure 8a shows the results for trials with varying data sizes and Figure 8b shows the results for trials with varying factorial input integers. Green lines show the difference in latency without PUMA, i.e., the difference in latency due to the computing power of the two clients. The orange lines show the difference in latency for the PUMA system, i.e., the latency due to the overhead of sending requests over the wide area network.

By using a lock-based paradigm, Chubby abstracts the management of distributed consensus and allows applications to be implemented and scaled with relatively low effort. However, Chubby focuses on reliability and deprioritizes performance, which dialog system micro-services require to achieve low interaction latency.

Zookeeper is distributed database management software designed by Yahoo for Hadoop and eventually integrated with Apache [Hunt et al., 2010]. Similar to Chubby in principle, Zookeeper provides an intuitive interface using locks, registers, and group messaging to the underlying distributed file system. It is a useful and powerful tool at the data level, which our framework can utilize for managing data concurrency over the network. Apache and Hadoop, of which Zookeeper is currently part, can be used as a foundation for managing data distribution in our framework.

7.2 Distributed Data Computation
Distributing partial data for machine learning across a distributed domain has been proposed for image processing tasks [Alonso-Calvo et al., 2010]. Images are divided into sub-images that can be stored and processed separately, and the system manager can distribute the processing of each sub-image over the network. One of PUMA's responsibilities is to allow applications that work on large continuous datasets to be distributed across the network. It must be able to manage the division of the data and distribute the task over the network. PUMA is also responsible for aggregating the results of the completed distributed task once all sub-images are processed. Unlike Google's TensorFlow [Abadi et al., 2016], PUMA optimizes both request and computational resources for current usage needs and is not limited to rigid and specific training optimizations that are leveraged exclusively on Google clusters.

7.3 Existing Software Adaptation
A major focus in distributed systems is the adaptation of legacy software into the distributed network. Most legacy software is written with outdated technology and is not portable to modern network-centric architectures. One method of deploying legacy software components into modern distributed systems is through encapsulation [Sneed, 2000]. A wrapper is used to encapsulate the legacy software and provide a bridge between the modern architecture and the software components. Wrapping technology provides modern applications access to legacy components, such as wrapping legacy databases to provide object-oriented semantics to modern software. This concept of wrapping can be adapted to our framework, either by providing bidirectional interfaces using the wrapper, or by using the wrapper on the distributed network so that the application can see the network as standalone, non-distributed hardware.

8 Conclusion

We have described PUMA and its ability to provide distributed computing environments for traditional monolithic applications. PUMA gives applications serverless advantages such as scaling, load balancing, and freedom from infrastructure provisioning and management, all without the disadvantages of vendor lock-in, reduced security, and cold starts. We have shown that porting applications to PUMA is flexible enough for most use cases and potentially suitable for wide adoption. Moreover, we demonstrated that PUMA outperforms traditional architectures on computationally heavy tasks by leveraging the Envoy mesh. With applications becoming more complex and dependent on simultaneous high-performing services, it becomes increasingly important to delegate resources efficiently. PUMA accomplishes this task and is straightforward to adopt.

References

[Abadi et al., 2016] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Gregory S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian J. Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Gordon Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul A. Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda B. Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR, abs/1603.04467, 2016.

[Alonso-Calvo et al., 2010] Raul Alonso-Calvo, Jose Crespo, Miguel García-Remesal, Alberto Anguita, and Victor Maojo. On distributing load in cloud computing: A real application for very-large image datasets. Procedia Computer Science, 1(1):2669–2677, 2010. ICCS 2010.

[Amazon Web Services, 2018] Amazon Web Services, Inc. What is Elastic Load Balancing?, 2018.

[Apple, 2017] Apple, Inc. Apple, Sep 2017.

[Armbrust et al., 2010] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. A view of cloud computing. Communications of the ACM, 53(4):50–58, April 2010.

[Baldini et al., 2017] Ioana Baldini, Paul Castro, Kerry Chang, Perry Cheng, Stephen Fink, Vatche Ishakian, Nick Mitchell, Vinod Muthusamy, Rodric Rabbah, Aleksander Slominski, et al. Serverless computing: Current trends and open problems. In Research Advances in Cloud Computing, pages 1–20. Springer, 2017.

[Burrows, 2006] Mike Burrows. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, OSDI '06, pages 335–350, Berkeley, CA, USA, 2006. USENIX Association.

[Chang et al., 2008] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):4, 2008.

[Corbett et al., 2013] James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, Jeffrey John Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, et al. Spanner: Google's globally distributed database. ACM Transactions on Computer Systems (TOCS), 31(3):8, 2013.

[Dean and Ghemawat, 2008] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.

[Dean et al., 2012] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, and Andrew Y. Ng. Large scale distributed deep networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1223–1231. Curran Associates, Inc., 2012.

[DeCandia et al., 2007] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon's highly available key-value store. In ACM SIGOPS Operating Systems Review, volume 41, pages 205–220. ACM, 2007.

[Gemulla et al., 2011] Rainer Gemulla, Erik Nijkamp, Peter J. Haas, and Yannis Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pages 69–77, New York, NY, USA, 2011. ACM.

[Golding, 2018] Tod Golding. Migrating applications to SaaS: A minimally invasive approach, Mar 2018.

[Gustafson et al., 2000] Joakim Gustafson, Linda Bell, Jonas Beskow, Johan Boye, Rolf Carlson, Jens Edlund, Björn Granström, David House, and Mats Wirén. Adapt - a multimodal conversational dialogue system in an apartment domain. In Sixth International Conference on Spoken Language Processing, 2000.

[Henderson et al., 2013] Matthew Henderson, Blaise Thomson, and Steve Young. Deep neural network approach for the dialog state tracking challenge. In Proceedings of the SIGDIAL 2013 Conference, pages 467–471, 2013.

[Hunt et al., 2010] Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In Proceedings of the USENIX Annual Technical Conference, 2010.

[Klein, 2016] Matt Klein. Announcing Envoy: C++ L7 proxy and communication bus, Sep 2016.

[Levin et al., 1998] Esther Levin, Roberto Pieraccini, and Wieland Eckert. Using Markov decision process for learning dialogue strategies. In Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on, volume 1, pages 201–204. IEEE, 1998.

[Patil et al., 2017] Amit Patil, K. Marimuthu, R. Niranchana, et al. Comparative study of cloud platforms to develop a chatbot. International Journal of Engineering & Technology, 6(3):57–61, 2017.

[Sneed, 2000] Harry Sneed. Encapsulation of legacy software: A technique for reusing legacy software components. Annals of Software Engineering, 9:293–313, May 2000.

[Stigler, 2018] Maddie Stigler. Understanding serverless computing. In Beginning Serverless Computing, pages 1–14. Springer, 2018.

[Thomason et al., 2016] Jesse Thomason, Jivko Sinapov, Maxwell Svetlik, Peter Stone, and Raymond J. Mooney. Learning multi-modal grounded linguistic semantics by playing "I Spy". In IJCAI, pages 3477–3483, 2016.

[Villamizar et al., 2015] Mario Villamizar, Oscar Garces, Harold Castro, Mauricio Verano, Lorena Salamanca, Rubby Casallas, and Santiago Gil. Evaluating the monolithic and the microservice architecture pattern to deploy web applications in the cloud. In Computing Colombian Conference (10CCC), 2015 10th, pages 583–590. IEEE, 2015.

[Villamizar et al., 2016] Mario Villamizar, Oscar Garces, Lina Ochoa, Harold Castro, Lorena Salamanca, Mauricio Verano, Rubby Casallas, Santiago Gil, Carlos Valencia, Angee Zambrano, et al. Infrastructure cost comparison of running web applications in the cloud using AWS Lambda and monolithic and microservice architectures. In Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, pages 179–182. IEEE, 2016.

[Yu et al., 2015] Zhou Yu, Alexandros Papangelis, and Alexander Rudnicky. TickTock: A non-goal-oriented multimodal dialog system with engagement awareness. In Proceedings of the AAAI Spring Symposium, volume 100, 2015.


Appendix A: Migrating Application To Distributed Mesh With PUMA

Listing 1: Before PUMA Migration. The monolithic threading application used traditional sockets. Most migration logic occurred here, within listenToClient.

import socket
import threading
import dialogueCtrl as dCtrl
from dialogueCtrl import dialogueCtrl, initResources, dialogueIdle
import json
import sys
import time
import traceback

debug = False
passive = {}

def chatbox_socket():
    # Debug and production builds listen on different ports.
    if debug:
        return 13120
    else:
        return 13113

class ThreadingServer(object):
    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.sock.bind((self.host, self.port))

    def listen(self):
        self.sock.listen(5)
        self.sock.settimeout(None)
        while True:
            # One handler thread per connected client.
            client, address = self.sock.accept()
            client.settimeout(60)
            threading.Thread(target=self.listenToClient,
                             args=(client, address)).start()

    def listenToClient(self, client, address):
        size = 2048
        while True:
            try:
                data = client.recv(size)
                if data:
                    try:
                        print("data: {}".format(data))
                        response, user_id, passiveLen, signal = dialogueCtrl(data)
                        if response == dCtrl.end_dialogue:
                            signal = 'end'
                        responseJson = json.dumps(
                            {'response': response, 'user_id': user_id,
                             'signal': signal, 'passiveLen': passiveLen})
                        print(responseJson)
                        client.send(responseJson)
                        if signal != "listen":
                            dialogueIdle(user_id, debug)
                    except Exception:
                        _, _, exc_traceback = sys.exc_info()
                        traceback.print_tb(exc_traceback, limit=1)
                        traceback.print_exc()
                else:
                    raise socket.error('Client disconnected')
            except Exception:
                client.close()
                return False

if __name__ == "__main__":
    initResources()
    if 'debug' in sys.argv:
        debug = True
        print("Running in debug mode. (socket: {})".format(chatbox_socket()))
    while True:
        ThreadingServer('localhost', chatbox_socket()).listen()


Listing 2: After PUMA Migration. The only code retained was the application logic, which was encoded and sent to PUMA Core for deployment to service endpoints.

import dialogueCtrl as dctrl
from dialogueCtrl import initResources
import inspect
import json
import uuid
# puma_lib supplies the PUMA registration calls used below.
from puma_lib import puma_config, puma_service, puma_start

def service1():
    # `request` is assumed to be injected by the PUMA service container
    # (a Flask-style request object); it is not defined in this module.
    text = request.form['chatText']
    mode = request.form['mode']
    id = str(request.form['UUID'])
    if id == '-1':
        id = str(uuid.uuid4())
    # App-specific python container.
    response, user_id, passiveLen, signal = dctrl.dialogueCtrl(
        json.dumps({'text': text, 'mode': mode, 'id': id}))
    if signal != "listen":
        dctrl.dialogueIdle(user_id, debug=True)
    return json.dumps({'response': response, 'user_id': user_id,
                       'signal': signal, 'passiveLen': passiveLen})

# Map service names to their proxied endpoints.
conf = {}
conf['puma_services'] = {
    "service1": "/agent",
    "service2": "/listener",
}
puma_config(json.dumps(conf))

# Ship the service source to PUMA Core for deployment.
service = {
    "service1": inspect.getsourcelines(service1)
}
puma_service(json.dumps(service))

if __name__ == "__main__":
    initResources()
    puma_start()