


Cost-Effective Container Orchestration Using Usage Data

Youseok Nam
Sungkyunkwan University
Suwon, Korea
[email protected]

Hwansoo Han
Sungkyunkwan University
Suwon, Korea
[email protected]

ABSTRACT
Recent cloud IDE services provide containers as development environments to users. Since users have little knowledge of the specific tasks to run and the computing resources required in their containers, it is difficult to decide exactly how many containers to allocate to a cloud instance. Cloud services often employ a conservative management policy that keeps the number of cloud instances at a safe level and only increases the instances little by little when their services encounter resource problems. In addition, a simple container placement policy creates situations where no more containers can be allocated, even though resources are available in some cloud instances depending on their execution situations. To improve this, we place as many containers as possible on each cloud instance based on predicted container usage, which is derived from the usage data of containers on previous cloud instances. When a cloud instance has too much surplus resource, we also employ container migration to effectively manage the overall set of cloud instances. By equipping our cloud service with this intelligent management policy, we can reduce the total number of cloud instances in use and increase the cost efficiency of our cloud service by 14.7%, according to our simulation study.

CCS CONCEPTS
• Software and its engineering → Cloud computing.

KEYWORDS
Container Orchestration, Docker Container, Predicted Resource Usage

ACM Reference Format:
Youseok Nam and Hwansoo Han. 2020. Cost-Effective Container Orchestration Using Usage Data. In Proceedings of September 17-19, 2020 (SMA 2020). ACM, New York, NY, USA, Article 4, 4 pages. https://doi.org/xx.xxx/xxx_x

1 INTRODUCTION
Our cloud IDE service, goormIDE [6], provides Docker [7] containers as-is to offer a dedicated development environment to free users. Because root privileges are given to Docker containers, except for some security-related privileges [5], users can freely customize what they want to do in the given environment. This is a big advantage from the user's point of view, but a problem from the administrator's point of view. Since it is nearly impossible to know which tasks in a container will use how many resources,

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SMA 2020, Jeju, Republic of Korea
© 2020 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-6843-8/20/10.
https://doi.org/xx.xxx/xxx_x

predicting the exact number of containers to allocate on one cloud instance is a difficult goal for administrators.

Elastic operations are required for cloud instances to accommodate fluctuating needs for containers, and providing cloud instances on demand has been resolved by using Amazon Web Services (AWS) [2]. However, the more cloud instances are provided, the faster the infrastructure maintenance cost increases. To minimize the cost, we should place as many containers as possible on the minimum number of cloud instances. A cloud instance needs to accommodate as many containers as possible, but we do not know how much of its resources each container is using. Thus, we conservatively allocated no more than a certain number of containers per cloud instance and maintained that level as long as there were no problems. Otherwise, we allocated more cloud instances. Although the service was stably provided under this simple, static container operation policy, which is based on empirical operation experience, not all cloud instance resource conditions were suitable for the static policy. Surplus resource conditions occurred even when the maximum number of containers was allocated. In addition, even if containers were allocated to each cloud instance at the maximum level, we sometimes encountered situations where too many cloud instances were used for an overall small number of allocated containers. This is mostly due to the different lifetimes of containers: some containers finish earlier than others and leave their cloud instances with wasted available resources.

To solve this problem, the accumulated container usage log is used to obtain a prediction of container resource usage. If there are surplus resources on cloud instances, additional containers are allocated to those cloud instances. When the number of cloud instances is excessive, container migration is used to reduce the number of cloud instances. Through our simulation of container management, we estimate how much our service operation cost can be reduced. In our simulation, usage-based predictions and migration opportunities are obtained over the course of one week, and we apply our intelligent management policy over the next week. According to our evaluation, approximately 14.7% of the total running time of the cloud instances can be reduced.

2 CONTAINER ORCHESTRATION POLICY
The AWS instance type is fixed to m5.large [1], which has 2 vCPU cores and 8GB of memory. Up to 40 containers are allocated to one cloud instance. If the CPU utilization exceeds 95% or the available memory drops below 500MB, further container allocation is not allowed. When all cloud instances are fully allocated with containers, a new cloud instance is launched for upcoming container allocations. If multiple cloud instances are available for container allocation, one of them is selected based on the score given to each cloud instance. The score assigned to a cloud instance starts from a 10-point scale and decreases as resources are consumed. When the CPU utilization


Figure 1: Cloud instance with surplus resources

increases, the CPU score decreases. When the remaining memory becomes low, the memory score decreases. When the number of currently allocated containers increases, the container score decreases. Among the three scores assigned to a cloud instance, the lowest one is selected as the score for the cloud instance. If there are cloud instances with the same score, the requested container is allocated to the cloud instance launched most recently.
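As a minimal sketch, the scoring rule above could be implemented as follows. The paper only states that each sub-score decreases as its resource is consumed on a 10-point scale, so the linear mappings, field names, and capacity defaults below are our assumptions.

```python
def instance_score(cpu_util_pct, free_mem_mb, num_containers,
                   total_mem_mb=8192, max_containers=40):
    """Score a cloud instance: the lowest of three 10-point sub-scores
    (assumed linear mappings; only the monotonic behavior is from the paper)."""
    cpu_score = 10 * (1 - cpu_util_pct / 100)               # busier CPU -> lower score
    mem_score = 10 * (free_mem_mb / total_mem_mb)           # less free memory -> lower score
    cnt_score = 10 * (1 - num_containers / max_containers)  # more containers -> lower score
    return min(cpu_score, mem_score, cnt_score)

def pick_instance(instances):
    """Highest-scoring instance wins; ties go to the latest-launched one."""
    return max(instances, key=lambda i: (instance_score(i["cpu"], i["free_mem"],
                                                        i["containers"]),
                                         i["launched_at"]))
```

Taking the minimum of the three sub-scores makes the most constrained resource dominate the selection, which matches the conservative intent of the policy.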

If we use the conservative, fixed container allocation policy described before, we may stably provide services, but we occasionally miss chances to fully utilize the cloud instances. All cloud instances will be in different conditions in terms of resource utilization. Even when the full 40 containers are allocated, actual CPU utilization and/or memory usage may still show available capacity for more containers. Fig. 1 depicts the situation where surplus resources are available during the lifetime of a cloud instance. In order to minimize the cost, it is best to place as many containers as possible by using the surplus resources to the maximum.

However, the amount of surplus resources differs for each cloud instance, and the service administrator does not know how much resource will be used by a newly allocated container. Thus, a default management policy was required. We set a predicted amount of resource usage for the container to be newly allocated and compare it with the available resources to decide container allocation on cloud instances.
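The comparison of predicted usage against available resources can be sketched as below. The 95% CPU, 500 MB free-memory, and 40-container limits are the policy thresholds from Section 2; the instance field names are assumptions.

```python
def can_allocate(instance, predicted_cpu_pct, predicted_mem_mb):
    """Decide whether a container with the given predicted usage fits
    on this instance under the Section 2 limits (sketch)."""
    if instance["containers"] >= 40:                    # container-count limit
        return False
    if instance["cpu"] + predicted_cpu_pct > 95:        # stay under 95% CPU
        return False
    if instance["free_mem"] - predicted_mem_mb < 500:   # keep 500 MB free
        return False
    return True
```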

3 PREDICTED RESOURCE USAGE
It is unknown how many resources a container will use in the future, but this can be predicted from previously accumulated usage logs. The usage logger records CPU utilization (%) and memory usage (MB) every minute per cloud instance: which container uses CPU and memory, the amount, and the time. Since we deploy Docker containers for our service, the usage information can be obtained through the Docker Engine API [4].
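As an illustration, one per-minute sample could be derived from a single stats snapshot of the Docker Engine API's `/containers/{id}/stats` endpoint. The formula below is a simplified version of how the Docker CLI computes CPU percentage (the real `docker stats` additionally scales by the number of online CPUs); it is a sketch, not the authors' logger.

```python
def usage_from_stats(s):
    """CPU utilization (%) and memory usage (MB) from one Engine API
    stats snapshot (simplified; omits the online-CPU scaling factor)."""
    cpu_delta = (s["cpu_stats"]["cpu_usage"]["total_usage"]
                 - s["precpu_stats"]["cpu_usage"]["total_usage"])
    sys_delta = (s["cpu_stats"]["system_cpu_usage"]
                 - s["precpu_stats"]["system_cpu_usage"])
    cpu_pct = 100.0 * cpu_delta / sys_delta if sys_delta else 0.0
    mem_mb = s["memory_stats"]["usage"] / (1024 * 1024)
    return cpu_pct, mem_mb
```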


The predicted container usage was estimated by taking the average value for CPU utilization and the maximum value for memory out of the per-minute usage log records. However, it was necessary to exclude records that were far from the average, because they suddenly showed high or low usage for a short time. To filter out such glitches, the average and standard deviation were calculated over all the usage records, and the records lying outside the standard deviation from the average were filtered out. Then, only the remaining records are used to calculate the average usage for the container.
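The filtering step above can be sketched as follows, assuming "outside the standard deviation" means more than one standard deviation from the mean.

```python
import statistics

def filter_outliers(records):
    """Drop records more than one standard deviation from the mean."""
    mean = statistics.mean(records)
    sigma = statistics.pstdev(records)
    return [r for r in records if abs(r - mean) <= sigma]

def predict_usage(cpu_records, mem_records):
    """Average CPU and maximum memory over the filtered per-minute logs,
    as the prediction rule above prescribes."""
    return (statistics.mean(filter_outliers(cpu_records)),
            max(filter_outliers(mem_records)))
```

With a short burst of 100% CPU in otherwise idle logs, the burst sample falls outside one sigma and is discarded, so the prediction reflects the steady-state usage.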

For containers created by the same user for the same programming IDE, we can use the resource usage of the previous containers as the prediction for the containers to allocate. However, for containers created for new users, we do not have any previous usage data. For these containers, a categorical prediction is used: the average usage of other containers in the same category serves as the inferred usage for the new container. To group containers into the same category, the operating system and the software stack used in those containers are useful indicators. If the indicators are too specific, the log data for a category is too small to represent the category. In our cloud IDE service, programming languages used by the users, such as C/C++ and Java, are popular and have a lot of log data. Thus, the programming language in the IDE service was selected as the indicator for grouping categories.
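The two-tier lookup above might look like the following sketch: per-user history when it exists, otherwise the language-category average. The numbers in the table are purely illustrative, not measured values from the paper.

```python
CATEGORY_AVG = {            # language -> (avg CPU %, max memory MB); illustrative
    "java": (1.8, 420),
    "cpp": (0.9, 180),
    "python": (0.5, 150),
}

def predicted_for(user_history, language):
    """Prediction from a user's previous containers if any, else the
    category average for the container's programming language."""
    if user_history:                        # list of (cpu_pct, mem_mb) samples
        cpus, mems = zip(*user_history)
        return sum(cpus) / len(cpus), max(mems)
    return CATEGORY_AVG[language]           # new user: categorical prediction
```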

4 CONTAINER MIGRATION
Since the predicted usage amount is literally a prediction, a container can actually use more resources. Moreover, there may be cases where a large number of cloud instances have surplus resources at certain points in time, since containers have different lifetimes. In this case, it would be beneficial to merge the containers of multiple cloud instances into one cloud instance and release the rest, rather than to wait for new container allocations. Placing running containers on another cloud instance requires a container migration facility [3]. Since container migration requires moving both storage and container internal memory, its cost is somewhat expensive. Therefore, migration should proceed only when the containers of multiple cloud instances can be combined into one without causing resource problems.
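A minimal consolidation test along these lines is sketched below for the pairwise case. Reusing the Section 2 thresholds as the "no resource problems" criterion, and the instance field names, are our assumptions.

```python
def can_merge(a, b, total_mem_mb=8192):
    """True if all containers of instance b would fit onto instance a
    without breaching the allocation limits (pairwise sketch)."""
    if a["containers"] + b["containers"] > 40:   # container-count limit
        return False
    if a["cpu"] + b["cpu"] > 95:                 # combined CPU stays under 95%
        return False
    used = (total_mem_mb - a["free_mem"]) + (total_mem_mb - b["free_mem"])
    return total_mem_mb - used >= 500            # 500 MB must remain free
```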

5 EVALUATION
Since there are many containers and the logs for containers are huge, we cannot process all the logs at once. Therefore, we calculate the predicted resource amounts based on the logs accumulated during the past week, and apply them in the next week to simulate how much difference the two container management policies make in the cost of operating the service: the static, fixed container operation policy versus the dynamic container operation policy with usage prediction and container migration. Minimizing cost is achieved by allocating containers efficiently and reducing the total lease time of all cloud instances as much as possible. To do this, we need to obtain the lifetime of each container and the per-minute instance resource usage for a week, and then a simulation is performed to investigate whether containers can be placed more efficiently at the beginning of deployment. In addition, it is necessary to decide whether migration can be performed based on the instance resource

Figure 2: Predicted CPU utilization per programming language

Figure 3: Predicted memory usage per programming language

usage and the number of allocated containers from the per-minute log data. Thus, the lease times of containers, the resource usage of cloud instances, and the lifetimes of instances are required at the granularity of a minute for our simulation study.

Our log-based simulation strategy is as follows. Basically, we advance the simulation time in one-minute units. First, the containers that exist during the first minute are placed on their original cloud instances. Second, we check whether migration is necessary. Third, when there is a container to deploy, based on the container lifetimes during the next minute, we use its predicted resource amounts and try to place it on an instance with surplus resources. If no available cloud instance exists, it is placed on its original cloud instance. Fourth, when there is a cloud instance to launch during the minute, we check whether all currently running cloud instances can accept containers. We launch the new cloud instance only when the current instances cannot accept any more containers. Finally, we repeat the above simulation process every minute for a week. If migration occurs during the simulation and the cloud instance that should originally have been running has no more containers to run, we reduce the lease time of that cloud instance. If this cloud instance needs to be launched to accept containers later, the reduced lease time for that cloud instance is counted only up to the point where the cloud instance is re-launched.
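The per-minute replay above can be sketched in a simplified, self-contained form. Containers are (start, end, cpu_pct, mem_mb) tuples built from predicted usage; the capacities follow the Section 2 policy; the migration step and the first-fit placement rule are simplifications of the strategy described, not the authors' implementation.

```python
def simulate(containers, cpu_cap=95, mem_cap=8192 - 500, max_per=40):
    """Replay container lifetimes minute by minute and return the total
    instance lease minutes (sketch; migration pass omitted for brevity)."""
    horizon = max(end for _, end, _, _ in containers)
    instances = []                    # each instance is a list of live containers
    lease_minutes = 0
    for t in range(horizon):
        for inst in instances:        # drop containers that have finished
            inst[:] = [c for c in inst if c[1] > t]
        instances = [i for i in instances if i]   # release empty instances
        for c in (c for c in containers if c[0] == t):
            for inst in instances:    # first-fit by predicted CPU and memory
                if (len(inst) < max_per
                        and sum(x[2] for x in inst) + c[2] <= cpu_cap
                        and sum(x[3] for x in inst) + c[3] <= mem_cap):
                    inst.append(c)
                    break
            else:
                instances.append([c]) # nothing fits: launch a new instance
        lease_minutes += len(instances)
    return lease_minutes
```

Two low-usage containers share one instance for their whole lifetime, while a predicted CPU conflict forces a second instance and doubles the lease time; the dynamic policy saves cost exactly by avoiding such unnecessary launches.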


Table 1: Statistics in simulation study

                    Prediction   Simulation
Duration            7 days       7 days
# Logs              1,419,305    1,328,331
# Containers        4,271        4,468
# Cloud instances   7            8

With this simulation strategy, we performed our study on the per-minute usage log data for two weeks, from July 25, 2020 to August 7, 2020. Table 1 shows the statistics for the simulation study. During the first week, we predicted resource usage for containers. This part was done with 1,419,305 logs accumulated from 4,271 containers and 7 cloud instances. Our prediction of resource usage was performed with groups of containers, using the programming languages in the IDE service as the category indicator. In our study, we distinguish six different programming languages: Java, C++, Go, Node.js, PHP, and Python. The predicted resource amounts for each category of containers vary from 0.007% to 30% for CPU utilization, and from 12MB to 1GB for memory usage. In Fig. 2 and Fig. 3, we present the average predicted resource amounts for the six programming languages. IDE containers for Java were the most resource-demanding in CPU utilization and memory usage among the six categories.

The simulation study continued with the predicted resource amounts, applying our container management policy during the next week, from August 1, 2020 to August 7, 2020. The simulation process was based on 1,328,331 logs accumulated from 4,468 containers and 8 instances. Since 1,190 of the 4,468 containers were created for the same user profile, we used the predicted resource usage from their previous containers. The remaining 3,278 containers were created for new users; these containers used the predicted resource usage based on the programming language category data. As a result, 7,019 minutes were reduced from the total lease time of cloud instances, which is about 14.7% of the original total lease time of 47,760 minutes.

6 CONCLUSIONS
We simulated how much cost could be saved when we efficiently use the surplus resources wasted by the fixed container operation policy and efficiently deploy containers with container migration. Based on the usage data collected during one week, we applied our container management policy for the next week. As a result, a total of 7,019 minutes was saved out of 47,760 minutes, which is a 14.7% reduction in the lease time of cloud instances for our cloud IDE service.

ACKNOWLEDGMENTS
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01616, Research and Development of Container Management and Source Code Analysis Technology to Strengthen the Competitiveness of the Cloud IDE Service).

REFERENCES
[1] Amazon. 2017. Introducing Amazon EC2 M5 Instances. https://aws.amazon.com/about-aws/whats-new/2017/11/introducing-amazon-ec2-m5-instances/
[2] Jeff Barr. 2006. Amazon EC2. https://aws.amazon.com/ec2
[3] Tycho blog. 2014. Live Migration of Linux Containers. https://tycho.ws/blog/2014/09/container-migration.html
[4] Docker. 2013. Develop with Docker Engine API. https://docs.docker.com/engine/api/
[5] Docker. 2013. Docker Security. https://docs.docker.com/engine/security/security/
[6] goorm. 2013. Anyone can develop. https://www.goorm.io
[7] Solomon Hykes. 2013. Docker. https://www.docker.com/