ARCHITECTING TENANT BASED QOS IN MULTI-TENANT CLOUD PLATFORMS

- Arun prasath S

Table of Contents

Introduction
Problem statement
Solution overview
Goals
Typical implementation in production environment
Linux Containers
Components of LXC
Control group (cgroup)
Subsystems
blkio Subsystem
CPU Subsystem
Memory Subsystem
Python Controller
Puppet
Docker
Other options
Summary
References

Introduction

Achieving QOS in a multi-tenant cloud platform is still a difficult task, and companies follow different approaches to solve this problem. In this document I architect a simple solution, based on my experiments, for offering different QOS to different tenants in a multi-tenant cloud environment.

Problem statement

OpenStack steps into platform as a service by introducing a new component, 'Trove', a Database as a Service (DBaaS) offering, in its upcoming Icehouse release. But OpenStack announced that Trove will operate as a single-tenant service (which means that a new VM is created for each database instance). This is costly for cloud service providers, and resources may not be used efficiently in this scenario.

Many big cloud service providers like Google and Amazon provide options for the same DBaaS as a multi-tenant service. In this case, many instances of the DB run in a single virtual machine. This reduces the cost of running extra virtual machines.

But this approach also has a few problems, such as QOS, security and isolation. The QOS factors are CPU, memory, IOPS (input/output operations per second), etc.

More than one DB instance runs in a single machine. In the worst case, one DB instance may end up consuming a large amount of resources, which greatly affects the other DB instances. We need to guarantee the QOS stated in the SLA.

Since more than one DB instance runs in a single machine, there are also security considerations. When one customer's database is affected, it must not affect the other instances.

Also, consumers of a single-tenant service are charged based on their usage, such as the number of I/O operations, total space, CPU and memory. In a multi-tenant service it is hard to estimate each customer's usage, because more than one DB instance runs in a single virtual machine.

From another perspective, the existing solution of creating a VM for each customer has the drawback of running a separate operating system for each customer. Each separate operating system is an extra load for the service provider, as it needs a lot of disk space and memory.

Solution overview

In my proposed solution, I use Linux containers running inside virtual machines to isolate DB instances. Each database instance runs inside a container. We can therefore achieve true isolation, control resources, and meter usage quite easily. With the cgroup feature we can control the IOPS of a container and thereby offer a different service level (IOPS) to each tenant.

[Figure: existing offering (in OpenStack Trove) compared with the proposed solution]

Goals

Tenant based QOS

Multi tenancy

True resource isolation for DB Instances

Perfect metering

Automation

Typical implementation in production environment

In a typical production implementation, the user requests a database instance using the dashboard. Once the request is initiated, the Python controller gets the resources specified for a particular flavor in Nova and then consolidates the existing containers to see how much capacity is already in use.

If space is available in any existing virtual machine, the container is created there. Otherwise a new Nova virtual machine is created, and the container is created in that virtual machine with the user-specified parameters (CPU, RAM and IOPS).
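The placement decision described above can be sketched as a first-fit search. This is only an illustration under stated assumptions: the `Resources` and `VM` classes, the `place` function and all the numbers are hypothetical, and booting the Nova VM is reduced to creating an in-memory object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resources:
    """CPU, RAM and IOPS, either requested by a tenant or offered by a VM."""
    cpus: int
    ram_mb: int
    iops: int


@dataclass
class VM:
    """One Nova VM: its flavor capacity and the containers placed on it."""
    name: str
    capacity: Resources
    containers: List[Resources] = field(default_factory=list)

    def fits(self, req: Resources) -> bool:
        """True if req fits into what the existing containers leave free."""
        used_cpus = sum(c.cpus for c in self.containers)
        used_ram = sum(c.ram_mb for c in self.containers)
        used_iops = sum(c.iops for c in self.containers)
        return (used_cpus + req.cpus <= self.capacity.cpus
                and used_ram + req.ram_mb <= self.capacity.ram_mb
                and used_iops + req.iops <= self.capacity.iops)


def place(vms: List[VM], req: Resources, flavor: Resources) -> VM:
    """First fit: reuse an existing VM if possible, otherwise add a new one."""
    for vm in vms:
        if vm.fits(req):
            vm.containers.append(req)
            return vm
    # No existing VM has room; the real controller would boot a Nova VM here.
    vm = VM(name=f"vm-{len(vms) + 1}", capacity=flavor)
    if not vm.fits(req):
        raise ValueError("request is larger than a whole VM of this flavor")
    vm.containers.append(req)
    vms.append(vm)
    return vm


vms: List[VM] = []
flavor = Resources(cpus=4, ram_mb=4096, iops=1000)
first = place(vms, Resources(2, 2048, 500), flavor)
second = place(vms, Resources(2, 2048, 500), flavor)   # fills the first VM
third = place(vms, Resources(1, 1024, 100), flavor)    # needs a second VM
print(first.name, second.name, third.name)             # vm-1 vm-1 vm-2
```

First fit is used here only because it is the simplest policy; a production controller might prefer best fit or spread tenants across VMs for availability.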

Each time a virtual machine is created, it is discovered by Puppet and the container software (LXC or Docker) is installed. Each time a container is created, MySQL is installed in it.

After the containers are created and provisioned, the users are given access to the database (IP address, MySQL username and password).

The above is a modular approach to provisioning servers. For smaller companies, however, the architecture can be simplified by using pre-built Vagrant boxes or golden images.

The following is a brief description of each of the components mentioned above.

Linux Containers

Linux containers provide lightweight operating-system-level virtualization, which isolates processes and resources in a simpler way than full-scale virtual machines. LXC works in a way similar to virtualization, with the difference that it does not need a separate kernel instance. It allows us to create many sandbox environments, each completely isolated from the host and from the other containers.

Components of LXC

Namespaces – used to provide process isolation

cgroups – used for system management and resource control

SELinux – ensures isolation between the host and the containers, and between individual containers

Libvirt – toolbox to manage containers

Since QOS is our primary objective, we are going to focus more on control groups.

Control group (cgroup)

Control groups are a kernel feature for allocating and limiting resources such as CPU time, system memory and network bandwidth among user-defined groups of tasks.

For example, we can prevent a MySQL instance from using all the memory. In the same way, we can guarantee that the MySQL instance gets the specified resources.

In this architecture, I use the cgroup feature on Linux containers to isolate DB instances and guarantee the minimum QOS for each customer.

The limits for a particular container are defined in the container's configuration file. Hence we can allocate different resources to different containers based on customer requirements.
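As an illustration of per-container limits, the following sketch renders the cgroup lines for a container configuration file. It assumes the `lxc.cgroup.<subsystem parameter>` key form that LXC uses with cgroup v1; the function name `render_cgroup_limits`, the device number `8:0` and all the values are hypothetical.

```python
def render_cgroup_limits(cpu_shares: int, memory_bytes: int,
                         device: str, read_iops: int, write_iops: int) -> str:
    """Return lxc.cgroup.* lines for one container's configuration file.

    device is the block device as "major:minor" (for example "8:0").
    """
    lines = [
        f"lxc.cgroup.cpu.shares = {cpu_shares}",
        f"lxc.cgroup.memory.limit_in_bytes = {memory_bytes}",
        f"lxc.cgroup.blkio.throttle.read_iops_device = {device} {read_iops}",
        f"lxc.cgroup.blkio.throttle.write_iops_device = {device} {write_iops}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical limits for one tenant: 512 CPU shares, 512 MiB RAM,
# 200 read and 100 write operations per second on device 8:0.
config = render_cgroup_limits(cpu_shares=512,
                              memory_bytes=512 * 1024 * 1024,
                              device="8:0", read_iops=200, write_iops=100)
print(config, end="")
```

The controller could append these lines to the container's configuration file before starting it, so that each tenant's container starts with its own limits.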

In our scenario each container runs as a process, and the processes inside the container run as its subprocesses.

Subsystems

Subsystems are kernel modules that are aware of cgroups. They are resource controllers that allocate varying levels of system resources to different cgroups. The following are some of the cgroup subsystems.

blkio Subsystem

The Block I/O subsystem controls and monitors access to I/O on block devices by tasks in cgroups. It offers features like proportional weight division and I/O throttling (upper limit).

Common parameters:

blkio.throttle.read_iops_device - specifies the upper limit on the number of read operations per second that a device can perform

blkio.throttle.read_bps_device - specifies the upper limit on the read rate of a device, in bytes per second

blkio.throttle.write_bps_device - specifies the upper limit on the write rate of a device, in bytes per second
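A throttle parameter takes entries of the form `<major>:<minor> <limit>`, one per block device. The sketch below formats such an entry and writes it into a cgroup directory; the helper names are hypothetical, and on a real cgroup v1 host the directory would be the container's blkio cgroup and the write would need root privileges.

```python
from pathlib import Path


def throttle_rule(major: int, minor: int, limit: int) -> str:
    """Format one throttle entry as '<major>:<minor> <limit>'."""
    return f"{major}:{minor} {limit}"


def set_read_iops(cgroup_dir: str, major: int, minor: int, iops: int) -> Path:
    """Write a read-IOPS cap for one block device into a blkio cgroup."""
    target = Path(cgroup_dir) / "blkio.throttle.read_iops_device"
    target.write_text(throttle_rule(major, minor, iops) + "\n")
    return target


# Cap reads on device 8:0 at 200 operations per second (values are examples).
print(throttle_rule(8, 0, 200))   # 8:0 200
```

Calling `set_read_iops` with a different limit for each tenant's cgroup is one way to map service levels onto IOPS caps.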

CPU Subsystem

The cpu subsystem schedules CPU access to cgroups.

Common parameters:

cpu.shares - contains an integer value that specifies a relative share of CPU time available to the tasks in a cgroup

cpu.rt_period_us - applies to real-time scheduling tasks only; specifies a period of time in microseconds (µs, represented here as "us") for how regularly a cgroup's access to CPU resources should be reallocated
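Because cpu.shares is relative, the CPU time a cgroup gets under full contention is its share divided by the sum of the shares of all competing cgroups. A small worked example follows; the tenant names and share values are hypothetical.

```python
from typing import Dict


def cpu_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Fraction of CPU time each cgroup gets when all of them are busy."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Three tenants competing for the same CPUs.
fractions = cpu_fractions({"gold": 1024, "silver": 512, "bronze": 512})
print(fractions)   # {'gold': 0.5, 'silver': 0.25, 'bronze': 0.25}
```

When some cgroups are idle, the busy ones can use the spare CPU time, so these fractions are a floor under contention and not a hard cap.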

Memory Subsystem

The memory subsystem generates automatic reports on memory resources used by the tasks in a cgroup, and sets limits on memory use by those tasks.

Common parameters:

memory.usage_in_bytes - reports the total current memory usage by processes in the cgroup (in bytes)

memory.max_usage_in_bytes - reports the maximum memory used by processes in the cgroup (in bytes)

memory.limit_in_bytes - sets the maximum amount of user memory (including file cache)
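These three files are enough for simple per-tenant memory metering. The following sketch reads them from a cgroup directory and reports usage as a percentage of the limit; the function name `memory_report` is hypothetical, and the directory passed in is assumed to be the container's memory cgroup on a cgroup v1 host.

```python
from pathlib import Path
from typing import Dict


def memory_report(cgroup_dir: str) -> Dict[str, float]:
    """Read current usage, peak usage and limit from a memory cgroup."""
    base = Path(cgroup_dir)

    def read_bytes(name: str) -> int:
        # Each of these files holds a single integer number of bytes.
        return int((base / name).read_text().strip())

    usage = read_bytes("memory.usage_in_bytes")
    peak = read_bytes("memory.max_usage_in_bytes")
    limit = read_bytes("memory.limit_in_bytes")
    return {
        "usage_bytes": usage,
        "peak_bytes": peak,
        "limit_bytes": limit,
        "percent_of_limit": 100.0 * usage / limit,
    }
```

A metering job could call this periodically for every container and store the results against the tenant for billing.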

There are also various other subsystems, such as cpuacct, cpuset, devices and freezer. These can be used in our scenario for enhanced configurations.

Python Controller

In a fresh OpenStack environment, a new VM is created when a user requests an instance. But in our case we need to provision containers. Hence we need to modify the normal OpenStack workflow.

One popular way to do this is via the REST-based API. Since I am a Python guy, I do this via the Python APIs provided by OpenStack.

All details of the containers created by the users are saved in a local MySQL database. In this scenario, the user is shown a dashboard or a form for database provisioning. When the user requests an instance, the Python controller takes control. It gets the details of the flavor used to build the Nova VMs by querying OpenStack. Then it consolidates the provisioned containers using the local database. If an existing VM has the resources needed to provision a container, the container is created in that VM. If no space is found, a new Nova VM is created using API calls, and the process continues there.

The following is sample Python code for creating an instance.
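A minimal sketch of such a call, assuming `nova` is an authenticated client object from python-novaclient that exposes `flavors.find(name=...)` and `servers.create(name, image, flavor)`. The function name `create_instance` and its arguments are hypothetical, and authentication and error handling are left out.

```python
def create_instance(nova, name: str, image_id: str, flavor_name: str):
    """Boot a Nova VM that will host database containers.

    nova is assumed to be a python-novaclient style client; it is passed in
    so that the function carries no credentials of its own.
    """
    # Look up the flavor (for example "m1.small") by name.
    flavor = nova.flavors.find(name=flavor_name)
    # Ask Nova to boot the server; the call returns without waiting
    # for the VM to become active.
    return nova.servers.create(name=name, image=image_id, flavor=flavor)
```

The controller would call this only when no existing VM has room for the requested container.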

Initially we can set the resource level options for any particular flavor.

nova-manage flavor set_key --name m1.small --key quota:disk_read_bytes_sec --value 10240000

nova-manage flavor set_key --name m1.small --key quota:disk_write_bytes_sec --value 10240000

Puppet

OpenStack can provide any number of machines on demand. But to get all those machines into production (installing required software such as LXC or Docker in our scenario), we need some automation. There are various automation tools for change and configuration management. In this scenario I used Puppet.

Puppet can manage our servers. In a Puppet environment, we describe the desired machine state in declarative code. Puppet clients connect to the server and ensure that they are in the state described by the manifest file on the server.

In our scenario we define manifests for installing LXC or Docker. Once the necessary container is installed, we bring the container under the control of Puppet for software (MySQL) installation.

A Puppet module for MySQL is available on GitHub:

https://github.com/puppetlabs/puppetlabs-mysql

Docker

Docker is an open source, developer-friendly abstraction layer on top of Linux containers (LXC). Docker gives a simple and meaningful layer for working with containers in a cloud environment. Using Docker we can build containers, use them and change them as needed, push them to a Docker repository, and pull them at any time and any number of times for further use. This means a lot in the PaaS market.

At a high level, Docker can automate the deployment of applications as highly portable, self-sufficient containers that are independent of hardware, language, framework, packaging system and hosting provider.

Docker also provides a driver for OpenStack that plugs into Nova and makes it possible to work with containers alongside Nova virtual machines. However, most OpenStack production environments need to instantiate various operating systems, so we use a workaround to meet our need: in this scenario we run Docker or LXC on top of a virtual machine.

Docker, along with Puppet or Chef, can be very useful for Platform as a Service providers. Together they automate the provisioning of the platforms that developers require in a convenient and sophisticated way, which makes the operations team's work much easier.

Other options

The above method is one way of creating a multi-tenant cloud environment. But there are many other ways to achieve it using various other options.

Rackspace uses OpenVZ to build its cloud platforms. It uses OpenVZ to contain its customers' databases and for resource isolation. OpenVZ has several advantages over LXC. Resource allocation is simple in OpenVZ (for example, guaranteed RAM and burstable RAM are specified using simple commands). Live migration is also quite easy in OpenVZ compared to LXC.

Oracle follows an interesting architecture in its DBaaS offering. It created a customized 'Container Database'. All customer databases are in Pluggable Database (PDB) format; they can be plugged into the container database and used from there.

Summary

Thus tenant based QOS is achieved in a multi-tenant cloud platform. I have not covered some other features and drawbacks, such as migration, scale-up and high availability. Those drawbacks can be addressed with workarounds in the architecture.

References

1) Linux Plumbers Conference 2013, Rackspace session

2) http://www.kernel.org/doc/Documentation/cgroups

3) http://docs.openstack.org/developer/trove/

4) https://help.ubuntu.com/lts/serverguide/lxc.html

5) http://wwwkemper.informatik.tu-uenchen.de/research/publications/conferences/sigmod2008-mtd.pdf

6) http://blog.docker.io/2013/10/gathering-lxc-docker-containers-metrics/

7) http://www.vldb.org/pvldb/vol7/p37-das.pdf
