How to Tame your VM: an Automated Control System for Virtualized Services
Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu kim@ece.cmu.edu
Department of Electrical and Computer Engineering
Carnegie Mellon University
LISA 2010 - November 11, 2010
Virtualized Services
• Virtualized Infrastructure (vSphere,Hyper-V,ESX,KVM)
– Easy to deploy
– Easy to migrate
– Easy to re-use virtual machines
• Multi-tier services are configured and deployed as virtual machines
[Diagram: three KVM hosts, each hypervisor running a mix of web / app / DB VMs (Web1–3, App1–3, DB1–3) from different multi-tier services]
Problem: What about performance?
• VMs share physical resource (CPU time / Disk / Network / Memory bandwidth)
• Need to maintain service quality
– Response time / Latency
• Existing infrastructure provides a mismatched interface between the service's performance goal (latency) and the configurable parameters (resource shares)
– Admins need to fully understand the services before they can tune the system
Existing Solutions – Managing Service Performance
• Resource Provisioning
– Admins define resource shares / reservations / limits
– The “correct” values depend on hardware / applications / workloads
• Load Balancing
– Migrate VMs to balance host utilization
– Free up capacity for each host (low utilization ~ better performance)
[Diagram: load balancing migrates VMs (vm1–vm5) between Host 1 and Host 2; inset table of per-VM CPU shares, e.g. vm1: 50, vm2: 10, vm3: 30, vm4: 10]
Automated Resource Control
• Dynamically adjust resource shares during run-time
• Input = required service response time
– e.g. “I want to serve pages from vm1 in less than 4 seconds”
• The system calculates the number of shares required to maintain the service level
– Adapts to changes in workload
– Admins do not have to guess the number of shares
[Diagram: the control system takes each VM's response-time target and measured response (vm1–vm4) and reallocates CPU shares, e.g. from a uniform 25/25/25/25 split to 50/10/10/30]
Control System
[Diagram: a central Controller and Modeler coordinate a Sensor and an Actuator unit on each KVM host running VMs]
• Sensor – records service response time
• Actuator – adjusts host CPU shares
• Modeler – calculates model parameters
• Controller – calculates the # of shares required
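One iteration of this feedback loop can be sketched in Python; the object interfaces below (read_response_time, update, allocate, apply) are hypothetical stand-ins for the four units, not the authors' actual API:

```python
def control_step(sensors, modeler, controller, actuators, targets):
    """One period of the loop: sense, model, decide, actuate.
    All interfaces here are hypothetical sketches of the four units."""
    # Sensor units: record each service's measured response time
    readings = {vm: s.read_response_time() for vm, s in sensors.items()}
    # Modeler: refit the service parameters <a0, b0, a1, b1> from readings
    model = modeler.update(readings)
    # Controller: compute the # of shares needed to meet each target
    shares = controller.allocate(model, targets)
    # Actuator units: push the new CPU shares down to each host
    for actuator in actuators:
        actuator.apply(shares)
    return shares
```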
Sensor unit
[Diagram: request packets destined for the Server VM are mirrored on the host bridge via xtables TEE to a Sensor VM running tshark, which forwards records to the Modeler's analyzer, e.g. {service: blog, tier: web, vm: www1, response time: 250 ms}, {service: blog, tier: db, vm: db1, response time: 2.5 ms}]
• Records service response time
– Starts when the server sees the “request” packet
– Ends when the server sends the “response” packet
• Based on libpcap / tshark – needs to recognize the application's header
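The request/response timestamp pairing the sensor performs can be sketched as follows; the event-tuple format is a hypothetical stand-in for what a tshark/libpcap dissector would emit, not the paper's actual record layout:

```python
def response_times(events):
    """Compute per-request service times from a packet-event stream.
    Each event is (timestamp_s, direction, stream_id), where direction is
    "req" (server sees a request packet) or "resp" (server sends the response).
    Hypothetical format standing in for parsed tshark/libpcap output."""
    pending = {}
    latencies = []
    for ts, direction, stream in events:
        if direction == "req":
            pending.setdefault(stream, ts)   # keep the earliest request packet
        elif direction == "resp" and stream in pending:
            latencies.append(ts - pending.pop(stream))
    return latencies
```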
Actuator Unit
[Diagram: the controller sends a shares allocation, e.g. {server1: 80, server2: 20}; the Actuator Agent on the hypervisor updates each server VM's CPU cgroup, changing the shares (e.g. 50/50 → 80/20)]
• Adjusts the number of shares based on the controller's allocation
• We use Linux/KVM as our hypervisor
– cgroup cpu.shares controls CFS scheduling shares
• Similar mechanisms exist on other platforms (Xen weight / ESX shares)
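A minimal sketch of such an actuator on Linux, assuming cgroup v1 with the cpu controller mounted at /sys/fs/cgroup/cpu and one cgroup named per VM (a hypothetical layout; real hypervisor stacks such as libvirt arrange their own hierarchy):

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup/cpu"   # cgroup-v1 CPU controller (assumed mount point)

def shares_writes(allocation, root=CGROUP_ROOT):
    """Map a controller allocation, e.g. {"server1": 80, "server2": 20},
    to a list of (file, value) writes against the assumed cgroup layout."""
    return [(os.path.join(root, vm, "cpu.shares"), str(shares))
            for vm, shares in sorted(allocation.items())]

def apply_shares(allocation):
    """Perform the writes; requires root and the cgroups to already exist."""
    for path, value in shares_writes(allocation):
        with open(path, "w") as f:
            f.write(value)
```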
Modeler
[Diagram: Client → Web server → Database; Tweb is the web tier's response time, TDB the database's, Scpu the DB VM's CPU shares]
• Tweb = f(TDB), TDB = f(Scpu)
• The modeler calculates the expected service response time based on the shares allocated to each VM
• We use an intuitive model for our 2-tier service in the experiment
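Fitting the two model forms used later (TDB = a0·Scpu^b0 and Tweb = a1·TDB + b1) can be sketched with least squares, the power law via linear regression in log-log space; this is an assumed fitting method, not necessarily the authors' exact procedure:

```python
import numpy as np

def fit_db_model(shares, t_db):
    """Fit T_DB = a0 * S_cpu**b0 by linear regression in log-log space."""
    b0, log_a0 = np.polyfit(np.log(shares), np.log(t_db), 1)
    return np.exp(log_a0), b0

def fit_web_model(t_db, t_web):
    """Fit T_web = a1 * T_DB + b1 by ordinary least squares."""
    a1, b1 = np.polyfit(t_db, t_web, 1)
    return a1, b1
```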
Resource Shares Effect
[Testbed: two hosts running a web server VM (WordPress), an SQL server VM (MySQL), and a cruncher VM (100% CPU load), with shared NAS storage and external test clients]
SQL Server CPU Allocation effect
[Plot: web response time (ms) and # of DB CPU shares (out of 1000) vs. time step (5 s); series: thttp-msec, cpu]
[Plot: database response time (ms) and # of DB CPU shares (out of 1000) vs. time step (5 s); series: tdb-msec, cpu]
Database Response vs CPU
[Scatter plot: database response time (ms) vs. # of CPU shares allocated to the SQL VM (out of 1000)]
TDB = a0 (Scpu)^b0
Database Response vs CPU (controlled)
[Scatter plot: database response time (ms) vs. # of CPU shares allocated to the SQL server VM (out of 1000), under control]
HTTP vs Database response
[Scatter plot: web response time (ms) vs. database response time (ms)]
Tweb = a1 (TDB) + b1
Controllers
• The model uses readings to estimate the service parameters <a0, b0, a1, b1>
– TDB = a0 (Scpu)^b0
– Tweb = a1 (TDB) + b1
• The controller finds the minimal Scpu such that Tweb < the specified response time
– Long-term control: uses a moving average to find model parameters & shares
– Short-term control: uses the last-period reading to find model parameters, avoiding excessive response-time violations while the long-term model settles
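Inverting the fitted model gives the controller's share computation: solve the web model for the allowable DB latency, then the DB model for the shares. A sketch, assuming b0 < 0 (more shares → lower DB latency) and a hypothetical 1000-share budget:

```python
import math

def min_cpu_shares(t_target, a0, b0, a1, b1, max_shares=1000):
    """Smallest S_cpu with T_web = a1 * (a0 * S_cpu**b0) + b1 <= t_target.
    Assumes b0 < 0; the 1000-share budget is a hypothetical default."""
    t_db_max = (t_target - b1) / a1      # DB latency budget from the web model
    if t_db_max <= 0:
        return max_shares                # target infeasible; pin to the maximum
    s = (t_db_max / a0) ** (1.0 / b0)    # invert T_DB = a0 * S_cpu**b0
    return max(1, min(max_shares, math.ceil(s - 1e-9)))  # round up, clamp
```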
System Evaluations
[Testbed: three KVM hosts; Web server 1 + DB server 1 VMs and Web server 2 + DB server 2 VMs spread across Hosts 1–3, each host running a sensor and one running the actuator, with shared NAS storage and external test clients]
Static Workload
[Two plots: service response time (ms) vs. time step for web1, web1-target, web2, web2-target]
Dynamic Workload
[Two plots: service response time (ms) vs. time step for web1, web1-target, web2, web2-target, with 50-period moving averages of web1 and web2; the Web1 load varies over the run]
Deviation from Target

                        Without Control              With Control
                  Mean Dev.    Violation        Mean Dev.    Violation
                  (ms)         Period           (ms)         Period
Static Load
  Instance 1       540           0%              109 (-80%)    9%
  Instance 2      1043         100%              282 (-73%)   26%
Dynamic Load
  Instance 1       276           7%              182 (-34%)    9%
  Instance 2       354          27%              336 (-5%)    12%
Our control system helps track the target service response time
Further Improvements
• Enhance the service model
– Service dependencies (cache / proxy / load-balancer effects)
– Non-CPU resources (disk / network / memory)
• More robust control system
– Still susceptible to temporal system effects (garbage collection / cron jobs)
– Need to determine the feasibility of the target performance
Conclusions
• Automated resource control system
– Maintains the target service's response time
– Allows admins to express allocation in terms of a performance target
• Managing service performance can be automated with sufficient domain knowledge
– Need a basic framework to describe the performance model
– Leave the details to the machines (parameter fitting / optimization / resource monitoring)
Sample Response & Share values
[Plot: web response time tWeb with p95 and target lines vs. time step]
[Plot: allocated CPU shares vs. time step]