
Computer Science

Dynamic Resource Management in Internet Data Centers

Prashant Shenoy

University of Massachusetts


Motivation

Internet applications are used in a variety of domains: online banking, online brokerage, online music stores, e-commerce

Internet usage continues to grow rapidly; broadband deployment is accelerating

Outages of Internet applications are becoming more common

“Site not responding”, “Connection timed out”

Internet Application Outages

Down for 30 minutes

Average download time ~ 260 sec

Periodic outages over 4 days

Cause: Too many users leading to overload

Holiday Shopping Season 2000:

9/11: site inaccessible for brief periods

Internet Workloads are Highly Variable

Short-term fluctuations: the “Slashdot effect”, flash crowds

Long-term seasonal effects: time of day, month of year

Peaks are difficult to predict, so static overprovisioning is not effective

Manual allocation is slow

[Figure: Soccer World Cup’98 workload, request rate (req/min) vs. time (hrs) over 24 hours]

Key Issue: How can we design applications to handle large workload variations?

Internet Data Centers

Internet applications run on data centers (server farms)

Data centers provide computational and storage resources

Applications share data center resources

Problem: How should the platform allocate resources to absorb workload variations?

Talk Outline

Motivation

Internet data center model

Dynamic provisioning

Request policing

Cataclysm server platform

Experimental results

Summary

Data Center Model

Dedicated hosting: each application runs on a subset of servers in the data center

Subsets are mutually exclusive: no server sharing

The data center hosts multiple applications

Free server pool: unused servers

[Figure: example hosted applications: retail Web site, streaming]

Internet Application Model

Internet applications have multiple tiers. Example: 3 tiers: HTTP, J2EE application server, database

Replicable applications: individual tiers are partially or fully replicable. Example: clustered HTTP, J2EE server, shared-nothing database

Each application employs a sentry; each tier uses a dispatcher for load balancing

[Figure: requests enter through a load-balancing sentry and flow through the HTTP, J2EE, and database tiers]

Approach

Dynamic provisioning: allocate servers to applications on the fly

Request policing: turn away excess requests, or degrade performance based on the SLA

Couple provisioning and policing

Research Questions

How many servers to allocate, and when? For multi-tier apps: when and how should each tier be provisioned?

How many requests should be turned away during overload? For multi-tier apps: where should requests be dropped?

Can we meet SLAs during overloads?

Is it possible to predict future workloads?

Dynamic Provisioning

Key idea: increase or decrease the number of allocated servers to handle workload fluctuations

Monitor the incoming workload

Compute current or future demand

Match the number of allocated servers to demand

[Figure: provisioning loop: monitor workload → compute current/future demand → adjust allocation]
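As a minimal sketch of one pass through this loop, assume each server sustains a known request rate (`per_server_rate`, a hypothetical parameter) and that demand is estimated simply as the currently observed rate; a predictor could replace that estimate. The function and field names are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class ProvisioningStep:
    demand: float        # estimated request rate (req/s)
    target_servers: int  # servers needed to serve that demand
    delta: int           # > 0: take from the free pool; < 0: return to the pool


def provisioning_step(observed_rate: float, per_server_rate: float,
                      allocated: int, free_pool: int) -> ProvisioningStep:
    """One iteration of monitor -> compute demand -> adjust allocation."""
    if per_server_rate <= 0:
        raise ValueError("per_server_rate must be positive")
    # Compute demand: here just the observed rate (assumption).
    demand = observed_rate
    # Match servers to demand, keeping at least one server per application.
    target = max(1, math.ceil(demand / per_server_rate))
    # Cannot grow beyond what the free server pool can supply.
    target = min(target, allocated + free_pool)
    return ProvisioningStep(demand, target, target - allocated)


print(provisioning_step(250, 60, allocated=2, free_pool=10))
# ProvisioningStep(demand=250, target_servers=5, delta=3)
```

Capping the target at `allocated + free_pool` reflects the shared free server pool; when the pool runs dry, the remaining excess has to be handled by request policing.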

Single-tier Provisioning

Single-tier provisioning is well studied [Muse, TACT]

It is non-trivial to extend to multiple tiers

Strawman #1: use single-tier provisioning independently at each tier

Problem: independent per-tier provisioning may not increase goodput

[Figure: three tiers with capacities C=15, C=10, and C=10.1 req/s; of the 14 req/s arriving, the second tier passes only 10 req/s, so 4 req/s are dropped]


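[Figure: after the second tier is provisioned up to C=20, all 14 req/s pass the first two tiers, but the third tier (C=10.1) becomes the bottleneck: goodput rises only to 10.1 req/s and 3.9 req/s are dropped]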
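The arithmetic behind these figures can be checked with a small model, assuming the tiers are in series and each tier simply forwards requests up to its capacity (no queuing, retries, or timeouts). The function name is hypothetical.

```python
def pipeline_goodput(arrival_rate, capacities):
    """End-to-end rate through tiers in series: each tier forwards at most
    its capacity, so the slowest tier (the bottleneck) sets the goodput."""
    rate = arrival_rate
    for capacity in capacities:
        rate = min(rate, capacity)
    return rate


before = pipeline_goodput(14, [15, 10, 10.1])  # 10 req/s
after = pipeline_goodput(14, [15, 20, 10.1])   # 10.1 req/s
print(f"goodput before: {before}, after: {after}, dropped after: {14 - after:.1f}")
# goodput before: 10, after: 10.1, dropped after: 3.9
```

Doubling the second tier's capacity buys only 0.1 req/s of goodput, because the bottleneck simply moves to the third tier.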

Model-based Provisioning

Black box approach: treat the application as a black box, measure response time from outside, and increase the allocation if response time exceeds the SLA

• Use a model to determine how much to allocate

Strawman #2: use the black box approach for multi-tier apps. Problems:

It is unclear which tier needs more capacity

It may not increase goodput if the bottleneck tier is not replicable

[Figure: the same three-tier example with capacities C=15, C=20, and C=10.1 req/s and 14 req/s arriving; goodput stays at 10.1 req/s]

Provisioning Multi-tier Apps

Approach: take a holistic view of the multi-tier application

Determine tier-specific capacity independently, then allocate capacity by looking at all tiers (and other apps)

Predictive provisioning: long-term provisioning on a time scale of hours. Maintain long-term workload statistics; predict and provision for the next few hours

Reactive provisioning: short-term provisioning on a time scale of several minutes. React to “current” workload trends, correct errors of long-term provisioning, and handle flash crowds (inherently unpredictable)
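A minimal sketch of the holistic idea, assuming each tier's per-server capacity is already known and that servers within a tier share load perfectly. The tier names and rates are hypothetical. It also reflects the point made in the backup slides that a non-replicable tier limits how far the other tiers are worth provisioning.

```python
import math


def provision_tiers(target_rate, tiers):
    """Per-tier server counts for a target end-to-end request rate.

    tiers: list of (name, per_server_rate, replicable, current_servers).
    A non-replicable tier cannot grow, so it caps the achievable rate,
    and the other tiers are provisioned only up to that cap.
    """
    achievable = target_rate
    for _name, per_server, replicable, servers in tiers:
        if not replicable:
            achievable = min(achievable, per_server * servers)
    plan = {}
    for name, per_server, replicable, servers in tiers:
        plan[name] = math.ceil(achievable / per_server) if replicable else servers
    return achievable, plan


rate, plan = provision_tiers(150, [("http", 50, True, 1),
                                   ("j2ee", 20, True, 2),
                                   ("db", 100, False, 1)])
print(rate, plan)  # 100 {'http': 2, 'j2ee': 5, 'db': 1}
```

Sizing every tier against the same end-to-end target avoids the strawman problem above, where adding servers to one tier only shifts the bottleneck elsewhere.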

Workload Prediction

Long-term workload monitoring and prediction:

Monitor the workload for multiple days

Maintain a histogram for each hour of the day

• Capture time-of-day effects

Forecast based on

• Observed workload for that hour in the past

• Observed workload for the past few hours of the current day

Predict a high percentile of the expected workload

[Figure: per-hour workload histograms from previous days (Mon, Tue, Wed) used to forecast today]
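One way to sketch this predictor uses the nearest-rank method for the percentile and a simple conservative combination rule: take the larger of the historical percentile for this hour and the peak seen in the last few hours today. Both choices, along with the class and parameter names, are assumptions and are not the exact Cataclysm method.

```python
import math
from collections import defaultdict


class HourlyPredictor:
    """Per-hour-of-day workload histograms with a high-percentile forecast."""

    def __init__(self, percentile=95.0, recent_hours=3):
        self.percentile = percentile
        self.recent_hours = recent_hours
        self.history = defaultdict(list)  # hour of day (0-23) -> observed rates

    def record(self, hour, rate):
        self.history[hour].append(rate)

    def historical_percentile(self, hour):
        samples = sorted(self.history[hour])
        if not samples:
            return 0.0
        # Nearest-rank percentile over past observations for this hour.
        rank = math.ceil(self.percentile / 100.0 * len(samples))
        return samples[max(rank, 1) - 1]

    def predict(self, hour, today_rates):
        """today_rates: rates observed in earlier hours of the current day."""
        past = self.historical_percentile(hour)
        recent = max(today_rates[-self.recent_hours:], default=0.0)
        # Assumed combination rule: be conservative and take the larger value.
        return max(past, recent)


p = HourlyPredictor()
for rate in [100, 120, 110, 130]:   # hour 9 on four previous days
    p.record(9, rate)
print(p.predict(9, [90, 95, 140]))  # 140: today is running hotter than history
```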

Predictive Provisioning

Queuing-theoretic application model: each individual server is modeled as a G/G/1 queue

Derive the per-tier mean response time E(r) from the end-to-end SLA

Monitor the other parameters and determine the per-server capacity λ

Use the predicted workload λ_pred to determine the number of servers per tier

• Assumes perfect load balancing in each tier; alternative: model each tier as a G/G/k queue

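[Figure: a tier of G/G/1 servers receiving the predicted workload λ_pred]

$$\lambda \ge \left( E(s) + \frac{\sigma_a^2 + \sigma_b^2}{2\,\bigl(E(r) - E(s)\bigr)} \right)^{-1}$$

where E(s) is the mean service time, E(r) the per-tier mean response time, and σ_a² and σ_b² the variances of the inter-arrival and service times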
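A small sketch of how the bound turns into a server count, using hypothetical function names and illustrative parameter values (times in seconds). The first function evaluates the right-hand side of the bound as the per-server capacity λ. The rounding step assumes the perfect load balancing mentioned above.

```python
import math


def per_server_capacity(mean_service, var_interarrival, var_service, target_response):
    """Right-hand side of the G/G/1 bound: per-server capacity lambda (req/s)
    for a tier whose mean response time target is target_response."""
    if target_response <= mean_service:
        raise ValueError("target response time must exceed mean service time")
    queueing = (var_interarrival + var_service) / (2.0 * (target_response - mean_service))
    return 1.0 / (mean_service + queueing)


def servers_needed(predicted_rate, capacity):
    """Servers per tier, assuming perfect load balancing across the tier."""
    return math.ceil(predicted_rate / capacity)


lam = per_server_capacity(0.25, 0.0625, 0.0625, 0.5)  # 2.0 req/s per server
print(lam, servers_needed(9, lam))                     # 2.0 5
```

Tightening the response-time target E(r) toward E(s) shrinks the denominator, which lowers λ and raises the number of servers needed for the same λ_pred.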

Reactive Provisioning

Idea: react to current conditions. Useful for capturing significant short-term fluctuations; can correct errors in predictions

Track the error between long-term predictions and the actual workload; allocate additional servers if the error exceeds a threshold, to account for prediction errors

Can also be invoked if the request drop rate exceeds a threshold, which handles sudden flash crowds

Operates over a time scale of a few minutes; pure reactive provisioning lags the workload

Reactive + predictive more effective!

[Figure: time series of predicted (λ_pred) and actual workload; when the prediction error exceeds a threshold, the reactor is invoked to allocate servers]
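The trigger logic can be sketched as follows, assuming a relative-error threshold of 20% and a drop-rate threshold of 5%. Both thresholds and the function name are hypothetical. When triggered, the sketch simply sizes the tier for the observed rate.

```python
import math


def reactive_extra_servers(predicted, observed, allocated, per_server_rate,
                           drop_rate=0.0, error_threshold=0.2, drop_threshold=0.05):
    """Extra servers the reactor would request this interval (0 if not triggered)."""
    if predicted <= 0 or per_server_rate <= 0:
        raise ValueError("predicted and per_server_rate must be positive")
    relative_error = (observed - predicted) / predicted
    triggered = relative_error > error_threshold or drop_rate > drop_threshold
    if not triggered:
        return 0
    # Size the tier for the observed (not predicted) arrival rate.
    needed = math.ceil(observed / per_server_rate)
    return max(needed - allocated, 0)


print(reactive_extra_servers(100, 150, allocated=5, per_server_rate=20))  # 3
print(reactive_extra_servers(100, 110, allocated=5, per_server_rate=20))  # 0
print(reactive_extra_servers(100, 100, allocated=4, per_server_rate=20, drop_rate=0.1))  # 1
```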


Request Policing

Key idea: if the incoming request rate exceeds the current capacity, turn away excess requests or degrade the performance of requests

Why police when you can provision? Provisioning is not instantaneous:

• Residual sessions on the reallocated server

• Application and OS installation and configuration overheads

Overhead of several (5-30) minutes

[Figure: the sentry polices incoming requests, dropping excess requests before they reach the G/G/1 servers]

Class-based Differentiation

Some requests are more important than others: a purchase versus catalog browsing, a stock trade versus viewing an account balance

During overload, preferentially admit more important requests to maximize utility

Incoming requests are queued in class queues, e.g., gold, silver, and bronze classes

Higher priority is given to more important classes

[Figure: sentry policing with per-class queues; excess requests are dropped]
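A minimal sketch of class-based admission for one policing interval. It assumes strict priority (gold first, then silver, then bronze) and expresses capacity as a number of requests per interval. The function name and data layout are hypothetical.

```python
def admit_by_class(queues, capacity):
    """queues: list of (class_name, [requests]) ordered from most to least
    important. Returns (admitted, dropped) request lists for this interval."""
    admitted, dropped = [], []
    remaining = capacity
    for _cls, requests in queues:
        take = max(remaining, 0)
        admitted.extend(requests[:take])
        dropped.extend(requests[take:])
        remaining -= len(requests[:take])
    return admitted, dropped


queues = [("gold", ["g1", "g2"]), ("silver", ["s1", "s2", "s3"]), ("bronze", ["b1"])]
admitted, dropped = admit_by_class(queues, capacity=4)
print(admitted)  # ['g1', 'g2', 's1', 's2']
print(dropped)   # ['s3', 'b1']
```

Strict priority maximizes the share of gold requests served, but it can starve bronze entirely during sustained overload; weighted schemes are a possible alternative.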

Scalable Policing Techniques

Examining individual requests is infeasible: the incoming rate may be an order of magnitude greater than capacity, so the overhead of policing decisions must be reduced

Idea #1: Batch processing. Premise: request arrivals are bursty, so admit a batch of queued-up requests

• One admission control test per batch

• Reduces overhead from O(n) to O(b)

Idea #2: Use pre-computed thresholds. Example: capacity = 100 req/s; arrival rates G=75, S=50, B=50 req/s

• Admit all gold, half of silver, and no bronze

Periodically estimate λ and s and compute the thresholds: O(1) overhead per request, trading accuracy for efficiency
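The threshold example above can be reproduced with a short sketch. It assumes the thresholds are per-class admission probabilities filled in priority order, and that each request is then admitted with a single random draw (an O(1) check). The function names are hypothetical.

```python
import random


def class_thresholds(capacity, rates):
    """rates: list of (class_name, arrival_rate), highest priority first.
    Returns the fraction of each class to admit."""
    remaining = capacity
    fractions = {}
    for cls, rate in rates:
        if rate <= 0:
            fractions[cls] = 1.0
            continue
        admit_rate = min(rate, max(remaining, 0.0))
        fractions[cls] = admit_rate / rate
        remaining -= admit_rate
    return fractions


def admit(cls, fractions, rng):
    """Per-request decision: one table lookup and one random draw."""
    return rng.random() < fractions[cls]


thresholds = class_thresholds(100, [("gold", 75), ("silver", 50), ("bronze", 50)])
print(thresholds)  # {'gold': 1.0, 'silver': 0.5, 'bronze': 0.0}
rng = random.Random(42)
print(admit("gold", thresholds, rng), admit("bronze", thresholds, rng))  # True False
```

Because the thresholds are recomputed only periodically, a burst that arrives between recomputations is policed with stale fractions. That is the accuracy being traded for efficiency.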

Cataclysm Server Platform

Prototype data center built from commodity hardware:

40+ Pentium servers

2 TB of RAID arrays

Gigabit switches

Linux-based platform

Cataclysm Software Architecture

[Figure: the control plane manages multiple server nodes, each running an OS, a nucleus, and apps]

Control plane: provisioning, global allocation, application placement

Server node: runs apps and sentries; the nucleus handles resource monitoring and local allocation

Two key components: control plane and nuclei

Cataclysm Node Architecture

Capsule: the component of an app on a node

QLinux: proportional sharing of node resources

Nucleus: resource allocation across capsules and VMs

[Figure: left, capsules and the nucleus run on QLinux (HSFQ CPU scheduler, proportional-share packet scheduler, Cello disk scheduler, SFVM memory manager); right, each capsule runs in a VM (UML or Xen), with one VM active and the others dormant]

Cataclysm Applications

Multi-tiered apps: Rubis (e-auctions), Rubbos (bulletin board), running on Apache, JBOSS, and mysql

Tier-1 sentry: Ktcpvs, a kernel HTTP load balancer; performs request policing, class-based differentiation, and workload monitoring

Tier-2 sentry: Apache JBOSS redirector; workload monitoring

Nucleus: Linux trace toolkit and /proc to monitor node statistics

All system components are replicable!

[Figure: ktcpvs polices and load-balances requests across Apache servers, which forward to JBOSS and mysql]


Dynamic Provisioning

Server allocation adapts to the changing workload

[Figure: two panels over 10 minutes. Workload: number of sessions vs. time (min). Server allocation: number of servers vs. time (min)]

RuBiS: an e-auction application like eBay

Class-based differentiation

[Figure: two panels over 600 seconds for the gold (GLD), silver (SIL), and bronze (BRZ) classes. Arrival rate vs. time (sec); fraction admitted vs. time (sec)]

Scalability

Threshold-based: higher scalability

[Figure: CPU usage vs. arrival rate for batch-based and threshold-based policing]

Other Research Results

OS resource allocation: QLinux [ACM MM00], SFS [OSDI00], DFS [RTAS02]; SHARC cluster-based proportional sharing [TPDS03]

Shared hosting provisioning: measurement-based [IWQOS02], queuing-based [Sigmetrics03, IWQOS03]; provisioning granularity [Self-manage 03]

Application placement [PDCS 2004]; profiling and overbooking [OSDI02]

Storage issues: iSCSI vs. NFS [FAST03], policy-managed [TR03]

Glimpse of Other Projects

Hyperion: network-processor-based measurement platform

Measurement in the backbone and at the edge; NP-based measurements in the data center

RiSE: Rich Sensor Environments

Video sensor networks, robotics sensor networks, real-time sensor networks, weather sensors

Concluding Remarks

Internet applications see varying workloads

Handle workload dynamics through dynamic capacity provisioning and request policing

Need to account for multi-tiered applications

Joint work: Bhuvan Urgaonkar, Abhishek Chandra and Vijay Sundaram

More at http://lass.cs.umass.edu

Predictive Provisioning

Invoked once every hour; captures long-term variations such as time-of-day effects. Extensions to seasonal effects (month of year, holidays)

How to initialize? It needs several days of history to work well

What happens if no servers are available? Use revenue/utility to arbitrate allocation [Muse]; turn away excess requests

Non-replicable tiers are easy to handle: provision the other tiers until the non-replicable tier is saturated

Degrade or Drop?

It depends on the application and the SLA. Degrading increases effective capacity, but it also degrades the performance seen by requests

Degrade if the utility from servicing more requests at lower performance exceeds the utility from servicing fewer requests minus the penalty of dropping requests

Otherwise, drop requests

SLA:

< 500 ms: r1

< 1 s: r2

< 10 s: r3
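The decision rule can be written as a small sketch. It assumes r1, r2, and r3 are per-request rewards earned when the response time falls in each SLA band, and that each dropped request incurs a fixed penalty. The reward values, the penalty, and the function names are hypothetical.

```python
def sla_reward(response_ms, rewards):
    """Per-request reward under a three-band SLA (< 500 ms, < 1 s, < 10 s)."""
    r1, r2, r3 = rewards
    if response_ms < 500:
        return r1
    if response_ms < 1000:
        return r2
    if response_ms < 10000:
        return r3
    return 0.0


def should_degrade(n_degraded, degraded_ms, n_full, full_ms, n_dropped,
                   drop_penalty, rewards):
    """Degrade if serving more requests slowly beats serving fewer quickly
    minus the penalty for the requests that would be dropped."""
    degrade_utility = n_degraded * sla_reward(degraded_ms, rewards)
    drop_utility = n_full * sla_reward(full_ms, rewards) - n_dropped * drop_penalty
    return degrade_utility > drop_utility


rewards = (1.0, 0.6, 0.1)  # hypothetical r1, r2, r3
print(should_degrade(100, 800, 60, 300, 40, 0.5, rewards))   # True: 60 > 40
print(should_degrade(100, 5000, 60, 300, 40, 0.5, rewards))  # False: 10 < 40
```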

Computer Science

Use of Virtual Machine Monitors

Server allocation can be slow (~5-20+ minutes):

Residual sessions need to terminate

Disk scrubbing, OS and application installation, configuration

Application and system overheads

Flash crowds need fast allocation: use virtual machines

Each app runs inside a VM, with multiple VMs on a server; only one VM is active at any time, and the other VMs are “hot spares”

Server allocation then means idling one VM and activating another; system overhead drops to < 1 s, but residual sessions must still be accounted for

That is an application issue, no longer a system limitation

Threshold-based: loss of accuracy

[Figure: two panels over 500 seconds for the GLD, SIL, and BRZ classes. Response time: 95th percentile response time (msec) vs. time (sec). Admission rate: admission rate vs. time (sec)]
