Computer Science
Dynamic Resource Management in Internet Data
Centers
Prashant Shenoy
University of Massachusetts

Motivation
Internet applications used in a variety of domains
• Online banking, online brokerage, online music store, e-commerce
Internet usage continues to grow rapidly
• Broadband deployment is accelerating
Outages of Internet applications more common
• “Site not responding”, “connection timed out”

Internet Application Outages
Holiday Shopping Season 2000:
• Down for 30 minutes
• Average download time ~ 260 sec
• Periodic outages over 4 days
9/11: site inaccessible for brief periods
Cause: too many users leading to overload

Internet Workloads are highly variable
[Figure: request rate (req/min) vs. time (hrs) over 24 hours, Soccer World Cup ’98]
Short-term fluctuations
• “Slashdot Effect”
• Flash crowds
Long-term seasonal effects
• Time-of-day, month-of-year
Peak difficult to predict
• Static overprovisioning not effective
• Manual allocation: slow
Key Issue: How can we design applications to handle large workload variations?

Internet Data Centers
Internet applications run on data centers
• Server farms
• Provide computational and storage resources
Applications share data center resources
Problem: How should the platform allocate resources to absorb workload variations?

Talk Outline
• Motivation
• Internet data center model
• Dynamic provisioning
• Request Policing
• Cataclysm Server Platform
• Experimental results
• Summary

Data Center Model
Dedicated hosting: each application runs on a subset of servers in the data center
• Subsets are mutually exclusive: no server sharing
• Data center hosts multiple applications
Free server pool: unused servers
[Figure: data center hosting a retail Web site and a streaming application]

Internet Application Model
Internet applications: multiple tiers
• Example: 3 tiers: HTTP, J2EE app server, database
Replicable applications
• Individual tiers: partially or fully replicable
• Example: clustered HTTP, J2EE server, shared-nothing db
Each application employs a sentry
Each tier uses a dispatcher: load balancing
[Figure: requests pass through the sentry and load balancing to the HTTP, J2EE and database tiers]

Approach
Dynamic provisioning
• Allocate servers to applications on-the-fly
Request policing
• Turn away excess requests
• Degrade performance based on SLA
Couple provisioning and policing

Research Questions
How many servers to allocate and when?
• Multi-tier apps: when and how to provision each tier?
How many requests should be turned away during overload?
• Multi-tier apps: where should requests be dropped?
Can we meet SLAs during overloads?
Is it possible to predict future workloads?

Dynamic Provisioning
Key idea: increase or decrease allocated servers to handle workload fluctuations
• Monitor incoming workload
• Compute current or future demand
• Match number of allocated servers to demand
[Figure: control loop: monitor workload, compute current/future demand, adjust allocation]
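One way to picture the monitor / compute / adjust loop is the sketch below. It is only an illustration under stated assumptions: a fixed per-server capacity, a shared free server pool, and a hypothetical function name `adjust_allocation`; none of these names come from the talk.

```python
import math

def adjust_allocation(demand_rate, per_server_capacity, allocated, free_pool):
    """Match the number of allocated servers to demand.

    demand_rate and per_server_capacity are in req/s. Returns the pair
    (new_allocated, new_free_pool). All names are illustrative assumptions.
    """
    # Servers needed to carry the demand; keep at least one server running.
    needed = max(1, math.ceil(demand_rate / per_server_capacity))
    if needed > allocated:
        # Grow, but only by as many servers as the free pool can supply.
        grant = min(needed - allocated, free_pool)
        return allocated + grant, free_pool - grant
    # Shrink: return surplus servers to the free pool.
    surplus = allocated - needed
    return allocated - surplus, free_pool + surplus
```

Calling this once per monitoring interval gives the loop: the measured or predicted demand goes in, the new allocation comes out. When the free pool is exhausted the allocation falls short of demand, which is where request policing takes over.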

Single-tier Provisioning
Single-tier provisioning well studied [Muse, TACT]
Non-trivial to extend to multiple tiers
Strawman #1: use single-tier provisioning independently at each tier
Problem: independent tier provisioning may not increase goodput
[Figure: three tiers with capacities C=15, C=10, C=10.1 req/s; 14 req/s arrive, tier 1 forwards 14, tier 2 forwards 10, tier 3 serves 10; 4 req/s dropped]
[Figure: the same three tiers after the middle tier is provisioned to C=20 (C=15, C=20, C=10.1 req/s); 14 req/s arrive, tiers 1 and 2 forward 14, tier 3 serves 10.1; 3.9 req/s dropped]
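The two capacity configurations in this example can be checked with a short calculation. The sketch assumes each tier simply forwards at most its capacity, so end-to-end goodput is limited by the slowest tier; the function name `goodput` is hypothetical.

```python
def goodput(arrival_rate, tier_capacities):
    """Rate (req/s) that survives every tier of a pipeline, in order."""
    rate = arrival_rate
    for capacity in tier_capacities:
        rate = min(rate, capacity)  # a tier forwards at most its capacity
    return rate

arrival = 14.0
before = goodput(arrival, [15.0, 10.0, 10.1])  # middle tier is the bottleneck
after = goodput(arrival, [15.0, 20.0, 10.1])   # middle tier capacity doubled

print("before:", before, "dropped:", round(arrival - before, 1))
print("after: ", after, "dropped:", round(arrival - after, 1))
```

Doubling the middle tier moves the bottleneck to the last tier, so goodput rises only from 10 to 10.1 req/s and drops fall only from 4 to 3.9 req/s.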

Model-based Provisioning
Black box approach
• Treat application as a black box
• Measure response time from outside
• Increase allocation if response time > SLA
  • Use a model to determine how much to allocate
Strawman #2: use black box for multi-tier apps
Problems:
• Unclear which tier needs more capacity
• May not increase goodput if bottleneck tier is not replicable
[Figure: three tiers with capacities C=15, C=20, C=10.1 req/s; 14 req/s arrive, tiers 1 and 2 forward 14, tier 3 serves 10.1]

Provisioning Multi-tier Apps
Approach: holistic view of multi-tier application
• Determine tier-specific capacity independently
• Allocate capacity by looking at all tiers (and other apps)
Predictive provisioning
• Long-term provisioning: time scale of hours
• Maintain long-term workload statistics
• Predict and provision for the next few hours
Reactive provisioning
• Short-term provisioning: time scale of several minutes
• React to “current” workload trends
• Correct errors of long-term provisioning
• Handle flash crowds (inherently unpredictable)

Workload Prediction
Long-term workload monitoring and prediction
• Monitor workload for multiple days
• Maintain a histogram for each hour of the day
  • Capture time-of-day effects
• Forecast based on
  • Observed workload for that hour in the past
  • Observed workload for the past few hours of the current day
• Predict a high percentile of expected workload
[Figure: per-hour workload histograms for Mon, Tue, Wed and Today]
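A minimal sketch of the per-hour history idea follows. It keeps the raw observations for each hour of the day and predicts a high percentile of them using the nearest-rank rule. The class name `HourlyPredictor`, the 95th-percentile default and the nearest-rank rule are assumptions for illustration; the sketch also ignores the correction based on the past few hours of the current day.

```python
import math
from collections import defaultdict

class HourlyPredictor:
    """Per-hour workload history with a high-percentile forecast."""

    def __init__(self, percentile=0.95):
        self.percentile = percentile
        self.history = defaultdict(list)  # hour of day -> observed rates

    def observe(self, hour, rate):
        """Record the request rate seen during the given hour (0-23)."""
        self.history[hour].append(rate)

    def predict(self, hour):
        """Return the configured percentile of past rates for this hour."""
        samples = sorted(self.history[hour])
        if not samples:
            raise ValueError("no history for this hour")
        # Nearest-rank percentile: smallest sample with at least
        # percentile * n samples at or below it.
        rank = max(1, math.ceil(self.percentile * len(samples)))
        return samples[rank - 1]
```

Predicting a high percentile rather than the mean biases the forecast toward overprovisioning, which is the cheaper error when the alternative is dropped requests.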

Predictive Provisioning
Queuing-theoretic application model
• Each individual server is a G/G/1 queue
• Derive per-tier E(r) from end-to-end SLA
• Monitor other parameters and determine per-server capacity (λ)
• Use predicted workload λ_pred to determine # servers per tier
  • Assumes perfect load balancing in each tier
• Alternative: each tier G/G/k
$$\lambda \ge \left[ E(s) + \frac{\sigma_a^2 + \sigma_b^2}{2\,\bigl(E(r) - E(s)\bigr)} \right]^{-1}$$
[Figure: λ_pred feeding three tiers, each modeled as G/G/1 queues]
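The G/G/1 bound on this slide can be turned into a server count as sketched below. The sketch assumes the usual reading of the symbols: E(s) is the mean service time, E(r) the target mean response time for the tier, and σ_a² and σ_b² the variances of inter-arrival time and service time. It treats the right-hand side of the bound as the per-server capacity. The function names and the example numbers are hypothetical.

```python
import math

def per_server_capacity(mean_service, mean_response,
                        var_interarrival, var_service):
    """Request rate (req/s) from the bound
    lambda >= 1 / (E(s) + (var_a + var_b) / (2 * (E(r) - E(s)))).
    Times are in seconds, variances in seconds squared.
    """
    if mean_response <= mean_service:
        raise ValueError("target response time must exceed service time")
    queueing_term = (var_interarrival + var_service) / (
        2.0 * (mean_response - mean_service))
    return 1.0 / (mean_service + queueing_term)

def servers_needed(predicted_rate, capacity):
    """Servers for one tier, assuming perfect load balancing in the tier."""
    return math.ceil(predicted_rate / capacity)

# Made-up example: 20 ms service time, 100 ms response-time target.
cap = per_server_capacity(0.02, 0.1, 0.001, 0.0004)
n = servers_needed(200.0, cap)
```

With these made-up inputs the per-server capacity is a little under 35 req/s, so a predicted load of 200 req/s needs 6 servers in that tier.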

Reactive Provisioning
Idea: react to current conditions
• Useful for capturing significant short-term fluctuations
• Can correct errors in predictions
Track error between long-term predictions and actual
• Allocate additional servers if error exceeds a threshold
• Account for prediction errors
Can be invoked if request drop rate exceeds a threshold
• Handles sudden flash crowds
Operates over time scale of a few minutes
Pure reactive provisioning: lags workload
• Reactive + predictive more effective!
[Figure: the prediction is compared with the actual workload time series; if error > threshold, the reactor is invoked to allocate servers]
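The two triggers for the reactor (prediction error and drop rate) can be sketched as below. This is an illustration only: the relative-error definition, the default thresholds and the function names are assumptions, not values from the talk.

```python
import math

def reactor_should_run(predicted_rate, actual_rate, drop_rate,
                       error_threshold=0.2, drop_threshold=0.05):
    """True if the prediction undershoots by more than error_threshold
    (relative error) or the observed drop fraction exceeds drop_threshold."""
    if predicted_rate <= 0:
        return actual_rate > 0
    under_prediction = (actual_rate - predicted_rate) / predicted_rate
    return under_prediction > error_threshold or drop_rate > drop_threshold

def extra_servers(predicted_rate, actual_rate, per_server_capacity):
    """Additional servers needed to cover the unpredicted load (req/s)."""
    shortfall = max(0.0, actual_rate - predicted_rate)
    return math.ceil(shortfall / per_server_capacity)
```

Run every few minutes, this corrects the hourly prediction without replacing it: the predictor sets the baseline and the reactor adds servers only when the baseline is visibly wrong.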

Talk Outline
• Motivation
• Internet data center model
• Dynamic provisioning
• Request Policing
• Cataclysm Server Platform
• Experimental results
• Summary

Request Policing
Key idea: if incoming req. rate > current capacity
• Turn away excess requests
• Degrade performance of requests
Why police when you can provision?
• Provisioning is not instantaneous
  • Residual sessions on reallocated server
  • Application and OS installation and configuration overheads
• Overhead of several (5-30) minutes
[Figure: sentry polices requests ahead of the G/G/1 tiers and drops the excess]

Class-based Differentiation
Some requests are more important than others
• Purchase versus catalog browsing
• Stock trade versus view account balance
Overload => preferentially let in more important requests
• Maximize utility during overload
Incoming requests queued up in class queues
• Example: gold, silver, bronze class
Higher priority to more important classes
[Figure: sentry policing with per-class queues; excess requests dropped]
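The class-queue idea can be sketched with strict priority across three queues. Strict priority and dropping whatever is left over are simplifying assumptions made for this illustration; the names `CLASSES` and `admit` are hypothetical.

```python
from collections import deque

CLASSES = ["gold", "silver", "bronze"]  # highest priority first

def admit(queues, budget):
    """Admit up to `budget` queued requests, draining higher-priority
    class queues first. Returns (admitted, dropped) and empties the queues.
    """
    admitted = []
    for cls in CLASSES:
        q = queues[cls]
        while q and len(admitted) < budget:
            admitted.append(q.popleft())
    # Whatever is still queued exceeds the budget and is turned away.
    dropped = [req for cls in CLASSES for req in queues[cls]]
    for cls in CLASSES:
        queues[cls].clear()
    return admitted, dropped
```

For example, with two gold, two silver and one bronze request queued and a budget of three, both gold requests and the first silver request are admitted and the rest are dropped.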

Scalable Policing Techniques
Examining individual requests infeasible
• Incoming rate may be an order of magnitude greater than capacity
• Need to reduce overhead of policing decisions
Idea #1: Batch processing
• Premise: request arrivals are bursty
• Admit a batch of queued-up requests
  • One admission control test per batch
  • Reduces overhead from O(n) to O(b)
Idea #2: Use pre-computed thresholds
• Example: capacity = 100 req/s; G = 75, S = 50, B = 50 req/s
  • Admit all gold, half of silver and no bronze
• Periodically estimate λ and s: compute threshold
• O(1) overhead: trades accuracy for efficiency
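The threshold example can be reproduced with a few lines. The sketch assumes capacity is handed out in strict priority order and that each class is then admitted with a fixed fraction until the next re-estimation; the function name `admit_fractions` is hypothetical.

```python
def admit_fractions(capacity, class_rates):
    """Per-class admission fractions for one estimation period.

    class_rates is a list of (name, arrival rate in req/s), ordered from
    highest to lowest priority. Capacity is given to classes in that order.
    """
    remaining = capacity
    fractions = {}
    for name, rate in class_rates:
        if rate <= 0:
            fractions[name] = 1.0  # nothing arriving, nothing to turn away
            continue
        served = min(rate, remaining)
        fractions[name] = served / rate
        remaining -= served
    return fractions

thresholds = admit_fractions(
    100, [("gold", 75), ("silver", 50), ("bronze", 50)])
```

Here `thresholds` comes out as gold 1.0, silver 0.5 and bronze 0.0, matching the example. Between re-estimations each arriving request needs only a lookup of its class fraction, which is where the O(1) per-request cost comes from.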

Cataclysm Server Platform
Prototype data center
Commodity hardware
• 40+ Pentium servers
• 2 TB of RAID arrays
• Gigabit switches
Linux-based platform

Cataclysm Software Architecture
Two key components: control plane and nuclei
• Cataclysm control plane: provisioning, global allocation, app placement
• Nucleus: resource monitoring, local allocation
• Server node: runs apps, sentries
[Figure: control plane above three server nodes, each running apps, an OS and a nucleus]

Cataclysm Node Architecture
Capsule: component of an app on a node
QLinux: proportional sharing of node resources
Nucleus: resource allocations across capsules and VMs
[Figure: a node with a nucleus and a capsule on QLinux (HSFQ CPU scheduler, prop-share packet scheduler, Cello disk scheduler, SFVM memory manager), and a node with a nucleus on QLinux hosting capsules inside VMs (UML, Xen), marked active or dormant]

Cataclysm Applications
Multi-tiered apps: RUBiS (e-auctions), RUBBoS (bulletin board)
• Apache, JBoss, MySQL
Tier-1 sentry
• Ktcpvs: kernel HTTP load balancer
• Request policing and class-based differentiation
• Workload monitoring
Tier-2 sentry: Apache JBoss redirector, workload monitoring
Nucleus: Linux trace toolkit, /proc to monitor node statistics
All system components are replicable!
[Figure: ktcpvs (load balancing, policing) in front of the Apache, JBoss and MySQL tiers]

Talk Outline
• Motivation
• Internet data center model
• Dynamic provisioning
• Request Policing
• Cataclysm Server Platform
• Experimental results
• Summary

Dynamic Provisioning
Server allocation adapts to changing workload
RuBiS: e-auction application like eBay
[Figure, two panels. Workload: workload (number of sessions) vs. time (min). Server Allocation: number of servers vs. time (min).]

Class-based differentiation
[Figure, two panels for the GLD, SIL and BRZ classes. Arrival rate: arrival rate vs. time (sec). Fraction admitted: fraction admitted vs. time (sec).]

Scalability
Threshold-based: higher scalability
[Figure: CPU usage vs. arrival rate for batch-based (Batch) and threshold-based (Thresh) policing]

Other Research Results
OS resource allocation
• QLinux [ACM MM00], SFS [OSDI00], DFS [RTAS02]
• SHARC cluster-based prop. sharing [TPDS03]
Shared hosting provisioning
• Measurement-based [IWQOS02], Queuing-based [Sigmetrics03, IWQOS03]
• Provisioning granularity [Self-manage 03]
• Application placement [PDCS 2004]
• Profiling and Overbooking [OSDI02]
Storage issues
• iSCSI vs NFS [FAST03], Policy-managed [TR03]

Glimpse of Other Projects
Hyperion: network processor based measurement platform
• Measurement in the backbone and at the edge
• NP-based measurements in the data center
RiSE: Rich Sensor Environments
• Video sensor networks
• Robotics sensor networks
• Real-time sensor networks
• Weather sensors

Concluding Remarks
Internet applications see varying workloads
Handle workload dynamics by
• Dynamic capacity provisioning
• Request policing
Need to account for multi-tiered applications
Joint work: Bhuvan Urgaonkar, Abhishek Chandra and Vijay Sundaram
More at http://lass.cs.umass.edu

Predictive Provisioning
Invoked once every hour
• Captures long-term variations: time-of-day effects
• Extensions to seasonal effects (month-of-year, holidays)
How to initialize?
• Needs several days of history to work well
What happens if no servers are available?
• Use revenue/utility to arbitrate allocation [Muse]
• Turn away excess requests
Non-replicable tiers are easy to handle
• Provision other tiers until non-replicable tier is saturated

Degrade or Drop?
Depends on the application and the SLA
Degrading increases effective capacity
• Also degrades performance seen by requests
Degrade if
• Utility from servicing more requests at lower performance > Utility from servicing fewer requests - penalty of dropping requests
Otherwise drop requests
[Figure: SLA: < 500 ms (r1), < 1 s (r2), < 10 s (r3)]
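The degrade-or-drop rule can be written as a direct comparison. The sketch assumes utility is linear in the number of requests served and that every dropped request costs a fixed penalty; all parameter names and the example numbers are hypothetical.

```python
def should_degrade(offered, capacity_full, capacity_degraded,
                   utility_full, utility_degraded, drop_penalty):
    """Compare the two options for one interval.

    offered: requests arriving in the interval
    capacity_full / capacity_degraded: requests servable at full and at
        degraded performance (degrading raises effective capacity)
    utility_full / utility_degraded: utility per request served
    drop_penalty: cost per request turned away
    """
    served_degraded = min(offered, capacity_degraded)
    served_full = min(offered, capacity_full)
    degrade_value = served_degraded * utility_degraded
    drop_value = (served_full * utility_full
                  - (offered - served_full) * drop_penalty)
    return degrade_value > drop_value
```

For instance, with 150 requests offered, room for 100 at full performance or 150 when degraded, a utility of 1.0 versus 0.6 per request, and a drop penalty of 0.5, degrading is worth 90 against 75 for dropping, so the rule says degrade. With a zero drop penalty the comparison is 90 against 100 and the rule says drop.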

Use of Virtual Machine Monitors
Server allocation can be slow (~ 5-20+ minutes)
• Need residual sessions to terminate
• Disk scrubbing, OS and app installation, configuration
• Application and system overheads
Flash crowds => need fast allocation
Use virtual machines
• Each app runs inside a VM, multiple VMs on a server
• Only one VM is active at any time, other VMs are “hot spares”
Server allocation => idle one VM, activate another
• System overhead reduces to < 1s
• Need to still account for residual sessions
  • Application issue, no longer a system limitation