Computer Science
Resource Overbooking and Application Profiling in Shared Hosting Platforms
Bhuvan Urgaonkar, Prashant Shenoy, Timothy Roscoe †
UMASS Amherst and Intel Research †
Motivation
❒ Proliferation of Internet applications
❍ Electronic commerce, streaming media, online games, online trading,…
❒ Commonly hosted on clusters of servers
❍ Cheaper alternative to large multiprocessors
[Figure: clients connect over the Internet to a cluster hosting streaming, games, and e-commerce applications]
Hosting Platforms
❒ Hosting platform: server cluster that runs third-party applications
❒ Application providers pay for server resources
❍ CPU, disk, network bandwidth, memory
❒ Platform provider guarantees resource availability
❍ Performance guarantees provided to applications
❒ Central challenge: Maximize revenue while providing resource guarantees
Design Challenges
❒ How to determine an application’s resource needs?
❒ How to provision resources to meet these needs?
❒ How to map applications to nodes in the platform?
❒ How to handle dynamic variations in the load?
Talk Outline
➾ Introduction
❒ Inferring Resource Requirements
❒ Provisioning Resources
❒ Handling Dynamic Load Variations
❒ Experimental Evaluation
❒ Related Work
Hosting Platform Model
❒ Hosting Platforms: Dedicated vs Shared
❍ Dedicated: Applications get integral # nodes
❍ Shared: Applications may get fractional # nodes
❒ Our focus: Shared Hosting Platforms
❍ Nodes may have competing applications
❒ Capsule: component of an application running on a node
❍ Example: e-commerce application: HTTP server, app server, database server
Provisioning By Overbooking
❒ How should the platform allocate resources?
❍ Provision resources based on worst-case needs
❒ Worst-case provisioning is wasteful
❍ Low platform utilization
❒ Applications may be tolerant to occasional violations
❍ E.g., CPU guarantees should be met 99% of the time
❒ Possible to provide useful guarantees even after provisioning less than worst-case needs
➾ Idea: Improve utilization by overbooking resources
Application Profiling
❒ Profiling: process of determining resource usage
❍ Run the application on an isolated set of nodes
❍ Subject the application to a real workload
❍ Model CPU and network usage as ON-OFF processes
❒ Use the Linux trace toolkit
[Figure: timeline of an ON-OFF process; the ON period runs from the beginning to the end of a CPU quantum and is followed by an OFF period]
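The ON-OFF model can be turned into per-interval usage numbers. The following is a minimal sketch, assuming the trace has already been reduced to a list of (start, end) ON periods; the function name `fractional_usage` and the sample trace are hypothetical and are not part of the Linux trace toolkit.

```python
from typing import List, Tuple


def fractional_usage(on_periods: List[Tuple[float, float]],
                     interval: float,
                     duration: float) -> List[float]:
    """Fraction of each measurement interval spent in ON periods.

    on_periods: (start, end) times during which the resource was in use.
    interval:   length of one measurement interval.
    duration:   total length of the trace; a trailing partial interval is ignored.
    """
    n = int(duration // interval)
    busy = [0.0] * n
    for start, end in on_periods:
        i = int(start // interval)
        # Split an ON period across every interval it overlaps.
        while i < n and i * interval < end:
            lo = max(start, i * interval)
            hi = min(end, (i + 1) * interval)
            if hi > lo:
                busy[i] += hi - lo
            i += 1
    return [b / interval for b in busy]


# Hypothetical trace: three ON periods over three one-second intervals.
usage = fractional_usage([(0.0, 0.25), (1.25, 1.5), (1.75, 2.5)],
                         interval=1.0, duration=3.0)
```

Each entry of `usage` is one sample of fractional usage, which is the quantity a usage distribution would be built from.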
Resource Usage Distribution
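[Figure: usage is measured per measurement interval over time and summarized as a probability distribution and a cumulative probability distribution over fractional usage (0 to 1); r(99) is the usage at cumulative probability 0.99 and r(100) the usage at cumulative probability 1]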
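A small sketch of how r(99) and r(100) could be read off a set of measured samples. It assumes the samples are fractional usages in [0, 1] and uses the nearest-rank definition of a percentile, which is one common choice and not necessarily the one used by the authors; `usage_percentile` is a hypothetical name.

```python
import math
from typing import List


def usage_percentile(samples: List[float], p: float) -> float:
    """Smallest observed usage r such that at least p percent of samples are <= r."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # nearest-rank, 1-based
    return ordered[rank - 1]


# Hypothetical samples of fractional CPU usage, one per measurement interval.
samples = [0.1, 0.2, 0.3, 0.4, 0.9]
r100 = usage_percentile(samples, 100)  # worst-case usage
r80 = usage_percentile(samples, 80)    # usage not exceeded in 80% of intervals
```

With a long-tailed distribution, a percentile slightly below 100 can be much smaller than the worst case, as the gap between `r80` and `r100` illustrates here.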
Capturing Burstiness: Token Bucket
❒ Token Bucket (σ, ρ)
❍ Resource usage over t ≤ σ·t + ρ
❍ Algorithm by Tang et al.
❒ Additional parameter T
❍ Satisfy token bucket guarantees only for t ≥ T
[Figure: usage vs. time, bounded by two token bucket lines σ1·t + ρ1 and σ2·t + ρ2 with intercepts ρ1 and ρ2]
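To make the bound concrete, here is a brute-force sketch that, for a chosen rate σ, finds the smallest burst ρ such that usage over every window of length t ≥ T stays within σ·t + ρ. This is not the algorithm by Tang et al.; it is an illustrative O(n²) search, with time measured in whole measurement intervals and `min_burst` a hypothetical name.

```python
from typing import List


def min_burst(usage: List[float], sigma: float, min_window: int = 1) -> float:
    """Smallest rho >= 0 such that every window of t >= min_window consecutive
    intervals consumes at most sigma * t + rho."""
    n = len(usage)
    # prefix[i] holds the total usage of the first i intervals.
    prefix = [0.0]
    for u in usage:
        prefix.append(prefix[-1] + u)
    rho = 0.0
    for start in range(n):
        for end in range(start + min_window, n + 1):
            t = end - start
            consumed = prefix[end] - prefix[start]
            rho = max(rho, consumed - sigma * t)
    return rho


# Hypothetical per-interval usage with a two-interval burst.
trace = [0.25, 1.0, 1.0, 0.25, 0.0]
rho_all = min_burst(trace, sigma=0.5)                  # every window, T = 1
rho_long = min_burst(trace, sigma=0.5, min_window=3)   # only windows with t >= 3
```

Ignoring short windows (a larger T) gives a smaller ρ for the same σ, which matches the purpose of the additional parameter T.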
Profiles of Server Applications
❒ Applications exhibit different degrees of burstiness
❍ May have a long tail
❒ Insight: Choose (σ, ρ) based on a high percentile
[Figure: usage probability distributions for three applications: "Postgres Server, 10 clients" (probability vs. fraction of CPU), "Apache Web Server, 50% cgi-bin" (probability vs. fraction of CPU), and "Streaming Media Server, 20 clients" (probability vs. fraction of NW bandwidth)]
Resource Overbooking
❒ Applications specify overbooking tolerance O
❍ Probability with which capsule needs may be violated
❒ Controlled overbooking via admission control:
$\sum_{k=1}^{K} (\sigma_k \cdot T_{\min} + \rho_k) \cdot (1 - O_k) \le C \cdot T_{\min}$
$\Pr\left(\sum_{k=1}^{K} U_k > C\right) \le \min(O_1, \ldots, O_K)$
❒ A node that has sufficient resources for a capsule is feasible for it
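A minimal sketch of the first admission-control inequality. The slide does not define its symbols, so the code assumes that C is the node's capacity for the resource (for example 1.0 for one full CPU), that Tmin is the smallest T among the capsules on the node, and that each capsule carries its own (σ, ρ), T, and tolerance O. The `Capsule` class and `admits` function are hypothetical names, and the probabilistic condition on Pr(Σ U_k > C) is not checked here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Capsule:
    sigma: float      # token bucket rate
    rho: float        # token bucket burst
    period: float     # T: guarantees hold only for t >= T
    tolerance: float  # overbooking tolerance O, a probability in [0, 1]


def admits(capsules: List[Capsule], capacity: float) -> bool:
    """Check sum_k (sigma_k * Tmin + rho_k) * (1 - O_k) <= C * Tmin."""
    if not capsules:
        return True
    t_min = min(c.period for c in capsules)  # assumed definition of Tmin
    demand = sum((c.sigma * t_min + c.rho) * (1 - c.tolerance) for c in capsules)
    return demand <= capacity * t_min


# Two identical hypothetical capsules on one node of capacity 1.0.
strict = [Capsule(0.5, 0.25, 1.0, 0.0), Capsule(0.5, 0.25, 1.0, 0.0)]
tolerant = [Capsule(0.5, 0.25, 1.0, 0.5), Capsule(0.5, 0.25, 1.0, 0.5)]
fits_strict = admits(strict, capacity=1.0)      # demand 1.5 exceeds 1.0
fits_tolerant = admits(tolerant, capacity=1.0)  # demand 0.75 fits
```

The same pair of capsules is rejected with zero tolerance and admitted once each tolerates violations, which is the effect overbooking relies on.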
Mapping Capsules to Nodes
❒ A bipartite graph of capsules and feasible nodes
❍ Greedy mapping: consider capsules in non-decreasing order of degree: O(c · log c)
❍ Guaranteed to find a placement if one exists!
❍ Multiple feasible nodes => best fit, worst fit, random…
[Figure: bipartite graph linking capsules 1 to 3 to their feasible nodes 1 to 4, followed by the final mapping of capsules to nodes]
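The greedy mapping can be sketched as follows. This is a simplified illustration, not the SHARC implementation: it assumes a single resource, a scalar demand per capsule, and a precomputed set of feasible nodes per capsule, and it makes no claim about the placement guarantee. The function `place` and all capsule and node names are hypothetical.

```python
from typing import Dict, Optional, Set


def place(demands: Dict[str, float],
          feasible: Dict[str, Set[str]],
          capacity: Dict[str, float],
          policy: str = "best") -> Optional[Dict[str, str]]:
    """Map capsules to nodes, considering capsules in non-decreasing order of degree."""
    if policy not in ("best", "worst"):
        raise ValueError("policy must be 'best' or 'worst'")
    remaining = dict(capacity)
    mapping: Dict[str, str] = {}
    # Degree of a capsule = number of nodes that are feasible for it.
    for cap in sorted(demands, key=lambda c: len(feasible[c])):
        fits = [n for n in sorted(feasible[cap]) if remaining[n] >= demands[cap]]
        if not fits:
            return None  # this greedy pass found no node for the capsule
        if policy == "best":
            node = min(fits, key=lambda n: remaining[n])  # tightest node
        else:
            node = max(fits, key=lambda n: remaining[n])  # emptiest node
        remaining[node] -= demands[cap]
        mapping[cap] = node
    return mapping


demands = {"c1": 0.5, "c2": 0.5, "c3": 0.25}
feasible = {"c1": {"n1", "n2"}, "c2": {"n1"}, "c3": {"n1", "n2", "n3"}}
capacity = {"n1": 0.5, "n2": 1.0, "n3": 1.0}
best = place(demands, feasible, capacity, "best")
worst = place(demands, feasible, capacity, "worst")
```

In this example the ordering matters: capsule c2 can only run on n1, so it is placed first. If c1 were placed first under best fit, it would take n1 and leave c2 without a node.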
Handling Flash Crowds
❒ Detect overloads by online profiling
[Figure: probability vs. fraction of CPU for the Apache Web Server in three panels: "Overload", "Expected Workload", and "Offline Profile"]
❒ Reacting to overloads (ongoing work)
❍ Compute new allocations
❍ Change allocations, move capsules, add servers
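The slides do not say how the online profile is compared with the offline one. One possible check, shown purely as an assumption, flags an overload when recent usage samples exceed the offline high-percentile allocation more often than the capsule's overbooking tolerance allows; `overloaded` and its parameters are hypothetical.

```python
from typing import List


def overloaded(online: List[float], offline_threshold: float, tolerance: float) -> bool:
    """True if the fraction of online samples above the offline threshold
    exceeds the overbooking tolerance."""
    if not online:
        return False
    exceed = sum(1 for u in online if u > offline_threshold)
    return exceed / len(online) > tolerance


# Hypothetical online samples against an offline threshold of 0.5 and tolerance 0.25.
expected = overloaded([0.25, 0.25, 0.5, 0.75], offline_threshold=0.5, tolerance=0.25)
flash_crowd = overloaded([0.25, 0.5, 0.75, 0.75], offline_threshold=0.5, tolerance=0.25)
```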
Talk Outline
➾ Introduction
➾ Inferring Resource Requirements
➾ Provisioning Resources
➾ Handling Dynamic Load Variations
❒ Experimental Evaluation
❒ Related Work
The SHARC Prototype
❒ A Linux-based Shared Hosting Platform
❍ 6 Dell PowerEdge 1550 servers
❍ Gigabit Ethernet link
❒ Software Components
❍ Profiling: Vanilla Linux + Linux trace toolkit
❍ Control plane: Overbooking, placement
❍ QoS-enhanced Linux kernel: HSFQ schedulers
Experimental Setup
❒ Prototype running on a 5 node cluster
❍ Each server: 1 GHz PIII with 512MB RAM and Gigabit Ethernet
❍ Control plane runs on a dedicated node
❍ Applications run on the other four nodes
❒ Workload: mix of server applications
❍ PostgreSQL database server with pgbench (TPC-B) benchmark
❍ Apache web server with SPECWeb99 (static & dynamic HTTP)
❍ MPEG streaming server with 1.5 Mb/s VBR MPEG-1 clients
❍ Quake I game server with “terminator” bots
Resource Overbooking Benefits
❒ Small amounts of overbooking can yield large gains
❍ Bursty applications yield larger benefits
[Figure: two plots against number of nodes (0 to 140), each with curves for No Ovb, Ovb=1%, and Ovb=5%: "Placement of Streaming Media Servers" (number of apps placed, 0 to 350) and "Placement of Apache Web Servers" (web servers placed, 0 to 1400)]
Capsule Placement Algorithms
❒ Diverse requirements: worst-fit outperforms others
❒ Similar requirements: all perform similarly
[Figure: two charts titled "Placement Algorithms, Ovb=5%" showing number of apps placed on 16, 32, and 64 nodes for Random, Best-fit, and Worst-fit; one y-axis spans 0 to 100, the other 0 to 3500]
Performance with Overbooking
❒ Performance degradation is within specified overbooking tolerance
Application   Metric           Isolated   100th   99th    95th    Avg
Apache        Tput (req/s)     67.9       67.51   66.91   64.81   39.8
PostgreSQL    Tput (trans/s)   22.8       22.46   22.21   21.78   9.04
Streaming     Viol (sec)       0          0       0.31    0.59    5.23
Related Work
❒ Single node resource management
❍ Proportional share schedulers: WFQ, SFQ, BVT, …
❍ Reservation based schedulers: Nemesis, Rialto, …
❒ Cluster-based resource management
❍ Cluster Reserves [Aron00], Aron thesis [Aron00]
❍ MUSE [Chase01]: economic approach
❍ Oceano [IBM], Planetary computing [HP]
❍ Clusters for high availability: Porcupine [Saito99]
❍ Grid computing
Concluding Remarks
❒ Resource management in shared hosting platforms
❍ Application profiling to determine resource usage
❍ Revenue maximization using controlled overbooking
❍ Ability to handle dynamic workloads (ongoing work)
❒ URL: http://lass.cs.umass.edu