ClouDiA: A Deployment Advisor for Public Clouds
Tao Zou, Ronan Le Bras, Marcos Vaz Salles*, Alan Demers, Johannes Gehrke
Cornell University, *University of Copenhagen (DIKU)
Instance Allocation in Public Clouds
[Figure: Cloud Provider’s View vs. Cloud Tenant’s View]
Instance Allocation in Public Clouds
Cloud Provider’s View
[Figure: network hierarchy with Core, Aggregation, Sub-Aggregation, and TOR switch layers]
Network Latencies in Public Clouds
• Mean Latency Heterogeneity in EC2
• Mean Latency Stability in EC2
Challenge: Can this be done out-of-the-box? (i.e., no changes to the cloud or the application)
Challenge: How to avoid long links?
Examples of Latency Sensitive Applications
• Scientific Simulation (time-to-solution)
• Key-value Store (response time)
• Search Aggregation (response time)
• Service Pipelines (response time)
Communication Graphs
Longest Link
Longest Path
Opportunity: Communication graphs are not complete.
A careful logical to physical mapping can help!
Opportunity: Gamble with over-allocation.
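To make the two objectives named on this slide concrete, here is a minimal Python sketch that scores a given logical-to-physical mapping. It assumes (these are not stated on the slide) that the communication graph is a list of directed edges, that it is acyclic for the longest-path case, that `plan` maps each logical node to an instance index, and that `cost[i][j]` is the measured latency between instances `i` and `j`. The function names are hypothetical.

```python
from functools import lru_cache


def longest_link(edges, plan, cost):
    """Largest link cost over all communication-graph edges under `plan`."""
    return max(cost[plan[u]][plan[v]] for u, v in edges)


def longest_path(edges, plan, cost):
    """Largest total cost along any directed path (graph assumed acyclic)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)

    @lru_cache(maxsize=None)
    def best_from(u):
        # Cost of the most expensive path starting at logical node u.
        return max(
            (cost[plan[u]][plan[v]] + best_from(v) for v in succ.get(u, [])),
            default=0,
        )

    nodes = {n for edge in edges for n in edge}
    return max(best_from(u) for u in nodes)


# Hypothetical example: 3 logical nodes placed on 3 instances.
edges = [(0, 1), (1, 2), (0, 2)]
plan = {0: 0, 1: 1, 2: 2}
cost = [[0, 1, 2],
        [1, 0, 2],
        [2, 2, 0]]
link_cost = longest_link(edges, plan, cost)
path_cost = longest_path(edges, plan, cost)
```

Only edges that appear in the communication graph contribute, which is why an incomplete graph leaves room to hide the slowest physical links.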
Architecture of ClouDiA
[Figure: workflow among ClouDiA, the Public Cloud, and the Tenant]
• Inputs: Communication Graph, Objectives
• Workflow:
  1. Allocate Instances (+ Extra Instances)
  2. Get Measurements
  3. Search Mapping
  4. Deployment Plan
  5. Terminate Extra Instances
  6. Start Application
Measuring Network Distance
• Goal: obtain a reliable estimate of link costs
• Approximations that are easy to obtain do not work
  – Hop counts
  – Length of common IP prefix
• Accurate distance: pair-wise network latencies
  – Large number of measurements for each pair
    • To observe enough latency jitter
  – Interference at end points
    • Heavily application and network dependent
    • Can’t be modeled exactly, so measure without interference
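For reference, one of the easy approximations mentioned on this slide, the length of the common IP prefix, can be computed as in the sketch below. The function name and the example addresses are hypothetical, and the sketch assumes IPv4. As the slide notes, this kind of proxy is not a reliable estimate of link cost.

```python
import ipaddress


def common_prefix_len(addr_a, addr_b):
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    diff = int(ipaddress.IPv4Address(addr_a)) ^ int(ipaddress.IPv4Address(addr_b))
    # bit_length() gives the position of the first differing bit from the left.
    return 32 - diff.bit_length()


# Hypothetical private addresses of two instances.
shared_bits = common_prefix_len("10.0.0.0", "10.0.1.0")
```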
Measuring Network Latencies
• Without interference
• Most efficient method: “staged”
  – No concurrent send/recv at end points
  – Parallelism with minimal coordination
[Figure: measurement pairs in Stage i and Stage i+1]
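One way to build stages in which no end point sends or receives more than one probe at a time is a round-robin pairing (the circle method). The sketch below is an illustration under that assumption; the slides do not say how the stages are actually constructed, and `staged_schedule` is a hypothetical name.

```python
def staged_schedule(n):
    """Split all instance pairs into stages of disjoint pairs.

    Within a stage every instance appears in at most one pair, so pairs in
    the same stage can be measured in parallel without end-point interference.
    """
    ids = list(range(n))
    if n % 2:
        ids.append(None)  # dummy slot: its partner sits this stage out
    m = len(ids)
    stages = []
    for _ in range(m - 1):
        stage = []
        for k in range(m // 2):
            a, b = ids[k], ids[m - 1 - k]
            if a is not None and b is not None:
                stage.append((a, b))
        stages.append(stage)
        # Keep the first slot fixed and rotate the remaining ones by one.
        ids = [ids[0], ids[-1]] + ids[1:-1]
    return stages


stages = staged_schedule(6)
```

For an even number of instances n this yields n - 1 stages with n / 2 pairs each, so every pair is measured exactly once per sweep; repeating the sweep collects more samples per pair.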
Search Mapping (Longest Link)
• NP-Hard; hard to approximate
  – To find a solution of cost
  – To find a solution of cost
  – To find a solution of cost
• Goal: find a “good” solution within timeout
• Two formulations:
  – Mixed-Integer Programming
    • O() boolean variables
  – Constraint Programming (*)
    • O() integer variables
    • An objective cost has to be given a priori
Searching using Constraint Programming
Given an objective c:
1. Remove all links with cost > c
2. Find a subgraph isomorphism
• A lot of distinct latency values
  – Bi-section search? Finding “no solution” takes time
  – k-means clustering? Works well with proper k
[Figure: example with k=4; cost over time, with timeouts marked]
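The two steps above can be sketched in plain Python with a naive backtracking search standing in for a constraint solver. This is an illustration only: it has no timeout, it is not the CP Optimizer model, and the function names, the edge-list representation, and the `thresholds` argument (for example, cluster centers of the measured latencies) are assumptions.

```python
def find_embedding(n_nodes, edges, cost, c):
    """Map logical nodes 0..n_nodes-1 to distinct instances using only links
    with cost <= c. Returns a dict node -> instance, or None if none exists."""
    assign, used = {}, set()

    def fits(u, inst):
        # Check every communication edge between u and an already placed node.
        for a, b in edges:
            if a == u and b in assign and cost[inst][assign[b]] > c:
                return False
            if b == u and a in assign and cost[assign[a]][inst] > c:
                return False
        return True

    def extend(u):
        if u == n_nodes:
            return True
        for inst in range(len(cost)):
            if inst not in used and fits(u, inst):
                assign[u] = inst
                used.add(inst)
                if extend(u + 1):
                    return True
                del assign[u]
                used.discard(inst)
        return False

    return dict(assign) if extend(0) else None


def search_longest_link(n_nodes, edges, cost, thresholds):
    """Try candidate objectives from loosest to tightest; keep the last one
    for which an embedding was found."""
    best = None
    for c in sorted(thresholds, reverse=True):
        plan = find_embedding(n_nodes, edges, cost, c)
        if plan is None:
            break
        best = (c, plan)
    return best


# Hypothetical example: a 3-node chain placed on 4 allocated instances.
edges = [(0, 1), (1, 2)]
cost = [[0, 1, 5, 2],
        [1, 0, 2, 5],
        [5, 2, 0, 1],
        [2, 5, 1, 0]]
best = search_longest_link(3, edges, cost, thresholds=[1, 2, 5])
```

Proving that no embedding exists for a tight threshold means exhausting the search space, which is the reason the slide is wary of bi-section search; a short list of candidate thresholds keeps the number of such failed attempts small.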
Experimental Settings
• 100 to 150 m1.large instances in EC2
• IBM ILOG CPLEX Optimizer/CP Optimizer
  – Multi-core, a single machine
• Three workloads:
  – Behavioral simulation
  – Synthetic aggregation query
  – Key-value store
Effect of Over-Allocation
• 100 instances + 10%-50% over-allocation
• Get Measurements + Search Deployment < 10 minutes
Overall Improvement
• 10% over-allocation
Conclusion
• ClouDiA is a deployment advisor for public clouds
  – Out-of-the-box
  – 15%-55% time reduction for latency sensitive applications
• Could be adopted by cloud services providers
  – With API changes
Thank you!