Volley: Automated Data Placement for Geo-Distributed Cloud Services

Upload: glain

Post on 24-Feb-2016



TRANSCRIPT

Page 1: Volley: Automated Data Placement  for Geo-Distributed Cloud Services

Volley: Automated Data Placement for Geo-Distributed Cloud Services

Page 2
Page 3

Why is data placement important?

Minimize latency

Eliminate redundant cost

Optimize utilization of data centers

• Users want lower latency

• Cloud service operators want to limit cost

• Partitioning data across DCs

Page 4

Commercial cloud service trace analysis

Live Messenger and Live Mesh

• Traces cover all users and devices that accessed these services over an entire month

• Clients are identified by application-level unique identifiers

Page 5

Challenges of data placement

• Geographic Diversity

• Data Sharing

• Data Inter-Dependency

• Datacenter Capacity

• Client Mobility

Page 6

Challenge: Geographic Diversity

Page 7

Challenge: Data Sharing

Page 8

Challenge: Data Inter-Dependency

Data inter-dependency in Live Mesh

Page 9

Challenge: Datacenter Capacity

The rush in industry to build additional datacenters is motivated in part by reaching the capacity constraints of individual datacenters as new users are added. This in turn requires automatic mechanisms to rapidly migrate application data to new datacenters to take advantage of their capacity.

Page 10

Challenge: User Mobility

Page 11

Proven algorithms do not apply to this problem

Page 12

Volley

Page 13

Volley Algorithm

Three phases

Phase 1: Compute initial placement

Phase 2: Iteratively move data to reduce latency

Phase 3: Iteratively collapse data to datacenters
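
The collapse step can be pictured as a capacity-aware assignment. The sketch below is only an illustration under stated assumptions, not the procedure from the Volley paper: the names `collapse`, `distance_km`, `access_count` and `capacity` are hypothetical, capacity is counted in number of items, and the spill-over rule (the least-accessed items are the ones pushed to the next closest datacenter) is an assumption.

```python
def collapse(distance_km, access_count, capacity):
    """Assign each item to its closest datacenter that still has room.

    distance_km:  {item: {datacenter: distance from the item's ideal location}}
    access_count: {item: number of accesses}            (hypothetical input)
    capacity:     {datacenter: maximum number of items} (hypothetical unit)
    """
    load = {dc: 0 for dc in capacity}
    placement = {}
    # Most-accessed items choose first, so least-accessed items spill over.
    for item in sorted(access_count, key=access_count.get, reverse=True):
        by_distance = sorted(distance_km[item], key=distance_km[item].get)
        for dc in by_distance:
            if load[dc] < capacity[dc]:
                placement[item] = dc
                load[dc] += 1
                break
        else:
            raise ValueError("total datacenter capacity exceeded")
    return placement


distances = {"x": {"A": 10.0, "B": 500.0}, "y": {"A": 20.0, "B": 400.0}}
print(collapse(distances, {"x": 100, "y": 5}, {"A": 1, "B": 2}))
# prints {'x': 'A', 'y': 'B'}
```

Both items are closest to datacenter A, but A holds only one item, so the less frequently accessed item `y` is placed in B.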

Page 14

Data placement heuristics

• Common IP: put data close to the IP address that accesses it most frequently

• oneDC: put all data in one datacenter

• Hash: randomly allocate data

• Volley
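
The three baseline heuristics are simple enough to sketch in a few lines. This is a minimal illustration, not code from the paper: the function names, the datacenter names and coordinates are hypothetical, access IPs are assumed to be already geolocated to (latitude, longitude) pairs, and the Hash baseline is modeled with a deterministic SHA-256 hash rather than a random draw.

```python
import hashlib
import math
from collections import Counter


def great_circle_km(p, q):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def common_ip(access_locations, datacenters):
    """Datacenter closest to the most frequent access location."""
    top_location, _ = Counter(access_locations).most_common(1)[0]
    return min(datacenters,
               key=lambda name: great_circle_km(datacenters[name], top_location))


def one_dc(chosen):
    """Every item goes to the same, pre-chosen datacenter."""
    return chosen


def hash_placement(item_id, datacenters):
    """Spread items across datacenters by hashing the item identifier."""
    names = sorted(datacenters)
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return names[int.from_bytes(digest[:8], "big") % len(names)]


# Hypothetical datacenters and accesses, for illustration only.
dcs = {"us-west": (47.0, -122.0), "eu-west": (53.0, -6.0)}
accesses = [(51.5, -0.1), (51.5, -0.1), (40.7, -74.0)]
print(common_ip(accesses, dcs))  # prints eu-west
print(hash_placement("item-42", dcs))
```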

Page 15

Evaluation

Metrics

• Capacity Skew

• Inter-Datacenter Traffic

• Latency

Page 16

Capacity Skew

Hash > Volley > Common IP > oneDC

Page 17

Inter-Datacenter Traffic

oneDC > Volley > Common IP > Hash

Page 18

Latency

Volley > Common IP > oneDC > Hash

Page 19

Evaluation

Rankings, where ">" means "performs better than":

• Capacity skew: Hash > Volley > Common IP > oneDC

• Inter-DC traffic: oneDC > Volley > Common IP > Hash

• Latency: Volley > Common IP > oneDC > Hash

Page 20

Improvement of Volley

Iteration Count

• In Phase 2, additional iterations do not yield significant improvement

• 5 iterations are enough

• Phase 3 determines the capacity skew

Re-computation

• Re-computation does make sense

• Reason: data migration

Page 21

Conclusion

Data placement is vital in cloud services

Volley has a comprehensive advantage

• simultaneously reduces user latency and operator cost

• reduces datacenter capacity skew by over 2X

• reduces inter-DC traffic by over 1.8X

• reduces user latency by 30% at the 75th percentile

• runs in under 16 clock-hours for 400 machine-hours of computation across 1 week of traces

Re-computation of the Volley algorithm is necessary

Page 22

Let’s go on…

Limitations of the evaluation conducted by the paper

• No good contrast

• Can geo-distance stand for latency?

• Client mobility?

Large space for development

Page 23

Thank You!

Page 24

Phase 1: Calculate the geographic centroid for each data item
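
One way to compute a geographic centroid is sketched below. It is an approximation chosen for brevity, not necessarily the interpolation scheme used in the paper: each access location is converted to a 3D unit vector, the vectors are averaged with access counts as weights, and the result is projected back onto the sphere. The function names and the (latitude, longitude, weight) input format are assumptions.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def to_lat_lon(x, y, z):
    """Convert a (not necessarily normalized) 3D vector back to degrees."""
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def weighted_centroid(accesses):
    """accesses: iterable of (lat_deg, lon_deg, weight) tuples."""
    sx = sy = sz = 0.0
    for lat, lon, weight in accesses:
        x, y, z = to_unit_vector(lat, lon)
        sx += weight * x
        sy += weight * y
        sz += weight * z
    if sx * sx + sy * sy + sz * sz < 1e-24:
        # Weighted points cancel out (e.g. antipodal); no unique centroid.
        raise ValueError("centroid is undefined for these inputs")
    return to_lat_lon(sx, sy, sz)


# Two equally weighted accesses on the equator, 90 degrees apart:
# the centroid lies midway, at roughly latitude 0, longitude 45.
print(weighted_centroid([(0.0, 0.0, 1.0), (0.0, 90.0, 1.0)]))
```

Working with unit vectors avoids the wrap-around problem that a plain average of longitudes has near the 180° meridian.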

Page 25

Phase 2: Iteratively refine the centroid for each data item

• considering client locations and data inter-dependencies

• using a weighted spring model that attracts data items to one another and to their clients, but on a spherical coordinate system
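
A spring-style update on a sphere can be sketched as follows. This is an illustrative model under stated assumptions, not the update rule from the paper: positions are 3D unit vectors, each attractor (a client location or the current position of a dependent data item) pulls the item along the great circle by a fraction proportional to its weight, and `step`, `slerp` and `refine` are hypothetical names.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def to_lat_lon(v):
    x, y, z = v
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def slerp(a, b, t):
    """Point a fraction t of the way from a to b along the great circle."""
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)
    s = math.sin(omega)
    if s < 1e-12:
        # Identical or antipodal points: no unique path, so stay put.
        return a
    ka = math.sin((1.0 - t) * omega) / s
    kb = math.sin(t * omega) / s
    return tuple(ka * p + kb * q for p, q in zip(a, b))


def refine(position, attractors, step=0.5):
    """One iteration: pull position toward each (point, weight) attractor."""
    total = sum(weight for _, weight in attractors)
    if total <= 0:
        return position
    for point, weight in attractors:
        position = slerp(position, point, step * weight / total)
    return position


# One client at longitude 90 pulls an item that starts at longitude 0.
item = to_unit_vector(0.0, 0.0)
client = to_unit_vector(0.0, 90.0)
for _ in range(5):
    item = refine(item, [(client, 1.0)])
print(to_lat_lon(item))
```

To model data inter-dependencies, the attractor list would also include the current positions of the items this item communicates with, weighted by how often they interact.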

Page 26
Page 27