donar decentralized server selection for cloud …...–ultradns –akamai global traffic management...

51
DONAR Decentralized Server Selection for Cloud Services Patrick Wendell, Princeton University Joint work with Joe Wenjie Jiang, Michael J. Freedman, and Jennifer Rexford

Upload: others

Post on 04-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

DONAR Decentralized Server Selection

for Cloud Services

Patrick Wendell, Princeton University

Joint work with Joe Wenjie Jiang,

Michael J. Freedman, and Jennifer Rexford

Page 2: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Outline

• Server selection background

• Constraint-based policy interface

• Scalable optimization algorithm

• Production deployment

Page 3: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

User Facing Services are Geo-Replicated

Page 4: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Reasoning About Server Selection

Service Replicas

Client Requests

Mapping Nodes

Page 5: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Example: Distributed DNS

Client 1

Client C

DNS 1

DNS 2

DNS 10

Servers Auth. Nameservers

Client 2

Clients Mapping Nodes Service Replicas

DNS Resolvers

Page 6: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Example: HTTP Redir/Proxying

Client 1

Client C

Datacenters HTTP Proxies

Client 2

Clients Mapping Nodes Service Replicas

HTTP Clients

Proxy 1

Proxy 2

Proxy 500

Page 7: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Reasoning About Server Selection

Service Replicas

Client Requests

Mapping Nodes

Page 8: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Reasoning About Server Selection

Service Replicas

Client Requests

Mapping Nodes

Outsource to DONAR

Page 9: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Outline

• Server selection background

• Constraint-based policy interface

• Scalable optimization algorithm

• Production deployment

Page 10: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Naïve Policy Choices Load-Aware: “Round Robin”

Service Replicas

Client Requests

Mapping Nodes

Page 11: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Naïve Policy Choices Location-Aware: “Closest Node”

Service Replicas

Client Requests

Mapping Nodes

Goal: support complex policies

across many nodes.

Page 12: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Policies as Constraints

Replicas DONAR Nodes

bandwidth_cap = 10,000 req/m

split_ratio = 10% allowed_dev = ± 5%

Page 13: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Eg. 10-Server Deployment

How to describe policy with constraints?

Page 14: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

No Constraints Equivalent to “Closest Node”

2% 6%

10%

1% 1% 7%

2%

28%

9%

35% Requests per Replica

Page 15: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

No Constraints Equivalent to “Closest Node”

2% 6%

10%

1% 1% 7%

2%

28%

9%

35% Requests per Replica

Impose 20% Cap

Page 16: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Cap as Overload Protection

2% 6%

10%

1% 1% 7%

14% 20% 20% 20%

Requests per Replica

Page 17: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

12 Hours Later…

5%

16%

29%

4% 3%

16%

3% 10% 12%

3%

Requests per Replica

Page 18: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

“Load Balance” (split = 10%, tolerance = 5%)

Requests per Replica

5% 5% 5% 5% 5%

15% 15% 15% 15% 15%

Page 19: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

“Load Balance” (split = 10%, tolerance = 5%)

Requests per Replica

5% 5% 5% 5% 5%

15% 15% 15% 15% 15%

Trade-off network proximity & load distribution

Page 20: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

12 Hours Later…

Requests per Replica

7%

15% 15% 15%

5%

13%

5% 10% 10%

5%

Large range of policies by varying cap/weight

Page 21: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Outline

• Server selection background

• Constraint-based policy interface

• Scalable optimization algorithm

• Production deployment

Page 22: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Optimization: Policy Realization

• Global LP describing “optimal” pairing

Clients: c ∈ C Nodes: n ∈ N Replica Instances: i ∈ I

Minimize network cost

min α𝒄 ∙ 𝑅𝑐𝑖 ∙ 𝑐𝑜𝑠𝑡(𝑐, 𝑖)

𝑖∈𝐼𝑐∈𝐶

Server loads within tolerance

𝑃𝑖 −ω𝑖 ≤ 𝜀𝑖

Bandwidth caps met

𝐵𝑖 < 𝐵 ∙ 𝑃𝑖

s.t.

Page 23: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Optimization Workflow

Measure Traffic

Track Replica Set

Calculate Optimal

Assignment

Page 24: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Optimization Workflow

Measure Traffic

Track Replica Set

Calculate Optimal

Assignment

Per-customer!

Page 25: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Optimization Workflow

Measure Traffic

Track Replica Set

Calculate Optimal

Assignment

Continuously!

(respond to underlying traffic)

Page 26: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

By The Numbers

101 102

103

104

DONAR Nodes

Customers

replicas/customer

client groups/ customer

Problem for each customer: 102 * 104 = 106

Page 27: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Measure Traffic & Optimize Locally?

Service Replicas

Mapping Nodes

Page 28: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Not Accurate!

Service Replicas

Mapping Nodes

Client Requests

No one node sees entire client population

Page 29: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Aggregate at Central Coordinator?

Service Replicas

Mapping Nodes

Page 30: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Aggregate at Central Coordinator?

Service Replicas

Mapping Nodes

Share Traffic Measurements

(106)

Page 31: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Aggregate at Central Coordinator?

Service Replicas

Mapping Nodes

Optimize

Page 32: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Aggregate at Central Coordinator?

Service Replicas

Mapping Nodes

Return assignments

(106)

Page 33: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

So Far

Accurate Efficient Reliable

Local only No Yes Yes

Central Coordinator

Yes No No

Page 34: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Decomposing Objective Function

min α𝒄 ∙ 𝑅𝑐𝑖 ∙ 𝑐𝑜𝑠𝑡(𝑐, 𝑖)

𝑖∈𝐼𝑐∈𝐶

𝑠𝑛 α𝑐𝑛 ∙ 𝑅𝑛𝑐𝑖 ∙ 𝑐𝑜𝑠𝑡(𝑐, 𝑖)

𝑖∈𝐼𝑐∈𝐶𝑛∈𝑁

Traffic from c

prob of mapping c to i

cost of mapping c to i

∀ clients ∀ instances

=

∀ nodes Traffic to this node

We also decompose constraints

(more complicated)

Page 35: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Decomposed Local Problem For Some Node (n*)

min

loadi = f(prevailing load on each server + load I will impose on each server)

∀𝑖𝑙𝑜𝑎𝑑𝑖 + 𝑠𝑛∗ α𝑐𝑛∗ ∙ 𝑅𝑛∗𝑐𝑖 ∙ 𝑐𝑜𝑠𝑡 𝑐, 𝑖

𝑖∈𝐼𝑐∈𝐶

Local distance minimization

Global load information

Page 36: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

DONAR Algorithm

Service Replicas

Mapping Nodes

Solve local problem

Page 37: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

DONAR Algorithm

Service Replicas

Mapping Nodes

Solve local problem

Share summary data

w/ others (102)

Page 38: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

DONAR Algorithm

Service Replicas

Mapping Nodes

Solve local problem

Page 39: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

DONAR Algorithm

Service Replicas

Mapping Nodes

Share summary data w/ others

(102)

Page 40: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

DONAR Algorithm

• Provably converges to global optimum

• Requires no coordination

• Reduces message passing by 104

Service Replicas

Mapping Nodes

Page 41: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Better!

Accurate Efficient Reliable

Local only No Yes Yes

Central Coordinator

Yes No No

DONAR Yes Yes Yes

Page 42: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Outline

• Server selection background

• Constraint-based policy interface

• Scalable optimization algorithm

• Production deployment

Page 43: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Production and Deployment

• Publicly deployed 24/7 since November 2009

• IP2Geo data from Quova Inc. • Production use:

– All MeasurementLab Services (incl. FCC Broadband Testing)

– CoralCDN

• Services around 1M DNS requests per day

Page 44: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Systems Challenges (See Paper!)

• Network availability Anycast with BGP

• Reliable data storage Chain-Replication with Apportioned Queries

• Secure, reliable updates Self-Certifying Update Protocol

Page 45: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

CoralCDN Replicas

DONAR Nodes

Client Requests

CoralCDN Experimental Setup

split_weight = .1 tolerance = .02

Page 46: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Results: DONAR Curbs Volatility

“Closest Node” policy

DONAR “Equal Split” Policy

Page 47: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Results: DONAR Minimizes Distance

1 2 3 4 5 6 7 8 9 10

Re

qu

est

s p

er

Re

plic

a

Ranked Order from Closest

Minimal (Closest Node)

DONAR

Round-Robin

Page 48: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Conclusions

• Dynamic server selection is difficult – Global constraints – Distributed decision-making

• Services reap benefit of outsourcing to DONAR.

– Flexible policies – General: Supports DNS & HTTP Proxying – Efficient distributed constraint optimization

• Interested in using? Contact me or visit

http://www.donardns.org.

Page 49: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Questions?

Page 50: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Related Work (Academic and Industry)

• Academic – Improving network measurement

• iPlane: An informationplane for distributed services H. V. Madhyastha, T. Isdal, M. Piatek, C. Dixon, T. Anderson, A. Krishnamurthy, and A. Venkataramani, “,” in OSDI, Nov. 2006

– “Application Layer Anycast” • OASIS: Anycast for Any Service

Michael J. Freedman, Karthik Lakshminarayanan, and David Mazières Proc. 3rd USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI '06) San Jose, CA, May 2006.

• Proprietary

– Amazon Elastic Load Balancing – UltraDNS – Akamai Global Traffic Management

Page 51: DONAR Decentralized Server Selection for Cloud …...–UltraDNS –Akamai Global Traffic Management Doesn’t [Akamai/UltraDNS/etc] Already Do This? •Existing approaches use alternative,

Doesn’t [Akamai/UltraDNS/etc] Already Do This?

• Existing approaches use alternative, centralized formulations.

• Often restrict the set of nodes per-service.

• Lose benefit of large number of nodes (proxies/DNS servers/etc).