Minimizing Churn in Distributed Systems

P. Brighten Godfrey, Scott Shenker, and Ion Stoica

UC Berkeley

SIGCOMM’06

Road Map

Introduction
Simulation
  Basic Properties
Analysis
Applications
Discussion
Conclusion

Introduction

Churn: change in the set of participating nodes due to joins, graceful leaves, and failures

A quantitative guide to the churn resulting from selection strategies

Analytically characterize the performance of strategies

Compare the performance of strategies with different real traces

Churn Simulations Model

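System model
  Node status: up (in use or available) or down
  Nodes in use: $k = \alpha n$, $0 < \alpha \le 1$

Definition of churn

$$C = \frac{1}{T} \sum_{\text{events } i} \frac{|U_{i-1} \ominus U_i|}{\max\{|U_{i-1}|, |U_i|\}}$$

  $U_i$ is the set of nodes in use after event $i$, $\ominus$ is the symmetric difference, and $T$ is the run time

Example: two nodes fail and are replaced by others

$$C = \frac{1}{T}\left(\frac{2}{k} + \frac{2}{k}\right)$$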
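The churn definition can be checked numerically. The following is a minimal sketch, assuming each state of the system is given as a Python set of in-use node IDs; the function name `churn` and the node IDs are illustrative and do not come from the slides.

```python
def churn(in_use_sets, run_time):
    """Sum over events of |symmetric difference| / max set size, divided by run time."""
    total = 0.0
    for prev, cur in zip(in_use_sets, in_use_sets[1:]):
        changed = len(prev ^ cur)  # size of the symmetric difference
        if changed:
            total += changed / max(len(prev), len(cur))
    return total / run_time


# Slide example: two of k = 10 nodes fail and are then replaced by two others.
k = 10
start = set(range(k))
after_failure = start - {0, 1}
after_replacement = after_failure | {10, 11}
example = churn([start, after_failure, after_replacement], run_time=1.0)
```

With `run_time=1.0` the result equals 2/k + 2/k, matching the example on the slide.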

Selection Strategies

Predictive fixed strategies
  Fixed Decent: select randomly from the 50% of nodes with more uptime
  Fixed Most Available: the most time up
  Fixed Longest Lived: greatest average session time

Agnostic fixed strategies
  Fixed Random

Predictive replacement strategies
  Max Expectation: greatest expected remaining uptime
  Longest Uptime: longest current uptime
  Optimal

Agnostic replacement strategies
  Random Replacement (RR)
  Passive Preference List: fail and then replace
  Active Preference List
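Three of the replacement strategies listed above can be sketched as functions that pick a replacement when an in-use node fails. This is an illustrative sketch only: the function names, the `up_since` map, and the reading of Passive Preference List as "take the highest-ranked available node" are assumptions, not the authors' code.

```python
import random


def random_replacement(available, rng):
    """RR: pick a replacement uniformly at random among available nodes."""
    return rng.choice(sorted(available))


def longest_uptime(available, up_since, now):
    """Longest Uptime: pick the available node that has been up the longest."""
    return max(sorted(available), key=lambda node: now - up_since[node])


def passive_preference_list(available, preference):
    """Passive PL (assumed reading): after a failure, take the best-ranked available node."""
    for node in preference:
        if node in available:
            return node
    raise ValueError("no available node to select")


available = {"a", "b", "c"}
up_since = {"a": 5.0, "b": 1.0, "c": 3.0}  # hypothetical times at which each node came up
by_uptime = longest_uptime(available, up_since, now=10.0)
by_rank = passive_preference_list({"a", "b"}, preference=["c", "a", "b"])
by_chance = random_replacement(available, random.Random(0))
```

Note that only `longest_uptime` needs any knowledge of node history; the other two are agnostic in the sense used on the slide.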

Traces

Synthetic traces
  PDF: $f(x) = \dfrac{a\,b^{a}}{(x + b)^{a+1}}$
  a = 1.5 and b fixed so that the mean is 30 minutes
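Session times with this PDF can be drawn by inverting the CDF. The sketch below assumes the density is supported on x >= 0, so that the CDF is 1 - (b / (x + b))^a and the mean is b / (a - 1) for a > 1; the helper names are hypothetical.

```python
import random


def scale_for_mean(a, mean):
    """Return b such that the distribution has the requested mean (needs a > 1)."""
    if a <= 1:
        raise ValueError("the mean is finite only for a > 1")
    return mean * (a - 1)


def quantile(u, a, b):
    """Inverse CDF: solve 1 - (b / (x + b))**a = u for x, with 0 <= u < 1."""
    return b * ((1.0 - u) ** (-1.0 / a) - 1.0)


def sample_session(rng, a, b):
    """Draw one session time by inverse-transform sampling."""
    return quantile(rng.random(), a, b)


a = 1.5
b = scale_for_mean(a, mean=30.0)  # minutes
median = quantile(0.5, a, b)
one_session = sample_session(random.Random(1), a, b)
```

Under these assumptions b works out to 15 minutes. Because the tail is heavy (the variance is infinite for a = 1.5), the sample mean of a short trace can sit far from 30 minutes.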

Simulation Setup

Event-based simulator
  The selection algorithm reacts immediately after each change

Chord protocol simulator
  No loss, except when a node fails while a datagram is in flight

At least 10 trials
Sample 1000 random nodes
95% confidence intervals
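A 95% confidence interval over 10 trials can be computed as sketched below. The slides do not say which interval construction was used; this sketch assumes a Student t interval, and the default critical value 2.262 applies only to exactly 10 trials (9 degrees of freedom). The churn values are made up for illustration.

```python
import math
import statistics

# Two-sided 95% Student t critical value for 9 degrees of freedom (10 trials).
T_975_DF9 = 2.262


def confidence_interval_95(trial_values, t_critical=T_975_DF9):
    """Return (low, high) for the mean of the trials under a t-interval assumption."""
    n = len(trial_values)
    mean = statistics.mean(trial_values)
    half_width = t_critical * statistics.stdev(trial_values) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical churn measurements from 10 trials.
trials = [0.41, 0.39, 0.40, 0.42, 0.38, 0.40, 0.43, 0.37, 0.41, 0.39]
low, high = confidence_interval_95(trials)
```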

Basic Properties

Synthetic Pareto lifetimes
Fixed k = 50
Fixed strategies are the same
  The same mean session time

Benefit of Replacement Strategies

1.3 to 5 times improvement

Dynamically selecting nodes for long-running distributed applications would be worthwhile

Benefit of Replacement Strategies

The best fixed strategies match the performance of the best replacement ones

The traces are shorter

Agnostic Strategies

RR is worse for small k, but is within a factor of 2 of Max Expectation

RR is 1.2~3 times better than Passive and 2.5~10 times better than Active PL

Analysis of Fixed and PL strategies

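Fixed strategies
  Nodes recover instantaneously
  Each failure and recovery, normalized by time: $\frac{1}{T}\left(\frac{1}{k} + \frac{1}{k}\right) = \frac{2}{kT}$
  Number of failures of a node: $T$ divided by its mean session time
  Expected churn: $\frac{2}{kT}$ times the expected total number of failures of the $k$ selected nodes

Passive Preference List strategies
  If k is large, then the same as Fixed strategies

Active Preference List strategies
  It pays more, because it switches back after the node recovers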

Analysis of Random Replacement

Intuition
  Waiting time paradox
  RR is (roughly) selecting the current session of a random node
  This is biased towards longer sessions

RR does very badly when stable nodes are rare
  One node has mean session time r >> 1 and the others have mean 1
  Churn of RR is about 2, while that of the best fixed strategy is $2/r$

Churn rate
  [closed-form expression for $E[C]$ in terms of the session lengths $L_i$; the equation did not survive extraction]
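The waiting time paradox behind this intuition can be illustrated with a small exact calculation. The sketch assumes a node alternates between sessions of the listed lengths with instantaneous recovery; the session lengths 1 and 9 are made up for illustration.

```python
from fractions import Fraction


def mean_session(lengths):
    """Average session length when every session is counted once."""
    return Fraction(sum(lengths), len(lengths))


def mean_session_at_random_time(lengths):
    """Average length of the session in progress at a uniformly random time.

    A session of length L covers L time units, so it is hit with
    probability L / sum(lengths).
    """
    total = sum(lengths)
    return sum(Fraction(length, total) * length for length in lengths)


sessions = [1, 9]
per_session = mean_session(sessions)                # 5
in_progress = mean_session_at_random_time(sessions)  # 41/5
```

Picking the session that is in progress at a random moment lands in the long session 90% of the time, which is why selecting a node that happens to be up favors longer sessions.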

Analysis of Random Replacement

Agreement of the analysis with a simulation for n = 20 and the previous Pareto-distributed session time plot

Characteristics of Random Replacement

X’ is more skewed than X
  If E[X’] = E[X], then $E[X' \mid X' \ge x'] \ge E[X \mid X \ge x]$
  x’ and x are the yth percentile values of X’ and X

The churn of RR decreases as the distributions become more “skewed”

If the session time distributions are stable and have equal mean, RR’s expected churn is at most twice the expected churn of any fixed or Preference List strategy

Anycast

Whenever its current server fails, each client obtains a list of the m servers to which it has the lowest latency and connects to a random one of these m
  Switching to another server is not counted

Latencies were obtained from a synthetic edge network delay space generator
  It is modeled on measurements of latency between DNS servers
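The server selection rule described above can be sketched as follows. The function name, the latency map, and the restriction of the list to servers that are currently up are assumptions made for illustration.

```python
import random


def pick_server(latency_ms, up, m, rng):
    """Choose uniformly at random among the m lowest-latency servers that are up."""
    candidates = sorted(
        (server for server in latency_ms if server in up),
        key=lambda server: latency_ms[server],
    )[:m]
    if not candidates:
        raise ValueError("no server is up")
    return rng.choice(candidates)


# Hypothetical latencies from one client to four servers, in milliseconds.
latency_ms = {"s1": 12.0, "s2": 48.0, "s3": 25.0, "s4": 90.0}
up = {"s2", "s3", "s4"}
nearest = pick_server(latency_ms, up, m=1, rng=random.Random(0))
one_of_two = pick_server(latency_ms, up, m=2, rng=random.Random(0))
```

With m = 1 the rule always takes the lowest-latency live server, and with m equal to the number of servers it reduces to a uniformly random choice.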

Anycast

Trade-off between server list size m and latency t
  t increases => Passive PL
  m increases => RR

Hybrid: rank servers by combining $\omega \cdot \text{latency}$ and $(1 - \omega) \cdot \text{uptime}$
  ω decreases: from Passive PL to Longest Uptime

Anycast

When session time is small, the end host experiences the mean server failure rate, as in Active PL

DHT Neighbor Selection

Long-distance neighbors
  Deterministic topology (Active PL)
  Randomized topology (RR)

Simulation
  Sample n nodes from the Gnutella trace
  Feed them into the Chord protocol simulator
  Two nodes send messages to the node with a single key
  It counts as failed when two messages are lost

DHT Neighbor Selection

Randomized topologies are more stable, but have slightly longer routes

Randomized topologies can also reduce maintenance bandwidth

Multicast

Select one of m suitable nodes as parent
  Suitable: has available bandwidth to serve another child

Strategies
  Longest Uptime, Minimum Depth, Minimum Latency

Homogeneous bandwidth

DHT Replica Placement

Root set (Passive PL)
  Nodes with IDs closer to the key (object) should keep the replica

Root directory (RR)
  The directory is replicated in the same way as the root set
  Replicas may be on any node in the system

Simulation
  Lazy replication
  On equal footing

DHT Replica Placement

There are many permanent failures in Gnutella traces

Discussion

When would one use Random Replacement?
  To minimize churn: Longest Uptime
  RR would be easier to implement
  Uptime is not easy to determine
    Network problems, liars

What about load balance?
  The results do not address fairness between users

Conclusion

A guide to performance of a range of node selection strategies in real-world traces

Highlight and explain analytically the good performance of RR relative to smart strategies

Explain the performance implications of a variety of existing distributed systems designs