
A World of (Im)Possibilities

Nancy Lynch Celebration: Sixty and Beyond

Hagit Attiya, Technion

Jennifer Welch, Texas A&M University

PODC/Concur 2008

Introduction

One of the main themes of Nancy's work has been proving lower bounds and impossibility results for problems that arise in distributed computing.

Overview of some of Nancy's results:
- less-known results, hidden gems closer to our hearts

Emphasize their meaning and implications:
- how they influenced the development of the field and of distributed systems
- concentrating on their positive impact


Best-Known Example: FLP

Impossibility of asynchronous fault-tolerant consensus [Fischer, Lynch, Paterson]

Motivated work on:
- strengthening models of computation
  - partially synchronous models [Dwork, Lynch, Stockmeyer]
  - unreliable failure detectors [Chandra, Toueg]
- weakening the problem definition
  - k-set agreement [Chaudhuri]
  - renaming [Attiya et al.]
  - condition-based approaches [Raynal, Rajsbaum et al.]


FLP: Impact

Related practical problems:
- transaction commit
- leader election
- atomic broadcast
- maintaining consistent replicated data

The wait-free hierarchy, classifying concurrent abstract data types [Herlihy]

Attempts to solve k-set agreement and renaming led to the application of topology in distributed computing [Chaudhuri] [Borowsky, Gafni] [Saks, Zaharoglou] [Herlihy, Shavit].


2nd Example: Brewer's Conjecture

[Brewer, PODC 2000 invited talk]

A web service cannot provide all three guarantees:
- Consistency
- Availability
- Partition-tolerance


What Does This Mean?

[Gilbert, Lynch, SIGACT News 2002]

A web service cannot provide all three guarantees:
- Consistency: atomicity of (read/write) operations
- Availability: every request by a nonfaulty client gets a response
- Partition-tolerance: the guarantees hold even when lost messages create two partitioned components in the network


Proof Idea (adapted from [Attiya, Bar-Noy, Dolev])

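Partition the network so that every message between p0 and p1 is lost:
- Exec 1: p0 writes 1; by availability, the write completes without hearing from p1.
- Exec 2: p1 reads; by availability, the read completes, and it returns 0.
- Exec 3: p0 writes 1, and then p1 reads 0.

Exec 2 and Exec 3 look the same to p1, so its read returns 0 in Exec 3 even though the write of 1 has already completed, violating atomicity.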
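The three executions can be replayed in a toy Python model. This is a sketch under stated assumptions, not the construction from the paper: two hypothetical `Replica` objects each answer requests locally (so the service stays available), and the partition silently drops every message. p1's view, and hence its read, is identical in Exec 2 and Exec 3, which is the indistinguishability the proof relies on.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One process's copy of register x, initially 0 (hypothetical toy model)."""
    value: int = 0
    view: list = field(default_factory=list)  # everything this process observes

    def write(self, v, outbox):
        self.value = v
        self.view.append(("write", v))
        outbox.append(("update", v))  # propagation message; may be lost
        return "ack"                  # available: reply without waiting for the peer

    def read(self):
        self.view.append(("read", self.value))
        return self.value

    def deliver(self, msg):
        self.view.append(("recv", msg))
        self.value = msg[1]


def execute(p0_writes, p1_reads, partitioned=True):
    """p0 writes 1 (if asked), then p1 reads (if asked); returns p1's view and result."""
    p0, p1 = Replica(), Replica()
    in_transit = []
    if p0_writes:
        p0.write(1, in_transit)
    if not partitioned:
        for msg in in_transit:
            p1.deliver(msg)
    # Under a partition, everything in transit is dropped.
    result = p1.read() if p1_reads else None
    return p1.view, result


view2, read2 = execute(p0_writes=False, p1_reads=True)  # Exec 2
view3, read3 = execute(p0_writes=True, p1_reads=True)   # Exec 3
print(view2 == view3, read3)  # True 0
```

The model fixes one particular protocol for concreteness; the argument itself does not depend on it, since under the partition nothing p0 does ever reaches p1.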


Brewer's Conjecture: Implications

Traditional database services maintain consistency and fail to provide availability in the face of partitions.

- Relax the consistency guarantees of the web service: sometimes miss values or return stale data (Internet queries) [PIER: Huebsch, Hellerstein, Lanham, Loo, Shenker, Stoica]
- Allow partitions to evolve separately, and build mechanisms to cope when this happens (stream processing) [Medusa: Balazinska, Balakrishnan, Stonebraker]
- Sacrifice availability, but not often (stream processing)… [BOREALIS: Balazinska, Balakrishnan, Madden, Stonebraker]
- Assume a mechanism to guard against partitions… [CQ: Shah, Hellerstein, Brewer]


3rd Example: Best-Case Cost of Fault-Tolerant Algorithms

Does making an algorithm fault-tolerant incur a cost even when the system is well-behaved?

Previous investigations focused on the synchronous case:
- early-stopping algorithms for consensus: 2 rounds vs. 1 round for a non-fault-tolerant algorithm [Dolev, Reischuk, Strong] [Dwork, Moses] [Moses, Tuttle]
- non-blocking commit: twice as many rounds as for blocking commit [Dwork, Skeen]

What about the asynchronous case?


Are Wait-Free Algorithms Fast? [Attiya, Lynch, Shavit]

Studies the best-case complexity of an algorithm:
- when there are no failures, although the algorithm can tolerate any number of crashes (is wait-free)
- when the execution is synchronized, although the algorithm also works in asynchronous executions

The complexity measure of interest is running time, measured in synchronized rounds.

Problem of interest is approximate agreement



Wait-Free Algorithms are not Fast

A non-fault-tolerant algorithm takes O(1) time:
- one process writes its input and the rest read it
- this achieves perfect agreement (ε = 0)

Prove an Ω(log n) time lower bound for wait-free approximate agreement

So there are problems for which being wait-free in the asynchronous model imposes more than constant additional cost even when failures do not occur.


Proof Idea

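[Figure: all inputs are 0, and some process decides 0 after fewer than log n rounds. In that time it has been influenced by fewer than n processes, so some process cannot influence the decision.]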


Proof Idea (continued)

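[Figure: give that process input 1 (the others keep input 0). Running alone, it must decide 1, while the first process, which has not heard from it within fewer than log n rounds, still decides 0. Decisions 1 apart violate agreement for ε < 1.]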
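The counting step behind "fewer than log n rounds, fewer than n processes" can be checked in a small Python model. It assumes (a simplification, not necessarily the paper's exact model) that in each synchronized round every process writes its entire view and then reads one register; `spread` and `doubling` are hypothetical names. A view can at most double per round, so after r rounds it holds at most 2^r inputs, and ⌈log2 n⌉ rounds are needed before any process can have heard from all n.

```python
import math
import random


def spread(n, rounds, reads_from):
    """Views (sets of known inputs) of n processes after `rounds` synchronized rounds.

    Simplified model (an assumption): in each round every process writes its
    whole view to its register, then reads one register, reads_from(r, i).
    """
    views = [{i} for i in range(n)]
    for r in range(rounds):
        written = [set(v) for v in views]  # register contents after the writes
        for i in range(n):
            views[i] = views[i] | written[reads_from(r, i)]
    return views


def doubling(n):
    """Read pattern where process i reads process (i + 2**r) mod n in round r."""
    return lambda r, i: (i + 2 ** r) % n


n = 6
rounds_needed = math.ceil(math.log2(n))  # 3 when n = 6
print(min(len(v) for v in spread(n, rounds_needed - 1, doubling(n))))  # 4
print(min(len(v) for v in spread(n, rounds_needed, doubling(n))))      # 6

# No read pattern does better: after r rounds, a view holds at most 2**r inputs.
for _ in range(100):
    r = random.randint(0, 4)
    views = spread(n, r, lambda _r, _i: random.randrange(n))
    assert all(len(v) <= 2 ** r for v in views)
```

The doubling pattern shows the 2^r growth is achievable, and the random trials check that no pattern exceeds it in this model.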


The Best-Case Cost of Fault-Tolerance

Formalizes the idea of "designing for the normal/common case" [Lampson, "Hints for computer system design"] and shows its cost.

The idea of accommodating the worst case while measuring the best/normal/common case has become standard:
- message cost of consensus in failure-free runs [Halpern, Hadzilacos]
- contention-free step complexity [Alur, Taubenfeld]
- obstruction-free step complexity [Ellen, Luchangco, Moir, Shavit]


Interleaving Algorithms

Also gives an approximate agreement algorithm matching the Ω(log n) time lower bound.

Interleaves two algorithms:
- one guarantees fault tolerance
- another guarantees best-case time complexity
- need to coordinate their results… using a "virtual" two-process approximate agreement algorithm

Similar applications of interleaving, especially in randomized consensus [Saks, Shavit, Woll]; e.g., this morning's session [Aspnes, Attiya, Censor]
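A minimal sketch of the interleaving idea in Python, using generators as stand-ins for the two algorithms (`toy_algorithm` and `interleave` are hypothetical names). Alternating single steps means the combination finishes within about twice the steps of whichever algorithm is faster in the given run. The coordination of the two results via a "virtual" two-process approximate agreement is deliberately left out.

```python
def toy_algorithm(steps_needed, output):
    """Hypothetical stand-in for an algorithm that finishes on its steps_needed-th step."""
    for _ in range(steps_needed - 1):
        yield            # one step of work
    return output        # the final step produces the output


def interleave(first, second):
    """Alternate single steps of two algorithms; stop as soon as either finishes.

    Returns (which one finished, its output, total steps taken by both together).
    """
    algorithms = [("first", first), ("second", second)]
    total_steps = 0
    while True:
        for name, alg in algorithms:
            total_steps += 1
            try:
                next(alg)
            except StopIteration as done:
                return name, done.value, total_steps


# e.g. a fault-tolerant algorithm needing 10 steps vs. a fast one needing 3
print(interleave(toy_algorithm(10, "ft"), toy_algorithm(3, "fast")))
# ('second', 'fast', 6)
```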


Application: Replicated Storage

[Yu, Vahdat]
- Emulates a shared memory
- Replication-based implementations of wide-area data access services need automatic regeneration of failed replicas and reconfiguration of groups
- Probabilistic guarantee: reads may return stale values with a small probability
- Optimizes for the best case:
  - failure-free reconfiguration is quick and cheap
  - failure-induced reconfiguration calls a consensus protocol [Saks, Shavit, Woll] for replicas to agree on the next configuration


4th Example: Clock Synchronization

In a distributed system with n nodes that experiences variable message delays, how closely can the nodes' clocks be synchronized?


Clock Synchronization Lower Bound [Lundelius, Lynch]

No algorithm can synchronize n clocks closer than (1 - 1/n)u:
- for a clique with the same message delay uncertainty u on all links (u = max delay - min delay)
- even if there are no failures and no clock drift

Proof introduced the shifting technique

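[Figure: in one execution, messages from p0 to p1 take d - u and messages from p1 to p0 take d. Shifting p0 backwards by u makes these delays d and d - u, still within [d - u, d]; neither process can tell the two executions apart, yet p0's clock is shifted by u relative to p1's.]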
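The shifting step for n = 2 can be written out in Python, assuming no drift, no failures, and delays in [d - u, d]; the function names and the data layout are hypothetical. Shifting p0 backwards by u adds u to the delay of each message p0 sends and subtracts u from each message it receives, so the shifted execution is still admissible and looks the same to both processes, while the skew between their adjusted clocks changes by u.

```python
def shift(offsets, delays, shifts):
    """Shift each process p_i backwards in real time by shifts[i] (no clock drift).

    offsets[i]: p_i's hardware clock reads real time + offsets[i].
    delays[(i, j)]: delay of the message from p_i to p_j.
    Each process sees the same clock readings at the same local events,
    so the algorithm behaves exactly as before.
    """
    new_offsets = [o + s for o, s in zip(offsets, shifts)]
    new_delays = {(i, j): dl + shifts[i] - shifts[j] for (i, j), dl in delays.items()}
    return new_offsets, new_delays


def admissible(delays, d, u):
    """Every delay lies in [d - u, d]."""
    return all(d - u <= dl <= d for dl in delays.values())


d, u = 10, 4
delays = {(0, 1): d - u, (1, 0): d}               # execution E
offsets, shifted = shift([0, 0], delays, [u, 0])  # shift p0 backwards by u
print(shifted, admissible(shifted, d, u))         # {(0, 1): 10, (1, 0): 6} True

# If the algorithm leaves skew S = AC_0 - AC_1 in E, it leaves S + u in the
# shifted execution, so its worst-case skew is at least max(|S|, |S + u|).
best = min(max(abs(s), abs(s + u)) for s in [x / 10 for x in range(-100, 101)])
print(best, (1 - 1 / 2) * u)                      # 2.0 2.0
```

Minimizing max(|S|, |S + u|) over a grid of candidate skews gives u/2, matching (1 - 1/n)u for n = 2; the general bound shifts all n processes by different amounts, which this sketch does not reproduce.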


What About Other Topologies?

[Halpern, Megiddo, Munshi]

- Arbitrary topologies and nonuniform uncertainties
- The adversary's optimal strategy is to maximize a certain quantity involving neighboring nodes' initial clock values and the delays between them, subject to constraints on message uncertainty
- The bound is expressed as a system of linear constraints, and this linear program is solved using optimization techniques
- The shifting notion is captured in the linear program
- Not in closed form except for a few special cases

Bound is tight


What About Closed Form Bounds?

[Biaz, Welch] If uncertainties are symmetric (the same in both directions of a link), then the lower bound is diam/2, where diam is the diameter of the graph w.r.t. uncertainties

[Figure: example graph on nodes a through f with edge uncertainties; diam = 9.]

An arbitrary topology G with arbitrary uncertainties is equivalent to the clique G' on the same nodes, where the uncertainty between any two nodes is the length of the shortest path between them in G (w.r.t. uncertainties)

[Halpern, Megiddo, Munshi]

Shift a carefully chosen execution on the clique, for two nodes that are diam apart, to get the diam/2 lower bound.

[Figure, panels "Shifting" and "Equivalent Clique": the example graph and its equivalent clique.]
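Computing the equivalent clique is an all-pairs shortest-path problem. The following is a Python sketch using Floyd-Warshall on a small hypothetical graph (not the one in the figure), assuming symmetric uncertainties; the diam/2 lower bound then comes from the farthest pair in the clique.

```python
INF = float("inf")


def equivalent_clique(nodes, uncertainty):
    """Floyd-Warshall all-pairs shortest paths w.r.t. symmetric link uncertainties.

    dist[x][y] is the uncertainty between x and y in the equivalent clique G'.
    """
    dist = {x: {y: (0 if x == y else INF) for y in nodes} for x in nodes}
    for (x, y), w in uncertainty.items():
        dist[x][y] = dist[y][x] = min(dist[x][y], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


# Hypothetical example graph: edge -> uncertainty (max delay - min delay).
nodes = "abcdef"
uncertainty = {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): 4,
               ("b", "e"): 1, ("e", "f"): 5, ("a", "c"): 7}
clique = equivalent_clique(nodes, uncertainty)
diam = max(clique[x][y] for x in nodes for y in nodes)
print(clique["a"]["c"], diam, diam / 2)  # 5 13 6.5
```

The direct a-c link has uncertainty 7, but the clique uses the path a-b-c with total uncertainty 5; the farthest pair is d and f, giving a lower bound of 6.5.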


What About Upper Bounds?

For an arbitrary topology with arbitrary uncertainties, the radius (w.r.t. uncertainties) is an upper bound [Halpern, Megiddo, Munshi]

Since radius ≤ diam, this upper bound is within a factor of 2 of the diam/2 lower bound

[Figure: the same example graph; diam = 9, radius = 5.]

Tight and almost-tight closed-form upper bounds for some specific common topologies with uniform uncertainties [Biaz, Welch]
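Radius and diameter w.r.t. uncertainties can be computed together from all-pairs shortest paths. Below is a Python sketch on a small hypothetical graph, assuming symmetric uncertainties; the helper names are hypothetical. One intuition for the radius bound (our gloss, not necessarily the construction in the paper): if every node synchronizes to a center c within half its distance to c, any two nodes x and y are within (dist(x, c) + dist(y, c))/2 ≤ radius of each other.

```python
INF = float("inf")


def all_pairs(nodes, uncertainty):
    """Floyd-Warshall shortest paths w.r.t. symmetric link uncertainties."""
    dist = {x: {y: (0 if x == y else INF) for y in nodes} for x in nodes}
    for (x, y), w in uncertainty.items():
        dist[x][y] = dist[y][x] = min(dist[x][y], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    return dist


def radius_center_diameter(nodes, dist):
    """Eccentricity of x = max distance from x; radius = min, diameter = max."""
    ecc = {x: max(dist[x][y] for y in nodes) for x in nodes}
    center = min(nodes, key=ecc.get)
    return ecc[center], center, max(ecc.values())


# Hypothetical example graph: edge -> uncertainty.
nodes = "abcdef"
uncertainty = {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): 4,
               ("b", "e"): 1, ("e", "f"): 5, ("a", "c"): 7}
radius, center, diam = radius_center_diameter(nodes, all_pairs(nodes, uncertainty))
print(radius, center, diam)      # 7 b 13
print(radius <= 2 * (diam / 2))  # True: within a factor of 2 of diam/2
```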


External Clock Synchronization

What about external synchronization, when some clocks have outside time sources? (The previous results were for internal synchronization.)

The tight bound on how close a node's clock can get to the source time is half the shortest-path distance (w.r.t. uncertainties) from the node to a source

[Attiya, Hay, Welch]

[Figure: example graph with time sources at a and d. The bounds are b: 3/2, c: 1/2, e: 3/2, f: 5/2.]
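The external-synchronization bound is a multi-source shortest-path computation. Here is a Python sketch with Dijkstra's algorithm (`heapq`), assuming symmetric uncertainties; the graph and the choice of sources a and d are hypothetical, so the resulting numbers differ from those in the figure.

```python
import heapq


def external_bounds(uncertainty, sources):
    """Half the shortest-path distance (w.r.t. uncertainties) to the nearest source.

    Multi-source Dijkstra over symmetric link uncertainties.
    """
    adj = {}
    for (x, y), w in uncertainty.items():
        adj.setdefault(x, []).append((y, w))
        adj.setdefault(y, []).append((x, w))
    dist = {v: float("inf") for v in adj}
    for s in sources:
        dist[s] = 0
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        du, v = heapq.heappop(heap)
        if du > dist[v]:
            continue  # stale queue entry
        for nbr, w in adj[v]:
            if du + w < dist[nbr]:
                dist[nbr] = du + w
                heapq.heappush(heap, (dist[nbr], nbr))
    return {v: dist[v] / 2 for v in sorted(adj)}


# Hypothetical graph and sources; not the graph in the figure.
uncertainty = {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): 4,
               ("b", "e"): 1, ("e", "f"): 5, ("a", "c"): 7}
print(external_bounds(uncertainty, ["a", "d"]))
# {'a': 0.0, 'b': 1.0, 'c': 2.0, 'd': 0.0, 'e': 1.5, 'f': 4.0}
```

Starting all sources at distance 0 in one priority queue gives each node its distance to the nearest source in a single pass.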


Optimal Synchronization Per Execution

Given the information collected in a specific execution by some algorithm strategy, find the tightest possible synchronization:
- internal synchronization, offline algorithm [Attiya, Herzberg, Rajsbaum]
- external synchronization, online algorithm [Patt-Shamir, Rajsbaum]; extended to handle clock drift [Ostrovsky, Patt-Shamir]


Gradient Clock Synchronization

The clock skew between any pair of nodes should be a function of the distance between them [Fan, Lynch]

[Figure: example graph on nodes a through f; the clocks of a and d need not be as tightly synchronized as those of a and b.]


Gradient Clock Synchronization

Motivated by problems in sensor networks, or more generally, large-scale networks, where nodes in the same locality need to be more tightly synchronized:
- data fusion
- target tracking

http://www.mikalac.com/mis/missile.html


Gradient Clock Synch Lower Bound

The closest that two neighboring nodes' clocks can get (in the worst case) is Ω(log D / log log D), where D is the diameter of the network: a global influence on local skew

Algorithms requiring a fixed maximum skew for nearby nodes may not scale well, e.g., TDMA

http://www.dsna-dti.aviation-civile.gouv.fr/actualities/revuesgb/revue64gb/64pgarticle2gb/telecom_c2gb.html


Gradient Clock Synch Lower Bound: Assumption 1

Nonzero clock drift: (hardware) clocks can run fast or slow, within known bounds

[Figure: hardware clock time vs. real time; the slope is at most 1 + ρ and at least (1 + ρ)^-1.]


Gradient Clock Synch Lower Bound: Assumption 2

The algorithm must ensure that (logical) clocks always increase at some minimum positive rate

[Figure: logical clock time vs. real time; the slope is bounded below by that minimum rate.]
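The two assumptions can be phrased as checks on sampled clock traces. Below is a Python sketch: `rho` (the drift bound) and `min_rate` (the minimum logical clock rate) are hypothetical parameter names, and since only consecutive samples are compared, the checks test average rates between samples rather than instantaneous slopes.

```python
def slopes(trace):
    """Average rates between consecutive (real_time, clock_value) samples."""
    return [(c2 - c1) / (t2 - t1) for (t1, c1), (t2, c2) in zip(trace, trace[1:])]


def respects_drift(hardware_trace, rho):
    """Assumption 1: hardware clock rate stays within [(1 + rho)**-1, 1 + rho]."""
    low, high = 1 / (1 + rho), 1 + rho
    return all(low <= s <= high for s in slopes(hardware_trace))


def respects_min_rate(logical_trace, min_rate):
    """Assumption 2: the logical clock always advances at rate >= min_rate > 0."""
    return all(s >= min_rate for s in slopes(logical_trace))


hw = [(0.0, 0.0), (1.0, 1.05), (2.0, 2.0)]       # fast, then slow
logical = [(0.0, 0.0), (1.0, 3.0), (2.0, 3.5)]   # a jump forward, then slower progress
print(respects_drift(hw, rho=0.1), respects_drift(hw, rho=0.01))  # True False
print(respects_min_rate(logical, min_rate=0.5))                   # True
```

Jumps forward are allowed by Assumption 2 (they only raise the rate), which is exactly the freedom the lower-bound construction exploits.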


Gradient Clock Synch LB: Simple Case

Consider a simple algorithm in which the clock value of p1 is periodically propagated down the chain

Can construct an execution in which p_{n-1}'s new clock value is larger than p_n's old clock value by an amount depending on D:
- carefully choose message delays
- manipulate clock drift rates
- cause nodes to suddenly jump to higher values without synchronizing with their neighbors

The insight in the paper is generalizing this to any algorithm

[Figure: a chain p_1, p_2, p_3, ..., p_n.]


Is the Lower Bound Tight?

Recall that the lower bound is Ω(log D / log log D)
- Several pre-existing algorithms have O(D) skew
- The upper bound was then improved to O(√D) [Locher, Wattenhofer]
- Recently, the upper bound was improved to O(log D) [Lenzen, Locher, Wattenhofer]

Still a small gap; can the lower bound be improved?


How Long Can a Large Difference Last?

In the simple diffusion algorithm on the chain, a large difference between p_{n-1} and p_n lasts only while the message is in transit

Perhaps the difficulties could be avoided by keeping track of the "generation" of a clock value and only comparing apples with apples (clocks of the same generation)? But this could be complicated


And There’s a Lot More…
- Lower bounds on space for mutual exclusion [Burns, Lynch]
- Lower bound on number of messages for leader election in synchronous rings [Frederickson, Lynch]
- Impossibility results for data link layer and connection management [Fekete, Lynch, Mansour, Spinelli] [Kleinberg, Attiya, Lynch]
- Lower bound on time for consensus in partially synchronous models [Attiya, Dwork, Lynch, Stockmeyer]
- Lower bound on time for synchronous k-set agreement [Chaudhuri, Herlihy, Lynch, Tuttle]
- Tradeoff between safety and liveness for randomized coordinated attack [Varghese, Lynch]
- Impossibility of boosting fault tolerance [Attie, Guerraoui, Kouznetsov, Lynch, Rajsbaum]
- …


Final Observations

Strive to make the results relevant:
- natural problems
- practical architectural assumptions
- realistic performance measures (for lower bounds)

Crisp arguments (ingenious but clear):
- easy to understand and verify
- simple to extend, leading to follow-ups


Take-Home Message

Impossibility results help the development of the area

Understanding inherent limits guides efforts in the appropriate directions

And setting boundaries is good for everyone…

Thanks for your attention

Thank you, Nancy!
