Post on 21-Dec-2015
A World of (Im)Possibilities
Nancy Lynch Celebration: Sixty and Beyond
Hagit Attiya, Technion
Jennifer Welch, Texas A&M University
PODC/Concur 2008 World of (Im)Possibilities 2
Introduction
One of the main themes of Nancy's work has been proving lower bounds and impossibility results for problems that arise in distributed computing.
Overview of some of Nancy's results
  Lesser-known results, hidden gems closer to our hearts
Emphasize their meaning and implications
  How they influenced the development of the field and of distributed systems
  Concentrating on their positive impact
Best-Known Example: FLP
Impossibility of asynchronous fault-tolerant consensus [Fischer, Lynch, Paterson]
Motivated work on
  strengthening models of computation
    partially synchronous models [Dwork, Lynch, Stockmeyer]
    unreliable failure detectors [Chandra, Toueg]
  weakening the problem definition
    k-set agreement [Chaudhuri]
    renaming [Attiya et al.]
    condition-based approaches [Raynal, Rajsbaum et al.]
FLP: Impact
Related practical problems:
  transaction commit
  leader election
  atomic broadcast
  maintaining consistent replicated data
The wait-free hierarchy (classify concurrent abstract data types) [Herlihy]
Attempts to solve k-set agreement and renaming led to the application of topology in distributed computing.
  [Chaudhuri] [Borowsky, Gafni] [Saks, Zaharoglou] [Herlihy, Shavit]
2nd Example: Brewer's Conjecture
[Brewer, PODC 2000 invited talk]
A web service cannot provide all three guarantees:
  Consistency
  Availability
  Partition-tolerance
What Does This Mean?
[Gilbert, Lynch, SIGACT News 2002]
A web service cannot provide all three guarantees:
  Consistency: atomicity of (read / write) operations
  Availability: request by nonfaulty client gets response
  Partition-tolerance: even when lost messages create two partitioned components in the network
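Proof Idea
adapted from [Attiya, Bar-Noy, Dolev]
[Figure: three executions of p0 and p1 in which the messages between them are lost (X)]
  Exec 1: p0 writes 1
  Exec 2: p1 reads 0
  Exec 3: p0 writes 1, then p1 reads 0
Exec 3 looks the same as Exec 2 to p1, so p1 still reads 0 after the write of 1 has completed, which violates atomicity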
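The indistinguishability step can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the Node class, its write/read methods, and the partitioned flag are hypothetical names introduced here, and each node answers immediately from local state (that is, it is available). It is not the formal model of the cited papers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical replica that always answers from its local state."""
    value: int = 0
    received: list = field(default_factory=list)  # messages that actually arrived

    def write(self, new_value, peer, partitioned):
        self.value = new_value
        if not partitioned:
            # The update message reaches the peer only when there is no partition.
            peer.received.append(new_value)
            peer.value = new_value
        return "ack"  # availability: the writer always gets a response

    def read(self):
        # Return the value read together with everything this node has observed.
        return self.value, tuple(self.received)


def exec2():
    """p1 reads; nothing else happens."""
    p0, p1 = Node(), Node()
    return p1.read()


def exec3():
    """p0 writes 1 while partitioned from p1, then p1 reads."""
    p0, p1 = Node(), Node()
    p0.write(1, p1, partitioned=True)
    return p1.read()


def exec_no_partition():
    """Same as exec3 but with messages delivered."""
    p0, p1 = Node(), Node()
    p0.write(1, p1, partitioned=False)
    return p1.read()
```

In this sketch exec2() and exec3() give p1 exactly the same local view, so any available protocol must return the same (stale) value in both, while exec_no_partition() shows the value p1 would see if the message were delivered.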
Brewer's Conjecture: Implications
Traditional database services maintain the consistency and fail to provide availability in the face of partitions
Relax the consistency guarantees of the web service
  Sometimes miss values or return stale data (Internet queries) [PIER: Huebsch, Hellerstein, Lanham, Loo, Shenker, Stoica]
  Allow partitions to evolve separately, and build mechanisms to cope when this happens (stream processing) [Medusa: Balazinska, Balakrishnan, Stonebraker]
Sacrifice availability, but not often (stream processing)… [BOREALIS: Balazinska, Balakrishnan, Madden, Stonebraker]
Assume a mechanism to guard against partitions… [CQ: Shah, Hellerstein, Brewer]
3rd Example: Best-Case Cost of Fault-Tolerant Algorithms
Does making an algorithm fault-tolerant incur a cost even when the system is well-behaved?
Previous investigation focused on the synchronous case
  early stopping algorithms for consensus: 2 rounds vs. 1 round for non-fault-tolerant algorithm [Dolev, Reischuk, Strong] [Dwork, Moses] [Moses, Tuttle]
  non-blocking commit: twice as many rounds as for blocking commit [Dwork, Skeen]
What about the asynchronous case?
Are Wait-Free Algorithms Fast? [Attiya, Lynch, Shavit]
Studies the best-case complexity of an algorithm
  When there are no failures, although the algorithm can tolerate any number of crashes (is wait-free)
  When the execution is synchronized, although the algorithm works in asynchronous executions also
Complexity measure of interest is running time
  Time is measured by synchronized rounds
Problem of interest is approximate agreement
[Figure: example with n = 6]
Wait-Free Algorithms are not Fast
A non-fault-tolerant algorithm takes O(1) time
  one process writes its input and the rest read it
  achieves perfect agreement (ε = 0)
Prove an Ω(log n) time lower bound for wait-free approximate agreement
So there are problems for which being wait-free in the asynchronous model imposes more than constant additional cost even when failures do not occur.
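The O(1) non-fault-tolerant algorithm mentioned above can be sketched as follows. This is a minimal illustration, assuming a single shared register and a designated writer; the function names and the writer parameter are hypothetical and not taken from the paper.

```python
def non_fault_tolerant_agreement(inputs, writer=0):
    """One process writes its input; every process then reads and decides it.

    This is not wait-free: if the writer crashed before writing, the other
    processes would wait forever for the register to be set.
    """
    register = inputs[writer]            # step 1: the writer publishes its input
    return [register for _ in inputs]    # step 2: everyone reads and decides


def spread(values):
    """Distance between the largest and smallest value."""
    return max(values) - min(values)


inputs = [3.0, 7.5, 1.0, 4.0, 9.0, 2.5]      # n = 6 arbitrary inputs
decisions = non_fault_tolerant_agreement(inputs)
```

All decisions are equal, so their spread is 0 (perfect agreement), and the decided value lies within the range of the inputs. The Ω(log n) lower bound says that no wait-free algorithm can match this constant running time, even in failure-free synchronized executions.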
Proof Idea
[Figure: execution in which all inputs are 0 and the decision is 0; labels "< log n" and "< n"; annotation: this process cannot influence the decision]
Proof Idea
[Figure: the same execution with one input changed from 0 to 1; labels "< log n", "< n" and "< 1"; one process decides 0 and another decides 1]
The Best-Case Cost of Fault-Tolerance
Formalize the idea of "designing for the normal / common case" and show its cost [Lampson, "Hints for computer system design"]
The idea of accommodating the worst case & measuring the best / normal / common case has become standard.
  message cost of consensus in failure-free runs [Halpern, Hadzilacos]
  contention-free step complexity [Alur, Taubenfeld]
  obstruction-free step complexity [Ellen, Luchangco, Moir, Shavit]
Interleaving Algorithms
Also an approximate agreement algorithm matching the Ω(log n) time lower bound
Interleaves two algorithms:
  One guarantees fault-tolerance
  Another guarantees best-case time complexity
  Need to coordinate results…
  Using a “virtual” two-process approximate agreement algorithm
Similar applications of interleaving, especially in randomized consensus [Saks, Shavit, Woll]
  E.g., this morning's session [Aspnes, Attiya, Censor]
Application: Replicated Storage
[Yu and Vahdat]
Emulates a shared memory
Replication-based implementation of wide-area data access services
  need automatic regeneration of failed replicas and reconfiguration of groups
Probabilistic guarantee: reads may return stale values with a small probability
Optimizes for best case:
  Failure-free reconfiguration is quick and cheap
  Failure-induced reconfiguration calls a consensus protocol [Saks, Shavit, Woll] for replicas to agree on next configuration
4th Example: Clock Synchronization
In a distributed system with n nodes that experiences variable message delays, how closely can the nodes' clocks be synchronized?
Clock Synchronization Lower Bound [Lundelius, Lynch]
No algorithm can synchronize n clocks closer than (1-1/n)u
  For a clique with same message delay uncertainty u on all links (u = max delay - min delay)
  Even if no failures and no clock drift
Proof introduced the shifting technique
[Figure: two executions of p0 and p1 with message delays d-u and d; the second is obtained by shifting p0 backwards by u]
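For n = 2 the shifting argument can be checked numerically. The sketch below is an illustration under stated assumptions, not the proof from the paper: clocks run at rate 1, delays must lie in [d-u, d], and the function names are hypothetical. Moving every p0 event u earlier in real time lengthens p0-to-p1 delays by u, shortens p1-to-p0 delays by u, and makes p0's clock read u more at any given real time, while neither process can tell the two executions apart.

```python
def shift_p0_backwards(delay_p0_to_p1, delay_p1_to_p0, shift):
    """Message delays after every p0 event is moved `shift` earlier in real time."""
    return delay_p0_to_p1 + shift, delay_p1_to_p0 - shift


def is_admissible(delay, d, u):
    """A delay is legal when it lies in the interval [d - u, d]."""
    return d - u <= delay <= d


def forced_skew(skew, u):
    """Worst skew over the original and the shifted execution.

    `skew` is (p0 clock - p1 clock) in the original execution. Both processes
    apply the same corrections in the shifted execution, but p0's clock is
    u ahead in real time, so the skew there is skew + u.
    """
    return max(abs(skew), abs(skew + u))


d, u = 10, 4
before = (d - u, d)                      # p0 -> p1 is fast, p1 -> p0 is slow
after = shift_p0_backwards(*before, shift=u)

# Try many possible skews the algorithm might achieve in the original execution.
best_possible = min(forced_skew(s / 10, u) for s in range(-100, 101))
```

Both executions use only admissible delays, and whatever skew the algorithm achieves in one of them, the larger of the two skews is at least u/2, which is (1-1/n)u for n = 2.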
What About Other Topologies?
[Halpern, Megiddo, Munshi]
Arbitrary topologies and nonuniform uncertainties
Adversary's optimal strategy is to maximize a certain quantity involving neighboring nodes' initial clock values and the delays between them, subject to constraints on message uncertainty
Bound is expressed as a system of equations, and this linear program is solved using optimization techniques
  Shifting notion is captured in the linear program
  Not in closed form except for a few special cases
Bound is tight
What About Closed Form Bounds? [Biaz, Welch]
If uncertainties are symmetric (same in both directions of a link), then lower bound is diam/2
  where diam is diameter of the graph w.r.t. uncertainties
[Figure: example graph on nodes a-f with edge uncertainties 2, 3, 4, 3, 5, 2, 4, 1, 5; diam = 9]
Shifting Equivalent Clique
Arbitrary topology G with arbitrary uncertainties is equivalent to clique G' with same nodes, where uncertainty between any two nodes is length of shortest path between them in G (w.r.t. uncertainties) [Halpern, Megiddo, Munshi]
Shift a carefully chosen execution on the clique, for 2 nodes diam apart, to get the diam/2 lower bound.
[Figure: graph G on nodes a-f and its equivalent clique G', with links labeled by shortest-path uncertainties]
What About Upper Bounds?
For arbitrary topology and arbitrary uncertainties, the radius is an upper bound [Halpern, Megiddo, Munshi]
  Since radius ≤ diam, within a factor of 2 of the diam/2 lower bound
[Figure: example graph on nodes a-f with edge uncertainties 2, 3, 4, 3, 5, 2, 4, 1, 5; diam = 9, radius = 5]
Tight & almost tight closed form upper bounds for some specific common topologies with uniform uncertainties [Biaz, Welch]
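The diameter and radius used in these bounds are ordinary weighted-graph quantities, with link uncertainties as edge weights. The following sketch computes them with Floyd-Warshall, assuming symmetric uncertainties. The graph below is a hypothetical 4-node path chosen for illustration; it is not the graph from the slide's figure, and the function names are introduced here.

```python
import math


def shortest_paths(nodes, edges):
    """All-pairs shortest paths (Floyd-Warshall) over symmetric uncertainties."""
    dist = {a: {b: (0 if a == b else math.inf) for b in nodes} for a in nodes}
    for (a, b), uncertainty in edges.items():
        dist[a][b] = min(dist[a][b], uncertainty)
        dist[b][a] = min(dist[b][a], uncertainty)
    for k in nodes:
        for a in nodes:
            for b in nodes:
                if dist[a][k] + dist[k][b] < dist[a][b]:
                    dist[a][b] = dist[a][k] + dist[k][b]
    return dist


def diameter_and_radius(nodes, edges):
    """Largest and smallest eccentricity w.r.t. the uncertainties."""
    dist = shortest_paths(nodes, edges)
    eccentricity = {a: max(dist[a].values()) for a in nodes}
    return max(eccentricity.values()), min(eccentricity.values())


# Hypothetical example: a path w - x - y - z with uncertainties 2, 3, 1.
NODES = ["w", "x", "y", "z"]
EDGES = {("w", "x"): 2, ("x", "y"): 3, ("y", "z"): 1}

diam, radius = diameter_and_radius(NODES, EDGES)
lower_bound = diam / 2   # closed-form lower bound for symmetric uncertainties
upper_bound = radius     # upper bound
```

For this example graph the lower bound is 3 and the upper bound is 4, consistent with the upper bound being within a factor of 2 of the lower bound.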
External Clock Synchronization
What about external synchronization, when some clocks have outside time sources?
  Previous results were for internal synchronization
The tight bound on how close a node's clock can get to the source time is half the shortest path distance (w.r.t. uncertainties) from the node to a source [Attiya, Hay, Welch]
[Figure: example graph on nodes a-f with edge uncertainties 2, 3, 4, 3, 5, 2, 4, 1, 5, where a and d are sources]
  bounds are: b: 3/2, c: 1/2, e: 3/2, f: 5/2
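Computing this bound for every node amounts to a multi-source shortest-path search followed by halving. The sketch below uses Dijkstra's algorithm from the standard heapq module. The adjacency list is a hypothetical example, not the graph in the slide's figure, and the function name is introduced here for illustration.

```python
import heapq


def external_sync_bounds(adjacency, sources):
    """Half the shortest-path distance (w.r.t. uncertainties) to the nearest source."""
    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, uncertainty in adjacency[node]:
            candidate = d + uncertainty
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return {node: d / 2 for node, d in dist.items()}


# Hypothetical example: path s - x - y - t, where s and t have outside time sources.
ADJACENCY = {
    "s": [("x", 3)],
    "x": [("s", 3), ("y", 1)],
    "y": [("x", 1), ("t", 5)],
    "t": [("y", 5)],
}

bounds = external_sync_bounds(ADJACENCY, sources=["s", "t"])
```

Here x is at distance 3 from its nearest source and y at distance 4, so their bounds are 3/2 and 2; the sources themselves have bound 0.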
Optimal Synchronization Per Execution
Given information collected in a specific execution, by some algorithm strategy, find the tightest possible synchronization
  internal synchronization, offline algorithm [Attiya, Herzberg, Rajsbaum]
  external synchronization, online algorithm [Patt-Shamir, Rajsbaum]
  extended to handle clock drift [Ostrovsky, Patt-Shamir]
Gradient Clock Synchronization
The clock skew between any pair of nodes should be a function of the distance between them [Fan, Lynch]
[Figure: graph on nodes a-f; clocks of a and d need not be as tightly synch'ed as those of a and b]
Gradient Clock Synchronization
motivated by problems in sensor networks, or more generally, large scale networks, where nodes in the same locality need to be more tightly synchronized
  data fusion
  target tracking
http://www.mikalac.com/mis/missile.html
Gradient Clock Synch Lower Bound
Closest that two nodes' clocks can get (in worst case) is Ω(log D / log log D)
  D is diameter of network
  global influence
Algorithms requiring a fixed maximum skew for nearby nodes may not scale well
  E.g., TDMA
http://www.dsna-dti.aviation-civile.gouv.fr/actualities/revuesgb/revue64gb/64pgarticle2gb/telecom_c2gb.html
Gradient Clock Synch Lower Bound: Assumption 1
Nonzero clock drift: (hardware) clocks can run fast or slow, within known bounds
[Figure: hardware clock time vs. real time; max slope 1+ρ, min slope (1+ρ)^-1]
Gradient Clock Synch Lower Bound: Assumption 2
Algorithm must ensure that (logical) clocks always increase at some minimum positive rate
[Figure: logical clock time vs. real time, with a bound on the minimum slope]
Gradient Clock Synch LB: Simple Case
Consider a simple algorithm in which the clock value of p1 is periodically propagated down the chain
Can construct execution in which pn-1's new clock value is larger than pn's old clock value by an amount depending on D
  carefully choose message delays
  manipulate clock drift rates
  cause nodes to suddenly jump to higher values without synchronizing with their neighbors
Insight in the paper is generalizing this to any algorithm
[Figure: chain of nodes p1, p2, p3, ..., pn]
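The simple case can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions, not the construction from the paper: all clocks tick at rate 1 (so only message delays are manipulated, not drift), every node sends its clock at every integer time, a receiver adopts the larger of its own clock and the received value, and the function name and parameters are hypothetical. Clocks are tracked as offsets from real time.

```python
def simulate_chain(initial_offsets, link_delays, horizon):
    """Return the largest skew between chain neighbors seen at any integer time.

    initial_offsets[i] -- clock of node i minus real time at time 0
    link_delays[i]     -- delay of messages from node i to node i + 1
    """
    n = len(initial_offsets)
    history = []                      # history[t][i] = offset of node i at time t
    current = list(initial_offsets)
    worst = 0
    for t in range(horizon + 1):
        for i in range(n - 1):
            sent_at = t - link_delays[i]
            if sent_at < 0:
                continue              # nothing has arrived on this link yet
            sender = current[i] if sent_at == t else history[sent_at][i]
            # The received value is `delay` behind the sender's clock by now.
            current[i + 1] = max(current[i + 1], sender - link_delays[i])
        history.append(list(current))
        worst = max(worst, max(abs(a - b) for a, b in zip(current, current[1:])))
    return worst


# Five nodes; neighbors start 2 apart, so the end-to-end skew is 8.
start = [8, 6, 4, 2, 0]
steady = simulate_chain(start, link_delays=[1, 1, 1, 1], horizon=6)
burst = simulate_chain(start, link_delays=[0, 0, 0, 3], horizon=6)
```

With uniform delays the neighbor skew never exceeds its initial value of 2. When the first three links become fast and the last one slow, the second-to-last node jumps to the head of the chain's value while the last node is still waiting, so the neighbor skew briefly equals the end-to-end skew of 8, which grows with the length of the chain.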
Is the Lower Bound Tight?
Recall lower bound is Ω(log D / log log D)
Several pre-existing algorithms have O(D)
Then upper bound improved to O(√D) [Locher, Wattenhofer]
Recently upper bound improved to O(log D) [Lenzen, Locher, Wattenhofer]
Still a small gap; can the lower bound be improved?
How Long Can Large Difference Last?
In the simple diffusion algorithm on the chain, large difference between pn-1 and pn only lasts while message is in transit
Perhaps difficulties could be avoided by keeping track of “generation” of clock value and only comparing apples with apples (clocks of the same generation)?
  but this could be complicated
And There’s a Lot More…
Lower bounds on space for mutual exclusion [Burns, Lynch]
Lower bound on number of messages for leader election in synchronous rings [Frederickson, Lynch]
Impossibility results for data link layer and connection management [Fekete, Lynch, Mansour, Spinelli] [Kleinberg, Attiya, Lynch]
Lower bound on time for consensus in partially synchronous models [Attiya, Dwork, Lynch, Stockmeyer]
Lower bound on time for synchronous k-set agreement [Chaudhuri, Herlihy, Lynch, Tuttle]
Tradeoff between safety and liveness for randomized coordinated attack [Varghese, Lynch]
Impossibility of boosting fault tolerance [Attie, Guerraoui, Kouznetsov, Lynch, Rajsbaum]
…
Final Observations
Strive to make the results relevant
  Natural problems
  Practical architectural assumptions
  Realistic performance measures (for lower bounds)
Crisp arguments (ingenious but clear)
  Easy to understand and verify
  Simple to extend and lead to follow-ups
Take-Home Message
Impossibility results help the development of the area
Understanding inherent limits guides efforts in the appropriate directions
And setting boundaries is good for everyone…
Thanks for your attention
Thank you, Nancy!