Post on 16-Dec-2015
Distributed Programming and Consistency: Principles and Practice
Peter Alvaro, Neil Conway, Joseph M. Hellerstein
UC Berkeley
Part I: Principles
Motivation
Why are distributed systems hard?
Uncertainty in communication
Why are distributed systems hard?
We attack at dawn
??
Why are distributed systems hard?
Wait for my signal. Then attack!
Why are distributed systems hard?
Attack! No, WAIT!
?
Distributed systems are easier when messages are
• Reorderable
• Retryable
• Retraction-free
(notes)
• Point to make: convergent objects are NOT retraction-free.
Context: replicated distributed systems
Distributed: connected (but not always, or well)
Replicated: redundant data
Context: replicated distributed systems
Running example: a key-value store
put(“dinner”, “pizza”)
Dinner = pizza
Dinner = pizza
Dinner = pizza
get(“dinner”)
“pizza”
Context: replicated distributed systems
Distributed replication is desirable
Distributed consistency is expensive
Consistency?
Definitions abound: pick one.
but try to be consistent…
What isn’t consistent?
Replication anomalies:
• Read anomalies (staleness)
• Write divergence (concurrent updates)
Anomalies
Stale reads
put(“dinner”, “pasta”)
Dinner = pizza
Dinner = pizza
Dinner = pizza
get(“dinner”)
“pizza”
Dinner = pasta Dinner = pasta
Dinner = pasta
?
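The stale read pictured above can be reproduced with a minimal sketch. It assumes a store that applies a write at one replica immediately and delivers it to the others later; the names `Replica`, `LazyStore`, and `flush` are hypothetical and not from the talk.

```ruby
# Hypothetical sketch: a replicated key-value store with lazy propagation.
class Replica
  def initialize
    @data = {}
  end

  def put(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end
end

class LazyStore
  def initialize(replica_count)
    @replicas = Array.new(replica_count) { Replica.new }
    @pending = [] # undelivered updates: [replica_index, key, value]
  end

  # Apply the write at one replica now; queue it for all the others.
  def put(at, key, value)
    @replicas[at].put(key, value)
    @replicas.each_index do |i|
      @pending << [i, key, value] unless i == at
    end
  end

  def get(at, key)
    @replicas[at].get(key)
  end

  # Deliver every queued update (models "all messages are delivered").
  def flush
    @pending.each { |i, key, value| @replicas[i].put(key, value) }
    @pending.clear
  end
end

store = LazyStore.new(3)
store.put(0, "dinner", "pizza")
store.flush
store.put(0, "dinner", "pasta")
puts store.get(2, "dinner") # replica 2 has not yet received "pasta"
store.flush
puts store.get(2, "dinner") # after delivery, replica 2 has caught up
```

A read at replica 2 between the second `put` and the `flush` returns the old value, which is exactly the anomaly that witnesses replication.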
Anomalies
Write conflicts
put(“dessert”, “cake”)
Dessert = cake
put(“dessert”, “fruit”)
Dessert = fruit
?
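A small sketch of how the two concurrent writes can leave replicas in disagreement. It assumes a naive policy in which each replica simply overwrites with whatever update it receives last; the helper `apply_in_order` is hypothetical.

```ruby
# Hypothetical sketch: each replica overwrites on delivery, so the final
# value depends on the order in which the two writes happen to arrive.
def apply_in_order(writes)
  state = {}
  writes.each { |key, value| state[key] = value }
  state
end

cake  = ["dessert", "cake"]
fruit = ["dessert", "fruit"]

replica_a = apply_in_order([cake, fruit]) # received cake first, then fruit
replica_b = apply_in_order([fruit, cake]) # received fruit first, then cake

puts replica_a["dessert"]
puts replica_b["dessert"]
puts replica_a == replica_b # the replicas have diverged
```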
Consistency
Anomalies witness replication.
A consistent replicated datastore rules out (some) replication anomalies.
Consistency models
• Strong consistency
• Eventual consistency
• Weaker models
Strong consistency
AKA “single copy” consistency
Replication is transparent; no witnesses of replication
Strong consistency
• Reads that could witness replication block
• Concurrent writes take turns
Strong consistency
Some strategies:
• Single master with synchronous replication
  – Writes are totally ordered, reads see latest values
• Quorum systems
  – A (majority) ensemble simulates a single master
• “State machine replication”
  – Use consensus to establish a total order over reads and writes systemwide
Strong consistency
Drawbacks:
Latency, availability, partition tolerance
Eventual Consistency
• Tolerate stale reads and concurrent writes
• Ensure that eventually* all replicas converge
* When activity has ceased and all messages are delivered to all replicas
Eventual Consistency
Strategies:
• Establish a total update order off the critical path (e.g., Bayou).
  – Epidemic (gossip-based) replication
  – Tentatively apply, then possibly retract, updates as the order is learned
Eventual Consistency
Strategies:
• Deliver updates according to a “cheap” order (e.g., causal).
  – Break ties with timestamps, merge functions
hmmmm
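One common way to break ties with timestamps is a last-writer-wins register. The sketch below is an illustration under stated assumptions, not the scheme of any particular system: `LwwUpdate` and `lww_merge` are hypothetical names, and each (timestamp, replica_id) pair is assumed to identify a single update.

```ruby
# Hypothetical last-writer-wins register.
# Assumption: no two distinct updates share the same (timestamp, replica_id).
LwwUpdate = Struct.new(:timestamp, :replica_id, :value)

# Keep the update with the larger timestamp; break ties by replica id.
def lww_merge(a, b)
  ([a.timestamp, a.replica_id] <=> [b.timestamp, b.replica_id]) >= 0 ? a : b
end

cake  = LwwUpdate.new(5, "a", "cake")
fruit = LwwUpdate.new(5, "b", "fruit")

puts lww_merge(cake, fruit).value
puts lww_merge(fruit, cake).value # same winner in either order
```

Every replica picks the same winner regardless of delivery order, so the replicas converge. The cost is that one of the two concurrent writes is silently discarded, which may be part of why this strategy earns a “hmmmm”.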
Eventual Consistency
Strategies:
• Constrain the application so that updates are reorderable
Won’t always work.
When will it work?
Eventual consistency – more definitions
• Convergence
  – Conflicting writes are uniformly resolved
  – Reads eventually return current data
  – State-centric
• Confluence
  – A program has deterministic executions
    • Output is a function of input
  – Program-centric
Eventual consistency – more definitions
• Confluence is a strong correctness criterion
  – Not all programs are meant to be deterministic
• But it’s a nice property
  – E.g., for replay-based debugging
  – E.g., because the meaning of a program is its output (not its “traces”)
• Confluent => convergent
Eventual consistency – more definitions
Confluent => convergent
Deterministic executions imply replica agreement
Eventual consistency – more definitions
But convergent state does not imply deterministic executions
(peter notes)
• EC systems focus only on controlling write anomalies (stale reads are always fair game, though session guarantees may restrict which read anomalies can happen)
• EC systems are convergent: eventually there are no divergent states
• Deterministic => convergent (but not the other way around)
• Determinism is compositional: two deterministic systems glued together make a deterministic system
• Convergence is not compositional: two convergent systems glued together do not necessarily make a convergent system (e.g., if the glue is nonmonotonic)
(peter notes)
• Guarded asynchrony: need to carefully explain the significance of this
• Essentially, confluent programs cannot allow one-shot queries on changing state (even monotonically changing)
• One-shot queries must be converted into subscriptions to a stream of updates
  – That way, we are guaranteed to see the last update to a given lattice
(joe notes)
• There’s stuff you can do in the storage layer… or you can pop up.
• Layer vs. language
• Sequential emulation at the storage layer
• CRDTs: a state-centric attempt to achieve relaxed ordering (object-by-object)
• Then Bloom
• The programming model should match the computation model
Distributed design patterns for eventual consistency
ACID 2.0
The classic ACID has the goal to make the application perceive that there is exactly one computer and it is doing nothing else while this transaction is being processed.
Consider the new ACID (or ACID2.0). The letters stand for: Associative, Commutative, Idempotent, and Distributed. The goal for ACID2.0 is to succeed if the pieces of the work happen:
At least once, Anywhere in the system, In any order.
- Pat Helland, Building on Quicksand
ACID 2.0
• Associative: operations can be “eagerly” processed
• Commutative: operations can be reordered
• Idempotent: retry is always an option
• Distributed: (needed a “D”)
ACID 2.0
Instead of low-level reads and writes, programmers use an abstract vocabulary of reorderable, retryable actions:
• Retry: a mechanism to ensure that all messages are delivered
• Reorderability: ensures that all replicas converge
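A minimal sketch of why reorderable, retryable actions make delivery order and duplication harmless. It folds a merge function over a delivery schedule; `deliver` is a hypothetical helper, and set union stands in for an ACI action while plain overwrite stands in for a low-level write.

```ruby
require 'set'

# Fold a merge function over a delivery schedule (an array of messages).
def deliver(schedule, initial, &merge)
  schedule.reduce(initial) { |state, msg| merge.call(state, msg) }
end

messages  = [Set[1], Set[2], Set[3]]
union     = ->(a, b) { a | b }       # associative, commutative, idempotent
overwrite = ->(_old, new) { new }    # order-sensitive low-level write

in_order  = deliver(messages, Set.new, &union)
reordered = deliver(messages.reverse, Set.new, &union)
retried   = deliver(messages + messages, Set.new, &union)
puts in_order == reordered && in_order == retried

first_wins = deliver(messages.reverse, Set.new, &overwrite)
last_wins  = deliver(messages, Set.new, &overwrite)
puts first_wins == last_wins # overwrite depends on arrival order
```

With union, any schedule that eventually delivers every message at least once ends in the same state, which matches “at least once, anywhere in the system, in any order”.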
Putting ACID 2.0 into practice
1. CRDTs
  – A state-based approach
  – Keep distributed state in data structures providing only ACI methods
2. Disorderly programming
  – A language-based approach
  – Encourage structuring computation using reorderable statements and data
Formalizing ACID 2.0
• ACI are precisely the properties that define the least upper bound (LUB) operation in a join semilattice
• If states form a lattice, we can always merge states using the LUB.
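Written out, the laws in question are the standard join-semilattice axioms. The symbol $\sqcup$ for the LUB (join) and $\le$ for the induced order are notational choices made here, not taken from the slides.

```latex
\begin{align*}
(a \sqcup b) \sqcup c &= a \sqcup (b \sqcup c) && \text{(associative)} \\
a \sqcup b &= b \sqcup a && \text{(commutative)} \\
a \sqcup a &= a && \text{(idempotent)} \\
a \le b &\iff a \sqcup b = b && \text{(induced partial order)}
\end{align*}
```

Because $a \le a \sqcup b$ and $b \le a \sqcup b$ always hold, merging never moves a replica's state downward in the order.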
C(v)RDTs
Convergent Replicated Datatypes
Idea:
• Represent state as a join semilattice.
• Provide an ACI merge function.
CRDTs
Data structures:
1. Grow-only set (GSet)
  – Trivial: merge is union and union is commutative
2. 2PSet
  – Two GSets: one for adds, the other for tombstones
  – Idiosyncrasy: you can only add/delete once.
3. Counters
  1. Tricky! Vector clock with an entry for each replica
  2. Increment at replica i => VC[i] += 1
  3. Value: sum of all VC values
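The counter recipe above can be sketched as follows. This is a hedged illustration: the class name `GCounter` and its methods are hypothetical, and the merge is assumed to be the entry-wise maximum, which is the usual choice for this design.

```ruby
# Hypothetical grow-only counter with one entry per replica.
class GCounter
  attr_reader :counts

  def initialize(replica_id, counts = {})
    @replica_id = replica_id
    @counts = counts.dup
  end

  # Increment at replica i => VC[i] += 1
  def increment
    @counts[@replica_id] = @counts.fetch(@replica_id, 0) + 1
    self
  end

  # Value: sum of all entries.
  def value
    @counts.values.sum
  end

  # Merge: entry-wise max (assumed), which is associative, commutative,
  # and idempotent, so re-delivered or reordered states are harmless.
  def merge(other)
    merged = @counts.merge(other.counts) { |_id, mine, theirs| [mine, theirs].max }
    GCounter.new(@replica_id, merged)
  end
end

a = GCounter.new(:a).increment.increment
b = GCounter.new(:b).increment
puts a.merge(b).value
puts a.merge(b).merge(b).value # merging the same state twice changes nothing
```

A single shared integer would not work here: merging by sum double-counts on retry, and merging by max loses concurrent increments. Keeping one entry per replica avoids both problems.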
CRDTs
Difficulties:
Convergent objects alone are not strong enough to build confluent systems.
Asynchronous messaging
You never really know
Asynchronous messaging
[Figure: send. Input: A, B, C. Output: A, B, C.]
Monotonic Logic
The more you know, the more you know.
Monotonic Logic
[Figure: select/filter. Input: A, B, C, D, E. Output: A, C, E.]
Monotonic Logic
[Figure: project/map. Input: A, B, C. Output: f(A), f(B), f(C).]
Monotonic Logic
[Figure: join/compose. Labels shown: A, B, C, D, E; B, D; B, D; F.]
Monotonic Logic is order-insensitive
[Figure: Input: A, B, C, D, E. Output: A, C, E.]
Monotonic Logic is pipelineable
[Figure: Input: A, B, C, D, E. Output: A, C, E.]
Nonmonotonic Logic
When do you know for sure?
Nonmonotonic Logic
[Figure: set minus. Labels shown: A, B, C, D, E; B, D; A, B, C, D, E; X, Y, Z.]
Retraction!
Nonmonotonic logic is order-sensitive
[Figure: set minus. Inputs: A, B, C, D, E and B, D. Output: A, C, E.]
Nonmonotonic logic is blocking
[Figure: set minus. Input: A. Output: A?]
Nonmonotonic logic is blocking
[Figure: set minus. Input: A. Output: A? Annotation: “Sealed”.]
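The contrast between monotonic and nonmonotonic operators can be sketched in a few lines. All three helpers are hypothetical; arrivals on the two inputs of set minus are modeled as `[:left, x]` or `[:right, x]` pairs, and “sealed” is modeled simply as having the complete input in hand.

```ruby
require 'set'

# Monotonic: emit each input that passes the filter as soon as it arrives.
# Nothing emitted ever needs to be taken back.
def streaming_select(arrivals, &keep)
  out = Set.new
  arrivals.each { |x| out << x if keep.call(x) }
  out
end

# Nonmonotonic, evaluated eagerly: emit x from the left input unless it has
# already been seen on the right input.
def eager_set_minus(arrivals)
  right = Set.new
  out = Set.new
  arrivals.each do |side, x|
    if side == :right
      right << x
    elsif !right.include?(x)
      out << x # a later [:right, x] would require a retraction
    end
  end
  out
end

# Blocking variant: answer only once both inputs are complete ("sealed").
def sealed_set_minus(arrivals)
  left  = arrivals.select { |side, _| side == :left }.map(&:last).to_set
  right = arrivals.select { |side, _| side == :right }.map(&:last).to_set
  left - right
end

early = [[:left, "A"], [:left, "B"], [:right, "B"]]
late  = [[:right, "B"], [:left, "A"], [:left, "B"]]

puts eager_set_minus(early) == eager_set_minus(late)   # order-sensitive
puts sealed_set_minus(early) == sealed_set_minus(late) # order-insensitive
```

The eager version is pipelineable but can emit a B that later has to be retracted; the sealed version is correct in every order but cannot answer until it knows no more input is coming.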
CALM Analysis
• Asynchrony => loss of order
• Nonmonotonicity => order-sensitivity
• Asynchrony ; Nonmonotonicity => Inconsistency
[…]
CALM Analysis
• Asynchrony => loss of order
• Nonmonotonicity => order-sensitivity
• Asynchrony ; Nonmonotonicity => Inconsistency
?
“Point of Order”
[…]
Disorderly programming
An aside about logic programming:
In (classical) logic, theories are
• Associative and commutative
  – Consequences are the same regardless of the order in which we make deductions
• Idempotent
  – Axioms can be reiterated freely
Disorderly programming
An aside about logic programming:
In (classical) logic, theories are Associative, Commutative, and Idempotent because
Knowledge is monotonic:
The more you know, the more you know
Disorderly programming
An aside about logic programming:
It is challenging to even talk about order in logic programming languages [Dedalus].
Yet we can build …
Disorderly programming
Idea: embody the ACID 2.0 design patterns in how we structure distributed programs.
Disorderly data: unordered relations
Disorderly code: specify how data changes over time
Bloom
Bloom Rules
multicast <~ (message * members) do |mes, mem|
  [mem.address, mes.id, mes.payload]
end
Operational model
[Figure: lattice values growing over time]
Set (merge = Union): “growth” means larger sets
Integer (merge = Max): “growth” means larger numbers
Boolean (merge = Or): “growth” means false → true
Monotone function from set to max: size()
Monotone function from max to boolean: >= 5
Builtin Lattices

Name  | Description                     | ⊥         | a ⊔ b     | Sample monotone functions
lbool | Threshold test                  | false     | a ∨ b     | when_true() → v
lmax  | Increasing number               | −∞        | max(a, b) | gt(n) → lbool; +(n) → lmax; -(n) → lmax
lmin  | Decreasing number               | ∞         | min(a, b) | lt(n) → lbool
lset  | Set of values                   | ∅         | a ∪ b     | intersect(lset) → lset; product(lset) → lset; contains?(v) → lbool; size() → lmax
lpset | Non-negative set                | ∅         | a ∪ b     | sum() → lmax
lbag  | Multiset of values              | ∅         | a ∪ b     | mult(v) → lmax; +(lbag) → lbag
lmap  | Map from keys to lattice values | empty map |           | at(v) → any-lat; intersect(lmap) → lmap
Quorum Vote in BloomL
QUORUM_SIZE = 5
RESULT_ADDR = "example.org"

class QuorumVote                  # Annotated Ruby class
  include Bud

  state do                        # Program state
    # Communication interfaces
    channel :vote_chn, [:@addr, :voter_id]
    channel :result_chn, [:@addr]
    # Lattice state declarations
    lset  :votes
    lmax  :vote_cnt
    lbool :got_quorum
  end

  bloom do                        # Program logic
    # Accumulate votes into set (<= is the merge function for the set lattice)
    votes      <= vote_chn {|v| v.voter_id}
    # Map set → max
    vote_cnt   <= votes.size
    # Map max → bool
    got_quorum <= vote_cnt.gt_eq(QUORUM_SIZE)
    # Threshold test on bool
    result_chn <~ got_quorum.when_true { [RESULT_ADDR] }
  end
end
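To see why this pipeline is safe under reordering and duplication, here is a plain-Ruby emulation that does not use Bud. It is only a sketch: `QUORUM`, `got_quorum?`, and `observe` are hypothetical names, and a grow-only `Set` stands in for `lset`.

```ruby
require 'set'

QUORUM = 5 # hypothetical; plays the role of QUORUM_SIZE

# size is a monotone map from set to max; (>= QUORUM) is a monotone
# map from max to bool. Their composition can only go false -> true.
def got_quorum?(votes)
  votes.size >= QUORUM
end

# Merge each arriving voter id into the set and record the threshold
# test after every arrival.
def observe(arrivals)
  votes = Set.new
  arrivals.map do |voter_id|
    votes << voter_id
    got_quorum?(votes)
  end
end

trace = observe([1, 2, 2, 3, 4, 5, 6])
puts trace.inspect
```

Duplicate votes do not inflate the count because the set absorbs them, and once the threshold test turns true no later arrival can turn it false, so the result message never needs to be retracted.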
Convergence – a 2PSet
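A minimal sketch of a 2PSet, assuming the usual construction from two grow-only sets. The class `TwoPSet` and its methods are hypothetical, and for brevity `remove` does not check that the element was added first.

```ruby
require 'set'

# Hypothetical two-phase set: one grow-only set of adds, one of tombstones.
class TwoPSet
  attr_reader :adds, :tombstones

  def initialize(adds = Set.new, tombstones = Set.new)
    @adds = adds
    @tombstones = tombstones
  end

  def add(x)
    TwoPSet.new(@adds | [x], @tombstones)
  end

  # Assumption: a tombstone may be recorded even if x was never added.
  def remove(x)
    TwoPSet.new(@adds, @tombstones | [x])
  end

  def include?(x)
    @adds.include?(x) && !@tombstones.include?(x)
  end

  # Merge: union of both components, so merge is ACI.
  def merge(other)
    TwoPSet.new(@adds | other.adds, @tombstones | other.tombstones)
  end

  def to_set
    @adds - @tombstones
  end
end

base      = TwoPSet.new.add("x")
replica_a = base.remove("x")
replica_b = base.add("y")

puts replica_b.include?("x")                  # true before the tombstone arrives
puts replica_a.merge(replica_b).include?("x") # false after merging
```

The replicas converge whichever way they merge, yet a reader at `replica_b` saw “x” before the tombstone arrived. The state converges while an earlier observation is retracted, which is why convergent objects are not retraction-free.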
The difficulty with queries
“The work of multiple transactions can interleave as long as they are doing the commutative operations. If any transaction dares to READ the value, that does not commute, is annoying, and stops other concurrent work.” - Pat Helland