flexible approaches to replicating shared data consistently · globally distributed concurrent...

41
Flexible approaches to replicating shared data consistently Marc Shapiro Joint work with Nishith Krishna and Karthikeyan Bhargavan

Upload: others

Post on 23-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

Flexible approaches to replicating shared data consistently

Marc ShapiroJoint work with Nishith Krishna and Karthikeyan Bhargavan

Page 2: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 1

Sharing information on a global scale

� Large numbers of users� Globally distributed� Concurrent access and update� Invariants between objects� Conflicts are rare but do occur� Variable network bandwidth, high latency

Replicate for fault tolerance, reduced latency, load balancing

Enterprise collaboration, business information

Page 3: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 2

Important lesson #1

Replication is beneficial in many information sharing scenarios:

� Preserves autonomy

� Reduces access latency

� Improves fault-tolerance

� Supports disconnected operation

Page 4: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 3

System model

0

0

1

2

0 3

Many possible schedulesIn or out? Order?

Converge

rejectedrejected

executedexecuted

action = action = value, delta value, delta or operationor operationBob

Suzy

Mary

Page 5: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 4

Synchronous updates (pessimistic replication)

1SR = 1-Copy Serialisability

Avoid conflicts a priori by locking

Sequential access Intuitive, transparent

Vulnerable to:� latency� disconnection,

faults� deadlock

Doesn’t scale if write contention

paintred

insertsmiley

timeBob

Mary

lock

lock

Page 6: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 5

Asynchronous updates (optimistic replication)

Resolve conflicts a posteriori

� Disconnected, cooperative

� Powerful� Batch & optimise

Tentative:� Diverge, rollback� Different user

experienceDoesn’t scale if write contention

paintred

insertsmiley

time

Suzy

reconcileBob

Page 7: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 6

Important lesson #2

No replication scheme is ideal for all applications.

� Performance, complexity, consistency trade-offs

� No “one-size-fits-all” design

� Pessimistic / optimistic mode visible to users

� Contention / conflicts critical

Page 8: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 7

Conflicts & non-commute

Conflict: concurrent execution would violate application invariant� e.g. calendar: no double booking

Non-commuting operations: decide order� Commuting: optimisations

Scheduling� Conflict: Is action in or out?� Non-commuting: Ordering?

Bob & SuzyMon 10:00

Mary & SuzyMon 10:00

time

Bob

Suzy

price -= 10€

price *= 1.05

Page 9: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 8

Important lesson #3

Understand your application needs and design with replication in mind.

� Capture invariants.� Design for commutativity. � Avoid concurrent non-commuting

operations.� Avoid conflicting operations.� Otherwise have modest scalability

expectations.

Page 10: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 9

Exploring the consistency design spaceUnderstanding replication & consistency

� Semantics� Asynchronous / optimistic updates� Partial replication� Decentralised

Constraint-graph representation� Break into simpler sub-problems� Composable sub-algorithms� Spectrum of solutions

New serialisation algorithm� No unnecessary aborts

Page 11: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 10

Scenario

αααα

ββββ γγγγ

δδδδ εεεε

�→

Promote 1 Julysalary += 1000

Redundancy

salary := 0

MustHave

Before

NonCommutingConflict

Atomic

CausalDependence

0 1

0 2

####

Page 12: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 11

Multilog & schedule

M = (K, →, �, #) Local view per site:� Known actions� Known constraints� Grows over time

Sound schedule: S = init α γ … ∈ Σ(M): � known actions, zero or once� αααα →→→→ ββββ ∧∧∧∧ α, β∈S ⇒ α <S β� αααα ���� ββββ ∧∧∧∧ β∈S ⇒ α∈S

M sound ⇔ Σ(M) ≠ ∅

Page 13: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 12

Protocol primitives

Guar (M) = {α | α ∈ every sound schedule }Dead (M) = {α | α ∉ every sound schedule }

Serialised(M) = {α | α # β ⇒

α→β ∨ β→α ∨ β∈Dead (M) }Decided (M) = Dead(M) ∪

(Guar(M) ∩ Serialised(M))

Monotonic in t

M sound ⇔ Guar (M) ∩ Dead (M) = ∅

Page 14: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 13

Consistency: a formal definition

Mergeability: Any combination of multilogs remains sound:

∀ i, i’, i’’,…, t, t’, t’’…Mi(t) ∪ Mi’(t’) ∪ Mi’’(t’’)… sound

Eventual Decision: Every action eventually decided everywhere:

∀α, i, j, t: α ∈ Ki(t) ⇒ ∃t’, α ∈ Decided (Mj (t’))

Omniscient observer:(∪Dead)∩(∪Guar ) = ∅

Page 15: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 14

Abstract consistency algorithm

Input: any application semantics (K, →, �, #)

Decompose into very simple sub-problems

Graphs:� I = input� B = Before� M = MustHave� S = Serialisation� O = output

Output: scheduling partial order

α

β γ

δ ε

γ

δ ε O graph

I graph

Page 16: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 15

Conflict breaking

Make dead at least one action per → cycle

B: Before edges from I� Redden a node� Delete red node and its

edges� Terminate when acyclic

Concurrent, asynchronousNumerous variants

α

β γ

δ ε

α

β γ

δ ε B graph

I graph

Page 17: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 16

Conflict-breaking spectrum

B-Null: B assumed acyclic; do nothing [file systems, Usenet, ESDS]

B-TotalOrder, B-LocalMin: UIDs [DB]B-Conservative: Redden every node ∈ cycle

[Holliday]B-HighDegree: Redden highest-degree node

[Hamadi]Sub-algorithms: Not optimalB-IceCube: Globally minimise red nodesB-Arbitrary: application/user

Page 18: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 17

Agreement

If α∈Dead ∧ α�β then β∈DeadM: MustHave edges from IColour shared across graphs

� Propagate colour along edgesConcurrent, asynchronous

α

β γ

δ ε

α

β γ

δ ε M graph

I graph

Page 19: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 18

Serialisation

Serialise non-dead # edgesS: →, # edges from I

� Delete red node & edges� Insert → along # in S, B, O� Delete # when →� Terminate when no # edges

Concurrent May create new cycles in BMany variants

α

β γ

δ ε

α

β γ

δ ε S graph

I graph

Page 20: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 19

Serialisation spectrum

S-Null: Assume no unordered #; do nothing [Usenet, C-ESDS]

S-Random: baselineS-Conservative: convert to conflict [DB]S-TotalOrder: UIDs [NC-ESDS]S-HappensBefore: follow Happens-Before

[state-machine replication]

Page 21: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 20

Output

O: edges, → from IColours from Conflict Breaking,

Agreement→ edges from Serialisation

� When 3 sub-algorithms have all terminated

� Make red nodes dead, others guaranteed

Scheduling partial order

α

β γ

δ ε

α

β γ

δ ε O graph

I graph

init

Page 22: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 21

Cycle-avoiding serialisation algorithm

Idea: given some node α� Consider all 24 possible neighbourhoods� Serialise in direction that cannot create

a cycle, if exists� Otherwise deterministic global order

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

α

β+

Page 23: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 22

Cycle-free serialisation algorithm

Start when B acyclic. In S:� Choose two nodes α, β� Lock α, β� Atomically perform the cycle-avoiding

serialisation move:• Insert → in S, O• Delete #

� UnlockNever causes abortsPairwise agreement

Page 24: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 23

Isolation

Transaction isolationT: initially same as S + transactions

� If ∃ → with # between T1, T2� Then ∀ → with # between T1,

T2� Terminate when done

Concurrent, asynchronousMay create new cycles in B

� C-B does not terminate before isolation

Many variants

α

β γ

δ ε

α

β γ

δ ε T graph

I graph

Page 25: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 24

Example

No two-phase commit

d1 d2

d1 d2 d3

d3

T1 = d1.r → d2.w

T2 = d2.r → d1.w → d3.r

T3 = d3.w

schedule = d2.r d1.w d3.w d3.r d1.r d2.w

serialise

isolation

serialise

Page 26: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 25

Simulations: Pseudo-realistic

B-HD = high-degreeB-Cons = conservativeB-LM = local minimum

S-AC = avoid-cyclesS-Rand = randomS-Cons = conservative

Page 27: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 26

Joyce

Document = multilog� 1 operation log / user� Operations� Constraints = logical invariants

Local views� Write to my log: updates are local� View: collect multilog, break conflicts,

replay� Consistent: resolution & replay satisfies

constraintsConvergence: authoritative log

Page 28: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 27

Joyce collaboration experience

paintred

insertsmiley

time

Suzy

reconcileBob

reconcile

Local views� Reconcile� Unlimited, selective undo

Convergence� Commit log

Page 29: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 28

Conclusion

Actions & constraints: simple, formal model� Encode application semantics� Express consistency

Basic components of consistency � Decide� Mergeability

Universal consistency protocol� Sub-algorithms� Mix & Match� Cycle-avoiding serialisation

Partial replication

Page 30: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 29

----

Page 31: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 30

Site schedule

S ∈ Σ(M)� Choose any sound schedule� Si(t+1) / Si(t) / Si’(t) may differ greatly

More actions ⇒ more non-determinism More constraints ⇒ less non-determinismEnough to ensure consistency

0

0

0

Si(t) ∈ Σ(Mi(t))

Page 32: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 31

Example more actionsmore actions⇒⇒⇒⇒⇒⇒⇒⇒

more schedulesmore schedulesmore constraintsmore constraints

⇒⇒⇒⇒⇒⇒⇒⇒

fewer schedulesfewer schedules

Page 33: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 32

Eventual Consistency

From the literature: EC =� If all clients stop submitting new

updates,� Then eventually all replicas converge to

the same value� (Eventually decide)

Page 34: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 33

Common monotonic prefix property

There exists prefix π(i,t):� Monotonic: t < t’ ⇒ π(i, t) << π(i, t’)� Equivalence: π(i, t) ≡ π(i’, t)� Eventually inclusive: α∈Ki(t) ⇒ α∈ π(i, t’)

CMP: goals to achieve

0

0

0

Page 35: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 34

Composing sub-algorithms

Parallel composition:� Any conflict-breaking algorithm� Any serialisation algorithm

Subtle termination conditions:� Parallel composition. Terminate: (1)

Serialisation, (2) Conflict-breaking, (3) Agreement

� Fast agreement minimises red nodes� Sequential composition: conflict breaking +

agreement ⇒ S acyclic. Then S-NoCycles + synchronisation

Page 36: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 35

S-AvoidCycles

Page 37: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 36

Simulations: range

B-HighDegree + S-NoCycles

Page 38: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 37

Simulations: Random

B-HD = high-degreeB-Cons = conservativeB-LM = local minimum

S-AC = avoid-cyclesS-Rand = randomS-Cons = conservative

Page 39: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 38

Incremental algorithm

Cannot decide α until all its constraints known

Iteratively dectect quiescent subgraph: timestamp matrix

Output from interation n = input to iteration n+1� Verify inclusion property

Page 40: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 39

Partial replication

A site replicates any number of disjoint databases

Receives actions, constraints relative to its replicas only

Consistency:� Mergeability� Eventual decision w.r.t. database

No need for global consensusOmniscient observer = full replication site

Page 41: Flexible approaches to replicating shared data consistently · Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable

MS Academic Days Europe 2005-04 Exploring consistency design space 40

Partial replication + Cycle-free serialisation

Partitioned database + partial replicationOperations commute across partitionA small number (often 1) of primary nodes

decide partitionIn-partition NonCommute: primary decidesCross-partition: pairwise agreement

Total order unnecessary (≠ state-machine replication)