Eventual Consistency Jinyang

Upload: paige-brunell

Post on 22-Dec-2015


Eventual Consistency

Jinyang

Sequential consistency

• Sequential consistency properties:
  – Latest read must see latest write
    • Handles caching
  – All writes are applied in a single order
    • Handles concurrent writes
• Realizing sequential consistency:
  – Reads/writes from a single node execute one at a time
  – All reads/writes to address X must be ordered by one memory/storage module responsible for X

Realizing sequential consistency

[Figure: two boxes labeled "Cache or replica"; operations W(A)1, W(A)2, and W(B)3; further labels "Invalidate" and "R(B)".]

Disadvantages of sequential consistency

• Requires highly available connections
  – Lots of chatter between clients/servers
• Not suitable for certain scenarios:
  – Disconnected clients (e.g. your laptop)
  – Apps might prefer potential inconsistency to loss of availability

Why (not) eventual consistency?

• Support disconnected operations
  – Better to read a stale value than nothing
  – Better to save writes somewhere than nothing
• Potentially anomalous application behavior
  – Stale reads and conflicting writes…

Operating w/o total connectivity

[Figure: two replicas, with writes W(A)1 and W(A)2.]

• Client writes to its local replica
• Sync w/ server resolves non-conflicting changes, reports conflicting ones to user
• No sync between clients

Pair-wise synchronization

[Figure: three replicas, with writes W(A)1, W(A)2, and W(B)3.]

• Pair-wise sync resolves non-conflicting changes, reports conflicting ones to users

Example usages?

• File synchronizers
  – One user, many gadgets

File synchronizer

• Goal
  1. All replica contents eventually become identical
  2. No lost updates
     – Do not replace a new version with an old one

Prevent lost updates

• Detect if updates were sequential
  – If so, replace old version with new one
  – If not, detect conflict
• “Optimistic” vs. “Pessimistic”
  – Eventual consistency: let updates happen, worry about whether they can be serialized later
  – Sequential consistency: updates cannot take effect unless they are serialized first

How to prevent lost updates?

• Strawman: use mtime to decide which version should replace the other

• Problem w/ wallclock: cannot detect disagreement on ordering

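[Figure: timeline for hosts H1 and H2 with writes W(f)a (mtime: 15648), W(f)b (16679), and W(f)c (23657); copies of f are also labeled with mtimes 12354 and 15648.]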
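A minimal sketch of why the mtime strawman loses updates. It assumes, as the history slide suggests, that W(f)b on H1 and W(f)c on H2 were both made independently on top of W(f)a; the function name `newer_by_mtime` and the dictionary layout are illustrative only.

```python
def newer_by_mtime(a, b):
    """Strawman rule: the copy with the larger mtime replaces the other."""
    return a if a["mtime"] >= b["mtime"] else b


# mtimes taken from the slide's timeline.
h1_copy = {"host": "H1", "data": "b", "mtime": 16679}  # W(f)b on H1
h2_copy = {"host": "H2", "data": "c", "mtime": 23657}  # W(f)c on H2

winner = newer_by_mtime(h1_copy, h2_copy)
# The rule always produces a winner, so "b" is silently dropped.
# Nothing in two wall-clock numbers says the writes were concurrent.
print(winner["data"])
```

The comparison can only say which number is larger; it has no way to answer "was one write made with knowledge of the other?", which is the question that matters for lost updates.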

Strawman fix

• Carry the entire modification history

• If history X is a prefix of Y, Y is newer

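[Figure: H1 performs W(f)a and then W(f)b; H2 performs W(f)c. Histories shown: H1:15648 (on both hosts after W(f)a); H1:15648 H1:16679 (after W(f)b); H1:15648 H2:23657 (after W(f)c).]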
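The prefix rule can be sketched as follows. Histories are modeled as lists of (host, mtime) pairs; the function name `compare_histories` and the result strings are assumptions made for illustration.

```python
def compare_histories(x, y):
    """Compare two modification histories using the prefix rule."""
    if x == y:
        return "equal"
    if y[:len(x)] == x:
        return "y_newer"   # x is a prefix of y
    if x[:len(y)] == y:
        return "x_newer"   # y is a prefix of x
    return "conflict"      # neither extends the other


# Histories from the slide.
after_a = [("H1", 15648)]
after_b = after_a + [("H1", 16679)]  # H1 wrote b on top of a
after_c = after_a + [("H2", 23657)]  # H2 wrote c on top of a

print(compare_histories(after_a, after_b))  # y_newer
print(compare_histories(after_b, after_c))  # conflict
```

Unlike the mtime comparison, this detects that b and c diverged from a common ancestor. The cost is that every file carries its whole history.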

Compress version history

• H1:2 implies H1:1, so we only need one number per host

[Figure: H1 performs W(f)a and then W(f)b; H2 performs W(f)c. Full histories H1:1; H1:1 H1:2; H1:1 H1:2 H2:1 compress to H1:1; H1:2; H1:2 H2:1.]

Compare vector timestamp

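• (H1:1 H2:3 H3:2) < (H1:1 H2:5 H3:7): every element on the left is less than or equal to its counterpart
• (H1:1 H2:3 H3:2) vs. (H1:2 H2:1 H3:7): not ordered, since H1 is smaller on the left but H2 is smaller on the right, so "<" does not hold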
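A small sketch of the element-wise comparison, using the vectors from this slide. Vector timestamps are modeled as dictionaries from host name to counter, with missing hosts treated as 0; the names `vt_leq` and `vt_compare` are illustrative.

```python
def vt_leq(a, b):
    """True if every element of a is <= the matching element of b."""
    return all(a.get(h, 0) <= b.get(h, 0) for h in set(a) | set(b))


def vt_compare(a, b):
    """Classify two vector timestamps."""
    a_le_b, b_le_a = vt_leq(a, b), vt_leq(b, a)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a < b: b has seen everything a has
    if b_le_a:
        return "after"
    return "concurrent"      # neither dominates: a conflict


x = {"H1": 1, "H2": 3, "H3": 2}
print(vt_compare(x, {"H1": 1, "H2": 5, "H3": 7}))  # before
print(vt_compare(x, {"H1": 2, "H2": 1, "H3": 7}))  # concurrent
```

For the file synchronizer, "before" means the older copy can be replaced safely, and "concurrent" means the two copies were updated independently.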

Using vector timestamp

[Figure: H1 performs W(f)a and then W(f)b; H2 performs W(f)c. Vector timestamps shown: H1:1, H1:2; H1:1, H1:1 H2:1; H1:2, H1:2 H2:1.]

Using vector timestamp

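[Figure: H1 performs W(f)a and then W(f)b; H2 performs W(f)c. Vector timestamps shown: H1:1, H1:2 on H1; H1:1, H1:1 H2:1 on H2. H1:2 and H1:1 H2:1 are not ordered.]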

How to deal w/ conflicts?

• Easy: mailboxes w/ two different sets of messages

• Medium: changes to different lines of a C source file

• Hard: changes to same line of a C source file

• After conflict resolution, what should the vector timestamp be?
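The slide leaves the last question open. One common answer, offered here as an assumption rather than as the lecture's own, is to take the element-wise maximum of the two conflicting timestamps and then count the resolution itself as a new write on the resolving host. The function name `resolved_timestamp` is illustrative.

```python
def resolved_timestamp(resolver, vt_a, vt_b):
    """Timestamp for a merged version created on host `resolver`."""
    hosts = set(vt_a) | set(vt_b)
    merged = {h: max(vt_a.get(h, 0), vt_b.get(h, 0)) for h in hosts}
    # The merge is itself a modification made by the resolver.
    merged[resolver] = merged.get(resolver, 0) + 1
    return merged


# H1 holds version b (H1:2), H2 holds version c (H1:1 H2:1); H1 resolves.
result = resolved_timestamp("H1", {"H1": 2}, {"H1": 1, "H2": 1})
print(result)
```

With this choice the merged version dominates both inputs, so either conflicting copy can later be replaced by it without raising a new conflict.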

What about file deletion?

• Can we forget about the vector timestamp for deleted files?

• Simple solution: treat deletion as a write
  – Conflicts involving a deleted file are easy
• Downside:
  – Need to remember vector timestamp for deleted files indefinitely

Tra [Cox, Josephson]

• What are Tra’s novel properties?
  – Easy to compress storage of vector timestamps
  – No need to check every file’s version vector during sync
  – Allows partial sync of subtrees
  – No need to keep timestamp for deleted files forever

Tra’s key technique

• Two vector timestamps:
  1. One represents modification time
     – Tracks what a host has
  2. One represents synchronization time
     – Tracks what a host knows
• Sync time implies no modification happens since mod time

[Figure: a file with modification time H1:1 H2:5 H3:7 and synchronization time H1:10 H2:20 H3:25. Files f1 and f2 are each labeled with H1:0 and H1:0 H2:0.]
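A sketch of how the two timestamps might drive a one-way sync of a single file. It keeps both times as full vectors, as on these slides, and the record layout (`data`, `mtime`, `synctime`) and the function names are assumptions for illustration; it is not presented as Tra's actual implementation.

```python
def leq(a, b):
    """True if vector a is element-wise <= vector b (missing hosts are 0)."""
    return all(a.get(h, 0) <= b.get(h, 0) for h in set(a) | set(b))


def vmax(a, b):
    """Element-wise maximum of two vectors."""
    return {h: max(a.get(h, 0), b.get(h, 0)) for h in set(a) | set(b)}


def sync_file(src, dst):
    """Sync one file from src to dst, updating dst in place."""
    if leq(src["mtime"], dst["synctime"]):
        outcome = "up_to_date"        # dst already knows src's version
    elif leq(dst["mtime"], src["synctime"]):
        dst["data"] = src["data"]     # src's version was built on dst's
        dst["mtime"] = dict(src["mtime"])
        outcome = "copied"
    else:
        return "conflict"             # independent modifications
    # dst now knows everything src knew about this file.
    dst["synctime"] = vmax(src["synctime"], dst["synctime"])
    return outcome


h1_f1 = {"data": "a", "mtime": {"H1": 1}, "synctime": {"H1": 2, "H2": 0}}
h2_f1 = {"data": "", "mtime": {"H1": 0}, "synctime": {"H1": 0, "H2": 0}}
first = sync_file(h1_f1, h2_f1)
second = sync_file(h1_f1, h2_f1)
print(first, second)
```

The second call does no work: H2's sync time already covers H1's modification time, which is exactly what "tracks what a host knows" buys over a modification time alone.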

Using sync time

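[Figure: timeline for hosts H1 and H2; H1 performs W(f1)a and then W(f2)b. Labels shown for f1 and f2 (mod time, then sync time): H1:1, H1:1 H2:0; H1:2, H1:2 H2:0; H1:1, H1:2 H2:0; H1:2, H1:2 H2:0.]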

Compress mtime and synctime

• dir synctime = element-wise min of child sync times

• dir mtime = element-wise max of child mod times

• Sync(d1 → d1’)
  – Skip d1 if mtime of d1 is less than synctime of d1’
• Can we achieve this with a single mtime?
  – Skip d1 if mtime of d1 is less than mtime of d1’
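The directory rules can be sketched as below. The helper names (`combine`, `dir_times`, `can_skip`) are illustrative, missing hosts are treated as 0, and the skip test uses element-wise "less than or equal" as an assumed reading of the slide's "less than".

```python
def leq(a, b):
    """True if vector a is element-wise <= vector b (missing hosts are 0)."""
    return all(a.get(h, 0) <= b.get(h, 0) for h in set(a) | set(b))


def combine(vectors, pick):
    """Apply pick (min or max) element-wise across a list of vectors."""
    hosts = set().union(*vectors)
    return {h: pick(v.get(h, 0) for v in vectors) for h in hosts}


def dir_times(children):
    """Directory mtime = max of child mtimes; synctime = min of child synctimes."""
    mtime = combine([c["mtime"] for c in children], max)
    synctime = combine([c["synctime"] for c in children], min)
    return mtime, synctime


def can_skip(src_dir_mtime, dst_dir_synctime):
    """The whole subtree can be skipped if dst already knows src's changes."""
    return leq(src_dir_mtime, dst_dir_synctime)


# Destination directory where only f1 has been synced so far.
f1 = {"mtime": {"H1": 1}, "synctime": {"H1": 2, "H2": 0}}
f2 = {"mtime": {"H1": 0}, "synctime": {"H1": 0, "H2": 0}}
d_mtime, d_synctime = dir_times([f1, f2])
print(d_synctime, can_skip({"H1": 2}, d_synctime))
```

Because the directory's sync time is a minimum, a single unsynced child holds it back, so a later sync cannot wrongly skip the directory.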

Synctime enables partial synchronization

• Directory d1 contains f1 and f2; suppose a host syncs only a subtree (d1/f1)
  – With synctime+mtime: synctime of d1 does not change. Mtime of d1 increases
  – With mtime only: mtime of d1 increases
• Host later syncs subtree d1/f2
  – With synctime+mtime: will pull in modifications in f2 because synctime of d1 is smaller
  – With mtime only: skips d1 because mtime is high enough

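Using sync time

[Figure: H1 performs W(f1)a and then W(f2)b in directory d, which holds f1 and f2. Labels on H1: f1 H1:1, f2 H1:2, d with H1:2 and H1:2 H2:0. H2 starts with f1 and f2 each labeled H1:0 and H1:0 H2:0. Step "Sync f1 only": f1 on H2 becomes H1:1 with H1:2 H2:0, while d keeps H1:0 H2:0. Step "Sync f2 only": f2 on H2 becomes H1:2 and d becomes H1:2 H2:0.]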

How to deal w/ deletion

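[Figure: H1 performs W(f1)a and then D(f2) in directory d, which holds f1 and f2. Labels on H1: H1:1, H1:2, and d with H1:2 H2:0. Labels on H2 before the sync: f1 with H1:0, d with H1:0 and H1:0 H2:0. Labels on H2 after the sync: f1 with H1:1, d with H1:2 and H1:2 H2:0; f2 is also shown.]

• Deletion notice for a deleted file contains its sync time (here H1:2 H2:0)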

How to deal w/ deletion

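[Figure: same scenario, except that H2 has also modified f2 (labels H2:1). H1 performs W(f1)a and then D(f2) in directory d; labels on H1: H1:1, H1:2, and d with H1:2 H2:0. Labels on H2 before the sync: f1 with H1:0, d with H1:0 and H1:0 H2:1. Labels on H2 after the sync: f1 with H1:1, d with H1:2 and H1:2 H2:1; f2 is still shown.]

• Deletion notice for a deleted file contains its sync time (here H1:2 H2:0)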
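One way to read the two deletion slides: the receiving host compares its own modification time for the file against the sync time carried by the deletion notice. The sketch below encodes that reading; the function name `apply_deletion` and the result strings are assumptions, and treating the second case as a reported conflict is an interpretation of the figure, not something the slide states.

```python
def leq(a, b):
    """True if vector a is element-wise <= vector b (missing hosts are 0)."""
    return all(a.get(h, 0) <= b.get(h, 0) for h in set(a) | set(b))


def apply_deletion(notice_synctime, local_mtime):
    """Decide what to do with a local file when a deletion notice arrives."""
    if leq(local_mtime, notice_synctime):
        # The deleting host had already seen our version: safe to delete.
        return "delete"
    # We modified the file after (or independently of) the deletion.
    return "conflict"


notice = {"H1": 2, "H2": 0}              # sync time in H1's deletion notice
print(apply_deletion(notice, {"H1": 0}))  # first slide: f2 unchanged on H2
print(apply_deletion(notice, {"H2": 1}))  # second slide: H2 modified f2
```

Since the decision needs only the notice's sync time and the local modification time, the deleting host does not have to keep a per-file modification vector for the deleted file.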

Another definition of eventual consistency

• Eventual consistency (Tra)
  – All replica contents are eventually identical
  – Do not care about individual writes, just overwrite old replica w/ new one
• Eventual consistency (Bayou)
  – Writes are eventually applied in total order
  – Reads might not see most recent writes in total order

Bayou

[Figure: three nodes N0, N1, and N2. Each holds a version vector (0:0 1:0 2:0) and an empty write log.]

Bayou propagation

[Figure: N0 has version vector 0:3 1:0 2:0 and write log 1:0 W(x), 2:0 W(y), 3:0 W(z). N1 has version vector 0:0 1:1 2:0 and write log 1:1 W(x). N2 has version vector 0:0 1:0 2:0 and an empty log. N0's writes and version vector (0:3 1:0 2:0) are propagated to N1.]

Bayou propagation

[Figure: N0 has version vector 0:3 1:0 2:0 and write log 1:0 W(x), 2:0 W(y), 3:0 W(z). N1 now has version vector 0:3 1:4 2:0 and write log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z). N2 has version vector 0:0 1:0 2:0 and an empty log. N1's write 1:1 W(x) and version vector (0:3 1:4 2:0) are propagated to N0.]
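The exchange in these figures can be sketched as a simple anti-entropy step. Writes are modeled as (timestamp, nodeID, operation) tuples, matching the "1:0 W(x)" notation; the function names are illustrative. The figure also shows the receiver's own vector entry advancing (N1 goes from 1:1 to 1:4), which this sketch leaves out.

```python
def writes_to_send(log, peer_vv):
    """Writes the peer has not seen, judged by its version vector."""
    return [w for w in log if w[0] > peer_vv.get(w[1], 0)]


def receive(log, vv, incoming):
    """Merge incoming writes into the log, kept sorted by (timestamp, nodeID)."""
    merged = sorted(set(log) | set(incoming))
    for ts, node, _ in incoming:
        vv[node] = max(vv.get(node, 0), ts)
    return merged


n0_log = [(1, 0, "W(x)"), (2, 0, "W(y)"), (3, 0, "W(z)")]
n1_log = [(1, 1, "W(x)")]
n1_vv = {0: 0, 1: 1, 2: 0}

batch = writes_to_send(n0_log, n1_vv)
n1_log = receive(n1_log, n1_vv, batch)
print(n1_log)
```

After the step, N1's log reads 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z), the same order as in the figure: ties on timestamp are broken by node ID.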

Bayou propagation

[Figure: N0 has version vector 0:4 1:4 2:0 and N1 has version vector 0:3 1:4 2:0; both hold the write log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z). N2 has version vector 0:0 1:0 2:0 and an empty log.]

Which portion of the log is stable?
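One possible answer, sketched under two assumptions that the slides do not spell out: each node's timestamps only increase, and a version vector entry k:t means all of node k's writes up to t have been received. Under those assumptions a write is stable once its timestamp is no larger than the smallest entry in the local version vector. The function name `stable_prefix` is illustrative.

```python
def stable_prefix(log, vv):
    """Writes that no later-arriving write can be ordered before."""
    horizon = min(vv.values())
    return [w for w in log if w[0] <= horizon]


log = [(1, 0, "W(x)"), (1, 1, "W(x)"), (2, 0, "W(y)"), (3, 0, "W(z)")]

# N1 has never heard from N2, so nothing is stable yet.
print(stable_prefix(log, {0: 3, 1: 4, 2: 0}))
# Once N2's entry is 5, every write with timestamp <= 3 is stable.
print(len(stable_prefix(log, {0: 3, 1: 4, 2: 5})))
```

The first call returns an empty list: a single silent node pins the horizon at 0, no matter how much the other nodes have exchanged.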

Bayou propagation

[Figure: nodes N0, N1, and N2, each with a version vector and a write log. Version vectors shown: 0:3 1:4 2:0; 0:4 1:4 2:0; 0:3 1:4 2:5. All three logs hold 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z).]

Bayou propagation

[Figure: nodes N0, N1, and N2, each with a version vector and a write log. Version vectors shown: 0:3 1:6 2:5; 0:4 1:4 2:0; 0:4 1:4 2:5; 0:3 1:4 2:5. All three logs hold 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z).]

Bayou uses a primary to commit a total order

• Why is it important to make log stable?
  – Stable writes can be committed
  – Stable portion of the log can be truncated
• Problem: If any node is offline, the stable portion of all logs stops growing
• Bayou’s solution:
  – A designated primary defines a total commit order
  – Primary assigns CSNs (commit-seq-no)
  – Any write with a known CSN is stable
  – All stable writes are ordered before tentative writes
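A minimal sketch of the commit ordering described above. Tentative writes carry a CSN of infinity so that sorting by (CSN, timestamp, nodeID) places all committed writes first. The `Primary` class, the `order_key` function, and the dictionary layout are hypothetical names for illustration, not Bayou's API.

```python
import math

TENTATIVE = math.inf  # CSN of a write the primary has not yet committed


def order_key(write):
    """Committed writes sort by CSN; tentative ones follow, by (ts, node)."""
    return (write["csn"], write["ts"], write["node"])


class Primary:
    """Hands out commit sequence numbers in arrival order."""

    def __init__(self):
        self.next_csn = 1

    def commit(self, write):
        committed = dict(write, csn=self.next_csn)
        self.next_csn += 1
        return committed


primary = Primary()
own = [{"csn": TENTATIVE, "ts": t, "node": 0, "op": op}
       for t, op in [(1, "W(x)"), (2, "W(y)"), (3, "W(z)")]]
log = [primary.commit(w) for w in own]

tentative = {"csn": TENTATIVE, "ts": 1, "node": 1, "op": "W(x)"}
before = sorted(log + [tentative], key=order_key)
after = sorted(log + [primary.commit(tentative)], key=order_key)
print([w["csn"] for w in before], [w["csn"] for w in after])
```

The tentative write from node 1 has timestamp 1, yet it commits with CSN 4 and stays after the three writes committed earlier. Commit order follows arrival at the primary, not timestamps.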

Bayou propagation

[Figure: N0 has version vector 0:3 1:0 2:0 and write log 1:1:0 W(x), 2:2:0 W(y), 3:3:0 W(z), where the leading number is the CSN. N1 has version vector 0:0 1:1 2:0 and write log ∞:1:1 W(x), a tentative write with no CSN yet. N2 has version vector 0:0 1:0 2:0 and an empty log. N1's write ∞:1:1 W(x) and version vector (0:0 1:1 2:0) are propagated.]

Bayou propagation

[Figure: N0 now has version vector 0:4 1:1 2:0 and write log 1:1:0 W(x), 2:2:0 W(y), 3:3:0 W(z), 4:1:1 W(x). N1 has version vector 0:0 1:1 2:0; its tentative write ∞:1:1 W(x) becomes 4:1:1 W(x). N2 has version vector 0:0 1:0 2:0. N0's log and version vector (0:4 1:1 2:0) are propagated.]

Bayou’s limitations

• Primary cannot fail

• Server creation & retirement make nodeID grow arbitrarily long
• Anomalous behaviors for apps?
  – Calendar app