Fault-Tolerance, Fast and Slow: Exploiting Failure Asynchrony in Distributed Systems
Ramnatthan Alagappan, Aishwarya Ganesan, Jing Liu, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau
OSDI ‘18
Replication Protocols
Paxos, Viewstamped replication, Raft
Foundation upon which datacenter systems are built: GFS, Colossus, BigTable
The Two Different Worlds of Replication
How and where to store system state?
World-1, disk-durable: synchronously persist updates to disks
  safe but suffer from poor performance
  Paxos, Raft [ATC ‘14], ZAB [DSN ‘11], Gaios [NSDI ‘11], ZooKeeper, etcd, LogCabin …
World-2, memory-durable: buffer updates only in volatile memory
  performant but risk unsafety or unavailability
  Viewstamped replication, NOPaxos [OSDI ‘16], SpecPaxos [NSDI ’15] …
Neither approach is ideal: each is either reliable or performant, but not both
Can a protocol provide strong reliability while maintaining high performance?
SAUCR: Situation-Aware Updates and Crash Recovery
Simple insight: dynamically (based on the situation) decide how to commit updates
  with many or all nodes up, buffer in memory – fast mode
  with failures, if only a bare majority is up, flush to disk – slow mode
Strong reliability while maintaining high performance
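The fast/slow decision on this slide can be sketched as a small function. This is a minimal illustration under stated assumptions, not SAUCR's actual implementation: the names `Mode` and `choose_mode` are hypothetical, and "bare majority" is taken to mean exactly `n // 2 + 1` nodes up.

```python
from enum import Enum


class Mode(Enum):
    FAST = "fast"                # buffer updates in memory
    SLOW = "slow"                # flush updates to disk
    UNAVAILABLE = "unavailable"  # fewer than a majority up: cannot commit


def choose_mode(total_nodes: int, nodes_up: int) -> Mode:
    """Pick a commit mode from the number of nodes currently up (hypothetical helper)."""
    if total_nodes <= 0 or not 0 <= nodes_up <= total_nodes:
        raise ValueError("nodes_up must be between 0 and total_nodes")
    bare_majority = total_nodes // 2 + 1  # assumption: smallest possible majority
    if nodes_up < bare_majority:
        return Mode.UNAVAILABLE
    if nodes_up == bare_majority:
        return Mode.SLOW   # only a bare majority is up: flush to disk
    return Mode.FAST       # more than a bare majority is up: buffer in memory
```

With five nodes, for example, `choose_mode(5, 4)` selects fast mode and `choose_mode(5, 3)` selects slow mode.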
Simultaneity of Failures
SAUCR’s effectiveness depends upon simultaneity of failures
independent failures and non-simultaneous correlated failures (gap of a few milliseconds to a few seconds)
  can react and switch from fast to slow mode
  preserves durability and availability
many truly simultaneous correlated failures
  no gap and so cannot react
  remain unavailable
  however, existing data hints they are extremely rare – the Non-Simultaneity Conjecture
Results
Implemented in ZooKeeper
SAUCR improves reliability compared to memory-durable systems
  durable and available in 100s of crash scenarios
  memory-durable loses data or becomes unavailable
Improvements at no or little cost
  overheads within 0%-9% of memory-durable systems
Compared to disk-durable
  slight reduction in availability in extremely rare cases
  improves performance – 2.5x on SSDs, 100x on HDDs
Outline
Introduction
Distributed updates and crash recovery
  disk-durable protocols
  memory-durable protocols
Situation-aware updates and crash recovery
Results
Summary and conclusion
Disk-Durable Protocols
[Figure: a client sends update A=2 to a leader and two followers that each hold A=1; each node fsyncs A=2 to disk]
Update: committed once fsync has completed on a majority
Recovery: if a node ack’d anyone, the data is on disk – safe
  recovery: just read from local disk; the node is ready immediately (possibly lagging)
Safe and available
But poor performance due to fsync – 50x on HDDs, 2.5x on SSDs
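The disk-durable commit rule and its recovery path can be sketched as follows. This is a simplified single-process illustration, not code from any of the systems named in the talk: the helpers `persist_and_ack`, `replicate`, `recover`, and `demo` are hypothetical, and each "node" is just a local log file.

```python
import os
import tempfile
from typing import List, Tuple


def persist_and_ack(log_path: str, update: str) -> bool:
    """Append the update to a node's log and force it to disk before acking."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(update + "\n")
        log.flush()             # move Python's buffer into the OS
        os.fsync(log.fileno())  # ask the OS to push the data to the device
    return True                 # the ack is sent only after fsync returns


def replicate(log_paths: List[str], update: str) -> bool:
    """Return True (committed) once a majority of nodes has fsynced the update."""
    acks = sum(persist_and_ack(path, update) for path in log_paths)
    return acks >= len(log_paths) // 2 + 1


def recover(log_path: str) -> List[str]:
    """Recovery just reads the node's state back from its local disk."""
    if not os.path.exists(log_path):
        return []
    with open(log_path, encoding="utf-8") as log:
        return [line.rstrip("\n") for line in log]


def demo() -> Tuple[bool, List[str]]:
    """Replicate A=1 and A=2 on three nodes, then recover one node from disk."""
    with tempfile.TemporaryDirectory() as directory:
        paths = [os.path.join(directory, f"node{i}.log") for i in range(3)]
        replicate(paths, "A=1")
        committed = replicate(paths, "A=2")
        return committed, recover(paths[0])
```

The `os.fsync` call on every update is what makes the acknowledgement safe, and it is also the step the slide identifies as the performance bottleneck.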
Memory-Durable Protocols (Oblivious Recovery)
[Figure: a client sends update A=2 to a leader and two followers that each hold A=1 in memory; each node buffers A=2 in memory]
Update: committed once buffered on a majority
Recovery: oblivious, doesn’t realize loss on failure; the node is ready immediately
e.g., ZooKeeper with forceSync = false; practitioners do use this config!
Performant
But can lead to data loss
![Page 57: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/57.jpg)
OSDI ‘18
Data Loss Example in Oblivious Approach
10
![Page 58: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/58.jpg)
OSDI ‘18
Data Loss Example in Oblivious Approach
10
![Page 59: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/59.jpg)
OSDI ‘18
Data Loss Example in Oblivious Approach
10
A=1
A=1
A=1
A=1 committed
![Page 60: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/60.jpg)
OSDI ‘18
Data Loss Example in Oblivious Approach
10
A=1
A=1
A=1
A=1 committedtwo nodes slow or failed
![Page 61: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/61.jpg)
OSDI ‘18
Data Loss Example in Oblivious Approach
10
A=1
A=1
A=1
A=1 committed
A=1
A=1
A=1
two nodes slow or failed
![Page 62: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/62.jpg)
OSDI ‘18
Data Loss Example in Oblivious Approach
10
A=1
A=1
A=1
A=1 committed
A=1
A=1
A=1
crashestwo nodes slow or failed
![Page 63: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/63.jpg)
OSDI ‘18
Data Loss Example in Oblivious Approach
10
A=1
A=1
A=1
A=1 committed
A=1
A=1
A=1
crashestwo nodes slow or failed
, recovers
![Page 64: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/64.jpg)
OSDI ‘18
Data Loss Example in Oblivious Approach
10
A=1
A=1
A=1
A=1 committed
A=1
A=1
crashestwo nodes slow or failed
, recoversloses its data but oblivious:
immediately joins
![Page 65: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/65.jpg)
OSDI ‘18
Data Loss Example in Oblivious Approach
10
A=1
A=1
A=1
A=1 committed
A=1
A=1
crashestwo nodes slow or failed
, recoversloses its data but oblivious:
immediately joins
A=1
A=1
![Page 66: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/66.jpg)
![Page 67: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/67.jpg)
Data Loss Example in Oblivious Approach
A=1 is committed; two nodes are slow or failed.
One node that holds A=1 crashes, recovers, and loses its data; being oblivious, it immediately rejoins.
The lagging nodes, along with the recovered node, form a majority.
This majority does not know of the previously committed update, so the committed update is lost.
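The scenario on this slide can be replayed in a few lines. This is only a minimal sketch: it assumes a five-node cluster (three nodes holding the update plus the two lagging nodes), and the node names and the `majority` helper are hypothetical, not taken from the talk.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1

# Hypothetical five-node cluster; each dict is one node's volatile memory.
state = {name: {} for name in ["n1", "n2", "n3", "n4", "n5"]}

# A=1 is buffered in memory on n1..n3; n4 and n5 are slow or failed.
for name in ["n1", "n2", "n3"]:
    state[name]["A"] = 1
committed = sum("A" in mem for mem in state.values()) >= majority(len(state))

# n3 crashes and recovers: its memory is gone, yet it rejoins immediately.
state["n3"] = {}

# The recovered node plus the two lagging nodes now form a majority.
new_quorum = ["n3", "n4", "n5"]
quorum_is_majority = len(new_quorum) >= majority(len(state))
quorum_knows_update = any("A" in state[name] for name in new_quorum)

print(committed, quorum_is_majority, quorum_knows_update)
```

The last flag is the problem: a valid majority exists in which no member remembers the committed update.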
![Page 68: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/68.jpg)
![Page 69: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/69.jpg)
![Page 70: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/70.jpg)
![Page 71: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/71.jpg)
![Page 72: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/72.jpg)
![Page 73: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/73.jpg)
![Page 74: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/74.jpg)
![Page 75: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/75.jpg)
![Page 76: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/76.jpg)
![Page 77: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/77.jpg)
Memory-Durable Protocols (Loss-Aware Recovery)
Nodes: Leader, Follower, Follower; each keeps its state in memory.
Update: the update (A=2) is buffered in memory on a majority; the client sees it as committed.
Recovery (loss-aware): a recovering node realizes it has lost data, waits for responses from a majority, and only then becomes ready.
e.g., Viewstamped replication
Avoids loss (unlike oblivious) but can lead to unavailability.
![Page 78: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/78.jpg)
![Page 79: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/79.jpg)
![Page 80: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/80.jpg)
![Page 81: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/81.jpg)
![Page 82: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/82.jpg)
![Page 83: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/83.jpg)
![Page 84: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/84.jpg)
![Page 85: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/85.jpg)
![Page 86: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/86.jpg)
Unavailability Example in Loss-Aware Approach
A=1 is committed; two nodes have crashed.
Another node crashes and recovers, but it cannot collect responses from a majority: although a majority of nodes is up, the system is unavailable.
The failed nodes recover but stay in the recovering state.
The system remains unavailable even after all nodes recover.
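The counting behind this example can be written down as a small check. The sketch below assumes a five-node cluster and assumes that only nodes in the normal state answer a recovery request (the recovering node does not count itself); the status names and the `can_finish_recovery` helper are hypothetical.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1

def can_finish_recovery(status: dict, node: str) -> bool:
    """Loss-aware rule: a recovering node needs responses from a majority.

    Assumption: only 'normal' nodes other than `node` can respond.
    """
    responders = [n for n, s in status.items() if n != node and s == "normal"]
    return len(responders) >= majority(len(status))

# Two nodes crashed; a third crashed and came back in the recovering state.
status = {"n1": "normal", "n2": "normal", "n3": "recovering",
          "n4": "crashed", "n5": "crashed"}
majority_up = sum(s != "crashed" for s in status.values()) >= majority(len(status))
stuck_while_majority_up = majority_up and not can_finish_recovery(status, "n3")

# The failed nodes come back too, but they also start in the recovering state.
status["n4"] = status["n5"] = "recovering"
anyone_recovers = any(can_finish_recovery(status, n) for n in ["n3", "n4", "n5"])

print(stuck_while_majority_up, anyone_recovers)
```

Under these assumptions only two nodes can ever respond, so no recovering node reaches the three responses it needs, even with all five nodes running.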
![Page 87: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/87.jpg)
Outline
Introduction
Distributed updates and crash recovery
Situation-aware updates and crash recovery
  SAUCR insights, guarantees, and overview
  situation-aware updates
  situation-aware crash recovery
Results
Summary and conclusion
![Page 88: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/88.jpg)
![Page 89: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/89.jpg)
![Page 90: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/90.jpg)
![Page 91: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/91.jpg)
![Page 92: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/92.jpg)
![Page 93: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/93.jpg)
![Page 94: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/94.jpg)
SAUCR Intuition and Insight
Existing protocols are static in nature: they do not adapt to failures.
Memory-durable: always buffer, even with many failures (poor reliability).
Disk-durable: always persist, even when there are no failures (poor performance).
Insight: reacting to failures and adapting to the situation can achieve both reliability and performance.
Common case, when many or all nodes are up: buffer in memory (as memory-durable protocols do).
With failures, when only the minimum number of nodes is up: flush to disk (as disk-durable protocols do).
![Page 95: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/95.jpg)
![Page 96: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/96.jpg)
![Page 97: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/97.jpg)
![Page 98: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/98.jpg)
![Page 99: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/99.jpg)
![Page 100: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/100.jpg)
![Page 101: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/101.jpg)
Guarantees Depend upon Simultaneity of Failures
With non-simultaneous failures, a gap exists between failures; SAUCR can react and ensure durability.
  independent: the likelihood of many nodes failing together is negligible
  correlated: many nodes fail together; although many nodes fail, the failures are not necessarily simultaneous, and in most cases they are non-simultaneous
With simultaneous correlated failures, there is no gap; SAUCR cannot react and becomes unavailable.
  We conjecture that such failures are extremely rare: a gap exists between failures
  correlated, but a few seconds apart [Ford et al., OSDI ‘10]
  analysis reveals a gap of 50 ms or more almost always
Most cases: any number of independent and non-simultaneous correlated failures; same guarantees as disk-durable.
Rare cases: more than a majority crash truly simultaneously; SAUCR remains unavailable.
![Page 102: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/102.jpg)
![Page 103: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/103.jpg)
![Page 104: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/104.jpg)
![Page 105: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/105.jpg)
SAUCR Overview
Updates
When more than a majority is up, buffer updates in memory: fast mode (e.g., 4 or 5 nodes up in a 5-node cluster).
When nodes fail and only a bare majority is alive, flush to disk: slow mode (e.g., only 3 nodes up in a 5-node cluster).
Crash Recovery
When a node recovers from a crash, it recovers its data either from its disk (if it crashed in slow mode) or from other nodes (if it crashed in fast mode).
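The update rule on this slide reduces to a comparison against the bare majority. The following is a minimal sketch of that rule only; the `Mode` enum and the `bare_majority` and `choose_mode` names are hypothetical, and returning `None` below a majority is an assumption of the sketch (the slide does not cover that case).

```python
from enum import Enum
from typing import Optional

class Mode(Enum):
    FAST = "fast"  # buffer updates in memory
    SLOW = "slow"  # flush updates to disk

def bare_majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1

def choose_mode(nodes_up: int, cluster_size: int) -> Optional[Mode]:
    """Pick the update mode from the number of nodes currently up."""
    if nodes_up > bare_majority(cluster_size):
        return Mode.FAST
    if nodes_up == bare_majority(cluster_size):
        return Mode.SLOW
    return None  # assumption: below a majority, no update can commit

for up in (5, 4, 3, 2):
    print(up, choose_mode(up, 5))
```

For a 5-node cluster this yields fast mode with 5 or 4 nodes up and slow mode with exactly 3, matching the examples on the slide.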
![Page 106: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/106.jpg)
![Page 107: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/107.jpg)
![Page 108: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/108.jpg)
![Page 109: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/109.jpg)
![Page 110: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/110.jpg)
![Page 111: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/111.jpg)
![Page 112: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/112.jpg)
![Page 113: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/113.jpg)
![Page 114: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/114.jpg)
![Page 115: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/115.jpg)
![Page 116: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/116.jpg)
![Page 117: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/117.jpg)
![Page 118: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/118.jpg)
Situation-Aware Updates
(L marks the leader in each step of the diagram.)
All nodes up: fast mode, buffer updates.
4 nodes up (more than a majority): remain in fast mode.
Only a majority up: switch to slow mode and flush to disk.
Commit subsequent updates in slow mode.
One node recovers and catches up: switch to fast mode.
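The five steps above can be traced with a small state machine. This is a deliberately simplified sketch: it collapses the whole cluster into one hypothetical `Cluster` object with a single buffer and a single disk log, whereas real nodes each keep their own; all names are assumptions made for illustration.

```python
class Cluster:
    """Toy, single-object view of a cluster's mode switches."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.up = size
        self.mode = "fast"
        self.memory = []  # updates buffered in memory
        self.disk = []    # updates flushed to disk

    def _bare_majority(self) -> int:
        return self.size // 2 + 1

    def _retarget(self) -> None:
        new_mode = "fast" if self.up > self._bare_majority() else "slow"
        if self.mode == "fast" and new_mode == "slow":
            # Switching to slow mode: flush everything buffered so far.
            self.disk.extend(self.memory)
            self.memory.clear()
        self.mode = new_mode

    def node_fails(self) -> None:
        self.up -= 1
        self._retarget()

    def node_recovers(self) -> None:
        self.up += 1
        self._retarget()

    def update(self, item: str) -> None:
        if self.up < self._bare_majority():
            raise RuntimeError("no majority: update cannot commit")
        (self.memory if self.mode == "fast" else self.disk).append(item)

c = Cluster(5)
c.update("A=1")      # all nodes up: buffered
c.node_fails()       # 4 up: still fast
c.update("A=2")
c.node_fails()       # only a majority up: switch to slow, flush
c.update("A=3")      # committed in slow mode, straight to disk
c.node_recovers()    # 4 up again: back to fast
c.update("A=4")
print(c.mode, c.disk, c.memory)
```

At the end of this run the first three updates are on disk and only the update issued after the switch back to fast mode is held in memory.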
![Page 119: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/119.jpg)
Failure Reaction
Basic failure-detection mechanism: heartbeats
Follower failures: remain in fast mode, or switch to slow mode
Leader failures: on a missing heartbeat, followers flush to disk (the diagram also marks a leader that steps down)
Challenges: too many packets, spurious elections, too much data to flush
Techniques in the paper …
Result: can react to failures even when they are only a few milliseconds apart, preserving durability and availability
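The follower-side reaction to a missing leader heartbeat could look roughly like this. The sketch is illustrative only: the class name, the explicit `now_ms` clock argument, and the 10 ms timeout are all hypothetical, and it does not attempt the techniques the slide defers to the paper.

```python
class FollowerFailureDetector:
    """Follower-side sketch: flush buffered updates when a heartbeat is missed."""

    def __init__(self, timeout_ms: int, now_ms: int) -> None:
        self.timeout_ms = timeout_ms      # hypothetical heartbeat timeout
        self.last_heartbeat_ms = now_ms
        self.buffer = []                  # updates held in memory
        self.disk = []                    # updates flushed to disk

    def on_heartbeat(self, now_ms: int) -> None:
        """Record that the leader was heard from at time now_ms."""
        self.last_heartbeat_ms = now_ms

    def tick(self, now_ms: int) -> bool:
        """Return True if a heartbeat was missed and the buffer was flushed."""
        if now_ms - self.last_heartbeat_ms <= self.timeout_ms:
            return False
        self.disk.extend(self.buffer)
        self.buffer.clear()
        return True

d = FollowerFailureDetector(timeout_ms=10, now_ms=0)
d.buffer.append("A=1")
d.on_heartbeat(5)
first = d.tick(12)   # 7 ms since the last heartbeat: nothing to do
second = d.tick(20)  # 15 ms since the last heartbeat: flush
print(first, second, d.disk)
```

Passing the clock in explicitly keeps the sketch deterministic; a real node would read a monotonic clock instead.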
![Page 120: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/120.jpg)
![Page 121: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/121.jpg)
![Page 122: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/122.jpg)
![Page 123: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/123.jpg)
![Page 124: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/124.jpg)
![Page 125: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/125.jpg)
![Page 126: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/126.jpg)
![Page 127: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/127.jpg)
![Page 128: VWHPV - research.cs.wisc.edu26', ¶ exiihu xsgdwhv rqo\ lq yrodwloh phpru\ 7kh 7zr 'liihuhqw :ruogv ri 5hsolfdwlrq +rz dqg zkhuh wr vwruh v\vwhp vwdwh" v\qfkurqrxvo\ shuvlvw](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/128.jpg)
![Page 129](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/129.jpg)
Mode-Aware Crash Recovery
Disk-durable: always recover from disk
Memory-durable: always recover from other nodes (loss-aware)
SAUCR
recover from local disk → ready (immediate)
lost updates → recover from other nodes
![Page 130](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/130.jpg)
Mode-Aware Crash Recovery
Disk-durable: always recover from disk
Memory-durable: always recover from other nodes (loss-aware)
SAUCR
recover from local disk → ready (immediate)
lost updates → recover from other nodes → a bare minority (bare majority - 1) of responses
![Page 131](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/131.jpg)
Mode-Aware Crash Recovery
Disk-durable: always recover from disk
Memory-durable: always recover from other nodes (loss-aware)
SAUCR
recover from local disk → ready (immediate)
lost updates → recover from other nodes → a bare minority (bare majority - 1) of responses → ready
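The two recovery paths on this slide can be sketched in a few lines of Python. This is a minimal illustration, not SAUCR's implementation: it assumes the recovering node already knows whether it crashed in slow mode (state on disk) or in fast mode (buffered updates lost), and the names `Mode`, `RecoveryPlan`, and `plan_recovery` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FAST = "fast"  # updates were only buffered in memory when the node crashed
    SLOW = "slow"  # updates were synchronously persisted to disk


def bare_majority(cluster_size: int) -> int:
    """Smallest majority of the cluster."""
    return cluster_size // 2 + 1


def bare_minority(cluster_size: int) -> int:
    """Bare majority - 1, as defined on the slide."""
    return bare_majority(cluster_size) - 1


@dataclass
class RecoveryPlan:
    source: str            # where the node recovers its state from
    responses_needed: int  # responses to collect before the node is ready


def plan_recovery(cluster_size: int, mode_at_crash: Mode) -> RecoveryPlan:
    # Slow mode: state is on the local disk, so the node is ready immediately.
    if mode_at_crash is Mode.SLOW:
        return RecoveryPlan(source="local disk", responses_needed=0)
    # Fast mode: updates were lost, so recover from other nodes and wait
    # for a bare minority of responses.
    return RecoveryPlan(source="other nodes",
                        responses_needed=bare_minority(cluster_size))


if __name__ == "__main__":
    print(plan_recovery(5, Mode.SLOW))
    print(plan_recovery(5, Mode.FAST))
```

For a five-node cluster the bare majority is 3, so a node that crashed in fast mode waits for 2 responses in this sketch.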
![Page 132](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/132.jpg)
Intuition for why SAUCR’s recovery is safe
![Page 133](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/133.jpg)
Intuition for why SAUCR’s recovery is safe
Assume update-A is committed; S1 recovers and had seen A before the crash
[Diagram: recovering node S1]
![Page 134](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/134.jpg)
Intuition for why SAUCR’s recovery is safe
Assume update-A is committed; S1 recovers and had seen A before the crash
Safety condition: update-A must be recovered
[Diagram: recovering node S1]
![Page 135](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/135.jpg)
Intuition for why SAUCR’s recovery is safe
Assume update-A is committed; S1 recovers and had seen A before the crash
Safety condition: update-A must be recovered
If A was committed in fast mode, then at least one node in any bare minority must contain update-A
[Diagram: recovering node S1; other nodes holding update A]
![Page 136](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/136.jpg)
Intuition for why SAUCR’s recovery is safe
Assume update-A is committed; S1 recovers and had seen A before the crash
Safety condition: update-A must be recovered
If A was committed in fast mode, then at least one node in any bare minority must contain update-A
If update-A was committed in slow mode, S1 recovers from disk
[Diagram: recovering node S1; other nodes holding update A]
![Page 137](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/137.jpg)
Intuition for why SAUCR’s recovery is safe
Assume update-A is committed; S1 recovers and had seen A before the crash
Safety condition: update-A must be recovered
If A was committed in fast mode, then at least one node in any bare minority must contain update-A
If update-A was committed in slow mode, S1 recovers from disk
Proof sketch in the paper …
[Diagram: recovering node S1; other nodes holding update A]
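The fast-mode claim can be checked by brute force for small clusters. The sketch below is only a counting illustration, not the paper's proof: it assumes a fast-mode commit places update A on a bare majority plus one node (our reading of the "one additional node" that SAUCR writes to), that S1 is one of those nodes, and that no other node has lost A. The function names are hypothetical.

```python
from itertools import combinations


def bare_majority(n: int) -> int:
    return n // 2 + 1


def bare_minority(n: int) -> int:
    return bare_majority(n) - 1


def any_bare_minority_has_update(n: int, extra: int = 1) -> bool:
    """Return True if every bare minority of S1's peers holds update A.

    Assumption: A was committed on bare_majority(n) + extra nodes,
    one of which is the recovering node S1 (node 0 here).
    """
    s1 = 0
    peers = [node for node in range(n) if node != s1]
    holders_needed = bare_majority(n) + extra - 1  # peers holding A besides S1
    for holders in combinations(peers, holders_needed):
        for responders in combinations(peers, bare_minority(n)):
            if not set(responders) & set(holders):
                return False  # a bare minority in which nobody has seen A
    return True


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n,
              any_bare_minority_has_update(n),           # bare majority + 1
              any_bare_minority_has_update(n, extra=0))  # bare majority only
```

With `extra=0` (a plain bare-majority commit) the check fails, which suggests why the extra node matters for recovering from a bare minority of responses.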
![Page 138](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/138.jpg)
Outline
Introduction
Distributed updates and crash recovery
Situation-aware updates and crash recovery
Results
Summary and conclusion
![Page 139](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/139.jpg)
Evaluation
We implement SAUCR in ZooKeeper
Compare SAUCR’s reliability and performance against:
disk-durable ZooKeeper (forceSync = true)
memory-durable ZooKeeper (forceSync = false)
viewstamped replication (ideal model)
![Page 140](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/140.jpg)
Reliability Testing
Cluster crash-testing framework
Generates cluster-state sequences
How does it work? Please see our paper…
[Diagram: lattice of node sets for a five-node cluster, from 12345 through 1234, 1235, 1245, 1345, 2345, then 123, 124, 125, … 345, then 12, 13, 14, … 45, down to the single nodes 1, 2, 3, 4, 5]
![Page 141](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/141.jpg)
Reliability Results

| System | Non-simultaneous: Correct | Non-simultaneous: Unavailable | Non-simultaneous: Data loss | Simultaneous: Correct | Simultaneous: Unavailable | Simultaneous: Data loss |
|---|---|---|---|---|---|---|
| memory-durable ZooKeeper | 703 | 0 | 561 | 703 | 0 | 561 |
| viewstamped replication | 217 | 1047 | 0 | 217 | 1047 | 0 |
| disk-durable ZooKeeper | 1264 | 0 | 0 | 1264 | 0 | 0 |
| SAUCR | 1264 | 0 | 0 | 1200 | 64 | 0 |

non-simultaneous: gap of 50 ms; simultaneous: no gap
memory-durable ZooKeeper silently loses data
viewstamped replication leads to permanent unavailability
SAUCR reacts to non-simultaneous failures: durable and available
other systems behave the same as in the non-simultaneous cases
simultaneous: SAUCR by design remains unavailable in some cases
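To put the counts in proportion, the short script below converts each system's outcomes into percentages of its test cases. The counts are copied from the Reliability Results slide; the dictionary layout and the helper name `percentages` are our own.

```python
# (correct, unavailable, data loss) counts from the Reliability Results slide
RESULTS = {
    "non-simultaneous": {
        "memory-durable ZooKeeper": (703, 0, 561),
        "viewstamped replication": (217, 1047, 0),
        "disk-durable ZooKeeper": (1264, 0, 0),
        "SAUCR": (1264, 0, 0),
    },
    "simultaneous": {
        "memory-durable ZooKeeper": (703, 0, 561),
        "viewstamped replication": (217, 1047, 0),
        "disk-durable ZooKeeper": (1264, 0, 0),
        "SAUCR": (1200, 64, 0),
    },
}


def percentages(counts):
    """Convert outcome counts into percentages of that system's total."""
    total = sum(counts)
    return tuple(round(100 * c / total, 1) for c in counts)


if __name__ == "__main__":
    for scenario, systems in RESULTS.items():
        print(scenario)
        for system, counts in systems.items():
            correct, unavailable, data_loss = percentages(counts)
            print(f"  {system}: correct {correct}%, "
                  f"unavailable {unavailable}%, data loss {data_loss}%")
```

Every row sums to 1264 cases, so the rows are directly comparable.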
![Page 142](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/142.jpg)
Macro-benchmark Performance: YCSB-load
Compared to disk-durable, both memory-durable and SAUCR are faster
SAUCR’s performance matches memory-durable ZooKeeper
within 9% of memory-durable ZooKeeper even for write-intensive workloads
overheads arise because SAUCR writes to one additional node
[Figure: throughput (KOps/s, axis 0 to 25) on HDD and SSD for memory-durable ZooKeeper, SAUCR, and disk-durable ZooKeeper; annotations in order: 100x, 2.5x, 9%, 9%]
![Page 143](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/143.jpg)
Summary
Replication protocols are an important foundation
need to be performant, yet also provide high reliability
Dichotomy: disk-durable vs. memory-durable protocols
unsavory choices: either performant or reliable
SAUCR: situation-aware updates and crash recovery
provides both high performance and reliability
![Page 144](https://reader033.vdocument.in/reader033/viewer/2022041605/5e33e861ba06832fc04b2652/html5/thumbnails/144.jpg)
Conclusions
Paying careful attention to how failures occur
can find approaches that provide both performance and reliability
more data from real-world deployments?
Hybrid approach: an effective systems-design technique
applicable to distributed updates and recovery too
worthwhile to look at other important protocols/systems where we make similar two-ends-of-the-spectrum tradeoffs?
Thank you!
Poster #6