CAP THEOREM Large Scale Data Management
Consistency, Availability, Partition-Tolerance
• Conjecture by Eric Brewer at PODC 2000:
  – It is impossible for a web service to provide the following three guarantees simultaneously:
    • Consistency
    • Availability
    • Partition-tolerance
• Established as a theorem in 2002:
  – Lynch, Nancy, and Seth Gilbert. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, v. 33 issue 2, 2002, p. 51-59.
Aleksandar Bradic, Vast.com, http://fr.slideshare.net/alekbr/cap-theorem
CAP theorem
• Consistency - all nodes should see the same data at the same time
• Availability - node failures do not prevent survivors from continuing to operate
• Partition-tolerance - the system continues to operate despite arbitrary message loss
• A distributed system can satisfy any two of these guarantees at the same time, but not all three
Consistency + Availability
• Examples:
  – Single-site databases
  – Cluster databases
  – LDAP
  – xFS file system
• Traits:
  – 2-phase commit
  – Cache validation protocols
Consistency + Partition Tolerance
• Examples:
  – Distributed databases
  – Distributed locking
  – Majority protocols
• Traits:
  – Pessimistic locking
  – Make minority partitions unavailable (quorums)
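The quorum trait can be sketched in a few lines: a side of a partition keeps serving requests only if it holds a strict majority of the replicas. The function name `has_quorum` and the five-node cluster are illustrative assumptions, not taken from the slides.

```python
def has_quorum(reachable_nodes: set[str], all_nodes: set[str]) -> bool:
    """Return True if the reachable nodes form a strict majority of the cluster."""
    return len(reachable_nodes & all_nodes) > len(all_nodes) // 2


# Hypothetical five-node cluster split 3/2 by a partition.
cluster = {"n1", "n2", "n3", "n4", "n5"}
majority_side = {"n1", "n2", "n3"}
minority_side = {"n4", "n5"}

print(has_quorum(majority_side, cluster))  # majority side may keep serving
print(has_quorum(minority_side, cluster))  # minority side must refuse requests
```

Because two disjoint sides can never both hold a strict majority, at most one side accepts writes, which preserves consistency at the cost of availability on the other side.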
Availability + Partition Tolerance
• Examples:
  – Coda
  – DNS
  – Usenet
• Traits:
  – Expiration/leases
  – Conflict resolution
  – Optimistic replication
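As a rough illustration of optimistic replication with conflict resolution, the sketch below lets two replicas accept writes independently and then reconciles them with a last-writer-wins rule. Last-writer-wins is only one possible policy, chosen here for brevity; the `Versioned` type, the logical timestamps and the node identifiers are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # logical timestamp chosen by the writer (assumption)
    node_id: str    # tie-breaker when timestamps are equal


def resolve(a: Versioned, b: Versioned) -> Versioned:
    """Last-writer-wins: keep the version with the larger (timestamp, node_id)."""
    return a if (a.timestamp, a.node_id) >= (b.timestamp, b.node_id) else b


# Both replicas accept a write while partitioned (optimistic replication) ...
replica_a = Versioned("blue", timestamp=7, node_id="a")
replica_b = Versioned("green", timestamp=9, node_id="b")

# ... and converge on the same value once they can exchange state again.
merged = resolve(replica_a, replica_b)
```

The rule is symmetric, so both replicas reach the same result regardless of who initiates the exchange; the price is that the losing write is silently discarded.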
Data Store and CAP
• RDBMS: CA (Master/Slave replication, Sharding)
• Amazon Dynamo: AP (Read-repair, application hooks)
• Terracotta: CA (Quorum vote, majority partition survival)
• Apache Cassandra: AP (Partitioning, Read-repair)
• Apache ZooKeeper: AP (Consensus protocol)
• Google BigTable: CA
• Apache CouchDB: AP
http://blog.nahurst.com/visual-guide-to-nosql-systems
Techniques for CAP
• Consistent Hashing
• Vector Clocks
• Sloppy Quorum
• Merkle trees
• Gossip-based protocols
• CRDTs
• These are covered later…
Idea of the proof
• http://www.youtube.com/watch?v=Jw1iFr4v58M
Atomic Data Object
• Atomic/Linearizable Consistency:
  – There must exist a total order on all operations such that each operation looks as if it were completed at a single instant
  – This is equivalent to requiring requests on the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time
Available Data Objects
• For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response
  – That is, any algorithm used by the service must eventually terminate
• (In some ways, this is a weak definition of availability: it puts no bound on how long the algorithm may run before terminating, and therefore allows unbounded computation)
• (On the other hand, when qualified by the need for partition tolerance, this can be seen as a strong definition of availability: even when severe network failures occur, every request must terminate)
Partition Tolerance
• In order to model partition tolerance, the network is allowed to lose arbitrarily many messages sent from one node to another
• When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost
• The atomicity requirement implies that every response will be atomic, even though arbitrary messages sent as part of the algorithm might not be delivered
• The availability requirement implies that every node receiving a request from a client must respond, even though arbitrary messages that are sent may be lost
• Partition Tolerance: No set of failures less than total network failure is allowed to cause the system to respond incorrectly
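The network model described above can be mimicked with a toy class in which any message crossing a partition boundary is dropped. `PartitionedNetwork`, its `send` method and the node names are hypothetical and exist only for this sketch.

```python
class PartitionedNetwork:
    """Toy network: messages crossing a partition boundary are silently lost."""

    def __init__(self, components: list[set[str]]) -> None:
        self.components = components
        self.inbox: dict[str, list[str]] = {n: [] for c in components for n in c}

    def _component_of(self, node: str) -> int:
        return next(i for i, c in enumerate(self.components) if node in c)

    def send(self, src: str, dst: str, msg: str) -> bool:
        """Deliver msg if src and dst share a component; otherwise drop it.

        The return value exists only so the demo can observe the loss;
        in the model, a sender learns nothing about a dropped message.
        """
        if self._component_of(src) != self._component_of(dst):
            return False
        self.inbox[dst].append(msg)
        return True


net = PartitionedNetwork([{"a", "b"}, {"c"}])
delivered_inside = net.send("a", "b", "hello")  # same component: delivered
delivered_across = net.send("a", "c", "hello")  # crosses the partition: lost
```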
Asynchronous Network Model
• There is no clock
• Nodes must make decisions based only on messages received and local computation
Asynchronous Networks: impossibility result
• Theorem 1: It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties in all fair executions (including those in which messages are lost):
  – Availability
  – Atomic consistency
Asynchronous Networks: impossibility result
• Proof (by contradiction):
  – Assume an algorithm A exists that meets the three criteria: atomicity, availability and partition tolerance
  – We construct an execution of A in which there exists a request that returns an inconsistent response
  – Assume that the network consists of at least two nodes. Thus it can be divided into two disjoint, non-empty sets G1, G2
  – Assume all messages between G1 and G2 are lost
  – If a write occurs in G1 and a read then occurs in G2, the read operation cannot return the result of the earlier write operation
Asynchronous Networks: impossibility result
• Formal proof:
  – Let v0 be the initial value of the atomic object
  – Let α1 be the prefix of an execution of A in which a single write of a value not equal to v0 occurs in G1, ending with the termination of the write operation
  – Assume that no other client requests occur in either G1 or G2, that no messages from G1 are received in G2, and that no messages from G2 are received in G1
  – We know that the write operation will complete (by the availability requirement)
Asynchronous Networks: impossibility result
• Let α2 be the prefix of an execution in which a single read occurs in G2 and no other client requests occur, ending with the termination of the read operation
• During α2 no messages from G2 are received in G1 and no messages from G1 are received in G2
• We know that the read must return a value (by the availability requirement)
• The value returned by this execution must be v0, as no write operation has occurred in α2
Asynchronous Networks: impossibility result
• Let α be an execution beginning with α1 and continuing with α2. To the nodes in G2, α is indistinguishable from α2, as all the messages from G1 to G2 are lost (in both α1 and α2, which together make up α), and α1 does not include any client requests to nodes in G2
• Therefore, in the execution α, the read request (from α2) must still return v0
• However, the read request does not begin until after the write request (from α1) has completed
• This contradicts the atomicity property, proving that no such algorithm exists
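The execution α can be replayed on a toy system. The sketch below is not a proof: it fixes one particular available algorithm (each node answers from local state) and shows the stale read the argument predicts, whereas the theorem covers every available algorithm. The `Node` class and the `partitioned` flag are assumptions introduced for the illustration.

```python
V0 = "v0"


class Node:
    """A replica that always answers from local state (i.e. it is available)."""

    def __init__(self) -> None:
        self.value = V0
        self.peers: list["Node"] = []

    def write(self, value: str, partitioned: bool) -> str:
        self.value = value
        if not partitioned:
            for peer in self.peers:
                peer.value = value  # replication message delivered
        return "ack"  # availability: the write terminates either way

    def read(self) -> str:
        return self.value  # availability: the read terminates too


g1, g2 = Node(), Node()
g1.peers, g2.peers = [g2], [g1]

# alpha1: a write of a value != v0 completes in G1 while all messages are lost.
ack = g1.write("v1", partitioned=True)
# alpha2: a later read in G2 sees no trace of the write and returns v0.
stale = g2.read()
```

The read starts after the write has been acknowledged yet returns the initial value, which is exactly the violation of atomicity derived in the proof.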
Asynchronous Networks: Impossibility Result
Impossibility results
• Corollary 1.1: It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:
  – Availability - in all fair executions
  – Atomic consistency - in fair executions in which no messages are lost
Impossibility results
• Proof:
  – The main idea is that in the asynchronous model, an algorithm has no way of determining whether a message has been lost or has been arbitrarily delayed in the transmission channel
  – Therefore, if there existed an algorithm that guaranteed atomic consistency in executions in which no messages were lost, there would exist an algorithm that guaranteed atomic consistency in all executions
  – This would violate Theorem 1
CAP theorem
• While it is impossible to provide all three properties (atomicity, availability and partition tolerance), any two of these properties can be achieved:
  – Atomic, Partition Tolerant
  – Atomic, Available
  – Available, Partition Tolerant
Atomic, Partition-Tolerant
• If availability is not required, it is easy to achieve atomic data and partition tolerance
• The trivial system that ignores all requests meets these requirements
• Stronger liveness criterion: if all the messages in an execution are delivered, the system is available and all operations terminate
• A simple centralized algorithm meets these requirements: a single designated node maintains the value of an object
• A node receiving a request forwards it to the designated node, which sends a response. When the acknowledgement is received, the node sends a response to the client
• Many distributed databases provide this guarantee, especially algorithms based on distributed locking or quorums: if certain failure patterns occur, the liveness condition is weakened and the service no longer returns a response. If there are no failures, liveness is guaranteed
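A minimal sketch of the centralized algorithm, under stated assumptions: `Designated` holds the only copy of the value, `Frontend` nodes forward every request to it, and a boolean `connected` flag stands in for the network. In the model a cut-off node would simply never answer; raising the hypothetical `Unavailable` exception is a stand-in so the demo terminates.

```python
class Unavailable(Exception):
    """Stand-in for 'no response': the designated node cannot be reached."""


class Designated:
    def __init__(self, initial: str) -> None:
        self.value = initial


class Frontend:
    """A node that forwards every request to the single designated node."""

    def __init__(self, designated: Designated) -> None:
        self.designated = designated
        self.connected = True  # set to False to simulate a partition

    def _reach(self) -> Designated:
        if not self.connected:
            raise Unavailable("designated node unreachable")
        return self.designated

    def write(self, value: str) -> None:
        self._reach().value = value

    def read(self) -> str:
        return self._reach().value


primary = Designated("v0")
n1, n2 = Frontend(primary), Frontend(primary)
n1.write("v1")
seen_by_n2 = n2.read()  # every reader sees the single authoritative copy

n2.connected = False    # n2 is cut off from the designated node
try:
    n2.read()
    refused = False
except Unavailable:
    refused = True
```

Atomicity is never violated because there is only one copy of the value; what is given up is the guarantee that a partitioned node responds.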
Atomic, Available
• If there are no partitions, it is possible to provide atomic, available data
• A centralized algorithm with a single designated node maintaining the value of an object meets these requirements
Available, Partition-Tolerant
• It is possible to provide high availability and partition tolerance if atomic consistency is not required
• If there are no consistency requirements, the service can trivially return v0, the initial value, in response to every request
• It is possible to provide weakened consistency in an available, partition-tolerant setting
• Web caches are one example of a weakly consistent network
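The web cache example can be made concrete with a toy time-to-live cache: it always answers, but may return an outdated copy until the entry expires. `TTLCache`, the dictionary standing in for the origin server, and the explicit `now` parameter (used instead of a real clock to keep the run deterministic) are assumptions of this sketch.

```python
class TTLCache:
    """Toy web cache: serves a possibly stale copy until its lifetime expires."""

    def __init__(self, origin: dict[str, str], ttl: int) -> None:
        self.origin = origin
        self.ttl = ttl
        self.entries: dict[str, tuple[str, int]] = {}  # key -> (value, fetched_at)

    def get(self, key: str, now: int) -> str:
        cached = self.entries.get(key)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]  # answered locally, without contacting the origin
        value = self.origin[key]
        self.entries[key] = (value, now)
        return value


origin = {"/index": "v0"}
cache = TTLCache(origin, ttl=10)
first = cache.get("/index", now=0)   # miss: fetched from the origin
origin["/index"] = "v1"              # the origin is updated
stale = cache.get("/index", now=5)   # within the TTL: the old copy is returned
fresh = cache.get("/index", now=10)  # expired: refetched from the origin
```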
Partially Synchronous Model
• The Lynch and Gilbert paper also covers CAP in the partially synchronous model: every node has a clock, and all clocks increase at the same rate
• However, the clocks are not synchronized
• Theorem 1 still holds in the partially synchronous model, but Corollary 1.1 does not
• Weaker consistency (t-connected) can be achieved
CAP Conclusion
• It is possible to build Large Scale Distributed Data Management systems under the CAP theorem:
  – One property must be sacrificed
Sacrificing one Property
• If Consistency is sacrificed (AP):
  – Consistency problems are pushed to applications, where they can be more difficult to solve (or not…): high programming cost
  – Deployment on asynchronous infrastructure…
• If Availability is sacrificed (CP):
  – Blocking protocols can really block the system
  – Cheap programming cost on asynchronous infrastructure
• If Partition tolerance is sacrificed (AC):
  – Need to provide a quasi-synchronous model, where complex failures never happen
  – Cheap programming cost with synchronous infrastructure…
Stonebraker, CACM 2010
Challenges
• Whatever choice is made (AC/AP/CP), the scalability and throughput that can be achieved with the different approaches will make the difference
• The balance between programming cost and scalability/efficiency will be the key
• Nice challenges for scientists and engineers…
Clash of cultures
• Classic distributed systems: focused on ACID semantics
  – A: Atomic
  – C: Consistent
  – I: Isolated
  – D: Durable
• “Modern” Internet systems: focused on BASE
  – Basically Available
  – Soft-state (or scalable)
  – Eventually consistent
NoSQL (CouchDB…) vs NewSQL (VoltDB…)
Dan Pritchett, BASE: An ACID Alternative, ACM Queue, http://queue.acm.org/detail.cfm?id=1394128
http://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/