cap theorem
CAP Theorem
Vikash Kodati
10/19/2016
T-Mobile Confidential
CAP Theorem (also Brewer's theorem)
4/6/2016
It is impossible for a distributed system to simultaneously provide all three of the following guarantees (pick any two):
1. Consistency: all nodes see the same data at the same time; equivalently, a read returns the latest value written by any client.
2. Availability: every request receives a response. The system accepts operations at all times, and operations return quickly.
3. Partition tolerance: the system continues to operate despite arbitrary partitioning caused by network failures.
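The trade-off between the three guarantees can be made concrete with a toy model. The sketch below is only an illustration under simple assumptions: two replicas of a single value, a network that is either healthy or fully partitioned, and hypothetical names (`TwoNodeStore`, `PartitionError`, `demo`) that do not come from any real database.

```python
class PartitionError(Exception):
    """Raised by the CP store when it refuses a request during a partition."""


class Replica:
    """One node holding a single value."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Toy two-replica store. mode='CP' prefers consistency, mode='AP' prefers availability."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # The other replica is unreachable: refuse rather than diverge.
                raise PartitionError("write rejected during partition")
            replica.value = value  # AP: accept locally, replicas now disagree
            return
        replica.value = value
        other.value = value  # healthy network: replicate synchronously

    def read(self, replica):
        if self.partitioned and self.mode == "CP":
            raise PartitionError("read rejected during partition")
        return replica.value


def demo(mode):
    """Return (request_answered, other_replica_is_stale) for a write made during a partition."""
    store = TwoNodeStore(mode)
    store.write(store.a, "v1")
    store.partitioned = True
    try:
        store.write(store.a, "v2")
        answered = True
    except PartitionError:
        answered = False
    stale = answered and store.read(store.b) != "v2"
    return answered, stale


print("CP:", demo("CP"))  # (False, False): consistent, but the request was refused
print("AP:", demo("AP"))  # (True, True): answered, but replica b still returns "v1"
```

In this model the partition is a given, so the only remaining choice is which of the other two guarantees to give up.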
Why is Availability Important?
Availability = reads and writes complete reliably and quickly.
• Data shows that a 500 ms increase in latency for operations at Amazon.com or Google.com can cause a 20% drop in revenue.
• At Amazon, each added millisecond of latency implies a $6M yearly loss.
• SLAs written by providers predominantly deal with the latencies faced by clients.
Why is Consistency Important?
Consistency = all nodes see the same data at the same time, or reads return the latest value written by any client.
• When you access your bank or investment account via multiple clients, you want the updates done from one client to be visible to other clients.
• When thousands of customers are looking to book a flight, all updates from any client should be accessible by other clients.
Why is Partition Tolerance Important?
• Partitions can happen across datacenters when the network gets disconnected:
  • Internet router outages
  • Undersea cables cut
  • DNS not working
• Partitions can also occur within a datacenter, e.g., a rack switch outage.
• We still want the system to continue functioning normally.
DESIGN FOR FAILURE
6/13/2016
Typical first year for a new cluster:
~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packet loss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external VIPs for a couple minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for DNS
~1000 individual machine failures
~thousands of hard drive failures
slow disks, bad memory, misconfigured machines, flaky machines, etc.
Note: Data taken from Jeff Dean’s slides (Google)
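To get a feel for what these yearly counts mean day to day, the figures above can be turned into an average interval between events. This is a rough back-of-the-envelope sketch: it assumes events are spread evenly over a 365-day year, which real failures are not, and the helper name `mean_days_between` is made up for illustration.

```python
# Approximate yearly event counts copied from the figures above.
yearly_events = {
    "individual machine failures": 1000,
    "rack failures": 20,
    "router reloads": 12,
    "network maintenances": 8,
}


def mean_days_between(events_per_year, days_per_year=365):
    """Average gap in days between events, assuming an even spread over the year."""
    return days_per_year / events_per_year


for name, count in yearly_events.items():
    gap_days = mean_days_between(count)
    print(f"{name}: about one every {gap_days:.1f} days ({gap_days * 24:.0f} hours)")
```

At ~1000 individual machine failures a year, that is roughly one failure every nine hours, which is why failure has to be treated as the normal case rather than the exception.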
CAP Theorem Fallout
• Since partitions are inevitable in today’s cloud computing systems, partition tolerance is essential, and the CAP theorem implies that a system has to choose between consistency and availability.
• Cassandra (AP system, sacrifices consistency)
  • Eventual (weak) consistency, availability, and partition tolerance.
• Traditional RDBMSs (CA system; partitions can’t happen on a single node)
  • Strong consistency over availability under a partition.
NOSQL Landscape
Eventual Consistency
• If all writes (to a key) stop, then all its values (replicas) will converge eventually.
• If writes continue, then the system always tries to keep converging.
  • A moving “wave” of updated values lags behind the latest values sent by clients, but is always trying to catch up.
• This works well when there are a few periods of low write activity: the system converges quickly.
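The convergence behaviour described above can be sketched with a small simulation. This is a minimal illustration, not how Cassandra or any particular store implements it: it assumes each replica holds one `(timestamp, value)` pair for a key, resolves conflicts by last-write-wins, and spreads updates by random pairwise gossip. All function names are hypothetical.

```python
import random


def merge(a, b):
    """Last-write-wins: keep the (timestamp, value) pair with the larger timestamp."""
    return max(a, b)


def gossip_round(replicas, rng):
    """Each replica syncs with one random peer; both end up with the newer version."""
    n = len(replicas)
    for i in range(n):
        j = rng.randrange(n)
        newest = merge(replicas[i], replicas[j])
        replicas[i] = replicas[j] = newest


def rounds_to_converge(replicas, rng, max_rounds=1000):
    """Gossip with no new writes until every replica holds the same version."""
    for completed in range(max_rounds + 1):
        if len(set(replicas)) == 1:
            return completed
        gossip_round(replicas, rng)
    raise RuntimeError("did not converge within max_rounds")


rng = random.Random(0)
# Five replicas of one key; only the first has seen the latest write (timestamp 3).
replicas = [(3, "c"), (1, "a"), (2, "b"), (1, "a"), (2, "b")]
rounds = rounds_to_converge(replicas, rng)
print(f"converged on {replicas[0]} after {rounds} gossip round(s)")
```

Because `merge` never discards the newest version, the latest write can only spread. Once writes stop, every replica ends up holding it.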
RDBMS vs. Key value stores
• RDBMSs provide ACID:
  • Atomicity
  • Consistency
  • Isolation
  • Durability
• Key-value stores like Cassandra provide BASE:
  • Basically Available, Soft-state, Eventual consistency
  • Prefers availability over consistency
CONCLUSION
• Business vs. engineering decisions
  • Would you rather be down or show wrong prices/inventory?
  • Would you rather be slow or show wrong prices/inventory?
• New paradigm
  • Industries are trying to run on availability
  • Model systems to mimic the laws of physics
  • Once done, can’t revert. Why? Because we cannot go back in time
  • Once done, we can correct (if needed). This is eventual consistency
• Telecommunication industry example
  • Commissions team: pay advances now, charge back later
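The "cannot revert, but can correct" idea maps naturally onto an append-only ledger with compensating entries. The sketch below is a hypothetical illustration of the commissions example, not a description of any actual T-Mobile system; the class names, the rep id, and the amounts are all invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    rep: str
    amount: float  # positive = advance paid, negative = chargeback
    reason: str


@dataclass
class CommissionLedger:
    """Append-only ledger: past entries are never edited or deleted."""

    entries: list = field(default_factory=list)

    def pay_advance(self, rep, amount, reason):
        # Pay immediately (stay available) instead of waiting for final confirmation.
        self.entries.append(Entry(rep, amount, reason))

    def charge_back(self, rep, amount, reason):
        # We cannot go back in time, so correct by appending a compensating entry.
        self.entries.append(Entry(rep, -amount, reason))

    def balance(self, rep):
        return sum(e.amount for e in self.entries if e.rep == rep)


ledger = CommissionLedger()
ledger.pay_advance("rep-42", 100.0, "activation sold")
ledger.pay_advance("rep-42", 100.0, "activation sold")
ledger.charge_back("rep-42", 100.0, "customer cancelled")
print(ledger.balance("rep-42"))  # 100.0
```

The balance is temporarily wrong between the second advance and the chargeback, but the history is complete and the total becomes correct once the compensating entry arrives.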
THANK YOU & Q&A
Vikash Kodati
• Email: [email protected]
• Yammer: https://www.yammer.com/t-mobile.com/users/vikashkodati
• Github: https://github.com/vikashkodati
• LinkedIn: /in/vikashkodati
• Twitter: @vikashkodati
• Blog: https://tmobileusa.sharepoint.com/portals/hub/personal/vikashkodati
REFERENCES
http://mwhittaker.github.io/2014/08/16/illustrated-proof-cap-theorem/
https://pinboard.in/u:sids/t:fifthel2015
Mandelbrot Set