CAP Theorem Vikash Kodati 10/19/2016

CAP Theorem (also known as Brewer's theorem)

T-Mobile Confidential, 4/6/2016

It is impossible for a distributed system to simultaneously provide all three of the following guarantees (pick any two):

1. Consistency: all nodes see the same data at the same time; that is, every read returns the latest value written by any client.

2. Availability: every request receives a response. The system accepts operations at all times, and operations return quickly.

3. Partition tolerance: the system continues to operate despite arbitrary partitioning caused by network failures.
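
The trade-off can be made concrete with a toy two-replica store. This is a minimal sketch and not a real database. The class names `Replica`, `TwoReplicaStore`, and `Unavailable` are hypothetical. While the network is healthy, writes are copied synchronously to both replicas, and a partition is modelled as a simple boolean flag. In CP mode, a replica refuses any request it cannot coordinate with its peer. In AP mode, it keeps answering from local state, so the other side can return stale data.

```python
from dataclasses import dataclass, field
from typing import Optional


class Unavailable(Exception):
    """Raised when a CP-mode replica refuses a request during a partition."""


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


@dataclass
class TwoReplicaStore:
    """Toy store: two replicas, every write copied to both while the network is up.

    mode="CP": during a partition, refuse requests (sacrifice availability).
    mode="AP": during a partition, serve from the local replica only
               (sacrifice consistency; the other replica may be stale).
    """
    mode: str
    a: Replica = field(default_factory=lambda: Replica("a"))
    b: Replica = field(default_factory=lambda: Replica("b"))
    partitioned: bool = False

    def _peer(self, local: Replica) -> Replica:
        return self.b if local is self.a else self.a

    def write(self, local: Replica, key: str, value: str) -> None:
        if self.partitioned:
            if self.mode == "CP":
                raise Unavailable(f"{local.name}: cannot reach peer, rejecting write")
            local.data[key] = value  # AP: accept locally; the peer now diverges
            return
        local.data[key] = value
        self._peer(local).data[key] = value  # replicate while the network is healthy

    def read(self, local: Replica, key: str) -> Optional[str]:
        if self.partitioned and self.mode == "CP":
            # With only two replicas neither side holds a majority, so a CP
            # replica cannot prove its copy is the latest and refuses instead.
            raise Unavailable(f"{local.name}: cannot confirm latest value")
        return local.data.get(key)


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoReplicaStore(mode)
        store.write(store.a, "x", "v1")  # both replicas now hold v1
        store.partitioned = True         # the link between a and b goes down
        try:
            store.write(store.a, "x", "v2")
            print(mode, "read from b:", store.read(store.b, "x"))
        except Unavailable as err:
            print(mode, "unavailable:", err)
```

During the partition, the CP store rejects the write. The AP store accepts the write on replica `a`, while replica `b` still returns the old value `v1`.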

Why is Availability Important?

Availability = reads and writes complete reliably and quickly.

• Data shows that a 500 ms increase in latency for operations at Amazon.com or at Google.com can cause a 20% drop in revenue

• At Amazon, each added millisecond of latency implies a $6M yearly loss.

• Service-level agreements (SLAs) written by providers deal predominantly with the latencies that clients experience.
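
Latency SLAs are commonly stated as a percentile, for example "99% of requests finish within N ms", rather than as an average. The sketch below checks a batch of latency samples against such a target using the nearest-rank percentile definition. The function names, sample values, and thresholds are illustrative assumptions and are not taken from any provider's SLA.

```python
import math


def percentile(latencies_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not latencies_ms or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def meets_sla(latencies_ms: list[float], p: float, limit_ms: float) -> bool:
    """True if the p-th percentile latency is within the SLA limit."""
    return percentile(latencies_ms, p) <= limit_ms


if __name__ == "__main__":
    samples = [12, 15, 11, 300, 14, 13, 16, 12, 18, 17]  # one slow outlier
    print("p50:", percentile(samples, 50), "p99:", percentile(samples, 99))
    print("p90 within 20 ms:", meets_sla(samples, 90, 20))
    print("p99 within 20 ms:", meets_sla(samples, 99, 20))
```

A single slow request barely moves the median, but it dominates the tail. Tail latency is what a percentile-based SLA, and the client, actually notices.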

Why is Consistency Important?

Consistency = all nodes see the same data at the same time; that is, reads return the latest value written by any client.

• When you access your bank or investment account via multiple clients, you want the updates made from one client to be visible to the other clients.

• When thousands of customers are trying to book a flight, every update from any client (for example, a seat being booked) should be visible to all other clients.
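
The flight-booking case shows what goes wrong without consistency. In this hypothetical sketch, two clients try to buy the last seat. The names `SeatRecord`, `book_on_stale_replicas`, and `book_with_cas` are illustrative. If each client books through a replica that has not yet seen the other's update, both bookings succeed and the flight is oversold. If both clients go through one authoritative copy guarded by a compare-and-set on a version number, only the first booking succeeds. Real systems get this guarantee from transactions or consensus; the toy only shows the difference in outcome.

```python
from dataclasses import dataclass


@dataclass
class SeatRecord:
    seats_left: int
    version: int = 0  # bumped on every successful update


def book_on_stale_replicas(replicas: list[SeatRecord]) -> int:
    """Each client books via its own replica, and the replicas have not yet
    exchanged updates. Returns how many bookings succeeded."""
    booked = 0
    for replica in replicas:
        if replica.seats_left > 0:  # each replica still believes a seat is free
            replica.seats_left -= 1
            booked += 1
    return booked


def book_with_cas(record: SeatRecord, expected_version: int) -> bool:
    """Compare-and-set on one authoritative copy: succeed only if nobody has
    changed the record since this client read it."""
    if record.version != expected_version or record.seats_left == 0:
        return False
    record.seats_left -= 1
    record.version += 1
    return True


if __name__ == "__main__":
    # One seat left, two clients, two unsynchronized replicas.
    print("bookings on stale replicas:", book_on_stale_replicas([SeatRecord(1), SeatRecord(1)]))

    # One seat left, both clients read version 0 and then try to book.
    seat = SeatRecord(1)
    seen = seat.version
    print("client 1:", book_with_cas(seat, seen))
    print("client 2:", book_with_cas(seat, seen))
```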

Why is Partition Tolerance Important?

• Partitions can happen across datacenters when the network gets disconnected, for example because of:
  • Internet router outages
  • Undersea cables being cut
  • DNS not working

• Partitions can also occur within a datacenter, e.g., a rack switch outage.

• Even so, we want the system to continue functioning normally.

DESIGN FOR FAILURE

6/13/2016

Typical first year for a new cluster:
• ~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
• ~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
• ~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
• ~1 network rewiring (rolling ~5% of machines down over 2-day span)
• ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
• ~5 racks go wonky (40-80 machines see 50% packet loss)
• ~8 network maintenances (4 might cause ~30-minute random connectivity losses)
• ~12 router reloads (takes out DNS and external VIPs for a couple minutes)
• ~3 router failures (have to immediately pull traffic for an hour)
• ~dozens of minor 30-second blips for DNS
• ~1000 individual machine failures
• ~thousands of hard drive failures
• slow disks, bad memory, misconfigured machines, flaky machines, etc.
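
Converting the yearly counts above into daily rates shows why failure handling belongs on the normal code path rather than being treated as an exceptional case. The sketch uses only the items that have a single numeric estimate. It also assumes events are spread evenly over the year, which real failures are not.

```python
# Approximate yearly event counts for a new cluster, taken from the list above.
# Entries given only as "dozens" or "thousands" are left out.
YEARLY_EVENTS = {
    "overheating": 0.5,
    "PDU failure": 1,
    "rack move": 1,
    "network rewiring": 1,
    "rack failure": 20,
    "wonky rack": 5,
    "network maintenance": 8,
    "router reload": 12,
    "router failure": 3,
    "machine failure": 1000,
}

DAYS_PER_YEAR = 365


def per_day(count_per_year: float) -> float:
    """Average number of events per day, assuming they are spread evenly."""
    return count_per_year / DAYS_PER_YEAR


def mean_days_between(count_per_year: float) -> float:
    """Average gap between events in days, under the same even-spread assumption."""
    return DAYS_PER_YEAR / count_per_year


if __name__ == "__main__":
    for event, count in sorted(YEARLY_EVENTS.items(), key=lambda kv: -kv[1]):
        print(f"{event:20s} {per_day(count):7.3f}/day  one every {mean_days_between(count):6.1f} days")
```

For example, ~1000 machine failures a year works out to about 2.74 per day. Likewise, ~20 rack failures a year means one roughly every 18 days.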

Note: Data taken from Jeff Dean’s slides (Google)

CAP Theorem Fallout

• Since partitions are inevitable in today’s cloud computing systems, partition tolerance is essential, and the CAP theorem implies that a system has to choose between consistency and availability whenever a partition occurs.

• Cassandra (AP system, sacrifices consistency)
  • Eventual (weak) consistency, availability, and partition tolerance.

• Traditional RDBMSs (CA system: on a single node, partitions can’t happen)
  • Strong consistency over availability under a partition.
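
Dynamo-style replicated stores, which include Cassandra, often let this choice be made per request through quorum sizes. With N replicas, a read contacts R of them and a write waits for W acknowledgements. If R + W > N, every read quorum overlaps every write quorum, so a read always reaches at least one replica holding the latest acknowledged write. The cost is that an operation fails when fewer than R (or W) replicas are reachable. The sketch below encodes only that arithmetic and ignores hinted handoff, timestamps, and read repair. The function names are hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees every read quorum shares a replica with every write quorum."""
    return r + w > n


def can_complete(reachable: int, needed: int) -> bool:
    """An operation succeeds only if it can reach as many replicas as it needs."""
    return reachable >= needed


def tradeoff(n: int, r: int, w: int, reachable: int) -> tuple[bool, bool]:
    """Return (consistent_reads, available) for a client that can reach `reachable` replicas."""
    consistent = quorums_overlap(n, r, w)
    available = can_complete(reachable, r) and can_complete(reachable, w)
    return consistent, available


if __name__ == "__main__":
    # N = 3 replicas; a partition leaves this client able to reach only one of them.
    print("R=W=2 (quorum):", tradeoff(3, 2, 2, reachable=1))  # consistent, not available
    print("R=W=1 (one):   ", tradeoff(3, 1, 1, reachable=1))  # available, may be stale
```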

[Figure: NoSQL Landscape]

Eventual Consistency

• If all writes to a key stop, then all of its replicas will eventually converge to the same value.

• If writes continue, the system keeps trying to converge.
  • A moving “wave” of updated values lags behind the latest values sent by clients, but it is always trying to catch up.

• It works well when there are periods of low write activity: the system converges quickly.
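
Convergence after writes stop can be illustrated with last-writer-wins replicas that periodically gossip their state to a random peer. This is one common anti-entropy scheme, sketched here under simplifying assumptions: the caller supplies integer timestamps, there are no deletes, and gossip is push-only. The class and function names are hypothetical, and this is not how any particular store implements anti-entropy.

```python
import random


class LWWReplica:
    """Holds one (timestamp, value) pair per key; the higher timestamp wins."""

    def __init__(self) -> None:
        self.store: dict[str, tuple[int, str]] = {}

    def write(self, key: str, value: str, ts: int) -> None:
        self.merge(key, (ts, value))

    def merge(self, key: str, entry: tuple[int, str]) -> None:
        current = self.store.get(key)
        # Tuples compare by timestamp first, then by value as a tie-breaker.
        if current is None or entry > current:
            self.store[key] = entry


def gossip_round(replicas: list[LWWReplica], rng: random.Random) -> None:
    """Every replica pushes its whole state to one randomly chosen peer."""
    for replica in replicas:
        peer = rng.choice([r for r in replicas if r is not replica])
        for key, entry in replica.store.items():
            peer.merge(key, entry)


def converged(replicas: list[LWWReplica]) -> bool:
    return all(r.store == replicas[0].store for r in replicas)


if __name__ == "__main__":
    rng = random.Random(1)
    replicas = [LWWReplica() for _ in range(5)]
    replicas[0].write("x", "a", ts=1)
    replicas[3].write("x", "b", ts=2)  # a later, conflicting write on another replica
    rounds = 0
    while not converged(replicas):  # writes have stopped; only gossip runs now
        gossip_round(replicas, rng)
        rounds += 1
    print(f"converged after {rounds} rounds:", replicas[0].store)
```

Two conflicting writes land on different replicas. Once writing stops, a few rounds of gossip bring all five replicas to the value with the higher timestamp.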

RDBMS vs. Key-Value Stores

• RDBMSs provide ACID:
  • Atomicity
  • Consistency
  • Isolation
  • Durability

• Key-value stores like Cassandra provide BASE:
  • Basically Available, Soft state, Eventual consistency
  • Prefer availability over consistency
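
The "A" in ACID can be seen with SQLite through Python's standard `sqlite3` module. Using the connection as a context manager commits the transaction on success and rolls it back if an exception escapes. A transfer that crashes halfway therefore leaves no partial update behind. The table, account names, and the `fail_midway` switch are hypothetical, chosen for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int,
             fail_midway: bool = False) -> None:
    """Move money atomically: both UPDATEs commit together or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    db = make_db()
    try:
        transfer(db, "alice", "bob", 30, fail_midway=True)
    except RuntimeError:
        pass
    print("after failed transfer:", balances(db))  # the debit was rolled back
    transfer(db, "alice", "bob", 30)
    print("after good transfer:  ", balances(db))
```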

CONCLUSION

• Business vs. engineering decisions
  • Would you rather be down or show wrong prices/inventory?
  • Would you rather be slow or show wrong prices/inventory?

• New paradigm
  • Industries are trying to run on availability
  • Model systems to mimic the laws of physics
  • Once something is done, it can't be reverted, because we cannot go back in time
  • Once something is done, we can correct it (if needed). This is eventual consistency.

• Telecommunication industry example
  • Commissions team: pay advances, but charge back later
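
The commissions example follows the "can't revert, but can correct" pattern. One way to model it is an append-only ledger, where a chargeback is recorded as a new compensating entry instead of deleting the original advance. This is a hypothetical sketch. The `CommissionLedger` class, the rep IDs, the amounts, and the reasons are illustrative and are not the commissions team's actual system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    rep_id: str
    amount: int  # positive = advance paid, negative = chargeback
    reason: str


@dataclass
class CommissionLedger:
    """Append-only: past entries are never edited or deleted, only corrected by new ones."""
    entries: list[Entry] = field(default_factory=list)

    def pay_advance(self, rep_id: str, amount: int) -> None:
        self.entries.append(Entry(rep_id, amount, "advance"))

    def charge_back(self, rep_id: str, amount: int, reason: str) -> None:
        # The correction is a new fact; the original advance stays in the history.
        self.entries.append(Entry(rep_id, -amount, reason))

    def balance(self, rep_id: str) -> int:
        return sum(e.amount for e in self.entries if e.rep_id == rep_id)


if __name__ == "__main__":
    ledger = CommissionLedger()
    ledger.pay_advance("rep-1", 200)  # pay immediately: favour availability
    ledger.charge_back("rep-1", 200, "customer cancelled")  # correct later
    print("balance:", ledger.balance("rep-1"))
    for entry in ledger.entries:
        print(entry)
```

The final balance reflects the correction, and the full history of what happened remains available for audit.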

[Figure: Mandelbrot set]