Scaling Datastores and the CAP Theorem

Put Your Thinking CAP On (Tomer Gabel, Wix), JDay Lviv 2015


DESCRIPTION

A talk given at BuildStuff 2014 in Vilnius, Lithuania; originally developed by Yoav Abrahami and based on the works of Kyle "Aphyr" Kingsbury. Original abstract: On Friday, 4 June 1976, the Sex Pistols kicked off their first gig, a gig considered to have changed Western music culture forever by pioneering the genesis of punk rock. Wednesday, 19 July 2000 had a similar impact on internet-scale companies, with Eric Brewer's keynote at the ACM Symposium on Principles of Distributed Computing (PODC). Brewer claimed that as applications become more web-based, we should stop worrying about data consistency: if we want high availability in these new distributed applications, we cannot have data consistency. Two years later, in 2002, Seth Gilbert and Nancy Lynch formally proved Brewer's claim, known today as Brewer's Theorem or the CAP theorem. The CAP theorem states that a distributed system cannot simultaneously satisfy Consistency, Availability and Partition tolerance. In the database ecosystem, many tools claim to solve our data persistence problems while scaling out, offering different capabilities (document stores, key/value stores, SQL, graph, etc.). In this talk we will explore the CAP theorem: we will define Consistency, Availability and Partition Tolerance, explore what CAP means for our applications (ACID vs. BASE), and look at practical applications of MySQL with a read slave, MongoDB and Riak, based on the work of Aphyr (Kyle Kingsbury).

TRANSCRIPT

Page 1: Scaling Datastores and the CAP Theorem

Put Your Thinking CAP On
Tomer Gabel, Wix
JDay Lviv, 2015

Page 2: Scaling Datastores and the CAP Theorem

Credits

• Originally a talk by Yoav Abrahami (Wix)
• Based on “Call Me Maybe” by Kyle “Aphyr” Kingsbury

Page 3: Scaling Datastores and the CAP Theorem

Brewer’s CAP Theorem

[Diagram: Consistency, Availability, Partition Tolerance]

Page 4: Scaling Datastores and the CAP Theorem

Brewer’s CAP Theorem

[Diagram: Consistency, Availability, Partition Tolerance]

Page 5: Scaling Datastores and the CAP Theorem

By Example

• I want this book!
– I add it to the cart
– Then continue browsing
• There’s only one copy in stock!

Page 6: Scaling Datastores and the CAP Theorem

By Example

• I want this book!
– I add it to the cart
– Then continue browsing
• There’s only one copy in stock!
• … and someone else just bought it.

Page 7: Scaling Datastores and the CAP Theorem

Consistency

Page 8: Scaling Datastores and the CAP Theorem

Consistency: Defined

• In a consistent system: all participants see the same value at the same time
• “Do you have this book in stock?”

Page 9: Scaling Datastores and the CAP Theorem

Consistency: Defined

• If our book store is an inconsistent system:
– Two customers may buy the book
– But there’s only one item in inventory!
• We’ve just violated a business constraint.

Page 10: Scaling Datastores and the CAP Theorem

Availability

Page 11: Scaling Datastores and the CAP Theorem

Availability: Defined

• An available system:
– Is reachable
– Responds to requests (within SLA)
• Availability does not guarantee success!
– The operation may fail
– “This book is no longer available”

Page 12: Scaling Datastores and the CAP Theorem

Availability: Defined

• What if the system is unavailable?
– I complete the checkout
– And click on “Pay”
– And wait
– And wait some more
– And…
• Did I purchase the book or not?!

Page 13: Scaling Datastores and the CAP Theorem

Partition Tolerance

Page 14: Scaling Datastores and the CAP Theorem

Partition Tolerance: Defined

• Partition: one or more nodes are unreachable
• No practical system runs on a single node
• So all systems are susceptible!
[Diagram: nodes A, B, C, D, E]

Page 15: Scaling Datastores and the CAP Theorem

“The Network is Reliable”

• All four (drop, delay, duplicate, reorder) happen in an IP network
• To a client, delays and drops are the same
• Perfect failure detection is provably impossible¹!
[Diagram: messages between nodes A and B over time: drop, delay, duplicate, reorder]

1 “Impossibility of Distributed Consensus with One Faulty Process”, Fischer, Lynch and Paterson

Page 16: Scaling Datastores and the CAP Theorem

Partition Tolerance: Reified

• External causes:
– Bad network config
– Faulty equipment
– Scheduled maintenance
• Even software causes partitions:
– Bad network config
– GC pauses
– Overloaded servers
• Plenty of war stories!
– Netflix
– Twilio
– GitHub
– Wix :-)
• Some hard numbers¹:
– 5.2 failed devices/day
– 59K lost packets/day
– Adding redundancy only improves by 40%

1 “Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications”, Gill et al

Page 17: Scaling Datastores and the CAP Theorem

“Proving” CAP

Page 18: Scaling Datastores and the CAP Theorem

In Pictures

• Let’s consider a simple system:
– Service A writes values
– Service B reads values
– Values are replicated between nodes
• These are “ideal” systems
– Bug-free, predictable
[Diagram: Node 1 holds V0 and serves A; Node 2 holds V0 and serves B]

Page 19: Scaling Datastores and the CAP Theorem

In Pictures

• “Sunny day scenario”:
– A writes a new value V1
– The value is replicated to Node 2
– B reads the new value
[Diagram: A writes V1 to Node 1, Node 1 replicates V1 to Node 2, B reads V1]

Page 20: Scaling Datastores and the CAP Theorem

In Pictures

• What happens if the network drops?
– A writes a new value V1
– Replication fails
– B still sees the old value
– The system is inconsistent
[Diagram: Node 1 holds V1, Node 2 still holds V0]

Page 21: Scaling Datastores and the CAP Theorem

In Pictures

• Possible mitigation: synchronous replication
– A writes a new value V1
– Cannot replicate, so the write is rejected
– Both A and B still see V0
– The system is logically unavailable
[Diagram: the write of V1 is rejected; both nodes keep V0]

Page 22: Scaling Datastores and the CAP Theorem

What does it all mean?

Page 23: Scaling Datastores and the CAP Theorem

The network is not reliable

• Distributed systems must handle partitions
• Any modern system runs on more than one node…
• … and is therefore distributed
• Ergo, you have to choose:
– Consistency over availability
– Availability over consistency

Page 24: Scaling Datastores and the CAP Theorem

Granularity

• Real systems comprise many operations
– “Add book to cart”
– “Pay for the book”
• Each has different properties
• It’s a spectrum, not a binary choice!

[Spectrum: Consistency … Availability, with Checkout toward the consistency end and Shopping Cart toward the availability end]

Page 25: Scaling Datastores and the CAP Theorem

CAP IN THE REAL WORLD

Kyle “Aphyr” Kingsbury: breaking consistency guarantees since 2013

Page 26: Scaling Datastores and the CAP Theorem

PostgreSQL

• Traditional RDBMS
– Transactional
– ACID compliant
• Primarily a CP system
– Writes against a master node
• “Not a distributed system”
– Except with a client at play!

Page 27: Scaling Datastores and the CAP Theorem

PostgreSQL

• Writes are a simplified 2PC:
– Client votes to commit
– Server validates transaction
– Server stores changes
– Server acknowledges commit
– Client receives acknowledgement
[Diagram: client and server exchange messages; the server stores the change]

Page 28: Scaling Datastores and the CAP Theorem

PostgreSQL

• But what if the ack is never received?
• The commit is already stored…
• … but the client has no indication!
• The system is in an inconsistent state
[Diagram: the server stores the change, but the acknowledgement never reaches the client]

Page 29: Scaling Datastores and the CAP Theorem

PostgreSQL

• Let’s experiment!
• 5 clients write to a PostgreSQL instance
• We then drop the server from the network
• Results:
– 1000 writes
– 950 acknowledged
– 952 survivors

Page 30: Scaling Datastores and the CAP Theorem

So what can we do?

1. Accept false negatives
– May not be acceptable for your use case!
2. Use idempotent operations
3. Apply unique transaction IDs
– Query state after the partition is resolved (see the sketch below)
• These strategies apply to any RDBMS
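For concreteness, here is a minimal sketch (my illustration, not from the talk) of strategy 3 in Python with psycopg2: the client tags every write with its own unique transaction ID, and if the acknowledgement is lost it queries by that ID once the partition heals. The table and column names (orders, client_txn_id, amount) are hypothetical.

```python
import uuid
import psycopg2

def write_with_client_txn_id(conn, amount):
    """Insert an order tagged with a client-generated ID; if the ack is lost,
    the caller can later ask whether the write actually survived."""
    txn_id = str(uuid.uuid4())  # unique transaction ID chosen by the client
    try:
        with conn, conn.cursor() as cur:  # 'with conn' commits on success
            cur.execute(
                "INSERT INTO orders (client_txn_id, amount) VALUES (%s, %s)",
                (txn_id, amount),
            )
        return ("confirmed", txn_id)
    except psycopg2.OperationalError:
        # The connection dropped before we saw the acknowledgement: the commit
        # may or may not have happened. Record the ID and resolve it later.
        return ("unknown", txn_id)

def resolve(conn, txn_id):
    """After the partition heals, check whether the write survived."""
    with conn, conn.cursor() as cur:
        cur.execute("SELECT 1 FROM orders WHERE client_txn_id = %s", (txn_id,))
        return "committed" if cur.fetchone() else "lost"

# conn = psycopg2.connect("dbname=shop")  # hypothetical connection string
```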

Page 31: Scaling Datastores and the CAP Theorem

MongoDB

• A document-oriented database
• Availability/scale via replica sets
– Client writes to a master node
– Master replicates writes to n replicas
• User-selectable consistency guarantees

Page 32: Scaling Datastores and the CAP Theorem

MongoDB

• When a partition occurs:
– If the master is in the minority, it is demoted
– The majority promotes a new master…
– … selected by the highest optime

Page 33: Scaling Datastores and the CAP Theorem

MongoDB

• The cluster “heals” after partition resolution:
– The “old” master rejoins the cluster
– Acknowledged minority writes are reverted!

Page 34: Scaling Datastores and the CAP Theorem

MongoDB

• Let’s experiment!
• Set up a 5-node MongoDB cluster
• 5 clients write to the cluster
• We then partition the cluster
• … and restore it to see what happens

Page 35: Scaling Datastores and the CAP Theorem

MongoDB

• With write concern unacknowledged:
– Server does not ack writes (except TCP)
– The default prior to November 2012
• Results:
– 6000 writes
– 5700 acknowledged
– 3319 survivors
– 42% data loss!

Page 36: Scaling Datastores and the CAP Theorem

MongoDB

• With write concern acknowledged:
– Server acknowledges writes (after store)
– The default guarantee
• Results:
– 6000 writes
– 5900 acknowledged
– 3692 survivors
– 37% data loss!

Page 37: Scaling Datastores and the CAP Theorem

MongoDB

• With write concern replica acknowledged:
– Client specifies a minimum number of replicas
– Server acks after writes to those replicas
• Results:
– 6000 writes
– 5695 acknowledged
– 3768 survivors
– 33% data loss!

Page 38: Scaling Datastores and the CAP Theorem

MongoDB

• With write concern majority:
– For an n-node cluster, requires acknowledgement from a majority (more than n/2) of nodes
– Also called “quorum”
• Results:
– 6000 writes
– 5700 acknowledged
– 5701 survivors
– No data loss

Page 39: Scaling Datastores and the CAP Theorem

So what can we do?

1. Keep calm and carry on
– As Aphyr puts it, “not all applications need consistency”
– Have a reliable backup strategy
– … and make sure you drill restores!
2. Use write concern majority (see the sketch below)
– And take the performance hit
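As an illustration of option 2, here is a minimal PyMongo sketch (assumed setup; the replica-set hosts and the shop/carts names are placeholders) that requests majority acknowledgement with a timeout, so a primary stranded in a minority partition fails fast instead of silently accepting doomed writes:

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern
from pymongo.errors import WTimeoutError

# Hypothetical 5-node replica set; host names are placeholders.
client = MongoClient("mongodb://n1,n2,n3,n4,n5/?replicaSet=rs0")
db = client.get_database("shop")

# Require acknowledgement from a majority of nodes, with a timeout.
carts = db.get_collection(
    "carts",
    write_concern=WriteConcern(w="majority", wtimeout=5000),
)

try:
    carts.insert_one({"user_id": 42, "isbn": "978-0201633610"})
except WTimeoutError:
    # The write reached the primary but could not be majority-replicated in
    # time; its fate is unknown until the partition heals.
    pass
```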

Page 40: Scaling Datastores and the CAP Theorem

The prime suspects

• Aphyr’s Jepsen tests include:
– Redis
– Riak
– Zookeeper
– Kafka
– Cassandra
– RabbitMQ
– etcd (and Consul)
– ElasticSearch
• If you’re considering them, go read his posts
• In fact, go read his posts regardless

http://aphyr.com/tags/jepsen

Page 41: Scaling Datastores and the CAP Theorem

STRATEGIES FOR DISTRIBUTED SYSTEMS

Page 42: Scaling Datastores and the CAP Theorem

Immutable Data

• Immutable (adj.): “unchanging over time or unable to be changed”
• Meaning:
– No deletes
– No updates
– No merge conflicts
– Replication is trivial (see the sketch below)
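A minimal sketch (my illustration, not from the talk) of what immutable, append-only data looks like in code, and why replication becomes trivial: a replica only ever copies the suffix of events it has not yet seen, so there is nothing to merge.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)  # frozen: individual events can never be mutated
class Event:
    seq: int
    payload: str

class EventLog:
    """An append-only log: no deletes, no updates, no merge conflicts."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, payload: str) -> Event:
        event = Event(seq=len(self._events), payload=payload)
        self._events.append(event)
        return event

    def since(self, seq: int) -> List[Event]:
        """Everything a replica that already has `seq` entries is missing."""
        return self._events[seq:]

    def replicate_from(self, other: "EventLog") -> None:
        """Catch up by copying the missing suffix; no conflicts are possible."""
        self._events.extend(other.since(len(self._events)))
```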

Page 43: Scaling Datastores and the CAP Theorem

Idempotence

• An idempotent operation:
– Can be applied one or more times with the same effect
• Enables retries
• Not always possible
– Side effects are key
– Consider: payments (see the sketch below)
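A minimal sketch (illustrative only; all names are hypothetical) of making a side-effecting operation such as a payment idempotent by deduplicating on a caller-supplied request ID, so a retry after a lost acknowledgement does not charge the customer twice:

```python
# In-memory stand-in for a durable store of request_id -> result.
_processed = {}

def charge(request_id, user, amount_cents):
    """Charge a user at most once per request_id, regardless of retries."""
    if request_id in _processed:
        # Already applied: return the original result, perform no new side effect.
        return _processed[request_id]
    # ... call the real payment gateway here (the side effect happens once) ...
    result = f"charged {user} {amount_cents} cents"
    _processed[request_id] = result
    return result

# A retry with the same request_id has the same effect as a single call:
assert charge("req-1", "alice", 1999) == charge("req-1", "alice", 1999)
```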

Page 44: Scaling Datastores and the CAP Theorem

Eventual Consistency

• A design which prefers availability
• … but guarantees that clients will eventually see consistent reads
• Consider git:
– Always available locally
– Converges via push/pull
– Human conflict resolution

Page 45: Scaling Datastores and the CAP Theorem

Eventual Consistency

• The system expects data to diverge
• … and includes mechanisms to regain convergence
– Partial ordering to minimize conflicts
– A merge function to resolve conflicts

Page 46: Scaling Datastores and the CAP Theorem

Vector Clocks

• A technique for partial ordering
• Each node has a logical clock
– The clock increases on every write
– Track the last observed clocks for each item
– Include this vector on replication
• When neither the observed nor the inbound vector descends from the other, we have a conflict
• This lets us know when history diverged (see the sketch below)
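A minimal vector-clock sketch (my illustration, not from the talk), representing each clock as a node-to-counter map and flagging a conflict when neither history descends from the other:

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of `clock` with `node`'s counter advanced (a local write)."""
    out = dict(clock)
    out[node] = out.get(node, 0) + 1
    return out

def descends(a: dict, b: dict) -> bool:
    """True if `a` has seen everything `b` has (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def conflict(a: dict, b: dict) -> bool:
    """Neither history descends from the other: concurrent writes, a conflict."""
    return not descends(a, b) and not descends(b, a)

# Two replicas start from the same value and each accepts a local write:
base = {}
at_n1 = increment(base, "n1")   # {'n1': 1}
at_n2 = increment(base, "n2")   # {'n2': 1}
assert conflict(at_n1, at_n2)                        # histories diverged
assert not conflict(increment(at_n1, "n1"), at_n1)   # linear history, no conflict
```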

Page 47: Scaling Datastores and the CAP Theorem

CRDTs

• Commutative Replicated Data Types¹
• A CRDT is a data structure that:
– Eventually converges to a consistent state
– Guarantees no conflicts on replication

1 “A comprehensive study of Convergent and Commutative Replicated Data Types”, Shapiro et al

Page 48: Scaling Datastores and the CAP Theorem

CRDTs

• CRDTs provide specialized semantics:
– G-Counter: monotonically increasing counter
– PN-Counter: also supports decrements
– G-Set: a set that only supports adds
– 2P-Set: supports removals, but only once
• OR-Sets are particularly useful
– Keep track of both additions and removals
– Can be used for shopping carts (see the sketch below)
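A minimal OR-Set sketch (illustrative; not Riak's actual implementation) showing why it suits shopping carts: every add carries a unique tag, removals only cancel the tags they have observed, so a concurrent re-add on another replica survives the merge.

```python
import uuid

class ORSet:
    """Observed-remove set: adds are tagged, removes cancel only observed tags."""

    def __init__(self):
        self.adds = {}      # element -> set of unique add-tags
        self.removes = {}   # element -> set of add-tags observed as removed

    def add(self, element):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Cancel only the add-tags this replica has seen so far.
        self.removes.setdefault(element, set()).update(self.adds.get(element, set()))

    def contains(self, element):
        live = self.adds.get(element, set()) - self.removes.get(element, set())
        return bool(live)

    def merge(self, other):
        """Union both tag maps; merging never produces a conflict."""
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        for element, tags in other.removes.items():
            self.removes.setdefault(element, set()).update(tags)

# Shopping-cart flavour: replica B removes the book while replica A re-adds it;
# after merging both ways, the concurrent add wins and the item stays in the cart.
a, b = ORSet(), ORSet()
a.add("book"); b.merge(a)
b.remove("book"); a.add("book")
a.merge(b); b.merge(a)
assert a.contains("book") and b.contains("book")
```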

Page 49: Scaling Datastores and the CAP Theorem

Questions?

Complaints?

Page 50: Scaling Datastores and the CAP Theorem

WE’RE DONE HERE!

Thank you for listening

[email protected]

@tomerg

http://il.linkedin.com/in/tomergabel

Aphyr’s “Call Me Maybe” blog posts:

http://aphyr.com/tags/jepsen