Consistency in Distributed Systems, Part 2
DESCRIPTION
Building on the introductory work of a past webinar, we take a deep dive into the locking, replication, and failure modes of leading NoSQL databases. We focus on three main areas critical for modern developers and architects: an industry survey of ACID compliance; best practices for storing 1-many and many-many relationships in Riak, Cassandra, MongoDB, DynamoDB, and Cloudant; and consistency between primary and secondary indexes (an often neglected subject), with its implications for immutable data models.
TRANSCRIPT
Consistency in Distributed Systems: II
Mike Miller
Co-Founder, Chief Scientist
@mlmilleratmit
2014-07-31
AMP on Consistency
https://amplab.cs.berkeley.edu/tag/consistency/
Topics For Today
‣ Brief review of Part I (2014-06-12)
‣ Why is Consistency hard?
‣ What should you really care about?
  • Single object/row/document operations
  • Multi-part transactions
  • Primary/secondary indexes
‣ Dirty little (ACID) secrets: Results from industry survey
‣ Failure modes, strategies, and gotchas
Motivation
Mobile, Big Data
=> Stress models for consistency, transactional reasoning
This is your problem when…
… data doesn’t fit on one server.
… data replicated between servers (e.g. read slaves).
… data spread between data centers.
… state spread across more than one device (mobile!)
… mixed workloads with concurrency.
… state spread across more than one process.
This is now everyone’s problem
Good news — market response: NewSQL, NoSQL, Cloud, …
ships with a mobile strategy
{Write: ‘Local’, Sync: ‘Later’}
Embedded, Edge, Satellites
Desktop, Browser
Cloud
NoSQL Taxonomy
…
…http://www.bailis.org/papers/ramp-sigmod2014.pdf
Fundamental reason: CAP Theorem
You do need to understand your datastore.
Why is Consistency Hard?
1. The network is reliable.
2. Latency is zero. (Fallacies of Distributed Computing, P. Deutsch)
MySQL, MongoDB, CouchDB, SOLR, …
Dynamo, Cloudant, Cassandra, Riak, …
[Figure: Perfect Network]
Nodes: Primary, Secondary (Replication), Client. Axis: time.
Labels: w(x=1): success; Client r(x): x=1

[Figure: Network Partition: Primary Only]
Nodes: Primary, Secondary (Replication), Client. Axis: time.
Labels: w(x=1): success; Client r(x): x=Null
Available, temporarily inconsistent

[Figure: Network Partition: Primary+Secondary]
Nodes: Primary, Secondary (Replication), Client. Axis: time.
Labels: w(x=1); success
Consistent

[Figure: Network Partition: Primary+Secondary]
Nodes: Primary, Secondary (Replication), Client. Axis: time.
Labels: w(x=1): failure
Not Available
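The replication diagrams above can be mimicked with a toy simulation. This is a minimal sketch under an assumed reading of the captions: "Primary Only" is taken to mean a write is acknowledged once the primary has it, and "Primary+Secondary" that a write must reach both replicas before it is acknowledged. The `Replica` and `Primary` classes, the `require_secondary_ack` flag, and the `partitioned` switch are hypothetical names for illustration, not any database's API.

```python
class Replica:
    """A node holding a copy of the data."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        # None plays the role of Null in the diagrams.
        return self.data.get(key)


class Primary(Replica):
    """Accepts writes and replicates them to one secondary."""

    def __init__(self, secondary, require_secondary_ack):
        super().__init__()
        self.secondary = secondary
        # Assumed reading of the captions:
        # False = "Primary Only", True = "Primary+Secondary".
        self.require_secondary_ack = require_secondary_ack
        self.partitioned = False  # True while the replication link is down

    def write(self, key, value):
        if self.partitioned and self.require_secondary_ack:
            # Refuse the write: replicas stay consistent, but the
            # service is not available for writes.
            return "failure"
        self.data[key] = value
        if not self.partitioned:
            self.secondary.data[key] = value
        return "success"


def scenario(require_secondary_ack, partitioned):
    """Run w(x=1) against the primary, then r(x) against the secondary."""
    secondary = Replica()
    primary = Primary(secondary, require_secondary_ack)
    primary.partitioned = partitioned
    return primary.write("x", 1), secondary.read("x")


print(scenario(require_secondary_ack=False, partitioned=False))
print(scenario(require_secondary_ack=False, partitioned=True))
print(scenario(require_secondary_ack=True, partitioned=True))
```

The second scenario is the "available, temporarily inconsistent" case: the write succeeds but a read from the secondary returns nothing. The third trades that inconsistency for a failed write.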
Partition Failures Dominate
‣ 2011 (AWS):
  • misconfiguration => 12 hour outage
‣ 2011 Survey (Microsoft):
  • 13,300 customer-impacting network failures
  • Median 60,000 packets lost per failure
  • Mean 41 link failures per day (95th percentile: 136)
  • Median time to repair of 5 minutes (up to a week)
  • Redundant networks only reduce failure impact by 40%
‣ HP Managed Enterprise Networks
  • 28% of customer tickets due to network problems
  • 39% of all support tickets due to network problems
  • Median incident duration: 114-188 minutes
http://queue.acm.org/detail.cfm?id=2655736
Latency
Network health really depends on your latency tolerance.
A slow network can be just as bad as a broken network.
The tails matter.
Median Latencies
[Figure panels: Same AZ, Different AZs, Different Regions]
http://www.bailis.org/blog/communication-costs-in-real-world-networks/
99.99% Latencies
[Figure panels: Same AZ, Different AZs, Different Regions]
http://www.bailis.org/blog/communication-costs-in-real-world-networks/
Latency Summary
‣ Distributed, coordinated operations:
  ‣ rate ~ 1/latency
‣ Real world latencies are substantial, with long tails
‣ At scale, 0.01% events happen constantly
‣ Picture actually much worse due to systematic fluctuations
‣ 99.99% Latencies:
  ‣ Same AZ: ~50 ms
  ‣ Same Region: ~80 ms
  ‣ Inter-Region: 200-400 ms!
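The arithmetic behind "rate ~ 1/latency" and "0.01% events happen constantly" can be worked through in a few lines. This is a rough sketch under stated assumptions: coordinated operations are strictly serialized with one round trip each, requests are independent, and the inter-region figure uses the upper end (400 ms) of the range quoted on the slide. The function names are illustrative only.

```python
def max_serial_rate(latency_ms):
    """Upper bound on operations per second when each coordinated
    operation must finish one round trip before the next can start."""
    return 1000.0 / latency_ms


def p_tail_hit(requests, tail_fraction=0.0001):
    """Probability that at least one of `requests` independent requests
    lands in the slowest `tail_fraction` (0.01% by default)."""
    return 1.0 - (1.0 - tail_fraction) ** requests


# 99.99th percentile latencies quoted on the slide (upper end for inter-region).
TAIL_LATENCY_MS = {"Same AZ": 50, "Same Region": 80, "Inter-Region": 400}

for where, ms in TAIL_LATENCY_MS.items():
    print(f"{where}: at most {max_serial_rate(ms):.1f} coordinated ops/s at the tail")

for n in (100, 10_000, 1_000_000):
    print(f"{n} requests: P(at least one tail event) = {p_tail_hit(n):.3f}")
```

At 10,000 requests the chance of hitting at least one 99.99th percentile event is already above 60%, which is why the tails matter more than the medians at scale.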
Thank god for ACID (New)SQL, right?
… not so fast
ACID in the Wild
http://arxiv-web3.library.cornell.edu/abs/1302.0309v1
Beware the Marketing
http://arxiv-web3.library.cornell.edu/abs/1302.0309v1
Wow!
So… What do we use? What should we worry about?
Distinguishing Characteristics
1. Locks / Concurrency
2. Relationships / Foreign Keys
3. Inter-index consistency
Subjective Classification

| | Cassandra | Cloudant | MongoDB | Riak |
|---|---|---|---|---|
| Locking | Minimal | None | Writes and Reads | Minimal |
| Consistency | Quorum, Optional Paxos | Quorum | Single document Locks | Quorum, Optional Paxos |
| Relationships, “JOINs” | De-normalize, Materialized Views | Normalize, Materialized Views | De-normalize, Application Joins | De-normalize or Link Walking |
| Leading Strategies | Immutability | Immutability | Fat Documents | Immutability |
| “Intention” | HA, Shared Nothing, Many Servers | HA, Shared Nothing, Many Servers | Master/Slave, Single Server | HA, Shared Nothing, Many Servers |
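The "Quorum" entries above rest on a simple overlap rule. The sketch below uses the common Dynamo-style naming (N replicas, R replicas consulted per read, W replicas acknowledging per write); that naming and the helper functions are assumptions for illustration, and each database exposes its own settings for this. It checks by brute force that every read set meets every write set exactly when R + W > N.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Rule of thumb: reads see the latest acknowledged write when R + W > N."""
    return r + w > n


def always_intersect(n, r, w):
    """Brute force: does every possible read set of size r share at least
    one replica with every possible write set of size w?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3), (5, 2, 3)]:
    print(f"N={n} R={r} W={w}: rule={quorums_overlap(n, r, w)}, "
          f"brute force={always_intersect(n, r, w)}")
```

Lower R or W buys latency and availability at the price of possibly stale reads, which is the trade-off the partition and latency slides describe.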
De-normalization
It happens in all NoSQL systems. Is it the application's responsibility or the database's?
Relationships as Single Documents
Natural fit for some applications
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
Relationships as Single Documents
Duplication sucks, pathologically
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
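Why duplication hurts can be seen with plain dictionaries standing in for documents. This is a minimal sketch, not tied to any database: the `posts` collection, its fields, and the helper functions are hypothetical. Each post embeds a copy of its author's name, so one logical change becomes one write per document, and readers can see a mix of old and new values until every copy is rewritten.

```python
# Hypothetical de-normalized documents: the author's name is copied into each post.
posts = [
    {"_id": "post1", "title": "First", "author": {"id": "u1", "name": "Ann"}},
    {"_id": "post2", "title": "Second", "author": {"id": "u1", "name": "Ann"}},
    {"_id": "post3", "title": "Third", "author": {"id": "u2", "name": "Bob"}},
]


def names_for(posts, author_id):
    """All distinct names currently stored for one author."""
    return {p["author"]["name"] for p in posts if p["author"]["id"] == author_id}


def rename_author(posts, author_id, new_name):
    """Rewrite every embedded copy; return how many documents were touched."""
    rewritten = 0
    for post in posts:
        if post["author"]["id"] == author_id:
            post["author"]["name"] = new_name
            rewritten += 1
    return rewritten


# Simulate a rename that has only reached the first document so far.
posts[0]["author"]["name"] = "Anna"
print(names_for(posts, "u1"))               # two different names for one author

print(rename_author(posts, "u1", "Anna"))   # number of documents rewritten
print(names_for(posts, "u1"))               # consistent again
```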
Materialized Views Rule
Cassandra, Cloudant: “JOINs” via materialized views
Review: Cassandra
‣ Highly Available
‣ CQL eases pain of de-normalization
‣ 1-many, many-many relationships via inserts into multiple column families at update
‣ Eventual consistency as those updates propagate
‣ Can appeal to Paxos API with latency, availability hit
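The "inserts into multiple column families at update" pattern can be illustrated without a cluster. In this sketch two dictionaries stand in for two column families, one per query direction of a many-many relationship; the table names and functions are hypothetical and no Cassandra API is used. In a real deployment the two writes would propagate independently, which is where the eventual consistency in the list above comes from.

```python
from collections import defaultdict

# One "column family" per query, both hypothetical stand-ins.
groups_by_user = defaultdict(set)   # user_id  -> group ids
users_by_group = defaultdict(set)   # group_id -> user ids


def add_membership(user_id, group_id):
    """Record a many-many relationship by writing to both tables."""
    groups_by_user[user_id].add(group_id)
    users_by_group[group_id].add(user_id)


def remove_membership(user_id, group_id):
    """Undo the relationship in both tables."""
    groups_by_user[user_id].discard(group_id)
    users_by_group[group_id].discard(user_id)


add_membership("ann", "admins")
add_membership("ann", "writers")
add_membership("bob", "writers")

print(sorted(groups_by_user["ann"]))     # groups for one user, no join needed
print(sorted(users_by_group["writers"])) # users for one group, no join needed
```

Each read is a single-key lookup; the cost is that every update must keep both tables in step.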
Review: Cloudant
‣ Highly Available
‣ Normalize document structure, include foreign keys to other documents.
‣ Manage foreign key integrity yourself
‣ 1-many, many-many relationships via materialized views
‣ Eventual consistency between primary-index and (batch updated) materialized view
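The gap between a primary index and a batch-updated materialized view can be shown with a toy store. This is a sketch under stated assumptions, not Cloudant's implementation: the `Store` class, its `build_view` batch step, and the document fields (`type`, `author_id`) are all hypothetical. Documents are normalized and carry a foreign key; the view is what makes the 1-many lookup cheap, and it lags the primary index until the next build.

```python
class Store:
    """Toy document store with a primary index and one batch-updated view."""

    def __init__(self):
        self.docs = {}      # primary index: _id -> document
        self.view = {}      # materialized view: author_id -> list of post _ids
        self.pending = []   # _ids written since the last view build

    def put(self, doc):
        # The primary index is updated immediately; the view is not.
        self.docs[doc["_id"]] = doc
        self.pending.append(doc["_id"])

    def build_view(self):
        # Batch step: index every post written since the last build.
        # Simplification: deletes and changes of author_id are not handled.
        for doc_id in self.pending:
            doc = self.docs[doc_id]
            if doc.get("type") == "post":
                ids = self.view.setdefault(doc["author_id"], [])
                if doc_id not in ids:
                    ids.append(doc_id)
        self.pending.clear()

    def posts_by(self, author_id):
        # The "JOIN": look up the foreign key in the view, then fetch documents.
        return [self.docs[i] for i in self.view.get(author_id, [])]


store = Store()
store.put({"_id": "u1", "type": "author", "name": "Ann"})
store.put({"_id": "p1", "type": "post", "author_id": "u1", "title": "First"})

print(len(store.posts_by("u1")))  # view not built yet
store.build_view()
print(len(store.posts_by("u1")))  # view has caught up
```

Nothing in the store checks that `author_id` points at an existing document, which mirrors the "manage foreign key integrity yourself" point.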
Review: MongoDB
‣ Understand when MongoDB locks
‣ Go as far as you can with “fat”, de-normalized documents
‣ Beware the consistency subtleties of replica sets, de-normalization
Review: Riak
‣ Highly Available
‣ Include foreign keys to other documents.
‣ Manage foreign key integrity yourself
‣ one-way (“graphy”) relationships via link-walking API
‣ Can appeal to Paxos API with latency, availability hit
My Final $0.02
‣ Time to market should be your #1 concern.
‣ You will probably run both SQL and NoSQL.
‣ We’ve focused on the database, but all new apps need a mobile strategy.
‣ You’ll never engineer a perfect network
  • Focus on Availability and Partition Tolerance
‣ You will need to become advanced/expert in data modeling for your choice of DB
cloudant.com
@mlmilleratmit
#Cloudant
Thanks!
IRC