what every developer should know about database scalability
Post on 17-Oct-2014
36.307 views
DESCRIPTION
Replication. Partitioning. Relational databases. Bigtable. Dynamo. There is no one-size-fits-all approach to scaling your database, and the CAP theorem proved that there never will be. This talk will explain the advantages and limits of the approaches to scaling traditional relational databases, as well as the tradeoffs made by the designers of newer distributed systems like Cassandra. These slides are from Jonathan Ellis's OSCON 09 talk: http://en.oreilly.com/oscon2009/public/schedule/detail/7955TRANSCRIPT
![Page 1: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/1.jpg)
Database scalability
Jonathan Ellis
![Page 2: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/2.jpg)
Classic RDBMS persistence
Data
Index
![Page 3: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/3.jpg)
Disk is the new tape*
• ~8ms to seek– ~4ms on expensive 15k rpm disks
![Page 4: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/4.jpg)
![Page 5: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/5.jpg)
What scaling means
money
perf
orm
ance
![Page 6: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/6.jpg)
Performance
• Latency
• Throughput
![Page 7: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/7.jpg)
Two kinds of operations
• Reads
• Writes
![Page 8: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/8.jpg)
Caching
• Memcached
• Ehcache
• etc
DB
cache
![Page 9: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/9.jpg)
Cache invalidation
• Implicit
• Explicit
![Page 10: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/10.jpg)
Cache set invalidation
get_cached_cart(cart=13, offset=10, limit=10)
get('cart:13:10:10')
?
![Page 11: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/11.jpg)
Set invalidation 2
prefix = get('cart_prefix:13')
get(prefix + ':10:10')
del('cart_prefix:13')
http://www.aminus.org/blogs/index.php/2007/12/30/memcached_set_invalidation?blog=2
![Page 12: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/12.jpg)
Replication
![Page 13: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/13.jpg)
Types of replication
• Master → slave– Master → slave → other slaves
• Master ↔ master– multi-master
![Page 14: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/14.jpg)
Types of replication 2
• Synchronous
• Asynchronous
![Page 15: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/15.jpg)
Synchronous
• Synchronous = slow(er)
• Complexity (e.g. 2pc)
• PGCluster
• Oracle
![Page 16: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/16.jpg)
Asynchronous master/slave
• Easiest
• Failover
• MySQL replication
• Slony, Londiste, WAL shipping
• Tungsten
![Page 17: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/17.jpg)
Asynchronous multi-master
• Conflict resolution
– O(N3) or O(N2) as you add nodes
– http://research.microsoft.com/~gray/replicas.ps
• Bucardo
• MySQL Cluster
![Page 18: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/18.jpg)
Achtung!
• Asynchronous replication can lose data if the master fails
![Page 19: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/19.jpg)
“Architecture”
• Primarily about how you cope with failure scenarios
![Page 20: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/20.jpg)
Replication does not scale writes
![Page 21: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/21.jpg)
Scaling writes
• Partitioning aka sharding– Key / horizontal
– Vertical
– Directed
![Page 22: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/22.jpg)
Partitioning
![Page 23: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/23.jpg)
Key based partitioning
• PK of “root” table controls destination– e.g. user id
• Retains referential integrity
![Page 24: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/24.jpg)
Example: blogger.com
Users
Blogs
Comments
![Page 25: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/25.jpg)
Example: blogger.com
Users
Blogs
Comments
Comments'
![Page 26: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/26.jpg)
Vertical partitioning
• Tables on separate nodes
• Often a table that is too big to keep with the other tables, gets too big for a single node
![Page 27: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/27.jpg)
Growing is hard
![Page 28: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/28.jpg)
Directed partitioning
• Central db that knows what server owns a key
• Makes adding machines easier
• Single point of failure
![Page 29: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/29.jpg)
Partitioning
![Page 30: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/30.jpg)
Partitioning with replication
![Page 31: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/31.jpg)
What these have in common
• Ad hoc
• Error-prone
• Manpower-intensive
![Page 32: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/32.jpg)
To summarize
• Scaling reads sucks
• Scaling writes sucks more
![Page 33: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/33.jpg)
Distributed* databases
• Data is automatically partitioned
• Transparent to application
• Add capacity without downtime
• Failure tolerant
*Like Bigtable, not Lotus Notes
![Page 34: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/34.jpg)
Two famous papers
• Bigtable: A distributed storage system for structured data, 2006
• Dynamo: amazon's highly available key-value store, 2007
![Page 35: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/35.jpg)
The world doesn't need another half-assed key/value store
(See also Olin Shivers' 100% and 80% solutions)
![Page 36: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/36.jpg)
Two approaches
• Bigtable: “How can we build a distributed database on top of GFS?”
• Dynamo: “How can we build a distributed hash table appropriate for the data center?”
![Page 37: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/37.jpg)
Bigtable architecture
![Page 38: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/38.jpg)
![Page 39: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/39.jpg)
Lookup in Bigtable
![Page 40: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/40.jpg)
Dynamo
![Page 41: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/41.jpg)
Eventually consistent
• Amazon: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
• eBay: http://queue.acm.org/detail.cfm?id=1394128
![Page 42: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/42.jpg)
Consistency in a BASE world
• If W + R > N, you are 100% consistent
• W=1, R=N
• W=N, R=1
• W=Q, R=Q where Q = N / 2 + 1
![Page 43: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/43.jpg)
Cassandra
![Page 44: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/44.jpg)
Memtable / SSTable
Commit log
Disk
![Page 45: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/45.jpg)
ColumnFamilies
keyA column1 column2 column3
keyC column1 column7 column11
Column
Byte[] Name
Byte[] Value
I64 timestamp
![Page 46: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/46.jpg)
LSM write properties
• No reads
• No seeks
• Fast
• Atomic within ColumnFamily
![Page 47: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/47.jpg)
vs MySQL with 50GB of data
• MySQL– ~300ms write
– ~350ms read
• Cassandra– ~0.12ms write
– ~15ms read
• Achtung!
![Page 48: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/48.jpg)
Classic RDBMS persistence
Data
Index
![Page 49: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/49.jpg)
Questions