![Page 1: What Every Developer Should Know About Database Scalability](https://reader034.vdocument.in/reader034/viewer/2022042513/5441e357afaf9f5e208b4801/html5/thumbnails/1.jpg)
Database scalability
Jonathan Ellis
Classic RDBMS persistence
(Diagram: Data, Index)
Disk is the new tape*
• ~8ms to seek
– ~4ms on expensive 15k rpm disks
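A back-of-envelope sketch of what those seek times imply: if every random read costs one full seek, a single disk can serve at most about 1000 / seek_ms random reads per second. This ignores rotational latency, OS and controller caching, and request queueing, so treat the results as rough upper bounds derived only from the numbers on the slide.

```python
def max_random_reads_per_second(seek_ms: float) -> float:
    """Upper bound on random reads per second if every read costs one seek."""
    return 1000.0 / seek_ms

# Seek times quoted on the slide above.
for label, seek_ms in [("typical disk", 8.0), ("15k rpm disk", 4.0)]:
    print(f"{label}: ~{max_random_reads_per_second(seek_ms):.0f} random reads/sec")
```

That is roughly 125 and 250 random reads per second per disk, which is the sense in which a seek-bound workload treats disk like tape.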
What scaling means
(Chart: performance vs. money)
Performance
• Latency
• Throughput
Two kinds of operations
• Reads
• Writes
Caching
• Memcached
• Ehcache
• etc.
(Diagram: DB, cache)
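A common way to put a cache like Memcached or Ehcache in front of the DB is the cache-aside pattern: check the cache, fall back to the DB on a miss, then populate the cache. In this minimal sketch plain dicts are hypothetical stand-ins for the cache server and the database; it does not use the real Memcached or Ehcache client APIs.

```python
from typing import Dict, Optional

# Hypothetical stand-ins: a real system would use a cache client and a DB driver.
cache: Dict[str, str] = {}
database: Dict[str, str] = {"user:1": "alice", "user:2": "bob"}
db_reads = 0

def load_from_db(key: str) -> Optional[str]:
    global db_reads
    db_reads += 1  # count round trips so the effect of caching is visible
    return database.get(key)

def get_with_cache(key: str) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    value = cache.get(key)
    if value is not None:
        return value           # cache hit: no DB round trip
    value = load_from_db(key)  # cache miss: go to the DB
    if value is not None:
        cache[key] = value     # populate so the next read is a hit
    return value

get_with_cache("user:1")  # miss: reads the DB
get_with_cache("user:1")  # hit: served from the cache
print(db_reads)  # 1
```

The catch, which the next slides take up, is that every write to the DB now leaves a possibly stale copy in the cache.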
Cache invalidation
• Implicit
• Explicit
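One plausible reading of these two bullets, assumed here rather than taken from the talk: implicit invalidation lets entries expire on their own after a time-to-live, while explicit invalidation deletes an entry from the code path that changed the underlying data. The `TTLCache` class below is a hypothetical toy that supports both; the injectable clock only exists to make the example deterministic.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

class TTLCache:
    """Toy cache with implicit (TTL expiry) and explicit (delete) invalidation."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # implicit: the entry silently expires
            return None
        return value

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)  # explicit: called by the writer

now = [0.0]  # fake clock
c = TTLCache(ttl_seconds=60, clock=lambda: now[0])
c.set("cart:13", ["apple"])
now[0] = 61.0
print(c.get("cart:13"))  # None: expired implicitly
c.set("cart:13", ["pear"])
c.delete("cart:13")
print(c.get("cart:13"))  # None: removed explicitly
```

Expiry bounds how stale a value can get without any coordination; explicit deletes keep the cache fresher but require every writer to know which keys to delete.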
Cache set invalidation
get_cached_cart(cart=13, offset=10, limit=10)
get('cart:13:10:10')
? (How do you invalidate every cached page of cart 13 at once?)
Set invalidation 2
prefix = get('cart_prefix:13')
get(prefix + ':10:10')
del('cart_prefix:13')
http://www.aminus.org/blogs/index.php/2007/12/30/memcached_set_invalidation?blog=2
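The prefix trick above, as a runnable sketch with a dict standing in for Memcached. The step the slide leaves implicit is what happens after `del('cart_prefix:13')`: the next reader finds no prefix and mints a fresh one, so every old page key for cart 13 becomes unreachable and simply ages out of the cache. Using `uuid4` for the fresh prefix and the helper names `cart_prefix`, `invalidate_cart`, and `load_page` are assumptions of this sketch.

```python
import uuid
from typing import Any, Callable, Dict, List

memcache: Dict[str, Any] = {}  # stand-in for a Memcached client

def cart_prefix(cart_id: int) -> str:
    """Current key prefix for a cart; a fresh one is minted after invalidation."""
    prefix_key = f"cart_prefix:{cart_id}"
    prefix = memcache.get(prefix_key)
    if prefix is None:
        prefix = f"cart:{cart_id}:{uuid.uuid4().hex}"
        memcache[prefix_key] = prefix
    return prefix

def get_cached_cart(cart_id: int, offset: int, limit: int,
                    load: Callable[[int, int, int], List[str]]) -> List[str]:
    key = f"{cart_prefix(cart_id)}:{offset}:{limit}"
    page = memcache.get(key)
    if page is None:
        page = load(cart_id, offset, limit)  # miss: go to the DB
        memcache[key] = page
    return page

def invalidate_cart(cart_id: int) -> None:
    # One delete orphans every cached page of this cart at once.
    memcache.pop(f"cart_prefix:{cart_id}", None)

items = [f"item{i}" for i in range(30)]
db_calls = []

def load_page(cart_id: int, offset: int, limit: int) -> List[str]:
    db_calls.append((cart_id, offset, limit))
    return items[offset:offset + limit]

get_cached_cart(13, 10, 10, load_page)  # miss
get_cached_cart(13, 10, 10, load_page)  # hit
invalidate_cart(13)
get_cached_cart(13, 10, 10, load_page)  # miss again under the new prefix
print(len(db_calls))  # 2
```

The cost is one extra cache lookup per read (the prefix) in exchange for invalidating an unbounded set of keys with a single delete.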
Replication
Types of replication
• Master → slave
– Master → slave → other slaves
• Master ↔ master
– multi-master
Types of replication 2
• Synchronous
• Asynchronous
Synchronous
• Synchronous = slow(er)
• Complexity (e.g. two-phase commit, 2PC)
• PGCluster
• Oracle
Asynchronous master/slave
• Easiest
• Failover
• MySQL replication
• Slony, Londiste, WAL shipping
• Tungsten
Asynchronous multi-master
• Conflict resolution
– O(N³) or O(N²) as you add nodes
– http://research.microsoft.com/~gray/replicas.ps
• Bucardo
• MySQL Cluster
Achtung!
• Asynchronous replication can lose data if the master fails
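A toy model of why: with asynchronous replication the master acknowledges a write before the slave has it, so anything still queued for replication when the master dies is gone, even though the client was told it succeeded. The `Slave` and `AsyncMaster` classes are hypothetical and ignore real mechanisms such as binlogs or WAL segments.

```python
from collections import deque
from typing import Deque, Dict, Tuple

class Slave:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.data[key] = value

class AsyncMaster:
    """Acknowledges writes immediately and ships them to the slave later."""

    def __init__(self, slave: Slave) -> None:
        self.data: Dict[str, str] = {}
        self.slave = slave
        self.pending: Deque[Tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> str:
        self.data[key] = value
        self.pending.append((key, value))  # replicated later, not now
        return "OK"                        # client believes the write is safe

    def ship_one(self) -> None:
        if self.pending:
            self.slave.apply(*self.pending.popleft())

slave = Slave()
master = AsyncMaster(slave)
master.write("order:1", "paid")
master.ship_one()                # order:1 reaches the slave
master.write("order:2", "paid")  # acknowledged but not yet shipped
# The master crashes here and the slave is promoted.
print(sorted(slave.data))  # ['order:1']
```

A synchronous scheme would not return "OK" until the slave had applied the write, which is exactly the extra latency and complexity listed on the Synchronous slide.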
“Architecture”
• Primarily about how you cope with failure scenarios
Replication does not scale writes
Scaling writes
• Partitioning aka sharding
– Key / horizontal
– Vertical
– Directed
Partitioning
Key based partitioning
• PK of “root” table controls destination
– e.g. user id
• Retains referential integrity
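A minimal sketch of key-based routing, assuming a fixed number of shards and that every child row carries the id of the user that owns it. Hashing that root key picks the shard, so a user's blogs and comments land on the same node and joins and foreign keys stay local. `NUM_SHARDS`, the shard names, and CRC32 as the hash are illustrative choices, not from the talk.

```python
import zlib

NUM_SHARDS = 4
SHARDS = [f"db{i}" for i in range(NUM_SHARDS)]  # hypothetical shard names

def shard_for_user(user_id: int) -> str:
    """Route by the root table's primary key (the user id)."""
    # crc32 is stable across processes, unlike Python's randomized hash() for str.
    return SHARDS[zlib.crc32(str(user_id).encode()) % NUM_SHARDS]

# Child rows are routed by their owning user's id, not by their own PK.
user = {"id": 42}
blog = {"id": 9001, "user_id": 42}
comment = {"id": 77, "blog_id": 9001, "user_id": 42}
print(shard_for_user(user["id"]), shard_for_user(blog["user_id"]),
      shard_for_user(comment["user_id"]))
```

The rule only holds while every row has a single owning root; a comment written by one user on another user's blog is where this starts to strain.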
Example: blogger.com
(Diagram: Users, Blogs, Comments)
Example: blogger.com
(Diagram: Users, Blogs, Comments, Comments')
Vertical partitioning
• Tables on separate nodes
• Often a table that grew too big to share a node with the other tables eventually grows too big for a single node on its own
Growing is hard
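One reason growing is hard with simple key-based partitioning: if the shard is chosen as `key % num_shards`, adding a node changes the answer for most keys, and every key whose shard changed must be copied to its new home. A quick count for going from 4 to 5 shards (the key range and shard counts are arbitrary):

```python
def moved_fraction(num_keys: int, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose modulo-assigned shard changes after resizing."""
    moved = sum(1 for k in range(num_keys) if k % old_shards != k % new_shards)
    return moved / num_keys

print(moved_fraction(10_000, 4, 5))  # 0.8
```

Adding 25% more capacity forces about 80% of the data to move, which is why the directed and distributed approaches that follow try to make placement independent of the node count.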
Directed partitioning
• Central DB that knows which server owns each key
• Makes adding machines easier
• Single point of failure
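Directed partitioning replaces the hash function with a lookup. In this sketch a dict plays the role of the central directory DB, and the `Directory` class and its methods are hypothetical. Adding a machine means moving some keys and updating their entries, rather than rehashing everything; the flip side is that every query depends on the directory being up.

```python
from typing import Dict

class Directory:
    """Central map from key to owning server (a stand-in for the directory DB)."""

    def __init__(self) -> None:
        self.owner: Dict[int, str] = {}

    def assign(self, user_id: int, server: str) -> None:
        # To move a key: copy its rows to `server` first, then update the pointer.
        self.owner[user_id] = server

    def server_for(self, user_id: int) -> str:
        return self.owner[user_id]  # every query depends on this lookup

directory = Directory()
for uid in range(1, 5):
    directory.assign(uid, "db0")
directory.assign(3, "db1")  # new machine added: move one busy user onto it
print([directory.server_for(u) for u in range(1, 5)])  # ['db0', 'db0', 'db1', 'db0']
```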
Partitioning
Partitioning with replication
What these have in common
• Ad hoc
• Error-prone
• Manpower-intensive
To summarize
• Scaling reads sucks
• Scaling writes sucks more
Distributed* databases
• Data is automatically partitioned
• Transparent to application
• Add capacity without downtime
• Failure tolerant
*Like Bigtable, not Lotus Notes
Two famous papers
• Bigtable: A distributed storage system for structured data, 2006
• Dynamo: Amazon's highly available key-value store, 2007
The world doesn't need another half-assed key/value store
(See also Olin Shivers' 100% and 80% solutions)
Two approaches
• Bigtable: “How can we build a distributed database on top of GFS?”
• Dynamo: “How can we build a distributed hash table appropriate for the data center?”
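Dynamo's answer to "which node owns this key?" is consistent hashing: nodes and keys are hashed onto the same ring, and a key belongs to the first node clockwise from it, so a new node only takes over keys from one neighbour. This is a minimal sketch using MD5 as the ring hash, with no virtual nodes and no replication (Dynamo itself uses both); the node names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes: List[str]) -> None:
        self._ring: List[Tuple[int, str]] = sorted((ring_hash(n), n) for n in nodes)

    def add(self, node: str) -> None:
        bisect.insort(self._ring, (ring_hash(node), node))

    def owner(self, key: str) -> str:
        """First node clockwise from the key's position, wrapping around."""
        i = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return self._ring[i % len(self._ring)][1]

ring = HashRing(["nodeA", "nodeB", "nodeC"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}
ring.add("nodeD")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Every key that moved went to the new node; nothing shuffled between old nodes.
print(all(after[k] == "nodeD" for k in moved), f"{len(moved)} of {len(keys)} keys moved")
```

Compare this with the modulo scheme, where adding one node reassigns most keys among all nodes.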
Bigtable architecture
Lookup in Bigtable
Dynamo
Eventually consistent
• Amazon: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
• eBay: http://queue.acm.org/detail.cfm?id=1394128
Consistency in a BASE world
• If W + R > N, every read set overlaps every write set: you are 100% consistent
• W=1, R=N
• W=N, R=1
• W=Q, R=Q where Q = ⌊N/2⌋ + 1
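Why W + R > N works: any W replicas and any R replicas out of N must share at least W + R − N of them, so when W + R > N every read contacts at least one replica that took the latest write. This brute-force check enumerates every possible write set and read set for a small N; it is a sketch of the counting argument only and ignores node failures, sloppy quorums, and concurrent writes.

```python
from itertools import combinations

def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every w-subset and every r-subset of n replicas intersect."""
    replicas = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )

N = 5
Q = N // 2 + 1  # integer division: Q = 3 for N = 5
for w, r in [(1, N), (N, 1), (Q, Q), (2, 2)]:
    print(f"W={w} R={r} W+R>N={w + r > N} overlap={quorums_always_overlap(N, w, r)}")
```

The last case (W=2, R=2, N=5) is the counterexample: two disjoint pairs of replicas exist, so a read can miss the latest write entirely.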
Cassandra
Memtable / SSTable
(Diagram: Commit log, Disk)
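A toy version of the write path in that diagram: each write is appended to the commit log (sequential I/O) and applied to the in-memory memtable; when the memtable is full it is written out, sorted, as an immutable SSTable. In this toy, reads check the memtable and then SSTables from newest to oldest. The flush threshold, the list standing in for the log file, and the class name `ToyLSM` are assumptions; real Cassandra also has compaction, bloom filters, and indexes, all omitted here.

```python
import bisect
from typing import Dict, List, Optional, Tuple

class ToyLSM:
    """Memtable + commit log + immutable sorted SSTables (no compaction)."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.memtable_limit = memtable_limit
        self.commit_log: List[Tuple[str, str]] = []      # stand-in for an append-only file
        self.memtable: Dict[str, str] = {}
        self.sstables: List[List[Tuple[str, str]]] = []  # oldest first, each sorted by key

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # sequential append for durability
        self.memtable[key] = value            # no read of existing data, no seek
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need replay

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

db = ToyLSM(memtable_limit=2)
db.write("a", "1")
db.write("b", "2")  # memtable full: flushed to the first SSTable
db.write("a", "3")  # newer value lives in the memtable
print(db.read("a"), db.read("b"), len(db.sstables))  # 3 2 1
```

Writes never touch existing on-disk data, which is where the "no reads, no seeks" properties come from; the price is paid at read time, when a key may have to be looked up in several places.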
ColumnFamilies
keyA: column1, column2, column3
keyC: column1, column7, column11
Column
• byte[] name
• byte[] value
• i64 timestamp
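Each column carries its own timestamp, and when two versions of the same column meet (for example from two replicas or two SSTables) Cassandra keeps the one with the higher timestamp. This sketch applies that last-write-wins rule with a dataclass mirroring the name/value/timestamp fields above; `reconcile` and `merge_rows` are hypothetical helpers, and equal timestamps simply keep the first argument here rather than following Cassandra's actual tie-break.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Column:
    name: bytes
    value: bytes
    timestamp: int  # i64, by convention microseconds since the epoch

def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: keep the version with the higher timestamp."""
    return b if b.timestamp > a.timestamp else a

def merge_rows(row_a: Dict[bytes, Column], row_b: Dict[bytes, Column]) -> Dict[bytes, Column]:
    merged = dict(row_a)
    for name, col in row_b.items():
        merged[name] = reconcile(merged[name], col) if name in merged else col
    return merged

# Two replicas' copies of the same sparse row, each holding different columns.
replica1 = {b"column1": Column(b"column1", b"old", 100), b"column2": Column(b"column2", b"x", 100)}
replica2 = {b"column1": Column(b"column1", b"new", 200), b"column7": Column(b"column7", b"y", 150)}
merged = merge_rows(replica1, replica2)
print({n.decode(): c.value.decode() for n, c in merged.items()})
# {'column1': 'new', 'column2': 'x', 'column7': 'y'}
```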
LSM write properties
• No reads
• No seeks
• Fast
• Atomic within ColumnFamily
vs MySQL with 50GB of data
• MySQL
– ~300ms write
– ~350ms read
• Cassandra
– ~0.12ms write
– ~15ms read
• Achtung!
Classic RDBMS persistence
(Diagram: Data, Index)
Questions