Scaling Django Web Apps
Mike Malone (http://immike.net/blog/about/)
djangocon 2009
Thursday, September 10, 2009
Photo credit: http://www.flickr.com/photos/kveton/2910536252/
Pownce
• Large scale
• Hundreds of requests/sec
• Thousands of DB operations/sec
• Millions of user relationships
• Millions of notes
• Terabytes of static data
Pownce
• Encountered and eliminated many common scaling bottlenecks
• Real world example of scaling a Django app
• Django provides a lot for free
• I’ll be focusing on what you have to build yourself, and the rare places where Django got in the way
Scalability
Scalability
Scalability is NOT:
• Speed / Performance
• Generally affected by language choice
• Achieved by adopting a particular technology
A Scalable Application
    import time

    def application(environ, start_response):
        time.sleep(10)
        start_response('200 OK', [('content-type', 'text/plain')])
        return ('Hello, world!',)
A High Performance Application
    def application(environ, start_response):
        remote_addr = environ['REMOTE_ADDR']
        f = open('access-log', 'a+')
        f.write(remote_addr + "\n")
        f.flush()
        f.seek(0)
        hits = sum(1 for l in f.xreadlines()
                   if l.strip() == remote_addr)
        f.close()
        start_response('200 OK', [('content-type', 'text/plain')])
        return (str(hits),)
Scalability
A scalable system doesn’t need to change when the size of the problem changes.
Scalability
• Accommodate increased usage
• Accommodate increased data
• Maintainable
Scalability
• Two kinds of scalability
• Vertical scalability: buying more powerful hardware, replacing what you already own
• Horizontal scalability: buying additional hardware, supplementing what you already own
Vertical Scalability
• Costs don’t scale linearly (a server that’s twice as fast costs more than twice as much)
• Inherently limited by current technology
• But it’s easy! If you can get away with it, good for you.
Vertical Scalability
“Skyscrapers are special. Normal buildings don’t need 10 floor foundations. Just build!”
- Cal Henderson
Horizontal Scalability
The ability to increase a system’s capacity by adding more processing units (servers)
Horizontal Scalability
It’s how large apps are scaled.
Horizontal Scalability
• A lot more work to design, build, and maintain
• Requires some planning, but you don’t have to do all the work up front
• You can scale progressively...
• Rest of the presentation is roughly in order
Caching
Caching
• Several levels of caching available in Django
• Per-site cache: caches every page that doesn’t have GET or POST parameters
• Per-view cache: caches output of an individual view
• Template fragment cache: caches fragments of a template
• None of these are that useful if pages are heavily personalized
Caching
• Low-level Cache API
• Much more flexible, allows you to cache at any granularity
• At Pownce we typically cached
• Individual objects
• Lists of object IDs
• Hard part is invalidation
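The object-plus-ID-list pattern above can be sketched roughly as follows. This is a minimal illustration, not Pownce's code: a plain dict stands in for memcached, the `NOTES` dict stands in for the database, and the function names and key formats are hypothetical.

```python
# Toy stand-ins: one dict plays memcached, another plays the database.
cache = {}
NOTES = {
    1: {"id": 1, "author": 7, "body": "hello"},
    2: {"id": 2, "author": 7, "body": "world"},
    3: {"id": 3, "author": 9, "body": "other"},
}
db_queries = []  # records every simulated database hit


def get_note(note_id):
    """Fetch a single object, checking the cache first."""
    key = "note:%d" % note_id
    note = cache.get(key)
    if note is None:
        db_queries.append(("note", note_id))
        note = NOTES[note_id]
        cache[key] = note
    return note


def get_notes_for_user(user_id):
    """Cache only the list of IDs; each object is cached individually."""
    key = "note_ids_for:%d" % user_id
    ids = cache.get(key)
    if ids is None:
        db_queries.append(("ids", user_id))
        ids = sorted(n["id"] for n in NOTES.values() if n["author"] == user_id)
        cache[key] = ids
    return [get_note(i) for i in ids]


first = get_notes_for_user(7)   # cold: one ID query plus two object queries
second = get_notes_for_user(7)  # warm: served entirely from the cache
```

Because a list holds only IDs, editing a note means invalidating one small object key rather than every cached list that contains it.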
Caching
• Cache backends:
• Memcached
• Database caching
• Filesystem caching
Caching
Use Memcache.
Sessions
Use Memcache.
Sessions
Or Tokyo Cabinet: http://github.com/ericflo/django-tokyo-sessions/
Thanks @ericflo
Caching
Basic caching comes free with Django:
    from django.core.cache import cache
    from django.db import models

    class UserProfile(models.Model):
        ...
        def get_social_network_profiles(self):
            cache_key = 'networks_for_%s' % self.user.id
            profiles = cache.get(cache_key)
            if profiles is None:
                profiles = self.user.social_network_profiles.all()
                cache.set(cache_key, profiles)
            return profiles
Caching
Invalidate when a model is saved or deleted:
    from django.core.cache import cache
    from django.db.models import signals

    def nuke_social_network_cache(sender, instance, **kwargs):
        # Signal handlers receive the sender class and the affected instance.
        cache_key = 'networks_for_%s' % instance.user_id
        cache.delete(cache_key)

    signals.post_save.connect(nuke_social_network_cache, sender=SocialNetworkProfile)
    signals.post_delete.connect(nuke_social_network_cache, sender=SocialNetworkProfile)
Caching
• Invalidate post_save, not pre_save
• Still a small race condition
• Simple solution, worked for Pownce:
• Instead of deleting, set the cache key to None for a short period of time
• Instead of using set to cache objects, use add, which fails if there’s already something stored for the key
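A rough sketch of the two tricks above, assuming an in-memory toy cache in place of memcached. The `ToyCache` class, its injectable clock, and the 5-second marker lifetime are illustrative assumptions; only the `set`/`add` semantics mirror what the slide describes.

```python
import time


class ToyCache(object):
    """In-memory stand-in for memcached with get/set/add and expiry."""

    def __init__(self, clock=time.time):
        self._data = {}  # key -> (value, expiry timestamp or None)
        self._clock = clock

    def _live(self, key):
        entry = self._data.get(key)
        if entry is None:
            return False
        expires = entry[1]
        if expires is not None and expires <= self._clock():
            del self._data[key]
            return False
        return True

    def get(self, key, default=None):
        return self._data[key][0] if self._live(key) else default

    def set(self, key, value, timeout=None):
        expires = None if timeout is None else self._clock() + timeout
        self._data[key] = (value, expires)

    def add(self, key, value, timeout=None):
        """Store only if the key is absent; report whether it was stored."""
        if self._live(key):
            return False
        self.set(key, value, timeout)
        return True


now = [0.0]  # fake clock so the example needs no sleeping
cache = ToyCache(clock=lambda: now[0])

# Invalidation: instead of deleting, park a None marker for a few seconds.
cache.set("networks_for_1", None, timeout=5)

# A reader that loaded stale rows just before the write tries to cache them.
stale_write_stored = cache.add("networks_for_1", ["stale"])

now[0] = 6.0  # the marker has expired
fresh_write_stored = cache.add("networks_for_1", ["fresh"])
```

While the marker is live, `get` still returns `None`, so readers treat it as a miss and go to the database, but their `add` cannot overwrite the marker with stale data.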
Advanced Caching
• Memcached’s atomic increment and decrement operations are useful for maintaining counts
• They were added to the Django cache API in Django 1.1
Advanced Caching
• On earlier versions you can still use them if you poke at the internals of the cache object a bit
• cache._cache is the underlying cache object
    def incr_count(cache_key, delta=1):  # wrapper name is illustrative
        try:
            result = cache._cache.incr(cache_key, delta)
        except ValueError:  # nonexistent key raises ValueError
            # Do it the hard way, store the result.
            result = compute_count()  # hypothetical helper, e.g. a COUNT(*) query
            cache.set(cache_key, result)
        return result
Advanced Caching
• Other missing cache API
• delete_multi & set_multi
• append: add data to existing key after existing data
• prepend: add data to existing key before existing data
• cas: store this data, but only if no one has edited it since I fetched it
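To make the cas bullet concrete, here is a small sketch of a compare-and-swap update loop. The `ToyCasCache` class is a hypothetical dict-backed stand-in; its `gets`/`cas` method names follow the memcached protocol commands, but the class itself is not a real client library.

```python
class ToyCasCache(object):
    """Dict-backed stand-in that versions every key, like memcached's gets/cas."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def gets(self, key):
        """Return (value, version token); (None, 0) when the key is absent."""
        return self._data.get(key, (None, 0))

    def cas(self, key, value, token):
        """Store value only if the key's version still matches token."""
        current = self._data.get(key, (None, 0))[1]
        if current != token:
            return False
        self._data[key] = (value, current + 1)
        return True


def append_id(cache, key, new_id, retries=5):
    """Read-modify-write a cached ID list without losing concurrent edits."""
    for _ in range(retries):
        ids, token = cache.gets(key)
        updated = (ids or []) + [new_id]
        if cache.cas(key, updated, token):
            return True
    return False


cache = ToyCasCache()
append_id(cache, "friends_of_1", 10)

# Simulate two clients that both read the same version before either writes.
ids_a, token_a = cache.gets("friends_of_1")
ids_b, token_b = cache.gets("friends_of_1")
a_ok = cache.cas("friends_of_1", ids_a + [11], token_a)  # first writer wins
b_ok = cache.cas("friends_of_1", ids_b + [12], token_b)  # stale token, rejected
```

The losing client would normally re-read and retry, which is what `append_id` does.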
Advanced Caching
• It’s often useful to cache objects ‘forever’ (i.e., until you explicitly invalidate them)
• User and UserProfile
• fetched almost every request
• rarely change
• But Django won’t let you
• IMO, this is a bug :(
The Memcache Backend
    import memcache
    from django.core.cache.backends.base import BaseCache
    from django.utils.encoding import smart_str

    class CacheClass(BaseCache):
        def __init__(self, server, params):
            BaseCache.__init__(self, params)
            self._cache = memcache.Client(server.split(';'))

        def add(self, key, value, timeout=0):
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            # timeout=0 is falsy, so "cache forever" falls back to the default timeout
            return self._cache.add(smart_str(key), value, timeout or self.default_timeout)
The Memcache Backend
    class CacheClass(BaseCache):
        def __init__(self, server, params):
            BaseCache.__init__(self, params)
            self._cache = memcache.Client(server.split(';'))

        def add(self, key, value, timeout=None):
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            if timeout is None:
                timeout = self.default_timeout
            return self._cache.add(smart_str(key), value, timeout)
Advanced Caching
• Typical setup has memcached running on web servers
• Pownce web servers were I/O and memory bound, not CPU bound
• Since we had some spare CPU cycles, we compressed large objects before caching them
• The Python memcache library can do this automatically, but the API is not exposed
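As a rough illustration of threshold-based compression, the sketch below does by hand what the bullets describe, using only the standard library. The `pack`/`unpack` helpers and the one-byte flag format are assumptions made for this example; they are not how the memcache library stores values internally.

```python
import pickle
import zlib

MIN_COMPRESS_LEN = 150000  # bytes; an assumed threshold for this sketch
FLAG_PLAIN, FLAG_ZLIB = b"0", b"1"


def pack(value, min_compress_len=MIN_COMPRESS_LEN):
    """Serialize value, compressing it only if it is large enough to matter."""
    raw = pickle.dumps(value)
    if len(raw) >= min_compress_len:
        packed = zlib.compress(raw)
        if len(packed) < len(raw):  # keep the compressed form only if smaller
            return FLAG_ZLIB + packed
    return FLAG_PLAIN + raw


def unpack(blob):
    """Reverse pack(), using the one-byte flag to detect compression."""
    flag, payload = blob[:1], blob[1:]
    if flag == FLAG_ZLIB:
        payload = zlib.decompress(payload)
    return pickle.loads(payload)


small = {"id": 1}
large = "pownce note body " * 20000  # roughly 340 KB of repetitive text

small_blob = pack(small)
large_blob = pack(large)
```

Small values skip compression entirely, so the CPU cost is only paid where it saves meaningful memory and network I/O.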
Monkey Patching core.cache
    from django.core.cache import cache
    from django.utils.encoding import smart_str
    import inspect as i

    if 'min_compress_len' in i.getargspec(cache._cache.set)[0]:
        class CacheClass(cache.__class__):
            def set(self, key, value, timeout=None, min_compress_len=150000):
                if isinstance(value, unicode):
                    value = value.encode('utf-8')
                if timeout is None:
                    timeout = self.default_timeout
                return self._cache.set(smart_str(key), value, timeout, min_compress_len)
        cache.__class__ = CacheClass
Advanced Caching
• Useful tool: automagic single object cache
• Use a manager to check the cache prior to any single object get by pk
• Invalidate assets on save and delete
• Eliminated several hundred QPS at Pownce
Advanced Caching
All this and more at:
http://github.com/mmalone/django-caching/
Caching
Now you’ve made life easier for your DB server, the next thing to fall over is your app server.
Load Balancing
Load Balancing
• Out of the box, Django uses a shared nothing architecture
• App servers have no single point of contention
• Responsibility pushed down the stack (to DB)
• This makes scaling the app layer trivial: just add another server
Load Balancing
Spread work between multiple nodes in a cluster using a load balancer.
• Hardware or software
• Layer 7 or Layer 4
[Diagram: Load Balancer → App Servers → Database]
Load Balancing
• Hardware load balancers
• Expensive, like $35,000 each, plus maintenance contracts
• Need two for failover / high availability
• Software load balancers
• Cheap and easy, but more difficult to eliminate as a single point of failure
• Lots of options: Perlbal, Pound, HAProxy, Varnish, Nginx
Load Balancing
• Most of these are layer 7 proxies, and some software balancers do cool things
• Caching
• Re-proxying
• Authentication
• URL rewriting
Load Balancing
A common setup for large operations is to use redundant layer 4 hardware balancers in front of a pool of layer 7 software balancers.
[Diagram: Hardware Balancers → Software Balancers → App Servers]
Load Balancing
• At Pownce, we used a single Perlbal balancer
• Easily handled all of our traffic (hundreds of simultaneous connections)
• A SPOF, but we didn’t have $100,000 for black box solutions, and weren’t worried about service guarantees beyond three or four nines
• Plus there were some neat features that we took advantage of
Perlbal Reproxying
Perlbal reproxying is a really cool, and really poorly documented, feature.
Perlbal Reproxying
1. Perlbal receives request
2. Perlbal forwards it to an app server
3. App server checks auth (etc.)
4. App server returns HTTP 200 with X-Reproxy-URL header set to internal file server URL
5. File served from file server via Perlbal
Perlbal Reproxying
• Completely transparent to end user
• Doesn’t keep large app server instance around to serve file
• Users can’t access files directly (like they could with a 302)
Perlbal Reproxying
Plus, it’s really easy:
    from django.http import HttpResponse

    def download(request, filename):
        # Check auth, do your thing
        response = HttpResponse()
        response['X-REPROXY-URL'] = '%s/%s' % (FILE_SERVER, filename)
        return response
Load Balancing
Best way to reduce load on your app servers: don’t use them to do hard stuff.
Queuing
Queuing
• A queue is simply a bucket that holds messages until they are removed for processing by clients
• Many expensive operations can be queued and performed asynchronously
• User experience doesn’t have to suffer
• Tell the user that you’re running the job in the background (e.g., transcoding)
• Make it look like the job was done real-time (e.g., note distribution)
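The idea can be sketched with nothing but the standard library. This is an in-process toy: a real deployment would use a separate queue service and worker processes, and the function names and the "202 Accepted" response here are illustrative assumptions.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def worker():
    """Consumer: pull jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        # Stand-in for expensive work such as transcoding or note distribution.
        results.append("delivered note %d to %d followers" % job)
        jobs.task_done()


def post_note(note_id, follower_count):
    """Request handler: enqueue the slow part and respond immediately."""
    jobs.put((note_id, follower_count))
    return "202 Accepted"


consumer = threading.Thread(target=worker)
consumer.start()

response = post_note(1, 5000)
jobs.put(None)   # tell the worker to stop once the queue drains
jobs.join()      # wait for queued work to finish (only needed for this demo)
consumer.join()
```

The handler returns before any delivery work happens, which is what keeps request latency flat as the fan-out grows.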
Queuing
• Lots of open source options for queuing
• Ghetto Queue (MySQL + Cron)
• this is the official name.
• Gearman
• TheSchwartz
• RabbitMQ
• Apache ActiveMQ
• ZeroMQ
Queuing
• Lots of fancy features: brokers, exchanges, routing keys, bindings...
• Don’t let that crap get you down, this is really simple stuff
• Biggest decision: persistence
• Does your queue need to be durable and persistent, able to survive a crash?
• This requires logging to disk which slows things down, so don’t do it unless you have to
Queuing
• Pownce used a simple ghetto queue built on MySQL / cron
• Problematic if you have multiple consumers pulling jobs from the queue
• No point in reinventing the wheel, there are dozens of battle-tested open source queues to choose from
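One way to see why multiple consumers are a problem for a table-backed queue, and one common mitigation, is a conditional claim. The sketch below uses sqlite3 as a stand-in for MySQL; the table layout and the `claim_job` helper are hypothetical, since the talk does not describe Pownce's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, claimed_by TEXT)"
)
conn.executemany(
    "INSERT INTO jobs (payload) VALUES (?)", [("job-a",), ("job-b",)]
)
conn.commit()


def claim_job(conn, worker_name):
    """Claim one unclaimed job; return (id, payload) or None if none remain."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE claimed_by IS NULL "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The WHERE clause makes the claim conditional: if another consumer
        # claimed the row after our SELECT, rowcount is 0 and we retry.
        cur = conn.execute(
            "UPDATE jobs SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (worker_name, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row


first = claim_job(conn, "worker-1")
second = claim_job(conn, "worker-2")
third = claim_job(conn, "worker-3")
```

Without the `claimed_by IS NULL` guard on the UPDATE, two consumers that SELECT the same row would both process it. Purpose-built queues handle this for you.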
Django Standalone Scripts
Consumers need to set up the Django environment:
    from django.core.management import setup_environ
    from mysite import settings

    setup_environ(settings)
THE DATABASE!
The Database
• Until now we’ve been talking about
• Shared nothing
• Pushing problems down the stack
• But we have to store a persistent and consistent view of our application’s state somewhere
• Enter, the database...
CAP Theorem
• Three properties of a shared-data system
• Consistency: all clients see the same data
• Availability: all clients can see some version of the data
• Partition Tolerance: system properties hold even when the system is partitioned & messages are lost
• But you can only have two
CAP Theorem
• Big long proof... here’s my version.
• Empirically, seems to make sense.
• Eric Brewer
• Professor at University of California, Berkeley
• Co-founder and Chief Scientist of Inktomi
• Probably smarter than me
CAP Theorem
• The relational database systems we all use were built with consistency as their primary goal
• But at scale our system needs to have high availability and must be partitionable
• The RDBMS’s consistency requirements get in our way
• Most sharding / federation schemes are kludges that trade consistency for availability & partition tolerance
The Database
• There are lots of non-relational databases coming onto the scene
• CouchDB
• Cassandra
• Tokyo Cabinet
• But they’re not that mature, and they aren’t easy to use with Django
Denormalization
Denormalization
• Django encourages normalized data, which is usually good
• But at scale you need to denormalize
• Corollary: joins are evil
• Django makes it really easy to do joins using the ORM, so pay attention
Denormalization
• Start with a normalized database
• Selectively denormalize things as they become bottlenecks
• Denormalized counts, copied fields, etc. can be updated in signal handlers
Replication
Replication
• Typical web app is 80 to 90% reads
• Adding read capacity will get you a long way
• MySQL Master-Slave replication
[Diagram: the master handles reads & writes; slaves are read only]
Replication
• Django doesn’t make it easy to use multiple database connections, but it is possible
• Some caveats
• Slave lag interacts with caching in weird ways
• You can only save to your primary DB (the one you configure in settings.py)
• Unless you get really clever...
Replication
1. Create a custom database wrapper by subclassing DatabaseWrapper
    class SlaveDatabaseWrapper(DatabaseWrapper):
        def _cursor(self, settings):
            if not self._valid_connection():
                kwargs = {
                    'conv': django_conversions,
                    'charset': 'utf8',
                    'use_unicode': True,
                }
                # Merge in the chosen slave's settings instead of
                # overwriting the defaults above.
                kwargs.update(pick_random_slave(settings.SLAVE_DATABASES))
                self.connection = Database.connect(**kwargs)
                ...
            cursor = CursorWrapper(self.connection.cursor())
            return cursor
Replication
2. Custom QuerySet that uses primary DB for writes
    class MultiDBQuerySet(QuerySet):
        ...
        def update(self, **kwargs):
            slave_conn = self.query.connection
            self.query.connection = default_connection
            try:
                return super(MultiDBQuerySet, self).update(**kwargs)
            finally:
                # Restore the slave connection even if the update fails.
                self.query.connection = slave_conn
Replication
3. Custom Manager that uses your custom QuerySet
    class SlaveDatabaseManager(db.models.Manager):
        def get_query_set(self):
            return MultiDBQuerySet(self.model, query=self.create_query())

        def create_query(self):
            return db.models.sql.Query(self.model, connection)
Replication
Example on github:
http://github.com/mmalone/django-multidb/
Replication
• Goal:
• Read-what-you-write consistency for writer
• Eventual consistency for everyone else
• Slave lag screws things up
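One possible way to approximate the goal above is to pin a user to the primary for a short window after they write. The sketch below is an assumption-laden illustration, not the approach from the talk: the `ReplicaRouter` class, its method names, and the 5-second window are all hypothetical.

```python
import random
import time

PIN_SECONDS = 5  # assumed upper bound on replication lag


class ReplicaRouter(object):
    """Route reads to replicas, except for users who wrote very recently."""

    def __init__(self, primary, replicas, clock=time.time):
        self.primary = primary
        self.replicas = list(replicas)
        self._clock = clock
        self._last_write = {}  # user_id -> timestamp of latest write

    def connection_for_write(self, user_id):
        self._last_write[user_id] = self._clock()
        return self.primary

    def connection_for_read(self, user_id):
        wrote_at = self._last_write.get(user_id)
        if wrote_at is not None and self._clock() - wrote_at < PIN_SECONDS:
            return self.primary  # the writer reads what they wrote
        return random.choice(self.replicas)  # everyone else: eventual consistency


now = [100.0]  # fake clock so the example needs no sleeping
router = ReplicaRouter("primary", ["replica-1", "replica-2"],
                       clock=lambda: now[0])

router.connection_for_write(user_id=1)
writer_read = router.connection_for_read(user_id=1)
other_read = router.connection_for_read(user_id=2)
now[0] += 10
later_read = router.connection_for_read(user_id=1)
```

In a multi-server deployment the last-write timestamps would have to live somewhere shared, such as the session or memcached, and the window only helps if lag stays below it.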
Replication
What happens when you become write saturated?
Federation
Federation
• Start with Vertical Partitioning: split tables that aren’t joined across database servers
• Actually pretty easy
• Except not with Django
Federation
django.db.models.base
FAIL!
Federation
• At some point you’ll need to split a single table across databases (e.g., user table)
• Auto-increment PKs won’t work
• It’d be nice to have a UUIDField for PKs
• You can probably build this yourself
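A minimal sketch of why UUIDs help here: keys can be generated anywhere without asking a database for the next auto-increment value, and a stable hash of the key picks the shard. The shard names and helper functions are hypothetical, and this is plain Python rather than an actual Django field.

```python
import uuid
import zlib

SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]


def new_user_pk():
    """Generate a primary key without coordinating with any database."""
    return uuid.uuid4().hex  # 32 hex characters


def shard_for(pk):
    """Map a key to a shard deterministically using a stable checksum."""
    return SHARDS[zlib.crc32(pk.encode("ascii")) % len(SHARDS)]


pk = new_user_pk()
shard = shard_for(pk)
```

Simple modulo placement like this forces rows to move whenever the number of shards changes, so real schemes often add a lookup table or consistent hashing on top.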
Profiling, Monitoring & Measuring
Know your SQL
    >>> Article.objects.filter(pk=3).query.as_sql()
    ('SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article" WHERE "app_article"."id" = %s ', (3,))
Know your SQL
    >>> import sqlparse
    >>> def pp_query(qs):
    ...     t = qs.query.as_sql()
    ...     sql = t[0] % t[1]
    ...     print sqlparse.format(sql, reindent=True, keyword_case='upper')
    ...
    >>> pp_query(Article.objects.filter(pk=3))
    SELECT "app_article"."id", "app_article"."name", "app_article"."author_id"
    FROM "app_article"
    WHERE "app_article"."id" = 3
Know your SQL
    >>> from django.db import connection
    >>> connection.queries
    [{'time': '0.001', 'sql': u'SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article"'}]
Know your SQL
• It’d be nice if a lightweight stacktrace could be done in QuerySet.__init__
• Stick the result in connection.queries
• Now we know where the query originated
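The wish above can be prototyped outside Django to see what it would record. In this sketch `TracedQuery` and the `queries` list are hypothetical stand-ins for `QuerySet` and `connection.queries`; only `traceback.extract_stack` is a real standard-library call.

```python
import traceback

queries = []  # stand-in for django.db.connection.queries


class TracedQuery(object):
    """Hypothetical query object that remembers where it was created."""

    def __init__(self, sql):
        self.sql = sql
        # Take the three innermost frames, then drop this __init__ itself,
        # leaving the two callers above it.
        stack = traceback.extract_stack(limit=3)[:-1]
        self.origin = [
            "%s:%d in %s" % (f.filename, f.lineno, f.name) for f in stack
        ]

    def execute(self):
        queries.append({"sql": self.sql, "origin": self.origin})


def article_detail_view():
    TracedQuery('SELECT * FROM "app_article" WHERE "id" = 3').execute()


article_detail_view()
```

Keeping the captured stack shallow is what makes it "lightweight": a couple of frames is usually enough to identify the view or template tag that issued the query.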
Measuring
Django Debug Toolbar
http://github.com/robhudson/django-debug-toolbar/
Monitoring
• Ganglia
• Munin
You can’t improve what you don’t measure.
Measuring & Monitoring
• Measure
• Server load, CPU usage, I/O
• Database QPS
• Memcache QPS, hit rate, evictions
• Queue lengths
• Anything else interesting