Caching Up and Down the Stack
DESCRIPTION
Whether you're looking to make your web app run faster or scale better, one great way to achieve both is to simply do less work. How? By using caches, the data hidey-holes that generations of engineers have thoughtfully left at key junctures in computing infrastructure, from your CPU to the backbone of the internet. Requests into web applications, which span great distances and often involve expensive frontend and backend lifting, are great candidates for caching of all types. We'll discuss the benefits and tradeoffs of caching at different layers of the stack and how to find low-hanging cacheable fruit, with a particular focus on server-side improvements.
TRANSCRIPT
Caching Up and Down the Stack
Long Island/Queens Django Meetup 5/20/14
Hi, I’m Dan Kuebrich
● Software engineer, Python fan
● Web performance geek
● Founder of Tracelytics, now part of AppNeta
● Once (and future?) Queens resident
DJANGO
What is “caching”?
● Caching is avoiding doing expensive work
  o by doing cheaper work
● Common examples?
  o On repeat visits, your browser doesn’t download images that haven’t changed
  o Your CPU caches instructions, data so it doesn’t have to go to RAM… or to disk!
What is “caching”?
[Diagram: Uncached: Client → Data Source (slow...). Cached: Client → Cache Intermediary (fast!) → Data Source (slow...).]
“Latency Numbers Every Programmer Should Know”
Systems Performance: Enterprise and the Cloud by Brendan Gregg http://books.google.com/books?id=xQdvAQAAQBAJ&pg=PA20&lpg=PA20&source=bl&ots=hlTgyxdrnR&sig=CCjddHrY1H6muMVW9BFcbdO7DDo&hl=en&sa=X&ei=dS7oUquhOYr9oAT9oYGoDw&ved=0CCkQ6AEwAA#v=onepage&q&f=false
A whole mess of caching:
(Closer to the user)
● Browser cache
● CDN
● Proxy / optimizer
● Application-based
  o Full-page
  o Fragment
  o Object cache
● Database
  o Query cache
  o Denormalization
(Closer to the data)
Caching in Django apps: Frontend
● Client-side assets
● Full pages
Client-side assets
● Use HTTP caches!
  o Browser
  o CDN
  o Intermediate proxies
● Set policy with cache headers
  o Cache-Control / Expires
  o ETag / Last-Modified
HTTP Cache-Control and Expires
● Stop the browser from even asking for it
● Expires
  o Pick a date in the future, good til then
● Cache-Control
  o More flexible
  o Introduced in HTTP 1.1
  o Use this one
HTTP Cache-Control and Expires
dan@JLTM21:~$ curl -I https://login.tv.appneta.com/cache/tl-layouts_base_unauth-compiled-162c2ceecd9a7ff1e65ab460c2b99852a49f5a43.css
HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=315360000
Content-length: 5955
Content-Type: text/css
Date: Tue, 20 May 2014 23:12:16 GMT
Expires: Thu, 31 Dec 2037 23:55:55 GMT
Last-Modified: Fri, 16 May 2014 20:51:19 GMT
Server: nginx
Connection: keep-alive
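The two headers in that response say the same thing in two ways: Cache-Control gives a lifetime in seconds, Expires gives an absolute date. A minimal sketch of generating a matching pair with only the Python standard library follows; the function name `far_future_cache_headers` and its `now` parameter are made up for illustration and are not part of Django or nginx.

```python
import time
from email.utils import formatdate


def far_future_cache_headers(max_age_seconds, now=None):
    """Build Cache-Control and Expires headers that agree with each other."""
    # `now` is injectable so the result is reproducible; defaults to the clock.
    now = time.time() if now is None else now
    return {
        # Relative lifetime, in seconds (HTTP/1.1).
        "Cache-Control": "max-age=%d" % max_age_seconds,
        # Absolute expiry date in HTTP date format (always GMT).
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# Same max-age as the curl output above (roughly ten years).
headers = far_future_cache_headers(315360000)
print(headers["Cache-Control"])
print(headers["Expires"])
```

A lifetime this long is only safe when the URL changes whenever the content does, as with the hash embedded in the CSS filename above.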
HTTP Cache Control in Django
https://docs.djangoproject.com/en/dev/topics/cache/
ETag + Last-Modified
dan@JLTM21:~$ curl -I www.appneta.com/stylesheets/styles.css
HTTP/1.1 200 OK
Last-Modified: Tue, 20 May 2014 05:52:50 GMT
ETag: "30854c-1c3d3-4f9ce7d715080"
Vary: Accept-Encoding
Content-Type: text/css
...
ETag + Last-Modified
dan@JLTM21:~$ curl -I www.appneta.com/stylesheets/styles.css --header 'If-None-Match: "30854c-1c3d3-4f9ce7d715080"'
HTTP/1.1 304 Not Modified
Last-Modified: Tue, 20 May 2014 05:52:50 GMT
ETag: "30854c-1c3d3-4f9ce7d715080"
Vary: Accept-Encoding
Content-Type: text/css
Date: Tue, 20 May 2014 23:21:12 GMT
...
ETag vs Last-Modified
● Last-Modified is date-based
● ETag is content-based
● Most webservers generate both
● Some webservers (Apache) generate etags that depend on local state
  o If you have a load-balanced pool of servers working here, they might not be using the same etags!
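To make the 200 vs. 304 exchange concrete, here is a rough sketch of a purely content-based ETag check. The helper names `make_etag` and `respond` are hypothetical, SHA-1 is just one possible hash, and a real server would also handle weak validators and comma-separated lists in If-None-Match, which this ignores.

```python
import hashlib


def make_etag(body):
    """Content-based ETag: identical bytes always give an identical tag."""
    return '"%s"' % hashlib.sha1(body).hexdigest()


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        # The client's copy is current: send headers only, no body.
        return 304, headers, b""
    return 200, headers, body


css = b"body { color: #333; }"
status, headers, payload = respond(css)  # first visit
status_again, _, payload_again = respond(css, if_none_match=headers["ETag"])  # repeat visit
print(status, status_again)
```

Because the tag depends only on the response bytes, every server in a load-balanced pool would compute the same value for the same file.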
CDNs
● Put content closer to your end-users
  o and offload HTTP requests from your servers
● Best for static assets
● Same cache control policies apply
Full-page caching
[Diagram: Client → Varnish → Data Source]
No internet standards necessary!
Full-page caching: mod_pagespeed
[Diagram: Client → mod_pagespeed → Data Source]
● Dynamically rewrites pages with frontend optimizations
● Caches rewritten pages
Full-page caching in Django
Wait, where is this getting cached?
● Django makes it easy to configure
  o In-memory
  o File-based
  o Memcached
  o etc.
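As a rough illustration of what "easy to configure" looks like, the sketch below shows a `CACHES` setting with an in-memory and a file-based backend. The alias name "files" and both LOCATION values are placeholders chosen for this example. A Memcached entry is left out because the backend class name differs between Django versions; check the caching docs for the release you run.

```python
# settings.py fragment (a sketch, not a recommended production setup)
CACHES = {
    # In-memory, per-process cache: simplest option, not shared between workers.
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "page-cache",  # placeholder name for this memory store
    },
    # File-based cache: each entry is written as a file under LOCATION.
    "files": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/django_cache",  # placeholder directory
    },
}
```

Whichever backend is behind "default" is where full-page caching stores its responses unless told otherwise.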
Full-page caching: dynamic pages?
Fragment caching
Full-page caching: the ajax solution
Object caching
def get_item_by_id(key):
    # Look up the item in our database
    return session.query(User)\
                  .filter_by(id=key)\
                  .first()
Object caching
# mc: memcached client, session: database session (both set up elsewhere)
def get_item_by_id(key):
    # Check in cache
    val = mc.get(key)
    # If exists, return it
    if val is not None:
        return val
    # If not, get the val, store it in the cache
    val = session.query(User)\
                 .filter_by(id=key)\
                 .first()
    mc.set(key, val)
    return val
Object caching
from decorator import decorator  # assumes the third-party "decorator" package

@decorator
def cache(expensive_func, key):
    # Check in cache
    val = mc.get(key)
    # If exists, return it
    if val is not None:
        return val
    # If not, get the val, store it in the cache
    val = expensive_func(key)
    mc.set(key, val)
    return val
Object caching
@cache
def get_item_by_id(key):
    # Look up the item in our database
    return session.query(User)\
                  .filter_by(id=key)\
                  .first()
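The slides assume a memcached client `mc` and a database `session` that are set up elsewhere. For experimenting without either, here is a self-contained sketch of the same check-then-fill pattern. `DictCache` is a made-up in-process stand-in for the memcached client, the "database" is a function that returns a dict, and the decorator is written with `functools.wraps` rather than any third-party helper.

```python
import functools


class DictCache:
    """Tiny in-process stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


mc = DictCache()


def cache(expensive_func):
    """Wrap a one-argument function with a check-then-fill cache lookup."""
    @functools.wraps(expensive_func)
    def wrapper(key):
        # Prefix with the function name so different functions don't collide.
        cache_key = "%s:%s" % (expensive_func.__name__, key)
        val = mc.get(cache_key)
        if val is not None:
            return val  # hit: skip the expensive work
        val = expensive_func(key)  # miss: do the work once...
        mc.set(cache_key, val)     # ...and remember the result
        return val
    return wrapper


calls = []  # records every time the "database" is actually hit


@cache
def get_item_by_id(key):
    calls.append(key)
    return {"id": key, "name": "user-%d" % key}


first = get_item_by_id(7)   # miss: runs the function
second = get_item_by_id(7)  # hit: served from the cache
print(len(calls))
```

One limitation of this sketch: a function that legitimately returns None is never cached, because None is also what a miss looks like.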
Object caching in Django
Query caching
[Diagram: Client → Database. Cached? If so, the Query Cache answers; if not, the query goes to the actual tables.]
Query caching
mysql> select SQL_CACHE count(*) from traces;
+----------+
| count(*) |
+----------+
|  3135623 |
+----------+
1 row in set (0.56 sec)

mysql> select SQL_CACHE count(*) from traces;
+----------+
| count(*) |
+----------+
|  3135623 |
+----------+
1 row in set (0.00 sec)
Query caching
[Figure: Uncached vs. Cached]
Denormalization
mysql> select table1.x, table2.y from table1 join table2 on table1.z = table2.q where table1.z > 100;
mysql> select table1.x, table1.y from table1 where table1.z > 100;
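The two queries above return the same rows only if `table2.y` has been copied into `table1` ahead of time. A small sketch of that trade, using Python's built-in sqlite3 instead of MySQL so it runs anywhere; the column types and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (q INTEGER, y TEXT);
    CREATE TABLE table1 (x TEXT, z INTEGER, y TEXT);  -- y: denormalized copy
    INSERT INTO table2 VALUES (50, 'low'), (150, 'mid'), (250, 'high');
    INSERT INTO table1 (x, z) VALUES ('a', 50), ('b', 150), ('c', 250);
    -- Denormalize: pay for the lookup once, at write time.
    UPDATE table1 SET y = (SELECT y FROM table2 WHERE table2.q = table1.z);
""")

# Normalized read: needs the join on every request.
joined = conn.execute(
    "SELECT table1.x, table2.y FROM table1 "
    "JOIN table2 ON table1.z = table2.q "
    "WHERE table1.z > 100 ORDER BY table1.x"
).fetchall()

# Denormalized read: single table, no join.
denormalized = conn.execute(
    "SELECT table1.x, table1.y FROM table1 "
    "WHERE table1.z > 100 ORDER BY table1.x"
).fetchall()

print(joined == denormalized)
```

The cost moves to the write path: any change to `table2.y` now has to be propagated to the copy in `table1`, which is an invalidation problem like any other cache.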
Caching: what can go wrong?
● Invalidation
● Fragmentation
● Stampedes
● Complexity
Invalidation
[Diagram: Client, Cache Intermediary, Data Source. On "Update!", the write goes to the Data Source and the entry in the Cache Intermediary is invalidated.]
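One simple way to wire up "write, then invalidate" is sketched below. Both stores are plain dicts standing in for the data source and the cache intermediary, and the `read` and `write` helper names are invented for this example.

```python
database = {"user:1": "Dan"}  # stand-in for the data source
cache = {}                    # stand-in for the cache intermediary


def read(key):
    """Serve from the cache when possible; fill it on a miss."""
    if key in cache:
        return cache[key]
    value = database[key]
    cache[key] = value
    return value


def write(key, value):
    """Update the data source, then drop the stale cached copy."""
    database[key] = value
    cache.pop(key, None)  # invalidate; the next read refills the cache


before = read("user:1")    # fills the cache with "Dan"
write("user:1", "Daniel")  # update + invalidate
after = read("user:1")     # miss, then refilled with the new value
print(before, after)
```

If the `cache.pop` line is skipped, `after` would still be the old value, which is exactly the stale-read bug that invalidation exists to prevent.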
Invalidation on page-scale
(More savings, generally more invalidation...)
● Browser cache
● CDN
● Proxy / optimizer
● Application-based
  o Full-page
  o Fragment
  o Object cache
● Database
  o Query cache
  o Denormalization
(Smaller savings, generally less invalidation)
Fragmentation
● What if I have a lot of different things to cache?
  o More misses
  o Potential cache eviction
Fragmentation
[Chart: frequency of access (y-axis) across your pages / objects (x-axis)]
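A toy simulation can show how the number of distinct objects affects hit rate when the cache is a fixed size. The `LRUCache` class and the request patterns are invented for illustration; real caches such as memcached use their own eviction policies, so treat the numbers as directional only.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.data[key]
        self.misses += 1
        value = compute(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the oldest entry
        return value


def hit_rate(capacity, requests):
    """Replay a list of requested keys and return the fraction that hit."""
    lru = LRUCache(capacity)
    for key in requests:
        lru.get_or_compute(key, lambda k: k)
    return lru.hits / len(requests)


few_objects = [i % 5 for i in range(100)]    # 5 distinct objects, cycled
many_objects = [i % 50 for i in range(100)]  # 50 distinct objects, cycled

print(hit_rate(10, few_objects))   # everything fits in the cache
print(hit_rate(10, many_objects))  # entries are evicted before they are reused
```

With a cache of 10 entries, the first workload misses only on first touch, while the second evicts every entry before it is requested again.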
Stampedes
● On a cache miss extra work is done
● The result is stored in the cache
● What if multiple simultaneous misses?
Stampedes
http://allthingsd.com/20080521/stampede-facebook-opens-its-profile-doors/
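One common mitigation is to let only one caller recompute a missing entry while the others wait for it. The sketch below does this inside a single process with a lock and a second cache check. The names `get_page` and `render_page` are hypothetical, and a multi-server deployment would need a lock shared across machines, which this does not attempt.

```python
import threading
import time

cache = {}
cache_lock = threading.Lock()
render_count = 0  # how many times the expensive work actually ran


def render_page():
    """Stand-in for an expensive page render or query."""
    global render_count
    render_count += 1  # only ever called while holding cache_lock
    time.sleep(0.05)
    return "rendered page"


def get_page(key):
    val = cache.get(key)
    if val is not None:
        return val  # fast path: no locking on a hit
    with cache_lock:
        # Re-check: another thread may have filled the entry while we waited.
        val = cache.get(key)
        if val is None:
            val = render_page()
            cache[key] = val
        return val


# Eight simultaneous requests for the same uncached page.
threads = [threading.Thread(target=get_page, args=("home",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(render_count)
```

A single global lock keeps the example short, but it also serializes misses for unrelated keys; a lock per key would avoid that at the cost of more bookkeeping.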
Complexity
● How much caching do I need, and where?
● What is the invalidation process
  o on data update? on release?
● What happens if the caches fall over?
● How do I debug it?
Takeaways
● The ‘how’ of caching:
  o What are you caching?
  o Where are you caching it?
  o How bad is a cache miss?
  o How and when are you invalidating?
Takeaways
● The ‘why’ of caching:
  o Did it actually get faster?
  o Is speed worth extra complexity?
  o Don’t guess – measure!
  o Always use real-world conditions.
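As a very small starting point for "did it actually get faster?", the sketch below times one miss and one hit on the same key. `slow_lookup` and its 20 ms sleep are stand-ins for real work, and a micro-benchmark like this is no substitute for measuring under real-world traffic.

```python
import time
from functools import lru_cache


def slow_lookup(key):
    """Stand-in for a slow query or remote call."""
    time.sleep(0.02)
    return key * 2


# Standard-library memoization wrapped around the slow function.
cached_lookup = lru_cache(maxsize=128)(slow_lookup)


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


_, miss_seconds = timed(cached_lookup, 21)  # first call: does the slow work
_, hit_seconds = timed(cached_lookup, 21)   # second call: served from cache

print("miss: %.4fs  hit: %.6fs" % (miss_seconds, hit_seconds))
print(cached_lookup.cache_info())
```

`cache_info()` reports hits and misses, which is often more telling than a single timing: a fast cache with a poor hit rate saves very little.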
Questions?
Thanks!
● Interested in measuring your Django app’s performance?
  o Free trial of TraceView: www.appneta.com/products/traceview
● See you at Velocity NYC this fall?
● Twitter: @appneta / @dankosaur
Resources
● Django documentation on caching: https://docs.djangoproject.com/en/dev/topics/cache/
● Varnish caching, via Disqus: http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
● Django cache option comparisons: http://codysoyland.com/2010/jan/17/evaluating-django-caching-options/
● More Django-specific tips: http://www.slideshare.net/csky/where-django-caching-bust-at-the-seams
● Guide to cache-related HTTP headers: http://www.mobify.com/blog/beginners-guide-to-http-cache-headers/
● Google PageSpeed: https://developers.google.com/speed/pagespeed/module