SAVING THE WORLD from guaranteed APOCALYPSE*
using varnish, memcached, and some other stuff
From PHP Bulgaria User Group Meeting: 23.11.2013
* apocalypse not really guaranteed
WHY IS NOT CACHING BAD?
• Keep executing the same code with the same data
• Waste computing power getting the same result
• That power is probably generated by burning coal*
• Burning stuff produces tons of CO2**
* it most likely is not
** probably a smaller unit of mass
WHY SHOULD YOU CARE?
• Your web apps will become WAY faster
• Users and search engines will like you MORE
• You will use A LOT less hardware resources
• You will generate LESS CO2 and/or save $$$
• The Earth will NOT explode and/or you’ll have more $$$
• Women like people who save the world and/or have $$$
• And lots of other stuff*
* 0 or greater amount of other stuff
WHY YOU SHOULD AVOID USING TTL
• You might use obsolete data
• Your server might get a cache stampede and go down
• You should PUSH the fresh data in your cache as soon as you have it, BEFORE the old one has expired from the cache
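A minimal sketch of the "push" approach from the last bullet, assuming a connected Memcached instance in $cache. The function name saveProduct, the $persist callback and the "product:<id>" key format are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical write path: persist the change, then immediately overwrite
// the cached copy so readers never have to wait for an expiry.
// $cache is assumed to be a connected Memcached instance.
function saveProduct($cache, callable $persist, array $product): void
{
    $persist($product);                         // e.g. write to the database
    // Expiration 0 means "never expire": freshness comes from pushing,
    // not from a TTL.
    $cache->set("product:{$product['id']}", $product, 0);
}
```

With this shape the cached entry is replaced in the same step that changes the data, so there is no window in which the entry is missing and has to be recomputed under load.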
WAIT, WHAT IS A CACHE STAMPEDE?
1. A critical piece of your cached data expires through TTL (or is evicted)
2. A client requests a service which relies on that data
3. That data takes a relatively long time to compute
5. A lot of them stack on the server before the first one is even finished
[Chart: requests over seconds]
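One common way to soften a stampede is to let only one request recompute the missing value while the others wait briefly and retry. The sketch below is not from the talk; it assumes $cache is a connected Memcached instance and relies on Memcached::add() succeeding only when the key does not exist yet. The function name getWithLock, the "lock:" key prefix and the timing values are assumptions.

```php
<?php
// Return the cached value for $key, letting a single caller recompute it
// after a miss. $cache is assumed to be a connected Memcached instance.
// Caveat: a stored value of false is indistinguishable from a miss here.
function getWithLock($cache, string $key, callable $compute,
                     int $ttl = 0, int $lockTtl = 10, int $retries = 20)
{
    for ($i = 0; $i <= $retries; $i++) {
        $value = $cache->get($key);
        if ($value !== false) {
            return $value;                      // cache hit
        }
        // add() only succeeds while the lock key is absent, so exactly
        // one caller gets to do the expensive work.
        if ($cache->add("lock:$key", 1, $lockTtl)) {
            try {
                $value = $compute();
                $cache->set($key, $value, $ttl);
                return $value;
            } finally {
                $cache->delete("lock:$key");    // release the lock
            }
        }
        usleep(50000);                          // someone else is computing: wait 50 ms
    }
    return $compute();                          // fallback: compute without caching
}
```

The lock carries its own short expiry so that a crashed worker cannot block everyone else for long.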
HOW DO I CACHE THINGS?
1. Create a Memcached instance
   $memcached = new Memcached();
   $memcached->addServers($memcachedServers);
2. Put data in
   $memcached->set($key, $value, $expireAt);
3. Get data out
   $memcached->get($key);
HELPFUL TIPS
• It’s best to cache the final result of an operation rather than its input data
• You should always have a fallback for when you get a cache miss
• Try to avoid flushing the entire cache; use clever key names instead
• Use Memcached::getAllKeys() to help you manage/release/update data
• Use Memcached::getStats() to help you improve efficiency
• Have a warmup script!
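One possible reading of "clever key names" is to embed a version number in every key of a group, so the whole group can be invalidated by bumping the version instead of flushing the cache. This is a sketch under that assumption, not the speaker's code; the function names and the "ns:<name>:version" key layout are hypothetical, and $cache is assumed to be a connected Memcached instance.

```php
<?php
// Build a key that embeds the current version of its namespace.
function namespacedKey($cache, string $namespace, string $id): string
{
    $versionKey = "ns:$namespace:version";
    $version = $cache->get($versionKey);
    if ($version === false) {
        // First use: add() will not overwrite a value set by a concurrent writer.
        $cache->add($versionKey, 1);
        $version = $cache->get($versionKey);
    }
    return "$namespace:v$version:$id";
}

// Make every existing key in the namespace unreachable by bumping the version.
function invalidateNamespace($cache, string $namespace): void
{
    $versionKey = "ns:$namespace:version";
    if ($cache->increment($versionKey) === false) {
        $cache->add($versionKey, 1);            // version key did not exist yet
    }
}
```

Old entries are never deleted explicitly; they simply stop being requested and are eventually evicted. If the version key itself is evicted the counter restarts, so this works best when evictions stay at zero.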
WHAT TO CHECK IN STATS()
…
["get_hits"]=> int(110825125)
["get_misses"]=> int(17396765)
["evictions"]=> int(0)
…
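The counters above are easier to judge as a hit ratio. A small sketch, assuming the input has the shape returned by Memcached::getStats() (one stats array per server, keyed by "host:port"); the function name hitRatios and the server address are made up for the example.

```php
<?php
// Compute the hit ratio per server from a Memcached::getStats() style array.
function hitRatios(array $allStats): array
{
    $ratios = [];
    foreach ($allStats as $server => $stats) {
        $total = $stats['get_hits'] + $stats['get_misses'];
        $ratios[$server] = $total > 0 ? $stats['get_hits'] / $total : 0.0;
    }
    return $ratios;
}

// The numbers from the slide, under a hypothetical server address.
$ratios = hitRatios([
    '127.0.0.1:11211' => [
        'get_hits'   => 110825125,
        'get_misses' => 17396765,
        'evictions'  => 0,
    ],
]);
printf("%.1f%%\n", 100 * $ratios['127.0.0.1:11211']);
```

For the slide's numbers this comes out at roughly 86% hits. A non-zero evictions counter would suggest the cache is too small for the working set.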
VARNISH IS:
• A caching HTTP reverse proxy
• Really, really really FAST
• Usually limited by the speed of the network
• Decently flexible, thanks to the VCL configuration language
COMMON PROBLEMS TO OVERCOME
• My pages are a mix of highly dynamic sections and mostly static stuff, and Varnish supposedly only caches whole pages
• I need to control/flush/refresh the cache without stopping/starting/killing/rebooting/pulling the cord/assaulting the datacenter and I prefer to do it from within my app
• My visitors have unique stuff
• Sessions
• Cookies
• Statistics and tracking visitors
ABOUT ESI
• Edge Side Includes or ESI is a small markup language for edge level dynamic web content assembly. The purpose of ESI is to tackle the problem of web infrastructure scaling.
<HTML>
<BODY>
…
<esi:include src="/esi/private/recentproducts"/>
…
</BODY>
</HTML>
[Figure: a page split into ESI fragments, each with its own cache lifetime: doesn't change at all, 2-4 minutes, 24h, 1 minute, 1 hour, session specific]
SETTING UP BACKENDS
backend www {
    .host = "192.168.0.2";
    .port = "81";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
}
HOW DOES IT WORK?
[Diagram: Varnish request flow from the client request through vcl_recv (lookup, pass or pipe), vcl_hash, vcl_hit, vcl_miss, vcl_pass, vcl_pipe, fetch from Backend1/Backend2, vcl_fetch and vcl_deliver, with vcl_error on the side]
vcl_recv
• First checkpoint when a request arrives and is parsed
• We must decide whether to lookup, pass or pipe the request
• We can choose a backend to use
• We have the req object
• Definition of PURGE, BAN or REFRESH like requests is here
• We can set a header in the req object to tell our backend the request is from varnish
set req.backend = default;
set req.http.X-Varnish-Handshake = "1";
set req.http.X-Forwarded-For = client.ip;

if (req.url ~ "/esi/") {
    set req.http.X-Varnish-Esi = regsub(req.url, ".esi/(\w+)/.*", "\1");
    remove req.http.Accept-Encoding;
}
if (req.request != "GET" && req.request != "HEAD") {
    # We only deal with GET and HEAD by default
    return (pass);
}
if (req.http.Cookie !~ "PHPSESSID=") {
    call generate_session;
}
return (lookup);
WAIT, WHAT?
sub generate_session {
    C{
        char uuid_buf[50];
        generate_uuid(uuid_buf);
        /* "\027" is the octal length (23) of the header name, colon included */
        VRT_SetHdr(sp, HDR_REQ, "\027X-Varnish-Fake-Session:", uuid_buf, vrt_magic_string_end);
    }C

    if (req.http.Cookie) {
        set req.http.Cookie = req.http.X-Varnish-Fake-Session + "; " + req.http.Cookie;
    } else {
        set req.http.Cookie = req.http.X-Varnish-Fake-Session;
    }
}
C{
    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>
    #include <pthread.h>

    static pthread_mutex_t lrand_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* Writes "PHPSESSID=" plus 32 hex digits into buf (needs at least 43 bytes).
       Note: lrand48() is never seeded here and is not cryptographically secure. */
    void generate_uuid(char* buf) {
        /* lrand48() keeps global state, so serialise access across threads */
        pthread_mutex_lock(&lrand_mutex);
        long a = lrand48();
        long b = lrand48();
        long c = lrand48();
        long d = lrand48();
        pthread_mutex_unlock(&lrand_mutex);
        sprintf(buf, "PHPSESSID=%08lx%04lx%04lx%04lx%04lx%08lx",
            a,
            b & 0xffff,
            ((b & 0x0fff0000L) >> 16) | 0x4000,
            (c & 0x0fff) | 0x8000,
            (c & 0xffff0000L) >> 16,
            d
        );
    }
}C
vcl_hash
• Generates the hash through which Varnish looks up an object
• We have the req object
• We can make certain objects unique in the cache based on something more than just the url - like a session cookie.
hash_data(req.url);
if (req.http.host) {
    hash_data(req.http.host);
} else {
    hash_data(server.ip);
}

if (req.http.Accept-Encoding) {
    hash_data(req.http.Accept-Encoding);
}

if (req.http.X-Varnish-Esi == "private" && req.http.Cookie ~ "PHPSESSID=") {
    hash_data(regsub(req.http.Cookie, "^.*?PHPSESSID=([^;]*);*.*$", "\1"));
}

return (hash);
vcl_fetch
• Takes control when a response from the backend is fetched and parsed
• We have the req and beresp objects
• A good place to sanitise the backend response and control TTL
• Removal of Set-Cookie header is a good practice here
• Add helper headers to the cached object for the ban lurker
• We can choose to deliver or hit_for_pass here
beresp.ttl
Before Varnish runs vcl_fetch, the beresp.ttl variable has already been set to a value. It will use the first value it finds among:
• The s-maxage variable in the Cache-Control response header
• The max-age variable in the Cache-Control response header
• The Expires response header
• The default_ttl parameter
set beresp.http.X-Url = req.url;
set beresp.http.X-Host = req.http.host;
set beresp.http.X-Varnish-Session = regsub(req.http.Cookie, "^.*?PHPSESSID=([^;]*);*.*$", "\1");
if (beresp.status != 200 && beresp.status != 404) {
    set beresp.ttl = 15s;
    return (hit_for_pass);
}
if (beresp.http.Set-Cookie) {
    remove beresp.http.Set-Cookie;
}
if (beresp.http.X-Varnish-Esi == "1") {
    set beresp.do_esi = true;
}
if (req.url ~ "\.(jpg|jpeg|gif|otf|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|scripts)$") {
    set beresp.ttl = 180m;
}
return (deliver);
vcl_deliver
• Takes control just before a response is sent to the client
• We have the req and resp objects
• Executes after hit, miss and fetch, hit_for_pass or pass (but not pipe)
• Removal of all headers we set during the VCL flow is a good idea here
• We can also add headers here that should go to the client, but shouldn’t be in the cache
if (req.http.X-Varnish-Fake-Session) {
    call generate_session_expires;
    set resp.http.Set-Cookie = req.http.X-Varnish-Fake-Session + "; expires=" + resp.http.X-Varnish-Cookie-Expires + "; path=/";
    if (req.http.Host) {
        set resp.http.Set-Cookie = resp.http.Set-Cookie + "; domain=" + regsub(req.http.Host, ":\d+$", "");
    }
    set resp.http.Set-Cookie = resp.http.Set-Cookie + "; httponly";
    unset resp.http.X-Varnish-Cookie-Expires;
}
if (!client.ip ~ debug) {
    unset resp.http.X-Host;
    unset resp.http.X-Url;
    unset resp.http.X-Varnish-Session;
} else {
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
return (deliver);
INVALIDATING CACHED OBJECTS
• We can control cached objects through HTTP requests to Varnish with some clever VCL-ing
• PURGE - we can purge a single object from the cache
• BAN - we can ban a selection of matching objects from the cache
• REFRESH - we can fetch a new copy of an object while the old one is still served in the meantime
sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        error 404 "Not in cache";
    }
}
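// Open a connection to Varnish with a 2 second timeout
$cacheServerSocket = fsockopen($varnishHostname, 80, $errno, $errstr, 2);
if (!$cacheServerSocket) {
    // Could not reach Varnish; $errstr holds the reason
    die("Connection failed: $errstr ($errno)");
}

$request  = "PURGE /something.htm HTTP/1.0\r\n";
$request .= "Host: www.varnished-site.com\r\n";
$request .= "Connection: Close\r\n\r\n";

fwrite($cacheServerSocket, $request);
// The first line of the response is the HTTP status line
$response = fgets($cacheServerSocket);
fclose($cacheServerSocket);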
sub vcl_recv {
    if (req.request == "BAN") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        ban("obj.http.X-Host ~ " + req.http.host + " && obj.http.X-Url ~ " + req.url);
        error 200 "Banned";
    }
}
sub vcl_recv {
    if (req.request == "REFRESH") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        set req.request = "GET";
        set req.hash_always_miss = true;
    }
}
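Since PURGE, BAN and REFRESH are all plain HTTP requests with a custom method, the application side can share one small helper. The sketch below generalises the fsockopen example shown for PURGE; the function names buildCacheRequest and sendCacheRequest are hypothetical, and the method names only work if the VCL defines them as above.

```php
<?php
// Build the raw HTTP request for one of the custom methods handled in the VCL.
function buildCacheRequest(string $method, string $host, string $path): string
{
    $allowed = ['PURGE', 'BAN', 'REFRESH'];
    if (!in_array($method, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported method: $method");
    }
    return "$method $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Connection: Close\r\n\r\n";
}

// Send the request to Varnish and return the status line, or null on failure.
function sendCacheRequest(string $varnishHost, int $port, string $request): ?string
{
    $socket = @fsockopen($varnishHost, $port, $errno, $errstr, 2);
    if ($socket === false) {
        return null;                            // Varnish unreachable
    }
    fwrite($socket, $request);
    $statusLine = fgets($socket);
    fclose($socket);
    return $statusLine === false ? null : trim($statusLine);
}
```

A call could look like sendCacheRequest($varnishHostname, 80, buildCacheRequest('BAN', 'www.varnished-site.com', '/products/')). Because the BAN rule in the VCL matches X-Url as a regular expression, the path sent with BAN acts as a pattern rather than an exact URL.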
COMMON PROBLEMS TO OVERCOME
• My pages are a mix of highly dynamic sections and mostly static stuff, and Varnish supposedly only caches whole pages => Use ESI
• I need to control/flush/refresh the cache without stopping/starting/killing/rebooting/pulling the cord/assaulting the datacenter and I prefer to do it from within my app => Set up PURGE/BAN/REFRESH in the VCL
• My visitors have unique stuff => Use the session cookie in vcl_hash to keep a unique copy
• Sessions => Use the generate-session-in-Varnish trick
• Cookies => Uhhh, don't use em?
• Statistics and tracking visitors => Use the memcached VMOD and process stuff asynchronously on the backend