saving the world from guaranteed apocalypse* using varnish and memcached

SAVING THE WORLD from guaranteed APOCALYPSE*

using varnish, memcached, and some other stuff

* apocalypse not really guaranteed

WHAT IS CACHING?

WHY IS NOT CACHING BAD?

• Keep executing the same code with the same data

• Waste computing power getting the same result

• That power is probably generated by burning coal*

• Burning stuff produces tons of CO2**

* it most likely is not

** probably a smaller unit of mass

Too much CO2 will make THE EARTH EXPLODE*

* based on pure speculation

WHY SHOULD YOU CARE?

• Your web apps will become WAY faster

• Users and search engines will like you MORE

• You will use A LOT less hardware resources

• You will generate LESS CO2 and/or save $$$

• The Earth will NOT explode and/or you’ll have more $$$

• Women like people who save the world and/or have $$$

• And lots of other stuff*

* 0 or greater amount of other stuff

ABOUT TTL

WHY YOU SHOULD AVOID USING TTL

• You might use obsolete data

• Your server might get a cache stampede and go down

• You should PUSH fresh data into your cache as soon as you have it, BEFORE the old data has expired from the cache
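One way to read that last point: whoever changes the source data also recomputes the cached result and overwrites it on the spot, so readers never wait on an expired key. A minimal sketch, assuming plain dicts as stand-ins for Memcached and the database; `render_profile`, `save_user` and `show_profile` are hypothetical names, not from the slides:

```python
# Hypothetical stand-ins: plain dicts play the roles of Memcached and the database.
cache = {}
database = {}


def render_profile(user_id):
    """The expensive step whose final result gets cached."""
    return "<h1>%s</h1>" % database[user_id]["name"]


def save_user(user_id, name):
    """Write path: change the source data, then push the fresh result."""
    database[user_id] = {"name": name}
    # Overwrite the cached copy right away instead of waiting for it to expire.
    cache["profile:%d" % user_id] = render_profile(user_id)


def show_profile(user_id):
    """Read path: normally a hit, with a fallback in case the key was evicted."""
    key = "profile:%d" % user_id
    if key not in cache:
        cache[key] = render_profile(user_id)
    return cache[key]


save_user(7, "Ada")
save_user(7, "Ada Lovelace")
print(show_profile(7))
```

The read path never sees the stale "Ada" entry, because the second write replaced it before anyone asked.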

WAIT, WHAT IS A CACHE STAMPEDE?

[Chart on each of the following slides: requests plotted against seconds]

1. A critical piece of your cached data expires through TTL (or is evicted)

2. A client requests a service which relies on that data

3. That data takes a relatively long time to compute

4. Other requests come in that need the same data

5. A lot of them stack up on the server before the first one has even finished

503 SERVICE UNAVAILABLE

I DON’T WANT THAT!

No you don’t!
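The deck's own remedy is to push fresh data before the old copy expires. As a complementary safeguard that is not taken from the slides, the recomputation can be put behind a lock so that only one request does the slow work while the others wait for its result. A rough sketch, assuming an in-process dict instead of Memcached and a hypothetical `slow_compute` function:

```python
import threading
import time

cache = {}                      # hypothetical in-process stand-in for Memcached
cache_lock = threading.Lock()   # guards recomputation of the one hot key
compute_calls = 0


def slow_compute():
    """Stands in for data that takes relatively long to compute."""
    global compute_calls
    compute_calls += 1
    time.sleep(0.05)
    return "fresh value"


def get_hot_value():
    value = cache.get("hot")
    if value is not None:
        return value
    # Cache miss: one request recomputes, the rest queue up on the lock.
    with cache_lock:
        value = cache.get("hot")   # re-check: someone else may have filled it
        if value is None:
            value = slow_compute()
            cache["hot"] = value
    return value


# Twenty simultaneous requests for the same missing key.
threads = [threading.Thread(target=get_hot_value) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("slow_compute ran", compute_calls, "time(s)")
```

Across several web servers the lock would have to live somewhere shared, which this single-process sketch does not attempt.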

MEMCACHED

HOW DO I CACHE THINGS?

1. Create a Memcached instance

    $memcached = new Memcached;
    $memcached->addServers( $memcachedServers );

2. Put data in

    $memcached->set( $key, $value, $expireAt );

3. Get data out

    $memcached->get( $key );

A SIMPLE BENCHMARK

HELPFUL TIPS

• It’s best if you cache the final result of an operation rather than its input data

• You should always have a fallback if you get a cache miss

• Try to avoid flushing the entire cache, use clever key names instead

• Use Memcached::getAllKeys() to help you manage/release/update data

• Use Memcached::getStats() to help you improve efficiency

• Have a warmup script!
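Three of these tips fit in one small sketch: a fallback on a cache miss, "clever key names" in place of a full flush, and a warmup step. It assumes a dict standing in for a Memcached client. The versioned-key scheme is one possible reading of "clever key names", not necessarily the one the author had in mind, and every function name here is hypothetical:

```python
cache = {}  # hypothetical dict standing in for a Memcached client


def namespaced_key(namespace, name):
    """Build a key that embeds the namespace's current version number."""
    version = cache.setdefault("version:" + namespace, 1)
    return "%s:v%d:%s" % (namespace, version, name)


def invalidate_namespace(namespace):
    """'Flush' one group of keys by bumping its version; old keys are never read again."""
    cache["version:" + namespace] = cache.get("version:" + namespace, 1) + 1


def get_or_compute(namespace, name, compute):
    """Return the cached final result, falling back to compute() on a miss."""
    key = namespaced_key(namespace, name)
    if key not in cache:
        cache[key] = compute()
    return cache[key]


def warm_up(namespace, jobs):
    """Warmup script: fill the cache before real traffic arrives."""
    for name, compute in jobs.items():
        get_or_compute(namespace, name, compute)


calls = []


def build_menu():
    calls.append("menu")
    return "<ul>...</ul>"


warm_up("pages", {"menu": build_menu})
get_or_compute("pages", "menu", build_menu)   # hit: build_menu is not called again
invalidate_namespace("pages")
get_or_compute("pages", "menu", build_menu)   # miss under the new version
print(calls)
```

With a real Memcached the orphaned old-version keys are not deleted explicitly; they simply stop being read and age out.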

WHAT TO CHECK IN getStats()

…
["get_hits"]=>int(110825125)
["get_misses"]=>int(17396765)
["evictions"]=>int(0)
…
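What those three counters are usually read for is the hit ratio and whether anything is being evicted. A small sketch using the numbers from the slide; the `hit_ratio` helper is a hypothetical name:

```python
# Counters copied from the stats output on the slide.
stats = {"get_hits": 110825125, "get_misses": 17396765, "evictions": 0}


def hit_ratio(stats):
    """Fraction of get requests that were answered from the cache."""
    total = stats["get_hits"] + stats["get_misses"]
    return stats["get_hits"] / total if total else 0.0


ratio = hit_ratio(stats)
print("hit ratio: %.1f%%" % (ratio * 100))
print("evictions:", stats["evictions"])
```

Zero evictions means Memcached has not had to throw away unexpired items to make room, so the misses here come from keys that were never set, were deleted, or expired.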

VARNISH

VARNISH IS:

• A caching HTTP reverse proxy

• Really, really, really FAST

• Usually limited by the speed of the network

• Decently flexible, thanks to the VCL configuration language

A SIMPLE BENCHMARK

NICE SPEED

Now let’s see how to use Varnish effectively on my very dynamic site

COMMON PROBLEMS TO OVERCOME

• My pages are a mix of highly dynamic sections and mostly static stuff, and Varnish supposedly only caches whole pages

• I need to control/flush/refresh the cache without stopping/starting/killing/rebooting/pulling the cord/assaulting the datacenter and I prefer to do it from within my app

• My visitors have unique stuff

• Sessions

• Cookies

• Statistics and tracking visitors

ABOUT ESI

• Edge Side Includes or ESI is a small markup language for edge level dynamic web content assembly. The purpose of ESI is to tackle the problem of web infrastructure scaling.

<HTML>
<BODY>
…
<esi:include src="/esi/private/recentproducts"/>
…
</BODY>
</HTML>

[Diagram labels, one per page region: doesn't change at all, 2-4 minutes, doesn't change at all, 24 h, 1 minute, 1 hour, session specific, session specific]
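To make the assembly step concrete, here is a toy version of what an ESI processor does with the page above: find each `<esi:include/>` tag and splice in the body fetched for its `src`. This is only a sketch of the idea, not how Varnish implements it; the fragment store, its contents and the `assemble` function are made up for illustration:

```python
import re

# Hypothetical fragment store: URL -> body, as each fragment might sit in the cache.
fragments = {
    "/esi/private/recentproducts": "<ul><li>Product A</li></ul>",
}

ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(page, fetch):
    """Replace every <esi:include/> tag with the body returned by fetch(src)."""
    return ESI_INCLUDE.sub(lambda match: fetch(match.group(1)), page)


page = (
    "<html><body><h1>Shop</h1>"
    '<esi:include src="/esi/private/recentproducts"/>'
    "</body></html>"
)
print(assemble(page, fragments.__getitem__))
```

Because the outer page and each fragment are separate cache objects, each one can have its own lifetime, which is what the labels in the diagram describe.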

SETTING UP BACKENDS

backend www {
    .host = "192.168.0.2";
    .port = "81";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
}

HOW DOES IT WORK?

[Flow diagram: Client request → vcl_recv → pass / pipe / lookup; vcl_pipe; vcl_pass; vcl_hash; vcl_hit; vcl_miss; fetch from Backend1 / Backend2; vcl_fetch; vcl_deliver; vcl_error]

vcl_recv

• First checkpoint when a request arrives and is parsed

• We must decide whether to lookup, pass or pipe the request

• We can choose a backend to use

• We have the req object

• PURGE-, BAN- or REFRESH-like requests are defined here

• We can set a header in the req object to tell our backend the request is from Varnish

set req.backend = default;
set req.http.X-Varnish-Handshake = "1";
set req.http.X-Forwarded-For = client.ip;

if (req.url ~ "/esi/") {
    set req.http.X-Varnish-Esi = regsub(req.url, ".esi/(\w+)/.*", "\1");
    remove req.http.Accept-Encoding;
}
if (req.request != "GET" && req.request != "HEAD") {
    # We only deal with GET and HEAD by default
    return (pass);
}
if (req.http.Cookie !~ "PHPSESSID=") {
    call generate_session;
}
return (lookup);

WAIT, WHAT?

sub generate_session {
    C{
        char uuid_buf[50];
        generate_uuid(uuid_buf);
        /* The leading octal byte is the length of the header name including
           the colon: "X-Varnish-Fake-Session:" is 23 characters, i.e. \027. */
        VRT_SetHdr(sp, HDR_REQ,
            "\027X-Varnish-Fake-Session:",
            uuid_buf,
            vrt_magic_string_end
        );
    }C

    if (req.http.Cookie) {
        set req.http.Cookie = req.http.X-Varnish-Fake-Session + "; " + req.http.Cookie;
    } else {
        set req.http.Cookie = req.http.X-Varnish-Fake-Session;
    }
}

C{
    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>
    #include <pthread.h>

    static pthread_mutex_t lrand_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* Writes "PHPSESSID=" plus 32 hex digits into buf (43 bytes with the NUL). */
    void generate_uuid(char* buf) {
        /* lrand48() shares one internal state, so serialise access to it */
        pthread_mutex_lock(&lrand_mutex);
        long a = lrand48();
        long b = lrand48();
        long c = lrand48();
        long d = lrand48();
        pthread_mutex_unlock(&lrand_mutex);
        sprintf(buf, "PHPSESSID=%08lx%04lx%04lx%04lx%04lx%08lx",
            a,
            b & 0xffff,
            /* mask first, then shift: >> binds tighter than & */
            ((b & (long)0x0fff0000) >> 16) | 0x4000,
            (c & 0x0fff) | 0x8000,
            (c & (long)0xffff0000) >> 16,
            d
        );
        return;
    }
}C
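One thing worth knowing about the helper above: `lrand48()` is never seeded in this code, so it starts from the same default state each time the process starts, and it is not designed to be unpredictable anyway. For comparison, a sketch of the same cookie layout (32 hex digits with the version and variant bits set) drawn from the operating system's random source. This is an illustration in Python, not a drop-in replacement for the inline C, and `generate_session_cookie` is a hypothetical name:

```python
import uuid


def generate_session_cookie():
    """Return a PHPSESSID cookie pair with 32 hex digits, like the C helper's output."""
    # uuid4() takes its bits from os.urandom() and sets the version and
    # variant bits that the C code sets by hand with 0x4000 and 0x8000.
    return "PHPSESSID=" + uuid.uuid4().hex


cookie = generate_session_cookie()
print(cookie)
```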

vcl_hash

• Generates the hash through which Varnish looks up an object

• We have the req object

• We can make certain objects unique in the cache based on something more than just the url - like a session cookie.

hash_data(req.url);
if (req.http.host) {
    hash_data(req.http.host);
} else {
    hash_data(server.ip);
}

if (req.http.Accept-Encoding) {
    hash_data(req.http.Accept-Encoding);
}

if (req.http.X-Varnish-Esi == "private" && req.http.Cookie ~ "PHPSESSID=") {
    hash_data(regsub(req.http.Cookie, "^.*?PHPSESSID=([^;]*);*.*$", "\1"));
}

return (hash);
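The effect of that last `hash_data` call is easier to see outside VCL: public objects share one cache entry regardless of the cookie, while private ESI fragments get one entry per session. A sketch that feeds the same pieces into a digest. The use of SHA-256 with a NUL separator, and the `cache_key` function and its parameters, are assumptions made for illustration:

```python
import hashlib
import re

# Same regular expression as the regsub() call in the VCL.
SESSION_RE = re.compile(r"^.*?PHPSESSID=([^;]*);*.*$")


def cache_key(url, host, accept_encoding=None, esi_kind=None, cookie=""):
    """Mirror the slide's vcl_hash: feed the same pieces into one digest."""
    parts = [url, host]
    if accept_encoding:
        parts.append(accept_encoding)
    if esi_kind == "private" and "PHPSESSID=" in cookie:
        parts.append(SESSION_RE.sub(r"\1", cookie))
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


public_a = cache_key("/", "example.com", cookie="PHPSESSID=aaa")
public_b = cache_key("/", "example.com", cookie="PHPSESSID=bbb")
private_a = cache_key("/esi/private/recentproducts", "example.com",
                      esi_kind="private", cookie="lang=en; PHPSESSID=aaa; x=1")
private_b = cache_key("/esi/private/recentproducts", "example.com",
                      esi_kind="private", cookie="lang=en; PHPSESSID=bbb; x=1")
print(public_a == public_b, private_a == private_b)
```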

vcl_fetch

• Takes control when a response from the backend is fetched and parsed

• We have the req and beresp objects

• A good place to sanitise the backend response and control TTL

• Removal of Set-Cookie header is a good practice here

• Add helper headers to the cached object for the ban lurker

• We can choose to deliver or hit_for_pass here

beresp.ttl

Before Varnish runs vcl_fetch, the beresp.ttl variable has already been set to a value. It will use the first value it finds among:

• The s-maxage variable in the Cache-Control response header

• The max-age variable in the Cache-Control response header

• The Expires response header

• The default_ttl parameter
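The precedence on this slide can be sketched as a small function that tries each source in order. This is a simplified illustration, not Varnish's actual parser: the `DEFAULT_TTL` value, the helper names, and the choice to measure Expires against the Date header are assumptions of the sketch:

```python
import re
from email.utils import parsedate_to_datetime

DEFAULT_TTL = 120.0  # assumed value for the default_ttl parameter, in seconds


def directive(cache_control, name):
    """Return the integer value of a Cache-Control directive, or None if absent."""
    pattern = r"(?:^|[\s,])" + re.escape(name) + r"\s*=\s*(\d+)"
    match = re.search(pattern, cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def initial_ttl(headers):
    """Pick the first available source, in the order listed on the slide."""
    cache_control = headers.get("Cache-Control", "")
    for name in ("s-maxage", "max-age"):
        value = directive(cache_control, name)
        if value is not None:
            return float(value)
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0.0, (expires - date).total_seconds())
    return DEFAULT_TTL


print(initial_ttl({"Cache-Control": "public, max-age=60, s-maxage=300"}))
print(initial_ttl({"Cache-Control": "max-age=60"}))
print(initial_ttl({}))
```

Whatever comes out of this step is only the starting point; the VCL in vcl_fetch can still overwrite beresp.ttl.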

set beresp.http.X-Url = req.url;
set beresp.http.X-Host = req.http.host;
set beresp.http.X-Varnish-Session = regsub(req.http.Cookie, "^.*?PHPSESSID=([^;]*);*.*$", "\1");
if (beresp.status != 200 && beresp.status != 404) {
    set beresp.ttl = 15s;
    return (hit_for_pass);
}
if (beresp.http.Set-Cookie) {
    remove beresp.http.Set-Cookie;
}
if (beresp.http.X-Varnish-Esi == "1") {
    set beresp.do_esi = true;
}
if (req.url ~ "\.(jpg|jpeg|gif|otf|png|ico|css|zip|tgz|gz|rar|bz2|pdf|txt|tar|wav|bmp|rtf|js|flv|swf|scripts)$") {
    set beresp.ttl = 180m;
}
return (deliver);

vcl_deliver

• Takes control just before a response is sent to the client

• We have the req and resp objects

• Executes after hit, miss and fetch, hit_for_pass or pass (but not pipe)

• Removal of all headers we set during the VCL flow is a good idea here

• We can also add headers here that should go to the client, but shouldn’t be in the cache

if (req.http.X-Varnish-Fake-Session) {
    call generate_session_expires;
    set resp.http.Set-Cookie = req.http.X-Varnish-Fake-Session +
        "; expires=" + resp.http.X-Varnish-Cookie-Expires + "; path=/";
    if (req.http.Host) {
        set resp.http.Set-Cookie = resp.http.Set-Cookie +
            "; domain=" + regsub(req.http.Host, ":\d+$", "");
    }
    set resp.http.Set-Cookie = resp.http.Set-Cookie + "; httponly";
    unset resp.http.X-Varnish-Cookie-Expires;
}
if (!client.ip ~ debug) {
    unset resp.http.X-Host;
    unset resp.http.X-Url;
    unset resp.http.X-Varnish-Session;
} else {
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}

return (deliver);

ACLs

acl purge {
    "localhost";
    "127.0.0.1";
}

acl debug {
    "192.168.0.128";
}

INVALIDATING CACHED OBJECTS

• We can control cached objects through HTTP requests to Varnish with some clever VCL-ing

• PURGE - we can purge a single object from the cache

• BAN - we can ban a selection of matching objects from the cache

• REFRESH - we can fetch a new copy of an object while the old one is still served in the meantime

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        error 404 "Not in cache";
    }
}

$cacheServerSocket = fsockopen($varnishHostname, 80, $errno, $errstr, 2);

$request  = "PURGE /something.htm HTTP/1.0\r\n";
$request .= "Host: www.varnished-site.com\r\n";
$request .= "Connection: Close\r\n\r\n";

fwrite($cacheServerSocket, $request);
$response = fgets($cacheServerSocket);
fclose($cacheServerSocket);
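The same request can be issued from any language that can open a socket. A hedged Python equivalent of the PHP above, split so that the request text can be inspected without a running Varnish; `build_purge_request` and `send_purge` are hypothetical helper names:

```python
import socket


def build_purge_request(host, path):
    """Build the same raw PURGE request as the PHP snippet."""
    return (
        "PURGE %s HTTP/1.0\r\n"
        "Host: %s\r\n"
        "Connection: Close\r\n\r\n" % (path, host)
    ).encode("ascii")


def send_purge(varnish_hostname, host, path, port=80, timeout=2.0):
    """Send the request and return the status line of the response."""
    with socket.create_connection((varnish_hostname, port), timeout=timeout) as conn:
        conn.sendall(build_purge_request(host, path))
        with conn.makefile("rb") as response:
            return response.readline().decode("ascii", "replace").strip()


print(build_purge_request("www.varnished-site.com", "/something.htm"))
```

With the PURGE handling from the VCL in place, `send_purge` would be expected to return a status line carrying 200, 404 or 405, depending on whether the object was cached and whether the caller is in the `purge` ACL.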

sub vcl_recv {
    if (req.request == "BAN") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        ban("obj.http.X-Host ~ " + req.http.host + " && obj.http.X-Url ~ " + req.url);
        error 200 "Banned";
    }
}

sub vcl_recv {
    if (req.request == "REFRESH") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        set req.request = "GET";
        set req.hash_always_miss = true;
    }
}
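The difference between PURGE and BAN is easy to blur, so here is a toy model of the two: a purge drops exactly one object, while a ban records a rule that is checked against older objects when they are next looked up. It is a simplified sketch of the idea, not Varnish's implementation; the data structures and function names are made up:

```python
import itertools
import re

clock = itertools.count(1)   # logical time, so ordering is deterministic
objects = {}                 # (host, url) -> {"body": ..., "stored_at": ...}
bans = []                    # list of (added_at, host_regex, url_regex)


def store(host, url, body):
    objects[(host, url)] = {"body": body, "stored_at": next(clock)}


def purge(host, url):
    """PURGE: drop exactly one object; report whether it was cached."""
    return objects.pop((host, url), None) is not None


def ban(host_regex, url_regex):
    """BAN: record a rule; matching objects are discarded when next looked up."""
    bans.append((next(clock), re.compile(host_regex), re.compile(url_regex)))


def lookup(host, url):
    obj = objects.get((host, url))
    if obj is None:
        return None
    for added_at, host_re, url_re in bans:
        # Only bans newer than the object apply to it.
        if added_at > obj["stored_at"] and host_re.search(host) and url_re.search(url):
            del objects[(host, url)]
            return None
    return obj["body"]


store("shop.example", "/products/1", "one")
store("shop.example", "/products/2", "two")
store("shop.example", "/about", "about")
ban("shop.example", "^/products/")
store("shop.example", "/products/3", "three")   # stored after the ban, so unaffected
print(lookup("shop.example", "/products/1"), lookup("shop.example", "/products/3"))
```

This is also why the vcl_fetch code stores X-Host and X-Url on the object: the ban expression in the VCL matches against obj.http.* headers, so the object has to carry them.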

COMMON PROBLEMS TO OVERCOME

• My pages are a mix of highly dynamic sections and mostly static stuff, and Varnish supposedly only caches whole pages => Use ESI

• I need to control/flush/refresh the cache without stopping/starting/killing/rebooting/pulling the cord/assaulting the datacenter and I prefer to do it from within my app => Set up PURGE/BAN/REFRESH in the VCL

• My visitors have unique stuff => Use the session cookie in vcl_hash to keep a unique copy per session

• Sessions => Use the generate-a-session-in-Varnish trick

• Cookies => Uhhh, don't use em?

• Statistics and tracking visitors => Use the memcached VMOD and process stuff asynchronously on the backend

OTHER STUFF

QUESTIONS?*

* answers not guaranteed to be available and/or true
