june8 presentation
Varnish – A brief introduction
Nicolas A. Bérard-Nault
June 15, 2011
Regular page view
Reverse proxy cached page view
So what is Varnish?
- Reverse proxy cache
- Designed from the ground up to be an HTTP accelerator solution
We will cover
- Default configuration and options
- ESI
- HTTP headers
- Keezmovies.com
  - Benchmarks
  - Use case
  - Problems & solutions
Configuring Varnish
Varnish uses a configuration file that is translated to C, compiled on the fly, and loaded as a shared library. The configuration format is called VCL (Varnish Configuration Language), a domain-specific language reminiscent of Perl.
If the VCL is not enough, you can configure using inline C and the VRT (Varnish Run Time) library.
For a full reference: http://www.varnish-cache.org/docs/2.1/tutorial/vcl.html
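As a minimal sketch of what inline C looks like, the fragment below logs a line to syslog for every request. It follows the C{ ... }C delimiters that appear later in these slides; the log message itself is only an illustration.

```vcl
C{
    #include <syslog.h>
}C

sub vcl_recv {
    C{
        /* Plain C, copied verbatim into the generated code. */
        syslog(LOG_INFO, "Request received");
    }C
}
```

Because inline C is pasted into the compiled configuration as-is, a mistake here can crash the Varnish child process, so it is best kept for things VCL cannot express.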
Step by step through the configuration
Back end definitions
backend www {
    .host = "www.example.com";
    .port = "http";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
    .probe = {
        .url = "/test.jpg";
        .timeout = 0.3 s;
        .window = 8;
        .threshold = 3;
    }
}
You can have as many backends as you want
Step by step through the configuration
Director definitions
director www_director random {
    { .backend = www1; .weight = 2; }
    { .backend = www2; .weight = 1; }
}
director www_director round-robin {
    { .backend = www1; }
    { .backend = www2; }
}
You can have as many directors as you want
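A director only takes effect once a request is routed to it. A minimal sketch, assuming the www_director name from the definitions above and the Varnish 2.1/3.0 syntax used in the rest of these slides:

```vcl
sub vcl_recv {
    /* Let the director pick www1 or www2 for this request. */
    set req.backend = www_director;
}
```

A single back end is selected the same way, by assigning its name to req.backend.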
Highly simplified flow chart of Varnish operations
Step by step through the configuration
vcl_recv: request is received
sub vcl_recv {
    if (req.restarts == 0) {
        if (req.http.x-forwarded-for) {
            set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
        } else {
            set req.http.X-Forwarded-For = client.ip;
        }
    }
    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }
    if (req.request != "GET" && req.request != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    return (lookup);
}
Be careful with your HTTP verbs…
Verb    Potency
GET     Nullipotent
POST    Non-idempotent
PUT     Idempotent
DELETE  Idempotent
But we always cheat…
vcl_hash
vcl_hash: create object hash for request
sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (hash);
}
vcl_hit, vcl_miss
vcl_pass: request not cacheable
vcl_miss: post-lookup, object does not exist in cache
vcl_hit: post-lookup, object exists in cache
sub vcl_pass {
    return (pass);
}
sub vcl_miss {
    return (fetch);
}
sub vcl_hit {
    return (deliver);
}
vcl_fetch
sub vcl_fetch {
    if (beresp.ttl <= 0s ||
        beresp.http.Set-Cookie ||
        beresp.http.Vary == "*") {
        set beresp.ttl = 120 s;
        return (hit_for_pass);
    }
    return (deliver);
}
vcl_fetch: post object fetched from back-end
Step by step through the configuration
vcl_deliver: object is to be delivered to client
sub vcl_deliver {
    return (deliver);
}
ESI (edge-side include)
Invented by Akamai, only a subset is supported by Varnish
Varnish supports include:
<div>Hello:<esi:include src="/getname.php" /></div>
Will be processed into:
<div>Hello:Roger Cyr</div>
ESI (edge-side include)
To enable ESI processing, use the esi keyword in vcl_fetch.
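A minimal sketch of that keyword in use, in Varnish 2.1 syntax. The /hello.html URL is hypothetical and stands for the page containing the include tag; /getname.php is the fragment from the example above, and both TTL values are arbitrary.

```vcl
sub vcl_fetch {
    if (req.url == "/hello.html") {
        /* Parse this response for ESI tags and cache the page for a while. */
        esi;
        set beresp.ttl = 1h;
    } elseif (req.url == "/getname.php") {
        /* The included fragment gets its own, shorter TTL. */
        set beresp.ttl = 1m;
    }
}
```

The page and the fragment are cached as separate objects, which is what lets them expire independently. In Varnish 3.0 the keyword is replaced by set beresp.do_esi = true;.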
ESI and gzip
Varnish WILL NOT be able to do ESI processing on gzipped backend responses. It will also not be able to ungzip an ESI response.
In all cases, ESIs and gzip are not a good mix. Better support is planned for Varnish 3.0.
HTTP headers
Varnish relies on HTTP headers to know what to cache and for how long.
This is done through the Cache-Control HTTP header.
Cache-Control: 30
Cache-Control: max-age=900
Cache-Control: no-cache
Cache-Control: must-revalidate
Read the HTTP RFC!
http://tools.ietf.org/html/rfc2616#section-14.9
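When the back end sends no caching headers at all, Varnish falls back to its default_ttl parameter (120 seconds unless changed). The sketch below shows one way to pick a different fallback in VCL instead; the 30 second value is an arbitrary assumption, not a recommendation from the talk.

```vcl
sub vcl_fetch {
    /* The back end expressed no caching preference: choose our own TTL. */
    if (!beresp.http.Cache-Control && !beresp.http.Expires) {
        set beresp.ttl = 30s;
    }
}
```

Responses that do carry Cache-Control or Expires keep the TTL Varnish derived from those headers.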
keezmovies.com
- Average of 13 million hits per day (~150 queries per second)
- Homepage gets a large part of the hits (~35%, ~53 queries per second)
- Logged-in traffic is a very, very, very small minority
Perfect candidate for full page caching
Some results for KM
Tested four configurations:
1) Apache + PHP
2) Apache + PHP + APC
3) Lighttpd + PHP + APC
4) Varnish
- Homepage (size = 90k, gzipped = 10k).
- Tested using Apache Benchmark with increasing concurrency.
But…
1) Content differs slightly for certain countries (notably, Germany)
2) Google Analytics cookies
3) And of course, not all GET requests are nullipotent
The good news is, two of these three problems are easy to tackle!
Problem #1: Geolocation
Essentially, each page has 2 versions:
1) German visitor & disclaimer not accepted
2) Rest of the world & German visitor who accepted disclaimer
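C{
    #include <dlfcn.h>

    /* Declarations assumed here; they are not visible on the slide. */
    static void *handle = NULL;
    static char *(*get_country_code)(const char *ip) = NULL;

    /* Runs once, when the compiled VCL shared library is loaded. */
    __attribute__((constructor)) void
    load_module()
    {
        /* … */
        handle = dlopen("/usr/lib/varnish/geoip.so", RTLD_NOW);
        if (handle != NULL) {
            get_country_code = dlsym(handle, "get_country_code");
        }
    }
}C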
The following code is added to vcl_recv
sub vcl_recv {
    C{
        char *cc = (*get_country_code)(VRT_IP_string(sp, VRT_r_client_ip(sp)));
        VRT_SetHdr(sp, HDR_REQ, "\017X-Country-Code:", cc, vrt_magic_string_end);
    }C
    if (req.http.Cookie ~ "age_verified.*") {
        set req.http.X-Age-Verified = "1";
    } else {
        set req.http.X-Age-Verified = "0";
    }
}
The PHP page is responsible for setting the age_verified cookie once the disclaimer is accepted
The following code is added to vcl_hash
sub vcl_hash {
    if (req.http.x-country-code == "DE" && req.http.x-age-verified == "0") {
        set req.hash += req.http.x-age-verified;
        set req.hash += req.http.x-country-code;
    }
}
You can download the Varnish GeoIP library here: http://www.varnish-cache.org/trac/wiki/GeoipUsingInlineC
It uses the Maxmind GeoIP library.
Problem #2: Google Analytics cookie
sub vcl_recv {
    if (req.http.Cookie) {
        if (req.http.Cookie ~ "user_cookie.*") {
            return (pass);
        }
        remove req.http.Cookie;
    }
}
Requests carrying a cookie we know to be useful are passed to the back end; for every other request, all cookies are removed.
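A more selective variant, sketched here as an alternative rather than as what KeezMovies ran, strips only the Google Analytics cookies and leaves everything else untouched. It assumes the analytics cookies follow the __utm naming (__utma, __utmb, __utmc, __utmz).

```vcl
sub vcl_recv {
    if (req.http.Cookie) {
        /* Remove every __utmX=value pair from the Cookie header. */
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|; ) *__utm.=[^;]+;? *", "\1");
        if (req.http.Cookie == "") {
            /* Nothing useful was left: drop the header so the request can be cached. */
            remove req.http.Cookie;
        }
    }
}
```

This is a blacklist: any cookie not matched by the pattern still makes the request uncacheable under the default vcl_recv, whereas the approach above treats every unknown cookie as disposable.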
Problem #3: GET requests with side effects
JSON UDP packets
Stats server
- Node.js server, communicating with database directly (could be communicating with website through API)
- Does batch queries
- Can handle and aggregate requests from many Varnish servers at the same time
- Bonus: can be used for many, many, many other things…
Core: http://github.com/nicobn/AlysObserver
Varnish module: http://github.com/nicobn/AlysVarnish
Side note: Your TTL is too high
KeezMovies: 53 qps on home page
Rapidly decreasing marginal utility
[Chart: % of saved requests in an hour as a function of TTL. x-axis: TTL (s), from 0.1 to 3600; y-axis: fraction of saved requests, from 0 to 1]
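The shape of the chart can be approximated with a simple model. This is an assumption made for illustration, not a formula from the talk: requests arrive at a steady rate of q per second, and each TTL window of T seconds costs exactly one back-end fetch. The fraction of requests saved is then:

```latex
\[
  s(T) = 1 - \frac{1}{qT}, \qquad qT \ge 1
\]
% With q = 53 requests per second (the KeezMovies home page):
%   s(0.1) \approx 0.811
%   s(1)   \approx 0.981
%   s(10)  \approx 0.998
%   s(60)  \approx 0.9997
```

Under this model a TTL of one second already saves about 98% of home page requests, and raising it to a minute only adds the remaining 2%. That is the rapidly decreasing marginal utility.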
Dr. Strangelove or how I learned to stop worrying and love low TTLs
Questions ?