Choosing a Proxy - Don’t roll the D20! Leif Hedstrom Cisco WebEx

Post on 24-Feb-2016

Page 1: Choosing a Proxy -      Don’t roll the D20!

Choosing a Proxy-

Don’t roll the D20!

Leif Hedstrom
Cisco WebEx

Page 2: Choosing a Proxy -      Don’t roll the D20!

Who am I?

• Unix developer since 1985
• Yeah, I’m really that old, I learned Unix on BSD 2.9
• Long time SunOS/Solaris/Linux user
• Mozilla committer (but not active now)
• VP of Apache Traffic Server PMC
• ASF member
• Overall hacker, geek and technology addict

[email protected]
@zwoop
+lhedstrom

Page 3: Choosing a Proxy -      Don’t roll the D20!

So which proxy cache should you choose?

Page 4: Choosing a Proxy -      Don’t roll the D20!

Plenty of Proxy Servers

PerlBal

Page 5: Choosing a Proxy -      Don’t roll the D20!

And plenty of “reliable” sources…

Page 6: Choosing a Proxy -      Don’t roll the D20!

Answer: the one that solves your problem!

http://mihaelasharkova.files.wordpress.com/2011/05/5steploop2.jpg

Page 7: Choosing a Proxy -      Don’t roll the D20!

But first…

• While you are still awake, and the coffee is fresh:

My crash course in HTTP proxy and caching!

Page 8: Choosing a Proxy -      Don’t roll the D20!

Forward Proxy

Page 9: Choosing a Proxy -      Don’t roll the D20!

Reverse Proxy

Page 10: Choosing a Proxy -      Don’t roll the D20!

Intercepting Proxy

Page 11: Choosing a Proxy -      Don’t roll the D20!

Why Cache is King

• The content fastest served is the data the user already has locally on his computer/browser
– This is near zero cost and zero latency!
• The speed of light is still a limiting factor
– Reduce the latency -> faster page loads
• Serving out of cache is computationally cheap
– At least compared to e.g. PHP or any other higher level page generation system
– It’s easy to scale caches horizontally
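
To put a number on the speed-of-light bullet, here is a small back-of-the-envelope sketch. It assumes (this figure is not from the slides) that a signal in optical fiber travels at roughly 200,000 km/s, about two thirds of the speed of light in vacuum, and it ignores routing, queuing and server time, so real round trips are slower than this lower bound.

```python
# Rough lower bound on round-trip time from signal propagation alone.
# Assumption (not from the slides): ~200,000 km/s in optical fiber.
FIBER_KM_PER_SEC = 200_000.0


def min_rtt_ms(distance_km: float) -> float:
    """Best-case round trip in milliseconds over a fiber path of distance_km."""
    return 2.0 * distance_km / FIBER_KM_PER_SEC * 1000.0


if __name__ == "__main__":
    # Illustrative distances only; they are not measurements from the talk.
    for label, km in [
        ("cache in the same metro area", 50),
        ("origin across a continent", 4000),
        ("origin on another continent", 10000),
    ]:
        print(f"{label}: at least {min_rtt_ms(km):.1f} ms per round trip")
```

A page load usually needs several round trips, so moving the content closer to the user multiplies the saving.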

Page 12: Choosing a Proxy -      Don’t roll the D20!

Choosing an intermediary

Page 13: Choosing a Proxy -      Don’t roll the D20!

Plenty of Proxy Servers

PerlBal

Page 14: Choosing a Proxy -      Don’t roll the D20!

Plenty of Free Proxy Servers

PerlBal

Page 15: Choosing a Proxy -      Don’t roll the D20!

Plenty of Free Proxy Servers

PerlBal

Page 16: Choosing a Proxy -      Don’t roll the D20!

Plenty of Free Caching Proxy Servers

Page 17: Choosing a Proxy -      Don’t roll the D20!

Choosing an intermediary

Page 18: Choosing a Proxy -      Don’t roll the D20!

The problem

• You can basically not buy a computer today with less than 2 CPUs or cores

• Things will only get “worse”!
– Well, really, it’s getting better

• Typical server deployments today have at least 8 – 16 cores
– How many of those can you actually use??
– And are you using them efficiently??

• NUMA turns out to be kind of a bitch…

Page 19: Choosing a Proxy -      Don’t roll the D20!

Solution 1: Multi-threading

Page 20: Choosing a Proxy -      Don’t roll the D20!

Problems with multi-threading

• It’s a wee bit difficult to get it right!

http://www.flickr.com/photos/stuartpilbrow/3345896050/

Page 21: Choosing a Proxy -      Don’t roll the D20!

Problems with multi-threading

Page 22: Choosing a Proxy -      Don’t roll the D20!

Solution 2: Event Processing

Page 23: Choosing a Proxy -      Don’t roll the D20!

Problems with Event Processing

• It hates blocking APIs and calls!
– Hating it back doesn’t help :/
• Still somewhat complicated
• It doesn’t scale on SMP by itself
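
Why an event loop hates blocking calls can be shown with a small Python asyncio sketch. This is an illustration only, not how any of the proxies in this deck are implemented; `blocking_lookup` is a hypothetical stand-in for a blocking API. A ticker task records timestamps every 10 ms, and the largest gap between ticks shows whether the loop stayed responsive.

```python
import asyncio
import time


def blocking_lookup(delay: float) -> str:
    """Hypothetical stand-in for a blocking API: sleeps, then returns."""
    time.sleep(delay)
    return "done"


async def ticker(ticks: list, interval: float, count: int) -> None:
    """Append a timestamp every `interval` seconds, `count` times."""
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


async def run(offload: bool, delay: float = 0.2) -> float:
    """Return the largest gap between ticks while the blocking call runs."""
    ticks = [time.monotonic()]
    task = asyncio.create_task(ticker(ticks, 0.01, 10))
    await asyncio.sleep(0)  # give the ticker a chance to start
    if offload:
        # Hand the blocking call to the default thread pool.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_lookup, delay)
    else:
        # Called directly: every other task on this loop stalls.
        blocking_lookup(delay)
    await task
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print("blocking in the loop, worst gap (s):", asyncio.run(run(offload=False)))
    print("offloaded to threads, worst gap (s):", asyncio.run(run(offload=True)))
```

Offloading to a thread pool keeps the loop responsive, at the cost of bringing threads back into the picture.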

Page 24: Choosing a Proxy -      Don’t roll the D20!

Where are we at?

            Apache TS        Nginx      Squid      Varnish
Processes   1                1 - <n>    1 - <n>    1
Threads     Based on cores   1          1          Lots
Evented     Yes              Yes        Yes        Yes *)

*) Can use blocking calls, with (large) thread pool

Page 25: Choosing a Proxy -      Don’t roll the D20!

Proxy Cache test setup

• AWS Large instances, 2 CPUs
• All on RFC 1918 network (“internal” net)
• 8GB RAM
• Access logging enabled to disk (except on Varnish)
• Software versions
– Linux v3.2.0
– Traffic Server v3.3.1
– Nginx v1.3.9
– Squid v3.2.5
– Varnish v3.0.3
• Minimal configuration changes
• Cache a real (Drupal) site

Page 26: Choosing a Proxy -      Don’t roll the D20!

ATS configuration

• etc/trafficserver/remap.config:

    map / http://10.118.154.58

• etc/trafficserver/records.config:

    CONFIG proxy.config.http.server_ports STRING 80

Page 27: Choosing a Proxy -      Don’t roll the D20!

Nginx configuration try 1, basically defaults (broken, don’t use)

worker_processes 2;
access_log logs/access.log main;

# On-disk cache: 8 MB key zone, 16 GB max size, entries dropped after 600 minutes idle
proxy_cache_path /mnt/nginx_cache levels=1:2 keys_zone=my-cache:8m
                 max_size=16384m inactive=600m;
proxy_temp_path /mnt/nginx_temp;

server {
    listen 80;

    location / {
        proxy_pass http://10.83.145.47/;
        proxy_cache my-cache;
    }
}

Page 28: Choosing a Proxy -      Don’t roll the D20!

Nginx configuration try 2 (works but really slow, 10x slower)

worker_processes 2;
access_log logs/access.log main;

proxy_cache_path /mnt/nginx_cache levels=1:2 keys_zone=my-cache:8m
                 max_size=16384m inactive=600m;
proxy_temp_path /mnt/nginx_temp;

# Compress in nginx on the way out
gzip on;

server {
    listen 80;

    location / {
        proxy_pass http://10.83.145.47/;
        proxy_cache my-cache;
        # Never ask the origin for a compressed response
        proxy_set_header Accept-Encoding "";
    }
}

Page 29: Choosing a Proxy -      Don’t roll the D20!

Nginx configuration try 3 (works and reasonably fast, but WTF!)

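worker_processes 2;
access_log logs/access.log main;

proxy_cache_path /mnt/nginx_cache levels=1:2 keys_zone=my-cache:8m
                 max_size=16384m inactive=600m;
proxy_temp_path /mnt/nginx_temp;

server {
    listen 80;

    # $ae is "gzip" when the client accepts gzip, otherwise empty
    set $ae "";
    if ($http_accept_encoding ~* gzip) {
        set $ae "gzip";
    }

    location / {
        proxy_pass http://10.83.145.47/;
        proxy_cache my-cache;
        # Strip conditional headers so the origin always sends a full response
        proxy_set_header If-None-Match "";
        proxy_set_header If-Modified-Since "";
        # Ask the origin for gzip or nothing, and cache each variant under its own key
        proxy_set_header Accept-Encoding $ae;
        proxy_cache_key $uri$is_args$args$ae;
    }

    location ~ /purge_it(/.*) {
        # Needs the third-party ngx_cache_purge module; zone and key must
        # match keys_zone and proxy_cache_key above
        proxy_cache_purge my-cache $1$is_args$args$ae;
    }
}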

Thanks to Chris Ueland at NetDNA for the snippet
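
What the `$ae` trick in try 3 buys you is easier to see outside of nginx syntax. The sketch below is an illustration in Python, not nginx code; the function names are made up for the example. It collapses whatever the client sends in Accept-Encoding down to either "gzip" or nothing, then appends that to the cache key, so the cache holds at most two variants per URL.

```python
import re


def normalized_encoding(accept_encoding: str) -> str:
    """Collapse any Accept-Encoding value to "gzip" or "" (the role of $ae)."""
    if re.search(r"gzip", accept_encoding or "", re.IGNORECASE):
        return "gzip"
    return ""


def cache_key(uri: str, args: str, accept_encoding: str) -> str:
    """Mirror of $uri$is_args$args$ae; $is_args is "?" only with a query string."""
    is_args = "?" if args else ""
    return f"{uri}{is_args}{args}{normalized_encoding(accept_encoding)}"


if __name__ == "__main__":
    print(cache_key("/node/1", "", "gzip, deflate"))
    print(cache_key("/node/1", "", ""))
    print(cache_key("/node/1", "page=2", "deflate, GZIP"))
```

Note that a plain pattern match like this also treats `gzip;q=0` (an explicit refusal) as accepting gzip, so this is a workaround, not a substitute for real Vary handling.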

Page 30: Choosing a Proxy -      Don’t roll the D20!

Squid configuration

http_port 80 accel
http_access allow all
cache_mem 4096 MB
workers 2
memory_cache_shared on
cache_dir ufs /mnt/squid 100 16 256
cache_peer 10.83.145.47 parent 80 0 no-query originserver

Page 31: Choosing a Proxy -      Don’t roll the D20!

Varnish configuration

backend default {
    .host = "10.83.145.47";
    .port = "80";
}

Page 32: Choosing a Proxy -      Don’t roll the D20!

Performance AWS 8KB HTML (gzip)

Page 33: Choosing a Proxy -      Don’t roll the D20!

Performance AWS 8KB HTML (gzip)

Page 34: Choosing a Proxy -      Don’t roll the D20!

Performance AWS 500 bytes JPG

Page 35: Choosing a Proxy -      Don’t roll the D20!

Performance AWS 500 bytes JPG

Page 36: Choosing a Proxy -      Don’t roll the D20!

Choosing an intermediary

Page 37: Choosing a Proxy -      Don’t roll the D20!

RFC 2616 is not optional!

• Neither is the new BIS revision!
• Understanding HTTP and how it relates to Proxy and Caching is important
– Or you will get it wrong! I promise.
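
One example of a rule that is easy to get wrong: RFC 2616 says that when a response carries both a `max-age` directive and an `Expires` header, `max-age` wins. Drupal sends exactly that combination (max-age=900 alongside an Expires date in 1978). The sketch below is a simplified illustration of that one rule; the function name is made up, and it deliberately ignores `s-maxage`, `no-cache`, the `Age` header and heuristic freshness.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict) -> int:
    """Seconds a response may be served from cache (simplified RFC 2616 rule)."""
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        # max-age overrides Expires, even an Expires date in the past.
        return int(match.group(1))
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return 0


if __name__ == "__main__":
    drupal = {
        "Date": "Wed, 12 Dec 2012 18:00:48 GMT",
        "Cache-Control": "public, max-age=900",
        "Expires": "Sun, 19 Nov 1978 05:00:00 GMT",
    }
    print(freshness_lifetime(drupal))
```

A cache that looks at `Expires` first would treat every one of these pages as already stale.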

Page 38: Choosing a Proxy -      Don’t roll the D20!

How things can go wrong: Vary!

$ curl -D - -o /dev/null -s --compressed http://10.118.73.168/
HTTP/1.1 200 OK
Server: nginx/1.3.9
Date: Wed, 12 Dec 2012 18:00:48 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 8051
Connection: keep-alive
X-Powered-By: PHP/5.4.9
X-Drupal-Cache: HIT
Etag: "1355334762-0-gzip"
Content-Language: en
X-Generator: Drupal 7 (http://drupal.org)
Cache-Control: public, max-age=900
Last-Modified: Wed, 12 Dec 2012 17:52:42 +0000
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Vary: Cookie,Accept-Encoding
Content-Encoding: gzip

Page 39: Choosing a Proxy -      Don’t roll the D20!

How things can go wrong: Vary!

$ curl -D - -o /dev/null -s http://10.118.73.168/
HTTP/1.1 200 OK
Server: nginx/1.3.9
Date: Wed, 12 Dec 2012 18:00:57 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 8051
Connection: keep-alive
X-Powered-By: PHP/5.4.9
X-Drupal-Cache: HIT
Etag: "1355334762-0-gzip"
Content-Language: en
X-Generator: Drupal 7 (http://drupal.org)
Cache-Control: public, max-age=900
Last-Modified: Wed, 12 Dec 2012 17:52:42 +0000
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Vary: Cookie,Accept-Encoding
Content-Encoding: gzip

Note: this request did not advertise gzip support, yet the cached gzip response came back anyway. EPIC FAIL!
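
What a cache is supposed to do with `Vary` can be sketched in a few lines of Python. This is a toy illustration of the RFC 2616 rule (a stored response may only be reused when the request headers named in `Vary` match those of the request that stored it), not code from any of the proxies discussed; the class and method names are made up, and `Vary: *` is not handled.

```python
from typing import Optional


def _lower(headers: dict) -> dict:
    """Normalize header names to lowercase and trim values."""
    return {k.lower(): v.strip() for k, v in headers.items()}


class VaryAwareCache:
    """Toy cache keeping one variant per distinct set of selecting headers."""

    def __init__(self) -> None:
        self._variants: dict = {}  # url -> list of (selecting headers, body)

    def store(self, url: str, request_headers: dict,
              response_headers: dict, body: bytes) -> None:
        req = _lower(request_headers)
        vary = _lower(response_headers).get("vary", "")
        names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
        # Remember the request values of every header named in Vary.
        selecting = tuple((name, req.get(name, "")) for name in names)
        self._variants.setdefault(url, []).append((selecting, body))

    def lookup(self, url: str, request_headers: dict) -> Optional[bytes]:
        req = _lower(request_headers)
        for selecting, body in self._variants.get(url, []):
            if all(req.get(name, "") == value for name, value in selecting):
                return body
        return None  # no matching variant: go to the origin


if __name__ == "__main__":
    cache = VaryAwareCache()
    cache.store("/", {"Accept-Encoding": "gzip"},
                {"Vary": "Cookie,Accept-Encoding", "Content-Encoding": "gzip"},
                b"<gzip bytes>")
    print(cache.lookup("/", {"Accept-Encoding": "gzip"}))
    print(cache.lookup("/", {}))
```

With this check in place, the second curl request above would miss the cache and be fetched uncompressed from the origin.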

Page 40: Choosing a Proxy -      Don’t roll the D20!

What type of proxy do you need?

• Of our candidates, only two fully support all proxy modes!

Page 41: Choosing a Proxy -      Don’t roll the D20!

CoAdvisor HTTP protocol quality tests for reverse proxies

49%

81%

51%

68%

Page 42: Choosing a Proxy -      Don’t roll the D20!

CoAdvisor HTTP protocol quality tests for reverse proxies

25%

6%

27%

15%

Page 43: Choosing a Proxy -      Don’t roll the D20!

Choosing an intermediary

Page 44: Choosing a Proxy -      Don’t roll the D20!

My subjective opinions

Page 45: Choosing a Proxy -      Don’t roll the D20!

ATS – The good

• Good HTTP/1.1 support, including SSL
• Tunes itself very well to the system / hardware at hand
• Excellent cache features and performance
– Raw disk cache is fast and resilient
• Extensible plugin APIs, quite a few plugins
• Used and developed by some of the largest Web companies in the world

Page 46: Choosing a Proxy -      Don’t roll the D20!

ATS – The bad

• Load balancing is incredibly lame
• Seen as difficult to set up (I obviously disagree)
• Developer community is still too small
• Code is complicated
– By necessity? Maybe …

Page 47: Choosing a Proxy -      Don’t roll the D20!

ATS – The ugly

• Too many configuration files!
• There’s still legacy code that has to be replaced or removed
• Not a whole lot of commercial support
– But there’s hope (e.g. OmniTI recently announced packaged support)

Page 48: Choosing a Proxy -      Don’t roll the D20!

Nginx – The good

• Easy to understand the code base, and software architecture
– Lots of plugins available, including SPDY
• Excellent Web and Application server
– E.g. Nginx + fpm (fcgi) + PHP is the awesome, according to a very reputable source
• Commercial support available from the people who wrote and know it best. Huge!

Page 49: Choosing a Proxy -      Don’t roll the D20!

Nginx – The bad

• Adding extensions implies rebuilding the binary

• By far the most configurations required “out of the box” to even do anything remotely useful

• It does not make good attempts to tune itself to the system

• No good support for conditional requests
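
For reference, this is roughly what handling a conditional request from cache looks like. The Python sketch below is a simplified illustration, not any proxy's actual logic; the function names are made up. It answers 304 when the client's `If-None-Match` matches the cached ETag, falls back to `If-Modified-Since` only when no `If-None-Match` is present, and splits the ETag list naively on commas.

```python
from email.utils import parsedate_to_datetime


def _opaque(tag: str) -> str:
    """Strip whitespace and a weak-validator prefix (W/) from an entity tag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def status_for(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's copy is still good, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        if if_none_match.strip() == "*":
            return 304
        candidates = {_opaque(t) for t in if_none_match.split(",")}
        return 304 if _opaque(etag) in candidates else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            cached = parsedate_to_datetime(last_modified)
            if cached <= parsedate_to_datetime(if_modified_since):
                return 304
        except (TypeError, ValueError):
            pass  # unparsable or incomparable dates: ignore the condition
    return 200


if __name__ == "__main__":
    etag = '"1355334762-0-gzip"'
    modified = "Wed, 12 Dec 2012 17:52:42 +0000"
    print(status_for({"If-None-Match": etag}, etag, modified))
    print(status_for({}, etag, modified))
```

A proxy that cannot do this from its cache has to send the full body every time, or bother the origin for every revalidation.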

Page 50: Choosing a Proxy -      Don’t roll the D20!

Nginx – The ugly

• The cache is a joke! Really
• The protocol support as an HTTP proxy is rather poor. It fares the worst in the tests, and can be outright wrong if you are not very careful
• From docs: “nginx does not handle "Vary" headers when caching.” Seriously?

Page 51: Choosing a Proxy -      Don’t roll the D20!

Squid – The Good

• Has by far the most HTTP features of the bunch. I mean, by far, nothing comes even close

• It also is the best HTTP conformant proxy today. It has the best scores in the CoAdvisor tests, by a wide margin

• The features are mature, and used pretty much everywhere

• Works pretty well out of the box

Page 52: Choosing a Proxy -      Don’t roll the D20!

Squid – The Bad

• Old code base
• Cache is not particularly efficient
• Has traditionally been prone to instability
• Complex configurations
– At least IMO, I hate it

Page 53: Choosing a Proxy -      Don’t roll the D20!

Squid – The Ugly

• SMP is quite an afterthought
– Duct tape
• Why spend so many years rewriting from v2.x to v3.x without actually addressing some of the real problems? Feels like a boat has been missed…
• Not very extensible
– Typically you write external “helper” processes, similar to fcgi. This is not particularly flexible, nor powerful (you cannot do everything you’d want as a helper, so you might have to rewrite the Squid core)

Page 54: Choosing a Proxy -      Don’t roll the D20!

Varnish – The Good

• VCL
• And did I mention VCL? Pure genius!
• Very clever logging mechanism
• ESI is cool, even with its limited subset
– Not unique to Varnish though
• Support from several good commercial entities

Page 55: Choosing a Proxy -      Don’t roll the D20!

Varnish – The Bad

• Letting the kernel do the hard work might seem like a good idea on paper, but perhaps not so great in the real world. But let’s not go into a BSD vs Linux kernel war …
• Persistent caching seems like an afterthought at best
• No good support for conditional requests
• What impact does “real” logging have on performance?

Page 56: Choosing a Proxy -      Don’t roll the D20!

Varnish – The Ugly

• There are a lot of threads in this puppy!
• No SSL. And presumably, there never will be?
– So what happens with SPDY / HTTP2 ?
• Protocol support is weak, without a massive amount of VCL.
• And, you probably will need a PhD in VCL!
– There’s a lot of VCL hacking to do to get it to behave well

Page 57: Choosing a Proxy -      Don’t roll the D20!
Page 58: Choosing a Proxy -      Don’t roll the D20!

Summary

• Please understand your problem
– Don’t listen to @zwoop on twitter…

• Performance in itself is rarely a key differentiator; latency, features and correctness are

• But most important, use a proxy, preferably a good one, if you run a serious web server

Page 59: Choosing a Proxy -      Don’t roll the D20!

Performance AWS 8KB HTML (gzip)

Page 60: Choosing a Proxy -      Don’t roll the D20!

If it ain’t broken, don’t fix it
But by all means, make it less sucky!

Page 61: Choosing a Proxy -      Don’t roll the D20!

However, when all you have is a hammer…

http://www.flickr.com/photos/aai/6936657289/