ci_conf 2012: scaling - chris miller

Post on 14-Aug-2015

Going Big: Scalability

Who am I?

• Chris Miller

• Huffington Post - Senior Developer
• CMS platform and API

• Started in systems/network administration before moving to code

What is Huffington Post?

• #87 most popular site in the world (Alexa)
• #3 most popular news site in the world (Alexa)
• #19 most popular US site (Alexa)

• More traffic than nytimes.com

Our Platform: Today

• Everything! No, really.

• Perl: CMS core
• PHP “layer” integrated on top of Perl code
• MySQL data storage
• MongoDB for comments storage
• Hadoop for internal statistical analysis
• Memcache for lightweight caching
• Redis for more structured data types
• Varnish for caching!

Our Platform: Tomorrow

• Rethink tools and platform from the ground up
• Building a new API
  – Yes, OAuth 2.0!
  – Complete REST approach
  – Will be public!

• We can’t rewrite everything at once, so the API build has 4 phases:
  – Build “bridge” middleware to allow access to existing functionality
  – Refactor backend edit/admin tools
  – Refactor frontend to use the API
  – Transparently, and calmly, refactor old code while maintaining API interfaces

So what about CI?

• New API is built on CodeIgniter
  – Using Phil’s REST library as a starting point

• Thanks Phil!

• Backend editorial tools are being built on CI

• We love CI
  – But it isn’t our only framework
  – Different tools work better for different teams
  – We use what works. You should too.

How we scale

• CDN: Akamai
• 80%+ hit rate
• Amazon S3 for origin of static files

• Basic page layout/content is generated to a flat file
• These files contain some dynamic content, in PHP
• Having the basic page as a flat file means less overhead to load it
• It also means that for certain changes, we have to "regenerate" the page. Ugh.
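A minimal Python sketch of the two-stage idea, assuming a hypothetical `LAYOUT` template in which a `$comment_count` placeholder stands in for the dynamic PHP portion of the real flat files. The expensive render happens once in `generate_flat_file`; each request only fills in the small dynamic part in `serve`. Changing the layout means calling `generate_flat_file` again, which is the "regenerate" step.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical layout. "$$comment_count" survives the first substitution as
# "$comment_count", acting as the per-request (dynamic) placeholder.
LAYOUT = Template(
    "<html><body><h1>$title</h1>$body<p>Comments: $$comment_count</p></body></html>"
)


def generate_flat_file(path: Path, title: str, body: str) -> None:
    """Expensive step: run on publish, and again whenever the layout changes."""
    path.write_text(LAYOUT.substitute(title=title, body=body), encoding="utf-8")


def serve(path: Path, comment_count: int) -> str:
    """Cheap step: read the pre-built page and fill in only the dynamic part."""
    flat = path.read_text(encoding="utf-8")
    # safe_substitute leaves any other "$..." text in the article untouched.
    return Template(flat).safe_substitute(comment_count=comment_count)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "story.html"
        generate_flat_file(page, "Hello", "<p>Story text</p>")
        print(serve(page, 42))
```

The trade-off from the slide shows up directly: `serve` is trivial, but anything baked in by `generate_flat_file` is frozen until the file is rebuilt.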

Varnish

• HTTP caching reverse proxy (“HTTP Accelerator”)

• Caching layer in front of your web server

• Stores complete responses in memory
• If a request is already cached, serves it from memory
  – Otherwise, forwards it to the web server, then caches the response

• Works well with the Linux kernel, delegating memory allocation and management to the OS, where it belongs
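To make the hit/miss flow concrete, here is a toy Python model of a caching reverse proxy; it is not Varnish itself. `ReverseProxyCache`, its `backend` callable and the single fixed TTL are assumptions for illustration. It keeps whole responses in a dictionary, serves unexpired entries without touching the backend, and on a miss forwards the request and caches the result.

```python
import time
from typing import Callable, Dict, Tuple


class ReverseProxyCache:
    """Toy caching reverse proxy: whole responses kept in memory with a TTL."""

    def __init__(self, backend: Callable[[str], str], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend  # stands in for the web server behind the cache
        self.ttl = ttl
        self.clock = clock
        self.store: Dict[str, Tuple[float, str]] = {}  # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]  # hit: served from memory, backend untouched
        self.misses += 1
        body = self.backend(url)  # miss or expired: forward to the backend...
        self.store[url] = (now + self.ttl, body)  # ...then cache the full response
        return body
```

Real Varnish keys objects on more than the URL (by default the Host header as well) and evicts objects under memory pressure; the sketch ignores both.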

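Controlling Varnish

• Set custom TTLs for content, driven by an X-HP-Cache-Control response header (this runs in vcl_fetch):

    if (beresp.http.X-HP-Cache-Control ~ "s-maxage") {
        // Reduce the header to just the s-maxage number, e.g. "300".
        set beresp.http.X-HP-Cache-Control = regsub(beresp.http.X-HP-Cache-Control, "^.*s-maxage=([0-9]+).*", "\1");
        // Set the TTL with inline C: read the header back and convert it to a number.
        // "\023" is the header name's length in octal (19 characters, colon included).
        C{
            char *ttl;
            ttl = VRT_GetHdr(sp, HDR_BERESP, "\023X-HP-Cache-Control:");
            VRT_l_beresp_ttl(sp, atoi(ttl));
        }C
        set beresp.http.X-Cacheable = "CUSTOM: " + beresp.ttl;
    } elsif (beresp.http.X-HP-Cache-Control ~ "(no-cache|private)" || beresp.http.pragma ~ "no-cache") {
        // The backend asked for this response not to be cached.
        set beresp.ttl = 0s;
        set beresp.http.X-Cacheable = "NO-CACHE";
    } else {
        // Fallback TTL for everything else.
        set beresp.http.X-Cacheable = "DEFAULT: 30s";
        set beresp.ttl = 30s;
    }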
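The branch logic above can be checked outside Varnish with a small Python function that mirrors it. `choose_ttl` and its return shape are hypothetical; it reads the same `X-HP-Cache-Control` and `Pragma` headers, and the CUSTOM label is simplified to the integer TTL.

```python
import re
from typing import Dict, Tuple

DEFAULT_TTL = 30  # seconds, as in the "DEFAULT: 30s" branch


def choose_ttl(headers: Dict[str, str]) -> Tuple[int, str]:
    """Return (ttl_seconds, X-Cacheable label), following the VCL branches in order."""
    h = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive
    cc = h.get("x-hp-cache-control", "")
    pragma = h.get("pragma", "")
    if "s-maxage" in cc:
        m = re.search(r"s-maxage=([0-9]+)", cc)
        ttl = int(m.group(1)) if m else 0  # like atoi() on a non-numeric value
        return ttl, f"CUSTOM: {ttl}"
    if re.search(r"no-cache|private", cc) or "no-cache" in pragma:
        return 0, "NO-CACHE"
    return DEFAULT_TTL, "DEFAULT: 30s"
```

Because the s-maxage test comes first, a header such as `s-maxage=60, private` still gets a 60-second TTL, exactly as in the VCL.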

Controlling Varnish

• Refreshing content

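    sub process_refresh_requests {
        // Treat a REFRESH request as a GET that always misses the cache,
        // so Varnish fetches a fresh copy from the backend and caches it.
        if (req.request == "REFRESH") {
            set req.request = "GET";
            set req.hash_always_miss = true;
        }
    }

• This is called early in vcl_recv (call process_refresh_requests;)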
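A toy Python model of the REFRESH trick, under the assumption that the cache is just a URL-to-body dictionary (`RefreshableCache` and `handle` are hypothetical names). A REFRESH is rewritten to a GET that skips the lookup, so the backend is asked for a fresh copy even when a cached one exists. In this model the fresh body simply overwrites the old entry.

```python
from typing import Callable, Dict


class RefreshableCache:
    """Toy cache where REFRESH means 'GET, but force a cache miss'."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend
        self.store: Dict[str, str] = {}

    def handle(self, method: str, url: str) -> str:
        always_miss = False
        if method == "REFRESH":   # mirrors process_refresh_requests
            method = "GET"
            always_miss = True    # like req.hash_always_miss = true
        if method != "GET":
            raise ValueError(f"unsupported method: {method}")
        if not always_miss and url in self.store:
            return self.store[url]  # normal hit
        body = self.backend(url)    # miss or forced miss: fetch a fresh copy
        self.store[url] = body      # fresh copy replaces the stale one
        return body
```

Compared with purging, this keeps the page cached the whole time: readers keep getting the old copy until the new one has been fetched.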

Edge Side Includes

• Include cached content blocks into pages

    <html><body>
      <esi:include src="http://example.com/my_page1.html" alt="http://example.com/my_page2.html" onerror="continue" />
    </body></html>

Edge Side Includes

• How to use ESI:
  – Give complicated blocks their own, independently accessible URIs
  – Create a “template” file with ESI includes to bring the page together
• Why this is powerful
  – If multiple pages use different combinations of page components, some components may already be cached
  – Reduces the number of times an entire page must be served; serve only the components needed
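A rough sketch of the assembly step in Python, assuming fragments are identified by their `src` URI and cached in a plain dictionary; `assemble` is a hypothetical helper and it ignores `alt` and `onerror`. Shared components (a header used on every page, say) are fetched once and then reused across templates. In Varnish itself, ESI parsing has to be switched on for the template response (in Varnish 3, `set beresp.do_esi = true;` in vcl_fetch).

```python
import re
from typing import Callable, Dict

# Matches <esi:include src="..." ... /> and captures the src URI.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"[^>]*/>')


def assemble(template: str, cache: Dict[str, str], fetch: Callable[[str], str]) -> str:
    """Replace each ESI include with its fragment body.

    Fragments already in `cache` are reused; only missing ones are fetched.
    """
    def fill(match: "re.Match[str]") -> str:
        src = match.group(1)
        if src not in cache:
            cache[src] = fetch(src)  # only uncached components hit the backend
        return cache[src]

    return ESI_INCLUDE.sub(fill, template)
```

Assembling a second page that shares components costs only the fetches for the components it does not share.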

Varnish Tricks

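• Intelligently purge the cache when your content changes
  – Allows you to increase TTLs without fear of serving outdated content

    // In vcl_recv. "purgers" is an ACL declared elsewhere in the VCL (acl purgers { ... }).
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Method not allowed";
        }
        // Look the object up; the purge itself happens in vcl_hit/vcl_miss.
        return (lookup);
    }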
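Here is a Python sketch of the same guard, assuming a hypothetical `PurgeableCache` whose ACL is a list of networks (the counterpart of `acl purgers`). PURGE requests from outside the ACL get a 405, and allowed ones drop the cached object so the next request goes to the backend. Returning 404 when nothing was cached is a choice made for this model.

```python
import ipaddress
from typing import Dict, Iterable, Tuple


class PurgeableCache:
    """Toy cache that only lets whitelisted clients purge entries."""

    def __init__(self, purgers: Iterable[str]) -> None:
        # Hypothetical ACL, analogous to a VCL `acl purgers { ... }` block.
        self.purgers = [ipaddress.ip_network(n) for n in purgers]
        self.store: Dict[str, str] = {}

    def allowed(self, client_ip: str) -> bool:
        addr = ipaddress.ip_address(client_ip)
        return any(addr in net for net in self.purgers)

    def purge(self, client_ip: str, url: str) -> Tuple[int, str]:
        if not self.allowed(client_ip):
            return 405, "Method not allowed"  # same response as the VCL guard
        if self.store.pop(url, None) is not None:
            return 200, "Purged"
        return 404, "Not in cache"
```

With purging on publish in place, long TTLs stop being risky: stale pages live only until the CMS sends its PURGE.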

Other Scaling Tips

• Hardware SSL offloading is your friend
• Consider mod_php
  – CGI has huge overhead
  – CGI/suEXEC has huge security advantages
  – FastCGI is a happy medium for some

Other Scaling Tips

• Don’t try to do everything on one server/cluster
  – Splitting your application is OK
  – 1 cluster for frontend, 1 server/cluster for backend, etc.

• Keep an open mind about technologies, platforms, and tools

One More Thing…

(sorry, I couldn’t resist)

Guilds!

• What a guild is:
  – Groups of people around a topic
  – Membership/participation is encouraged, but not required
  – Think of it as an internal Meetup

• Join to learn new things
• Join to talk about things you are interested in

• Examples: PHP, Front End, Python, Ruby, Management, Platform/Architecture, Big Data, etc.

Guilds!

• Experts to solve technology-specific problems
  – Example: a front-end SWAT team to improve page load time hurt by slow or excessive JS

• Collectively give back to the community around your technology

• Help others learn, and learn from others

• Meet people on other teams

Guilds!

• Try it out

¿Preguntas?

Questions?

Perguntas?

Chris Miller

chris.miller@huffingtonpost.com

@ee99ee

(P.S. – We’re hiring in NYC)
