Transcript
Page 1: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Surge 2013

Page 2: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

We’re going to talk about this:

Whitney Houston Death: February 11, 2012

Page 3: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

… and this:

Page 4: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Without your site going down…

Page 5: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Who Am I?

• Team Lead of CBC.ca System Administration team.

• Been with CBC for over 11 years (since 2002).

• @blakecrosby

[email protected] / [email protected]

Page 6: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?
Page 7: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?
Page 8: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Let’s go back in time… way back

Page 9: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

2010

Page 10: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

2008

Page 11: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

2007

Page 12: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

2006

Page 13: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

2005

Page 14: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

2004

Page 15: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

2003

Page 16: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

“News stories must appear on the site as fast as possible!”

- Every Journalist at CBC

Page 17: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?
Page 18: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?
Page 19: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?
Page 20: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?
Page 21: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

This architecture doesn’t work for news websites.

Page 22: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

This was an important lesson for CBC

Page 23: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Breaking news traffic: it’s unpredictable and short-lived.

Page 24: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

From 12k hits/s to 30k hits/s

Royal Baby: July 22, 2013

Page 25: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

From 1 Gbps to 2.5 Gbps in ~7 minutes.

Boston Marathon Bombing: April 15, 2013

Page 26: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

From 1 Gbps to 14 Gbps in ~10 minutes.

Whitney Houston Death: February 11, 2012

Page 27: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Challenges we (or you) face

Page 28: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

It is too expensive to build out infrastructure for traffic levels that are sustained for less than 1% of the year.

Page 29: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Content must be flexible to changing traffic conditions

Page 30: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

We have valuable information that users need in a crisis.

Page 31: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

“News stories must appear on the site as fast as possible!”

- Every Journalist at CBC

Page 32: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

How we fixed this problem (back in 2003, remember?)

Page 33: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?
Page 34: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Save everything to disk.

Page 35: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Advantages

• Observes the principle of least surprise.

• Fast

• Takes advantage of OS and FS caches

• Easy to turn off certain site features.

Page 36: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?
Page 37: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Using SSIs (Server Side Includes)

• Primitive, but fast and secure.

• Can turn off site features or change look and feel by editing one file.

• All pages are updated instantly, without having to wait for pages to be republished.
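A minimal sketch of the idea behind SSIs, written in Python rather than as real web-server configuration. The `expand_includes` helper and the fragment path are hypothetical, chosen only for illustration. A real web server performs this expansion itself when it serves the page. The point is that every page pulls a shared fragment at request time, so editing that one fragment changes all pages at once.

```python
import re

# Matches directives of the form <!--#include virtual="/path" -->
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')

def expand_includes(page: str, fragments: dict) -> str:
    """Replace each include directive with the fragment it names.

    A missing or empty fragment expands to an empty string, which is how
    a feature can be switched off by emptying a single shared file.
    """
    return INCLUDE.sub(lambda m: fragments.get(m.group(1), ""), page)

# Hypothetical page and shared fragment.
page = '<h1>Story</h1><!--#include virtual="/includes/comments.html" -->'
fragments = {"/includes/comments.html": "<div>comments widget</div>"}

with_feature = expand_includes(page, fragments)

# "Turn off" the feature site-wide by emptying the one shared fragment.
fragments["/includes/comments.html"] = ""
without_feature = expand_includes(page, fragments)
```

The stored page itself is never republished. Only the fragment changes.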

Page 38: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Use a Content Delivery Network

Page 39: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Use Conditional GETs (If-Modified-Since)

Page 40: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?
Page 41: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Using Expiry and Validation

• Object has a TTL of 30 seconds.

• Object has a last modified time of Jan 1, 2013 00:00:00

• Once TTL has expired, cache/CDN will check if object is updated.

• Origin will return "304 Not Modified" and cache will reset TTL and serve object from cache store.

• The 30-second TTL protects the origin from a deluge of If-Modified-Since requests.
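The expiry-then-validate cycle above can be sketched as a small simulation. This is an illustration under stated assumptions, not how any particular CDN works: the `Origin` and `EdgeCache` classes are hypothetical, time is passed in explicitly, and a single edge node receives one request per second for 90 seconds.

```python
from dataclasses import dataclass

TTL_SECONDS = 30  # the TTL used on the slide

@dataclass
class Origin:
    body: str
    last_modified: float
    full_responses: int = 0          # 200 responses with a body
    not_modified_responses: int = 0  # 304 responses without a body

    def get(self, if_modified_since=None):
        """Return (status, body, last_modified) for a conditional GET."""
        if if_modified_since is not None and self.last_modified <= if_modified_since:
            self.not_modified_responses += 1
            return 304, None, self.last_modified
        self.full_responses += 1
        return 200, self.body, self.last_modified

class EdgeCache:
    def __init__(self, origin):
        self.origin = origin
        self.body = None
        self.last_modified = None
        self.expires_at = 0.0

    def get(self, now):
        if self.body is not None and now < self.expires_at:
            return self.body  # still fresh: the origin is not contacted
        status, body, last_modified = self.origin.get(self.last_modified)
        if status == 200:
            self.body, self.last_modified = body, last_modified
        self.expires_at = now + TTL_SECONDS  # reset the TTL on 200 or 304
        return self.body

origin = Origin(body="story v1", last_modified=1000.0)
edge = EdgeCache(origin)
for t in range(90):  # one user request per second for 90 seconds
    edge.get(now=2000.0 + t)
```

In this run the 90 user requests reach the origin three times: one full response to fill the cache, then one 304 each time the TTL expires.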

Page 42: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?
Page 43: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Use Last Mile Acceleration (GZIP Compression)
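A rough illustration of why compression helps on the last mile, using Python's standard `gzip` module. The markup below is made up, and repetitive markup like this compresses far better than a typical real page would, so treat the ratio as illustrative only.

```python
import gzip

# Hypothetical, highly repetitive HTML standing in for a page body.
html = ('<li class="headline"><a href="/news/story">Headline</a></li>\n' * 500).encode("utf-8")

compressed = gzip.compress(html)
ratio = len(compressed) / len(html)

# Decompressing must give back exactly the original bytes.
roundtrip = gzip.decompress(compressed)
```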

Page 44: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Use persistent HTTP connections

Page 45: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Use Appropriate Cache TTLs. Keep them simple!

Page 46: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Keep tunable options at the origin

Page 47: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Move personalization to the client

Page 48: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Outcomes (Where we are now in 2013)

Page 49: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Outcomes

• 2003 to 2010 – No need to grow origin

• 2010 to today – 9 origin web servers

• HP DL360 G7

• Average 45-50% CPU utilization

• Capital cost for hardware? $15,000!

Page 50: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Our secret sauce (or how to serve 800M requests a day from 9 web servers)

Page 51: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Offload (Bandwidth)

Page 52: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Offload (Hits)

Page 53: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Scaling for Unpredictable Events

Page 54: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Checking the last time a file has changed is faster than delivering that file to a user.

Page 55: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Conditional GETs (304s) will save you.
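A minimal sketch of the origin side of a conditional GET for a file saved to disk. The `respond` function is hypothetical and stands in for what a web server does for static files. Deciding between 200 and 304 only needs the file's modification time, so the body is read only when it has to be sent.

```python
import os
import tempfile
from email.utils import formatdate, parsedate_to_datetime

def respond(path, if_modified_since=None):
    """Return (status, headers, body) for a static file on disk."""
    mtime = int(os.stat(path).st_mtime)  # HTTP dates have one-second resolution
    headers = {"Last-Modified": formatdate(mtime, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None and mtime <= since:
            return 304, headers, b""  # no need to read or send the file
    with open(path, "rb") as f:
        return 200, headers, f.read()

# Usage: the first request gets the file, the second revalidates it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"<html>story</html>")
status1, headers1, body1 = respond(tmp.name)
status2, headers2, body2 = respond(tmp.name, headers1["Last-Modified"])
os.unlink(tmp.name)
```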

Page 56: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Make sure users don’t have to search for content

Page 57: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Increase your TTLs
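A back-of-the-envelope model of why a longer TTL relieves the origin. It assumes each edge node revalidates each popular object at most once per TTL. The node and object counts are invented for illustration and do not come from the talk, and real CDNs with tiered caches would send fewer requests than this upper bound.

```python
def max_revalidations_per_second(edge_nodes: int, hot_objects: int, ttl_seconds: float) -> float:
    """Upper bound on origin revalidation requests per second in this simple model."""
    return edge_nodes * hot_objects / ttl_seconds

# Hypothetical figures: 1000 edge nodes, 50 popular objects.
baseline = max_revalidations_per_second(1000, 50, ttl_seconds=30)
raised = max_revalidations_per_second(1000, 50, ttl_seconds=120)
```

In this model, raising the TTL from 30 to 120 seconds cuts the worst-case revalidation load on the origin to a quarter, at the cost of content being up to 120 seconds stale.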

Page 58: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Turn off dynamic components

Page 59: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Scaling for predictable events

Page 60: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Predicting traffic levels is impossible

Page 61: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Some (loose) rules.

• Scheduled events don't peak as high as unpredictable ones.

• Scheduled events last longer, so increase in traffic is spread out over hours, days, or weeks.

• Scheduled events are more "niche", unlike breaking news, where everyone wants to know what's going on.

• Might have to worry about 95/5 (95th-percentile) billing and bandwidth overages.
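The 95/5 concern in the last rule can be made concrete with a small sketch. It assumes the common convention of 5-minute bandwidth samples over a 30-day month, with the top 5% of samples discarded before billing. Actual contracts vary, and the traffic figures below are illustrative (the 14 Gbps spike echoes the earlier slide, the rest is invented).

```python
def billable_rate(samples_mbps):
    """95th-percentile ("95/5") rate: drop the top 5% of samples, bill the highest remaining one."""
    ordered = sorted(samples_mbps)
    n = len(ordered)
    index = (95 * n + 99) // 100 - 1  # ceil(0.95 * n) - 1, using integer arithmetic
    return ordered[index]

samples_per_month = 30 * 24 * 12                    # 5-minute samples in 30 days
free_burst_samples = samples_per_month * 5 // 100   # samples that get discarded
free_burst_hours = free_burst_samples * 5 / 60

# A one-hour breaking-news spike to 14 Gbps on a 1 Gbps baseline.
short_spike = [1000] * (samples_per_month - 12) + [14000] * 12

# A scheduled event holding 3 Gbps for 100 hours.
long_event = [1000] * (samples_per_month - 1200) + [3000] * 1200

spike_bill = billable_rate(short_spike)
event_bill = billable_rate(long_event)
```

Under these assumptions about 36 hours of bursting per month are free, so the short spike is billed at the 1 Gbps baseline while the longer scheduled event is billed at 3 Gbps.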

Page 62: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

How do you scale for write operations?

Page 63: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

We let someone else deal with that:

Page 64: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

In Summary…

Page 65: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

• Ensure your TTLs are appropriate

• Make sure your applications/content return Last-Modified headers.

• Don't be afraid to change your site to turn off components that aren't critical during high traffic periods.

• Keep tunables at the Origin. This allows you to make changes quickly without waiting for CDN propagation.

• A CDN will not replace or fix bad origin infrastructure!

Page 66: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

• Predicting the scale of a scheduled event is impossible. You will either overestimate or underestimate.

• Use previous traffic levels during unscheduled events as a high water mark.

• Don't be afraid to ask someone else (SaaS provider) to implement a feature that is not your core business/expertise.

Page 67: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

USENIX Paper

http://tinyurl.com/lisa-paper

Page 68: How Do you Scale for both Predictable and Unpredictable Events on such a Large Scale?

Thank You

@[email protected]

