measuring cdn performance and why you're doing it wrong

Measuring CDN Performance

Hooman Beheshti

VP Technology

Why this matters

•  Performance is one of the main reasons we use a CDN

•  Measurement often used during evaluation phase to compare CDNs
–  Most of what we’ll talk about is in this context

•  Seems easy, but isn’t

•  Heavily vendor-influenced
–  “Ok Google: define irony!”

•  What does the measurement landscape look like

•  Share measurement experiences

•  Help guide towards good testing plan if/when you decide to do this

Background

Delivery: static/cached objects

Client

CDN Node

Origin

Delivery: dynamic/uncached objects

What we’ll be focusing on

•  Only on delivery and not all the other features CDNs provide

•  How we measure

•  Metrics to measure

•  What to measure

•  Some gotchas, misconceptions, and common mistakes

Measurement Techniques

(how we measure)

Measurement techniques

•  Pretend Users
–  Synthetic tests
–  Not actual users

•  Real Users
–  In the browser
–  Actual users

Synthetic testing

•  Usually a large network of test nodes all over the globe

•  Highly scalable, can do lots of tests at once

•  Many vendors that have this model
–  Examples: Catchpoint, Dynatrace (Gomez), Keynote, Pingdom, etc.

Synthetic testing

•  Built to do full performance and availability testing
–  Lots of “monitors”, emulating what real users do
–  DNS, Traceroute, Ping, Streaming, Mobile
–  HTTP: Object, Browser, Transactions/Flows

•  Tests set up with some frequency to repeatedly test things
–  Aggregates reported

Backbone nodes

•  Test machines sitting in datacenters all around the globe

•  Really good at:
–  Availability and reachability
–  Scale
–  Backend problems
–  Global reach

•  Often terrible indicators of raw performance
–  No latency
–  Infinite bandwidth

https://www.flickr.com/photos/stars6/4381851322/

Last mile nodes

•  Test machines sitting behind a real home-like internet connection

•  Much better at reporting what you can expect from users, but sometimes unreliable

•  Also not as dense in deployment

[Figure: backbone vs. last mile]

Real users (RUM)

•  Use JavaScript to collect timing metrics

•  Can collect lots of things through browser APIs
–  Page metrics, asset metrics, user-defined metrics

Use test assets

•  Use this model to initiate tests in the browser

•  Some vendors:
–  Cedexis, TurboBytes, CloudHarmony, more…
–  Usually, this isn’t their business, but the data drives their main business objectives

•  You can build this yourself too

Use real assets in the page

•  Collect timings from actual objects
–  Resource timing

•  Vendors
–  SOASTA, New Relic, most synthetic vendors
–  Boomerang (open source)
–  Google Analytics User Timings

DATA, DATA, DATA

•  For either RUM technique, we need A LOT of data

•  Too much variance
–  Most vendors don’t use averages
–  Medians, percentiles, and histograms
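A minimal sketch of why averages mislead on noisy timing data. The sample values are made up for illustration, and the nearest-rank percentile used here is only one of several common definitions; it is not taken from any vendor.

```javascript
// Nearest-rank percentile: p in (0, 100]; input does not need to be sorted.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical TTFB samples in milliseconds, with a single slow outlier.
const samples = [20, 22, 21, 25, 23, 24, 22, 21, 900, 23];

const summary = {
  mean: mean(samples),           // pulled far upward by the one outlier
  p50: percentile(samples, 50),  // what a typical request looked like
  p95: percentile(samples, 95),  // where the tail shows up
};
console.log(summary);
```

With these numbers the mean lands above 100 ms even though nine of the ten requests finished in 25 ms or less, which is the kind of distortion medians and percentiles avoid.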

Measurement Metrics

[Diagram, built up over several slides: a client/server timeline with 1 x RTT marked between client and server, adding DNS, then HTTP (TTFB), then HTTP (Download)]

DNS | TCP (TLS) | TTFB | Download (TTLB-TTFB)

DNS: RTT to DNS server, DNS iterations, DNS caching and TTLs

TCP: RTT to cache server (CDN footprint & routing algorithms)

TLS: RTT to cache server (or RTTs depending on TLS False Start), efficiency of TLS engine

TTFB: RTT to where the object is stored + storage efficiency (different for requests to origin); lower bound = network RTT

Download (TTLB-TTFB): bandwidth, congestion avoidance algorithms (and RTT!)

Core object metrics

•  Not every request experiences every metric:
–  DNS: once per domain
–  TCP/TLS setup: once per connection
–  TTFB/Download: for every object (not already in browser cache)

Resource timing

http://www.w3.org/TR/resource-timing/

Resource timing

window.performance.getEntries()
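As a rough sketch, the phases from the earlier slides (DNS, TCP, TLS, TTFB, download) can be derived from the fields of a resource timing entry. The function name `phaseTimings` and the sample entry are hypothetical; the field names are those defined by the Resource Timing specification.

```javascript
// Break one resource timing entry into per-phase durations (milliseconds).
// `entry` is a PerformanceResourceTiming object, or any object with the same fields.
function phaseTimings(entry) {
  // secureConnectionStart is 0 when no TLS handshake took place.
  const tlsStart =
    entry.secureConnectionStart > 0 ? entry.secureConnectionStart : entry.connectEnd;
  return {
    name: entry.name,
    dns: entry.domainLookupEnd - entry.domainLookupStart,
    tcp: tlsStart - entry.connectStart,
    tls: entry.connectEnd - tlsStart,
    ttfb: entry.responseStart - entry.requestStart,
    download: entry.responseEnd - entry.responseStart,
  };
}

// In a browser: performance.getEntriesByType("resource").map(phaseTimings)

// Hypothetical entry, for illustration only.
const example = phaseTimings({
  name: "https://cdn.example.com/test.png",
  domainLookupStart: 10, domainLookupEnd: 30,
  connectStart: 30, secureConnectionStart: 45, connectEnd: 75,
  requestStart: 76, responseStart: 101, responseEnd: 131,
});
console.log(example);
```

Two things to keep in mind when reading the results: requests that reuse an existing connection report zero-length DNS and connect phases, and cross-origin resources expose these detailed fields only when the server sends a `Timing-Allow-Origin` header.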

Mistakes we make

(when evaluating)

vs CDN Y

“I’ll pick an image from my home page, use backbone synthetic tests from all over the world and pick the CDN that has the fastest average time”

“let’s test an asset via RUM on a million page views a day and pick the fastest CDN”

“let’s run webpagetest on both CDNs and go with whichever has a faster page load time”

~$ time curl -v http://…

we measure the wrong thing

Web application: objects

•  Your application should determine what you test:
–  Objects served from the edge
–  Objects served from origin (through CDN)

•  If HTML is from origin (through CDN), we must measure it
–  Essential to critical page metrics

Web application: object sizes

•  On any page
–  DNS queries only happen a small number of times
–  Up to 6 TCP connections per domain
–  1 TLS setup per connection
–  Many many many HTTP fetches

•  Core metrics
–  TTFB
–  Download (TTLB-TTFB) if large objects are important
–  Should have a good idea of DNS/TCP/TLS, but less critical
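To make the proportions on this slide concrete, here is a small sketch that counts how often each phase is paid for a page. It assumes one DNS lookup per hostname, at most 6 connections per origin (treated as an upper bound the browser may not reach), and one TLS setup per HTTPS connection. The function name `phaseCounts` and the page itself are hypothetical.

```javascript
// Count how many times each phase occurs while loading a list of objects.
function phaseCounts(objectUrls, maxConnectionsPerDomain = 6) {
  const objectsPerOrigin = new Map();
  const hostnames = new Set();
  for (const raw of objectUrls) {
    const url = new URL(raw);
    hostnames.add(url.hostname);
    objectsPerOrigin.set(url.origin, (objectsPerOrigin.get(url.origin) || 0) + 1);
  }

  let tcpConnections = 0;
  let tlsSetups = 0;
  for (const [origin, count] of objectsPerOrigin) {
    // Upper bound: never more connections than objects or than the per-domain cap.
    const connections = Math.min(count, maxConnectionsPerDomain);
    tcpConnections += connections;
    if (origin.startsWith("https:")) tlsSetups += connections;
  }

  return {
    dnsLookups: hostnames.size,
    tcpConnections,
    tlsSetups,
    httpFetches: objectUrls.length,
  };
}

// Hypothetical page: one HTML document plus 40 images on a single CDN hostname.
const urls = ["https://www.example.com/"];
for (let i = 0; i < 40; i++) urls.push(`https://cdn.example.com/img/${i}.png`);
console.log(phaseCounts(urls));
```

For this made-up page there are 2 DNS lookups and at most 7 connection setups, against 41 HTTP fetches, which is why TTFB and download dominate what users feel.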

Web application

•  If CDN only for static/cacheable objects:
–  One or two representative assets
–  TTFB and maybe download most important

Client CDN Node

X-Cache: HIT

Web application

•  If CDN also for whole site (HTML going through CDN)
–  Sample of key HTML pages, delivered from origin
–  TTFB will show efficiency of routing (and connection management) to origin
–  TTLB will show efficiency of delivery

Web Server Client CDN Node

Web Server Client CDN Node CDN Node

we measure the wrong way

Backbone Nodes

(For true performance measurements)

[Chart: TCP Connect Time Histogram (BB nodes)]

object metrics or page metrics

Download: 15Mbps Upload: 5Mbps Latency: 10 ms, 25 ms

[Charts: onload, Speed Index, and Start Render at 10 msec vs. 25 msec latency]

What the…???

•  We always assume “all things equal”

•  Too many factors affect page load time
–  3rd parties (sometimes varying), content from origin, layout, JS execution, etc.

•  Too much variance

Source: httparchive.org

To be clear…

•  Always use WebPageTest (or something like it) to understand your application’s performance profile

•  Continue to monitor application performance, and always spot check

•  Be extremely careful when using it to compare CDN performance; it can mislead you
–  If using RUM to measure page metrics, with lots of data, things become a little more meaningful (data volume handles variance)

we overgeneralize and draw the wrong conclusions

Cache hit ratios

Cache hit ratio: traditional calculation

1 - (Requests to Origin / Total Requests)

[Diagram: HOT and COLD caches in front of the origin, with a cache “hit” highlighted]

Isn’t this better?

Hits / (Hits + Misses) @edge

Cache hit ratio

1 - (Requests to Origin / Total Requests): offload

Hits / (Hits + Misses) @edge: performance
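A small sketch of how the two calculations can disagree. It assumes the edge ratio is hits divided by hits plus misses as counted at the edge node. The function names and all counters are hypothetical, including the idea that some edge misses are answered by another cache tier before reaching the origin.

```javascript
// Traditional calculation: share of requests that never reached the origin.
function offloadRatio(totalRequests, requestsToOrigin) {
  return 1 - requestsToOrigin / totalRequests;
}

// Edge calculation: share of requests answered directly from the edge cache.
function edgeHitRatio(edgeHits, edgeMisses) {
  return edgeHits / (edgeHits + edgeMisses);
}

// Hypothetical counters: 1,000 requests; 200 miss at the edge, but 150 of
// those are answered by another cache tier, so only 50 reach the origin.
const totalRequests = 1000;
const edgeMisses = 200;
const requestsToOrigin = 50;

const offload = offloadRatio(totalRequests, requestsToOrigin);
const edge = edgeHitRatio(totalRequests - edgeMisses, edgeMisses);
console.log({ offload, edge });
```

In this made-up case the traditional number is about 95% while the edge number is 80%: the origin is well protected, yet one request in five still waited on something slower than the edge cache.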

Effect on long tail content

(long tail: Cacheable but seldom fetched)

Popular Medium Tail (1hr) Long tail (6hr)

            Connect (median)    Wait (median)
Popular     14msec              19msec
1hr Tail    15msec              26msec
6hr Tail    16msec              32msec              6,400+ measurements

77,000+ measurements

After all that….

How much of this really matters?

(when trying to choose between multiple CDNs)

The bigger picture

•  It’s really easy to lock in on a metric

•  Performance absolutely matters

•  True performance isn’t always as easy to measure

We must ask questions …

What’s the storage model and how does it affect long tail content?

What should I expect with cache hit ratios for offload and performance?

Footprint?

(is what I’m testing the same as what I’m buying?)

HTTP vs TLS footprint?

Can I serve stale content if necessary?

(stale-while-revalidate & stale-if-error)
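For reference, the two directives named above are Cache-Control extensions defined in RFC 5861. A minimal sketch of building such a header value follows; the helper name `cacheControl` and the durations are illustrative assumptions, and whether a given CDN honors the directives is exactly the question to ask.

```javascript
// Build a Cache-Control value that uses the RFC 5861 stale extensions.
// All durations are in seconds; the numbers used below are illustrative only.
function cacheControl({ maxAge, staleWhileRevalidate, staleIfError }) {
  return [
    `max-age=${maxAge}`,
    // Serve the stale copy for this long while refetching in the background.
    `stale-while-revalidate=${staleWhileRevalidate}`,
    // Serve the stale copy for this long if the origin returns an error.
    `stale-if-error=${staleIfError}`,
  ].join(", ");
}

const header = cacheControl({ maxAge: 600, staleWhileRevalidate: 30, staleIfError: 86400 });
console.log(`Cache-Control: ${header}`);
```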

What if I can cache something I didn’t think I could?

Key takeaways

•  Everything is application-dependent
–  Evaluate how your application works and what impacts performance the most

•  Don’t get locked into a single number/metric

•  Always know your application performance and bottlenecks

•  Be mindful of the bigger picture

•  Don’t stop measuring!

Thank you!

hooman@fastly.com

office hours Friday @lunch
