Benchmarking: You're Doing It Wrong (StrangeLoop 2014)


DESCRIPTION

Knowing how to set up good benchmarks is invaluable for understanding the performance of a system. Writing correct and useful benchmarks is hard, and verifying the results is difficult and error-prone. When done right, benchmarks guide teams to improve the performance of their systems. When done wrong, hours of effort may result in a worse-performing application, upset customers, or worse! In this talk, we will discuss what you need to know to write better benchmarks. We will look at examples of bad benchmarks and learn which biases can invalidate the measurements, in the hope of correctly applying our new-found skills and avoiding such pitfalls in the future.

TRANSCRIPT

Benchmarking: You're Doing It Wrong

Aysylu Greenberg @aysylu22

In Memes…

To Write Good Benchmarks…

Need to be Full Stack

What's a Benchmark

How fast?

Your process vs Goal
Your process vs Best Practices

Today

• How Not to Write Benchmarks
• Benchmark Setup & Results:
  - Wrong about the machine
  - Wrong about stats
  - Wrong about what matters
• Becoming Less Wrong

HOW NOT TO WRITE BENCHMARKS

Website Serving Images

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

(a naive sketch of this setup in code follows below)

(Diagram: Web Request → Server → S3 Cache)
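To make the setup concrete, here is roughly what that naive benchmark might look like in code. This is a hypothetical illustration (the URL, HTTP client, and structure are assumptions, not the speaker's actual code), and it deliberately keeps every flaw listed above: no warmup, one repeatedly cached image, and a lone mean at the end.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Naive benchmark: fetch ONE image 1000 times, time every request,
// start measuring immediately, and report only the mean.
// "3 runs" means running this program three times by hand and eyeballing the means.
public class NaiveImageBenchmark {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://dev-machine.example.com/images/cat.jpg")) // hypothetical URL
                .build();

        int iterations = 1000;
        long totalNanos = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();           // no warmup: early requests include
            client.send(request,                      // connection setup, cold caches, JIT
                    HttpResponse.BodyHandlers.ofByteArray());
            totalNanos += System.nanoTime() - start;
        }
        // The mean hides outliers and multimodality, and after the first request
        // the same image is almost certainly served from a cache.
        System.out.printf("mean latency: %.2f ms%n", totalNanos / iterations / 1e6);
    }
}
```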

WHAT'S WRONG WITH THIS BENCHMARK?

BENCHMARK SETUP & RESULTS: COMMON PITFALLS

You're wrong about the machine

Wrong About the Machine

• Cache, cache, cache, cache!

It's Caches All The Way Down

Caches in Benchmarks (Prof. Saman Amarasinghe, MIT 2009)
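The same effect is easy to reproduce outside the CPU. The sketch below (an illustration, not from the talk) times two back-to-back reads of the same file; the second read is typically served from the OS page cache and looks far faster than true first-touch latency, which is exactly what happens when a benchmark fetches the same image 1000 times.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: time two back-to-back reads of the same file.
// The first read may hit disk; the second is usually served from the
// OS page cache, so it looks much faster than "real" cold latency.
public class PageCacheDemo {
    public static void main(String[] args) throws Exception {
        Path file = Path.of(args[0]); // pass any reasonably large file

        long t0 = System.nanoTime();
        byte[] data = Files.readAllBytes(file);
        long coldNanos = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        data = Files.readAllBytes(file);
        long warmNanos = System.nanoTime() - t1;

        System.out.printf("cold read: %.2f ms, warm read: %.2f ms (%d bytes)%n",
                coldNanos / 1e6, warmNanos / 1e6, data.length);
    }
}
```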


Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
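One common mitigation, sketched here under the assumption that the code under test can simply be called in a loop, is to run it for a while without recording anything (so connection pools, JIT compilation, and caches reach a steady state) and only then start collecting samples.

```java
// Sketch: separate warmup iterations from measured iterations.
// 'operation' stands in for whatever is being benchmarked (e.g. one HTTP request).
public class WarmupSketch {
    static void operation() {
        // ... the code under test goes here ...
    }

    public static void main(String[] args) {
        int warmupIterations = 200;
        int measuredIterations = 1000;

        // Warmup: exercise the code path but throw the timings away.
        for (int i = 0; i < warmupIterations; i++) {
            operation();
        }

        // Measurement: keep every sample so the full distribution can be examined later.
        long[] samples = new long[measuredIterations];
        for (int i = 0; i < measuredIterations; i++) {
            long start = System.nanoTime();
            operation();
            samples[i] = System.nanoTime() - start;
        }
        System.out.printf("collected %d samples%n", samples.length);
    }
}
```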


Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
• Periodic interference
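Periodic interference (cron jobs, GC pauses, backups) shows up as regularly spaced spikes in a latency-over-time plot, so it helps to keep the wall-clock timestamp of every sample, not just the duration. A possible sketch (the Sample record and CSV output are illustrative assumptions):

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Sketch: record (timestamp, latency) pairs so periodic spikes can be
// correlated with external events (cron, GC logs, backups, ...).
public class TimestampedSamples {
    record Sample(Instant when, long nanos) {}

    static void operation() { /* code under test */ }

    public static void main(String[] args) {
        List<Sample> samples = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            long start = System.nanoTime();
            operation();
            samples.add(new Sample(Instant.now(), System.nanoTime() - start));
        }
        // Dump as CSV; plot latency against wall-clock time to spot periodic interference.
        samples.forEach(s -> System.out.println(s.when() + "," + s.nanos()));
    }
}
```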


Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
• Periodic interference
• Different specs in test vs prod machines


Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
• Periodic interference
• Different specs in test vs prod machines
• Power mode changes

BENCHMARK SETUP & RESULTS: COMMON PITFALLS

You're wrong about the stats

Wrong About Stats

• Too few samples

Wrong About Stats

(Chart: Latency on the y-axis vs. Time on the x-axis)

Convergence of Median on Samples

(Charts: Stable Samples → Stable Median; Decaying Samples → Decaying Median)


Wrong About Stats

• Too few samples
• Non-Gaussian


Wrong About Stats

• Too few samples
• Non-Gaussian
• Multimodal distribution

Multimodal Distribution

(Histogram: # occurrences vs. latency, with modes around 5 ms and 10 ms and the 50% and 99% percentiles marked)

Wrong About Stats

• Too few samples
• Non-Gaussian
• Multimodal distribution
• Outliers
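Since latency samples are rarely Gaussian and are often multimodal with outliers, a single mean hides most of the story. Here is a sketch of summarizing the distribution with percentiles and the maximum instead; the sample values are made up to mimic the 5 ms / 10 ms bimodal example above.

```java
import java.util.Arrays;

// Sketch: summarize a latency distribution with percentiles instead of only the mean.
public class PercentileSummary {
    // Nearest-rank percentile on a sorted copy of the samples.
    static long percentile(long[] sorted, double p) {
        int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(index, sorted.length - 1))];
    }

    public static void main(String[] args) {
        // Made-up bimodal samples in nanoseconds (two modes near 5 ms and 10 ms).
        long[] samplesNanos = {5_000_000, 5_100_000, 4_900_000, 10_200_000, 5_050_000, 9_800_000};
        long[] sorted = samplesNanos.clone();
        Arrays.sort(sorted);

        System.out.printf("p50=%d ns, p90=%d ns, p99=%d ns, max=%d ns%n",
                percentile(sorted, 50), percentile(sorted, 90),
                percentile(sorted, 99), sorted[sorted.length - 1]);
    }
}
```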

BENCHMARK SETUP & RESULTS: COMMON PITFALLS

You're wrong about what matters

Wrong About What Matters

• Premature optimization

"Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs … Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

-- Donald Knuth

Wrong About What Matters

• Premature optimization
• Unrepresentative Workloads

Wrong About What Matters

• Premature optimization
• Unrepresentative Workloads
• Memory pressure

BECOMING LESS WRONG: The How

Becoming Less Wrong

User Actions Matter

X > Y for workload Z with trade-offs A, B, and C

- http://www.toomuchcode.org/

Becoming Less Wrong

• Profiling
• Code Instrumentation
• Aggregate Over Logs
• Traces
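Of these, lightweight code instrumentation is often the cheapest to add: wrap the operation, record its duration, and emit a log line that can later be aggregated across machines. A minimal sketch using java.util.logging (the "perf" logger name and "load-image" label are made-up examples):

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Sketch: time an arbitrary operation and log the duration so that
// latencies can be aggregated over logs later.
public class Instrumentation {
    private static final Logger LOG = Logger.getLogger("perf");

    static <T> T timed(String label, Supplier<T> operation) {
        long start = System.nanoTime();
        try {
            return operation.get();
        } finally {
            long micros = (System.nanoTime() - start) / 1_000;
            LOG.info(() -> label + " took " + micros + " us");
        }
    }

    public static void main(String[] args) {
        String result = timed("load-image", () -> "fake image bytes"); // hypothetical workload
        System.out.println(result.length() + " bytes");
    }
}
```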

Microbenchmarking: Blessing & Curse

+ Quick & cheap
+ Answers narrow questions well
- Often misleading results
- Not representative of the program

Microbenchmarking: Blessing & Curse

• Choose your N wisely

Choose Your N Wisely (Prof. Saman Amarasinghe, MIT 2009)
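The reason N matters is that results can change qualitatively with input size: a working set that fits in cache behaves nothing like one that spills to main memory. A sketch that sweeps N (the sizes are arbitrary assumptions) and reports per-element cost:

```java
// Sketch: sum arrays of increasing size and report per-element cost.
// Small N fits in cache and looks unrealistically fast; large N shows
// memory-bound behaviour. Run each size a few times and compare.
public class ChooseN {
    public static void main(String[] args) {
        for (int n : new int[]{1_000, 100_000, 10_000_000}) {
            long[] data = new long[n];
            for (int i = 0; i < n; i++) data[i] = i;

            long sum = 0;
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) sum += data[i];
            long elapsed = System.nanoTime() - start;

            // Printing 'sum' also keeps the loop from being optimized away.
            System.out.printf("N=%,d  %.2f ns/element  (sum=%d)%n",
                    n, (double) elapsed / n, sum);
        }
    }
}
```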

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
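Clock resolution matters because a timer cannot resolve anything shorter than its own step. A quick sketch that observes the granularity of System.currentTimeMillis() versus System.nanoTime() (actual values are platform-dependent):

```java
// Sketch: observe the smallest measurable step of two timers.
public class ClockResolution {
    public static void main(String[] args) {
        // Spin until currentTimeMillis changes: resolution is ~1 ms or coarser.
        long msStart = System.currentTimeMillis();
        long msNext;
        while ((msNext = System.currentTimeMillis()) == msStart) { /* spin */ }
        System.out.println("currentTimeMillis step: " + (msNext - msStart) + " ms");

        // Spin until nanoTime changes: typically tens to hundreds of nanoseconds.
        long nsStart = System.nanoTime();
        long nsNext;
        while ((nsNext = System.nanoTime()) == nsStart) { /* spin */ }
        System.out.println("nanoTime step: " + (nsNext - nsStart) + " ns");

        // Anything faster than the timer's step cannot be measured with a single call;
        // measure many iterations per timer read instead.
    }
}
```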

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination
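Dead code elimination bites when the benchmarked computation produces a value nobody uses: the JIT may remove it and the loop times nothing. A sketch of the problem and the usual workaround of consuming the result (benchmark harnesses such as JMH provide a Blackhole for exactly this purpose):

```java
// Sketch: a result that is never used can be optimized away.
public class DeadCodeElimination {
    static double sink; // "consume" results so the JIT cannot prove them unused

    public static void main(String[] args) {
        int iterations = 10_000_000;

        // BAD: the JIT may eliminate Math.sqrt entirely, so this times an empty loop.
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Math.sqrt(i);
        }
        System.out.printf("unused result:   %.1f ms%n", (System.nanoTime() - start) / 1e6);

        // BETTER: feed the result into something observable.
        start = System.nanoTime();
        double acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += Math.sqrt(i);
        }
        sink = acc;
        System.out.printf("consumed result: %.1f ms%n", (System.nanoTime() - start) / 1e6);
    }
}
```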

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination
• Constant work per iteration

Non-Constant Work Per Iteration
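A classic form of this mistake is timing an operation whose cost grows as the benchmark runs, then dividing by the iteration count as if every iteration did the same work. A hypothetical sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: work per iteration is NOT constant here, because contains()
// scans a list that grows on every iteration. The "average cost per add"
// says little about any individual add.
public class NonConstantWork {
    public static void main(String[] args) {
        int iterations = 20_000;
        List<Integer> seen = new ArrayList<>();

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (!seen.contains(i)) {   // O(size of list): gets slower every iteration
                seen.add(i);
            }
        }
        long elapsed = System.nanoTime() - start;

        // Dividing by 'iterations' averages early cheap iterations with late expensive ones.
        System.out.printf("misleading mean: %.1f ns per iteration%n", (double) elapsed / iterations);
    }
}
```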

Follow-up Material

• How NOT to Measure Latency by Gil Tene
  - http://www.infoq.com/presentations/latency-pitfalls
• Taming the Long Latency Tail on highscalability.com
  - http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
• Performance Analysis Methodology by Brendan Gregg
  - http://www.brendangregg.com/methodology.html
• Robust Java benchmarking by Brent Boyer
  - http://www.ibm.com/developerworks/library/j-benchmark1/
  - http://www.ibm.com/developerworks/library/j-benchmark2/
• Benchmarking articles by Aleksey Shipilëv
  - http://shipilev.net/#benchmarking

Takeaway #1: Cache

Takeaway #2: Outliers

Takeaway #3: Workload

Benchmarking: You're Doing It Wrong

Aysylu Greenberg @aysylu22
