Benchmarking: You're Doing It Wrong (StrangeLoop 2014)


DESCRIPTION

Knowing how to set up good benchmarks is invaluable for understanding the performance of a system. Writing correct and useful benchmarks is hard, and verifying the results is difficult and error-prone. When done right, benchmarks guide teams to improve the performance of their systems. When done wrong, hours of effort may result in a worse-performing application, upset customers, or worse! In this talk, we will discuss what you need to know to write better benchmarks. We will look at examples of bad benchmarks and learn which biases can invalidate the measurements, in the hope of correctly applying our new-found skills and avoiding such pitfalls in the future.

TRANSCRIPT

Benchmarking: You're Doing It Wrong

Aysylu Greenberg @aysylu22

In Memes…

To Write Good Benchmarks…

Need to be Full Stack

What's a Benchmark

How fast?

Your process vs Goal
Your process vs Best Practices

Today

• How Not to Write Benchmarks
• Benchmark Setup & Results:
  - Wrong about the machine
  - Wrong about stats
  - Wrong about what matters
• Becoming Less Wrong

HOW NOT TO WRITE BENCHMARKS

Website Serving Images

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

(a naive sketch of this setup in code follows below)

(Diagram: Web Request → Server → S3 Cache)
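To make the setup concrete, here is roughly what that naive benchmark might look like in code. This is a hypothetical illustration (the URL, HTTP client, and structure are assumptions, not the speaker's actual code), and it deliberately keeps every flaw listed above: no warmup, one repeatedly cached image, and a lone mean at the end.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Naive benchmark: fetch ONE image 1000 times, time every request,
// start measuring immediately, and report only the mean.
// "3 runs" means running this program three times by hand and eyeballing the means.
public class NaiveImageBenchmark {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://dev-machine.example.com/images/cat.jpg")) // hypothetical URL
                .build();

        int iterations = 1000;
        long totalNanos = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();           // no warmup: early requests include
            client.send(request,                      // connection setup, cold caches, JIT
                    HttpResponse.BodyHandlers.ofByteArray());
            totalNanos += System.nanoTime() - start;
        }
        // The mean hides outliers and multimodality, and after the first request
        // the same image is almost certainly served from a cache.
        System.out.printf("mean latency: %.2f ms%n", totalNanos / iterations / 1e6);
    }
}
```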

WHAT'S WRONG WITH THIS BENCHMARK?

BENCHMARK SETUP & RESULTS: COMMON PITFALLS

You're wrong about the machine

Wrong About the Machine

• Cache, cache, cache, cache!

It's Caches All The Way Down

Caches in Benchmarks (Prof. Saman Amarasinghe, MIT 2009)
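The same effect is easy to reproduce outside the CPU. The sketch below (an illustration, not from the talk) times two back-to-back reads of the same file; the second read is typically served from the OS page cache and looks far faster than true first-touch latency, which is exactly what happens when a benchmark fetches the same image 1000 times.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: time two back-to-back reads of the same file.
// The first read may hit disk; the second is usually served from the
// OS page cache, so it looks much faster than "real" cold latency.
public class PageCacheDemo {
    public static void main(String[] args) throws Exception {
        Path file = Path.of(args[0]); // pass any reasonably large file

        long t0 = System.nanoTime();
        byte[] data = Files.readAllBytes(file);
        long coldNanos = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        data = Files.readAllBytes(file);
        long warmNanos = System.nanoTime() - t1;

        System.out.printf("cold read: %.2f ms, warm read: %.2f ms (%d bytes)%n",
                coldNanos / 1e6, warmNanos / 1e6, data.length);
    }
}
```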


Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
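One common mitigation, sketched here under the assumption that the code under test can simply be called in a loop, is to run it for a while without recording anything (so connection pools, JIT compilation, and caches reach a steady state) and only then start collecting samples.

```java
// Sketch: separate warmup iterations from measured iterations.
// 'operation' stands in for whatever is being benchmarked (e.g. one HTTP request).
public class WarmupSketch {
    static void operation() {
        // ... the code under test goes here ...
    }

    public static void main(String[] args) {
        int warmupIterations = 200;
        int measuredIterations = 1000;

        // Warmup: exercise the code path but throw the timings away.
        for (int i = 0; i < warmupIterations; i++) {
            operation();
        }

        // Measurement: keep every sample so the full distribution can be examined later.
        long[] samples = new long[measuredIterations];
        for (int i = 0; i < measuredIterations; i++) {
            long start = System.nanoTime();
            operation();
            samples[i] = System.nanoTime() - start;
        }
        System.out.printf("collected %d samples%n", samples.length);
    }
}
```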


Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
• Periodic interference
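Periodic interference (cron jobs, GC pauses, backups) shows up as regularly spaced spikes in a latency-over-time plot, so it helps to keep the wall-clock timestamp of every sample, not just the duration. A possible sketch (the Sample record and CSV output are illustrative assumptions):

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Sketch: record (timestamp, latency) pairs so periodic spikes can be
// correlated with external events (cron, GC logs, backups, ...).
public class TimestampedSamples {
    record Sample(Instant when, long nanos) {}

    static void operation() { /* code under test */ }

    public static void main(String[] args) {
        List<Sample> samples = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            long start = System.nanoTime();
            operation();
            samples.add(new Sample(Instant.now(), System.nanoTime() - start));
        }
        // Dump as CSV; plot latency against wall-clock time to spot periodic interference.
        samples.forEach(s -> System.out.println(s.when() + "," + s.nanos()));
    }
}
```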


Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
• Periodic interference
• Different specs in test vs prod machines


Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & Timing
• Periodic interference
• Different specs in test vs prod machines
• Power mode changes

BENCHMARK SETUP & RESULTS: COMMON PITFALLS

You're wrong about the stats

Wrong About Stats

• Too few samples

Wrong About Stats

(Chart: Latency on the y-axis vs. Time on the x-axis)

Convergence of Median on Samples

(Charts: Stable Samples → Stable Median; Decaying Samples → Decaying Median)


Wrong About Stats

• Too few samples
• Non-Gaussian


Wrong About Stats

• Too few samples
• Non-Gaussian
• Multimodal distribution

Multimodal Distribution

(Histogram: # occurrences vs. latency, with modes around 5 ms and 10 ms and the 50% and 99% percentiles marked)

Wrong About Stats

• Too few samples
• Non-Gaussian
• Multimodal distribution
• Outliers
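Since latency samples are rarely Gaussian and are often multimodal with outliers, a single mean hides most of the story. Here is a sketch of summarizing the distribution with percentiles and the maximum instead; the sample values are made up to mimic the 5 ms / 10 ms bimodal example above.

```java
import java.util.Arrays;

// Sketch: summarize a latency distribution with percentiles instead of only the mean.
public class PercentileSummary {
    // Nearest-rank percentile on a sorted copy of the samples.
    static long percentile(long[] sorted, double p) {
        int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(index, sorted.length - 1))];
    }

    public static void main(String[] args) {
        // Made-up bimodal samples in nanoseconds (two modes near 5 ms and 10 ms).
        long[] samplesNanos = {5_000_000, 5_100_000, 4_900_000, 10_200_000, 5_050_000, 9_800_000};
        long[] sorted = samplesNanos.clone();
        Arrays.sort(sorted);

        System.out.printf("p50=%d ns, p90=%d ns, p99=%d ns, max=%d ns%n",
                percentile(sorted, 50), percentile(sorted, 90),
                percentile(sorted, 99), sorted[sorted.length - 1]);
    }
}
```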

BENCHMARK SETUP & RESULTS: COMMON PITFALLS

You're wrong about what matters

Wrong About What Matters

• Premature optimization

"Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs … Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

-- Donald Knuth

Wrong About What Matters

• Premature optimization
• Unrepresentative Workloads

Wrong About What Matters

• Premature optimization
• Unrepresentative Workloads
• Memory pressure

BECOMING LESS WRONG: The How

Becoming Less Wrong

User Actions Matter

X > Y for workload Z with trade-offs A, B, and C

- http://www.toomuchcode.org/

Becoming Less Wrong

• Profiling
• Code Instrumentation
• Aggregate Over Logs
• Traces
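Of these, lightweight code instrumentation is often the cheapest to add: wrap the operation, record its duration, and emit a log line that can later be aggregated across machines. A minimal sketch using java.util.logging (the "perf" logger name and "load-image" label are made-up examples):

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Sketch: time an arbitrary operation and log the duration so that
// latencies can be aggregated over logs later.
public class Instrumentation {
    private static final Logger LOG = Logger.getLogger("perf");

    static <T> T timed(String label, Supplier<T> operation) {
        long start = System.nanoTime();
        try {
            return operation.get();
        } finally {
            long micros = (System.nanoTime() - start) / 1_000;
            LOG.info(() -> label + " took " + micros + " us");
        }
    }

    public static void main(String[] args) {
        String result = timed("load-image", () -> "fake image bytes"); // hypothetical workload
        System.out.println(result.length() + " bytes");
    }
}
```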

Microbenchmarking: Blessing & Curse

+ Quick & cheap
+ Answers narrow questions well
- Often misleading results
- Not representative of the program

Microbenchmarking: Blessing & Curse

• Choose your N wisely

Choose Your N Wisely (Prof. Saman Amarasinghe, MIT 2009)
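The reason N matters is that results can change qualitatively with input size: a working set that fits in cache behaves nothing like one that spills to main memory. A sketch that sweeps N (the sizes are arbitrary assumptions) and reports per-element cost:

```java
// Sketch: sum arrays of increasing size and report per-element cost.
// Small N fits in cache and looks unrealistically fast; large N shows
// memory-bound behaviour. Run each size a few times and compare.
public class ChooseN {
    public static void main(String[] args) {
        for (int n : new int[]{1_000, 100_000, 10_000_000}) {
            long[] data = new long[n];
            for (int i = 0; i < n; i++) data[i] = i;

            long sum = 0;
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) sum += data[i];
            long elapsed = System.nanoTime() - start;

            // Printing 'sum' also keeps the loop from being optimized away.
            System.out.printf("N=%,d  %.2f ns/element  (sum=%d)%n",
                    n, (double) elapsed / n, sum);
        }
    }
}
```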

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
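Clock resolution matters because a timer cannot resolve anything shorter than its own step. A quick sketch that observes the granularity of System.currentTimeMillis() versus System.nanoTime() (actual values are platform-dependent):

```java
// Sketch: observe the smallest measurable step of two timers.
public class ClockResolution {
    public static void main(String[] args) {
        // Spin until currentTimeMillis changes: resolution is ~1 ms or coarser.
        long msStart = System.currentTimeMillis();
        long msNext;
        while ((msNext = System.currentTimeMillis()) == msStart) { /* spin */ }
        System.out.println("currentTimeMillis step: " + (msNext - msStart) + " ms");

        // Spin until nanoTime changes: typically tens to hundreds of nanoseconds.
        long nsStart = System.nanoTime();
        long nsNext;
        while ((nsNext = System.nanoTime()) == nsStart) { /* spin */ }
        System.out.println("nanoTime step: " + (nsNext - nsStart) + " ns");

        // Anything faster than the timer's step cannot be measured with a single call;
        // measure many iterations per timer read instead.
    }
}
```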

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination
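Dead code elimination bites when the benchmarked computation produces a value nobody uses: the JIT may remove it and the loop times nothing. A sketch of the problem and the usual workaround of consuming the result (benchmark harnesses such as JMH provide a Blackhole for exactly this purpose):

```java
// Sketch: a result that is never used can be optimized away.
public class DeadCodeElimination {
    static double sink; // "consume" results so the JIT cannot prove them unused

    public static void main(String[] args) {
        int iterations = 10_000_000;

        // BAD: the JIT may eliminate Math.sqrt entirely, so this times an empty loop.
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Math.sqrt(i);
        }
        System.out.printf("unused result:   %.1f ms%n", (System.nanoTime() - start) / 1e6);

        // BETTER: feed the result into something observable.
        start = System.nanoTime();
        double acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += Math.sqrt(i);
        }
        sink = acc;
        System.out.printf("consumed result: %.1f ms%n", (System.nanoTime() - start) / 1e6);
    }
}
```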

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination
• Constant work per iteration

Non-Constant Work Per Iteration
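A classic form of this mistake is timing an operation whose cost grows as the benchmark runs, then dividing by the iteration count as if every iteration did the same work. A hypothetical sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: work per iteration is NOT constant here, because contains()
// scans a list that grows on every iteration. The "average cost per add"
// says little about any individual add.
public class NonConstantWork {
    public static void main(String[] args) {
        int iterations = 20_000;
        List<Integer> seen = new ArrayList<>();

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (!seen.contains(i)) {   // O(size of list): gets slower every iteration
                seen.add(i);
            }
        }
        long elapsed = System.nanoTime() - start;

        // Dividing by 'iterations' averages early cheap iterations with late expensive ones.
        System.out.printf("misleading mean: %.1f ns per iteration%n", (double) elapsed / iterations);
    }
}
```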

Follow-up Material

• How NOT to Measure Latency by Gil Tene
  - http://www.infoq.com/presentations/latency-pitfalls
• Taming the Long Latency Tail on highscalability.com
  - http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
• Performance Analysis Methodology by Brendan Gregg
  - http://www.brendangregg.com/methodology.html
• Robust Java benchmarking by Brent Boyer
  - http://www.ibm.com/developerworks/library/j-benchmark1/
  - http://www.ibm.com/developerworks/library/j-benchmark2/
• Benchmarking articles by Aleksey Shipilëv
  - http://shipilev.net/#benchmarking

Takeaway #1: Cache

Takeaway #2: Outliers

Takeaway #3: Workload

Benchmarking: You're Doing It Wrong

Aysylu Greenberg @aysylu22
