Latency SLOs Done Right
SREcon19 Americas
#SREcon@phredmoyer
Fred Moyer
Latency
Is it important?
Latency
For any of your services, how many requests were served within 500 ms over the last month?
Latency
For any of your services, how many requests were served within 250 ms over the last month?
Latency
How would you answer that question for your services?
Latency
How accurate would your answer be?
10%? 20%? 50%? 200%?
I’m Fred and I like SLOs
- Developer Evangelist @Circonus
- Engineer who talks to people
- Writing code and breaking prod for 20 years
- @phredmoyer
- Likes C, Go, Perl, PostgreSQL
Talk Agenda
● SLO Refresher
● A Common Mistake
● Computing SLOs with log data
● Computing SLOs by counting requests
● Computing SLOs with histograms
Service Level Objectives
SLI - Service Level Indicator
SLO - Service Level Objective
SLA - Service Level Agreement
SLI - Service Level Indicator
Measure of the service that can be quantified
“99th percentile latency of homepage requests over the past 5 minutes < 300ms”
“SLIs drive SLOs which inform SLAs”
Excerpted from: “SLIs, SLOs, SLAs, oh my!” @sethvargo @lizthegrey
https://youtu.be/tEylFyxbDLE
SLO - Service Level Objective, a target for Service Level Indicators
“99th percentile homepage SLI will succeed 99.9% over trailing year”
SLA - Service Level Agreement, a legal agreement
“99th percentile homepage SLI will succeed 99% over trailing year”
A Common Mistake
Averaging Percentiles
p95(W1 ∪ W2) != (p95(W1) + p95(W2))/2
Works fine when node workload is symmetric
Hides problems when workloads are asymmetric
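A minimal sketch of why the inequality above matters. The two workloads are made-up numbers (a busy, fast node and a quiet, slow node), and `percentile` is a hypothetical nearest-rank helper, not a function from any library named in the talk.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical asymmetric workloads: W1 serves 1,000 requests, W2 only 10.
w1 = [100] * 940 + [220] * 60
w2 = [300] * 9 + [650]

p95_w1 = percentile(w1, 95)              # p95 of the busy node
p95_w2 = percentile(w2, 95)              # p95 of the quiet node
p95_union = percentile(w1 + w2, 95)      # p95 over all requests together
p95_averaged = (p95_w1 + p95_w2) / 2     # the mistaken "average of percentiles"

print(p95_union, p95_averaged)
```

With these inputs the averaged figure lands far above the p95 of the combined requests, because the quiet node's percentile gets the same weight as the busy node's.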
99% of requests served here
p95(W1) = 220ms
p95(W2) = 650ms
p95(W1 ∪ W2) = 230ms
(p95(W1) + p95(W2))/2 = 430ms
~2x difference (430ms is about 187% of 230ms)
[Figure: p95 actual (230ms) vs. p95 average (430ms); the gap between the two is labeled ERROR]
Log parser => Metrics (mtail)
What metrics are you storing?
Averages?
p50, p90, p95, p99, p99.9, p99.99?
Computing SLOs with log data
"%{%d/%b/%Y %T}t.%{msec}t %{%z}t"
~100 bytes per log line
~1GB for 10M requests
Logs => HDFS
Logs => ElasticSearch/Splunk
ssh -- `grep ... | awk ... > 550 ... | wc -l`
1. Extract samples for time window
2. Sort the samples by value
3. Find the sample that sits 5% of the sample count down from the largest
4. That’s your p95
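The four steps above can be sketched roughly as follows. This assumes the latencies have already been parsed out of the logs into `(timestamp, latency_ms)` pairs; that input shape and the function name are illustrative, not something the slides specify.

```python
def p95_for_window(samples, start, end):
    """Return the p95 latency of (timestamp, latency_ms) pairs with start <= timestamp < end."""
    # 1. Extract samples for the time window
    window = [latency for ts, latency in samples if start <= ts < end]
    if not window:
        return None
    # 2. Sort the samples by value
    window.sort()
    # 3. Step back 5% of the sample count from the largest sample
    offset = len(window) * 5 // 100
    index = max(0, len(window) - 1 - offset)
    # 4. That sample is the p95
    return window[index]

# Hypothetical data: one request per second with latencies 1..100 ms, plus one outside the window.
samples = [(t, t + 1) for t in range(100)] + [(500, 9999)]
print(p95_for_window(samples, 0, 100))
```

The sort is what makes this approach slow on large windows: every query has to touch and order every raw sample.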
“95th percentile SLI will succeed 99.9% of the time over the trailing year”
1. Divide 1 year of samples into 1,000 slices
2. For each slice, calculate the SLI
3. Was the p95 SLI met for at least 999 slices? If so, the SLO was met
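A sketch of the slice check above, assuming the per-slice p95 values have already been computed. The function name, the 300 ms threshold, and the per-mille target parameter are illustrative choices; integer arithmetic is used so the 99.9% comparison is exact.

```python
def slo_met(slice_p95s, threshold_ms, target_per_mille=999):
    """True if at least target_per_mille out of every 1,000 slices had p95 below threshold_ms."""
    good = sum(1 for p95 in slice_p95s if p95 < threshold_ms)
    # Compare as integers to avoid floating-point rounding at the 99.9% boundary.
    return good * 1000 >= target_per_mille * len(slice_p95s)

# Hypothetical year: 1,000 slices, one of which missed a 300 ms p95 threshold.
one_bad_slice = [250] * 999 + [400]
two_bad_slices = [250] * 998 + [400] * 2

print(slo_met(one_bad_slice, 300), slo_met(two_bad_slices, 300))
```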
Pros:
1. Easy to configure logs to capture latency
2. Easy to roll your own processing code, some open source options out there
3. Accurate results
Cons:
1. Expensive (see log analysis solution pricing)
2. Sampling possible but skews accuracy
3. Slow
4. Difficult to scale
Computing SLOs by counting requests
1. Count # of requests that violate SLI threshold
2. Count total number of requests
3. % success = 100 - (#failed_reqs/#total_reqs)*100
Similar to Prometheus cumulative ‘<=’ histogram
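The three counting steps can be sketched as a pair of in-process counters. The class and method names here are hypothetical, and a request is treated as violating the SLI when its latency is at or above the threshold.

```python
class LatencySLOCounter:
    """Counts total and threshold-violating requests for one fixed latency threshold."""

    def __init__(self, threshold_ms):
        self.threshold_ms = threshold_ms   # fixed up front; changing it means starting over
        self.total = 0
        self.failed = 0

    def observe(self, latency_ms):
        # 2. Count total number of requests
        self.total += 1
        # 1. Count requests that violate the SLI threshold
        if latency_ms >= self.threshold_ms:
            self.failed += 1

    def percent_success(self):
        # 3. % success = 100 - (#failed_reqs / #total_reqs) * 100
        if self.total == 0:
            return None
        return 100 - (self.failed / self.total) * 100

counter = LatencySLOCounter(threshold_ms=30)
for latency in [10, 20, 29, 45]:   # made-up latencies in ms
    counter.observe(latency)
print(counter.percent_success())
```

Only two integers are kept per threshold, which is why this approach is cheap, and also why a different threshold cannot be evaluated after the fact.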
SLO = 90% of reqs < 30ms
# bad requests = 2,262
# total requests = 60,124
100-(2262/60124)*100=96.2%
SLO was met
Pros:
1. Simple to implement
2. Performant
3. Scalable
4. Accurate
Cons:
1. Fixed SLO threshold - changing it means reconfiguring the counters
2. Looking back at past data with a different threshold is impossible
Computing SLOs with histograms
AKA distributions
Sample counts in bins/buckets
Gil Tene’s hdrhistogram.org
[Figure: histogram with sample value on the x-axis and # samples on the y-axis, annotated with the mode, median q(0.5), mean, q(0.9), and q(1)]
Some histogram types:
1. Linear
2. Approximate
3. Fixed bin
4. Cumulative
5. Log Linear
Log Linear Histogram
github.com/circonus-labs/libcircllhist
github.com/circonus-labs/circonusllhist
Mergeability
h(A ∪ B) = h(A) ∪ h(B)
A & B must have identical bin boundaries
Can be aggregated both in space (across nodes) and in time (across windows)
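A small sketch of the mergeability property, under the assumption that each histogram is stored as a mapping from bin lower bound to sample count and that all histograms share the same bin boundaries. This representation is illustrative and is not the libcircllhist data structure.

```python
from collections import Counter

def merge(*histograms):
    """Merge histograms stored as {bin_lower_bound: count} by adding counts per bin."""
    merged = Counter()
    for histogram in histograms:
        merged.update(histogram)   # Counter.update adds counts rather than replacing them
    return merged

# Hypothetical per-node histograms with identical bin boundaries (10, 20, 30, ...).
node_a = Counter({10: 4, 20: 1})
node_b = Counter({10: 2, 30: 3})
combined = merge(node_a, node_b)
print(sorted(combined.items()))
```

Because merging only adds counts, the merged histogram is the same one you would get by inserting every sample from both nodes into a single histogram, which is what makes percentiles computed from it valid across nodes or time windows.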
Computing SLOs with histograms
How many requests are faster than 330ms?
1. Walk the bins lowest to highest until you reach 330ms
2. Sum the counts in those bins
3. Done
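The bin walk above can be sketched like this, again assuming a histogram stored as `{bin_lower_bound: count}`. The answer is exact only when the threshold falls on a bin boundary; the function name and the sample data are made up for illustration.

```python
def count_below(histogram, threshold):
    """Sum the counts of all bins whose lower bound is below the threshold."""
    total = 0
    # 1. Walk the bins lowest to highest until the threshold is reached
    for lower in sorted(histogram):
        if lower >= threshold:
            break
        # 2. Sum the counts in those bins
        total += histogram[lower]
    # 3. Done
    return total

# Hypothetical latency histogram in ms with 10 ms wide bins.
latency_ms = {310: 40, 320: 25, 330: 10, 340: 5}
faster_than_330 = count_below(latency_ms, 330)
print(faster_than_330, sum(latency_ms.values()))
```

Unlike the fixed-threshold counters, the same stored histogram can answer this question for 320 ms or 340 ms as well.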
For the libcircllhist implementation we have bins at:
... 320, 330, 340, ...
... and: 10, 11, 12, 13, ...
... and: 0.0000010, 0.0000011, 0.0000012, ...
For every decimal floating-point number with 2 significant digits, we have a bin (within 10^{+/-128}).
So ... where are the bin boundaries?
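One way to picture the two-significant-digit bins is to truncate a value to its first two decimal digits. The sketch below assumes each bin runs from that truncated value up to the next two-digit value in the same decade, and it handles positive values only; it illustrates the idea and is not the libcircllhist implementation.

```python
from decimal import Decimal, ROUND_FLOOR

def bin_bounds(value):
    """Return (lower, upper) of the two-significant-digit bin holding a positive value."""
    d = Decimal(str(value))
    if d <= 0:
        raise ValueError("this sketch handles positive values only")
    exponent = d.adjusted()                      # decimal exponent of the leading digit
    step = Decimal(1).scaleb(exponent - 1)       # bin width within this decade
    lower = (d / step).to_integral_value(rounding=ROUND_FLOOR) * step
    return lower, lower + step

for sample in (334, 12.7, 0.00000115):
    print(sample, bin_bounds(sample))
```

Under this assumption a 334 ms sample lands in the 330 bin, so the worst-case distance to the bin midpoint is half the bin width, 5 ms at this magnitude.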
Computing SLOs with histograms
Pros:
1. Space efficient (HH: ~300 bytes / histogram in practice, 10x more efficient than logs)
2. Full flexibility:
- Thresholds can be chosen as needed and analyzed
- Statistical methods applicable: IQR, count_below, q(1), etc.
3. Mergeability (HH: aggregate data across nodes)
4. Performance (ns insertions, μs percentile calculations)
5. Bounded error (half the bin size)
6. Several open source libraries available
Cons:
1. Math is more complex than other methods
2. Some loss of accuracy (<<5%) in worst cases
Log Linear histograms with Python
github.com/circonus-labs/libcircllhist
(autoconf && ./configure && make install)
github.com/circonus-labs/libcircllhist/tree/master/src/python
(pip install circllhist)
from circllhist import Circllhist
h = Circllhist() # make a new histogram
h.insert(123) # insert value 123
h.insert(456) # insert value 456
h.insert(789) # insert value 789
print(h.count()) # prints 3
print(h.sum()) # prints 1,368
print(h.quantile(0.5)) # prints 456
from matplotlib import pyplot as plt
from circllhist import Circllhist
H = Circllhist()
… # add latency data to H via insert()
H.plot()
plt.axvline(x=H.quantile(0.95), color='red')
Averaging Percentiles
Conclusions
1. Averaging Percentiles is tempting, but misleading
2. Use counters or histograms to calculate SLOs correctly
3. Histograms give the most flexibility in choosing latency thresholds, but only a couple of libraries implement them (libcircllhist, hdrhistogram)
4. Full support for (sparsely encoded, HDR) histograms in TSDBs is still lacking (except IRONdb).
Fred Moyer
Thank you!
slideshare.net/redhotpenguin