Pitfalls in Measuring SLOs Danyel Fisher @fisherdanyel Liz Fong-Jones @lizthegrey


Post on 25-Jun-2020


What do you do when things break?

How bad was this break?


Build new features!

We need to improve quality!


Management

Engineering

Clients and Users

How broken is “too broken”?

What does “good enough” mean?

Combatting alert fatigue


A telemetry system produces events that correspond to real world use

We can describe some of these events as eligible

We can describe some of them as good


Given an event, is it eligible? Is it good?

Eligible: “Had an HTTP status code”

Good: “... that was a 200, and was served under 500 ms”
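The eligible/good split can be sketched in a few lines. This is only an illustration: the `Event` type, its field names, and the sample events are assumptions, not part of any real telemetry schema. It applies the two rules quoted above and reports the ratio of good events to eligible events.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    # Hypothetical event shape; field names are assumptions for illustration.
    status_code: Optional[int]  # None when the event carries no HTTP status
    duration_ms: float


def is_eligible(event: Event) -> bool:
    # Eligible: "Had an HTTP status code"
    return event.status_code is not None


def is_good(event: Event) -> bool:
    # Good: "... that was a 200, and was served under 500 ms"
    return is_eligible(event) and event.status_code == 200 and event.duration_ms < 500


def quality_ratio(events: List[Event]) -> Optional[float]:
    # Ratio of good events to eligible events; None if nothing was eligible.
    eligible = [e for e in events if is_eligible(e)]
    if not eligible:
        return None
    return sum(1 for e in eligible if is_good(e)) / len(eligible)


# Made-up sample: one good, one too slow, one server error, one not eligible.
sample_events = [
    Event(200, 120.0),
    Event(200, 750.0),
    Event(500, 30.0),
    Event(None, 5.0),
]
print(quality_ratio(sample_events))
```

Events that are not eligible are left out of both the numerator and the denominator, so they neither help nor hurt the ratio.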


Minimum Quality ratio over a period of time

Number of bad events allowed.


Deploy faster

Room for experimentation

Opportunity to tighten SLO


We always store incoming user data: 99.99% (~4.3 minutes)

Default dashboards usually load in < 1 s: 99.9% (45 minutes)

Queries often return in < 10 s: 99% (7.3 hours)
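The durations next to each target follow from simple arithmetic. The sketch below assumes they describe the time a service may be fully out of compliance over an average month of 730 hours; that window is an assumption, and the slide figures are rounded.

```python
# Assumption: an "average month" of 365 * 24 / 12 = 730 hours.
HOURS_PER_MONTH = 365 * 24 / 12


def allowed_downtime_minutes(target_percent: float,
                             window_hours: float = HOURS_PER_MONTH) -> float:
    # The error budget is the fraction of the window that may be "bad".
    return window_hours * 60 * (100 - target_percent) / 100


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} minutes ({minutes / 60:.2f} hours)")
```

With a 30-day window instead, the same formula gives 7.2 hours, 43.2 minutes, and about 4.3 minutes.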


User Data Throughput

We blew through three months’ budget in those 12 minutes.
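A rough check of the "three months" figure, under stated assumptions: the 99.99% target for storing user data applies, the budget is measured in time over an average 730-hour month, and all incoming data was lost for the full 12 minutes. None of these details are confirmed by the slides.

```python
# Assumptions for illustration only: 99.99% target, 730-hour month,
# and a complete outage lasting the whole 12 minutes.
TARGET_PERCENT = 99.99
MINUTES_PER_MONTH = 730 * 60
OUTAGE_MINUTES = 12

monthly_budget_minutes = MINUTES_PER_MONTH * (100 - TARGET_PERCENT) / 100
months_of_budget_spent = OUTAGE_MINUTES / monthly_budget_minutes

print(f"Monthly budget: {monthly_budget_minutes:.2f} minutes")
print(f"Budget spent:   {months_of_budget_spent:.1f} months")
```

Under these assumptions the outage consumes a little under three months of budget, which is consistent with the statement above.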


We dropped customer data

We rolled it back (manually)

We communicated to customers

We halted deploys


We checked in code that didn’t build.

We had experimental CI build wiring.

Our scripts deployed empty binaries.

There was no health check and rollback.


We stopped writing new features

We prioritized stability

We mitigated risks


SLOs allowed us to characterize what went wrong, how badly it went wrong, and how to prioritize repair


Learning from SLOs


SLOs, as espoused at Google vs in practice

We told you "SLOs are magical".

But not everyone has Google's tooling.

And SLO practice at Google is uneven.

Debugging SLOs is often divorced from reporting SLOs.

There had to be a better way, leveraging Honeycomb...


Using a Design Thinking perspective:

● Expressing and Viewing SLOs

● Burndown Alerts and Responding

● Learning from our Experiences


Displays and Views


See where the burndown was happening, explain why, and remediate


Expressing SLOs

Time based: “How many 15 minute periods had a P99(duration) < 50 ms”

Event based: “How many events had a duration < 500 ms”
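The two ways of expressing an SLO can give different answers on the same data. A minimal sketch, assuming events arrive as `(timestamp_seconds, duration_ms)` pairs and using a nearest-rank P99; the function names and sample data are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, duration in ms); assumed shape


def p99(values: List[float]) -> float:
    # Nearest-rank percentile: the smallest value with at least 99% of data at or below it.
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n), 1-based
    return ordered[rank - 1]


def event_based_sli(samples: List[Sample], threshold_ms: float = 500) -> float:
    # "How many events had a duration < 500 ms"
    good = sum(1 for _, duration in samples if duration < threshold_ms)
    return good / len(samples)


def time_based_sli(samples: List[Sample], period_s: int = 900,
                   threshold_ms: float = 50) -> float:
    # "How many 15 minute periods had a P99(duration) < 50 ms"
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, duration in samples:
        buckets[timestamp // period_s].append(duration)
    good = sum(1 for durations in buckets.values() if p99(durations) < threshold_ms)
    return good / len(buckets)


# Made-up data: four 15-minute periods, one of which contains a slow request.
samples = [
    (0, 10.0), (60, 20.0),
    (900, 10.0), (960, 600.0),
    (1800, 30.0), (1860, 40.0),
    (2700, 45.0), (2760, 48.0),
]
print(event_based_sli(samples))  # fraction of good events
print(time_based_sli(samples))   # fraction of good periods
```

In this sketch a single slow request fails an entire 15-minute period in the time-based form but only one event in the event-based form. Periods with no events are skipped rather than counted as good or bad, which is one of several possible choices.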


Status of an SLO


How have we done?


Where did it go?


When did the errors happen?


What went wrong?

High dimensional data

High cardinality data


Why did it happen?


See where the burndown was happening, explain why, and remediate


User Feedback

“I’d love to drive alerts off our SLOs. Right now we don’t have anything to draw us in and have some alerts on the average error rate but they’re a little spiky to be useful.”


Burndown Alerts


How is my system doing?

Am I over budget?

When will my alarm fail?


When will I fail?

User goal: get alerts to exhaustion time

Human-digestible units

24 hours: “I’ll take a look in the morning”

4 hours: “All hands on deck!”
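One way to turn remaining budget into "time until exhaustion" is to extrapolate the recent burn rate in a straight line. This is a sketch under that assumption, not a description of any product's alerting logic; the function names and response labels are hypothetical, and only the 24-hour and 4-hour thresholds come from the slide.

```python
from typing import Optional


def hours_until_exhaustion(budget_remaining: float,
                           burned_in_lookback: float,
                           lookback_hours: float) -> Optional[float]:
    # Linear extrapolation: assume the recent burn rate continues unchanged.
    if budget_remaining <= 0:
        return 0.0   # budget is already exhausted
    if burned_in_lookback <= 0:
        return None  # nothing is burning, so no projected exhaustion
    burn_per_hour = burned_in_lookback / lookback_hours
    return budget_remaining / burn_per_hour


def response_for(hours_left: Optional[float]) -> str:
    # Thresholds taken from the slide; the labels are illustrative.
    if hours_left is None or hours_left > 24:
        return "no alert"
    if hours_left <= 4:
        return "all hands on deck"
    return "look in the morning"


# 300 bad events of budget left, 100 burned in the last hour: 3 hours to go.
print(response_for(hours_until_exhaustion(300, 100, 1)))
# 300 left, 25 burned in the last hour: 12 hours to go.
print(response_for(hours_until_exhaustion(300, 25, 1)))
```

The lookback length is a design choice: a short lookback reacts quickly but is noisy, while a long one smooths over spikes at the cost of slower detection.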


Implementing Burn Alerts

Run a 30 day query

at a 5 minute resolution

every minute


Learning from Experience


Volume is important

Tolerate at least dozens of bad events per day
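Whether a target leaves room for "dozens" of bad events depends on traffic. A small sketch, assuming a steady daily event count and reading "dozens" as at least 24; both the cutoff and the example volumes are assumptions.

```python
# Assumption: "dozens" is read as at least two dozen bad events per day.
MIN_BAD_EVENTS_PER_DAY = 24


def allowed_bad_events_per_day(eligible_per_day: int, target_percent: float) -> float:
    # The daily error budget expressed as a count of events.
    return eligible_per_day * (100 - target_percent) / 100


def has_enough_volume(eligible_per_day: int, target_percent: float) -> bool:
    return allowed_bad_events_per_day(eligible_per_day, target_percent) >= MIN_BAD_EVENTS_PER_DAY


# 1,000 events/day at 99.9% allows about one bad event per day: too few.
print(has_enough_volume(1_000, 99.9))
# 1,000,000 events/day at 99.99% allows about a hundred per day.
print(has_enough_volume(1_000_000, 99.99))
```

With a budget of roughly one event per day, a single failed request decides whether the day is in or out of compliance, which makes any alert built on it too jumpy to be useful.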


Faults


SLOs for Customer Service

Remember that user having a bad day?


Blackouts are easy… but brownouts are much more interesting


Timeline

1:29 am: SLO alerts. “Maybe it’s just a blip” (1.5% brownout for 20 minutes)

4:21 am: Minor incident. “It might be an AWS problem”

6:25 am: SLO alerts again. “Could it be ALB compat?”

9:55 am: “Why is our system uptime dropping to zero?” It’s out of memory. We aren’t alerting on that crash.

10:32 am: Fixed


Cultural Change

It’s hard to replace alerts with SLOs

But a clear incident can help


Reduce Alarm Fatigue

Focus on user-affecting SLOs

Focus on actionable alarms


Conclusion


SLOs allowed us to characterize what went wrong, how badly it went wrong, and how to prioritize repair


You can do it too

And maybe avoid our mistakes


Pitfalls in Measuring SLOs

Email: [email protected] / [email protected]

Twitter: @fisherdanyel & @lizthegrey

Come talk to us in Slack!

hny.co/danyel