protect your users with circuit breakers

77
Protect your users with Circuit Breakers @scott_triglia Jim’s

Upload: scott-triglia

Post on 12-Apr-2017

97 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: Protect your Users with Circuit breakers

Protect your users with Circuit Breakers

@scott_triglia

Jim’s

Page 2: Protect your Users with Circuit breakers

Yelp’s Mission:Connecting people with great

local businesses.

Page 3: Protect your Users with Circuit breakers

@scott_triglia

I work with 💵💶

Page 4: Protect your Users with Circuit breakers

Let’s talk Circuit Breakers

Page 5: Protect your Users with Circuit breakers
Page 6: Protect your Users with Circuit breakers
Page 8: Protect your Users with Circuit breakers
Page 9: Protect your Users with Circuit breakers
Page 10: Protect your Users with Circuit breakers

Our goals today: introduce a basic circuit

breaker

Page 11: Protect your Users with Circuit breakers

Our goals today: a modular circuit breaker

Page 12: Protect your Users with Circuit breakers

Our goals today: test it out on several

scenarios

Page 13: Protect your Users with Circuit breakers
Page 14: Protect your Users with Circuit breakers
Page 15: Protect your Users with Circuit breakers

15

12

34

Page 16: Protect your Users with Circuit breakers

the fundamental rule: your systems will fail

what’s your response?

Page 17: Protect your Users with Circuit breakers

12

Page 18: Protect your Users with Circuit breakers

12

3

Page 19: Protect your Users with Circuit breakers

12

34

Page 20: Protect your Users with Circuit breakers

Nygard’s circuit breaker

Page 21: Protect your Users with Circuit breakers
Page 22: Protect your Users with Circuit breakers
Page 23: Protect your Users with Circuit breakers
Page 24: Protect your Users with Circuit breakers
Page 25: Protect your Users with Circuit breakers

Circuit Breaker States: * Healthy (or “closed”) * Recovering (or “half-open”) * Unhealthy (or “open”)

Page 26: Protect your Users with Circuit breakers
Page 27: Protect your Users with Circuit breakers

Recovery:

* Wait for recovery_timeout seconds* Send a trial request, trust its results

Page 28: Protect your Users with Circuit breakers

Before a circuit breaker

Page 29: Protect your Users with Circuit breakers

Assume the kitchen gets slow

Page 30: Protect your Users with Circuit breakers

Kitchen’s backlog grows

Page 31: Protect your Users with Circuit breakers

Diners wait much longer to get food

Page 32: Protect your Users with Circuit breakers

New diners make issues worse

Page 33: Protect your Users with Circuit breakers

And your entire system collapses

Page 34: Protect your Users with Circuit breakers

And your entire system collapses

Page 35: Protect your Users with Circuit breakers

With a circuit breaker

Page 36: Protect your Users with Circuit breakers

CB sees backlog, stops orders

Page 37: Protect your Users with Circuit breakers

Fewer frustrated diners

Page 38: Protect your Users with Circuit breakers

Reduced load on the kitchen

Page 39: Protect your Users with Circuit breakers

A well defined failure mode

Page 40: Protect your Users with Circuit breakers

How can we do better?

Page 41: Protect your Users with Circuit breakers

Improvements

Page 42: Protect your Users with Circuit breakers

1) Detecting Unhealthiness

Page 43: Protect your Users with Circuit breakers

2) Mitigating downtime

Page 44: Protect your Users with Circuit breakers

3) Recovering Effectively

Page 45: Protect your Users with Circuit breakers

Detecting Unhealthiness

Component 1:

Page 46: Protect your Users with Circuit breakers
Page 47: Protect your Users with Circuit breakers

def signal_overload(cb): if len(jobs) > THRESH: cb.mark_unhealthy()

Page 48: Protect your Users with Circuit breakers

New Behavior:

* CB gets signals from anywhere * Signal combining logic

Page 49: Protect your Users with Circuit breakers

* Allows many (many) new signals

* Must combine signals * Adds complexity to system

Page 50: Protect your Users with Circuit breakers

Mitigating Downtime

Component 2:

Page 51: Protect your Users with Circuit breakers
Page 52: Protect your Users with Circuit breakers
Page 53: Protect your Users with Circuit breakers
Page 54: Protect your Users with Circuit breakers
Page 55: Protect your Users with Circuit breakers

New Behavior: * Code can check in advance about healthiness of system * Automatic monitoring!

Page 56: Protect your Users with Circuit breakers

* Build features on top of system health status

* Requires a single source of truth?

Page 57: Protect your Users with Circuit breakers

Recovering Effectively

Component 3:

Page 58: Protect your Users with Circuit breakers
Page 59: Protect your Users with Circuit breakers

Dark launch:

* Reject but process normally * Dangerous with side effects

Block User Request Try to process anyway!

Page 60: Protect your Users with Circuit breakers

Synthetic:

* Dark launching with fake requests * Not necessarily representative

Block User Request Process fake requests

Page 61: Protect your Users with Circuit breakers

New Behavior:

* Traffic determines health * Removal of recovery timeouts

Page 62: Protect your Users with Circuit breakers

* Representative recovery * No timeout tuning required * Dark launching not always possible * Synthetic can be unrepresentative

Page 63: Protect your Users with Circuit breakers

in summary

Page 64: Protect your Users with Circuit breakers

Your system will fail, have a plan!

Page 65: Protect your Users with Circuit breakers

The basic CB is better than nothing

Page 66: Protect your Users with Circuit breakers

Questions to ask:

* What is “unhealthy” for my system? * How should I react to unhealthiness? * How do we recover?

Page 67: Protect your Users with Circuit breakers

Questions to ask:

* What is “unhealthy” for my system? * How should I react to unhealthiness? * How do we recover?

Page 68: Protect your Users with Circuit breakers

Questions to ask:

* What is “unhealthy” for my system? * How should I react to unhealthiness? * How do we recover?

Page 69: Protect your Users with Circuit breakers

Questions to ask:

* What is “unhealthy” for my system? * How should I react to unhealthiness? * How do we recover?

Page 70: Protect your Users with Circuit breakers

…and much more!

Page 71: Protect your Users with Circuit breakers

Much comes down to your use case

Page 72: Protect your Users with Circuit breakers

@YelpEngineering

fb.com/YelpEngineers

engineeringblog.yelp.com

github.com/yelp

[email protected] @scott_triglia

Page 73: Protect your Users with Circuit breakers
Page 74: Protect your Users with Circuit breakers

Downtime options besides rejecting

requests?

Page 75: Protect your Users with Circuit breakers

http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html

Page 76: Protect your Users with Circuit breakers

How do I safely test out a new circuit breaker?

Page 77: Protect your Users with Circuit breakers

https://engineering.heroku.com/blogs/2015-06-30-improved-production-stability-with-circuit-breakers/