circuit breakers @ api world 2016

78
Handle API Failure gracefully with Circuit Breakers Scott Triglia 1 Jim’s

Upload: scott-triglia

Post on 13-Apr-2017

174 views

Category:

Software


0 download

TRANSCRIPT

Page 1: Circuit breakers @ API World 2016

Handle API Failure gracefully with Circuit Breakers

Scott Triglia

1

Jim’s

Page 2: Circuit breakers @ API World 2016

2

Yelp’s Mission:Connecting people with great

local businesses.

Page 3: Circuit breakers @ API World 2016

Yelp StatsAs of Q2 2016

92M monthly

users

32 countries

72% of searches from

mobile

108M reviews

Page 4: Circuit breakers @ API World 2016

Scott Triglia

4

@scott_triglia

Work with the $$$

Page 5: Circuit breakers @ API World 2016

Let’s talk Circuit Breakers

5

Page 6: Circuit breakers @ API World 2016

6

Page 7: Circuit breakers @ API World 2016

7

Page 8: Circuit breakers @ API World 2016

8Photo by zerial, CC BY-NC 2.0

Page 9: Circuit breakers @ API World 2016

9

Page 10: Circuit breakers @ API World 2016

10

Page 11: Circuit breakers @ API World 2016

Our goals today: introduce a basic circuit breaker

11

Page 12: Circuit breakers @ API World 2016

Our goals today: a modular circuit breaker

12

Page 13: Circuit breakers @ API World 2016

Our goals today: test it out on several scenarios

13

Page 14: Circuit breakers @ API World 2016

14

Page 15: Circuit breakers @ API World 2016

15

Page 16: Circuit breakers @ API World 2016

1616

1

2

34

Page 17: Circuit breakers @ API World 2016

the fundamental rule: your systems will fail

what’s your response?

17

Page 18: Circuit breakers @ API World 2016

1818

1

2

Page 19: Circuit breakers @ API World 2016

1919

1

2

3

Page 20: Circuit breakers @ API World 2016

2020

1

2

34

Page 21: Circuit breakers @ API World 2016

21

Nygard’s circuit breaker

Page 22: Circuit breakers @ API World 2016

22

Page 23: Circuit breakers @ API World 2016

23

Page 24: Circuit breakers @ API World 2016

24

Page 25: Circuit breakers @ API World 2016

25

Page 26: Circuit breakers @ API World 2016

26

Circuit Breaker States: * Healthy (or “closed”) * Recovering (or “half-open”) * Unhealthy (or “open”)

Page 27: Circuit breakers @ API World 2016

27

Page 28: Circuit breakers @ API World 2016

28

Recovery:

* Wait for recovery_timeout seconds* Send a trial request, trust its results

Page 29: Circuit breakers @ API World 2016

29

Before a circuit breaker

Page 30: Circuit breakers @ API World 2016

30

Assume the kitchen gets slow

Page 31: Circuit breakers @ API World 2016

31

Kitchen’s backlog grows

Page 32: Circuit breakers @ API World 2016

32

Diners wait much longer to get food

Page 33: Circuit breakers @ API World 2016

33

New diners make issues worse

Page 34: Circuit breakers @ API World 2016

34

And your entire system collapses

Page 35: Circuit breakers @ API World 2016

35

And your entire system collapses

Page 36: Circuit breakers @ API World 2016

36

With a circuit breaker

Page 37: Circuit breakers @ API World 2016

37

CB sees backlog, stops orders

Page 38: Circuit breakers @ API World 2016

38

Fewer frustrated diners

Page 39: Circuit breakers @ API World 2016

39

Reduced load on the kitchen

Page 40: Circuit breakers @ API World 2016

40

A well defined failure mode

Page 41: Circuit breakers @ API World 2016

41

How can we do better?

Page 42: Circuit breakers @ API World 2016

42

Improvements

Page 43: Circuit breakers @ API World 2016

43

1) Detecting Unhealthiness

Page 44: Circuit breakers @ API World 2016

44

2) Mitigating downtime

Page 45: Circuit breakers @ API World 2016

45

3) Recovering Effectively

Page 46: Circuit breakers @ API World 2016

46

Detecting Unhealthiness

Component 1:

Page 47: Circuit breakers @ API World 2016

47

Page 48: Circuit breakers @ API World 2016

48

def signal_overload(cb): if len(jobs) > THRESH: cb.mark_unhealthy()

Page 49: Circuit breakers @ API World 2016

49

New Behavior:

* CB gets signals from anywhere * Signal combining logic

Page 50: Circuit breakers @ API World 2016

50

* Allows many (many) new signals

* Must combine signals * Adds complexity to system

Page 51: Circuit breakers @ API World 2016

51

Mitigating Downtime

Component 2:

Page 52: Circuit breakers @ API World 2016

52

Page 53: Circuit breakers @ API World 2016

53

Page 54: Circuit breakers @ API World 2016

54

Page 55: Circuit breakers @ API World 2016

55

Page 56: Circuit breakers @ API World 2016

56

New Behavior: * Code can check in advance about healthiness of system * Automatic monitoring!

Page 57: Circuit breakers @ API World 2016

57

* Build features on top of system health status

* Requires a single source of truth?

Page 58: Circuit breakers @ API World 2016

58

Recovering Effectively

Component 3:

Page 59: Circuit breakers @ API World 2016

59

Page 60: Circuit breakers @ API World 2016

60

Dark launch:

* Reject but process normally * Dangerous with side effects

Block User Request Try to process anyway!

Page 61: Circuit breakers @ API World 2016

61

Synthetic:

* Dark launching with fake requests * Not necessarily representative

Block User Request Process fake requests

Page 62: Circuit breakers @ API World 2016

62

New Behavior:

* Traffic determines health * Removal of recovery timeouts

Page 63: Circuit breakers @ API World 2016

63

* Faster(?) recovery * No timeout tuning required * Dark launching not always possible * Synthetic can be unrepresentative

Page 64: Circuit breakers @ API World 2016

64

in summary

Page 65: Circuit breakers @ API World 2016

65

Your system will fail, have a plan!

Page 66: Circuit breakers @ API World 2016

66

The basic CB is better than nothing

Page 67: Circuit breakers @ API World 2016

67

Questions to ask:

* What is “unhealthy” for my system? * How should I react to unhealthiness? * How do we recover?

Page 68: Circuit breakers @ API World 2016

68

Questions to ask:

* What is “unhealthy” for my system? * How should I react to unhealthiness? * How do we recover?

Page 69: Circuit breakers @ API World 2016

69

Questions to ask:

* What is “unhealthy” for my system? * How should I react to unhealthiness? * How do we recover?

Page 70: Circuit breakers @ API World 2016

70

Questions to ask:

* What is “unhealthy” for my system? * How should I react to unhealthiness? * How do we recover?

Page 71: Circuit breakers @ API World 2016

71

…and much more!

Page 72: Circuit breakers @ API World 2016

Much comes down to your use case

72

Page 73: Circuit breakers @ API World 2016

@YelpEngineering

fb.com/YelpEngineers

engineeringblog.yelp.com

github.com/yelp

[email protected] @scott_triglia

Page 74: Circuit breakers @ API World 2016
Page 75: Circuit breakers @ API World 2016

75

Can’t we do better than rejecting requests?

Page 76: Circuit breakers @ API World 2016

http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html

Page 77: Circuit breakers @ API World 2016

77

How do I safely test out a new circuit breaker?

Page 78: Circuit breakers @ API World 2016

https://engineering.heroku.com/blogs/2015-06-30-improved-production-stability-with-circuit-breakers/