the fine art of breaking stuff in production on …...resilient applications infrastructure network...

CHAOS ENGINEERING THE FINE ART OF BREAKING STUFF IN PRODUCTION ON PURPOSE GEERT VAN DER CRUIJSEN @GEERTVDC

Posted on 14-Oct-2020


Page 1

CHAOS ENGINEERING: THE FINE ART OF BREAKING STUFF IN PRODUCTION ON PURPOSE

GEERT VAN DER CRUIJSEN

@GEERTVDC

Page 2

GEERT VAN DER CRUIJSEN

@GEERTVDC

CLOUD NATIVE ARCHITECT

#DOEPICSHIT

FULL CYCLE DEVELOPER

DEVOPS COACH

Page 3

WHY DO WE NEED CHAOS ENGINEERING?


Page 7

“IN A COMPLEX LANDSCAPE YOUR APPLICATION IS NEVER FULLY UP”

Page 8

TRADITIONAL MONITORING TOOLS ARE DEAD!

Page 9

MEASURE USER IMPACT

Page 10

MEASURE USER IMPACT

RELIABILITY
AVAILABILITY
LATENCY
THROUGHPUT
CORRECTNESS
FRESHNESS
COVERAGE
QUALITY
DURABILITY
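
As a sketch of what "measure user impact" can look like in practice, the Python below derives two of the indicators on this slide, availability and latency, from raw request records. The `Request` record, the rule that a 5xx status counts as a failed request, and the nearest-rank percentile are assumptions made for illustration; the talk does not prescribe a formula. Python 3.9+ is assumed.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    status: int          # HTTP status the user received
    latency_ms: float    # how long the user waited


def availability(requests: list[Request]) -> float:
    """Share of requests that did not fail server-side (assumption: 5xx = failure)."""
    if not requests:
        return 1.0
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_percentile(requests: list[Request], pct: float) -> float:
    """Nearest-rank percentile of user-facing latency, e.g. pct=99 for p99."""
    if not requests:
        raise ValueError("no requests to measure")
    latencies = sorted(r.latency_ms for r in requests)
    rank = max(1, math.ceil(pct / 100 * len(latencies)))  # 1-based rank
    return latencies[rank - 1]


sample = [Request(200, 120.0), Request(200, 80.0), Request(503, 30.0), Request(200, 400.0)]
print(availability(sample))            # 0.75
print(latency_percentile(sample, 50))  # 80.0
```

Several of the other indicators (freshness, correctness, durability) can often be expressed the same way: good events divided by all events, counted from the user's point of view rather than from a server's CPU graph.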

Page 11

RESILIENT APPLICATIONS

INFRASTRUCTURE
NETWORK
APPLICATION
PEOPLE

Page 12

GRACEFUL DEGRADATION
FAIL OPEN
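
A minimal Python sketch of the two ideas on this slide, assuming a hypothetical recommendation service and a hypothetical rate limiter passed in as plain functions. Graceful degradation serves a generic fallback instead of failing the whole page; failing open lets the request through when a non-critical guard is unavailable.

```python
from typing import Callable

# Hypothetical static fallback, shown when personalised recommendations are unavailable.
DEFAULT_RECOMMENDATIONS = ["bestseller-1", "bestseller-2", "bestseller-3"]


def recommendations_for(user_id: str, fetch: Callable[[str], list[str]]) -> list[str]:
    """Graceful degradation: a worse but working page beats an error page."""
    try:
        return fetch(user_id)
    except Exception:
        # A real system would also log and count this so the degradation stays visible.
        return DEFAULT_RECOMMENDATIONS


def is_allowed(user_id: str, rate_limiter: Callable[[str], bool]) -> bool:
    """Fail open: if the rate-limiter backend is down, let the request through."""
    try:
        return rate_limiter(user_id)
    except Exception:
        return True
```

Failing open is a deliberate choice per dependency: it suits guards whose outage is less harmful than blocking every user, and it is the wrong default for checks such as authentication, which should fail closed.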

Page 13

BUT WE DO TESTS?

Page 14

BUT WE DO TESTS?

UNIT TESTS: INPUT → UNIT A → OUTPUT

Page 15

BUT WE DO TESTS?

INTEGRATION TESTS: INPUT AND OUTPUT BETWEEN COMPONENT / SERVICE A AND COMPONENT / SERVICE B

Page 16

WHAT IS CHAOS ENGINEERING?

Page 17

CHAOS ENGINEERING IS NOT RANDOMLY BREAKING STUFF IN PRODUCTION

Page 18

CHAOS ENGINEERING

“Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.” (https://principlesofchaos.org)


Page 20

SERVICE
INPUT OUTPUT
SERVICE

Page 21

CHAOS ENGINEERING EXPERIMENTS

HOST FAILURE
RESOURCE CAPACITY ATTACKS
APPLICATION FAILURE
NETWORK ATTACKS
BRENT ATTACK
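
Tools such as Gremlin or Chaos Toolkit inject these failures at the host, container or network level. To show the idea at application level, here is a hedged Python sketch of a hypothetical `chaos` decorator that adds latency (a stand-in for a network attack) and fails a fraction of calls (a stand-in for application failure). The decorator, `InjectedFault` and `get_stock` are illustrative names, not part of any tool's API.

```python
import functools
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised by the wrapper so injected failures are easy to tell apart in logs."""


def chaos(latency_s: float = 0.0, error_rate: float = 0.0,
          rng: Optional[random.Random] = None):
    """Hypothetical decorator: add fixed latency and/or fail a fraction of calls."""
    rng = rng or random.Random()

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            if latency_s > 0:
                time.sleep(latency_s)  # simulate a slow network hop or dependency
            if rng.random() < error_rate:  # random() is in [0, 1), so 1.0 always fails
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper

    return decorator


@chaos(latency_s=0.2, error_rate=0.1)
def get_stock(sku: str) -> int:
    return 42  # stand-in for a real call to an inventory service
```

Wrapping a dependency call like this in a test or staging environment is a cheap way to check the timeouts and fallbacks before running the same attack with a real tool in production.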

Page 22

CHAOS ENGINEERING: ONLY IN PRODUCTION?

Page 23

HOW TO START YOUR FIRST EXPERIMENT

Page 24

GAME DAY

Page 25

INCIDENT RESPONSE LEARNING

NORMAL → OUTAGE → DETECT & ANALYSIS → FIX → LEARN → IMPROVE

Page 26

CHAOS GAME DAY

NORMAL → CHAOS EXPERIMENT → DETECT & ANALYSIS → FIX → LEARN → IMPROVE

Page 27

CHAOS EXPERIMENT PHASES

STEADY STATE → DEFINE HYPOTHESIS → DESIGN & EXECUTE → LEARN → FIX → EMBED
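
The first three phases can be captured in a small harness. The Python below is a minimal sketch under stated assumptions: the `Experiment` record and `run` function are hypothetical, the probe returns a single steady-state number (for example orders per minute), and the hypothesis is a function comparing the baseline with the value measured during the failure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """Hypothetical record of one chaos experiment."""
    name: str
    probe: Callable[[], float]                  # steady-state metric, e.g. orders per minute
    hypothesis: Callable[[float, float], bool]  # (baseline, during) -> did steady state hold?
    inject: Callable[[], None]                  # start the failure (e.g. kill a pod)
    rollback: Callable[[], None]                # undo the failure


def run(exp: Experiment) -> bool:
    """Steady state, hypothesis, execute; returns True if the hypothesis held."""
    baseline = exp.probe()  # measure steady state before touching anything
    try:
        exp.inject()        # execute: introduce the turbulent condition
        during = exp.probe()
        return exp.hypothesis(baseline, during)
    finally:
        exp.rollback()      # always restore, even if injection or the probe crashes
```

Learn, fix and embed stay human activities: the returned boolean is only the starting point for the review.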

Page 28

STEADY STATE

Page 29

STEADY STATE

MEASURE BUSINESS METRICS

100 ms of extra load time drops Amazon’s sales by 1%


Page 32

STEADY STATE

SERVICE UNDER TEST → SERVICE B, VIA ROUTING
98% OF TRAFFIC: UNCHANGED
1%: CONTROL SERVICE
1%: EXPERIMENT SERVICE
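
One way to implement such a split, sketched in Python under assumptions of our own: users are assigned to a bucket by hashing their ID together with a hypothetical experiment name, so a user stays in the same group for the whole run. The hypothesis then compares the two equally sized 1% groups with each other rather than with the whole production population; the 1% tolerance is an illustrative value.

```python
import hashlib


def bucket(user_id: str, experiment: str) -> str:
    """Route 1% of users to control, 1% to experiment, 98% to normal production."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    slot = int.from_bytes(digest[:4], "big") % 100   # stable value in 0..99
    if slot == 0:
        return "control"
    if slot == 1:
        return "experiment"
    return "production"


def hypothesis_holds(control_success: float, experiment_success: float,
                     tolerance: float = 0.01) -> bool:
    """Steady state holds if both 1% groups see (almost) the same success rate."""
    return abs(control_success - experiment_success) <= tolerance
```

Only the experiment group gets the injected failure, so any difference between the two small groups can be attributed to the experiment, and at most 1% of users are exposed if the hypothesis turns out to be wrong.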

Page 33

ALWAYS BE ABLE TO ABORT
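
A hedged Python sketch of an abort guard: the failure stays active for a fixed duration, but a metric is polled and the experiment is rolled back as soon as it crosses a limit agreed on beforehand. `run_with_abort`, the error-rate probe and the threshold are hypothetical; in practice the probe would query your monitoring system.

```python
import time
from typing import Callable


def run_with_abort(inject: Callable[[], None], rollback: Callable[[], None],
                   error_rate: Callable[[], float], max_error_rate: float,
                   duration_s: float, poll_s: float = 1.0) -> bool:
    """Keep a failure active for `duration_s`, aborting early if users are hurt too much.

    Returns True if the experiment was aborted.
    """
    aborted = False
    try:
        inject()
        deadline = time.monotonic() + duration_s
        while time.monotonic() < deadline:
            if error_rate() > max_error_rate:  # abort condition agreed on before the run
                aborted = True
                break
            time.sleep(poll_s)
    finally:
        rollback()  # both the abort path and the normal path restore the system
    return aborted
```

The rollback sits in a `finally` block on purpose: an exception in the probe or the injection must never leave the failure running.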

Page 34

DEFINE HYPOTHESIS

Page 35

DEFINE HYPOTHESIS

BRAINSTORM WHAT CAN GO WRONG

BRING EVERYONE:
DEVELOPERS
SRE / OPERATIONS
NETWORKS
BUSINESS
INFRASTRUCTURE
TESTERS

WHAT CAN GO WRONG?
WHAT IF THE DATABASE IS DOWN?
WHAT IF A SERVICE RESPONDS SLOWLY?
WHAT IF MY CACHE RESPONDS SLOWLY?
WHAT IF A POD DIES?
WHAT IF THE LOAD BALANCER STOPS?
WHAT IF …?

Page 36

STOP IF YOU KNOW THE EXPERIMENT WILL BREAK

Page 37

DESIGN & EXECUTE EXPERIMENT

Page 38

DESIGN & EXECUTE EXPERIMENT

START SMALL
NOTIFY PEOPLE INVOLVED
SLOWLY INCREASE BLAST RADIUS

TOOLS:
GREMLIN.COM
CHAOSTOOLKIT.ORG
GITHUB.COM/NETFLIX/SIMIANARMY
GITHUB.COM/ASOBTI/KUBE-MONKEY
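
"Start small, slowly increase blast radius" can be written as a loop. The Python below is a sketch assuming a hypothetical `experiment` callback that attacks the given targets (for example by driving one of the tools listed above) and reports whether steady state held; the fractions are illustrative.

```python
import random
from typing import Callable, Optional, Sequence


def expand_blast_radius(targets: Sequence[str], fractions: Sequence[float],
                        experiment: Callable[[list[str]], bool],
                        rng: Optional[random.Random] = None) -> float:
    """Hit a growing share of targets; stop at the first step that breaks steady state.

    `fractions` must be in (0, 1]. Returns the largest fraction that passed (0.0 if none).
    """
    rng = rng or random.Random()
    largest_ok = 0.0
    for fraction in sorted(fractions):
        count = max(1, round(len(targets) * fraction))
        victims = rng.sample(list(targets), count)
        if not experiment(victims):  # hypothesis failed: stop, learn, fix, rerun
            break
        largest_ok = fraction
    return largest_ok
```

For example, `expand_blast_radius(pods, [0.01, 0.05, 0.25], attack)` first attacks a single pod out of a hundred and only widens the attack while the hypothesis keeps holding.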

Page 39

LEARN

Page 40

LEARN

HOW FAST DID WE RECOVER?
HOW FAST DID WE DETECT?
DO NOT BLAME!
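
The two questions on this slide become numbers once the moments of injection, detection and recovery are recorded. The Python below is a minimal sketch; measuring both durations from the moment of injection is an assumption, since teams define "time to recover" differently (some start the clock at detection).

```python
from datetime import datetime, timedelta


def detect_and_recover(injected: datetime, detected: datetime,
                       recovered: datetime) -> tuple[timedelta, timedelta]:
    """Time to detect and time to recover, both measured from the moment of injection."""
    if not injected <= detected <= recovered:
        raise ValueError("expected injected <= detected <= recovered")
    return detected - injected, recovered - injected


def mean_duration(durations: list[timedelta]) -> timedelta:
    """Average over several game days, to see whether the team is getting faster."""
    return sum(durations, timedelta()) / len(durations)
```

Tracking these figures across game days measures the system and the process, not individuals, which keeps the review blameless.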

Page 41

FIX

Page 42

FIX

IMPLEMENT FIX
RERUN EXPERIMENT

Page 43

EMBED

Page 44

EMBED

ONBOARDING
CONTINUOUS CHAOS
EMBED IN CULTURE

Page 45

PATTERNS

RESILIENT ARCHITECTURE

PARALLEL EXECUTION
ASYNC COMMUNICATION
QUEUE BASED LOAD DISTRIBUTION
IDEMPOTENT APIS
BULKHEAD PATTERN
CIRCUIT BREAKERS
SPLIT RESPONSIBILITIES

Page 46

MULTI PARALLELISM

PARALLELISM   AVAILABILITY   DOWNTIME PER YEAR
1             99%            3 DAYS 16 HOURS
2             99.99%         53 MINUTES
3             99.9999%       32 SECONDS

(EACH COPY 99% AVAILABLE, COPIES FAILING INDEPENDENTLY)

HOW PARALLEL IS YOUR CLOUD COMPONENT?
REGIONS, AVAILABILITY ZONES
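
The table follows from one formula: if each of n copies is available a fraction A of the time and copies fail independently, the service is down only when all copies are down, so its availability is 1 - (1 - A)^n. A short Python check, assuming a 365-day year:

```python
def parallel_availability(single: float, copies: int) -> float:
    """Availability when any one of `copies` independent replicas is enough."""
    return 1 - (1 - single) ** copies


def downtime_per_year_s(availability: float) -> float:
    """Expected seconds of downtime in a 365-day year."""
    return (1 - availability) * 365 * 24 * 3600


for n in (1, 2, 3):
    a = parallel_availability(0.99, n)
    print(f"{n} copies: {a:.4%} available, {downtime_per_year_s(a):,.1f} s down per year")
```

For A = 0.99 this gives 315,360 s (about 3.65 days), 3,153.6 s (about 53 minutes) and about 31.5 s, matching the rounded figures in the table. The independence assumption is the catch: copies in the same availability zone or region can fail together, which is why the slide asks where your copies run.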

Page 47

ASYNC COMMUNICATION

SYNC REQUIRES A CONNECTION PER REQUEST
FOCUS ON MESSAGE-BASED COMMUNICATION
DECOUPLING: PUB / SUB, LISTENER
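
To illustrate the decoupling, here is a toy in-memory publish/subscribe bus in Python. It is a hypothetical sketch, not a replacement for a real broker: it is synchronous and keeps nothing on disk, so it only shows that the publisher does not know its listeners and that one failing listener does not take the others down.

```python
from collections import defaultdict
from typing import Any, Callable

Message = dict[str, Any]
Handler = Callable[[Message], None]


class InMemoryBus:
    """Toy publish/subscribe bus: publishers never know who is listening."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.failed: list[tuple[str, Message]] = []  # stand-in for a dead-letter queue

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        for handler in self._subscribers.get(topic, []):
            try:
                handler(message)
            except Exception:
                # One broken listener must not break the publisher or the other listeners.
                self.failed.append((topic, message))
```

A real message broker adds what this toy lacks: messages are persisted, so a listener that is down can catch up later instead of losing the message.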

Page 48

QUEUE BASED LOAD DISTRIBUTION

Page 49

QUEUE BASED LOAD DISTRIBUTION

SERVICE BUS
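
The slide names a service bus as the queue. As a stand-in that runs anywhere, the Python sketch below uses `asyncio.Queue`: a burst of requests is enqueued at once, and a fixed number of workers drain it at their own pace, so the backend never sees the spike directly. Burst size, worker count and the simulated work delay are illustrative assumptions.

```python
import asyncio


async def producer(queue: asyncio.Queue, burst: int) -> None:
    """A traffic spike: enqueue everything at once instead of calling the service directly."""
    for i in range(burst):
        await queue.put(i)


async def worker(queue: asyncio.Queue, processed: list[int], delay_s: float) -> None:
    """The backend drains the queue at its own steady pace."""
    while True:
        item = await queue.get()
        await asyncio.sleep(delay_s)  # simulated work
        processed.append(item)
        queue.task_done()


async def level_load(burst: int = 20, workers: int = 2, delay_s: float = 0.01) -> list[int]:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list[int] = []
    tasks = [asyncio.create_task(worker(queue, processed, delay_s)) for _ in range(workers)]
    await producer(queue, burst)
    await queue.join()  # wait until every queued message has been handled
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return processed


if __name__ == "__main__":
    print(len(asyncio.run(level_load())))
```

With a real broker the queue also survives a crash of the workers; giving the queue a maximum size is one way to add back-pressure when producers are consistently faster than consumers.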

Page 50

IDEMPOTENT APIS

HTTP METHOD   IDEMPOTENT   SAFE
GET           YES          YES
HEAD          YES          YES
PUT           YES          NO
DELETE        YES          NO
POST          NO           NO
PATCH         NO           NO
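
POST is not idempotent by itself, but it can be made safe to retry. A common convention is a client-generated idempotency key: the server stores the response per key and returns the stored response when the same key arrives again. The Python below is a hypothetical in-memory sketch of that idea; `PaymentsApi` and `post_payment` are illustrative names.

```python
import uuid
from typing import Any


class PaymentsApi:
    """Toy POST handler made safe to retry with an idempotency key (hypothetical API)."""

    def __init__(self) -> None:
        self._responses: dict[str, dict[str, Any]] = {}
        self.charges = 0

    def post_payment(self, idempotency_key: str, amount_cents: int) -> dict[str, Any]:
        # A retried request with the same key gets the stored response, not a second charge.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges += 1
        response = {"payment_id": str(uuid.uuid4()), "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response
```

A production version needs a shared, durable store for the keys, protection against two concurrent requests with the same key, and a rejection when a key is reused with a different payload.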

Page 51

BULKHEAD PATTERN

ISOLATE WORKLOADS LIKE THE COMPARTMENTS IN THE HULL OF A SHIP
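
A minimal Python sketch of a bulkhead, assuming one `Bulkhead` instance per downstream dependency (the class and exception names are hypothetical). Each compartment has a fixed number of concurrent slots; when they are all taken, new calls are rejected immediately instead of piling up and exhausting the threads the rest of the service needs.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFull(Exception):
    """Raised when a compartment has no free slots."""


class Bulkhead:
    """Cap concurrent calls to one dependency so it cannot exhaust the whole service."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func: Callable[[], T]) -> T:
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull("compartment full, rejecting instead of queueing")
        try:
            return func()
        finally:
            self._slots.release()
```

With, say, `payments = Bulkhead(10)` and `recommendations = Bulkhead(5)`, a hanging recommendation service can tie up at most five threads, and payment calls keep working.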

Page 52

CIRCUIT BREAKER
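
A compact, single-threaded Python sketch of the circuit breaker states: closed (calls pass), open (calls fail fast after too many consecutive failures), and half-open (after a cool-down, one trial call decides whether to close again). Thresholds and names are assumptions; the injectable `clock` only exists to make the sketch testable.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpen(Exception):
    """Raised instead of calling a dependency that is considered down."""


class CircuitBreaker:
    """Closed -> open after `max_failures` consecutive failures; half-open after `reset_s`."""

    def __init__(self, max_failures: int = 3, reset_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_s = reset_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_s:
            return "half-open"
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpen("failing fast; dependency is considered down")
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self.state == "half-open" or self._failures >= self.max_failures:
                self._opened_at = self._clock()  # (re)open the circuit
            raise
        self._failures = 0
        self._opened_at = None  # a success closes the circuit
        return result
```

Failing fast gives the struggling dependency room to recover and gives the caller an immediate signal to use a fallback.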

Page 53

CIRCUIT BREAKER

ADD JITTER TO RETRIES
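
Why jitter: if many clients retry on the same fixed schedule, their retries arrive in synchronized waves that can knock a recovering service over again. The Python below sketches exponential backoff with one common jitter variant (often called "full jitter"): each wait is a random time between zero and the current backoff. Defaults are illustrative; the `sleep` parameter is only there for testing.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_jitter(func: Callable[[], T], attempts: int = 5, base_s: float = 0.1,
                      cap_s: float = 5.0, rng: Optional[random.Random] = None,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry with exponential backoff, waiting a random time in [0, backoff] between tries."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            backoff = min(cap_s, base_s * 2 ** attempt)
            sleep(rng.uniform(0, backoff))  # spread retries so clients do not sync up
    raise ValueError("attempts must be >= 1")
```

Retries are only safe for idempotent operations, and they work best behind a circuit breaker so that a dependency that is clearly down is not hammered at all.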

Page 54

SPLIT RESPONSIBILITIES

READ / WRITE SHARDING
CQRS
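
A minimal Python sketch of CQRS, assuming a hypothetical order domain: the command side validates changes and appends them to a change feed, and a separate read model is updated from that feed. Because the two sides are split, each can be scaled, stored and failed independently; the price is that the read side is eventually consistent.

```python
from dataclasses import dataclass, field


@dataclass
class OrderWriteModel:
    """Command side: validates and records changes (hypothetical order domain)."""
    orders: dict[str, list[str]] = field(default_factory=dict)
    events: list[tuple[str, str]] = field(default_factory=list)

    def add_item(self, order_id: str, item: str) -> None:
        if not item:
            raise ValueError("item must not be empty")
        self.orders.setdefault(order_id, []).append(item)
        self.events.append((order_id, item))  # change feed for the read side


@dataclass
class OrderReadModel:
    """Query side: a denormalised view optimised for reads, built from the change feed."""
    item_counts: dict[str, int] = field(default_factory=dict)
    position: int = 0  # how far into the change feed this view has processed

    def catch_up(self, events: list[tuple[str, str]]) -> None:
        for order_id, _item in events[self.position:]:
            self.item_counts[order_id] = self.item_counts.get(order_id, 0) + 1
        self.position = len(events)

    def count(self, order_id: str) -> int:
        return self.item_counts.get(order_id, 0)
```

If the write side is degraded, queries keep being served from the last state of the read model; a chaos experiment is a good way to check how stale that state may get before users notice.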

Page 55

WRAP UP

BIG CULTURE CHANGE
FULL CYCLE DEVELOPERS
PRODUCTION ACCESS

START EXPERIMENTING
START SMALL
CHECK OUT TOOLS
OBSERVABILITY

Page 56

“CHAOS ENGINEERING DOESN’T CAUSE PROBLEMS, IT JUST REVEALS THEM”

NORA JONES, CHAOS ENGINEERING LEAD AT SLACK

Page 57

GEERT VAN DER CRUIJSEN

@GEERTVDC

THANK YOU!

ALL PICTURES USED ARE FROM UNSPLASH.COM

RESOURCES

BOOKS:
Chaos Engineering (O’Reilly)
Chaos Engineering Observability (O’Reilly)

TOOLS:
chaostoolkit.org
gremlin.com
github.com/netflix/simianarmy
github.com/asobti/kube-monkey

RESOURCES:
principlesofchaos.org
github.com/dastergon/awesome-chaos-engineering
docs.microsoft.com/en-us/azure/architecture/patterns/category/resiliency