4Developers 2015: Designing for failure - architecting fault-tolerant system - Jakub Derda

Architecting for failure: building fault-tolerant systems. Jakub Derda, Warsaw, 2015

Upload: proidea

Post on 16-Jul-2015


TRANSCRIPT

‘Tree’ component – overview

‘Tree’ component – detailed view

client

network connection

server

human factor, software, client library

ISP, protocol stack, network

load balancers, OS, power source

Your component – detailed view

What is a fault?

What is not a fault?

Service is not working on our side*

* Caused by e.g. technical failures, outages, corrupted data, attacks

What is a fault?

The real fault is when we don’t deliver value to customers.

Delivering value without a working system

Bring your own wine, we’re waiting for a license.

Last election in Poland

What is fault tolerance not?

It’s NOT making sure your system never goes down.

It (eventually) will.

What is fault tolerance?

It’s making sure that the system can recover quickly and/or the client is not impacted.

How to solve it?

Solving – redundancy

Hot/warm replicas

Caches

Geographical distribution, CDNs

Hardware redundancy

Alternative systems and procedures
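
The replica and cache items above can be combined into one read path. The following is a minimal sketch, not the speaker's implementation: `read_with_fallback`, `broken_replica`, `healthy_replica` and the plain dictionary used as a cache are all hypothetical names chosen for illustration. It tries each replica in turn and, if all of them fail, serves the last value it saw, which may be stale.

```python
from typing import Callable, Dict, List


class AllSourcesFailed(Exception):
    """Raised when no replica answered and the cache has no entry."""


def read_with_fallback(
    key: str,
    replicas: List[Callable[[str], str]],
    cache: Dict[str, str],
) -> str:
    """Try each replica in order; fall back to the last cached value."""
    for fetch in replicas:
        try:
            value = fetch(key)
        except Exception:
            # This replica is down or misbehaving; try the next one.
            continue
        cache[key] = value  # Remember the last good answer.
        return value
    if key in cache:
        # Possibly stale, but the client still gets an answer.
        return cache[key]
    raise AllSourcesFailed(key)


def broken_replica(key: str) -> str:
    """Hypothetical replica that is always unreachable."""
    raise ConnectionError("primary is unreachable")


def healthy_replica(key: str) -> str:
    """Hypothetical replica that always answers."""
    return f"value-of-{key}"
```

Calling `read_with_fallback("a", [broken_replica, healthy_replica], {})` skips the failing replica and returns the answer from the healthy one. Serving a stale cached value trades consistency for availability, which is one of the tradeoffs listed later in the deck.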

Solving – design

Stateless

Auditing

Idempotent requests

Uniqueness / randomness

Asynchronous and decoupling

Enterprise Integration Patterns (EIPs)

Commands, not data

Break the rules
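
One possible reading of the "Idempotent requests" and "Uniqueness / randomness" items is that the client attaches a unique, randomly generated ID to each request, so that a retry after a timeout cannot be applied twice. The sketch below assumes that reading. `PaymentService`, `deposit` and `new_request_id` are hypothetical names, and the in-memory dictionary stands in for whatever durable store a real system would use.

```python
import uuid
from typing import Dict


class PaymentService:
    """Hypothetical service that deduplicates requests by their ID."""

    def __init__(self) -> None:
        self.balance = 0
        # Maps request_id to the balance returned for that request.
        self._processed: Dict[str, int] = {}

    def deposit(self, request_id: str, amount: int) -> int:
        if request_id in self._processed:
            # A retry of a request we already handled:
            # return the stored result instead of applying it again.
            return self._processed[request_id]
        self.balance += amount
        self._processed[request_id] = self.balance
        return self.balance


def new_request_id() -> str:
    """Generate a random ID that is unique for practical purposes."""
    return str(uuid.uuid4())
```

A client that sends the same `request_id` twice, for example because the first response was lost on the network, gets the same result both times and the balance changes only once.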

Solving – procedures

Backup creation, cleanup and restore

QA & potential problems

Continuous integration

Deployment
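
For the backup item, a procedure is only trustworthy once the restore path has been exercised too. The sketch below is an illustration under assumptions, not a procedure from the talk: the function names are hypothetical, backups are single files copied with the standard library, and backup labels are assumed to sort chronologically (for example ISO dates).

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def checksum(path: Path) -> str:
    """SHA-256 of the file contents, used to compare copies."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def create_backup(source: Path, backup_dir: Path, label: str) -> Path:
    """Copy the source file into backup_dir under a labelled name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{label}-{source.name}"
    shutil.copy2(source, target)
    return target


def verify_restore(backup: Path, original: Path, scratch_dir: Path) -> bool:
    """Restore into a scratch directory and compare with the original."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    restored = scratch_dir / original.name
    shutil.copy2(backup, restored)
    return checksum(restored) == checksum(original)


def cleanup(backup_dir: Path, keep: int) -> List[Path]:
    """Delete all but the `keep` newest backups; return what was deleted."""
    backups = sorted(backup_dir.iterdir(), key=lambda p: p.name)
    stale = backups[:-keep] if keep > 0 else backups
    for path in stale:
        path.unlink()
    return stale
```

Running creation, restore verification and cleanup from one script is also a small step towards automating the procedure rather than relying on someone remembering the steps.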

Solving – observe

Dive deep, post-mortems

Identify bottlenecks

Observe key metrics

Verify assumptions

Predict traffic
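
As an illustration of observing a key metric, the sketch below tracks the error rate over the most recent requests and reports when it crosses a threshold. The class name `ErrorRateMonitor`, the window size and the threshold are assumptions made for the example; the slides do not say which metrics or limits to use.

```python
from collections import deque
from typing import Deque


class ErrorRateMonitor:
    """Tracks the share of failed requests in a sliding window."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        # deque with maxlen drops the oldest outcome automatically.
        self._outcomes: Deque[bool] = deque(maxlen=window)
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        """Store the outcome of one request (True means success)."""
        self._outcomes.append(ok)

    def error_rate(self) -> float:
        """Fraction of failures among the recorded outcomes."""
        if not self._outcomes:
            return 0.0
        failures = sum(1 for ok in self._outcomes if not ok)
        return failures / len(self._outcomes)

    def should_alert(self) -> bool:
        """True when the error rate is strictly above the threshold."""
        return self.error_rate() > self.threshold
```

A sliding window keeps the metric focused on recent behaviour, so an outage that is already over does not keep the alert firing indefinitely.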

Tradeoffs - simple

1/scope

QUALITY

Tradeoffs - real

cost

durability

time

consistency

trust

audit (traceability)

complexity

security

scalability

functionality

stability

reliability

extensibility

performance

maintainability

manageability

Summary

Learn to live with crashes

Automate procedures

Don’t be afraid to cross the line

Fault tolerance is not a property of a design, it’s a process.