why resilience - a primer at varying flight altitudes
DESCRIPTION
This session provides a primer on resilience at varying flight altitudes. It starts at the management level and motivates why resilience is important, why it is important today, and what the business case for resilience is (or actually is not). Then it descends to a high-level architectural view and explains resilience in a bit more detail, its correlation to availability, and the difference between resilience and robustness. Afterwards it descends to the design level and explains some selected core principles of resilience, some of them garnished with ground-level code examples. At the end the flight altitude is raised again: some recommendations on how to introduce resilient software design into your software development process are given, and the correlation to some related topics is explained. Of course this slide deck shows only a fraction of the actual talk contents, as the voice track is missing, but I hope it will be helpful anyway.
TRANSCRIPT
![Page 1: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/1.jpg)
Why Resilience? A primer at varying flight altitudes
Uwe Friedrichsen, codecentric AG, 2014
![Page 2: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/2.jpg)
@ufried Uwe Friedrichsen | [email protected] | http://slideshare.net/ufried | http://ufried.tumblr.com
![Page 3: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/3.jpg)
Resilience? Never heard of it …
![Page 4: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/4.jpg)
re•sil•ience (rɪˈzɪl yəns) also re•sil′ien•cy, n.
1. the power or ability to return to the original form, position, etc., after being bent, compressed, or stretched; elasticity.
2. ability to recover readily from illness, depression, adversity, or the like; buoyancy.
Random House Kernerman Webster's College Dictionary, © 2010 K Dictionaries Ltd. Copyright 2005, 1997, 1991 by Random House, Inc. All rights reserved.
http://www.thefreedictionary.com/resilience
![Page 5: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/5.jpg)
Resilience (IT)
The ability of an application to handle unexpected situations
- without the user noticing it (best case)
- with a graceful degradation of service (worst case)
![Page 6: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/6.jpg)
Resilience is not about testing your application
(You should definitely test your application, but that’s a different story)

    public class MySUTTest {
        @Test
        public void shouldDoSomething() {
            MySUT sut = new MySUT();
            MyResult result = sut.doSomething();
            assertEquals(<Some expected result>, result);
        }
        …
    }
![Page 7: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/7.jpg)
It’s all about production!
![Page 8: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/8.jpg)
Why should I care?
![Page 9: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/9.jpg)
Business
Production
Availability
Resilience
![Page 10: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/10.jpg)
Your web server doesn’t look good …
![Page 11: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/11.jpg)
The dreaded SiteTooSuccessfulException …
![Page 12: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/12.jpg)
Reasons to care about resilience
• Loss of lives
• Loss of goods (manufacturing facilities)
• Loss of money
• Loss of reputation
![Page 13: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/13.jpg)
Why should I care about it today?
(The risks you mention are not new)
![Page 14: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/14.jpg)
Resilience drivers
• Cloud-based systems
• Highly scalable systems
• Zero Downtime
• IoT & Mobile
• Social
→ Reliably running distributed systems
![Page 15: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/15.jpg)
What’s the business case?
(I don’t see any money to be made with it)
![Page 16: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/16.jpg)
Counter question
Can you afford to ignore it?
(It’s not about making money, it’s about not losing money)
![Page 17: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/17.jpg)
Resilience business case
• Identify risk scenarios
• Calculate current occurrence probability
• Calculate future occurrence probability
• Calculate short-term losses
• Calculate long-term losses
• Assess risks and money
• Do not forget the competitors
![Page 18: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/18.jpg)
Let’s dive deeper into resilience
![Page 19: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/19.jpg)
Classification attempt
ISO/IEC 9126 software quality characteristics
• Functionality
• Reliability
• Usability
• Efficiency
• Maintainability
• Portability
Reliability: A set of attributes that bear on the capability of software to maintain its level of performance under stated conditions for a stated period of time.
Available with acceptable latency
Resilience goes beyond that
![Page 20: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/20.jpg)
How can I maximize availability?
![Page 21: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/21.jpg)
$$\text{Availability} := \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$
MTTF: Mean Time To Failure
MTTR: Mean Time To Recovery
![Page 22: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/22.jpg)
Traditional approach (robustness)
$$\text{Availability} := \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$
Maximize MTTF
![Page 23: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/23.jpg)
A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.
Leslie Lamport
![Page 24: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/24.jpg)
Failures in today’s complex, distributed, interconnected systems are not the exception.
They are the normal case.
![Page 25: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/25.jpg)
Contemporary approach (resilience)
$$\text{Availability} := \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$
Minimize MTTR
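To make the difference between the two approaches concrete, here is a small calculation sketch based on the formula Availability = MTTF / (MTTF + MTTR). The class name and all figures (hours) are hypothetical and chosen only for illustration.

```java
/** Illustrative only: compares the effect of raising MTTF with lowering MTTR. */
final class AvailabilityExample {

    /** Availability = MTTF / (MTTF + MTTR); both values in the same time unit. */
    static double availability(double mttf, double mttr) {
        return mttf / (mttf + mttr);
    }

    public static void main(String[] args) {
        // Hypothetical figures, in hours.
        System.out.println("baseline      : " + availability(1000, 10));
        System.out.println("MTTF doubled  : " + availability(2000, 10));
        System.out.println("MTTR 10h -> 1h: " + availability(1000, 1));
    }
}
```

With these assumed numbers the baseline is about 99.0 %, doubling MTTF yields about 99.5 %, and cutting MTTR from ten hours to one yields about 99.9 %. Whether shortening recovery is also the cheaper lever depends on the system at hand.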
![Page 26: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/26.jpg)
Do not try to avoid failures. Embrace them.
![Page 27: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/27.jpg)
What kinds of failures do I need to deal with?
![Page 28: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/28.jpg)
Failure types
• Crash failure
• Omission failure
• Timing failure
• Response failure
• Byzantine failure
![Page 29: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/29.jpg)
How do I implement resilience?
![Page 30: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/30.jpg)
Bulkheads
![Page 31: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/31.jpg)
• Divide system into failure units
• Isolate failure units
• Define fallback strategy
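One possible way to sketch these three points in plain Java is a semaphore that caps the number of concurrent calls into a failure unit and a fallback that answers when the unit is saturated or fails. The class `Bulkhead` and its method names are assumptions made for this illustration, not part of the talk or of any library.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical bulkhead: limits concurrent calls into one failure unit. */
final class Bulkhead<T> {
    private final Semaphore permits;
    private final Supplier<T> fallback;

    Bulkhead(int maxConcurrentCalls, Supplier<T> fallback) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.fallback = fallback;
    }

    T call(Callable<T> action) {
        // Reject immediately instead of queueing when the unit is saturated,
        // so a slow unit cannot tie up all caller threads.
        if (!permits.tryAcquire()) {
            return fallback.get();
        }
        try {
            return action.call();
        } catch (Exception e) {
            // A failure inside the unit is contained and answered by the fallback.
            return fallback.get();
        } finally {
            permits.release();
        }
    }
}
```

A caller would create one instance per failure unit, for example `new Bulkhead<>(10, () -> "service unavailable")`, so that trouble in one unit does not consume the capacity of the others.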
![Page 32: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/32.jpg)
Redundancy
![Page 33: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/33.jpg)
• Elaborate use case
  Minimize MTTR / scale transactions / handle response errors / …
• Define routing & balancing strategy
  Round robin / master-slave / fan-out & quickest one wins / …
• Consider admin involvement
  Automatic vs. manual / notification – monitoring / …
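As an illustration of the "fan-out & quickest one wins" strategy, the following sketch sends the same request to several redundant replicas and returns the first successful answer. It relies on `ExecutorService.invokeAny` from the Java standard library; the class `FanOut` and the idea of modelling each replica as a `Callable` are assumptions made for this example.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: asks all replicas at once, the quickest success wins. */
final class FanOut {

    static <T> T quickestWins(List<Callable<T>> replicas, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newFixedThreadPool(replicas.size());
        try {
            // invokeAny returns the result of one task that completed successfully
            // and cancels the tasks that have not completed yet.
            return executor.invokeAny(replicas, timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for replicas", e);
        } catch (ExecutionException | TimeoutException e) {
            // All replicas failed, or none answered within the timeout.
            throw new IllegalStateException("no replica answered in time", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The price of this strategy is extra load, because every request is executed on every replica. It therefore fits best where low latency matters more than resource usage and where the requests are free of side effects or idempotent.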
![Page 34: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/34.jpg)
Loose Coupling
![Page 35: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/35.jpg)
• Isolate failure units (complements bulkheads)
• Go asynchronous wherever possible
• Use timeouts & circuit breakers
• Make actions idempotent
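To illustrate the last bullet: an action becomes safe to repeat if the receiver remembers which requests it has already processed. The sketch below does this with a request ID and an in-memory map. The class `IdempotentReceiver` and the notion of a client-supplied request ID are assumptions for this example; a real system would typically need a persistent and bounded store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical receiver that makes an action idempotent via a request ID. */
final class IdempotentReceiver<R> {
    private final Map<String, R> resultsByRequestId = new ConcurrentHashMap<>();

    /**
     * Runs the action at most once per request ID; repeated deliveries of the
     * same request get the stored result instead of triggering the action again.
     * Note: the action must not return null, otherwise nothing is stored.
     */
    R handle(String requestId, Supplier<R> action) {
        return resultsByRequestId.computeIfAbsent(requestId, id -> action.get());
    }
}
```

With this in place, a client that hit a timeout can simply resend the request with the same ID without risking that the action is executed twice.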
![Page 36: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/36.jpg)
Implementation Example #1
Timeouts
![Page 37: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/37.jpg)
Timeouts (1)

    // Basics
    myObject.wait();          // Do not use this by default
    myObject.wait(TIMEOUT);   // Better use this

    // Some more basics
    myThread.join();          // Do not use this by default
    myThread.join(TIMEOUT);   // Better use this
![Page 38: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/38.jpg)
Timeouts (2)

    // Using the Java concurrent library
    Callable<MyActionResult> myAction = <My Blocking Action>
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<MyActionResult> future = executor.submit(myAction);
    MyActionResult result = null;
    try {
        result = future.get();                    // Do not use this by default
        result = future.get(TIMEOUT, TIMEUNIT);   // Better use this
    } catch (TimeoutException e) {                // Only thrown if timeouts are used
        ...
    } catch (...) {
        ...
    }
![Page 39: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/39.jpg)
Timeouts (3)

    // Using Guava SimpleTimeLimiter
    Callable<MyActionResult> myAction = <My Blocking Action>
    SimpleTimeLimiter limiter = new SimpleTimeLimiter();
    MyActionResult result = null;
    try {
        result = limiter.callWithTimeout(myAction, TIMEOUT, TIMEUNIT, false);
    } catch (UncheckedTimeoutException e) {
        ...
    } catch (...) {
        ...
    }
![Page 40: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/40.jpg)
Implementation Example #2
Circuit Breaker
![Page 41: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/41.jpg)
Circuit Breaker – concept
[Diagram: Client → Circuit Breaker → Resource, with a Request shown for the cases "Resource available" and "Resource unavailable"]
[Lifecycle diagram with the states Closed, Open, Half-Open]
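The lifecycle with the states Closed, Open and Half-Open can be sketched as a small state machine. The version below follows the commonly described behaviour of the pattern: the breaker opens after a number of consecutive failures, fails fast while open, and after a wait time lets one trial call through. The class, its parameters and the fully synchronized `call` method are simplifying assumptions for illustration and not the implementation used by any particular library.

```java
import java.util.function.Supplier;

/** Minimal circuit breaker sketch; all names and thresholds are illustrative. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized State state() {
        return state;
    }

    synchronized <T> T call(Supplier<T> resource, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < openMillis) {
                return fallback.get();       // fail fast, the resource is not called
            }
            state = State.HALF_OPEN;         // wait time elapsed: allow one trial call
        }
        try {
            T result = resource.get();
            consecutiveFailures = 0;         // success closes the breaker again
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;          // stop sending requests to the resource
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }
}
```

Synchronizing the whole call keeps the sketch short but serializes all requests. A production-grade breaker would track its state without holding a lock while the resource is being called.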
![Page 42: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/42.jpg)
![Page 43: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/43.jpg)
Implemented patterns
• Timeout
• Circuit breaker
• Load shedder
![Page 44: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/44.jpg)
Supported patterns
• Bulkheads (a.k.a. Failure Units)
• Fail fast
• Fail silently
• Graceful degradation of service
• Failover
• Escalation
• Retry
• ...
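As an example of one pattern from this list, a retry can be sketched in plain Java as a bounded loop with a growing pause between attempts. The helper below is a hypothetical illustration and does not show how Hystrix supports the pattern. The attempt limit and the doubling backoff are assumptions.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper with a bounded number of attempts. */
final class Retry {

    static <T> T withRetry(Supplier<T> action, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoff = initialBackoffMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break;                   // give up, escalate the last failure
                }
                try {
                    Thread.sleep(backoff);   // give the resource time to recover
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during backoff", ie);
                }
                backoff *= 2;                // wait longer before each further attempt
            }
        }
        throw last;
    }
}
```

Retries are only safe for idempotent actions, and they should stay bounded so that a struggling resource is not flooded with repeated requests.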
![Page 45: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/45.jpg)
Hello, world!
![Page 46: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/46.jpg)
    public class HelloCommand extends HystrixCommand<String> {
        private static final String COMMAND_GROUP = "default";
        private final String name;

        public HelloCommand(String name) {
            super(HystrixCommandGroupKey.Factory.asKey(COMMAND_GROUP));
            this.name = name;
        }

        @Override
        protected String run() throws Exception {
            return "Hello, " + name;
        }
    }

    @Test
    public void shouldGreetWorld() {
        String result = new HelloCommand("World").execute();
        assertEquals("Hello, World", result);
    }
![Page 47: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/47.jpg)
Source: https://github.com/Netflix/Hystrix/wiki/How-it-Works
![Page 48: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/48.jpg)
Fallbacks
![Page 49: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/49.jpg)
• What will you do if a request fails?
• Consider failure handling from the very beginning
• Supplement with general failure handling strategies
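One simple answer to "What will you do if a request fails?" is to serve the last known good result, or a default if there has never been one. The sketch below shows this idea in plain Java. The class `LastKnownGood` is an assumption for illustration; whether a stale value is acceptable is a business decision that has to be made per use case.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical fallback strategy: answer with the last successful result. */
final class LastKnownGood<T> {
    private final AtomicReference<T> lastGood;

    LastKnownGood(T defaultValue) {
        this.lastGood = new AtomicReference<>(defaultValue);
    }

    T call(Callable<T> primary) {
        try {
            T fresh = primary.call();
            lastGood.set(fresh);             // remember the answer for bad times
            return fresh;
        } catch (Exception e) {
            return lastGood.get();           // degraded, but still a useful answer
        }
    }
}
```

For example, a price lookup could keep showing the most recent price while the pricing service is down, which is a graceful degradation of service rather than an error page.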
![Page 50: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/50.jpg)
Scalability
![Page 51: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/51.jpg)
• Define scaling strategy
• Think full stack
• Apply D-I-D rule
• Design for elasticity
![Page 52: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/52.jpg)
… and many more
• Supervision patterns
• Recovery & mitigation patterns
• Anti-fragility patterns
• Supporting patterns
• A rich pattern family
A different approach from traditional enterprise software development
![Page 53: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/53.jpg)
How do I integrate resilience into my software development process?
![Page 54: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/54.jpg)
Steps to adopt resilient software design
1. Create awareness: Go DevOps
2. Create capability: Coach your developers
3. Create sustainability: Inject errors
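Step 3 can start very small. The following sketch wraps a call so that it fails with a configurable probability, which lets a team observe how the calling code copes. The class `FaultInjector`, the failure rate and the seed parameter are assumptions made for this illustration.

```java
import java.util.Random;
import java.util.function.Supplier;

/** Hypothetical fault injector: makes a wrapped call fail at a given rate. */
final class FaultInjector {
    private final Random random;
    private final double failureRate;

    /** failureRate: 0.0 never fails, 1.0 always fails; the seed makes runs repeatable. */
    FaultInjector(double failureRate, long seed) {
        this.failureRate = failureRate;
        this.random = new Random(seed);
    }

    <T> Supplier<T> wrap(Supplier<T> target) {
        return () -> {
            // nextDouble() returns a value in [0.0, 1.0)
            if (random.nextDouble() < failureRate) {
                throw new IllegalStateException("injected failure");
            }
            return target.get();
        };
    }
}
```

Wrapping a dependency with a small failure rate in a test or staging environment shows quickly whether timeouts, fallbacks and circuit breakers behave as intended.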
![Page 55: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/55.jpg)
Related topics
Reactive
Anti-fragility
Fault-tolerant software design
Recovery-oriented computing
![Page 56: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/56.jpg)
Wrap-up
• Resilience is about availability
• Crucial for today’s complex systems
• Not caring is a risk
• Go DevOps to create awareness
![Page 57: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/57.jpg)
Do not avoid failures. Embrace them!
![Page 58: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/58.jpg)
@ufried Uwe Friedrichsen | [email protected] | http://slideshare.net/ufried | http://ufried.tumblr.com
![Page 59: Why resilience - A primer at varying flight altitudes](https://reader033.vdocument.in/reader033/viewer/2022052822/554fb1ddb4c90586258b51d0/html5/thumbnails/59.jpg)