![Page 1: CS 350 Lecture 5-3 Resilient Design - KAISTse.kaist.ac.kr/wp-content/uploads/2019/11/CS350-Lecture... · 2019-11-24 · •Distributed systems are every corner of our society •Attempts](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec9798031ffb91682210bea/html5/thumbnails/1.jpg)
CS 350 Lecture 5-3: Resilient Design
Fall 2019
SoC, KAIST
Doo-Hwan Bae
Resilience Design Patterns
Resources
• https://docs.microsoft.com/en-us/azure/architecture/patterns/category/resiliency
• http://microservices.io/patterns/monolithic.html
• https://conferences.oreilly.com/software-architecture/sa-eu-2017/public/schedule/detail/61746
• https://www.thoughtworks.com/de/insights/blog/scaling-microservices-event-stream
CS350, SoC, KAIST 2
Contents
• What & Why?
• Resilient Patterns
“We will prepare for the armies of illogical users who do crazy, unpredictable things.” (by Michael Nygard)
What?
• Resilience: the ability of a system to handle unexpected situations
  • Best case: without the user noticing it
  • Worst case: with a graceful degradation of service
• Part of the design activity
Why? (1/2)
• Distributed systems are everywhere
• Fallacies of distributed systems (wrong perceptions/assumptions)
  • The network is reliable, secure, and homogeneous
  • Latency is zero
  • Bandwidth is infinite
  • The topology does not change
  • There is one administrator
  • …
• Failures in distributed systems are not the exception
  • They are normal and, even worse, not predictable
  • What do we do with such systems?
• Option 1: Develop a fail-free system
  • Many internet-service companies have given up on this option!
• Option 2: Embrace failures and increase the availability of the system
Why? (2/2)
• It is getting worse and worse with recent IT evolution
  • Too complex to manage with traditional approaches
• Examples of such systems:
  • Cloud-based systems
  • Microservices
  • Zero downtime (100% availability)
  • Mobile
  • IoT, CPS
  • Social Web
  • System of Systems
Resilience Approach
• Availability = MTTF / (MTTF + MTTR)
  - MTTF: Mean Time To Failure
  - MTTR: Mean Time To Repair
• How can we increase the availability of a (distributed) system?
  • Increase MTTF: minimize errors/failures, use reliable hardware, ...
  • Reduce MTTR: How?
• Failure types: Crash failure, Omission failure, Timing failure, Response failure, …
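The availability formula above can be tried out with a few numbers. This is only a worked sketch: the function name and the figures (a failure every 1000 hours, a 1 hour repair) are assumptions chosen for illustration, not values from the lecture.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical service: fails every 1000 h on average, takes 1 h to repair.
baseline = availability(1000, 1)        # roughly 0.9990

# Two ways to improve it:
doubled_mttf = availability(2000, 1)    # fail half as often
halved_mttr = availability(1000, 0.5)   # recover twice as fast

print(f"baseline={baseline:.6f}")
print(f"doubled MTTF={doubled_mttf:.6f}, halved MTTR={halved_mttr:.6f}")
```

Halving MTTR gives the same availability as doubling MTTF, which is why the resilience approach puts so much weight on fast detection and recovery.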
Whole Picture for Resilient Design
Whole Picture for Resilient Design Techniques (by Uwe Friedrichsen, Resilient SW Design In a Nutshell)
Isolation
• System must not fail as a whole
• Split the system into parts and isolate the parts from each other
• Avoid cascading failures
• Foundations of resilient software design
  • Separation of concerns
  • High cohesion, low coupling
• Isolation patterns
  • Bulkhead design
  • Monolithic vs. microservice architecture
Bulkhead Pattern
Bulkhead Pattern (1/2)
• Isolate elements of an application into pools so that if one fails, the others will continue to function.
Bulkhead Pattern (2/2)
• Core isolation pattern
• Diverse implementation choices available, such as microservices, …
• Shaping good bulkheads is extremely hard
  • It is a software design issue
  • Needs understanding of SE principles, domain knowledge, system behavior, future technology evolution, etc.
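One small-scale way to picture a bulkhead is a separate, bounded pool of call slots per downstream dependency. The sketch below is an illustration under assumptions: the class name `Bulkhead`, the dependency names, and the pool sizes are all hypothetical, and a semaphore is only one of many possible implementation choices.

```python
import threading

class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot use up shared capacity."""

    def __init__(self, name: str, max_concurrent: int):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Reject immediately instead of waiting when this compartment is full.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"bulkhead '{self.name}' is full")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()

# One compartment per downstream dependency (hypothetical names and sizes).
payments = Bulkhead("payments", max_concurrent=2)
recommendations = Bulkhead("recommendations", max_concurrent=2)
```

If calls to the payments dependency hang and fill its two slots, further payment calls are rejected quickly, while the recommendations compartment keeps working.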
Monolithic Architecture (1/2) (http://microservices.io/patterns/monolithic.html)
Monolithic Architecture (2/2)
• Benefits
  • Simple to develop: most current tools support it
  • Simple to deploy: deploy a WAR file
  • Simple to scale: run multiple copies
• Drawbacks
  • Difficult to understand
  • Difficult to deploy continuously
  • Requires a long-term commitment to a technology
Microservice Architecture (1/3)
• Partition a system into small, manageable, loosely coupled pieces
Microservice Architecture (2/3)
• Benefits
  • Enables continuous delivery and deployment of large, complex applications
  • Organizes the development effort around multiple, autonomous teams
  • Easier for a developer to understand
  • The application starts faster, which makes developers more productive
• Drawbacks
  • Additional complexity of developing a distributed system
  • Difficult to test
  • Deployment complexity
  • Increased memory consumption
    • M times as many JVM runtimes, where M is the number of different services
  • Difficult to coordinate changes across teams and multiple services
Microservice Architecture (3/3)
• When to use the microservice architecture
  • Startup?
  • Large-scale service provision?
• How to decompose the application into services
  • In short, it is an ‘art’! (design is art!)
  • Some strategies
    • Decompose by business capability
    • Single Responsibility Principle (SRP)
    • Use case
    • Functional cohesion
• How to maintain data consistency
  • To ensure loose coupling, each service has its own database. How, then, can data consistency be guaranteed?
  • Check ‘Saga pattern’, ‘Event sourcing’
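As a rough illustration of the Saga idea mentioned above: each local step is paired with a compensating action, and when a later step fails the completed steps are undone in reverse order. The function `run_saga` and the shop steps are hypothetical names for this sketch; a real saga runs across services and must also persist its progress.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; on failure, undo completed steps in reverse."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
    return True

# Hypothetical order workflow spanning two services.
log = []

def reserve_stock():
    log.append("stock reserved")

def release_stock():
    log.append("stock released")

def charge_card():
    raise RuntimeError("payment declined")

def refund_card():
    log.append("card refunded")

ok = run_saga([(reserve_stock, release_stock), (charge_card, refund_card)])
print(ok, log)
```

The failed payment step never completed, so only the stock reservation is compensated.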
Communication Paradigm
• Heavily influences which resilience patterns can be used
• Request-Response vs. Event-Driven
  • Request-Response
  • Event-Driven
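The difference between the two paradigms can be sketched in a few lines. Everything here is an assumption made for illustration: `EventBus`, `check_stock`, and the topic name are hypothetical, and the bus is in-process and synchronous, whereas a real event-driven system would typically put a message broker between publisher and subscribers.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process publish/subscribe bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher does not know who (if anyone) is listening.
        for handler in self._subscribers[topic]:
            handler(event)

# Request-response: the caller names the callee and depends on its answer.
def check_stock(item):
    return {"item": item, "in_stock": True}

reply = check_stock("book")

# Event-driven: the publisher announces a fact; subscribers react on their own.
bus = EventBus()
shipped = []
billed = []
bus.subscribe("order_placed", lambda e: shipped.append(e["item"]))
bus.subscribe("order_placed", lambda e: billed.append(e["item"]))
bus.publish("order_placed", {"item": "book"})
```

In the request-response call a failing callee fails the caller directly; in the event-driven variant the publisher has no direct dependency on any subscriber, which changes which resilience patterns apply.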
Request-Response vs. Event Driven (1/2)
Request-Response vs. Event Driven (2/2)
Orchestration vs. Choreography
• Which one looks better?
• Why?
Online Shop Example (1/3)
Online Shop Example (2/3)
Online Shop Example (3/3)
Day 20 Wrap-Up
• What/Why Resilient Design Patterns?
  • We have to deal with issues in distributed application development
  • Such issues used to be the concern of system developers.
  • Nowadays, however, software engineers need to deal with them from the software design phase through the implementation and maintenance phases.
Detect
Detect: Circuit Breaker (1/2)
• Most often cited resilience pattern
• Takes a downstream unit offline if calls to it fail multiple times.
• The circuit breaker detects failures and prevents the application from trying to perform an action that is doomed to fail (until it is safe to retry).
• Handles faults that might take a variable amount of time to fix when connecting to a remote service or resource.
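The behavior described above can be sketched as a small state machine: closed (calls pass), open (calls are rejected at once), and half-open (one trial call after a timeout). This is a minimal single-threaded sketch; the class and parameter names (`CircuitBreaker`, `failure_threshold`, `reset_timeout`) are assumptions for illustration and do not come from any particular library.

```python
import time

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("call rejected: circuit is open")
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # trip (or re-trip) the breaker
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open the caller fails fast instead of waiting on a unit that is known to be down, which also gives that unit time to recover.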
Detect: Circuit Breaker (2/2)
Recover
Recover: Retry
• Basic recovery pattern
• Recover from omission or other transient errors
• Limit retries to minimize extra load on an already loaded resource
• Limit retries to avoid recurring errors
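A bounded retry might look like the sketch below. The helper name `retry`, its defaults, and the choice of which exceptions count as transient are assumptions for illustration. The exponential backoff is an addition not stated on the slide; it is one common way to limit extra load on an already loaded resource.

```python
import time

def retry(func, max_attempts=3, base_delay=0.1,
          transient=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func, retrying transient errors a limited number of times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except transient:
            if attempt == max_attempts:
                raise  # retry budget exhausted: give up and report the error
            # Wait longer after each failure: 0.1 s, 0.2 s, 0.4 s, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Errors that are not listed as transient propagate immediately, because repeating a call that fails for a permanent reason only produces the same error again.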
Recover: Rollback & Roll Forward
Rollback
• Roll back state and/or execution path to a defined safe state
• Recover from internal errors caused by external failures
• Use checkpoints and safe points to provide safe rollback points
• Limit retries to avoid recurring errors
Roll Forward
• Advance execution past the point of error
• Often used as escalation if retry or rollback does not succeed
• Not applicable if skipped activity is essential
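The two strategies can be contrasted in one small sketch. The function `run_steps` and the order-processing steps are hypothetical; the sketch assumes the state is a plain dict and that a failed optional step leaves no partial changes behind.

```python
import copy

def run_steps(state, steps):
    """Apply steps to a dict; roll back on essential failures, roll forward past optional ones."""
    checkpoint = copy.deepcopy(state)  # safe point taken before any work
    skipped = []
    for name, step, essential in steps:
        try:
            step(state)
        except Exception:
            if essential:
                state.clear()
                state.update(checkpoint)  # rollback: restore the safe state
                raise
            skipped.append(name)  # roll forward: continue past the error
    return skipped

# Hypothetical order-processing steps.
def add_tax(order):
    order["total"] += 10

def send_newsletter(order):
    raise RuntimeError("mail server down")

order = {"total": 100}
skipped = run_steps(order, [("tax", add_tax, True),
                            ("newsletter", send_newsletter, False)])
print(order, skipped)
```

Rolling forward is acceptable here only because the newsletter is not essential to the order; an essential step that fails restores the checkpoint instead.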
Recover: Reset & Failover
Reset
• Often used as radical escalation if all other measures failed
• Restart service
• Reset data to a guaranteed consistent state
Failover
• Used as escalation if other measures failed
• Requires redundancy
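Failover can be sketched as trying redundant replicas in order until one answers. The function and replica names below are hypothetical, and the sketch assumes replicas are interchangeable callables; real failover also has to deal with state replication and with detecting which replica is healthy.

```python
def call_with_failover(replicas, request):
    """Try each redundant replica in turn and return the first successful answer."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:
            last_error = exc  # remember the failure and move on to the next replica
    raise RuntimeError("all replicas failed") from last_error

# Hypothetical primary and standby instances of the same service.
def primary(request):
    raise ConnectionError("primary is down")

def standby(request):
    return f"standby served {request}"

answer = call_with_failover([primary, standby], "GET /orders")
```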
Mitigate
Mitigate: Fallback
• Execute an alternative action if the original action fails
• Basis for most mitigation patterns
• Silently ignore the error and continue processing
• Return a predefined default value if an error occurs
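The fallback variants listed above fit into one small helper. The name `with_fallback` and the recommendation example are assumptions made for this sketch.

```python
def with_fallback(primary, fallback=None, default=None):
    """Run primary; if it fails, run the alternative action or return a default value."""
    try:
        return primary()
    except Exception:
        if fallback is not None:
            return fallback()  # alternative action
        return default         # predefined default (None means: ignore silently)

# Hypothetical example: personalized recommendations with a static fallback.
def personalized():
    raise TimeoutError("recommendation service too slow")

def bestsellers():
    return ["book A", "book B"]

items = with_fallback(personalized, fallback=bestsellers)
empty = with_fallback(personalized, default=[])
```

Catching every exception keeps the sketch short; in practice the set of errors that trigger a fallback is usually narrowed so that programming errors are not hidden.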
Mitigate: Queues for Resources
• Protect resource from temporary overload situations
• Avoid losing requests by queuing them in front of resource
• However, unlimited queues can create excessive latency
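A bounded queue in front of a resource can be sketched with the standard library. The queue size and the names `pending` and `submit` are assumptions for illustration.

```python
import queue

# At most 2 waiting requests; the bound keeps waiting time (latency) limited.
pending = queue.Queue(maxsize=2)

def submit(request) -> bool:
    """Queue a request for the resource; refuse it when the queue is full."""
    try:
        pending.put_nowait(request)
        return True
    except queue.Full:
        return False  # caller can shed load, retry later, or use a fallback

accepted = [submit(r) for r in ("a", "b", "c")]
```

A short burst is absorbed by the queue, while sustained overload is rejected early instead of building up an ever longer backlog.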
Mitigate: Share Load
• Use if additional resources for load sharing are available
• Share load among resources to maintain good throughput
• Can be implemented statically or dynamically
• Minimize amount of synchronization needed between resources
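Static and dynamic load sharing can be contrasted with two tiny selectors. Both classes and the worker names are hypothetical; round-robin stands in for a static policy and least-loaded for a dynamic one.

```python
import itertools

class RoundRobin:
    """Static sharing: rotate through workers, no knowledge of their load."""

    def __init__(self, workers):
        self._cycle = itertools.cycle(workers)

    def pick(self):
        return next(self._cycle)

class LeastLoaded:
    """Dynamic sharing: pick the worker with the fewest requests in flight."""

    def __init__(self, workers):
        self.in_flight = {w: 0 for w in workers}

    def pick(self):
        worker = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[worker] += 1
        return worker

    def done(self, worker):
        self.in_flight[worker] -= 1

static = RoundRobin(["w1", "w2"])
dynamic = LeastLoaded(["w1", "w2"])
```

The static variant needs no shared state at all; the dynamic one adapts to uneven request costs but requires the load counters to be kept in sync, which is the synchronization cost the last bullet warns about.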
Prevent
Prevent: Error Injection
• Inject errors at runtime and observe how the system reacts
  • Chaos engineering at Netflix
• Make sure to inject errors of all types
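At the scale of a single function, error injection can be sketched as a wrapper that makes a call fail with a configurable probability. The wrapper name, the failure rate, and the injected exception type are assumptions for illustration; this is not how any particular chaos engineering tool works.

```python
import random

def inject_faults(func, failure_rate, rng=random.random):
    """Wrap func so that a fraction of calls fails with an injected error."""
    def wrapper(*args, **kwargs):
        if rng() < failure_rate:
            raise ConnectionError("injected fault")
        return func(*args, **kwargs)
    return wrapper

# Hypothetical dependency wrapped for a resilience experiment.
def lookup_price(item):
    return 10

flaky_lookup = inject_faults(lookup_price, failure_rate=0.3)
```

Running the rest of the system against `flaky_lookup` shows whether retries, fallbacks, and circuit breakers actually react the way the design assumes. To cover other error types, the same idea extends to injected delays or wrong responses.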
(Routine maintenance)
• Keep preventable errors from occurring
• Check system periodically and fix detected faults and errors
Complement
Complement: Redundancy
• Core resilience concept
• Basis for many recovery and mitigation patterns
• Often different variants implemented in a system
  • N-version programming
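N-version programming can be illustrated with a majority vote over independently written implementations of the same function. The helper `majority_vote` and the three toy versions are hypothetical, and the sketch assumes results are hashable and directly comparable.

```python
from collections import Counter

def majority_vote(versions, *args):
    """Run every version and return the result that a strict majority agrees on."""
    results = []
    for version in versions:
        try:
            results.append(version(*args))
        except Exception:
            continue  # a crashed version simply casts no vote
    if not results:
        raise RuntimeError("all versions failed")
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(versions) // 2:
        raise RuntimeError("no majority among versions")
    return value

# Three hypothetical implementations of "square"; the third one has a bug.
versions = [lambda x: x * x, lambda x: x ** 2, lambda x: x + x]
result = majority_vote(versions, 3)
```

The faulty version is outvoted by the two correct ones, so the caller never sees its wrong answer.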
Complement: Escalation
• Failed units may not have enough time or information to handle errors
• An escalation peer with more time and information is needed
• Separate error handling flow from processing flow
• Often multi-level hierarchies
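A multi-level escalation hierarchy can be sketched as a chain in which each level handles the errors it understands and passes the rest upward. The class `Handler` and the three level names are assumptions for illustration.

```python
class Handler:
    """One level in an escalation hierarchy, separate from the processing flow."""

    def __init__(self, name, handles, parent=None):
        self.name = name
        self.handles = handles  # exception types this level can deal with
        self.parent = parent

    def report(self, error):
        if isinstance(error, self.handles):
            return f"{self.name} handled {type(error).__name__}"
        if self.parent is None:
            raise error  # nobody left to escalate to
        return self.parent.report(error)

# Hypothetical three-level hierarchy: worker -> supervisor -> operator.
operator = Handler("operator", (Exception,))
supervisor = Handler("supervisor", (ConnectionError,), parent=operator)
worker = Handler("worker", (TimeoutError,), parent=supervisor)

outcome = worker.report(ConnectionError("database unreachable"))
```

The worker only knows how to deal with timeouts, so the connection error moves up one level to the supervisor.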
Treat: Hot deployment
• Hot-deployable services are those which can be added to or removed from the running server. It is the ability to change ON-THE-FLY what’s currently deployed without redeploying it.
• Hot deployment is VERY hot for development. The time savings realized when your developers can simply run their build and have the new code auto-deploy instead of build, shutdown, startup is massive.
• Pros: business never stops
• Cons: may require large resources
Whole Picture for Resilient Design Techniques (by Uwe Friedrichsen, Resilient SW Design In a Nutshell)
Using Resilience Patterns
• Patterns are options, not obligations
• Do not pick too many patterns
• Each pattern increases complexity, which is the enemy of robustness
• Each pattern costs money
• Look for complementary patterns
Netflix’s Resilient Design Patterns Used
• Choose the right ones for you.
Wrap-Up
• Distributed systems are in every corner of our society
• Rather than attempting to build a fail-free system, it is better to build a resilient one.
• Resilient SW design patterns (or approaches) need to be mastered for distributed software development (design)
• Try to make use of existing ones.
• Even better, “create your own patterns!”