![Page 1: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/1.jpg)
Architecting for Failure: Why are distributed systems so hard?
Markus Eisele
![Page 2: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/2.jpg)
@myfear
![Page 3: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/3.jpg)
Evolution
![Page 4: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/4.jpg)
“Big Iron” (Centralized): Extreme Uptime (99.999) • Vertical Scaling • Custom Hardware • Hardware High Availability • Centralized
“Enterprise” (Shared): Designed for availability (99.9) • Commodity Hardware • Replicated
“Cloud” (Self Service): Designed for failure (99.999) • Horizontal Scaling • Virtualized / Cloud • Software High Availability • Distributed
![Page 5: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/5.jpg)
[Chart: x-axis 60s, 80s, 90s, 2000, 2014, 2016, 2020, 2030; y-axis Number of Enterprise Projects; series Mainframe, Enterprise, Cloud]
Distribution of projects over time. Disclaimer: My personal prediction!
![Page 6: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/6.jpg)
Today’s biggest problem?
![Page 7: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/7.jpg)
• High Infrastructure Cost: 11%
• Awful Downtime: 9%
• Meeting Demand: 21%
• Release Frequency: 20%
• Developer Velocity: 39%
![Page 8: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/8.jpg)
Meeting demands.
http://www.internetlivestats.com/internet-users/
J2EE
Spring
RoR
Akka
Reactive Manifesto
Microservices
![Page 9: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/9.jpg)
What the hell is “Developer Velocity” anyway?
![Page 10: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/10.jpg)
Release frequency!!
bit.ly/helloworldmsa
![Page 11: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/11.jpg)
And this is why we have Microservices..
![Page 12: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/12.jpg)
Scale, Deploy, Develop Independently
![Page 13: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/13.jpg)
![Page 14: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/14.jpg)
REQ: Building and Scaling Microservices
• Lightweight runtime
• Cross-Service Security
• Transaction Management
• Service Scaling
• Load Balancing
• SLAs
• Flexible Deployment
• Configuration
• Service Discovery
• Service Versions
• Monitoring
• Governance
• Asynchronous communication
• Non-blocking I/O
• Streaming Data
• Polyglot Services
• Modularity (Service definition)
• High performance persistence (CQRS)
• Event handling / messaging (ES)
• Eventual consistency
• API Management
• Health check and recovery
![Page 15: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/15.jpg)
“If the components do not compose cleanly, then all you are doing is shifting complexity from inside a component to the connections between components. Not just does this just move complexity around, it moves it to a place that's less explicit and harder to control.”
Martin Fowler
https://martinfowler.com/articles/microservices.html
![Page 16: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/16.jpg)
How do we handle “failures” in centralized or shared infrastructures?
![Page 17: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/17.jpg)
![Page 18: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/18.jpg)
Why did Application Server become a thing?
• Network and Threading
• Two Phase Commit (2PC)
• Shared resources
• Manageability
• Clustering supports scalability, performance, and availability.
• Programming models
• Standardization
https://antoniogoncalves.org/2013/07/03/monster-component-in-java-ee-7/
![Page 19: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/19.jpg)
Checked vs. Unchecked Exceptions
“If a client can reasonably be expected to recover from an exception, make it a checked exception. If a client cannot do anything to recover from the exception, make it an unchecked exception.”
https://docs.oracle.com/javase/tutorial/essential/exceptions/runtime.html
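The rule quoted above can be illustrated with a small sketch. Everything here (`Account`, `InsufficientFundsException`, `withdrawUpTo`) is a hypothetical example and not part of the talk; only the checked/unchecked distinction comes from the slide.

```java
// Hypothetical example: all class and method names are assumptions for illustration.

// Checked: the caller can reasonably recover, for example by withdrawing less.
class InsufficientFundsException extends Exception {
    final long shortfall;

    InsufficientFundsException(long shortfall) {
        super("Short by " + shortfall);
        this.shortfall = shortfall;
    }
}

class Account {
    private long balance;

    Account(long balance) {
        this.balance = balance;
    }

    // The compiler forces callers to handle or declare the checked exception.
    void withdraw(long amount) throws InsufficientFundsException {
        // Unchecked: a negative amount is a programming error the caller cannot recover from.
        if (amount < 0) {
            throw new IllegalArgumentException("amount must not be negative");
        }
        if (amount > balance) {
            throw new InsufficientFundsException(amount - balance);
        }
        balance -= amount;
    }

    long balance() {
        return balance;
    }
}

class ExceptionDemo {
    // Recovers from the checked exception by withdrawing what is available instead.
    static long withdrawUpTo(Account account, long amount) {
        try {
            account.withdraw(amount);
            return amount;
        } catch (InsufficientFundsException e) {
            long available = amount - e.shortfall;
            try {
                account.withdraw(available);
            } catch (InsufficientFundsException unexpected) {
                // Cannot happen in this single-threaded sketch; surface it as a bug.
                throw new IllegalStateException(unexpected);
            }
            return available;
        }
    }

    public static void main(String[] args) {
        Account account = new Account(100);
        System.out.println(withdrawUpTo(account, 150)); // prints 100
    }
}
```

In a single process this kind of recovery is local and cheap, which is part of why failure handling felt manageable there.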
![Page 20: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/20.jpg)
It wasn’t easy – but manageable.
https://docs.oracle.com/javase/tutorial/essential/exceptions/runtime.html
• MVC handles checked
• Global exception handlers handle unchecked
• Centralized log files
![Page 21: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/21.jpg)
![Page 22: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/22.jpg)
“If it ain't broke, don't fix it!” Bert Lance, 1977
![Page 23: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/23.jpg)
What is different for Microservices?
![Page 24: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/24.jpg)
Microservices are Distributed Systems.
![Page 25: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/25.jpg)
![Page 26: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/26.jpg)
![Page 27: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/27.jpg)
What is Lagom?
• Reactive Microservices Framework for the JVM
• Focused on right sized services
• Asynchronous I/O and communication as first class priorities
• Highly productive development environment
• Takes you all the way to production
• https://github.com/lagom/online-auction-java
![Page 28: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/28.jpg)
Protect Yourself
with Circuit Breakers
![Page 29: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/29.jpg)
Circuit Breakers
![Page 30: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/30.jpg)
![Page 31: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/31.jpg)
![Page 32: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/32.jpg)
![Page 33: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/33.jpg)
Circuit Breakers

    default Descriptor descriptor() {
        return named("item").withCalls(
            pathCall("/api/item", this::createItem),
            restCall(Method.POST, "/api/item/:id/start", this::startAuction),
            pathCall("/api/item/:id", this::getItem),
            restCall(Method.PUT, "/api/item/:id", this::updateItem),
            pathCall("/api/item?userId&status", this::getItemsForUser)
        )
        // Calls to this service share the circuit breaker named "item".
        .withCircuitBreaker(CircuitBreaker.identifiedBy("item"));
    }
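To make the idea behind the pattern concrete, here is a minimal, single-threaded sketch of a circuit breaker state machine in plain Java. It is not Lagom's implementation; `SimpleCircuitBreaker`, its parameters, and the injected clock are assumptions made for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified sketch, not thread-safe and not Lagom's implementation.
// All names here are hypothetical.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int maxFailures;
    private final long resetTimeoutMillis;
    private final LongSupplier clock; // injected so the behavior can be tested without sleeping

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int maxFailures, long resetTimeoutMillis, LongSupplier clock) {
        this.maxFailures = maxFailures;
        this.resetTimeoutMillis = resetTimeoutMillis;
        this.clock = clock;
    }

    State state() {
        // Once the reset timeout has elapsed, let a trial call through.
        if (state == State.OPEN && clock.getAsLong() - openedAt >= resetTimeoutMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T call(Supplier<T> protectedCall, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast without touching the broken dependency
        }
        try {
            T result = protectedCall.get();
            failures = 0;
            state = State.CLOSED; // a success closes the breaker again
            return result;
        } catch (RuntimeException e) {
            failures++;
            // A failed trial call, or too many failures in a row, opens the breaker.
            if (state == State.HALF_OPEN || failures >= maxFailures) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}
```

While the breaker is open, callers get the fallback immediately instead of waiting on a dependency that is already known to be failing.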
![Page 34: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/34.jpg)
Degraded beats
Unavailable
![Page 35: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/35.jpg)
Degraded > Unavailable
Search
Bid
Item
![Page 36: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/36.jpg)
Degraded > Unavailable
Search
Bid
Item
![Page 37: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/37.jpg)
    CompletionStage<PSequence<Bid>> bidHistoryFuture = bidService.getBids(itemUuid)
        .invoke()
        .exceptionally(error -> {
            // Degrade gracefully: log the failure and show an empty bid history.
            log.warn("Bidding service failed to load", error);
            return TreePVector.empty();
        });
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletionStage.html#exceptionally-java.util.function.Function-
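The same degradation idea can be tried without Lagom, using only `java.util.concurrent`. In this sketch `loadBids` is a hypothetical stand-in for the remote bidding call, and plain strings replace the `Bid` type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

class DegradedDemo {
    // Hypothetical stand-in for a remote call that may fail.
    static CompletionStage<List<String>> loadBids(boolean serviceUp) {
        CompletableFuture<List<String>> result = new CompletableFuture<>();
        if (serviceUp) {
            result.complete(Arrays.asList("bid-1", "bid-2"));
        } else {
            result.completeExceptionally(new IllegalStateException("bidding service unavailable"));
        }
        return result;
    }

    // Returns the bid history, or an empty list when the bidding service is down.
    static List<String> bidHistory(boolean serviceUp) {
        return loadBids(serviceUp)
            .exceptionally(error -> Collections.emptyList()) // degraded beats unavailable
            .toCompletableFuture()
            .join();
    }

    public static void main(String[] args) {
        System.out.println(bidHistory(true).size());  // prints 2
        System.out.println(bidHistory(false).size()); // prints 0
    }
}
```

The item page still renders when the bid history is missing; it just shows less.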
![Page 38: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/38.jpg)
Bulkheading (Kind of Important)
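One common way to bulkhead inside a single process is to cap how many concurrent calls a dependency may consume, so that one slow service cannot exhaust every thread. The sketch below uses a `Semaphore`; the `Bulkhead` class is a hypothetical illustration, and the talk itself may mean isolation at the service level rather than this in-process variant.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical in-process bulkhead: limits concurrent calls to one dependency.
class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T call(Supplier<T> protectedCall, Supplier<T> rejected) {
        // Do not queue: if the compartment is full, reject immediately.
        if (!permits.tryAcquire()) {
            return rejected.get();
        }
        try {
            return protectedCall.get();
        } finally {
            permits.release(); // always hand the permit back, even on failure
        }
    }
}
```

Giving each downstream dependency its own `Bulkhead` instance keeps a failure in one compartment from flooding the others.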
![Page 39: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/39.jpg)
![Page 40: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/40.jpg)
Duplication isn’t a bad thing
![Page 41: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/41.jpg)
Degraded > Unavailable
Search
Bid
Item
![Page 42: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/42.jpg)
Publish/Subscribe

    // Topic that other services can subscribe to.
    Topic<BidEvent> bidEvents();

    default Descriptor descriptor() {
        return named("bidding").withCalls(
            pathCall("/api/item/:id/bids", this::placeBid),
            pathCall("/api/item/:id/bids", this::getBids)
        ).publishing(
            topic("bidding-BidEvent", this::bidEvents)
        );
    }
![Page 43: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/43.jpg)
Publish/Subscribe

    Topic<BidEvent> bidEventTopic = biddingService.bidEvents();
    bidEventTopic.subscribe()
        .atLeastOnce(Flow.<BidEvent>create()
            .map(this::toDocument)
            .mapAsync(1, indexedStore::store));
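With at-least-once delivery the same event can arrive more than once, so the subscriber should be idempotent. The sketch below shows one way to do that by remembering event IDs; `BidEventRecord` and `IdempotentIndexer` are hypothetical names, and a real subscriber would keep the seen IDs in durable storage rather than in memory.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event type with a unique ID assigned by the publisher.
class BidEventRecord {
    final String eventId;
    final int price;

    BidEventRecord(String eventId, int price) {
        this.eventId = eventId;
        this.price = price;
    }
}

// Hypothetical subscriber that tolerates redelivery.
class IdempotentIndexer {
    private final Set<String> seen = new HashSet<>();
    private final List<BidEventRecord> index = new ArrayList<>();

    // Returns true if the event was stored, false if it was a duplicate delivery.
    boolean handle(BidEventRecord event) {
        if (!seen.add(event.eventId)) {
            return false; // already processed: ignore the redelivery
        }
        index.add(event);
        return true;
    }

    int size() {
        return index.size();
    }
}
```

Handling the same event twice leaves the index unchanged, which is what makes redelivery after a crash or timeout safe.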
![Page 44: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/44.jpg)
Always have a plan B.
![Page 45: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/45.jpg)
What you can do...
• Fallback pattern (cache instead of DB)
• The cost of resilience should be accuracy or latency.
• CAP Theorem: partitions will happen, so your choice is to sacrifice availability or consistency. You can't have all three.
https://codahale.com/you-cant-sacrifice-partition-tolerance/
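The fallback pattern from the list above can be sketched as a reader that prefers the primary store and serves the last known value when the primary fails. `CachedFallbackReader` is a hypothetical class; the price of staying available here is accuracy, because the cached value may be stale.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of "cache instead of DB".
class CachedFallbackReader<K, V> {
    private final Function<K, V> primary; // for example, a database lookup
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    CachedFallbackReader(Function<K, V> primary) {
        this.primary = primary;
    }

    Optional<V> read(K key) {
        try {
            V fresh = primary.apply(key);
            if (fresh != null) {
                cache.put(key, fresh); // remember the last known good value
            }
            return Optional.ofNullable(fresh);
        } catch (RuntimeException e) {
            // Plan B: answer from the cache, accepting a possibly stale value.
            return Optional.ofNullable(cache.get(key));
        }
    }
}
```

A key that was never read successfully still yields an empty result during an outage, so callers need to handle the empty case as well.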
![Page 46: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/46.jpg)
Do you remember?
![Page 47: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/47.jpg)
8 fallacies of distributed computing
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
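The first two fallacies translate directly into code: a remote call may never return, and it is never instantaneous. A minimal sketch of guarding against both is to bound every call with a timeout and a fallback. `TimeBoundedCall` and `callWithTimeout` are hypothetical names; the sketch runs the call on the common pool via `CompletableFuture.supplyAsync`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: never wait on the network without a deadline.
class TimeBoundedCall {
    static <T> T callWithTimeout(Supplier<T> remoteCall, long timeoutMillis, T fallback) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Too slow or failed: stop waiting and answer with the fallback.
            // Note that cancel() does not interrupt the already running task.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return fallback;
        }
    }
}
```

Because the abandoned task keeps running in the background, a real system would combine this with a bulkhead or circuit breaker so that slow calls cannot pile up.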
![Page 48: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/48.jpg)
Lessons learned.
![Page 49: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/49.jpg)
Some things to remember.
• Distributed systems are different because they fail often.
• Writing robust distributed systems costs more than writing robust single-machine systems.
• Robust, open source distributed systems are much less common than robust, single-machine systems.
• Coordination is very hard.
• “It’s slow” is the hardest problem you’ll ever debug.
• Find ways to be partially available.
https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/
![Page 50: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/50.jpg)
Where do we go from here?
![Page 51: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/51.jpg)
http://www.ofbizian.com/2016/07/from-fragile-to-antifragile-software.html
![Page 52: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/52.jpg)
![Page 53: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/53.jpg)
Next Steps! Download and try Lagom!
Project Site: http://www.lightbend.com/lagom
GitHub Repo: https://github.com/lagom
Documentation: http://www.lagomframework.com/documentation/1.3.x/java/Home.html
Example: https://github.com/lagom/online-auction-java
![Page 54: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/54.jpg)
Written for architects and developers that must quickly gain a fundamental understanding of microservice-based architectures, this free O’Reilly report explores the journey from SOA to microservices, discusses approaches to dismantling your monolith, and reviews the key tenets of a Reactive microservice:
• Isolate all the Things
• Act Autonomously
• Do One Thing, and Do It Well
• Own Your State, Exclusively
• Embrace Asynchronous Message-Passing
• Stay Mobile, but Addressable
• Collaborate as Systems to Solve Problems
http://bit.ly/ReactiveMicroservice
![Page 55: Architecting for failure - Why are distributed systems hard?](https://reader031.vdocument.in/reader031/viewer/2022022411/58ecf8fb1a28ab48328b4577/html5/thumbnails/55.jpg)
The detailed example in this report is based on Lagom, a new framework that helps you follow the requirements for building distributed, reactive systems.
• Get an overview of the Reactive Programming model and basic requirements for developing reactive microservices
• Learn how to create base services, expose endpoints, and then connect them with a simple, web-based user interface
• Understand how to deal with persistence, state, and clients
• Use integration technologies to start a successful migration away from legacy systems
http://bit.ly/DevelopReactiveMicroservice