Image: xkcd.com
Dependable Cloud Architecture
@mikewo
Mike Wood
http://mvwood.com
Questions
@mikewo
Mike Wood
http://mvwood.com
Thanks
“Failure is always an option.”
Image: Discovery Channel, Fair Use
Protection From:
What are we looking for?
• Hardware Failure
• Data Corruption
• Network Failure
• Loss of Facilities
Check out: http://bit.ly/wazbizcont
Images: Office ClipArt & Godzilla Releasing Corp (Fair Use)
Image: FOX, Fair Use
Human Error
What we’re trying to achieve
1. Monitoring
2. Resilient Solutions
Image: Cohdra
Image: Office ClipArt
Cost vs Risk
99.999%
$1, … ,000.00
To get more 9’s here, add more 0’s here.
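The trade-off on this slide can be made concrete: each extra nine cuts the allowed downtime by a factor of ten. The minimal C# sketch below converts a number of nines into minutes of downtime per year. The `DowntimeBudget` class and its methods are hypothetical, it assumes a 365-day year, and it models only the downtime side; the cost side depends entirely on the platform and design.

```csharp
using System;

// Hypothetical helper: turns an availability target into a yearly downtime budget.
// Assumes a 365-day year; leap years and maintenance windows are ignored.
public static class DowntimeBudget
{
    private const double MinutesPerYear = 365 * 24 * 60; // 525,600

    // nines = 3 -> 99.9%, nines = 5 -> 99.999%
    public static double AvailabilityFromNines(int nines) => 1.0 - Math.Pow(10, -nines);

    // Minutes per year the service may be unavailable at the given availability (0..1).
    public static double AllowedDowntimeMinutes(double availability) =>
        (1.0 - availability) * MinutesPerYear;
}
```

For example, five nines leaves roughly 5.26 minutes of downtime per year, while three nines allows about 525.6 minutes (roughly 8.8 hours).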
Image: NASA
Monitoring
Functional Transparency
Image: Office ClipArt
Logging Messages
Hardware Health
Dependent Services Health
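One way to make dependent services' health visible is to run a set of named health probes and log any that fail. The sketch below is illustrative only: `HealthReport`, the probe names, and the rule that every probe must pass are assumptions, and a real probe would call the actual dependency (database, queue, external API).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical aggregator: runs named probes and records which dependencies look healthy.
public static class HealthReport
{
    public static Dictionary<string, bool> Run(IDictionary<string, Func<bool>> probes)
    {
        var results = new Dictionary<string, bool>();
        foreach (var entry in probes)
        {
            try
            {
                results[entry.Key] = entry.Value();
            }
            catch (Exception ex)
            {
                // A probe that throws counts as unhealthy; log it rather than crash the check.
                Trace.WriteLine($"Health probe '{entry.Key}' failed: {ex.Message}");
                results[entry.Key] = false;
            }
        }
        return results;
    }

    // Assumed policy: healthy only if every dependency is healthy.
    public static bool IsHealthy(IReadOnlyDictionary<string, bool> results) =>
        results.Values.All(ok => ok);
}
```

Treating a throwing probe as "unhealthy" keeps the monitoring path itself from becoming another point of failure.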
Telemetry
Image: NASA
Analyze your Data
Resilience
Image: Office ClipArt
Remember: Failure is always an option.
Common Points of Failure
• Machine/application crashes
• Throttling (exceeding capacity)
• Connectivity/network
• External service dependencies
Focus less on the uptime of hardware and more on how the solution handles it WHEN something fails!
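Try/catch != Resilient

```csharp
using System;
using System.Diagnostics;
using System.IO;

private void createFile() {
    string fileName = @"c:\workingDirectory\someFileName.txt";
    try {
        File.Create(fileName).Dispose(); // release the handle File.Create opens
    } catch (DirectoryNotFoundException ex) {
        // Logs and rethrows: the failure is recorded, not recovered from.
        Trace.WriteLine(String.Format("Unable to create {0}. {1}", fileName, ex));
        throw;
    }
}
```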
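Catching the exception is only half the job; resilience means the code does something that lets the work continue. As a hedged sketch (the `ResilientFile` class is hypothetical, and creating the directory is only one possible recovery), the same operation could create the missing directory and try once more, rethrowing only when it cannot recover:

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical example: recover from a missing directory instead of only logging it.
public static class ResilientFile
{
    public static void CreateFile(string fileName)
    {
        try
        {
            File.Create(fileName).Dispose();
        }
        catch (DirectoryNotFoundException ex)
        {
            var directory = Path.GetDirectoryName(fileName);
            if (string.IsNullOrEmpty(directory))
                throw; // nothing we can create, so surface the original failure

            Trace.WriteLine(String.Format("Creating missing directory for {0}. {1}", fileName, ex.Message));
            Directory.CreateDirectory(directory); // no-op if it already exists
            File.Create(fileName).Dispose();      // one retry; a second failure propagates
        }
    }
}
```

The log line is still there, but it now records a recovery rather than the end of the operation.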
Image: Michael Wood
Decompose your system…
Capacity Buffering
• Content Delivery Networks (CDNs)
• Distributed Application Cache
• Local Content Cache
Enables recovery during outages or spikes in load
Image: jepler
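A local content cache helps most when it can keep answering while its source is struggling. The sketch below is a hypothetical `FallbackCache`, not a specific library: fresh entries are served without touching the origin, and if the origin call fails, the last good copy is returned instead of an error. The `fetch` delegate and the TTL stand in for whatever origin and freshness rules a real system would use.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical local cache that serves stale content when the origin is unavailable.
public sealed class FallbackCache
{
    private readonly ConcurrentDictionary<string, (string Value, DateTime FetchedUtc)> _entries =
        new ConcurrentDictionary<string, (string Value, DateTime FetchedUtc)>();
    private readonly Func<string, string> _fetch; // loads content from the origin
    private readonly TimeSpan _ttl;               // how long an entry counts as fresh

    public FallbackCache(Func<string, string> fetch, TimeSpan ttl)
    {
        _fetch = fetch;
        _ttl = ttl;
    }

    public string Get(string key)
    {
        // Fresh hit: answered locally without calling the origin.
        if (_entries.TryGetValue(key, out var cached) && DateTime.UtcNow - cached.FetchedUtc < _ttl)
            return cached.Value;

        try
        {
            string value = _fetch(key);
            _entries[key] = (value, DateTime.UtcNow);
            return value;
        }
        catch (Exception)
        {
            // Origin failed: degrade to the last good copy if we have one.
            if (_entries.TryGetValue(key, out var stale))
                return stale.Value;
            throw; // nothing cached, so the failure has to surface
        }
    }
}
```

Serving slightly stale content is the design choice here: during an outage, old data is usually better than no page at all.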
Always carry a spare
Two instances, each at 75% capacity, each handling half of our load:
50% more capacity than needed
• Can absorb temporary spikes
• Time to react if we need to add capacity
One instance lost: the survivor takes 100% of the load at 150% capacity, while the failed instance is at 0% capacity and all of its load is redirected
Over-allocated, but still functioning
• Degrade, but don’t fail
SYSTEM FAILURE!!!
Image: Kevin Rosseel
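The arithmetic behind this slide is simple enough to sketch. Assuming identical instances, perfectly even load balancing, and load expressed in units of one instance's capacity, the hypothetical helper below computes how hard each surviving instance works after some of them fail. With two instances at 75%, losing one pushes the survivor to 150%, the over-allocated case on the slide.

```csharp
using System;

// Hypothetical helper for "always carry a spare": per-instance utilization after failures.
// Assumes identical instances and perfectly even load balancing.
public static class SpareCapacity
{
    // totalLoad is measured in instance-capacities: 1.5 means one and a half instances' worth of work.
    // Returns utilization of each survivor: 1.0 = exactly full, above 1.0 = over-allocated.
    public static double UtilizationAfterFailures(double totalLoad, int instances, int failed)
    {
        if (instances <= 0)
            throw new ArgumentOutOfRangeException(nameof(instances));
        if (failed < 0 || failed > instances)
            throw new ArgumentOutOfRangeException(nameof(failed));

        int survivors = instances - failed;
        if (survivors == 0)
            return double.PositiveInfinity; // no capacity left: system failure

        return totalLoad / survivors;
    }
}
```

For the slide's numbers, `UtilizationAfterFailures(1.5, 2, 0)` is 0.75 and `UtilizationAfterFailures(1.5, 2, 1)` is 1.5. Whether an instance at 150% degrades or falls over depends on the platform, which is why the goal is to degrade, but not fail.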
Request Buffering
Image: Joe Shlabotnik
• Queues
• Retry Policies
• Async Workloads
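Retry policies matter most for transient faults such as throttling and dropped connections. The sketch below shows exponential backoff in plain C#; `RetryPolicy`, the default of four attempts, the 200 ms base delay, and the `isTransient` predicate are illustrative assumptions, not the API of any particular library.

```csharp
using System;
using System.Threading;

// Hypothetical retry helper with exponential backoff for transient faults.
// The loop exits either by returning a result or by letting the last exception propagate.
public static class RetryPolicy
{
    public static T Execute<T>(Func<T> operation, Func<Exception, bool> isTransient,
                               int maxAttempts = 4, int baseDelayMs = 200)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // With the defaults, waits 200 ms, 400 ms, 800 ms between attempts,
                // giving a throttled or reconnecting service room to recover.
                // Thread.Sleep keeps the sketch simple; async code would await Task.Delay.
                Thread.Sleep(baseDelayMs * (1 << (attempt - 1)));
            }
        }
    }
}
```

Non-transient errors (for example, invalid input) are rethrown immediately, since retrying them would only add load. Work that can wait is often better placed on a queue and processed asynchronously, so a retry does not hold a user request open.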
Dept. of Redundancy Dept.
Have a backup, somewhere else
More than one? Cost-to-benefit ratio?
Ready State
• Hot = full capacity
• Warm = scaled down, but ready to grow
• Cold = mothballed, starts from zero
Image: Mr. White
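Redundancy - It’s about probability
Four boxes, each with 95% uptime (5% downtime). Assuming the boxes fail independently, the service is down only when every box is down at the same moment, so the downtime probabilities multiply:
• 1 box: 5% downtime, or 438 hrs per year (that’s about 18 ¼ days!)
• 2 boxes: 5/100 * 5/100 = 25/10,000 = 0.25% downtime, or about 22 hrs per year
• 4 boxes: 5/100 * 5/100 * 5/100 * 5/100 = 625/100,000,000 = 0.000625% downtime, or 3.285 MINUTES per year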
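The redundancy numbers can be reproduced in a few lines. This hypothetical helper multiplies the per-box downtime probabilities, which is valid only under the independence assumption; in practice, shared dependencies (power, network, a bad deployment) can take several boxes down at once, which is one reason to keep the backup somewhere else.

```csharp
using System;

// Hypothetical helper reproducing the redundancy arithmetic.
// Valid only if boxes fail independently of one another.
public static class Redundancy
{
    private const double HoursPerYear = 365 * 24; // 8,760

    // Probability that every box is down at the same moment.
    public static double CombinedDowntime(double perBoxDowntime, int boxes) =>
        Math.Pow(perBoxDowntime, boxes);

    public static double DowntimeHoursPerYear(double perBoxDowntime, int boxes) =>
        CombinedDowntime(perBoxDowntime, boxes) * HoursPerYear;
}
```

`Redundancy.DowntimeHoursPerYear(0.05, 1)` gives 438 hours, `(0.05, 2)` about 21.9 hours, and `(0.05, 4)` about 0.055 hours, which is the 3.285 minutes per year.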
Total outage duration = Time to Detect + Time to Diagnose + Time to Decide + Time to Act
Image: Office ClipArt
Dynamic Addressing & Configuration
What about your data?
Image: barrymieny
Availability via Degradation
Image: Michael Wood
Images: Gizmodo
Virtualization and Automation
Images: Orion Pictures owns Terminator Franchise
The “HI” Point
Check out: http://bit.ly/wazinternals
Images: Office Clip Art
Image: NASA
“Don't be too proud of this technological terror you've constructed…”
ADMIT:
• Your solution WILL fail at some point
• You can learn from others just as well as from yourself
DO:
• Root cause analysis
• Read other root cause analyses
• Plan for failure
DON’T:
• Get cocky
• Stick your head in the sand
Images: LucasFilm, Fair Use
Questions
@mikewo
Mike Wood
http://mvwood.com
http://bit.ly/CloudFailSafe
Thanks