![Page 1: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/1.jpg)
Monitoring & Observability
Getting off the starting blocks.
Wednesday, August 21, 13
![Page 2: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/2.jpg)
THE MANY FACES OF THEO
FUN WITH BEARDS AND HAIR
FUCK IT ALL, VENDETTA, SCARY
DETERMINED, CAREFREE, NO-FLY ZONE
![Page 3: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/3.jpg)
Agenda
Define stuff.
Set some tenets.
Discuss and implement some tenets.
Answer a lot of questions.
![Page 4: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/4.jpg)
Monitoring... what it is.
We’ll get to that.
![Page 5: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/5.jpg)
Observability
Being able to measure “things” or witness state changes.
Not useful if doing so alters behavior (significantly).
![Page 6: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/6.jpg)
Development & Production
For the rest of this talk...
There is only production.
![Page 7: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/7.jpg)
Data & Information Terms
Measurement: a single observation of something
a value on which numerical operations make sense:
1, -110, 1.234123, 9.886e-19, 0, null
or a value on which they do not (text):
“200”, “304”, “v1.234”, “happy”, null
![Page 8: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/8.jpg)
Data & Information Terms
Metric: something that you are measuring
The version of deployed code
Total cost on Amazon services
total bugs filed, bug backlog
Total queries executed
![Page 9: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/9.jpg)
Notice no rates
DO NOT STORE RATES.
![Page 10: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/10.jpg)
Measurement Velocity
The rate of change of measurements.
![Page 11: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/11.jpg)
Perspective
Sometimes perspective matters
page load times, DNS queries,
consider RUM (real user monitoring)
Usually it does not
total requests made against a web server
![Page 12: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/12.jpg)
Visualization
The assimilation of multiple measurements into a visual representation.
![Page 13: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/13.jpg)
Trending
Understanding the “direction” of a series of measurements on a metric.
Here direction is loose and means “pattern within.”
![Page 14: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/14.jpg)
Alerting
To bring something to one’s attention.
![Page 15: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/15.jpg)
Anomaly Detection
The determination that a specific measurement is not within reason.
![Page 16: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/16.jpg)
Monitoring... what it is.
All of that.
![Page 17: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/17.jpg)
Review
Measurement
Measurement Velocity
Metric
Perspective
Visualization
Trending
Alerting
Anomaly Detection
Observability
Monitoring
![Page 18: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/18.jpg)
Some Tenets
Most people suck at monitoring.
They monitor all the wrong things (somewhat bad)
They don’t monitor the important things (awful)
![Page 19: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/19.jpg)
Do not collect rates of things
Rates are like trees making sounds falling in the forest.
Direct measurement of rates leads to data loss and ultimately ignorance.
![Page 20: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/20.jpg)
Prefer high level telemetry
1. Business drivers via KPIs,
2. Team KPIs,
3. Staff KPIs,
4. ... then telemetry from everything else.
![Page 21: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/21.jpg)
Implementation
Herein it gets tricky.
![Page 22: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/22.jpg)
Only because of the tools.
I could show you how to use tool X, or Y or Z.
But I wrote Reconnoiter and founded Circonus because X, Y and Z didn’t meet my needs.
Reconnoiter is open.
Circonus is a service.
![Page 23: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/23.jpg)
Methodology
I’m going to focus on methodology that can be applied across whatever toolset you have.
![Page 24: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/24.jpg)
Pull vs. Push
Anyone who says one is better than the other is... WRONG.
They both have their uses.
![Page 25: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/25.jpg)
Reasons for pull
1. Synthesized observation is desirable.
2. Observable activity is infrequent.
3. Alterations in observation frequency are useful.
![Page 26: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/26.jpg)
Reasons for push
Direct observation is desirable.
Discrete observed actions are useful.
Discrete observed actions are frequent.
![Page 27: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/27.jpg)
False reasons.
Polling doesn’t scale.
![Page 28: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/28.jpg)
Protocol Soup
The great thing about standards is... there are so many to choose from.
![Page 29: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/29.jpg)
Protocol Soup
SNMP (v1, v2, v3): both push (trap) and pull (query)
collectd (v4, v5): push only
statsd: push only
JMX, JDBC, ICMP, DHCP, NTP, SSH, TCP, UDP, barf.
![Page 30: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/30.jpg)
Color me RESTy
Use JSON.
HTTP(s) PUT/POST somewhere for push
HTTP(s) GET something for pull
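As a rough sketch of the push side, the function below assembles an HTTP PUT carrying a JSON document of measurements, without sending it. The collector URL, the function name, and the flat name-to-value payload shape are assumptions for illustration; the talk does not prescribe a schema.

```javascript
// Build (but do not send) a description of an HTTP PUT for a JSON push.
// The URL and the payload shape are hypothetical, for illustration only.
function buildPushRequest(url, measurements) {
  const body = JSON.stringify(measurements);
  return {
    method: 'PUT',
    url: url,
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    },
    body: body
  };
}

const request = buildPushRequest('https://collector.example.com/metrics', {
  total_queries: 184467,
  deployed_version: 'v1.234'
});
console.log(request.method, request.url, request.body);
```

For pull, the same JSON document could simply be served in response to a GET.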
![Page 31: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/31.jpg)
High-volume Data
Occasionally, data velocity is beyond what’s reasonable for individual HTTP PUT/POST for each observation.
1. You can fall back to UDP (try statsd)
2. I prefer to batch them and continue to use REST
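One way the batching option might look, as a minimal sketch: observations are buffered and handed off as a single JSON array once the batch is full. The `send` callback stands in for the actual HTTP PUT/POST, and the names and record shape are assumptions.

```javascript
// Buffer observations and hand them to `send` as one JSON array per batch.
// `send` is a hypothetical callback standing in for an HTTP PUT/POST.
function createBatcher(maxBatchSize, send) {
  let pending = [];
  function flush() {
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    send(JSON.stringify(batch));
  }
  return {
    observe(metric, value, timestampMs) {
      pending.push({ metric: metric, value: value, ts: timestampMs });
      if (pending.length >= maxBatchSize) flush();
    },
    flush: flush
  };
}

const sentBodies = [];
const batcher = createBatcher(3, (body) => sentBodies.push(body));
for (let i = 0; i < 7; i++) {
  batcher.observe('api_latency_ms', 10 + i, 1000 + i);
}
batcher.flush(); // push the final partial batch
console.log(sentBodies.length + ' requests for 7 observations');
```

A real sender would probably also flush on a timer so a quiet period does not strand a partial batch.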
![Page 32: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/32.jpg)
nad
nad is great. use nad.
https://github.com/circonus-labs/nad
Think of it like an SNMP that’s
actually Simple
Monitoring not Management
and trivially extended to suit your needs
![Page 33: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/33.jpg)
nad online example
To the Internet ➥
![Page 34: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/34.jpg)
But wait...
nad isn’t methodology...
it’s technology.
![Page 35: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/35.jpg)
Correct...
Back to the topic.
I talked about nad briefly to provide a super simple tool to erase the question: “but how?”
![Page 36: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/36.jpg)
The real question is: “what?”
What should I be monitoring?
This is the best question you can ask yourself.
Before you start.
While you’re implementing.
After you’re done.
![Page 37: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/37.jpg)
The industry answer:
MONITOR ALL THE THINGS!
I’ll tell you this too, in fact.
But we have put the cart ahead of the horse.
![Page 38: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/38.jpg)
Question?
If I could monitor one thing, what would it be?
hint: CPU utilization on your web server ain’t it.
![Page 39: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/39.jpg)
Answer:
It depends on your business.
If you don’t know the answer to this, I suggest you stop worrying about monitoring and start worrying about WTF your company does.
![Page 40: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/40.jpg)
Here, we can’t continue.
Unless I make stuff up...
So, here I go makin’ stuff up.
![Page 41: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/41.jpg)
Let us assume
we run a web site where customers buy products
![Page 42: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/42.jpg)
Monitoring purchases.
So, we should monitor how many purchases were made and ensure it is within acceptable levels.
Not so fast.
![Page 43: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/43.jpg)
Actually.
We want to make sure customers can purchase from the site and are purchasing from the site.
This semantic difference is critically important.
And choosing which comes down to velocity.
![Page 44: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/44.jpg)
What is this velocity thing?
Displacement / time (i.e. purchases/second or $/second)
BUT WAIT! You said: “Do not collect rates of things.”
Correct... collect the displacement, visualize and alert on the rate.
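A minimal sketch of that idea: store only the running total, and derive the rate when you visualize or alert. The sample shape (`t` in seconds, `total` as a cumulative count) and the function name are assumptions for illustration.

```javascript
// Derive per-second rates from samples of an ever-increasing total.
// Each sample is { t: seconds, total: cumulative count } (hypothetical shape).
function ratesFromTotals(samples) {
  const rates = [];
  for (let i = 1; i < samples.length; i++) {
    const dt = samples[i].t - samples[i - 1].t;
    const delta = samples[i].total - samples[i - 1].total;
    // Skip zero-length intervals and counter resets (e.g. a process restart).
    if (dt <= 0 || delta < 0) continue;
    rates.push({ t: samples[i].t, perSecond: delta / dt });
  }
  return rates;
}

const purchaseTotals = [
  { t: 0, total: 1000 },
  { t: 60, total: 1120 },
  { t: 120, total: 1150 },
  { t: 240, total: 1150 }
];
const purchaseRates = ratesFromTotals(purchaseTotals);
console.log(purchaseRates);
```

Note the last interval is 120 seconds long, as if one collection was missed. Because totals were stored, nothing that happened in the gap is lost; the rate is simply averaged over a longer window.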
![Page 45: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/45.jpg)
So which?
High velocity w/ predictably smooth trends: velocity is more important
Low velocity or uneven arrival rates: measuring capability is more important
![Page 46: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/46.jpg)
To rephrase
If you have sufficient real data, observing that data works best;
otherwise, you must synthesize data and monitor that.
![Page 47: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/47.jpg)
As a tenet.
Always synthesize.
additionally observe real data when possible
![Page 48: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/48.jpg)
More demonstrable (in a short session)
I’ve got a web site that my customers need to visit.
The business understands that we need to serve customers with at least a basic level of QoS: no page loads over 4s
![Page 49: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/49.jpg)
Active checks.
![Page 50: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/50.jpg)
A first attempt
curl http://surge.omniti.com/
extract the HTTP response code
if 200, we’re super good!
Admittedly not so good.
![Page 51: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/51.jpg)
A wealth of data.
Synthesizing an HTTPS GET could provide:
SSL Subject, validity, expiration
HTTP code, Headers and Content
Timings on TCP connection, first byte, full payload
![Page 52: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/52.jpg)
Still, this is highly imperfect.
Don’t get me wrong, they are useful. We use them all over the place... they are cheap.
But, ideally, you want to load the page closer to the way a user does (all assets, javascript, etc.)
Enter phantomjs
![Page 53: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/53.jpg)
var page = require('webpage').create();
page.viewportSize = { width: 1024, height: 768 };

// Count errors and record timings relative to page initialization.
page.onError = function(err) { stats.errors++; };
page.onInitialized = function() { start = new Date(); };
page.onLoadStarted = function() { stats.load_started = new Date() - start; };
page.onLoadFinished = function() { stats.load_finished = new Date() - start; };
// Counter names must match the fields of the stats object.
page.onResourceRequested = function() { stats.resources++; };
page.onResourceError = function(err) { stats.resource_errors++; };
page.onUrlChanged = function() { stats.url_redirects++; };

// Load the page, then emit everything collected as one JSON line.
page.open('http://surge.omniti.com/', function(status) {
  stats.status = status;
  stats.duration = new Date() - start;
  console.log(JSON.stringify(stats));
  phantom.exit();
});
![Page 54: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/54.jpg)
var start, stats = {
  status: null,
  errors: 0,
  load_started: null,
  load_finished: null,
  resources: 0,
  resource_errors: 0,
  url_redirects: 0
};
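The JSON line this script prints still has to be judged against the QoS target. A possible check, sketched under assumptions: the 4000 ms default mirrors the "no page loads over 4s" example, while the function name and the `{ ok, reasons }` result shape are hypothetical.

```javascript
// Evaluate one JSON line printed by the page-load script against a QoS limit.
// The { ok, reasons } result shape is a hypothetical convention.
function evaluatePageLoad(statsJson, maxLoadMs = 4000) {
  const stats = JSON.parse(statsJson);
  const reasons = [];
  if (stats.status !== 'success') reasons.push('page did not load');
  if (stats.load_finished === null) {
    reasons.push('load never finished');
  } else if (stats.load_finished > maxLoadMs) {
    reasons.push('load took ' + stats.load_finished + ' ms');
  }
  if (stats.resource_errors > 0) {
    reasons.push(stats.resource_errors + ' resource errors');
  }
  return { ok: reasons.length === 0, reasons: reasons };
}

const slowLoad = JSON.stringify({
  status: 'success', errors: 0, load_started: 12, load_finished: 5200,
  resources: 41, resource_errors: 0, url_redirects: 1, duration: 5230
});
const verdict = evaluatePageLoad(slowLoad);
console.log(verdict);
```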
![Page 55: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/55.jpg)
Passive checks.
![Page 56: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/56.jpg)
Now for the passive stuff
Some examples are Google Analytics, Omniture, etc.
Statsd (out-of-the-box) and Metrics are mediocre approaches.
If we have a lot of observable data N, N̅ isn’t so useful; σ, |N|, q(0.5), q(0.95), q(0.99), q(0), q(1) add a lot.
![Page 57: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/57.jpg)
Still... we can do better.
N̅, σ, |N|, q(0, 0.5, 0.95, 0.99, 1) are 8 statistical aggregates
Let’s look at API latencies... say we do 1000/s, that’s 60k/minute.
Over a minute of time, reducing 60k points to 8 represents... a lot of information loss.
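For concreteness, here is one way those eight aggregates might be computed from a non-empty set of measurements. Treating the second aggregate as the population standard deviation, and using linear interpolation for quantiles, are both assumptions; other definitions are common.

```javascript
// Reduce a non-empty population of measurements to eight aggregates.
// Quantiles use linear interpolation between closest ranks (an assumption);
// stddev is the population standard deviation.
function summarize(values) {
  const sorted = values.slice().sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, v) => sum + v, 0) / n;
  const variance =
    sorted.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / n;
  function quantile(p) {
    const pos = p * (n - 1);
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
  }
  return {
    mean: mean,
    stddev: Math.sqrt(variance),
    count: n,
    q0: quantile(0),
    q50: quantile(0.5),
    q95: quantile(0.95),
    q99: quantile(0.99),
    q100: quantile(1)
  };
}

const latenciesMs = [12, 15, 11, 14, 13, 250, 12, 16, 13, 14];
const summary = summarize(latenciesMs);
console.log(summary);
```

In this made-up sample the mean is 37 ms while the median is 13.5 ms: a single 250 ms outlier drags the mean to a value that describes no actual request.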
![Page 58: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/58.jpg)
First 60k/minute, how?
statsd
http puts
logs
etc.
![Page 59: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/59.jpg)
Histograms
![Page 60: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/60.jpg)
Histograms 101
This.
This is a histogram.
It shows the frequency of values within a population.
Height represents frequency
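A small sketch of building such a histogram from raw values, with a text rendering where bar length plays the role of height. Fixed-width bins and the 100 ms width are arbitrary choices for illustration, not something the talk specifies.

```javascript
// Count how many values fall into each fixed-width bin.
// Keys are the lower bound of each bin; empty bins are simply absent.
function histogram(values, binWidth) {
  const counts = new Map();
  for (const v of values) {
    const bin = Math.floor(v / binWidth) * binWidth;
    counts.set(bin, (counts.get(bin) || 0) + 1);
  }
  return counts;
}

const loadTimesMs = [120, 180, 240, 260, 310, 90, 2750, 205, 199, 330];
const bins = histogram(loadTimesMs, 100);
// Render each bin as a bar whose length is its frequency.
const barLines = Array.from(bins.keys())
  .sort((a, b) => a - b)
  .map((bin) => String(bin).padStart(5) + ' ms ' + '#'.repeat(bins.get(bin)));
console.log(barLines.join('\n'));
```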
![Page 61: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/61.jpg)
Histograms 101
This.
This is a histogram.
It shows the frequency of values within a population.
Now, height and color represent frequency
![Page 62: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/62.jpg)
Histograms 101
This.
This is a histogram.
It shows the frequency of values within a population.
Now, only color represents frequency
![Page 63: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/63.jpg)
Histograms ➠ time series
This.
This is a histogram.
It shows the frequency of values within a population.
Now, only color represents frequency
at a single time interval
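Stacking one histogram per time interval side by side gives the columns of a heatmap. A rough sketch follows; the sample shape, the 60-second interval, and the 10 ms bin width are assumptions chosen for the example.

```javascript
// Group samples into time intervals, then build one histogram per interval.
// Each sample is { t: epoch ms, value: measurement } (hypothetical shape).
// Result maps interval start -> (bin lower bound -> count).
function heatmapColumns(samples, intervalMs, binWidth) {
  const columns = new Map();
  for (const s of samples) {
    const start = Math.floor(s.t / intervalMs) * intervalMs;
    const bin = Math.floor(s.value / binWidth) * binWidth;
    if (!columns.has(start)) columns.set(start, new Map());
    const column = columns.get(start);
    column.set(bin, (column.get(bin) || 0) + 1);
  }
  return columns;
}

const latencySamples = [
  { t: 1000, value: 12 }, { t: 20000, value: 18 }, { t: 59000, value: 95 },
  { t: 61000, value: 14 }, { t: 90000, value: 11 }, { t: 119000, value: 13 }
];
const columns = heatmapColumns(latencySamples, 60000, 10);
for (const [start, column] of columns) {
  console.log(start, Array.from(column.entries()));
}
```

Each inner count is what a heatmap would render as color intensity for that cell.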
![Page 64: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/64.jpg)
A line graph of data.
![Page 65: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/65.jpg)
A heatmap of data.
![Page 66: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/66.jpg)
Zoomed in on a heatmap.
![Page 67: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/67.jpg)
Unfolding to a histogram.
![Page 68: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/68.jpg)
Observability
I don’t want to launch into a tutorial on DTrace, despite the fact that you can simply spin up an OmniOS AMI in Amazon and have it now.
Instead let’s talk about what shouldn’t happen.
![Page 69: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/69.jpg)
The production questions:
I wonder if that queue is backed up...
Performance like that should only happen if our binary tree is badly imbalanced (replace with countless other pathologically bad precipitates of failure); I wonder if it is...
It’s almost like some requests are super slow; I wonder if they are.
STOP WONDERING.
![Page 70: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/70.jpg)
Instrument your software
Instrument your software and systems and stop the wonder
Do it for the kids
This is simple with DTrace & a bit more work otherwise
Avoiding work is not an excuse for ignorance
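Without DTrace, the "bit more work" route can be as plain as counters inside the code itself. The sketch below is not DTrace; it is an in-process illustration of answering "is that queue backed up?", with hypothetical class and field names.

```javascript
// A FIFO queue that counts what happens to it, so depth and throughput are
// observable instead of guessed. Totals are stored, not rates.
class InstrumentedQueue {
  constructor() {
    this.items = [];
    this.enqueuedTotal = 0;
    this.dequeuedTotal = 0;
    this.maxDepth = 0;
  }
  enqueue(item) {
    this.items.push(item);
    this.enqueuedTotal++;
    this.maxDepth = Math.max(this.maxDepth, this.items.length);
  }
  dequeue() {
    if (this.items.length === 0) return undefined;
    this.dequeuedTotal++;
    return this.items.shift();
  }
  // Snapshot suitable for serving as JSON to whatever collects telemetry.
  telemetry() {
    return {
      depth: this.items.length,
      max_depth: this.maxDepth,
      enqueued_total: this.enqueuedTotal,
      dequeued_total: this.dequeuedTotal
    };
  }
}

const workQueue = new InstrumentedQueue();
['a', 'b', 'c', 'd'].forEach((job) => workQueue.enqueue(job));
workQueue.dequeue();
console.log(JSON.stringify(workQueue.telemetry()));
```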
![Page 71: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/71.jpg)
A tour through our Sauna
We have this software that stores data... it happens to store all data visualized in Circonus.
We have to get data into the system.
We have to get data out of the system.
I don’t wonder... here’s why.
![Page 72: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/72.jpg)
![Page 73: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/73.jpg)
![Page 74: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/74.jpg)
Summary
Let’s review!
![Page 75: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/75.jpg)
Bad habits.
While monitoring all things is a good approach,
alerting on things that do not have specific remediation requirements is horribly damaging.
![Page 76: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/76.jpg)
Data tenet.
Do not collect data twice.
That which you collect for visualization should be the same data on which you alert.
![Page 77: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/77.jpg)
Alerting tenet.
A ruleset against metrics in the system should never produce an alert without documentation:
the failure condition in plain English 中文,
the business impact of the failure condition,
a concise and repeatable remediation procedure,
an escalation path up the chain.
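This tenet can be enforced mechanically. As a sketch, a rule is refused unless all four pieces of documentation are present; the field names and the example rule are hypothetical and would need adapting to your alerting tool.

```javascript
// Reject any alert rule that lacks the four pieces of documentation.
// Field names are hypothetical; adapt them to your alerting tool.
const REQUIRED_DOC_FIELDS = [
  'failureCondition',
  'businessImpact',
  'remediation',
  'escalationPath'
];

// Return the names of documentation fields that are absent or blank.
function missingDocumentation(rule) {
  return REQUIRED_DOC_FIELDS.filter((field) => {
    const value = rule[field];
    return typeof value !== 'string' || value.trim() === '';
  });
}

const checkoutRule = {
  metric: 'checkout_page_load_ms',
  failureCondition: 'Checkout page takes longer than 4 seconds to load.',
  businessImpact: 'Customers cannot complete purchases.',
  remediation: '',
  escalationPath: 'On-call engineer, then the web team lead.'
};
const missing = missingDocumentation(checkoutRule);
console.log(
  missing.length === 0
    ? 'rule accepted'
    : 'rule rejected, missing: ' + missing.join(', ')
);
```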
![Page 78: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/78.jpg)
Alerting post mortems
Try this out:
for each alert, run a post mortem exercise
understand why it alerted, what was done to fix it
rehash who the stakeholders are; have them in the meeting
have the stakeholder speak to the business impact
![Page 79: Monitoring & Observability](https://reader030.vdocument.in/reader030/viewer/2022012213/61df76322805b036277a40ff/html5/thumbnails/79.jpg)
Thank you!