Statistical Anomaly Detection for Database Monitoring

Upload: vividcortex

Post on 19-Jul-2015

Statistical Anomaly Detection

June 2014

High Performance MySQL, 3rd Edition: Optimization, Backups, Replication, and more (Covers Version 5.5)

Baron Schwartz, Peter Zaitsev & Vadim Tkachenko

www.vividcortex.com | [email protected]

THE ONLY TOOL YOU NEED for MySQL Performance Management

The Problem

• Measure all the things!

• Oops. That’s a lot of data.

• Alert spam!

• Now what?

• Find the signal in the noise... but how?

“Maybe Anomaly Detection”

• “Show me anomalies, not all the data!”

• (There’s a certain logic to this; thresholds are a crude anomaly detection method.)

Typical Train Of Thought

• Anomalies are a subset

• Likely to be the important subset

• Anomalies thus likely to be bad and rare

• Anomalies will restore me to sanity

Techniques

• Statistics

• Machine Learning

• Artificial Intelligence

• Physical Models

• More...

Deflating Some Of The Things

• Anomalies aren’t rare

• Anomalies aren’t bad

• Anomalies aren’t objective truths

More Deflation

• Predicting isn’t judging

• You’re probably applying a model, frame of reference, and value judgment unconsciously

The Usual Process

• Define “normal”

• Predict

• Compare (quantify prediction error)

• Flag as anomalous if error too large
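
The four steps above can be sketched in a few lines. This is a minimal illustration under our own assumptions: "normal" is the mean of the previous `window` points, the prediction is that mean, and the judgment is a fixed absolute-error `tolerance`. The names `flag_anomalies`, `window`, and `tolerance` are hypothetical, not from the talk.

```python
from statistics import fmean

def flag_anomalies(values, window=5, tolerance=10.0):
    """Return indices whose prediction error exceeds `tolerance`.

    Hypothetical illustration of the define/predict/compare/flag loop.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]       # 1. define "normal" as recent history
        prediction = fmean(history)          # 2. predict the next value
        error = abs(values[i] - prediction)  # 3. compare: quantify prediction error
        if error > tolerance:                # 4. flag if the error is too large
            flagged.append(i)
    return flagged
```

Every choice here (window length, the mean as predictor, a fixed tolerance) is one of the model and value-judgment decisions the previous slide says are often made unconsciously.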

MORE Assumptions!

• “Data is normally distributed”

• “Normal distribution and sigmas is the model”

• “Everyone knows 3 sigmas is the standard”

• “All models result in normally distributed errors”

In Reality...

• Gaussian models are often used because they’re convenient, not because they’re the sole truth

• Sigmas are just a proxy for probabilities
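
To make "sigmas are a proxy for probabilities" concrete: only if the errors really are Gaussian does a k-sigma threshold correspond to a two-sided tail probability of erfc(k/√2). The function name `two_sided_tail` is our own; the sketch uses only Python's standard library.

```python
import math

def two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z: the chance a Gaussian
    error lands more than k standard deviations from the mean."""
    return math.erfc(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"{k} sigma: {1 - two_sided_tail(k):.4%} inside, "
          f"{two_sided_tail(k):.4%} outside")
```

For k = 3 the outside probability is about 0.27%, which is where the familiar "99.7%" comes from. If the errors are not Gaussian, the same 3-sigma line maps to a different probability.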

As Simple As...?

I’d phrase it this way: if you can find a meaningful model that non-destructively transforms the data such that the mean is stable and the prediction errors are normally distributed, and you define an anomaly as an event whose prediction error is larger than 99.7% of prediction errors, then anomaly detection is simple 3-sigma math. That’s a lot of assumptions, but at least they’re stated.

[Diagram: Characterization of Past, Central Tendency, Forecast/Predict, Deviation, Judgment, Anomaly?]

Control Charts

Is the process within normal limits?
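
A classic (Shewhart-style) control chart fixes its limits from a baseline period, then asks whether new points fall inside them. This minimal sketch assumes limits of mean ± 3 sample standard deviations, computed once from `baseline`; the function names and parameters are illustrative.

```python
from statistics import fmean, stdev

def control_limits(baseline, k=3.0):
    """Lower/upper control limits from a fixed baseline period."""
    center = fmean(baseline)
    spread = stdev(baseline)  # sample standard deviation (needs >= 2 points)
    return center - k * spread, center + k * spread

def out_of_control(values, baseline, k=3.0):
    """Indices of `values` falling outside the baseline's control limits."""
    lower, upper = control_limits(baseline, k)
    return [i for i, x in enumerate(values) if not lower <= x <= upper]
```

Because the limits never move, a system whose normal level drifts will eventually sit permanently outside them.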

Problem

Control charts assume a stationary mean.

Systems are “less normal” than we assume, in both senses.

Recency

What is a system’s “recent” normal?

Moving Average

Average over a window of recent data
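
A minimal simple-moving-average sketch (the class name `MovingAverage` is our own). It keeps a running total so each update is O(1) time, but it still has to store the last `size` samples so it knows what to subtract when one falls out of the window.

```python
from collections import deque

class MovingAverage:
    """Simple moving average over the last `size` samples."""

    def __init__(self, size: int) -> None:
        if size < 1:
            raise ValueError("size must be >= 1")
        self.window = deque(maxlen=size)  # oldest sample is evicted automatically
        self.total = 0.0

    def update(self, x: float) -> float:
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # sample about to be evicted
        self.window.append(x)
        self.total += x
        return self.total / len(self.window)
```

Every sample inside the window gets equal weight, so a spike keeps its full influence until it drops out of the window.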

Problems

Moving average is “more expensive” to compute

Moving average is influenced by “distant” past

These days should be remembered and kept throughout every generation

- Esther 9:28

Remember All The Things

Exponential Moving Averages

• Infinite memory, biased towards recent history (past data trails off to nothing)

• Cheap to compute

• Choose a decay factor α

• $S_t = \alpha x_t + (1-\alpha) S_{t-1}$
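
The update rule above needs only the previous smoothed value, which is why it is cheap. A minimal sketch follows; seeding $S_0$ with the first sample is our assumption (one common convention among several), and the function name `ema` is illustrative.

```python
def ema(values, alpha):
    """Exponential moving average: S_t = alpha*x_t + (1 - alpha)*S_{t-1}.

    Seeds S_0 with the first sample (an assumed convention).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    s = None
    for x in values:
        s = x if s is None else alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed
```

Unrolling the recursion shows that a sample of age k carries weight α(1−α)^k: never exactly zero, but shrinking geometrically.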

Choosing Decay

$\alpha = 2/(N+1)$, where N is the window length of the simple moving average you want to mimic (both then have an average sample age of $(N-1)/2$)
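
Why $2/(N+1)$: counting the newest sample as age 0, EMA weights α(1−α)^k have mean age (1−α)/α, and substituting α = 2/(N+1) gives (N−1)/2, the same mean age as N equally weighted samples aged 0 to N−1. The helper names below are our own; the sketch just checks that identity numerically.

```python
def alpha_for_window(n: int) -> float:
    """Decay factor whose EMA matches the average sample age of an n-sample SMA."""
    return 2.0 / (n + 1)

def ema_average_age(alpha: float) -> float:
    """Mean age (in samples) of EMA weights alpha*(1-alpha)**k, k = 0, 1, ..."""
    return (1 - alpha) / alpha

def sma_average_age(n: int) -> float:
    """Mean age of n equally weighted samples aged 0..n-1."""
    return (n - 1) / 2

for n in (5, 9, 60):
    print(n, alpha_for_window(n), ema_average_age(alpha_for_window(n)), sma_average_age(n))
```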

Exponential Moving Control Charts

• Need exponential moving average - easy

• Need exponential moving standard deviation - hmm.

• Standard deviation = square root of variance

• Variance = “mean of square minus square of mean”

• MVP solution: also keep an exponential moving average of the squared values, then variance ≈ EMA(x²) − EMA(x)²

• (see also http://en.wikipedia.org/wiki/EWMA_chart)
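
A minimal sketch of the approach above, assuming the mean-of-squares shortcut and a warm-up period before anything is flagged. `ExpMovingStats`, `ewma_control_chart`, `warmup`, and `k` are our own names. Each point is judged against limits computed from the data before it, then folded into the averages. This is a simplified moving-limits scheme, not the formal EWMA chart described on the linked page.

```python
import math

class ExpMovingStats:
    """Exponentially weighted mean and standard deviation.

    Variance uses the shortcut var = EMA(x**2) - EMA(x)**2.
    """

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.mean = None     # EMA of x
        self.mean_sq = None  # EMA of x**2

    def update(self, x: float) -> None:
        if self.mean is None:  # seed with the first sample
            self.mean, self.mean_sq = x, x * x
        else:
            a = self.alpha
            self.mean = a * x + (1 - a) * self.mean
            self.mean_sq = a * x * x + (1 - a) * self.mean_sq

    def std(self) -> float:
        # Floating-point cancellation can push this slightly below zero; clamp it.
        variance = self.mean_sq - self.mean ** 2
        return math.sqrt(max(variance, 0.0))


def ewma_control_chart(values, alpha=0.1, k=3.0, warmup=10):
    """Indices of points outside mean +/- k*std of the data seen before them."""
    stats = ExpMovingStats(alpha)
    flagged = []
    for i, x in enumerate(values):
        # Judge x against the preceding data, then fold it into the averages.
        if i >= warmup and stats.mean is not None and abs(x - stats.mean) > k * stats.std():
            flagged.append(i)
        stats.update(x)
    return flagged
```

The shortcut loses precision when the mean is large relative to the spread; in that case a numerically stabler incremental variance update may be preferable.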

Shortcomings

• Works well when data is approximately normally distributed

• Non-Gaussian data throws “standard deviation” for a loop; false positives ensue

• Requires more advanced techniques

• STILL USEFUL ANYWAY.
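
To see how non-Gaussian data inflates false positives, assume (purely for illustration) a metric that follows an exponential distribution, which is skewed with a long right tail. For Exp(rate), mean = standard deviation = 1/rate, so the "mean + 3 sigma" line sits at 4/rate and is exceeded with probability e^(−4). The function names are our own.

```python
import math

def gaussian_upper_tail(k: float) -> float:
    """P(X > mean + k*sd) for a normal distribution (one-sided)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def exponential_upper_tail(k: float) -> float:
    """P(X > mean + k*sd) for an exponential distribution.

    mean = sd = 1/rate, so the threshold is (1 + k)/rate and the
    survival function gives exp(-(1 + k)).
    """
    return math.exp(-(1 + k))

k = 3
ratio = exponential_upper_tail(k) / gaussian_upper_tail(k)
print(f"Gaussian: {gaussian_upper_tail(k):.5f}, exponential: "
      f"{exponential_upper_tail(k):.5f}, ratio: {ratio:.1f}x")
```

Under this assumption roughly 13 to 14 times as many points cross the upper 3-sigma line as the Gaussian model predicts, and every extra crossing is a would-be alert.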

Solutions (?)

• Non-parametric tests

• Non-statistical methods

• Throw out anomaly detection?
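
One simple non-parametric option (our choice of illustration; the slide does not name a specific method) drops the standard deviation entirely and takes the threshold straight from the empirical distribution of past prediction errors. This is the "larger than 99.7% of prediction errors" definition from earlier, without assuming normality. The function names are hypothetical.

```python
import math

def empirical_threshold(errors, quantile=0.997):
    """Nearest-rank quantile of past absolute prediction errors."""
    if not errors:
        raise ValueError("need at least one historical error")
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must be in (0, 1]")
    ranked = sorted(abs(e) for e in errors)
    rank = math.ceil(quantile * len(ranked))  # 1-based nearest rank
    return ranked[rank - 1]

def is_anomalous(error, history, quantile=0.997):
    """True if |error| exceeds the chosen quantile of historical |errors|."""
    return abs(error) > empirical_threshold(history, quantile)
```

The price is data: with fewer than about a thousand historical errors, a 99.7th percentile is essentially the largest error seen so far.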

Other Disciplines

• Finance

• Weather Forecasting

• Signal Processing

• Statistics, More Broadly

• Physics

Companies / References

• VividCortex doesn’t have a dog in this fight...

• If you’re interested, look into:

• Metafor Software

• Numenta Grok

• Etsy’s Kale

• Ted Dunning at MapR

• Anton Lebedevich (mabrek.github.io)

• Disclaimer: I haven’t yet seen results good enough to alert on.

What Does VividCortex Do?

• We use Adaptive Fault Detection

• Related, but NOT anomaly detection

• Uses an is-work-getting-done model

• Detects small stalls / unavailability

• Not an all-purpose tool, but it’s useful
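
The slide gives no implementation details, and the sketch below is not VividCortex's algorithm. It is a hypothetical illustration of what an "is work getting done?" check could look like at 1-second resolution: flag runs of seconds where requests are in flight but none complete. The input format and the names `find_stalls` and `min_seconds` are assumptions.

```python
def find_stalls(samples, min_seconds=1):
    """Find runs of seconds where work is pending but nothing completes.

    `samples` is a list of (in_flight, completed) pairs, one per second.
    Returns (start_index, length) for each stall lasting >= min_seconds.
    Hypothetical illustration only.
    """
    stalls = []
    start = None
    for i, (in_flight, completed) in enumerate(samples):
        stalled = in_flight > 0 and completed == 0
        if stalled and start is None:
            start = i  # a stall begins
        elif not stalled and start is not None:
            if i - start >= min_seconds:
                stalls.append((start, i - start))
            start = None
    # Close a stall that runs to the end of the data.
    if start is not None and len(samples) - start >= min_seconds:
        stalls.append((start, len(samples) - start))
    return stalls
```

Unlike a statistical threshold, this check makes no distributional assumption: it encodes a judgment about what "bad" means (pending work, zero progress).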

Conclusions

• Anomaly detection won’t fix alert spam IMO.

• It can be a good assistive technology.

• It can be a good component of a larger system.

• IT/monitoring is hoping for a silver bullet.

• I suggest focusing on meaning, not metrics.

Need Performance Management?

• Performance Management, not Monitoring

• Unifying Principle: Measure Work Getting Done

• 1-Second Resolution, Deep Insight

• Super-Simple Install, Zero Disruption/Config

• Fully Hosted, Low-Cost