"building anomaly detection for large scale analytics", yonatan ben shimon, anodot...
TRANSCRIPT
1
Building Anomaly Detection For Large Scale Analytics
Yonatan Ben-Simhon, Anodot Architect
16th May, 2016
2
Outline
What is anomaly detection?
Design principles for Anomaly Detection
Anomaly detection? Why do I need it?
Anomaly Detection Methods
The Anodot System
3
What is Anomaly Detection?
4
Find the Anomaly
5
Anomaly Detection in Time Series Signals
An unexpected change in the temporal pattern of one or more time series signals.
6
Why Anomaly Detection?
7
Detecting the Unknowns Saves Time + Money
Industrial IoT: Proactive maintenance. Detecting issues in factories/machines
Web Services: Detecting business incidents + unknown business opportunities
Machine Learning: Closing the “Machine Learning” loop. Tracking and detecting “unknowns” not modeled during training
Security: Detection of unknown breach/attack patterns
8
Detecting Business Incidents: Metric Driven Detection
Business: e.g., purchases per product, conversions per campaign…
Business generation (leads, visitors, usage, engagements): per geo, user segment, page, browser, device…
App (performance, errors, usability): per class, method, feature…
Infra utilization/state (middleware, network, system): per host, database, switch…
9
Detecting Business Incidents: Metric Driven Detection
Drop in # of visitors
Decrease in ad conversion on Android
Price glitch – increase in purchases / decrease in revenue
10
Manual Detection of Business Incidents
Setting alerts with thresholds
Dashboards
11
Anomaly Detection: Design Principles
12
Anomaly Detection: Design Considerations
Timeliness: Real time vs. batch detection
Scale: 100’s vs. millions of metrics
Rate of change: Adaptive vs. offline learning
Conciseness: Univariate vs. multivariate methods
13
Timeliness: Real time vs. Batch Detection
Real time detection
• Online learning – cannot iterate over the data
• More prone to False Positives
• Scales more easily
Batch detection
• Batch learning – can iterate over the data
• Easier to reduce False Positives
• Harder to scale
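As a rough illustration of the "cannot iterate over the data" constraint (not taken from the talk), the sketch below contrasts a batch estimate, which keeps all samples and passes over them twice, with an online estimate that sees each sample exactly once. The names OnlineStats and batch_stats are hypothetical; the online update rule is Welford's algorithm.

```python
import math


class OnlineStats:
    """Running mean and variance, updated one sample at a time (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; 0.0 until there is at least one sample.
        return math.sqrt(self._m2 / self.n) if self.n else 0.0


def batch_stats(samples):
    """Batch estimate: needs all samples in memory and iterates over them twice."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, math.sqrt(var)


samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
online = OnlineStats()
for x in samples:
    online.update(x)  # each sample is seen once and then discarded

batch_mean, batch_std = batch_stats(samples)
```

Both approaches arrive at the same mean and standard deviation here; the online version simply never needs the history, which is what lets it scale to many metrics.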
14
Rate of change
Fast changes
• Most common case
• Requires adaptive algorithms
Slow changes
• “Closed” systems – e.g., airplanes, large machinery
• Learn once and apply the model for a long time
15
Conciseness of Anomalies
Univariate Anomaly Detection
• Learn normal model for each metric
• Anomaly detection at the metric level
• Easier to scale
• Easier to model many types of behaviors
• Causes anomaly storms
Multivariate Anomaly Detection
• Learn single model for all metrics
• Anomaly detection of complete incident
• Hard to scale
• Hard to interpret the anomaly
• Often requires metric behaviour to be homogeneous
Hybrid approach
• Learn normal model for each metric
• Combine anomalies to single incidents if metrics are related
• Scalable
• Can combine multiple types of metric behaviours
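The combining step of the hybrid approach could look roughly like the sketch below. It is an illustration under stated assumptions, not the Anodot implementation: per-metric anomalies are assumed to arrive as (metric, start, end) tuples, the set of related metric pairs is assumed to be given (how relatedness is learned is out of scope here), and the metric names are made up.

```python
def group_into_incidents(anomalies, related):
    """Merge per-metric anomalies into incidents.

    anomalies: list of (metric, start, end) tuples, one per detected anomaly.
    related:   set of frozenset({metric_a, metric_b}) pairs considered related.
    Two anomalies join the same incident if their metrics are related
    (or identical) and their time intervals overlap.
    """
    parent = list(range(len(anomalies)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (m1, s1, e1) in enumerate(anomalies):
        for j in range(i + 1, len(anomalies)):
            m2, s2, e2 = anomalies[j]
            linked = m1 == m2 or frozenset((m1, m2)) in related
            overlap = s1 <= e2 and s2 <= e1
            if linked and overlap:
                parent[find(i)] = find(j)

    incidents = {}
    for i, anomaly in enumerate(anomalies):
        incidents.setdefault(find(i), []).append(anomaly)
    return list(incidents.values())


# Hypothetical metric names and timestamps.
anomalies = [
    ("visitors.us", 100, 160),
    ("purchases.us", 110, 170),
    ("cpu.host42", 120, 130),
]
related = {frozenset(("visitors.us", "purchases.us"))}
incidents = group_into_incidents(anomalies, related)
```

In this example the two related business metrics collapse into one incident, while the unrelated host metric stays on its own. The pairwise loop is quadratic in the number of anomalies, so a system at real scale would need something smarter.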
16
Anomaly Detection Methods
17
Unsupervised Anomaly Detection
General scheme
Step 1: Model the normal behavior of the metric(s) using a statistical model.
Step 2: Devise a statistical test to determine if samples are explained by the model.
Step 3: Apply the test for each sample. Flag as anomaly if it does not pass the test.
18
Very Simple Model
Figure: Normal distribution around the mean μ: 68% of samples fall within 1σ, 95.4% within 2σ, 99.7% within 3σ
Assume normal behavior is the Normal distribution
Estimate the average, standard deviation over all samples
Test: any sample x with |x - average| > 3 * standard deviation is abnormal
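A minimal sketch of this very simple model, assuming the samples fit in a list; the function name three_sigma_anomalies and the parameter k are illustrative, not from the talk.

```python
import statistics


def three_sigma_anomalies(samples, k=3.0):
    """Return indices of samples farther than k standard deviations from the mean."""
    mean = statistics.fmean(samples)   # average over all samples
    std = statistics.pstdev(samples)   # population standard deviation
    return [i for i, x in enumerate(samples) if abs(x - mean) > k * std]


samples = [10.0] * 20 + [50.0]
anomalies = three_sigma_anomalies(samples)
```

Because the average and standard deviation are estimated over all samples, the outlier itself inflates both, which is one reason this model is only a starting point.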
19
A single model does not fit them all!
Smooth (stationary)
Irregular sampling
Multi Modal
Sparse
Discrete
“Step”
20
Example Online Models/Algorithms
Simple Moving Average
Double/Triple exponential (Holt-Winters)
Kalman Filters + ARIMA and variations
Single exponential forgetting
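To make one of these concrete, here is a hedged sketch of single exponential forgetting used as an online normal-behavior model. The class name and the parameters alpha (forgetting factor), k (threshold in standard deviations) and warmup are assumptions chosen for illustration. Each sample is tested against the current model and then folded into it, so the data is never revisited.

```python
import math


class ExponentialForgettingDetector:
    """Online normal-behavior model with exponentially decaying memory.

    alpha (forgetting factor), k (threshold in standard deviations) and
    warmup (samples to observe before flagging) are illustrative parameters.
    """

    def __init__(self, alpha=0.1, k=3.0, warmup=10):
        self.alpha = alpha
        self.k = k
        self.warmup = warmup
        self.n = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, x):
        """Test x against the current model, then fold it into the model."""
        if self.n == 0:
            self.mean = x
            self.n = 1
            return False
        deviation = x - self.mean
        is_anomaly = (
            self.n >= self.warmup
            and abs(deviation) > self.k * math.sqrt(self.var)
        )
        # Exponentially weighted updates of mean and variance.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.n += 1
        return is_anomaly


signal = [10, 11, 10, 9, 10, 11, 10, 9, 10, 11, 10, 9, 30, 10]
detector = ExponentialForgettingDetector()
flags = [i for i, x in enumerate(signal) if detector.update(x)]
```

Note that the anomalous sample is still absorbed into the model in this sketch, which temporarily widens the variance; whether to exclude flagged samples from the update is a design choice left open here.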
21
Example: The importance of modeling seasonality
Single seasonal pattern
22
Example Methods to detect seasonality
Finding maxima in the auto-correlation of the signal
• Computationally expensive
• More robust to gaps
Finding maximum(s) in the Fourier transform of the signal
• Challenging to detect low frequency seasons
• Challenging to discover multiple seasons
• Sensitive to missing data
Exhaustive search based on a cost function
• Computationally expensive
• Robust to gaps
• Challenging to discover multiple seasons
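As an illustration of the auto-correlation method only, the sketch below computes the sample autocorrelation directly and returns the lag of its highest local maximum. The function names, the max_lag parameter and the synthetic sine signal are assumptions; a real signal would also need handling for gaps, trend and noise.

```python
import math


def autocorrelation(signal, lag):
    """Sample autocorrelation of the signal at the given lag."""
    n = len(signal)
    mean = sum(signal) / n
    denom = sum((x - mean) ** 2 for x in signal)
    num = sum((signal[t] - mean) * (signal[t + lag] - mean) for t in range(n - lag))
    return num / denom


def detect_season(signal, max_lag):
    """Return the lag in 2..max_lag with the highest autocorrelation peak, or None."""
    acf = [autocorrelation(signal, lag) for lag in range(max_lag + 2)]
    # A peak is a lag whose autocorrelation exceeds that of its neighbours.
    peaks = [
        lag for lag in range(2, max_lag + 1)
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
    ]
    return max(peaks, key=lambda lag: acf[lag]) if peaks else None


period = 24  # e.g., hourly samples with a daily season
signal = [math.sin(2 * math.pi * t / period) for t in range(period * 10)]
season = detect_season(signal, max_lag=60)
```

The direct computation costs on the order of len(signal) * max_lag operations, which reflects the "computationally expensive" point above.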
23
Large scale anomaly detection – the Anodot system
24
Automatic Anomaly Detection in five Steps: The Anodot Way
1. Metrics Collection – Universal, scale to millions
2. Normal behavior learning
3. Abnormal behavior learning
4. Behavioral Topology Learning
5. Feedback Based Learning
25
Webinar
Thank you