Using Change Detection Algorithms for Detecting Anomalous Behavior in
Large Systems
Dr Veena Mendiratta
Adjunct Professor, Northwestern University; Nokia Bell Labs (retired 2020)
Outline
• Anomaly Detection Concepts
• Change Detection Methods
• Case Study
ODSC West - November 2021
Motivation
• Maintaining the health of large, complex systems is a major issue.
• Anomalies in system performance data can signal failures or impending failures.
• The goal is to detect anomalous behavior before it escalates to a service degradation or outage.
• Furthermore, diagnosing the anomalous behavior is required to determine the root cause.
• Focus of this talk:
o demonstrate our approach to offline detection of multiple change points in multivariate time series (sequential system performance data);
o apply multivariate change detection algorithms based on non-parametric methods to detect anomalies; and
o present diagnostic information at fine time granularity for root cause analysis.
Anomaly Detection Concepts
What is an Anomaly
• A pattern in the data that does not conform to the expected behaviour
o outlier, exception, peculiarity, etc.
• Important characteristics
o Different from the norm with respect to their features
o Rare in a dataset compared to the normal instances
Anomaly Detection is Used in Many Domains
• Intrusion detection – network traffic and server applications are monitored to detect potential intrusion attempts
• Network/System monitoring – network traffic, performance indices and logs are monitored to detect failures in the network
• Fraud detection – log data is analysed to detect misuse of a system, for example, financial transactions are analysed to detect credit card fraud
• Data leakage prevention – accesses to databases, file servers, etc. are logged and analyzed in near real-time to detect uncommon access patterns
• Medical applications – patient monitoring, where ECG or other body-sensor readings are logged and analysed to detect critical or life-threatening situations
• Video surveillance – camera data is analysed for suspicious movements
Anomaly Detection – context is important
https://www.ncdc.noaa.gov/monitoring-references/dyk/anomalies-vs-temperature
Exploratory Analysis – Visual Methods for Anomaly Detection
Multiple router CPU usage (%) data – hourly data for 1 month – to detect anomalies
• First, plot the average of all the data for each router.
• Routers are numbered 101 to 163.
• Each point is an average of 720 data points – hourly data for 30 days.
• Clear dichotomy between routers:
o higher CPU usage
o lower CPU usage
• We can then plot each group (higher or lower CPU usage) individually.
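The first exploratory step can be sketched in a few lines. This is a minimal illustration, assuming the CPU data is held in a dict keyed by router ID with one value per hour (30 days × 24 hours = 720 points); the synthetic data and the "split at the largest gap in the sorted averages" rule are illustrative assumptions, not the method used in the talk.

```python
from statistics import mean


def router_averages(cpu_by_router: dict[int, list[float]]) -> dict[int, float]:
    """Average all hourly CPU samples (%) for each router."""
    return {router: mean(samples) for router, samples in cpu_by_router.items()}


def split_at_largest_gap(averages: dict[int, float]) -> tuple[list[int], list[int]]:
    """Split routers into lower/higher CPU groups at the largest gap
    between consecutive sorted averages (a simple illustrative rule)."""
    ordered = sorted(averages, key=averages.get)
    gaps = [averages[b] - averages[a] for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    hours = 30 * 24  # 720 hourly points per router
    cpu = {}
    for router in range(101, 164):  # routers numbered 101 to 163
        base = 60.0 if router % 3 == 0 else 20.0  # synthetic: two usage levels
        cpu[router] = [base + rng.gauss(0, 5) for _ in range(hours)]
    low, high = split_at_largest_gap(router_averages(cpu))
    print(len(low), len(high))  # → 42 21
```

Each group can then be plotted separately, as the slide suggests.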
Challenges for Anomaly Detection
• Defining baseline “normal” behaviour requires domain expertise.
• Normal behaviour evolves over time.
• Malicious anomalies – adversaries adapt.
• Notion of anomaly depends on application domain and context.
• Limited availability of labelled data for training/validation.
• Noise in the data may look like an anomaly.
• The boundary between normal and outlying behaviour is not precise.
Aspects of Anomaly Detection
• Nature of input data
o Univariate, multivariate
o Types of attributes – binary, numeric, categorical
o Records, logs, etc.
• Availability of supervision, i.e., data labels
o Unsupervised – assumption is that most of the data is normal
o Semi-supervised – labels for normal data
o Supervised – labels for normal data and for anomalies
• Output of anomaly detection – for each test instance
o Label: normal/anomaly or 0/1; output of classification-based approaches
o Score: output is ranked; requires a threshold parameter
• Type of anomaly – point, contextual, collective
• Evaluation of anomaly detection
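To illustrate the difference between score and label outputs, here is a minimal sketch: a robust z-score (distance from the median in units of the median absolute deviation) serves as the anomaly score, and a threshold converts scores into 0/1 labels. The scoring rule and the threshold of 3.5 are illustrative assumptions, not choices made in the talk.

```python
from statistics import median


def robust_scores(values: list[float]) -> list[float]:
    """Score each point by its distance from the median in units of scaled MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; scores are undefined")
    # 0.6745 makes the MAD comparable to a standard deviation for normal data
    return [0.6745 * (v - med) / mad for v in values]


def to_labels(scores: list[float], threshold: float = 3.5) -> list[int]:
    """Convert scores to labels: 1 = anomaly, 0 = normal."""
    return [1 if abs(s) > threshold else 0 for s in scores]


if __name__ == "__main__":
    cpu = [21.0, 22.5, 20.8, 21.7, 95.0, 22.1, 21.3]  # hypothetical CPU (%) readings
    scores = robust_scores(cpu)
    print(to_labels(scores))  # → [0, 0, 0, 0, 1, 0, 0]
```

The scores themselves give a ranking; moving the threshold trades false alarms against missed anomalies.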
Methods of Anomaly Detection
• Classification Based
o Rule Based
o Neural Networks Based
o SVM Based
• Nearest Neighbor Based
o Density Based
o Distance Based
• Statistical
o Parametric
o Non-parametric
• Clustering Based
• Other
o Information Theory Based
o Spectral Decomposition Based
o Visualization Based
Source: http://deboj.club/topic/statistical-anomaly-detection-techniques.html
Change Detection Methods
Change Point Detection Approaches
• Bayesian
• Distance-Based
Bayesian Approach
Bayesian Approach with changepoint.mv and anomaly R packages
where L is the minimum length for any collective anomaly segment.
• changepoint.mv constrains the variance when calculating C, and evaluates change points in mean and variance separately.
• anomaly relaxes this constraint and evaluates change points in mean and variance jointly.
• Due to implementation differences:
o anomaly is more sensitive to pointwise anomalies and to collective changes in both mean and variance; and
o changepoint.mv produces change points with a higher signal-to-noise ratio.
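To make the penalized-cost idea concrete, the following is a minimal sketch in the spirit of the collective and point anomaly (CAPA) formulation that the anomaly package is built around; it is not the package's code. Assumptions: a univariate series already standardized so typical data have mean 0 and variance 1; collective anomalies are shifts in mean only; a point anomaly simply has its cost replaced by a flat penalty (a simplification). The hypothetical parameters `beta_collective`, `beta_point` and `min_len` stand in for the per-segment penalty b, the pointwise penalty, and the minimum length L. It uses plain O(n²) dynamic programming, without the pruning a production implementation would use.

```python
import math
import random


def detect_anomalies(x, beta_collective, beta_point, min_len):
    """Simplified CAPA-style search: minimize the total penalized cost of
    labelling each point typical, a point anomaly, or part of a collective
    mean-shift anomaly of length >= min_len.

    Assumes x is standardized so typical data have mean 0 and variance 1.
    Returns (collective, points): half-open (start, end) index ranges and
    the indices of point anomalies.
    """
    n = len(x)
    # Prefix sums give O(1) segment costs: s1 = running sum, s2 = running sum of squares
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def segment_cost(s, t):
        # Residual sum of squares of x[s:t] about its own fitted mean
        return (s2[t] - s2[s]) - (s1[t] - s1[s]) ** 2 / (t - s)

    best = [0.0] + [math.inf] * n  # best[t]: optimal cost of x[:t]
    choice = [None] * (n + 1)      # (kind, start index) of the last labelled piece
    for t in range(1, n + 1):
        v = x[t - 1]
        # Typical point: squared deviation from the baseline mean 0
        best[t], choice[t] = best[t - 1] + v * v, ("typical", t - 1)
        # Point anomaly: a flat penalty replaces the point's cost
        if best[t - 1] + beta_point < best[t]:
            best[t], choice[t] = best[t - 1] + beta_point, ("point", t - 1)
        # Collective anomaly x[s:t]: fitted-mean cost plus a per-segment penalty
        for s in range(t - min_len + 1):
            cost = best[s] + segment_cost(s, t) + beta_collective
            if cost < best[t]:
                best[t], choice[t] = cost, ("collective", s)

    # Backtrack from the end to recover the labelling
    collective, points, t = [], [], n
    while t > 0:
        kind, start = choice[t]
        if kind == "collective":
            collective.append((start, t))
        elif kind == "point":
            points.append(start)
        t = start
    return sorted(collective), sorted(points)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(300)]
    for i in range(100, 150):
        x[i] += 4.0  # collective anomaly: a sustained mean shift
    x[220] = 8.0     # pointwise anomaly: a single spike
    n = len(x)
    collective, points = detect_anomalies(
        x, beta_collective=6 * math.log(n), beta_point=4 * math.log(n), min_len=5
    )
    print("collective:", collective, "point:", points)
```

The penalties above are illustrative; raising them suppresses weak detections, which is the same trade-off governed by the penalty settings in the experiments.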
Distance-Based Approach
Assumption: Existence of finite moments up to some order.
Distance-Based Approach with ecp R package (1)
Assumption: Existence of moments up to second order, w is chosen as
Distance-Based Approach with ecp R package (2)
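• Setting a = 2 reduces the divergence to a comparison of the segment means, so only changes in the mean are detected.
• Setting 0 < a < 2 (for example, a = 1) detects any change in distribution, including changes in the variance.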
• By iteratively calculating this test statistic, the algorithm sequentially searches for the next time stamp at which the between-segment divergence is statistically different from the previous segment.
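The divergence search can be sketched for a univariate series using the sample energy divergence with moment index `alpha`, as in the e-divisive procedure on which ecp is based. This minimal sketch only finds the single best split; the permutation test that ecp uses to judge significance and the recursion into further segments are omitted. The function names are hypothetical and are not ecp's API.

```python
import random
from itertools import combinations


def energy_divergence(xs, ys, alpha=1.0):
    """Sample energy divergence between two univariate samples, 0 < alpha <= 2:
    twice the mean between-sample distance minus the mean within-sample
    distances, with all distances raised to the power alpha."""
    n, m = len(xs), len(ys)
    between = 2.0 * sum(abs(a - b) ** alpha for a in xs for b in ys) / (n * m)
    within_x = sum(abs(a - b) ** alpha for a, b in combinations(xs, 2)) / (n * (n - 1) / 2)
    within_y = sum(abs(a - b) ** alpha for a, b in combinations(ys, 2)) / (m * (m - 1) / 2)
    return between - within_x - within_y


def best_split(series, alpha=1.0, min_size=10):
    """Return (tau, stat): the split maximizing the divergence between
    series[:tau] and series[tau:], scaled by n*m/(n+m)."""
    if min_size < 2:
        raise ValueError("each side needs at least 2 points")
    best_tau, best_stat = None, float("-inf")
    for tau in range(min_size, len(series) - min_size + 1):
        left, right = series[:tau], series[tau:]
        n, m = len(left), len(right)
        stat = n * m / (n + m) * energy_divergence(left, right, alpha)
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    return best_tau, best_stat


if __name__ == "__main__":
    rng = random.Random(7)
    # Synthetic variance change at t = 60: same mean, spread grows fivefold
    series = [rng.gauss(0, 1) for _ in range(60)] + [rng.gauss(0, 5) for _ in range(60)]
    for alpha in (1.0, 2.0):
        tau, stat = best_split(series, alpha=alpha)
        print(f"alpha={alpha}: best split at {tau}, statistic {stat:.2f}")
```

With alpha = 2 the divergence collapses, up to small-sample correction terms, to twice the squared difference of the segment means, so a variance-only change is essentially invisible; values below 2 keep sensitivity to any change in distribution.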
Change Detection Case Study
Mendiratta, Veena, et al. "Detecting and Diagnosing Anomalous Behavior in Large Systems with Change Detection Algorithms." 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW).
Change Point Detection
• Our dataset is weakly labeled: some anomalies are annotated as non-anomalous.
• Supervised approaches will likely result in too many false negatives.
• Therefore, we consider unsupervised methods, namely change point detection.
• Our goal is to detect change points and to diagnose their characteristics.
• In particular, we want to identify whether an anomaly (change point) is:
o pointwise (a single time stamp) or
o collective (a collection of time stamps).
Data Description
Performance data from a large network server are used to detect signals for anomalous behavior.
Data includes over 100 variables in the following categories:
• CPU occupancy of various call processing entities
• Internal message buffer and resource usage
• Transaction rate
• Packet rate
• Success and failure rates of various 3GPP transactions
• Latency
• Average processing time and life-time of transaction resources
Data are reported and logged at 1-second intervals.
Distribution of Selected Variables
Correlation Plot of Selected Variables
Sample Data for 4 Variables
Sample Data for 4 Variables during Anomaly Event
Experimental Details - parameter settings
The data are analyzed in windows of 1000 seconds for 50 features.
• changepoint.mv: Assumes the data are generated from a normal distribution with an estimable mean. The cost function is the negative log-likelihood, and the segment length penalty b = 100. Change points are detected by feature.
• anomaly: Change points are detected by feature based on shift in mean and variance jointly. Segment length penalty b = 10,000, pointwise anomaly penalty = 100, and the minimum length of collective anomaly > 60 seconds.
• ecp: The moment hyperparameter a = 1. The between-segment divergence is calculated collectively; however, each feature is also input separately to obtain more diagnostic information. The minimum length of a collective anomaly is set at 60 seconds to improve the signal-to-noise ratio.
Results: Change Points and Variables with changepoint.mv
• Algorithm gives the last 10 change points in the specified interval.
• CP at 359: ProcDur, CXHWMark, CXinsideView, CXMgrUsage
• CP at 502: MAFAdminState
Detected Change Points and Diagnostics with changepoint.mv, 22K to 23K
Detected Change Points and Diagnostics with changepoint.mv, 24K to 25K
Results: Change Points with ecp
• The ecp algorithm provides change points for multivariate data.
• However, it does not provide a diagnostic capability.
• For diagnostics:
o the algorithm is run for each of the 46 variables, and
o the change points are compiled by variable (next slide).
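The compile-by-variable step amounts to inverting per-variable results into a change point → variables table, the same shape as the "CP at 359: ..." listing shown for changepoint.mv. This is a minimal sketch with the detector itself left out; the input format, the hypothetical variable names, and the `tolerance` for treating nearby time stamps as one change point are assumptions for illustration.

```python
def compile_by_change_point(cps_by_variable, tolerance=0):
    """Invert {variable: [change points]} into {change point: [variables]}.

    Change points are processed in time order; one that lies within
    `tolerance` time stamps of the previous change point joins that group,
    whose key is the earliest time stamp in the group.
    """
    events = sorted((cp, var) for var, cps in cps_by_variable.items() for cp in cps)
    table = {}
    group_key = prev = None
    for cp, var in events:
        if prev is None or cp - prev > tolerance:
            group_key = cp          # start a new group at this time stamp
            table[group_key] = []
        if var not in table[group_key]:
            table[group_key].append(var)
        prev = cp
    return table


if __name__ == "__main__":
    # Hypothetical per-variable detector output for one window (not study results)
    per_variable = {"var_a": [120], "var_b": [120, 700], "var_c": [121], "var_d": [480]}
    for cp, variables in compile_by_change_point(per_variable, tolerance=2).items():
        print(f"CP at {cp}: {', '.join(variables)}")
```

Setting `tolerance=0` keeps only exact matches; a small positive tolerance groups variables whose change points differ by a few seconds.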
Results: Change Points and Variables with ecp, 22K to 23K
Results: Change Points with anomaly by Variable, 17K to 25K
Results: Summary
• Results presented using 3 change point detection algorithms.
• changepoint.mv explicitly outputs the change point and diagnosis.
• ecp and anomaly were run separately for each variable to extract the diagnostic information.
• Some change points and the associated variables are detected by all 3 algorithms.
• Other change points are detected by some of the algorithms.
• This suggests a couple of approaches for deploying such algorithms:
o combine the results from all the algorithms (an ensemble approach), possibly at the risk of increasing the noise-to-signal ratio; or
o select the best algorithm based on the data characteristics and project requirements.
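The ensemble option can be sketched as a simple voting rule, assuming each algorithm's output has already been reduced to a list of change-point time stamps: change points from different algorithms that fall within a tolerance window are merged, and only those reported by at least `min_votes` algorithms are kept. The function name, the tolerance, and the vote rule are hypothetical choices for illustration, not the approach used in the study.

```python
def ensemble_change_points(results, tolerance=5, min_votes=2):
    """Merge change points across algorithms and keep those with enough votes.

    results: {algorithm name: [change point time stamps]}
    Returns [(first time stamp in cluster, sorted algorithm names), ...].
    """
    events = sorted((cp, algo) for algo, cps in results.items() for cp in cps)
    clusters = []  # each cluster: [first_cp, last_cp, set of algorithms]
    for cp, algo in events:
        if clusters and cp - clusters[-1][1] <= tolerance:
            clusters[-1][1] = cp        # extend the current cluster
            clusters[-1][2].add(algo)   # one vote per algorithm, however many hits
        else:
            clusters.append([cp, cp, {algo}])
    return [(first, sorted(algos)) for first, _last, algos in clusters if len(algos) >= min_votes]


if __name__ == "__main__":
    # Hypothetical outputs for one analysis window (not results from the study)
    outputs = {
        "changepoint.mv": [120, 480],
        "ecp": [122, 700],
        "anomaly": [119, 483, 900],
    }
    print(ensemble_change_points(outputs, tolerance=5, min_votes=2))
    # → [(119, ['anomaly', 'changepoint.mv', 'ecp']), (480, ['anomaly', 'changepoint.mv'])]
```

With `min_votes=1` this becomes a plain union of all detections (more noise); raising it toward the number of algorithms keeps only change points that every method agrees on.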
Conclusions
• Presented the application of change detection algorithms to multivariate sequential data from a large system to demonstrate their efficacy for anomaly detection and root cause diagnosis.
• Results showed that the algorithms can detect not only obvious but also subtle anomalous behavior, and provide diagnostics for root cause analysis.
• Methods are applicable to most types of system performance data.
• Future work:
o An under-explored area is the use of covariance in identifying contextual anomalies.
o Covariance has been shown to be a promising measure when distribution shifts occur in some features, but at different scales.
Thank You! Questions?