Use Case: Help me from tons of alarms
Copyright 2017 FUJITSU LIMITED
OpenStack Summit Sydney, 6 Nov 2017
Hisashi Osanai, Koji Nakazono, Daisuke Fujita (FUJITSU)
Share our use case
Alarm management
We got tons of alarms when trouble happened in only a few components
Monasca and Monasca-analytics eliminated the unnecessary alarms
(Figure: Fujitsu Cloud monitored by Monasca)
Saves the operator from long investigations!
Infrastructure operators need help!
Large-scale data
40 zettabytes of data in 2020
Complex infrastructures
Virtualized data centers (with SDx)
Source: IDC’s Digital Universe Study, 2012
The interplay of data and SDx technologies leads to…
Slow problem resolution times
Hard to extract clear patterns to alleviate problems
Redundancy: very similar solutions are applied everywhere
Difficult to generalize common situations to establish widely applicable best practices
Widely applicable best practices
Realize the long tail distribution
Infrastructure operators don’t want to use vendor-specific solutions
To achieve this, we follow two steps:
(Easily) find “unseen” problems in a multitude of small/mid-sized installations
Share expertise and establish common practices for those problems
Market Trends
AIOps Platforms (Artificial Intelligence for IT Operations)
Enhance IT operations using…
big data
modern machine learning
other advanced analytics
Enable the concurrent use of multiple…
Data sources
Data collection methods
Analytical technologies
Presentation technologies
Source: Gartner, 2017
AIOps capabilities
Monitoring related
- Historic data management
- Streaming data management
- Log data ingestion
- Wire data ingestion
- Metric data ingestion
- Document text ingestion
Analytics related
- Automated pattern discovery and prediction
- Anomaly detection
- Root cause determination
Delivery related
- On-premises delivery
- Software as a service
- Infrastructure as a service
Machine Learning for operational management
Slow problem resolution times
- Help tame scale and complexity (easy to extract common patterns)
- Speedy resolution of problems
Achieving the long tail distribution
- No solution to share expertise and define machine learning solution recipes
- The OpenStack community and Monasca-analytics solve this
Potential use cases
Sharing our use case in detail
Monasca
What is Monasca
Official project of OpenStack
Monitoring-as-a-service solution that is well integrated with OpenStack
Collects, monitors, and analyzes metrics and logs across all of OpenStack
Scalable, highly available (HA), and high performance
Monasca provides two kinds of features:
Monitoring
- Metric monitoring
- Log management
Analytics
Monasca Architecture
(Figure: Monasca architecture with Log Agent, Log API, Log Transformer, Log Metrics, Log Persister, Log DB, Metrics Agent, Metrics API, Threshold Engine, Notification Engine, Persister, Metrics DB, Config DB, Message Queue (Kafka), Monasca UI, and Analytics Engine)
Metrics monitoring
Monitors many types of metrics
- System metrics, like CPU usage and memory usage
- Many applications (OpenStack services, RabbitMQ, MariaDB, ...)
Detects and monitors applications automatically
Monitors Nova instances “cross tenant”
- Provides metrics of Nova instances to each tenant user without installing the agent in the instances
Monasca Architecture - Metrics Agent
Metrics Agent
- Collector: collects metrics periodically; plugin capability for applications
- Statsd: collects metrics as a daemon
- Forwarder: sends the collected metrics to the Monasca API
- Supervisord: manages the processes above
(Figure: Collector, Statsd, and Forwarder run under Supervisord in the Metrics Agent and send metrics to the Metrics API)
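The split between collecting and forwarding can be sketched as follows. This is a minimal illustration, not monasca-agent code: the plugin function and its `cpu.idle_perc` value are hypothetical, and the payload assumes the Monasca metric format of `name`, `dimensions`, `timestamp` (milliseconds), and `value`.

```python
import json
import time
from typing import Callable, Dict, List

# A collector plugin is modeled as a function returning {metric_name: value}.
# This is a hypothetical stand-in for the agent's real plugin classes.
Plugin = Callable[[], Dict[str, float]]


def collect(plugins: List[Plugin], dimensions: Dict[str, str]) -> List[dict]:
    """Collector: run each plugin once and wrap every value as a metric."""
    timestamp_ms = int(time.time() * 1000)  # assumed: timestamps in milliseconds
    metrics = []
    for plugin in plugins:
        for name, value in plugin().items():
            metrics.append({
                "name": name,
                "dimensions": dict(dimensions),
                "timestamp": timestamp_ms,
                "value": value,
            })
    return metrics


def forward_body(metrics: List[dict]) -> str:
    """Forwarder: serialize one batch of metrics as a JSON request body."""
    return json.dumps(metrics)


def fake_cpu_plugin() -> Dict[str, float]:
    # Hypothetical plugin returning a fixed value for illustration.
    return {"cpu.idle_perc": 93.5}


if __name__ == "__main__":
    batch = collect([fake_cpu_plugin], {"hostname": "compute-01"})
    print(forward_body(batch))
```

Keeping the two roles in separate functions mirrors the slide's separation of Collector and Forwarder: what is measured and how it is shipped can change independently.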
Monasca UI
Monasca UI - Overview
Monasca UI - Alarms
Dashboard for metrics
Log management
Collects logs from the whole environment
Analyzes logs across all services with a powerful GUI
Raises alarms on error/critical log messages
Authenticates with Keystone
Supports multi-tenancy
- Provides the infrastructure logs to tenant users
See: https://www.openstack.org/videos/boston-2017/show-me-my-packet-log-neutron-packet-logging-with-monasca
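Alarming on error/critical log messages starts with turning log levels into something countable. The sketch below assumes the oslo.log default line prefix (date, time, PID, level, logger name) and counts ERROR and CRITICAL lines per component; it illustrates the idea only and is not Monasca's log pipeline.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed oslo.log default prefix: "<date> <time> <pid> <LEVEL> <logger> ..."
PREFIX = re.compile(r"^\S+ \S+ \d+ (?P<level>[A-Z]+) (?P<logger>\S+)")
ALARM_LEVELS = {"ERROR", "CRITICAL"}


def error_counts(lines: Iterable[str]) -> Counter:
    """Count ERROR/CRITICAL messages per component (first part of the logger name)."""
    counts: Counter = Counter()
    for line in lines:
        match = PREFIX.match(line)
        if match and match.group("level") in ALARM_LEVELS:
            component = match.group("logger").split(".")[0]
            counts[component] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2017-10-28 17:36:05.525 6248 ERROR nova.compute.manager [req-1] boom",
        "2017-10-28 17:36:06.001 6248 INFO nova.osapi_compute.wsgi.server [req-2] ok",
        "2017-10-28 17:36:07.310 7001 CRITICAL neutron.agent [req-3] down",
    ]
    print(error_counts(sample))
```

A count like this could be emitted as a metric and an alarm defined on it, which keeps log alarms inside the same alarm and notification machinery as the other metrics.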
Why is log management necessary?
In micro-services systems, log files are distributed across many nodes
OpenStack components are also provided as micro-services
Message monitoring alone is insufficient for micro-services systems
It can detect the machine on which an error occurs, but…
It is difficult to trace the logs through the components
Centralized log management is needed
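Once logs are centralized, tracing a request becomes a grouping problem. The sketch below merges lines from several components and groups them by request ID, assuming the `[req-...` pattern seen in oslo.log lines and assuming the same ID actually appears in each component's log (in practice each service may log its own ID). Names and sample data are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Request IDs in oslo.log lines look like "[req-<id> ...". Pattern assumed.
REQ_ID = re.compile(r"\[(req-[0-9a-f-]+)")


def trace_by_request(logs: Dict[str, Iterable[str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Merge logs from several components and group the lines by request ID."""
    traces: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for component, lines in logs.items():
        for line in lines:
            match = REQ_ID.search(line)
            if match:
                traces[match.group(1)].append((component, line))
    for entries in traces.values():
        # The first 23 characters are the timestamp, e.g. "2017-10-28 17:36:05.525".
        entries.sort(key=lambda entry: entry[1][:23])
    return dict(traces)


if __name__ == "__main__":
    logs = {
        "nova": ["2017-10-28 17:36:05.525 6248 INFO nova.api [req-abc1 u p] GET"],
        "neutron": ["2017-10-28 17:36:05.400 77 INFO neutron.api [req-abc1 u p] GET"],
    }
    for req, entries in trace_by_request(logs).items():
        print(req, [component for component, _ in entries])
```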
Dashboard for logs
Introducing Monasca
Before introducing Monasca
We monitored Fujitsu Cloud with Zabbix, but...
- Could NOT detect slow responses of the REST API
- Checking the logs of each node was time-consuming
(Figure: Zabbix monitors Fujitsu Cloud and notifies the operator, who investigates)
Introduced Monasca
- To monitor API response times
- To introduce log management
(Figure: Monasca monitors Fujitsu Cloud and collects its logs, then notifies the operator, who investigates using the logs)
Monitor API response time
Transform the REST API response time recorded in the logs into a metric
OpenStack components output the API response time through oslo.log:
2017-10-28 17:36:05.525 6248 INFO nova.osapi_compute.wsgi.server [req-id id id - default default] nova1 "GET /v2.1/id/servers/detail?all_tenants=True&changes-since=date HTTP/1.1" status: 200 len: 413 time: 0.9194989
The log agent picks up the value of “time” and sends it as a metric to the Monasca API via the Statsd of the metrics agent
(Figure: nova and neutron logs are read by the Log Agent, whose Statsd client sends the response time to Statsd in the Metrics Agent (Forwarder, Statsd, Supervisord), which forwards it to the Metrics API)
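The log-to-metric step can be sketched in a few lines. This is an illustration under assumptions, not the configuration the authors used: the metric name `nova.api.response_time` is hypothetical, port 8125 is simply the conventional StatsD port, and the value is converted to milliseconds because the StatsD timing type (`|ms`) expects that unit.

```python
import re
import socket
from typing import Optional

# Picks the "time:" field (seconds) from an oslo.log WSGI access line such as
# '... "GET /v2.1/... HTTP/1.1" status: 200 len: 413 time: 0.9194989'
ACCESS_TIME = re.compile(r'HTTP/[\d.]+" status: \d+ len: \d+ time: (?P<time>\d+(?:\.\d+)?)')


def to_statsd_timing(line: str, metric: str) -> Optional[str]:
    """Build a StatsD timing datagram ("<name>:<value>|ms") from one log line."""
    match = ACCESS_TIME.search(line)
    if match is None:
        return None  # not an access-log line
    millis = float(match.group("time")) * 1000.0
    return f"{metric}:{millis:.3f}|ms"


def send_udp(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram to a StatsD daemon; UDP is fire-and-forget."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


if __name__ == "__main__":
    line = ('2017-10-28 17:36:05.525 6248 INFO nova.osapi_compute.wsgi.server '
            '[req-id id id - default default] nova1 "GET /v2.1/id/servers/detail'
            '?all_tenants=True&changes-since=date HTTP/1.1" status: 200 len: 413 '
            'time: 0.9194989')
    print(to_statsd_timing(line, "nova.api.response_time"))  # nova.api.response_time:919.499|ms
```

Using UDP keeps the log agent from blocking on the metrics path: if Statsd is down, a datagram is lost rather than the log pipeline stalling.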
Dashboard for monitoring API response
Notification with Monasca
Supports several notification types
- E-mail, webhook, and integration with external services
- Jira, Slack, PagerDuty, and HipChat are supported
Slack notifications can also be used with Mattermost, which we use
We set a notification for when the response time is over 5 seconds
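The notification setup amounts to two objects: a notification method and an alarm definition whose expression crosses the 5-second threshold. The sketch below only builds request bodies and does not call the API. It assumes the Monasca API field names (`name`, `type`, `address`; `expression`, `match_by`, `severity`, `alarm_actions`) and a `SLACK` notification type; the metric name, webhook URL, and `match_by` choice are hypothetical.

```python
import json
from typing import Dict, List, Union


def slack_notification(name: str, webhook_url: str) -> Dict[str, str]:
    """Request body for a Slack-type notification method (also usable with Mattermost)."""
    return {"name": name, "type": "SLACK", "address": webhook_url}


def response_time_alarm(metric: str, threshold: float,
                        action_ids: List[str]) -> Dict[str, Union[str, List[str]]]:
    """Request body for an alarm definition: fire when avg(metric) exceeds threshold.

    The threshold must use the metric's own unit (5 for seconds, 5000 for ms).
    """
    return {
        "name": f"{metric} above {threshold:g}",
        "expression": f"avg({metric}) > {threshold:g}",
        "match_by": ["hostname"],
        "severity": "HIGH",
        "alarm_actions": action_ids,
    }


if __name__ == "__main__":
    notify = slack_notification("mattermost", "https://mattermost.example.com/hooks/xxx")
    alarm = response_time_alarm("api.response_time", 5, ["<notification-method-id>"])
    print(json.dumps(notify))
    print(json.dumps(alarm, indent=2))
```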
Tons of alarms during trouble
We became able to monitor API response times, but…
OpenStack APIs depend on each other across components
When only one component was in trouble, the other components also reported slowed-down API responses
A huge number of alarms were notified for a trouble in only one component!
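Why one slow component triggers alarms everywhere can be shown with a call graph: every service that calls the slow one, directly or transitively, also answers slowly. The graph below is hypothetical and much simpler than real OpenStack dependencies.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

# Hypothetical call graph: service -> services whose APIs it calls.
CALLS: Dict[str, List[str]] = {
    "nova": ["keystone", "glance", "neutron"],
    "cinder": ["keystone"],
    "glance": ["keystone"],
    "neutron": ["keystone"],
    "keystone": [],
}


def alarming_services(slow: str, calls: Dict[str, List[str]]) -> Set[str]:
    """Services whose API response slows down when `slow` is slow (transitive callers)."""
    callers: Dict[str, List[str]] = defaultdict(list)
    for service, callees in calls.items():
        for callee in callees:
            callers[callee].append(service)
    seen = {slow}
    queue = deque([slow])
    while queue:  # breadth-first walk over the inverted graph
        current = queue.popleft()
        for caller in callers[current]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


if __name__ == "__main__":
    print(sorted(alarming_services("keystone", CALLS)))
    # ['cinder', 'glance', 'keystone', 'neutron', 'nova']
```

With a shared dependency such as Keystone at the bottom, a single fault fans out to every service above it, while only one of those alarms points at the actual problem.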
Wasteful alarms bother me
Nearly 400 alarms per trouble, but
NO need to care about the services that are not the root cause
Checking all the alarms is quite a bothersome task...
(Figure: Monasca monitors Fujitsu Cloud and collects its logs; the operator is notified and investigates using the logs)
Our challenge with Monasca-analytics
Motivation to solve the problem
Focus only on what matters during trouble
Reduce the tons of alarms so that the operator can...
• Reduce the investigation time
(Figure: reducing tons of alarms so that Monasca notifies only the root cause alarms)
Alarm mgmt. with Monasca-analytics
Succeeded in reducing tons of alarms dramatically
Monasca-analytics is used to...
• Check whether an alarm is necessary or not
• Get the root cause alarms
(Figure: Monasca-analytics detects the root cause alarms among the alarms related to API response time from Cinder, Keystone, Nova…)
What is Monasca-analytics
One of Monasca’s components
Analyzes data and predicts anomalies/root causes with machine learning
Defines a machine learning solution as a “Recipe”
OpenStack project/repo: https://github.com/openstack/monasca-analytics
Monasca Architecture
Monasca-analytics Architecture: Framework
Monasca-analytics has a framework that can automatically execute a machine learning process as a data flow sequence
The data flow sequence is composed of the following modules; new modules can be added and existing ones customized:
Data Reception → Data Preprocessing → Data Aggregation → Machine Learning Algorithm → Data Prediction
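The idea of a data flow sequence can be sketched as a chain of functions where each module consumes the previous module's output. The five stage functions below are hypothetical stand-ins named after the slide's modules, and the "algorithm" is a toy (flag services with above-average alarm counts), not a Monasca-analytics learner.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

Stage = Callable[[Any], Any]


def run_flow(stages: List[Stage], data: Any) -> Any:
    """Execute the modules in order, feeding each one the previous output."""
    for stage in stages:
        data = stage(data)
    return data


# Hypothetical stand-ins for the five modules; the "algorithm" is a toy.
def data_reception(raw: List[str]) -> List[Dict[str, str]]:
    """Parse "service,metric" records."""
    return [dict(zip(("service", "metric"), record.split(","))) for record in raw]


def data_preprocessing(alarms: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only alarms on API response time."""
    return [a for a in alarms if a["metric"] == "api.response_time"]


def data_aggregation(alarms: List[Dict[str, str]]) -> Counter:
    """Count alarms per service."""
    return Counter(a["service"] for a in alarms)


def ml_algorithm(counts: Counter) -> Tuple[Counter, float]:
    """Toy 'model': the mean alarm count per service."""
    mean = sum(counts.values()) / len(counts) if counts else 0.0
    return counts, mean


def data_prediction(model: Tuple[Counter, float]) -> List[str]:
    """Flag services whose alarm count is above the mean."""
    counts, mean = model
    return sorted(service for service, n in counts.items() if n > mean)


FLOW = [data_reception, data_preprocessing, data_aggregation, ml_algorithm, data_prediction]
```

Because every module has the same shape (one input, one output), adding or customizing a module only means swapping an element of the list.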
Monasca-analytics Architecture: Data flow sequence
Training phase: uses past data
Prediction phase: uses streaming data
Monasca-analytics Architecture: Recipe
A configuration file that defines the data flow sequence
Recipes are stored as templates
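A recipe can be thought of as data that names the modules and their parameters, which a small builder turns into a runnable flow. The JSON layout, module names, and registry below are hypothetical and do not reproduce the actual Monasca-analytics configuration format.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical recipe format: an ordered list of modules with parameters.
RECIPE = """
{
  "name": "api-response-time-services",
  "flow": [
    {"module": "reception"},
    {"module": "preprocessing", "params": {"metric": "api.response_time"}},
    {"module": "aggregation"}
  ]
}
"""

Stage = Callable[[Any], Any]


def make_reception() -> Stage:
    return lambda raw: [dict(zip(("service", "metric"), r.split(","))) for r in raw]


def make_preprocessing(metric: str) -> Stage:
    return lambda alarms: [a for a in alarms if a["metric"] == metric]


def make_aggregation() -> Stage:
    return lambda alarms: sorted({a["service"] for a in alarms})


# Template registry: module name in a recipe -> factory building that stage.
REGISTRY: Dict[str, Callable[..., Stage]] = {
    "reception": make_reception,
    "preprocessing": make_preprocessing,
    "aggregation": make_aggregation,
}


def build_flow(recipe_text: str) -> List[Stage]:
    """Turn a recipe into an ordered list of configured stages."""
    recipe = json.loads(recipe_text)
    stages = []
    for step in recipe["flow"]:
        factory = REGISTRY.get(step["module"])
        if factory is None:
            raise ValueError(f"unknown module: {step['module']}")
        stages.append(factory(**step.get("params", {})))
    return stages


def run(stages: List[Stage], data: Any) -> Any:
    """Feed data through the stages in recipe order."""
    for stage in stages:
        data = stage(data)
    return data
```

Keeping the recipe as plain data is what makes it shareable as a template: another installation can reuse or tweak the JSON without touching code.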
How to solve the problem
Find the causality of each alarm and notify only the causal alarms
- Learn the causality of each alarm from tons of alarms
- Predict causal and affected alarms
(Figure: causal analysis of the alarms related to API response time from Cinder, Keystone, Nova…)
Training Phase
(Figure: Message Queue (Kafka))
1. Consume from Kafka the list of alarms related only to API response time
2. Aggregate the alarms and transform them into training data for the machine learning algorithm
3. Learn the causality of each alarm
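Steps 2 and 3 can be sketched with a deliberately simple technique: group alarms into incidents by time, then count how often one service alarms first while another alarms later in the same incident. This temporal-precedence heuristic is a swapped-in illustration, not the causal learning algorithm Monasca-analytics uses; Kafka consumption (step 1) is omitted, and the alarms are assumed to be already filtered to API response time.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple

Alarm = Tuple[float, str]  # (timestamp in seconds, service whose API alarm fired)


def split_incidents(alarms: Iterable[Alarm], gap_s: float = 60.0) -> List[List[Alarm]]:
    """Aggregation: sort alarms by time; a quiet gap longer than gap_s starts a new incident."""
    incidents: List[List[Alarm]] = []
    last_ts: Optional[float] = None
    for ts, service in sorted(alarms):
        if last_ts is None or ts - last_ts > gap_s:
            incidents.append([])
        incidents[-1].append((ts, service))
        last_ts = ts
    return incidents


def learn_causes(incidents: List[List[Alarm]], min_support: int = 2) -> Dict[str, Set[str]]:
    """Learning: if A alarms first and B later in the same incident, count A -> B.

    Pairs seen in at least min_support incidents become the model:
    affected service -> set of services believed to cause it.
    """
    pairs: Counter = Counter()
    for incident in incidents:
        first = incident[0][1]
        later = {service for _, service in incident[1:]} - {first}
        for service in later:
            pairs[(first, service)] += 1
    model: Dict[str, Set[str]] = {}
    for (cause, effect), count in pairs.items():
        if count >= min_support:
            model.setdefault(effect, set()).add(cause)
    return model


if __name__ == "__main__":
    history = [(0, "keystone"), (2, "nova"), (3, "cinder"),
               (500, "keystone"), (501, "nova"),
               (1000, "glance"), (1001, "nova")]
    print(learn_causes(split_incidents(history)))  # {'nova': {'keystone'}}
```

The `min_support` threshold keeps a single coincidence (here, glance before nova once) from becoming a learned cause.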
Prediction Phase
(Figure: Message Queue (Kafka))
1. Consume from Kafka the list of alarms related only to API response time
2. Predict causal and affected alarms
3. Push the list of alarms to Kafka and notify Mattermost
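Step 2 can be sketched as a lookup against a trained model: a service is "affected" if one of its learned causes is alarming in the same incident, and whatever remains is treated as a root cause to notify. The `MODEL` dictionary is a hypothetical training result, and publishing to Kafka and Mattermost (step 3) is left out.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical model from a training phase: affected service -> learned causes.
MODEL: Dict[str, Set[str]] = {
    "nova": {"keystone"},
    "cinder": {"keystone"},
    "glance": {"keystone"},
}


def classify(incident: Iterable[str],
             model: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Split the services alarming in one incident into causes and affected ones."""
    alarming = set(incident)
    affected = {s for s in alarming if model.get(s, set()) & (alarming - {s})}
    causes = alarming - affected
    if not causes:
        # Cyclic model: every service explains another. Fall back to notifying all.
        return sorted(alarming), []
    return sorted(causes), sorted(affected)


if __name__ == "__main__":
    causes, affected = classify(["nova", "cinder", "keystone", "nova"], MODEL)
    print("notify:", causes)        # notify: ['keystone']
    print("suppressed:", affected)  # suppressed: ['cinder', 'nova']
```

The fallback matters operationally: when the model cannot single out a cause, it is safer to notify everything than to suppress every alarm.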
Result
Conclusion
                                    Before    After
Number of alarms per trouble        397       8
Time of investigation per trouble   5.5 h     0.5 h
The operator is no longer bothered by unnecessary alarms!
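The before/after figures of the conclusion slide translate into relative reductions as follows: (before − after) / before.

```python
def reduction_percent(before: float, after: float) -> float:
    """Relative reduction: (before - after) / before, as a percentage."""
    return (before - after) / before * 100.0


# Figures from the conclusion slide: alarms per trouble and investigation hours per trouble.
print(f"alarms: {reduction_percent(397, 8):.1f}% fewer")                  # alarms: 98.0% fewer
print(f"investigation time: {reduction_percent(5.5, 0.5):.1f}% shorter")  # investigation time: 90.9% shorter
```

In other words, roughly 50 times fewer alarms (397 / 8) and an investigation about 11 times shorter (5.5 h / 0.5 h).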
Any Questions?
Contacts
We would gladly explain more details
Feel free to contact us if you have any questions!
Hisashi Osanai
Koji Nakazono
Daisuke Fujita