Event-driven solution to monitor data centers through continuous queries and machine learning

DEBS 2010 – 4th ACM International Conference on Distributed Event-Based Systems, Cambridge, United Kingdom

HOLMES: An event-driven solution to monitor data centers through continuous queries and machine learning

Pedro Henriques dos Santos Teixeira, Ricardo Gomes Clemente, Ronald Andreu Kaiser, Denis Almeida Vieira Jr

Upload: denis-vieira

Post on 27-Jun-2015

DESCRIPTION

Our presentation at DEBS'10, held in Cambridge, UK, in July 2010. It describes our solution for monitoring data centers through CEP and machine learning.

TRANSCRIPT

Page 1: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

DEBS 2010 – 4th ACM International Conference on Distributed Event-Based Systems, Cambridge, United Kingdom

HOLMES: An event-driven solution to monitor data centers through continuous queries and machine learning

Pedro Henriques dos Santos Teixeira, Ricardo Gomes Clemente, Ronald Andreu Kaiser, Denis Almeida Vieira Jr

Page 2: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Topics

• Motivation
• Use Case
• The Solution
  • Overview
  • System architecture
  • CEP
  • Machine learning
  • CEP & Machine learning integration
  • Visualization and User Interface
• Conclusion

Page 3: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Motivation

Page 4: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Motivation

• Ever-growing, dynamic environment
• Need to understand our environment
• Too many dependencies
• Can't afford downtime

Page 5: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Motivation

• Monitoring can be tricky
• Anticipate the inevitable and try to avoid chaos
• 1.2K servers
• 14K+ monitored items
• Correlation

Page 6: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Use Case

Page 7: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Use Case

• Big Brother Brazil
• New world record
• 151 million votes in 2 days
• Peaks of 13,500 votes per minute (~220 votes/s)
• DDoS attack detected

Page 8: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Overview

Page 9: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Page 10: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

The System Architecture

Page 11: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

HOLMES

Page 12: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

System architecture – modules and their purposes

• CEP module: known problems
• Machine learning module: unknown problems
• Visualization module: situational awareness
• Storage: events history/log

Page 13: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

CEP

Page 14: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

CEP

• Reaction to incidents in real-time is a requirement for data center monitoring

• Expression of abstract rules related to the business is desirable

• Correlation of events through user-defined queries

Page 15: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

CEP - Esper

• Open source CEP implementation

• Supports an Event Processing Language (EPL)

• High throughput, a requirement in our context

• Easy to embed in our application

Page 16: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

CEP – simple example

SELECT avg(response_time) FROM HTTP.win:time(5 min)

[Figure: an events stream E1 ... E5 passing through a 5 min window; each event Ei carries a response time (values shown: 4, 3, 2, 3 and 5 t.u.)]
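To make the semantics of the query above concrete, here is a minimal sketch in plain Python (not Esper) of an average over a sliding 5-minute time window. The class name `SlidingTimeAverage`, the timestamps, and the response times are illustrative assumptions, and the sketch recomputes the average only when an event arrives, which is a simplification of how a CEP engine behaves.

```python
from collections import deque


class SlidingTimeAverage:
    """Average of the values seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Insert one event and return the average over the current window."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that are window_seconds old or older.
        cutoff = timestamp - self.window_seconds
        while self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


# Hypothetical HTTP events: (timestamp in seconds, response time).
window = SlidingTimeAverage(window_seconds=300)  # 5 min
for ts, response_time in [(0, 5.0), (60, 3.0), (120, 2.0), (180, 3.0), (240, 4.0)]:
    average = window.add(ts, response_time)
```

The deque keeps events in arrival order, so expiring old events only ever touches the left end and each event is added and removed at most once.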

Page 17: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

If the number of sessions increases by 10% in a 3-minute window, the average CPU usage of the web farm does not increase by 5%, and the number of slow queries in the database is higher than 10, then we have reached a database contention situation. Raise an alarm!
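The rule above can be sketched as a plain predicate. This is only an illustration in Python, not the EPL query used in HOLMES: the function names are hypothetical, and it assumes that "increase" means the relative change between the values at the start and at the end of the 3-minute window.

```python
def percent_change(previous, current):
    """Relative change from `previous` to `current`, in percent."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous * 100.0


def database_contention(sessions_before, sessions_now,
                        cpu_before, cpu_now, slow_queries):
    """Evaluate the rule over the start and end values of one 3-minute window.

    Assumed thresholds, taken from the slide: sessions up by at least 10%,
    average web farm CPU usage up by less than 5%, more than 10 slow queries.
    """
    sessions_up = percent_change(sessions_before, sessions_now) >= 10.0
    cpu_flat = percent_change(cpu_before, cpu_now) < 5.0
    return sessions_up and cpu_flat and slow_queries > 10


# Hypothetical readings: sessions +15%, CPU +2.5%, 12 slow queries.
alarm = database_contention(1000, 1150, 40.0, 41.0, 12)
```

The intuition behind the rule is that more sessions without a matching rise in web farm CPU suggests the requests are waiting on something else, and the slow query count points at the database.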

Page 18: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Machine learning

“any signal, which is totally predictable, carries no information” - Shannon

Page 19: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Machine learning characteristics

• FRAHST learns to detect anomalous behaviors

• Unsupervised streaming algorithm

• Linear complexity in the number of data streams
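The sketch below is NOT FRAHST. It is a deliberately simple stand-in (a per-stream running z-score using Welford's update) that only illustrates the three properties listed above: it learns from the data without labels, it processes one observation at a time, and its cost per time step is linear in the number of streams. Unlike FRAHST, it treats each stream independently and does not capture correlations between streams. The class name, threshold, and warm-up length are assumptions.

```python
import math


class StreamingZScoreDetector:
    """Stand-in detector (not FRAHST): flags values far from a stream's running mean."""

    def __init__(self, n_streams, threshold=3.0, warmup=10):
        self.n = 0
        self.mean = [0.0] * n_streams
        self.m2 = [0.0] * n_streams  # running sum of squared deviations
        self.threshold = threshold
        self.warmup = warmup

    def update(self, values):
        """Consume one observation per stream; return indices of anomalous streams."""
        anomalous = []
        if self.n >= self.warmup:
            for i, x in enumerate(values):
                std = math.sqrt(self.m2[i] / (self.n - 1))
                if std > 0 and abs(x - self.mean[i]) / std > self.threshold:
                    anomalous.append(i)
        # Welford's update: O(1) per stream, so O(n_streams) per time step.
        self.n += 1
        for i, x in enumerate(values):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])
        return anomalous


# Two hypothetical streams with small regular oscillations, then a jump in stream 1.
detector = StreamingZScoreDetector(n_streams=2)
for step in range(20):
    detector.update([10.0 + step % 2, 50.0 - step % 2])
flagged = detector.update([10.5, 80.0])
```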

Page 20: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

FRAHST, state-of-the-art

For further information, see reference [12] in our paper.

Page 21: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Anomaly detection

Page 22: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

CEP & Machine Learning Integration

• Users choose the data streams to be correlated

• CEP module aggregates events

• Notifications are raised whenever a rank variation is detected
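The flow described by these three bullets can be sketched as a small pipeline: filter the user-selected streams, aggregate their events into one vector per time interval, and hand each vector to a detector that decides whether to notify. This is a hedged illustration only; `aggregate`, `monitor`, the event tuple layout, and the toy `detect` callback are assumptions, and in HOLMES the aggregation is done by the CEP module and the detection by FRAHST.

```python
from collections import defaultdict


def aggregate(events, streams, interval):
    """Group (timestamp, stream, value) events into one mean vector per interval."""
    buckets = defaultdict(lambda: defaultdict(list))
    for timestamp, stream, value in events:
        if stream in streams:  # only the streams the user selected
            buckets[int(timestamp // interval)][stream].append(value)
    vectors = []
    for bucket in sorted(buckets):
        per_stream = buckets[bucket]
        # A stream with no event in this interval is reported as None.
        vector = [
            sum(per_stream[s]) / len(per_stream[s]) if per_stream[s] else None
            for s in streams
        ]
        vectors.append((bucket * interval, vector))
    return vectors


def monitor(events, streams, interval, detect, notify):
    """Feed each aggregated vector to `detect`; call `notify` when it fires."""
    for start, vector in aggregate(events, streams, interval):
        if detect(vector):
            notify(start, vector)


# Hypothetical events; "disk" is not selected and is therefore ignored.
events = [(0, "cpu", 10), (30, "cpu", 20), (10, "sessions", 100),
          (70, "cpu", 90), (80, "sessions", 100), (5, "disk", 1)]
notifications = []
monitor(events, ["cpu", "sessions"], 60,
        detect=lambda v: v[0] is not None and v[0] > 80,  # toy detector
        notify=lambda start, v: notifications.append((start, v)))
```

Passing the detector as a callback keeps the aggregation step independent of the learning algorithm, so the toy threshold could be swapped for any streaming detector that accepts one vector per interval.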

Page 23: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Visualization and User Interface

Page 24: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Visualization and User Interface

• Users can create Perspectives

• Real-time dashboard customization

• Events history visualization

Page 25: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Dashboards

Page 26: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Page 27: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

Conclusion

• Successful implementation and acceptance in a real use case

• New challenges
  • Improving situational awareness & prediction
  • Making the creation of queries more intuitive

Page 28: Event Driven Solution to monitor Datacenters through continuous queries and machine learning

This presentation:

http://www.slideshare.net/intelie/debs2010

Our Nagios Plugin source code:

http://github.com/intelie/neb2activemq

Intelligent Monitoring with Esper:

http://esper.codehaus.org/tutorials/tutorial/presentations.html

Denis Vieira Jr. - [email protected]

Ronald Kaiser - [email protected]