
An Integrated Framework on Mining Logs Files for Computing System Management

Tao Li, School of Computer Science, Florida International University, Miami, FL

Wei Peng, School of Computer Science, Florida International University, Miami, FL

Feng Liang, Institute of Statistics and Decision Sciences, Duke University, Durham, NC

Sheng Ma, Machine Learning for Systems, IBM T.J. Watson Research Center, Hawthorne, NY


Agenda

• Introduction
• System log categorization: text mining techniques to categorize text messages into a set of common categories
• Incorporating the temporal information: two approaches to incorporating temporal information to improve categorization performance
• Mining event relationships: discovering the relationships between different events
• Experiments
• Conclusion and future work


Introduction

Traditional approaches to troubleshooting rely on the knowledge and experience of domain experts.

Modern computing systems are instrumented to generate huge amounts of system log data.

The data in log files describe:
• the status of each component
• system operational changes, such as the starting and stopping of services
• detection of network applications
• software configuration modifications
• software execution errors

Complications arise from:
• different devices (e.g., routers, processors, adapters)
• different software components (e.g., OS, middleware, user applications)
• different providers (e.g., Cisco, IBM, Microsoft)

As a result, the same situation is reported with different descriptions.


Introduction (cont.)

This heterogeneity makes automated analysis difficult. Method:

• Categorize the text messages, which come in disparate formats, into common situations.
• Use timestamps: the temporal characteristics provide additional context information for each message and can be used to facilitate data analysis.


An overview of the integrated framework


System log categorization

• Common categories: based on the Common Base Event (CBE) format established by an IBM initiative. The set of categories is: start, stop, dependency, create, connection, report, request, configuration, and other.

• Message categorization: naive Bayes is used as the classification approach for learning in text categorization.
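The slides do not say which naive Bayes variant or features were used. As a minimal sketch, assuming a multinomial naive Bayes model over lower-cased word tokens with add-one (Laplace) smoothing, message categorization could look like this. The `tokenize` helper, the `NaiveBayesLogClassifier` class, and the toy training messages are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(message):
    """Lower-case a log message and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", message.lower())


class NaiveBayesLogClassifier:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)      # category -> number of messages
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.vocab = set()
        for message, label in zip(messages, labels):
            tokens = tokenize(message)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_messages = len(labels)
        return self

    def log_scores(self, message):
        """log P(c) + sum over words of log P(w | c), for each category c."""
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            total = sum(self.word_counts[c].values())
            score = math.log(n_c / self.n_messages)
            for w in tokenize(message):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (total + v))
            scores[c] = score
        return scores

    def predict(self, message):
        scores = self.log_scores(message)
        return max(scores, key=scores.get)


# Toy training data (invented for illustration), labelled with common categories.
train = [
    ("Service Apache started successfully", "start"),
    ("The DHCP service was started", "start"),
    ("Service Apache stopped", "stop"),
    ("The print spooler service was stopped", "stop"),
    ("Connection to server established", "connection"),
    ("Connection refused by remote host", "connection"),
]
clf = NaiveBayesLogClassifier().fit([m for m, _ in train], [c for _, c in train])
print(clf.predict("The Apache service was stopped"))  # stop
```

Working in log space avoids numeric underflow when messages are long. Smoothing keeps a single unseen word/category pair from zeroing out a category.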


Incorporating the temporal information

Two approaches:
• naive Bayes algorithm
• hidden Markov model
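The slides give no model details for the hidden Markov approach. The following is a minimal sketch of one way it could work. It assumes that the hidden states are the common categories and that the observations are the messages in timestamp order. It also assumes that each message's emission log-likelihood comes from a per-message classifier such as naive Bayes. Viterbi decoding then chooses the jointly most likely category sequence, so that confident neighbouring messages can override an ambiguous one. The `viterbi` function and all probabilities below are hypothetical.

```python
import math


def viterbi(states, init_logp, trans_logp, emit_logp):
    """Return the most likely sequence of hidden categories.

    states:     list of category names
    init_logp:  dict state -> log P(category of the first message)
    trans_logp: dict (prev, cur) -> log P(cur | prev) between consecutive messages
    emit_logp:  list with one dict per message: state -> log P(message | state)
    """
    # best[t][s]: highest log-probability of any category path ending in s at message t
    best = [{s: init_logp[s] + emit_logp[0][s] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(emit_logp)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + trans_logp[(p, s)])
            best[t][s] = best[t - 1][prev] + trans_logp[(prev, s)] + emit_logp[t][s]
            back[t][s] = prev
    # Trace the best path backwards from the most likely final category.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(emit_logp) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


states = ["start", "stop"]
log = math.log
init = {s: log(0.5) for s in states}
# "Sticky" transitions (assumed): consecutive messages tend to share a category.
trans = {(p, s): log(0.9 if p == s else 0.1) for p in states for s in states}
# Per-message likelihoods (invented); the middle message alone would be labelled "stop".
emit = [
    {"start": log(0.8), "stop": log(0.2)},
    {"start": log(0.45), "stop": log(0.55)},
    {"start": log(0.8), "stop": log(0.2)},
]
print(viterbi(states, init, trans, emit))  # ['start', 'start', 'start']
```

Labelling each message independently would yield start, stop, start here. The transition probabilities are what let temporal context correct the middle label.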


Mining event relationships – Introduction

After the log file is transformed into common categories, the next step is to discover interesting patterns embedded in the data.

The goal is to mine temporal patterns using the log timestamps.

Temporal patterns of interest appear in system management applications. For example, a sequence of events may propagate from its origin in a low layer up to higher software layers through the dependency tree.

Knowing these temporal patterns can help pinpoint the root cause and take proper action.


Mining event relationships – Notations and problem formulation

Temporal patterns: temporal patterns assert a dependency between events and specify the timing information. Usually, they can be described as "event a happens, say, about 5 minutes after event b".

We refer to this type of pattern as a t-pattern.


Mining event relationships – Discovering t-patterns

• Let T_a and T_b be two point processes for events a and b, respectively.
• The distribution can be interpreted as the probability of having an event of type b within time r.
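The formula for this distribution did not survive in the transcript. As a hedged sketch, the quantity it describes can be estimated empirically. For each occurrence of a, find the first b at or after it, then report the fraction of a events that are followed by a b within r. For comparison, the sketch also computes 1 − exp(−λ_b·r). That is the same probability if b were a Poisson process with rate λ_b independent of a. This baseline is an assumption, a common reference point, and not necessarily the test used in the original work. The function names and timestamps are hypothetical.

```python
import bisect
import math


def prob_b_within(times_a, times_b, r):
    """Fraction of a events followed by a b event at most r time units later."""
    if not times_a:
        return 0.0
    tb = sorted(times_b)
    hits = 0
    for ta in times_a:
        i = bisect.bisect_left(tb, ta)  # index of the first b at or after ta
        if i < len(tb) and tb[i] - ta <= r:
            hits += 1
    return hits / len(times_a)


def poisson_baseline(times_b, duration, r):
    """P(at least one b within r) if b were a Poisson process independent of a."""
    rate_b = len(times_b) / duration
    return 1.0 - math.exp(-rate_b * r)


# Timestamps in seconds (invented): b tends to follow a by about 5 minutes.
a = [0, 600, 1200, 1800]
b = [290, 905, 1500, 2400]
print(prob_b_within(a, b, 360))        # 0.75
print(poisson_baseline(b, 2400, 360))  # about 0.45
```

If the empirical probability for a given r clearly exceeds the independence baseline, that suggests a candidate t-pattern such as "b about 5 minutes after a". A real test would also need to account for sample size.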


Experiments

Log data generation
• Log files were collected from different machines running different operating systems in the School of Computer Science at Florida International University.
• The logs were collected with Logdump2td, an NT data collection tool developed by the Event Mining team at the IBM research center.

Message Categorization


Discover and Visualize Event Relationships


Conclusion and future work

• Automatically infer the set of common categories from historical data.
• The number of common categories can be significantly large.