An Integrated Framework on Mining Logs Files for Computing System Management
Tao Li
School of Computer Science, Florida International University, Miami, FL

Wei Peng
School of Computer Science, Florida International University, Miami, FL

Feng Liang
Institute of Statistics and Decision Sciences, Duke University, Durham, NC

Sheng Ma
Machine Learning for Systems, IBM T.J. Watson Research Center, Hawthorne, NY
Agenda
• Introduction
• System log categorization
  - Text mining techniques to categorize text messages into a set of common categories
• Incorporating the temporal information
  - Two approaches to incorporating temporal information to improve categorization performance
• Mining event relationships
  - Discovering the relationships between different events
• Experiments
• Conclusion and future work
Introduction
• Traditional approaches to troubleshooting rely on the knowledge and experience of domain experts.
• Modern computing systems are instrumented to generate huge amounts of system log data.
• The data in log files describe:
  - Status of each component
  - System operational changes, such as starting and stopping of services
  - Detection of network applications
  - Software configuration modifications
  - Software execution errors
• The log data are complicated:
  - Different devices (e.g., routers, processors, adapters)
  - Different software components (e.g., OS, middleware, user applications)
  - Different providers (e.g., Cisco, IBM, Microsoft)
  - Different report descriptions
Introduction (cont.)
• It is difficult to perform automated analysis on such data.
• Method:
  - Categorize the text messages with disparate formats into common situations.
  - Use the timestamps: the temporal characteristics provide additional context information about the messages and can be used to facilitate data analysis.
An overview of the integrated framework
System log categorization
• Common categories
  - Based on the CBE (Common Base Event) format established by an IBM initiative.
  - The set of categories: start, stop, dependency, create, connection, report, request, configuration, and other.
Message categorization
• Use naive Bayes as the classification approach for learning in text categorization.
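As a minimal sketch of the idea, assuming a multinomial naive Bayes model with add-one smoothing over whitespace-separated words (the slides do not specify the variant used), a message classifier could look like the following. The function names `tokenize`, `train_naive_bayes`, and `classify` and the training messages are hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(message):
    """Lowercase a log message and split it on whitespace."""
    return message.lower().split()


def train_naive_bayes(labeled_messages):
    """Collect the counts needed by a multinomial naive Bayes classifier."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for message, category in labeled_messages:
        class_counts[category] += 1
        for word in tokenize(message):
            word_counts[category][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(message, model):
    """Return the category with the highest posterior log-probability."""
    class_counts, word_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    best_category, best_score = None, -math.inf
    for category, count in class_counts.items():
        score = math.log(count / total_messages)  # log prior
        total_words = sum(word_counts[category].values())
        for word in tokenize(message):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            numerator = word_counts[category][word] + 1
            score += math.log(numerator / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical training messages, invented for illustration only.
training = [
    ("service started successfully", "start"),
    ("application started on port 80", "start"),
    ("service stopped by administrator", "stop"),
    ("application stopped unexpectedly", "stop"),
    ("connection established to database", "connection"),
    ("connection lost to remote host", "connection"),
]
model = train_naive_bayes(training)
predicted = classify("the print service stopped", model)
```

With this toy training set, `predicted` is the category `"stop"`. A real deployment would train on labeled messages covering all of the common categories.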
Incorporating the temporal information
• Two approaches:
  - Naive Bayes algorithm
  - Hidden Markov model
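To illustrate how a hidden Markov model can use message order, the sketch below treats the category of each message as the hidden state and decodes the most likely category sequence with the standard Viterbi algorithm. This is an assumption about how such a model could be applied, not a description of the exact formulation used in this work. The function name `viterbi`, the helper `log_table`, and all probabilities are hypothetical.

```python
import math


def viterbi(emission_logp, transition_logp, initial_logp):
    """Return the most likely category sequence for an ordered list of messages.

    emission_logp[t][c]   : log P(message t | category c)
    transition_logp[p][c] : log P(category c follows category p)
    initial_logp[c]       : log P(first message has category c)
    """
    categories = list(initial_logp)
    # best[c] is the best log-score of any path ending in category c.
    best = {c: initial_logp[c] + emission_logp[0][c] for c in categories}
    back_pointers = []
    for t in range(1, len(emission_logp)):
        new_best, pointers = {}, {}
        for c in categories:
            prev = max(categories, key=lambda p: best[p] + transition_logp[p][c])
            new_best[c] = best[prev] + transition_logp[prev][c] + emission_logp[t][c]
            pointers[c] = prev
        best = new_best
        back_pointers.append(pointers)
    # Trace the best path backwards from the best final category.
    last = max(categories, key=lambda c: best[c])
    path = [last]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def log_table(table):
    """Convert a table of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in table.items()}


# Hypothetical numbers: the second message is ambiguous from its text alone.
initial = log_table({"start": 0.5, "stop": 0.5})
transition = {
    "start": log_table({"start": 0.5, "stop": 0.5}),
    "stop": log_table({"start": 0.8, "stop": 0.2}),
}
emission = [
    log_table({"start": 0.1, "stop": 0.9}),
    log_table({"start": 0.45, "stop": 0.55}),
]
decoded = viterbi(emission, transition, initial)
```

Here the text of the second message slightly favors "stop", but because a "stop" is assumed to be usually followed by a "start", the decoded sequence is `["stop", "start"]`. The per-message emission scores could come from a text classifier such as naive Bayes.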
Mining event relationships – Introduction
• After the log files are transformed into common categories, discover interesting patterns embedded in the data.
• Mine temporal patterns using the log timestamps.
• Temporal patterns of interest appear naturally in system management applications.
• Example: a sequence of events propagating from its origin at a low layer to higher software layers through the dependency tree.
• Knowing the temporal patterns can help pinpoint the root cause and take proper action.
Mining event relationships – Notations and problem formulations
• Temporal patterns:
  - Temporal patterns assert a dependency between events and specify the timing information. Usually, they can be described as "event a happens about 5 minutes after event b".
  - We refer to this type of pattern as a t-pattern.
Mining event relationships – Discovering t-patterns
• Let Ta and Tb be two point processes for events a and b, respectively.
• The distribution can be interpreted as the probability of having an event of type b within time r.
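As a rough illustration of this interpretation, the sketch below estimates from two lists of timestamps the fraction of type-a events that are followed by a type-b event within time r. Measuring from each a to the next b at or after it is an assumption, not necessarily the exact definition used in the talk. The function name `prob_b_within` and the sample timestamps are hypothetical.

```python
from bisect import bisect_left


def prob_b_within(times_a, times_b, r):
    """Empirical fraction of a events followed by a b event within time r.

    An a event with no later b event counts as not followed within r.
    """
    if not times_a:
        return 0.0
    sorted_b = sorted(times_b)
    hits = 0
    for t in times_a:
        i = bisect_left(sorted_b, t)  # index of the first b at or after t
        if i < len(sorted_b) and sorted_b[i] - t <= r:
            hits += 1
    return hits / len(times_a)


# Hypothetical timestamps in minutes, invented for illustration.
times_a = [0, 20, 40, 60]
times_b = [5, 26, 44, 90]
within_10 = prob_b_within(times_a, times_b, 10)
```

For these sample timestamps the waiting times are 5, 6, 4, and 30 minutes, so `within_10` is 0.75. Comparing such a conditional estimate with how often b occurs within r of an arbitrary time point is one way to judge whether b really depends on a.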
Experiments
• Log data generation
  - Log files were collected from different machines running different operating systems in the School of Computer Science at Florida International University.
  - Collection used Logdump2td, an NT data collection tool developed by the event mining team at the IBM research center.
Message Categorization
Discover and Visualize Event Relationships
Conclusion and Future work
• Automatically infer the set of common categories from historical data.
• The number of common categories can be significantly large.