VLDB 2006, Seoul 1
Mapping a Moving Landscape by Mining Mountains of Logs
Automated Generation of a Dependency Model for HUG’s Clinical System
Mirko Steinle, EPFL and HUG
Karl Aberer, EPFL
Sarunas Girdzijauskas, EPFL
Alexander Lamb, HUG
Overview
• Background – Dependency Models
• Approaches
  – L1: Analyzing general service activity
  – L2: Analyzing user sessions
  – L3: Analyzing textual content
• Evaluation
• Conclusion
Background - A Moving Landscape
• Distributed clinical system of University Hospital Geneva (HUG)
  – 2000 beds, 4500 PCs, 20000 records accessed per day
• Relevant features
  – Communication is web service based
    • Service Directory: about 50 service groups
  – Centralized Logging System with a standard XML format
    • 10 million log messages/day, 1 terabyte/year
  – Quite homogeneous infrastructure
• Severe Availability Requirements (24 x 7 x 365)
➱ Need for automated support for problem diagnosis
Dependency Model
• Service orientation allows for easy reuse and integration, but has resulted in a complex dependency structure
• Dependency model (DM) is not clear
  – DM difficult to obtain, impossible to keep up-to-date manually
  – Infrastructure for manual documentation of the dependency structure is available, but not used …
Goal - Automated Dependency Model
• Goal: Automated creation of a model of the system’s dependency structure (DM)
  – Non-intrusive and low-cost
  – Focus on invocation dependencies between high-level objects
• Applications
  – Support for Fault Localization Algorithms
  – Prediction of Impact of Management Operations
  – Support for Architectural Decisions
  – Detection of Abnormal Behavior
• “you don’t want to interrupt a surgery because of DB maintenance”
Possible Approaches
• Static approaches
  – Capture dependencies at “compile time” by scanning configuration files, code etc.
• Dynamic approaches
  – Capture dependencies at runtime
  – Approaches include:
    • Code instrumentation (standards like JMX or ARM exist but are not yet applied broadly)
    • Middleware instrumentation (e.g. request tagging)
    • Active perturbation of system operation
    • Time series analysis of activity measures (network communication, CPU usage, …), e.g. using neural networks [Ensel02]
[Figure labels: Generality; Accuracy & Precision]
State of the Art
• Research
  – Focuses on how to exploit a dependency model, little work on how to obtain it
  – No generally applicable solution providing sufficiently correct dependency models seems to exist
• Commercial Products
  – Most focus on low-level objects and visualization
  – (Few) existing dynamic approaches: high configuration effort!
Technique L1: Logs as a General Activity Measure
• Key idea
  – Activity of dependent objects is likely to be correlated in some sense
  – Use logs as an activity measure
• Earlier work
  – Neural networks on CPU usage, traffic volume, … [Ensel02]
  – Drawback: supervised training
• Our approach
  – Statistical approach (no training)
  – Inspired by [LM04] (“Mining Temporal Patterns without Predefined Time Windows”)
Statistical Approach
• Tests for association of spatial point processes
  – Compare the typical distance from a timestamp of a log from A to the closest timestamp of a log from B with the typical distance from a random point R in time to the closest timestamp of a log from B
• Approach
  – Obtain distances by sampling from R and A
  – Determine the median of the distances A-B and R-B
  – If the median for A-B is lower than for R-B → correlation/dependence
  – Use confidence intervals
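The comparison of medians can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sample size, and the choice to draw random points R uniformly between the first and last timestamp of B are all assumptions made for the example.

```python
import bisect
import random
import statistics


def distance_to_closest(t, sorted_times):
    """Absolute distance from time t to the nearest timestamp in sorted_times."""
    i = bisect.bisect_left(sorted_times, t)
    # Only the neighbours directly left and right of t can be the closest.
    candidates = sorted_times[max(i - 1, 0):i + 1]
    return min(abs(t - c) for c in candidates)


def median_distances(times_a, times_b, n_samples=200, seed=0):
    """Return (median A-B distance, median R-B distance).

    Assumption: random points R are uniform over the time span covered by B.
    """
    rng = random.Random(seed)
    b = sorted(times_b)
    lo, hi = b[0], b[-1]
    sample_a = [rng.choice(times_a) for _ in range(n_samples)]
    sample_r = [rng.uniform(lo, hi) for _ in range(n_samples)]
    med_ab = statistics.median(distance_to_closest(t, b) for t in sample_a)
    med_rb = statistics.median(distance_to_closest(t, b) for t in sample_r)
    return med_ab, med_rb


# Hypothetical data: B logs every 10 time units, A logs 0.5 units before each B log.
times_b = [10.0 * i for i in range(100)]
times_a = [t - 0.5 for t in times_b]
med_ab, med_rb = median_distances(times_a, times_b)
print(med_ab < med_rb)  # a lower A-B median suggests correlation/dependence
```

In this toy data every log from A is followed closely by a log from B, so the A-B median is much smaller than the median measured from random points.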
Example
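Confidence interval for the median of the sorted sample $x_1, \dots, x_n$: the median falls into this interval with probability 95%. The interval is $[x_j, x_k]$ such that $B_{n,1/2}(k-1) - B_{n,1/2}(j-1) > 0.95$, where $B_{n,1/2}$ is the cumulative distribution function of the binomial distribution with parameters $n$ and $1/2$.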
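A small sketch of how such an interval can be computed from the binomial condition. The slide only states the condition on j and k; choosing them symmetrically (k = n + 1 - j) and taking the narrowest such interval is an assumption made here, and the function names are illustrative.

```python
from math import comb


def binom_cdf(n, m):
    """P(X <= m) for X ~ Binomial(n, 1/2); zero for m < 0."""
    if m < 0:
        return 0.0
    return sum(comb(n, i) for i in range(min(m, n) + 1)) / 2 ** n


def median_confidence_interval(values, level=0.95):
    """Narrowest symmetric order-statistic interval whose coverage exceeds level.

    Returns (x_j, x_k, coverage) with 1-based ranks j and k = n + 1 - j.
    """
    xs = sorted(values)
    n = len(xs)
    for j in range(n // 2, 0, -1):  # widen the interval step by step
        k = n + 1 - j
        coverage = binom_cdf(n, k - 1) - binom_cdf(n, j - 1)
        if coverage > level:
            return xs[j - 1], xs[k - 1], coverage
    raise ValueError("sample too small for the requested level")


low, high, coverage = median_confidence_interval(range(1, 11))
print(low, high, round(coverage, 4))  # prints: 2 9 0.9785
```

For ten observations the interval spans the second to the ninth order statistic; with five or fewer observations no interval reaches 95% and the function raises an error.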
Observations for L1
• Observations from preliminary experimental evaluation
  – True dependencies found, but clearly incomplete
  – Few “random” errors
  – However, correlation is also found where no invocation dependency exists
• Limit analysis to shorter time windows
  – Eliminate common dependency on time
[Figure panels: Transitive dependency; Simultaneous use]
Technique L2: Logs in a User Session
• One main difficulty is heavy parallelism in system
➱ execution sequences get overshadowed
• Reconstruct user sessions
➱ eliminates parallelism due to multiple users
• Then, adapt a procedure from NLP [Evert04]
• Two independent steps
  1. Extraction of consecutive log-source pairs [APPi, APPj] and creation of contingency tables
  2. Statistical test for association on these tables
Construction of Contingency Table
• Session Log
• Bigrams (u, v): (A,B) - (B,C) - (C,B)
• Contingency table for A-B

          u = A   u ≠ A
  v = B     1       1
  v ≠ B     0       1
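The construction can be reproduced with a few lines of code. This is a sketch only: the session sequence A, B, C, B is inferred from the three bigrams on the slide, and the function names and the nested-list layout (rows v = B / v ≠ B, columns u = A / u ≠ A) are choices made for this example.

```python
def bigrams(session):
    """Consecutive log-source pairs (u, v) of one session."""
    return list(zip(session, session[1:]))


def contingency_table(pairs, a, b):
    """2x2 table laid out as on the slide.

    Rows: v == b, v != b. Columns: u == a, u != a.
    """
    table = [[0, 0], [0, 0]]
    for u, v in pairs:
        row = 0 if v == b else 1
        col = 0 if u == a else 1
        table[row][col] += 1
    return table


session = ["A", "B", "C", "B"]  # assumed session log matching the slide's bigrams
pairs = bigrams(session)
table = contingency_table(pairs, "A", "B")
print(pairs)  # [('A', 'B'), ('B', 'C'), ('C', 'B')]
print(table)  # [[1, 1], [0, 1]]
```

In practice the counts would be accumulated over the bigrams of all reconstructed sessions before a statistical test is applied.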
Expected vs. Observed Frequencies
• Expected frequencies under the hypothesis that u and v are statistically independent
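Under independence, the expected count of a cell is the product of its row total and column total divided by the grand total N. The sketch below applies this standard formula to a 2x2 table given as nested lists (rows v = B / v ≠ B, columns u = A / u ≠ A); the function name and the input layout are assumptions for illustration.

```python
def expected_frequencies(observed):
    """Expected cell counts if row and column variables were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [[1, 1], [0, 1]]  # small illustrative table with N = 3
for row in expected_frequencies(observed):
    print([round(e, 3) for e in row])
```

The expected table has the same row and column totals as the observed one; the test for association measures how far the observed counts deviate from it.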
Statistical Test for Association
• Log-likelihood test (Dunning)
• Works well for heavily skewed tables (O11 << N)
• For an excellent discussion of statistical tests for correlation see [Evert04]
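As a rough illustration, the log-likelihood statistic for a 2x2 table can be computed as G² = 2 Σ O_ij ln(O_ij / E_ij), where E_ij are the expected counts under independence and cells with O_ij = 0 contribute nothing. The function name and the example counts below are hypothetical, not taken from the HUG data.

```python
from math import log


def log_likelihood_g2(observed):
    """Log-likelihood ratio statistic G^2 for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            if o > 0:  # a zero cell contributes 0 (limit of o * ln o)
                e = row_totals[i] * col_totals[j] / n
                g2 += o * log(o / e)
    return 2.0 * g2


independent = [[10, 20], [30, 60]]  # observed equals expected
associated = [[50, 5], [5, 940]]    # pair (A, B) far more frequent than expected
print(log_likelihood_g2(independent))
print(log_likelihood_g2(associated) > 3.84)
```

For a 2x2 table the statistic is usually compared against a chi-squared distribution with one degree of freedom, for which 3.84 is the critical value at the 5% level; which threshold to use in a given setting is a separate decision.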
Observations for L2
• Observations from preliminary experimental evaluation
  – Many true dependencies found
  – Interestingly, a few errors similar to those in L1
    • transitivity and simultaneous use
  – Main problem
    • only a small subset of logs can be assigned to a session, and many interactions can thus not be observed
Technique L3: Exploiting Textual Content in Logs
• Observation
  – Invocation of a remote service is typically logged by the caller
  – One could identify such logs and process log content to find callee
• The other way round
  – Find logs mentioning directory entry contents for a given service
  – Infer a dependency of the log’s source, the caller, on the service
• Example: service s calls notify on server myserver
  ● Possible content of free text in log entry
    Invoke externalService [fct [notify] server [myserver.hguge:9999/myurl]]
    or
    (DPINOTIFICATION) notify ($myparams)
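The "other way round" idea can be sketched as a plain substring search of log text against directory entries. Everything structural here is an assumption for illustration: the directory layout, the service name `notification`, and the representation of a log as a `(source, text)` pair. Only the two log texts are taken from the example above.

```python
# Hypothetical service directory: service name -> strings identifying the service.
DIRECTORY = {
    "notification": ["myserver.hguge:9999/myurl", "DPINOTIFICATION"],
}


def infer_dependencies(logs, directory):
    """Return the set of (caller, service) pairs found in the log texts.

    logs: iterable of (source, free_text) pairs.
    """
    dependencies = set()
    for source, text in logs:
        lowered = text.lower()
        for service, entries in directory.items():
            if service == source:
                continue  # a service mentioning itself is not a dependency
            if any(entry.lower() in lowered for entry in entries):
                dependencies.add((source, service))
    return dependencies


logs = [
    ("s", "Invoke externalService [fct [notify] server [myserver.hguge:9999/myurl]]"),
    ("s", "(DPINOTIFICATION) notify ($myparams)"),
    ("t", "user login ok"),
]
print(infer_dependencies(logs, DIRECTORY))  # {('s', 'notification')}
```

Both log variants map to the same dependency of caller `s` on the service, while the unrelated log from `t` yields nothing.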
Experiments on Logs: Setting
• Test data: 56.8 million logs from 1 week
• Reference model (RM)
  – Created with help of more than a dozen system experts and developers
  – 178 dependencies out of 1431 possible dependencies (54 services)
• Strategy
  1. Validate L1, L2 and L3 against static reference model
  2. Validate L1 and L2 against L3 and study influence of load
Experiment: Validation against RM
L1 (0.98 level CI: [0.63, 0.73])
• 30-46 True Positives detected
• Small classification error for L1
  – about 2% in negative case
• False Positives (FP) for L1
  – transitive and simultaneous use (e.g. administrative patient data and laboratory results)

L2 (0.98 level CI: [0.71, 0.78])
• 51-74 True Positives detected
• FP for L2
  – asynchronous communication
• Sessions in L2
  – only 10% of all logs can be assigned to a session

L3 (0.98 level CI: [0.93, 0.96])
• 116-152 True Positives detected
  – 10 False Negatives on the whole week
Experiment: Influence of Load on Detection
• Realizations of dependency relationships computed with L3
• Percentage of False Positives is not influenced by load
CI for linear factors
L1: [-0.284, -0.215]
L2: [-0.025, 0.002]
Comparison of Log-based Approaches
[Comparison chart of L3 (Logs as Text), L2 (Logs in Sessions) and L1 (Logs as Activity Measure). Labels: Accuracy and Precision of Result; Concurrency; Correlation; Implementation and Maintenance; Parametrization; Performance and security impact; Required Structure and Content of Logs (Scope); Service directory; Session info; Only source and timestamp]
• All techniques can be implemented in linear complexity w.r.t. #logs
• Invocation direction functional dependency direction
• Solution for HUG
  – Centralized logging system ➱ little effort for log-based methods
  – L3 is a viable solution
Conclusion
• Three new approaches to use logs for DM generation with a large scope
• All have been shown to discover useful dependency information in a real-world environment
• Seems to be the first study on the use of logs and the first real-world experiment for DM generation
• Sniffing
  – Applicable for web service oriented systems
• Simple and efficient solution for HUG