Causal Modeling for Anomaly Detection
Andrew Arnold, Machine Learning Department, Carnegie Mellon University
Summer Project with Naoki Abe, Predictive Modeling Group, IBM
Rick Lawrence, Manager
June 23, 2006
Contributions
• Consistent causal structure can be learned from passive observational data
• Anomalous examples have a quantitatively differentiable causal structure from normal ones
• Causal structure is a significant contribution to the standard analysis tools of independence and likelihood
Outline
• Motivation & Problem
• Causation Definition
• Causal Discovery
• Causal Comparison
• Conclusions & Ongoing Work
Motivation
• Processors:
  – Detection: Is this wafer good or bad?
  – Causation: Why is this wafer bad?
  – Intervention: How can we fix the problem?
• Business:
  – Detection: Is this business functioning well or not?
  – Causation: Why is this business not functioning well?
  – Intervention: What can IBM do to improve performance?
Problem
• Interventions are expensive and flawed
• What can passively observed data tell us about the causal structure of a process?
Direct Causation
X is a direct cause of Y relative to S, iff
∃ z, x1 ≠ x2 such that P(Y | X set= x1, Z set= z) ≠ P(Y | X set= x2, Z set= z)
where Z = S − {X, Y}
[Graph: X → Y]
• Asymmetric
• Intervene to set Z = z, not just observe Z = z
[Scheines (2005)]
Causal Graphs
Causal Directed Acyclic Graph G = {V, E}
Each edge X → Y represents a direct causal claim:
X is a direct cause of Y relative to V
Example: Exposure → Infection → Symptoms
[Scheines (2005)]
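As a concrete illustration of the slide's example graph, a causal DAG can be held in a simple adjacency mapping. This is a sketch of our own: the dict-of-children representation and the `direct_causes` helper are illustrative choices, not part of the talk.

```python
# Minimal sketch: a causal DAG as a mapping from each node to its children.
# Node names follow the slide's Exposure -> Infection -> Symptoms example.
causal_graph = {
    "Exposure": ["Infection"],   # Exposure is a direct cause of Infection
    "Infection": ["Symptoms"],   # Infection is a direct cause of Symptoms
    "Symptoms": [],              # Symptoms has no effects in this graph
}

def direct_causes(graph, node):
    """Return the parents (direct causes) of `node` in the DAG."""
    return [v for v, children in graph.items() if node in children]

print(direct_causes(causal_graph, "Infection"))  # ['Exposure']
```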
Probabilistic Independence
X and Y are independent iff
∀ x1, x2 : P(Y | X = x1) = P(Y | X = x2)
[Figure: two graphs over X and Y, one with no edge, one with an edge X → Y]
X and Y are associated iff X and Y are not independent
[Scheines (2005)]
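The definition above can be checked empirically: if the conditional distribution P(Y | X = x1) visibly differs from P(Y | X = x2) in data, X and Y are associated. A minimal sketch on synthetic binary data follows; the data-generating process and the 0.1 threshold are illustrative assumptions, not from the talk.

```python
import random

random.seed(0)

# Illustrative synthetic data: X causes Y, so P(Y=1 | X) should differ
# across the two values of X (0.8 vs. 0.2 by construction).
n = 10000
xs = [random.random() < 0.5 for _ in range(n)]
ys = [random.random() < (0.8 if x else 0.2) for x in xs]

def cond_prob_y(xs, ys, x_val):
    """Empirical P(Y = 1 | X = x_val)."""
    matching = [y for x, y in zip(xs, ys) if x == x_val]
    return sum(matching) / len(matching)

p1 = cond_prob_y(xs, ys, True)
p0 = cond_prob_y(xs, ys, False)
# The conditional distributions differ, so X and Y are associated.
print(abs(p1 - p0) > 0.1)  # True
```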
Causal Structure → Probabilistic Independence
(via the Causal Markov Axiom)
Markov Condition: in a causal graph, each variable V is independent of its non-effects, conditional on its direct causes.
[Scheines (2005)]
Causal Structure → Statistical Data
Causal Graph: X1 → X2 → X3
⇓ Causal Markov Axiom (d-separation)
Independence relations: X1 _||_ X3 | X2
[Scheines (2005)]
Causal Discovery
Statistical Data → Causal Structure
• Data → (statistical inference) → independence relations: X1 _||_ X3 | X2
• Background knowledge:
  – Faithfulness
  – X2 before X3
  – no unmeasured common causes
• Independence relations + background knowledge → (discovery algorithm, via the Causal Markov Axiom / d-separation) → equivalence class of causal graphs:
  – X1 → X2 → X3
  – X1 ← X2 → X3
• Equivalence class representation: X1 — X2 → X3
[Scheines (2005)]
Causal Discovery Algorithm
• PC algorithm [Spirtes et al., 2000]
  – Constraint-based search
  – Only need to know how to test conditional independence
  – Do not need to measure all causes
  – Asymptotically correct
PC algorithm
• Begin with the fully connected undirected graph
• For each pair of nodes, test their independence conditional on all subsets of their neighbors, i.e., is X _||_ Y | Z?
• If independent for any conditioning set: remove the edge and record the subset conditioned upon
• If dependent for all conditioning sets: leave the edge
• Orient edges, where possible
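The edge-removal phase of the steps above can be sketched as follows, assuming a conditional-independence oracle `indep(x, y, z)` is available. The oracle interface, the `pc_skeleton` name, and the toy three-node chain are illustrative assumptions, not the PC implementation used in the talk; edge orientation is omitted.

```python
from itertools import combinations

def pc_skeleton(nodes, indep):
    """Skeleton (edge-removal) phase of the PC algorithm, as a sketch.

    `indep(x, y, z)` should return True when x _||_ y given the
    frozenset of conditioning variables `z`.  Returns the undirected
    skeleton as an adjacency dict, plus the separating sets found.
    """
    adj = {v: set(nodes) - {v} for v in nodes}   # fully connected start
    sepset = {}
    k = 0
    # Grow the conditioning-set size until no node has enough neighbors.
    while any(len(adj[x] - {y}) >= k for x in nodes for y in adj[x]):
        for x in nodes:
            for y in list(adj[x]):
                # Test independence conditional on size-k neighbor subsets.
                for z in combinations(sorted(adj[x] - {y}), k):
                    if indep(x, y, frozenset(z)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset({x, y})] = frozenset(z)
                        break
        k += 1
    return adj, sepset

# Oracle for the chain X1 -> X2 -> X3: only X1 _||_ X3 | X2 holds.
def chain_oracle(x, y, z):
    return {x, y} == {"X1", "X3"} and "X2" in z

adj, sep = pc_skeleton(["X1", "X2", "X3"], chain_oracle)
print(sorted(adj["X2"]))  # ['X1', 'X3'] -- the X1-X3 edge was removed
```

Note that the skeleton alone cannot distinguish the chain from the fork X1 ← X2 → X3; that is exactly why the equivalence-class output on the earlier slide is only partially oriented.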
More Orientation Rules
Rule 3: Avoid forming cycles
• If there is an undirected edge between X and Y, and there is a directed path from X to Y, then direct X–Y as X → Y
[Figure: given X — Y with a directed path X → Z → Y, orienting X → Y is OK; orienting Y → X forms a cycle (BAD)]
Our Example
• Rule 2: Colliders
• Rule 3: No new v-structures
• Truth fully recovered
[Scheines (2005)]
Using causal structure to explain anomalies
• Why is one wafer good, and another bad?
  – Separate data into classes
  – Form causal graphs on each class
  – Compare causal structures
How to compare?
• Similarity score for graphs A and B over common nodes V:
  – Consider undirected edges as bi-directed
  – Of all the ordered pairs of variables (x, y) in V with an arc x → y in either A or B, in what percentage is there also an arc x → y in the other graph?
  – i.e., (AdjA(x,y) || AdjB(x,y)) && (AdjA(x,y) == AdjB(x,y))
• Difference graph:
  – If there is an arc x → y in either A or B, but not in both, place the arc x → y in the difference graph
  – i.e., if (AdjA(x,y) != AdjB(x,y)) then AdjDiff(x,y) = True
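Both measures above can be sketched with sets of ordered arc pairs. The set-of-tuples representation and the `similarity_and_difference` name are illustrative assumptions; undirected edges are assumed to have already been expanded into both orderings, as the slide specifies.

```python
def similarity_and_difference(edges_a, edges_b):
    """Sketch of the slide's graph comparison over a common node set.

    `edges_a` and `edges_b` are sets of ordered pairs (x, y), one pair
    per directed arc x -> y.  Returns the similarity score (fraction of
    arcs present in either graph that appear in both) and the difference
    graph (arcs present in exactly one of the two graphs).
    """
    union = edges_a | edges_b
    shared = edges_a & edges_b
    similarity = len(shared) / len(union) if union else 1.0
    difference = union - shared          # the symmetric difference
    return similarity, difference

# Toy example: the arc x -> y is shared; y -> z and z -> y disagree.
a = {("x", "y"), ("y", "z")}
b = {("x", "y"), ("z", "y")}
sim, diff = similarity_and_difference(a, b)
print(sim)   # 0.333... (1 shared arc out of 3 distinct arcs)
print(diff)  # the two disagreeing arcs between y and z
```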
Conclusions
• Consistent causal structure can be learned from passive observational data
• Anomalous examples have a quantitatively differentiable causal structure from normal ones
• Causal structure is a significant contribution to the standard analysis tools of independence and likelihood
Ongoing work
• Comparing to maximum likelihood and minimum description length techniques
• Looking at time ordering: how do variables influence each other over time?
• Using one-class SVM to do clustering: avoids the need for labeled data
• Relaxing assumptions: allow latent variables
• Evaluation is difficult without a domain expert
• Using causal structure to help in clustering
References
• J. Pearl (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
• R. Scheines. Causality Slides. http://www.gatsby.ucl.ac.uk/~zoubin/SALD/scheines.pdf
• P. Spirtes, C. Glymour, and R. Scheines (2000). Causation, Prediction, and Search, 2nd Edition. MIT Press.
Thank You
Questions?