
Causal Modeling for Anomaly Detection

Andrew Arnold

Machine Learning Department, Carnegie Mellon University

Summer Project with Naoki Abe

Predictive Modeling Group, IBM

Rick Lawrence, Manager

June 23, 2006

Contributions

• Consistent causal structure can be learned from passive observational data

• Anomalous examples have a quantitatively differentiable causal structure from normal ones

• Causal structure is a significant contribution to the standard analysis tools of independence and likelihood


Outline

• Motivation & Problem

• Causation Definition

• Causal Discovery

• Causal Comparison

• Conclusions & Ongoing Work

Motivation

• Processors:

– Detection: Is this wafer good or bad?

– Causation: Why is this wafer bad?

– Intervention: How can we fix the problem?

• Business:

– Detection: Is this business functioning well or not?

– Causation: Why is this business not functioning well?

– Intervention: What can IBM do to improve performance?


Problem

• Interventions are expensive and flawed

• What can passively observed data tell us about the causal structure of a process?


Direct Causation

X is a direct cause of Y relative to S, iff

$$\exists z,\ x_1 \neq x_2:\quad P(Y \mid X \text{ set}= x_1,\ Z \text{ set}= z) \neq P(Y \mid X \text{ set}= x_2,\ Z \text{ set}= z)$$

where Z = S - {X,Y}

X → Y

[Scheines (2005)]

Asymmetric

Intervene to set Z = z, not just observe Z = z
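The difference between setting and observing can be checked on a toy model. The sketch below is an illustration only, not taken from the slides: it assumes a hypothetical hidden common cause U with U → X and U → Y and no edge from X to Y, and all probabilities and function names are made up for the example. It computes both quantities exactly.

```python
from fractions import Fraction

# Hypothetical toy model (not from the slides): U -> X and U -> Y, no X -> Y edge.
P_U1 = Fraction(1, 2)                                    # P(U = 1)
P_X1_GIVEN_U = {0: Fraction(1, 10), 1: Fraction(9, 10)}  # P(X = 1 | U = u)
P_Y1_GIVEN_U = {0: Fraction(1, 5), 1: Fraction(4, 5)}    # P(Y = 1 | U = u)


def p_u(u):
    """Prior probability of U = u."""
    return P_U1 if u == 1 else 1 - P_U1


def p_y1_given_x_observed(x):
    """P(Y = 1 | X = x): condition on a passively observed X."""
    num = Fraction(0)
    den = Fraction(0)
    for u in (0, 1):
        p_x = P_X1_GIVEN_U[u] if x == 1 else 1 - P_X1_GIVEN_U[u]
        num += p_u(u) * p_x * P_Y1_GIVEN_U[u]
        den += p_u(u) * p_x
    return num / den


def p_y1_given_x_set(x):
    """P(Y = 1 | X set= x): setting X cuts U -> X, so U keeps its prior.

    The argument x is unused on purpose: Y does not depend on X in this model.
    """
    return sum(p_u(u) * P_Y1_GIVEN_U[u] for u in (0, 1))


observed = (p_y1_given_x_observed(0), p_y1_given_x_observed(1))
intervened = (p_y1_given_x_set(0), p_y1_given_x_set(1))
```

Here the observed conditionals differ (13/50 versus 37/50) while the interventional ones are both 1/2, so under the definition above X is not a direct cause of Y relative to {X, Y}, even though X and Y are associated.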


Causal Graphs

Causal Directed Acyclic Graph G = {V,E}

Each edge X → Y represents a direct causal claim:

X is a direct cause of Y relative to V

Exposure → Infection → Symptoms

[Scheines (2005)]


Probabilistic Independence

X and Y are independent iff

$$\forall x_1 \neq x_2:\quad P(Y \mid X = x_1) = P(Y \mid X = x_2)$$

Notation: $X \perp\!\!\!\perp Y$

X and Y are associated iff

X and Y are not independent

Notation: $X \not\perp\!\!\!\perp Y$

[Scheines (2005)]


The Causal Markov Axiom

Causal Structure → Probabilistic Independence

Markov Condition

In a Causal Graph: each variable V is independent of its non-effects, conditional on its direct causes.

[Scheines (2005)]
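A standard consequence of the Markov condition, added here as a worked example rather than taken from the slides: the joint distribution factorizes over each variable's direct causes, written pa(V_i). Assuming a chain E → I → S (Exposure, Infection, Symptoms), the factorization gives the conditional independence of S and E given I:

$$P(V_1, \dots, V_n) = \prod_{i=1}^{n} P\big(V_i \mid \mathrm{pa}(V_i)\big)$$

$$P(E, I, S) = P(E)\, P(I \mid E)\, P(S \mid I)$$

$$P(S \mid I, E) = \frac{P(E)\, P(I \mid E)\, P(S \mid I)}{P(E)\, P(I \mid E)} = P(S \mid I)$$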

Causal Structure → Statistical Data

[Scheines (2005)]

[Figure: a causal graph over X1, X2, X3 and the independence relations it entails via the Causal Markov Axiom (d-separation)]

[Scheines (2005)]


Causal Discovery

Statistical Data → Causal Structure

Background Knowledge

- Faithfulness

- X2 before X3

- no unmeasured common causes

[Figure: discovery pipeline over X1, X2, X3. Statistical inference on the data yields independence relations; a discovery algorithm combines these with the background knowledge, via the Causal Markov Axiom (d-separation), to produce an equivalence class of causal graphs and its representation.]

[Scheines (2005)]


Causal Discovery Algorithm

• PC algorithm [Spirtes et al., 2000]

– Constraint-based search

– Only need to know how to test conditional independence

– Do not need to measure all causes

– Asymptotically correct


PC algorithm

• Begin with the fully connected undirected graph

• For each pair of nodes, test their independence conditional on all subsets of their neighbors:

– i.e., (X _||_ Y | Z)?

• If independent for any conditioning

– remove edge, record subset conditioned upon

• If dependent for all conditionings

– leave edge

• Orient edges, where possible
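The edge-removal steps above can be sketched as follows. This is a minimal illustration, not the implementation used in this project: `pc_skeleton` and `chain_oracle` are hypothetical names, and the independence test is passed in as a function. Following the usual presentation of PC, conditioning sets are tried in order of increasing size. The usage example replaces a statistical test with an assumed oracle for the chain X → Y → Z.

```python
from itertools import combinations


def pc_skeleton(nodes, is_independent):
    """Edge-removal phase of PC.

    is_independent(x, y, cond) must return True when x _||_ y | cond.
    Returns (adjacency, sepsets), where sepsets maps each removed pair
    to the conditioning set that separated it.
    """
    # Begin with the fully connected undirected graph.
    adjacency = {v: set(nodes) - {v} for v in nodes}
    sepsets = {}
    size = 0
    # Keep going while some node has enough neighbors to form a set of this size.
    while any(len(adjacency[x]) - 1 >= size for x in nodes):
        for x in nodes:
            for y in sorted(adjacency[x]):
                others = sorted(adjacency[x] - {y})
                for cond in combinations(others, size):
                    if is_independent(x, y, set(cond)):
                        # Independent for this conditioning: remove the edge
                        # and record the subset conditioned upon.
                        adjacency[x].discard(y)
                        adjacency[y].discard(x)
                        sepsets[frozenset((x, y))] = set(cond)
                        break
        size += 1
    return adjacency, sepsets


def chain_oracle(x, y, cond):
    """Hypothetical oracle for X -> Y -> Z: only X and Z separate, and only given Y."""
    return {x, y} == {"X", "Z"} and "Y" in cond


adjacency, sepsets = pc_skeleton(["X", "Y", "Z"], chain_oracle)
```

With this oracle the X–Z edge is removed with separating set {Y}, and the edges X–Y and Y–Z are left in place.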


Independence Tests

[Scheines (2005)]
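The slides do not say which independence test was used. One common choice for roughly Gaussian data is the Fisher z test on partial correlations, sketched here under that assumption. The function names and the synthetic chain x → y → w are hypothetical and chosen for illustration only.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(cols, i, j, cond):
    """Partial correlation of cols[i], cols[j] given cols[k] for k in cond."""
    if not cond:
        return pearson(cols[i], cols[j])
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(cols, i, j, rest)
    r_ik = partial_corr(cols, i, k, rest)
    r_jk = partial_corr(cols, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def fisher_z_pvalue(cols, i, j, cond=()):
    """Two-sided p-value for H0: cols[i] _||_ cols[j] | cols[cond]."""
    n = len(cols[i])
    r = partial_corr(cols, i, j, tuple(cond))
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Synthetic chain x -> y -> w (assumed data, for illustration only).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [a + rng.gauss(0, 1) for a in x]
w = [b + rng.gauss(0, 1) for b in y]
cols = [x, y, w]

p_marginal = fisher_z_pvalue(cols, 0, 2)           # x vs w, no conditioning
p_given_y = fisher_z_pvalue(cols, 0, 2, cond=(1,))  # x vs w given y
```

A small p-value rejects independence, so a PC-style search would keep the edge. In this chain the marginal test should reject strongly, while the test given y has no true dependence to detect.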

Edge Orientation

Rule 1: Colliders

[Scheines (2005)]
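The collider rule as it is usually stated for PC can be sketched as below. The slide's figure is not recoverable, so this follows the standard formulation rather than the slide itself: for every triple X – Z – Y where X and Y are not adjacent, orient X → Z ← Y if Z was not in the set that separated X and Y. `orient_colliders` is a hypothetical name, and the inputs have the shape produced by a skeleton search (adjacency sets plus recorded separating sets).

```python
from itertools import combinations


def orient_colliders(adjacency, sepsets):
    """Return arcs (a, b), meaning a -> b, implied by unshielded colliders."""
    arcs = set()
    for z in adjacency:
        for x, y in combinations(sorted(adjacency[z]), 2):
            if y in adjacency[x]:
                continue  # shielded triple: x and y are adjacent
            # Assumption: a missing separating set is treated as empty.
            if z not in sepsets.get(frozenset((x, y)), set()):
                arcs |= {(x, z), (y, z)}
    return arcs


skeleton = {"X": {"Z"}, "Y": {"Z"}, "Z": {"X", "Y"}}

# X and Y were separated by the empty set, so Z must be a collider.
collider_arcs = orient_colliders(skeleton, {frozenset(("X", "Y")): set()})

# X and Y were separated by {Z}, so Z is not a collider and nothing is oriented.
chain_arcs = orient_colliders(skeleton, {frozenset(("X", "Y")): {"Z"}})
```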

More Orientation Rules:

Rule 2: Avoid forming new colliders

[Scheines (2005)]

More Orientation Rules:

Rule 3: Avoid forming cycles

If there is an undirected edge between X and Y

And there is a directed path from X to Y

– Then direct X-Y as X → Y

[Figure: three panels over X, Y, Z labeled "Given", "OK", and "BAD (cycle)"]
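Rule 3 can be sketched directly from its statement. This is an illustrative sketch, not the code used in this project. The function names are hypothetical, directed edges are stored as ordered pairs, and undirected edges as two-element sets.

```python
def has_directed_path(arcs, src, dst):
    """True if dst can be reached from src by following arcs (a, b) as a -> b."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        for a, b in arcs:
            if a == node and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False


def orient_to_avoid_cycles(arcs, undirected):
    """Direct each undirected edge x - y as x -> y when a directed path x ~> y exists."""
    arcs = set(arcs)
    undirected = {frozenset(edge) for edge in undirected}
    changed = True
    while changed:  # newly directed edges can create new directed paths
        changed = False
        for edge in sorted(undirected, key=sorted):
            x, y = sorted(edge)
            for a, b in ((x, y), (y, x)):
                if has_directed_path(arcs, a, b):
                    arcs.add((a, b))  # the other direction would close a cycle
                    undirected.discard(edge)
                    changed = True
                    break
    return arcs, undirected


# Given X -> Z -> Y and an undirected X - Y, the edge must become X -> Y.
arcs, remaining = orient_to_avoid_cycles(
    {("X", "Z"), ("Z", "Y")}, [("X", "Y")]
)
```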


Our Example

Rule 1: Colliders

Rule 2: No new V-structures

Truth fully recovered

[Scheines (2005)]


Results: Key Performance Indicators


Results: Chip Fabrication


Temporal ordering is preserved


Using causal structure to explain anomalies

• Why is one wafer good, and another bad?

– Separate data into classes

– Form causal graphs on each class

– Compare causal structures


Form causal graphs

Good Train

Good Test

Bad


How to compare?

• Similarity Score for graphs A and B over common nodes V:

– Consider undirected edges as bi-directed

– Of all the ordered pairs of variables (x, y) in V, with an arc x → y in either A or B

• In what percentage is there also x → y in the other graph

• i.e., (AdjA(x,y) || AdjB(x,y)) && (AdjA(x,y) == AdjB(x,y))

• Difference Graph:

– If there is an arc x → y in either A or B, but not in both, place the arc x → y in the difference graph

– i.e., if (AdjA(x,y) != AdjB(x,y)) then AdjDiff(x,y) = True
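Both definitions reduce to set operations on arcs. The sketch below is one reading of them, not the project's code: graphs are stored as sets of ordered pairs, the function names are hypothetical, and the choice to score two empty graphs as fully similar is an assumption the slides do not address.

```python
def to_arcs(directed, undirected=()):
    """Build an arc set, treating each undirected edge as bi-directed."""
    arcs = set(directed)
    for x, y in undirected:
        arcs.update({(x, y), (y, x)})
    return arcs


def restrict(arcs, nodes):
    """Keep only the arcs whose endpoints are both in the common node set."""
    return {(x, y) for x, y in arcs if x in nodes and y in nodes}


def similarity(arcs_a, arcs_b):
    """Fraction of ordered pairs with an arc in either graph that have it in both."""
    either = arcs_a | arcs_b
    if not either:
        return 1.0  # assumption: two empty graphs count as identical
    return len(arcs_a & arcs_b) / len(either)


def difference_graph(arcs_a, arcs_b):
    """Arcs present in exactly one of the two graphs."""
    return arcs_a ^ arcs_b


common = {"a", "b", "c"}
graph_a = restrict(to_arcs([("a", "b"), ("b", "c")]), common)
graph_b = restrict(to_arcs([("a", "b")], undirected=[("b", "c")]), common)

score = similarity(graph_a, graph_b)       # 2 shared arcs out of 3
diff = difference_graph(graph_a, graph_b)  # only c -> b differs
```

Because an undirected edge counts as two arcs, a directed edge in one graph and an undirected edge in the other agree on one arc and differ on the reverse one.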

Comparison: Good Test vs. Good Train

59% similar

[Figure: Difference Graph]

Comparison: Bad vs. Good Train


Comparison: Bad vs. Good Test


Conclusions

• Consistent causal structure can be learned from passive observational data

• Anomalous examples have a quantitatively differentiable causal structure from normal ones

• Causal structure is a significant contribution to the standard analysis tools of independence and likelihood


Ongoing work

• Comparing to maximum likelihood and minimum description length techniques

• Looking at time-ordering

– How do variables influence each other over time?

• Using one-class SVM to do clustering

– Avoids need for labeled data

• Relaxing assumptions

– Allow latent variables

• Evaluation is difficult without domain expert

• Using causal structure to help in clustering

References

• J. Pearl (2000). Causality: Models, Reasoning, and Inference, Cambridge Univ. Press

• R. Scheines, Causality Slides http://www.gatsby.ucl.ac.uk/~zoubin/SALD/scheines.pdf

• P. Spirtes, C. Glymour, and R. Scheines (2000). Causation, Prediction, and Search, 2nd Edition (MIT Press)

Thank You

¿ Questions ?
