Challenges in causality: Results of the WCCI 2008 challenge
Isabelle Guyon, Clopinet
Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ.
André Elisseeff and Jean-Philippe Pellet, IBM Zürich
Gregory F. Cooper, University of Pittsburgh
Peter Spirtes, Carnegie Mellon
Causal discovery
What affects…
…your health?
…climate changes?
…the economy?
Which actions will have beneficial effects?
What is causality?
• Many definitions:
– Science
– Philosophy
– Law
– Psychology
– History
– Religion
– Engineering
• “Cause is the effect concealed, effect is the cause revealed” (Hindu philosophy)
The system
Systemic causality
External agent
Difficulty
• A lot of “observational” data.
Correlation ≠ Causality!
• Experiments are often needed, but:
– Costly
– Unethical
– Infeasible
Causality workbench
http://clopinet.com/causality
Our approach
What is the causal question?
Why should we care?
What is hard about it?
Is this solvable?
Is this a good benchmark?
Four tasks
Toy datasets
Challenge datasets
On-line feed-back
Toy Examples
[Figure: LUCAS causal graph. Nodes: Lung Cancer, Smoking, Genetics, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day, Fatigue.]
Causality assessment with manipulations
[Figure panels: LUCAS0: natural; LUCAS1: manipulated; LUCAS2: manipulated.]
Goal driven causality
[Figure: example causal graph with nodes numbered 0 to 11, alongside the node list 4, 11, 2, 3, 1.]
• We define: V = variables of interest (e.g. MB, direct causes, ...).
• Participants return: S = selected subset (ordered or not).
• We assess causal relevance: Fscore = f(V, S).
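The slide does not say what f is. As a minimal sketch, assuming f is an AUC-style ranking score (an assumption made here for illustration), the comparison of S against V could look like the following. The function name `fscore_auc` and the sets used for V are hypothetical; the list 4, 11, 2, 3, 1 is taken from the slide and read here as an ordered selection.

```python
def fscore_auc(ordered_selection, relevant, all_variables):
    """AUC-style score (assumed form of f): how well the ranking puts V on top.

    Selected variables are scored by rank; unselected ones tie at the bottom.
    """
    n = len(ordered_selection)
    rank_score = {v: 0.0 for v in all_variables}
    for position, v in enumerate(ordered_selection):
        rank_score[v] = float(n - position)
    pos = [rank_score[v] for v in all_variables if v in relevant]
    neg = [rank_score[v] for v in all_variables if v not in relevant]
    # Fraction of (relevant, irrelevant) pairs ranked correctly; ties count 0.5.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


all_variables = list(range(12))  # nodes 0..11, as in the slide's graph
S = [4, 11, 2, 3, 1]             # list from the slide, read as an ordered selection
perfect = fscore_auc(S, {4, 11, 2}, all_variables)  # hypothetical V, ranked on top
worse = fscore_auc(S, {1, 3, 5}, all_variables)     # hypothetical V, ranked low
```

With the first hypothetical V every relevant variable is ranked above every irrelevant one, so the score is at its maximum; the second V scores lower because one of its members was not selected at all.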
Causality assessment without manipulation?
Using artificial “probes”
[Figure: LUCAS causal graph with added probe variables P1, P2, P3, PT. Panels: LUCAP0: natural; LUCAP1&2: manipulated.]
Scoring using “probes”
• What we can compute (Fscore):
– Negative class = probes (here, all “non-causes”, all manipulated).
– Positive class = other variables (may include causes and non-causes).
• What we want (Rscore):
– Positive class = causes.
– Negative class = non-causes.
• What we get (asymptotically):
Fscore = (NTruePos/NReal) Rscore + 0.5 (NTrueNeg/NReal)
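The relation above can be checked numerically. The sketch below assumes that both scores are AUCs, that NReal is the number of real (non-probe) variables, and that NTruePos and NTrueNeg are the numbers of causes and non-causes among them; these readings, the Gaussian scores, and all sample sizes are illustrative assumptions. Probes are drawn from the same score distribution as the real non-causes, which is what makes the 0.5 term appear.

```python
import random
from bisect import bisect_left, bisect_right


def auc(pos, neg):
    """P(random positive scores above random negative); ties count 0.5."""
    neg_sorted = sorted(neg)
    total = 0.0
    for s in pos:
        lo = bisect_left(neg_sorted, s)
        hi = bisect_right(neg_sorted, s)
        total += lo + 0.5 * (hi - lo)
    return total / (len(pos) * len(neg))


rng = random.Random(0)
n_true_pos, n_true_neg, n_probes = 200, 800, 5000  # hypothetical sizes
n_real = n_true_pos + n_true_neg

# Hypothetical ranking scores: causes score higher on average than non-causes;
# probes are distributed like the real non-causes.
causes = [rng.gauss(1.0, 1.0) for _ in range(n_true_pos)]
real_non_causes = [rng.gauss(0.0, 1.0) for _ in range(n_true_neg)]
probes = [rng.gauss(0.0, 1.0) for _ in range(n_probes)]

# What we can compute: real variables vs. probes.
fscore = auc(causes + real_non_causes, probes)
# What we want: causes vs. all non-causes.
rscore = auc(causes, real_non_causes + probes)
predicted = (n_true_pos / n_real) * rscore + 0.5 * (n_true_neg / n_real)
```

Under these assumptions `fscore` and `predicted` agree up to sampling noise, and the gap shrinks as the number of probes grows.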
Results
AUC distribution
Methods employed
• Causal: Methods employing causal discovery techniques to unravel cause-effect relationships in the neighborhood of the target.
• Markov blanket: Methods for extracting the Markov blanket, without attempting to unravel cause-effect relationships.
• Feature selection: Methods for selecting predictive features making no explicit attempt to uncover the Markov blanket or perform causal discovery.
Formalism: Causal Bayesian networks
• Bayesian network:
– Graph with random variables X1, X2, …, Xn as nodes.
– Dependencies represented by edges.
– Allows us to compute P(X1, X2, …, Xn) as ∏i P(Xi | Parents(Xi)).
– Edge directions have no causal meaning.
• Causal Bayesian network: edge directions indicate causality.
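The factorization can be illustrated on a tiny network. This is a sketch only: the three-node chain and all probability values below are made up for the example and are not taken from the LUCAS dataset.

```python
from itertools import product

# Hypothetical chain: Smoking -> LungCancer -> Coughing (binary variables).
parents = {"Smoking": [], "LungCancer": ["Smoking"], "Coughing": ["LungCancer"]}
# cpt[var][tuple of parent values] = P(var = 1 | parents); values are invented.
cpt = {
    "Smoking": {(): 0.3},
    "LungCancer": {(0,): 0.05, (1,): 0.4},
    "Coughing": {(0,): 0.2, (1,): 0.8},
}


def joint_probability(assignment):
    """P(X1, ..., Xn) as the product over i of P(Xi | Parents(Xi))."""
    prob = 1.0
    for var, pars in parents.items():
        p_one = cpt[var][tuple(assignment[p] for p in pars)]
        prob *= p_one if assignment[var] == 1 else 1.0 - p_one
    return prob


names = list(parents)
# Sanity check: the joint distribution sums to one over all assignments.
total = sum(
    joint_probability(dict(zip(names, values)))
    for values in product([0, 1], repeat=len(names))
)
all_ones = joint_probability({"Smoking": 1, "LungCancer": 1, "Coughing": 1})
```

Each variable needs only a table indexed by its parents, which is what keeps the representation compact compared with a full joint table.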
Causal discovery from “observational data”
Example algorithm: PC (Peter Spirtes and Clark Glymour, 1999)
Let A, B, C ∈ X and V ⊆ X. Initialize with a fully connected un-oriented graph.
1. Conditional independence. Cut the connection A — B if ∃ V s.t. (A ⊥ B | V).
2. Colliders. In triplets A — C — B where A and B are not connected, if there is no subset V containing C s.t. A ⊥ B | V, orient edges as: A → C ← B.
3. Constraint-propagation. Orient edges until no change:
(i) If A → B → … → C, and A — C, then A → C.
(ii) If A → B — C, then B → C.
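Steps 1 and 2 above can be sketched in a few lines if an independence oracle is available. This is a brute-force illustration, not the PC implementation: it searches all conditioning sets exhaustively (the real algorithm restricts and orders them), it omits step 3, and the names `pc_sketch`, `subsets` and `oracle` are hypothetical. In practice the oracle would be replaced by a statistical test on data.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of items as a frozenset (exhaustive, small graphs only)."""
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def pc_sketch(variables, indep):
    """Steps 1 and 2, given an oracle indep(a, b, cond) -> bool."""
    # Start from a fully connected undirected graph.
    adj = {v: set(variables) - {v} for v in variables}
    # Step 1: cut A - B if some conditioning set makes them independent.
    for a, b in combinations(variables, 2):
        others = [v for v in variables if v not in (a, b)]
        if any(indep(a, b, cond) for cond in subsets(others)):
            adj[a].discard(b)
            adj[b].discard(a)
    # Step 2: orient A -> C <- B when A and B are non-adjacent and
    # no set containing C separates them.
    arrows = set()
    for c in variables:
        for a, b in combinations(sorted(adj[c]), 2):
            if b in adj[a]:
                continue
            rest = [v for v in variables if v not in (a, b, c)]
            if not any(indep(a, b, cond | {c}) for cond in subsets(rest)):
                arrows.update({(a, c), (b, c)})
    return adj, arrows


def oracle(a, b, cond):
    """Hypothetical oracle for the true graph A -> C <- B."""
    return {a, b} == {"A", "B"} and "C" not in cond


adj, arrows = pc_sketch(["A", "B", "C"], oracle)
```

On this three-variable example the sketch removes the A — B edge and orients the two remaining edges into C, recovering the collider.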
Computational and statistical complexity
Computing the full causal graph poses:
• Computational challenges (intractable for large numbers of variables).
• Statistical challenges (difficulty of estimating conditional probabilities for many variables with few samples).
Compromise:
• Develop algorithms with good average-case performance, tractable for many real-life datasets.
• Abandon learning the full causal graph and instead develop methods that learn a local neighborhood.
• Abandon learning the fully oriented causal graph and instead develop methods that learn unoriented graphs.
A prototypical MB algo: HITON
Aliferis-Tsamardinos-Statnikov, 2003
[Figure: causal graph around the target Y.]
1 – Identify variables with direct edges to the target (parents/children)
Iteration 1: add A
Iteration 2: add B
Iteration 3: remove A because A ⊥ Y | B
etc.
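The add/remove iterations can be sketched as an interleaved forward-backward loop. This is a rough illustration under stated assumptions, not the published HITON code: candidates are taken in a given order, independence is decided by a hypothetical oracle rather than a statistical test, and the names `parents_children_sketch` and `oracle` are invented for the example.

```python
from itertools import combinations


def parents_children_sketch(target, candidates, indep):
    """Grow a tentative parents/children set, pruning after each addition."""
    pc = []
    for x in candidates:
        pc.append(x)  # forward step: tentatively admit the candidate
        for member in list(pc):  # backward step: re-check every member
            rest = [v for v in pc if v != member]
            separated = any(
                indep(member, target, frozenset(cond))
                for size in range(len(rest) + 1)
                for cond in combinations(rest, size)
            )
            if separated:
                pc.remove(member)
    return pc


def oracle(x, y, cond):
    """Hypothetical oracle for the chain A -> B -> Y: A is independent of Y given B."""
    return x == "A" and "B" in cond


pc_of_y = parents_children_sketch("Y", ["A", "B"], oracle)
```

The run mirrors the iterations listed above: A is admitted first, B is admitted next, and A is then dropped because conditioning on B separates it from Y.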
2 – Repeat the algorithm for the parents and children of Y (get depth-two relatives)
3 – Remove non-members of the MB
A member A of PCPC that is not in PC is a member of the Markov blanket if there is some member B of PC such that A becomes conditionally dependent on Y when conditioning on B together with any subset of the remaining variables.
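The membership rule in step 3 can be written down directly, again with an independence oracle standing in for a statistical test. The example graph (Y → C ← S and Y → K → G), the oracle, and the function name `find_spouses` are all hypothetical, and "any subset" is read here as "every subset".

```python
from itertools import combinations


def find_spouses(target, pc, pcpc, all_vars, indep):
    """Keep A from PCPC \\ PC if, for some B in PC, A stays dependent on the
    target given B plus every subset of the remaining variables."""
    spouses = []
    for a in pcpc:
        if a in pc or a == target:
            continue
        for b in pc:
            remaining = [v for v in all_vars if v not in (a, b, target)]
            always_dependent = all(
                not indep(a, target, frozenset(cond) | {b})
                for size in range(len(remaining) + 1)
                for cond in combinations(remaining, size)
            )
            if always_dependent:
                spouses.append(a)
                break
    return spouses


def oracle(x, y, cond):
    """Hypothetical oracle for the graph Y -> C <- S and Y -> K -> G."""
    if x == "G":
        return "K" in cond      # K blocks the only path from Y to G
    if x == "S":
        return "C" not in cond  # the collider C is closed unless conditioned on
    return False


all_vars = ["Y", "C", "K", "S", "G"]
pc = ["C", "K"]
spouses = find_spouses("Y", pc, ["S", "G"], all_vars, oracle)
markov_blanket = pc + spouses
```

Here S is kept because conditioning on the common child C always leaves it dependent on Y, while the grandchild G is discarded because K separates it from Y.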
[Figure: target Y with nodes labeled Collider and Spouse.]
4 – Orient edges
1. Colliders:
• The presence of a spouse determines a collider.
• The target may also be a collider (B and C become dependent given Y).
2. Orient remaining edges.
Additional Bells and Whistles
• The basic algorithms make simplifying assumptions:
– Faithfulness (any conditional independence between two variables results in an absence of direct edge).
– Causal sufficiency (there are no unobserved common causes of the observed variables).
• Laura E. Brown & Ioannis Tsamardinos:
– Violations of “faithfulness”: select product of features.
– Violations of “causal sufficiency”: use Y structures.
Discussion
Top ranking methods
• According to the rules of the challenge:
– Yin-Wen Chang: SVM => best prediction accuracy on REGED and CINA.
– Gavin Cawley: Causal explorer + linear ridge regression ensembles => best prediction accuracy on SIDO and MARTI.
• According to pairwise comparisons:
– Jianxin Yin and Prof. Zhi Geng’s group: Partial Orientation and Local Structural Learning => best on Pareto front, new original causal discovery algorithm.
Pairwise comparisons
Gavin Cawley
Yin-Wen Chang
Mehreen Saeed
Alexander Borisov
E. Mwebaze & J. Quinn
H. Jair Escalante
J.G. Castellano
Chen Chu An
Louis Duclos-Gosselin
Cristian Grozea
H.A. Jen
J. Yin & Z. Geng Gr.
Jinzhu Jia
Jianming Jin
L.E.B & Y.T.
M.B.
Vladimir Nikulin
Alexey Polovinkin
Marius Popescu
Ching-Wei Wang
Wu Zhili
Florin Popescu
CaMML Team
Nistor Grozavu
Causal vs. non-causal
Jianxin Yin: causal
Vladimir Nikulin: non-causal
Using manip-MB as feature set using a causal model
Unmanipulated (training)
Manipulation #1 (test)
Heuristic: (1) Use the post-manipulation MB as feature set; (2) train a classifier to predict Y on training data (from the unmanipulated distribution).
Manipulation #2 (test)
Problem: Manipulated children of the target may remain in the post-manipulation MB (if they are also spouses) but with a different dependency to the target.
MB is not the best feature set?
Some features outside the MB may enhance predictivity if:
a. Some MB features go undetected (e.g. the direct causes are children of a common ancestor).
b. The predictor is too “weak” (e.g. the relationship to the target is non-linear but the predictor is linear).
[Figure, panels (a) and (b): variables X, Y, Z, with y = a x² and z = x².]
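Case (b) can be made concrete with the relations from the figure, y = a x² and z = x². The sketch below uses made-up values for a and x, chosen symmetric around zero, and a hand-written Pearson correlation as a stand-in for what a linear predictor can exploit.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu_u = sum(u) / len(u)
    mu_v = sum(v) / len(v)
    cov = sum((p - mu_u) * (q - mu_v) for p, q in zip(u, v))
    var_u = sum((p - mu_u) ** 2 for p in u)
    var_v = sum((q - mu_v) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)


a = 3.0                           # hypothetical coefficient
x = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical samples, symmetric around zero
y = [a * xi ** 2 for xi in x]     # target: y = a x^2
z = [xi ** 2 for xi in x]         # extra feature: z = x^2

corr_xy = pearson(x, y)  # linear association between X and the target
corr_zy = pearson(z, y)  # linear association between Z and the target
```

X fully determines Y, yet its linear correlation with Y vanishes on symmetric data, whereas Z is perfectly linearly related to Y. A linear predictor therefore benefits from Z even though X already carries all the information.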
Insensitivity to irrelevant features
Simple univariate predictive model, binary target and features, all relevant features correlate perfectly with the target, all irrelevant features randomly drawn. With 98% confidence, abs(feat_weight) < w and Σi wi xi < v.
ng: number of “good” (relevant) features
nb: number of “bad” (irrelevant) features
m: number of training examples
Conclusion
• Causal discovery from observational data is not an impossible task, but a very hard one.
• This points to the need for further research and benchmarks.
• Don’t miss the “pot-luck challenge”
http://clopinet.com/causality
1) Causal Feature Selection. I. Guyon, C. Aliferis, A. Elisseeff. In “Computational Methods of Feature Selection”, Huan Liu and Hiroshi Motoda Eds., Chapman and Hall/CRC Press, 2007.
2) Design and Analysis of the Causation and Prediction Challenge. I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, A. Statnikov. JMLR workshop proceedings, in press.