TRANSCRIPT
Graphical Causal Models
Clark Glymour
Carnegie Mellon University
Florida Institute for Human and Machine Cognition
Outline
Part I: Goals and the Miracle of d-separation
Part II: Statistical/Machine Learning Search and Discovery Methods for Causal Relations
Part III: A Bevy of Causal Analysis Problems
I. Brains, Trains, and Automobiles: Cognitive Neuroscience as Reverse Auto Mechanics
Idea: Like autos, like trains, like computers, brains have parts.
The parts influence one another to produce a behavior.
The parts can have roles in multiple behaviors.
Big parts have littler parts.
I. Goals of the Automobile Hypothesis
Overall goals:
Identify the parts critical to behaviors of interest.
Figure out how they influence one another, in what timing sequences.
Imaging goals:
Identify relatively BIG parts (ROIs).
Figure out how they influence one another, with what timing sequences, in producing behaviors of interest.
I. Goal: From Data to Mechanisms
[Figure: multivariate time series (A, B, C, D) mapped to causal relations among neurally localized variables (X, Y, Z, W).]
I. Graphical Causal Models: the Abstract Structure of Influences
[Figure: brake-system causal chain. Push brake -> fluid level in master cylinder -> fluid in caliper / fluid in wheel cylinder -> friction of pads against rotor / friction of shoe against wheel -> vehicle deceleration.]
This system is deterministic (we hope).
I. Philosophical Objections
"Cause" is a vague, metaphysical notion.
Response: Compare "probability."
"Probability" has a mathematical structure. "Causation" does not.
Response: See Spirtes, et al., Causation, Prediction and Search, 1993, 2000; Pearl, Causality, 2000. Listen to Pearl's lecture this afternoon.
The real causes are at the synaptic level, so talk of ROIs as causes is nonsense.
"…for many this rhetoric represents a category error…because causal [sic] is an attribute of the state equation." (Friston, et al., 2007, 602.)
Response: So, do you think "smoking causes cancer" is nonsense? "Human activities cause global temperature increases" is nonsense? "Turning the ignition key causes the car to start" is nonsense?
I. The Abstract Structure of Influences
Linear causal models (SEMs) specify a directed graphical structure:
MedFG(b) := a CING(b) + e1
STG(b) := b CING(b) + e2
IPL(b) := c STG(b) + d CING(b) + e3
e1, e2, e3 jointly independent
S. Hanson, et al., 2008. Middle Occipital Gyrus (mog), Inferior Parietal Lobule (ipl), Middle Frontal Gyrus (mfg), and Inferior Frontal Gyrus (ifg).
But so does any functional form of the influences:
MedFG(b) := f(CING(b)) + e1
STG(b) := g(CING(b)) + e2
IPL(b) := h(STG(b), CING(b)) + e3
e1, e2, e3 jointly independent
This system is not deterministic.
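To see the implied conditional independence concretely, here is a minimal simulation of the linear SEM above. The coefficient values and noise scales are illustrative assumptions, not from the slide; the check is that MedFG is correlated with STG and IPL marginally, but not after conditioning on CING (partial correlation computed from regression residuals).

```python
import math
import random

random.seed(0)
n = 20000
a, b, c, d = 0.8, 0.7, 0.6, 0.5          # illustrative coefficients

cing  = [random.gauss(0, 1) for _ in range(n)]
medfg = [a * u + random.gauss(0, 1) for u in cing]
stg   = [b * u + random.gauss(0, 1) for u in cing]
ipl   = [c * s + d * u + random.gauss(0, 1) for s, u in zip(stg, cing)]

def corr(u, v):
    m = len(u)
    mu, mv = sum(u) / m, sum(v) / m
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((x - mv) ** 2 for x in v))
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / (su * sv)

def residual(y, x):
    """Residual of the OLS regression of y on x (with intercept)."""
    m = len(y)
    mx, my = sum(x) / m, sum(y) / m
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)
    return [yi - my - beta * (xi - mx) for xi, yi in zip(x, y)]

# MedFG and STG are marginally associated (common cause CING) ...
print(round(corr(medfg, stg), 3))
# ... but the partial correlations given CING are near zero, exactly
# as the graph's d-separation facts imply.
print(round(corr(residual(medfg, cing), residual(stg, cing)), 3))
print(round(corr(residual(medfg, cing), residual(ipl, cing)), 3))
```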
I. So What?
1. The directed graph codes the conditional independence relations implied by the model:
MedFG(b) ⊥⊥ {STG(b), IPL(b)} | CING(b).
2. (Almost) all of our tests of models are tests of implications of their conditional independence claims.
So what is the code?
I. d-separation Is the Code!
[Figure: a directed graph over X, Y, Z, W, R, S illustrating the relations below.]
X ⊥⊥ {Z, W} | Y
X ⊥⊥ W | Z
NOT X ⊥⊥ W | R
NOT X ⊥⊥ W | S
NOT X ⊥⊥ W | {Y, Z, R}
NOT X ⊥⊥ W | {Y, Z, S}
Conditioning on a variable in a directed path between X and W blocks the association produced by that path.
Conditioning on a variable that is a common descendant of X and W creates a path that produces an association between X and W.
J. Pearl, 1988
What about systems with cycles? d-separation characterizes conditional independence relations in all such linear systems.
P. Spirtes, 1996
I. How to Determine If Variables A and Z Are Independent Conditional on a Set Q of Variables
1. Consider each sequence p of edge-adjacent variables (each edge taken in either direction), without self-intersections, terminating in A and Z.
2. A collider on p is a variable N on p such that the variables M, O adjacent to N on p each have edges directed into N: M -> N <- O.
3. Sequence (path) p creates a dependency between A and Z conditional on Q if and only if:
   1. No non-collider on p is in Q.
   2. Every collider on p is in Q or has a descendant in Q (a directed path from the collider to a member of Q).
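The three-step recipe can be implemented directly on a small DAG. The graph below is a hypothetical example chosen to be consistent with the independence facts listed on the previous slide (a chain X -> Y -> Z -> W, a collider X -> R <- W, and R -> S); the code enumerates simple paths and applies the blocking rules:

```python
def descendants(graph, node):
    """All nodes reachable from node along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(graph, a, z):
    """Step 1: every self-intersection-free sequence of edge-adjacent
    variables (edges taken in either direction) from a to z."""
    nbrs = {}
    for u, children in graph.items():
        for v in children:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    def walk(path):
        if path[-1] == z:
            yield path
            return
        for v in nbrs.get(path[-1], ()):
            if v not in path:
                yield from walk(path + [v])
    yield from walk([a])

def d_separated(graph, a, z, q):
    """True iff no path between a and z creates a dependency given q."""
    q = set(q)
    for path in simple_paths(graph, a, z):
        blocked = False
        for i in range(1, len(path) - 1):
            m, node, o = path[i - 1], path[i], path[i + 1]
            # Step 2: node is a collider on this path iff m -> node <- o.
            collider = node in graph.get(m, ()) and node in graph.get(o, ())
            if collider:
                # Step 3.2: a collider must be in q or have a descendant in q.
                if node not in q and not (descendants(graph, node) & q):
                    blocked = True
                    break
            elif node in q:
                # Step 3.1: a non-collider in q blocks the path.
                blocked = True
                break
        if not blocked:
            return False
    return True

# Hypothetical graph consistent with the slide's list of facts:
g = {"X": ["Y", "R"], "Y": ["Z"], "Z": ["W"], "W": ["R"], "R": ["S"]}
print(d_separated(g, "X", "W", {"Z"}))  # chain blocked, collider closed
print(d_separated(g, "X", "W", {"R"}))  # conditioning on the collider opens it
print(d_separated(g, "X", "W", {"S"}))  # a collider's descendant also opens it
```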
II. So, What Can We Do With It?
Exploit d-separation in conjunction with distribution assumptions to estimate graphical causal structure from sample data.
Understand when data analysis and measurement methods distort conditional independence relations in target systems.
Wrong conditional independence relations => wrong d-separation relations => wrong causal structure.
II. Simple Illustration (PC)
Truth: [Figure: the true graph over X, Y, Z, W.]
Consequences: X ⊥⊥ Z; {X, Z} ⊥⊥ W | Y.
Method: [Figure: PC's stages. Start with the complete undirected graph over X, Y, Z, W; remove edges between pairs that test independent conditional on some subset; then orient what can be oriented.]
Spirtes, Glymour, & Scheines (1993). Causation, Prediction, & Search, Springer Lecture Notes in Statistics.
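A sketch of the PC adjacency (skeleton) phase on simulated data, assuming the truth is the collider structure X -> Y <- Z with Y -> W, which is consistent with the consequences listed above. Independence is judged by thresholding partial correlations computed with the standard recursion; the threshold, coefficients, and sample size are arbitrary choices, and the full PC algorithm conditions only on adjacent variables, which this sketch skips:

```python
import math
import random
from itertools import combinations

# Assumed truth: X -> Y <- Z, Y -> W (illustrative unit coefficients).
random.seed(1)
n = 20000
X = [random.gauss(0, 1) for _ in range(n)]
Z = [random.gauss(0, 1) for _ in range(n)]
Y = [x + z + random.gauss(0, 1) for x, z in zip(X, Z)]
W = [y + random.gauss(0, 1) for y in Y]
data = {"X": X, "Y": Y, "Z": Z, "W": W}
names = sorted(data)

def corr(u, v):
    m = len(u)
    mu, mv = sum(u) / m, sum(v) / m
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((a - mv) ** 2 for a in v))
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def pcorr(i, j, S):
    """Partial correlation by the standard recursion."""
    if not S:
        return corr(data[i], data[j])
    k, rest = S[0], S[1:]
    rij, rik, rjk = pcorr(i, j, rest), pcorr(i, k, rest), pcorr(j, k, rest)
    return (rij - rik * rjk) / math.sqrt((1 - rik ** 2) * (1 - rjk ** 2))

def indep(i, j, S, eps=0.03):   # crude threshold test; eps is arbitrary
    return abs(pcorr(i, j, list(S))) < eps

# Skeleton phase: start complete, drop an edge as soon as its endpoints
# test independent conditional on some subset of the other variables.
edges = {frozenset(p) for p in combinations(names, 2)}
sepset = {}
for i, j in combinations(names, 2):
    others = [v for v in names if v not in (i, j)]
    done = False
    for size in range(len(others) + 1):
        for S in combinations(others, size):
            if indep(i, j, S):
                edges.discard(frozenset((i, j)))
                sepset[frozenset((i, j))] = set(S)
                done = True
                break
        if done:
            break

print(sorted(tuple(sorted(e)) for e in edges))
# Collider step: X and Z are nonadjacent and Y is not in their
# separating set, so PC orients X -> Y <- Z.
print("Y" in sepset[frozenset(("X", "Z"))])
```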
II. Bayesian Search: Greedy Equivalence Search (GES)
Truth: [Figure: the true graph over X, Y, Z, W.]
1. Start with empty graph.
2. Add or change the edge that most increases fit.
3. Iterate.
Data -> model with highest posterior probability.
Chickering and Meek, Uncertainty in Artificial Intelligence Proceedings, 2003.
II. With Unknown, Unrecorded Confounders: FCI
Truth: [Figure: a graph over X, Y, Z, W with an unrecorded variable as a common cause.]
Data -> FCI output.
FCI is a consistent estimator under i.i.d. sampling (Spirtes, et al., Causation, Prediction and Search), but in other cases it is often uninformative.
II. Overlapping Databases: ION
Truth: [Figure: a graph over X, W, R, Y, Z, S.]
[Table: two datasets, D1 and D2, each recording an overlapping subset of the variables W, X, Y, Z, S, R; sample rows omitted.]
The ION algorithm recovers the full graph! But in other cases it often generates a number of alternative models.
Danks, Tillman and Glymour, NIPS, 2008.
II. Time Series (Structural VAR)
Basic idea: PC- or GES-style search on "relative" time-slices.
Additive, non-linear model of climate teleconnections (5 ocean indices; 563-month series).
Chu & Glymour, 2008, Journal of Machine Learning Research.
II. Discovering Latent Variables
Truth: [Figure: latent variables T1, T2, T3, each with measured indicators among M1–M12 (T1: M1–M4; T2: M5–M8; T3: M9–M12).]
Cluster the M's using a heuristic or Build Pure Clusters (Silva, et al., JMLR, 2006).
[Figure: recovered clusters, e.g., {M1, M2, M3}, {M5, M6}, {M9, M10, M11, M12}, assigned to T1, T2, T3.]
Apply GES.
Applicable to time series?
II. Limits of PC and GES
With i.i.d. samples and correct distribution families, PC and GES give correct information almost surely in the large sample limit, assuming no unrecorded common causes.
Works with "random effects" for linear models.
But does not give all the information we want: often cannot determine the directions of influences!
Can post-process with an exhaustive test for all orientations (a heuristic).
Adjacencies are more reliable than directions of edges.
[Figure: X -> Y -> Z predicts the same independencies as several other orientations of the same X–Y–Z skeleton; all of these graphs are d-separation equivalent.]
II. Breaking Down d-separation Equivalence: LiNGAM
Linear equations (reduced):
X = e_X
Y = a_X X + e_Y
Z = b_X X + b_Y Y + e_Z
Graphical representation: X -> Y, X -> Z, Y -> Z.
Discoverable by LiNGAM (ICA + algebra)!
Disturbance terms must be non-Gaussian.
Shimizu, et al. (2006), Journal of Machine Learning Research.
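A toy illustration of why non-Gaussian disturbances break the d-separation tie. This is not the ICA-based LiNGAM algorithm of Shimizu et al., just a pairwise sketch under assumed uniform noise: regressing in the true direction leaves an OLS residual independent of the regressor, regressing in the wrong direction does not, and a crude nonlinear dependence score (correlation of squares) detects the difference:

```python
import math
import random

random.seed(2)
n = 20000
x = [random.uniform(-1, 1) for _ in range(n)]       # non-Gaussian cause
y = [xi + random.uniform(-1, 1) for xi in x]        # y := x + non-Gaussian noise

def corr(u, v):
    m = len(u)
    mu, mv = sum(u) / m, sum(v) / m
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((a - mv) ** 2 for a in v))
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def residual(dep, pred):
    """OLS residual of dep regressed on pred (with intercept)."""
    m = len(dep)
    mp, md = sum(pred) / m, sum(dep) / m
    beta = sum((p - mp) * (d - md) for p, d in zip(pred, dep)) / \
           sum((p - mp) ** 2 for p in pred)
    return [d - md - beta * (p - mp) for p, d in zip(pred, dep)]

def dependence(pred, dep):
    """Crude nonlinear dependence between regressor and OLS residual:
    |corr| of their squares, which is zero when they are independent."""
    r = residual(dep, pred)
    return abs(corr([p * p for p in pred], [e * e for e in r]))

print(round(dependence(x, y), 3))   # correct direction: near zero
print(round(dependence(y, x), 3))   # wrong direction: clearly nonzero
```

With Gaussian disturbances both scores would be near zero and the direction would stay unidentifiable, which is the point of the non-Gaussianity requirement.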
II. Feedback Systems
Truth: [Figure: a cyclic graph over X, Y, Z, W.]
Two methods:
Modified LiNGAM: Lacerda, Spirtes, & Hoyer (2008). Discovering cyclic causal models by independent component analysis. UAI.
Conditional independencies: Richardson & Spirtes (1999). Discovery of linear cyclic models.
II. Missed Opportunities?
None of the machine learning/statistical methods in Part II have been used with imaging data. Instead:
Trial and error guessing and data fitting
Regression
Granger causality for time series
Exhaustive testing of all linear models
How come?
Unfamiliarity
The machine learning/statistical methods respect what it is possible to learn (in the large sample limit), which is often less than researchers want to conclude.
III. Simple Possible Errors
Pooling data from different subjects: if X and Y are independent in population P1 and in population P2, but have different probability distributions in the two populations, then X and Y are usually not independent in P1 ∪ P2. (G. Yule, 1904.)
Pooling data from different time points in fMRI series: if the series is not stationary, data are being pooled as above. One can remove trends, but that doesn't guarantee stationarity.
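Yule's pooling effect is easy to reproduce: simulate two populations in which X and Y are independent but differently located (the distributions below are illustrative assumptions), and compare within-population to pooled correlations:

```python
import math
import random

random.seed(3)
n = 10000

def corr(u, v):
    m = len(u)
    mu, mv = sum(u) / m, sum(v) / m
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((a - mv) ** 2 for a in v))
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

# Population P1: X and Y independent, centered at 0.
x1 = [random.gauss(0, 1) for _ in range(n)]
y1 = [random.gauss(0, 1) for _ in range(n)]
# Population P2: X and Y independent, both shifted to mean 3.
x2 = [random.gauss(3, 1) for _ in range(n)]
y2 = [random.gauss(3, 1) for _ in range(n)]

print(round(corr(x1, y1), 2), round(corr(x2, y2), 2))  # each near zero
print(round(corr(x1 + x2, y1 + y2), 2))                # pooled: strongly positive
```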
III. Eliminating Opportunities
Removing autocorrelation by regression interferes with discovering feedback between variables.
Data manipulations that tend to make variables Gaussian:
Spatial smoothing
Variables defined by principal components or averages over ROIs
These eliminate or reduce the possibility of taking advantage of LiNGAM algorithms.
III. Simple Limitations
Testing all models (e.g., with LISREL chi-square) is a consistent search method for linear, Gaussian models (folk theorem).
But it is not feasible except for very small numbers of variables; e.g., for 8 variables there are 3^28 = 22,876,792,454,961 directed graphs (one of three states for each of the 28 variable pairs).
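Checking the arithmetic, and showing that exhaustive scoring is feasible only at a tiny scale: the sketch below enumerates all 25 DAGs on three variables and scores each with a linear-Gaussian BIC (the data-generating chain and its coefficients are illustrative assumptions). Markov-equivalent DAGs tie up to noise, so only adjacencies are meaningful in the winner:

```python
import math
import random
from itertools import combinations, product

# The count: one of three states (->, <-, absent) for each of the
# 8*7/2 = 28 variable pairs.
print(3 ** 28)

# Exhaustive scoring IS feasible for 3 variables (25 DAGs).
random.seed(4)
n = 5000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [0.8 * x + random.gauss(0, 1) for x in X]
Z = [0.8 * y + random.gauss(0, 1) for y in Y]
data = {"X": X, "Y": Y, "Z": Z}
names = ["X", "Y", "Z"]

def rss(node, parents):
    """Residual sum of squares of the OLS of node on its parents."""
    y = data[node]
    cols = [[1.0] * n] + [data[p] for p in parents]
    k = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         for i in range(k)]
    bvec = [sum(a * t for a, t in zip(cols[i], y)) for i in range(k)]
    for i in range(k):                      # Gaussian elimination
        for j in range(i + 1, k):
            f = A[j][i] / A[i][i]
            A[j] = [aj - f * ai for aj, ai in zip(A[j], A[i])]
            bvec[j] -= f * bvec[i]
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (bvec[i] - sum(A[i][j] * beta[j]
                                 for j in range(i + 1, k))) / A[i][i]
    return sum((yi - sum(bb * col[t] for bb, col in zip(beta, cols))) ** 2
               for t, yi in enumerate(y))

def bic(dag):
    score = 0.0
    for node in names:
        parents = [p for p in names if (p, node) in dag]
        score += -n / 2 * math.log(rss(node, parents) / n) \
                 - (len(parents) + 1) / 2 * math.log(n)
    return score

pairs = list(combinations(names, 2))
cycles = [{("X", "Y"), ("Y", "Z"), ("Z", "X")},
          {("Y", "X"), ("Z", "Y"), ("X", "Z")}]
dags = []
for states in product((None, ">", "<"), repeat=3):
    edges = set()
    for (u, v), s in zip(pairs, states):
        if s == ">":
            edges.add((u, v))
        elif s == "<":
            edges.add((v, u))
    if edges not in cycles:
        dags.append(edges)

best = max(dags, key=bic)
# Typically the winner's adjacencies are exactly {X, Y} and {Y, Z}.
print(sorted(sorted(e) for e in best))
```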
III. Not So Simple Possible Errors: Variables Defined on ROIs as Proxies for Latent Variables
[Figure: latent chain X -> Y -> Z with measured proxies A, B, C.]
X is independent of Z conditional on Y.
But unless B is a perfect measure of Y, A is not independent of C conditional on B.
So if A, B, and C are taken as "proxies" for X, Y, and Z, a regression of C on A and B will find, correctly, that X has an indirect influence on Z through Y, but also, incorrectly, that X has in addition a direct influence on Z not through Y.
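The proxy problem can be demonstrated numerically: with a chain X -> Y -> Z and a noisy proxy B for Y, the partial correlation of X and Z given Y itself vanishes, but given B it does not (coefficients and noise levels below are illustrative assumptions):

```python
import math
import random

random.seed(5)
n = 20000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [0.9 * x + random.gauss(0, 1) for x in X]
Z = [0.9 * y + random.gauss(0, 1) for y in Y]
B = [y + random.gauss(0, 1) for y in Y]       # imperfect proxy for Y

def corr(u, v):
    m = len(u)
    mu, mv = sum(u) / m, sum(v) / m
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((a - mv) ** 2 for a in v))
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def residual(dep, pred):
    """OLS residual of dep regressed on pred (with intercept)."""
    m = len(dep)
    mp, md = sum(pred) / m, sum(dep) / m
    beta = sum((p - mp) * (d - md) for p, d in zip(pred, dep)) / \
           sum((p - mp) ** 2 for p in pred)
    return [d - md - beta * (p - mp) for p, d in zip(pred, dep)]

# Conditioning on Y itself: X and Z are d-separated, partial corr ~ 0.
pc_true = corr(residual(X, Y), residual(Z, Y))
# Conditioning on the proxy B leaves a spurious "direct" association.
pc_proxy = corr(residual(X, B), residual(Z, B))
print(round(pc_true, 3), round(pc_proxy, 3))
```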
III. Not So Obvious Errors: Regression
Lots of forms: linear, polynomial, logistic, etc. All have the following features:
Prior separation of variables into an outcome, Y, and a set S of possible causes of Y: A, B, C, etc.
The regression estimate of the influence of A on Y is a measure of the association of A and Y conditional on all other variables in S.
Regression for causal effects always attempts to estimate the direct (relative to the other variables in S) influence of A on Y.
III. Regression to Estimate Causal Influence
Let V = {X, Y, T}, where
- Y: the measured outcome
- X = {X1, X2, …, Xn}: the measured regressors
- T = {T1, …, Tk}: latent common causes of pairs in X ∪ {Y}
Let the true causal model over V be a Structural Equation Model in which each V in V is a linear combination of its direct causes and independent, Gaussian noise.
III. Regression to Estimate Causal Influence
Consider the regression equation: Y = β0 + β1 X1 + β2 X2 + … + βn Xn.
Let the OLS regression estimate βi be the estimated causal influence of Xi on Y.
That is, hypothetically holding X \ Xi experimentally constant, βi is an estimate of the change in E(Y) that would result from an intervention that changes Xi by 1 unit.
Let the real causal influence of Xi on Y be bi.
When is the OLS estimate βi a consistent estimate of bi?
III. Regression Will Be "Inconsistent" When
1. There is an unrecorded common cause L of Y and Xi: Xi <- L -> Y, with Xi -> Y.
If X, Y are the only measured variables, PC, GES and FCI cannot determine whether the influence is from X to Y or from an unmeasured common cause, or both. LiNGAM can, if the disturbances are non-Gaussian.
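A minimal sketch of case 1: with an unrecorded common cause L, the OLS slope converges to a biased value rather than the true causal coefficient. The structural model below is an illustrative assumption (Y := 2·Xi + 3·L + noise, Xi := L + noise), for which the regression limit works out analytically to cov(Xi, Y)/var(Xi) = 7/2 = 3.5 instead of 2:

```python
import random

random.seed(6)
n = 20000
L  = [random.gauss(0, 1) for _ in range(n)]          # unrecorded common cause
Xi = [l + random.gauss(0, 1) for l in L]
Y  = [2 * x + 3 * l + random.gauss(0, 1) for x, l in zip(Xi, L)]

def ols_slope(x, y):
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
           sum((a - mx) ** 2 for a in x)

# True causal coefficient is 2; the open back-door through L inflates
# the OLS slope toward 3.5, and more data only makes it more precise.
print(round(ols_slope(Xi, Y), 2))
```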
III. Regression Will Be "Inconsistent" When
2. Cause and effect are confused: the influence actually runs Y -> Xi.
3. And that error can lead to others: regression then concludes that Xk is a cause of Y.
FCI, etc. do not make these errors.
"…one region, with a long haemodynamic latency, could cause a neuronal response in another that was expressed, haemodynamically, before the source." (Friston, et al., 2007, 602.) LiNGAM does not make this error.
Bad Regression Example
[Figure: the true model has latent common causes T1 and T2 of the regressors X1, X2, X3 and the outcome Y; the multiple regression result misstates which regressors influence Y.]
PC, GES, FCI get these kinds of cases right.
Regression Consistency
If
• Xi is d-separated from Y conditional on X \ Xi in the true graph after removing Xi -> Y, and
• X contains no descendant of Y,
then βi is a consistent estimate of bi.
III. Granger Causality
Idea: for stationary time series, X is a Granger cause of Y iff {…, Xt-1; …, Yt-1} predicts Yt better than {…, Yt-1} does.
Obvious generalizations:
Non-Gaussian time series.
Multiple time series (essentially the time series version of multiple regression): X is a Granger cause of Y iff Yt is not independent of …, Xt-1 conditional on covariates …, Zt-1.
Less obvious generalizations:
Non-linear time series (finding conditional independence tests is touchy).
C. Granger, Econometrica, 1969.
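A bare-bones bivariate Granger test, assuming a stationary one-lag system in which x drives y (the model and its coefficients are illustrative): fit the effect's autoregression with and without the candidate cause's lag and compare residual variances.

```python
import random

random.seed(7)
T = 5000
x, y = [0.0], [0.0]
for _ in range(T - 1):
    x_new = 0.5 * x[-1] + random.gauss(0, 1)
    y_new = 0.5 * y[-1] + 0.8 * x[-1] + random.gauss(0, 1)   # x drives y
    x.append(x_new)
    y.append(y_new)

def rss(target, predictors):
    """Residual sum of squares of OLS (with intercept), normal equations."""
    m = len(target)
    cols = [[1.0] * m] + predictors
    k = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         for i in range(k)]
    bvec = [sum(a * t for a, t in zip(cols[i], target)) for i in range(k)]
    for i in range(k):                      # Gaussian elimination
        for j in range(i + 1, k):
            f = A[j][i] / A[i][i]
            A[j] = [aj - f * ai for aj, ai in zip(A[j], A[i])]
            bvec[j] -= f * bvec[i]
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (bvec[i] - sum(A[i][j] * beta[j]
                                 for j in range(i + 1, k))) / A[i][i]
    return sum((t - sum(bb * col[s] for bb, col in zip(beta, cols))) ** 2
               for s, t in enumerate(target))

def granger_ratio(cause, effect):
    """Restricted / full residual variance at one lag; values well above
    1 mean the candidate cause's past helps predict the effect."""
    restricted = rss(effect[1:], [effect[:-1]])
    full = rss(effect[1:], [effect[:-1], cause[:-1]])
    return restricted / full

print(round(granger_ratio(x, y), 2))   # x Granger-causes y: clearly above 1
print(round(granger_ratio(y, x), 2))   # y does not Granger-cause x: near 1
```

In practice one would use an F-test on these two fits and select the lag order, which this sketch omits.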
GC All Over the Place
Goebel, R., Roebroeck, A., Kim, D., and Formisano, E. (2003). Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magnetic Resonance Imaging, 21: 125-161.
Chen, Y., Bressler, S.L., Knuth, K.H., Truccolo, W.A., Ding, M.Z. (2006). Stochastic modeling of neurobiological time series: power, coherence, Granger causality, and separation of evoked responses from ongoing activity. Chaos 16, 26-113.
Brovelli, A., Ding, M.Z., Ledberg, A., Chen, Y.H., Nakamura, R., Bressler, S.L. (2004). Beta oscillations in a large-scale sensorimotor cortical network: directional influences revealed by Granger causality. Proc. Natl. Acad. Sci. U.S.A. 101: 9849-9854.
Deshpande, G., Hu, X., Stilla, R., and Sathian, K. (2008). Effective connectivity during haptic perception: A study using Granger causality analysis of functional magnetic resonance imaging data. NeuroImage, 40: 1807-1814.
III. Problems with GC
• fMRI series with multiple conditions are not stationary. (May not always be serious.)
• GC can produce causal errors when there is measurement error or unmeasured confounding series.
Open research problem: find a consistent method to identify unrecorded common causes of time series, akin to Silva, et al., JMLR, 2006 for equilibrium data; Glymour and Spirtes, J. of Econometrics, 1988.
III. If Xt records an event occurring later than Yt+1, X may be mistakenly taken to be a cause of Y. (Friston, 2007, again.)
• This is a problem for regression;
• Not a problem if PC, FCI, GES or LiNGAM are used in estimating the "structural VAR," because they do not require a separation of variables into outcome and potential cause, or a time ordering of variables.
III. Granger Causality and Mechanisms
Neural signals occur faster than the fMRI sampling rate: what is going on in between?
[Figure: an unobserved fast time series X1…X4, Y1…Y4, Z1…Z4, W1…W4 generates the sampled data; the Granger causes among W, X, Y, Z then include spurious edges.]
III. Analysis of Residuals
Regress and apply PC, etc., to the residuals.
[Figure: the unobserved fast series again; regress on X1, Y1, Z1, W1 and search over the residuals of W, X, Y, Z.]
Swanson and Granger, JASA; Demiralp and Hoover (2003), Oxford Economic Bulletin.
Conclusion
Causal inference from imaging data is about as hard as it gets;
Conventional statistical procedures are radically insufficient tools;
Lots of unused, potentially relevant, principled tools in the machine learning literature;
Measurement methods and data transformations can alter the probability distributions in destructive ways;
Graphical causal models are the best available tool for thinking about the statistical constraints that causal hypotheses imply.
Things There Aren't:
Magic Wands
Pixie Dust
If You Forget Everything Else in This Talk, Remember This:
P. Spirtes, et al., Causation, Prediction and Search, Springer Lecture Notes in Statistics; 2nd edition, MIT Press, 2000.
J. Pearl, Causality, Oxford, 2000.
Uncertainty in Artificial Intelligence Annual Conference Proceedings
Journal of Machine Learning Research
Peter Spirtes' webpage
Judea Pearl's web page
The TETRAD webpage