THE MATHEMATICS OF CAUSE AND EFFECT:
With Reflections on Machine Learning
Judea Pearl
Departments of Computer Science and Statistics, UCLA

OUTLINE
1. The causal revolution: from statistics to counterfactuals
2. The fundamental laws of causal inference
3. From counterfactuals to practical victories
   a) policy evaluation
   b) attribution
   c) mediation
   d) generalizability (external validity)
   e) missing data

TRADITIONAL STATISTICAL INFERENCE PARADIGM
[Figure: Data, drawn from a joint distribution P, feed Inference, which targets Q(P), aspects of P]
e.g., Infer whether customers who bought product A would also buy product B: Q = P(B | A).
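Because Q = P(B | A) is an aspect of P alone, it can be estimated by counting co-occurrences in the data. A minimal Python sketch, assuming a hypothetical list of shopping baskets (the data and the helper `conditional_purchase` are illustrative, not from the talk):

```python
from fractions import Fraction

# Hypothetical purchase records: each basket is the set of products bought.
baskets = [{"A", "B"}, {"A"}, {"A", "B"}, {"B"}, {"C"}, {"A", "B", "C"}]


def conditional_purchase(baskets, given, target):
    """Empirical P(target | given): among baskets containing `given`,
    the fraction that also contain `target`."""
    with_given = [b for b in baskets if given in b]
    if not with_given:
        raise ValueError(f"no basket contains {given!r}")
    return Fraction(sum(target in b for b in with_given), len(with_given))


print(conditional_purchase(baskets, "A", "B"))  # 3/4
```

Nothing in this computation says what would happen to P(B | A) if the price of A changed; answering that requires knowing how P itself changes.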

FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES
[Figure: Data are drawn from a joint distribution P; a change turns P into a new joint distribution P′; Inference targets Q(P′), aspects of P′]
How does P change to P′? New oracle.
e.g., Estimate P′(cancer) if we ban smoking.
e.g., Estimate the probability that a customer who bought A would buy B if we were to double the price.

THE STRUCTURAL MODEL PARADIGM
[Figure: a data-generating model M induces the joint distribution, from which the data are drawn; Inference targets Q(M), aspects of M]
M: an invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to variables in the analysis.
“A painful de-crowning of a beloved oracle!”

WHAT KIND OF QUESTIONS SHOULD THE ORACLE ANSWER?
THE CAUSAL HIERARCHY (a syntactic distinction)
• Observational questions: “What if we see A?” (What is?) $P(y \mid A)$
• Action questions: “What if we do A?” (What if?) $P(y \mid do(A))$
• Counterfactual questions: “What if we did things differently?” (Why?) $P(y_{A'} \mid A)$
• Options: “With what probability?”

STRUCTURAL CAUSAL MODELS: THE WORLD AS A COLLECTION OF SPRINGS
Definition: A structural causal model is a 4-tuple <V, U, F, P(u)>, where
• V = {V1, ..., Vn} are endogenous variables,
• U = {U1, ..., Um} are background variables,
• F = {f1, ..., fn} are functions determining V, vi = fi(v, u),
• P(u) is a distribution over U.
P(u) and F induce a distribution P(v) over observable variables.
e.g., [example model shown as a figure; not recovered]
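To make the 4-tuple concrete, here is a minimal Python sketch, assuming a hypothetical two-variable model (x = u1, y = x XOR u2) with made-up probabilities for U. It enumerates every u and pushes P(u) through F to obtain the induced P(v):

```python
from fractions import Fraction
from itertools import product

# A hypothetical SCM <V, U, F, P(u)> used only for illustration:
#   V = {X, Y}, U = {U1, U2} (independent binary background variables),
#   F: x = u1,  y = x XOR u2  (Y copies X unless the noise term U2 flips it).
P_U = {"U1": Fraction(1, 2), "U2": Fraction(1, 5)}  # P(U_i = 1)
F = {
    "X": lambda v, u: u["U1"],
    "Y": lambda v, u: v["X"] ^ u["U2"],
}
ORDER = ["X", "Y"]  # a causal (topological) order for solving the equations


def solve(F, u):
    """Return the solution of the structural equations for U = u."""
    v = {}
    for name in ORDER:
        v[name] = F[name](v, u)
    return v


def induced_distribution(F):
    """P(v) = sum of P(u) over all u whose solution is v."""
    dist = {}
    for u1, u2 in product((0, 1), repeat=2):
        p = (P_U["U1"] if u1 else 1 - P_U["U1"]) * (P_U["U2"] if u2 else 1 - P_U["U2"])
        v = solve(F, {"U1": u1, "U2": u2})
        key = (v["X"], v["Y"])
        dist[key] = dist.get(key, 0) + p
    return dist


for (x, y), p in sorted(induced_distribution(F).items()):
    print(f"P(X={x}, Y={y}) = {p}")
```

Solving the equations in a fixed order assumes the model is recursive (acyclic), which holds for this toy example.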

COUNTERFACTUALS ARE EMBARRASSINGLY SIMPLE
Definition: The sentence “Y would be y (in situation u), had X been x,” denoted $Y_x(u) = y$, means: the solution for Y in a mutilated model $M_x$ (i.e., the equations for X replaced by X = x) with input U = u is equal to y.
The Fundamental Equation of Counterfactuals:
$$Y_x(u) = Y_{M_x}(u)$$
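The definition translates almost line by line into code: copy the model, overwrite the equation for X with the constant x, and re-solve with the same u. A minimal sketch, assuming a hypothetical toy model (x = u1, y = x XOR u2) with made-up P(u); `mutilate`, `counterfactual` and `prob_counterfactual` are illustrative names. The last function weighs each unit u by P(u | evidence) to obtain a counterfactual probability of the kind $P(y_{x'} \mid x)$ in the causal hierarchy:

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy SCM: x = u1, y = x XOR u2, with illustrative P(U_i = 1).
P_U = {"U1": Fraction(1, 2), "U2": Fraction(1, 5)}
F = {
    "X": lambda v, u: u["U1"],
    "Y": lambda v, u: v["X"] ^ u["U2"],
}
ORDER = ["X", "Y"]


def solve(F, u):
    v = {}
    for name in ORDER:
        v[name] = F[name](v, u)
    return v


def mutilate(F, var, value):
    """M_x: copy of the model with var's equation replaced by var = value."""
    G = dict(F)
    G[var] = lambda v, u: value
    return G


def counterfactual(F, target, var, value, u):
    """Y_x(u): the solution for `target` in M_x with input U = u."""
    return solve(mutilate(F, var, value), u)[target]


def prob_counterfactual(F, target, var, value, evidence):
    """P(target_{var=value} = 1 | evidence), weighting each u by P(u | evidence)."""
    num = den = Fraction(0)
    for u1, u2 in product((0, 1), repeat=2):
        u = {"U1": u1, "U2": u2}
        p = (P_U["U1"] if u1 else 1 - P_U["U1"]) * (P_U["U2"] if u2 else 1 - P_U["U2"])
        if all(solve(F, u)[k] == val for k, val in evidence.items()):
            den += p
            num += p * counterfactual(F, target, var, value, u)
    return num / den


u = {"U1": 1, "U2": 0}
print(solve(F, u)["Y"])                               # factual Y(u): 1
print(counterfactual(F, "Y", "X", 0, u))              # Y_{X=0}(u): 0
print(prob_counterfactual(F, "Y", "X", 0, {"X": 1}))  # P(Y_{X=0}=1 | X=1): 1/5
```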

THE TWO FUNDAMENTAL LAWS OF CAUSAL INFERENCE
1. The Law of Counterfactuals
   (M generates and evaluates all counterfactuals.)
2. The Law of Conditional Independence (d-separation)
   (Separation in the model ⇒ independence in the distribution.)

THE LAW OF CONDITIONAL INDEPENDENCE
[Figure: the sprinkler model over C (Climate), R (Rain), S (Sprinkler) and W (Wetness), each with its own background variable U1, U2, U3, U4]
Each function summarizes millions of micro processes.
Still, if the U's are independent, the observed distribution P(C,R,S,W) must satisfy certain constraints that are (1) independent of the f's and of P(U), and (2) readable from the structure of the graph.
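The claim that these constraints do not depend on the f's or on P(U) can be checked numerically. The sketch below assumes the usual structure of the sprinkler example (C → R, C → S, R → W, S → W, i.e. c = f1(u1), r = f2(c, u2), s = f3(c, u3), w = f4(r, s, u4)); it draws 200 arbitrary binary functions and arbitrary P(U) values, computes P(C,R,S,W) exactly, and tests one implied constraint, R ⊥ S | C. Function names are illustrative:

```python
import random
from fractions import Fraction
from itertools import product


def random_sprinkler_scm(rng):
    """Draw arbitrary f1..f4 (as lookup tables) and an arbitrary P(U)."""
    f1 = {u: rng.randint(0, 1) for u in (0, 1)}                     # c = f1(u1)
    f2 = {k: rng.randint(0, 1) for k in product((0, 1), repeat=2)}  # r = f2(c, u2)
    f3 = {k: rng.randint(0, 1) for k in product((0, 1), repeat=2)}  # s = f3(c, u3)
    f4 = {k: rng.randint(0, 1) for k in product((0, 1), repeat=3)}  # w = f4(r, s, u4)
    p_u = [Fraction(rng.randint(1, 9), 10) for _ in range(4)]       # P(U_i = 1)
    return f1, f2, f3, f4, p_u


def joint(scm):
    """Exact P(c, r, s, w) induced by independent U's and the functions."""
    f1, f2, f3, f4, p_u = scm
    dist = {}
    for us in product((0, 1), repeat=4):
        p = Fraction(1)
        for ui, pi in zip(us, p_u):
            p *= pi if ui else 1 - pi
        c = f1[us[0]]
        r = f2[(c, us[1])]
        s = f3[(c, us[2])]
        w = f4[(r, s, us[3])]
        dist[(c, r, s, w)] = dist.get((c, r, s, w), 0) + p
    return dist


def marginal(dist, idx):
    """Marginalize onto the coordinates in idx (0=C, 1=R, 2=S, 3=W)."""
    out = {}
    for key, p in dist.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0) + p
    return out


def r_indep_s_given_c(dist):
    """P(c,r,s) P(c) == P(c,r) P(c,s) for all values, i.e. R _||_ S | C."""
    crs, c1, cr, cs = (marginal(dist, i) for i in [(0, 1, 2), (0,), (0, 1), (0, 2)])
    return all(
        crs.get((c, r, s), 0) * c1.get((c,), 0) == cr.get((c, r), 0) * cs.get((c, s), 0)
        for c, r, s in product((0, 1), repeat=3)
    )


rng = random.Random(0)
print(all(r_indep_s_given_c(joint(random_sprinkler_scm(rng))) for _ in range(200)))  # True
```

Exact rational arithmetic (`Fraction`) makes this an identity check rather than a statistical test.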

D-SEPARATION: NATURE’S LANGUAGE FOR COMMUNICATING ITS STRUCTURE
[Figure: the sprinkler graph over C (Climate), R (Rain), S (Sprinkler), W (Wetness)]
Every missing arrow advertises an independency, conditional on a separating set.
Applications:
1. Model testing
2. Structure learning
3. Reducing "what if I do" questions to symbolic calculus
4. Reducing scientific questions to symbolic calculus
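Reading independencies off a graph is a purely graphical computation. One standard way to implement the d-separation test (a textbook method swapped in here, not necessarily the one used in the talk) is the ancestral moral graph criterion. A minimal Python sketch, again assuming the sprinkler structure C → R, C → S, R → W, S → W:

```python
from itertools import combinations


def ancestors(parents, nodes):
    """`nodes` together with every node that has a directed path into them."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen


def d_separated(parents, xs, ys, zs):
    """True if every path between xs and ys is blocked by zs.

    The DAG is given as {child: [parents]}. Keep the ancestors of
    xs | ys | zs, marry parents of a common child, drop directions,
    delete zs, and check whether any x can still reach a y.
    """
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    adj = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):  # moralize: connect co-parents
            adj[a].add(b)
            adj[b].add(a)
    blocked, targets = set(zs), set(ys)
    frontier = [x for x in xs if x not in blocked]
    reached = set(frontier)
    while frontier:
        n = frontier.pop()
        if n in targets:
            return False
        for m in adj[n] - blocked - reached:
            reached.add(m)
            frontier.append(m)
    return True


# Sprinkler graph: C -> R, C -> S, R -> W, S -> W
sprinkler = {"R": ["C"], "S": ["C"], "W": ["R", "S"]}
print(d_separated(sprinkler, {"R"}, {"S"}, {"C"}))       # True
print(d_separated(sprinkler, {"R"}, {"S"}, {"C", "W"}))  # False: W is a collider
print(d_separated(sprinkler, {"C"}, {"W"}, {"R", "S"}))  # True
```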
SEEING VS. DOING
Effect of turning the sprinkler ON
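A small numerical sketch of the distinction, with hypothetical parameters for the sprinkler model and the simplification W = R or S. Seeing the sprinkler on is evidence about the season, and hence about rain; turning it on deletes the factor P(s | c) (the truncated factorization) and leaves the belief in rain untouched:

```python
from fractions import Fraction
from itertools import product

# Hypothetical parameters for the sprinkler model (illustrative only).
P_C = Fraction(1, 2)                             # P(C=1), e.g. wet season
P_R = {1: Fraction(4, 5), 0: Fraction(1, 10)}    # P(R=1 | C=c)
P_S = {1: Fraction(1, 10), 0: Fraction(1, 2)}    # P(S=1 | C=c)


def p_w(r, s):
    """P(W=1 | r, s): here simply W = R or S."""
    return Fraction(1) if (r or s) else Fraction(0)


def bern(p, x):
    return p if x else 1 - p


def joint(do_s=None):
    """P(c, r, s, w); with do_s set, drop P(s | c) and clamp S = do_s."""
    dist = {}
    for c, r, s, w in product((0, 1), repeat=4):
        if do_s is not None and s != do_s:
            continue
        p = bern(P_C, c) * bern(P_R[c], r) * bern(p_w(r, s), w)
        if do_s is None:
            p *= bern(P_S[c], s)
        dist[(c, r, s, w)] = p
    return dist


def prob(dist, event, given=lambda v: True):
    """P(event | given) for predicates over (c, r, s, w)."""
    den = sum(p for v, p in dist.items() if given(v))
    num = sum(p for v, p in dist.items() if given(v) and event(v))
    return num / den


def rain(v):
    return v[1] == 1


seeing = prob(joint(), rain, given=lambda v: v[2] == 1)  # P(R=1 | S=1)
doing = prob(joint(do_s=1), rain)                         # P(R=1 | do(S=1))
print(seeing, doing)  # 13/60 9/20
```

Observing S = 1 lowers the probability of rain, because in this parameterization sprinklers mostly run in the dry season; forcing S = 1 leaves it at its prior value P(R = 1).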

THE LOGIC OF CAUSAL ANALYSIS
[Figure: causal assumptions A and queries of interest Q enter a causal model M_A. Causal inference yields A* (logical implications of A), Q(P) (identified estimands) and T(M_A) (testable implications). Statistical inference combines these with data D to produce estimates of Q(P), provisional claims, and model testing (goodness of fit).]
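
THE MACHINERY OF CAUSAL CALCULUS
Rule 1 (ignoring observations): $P(y \mid do(x), z, w) = P(y \mid do(x), w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}}}$
Rule 2 (action/observation exchange): $P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\underline{Z}}}$
Rule 3 (ignoring actions): $P(y \mid do(x), do(z), w) = P(y \mid do(x), w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\,\overline{Z(W)}}}$
Here $G_{\overline{X}}$ deletes the arrows entering X, $G_{\underline{Z}}$ deletes the arrows leaving Z, and Z(W) is the set of Z-nodes that are not ancestors of any W-node in $G_{\overline{X}}$.
Completeness Theorem (Shpitser, 2006): the three rules suffice to derive every identifiable causal effect.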
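
DERIVATION IN CAUSAL CALCULUS
[Figure: Smoking → Tar → Cancer, with an unobserved Genotype affecting both Smoking and Cancer]
$$
\begin{aligned}
P(c \mid do(s)) &= \sum_t P(c \mid do(s), t)\, P(t \mid do(s)) && \text{Probability axioms}\\
&= \sum_t P(c \mid do(s), do(t))\, P(t \mid do(s)) && \text{Rule 2}\\
&= \sum_t P(c \mid do(s), do(t))\, P(t \mid s) && \text{Rule 2}\\
&= \sum_t P(c \mid do(t))\, P(t \mid s) && \text{Rule 3}\\
&= \sum_{s'} \sum_t P(c \mid do(t), s')\, P(s' \mid do(t))\, P(t \mid s) && \text{Probability axioms}\\
&= \sum_{s'} \sum_t P(c \mid t, s')\, P(s' \mid do(t))\, P(t \mid s) && \text{Rule 2}\\
&= \sum_{s'} \sum_t P(c \mid t, s')\, P(s')\, P(t \mid s) && \text{Rule 3}
\end{aligned}
$$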
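The last line is the front-door formula. As a sanity check, the sketch below assumes a hypothetical parameterization of the smoking/tar/cancer model (all numbers are illustrative), computes the true $P(c \mid do(s))$ from the full model with the hidden genotype included, and compares it with the front-door expression evaluated on the observational distribution of (s, t, c) alone:

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: unobserved Genotype G affects Smoking S and Cancer C;
# S affects C only through Tar T. All numbers are illustrative.
P_G = Fraction(1, 3)                                     # P(G=1)
P_S = {0: Fraction(1, 5), 1: Fraction(4, 5)}             # P(S=1 | g)
P_T = {0: Fraction(1, 10), 1: Fraction(7, 10)}           # P(T=1 | s)
P_C = {(0, 0): Fraction(1, 10), (0, 1): Fraction(1, 2),  # P(C=1 | t, g)
       (1, 0): Fraction(3, 10), (1, 1): Fraction(9, 10)}


def bern(p, x):
    return p if x else 1 - p


# Observational distribution of the observed variables (s, t, c); G summed out.
obs = {}
for g, s, t, c in product((0, 1), repeat=4):
    p = bern(P_G, g) * bern(P_S[g], s) * bern(P_T[s], t) * bern(P_C[(t, g)], c)
    obs[(s, t, c)] = obs.get((s, t, c), 0) + p


def P(**fixed):
    """Observational probability that the named variables take the given values."""
    return sum(p for (s, t, c), p in obs.items()
               if all({"s": s, "t": t, "c": c}[k] == v for k, v in fixed.items()))


def front_door(s, c=1):
    """sum_t P(t|s) sum_{s'} P(c|t,s') P(s'), using observational data only."""
    return sum(
        P(s=s, t=t) / P(s=s)
        * sum(P(s=s2, t=t, c=c) / P(s=s2, t=t) * P(s=s2) for s2 in (0, 1))
        for t in (0, 1)
    )


def true_effect(s, c=1):
    """P(c | do(s)) from the full model, hidden G included (truncated factorization)."""
    return sum(bern(P_G, g) * bern(P_T[s], t) * bern(P_C[(t, g)], c)
               for g, t in product((0, 1), repeat=2))


def naive(s, c=1):
    """P(c | s): confounded by the hidden genotype."""
    return P(s=s, c=c) / P(s=s)


print(true_effect(1), front_door(1), naive(1))  # 21/50 21/50 3/5
```

The front-door estimate agrees exactly with the interventional quantity, while the naive conditional P(c | s) is inflated by the hidden genotype.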

EFFECT OF WARM-UP ON INJURY (after Shrier & Platt, 2008)
[Figure: causal diagram not recovered; annotated “No, no!”]

TRANSPORTABILITY OF KNOWLEDGE ACROSS DOMAINS (with E. Bareinboim)
1. A theory of causal transportability
   When can causal relations learned from experiments be transferred to a different environment in which no experiment can be conducted?
2. A theory of statistical transportability
   When can statistical information learned in one domain be transferred to a different domain in which
   a. only a subset of variables can be observed? Or,
   b. only a few samples are available?

EXTERNAL VALIDITY (how transportability is seen in other sciences)
• Extrapolation across studies requires “some understanding of the reasons for the differences.” (Cox, 1958)
• “‘External validity’ asks the question of generalizability: To what population, settings, treatment variables, and measurement variables can this effect be generalized?” (Shadish, Cook and Campbell, 2002)
• “An experiment is said to have ‘external validity’ if the distribution of outcomes realized by a treatment group is the same as the distribution of outcome that would be realized in an actual program.” (Manski, 2007)
• “A threat to external validity is an explanation of how you might be wrong in making a generalization.” (Trochim, 2006)

MOVING FROM THE LAB TO THE REAL WORLD . . .
[Figure: a lab model over X, Y, Z, W and two hypotheses, H1 and H2, about the real-world model over the same variables]
• Everything is assumed to be the same: trivially transportable!
• Everything is assumed to be different: not transportable!

MOTIVATION: WHAT CAN EXPERIMENTS IN LA TELL ABOUT NYC?
Experimental study in LA. Measured: X (intervention), Y (outcome), Z (age).
Observational study in NYC. Measured: X (observation), Y (outcome), Z (age).
Needed: the effect of X on Y in NYC, $P^*(y \mid do(x))$.
Transport formula (calibration):
$$P^*(y \mid do(x)) = \sum_z P(y \mid do(x), z)\, P^*(z)$$
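A minimal sketch of the calibration step, assuming the age-only story (the populations differ only in their age distribution, and age is not affected by X) and made-up numbers. The age-specific effects come from the LA experiment and are reweighted by the NYC age distribution $P^*(z)$; all names and values are hypothetical:

```python
from fractions import Fraction

# Hypothetical inputs (illustrative numbers, not from the talk).
# Age-specific effects from the LA experiment: P(y=1 | do(x), z).
P_Y_DO_X_GIVEN_Z = {
    ("treated", "young"): Fraction(3, 10), ("treated", "old"): Fraction(6, 10),
    ("control", "young"): Fraction(2, 10), ("control", "old"): Fraction(5, 10),
}
# Age distributions: P(z) in LA and P*(z) in NYC (from the observational study).
P_Z_LA = {"young": Fraction(7, 10), "old": Fraction(3, 10)}
P_Z_NYC = {"young": Fraction(3, 10), "old": Fraction(7, 10)}


def transport(x, p_y_do_x_given_z, p_z_target):
    """sum_z P(y=1 | do(x), z) * P*(z): reweight z-specific effects by the target's P*(z)."""
    return sum(p_y_do_x_given_z[(x, z)] * pz for z, pz in p_z_target.items())


print(transport("treated", P_Y_DO_X_GIVEN_Z, P_Z_LA))   # 39/100
print(transport("treated", P_Y_DO_X_GIVEN_Z, P_Z_NYC))  # 51/100
```

With these hypothetical numbers the treated-group risk differs between the two cities only because their age mixes differ; the age-specific effects are carried over unchanged.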

TRANSPORT FORMULAS DEPEND ON THE STORY
a) Z represents age
b) Z represents language skill
c) Z represents a bio-marker
[Figure: three selection diagrams, (a), (b) and (c), over X, Y and Z; in each, S marks the factors producing differences between the populations, and the corresponding transport formula is posed as a question (“?”)]

GOAL: ALGORITHM TO DETERMINE IF AN EFFECT IS TRANSPORTABLE
[Figure: annotated causal graph over X, Y, Z, W, V, U and T; selection nodes S mark the factors creating differences]
INPUT: Annotated causal graph
OUTPUT:
1. Transportable or not?
2. Measurements to be taken in the experimental study
3. Measurements to be taken in the target population
4. A transport formula

TRANSPORTABILITY REDUCED TO CALCULUS
Theorem: A causal relation R is transportable from Π to Π* if and only if it is reducible, using the rules of do-calculus, to an expression in which S is separated from do( ).
[Figure: selection diagram over X, Y, Z, W with selection node S]

RESULT: ALGORITHM TO DETERMINE IF AN EFFECT IS TRANSPORTABLE
[Figure: the same annotated causal graph; selection nodes S mark the factors creating differences]
INPUT: Annotated causal graph
OUTPUT:
1. Transportable or not?
2. Measurements to be taken in the experimental study
3. Measurements to be taken in the target population
4. A transport formula
5. Completeness (Bareinboim, 2012)

WHICH MODEL LICENSES THE TRANSPORT OF THE CAUSAL EFFECT X → Y?
[Figure: six selection diagrams, (a) to (f), over X and Y (with Z and W in some panels); S marks the external factors creating disparities. The verdicts shown on the slide read “Yes, Yes, No” and “Yes, No, Yes”.]

STATISTICAL TRANSPORTABILITY (Transfer Learning)
Why should we transport statistical information? That is, why not re-learn things from scratch?
1. Measurements are costly. Limit measurements to a subset V* of variables called the “scope”.
2. Samples are scarce. Pooling samples from diverse populations will improve precision, if differences can be filtered out.

STATISTICAL TRANSPORTABILITY
Definition (Statistical Transportability): A statistical relation R(P) is said to be transportable from Π to Π* over V* if R(P*) is identified from P, P*(V*), and D, where P*(V*) is the marginal distribution of P* over a subset of variables V*.
[Figure: two selection diagrams over X, Y, Z with selection node S]
Example: R = P*(y | x) is transportable over V* = {X, Z}, i.e., R is estimable without re-measuring Y.
Transfer learning: If few samples (N2) are available from Π* and many samples (N1) from Π, then estimating R = P*(y | x) by
$$R = \sum_z P^*(z \mid x)\, P(y \mid x, z)$$
achieves a much higher precision.

META-ANALYSIS OR MULTI-SOURCE LEARNING
Target population: R = P*(y | do(x))
[Figure: nine diagrams, (a) to (i), over X, Y, Z and W, describing the target population and the available studies; selection nodes S mark where a study population differs from the target]

CAN WE GET A BIAS-FREE ESTIMATE OF THE TARGET QUANTITY?
Target population: R = P*(y | do(x))
[Figure: diagrams (a), (d), (h) and (i) over X, Y, Z, W, with selection nodes S in the study diagrams]
Is R identifiable from (d) and (h)?
• R(Π*) is identifiable from studies (d) and (h).
• R(Π*) is not identifiable from studies (d) and (i).

FROM META-ANALYSIS TO META-SYNTHESIS
The problem: How to combine the results of several experimental and observational studies, each conducted on a different population and under a different set of conditions, so as to construct an aggregate measure of effect size that is "better" than any one study in isolation.

META-SYNTHESIS REDUCED TO CALCULUS
Theorem: Let {Π1, Π2, …, ΠK} be a set of studies and {D1, D2, …, DK} their selection diagrams (relative to Π*). A relation R(Π*) is "meta estimable" if it can be decomposed into terms Q1, …, QK such that each Qk is transportable from Dk.
Open problem: systematic decomposition.

BIAS VS. PRECISION IN META-SYNTHESIS
Principle 1: Calibrate estimands before pooling (to minimize bias).
Principle 2: Decompose to sub-relations before calibrating (to improve precision).
Target: $R = P^*(y \mid do(x))$
[Figure: studies (a), (d), (g), (h) and (i) over X, Y, Z, W with selection nodes S; each study's estimand is calibrated, then the calibrated estimates are pooled]
• Pooling: $P^*_{(\text{all})}(y \mid do(x))$
• Composition, pooling a sub-relation: $P^*_{(d,i)}(w \mid do(x))$
[Figure: the same studies (a), (d), (g), (h) and (i)]

MISSING DATA: A SEEMINGLY STATISTICAL PROBLEM (Mohan & Pearl, 2012)
• Pervasive in every experimental science.
• Huge literature, powerful software industry, deeply entrenched culture.
• Current practices are based on a statistical characterization (Rubin, 1976) of a problem that is inherently causal.
• Consequence: like alchemy before Boyle and Dalton, the field is craving (1) theoretical guidance and (2) performance guarantees.

ESTIMATE P(X,Y,Z)
Observations X*, Y*, Z* (m = missing) and missingness indicators Rx, Ry, Rz (1 = missing):

| Sample # | X* | Y* | Z* | Rx | Ry | Rz |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 1 | 0 | 0 | 0 |
| 3 | 1 | m | m | 0 | 1 | 1 |
| 4 | 0 | 1 | m | 0 | 0 | 1 |
| 5 | m | 1 | m | 1 | 0 | 1 |
| 6 | m | 0 | 1 | 1 | 0 | 0 |
| 7 | m | m | 0 | 1 | 1 | 0 |
| 8 | 0 | 1 | m | 0 | 0 | 1 |
| 9 | 0 | 0 | m | 0 | 0 | 1 |
| 10 | 1 | 0 | m | 0 | 0 | 1 |
| 11 | 1 | 0 | 1 | 0 | 0 | 0 |

WHAT CAN CAUSAL THEORY DO FOR MISSING DATA?
Q-1. What should the world be like for a given statistical procedure to produce the expected result?
Q-2. Can we tell from the postulated world whether any method can produce a bias-free result? How?
Q-3. Can we tell from data if the world does not work as postulated?
• None of these questions can be answered by a statistical characterization of the problem.
• All can be answered using causal models.

MISSING DATA: TWO PERSPECTIVES
Causal inference is a missing data problem. (Rubin, 2012)
Missing data is a causal inference problem. (Pearl, 2012)
Why is missingness a causal problem?
• Which mechanism causes missingness makes a difference in whether and how we can recover information from the data.
• Mechanisms require causal language to be properly described; statistics is not sufficient.
• Different causal assumptions lead to different routines for recovering information from data, even when the assumptions are indistinguishable by any statistical means.

ESTIMATE P(X,Y,Z)
Missingness graph: each partially observed variable has a proxy; for X, the graph contains X → X* ← Rx, with X* = X if Rx = 0 and X* = m if Rx = 1.

NAIVE ESTIMATE OF P(X,Y,Z)
Complete cases:

| Row # | X | Y | Z | Rx | Ry | Rz |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 1 | 0 | 0 | 0 |
| 11 | 1 | 0 | 1 | 0 | 0 | 0 |

• Line deletion estimate is generally biased.
[Figure: missingness graph over X, Y, Z and Rx, Ry, Rz, labeled MCAR]

SMART ESTIMATE OF P(X,Y,Z)
[Figure: missingness graph over Z, X, Y and the indicators Rx, Ry, Rz]

SMART ESTIMATE OF P(X,Y,Z)
Compute each factor from the rows in which the variables it involves are observed.

Step 1: compute P(Y | Ry=0) from the rows with Y observed:

| Row # | Y* |
|---|---|
| 1 | 0 |
| 2 | 0 |
| 4 | 1 |
| 5 | 1 |
| 6 | 0 |
| 8 | 1 |
| 9 | 0 |
| 10 | 0 |
| 11 | 0 |

Step 2: compute P(X | Y, Rx=0, Ry=0) from the rows with X and Y observed:

| Row # | X* | Y* |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 1 | 0 |
| 4 | 0 | 1 |
| 8 | 0 | 1 |
| 9 | 0 | 0 |
| 10 | 1 | 0 |
| 11 | 1 | 0 |

Step 3: compute P(Z | X, Y, Rx=0, Ry=0, Rz=0) from the complete rows:

| Row # | X* | Y* | Z* |
|---|---|---|---|
| 1 | 1 | 0 | 0 |
| 2 | 1 | 0 | 1 |
| 11 | 1 | 0 | 1 |

The estimate is the product of the three factors, $P(X,Y,Z) = P(Z \mid X, Y, R_x{=}0, R_y{=}0, R_z{=}0)\, P(X \mid Y, R_x{=}0, R_y{=}0)\, P(Y \mid R_y{=}0)$, a factorization licensed by the missingness graph of this model.
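The three steps can be run directly on the 11-row table. A minimal Python sketch (helper names are illustrative; `None` encodes the entry m). It assumes, as the slides do, that the missingness graph licenses the factorization, and compares the result with listwise deletion for one cell:

```python
from fractions import Fraction

# The 11 samples from the slides as (X*, Y*, Z*); None stands for "m" (missing),
# so R_V = 1 exactly when the entry for V is None.
ROWS = [
    (1, 0, 0), (1, 0, 1), (1, None, None), (0, 1, None),
    (None, 1, None), (None, 0, 1), (None, None, 0), (0, 1, None),
    (0, 0, None), (1, 0, None), (1, 0, 1),
]
X, Y, Z = 0, 1, 2  # column indices


def observed(rows, cols):
    """Rows in which every column in cols is observed (R = 0 for each)."""
    return [r for r in rows if all(r[i] is not None for i in cols)]


def cond_prob(rows, target, given):
    """Empirical P(target | given); target and given map column -> value."""
    matches = [r for r in rows if all(r[i] == v for i, v in given.items())]
    if not matches:
        return None  # no data in this conditioning cell
    hits = [r for r in matches if all(r[i] == v for i, v in target.items())]
    return Fraction(len(hits), len(matches))


def naive_estimate(x, y, z):
    """Listwise deletion: the empirical distribution of the complete cases."""
    return cond_prob(observed(ROWS, [X, Y, Z]), {X: x, Y: y, Z: z}, {})


def smart_estimate(x, y, z):
    """P(z | x, y, Rx=Ry=Rz=0) * P(x | y, Rx=Ry=0) * P(y | Ry=0)."""
    p_y = cond_prob(observed(ROWS, [Y]), {Y: y}, {})
    p_x = cond_prob(observed(ROWS, [X, Y]), {X: x}, {Y: y})
    p_z = cond_prob(observed(ROWS, [X, Y, Z]), {Z: z}, {X: x, Y: y})
    if p_y is None or p_x is None or p_z is None:
        return None
    return p_z * p_x * p_y


print(naive_estimate(1, 0, 1), smart_estimate(1, 0, 1))  # 2/3 16/45
```

With only 11 rows the numbers are purely illustrative; some conditioning cells are empty (no complete case has X = 0, Y = 0), and the function then returns `None`.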

INESTIMABLE P(X,Y,Z)
[Figure: a missingness graph over Z, X, Y and Rx, Ry, Rz under which P(X,Y,Z) cannot be recovered]

RECOVERABILITY FROM MISSING DATA
Definition: Given a missingness model M, a probabilistic quantity Q is said to be recoverable if there exists an algorithm that produces a consistent estimate of Q for every dataset generated by M. (That is, in the limit of large samples, Q is estimable as if no data were missing.)
Theorem: Q is recoverable iff it is decomposable into terms of the form Qj = P(Sj | Tj), such that:
• For each variable V that is in Tj, RV is also in Tj.
• For each variable V that is in Sj, RV is either in Tj or in Sj.
e.g., [example not recovered]
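The two conditions in the theorem are syntactic, so they can be checked mechanically for a proposed decomposition. A minimal sketch, assuming the hypothetical naming convention that the indicator of X is `Rx` and reading "RV in Tj" as conditioning on RV = 0. It checks only the form of each term; whether the product of the terms actually equals Q must still be established from the missingness graph:

```python
def indicator(v):
    """Hypothetical convention: the missingness indicator of variable X is 'Rx'."""
    return "R" + v.lower()


def admissible_term(S, T, partially_observed):
    """Check the two conditions for one term Q_j = P(S_j | T_j)."""
    S, T = set(S), set(T)
    # Every partially observed variable in T_j needs its indicator in T_j.
    if any(indicator(v) not in T for v in T & partially_observed):
        return False
    # Every partially observed variable in S_j needs its indicator in T_j or S_j.
    return all(indicator(v) in T or indicator(v) in S for v in S & partially_observed)


def admissible_decomposition(terms, partially_observed):
    return all(admissible_term(S, T, partially_observed) for S, T in terms)


PARTIAL = {"X", "Y", "Z"}
smart = [({"Z"}, {"X", "Y", "Rx", "Ry", "Rz"}),  # P(Z | X, Y, Rx=0, Ry=0, Rz=0)
         ({"X"}, {"Y", "Rx", "Ry"}),             # P(X | Y, Rx=0, Ry=0)
         ({"Y"}, {"Ry"})]                        # P(Y | Ry=0)
raw = [({"X", "Y", "Z"}, set())]                 # P(X, Y, Z) as a single term

print(admissible_decomposition(smart, PARTIAL))  # True
print(admissible_decomposition(raw, PARTIAL))    # False
```

The factorization used in the smart estimate passes, while the query itself, taken as one unconditioned term, does not.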

AN IMPOSSIBILITY THEOREM FOR MISSING DATA
• Two statistically indistinguishable models, yet P(X,Y) is recoverable in (a) and not in (b).
• No universal algorithm exists that decides recoverability (or guarantees unbiased results) without looking at the model.
[Figure: two missingness models, (a) and (b), over X, Y and Rx (“Missing (X)”); the variable labels shown include Accident, Injury, Treatment and Education]

NO ESTIMATION WITHOUT CAUSATION
• Two statistically indistinguishable models; P(X) is recoverable in both, but through two different methods. [formulas not recovered]
• No universal algorithm exists that produces an unbiased estimate whenever such exists.
[Figure: two models, (a) and (b), over X, Y and Rx]
CONCLUSIONS
• Counterfactuals, the building blocks of scientific thought, are encoded meaningfully and conveniently in structural models.
• Identifiability and testability are computable tasks.
• The counterfactual-graphical symbiosis has led to major advances in the empirical sciences, including policy evaluation, mediation analysis, generalizability, credit-blame determination, missing data, and heterogeneity.
• THINK NATURE, NOT DATA
Thank you