Characterising model classes by prime graphs and by statistical properties
Nanny Wermuth
Chalmers University of Technology, Sweden and Gutenberg-University, Germany
based on joint work with
Giovanni M. Marchetti, University of Florence, and David R. Cox, Oxford University
1 May, Cambridge 2017
1 / 24
Trust in empirical results increases
with the possibility of
• replications
• interventions
• a substantive understanding of the development
2 / 24
Consequences for statistical models
If the search for causal explanations motivates an empirical study, then an appropriate model should
– be consistent with some causal interpretation
– be checked against available knowledge in the field
– permit a tracing of the pathways of development
3 / 24
A simple example, taken from a longitudinal study, n=2339
designed to develop counselling of prospective students so that drop-out at a German University is better avoided
variables
A1:= average grade reached in years 11 to 13 at high school in median-dichotomized form (level 1: high achievement: 50%)
A2:= poor integration into high-school classes (yes: 10.3%)
A3:= high-school class repeated (yes: 34.2%)
A4:= change of primary school (yes: 20.1%)
A5:= education of father (at least high-school completed: 42.7%)
[Figure: graph with nodes 1, 2, 3, 4, 5]
4 / 24
Some general features
For nodes i ≠ j and c ⊆ N \ {i, j}, independence of Y_i and Y_j given Y_c leads to standard constraints on conditional densities: f_{i|jc} = f_{i|c} or f_{ij|c} = f_{i|c} f_{j|c}
A graph is edge-minimal if no edge can be removed without introducing another independence constraint in distributions generated over it
Edge-minimality is an essential, basic property for tracing pathways of dependence in graphs
5 / 24
Conditional relations in an edge-minimal graph
For {a, b, c, m} partitioning node set N, we write
a ⊥⊥ b | c
for Y_a conditionally independent of Y_b given Y_c
a ⋔ b | c
for Y_a conditionally dependent on Y_b given Y_c
In an edge-minimal graph, each missing edge means a conditional independence, each edge present means a conditional dependence
Notation ⊥⊥: Dawid 1979, JRSS B; ⋔: Wermuth & Sadeghi 2012, TEST
6 / 24
Factorizations, independences implied in the example
[Figure: two graphs, each with nodes 1, 2, 3, 4, 5]
in a very condensed notation:
f_N = f_{1|23} f_{23|45} f_{45} as well as
2 ⊥⊥ 4 | 35 and 3 ⊥⊥ 5 | 24
Thus, this chain graph is a block-recursive regression graph (LWF chain graph), where each node in a joint response depends on all of its neighbours
Markov-equivalence proven by Frydenberg (1984) Ann. Stat.; for a discussion of discrete LWF graphs see Drton (2009) Bernoulli
7 / 24
An old, beautiful result from graph theory
Theorem: Wagner and Halin (1962) Math Annalen. Every finite, simple, undirected graph breaks uniquely into its prime graphs
where
– prime graphs are subgraphs characterized by having no cut-set
– a cut-set is the smallest complete subgraph c which separates disjoint subsets a and b of the graph
Leimer (1993) Discr Math; Matúš (1994) WUPES94; Dethlefsen and Højsgaard (2005) Statist Softw
an efficient algorithm implemented in R gives a proper node-set elimination scheme, the prime graphs and the unique cut-sets
8 / 24
Types of prime graphs
cliques (maximal complete prime graphs)
chordless cycles
and more complex incomplete prime graphs
In incomplete prime graphs no node resides in a single clique
9 / 24
All chordal, incomplete graphs in three and four nodes
chordal graphs have exclusively cliques as prime graphs
a V, a diamond, a paw, a 3-leaf star and a single 3-edge path
the diamond has an edge as cut-set, all others have single nodes
outer nodes reside in a single clique and – with every proper single-node-elimination scheme – one gets a sequence of outer nodes and a Markov-equivalent directed acyclic graph (DAG)
Tarjan and Yannakakis (1984) SIAM J Computing
But DAGs cannot capture interventions affecting connected responses!
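The single-node elimination idea for chordal graphs can be sketched in a few lines. This is only an illustration, assuming that maximum cardinality search (the ordering rule of Tarjan and Yannakakis) is an acceptable stand-in for a proper single-node-elimination scheme; the function names and the two example graphs are hypothetical. Edges are directed from earlier to later visited nodes, and the check confirms that all parents of a node are mutually adjacent, which is what makes the resulting DAG Markov-equivalent to the undirected graph.

```python
def maximum_cardinality_search(adj):
    """Visit nodes so that each next node has the most already-visited neighbours."""
    visited = []
    remaining = set(adj)
    while remaining:
        # ties are broken by the smallest node label, for reproducibility
        nxt = max(sorted(remaining),
                  key=lambda v: sum(1 for u in adj[v] if u in visited))
        visited.append(nxt)
        remaining.remove(nxt)
    return visited


def orient_as_dag(adj, order):
    """Direct every edge from the earlier to the later node; return parent lists."""
    pos = {v: k for k, v in enumerate(order)}
    return {v: sorted(u for u in adj[v] if pos[u] < pos[v]) for v in order}


def parents_are_complete(adj, parents):
    """True if, for every node, all of its parents are mutually adjacent."""
    return all(u in adj[w]
               for ps in parents.values() for u in ps for w in ps if u != w)


# Hypothetical labelling: a diamond (chordal, cut-set is the edge 2-3)
diamond = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
# and a chordless 4-cycle (not chordal)
four_cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}

diamond_order = maximum_cardinality_search(diamond)
diamond_parents = orient_as_dag(diamond, diamond_order)
cycle_parents = orient_as_dag(four_cycle, maximum_cardinality_search(four_cycle))

print(diamond_order, diamond_parents)
print(parents_are_complete(diamond, diamond_parents))
print(parents_are_complete(four_cycle, cycle_parents))
```

For the diamond every parent set is complete, so no new independence is introduced by the orientation; for the chordless 4-cycle the same construction leaves one node with two non-adjacent parents.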
10 / 24
Traceability of distributions
To combine independences in quite general types of graph, one needs composition and intersection. Sadeghi & Lauritzen (2014) Bernoulli
To trace pathways of dependences, for distributions generated over an edge-minimal graph, singleton transitivity is needed in addition. Wermuth (2012) Int Statist Review
All three properties mimic those of regular Gaussian distributions. Lněnička and Matúš (2007) Kybernetika
11 / 24
Proper trees and hollow trees
a traditional or proper tree has edges as prime graphs and single nodes as cut-sets
a hollow tree has edges or cycles (triangles or chordless cycles) as prime graphs and single nodes or edges as cut-sets
For a hollow tree: if one replaces nodes by prime graphs and edges by cut-sets, then a ‘single path’ connects each pair of ‘nodes’
12 / 24
A break in the lecture
13 / 24
A general 2 × 2 table of probabilities
For effect coding of the levels, that is for levels (−1, 1)
            A2 = −1   A2 = 1
A1 = −1        α         γ
A1 =  1        β         δ
Pearson’s correlation coefficient, ρ, is not a function of the odds-ratio, (αδ)/(βγ), alone; hence it does not vary independently of the marginal distributions of A1, A2; see Edwards (1963) JRSS C
This alone speaks against linear relations being relevant, but...
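A small numerical check may make the point about margins concrete. The sketch below assumes the table layout with rows A1 and columns A2 and effect coding (−1, 1); the two example tables and the function names are hypothetical choices, picked so that both tables have odds-ratio 4.

```python
import math


def odds_ratio(alpha, beta, gamma, delta):
    """Cross-product ratio (alpha * delta) / (beta * gamma)."""
    return (alpha * delta) / (beta * gamma)


def pearson_2x2(alpha, beta, gamma, delta):
    """Pearson correlation of A1, A2 coded as -1, 1.

    Cell layout: Pr(A1=-1, A2=-1) = alpha, Pr(A1=1, A2=-1) = beta,
                 Pr(A1=-1, A2=1) = gamma, Pr(A1=1, A2=1) = delta.
    """
    m1 = (beta + delta) - (alpha + gamma)    # E[A1]
    m2 = (gamma + delta) - (alpha + beta)    # E[A2]
    m12 = (alpha + delta) - (beta + gamma)   # E[A1 * A2]
    return (m12 - m1 * m2) / math.sqrt((1 - m1 ** 2) * (1 - m2 ** 2))


symmetric = (1 / 3, 1 / 6, 1 / 6, 1 / 3)     # both margins (1/2, 1/2)
skewed = (8 / 12, 2 / 12, 1 / 12, 1 / 12)    # same odds-ratio, skewed margins

for table in (symmetric, skewed):
    print(odds_ratio(*table), pearson_2x2(*table))
```

Both tables share the odds-ratio, yet the correlation drops from 1/3 to roughly 0.26 once the margins are skewed.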
14 / 24
In a symmetric 2 × 2 table of probabilities
the variables have mean zero and unit variance so that a covariance coincides with Pearson’s correlation coefficient
            A2 = −1   A2 = 1   sum
A1 = −1        α         β     1/2
A1 =  1        β         α     1/2
sum           1/2       1/2     1
odds-ratio, also called cross-product ratio: odr = (α/β)^2
log-linear parameter: λ = (1/4) log(odr)
the correlation: ρ = 2(α − β) = tanh(λ); hence the correlation is a 1-1 function of the odds-ratio
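The three summaries of the symmetric table can be checked numerically. A minimal sketch, assuming only that α + β = 1/2 as in the table; the function name and the value α = 0.35 are illustrative.

```python
import math


def symmetric_2x2_summaries(alpha):
    """Odds-ratio, log-linear parameter and correlation of the symmetric 2x2 table."""
    beta = 0.5 - alpha              # each margin is 1/2, so alpha + beta = 1/2
    odr = (alpha / beta) ** 2       # odds-ratio (cross-product ratio)
    lam = 0.25 * math.log(odr)      # log-linear parameter
    rho = 2 * (alpha - beta)        # correlation under effect coding (-1, 1)
    return odr, lam, rho


odr, lam, rho = symmetric_2x2_summaries(0.35)
print(odr, lam, rho, math.tanh(lam))
```

The last two printed values agree up to rounding, which is the relation ρ = tanh(λ), or equivalently λ = tanh^{-1}(ρ).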
15 / 24
Two interesting oddities
For the symmetric 2 × 2 table
            A2 = −1   A2 = 1   sum
A1 = −1        α         β     1/2
A1 =  1        β         α     1/2
sum           1/2       1/2     1
the log-linear parameter λ coincides with Fisher’s (1924) z-transformation of the correlation coefficient: λ = tanh^{-1} ρ.
τ = tanh λ coincides with Yule’s (1912) ‘coefficient of colligation’; we call τ the hypetan interaction, from hype(rbolic) tan(gent)
16 / 24
The trivariate Ising model for symmetric variables
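            A3 = −1   A3 = −1   A3 = 1    A3 = 1
            A2 = −1   A2 = 1    A2 = −1   A2 = 1
A1 = −1        α         γ         δ         β
A1 =  1        β         δ         γ         α
here a palindromic property shows, just like in ‘step on no pets’, and
\[
\begin{pmatrix} \rho_{12} \\ \rho_{13} \\ \rho_{23} \end{pmatrix}
=
\begin{pmatrix} -1 & -1 & 0 \\ -1 & 0 & -1 \\ 0 & -1 & -1 \end{pmatrix}
\left\{ 4 \begin{pmatrix} \beta \\ \gamma \\ \delta \end{pmatrix}
 - \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}
\]
\[
\rho_{ij} = \tau_{ij \setminus k} = \frac{\tau_{ij} + \tau_{ik}\tau_{jk}}{1 + \tau_{ij}\tau_{ik}\tau_{jk}}
\]
and
the conditional correlation equals the partial correlation: ρ_{ij|k} = ρ_{ij.k}
17 / 24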
The palindromic Ising model in general
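for d binary random variables A_1, ..., A_d taking values −1, +1 and Pr(ω) = Pr(A_1 = ω_1, ..., A_d = ω_d), an Ising model for symmetric binary variables is
\[
\log \Pr(\omega) = \lambda_{\emptyset} + \sum_{s<t} \lambda_{st}\, \omega_s \omega_t, \qquad -\infty < \lambda_{st} < \infty
\]
equivalently, for τ_{st} = tanh(λ_{st}) and const = {2^d (1 + ∏_{s<t} τ_{st})}^{-1},
\[
\Pr(\omega) = \mathrm{const} \prod_{s<t} (1 + \tau_{st}\, \omega_s \omega_t), \qquad -1 < \tau_{st} < 1
\]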
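The product form lends itself to a brute-force check for small d. The sketch below is an illustration under stated assumptions: nodes are indexed from 0, the normalising constant is obtained by summing over all 2^d states rather than from a closed form, and the function names and τ values are hypothetical. It verifies the palindromic property Pr(ω) = Pr(−ω) and, for d = 3, the expression of a correlation in terms of the hypetan interactions.

```python
from itertools import combinations, product


def palindromic_ising(tau, d):
    """Return {omega: Pr(omega)} with Pr proportional to prod_{s<t} (1 + tau_st w_s w_t).

    tau maps pairs (s, t), s < t, 0-based, to values in (-1, 1);
    missing pairs are treated as tau_st = 0.
    """
    weights = {}
    for omega in product((-1, 1), repeat=d):
        w = 1.0
        for s, t in combinations(range(d), 2):
            w *= 1.0 + tau.get((s, t), 0.0) * omega[s] * omega[t]
        weights[omega] = w
    total = sum(weights.values())   # brute-force normalisation
    return {omega: w / total for omega, w in weights.items()}


def correlation(pr, i, j):
    """E[A_i A_j]; this is the correlation since means are 0 and variances 1."""
    return sum(p * omega[i] * omega[j] for omega, p in pr.items())


tau = {(0, 1): 0.5, (0, 2): -0.3, (1, 2): 0.2}
pr = palindromic_ising(tau, 3)

rho_01 = correlation(pr, 0, 1)
formula = (tau[0, 1] + tau[0, 2] * tau[1, 2]) / (1 + tau[0, 1] * tau[0, 2] * tau[1, 2])
print(rho_01, formula)
```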
18 / 24
A palindromic Ising model with Markov structure
is said to have been generated over an edge-minimal graph if
λ_{ij} = 0 ⇐⇒ i ⊥⊥ j | N \ {i, j}
and
λ_{ij} ≠ 0 ⇐⇒ i ⋔ j | N \ {i, j}
19 / 24
Palindromic Ising models generated over hollow trees
Theorem: A quadratic exponential distribution for symmetric binary variables is generated over an edge-minimal hollow tree if and only if
(1) the Markov structure of its graph is defined by the set of zeros in its overall partial correlation matrix
and
(2) within each of its prime graphs, all conditional correlations agree with the partial correlations
Corollary: The joint distribution of a palindromic Ising model with proper-tree structure can be generated with simple linear regressions by using any node-set elimination scheme
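The corollary can be illustrated on the smallest proper tree with a missing edge, the path 1 – 2 – 3. The sketch below is only an example under stated assumptions: the distribution is built sequentially with E[A2 | A1] = τ12 A1 and E[A3 | A2] = τ23 A2, nodes are indexed from 0 in the code, and the function names and τ values are hypothetical. It compares the result with the product form (constant 2^{-3}, since a tree has no cycles) and checks that the partial correlation for the missing edge (1, 3) is zero.

```python
import math
from itertools import product


def chain_via_regressions(tau12, tau23):
    """Joint distribution on the path 1 - 2 - 3, generated node by node."""
    pr = {}
    for a1, a2, a3 in product((-1, 1), repeat=3):
        p1 = 0.5                                  # A1 uniform on {-1, 1}
        p2_given_1 = 0.5 * (1 + tau12 * a1 * a2)  # linear regression of A2 on A1
        p3_given_2 = 0.5 * (1 + tau23 * a2 * a3)  # linear regression of A3 on A2
        pr[(a1, a2, a3)] = p1 * p2_given_1 * p3_given_2
    return pr


def corr(pr, i, j):
    """E[A_i A_j], equal to the correlation for mean-zero, unit-variance variables."""
    return sum(p * w[i] * w[j] for w, p in pr.items())


tau12, tau23 = 0.6, -0.4
pr = chain_via_regressions(tau12, tau23)

# product form of the palindromic Ising model with edges (1,2) and (2,3) only
ising = {w: (1 + tau12 * w[0] * w[1]) * (1 + tau23 * w[1] * w[2]) / 8 for w in pr}

r12, r23, r13 = corr(pr, 0, 1), corr(pr, 1, 2), corr(pr, 0, 2)
partial_13_given_2 = (r13 - r12 * r23) / math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))
print(r12, r23, r13, partial_13_given_2)
```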
20 / 24
Palindromic Ising models generated over chordless cycles
[Figure: a chordless 4-cycle on nodes 1 to 4 and a chordless 5-cycle on nodes 1 to 5]
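the induced correlations corresponding to missing edges can best be traced in terms of the hypetan interactions
for instance, the induced simple correlation for (1,2) is for the 4-cycle
\[
\rho^{*}_{12} = \tau_{12 \setminus 34} = \frac{\tau_{14}\tau_{24} + \tau_{13}\tau_{23}}{1 + \tau_{13}\tau_{14}\tau_{23}\tau_{24}}
\]
and for the 5-cycle
\[
\rho^{*}_{12} = \tau_{12 \setminus 345} = \frac{\tau_{14}\tau_{45}\tau_{25} + \tau_{13}\tau_{23}}{1 + \tau_{14}\tau_{45}\tau_{25}\tau_{13}\tau_{23}}
\]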
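Both cycle formulas can be confirmed by enumerating all states. A minimal sketch, assuming the 4-cycle has edges (1,3), (1,4), (2,3), (2,4) and the 5-cycle has edges (1,3), (2,3), (1,4), (4,5), (2,5), as the formulas suggest; the τ values and the function name are hypothetical.

```python
from itertools import product


def induced_correlation(edges, d, i, j):
    """E[A_i A_j] when Pr(omega) is proportional to prod over edges of (1 + tau w_s w_t).

    Nodes are labelled 1..d; edges maps (s, t) to tau_st.
    """
    num = den = 0.0
    for omega in product((-1, 1), repeat=d):
        w = 1.0
        for (s, t), tau in edges.items():
            w *= 1.0 + tau * omega[s - 1] * omega[t - 1]
        num += w * omega[i - 1] * omega[j - 1]
        den += w
    return num / den


four_cycle = {(1, 3): 0.4, (1, 4): -0.5, (2, 3): 0.3, (2, 4): 0.6}
five_cycle = {(1, 3): 0.4, (2, 3): 0.3, (1, 4): -0.5, (4, 5): 0.7, (2, 5): 0.6}

t = four_cycle
rho4_formula = (t[1, 4] * t[2, 4] + t[1, 3] * t[2, 3]) / (
    1 + t[1, 3] * t[1, 4] * t[2, 3] * t[2, 4])
rho4_enum = induced_correlation(four_cycle, 4, 1, 2)

t = five_cycle
rho5_formula = (t[1, 4] * t[4, 5] * t[2, 5] + t[1, 3] * t[2, 3]) / (
    1 + t[1, 4] * t[4, 5] * t[2, 5] * t[1, 3] * t[2, 3])
rho5_enum = induced_correlation(five_cycle, 5, 1, 2)

print(rho4_enum, rho4_formula)
print(rho5_enum, rho5_formula)
```

The numerator collects the two paths connecting nodes 1 and 2 around the cycle, and the denominator adds the product over all edges of the cycle.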
21 / 24
Relevance for general Ising models
One can make a transformation from skewed to symmetric margins which preserves all marginal odds-ratios. For π^T = [α, β, γ, δ]: the next table, divided by 2(√(αδ) + √(βγ)), has this form
\[
\begin{pmatrix} \sqrt{\alpha\delta} & \sqrt{\beta\gamma} \\ \sqrt{\beta\gamma} & \sqrt{\alpha\delta} \end{pmatrix}
\]
based on Cox (2006), chap. 6.4; Palmgren (1986) Tech Rep Univ Wash
– fit a saturated palindromic Ising model and use the simple linear relations to find a well-fitting hollow-tree structure
– given this fitted graph, the mle-fitting to the same hollow-tree uses the same subset of two-way tables but now for observed skewed margins
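The transformation to symmetric margins is short enough to check directly. The sketch below assumes the cell order π^T = [α, β, γ, δ] with rows A1 and columns A2; the function names and the example table are hypothetical.

```python
import math


def symmetrise(alpha, beta, gamma, delta):
    """Symmetric 2x2 table with the same odds-ratio as [alpha, beta, gamma, delta]."""
    a = math.sqrt(alpha * delta)
    b = math.sqrt(beta * gamma)
    total = 2 * (a + b)             # normalising divisor 2(sqrt(ad) + sqrt(bc))
    return [[a / total, b / total],
            [b / total, a / total]]


def odds_ratio(table):
    """Cross-product ratio of a 2x2 table given as a list of two rows."""
    return (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])


alpha, beta, gamma, delta = 8 / 12, 2 / 12, 1 / 12, 1 / 12
skewed = [[alpha, gamma], [beta, delta]]   # rows: A1 = -1, 1; columns: A2 = -1, 1
symmetric = symmetrise(alpha, beta, gamma, delta)

print(odds_ratio(skewed), odds_ratio(symmetric))
print([sum(row) for row in symmetric])
```

The odds-ratio is unchanged while every margin of the transformed table equals 1/2.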
22 / 24
What do we gain?
In a palindromic Ising model
• the zeros in the overall partial correlation matrix capture the Markov structure
• each proper node-set elimination scheme gives a factorisation with node-sets residing in a single prime graph as responses, those in cut-sets as regressors
• goodness-of-fit tests for cycles relate to these small subsets, giving a chance for local assessments of dependences and checks against knowledge in the field
• pathways of dependences can be traced and quantified
23 / 24
Some of the more recent references in detail:
Wermuth, N. & Marchetti, G. M. (2017). Generating large Ising models with Markov structure via simple linear relations. On ArXiv: 1704.01649.
Marchetti, G. M. & Wermuth, N. (2016). Palindromic Bernoulli distributions. Electronic Journal of Statistics, 10, 2435–2460; also on ArXiv: 1510.09072.
Wermuth, N. (2015). Graphical Markov Models, unifying results and their interpretation. Wiley Statsref: Statistics Reference Online; also on ArXiv: 1505.02456.
Thanks
24 / 24