
Characterising model classes by prime graphs and by statistical properties

Nanny Wermuth

Chalmers University of Technology, Sweden, and Gutenberg-University, Germany

based on joint work with

Giovanni M. Marchetti, University of Florence, and David R. Cox, Oxford University

1 May, Cambridge 2017

1 / 24

Trust in empirical results increases

with the possibility of

• replications

• interventions

• a substantive understanding of the development

2 / 24

Consequences for statistical models

If the search for causal explanations motivates an empirical study, then an appropriate model should

– be consistent with some causal interpretation

– be checked against available knowledge in the field

– permit a tracing of the pathways of development

3 / 24

A simple example, taken from a longitudinal study, n=2339

designed to develop counselling of prospective students so that drop-out at a German university is better avoided

variables

A1 := average grade reached in years 11 to 13 at high school, in median-dichotomized form (level 1, high achievement: 50%)

A2:= poor integration into high-school classes (yes: 10.3%)

A3:= high-school class repeated (yes: 34.2%)

A4:= change of primary school (yes: 20.1%)

A5:= education of father (at least high-school completed: 42.7%)

[Figure: graph over the nodes 1 to 5 of the example]

4 / 24

Some general features

For nodes $i \neq j$ and $c \subseteq N \setminus \{i, j\}$, independence of $Y_i$ and $Y_j$ given $Y_c$ leads to standard constraints on conditional densities: $f_{i|jc} = f_{i|c}$ or $f_{ij|c} = f_{i|c}\, f_{j|c}$

A graph is edge-minimal if no edge can be removed without introducing another independence constraint in distributions generated over it

Edge-minimality is an essential, basic property for tracing pathways of dependence in graphs
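A minimal sketch of the density constraint above, assuming discrete variables and a joint table stored as a Python dict from outcome tuples to probabilities (the helper names `marginal` and `is_cond_indep` are hypothetical, not from the talk). It checks $f_{ij|c} = f_{i|c} f_{j|c}$ in the equivalent cross-multiplied form $f_{ijc}\, f_c = f_{ic}\, f_{jc}$, which avoids dividing by empty cells:

```python
from itertools import product

def marginal(p, idx):
    """Sum the joint table p over all positions not listed in idx."""
    out = {}
    for x, px in p.items():
        key = tuple(x[k] for k in idx)
        out[key] = out.get(key, 0.0) + px
    return out

def is_cond_indep(p, i, j, c, tol=1e-12):
    """Check Y_i independent of Y_j given Y_c via f_ijc * f_c == f_ic * f_jc in every cell."""
    c = list(c)
    f_ijc = marginal(p, [i, j] + c)
    f_ic, f_jc, f_c = marginal(p, [i] + c), marginal(p, [j] + c), marginal(p, c)
    for (xi, *xc), a in f_ic.items():
        xc = tuple(xc)
        for (xj, *yc), b in f_jc.items():
            if tuple(yc) != xc:          # only compare cells with the same level of Y_c
                continue
            v = f_ijc.get((xi, xj) + xc, 0.0)
            if abs(v * f_c[xc] - a * b) > tol:
                return False
    return True

# Hypothetical example: Y_1 and Y_2 are related only through Y_3
p3 = {0: 0.3, 1: 0.7}
p1_3 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # p1_3[y3][y1]
p2_3 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}  # p2_3[y3][y2]
p = {(y1, y2, y3): p3[y3] * p1_3[y3][y1] * p2_3[y3][y2]
     for y1, y2, y3 in product([0, 1], repeat=3)}

print(is_cond_indep(p, 0, 1, [2]))  # True
print(is_cond_indep(p, 0, 1, []))   # False
```

The same pair is dependent marginally but independent given the third variable, so the constraint is always tied to a specific conditioning set c.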

5 / 24

Conditional relations in an edge-minimal graph

For {a, b, c, m} partitioning the node set N, we write

a⊥⊥ b|c

for $Y_a$ conditionally independent of $Y_b$ given $Y_c$

a ⋔ b|c

for $Y_a$ conditionally dependent on $Y_b$ given $Y_c$

In an edge-minimal graph, each missing edge means a conditional independence and each edge present means a conditional dependence

Notation ⊥⊥: Dawid 1979, JRSS B; ⋔: Wermuth & Sadeghi 2012, TEST

6 / 24

Factorizations, independences implied in the example

[Figure: two graphs over the nodes 1 to 5 of the example]

in a very condensed notation:

$f_N = f_{1|23}\, f_{23|45}\, f_{45}$ as well as

2⊥⊥ 4|35 and 3⊥⊥ 5|24

Thus, this chain graph is a block-recursive regression graph (LWF chain graph), where each node in a joint response depends on all of its neighbours

Markov equivalence proven by Frydenberg (1984) Ann. Stat; for a discussion of discrete LWF graphs see Drton (2009) Bernoulli

7 / 24

An old, beautiful result from graph theory

Theorem (Wagner and Halin (1962) Math Annalen): Every finite, simple, undirected graph breaks uniquely into its prime graphs

where

– prime graphs are subgraphs characterized by having no cut-set

– a cut-set is the smallest complete subgraph c which separates disjoint subsets a and b of the graph

Leimer (1993) Discr Math; Matúš (1994) WUPES94; Dethlefsen and Højsgaard (2005) Statist Softw

an efficient algorithm implemented in R gives a proper node-set elimination scheme, the prime graphs and the unique cut-sets
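The talk relies on an efficient R implementation. Purely to illustrate the two definitions above, here is a brute-force Python sketch (exponential in the number of nodes, so only for small graphs; all function names are hypothetical). It lists every complete node set, minimal or not, whose removal disconnects the remaining nodes, and calls a graph prime when no such set exists:

```python
from itertools import combinations

def graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_complete(nodes, adj):
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def is_connected(nodes, adj):
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:                      # depth-first search inside `nodes`
        u = stack.pop()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def complete_separators(adj):
    """All complete node sets c whose removal disconnects the remaining nodes."""
    nodes = set(adj)
    found = []
    for k in range(len(nodes) - 1):   # keep at least two nodes outside c
        for c in combinations(sorted(nodes), k):
            if is_complete(c, adj) and not is_connected(nodes - set(c), adj):
                found.append(set(c))
    return found

def is_prime(adj):
    """Prime: no complete set (not even the empty set) separates the graph."""
    return not complete_separators(adj)

diamond = graph([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)])
cycle4 = graph([(1, 2), (2, 3), (3, 4), (4, 1)])
print(complete_separators(diamond))  # [{2, 3}]
print(is_prime(cycle4))              # True
```

The diamond splits along the edge {2, 3} into two triangles, while the chordless 4-cycle has no complete separator and is therefore prime.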

8 / 24

Types of prime graphs

cliques (maximal complete prime graphs)

chordless cycles

and more complex incomplete prime graphs

In incomplete prime graphs no node resides in a single clique

9 / 24

All chordal, incomplete graphs in three and four nodes

chordal graphs have exclusively cliques as prime graphs

a V, a diamond, a paw, a 3-leaf star and a single 3-edge path

the diamond has an edge as cut-set, all others have single nodes

outer nodes reside in a single clique and, with every proper single-node elimination scheme, one gets a sequence of outer nodes and a Markov-equivalent directed acyclic graph (DAG)

Tarjan and Yannakakis (1984) SIAM J Computing

But DAGs cannot capture interventions affecting connected responses!
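A sketch of the chordal case using maximum cardinality search, the ordering studied by Tarjan and Yannakakis (1984): nodes are numbered so that each new node has the most already-numbered neighbours. For a chordal graph, the earlier neighbours of every node then form a complete set, so directing each edge from the earlier to the later node gives a DAG with no two non-adjacent parents of a common node, hence Markov equivalent to the undirected graph. The tie-breaking rule and the function names are assumptions of this sketch:

```python
from itertools import combinations

def graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def mcs_order(adj):
    """Maximum cardinality search; ties broken by the smallest node label."""
    weight = {v: 0 for v in adj}
    order = []
    while weight:
        v = max(sorted(weight), key=lambda u: weight[u])
        order.append(v)
        del weight[v]
        for w in adj[v]:
            if w in weight:
                weight[w] += 1        # one more numbered neighbour
    return order

def parents_from_order(order, adj):
    """Direct every edge from the earlier to the later node in the order."""
    pos = {v: i for i, v in enumerate(order)}
    return {v: sorted(u for u in adj[v] if pos[u] < pos[v]) for v in order}

def no_unshielded_parents(parents, adj):
    """True if the parents of every node are pairwise adjacent."""
    return all(b in adj[a] for ps in parents.values() for a, b in combinations(ps, 2))

diamond = graph([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)])
order = mcs_order(diamond)
dag = parents_from_order(order, diamond)
print(order)                               # [1, 2, 3, 4]
print(dag)                                 # {1: [], 2: [1], 3: [1, 2], 4: [2, 3]}
print(no_unshielded_parents(dag, diamond)) # True
print(list(reversed(order)))               # [4, 3, 2, 1]: outer nodes eliminated first
```

For the chordless 4-cycle the same procedure necessarily produces a node with two non-adjacent parents, which is one way to see that it is not chordal.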

10 / 24

Traceability of distributions

To combine independences in quite general types of graph, one needs the composition and intersection properties; Sadeghi & Lauritzen (2014) Bernoulli

To trace pathways of dependences for distributions generated over an edge-minimal graph, singleton transitivity is needed in addition; Wermuth (2012) Int Statist Review

All three properties mimic those of regular Gaussian distributions; Lnenicka and Matúš (2007) Kybernetika

11 / 24

Proper trees and hollow trees

a traditional or proper tree has edges as prime graphs and single nodes as cut-sets

a hollow tree has edges or cycles (triangles or chordless cycles) as prime graphs and single nodes or edges as cut-sets

For a hollow tree: if one replaces nodes by prime graphs and edges by cut-sets, then a ‘single path’ connects each pair of ‘nodes’

12 / 24

A break in the lecture

13 / 24

A general 2× 2 table of probabilities

For effect coding of the levels, that is for levels (−1, 1)

A1 \ A2:    −1    1
     −1      α    γ
      1      β    δ

Pearson’s correlation coefficient, ρ, is not a function of the odds-ratio, (αδ)/(βγ), alone; hence, unlike the odds-ratio, it does not vary independently of the marginal distributions of A1, A2; see Edwards (1963) JRSS C

This alone speaks against linear relations being relevant, but...
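A small numerical check of this point, with the effect-coded table above passed as (α, β, γ, δ); the two tables are invented for illustration. Both have odds-ratio 2.25, but their margins differ and so do their correlations:

```python
from math import sqrt

def effect_coded_stats(alpha, beta, gamma, delta):
    """Rows A1 = -1, 1 and columns A2 = -1, 1: row -1 holds (alpha, gamma), row 1 holds (beta, delta)."""
    m1 = (beta + delta) - (alpha + gamma)    # E(A1)
    m2 = (gamma + delta) - (alpha + beta)    # E(A2)
    e12 = (alpha + delta) - (beta + gamma)   # E(A1 A2)
    # Var(Ai) = 1 - E(Ai)^2 because Ai^2 = 1
    rho = (e12 - m1 * m2) / sqrt((1 - m1 ** 2) * (1 - m2 ** 2))
    odds_ratio = (alpha * delta) / (beta * gamma)
    return rho, odds_ratio

sym = effect_coded_stats(0.3, 0.2, 0.2, 0.3)
raw = [0.1, 0.3, 0.1, 0.675]       # alpha, beta, gamma, delta before normalising
total = sum(raw)
skew = effect_coded_stats(*(x / total for x in raw))
print(sym)    # correlation 0.2, odds-ratio 2.25 (up to rounding)
print(skew)   # same odds-ratio, correlation about 0.15
```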

14 / 24

In a symmetric 2× 2 table of probabilities

the variables have mean zero and unit variance, so that a covariance coincides with Pearson’s correlation coefficient

A1 \ A2:    −1     1    sum
     −1      α     β    1/2
      1      β     α    1/2
    sum     1/2   1/2    1

odds-ratio, also called cross-product ratio: $\mathrm{odr} = (\alpha/\beta)^2$

log-linear parameter: $\lambda = \tfrac{1}{4}\log(\mathrm{odr})$

the correlation: $\rho = 2(\alpha - \beta) = \tanh(\lambda)$; hence the correlation is a one-to-one function of the odds-ratio
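These relations can be checked numerically; a short sketch over a few illustrative values of α, with β = 1/2 − α so that all margins equal 1/2:

```python
from math import log, tanh

def symmetric_2x2(alpha):
    """Odds-ratio, log-linear parameter and correlation of the symmetric table."""
    beta = 0.5 - alpha               # each row and column sums to 1/2
    odr = (alpha / beta) ** 2        # cross-product ratio
    lam = 0.25 * log(odr)            # log-linear interaction parameter
    rho = 2 * (alpha - beta)         # covariance = correlation here
    return odr, lam, rho

for alpha in (0.05, 0.2, 0.25, 0.4):
    odr, lam, rho = symmetric_2x2(alpha)
    print(f"alpha={alpha:.2f}  odr={odr:.4f}  lambda={lam:+.4f}  "
          f"rho={rho:+.4f}  tanh(lambda)={tanh(lam):+.4f}")
```

For α = 0.4, for instance, odr = 16, λ = log 2 and ρ = tanh(log 2) = 0.6.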

15 / 24

Two interesting oddities

For the symmetric 2× 2 table

A1 \ A2:    −1     1    sum
     −1      α     β    1/2
      1      β     α    1/2
    sum     1/2   1/2    1

the log-linear parameter λ coincides with Fisher’s (1924) z-transformation of the correlation coefficient: $\lambda = \tanh^{-1}\rho$.

$\tau = \tanh\lambda$ coincides with Yule’s (1912) ‘coefficient of colligation’; we call τ the hypetan interaction, from hype(rbolic) tan(gent)
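Both oddities can be confirmed numerically for a few symmetric tables. Yule's coefficient is computed here in its usual form $(\sqrt{\mathrm{odr}}-1)/(\sqrt{\mathrm{odr}}+1)$; the α values are illustrative:

```python
from math import atanh, log, sqrt, tanh

def oddities(alpha):
    """Return (lambda, Fisher z of rho, tau = tanh(lambda), Yule's coefficient)."""
    beta = 0.5 - alpha
    odr = (alpha / beta) ** 2
    lam = 0.25 * log(odr)                       # log-linear parameter
    rho = 2 * (alpha - beta)                    # correlation
    yule = (sqrt(odr) - 1) / (sqrt(odr) + 1)    # coefficient of colligation
    return lam, atanh(rho), tanh(lam), yule

for alpha in (0.1, 0.3, 0.35, 0.45):
    lam, z, tau, yule = oddities(alpha)
    print(f"alpha={alpha:.2f}  lambda={lam:+.4f}  z={z:+.4f}  tau={tau:+.4f}  Yule={yule:+.4f}")
```

In this bivariate symmetric case τ therefore also equals ρ; the distinction between τ and ρ only matters once more variables are involved.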

16 / 24

The trivariate Ising model for symmetric variables

        A3:    −1   −1    1    1
   A1 \ A2:    −1    1   −1    1
        −1      α    γ    δ    β
         1      β    δ    γ    α

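here a palindromic property shows, just like in ‘step on no pets’: the row for A1 = 1 is the row for A1 = −1 read backwards, that is, $\Pr(\omega) = \Pr(-\omega)$; and

$$\begin{pmatrix}\rho_{12}\\ \rho_{13}\\ \rho_{23}\end{pmatrix} = \begin{pmatrix}-1 & -1 & 0\\ -1 & 0 & -1\\ 0 & -1 & -1\end{pmatrix}\left\{4\begin{pmatrix}\beta\\ \gamma\\ \delta\end{pmatrix} - \frac{1}{2}\begin{pmatrix}1\\ 1\\ 1\end{pmatrix}\right\}$$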

$\rho_{ij} = \tau_{ij\setminus k} = (\tau_{ij} + \tau_{ik}\tau_{jk})/(1 + \tau_{ij}\tau_{ik}\tau_{jk})$

and

the conditional correlation equals the partial correlation: $\rho_{ij|k} = \rho_{ij.k}$

17 / 24
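A brute-force check of these two identities, assuming the trivariate model is written with probabilities proportional to $(1+\tau_{12}\omega_1\omega_2)(1+\tau_{13}\omega_1\omega_3)(1+\tau_{23}\omega_2\omega_3)$; the τ values are arbitrary illustrative choices:

```python
from itertools import product
from math import sqrt

def trivariate_ising(t12, t13, t23):
    """Probabilities proportional to (1 + t12 w1 w2)(1 + t13 w1 w3)(1 + t23 w2 w3)."""
    raw = {w: (1 + t12 * w[0] * w[1]) * (1 + t13 * w[0] * w[2]) * (1 + t23 * w[1] * w[2])
           for w in product([-1, 1], repeat=3)}
    z = sum(raw.values())                  # equals 8 * (1 + t12 * t13 * t23)
    return {w: v / z for w, v in raw.items()}

t12, t13, t23 = 0.4, -0.5, 0.6
p = trivariate_ising(t12, t13, t23)

# all means are zero, so correlations are expectations of products
def corr(i, j):
    return sum(pr * w[i] * w[j] for w, pr in p.items())

r12, r13, r23 = corr(0, 1), corr(0, 2), corr(1, 2)
print(r12, (t12 + t13 * t23) / (1 + t12 * t13 * t23))   # marginal correlation

# partial correlation of A1, A2 given A3 versus correlation within the stratum A3 = 1
partial = (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
stratum = {w[:2]: pr for w, pr in p.items() if w[2] == 1}
s = sum(stratum.values())
m1 = sum(pr * w[0] for w, pr in stratum.items()) / s
m2 = sum(pr * w[1] for w, pr in stratum.items()) / s
e12 = sum(pr * w[0] * w[1] for w, pr in stratum.items()) / s
conditional = (e12 - m1 * m2) / sqrt((1 - m1 ** 2) * (1 - m2 ** 2))
print(partial, conditional)
```

The stratum A3 = −1 gives the same conditional correlation, since it is the mirror image of A3 = 1 under $\omega \mapsto -\omega$.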

The palindromic Ising model in general

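for d binary random variables $A_1, \dots, A_d$ taking values −1, +1 and $\Pr(\omega) = \Pr(A_1 = \omega_1, \dots, A_d = \omega_d)$, an Ising model for symmetric binary variables is

$\log \Pr(\omega) = \lambda_\emptyset + \sum_{s<t} \lambda_{st}\, \omega_s \omega_t, \quad -\infty < \lambda_{st} < \infty$

equivalently, for $\tau_{st} = \tanh(\lambda_{st})$,

$\Pr(\omega) = \mathrm{const} \prod_{s<t} (1 + \tau_{st}\, \omega_s \omega_t), \quad -1 < \tau_{st} < 1$

where $\mathrm{const}^{-1} = \sum_{\omega} \prod_{s<t} (1 + \tau_{st}\, \omega_s \omega_t)$; for d = 3 this gives $\mathrm{const} = \{2^3 (1 + \tau_{12}\tau_{13}\tau_{23})\}^{-1}$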
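A numerical sketch of this equivalence for d = 4, with illustrative λ values (one of them zero). Once each form is normalised, the log-linear form and the product form with τ = tanh λ give the same probabilities; and, because the contrasts $\omega_s\omega_t$ are orthogonal, each λ can be recovered as $\lambda_{st} = 2^{-d}\sum_\omega \omega_s\omega_t \log\Pr(\omega)$:

```python
from itertools import product
from math import exp, log, prod, tanh

d = 4
lam = {(0, 1): 0.4, (0, 2): 0.0, (0, 3): 0.5, (1, 2): -0.3, (1, 3): -0.1, (2, 3): 0.2}
omegas = list(product([-1, 1], repeat=d))

def normalise(weights):
    z = sum(weights.values())
    return {w: v / z for w, v in weights.items()}

# log-linear form: log Pr(w) = lambda_0 + sum_{s<t} lambda_st w_s w_t
p_loglin = normalise({w: exp(sum(l * w[s] * w[t] for (s, t), l in lam.items())) for w in omegas})

# product form: Pr(w) = const * prod_{s<t} (1 + tau_st w_s w_t), tau = tanh(lambda)
p_prod = normalise({w: prod(1 + tanh(l) * w[s] * w[t] for (s, t), l in lam.items()) for w in omegas})

print(max(abs(p_loglin[w] - p_prod[w]) for w in omegas))   # rounding error only

# recover each lambda_st from the log-probabilities
lam_hat = {(s, t): sum(w[s] * w[t] * log(p_loglin[w]) for w in omegas) / 2 ** d for (s, t) in lam}
print(lam_hat)
```

The identity behind the product form is $e^{\lambda x} = \cosh\lambda\,(1 + x\tanh\lambda)$ for $x = \pm 1$; the factors $\cosh\lambda_{st}$ are absorbed into the constant.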

18 / 24

A palindromic Ising model with Markov structure

is said to have been generated over an edge-minimal graph if

$\lambda_{ij} = 0 \iff i \perp\!\!\!\perp j \mid N \setminus \{i, j\}$

and

$\lambda_{ij} \neq 0 \iff i \pitchfork j \mid N \setminus \{i, j\}$
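One way to see this link, sketched with illustrative values: in this model the odds-ratio of $(A_i, A_j)$ within every stratum of the remaining variables equals $\exp(4\lambda_{ij})$, so it is 1 in all strata exactly when $\lambda_{ij} = 0$. The function names and the 4-cycle parameters below are hypothetical:

```python
from itertools import product
from math import exp

def ising(lam, d):
    """Probabilities proportional to exp(sum_{s<t} lambda_st w_s w_t) on {-1, 1}^d."""
    raw = {w: exp(sum(l * w[s] * w[t] for (s, t), l in lam.items()))
           for w in product([-1, 1], repeat=d)}
    z = sum(raw.values())
    return {w: v / z for w, v in raw.items()}

def conditional_odds_ratios(p, i, j, d):
    """Odds-ratio of (A_i, A_j) in each stratum of the other d - 2 variables."""
    rest = [k for k in range(d) if k not in (i, j)]

    def cell(a, b, r):
        w = [0] * d
        w[i], w[j] = a, b
        for k, v in zip(rest, r):
            w[k] = v
        return p[tuple(w)]

    return [cell(1, 1, r) * cell(-1, -1, r) / (cell(1, -1, r) * cell(-1, 1, r))
            for r in product([-1, 1], repeat=len(rest))]

# 4-cycle 0-1-2-3-0: lambda_02 = lambda_13 = 0 (missing edges)
lam = {(0, 1): 0.4, (1, 2): -0.3, (2, 3): 0.2, (0, 3): 0.5}
p = ising(lam, 4)
print(conditional_odds_ratios(p, 0, 2, 4))                 # all 1 up to rounding
print(conditional_odds_ratios(p, 0, 1, 4), exp(4 * 0.4))   # all equal to exp(1.6)
```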

19 / 24

Palindromic Ising models generated over hollow trees

Theorem: A quadratic exponential distribution for symmetric binary variables is generated over an edge-minimal hollow tree if and only if

(1) the Markov structure of its graph is defined by the set of zeros in its overall partial correlation matrix, and
(2) within each of its prime graphs, all conditional correlations agree with the partial correlations

Corollary: The joint distribution of a palindromic Ising model with proper-tree structure can be generated with simple linear regressions by using any node-set elimination scheme
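A minimal sketch of the corollary for the simplest proper tree, the path with edges (1, 2) and (2, 3). It assumes that a 'simple linear regression' for symmetric binary variables means choosing $\Pr(A_\text{child} = b \mid A_\text{parent} = a) = (1 + \rho\, ab)/2$, which gives $E(A_\text{child} \mid A_\text{parent}) = \rho A_\text{parent}$. The generated distribution is palindromic, the end nodes have correlation $\rho_{12}\rho_{23}$, and the partial correlation for the missing edge (1, 3) vanishes. The correlation values are illustrative:

```python
from itertools import product
from math import sqrt

r12, r23 = 0.6, -0.5   # illustrative correlations along the tree edges

def regress(rho, parent, child):
    """Pr(child | parent) for a symmetric binary pair with E(child | parent) = rho * parent."""
    return (1 + rho * parent * child) / 2

# generate along one elimination order, 1 then 2 then 3, with Pr(A1 = -1) = Pr(A1 = 1) = 1/2
p = {(a1, a2, a3): 0.5 * regress(r12, a1, a2) * regress(r23, a2, a3)
     for a1, a2, a3 in product([-1, 1], repeat=3)}

def corr(i, j):
    return sum(pr * w[i] * w[j] for w, pr in p.items())

c12, c13, c23 = corr(0, 1), corr(0, 2), corr(1, 2)
partial13 = (c13 - c12 * c23) / sqrt((1 - c12 ** 2) * (1 - c23 ** 2))

print(c12, c23)         # reproduces r12 and r23
print(c13, r12 * r23)   # product along the path
print(partial13)        # zero up to rounding: 1 and 3 independent given 2
```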

20 / 24

Palindromic Ising models generated over chordless cycles

[Figure: a chordless 4-cycle over the nodes 1 to 4 and a chordless 5-cycle over the nodes 1 to 5]

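the induced correlations corresponding to missing edges can best be traced in terms of the hypetan interactions

for instance, for the 4-cycle the induced simple correlation for the pair (1, 2) is

$\rho^*_{12} = \tau_{12\setminus 34} = (\tau_{14}\tau_{24} + \tau_{13}\tau_{23})/(1 + \tau_{13}\tau_{14}\tau_{23}\tau_{24})$

and for the 5-cycle

$\rho^*_{12} = \tau_{12\setminus 345} = (\tau_{14}\tau_{45}\tau_{25} + \tau_{13}\tau_{23})/(1 + \tau_{14}\tau_{45}\tau_{25}\tau_{13}\tau_{23})$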
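Both formulas can be checked by brute-force enumeration. In this sketch the 4-cycle has edges (1, 3), (2, 3), (1, 4), (2, 4) and the 5-cycle has edges (1, 3), (2, 3), (1, 4), (4, 5), (2, 5), as implied by the formulas above; the τ values are illustrative:

```python
from itertools import product
from math import prod

def ising_from_tau(tau, d):
    """Pr(w) proportional to prod over edges (s, u) of (1 + tau_su w_s w_u); nodes labelled 1..d."""
    raw = {w: prod(1 + t * w[s - 1] * w[u - 1] for (s, u), t in tau.items())
           for w in product([-1, 1], repeat=d)}
    z = sum(raw.values())
    return {w: v / z for w, v in raw.items()}

def corr(p, i, j):
    """Correlation of A_i and A_j; all means are zero in a palindromic model."""
    return sum(pr * w[i - 1] * w[j - 1] for w, pr in p.items())

tau4 = {(1, 3): 0.5, (2, 3): 0.4, (1, 4): -0.3, (2, 4): 0.6}
f4 = ((tau4[1, 4] * tau4[2, 4] + tau4[1, 3] * tau4[2, 3])
      / (1 + tau4[1, 3] * tau4[1, 4] * tau4[2, 3] * tau4[2, 4]))
print(corr(ising_from_tau(tau4, 4), 1, 2), f4)

tau5 = {(1, 3): 0.5, (2, 3): 0.4, (1, 4): -0.3, (4, 5): 0.7, (2, 5): 0.6}
f5 = ((tau5[1, 4] * tau5[4, 5] * tau5[2, 5] + tau5[1, 3] * tau5[2, 3])
      / (1 + tau5[1, 4] * tau5[4, 5] * tau5[2, 5] * tau5[1, 3] * tau5[2, 3]))
print(corr(ising_from_tau(tau5, 5), 1, 2), f5)
```

Each numerator term is the product of τ along one of the two paths joining nodes 1 and 2; the denominator adds the product around the whole cycle.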

21 / 24

Relevance for general Ising models

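One can make a transformation from skewed to symmetric margins which preserves all marginal odds-ratios. For $\pi^T = [\alpha, \beta, \gamma, \delta]$, the transformed table is

$$\frac{1}{2(\sqrt{\alpha\delta} + \sqrt{\beta\gamma})}\begin{pmatrix}\sqrt{\alpha\delta} & \sqrt{\beta\gamma}\\ \sqrt{\beta\gamma} & \sqrt{\alpha\delta}\end{pmatrix}$$

based on Cox (2006), chap. 6.4; Palmgren (1986) Tech Rep Univ Wash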

– fit a saturated palindromic Ising model and use the simple linear relations to find a well-fitting hollow-tree structure

– given this fitted graph, the mle-fitting to the same hollow tree uses the same subset of two-way tables, but now for the observed skewed margins
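A sketch of the first step only, the transformation to symmetric margins. It reads $\pi^T = [\alpha, \beta, \gamma, \delta]$ as the 2 × 2 table with rows (α, β) and (γ, δ); that layout is an assumption, though the odds-ratio αδ/(βγ) is the same for either reading. The example table is invented:

```python
from math import sqrt

def to_symmetric_margins(alpha, beta, gamma, delta):
    """Rescale a 2 x 2 table to margins 1/2 while keeping the odds-ratio alpha*delta/(beta*gamma)."""
    a, b = sqrt(alpha * delta), sqrt(beta * gamma)
    s = 2 * (a + b)
    return [[a / s, b / s], [b / s, a / s]]

def odds_ratio(t):
    return t[0][0] * t[1][1] / (t[0][1] * t[1][0])

skewed = [[0.5, 0.1], [0.15, 0.25]]
sym = to_symmetric_margins(0.5, 0.1, 0.15, 0.25)
print(sym)
print(odds_ratio(skewed), odds_ratio(sym))                         # equal up to rounding
print([sum(row) for row in sym], [sum(col) for col in zip(*sym)])  # all 1/2
```

Each row and column sums to $(a+b)/\{2(a+b)\} = 1/2$, and the odds-ratio of the new table is $a^2/b^2 = \alpha\delta/(\beta\gamma)$.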

22 / 24

What do we gain?

In a palindromic Ising model

• the zeros in the overall partial correlation matrix capture the Markov structure

• each proper node-set elimination scheme gives a factorisation with node-sets residing in a single prime graph as responses, those in cut-sets as regressors

• goodness-of-fit tests for cycles relate to these small subsets, giving a chance for local assessments of dependences and checks against knowledge in the field

• pathways of dependences can be traced and quantified

23 / 24

Some of the more recent references in detail:

Wermuth, N. & Marchetti, G. M. (2017). Generating large Ising models with Markov structure via simple linear relations. arXiv:1704.01649.

Marchetti, G. M. & Wermuth, N. (2016). Palindromic Bernoulli distributions. Electronic Journal of Statistics, 10, 2435–2460; also arXiv:1510.09072.

Wermuth, N. (2015). Graphical Markov Models, unifying results and their interpretation. Wiley StatsRef: Statistics Reference Online; also arXiv:1505.02456.

Thanks

24 / 24
