Exponential-family random network models for social networks

Mark S. Handcock
Ian E. Fellows

Department of Statistics
University of California - Los Angeles

Supported by NIH NIDA Grant DA012831, NICHD Grant HD041877, NSF award MMS-0851555 and the DoD ONR MURI award N00014-08-1-1015.

SAMSI Computational Methods in Social Sciences (CMSS) Workshop, August 18-22 2013


Statistical Models for Social Networks

Notation
A social network is defined as a set of n social "actors", a social relationship between each pair of actors, and a set of variables on those actors/pairs.

$$Y_{ij} = \begin{cases} 1 & \text{relationship from actor } i \text{ to actor } j \\ 0 & \text{otherwise} \end{cases}$$

call $Y \equiv [Y_{ij}]_{n \times n}$ a graph

an $N = n(n-1)$ binary array

let $X$ be an $n \times q$ matrix of actor variates

call $(Y, X)$ a network

The basic problem of stochastic modeling is to specify a distribution for $X, Y$, i.e., $P(Y = y, X = x)$


The ERGM Framework for Network Modeling

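Let $\mathcal{Y}$ be the sample space of $Y$, e.g. $\{0, 1\}^N$, and $\mathcal{X}$ be the sample space of $X$.
Model the multivariate distribution of $Y$ given $X$ via:

$$P_\eta(Y = y \mid X = x) = \frac{\exp\{\eta \cdot g(y \mid x)\}}{c(\eta, x, \mathcal{Y})}, \qquad y \in \mathcal{Y},\ x \in \mathcal{X}$$

Frank and Strauss (1986)

$\eta \in \Lambda \subset \mathbb{R}^q$: q-vector of parameters

$g(y \mid x)$: q-vector of graph statistics
⇒ $g(Y \mid x)$ are jointly sufficient for the model

$c(\eta, x, \mathcal{Y})$: distribution normalizing constant

$$c(\eta, x, \mathcal{Y}) = \sum_{y \in \mathcal{Y}} \exp\{\eta \cdot g(y \mid x)\}$$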
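The normalizing constant is a sum over every graph in the sample space, which can be evaluated directly only for very small n. The following is a minimal sketch of that brute-force sum for directed graphs without covariates. The choice of statistics (edge count and mutual-dyad count) and all function names are illustrative assumptions, not part of the slides.

```python
import itertools
import math

def enumerate_digraphs(n):
    """Yield every directed graph on n nodes as a dict {(i, j): 0 or 1}."""
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in itertools.product((0, 1), repeat=len(dyads)):
        yield dict(zip(dyads, bits))

def g_stats(y):
    """Illustrative statistics g(y): (number of edges, number of mutual dyads)."""
    edges = sum(y.values())
    mutual = sum(y[i, j] * y[j, i] for (i, j) in y if i < j)
    return (edges, mutual)

def log_weight(y, eta):
    """eta . g(y), the exponent of the unnormalized probability."""
    return sum(e * s for e, s in zip(eta, g_stats(y)))

def normalizing_constant(eta, n):
    """c(eta) as the sum of exp{eta . g(y)} over all 2^N graphs, N = n(n-1)."""
    return sum(math.exp(log_weight(y, eta)) for y in enumerate_digraphs(n))

def ergm_probability(y, eta, n):
    """P_eta(Y = y) = exp{eta . g(y)} / c(eta)."""
    return math.exp(log_weight(y, eta)) / normalizing_constant(eta, n)
```

The number of terms is 2^(n(n-1)), so this is only practical for roughly n ≤ 4; it is meant to make the definition concrete, not to be used for fitting.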


Simple model-classes for social networks

Homogeneous Bernoulli graph (Erdős-Rényi model)

$Y_{ij}$ are independent and equally likely, with log-odds $\eta = \operatorname{logit}[P_\eta(Y_{ij} = 1)]$

$$P_\eta(Y = y) = \frac{e^{\eta \sum_{i,j} y_{ij}}}{c(\eta, x, \mathcal{Y})}, \qquad y \in \mathcal{Y}$$

where $q = 1$, $g(y) = \sum_{i,j} y_{ij}$, $c(\eta, x, \mathcal{Y}) = [1 + \exp(\eta)]^N$

Homogeneity means it is unlikely to be proposed as a model for real phenomena.
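The closed form of the normalizing constant follows because the sum over graphs factorizes over the N dyads. A short derivation, using only the definitions on this slide and taking the sample space to be all of $\{0,1\}^N$:

```latex
\begin{aligned}
c(\eta, x, \mathcal{Y})
  &= \sum_{y \in \{0,1\}^N} \exp\Big\{\eta \sum_{i,j} y_{ij}\Big\}
   = \sum_{y \in \{0,1\}^N} \prod_{i,j} e^{\eta y_{ij}} \\
  &= \prod_{i,j} \sum_{y_{ij} \in \{0,1\}} e^{\eta y_{ij}}
   = \prod_{i,j} \big(1 + e^{\eta}\big)
   = \big[1 + \exp(\eta)\big]^{N}.
\end{aligned}
```

Dividing $e^{\eta y_{ij}}$ by the single-dyad factor $1 + e^{\eta}$ gives $P_\eta(Y_{ij} = 1) = e^{\eta} / (1 + e^{\eta})$, which is the logit relation stated above.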


Dyad-independence models with attributes

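$Y_{ij}$ are independent but depend on dyadic covariates $\{x_{k,ij}\}_{k=1}^{q}$

$$P_\eta(Y = y \mid X = x) = \frac{e^{\sum_{k=1}^{q} \eta_k g_k(y \mid x)}}{c(\eta, x, \mathcal{Y})}, \qquad y \in \mathcal{Y}$$

$$g_k(y \mid x) = \sum_{i,j} x_{k,ij}\, y_{ij}, \qquad k = 1, \ldots, q$$

$$c(\eta, x, \mathcal{Y}) = \prod_{i,j} \Big[1 + \exp\Big(\sum_{k=1}^{q} \eta_k x_{k,ij}\Big)\Big]$$

Of course,

$$\operatorname{logit}[P_\eta(Y_{ij} = 1 \mid X = x)] = \sum_{k} \eta_k x_{k,ij}$$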
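Because the dyads are independent, the model is a logistic regression on the dyads. The sketch below computes the tie probabilities and the log-probability of a graph from the formulas above. The nested-list layout `x[k][i][j]`, the exclusion of the diagonal, and the function names are assumptions made for illustration.

```python
import math

def linear_predictor(eta, x, i, j):
    """sum_k eta_k * x_{k,ij} for the ordered pair (i, j)."""
    return sum(eta[k] * x[k][i][j] for k in range(len(eta)))

def tie_probabilities(eta, x):
    """P(Y_ij = 1 | X = x) = inverse-logit of the linear predictor (i != j)."""
    n = len(x[0])
    probs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                probs[i][j] = 1.0 / (1.0 + math.exp(-linear_predictor(eta, x, i, j)))
    return probs

def log_normalizing_constant(eta, x):
    """log c = sum over dyads of log(1 + exp(linear predictor))."""
    n = len(x[0])
    return sum(
        math.log1p(math.exp(linear_predictor(eta, x, i, j)))
        for i in range(n) for j in range(n) if i != j
    )

def log_probability(y, eta, x):
    """log P(Y = y | X = x) = sum_k eta_k g_k(y | x) - log c."""
    n = len(y)
    exponent = sum(
        linear_predictor(eta, x, i, j) * y[i][j]
        for i in range(n) for j in range(n) if i != j
    )
    return exponent - log_normalizing_constant(eta, x)
```

A typical use is to take the first covariate as a constant (an intercept, matching the Bernoulli graph) and the second as an indicator that i and j share an attribute.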


Some history of exponential family models for social networks

Holland and Leinhardt (1981) proposed a general dyad independence model

– Also a homogeneous version they refer to as the "p1" model

$$P_\eta(Y = y) = \frac{\exp\big\{\rho \sum_{i<j} y_{ij} y_{ji} + \phi\, y_{++} + \sum_i \alpha_i y_{i+} + \sum_j \beta_j y_{+j}\big\}}{\kappa(\rho, \alpha, \beta, \phi)}$$

where $\eta = (\rho, \alpha, \beta, \phi)$ and $\kappa$ is the normalizing constant.

– $\phi$ controls the expected number of edges
– $\rho$ represents the expected tendency toward reciprocation
– $\alpha_i$: productivity of node $i$; $\beta_j$: attractiveness of node $j$

Much related work and generalizations
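The sufficient statistics of the p1 model are simple functions of the adjacency matrix: the number of mutual dyads, the total number of edges, and the out- and in-degree sequences. A minimal sketch, assuming a directed 0/1 adjacency matrix stored as nested lists with a zero diagonal; the function and argument names are illustrative, and `phi` stands for the edge parameter.

```python
def p1_statistics(y):
    """Return (mutual dyads, edges y_++, out-degrees y_i+, in-degrees y_+j)."""
    n = len(y)
    mutual = sum(y[i][j] * y[j][i] for i in range(n) for j in range(i + 1, n))
    out_degree = [sum(y[i][j] for j in range(n) if j != i) for i in range(n)]
    in_degree = [sum(y[i][j] for i in range(n) if i != j) for j in range(n)]
    edges = sum(out_degree)
    return mutual, edges, out_degree, in_degree

def p1_log_weight(y, rho, phi, alpha, beta):
    """Exponent of the p1 model: rho*M + phi*y_++ + sum_i alpha_i y_i+ + sum_j beta_j y_+j."""
    mutual, edges, out_degree, in_degree = p1_statistics(y)
    return (
        rho * mutual
        + phi * edges
        + sum(a * d for a, d in zip(alpha, out_degree))
        + sum(b * d for b, d in zip(beta, in_degree))
    )
```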


Generative Theory for Network Structure

Actor Markov statistics

⇒ Frank and Strauss (1986)

– motivated by notions of "symmetry" and "homogeneity"
– $Y_{ij}$ in $Y$ that do not share an actor are conditionally independent given the rest of the network

⇒ analogous to nearest neighbor ideas in spatial modeling

Degree distribution: $d_k(y)$ = proportion of actors of degree $k$ in $y$.

triangles: triangle($y$) = number of triads that form a complete sub-graph in $y$.
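Both statistics can be read directly off an adjacency matrix. The following sketch assumes a non-directed graph stored as a symmetric 0/1 nested list with a zero diagonal; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def degree_distribution(y):
    """d_k(y): proportion of actors with degree k, returned as {k: proportion}."""
    n = len(y)
    counts = Counter(sum(row) for row in y)
    return {k: counts[k] / n for k in sorted(counts)}

def triangle_count(y):
    """Number of triads {i, j, k} whose three edges are all present."""
    n = len(y)
    return sum(
        y[i][j] * y[i][k] * y[j][k]
        for i, j, k in combinations(range(n), 3)
    )
```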


Classes of statistics used for modeling

1) Nodal Markov statistics ⇒ Frank and Strauss (1986)

– motivated by notions of "symmetry" and "homogeneity"
– edges in $Y$ that do not share an actor are conditionally independent given the rest of the network
⇒ analogous to nearest neighbor ideas in spatial statistics

• Degree distribution: $d_k(y)$ = proportion of nodes of degree $k$ in $y$.
• k-star distribution: $s_k(y)$ = proportion of k-stars in the graph $y$.
• triangles: $t_1(y)$ = proportion of triangles in the graph $y$.

[Figure: three configurations. Triangle (= transitive triad) on nodes $i$, $j$, $h$; two-star on node $i$ with $j_1$, $j_2$; three-star on node $i$ with $j_1$, $j_2$, $j_3$.]


More general mechanisms motivated by conditional independence

⇒ Pattison and Robins (2002), Butts (2005)
⇒ Snijders, Pattison, Robins and Handcock (2006)

– $Y_{uj}$ and $Y_{iv}$ in $Y$ are conditionally independent given the rest of the network if they could not produce a cycle in the network

New specifications for ERGMs

Figure 2: Partial conditional dependence when four-cycle is created (nodes $i$, $v$, $u$, $j$)

(see Figure 2). This partial conditional independence assumption states that two possible edges with four distinct nodes are conditionally dependent whenever their existence in the graph would create a four-cycle. One substantive interpretation is that the possibility of a four-cycle establishes the structural basis for a "social setting" among four individuals (Pattison and Robins, 2002), and that the probability of a dyadic tie between two nodes (here, $i$ and $v$) is affected not just by the other ties of these nodes but also by other ties within such a social setting, even if they do not directly involve $i$ and $v$.

A four-cycle assumption is a natural extension of modeling based on triangles (three-cycles), and was first used by Lazega and Pattison (1999) in an examination of whether such larger cycles could be observed in an empirical setting to a greater extent than could be accounted for by parameters for configurations involving at most 3 nodes. Let us consider the four-cycle assumption alongside the Markov dependence. Under the Markov assumption, $Y_{iv}$ is conditionally dependent on each of $Y_{iu}$, $Y_{uv}$, $Y_{ij}$ and $Y_{jv}$, because these edge indicators share a node. So if $y_{iu} = y_{jv} = 1$ (the precondition in the four-cycle partial conditional dependence), then all five of these possible edges can be mutually dependent, and hence the exponential model (4) could contain a parameter corresponding to the count of such configurations. We term this configuration, given by

$$y_{iv} = y_{iu} = y_{ij} = y_{uv} = y_{jv} = 1,$$

a two-triangle (see Figure 3). It represents the edge $y_{ij} = 1$ as part of the triadic setting $y_{ij} = y_{iv} = y_{jv} = 1$ as well as the setting $y_{ij} = y_{iu} = y_{ju} = 1$.

Motivated by this approach, we introduce here a generalization of triadic structures in the form of graph configurations that we term k-triangles. For a non-directed graph, a k-triangle with base $(i, j)$ is defined by the presence of a base edge $i - j$ together with the presence of at least $k$ other nodes adjacent to both $i$ and $j$. We denote a 'side' of a k-triangle as any edge that is not the base. The integer $k$ is called the order of the k-triangle. Thus a k-triangle is a combination of $k$ individual triangles, each sharing the same edge $i - j$. The concept of a k-triangle can be seen as a triadic analogue of a


This produces features on configurations of the form:

edgewise shared partner distribution: $p_k(y)$ = proportion of edges between actors with exactly $k$ shared partners, $k = 0, 1, \ldots$

2) Other conditional independence statistics

⇒ Pattison and Robins (2002), Butts (2005)
⇒ Snijders, Pattison, Robins and Handcock (2004)

– edges in $Y$ that are not tied are conditionally independent given the rest of the network

• k-triangle distribution: $t_k(y)$ = proportion of k-triangles in the graph $y$.
• edgewise shared partner distribution: $p_k(y)$ = proportion of edges with exactly $k$ shared partners in $y$.

Figure: k-triangle for $k = 5$, i.e., a 5-triangle. The actors in the non-directed $(i, j)$ edge have 5 shared partners $h_1, \ldots, h_5$.

dyadwise shared partner distribution: $\mathrm{dsp}_k(y)$ = proportion of dyads with exactly $k$ shared partners, $k = 0, 1, \ldots$
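The two shared-partner distributions differ only in which pairs are counted: the edgewise version uses pairs joined by an edge, the dyadwise version uses all pairs. A minimal sketch, assuming a non-directed graph stored as a symmetric 0/1 nested list; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def shared_partners(y, i, j):
    """Number of nodes h adjacent to both i and j."""
    return sum(1 for h in range(len(y)) if h != i and h != j and y[i][h] and y[j][h])

def _proportions(counts):
    """Turn a Counter of values into {value: proportion}."""
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())} if total else {}

def esp_distribution(y):
    """p_k(y): proportion of edges whose endpoints have exactly k shared partners."""
    pairs = combinations(range(len(y)), 2)
    return _proportions(Counter(shared_partners(y, i, j) for i, j in pairs if y[i][j]))

def dsp_distribution(y):
    """dsp_k(y): proportion of all dyads with exactly k shared partners."""
    pairs = combinations(range(len(y)), 2)
    return _proportions(Counter(shared_partners(y, i, j) for i, j in pairs))
```

In the 5-triangle of the figure, the base edge $(i, j)$ contributes to $p_5(y)$, while each side edge contributes to $p_1(y)$.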


Structural Signatures

– identify social constructs or features
– based on intuitive notions or partial appeal to substantive theory

Clusters of edges are often transitive:
Recall triangle($y$) is the proportion of triads that form triangles (here $g$ is the number of actors)

$$\text{triangle}(y) = \frac{1}{\binom{g}{3}} \sum_{\{i,j,k\} \in \binom{g}{3}} y_{ij}\, y_{ik}\, y_{jk}$$

A closely related quantity is the proportion of triangles amongst two-stars

$$C(y) = \frac{3 \times \text{triangle}(y)}{\text{two-star}(y)}$$

mean clustering coefficient
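The clustering coefficient can be computed from two counts. The sketch below uses raw counts of triangles and two-stars (an assumption; the factor 3 reflects that each triangle contains three two-stars) on a non-directed graph stored as a symmetric 0/1 nested list. The function names are illustrative.

```python
from itertools import combinations

def triangle_count(y):
    """Number of triads {i, j, k} with all three edges present."""
    n = len(y)
    return sum(y[i][j] * y[i][k] * y[j][k] for i, j, k in combinations(range(n), 3))

def two_star_count(y):
    """Number of two-stars: each node of degree d is the center of d*(d-1)/2 of them."""
    return sum(d * (d - 1) // 2 for d in (sum(row) for row in y))

def clustering_coefficient(y):
    """C(y) = 3 * triangles / two-stars; defined as 0.0 when there are no two-stars."""
    stars = two_star_count(y)
    return 3 * triangle_count(y) / stars if stars else 0.0
```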


Exponential-family Random Network Models
Joint modeling of Y and X

Let $\mathcal{N}$ be the sample space of $(Y, X)$

Model the multivariate distribution of $(Y, X)$ via the form:

$$P_\eta(Y = y, X = x) = \frac{\exp\{\eta \cdot g(y, x)\}}{c(\eta, \mathcal{N})}, \qquad (y, x) \in \mathcal{N}$$

$\eta \in \Lambda \subset \mathbb{R}^q$: q-vector of parameters

$g(y, x)$: q-vector of network statistics
⇒ $g(Y, X)$ are jointly sufficient for the model

$c(\eta, \mathcal{N})$: distribution normalizing constant

$$c(\eta, \mathcal{N}) = \int_{(y, x) \in \mathcal{N}} \exp\{\eta \cdot g(y, x)\}\, dP_0(y, x)$$


Interesting model-classes of ERNM

Relationship to ERGM and Random Fields

Let $\mathcal{N}(x) = \{y : (x, y) \in \mathcal{N}\}$ and $\mathcal{N}(y) = \{x : (x, y) \in \mathcal{N}\}$

ERGM:
$$P(Y = y \mid X = x; \eta) = \frac{1}{c(\eta; x)}\, e^{\eta \cdot h(x, y)}, \qquad y \in \mathcal{N}(x)$$

Gibbs measure:
$$P(X = x \mid Y = y; \eta) = \frac{1}{c(\eta; y)}\, e^{\eta \cdot h(x, y)}, \qquad x \in \mathcal{N}(y)$$

The first model is the ERGM for the network conditional on the nodal attributes.

The second model is an exponential-family for the field of nodal attributes conditional on the network.


Relationship with ERGM

The model can be expressed as

$$P(X = x, Y = y \mid \eta) = P(Y = y \mid X = x; \eta)\, P(X = x \mid \eta)$$

where

$$P(Y = y \mid X = x; \eta) = \frac{1}{c(\eta; x)}\, e^{\eta \cdot h(x, y)}, \qquad y \in \mathcal{N}(x)$$

$$P(X = x \mid \eta) = \frac{c(\eta; x)}{c(\eta, \mathcal{N})}, \qquad x \in \mathcal{X}$$

The first sub-model is the ERGM for the network conditional on the nodal attributes.

The second sub-model is the marginal representation of the nodal attributes and is not necessarily an exponential-family with canonical parameter $\eta$.

This decomposition makes it clear why the conditional modeling of $Y$ given $X$ via ERGM differs from the joint modeling of $Y$ and $X$ via ERNM.
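The marginal form of the attribute model follows by summing the joint probability over graphs. A short derivation, assuming a discrete sample space, the joint form $P(X = x, Y = y \mid \eta) = e^{\eta \cdot h(x, y)} / c(\eta, \mathcal{N})$, and $c(\eta; x) = \sum_{y \in \mathcal{N}(x)} e^{\eta \cdot h(x, y)}$:

```latex
\begin{aligned}
P(X = x \mid \eta)
  &= \sum_{y \in \mathcal{N}(x)} P(X = x, Y = y \mid \eta)
   = \frac{1}{c(\eta, \mathcal{N})} \sum_{y \in \mathcal{N}(x)} e^{\eta \cdot h(x, y)}
   = \frac{c(\eta; x)}{c(\eta, \mathcal{N})}, \\[4pt]
P(Y = y \mid X = x; \eta)\, P(X = x \mid \eta)
  &= \frac{e^{\eta \cdot h(x, y)}}{c(\eta; x)} \cdot \frac{c(\eta; x)}{c(\eta, \mathcal{N})}
   = \frac{e^{\eta \cdot h(x, y)}}{c(\eta, \mathcal{N})}.
\end{aligned}
```

The ERGM normalizing constant $c(\eta; x)$ cancels in the product, which is why it carries information about $\eta$ that a purely conditional ERGM analysis discards.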


Separable ERGM and Field Models

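Suppose the model can be expressed as

$$P(X = x, Y = y \mid \eta_1, \eta_2) = \frac{1}{c(\eta_1, \eta_2, \mathcal{N})}\, e^{\eta_1 \cdot h(x) + \eta_2 \cdot g(y)}, \qquad (y, x) \in \mathcal{N}, \qquad (1)$$

where $\mathcal{N} = \mathcal{Y} \times \mathcal{X}$. Then

$$P(X = x \mid \eta_1) = \frac{1}{c_1(\eta_1, \mathcal{X})}\, e^{\eta_1 \cdot h(x)}$$

$$P(Y = y \mid \eta_2) = \frac{1}{c_2(\eta_2, \mathcal{Y})}\, e^{\eta_2 \cdot g(y)}.$$

The first sub-model is a general exponential-family model for the attributes (e.g., generalized linear models)

The second sub-model is an ERGM for the graph that has no dependence on the nodal attributes.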
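The two marginal forms follow because the normalizing constant factorizes over the product space. A short derivation, assuming a discrete sample space $\mathcal{N} = \mathcal{Y} \times \mathcal{X}$:

```latex
\begin{aligned}
c(\eta_1, \eta_2, \mathcal{N})
  &= \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} e^{\eta_1 \cdot h(x)}\, e^{\eta_2 \cdot g(y)}
   = \Big(\sum_{x \in \mathcal{X}} e^{\eta_1 \cdot h(x)}\Big)
     \Big(\sum_{y \in \mathcal{Y}} e^{\eta_2 \cdot g(y)}\Big)
   = c_1(\eta_1, \mathcal{X})\; c_2(\eta_2, \mathcal{Y}), \\[4pt]
P(X = x, Y = y \mid \eta_1, \eta_2)
  &= \frac{e^{\eta_1 \cdot h(x)}}{c_1(\eta_1, \mathcal{X})} \cdot
     \frac{e^{\eta_2 \cdot g(y)}}{c_2(\eta_2, \mathcal{Y})}
   = P(X = x \mid \eta_1)\, P(Y = y \mid \eta_2).
\end{aligned}
```

So under (1) the attributes and the graph are independent, and each parameter can be estimated from its own sub-model.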


Example: Joint Ising Models

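Suppose $X$ is univariate and binary, $x_i \in \{-1, 1\}$. One measure of homophily on $x$ is

$$\text{homophily}(y, x) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i\, y_{i,j}\, x_j \qquad (2)$$

A simple model for the network is

$$P(X = x, Y = y \mid \eta_1, \eta_2) \propto e^{\eta_1 \text{density}(y) + \eta_2 \text{homophily}(y, x)}, \qquad (y, x) \in \mathcal{N},$$

where $\text{density}(y) = \frac{1}{n} \sum_i \sum_j y_{i,j}$

GLM:
$$P(Y_{i,j} = y_{i,j} \mid X = x, \eta_1, \eta_2) \propto e^{\eta_1 \frac{1}{n} y_{i,j} + \eta_2 x_i y_{i,j} x_j}, \qquad y_{i,j} \in \{0, 1\},\ x \in \mathcal{X}$$

Ising model:
$$P(X = x \mid Y = y, \eta_2) \propto e^{\eta_2 \sum_i \sum_j x_i y_{i,j} x_j}, \qquad (y, x) \in \mathcal{N}$$

So we have a simple joint Ising model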
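The two conditional models can be checked numerically against the joint exponent. The sketch below computes the statistics, the exponent of the joint model, and the implied conditional log-odds for a single tie and a single attribute. It assumes a directed 0/1 adjacency matrix with zero diagonal and a sample space containing every $x \in \{-1, 1\}^n$; the function names are illustrative.

```python
def density(y):
    """density(y) = (1/n) * sum_i sum_j y_ij."""
    n = len(y)
    return sum(y[i][j] for i in range(n) for j in range(n) if i != j) / n

def homophily(y, x):
    """homophily(y, x) = sum_i sum_j x_i y_ij x_j with x_i in {-1, +1}."""
    n = len(y)
    return sum(x[i] * y[i][j] * x[j] for i in range(n) for j in range(n) if i != j)

def log_weight(y, x, eta1, eta2):
    """Exponent of the joint model: eta1 * density + eta2 * homophily."""
    return eta1 * density(y) + eta2 * homophily(y, x)

def tie_log_odds(x, i, j, eta1, eta2):
    """Log-odds of y_ij = 1 versus 0 given x (the GLM): eta1/n + eta2 * x_i * x_j."""
    return eta1 / len(x) + eta2 * x[i] * x[j]

def attribute_log_odds(y, x, i, eta2):
    """Log-odds of x_i = +1 versus -1 given y and the other attributes (the Ising model)."""
    n = len(y)
    field = sum((y[i][j] + y[j][i]) * x[j] for j in range(n) if j != i)
    return 2 * eta2 * field
```

Each log-odds is the change in the joint exponent when one tie or one attribute is flipped, which is what a Gibbs sampler for the joint model would use.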


Model specification issues: Degeneracy

Joint Ising model: n = 20 with moderate homophily (mean-value = 0.76):

[Figure panels, each a histogram of Frequency against Count: # of edges within x = 1; # of edges within x = -1; # of edges between x = 1 and x = -1; # of nodes with x = 1.]

Figure: 100,000 draws from an Ising homophily network model with $\eta_1 = 0$ and $\eta_2 = 0.13$. Mean values are marked in red.


Model specification issues: Degeneracy

So standard homophily in ERGM leads to degeneracy in ERNM

Solution: regularized homophily

Suppose $x$ is categorical with category labels $1, \ldots, K$.

$$\text{homophily}_{k,l}(y, x) = \sum_{i=1}^{n} \sum_{j=1}^{n} I(x_i = k)\, y_{i,j}\, I(x_j = l). \qquad (3)$$

Let $d_{i,k}(y, x) = \sum_{j \neq i} y_{ij}\, I(x_j = k)$ be the number of edges connecting node $i$ to nodes in category $k$.

$$\text{rhomophily}_{k,l}(y, x) = \sum_{i : x_i = k} \Big[ \sqrt{d_{i,l}(y, x)} - E_{\perp\!\!\!\perp}\big(\sqrt{d_{i,l}(Y, X)}\big) \Big] \qquad (4)$$

where $E_{\perp\!\!\!\perp}(\cdot)$ is the expectation conditional on the number of nodes in each category of $x$ under the assumption that $X$ and $Y$ are independent.
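One way to get a feel for the regularized statistic is to approximate the independence expectation by simulation. The sketch below holds the graph fixed and randomly permutes the category labels, which preserves the number of nodes in each category and breaks any dependence between labels and edges. This permutation scheme, the decision to estimate the expectation of the whole sum at once, and all function names are assumptions made for illustration; the slides do not say how the expectation is evaluated.

```python
import math
import random

def category_degree(y, x, i, k):
    """d_{i,k}(y, x): number of edges from node i to nodes in category k."""
    return sum(y[i][j] for j in range(len(y)) if j != i and x[j] == k)

def sqrt_degree_sum(y, x, k, l):
    """Sum over nodes i in category k of sqrt(d_{i,l}(y, x))."""
    return sum(
        math.sqrt(category_degree(y, x, i, l))
        for i in range(len(y)) if x[i] == k
    )

def regularized_homophily(y, x, k, l, draws=2000, seed=0):
    """Observed sqrt-degree sum minus a permutation estimate of its expectation."""
    rng = random.Random(seed)
    observed = sqrt_degree_sum(y, x, k, l)
    labels = list(x)
    total = 0.0
    for _ in range(draws):
        rng.shuffle(labels)  # keeps category counts, breaks the link between x and y
        total += sqrt_degree_sum(y, labels, k, l)
    return observed - total / draws
```

The square root damps the contribution of high-degree nodes, so the statistic grows more slowly than the raw count in (3) as within-category edges accumulate.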


What happens when we fit the model with our new homophily?


Comparison of Homophily Statistics

[Figure: density plots (horizontal axis: value; vertical axis: density) comparing two models, Ising and Regularized Homophily, in six panels: # of edges within group x = -1; # of edges within group x = 1; # of edges from x = -1 to x = 1; # of edges from x = 1 to x = -1; Regularized Homophily; # of nodes with x = 1.]


Fitting Models to Partially Observed Social Network Data

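Focus on the joint distribution of $Z = (Y, X)$.

Two types of data:
Observed relations and covariates ($z_{\mathrm{obs}} = (y_{\mathrm{obs}}, x_{\mathrm{obs}})$), and information about the observation mechanism ($D$) (e.g., indicators of relations and covariates sampled).

$$\begin{aligned}
L(\eta, \psi) &\equiv P(Z_{\mathrm{obs}} = z_{\mathrm{obs}}, D \mid \eta, \psi) \\
&= \sum_{z_{\mathrm{unobs}}} P(Z_{\mathrm{obs}} = z_{\mathrm{obs}}, Z_{\mathrm{unobs}} = z_{\mathrm{unobs}}, D \mid \eta, \psi) \\
&= \sum_{z_{\mathrm{unobs}}} P(D \mid Z_{\mathrm{obs}} = z_{\mathrm{obs}}, Z_{\mathrm{unobs}} = z_{\mathrm{unobs}}, \psi)\, P_\eta(Z_{\mathrm{obs}} = z_{\mathrm{obs}}, Z_{\mathrm{unobs}} = z_{\mathrm{unobs}})
\end{aligned}$$

$\eta$ is the model parameter

$\psi$ is the sampling parameter

When can we "ignore" the sampling process?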


Adaptive Sampling Designs

We call a sampling design adaptive if:

$$P(D = d \mid Z_{obs}, Z_{mis}, \psi) = P(D = d \mid Z_{obs}, \psi) \quad \forall z \in \mathcal{Z}.$$

That is, it uses information collected during the survey to direct subsequent sampling, but the sampling design depends only on the observed data.

Adaptive sampling designs satisfy a condition called “missing at random” by Rubin (1976) in the context of missing data.

Result: standard network sampling designs such as conventional, single-wave and multi-wave link-tracing sampling designs are adaptive

⇒ Thompson and Frank (2000), Handcock and Gile (2007).


When is sampling adaptive?

Examples of adaptive sampling:

Individual sample: sampling based on observed characteristics that we know, such as race, sex, and age.

Link-tracing sample starting with an adaptive sample, with follow-up based on observed relations with others in the sample, as well as on characteristics such as race, sex, and age.

Link-tracing with probability proportional to number of partners is adaptive!

Examples of non-adaptive (not missing at random) sampling:

Individual sample based on unobserved properties of non-respondents, such as infection status or illicit activity.

Link-tracing sample in which links are followed depending on unobserved properties of alters.
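The adaptive case can be illustrated with a small simulation. The sketch below is not code from the talk: the function name `one_wave_link_trace`, the toy adjacency matrix, and the convention that a sampled actor reports all of their outgoing ties are assumptions made for illustration. It draws a one-wave link-tracing sample and shows that the observation indicators are unchanged when a tie that was never observed is altered, which is the sense in which the design depends only on the observed data.

```python
def one_wave_link_trace(adj, seeds):
    """Return (sampled actors, observed-dyad indicators) for a one-wave design.

    adj[i][j] == 1 means actor i nominates actor j. Seeds report all of their
    outgoing ties; every actor they nominate is followed up and does the same.
    (Hypothetical helper, for illustration only.)
    """
    n = len(adj)
    wave0 = set(seeds)
    # Wave 1 is chosen using only ties reported by the seeds (observed data).
    wave1 = {j for i in wave0 for j in range(n) if adj[i][j] == 1} - wave0
    sampled = wave0 | wave1
    # d[i][j] == 1 when dyad (i, j) is observed, i.e. its sender was sampled.
    d = [[1 if (i in sampled and i != j) else 0 for j in range(n)] for i in range(n)]
    return sampled, d


# Toy network (made up): 0 -> 1, 1 -> 2, 3 -> 0, 4 -> 3.
adj = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
sampled, d = one_wave_link_trace(adj, seeds=[0])
print(sorted(sampled))  # actors whose outgoing ties are observed

# Changing a tie that was never observed (sender 4 is unsampled) leaves D unchanged.
adj_alt = [row[:] for row in adj]
adj_alt[4][3] = 0
assert one_wave_link_trace(adj_alt, seeds=[0]) == (sampled, d)
```

A non-adaptive variant would let the follow-up step consult something the survey never records, for example an unreported attribute of the nominated actor.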


Adaptive Sampling Designs and their Amenable Models

Definition: Consider a sampling design governed by parameter $\psi \in \Psi$ and a stochastic network model $P_\eta(Y = y, X = x)$ governed by parameter $\eta \in \Xi$. We call the sampling design amenable to the model if the sampling design is adaptive and the parameters $\psi$ and $\eta$ are distinct.

Result: If the sampling design is amenable to the model, the likelihood for $\eta$ and $\psi$ is

$$L[\eta, \psi \mid Z_{obs} = z_{obs}, D = d] \propto L[\psi \mid D = d, Z_{obs} = z_{obs}]\; L[\eta \mid Z_{obs} = z_{obs}]$$

that is, sampling design likelihood $\times$ face-value likelihood, where

sampling: $L[\psi \mid D = d, Z_{obs} = z_{obs}] = P(D = d \mid Z_{obs} = z_{obs}, \psi)$

network: $L[\eta \mid Z_{obs} = z_{obs}] = \sum_{z_{unobs}} P_\eta(Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs})$
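The factorization follows in one step from the general likelihood. As a worked sketch in the notation of this slide (writing $\psi$ for the sampling parameter), adaptivity makes the design term constant in the unobserved data, so it leaves the sum:

```latex
\[
\begin{aligned}
L[\eta, \psi \mid Z_{obs} = z_{obs}, D = d]
&\propto \sum_{z_{unobs}} P(D = d \mid Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs}, \psi)\,
         P_\eta(Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs}) \\
&= P(D = d \mid Z_{obs} = z_{obs}, \psi)
   \sum_{z_{unobs}} P_\eta(Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs})
   \qquad \text{(adaptive design)} \\
&= L[\psi \mid D = d, Z_{obs} = z_{obs}] \; L[\eta \mid Z_{obs} = z_{obs}].
\end{aligned}
\]
```

Because $\psi$ and $\eta$ are distinct, the first factor carries no information about $\eta$, so inference for $\eta$ can be based on the face-value likelihood alone.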


Adaptive Sampling Designs and their Amenable Models

Result: If the sampling design is not amenable to the model, the likelihood for $\eta$ and $\psi$ is

$$L(\eta, \psi) = \sum_{z_{unobs}} P(D \mid Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs}, \psi)\, P_\eta(Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs})$$

and the design will need to be represented.

Clearly $P(D \mid Y, X, \psi)$ can be modeled when it is unknown.


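Likelihood-based inference for ERNM when partially observed

Consider the conditional distribution of $T$ given $T_{obs}$:

$$P_\eta(T_{unobs} = t \mid T_{obs} = t_{obs}) = \exp[\eta \cdot g(t + t_{obs}) - \kappa(\eta \mid t_{obs})], \quad t \in \mathcal{T}(t_{obs})$$

where $\mathcal{T}(t_{obs}) = \{t : t + t_{obs} \in \mathcal{T}\}$ and

$$\kappa(\eta \mid t_{obs}) = \log \sum_{u \in \mathcal{T}(t_{obs})} \exp[\eta \cdot g(u + t_{obs})].$$

Note that

$$L[\eta \mid T_{obs} = t_{obs}] \propto \exp[\kappa(\eta \mid t_{obs}) - \kappa(\eta)]$$

which can then be estimated by MCMC:

the first term, $\kappa(\eta \mid t_{obs})$, by a chain conditional on $t_{obs}$ over $\mathcal{T}(t_{obs})$; and

the second term, $\kappa(\eta)$, by a chain on the complete data over $\mathcal{T}$.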
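For a network small enough to enumerate, both normalizing terms can be computed exactly, which makes the identity above easy to check before turning to MCMC. The sketch below is illustrative only: the three-node directed network, the choice of statistics (edge count and mutual-dyad count), the parameter values, and the names `kappa` and `face_value_loglik` are all assumptions, not part of the talk.

```python
import itertools
import math

N = 3
DYADS = [(i, j) for i in range(N) for j in range(N) if i != j]  # 6 ordered pairs


def g(ties):
    """Statistics (edge count, mutual-dyad count); ties maps (i, j) to 0 or 1."""
    edges = sum(ties.values())
    mutual = sum(ties[(i, j)] * ties[(j, i)] for (i, j) in DYADS if i < j)
    return (edges, mutual)


def log_sum_exp(values):
    """Numerically stable log of the sum of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def kappa(eta, t_obs=None):
    """log of the sum of exp(eta . g) over all graphs that agree with t_obs.

    With t_obs=None this is kappa(eta); otherwise it is kappa(eta | t_obs).
    """
    t_obs = t_obs or {}
    free = [dyad for dyad in DYADS if dyad not in t_obs]
    terms = []
    for values in itertools.product([0, 1], repeat=len(free)):
        ties = dict(t_obs)
        ties.update(zip(free, values))  # one completion of the unobserved dyads
        terms.append(sum(e * s for e, s in zip(eta, g(ties))))
    return log_sum_exp(terms)


def face_value_loglik(eta, t_obs):
    """log L[eta | T_obs = t_obs] = kappa(eta | t_obs) - kappa(eta)."""
    return kappa(eta, t_obs) - kappa(eta)


eta = (-1.0, 2.0)                           # hypothetical parameter values
t_obs = {(0, 1): 1, (1, 0): 1, (0, 2): 0}   # three observed dyads, three unobserved
print(face_value_loglik(eta, t_obs))
```

Exponentiating `face_value_loglik` gives the marginal probability of the observed dyads, so summing it over all eight possible values of those three dyads returns one. Enumeration grows as $2^{n(n-1)}$, which is why the slide resorts to MCMC for realistic networks.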


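Likelihood-based inference for ERNM when partially observed

In general, the observed data log likelihood ratio of $(\eta, \theta)$ versus $(\eta_0, \theta_0)$ is

$$
\begin{aligned}
\ell(\eta, \theta) - \ell(\eta_0, \theta_0)
&= \log\left(\frac{c(t_{obs}, w, \eta, \theta)}{c(t_{obs}, w, \eta_0, \theta_0)}\right) - \log\left(\frac{c(\eta, \mathcal{T})}{c(\eta_0, \mathcal{T})}\right) \\
&= \log\left(\sum_{t_{miss}} \frac{p(W = w \mid T = t, \theta)}{p(W = w \mid T = t, \theta_0)}\, e^{(\eta - \eta_0) \cdot g(t)}\, \frac{p(W = w \mid T = t, \theta_0)\, e^{\eta_0 \cdot g(t)}}{c(t_{obs}, w, \eta_0, \theta_0)}\right) \\
&\quad - \log\left(\sum_{t \in \mathcal{T}} e^{(\eta - \eta_0) \cdot g(t)}\, \frac{e^{\eta_0 \cdot g(t)}}{c(\eta_0, \mathcal{T})}\right) \\
&= \log\left(E_{\eta_0, \theta_0}\!\left[\frac{p(W = w \mid T, \theta)}{p(W = w \mid T, \theta_0)}\, e^{(\eta - \eta_0) \cdot g(T)} \,\Big|\, W = w, T_{obs} = t_{obs}\right]\right) \\
&\quad - \log\left(E_{\eta_0}\!\left[e^{(\eta - \eta_0) \cdot g(T)}\right]\right)
\end{aligned}
\tag{5}
$$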


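Likelihood-based inference for ERNM when partially observed

In general, the observed data log likelihood ratio of $(\eta, \theta)$ versus $(\eta_0, \theta_0)$ may be approximated by

$$
\ell(\eta, \theta) - \ell(\eta_0, \theta_0) \approx \log\left(\frac{1}{M} \sum_{i=1}^{M} \frac{p(w \mid t_m^{(i)}, \theta)}{p(w \mid t_m^{(i)}, \theta_0)}\, e^{(\eta - \eta_0) \cdot g(t_m^{(i)})}\right) - \log\left(\frac{1}{M} \sum_{i=1}^{M} e^{(\eta - \eta_0) \cdot g(t^{(i)})}\right).
$$

Here, matching the two expectations in (5), the $t_m^{(i)}$ are draws conditional on $W = w$ and $T_{obs} = t_{obs}$ under $(\eta_0, \theta_0)$, and the $t^{(i)}$ are unconditional draws under $\eta_0$.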
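Once the two sets of MCMC draws are available, the approximation is a difference of two log-averages, which is best computed on the log scale to avoid overflow. The sketch below assumes the statistics $g(\cdot)$ of each draw have already been computed; the function names and the optional `log_design_ratio` argument are hypothetical, and the sampler itself is out of scope.

```python
import math


def log_mean_exp(log_terms):
    """Numerically stable log of the average of exp(log_terms)."""
    m = max(log_terms)
    return m + math.log(sum(math.exp(v - m) for v in log_terms) / len(log_terms))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def approx_log_lik_ratio(eta, eta0, cond_stats, uncond_stats, log_design_ratio=None):
    """Monte Carlo estimate of l(eta, theta) - l(eta0, theta0).

    cond_stats:   g(t) for draws made conditional on the observed data at (eta0, theta0)
    uncond_stats: g(t) for unconditional draws at eta0
    log_design_ratio: log p(w | t, theta) - log p(w | t, theta0) for each conditional
                      draw; defaults to zeros, i.e. the design ratio equals one.
    """
    delta = [e - e0 for e, e0 in zip(eta, eta0)]
    if log_design_ratio is None:
        log_design_ratio = [0.0] * len(cond_stats)
    first = log_mean_exp(
        [r + dot(delta, s) for r, s in zip(log_design_ratio, cond_stats)]
    )
    second = log_mean_exp([dot(delta, s) for s in uncond_stats])
    return first - second


# Made-up statistics (edge count, mutual count) for three draws from each chain.
cond = [(4, 1), (5, 2), (4, 2)]
uncond = [(3, 0), (6, 2), (4, 1)]
print(approx_log_lik_ratio((-0.9, 1.8), (-1.0, 2.0), cond, uncond))
```

At $\eta = \eta_0$ with a unit design ratio the estimate is exactly zero, which is a convenient sanity check on any implementation.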


Likelihood-based inference for ERNM when partially observed

ERGM case implemented in R package statnet (Handcock et al. 2003) (http://statnet.org).

ERNM case implemented in ernm package (Fellows 2012).


Adolescent Peer Networks

The National Longitudinal Study of Adolescent Health ⇒ www.cpc.unc.edu/projects/addhealth

– “Add Health” is a school-based study of the health-related behaviors of adolescents in grades 7 to 12.

Each student nominated up to 5 boys and 5 girls as their friends

160 schools: Smallest has 69 adolescents in grades 7–12


[Figure: network plot with both axes running from −10 to 10; nodes are labeled by grade (7 to 12).]


[Figure legend. Race: White (non-Hispanic); Black (non-Hispanic); Hispanic (of any race); Asian / Native Am / Other (non-Hispanic); Race NA. Grade: 7; 8; 9; 10; 11; 12; Grade NA.]


Application to substance use in adolescent peer networks

Data: The National Longitudinal Study of Adolescent Health

– Model friendship network as a function of student characteristics

| Form | Name | Definition |
|---|---|---|
| Y | Mean Degree | Average degree of students |
| Y | Log Variance of Degree | The log variance of the student degrees |
| Y | In Degree = 0 | # of students with in degree 0 |
| Y | In Degree = 1 | # of students with in degree 1 |
| Y | Out Degree = 0 | # of students with out degree 0 |
| Y | Out Degree = 1 | # of students with out degree 1 |
| Y | Reciprocity | # of reciprocated ties |
| X | Grade = 9 | # of freshmen |
| X | Grade = 10 | # of sophomores |
| X | Grade = 11 | # of juniors |
| X, Y | Within Grade Homophily | Pooled homophily within grade |
| X, Y | +1 Grade Homophily | Pooled homophily between each grade and the grade above it |

Table: Model Terms
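Several of the tabulated terms are simple counts that can be read off a directed adjacency matrix and a grade vector. The sketch below computes only those; it is an illustration under stated assumptions, not the authors' implementation. In particular, the function name `model_statistics` and the toy data are made up, mean degree is taken as edges per student, and each mutual pair is counted once. The log-variance and homophily terms are omitted because the slide does not give their formulas.

```python
def model_statistics(adj, grade):
    """Compute a subset of the tabulated statistics for a directed network.

    adj[i][j] == 1 means student i nominates student j; grade[i] is i's grade.
    """
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    in_deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return {
        # Assumption: "degree" is averaged as edges per student.
        "mean_degree": sum(out_deg) / n,
        "in_degree_0": in_deg.count(0),
        "in_degree_1": in_deg.count(1),
        "out_degree_0": out_deg.count(0),
        "out_degree_1": out_deg.count(1),
        # Assumption: each mutual pair i <-> j is counted once.
        "reciprocity": sum(
            adj[i][j] * adj[j][i] for i in range(n) for j in range(i + 1, n)
        ),
        "grade_9": grade.count(9),
        "grade_10": grade.count(10),
        "grade_11": grade.count(11),
    }


# Toy school (made up): 0 <-> 1, 1 -> 2, 3 -> 2.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
grade = [9, 9, 10, 11]
stats = model_statistics(adj, grade)
print(stats)
```

The Form column of the table is visible in the code: the degree and reciprocity counts use only `adj` (Y), the grade counts use only `grade` (X), and a homophily term would need both.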


Application to substance use in adolescent peer networks

| | $\eta$ | Std. Error | Z | p-value |
|---|---|---|---|---|
| Mean Degree | -217.02 | 7.81 | -27.80 | <0.001 |
| Log Variance of degree | 25.07 | 9.06 | 2.77 | 0.006 |
| In-Degree 0 | 2.62 | 0.50 | 5.20 | <0.001 |
| In-Degree 1 | 1.05 | 0.40 | 2.62 | 0.009 |
| Out-Degree 0 | 4.09 | 0.52 | 7.91 | <0.001 |
| Out-Degree 1 | 1.93 | 0.45 | 4.25 | <0.001 |
| Reciprocity | 2.71 | 0.23 | 11.77 | <0.001 |
| Grade = 9 | 1.46 | 0.62 | 2.37 | 0.018 |
| Grade = 10 | 1.93 | 0.71 | 2.72 | 0.007 |
| Grade = 11 | 2.08 | 0.59 | 3.54 | <0.001 |
| Grade Homophily | 4.34 | 0.46 | 9.41 | <0.001 |
| +1 Grade Homophily | 0.63 | 0.21 | 2.98 | 0.003 |

Table: ERNM Model with Standard Errors Based on the Fisher Information
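The Z and p-value columns are consistent with a Wald test: the estimate divided by its standard error, referred to a standard normal distribution with a two-sided tail. A minimal sketch follows; the function name `wald_test` is an assumption, and the inputs are the rounded values printed in the table.

```python
import math


def wald_test(estimate, std_error):
    """Return the Wald Z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# "Log Variance of degree" row: estimate 25.07, standard error 9.06.
z, p_value = wald_test(25.07, 9.06)
print(f"Z = {z:.2f}, p = {p_value:.3f}")
```

Because the table entries are rounded, recomputed values can differ from the printed Z in the last digits for some rows; for example, 2.62 / 0.50 gives 5.24 against the printed 5.20.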


Application to substance use in adolescent peer networks

We see that students in the same grade are much more likely to be friends

The positive coefficient for ’+1 Grade Homophily’ indicates that students also tend to form connections to the grades just below or just above them.

Evaluating the goodness-of-fit?

Simulate networks from the fitted model, and visually compare them to the observed network (Hunter, Goodreau, Handcock 2008).

Simulate network statistics from the model and compare them to the observed network.
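The second comparison can be summarized numerically by asking where the observed statistic falls in the simulated distribution. The sketch below is one simple way to do that and is not taken from the talk: the function name `monte_carlo_p_value` and the simulated values are made up, and in practice the simulated statistics would come from networks drawn from the fitted model.

```python
def monte_carlo_p_value(observed, simulated):
    """Two-sided Monte Carlo p-value for one network statistic.

    Counts how many simulated values fall at or below, and at or above, the
    observed value, and doubles the smaller proportion (capped at 1).
    """
    m = len(simulated)
    lower = sum(1 for s in simulated if s <= observed)
    upper = sum(1 for s in simulated if s >= observed)
    return min(1.0, 2.0 * min(lower, upper) / m)


# Made-up simulated counts of, say, reciprocated ties.
simulated = [10, 12, 11, 13, 12, 14, 11, 12, 13, 12]
print(monte_carlo_p_value(12, simulated))  # observed value in the middle
print(monte_carlo_p_value(20, simulated))  # observed value outside the range
```

A small value flags a statistic that the fitted model fails to reproduce. Statistics that were included as model terms should be matched closely by construction, so the more informative checks are on statistics that were not in the model.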


Model-Based Simulated High School

[Figure: two network plots, “Simulated Network” and “Observed Network”; nodes are labeled by grade (9 to 12).]


Model Diagnostics: Goodness-of-fit

[Figure: four goodness-of-fit panels. In-Degree and Out-Degree (degree values 0 to 19); # of Edges Between Grades (grade pairs from 9-9 to 12-12); Grade Counts (grades 9 to 12).]


Network Regression with endogenous nodal attributes

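Let $Z \in \{0, 1\}$ be a binary outcome variable (e.g., substance use) and consider:

$$P(Z = z, X = x, Y = y \mid \eta, \beta, \lambda) = \frac{1}{c(\beta, \eta, \lambda)}\, e^{z \cdot x\beta + \eta \cdot g(x, y) + \lambda \cdot h(z, y)}. \tag{6}$$

So, conditional on $Y = y, \beta, \lambda$:

$$\operatorname{logodds}(z_i = 1 \mid z_{-i}, X_i = x_i) - \operatorname{logodds}(z_i = 1 \mid z_{-i}, X_i = x_i^*) = \beta(x_i - x_i^*)$$

where $z_{-i}$ represents the set of $z$ not including $z_i$, and $x_i$ represents the $i$th row of $X$.

So the coefficients $\beta$ have their usual interpretation (conditional on the rest of the network)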

The usual independence assumptions do not hold
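The log-odds identity can be checked directly from model (6). The following worked step is a sketch; the symbols $z^{+i}$ and $z^{-i}$ (the vector $z$ with its $i$th entry set to 1 and to 0) are notation introduced here for convenience and do not appear on the slide.

```latex
\[
\begin{aligned}
\operatorname{logodds}(z_i = 1 \mid z_{-i}, X = x, Y = y)
&= \log \frac{P(Z = z^{+i}, X = x, Y = y \mid \eta, \beta, \lambda)}
             {P(Z = z^{-i}, X = x, Y = y \mid \eta, \beta, \lambda)} \\
&= \left(z^{+i} - z^{-i}\right) \cdot x\beta
   + \lambda \cdot \left[ h(z^{+i}, y) - h(z^{-i}, y) \right] \\
&= x_i \beta + \lambda \cdot \left[ h(z^{+i}, y) - h(z^{-i}, y) \right].
\end{aligned}
\]
```

The normalizing constant and the term $\eta \cdot g(x, y)$ are the same in numerator and denominator, so they cancel. The remaining bracket does not involve $x$, so replacing $x_i$ by $x_i^*$ and subtracting leaves $(x_i - x_i^*)\beta$. The bracket is also where the dependence on the rest of the network enters, which is why the outcomes are not independent across students.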


Simple logistic regression of substance use on gender

What is the relationship between gender and substance use?

Logistic regression of substance use on gender:

| | $\beta$ | Std. Error | Z | p-value |
|---|---|---|---|---|
| Intercept | -1.70 | 0.44 | -3.84 | <0.001 |
| Gender | 1.18 | 0.57 | 2.09 | 0.037 |

Table: Simple Logistic Regression Model Ignoring Network Structure


Logistic Regression using Network Data

| | $\eta$ | Bootstrap Std. Error | Asymptotic Std. Error | Z | p-value |
|---|---|---|---|---|---|
| Mean Degree | -215.50 | 8.32 | 8.15 | -26.44 | <0.001 |
| Log Variance of degree | 24.46 | 8.80 | 8.91 | 2.75 | 0.006 |
| In-Degree 0 | 2.68 | 0.55 | 0.48 | 5.55 | <0.001 |
| In-Degree 1 | 1.07 | 0.43 | 0.41 | 2.60 | 0.009 |
| Out-Degree 0 | 4.15 | 0.54 | 0.52 | 8.03 | <0.001 |
| Out-Degree 1 | 1.94 | 0.50 | 0.45 | 4.31 | <0.001 |
| Reciprocity | 2.71 | 0.25 | 0.23 | 11.96 | <0.001 |
| Grade Homophily | 4.28 | 0.44 | 0.47 | 9.18 | <0.001 |
| +1 Grade Homophily | 0.62 | 0.21 | 0.21 | 2.99 | 0.003 |
| Gender Homophily | 0.78 | 0.24 | 0.24 | 3.27 | 0.001 |
| Substance Homophily | 0.76 | 0.25 | 0.25 | 3.02 | 0.003 |
| Intercept | -1.72 | 0.50 | 0.44 | -3.91 | <0.001 |
| Gender | 0.92 | 0.55 | 0.51 | 1.79 | 0.073 |

Table: ERNM Model Inference


Model diagnostics for network regression

[Figure: three histograms, each with x-axis “Count” and y-axis “Frequency”: # of edges within substance categories; # of edges between users and non-users; # of non-substance users.]

Figure: Substance Use Homophily Diagnostics. The values of the observed statistics are marked in red.


Latent Class Modeling using ERNM

We observe the relational ties Y

Postulate the existence of a categorical nodal covariate X

Build an ERNM model based on complex statistics

Treat all X values as missing

A new variant of the stochastic block model (Wang and Wong 1987)


Latent Class Modeling using ERNM

Example: Latent Cluster Model of Sampson’s Monks

Expressed “liking” between 18 monks within an isolated monastery ⇒ Sampson (1969)

A directed relationship aggregated over a 12 month period before the breakup of the cloister.

Sampson identified three groups plus: (T)urks, (L)oyal Opposition, (O)utcasts and (W)averers


Latent Class Modeling using ERNM

We observe the relational ties Y

Postulate the existence of a categorical nodal covariate X

Assume the sample space of X has K = 18 categories

Build a model based on count, density and relative homophily terms

Treat all X values as missing using the above ideas

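| Term | $\eta$ | $\mu$ | s.e.($\eta$) | s.e.($\mu$) |
|---|---|---|---|---|
| # of edges | -0.58 | 88.23 | 0.14 | 7.48 |
| Homophily | 7.28 | 15.30 | 0.91 | 1.33 |
| # in group 2 | -0.02 | 6.95 | 1.31 | 0.99 |
| # in group 3 | -2.50 | 3.95 | 1.44 | 1.08 |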

Table: Latent class model of Sampson’s monks.

$p(X \mid Y = y_{obs}, \eta)$ assigns each monk with probability one to the correct cluster.

Almost all mass goes on three groups.


Conclusions

Exponential-family random network models are a powerful new way to model network data

Leads to a new approach to network regression

Leads to a new approach to latent variable modeling

Model specification and development are extensions of that for ERGM, but require new perspectives