
Exponential-family random network models for social networks

Mark S. Handcock

Ian E. Fellows

Department of Statistics, University of California, Los Angeles

Supported by NIH NIDA Grant DA012831, NICHD Grant HD041877, NSF award MMS-0851555 and the DoD ONR MURI award N00014-08-1-1015.

SAMSI Computational Methods in Social Sciences (CMSS) Workshop, August 18–22, 2013

Statistical Models for Social Networks

Notation: A social network is defined as a set of n social "actors", a social relationship between each pair of actors, and a set of variables on those actors/pairs.

$$Y_{ij} = \begin{cases} 1 & \text{relationship from actor } i \text{ to actor } j \\ 0 & \text{otherwise} \end{cases}$$

Call $Y \equiv [Y_{ij}]_{n \times n}$ a graph, an $N = n(n-1)$ binary array.

Let $X$ be an $n \times q$ matrix of actor variates.

Call $(Y, X)$ a network.

The basic problem of stochastic modeling is to specify a distribution for $X, Y$, i.e., $P(Y = y, X = x)$.


The ERGM Framework for Network Modeling

Let $\mathcal{Y}$ be the sample space of $Y$, e.g. $\{0,1\}^N$, and let $\mathcal{X}$ be the sample space of $X$. Model the multivariate distribution of $Y$ given $X$ via:

$$P_\eta(Y = y \mid X = x) = \frac{\exp\{\eta \cdot g(y \mid x)\}}{c(\eta, x, \mathcal{Y})}, \qquad y \in \mathcal{Y},\ x \in \mathcal{X}$$

Frank and Strauss (1986)

$\eta \in \Lambda \subset \mathbb{R}^q$: $q$-vector of parameters

$g(y \mid x)$: $q$-vector of graph statistics $\Rightarrow$ $g(Y \mid x)$ are jointly sufficient for the model

$c(\eta, x, \mathcal{Y})$: distribution normalizing constant

$$c(\eta, x, \mathcal{Y}) = \sum_{y \in \mathcal{Y}} \exp\{\eta \cdot g(y \mid x)\}$$
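As a concrete illustration of this exponential-family form, the sketch below fits an ERGM with edge and triangle statistics using the statnet ergm package mentioned later in the talk; the Florentine marriage data are a stock package example, not data from this talk.

```r
## Fit an ERGM of the form P(Y = y) = exp{eta . g(y)} / c(eta, Y)
## with g(y) = (edge count, triangle count).
library(ergm)

data(florentine)                      # loads the 'flomarriage' network
fit <- ergm(flomarriage ~ edges + triangle)
summary(fit)                          # estimated eta with standard errors
```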

Simple model-classes for social networks

Homogeneous Bernoulli graph (Erdős–Rényi model)

$Y_{ij}$ are independent and equally likely, with log-odds $\eta = \operatorname{logit}[P_\eta(Y_{ij} = 1)]$:

$$P_\eta(Y = y) = \frac{e^{\eta \sum_{i,j} y_{ij}}}{c(\eta, x, \mathcal{Y})}, \qquad y \in \mathcal{Y},$$

where $q = 1$, $g(y) = \sum_{i,j} y_{ij}$, and $c(\eta, x, \mathcal{Y}) = [1 + \exp(\eta)]^N$.

Its homogeneity means it is unlikely to be proposed as a model for real phenomena.
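A quick check of the claim that $\eta$ is the logit of the tie probability: for the edges-only ERGM, the fitted coefficient should equal the logit of the observed density (again using a stock statnet dataset as an assumption for illustration).

```r
## Homogeneous Bernoulli model: q = 1, g(y) = number of edges.
library(ergm)
data(florentine)

fit0 <- ergm(flomarriage ~ edges)
coef(fit0)                             # eta-hat
qlogis(network.density(flomarriage))   # logit of the observed density
```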

Dyad-independence models with attributes

$Y_{ij}$ are independent but depend on dyadic covariates $\{x_{k,ij}\}_{k=1}^q$:

$$P_\eta(Y = y \mid X = x) = \frac{e^{\sum_{k=1}^q \eta_k g_k(y \mid x)}}{c(\eta, x, \mathcal{Y})}, \qquad y \in \mathcal{Y}$$

$$g_k(y \mid x) = \sum_{i,j} x_{k,ij}\, y_{ij}, \qquad k = 1, \ldots, q$$

$$c(\eta, x, \mathcal{Y}) = \prod_{i,j} \Bigl[1 + \exp\Bigl(\sum_{k=1}^q \eta_k x_{k,ij}\Bigr)\Bigr]$$

Of course,

$$\operatorname{logit}[P_\eta(Y_{ij} = 1 \mid X = x)] = \sum_k \eta_k x_{k,ij}$$
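A hedged sketch of a dyad-independence ERGM in which each statistic $g_k(y \mid x) = \sum_{i,j} x_{k,ij} y_{ij}$ is built from nodal attributes; the faux.mesa.high network and its Grade and Sex attributes are a stock example shipped with the ergm package, not data from the talk.

```r
## Dyad-independence model with attribute-based dyadic covariates.
library(ergm)
data(faux.mesa.high)

fit <- ergm(faux.mesa.high ~ edges +
              nodematch("Grade") +   # x_{k,ij} = 1 if i and j share a grade
              nodefactor("Sex"))     # activity effect of sex
summary(fit)
```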


Some history of exponential family models for social networks

Holland and Leinhardt (1981) proposed a general dyad independence model

– Also a homogeneous version they refer to as the "p1" model:

$$P_\eta(Y = y) = \frac{\exp\Bigl\{\rho \sum_{i<j} y_{ij} y_{ji} + \theta y_{++} + \sum_i \alpha_i y_{i+} + \sum_j \beta_j y_{+j}\Bigr\}}{c(\rho, \alpha, \beta, \theta)},$$

where $\eta = (\rho, \alpha, \beta, \theta)$.

– $\theta$ controls the expected number of edges
– $\rho$ represents the expected tendency toward reciprocation
– $\alpha_i$: productivity of node $i$; $\beta_j$: attractiveness of node $j$

Much related work and generalizations
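The p1 model can be written as an ERGM with edge, mutuality, sender and receiver statistics; a minimal sketch using statnet's directed Sampson "liking" network (the same monks data that reappear later in the talk).

```r
## Holland-Leinhardt p1 model as an ERGM: theta (edges), rho (mutual),
## alpha_i (sender) and beta_j (receiver) effects.
library(ergm)
data(sampson)                         # loads the directed network 'samplike'

p1 <- ergm(samplike ~ edges + mutual + sender + receiver)
summary(p1)
```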


Generative Theory for Network Structure

Actor Markov statistics ⇒ Frank and Strauss (1986)

– motivated by notions of "symmetry" and "homogeneity"
– $Y_{ij}$ in $Y$ that do not share an actor are conditionally independent given the rest of the network
⇒ analogous to nearest neighbor ideas in spatial modeling

Degree distribution: $d_k(y)$ = proportion of actors of degree $k$ in $y$.

Triangles: triangle$(y)$ = number of triads that form a complete sub-graph in $y$.



Classes of statistics used for modeling

1) Nodal Markov statistics ⇒ Frank and Strauss (1986)

– motivated by notions of "symmetry" and "homogeneity"
– edges in $Y$ that do not share an actor are conditionally independent given the rest of the network
⇒ analogous to nearest neighbor ideas in spatial statistics

• Degree distribution: $d_k(y)$ = proportion of nodes of degree $k$ in $y$.
• $k$-star distribution: $s_k(y)$ = proportion of $k$-stars in the graph $y$.
• Triangles: $t_1(y)$ = proportion of triangles in the graph $y$.

Figure: configurations for a triangle (transitive triad), a two-star, and a three-star.

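The Markov statistics above can be read off an observed graph with statnet's summary() method on a model formula; note these terms return counts, whereas the slides describe proportions (a sketch on a stock dataset).

```r
## Degree, k-star and triangle counts for an observed network.
library(ergm)
data(faux.mesa.high)

summary(faux.mesa.high ~ degree(0:5) + kstar(2:3) + triangle)
```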

More General mechanisms motivated by conditional independence

⇒ Pattison and Robins (2002), Butts (2005)
⇒ Snijders, Pattison, Robins and Handcock (2006)

– $Y_{uj}$ and $Y_{iv}$ in $Y$ are conditionally independent given the rest of the network if they could not produce a cycle in the network


Figure 2: Partial conditional dependence when a four-cycle is created.

(see Figure 2). This partial conditional independence assumption states that two possible edges with four distinct nodes are conditionally dependent whenever their existence in the graph would create a four-cycle. One substantive interpretation is that the possibility of a four-cycle establishes the structural basis for a "social setting" among four individuals (Pattison and Robins, 2002), and that the probability of a dyadic tie between two nodes (here, $i$ and $v$) is affected not just by the other ties of these nodes but also by other ties within such a social setting, even if they do not directly involve $i$ and $v$.

A four-cycle assumption is a natural extension of modeling based on triangles (three-cycles), and was first used by Lazega and Pattison (1999) in an examination of whether such larger cycles could be observed in an empirical setting to a greater extent than could be accounted for by parameters for configurations involving at most 3 nodes. Let us consider the four-cycle assumption alongside the Markov dependence. Under the Markov assumption, $Y_{iv}$ is conditionally dependent on each of $Y_{iu}$, $Y_{uv}$, $Y_{ij}$ and $Y_{jv}$, because these edge indicators share a node. So if $y_{iu} = y_{jv} = 1$ (the precondition in the four-cycle partial conditional dependence), then all five of these possible edges can be mutually dependent, and hence the exponential model (4) could contain a parameter corresponding to the count of such configurations. We term this configuration, given by

$$y_{iv} = y_{iu} = y_{ij} = y_{uv} = y_{jv} = 1,$$

a two-triangle (see Figure 3). It represents the edge $y_{iv} = 1$ as part of the triadic setting $y_{iv} = y_{ij} = y_{jv} = 1$ as well as the setting $y_{iv} = y_{iu} = y_{uv} = 1$.

Motivated by this approach, we introduce here a generalization of triadic structures in the form of graph configurations that we term k-triangles. For a non-directed graph, a k-triangle with base $(i, j)$ is defined by the presence of a base edge $i-j$ together with the presence of at least $k$ other nodes adjacent to both $i$ and $j$. We denote a 'side' of a k-triangle as any edge that is not the base. The integer $k$ is called the order of the k-triangle. Thus a k-triangle is a combination of $k$ individual triangles, each sharing the same edge $i-j$. The concept of a k-triangle can be seen as a triadic analogue of a k-star.


This produces features on configurations of the form:

Edgewise shared partner distribution: $p_k(y)$ = proportion of edges between actors with exactly $k$ shared partners, $k = 0, 1, \ldots$


2) Other conditional independence statistics

⇒ Pattison and Robins (2002), Butts (2005)
⇒ Snijders, Pattison, Robins and Handcock (2004)

– edges in $Y$ that are not tied are conditionally independent given the rest of the network

• $k$-triangle distribution: $t_k(y)$ = proportion of $k$-triangles in the graph $y$.
• Edgewise shared partner distribution: $p_k(y)$ = proportion of nodes with exactly $k$ edgewise shared partners in $y$.

Figure: a $k$-triangle for $k = 5$ (a 5-triangle); the actors in the non-directed $(i, j)$ edge have 5 shared partners.

Dyadwise shared partner distribution: $dsp_k(y)$ = proportion of dyads with exactly $k$ shared partners, $k = 0, 1, \ldots$
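The edgewise and dyadwise shared-partner distributions correspond to statnet's esp() and dsp() terms (again as counts rather than proportions); a minimal sketch follows, with the curved gwesp/gwdsp variants noted as the forms more commonly used in fitting.

```r
## Shared-partner statistics for an observed network.
library(ergm)
data(faux.mesa.high)

summary(faux.mesa.high ~ esp(0:5) + dsp(0:5))
## Curved alternatives often used when fitting:
##   gwesp(decay, fixed = TRUE), gwdsp(decay, fixed = TRUE)
```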

Structural Signatures

– identify social constructs or features
– based on intuitive notions or partial appeal to substantive theory

Clusters of edges are often transitive. Recall that triangle$(y)$ is the number of triangles amongst triads:

$$\text{triangle}(y) = \frac{1}{\binom{g}{3}} \sum_{\{i,j,k\} \in \binom{g}{3}} y_{ij}\, y_{ik}\, y_{jk}$$

A closely related quantity is the proportion of triangles amongst two-stars,

$$C(y) = \frac{3 \times \text{triangle}(y)}{\text{two-star}(y)},$$

the mean clustering coefficient.

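The mean clustering coefficient $C(y) = 3 \times \text{triangle}(y) / \text{two-star}(y)$ can be computed directly from the corresponding statistics; a small sketch on a stock dataset.

```r
## Mean clustering coefficient from triangle and two-star counts.
library(ergm)
data(faux.mesa.high)

stats <- summary(faux.mesa.high ~ triangle + kstar(2))
unname(3 * stats["triangle"] / stats["kstar2"])
```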



Exponential-family Random Network Models: Joint modeling of Y and X

Let $\mathcal{N}$ be the sample space of $(Y, X)$.

Model the multivariate distribution of $(Y, X)$ via the form:

$$P_\eta(Y = y, X = x) = \frac{\exp\{\eta \cdot g(y, x)\}}{c(\eta, \mathcal{N})}, \qquad (y, x) \in \mathcal{N}$$

$\eta \in \Lambda \subset \mathbb{R}^q$: $q$-vector of parameters

$g(y, x)$: $q$-vector of network statistics $\Rightarrow$ $g(Y, X)$ are jointly sufficient for the model

$c(\eta, \mathcal{N})$: distribution normalizing constant

$$c(\eta, \mathcal{N}) = \int_{(y,x) \in \mathcal{N}} \exp\{\eta \cdot g(y, x)\}\, dP_0(y, x)$$
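A minimal sketch of the ERNM kernel: the function below evaluates the unnormalized joint density $\exp\{\eta \cdot g(y, x)\}$ for a small example with two hand-picked statistics (edge count and an $x$-weighted homophily term). The statistics, parameter values, and toy data are illustrative assumptions, not the authors' specification.

```r
## Unnormalized ERNM density exp(eta . g(y, x)) for a toy example.
ernm_kernel <- function(y, x, eta) {
  ## y: n x n binary adjacency matrix; x: length-n attribute vector in {-1, 1}
  g <- c(edges     = sum(y),
         homophily = sum(outer(x, x) * y))   # sum_i sum_j x_i y_ij x_j
  exp(sum(eta * g))
}

set.seed(1)
n <- 5
y <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(y) <- 0
x <- sample(c(-1, 1), n, replace = TRUE)
ernm_kernel(y, x, eta = c(edges = -1, homophily = 0.2))
```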

Interesting model-classes of ERNM

Relationship to ERGM and Random Fields

Let $\mathcal{N}(x) = \{y : (x, y) \in \mathcal{N}\}$ and $\mathcal{N}(y) = \{x : (x, y) \in \mathcal{N}\}$.

ERGM: $\displaystyle P(Y = y \mid X = x; \eta) = \frac{1}{c(\eta; x)}\, e^{\eta \cdot h(x,y)}, \qquad y \in \mathcal{N}(x)$

Gibbs measure: $\displaystyle P(X = x \mid Y = y; \eta) = \frac{1}{c(\eta; y)}\, e^{\eta \cdot h(x,y)}, \qquad x \in \mathcal{N}(y)$

The first model is the ERGM for the network conditional on the nodal attributes.

The second model is an exponential family for the field of nodal attributes conditional on the network.

Relationship with ERGM

The model can be expressed as

$$P(X = x, Y = y \mid \eta) = P(Y = y \mid X = x; \eta)\, P(X = x \mid \eta),$$

where

$$P(Y = y \mid X = x; \eta) = \frac{1}{c(\eta; x)}\, e^{\eta \cdot h(x,y)}, \qquad y \in \mathcal{N}(x)$$

$$P(X = x \mid \eta) = \frac{c(\eta; x)}{c(\eta, \mathcal{N})}, \qquad x \in \mathcal{X}$$

The first sub-model is the ERGM for the network conditional on the nodal attributes.

The second sub-model is the marginal representation of the nodal attributes and is not necessarily an exponential family with canonical parameter $\eta$.

This decomposition makes it clear why the conditional modeling of $Y$ given $X$ via ERGM differs from the joint modeling of $Y$ and $X$ via ERNM.

Separable ERGM and Field Models

Suppose the model can be expressed as

$$P(X = x, Y = y \mid \eta_1, \eta_2) = \frac{1}{c(\eta_1, \eta_2, \mathcal{N})}\, e^{\eta_1 \cdot h(x) + \eta_2 \cdot g(y)}, \qquad (y, x) \in \mathcal{N}, \qquad (1)$$

where $\mathcal{N} = \mathcal{Y} \times \mathcal{X}$. Then

$$P(X = x \mid \eta_1) = \frac{1}{c_1(\eta_1, \mathcal{X})}\, e^{\eta_1 \cdot h(x)}$$

$$P(Y = y \mid \eta_2) = \frac{1}{c_2(\eta_2, \mathcal{Y})}\, e^{\eta_2 \cdot g(y)}.$$

The first sub-model is a general exponential-family model for the attributes (e.g., generalized linear models).

The second sub-model is an ERGM for the graph that has no dependence on the nodal attributes.

Example: Joint Ising Models

Suppose $X$ is univariate and binary, $x_i \in \{-1, 1\}$. One measure of homophily on $x$ is

$$\text{homophily}(y, x) = \sum_{i=1}^n \sum_{j=1}^n x_i\, y_{i,j}\, x_j \qquad (2)$$

A simple model for the network is

$$P(X = x, Y = y \mid \eta_1, \eta_2) \propto e^{\eta_1\, \text{density}(y) + \eta_2\, \text{homophily}(y, x)}, \qquad (y, x) \in \mathcal{N},$$

where $\text{density}(y) = \frac{1}{n} \sum_i \sum_j y_{i,j}$.

GLM: $\displaystyle P(Y_{i,j} = y_{i,j} \mid X = x, \eta_1, \eta_2) \propto e^{\eta_1 \frac{1}{n} y_{i,j} + \eta_2 x_i y_{i,j} x_j}, \qquad y \in \{0, 1\},\ x \in \mathcal{X}$

Ising model: $\displaystyle P(X = x \mid Y = y, \eta_2) \propto e^{\eta_2 \sum_i \sum_j x_i y_{i,j} x_j}, \qquad (y, x) \in \mathcal{N}$

So we have a simple joint Ising model
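The two full conditionals above suggest a simple Gibbs sampler for the joint Ising model: alternate Bernoulli updates of each $Y_{i,j}$ with log-odds $\eta_1/n + \eta_2 x_i x_j$ and updates of each $X_i$ with log-odds $2\eta_2 \sum_j (y_{ij} + y_{ji}) x_j$. This is a sketch written from the slide's definitions, not the ernm package's sampler.

```r
## Gibbs sampling from P(x, y) propto exp(eta1*density(y) + eta2*homophily(y, x)).
rjoint_ising <- function(n, eta1, eta2, n_sweeps = 200) {
  y <- matrix(0, n, n)                          # directed graph, no self-ties
  x <- sample(c(-1, 1), n, replace = TRUE)
  for (s in seq_len(n_sweeps)) {
    for (i in seq_len(n)) for (j in seq_len(n)) if (i != j) {
      lp <- eta1 / n + eta2 * x[i] * x[j]       # log-odds that Y_ij = 1
      y[i, j] <- rbinom(1, 1, plogis(lp))
    }
    for (i in seq_len(n)) {
      lp <- 2 * eta2 * sum((y[i, ] + y[, i]) * x)  # log-odds that X_i = +1
      x[i] <- if (rbinom(1, 1, plogis(lp)) == 1) 1 else -1
    }
  }
  list(y = y, x = x)
}

set.seed(2)
draw <- rjoint_ising(n = 20, eta1 = 0, eta2 = 0.13)
table(draw$x)                                   # distribution of nodal attributes
```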

Model specification issues: Degeneracy

Joint Ising model: $n = 20$ with moderate homophily (mean-value = 0.76).

Figure: 100,000 draws from an Ising homophily network model with $\eta_1 = 0$ and $\eta_2 = 0.13$; histograms of the number of edges within $x = 1$, within $x = -1$, and between $x = 1$ and $x = -1$, and of the number of nodes with $x = 1$. Mean values are marked in red.

Model specification issues: Degeneracy

So standard homophily in ERGM leads to degeneracy in ERNM

Solution: regularized homophily

Suppose $x$ is categorical with category labels $1, \ldots, K$.

$$\text{homophily}_{k,l}(y, x) = \sum_{i=1}^n \sum_{j=1}^n I(x_i = k)\, y_{i,j}\, I(x_j = l). \qquad (3)$$

Let $d_{i,k}(y, x) = \sum_j y_{ij}\, I(x_j = k)$ be the number of edges connecting node $i$ to nodes in category $k$.

$$\text{rhomophily}_{k,l}(y, x) = \sum_{i:\, x_i = k} \left[ \sqrt{d_{i,l}(y, x)} - E_{\perp\!\!\perp}\!\left(\sqrt{d_{i,l}(Y, X)}\right) \right] \qquad (4)$$

where $E_{\perp\!\!\perp}(\cdot)$ is the expectation conditional on the number of nodes in each category of $x$ under the assumption that $X$ and $Y$ are independent.

What happens when we fit the model with our new homophily?
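A sketch of computing the regularized homophily statistic of Equation (4). The expectation $E_{\perp\!\!\perp}$ is approximated here by Monte Carlo permutation of the node labels given the observed graph, which is one convenient approximation under the independence assumption and not necessarily the authors' estimator; the toy data are illustrative.

```r
## Regularized homophily: sum over {i : x_i = k} of sqrt(d_{i,l}) minus an
## estimate of its expectation when X and Y are independent.
rhomophily <- function(y, x, k, l, n_perm = 500) {
  d_il <- function(xx) rowSums(y[, xx == l, drop = FALSE])  # edges i -> category l
  stat <- function(xx) sum(sqrt(d_il(xx)[xx == k]))
  obs  <- stat(x)
  perm <- replicate(n_perm, stat(sample(x)))    # permute labels, keep the graph
  obs - mean(perm)
}

set.seed(3)
y <- matrix(rbinom(100, 1, 0.2), 10, 10); diag(y) <- 0
x <- sample(c("a", "b"), 10, replace = TRUE)
rhomophily(y, x, k = "a", l = "b")
```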

Comparison of Homophily Statistics

Figure: comparison of the Ising and regularized-homophily models; sampled densities of the number of edges within group $x = -1$, within group $x = 1$, from $x = -1$ to $x = 1$, from $x = 1$ to $x = -1$, the regularized homophily statistic, and the number of nodes with $x = 1$.

Fitting Models to Partially Observed Social Network Data

Focus on the joint distribution of $Z = (Y, X)$.

Two types of data: the observed relations and covariates ($z_{obs} = (y_{obs}, x_{obs})$), and information about the observation mechanism ($D$), e.g., indicators of the relations and covariates sampled.

$$L(\eta, \psi) \equiv P(Z_{obs} = z_{obs}, D \mid \eta, \psi)$$

$$= \sum_{z_{unobs}} P(Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs}, D \mid \eta, \psi)$$

$$= \sum_{z_{unobs}} P(D \mid Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs}, \psi)\, P_\eta(Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs})$$

$\eta$ is the model parameter

$\psi$ is the sampling parameter

When can we “ignore” the sampling process?


Adaptive Sampling Designs

We call a sampling design adaptive if:

$$P(D = d \mid Z_{obs}, Z_{mis}, \psi) = P(D = d \mid Z_{obs}, \psi) \qquad \forall z \in \mathcal{Z};$$

that is, it uses information collected during the survey to direct subsequent sampling, but the sampling design depends only on the observed data.

Adaptive sampling designs satisfy a condition called "missing at random" by Rubin (1976) in the context of missing data.

Result: standard network sampling designs such as conventional, single-wave and multi-wave link-tracing sampling designs are adaptive

⇒ Thompson and Frank (2000), Handcock and Gile (2007).


When is sampling adaptive?

Examples of adaptive sampling:

Individual sample: sampling based on observed characteristics, like race, sex, and age, that we know.

Link-tracing sample starting with an adaptive sample, with follow-up based on observed relations with others in the sample, as well as characteristics like race, sex, and age.

Link-tracing with probability proportional to the number of partners is adaptive!

Examples of non-adaptive (not missing at random) sampling:

Individual sample based on unobserved properties of non-respondents, like infection status or illicit activity.

Link-tracing sample where links are followed dependent on unobserved properties of alters.

Adaptive Sampling Designs and their Amenable Models

Definition: Consider a sampling design governed by parameter $\psi \in \Psi$ and a stochastic network model $P_\eta(Y = y, X = x)$ governed by parameter $\eta \in \Xi$. We call the sampling design amenable to the model if the sampling design is adaptive and the parameters $\psi$ and $\eta$ are distinct.

Result: If the sampling design is amenable to the model, the likelihood for $\eta$ and $\psi$ is

$$L[\eta, \psi \mid Z_{obs} = z_{obs}, D = d] \propto L[\psi \mid D = d, Z_{obs} = z_{obs}]\; L[\eta \mid Z_{obs} = z_{obs}],$$

the sampling design likelihood × the face-value likelihood, where

sampling: $L[\psi \mid D = d, Z_{obs} = z_{obs}] = P(D \mid Z_{obs} = z_{obs}, \psi)$

network: $\displaystyle L[\eta \mid Z_{obs} = z_{obs}] = \sum_{z_{unobs}} P_\eta(Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs})$


Adaptive Sampling Designs and their Amenable Models

Result: If the sampling design is not amenable to the model, the likelihood for $\eta$ and $\psi$ is

$$L(\eta, \psi) = \sum_{z_{unobs}} P(D \mid Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs}, \psi)\, P_\eta(Z_{obs} = z_{obs}, Z_{unobs} = z_{unobs})$$

and the design will need to be represented.

Clearly $P(D \mid Y, X, \psi)$ can be modeled when it is unknown.

Likelihood-based inference for ERNM when partially observed

Consider the conditional distribution of $T$ given $T_{obs}$:

$$P_\eta(T_{unobs} = t \mid T_{obs} = t_{obs}) = \exp\left[\eta \cdot g(t + t_{obs}) - \kappa(\eta \mid t_{obs})\right], \qquad t \in \mathcal{T}(t_{obs}),$$

where $\mathcal{T}(t_{obs}) = \{t : t + t_{obs} \in \mathcal{T}\}$ and

$$\kappa(\eta \mid t_{obs}) = \log \sum_{u \in \mathcal{T}(t_{obs})} \exp\left[\eta \cdot g(u + t_{obs})\right].$$

Note that

$$L[\eta \mid T_{obs} = t_{obs}] \propto \exp\left[\kappa(\eta \mid t_{obs}) - \kappa(\eta)\right],$$

which can then be estimated by MCMC:

the first term by a chain on the complete data over $\mathcal{T}$, and

the second by a chain conditional on $t_{obs}$ over $\mathcal{T}(t_{obs})$.


Likelihood-based inference for ERNM when partially observed

In general, the observed-data log-likelihood ratio of $(\eta, \theta)$ versus $(\eta_0, \theta_0)$ is

$$\ell(\eta, \theta) - \ell(\eta_0, \theta_0) = \log\left(\frac{c(t_{obs}, w, \eta, \theta)}{c(t_{obs}, w, \eta_0, \theta_0)}\right) - \log\left(\frac{c(\eta, \mathcal{T})}{c(\eta_0, \mathcal{T})}\right)$$

$$= \log\left(\sum_{t_{miss}} \frac{p(W = w \mid T = t, \theta)}{p(W = w \mid T = t, \theta_0)}\, e^{(\eta - \eta_0)\cdot g(t)}\; \frac{p(W = w \mid T = t, \theta_0)\, e^{\eta_0 \cdot g(t)}}{c(t_{obs}, w, \eta_0, \theta_0)}\right) - \log\left(\sum_{t} e^{(\eta - \eta_0)\cdot g(t)}\; \frac{e^{\eta_0 \cdot g(t)}}{c(\eta_0, \mathcal{T})}\right)$$

$$= \log\left(E_{\eta_0, \theta_0}\!\left[\frac{p(W = w \mid T, \theta)}{p(W = w \mid T, \theta_0)}\, e^{(\eta - \eta_0)\cdot g(T)} \,\middle|\, W = w,\ T_{obs} = t_{obs}\right]\right) - \log\left(E_{\eta_0}\!\left[e^{(\eta - \eta_0)\cdot g(T)}\right]\right) \qquad (5)$$

Likelihood-based inference for ERNM when partially observed

In general, the observed-data log-likelihood ratio of $(\eta, \theta)$ versus $(\eta_0, \theta_0)$ may be approximated by

$$\ell(\eta, \theta) - \ell(\eta_0, \theta_0) \approx \log\left(\frac{1}{M} \sum_{i=1}^M \frac{p(w \mid t_m^{(i)}, \theta)}{p(w \mid t_m^{(i)}, \theta_0)}\, e^{(\eta - \eta_0)\cdot g(t_m^{(i)})}\right) - \log\left(\frac{1}{M} \sum_{i=1}^M e^{(\eta - \eta_0)\cdot g(t^{(i)})}\right).$$
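A sketch of that approximation for the ignorable case (sampling-weight ratio equal to one): each term is a log of a sample mean of $\exp\{(\eta - \eta_0)\cdot g\}$ over draws of the statistics from the two chains, computed here with a log-sum-exp step for numerical stability. The matrix arguments are assumptions about how one might store the sampled statistics.

```r
## Approximate l(eta) - l(eta0) from MCMC draws of the network statistics.
## g_cond: M x q statistics from the chain conditional on the observed data;
## g_full: M x q statistics from the unconstrained chain; both run at eta0.
loglik_ratio <- function(eta, eta0, g_cond, g_full) {
  log_mean_exp <- function(v) max(v) + log(mean(exp(v - max(v))))
  log_mean_exp(g_cond %*% (eta - eta0)) - log_mean_exp(g_full %*% (eta - eta0))
}
```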

Likelihood-based inference for ERNM when partially observed

The ERGM case is implemented in the R package statnet (Handcock et al. 2003) (http://statnet.org).

The ERNM case is implemented in the ernm package (Fellows 2012).

Adolescent Peer Networks

The National Longitudinal Study of Adolescent Health ⇒ www.cpc.unc.edu/projects/addhealth

– "Add Health" is a school-based study of the health-related behaviors of adolescents in grades 7 to 12.

Each student nominated up to 5 boys and 5 girls as their friends.

160 schools; the smallest has 69 adolescents in grades 7–12.


Figure: the friendship network for one Add Health school; nodes are labeled by grade (7–12 or NA) and colored by race (White non-Hispanic, Black non-Hispanic, Hispanic of any race, Asian / Native Am / Other non-Hispanic, or race NA).

Application to substance use in adolescent peer networks

Data: The National Longitudinal Study of Adolescent Health
– Model the friendship network as a function of student characteristics

Form   Name                      Definition
Y      Mean Degree               Average degree of students
Y      Log Variance of Degree    The log variance of the student degrees
Y      In Degree = 0             # of students with in degree 0
Y      In Degree = 1             # of students with in degree 1
Y      Out Degree = 0            # of students with out degree 0
Y      Out Degree = 1            # of students with out degree 1
Y      Reciprocity               # of reciprocated ties
X      Grade = 9                 # of freshmen
X      Grade = 10                # of sophomores
X      Grade = 11                # of juniors
X,Y    Within Grade Homophily    Pooled homophily within grade
X,Y    +1 Grade Homophily        Pooled homophily between each grade and the grade above it

Table: Model Terms

Application to substance use in adolescent peer networks

                          η         Std. Error   Z        p-value
Mean Degree               -217.02   7.81         -27.80   <0.001
Log Variance of degree    25.07     9.06         2.77     0.006
In-Degree 0               2.62      0.50         5.20     <0.001
In-Degree 1               1.05      0.40         2.62     0.009
Out-Degree 0              4.09      0.52         7.91     <0.001
Out-Degree 1              1.93      0.45         4.25     <0.001
Reciprocity               2.71      0.23         11.77    <0.001
Grade = 9                 1.46      0.62         2.37     0.018
Grade = 10                1.93      0.71         2.72     0.007
Grade = 11                2.08      0.59         3.54     <0.001
Grade Homophily           4.34      0.46         9.41     <0.001
+1 Grade Homophily        0.63      0.21         2.98     0.003

Table: ERNM Model with Standard Errors Based on the Fisher Information

Application to substance use in adolescent peer networks

We see that students in the same grade are much more likely to be friends.

The positive coefficient for '+1 Grade Homophily' indicates that students also tend to form connections to the grades just below or just above them.

Evaluating the goodness-of-fit?
– Simulate networks from the fitted model and visually compare them to the observed network (Hunter, Goodreau, Handcock 2008).
– Simulate network statistics from the model and compare them to the observed network.
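In statnet this simulate-and-compare workflow is available directly through simulate() and gof(); the sketch below uses a stock dataset and a simple model rather than the Add Health ERNM fit reported here.

```r
## Simulation-based goodness-of-fit for a fitted ERGM.
library(ergm)
data(faux.mesa.high)

fit  <- ergm(faux.mesa.high ~ edges + nodematch("Grade"))
sims <- simulate(fit, nsim = 100)     # networks drawn from the fitted model
plot(gof(fit))                        # degree, shared-partner, distance checks
```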

Model-Based Simulated High School

Figure: a network simulated from the fitted model (left) and the observed network (right), with nodes labeled by grade.

Model Diagnostics: Goodness-of-fit

Figure: goodness-of-fit diagnostics comparing the simulated and observed distributions of in-degree, out-degree, the number of edges between grades, and the grade counts.

Network Regression with endogenous nodal attributes

Let $Z \in \{0, 1\}$ be a binary outcome variable (e.g., substance use) and consider:

$$P(Z = z, X = x, Y = y \mid \eta, \beta, \gamma) = \frac{1}{c(\beta, \eta, \gamma)}\, e^{z \cdot x\beta + \eta \cdot g(x,y) + \gamma \cdot h(z,y)}. \qquad (6)$$

So, conditional on $Y = y$, $\beta$, $\gamma$:

$$\operatorname{logodds}(z_i = 1 \mid z_{-i}, X_i = x_i) - \operatorname{logodds}(z_i = 1 \mid z_{-i}, X_i = x_i^*) = \beta(x_i - x_i^*),$$

where $z_{-i}$ represents the set of $z$ not including $z_i$, and $x_i$ represents the $i$th row of $X$.

So the $\beta$ have their usual interpretations (conditional on the rest of the network).

The usual independence assumptions do not hold.

Simple logistic regression of substance use on gender

What is the relationship between gender and substance use?

Logistic regression of substance use on gender:

            β       Std. Error   Z       p-value
Intercept   -1.70   0.44         -3.84   <0.001
Gender      1.18    0.57         2.09    0.037

Table: Simple Logistic Regression Model Ignoring Network Structure
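The table's model is an ordinary logistic regression that ignores the network; a minimal sketch with simulated stand-in data (the data frame and its coefficients are illustrative assumptions, not the Add Health data).

```r
## Plain logistic regression of substance use on gender, ignoring the network.
set.seed(4)
students <- data.frame(gender = rbinom(100, 1, 0.5))             # simulated stand-in data
students$use <- rbinom(100, 1, plogis(-1.7 + 1.2 * students$gender))

fit_glm <- glm(use ~ gender, family = binomial, data = students)
summary(fit_glm)                      # intercept and gender log-odds ratio
```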

Logistic Regression using Network Data

                          η         Bootstrap    Asymptotic
                                    Std. Error   Std. Error   Z        p-value
Mean Degree               -215.50   8.32         8.15         -26.44   <0.001
Log Variance of degree    24.46     8.80         8.91         2.75     0.006
In-Degree 0               2.68      0.55         0.48         5.55     <0.001
In-Degree 1               1.07      0.43         0.41         2.60     0.009
Out-Degree 0              4.15      0.54         0.52         8.03     <0.001
Out-Degree 1              1.94      0.50         0.45         4.31     <0.001
Reciprocity               2.71      0.25         0.23         11.96    <0.001
Grade Homophily           4.28      0.44         0.47         9.18     <0.001
+1 Grade Homophily        0.62      0.21         0.21         2.99     0.003
Gender Homophily          0.78      0.24         0.24         3.27     0.001
Substance Homophily       0.76      0.25         0.25         3.02     0.003
Intercept                 -1.72     0.50         0.44         -3.91    <0.001
Gender                    0.92      0.55         0.51         1.79     0.073

Table: ERNM Model Inference

Model diagnostics for network regression

Figure: substance use homophily diagnostics; histograms of the number of edges within substance categories, the number of edges between users and non-users, and the number of non-substance users. The values of the observed statistics are marked in red.


Latent Class Modeling using ERNM

We observe the relational ties Y

Postulate the existence of a categorical nodal covariate X

Build an ERNM model based on complex statistics

Treat all X values as missing

A new variant of the stochastic block model (Wang and Wong 1987)


Latent Class Modeling using ERNM
Example: Latent Cluster Model of Sampson's Monks

Expressed "liking" between 18 monks within an isolated monastery ⇒ Sampson (1969)

A directed relationship aggregated over a 12-month period before the breakup of the cloister.

Sampson identified three groups plus waverers: (T)urks, (L)oyal Opposition, (O)utcasts and (W)averers.


Latent Class Modeling using ERNM

We observe the relational ties Y

Postulate the existence of a categorical nodal covariate X

Assume the sample space of X has K = 18 categories

Build a model based on count, density and relative homophily terms

Treat all X values as missing using the above ideas

Term           η       µ       s.e.(η)   s.e.(µ)
# of edges     -0.58   88.23   0.14      7.48
Homophily      7.28    15.30   0.91      1.33
# in group 2   -0.02   6.95    1.31      0.99
# in group 3   -2.50   3.95    1.44      1.08

Table: Latent class model of Sampson's monks.

$p(X \mid Y = y_{obs}, \eta)$ assigns each monk with probability one to the correct cluster.

Almost all mass goes on three groups.

Conclusions

Exponential-family random network models are a powerful new way to model network data.

Leads to a new approach to network regression

Leads to a new approach to latent variable modeling

Model specification and development are extensions of that for ERGM, but require new perspectives.
