
Page 1

Introduction
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington

Page 2

Networks are relational data

Relationship: an irreducible property of two or more entities

• contrast to properties of entities alone (“attributes”)

Focus of SNA: The study of relational data arising from social entities

• Entities: people, animals, groups, locations, organizations, regions, etc.

• Relationships: communication, acquaintanceship, sexual contact, trade, migration rate, alliance/conflict, etc.

Page 3

Some vocabulary

Network data: A collection of entities and a set of measured relations between them.

• Entities: actors, nodes, vertices.

• Relations: ties, links, edges.

Relations can be

• directed or undirected;

• signed or valued.

Page 4

Different perspectives on network analysis

• Social networks community (Heider 1946; Frank 1972; Holland and Leinhardt 1981)

(social theory, description, survey design)

• Machine learning community (Jordan, Jensen, Xing, .... . . )

(clustering, prediction, computation)

• Physics and applied math (Newman, Watts, . . . )

(generative and agent-based models, emergent features)

• Statistical networks community (Frank and Strauss 1986; Snijders 1997)

(statistical modeling, design based inference)

Page 5

Core areas of statistical network analysis

1. Statistical modeling: evaluation and fitting of network models.
   • Testing: evaluation of competing theories of network formation.
   • Estimation: evaluation of parameters in a presumed network model.
   • Description: summaries of main network patterns.
   • Prediction: prediction of missing or future network relations.

2. Design-based inference: network inference from sampled data.
   • Design: survey and data-gathering procedures.
   • Inference: generalization of sample data to the full network.

Page 6

Examples of Friendship Relationships

• “Add Health” - The National Longitudinal Study of Adolescent Health

• a school-based study of adolescent health and social behaviors;
• www.cpc.unc.edu/projects/addhealth.

• Data from 160 schools across the US:
  • the smallest has 69 adolescents in grades 7–12;
  • the largest has thousands of participants.

• Relational data: participants nominated up to 5 boys and 5 girls as friends.

Page 7

Data from a small school

[Figure: friendship network from a small school; nodes labeled by grade (7-12).]

Page 8

Data from a large school and feeder middle school

[Figure: friendship networks for a large school and its feeder middle school; nodes colored by race (White non-Hispanic; Black non-Hispanic; Hispanic of any race; Asian/Native Am/Other non-Hispanic; Race NA) and arranged by grade (7-12; Grade NA).]

Page 9

Features of many social networks

In addition to associations with nodal and dyadic attributes, many networks exhibit the following features:

• Reciprocity of ties.

• Degree heterogeneity in the propensity to form or receive ties:
  • sociability;
  • popularity.

• Homophily by actor attributes:
  • higher propensity to form ties between actors with similar attributes;
  • attributes may be observed or unobserved.

• Transitivity of relationships:
  • friends of friends have a higher propensity to be friends.

• Balance of relationships:
  • people feel comfortable if they agree with others whom they like.

• Equivalence of nodes:
  • some nodes may have identical or similar patterns of relationships.

Page 10

Data analysis goals

[Figure: an example friendship network.]

How can we quantify such network patterns?

1. How can we represent features of social relations?
   (reciprocity/sociability/popularity/transitivity: descriptive statistics)

2. Can we relate the network to covariate information?
   (homophily: regression modeling)

3. Can we identify nodes with similar network roles?
   (stochastic equivalence: node partitioning)

Page 11

Inferential goals in the regression framework

yi,j measures the relation i → j; xi,j is a vector of explanatory variables.

Y =
  y1,1  y1,2  y1,3  NA    y1,5  ···
  y2,1  y2,2  y2,3  y2,4  y2,5  ···
  y3,1  NA    y3,3  y3,4  NA    ···
  y4,1  y4,2  y4,3  y4,4  y4,5  ···
   ...

X =
  x1,1  x1,2  x1,3  x1,4  x1,5  ···
  x2,1  x2,2  x2,3  x2,4  x2,5  ···
  x3,1  x3,2  x3,3  x3,4  x3,5  ···
  x4,1  x4,2  x4,3  x4,4  x4,5  ···
   ...

Consider a basic (generalized) linear model

  yi,j ~ βᵀxi,j + ei,j

A model can provide

• a measure of the association between X and Y: β̂, se(β̂);

• imputations of missing observations: p(y1,4 | Y, X);

• a probabilistic description of network features: g(Ỹ), Ỹ ~ p(Ỹ | Y, X).
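To make this concrete, here is a minimal R sketch (not from the slides; the data below are simulated) of fitting such a model by vectorizing the off-diagonal entries of Y and X. The usual lm() standard errors ignore the dyadic dependence that later parts of the course address.

# a minimal sketch with simulated data; not the course's recommended method
set.seed(1)
n <- 10
X <- matrix(rnorm(n * n), n, n)              # one dyadic explanatory variable
Y <- 1 + 2 * X + matrix(rnorm(n * n), n, n)  # relations generated from the model
diag(Y) <- NA                                # the diagonal is undefined

y <- c(Y)                                    # vectorize; NAs mark the diagonal
x <- c(X)
fit <- lm(y ~ x)                             # lm() drops the NA (diagonal) cases
coef(fit)                                    # estimates of the intercept and beta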

Page 12

Course outline

1. Representation of relational data:
   • graph theory and visualization;
   • matrix representations.

2. Descriptive network statistics and summaries:
   • density and degree distributions;
   • centrality;
   • triads, transitivity and balance;
   • equivalence and node partitioning.

3. Statistical inference for complete network data:
   • model comparison via hypothesis testing;
   • p1 and ERGM models;
   • the social relations model;
   • latent variable models: blockmodels and factor models.

4. Statistical inference for sampled network data:
   • egocentric sampling;
   • link tracing designs.

Page 13

Relational data
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington

Page 14

Defining characteristics of network data

Relational data are data that include

• a set of objects, and

• measurements between pairs of objects.

Example (symmetric social network): The canonical example of a relational dataset is the symmetric social network:

• {1, . . . , n} = labels of n individuals;

• {yi,j : 1 ≤ i < j ≤ n} = binary indicators of friendship between individuals

yi,j = 1 if i and j are friends
       0 if i and j are not friends

Page 15

Components of network data

More generally, a network dataset consists of

• a set (or multiple sets) of objects:
  • nodes (objects, actors, egos, individuals);

• variables measured on nodes and pairs of nodes:
  • dyadic variables (structural variables): measured on pairs of nodes (dyads);
  • nodal variables (actor attributes): measured on nodes.

Standard data: nodes and nodal variables.

Relational data: nodes, nodal variables and dyadic variables.

The presence of dyadic variables is the defining feature of relational data.

Note: One could expand the definition to triadic variables, but such data are rare.

Page 16

Types of relations

Relations are often characterized as undirected or directed:

• An undirected (or symmetric) relation has only one value per pair;

• A directed relation has two values per pair: one value representing the perspective of each pair member.

Undirected relations

• observer-reported friendships

• military conflict

• amount of time individuals spend together

• geographic distances between cities

Directed relations

• self-reported friendships

• military intervention of one country within another

• number of emails sent between individuals

• flight times between cities

Page 17

Types of relations

Relations are often characterized as binary or valued:

• A binary (or dichotomous) relation takes only two values.

• A valued relation takes more than two values:
  • a valued relation whose possible values have an order is called ordinal;
  • a valued relation whose possible values lack an order is called categorical.

Most valued relations we will encounter are ordinal.

Binary relations

• presence of a friendship

• presence of a treaty between countries

• presence of connections between objects

Valued relations

• amount of time spent together

• number of conflicts between countries

• distance between objects

Page 18

Example: Friendship nominations

Variables:

• student grade

• student gpa

• student smoking behavior

• student-student friendship nominations

[Figure: the friendship nomination network.]

Exercise: Identify the nodes, nodal variables, dyadic variables and types of relations.

Page 19

Example: International conflict, 1990-2000

Variables:

• country population

• country polity

• number of militarized disputes between country pairs

• amount of trade between country pairs

• geographic distance between country pairs

• number of shared IGOs between country pairs

[Figure: international conflict networks, 1990-2000; nodes labeled by country code.]

Exercise: Identify the nodes, nodal variables, dyadic variables and types of relations.

Page 20

Affiliation networks

The number of shared IGOs is a dyadic variable, but it is derived from a set of binary attribute variables:

            org 1  org 2  org 3  ···
country 1       0      1      0    0
country 2       0      1      1    0
country 3       1      0      1    0
   ...

A set of binary indicator variables on a node set is sometimes called an affiliation network.

• Each binary variable can be thought of as indicating affiliation with a group.

• Each individual's binary variables can be thought of as their relationships to the groups.

Exercise: How would you compute the number of shared affiliations between nodes from an affiliation network?
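One possible approach, sketched in R (the affiliation matrix A below is a made-up example): if A[i,k] indicates that node i belongs to group k, then the matrix product A Aᵀ counts shared affiliations.

A <- rbind(c(0, 1, 0),
           c(0, 1, 1),
           c(1, 0, 1))     # A[i,k] = 1 if country i belongs to organization k
S <- A %*% t(A)            # S[i,j] = number of organizations shared by i and j
diag(S) <- NA              # the diagonal just counts each node's own memberships
S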

Page 21

Affiliation networks

Affiliation network: A relational network between two sets of nodes for which

• there is no overlap between the two node sets;

• there are no relationships within a node set.

Examples:

• Movie ratings: relations between people and movies

• Consumer data: relations between people and products

• Animal behavior: presence at various locations and/or points in time

Terminology:

• modes: (generally) non-overlapping groups of objects

• bipartite relation: a relationship defined on pairs consisting of one member from each of two modes.

Affiliation networks are sometimes referred to as two-mode bipartite networks.

Page 22

Bipartite networks

Bipartite networks may seem like relational data:

• they may involve relations between people;

• they may consist of “dyadic” measurements on pairs of objects.

They differ from relational data as previously defined:

• within-mode ties are structurally impossible;

• many features of one-mode networks are meaningless for two-mode bipartite networks:
  • reciprocity (typically),
  • transitivity,
  • balance;

• they can be treated simply as “matrix-valued data,” for which there is a large and separate statistical literature.

For these reasons, we will not spend much time on bipartite networks.

Page 23

Representations of relational data
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington

Page 24

Fully observed binary network

The vast majority of network analysis tools have been developed for fully observed binary networks.

Binary network: Relational data consisting of a single dichotomous relation, typically taken to indicate the presence or absence of a relationship.

Fully observed network: The relationship between each pair of individuals is observed to be either present or absent.

In such cases, the network data can be represented

• as a graph, or

• as a matrix.

Page 25

Nodes and edges

A graph consists of

• a set of nodes, N = {1, . . . , n};

• a set of edges (or lines) between nodes, E = {e1, . . . , em}.

The graph is denoted G = (N , E).

Each edge e ∈ E is expressed in terms of the pair of nodes the line connects.

Undirected graph:
The edges have no direction, and the edge {i, j} is the same as the edge {j, i}:

  {i, j} = {j, i},

i.e. each edge is an unordered pair of nodes.

Directed graph:
The edges have direction, and the edge (i, j) is not the same as the edge (j, i):

  (i, j) ≠ (j, i),

i.e. each edge is an ordered pair of nodes.

Page 26

Example of an undirected graph

N = {1, 2, 3, 4, 5}
E = {{1, 2}, {1, 3}, {2, 3}, {2, 4}, {2, 5}, {3, 5}, {4, 5}}

Exercise: Draw this graph

Page 27

Example of a directed graph

N = {1, 2, 3, 4, 5}
E = {(1, 3), (2, 3), (2, 4), (2, 5), (3, 1), (3, 5), (4, 5), (5, 4)}

Exercise: Draw this graph
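A quick way to draw the two example graphs in R, assuming the igraph package is available:

library(igraph)

el_u <- rbind(c(1,2), c(1,3), c(2,3), c(2,4), c(2,5), c(3,5), c(4,5))
g_u  <- graph_from_edgelist(el_u, directed = FALSE)   # the undirected example

el_d <- rbind(c(1,3), c(2,3), c(2,4), c(2,5), c(3,1), c(3,5), c(4,5), c(5,4))
g_d  <- graph_from_edgelist(el_d, directed = TRUE)    # the directed example

par(mfrow = c(1, 2))
plot(g_u)
plot(g_d)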

Page 28

Some graph terminology

For an undirected graph G = (N, E):

• adjacent: nodes i and j are adjacent if {i, j} ∈ E.

• incident: node i is incident with edge e if e = {i, j} for some j ∈ N.

• empty : the graph is empty if E = ∅, i.e. there are no edges.

• complete : the graph is complete if

E = {{i, j} : i ∈ N, j ∈ N, i ≠ j},

that is, all possible edges are present.

Similar definitions are used for directed graphs.

Exercise: Identify some adjacent nodes and incident node-edge pairs for theprevious two example graphs.

Page 29

Subgraphs
A graph Gs = (Ns, Es) is a subgraph of G = (N, E) if
• Ns ⊂ N;
• Es ⊂ E, and all edges in Es are between nodes in Ns.

Examples:

N = {1, 2, 3, 4, 5}
E = {{1, 2}, {1, 3}, {2, 3}, {2, 4}, {2, 5}, {3, 5}, {4, 5}}

Ns1 = {1, 2, 3, 4}
Es1 = {{1, 2}, {1, 3}, {2, 3}, {2, 4}}   (generated by nodes 1, 2, 3 and 4)

Ns2 = {1, 2, 3, 4}
Es2 = {{1, 2}, {1, 3}, {2, 4}}   (generated by edges {1, 2}, {1, 3}, {2, 4})

Page 30

Node-generated subgraph

Let Ns ⊂ N .

The subgraph generated by Ns is the subgraph Gs = (Ns, Es), where Es includes all edges in E between nodes in Ns.

Mathematically, Es = E ∩ {{i, j} : i ∈ Ns, j ∈ Ns}.

Node-generated subgraphs are useful:

• often such a subgraph is of scientific interest;

• often there are missing data for some nodes, and so we might focus on the subgraph generated by the nodes with no missing data;

• often we want to identify cohesive subgroups of nodes, that is, subsets of nodes with a dense node-generated subgraph.
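With a sociomatrix representation, extracting a node-generated subgraph is just submatrix indexing. A small R sketch using the undirected example graph above:

Y <- matrix(0, 5, 5); diag(Y) <- NA
E <- rbind(c(1,2), c(1,3), c(2,3), c(2,4), c(2,5), c(3,5), c(4,5))
Y[E] <- 1; Y[E[, 2:1]] <- 1   # fill in both (i,j) and (j,i): undirected

Ns <- c(1, 2, 3, 4)
Y[Ns, Ns]                     # adjacency matrix of the node-generated subgraph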

Page 31

Dyads and triads

Some simple but useful subgraphs are dyads and triads:

Dyad: A dyad is a subgraph generated by a single pair of nodes i, j ∈ N:

• In an undirected graph, the two possible states of the dyad are Es = {{i, j}} (the complete graph on the pair) and Es = ∅ (the empty graph).

• In a directed graph, the four possible states of the dyad are

  Es = ∅,   Es = {(i, j)},   Es = {(j, i)},   Es = {(i, j), (j, i)}.

Page 32

Dyads and triads

A triad is a subgraph generated by a triple of nodes. For an undirected graph, a triad can be in one of 2³ = 8 possible states.

Exercise: Draw the eight possible states.

Note that

• 3 of the 1-edge triad states are equivalent, or isomorphic,

• 3 of the 2-edge triad states are isomorphic.

Isomorphic: Two graphs G and G′ are isomorphic if

• “there is a 1-1 mapping from the nodes of G to the nodes of G′ that preserves the adjacency of nodes,”

• or equivalently, G′ can be obtained by relabeling the nodes of G.

Page 33

Edge-generated subgraph

Let Es ⊂ E .

The subgraph generated by Es is the subgraph Gs = (Ns, Es), where Ns includes all nodes in N incident with an edge from Es.

These subgraphs may arise in certain types of network sampling schemes, for example, network event data:

• international conflicts: conflicts are recorded along with the aggressor and target countries;

• transactional data: transactional events are recorded, along with the participating parties.

Edge-generated subgraphs may be misrepresentative of the underlying graph:

  {i, j} ⊂ Ns, (i, j) ∈ E  ⇏  (i, j) ∈ Es

These subgraphs are used less frequently than node-generated subgraphs.

Page 34

Edge lists

Graphs are often stored on a computer in terms of their edge set, or edge list.

The edge list completely represents the graph unless there are isolated nodes:

Isolated node or isolate: A node that is not adjacent to any other node.

In the presence of isolates, the graph can be represented with the edge list and a list of isolates.
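A small R sketch (the six-node example is made up) of moving between the two representations, carrying the number of nodes n separately so that isolates are not lost:

E <- rbind(c(1,2), c(1,5), c(2,3), c(3,1), c(3,5), c(4,5), c(5,3))
n <- 6                                # node 6 is an isolate

Y <- matrix(0, n, n); diag(Y) <- NA
Y[E] <- 1                             # edge list to sociomatrix (directed)

E2 <- which(Y == 1, arr.ind = TRUE)   # sociomatrix back to an edge list
isolates <- which(rowSums(Y, na.rm = TRUE) + colSums(Y, na.rm = TRUE) == 0)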

Page 35

Graph-theoretic terms

• graph, node set, edge set, edge list

• undirected graph, directed graph

• adjacent, incident, empty, complete

• subgraph, generated subgraph, dyad, triad

• isomorphic

• isolate

Page 36

Matrix representations

A very useful way to represent a relational dataset is as a matrix: Let

• N = {1, . . . , n};

• yi,j = the (possibly directed) relationship from node i to node j;

• Y = the n × n matrix with entries {yi,j : i = 1, . . . , n; j = 1, . . . , n}.

Note that the diagonal entries of Y are not defined, or “not available.”

Y =
  na    y1,2  y1,3  y1,4  y1,5  y1,6
  y2,1  na    y2,3  y2,4  y2,5  y2,6
  y3,1  y3,2  na    y3,4  y3,5  y3,6
  y4,1  y4,2  y4,3  na    y4,5  y4,6
  y5,1  y5,2  y5,3  y5,4  na    y5,6
  y6,1  y6,2  y6,3  y6,4  y6,5  na

Page 37

Adjacency matrices

Suppose we have a dichotomous (presence/absence) relationship measured between pairs of nodes in a node set N = {1, . . . , n}. As discussed, such relational data can be expressed as a graph G = (N, E).

The data can also be represented by an n × n matrix Y = {yi,j : i, j ∈ N, i ≠ j}, where

  yi,j = 1 if (i, j) ∈ E
         0 if (i, j) ∉ E

This matrix is called the adjacency matrix of the graph G = (N, E).

• The adjacency matrix of every graph is a square, binary matrix with an undefined diagonal.

• Every square, binary matrix with an undefined diagonal corresponds to a graph.

Page 38

Adjacency matrices

For an undirected binary relation, {i , j} = {j , i} and so yi,j = yj,i by design.

• the representing graph is an undirected graph;

• the representing adjacency matrix is symmetric.

For a directed binary relation, (i, j) ≠ (j, i) and it is possible that yi,j ≠ yj,i.

• the representing graph is a directed graph;

• the representing adjacency matrix is possibly asymmetric.

Page 39

Adjacency matrices

Exercise: Draw the directed graph represented by the following matrix:

Y =

na  0   1   1   0   1
1   na  1   0   0   1
0   0   na  1   0   1
0   0   1   na  0   1
1   0   1   1   na  1
0   0   1   1   0   na

Page 40

Sociomatrices

More generally, any relational variable measured on a node set can be represented by a sociomatrix:

Sociomatrix: A square matrix with undefined diagonal entries.

Clearly a sociomatrix can represent a wider variety of relational data:

Y1 =
  na  1   0   1   0
  0   na  0   1   0
  0   1   na  0   0
  0   0   0   na  0
  0   0   0   1   na

Y2 =
  na   2.1  na   0.0  0.1
  0.0  na   4.1  0.0  na
  2.1  2.9  na   0.0  1.2
  0.0  0.0  na   na   5.4
  na   2.1  4.1  0.0  na

The sociomatrix on the left can alternatively be expressed as a graph. The sociomatrix on the right cannot:

• the value of the relation is not dichotomous;

• the value of the relation is not measured for all pairs.

  no measured relation ⇏ no relation

Page 41

Sociomatrices

Advantages of sociomatrices:

• can represent valued (non-dichotomous) relations;

• can indicate missing data.

Graph representations can do neither of these things, yet people will nevertheless try to shoehorn incomplete, non-dichotomous relational data into a graphical representation:

Y =
  na   2.1  na   0.0  0.1
  0.0  na   4.1  0.0  na
  2.1  2.9  na   0.0  1.2
  0.0  0.0  na   na   5.4
  na   2.1  4.1  0.0  na

⇒ Y =
  na  1   0   0   1
  0   na  1   0   0
  1   1   na  0   1
  0   0   0   na  1
  0   1   1   0   na

The sociomatrix on the right is representable as a graph, but

• coarsens the data (throws away information);

• misrepresents uncertainty in the missing values.

Page 42

Compression

Disadvantage of sociomatrices:

• they are an inefficient representation for sparse networks.

Consider an n × n binary sociomatrix Y for a sparse network, where the fraction of 1's is small and shrinks as n grows (for example, the average degree stays roughly constant):

• the size of the matrix grows quadratically in n;

• the number of 1's in the matrix grows only linearly in n.

For such matrices, an edge list provides a much more compact representation:

Y =
  na  1   0   0   1
  0   na  1   0   0
  1   0   na  0   1
  0   0   0   na  1
  0   0   1   0   na

E = {(1, 2), (1, 5), (2, 3), (3, 1), (3, 5), (4, 5), (5, 3)}

The advantage of E over Y grows as n increases.

Page 43

Weighted edges

Often the relational variable is either zero or some arbitrary non-zero value.

• communication networks:

yi,j = number of emails sent from i to j

yi,j ∈ {0, 1, 2, . . .}

• conflict networks:

yi,j = military relationship between i and j

yi,j ∈ {−1, 0, 1}

In both cases, yi,j = 0 for the vast majority of i, j pairs. In such cases, a weighted edge list can be more efficient than a sociomatrix.

Page 44

Weighted edges

Y =
  na  8   0   0   2
  0   na  1   0   0
  7   0   na  0   4
  0   0   0   na  1
  0   0   13  0   na

E =
  1  2   8
  1  5   2
  2  3   1
  3  1   7
  3  5   4
  4  5   1
  5  3  13

Compression:

• Y is n × n

• E is m × 3, where m is the number of non-zero relationships.
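A sketch of how the weighted edge list above can be built from the sociomatrix in base R; which() with arr.ind = TRUE returns the (row, column) positions of the non-zero entries:

Y <- matrix(0, 5, 5); diag(Y) <- NA
Y[rbind(c(1,2), c(1,5), c(2,3), c(3,1), c(3,5), c(4,5), c(5,3))] <-
  c(8, 2, 1, 7, 4, 1, 13)                         # the weighted example above

idx <- which(Y != 0 & !is.na(Y), arr.ind = TRUE)  # non-zero, non-diagonal cells
E <- cbind(idx, w = Y[idx])                       # columns: i, j, weight
E[order(E[, 1], E[, 2]), ]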

Page 45

Example: International conflict, 1990-2000

Variables:

• country population

• country polity

• number of militarized disputes between country pairs

• amount of trade between country pairs

• geographic distance between country pairs

• number of shared IGOs between country pairs

[Figure: international conflict networks, 1990-2000; nodes labeled by country code.]

In what ways can we represent these data?

Page 46

Example: International conflict, 1990-2000

Nodal variables: population, gdp and polity.

        pop     gdp  polity
AFG   24.78   19.29   -4.64
ALB    3.28    8.95    3.82
ALG   27.89  133.61   -3.91
ANG   10.66   15.38   -2.55
ARG   34.77  352.38    7.18
AUL   18.10  408.06      10
AUS    7.99  170.76      10
...

Nodal variables can be stored as an n × p matrix X:

• n is the number of nodes;

• p is the number of nodal variables.

Page 47

Example: International conflict, 1990-2000

Dyadic variables: conflict, imports, shared IGOs, distance.

Conflict:

      AFG  ALB  ALG  ANG  ARG  AUL  AUS
AFG    na    0    0    0    0    0    0
ALB     0   na    0    0    0    0    0
ALG     0    0   na    0    0    0    0
ANG     0    0    0   na    0    0    0
ARG     0    0    0    0   na    0    0
AUL     0    0    0    0    0   na    0
AUS     0    0    0    0    0    0   na

      CHN  DRC  IRN  IRQ  ISR  JOR  PRK
CHN    na    0    0    0    0    0    0
IRN     0    0    0    6    0    0    0
IRQ     0    0    0    0    1    0    0
JOR     0    0    0    1    0    0    0
PRK     3    0    0    0    0    0    0
TUR     0    0    2    6    0    0    0
USA     0    0    1    7    0    0    2

These data may be stored as an asymmetric sociomatrix, or more compactly as a weighted, directed edge list.

Page 48

Example: International conflict, 1990-2000

Dyadic variables: conflict, imports, shared IGOs, distance.

Imports:

      AFG  ALB   ALG   ANG   ARG   AUL   AUS
AFG    na    0     0     0     0     0     0
ALB     0   na     0     0     0     0  0.01
ALG     0    0    na  0.01  0.06  0.03  0.13
ANG     0    0     0    na  0.02     0     0
ARG     0    0  0.01  0.01    na  0.09  0.07
AUL     0    0     0     0  0.06    na  0.23
AUS     0    0  0.22     0  0.02  0.03    na
...

These data may be stored as an asymmetric sociomatrix, or more compactly as a weighted, directed edge list.

Page 49

Example: International conflict, 1990-2000

Dyadic variables: conflict, imports, shared IGOs, distance.

Distance:

       AFG    ALB    ALG    ANG    ARG    AUL    AUS
AFG     na   4.33   5.86   7.59  15.27  11.35   4.56
ALB   4.33     na   1.54   5.61  11.61  15.60   0.81
ALG   5.86   1.54     na   5.18  10.17  16.97   1.68
ANG   7.59   5.61   5.18     na   7.78  13.26   6.35
ARG  15.27  11.61  10.17   7.78     na  11.72  11.82
AUL  11.35  15.60  16.97  13.26  11.72     na  15.91
AUS   4.56   0.81   1.68   6.35  11.82  15.91     na
...

These data may be stored as a symmetric sociomatrix, or as a weighted, undirected edge list.

Page 50

Density and degree
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington

Page 51

Summary statistics

Even in the simplest case of undirected binary relational data, a sociomatrix can be a complicated and opaque object:

Y =
  na  1   0   0  ···
  1   na  0   1  ···
  0   0   na  0  ···
  0   1   0   na ···
  ...

• statistic: A statistic t(Y) is any function of the data.

• descriptive data analysis: A representation of the main features of adataset via a set of statistics t1(Y), . . . , tK (Y).

Many important statistics can be computed from the sociomatrix using basicmatrix algebra.

Page 52

Density

The most basic statistic of a relational dataset is the mean, or density:

Density: the proportion of edges present in a graph
  = (the number of edges)/(the maximum possible number of edges).

The number of edges observed is |E|. The number of possible edges is

• n(n − 1)/2 in an undirected graph;

• n(n − 1) in a directed graph.

Exercise: Derive the above numbers.

The density is therefore given by

  2|E| / [n(n − 1)]   for an undirected graph,

  |E| / [n(n − 1)]    for a directed graph.

Page 53

Density as an average

Let yi,j be the binary indicator of an edge from i to j .

Y =

na    y1,2  y1,3  y1,4  y1,5  y1,6
y2,1  na    y2,3  y2,4  y2,5  y2,6
y3,1  y3,2  na    y3,4  y3,5  y3,6
y4,1  y4,2  y4,3  na    y4,5  y4,6
y5,1  y5,2  y5,3  y5,4  na    y5,6
y6,1  y6,2  y6,3  y6,4  y6,5  na

The number of edges can be computed from the sociomatrix as

  |E| = ∑_{{i,j}: i<j} yi,j     for an undirected graph;

  |E| = ∑_{(i,j): i≠j} yi,j     for a directed graph.

Page 54

Density as an average

Recall that the number of possible edges is

  n(n − 1)/2 = the number of unordered pairs, for an undirected graph;

  n(n − 1) = the number of ordered pairs, for a directed graph.

Therefore, the densities in the two cases are given by

  ȳ = 2 ∑_{{i,j}: i<j} yi,j / [n(n − 1)]   for an undirected graph,

  ȳ = ∑_{(i,j): i≠j} yi,j / [n(n − 1)]     for a directed graph.

Note that

• the denominators are the same in both cases;

• the numerators are the same as well.

If the relation is undirected, then yi,j = yj,i, and

  2 ∑_{i<j} yi,j = ∑_{i<j} yi,j + ∑_{i<j} yi,j
                 = ∑_{i<j} yi,j + ∑_{i>j} yi,j
                 = ∑_{i≠j} yi,j.

Page 55

Density as an average

How to obtain the density from the sociomatrix should now be clear:

• n(n − 1) = the number of non-diagonal cells in the matrix;

• ∑_{i≠j} yi,j = the sum of the non-diagonal entries in the matrix;

• ȳ = ∑_{i≠j} yi,j / [n(n − 1)] = the average value of yi,j among the non-diagonal entries.

Page 56

Computing the density in R

This is very easy to obtain in R:

Y

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   NA    0    1    0    0
## [2,]    0   NA    0    1    0
## [3,]    0    0   NA    0    0
## [4,]    0    0    1   NA    0
## [5,]    1    1    0    1   NA

mean(Y, na.rm = TRUE)

## [1] 0.3

sum(Y, na.rm = TRUE)

## [1] 6

nrow(Y)

## [1] 5

sum(Y, na.rm = TRUE)/(nrow(Y) * (nrow(Y) - 1))

## [1] 0.3

Page 57

Computing the density in R

Convince yourself that the calculation works for directed and undirected data:

Y

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   NA    0    0    1    0
## [2,]    0   NA    0    1    0
## [3,]    0    0   NA    0    1
## [4,]    1    1    0   NA    0
## [5,]    0    0    1    0   NA

mean(Y, na.rm = TRUE)

## [1] 0.3

sum(Y[upper.tri(Y)])

## [1] 3

sum(Y[upper.tri(Y)])/(nrow(Y) * (nrow(Y) - 1)/2)

## [1] 0.3

Page 58

Densities of subgraphs

It is often of interest to compute densities within and between groups, as defined by some attribute.

Example: Adolescent friendship ties among n = 145 high-schoolers
• Y = undirected, binary friendship data;
• smoke = indicator of some smoking behavior.

Task: Compute densities of subgraphs based on smoking status.

[Figure: the friendship network among the n = 145 high-schoolers.]

Page 59

Densities of subgraphs

Y[1:5, 1:5]

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   NA    0    0    0    0
## [2,]    0   NA    0    0    0
## [3,]    0    0   NA    0    0
## [4,]    0    0    0   NA    0
## [5,]    0    0    0    0   NA

smoke[1:5]

## [1] 1 0 0 1 1

mean(Y[smoke == 1, smoke == 1], na.rm = TRUE)

## [1] 0.0127

mean(Y[smoke == 0, smoke == 0], na.rm = TRUE)

## [1] 0.05182

mean(Y[smoke == 1, smoke == 0], na.rm = TRUE)

## [1] 0.0288

Page 60

Densities as probability estimates

Densities such as these can be viewed as

• probabilities of the existence of a tie between randomly sampled nodes, or

• estimates of these probabilities,

depending on the context and your perspective.

Let

• i and j be two randomly sampled individuals;

• θ be the probability that yi,j = 1:

  Pr(yi,j = 1) = θ.

Then

• ȳ = θ if your nodeset is the entire population of nodes;

• ȳ ≈ θ (an estimate) if your nodeset is a random sample of nodes.

Page 61

Nodal degree

Nodes typically vary in their involvement in the network. For binary relations, this heterogeneity can be summarized by the nodal degree.

• Undirected relation:
  • The degree of a node is the node's number of ties.

• Directed relation:
  • The outdegree of a node is the node's number of outgoing ties.
  • The indegree of a node is the node's number of incoming ties.

The degrees are easy to calculate from the sociomatrix Y = {yi,j : i ≠ j}:

  d_i^o = ∑_{j: j≠i} yi,j     (outdegree)

  d_i^i = ∑_{j: j≠i} yj,i     (indegree)

This calculation works for both directed and undirected relations. Specifically, for an undirected relation,

  d_i^o = ∑_{j: j≠i} yi,j = ∑_{j: j≠i} yj,i = d_i^i = di.

Page 62

Nodal degree

Y =
  na  0   1   1   0   1
  1   na  1   0   0   1
  0   0   na  1   0   1
  0   0   1   na  0   1
  1   0   1   1   na  1
  0   0   1   1   0   na

  d_4^o = ∑_{j: j≠4} y4,j = 2

  d_4^i = ∑_{i: i≠4} yi,4 = 4

Page 63

Nodal degree

For an undirected relation:

Y =
  na  0   1   1   0   1
  0   na  0   0   0   1
  1   0   na  1   1   1
  1   0   1   na  0   1
  0   0   1   0   na  0
  1   1   1   1   0   na

  d4 = d_4^o = ∑_{j: j≠4} y4,j = 3 = d_4^i = ∑_{i: i≠4} yi,4

Page 64

Degrees and density

Recall that the formula for the density of a graph, directed or undirected, is

  ȳ = (1/[n(n − 1)]) ∑_{i≠j} yi,j

    = (1/[n(n − 1)]) ∑_{i=1}^{n} ∑_{j: j≠i} yi,j

    = (1/[n(n − 1)]) ∑_{i=1}^{n} d_i^o = d̄^o/(n − 1),

and so the average degree is n − 1 times the density.

A similar calculation shows that ȳ = d̄^i/(n − 1). Thus

• the average indegree equals the average outdegree;

• the average degree equals n − 1 times the density.
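A quick numerical check of these identities in R on a randomly generated directed network (a sketch, not from the slides):

set.seed(2)
n <- 8
Y <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(Y) <- NA

odeg <- rowSums(Y, na.rm = TRUE)
ideg <- colSums(Y, na.rm = TRUE)

mean(odeg)                        # average outdegree
mean(ideg)                        # equals the average indegree
(n - 1) * mean(Y, na.rm = TRUE)   # equals (n - 1) times the density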

Page 65

Example: 1990-2000 international conflict

yi,j is the binary indicator that i initiated an action against j .

[Figure: the 1990-2000 international conflict network; nodes labeled by country code.]

Page 66

Computing degrees in R

odeg <- rowSums(Y, na.rm = TRUE)
ideg <- colSums(Y, na.rm = TRUE)

odeg[1:10]

## AFG ALB ALG ANG ARG AUL AUS BAH BEL BEN
##   1   0   0   2   1   1   0   1   0   0

ideg[1:10]

## AFG ALB ALG ANG ARG AUL AUS BAH BEL BEN
##   2   1   0   3   2   3   0   2   3   1

Page 67

Degree distributions

For a directed relation, the set of degrees is an n × 2 matrix (the outdegrees and the indegrees).

It is generally desirable to summarize the data further.

This can be done by summarizing the joint degree distribution:

• mean degree, standard deviation of in- and outdegrees

• correlation of in- and outdegrees

• empirical marginal distributions of each set of degrees.

Page 68

Univariate summaries of degrees

Let d = {d1, . . . , dn} be a set of nodal degrees
(either outdegrees, indegrees, or undirected degrees).

The entries of d are often summarized with the

• mean: d̄ = ∑ di/n = (n − 1)ȳ;

• variance: s²_d = ∑ (di − d̄)²/(n − 1);

• degree distribution.

Page 69

Degree distribution

The degree distribution is the set of counts {f0, . . . , fn}, where

  fk = #{i : di = k} = the number of nodes with degree equal to k.

For example, if

  d = (2, 1, 0, 3, 2, 3, 0, 2, 3, 1),

then

  f = (2, 2, 3, 3, 0, 0, 0, 0, 0, 0, 0),

which we might write more informatively as

  k:   0  1  2  3  4  5  6  7  8  9  10
  fk:  2  2  3  3  0  0  0  0  0  0   0
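A small R sketch computing f for the example degree vector above:

d <- c(2, 1, 0, 3, 2, 3, 0, 2, 3, 1)
n <- length(d)
f <- tabulate(d + 1, nbins = n + 1)   # counts for k = 0, 1, ..., n
names(f) <- 0:n
f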

Page 70

Bivariate summaries of degrees

Let d^o = (d_1^o, . . . , d_n^o) and d^i = (d_1^i, . . . , d_n^i) be the vectors of out- and indegrees.

The joint distribution of d^o and d^i is often described with

• the correlation between d^o and d^i;

• a scatterplot of d^o versus d^i.

These are all straightforward to obtain in R.

Page 71

Example: 1990-2000 international conflict

mean(odeg)

## [1] 1.562

mean(ideg)

## [1] 1.562

sd(odeg)

## [1] 3.589

sd(ideg)

## [1] 1.984

cor(odeg, ideg)

## [1] 0.604

table(odeg)

## odeg

## 0 1 2 3 5 6 7 11 26 27

## 63 31 12 13 4 3 1 1 1 1

table(ideg)

## ideg

## 0 1 2 3 4 5 6 7 8 15

## 46 34 17 19 8 2 1 1 1 1

Page 72

Example: 1990-2000 international conflict

[Figure: barplots of table(odeg) and table(ideg), and a scatterplot of outdegree versus indegree with country labels.]

Page 73

Example: 1990-2000 international conflict

Descriptive degree analysis: For the 1990-2000 conflict data:

• The probability that any pair of countries were in conflict at some point is around 1% (ȳ = 0.012).

• Countries were more heterogeneous in terms of initiating conflict than in being the target (sd(d^o) = 3.59 > 1.98 = sd(d^i)).

• Countries that initiated more conflicts tended to be the target of more conflicts (cor(d^o, d^i) = 0.60).

• USA, IRQ, JOR, TUR and HAI were the most active nodes:
  • JOR has a very high outdegree and a low indegree.
  • HAI has a high indegree and a low outdegree.

Page 74

More degree distributions

Yeast protein interaction network (n=813)

[Figure: the yeast protein interaction network and barplots of its outdegree and indegree distributions.]

Page 75

Degree variation

Note that for the conflict and protein networks,

• most nodes have small degrees,

• few nodes have large degrees.

Recall the degree distribution f = {f(k) : k = 0, . . . , n}, where

  f(k) = fk = #{i : di = k}.

For the two networks above, the degree distribution f(k) is roughly a decreasing function of k.

Page 76

Power law behavior

Some researchers have posited an explicit form for f(k):

  f(k) = a k^(−b),   a > 0, b > 0.

A distribution for which this (roughly) holds is said to follow a power law.

A network (or network model) whose degree distribution follows a power law is said to be scale free.

For such a degree distribution,

  log f(k) = log a − b log k,

so the logged value of f(k) should be linearly decreasing in log k.

This can be checked empirically by plotting the log degree distribution versus log k, and assessing whether or not the relationship is linear.
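A sketch of this empirical check in R; the degree vector d below is a toy example, and in practice it would come from rowSums or colSums of a sociomatrix:

d <- c(rep(0, 40), rep(1, 25), rep(2, 12), rep(3, 6), 5, 7, 11, 26)
f <- table(d[d > 0])                    # f(k) for the observed k > 0
k <- as.numeric(names(f))
plot(log(k), log(as.numeric(f)), xlab = "log k", ylab = "log f(k)")
# roughly linear points are consistent with f(k) = a * k^(-b)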

Page 77

Assessing power law behavior: conflict network

[Figure: log f(k) versus log k for the outdegree and indegree distributions of the conflict network.]

Page 78

Assessing power law behavior: protein network

[Figure: log f(k) versus log k for the outdegree and indegree distributions of the protein network.]

Page 79

Assessing power law behavior

The relationship is arguably linear, except for large values of k.

However, the frequencies for large k depend on only a few nodes, so maybe a power law is a reasonable model for the degree distributions of these networks.

Implications:

• some very simple models of network formation imply a power law for degrees;

• other very simple models of network formation imply something other than a power law.

The degree distribution may then help us discriminate between classes of models (scale free versus non-scale free).

We will return to this when we discuss hypothesis testing.

Page 80

More degree distributions

Female friendship network (n=144)

[Figure: the female friendship network and barplots of its outdegree and indegree distributions.]

Page 81

Assessing power law behavior: friendship

[Figure: log f(k) versus log k for the outdegree and indegree distributions of the friendship network.]

Page 82

Degrees for valued graphs

Example: World trade

• average yearly change in trade volume between 30 countries;

• averaged over 10 years, 6 different commodities.

round(Y[1:5, 1:5], 3)

##           Australia Austria Brazil Canada China
## Australia        NA   0.070  0.028  0.011 0.110
## Austria       0.083      NA  0.033  0.078 0.156
## Brazil        0.031   0.066     NA  0.103 0.151
## Canada        0.012   0.024 -0.055     NA 0.084
## China         0.145   0.088  0.145  0.157    NA

“Degree” for such data may not make sense:

• changes are all either negative or positive, none zero.

Page 83

Row and column means

As discussed,

• density is a mean;

• degrees are row and column sums.

For analysis of valued relational data, natural descriptors are

• the matrix (grand) mean;

• row and column means (or sums).

We should view density and degree as special cases of means (grand, row andcolumn).

Page 84

Grand, row and column means

Let yi,j be the (possibly valued) relation from i to j .

Grand mean: the mean of all non-missing observations,

  ȳ·· = y··/[n(n − 1)] = ∑_{i≠j} yi,j / [n(n − 1)].

Row mean: the mean of all non-missing observations in each row,

  ȳi· = yi·/(n − 1) = ∑_{j: j≠i} yi,j / (n − 1).

Column mean: the mean of all non-missing observations in each column,

  ȳ·j = y·j/(n − 1) = ∑_{i: i≠j} yi,j / (n − 1).

Page 85

International trade data

mean(Y, na.rm = TRUE)

## [1] 0.039

rmean <- rowMeans(Y, na.rm = TRUE)
cmean <- colMeans(Y, na.rm = TRUE)

mean(rmean)

## [1] 0.039

sd(rmean)

## [1] 0.03225

mean(cmean)

## [1] 0.039

sd(cmean)

## [1] 0.0346

cor(rmean, cmean)

## [1] 0.5645

Page 86

International trade data

[Figure: histograms of the row means (rmean) and column means (cmean), and a scatterplot of row mean versus column mean with country labels.]

Page 87

Means for binary data

Grand mean: the mean of all non-missing observations,

  ȳ·· = y··/[n(n − 1)] = ∑_{i≠j} yi,j / [n(n − 1)].

Row mean: the mean of all non-missing observations in each row,

  ȳi· = yi·/(n − 1) = ∑_{j: j≠i} yi,j / (n − 1).

Column mean: the mean of all non-missing observations in each column,

  ȳ·j = y·j/(n − 1) = ∑_{i: i≠j} yi,j / (n − 1).

Such means can be used for binary data as well:

• the grand mean ȳ·· is the density;

• the row means ȳi· are the outdegrees divided by n − 1;

• the column means ȳ·j are the indegrees divided by n − 1.

Page 88

Means for various data types

Means may not be appropriate for every type of relationship:

• categorical, non-ordinal relationships:
  • yi,j ∈ {mother, father, sibling, uncle, . . .};
  • yi,j ∈ {red, blue, green};

• ordinal, non-metric relationships:
  • yi,j ∈ {dislike, neutral, like};
  • yi,j ∈ {none, some, many};

• sparse valued data:
  • yi,j = number of minutes of communication;
  • yi,j = number of emails sent.

Page 89

Means for categorical relations

One strategy for such data is to decompose the relation:

For yi,j ∈ {red, blue, green}, define

• yi,j,r = 1 × (yi,j = red)
• yi,j,b = 1 × (yi,j = blue)
• yi,j,g = 1 × (yi,j = green)

Define ỹi,j = yi,j,r + yi,j,b + yi,j,g, i.e. ỹi,j indicates the presence of any relationship.

• Grand mean: ỹ·· = ȳ··,r + ȳ··,b + ȳ··,g
• Row means: ỹi· = ȳi·,r + ȳi·,b + ȳi·,g
• Column means: ỹ·j = ȳ·j,r + ȳ·j,b + ȳ·j,g
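One way to carry out this decomposition in R (a minimal sketch; Ycat is a hypothetical character-valued sociomatrix, not part of the course data):

Yr <- 1 * (Ycat == "red")    # Ycat is hypothetical; indicator sociomatrix for "red" ties
Yb <- 1 * (Ycat == "blue")
Yg <- 1 * (Ycat == "green")
Yany <- Yr + Yb + Yg         # indicates the presence of any relationship
mean(Yany, na.rm = TRUE)     # grand mean: the sum of the color-specific densities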


Conditional means

If yi,j is valued but sparse, it can be useful to decompose yi,j as follows:

xi,j = 0 if yi,j = 0, and xi,j = 1 if yi,j ≠ 0

wi,j = NA if yi,j = 0, and wi,j = yi,j if yi,j ≠ 0

xi,j can be analyzed as with a binary relation:

• density, out and indegrees

• grand, row and column means

wi,j can be analyzed with means, but the interpretation is subtle:

• w·· is the mean of non-zero relations;

• wi· is the mean of i ’s non-zero outgoing relations;

• w·j is the mean of j ’s non-zero incoming relations.
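The decomposition is easy to compute in R (a minimal sketch, assuming Y is a valued sociomatrix with a missing diagonal):

X <- 1 * (Y != 0)          # binary indicator of a non-zero relation
W <- Y
W[Y == 0] <- NA            # keep only the non-zero values
mean(X, na.rm = TRUE)      # density of non-zero relations
rowMeans(W, na.rm = TRUE)  # mean of each node's non-zero outgoing relations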


Summary

• Grand, row and column means are a starting point for relational data analysis:
  • they represent the overall level of relations and the heterogeneity among the nodes;
  • for binary data, they are equivalent to density, outdegree and indegree.

• Nodal heterogeneity can be explored with degree distributions:
  • standard deviations, histograms or tables of in- and outdegrees;
  • correlations and scatterplots of outdegree versus indegree.

• Modifications may be necessary for different data types:
  • non-binary categorical relations;
  • sparse, valued relations.

Paths and connectivity
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Network connectivity

Density (or average degree) is a very coarse description of a graph.

Compare the n-star graph to the n-circle graph below:

[Figure: a 9-node star graph and a 9-node circle graph.]

The two graphs have roughly the same density, but their structure is very different.

Evaluating connectivity

Recall, density is the average degree divided by (n − 1).

What is the average degree of

• the n-star graph?
• the n-circle graph?

For the circle graph,

d̄ = (1/n) ∑_{i=1}^{n} di = (1/n) · 2n = 2.

For the star graph,

d̄ = (1/n) ∑_{i=1}^{n} di
  = (1/n) [(n − 1) + 1 + · · · + 1]
  = (1/n) [(n − 1) + (n − 1)]
  = 2(n − 1)/n ≈ 2 for large n.

Evaluating connectivity

[Figure: the 9-node star graph and the 9-node circle graph.]

Which graph seems more "connected"?

• The star graph?
  • Each node is within at most two links of every other node.
  • Transmitting information in this network is easier than in the circle graph.

• The circle graph?
  • Removal of one node can completely disconnect the star graph.

Evaluating connectivity

What summary statistics can distinguish between the graphs?
How about degree variability?

Circle graph: Var(d1, . . . , dn) = 0.

Star graph: Var(d1, . . . , dn) grows linearly with n.

So degree variance can distinguish between these graphs.
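This is easy to check numerically (a minimal sketch constructing 9-node versions of the two graphs):

n <- 9
Ystar <- matrix(0, n, n)
Ystar[1, -1] <- 1                       # node 1 tied to all others
Ystar <- Ystar + t(Ystar)
Ycirc <- matrix(0, n, n)
for (i in 1:n) Ycirc[i, i%%n + 1] <- 1  # node i tied to node i+1 (mod n)
Ycirc <- Ycirc + t(Ycirc)
var(rowSums(Ystar))                     # grows linearly with n
var(rowSums(Ycirc))                     # exactly zero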


Evaluating connectivity

[Figure: two 9-node example graphs.]

• What is the degree variance for each graph?

• Which one is more “connected”?


Evaluating connectivity

Intuitively, a "highly connected" graph is one in which nodes can reach each other via connections, or "paths."

To evaluate connectivity (and a variety of other statistics) it will be useful to calculate the length of the shortest path between each pair of nodes.

Walks, trails and paths

Walk: A walk is any sequence of adjacent nodes.

Length of a walk: The number of nodes in the sequence, minus one.

[Figure: the 6-node example graph.]

Identify the following walks on the graph:

• w = (2, 1, 6, 3, 4)

• w = (2, 1, 6, 3, 4, 1, 5)

• w = (2, 1, 2, 5, 1, 4)


Walks, trails and paths

Trail: A trail is a walk on distinct edges.

[Figure: the 6-node example graph.]

Which of the following are trails on the graph?

• w = (2, 1, 6, 3, 4)

• w = (2, 1, 6, 3, 4, 1, 5)

• w = (2, 1, 2, 5, 1, 4)


Walks, trails and paths

Path: A path is a trail consisting of distinct nodes.

[Figure: the 6-node example graph.]

Which of the following are paths on the graph?

• w = (2, 1, 6, 3, 4)

• w = (2, 1, 6, 3, 4, 1, 5)

• w = (2, 1, 2, 5, 1, 4)

Walks, trails and paths

By these definitions,

• each path is a trail,

• each trail is a walk.

[Figure: the 6-node example graph.]

Depending on the application, we may be interested in the numbers and kinds of walks, trails and paths between nodes:

• probability: random walks on graphs;

• communication or disease transmission: number of trails between nodes;

• graphical representation: shortest path between two nodes.


Reachability and connectedness

Reachable: Two nodes are reachable if there is a path between them.

Connected: A network is connected if every pair of nodes is reachable.

Component: A network component is a maximal connected subgraph.

A "maximal connected subgraph" is a connected node-generated subgraph that becomes unconnected by the addition of any other node.

An unconnected graph

Identify all connected components:

[Figure: an unconnected conflict network among countries (AFG, ANG, BAH, ..., UKG, USA).]

Connectivity and cutpoints

How connected is a graph?

One notion of connectivity is robustness to removal of nodes or edges.

Cutpoint: Let G be a graph and G−i be the node-generated subgraph generated by {1, . . . , n} \ {i}. Then node i is a cutpoint if the number of components of G−i is larger than the number of components of G.

[Figure: two 9-node example graphs.]

Exercise: Identify any cutpoints of the two graphs.

Node connectivity

Node connectivity: The node connectivity of a graph, k(G), is the minimum number of nodes that must be removed to disconnect the graph.

[Figure: three example graphs (two 9-node graphs and the 6-node example graph).]

Exercise: Compute the node connectivities of the above graphs.


Limitations of the node connectivity measure

This notion of connectivity is of limited use:

• perhaps most useful in terms of designing robust communication networks;

• less useful for describing the types of networks we’ve seen.

In particular, node connectivity is a very coarse measure:

• it disregards the size of the graph;

• it disregards the number of cutpoints;

• it is of limited descriptive value for real social networks.

Exercise: What is the node connectivity of all real graphs we’ve seen so far?


Average connectivity

Node connectivity is based on a “worst case scenario.”

A more representative measure might be some sort of average connectivity.

Connected nodes: Nodes i, j are connected if there is a path between them.

Dyadic connectivity: k(i, j) = the minimum number of removed nodes required to disconnect i and j.

Average connectivity: k̄ = ∑_{i<j} k(i, j) / (n choose 2)

• k̄ can be computed in polynomial time;
• bounds on k̄ in terms of degrees and path distances can be obtained.

(Beineke, Oellermann, Pippert 2002)


Connectivity and bridges

A similar notion of connectivity considers robustness to edge removal.

Bridge: Let G be a graph and G−e be the graph minus the edge e. Then e is a bridge if the number of components of G−e is greater than the number of components of G.

[Figure: three example graphs (two 9-node graphs and the 6-node example graph).]

Exercise: Identify some bridges in the above graphs.

Connectivity and bridges

Identify bridges in the big connected component of the conflict network:

[Figure: the largest connected component of the conflict network (AFG, ANG, BAH, ..., UKG, USA).]

Edge connectivity

Edge connectivity: The edge connectivity of a graph is the minimum number of edges that need to be removed to disconnect the graph.

Like node connectivity, the edge connectivity is of limited use:

• for many real-life graphs, the edge connectivity is one;

• averaged versions of connectivity may be of more use.

Geodesic distance

A geodesic in graph theory is just a shortest path between two nodes.

The geodesic distance d(i, j) between nodes i and j is the length of a shortest path between i and j.

[Figure: the 6-node example graph.]

D =
    0 1 2 1 1 1
    1 0 3 2 1 2
    2 3 0 1 3 1
    1 2 1 0 2 2
    1 1 3 2 0 2
    1 2 1 2 2 0

Note: Geodesics are not unique. Consider paths connecting nodes 1 and 3.


Nodal eccentricity

The eccentricity of a node is the largest distance from it to any other node:

ei = max_j di,j

[Figure: the 6-node example graph.]

e = (2, 3, 3, 2, 3, 2)


Diameter

Eccentricities, like degrees, are node level statistics.

One common network level statistic based on distance is the diameter:

The diameter of a graph is the largest between-node distance:

diam(Y) = max_{i,j} di,j = max_i max_j di,j = max_i ei

Diameter

[Figure: the 6-node example graph.]

For our simple six-node example graph,

diam(Y) = max{2, 3, 3, 2, 3, 2} = 3
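Both quantities are one line each in R (a minimal sketch, assuming the netdist function given a few slides below):

D <- netdist(Y)        # geodesic distance matrix
e <- apply(D, 1, max)  # nodal eccentricities
max(e)                 # the diameter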


Diameter and average eccentricity

• For a connected graph, the diameter can range from 1 to n − 1.

• For an unconnected graph:
  • by convention the diameter is taken to be infinity;
  • diameters of connected subgraphs can be computed.

• Like node connectivity, the diameter reflects a "worst case scenario."
  • The average eccentricity, ē = ∑ ei / n, may be a more representative measure.

• When comparing graphs with different numbers of nodes, it is useful to scale by n − 1.

A recommendation:

ē/(n − 1) = ∑ ei / [n(n − 1)]

Counting walks between nodes

Enumerating the number and types of walks between nodes is useful:

• existence of walks between nodes tells us about connectivity.

• existence of walks of minimal length tells us about geodesics.

Walks of all lengths between nodes can be counted using matrix multiplication.


Matrix multiplication

General matrix multiplication: Let

• X be an l × m matrix;
• Y be an m × n matrix.

The matrix product XY is the l × n matrix Z, with entries

zi,j = ∑_{k=1}^{m} xi,k yk,j

(draw the picture)

Useful note: The entries of Z are dot products of rows of X with columns of Y.

XY =
    ( x1 · y1  x1 · y2  x1 · y3  x1 · y4 )
    ( x2 · y1  x2 · y2  x2 · y3  x2 · y4 )
    ( x3 · y1  x3 · y2  x3 · y3  x3 · y4 )

where x1, x2, x3 are the rows of X and y1, y2, y3, y4 are the columns of Y.
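In R, %*% performs this multiplication, and the dot-product view can be checked directly (a minimal sketch with arbitrary small matrices):

X <- matrix(1:6, 3, 2)           # a 3 x 2 matrix
B <- matrix(1:8, 2, 4)           # a 2 x 4 matrix
Z <- X %*% B                     # the 3 x 4 product
Z[1, 1] == sum(X[1, ] * B[, 1])  # entry (1,1) is row 1 of X dotted with column 1 of B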


Computing comemberships with matrix multiplication

Let Y be an n ×m affiliation network:

yi,j = membership of person i in group j

Y =
    0 1 0
    0 1 1
    1 1 0
    0 0 0
    0 1 1
    1 0 0

Transpose: The transpose of an n × m matrix Y is the m × n matrix X = YT with entries xi,j = yj,i.

YT =
    0 0 1 0 0 1
    1 1 1 0 1 0
    0 1 0 0 1 0

Computing comemberships with matrix multiplication

Exercise: Complete the multiplication of Y by YT (only the diagonal entries are filled in):

YYT =
    1 ? ? ? ? ?
    ? 2 ? ? ? ?
    ? ? 2 ? ? ?
    ? ? ? 0 ? ?
    ? ? ? ? 2 ?
    ? ? ? ? ? 1

Letting X = YYT , we see xi,j is the number of comemberships of nodes i and j .


Computing comemberships with matrix multiplication

Exercise: Compute YTY, and identify what it represents.

YTY is the product of the 3 × 6 matrix YT with the 6 × 3 matrix Y shown above.
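Both products can be checked in R (a minimal sketch; Yaff is the affiliation matrix above, entered by hand):

Yaff <- matrix(c(0,1,0, 0,1,1, 1,1,0, 0,0,0, 0,1,1, 1,0,0), nrow = 6, byrow = TRUE)
Yaff %*% t(Yaff)  # comemberships of nodes i and j, as on the previous slide
t(Yaff) %*% Yaff  # compute and interpret (the exercise above)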

Multiplying binary sociomatrices

Repeated multiplication of a sociomatrix by itself identifies walks.

[Figure: the 6-node example graph.]

Y

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    1    0    1    1    1
## [2,]    1    0    0    0    1    0
## [3,]    0    0    0    1    0    1
## [4,]    1    0    1    0    0    0
## [5,]    1    1    0    0    0    0
## [6,]    1    0    1    0    0    0

Y %*% Y

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    4    1    2    0    1    0
## [2,]    1    2    0    1    1    1
## [3,]    2    0    2    0    0    0
## [4,]    0    1    0    2    1    2
## [5,]    1    1    0    1    2    1
## [6,]    0    1    0    2    1    2

Note: We have replaced the diagonal with zeros for this calculation.


Multiplying binary sociomatrices

[Figure: the 6-node example graph.]

Y %*% Y

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    4    1    2    0    1    0
## [2,]    1    2    0    1    1    1
## [3,]    2    0    2    0    0    0
## [4,]    0    1    0    2    1    2
## [5,]    1    1    0    1    2    1
## [6,]    0    1    0    2    1    2

• How many walks of length 2 are there from i to i?

• How many walks of length 2 are there from i to j?


Multiplying binary sociomatrices

[Figure: the 6-node example graph.]

Y %*% Y %*% Y

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    2    5    0    6    5    6
## [2,]    5    2    2    1    3    1
## [3,]    0    2    0    4    2    4
## [4,]    6    1    4    0    1    0
## [5,]    5    3    2    1    2    1
## [6,]    6    1    4    0    1    0

Result: Let W = Y^k. Then

wi,j = the number of walks of length k between i and j.

Application: Assessing reachability and connectedness

Define X^(k), k = 1, . . . , n − 1 as follows:

X^(1) = Y
X^(2) = Y + Y^2
...
X^(k) = Y + Y^2 + · · · + Y^k

Note:

• X^(1) counts the number of walks of length 1 between nodes;
• X^(2) counts the number of walks of length ≤ 2 between nodes;
• X^(k) counts the number of walks of length ≤ k between nodes.


Application: Assessing reachability and Connectedness

Recall: If two nodes are reachable, there must be a path (walk) between them of length less than or equal to n − 1.

Result: Nodes i and j are reachable if X^(n−1)[i, j] > 0.

Recall: A graph is connected if all pairs are reachable.

Result: A graph is connected if X^(n−1)[i, j] > 0 for all i, j.
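A direct implementation of this check (a minimal sketch, assuming Y is a binary sociomatrix whose diagonal has been set to zero):

n <- nrow(Y)
Xk <- Y
Yk <- Y
for (k in 2:(n - 1)) {
    Yk <- Yk %*% Y  # Y^k: walks of length exactly k
    Xk <- Xk + Yk   # X^(k): walks of length at most k
}
all(Xk[row(Xk) != col(Xk)] > 0)  # TRUE exactly when the graph is connected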


Finding geodesics

Each path is a walk, so

di,j = the length of the shortest path between i and j
     = the length of the shortest walk between i and j
     = the first k for which Y^k[i, j] > 0

This suggests an algorithm for finding geodesic distances.


Finding geodesic distances

[Figure: the 6-node example graph.]

Y

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    1    0    1    1    1
## [2,]    1    0    0    0    1    0
## [3,]    0    0    0    1    0    1
## [4,]    1    0    1    0    0    0
## [5,]    1    1    0    0    0    0
## [6,]    1    0    1    0    0    0

D =
    0 1 ? 1 1 1
    1 0 ? ? 1 ?
    ? ? 0 1 ? 1
    1 ? 1 0 ? ?
    1 1 ? ? 0 ?
    1 ? 1 ? ? 0


Finding geodesic distances

[Figure: the 6-node example graph.]

Y %*% Y

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    4    1    2    0    1    0
## [2,]    1    2    0    1    1    1
## [3,]    2    0    2    0    0    0
## [4,]    0    1    0    2    1    2
## [5,]    1    1    0    1    2    1
## [6,]    0    1    0    2    1    2

D =
    0 1 2 1 1 1
    1 0 ? 2 1 2
    2 ? 0 1 ? 1
    1 2 1 0 2 2
    1 1 ? 2 0 2
    1 2 1 2 2 0


Finding geodesic distances

[Figure: the 6-node example graph.]

Y %*% Y %*% Y

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    2    5    0    6    5    6
## [2,]    5    2    2    1    3    1
## [3,]    0    2    0    4    2    4
## [4,]    6    1    4    0    1    0
## [5,]    5    3    2    1    2    1
## [6,]    6    1    4    0    1    0

D =
    0 1 2 1 1 1
    1 0 3 2 1 2
    2 3 0 1 3 1
    1 2 1 0 2 2
    1 1 3 2 0 2
    1 2 1 2 2 0


R-function netdist

netdist

## function (Y)
## {
##     n <- dim(Y)[1]
##     Y0 <- Y
##     diag(Y0) <- 0
##     Ys <- Y0
##     D <- Y
##     D[Y == 0] <- n + 1
##     diag(D) <- 0
##     s <- 2
##     while (any(D == n + 1) & s < n) {
##         Ys <- 1 * (Ys %*% Y0 > 0)
##         D[Ys == 1] <- ((s + D[Ys == 1]) - abs(s - D[Ys == 1]))/2
##         s <- s + 1
##     }
##     D
## }

Unreachable pairs keep the placeholder value n + 1. The update inside the loop uses the identity min(s, d) = ((s + d) − |s − d|)/2, so each entry records the first (shortest) walk length found.


R-function netdist

netdist(Y)

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    1    2    1    1    1
## [2,]    1    0    3    2    1    2
## [3,]    2    3    0    1    3    1
## [4,]    1    2    1    0    2    2
## [5,]    1    1    3    2    0    2
## [6,]    1    2    1    2    2    0


Application: Multidimensional scaling

Often we have data on distances or dissimilarities between a set of objects.

• Machine learning: di,j = |xi − xj |, xi = vector of characteristics of object i .

• Social networks: di,j = geodesic distance between i and j .

It is often useful to embed these distances in a low-dimensional space.

• visualization: convert distances to a map in 2 dimensions for plotting.

• data reduction: convert an n × n dissimilarity matrix to an n × p matrix of positions.

Application: Multidimensional scaling

Multidimensional scaling (MDS): Given a dissimilarity or distance matrix D = {di,j}, find a set of points x1, . . . , xn in R^p to minimize

∑_{i,j} (di,j − |xi − xj|)²

The output of an MDS algorithm is a set of points x1, . . . , xn such that

|xi − xj | ≈ di,j

In other words, treating x1, . . . , xn as points on a map, the between-point map distances should approximate D.
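For geodesic distances, classical MDS is available in base R via cmdscale (a minimal sketch, assuming a connected graph so all distances are finite):

D <- netdist(Y)              # geodesic distance matrix
x <- cmdscale(D, k = 2)      # n x 2 matrix of embedded positions
plot(x, type = "n")
text(x, labels = 1:nrow(D))  # draw each node at its embedded position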

Page 135: Introduction - University of Washingtonpdhoff/courses/567/Notes/slides_2014.pdf¥ graph theory and visualization ¥ matrix representations 2. Descriptive network statistics and summaries

Application: Multidimensional scaling

Cities data: Data on presence of companies in various cities

yi,j = indicator that cities i and j have many companies in common

[Figure: the cities network and its two-dimensional MDS embedding, with the cities labeled (Amsterdam, Bangkok, Barcelona, ..., Warsaw, Zurich).]

Application: Finding connected components

The distance matrix, or X^(n−1), identifies the connected components of a graph.

[Figure: a 12-node graph with two connected components.]

Y

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]     0    1    1    0    0    0    0    0    0     0     0     0
## [2,]     1    0    1    0    0    0    0    0    0     0     0     0
## [3,]     1    1    0    1    0    0    0    0    0     0     0     0
## [4,]     0    0    1    0    1    1    0    0    0     0     0     0
## [5,]     0    0    0    1    0    1    0    0    0     0     0     0
## [6,]     0    0    0    1    1    0    0    0    0     0     0     0
## [7,]     0    0    0    0    0    0    0    1    1     0     0     0
## [8,]     0    0    0    0    0    0    1    0    1     0     0     0
## [9,]     0    0    0    0    0    0    1    1    0     1     0     0
## [10,]    0    0    0    0    0    0    0    0    1     0     1     1
## [11,]    0    0    0    0    0    0    0    0    0     1     0     1
## [12,]    0    0    0    0    0    0    0    0    0     1     1     0

Application: Finding connected components

The distance matrix, or X^(n−1), identifies the connected components of a graph.

[Figure: the 12-node graph with two connected components.]

Y + Y %*% Y

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]     2    2    2    1    0    0    0    0    0     0     0     0
## [2,]     2    2    2    1    0    0    0    0    0     0     0     0
## [3,]     2    2    3    1    1    1    0    0    0     0     0     0
## [4,]     1    1    1    3    2    2    0    0    0     0     0     0
## [5,]     0    0    1    2    2    2    0    0    0     0     0     0
## [6,]     0    0    1    2    2    2    0    0    0     0     0     0
## [7,]     0    0    0    0    0    0    2    2    2     1     0     0
## [8,]     0    0    0    0    0    0    2    2    2     1     0     0
## [9,]     0    0    0    0    0    0    2    2    3     1     1     1
## [10,]    0    0    0    0    0    0    1    1    1     3     2     2
## [11,]    0    0    0    0    0    0    0    0    1     2     2     2
## [12,]    0    0    0    0    0    0    0    0    1     2     2     2

Application: Finding connected components

The distance matrix, or X^(n−1), identifies the connected components of a graph.

[Figure: the 12-node graph with two connected components.]

Y + Y %*% Y + Y %*% Y %*% Y

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]     4    5    6    2    1    1    0    0    0     0     0     0
## [2,]     5    4    6    2    1    1    0    0    0     0     0     0
## [3,]     6    6    5    6    2    2    0    0    0     0     0     0
## [4,]     2    2    6    5    6    6    0    0    0     0     0     0
## [5,]     1    1    2    6    4    5    0    0    0     0     0     0
## [6,]     1    1    2    6    5    4    0    0    0     0     0     0
## [7,]     0    0    0    0    0    0    4    5    6     2     1     1
## [8,]     0    0    0    0    0    0    5    4    6     2     1     1
## [9,]     0    0    0    0    0    0    6    6    5     6     2     2
## [10,]    0    0    0    0    0    0    2    2    6     5     6     6
## [11,]    0    0    0    0    0    0    1    1    2     6     4     5
## [12,]    0    0    0    0    0    0    1    1    2     6     5     4


R-function concom

concom

## function (Y)
## {
##     Y0 <- Y
##     diag(Y0) <- 1
##     Y1 <- Y0
##     for (i in 1:dim(Y0)[1]) {
##         Y1 <- 1 * (Y1 %*% Y0 > 0)
##     }
##     cc <- list()
##     idx <- 1:dim(Y1)[1]
##     while (dim(Y1)[1] > 0) {
##         c1 <- which(Y1[1, ] == 1)
##         cc <- c(cc, list(idx[c1]))
##         Y1 <- Y1[-c1, -c1, drop = FALSE]
##         idx <- idx[-c1]
##     }
##     cc[order(-sapply(cc, length))]
## }


R-function concom

concom(Y)

## [[1]]
## [1] 1 2 3 4 5 6
##
## [[2]]
## [1]  7  8  9 10 11 12

connodes <- concom(Y)

Y[connodes[[1]], connodes[[1]]]

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    1    1    0    0    0
## [2,]    1    0    1    0    0    0
## [3,]    1    1    0    1    0    0
## [4,]    0    0    1    0    1    1
## [5,]    0    0    0    1    0    1
## [6,]    0    0    0    1    1    0


Walks, trails and paths for directed graphs

All these concepts generalize to directed graphs:

Directed walk: A sequence of nodes i1, . . . , iK such that y_{ik, ik+1} = 1 for k = 1, . . . , K − 1.

Powers of the sociomatrix correspond to counts of directed walks:

X^(k) = Y + Y^2 + · · · + Y^k

x^(k)_{i,j} = the number of directed walks from i to j of length k or less.

Example: Praise among monks

[Figure: the directed praise network among six monks (ROMUL, AMBROSE, BERTH, PETER, LOUIS, VICTOR).]

netdist(Yr)

##         ROMUL AMBROSE BERTH PETER LOUIS VICTOR
## ROMUL       0       7     7     7     7      7
## AMBROSE     7       0     7     7     7      7
## BERTH       2       1     0     1     1      2
## PETER       1       2     1     0     1      2
## LOUIS       2       3     2     1     0      1
## VICTOR      2       2     1     1     1      0

(Entries of 7 = n + 1 are the placeholder netdist uses for unreachable pairs: no other monk is reachable from ROMUL or AMBROSE.)

Centrality
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Centrality

A common goal in SNA is to identify “central” nodes in a network.

What does “central” mean?

• active?

• important?

• non-redundant?

Koschutzki et al. (2005) attempted a classification of centrality measures

• Reach: ability of ego to reach other vertices

• Flow: quantity/weight of walks passing through ego

• Vitality: effect of removing ego from the network

• Feedback: a recursive function of alter centralities


Common centrality measures

We will define and compare four centrality measures:

• degree centrality (based on degree)

• closeness centrality (based on average distances)

• betweenness centrality (based on geodesics)

• eigenvector centrality (recursive: similar to page rank methods)


Standardized centrality measures

Node-level indices

Let c1, . . . , cn be node-level centrality measures:

ci = centrality of node i by some metric

It is often useful to standardize the ci's by their maximum possible value:

c̄i = ci / cmax

Network centralization

Network-level indices

How centralized is the network? To what extent is there a small number of highly central nodes?

• Let c∗ = max{c1, . . . , cn}.
• Let S = ∑_i [c∗ − ci].

Then

• S = 0 if all nodes are equally central;

• S is large if one node is most central.


Network centralization

Network-level centralization index:

C = ∑_i [c∗ − ci] / max ∑_i [c∗ − ci]

The “max” in the denominator is over all possible networks.

• 0 ≤ C ≤ 1;

• C = 0 when all nodes have the same centrality;

• C = 1 if one actor has maximal centrality and all others have minimal.


Networks for comparison

We will compare the following graphs under different centrality measures:

[Figure: four 5-node graphs.]

These are the star graph, the line graph, the y-graph and the circle graph.

Which do you feel is most “centralized”? Which the least?


Degree centrality

Idea: A central actor is one with many connections.

This motivates the measure of degree centrality

• undirected degree centrality: c^d_i = ∑_{j≠i} yi,j
• outdegree centrality: c^o_i = ∑_{j≠i} yi,j
• indegree centrality: c^i_i = ∑_{j≠i} yj,i

The standardized degree centrality is

c̄^d_i = c^d_i / c^d_max = c^d_i / (n − 1)

Degree centrality

[Figure: the four 5-node graphs.]

apply(Ys, 1, sum, na.rm = TRUE)

## [1] 1 1 4 1 1

apply(Yl, 1, sum, na.rm = TRUE)

## [1] 1 2 2 2 1

apply(Yy, 1, sum, na.rm = TRUE)

## [1] 1 2 3 1 1

apply(Yc, 1, sum, na.rm = TRUE)

## [1] 2 2 2 2 2
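The standardized versions divide the degrees by n − 1 (a minimal sketch for the star graph):

apply(Ys, 1, sum, na.rm = TRUE)/(nrow(Ys) - 1)  # standardized degree centralities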


Degree centralization

c^d_i : actor centrality

c^d_∗ : maximum actor centrality observed in the network

∑_i [c^d_∗ − c^d_i] : sum of differences between the most central actor and the others

Centralization:

C^d = ∑_i [c^d_∗ − c^d_i] / max_Y ∑_i [c^d_∗ − c^d_i]

What is the maximum numerator that could be attained by an n-node graph?


Degree centralization

The maximum occurs when

• one node has the largest possible degree (c^d_∗ = n − 1),
• the others have the smallest possible degree (c^d_i = 1).

This is the star graph.

max_Y ∑_i [c^d_∗ − c^d_i] = ∑_i [(n − 1) − c^d_i]
                          = 0 + (n − 1 − 1) + · · · + (n − 1 − 1)
                          = (n − 1)(n − 2)

C^d(Y) = ∑_i [c^d_∗ − c^d_i] / [(n − 1)(n − 2)]

Degree centralization

Exercise: Compute the degree centralization for the four n = 5 graphs:

• the star graph;

• the line graph;

• the y-graph;

• the circle graph.


Degree centralization

Cd <- function(Y) {
    n <- nrow(Y)
    d <- apply(Y, 1, sum, na.rm = TRUE)
    sum(max(d) - d)/((n - 1) * (n - 2))
}

Cd(Ys)

## [1] 1

Cd(Yy)

## [1] 0.5833

Cd(Yl)

## [1] 0.1667

Cd(Yc)

## [1] 0

[Figure: the four 5-node graphs.]

Closeness centrality

Idea: A central node is one that is close, on average, to other nodes.

This motivates the idea of closeness centrality

• (geodesic) distance: di,j is the minimal path length from i to j;
• closeness centrality: c^c_i = 1 / ∑_{j≠i} di,j = 1/[(n − 1)d̄i];
• limitation: not useful for disconnected graphs.

Closeness centrality

c^c_i = 1/[(n − 1)d̄i]

Recall that

d̄a < d̄b ⇒ 1/d̄a > 1/d̄b,

and so a node i would be "maximally close" if di,j = 1 for all j ≠ i, giving

c^c_max = 1/(n − 1).

The standardized closeness centrality is therefore

c̄^c_i = c^c_i / c^c_max = (n − 1) c^c_i = 1/d̄i.
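In R (a minimal sketch, assuming a connected graph and the netdist function from earlier):

n <- nrow(Y)
D <- netdist(Y)
cc <- 1/rowSums(D)  # closeness centralities 1/sum_j d_ij
(n - 1) * cc        # standardized closeness: one over the average distance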

Page 158: Introduction - University of Washingtonpdhoff/courses/567/Notes/slides_2014.pdf¥ graph theory and visualization ¥ matrix representations 2. Descriptive network statistics and summaries

Closeness centralization

c^c_i : actor centrality

c^c_∗ : maximum actor centrality observed in the network

∑_i [c^c_∗ − c^c_i] : sum of differences between the most central actor and the others

Centralization:

C^c = ∑_i [c^c_∗ − c^c_i] / max_Y ∑_i [c^c_∗ − c^c_i]

What is the maximum numerator that could be attained by an n-node graph?


Closeness centralization

The maximum occurs when

• one node has the largest possible closeness (d̄∗ = 1, c^c_∗ = 1/(n − 1)),
• the others have the smallest possible closeness, given that c^c_∗ = 1/(n − 1).

(Freeman, 1979)

For what graph are these conditions satisfied?

• For c^c_∗ = 1/(n − 1), one node must be connected to all others.
• To then maximize centralization, the centrality of the other nodes must be minimized.

This occurs when none of the non-central nodes are tied to each other, i.e. the star graph.

Closeness centralization

For a non-central node in the star graph,

d̄i = (1 + 2 + · · · + 2)/(n − 1) = [2(n − 2) + 1]/(n − 1) = (2n − 3)/(n − 1)

c^c_i = 1/[(n − 1)d̄i] = 1/(2n − 3).

Therefore, for the star graph,

∑_i [c^c_∗ − c^c_i] = 0 + [1/(n − 1) − 1/(2n − 3)] + · · · + [1/(n − 1) − 1/(2n − 3)]
                    = (n − 1) × [1/(n − 1) − 1/(2n − 3)]
                    = (n − 1) × (n − 2)/[(2n − 3)(n − 1)]
                    = (n − 2)/(2n − 3)


Closeness centralization

To review, the maximum of ∑_i [c^c_∗ − c^c_i] occurs for the star graph, for which

∑_i [c^c_∗ − c^c_i] = (n − 2)/(2n − 3).

Therefore, the centralization of any graph Y is

C^c(Y) = ∑_i [c^c_∗ − c^c_i] / max_Y ∑_i [c^c_∗ − c^c_i] = ∑_i [c^c_∗ − c^c_i] / [(n − 2)/(2n − 3)]

Alternatively, as c̄^c_i = (n − 1) c^c_i,

C^c(Y) = ∑_i [c̄^c_∗ − c̄^c_i] / [(n − 1)(n − 2)/(2n − 3)]

Closeness centralization

Exercise: Compute the closeness centralization for the four n = 5 graphs:

• the star graph;

• the line graph;

• the y-graph;

• the circle graph.


Closeness centralization

Cc <- function(Y) {
    n <- nrow(Y)
    D <- netdist(Y)
    c <- 1/apply(D, 1, sum, na.rm = TRUE)
    sum(max(c) - c)/((n - 2)/(2 * n - 3))
}

Cc(Ys)

## [1] 1

Cc(Yy)

## [1] 0.6352

Cc(Yl)

## [1] 0.4222

Cc(Yc)

## [1] 0

[Figure: the four 5-node graphs.]

Betweenness centrality

Idea: A central actor is one that acts as a bridge, broker or gatekeeper.

• Interaction between unlinked nodes goes through the shortest path (geodesic);

• A “central” node is one that lies on many geodesics.

This motivates the idea of betweenness centrality

• gj,k = number of geodesics between nodes j and k;

• gj,k(i) = number of geodesics between nodes j and k going through i ;

• c^b_i = ∑_{j<k} gj,k(i)/gj,k

Betweenness centrality

Interpretation: gj,k(i)/gj,k is the probability that a "message" from j to k goes through i.

• j and k have gj,k routes of communication;

• i is on gj,k(i) of these routes;

• a randomly selected route contains i with probability gj,k(i)/gj,k .

Note (WF p.191): "(betweenness centrality) can be computed even if the graph is not connected."

• Careful: If j and k are not reachable, what is gj,k(i)/gj,k ?

• By convention this is set to zero for unreachable pairs.

Betweenness centrality

c^b_i = ∑_{j<k} gj,k(i)/gj,k

• 0 ≤ c^b_i, with equality when i lies on no geodesics (draw a picture);
• c^b_i ≤ (n − 1 choose 2) = (n − 1)(n − 2)/2, with equality when i lies on all geodesics.

The standardized betweenness centrality is

c̄^b_i = 2 c^b_i / [(n − 1)(n − 2)].
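With the sna package installed, the betweenness calls below evaluate; a minimal sketch of the standardized version (assuming Yy is the y-graph sociomatrix):

library(sna)                           # assumes the sna package is installed
n <- nrow(Yy)
b <- betweenness(Yy, gmode = "graph")  # raw betweenness of each node
2 * b/((n - 1) * (n - 2))              # standardized betweenness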


Betweenness centrality

[Figure: the four 5-node graphs.]

## Error: there is no package called ’sna’

Exercise: Compute the betweenness centrality for each node in each graph.

betweenness(Ys, gmode = "graph")

## Error: could not find function "betweenness"

betweenness(Yl, gmode = "graph")

## Error: could not find function "betweenness"

betweenness(Yy, gmode = "graph")

## Error: could not find function "betweenness"

betweenness(Yc, gmode = "graph")

## Error: could not find function "betweenness"


Betweenness centralization

c^b_i : actor centrality

c^b_∗ : maximum actor centrality observed in the network

∑_i [c^b_∗ − c^b_i] : sum of differences between the most central actor and the others

Centralization:

C^b = ∑_i [c^b_∗ − c^b_i] / max_Y ∑_i [c^b_∗ − c^b_i]

What is the maximum numerator that could be attained by an n-node graph?

Betweenness centralization

The maximum occurs when

• one node has the largest possible betweenness (c^b_∗ = (n − 1 choose 2)),
• the others have the smallest possible betweenness (c^b_i = 0).

Again, this is the star graph.

max_Y ∑_i [c^b_∗ − c^b_i] = ∑_i [(n − 1 choose 2) − c^b_i]
                          = 0 + [(n − 1 choose 2) − 0] + · · · + [(n − 1 choose 2) − 0]
                          = (n − 1)(n − 1 choose 2)

Since (n − 1 choose 2) = (n − 1)(n − 2)/2,

C^b(Y) = ∑_i [c^b_∗ − c^b_i] / [(n − 1)(n − 1 choose 2)] = 2 ∑_i [c^b_∗ − c^b_i] / [(n − 1)²(n − 2)]


Betweenness centralization

Cb <- function(Y) {
    require(sna)
    n <- nrow(Y)
    b <- betweenness(Y, gmode = "graph")
    2 * sum(max(b) - b)/((n - 1)^2 * (n - 2))
}

Cb(Ys)

## [1] 1

Cb(Yy)

## [1] 0.7083

Cb(Yl)

## [1] 0.4167

Cb(Yc)

## [1] 0

[Figure: the four 5-node graphs.]

Eigenvector centrality

Idea: A central actor is connected to other central actors.

This definition is recursive:

Eigenvector centrality: The centrality of each vertex is proportional to the sum of the centralities of its neighbors.

• Formula: c^e_i = (1/λ) ∑_{j≠i} yi,j c^e_j

• Central vertices are those with many central neighbors

• A variant of eigenvector centrality is used by Google to rank Web pages

Google, describing PageRank: PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."

Eigenvector centrality

c^e_i = (1/λ) ∑_{j≠i} yi,j c^e_j

Using matrix algebra, such a vector of centralities satisfies

Y c^e = λ c^e,

where the missing diagonal of Y has been replaced with zeros.

A vector c^e satisfying the above equation is an eigenvector of Y.

There are generally multiple eigenvectors. The centrality is taken to be the one corresponding to the largest value of λ:

• this corresponds to the best rank-1 approximation of Y;
• nodes with large c^e_i's have "strong activity" in the "primary dimension" of Y.

Eigenvector centrality

[Figure: the four 5-node graphs.]

## Error: there is no package called ’sna’

evecc <- function(Y) {
    diag(Y) <- 0
    tmp <- eigen(Y)$vec[, 1]
    tmp <- tmp * sign(tmp[1])
    tmp
}

evecc(Ys)

## [1] 0.3536 0.3536 0.7071 0.3536 0.3536

evecc(Yl)

## [1] 0.2887 0.5000 0.5774 0.5000 0.2887

evecc(Yy)

## [1] 0.2706 0.5000 0.6533 0.3536 0.3536

evecc(Yc)

## [1] 0.4472 0.4472 0.4472 0.4472 0.4472


Eigenvector centralization

Ce <- function(Y) {
    n <- nrow(Y)
    e <- evecc(Y)
    Y.sgn <- matrix(0, n, n)
    Y.sgn[1, -1] <- 1
    Y.sgn <- Y.sgn + t(Y.sgn)
    e.sgn <- evecc(Y.sgn)
    sum(max(e) - e)/sum(max(e.sgn) - e.sgn)
}

Ce(Ys)

## [1] 1

Ce(Yy)

## [1] 0.8029

Ce(Yl)

## [1] 0.5176

Ce(Yc)

## [1] 9.421e-16


Empirical study: Comparing centralization of different networks

Comparison of centralization metrics across four networks:

• yeast: binding interactions among 426 yeast proteins

• boys: friendships among 136 boys

• girls: friendships among 142 girls

• tribes: positive relations among 12 NZ tribes

Empirical study: Comparing centralization of different networks

[Figure: plots of the yeast, boys, girls and tribes networks.]

Empirical study: Comparing centralization of different networks

         degree   closeness   betweenness   eigenvector
yeast      .12       .23          .57           .91
boys       .26       .29          .28           .29
girls      .27       .32          .18           .38
tribes     .35       .50          .51           .47

Empirical study: Comparing centralization of different networks

Comments:

• The protein network looks visually centralized, but
  • most of the centralization is local;
  • globally, it is somewhat decentralized.

• The friendship networks have similar degree and closeness centralities.

• The tribes network has one particularly central node.

Odds ratios for covariate effects
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Statistics for covariate effects

Descriptive network analysis: Computation of

• graph level statistics: density, degree distribution, centralization

• node level statistics: degrees, centralities

Often we also have node-level covariate information.

• Covariate: Node characteristics that “co-vary” with the network.

Questions:

• How to describe the relationship between the network and covariates?

• Can the covariates explain/predict network behavior?


Example: Girls friendships

[Figure: the girls friendship network and histograms of the out- and indegrees.]

mean(Y, na.rm = TRUE)

## [1] 0.04089

Ce(1 * (Y + t(Y) > 0))

## [1] 0.382


Covariate effects

We also have data on GPA

• hgpa = indicator of above-average gpa;

mean(Y[hgpa == 1, hgpa == 1], na.rm = TRUE)

## [1] 0.04737

mean(Y[hgpa == 1, hgpa == 0], na.rm = TRUE)

## [1] 0.03762

mean(Y[hgpa == 0, hgpa == 1], na.rm = TRUE)

## [1] 0.03936

mean(Y[hgpa == 0, hgpa == 0], na.rm = TRUE)

## [1] 0.03903


Covariate effects

We also have data on smoking behavior:

• hsmoke = indicator of above-average smoking behavior.

mean(Y[hsmoke == 1, hsmoke == 1], na.rm = TRUE)

## [1] 0.04478

mean(Y[hsmoke == 1, hsmoke == 0], na.rm = TRUE)

## [1] 0.03063

mean(Y[hsmoke == 0, hsmoke == 1], na.rm = TRUE)

## [1] 0.044

mean(Y[hsmoke == 0, hsmoke == 0], na.rm = TRUE)

## [1] 0.04426


Summarizing densities of subgraphs

There are a lot of probabilities here (four for each covariate)

Is there a nice way to summarize them?

         xj = 0   xj = 1
xi = 0    0.039    0.039
xi = 1    0.038    0.047

Table: gpa

         xj = 0   xj = 1
xi = 0    0.044    0.044
xi = 1    0.031    0.045

Table: smoking

Odds ratios

Odds: Let Pr(E) be the probability of an event. The odds of E are

odds(E) = Pr(E) / [1 − Pr(E)].

Probabilities are between 0 and 1, odds are between 0 and ∞.

The “effect” of a variable on a probability is often described via the odds ratio.

Odds ratio: Let

• Pr(E|A) = the probability of event E under condition A;
• Pr(E|B) = the probability of event E under condition B.

The odds ratio is

odds(E : A, B) = [Pr(E|A) / (1 − Pr(E|A))] × [(1 − Pr(E|B)) / Pr(E|B)].

Note that odds(E : A, B) = 1 ⇒ Pr(E|A) = Pr(E|B).

Effect of a covariate on a tie

Let xi ∈ {0, 1} for i = 1, . . . , n be a binary variable.

• xi = indicator of high gpa;

• xi = smoking status;

• xi = indicator of membership to some group.

Let Pr(yi,j = 1 | xi, xj) = p_{xi xj}:

         xj = 0   xj = 1
xi = 0    p00      p01
xi = 1    p10      p11

Given a network, we might want to describe the “effect” of xi and xj on yi,j :

odds(yi,j = 1 | {xi = 1, xj = 1}, {xi = 0, xj = 1}) = [p11 / (1 − p11)] × [(1 − p01) / p01]

Odds ratios

p11 <- mean(Y[hsmoke == 1, hsmoke == 1], na.rm = TRUE)
p01 <- mean(Y[hsmoke == 0, hsmoke == 1], na.rm = TRUE)

(p11/(1 - p11))/(p01/(1 - p01))

## [1] 1.018

This result says that the odds of a tie are 1.02 times higher under the condition xi = 1, xj = 1 than under xi = 0, xj = 1.

This result seems to suggest that smokers and non-smokers are equally friendly to smokers. However, the result could be due to

• no effect of smoking or

• differential rates of ties among smokers and nonsmokers.


Odds ratios for tie preferences

A better question to ask might be:

Does a person's characteristic determine the characteristics of whom they choose as friends?

The probabilities related to this question condition on the existence of a tie:

Pr(xj = 1 | yi,j = 1, xi = 1) = Pr(yi,j = 1 | xj = 1, xi = 1) Pr(xj = 1 | xi = 1) / Pr(yi,j = 1 | xi = 1)
                             = Pr(yi,j = 1 | xj = 1, xi = 1) Pr(xj = 1) / Pr(yi,j = 1 | xi = 1)
                             = p11 Pr(xj = 1) / Pr(yi,j = 1 | xi = 1)

This probability can be interpreted as, for example,

What is the probability that a friend of a smoker is another smoker?

Such a probability is more descriptive of tie preferences.
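Such conditional probabilities can be estimated directly from the data (a minimal sketch, assuming Y and the hsmoke indicator from the example):

ties <- which(Y == 1, arr.ind = TRUE)    # all (i, j) pairs with a tie
to.smoker <- hsmoke[ties[, 2]]           # smoking status of the receiver j
mean(to.smoker[hsmoke[ties[, 1]] == 1])  # estimate of Pr(xj = 1 | tie, xi = 1)
mean(to.smoker[hsmoke[ties[, 1]] == 0])  # estimate of Pr(xj = 1 | tie, xi = 0)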


Odds ratios for tie preferences

Pr(xj = 1|yi,j = 1, xi = 1) = probability that a friend of a smoker is another smoker

However, this probability will mostly reflect the (typically low) overall tie density.

To assess the "effect" of xi on choosing another smoker as a friend, we can look at an appropriate odds ratio:

odds(xj = 1 : {yi,j = 1, xi = 1}, {yi,j = 1, xi = 0}) =
    [Pr(xj = 1 | yi,j = 1, xi = 1) / Pr(xj = 0 | yi,j = 1, xi = 1)] × [Pr(xj = 0 | yi,j = 1, xi = 0) / Pr(xj = 1 | yi,j = 1, xi = 0)]

Odds ratios for tie preferences

Recall,

Pr(xj = 1 | yi,j = 1, xi = 1) = Pr(yi,j = 1 | xj = 1, xi = 1) Pr(xj = 1 | xi = 1) / Pr(yi,j = 1 | xi = 1)
                             = Pr(yi,j = 1 | xj = 1, xi = 1) Pr(xj = 1) / Pr(yi,j = 1 | xi = 1)
                             = p11 Pr(xj = 1) / Pr(yi,j = 1 | xi = 1).

Similarly,

Pr(xj = 0 | yi,j = 1, xi = 1) = p10 Pr(xj = 0) / Pr(yi,j = 1 | xi = 1),

and so

odds(xj = 1 | yi,j = 1, xi = 1) = (p11/p10) × [Pr(xj = 1) / Pr(yi,j = 1 | xi = 1)] / [Pr(xj = 0) / Pr(yi,j = 1 | xi = 1)]
                               = (p11/p10) × Pr(xj = 1)/Pr(xj = 0).

Odds ratios for tie preferences

p_{x1 x2} = Pr(tie | x1, x2) ≈ the density of the (x1, x2) submatrix:

         xj = 0   xj = 1
xi = 0    p00      p01
xi = 1    p10      p11

odds(xj = 1 | yi,j = 1, xi = 1) = (p11/p10) × Pr(xj = 1)/Pr(xj = 0)

odds(xj = 1 | yi,j = 1, xi = 0) = (p01/p00) × Pr(xj = 1)/Pr(xj = 0)

The odds ratio is therefore

odds ratio(xj = 1 | {yi,j = 1, xi = 1}, {yi,j = 1, xi = 0}) = (p11/p10) / (p01/p00) = p11 p00 / (p10 p01)

Odds ratios for tie preferences

γ = p11 p00 / (p10 p01)

This ratio represents

the relative preference of egos with x = 1 versus x = 0

to tie to alters with x = 1.

Interestingly, one can show (homework?) that

odds ratio(xi = 1 | {yi,j = 1, xj = 1}, {yi,j = 1, xj = 0}) = p11 p00 / (p10 p01) = γ.

This ratio represents

the relative attractiveness of alters with x = 1 versus x = 0

to egos with x = 1.


Odds ratios for tie preferences

        xj=0   xj=1
xi=0    p00    p01
xi=1    p10    p11

Are there interesting/useful ways to represent the numbers in the table?

• In SNA, we are more interested in relative rates than absolute rates.

• Absolute rates are derivable from relative rates and a baseline, and vice-versa:

{p00, p01, p10, p11} ∼ {p00, p01/p00, p10/p00, p11/p00} ∼ {p00, p01/p00, p10/p00, (p11p00)/(p01p10)}


Interpreting probability ratios

{p00, p01, p10, p11} ∼ {p00, p01/p00, p10/p00, (p11p00)/(p01p10)}

Baseline: p00 represents a baseline rate.

Relative rates: p01/p00 and p10/p00 represent relative rates:

• p01/p00 = density of 0→1 ties relative to 0→0 ties (“attractiveness of 1’s”)

• p10/p00 = density of 1→0 ties relative to 0→0 ties (“sociability of 1’s”)

Odds ratio: (p11p00)/(p01p10) = γ represents preferences for homophily (a short R sketch of these summaries follows).
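A minimal R sketch of these four summaries, assuming a sociomatrix Y (with NA diagonal) and a binary node covariate x, and reusing the submatrix-density idea from the course code:

cov.summaries <- function(Y, x) {
    # submatrix densities (p00, p10, p01, p11)
    p <- c(mean(Y[x == 0, x == 0], na.rm = TRUE), mean(Y[x == 1, x == 0], na.rm = TRUE),
           mean(Y[x == 0, x == 1], na.rm = TRUE), mean(Y[x == 1, x == 1], na.rm = TRUE))
    c(baseline = p[1],                            # p00
      sociability = p[2]/p[1],                    # p10/p00
      attractiveness = p[3]/p[1],                 # p01/p00
      homophily = (p[4] * p[1])/(p[2] * p[3]))    # gamma
}

cov.summaries(Y, hsmoke)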


Interpreting relative rates

You can show (for example) that

p01/p00 = odds(xj = 1|yi,j = 1, xi = 0) / odds(xj = 1)

While this is a ratio of odds, it is not exactly an odds ratio:

• The conditioning events are not complementary.

The ratio still has a reasonable interpretation:

• The ratio can be interpreted as how much the odds of xj = 1 change if you are told that j has a link from a person i with xi = 0.


Friendship example

p.smoke <- c(mean(Y[hsmoke == 0, hsmoke == 0], na.rm = TRUE),
             mean(Y[hsmoke == 1, hsmoke == 0], na.rm = TRUE),
             mean(Y[hsmoke == 0, hsmoke == 1], na.rm = TRUE),
             mean(Y[hsmoke == 1, hsmoke == 1], na.rm = TRUE))   # (p00, p10, p01, p11)

(p.smoke[1] * p.smoke[4])/(p.smoke[2] * p.smoke[3])

## [1] 1.471

p.gpa <- c(mean(Y[hgpa == 0, hgpa == 0], na.rm = TRUE),
           mean(Y[hgpa == 1, hgpa == 0], na.rm = TRUE),
           mean(Y[hgpa == 0, hgpa == 1], na.rm = TRUE),
           mean(Y[hgpa == 1, hgpa == 1], na.rm = TRUE))

(p.gpa[1] * p.gpa[4])/(p.gpa[2] * p.gpa[3])

## [1] 1.249

Homophily is positive for both smoking and gpa.


Friendship example

p.smoke[1]

## [1] 0.04426

p.smoke[2]/p.smoke[1]

## [1] 0.692

p.smoke[3]/p.smoke[1]

## [1] 0.9942

(p.smoke[1] * p.smoke[4])/(p.smoke[2] * p.smoke[3])

## [1] 1.471


Friendship example

• The baseline rate is low (p00 = 0.04).

• The rate of ties from nonsmokers to smokers is about the same as that from nonsmokers to nonsmokers (p01/p00 = .99).

• The rate of ties from smokers to nonsmokers is much lower than that from nonsmokers to nonsmokers (p10/p00 = .69).

• There is strong homophily (γ = 1.47):
  • a tie from a smoker is more likely to be to a smoker than a tie from a nonsmoker is;
  • a tie to a smoker is more likely to be from a smoker than a tie to a nonsmoker is.


Friendship example

p.gpa[1]

## [1] 0.03903

p.gpa[2]/p.gpa[1]

## [1] 0.9638

p.gpa[3]/p.gpa[1]

## [1] 1.008

(p.gpa[1] * p.gpa[4])/(p.gpa[2] * p.gpa[3])

## [1] 1.249

Note: It is possible for p01/p00 = p10/p00 = 1 but for γ to be large.

• In this case γ = p11/p00.

• Deviations from 1 indicate heterogeneity in within-group ties.

• Such deviations indicate within group preferences, or homophily.


Logistic regression

A useful tool for describing effects on a binary variable is logistic regression

Given

• a binary outcome variable y

• binary explanatory variables x1, x2

A logistic regression model for y in terms of x1, x2 is

Pr(y = 1|x1, x2) = e^(β0+β1x1+β2x2+β12x1x2) / (1 + e^(β0+β1x1+β2x2+β12x1x2))

Based on this, we see that

Pr(y = 0|x1, x2) = 1 / (1 + e^(β0+β1x1+β2x2+β12x1x2))

odds(y = 1|x1, x2) = exp(β0 + β1x1 + β2x2 + β12x1x2)

log odds(y = 1|x1, x2) = β0 + β1x1 + β2x2 + β12x1x2


Log-odds ratios in logistic regression

For example,

odds(y = 1|0, 0) = exp(β0)

odds(y = 1|1, 0) = exp(β0 + β1)

odds ratio(y = 1|(1, 0), (0, 0)) = exp(β0 + β1) / exp(β0) = exp(β1)

log odds ratio(y = 1|(1, 0), (0, 0)) = β1

In logistic regression

• β1, the “effect” of x1, represents the log odds ratio (y = 1|(1, 0), (0, 0))

• β2, the “effect” of x2, represents the log odds ratio (y = 1|(0, 1), (0, 0))

What about the interaction?


Log-odds ratios in logistic regression

odds(y = 1|x1, x2) = exp(β0 + β1x1 + β2x2 + β12x1x2)

odds ratio(y = 1|(1, 1), (0, 1)) = exp(β0 + β1 + β2 + β12) / exp(β0 + β2) = exp(β1 + β12)

odds ratio(y = 1|(1, 0), (0, 0)) = exp(β0 + β1) / exp(β0) = exp(β1)

Therefore

odds ratio(y = 1|(1, 1), (0, 1)) / odds ratio(y = 1|(1, 0), (0, 0)) = exp(β12)

log [ odds ratio(y = 1|(1, 1), (0, 1)) / odds ratio(y = 1|(1, 0), (0, 0)) ] = β12
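A minimal numerical check of these identities on simulated (hypothetical) data; because a model with four coefficients and four covariate cells is saturated, the fitted interaction matches the cell-based ratio of odds ratios exactly:

set.seed(1)
x1 <- rbinom(500, 1, 0.5)
x2 <- rbinom(500, 1, 0.5)
y <- rbinom(500, 1, plogis(-2 + 0.5 * x1 + 0.3 * x2 + 0.8 * x1 * x2))

fit <- glm(y ~ x1 + x2 + x1:x2, family = binomial)

o <- tapply(y, list(x1, x2), mean)   # cell proportions of y
o <- o/(1 - o)                       # cell odds
c((o["1", "1"] * o["0", "0"])/(o["0", "1"] * o["1", "0"]), exp(coef(fit)["x1:x2"]))

The two numbers agree: exp(β12) is the (o11 o00)/(o01 o10) odds ratio computed from the cells.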


Log-odds ratios in logistic regression

ox1x2 = Pr(y1,2 = 1|x1, x2) / (1 − Pr(y1,2 = 1|x1, x2)) = px1x2 / (1 − px1x2)

        x2=0   x2=1
x1=0    o00    o01
x1=1    o10    o11

Under the logistic regression model

β0 = log o00

β1 = log(o10/o00)

β2 = log(o01/o00)

β12 = log[(o11/o01)/(o10/o00)] = log[(o11o00)/(o01o10)]

How do {β0, β1, β2, β12} relate to {p00, p10/p00, p01/p00, (p11p00)/(p01p10)} ?


Friendship example

Xr <- matrix(hsmoke, nrow(Y), ncol(Y))   # sender (row) covariate
Xc <- t(Xr)                              # receiver (column) covariate

xr <- c(Xr)
xc <- c(Xc)
y <- c(Y)

fit <- glm(y ~ xr + xc + xr * xc, family = binomial)

exp(fit$coef)

## (Intercept) xr xc xr:xc

## 0.04631 0.68225 0.99391 1.49277

Do these numbers look familiar?

p.smoke[1]

## [1] 0.04426

p.smoke[2]/p.smoke[1]

## [1] 0.692

p.smoke[3]/p.smoke[1]

## [1] 0.9942

(p.smoke[1] * p.smoke[4])/(p.smoke[2] * p.smoke[3])

## [1] 1.471
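The near-match is not a coincidence. A sketch of the exact correspondence, computing the odds-based summaries from the four submatrix densities: the saturated logistic fit reproduces these exactly, while the probability-based ratios above agree only approximately.

o <- p.smoke/(1 - p.smoke)   # (o00, o10, o01, o11)
c(o[1], o[2]/o[1], o[3]/o[1], (o[4] * o[1])/(o[2] * o[3]))   # equals exp(fit$coef)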


Comparing summaries

If network density is very low,

• 1 − pxixj ≈ 1

• oxixj = pxixj / (1 − pxixj) ≈ pxixj

and so

        xj=0   xj=1
xi=0    p00    p01
xi=1    p10    p11

≈

        xj=0   xj=1
xi=0    o00    o01
xi=1    o10    o11

Therefore

{p00, p10/p00, p01/p00, (p11p00)/(p01p10)} ≈ {o00, o10/o00, o01/o00, (o11o00)/(o01o10)} = {e^β0, e^β1, e^β2, e^β12}


Undirected data

Now we have

        xj=0        xj=1
xi=0    p00         p01 = p10
xi=1    p10 = p01   p11

Now there are only three (unique) numbers in the table.

We can express these as follows:

{p00, p01, p11} ∼ {p00, p01/p00, (p11p00)/p01²}

The interpretation of these is roughly the same as before:

• p00 represents a baseline rate (both x’s 0);

• p01/p00 represents a relative rate (one x 0 versus both x’s 0);

• (p11p00)/p01² represents a homophily effect, the preference of like for like:
  • the excess within-group density, beyond the effect of one group being more active than another.


Logistic regression for undirected data

log odds(y = 1|x1, x2) = β0 + β1x1 + β2x2 + β12x1x2

Here, x1 and x2 are not “sender” and “receiver” effects, as there are no senders or receivers.

As there is no way to differentiate the effect of x1 from that of x2, we must have β1 = β2, and the model becomes

log odds(y = 1|x1, x2) = β0 + β1(x1 + x2) + β12x1x2

• β0 represents a baseline rate;

• β1 represents the “additive” effect of either x1 = 1 or x2 = 1 on the rate;

• β12 represents the additional effect of homophily on the rate.


Interpreting coefficients

log odds(y = 1|x1, x2) = β0 + β1(x1 + x2) + β12x1x2

For example, suppose

• yi,j = indicator of friendship;

• xi = indicator of friendliness.

Under no homophily, i.e. β12 = 0,

log odds(y = 1|0, 1) = β0 + β1

log odds(y = 1|1, 1) = β0 + 2β1

The rate is higher under (xi = 1, xj = 1) than (xi = 1, xj = 0)

• not because of homophily,

• but because both people are friendly, instead of one.


Interpreting coefficients

Under positive homophily, i.e. β12 > 0,

log odds(y = 1|0, 1) = β0 + β1

log odds(y = 1|1, 1) = β0 + 2β1 + β12 > β0 + 2β1

The rate is higher under (xi = 1, xj = 1) than (xi = 1, xj = 0)

• both people are friendly,

• additionally, friendly people prefer friendly people.


Computation in R

ys <- c(1 * (Y + t(Y) > 0))

fit <- glm(ys ~ xr + xc + xr * xc, family = binomial)

exp(fit$coef)

## (Intercept)          xr          xc       xr:xc
##     0.07455     0.84579     0.84579     1.45293
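In this fit, the xr and xc coefficients come out equal only because the symmetrized ys makes the two covariates exchangeable. A sketch of the explicitly constrained model from the previous slide, with β1 = β2 imposed in the formula:

fit.sym <- glm(ys ~ I(xr + xc) + I(xr * xc), family = binomial)
exp(fit.sym$coef)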


Summary

• Effects of binary covariates can be described with submatrix densities.

• Submatrix densities can be reparameterized into:
  • a baseline rate;
  • relative probabilities;
  • homophily.

• These summaries are related to logistic regression coefficients.


Testing and model evaluation
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Testing and rejecting models

Descriptive network analysis: Computation of

• graph level statistics: density, degree distribution, centralization

• node level statistics: degrees, centralities

• covariate effects: relative densities and odds ratios

What conclusions can we draw from such statistics?

• Are observed statistics large or small?
  • as compared to other observed networks?
  • as compared to hypothetical networks?

• Are observed statistics consistent with a theoretical model?
  • what model is appropriate for comparison?
  • how are comparisons made?


Example: Girls friendships

[Figure: the friendship network among the girls, with histograms of the outdegree distribution (values 0–20) and the indegree distribution (values 0–47).]

mean(Y, na.rm = TRUE)

## [1] 0.04089

Cd(Y)   # outdegree centralization

## [1] 0.1004

Cd(t(Y))   # indegree centralization

## [1] 0.2918


Covariate effects

We also have data on
• gpa: hgpa = indicator of above-average gpa;
• smoking behavior: hsmoke = indicator of above-average smoking behavior.

p.smoke[2]/p.smoke[1]

## [1] 0.692

p.smoke[3]/p.smoke[1]

## [1] 0.9942

(p.smoke[1] * p.smoke[4])/(p.smoke[2] * p.smoke[3])

## [1] 1.471

###

p.gpa[2]/p.gpa[1]

## [1] 0.9638

p.gpa[3]/p.gpa[1]

## [1] 1.008

(p.gpa[1] * p.gpa[4])/(p.gpa[2] * p.gpa[3])

## [1] 1.249


Descriptive results

Summary:

• Density:
  • the overall density of ties is 0.041.

• Centrality:
  • outdegree centralization (.10) is less than indegree centralization (.29).

• Smoking:
  • smokers tend to be less active as senders of ties (p10/p00 = .69);
  • there is positive homophily for smoking (γ = 1.47).

• Gpa:
  • students with high and low gpas have similar densities (p10/p00 ≈ p01/p00 ≈ 1);
  • there is positive homophily for gpa (γ = 1.25).

What conclusions can we draw?

Can we infer anything about how these people formed ties?


Models of tie formation

A probability model of tie formation is a probability distribution over sociomatrices.

More specifically, let

Y = {Y : yi,j ∈ {0, 1}, yi,i = NA}

be the set of all possible sociomatrices.

A probability model P over Y assigns a number P(Y) to each Y ∈ Y such that

0 ≤ P(Y) ≤ 1 for all Y ∈ Y

∑_{Y∈Y} P(Y) = 1
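To get a feel for the size of Y: a directed n-node sociomatrix has n(n − 1) off-diagonal entries, so there are 2^(n(n−1)) possible sociomatrices. A one-line check in R for the n = 6 examples used below:

2^(6 * 5)   # = 1073741824 possible directed 6-node sociomatrices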


Simple random graph

The simple random graph model Pθ assumes

• all ties are formed independently of each other;

• each tie exists with some probability θ, common across all ties.

Under Pθ the entries of Y are independent and identically distributed:

y1,2, . . . , yn−1,n ∼ i.i.d. binary(θ)

Exercise: Compute the probability of each graph under Pθ

NA 0 1 1 0 0
0 NA 0 1 0 0
0 1 NA 0 0 0
1 0 0 NA 1 0
0 1 0 1 NA 0
0 0 0 0 0 NA

NA 1 1 1 1 0
1 NA 0 0 0 0
1 0 NA 0 0 0
1 0 0 NA 0 0
1 0 0 0 NA 0
0 0 0 0 0 NA


Simple random graph

Under Pθ, the probability of a graph Y is

Pθ(Y) = ∏_{i≠j} θ^(yi,j) (1 − θ)^(1−yi,j) = θ^(Σ yi,j) (1 − θ)^(Σ (1−yi,j))

Would this be a good model for our friendship data?

Are the data consistent with this model?
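A minimal R sketch for the exercise above, computing Pθ(Y) for any sociomatrix Y (NA diagonal assumed):

srg.prob <- function(Y, theta) {
    y <- Y[!is.na(Y)]   # the off-diagonal entries
    theta^sum(y) * (1 - theta)^sum(1 - y)
}

Both matrices in the exercise have 8 ties among 30 ordered pairs, so each has probability θ^8 (1 − θ)^22: the simple random graph is sensitive only to the number of ties, not to their arrangement.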


Evaluating the simple random graph

Specification of a probability model requires specification of θ.

Let’s make an ad-hoc selection of θ = 0.04 for now, and ask the question:

Are the data consistent with a simple random graph model with θ = 0.04?

To facilitate our evaluation of the model, let’s focus on our data in terms of a few summary statistics:

• td(Y) = tie density;

• tcr(Y) = outdegree centralization;

• tcc(Y) = indegree centralization.

For each test statistic, we will ask the question

Is the observed value of our test statistic consistent with the values of the statistic we could have observed, under Pθ?


Null distributions

t123.stat <- function(Y) {
    c(mean(Y, na.rm = TRUE), Cd(Y), Cd(t(Y)))   # density, outdegree and indegree centralization
}

###

t123.obs <- t123.stat(Y)
theta <- 0.04
S <- 1000   # number of simulated networks (value assumed; not shown on the slide)

T123.sim <- NULL
for (s in 1:S) {
    Ys <- matrix(rbinom(nrow(Y)^2, 1, theta), nrow(Y), nrow(Y))
    diag(Ys) <- NA
    T123.sim <- rbind(T123.sim, t123.stat(Ys))
}


Null distributions

[Figure: histograms of the three statistics under the null simulation — density (roughly 0.036–0.044), outdegree centralization (roughly 0.04–0.12), and indegree centralization (roughly 0.05–0.30).]


Hypothesis testing and null distributions

A pure hypothesis test is a comparison of the data to a probability model.

Ingredients:

• A test statistic t:
  • t : Y → R;
  • t is a known function of the data.

• A null distribution Pr(·|H):
  • H refers to the hypothesized probability model, i.e. the “null hypothesis”;
  • Pr(·|H) is the probability distribution of t under H.

• A comparison of tobs = t(Y) to Pr(·|H):
  • graphical comparison;
  • p-value: Pr(t ≥ tobs|H).

To make the p-value useful, we usually choose t to be large for values of Y that are “far away” from H.
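With the simulations above in hand, Monte Carlo p-values are just the proportions of simulated statistics at least as extreme as the observed ones. A sketch, assuming t123.obs and T123.sim from the earlier chunk:

mean(T123.sim[, 1] >= t123.obs[1])   # density
mean(T123.sim[, 2] >= t123.obs[2])   # outdegree centralization
mean(T123.sim[, 3] >= t123.obs[3])   # indegree centralization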


Example: testing with the density statistic

Let’s consider testing the SRG model with θ = .04 using the density statistic.

• H : {yi,j : i ≠ j} ∼ i.i.d. binary(0.04)

• t(Y) = |y·· − 0.04|

A large value of the test statistic t

• occurs if y·· is very different from 0.04;

• suggests something is wrong with H.
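In code, the observed statistic is one line (a sketch consistent with the simulation code below; Y is the observed adjacency matrix with an NA diagonal):

t.o <- abs(mean(Y, na.rm = TRUE) - 0.04)  # observed value of t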

Computing the null distribution

The null distribution of this particular statistic can be computed with

• a normal approximation;

• a Monte Carlo approximation.
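(For this statistic the normal approximation is explicit, though the slides don't spell it out: under H, y·· is an average of m = n(n − 1) i.i.d. binary(θ) variables, so by the central limit theorem y·· is approximately N(θ, θ(1 − θ)/m).)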

For many other statistics, the Monte Carlo approximation will be more accurate.

theta <- 0.04
t.H <- NULL

# Monte Carlo null distribution: simulate S networks from H and compute
# the test statistic for each (Y is the observed adjacency matrix with
# an NA diagonal; S is the number of simulations).
for (s in 1:S) {
  Ysim <- matrix(rbinom(nrow(Y)^2, 1, theta), nrow(Y), nrow(Y))
  diag(Ysim) <- NA
  t.H <- c(t.H, abs(mean(Ysim, na.rm = TRUE) - theta))
}

Example: testing with the density statistic

[Figure: histogram of the Monte Carlo null distribution of t (x-axis: density difference, 0.000–0.005; y-axis: Density), with the observed value tobs marked by a red vertical line.]

The blue histogram is the (Monte Carlo approximation to the) distribution of t under H.

The red line is the observed value tobs = t(Y) of the test statistic.

Is there a big discrepancy? Should we "reject" H?

Quantifying model-data discrepancy

A popular way of quantifying the discrepancy is with a p-value:

p = Pr(t(Y) ≥ tobs|H)

The p-value can be approximated via the Monte Carlo method:

p.val <- mean(t.H >= t.o)
p.val

## [1] 0.5086

The result says that, if H were true, the probability of observing a value of t(Y) bigger than 9 × 10⁻⁴ is about 0.51.

In other words: if H were true, values of t as big as tobs are not extremely unlikely.

A p-value of 0.51 is not generally seen as

• a strong discrepancy between model and data;

• evidence against H.

Decision making and error rates

                    Truth
Decision        H                 not H
accept H        correct           type II error
reject H        type I error      correct

Suppose H is true and you will perform the following procedure:

• Sample Y

• Compute tobs = t(Y)

• Compute p = Pr(t(Y) ≥ tobs|H)

• Reject H if p < α, accept otherwise.

Then your probability of making a type I error is α.

Pr(reject H | H is true) = Pr(p < α | H) = α.

Often people choose α = 0.05. If H is true, then their chance of falsely rejecting H is 0.05.
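This property can be checked by simulation (an illustrative sketch, not from the slides; it assumes a small SRG with known n and θ):

# Check that the Monte Carlo test has type I error rate alpha under H
set.seed(1)
n <- 50; theta <- 0.04; S <- 1000; alpha <- 0.05
tstat <- function(Y) abs(mean(Y, na.rm = TRUE) - theta)
rnull <- function() {
  Ysim <- matrix(rbinom(n^2, 1, theta), n, n)
  diag(Ysim) <- NA
  Ysim
}
t.H <- replicate(S, tstat(rnull()))                    # null distribution, computed once
pvals <- replicate(1000, mean(t.H >= tstat(rnull())))  # p-values for data generated under H
mean(pvals < alpha)                                    # should be close to alpha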

Rejecting the SRG

H : {yi,j : i ≠ j} ∼ i.i.d. binomial(0.04)

Most wouldn’t reject H based on the density statistic and its p-value.

We might say that the model is adequate in terms of t(Y) = y··.

Is the model adequate in terms of other statistics? Consider

• todc = Cod(Y) (outdegree centralization)

• tidc = Cid(Y) (indegree centralization)

Rejecting the SRG based on centralization

t.H <- NULL

# Null distributions of the two centralization statistics under H.
# Cd(Ysim) is outdegree centralization; Cd(t(Ysim)) applies Cd to the
# transpose, giving indegree centralization.
for (s in 1:S) {
  Ysim <- matrix(rbinom(nrow(Y)^2, 1, theta), nrow(Y), nrow(Y))
  diag(Ysim) <- NA
  t.H <- rbind(t.H, c(Cd(Ysim), Cd(t(Ysim))))
}

t.o <- c(Cd(Y), Cd(t(Y)))

[Figure: Monte Carlo null distributions of outdegree centralization (left) and indegree centralization (right) under H; y-axis: Density.]

p-values from centralization

pval.o <- mean(t.H[, 1] >= t.o[1])
pval.i <- mean(t.H[, 2] >= t.o[2])

pval.o

## [1] 0

pval.i

## [1] 0

The plots and p-values indicate strong evidence against H:

• The binomial(0.04) model predicts much less outdegree centralization than was observed.

• The binomial(0.04) model predicts much, much less indegree centralization than was observed.
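A side note (not from the slides): a Monte Carlo p-value of exactly 0 only means that none of the S simulated statistics reached t.o, i.e. the estimate is below 1/S. A common, slightly conservative adjustment is

pval.o <- (1 + sum(t.H[, 1] >= t.o[1])) / (S + 1)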

Statistical versus probabilistic models

We evaluated evidence against

H : {yi,j : i ≠ j} ∼ i.i.d. binary(0.04).

Generally, we won’t have such a specific hypothesis:

• Rejecting binary(0.040) doesn’t mean we reject binary(0.041).

• We are more interested in testing all i.i.d. binary models.

Statistical model

A statistical model is a collection of probability distributions indexed by an unknown parameter.

P = {p(Y|θ) : θ ∈ Θ}

• θ is the unknown parameter;

• Θ is the parameter space;

• p(Y|θ) is the distribution of Y if θ is correct.

Example (SRG model): The simple random graph model is given by

• θ ∈ Θ = [0, 1]

• p(Y|θ) is the i.i.d. binary(θ) model:

  p(Y|θ) = θ^{∑ yi,j} (1 − θ)^{∑ (1 − yi,j)}

Rejecting a statistical model

Is there a procedure that can evaluate all SRG distributions at once?

We will discuss two approaches:

• ad-hoc approach: intuitive but not exactly correct in terms of type I error.

• principled approach: intuitive and correct, but less generalizable.

Ad-hoc approach: the best case scenario

Idea:

• We reject the binary(θ) distribution if samples from it don’t look like Y;

• We would reject all binary(θ) distributions if none of them look like Y;

• Instead of comparing Y to each θ, just compare Y to the “most similar” θ.

Which value of θ ∈ [0, 1] makes binary(θ) most similar to Y?

Maximum likelihood estimation: Suppose Y ∼ p(Y|θ), with θ ∈ Θ unknown.

The maximum likelihood estimator (MLE) of θ is the value θ̂ that maximizes the probability of the observed data:

  p(Y|θ̂) ≥ p(Y|θ) for all θ ∈ Θ.

MLE for SRG

Let's find the MLE for the SRG:

  p(Y|θ) = θ^{∑ yi,j} (1 − θ)^{∑ (1 − yi,j)}
         = θ^{m ȳ} (1 − θ)^{m(1 − ȳ)}

where

• m = n(n − 1) = the number of pairs;

• ȳ = ∑ yi,j / m = the density.

Now log x is an increasing function. Therefore, the maximizer of p(Y|θ) is the maximizer of log p(Y|θ):

  log p(Y|θ) = m ȳ log θ + m(1 − ȳ) log(1 − θ)
             = m [ȳ log θ + (1 − ȳ) log(1 − θ)]

MLE for SRG

[Figure: the SRG log-likelihood as a function of θ, over the full range θ ∈ [0, 1] (left) and zoomed in to θ ∈ [0, 0.10] (right).]

MLE for SRG

Recall, the maximum occurs where the derivative (slope) is zero:

  d/dθ log p(Y|θ) = m [ ȳ/θ − (1 − ȳ)/(1 − θ) ]

Setting the derivative equal to zero and solving:

  ȳ/θ = (1 − ȳ)/(1 − θ)

  ȳ/(1 − ȳ) = θ/(1 − θ),

which is satisfied by θ̂ = ȳ.

Convince yourself that this makes intuitive sense.

Result: By the maximum likelihood criterion, the member of P = {binomial(θ) : θ ∈ [0, 1]} that is closest to Y is the binomial(ȳ) distribution.
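This can be checked numerically (a sketch; it assumes Y is the observed adjacency matrix with an NA diagonal, as in the earlier code blocks):

# SRG log-likelihood, in terms of m and the density ybar
srg.loglik <- function(theta, Y) {
  m <- sum(!is.na(Y))            # number of pairs, n(n - 1)
  ybar <- mean(Y, na.rm = TRUE)  # density
  m * (ybar * log(theta) + (1 - ybar) * log(1 - theta))
}

# the numerical maximizer agrees with the density:
optimize(srg.loglik, c(0, 1), Y = Y, maximum = TRUE)$maximum  # ~ mean(Y, na.rm = TRUE)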

Testing with the best case scenario

Intuition: If we reject Y ∼ binomial(θ̂), we should reject Y ∼ binomial(θ) for all θ ∈ Θ.

Model evaluation procedure: Given a test statistic t(Y),

1. compute θ̂ from Y;

2. simulate Y1, . . . , YS from p(Y|θ̂);

3. compare t(Y) to t(Y1), . . . , t(YS).

Let’s do this for our centralization statistics:

• todc = Cod(Y) (outdegree centralization)

• tidc = Cid(Y) (indegree centralization)

Rejecting the best SRG

theta <- mean(Y, na.rm = TRUE)  # the MLE: theta-hat is the observed density

t.H <- NULL
for (s in 1:S) {
  Ysim <- matrix(rbinom(nrow(Y)^2, 1, theta), nrow(Y), nrow(Y))
  diag(Ysim) <- NA
  t.H <- rbind(t.H, c(Cd(Ysim), Cd(t(Ysim))))
}

t.o <- c(Cd(Y), Cd(t(Y)))

Rejecting the SRG based on centralization

[Figure: null distributions of outdegree centralization (left) and indegree centralization (right) under the fitted binary(θ̂) model; y-axis: Density.]

pval.o <- mean(t.H[, 1] >= t.o[1])
pval.i <- mean(t.H[, 2] >= t.o[2])

pval.o

## [1] 4e-04

pval.i

## [1] 0

Ad-hockery

The null distributions and p-values here are not exactly proper.

Suppose Y ∼ binary(θ) for some true but unknown θ.

• θ is only approximated by θ̂;

• the distribution of t under binary(θ̂) is not the same as under binary(θ);

• note this latter distribution is "random", as θ̂ is random.

In principle: For some models, we can develop a testing procedure with a correct null distribution.

In practice: It will turn out that our ad-hoc approach often gives the same answers as our exact approach.

Conditional tests
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Evaluating statistical models

H : {yi,j , i ≠ j} ∼ binary(θ), for some θ ∈ [0, 1]

We would like to evaluate this model, but we don't know precisely what to expect from it, as we don't know the correct value of θ if H were true.

Problem: The null distribution of Y depends on the unknown θ.

A solution:

• Perhaps some aspect of the null distribution doesn’t depend on θ

• If so, then it could be used to evaluate the null model (for all θ ∈ [0, 1]).

The tool we need to develop this idea further is conditional probability.


Conditional distributions

Let y1, y2, y3 ∼ i.i.d. binary(θ).

Suppose we are told that y1 + y2 + y3 = 2.

• What is the probability that y1 = 1?

• What is the probability that y1 = y2 = 1 ?


Intuitive calculation

Consider all possible outcomes, before we are told the sum:

y1:  0  1  0  1  0  1  0  1
y2:  0  0  1  1  0  0  1  1
y3:  0  0  0  0  1  1  1  1


Intuitive calculation

Compute the probabilities of each:

y1:   0        1         0         1         0         1         0         1
y2:   0        0         1         1         0         0         1         1
y3:   0        0         0         0         1         1         1         1
Pr:   (1−θ)³   θ(1−θ)²   θ(1−θ)²   θ²(1−θ)   θ(1−θ)²   θ²(1−θ)   θ²(1−θ)   θ³


Intuitive calculation

Now we are told that y1 + y2 + y3 = 2:

y1:   0        1         0         1         0         1         0         1
y2:   0        0         1         1         0         0         1         1
y3:   0        0         0         0         1         1         1         1
Pr:   (1−θ)³   θ(1−θ)²   θ(1−θ)²   θ²(1−θ)   θ(1−θ)²   θ²(1−θ)   θ²(1−θ)   θ³

So it seems that having been told y1 + y2 + y3 = 2,

• (1, 1, 0), (1, 0, 1), (0, 1, 1) are the only possibilities;

• each of these was equally probable to begin with;

• they should be equally probable given the information.


Intuitive calculation

Hence,

Pr((y1, y2, y3) = (1, 1, 0) | y1 + y2 + y3 = 2)
= Pr((y1, y2, y3) = (1, 0, 1) | y1 + y2 + y3 = 2)
= Pr((y1, y2, y3) = (0, 1, 1) | y1 + y2 + y3 = 2)
= 1/3

Let’s answer our conditional probability questions:

Suppose we are told that y1 + y2 + y3 = 2.

• What is the probability that y1 = 1? Answer: 2/3.

• What is the probability that y1 = y2 = 1? Answer: 1/3.


Conditional probability

Let A and B be two uncertain events.

Pr(B|A) = Pr(A and B) / Pr(A)

Example: Consider days with non-rainy mornings:

• B = rainy in the evening;

• A = cloudy in the morning.

      B    Bc
A    .4   .2
Ac   .1   .3

Pr(B) = Pr(B ∩ A) + Pr(B ∩ Ac) = .4 + .1 = .5

Pr(A) = Pr(B ∩ A) + Pr(Bc ∩ A) = .4 + .2 = .6

Pr(B|A) = Pr(B ∩ A)/Pr(A) = .4/.6 = 2/3


Conditional probability

Let A and B be two uncertain events.

Pr(B|A) = Pr(A and B) / Pr(A)

Example: A card deck is shuffled and a single card is dealt.

• B = the card is the 3 of hearts.

• A = the card is red.

Pr(B|A) = Pr(A and B) / Pr(A)
        = Pr(the card is the 3 of hearts) / Pr(the card is red)
        = (1/52) / (1/2)
        = 1/26


Conditional probability

Example: Let y1, y2, y3 ∼ i.i.d. binary(θ).

• B = {(y1, y2, y3) = (0, 1, 1)}

• A = {y1 + y2 + y3 = 2}

Pr(B) = (1− θ)× θ × θ = θ2(1− θ)

Pr(A) = Pr((0, 1, 1)) + Pr((1, 0, 1)) + Pr((1, 1, 0)) = 3θ2(1− θ)

Pr(B|A) = θ²(1− θ) / [3θ²(1− θ)] = 1/3.

Note that this probability doesn’t depend on the value of θ.


Conditional probability distributions

A conditional probability distribution is an assignment of conditional probabilities to a partition of the outcome space.

Let B1, . . . ,BK be a partition, so that

• Pr(Bj and Bk) = 0 for j ≠ k;

• Pr(B1 or · · · or BK ) = Pr(B1) + · · · + Pr(BK ) = 1.

A conditional probability distribution over B1, . . . ,BK given A is simply the collection {Pr(Bk |A), k = 1, . . . ,K}.


Conditional probability distributions

Example: Let y1, y2, y3 ∼ i.i.d. binary(θ).

Conditional on A = {y1 + y2 + y3 = 2}, you should now be able to show that

• Pr((y1, y2, y3) = B|A) = 1/3 if B is either (0,1,1), (1,0,1) or (1,1,0) .

• Pr((y1, y2, y3) = B|A) = 0 otherwise

We say the distribution of y = (y1, y2, y3) given ∑yi = 2 is uniform, as it assigns equal probabilities to all possible events under the condition.


Conditioning binary sequences

y1, . . . , ym ∼ i.i.d. binary(θ)

What is the conditional distribution of y = (y1, . . . , ym) given s = ∑yi ?

Let y = (y1, . . . , ym) be a binary sequence.

Pr(y = y|s) = Pr(y = y and ∑yi = s) / Pr(∑yi = s)

First note that this must equal zero if ∑yi ≠ s.


Conditioning binary sequences

y1, . . . , ym ∼ i.i.d. binary(θ)

Let y = (y1, . . . , ym) be a binary sequence with ∑yi = s.

Pr(y = y|s) = Pr(y = y and ∑yi = s) / Pr(∑yi = s)
           = Pr(y = y) / Pr(∑yi = s)
           = θ^s(1− θ)^(m−s) / [ (m choose s) θ^s(1− θ)^(m−s) ]
           = 1 / (m choose s).

Recall, (m choose s) is the number of sequences that have s ones.

• the probability doesn't depend on θ;

• the probability is the same for all sequences y such that ∑yi = s.

The distribution of y|s is called a conditionally uniform distribution - each possible sequence gets equal probability.
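As a quick numerical check (a minimal sketch; the values of m, s, and θ are illustrative), simulating i.i.d. binary sequences and keeping only those with the required sum gives approximately equal frequencies for every retained sequence:

# Condition on the sum by rejection: each sequence with s ones
# should appear with relative frequency about 1/choose(m, s)
set.seed(1)
m <- 3; s <- 2; theta <- 0.7
sims <- replicate(1e+05, rbinom(m, 1, theta))
keep <- sims[, colSums(sims) == s]
table(apply(keep, 2, paste, collapse = ""))/ncol(keep)  # each about 1/3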


Simulating the conditional uniform distribution

A simulation of y|s can be generated as follows:

0. Put s ones into a bucket, along with m − s zeros.

1. Randomly select a number from the bucket, assign it to y1, and throw it away.

2. Randomly select a number from the bucket, assign it to y2, and throw it away.

...

m. Select the last number from the bucket and assign it to ym.

This is called (uniform) sampling without replacement.

Under this scheme, we will always have ∑yi = s.
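In R, one draw from this scheme is just a random permutation of the bucket's contents (a minimal sketch; m and s are illustrative values):

# Sampling without replacement from a bucket of s ones and m - s zeros
m <- 10; s <- 4
y <- sample(c(rep(1, s), rep(0, m - s)))
sum(y)  # always equals s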


ER graphs

Let’s return to our SRG model for a sociomatrix:

H : Y = {yi,j , i ≠ j} ∼ binary(θ), for some θ ∈ [0, 1]

Under this model, (y1,2, . . . , yn−1,n) forms an i.i.d. binary sequence.

• The conditional distribution of this sequence given ∑yi,j = s is simply the conditional uniform distribution.

• Knowing ∑yi,j is the same as knowing ȳ.

• The conditional distribution of Y given s is sometimes called the Erdős–Rényi graph, ERG(n, s).

• Conditioning on s is the same as conditioning on ȳ.


Simulating from Y|s

rY.s <- function(n, s) {
    Y <- matrix(0, n, n)
    diag(Y) <- NA
    Y[!is.na(Y)] <- sample(c(rep(1, s), rep(0, n * (n - 1) - s)))
    Y
}

rY.s(5, 3)

## [,1] [,2] [,3] [,4] [,5]
## [1,] NA 0 0 0 1
## [2,] 0 NA 0 0 0
## [3,] 0 0 NA 0 1
## [4,] 0 0 0 NA 0
## [5,] 1 0 0 0 NA

rY.s(5, 10)

## [,1] [,2] [,3] [,4] [,5]
## [1,] NA 1 1 1 0
## [2,] 1 NA 0 0 0
## [3,] 0 1 NA 1 1
## [4,] 1 1 0 NA 0
## [5,] 0 0 0 1 NA


Conditional tests

Suppose Y ∼ SRG(n, θ) for some θ ∈ [0, 1]. Then

Y should "look like" another sample from SRG(n, θ)

• (but we can’t generate these).

Y should also "look like" another sample from SRG(n, s), where s = ∑yi,j

• (we can generate these).

Conditional evaluation of the SRG: Given a test statistic t, compare

tobs = t(Y)   to   t̃ = t(Ỹ), where Ỹ ∼ SRG(n, ∑yi,j).


Example: Monk friendships

mean(Y, na.rm = TRUE)

## [1] 0.2876

Cd(Y)

## [1] 0.07353

Cd(t(Y))

## [1] 0.4044


nrow(Y)

## [1] 18

sum(Y, na.rm = TRUE)

## [1] 88


Simulated networks

Ysim <- rY.s(nrow(Y), sum(Y, na.rm = TRUE))

[Figure: plot of the simulated network Ysim]

Example: Monk friendships

CD.H <- NULL
for (s in 1:S) {
    Ysim <- rY.s(nrow(Y), sum(Y, na.rm = TRUE))
    CD.H <- rbind(CD.H, c(Cd(Ysim), Cd(t(Ysim))))
}

[Figure: null distributions (densities) of outdegree centralization and indegree centralization under SRG(n, s)]

Example: Monk friendships

mean(CD.H[, 1] <= Cd(Y))

## [1] 0.0044

mean(CD.H[, 2] >= Cd(t(Y)))

## [1] 0.029

These can be interpreted as p-values, but a better thing to say is

• observed outdegree centralization was below the lower 1st percentile of the null distribution;

• observed indegree centralization was above the upper 3rd percentile of the null distribution.

The interpretation is that both statistics show evidence against H.


Formal conditional testing

For any test statistic t, consider the following procedure:

1. observe Y

2. compute p = Pr(t(Ỹ) > t(Y) | H, ∑ỹi,j = ∑yi,j)

3. reject H if p < α.

If Y ∼ SRG(n, θ) for some θ ∈ [0, 1] (i.e., H is true), then

Pr(reject H) = α.
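In practice the conditional probability in step 2 is approximated by simulation. A minimal sketch, assuming the rY.s simulator defined earlier and a generic test statistic tstat (e.g. Cd):

# Monte Carlo approximation to the conditional, one-sided p-value
pval.cond <- function(Y, tstat, S = 1000) {
    tobs <- tstat(Y)
    tsim <- replicate(S, tstat(rY.s(nrow(Y), sum(Y, na.rm = TRUE))))
    mean(tsim >= tobs)
}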


Comparison to ad-hoc approach

CD.H <- NULL
for (s in 1:S) {
    # Ysim <- rY.s(nrow(Y), sum(Y, na.rm = TRUE))
    Ysim <- matrix(rbinom(nrow(Y)^2, 1, mean(Y, na.rm = TRUE)), nrow(Y), nrow(Y))
    CD.H <- rbind(CD.H, c(Cd(Ysim), Cd(t(Ysim))))
}
mean(CD.H[, 1] <= Cd(Y))

## [1] 4e-04

mean(CD.H[, 2] >= Cd(t(Y)))

## [1] 0.0174

[Figure: ad-hoc null distributions of outdegree centralization and indegree centralization]

Example: High school friendships

CD.H <- NULL
for (s in 1:S) {
    Ysim <- rY.s(nrow(Y), sum(Y, na.rm = TRUE))
    CD.H <- rbind(CD.H, c(Cd(Ysim), Cd(t(Ysim))))
}

[Figure: conditional null distributions of outdegree centralization and indegree centralization]

Example: High school friendships

CD.H <- NULL
for (s in 1:S) {
    # Ysim <- rY.s(nrow(Y), sum(Y, na.rm = TRUE))
    Ysim <- matrix(rbinom(nrow(Y)^2, 1, mean(Y, na.rm = TRUE)), nrow(Y), nrow(Y))
    CD.H <- rbind(CD.H, c(Cd(Ysim), Cd(t(Ysim))))
}

[Figure: ad-hoc null distributions of outdegree centralization and indegree centralization]

Example: High school friendships

Sd <- function(Y) {
    sd(apply(Y, 1, sum, na.rm = TRUE))
}
SD.H <- NULL
for (s in 1:S) {
    Ysim <- rY.s(nrow(Y), sum(Y, na.rm = TRUE))
    SD.H <- rbind(SD.H, c(Sd(Ysim), Sd(t(Ysim))))
}

[Figure: conditional null distributions of outdegree sd and indegree sd]

Example: High school friendships

SD.H <- NULL
for (s in 1:S) {
    # Ysim <- rY.s(nrow(Y), sum(Y, na.rm = TRUE))
    Ysim <- matrix(rbinom(nrow(Y)^2, 1, mean(Y, na.rm = TRUE)), nrow(Y), nrow(Y))
    SD.H <- rbind(SD.H, c(Sd(Ysim), Sd(t(Ysim))))
}

[Figure: ad-hoc null distributions of outdegree sd and indegree sd]

SRG (n, θ) for undirected graphs

n <- 10
theta <- 0.1
Y <- matrix(0, n, n)
Y[upper.tri(Y)] <- rbinom(n * (n - 1)/2, 1, theta)
Y <- Y + t(Y)
Y

## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0 1 0 0 0 0 0 0 0 0
## [2,] 1 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 1 0
## [4,] 0 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0 1
## [9,] 0 0 1 0 0 0 0 0 0 0
## [10,] 0 0 0 0 0 0 0 1 0 0

sum(Y)

## [1] 6


SRG (n, s) for undirected graphs

n <- 10
s <- 10
Y <- matrix(0, n, n)
Y[upper.tri(Y)] <- sample(c(rep(1, s), rep(0, n * (n - 1)/2 - s)))
Y <- Y + t(Y)
Y

## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0 1 0 0 0 0 0 0 0 0
## [2,] 1 0 1 0 1 1 0 0 0 0
## [3,] 0 1 0 0 0 1 0 0 0 0
## [4,] 0 0 0 0 0 1 0 0 0 0
## [5,] 0 1 0 0 0 1 1 0 0 0
## [6,] 0 1 1 1 1 0 0 0 0 0
## [7,] 0 0 0 0 1 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 1 0
## [9,] 0 0 0 0 0 0 0 1 0 1
## [10,] 0 0 0 0 0 0 0 0 1 0

sum(Y)

## [1] 20


Row and column effects

567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Monk data

[Figure: monk friendship network, with histograms of the outdegrees and indegrees]

sd(rsum(Y))

## [1] 0.9003

sd(csum(Y))

## [1] 2.698


Testing the SRG

rSRGfix <- function(n, s) {
    Y <- matrix(0, n, n)
    diag(Y) <- NA
    Y[!is.na(Y)] <- sample(c(rep(1, s), rep(0, n * (n - 1) - s)))
    Y
}

n <- nrow(Y)
s <- sum(Y, na.rm = TRUE)

sdo <- sd(rsum(Y))
sdi <- sd(csum(Y))

DSD <- NULL
for (sim in 1:S) {
    Ysim <- rSRGfix(n, s)
    DSD <- rbind(DSD, c(sd(rsum(Ysim)), sd(csum(Ysim))))
}


Testing the SRG

[Figure: null distributions of outdegree sd and indegree sd under the SRG]

Testing the SRG

Compared to any SRG,

• the nodes show little heterogeneity in outdegree;

• the nodes show large heterogeneity in indegree.

No SRG model provides a good representation of Y in terms of its degrees.

What kind of model would provide a better representation of the data?


ANOVA for factorial data

Consider the data matrix for a generic two-factor layout:

        1        2       · · ·   K2
1       y1,1     y1,2    · · ·   y1,K2
2       y2,1     y2,2    · · ·   y2,K2
...     ...      ...             ...
K1      yK1,1    yK1,2   · · ·   yK1,K2

The standard “ANOVA” row and column effects (RCE) model for such data is

yi,j = µ+ ai + bj + εi,j

• µ represents an overall mean;

• ai represents the average deviation of row i from µ;

• bj represents the average deviation of column j from µ.

More abstractly,

• ai is the “effect” of factor one having level i ;

• bj is the “effect” of factor two being level j .


ANOVA for factorial designs

ML and OLS estimates are given by

• µ̂ = ȳ·· ;

• âi = ȳi· − ȳ·· ;

• b̂j = ȳ·j − ȳ·· .

Note that

mean(â1, . . . , âK1) = (1/K1) ∑ (ȳi· − ȳ··)
                      = (1/K1) ∑ ȳi· − ȳ··
                      = (1/(K1K2)) ∑∑ yi,j − ȳ··
                      = ȳ·· − ȳ·· = 0.

Also,

var(â1, . . . , âK1) = var(ȳ1·, . . . , ȳK1·).
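A quick numerical check of these identities on simulated data (a sketch; the 4 × 5 layout is illustrative):

# Row-effect estimates average to zero, and their variance equals
# the variance of the row means
set.seed(1)
ymat <- matrix(rnorm(4 * 5), 4, 5)
a.hat <- rowMeans(ymat) - mean(ymat)
mean(a.hat)                        # 0, up to rounding error
var(a.hat) - var(rowMeans(ymat))   # 0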


RCE model interpretation

For the RCE model

• the model is parameterized in terms of a mean and row and columnheterogeneity;

• heterogeneity in the row sums is represented by heterogeneity in the parameter estimates;

• tests for row and column heterogeneity can be made via F -tests.

Back to binary directed network data:

• the data take the form of a two-way layout;

• we’ve seen empirically the existence of row and column heterogeneity.

Is there an analogous RCE model for network data?


Odds and probabilities

Let Pr(Yi,j = 1) = µ. Then

odds(Yi,j = 1) = µ/(1− µ)

log odds(Yi,j = 1) = log[µ/(1− µ)] = θ

Now write the probability in terms of the log odds θ:

log odds(Yi,j = 1) = θ
odds(Yi,j = 1) = e^θ
µ/(1− µ) = e^θ
µ = e^θ − µe^θ
µ(1 + e^θ) = e^θ
µ = Pr(Yi,j = 1) = e^θ / (1 + e^θ)
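These two maps are available in R as qlogis (probability to log odds) and plogis (log odds to probability); a quick check with an arbitrary θ:

theta <- 0.8
mu <- exp(theta)/(1 + exp(theta))   # same as plogis(theta)
log(mu/(1 - mu))                    # same as qlogis(mu); recovers theta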


Logistic regression

Let {Yi,j : i ≠ j} be independent binary random variables with

Pr(Yi,j = 1) = µi,j .

• If µi,j = µ for all i , j , then we have the SRG(n, µ) model.

• Our hypothesis tests generally reject this model.

Thus the hypothesis of homogeneous µi,j ’s is rejected.

What heterogeneity in the µi,j 's will help us capture the observed data patterns?


Binary RCE model

RCE model: {Yi,j : i ≠ j} are independent, with

Pr(Yi,j = 1) = e^{θi,j} / (1 + e^{θi,j}), where θi,j = µ + ai + bj ,

or more compactly,

Pr(Yi,j = 1) = e^{µ+ai+bj} / (1 + e^{µ+ai+bj})

Interpretation:

odds(Yi,j = 1) = e^{µ+ai+bj}

odds ratio(Yi,j = 1 : Yk,j = 1) = e^{µ+ai+bj} / e^{µ+ak+bj}

log odds ratio(Yi,j = 1 : Yk,j = 1) = [µ + ai + bj] − [µ + ak + bj] = ai − ak


Binary RCE model

Pr(Yi,j = 1) = e^{µ+ai+bj} / (1 + e^{µ+ai+bj})

The differences among the ai's represent heterogeneity among nodes in terms of the probability of sending a tie, and similarly for the bj's.

• The ai's can represent heterogeneity in nodal outdegree;

• The bj's can represent heterogeneity in nodal indegree.

Caution: The row and column effects a and b are only identified up to differences. Suppose c + d + e = 0:

Pr(Yi,j = 1) = e^{µ+ai+bj} / (1 + e^{µ+ai+bj})
            = e^{µ+ai+bj+c+d+e} / (1 + e^{µ+ai+bj+c+d+e})
            = e^{(µ+c)+(ai+d)+(bj+e)} / (1 + e^{(µ+c)+(ai+d)+(bj+e)}),

so the parameters (µ, ai, bj) and (µ + c, ai + d, bj + e) give exactly the same probabilities.

It is standard to restrict the estimates somehow to reflect this:

• Sum-to-zero side conditions: ∑ai = 0, ∑bj = 0;

• Set-to-zero side conditions: a1 = b1 = 0.


Fitting the binary RCE in R

To make an analogy to the two-way factorial design,

• sender index = row factor;

• receiver index = column factor.

To fit the RCE model with logistic regression in R, we need to

• code the senders as a row factor;

• code the receivers as a column factor;

• convert everything to vectors.


Fitting the binary RCE in R

Ridx <- matrix((1:nrow(Y)), nrow(Y), nrow(Y))
Cidx <- t(Ridx)

Ridx[1:4, 1:4]

## [,1] [,2] [,3] [,4]
## [1,] 1 1 1 1
## [2,] 2 2 2 2
## [3,] 3 3 3 3
## [4,] 4 4 4 4

Cidx[1:4, 1:4]

## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 1 2 3 4
## [3,] 1 2 3 4
## [4,] 1 2 3 4

Y[1:4, 1:4]

## ROMUL BONAVEN AMBROSE BERTH
## ROMUL NA 1 1 0
## BONAVEN 1 NA 0 0
## AMBROSE 1 1 NA 0
## BERTH 0 0 0 NA


Fitting the binary RCE in R

y <- c(Y)
ridx <- c(Ridx)
cidx <- c(Cidx)

y[1:22]

## [1] NA 1 1 0 1 1 1 1 0 0 0 1 0 1 1 1 0 1 1 NA 1 0

ridx[1:22]

## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 1 2 3 4

cidx[1:22]

## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2


Fitting the binary RCE in R

fit <- glm(y ~ factor(ridx) + factor(cidx), family = binomial)
summary(fit)

##
## Call:
## glm(formula = y ~ factor(ridx) + factor(cidx), family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.597 -0.804 -0.573 0.909 2.215
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.115 0.756 1.48 0.1400
## factor(ridx)2 -0.318 0.778 -0.41 0.6832
## factor(ridx)3 -0.455 0.782 -0.58 0.5612
## factor(ridx)4 -0.763 0.814 -0.94 0.3482
## factor(ridx)5 -0.356 0.782 -0.45 0.6492
## factor(ridx)6 -0.816 0.810 -1.01 0.3142
## factor(ridx)7 -0.395 0.784 -0.50 0.6139
## factor(ridx)8 -0.104 0.765 -0.14 0.8923
## factor(ridx)9 -0.782 0.813 -0.96 0.3362
## factor(ridx)10 -0.167 0.762 -0.22 0.8269
## factor(ridx)11 -0.455 0.782 -0.58 0.5612
## factor(ridx)12 -0.723 0.813 -0.89 0.3739
## factor(ridx)13 -1.172 0.863 -1.36 0.1743
## factor(ridx)14 -0.395 0.784 -0.50 0.6139
## factor(ridx)15 -0.145 0.764 -0.19 0.8492
## factor(ridx)16 -0.455 0.782 -0.58 0.5612
## factor(ridx)17 -0.816 0.810 -1.01 0.3142
## factor(ridx)18 -0.145 0.764 -0.19 0.8492
## factor(cidx)2 -0.272 0.716 -0.38 0.7041
## factor(cidx)3 -2.216 0.823 -2.69 0.0071 **
## factor(cidx)4 -1.558 0.745 -2.09 0.0365 *
## factor(cidx)5 -0.760 0.712 -1.07 0.2856
## factor(cidx)6 -2.714 0.916 -2.96 0.0030 **
## factor(cidx)7 -1.262 0.727 -1.74 0.0826 .
## factor(cidx)8 -1.518 0.744 -2.04 0.0414 *
## factor(cidx)9 -1.867 0.774 -2.41 0.0158 *
## factor(cidx)10 -2.677 0.916 -2.92 0.0035 **
## factor(cidx)11 -2.216 0.823 -2.69 0.0071 **
## factor(cidx)12 -1.026 0.717 -1.43 0.1522
## factor(cidx)13 -1.576 0.744 -2.12 0.0342 *
## factor(cidx)14 -1.262 0.727 -1.74 0.0826 .
## factor(cidx)15 -2.196 0.822 -2.67 0.0076 **
## factor(cidx)16 -2.216 0.823 -2.69 0.0071 **
## factor(cidx)17 -2.714 0.916 -2.96 0.0030 **
## factor(cidx)18 -2.196 0.822 -2.67 0.0076 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 367.18 on 305 degrees of freedom
## Residual deviance: 327.43 on 271 degrees of freedom
## (18 observations deleted due to missingness)
## AIC: 397.4
##
## Number of Fisher Scoring iterations: 4


Fitting the binary RCE in R

fit <- glm(y ~ C(factor(ridx), sum) + C(factor(cidx), sum), family = binomial)
summary(fit)

##
## Call:
## glm(formula = y ~ C(factor(ridx), sum) + C(factor(cidx), sum),
##     family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.597 -0.804 -0.573 0.909 2.215
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0352 0.1427 -7.25 4e-13 ***
## C(factor(ridx), sum)1 0.4700 0.5241 0.90 0.36983
## C(factor(ridx), sum)2 0.1525 0.5495 0.28 0.78137
## C(factor(ridx), sum)3 0.0155 0.5538 0.03 0.97772
## C(factor(ridx), sum)4 -0.2933 0.5918 -0.50 0.62020
## C(factor(ridx), sum)5 0.1143 0.5542 0.21 0.83659
## C(factor(ridx), sum)6 -0.3456 0.5875 -0.59 0.55641
## C(factor(ridx), sum)7 0.0746 0.5562 0.13 0.89336
## C(factor(ridx), sum)8 0.3664 0.5335 0.69 0.49225
## C(factor(ridx), sum)9 -0.3118 0.5908 -0.53 0.59767
## C(factor(ridx), sum)10 0.3032 0.5292 0.57 0.56663
## C(factor(ridx), sum)11 0.0155 0.5538 0.03 0.97772
## C(factor(ridx), sum)12 -0.2532 0.5919 -0.43 0.66885
## C(factor(ridx), sum)13 -0.7020 0.6499 -1.08 0.28006
## C(factor(ridx), sum)14 0.0746 0.5562 0.13 0.89336
## C(factor(ridx), sum)15 0.3247 0.5315 0.61 0.54125
## C(factor(ridx), sum)16 0.0155 0.5538 0.03 0.97772
## C(factor(ridx), sum)17 -0.3456 0.5875 -0.59 0.55641
## C(factor(cidx), sum)1 1.6805 0.5058 3.32 0.00089 ***
## C(factor(cidx), sum)2 1.4084 0.4928 2.86 0.00427 **
## C(factor(cidx), sum)3 -0.5358 0.6210 -0.86 0.38825
## C(factor(cidx), sum)4 0.1227 0.5266 0.23 0.81572
## C(factor(cidx), sum)5 0.9203 0.4862 1.89 0.05837 .
## C(factor(cidx), sum)6 -1.0337 0.7273 -1.42 0.15520
## C(factor(cidx), sum)7 0.4183 0.5050 0.83 0.40754
## C(factor(cidx), sum)8 0.1623 0.5269 0.31 0.75801
## C(factor(cidx), sum)9 -0.1864 0.5621 -0.33 0.74021
## C(factor(cidx), sum)10 -0.9966 0.7277 -1.37 0.17083
## C(factor(cidx), sum)11 -0.5358 0.6210 -0.86 0.38825
## C(factor(cidx), sum)12 0.6540 0.4916 1.33 0.18338
## C(factor(cidx), sum)13 0.1043 0.5254 0.20 0.84271
## C(factor(cidx), sum)14 0.4183 0.5050 0.83 0.40754
## C(factor(cidx), sum)15 -0.5156 0.6210 -0.83 0.40636
## C(factor(cidx), sum)16 -0.5358 0.6210 -0.86 0.38825
## C(factor(cidx), sum)17 -1.0337 0.7273 -1.42 0.15520
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 367.18 on 305 degrees of freedom
## Residual deviance: 327.43 on 271 degrees of freedom
## (18 observations deleted due to missingness)
## AIC: 397.4
##
## Number of Fisher Scoring iterations: 4


Fitting the binary RCE in R

mu.hat <- fit$coef[1]
a.hat <- fit$coef[1 + 1:(nrow(Y) - 1)]
a.hat <- c(a.hat, -sum(a.hat))
b.hat <- fit$coef[nrow(Y) + 1:(nrow(Y) - 1)]
b.hat <- c(b.hat, -sum(b.hat))


• Why do nodes with the same outdegrees have different ai ’s?

• Why do nodes with the same indegrees have different bj ’s?


Understanding the MLEs

theta.mle <- mu.hat + outer(a.hat, b.hat, "+")
p.mle <- exp(theta.mle)/(1 + exp(theta.mle))
diag(p.mle) <- NA



Understanding MLEs

Pr(Y|µ, a, b) = ∏_{i≠j} e^{(µ+ai+bj)yi,j} / (1 + e^{µ+ai+bj})

             = exp(µy·· + ∑ ai yi· + ∑ bj y·j) × ∏_{i≠j} (1 + e^{µ+ai+bj})^{−1}

Let's find the maximizer of the log-likelihood in ai:

log Pr(Y|µ, a, b) = µy·· + ∑ ai yi· + ∑ bj y·j − ∑_{i≠j} log(1 + e^{µ+ai+bj})

∂/∂ai log Pr(Y|µ, a, b) = yi· − ∑_{j:j≠i} e^{µ+ai+bj} / (1 + e^{µ+ai+bj})

The MLE occurs at values (µ̂, â, b̂) for which

yi· = ∑_{j:j≠i} e^{µ̂+âi+b̂j} / (1 + e^{µ̂+âi+b̂j}), i.e., yi· = p̂i·
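This gives a direct numerical check of the fit: the expected outdegrees under the MLE reproduce the observed outdegrees exactly. A sketch, assuming p.mle and the rsum helper from the earlier slides:

# Likelihood equations: fitted expected outdegrees equal observed outdegrees
cbind(observed = rsum(Y), fitted = apply(p.mle, 1, sum, na.rm = TRUE))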


Model evaluation: Best case scenario

How does the “best” RCE model compare to the data?

log odds(Yi,j = 1) = θi,j

θi,j = µ + ai + bj

θ̂i,j = µ̂ + âi + b̂j

theta.mle <- mu.hat + outer(a.hat, b.hat, "+")
p.mle <- exp(theta.mle)/(1 + exp(theta.mle))

t.M <- NULL
for (s in 1:S) {
    Ysim <- matrix(rbinom(n^2, 1, p.mle), n, n)
    diag(Ysim) <- NA
    t.M <- rbind(t.M, c(mmean(Ysim), sd(rsum(Ysim)), sd(csum(Ysim))))
}


Model evaluation: Best case scenario

[Figure: distributions of density, outdegree sd, and indegree sd under the fitted RCE model]

• improvement in terms of indegree heterogeneity;

• still a discrepancy in terms of outdegree heterogeneity (why?)


Comparison to the i.i.d. model

[Figure: distributions of the three test statistics under the fitted RCE model (top row) and the i.i.d. SRG model (bottom row)]

Model comparison

Often it is desirable to have a numerical comparison of two models.

SRG : Simple model, shows lack of fit.

RCE : Complex model, shows less lack of fit.

Is the RCE model better? Is the added complexity “worth it”?

Selecting between two false models requires a balance between

model fit and model complexity

• Overly simple model: high explanatory power, poor fit to data, poor generalizability;

• Overly complex model: low explanatory power, great fit to data, poor generalizability;

• Sufficiently complex model: good explanatory power, good fit to data, and good generalizability.

These notions can be made precise in various contexts:

• communication and information theory;

• out-of-sample predictive performance.


Model comparison

model performance = model fit - model complexity

Various numerical criteria have been suggested for fit and complexity. One proposal is

• fit = maximized log likelihood

• complexity = number of parameters

which leads to

ÃIC = log p(Y|θ̂) − p

For historical reasons, this is often multiplied by −2:

AIC = −2 log p(Y|θ̂) + 2p,

so that a low value of the AIC is better.
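For a fitted glm object the same quantity can be computed directly from the log-likelihood; the result matches what AIC() reports (fit as on the earlier slides):

# AIC "by hand": -2 * maximized log-likelihood + 2 * number of parameters
ll <- logLik(fit)
-2 * as.numeric(ll) + 2 * attr(ll, "df")
AIC(fit)  # same value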


Model comparison

fit.0 <- glm(y ~ 1, family = binomial)
fit.rc <- glm(y ~ C(factor(ridx), sum) + C(factor(cidx), sum), family = binomial)

AIC(fit.0)

## [1] 369.2

AIC(fit.rc)

## [1] 397.4

This isn't too surprising:

• fit.rc gives a better fit, but

• fit.rc has n + n − 2 = 34 more parameters than fit.0.

Can you think of a better model? Recall fit.rc only seemed to improve the indegree fit.

fit.r <- glm(y ~ C(factor(ridx), sum), family = binomial)
fit.c <- glm(y ~ C(factor(cidx), sum), family = binomial)

AIC(fit.r)

## [1] 399.1

AIC(fit.c)

## [1] 368.3


Model comparison

mu.hat <- fit.c$coef[1]
b.hat <- c(fit.c$coef[2:18], -sum(fit.c$coef[2:18]))
theta.mle <- mu.hat + outer(rep(0, n), b.hat, "+")
p.mle <- exp(theta.mle)/(1 + exp(theta.mle))

t.M <- NULL
for (s in 1:S) {
    Ysim <- matrix(rbinom(n^2, 1, p.mle), n, n)
    diag(Ysim) <- NA
    t.M <- rbind(t.M, c(mmean(Ysim), sd(rsum(Ysim)), sd(csum(Ysim))))
}

[Figure: distributions of the three test statistics under the fitted column effects model]

Model comparison

For these monk data,

• among these models, the column effects model is “best”

• all models show lack of fit in terms of outdegree heterogeneity.

The lack of fit is not surprising - we know an independence model is wrong:

• monks were limited to three nominations per time period.

Ideally, such design features of the data would be incorporated into the model. We will discuss such models towards the end of the quarter.


Conflict in the 90s

[Figure: conflict network, with histograms of the outdegrees and indegrees]

sd(rsum(Y))

## [1] 3.589

sd(csum(Y))

## [1] 1.984


Model selection

fit.0 <- glm(y ~ 1, family = binomial)
fit.r <- glm(y ~ C(factor(ridx), sum), family = binomial)
fit.c <- glm(y ~ C(factor(cidx), sum), family = binomial)
fit.rc <- glm(y ~ C(factor(ridx), sum) + C(factor(cidx), sum), family = binomial)

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

AIC(fit.0)

## [1] 2198

AIC(fit.r)

## [1] 1948

AIC(fit.c)

## [1] 2176

AIC(fit.rc)

## [1] 1897

For these data, the full RCE model is best among these four.


Best case comparison

[Figure: distributions of the three test statistics under the fitted RCE model, conflict data]

Row effects only

mu.hat <- fit.r$coef[1]
a.hat <- fit.r$coef[1 + 1:(nrow(Y) - 1)]
a.hat <- c(a.hat, -sum(a.hat))
theta.mle <- mu.hat + outer(a.hat, rep(0, nrow(Y)), "+")
p.mle <- exp(theta.mle)/(1 + exp(theta.mle))

[Figure: distributions of the three test statistics under the fitted row effects model]

Fit comparison

[Figure: side-by-side comparison of test-statistic distributions under the row effects model (top row) and the full RCE model (bottom row)]

Conditional testing

Recall the “best case scenario” evaluation is somewhat ad-hoc.

Compare to the conditional evaluation:

Suppose Y ∼ SRG(n, θ):

• {Y|y··} ≁ SRG(n, θ);

• {Y|y··} ∼ ERG(n, y··).

Similarly, suppose Y ∼ RCE(µ, a, b) :

• {Y|y··, {yi·}, {y·i}} ≁ RCE(µ, a, b);

• {Y|y··, {yi·}, {y·i}} ∼ ?


Testing reciprocity in RCE models

567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Conflict in the 90s

[Figure: histograms of the outdegrees and indegrees, conflict data]

sd(rsum(Y))

## [1] 3.589

sd(csum(Y))

## [1] 1.984


Model selection

fit.0 <- glm(y ~ 1, family = binomial)
fit.r <- glm(y ~ C(factor(ridx), sum), family = binomial)
fit.c <- glm(y ~ C(factor(cidx), sum), family = binomial)
fit.rc <- glm(y ~ C(factor(ridx), sum) + C(factor(cidx), sum), family = binomial)

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

AIC(fit.0)

## [1] 2198

AIC(fit.r)

## [1] 1948

AIC(fit.c)

## [1] 2176

AIC(fit.rc)

## [1] 1897

For these data, the full RCE model is best among these four.


Evaluating the RCE model

H : log odds(Yi,j = 1) = µ+ ai + bj , Yi,j ’s independent

Let’s evaluate H with the following test statistics: s(Y) = {s1(Y), s2(Y), s3(Y)}

• s1(Y) = sd(outdegree) ;

• s2(Y) = sd(indegree) ;

• s3(Y) = number of reciprocated dyads:

s3(Y) = ∑_{i<j} yi,j yj,i

If H is true, then

• Y should look like Ỹ ∼ RCE(µ, a, b) for some (µ, a, b), but

• we can't simulate from this distribution as we don't know (µ, a, b).

We will first use the ad-hoc “best-case” approach.


Best case comparison

mu.hat <- fit.rc$coef[1]
a.hat <- fit.rc$coef[1 + 1:(nrow(Y) - 1)]
a.hat <- c(a.hat, -sum(a.hat))
b.hat <- fit.rc$coef[nrow(Y) + 1:(nrow(Y) - 1)]
b.hat <- c(b.hat, -sum(b.hat))

theta.mle <- mu.hat + outer(a.hat, b.hat, "+")
p.mle <- exp(theta.mle)/(1 + exp(theta.mle))

s.H <- NULL
for (s in 1:S) {
    Ys <- matrix(rbinom(nrow(Y)^2, 1, p.mle), nrow(Y), nrow(Y))
    diag(Ys) <- NA
    s.H <- rbind(s.H, c(sd(rsum(Ys)), sd(csum(Ys)), sum(Ys * t(Ys)/2, na.rm = TRUE)))
}


Best case comparison

[Figure: best-case distributions of outdegree sd, indegree sd, and the number of reciprocated dyads under the fitted RCE model]

Conditional testing

Recall the “best case scenario” evaluation is somewhat ad-hoc.

Compare to the conditional evaluation:

Suppose Y ∼ SRG(n, θ):

• {Y|y··} ≁ SRG(n, θ);

• {Y|y··} ∼ ERG(n, y··).

Similarly, suppose Y ∼ RCE(µ, a, b) :

• {Y|y··, {yi·}, {y·i}} ≁ RCE(µ, a, b);

• {Y|y··, {yi·}, {y·i}} ∼ ?


Conditioning in exponential families

Return to the SRG:

Pr(Yi,j = 1|θ) = θ = e^µ / (1 + e^µ)

Pr(Yi,j = yi,j|θ) = e^{µ yi,j} / (1 + e^µ)

Pr(Y = y|θ) = e^{µ y··} g(µ)

This is a very simple exponential family model, or exponentially parameterized random graph model (ERGM).

More generally, an ERGM is of the form

Pr(Y = y|θ) = e^{t(y)·θ} g(θ),

where

• t(y) = (t1(y), . . . , tp(y)) is a vector of statistics;

• θ = (θ1, . . . , θp) is a vector of parameters;

• t(y) · θ = ∑ tj(y) θj .


RCE as ERGM

Can the RCE model be expressed as an ERGM?

Pr(Y|µ, a, b) = ∏_{i≠j} e^{(µ+ai+bj)yi,j} / (1 + e^{µ+ai+bj})

             = exp(µy·· + ∑ ai yi· + ∑ bj y·j) × ∏_{i≠j} (1 + e^{µ+ai+bj})^{−1}

             = exp(t(y) · θ) g(θ)

where

t(y) = (y··, y1·, . . . , yn·, y·1, . . . , y·n)

θ = (µ, a1, . . . , an, b1, . . . , bn)

So yes, the RCE model is an ERGM. The sufficient statistics that generate the model are the outdegrees and indegrees.

• the sum y·· can be computed from the degrees;

• the term "sufficient" means sufficient for inferring the parameters, assuming the model is correct.


Conditional tests for ERGMs

Suppose you want to evaluate the adequacy of an ERGM:

H : Pr(Y|θ) = e^{t(y)·θ} g(θ), for some θ ∈ Θ.

Consider evaluation of H based on the statistics s(y) (where s is not a function of t).

How can we reject or accept H based on s, without knowing θ?


Conditional tests for ERGMs

Recall our principle for testing:

If Y ∼ ERGM(t, θ) for some θ ∈ Θ, then

Y should “look like” another sample from ERGM(t, θ)

• (but we can’t generate these).

Y should "look like" samples Ỹ from ERGM(t, θ) for which t(Ỹ) = t(Y)

• (can we generate these?)


Conditional tests for ERGMs

Pr(Y = ỹ | θ, t(Y) = t(y)) = Pr(Y = ỹ ∩ t(Y) = t(y) | θ) / Pr(t(Y) = t(y) | θ)

                           = exp(t(ỹ)·θ) g(θ) 1(t(ỹ) = t(y)) / ∑_{y′} exp(t(y′)·θ) g(θ) 1(t(y′) = t(y))

                           = 1(t(ỹ) = t(y)) / ∑_{y′} 1(t(y′) = t(y))

This is the uniform distribution over graphs ỹ for which t(ỹ) = t(y).


Conditional tests for ERGMs

Conditional testing procedure

1. Compute sobs = s(Y);

2. For k ∈ {1, . . . ,K}:

  2.1 Simulate Ỹk uniformly from graphs with t(Ỹk) = t(Y);

  2.2 Compute sk = s(Ỹk).

3. Compare sobs to s1, . . . , sK .


Conditionally uniform distributions

t(Y) = {y··, y1·, . . . , yn·, y·1, . . . , y·n}

How can we simulate uniformly from the set of graphs with t(Y) = t(Yobs) ?

Rejection sampling: Given a current set of simulations {Y(1), . . . ,Y(s)},

1. Simulate Ỹ ∼ SRG(n, yobs·· );

2. If t(Ỹ) = t(Yobs), then set Y(s+1) = Ỹ. Otherwise, return to step 1.

As you can imagine, this algorithm is not practical for large n.
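A minimal sketch of this rejection sampler, assuming the rY.s simulator from earlier and taking the in- and outdegrees as the conditioning statistics (rsum and csum are the row- and column-sum helpers used in these slides):

# Rejection sampler: propose from SRG(n, s) and keep only proposals
# whose in- and outdegrees all match the observed ones (very slow for large n)
tY <- function(Y) c(rsum(Y), csum(Y))
rY.reject <- function(Yobs) {
    repeat {
        Ytil <- rY.s(nrow(Yobs), sum(Yobs, na.rm = TRUE))
        if (all(tY(Ytil) == tY(Yobs))) return(Ytil)
    }
}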


MCMC sampling

MCMC sampling: Given a current set of simulations {Y(1), . . . ,Y(s)},

1. Make a random perturbation Ỹ of Y(s) so that t(Ỹ) = t(Y(s)) = t(Yobs);

2. Set Y(s+1) = Ỹ.

This generates a random dependent sequence from Pr(Y|t(Y) = t(Yobs)).

• dependent, because Y(s+1) depends on Y(s).

• contrast this with sampling Y(1), . . . ,Y(s) i.i.d. from SRG(n, y··).


Random perturbations

Given Y, how can we construct a perturbation Y′ so t(Y) = t(Y′)?

NA  0  1  0  1  0  1 |  3
 1 NA  0  1  1  0  0 |  3
 0  1 NA  0  0  0  0 |  1
 0  1  1 NA  0  1  1 |  4
 1  0  0  0 NA  0  0 |  1
 0  0  1  1  1 NA  1 |  4
 1  0  1  0  0  0 NA |  2
----------------------
 3  2  4  2  3  1  3 | 18

Suppose we randomly switch a tie with a non-tie, within a row:


Random perturbations

Suppose we randomly switch a tie with a non-tie, within a row:

NA  0  1  0  1    0    1  |  3
 1 NA  0  1  1→0  0→1  0  |  3→3
 0  1 NA  0  0    0    0  |  1
 0  1  1 NA  0    1    1  |  4
 1  0  0  0 NA    0    0  |  1
 0  0  1  1  1   NA    1  |  4
 1  0  1  0  0    0   NA  |  2
---------------------------
 3  2  4  2  3→2  1→2  3  | 18

The outdegrees are maintained, but the indegrees change.


Random perturbations

Suppose we randomly switch a tie with a non-tie, within a column:

NA  0  1  0  1    0  1  |  3
 1 NA  0  1  1→0  0  0  |  3→2
 0  1 NA  0  0    0  0  |  1
 0  1  1 NA  0    1  1  |  4
 1  0  0  0 NA    0  0  |  1
 0  0  1  1  1   NA  1  |  4
 1  0  1  0  0→1  0 NA  |  2→3
-------------------------
 3  2  4  2  3→3  1  3  | 18

The indegrees are maintained, but the outdegrees change.


Random perturbations

To perturb while maintaining both in and outdegrees, we must update at leastfour cells at once:

NA  0  1    0  1    0  1  |  3
 1 NA  0→1  1  1→0  0  0  |  3→3
 0  1 NA    0  0    0  0  |  1
 0  1  1   NA  0    1  1  |  4
 1  0  0    0 NA    0  0  |  1
 0  0  1    1  1   NA  1  |  4
 1  0  1→0  0  0→1  0 NA  |  2→2
---------------------------
 3  2  4→4  2  3→3  1  3  | 18

The in and outdegrees are maintained.


Random perturbations

To perturb while maintaining both in and outdegrees, we must update at leastfour cells at once:

Algorithm:

1. Randomly select two rows i1, i2 and two columns j1, j2

2. Obtain the submatrix Yij = Y[(i1, i2), (j1, j2)].

3. Perturb Y[(i1, i2), (j1, j2)] as follows:

• If Yij = [ 1 0 ; 0 1 ], set Y[(i1, i2), (j1, j2)] = [ 0 1 ; 1 0 ];

• If Yij = [ 0 1 ; 1 0 ], set Y[(i1, i2), (j1, j2)] = [ 1 0 ; 0 1 ].

Iteration of this algorithm generates a random sequence Y(1), . . . ,Y(S):

• the value of t(Y(s)) is constant throughout the sequence;

• the sequence visits a subset of Y's for which t(Y) = t(Y(1)).

This means the algorithm samples uniformly from a subset of {Y : t(Y) = t(Yobs)}.


An MCMC sampler

To sample uniformly from all of {Y : t(Y) = t(Yobs)}, we need to be able to make larger perturbations to Y.

Here is an outline of an algorithm that works properly:

Given a sequence {Y(1), . . . ,Y(s)}, generate Y(s+1) as follows:

1. Construct Ỹ1 by perturbing a random subsquare of Y(s) as before;

2. Construct Ỹ2 by perturbing a random triad of Ỹ1;

3. Set Y(s+1) = Ỹ2.

How is the perturbation of a triad done?

1. Randomly select a triad (i, j, k);

2. Perturb the triad:

• If i → j → k → i, then set i ← j ← k ← i;

• If i ← j ← k ← i, then set i → j → k → i.

Exercise: Show that such perturbations leave in and outdegrees unchanged.


A single iteration

rY.Yrc <- function(Y) {
    n <- nrow(Y)

    ### 2x2 "checkerboard" swap: leaves all row and column sums unchanged
    i <- sample(1:n, 4)
    Yi <- Y[i[1:2], i[3:4]]
    if (abs(Yi[1, 1] + Yi[2, 2] - Yi[1, 2] - Yi[2, 1]) == 2) {
        Y[i[1:2], i[3:4]] <- 1 - Yi
    }
    ###

    ### triad reversal: each directed edge of the triad paired with its reverse
    i <- sample(1:n, 3)
    idx <- rbind(c(i[1], i[2]), c(i[2], i[1]), c(i[2], i[3]), c(i[3], i[2]),
        c(i[3], i[1]), c(i[1], i[3]))
    y <- Y[idx]
    # flip all six entries only if the three dyads are asymmetric and the
    # ties form a 3-cycle (in either orientation); this preserves all degrees
    if (all(y[2 * (1:3) - 1] == y[1]) && all(y[2 * (1:3)] == 1 - y[1])) {
        Y[idx] <- 1 - y
    }
    ###
    Y
}


A more efficient sampler

# resample(x, 1) draws one element of x and is safe when length(x) == 1;
# assumed defined as in ?sample: resample <- function(x, ...) x[sample.int(length(x), ...)]

rY.Yrc <- function(Y) {
    n <- nrow(Y)

    ### swap a tie (i1, j1) with a non-tie (i1, j2), choosing a second row i2
    ### so that the 2x2 checkerboard swap preserves all in- and outdegrees
    i1 <- resample((1:n)[apply(Y, 1, sum, na.rm = TRUE) > 0], 1)
    j1 <- resample(which(Y[i1, ] == 1), 1)
    j2 <- resample(which(Y[i1, ] == 0), 1)
    yj1j2 <- Y[, c(j1, j2)]
    if (length(c(j1, j2)) == 2) {
        nnodes <- which(yj1j2[, 1] == 0 & yj1j2[, 2] == 1)
        if (length(nnodes) > 0) {
            i2 <- resample(nnodes, 1)
            if (length(i2) == 1) {
                Y[c(i1, i2), c(j1, j2)] <- 1 - Y[c(i1, i2), c(j1, j2)]
            }
        }
    }
    ###

    ### find a directed 3-cycle i -> j -> k -> i with no reciprocated legs,
    ### then reverse it; this also preserves all in- and outdegrees
    Y1 <- Y
    diag(Y1) <- 0
    Y2 <- Y1 %*% Y1
    ikt <- which(Y2 * t(Y1) * (1 - Y1) > 0, arr.ind = TRUE)
    ik <- ikt[resample(1:nrow(ikt), 1), ]
    j <- resample(which(Y1[ik[1], ] == 1 & Y1[, ik[1]] == 0 & Y1[, ik[2]] == 1 & Y1[ik[2], ] == 0), 1)
    if (length(j) > 0) {
        ijk <- c(ik[1], j, ik[2])
        Y[ijk, ijk] <- 1 - Y[ijk, ijk]
    }
    ###
    Y
}


Evaluating the RCE model

H : log odds(Yi,j = 1) = µ+ ai + bj , Yi,j ’s independent

If H is true, then

• Y should look like Ỹ ∼ RCE(µ, a, b) for some (µ, a, b). We can't simulate from this distribution as we don't know (µ, a, b).

• Y should look like Ỹ ∼ uniform on {Ỹ : t(Ỹ) = t(Y)}. We can sample from this distribution using MCMC.

For the conflict data, let's evaluate H with some test statistics: s(Y) = {s1(Y), s2(Y), s3(Y)}

• s1(Y) = sd(outdegree);

• s2(Y) = sd(indegree);

• s3(Y) = number of reciprocated dyads:

s3(Y) = ∑_{i<j} yi,j yj,i


MCMC approximation

s.H <- c(sd(rsum(Y)), sd(csum(Y)), sum(Y * t(Y), na.rm = TRUE)/2)

Ys <- Y
for (s in 1:S) {
    Ys <- rY.Yrc(Ys)
    s.H <- rbind(s.H, c(sd(rsum(Ys)), sd(csum(Ys)), sum(Ys * t(Ys)/2, na.rm = TRUE)))
}

[Figure: MCMC traces (1000 iterations) of the three statistics s.H[, j]]

MCMC approximation

s.H <- matrix(0, nrow = Sbig/Sout, ncol = 3)

Ys <- Y
for (s in 1:Sbig) {
    Ys <- rY.Yrc(Ys)
    if (s%%Sout == 0) {
        s.H[s/Sout, ] <- c(sd(rsum(Ys)), sd(csum(Ys)), sum(Ys * t(Ys)/2, na.rm = TRUE))
    }
}

[Figure: thinned MCMC traces of the three statistics s.H[, j]]

Reciprocity

[Figure: null density of the number of reciprocal dyads]

The RCE model fails to capture the number of reciprocal dyads:

• There are more reciprocal dyads than expected under any RCE model;

• This makes sense - conflict is likely to be reciprocated;

• Statistically, this suggests that Yi,j and Yj,i are not independent.


References

• Wasserman and Faust, sections 13.5-13.8

• Rao, Jana, and Bandyopadhyay (1996). A Markov chain Monte Carlo method for generating random (0,1)-matrices with given marginals.

• McDonald, Smith, and Forster (2007). Markov chain Monte Carlo exact inference for social networks.


Dyads

(Wasserman and Faust, Chapter 13)

A dyad is an unordered pair of nodes:

• {i, j} = {j, i} is the pair of nodes i and j.

For a directed binary relation, the dyad can be in one of four states:

• i ↮ j (null)

• i → j (asymmetric)

• i ← j (asymmetric)

• i ↔ j (mutual)

These correspond to the following values of (yi,j , yj,i ):

• i ↮ j : (0,0)

• i → j : (1,0)

• i ← j : (0,1)

• i ↔ j : (1,1)


Reciprocity

Reciprocity or mutuality describes the tendency for ties to be mutual.

A quantification of reciprocity is a standard part of SNA

• Statistical analysis requires we account for dyadic dependence, if present.

• Reciprocity by itself may be of interest in terms of evaluating social theory.


The dyad census

Let

• M= the number of mutual dyads;

• A= the number of asymmetric dyads;

• N= the number of null dyads.

Then M + A + N = the number of dyads = (n choose 2) = n(n − 1)/2.

Computing M: Recall, yi,jyj,i = 1 only if both yi,j and yj,i are 1.

M = ∑_{i<j} yi,j yj,i

Computing A: A equals the number of links minus the number of reciprocated links:

A = y·· − 2M

Computing N: Recall M + A + N = (n choose 2), so

N = (n choose 2) − M − A

Other formulas are available in WF.


Dyad census in R

M <- sum(Y * t(Y), na.rm = TRUE)/2

A <- sum(Y, na.rm = TRUE) - 2 * M

N <- choose(nrow(Y), 2) - M - A

M

## [1] 43

A

## [1] 117

N

## [1] 8225

### check
sum((1 - Y) * t(1 - Y), na.rm = TRUE)/2

## [1] 8225


Evaluating M

For what types of networks will

• M be large?

• M be small?

How do we evaluate M? What should it be compared to?

• Its distribution under some null model (RCE via MCMC);

• Its expected value under some simple conditions (direct calculation).

In the latter case, we

1. posit some simple conditions H on tie selection;

2. calculate E[M|H];

3. compare M to E[M|H].

This is less informative than comparing to a conditional distribution, but can be done without simulation in some cases.


Mutuality under fixed choice

Fixed nomination scheme: A network survey instrument in which each individual is required to make exactly d nominations, where d is fixed in advance.

This is a common type of network survey instrument used in institutions:

• Each member given a roster of all members;

• Each member checks off their “top d” friends;

• Often ranks of the top d friends are included (fixed rank nomination).

Such an approach is useful when the goal is to

• distinguish between strong and weak ties;

• control for outdegree heterogeneity.


Mutuality under fixed choice

H : Individuals make d nominations uniformly at random.

To say “this network has more mutuality than expected under randomness,” we need to calculate E[M|H]:

E[M|H] = E[ ∑_{i<j} yi,j yj,i | H ]
       = ∑_{i<j} E[ yi,j yj,i | H ]
       = (n choose 2) E[ yi,j yj,i | H ]
       = (n choose 2) Pr(yi,j = 1 and yj,i = 1 | H)

Pr(yi,j = 1 and yj,i = 1 | H) = Pr(yi,j = 1 | H) × Pr(yj,i = 1 | H)
                              = (d/(n − 1)) × (d/(n − 1)) = d²/(n − 1)²


Mutuality under fixed choice

E[M|H] = (n choose 2) × d²/(n − 1)²
       = [n(n − 1)/2] × d²/(n − 1)²
       = n d² / (2(n − 1))
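For a quick numerical check, here is a minimal sketch (not from the slides; n and d are hypothetical values):

EM.fixed <- function(n, d) n * d^2/(2 * (n - 1))
EM.fixed(n = 100, d = 5)   # about 12.63 mutual dyads expected by chance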


Mutuality under free choice

H : Individuals make yi· nominations uniformly at random.

It can be shown that

E[M|H] = ( y··² − ∑_i yi·² ) / ( 2(n − 1)² )

This allows us to evaluate mutuality, controlling for heterogeneity in outdegree.

yod <- rsum(Y)
(sum(yod)^2 - sum(yod^2))/(2 * (nrow(Y) - 1)^2)

## [1] 1.179

mean(s.H[, 3])

## [1] 9.49

sum(Y * t(Y), na.rm = TRUE)/2

## [1] 43

For these data, there is more reciprocity than expected, controlling for either the outdegrees alone or both the in- and outdegrees.


A reciprocity parameter

How should we measure reciprocity?

• reciprocity reflects dependence between Yi,j and Yj,i ;

• a reciprocity metric should measure “the effect” of Yi,j on Yj,i

Pr(Yj,i = 1|Yi,j = 1) =
  • Pr(Yj,i = 1) under independence;
  • 0 under complete antireciprocity;
  • 1 under complete reciprocity.

Katz and Powell (1955) propose a reciprocity measure ρ:

Pr(Yj,i = 1|Yi,j = 1) = Pr(Yj,i = 1) + ρPr(Yj,i = 0)

• ρ = 0 implies independence;

• ρ < 0 implies antireciprocity;

• ρ > 0 implies reciprocity.

WF goes through a few ways to estimate this assuming various models.
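As a rough illustration (a moment-style plug-in suggested by the defining equation above, not one of WF's estimators), ρ could be estimated from quantities already computed for the conflict data; the identity p11 = 2M/(2M + A) is derived on the next slides:

p1.marg <- mean(Y, na.rm = TRUE)   # marginal tie probability Pr(Yj,i = 1)
p11 <- 2 * M/(2 * M + A)           # Pr(Yj,i = 1 | Yi,j = 1) from the dyad census
rho.hat <- (p11 - p1.marg)/(1 - p1.marg)
rho.hat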


Log odds ratio

An arguably more natural way to measure reciprocity is with a log-odds ratio:

odds(Yj,i = 1 : Yi,j = 1) = Pr(Yj,i = 1|Yi,j = 1) / Pr(Yj,i = 0|Yi,j = 1)

odds(Yj,i = 1 : Yi,j = 0) = Pr(Yj,i = 1|Yi,j = 0) / Pr(Yj,i = 0|Yi,j = 0)

odds ratio(Yj,i = 1 : Yi,j = 1, Yi,j = 0) = [ p_{1|1} / (1 − p_{1|1}) ] × [ (1 − p_{1|0}) / p_{1|0} ]

where p_{w|x} = Pr(Yj,i = w | Yi,j = x).


Log odds ratio

Empirical estimates of these probabilities can be obtained from (M, A, N). Writing T = (n choose 2) for the number of dyads,

Pr(Yj,i = 1|Yi,j = 1) = Pr(Yi,j = 1, Yj,i = 1) / Pr(Yi,j = 1)
                      = Pr(Yi,j = 1, Yj,i = 1) / [ Pr(Yi,j = 1, Yj,i = 1) + Pr(Yi,j = 1, Yj,i = 0) ]
                      ≈ (M/T) / [ M/T + A/(2T) ] = 2M/(2M + A)

Similarly,

Pr(Yj,i = 1|Yi,j = 0) = Pr(Yi,j = 0, Yj,i = 1) / Pr(Yi,j = 0)
                      = Pr(Yi,j = 0, Yj,i = 1) / [ Pr(Yi,j = 0, Yj,i = 0) + Pr(Yi,j = 0, Yj,i = 1) ]
                      ≈ (A/(2T)) / [ N/T + A/(2T) ] = A/(A + 2N)


Reciprocity measure for 90s conflict data

M <- sum(Y * t(Y), na.rm = TRUE)/2

A <- sum(Y, na.rm = TRUE) - 2 * M

N <- choose(nrow(Y), 2) - M - A

M

## [1] 43

A

## [1] 117

N

## [1] 8225

p11 <- 2 * M/(2 * M + A)

p10 <- A/(A + 2 * N)

p11

## [1] 0.4236

p10

## [1] 0.007062

log(p11 * (1 - p10)/((1 - p11) * p10))

## [1] 4.638


Reciprocity measure in 90s conflict data

y <- c(Y)

x <- c(t(Y))

mean(y[x == 1], na.rm = TRUE)

## [1] 0.4236

mean(y[x == 0], na.rm = TRUE)

## [1] 0.007062

fit <- glm(y ~ x, family = binomial)

fit

##

## Call: glm(formula = y ~ x, family = binomial)

##

## Coefficients:

## (Intercept) x

## -4.95 4.64

##

## Degrees of Freedom: 16769 Total (i.e. Null); 16768 Residual

## (130 observations deleted due to missingness)

## Null Deviance: 2200

## Residual Deviance: 1670 AIC: 1670

table(fit$fitted)

##

## 0.00706223230891567 0.423645320196599

## 16567 203


Reciprocity via logistic regression

Exercise: Show that the log-odds ratio is the logistic regression coefficient.

Note: This use of glm is not really fitting a model - the outcome is on both sides of the regression equation.
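A sketch of the argument: in the regression above, x = yj,i and logit Pr(y = 1|x) = β0 + β1 x. So β0 = logit p_{1|0} and β0 + β1 = logit p_{1|1}, giving

β1 = logit p_{1|1} − logit p_{1|0} = log[ p_{1|1}(1 − p_{1|0}) / ((1 − p_{1|1}) p_{1|0}) ],

which is exactly the log-odds ratio computed earlier.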


The p1 model for mutuality

567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Running example

Managers data:

• nodeset = 21 managers in high-tech companies

• yi,j = presence of a directed friendship relation.

[Figure: barplots of the outdegree and indegree distributions for the managers data.]

WARNING: These data have been tweaked for didactic purposes:

Y[7, 11] <- Y[9, 17] <- 1


Candidate models

M0 :  Pr(Yi,j = 1) = e^µ / (1 + e^µ)

Mr :  Pr(Yi,j = 1) = e^{µ+ai} / (1 + e^{µ+ai})

Mc :  Pr(Yi,j = 1) = e^{µ+bj} / (1 + e^{µ+bj})

Mrc : Pr(Yi,j = 1) = e^{µ+ai+bj} / (1 + e^{µ+ai+bj})


Model selection

ridx <- c(matrix((1:nrow(Y)), nrow(Y), nrow(Y)))
cidx <- c(t(matrix((1:nrow(Y)), nrow(Y), nrow(Y))))
y <- c(Y)

fit.0 <- glm(y ~ 1, family = binomial)
fit.r <- glm(y ~ C(factor(ridx), sum), family = binomial)
fit.c <- glm(y ~ C(factor(cidx), sum), family = binomial)
fit.rc <- glm(y ~ C(factor(ridx), sum) + C(factor(cidx), sum), family = binomial)

AIC(fit.0)

## [1] 472.2

AIC(fit.r)

## [1] 412.2

AIC(fit.c)

## [1] 481.9

AIC(fit.rc)

## [1] 408.4


Best case scenario comparison

mu.hat <- fit.rc$coef[1]

a.hat <- fit.rc$coef[1 + 1:(nrow(Y) - 1)]

a.hat <- c(a.hat, -sum(a.hat))

b.hat <- fit.rc$coef[nrow(Y) + 1:(nrow(Y) - 1)]

b.hat <- c(b.hat, -sum(b.hat))

#### Best case scenario comparison

muij.mle <- mu.hat + outer(a.hat, b.hat, "+")

p.mle <- exp(muij.mle)/(1 + exp(muij.mle))

S.H <- NULL

for (s in 1:S) {
  Ysim <- matrix(rbinom(nrow(Y)^2, 1, p.mle), nrow(Y), nrow(Y))

diag(Ysim) <- NA

S.H <- rbind(S.H, c(mmean(Ysim), sd(rsum(Ysim)), sd(csum(Ysim))))

}

s.obs <- c(mmean(Y), sd(rsum(Y)), sd(csum(Y)))

mean(s.obs[3] <= S.H[, 3])

## [1] 0.934

[Figure: null distributions (histograms) of the three statistics (density, sd of outdegrees, sd of indegrees) under the fitted Mrc model, with the observed values marked.]


Evaluating reciprocity

M <- sum(Y * t(Y), na.rm = TRUE)/2

A <- sum(Y, na.rm = TRUE) - 2 * M

N <- choose(nrow(Y), 2) - M - A

M

## [1] 24

A

## [1] 56

N

## [1] 130

p11 <- 2 * M/(2 * M + A)

p10 <- A/(A + 2 * N)

p11

## [1] 0.4615

p10

## [1] 0.1772

s.obs <- log(p11 * (1 - p10)/((1 - p11) * p10))

s.obs

## [1] 1.381


Empirical reciprocity

Evidence for reciprocity:

odds(Yi,j = 1|Yj,i = 1) / odds(Yi,j = 1|Yj,i = 0) ≈ 3.98

The corresponding log-odds ratio is about 1.38.

Is this large?

We need to compare this number to what we’d expect under our RCE model.


Best case scenario

S.H <- NULL
for (s in 1:S) {
  Ysim <- matrix(rbinom(nrow(Y)^2, 1, p.mle), nrow(Y), nrow(Y))
  diag(Ysim) <- NA
  M <- sum(Ysim * t(Ysim), na.rm = TRUE)/2
  A <- sum(Ysim, na.rm = TRUE) - 2 * M
  N <- choose(nrow(Ysim), 2) - M - A
  p11 <- 2 * M/(2 * M + A)
  p10 <- A/(A + 2 * N)
  s.sim <- log(p11 * (1 - p10)/((1 - p11) * p10))
  S.H <- c(S.H, s.sim)
}


Best case scenario

[Figure: histogram of the simulated log-odds ratios under the fitted Mrc model; the observed value 1.38 lies in the far right tail.]

mean(S.H >= s.obs)

## [1] 0.001


Formal test of reciprocity with MCMC

S.H <- NULL
Ysim <- Y
for (s in 1:S) {
  Ysim <- rY.Yrc(Ysim)
  M <- sum(Ysim * t(Ysim), na.rm = TRUE)/2
  A <- sum(Ysim, na.rm = TRUE) - 2 * M
  N <- choose(nrow(Ysim), 2) - M - A
  p11 <- 2 * M/(2 * M + A)
  p10 <- A/(A + 2 * N)
  s.sim <- log(p11 * (1 - p10)/((1 - p11) * p10))
  S.H <- c(S.H, s.sim)
}


Formal test and p-value

[Figure: null distribution (histogram) of the log-odds ratio under the RCE model, approximated by MCMC.]

mean(S.H >= s.obs)

## [1] 0.002


Within-dyad dependence

The RCE model fails in terms of representing mutuality:

• s(Y) = ∑_{i<j} yi,j yj,i is larger than expected under the independent RCE model.

• A large M relative to A and N is interpretable as within-dyad dependence:

Pr(Yj,i = 1|Yi,j = 1) > Pr(Yj,i = 1) > Pr(Yj,i = 1|Yi,j = 0)

So we have

• Model failure: The model doesn’t represent the data feature s(Y);

• Statistical failure: The data suggest statistical dependence in (Yi,j ,Yj,i ).


Statistical independence versus dependence

Under the RCE model,

Pr(Y = y) = ∏_{i≠j} e^{(µ+ai+bj) yi,j} / (1 + e^{µ+ai+bj})

In particular,

Pr(Yi,j = yi,j , Yj,i = yj,i ) = [ e^{(µ+ai+bj) yi,j} / (1 + e^{µ+ai+bj}) ] × [ e^{(µ+aj+bi) yj,i} / (1 + e^{µ+aj+bi}) ]
                              = Pr(Yi,j = yi,j ) × Pr(Yj,i = yj,i )

This means, for example,

Pr(Yi,j = 1|Yj,i = yj,i ) = Pr(Yi,j = 1, Yj,i = yj,i ) / Pr(Yj,i = yj,i )
                          = Pr(Yi,j = 1) × Pr(Yj,i = yj,i ) / Pr(Yj,i = yj,i )
                          = Pr(Yi,j = 1)

Pr(Yi,j = 1|Yj,i = 1) = Pr(Yi,j = 1|Yj,i = 0) = Pr(Yi,j = 1)


A model for dependence

For notational convenience, let µi,j = µ + ai + bj .

To accommodate within-dyad dependence, we want something like

Pr(Yi,j = 1|Yj,i = 0) = e^{µi,j} / (1 + e^{µi,j})

Pr(Yi,j = 1|Yj,i = 1) = e^{µi,j+γ} / (1 + e^{µi,j+γ})

Convince yourself that if γ is greater than, equal to, or less than 0, then Pr(Yi,j = 1|Yj,i = 1) is correspondingly greater than, equal to, or less than Pr(Yi,j = 1|Yj,i = 0).

Additionally, you should be able to show

log odds(Yi,j = 1|Yj,i = 1) − log odds(Yi,j = 1|Yj,i = 0) = γ


A model for dependence

The proposed conditional model is as follows:

Pr(Yi,j = yi,j |Yj,i = yj,i ) ∝ e^{µi,j yi,j + γ yi,j yj,i}

To get a joint model for {Yi,j , Yj,i}, we need to multiply by Pr(Yj,i = yj,i ):

Pr(Yi,j = yi,j , Yj,i = yj,i ) = Pr(Yi,j = yi,j |Yj,i = yj,i ) × Pr(Yj,i = yj,i )
                              ∝ e^{µi,j yi,j + γ yi,j yj,i} × Pr(Yj,i = yj,i ).

Symmetry suggests

Pr(Yi,j = yi,j , Yj,i = yj,i ) ∝ e^{µi,j yi,j + µj,i yj,i + γ yi,j yj,i}


A model for dependence

Pr(Yi,j = yi,j , Yj,i = yj,i ) ∝ exp( µi,j yi,j + µj,i yj,i + γ yi,j yj,i )

Denote Pr(Yi,j = yi,j , Yj,i = yj,i ) = p(yi,j , yj,i ). Then

p(0, 0) = ci,j
p(1, 0) = ci,j exp(µi,j )
p(0, 1) = ci,j exp(µj,i )
p(1, 1) = ci,j exp(µi,j + µj,i + γ)

What is ci,j ? Since the four probabilities sum to one,

1 = p(0, 0) + p(1, 0) + p(0, 1) + p(1, 1)
1 = ci,j (1 + e^{µi,j} + e^{µj,i} + e^{µi,j+µj,i+γ})

ci,j = 1 / (1 + e^{µi,j} + e^{µj,i} + e^{µi,j+µj,i+γ})
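As a concrete check, here is a minimal R sketch (a hypothetical helper, not from the slides) computing the four dyad probabilities for given values of µi,j , µj,i and γ:

dyad.probs <- function(mij, mji, g) {
  p <- c(1, exp(mij), exp(mji), exp(mij + mji + g))  # unnormalized p(0,0), p(1,0), p(0,1), p(1,1)
  names(p) <- c("00", "10", "01", "11")
  p/sum(p)  # dividing by the total is exactly multiplying by c_ij
}

dyad.probs(-1, -1, 2)  # a positive gamma inflates the mutual state "11"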


p1 dependence model

Our proposed statistical model is as follows:

• Between-dyad relations are independent;

• Within-dyad relations have the following distribution:

p(yi,j , yj,i |µ, ai , aj , bi , bj , γ) = e^{µi,j yi,j + µj,i yj,i + γ yi,j yj,i} / (1 + e^{µi,j} + e^{µj,i} + e^{µi,j+µj,i+γ})

This is the so-called “p1” network model (Holland and Leinhardt, 1981).

As you might suspect, this is a type of ERGM.

Let’s find the sufficient statistics.


Sufficient statistics for p1

Pr(Y = y|µ, a, b, γ) = ∏_{i<j} p(yi,j , yj,i |µ, ai , aj , bi , bj , γ)

                     = ∏_{i<j} ci,j e^{µi,j yi,j + µj,i yj,i + γ yi,j yj,i}

                     = ( ∏_{i<j} ci,j ) exp( ∑_{i<j} µi,j yi,j + ∑_{i<j} µj,i yj,i + γ ∑_{i<j} yi,j yj,i )

                     = c(µ, a, b, γ) exp( ∑_{i≠j} µi,j yi,j + γ ∑_{i<j} yi,j yj,i )


Sufficient statistics for p1

Recall µi,j = µ + ai + bj , so

∑_{i≠j} µi,j yi,j + γ ∑_{i<j} yi,j yj,i
  = µ ∑_{i≠j} yi,j + ∑_i ai ∑_{j:j≠i} yi,j + ∑_j bj ∑_{i:i≠j} yi,j + γ ∑_{i<j} yi,j yj,i
  = (µ, a1, . . . , an , b1, . . . , bn , γ) · ( y·· , y1· , . . . , yn· , y·1, . . . , y·n , ∑_{i<j} yi,j yj,i )

Sufficient statistics for p1 are therefore

• the edge total;

• the outdegrees and indegrees;

• the total mutual dyads.

(A short sketch computing these follows.)
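A minimal sketch in base R (the slides' rsum and csum helpers would work equally well):

t.p1 <- function(Y) {
  c(edges  = sum(Y, na.rm = TRUE),           # y..
    out    = rowSums(Y, na.rm = TRUE),       # y1., ..., yn.
    indeg  = colSums(Y, na.rm = TRUE),       # y.1, ..., y.n
    mutual = sum(Y * t(Y), na.rm = TRUE)/2)  # sum_{i<j} yij yji
}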


Estimation

Maximum likelihood estimation: find (µ, a, b, γ) to maximize

l(µ, a, b, γ : y) = log Pr(Y = y|µ, a, b, γ) = ∑_{i<j} log p(yi,j , yj,i |µ, ai , aj , bi , bj , γ)

ll.p1 <- function(Y, mu, a, b, g) {
  mij <- mu + outer(a, b, "+")
  diag(mij) <- NA
  lnum <- sum(mij * Y + t(Y * mij) + g * (Y * t(Y)), na.rm = TRUE)/2
  lden <- sum(log(1 + exp(mij) + exp(t(mij)) + exp(mij + t(mij) + g)), na.rm = TRUE)/2
  lnum - lden
}


Profile likelihood

Recall, the RCE model is obtained by setting γ = 0:

RCE = { Pr(Y = y|µ, a, b, γ = 0) : µ, a, b }

We can obtain estimates of (µ, a, b) under the RCE via glm:

fit.rc <- glm(y ~ C(factor(ridx), sum) + C(factor(cidx), sum), family = binomial)
mu.hat <- fit.rc$coef[1]
a.hat <- fit.rc$coef[1 + 1:(nrow(Y) - 1)]
a.hat <- c(a.hat, -sum(a.hat))
b.hat <- fit.rc$coef[nrow(Y) + 1:(nrow(Y) - 1)]
b.hat <- c(b.hat, -sum(b.hat))

Reality check: The maximized log-likelihood of this model should equal l(µ̂, â, b̂, 0 : y):

logLik(fit.rc)

## 'log Lik.' -163.2 (df=41)

ll.p1(Y, mu.hat, a.hat, b.hat, 0)

## [1] -163.2


Profile likelihood

The profile likelihood is the likelihood as a function of one parameter, with the other parameters fixed at a particular estimate.

Let's examine the profile likelihood in γ:

l(γ : Y, µ̂, â, b̂) = l(µ̂, â, b̂, γ : Y)

GLL <- NULL
for (g in seq(-2, 2, length = 50)) {
  GLL <- rbind(GLL, c(g, ll.p1(Y, mu.hat, a.hat, b.hat, g)))
}


Profile likelihood

[Figure: profile log-likelihood of γ over (−2, 2), maximized near γ = 0.61.]

GLL[which.max(GLL[, 2]), ]

## [1] 0.6122 -160.4249

logLik(fit.rc)

## 'log Lik.' -163.2 (df=41)


Limits of profile likelihood

Let γ̃ be the maximizer of the profile likelihood:

γ̃ = arg max_γ l(γ : Y, µ̂, â, b̂)

The values (µ̂, â, b̂, γ̃) are not generally the MLE:

• (µ̂, â, b̂) are only “best” when γ = 0;

• γ̃ is only “best” for the given (µ̂, â, b̂);

• the MLEs simultaneously maximize the likelihood.


Fitting p1 with ergm

The p1 and other ERGMs can be fit in R with some additional software.

> library(ergm)

ergm: version 3.0-3, created on 2012-05-27
Copyright (c) 2003, Mark S. Handcock, University of California-Los Angeles
                    David R. Hunter, Penn State University
                    Carter T. Butts, University of California-Irvine
                    Steven M. Goodreau, University of Washington
                    Pavel N. Krivitsky, Penn State University
                    Martina Morris, University of Washington

Based on "statnet" project software (statnet.org).
For license and citation information see statnet.org/attribution
or type citation("ergm").

The ergm package allows (in theory) estimation and inference for ERGMs.

• accommodates a variety of sufficient statistics;

• accommodates covariate effects;

• accommodates more complicated model features (e.g., random effects).


Fitting p1 with ergm

First, let's fit the RCE model with the ergm function.

A model is fit to data with ergm by specifying the sociomatrix Y and the sufficient statistics:

fit <- ergm( Y ~ sstat1 + sstat2 + sstat3 )

The above pseudocode fits an ERGM to Y having sufficient statistics sstat1, sstat2, and sstat3.

The sufficient statistics have particular predefined names.

For example, the following command fits the RCE model:

fit.rc.ergm <- ergm(Y ~ edges + sender + receiver)


Reality check

logLik(fit.rc)

## 'log Lik.' -163.2 (df=41)

logLik(fit.rc.ergm)

## 'log Lik.' -163.2 (df=41)

betas <- fit.rc.ergm$coef

mu.ergm <- betas[1]

a.ergm <- c(0, betas[2:nrow(Y)])

b.ergm <- c(0, betas[nrow(Y) + 1:(nrow(Y) - 1)])

mu.ergm - mu.hat

## edges

## 1.551

a.ergm - a.hat

## sender2 sender3 sender4 sender5 sender6 sender7 sender8 sender9
## -0.3353 -0.3353 -0.3353 -0.3353 -0.3353 -0.3353 -0.3353 -0.3353
## sender10 sender11 sender12 sender13 sender14 sender15 sender16 sender17
## -0.3353 -0.3353 -0.3353 -0.3353 -0.3353 -0.3353 -0.3353 -0.3353
## sender18 sender19 sender20 sender21
## -0.3353 -0.3353 -0.3353 -0.3353

b.ergm - b.hat

## receiver2 receiver3 receiver4 receiver5 receiver6 receiver7
## -1.215 -1.215 -1.215 -1.215 -1.215 -1.215
## receiver8 receiver9 receiver10 receiver11 receiver12 receiver13
## -1.215 -1.215 -1.215 -1.215 -1.215 -1.215
## receiver14 receiver15 receiver16 receiver17 receiver18 receiver19
## -1.215 -1.215 -1.215 -1.215 -1.215 -1.215
## receiver20 receiver21
## -1.215 -1.215


Reality check

muij.ergm <- mu.ergm + outer(a.ergm, b.ergm, "+")
muij.hat <- mu.hat + outer(a.hat, b.hat, "+")

[Figure: scatterplot of muij.hat against muij.ergm; the points fall on the identity line, so the two parameterizations yield the same fitted log-odds.]


Fitting the p1 model

We have just used ergm to fit the RCE ERGM:

Pr(Y = y|µ, a, b) = c(µ, a, b) exp( µ y·· + ∑_{i=1}^n ai yi· + ∑_{j=1}^n bj y·j )

The p1 model is just the RCE model with an additional sufficient statistic:

Pr(Y = y|µ, a, b, γ) = c(µ, a, b, γ) exp( µ y·· + ∑_{i=1}^n ai yi· + ∑_{j=1}^n bj y·j + γ ∑_{i<j} yi,j yj,i )


Fitting the p1 model

This model is easily specified in ergm:

fit.rcm.ergm <- ergm(Y ~ edges + sender + receiver + mutual)

## Iteration 1 of at most 20:
## Convergence test P-value: 0e+00
## The log-likelihood improved by 0.4525
## Iteration 2 of at most 20:
## Convergence test P-value: 1.4e-159
## The log-likelihood improved by 0.08659
## Iteration 3 of at most 20:
## Convergence test P-value: 1e-40
## The log-likelihood improved by 0.02804
## Iteration 4 of at most 20:
## Convergence test P-value: 2.4e-11
## The log-likelihood improved by 0.01208
## Iteration 5 of at most 20:
## Convergence test P-value: 1.7e-01
## The log-likelihood improved by 0.004244
## Iteration 6 of at most 20:
## Convergence test P-value: 2.4e-08
## The log-likelihood improved by 0.01069
## Iteration 7 of at most 20:
## Convergence test P-value: 4.9e-02
## The log-likelihood improved by 0.005403
## Iteration 8 of at most 20:
## Convergence test P-value: 4.7e-01
## The log-likelihood improved by 0.003493
## Iteration 9 of at most 20:
## Convergence test P-value: 4.3e-02
## The log-likelihood improved by 0.00475
## Iteration 10 of at most 20:
## Convergence test P-value: 4.2e-02
## The log-likelihood improved by 0.005256
## Iteration 11 of at most 20:
## Convergence test P-value: 9.6e-04
## The log-likelihood improved by 0.006738
## Iteration 12 of at most 20:
## Convergence test P-value: 4.1e-01
## The log-likelihood improved by 0.003213
## Iteration 13 of at most 20:
## Convergence test P-value: 1.8e-01
## The log-likelihood improved by 0.004335
## Iteration 14 of at most 20:
## Convergence test P-value: 6.5e-01
## Convergence detected. Stopping.
## The log-likelihood improved by 0.002841
##
## This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.


Fitting the p1 model

summary(fit.rcm.ergm)

##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: Y ~ edges + sender + receiver + mutual
##
## Iterations: 20
##
## Monte Carlo MLE Results:
##            Estimate Std. Error MCMC % p-value
## edges       -1.2092     0.7378      0  0.1021
## sender2     -1.0283     0.9313      2  0.2702
## sender3     -1.0244     1.0028      1  0.3076
## sender4      0.6931     0.8361      1  0.4077
## sender5      0.8983     0.8136      1  0.2703
## sender6      1.0796     0.8365      1  0.1976
## sender7     -1.5353     1.2210      1  0.2094
## sender8     -1.8304     1.2434      1  0.1418
## sender9     -2.0023     1.2360      1  0.1061
## sender10     1.4815     0.8391      1  0.0783 .
## sender11     2.5400     0.8399      1  0.0027 **
## sender12    -0.3896     0.8610      1  0.6512
## sender13    -0.3499     1.0141      1  0.7302
## sender14    -1.0096     0.9927      1  0.3098
## sender15     1.4083     0.8060      1  0.0814 .
## sender16    -0.8653     0.9897      1  0.3825
## sender17     4.6912     1.1349      1  <1e-04 ***
## sender18    -1.6954     1.2571      1  0.1783
## sender19     1.5759     0.8140      1  0.0536 .
## sender20    -0.7081     1.0248      1  0.4900
## sender21    -0.0158     0.8674      1  0.9854
## receiver2    0.8626     0.8387      1  0.3044
## receiver3   -0.7462     0.9016      1  0.4084
## receiver4   -1.2792     0.9212      1  0.1658
## receiver5   -0.9570     0.8712      1  0.2727
## receiver6   -3.1046     1.1585      0  0.0077 **
## receiver7   -1.5819     1.0014      1  0.1150
## receiver8   -0.5974     0.8710      1  0.4932
## receiver9   -0.2279     0.8500      1  0.7887
## receiver10  -4.4432     1.4568      0  0.0024 **
## receiver11  -1.2563     0.8952      1  0.1613
## receiver12   0.1130     0.8456      1  0.8938
## receiver13  -3.6293     1.4188      0  0.0109 *
## receiver14  -0.7406     0.9004      1  0.4113
## receiver15  -1.9766     0.9565      0  0.0395 *
## receiver16  -1.1909     0.9064      1  0.1897
## receiver17  -1.7559     0.9073      0  0.0537 .
## receiver18  -1.0322     0.9137      1  0.2593
## receiver19  -1.6027     0.9184      1  0.0818 .
## receiver20  -1.7486     1.0050      0  0.0827 .
## receiver21  -1.0216     0.8921      1  0.2529
## mutual       2.5112     0.6119      0  <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 582 on 420 degrees of freedom
## Residual Deviance: 306 on 378 degrees of freedom
##
## AIC: 390 BIC: 560 (Smaller is better.)


Reality check

betas <- fit.rcm.ergm$coef
mu.rcm <- betas[1]
a.rcm <- c(0, betas[2:nrow(Y)])
b.rcm <- c(0, betas[nrow(Y) + 1:(nrow(Y) - 1)])
gamma.rcm <- betas[2 * nrow(Y)]

gamma.rcm

## mutual
## 2.511

GLLM <- NULL
for (g in seq(-1, 6, length = 300)) {
  GLLM <- rbind(GLLM, c(g, ll.p1(Y, mu.rcm, a.rcm, b.rcm, g)))
}


Reality check

[Figure: profile log-likelihood of γ at the ergm estimates of (µ, a, b), maximized near γ = 2.51.]


Reality check

gamma.rcm

## mutual

## 2.511

logLik(fit.rcm.ergm)

## 'log Lik.' -153.2 (df=42)

GLLM[which.max(GLLM[, 2]), ]

## [1] 2.512 -153.234

logLik(fit.rc.ergm)

## 'log Lik.' -163.2 (df=41)

GLL[which.max(GLL[, 2]), ]

## [1] 0.6122 -160.4249


How does ergm work?

Recall the general ERGM:

Pr(Y = y|θ) = c(θ) exp(t(y) · θ)

t(y) = (t1(y), . . . , tp(y)),  θ = (θ1, . . . , θp)

c(θ) is a normalizing constant.

Let y be the observed value of the network. The MLE θ̂ maximizes the log-likelihood:

l(θ : y) = log Pr(Y = y|θ) = θ · t(y) + log c(θ).

A standard approach to maximizing a function is gradient ascent. This approach requires the derivatives of l(θ : y):

∇l(θ : y) = t(y) + ∇ log c(θ)


Likelihood derivatives

∇ log c(θ) = ∇c(θ) / c(θ)

What is c(θ)? Recall, ∑_{y∈Y} Pr(Y = y|θ) = 1. Therefore,

1 = ∑_y Pr(Y = y|θ)
1 = ∑_y c(θ) exp(t(y) · θ)
1 = c(θ) ∑_y exp(t(y) · θ)
c(θ)^{−1} = ∑_y exp(t(y) · θ)
c(θ) = 1 / ∑_y exp(t(y) · θ)


Likelihood derivatives

∇ log c(θ) = −∇ log c(θ)^{−1}
           = − ∑_y t(y) exp(t(y) · θ) / ∑_y exp(t(y) · θ)

For each step of gradient ascent we need to calculate ∇ log c(θ). This requires summing over all possible n × n graphs y:

n    number of graphs
2    2^2 = 4
3    2^6 = 64
n    2^{n(n−1)}
20   2^380 ≈ 2.46 × 10^114

This isn't going to work.


MCMCMLE

The ergm package takes a different strategy.

Consider comparing a “reference” value θ0 to θ:

l(θ) − l(θ0) = [θ · t(y) + log c(θ)] − [θ0 · t(y) + log c(θ0)]
             = (θ − θ0) · t(y) + log[ c(θ)/c(θ0) ]

It turns out that

c(θ0)/c(θ) = E[ exp( (θ − θ0) · t(Y) ) | θ0 ]

This is an expectation, and so can be approximated with an MCMC routine.

The ergm fitting routine is roughly as follows: given a current value θ0,

1. run an MCMC routine to approximate log[ c(θ)/c(θ0) ] for θ near θ0;

2. approximate the derivative of the log-likelihood near θ0;

3. move along the derivative to a new value of θ0;

4. repeat until convergence.

(A rough sketch of step 1 follows this list.)
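Here is a minimal sketch of the key approximation (hypothetical code, not ergm's actual implementation): given networks simulated at θ0, with Tsim the S × p matrix of their sufficient statistics and t.obs = t(y),

ll.diff <- function(theta, theta0, t.obs, Tsim) {
  w <- exp(Tsim %*% (theta - theta0))            # importance weights at theta0
  sum((theta - theta0) * t.obs) - log(mean(w))   # approximates l(theta) - l(theta0)
}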


Speed issues

This procedure can take a long time:

> date()

[1] "Wed Feb 5 14:20:43 2014"

> fit.rcm.ergm<-ergm(Y ~ edges + sender + receiver + mutual)

Iteration 1 of at most 20:

Convergence test P-value: 0e+00

The log-likelihood improved by 0.4616

Iteration 2 of at most 20:

Convergence test P-value: 9.9e-163

The log-likelihood improved by 0.08207

Iteration 3 of at most 20:

Convergence test P-value: 3.2e-47

The log-likelihood improved by 0.03287

Iteration 4 of at most 20:

Convergence test P-value: 1.9e-14

The log-likelihood improved by 0.01313

.

.

.

Iteration 19 of at most 20:

Convergence test P-value: 5.6e-01

Convergence detected. Stopping.

The log-likelihood improved by 0.003658

This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.

> date()

[1] "Wed Feb 5 14:22:30 2014"


A larger dataset

[Figure: network plot of the 67-nation 1990s conflict data.]

dim(Yc90)

## [1] 67 67


> date()
[1] "Wed Feb 13 15:11:57 2013"

Iteration 1 of at most 20:
the log-likelihood improved by 3.602
Iteration 2 of at most 20:
the log-likelihood improved by 1.886
Iteration 3 of at most 20:
the log-likelihood improved by 5.315
Iteration 4 of at most 20:
the log-likelihood improved by 4.038
...
Iteration 17 of at most 20:
the log-likelihood improved by 0.275
Iteration 18 of at most 20:
the log-likelihood improved by 0.3019
Iteration 19 of at most 20:
the log-likelihood improved by 0.4783
Iteration 20 of at most 20:
the log-likelihood improved by 0.3691

> date()
[1] "Wed Feb 13 15:22:17 2013"


Additional ergm difficulties

Y <- dget("http://www.stat.washington.edu/~hoff/courses/567/Data/htmanagers")[,, 2]

rsum(Y)

## [1] 5 3 2 6 7 6 0 1 0 7 13 4 2 2 8 2 18 1 9 2 4

csum(Y)

## [1] 8 10 5 5 6 2 3 5 6 1 6 8 1 5 4 4 6 4 5 3 5



Additional ergm difficulties

Let’s fit the RCE model to the original data:

y <- c(Y)
fit.rc <- glm(y ~ C(factor(ridx), sum) + C(factor(cidx), sum), family = binomial)

mu.hat <- fit.rc$coef[1]
a.hat <- fit.rc$coef[1 + 1:(nrow(Y) - 1)]
a.hat <- c(a.hat, -sum(a.hat))
b.hat <- fit.rc$coef[nrow(Y) + 1:(nrow(Y) - 1)]
b.hat <- c(b.hat, -sum(b.hat))

logLik(fit.rc)

## 'log Lik.' -156.2 (df=41)

ll.p1(Y, mu.hat, a.hat, b.hat, 0)

## [1] -156.2


Additional ergm difficulties

Fitting the same model with ergm:

fit.rc.ergm <- ergm(Y ~ edges + sender + receiver)

## Observed statistic(s) sender7 and sender9 are at their smallest attainable values. Their coefficients will be fixed at -Inf.

betas <- fit.rc.ergm$coef
mu.ergm <- betas[1]
a.ergm <- c(0, betas[2:nrow(Y)])
b.ergm <- c(0, betas[nrow(Y) + 1:(nrow(Y) - 1)])

logLik(fit.rc.ergm)

## 'log Lik.' -162.8 (df=41)

ll.p1(Y, mu.ergm, a.ergm, b.ergm, 0)

## [1] -159.4


Additional ergm difficulties

Fitting the p1 model:

fit.rcm.ergm <- ergm(Y ~ edges + sender + receiver + mutual)

## Observed statistic(s) sender7 and sender9 are at their smallest attainable values. Their coefficients will be fixed at -Inf.

## Iteration 1 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 1.656

## Iteration 2 of at most 20:

## Convergence test P-value: 3.2e-280

## The log-likelihood improved by 0.1893

## Iteration 3 of at most 20:

## Convergence test P-value: 5.3e-54

## The log-likelihood improved by 0.0373

## Iteration 4 of at most 20:

## Convergence test P-value: 1e-13

## The log-likelihood improved by 0.01624

## Iteration 5 of at most 20:

## Convergence test P-value: 3e-06

## The log-likelihood improved by 0.009876

## Iteration 6 of at most 20:

## Convergence test P-value: 9e-02

## The log-likelihood improved by 0.004946

## Iteration 7 of at most 20:

## Convergence test P-value: 9.9e-01

## Convergence detected. Stopping.

## The log-likelihood improved by 0.002146

## Warning: Observed statistic(s) sender7 and sender9 are at their smallest attainable values and drop=FALSE. The MLE is poorly defined.

##

## This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.


Additional ergm difficulties

logLik(fit.rcm.ergm)

## 'log Lik.' NaN (df=42)


Goodness of fit via network simulation

ergm provides a convenient function for network simulation:

Ysim <- simulate(fit.rcm.ergm)

[Figure: network plot of a simulated network Ysim from the fitted p1 model.]


Goodness of fit via network simulation

Simulation can be handy for goodness of fit checks.

ergm uses simulate to evaluate fit for some pre-specified statistics:

p1gof <- gof(fit.rcm.ergm, GOF = ~idegree + odegree + triadcensus)

[Figure: goodness-of-fit plots from gof(): the observed indegree distribution, outdegree distribution, and triad census (003 through 210) plotted against boxplots of the same statistics from simulated networks.]


Covariate effects in ERGMs

567 Social network analysis

Peter Hoff

Statistics, University of Washington


Dutch college data

Friendship ties among 32 college students enrolled in a particular program.

• Relations on a 6-point scale, from “dislike” to “best friends”;

• Relations measured at seven time points;

• Sex, smoking status and subprogram category also available.

Questions:

• What are the effects of sex, smoking status and subgroup on tie formation?

• Is there substantial in- and outdegree heterogeneity, or reciprocity?

• How does the network evolve over time?


Dutch college data

[Figure: network plots of the friendship relations at each of the seven time points.]

For now, we’ll analyze

• the indicator of a positive relation;

• the network at the final timepoint.


Preliminary analysis


mean(Y7, na.rm = TRUE)

## [1] 0.1694

mean(Y7[X$male == 1, X$male == 1], na.rm = TRUE)

## [1] 0.3571

mean(Y7[X$male == 0, X$male == 0], na.rm = TRUE)

## [1] 0.1957

mean(Y7[X$smoke == 1, X$smoke == 1], na.rm = TRUE)

## [1] 0.2692

mean(Y7[X$smoke == 0, X$smoke == 0], na.rm = TRUE)

## [1] 0.2018

SP <- outer(X$prog, X$prog, "==")
mean(Y7[SP], na.rm = TRUE)

## [1] 0.2861


Preliminary analysis

[Figure: barplots of the outdegree and indegree distributions for Y7.]

mean(outdegree[X$smoker == 1]) - mean(outdegree[X$smoker == 0])

## [1] 0.3563

mean(indegree[X$smoker == 1]) - mean(indegree[X$smoker == 0])

## [1] 0.2267


Degree heterogeneity and Reciprocity

#### degree analysis
sd(outdegree)

## [1] 4.663

sd(indegree)

## [1] 2.514

cor(outdegree, indegree)

## [1] 0.1706

#### dyad census
M <- sum(Y7 * t(Y7), na.rm = TRUE)/2
A <- sum(Y7, na.rm = TRUE) - 2 * M
N <- choose(nrow(Y7), 2) - M - A

p11 <- 2 * M/(2 * M + A)
p10 <- A/(A + 2 * N)
log(p11 * (1 - p10)/((1 - p11) * p10))

## [1] 2.25


Preliminary analysis

Some preliminary findings:

• Covariate effects:
  • homophily by sex, smoking behavior and program;
  • smokers seem more outgoing and popular.

• Network patterns:
  • positive reciprocity;
  • outdegree variance is larger than indegree variance, with little correlation between the two.

To summarize covariate effects, some researchers employ “network regression:”

• convert sociomatrix and covariate matrices to vectors;

• perform ordinary regression (OLS, logistic regression, Poisson regression).

Such a procedure should not be called network regression:

• it is just regression;

• it ignores the network structure in the data.


Logistic regression

Nevertheless, regression with appropriate covariates might be adequate.

In particular, network patterns could be explained by covariates:
• degree heterogeneity could be explained by one or more nodal covariates;
• reciprocity could be explained by a group comembership variable.

Let’s do an ordinary logistic regression and evaluate the fit.

XM <- array(dim = c(n, n, 5))

XM[, , 1] <- matrix(X[, 2], n, n)         # sender's smoking status
XM[, , 2] <- t(XM[, , 1])                 # receiver's smoking status
XM[, , 3] <- outer(X[, 1], X[, 1], "==")  # same sex
XM[, , 4] <- outer(X[, 2], X[, 2], "==")  # same smoking status
XM[, , 5] <- outer(X[, 3], X[, 3], "==")  # same program

y7 <- c(Y7)

x <- apply(XM, 3, "c")

colnames(x) <- c("rsmoke", "csmoke", "ssex", "ssmoke", "sprog")

fit.glm <- glm(y7 ~ x, family = binomial)


Logistic regression fit

summary(fit.glm)

##
## Call:
## glm(formula = y7 ~ x, family = binomial)
##
## Deviance Residuals:
##    Min      1Q  Median      3Q     Max
## -1.114  -0.691  -0.445  -0.312   2.469
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -3.212      0.267  -12.03  < 2e-16 ***
## xrsmoke        0.255      0.188    1.35  0.17572
## xcsmoke        0.213      0.188    1.13  0.25750
## xssex          0.693      0.208    3.33  0.00086 ***
## xssmoke        0.798      0.186    4.29  1.8e-05 ***
## xsprog         1.103      0.180    6.13  8.9e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 902.45 on 991 degrees of freedom
## Residual deviance: 819.39 on 986 degrees of freedom
## (32 observations deleted due to missingness)
## AIC: 831.4
##
## Number of Fisher Scoring iterations: 5


Fitting logistic regression in ergm

Logistic regression is an ERGM with independent relations.

Suppose our model is

log odds(yi,j = 1) = β0 + βr xr,i + βc xc,j + βd xd,i,j

Then

Pr(Y = y|X, β) = ∏_{i≠j} e^{(β0 + βr xr,i + βc xc,j + βd xd,i,j) yi,j} / (1 + e^{β0 + βr xr,i + βc xc,j + βd xd,i,j})

               = c(X, β) × exp( β0 ∑_{i≠j} yi,j + βr ∑_{i≠j} xr,i yi,j + βc ∑_{i≠j} xc,j yi,j + βd ∑_{i≠j} xd,i,j yi,j )

The sufficient statistics simplify to the four-dimensional vector

t(y) = ( y·· , ∑_{i=1}^n xr,i yi· , ∑_{j=1}^n xc,j y·j , ∑_{i≠j} xd,i,j yi,j ).


Fitting logistic regression in ergm

As logistic regression is an ERGM, we should be able to fit it with ergm.

We first need to convert the data to a network object:

library(ergm)
netdat <- network(Y7, vertex.attr = X)

Sometimes you want to add vertex attributes one at a time:

netdat <- network(Y7)
set.vertex.attribute(netdat, "male", X[, 1])
set.vertex.attribute(netdat, "smoker", X[, 2])
set.vertex.attribute(netdat, "program", X[, 3])


Fitting logistic regression in ergm

The model is then fit, as before, by specifying sufficient statistics:

fit.ergm <- ergm(netdat ~ edges + nodeocov("smoker") + nodeicov("smoker") +
    nodematch("male") + nodematch("smoker") + nodematch("program"))

The terms nodeicov, nodeocov and nodematch create sufficient statistics out of nodal covariates:

• nodeocov creates a row regression effect;

• nodeicov creates a column regression effect;

• nodematch creates a dyadic binary indicator of covariate equality.

See the ergm manual for more details.


Fitting logistic regression in ergm

summary(fit.ergm)

##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: netdat ~ edges + nodeocov("smoker") + nodeicov("smoker") + nodematch("male") +
##     nodematch("smoker") + nodematch("program")
##
## Iterations: 20
##
## Monte Carlo MLE Results:
##                   Estimate Std. Error MCMC % p-value
## edges               -3.212      0.267     NA < 1e-04 ***
## nodeocov.smoker      0.255      0.188     NA 0.17603
## nodeicov.smoker      0.213      0.188     NA 0.25777
## nodematch.male       0.693      0.208     NA 0.00089 ***
## nodematch.smoker     0.798      0.186     NA < 1e-04 ***
## nodematch.program    1.103      0.180     NA < 1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 1375 on 992 degrees of freedom
## Residual Deviance: 819 on 986 degrees of freedom
##
## AIC: 831 BIC: 861 (Smaller is better.)


Goodness of fit

We fit this model without regard to its network structure:

• across sender heterogeneity/within sender correlation;

• across receiver heterogeneity/within receiver correlation;

• reciprocity/within dyad correlation.

It is possible that such patterns could be explained by covariates:

• heterogeneity in smoking leads to heterogeneity in degree;

• homophily for sex, smoking and group leads to reciprocity.

Let’s examine this with a goodness of fit evaluation.


Goodness of fit

s.obs <- c(sd(rsum(Y7)), sd(csum(Y7)), mdyad(Y7))

py.hat <- fit.glm$fitted
s.SIM <- NULL
for (s in 1:S) {
  Ysim <- matrix(NA, nrow(Y7), nrow(Y7))
  Ysim[!is.na(Y7)] <- rbinom(length(py.hat), 1, py.hat)
  # Ysim <- matrix(rbinom(nrow(Y7)^2, 1, py.hat), nrow(Y7), nrow(Y7)); diag(Ysim) <- NA
  s.SIM <- rbind(s.SIM, c(sd(rsum(Ysim)), sd(csum(Ysim)), mdyad(Ysim)))
}
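(The helper mdyad is not defined in these slides; a minimal version consistent with its use here would be:)

mdyad <- function(Y) sum(Y * t(Y), na.rm = TRUE)/2  # number of mutual dyads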

[Figure: null distributions (histograms) of sd(outdegree), sd(indegree) and the number of mutual dyads under the fitted logistic regression.]


Goodness of fit

mean(s.SIM[, 1] >= s.obs[1])

## [1] 0

mean(s.SIM[, 2] >= s.obs[2])

## [1] 0.149

mean(s.SIM[, 3] >= s.obs[3])

## [1] 0

Evaluation: These results indicate

• more outdegree heterogeneity than expected under the MLE;

• more reciprocity than expected;

• indegree heterogeneity is as expected.


p1 with covariates

This lack of fit can be addressed by adding statistics to the model:

fit.p1cov.1 <- ergm(netdat ~ edges + nodeocov("smoker") + nodeicov("smoker") +
    nodematch("male") + nodematch("smoker") + nodematch("program") + sender +
    receiver + mutual)

summary(fit.p1cov.1)

##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: netdat ~ edges + nodeocov("smoker") + nodeicov("smoker") + nodematch("male") +
##     nodematch("smoker") + nodematch("program") + sender + receiver +
##     mutual
##
## Iterations: 20
##
## Monte Carlo MLE Results:
##                   Estimate Std. Error MCMC % p-value
## edges             -38.6822     0.3187     NA < 1e-04 ***
## nodeocov.smoker    17.4251     0.7299     NA < 1e-04 ***
## nodeicov.smoker    15.4327     0.7793     NA < 1e-04 ***
## nodematch.male      0.9798     0.2598     NA 0.00017 ***
## nodematch.smoker    0.8787     0.2139     NA < 1e-04 ***
## nodematch.program   1.2093     0.2091     NA < 1e-04 ***
## sender2            17.3720     0.6219     NA < 1e-04 ***
## sender3            19.3771     0.5271     NA < 1e-04 ***
## sender4               -Inf         NA     NA      NA
## sender5               -Inf         NA     NA      NA
## sender6            17.3171     0.7299     NA < 1e-04 ***
## sender7             3.4103     0.9187     NA 0.00022 ***
## sender8            -0.0115     1.0379     NA 0.99114
## sender9             1.8180     0.9201     NA 0.04845 *
## sender10            1.3624     0.9221     NA 0.13988
## sender11           -0.6799     1.1351     NA 0.54932
## sender12           -1.6073     1.2740     NA 0.20739
## sender13              -Inf         NA     NA      NA
## sender14            0.4869     1.0216     NA 0.63374
## sender15           20.1597     0.4757     NA < 1e-04 ***
## sender16           18.6122     0.5477     NA < 1e-04 ***
## sender17           17.8109     0.6100     NA < 1e-04 ***
## sender18              -Inf         NA     NA      NA
## sender19            0.9849     1.0934     NA 0.36793
## sender20              -Inf         NA     NA      NA
## sender21            1.4083     1.0099     NA 0.16350
## sender22           17.7014     0.6159     NA < 1e-04 ***
## sender23           18.5950     0.5599     NA < 1e-04 ***
## sender24           17.9800     0.6236     NA < 1e-04 ***
## sender25           18.6165     0.5552     NA < 1e-04 ***
## sender26           -0.5010     1.0675     NA 0.63895
## sender27              -Inf         NA     NA      NA
## sender28           19.7025     0.5838     NA < 1e-04 ***
## sender29           19.5991     0.5683     NA < 1e-04 ***
## sender30              -Inf         NA     NA      NA
## sender31              -Inf         NA     NA      NA
## sender32           19.1037     0.5140     NA < 1e-04 ***
## receiver2          15.8631     0.6921     NA < 1e-04 ***
## receiver3          14.9730     0.6999     NA < 1e-04 ***
## receiver4          17.8512     0.5291     NA < 1e-04 ***
## receiver5          16.6379     0.6229     NA < 1e-04 ***
## receiver6          16.1119     0.6951     NA < 1e-04 ***
## receiver7           0.0725     1.0328     NA 0.94408
## receiver8           0.3832     1.1137     NA 0.73084
## receiver9          -1.0993     1.1401     NA 0.33520
## receiver10          1.9373     1.0442     NA 0.06389 .
## receiver11          0.2619     1.1311     NA 0.81695
## receiver12          1.9376     1.0231     NA 0.05857 .
## receiver13          1.5077     1.0167     NA 0.13842
## receiver14          1.5060     1.0572     NA 0.15463
## receiver15         13.7522     0.7503     NA < 1e-04 ***
## receiver16         14.6330     0.7888     NA < 1e-04 ***
## receiver17         14.6780     0.8132     NA < 1e-04 ***
## receiver18            -Inf         NA     NA      NA
## receiver19          0.2892     1.1537     NA 0.80213
## receiver20         17.5776     0.5596     NA < 1e-04 ***
## receiver21          0.8251     1.1036     NA 0.45490
## receiver22         16.4253     0.6220     NA < 1e-04 ***
## receiver23         15.6689     0.6738     NA < 1e-04 ***
## receiver24         16.1560     0.6550     NA < 1e-04 ***
## receiver25         14.6531     0.7884     NA < 1e-04 ***
## receiver26          2.3125     1.0065     NA 0.02181 *
## receiver27         17.9031     0.5912     NA < 1e-04 ***
## receiver28         14.5211     1.0241     NA < 1e-04 ***
## receiver29         15.7401     0.7474     NA < 1e-04 ***
## receiver30         18.3343     0.5409     NA < 1e-04 ***
## receiver31          0.0518     1.1844     NA 0.96510
## receiver32         16.0079     0.6442     NA < 1e-04 ***
## mutual              3.9466     0.5606     NA < 1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Warning: The standard errors are suspect due to possible poor convergence.
##
## Null Deviance: 1375 on 992 degrees of freedom
## Residual Deviance: NaN on 923 degrees of freedom
##
## AIC: NaN BIC: NaN (Smaller is better.)
##
## Warning: The following terms have infinite coefficient estimates:
## sender4 sender5 sender13 sender18 sender20 sender27 sender30 sender31 receiver18


Confounding

Some of these coefficients do not seem realistic:

• βoutsmoke = 15.74

• βinsmoke = 18.33

The problem is confounding between these covariate effects and the sender and receiver effects.

To illustrate this issue, consider a simple model with just

• sender effects;

• one sender-specific covariate.

Pr(Yi,j = yi,j ) = e^{(µ + ai + β xi) yi,j} / (1 + e^{µ + ai + β xi})


Confounding

The sufficient statistics can be found by summing the exponent over pairs:

∑_{i≠j} ( µ yi,j + ai yi,j + β xi yi,j ) = µ y·· + ∑_i ai yi· + β ∑_i xi yi·

Naively, the parameters and sufficient statistics are

θ = (µ, a1, . . . , an , β)
t(y) = ( y·· , y1· , . . . , yn· , ∑_i xi yi· )

Note that

1. y·· is a function of y1· , . . . , yn· (this leads to side conditions on the ai 's);

2. ∑_i xi yi· is a function of y1· , . . . , yn· (the xi 's are treated as “fixed”).

This latter phenomenon means that β and the ai 's are not jointly estimable.


Confounding

Let’s examine this more explicitly:

t(y) · θ(µ, a, β) = µy·· +∑

i

aiyi· + β∑

i

xiyi·

t(y) · θ(µ, a− cx, β + c) = µy·· +∑

(ai − cxi )yi· + (β + c)∑

xiyi·

= µy·· +∑

aiyi· + β∑

xiyi·

= t(y) · θ(µ, a, β).

Nonidentifiability:This result implies that for any two values of β, say β1 and β2, there arevectors a1 and a2 such that

l(µ, a1, β1 : y) = l(µ, a2, β2 : y).

The data information can’t distinguish between (µ, a1, β1) and (µ, a2, β2).
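A quick way to see this in practice (a hypothetical illustration; it assumes the edge-level sender index ridx has been rebuilt for the 32-node sociomatrix Y7): an edge-level copy of a sender covariate is collinear with the sender indicator columns, so glm reports an aliased (NA) coefficient for whichever term is listed last.

xri <- X$smoke[ridx]  # sender's smoking status, expanded to the edge level
fit.alias <- glm(y7 ~ C(factor(ridx), sum) + xri, family = binomial)
tail(coef(fit.alias), 1)  # NA: not estimable alongside the sender effects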


Modeling options

There are three commonly used methods of addressing this issue:

1. fit the model without sender and receiver effects;

2. fit the model without sender and receiver regressors;

3. use a random effects model.

We don't want to use option 1 if the logistic regression model has been rejected.

We will fit the model in item 2, but use a two-stage procedure for estimating nodal covariate effects: For example,

• obtain estimates θ̂ = (µ̂, â);

• fit the regression model ai = βxi + εi .

This is an ad-hoc approximation to the random effects approach:

• Model yi,j as a function of ai ;

• Model ai as a function of xi

ai = βxi + εi

{ε1, . . . , εn} ∼ i.i.d. normal(0, σ2a)

We will cover such models shortly.


Dyadic covariates for p1

fit.p1cov.d <- ergm(netdat ~ nodematch("male") + nodematch("smoker") +
    nodematch("program") + edges + mutual + sender + receiver)

summary(fit.p1cov.d)

## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula: netdat ~ nodematch("male") + nodematch("smoker") + nodematch("program") +
##     edges + mutual + sender + receiver
## 
## Iterations: 20
## 
## Monte Carlo MLE Results:
##                   Estimate Std. Error MCMC % p-value
## nodematch.male    0.97446 0.24817 NA < 1e-04 ***
## nodematch.smoker  0.87702 0.20728 NA < 1e-04 ***
## nodematch.program 1.21662 0.21337 NA < 1e-04 ***
## edges             -5.89465 0.88012 NA < 1e-04 ***
## mutual            3.93039 0.55701 NA < 1e-04 ***
## sender2           -0.00986 1.03227 NA 0.99238
## sender3           1.95907 0.94045 NA 0.03751 *
## sender4           -Inf NA NA NA
## sender5           -Inf NA NA NA
## sender6           -0.10719 1.08639 NA 0.92143
## sender7           3.44437 0.93016 NA 0.00023 ***
## sender8           -0.01680 1.07068 NA 0.98748
## sender9           1.86578 0.93275 NA 0.04576 *
## sender10          1.37244 0.93891 NA 0.14415
## sender11          -0.64060 1.16863 NA 0.58371
## sender12          -1.63342 1.24365 NA 0.18937
## sender13          -Inf NA NA NA
## sender14          0.48167 1.01416 NA 0.63494
## sender15          2.73999 0.93699 NA 0.00354 **
## sender16          1.19261 0.96329 NA 0.21601
## sender17          0.39126 1.01526 NA 0.70005
## sender18          -Inf NA NA NA
## sender19          1.02229 1.06310 NA 0.33650
## sender20          -Inf NA NA NA
## sender21          1.44637 1.01456 NA 0.15432
## sender22          0.32754 1.03100 NA 0.75079
## sender23          1.19862 0.94538 NA 0.20517
## sender24          0.62869 1.01638 NA 0.53636
## sender25          1.20982 0.97276 NA 0.21393
## sender26          -0.46752 1.08454 NA 0.66651
## sender27          -Inf NA NA NA
## sender28          2.27248 0.98931 NA 0.02184 *
## sender29          2.19655 0.98228 NA 0.02558 *
## sender30          -Inf NA NA NA
## sender31          -Inf NA NA NA
## sender32          1.71212 0.92542 NA 0.06462 .
## receiver2         0.47111 1.08543 NA 0.66437
## receiver3         -0.39548 1.07392 NA 0.71276
## receiver4         2.44692 0.99825 NA 0.01442 *
## receiver5         1.24176 1.01809 NA 0.22289
## receiver6         0.73346 1.07060 NA 0.49346
## receiver7         0.15290 1.03892 NA 0.88303
## receiver8         0.44042 1.15422 NA 0.70286
## receiver9         -1.08385 1.15088 NA 0.34656
## receiver10        1.95274 1.04026 NA 0.06081 .
## receiver11        0.30147 1.12912 NA 0.78953
## receiver12        1.97383 1.01421 NA 0.05194 .
## receiver13        1.58064 0.98570 NA 0.10915
## receiver14        1.54730 1.04138 NA 0.13767
## receiver15        -1.58152 1.12928 NA 0.16171
## receiver16        -0.66290 1.12055 NA 0.55427
## receiver17        -0.67375 1.14831 NA 0.55753
## receiver18        -Inf NA NA NA
## receiver19        0.35078 1.10528 NA 0.75104
## receiver20        2.16189 0.99069 NA 0.02935 *
## receiver21        0.87048 1.07609 NA 0.41877
## receiver22        1.10646 1.03312 NA 0.28445
## receiver23        0.28000 1.04960 NA 0.78971
## receiver24        0.78410 1.06581 NA 0.46211
## receiver25        -0.70833 1.12496 NA 0.52908
## receiver26        2.36477 1.00810 NA 0.01920 *
## receiver27        2.52127 1.01542 NA 0.01321 *
## receiver28        -0.82965 1.35988 NA 0.54195
## receiver29        0.38473 1.14267 NA 0.73642
## receiver30        2.98865 0.95775 NA 0.00186 **
## receiver31        0.08804 1.19989 NA 0.94152
## receiver32        0.58736 1.00498 NA 0.55906
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Warning: The standard errors are suspect due to possible poor convergence.
## 
## Null Deviance: 1375 on 992 degrees of freedom
## Residual Deviance: NaN on 925 degrees of freedom
## 
## AIC: NaN BIC: NaN (Smaller is better.)
## 
## Warning: The following terms have infinite coefficient estimates:
## sender4 sender5 sender13 sender18 sender20 sender27 sender30 sender31 receiver18


Extracting row and column effects

a.hat <- c(0, fit.p1cov.d$coef[4 + (2:nrow(Y))])
b.hat <- c(0, fit.p1cov.d$coef[4 + nrow(Y) - 1 + (2:nrow(Y))])

[Figure: â plotted against outdegree, b̂ plotted against indegree, and b̂ plotted against â]


Nodal covariate effects

How does a covariate x = {x1, . . . , xn} relate to

• outgoingness (a1, . . . , an)?

• popularity (b1, . . . , bn)?

## Warning: Outlier (-Inf) in boxplot 2 is not drawn
## Warning: Outlier (-Inf) in boxplot 1 is not drawn

[Figure: boxplots of â (left) and b̂ (right) by smoking status]


Statistical evaluation

lm(a.hat ~ xsmoke)

## Error: NA/NaN/Inf in 'y'

The problem here is that âi is −∞ for nodes with zero outdegree. What can we do?

0. give up;

1. remove the problematic observations;

2. replace the problematic observations with some large negative value;

3. fit a random effects model.

Item 1 removes information and biases the results:

• Zero degree nodes are highly informative about covariate effects.

• Their removal could bias the estimated effects towards zero.

Item 2 requires that we can pick the “right” replacement value.
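One way to see the difficulty with item 2 is to try several replacement values and watch the estimated covariate effect move; here is a minimal sketch (assuming the a.hat and xsmoke objects from these slides; the replacement values are arbitrary):

# Sensitivity of option 2 to the choice of replacement value (sketch).
for (v in c(-2, -5, -10)) {
    a.rep <- a.hat
    a.rep[a.rep == -Inf] <- v   # fill in -Inf with a large negative value
    print(coef(lm(a.rep ~ xsmoke))["xsmoke"])
}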


Ad-hoc statistical evaluation

a.hat[a.hat == -Inf] <- NA
b.hat[b.hat == -Inf] <- NA

summary(glm(a.hat ~ xsmoke))$coef

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.2086     0.3231   3.740 0.001134
## xsmoke       -0.5836     0.4773  -1.223 0.234371

summary(glm(b.hat ~ xsmoke))$coef

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.6031     0.2798  2.1553  0.03958
## xsmoke        0.2076     0.4321  0.4805  0.63451

The results suggest that smoking doesn't have a large effect on sender or receiver effects, and hence on outgoingness or popularity.


Ad-hoc statistical evaluation

However: What if

• all −∞ âi's corresponded to smokers?

• all −∞ b̂j's corresponded to nonsmokers?

Either possibility would suggest estimating the parameter as further away from zero, making it “more significant.”

xsmoke[is.na(a.hat)]

## [1] 0 0 1 0 0 0 0 1

xsmoke[is.na(b.hat)]

## [1] 0

mean(xsmoke)

## [1] 0.4062

mean(xsmoke[is.na(a.hat)])

## [1] 0.25

These results don't give indications of strong relationships between smoking and the tendency to send or receive ties.


Conflict example

[Figure: outdegree and indegree distributions for the conflict network]

mdyad(Y)

## [1] 43

## expected value conditional on outdegree, WF p. 517
(sum(Y, na.rm = TRUE)^2 - sum(rsum(Y)^2))/(2 * (nrow(Y) - 1)^2)

## [1] 1.179


Network patterns

fit.0 <- ergm(Y ~ edges)
s.SIM0 <- NULL
for (s in 1:S) {
    Ysim <- as.matrix(simulate(fit.0))
    diag(Ysim) <- NA
    s.SIM0 <- rbind(s.SIM0, c(sd(rsum(Ysim)), sd(csum(Ysim)), mdyad(Ysim)))
}
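For comparison with the simulated null draws, the observed versions of the same three statistics can be computed in the same way; a minimal sketch (assuming the rsum, csum and mdyad helpers used in these slides):

# observed statistics, computed as in the simulation loop above
s.obs0 <- c(sd(rsum(Y)), sd(csum(Y)), mdyad(Y))

# compare the observed values to the null simulation quantiles
rbind(observed = s.obs0,
      apply(s.SIM0, 2, quantile, probs = c(0.025, 0.975)))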

[Figure: null distributions of sd(outdegree), sd(indegree) and the mutual dyad count under the edges-only model]


Covariate information

Additionally, we have the following covariates:

• Nodal covariates:
  • population
  • gdp
  • polity

• Dyad covariates:
  • exports
  • shared IGOs
  • geographic distance

Let’s see if these covariates account for any of the network patterns.


Coding covariates

It is common to log values of money, population and distance:

colnames(Xn)

## [1] "pop" "gdp" "polity"

Xn[, 1:2] <- log(Xn[, 1:2])
colnames(Xn) <- c("lpop", "lgdp", "polity")
netdat <- network(Y, vertex.attr = as.data.frame(Xn))

Dyad covariates enter into ergm via the edgecov function:

fit.cov.ergm <- ergm(netdat ~ edges + nodeocov("lpop") + nodeocov("lgdp") +
    nodeocov("polity") + nodeicov("lpop") + nodeicov("lgdp") + nodeicov("polity") +
    edgecov(Xpol) + edgecov(Xigo) + edgecov(Xldst) + edgecov(Xlexp) + edgecov(Xlimp))


Logistic regression fit

summary(fit.cov.ergm)

## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula: netdat ~ edges + nodeocov("lpop") + nodeocov("lgdp") + nodeocov("polity") +
##     nodeicov("lpop") + nodeicov("lgdp") + nodeicov("polity") +
##     edgecov(Xpol) + edgecov(Xigo) + edgecov(Xldst) + edgecov(Xlexp) +
##     edgecov(Xlimp)
## 
## Iterations: 20
## 
## Monte Carlo MLE Results:
##                 Estimate Std. Error MCMC % p-value
## edges           -2.54860 0.36283 NA < 1e-04 ***
## nodeocov.lpop   0.20465 0.08340 NA 0.01415 *
## nodeocov.lgdp   0.27799 0.08057 NA 0.00056 ***
## nodeocov.polity -0.08160 0.01226 NA < 1e-04 ***
## nodeicov.lpop   0.19361 0.08386 NA 0.02097 *
## nodeicov.lgdp   0.17116 0.07984 NA 0.03207 *
## nodeicov.polity -0.03782 0.01239 NA 0.00227 **
## edgecov.Xpol    -0.00451 0.00166 NA 0.00655 **
## edgecov.Xigo    -0.01144 0.00559 NA 0.04084 *
## edgecov.Xldst   -2.66342 0.14270 NA < 1e-04 ***
## edgecov.Xlexp   0.05834 0.42660 NA 0.89122
## edgecov.Xlimp   -0.03532 0.42869 NA 0.93434
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Null Deviance: 23248 on 16770 degrees of freedom
## Residual Deviance: 1568 on 16758 degrees of freedom
## 
## AIC: 1592 BIC: 1685 (Smaller is better.)


Goodness of fit evaluation

s.SIM <- NULL

for (s in 1:S) {
    Ysim <- as.matrix(simulate(fit.cov.ergm))
    diag(Ysim) <- NA
    s.SIM <- rbind(s.SIM, c(sd(rsum(Ysim)), sd(csum(Ysim)), mdyad(Ysim)))
}

[Figure: null distributions of sd(outdegree), sd(indegree) and the mutual dyad count under the covariate model]


Improvement via covariates

[Figure: null distributions of the same three statistics under the edges-only model, for comparison with the covariate model]


Summary

• Covariates can be included in ERGMs.
  • dyad-level covariates: nodematch, edgecov and others;
  • node-level covariates: nodeocov, nodeicov and others.

• Covariates can often partially explain degree heterogeneity and reciprocity.

• Node-level parameters are confounded with node-level covariate effects.
  • two-stage approach: fit node-level parameters, and then relate them to covariates;
  • random effects model: next lecture.


Social relations models for binary and valued relations

567 Social network analysis

Peter Hoff

Statistics, University of Washington


Limitations of ERGMs

We’ve developed, fit and evaluated models of the form

Y ∼ sender covariates + receiver covariates + dyad covariates + network statistics

where “network statistics” includes

• nodal in- and outdegree statistics;

• number of mutual dyads.

Including the above network statistics allows the model to represent

• nodal heterogeneity of in- and outdegrees;

• within-dyad dependence of ties.

The ergm software allows for additional network statistics in the model.


Limitations of ERGMs

We’ve encountered several limitations of ERGMs so far:

1. computationally intensive estimation algorithms;

2. confounding of nodal covariate effects with nodal degree effects;

3. models are limited to binary relational data.

The third limitation is particularly restrictive, since most relations are not binary:

• monk data: ranked relational outcomes;

• conflict data: positive, negative and valued military relations;

• primate data: counts of relational events.

Seven of the first ten datasets at http://moreno.ss.uci.edu/data.html have valued relations.


A strategy for valued relations

Network patterns can be viewed as dependencies among observations:

• Reciprocity can be viewed as dependence between yi,j and yj,i .

• Outdegree heterogeneity can be viewed as dependence among {yi,1, . . . , yi,n}.

• Indegree heterogeneity can be viewed as dependence among {y1,j , . . . , yn,j}.

Statistical methods for dependent data use normal random effects models to represent dependencies among observations.


Linear regression

Suppose our data consist of the following:

• Y = {yi,j : i ≠ j}, an n × n sociomatrix of “continuous” entries;

• X = {xi,j : i ≠ j}, an n × n × p array of covariates or predictors.

A linear regression model posits that

yi,j = βTxi,j + εi,j

where

• β is an unknown p × 1 vector of regression parameters;

• βTx = β · x = ∑k=1,...,p βkxk is the dot-product of β and x.

• {εi,j : i 6= j} are mean-zero disturbance terms or “noise.”


Ordinary least squares

yi,j = βTxi,j + εi,j

Given data Y and X, how should we estimate β?

Ordinary least squares (OLS):

β̂ols = arg minβ ∑i≠j (yi,j − βTxi,j)2

     = (XTX)−1XTy

where

• y is the vectorized version of the matrix Y;

• X is the “matricized” version of the array X.


Maximum likelihood estimation

Suppose we assume {εi,j : i ≠ j} ∼ i.i.d. normal(0, σ2).

β̂mle = arg maxβ log p(Y|X, β)

     = arg maxβ − (1/(2σ2)) ∑i≠j (yi,j − βTxi,j)2

     = arg minβ ∑i≠j (yi,j − βTxi,j)2

     = β̂ols

Under the assumption of i.i.d. normal errors, β̂mle = β̂ols.


Properties of OLS estimators

Under very general conditions on X and {εi,j : i ≠ j},

• β̂ols is unbiased;

• β̂ols is consistent.

Neither of these properties relies on the εi,j's being normal or independent. So why worry about network dependence?

Problem: Generally we are interested in more than a point estimate of β. We want

• standard errors;

• p-values;

• hypothesis tests.

These items require an accurate model for the dependence among the εi,j's.


A bit of linear algebra

Matrix form of linear regression:

y = Xβ + ε

The variance of β̂ols can be computed with some matrix algebra:

y = Xβ + ε

β̂ = (XTX)−1XTy
  = (XTX)−1XT(Xβ + ε)
  = β + (XTX)−1XTε

From this we see E[β̂] = β if X and ε are uncorrelated, and

Cov[β̂] = E[(XTX)−1XTεεTX(XTX)−1]
        = (XTX)−1XTCov[ε]X(XTX)−1.

The variance of β̂, and therefore its standard errors, depends on Cov[ε].
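To make the formula concrete, here is a minimal numerical sketch (with a made-up design matrix and error covariance, not the trade data) evaluating the sandwich expression in R:

# Sketch: Cov[beta.hat] = (X'X)^{-1} X' Cov[eps] X (X'X)^{-1}
# Toy quantities for illustration only.
set.seed(1)
m <- 50
X <- cbind(1, rnorm(m))                  # hypothetical design matrix
S.iid <- diag(m)                         # Cov[eps] under i.i.d. errors
S.dep <- 0.5^abs(outer(1:m, 1:m, "-"))   # a correlated alternative

XtXi <- solve(t(X) %*% X)
cov.beta <- function(S) XtXi %*% t(X) %*% S %*% X %*% XtXi

sqrt(diag(cov.beta(S.iid)))   # the usual OLS standard errors (sigma^2 = 1)
sqrt(diag(cov.beta(S.dep)))   # generally different under dependence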


Generalized linear regression

Returning to network and relational data:

Normal linear regression model:

yi,j = βTxi,j + εi,j

{εi,j} ∼ i.i.d. normal(0, σ2)

Generalized linear regression model:

yi,j = βTxi,j + εi,j

{εi,j} ∼ ?

Network covariance: What correlations do we expect among the εi,j's?


Network covariance

What were the main deviations from independence we've seen in logistic regression models?

• excess heterogeneity of in- and outdegrees;

• excess reciprocity.

An analogy to continuous variables suggests the possibility of

• within-dyad correlation between εi,j and εj,i;

• within-sender correlation among {εi,1, . . . , εi,n};

• within-receiver correlation among {ε1,j , . . . , εn,j}.

Let’s evaluate some relational data for the presence of such dependence.


Example: Effects of conflict on trade

Trade data on 130 countries from 1990-2000

Outcome variable:

• yi,j = average trade between countries i and j, 1990–2000.

Nodal covariates:

• population

• gdp

• polity

Dyadic covariates:

• number of conflicts

• geographic distance

• number of shared intergovernmental organizations

(trade, population, gdp and distance are on the log scale)


Some exploratory data analysis

Instead of out- and indegrees, we can look at row and column averages:

• row mean = total exports

• column mean = total imports

[Figure: histogram of row means, and scatterplots of column mean vs row mean and of row mean vs population, gdp and polity]


Some exploratory data analysis

[Figure: trade plotted against number of conflicts]


OLS estimation

dimnames(X)[[3]]

## [1] "rpop" "rgdp" "rpty" "cpop" "cgdp" "cpty" "con" "dst" "igo" "pti"

yv <- c(Y)
Xm <- apply(X, 3, "c")

fit.ols <- lm(yv ~ Xm)

summary(fit.ols)$coef

##               Estimate Std. Error  t value   Pr(>|t|)
## (Intercept) -0.2984079  1.118e-02 -26.6862 1.091e-153
## Xmrpop      -0.0260559  2.195e-03 -11.8716  2.242e-32
## Xmrgdp       0.0584610  1.889e-03  30.9484 1.463e-204
## Xmrpty      -0.0007722  3.484e-04  -2.2168  2.665e-02
## Xmcpop      -0.0239576  2.194e-03 -10.9173  1.181e-27
## Xmcgdp       0.0558042  1.889e-03  29.5427 4.924e-187
## Xmcpty      -0.0001799  3.483e-04  -0.5166  6.055e-01
## Xmcon        0.0343658  9.275e-03   3.7053  2.118e-04
## Xmdst       -0.0458067  3.805e-03 -12.0370  3.114e-33
## Xmigo        0.0053237  1.980e-04  26.8896 5.825e-156
## Xmpti        0.0002902  4.521e-05   6.4196  1.403e-10


Residual diagnostics - dyadic correlation

[Figure: OLS residuals E plotted against their dyadic transposes t(E)]

E <- matrix(0, nrow(Y), ncol(Y))
E[!is.na(Y)] <- fit.ols$res

cor(c(E), c(t(E)), use = "complete")

## [1] 0.9108


Residual diagnostics - dyadic correlation

The residual within-dyad correlation is quite high: ρ̂ = 0.7011.

Is it large enough to reject H : Cor[εi,j , εj,i ] = ρ = 0 ?

Hypothesis test: Compare ρ̂ with what we would expect under the model

{εi,j} ∼ i.i.d. normal(0, σ2).

A normal-theory test of H : ρ = 0 exists, but we might be concerned with the normality assumption:

[Figure: histogram of the OLS residuals E]


Permutation tests

It turns out we can test independence without assuming normality.

Idea: If {εi,j} are independent, then the null distribution of the residuals {ε̂i,j} given β̂ is (approximately) obtained by sampling from the observed residuals without replacement.

Analogy: Recall under the SRG, the distribution of the graph given the total number of edges was obtained by sampling the edges uniformly without replacement.

s.obs <- cor(c(E), c(t(E)), use = "complete")

s.SIM <- NULL
Esim <- E

for (s in 1:S) {
    Esim[!is.na(E)] <- sample(E[!is.na(E)])
    s.SIM <- c(s.SIM, cor(c(Esim), c(t(Esim)), use = "complete"))
}
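A permutation p-value then compares the observed correlation to the permutation distribution; a minimal sketch (assuming the s.obs and s.SIM objects from above):

# one-sided permutation p-value for the residual dyadic correlation
mean(s.SIM >= s.obs)

# visualize the null distribution against the observed value
hist(s.SIM)
abline(v = s.obs, col = "red")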


Permutation tests

[Figure: permutation null distribution of the residual dyadic correlation]

We reject the assumption of independence, based on residual dyadic correlation.


Residual diagnostics - row and column heterogeneity

Under independence,

• all residuals should be centered around zero;

• all residuals within a row should be centered around zero;

• all residuals within a column should be centered around zero.

Failure of the latter two items suggests

• within-row correlation, which is equivalent to

• across-row heterogeneity.


Residual checks

[Figure: boxplots of export (row) residuals and import (column) residuals, by country]

• Some countries have residuals that are mostly below zero;

• Some countries have residuals that are mostly above zero.

Is such a pattern likely under independence?


F statistics for row and column heterogeneity

The standard way to evaluate systematic row and column variation is with an ANOVA:

εi,j = ai + bj + ei,j

HR : a1 = · · · = an = 0

HC : b1 = · · · = bn = 0

Under normality, these hypotheses can be evaluated with F-tests:

rowidx <- matrix(1:nrow(Y), nrow(Y), nrow(Y))
colidx <- t(rowidx)
anova(lm(c(E) ~ as.factor(c(rowidx)) + as.factor(c(colidx))))

## Analysis of Variance Table
## 
## Response: c(E)
##                         Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(c(rowidx))   129    122   0.945    21.0 <2e-16 ***
## as.factor(c(colidx))   129    119   0.923    20.5 <2e-16 ***
## Residuals            16641    749   0.045
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Permutation tests

The validity of the p-values depends on normality.

We can evaluate independence without assuming normality with a permutation test:

s.obs <- anova(lm(c(E) ~ as.factor(c(rowidx)) + as.factor(c(colidx))))$F[1:2]
s.SIM <- NULL
Esim <- E

for (s in 1:100) {
    Esim[!is.na(E)] <- sample(E[!is.na(E)])
    s.SIM <- rbind(s.SIM, anova(lm(c(Esim) ~ as.factor(c(rowidx)) +
        as.factor(c(colidx))))$F[1:2])
}
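Permutation p-values for the two statistics can then be computed by comparing the observed F values with the permutation draws; a minimal sketch (assuming the s.obs and s.SIM objects from the loop above):

# one-sided permutation p-values for the row and column F-statistics
mean(s.SIM[, 1] >= s.obs[1])   # row heterogeneity
mean(s.SIM[, 2] >= s.obs[2])   # column heterogeneity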


Permutation tests

[Figure: permutation null distributions of the row and column F-statistics]

We reject the assumption of independence, based on the row and column F-statistics.


Row, column and dyadic effects

Consider a normal RCE model of the form

yi,j = βTxi,j + εi,j

εi,j = ai + bj + ei,j

Our residual diagnostics suggest that

• ai and bi might be correlated;

• ei,j and ej,i might be correlated.

Analysis of the covariance of {ai}, {bj}, {ei,j} was studied by Warner, Kenny and Stoto (1979), and their “error decomposition”

εi,j = ai + bj + ei,j

is called the social relations model.


Social relations model

Original SRM:

yi,j = µ + ai + bj + ei,j.

Original motivation: Decompose variance around µ into parts describing

• heterogeneity across row means (outdegrees);

• heterogeneity across column means (indegrees);

• correlation between row and column means;

• correlation within dyads.

Note that this is nearly a direct analogue to the p1 model:

Pr(Yi,j = 1 | Yj,i = yj,i) = exp(µ + ai + bj + γyj,i) / (1 + exp(µ + ai + bj + γyj,i))


The social relations model

Random effects version (Wong [1982], Li and Loken [2002]):

yi,j = µ + εi,j

εi,j = ai + bj + ei,j

{(a1, b1), . . . , (an, bn)} ∼ i.i.d. N(0, Σab)

{(ei,j , ej,i) : i ≠ j} ∼ i.i.d. N(0, Σe)

Σab = [ σ2a  σab ]        Σe = σ2e [ 1  ρ ]
      [ σab  σ2b ]                 [ ρ  1 ]


Understanding random effects

What does the SRM imply about the correlation between relational measurements?

Recall, Cov[U, V] = E[(U − µU)(V − µV)].

Let's compute the covariance between yi,j and yi,k, for j ≠ k:

Cov[yi,j , yi,k] = E[(yi,j − µ)(yi,k − µ)]
                = E[(ai + bj + ei,j)(ai + bk + ei,k)]
                = E[ai2] + E[aibk] + E[aiei,j] + · · · + E[ei,jei,k]
                = σ2a


Covariances in the SRM

Similar calculations give the following non-zero covariances:

Cov[yi,j , yi,k ] = σ2a (within-row covariance)

Cov[yi,j , yk,j ] = σ2b (within-column covariance)

Cov[yi,j , yj,k ] = σab (row-column covariance)

Cov[yi,j , yj,i ] = 2σab + ρσ2e (row-column covariance plus reciprocity)

All other covariances are zero:

Cov[yi,j , yk,l] = 0 if i, j, k, l are distinct
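These covariance formulas can be verified by simulation. Below is a minimal sketch (with arbitrary parameter values, not estimates from any dataset) that repeatedly simulates a small SRM and compares empirical covariances to σ2a, σ2b, σab and 2σab + ρσ2e:

# Simulation check of the SRM covariances (sketch; arbitrary parameters).
set.seed(1)
sa2 <- 1; sb2 <- 2; sab <- 0.5; se2 <- 1; rho <- 0.6
al <- sqrt(se2) * (sqrt(1 + rho) + sqrt(1 - rho))/2   # chosen so that
be <- sqrt(se2) * (sqrt(1 + rho) - sqrt(1 - rho))/2   # Cor[e_ij, e_ji] = rho
Sab <- matrix(c(sa2, sab, sab, sb2), 2, 2)
nsim <- 20000
yy <- matrix(NA, nsim, 5)
for (s in 1:nsim) {
    ab <- matrix(rnorm(3 * 2), 3, 2) %*% chol(Sab)    # (a_i, b_i) for 3 nodes
    Z <- matrix(rnorm(9), 3, 3)
    E <- al * Z + be * t(Z)                           # dyadically correlated errors
    Ys <- outer(ab[, 1], ab[, 2], "+") + E            # y_ij = a_i + b_j + e_ij (mu = 0)
    yy[s, ] <- c(Ys[1, 2], Ys[1, 3], Ys[3, 2], Ys[2, 1], Ys[2, 3])
}
cov(yy[, 1], yy[, 2])   # ~ sa2              (within-row)
cov(yy[, 1], yy[, 3])   # ~ sb2              (within-column)
cov(yy[, 1], yy[, 5])   # ~ sab              (row-column)
cov(yy[, 1], yy[, 4])   # ~ 2*sab + rho*se2  (dyadic)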


Social relations regression modeling

Just like with the p1 model, we may want to include regressors in the network dependence model.

Regression modeling:

yi,j = βT xi,j + ai + bj + ei,j

{(a1, b1), . . . , (an, bn)} ∼ i.i.d. N(0,Σab)

{(ei,j , ej,i ) : i 6= j} ∼ i.i.d. N(0,Σe)

We call such a model a normal social relations regression model (SRRM).


An alternative formulation

Suppose our regressors can be categorized as sender, receiver and dyadic:

βTxi,j = βrTxri + βcTxcj + βdTxdi,j

The SRRM can be equivalently formulated as follows:

yi,j = βdTxdi,j + ri + cj + ei,j

ri = βrTxri + ai

cj = βcTxcj + bj,

where the variance/covariance of the error terms ai , bj , ei,j is as before.

This is exactly the same model as the SRRM, but can assist with interpretation.


Fitting SRRMs

amen-package package:amen R Documentation

Additive and multiplicative effects modeling of networks and relational data

Description:

This package computes Bayesian estimates for additive and multiplicative effects (AME) models for networks and relational data. The basic model includes regression terms, the covariance structure of the social relations model (Warner, Kenny and Stoto (1979), Wong (1982)), and multiplicative factor effects (Hoff (2009)). Four different link functions accommodate different relational data structures, including binary/network data (bin), normal relational data (nrm), ordinal relational data (ord) and data from fixed-rank nomination schemes (frn). Development of this software was supported in part by NICHD grant 1R01HD067509-01A1.


Package amen

ame_nrm package:amen R Documentation

AME fit for normal relational data

Description:

An MCMC routine providing a fit to an additive and multiplicative

effects (AME) regression model for normal relational data

Usage:

ame_nrm(Y, X = NULL, Xrow = NULL, Xcol = NULL, rvar = TRUE, cvar = TRUE, dcor = TRUE, R = 0, seed = 1, nscan = 50000, burn = 500, odens = 25, plot = TRUE, print = TRUE, intercept = TRUE)

Arguments:

Y: an n x n square relational matrix

X: an n x n x p array of covariates

Xrow: an n x pr matrix of nodal row covariates

Xcol: an n x pc matrix of nodal column covariates

rvar: logical: fit row random effects?

cvar: logical: fit column random effects?

dcor: logical: fit a dyadic correlation?


SRRM estimates

library(amen)

fit.srrm <- ame_nrm(Y, X)

summary(fit.srrm)

##

## beta:

## pmean psd z-stat p-val

## intercept -0.605 0.056 -10.741 0.000

## pop.row -0.028 0.015 -1.794 0.073

## gdp.row 0.045 0.013 3.391 0.001

## pty.row -0.002 0.002 -0.804 0.422

## pop.col -0.026 0.015 -1.720 0.086

## gdp.col 0.043 0.013 3.319 0.001

## pty.col -0.001 0.002 -0.572 0.567

## con.dyad -0.001 0.004 -0.247 0.805

## dst.dyad 0.013 0.005 2.533 0.011

## igo.dyad 0.015 0.000 39.697 0.000

## pti.dyad 0.000 0.000 -4.053 0.000

##

## Sigma_ab pmean:

## a b

## a 0.022 0.014

## b 0.014 0.022

##

## rho pmean:

## 0.883


Formulation in patched version of amen

### node level covariates
Xn <- conflict90_00$nodevars
Xn[, 1:2] <- log(Xn[, 1:2])
dimnames(Xn)[[2]] <- c("pop", "gdp", "pty")

### dyad level covariates
Xd <- array(dim = c(nrow(Y), nrow(Y), 4))
Xd[, , 1] <- conflict90_00$coco
Xd[, , 2:4] <- conflict90_00$dyadvars[, , c(4, 3, 1)]
Xd[, , 2] <- log(Xd[, , 2] + 1)
dimnames(Xd)[[3]] <- c("con", "dst", "igo", "pti")

###
library(amen)
source("http://www.stat.washington.edu/~hoff/amen_patch.r")
fit.srrm <- ame_nrm(Y, Xrow = Xn, Xcol = Xn, Xdyad = Xd)


Comparison to OLS estimates

cbind(beta.ols, beta.srrm, beta.ols/beta.srrm)

##               beta.ols  beta.srrm
## (Intercept) -0.2984079 -0.6051439   0.4931
## Xmrpop      -0.0260559 -0.0276680   0.9417
## Xmrgdp       0.0584610  0.0450504   1.2977
## Xmrpty      -0.0007722 -0.0019083   0.4047
## Xmcpop      -0.0239576 -0.0262606   0.9123
## Xmcgdp       0.0558042  0.0429091   1.3005
## Xmcpty      -0.0001799 -0.0013457   0.1337
## Xmcon        0.0343658 -0.0010976 -31.3088
## Xmdst       -0.0458067  0.0131936  -3.4719
## Xmigo        0.0053237  0.0150091   0.3547
## Xmpti        0.0002902 -0.0002354  -1.2327


Comparison to normal theory sds and z-scores

cbind(sd.ols, sd.srrm, sd.ols/sd.srrm)

##                sd.ols   sd.srrm
## (Intercept) 1.118e-02 5.634e-02 0.1985
## Xmrpop      2.195e-03 1.543e-02 0.1423
## Xmrgdp      1.889e-03 1.329e-02 0.1422
## Xmrpty      3.484e-04 2.374e-03 0.1467
## Xmcpop      2.194e-03 1.527e-02 0.1437
## Xmcgdp      1.889e-03 1.293e-02 0.1461
## Xmcpty      3.483e-04 2.352e-03 0.1481
## Xmcon       9.275e-03 4.449e-03 2.0846
## Xmdst       3.805e-03 5.209e-03 0.7305
## Xmigo       1.980e-04 3.781e-04 0.5236
## Xmpti       4.521e-05 5.809e-05 0.7782

cbind(beta.ols/sd.ols, beta.srrm/sd.srrm)

##                 [,1]     [,2]
## (Intercept) -26.6862 -10.7406
## Xmrpop      -11.8716  -1.7936
## Xmrgdp       30.9484   3.3907
## Xmrpty       -2.2168  -0.8037
## Xmcpop      -10.9173  -1.7196
## Xmcgdp       29.5427   3.3194
## Xmcpty       -0.5166  -0.5722
## Xmcon         3.7053  -0.2467
## Xmdst       -12.0370   2.5327
## Xmigo        26.8896  39.6969
## Xmpti         6.4196  -4.0527


Bayesian inference and estimation

Estimates and standard errors in amen are obtained using Bayesian inference

Bayesian inference is closely related to maximum likelihood inference, with a few distinctions:

• In Bayesian inference, probability is treated as a measure of uncertainty;

• Bayesian inference combines
  • data information with
  • prior information to produce
  • posterior information.

• For many problems, Bayes and ML inference produce similar results;

• Computational methods for Bayes inference are more universally available.


A simple example

Scenario: A nation-wide survey will ask people about the number of friends they socialize with on average per week. The number of friends per person is assumed to follow a Poisson distribution with mean θ:

Y1, . . . ,Yn ∼ i.i.d. Poisson(θ).

Researchers observe values for Y_1, . . . , Y_n, and then wish to obtain an estimate of, and confidence interval for, θ.

[Figure: the Poisson pmf p(y | θ = 2.6) for y = 0, . . . , 10.]

Prior information

Prior information: Researchers have been studying friendship data for a long time, and have some pretty good guesses about what the value of θ is. They summarize this information with a prior distribution.

[Figure: the prior density p(θ).]

Under this prior distribution E[θ] = 2.


Posterior information

After having observed {Y_1 = y_1, . . . , Y_n = y_n}, the researchers combine their prior information with their data information by computing the

conditional distribution of θ, given {y1, . . . , yn},

also known as

the posterior distribution of θ.


Conditional probability

First consider

• A = {Y1 = y1, . . . ,Yn = yn}, the observed data;

• B = {θ < 3}, a statement about the unknown parameter.

Recall Bayes’ rule,

Pr(B|A) = Pr(A|B) Pr(B) / Pr(A)

= Pr(A|B) Pr(B) / [Pr(A|B) Pr(B) + Pr(A|B^c) Pr(B^c)]


Conditional probability distribution

More generally, the posterior distribution p(θ|y_1, . . . , y_n) may be found using a version of Bayes' rule. Letting y = (y_1, . . . , y_n), we have

p(θ|y) = p(y|θ) p(θ) / ∫ p(y|θ) p(θ) dθ.


Data and posterior distribution

Data: Suppose n = 30 and ȳ = 2.8. The MLE is θ̂_MLE = ȳ = 2.8. What is the posterior distribution and Bayes estimator?

[Figure: the posterior density of θ, with the MLE and the Bayes estimate marked.]

Given the prior and these data, E[θ|y] = 2.72.
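
The slides do not show the prior's functional form. Under a conjugate gamma prior (an assumption here), the update is available in closed form: a gamma(a, b) prior and Poisson data give a gamma(a + Σ y_i, b + n) posterior. A gamma(6, 3) prior has mean 2 and nearly reproduces the reported posterior mean:

a <- 6; b <- 3                    # assumed prior: theta ~ gamma(6, 3), mean 2
n <- 30; ybar <- 2.8              # data summary from this slide
a.post <- a + n * ybar            # posterior shape: a + sum(y) = 90
b.post <- b + n                   # posterior rate: b + n = 33
a.post / b.post                   # posterior mean 2.73, close to the 2.72 reported
qgamma(c(0.025, 0.975), a.post, b.post)   # a 95% posterior interval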


Posterior inference via Monte Carlo

Inference cannot always be obtained from a simple plot of the posterior.

• often the posterior distribution is a very high-dimensional object;

• often we can’t compute the posterior distribution exactly.

However, for a large number of models we can simulate from p(θ|y).

Simulations θ(1), . . . , θ(S) from p(θ|y) can be used to approximate p(θ|y).

This is known as the Monte Carlo method.


Monte Carlo approximation

Posterior descriptions involve integrals we'd often like to avoid:

E[θ|y] = ∫ θ p(θ|y) dθ

median[θ|y] = θ_{1/2} such that ∫_{−∞}^{θ_{1/2}} p(θ|y) dθ = 1/2

Pr(θ_1 < θ_2 | y_1, y_2) = ∫_{−∞}^{∞} ∫_{−∞}^{θ_2} p(θ_1, θ_2 | y_1, y_2) dθ_1 dθ_2

We can easily approximate such integrals arbitrarily closely with Monte Carlo approximation.


Monte Carlo approximation

The basic principle: If θ^(1), . . . , θ^(S) ~ i.i.d. p(θ|y), then

histogram{θ^(1), . . . , θ^(S)} ≈ p(θ|y).

This implies that

(1/S) Σ_s θ^(s) ≈ E[θ|y]

#{θ^(s) < c} / S ≈ Pr(θ < c | y)

median{θ^(1), . . . , θ^(S)} ≈ median[θ|y]

etc., with the approximation improving with increasing S.
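
For instance, continuing the gamma-posterior sketch from the Poisson example (the gamma(90, 33) posterior there was based on an assumed prior), each approximation above takes one line in R:

set.seed(1)
theta.mc <- rgamma(10000, 90, 33)          # theta^(1), ..., theta^(S)
mean(theta.mc)                             # approximates E[theta | y]
mean(theta.mc < 3)                         # approximates Pr(theta < 3 | y)
quantile(theta.mc, c(0.025, 0.5, 0.975))   # median and interval endpoints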


Monte Carlo approximation

[Figure: Monte Carlo approximations (histograms) of p(θ|y) based on S = 10, S = 100, and S = 1000 simulations.]

Quick summary of Bayesian inference

• Probability distributions encapsulate information
  • p(θ) describes prior information
  • p(y|θ) describes information about y for each θ
  • p(θ|y) describes posterior information

• Posterior distributions can be calculated via Bayes’ rule

p(θ|y) = p(y|θ) p(θ) / ∫ p(y|θ) p(θ) dθ

• For some models, posteriors can be plotted/summarized with R commands

• dbeta, dgamma, dnorm
• qbeta, qgamma, qnorm

• Posterior distributions can be approximated with Monte Carlo methods.

1. simulate θ^(1), . . . , θ^(S) ~ i.i.d. p(θ|y)
2. posterior means, variances, quantiles can be approximated with simulation means, variances, quantiles.


Bayesian inference in amen

The amen package generates a Monte Carlo approximation to the posterior distribution of the parameters in the SRM:

• Regression parameters: β

• Row and column effects: a, b

• Covariance parameters: (σ²_a, σ²_b, σ_ab), (σ²_e, ρ).


Bayesian inference in amen

plot(fit.srrm)

[Figure: MCMC trace plots of the covariance parameters (SABR) and regression coefficients (BETA), goodness-of-fit traces for reciprocity and transitive triples, and outdegree/indegree count distributions.]

Posterior inference via Monte Carlo

fit.srrm$SABR[1:12, ]

##            va      cab      vb    rho      ve
##  [1,] 0.02519 0.012852 0.01955 0.8824 0.04159
##  [2,] 0.02173 0.015294 0.02703 0.8810 0.04194
##  [3,] 0.01838 0.012051 0.01931 0.8793 0.04066
##  [4,] 0.02074 0.013576 0.01939 0.8830 0.04224
##  [5,] 0.01955 0.012998 0.02054 0.8835 0.04091
##  [6,] 0.01901 0.012910 0.02279 0.8824 0.04137
##  [7,] 0.02369 0.013405 0.02353 0.8846 0.04173
##  [8,] 0.02601 0.015275 0.02332 0.8788 0.04163
##  [9,] 0.02719 0.016061 0.02350 0.8806 0.04147
## [10,] 0.01756 0.012264 0.01892 0.8836 0.04308
## [11,] 0.01875 0.009443 0.01639 0.8818 0.04174
## [12,] 0.02010 0.013664 0.02288 0.8819 0.04114

These numbers represent simulations from p(σ²_a, σ²_b, σ_ab, σ²_e, ρ | Y, X).

Consider addressing the following inferential questions about ρ:

• What is a point estimate of ρ?

• What is the uncertainty in ρ?

• What is a 95% confidence interval for ρ?


Posterior inference via Monte Carlo

rho.mcmc <- fit.srrm$SABR[, 4]

• What is a point estimate of ρ? E[ρ|Y,X]

mean(rho.mcmc)

## [1] 0.8829

• What is our uncertainty in ρ? sd(ρ|Y,X)

sd(rho.mcmc)

## [1] 0.002503

• What is a 95% confidence interval for ρ? (ρ.025, ρ.975)

quantile(rho.mcmc, prob = c(0.025, 0.975))

##   2.5%  97.5%
## 0.8775 0.8876


An example with missing data

NCAA division I men’s basketball, 2012-13 season:

EL[1:5, ]

##      [,1]        [,2]  [,3]         [,4]
## [1,] "army"      "65"  "air-force"  "76"
## [2,] "citadel"   "70"  "air-force"  "77"
## [3,] "air-force" "102" "western-st" "68"
## [4,] "air-force" "86"  "montana-st" "72"
## [5,] "colorado"  "89"  "air-force"  "74"

To convert this to a simple relational dataset, let

Y_{i,j} = score_i − score_j

in the game where i hosts j.

Note that most Yi,j ’s are missing.

mean(is.na(Y))

## [1] 0.9626
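
The slides don't show how Y was built from EL; a minimal sketch is below, assuming (as the printed rows suggest) that columns 1-4 of EL are home team, home score, away team, away score. If two teams met more than once at the same site, the later game would overwrite the earlier one here.

teams <- sort(unique(c(EL[, 1], EL[, 3])))
Y <- matrix(NA, length(teams), length(teams),
            dimnames = list(teams, teams))
for (g in 1:nrow(EL)) {
    # Y[i, j] = score_i - score_j, with i the (assumed) host
    Y[EL[g, 1], EL[g, 3]] <- as.numeric(EL[g, 2]) - as.numeric(EL[g, 4])
}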


Some PAC-12 data

someteams <- c("washington", "washington-st", "oregon", "oregon-st", "california")
Y[match(someteams, rownames(Y)), match(someteams, rownames(Y))]

##               washington washington-st oregon oregon-st california
## washington            NA            NA    -13        10         NA
## washington-st         -5            NA     -2        -1         NA
## oregon                 5             7     NA        NA         -2
## oregon-st              8            -3    -13        NA         -1
## california           -15            13      4         3         NA


Descriptive analysis

[Figures: histogram of the point spreads; normal Q-Q plot of the point spreads (sample vs. theoretical quantiles); scatterplot of home win margin vs. away loss margin, with selected teams labeled (belmont, binghamton, florida, gonzaga, grambling, indiana, lamar, longwood, louisville, new-orleans, south-carolina-st, stephen-f-austin, umes, vcu).]

Model fitting with amen

fit.srm <- ame_nrm(Y, nscan = 5000)

Home court advantage:

beta.psamp <- c(fit.srm$BETA)
mean(beta.psamp)

## [1] 3.459

sd(beta.psamp)

## [1] 0.173

quantile(beta.psamp, prob = c(0.025, 0.975))

##  2.5% 97.5%
## 3.081 3.754


Model fitting with amen

SRM terms:

SABR <- apply(fit.srm$SABR, 2, mean)
SABR

##        va        cab        vb       rho         ve
## 76.616607 -72.626413 70.077548 -0.006638 105.824642

Sab.hat <- matrix(c(SABR[1], SABR[2], SABR[2], SABR[3]), 2, 2)
Sab.hat

##        [,1]   [,2]
## [1,]  76.62 -72.63
## [2,] -72.63  70.08

cov2cor(Sab.hat)

##         [,1]    [,2]
## [1,]  1.0000 -0.9912
## [2,] -0.9912  1.0000


Predictions of missing data

We can obtain predictions for missing data:

Ê[Y_{i,j}] = µ̂ + â_i + b̂_j

EY <- mean(fit.srm$BETA) + outer(fit.srm$A, fit.srm$B, "+")
rownames(EY) <- rownames(Y)
colnames(EY) <- colnames(Y)
EY[match(someteams, rownames(Y)), match(someteams, rownames(Y))]

##               washington washington-st oregon oregon-st california
## washington         3.229         5.868 -1.498     5.530     0.8577
## washington-st      1.255         3.894 -3.472     3.556    -1.1166
## oregon             8.789        11.428  4.062    11.090     6.4174
## oregon-st          1.172         3.812 -3.555     3.473    -1.1992
## california         5.950         8.589  1.223     8.251     3.5786


Some limitations

a.hat <- fit.srm$A
b.hat <- fit.srm$B
amb.hat <- a.hat - b.hat
teams[order(amb.hat, decreasing = TRUE)[1:10]]

##  [1] "indiana"    "florida"    "louisville" "duke"       "gonzaga"
##  [6] "syracuse"   "michigan"   "pittsburgh" "wisconsin"  "kansas"

• Some lesser-known teams are in weak conferences:
  • A decent team is capable of running up the score on a very weak team.

• The estimation routine assumes the data are missing at random:
  • This assumption is almost certainly violated.
  • The estimation procedure would be better suited to modeling outcomes of teams from conferences of a similar quality.
  • Adding conference as a nodal characteristic may make the fit more realistic.


Social relations models for binary, ranked and ordinal relations
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Adapting the normal SRM

Relational data are often valued, non-binary.

The standard SRM is appropriate for normal valued relational data.

However, relational data are often neither binary nor normal:

• Links can be absent or continuous within the same network:
  • meaning Y_{i,j} is either 0 or some arbitrary real number;
  • communication networks (time spent communicating).

• Links can be absent or ordinal within the same network:
  • conflict networks (negative, positive or zero relation);
  • ranked nominations (friends are ranked, non-friends are not).

The SRM can be adapted via ordinal probit models to handle such data.


Ordinal data

An ordinal variable has a meaningful ordering to the possible outcomes.

This is in contrast to a categorical (non-ordered) variable.

Ordinal variable

• continuous (all real numbers)
• discrete (counts, ranks, etc.)

Categorical variable

• non-orderable categories (religion, ethnicity, etc.)


Binary variable as ordinal

The simplest ordinal variable is a binary random variable Y ∈ {0, 1}. Let

Pr(Y = 1) = θ

Pr(Y = 0) = 1− θ

This model for Y has the following latent variable representation:

µ = Φ−1(θ)

Z ∼ N(µ, 1)

Y = 1× (Z > 0)

Here, Φ−1(θ) is the θ-quantile of the standard normal distribution.
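
A short simulation (not from the slides) confirms the representation for θ = 0.15:

theta <- 0.15
mu <- qnorm(theta)            # Phi^{-1}(0.15) = -1.04
z <- rnorm(1e5, mean = mu)    # latent Z ~ N(mu, 1)
mean(z > 0)                   # approximately theta = 0.15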


Probit representation

To confirm the representation, recall

• If Z ∼ N(µ, 1) then Z − µ ∼ N(0, 1);

• If Z ∼ N(µ, 1) then Z = µ+ ε, where ε ∼ N(0, 1);

• If ε ∼ N(0, 1) then −ε ∼ N(0, 1).

Now do the calculation:

Pr(Y = 1) = Pr(Z > 0)

= Pr(−Z < 0)

= Pr([−Z + µ] < µ)

= Pr(ε < µ) = Φ(µ) = θ


Probit representation

θ = .15, µ = Φ−1(.15) = −1.040.

[Figure: the latent density p(z) for Z ~ N(−1.04, 1), and y = 1 × (z > 0) as a step function of z.]

Probit regression

Now suppose we have binary data Y_1, . . . , Y_n that we want to relate to explanatory variables x_1, . . . , x_n.

Latent variable model:

ε1, . . . , εn ∼ i.i.d. N(0, 1)

Zi = βTxi + εi

Yi = 1× (Zi > 0)

Under this latent variable model, the Yi ’s are independent and

Pr(Yi = 1) = Pr(Zi > 0)

= Pr(βTxi + εi > 0)

= Pr(−εi < βTxi )

= Pr(εi < βTxi ) = Φ(βTxi )


Probit regression

This latent variable model is exactly the same as probit regression:

Pr(Y_1 = y_1, . . . , Y_n = y_n | x_1, . . . , x_n) = ∏_{i=1}^n Φ(β^T x_i)^{y_i} [1 − Φ(β^T x_i)]^{1−y_i}

Compare to logistic regression:

Pr(Y_1 = y_1, . . . , Y_n = y_n | x_1, . . . , x_n) = ∏_{i=1}^n ( e^{β^T x_i} / (1 + e^{β^T x_i}) )^{y_i} ( 1 / (1 + e^{β^T x_i}) )^{1−y_i}

In fact, logistic regression has a latent variable representation also:

ε_1, . . . , ε_n ~ i.i.d. L(0, 1), the standard logistic distribution

Z_i = β^T x_i + ε_i

Y_i = 1 × (Z_i > 0)


Logistic versus probit regression

Sheep dominance data

• counts of dominance encounters between 28 male sheep;

• age of each sheep, in years.

Let’s examine the effect of age difference on dominance.


Sheep dominance data

Let

• dom[i,j] = indicator that i has dominated j at least once;

• aged[i,j] = age_i − age_j.

mean(aged[dom == 1], na.rm = TRUE)

## [1] 2.184

mean(aged[dom == 0], na.rm = TRUE)

## [1] -1.079

summary(glm(dom ~ aged, family = binomial))$coef

##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.8358    0.08729  -9.575 1.022e-21
## aged          0.2260    0.02317   9.755 1.760e-22

summary(glm(dom ~ aged, family = binomial(link = probit)))$coef

##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.5138    0.05112  -10.05 8.973e-24
## aged          0.1386    0.01337   10.36 3.601e-25


Logit, probit and other binary regression models

Comparing logistic and probit regression output:

• β’s are on different scales;

• inference (z-scores) are typically similar.

Both models are binary regression models of the form:

Pr(Y_1 = y_1, . . . , Y_n = y_n | x_1, . . . , x_n) = ∏_{i=1}^n g^{−1}(β^T x_i)^{y_i} [1 − g^{−1}(β^T x_i)]^{1−y_i},

where g^{−1} is called the inverse-link function.

Link functions:

• logistic regression: g−1(µ) = exp(µ)/[1 + exp(µ)];

• probit regression: g−1(µ) = Φ(µ);

• other binary regression: g−1(µ) is a strictly increasing function.
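
A short check (a sketch, not from the slides) of the scale difference between the two inverse links listed above: the logistic curve nearly coincides with a probit curve whose argument is rescaled by about 1.7, which is why the sheep logit and probit coefficients differ by roughly that factor while their z-scores agree.

mu <- seq(-4, 4, length = 200)
logit.inv  <- exp(mu) / (1 + exp(mu))    # logistic inverse link
probit.inv <- pnorm(mu)                  # probit inverse link
max(abs(logit.inv - pnorm(mu / 1.7)))    # about 0.01: nearly identical curves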


SRM for binary relational data

Recall the latent variable representation of probit regression:

Zi,j = βTxi,j + εi,j

Yi,j = 1× (Zi,j > 0)

We could estimate β with a probit regression analysis, but what about network dependence in the data?

Could there be

• across-row heterogeneity/within row correlation?

• across-column heterogeneity/within column correlation?

• within dyad correlation?


SRM for binary relational data

Recall the latent variable representation of probit regression:

Zi,j = βTxi,j + εi,j

Yi,j = 1× (Zi,j > 0)

Dependence on the Y-scale can be induced by dependence on the Z-scale:

ε_{i,j} = a_i + b_j + e_{i,j}

{(a_1, b_1), . . . , (a_n, b_n)} ~ i.i.d. N(0, Σ_ab)

{(e_{i,j}, e_{j,i}) : i ≠ j} ~ i.i.d. N(0, Σ_e)

Σ_ab = ( σ²_a  σ_ab ; σ_ab  σ²_b ),    Σ_e = ( 1  ρ ; ρ  1 )

Note that the variance of ei,j is fixed at 1.

The scales of ei,j and β are not separately identifiable.


SRM for binary relational data

SRM probit model:

Yi,j = 1× (Zi,j > 0)

Zi,j = βTxi,j + εi,j

εi,j = ai + bj + ei,j ,

and {ai , bj , ei,j} are random effects as described previously.

Parameter estimation:

• The likelihood can’t be expressed in closed form;

• Bayesian parameter estimates can be obtained via MCMC.

The latter is provided in the package amen.


Binary SRM in amen

ame_bin package:amen R Documentation

AME fit for binary relational data

Description:

An MCMC routine providing a fit to an additive and multiplicative effects (AME) regression model for binary relational data

Usage:

ame_bin(Y, X, rvar = TRUE, cvar = TRUE, dcor = TRUE, R = 0, seed = 1, nscan = 50000, burn = 500, odens = 25, plot = TRUE, print = TRUE)

Arguments:

Y: an n x n square relational matrix

X: an n x n x p array of covariates

rvar: logical: fit row random effects?

cvar: logical: fit column random effects?

dcor: logical: fit a dyadic correlation?


SRM probit in amen

fit_ame_bin <- ame_bin(YB, XA, nscan = 10000, plot = FALSE)

[Figure: MCMC trace and goodness-of-fit plots for fit_ame_bin: SABR, BETA, reciprocity, transitive triples, and degree distributions.]

Posterior analysis

The summary command provides a canned summary of the fitted model:

summary(fit_ame_bin)

##
## beta:
##                 pmean   psd z-stat p-val
## intercept.dyad -0.639 0.172 -3.722     0
## agediff.dyad    0.199 0.033  6.024     0
##
## Sigma_ab pmean:
##        a      b
## a  0.573 -0.046
## b -0.046  0.170
##
## rho pmean:
## -0.377


Posterior analysis

Confidence intervals can be obtained via the quantile command:

apply(fit_ame_bin$BETA, 2, quantile, prob = c(0.025, 0.5, 0.975))

##       intercept.dyad agediff.dyad
## 2.5%         -0.9743       0.1382
## 50%          -0.6338       0.1983
## 97.5%        -0.3037       0.2684

apply(fit_ame_bin$SABR, 2, quantile, prob = c(0.025, 0.5, 0.975))

##           va      cab      vb     rho
## 2.5%  0.2902 -0.24559 0.08231 -0.5760
## 50%   0.5240 -0.03907 0.15603 -0.3828
## 97.5% 1.0793  0.09558 0.32863 -0.1455


Posterior inference

Summary of results:

Strong evidence of an age effect:

• Older sheep dominate younger ones.

Row and column heterogeneity is not substantial:

• Compare σ²_a = 0.52 and σ²_b = 0.16 to the error variance σ²_e = 1.

There is evidence that dominance tends to go one-way in a dyad:

• The 95% CI for ρ is (−0.58, −0.15).


Goodness of fit

The default amen plot provides four goodness of fit plots:

Outdegree distribution: Comparing outdegree distribution to simulated;

Indegree distribution: Comparing indegree distribution to simulated;

Reciprocity: Comparing fraction reciprocated ties to simulated fraction;

Transitivity: Comparing number of triangles to simulated number.

These statistics are computed with the commands

t_degree
t_recip
t_trans


Model comparison

Let’s use these statistics to evaluate the fit of three models:

fit_bin_000: the probit model, fixing σ²_a = σ²_b = ρ = 0.

fit_bin_001: dyadic correlation ρ estimated, σ²_a = σ²_b = 0.

fit_bin_111: the full SRM covariance model.


Model comparison

fit_bin_000 <- ame_bin(YB, XA, rvar = FALSE, cvar = FALSE, dcor = FALSE)

[Figure: MCMC trace and goodness-of-fit plots for fit_bin_000.]

Model comparison

fit_bin_001 <- ame_bin(YB, XA, rvar = FALSE, cvar = FALSE, dcor = TRUE)

[Figure: MCMC trace and goodness-of-fit plots for fit_bin_001.]

Model comparison

fit_bin_111 <- ame_bin(YB, XA)

[Figure: MCMC trace and goodness-of-fit plots for fit_bin_111.]

Model comparison

The dyadic correlation model fit_bin_001 looks pretty good based on these statistics, except possibly for the transitivity statistic (more on that soon).


Ordered probit models for ordinal data

The original sheep data consists of counts:

YV[1:8, 1:8]

##      V1 V2 V3 V4 V5 V6 V7 V8
## [1,] NA  0  0  0  0  0  0  1
## [2,]  0 NA  0  0  5  2  1  0
## [3,]  0  0 NA  0  7  4  0  0
## [4,]  0  0  8 NA  0  0  0  0
## [5,]  0  0  0  0 NA  1  0  0
## [6,]  0  0  0  7  0 NA  0  0
## [7,]  0  0  1  0  0  0 NA  0
## [8,]  0  1  1  4  5  3  0 NA

Ideally, this information should be taken into consideration.

• In principle, throwing away information is not efficient.

• Effects of some covariates might distinguish high values of the relation.


Ordered probit models for ordinal data

Latent variable model:

Zi,j = βT xi,j + εi,j

εi,j = ai + bj + ei,j

Binary probit:

Y_{i,j} = 0 if Z_{i,j} < 0, and Y_{i,j} = 1 if Z_{i,j} > 0.

What if Yi,j ∈ {y1, . . . , yK} = Y?

Here, Y is an ordered, countable set of possible values of Yi,j .


Ordered probit models for ordinal data

Latent variable model:

Zi,j = βT xi,j + εi,j

εi,j = ai + bj + ei,j

Ordered probit:

Y_{i,j} = y_1 if Z_{i,j} ∈ (−∞, c_1)
        = y_2 if Z_{i,j} ∈ (c_1, c_2)
        ...
        = y_{K−1} if Z_{i,j} ∈ (c_{K−2}, c_{K−1})
        = y_K if Z_{i,j} ∈ (c_{K−1}, ∞)
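
In generative form this is just a binning of the latent scale. A sketch (the cutpoint values below are illustrative assumptions, not estimates):

set.seed(1)
cuts <- c(-Inf, 0, 1, 2, Inf)                 # c_1, ..., c_{K-1} padded with -Inf, Inf
z <- rnorm(1000)                              # latent Z (no regression term here)
y <- cut(z, breaks = cuts, labels = FALSE)    # y in {1, ..., K} with K = 4
table(y)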


Ordered probit models for ordinal data

table(c(YB))

##
##   0   1
## 506 250

table(c(YV))

##
##   0   1   2   3   4   5   6   7   8   9  10  11  12
## 506 100  59  34  20  13   5   7   5   1   3   1   2

round(table(c(YV))/sum(table(c(YV))), 3)

##
##     0     1     2     3     4     5     6     7     8     9    10    11    12
## 0.669 0.132 0.078 0.045 0.026 0.017 0.007 0.009 0.007 0.001 0.004 0.001 0.003


Link function for ordered probit

[Figure: the latent normal density p(z) with cutpoints, and the induced ordinal y as a step function of z.]

Fitting ordered probit models in amen

ame_ord package:amen R Documentation

AME fit for ordinal relational data

Description:

An MCMC routine providing a fit to an additive and multiplicative effects (AME) regression model for ordinal relational data

Usage:

ame_ord(Y, X, rvar = TRUE, cvar = TRUE, dcor = TRUE, R = 0, seed = 1, nscan = 50000, burn = 500, odens = 25, plot = TRUE, print = TRUE)

Arguments:

Y: an n x n square relational matrix

X: an n x n x p array of covariates

rvar: logical: fit row random effects?

cvar: logical: fit column random effects?

dcor: logical: fit a dyadic correlation?


SRM probit in amen

fit_ame_ord <- ame_ord(YV, XA[, , -1], nscan = 10000, plot = FALSE)

[Figure: MCMC trace and goodness-of-fit plots for fit_ame_ord.]

Posterior analysis

Summary:

summary(fit_ame_ord)

##
## beta:
##                 pmean   psd z-stat p-val
## intercept.dyad -0.639 0.172 -3.722     0
## agediff.dyad    0.199 0.033  6.024     0
##
## Sigma_ab pmean:
##        a      b
## a  0.573 -0.046
## b -0.046  0.170
##
## rho pmean:
## -0.377


Posterior analysis

Confidence intervals

apply(fit_ame_ord$BETA, 2, quantile, prob = c(0.025, 0.5, 0.975))

##       intercept.dyad agediff.dyad
## 2.5%         -0.9743       0.1382
## 50%          -0.6338       0.1983
## 97.5%        -0.3037       0.2684

apply(fit_ame_ord$SABR, 2, quantile, prob = c(0.025, 0.5, 0.975))

##           va      cab      vb     rho
## 2.5%  0.2902 -0.24559 0.08231 -0.5760
## 50%   0.5240 -0.03907 0.15603 -0.3828
## 97.5% 1.0793  0.09558 0.32863 -0.1455

These results are very similar to those obtained from the binary probit analysis using the dichotomized data.


Dichotomizing ordinal data

Sometimes a binary model for dichotomized ordinal data is sufficient:

• Most responses are zeros, with just a few positive valued relations.

• The main “signal” in the data may be distinguishing zeros from nonzeros.

For other data, a dichotomization is problematic:

• Positive and negative relations (conflict data, liking/disliking, etc.)

• Few zero responses.


Example: Interaction recall

Data on observed and recalled interaction frequencies among 58 members of a fraternity.

Yi,j recall of frequency of interaction;

xi,j observed interaction frequency.

[Figures: proportions of each recall level 1-5; scatterplot of recall vs. observed interactions.]

SRM probit in amen

fit_recall_ord <- ame_ord(Y, X, nscan = 20000, plot = FALSE)

0 100 200 300 400

0.0

0.2

0.4

0.6

0.8

SA

BR

0 100 200 300 400

0.18

0.20

0.22

0.24

0.26

BE

TA

37/44

Page 657: Introduction - University of Washingtonpdhoff/courses/567/Notes/slides_2014.pdf¥ graph theory and visualization ¥ matrix representations 2. Descriptive network statistics and summaries

Posterior analysis

Summary:

summary(fit_recall_ord)

##
## beta:
##       pmean   psd z-stat p-val
## .dyad  0.22 0.013  17.23     0
##
## Sigma_ab pmean:
##       a     b
## a 0.437 0.070
## b 0.070 0.138
##
## rho pmean:
## 0.584


Dichotomized version

YD <- 1 * (Y > 3)
fit_recall_bin <- ame_ord(YD, X, nscan = 20000, plot = FALSE)

Summary:

summary(fit_recall_bin)

##
## beta:
##       pmean   psd z-stat p-val
## .dyad 0.247 0.017  14.95     0
##
## Sigma_ab pmean:
##       a     b
## a 1.498 0.361
## b 0.361 0.191
##
## rho pmean:
## 0.711


The role of covariance

Regression modeling:

y_{i,j} = β^T x_{i,j} + ε_{i,j}

Y = ⟨X, β⟩ + E

OLS estimation:

β̂ = (X^T X)^{−1} X^T y

Precision of OLS estimators: Let C = Cov[E]. Then

Cov[β̂] = (X^T X)^{−1} X^T C X (X^T X)^{−1}

= (X^T X)^{−1} σ² if C = σ² I

For networks and relational data, typically C ≠ σ² I. Accurate standard errors can't be obtained unless we know/estimate Cov[E].
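
A small numerical sketch of the sandwich formula (X and C below are made-up illustrations, not the slides' data): when C is far from σ²I, the naive OLS standard errors can be badly off.

set.seed(1)
X <- cbind(1, rnorm(20))                        # a toy design matrix
C <- 0.5^abs(outer(1:20, 1:20, "-"))            # a non-diagonal Cov[E]
XtXi <- solve(t(X) %*% X)
V.sand <- XtXi %*% t(X) %*% C %*% X %*% XtXi    # correct Cov[beta-hat]
V.ols  <- XtXi                                  # what OLS reports when C = I
sqrt(diag(V.sand)) / sqrt(diag(V.ols))          # SE inflation from ignoring C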


Justifying the SRM

The social relations covariance model can be justified by a symmetry principle.

Exchangeability:

1. randomly sample n individuals from a population;

2. observe εi,j = directed relation between ith and jth person.

Consider a probability model Pr(E) for the possible outcomes of E = {ε_{i,j}}:

Pr( [ NA −0.94 0.15 ; −0.7 NA 0.63 ; 0.63 −0.42 NA ] ) =?

Pr( [ NA 0.63 −0.42 ; 0.15 NA −0.94 ; 0.63 −0.7 NA ] ) =?

Note the second matrix is the same as the first with (1, 2, 3) relabeled as (2, 3, 1).


Justifying the SRM

Let π be some permutation of {1, . . . , n}, and let

E = {ε_{i,j} : i ≠ j},    E^π = {ε_{π_i, π_j} : i ≠ j}.

Exchangeability: A probability distribution Pr(E) is exchangeable if

Pr(E) = Pr(E^π)

for all E and permutations π.

Exchangeability can be justified by

• random sampling of nodes from a population;

• symmetry in the uncertainty in the εi,j ’s.
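
Relabeling is easy to express in code; a sketch of E^π for the 3 × 3 example above (E here is any error matrix with an NA diagonal):

n <- 3
E <- matrix(rnorm(n * n), n, n); diag(E) <- NA
pi <- c(2, 3, 1)       # relabel (1, 2, 3) as (2, 3, 1)
E.pi <- E[pi, pi]      # entry (i, j) of E.pi is epsilon_{pi_i, pi_j}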


Justifying the SRM

Interesting result: Suppose

1. εi,j ’s are normal and mean zero;

2. our probability for E = {εi,j} is exchangeable.

Then

ε_{i,j} = a_i + b_j + e_{i,j}

{(a_1, b_1), . . . , (a_n, b_n)} ~ i.i.d. N(0, Σ_ab)

{(e_{i,j}, e_{j,i}) : i ≠ j} ~ i.i.d. N(0, Σ_e)

for some Σab and Σe (Li and Loken, 2002).

Interpretation: normality + exchangeability ⇒ SRM


Justifying the SRM

There is a stronger result than this: If E is exchangeable, then

ε_{i,j} = a_i + b_j + e_{i,j}

Cov[(a_i, b_i)] = Σ_ab

Cov[(e_{i,j}, e_{j,i})] = Σ_e

for some Σab,Σe .

Interpretation: exchangeability ⇒ SRM covariance structure

Implication:

Cov[β̂] = (X^T X)^{−1} X^T C X (X^T X)^{−1},

where C = Cov[(ε_{i,j}, ε_{j,i})].

Under exchangeability, the SRM provides appropriate SEs and CIs for β.


Transitivity
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Beyond second order dependence

Second order dependence: Variances, covariances and correlations all involve second order moments:

Cov[ε_{i,j}, ε_{k,l}] = E[ε_{i,j} ε_{k,l}]

Higher order dependence: Variances and covariances cannot represent higher order moments:

E[ε_{i,j} ε_{k,l} ε_{m,n}] =?

Questions: Are there higher order dependencies in network data? Is the SRM covariance structure sufficient or deficient?


Goodness of fit

Consider the following goodness of fit statistic:

t(Y) = Σ_{i,j,k} ȳ_{i,j} ȳ_{j,k} ȳ_{k,i},

where ȳ_{i,j} = 1 × (y_{i,j} + y_{j,i} > 0) is the symmetrized tie indicator.

t_trans

## function (Y)
## {
##     YS <- 1 * (Y + t(Y) > 0)
##     sm <- 0
##     for (i in 1:nrow(YS)) {
##         ci <- which(YS[i, ] > 0)
##         sm <- sm + sum(YS[ci, ci], na.rm = TRUE)
##     }
##     sm/6
## }
## <environment: namespace:amen>

t(Y) counts the number of triangles in the graph.

• the raw sum overcounts: it is really six times the number of triangles (hence the sm/6 in the code);

• it counts triangles regardless of the direction of the ties.



GOF - sheep data

[Figure: MCMC trace and goodness-of-fit plots for the sheep dominance fit.]

GOF - sheep data

[Figure: posterior predictive distribution of the transitive triples statistic, with the observed value marked.]

mean(fit$TT >= fit$tt)

## [1] 0.15



GOF - high tech managers

[Figure: MCMC trace and goodness-of-fit plots for the high tech managers fit.]

GOF - high tech managers

[Figure: posterior predictive distribution of the transitive triples statistic for the high tech managers data.]

mean(fit$TT >= fit$tt)

## [1] 0.5625



GOF - Dutch college friendships

[Figure: MCMC trace and goodness-of-fit plots for the Dutch college friendships fit.]

GOF - Dutch college friendships

[Figure: posterior predictive distribution of the transitive triples statistic for the Dutch college data.]

mean(fit$TT >= fit$tt)

## [1] 0.13



GOF - Conflict in the 90s

[Figure: MCMC trace and goodness-of-fit plots for the 90s conflict fit.]

GOF - Conflict in the 90s

[Figure: posterior predictive distribution of the transitive triples statistic for the 90s conflict data.]

mean(fit$TT >= fit$tt)

## [1] 0.01


Excess triangles and transitivity

Some evidence that the SRM may not always be sufficient:

• Networks often have more “triangles” than predicted under the model.

This corresponds with several social theories about relations:

transitivity: a social preference to be friends with your friends’ friends.

balance: a social preference to be friends with your enemies’ enemies.

homophily: a social preference to be friends with others similar to you.

These social models may not be distinguishable from network data.

Exercise: Explain why each of these may lead to triangles in a network.


Triads and triangles

A triad is an unordered subset of three nodes.

Consider for simplicity an undirected binary relation.

The node-generated subgraph of a triad is given by

Y[(i, j, k), (i, j, k)] =
( ·        y_{i,j}   y_{i,k} )
( y_{i,j}  ·         y_{j,k} )
( y_{i,k}  y_{j,k}   ·       )

3 relations and 2 possible states per relation ⇒ 2³ = 8 possible triad states.

Exercise: Draw the triad states.


Triads, two-stars and triangles

Consider a node i connected to both nodes j and k.

[Diagram: node i tied to both j and k, with the j-k tie unknown.]

What are the possibilities for this triad?

[Diagrams: a two-star (no j-k tie) or a triangle (j-k tie present).]


Triads, two-stars and triangles

For a given network, how much transitivity is there?

How many triangles occur, relative to how many are “possible”?

Triad census: A count of the number of each type of triad.

t(Y) = ( # null, # one edge, # two-star, # triangle )


Triad census

[Figure: example graph on nodes 1–5]

 i j k   type
 1 2 3   triangle
 1 2 4   one edge
 1 2 5   one edge
 1 3 4   two star
 1 3 5   one edge
 1 4 5   null
 2 3 4   two star
 2 3 5   one edge
 2 4 5   null
 3 4 5   one edge

t(Y) = (2, 5, 2, 1)


Computing the triad census

For a given triad, the state is given by the number of edges:

0 edges: null

1 edge: one edge

2 edges: two star

3 edges: triangle

tcensus <- c(0, 0, 0, 0)
for (i in 1:(n - 2)) {
  for (j in (i + 1):(n - 1)) {
    for (k in (j + 1):n) {
      Yijk <- Y[c(i, j, k), c(i, j, k)]
      nedges <- sum(Yijk, na.rm = TRUE)/2
      tcensus[nedges + 1] <- tcensus[nedges + 1] + 1
    }
  }
}

tcensus

## [1] 2 5 2 1


Computing the triad census

Nested loops will be too slow for large graphs. Here is a more efficient algorithm:

tt <- c(0, 0)  # count two-stars and triangles
for (i in 1:n) {
  i.alters <- which(Y[i, ] > 0)
  tt <- tt + c(sum(1 - Y[i.alters, i.alters], na.rm = TRUE),
               sum(Y[i.alters, i.alters], na.rm = TRUE))
}
tt/c(2, 6)

## [1] 2 1

This counts the two-stars and the triangles. The nulls and one-edges can be found by applying the algorithm to 1 − Y.
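The triad_census() used on the next slide is not shown in these notes; here is a minimal sketch consistent with the algorithm above (the packaging and names are my own assumption):

triad_census <- function(Y) {
  n <- nrow(Y)
  count2 <- function(A) {   # (two-stars, triangles) of the graph A
    tt <- c(0, 0)
    for (i in 1:n) {
      alters <- which(A[i, ] > 0)
      tt <- tt + c(sum(1 - A[alters, alters], na.rm = TRUE),
                   sum(A[alters, alters], na.rm = TRUE))
    }
    tt/c(2, 6)
  }
  tt <- count2(Y)
  cc <- count2(1 - Y)   # two-stars/triangles of the complement = one-edges/nulls of Y
  c(cc[2], cc[1], tt[1], tt[2])   # (null, one edge, two star, triangle)
}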


Computing the triad census

triad_census(Y)

## [1] 2 5 2 1

triad_census(Yht) # high tech managers

## [1] 376 509 343 102

triad_census(Ydc) # Dutch college

## [1] 2158 2016 624 162

triad_census(Ysd) # sheep dominance

## [1] 235 955 1103 983

triad_census(Y90) # 90s conflict

## [1] 338443 18221 1029 67


Evaluating transitivity

Transitivity: “more triangles than expected”

One measure of transitivity is to compare the number of triangles to the number of possible triangles:

number of triangles: number of three-tie triads.

number of possible triangles: number of two- or three-tie triads.

transitivity index = # triangles / # (triangles or two-stars)

                  ≈ Pr( y_{k,j} = 1 | y_{i,j} = 1 and y_{i,k} = 1 )

This transitivity index can be viewed as a measure of how y_{k,j} depends on y_{i,j} and y_{i,k}.


Transitivity index

tc <- triad_census(Yht)  # high tech managers
tc[4]/(tc[3] + tc[4])

## [1] 0.2292

tc <- triad_census(Ydc)  # Dutch college
tc[4]/(tc[3] + tc[4])

## [1] 0.2061

tc <- triad_census(Ysd)  # sheep dominance
tc[4]/(tc[3] + tc[4])

## [1] 0.4712

tc <- triad_census(Y90)  # 90s conflict
tc[4]/(tc[3] + tc[4])

## [1] 0.06113


Scaling the transitivity index

These results may seem counter-intuitive:

• Y90 seemed to be “more transitive” according to GOF plots;

• Y90 has the lowest transitivity index.

To what should the transitivity index be compared?


Transitivity index

tc <- triad_census(Yht)  # high tech managers
tc[4]/(tc[3] + tc[4])

## [1] 0.2292

mean(Yht, na.rm = TRUE)

## [1] 0.2429

tc <- triad_census(Y90)  # 90s conflict
tc[4]/(tc[3] + tc[4])

## [1] 0.06113

mean(Y90, na.rm = TRUE)

## [1] 0.0121


A scaled transitivity index

Intuitively, think of transitivity as the "effect" of y_{i,j} = 1 and y_{i,k} = 1 on y_{j,k}. This can be measured with a log-odds ratio. Let

• p_t = # transitive triads / ( # transitive + # two-star triads );

• p = ȳ, the overall density.

τ = log [ odds( y_{j,k} = 1 : y_{i,j} = 1, y_{i,k} = 1 ) / odds( y_{j,k} = 1 ) ]

  = log [ ( p_t / (1 − p_t) ) × ( (1 − p) / p ) ]


A scaled transitivity index

tlor <- function(Y) {
  p <- mean(Y, na.rm = TRUE)
  tc <- triad_census(Y)
  pt <- tc[4]/(tc[3] + tc[4])
  od <- p/(1 - p)
  odt <- pt/(1 - pt)
  log(odt/od)
}


A scaled transitivity index

tlor(Yht) # high tech managers

## [1] -0.07568

tlor(Ydc) # Dutch college

## [1] 0.2417

tlor(Ysd) # sheep dominance

## [1] 0.6438

tlor(Y90) # 90s conflict

## [1] 1.67

By this measure Y90 is the most transitive network, matching our intuition.


Triad census for directed data

For undirected relations in a triad there are

• 2^3 = 8 possible graphs;

• 4 graphs up to isomorphism.

For directed relations the situation is more complicated:

• 2^2 = 4 states per dyad, and 3 dyads, means

(2^2)^3 = 2^6 = 64 possible states

• 16 states up to isomorphism.
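As an aside, the directed census need not be coded by hand: the sna package provides one (assuming sna is installed; this is that package's function, not course code):

library(sna)
triad.census(Y, mode = "digraph")   # 16 counts, one per isomorphism class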


Directed triad states

Each triad state is named according to its dyad census:

[Diagrams: triad types 003, 012, 102]


Transitivity

Transitive triple: Consider a triple {i, j, k} for which i → j and j → k. The triple is

transitive if i → k;

intransitive if i ↛ k.

Social theory: Nodes seek out transitive relations, avoid intransitive ones.


Transitivity

Based on the definition, 003, 012, 102 and the following triads are neither transitive nor intransitive:

[Diagrams: triad types 021D, 021U]


Transitivity

Any triad with a null dyad cannot be transitive:

[Diagrams: triad types 021C, 111D, 111U, 201]


Transitivity

Triads with no null dyads can be intransitive, transitive or mixed:

[Diagrams: triad types 030C, 120C, 210]

Which of these is intransitive? Why are the others “mixed”?


Transitivity

The triads below are “transitive” in that they all have

• some transitivity;

• no intransitivity.

[Diagrams: triad types 030T, 120U, 120D, 300]


Modeling transitivity in ERGMs

Recall the basic ERGM:

Pr(Y = y) = c(θ) exp( θ_1 t_1(y) + · · · + θ_K t_K(y) )

Evaluation of various forms of transitivity is accomplished by including such sufficient statistics among t_1(y), . . . , t_K(y).

Let’s try this out through a model fitting exercise:

1. Y ∼ edges + mutual

2. Y ∼ edges + mutual + transitivity

where transitivity is some network statistic involving triples.


Monk data

Relation: y_{i,j} = 1 if i "liked" j at any one of the three time periods.

[Figure: the monk liking network]

tlor(Y)

## [1] -0.333


ERGM fit

fit_erg1 <- ergm(Y ~ edges + mutual)

## Iteration 1 of at most 20:

## Convergence test P-value: 6.6e-01

## Convergence detected. Stopping.

## The log-likelihood improved by < 0.0001

##

## This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.

summary(fit_erg1)

##

## ==========================

## Summary of model fit

## ==========================

##

## Formula: Y ~ edges + mutual

##

## Iterations: 20

##

## Monte Carlo MLE Results:

## Estimate Std. Error MCMC % p-value

## edges -1.761 0.205 0 <1e-04 ***

## mutual 2.319 0.412 0 <1e-04 ***

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##

## Null Deviance: 424 on 306 degrees of freedom

## Residual Deviance: 333 on 304 degrees of freedom

##

## AIC: 337 BIC: 344 (Smaller is better.)


GOF

gof_1 <- gof(fit_erg1, GOF = ~idegree + odegree + triadcensus)

[Figure: GOF plots for fit_erg1 — simulated vs. observed in-degree distribution, out-degree distribution, and triad census (proportion of nodes / proportion of triads)]

degree distributions: The fitted model generates degree distributions similar to the observed;

triad census: The fitted model generates a triad census similar to the observed, perhaps with a few discrepancies (021U, 111D, 111U and 201).


Incorporating transitivity in ERGMs

fit_erg2 <- ergm(Y ~ edges + mutual + transitive)
fit_erg3 <- ergm(Y ~ edges + mutual + triadcensus)
fit_erg4 <- ergm(Y ~ edges + mutual + triadcensus(10))

transitive: Includes the network statistic

t(y) = number of transitive triples

where a transitive triple is one of the types 030T, 120U, 120D or 300. (Note: none of these seemed to stand out in the monk GOF plots.)

triadcensus: Includes 15 network statistics counting the number of triples of each type (excluding null triples).

triadcensus(k): Includes the network statistic

t(y) = number of triples of type k


Counting transitive triples

fit_erg2 <- ergm(Y ~ edges + mutual + transitive)

## Iteration 1 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.83

## Iteration 2 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.83

## Iteration 3 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.85

## Iteration 4 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.87

## Iteration 5 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.91

## Iteration 6 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.98

## Iteration 7 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 17.08

## Iteration 8 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 17.24

## Iteration 9 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 17.58

## Iteration 10 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 13.87

## Iteration 11 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 8.722

## Iteration 12 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.54

## Iteration 13 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.81

## Iteration 14 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.81

## Iteration 15 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.81

## Iteration 16 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.81

## Iteration 17 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.81

## Iteration 18 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.81

## Iteration 19 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.81

## Iteration 20 of at most 20:

## Convergence test P-value: 0e+00

## The log-likelihood improved by 16.81

##

## This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.


Counting transitive triples

as.matrix(simulate(fit_erg2))

## (output: the 18 × 18 simulated adjacency matrix among the monks — ROMUL, BONAVEN, ..., SIMP —
##  has every off-diagonal entry equal to 1; the fitted model simulates the complete graph)


Goodness of fit

gof_2 <- gof(fit_erg2, GOF = ~idegree + odegree + triadcensus)

[Figure: GOF plots for fit_erg2 — simulated vs. observed in-degree, out-degree, and triad census distributions]


Model degeneracy

This strange phenomenon is (as far as I know) not a coding error:

• MLEs for ERGMs sometimes produce degenerate distributions;

• A degenerate distribution puts most of its probability on the null or full graph;

• See “Assessing Degeneracy in Statistical Models of Social Networks”(Handcock, 2003) for more details.

• Recent research on Bayesian ERGM estimation avoids such degeneracy.


Incorporating transitivity in ERGMs

summary(fit_erg3)

##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: Y ~ edges + mutual + triadcensus
##
## Iterations: 20
##
## Monte Carlo MLE Results:
##                   Estimate Std. Error MCMC %  p-value
## edges                5.657      0.205     NA  < 1e-04 ***
## mutual             -13.175      0.125     NA  < 1e-04 ***
## triadcensus.012     -0.126      0.164     NA  0.44275
## triadcensus.102      0.723      0.281     NA  0.01069 *
## triadcensus.021D    -1.445      0.306     NA  < 1e-04 ***
## triadcensus.021U    -0.402      0.363     NA  0.26948
## triadcensus.021C    -0.481      0.302     NA  0.11259
## triadcensus.111D     0.519      0.435     NA  0.23366
## triadcensus.111U    -0.765      0.176     NA  < 1e-04 ***
## triadcensus.030T    -1.684      0.449     NA  0.00021 ***
## triadcensus.030C      -Inf         NA     NA       NA
## triadcensus.201     -0.116      0.207     NA  0.57399
## triadcensus.120D     0.231      0.474     NA  0.62569
## triadcensus.120U    -1.410      0.511     NA  0.00616 **
## triadcensus.120C    -0.982      0.466     NA  0.03577 *
## triadcensus.210     -0.814      0.320     NA  0.01148 *
## triadcensus.300     -1.112      0.818     NA  0.17515
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Warning: The standard errors are suspect due to possible poor convergence.
##
## Null Deviance: 424 on 306 degrees of freedom
## Residual Deviance: NaN on 289 degrees of freedom
##
## AIC: NaN BIC: NaN (Smaller is better.)
##
## Warning: The following terms have infinite coefficient estimates:
## triadcensus.030C


Incorporating transitivity in ERGMs

[Figure: GOF plots for fit_erg3 — simulated vs. observed in-degree, out-degree, and triad census distributions]

• The degree distribution fit is improved; the triad fit is near perfect.
  • Is this latter fact surprising?

• The number of parameters in the model is quite large.
  • 15 additional parameters to represent 3rd-order dependence.

• What about a reduced model?


Backwards elimination

summary(fit_erg4)

##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: Y ~ edges + mutual + triadcensus(c(2, 3, 6, 8, 10))
##
## Iterations: 20
##
## Monte Carlo MLE Results:
##                   Estimate Std. Error MCMC %  p-value
## edges                1.152      1.639     45    0.483
## mutual              -7.898      5.930     45    0.184
## triadcensus.102      0.772      0.381     42    0.043 *
## triadcensus.021D    -0.798      0.464     22    0.087 .
## triadcensus.111D     0.792      0.351     31    0.025 *
## triadcensus.030T    -0.706      0.625     21    0.259
## triadcensus.201      0.282      0.430     37    0.512
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 424 on 306 degrees of freedom
## Residual Deviance: 312 on 299 degrees of freedom
##
## AIC: 326 BIC: 352 (Smaller is better.)


Backwards elimination

summary(fit_erg5)

##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: Y ~ edges + mutual + triadcensus(c(2, 6))
##
## Iterations: 20
##
## Monte Carlo MLE Results:
##                   Estimate Std. Error MCMC %  p-value
## edges               -0.619      0.549     14   0.2603
## mutual              -2.256      1.630     13   0.1672
## triadcensus.102      0.500      0.151     12   0.0010 **
## triadcensus.111D     0.542      0.192      1   0.0051 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 424 on 306 degrees of freedom
## Residual Deviance: 317 on 302 degrees of freedom
##
## AIC: 325 BIC: 340 (Smaller is better.)


Goodness of fit

[Figure: GOF plots for the reduced model — simulated vs. observed in-degree, out-degree, and triad census distributions]


Fishing expeditions

Notice that mutual is no longer "significant":

• this does not mean that there is not much reciprocity in the network;

• it means that reciprocity can be explained by a tendency for 102 and 111D triples.

Is such a tendency meaningful/interpretable?

[Diagrams: triad types 111D and 102]


Fishing expeditions

Comments: Iterative model selection procedures

• produce parsimonious descriptions of the network dataset;

• produce p-values and standard errors that may be misleading:
  • in a large set of model statistics, some will appear significant due to chance.

Advice:

• descriptive modeling: choose a parsimonious, interpretable model.

• hypothesis testing: choose your models to reflect your hypotheses.


Latent variable models

567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Example: Conflict in the 90s

[Figure: the 90s conflict network, nodes labeled by country code]

Reciprocity and transitivity

p <- mean(Y, na.rm = TRUE)
pr <- mean(Y[t(Y) == 1], na.rm = TRUE)
log((pr/(1 - pr)) * ((1 - p)/p))

## [1] 3.365

tlor(Y)

## [1] 0.9407


GOF - Conflict in the 90s

[Figure: GOF plots — SABR, BETA, reciprocity, transitive triples, outdegree count, indegree count]

GOF - Conflict in the 90s

[Figure: density histogram of the simulated statistic tH]

mean(fit_p0$tt <= fit_p0$TT)

## [1] 0.0075


Transitivity explained by dyadic covariates

y_{i,j} ∼ β_0 + β_1 x_{i,j,1} + β_2 x_{i,j,2} + β_3 x_{i,j,3}

• x_{i,j,1} = polity interaction;

• x_{i,j,2} = shared IGOs;

• x_{i,j,3} = geographic distance.


GOF - Conflict in the 90s

[Figure: GOF plots for the covariate model — SABR, BETA, reciprocity, transitive triples, outdegree count, indegree count]

GOF - Conflict in the 90s

[Figure: density histogram of the simulated statistic tH]

mean(fit_p3$tt <= fit_p3$TT)

## [1] 0.325


Dyadic functions of nodal characteristics

Adding these dyadic covariates improved the fit with regard to # of triangles.

Note that each of these dyadic covariates was a function of nodal covariates:

• x_{i,j,1} = f_1(polity_i, polity_j)

• x_{i,j,2} = f_2(igo_i, igo_j)

• x_{i,j,3} = f_3(location_i, location_j)

This suggests models of the form

y_{i,j} ∼ β_0 + β_1 x_{i,j}, where x_{i,j} = s(x_i, x_j)

and s(·, ·) is some (non-additive) function of the nodal characteristics x_i, x_j.


Homophily and stochastic equivalence

More generally, let

u_i be a covariate of i as a sender of ties;

v_j be a covariate of j as a receiver of ties.

y_{i,j} ∼ β_0 + β_1 × s(u_i, v_j)

Such a model can describe various types of higher-order dependence, including

transitivity

stochastic equivalence


Transitivity via homophily

We’ve discussed how homophily on covariates might explain transitivity:

y_{i,j} ∼ β_0 + β_1 × s(x_i, x_j)

If β_1 > 0, then

• y_{i,j} = 1 ⇒ x_i ≈ x_j;

• y_{i,k} = 1 ⇒ x_i ≈ x_k;

• x_i ≈ x_j and x_i ≈ x_k ⇒ x_j ≈ x_k;

• x_j ≈ x_k ⇒ y_{j,k} = 1.


Stochastic equivalence

Returning to the more general model:

y_{i,j} ∼ β_0 + β_1 × s(u_i, v_j)

If u_i = u_k, then i and k are equivalent as senders in terms of the model.

Example: Logistic regression

Pr(Y_{i,j} = 1) = exp( β_0 + β_1 s(u_i, v_j) ) / ( 1 + exp( β_0 + β_1 s(u_i, v_j) ) )

If u_i = u_k = u, then

Pr(Y_{i,j} = 1) = exp( β_0 + β_1 s(u, v_j) ) / ( 1 + exp( β_0 + β_1 s(u, v_j) ) ) = Pr(Y_{k,j} = 1),

and nodes i and k are stochastically equivalent (as senders).


Homophily and stochastic equivalence

In our hypothetical model of social relations, we've seen how nodal characteristics relate to homophily and stochastic equivalence.

Homophily: Similar nodes link to each other

• “similar” in terms of characteristics (potentially unobserved)

• homophily leads to transitive or clustered social networks

• observed transitivity may be due to exogenous or endogenous factors

( See Shalizi and Thomas 2010 for a more careful discussion )

Stochastic equivalence: Similar nodes have similar relational patterns

• similar nodes may or may not link to each other

• equivalent nodes can be thought of as having the same “role”


Visualizing stochastic equivalence

[Figures: two example networks]

For which network is homophily a plausible explanation?

Which network exhibits a large degree of stochastic equivalence?


Homophily and stochastic equivalence in real networks

[Figures: three networks, described below]

• AddHealth friendships: friendships among 247 12th-graders

• Word neighbors in Genesis: neighboring occurrences among 158 words

• Protein binding interactions: binding patterns among 230 proteins


Measuring transitivity and stochastic equivalence

Transitivity

• number of triangles;

• number of transitive triples;

• triad census.

These measure transitivity as a global phenomenon.

Stochastic equivalence

• row correlation: ρ^r_{i,j} = cor(y_{i·}, y_{j·});

• column correlation: ρ^c_{i,j} = cor(y_{·i}, y_{·j});

• row correlations = column correlations if the relation is undirected.

These correlations represent a local phenomenon.
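The Cor() used on the next few slides is not defined in these notes; a minimal sketch of a row-correlation helper consistent with its use (the implementation is my assumption; pairwise-complete observations handle the NA diagonal):

Cor <- function(Y) cor(t(Y), use = "pairwise.complete.obs")   # Cor(Y)[i,j] = cor(y[i,], y[j,])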


Identifying correlated nodes

Example: Cold war data

[Figure: Cold War relations network, nodes labeled by country code]

Identifying correlated nodes

cidx <- c(4, 11, 21, 22, 23, 62, 63, 64)
rownames(Y)[cidx]

## [1] "AUL" "CHN" "FRN" "GDR" "GFR" "UKG" "USA" "USR"

Cor(Y)[cidx, cidx]

##          AUL      CHN      FRN      GDR     GFR     UKG     USA     USR
## AUL  1.00000 -0.62989  0.03983 -0.48604  0.3819  0.6440  0.8023 -0.3929
## CHN -0.62989  1.00000  0.06979  0.34001 -0.3497 -0.3502 -0.3640  0.5946
## FRN  0.03983  0.06979  1.00000 -0.01605  0.1881  0.2016  0.2808  0.1861
## GDR -0.48604  0.34001 -0.01605  1.00000 -0.6381 -0.7727 -0.4662  0.8170
## GFR  0.38190 -0.34973  0.18810 -0.63812  1.0000  0.7246  0.3625 -0.6622
## UKG  0.64397 -0.35018  0.20162 -0.77267  0.7246  1.0000  0.6809 -0.6246
## USA  0.80230 -0.36397  0.28079 -0.46622  0.3625  0.6809  1.0000 -0.1250
## USR -0.39290  0.59464  0.18609  0.81700 -0.6622 -0.6246 -0.1250  1.0000


Correlations of correlations

Cor(Cor(Y))[cidx, cidx]

##         AUL     CHN     FRN     GDR     GFR     UKG     USA     USR
## AUL  1.0000 -0.8208  0.3998 -0.7853  0.7530  0.8639  0.9233 -0.7801
## CHN -0.8208  1.0000 -0.2121  0.6858 -0.6995 -0.7356 -0.6625  0.7795
## FRN  0.3998 -0.2121  1.0000 -0.2415  0.3468  0.4155  0.4289 -0.1494
## GDR -0.7853  0.6858 -0.2415  1.0000 -0.9206 -0.9387 -0.7622  0.8865
## GFR  0.7530 -0.6995  0.3468 -0.9206  1.0000  0.9293  0.7205 -0.8927
## UKG  0.8639 -0.7356  0.4155 -0.9387  0.9293  1.0000  0.8599 -0.8558
## USA  0.9233 -0.6625  0.4289 -0.7622  0.7205  0.8599  1.0000 -0.6912
## USR -0.7801  0.7795 -0.1494  0.8865 -0.8927 -0.8558 -0.6912  1.0000


Correlations of correlations

Cor(Cor(Cor(Cor(Y))))[cidx, cidx]

##         AUL     CHN     FRN     GDR     GFR     UKG     USA     USR
## AUL  1.0000 -0.9992  0.9839 -0.9969  0.9962  0.9973  0.9990 -0.9985
## CHN -0.9992  1.0000 -0.9771  0.9949 -0.9943 -0.9949 -0.9966  0.9979
## FRN  0.9839 -0.9771  1.0000 -0.9886  0.9887  0.9905  0.9897 -0.9843
## GDR -0.9969  0.9949 -0.9886  1.0000 -0.9998 -0.9998 -0.9985  0.9992
## GFR  0.9962 -0.9943  0.9887 -0.9998  1.0000  0.9997  0.9978 -0.9990
## UKG  0.9973 -0.9949  0.9905 -0.9998  0.9997  1.0000  0.9991 -0.9990
## USA  0.9990 -0.9966  0.9897 -0.9985  0.9978  0.9991  1.0000 -0.9984
## USR -0.9985  0.9979 -0.9843  0.9992 -0.9990 -0.9990 -0.9984  1.0000


CONCOR

Iteratively computing the correlations until convergence is known as CONCOR.

CONCOR is an algorithm that partitions the nodes into two groups based on correlations.

• two nodes with similar relations generally go into the same group;

• the procedure will always produce two groups.
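A minimal CONCOR sketch along these lines (my own illustration, not course code):

concor2 <- function(Y, maxit = 100, tol = 1e-6) {
  C <- cor(t(Y), use = "pairwise.complete.obs")   # start from row correlations
  for (it in 1:maxit) {
    C <- cor(C, use = "pairwise.complete.obs")    # correlations of correlations
    if (max(abs(abs(C) - 1)) < tol) break         # entries have converged to +/-1
  }
  ifelse(C[, 1] > 0, 1, 2)   # group nodes by sign of correlation with node 1
}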


Identifying correlated nodes

Example: Cold war data

[Figure: Cold War network drawn with the two CONCOR groups separated]

Blockmodels

A blockmodel is

• a partition of the nodes into classes;

• an estimation of the rate of ties between and within classes.

Specifically, a blockmodel consists of

• a classification function c : {1, . . . , n} → {1, . . . , K}, i.e. c_i = k means node i is in block/group k;

• a between-group tie density matrix:

Θ = ( θ_{11} · · · θ_{1K} )
    (   ⋮     ⋱     ⋮    )
    ( θ_{K1} · · · θ_{KK} )

Under this model, Pr(Y_{i,j} = 1) = θ_{c_i, c_j}.

Note: When the classes/blocks are defined by known binary covariates, the blockmodel is essentially the same as binary regression with indicators for block memberships.


Stochastic equivalence and blockmodels

The blockmodel is a model of the form we've been discussing:

y_{i,j} ∼ β_0 + β_1 s(x_i, x_j)

Let

• β_0 = 0, β_1 = 1;

• x_i = c_i, x_j = c_j;

• s(x_i, x_j) = θ_{c_i, c_j}.

Stochastic equivalence: All nodes within the same block are stochastically equivalent under this model.

If c_i = c_k = c, then

Pr(Y_{i,j} = 1) = Pr(Y_{k,j} = 1) = θ_{c, c_j}.


Stochastic blockmodels

Consider the task of identifying stochastically equivalent classes from the data.

In the simplest case of an undirected binary relation, we want to find

latent classes c_1, . . . , c_n ∈ {1, . . . , K} and between-class rates Θ = {θ_{k,l} : 1 ≤ k, l ≤ K} that make the probability of our data large:

Pr(Y = y | c, Θ) = ∏_{i≠j} θ_{c_i,c_j}^{y_{i,j}} (1 − θ_{c_i,c_j})^{1 − y_{i,j}}

                = ∏_{k=1}^{K} ∏_{l=1}^{K} θ_{k,l}^{s_{k,l}} (1 − θ_{k,l})^{n_{k,l} − s_{k,l}},

where

• n_{k,l} = number of pairs (i, j) for which c_i = k and c_j = l;

• s_{k,l} = number of pairs (i, j) for which c_i = k, c_j = l, and y_{i,j} = 1.

This model is sometimes called the stochastic blockmodel.
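The likelihood above is easy to evaluate directly; a minimal sketch (the names are mine; for an undirected Y each pair is counted twice, which only rescales the log-likelihood):

sbm_loglik <- function(Y, cl, Theta) {
  P <- Theta[cl, cl]                      # P[i,j] = theta_{c_i, c_j}
  sum(dbinom(Y, 1, P, log = TRUE), na.rm = TRUE)
}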


Stochastic blockmodel

You can't take derivatives to find the MLEs of c_1, . . . , c_n. Instead, use

• EM algorithm

• Gibbs sampling/MCMC
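For concreteness, one full Gibbs scan for the undirected model with beta(1,1) priors on the θ_{k,l} might look as follows (my own sketch, not course code; cl plays the role of c):

sbm_gibbs_scan <- function(Y, cl, K) {
  n <- nrow(Y)
  Theta <- matrix(0, K, K)
  for (k in 1:K) for (l in k:K) {                    # update Theta | cl, Y
    pairs <- outer(cl == k, cl == l) | outer(cl == l, cl == k)
    diag(pairs) <- FALSE
    obs <- pairs & !is.na(Y)
    s <- sum(Y[obs])/2 ; m <- sum(obs)/2             # ties and pairs between blocks k, l
    Theta[k, l] <- Theta[l, k] <- rbeta(1, 1 + s, 1 + m - s)
  }
  for (i in 1:n) {                                   # update each cl[i] | rest
    lp <- sapply(1:K, function(k) sum(dbinom(Y[i, -i], 1, Theta[k, cl[-i]],
                                             log = TRUE), na.rm = TRUE))
    cl[i] <- sample(1:K, 1, prob = exp(lp - max(lp)))
  }
  list(cl = cl, Theta = Theta)
}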

The basic model can be extended in various ways:

• covariates/regressors;

• directed data:
  • dyadic correlation;
  • separate sender and receiver classes.

Much of this was done in Nowicki and Snijders (2001).

A variety of more complex variants have been recently developed.


Extension for ordinal data

Suppose Y_{i,j} ∈ {y_1, . . . , y_m}, where y_1 < y_2 < · · · < y_m.

A version of the blockmodel can be obtained from an ordinal probit model:

Z_{i,j} = θ_{c_i, c_j} + ε_{i,j}

Y_{i,j} = y_1 if c_1 ≤ Z_{i,j} < c_2
          ⋮
          y_m if c_m ≤ Z_{i,j}

Here, the values of {θ_{k,l} : 1 ≤ k, l ≤ K} are not probabilities, but they define the relative probabilities for the outcome Y_{i,j}.


Latent class for Cold War data

A three-class model gives the following inferred classes:

rownames(Y)[latentclass == 1]

## [1] "CHN" "INS" "IRN" "IRQ" "PRK" "USR"

rownames(Y)[latentclass == 2]

## [1] "AUL" "CAN" "GFR" "ITA" "NEW" "NOR" "NTH" "PHI" "ROK" "THI" "TUR"## [12] "UKG" "USA"

and everyone else in class 3, with the following between-class means:

round(M, 2)

##       [,1]  [,2]  [,3]
## [1,] -0.65 -1.73 -0.21
## [2,] -1.73  2.96  0.20
## [3,] -0.21  0.20  0.15

These rates are on the probit scale (recall the data are ordinal).


Latent class for Cold War data

[Figure: Cold War network drawn by inferred latent class]

Blockmodels for directed data

For directional data, we also want to allow for within-dyad dependence.

Recall the following ERGM from earlier:

p(y_{i,j}, y_{j,i} | μ, a_i, b_j, γ) = exp( μ_{i,j} y_{i,j} + μ_{j,i} y_{j,i} + γ y_{i,j} y_{j,i} ) / ( 1 + e^{μ_{i,j}} + e^{μ_{j,i}} + e^{μ_{i,j} + μ_{j,i} + γ} )

p1 model: μ_{i,j} = μ + a_i + b_j

Directional stochastic blockmodel: μ_{i,j} = θ_{c_i, c_j}

This latter model is essentially the one presented in Nowicki and Snijders (2001). It is straightforward to include covariate information in the model.


Matrix form of the blockmodel

The essential feature of the blockmodel is the representation

y_{i,j} ∼ θ_{c_i, c_j}

where

• c_i, c_j are unobserved latent class variables;

• Θ is a matrix of between-class intensities.

This model structure can be expressed in matrix form as follows:

θ_{c_i, c_j} = u_i^T Θ u_j,

where

• u_i is a K × 1 vector of all 0s except u_i[c_i] = 1;

• Θ is the K × K matrix of between-class intensities.

            ( θ_{11} θ_{12} θ_{13} ) ( 1 )
( 0 0 1 )   ( θ_{21} θ_{22} θ_{23} ) ( 0 )  =  θ_{31}
            ( θ_{31} θ_{32} θ_{33} ) ( 0 )
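A quick numeric check of this representation (illustrative values of my own):

Theta <- matrix(1:9/10, 3, 3)
ui <- c(0, 0, 1) ; uj <- c(1, 0, 0)    # c_i = 3, c_j = 1
drop(t(ui) %*% Theta %*% uj)           # equals Theta[3, 1]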


Generalizing the blockmodel

This blockmodel imposes two somewhat restrictive assumptions.

Stochastic equivalence

• The model presumes that each node is (probabilistically) equivalent to other members of its class.

• In reality, we might expect that all nodes are somewhat different, although similar to each other to varying degrees.

Sender-receiver equivalence

• The model presumes that the latent attributes defining nodes as senders of ties are the same as the receiver attributes.

• In reality, nodes may behave differently as senders of ties than as receivers of ties.

Example: People may be outgoing or reserved, and popular or not popular. The most efficient way to represent the four types is with

• separate sender and receiver latent classes, with two levels each, instead of

• common sender and receiver latent classes, with four classes each.


Latent factor models

A natural generalization of the blockmodel u_i^T Θ u_j is the latent factor model:

y_{i,j} ∼ u_i^T D v_j,

• u_i is a vector of latent factors describing i as a sender of ties;

• v_j is a vector of latent factors describing j as a receiver of ties;

• D is a diagonal matrix of factor weights.

Normal, binomial, and ordinal data can be represented with this structure as follows:

z_{i,j} = u_i^T D v_j + ε_{i,j}

y_{i,j} = g(z_{i,j})

where g is some increasing function:

• g(z) = z for normal data;

• g(z) = 1(z > 0) for binomial data;

• g(z) = some increasing step function for ordinal data.
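For instance, a binary network can be simulated from a rank-2 latent factor model in a few lines (an illustration; the constants are arbitrary choices of mine):

set.seed(1)
n <- 50 ; R <- 2
U <- matrix(rnorm(n*R), n, R)          # sender factors u_i
V <- matrix(rnorm(n*R), n, R)          # receiver factors v_j
D <- diag(c(1.5, -1.0))                # factor weights
Z <- U %*% D %*% t(V) + matrix(rnorm(n^2), n, n)
Y <- 1*(Z > 0) ; diag(Y) <- NA         # g(z) = 1(z > 0)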


Understanding latent factors

Z = U D V^T + E

z_{i,j} = u_i^T D v_j + ε_{i,j} = Σ_{r=1}^{R} d_r u_{i,r} v_{j,r} + ε_{i,j}

For example, in a 2-factor model, we have

z_{i,j} = d_1 (u_{i,1} × v_{j,1}) + d_2 (u_{i,2} × v_{j,2}) + ε_{i,j}

Interpretation

• u_i ≈ u_j: similarity of latent factors implies approximate stochastic equivalence;

• u_i ≈ v_j: similarity of latent factors implies high probability of a tie.


Matrix decomposition interpretation

Recall from linear algebra:

• Every m × n matrix Z can be written

Z = U D V^T

where D = diag(d_1, . . . , d_n), and U and V are orthonormal.

• If U D V^T is the SVD of Z, then

Z_k ≡ U_{[,1:k]} D_{[1:k,1:k]} V_{[,1:k]}^T

is the least-squares rank-k approximation to Z.
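In R, this approximation can be computed directly from svd() (a standard construction; the function name is mine):

rank_k_approx <- function(Z, k) {
  s <- svd(Z)
  s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], k, k) %*% t(s$v[, 1:k, drop = FALSE])
}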


Least squares matrix approximations

[Figure: singular values of Z, and scatterplots of Z against its rank r = 1, 3, 6, and 12 approximations]

LFM for symmetric data

Probit version of the symmetric latent factor model:

y_{i,j} = g(z_{i,j}), where g is a nondecreasing function

z_{i,j} = u_i^T Λ u_j + ε_{i,j}, where u_i ∈ R^K and Λ = diag(λ_1, . . . , λ_K)

{ε_{i,j}} iid ∼ normal(0, 1)

Writing {z_{i,j}} as a matrix,

Z = U Λ U^T + E

Recall from linear algebra:

• Every n × n symmetric matrix Z can be written

Z = U Λ U^T

where Λ = diag(λ_1, . . . , λ_n) and U is orthonormal.

• If U Λ U^T is the eigendecomposition of Z, then

Z_k ≡ U_{[,1:k]} Λ_{[1:k,1:k]} U_{[,1:k]}^T

is the least-squares rank-k approximation to Z.
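The symmetric analogue uses eigen(); the least-squares rank-k approximation keeps the k eigenvalues largest in absolute value (a standard construction; the function name is mine):

eig_approx <- function(Z, k) {
  e <- eigen(Z, symmetric = TRUE)
  idx <- order(abs(e$values), decreasing = TRUE)[1:k]
  e$vectors[, idx, drop = FALSE] %*% diag(e$values[idx], k, k) %*%
    t(e$vectors[, idx, drop = FALSE])
}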


Least squares approximations of increasing rank

[Figure: eigenvalues of Z, and scatterplots of Z against its rank r = 1, 3, 9, and 27 approximations]

Understanding eigenvectors and eigenvalues

Z = U Λ U^T + E

z_{i,j} = u_i^T Λ u_j + ε_{i,j} = Σ_{r=1}^{R} λ_r u_{i,r} u_{j,r} + ε_{i,j}

For example, in a rank-2 model, we have

z_{i,j} = λ_1 (u_{i,1} × u_{j,1}) + λ_2 (u_{i,2} × u_{j,2}) + ε_{i,j}

Interpretation

• u_{i,r} ≈ u_{j,r}: equality of latent factors represents stochastic equivalence;

• λ_r > 0: positive eigenvalues represent homophily;

• λ_r < 0: negative eigenvalues represent antihomophily.


R-Package eigenmodel

Description:

Construct approximate samples from the posterior distribution of

the parameters and latent variables in an eigenmodel for symmetric

relational data.

Usage:

eigenmodel_mcmc(Y, X = NULL, R = 2, S = 1000, seed = 1, Nss = min(S-burn, 1000), burn = 0)

Arguments:

Y: an n x n symmetric matrix with missing diagonal entries.

Off-diagonal missing values are allowed.

X: an n x n x p array of regressors

R: the rank of the approximating factor matrix

S: number of samples from the Markov chain

seed: a random seed

Nss: number of samples to be saved

burn: number of initial scans of the Markov chain to be dropped

Value: a list with the following components:

Z_postmean: posterior mean of the latent variable in the probit specification

ULU_postmean: posterior mean of the reduced-rank approximating matrix

Y_postmean: the original data matrix with missing values replaced by posterior means

L_postsamp: samples of the eigenvalues

b_postsamp: samples of the regression coefficients


Friendship example

> library(eigenmodel)
> data(YX_Friend)
> fit <- eigenmodel_mcmc(Y = YX_Friend$Y, X = YX_Friend$X, R = 2, S = 100000, burn = 5000)

[Figure: MCMC samples of the eigenvalues λ, estimated latent positions of the students, and traces of the regression coefficients β for same_sex and same_race]

Protein interaction example

> library(eigenmodel)
> data(Y_Pro)
> fit <- eigenmodel_mcmc(Y = Y_Pro, R = 2, S = 100000, burn = 5000)

[Figure: MCMC samples of the eigenvalues λ and estimated latent positions of the proteins]

AME models

Returning to directed relations:

SRM: We have motivated the SRM in order to represent 2nd order dependence:

• within row dependence, within column dependence;

• within dyad dependence.

z_{i,j} = β^T x_{i,j} + a_i + b_j + ε_{i,j}

y_{i,j} = g(z_{i,j})

This model is made up of additive random effects.

LFM: We have motivated the LFM to model more complex structures:

• third order dependence and transitivity;

• stochastic equivalence.

z_{i,j} = u_i^T D v_j + ε_{i,j}

y_{i,j} = g(z_{i,j})

This model is made up of multiplicative random effects.


AME models

Combining them gives an additive and multiplicative effects model:

z_{i,j} = β^T x_{i,j} + a_i + b_j + u_i^T D v_j + ε_{i,j}

y_{i,j} = g(z_{i,j})

Such models can be fit with the amen package:

ame_bin(Y, X, rvar = TRUE, cvar = TRUE, dcor = TRUE, R = 0, seed = 1, nscan = 50000,
        burn = 500, odens = 25, plot = TRUE, print = TRUE)

Arguments:

Y: an n x n square relational matrix

X: an n x n x p array of covariates

rvar: logical: fit row random effects?

cvar: logical: fit column random effects?

dcor: logical: fit a dyadic correlation?

R: integer: dimension of the multiplicative effects (can be zero)


Example: Conflict in the 90s

[Figure: the 90s conflict network, nodes labeled by country code]

AMEN fits

## srm fit
fit_srm <- ame_bin(Y, XA[, , 1])

## ame fit
fit_ame <- ame_bin(Y, XA[, , 1], R = 1)

46/53

Page 763: Introduction - University of Washingtonpdhoff/courses/567/Notes/slides_2014.pdf¥ graph theory and visualization ¥ matrix representations 2. Descriptive network statistics and summaries

GOF - SRM model

[Figure: GOF plots for the SRM fit — SABR, BETA, reciprocity, transitive triples, outdegree count, indegree count]

GOF - SRM model

[Figure: density histogram of the simulated statistic tH]

mean(fit_srm$tt <= fit_srm$TT)

## [1] 0.0075


GOF - AME model

[Figure: amen output for the AME fit: traceplots labeled SABR and BETA, posterior sequences of the reciprocity and transitive-triples statistics, and histograms of the out- and indegree counts.]

GOF - AME model

[Figure: density of the goodness-of-fit statistic tH under the AME fit.]

mean(fit_ame$tt <= fit_ame$TT)

## [1] 0.15


Latent factor plot

[Figure: plots of the estimated latent factors u against v, countries labeled by code.]

Latent factor plot

[Figure: latent factor plot with countries labeled by code.]

Summary

• network patterns can often be explained by nonlinear combinations of nodal characteristics;

• in the absence of observed nodal characteristics, patterns can be explained by latent nodal characteristics.


SVD
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Example: Conflict in the 90s

[Figure: network plot of militarized conflict relations among countries in the 1990s, nodes labeled by country code.]

SVD on sociomatrix

## compute svd and extract components
svdY <- svd(Y)
d <- svdY$d
D <- diag(d)
U <- svdY$u
V <- svdY$v


Squared singular values

[Figure: squared singular values d^2 plotted against index.]

sum(d^2)

## [1] 715

sum(d^2) - sum(d[1:2]^2)

## [1] 417.8

sum(d^2) - sum(d[1:4]^2)

## [1] 309
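A useful identity behind these numbers: the squared Frobenius norm of Y equals the sum of its squared singular values, so sum(d^2) - sum(d[1:k]^2) is exactly the squared error of the best rank-k approximation, as illustrated on the next two slides. A one-line check:

## Frobenius norm identity: sum(Y^2) equals the sum of squared singular values
all.equal(sum(Y^2), sum(d^2))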


Best 2-factor fit

Y2 <- U[, 1:2] %*% D[1:2, 1:2] %*% t(V[, 1:2])
sum((Y - Y2)^2)

## [1] 417.8

[Figure: entries of Y plotted against the corresponding entries of the rank-2 approximation Y2.]

Best 4-factor fit

Y4 <- U[, 1:4] %*% D[1:4, 1:4] %*% t(V[, 1:4])
sum((Y - Y4)^2)

## [1] 309

[Figure: entries of Y plotted against the corresponding entries of the rank-4 approximation Y4.]

Examining the factors

[Figure: two factor plots with countries labeled by code.]

Predict from the factor plots

Who was attacking Iraq?

y.airq <- Y[, cnames == "IRQ"]
y.airq[y.airq > 0]

## BAH EGY FRN GRC IRN ISR ITA JOR NTH OMA SAU TUR UAE UKG USA
##   1   1   4   1   6   2   1   1   1   1   1   6   2   5   7


Predict from the factor plots

Whom did North Korea attack?

y.nkora <- Y[cnames == "PRK", ]
y.nkora[y.nkora > 0]

## AUL CAN CHN JPN ROK USA
##   1   1   3   4   4   3


Sampling and incomplete network data
567 Statistical analysis of social networks

Peter Hoff

Statistics, University of Washington


Network sampling methods

It is sometimes difficult to obtain a complete network dataset:

• the population nodeset is too large;

• gathering all relational information is too costly;

• population nodes are hard to reach.

In such cases, we need to think carefully about how to

• gather the data (i.e. design the survey);

• make inference (i.e. estimate and evaluate parameters).


Common sampling methods

1. node-induced subgraph sampling

2. edge-induced subgraph sampling

3. egocentric sampling

4. link tracing designs

5. censored nomination schemes


Node-induced subgraph sampling

Procedure:

1. Uniformly sample a set of nodes s = {s1, . . . , s_ns},

s ⊂ {1, . . . , n}.

2. Observe the relations between the sampled nodes:

Ys = {yi,j : i ∈ s, j ∈ s}.
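As a concrete version of these two steps, the following sketch assumes Y is an n × n sociomatrix with NA on the diagonal:

## node-induced subgraph sampling: a minimal sketch
node_sample <- function(Y, ns) {
  s <- sort(sample(nrow(Y), ns))   # step 1: uniformly sampled node set s
  Y[s, s]                          # step 2: relations among the sampled nodes
}

Ys <- node_sample(Y, ns = 10)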

[Figure: sequence of network diagrams illustrating node-induced subgraph sampling on a small example network.]

Estimation from sampled data

In what ways does Ys resemble Y?

For what functions g() will g(Ys) estimate g(Y)?

Consider the following setup:

• n × n sociomatrix Y

• n × n dyadic covariate Xd

• n × 1 nodal covariate Xn

Can we estimate the following from a sample?

ȳ = (1/(n(n−1))) Σ_{i≠j} yi,j ,    x̄d = (1/(n(n−1))) Σ_{i≠j} xd,i,j ,    x̄n = (1/n) Σ_i xn,i

ȳxd = (1/(n(n−1))) Σ_{i≠j} yi,j xd,i,j ,    ȳxn = (1/(n(n−1))) Σ_i xn,i yi·
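These population summaries are easy to compute; a sketch, assuming Y and Xd are n × n matrices with NA diagonals and Xn is a length-n vector (mean() over the off-diagonal entries divides by n(n − 1)):

n <- nrow(Y)
ybar   <- mean(Y, na.rm = TRUE)            # average relation
xdbar  <- mean(Xd, na.rm = TRUE)           # average dyadic covariate
xnbar  <- mean(Xn)                         # average nodal covariate
yxdbar <- mean(Y * Xd, na.rm = TRUE)       # average of y_{i,j} x_{d,i,j}
yxnbar <- sum(Xn * rowSums(Y, na.rm = TRUE)) / (n * (n - 1))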


Node-induced subgraph sampling

[Figure: sampling distributions of ȳ, x̄d, x̄n, ȳxd and ȳxn under node-induced subgraph sampling.]

Node-induced subgraph sampling

For some functions g, the sample value g(Ys) is an unbiased estimator of the population value g(Y), namely when g(Y) is an average over subgraphs of size k, for k ≤ ns:

g(Y) = (n choose 2)⁻¹ Σ_{i<j} h(yi,j , yj,i)

g(Y) = (n choose 3)⁻¹ Σ_{i<j<k} h(yi,j , yj,i , yi,k , yk,i , yj,k , yk,j) if ns ≥ 3

Why does it work? Each subgraph of size k appears in the sample with equal probability (although the subgraphs that appear are dependent). A simulation sketch follows below.

Some functions of interest are not of this type:

• in- and outdegree distributions;

• geodesics, distances, number of paths, etc.
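A small simulation sketch of the unbiasedness claim for a dyad-average statistic (the network density); the sample size and replicate count are illustrative:

n  <- nrow(Y); ns <- 10
gY <- mean(Y, na.rm = TRUE)          # population value g(Y): the density
ghat <- replicate(5000, {
  s <- sample(n, ns)                 # node-induced subgraph sample
  mean(Y[s, s], na.rm = TRUE)        # sample value g(Ys)
})
mean(ghat) - gY                      # approximately zero: g(Ys) is unbiased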


Edge-induced subgraph sampling

Procedure:

1. Uniformly sample a set of edges e = {e1, . . . , e_ne},

e ⊂ {(i, j) : yi,j = 1}.

2. Let Ys be the edge-generated subgraph of e.
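A minimal sketch of this sampler, assuming Y is a binary sociomatrix with NA on the diagonal; it returns the sampled edges and the nodes they touch:

## edge-induced subgraph sampling: a minimal sketch
edge_sample <- function(Y, ne) {
  E <- which(Y == 1, arr.ind = TRUE)           # all edges (i, j)
  e <- E[sample(nrow(E), ne), , drop = FALSE]  # step 1: uniform edge sample
  s <- sort(unique(c(e)))                      # nodes incident to sampled edges
  list(edges = e, nodes = s)                   # step 2: edge-generated subgraph
}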

[Figure: sequence of network diagrams illustrating edge-induced subgraph sampling on a small example network.]

Edge-induced subgraph sampling

How well do these subgraphs represent Y?

Can you infer anything about Y from these data?


Edge-induced subgraph sampling

[Figure: sampling distributions of ȳ, x̄d, x̄n, ȳxd and ȳxn under edge-induced subgraph sampling.]

Egocentric sampling

Procedure:

1. Uniformly sample a set s1 = {s1,1, . . . , s1,ns} of nodes,

s1 ⊂ {1, . . . , n}.

2. Observe the relations for each i ∈ s1, i.e. observe {yi,1, . . . , yi,n}.

3. Let s2 be the set of nodes having a link from anyone in s1. Observe the relations of anyone in s2 to anyone in s1 ∪ s2:

Ys = {yi,j : i, j ∈ s1 ∪ s2}

For large graphs, these data can be obtained (with high probability) by asking each i ∈ s1 the following:

1. Who are your friends?

2. Among your friends, which are friends with each other?
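The following sketch implements one wave of this design, returning the partially observed sociomatrix (Y binary with NA diagonal; unobserved pairs are NA):

## egocentric sampling: a minimal sketch
ego_sample <- function(Y, ns) {
  n  <- nrow(Y)
  s1 <- sample(n, ns)                          # step 1: the egos
  out <- colSums(Y[s1, , drop = FALSE], na.rm = TRUE)
  s2 <- setdiff(which(out > 0), s1)            # step 3: nodes linked from s1
  Yobs <- matrix(NA, n, n)
  Yobs[s1, ] <- Y[s1, ]                        # step 2: all relations of egos
  Yobs[s2, c(s1, s2)] <- Y[s2, c(s1, s2)]      # relations of s2 to s1 and s2
  Yobs
}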


Link-tracing designs

Snowball sampling: Iteratively repeat the egocentric sampler, obtaining the stage-k nodes sk from the links of s(k−1).

This is a type of link-tracing design: the links of the current nodes determine who is next to be included in the sample. A sketch follows below.

How will such subgraphs Ys be similar to Y? How will they differ?
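A sketch of a k-wave snowball sampler built on this idea (again assuming a binary Y with NA diagonal):

## snowball sampling: iterate the link-tracing step over several waves
snowball <- function(Y, s1, waves = 2) {
  s <- s1; current <- s1
  for (k in seq_len(waves)) {
    out <- colSums(Y[current, , drop = FALSE], na.rm = TRUE)
    nxt <- setdiff(which(out > 0), s)  # stage-k nodes: linked from stage k-1
    if (length(nxt) == 0) break
    s <- c(s, nxt); current <- nxt
  }
  s                                    # all nodes reached within the waves
}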


Egocentric sampling

[Figure: sampling distributions of ȳ, x̄d, x̄n, ȳxd and ȳxn under egocentric sampling.]

Egocentric sampling

[Figure: sequence of network diagrams illustrating egocentric sampling: egos, their alters, and the relations among them.]

Inference with egocentric samples

• Ys is not generally representative of Y.

• For some statistics, weighted averages based on Ys can be unbiased (the Horvitz-Thompson estimator; a sketch follows below).

• For many statistics, part of Ys can be used to obtain good estimates:
  • degree distributions can be estimated from the degrees of the egos;
  • covariate distributions can be estimated from those of the egos.

However, use of data from s2 generally requires a reweighting scheme.
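As a sketch of the reweighting idea, a Horvitz-Thompson-type estimate divides each sampled value by its inclusion probability; the function and the degree example below are illustrative, with the probabilities pi_s assumed known from the design:

## Horvitz-Thompson-type estimate of a population total
ht_total <- function(x_s, pi_s) sum(x_s / pi_s)

## e.g. total degree from egos drawn uniformly: pi_i = ns / n for every ego
## deg_egos <- rowSums(Yobs, na.rm = TRUE)[s1]
## ht_total(deg_egos, rep(ns / n, length(s1)))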


References

• Snijders (1992), “Estimation on the basis of snowball samples: How to weight?”

• Kolaczyk (2009), “Sampling and Estimation in Network Graphs,” chapter 5 of Statistical Analysis of Network Data.


Parameter estimation with incomplete sampled data

Model: Pr(Y = y|θ), θ ∈ Θ.

Complete data: Y

Observed data: Y[O], where O is a set of pairs of indices

O = {(i1, j1), (i2, j2), (i3, j3), . . . , (is, js)}

How can we make inference for θ based on Y[O]?
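If the observed data are stored as a partially observed sociomatrix Yobs (NA for unobserved pairs, as in the examples below), the observed index set and data are simply:

O  <- which(!is.na(Yobs), arr.ind = TRUE)  # two-column matrix of (i, j) pairs
yO <- Yobs[O]                              # the observed relations Y[O]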


Study design and missing data

Node-induced subgraph sampling

     1   2   3   4   5   6
1   NA   0   1   0   0   1
2    0  NA   1   0   0   1
3    1   0  NA   1   0   0
4    0   1   0  NA   0   0
5    0   0   1   0  NA   1
6    1   0   1   0   0  NA

39/58

Page 820: Introduction - University of Washingtonpdhoff/courses/567/Notes/slides_2014.pdf¥ graph theory and visualization ¥ matrix representations 2. Descriptive network statistics and summaries

Study design and missing data

Node-induced subgraph sampling

1 2 3 4 5 61 NA 0 1 0 0 12 0 NA 1 0 0 13 1 0 NA 1 0 04 0 1 0 NA 0 05 0 0 1 0 NA 16 1 0 1 0 0 NA

40/58

Page 821: Introduction - University of Washingtonpdhoff/courses/567/Notes/slides_2014.pdf¥ graph theory and visualization ¥ matrix representations 2. Descriptive network statistics and summaries

Study design and missing data

Node-induced subgraph sampling

1 2 3 4 5 61 NA 0 1 0 0 12 0 NA 1 0 0 13 1 0 NA 1 0 04 0 1 0 NA 0 05 0 0 1 0 NA 16 1 0 1 0 0 NA

41/58


Study design and missing data

Node-induced subgraph sampling: Observed data

     1   2   3   4   5   6
1   NA  NA  NA  NA  NA  NA
2   NA  NA   1  NA   0  NA
3   NA   0  NA  NA   0  NA
4   NA  NA  NA  NA  NA  NA
5   NA   0   1  NA  NA  NA
6   NA  NA  NA  NA  NA  NA
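A short R sketch of how this observed-data pattern arises. The sociomatrix Y is taken from the slide; the sampled node set {2, 3, 5} is inferred from the observed pattern above.

## Node-induced subgraph sampling: observe Y[i,j] only when both i and j are sampled.
Y <- rbind(c(NA, 0, 1, 0, 0, 1),
           c( 0,NA, 1, 0, 0, 1),
           c( 1, 0,NA, 1, 0, 0),
           c( 0, 1, 0,NA, 0, 0),
           c( 0, 0, 1, 0,NA, 1),
           c( 1, 0, 1, 0, 0,NA))

s <- c(2, 3, 5)                 # sampled nodes (inferred)
Yobs <- matrix(NA, 6, 6)
Yobs[s, s] <- Y[s, s]           # ties among sampled nodes only
Yobs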

42/58

Study design and missing data

Edge-induced subgraph sampling

     1   2   3   4   5   6
1   NA   0   1   0   0   1
2    0  NA   1   0   0   1
3    1   0  NA   1   0   0
4    0   1   0  NA   0   0
5    0   0   1   0  NA   1
6    1   0   1   0   0  NA

43–44/58


Study design and missing data

Edge-induced subgraph sampling: Observed data

     1   2   3   4   5   6
1   NA  NA  NA  NA  NA   1
2   NA  NA   1  NA  NA  NA
3    1  NA  NA  NA  NA  NA
4   NA  NA  NA  NA  NA  NA
5   NA  NA   1  NA  NA  NA
6   NA  NA   1  NA  NA  NA
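The analogous R sketch for this design, reusing the matrix Y from the node-induced sketch above; the sampled dyads are inferred from the observed pattern on this slide.

## Edge-induced subgraph sampling: sample dyads from the edge list
## and observe only those entries.
E <- rbind(c(1,6), c(2,3), c(3,1), c(5,3), c(6,3))  # sampled edges (inferred)
Yobs <- matrix(NA, 6, 6)
Yobs[E] <- Y[E]
Yobs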

45/58

Study design and missing data

Egocentric sampling

     1   2   3   4   5   6
1   NA   0   1   0   0   1
2    0  NA   1   0   0   1
3    1   0  NA   1   0   0
4    0   1   0  NA   0   0
5    0   0   1   0  NA   1
6    1   0   1   0   0  NA

46–49/58


Study design and missing data

Egocentric sampling

     1   2   3   4   5   6
1   NA  NA  NA  NA  NA  NA
2    0  NA   1   0   0   1
3   NA   0  NA  NA  NA   0
4   NA  NA  NA  NA  NA  NA
5   NA  NA  NA  NA  NA  NA
6   NA   0   1  NA  NA  NA
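A sketch of this design in R, again reusing Y from the node-induced sketch. The ego (node 2) is inferred from the observed pattern; the sketch assumes the ego reports all of its relations and each alter reports its ties to the ego and to the other alters.

## Egocentric sampling: observe the ego's row, plus the alters' ties
## back to the ego and among themselves.
ego    <- 2
alters <- which(Y[ego, ] == 1)       # nodes the ego names: 3 and 6
Yobs <- matrix(NA, 6, 6)
Yobs[ego, ] <- Y[ego, ]                                    # the ego's full row
Yobs[alters, c(ego, alters)] <- Y[alters, c(ego, alters)]  # alters' reports
Yobs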

50/58


Parameter estimation with missing data

If the data are missing at random (MAR), i.e. the value of o, which relations you get to observe,

• doesn't depend on θ, and

• doesn't depend on the values of Y,

then valid likelihood and Bayesian inference can be obtained from the observed-data likelihood, which sums the model over all possible values of the unobserved relations y[oc]:

lMAR(θ : y[o]) = Pr(Y[o] = y[o] : θ) = ∑_{y[oc]} Pr(Y = y : θ)

Inference based on lMAR(θ : y[o]) is provided in amen:

• put NAs in place of any non-observed relations (a minimal sketch follows).

51/58


Missing at random designs

Which designs we’ve discussed correspond to MAR relations?

• Node-induced subgraph sampling?

• Edge-induced subgraph sampling?

• Egocentric sampling?

52/58


Ignorable designs

While egocentric and other link-tracing designs are not MAR, they can still be analyzed as if they were. The argument is as follows:

The “data” include

• O = o, the determination of which relations you get to see;

• Y[O] = y[o], the relationship values for the observable relations.

The likelihood is then

l(θ : o, y[o]) = Pr(Y[o] = y[o], O = o | θ)

= Pr(Y[o] = y[o] | θ) × Pr(O = o | θ, Y[o] = y[o])

= lMAR(θ : y[o]) × Pr(O = o | θ, Y[o] = y[o])

If the design part doesn't depend on θ, then the observed likelihood is proportional to the MAR likelihood, and the design can be ignored.

53/58


Ignorable designs

l(θ : o, y[o]) = lMAR(θ : y[o]) × Pr(O = o | θ, Y[o] = y[o])

When is the design ignorable?

(MAR) If the probability that O equals o doesn't depend on θ or Y (e.g., node-induced subgraph sampling), the design is ignorable.

(ID) If the probability that O equals o

• doesn't depend on θ, and

• depends on Y only through Y[o],

then the design is ignorable.

The latter conditions are often met for link-tracing designs, like egocentric and snowball sampling.

54/58


References

• Thompson and Frank (2000), “Model-based estimation with link-tracing sampling designs.”

• Heitjan and Basu (1996), “Distinguishing ‘Missing at Random’ and ‘Missing Completely at Random’.”

55/58


Simulation study - ID likelihoods

yi,j = β0 + βr xr,i + βc xc,j + βd xd,i,j + ai + bj + εi,j

fit.pop = fitted model based on complete network data

fit.samp = fitted model based on sampled network data

How do the parameter estimates of fit.samp compare to those of fit.pop?
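A sketch of this simulation in R with amen. Only np = 32, ns = 10, and the names fit.pop and fit.samp come from the slides; the covariates, true coefficient values, and MCMC settings are invented for illustration, and $BETA is ame()'s matrix of posterior samples of the regression coefficients.

library(amen)

set.seed(567)
np <- 32
xr <- rnorm(np)                    # nodal covariate (used as row and column covariate)
xd <- matrix(rnorm(np^2), np, np)  # dyadic covariate

## generate a network from the regression model on this slide
a <- rnorm(np, 0, 0.5) ; b <- rnorm(np, 0, 0.5)
Y <- -2 + 0.2 * outer(xr, rep(1, np)) + 0.1 * outer(rep(1, np), xr) +
     0.8 * xd + outer(a, rep(1, np)) + outer(rep(1, np), b) +
     matrix(rnorm(np^2), np, np)
diag(Y) <- NA

## fit.pop: fit to the complete network
fit.pop <- ame(Y, Xrow = xr, Xcol = xr, Xdyad = xd,
               nscan = 2000, print = FALSE, plot = FALSE)

## fit.samp: fit to a node-induced subgraph sample of ns = 10 nodes
s  <- sample(np, 10)
Ys <- matrix(NA, np, np) ; Ys[s, s] <- Y[s, s]
fit.samp <- ame(Ys, Xrow = xr, Xcol = xr, Xdyad = xd,
                nscan = 2000, print = FALSE, plot = FALSE)

## compare posterior mean regression coefficients
rbind(pop = colMeans(fit.pop$BETA), samp = colMeans(fit.samp$BETA))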

56/58

Node-induced subgraph sample

np = 32, ns = 10

[Figure: posterior distributions of β0, βr, βc, and βd, comparing fit.samp to fit.pop]

57/58

Egocentric sample

np = 32, ns1 = 4

[Figure: posterior distributions of β0, βr, βc, and βd, comparing fit.samp to fit.pop]

58/58