
Joint social selection and social influence models for networks: the interplay of ties and attributes

Garry Robins

Michael Johnston

University of Melbourne, Australia

Symposium on the dynamics of networks and behavior

Slovenia, May 10-11, 2004

Thanks to Pip Pattison, Tom Snijders, Henry Wong, Yuval Kalish, Antonietta Pane

A thought experiment:

Most models that purport to explain important global network properties are homogeneous across nodes.

Might a simple model of interactions between node-level and tie-level effects be sufficient to explain global properties?

1. Develop a model that incorporates both social selection and social influence processes.

2. Which global properties of networks are important?

3. Simulate the model to see whether these properties can be reproduced in a substantial proportion of graphs.

1. Develop a model incorporating both social selection and social influence effects

Simple random graph models

For a fixed n nodes, edges are added between pairs of nodes independently and with fixed probability p

(Erdős & Rényi, 1959)

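Bernoulli random graph distribution:

$$\Pr(X = x) = \frac{1}{\kappa} \exp\Big(\theta \sum_{i<j} x_{ij}\Big)$$

X is a set of random binary network variables [Xij]; Xij = 1 when an edge is observed, = 0 otherwise; x is a graph realization; θ is an edge parameter; κ is a normalizing constant.

This is an exponential random graph (p*) model, with

$$p = \frac{e^{\theta}}{1 + e^{\theta}}$$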

a homogeneous model – (node homogeneity)

p and θ are independent of node labels
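A minimal Python sketch of the Bernoulli model, assuming NumPy is available. The function names `edge_probability` and `sample_bernoulli_graph` and the value θ = -1.0 are illustrative choices, not taken from the talk. It converts the edge parameter θ into the tie probability p and draws every dyad independently.

```python
import numpy as np

def edge_probability(theta):
    """Tie probability implied by the edge parameter: p = e^theta / (1 + e^theta)."""
    return 1.0 / (1.0 + np.exp(-theta))

def sample_bernoulli_graph(n, theta, rng):
    """Draw a symmetric 0/1 adjacency matrix with independent, identically distributed ties."""
    p = edge_probability(theta)
    upper = np.triu(rng.random((n, n)) < p, k=1)  # one draw per dyad i < j
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(0)
x = sample_bernoulli_graph(30, theta=-1.0, rng=rng)  # illustrative theta
density = x.sum() / (30 * 29)                        # observed density of this draw
```

Because neither p nor θ depends on node labels, every dyad in the sketch is treated identically, which is the node homogeneity the slide refers to.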

A Bernoulli random graph model will not fit this network well

Social selection

In this example, actor attributes are important to tie formation.

[Figure: example network with nodes coloured by attribute. Yellow: Jewish; Blue: Arab (Kalish, 2003)]

Exogenous attributes affect network ties.

Binary variables: Xij network ties; Zi actor attributes.

[Diagram: Zi, Zj, Xij]

$$\Pr(X = x \mid Z = z) = \frac{1}{\kappa} \exp\Big(\theta \sum_{i<j} x_{ij} + \beta_1 \sum_{i \ne j} x_{ij} z_i + \beta_2 \sum_{i<j} x_{ij} z_i z_j\Big)$$

(Robins, Elliott & Pattison, 2001), where κ is a normalizing constant.

Effects in the model:

• θ: Baseline edge effect irrespective of attributes

• β1: Propensity for actors with attribute z=1 to have more partners

• β2: Propensity for ties to form between actors who both have attribute z=1

Equivalent (blockmodel) parameterization:
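One way to read a blockmodel parameterization, offered here as a hedged reconstruction rather than the form shown on the original slide: write θ for the baseline edge effect, β1 for the propensity of attributed actors to have more partners, and β2 for the propensity of ties between two attributed actors (these symbol names are assumptions). The conditional log-odds of a tie then takes one value per block of dyads:

```latex
\log\frac{\Pr(X_{ij}=1 \mid z)}{\Pr(X_{ij}=0 \mid z)} =
\begin{cases}
\theta & \text{if } z_i = z_j = 0 \\
\theta + \beta_1 & \text{if } z_i + z_j = 1 \\
\theta + 2\beta_1 + \beta_2 & \text{if } z_i = z_j = 1
\end{cases}
```

Under this reading, the three effects are equivalent to three block-specific tie probabilities: within the non-attributed nodes, between the two groups, and within the attributed nodes.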

Social influence: Are actor attributes influenced by fixed network structure?

(Robins, Pattison & Elliott, 2001)

[Figure: example network, with a cutpoint highlighted]

Exogenous network ties affect attributes.

Binary variables: Xij network ties; Zi actor attributes.

[Diagram: Zi, Zj, Xij]

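$$\Pr(Z = z \mid X = x) = \frac{1}{\kappa} \exp\Big(\gamma_1 \sum_{i} z_i + \gamma_2 \sum_{i \ne j} x_{ij} z_i\Big)$$

where κ is a normalizing constant.

Effects in the model:

• γ1: Baseline effect for number of attributed nodes (z=1)

• γ2: Propensity for attributed nodes to have more partners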

This model has no effect for an actor being influenced by a network partner, so we need to introduce dependencies among attribute variables.

Assume attribute variables are dependent if the actors are tied: partial conditional dependence (Pattison & Robins, 2002).

[Diagram: Zi, Zj, Xij]

$$\Pr(Z = z \mid X = x) = \frac{1}{\kappa} \exp\Big(\gamma_1 \sum_{i} z_i + \gamma_2 \sum_{i \ne j} x_{ij} z_i + \gamma_3 \sum_{i<j} x_{ij} z_i z_j\Big)$$

Effects in the model:

• γ1: Baseline effect for number of attributed nodes (z=1)

• γ2: Propensity for attributed nodes to have more partners

• γ3: Propensity for attributed nodes to be connected

[Figure: Friendship network for training squad in 12th week of training (Pane, 2003). Green: detached; Yellow: team oriented; Red: positive]

Why should attributes or ties be exogenous?

Models for joint social selection/social influence

[Diagram: Zi, Zj, Zk, Xij, Xik]

$$\Pr(Y = y) = \frac{1}{\kappa} \exp\Big(\alpha_1 \sum_{i} z_i + \alpha_2 \sum_{i<j} z_i z_j + \theta \sum_{i<j} x_{ij} + \beta_1 \sum_{i \ne j} x_{ij} z_i + \beta_2 \sum_{i<j} x_{ij} z_i z_j\Big), \quad \text{where } Y = [X, Z]$$

and κ is a normalizing constant.

Effects in the model:

• α1, α2: Quadratic effect in no. of attributed nodes

• θ: Baseline effect for no. of edges

• β1: Propensity for attributed nodes to have more partners

• β2: Propensity for attributed nodes to be connected

Equivalent (blockmodel) parameterization:

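Change statistics

Conditional log-odds for a tie to be observed:

$$\log \frac{\Pr(X_{ij} = 1 \mid z, x^{C}_{ij})}{\Pr(X_{ij} = 0 \mid z, x^{C}_{ij})} = \theta + \beta_1 (z_i + z_j) + \beta_2 z_i z_j$$

Conditional log-odds for an attribute to be observed:

$$\log \frac{\Pr(Z_r = 1 \mid x, z^{C}_{r})}{\Pr(Z_r = 0 \mid x, z^{C}_{r})} = \alpha_1 + \alpha_2 \sum_{s \ne r} z_s + \beta_1 \sum_{s \ne r} x_{rs} + \beta_2 \sum_{s \ne r} x_{rs} z_s$$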
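The two conditional log-odds can be sketched in Python as below. This is an illustration under stated assumptions: the names `alpha1`, `alpha2`, `theta`, `beta1`, `beta2` stand for the attribute-count, attribute-pair, edge, attribute expansiveness and attribute connection parameters respectively; each dyad is counted once; and the numerical values in the usage lines are made up for the example.

```python
import numpy as np

def tie_log_odds(i, j, z, theta, beta1, beta2):
    """Conditional log-odds that X_ij = 1, given the attribute vector z."""
    return theta + beta1 * (z[i] + z[j]) + beta2 * z[i] * z[j]

def attribute_log_odds(r, x, z, alpha1, alpha2, beta1, beta2):
    """Conditional log-odds that Z_r = 1, given the ties x and the other attributes."""
    others = np.arange(len(z)) != r          # mask selecting every node except r
    return (alpha1
            + alpha2 * z[others].sum()                   # attributed nodes other than r
            + beta1 * x[r, others].sum()                 # degree of r
            + beta2 * (x[r, others] * z[others]).sum())  # attributed partners of r

# Illustrative three-node example: ties 0-1 and 0-2; nodes 0 and 2 attributed.
x = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
z = np.array([1, 0, 1])
tie = tie_log_odds(0, 2, z, theta=-1.0, beta1=-2.0, beta2=4.0)
att = attribute_log_odds(0, x, z, alpha1=-7.0, alpha2=0.5, beta1=-2.0, beta2=4.0)
```

Each function returns the change in the model's exponent when a single variable is switched from 0 to 1, which is all a single-variable update needs.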

2. Which global network properties are important?

– Small worlds
  • Short average geodesics
  • High clustering
– Skewed degree distributions
– Regions of higher density among nodes
  • cohesive subsets, “community structures”

Confiding (trust) network (Pane, 2003)

An example network (without attributes)

Observed networks: Path lengths

Many observed networks have short average geodesics (small worlds).

The confiding network has a median geodesic (G50) of 2, which is not extreme compared to a distribution of Bernoulli graphs.

The confiding network has a third quartile geodesic (G75) of 2, which is also not extreme compared to a distribution of Bernoulli graphs.
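A hedged sketch of how G50 and G75 might be computed for a graph given as a 0/1 adjacency matrix. The talk does not say how unreachable pairs are treated; this sketch assumes they are simply left out, and the function names are illustrative.

```python
from collections import deque
import numpy as np

def geodesic_lengths(x):
    """Shortest path lengths for all connected pairs i < j, by breadth-first search."""
    n = len(x)
    lengths = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if x[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # keep each unordered pair once; unreachable nodes never enter dist
        lengths.extend(d for v, d in dist.items() if v > source)
    return lengths

def g50_g75(x):
    """Median and third quartile of the geodesic distribution."""
    lengths = geodesic_lengths(x)
    return np.percentile(lengths, 50), np.percentile(lengths, 75)

# Illustrative use on a four-node path 0-1-2-3.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
g50, g75 = g50_g75(path)
```

The same two numbers can then be computed for each graph in a Bernoulli sample of matching density to judge whether the observed values are extreme.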

Observed networks: Clustering

Many observed networks have high clustering (small worlds).

Global clustering coefficient: 3 × (no. of triangles in graph) / (no. of 2-paths in graph) = 3T / S2

The confiding network has a global clustering coefficient of 0.41; a comparable Bernoulli graph sample has a mean clustering coefficient of 0.25 (sd = 0.03).

Local clustering coefficient: for each node i, compute the density among nodes adjacent to i, then average across the entire graph.

The confiding network has a local clustering coefficient of 0.58; a comparable Bernoulli graph sample has a mean clustering coefficient of 0.25 (sd = 0.04).
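Both clustering coefficients can be sketched in a few lines of Python with NumPy. The function names are illustrative, and the treatment of nodes with fewer than two neighbours (counted as zero in the local average) is an assumption, since the slides do not specify it.

```python
import numpy as np

def global_clustering(x):
    """3T / S2: three times the triangle count divided by the 2-path count."""
    x = np.asarray(x)
    degree = x.sum(axis=1)
    triangles = np.trace(x @ x @ x) / 6            # each triangle gives 6 closed 3-walks
    two_paths = (degree * (degree - 1) / 2).sum()  # 2-paths centred on each node
    return 3 * triangles / two_paths if two_paths else 0.0

def local_clustering(x):
    """Mean, over nodes, of the density among each node's neighbours."""
    x = np.asarray(x)
    values = []
    for i in range(len(x)):
        nbrs = np.flatnonzero(x[i])
        k = len(nbrs)
        if k < 2:
            values.append(0.0)  # assumption: fewer than two neighbours counts as 0
            continue
        ties = x[np.ix_(nbrs, nbrs)].sum() / 2      # ties among the neighbours
        values.append(ties / (k * (k - 1) / 2))
    return float(np.mean(values))

# Illustrative graph: triangle 0-1-2 plus a pendant node 3 attached to node 0.
x = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
gc = global_clustering(x)
lc = local_clustering(x)
```

The two measures can differ on the same graph, as they do for the confiding network, because the local coefficient weights every node equally while the global one weights every 2-path equally.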

Observed networks: Degree distribution

Many observed networks have skewed degree distributions, as is the case for the confiding network.

[Figure: degree distribution of the confiding network; x-axis: degree (0 to 20); y-axis: frequency (0 to 10)]

Observed networks: Higher order clustering

k-triangles (Snijders, Pattison, Robins & Handcock, 2004)

[Figure: 1-triangle, 2-triangle, 3-triangle]

Alternating k-triangles:

$$u(x) = 3T_1 - \frac{T_2}{2} + \frac{T_3}{2^2} - \dots + (-1)^{n-3} \frac{T_{n-2}}{2^{n-3}}$$

Permits modeling of (semi) cohesive subsets of nodes (cf. community structures)

Observed networks often exhibit regions (subsets of nodes) with higher density, in which case we will see an alternating k-triangle statistic higher than for Bernoulli graphs.

[Figure: scatterplot of global clustering (y-axis, 0.0 to 0.7) against the alternating k-triangle statistic (x-axis, 0 to 400)]

The k-triangle statistic is not simply equivalent to global clustering
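A hedged Python sketch of the alternating k-triangle statistic. It assumes the usual definition in which a k-triangle is an edge together with k shared partners of its endpoints, so that T_k sums "shared partners choose k" over edges (with each 1-triangle seen from its three edges). The damping constant `lam` is an assumed parameter name, defaulting to 2.

```python
import numpy as np
from math import comb

def k_triangle_counts(x):
    """T_k for k = 1..n-2, from the shared-partner count of every existing edge."""
    x = np.asarray(x)
    n = len(x)
    shared = x @ x                         # shared[i, j] = number of 2-paths between i and j
    iu = np.triu_indices(n, k=1)
    sp = shared[iu][x[iu] == 1]            # shared partners of each edge i < j
    counts = {1: int(sp.sum()) // 3}       # every triangle is counted from its three edges
    for k in range(2, n - 1):
        counts[k] = sum(comb(int(s), k) for s in sp)
    return counts

def alternating_k_triangles(x, lam=2.0):
    """3*T_1 - T_2/lam + T_3/lam**2 - ..., with alternating signs."""
    counts = k_triangle_counts(x)
    total = 3 * counts[1]
    for k in range(2, len(counts) + 1):
        total += (-1) ** (k - 1) * counts[k] / lam ** (k - 1)
    return total

# Illustrative check on the complete graph on four nodes.
k4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
u_k4 = alternating_k_triangles(k4)
```

For the complete graph on four nodes there are 4 triangles and 6 two-triangles, so the statistic is 3 × 4 − 6 / 2 = 9. The alternating signs damp the contribution of very dense regions, which is one reason the statistic is not a simple rescaling of global clustering.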

Summary

Some global features not uncommon in observed networks:

• Short median geodesics (G50)

• Short third quartile geodesics (G75), perhaps?

• High clustering

• High k-triangle statistics

• Skewed degree distributions

Bernoulli distributions tend to have short median geodesics, low clustering and low k-triangles, hence a basis for comparison.

3. Simulate the model to see whether global properties can be reproduced

Simulation of the model

• Use the Metropolis algorithm; procedure similar to Robins, Pattison & Woolcock (in press).

• Typically 300,000 iterations; reject initial simulations for burn-in.

• Sample every 1000th graph.

• Inspect degree distributions across the sample.

• Compare each graph in the sample with a Bernoulli graph distribution with the same expected density.

• Hence can determine whether a graph:
  – has short G50, G75
  – is highly clustered; has high k-triangles

• Define highly clustered and short G50 as SW50 (small world). Similarly define SW75.
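The simulation procedure can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each step proposes, with equal probability, toggling either one tie or one attribute; it starts from an empty graph with no attributed nodes; and the parameter values in the usage lines are placeholders chosen only so the example runs quickly.

```python
import numpy as np

def log_odds_tie(i, j, z, theta, beta1, beta2):
    """Conditional log-odds of a tie between i and j."""
    return theta + beta1 * (z[i] + z[j]) + beta2 * z[i] * z[j]

def log_odds_attr(r, x, z, alpha1, alpha2, beta1, beta2):
    """Conditional log-odds of node r being attributed (x has a zero diagonal)."""
    return (alpha1 + alpha2 * (z.sum() - z[r])
            + beta1 * x[r].sum() + beta2 * (x[r] @ z))

def metropolis(n, params, iterations, burn_in, thin, rng):
    alpha1, alpha2, theta, beta1, beta2 = params
    x = np.zeros((n, n), dtype=int)   # assumed starting state: empty graph
    z = np.zeros(n, dtype=int)        # assumed starting state: no attributed nodes
    samples = []
    for t in range(iterations):
        if rng.random() < 0.5:        # assumption: tie and attribute moves equally likely
            i, j = rng.choice(n, size=2, replace=False)
            delta = log_odds_tie(i, j, z, theta, beta1, beta2)
            if x[i, j] == 1:
                delta = -delta        # proposal removes the tie
            if np.log(rng.random()) < delta:
                x[i, j] = x[j, i] = 1 - x[i, j]
        else:
            r = rng.integers(n)
            delta = log_odds_attr(r, x, z, alpha1, alpha2, beta1, beta2)
            if z[r] == 1:
                delta = -delta        # proposal removes the attribute
            if np.log(rng.random()) < delta:
                z[r] = 1 - z[r]
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append((x.copy(), z.copy()))
    return samples

# Illustrative run, much shorter than the 300,000 iterations mentioned in the talk.
params = (-7.0, 0.5, -1.0, -2.0, 4.0)   # placeholder values for alpha1, alpha2, theta, beta1, beta2
samples = metropolis(30, params, iterations=20_000, burn_in=5_000,
                     thin=1_000, rng=np.random.default_rng(1))
```

Tracking how often each kind of proposal is accepted would give the separate acceptance rates for edges and attributes; that bookkeeping is omitted here to keep the sketch short.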

First simulation series: 30 node graphs

Model parameters: α1, α2 (quadratic effect in no. of attributed nodes); θ (baseline effect for no. of edges); β1 (propensity for attributed nodes to have more partners); β2 (propensity for attributed nodes to be connected).

fix: α1, α2, θ at 7; 0.5; 1.0 (magnitudes as transcribed; any minus signs were lost)

vary: β1, β2 s.t. β2 > 0, β2 = -2β1

Change statistics

Conditional log-odds for a tie to be observed:

$$\log \frac{\Pr(X_{ij} = 1 \mid z, x^{C}_{ij})}{\Pr(X_{ij} = 0 \mid z, x^{C}_{ij})} = \theta + \beta_1 (z_i + z_j) + \beta_2 z_i z_j$$

vary: β1, β2 s.t. β2 = -2β1

Expect density to be same among:

• non-attributed nodes (zi = zj = 0)

• attributed nodes (zi = zj = 1)
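The reason for the equal-density expectation can be checked with a short worked substitution. Writing θ for the edge parameter, β1 for attribute expansiveness and β2 for attribute connection (symbol names assumed), and taking the tie log-odds to be θ + β1(z_i + z_j) + β2 z_i z_j, the constraint β2 = -2β1 gives:

```latex
\begin{aligned}
z_i = z_j = 0: &\quad \theta \\
z_i = z_j = 1: &\quad \theta + 2\beta_1 + \beta_2 = \theta + 2\beta_1 - 2\beta_1 = \theta \\
z_i \ne z_j:   &\quad \theta + \beta_1
\end{aligned}
```

So ties within either group have the same conditional log-odds θ, while ties between the groups have log-odds θ + β1, which is lower whenever β1 is negative.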

[Figure: Numbers of edges and attributed nodes. x-axis: Beta1 (-1.00, -1.50, -1.80, -2.00, -2.20, -2.50, -2.70, -3.00, -3.50); y-axis: mean number of edges and attributes (0 to 100); series: edges, attributed nodes]

[Figure: Assortative and disassortative mixing. x-axis: Beta1; y-axis: mean density (-0.1 to 0.4); series: mean density among non-attributed nodes, mean density between attributed and non-attributed nodes, mean density among attributed nodes]

[Figure: Acceptance rates. x-axis: Beta1; y-axis: acceptance rate (-0.1 to 0.5); series: attributes, edges]

[Figure: Clustering. x-axis: Beta1; y-axis: clustering coefficient (0.18 to 0.28); series: global clustering, local clustering]

[Figure: k-triangles. x-axis: Beta1; y-axis: t-statistic for k-triangles (-0.2 to 1.6)]

[Figure: Geodesics and clustering. x-axis: Beta1; y-axis: percent of sample (0 to 120); series: short median geodesic, short third quartile geodesic, high clustering]

[Figure: Small worlds. x-axis: Beta1; y-axis: percent of sample (0 to 40); series: SW50, SW75]

[Figure: Degree distributions across the sample for β1 = -3.0; x-axis: degree (D0 to D20); y-axis: 0 to 12]

Example graph for β1 = -3.0: the graph is SW50 (but not SW75), with a t-statistic for k-triangles (relative to Bernoulli) of 2.02.

The graph also has a skewed degree distribution, although this is unusual for graphs in this distribution.

[Figure: degree distribution of the example graph; x-axis: degree (D0 to D12); y-axis: frequency (0 to 14)]
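The talk does not spell out how the t-statistic relative to Bernoulli graphs is calculated. One plausible reading, sketched below as an assumption, is a standardized difference: the observed statistic minus the mean of the same statistic over a Bernoulli sample, divided by that sample's standard deviation. The function name is illustrative.

```python
import numpy as np

def t_statistic(observed, reference_values):
    """Standardized difference of an observed statistic from a reference sample."""
    ref = np.asarray(reference_values, dtype=float)
    return (observed - ref.mean()) / ref.std(ddof=1)

# Illustrative use with made-up reference values.
t = t_statistic(3.0, [1.0, 2.0, 3.0])
```

Applied to the clustering figures quoted earlier (0.41 observed, against a Bernoulli mean of 0.25 with sd 0.03), the same calculation gives roughly (0.41 - 0.25) / 0.03 ≈ 5.3.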

Conclusions for this series of simulations

• The parameter estimates result in approximately equal numbers of attributed and non-attributed nodes.
  – Densities within the two sets of nodes are similar and high.

• As the “attribute expansiveness” (β1) parameter becomes more negative, and the “attribute connection” (β2) parameter more positive:
  – the acceptance rate for attributes decreases;
  – clustering and community structure increase, and 3rd quartile geodesics decrease, but median geodesics remain relatively short.

• Graphs with “small world” features, but not with skewed degree distributions, are common within a medium range of the “attribute expansiveness” parameter.

Second simulation series: 30 node graphs

Model parameters: α1, α2 (quadratic effect in no. of attributed nodes); θ (baseline effect for no. of edges); β1 (propensity for attributed nodes to have more partners); β2 (propensity for attributed nodes to be connected).

fix: α1, α2, θ, β1 at 6; 1; 2.5; 1 (magnitudes as transcribed; any minus signs were lost)

vary: β2 to be increasingly positive

[Figure: Numbers of edges and attributed nodes. x-axis: Beta2 (3.00, 3.50, 3.80, 3.90, 4.00, 4.10, 4.20, 4.30, 4.50); y-axis: 95% confidence intervals (0 to 80); series: mean no. of edges, mean no. of nodes]

[Figure: Assortative and disassortative mixing. x-axis: Beta2; y-axis: density (-0.1 to 0.6); series: density among non-attributed nodes, density between attributed and non-attributed nodes, density among attributed nodes]

[Figure: Acceptance rates. x-axis: Beta2; y-axis: acceptance rate (-0.1 to 0.4); series: attributes, edges]

[Figure: Clustering. x-axis: Beta2; y-axis: clustering coefficient (0.0 to 0.5); series: global clustering, local clustering]

[Figure: k-triangles. x-axis: Beta2; y-axis: t-statistic for k-triangles (-1 to 4)]

[Figure: Geodesics and clustering. x-axis: Beta2; y-axis: percentage of sample (0 to 120); series: short median geodesic, short third quartile geodesic, high clustering]

[Figure: Small worlds. x-axis: Beta2; y-axis: percent of distribution (0 to 40); series: SW50, SW75]

[Figure: Degree distributions for β2 = 3.0; x-axis: degree (D0 to D20); y-axis: 0 to 16]

[Figure: Degree distributions for β2 = 4.0; x-axis: degree (D0 to D20); y-axis: 0 to 12]

[Figure: Degree distributions for β2 = 4.5; x-axis: degree (D0 to D20); y-axis: 0 to 12]

Example graph for β2 = 4.0: the graph is SW50 (but not SW75), with a t-statistic for k-triangles (relative to Bernoulli) of 3.98, and with a skewed degree distribution.

[Figure: degree distribution of the example graph; x-axis: degree (D0 to D12); y-axis: frequency (0 to 8)]

Conclusions for the second series of simulations

• The parameter estimates result in a minority of attributed nodes with high internal density, and a majority of non-attributed nodes with lower density.

• As the “attribute connection” (β2) parameter increases:
  – the numbers of edges and attributes increase somewhat, and the acceptance rate for attributes decreases;
  – clustering and community structure increase, and 3rd quartile and median geodesics become longer;
  – degree distributions become skewed, and then bimodal.

• Graphs with “small world” features, and with skewed degree distributions, make up a sizeable proportion of distributions with a large “attribute similarity” (β2) parameter.

Some final comments

• This “thought experiment” demonstrates that several important global features of social networks may be emergent from attribute-based processes of mutually interacting social influence and social selection:
  – Short average paths
  – High clustering
  – Small world properties
  – Community structures
  – Skewed degree distribution

• Moreover, the models do not presume fixed attributes, although the structural properties begin to emerge as attributes become “sticky” (changing more slowly).

• Network models typically assume homogeneity across graphs.
  – This assumption may not be appropriate to the actual processes that are generating the network.
  – One way that homogeneity may break down is through attribute-based processes.
  – Other possibilities include social settings and geographic proximity.

• Network studies may require a careful conceptualisation of “process” to ensure that models are properly specified.
  – Because process is (usually) local, with global implications, the possibility of node-level effects should not be excluded.
