Social Network Analysis

AOT LAB, DII, UNIPR

Enrico Franchi ([email protected])


SNA = Complex Network Analysis on Social Networks

Outline

• Notation & Metrics: Degree Distribution, Path Lengths, Transitivity

• Models: Random Graphs, Small-Worlds, Preferential Attachment

• Models Discussion

• Conclusion

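Network

$G = (V, E)$, $E \subset V^2$, $\{(x, x) \mid x \in V\} \cap E = \emptyset$

Adjacency Matrix

$$A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$

Directed Network

$$k_i^{\text{in}} = \sum_j A_{ji}, \qquad k_i^{\text{out}} = \sum_j A_{ij}, \qquad k_i = k_i^{\text{in}} + k_i^{\text{out}}$$

Undirected Network ($A$ symmetric)

$$k_i = \sum_j A_{ji} = \sum_j A_{ij}$$

Degree Distribution

$$p_x = \frac{1}{n} \, \#\{ i \mid k_i = x \}$$

Average Degree

$$\langle k \rangle = n^{-1} \sum_{x \in V} k_x$$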
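A minimal Python sketch of these definitions, assuming the graph is given as a dense 0/1 adjacency matrix (a list of lists). The function names are hypothetical, chosen only for illustration.

```python
from collections import Counter


def degree_stats(adj):
    """Degrees, degree distribution p_x and average degree <k> of an undirected graph.

    adj is a symmetric 0/1 adjacency matrix (list of lists) with a zero diagonal.
    """
    n = len(adj)
    # k_i = sum_j A_ij, which equals sum_j A_ji because A is symmetric
    degrees = [sum(row) for row in adj]
    # p_x = (1/n) * #{i : k_i = x}
    counts = Counter(degrees)
    p = {x: c / n for x, c in sorted(counts.items())}
    # <k> = n^-1 * sum of all degrees
    avg = sum(degrees) / n
    return degrees, p, avg


def in_out_degrees(adj):
    """Directed case: k_i^in = sum_j A_ji and k_i^out = sum_j A_ij."""
    n = len(adj)
    k_in = [sum(adj[j][i] for j in range(n)) for i in range(n)]
    k_out = [sum(adj[i]) for i in range(n)]
    return k_in, k_out


star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(degree_stats(star))  # ([3, 1, 1, 1], {1: 0.75, 3: 0.25}, 1.5)
```

For a directed graph, the total degree $k_i$ is the element-wise sum of the two lists returned by `in_out_degrees`.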


Local Clustering Coefficient

$$C_i = \binom{k_i}{2}^{-1} T(i)$$

where $T(i)$ is the number of distinct triangles with $i$ as a vertex.

Clustering Coefficient

$$C = \frac{1}{n} \sum_{i \in V} C_i$$

Measure of Transitivity

$$C = \frac{\text{number of closed paths of length 2}}{\text{number of paths of length 2}} = \frac{(\text{number of triangles}) \times 3}{\text{number of connected triples}}$$
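The two clustering measures can be compared on a small example. This sketch reuses the 0/1 adjacency-matrix representation; setting $C_i = 0$ when $k_i < 2$ is an assumed convention, since $\binom{k_i}{2}$ is zero there.

```python
from itertools import combinations


def triangles_at(adj, i):
    """T(i): number of distinct triangles having i as a vertex."""
    nbrs = [j for j, a in enumerate(adj[i]) if a]
    return sum(1 for u, v in combinations(nbrs, 2) if adj[u][v])


def local_clustering(adj, i):
    """C_i = T(i) / binom(k_i, 2); 0 when k_i < 2 (assumed convention)."""
    k = sum(adj[i])
    if k < 2:
        return 0.0
    return triangles_at(adj, i) / (k * (k - 1) / 2)


def average_clustering(adj):
    """C = (1/n) * sum_i C_i."""
    return sum(local_clustering(adj, i) for i in range(len(adj))) / len(adj)


def transitivity(adj):
    """C = 3 * (#triangles) / (#connected triples)."""
    # Summing T(i) over all nodes counts every triangle three times
    closed = sum(triangles_at(adj, i) for i in range(len(adj)))
    triples = sum(k * (k - 1) // 2 for k in (sum(row) for row in adj))
    return closed / triples if triples else 0.0


# Triangle 0-1-2 plus a pendant node 3 attached to 0
g = [[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
print(round(average_clustering(g), 4), transitivity(g))  # 0.5833 0.6
```

The two values differ because the average of local coefficients weights every node equally, while transitivity weights each node by its number of connected triples.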


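Shortest Path Length and Diameter

Set of adjacency matrices: $(A, +, \cdot)$, with the usual scalar operations

$$[AB]_{ij} = \sum_k A_{ik} \cdot B_{kj}, \qquad AB = A \;{+}.{\cdot}\; B$$

The matrix product depends on the operations of the semiring. Other matrix products make sense, e.g., $(A, +, \wedge)$ or $(A, \wedge, +)$, with $\wedge = \min$.

We consider

$$S_k(M) = M \;{+}.{\wedge}\; \left(M^k \;{\wedge}.{+}\; M^k\right)$$

and the shortest path lengths matrix

$$L = (S_n \circ \dots \circ S_1)(M)$$

Diameter: $d = \max_{ij} L_{ij}$. Average shortest path: $\ell = \langle L_{ij} \rangle$.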
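As an illustration of the semiring idea, the sketch below uses the standard $(\min, +)$ product and repeated squaring to compute the shortest path lengths matrix of an unweighted graph. This textbook scheme is assumed here for illustration; it is not necessarily the exact $S_k$ operator on the slide.

```python
import math

INF = math.inf


def min_plus(A, B):
    """Product in the (min, +) semiring: [AB]_ij = min_k (A_ik + B_kj)."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shortest_path_lengths(adj):
    """All-pairs shortest path lengths of an unweighted graph by repeated (min, +) squaring."""
    n = len(adj)
    # One-hop matrix: 0 on the diagonal, 1 for an edge, infinity otherwise
    L = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
         for i in range(n)]
    hops = 1
    # Invariant: L[i][j] is the shortest length over paths with at most `hops` edges
    while hops < n - 1:
        L = min_plus(L, L)
        hops *= 2
    return L


def diameter_and_aspl(L):
    """d = max_ij L_ij and the mean of L_ij over i != j (assumes a connected graph)."""
    n = len(L)
    off = [L[i][j] for i in range(n) for j in range(n) if i != j]
    return max(off), sum(off) / len(off)


path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(diameter_and_aspl(shortest_path_lengths(path)))  # (3, 1.6666666666666667)
```

Each squaring costs $O(n^3)$ with this naive product and about $\log_2 n$ squarings are needed; the inner product is a natural target for parallelization.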


Computational Complexity of ASPL:

• All pairs shortest path, matrix based (parallelizable): $O(n^{3+\alpha})$, $\alpha \approx 3/4$

• All pairs shortest path, Bellman-Ford: $O(n^3)$

• All pairs shortest path, Dijkstra with Fibonacci heaps: $O(n^2 \log n + nm)$

Computing the CPL

$x = M_q(S)$: $q\,\#S$ elements are $\le x$ and $(1-q)\,\#S$ are $> x$.

$x = L_q^{\delta}(S)$: $q\,\#S(1-\delta)$ elements are $\le x$ and $(1-q)\,\#S(1-\delta)$ are $> x$.

Huber Algorithm

$$s = \frac{2 q^2 \ln 2}{(1-\delta)^2 \delta^2}$$

Let $R$ be a random sample of $S$ such that $\#R = s$; then $L_q^{\delta}(S) = M_q(R)$ with probability $p = 1 - \varepsilon$.
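A sketch of the sampling idea behind Huber's algorithm, under stated assumptions: $S$ is taken to be the set of per-vertex mean distances (the CPL definition used by Watts), $q = 1/2$ (the median), and the sample size $s$ is supplied by the caller rather than derived from the bound. Graphs are adjacency lists and the function names are hypothetical.

```python
import random
from collections import deque
from statistics import median


def bfs_lengths(graph, source):
    """Hop distances from source in an unweighted graph given as adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_distance_from(graph, v):
    """Average distance from v to the nodes it reaches (assumes v has at least one neighbour)."""
    dist = bfs_lengths(graph, v)
    others = [d for u, d in dist.items() if u != v]
    return sum(others) / len(others)


def sampled_cpl(graph, s, rng=None):
    """Median of the mean distances of s randomly sampled vertices (assumed S and q = 1/2)."""
    rng = rng or random.Random()
    nodes = list(graph)
    sample = rng.sample(nodes, min(s, len(nodes)))
    return median(mean_distance_from(graph, v) for v in sample)


path3 = {0: [1], 1: [0, 2], 2: [1]}
print(sampled_cpl(path3, 3, random.Random(0)))  # 1.5
```

Only $s$ breadth-first searches are run instead of $n$, which is where the saving over the exact all-pairs computation comes from.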


[Figure: Facebook Hugs Degree Distribution, log-log axes]

Nodes: 1322631
Edges: 1555597
m/n: 1.17
CPL: 11.74
Clustering coefficient: 0.0527
Number of components: 18987
Isles: 0
Largest component size: 1169456

For small k, power laws do not hold.

For large k, there are statistical fluctuations.

[Figure: log-log degree distribution with a power-law reference line, γ = 3]

Many networks have a power-law degree distribution, $p_k \propto k^{-\gamma}$ with $\gamma > 1$:

• Citation networks

• Biological networks

• WWW graph

• Internet graph

• Social networks
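Fitting a straight line to a log-log histogram is a common but biased way to estimate $\gamma$. A widely used alternative, swapped in here and not taken from the slides, is the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman: $\hat\gamma = 1 + n \left[\sum_i \ln \frac{k_i}{k_{\min} - 1/2}\right]^{-1}$, where the sum runs over the $n$ degrees $k_i \ge k_{\min}$. Choosing $k_{\min}$ (below which, as noted above, the power law does not hold) is assumed to be done separately.

```python
import math


def powerlaw_exponent_mle(degrees, k_min):
    """Approximate discrete MLE of gamma for p_k proportional to k^-gamma, k >= k_min."""
    # Keep only the tail where the power law is assumed to hold
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees >= k_min")
    s = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / s


degrees = [1, 1, 2, 3, 1, 5, 2, 1, 8, 1]
print(powerlaw_exponent_mle(degrees, 1))
```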


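Erdős-Rényi Random Graphs

Ensembles of Graphs: $G(n, p)$ and $G(n, m)$. In $G(n, p)$ every pair of nodes is linked independently with probability $p$: $\Pr(A_{ij} = 1) = p$.

[Figure: a graph in which each edge is present with probability p]

When we describe values of properties, we actually mean the expected value of the property over the ensemble:

$$\langle d \rangle := \sum_G \Pr(G) \cdot d(G) \propto \frac{\log n}{\log \langle k \rangle}$$

$$\Pr(G) = p^m (1-p)^{\binom{n}{2} - m}$$

$$\langle m \rangle = \binom{n}{2} p, \qquad \langle k \rangle = (n-1) p$$

$$p_k = \binom{n-1}{k} p^k (1-p)^{n-1-k}, \qquad p_k \to e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!} \quad (n \to \infty)$$

$$C = \langle k \rangle (n-1)^{-1}$$

Connectedness threshold: $p = \log n / n$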
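A minimal sampler for $G(n, p)$, useful to check the ensemble formulas empirically: the mean degree of a sample should be close to $(n-1)p$, and for large $n$ the degree histogram approaches the Poisson form. The function names are hypothetical.

```python
import math
import random


def gnp(n, p, rng=None):
    """Sample G(n, p): each of the binom(n, 2) node pairs is linked independently with probability p."""
    rng = rng or random.Random()
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def mean_degree(adj):
    """Empirical <k> = 2m / n."""
    return sum(len(nbrs) for nbrs in adj) / len(adj)


def poisson_pk(k, mean_k):
    """Large-n limit of the binomial degree distribution: e^-<k> <k>^k / k!."""
    return math.exp(-mean_k) * mean_k ** k / math.factorial(k)


g = gnp(1000, 0.01, random.Random(7))
print(mean_degree(g), (1000 - 1) * 0.01)  # the two values should be close
```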

Watts-Strogatz Model


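In the modified model, shortcut edges are only added: no lattice edge is removed.

$$k_i = \kappa + s_i$$

where $\kappa$ is the number of lattice edges at node $i$ and $s_i$ is the number of shortcuts added to it.

$$p_s = e^{-\kappa p} \frac{(\kappa p)^s}{s!}, \qquad p_k = e^{-\kappa p} \frac{(\kappa p)^{k-\kappa}}{(k-\kappa)!} \quad (k \ge \kappa)$$

$$C = \frac{3(\kappa - 2)}{4(\kappa - 1) + 8\kappa p + 4\kappa p^2}$$

Average shortest path:

$$\ell \approx \frac{\log(n p \kappa)}{\kappa^2 p}$$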
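A sketch of the modified model, assuming the usual construction: a ring lattice in which every node links to its $\kappa/2$ nearest neighbours on each side ($\kappa$ even), plus, for each lattice edge, one shortcut between two random nodes added with probability $p$. Comparing the measured transitivity with the formula for $C$ gives a quick sanity check; the helper names are hypothetical.

```python
import random
from itertools import combinations


def newman_watts(n, kappa, p, rng=None):
    """Ring lattice of degree kappa plus random shortcuts; no lattice edge is removed."""
    rng = rng or random.Random()
    adj = [set() for _ in range(n)]
    lattice = [(i, (i + d) % n) for i in range(n) for d in range(1, kappa // 2 + 1)]
    for u, v in lattice:
        adj[u].add(v)
        adj[v].add(u)
    for _ in lattice:
        if rng.random() < p:
            u, v = rng.sample(range(n), 2)  # distinct endpoints, so no self-loops
            adj[u].add(v)  # a duplicate edge simply collapses in the set
            adj[v].add(u)
    return adj


def transitivity(adj):
    """3 * #triangles / #connected triples, counting linked neighbour pairs per node."""
    closed = sum(1 for nbrs in adj for u, v in combinations(nbrs, 2) if v in adj[u])
    triples = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj)
    return closed / triples


def predicted_c(kappa, p):
    """C = 3(kappa - 2) / (4(kappa - 1) + 8 kappa p + 4 kappa p^2)."""
    return 3 * (kappa - 2) / (4 * (kappa - 1) + 8 * kappa * p + 4 * kappa * p ** 2)


g = newman_watts(2000, 6, 0.1, random.Random(1))
print(transitivity(g), predicted_c(6, 0.1))
```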

[Figure: Strogatz-Watts Model, 10000 nodes, k = 4: CPL(p)/CPL(0) and C(p)/C(0) versus p, marking the short-CPL threshold and the large-clustering-coefficient threshold]

[Image © Matt Britt]

BARABASI-ALBERT-MODEL(G, M, STEPS)
  FOR K FROM 1 TO STEPS
    A ← MAKE-ARRAY()
    FOR N IN NODES(G)
      PUSH(A, N)
      FOR J FROM 1 TO DEGREE(N)
        PUSH(A, N)
    N0 ← NEW-NODE(G)
    ADD-NODE(G, N0)
    FOR J FROM 1 TO M
      N ← RANDOM-CHOICE(A)
      ADD-LINK(N0, N)
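A runnable Python version of the BARABASI-ALBERT-MODEL pseudocode, under assumptions that are ours rather than the slide's: the process starts from $m_0$ isolated nodes, the array is built before the new node is added (so a node never links to itself), and repeated picks of the same target collapse into a single edge. Each node appears in the array $1 + k$ times, so targets are chosen with probability proportional to $k + 1$.

```python
import random


def barabasi_albert(m0, m, steps, rng=None):
    """Preferential attachment: each step adds one node with up to m degree-biased links."""
    rng = rng or random.Random()
    adj = {v: set() for v in range(m0)}  # assumed starting graph: m0 isolated nodes
    for _ in range(steps):
        # Array A: every existing node once, plus once more per incident edge
        a = []
        for node, nbrs in adj.items():
            a.extend([node] * (1 + len(nbrs)))
        new = len(adj)
        adj[new] = set()
        for _ in range(m):
            target = rng.choice(a)
            # Repeated picks of the same target collapse into one edge (simplification)
            adj[new].add(target)
            adj[target].add(new)
    return adj


g = barabasi_albert(3, 2, 1000, random.Random(1))
print(max(len(nbrs) for nbrs in g.values()))  # largest hub degree
```

Rebuilding the array at every step keeps the code close to the pseudocode; an incremental array (appending both endpoints of each new edge) would avoid the quadratic cost.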

Barabási-Albert Model

$$\Pr(V = x) = \sum_{e \in N(x)} \Pr(E = e) = \frac{k_x}{m} = \frac{2 k_x}{\sum_x k_x}$$
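The identity can be checked exactly with rational arithmetic: pick an edge uniformly at random (each with probability $1/m$) and compute, for every node $x$, the probability that $x$ is one of its endpoints. The function name is hypothetical.

```python
from fractions import Fraction


def endpoint_probability(edges):
    """Pr(x is an endpoint of a uniformly chosen edge) = sum over e in N(x) of Pr(E = e) = k_x / m."""
    m = len(edges)
    prob = {}
    for u, v in edges:
        for x in (u, v):
            # Each incident edge contributes Pr(E = e) = 1/m
            prob[x] = prob.get(x, Fraction(0)) + Fraction(1, m)
    return prob


edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
print(endpoint_probability(edges))
# {0: Fraction(3, 4), 1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(1, 4)}
```

These probabilities sum to 2; choosing one of the two endpoints at random halves them to $k_x / (2m) = k_x / \sum_x k_x$, a degree-proportional choice.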

$$p_k \propto k^{-3}$$

No analytical proof available

Average shortest path:

$$\ell \approx \frac{\log n}{\log \log n}$$

$$C \approx n^{-3/4}$$

Scale-free entails short CPL.

Transitivity disappears with network size.

Connectedness Threshold

$\log n / \log \log n$

| OSN | Refs. | Users | Links | ⟨k⟩ | C | CPL | d | γ | r |
|---|---|---|---|---|---|---|---|---|---|
| Club Nexus | Adamic et al | 2.5 K | 10 K | 8.2 | 0.2 | 4 | 13 | n.a. | n.a. |
| Cyworld | Ahn et al | 12 M | 191 M | 31.6 | 0.2 | 3.2 | 16 | | -0.1 |
| Cyworld T | Ahn et al | 92 K | 0.7 M | 15.3 | 0.3 | 7.2 | n.a. | n.a. | 0.4 |
| LiveJournal | Mislove et al | 5 M | 77 M | 17 | 0.3 | 5.9 | 20 | | 0.2 |
| Flickr | Mislove et al | 1.8 M | 22 M | 12.2 | 0.3 | 5.7 | 27 | | 0.2 |
| Twitter | Kwak et al | 41 M | 1700 M | n.a. | n.a. | 4 | 4.1 | | n.a. |
| Orkut | Mislove et al | 3 M | 223 M | 106 | 0.2 | 4.3 | 9 | 1.5 | 0.1 |
| Orkut | Ahn et al | 100 K | 1.5 M | 30.2 | 0.3 | 3.8 | n.a. | 3.7 | 0.3 |
| Youtube | Mislove et al | 1.1 M | 5 M | 4.29 | 0.1 | 5.1 | 21 | | -0 |
| Facebook | Gjoka et al | 1 M | n.a. | n.a. | 0.2 | n.a. | n.a. | | 0.23 |
| FB H | Nazir et al | 51 K | 116 K | n.a. | 0.4 | n.a. | 29 | | n.a. |
| FB GL | Nazir et al | 277 K | 600 K | n.a. | 0.3 | n.a. | 45 | | n.a. |
| BrightKite | Scellato et al | 54 K | 213 K | 7.88 | 0.2 | 4.7 | n.a. | | n.a. |
| FourSquare | Scellato et al | 58 K | 351 K | 12 | 0.3 | 4.6 | n.a. | | n.a. |
| LiveJournal | Scellato et al | 993 K | 29.6 M | 29.9 | 0.2 | 4.9 | n.a. | | n.a. |
| Twitter | Java et al | 87 K | 829 K | 18.9 | 0.1 | n.a. | 6 | | 0.59 |
| Twitter | Scellato et al | 409 K | 183 M | 447 | 0.2 | 2.8 | n.a. | | n.a. |

| Model | Static | Degree distribution | C | Rigid |
|---|---|---|---|---|
| ER | Yes | Poisson | Low | - |
| WS | Yes | Poisson | Ok | Yes |
| BA | No | PL, γ = 3 | Fixable | Yes |

• Moreover:
  • Mostly no navigability
  • Uniformity assumption
  • Sometimes too complex for analytic study
  • Few features studied
  • Power-law?

Alternative Models for Degree Distributions

Power laws are difficult to fit. Even when they do fit, there are often better distributions.

A power law with cutoff almost always fits better than a plain power law:

$$f(x; \gamma, \beta) = x^{-\gamma} e^{-\beta x}$$

Sometimes the log-normal distribution is more appropriate:

$$f(x; \sigma, m) = \frac{1}{x \sigma (2\pi)^{1/2}} \exp\left( -\frac{(\log(x/m))^2}{2\sigma^2} \right)$$

Most of the time, random attachment and preferential attachment processes act together:

$$F(x; r) = 1 - (rm)^{1+r} (x + rm)^{-(1+r)}$$

• $r \to 0$: scale-free

• $r \to \infty$: negative exponential distribution
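The three candidate distributions, written as plain Python functions for comparison. The cutoff uses $e^{-\beta x}$ with $\beta > 0$ (the usual convention), and $F(x; r)$ is evaluated as $1 - \left(\frac{rm}{x + rm}\right)^{1+r}$, which is algebraically the same but avoids overflowing $(rm)^{1+r}$ for large $r$. The function names are hypothetical.

```python
import math


def powerlaw_cutoff(x, gamma, beta):
    """Unnormalised power law with exponential cutoff: x^-gamma * e^(-beta * x)."""
    return x ** -gamma * math.exp(-beta * x)


def lognormal_pdf(x, sigma, m):
    """Log-normal density with median m: exp(-(log(x/m))^2 / (2 sigma^2)) / (x sigma sqrt(2 pi))."""
    return math.exp(-math.log(x / m) ** 2 / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))


def mixed_cdf(x, r, m):
    """F(x; r) = 1 - (r m)^(1+r) (x + r m)^-(1+r), computed as a ratio to avoid overflow."""
    return 1 - (r * m / (x + r * m)) ** (1 + r)


# Large r: 1 - F(x; r) approaches the negative exponential e^(-x/m)
print(1 - mixed_cdf(2.0, 1e6, 1.0), math.exp(-2.0))
```

Since $1 - F(x; r) = \left(1 + \frac{x}{rm}\right)^{-(1+r)}$, it tends to $e^{-x/m}$ as $r \to \infty$, while for small $r$ the tail decays like $x^{-(1+r)}$, a power law.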


Milgram's Experiment

6 Degrees

[Map: Omaha (Nebraska), Wichita (Kansas), Boston (Massachusetts)]

• Random people from Omaha & Wichita were asked to send a postcard to a person in Boston:
  • Write the name on the postcard
  • Forward the message only to people known personally who were more likely to know the target

1st run: 64/296 arrived, most of them delivered to the target by 2 men

2nd run: 24/160 arrived, 2/3 delivered by "Mr. Jacobs"

2 ≤ hops ≤ 10; μ = 5.x

CPL, hubs, ... and Kleinberg's intuition

Biased Preferential Attachment

At each step:

• A new node is added to the network and is assigned to one of the sets P, I and L according to a probability distribution h;
• $e_0 \in \mathbb{N}^+$ edges are added to the network;
• for each edge $(u, v)$, $u$ is chosen with distribution $D_0$ and:
  • if $u \in I$, $v$ is a new node and is assigned to $P$;
  • if $u \in L$, $v$ is chosen according to $D_\gamma$.

$$D_\beta(u) \propto \begin{cases} (\beta + 1)(k_u + 1) & u \in L \\ k_u + 1 & u \in I \\ 0 & u \in P \end{cases}$$

No analytic results available.
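A sketch of $D_\beta$ as a normalised distribution, assuming nodes are stored with their degree and their set label ('P', 'I' or 'L'); $D_0$ is the case $\beta = 0$. The representation and the names are hypothetical, since the slide does not fix them.

```python
import random


def d_beta(degrees, kind, beta):
    """Normalised D_beta(u): (beta+1)(k_u+1) for u in L, k_u+1 for u in I, 0 for u in P.

    degrees maps node -> degree; kind maps node -> 'P', 'I' or 'L'.
    """
    weights = {}
    for u, k in degrees.items():
        if kind[u] == "L":
            weights[u] = (beta + 1) * (k + 1)
        elif kind[u] == "I":
            weights[u] = k + 1
        else:  # nodes in P are never chosen
            weights[u] = 0
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}


def sample_node(dist, rng):
    """Draw one node according to a distribution returned by d_beta."""
    nodes = list(dist)
    return rng.choices(nodes, weights=[dist[u] for u in nodes], k=1)[0]


degrees = {0: 1, 1: 3, 2: 2}
kind = {0: "L", 1: "I", 2: "P"}
print(d_beta(degrees, kind, 1.0))  # {0: 0.5, 1: 0.5, 2: 0.0}
u = sample_node(d_beta(degrees, kind, 0.0), random.Random(0))  # one draw from D_0
```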

Transitive Linking Model [Davidsen 02]

• At each step:
  • TL: a random node is chosen, and it introduces two of the nodes linked to it to each other; if the node does not have 2 edges, it introduces itself to a random node.
  • RM: with probability p, a node is chosen and removed along with its edges, and replaced with a node with one random edge.
• When $p \ll 1$, TL dominates the process:
  • the degree distribution is a power law with cutoff;
  • $1 - C = p(\langle k \rangle - 1)$, i.e., $C$ is quite large in practice.
• For larger values of $p$, the two processes act together to form an exponential degree distribution.
• For $p \to 1$, the degree distribution is essentially a Poisson distribution.

Bergenti, Franchi, Poggi (Univ. Parma), Models for Agent-based Simulation of SN, SNAMAS '11

Instead of a single p, it would make sense to have distinct p and r parameters for nodes leaving and entering the network.

Few analytic results available.
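A sketch of one step of the transitive linking process, assuming an undirected graph stored as a dict from node to neighbour set, with at least two nodes. The labelling of replacement nodes is a hypothetical choice; the slides do not specify it.

```python
import random


def transitive_linking_step(adj, p, rng):
    """One TL step followed, with probability p, by one RM step (adj is modified in place)."""
    nodes = list(adj)
    u = rng.choice(nodes)
    nbrs = list(adj[u])
    if len(nbrs) >= 2:
        # TL: u introduces two of its neighbours to each other
        a, b = rng.sample(nbrs, 2)
    else:
        # Fewer than 2 edges: u introduces itself to a random other node
        a, b = u, rng.choice([v for v in nodes if v != u])
    adj[a].add(b)
    adj[b].add(a)
    if rng.random() < p:
        # RM: remove a random node together with all its edges...
        old = rng.choice(list(adj))
        for v in adj.pop(old):
            adj[v].discard(old)
        # ...and replace it with a new node linked to one random existing node
        target = rng.choice(list(adj))
        new = max(adj) + 1  # hypothetical labelling: next free integer
        adj[new] = {target}
        adj[target].add(new)


ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
rng = random.Random(3)
for _ in range(200):
    transitive_linking_step(ring, 0.05, rng)
```

Each RM step removes one node and adds one, so the number of nodes stays constant while TL keeps closing triangles.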

[1] Dorogovtsev, S. N. and Mendes, J. F. F. 2003. Evolution of Networks: From Biological Nets to the Internet and WWW (Physics). Oxford University Press, USA.

[2] Watts, D. J. 2003. Small Worlds: The Dynamics of Networks between Order and Randomness (Princeton Studies in Complexity). Princeton University Press.

[3] Jackson, M. O. 2010. Social and Economic Networks. Princeton University Press.

[4] Newman, M. 2010. Networks: An Introduction. Oxford University Press, USA.

[5] Wasserman, S. and Faust, K. 1994. Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge University Press.

[6] Scott, J. P. 2000. Social Network Analysis: A Handbook. Sage Publications Ltd.

[7] Kepner, J. and Gilbert, J. 2011. Graph Algorithms in the Language of Linear Algebra (Software, Environments, and Tools). Society for Industrial & Applied Mathematics.

[8] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. 2009. Introduction to Algorithms. The MIT Press.

[9] Skiena, S. S. 2010. The Algorithm Design Manual. Springer.

[10] Bollobás, B. 1998. Modern Graph Theory. Springer.

[11] Watts, D. J. and Strogatz, S. H. 1998. Collective dynamics of 'small-world' networks. Nature. 393, 6684, 440-442.

[12] Barabási, A. L. and Albert, R. 1999. Emergence of scaling in random networks. Science. 286, 5439, 509.

[13] Kleinberg, J. 2000. The small-world phenomenon: an algorithmic perspective. Proceedings of the thirty-second annual ACM symposium on Theory of computing. 163-170.

[14] Milgram, S. 1967. The small world problem. Psychology Today. 2, 1, 60-67.

Thanks for your kind attention.

Enrico Franchi ([email protected])
AOTLAB, Dipartimento di Ingegneria dell'Informazione, Università di Parma



