CS224W: Social and Information Network Analysis Lada Adamic http://cs224w.stanford.edu


Page 1: CS224W:’Social’and’Information’Network’Analysis

CS224W:  Social  and  Information  Network  Analysis  Lada  Adamic  

http://cs224w.stanford.edu  

Page 2: CS224W:’Social’and’Information’Network’Analysis
Page 3: CS224W:’Social’and’Information’Network’Analysis
Page 4: CS224W:’Social’and’Information’Network’Analysis
Page 5: CS224W:’Social’and’Information’Network’Analysis
Page 6: CS224W:’Social’and’Information’Network’Analysis

Stanford  Social  Web  (ca.  1999)  

network  of  personal  homepages  at  Stanford  

Page 7: CS224W:’Social’and’Information’Network’Analysis

In each of the following networks, X has higher centrality than Y according to a particular measure.

[Figure: four small networks, each with nodes X and Y marked, one per measure: indegree, outdegree, betweenness, closeness]

different notions of centrality

Page 8: CS224W:’Social’and’Information’Network’Analysis

[Figure: network with nodes X and Y marked]

review: indegree

Page 9: CS224W:’Social’and’Information’Network’Analysis

trade in petroleum and petroleum products, 1998, source: NBER-United Nations Trade Data

Page 10: CS224W:’Social’and’Information’Network’Analysis

• Which countries have high indegree (import petroleum and petroleum products from many others)?
  • Saudi Arabia
  • Japan
  • Iraq
  • USA
  • Venezuela

Page 11: CS224W:’Social’and’Information’Network’Analysis

review: outdegree

[Figure: network with nodes X and Y marked]

Page 12: CS224W:’Social’and’Information’Network’Analysis

[Figure: network diagram of countries, labeled by name, linked by trade in petroleum and petroleum products, 1998. Source: NBER-United Nations Trade Data]

Page 13: CS224W:’Social’and’Information’Network’Analysis

• Which country has low outdegree but exports a significant quantity of petroleum products (thickness of the edges represents $ value of export)?
  • Saudi Arabia
  • Japan
  • Iraq
  • USA
  • Venezuela

[Figure: network diagram of countries, labeled by name, linked by trade in petroleum and petroleum products; edge thickness represents the $ value of exports]

Page 14: CS224W:’Social’and’Information’Network’Analysis

[Figure: network diagram of countries, labeled by name, linked by trade in crude petroleum and petroleum products, 1998. Source: NBER-United Nations Trade Data]

Page 15: CS224W:’Social’and’Information’Network’Analysis

Undirected degree, e.g. nodes with more friends are more central.

Assumption: the connections that your friend has don't matter, it is what they can do directly that does (e.g. go have a beer with you, help you build a deck...)

putting  numbers  to  it  

Page 16: CS224W:’Social’and’Information’Network’Analysis

divide degree by the max. possible, i.e. (N-1)

normalization  
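The normalization above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the adjacency-dictionary input format and the toy star network are assumptions chosen for the example.

```python
def degree_centrality(adj):
    """Return {node: degree / (N - 1)} for an undirected graph.

    `adj` maps each node to the set of its neighbours
    (a hypothetical input format chosen for this sketch).
    """
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neigh) / (n - 1) for node, neigh in adj.items()}


# Toy star network: the hub is tied to every other node.
star = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
scores = degree_centrality(star)
```

The hub reaches the maximum possible degree, so its normalized score is 1; each leaf has one of three possible ties.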

Page 17: CS224W:’Social’and’Information’Network’Analysis

Freeman's general formula for centralization (can use other metrics, e.g. Gini coefficient or standard deviation):

$$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(i) \right]}{(N-1)(N-2)}$$

where C_D(n^*) is the maximum value in the network and the sum runs over all g = N nodes.

How much variation is there in the centrality scores among the nodes?

centralization: skew in distribution

Page 18: CS224W:’Social’and’Information’Network’Analysis

CD = 0.167

CD = 0.167

CD = 1.0

degree  centralization  examples  
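As a rough check on values like these, Freeman's degree centralization can be computed directly from the degree sequence. The sketch below is an illustration only: the helper functions and the three test graphs (star, path, cycle) are assumptions, not necessarily the networks drawn on the slide. A 5-node path happens to give 2/12 ≈ 0.167 and a star gives 1.0.

```python
def degree_centralization(adj):
    """Freeman degree centralization of an undirected graph with N >= 3 nodes."""
    n = len(adj)
    degrees = [len(neigh) for neigh in adj.values()]
    d_max = max(degrees)  # C_D(n*), the maximum degree in the network
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))


def path_graph(n):
    """Nodes 0..n-1 joined in a line (hypothetical helper)."""
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    return adj


def cycle_graph(n):
    """A path whose two ends are also joined (hypothetical helper)."""
    adj = path_graph(n)
    adj[0].add(n - 1)
    adj[n - 1].add(0)
    return adj


def star_graph(n):
    """Node 0 joined to every other node (hypothetical helper)."""
    adj = {i: set() for i in range(n)}
    for i in range(1, n):
        adj[0].add(i)
        adj[i].add(0)
    return adj


star_value = degree_centralization(star_graph(5))
path_value = degree_centralization(path_graph(5))
cycle_value = degree_centralization(cycle_graph(5))
```

A cycle has no variation in degree at all, so its centralization is 0.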

Page 19: CS224W:’Social’and’Information’Network’Analysis

example financial trading networks

high in-centralization: one node buying from many others

low in-centralization: buying is more evenly distributed

real-­‐world  examples  

Page 20: CS224W:’Social’and’Information’Network’Analysis

In what ways does degree fail to capture centrality in the following graphs?

Page 21: CS224W:’Social’and’Information’Network’Analysis

Stanford  Social  Web  (ca.  1999)  

network  of  personal  homepages  at  Stanford  

Page 22: CS224W:’Social’and’Information’Network’Analysis

Y

X

Page 23: CS224W:’Social’and’Information’Network’Analysis
Page 24: CS224W:’Social’and’Information’Network’Analysis
Page 25: CS224W:’Social’and’Information’Network’Analysis

• intuition: how many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?

[Figure: network with nodes X and Y marked]

Page 26: CS224W:’Social’and’Information’Network’Analysis

$$C_B(i) = \sum_{j<k} g_{jk}(i) / g_{jk}$$

where g_{jk} = the number of shortest paths connecting j and k, and g_{jk}(i) = the number of those paths that actor i is on.

Usually normalized by the number of pairs of vertices excluding the vertex itself:

$$C_B'(i) = C_B(i) / [(n-1)(n-2)/2]$$

Betweenness: definition

Page 27: CS224W:’Social’and’Information’Network’Analysis

¡  non-normalized version:

Page 28: CS224W:’Social’and’Information’Network’Analysis

¡  non-normalized version:

[Figure: path network A-B-C-D-E]

  • A lies between no two other vertices
  • B lies between A and 3 other vertices: C, D, and E
  • C lies between 4 pairs of vertices: (A,D), (A,E), (B,D), (B,E)
  • note that there are no alternate paths for these pairs to take, so C gets full credit

Page 29: CS224W:’Social’and’Information’Network’Analysis

¡  non-normalized version:

Page 30: CS224W:’Social’and’Information’Network’Analysis

¡  non-normalized version:

[Figure: network with nodes A, B, C, D, E]

  • why do C and D each have betweenness 1?
  • They are both on shortest paths for the pairs (A,E) and (B,E), and so must share credit: ½ + ½ = 1
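The non-normalized counts in these toy examples can be reproduced with a brute-force sketch: count shortest paths from every node with breadth-first search, then give each intermediate node its share g_jk(i)/g_jk for every pair. The first graph is the path A-B-C-D-E described in the text. The edge list of the second graph is an assumption (A-B, B-C, B-D, C-E, D-E), chosen to be consistent with C and D sharing credit for the pairs (A,E) and (B,E); the figure itself did not survive extraction.

```python
from collections import deque
from itertools import combinations


def from_edges(edges):
    """Build an undirected adjacency dict from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Non-normalized betweenness: sum over pairs j<k of g_jk(i) / g_jk."""
    info = {s: bfs_counts(adj, s) for s in adj}
    scores = {i: 0.0 for i in adj}
    for j, k in combinations(sorted(adj), 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue  # j and k are not connected
        dist_k, sigma_k = info[k]
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            # i is on a shortest j-k path iff the two legs add up to d(j, k)
            if dist_j[i] + dist_k[i] == dist_j[k]:
                scores[i] += sigma_j[i] * sigma_k[i] / sigma_j[k]
    return scores


path = from_edges([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
# Assumed edge list for the second toy graph (not confirmed by the source).
diamond = from_edges([("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")])

path_scores = betweenness(path)
diamond_scores = betweenness(diamond)
```

This pairwise approach is easy to read but slow on large graphs; it is meant only to mirror the definition, and faster algorithms exist for real networks.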

Page 31: CS224W:’Social’and’Information’Network’Analysis

¡ What  is  the  betweenness  of  node  E?  

E

Page 32: CS224W:’Social’and’Information’Network’Analysis

Lada’s old Facebook network: nodes are sized by degree, and colored by betweenness.

betweenness:  example  

Page 33: CS224W:’Social’and’Information’Network’Analysis

Q:  high  betweenness,  low  degree  

¤ Find a node that has high betweenness but low degree

Page 34: CS224W:’Social’and’Information’Network’Analysis

Q:  low  betweenness,  high  degree  

¤ Find a node that has low betweenness but high degree

Page 35: CS224W:’Social’and’Information’Network’Analysis

• What if it's not so important to have many direct friends?
• Or be "between" others
• But one still wants to be in the "middle" of things, not too far from the center

Page 36: CS224W:’Social’and’Information’Network’Analysis

need not be in a brokerage position

[Figure: four small networks, each with nodes X and Y marked]

Page 37: CS224W:’Social’and’Information’Network’Analysis

Closeness is based on the length of the average shortest path between a node and all other nodes in the network

Closeness Centrality:

$$C_C(i) = \left[ \sum_{j=1}^{N} d(i,j) \right]^{-1}$$

Normalized Closeness Centrality:

$$C_C'(i) = C_C(i) \cdot (N-1)$$

closeness: definition

Page 38: CS224W:’Social’and’Information’Network’Analysis

$$C_C'(A) = \left[ \frac{\sum_{j=1}^{N} d(A,j)}{N-1} \right]^{-1} = \left[ \frac{1+2+3+4}{4} \right]^{-1} = \left[ \frac{10}{4} \right]^{-1} = 0.4$$

[Figure: path network A-B-C-D-E]

Closeness: toy example
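The toy calculation can be reproduced with a breadth-first search. The sketch below assumes a connected, undirected graph stored as an adjacency dictionary (an input format chosen for illustration) and computes the normalized score as (N-1) divided by the sum of distances, which is the quantity evaluated for node A above.

```python
from collections import deque


def distances_from(adj, source):
    """Hop distances from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, node, normalized=True):
    """Inverse of the summed distances; assumes the graph is connected."""
    total = sum(distances_from(adj, node).values())  # d(node, node) = 0 adds nothing
    raw = 1.0 / total
    return raw * (len(adj) - 1) if normalized else raw


# The path A-B-C-D-E from the toy example.
path = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
closeness_a = closeness(path, "A")
closeness_c = closeness(path, "C")
```

The middle node C is at total distance 6 from the others, so its normalized closeness, 4/6, is higher than A's 0.4.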

Page 39: CS224W:’Social’and’Information’Network’Analysis

Closeness:  more  toy  examples  

Page 40: CS224W:’Social’and’Information’Network’Analysis

Q:high  degree,  low  closeness  

Which  node  has  relatively  high  degree  but  low  closeness?  

Page 41: CS224W:’Social’and’Information’Network’Analysis

¡   How  central  you  are  depends  on  how  central  your  neighbors  are  

Page 42: CS224W:’Social’and’Information’Network’Analysis

$$c(\beta) = \alpha (I - \beta A)^{-1} A \mathbf{1}$$

• α is a normalization constant
• β determines how important the centrality of your neighbors is
• A is the adjacency matrix (can be weighted)
• I is the identity matrix (1s down the diagonal, 0 off-diagonal)
• 1 is a column vector of all ones

Bonacich eigenvector centrality

$$c_i(\beta) = \sum_j (\alpha + \beta c_j) A_{ji}$$

Page 43: CS224W:’Social’and’Information’Network’Analysis

• small β → high attenuation
  • only your immediate friends matter, and their importance is factored in only a bit
• high β → low attenuation
  • global network structure matters (your friends, friends of your friends, etc.)
• β = 0 yields simple degree centrality

Bonacich Power Centrality: attenuation factor β

$$c_i(\beta) = \sum_j (\alpha + \beta c_j) A_{ji}$$

Page 44: CS224W:’Social’and’Information’Network’Analysis

If β > 0, nodes have higher centrality when they have edges to other central nodes. If β < 0, nodes have higher centrality when they have edges to less central nodes.

Bonacich  Power  Centrality:  attenuation  factor  β

Page 45: CS224W:’Social’and’Information’Network’Analysis

β=.25

β=-.25

Why does the middle node have lower centrality than its neighbors when β is negative?

Bonacich  Power  Centrality:  examples
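The effect of the sign of β can be explored numerically. The sketch below iterates the recursive definition c_i(β) = Σ_j (α + β c_j) A_ji until it settles, which avoids an explicit matrix inverse. Several things here are assumptions made for illustration: α is fixed at 1 instead of being used to normalize, the graph is a 5-node path (the slide's own networks are not recoverable), and the iteration only converges when |β| is smaller than the reciprocal of the largest eigenvalue of A.

```python
def bonacich_power(adj_matrix, beta, alpha=1.0, iterations=500):
    """Iterate c <- alpha * A1 + beta * A c for a symmetric adjacency matrix.

    The fixed point equals alpha * (I - beta*A)^(-1) A 1.
    Converges only if |beta| < 1 / (largest eigenvalue of A).
    """
    n = len(adj_matrix)
    degree = [sum(row) for row in adj_matrix]  # the vector A1
    c = [alpha * d for d in degree]
    for _ in range(iterations):
        c = [
            alpha * degree[i]
            + beta * sum(adj_matrix[i][j] * c[j] for j in range(n))
            for i in range(n)
        ]
    return c


# Hypothetical example: an undirected path on 5 nodes, 0-1-2-3-4.
path5 = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
c_zero = bonacich_power(path5, beta=0.0)
c_pos = bonacich_power(path5, beta=0.25)
c_neg = bonacich_power(path5, beta=-0.25)
```

On this path, β = 0 returns the degrees, β = 0.25 puts the middle node on top, and β = -0.25 puts the middle node below its two neighbors: its neighbors are themselves well connected, which counts against it, while each neighbor benefits from having a weak end node next to it.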

Page 46: CS224W:’Social’and’Information’Network’Analysis

• WWW
• food webs
• population dynamics
• influence
• hereditary
• citation
• transcription regulation networks
• neural networks

Page 47: CS224W:’Social’and’Information’Network’Analysis

• We now consider the fraction of all directed shortest paths between any two vertices that pass through a node

$$C_B(i) = \sum_{j,k} g_{jk}(i) / g_{jk}$$

where C_B(i) is the betweenness of vertex i, g_{jk}(i) counts the paths between j and k that pass through i, and g_{jk} counts all paths between j and k.

  • Only modification: when normalizing, we have (N-1)(N-2) instead of (N-1)(N-2)/2, because we have twice as many ordered pairs as unordered pairs

$$C_B'(i) = C_B(i) / [(N-1)(N-2)]$$

Page 48: CS224W:’Social’and’Information’Network’Analysis

¡  A node does not necessarily lie on a geodesic (shortest path) from j to k if it lies on a geodesic from k to j

k

j

Page 49: CS224W:’Social’and’Information’Network’Analysis

• choose a direction
  • in-closeness (e.g. prestige in citation networks)
  • out-closeness
• usually consider only vertices from which the node i in question can be reached

Page 50: CS224W:’Social’and’Information’Network’Analysis

• PageRank (centrality) brings order to the Web:
  • it's not just the pages that point to you, but how many pages point to those pages, etc.
  • more difficult to artificially inflate centrality with a recursive definition

Many webpages scattered across the web

an important page, e.g. slashdot

if a web page is slashdotted, it gains attention

Page 51: CS224W:’Social’and’Information’Network’Analysis

¡  A random walker following edges in a network for a very long time will spend a proportion of time at each node which can be used as a measure of importance

Page 52: CS224W:’Social’and’Information’Network’Analysis

• Problem with pure random walk metric:
  • Drunk can be "trapped" and end up going in circles

Page 53: CS224W:’Social’and’Information’Network’Analysis

• Allow drunk to teleport with some probability
  • e.g. random websurfer follows links for a while, but with some probability teleports to a "random" page (bookmarked page or uses a search engine to start anew)
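The random walk with teleportation can be sketched as a power iteration. This is an illustration under stated assumptions, not the course's reference code: the `pagerank` function, its parameter names, and the four-page example graph are all hypothetical, and pages with no outgoing links are handled by letting the walker jump to a uniformly random page.

```python
def pagerank(out_links, teleport=0.2, iterations=100):
    """Power iteration for a random surfer who teleports with probability `teleport`.

    `out_links` maps every page to the list of pages it links to.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: teleport / n for v in nodes}  # mass arriving by teleport
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = (1 - teleport) * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its mass uniformly over all pages.
                share = (1 - teleport) * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical web: a, b, c form a cycle (a "trap"); d links into it.
web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}

ranks_low = pagerank(web, teleport=0.1)
ranks_high = pagerank(web, teleport=0.5)
ranks_all = pagerank(web, teleport=1.0)


def spread(ranks):
    """Gap between the highest and lowest score."""
    return max(ranks.values()) - min(ranks.values())
```

Without teleportation the walker would circle a, b, c forever and never return to d. On this small example, comparing `spread(ranks_low)` with `spread(ranks_high)` shows the scores moving closer together as the teleportation probability rises, and at probability 1 every page gets the same score.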

Page 54: CS224W:’Social’and’Information’Network’Analysis

[Figure: example network of 8 numbered nodes, with bar charts of PageRank (0 to 1) per node at t=0 and t=1]

20% teleportation probability

slide adapted from: Dragomir Radev

Page 55: CS224W:’Social’and’Information’Network’Analysis

[Figure: example network of 8 numbered nodes, with bar charts of PageRank (0 to 1) per node at t=0, t=1, and t=10]

slide from: Dragomir Radev

Page 56: CS224W:’Social’and’Information’Network’Analysis

GUESS PageRank demo

• What happens to the relative PageRank scores of the nodes as you increase the teleportation probability (decrease the damping factor)?
  • they equalize
  • they diverge
  • they are unchanged

PageRank.nlogo: part of the built-in suite of network models for NetLogo

Page 57: CS224W:’Social’and’Information’Network’Analysis

• Centrality
  • many measures: degree, betweenness, closeness, eigenvector
  • may be unevenly distributed
    • measure via distributions and centralization
  • in directed networks
    • indegree, outdegree, PageRank
  • consequences:
    • benefits & risks (Baker & Faulkner)
    • information flow & productivity (Aral & Van Alstyne)

Page 58: CS224W:’Social’and’Information’Network’Analysis

(time permitting)

9/23/15 Jure Leskovec and Lada Adamic, Stanford CS224W: Social and Information Network Analysis

Page 59: CS224W:’Social’and’Information’Network’Analysis


Page 60: CS224W:’Social’and’Information’Network’Analysis


Page 61: CS224W:’Social’and’Information’Network’Analysis

• The Response Time Gap

[Figure: box plot of waiting time (min), axis from 0 to 10000, by expertise rating (high vs. low); group sizes N = 49 and 39]

• The Expertise Gap
• Difficult to infer reliability of answers

Automatically ranking expertise may be helpful.

Zhang, Ackerman, Adamic, WWW'07

Page 62: CS224W:’Social’and’Information’Network’Analysis

• 87 sub-forums
• 1,438,053 messages
• community expertise network constructed:
  • 196,191 users
  • 796,270 edges

Page 63: CS224W:’Social’and’Information’Network’Analysis

Thread 1: Large Data, binary search or hashtable? (user A)
    Re: Large... (user B)
    Re: Large... (user C)

Thread 2: Binary file with ASCII data (user A)
    Re: File with... (user C)

[Figure: four versions of the resulting network on users A, B, and C, one per weighting scheme]

• unweighted: edge weights 1 and 1
• weighted by # threads: edge weights 1 and 2
• weighted by shared credit: edge weights 1/2 and 1 + 1/2
• weighted with backflow: edge weights 0.9 and 0.1
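The first three weighting schemes can be reproduced from the two threads with a short sketch. Several choices here are assumptions made for illustration: edges are drawn from the asker to each replier, the `build_edges` function and scheme names are hypothetical, and the backflow variant is left out because the slide does not say how the 0.9 and 0.1 weights are derived.

```python
from collections import defaultdict

# Each thread is (asker, list of repliers), as in the two threads shown.
threads = [
    ("A", ["B", "C"]),  # Thread 1: A asks, B and C reply
    ("A", ["C"]),       # Thread 2: A asks, C replies
]


def build_edges(threads, scheme):
    """Return {(asker, replier): weight} under one of three weighting schemes."""
    weights = defaultdict(float)
    for asker, repliers in threads:
        unique = set(repliers)
        for replier in unique:
            if scheme == "unweighted":
                weights[(asker, replier)] = 1.0          # an edge exists or it does not
            elif scheme == "threads":
                weights[(asker, replier)] += 1.0         # one unit per thread answered
            elif scheme == "shared":
                weights[(asker, replier)] += 1.0 / len(unique)  # repliers split the credit
            else:
                raise ValueError(f"unknown scheme: {scheme}")
    return dict(weights)


unweighted = build_edges(threads, "unweighted")
by_threads = build_edges(threads, "threads")
shared = build_edges(threads, "shared")
```

With shared credit, C earns half of Thread 1 and all of Thread 2, which gives the 1 + 1/2 listed above, while B keeps 1/2.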

Page 64: CS224W:’Social’and’Information’Network’Analysis

[Figure: log-log plot of cumulative probability versus degree (k), with one curve for the number of people one received replies from and one for the number of people one replied to; fit shown: α = 1.87, R² = 0.9730]

• 'answer people' may reply to thousands of others
• 'question people' are also uneven in the number of repliers to their posts, but to a lesser extent

Page 65: CS224W:’Social’and’Information’Network’Analysis

• Core: a strongly connected component, in which everyone asks and answers
• IN: mostly askers
• OUT: mostly helpers

The Web is a bow tie. The Java Forum network is an uneven bow tie.

Page 66: CS224W:’Social’and’Information’Network’Analysis
Page 67: CS224W:’Social’and’Information’Network’Analysis

• Human-rated expertise levels
  • 2 raters
  • 135 JavaForum users with >= 10 posts
  • inter-rater agreement (τ = 0.74, ρ = 0.83)
  • for evaluation of algorithms, omit users where raters disagreed by more than 1 level (τ = 0.80, ρ = 0.83)

L | Category | Description
5 | Top Java expert | Knows the core Java theory and related advanced topics deeply.
4 | Java professional | Can answer all or most Java concept questions. Also knows one or more subtopics very well.
3 | Java user | Knows advanced Java concepts. Can program relatively well.
2 | Java learner | Knows basic concepts and can program, but is not good at advanced topics of Java.
1 | Newbie | Just starting to learn Java.

Page 68: CS224W:’Social’and’Information’Network’Analysis

simple local measures do as well as (and sometimes better than) measures incorporating the wider network topology

[Figure: bar chart, vertical axis from 0 to 0.9, comparing six ranking methods (# answers, z-score # answers, indegree, z-score indegree, PageRank, HITS authority) on three measures: Top K, Kendall's τ, Spearman's ρ]

Page 69: CS224W:’Social’and’Information’Network’Analysis

[Figure: box plots of automated ranking (vertical axis) against human rating (horizontal axis), one panel per method: # answers, z # answers, indegree, z indegree, PageRank, HITS authority]

Page 70: CS224W:’Social’and’Information’Network’Analysis

Control Parameters:
  • Distribution of expertise
  • Who asks questions most often?
  • Who answers questions most often?
    • best expert most likely
    • someone a bit more expert

ExpertiseNet Simulator

Page 71: CS224W:’Social’and’Information’Network’Analysis

[Figure: two heat maps of reply probability, one per model, with replier expertise and asker expertise on the axes, each from 0 to 5]

suppose:
  • expertise is uniformly distributed
  • probability of posing a question is inversely proportional to expertise
  • p_{ij} = probability a user with expertise j replies to a user with expertise i

2 models:

'best' preferred:

$$p_{ij} \sim e^{\beta (j - i)} / i$$

'just better' preferred:

$$p_{ij} \sim e^{\gamma (i - j)} / i, \quad j > i$$
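To get a feel for how the two reply models differ, the sketch below tabulates reply probabilities over a small grid of expertise levels. It rests on assumptions that should be read as such: the models are taken to be p_ij proportional to exp(β(j - i))/i for 'best' preferred and exp(γ(i - j))/i with j > i for 'just better' preferred, expertise levels run from 1 to 5, the strength parameter is set to 1, and the function name is hypothetical.

```python
import math


def reply_probabilities(levels, model, strength=1.0):
    """Return normalized p[(i, j)]: asker expertise i, replier expertise j.

    `strength` plays the role of beta ('best') or gamma ('just_better');
    its value here is an arbitrary choice for illustration.
    """
    raw = {}
    for i in levels:
        for j in levels:
            if model == "best":
                # The more expert the replier relative to the asker, the likelier the reply.
                raw[(i, j)] = math.exp(strength * (j - i)) / i
            elif model == "just_better":
                # Only more expert users reply, and small expertise gaps are favored.
                raw[(i, j)] = math.exp(strength * (i - j)) / i if j > i else 0.0
            else:
                raise ValueError(f"unknown model: {model}")
    total = sum(raw.values())
    return {pair: w / total for pair, w in raw.items()}


levels = range(1, 6)
best = reply_probabilities(levels, "best")
just_better = reply_probabilities(levels, "just_better")
```

Under 'best' preferred, a level-1 asker is most likely to hear from a level-5 replier; under 'just better', the same asker is most likely to hear from level 2, and nobody replies to someone at or above their own level.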

Page 72: CS224W:’Social’and’Information’Network’Analysis

[Figure: two panels: 'best' preferred, 'just better']

Page 73: CS224W:’Social’and’Information’Network’Analysis

[Figure: three panels, each with an axis labeled asker indegree: best preferred (simulation), just better (simulation), Java Forum Network]

Page 74: CS224W:’Social’and’Information’Network’Analysis

Preferred Helper: ‘just better’

Preferred Helper: ‘best available’

Page 75: CS224W:’Social’and’Information’Network’Analysis

In the ‘just better’ model, a node is correctly ranked by PageRank but not by HITS

Page 76: CS224W:’Social’and’Information’Network’Analysis

• Node centrality can reveal the relative importance of nodes within the network
• Choose a measure appropriate to the question you are asking
