cs224w: social and information network analysis jure leskovec,...

CS224W: Social and Information Network AnalysisJure Leskovec, Stanford University

http://cs224w.stanford.edu

11/1/16 JureLeskovec,StanfordCS224W:SocialandInformationNetworkAnalysis,http://cs224w.stanford.edu 2

Observations

Smalldiameter,Edgeclustering

Patternsofsignededgecreation

ViralMarketing,Blogosphere,Memetracking

Scale-Free

Densificationpowerlaw,Shrinking diameters

Strengthofweakties,Core-periphery

Models

Erdös-Renyi model,Small-worldmodel

Structuralbalance,Theoryofstatus

Independentcascademodel,Gametheoreticmodel

Preferentialattachment,Copyingmodel

Microscopicmodel ofevolvingnetworks

Kronecker Graphs

Algorithms

Decentralizedsearch

Models forpredictingedgesigns

Influencemaximization,Outbreakdetection,LIM

PageRank,Hubsandauthorities

Linkprediction,Supervised randomwalks

Community detection:Girvan-Newman,Modularity


Whatdoweobservethatneedsexplaining?¡ Small-worldmodel:§ Diameter§ Clusteringcoefficient

¡ PreferentialAttachment:§ Nodedegreedistribution

§ Whatfractionofnodeshasdegree𝒌 (asafunctionof𝒌)?§ Predictionfromsimplerandomgraphmodels:p(𝒌) = exponentialfunctionof𝒌

§ Observation:Oftenapower-law:𝒑 𝒌 ∝ 𝒌(𝜶


Expected based on Gnp Found in data

𝑷 𝒌 ∝ 𝒌(𝜶

¡ Takeanetwork,plotahistogramof𝑷(𝒌) vs.𝒌


Flickr socialnetwork

n= 584,207, m=3,555,115

[Leskovec et al. KDD ‘08]

Plot: fraction of nodes with degree 𝑘:

𝑝(𝑘) =| 𝑢|𝑑0 = 𝑘 |

𝑁

Prob

abilit

y: 𝑝(𝑘)=𝑃(𝑋

=𝑘)

¡ Plotthesamedataonlog-log scale:


Flickr socialnetwork

n= 584,207, m=3,555,115

[Leskovec et al. KDD ‘08]

How to distinguish:𝑃(𝑘) ∝ exp(−𝑘) vs.𝑃(𝑘) ∝ 𝑘(8 ?

Take logarithms: if 𝑦 = 𝑓(𝑥) = 𝑒(= then log 𝑦 = −𝑥

If 𝑦 = 𝑥(8 then log 𝑦 = −𝛼log(𝑥)

So on log-log axis power-law looks like a straight line of slope −𝛼 !

Slope = −𝛼 = 1.75

Prob

abilit

y: 𝑝(𝑘)=𝑃(𝑋

=𝑘) 𝑃 𝑘 ∝ 𝑘(F.GH

¡ InternetAutonomousSystems[Faloutsos,Faloutsos andFaloutsos,1999]


Internet domain topology

¡ TheWorldWideWeb[Broder etal.,2000]


¡ OtherNetworks[Barabasi-Albert,1999]


Power-gridWeb graphActor collaborations

¡ Aboveacertain𝒙 value,thepowerlawisalwayshigherthantheexponential!


20 40 60 80 100

0.2

0.6

1

1)( −= cxxp

xcxp −=)(

5.0)( −= cxxp

x

p(x)

¡ Power-lawvs.Exponentialonlog-logandsemi-log(log-lin)scales


[Clauset-Shalizi-Newman 2007]

semi-log

5.0)( −= cxxp

xcxp −=)(10

100 101 102 103

-4

10-3

10-2

10-1

100

log-log

1)( −= cxxp

xcxp −=)(

5.0)( −= cxxp

1)( −= cxxp

x … lineary … logarithmic

x … logarithmic axisy … logarithmic axis

1 2 3 4 5 6

¡ Power-lawdegreeexponentistypically2<α <3§ Webgraph:

§ αin=2.1,αout=2.4[Broder etal.00]§ Autonomoussystems:

§ α =2.4[Faloutsos3,99]§ Actor-collaborations:

§ α =2.3[Barabasi-Albert00]§ Citationstopapers:

§ α ≈ 3[Redner 98]§ Onlinesocialnetworks:

§ α ≈ 2[Leskovecetal.07]


¡ Definition:Networkswithapower-lawtailintheirdegreedistributionarecalled“scale-freenetworks”

¡ Wheredoesthenamecomefrom?§ Scale invariance: There isnocharacteristicscale

§ Scaleinvariance isthatlawsdonotchangeifscalesoflength,energy,orothervariables,aremultipliedbyacommonfactor

§ Scale-freefunction:𝒇 𝒂𝒙 = 𝒂𝝀𝒇(𝒙)§ Power-lawfunction:𝒇 𝒂𝒙 = 𝒂𝝀𝒙𝝀 = 𝒂𝝀𝒇(𝒙)


Log() or Exp() are not scale free!𝑓 𝑎𝑥 = log 𝑎𝑥 = log 𝑎 + log 𝑥 = log 𝑎 + 𝑓 𝑥𝑓 𝑎𝑥 = exp 𝑎𝑥 = exp 𝑥 O = 𝑓 𝑥 O


Many other quantities follow heavy-tailed distributions



[Chris Anderson, Wired, 2004]


CMU grad-students at the G20 meeting in

Pittsburgh in Sept 2009

¡ Degreesareheavilyskewed:Distribution𝑃(𝑋 > 𝑥) isheavytailedif:

𝐥𝐢𝐦𝒙→U

𝑷 𝑿 > 𝒙𝒆(𝝀𝒙

= ∞¡ Note:

§ NormalPDF:𝑝 𝑥 = FYZ[

𝑒(\]^ _

_`_

§ ExponentialPDF:𝑝 𝑥 = 𝜆𝑒(b=

§ then𝑃 𝑋 > 𝑥 = 1− 𝑃(𝑋 ≤ 𝑥) = 𝑒(b=

arenotheavytailed!


¡ Variousnames,kindsandforms:§ Longtail,Heavytail,Zipf’s law,Pareto’slaw

¡ Heavytaileddistributions:§ P(x)isproportionalto:



𝑃 𝑥 ∝

¡ Whatisthenormalizingconstant?p(x) = Z x-α Z = ?

§ 𝒑(𝒙) isadistribution:∫𝒑 𝒙 𝒅𝒙 = 𝟏Continuous approximation

§ 1 = ∫ 𝑝 𝑥 𝑑𝑥U=g

= 𝑍 ∫ 𝑥(8𝑑𝑥U=g

§ = − i8(F

𝑥(8jF =gU = − i

8(F∞F(8 − 𝑥kF(8

§ ⇒𝑍 = 𝛼 − 1 𝑥k8(F



𝒑 𝒙 =𝜶 − 𝟏𝒙𝒎

𝒙𝒙𝒎

(𝜶

p(x) diverges as x→0 so xm is the minimum

value of the power-law distribution x ∈ [xm, ∞]

xm

Need: α > 1 !

Integral:

n 𝒂𝒙 𝒏 =𝒂𝒙 𝒏j𝟏𝒂(𝒏 + 𝟏)

¡ What’stheexpectedvalueofapower-lawrandomvariableX?

¡ 𝐸 𝑋 = ∫ 𝑥𝑝 𝑥 𝑑𝑥U=g

= 𝑍 ∫ 𝑥(8jF𝑑𝑥U=g

¡ = iY(8

𝑥Y(8 =gU = 8(F =gq]r

((8(Y)[∞Y(8 − 𝑥kY(8]

⇒𝑬 𝑿 =𝜶 − 𝟏𝜶 − 𝟐𝒙𝒎



Need: α > 2 !

Power-law density:

𝑝 𝑥 =𝛼 − 1𝑥k

𝑥𝑥k

(8

𝑍 =𝛼 − 1𝑥kF(8

¡ Power-lawshaveinfinitemoments!

§ If𝛼 ≤ 2 :𝐸[𝑋] = ∞§ If𝛼 ≤ 3 :𝑉𝑎𝑟[𝑋] = ∞

§ Averageismeaningless,asthevarianceistoohigh!¡ Consequence:Sampleaverageofn samplesfromapower-lawwithexponentα


𝐸 𝑋 =𝛼 − 1𝛼 − 2𝑥k

In real networks2 < α < 3 so:E[X] = constVar[X] = ∞

Estimatingα fromdata:¡ (1)Fitalineonlog-logaxisusingleastsquares:§ Solve𝒂𝒓𝒈𝐦𝐢𝐧

𝜶𝐥𝐨𝐠 𝒚 − 𝜶 𝐥𝐨𝐠 𝒙 + 𝒃 𝟐


BAD!

Estimatingα fromdata:¡ PlotComplementaryCDF(CCDF)𝑷 𝑿 ≥ 𝒙 .Thentheestimated𝜶 = 𝟏 + 𝜶′where𝜶′ istheslopeof𝑷(𝑿 ≥ 𝒙).

¡ Fact: If𝒑 𝒙 = 𝑷 𝑿 = 𝒙 ∝ 𝒙(𝜶then𝑷 𝑿 ≥ 𝒙 ∝ 𝒙((𝜶(𝟏)

§ 𝑃 𝑋 ≥ 𝑥 = ∑ 𝑝(𝑗)U��= ≈ ∫ 𝑍𝑦(8𝑑𝑦U

= =

§ = iF(8

𝑦F(8 =U = i

F(8𝑥( 8(F


OK!

Estimatingα fromdata:¡ Usemaximumlikelihoodapproach:§ Thelog-likelihoodofobserveddatadi:

§ 𝐿 𝛼 = ln ∏ 𝑝 𝑑�� = ∑ ln𝑝(𝑑�)�

�

§ = ∑ ln(𝛼 − 1) − ln 𝑥k − 𝛼 ln ��=g

��

§ Wanttofind𝜶 thatmax𝐿(𝜶):Set�� 8�8

= 0

§�� 8�8

= 0⇒ �8(F

− ∑ ln ��=g

�� = 0

§ ⇒𝜶� = 𝟏 + 𝒏 ∑ 𝒍𝒏 𝒅𝒊𝒙𝒎

𝒏𝒊

(𝟏


Power-law density:

𝑝 𝑥 =𝛼 − 1𝑥k

𝑥𝑥k

(8

OK!


LinearscaleLogscale,α=1.75

CCDF,Logscale,α=1.75

CCDF,Logscale,α=1.75,

exp. cutoff

¡ Cannotarisefromsumsofindependentevents!§ Recall:in𝑮𝒏𝒑 eachpairofnodesinconnectedindependentlywithprob.𝒑§ 𝑿… degreeofnode 𝒗§ 𝑿𝒘 … eventthat w linksto v§ 𝑿 = ∑ 𝑿𝒘𝒘§ 𝑬 𝑿 = ∑ 𝑬 𝑿𝒘 = 𝒏 − 𝟏 𝒑𝒘

§ Now,whatis𝑷 𝑿 = 𝒌 ? Centrallimittheorem!§ 𝑿𝟏,… , 𝑿𝒏: randomvars withmean µ, variance σ2

§ 𝑺𝒏 = ∑ 𝑿𝒊𝒊 :𝐸 𝑆� = 𝒏𝝁 ,Var 𝑆� = 𝒏𝝈𝟐,SD 𝑆� = 𝝈 𝒏

§ 𝑷 𝑺𝒏 = 𝑬 𝑺𝒏 + 𝒙 ¤ 𝐒𝐃 𝑺𝒏 ~ 𝟏𝟐𝝅𝐞(𝐱

𝟐

𝟐


Random network Scale-free (power-law) network(Erdos-Renyi random graph)

Degree distribution is Binomial

Degree distribution is Power-law

JureLeskovec,StanfordCS224W:SocialandInformationNetworkAnalysis,http://cs224w.stanford.edu 3011/1/16

¡ Howdoesnetworkconnectivitychangeasnodesgetremoved?[Albertetal.00;Palmeretal.01]

¡ Nodescanberemoved:§ Randomfailure:

§ Removenodesuniformlyatrandom

§ Targetedattack:§ Removenodesinorderofdecreasingdegree

¡ Thisisimportantforrobustnessoftheinternetaswellasepidemiology


¡ Networkswithequalnumberofnodesandedges:§ ERrandomgraph§ Scale-freenetwork

¡ Studythepropertiesofthenetworkasanincreasingfractionofnodesareremoved§ Nodeselection:

§ Random(thiscorrespondstorandomfailures)§ Nodeswithlargestdegrees(correspondstotargetedattacks)

¡ Measures:§ Fractionofnodesinthelargestconnectedcomponent§ Averageshortestpathlengthbetweennodesinthelargestcomponent


¡ Scale-freegraphsareresilienttorandomattacks,butsensitivetotargetedattacks.

¡ Forrandomnetworksthereissmallerdifferencebetweenthetwo

random failure targeted attack

Size

of t

he la

rges

t co

nnec

ted

com

pone

nt


¡ Whatproportionofthenodesmustberemovedinorderforthesize(S)ofthegiantcomponenttodropto0?

¡ Infinitescale-freenetworkswith𝛾 < 3neverbreakdownunderrandomnodefailures

Source: Cohen et al., Resilience of the Internet to Random Breakdowns11/1/16 JureLeskovec,StanfordCS224W:SocialandInformationNetworkAnalysis,http://cs224w.stanford.edu 35

𝛾… degree exponentK… maximum degree

Fraction deleted nodes

Frac

tion

in la

rges

t co

nnec

ted

com

pone

nt

¡ Realnetworksareresilienttorandomfailures¡ Gnp hasbetterresiliencetotargetedattacks

§ Needtoremoveallpagesofdegree>5 todisconnecttheWeb§ Butthisisaverysmallfractionofallwebpages


Fraction of removed nodes

Mea

n pa

th le

ngth

Fraction of removed nodes

Randomfailures

Targetedattack

Gnp networkAS network

Randomfailures

Targetedattack

Source: Error and attack tolerance of complex networks. Réka Albert, Hawoong Jeong and Albert-László Barabási11/1/16 JureLeskovec,StanfordCS224W:SocialandInformationNetworkAnalysis,http://cs224w.stanford.edu 37

Frac

tion

in la

rges

t co

nnec

ted

com

pone

nt


Shor

test

pat

h le

ngth

¡ Thefirstfew%ofnodesremoved.

¡ d… avg.shortestpathlength


Shor

test

pat

h le

ngth


¡ Preferentialattachment[Price‘65,Albert-Barabasi ’99,Mitzenmacher ‘03]

§ Nodesarriveinorder1,2,…,n§ Atstepj,letdi bethedegreeofnodei < j§ Anewnodej arrivesandcreatesm out-links§ Prob.ofj linkingtoapreviousnodei isproportionaltodegreedi ofnodei


∑=→

kk

i

ddijP )(

¡ Newnodesaremorelikelytolinktonodesthatalreadyhavehighdegree

¡ HerbertSimon’sresult:§ Power-lawsarisefrom“Richgetricher”(cumulativeadvantage)

¡ Examples§ Citations[deSolla Price‘65]: Newcitationstoapaperareproportionaltothenumberitalreadyhas§ Herding: Ifalotofpeopleciteapaper,thenitmustbegood,andthereforeIshouldciteittoo

§ Sociology: Mattheweffect§ Eminentscientistsoftengetmorecreditthanacomparativelyunknownresearcher,eveniftheirworkissimilar

§ http://en.wikipedia.org/wiki/Matthew_effect11/1/16 JureLeskovec,StanfordCS224W:SocialandInformationNetworkAnalysis,http://cs224w.stanford.edu 41

Wewillanalyzethefollowingmodel:¡ Nodesarriveinorder1,2,3,… , 𝑛¡ Whennode𝒋 iscreateditmakesasingleout-link toanearliernode𝒊 chosen:§ 1)Withprob.𝒑,𝒋 linksto𝒊 chosenuniformlyatrandom (fromamongallearliernodes)

§ 2)Withprob.𝟏 − 𝒑,node𝒋 chooses𝒊 uniformlyatrandom&linkstoarandomnodel that i pointsto§ Thisissameassaying:Withprob.𝟏 − 𝒑,node𝒋 linkstonode𝒍 withprob.proportionalto𝒅𝒍 (thein-degreeof𝒍)

§ Ourgraphisdirected: Everynodehasout-degree111/1/16 JureLeskovec,StanfordCS224W:SocialandInformationNetworkAnalysis,http://cs224w.stanford.edu 42

[Mitzenmacher, ‘03]

Node j

¡ Claim: Thedescribedmodelgeneratesnetworkswherethefractionofnodeswithin-degreek scalesas:


)11()( q

i kkdP+−

∝=where q=1-p

p−+=111α

So we get power-lawdegree distributionwith exponent:

¡ Considerdeterministicandcontinuousapproximation tothedegreeofnode𝒊 asafunctionoftime𝒕§ 𝒕 isthenumberofnodesthathavearrivedsofar§ In-Degree𝒅𝒊(𝒕) ofnode𝒊 (𝑖 = 1,2,… , 𝑛)isacontinuousquantity anditgrowsdeterministicallyasafunctionoftime𝒕

¡ Plan: Analyze𝒅𝒊(𝒕) – continuousin-degreeofnode𝒊 attime𝒕 > 𝒊§ Note:Node𝒊 arrivestothegraphattime𝒕


¡ Initialcondition:§ 𝒅𝒊(𝒕) = 𝟎,when 𝒕 = 𝒊 (nodei justarrived)

¡ Expectedchange of𝒅𝒊(𝒕) overtime:§ Node𝒊 gainsanin-linkatstep𝒕 + 𝟏 onlyifalinkfromanewlycreatednode𝒕 + 𝟏 pointstoit.

§ What’stheprobabilityofthisevent?§ Withprob.𝒑 node𝒕 + 𝟏 linksrandomly:§ Linkstoournode𝒊 withprob.𝟏/𝒕

§ Withprob.𝟏− 𝒑 node𝒕 + 𝟏 linkspreferentially:§ Linkstoournode𝒊 withprob.𝒅𝒊(𝒕)/𝒕

§ Prob.node𝒕 + 𝟏 linksto𝒊 is:𝒑 𝟏𝒕+ 𝟏− 𝒑 𝒅𝒊(𝒕)

𝒕11/1/16 JureLeskovec,StanfordCS224W:SocialandInformationNetworkAnalysis,http://cs224w.stanford.edu 45

Node i

¡ At𝒕 = 𝟒 node𝒊 = 𝟒 comes.Ithasout-degreeof1todeterministicallysharewithothernodes:

¡

¡ 𝒅𝒊 𝒕 − 𝒅𝒊 𝒕 − 𝟏 = ��(´)�´

= 𝐩 𝟏𝒕+ 𝟏 − 𝒑 𝒅𝒊(𝒕)

𝒕¡ Howdoes𝒅𝒊(𝒕) evolveas𝒕&∞?


Node i di(t) di(t+1)0 0 =0 + 𝑝 F

¶+ 1 − 𝑝 ·

¶

1 2 =2 + 𝑝 F¶+ 1 − 𝑝 Y

¶

2 0 =0 + 𝑝 F¶+ 1 − 𝑝 F

¶

3 1 =1 + 𝑝 F¶+ 1 − 𝑝 F

¶

4 / 0

01

2 3

4

cs224w: social and information network analysis jure leskovec,...

Documents