presented by september 21, 2007 warsaw, poland 2007... · 2007. 8. 10. · cmu scs mining large...
TRANSCRIPT
![Page 1: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/1.jpg)
THE 18TH EUROPEAN CONFERENCE ON MACHINE LEARNINGAND
THE 11TH EUROPEAN CONFERENCE ON PRINCIPLES AND PRACTICEOF KNOWLEDGE DISCOVERY IN DATABASES
MINING LARGE GRAPHS
LAWS AND TOOLS
TUTORIAL NOTES
presented by
Christos Faloutsos and Jure Leskovec
September 21, 2007
Warsaw, Poland
![Page 2: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/2.jpg)
Prepared and presented by:Christos Faloutsos and Jure LeskovecMachine Learning Department, Carnegie Mellon University, USA
![Page 3: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/3.jpg)
CMU SCS
Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools
Christos Faloutsos and Jure LeskovecM hi L i D t tMachine Learning Department
CMU SCS
Tutorial outlineTutorial outline
Part 1: Laws and generators:What are properties of large graphs?How do we model them?
Part 2: Tools for static and dynamics graphsy g pMethods based on Singular Value Decomposition
Part 3: Case studiesPart 3: Case studiesPropagation and detection of viruses in networksGraph projectionsGraph projections
Faloutsos&LeskovecECML/PKDD 2007
Part 1-2
1
![Page 4: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/4.jpg)
CMU SCS
Part 1: OutlinePart 1: Outline
1 1 Mi i1.1: MiningWhat are the statistical properties of static and time evolving networks?
1.2: Models:How do we build models of network generations of evolution?g
1.3: Fitting:How do we fit models?How do we fit models? How do we generate realistic looking graphs?
Faloutsos&LeskovecECML/PKDD 2007
Part 1-3
CMU SCS
Part 1.1: Mining
What are statistical properties ofWhat are statistical properties of networks across various
domains?domains?
2
![Page 5: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/5.jpg)
CMU SCS
Examples of networksExamples of networks(b) (c)
(a)(a)
(d)(e)
Internet (a)
( )
sexual network (d)Internet (a)citation network (b)World Wide Web (c)
sexual network (d)food web (e)
Faloutsos&LeskovecECML/PKDD 2007
Part 1-5
World Wide Web (c)
CMU SCS
Networks of the real world (1)Networks of the real-world (1)Information networks:
World Wide Web: hyperlinksWorld Wide Web: hyperlinksCitation networksBlog networks
Social networks: people +Social networks: people + interactios
Organizational networksCommunication networks
Florence families Karate club networkCommunication networks Collaboration networksSexual networks Collaboration networks
Technological networks:Power gridAirline, road, river networks, ,Telephone networksInternetAutonomous systems Collaboration networkF i d hi t k
Faloutsos&LeskovecECML/PKDD 2007
Part 1-6
y Collaboration networkFriendship network
3
![Page 6: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/6.jpg)
CMU SCS
Networks of the real world (2)Networks of the real-world (2)
Biological networksmetabolic networksfood webneural networks Yeast protein Semantic network
gene regulatory networks
pinteractions
Language networksSemantic networks
Software networks
Faloutsos&LeskovecECML/PKDD 2007
Part 1-7
…Language network XFree86 network
CMU SCS
Traditional approachTraditional approachSociologists were first to study networks:Sociologists were first to study networks:
Study of patterns of connections between people to understand functioning of thepeople to understand functioning of the societyPeople are nodes interactions are edgesPeople are nodes, interactions are edges Questionares are used to collect link data (hard to obtain inaccurate subjective)(hard to obtain, inaccurate, subjective)Typical questions: Centrality and connectivity
Li it d t ll h ( 10 d ) dLimited to small graphs (~10 nodes) and properties of individual nodes and edges
Faloutsos&LeskovecECML/PKDD 2007
Part 1-8
4
![Page 7: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/7.jpg)
CMU SCS
Motivation: New approach (1)Motivation: New approach (1)
Large networks (e.g., web, internet, on-line social networks) with millions of nodesMany traditional questions not useful anymore:
Traditional: What happens if a node u is removed? ppNow: What percentage of nodes needs to be removed to affect network connectivity?
Focus moves from a single node to study of statistical properties of the network as a wholep p
Faloutsos&LeskovecECML/PKDD 2007
Part 1-9
CMU SCS
Motivation: New approach (2)Motivation: New approach (2)
How the network “looks like” even if I can’t look at it?Need statistical methods and tools to quantify large networksg3 parts/goals:
Statistical properties of large networksStatistical properties of large networksModels that help understand these propertiesPredict behavior of networked systems based onPredict behavior of networked systems based on measured structural properties and local rules governing individual nodes
Faloutsos&LeskovecECML/PKDD 2007
Part 1-10
5
![Page 8: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/8.jpg)
CMU SCS
How to generate a graph?How to generate a graph?
What is the simplest way to generate a graph?g pRandom graph model (Erdos-Renyi model Poisson random graph model):model, Poisson random graph model):
Given n vertices connect each pair i.i.d. with probability p
How good (“realistic”) is this graph g ( ) g pgenerator?
Faloutsos&LeskovecECML/PKDD 2007
Part 1-11
CMU SCS
Small world effect (1)Small-world effect (1)Si d f ti [Mil 60 ]Six degrees of separation [Milgram 60s]
Random people in Nebraska were asked to send letters to stockbrokes in BostonLetters can only be passed to first-name acquanticesOnly 25% letters reached the goalBut they reached it in about 6 stepsBut they reached it in about 6 steps
Measuring path lengths: Diameter (longest shortest path): max dij( g p ) ijEffective diameter: distance at which 90% of all connected pairs of nodes can be reachedMean geodesic (shortest) distance lMean geodesic (shortest) distance l
or
Faloutsos&LeskovecECML/PKDD 2007
Part 1-12
6
![Page 9: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/9.jpg)
CMU SCS
Small world effect (2)Small-world effect (2)[Leskovec&Horvitz,07]
Distribution of shortest path lengths
107
108
Pick a random
[Leskovec&Horvitz,07]
shortest path lengths Microsoft Messenger network
105
106
odes
node, count how many
nodes are at di tnetwork
180 million people1 3 billi d
103
104
mbe
r of n
o distance 1,2,3... hops
71.3 billion edgesEdge if two people exchanged at least
101
102Num
exchanged at least one message in one month period
0 5 10 15 20 25 30100
Distance (Hops)
Faloutsos&LeskovecECML/PKDD 2007
Part 1-13
month period
CMU SCS
Small world effect (3)Small-world effect (3)If number of vertices within distance growsIf number of vertices within distance r grows exponentially with r, then mean shortest path length lincreases as log nImplications:
Information (viruses) spread quicklyErdos numbers are smallErdos numbers are smallPeer to peer networks (for navigation purposes)
Shortest paths exists H bl t fi d th thHumans are able to find the paths:
People only know their friendsPeople do not have the global knowledge of the network
This suggests something special about the structure of the network
On a random graph short paths exists but no one would be ableFaloutsos&LeskovecECML/PKDD 2007
Part 1-14
On a random graph short paths exists but no one would be able to find them
7
![Page 10: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/10.jpg)
CMU SCS
Degree distributions (1)Degree distributions (1)Let denote a fraction of nodes with degree kLet pk denote a fraction of nodes with degree kWe can plot a histogram of pk vs. kIn a (Erdos-Renyi) random graph degreeIn a (Erdos Renyi) random graph degree distribution follows Poisson distributionDegrees in real networks are heavily skewed to the rightthe rightDistribution has a long tail of values that are far above the meanPower-law [Faloutsos et al], Zipf’s law, Pareto’s law, Long tailMany things follow Power law:Many things follow Power-law:
Amazon sales,word length distribution, W lth E th k
Faloutsos&LeskovecECML/PKDD 2007
Part 1-15
Wealth, Earthquakes, …
CMU SCS
Degree distributions (2)Degree distributions (2)Degree distribution in a blog network
( l t th d t i diff t l )
10-3
10-2
2.5
3
3.5 x 10-3Many real world networks contain hubs: highly
(plot the same data using different scales)
lin-lin log-lin
p k p k
6
10-5
10-4
0.5
1
1.5
2hubs: highly connected nodesWe can easily
p
log
p
105
0 200 400 600 800 100010-6
0 200 400 600 800 10000
We can easily distinguish between exponential and power law tail by
kk
Power-law:
103
104power-law tail by plotting on log-lin and log-log axis g
p k
Power law:
101
102
a d og og a sPower-law is a lineon log-log plot log-log
log
Faloutsos&LeskovecECML/PKDD 2007
Part 1-16100 101 102 103 104
100
g g
log k
8
![Page 11: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/11.jpg)
CMU SCS
Poisson vs Scale free networkPoisson vs. Scale-free network
Poisson network S l f ( l ) t kPoisson network Scale-free (power-law) network
Function is
(Erdos-Renyi random graph)Degree distribution is
Faloutsos&LeskovecECML/PKDD 2007
Part 1-17
Function is scale free if:f(ax) = c f(x)Degree distribution is Poisson
distribution is Power-law
CMU SCS
Network resilience (1)Network resilience (1)W b h thWe observe how the connectivity (length of the paths) of the network p )changes as the vertices get removed [Albert et al, 00; Palmer et al 01]Palmer et al, 01]Vertices can be removed:
Uniformly at randomUniformly at randomIn order of decreasing degree
It is important for id i lepidemiologyRemoval of vertices corresponds to vaccination
Faloutsos&LeskovecECML/PKDD 2007
Part 1-18
p
9
![Page 12: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/12.jpg)
CMU SCS
Network resilience (2)Network resilience (2)Real-world networks are resilient to random attacks
One has to remove all web-pages of degree > 5 to disconnect the webBut this is a very small percentage of web pages
Random network has better resilience to targeted attacks
Random networkInternet (Autonomous systems)
Preferential
engt
h
Preferentialremoval
an p
ath
le
Randomremoval
Me
Faloutsos&LeskovecECML/PKDD 2007
Part 1-19Fraction of removed nodes Fraction of removed nodes
CMU SCS
Community structureCommunity structureMost social networks showMost social networks show community structure
groups have higher density of edges ithi thwithin than accross groups
People naturally divide into groups based on interests, age, occupation, …
How to find communities:Spectral clustering (embedding into a l di )low-dim space)Hierarchical clustering based on connection strengthC bi t i l l ith ( i tCombinatorial algorithms (min cut style formulations)Block modelsDiff i th d
Faloutsos&LeskovecECML/PKDD 2007
Part 1-20
Diffusion methods Friendship network of children in a school
10
![Page 13: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/13.jpg)
CMU SCS
Spectral propertiesSpectral properties
Eigenvalues of graph adjacency
Eigenvalue distribution in online social networkgraph adjacency
matrix follow a power law
valu
e
online social network
Network values (components of principal Ei
genv
principal eigenvector) also follow a power-law
log
p[Chakrabarti et al]
log Rank
Faloutsos&LeskovecECML/PKDD 2007
Part 1-21
CMU SCS
What about evolving graphs?What about evolving graphs?
Conventional wisdom/intuition:Constant average degree: the number of g gedges grows linearly with the number of nodesSlowly growing diameter: as the network grows the distances between nodes growgrows the distances between nodes grow
Faloutsos&LeskovecECML/PKDD 2007
Part 1-22
11
![Page 14: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/14.jpg)
CMU SCS
Networks over time: DensificationNetworks over time: DensificationA i l ti Wh t i th l tiA simple question: What is the relation between the number of nodes and the number of edges in a network over time?number of edges in a network over time?Let:
N(t) … nodes at time t( )E(t) … edges at time t
Suppose that:N(t+1) = 2 * N(t)
Q: what is your guess for E( 1) ? 2 * E( )E(t+1) =? 2 * E(t)
A: over-doubled!But obeying the Densification Power Law [KDD05]
Faloutsos&LeskovecECML/PKDD 2007
Part 1-23
But obeying the Densification Power Law [KDD05]
CMU SCS
Networks over time: DensificationNetworks over time: DensificationInternet
Networks are denser over time The number of edges grows faster than the number of nodes
Internet
E(t)
1 2faster than the number of nodes – average degree is increasing lo
g E a=1.2
a … densification exponent log N(t)1 ≤ a ≤ 2:
a=1: linear growth – constant out-degree (assumed in the literature so f )
Citations
E(t)
far)a=2: quadratic growth – clique
a=1.7
log
E
Faloutsos&LeskovecECML/PKDD 2007
Part 1-24log N(t)
12
![Page 15: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/15.jpg)
CMU SCS
Densification & degree distributionDensification & degree distributionH d d ifi ti ff t
Case (a): Degree exponent γ is constant over time TheHow does densification affect
degree distribution?Densification:
γ is constant over time. The network densifies, a=1.2
Densification:Degree distribution: pk=kγGiven densification exponent a
γ(t)
Given densification exponent a, the degree exponent is [TKDD07]:
(a) For γ=const over time, we obtain Case (b): Degree exponent time t
( ) γ ,densification only for 1<γ<2, and then it holds: γ=a/2(b) For γ<2 degree distribution
γ evolves over time. The network densifies, a=1.6
(b) For γ<2 degree distribution evolves according to:
γ(t)
Faloutsos&LeskovecECML/PKDD 2007
Part 1-25time t
Given: densification a, number of nodes n
CMU SCS
Shrinking diametersShrinking diametersInternet
Intuition and prior work say that distances between the
Internet
met
er
nodes slowly grow as the network grows (like log n):
diam
d ~ O(log N)d ~ O(log log N)
Citations
size of the graph
Diameter Shrinks/Stabilizes over time
Citations
er
as the network grows the distances between nodes diam
ete
Faloutsos&LeskovecECML/PKDD 2007
Part 1-26slowly decrease [KDD 05]
time
13
![Page 16: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/16.jpg)
CMU SCS
Properties hold in many graphsProperties hold in many graphsTh tt b b d i lThese patterns can be observed in many real world networks:
World wide web [Barabasi]World wide web [Barabasi]On-line communities [Holme, Edling, Liljeros]Who call whom telephone networks [Cortes]Internet backbone – routers [Faloutsos, Faloutsos, Faloutsos]Movies to actors network [Barabasi]Science citations [Leskovec Kleinberg Faloutsos]Science citations [Leskovec, Kleinberg, Faloutsos]Click-streams [Chakrabarti]Autonomous systems [Faloutsos, Faloutsos, Faloutsos]Co-authorship [Leskovec, Kleinberg, Faloutsos]Sexual relationships [Liljeros]
Faloutsos&LeskovecECML/PKDD 2007
Part 1-27
CMU SCS
Part 1.2: ModelsWe saw properties
H d fi d d l ?How do we find models?
14
![Page 17: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/17.jpg)
CMU SCS
1 2 Models: Outline1.2 Models: Outline
W t th f ll ti liWe present the follow timeline(Erdos-Renyi) Random graphsExponential random graphsSmall-world modelPreferential attachmentEdge copying modelEdge copying modelCommunity guided attachmentForest fireForest fireKronecker graphs
Faloutsos&LeskovecECML/PKDD 2007
Part 1-29
CMU SCS
(Erdos Renyi) Random graph(Erdos-Renyi) Random graphAl k P i d hAlso known as Poisson random graphs or Bernoulli graphs [Erdos&Renyi, 60s]
Gi ti t h i i i d ithGiven n vertices connect each pair i.i.d. with probability p
Two variants:Two variants:Gn,p: graph with m edges appears with probability pm(1-p)M-m, where M=0.5n(n-1) is the max number p ( p) , ( )of edgesGn,m: graphs with n nodes, m edges
Does not mimic reality Very rich mathematical theory: many
Faloutsos&LeskovecECML/PKDD 2007
Part 1-30
y y yproperties are exactly solvable
15
![Page 18: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/18.jpg)
CMU SCS
Properties of random graphsProperties of random graphs
Degree distribution is Poisson since the presence and absence of edges is independent
!)1(
kezpp
kn
pzk
knkk
−− ≈−⎟⎟
⎠
⎞⎜⎜⎝
⎛=
Giant component: average degree k=2m/n:k=1-ε: all components are of size Ω(log n)
⎠⎝
k 1 ε: all components are of size Ω(log n)k=1+ε: there is 1 component of size Ω(n)
All others are of size Ω(log n)( g )They are a tree plus an edge, i.e., cycles
Diameter: log n / log kFaloutsos&LeskovecECML/PKDD 2007
Part 1-31
g g
CMU SCS
Evolution of a random graphEvolution of a random graph
verti
ces
on-G
CC
vfo
r no
Faloutsos&LeskovecECML/PKDD 2007
Part 1-32k
16
![Page 19: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/19.jpg)
CMU SCS
Subgraphs in random graphsSubgraphs in random graphsExpected number ofExpected number of
subgraphs H(v,e) in Gn,p is
n ev⎞⎛ !apnp
av
vn
XEev
e ≈⎟⎟⎠
⎞⎜⎜⎝
⎛=
!)(
a... # of isomorphic graphs
Faloutsos&LeskovecECML/PKDD 2007
Part 1-33
CMU SCS
Random graphs: conclusionRandom graphs: conclusionP
Configuration modelPros:
Simple and tractable modelPhase transitionsPhase transitionsGiant component
Cons:Degree distributionNo community structureNo degree correlationsNo degree correlations
Extensions:Configuration model
Random graphs with arbitrary degree sequenceExcess degree: Degree of a vertex of the end of random edge: qk = k pk
Faloutsos&LeskovecECML/PKDD 2007
Part 1-34
17
![Page 20: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/20.jpg)
CMU SCS
Exponential random graphs(p* models)
Social sciences to analyze the small networks
Examples of εi
Let εi set of measurable properties of a graph (number of edges, g p ( g ,number of nodes of a given degree, number of triangles, …)Exponential random graph model defines a probability distribution over graphs:
Faloutsos&LeskovecECML/PKDD 2007
Part 1-35
CMU SCS
Exponential random graphsExponential random graphsIncludes Erdos-Renyi as a specialIncludes Erdos Renyi as a special caseAssume parameters βi are specifiedspecified
No analytical solutions for the modelBut can use simulation to sample the graphs:graphs:
Define local moves on a graph:Addition/removal of edgesMovement of edges
Example of parameter estimates:Movement of edgesEdge swaps
Parameter estimation: maximum likelihoodmaximum likelihood
Problem:Can’t solve for transitivity (produces cliques)
Faloutsos&LeskovecECML/PKDD 2007
Part 1-36
cliques)Used to analyze small networks
18
![Page 21: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/21.jpg)
CMU SCS
Small world modelSmall-world modelUsed for modeling network transitivityM k ki d f hi lMany networks assume some kind of geographical proximitySmall-world model:Small world model:
Start with a low-dimensional regular latticeRewire:
Add/remove edges to create shortcuts to join remote parts of theAdd/remove edges to create shortcuts to join remote parts of the latticeFor each edge with prob p move the other end to a random vertex
Rewiring allows to interpolate between regular latticeRewiring allows to interpolate between regular lattice and random graph
Faloutsos&LeskovecECML/PKDD 2007
Part 1-37Watts&Strogatz,1998
CMU SCS
Small-world modelSmall world model
Regular lattice (p=0):Clustering coefficient g
C=(3k-3)/(4k-2)=3/4Mean distance L/4k
Almost random graphAlmost random graph (p=1):
Clustering coefficient C=2k/LRewiring probability p
Clustering coefficient C 2k/LMean distance log L / log k
But, real graphs have g ppower-law degree distribution
Faloutsos&LeskovecECML/PKDD 2007
Part 1-38
Degree distribution
19
![Page 22: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/22.jpg)
CMU SCS
Preferential attachmentPreferential attachmentB t d h h P i d di t ib tiBut, random graphs have Poisson degree distributionLet’s find a better model
Preferential attachment [Price 1965, Albert & Barabasi 1999]:
Add a new node, create m out-linksProbability of linking a node ki is
proportional to its degreep p gBased on Herbert Simon’s result
Power-laws arise from “Rich get richer” (cumulative advantage)Examples (Price 1965 for modeling citations):
Citations: new citations of a paper are proportional to the number it already has
Faloutsos&LeskovecECML/PKDD 2007
Part 1-39
it already has
CMU SCS
Preferential attachmentPreferential attachmentLeads to power-law degree distributions
3kBut:
3−∝ kpk
all nodes have equal (constant) out-degree
d l t k l d f thone needs a complete knowledge of the network
There are many generalizations andThere are many generalizations and variants, but the preferential selection is the key ingredient that
Faloutsos&LeskovecECML/PKDD 2007
Part 1-40leads to power-laws
20
![Page 23: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/23.jpg)
CMU SCS
Edge copying modelEdge copying modelB t f ti l tt h t d t hBut, preferential attachment does not have communitiesC i d l [Kl i b t l 99]Copying model [Kleinberg et al, 99]:
Add a node and choose k the number of edges to addWith b β l t k d ti d li k t thWith prob. β select k random vertices and link to themWith prob. 1-β edges are copied from a randomly chosen nodechosen node
Generates power-law degree distributions with exponent 1/(1-β)exponent 1/(1 β)Generates communities
Faloutsos&LeskovecECML/PKDD 2007
Part 1-41
CMU SCS
Community guided attachmentCommunity guided attachmentBut we want toBut, we want to model/explain densification in networks
University
Assume community CS
Science Arts
Assume community structureOne expects many
CS Math Drama Music
One expects many within-group friendships and fewer cross-group onesonesCommunity guided attachment [KDD05]
Self-similar university community structure
Faloutsos&LeskovecECML/PKDD 2007
Part 1-42
attachment [KDD05]
21
![Page 24: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/24.jpg)
CMU SCS
Community guided attachmentCommunity guided attachmentAssuming cross-community linking probabilityAssuming cross community linking probability
The Community Guided Attachment leads toThe Community Guided Attachment leads to Densification Power Law with exponent
a … densification exponentb … community tree branching factory gc … difficulty constant, 1 ≤ c ≤ b
If c = 1: easy to cross communitiesTh 2 d ti th f d liThen: a=2, quadratic growth of edges – near clique
If c = b: hard to cross communitiesThen: a=1 linear growth of edges – constant out-degree
Faloutsos&LeskovecECML/PKDD 2007
Part 1-43
Then: a 1, linear growth of edges constant out degree
CMU SCS
Forest Fire ModelForest Fire ModelBut we do not want to have explicitBut, we do not want to have explicit communitiesWant to model graphs that density and haveWant to model graphs that density and have shrinking diametersIntuition:
How do we meet friends at a party?How do we identify references when writing papers?papers?
Faloutsos&LeskovecECML/PKDD 2007
Part 1-44
22
![Page 25: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/25.jpg)
CMU SCS
Forest Fire ModelForest Fire ModelTh F t Fi d l [KDD05] h 2The Forest Fire model [KDD05] has 2 parameters:
f d b i b bilitp … forward burning probabilityr … backward burning probability
The model:The model:Each turn a new node v arrivesUniformly at random chooses an “ambassador”Uniformly at random chooses an ambassador wFlip two geometric coins to determine the number in-and out-links of w to follow (burn)and out links of w to follow (burn)Fire spreads recursively until it diesNode v links to all burned nodes
Faloutsos&LeskovecECML/PKDD 2007
Part 1-45
CMU SCS
Forest Fire ModelForest Fire ModelForest Fire generates graphs thatForest Fire generates graphs that densify and have shrinking diameter
densification diameterE(t)
densification diameter
r
1.32
amet
erdi
a
Faloutsos&LeskovecECML/PKDD 2007
Part 1-46N(t) N(t)
23
![Page 26: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/26.jpg)
CMU SCS
Forest Fire ModelForest Fire Model
Forest Fire also generates graphs with Power-Law degree distributiong
in-degree out-degreeg g
Faloutsos&LeskovecECML/PKDD 2007
Part 1-47log count vs. log in-degree log count vs. log out-degree
CMU SCS
Forest Fire: Phase transitionsForest Fire: Phase transitionsFix backwardFix backward probability r and vary forward burning
b bilit Cli likprobability pWe observe a sharp transition between
Clique-likegraphIncreasing
diametertransition between sparse and clique-like graphs Sparse
h
diameterConstantdiameter
Sweet spot is very narrow
graphDecreasing
diameter
Faloutsos&LeskovecECML/PKDD 2007
Part 1-48
24
![Page 27: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/27.jpg)
CMU SCS
Kronecker graphsKronecker graphsB t t t h d l th t tBut, want to have a model that can generate a realistic graph with realistic growth:
St ti P ttStatic PatternsPower Law Degree DistributionSmall DiameterSmall DiameterPower Law Eigenvalue and Eigenvector Distribution
Temporal PatternsDensification Power LawShrinking/Constant Diameter
For Kronecker graphs [PKDD05] all theseFor Kronecker graphs [PKDD05] all these properties can actually be proven
Faloutsos&LeskovecECML/PKDD 2007
Part 1-49
CMU SCS
Idea: Recursive graph generationStarting with our intuitions from densification
Idea: Recursive graph generationStarting with our intuitions from densificationTry to mimic recursive graph/community growth because self similarity leads to power-lawsThere are many obvious (but wrong) ways:
Initial graph Recursive expansion
Does not densify, has increasing diameterKronecker Product is a way of generating self-similar
Faloutsos&LeskovecECML/PKDD 2007
Part 1-50
y g gmatrices
25
![Page 28: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/28.jpg)
CMU SCS
Kronecker product: GraphKronecker product: Graph
Intermediate stage
(9x9)(3x3)
Faloutsos&LeskovecECML/PKDD 2007
Part 1-51Adjacency matrixAdjacency matrix
CMU SCS
Kronecker product: GraphKronecker product: Graph
Continuing multypling with G1 we obtain G4 and so on …4
Faloutsos&LeskovecECML/PKDD 2007
Part 1-52G4 adjacency matrix
26
![Page 29: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/29.jpg)
CMU SCS
Kronecker product: DefinitionKronecker product: DefinitionTh K k d t f t i A d B iThe Kronecker product of matrices A and B is given by
N x M K x L
N*K x M*L
We define a Kronecker product of two graphs as a Kronecker product of their adjacency matrices
Faloutsos&LeskovecECML/PKDD 2007
Part 1-53
CMU SCS
Kronecker graphsKronecker graphsW i fWe propose a growing sequence of graphs by iterating the Kronecker productproduct
Each Kronecker multiplicationEach Kronecker multiplication exponentially increases the size of the graphgraphGk has N1
k nodes and E1k edges, so we
get densificationFaloutsos&LeskovecECML/PKDD 2007
Part 1-54
get densification
27
![Page 30: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/30.jpg)
CMU SCS
Kronecker graphs: Intuition (1)Kronecker graphs: Intuition (1)
Intuition:Recursive growth of graph communitiesNodes get expanded to micro communitiesNodes in sub-community link among themselves and to nodes from different communities
Faloutsos&LeskovecECML/PKDD 2007
Part 1-55
CMU SCS
Stochastic Kronecker graphsStochastic Kronecker graphs
But, want a randomized version of Kronecker graphsg pPossible strategies:
R d l dd/d l t dRandomly add/delete some edgesThreshold the matrix, e.g. use only the strongest edges
Wrong, will destroy the structure of theWrong, will destroy the structure of the graph, e.g. diameter, clustering
Faloutsos&LeskovecECML/PKDD 2007
Part 1-56
28
![Page 31: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/31.jpg)
CMU SCS
Stochastic Kronecker graphsStochastic Kronecker graphs
Create N1×N1 probability matrix P1
Compute the kth Kronecker power Pkp p k
For each entry puv of Pk include an edge (u,v)with probability puvwith probability puv
0 25 0 10 0 10 0 04Kronecker
Probability of edge pij
0.5 0.2 Instance0.25 0.10 0.10 0.040.05 0.15 0.02 0.06
Kroneckermultiplication
0.1 0.3P1
matrix K20.05 0.02 0.15 0.060.01 0.03 0.03 0.09
flip biasedFaloutsos&LeskovecECML/PKDD 2007
Part 1-57
1
P2=P1⊗P1
flip biased coins
CMU SCS
Properties of Kronecker graphsProperties of Kronecker graphs
We can show [PKDD05] that Kronecker multiplication generates graphs that have:
Properties of static networksPower Law Degree DistributionPower Law eigenvalue and eigenvector distributionSmall Diameter
Properties of dynamic networksProperties of dynamic networksDensification Power LawShrinking/Stabilizing DiameterShrinking/Stabilizing Diameter
Faloutsos&LeskovecECML/PKDD 2007
Part 1-58
29
![Page 32: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/32.jpg)
CMU SCS
1 3: Fitting models to real1.3: Fitting models to real graphsg p
We saw the modelsWe saw the models.Want to fit a model to a large real
h?graph?
CMU SCS
The problemThe problem
W t t t li ti t kWe want to generate realistic networks:
Some statistical property, e.g., degree distribution
Given a real network
Generate a synthetic network
P1) What are the relevant properties?P2) What is a good analytically tractable model?P2) What is a good analytically tractable model?P3) How can we fit the model (find parameters)?
Faloutsos&LeskovecECML/PKDD 2007
Part 1-60
parameters)?
30
![Page 33: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/33.jpg)
CMU SCS
Model estimation: approachModel estimation: approach
M i lik lih d ti tiMaximum likelihood estimationGiven real graph GE i K k i i i h ( )Estimate Kronecker initiator graph Θ (e.g., ) which
)|(maxarg ΘGPWe need to (efficiently) calculate
)|(maxarg ΘΘ
GP( y)
)|( ΘGPAnd maximize over Θ (e.g., using gradient descent)
Faloutsos&LeskovecECML/PKDD 2007
Part 1-61
)
CMU SCS
Fitting Kronecker graphsFitting Kronecker graphsGiven a graph G and Kronecker matrix Θ we
GG e a g ap G a d o ec e at Θ ecalculate probability that Θ generated G P(G|Θ)
0.25 0.10 0.10 0.040 05 0 15 0 02 0 060 5 0 2
1 0 1 1
0 1 0 10.05 0.15 0.02 0.060.05 0.02 0.15 0.060 01 0 03 0 03 0 09
0.5 0.20.1 0.3
0 1 0 1
1 0 1 1
1 1 1 10.01 0.03 0.03 0.09Θ
Θk
1 1 1 1
GP(G|Θ)P(G|Θ)
]),[1(],[)|( vuvuGP kk Θ−ΠΘΠ=ΘFaloutsos&LeskovecECML/PKDD 2007
Part 1-62
]),[(],[)|(),(),( kGvukGvu ∉∈
31
![Page 34: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/34.jpg)
CMU SCS
ChallengesChallenges
Challenge 1: Mode correspondence problemp
How the map the nodes of the real graph to the nodes of the synthetic graph?the nodes of the synthetic graph?
Challenge 2: ScalabilityFor large graphs O(N2) is too slowScaling to large graphs – performing the g g g p p gcalculations quickly
Faloutsos&LeskovecECML/PKDD 2007
Part 1-63
CMU SCS
Challenge 1: Node correspondenceNodes are unlabeled
Challenge 1: Node correspondence0 25 0 10 0 10 0 04
ΘΘk
Nodes are unlabeledGraphs G’ and G” should have the same probability
0.25 0.10 0.10 0.04
0.05 0.15 0.02 0.06
0.05 0.02 0.15 0.06
0.5 0.20.1 0.3
have the same probabilityP(G’|Θ) = P(G”|Θ)One needs to consider all
0.01 0.03 0.03 0.09
G’ σOne needs to consider all node correspondences σ
1 0 1 0
0 1 1 1
1 1 1 1
1
23
)()|()|( PGPGP ∑ ΘΘ
All correspondences are a
0 0 1 14
2
)(),|()|( σσσ
PGPGP ∑ Θ=Θ
1 0 1 1G”
All correspondences are a priori equally likelyThere are O(N!)
2
14
3
0 1 0 1
1 0 1 1
1 1 1 1
Faloutsos&LeskovecECML/PKDD 2007
Part 1-64
There are O(N!)correspondences
1 1 1 1
P(G’|Θ) = P(G”|Θ)
32
![Page 35: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/35.jpg)
CMU SCS
Challenge 2: calculating P(G|Θ σ)Challenge 2: calculating P(G|Θ,σ)Assume we solved the correspondence problemCalculating
σ… node labeling
]),[1(],[)|(),(),( vukGvuvukGvu
GP σσσσ Θ−ΠΘΠ=Θ∉∈
Takes O(N2) timeInfeasible for large graphs (N ~ 105)
σ… node labeling
0.25 0.10 0.10 0.040.05 0.15 0.02 0.06
1 0 1 10 1 0 1
σ0.05 0.02 0.15 0.060.01 0.03 0.03 0.09
1 0 1 10 0 1 1
σ
Faloutsos&LeskovecECML/PKDD 2007
Part 1-65G
P(G|Θ, σ)Θkc
CMU SCS
Model estimation: solutionModel estimation: solution
N ï l ti ti th K k i iti tNaïvely estimating the Kronecker initiatortakes O(N!N2) time:
N! for graph isomorphism Metropolis sampling: N! (big) const
N2 for traversing the graph adjacency matrixProperties of Kronecker product and sparsity(E << N2): N2 E
We can estimate the parameters of Kronecker graph in linear time O(E)For details see [ICML07]
Faloutsos&LeskovecECML/PKDD 2007
Part 1-66
For details see [ICML07]
33
![Page 36: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/36.jpg)
CMU SCS
Solution 1: Node correspondenceSolution 1: Node correspondence Log-likelihoodLog likelihood
Gradient of log-likelihoodg
Sample the permutations from P(σ|G,Θ) and average the gradients
Faloutsos&LeskovecECML/PKDD 2007
Part 1-67
( | , ) g g
CMU SCS
Sampling node correspondencesSampling node correspondencesMetropolis sampling:p p g
Start with a random permutationDo local moves on the permutationAccept the new permutation
If new permutation is better (gives higher likelihood)If new is worse accept with probability proportional toIf new is worse accept with probability proportional to the ratio of likelihoods
1 4Swap node1
23
4
23
14
Swap node labels 1 and 4
1 0 1 0
0 1 1 11 1 1 0
1 1 1 0
123
12
Can compute efficiently:Only need to account for changes in 2 rows /
Faloutsos&LeskovecECML/PKDD 2007
Part 1-68
1 1 1 1
0 1 1 11 1 1 1
0 0 1 1
34
34
gcolumns
34
![Page 37: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/37.jpg)
CMU SCS
Solution 2: Calculating P(G|Θ σ)Solution 2: Calculating P(G|Θ,σ)
Calculating naively P(G|Θ,σ) takes O(N2)Idea:Idea:
First calculate likelihood of empty graph, a graph with 0 edgesgraph with 0 edgesCorrect the likelihood for edges that we observe i th hin the graph
By exploiting the structure of Kronecker y p gproduct we obtain closed form for likelihood of an empty graph
Faloutsos&LeskovecECML/PKDD 2007
Part 1-69
of an empty graph
CMU SCS
Solution 2: Calculating P(G|Θ σ)Solution 2: Calculating P(G|Θ,σ)
W i t th lik lih dWe approximate the likelihood:
No edge likelihood Edge likelihoodEmpty graph
The sum goes only over the edges
No-edge likelihood Edge likelihoodEmpty graph
g y gEvaluating P(G|Θ,σ) takes O(E) timeReal graphs are sparse E << N2
Faloutsos&LeskovecECML/PKDD 2007
Part 1-70
Real graphs are sparse, E << N2
35
![Page 38: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/38.jpg)
CMU SCS
Experiments: synthetic dataExperiments: synthetic data
C di t d t tCan gradient descent recover true parameters?Optimization problem is not convexHow nice (without local minima) isHow nice (without local minima) is optimization space?
Generate a graph from random parametersGenerate a graph from random parametersStart at random point and use gradient descentdescentWe recover true parameters 98% of the times
Faloutsos&LeskovecECML/PKDD 2007
Part 1-71
CMU SCS
Convergence of propertiesConvergence of propertiesHow does algorithm converge to trueHow does algorithm converge to true parameters with gradient descent iterations?
kelih
ood
bs e
rror
Log-
lik
Avg
ab
Gradient descent iterations
ue
Gradient descent iterations
Dia
met
er
eige
nval
u
Faloutsos&LeskovecECML/PKDD 2007
Part 1-72
D
1ste
36
![Page 39: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/39.jpg)
CMU SCS
Experiments: real networksExperiments: real networks
E i t l tExperimental setup:Given real graphStochastic gradient descent from random initial pointObtain estimated parametersGenerate synthetic graphs y g pCompare properties of both graphs
We do not fit the properties themselvesWe do not fit the properties themselves We fit the likelihood and then compare the
h tiFaloutsos&LeskovecECML/PKDD 2007
Part 1-73graph properties
CMU SCS
AS graph (N=6500 E=26500)AS graph (N=6500, E=26500)
Autonomous systems (internet)We search the space of ~1050,000 permutationsp pFitting takes 20 minutesAS graph is undirected and estimatedAS graph is undirected and estimated parameter matrix is symmetric:
0.98 0.580.58 0.06
Faloutsos&LeskovecECML/PKDD 2007
Part 1-74
37
![Page 40: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/40.jpg)
CMU SCS
AS: comparing graph propertiesAS: comparing graph properties
Generate synthetic graph using estimated parameterspCompare the properties of two graphs
Degree distribution Hop plot
pair
s
coun
t
acha
ble
pdiameter=4
log
og #
of r
e
Faloutsos&LeskovecECML/PKDD 2007
Part 1-75log degree number of hops
lo
CMU SCS
AS i h tiAS: comparing graph propertiesSpectral properties of graph adjacency matrices
Network valueScree plot
nval
ue
alue
log
eige
log
v a
log rank log rank
Faloutsos&LeskovecECML/PKDD 2007
Part 1-76
og ank
38
![Page 41: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/41.jpg)
CMU SCS
Epinions graph (N=76k E=510k)Epinions graph (N=76k, E=510k)1 000 000We search the space of ~101,000,000 permutations
Fitting takes 2 hoursThe structure of the estimated parameter gives insight into the structure of the graph
0.99 0.540.49 0.13
Degree distribution Hop plot
irs
ount
chab
le p
ai
log
co
g #
of re
ac
Faloutsos&LeskovecECML/PKDD 2007
Part 1-77log degree number of hops
log
CMU SCS
E i i h (N 76k E 510k)Epinions graph (N=76k, E=510k)
N t k lS l t Network valueScree plot
ueei
genv
alu
log
log rank log rank
Faloutsos&LeskovecECML/PKDD 2007
Part 1-78
39
![Page 42: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/42.jpg)
CMU SCS
ScalabilityScalability
Fitting scales linearly with the number of edgesg
Faloutsos&LeskovecECML/PKDD 2007
Part 1-79
CMU SCS
ConclusionConclusionK k G h d l hKronecker Graph model has
provable propertiesll b f tsmall number of parameters
We developed scalable algorithms for fitting Kronecker GraphsKronecker GraphsWe can efficiently search large space ( 101 000 000) of permutations(~101,000,000) of permutationsKronecker graphs fit well real networks using few parametersfew parametersWe match graph properties without a priori deciding on which ones to fit
Faloutsos&LeskovecECML/PKDD 2007
Part 1-80
deciding on which ones to fit
40
![Page 43: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/43.jpg)
CMU SCS
Why should we care?Why should we care?
Gives insight into the graph formation process:Anomaly detection – abnormal behavior, evolutionPredictions – predicting future from the pastSimulations of new algorithms where real graphs are hard/impossible to collectGraph sampling – many real world graphs are too l t d l ithlarge to deal with“What if” scenarios
Faloutsos&LeskovecECML/PKDD 2007
Part 1-81
CMU SCS
ReferencesReferencesGraphs over Time: Densification Laws Shrinking Diameters and PossibleGraphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, by Jure Leskovec, Jon Kleinberg, Christos Faloutsos, ACM KDD 2005Graph Evolution: Densification and Shrinking Diameters, by Jure Leskovec, Jon Kleinberg and Christos Faloutsos ACM TKDD 2007Kleinberg and Christos Faloutsos, ACM TKDD 2007Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, by Jure Leskovec, Deepay Chakrabarti, Jon Kleinberg and Christos Faloutsos, PKDD 2005Scalable Modeling of Real Graphs using Kronecker Multiplication by JureScalable Modeling of Real Graphs using Kronecker Multiplication, by Jure Leskovec and Christos Faloutsos, ICML 2007The Dynamics of Viral Marketing, by Jure Leskovec, Lada Adamic, Bernardo Huberman, ACM EC 2006Collective dynamics of 'small world' networks by Duncan J Watts and Steven HCollective dynamics of 'small-world' networks, by Duncan J. Watts and Steven H. Strogatz, Nature 1998Emergence of scaling in random networks, by R. Albert and A.-L. Barabasi, Science 1999O th l ti f d h b P E d d A R i P bli ti f thOn the evolution of random graphs, by P. Erdos and A. Renyi, Publication of the Mathematical Institute of the Hungarian Acadamy of Science, 1960
Faloutsos&LeskovecECML/PKDD 2007
Part 1-82
41
![Page 44: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/44.jpg)
CMU SCS
ReferencesReferencesThe structure and function of complex networks M Newman SIAM ReviewThe structure and function of complex networks, M. Newman, SIAM Review 2003Hierarchical Organization in Complex Networks, Ravasz and Barabasi, Physical Review E 2003A random graph model for massive graphs W Aiello F Chung and L Lu STOCA random graph model for massive graphs, W. Aiello, F. Chung and L. Lu, STOC 2000Community structure in social and biological networks, by Girvan and Newman, PNAS 2002O P l R l ti hi f th I t t T l b F l t F l tOn Power-law Relationships of the Internet Topology by Faloutsos, Faloutsos, and Faloutsos, SIGCOM 1999Power laws, Pareto distributions and Zipf's law by M. Newman, Contemporary Physics 2005S i l N k A l i M h d d A li i W C b idSocial Network Analysis : Methods and Applications, Wasserman, Cambridge University Press 1994The web as a graph: Measurements, models and methods, J. Kleinberg and S. R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins, COCOON 1998
Some plots borrowed from Lada Adamic, Mark Newman, Mark Joseph, Albert Barabasi, Jon Kleinberg, David Lieben-Nowell, Sergi Valverde, and Ricard Sole
Faloutsos&LeskovecECML/PKDD 2007
Part 1-83
42
![Page 45: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/45.jpg)
CMU SCS
Tutorial outlineTutorial outline
Part 1: Laws and generators:What are properties of large graphs?How do we model them?
Part 2: Tools for static and dynamics graphsy g pMethods based on Singular Value Decomposition
Part 3: Case studiesPart 3: Case studiesPropagation and detection of viruses in networksGraph projectionsGraph projections
Faloutsos&LeskovecECML/PKDD 2007
Part 2-1
CMU SCS
Part 2: OutlinePart 2: Outline
MotivationMatrix tools
SVD, PCAHITS, PageRank
Tensor toolsCase studies
gCURCo-clusteringCase studies Co-clustering
Faloutsos&LeskovecECML/PKDD 2007
Part 2-2
43
![Page 46: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/46.jpg)
CMU SCS
Examples of MatricesExamples of Matrices
Example/Intuition: Documents and termsFind patterns groups conceptsFind patterns, groups, concepts
data mining classif. tree ...13 11 22 55 ...5 4 6 7 ...
Paper#1Paper#2
... ... ... ... ...
... ... ... ... ...
... ... ... ... ...
Paper#3Paper#4
...Faloutsos&LeskovecECML/PKDD 2007
Part 2-3
...
CMU SCS
Singular Value Decomposition (SVD)Singular Value Decomposition (SVD)X = UΣVTX UΣV
X U
v1σ1
Σ VT
u1 u2 ukx(1) x(2) x(M) = . v2.σ2
vkσk
i l l right singular vectors
input data left singular vectors
singular values
Faloutsos&LeskovecECML/PKDD 2007
Part 2-4
44
![Page 47: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/47.jpg)
CMU SCS
SVD as spectral decompositionSVD as spectral decomposition
n nσ1u1°v1 σ2u2°v2
AmΣ
mVT
≈ +Am ≈ +
Best rank-k approximation in L2 and U
ppFrobenius SVD only works for static matrices (a single
Faloutsos&LeskovecECML/PKDD 2007
Part 2-5
y ( g2nd order tensor)
See also PARAFAC
CMU SCS
Vector outer product intuition:Vector outer product – intuition:owner
20; 30; 40
car typeage
20; 30; 40
VW
20; 30; 40
A
VWVolvoBMW
VWVolvoBMWABMW
2-d histogram1-d histograms +
independence assumption
Faloutsos&LeskovecECML/PKDD 2007
Part 2-6
45
![Page 48: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/48.jpg)
CMU SCS
SVD ExampleSVD - Example
A = U Σ VT - example:retrieval
datainf.
retrievalbrain lung
0 18 01 1 1 0 02 2 2 0 01 1 1 0 0
0.18 00.36 00 18 0CS 9 64 01 1 1 0 0
5 5 5 0 00 0 0 2 2
0.18 00.90 00 0.53
=9.64 00 5.29x x
0 0 0 3 30 0 0 1 1
0 0.530 0.800 0.27
MD 0.58 0.58 0.58 0 00 0 0 0.71 0.71
Faloutsos&LeskovecECML/PKDD 2007
Part 2-7
CMU SCS
SVD ExampleSVD - Example
A = U Σ VT - example:retrieval CS concept
datainf.
retrievalbrain lung
0 18 0
CS-conceptMD-concept
1 1 1 0 02 2 2 0 01 1 1 0 0
0.18 00.36 00 18 0CS 9 64 01 1 1 0 0
5 5 5 0 00 0 0 2 2
0.18 00.90 00 0.53
=9.64 00 5.29x x
0 0 0 3 30 0 0 1 1
0 0.530 0.800 0.27
MD 0.58 0.58 0.58 0 00 0 0 0.71 0.71
Faloutsos&LeskovecECML/PKDD 2007
Part 2-8
46
![Page 49: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/49.jpg)
CMU SCS
SVD ExampleSVD - Example
A = U Σ VT - example:retrieval CS concept
doc-to-concept similarity matrix
datainf.
retrievalbrain lung
0 18 0
CS-conceptMD-concept
1 1 1 0 02 2 2 0 01 1 1 0 0
0.18 00.36 00 18 0CS 9 64 01 1 1 0 0
5 5 5 0 00 0 0 2 2
0.18 00.90 00 0.53
=9.64 00 5.29x x
0 0 0 3 30 0 0 1 1
0 0.530 0.800 0.27
MD 0.58 0.58 0.58 0 00 0 0 0.71 0.71
Faloutsos&LeskovecECML/PKDD 2007
Part 2-9
CMU SCS
SVD ExampleSVD - Example
A = U Σ VT - example:retrieval
datainf.
retrievalbrain lung
0 18 0
‘strength’ of CS-concept
1 1 1 0 02 2 2 0 01 1 1 0 0
0.18 00.36 00 18 0CS 9 64 01 1 1 0 0
5 5 5 0 00 0 0 2 2
0.18 00.90 00 0.53
=9.64 00 5.29x x
0 0 0 3 30 0 0 1 1
0 0.530 0.800 0.27
MD 0.58 0.58 0.58 0 00 0 0 0.71 0.71
Faloutsos&LeskovecECML/PKDD 2007
Part 2-10
47
![Page 50: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/50.jpg)
CMU SCS
SVD ExampleSVD - Example
A = U Σ VT - example:retrieval term-to-concept
datainf.
retrievalbrain lung
0 18 0
similarity matrix
CS1 1 1 0 02 2 2 0 01 1 1 0 0
0.18 00.36 00 18 0CS 9 64 0
CS-concept
1 1 1 0 05 5 5 0 00 0 0 2 2
0.18 00.90 00 0.53
=9.64 00 5.29x x
0 0 0 3 30 0 0 1 1
0 0.530 0.800 0.27
MD 0.58 0.58 0.58 0 00 0 0 0.71 0.71
Faloutsos&LeskovecECML/PKDD 2007
Part 2-11
CMU SCS
SVD ExampleSVD - Example
A = U Σ VT - example:retrieval term-to-concept
datainf.
retrievalbrain lung
0 18 0
similarity matrix
CS1 1 1 0 02 2 2 0 01 1 1 0 0
0.18 00.36 00 18 0CS 9 64 0
CS-concept
1 1 1 0 05 5 5 0 00 0 0 2 2
0.18 00.90 00 0.53
=9.64 00 5.29x x
0 0 0 3 30 0 0 1 1
0 0.530 0.800 0.27
MD 0.58 0.58 0.58 0 00 0 0 0.71 0.71
Faloutsos&LeskovecECML/PKDD 2007
Part 2-12
48
![Page 51: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/51.jpg)
CMU SCS
SVD InterpretationSVD - Interpretation
‘documents’, ‘terms’ and ‘concepts’:Q if A i th d t t t t i h tQ: if A is the document-to-term matrix, what
is AT A?A: term-to-term ([m x m]) similarity matrixQ: A AT ?Q: A A ?A: document-to-document ([n x n]) similarity
matrixmatrix
Faloutsos&LeskovecECML/PKDD 2007
Part 2-13
CMU SCS
SVD propertiesSVD properties
V are the eigenvectors of the covariance matrix ATA
U are the eigenvectors of the Gram (inner-U a e t e e ge ecto s o t e G a ( eproduct) matrix AAT
Further reading:
Faloutsos&LeskovecECML/PKDD 2007
Part 2-14
Further reading:1. Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002.2. Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.
49
![Page 52: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/52.jpg)
CMU SCS
Principal Component Analysis (PCA)
SVDn n
RRk k
Am Σm
R
UVT k
Loading
PCs
Am m U Loading
PCA is an important application of SVDNote that U and V are dense and may have negative
Faloutsos&LeskovecECML/PKDD 2007
Part 2-15
y gentries
CMU SCS
PCA interpretationPCA interpretationbest axis to project on: (‘best’ = minbest axis to project on: ( best min sum of squares of projection errors)
Term2 (‘lung’)
Faloutsos&LeskovecECML/PKDD 2007
Part 2-16Term1 (‘data’)
50
![Page 53: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/53.jpg)
CMU SCS
PCA interpretationPCA - interpretation ΣU
VT
Term2 (‘retrieval’)
PCA projects points first singular vectorp j pOnto the “best” axis
v1
minimum RMS error Term1 (‘data’)
Faloutsos&LeskovecECML/PKDD 2007
Part 2-17
CMU SCS
Part 2: OutlinePart 2: Outline
MotivationMatrix tools
SVD, PCAHITS, PageRank
Tensor toolsCase studies
gCURCo-clusteringCase studies Co-clustering
Faloutsos&LeskovecECML/PKDD 2007
Part 2-18
51
![Page 54: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/54.jpg)
CMU SCS
Kleinberg’s algorithm HITSKleinberg s algorithm HITS
Problem dfn: given the web and a queryfind the most ‘authoritative’ web pages forfind the most authoritative web pages for this query
Step 0: find all pages containing the query termsp p g g q yStep 1: expand by one move forward and
backwardbackward
Faloutsos&LeskovecECML/PKDD 2007
Part 2-19Further reading:1. J. Kleinberg. Authoritative sources in a hyperlinked environment. SODA 1998
CMU SCS
Kleinberg’s algorithm HITSKleinberg s algorithm HITS
Step 1: expand by one move forward and backward
Faloutsos&LeskovecECML/PKDD 2007
Part 2-20
52
![Page 55: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/55.jpg)
CMU SCS
Kleinberg’s algorithm HITSKleinberg s algorithm HITS
on the resulting graph, give high score (= ‘authorities’) to nodes that many important ) y pnodes point togive high importance score (‘hubs’) togive high importance score ( hubs ) to nodes that point to good ‘authorities’
hubs authorities
Faloutsos&LeskovecECML/PKDD 2007
Part 2-21
CMU SCS
Kleinberg’s algorithm HITSKleinberg s algorithm HITS
Observationsrecursive definition!recursive definition!each node (say, ‘i’-th node) has both an
th it ti d h bauthoritativeness score ai and a hubness score hi
Faloutsos&LeskovecECML/PKDD 2007
Part 2-22
53
![Page 56: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/56.jpg)
CMU SCS
Kleinberg’s algorithm: HITSKleinberg s algorithm: HITS
Let A be the adjacency matrix: the (i,j) entry is 1 if the edge from i to j exists( ,j) y g j
Let h and a be [n x 1] vectors with the ‘hubness’ and ‘authoritativiness’ scoreshubness and authoritativiness scores.
Then:
Faloutsos&LeskovecECML/PKDD 2007
Part 2-23
CMU SCS
Kleinberg’s algorithm: HITSKleinberg s algorithm: HITS
Then:ai = hk + hl + hai hk + hl + hm
that iskl i
ai = Sum (hj) over all j that (j,i) edge exists
lm (j, ) edge e sts
orTa = AT h
Faloutsos&LeskovecECML/PKDD 2007
Part 2-24
54
![Page 57: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/57.jpg)
CMU SCS
Kleinberg’s algorithm: HITSKleinberg s algorithm: HITS
symmetrically, for the ‘hubness’:hi = a + a + ani hi an + ap + aq
that isp
hi = Sum (qj) over all j that (i,j) edge existsq ( ,j) edge e sts
orh = A a
Faloutsos&LeskovecECML/PKDD 2007
Part 2-25
CMU SCS
Kleinberg’s algorithm: HITSKleinberg s algorithm: HITS
In conclusion, we want vectors h and asuch that:
h = A aAT ha = AT h
That is:at sa = ATA a
Faloutsos&LeskovecECML/PKDD 2007
Part 2-26
55
![Page 58: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/58.jpg)
CMU SCS
Kleinberg’s algorithm: HITSKleinberg s algorithm: HITS
i i ht i l t f th dja is a right singular vector of the adjacency matrix A (by dfn!), a.k.a the eigenvector
Tof ATA
Starting from random a’ and iterating, we’ll eventually convergeeventually converge
Q: to which of all the eigenvectors? why?A: to the one of the strongest eigenvalue,
(AT A ) k a = λ1ka
Faloutsos&LeskovecECML/PKDD 2007
Part 2-27
( ) 1
CMU SCS
Kleinberg’s algorithm -discussion
‘authority’ score can be used to find ‘similar pages’ (how?)p g ( )closely related to ‘citation analysis’, social networks / ‘small world’ phenomenanetworks / small world phenomena
Faloutsos&LeskovecECML/PKDD 2007
Part 2-28See also TOPHITS
56
![Page 59: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/59.jpg)
CMU SCS
Part 2: OutlinePart 2: Outline
MotivationMatrix tools
SVD, PCAHITS, PageRank
Tensor toolsCase studies
gCURCo-clusteringCase studies Co-clustering
Faloutsos&LeskovecECML/PKDD 2007
Part 2-29
CMU SCS
Motivating problem: PageRankMotivating problem: PageRank
Gi di t d h fi d it tGiven a directed graph, find its most interesting/central node
A node is important,if it is connectedif it is connected with important nodes( i b t OK!)(recursive, but OK!)
Faloutsos&LeskovecECML/PKDD 2007
Part 2-30
57
![Page 60: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/60.jpg)
CMU SCS
Motivating problem – PageRank g p gsolution
Given a directed graph, find its most interesting/central nodeinteresting/central node
Proposed solution: Random walk; spot most ‘ l ’ d ( t d t t b ( ))‘popular’ node (-> steady state prob. (ssp))
A d h hi hA node has high ssp,if it is connected with high ssp nodes(recursive, but OK!)
Faloutsos&LeskovecECML/PKDD 2007
Part 2-31
( , )
CMU SCS
(Simplified) PageRank algorithm(Simplified) PageRank algorithm
Let A be the transition matrix (= adjacency t i ) thmatrix); let B be the transpose, column-normalized - then
From B
1 2 3
p1
p2
p1
p2
To B1
1 11 2 3 p2
p3
p4
p2
p3
p4
=1 1
1/2 1/2
1/2
45
p4
p5
p4
p5
1/2
1/2
Faloutsos&LeskovecECML/PKDD 2007
Part 2-32
58
![Page 61: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/61.jpg)
CMU SCS
(Simplified) PageRank algorithm(Simplified) PageRank algorithm
B p = p
B p = p
1 2 3p1
p2
p1
p2
1
1 11 3 p2
p3
p4
p2
p3
p4
=1 1
1/2 1/2
1/245
p4
p5
p4
p5
1/2
1/2
Faloutsos&LeskovecECML/PKDD 2007
Part 2-33
CMU SCS
(Simplified) PageRank algorithm(Simplified) PageRank algorithm
B p = 1 * pthus p is the eigenvector thatthus, p is the eigenvector that corresponds to the highest eigenvalue (=1, i th t i i l li d)since the matrix is column-normalized)
Why does such a p exist? p exists if B is nxn, nonnegative, irreducible [Perron–Frobenius theorem][ e o obe us eo e ]
Faloutsos&LeskovecECML/PKDD 2007
Part 2-34
59
![Page 62: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/62.jpg)
CMU SCS
(Simplified) PageRank algorithm(Simplified) PageRank algorithm
In short: imagine a particle randomly moving along the edgesg g gcompute its steady-state probabilities (ssp)
Full version of algo: with occasional random u e s o o a go t occas o a a dojumps
Wh ? T k th t i i d iblWhy? To make the matrix irreducible
Faloutsos&LeskovecECML/PKDD 2007
Part 2-35
CMU SCS
Full AlgorithmFull Algorithm
With probability 1-c, fly-out to a random nodeThen, we have
B (1 )/ 1 >p = c B p + (1-c)/n 1 =>p = (1-c)/n [I - c B] -1 1
Faloutsos&LeskovecECML/PKDD 2007
Part 2-36
60
![Page 63: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/63.jpg)
CMU SCS
Part 2: OutlinePart 2: Outline
MotivationMatrix tools
SVD, PCAHITS, PageRank
Tensor toolsCase studies
gCURCo-clusteringCase studies Co-clustering
Faloutsos&LeskovecECML/PKDD 2007
Part 2-37
CMU SCS
Motivation of CUR or CMDMotivation of CUR or CMD
SVD, PCA all transform data into some abstract space (specified by a set basis)p ( p y )
Interpretability problemLoss of sparsityLoss of sparsity
Faloutsos&LeskovecECML/PKDD 2007
Part 2-38
61
![Page 64: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/64.jpg)
CMU SCS
PCA interpretationPCA - interpretation
Term2 (‘retrieval’)
PCA projects points first singular vectorp j pOnto the “best” axis
v1
minimum RMS error Term1 (‘data’)
Faloutsos&LeskovecECML/PKDD 2007
Part 2-39
CMU SCS
CURExample-based projection: use actual rows and
l t if th bcolumns to specify the subspaceGiven a matrix A∈Rm×n, find three matrices C∈R U R R R h th t ||A CUR|| iRm×c, U∈ Rc×r, R∈ Rr× n , such that ||A-CUR|| is small nn
C
RXm
r
AmC
c OrthogonalU is the pseudo-inverse of X
Orthogonal projection
Faloutsos&LeskovecECML/PKDD 2007
Part 2-40
62
![Page 65: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/65.jpg)
CMU SCS
CURExample-based projection: use actual rows and
l t if th bcolumns to specify the subspaceGiven a matrix A∈Rm×n, find three matrices C∈R U R R R h th t ||A CUR|| iRm×c, U∈ Rc×r, R∈ Rr× n , such that ||A-CUR|| is small nn
C
RXm
r
AmC
c Example-based
U is the pseudo-inverse of X:U = X† = (UT U )-1 UT
Faloutsos&LeskovecECML/PKDD 2007
Part 2-41
CMU SCS
CUR (cont )CUR (cont.)
Key question:How to select/sample the columns and rows?p
Uniform samplingBiased sampling
CUR w/ absolute error boundCUR w/ relative error bound
Reference:1. Tutorial: Randomized Algorithms for Matrices and Massive Datasets, SDM’062. Drineas et al. Subspace Sampling and Relative-error Matrix Approximation: Column-
Faloutsos&LeskovecECML/PKDD 2007
Part 2-42Row-Based Methods, ESA2006
3. Drineas et al., Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, 2006.
63
![Page 66: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/66.jpg)
CMU SCS
The sparsity property pictorially:The sparsity property – pictorially:
=SVD/PCA:Destroys sparsity
U Σ VT
CUR i t i it
=
CUR: maintains sparsity
=
Faloutsos&LeskovecECML/PKDD 2007
Part 2-43C U R
CMU SCS
The sparsity propertyThe sparsity propertysparse and small
SVD: A = U Σ VT
p
Big but sparse Big and denseg p Big and dense
dense but small
CUR: A = C U RCUR: A C U RBig but sparse Bi b
Faloutsos&LeskovecECML/PKDD 2007
Part 2-44
Big but sparse Big but sparse
64
![Page 67: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/67.jpg)
CMU SCS
Th it t ( t )The sparsity property (cont.)SVDCUR
102 SVD
CUR
102
e ra
tio
CURCMD
1e ra
tio
CURCMD
101
spac
e r
101
spac
e
C
0 0.2 0.4 0.6 0.8 1
10
accuracy0 0.2 0.4 0.6 0.8 1
accuracy
Network DBLPCMD uses much smaller space to achieve the same accuracyCUR li i i d li l dCUR limitation: duplicate columns and rowsSVD limitation: orthogonal projection densifies h d
Faloutsos&LeskovecECML/PKDD 2007
Part 2-45the data
Reference:Sun et al. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM’07
CMU SCS
Tensors for time evolving graphsTensors for time evolving graphs
[Jimeng Sun+ KDD’06][Jimeng Sun+ SMD’07][ g ][CF, Kolda, Sun, SDM’07 tutorial]tutorial]
Faloutsos&LeskovecECML/PKDD 2007
Part 2-46
65
![Page 68: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/68.jpg)
CMU SCS
Social network analysisSocial network analysis
Static: find community structures
Keywords1990
thor
s
DB
Aut
Faloutsos&LeskovecECML/PKDD 2007
Part 2-47
CMU SCS
Social network analysisSocial network analysis
Static: find community structures Dynamic: monitor community structure evolution;Dynamic: monitor community structure evolution; spot abnormal individuals; abnormal time-stamps Keywordsstamps Keywords
DM
2004
DB
1990
DButho
rs
Faloutsos&LeskovecECML/PKDD 2007
Part 2-48
Au
66
![Page 69: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/69.jpg)
CMU SCS
Multiway latent semantic indexing (LSI)(LSI)
2004 Philip Yu
DM2004
1990Michael
Stonebreakerutho
rs
p
DB1990
hors
Uau
DBUkeyword
auth
QueryPatternkeyword
P j ti t i if th l tProjection matrices specify the clustersCore tensors give cluster activation level
Faloutsos&LeskovecECML/PKDD 2007
Part 2-49
g
CMU SCS
Bibliographic data (DBLP)g p ( )
P f VLDB d KDD fPapers from VLDB and KDD conferencesConstruct 2nd order tensors with yearly y ywindows with <author, keywords> Each tensor: 4584×3741Each tensor: 4584×374111 timestamps (years)
Faloutsos&LeskovecECML/PKDD 2007
Part 2-50
67
![Page 70: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/70.jpg)
CMU SCS
Multiway LSIMultiway LSIAuthors Keywords Year
i h l i h l i ll l i i i 199michael carey, michaelstonebreaker, h. jagadish,hector garcia-molina
queri,parallel,optimization,concurr,objectorient
1995
DBsurajit chaudhuri,mitch cherniack,michaelstonebreaker,ugur etintemel
distribut,systems,view,storage,servic,process,cache
2004DB
jiawei han,jian pei,philip s. yu,jianyong wang,charu c. aggarwal
streams,pattern,support, cluster, index,gener,queri
2004
DM
Two groups are correctly identified: Databases and Data mininggPeople and concepts are drifting over time
Faloutsos&LeskovecECML/PKDD 2007
Part 2-51
CMU SCS
ConclusionsConclusions
Tensor-based methods (WTA/DTA/STA):spot patterns and anomalies on timespot patterns and anomalies on time evolving graphs, and
t ( it i )on streams (monitoring)
Faloutsos&LeskovecECML/PKDD 2007
Part 2-52
68
![Page 71: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/71.jpg)
CMU SCS
RoadmapRoadmap
MotivationMatrix tools
SVD, PCAHITS, PageRank
Tensor toolsCase studies
gCURCo-clusteringCase studies Co-clustering
Faloutsos&LeskovecECML/PKDD 2007
Part 2-53
CMU SCS
Co clusteringCo-clustering
Given data matrix and the number of row and column groups k and lg pSimultaneously
Cl t f (X Y) i t k di j i tCluster rows of p(X, Y) into k disjoint groups Cluster columns of p(X, Y) into l disjoint groups
Faloutsos&LeskovecECML/PKDD 2007
Part 2-54
69
![Page 72: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/72.jpg)
CMU SCS
Co clusteringdetails
Co-clustering
Let X and Y be discrete random variables X and Y take values in 1, 2, …, m and 1, 2, …, np(X, Y) denotes the joint probability distribution—if not known, it is often estimated based on co-
d toccurrence dataApplication areas: text mining, market-basket analysis analysis of browsing behavior etcanalysis, analysis of browsing behavior, etc.
Key Obstacles in Clustering Contingency Tables High Dimensionality, Sparsity, NoiseNeed for robust and scalable algorithms
Faloutsos&LeskovecECML/PKDD 2007
Part 2-55Reference:1. Dhillon et al. Information-Theoretic Co-clustering, KDD’03
CMU SCS
⎥⎤
⎢⎡ 00005.05.05.
n
eg terms x documents
⎥⎥⎥⎤
⎢⎢⎢⎡
05.05.05.00005.05.05.00000005.05.05.
meg, terms x documents
⎥⎥⎦⎢
⎢⎣ 04.04.004.04.04.
04.04.04.004.04.
nlk
⎥⎥⎤
⎢⎢⎡
054054042000000042.054.054.000042.054.054.
⎥⎥⎤
⎢⎢⎡
050005.005.
⎥⎦⎤
⎢⎣⎡
223.003. [ ]
36.36.28.00000028.36.36. =
nl
k
kl
⎥⎥⎥
⎦⎢⎢⎢
⎣036.036.028.028036.036.054.054.042.000054.054.042.000
⎥⎥⎥
⎦⎢⎢⎢
⎣5.0005.005.0 ⎦⎣ 2.2.
m
⎥⎦⎢⎣ 036.036.028.028.036.036.⎥⎦⎢⎣ 5.00
Faloutsos&LeskovecECML/PKDD 2007
Part 2-56
70
![Page 73: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/73.jpg)
CMU SCS
med. doccs doc
⎤⎡ 00005.05.05. med. terms
⎥⎥⎥⎤
⎢⎢⎢⎡
05.05.05.00005.05.05.00000005.05.05.
term group xcs terms
⎥⎥⎦⎢
⎢⎣ 04.04.004.04.04.
04.04.04.004.04.term group xdoc. group common terms
⎥⎥⎤
⎢⎢⎡
000042.054.054.000042.054.054.
⎥⎥⎤
⎢⎢⎡
005.005.
⎥⎦⎤
⎢⎣⎡
223.003. [ ]
36.36.28.00000028.36.36. =
⎥⎥⎥
⎦⎢⎢⎢
⎣036.036.028.028036.036.054.054.042.000054.054.042.000
⎥⎥⎥
⎦⎢⎢⎢
⎣5.0005.005.0 ⎦⎣ 2.2.
doc xd ⎥⎦⎢⎣ 036.036.028.028.036.036.⎥⎦⎢⎣ 5.00
t
doc group
Faloutsos&LeskovecECML/PKDD 2007
Part 2-57term x
term-group
CMU SCS
Co clusteringCo-clustering
Observationsuses KL divergence instead of L2uses KL divergence, instead of L2the middle matrix is not diagonal
we’ll see that again in the Tucker tensor decomposition
Faloutsos&LeskovecECML/PKDD 2007
Part 2-58
71
![Page 74: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/74.jpg)
CMU SCS
Problem with Information Theoretic Co-clustering
Number of row and column groups must be specifiedp
Desiderata:
Simultaneously discover row and column groups
Fully Automatic: No “magic numbers”Fully Automatic: No magic numbers
Scalable to large graphs
Faloutsos&LeskovecECML/PKDD 2007
Part 2-59
CMU SCS
Cross associationsCross-associations
Desiderata:
Simultaneously discover row and column groups
Fully Automatic: No “magic numbers”Fully Automatic: No magic numbers
Scalable to large matrices
Faloutsos&LeskovecECML/PKDD 2007
Part 2-60Reference:1. Chakrabarti et al. Fully Automatic Cross-Associations, KDD’04
72
![Page 75: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/75.jpg)
CMU SCS
What makes a cross-association “good”?
ups
ups
Why is this better?
versus
Row
gro
u
Row
gro
u better?
Column groups
Column gro ps
R Rgroups groups
Faloutsos&LeskovecECML/PKDD 2007
Part 2-61
CMU SCS
What makes a cross-association “good”?
ups
ups
Why is this better?
versus
Row
gro
u
Row
gro
u better?
Column groups
Column gro ps
R R
groups groups
simpler; easier to describesimpler; easier to describeeasier to compress!
Faloutsos&LeskovecECML/PKDD 2007
Part 2-62
73
![Page 76: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/76.jpg)
CMU SCS
What makes a cross-association “good”?
Problem definition: given an encoding scheme• decide on the # of col and row groups k and ldecide on the # of col. and row groups k and l• and reorder rows and columns,• to achieve best compressionto achieve best compression
Faloutsos&LeskovecECML/PKDD 2007
Part 2-63
CMU SCS
Main Ideadetails
Main Idea
Good Compression
Better Clustering
sizei * H(xi) + Cost of describing cross-associationsΣiTotal Encoding Cost =
Code Cost Description C tCost
Minimize the total cost (# bits)Minimize the total cost (# bits)
for lossless compressionFaloutsos&LeskovecECML/PKDD 2007
Part 2-64
74
![Page 77: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/77.jpg)
CMU SCS
AlgorithmAlgorithml 5 l
k = 5
l = 5 col groups
row grouups
k=1, l=2
k=2, l=2
k=2, l=3
k=3, l=3
k=3, l=4
k=4, l=4
k=4, l=5
Faloutsos&LeskovecECML/PKDD 2007
Part 2-65
CMU SCS
AlgorithmAlgorithm
Code for cross-associations (matlab):
www.cs.cmu.edu/~deepay/mywww/software/CrossAssociations-01-27-2005.tgzions 01 27 2005.tgz
Variations and extensions:Variations and extensions:‘Autopart’ [Chakrabarti, PKDD’04]www.cs.cmu.edu/~deepay
Faloutsos&LeskovecECML/PKDD 2007
Part 2-66
75
![Page 78: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/78.jpg)
CMU SCS
Matrix tools summaryMatrix tools - summary
SVD: optimal for L2 – VERY popular (HITS, p p p ( ,PageRank, Karhunen-Loeve, Latent Semantic Indexing, PCA, etc etc)g, , )
C-U-R (CMD etc)ti l it i t t bilitnear-optimal; sparsity; interpretability
Tensors
Faloutsos&LeskovecECML/PKDD 2007
Part 2-67
76
![Page 79: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/79.jpg)
CMU SCS
Part 3: Case studiesPart 3: Case studies(Virus) Propagation in networks:(Virus) Propagation in networks:
properties, models, detectionWeb projections:Web projections:
How to predict user intentions?Center Piece Subgraphs:Center Piece Subgraphs:
Find best subgraph connecting a set of nodes
CMU SCS
Part 3 1: Virus and influencePart 3.1: Virus and influence propagationp p g
77
![Page 80: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/80.jpg)
CMU SCS
Cascading Behavior in gLarge Blog Graphs
Blogs Posts
Information
H d i f ti t
LinksInformation
cascade
How does information propagate over the blogosphere?
H d bl it d i fl h th ?How do blogs cite and influence each other?How do such links evolve?
Faloutsos, LeskovecECML/PKDD 2007
Part 3-3
CMU SCS
Motivation: Why blogs?Motivation: Why blogs?
Blogs are a widely used medium of information for many topics and have y pbecome an important mode of communicationcommunicationBlogs cite one another, creating a recordf fof how information and ideas spread
through the social networkgThis record is publicly available
Faloutsos, LeskovecECML/PKDD 2007
Part 3-4
78
![Page 81: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/81.jpg)
CMU SCS
Immediate GoalsImmediate Goals
Temporal questions: Does popularity have half-life? Is there periodicity?p yTopological questions: What topological patterns do posts and blogs follow? Whatpatterns do posts and blogs follow? What shapes to cascades take on? Stars? C ? S ?Chains? Something else?Models: Can we build a generative modelModels: Can we build a generative model that mimics properties of cascades?
Faloutsos, LeskovecECML/PKDD 2007
Part 3-5
CMU SCS
Cascades on the BlogosphereCascades on the BlogosphereB1 a1B1 B2
a
b cB1 B2
11
21
B4B3
deB4
B3
2
1 3
Blogosphereblogs + posts
Blog networklinks among blogs
Post networklinks among posts
Cascade is graph induced by d b ca
a time ordered propagation of information (edges)
Cascadese e
Faloutsos, LeskovecECML/PKDD 2007
Part 3-6
Cascades
79
![Page 82: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/82.jpg)
CMU SCS
Blog dataBlog data45 000 blogs participating in cascades45,000 blogs participating in cascadesAll their posts for 3 months (Aug-Sept ‘05)2 4 illi t2.4 million posts ~5 million links (245,404 inside the dataset)
sr o
f pos
tsN
umbe
r
Faloutsos, LeskovecECML/PKDD 2007
Part 3-7Time [1 day]
CMU SCS
Temporal ObservationsTemporal Observations
Posts have weekend effect, less traffic on po
sts
# lin
ks
Saturday/Sunday.Day of the week Day of the week
#
#
Post popularity drop-off follows a power law which can be explained by a simple queuing model
nks
(log)
p y p q g
The probability that a post written at time t acquires a
mbe
r in-
linwritten at time tp acquires a link at time tp + Δ is:
p(t +Δ) ~ Δ-1.5
Faloutsos, LeskovecECML/PKDD 2007
Part 3-8Days after post
Nup(tp+Δ) ~ Δ
80
![Page 83: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/83.jpg)
CMU SCS
Blog Networkg44,356 nodes, 122,153 edges. Half of blogs b l t l t t d tbelong to largest connected component.
e) ks
log
scal
e
og in
-link
Cou
nt (
ber o
f blo
Number of blog in-links (log scale) Number of blog out-linksN
umIn- and out-degree follow power law distribution. In-degree exponent 1.7, out-degree exponent -3.
But, in-links and out-links are not correlated. ρ=0.16
Faloutsos, LeskovecECML/PKDD 2007
Part 3-9
out degree exponent 3.
Strong rich-get-richer phenomena.
CMU SCS
Post NetworkPost Network2.4M nodes, 250K edges, gBoth in- and out-degree follow power laws. In-degree exponent - un
t
2.1, out-degree exponent -3.
Power laws also in total degree d b f t bl
Cou
and number of posts per blog.Number of blog-blog links
ount
unt
Co
Cou
Faloutsos, LeskovecECML/PKDD 2007
Part 3-10Post in-degree Posts per blog
81
![Page 84: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/84.jpg)
CMU SCS
Topological patterns: CascadesTopological patterns: CascadesProcedure for gathering cascades:Procedure for gathering cascades:
Find all initiators (nodes with out-degree 0)Follow in-linksProduces directed acyclic graphProduces directed acyclic graphCount cascade shapes (use our multi-level graph isomorphism testing algorithm)graph isomorphism testing algorithm)
a a
d b c ab c
d
b c
d
d
e
b c
e
Faloutsos, LeskovecECML/PKDD 2007
Part 3-11e e
CMU SCS
Cascade shapesCascade shapesC dCascade shapes
(ranked by(ranked by frequency)
The probability of observing a cascade on n nodesa cascade on n nodes follows a Zipf distribution:
p(n) n-2
Also power laws in in/out-degree, type of cascades (chains stars)
p(n) ~ n 2
Faloutsos, LeskovecECML/PKDD 2007
Part 3-12
type of cascades (chains, stars), and degree per level.
82
![Page 85: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/85.jpg)
CMU SCS
Properties of CascadesProperties of CascadesMost of cascades are trees
edge
s
met
er
umbe
r of
ctiv
e di
am
Cascade size (number of nodes)
Nu
Cascade size
Effe
c
t
Number of cascades per
Cou
nt
Cou
ntcascades per node also
follows power-l di t ib ti
Faloutsos, LeskovecECML/PKDD 2007
Part 3-13
Number of joined cascades Cascades per node
law distribution.
CMU SCS
Cascade Generation ModelCascade Generation Model1) Randomly pick blog to 2) Infect each in-linked1) Randomly pick blog to infect, add to cascade.
1
2) Infect each in linked neighbor with probability β.
B B1
B1 B2
11
21
B1B1
B1 B21
21
B4B31 3
3) Add infected neighbors 4) Set node infected in (i) to
B4B31 3
B1B B
1
3) Add infected neighbors to cascade.
4) Set node infected in (i) to uninfected.
B1 B21
21
B1 B21
21B1
B
B1
Faloutsos, LeskovecECML/PKDD 2007
Part 3-14B4B31 3B4
B31 3B4 B4
83
![Page 86: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/86.jpg)
CMU SCS
Experimental ResultsExperimental ResultsG ti d l
ount
ount
Generative model produces realistic cascades C Ccascades
β=0.025Cascade size Cascade node in-degree
Cou
nt
Cou
ntC
Faloutsos, LeskovecECML/PKDD 2007
Part 3-15Most frequent cascades Size of star cascade Size of chain cascade
CMU SCS
ConclusionsConclusionsTemporal PropertiesTemporal PropertiesPopularity drop-off follows power-law distribution exactly as found in response times in other work
Topological Properties
exactly as found in response times in other work. Posts follow weekly periodicity.
Topological PropertiesPower law distributions in almost every topological property Star cascades are more common than chainsproperty. Star cascades are more common than chains, and size of cascades follow a power law.
Generative ModelGenerative ModelWe developed a generative model based on SIS model in epidemiology that matched properties of
Faloutsos, LeskovecECML/PKDD 2007
Part 3-16
model in epidemiology that matched properties of cascades.
84
![Page 87: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/87.jpg)
CMU SCS
Part 3 2: How to detectPart 3.2: How to detect outbreaks in networks?
CMU SCS
Scenario 1: Water networkScenario 1: Water network
Given a real city water distributionwater distribution networkAnd data on how S
Where should we place sensors to efficientlycontaminants spread
in the network (for any possible
sensors to efficientlydetect the all possible
contaminations?any possible contaminant starting point) S
contaminations?p )
Faloutsos, LeskovecECML/PKDD 2007
Part 3-18
85
![Page 88: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/88.jpg)
CMU SCS
Scenario 2: Cascades in blogsScenario 2: Cascades in blogs
Which blogs sho ld oneBlogs Posts
Which blogs should one read to detect stories as
Information effectively as possible?
Links cascade
Faloutsos, LeskovecECML/PKDD 2007
Part 3-19
CMU SCS
What means effectively?What means effectively?
What do we mean by effectively:Want to capture stories/contaminations soon?
Time to detection
Want to detect as many contaminations/stories as ibl ?possible?
Detection likelihood
Want to be the first to know? Want few people to getWant to be the first to know? Want few people to get infected?
Population affectedPopulation affected
What is the cost of reading a blog (placing a sensor)?
Faloutsos, LeskovecECML/PKDD 2007
Part 3-20
sensor)?
86
![Page 89: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/89.jpg)
CMU SCS
General problemGeneral problemW t t k d bl hWater networks and blogs have common structure:
Given a dynamic process spreading over the networkGiven a dynamic process spreading over the network (e.g., a virus, information, influence)We want to select a set of nodes to detect the process effectivelyprocess effectively
Many other examples:Epidemics – which people to monitor to detectEpidemics – which people to monitor to detect disease outbreaks?Viral marketing – who are easily influenced people?Water distribution network – where to place sensors?Information propagation networks
Faloutsos, LeskovecECML/PKDD 2007
Part 3-21
CMU SCS
Problem settingProblem settingGiven a graph G(V E)Given a graph G(V,E)and a budget of B sensors and data on how contaminations spread over thepnetwork:
for each contamination i we know the time T(i, u) when itcontaminated node ucontaminated node u
We want to select a subset of nodes A that minimizes the expected penalty typ p y
Detection time
Pen
alt
where T(i, A) is the time when placement A first detects contamination i
Detection time
Faloutsos, LeskovecECML/PKDD 2007
Part 3-22
detects co ta at o iThe later we detect higher the penalty
87
![Page 90: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/90.jpg)
CMU SCS
Objective (penalty) functionsObjective (penalty) functionsObj ti f ti d fi d b B ttl fObjective functions were defined by Battle of Water Sensor Networks competitionW h th t th bj ti f ti ( ltWe show that the objective functions (penalty reductions):
Ti t d t ti (DT)Time to detection (DT)How long does it take to detect a contamination?
Detection likelihood (DL)Detection likelihood (DL)How many contaminations do we detect?
Population affected (PA)How many people drank contaminated water?
are submodular
Faloutsos, LeskovecECML/PKDD 2007
Part 3-23
CMU SCS
Objective function propertiesObjective function propertiesSubmodularity a diminishing returnsSubmodularity, a diminishing returns property of functions:
For all placements it holds
The marginal benefit (penalty reduction) of ddi l di i i hadding a sensor only diminishes as more
sensors are addedO ti i i b d l f ti iOptimizing submodular functions is NP-hard
Faloutsos, LeskovecECML/PKDD 2007
Part 3-24
88
![Page 91: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/91.jpg)
CMU SCS
Unit cost: Greedy algorithmUnit cost: Greedy algorithmOptimization is NP hard soOptimization is NP-hard, so lets use simple greedy algorithmbenefit gHow far is greedy solution from the optimal solution?
a
b b
dbenefit
Unit cost (each sensor costs the same):
A greedy algorithm is near
b
c
a
ce
A greedy algorithm is near optimal – it is at most 1-1/e(~63%) from optimal [N h t l ‘78]
d
e [Nemhauser et al ‘78] Greedy algorithm is slow
scales as O(|V|B)
e
Faloutsos, LeskovecECML/PKDD 2007
Part 3-25
scales as O(|V|B)
CMU SCS
Variable cost: Greedy algorithmVariable cost: Greedy algorithm
B t l i diff t ( diBut placing different sensors (reading different blogs) may have different costsVariable sensor cost:
Greedy algorithm can fail arbitrarily badlyGreedy algorithm can fail arbitrarily badlyWe develop an new algorithm that achieves ½(1-1/e) factor approximation½(1 1/e) factor approximation
We also develop a new algorithm independent bound that is in practiceindependent bound that is in practice much tighter
Faloutsos, LeskovecECML/PKDD 2007
Part 3-26
89
![Page 92: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/92.jpg)
CMU SCS
Scaling up greedy algorithmScaling up greedy algorithm
Submodularity guarantees that marginal improvements decrease with the solution psizeImprovement for including sensor u nowImprovement for including sensor u nowsmaller (or equal) than it was in previous
fstep of the greedy algorithm
Can do lazy evaluations! [Robertazzi et al, ‘89]
Faloutsos, LeskovecECML/PKDD 2007
Part 3-27
al, 89]
CMU SCS
Scaling up greedy algorithmScaling up greedy algorithm
Our improved algorithm:Keep an ordered list of d
benefitpmarginal benefits bi from previous iteration
a
b ab
d
pRe-evaluate bi for top-sensor
cc
d
e
sensorIf that removes it from the top pick next sensor
d
e
top, pick next sensorContinue until top remains
h dFaloutsos, LeskovecECML/PKDD 2007
Part 3-28unchanged
90
![Page 93: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/93.jpg)
CMU SCS
Scaling up greedy algorithmScaling up greedy algorithm
dbenefit
Our improved algorithm:Keep an ordered list of
a
ab
d
b
pmarginal benefits bi from previous iteration
cd
c e
pRe-evaluate bi for top-sensor d
e
sensorIf that removes it from the top pick next sensortop, pick next sensorContinue until top remains
h dFaloutsos, LeskovecECML/PKDD 2007
Part 3-29unchanged
CMU SCS
Scaling up greedy algorithmScaling up greedy algorithm
Our improved algorithm:Keep an ordered list of d
benefitpmarginal benefits bi from previous iteration
a
ab
d
d
pRe-evaluate bi for top-sensor
cb
e
e
sensorIf that removes it from the top pick next sensor
c
e
top, pick next sensorContinue until top remains
h dFaloutsos, LeskovecECML/PKDD 2007
Part 3-30unchanged
91
![Page 94: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/94.jpg)
CMU SCS
Experiments: 2 case studiesExperiments: 2 case studies
W h l ti d tWe have real propagation dataBlog network:
We crawled blogs for 1 yearWe identified cascades – temporal propagation of i f tiinformation
Water distribution network:B ttl f W t S N t k titiBattle of Water Sensor Networks competitionReal city water distribution networksRealistic simulator of water consumption and waterRealistic simulator of water consumption and water flow provided by US Environmental Protection Agency
Faloutsos, LeskovecECML/PKDD 2007
Part 3-31
g y
CMU SCS
Case study 1: Cascades in blogsCase study 1: Cascades in blogs
We crawled 45,000 blogs for 1 yearWe obtained 10 million postsWe obtained 10 million postsAnd identified 350,000 cascades
Faloutsos, LeskovecECML/PKDD 2007
Part 3-32
92
![Page 95: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/95.jpg)
CMU SCS
Blogs: Solution qualityBlogs: Solution quality
Our bound is much tighter13% instead of 66%% %
Our algorithm
Faloutsos, LeskovecECML/PKDD 2007
Part 3-33
CMU SCS
Blogs: Cost of a blog (1)Blogs: Cost of a blog (1)If ll bl t thIf all blogs cost the same algorithm picks large popular blogs:large popular blogs: instapundit.com, michellemalkin.com
But reading largeBut reading large blogs is time consumingSo consider the case where cost of a blog is proportional to theis proportional to the number of posts on the blog
Faloutsos, LeskovecECML/PKDD 2007
Part 3-34
g
93
![Page 96: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/96.jpg)
CMU SCS
Blogs: Cost of a blog (2)Blogs: Cost of a blog (2)B t th l ithBut then algorithm picks lots of small blogs that participateblogs that participate in few cascadesWe pick best
Score π=0.4π=0.3We pick best
solution that interpolates betweeninterpolates between the costsWe can get good
π=0.3
We can get good solutions with few blogs and few posts Each curve represents solutions
Faloutsos, LeskovecECML/PKDD 2007
Part 3-35
g p Each curve represents solutions with the same score
CMU SCS
Blogs: Comparison to heuristicsBlogs: Comparison to heuristics
Use heuristics to select blogsgNumber of in-links and out links of aand out-links of a blog perform bestBut much worse than our algorithmthan our algorithm
Faloutsos, LeskovecECML/PKDD 2007
Part 3-36
94
![Page 97: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/97.jpg)
CMU SCS
Blogs: ScalabilityBlogs: Scalability
Our algorithm that uses lazy yevaluations runs 700 times faster700 times faster than simple greedy algorithmgreedy algorithm
Faloutsos, LeskovecECML/PKDD 2007
Part 3-37
CMU SCS
Case study 2: Water networkCase study 2: Water network
B ttl f W t S N t kBattle for Water Sensor Networks competitionReal metropolitan area water distribution network (largest network optimized):( g p )
V = 21,000 nodesE = 25 000 pipesE = 25,000 pipes
3.6 million different epidemic scenarios (152 GB of simulation data)(152 GB of simulation data)By exploiting sparsity we can fit it into
Faloutsos, LeskovecECML/PKDD 2007
Part 3-38main memory (16GB)
95
![Page 98: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/98.jpg)
CMU SCS
Water network: Solution qualityWater network: Solution quality
Our bound is much tighter
Faloutsos, LeskovecECML/PKDD 2007
Part 3-39
CMU SCS
Water network: Heuristic placementWater network: Heuristic placement
Our algorithms outperforms heuristic node selection
Faloutsos, LeskovecECML/PKDD 2007
Part 3-40selection
96
![Page 99: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/99.jpg)
CMU SCS
Water network: Sensor placementWater network: Sensor placement
Different objective functions give different sensor placementsp
Population affected Detection likelihood
Faloutsos, LeskovecECML/PKDD 2007
Part 3-41
Population affected
CMU SCS
Water network: ScalabilityWater network: Scalability
Our algorithm is 10 times faster than greedy
Faloutsos, LeskovecECML/PKDD 2007
Part 3-42
g y
97
![Page 100: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/100.jpg)
CMU SCS
Results of BWSN [Ostfeld et al]Results of BWSN [Ostfeld et al]Author #non- dominated
(out of 30)(out of 30)Our approach 26Berry et. al. 21
Battle of Water Sensor Networks y
Dorini et. al. 20Wu and Walski 19
competitionMulti criterion Ostfeld et al 14
Propato and Piller 12Eli d l 11
Multi-criterion optimization
Eliades et. al. 11Huang et. al. 7Guan et al 4
[Ostfeld et al ‘07]: count number of Guan et. al. 4
Ghimire and Barkdoll 3Trachtman 2
count number of non-dominated solutions
Faloutsos, LeskovecECML/PKDD 2007
Part 3-43Gueli 2Preis and Ostfeld 1
solutions
CMU SCS
Other resultsOther results
We also consider Fractional selection of the blogsgGeneralization to future unseen cascadesMulti criterion optimization when one caresMulti-criterion optimization when one cares about more than one optimization function
And show that triggering model of Kempe et al is a special case of out network poutbreak detection problem
Faloutsos, LeskovecECML/PKDD 2007
Part 3-44
98
![Page 101: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/101.jpg)
CMU SCS
Conclusion (1)Conclusion (1)
We presented a general methodology for selecting nodes to detect outbreaks in gnetworksFor non unit cost case we developed anFor non-unit cost case we developed an algorithm that achieves ½(1-1/e) of optimal
l isolutionWe developed a new online bound that inWe developed a new online bound that in practice is very tight
Faloutsos, LeskovecECML/PKDD 2007
Part 3-45
CMU SCS
Conclusion (2)Conclusion (2)
We demonstrate how lazy evaluation drastically speeds up our algorithmy p p gWe show that our methodology can easily be extended to specific problem domainsbe extended to specific problem domainsWe evaluated our approach on two large real world application domains with many domain specific questionsdomain specific questions
Faloutsos, LeskovecECML/PKDD 2007
Part 3-46
99
![Page 102: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/102.jpg)
CMU SCS
Part 3.3: Web projections
CMU SCS
MotivationMotivationI f ti t i l t diti ll id dInformation retrieval traditionally considered documents as independentWeb retrieval incorporates global hyperlinkWeb retrieval incorporates global hyperlink relationships to enhance ranking (e.g., PageRank, HITS)g )
Operates on the entire graphUses just one feature (principal eigenvector) of the graphthe graph
Our work on Web projections focuses oncontextual subsets of the web graph; in-betweencontextual subsets of the web graph; in between the independent and global consideration of the documentsa rich set of graph theoretic properties
Faloutsos, LeskovecECML/PKDD 2007
Part 3-48
a rich set of graph theoretic properties
100
![Page 103: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/103.jpg)
CMU SCS
Web projectionsWeb projections
Web projections: How they work?Project a set of web pages of interest onto the web graphThis creates a subgraph of the web called
j ti hprojection graphUse the graph-theoretic properties of the subgraph for tasks of interestfor tasks of interest
Query projectionsQuery results give the context (set of web pages)Use characteristics of the resulting graphs for
di ti b t h lit d b h iFaloutsos, LeskovecECML/PKDD 2007
Part 3-49predictions about search quality and user behavior
CMU SCS
Query projectionsQuery Results Projection on the web graph
Query projections
Q• -- -- ----• --- --- ----• ------ ---• ----- --- --• ------ -----• ------ -----
Query projection graph Query connection graph
Generate graphical
Faloutsos, LeskovecECML/PKDD 2007
Part 3-50features Construct
case libraryPredictions
101
![Page 104: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/104.jpg)
CMU SCS
Questions we exploreQuestions we explore
H d h lt j tHow do query search results project onto the underlying web graph?Can we predict the quality of search results from the projection on the web p jgraph?Can we predict users’ behaviors withCan we predict users behaviors with issuing and reformulating queries?
Faloutsos, LeskovecECML/PKDD 2007
Part 3-51
CMU SCS
Is this a good set of search results?Is this a good set of search results?
Faloutsos, LeskovecECML/PKDD 2007
Part 3-52
102
![Page 105: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/105.jpg)
CMU SCS
Will the user reformulate the query?Will the user reformulate the query?
Faloutsos, LeskovecECML/PKDD 2007
Part 3-53
CMU SCS
Resources and conceptsResources and conceptsWeb as a graphWeb as a graph
URL graph:Nodes are web pages, edges are hyper-linksD f M h 2006Data from March 2006 Graph: 22 million nodes, 355 million edges
Domain graph:g pNodes are domains (cmu.edu, bbc.co.uk). Directed edge (u,v) if there exists a webpage at domain u pointing to vData from February 2006 yGraph: 40 million nodes, 720 million edges
Contextual subgraphs for queriesProjection graphConnection graph
Compute graph theoretic featuresFaloutsos, LeskovecECML/PKDD 2007
Part 3-54
Compute graph-theoretic features
103
![Page 106: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/106.jpg)
CMU SCS
“Projection” graphProjection graph
Example query: Subaru
Project top 20 results byProject top 20 results by the search engineNumber in the node denotes the searchdenotes the search engine rankColor indicates relevancy yas assigned by human:
PerfectExcellentExcellentGoodFairPoor
Faloutsos, LeskovecECML/PKDD 2007
Part 3-55
PoorIrrelevant
CMU SCS
“Connection” graphConnection graphProjection graph is generally disconnectedFi d t dFind connector nodesConnector nodes are existing nodes that areexisting nodes that are not part of the original result setesu t setIdeally, we would like to introduce fewest possible pnodes to make projection graph connected Connector
nodesFaloutsos, LeskovecECML/PKDD 2007
Part 3-56
nodesProjection nodes
104
![Page 107: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/107.jpg)
CMU SCS
Finding connector nodesFinding connector nodesFind connector nodes is a Steiner tree problem whichFind connector nodes is a Steiner tree problem which is NP hardOur heuristic:
Connect 2nd largest connected component via shortest path toConnect 2nd largest connected component via shortest path to the largestThis makes a new largest componentRepeat until the graph is connectedRepeat until the graph is connected
2nd largest component
Largest component
Faloutsos, LeskovecECML/PKDD 2007
Part 3-572nd largest component
CMU SCS
Extracting graph featuresExtracting graph featuresTh idThe idea
Find features that describe the str ct re of the graphstructure of the graphThen use the features for machine learningmachine learning
Want features that describeC ti it f th h
vs.Connectivity of the graphCentrality of projection and connector nodesconnector nodesClustering and density of the core of the graph
Faloutsos, LeskovecECML/PKDD 2007
Part 3-58
core of the graph
105
![Page 108: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/108.jpg)
CMU SCS
Examples of graph featuresExamples of graph featuresProjection graphj g p
Number of nodes/edgesNumber of connected componentsSize and density of the largest connected componentN b f t i d i th hNumber of triads in the graph
Connection graphN b f t d
vs.Number of connector nodesMaximal connector node degreeMean path length betweenMean path length between projection/connector nodesTriads on connector nodes
Faloutsos, LeskovecECML/PKDD 2007
Part 3-59
Triads on connector nodesWe consider 55 features total
CMU SCS
Experimental setupQuery Results Projection on the web graph
Experimental setup
Q• -- -- ----• --- --- ----• ------ ---• ----- --- --• ------ -----• ------ -----
Query projection graph Query connection graph
Generate graphical
Faloutsos, LeskovecECML/PKDD 2007
Part 3-60features Construct
case libraryPredictions
106
![Page 109: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/109.jpg)
CMU SCS
Constructing case library forConstructing case library for machine learningg
Given a task of interestGenerate contextual subgraph and extract featuresextract featuresEach graph is labeled by target outcomeLearn statistical model that relates the features with the outcomeMake prediction on unseen graphs
Faloutsos, LeskovecECML/PKDD 2007
Part 3-61
CMU SCS
Experiments overviewExperiments overviewGi t f h lt tGiven a set of search results generate projection and connection graphs and their featuresfeatures
Predict quality of a search result set Discriminate top20 vs. top40to60 resultsPredict rating of highest rated document in the set
Predict user behaviorPredict user behaviorPredict queries with high vs. low reformulation probabilityp yPredict query transition (generalization vs. specialization)Predict direction of the transition
Faloutsos, LeskovecECML/PKDD 2007
Part 3-62
Predict direction of the transition
107
![Page 110: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/110.jpg)
CMU SCS
Experimental detailsExperimental detailsF tFeatures
55 graphical featuresNote we use only graph features no contentNote we use only graph features, no content
LearningWe use probabilistic decision trees (“DNet”)We use probabilistic decision trees ( DNet )
Report classification accuracy using 10-fold cross validation Compare against 2 baselines
Marginals: Predict most common classRankNet: use 350 traditional features (document, anchor text, and basic hyperlink features)
Faloutsos, LeskovecECML/PKDD 2007
Part 3-63
CMU SCS
Search results qualitySearch results quality
D t tDataset:30,000 queriesTop 20 results for each
Each result is labeled by a human judge using a 6-point scale from "Perfect" to "Bad"
Task:Predict the highest rating in the set of results
6-class problem2-class problem: “Good” (top 3 ratings) vs. “P ” (b tt 3 ti )
Faloutsos, LeskovecECML/PKDD 2007
Part 3-64“Poor” (bottom 3 ratings)
108
![Page 111: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/111.jpg)
CMU SCS
Search quality: the taskSearch quality: the task
Predict the rating of the top result in the set
Predict “Good” Predict “Poor”Faloutsos, LeskovecECML/PKDD 2007
Part 3-65
Predict Good Predict Poor
CMU SCS
Search quality: resultsSearch quality: resultsP di t t h ti iPredict top human rating in the set
Binary classification: GoodAttributes URL
GraphDomain Graph
Binary classification: Good vs. Poor
10-fold cross validation classification accuracy
Marginals 0.55 0.55
RankNet 0.63 0.60classification accuracy
Observations:
RankNet 0.63 0.60
Projection 0.80 0.64
Web Projections outperform both baseline methodsJust projection graph already
Connection 0.79 0.66
Projection + 0 82 0 69Just projection graph already performs quite wellProjections on the URL graph perform better
Connection 0.82 0.69
All 0.83 0.71
Faloutsos, LeskovecECML/PKDD 2007
Part 3-66
perform better
109
![Page 112: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/112.jpg)
CMU SCS
Search quality: the modelSearch quality: the modelThe learned model shows graph properties of good result setsresult setsGood result sets have:
Search result nodes areSearch result nodes are hub nodes in the graph (have large degrees)S ll dSmall connector node degreesBig connected componentg pFew isolated nodes in projection graphF t d
Faloutsos, LeskovecECML/PKDD 2007
Part 3-67Few connector nodes
CMU SCS
Predict user behaviorPredict user behavior
DatasetQuery logs for 6 weeks35 million unique queries, 80 million total query reformulationsWe only take queries that occur at least 10We only take queries that occur at least 10 timesThis gives us 50,000 queries and 120,000 g , q ,query reformulations
TaskPredict whether the query is going to be reformulated
Faloutsos, LeskovecECML/PKDD 2007
Part 3-68
110
![Page 113: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/113.jpg)
CMU SCS
Query reformulation: the taskQuery reformulation: the taskGi d di j ti dGiven a query and corresponding projection and connection graphsPredict whether query is likely to be reformulated
Q t lik l t b f l t dFaloutsos, LeskovecECML/PKDD 2007
Part 3-69
Query not likely to be reformulated Query likely to be reformulated
CMU SCS
Query reformulation: resultsQuery reformulation: resultsOb tiObservations:
Gradual improvement as using more features
Attributes URL Graph
Domain Graph
as using more featuresUsing Connection graph features helps
Marginals 0.54 0.54g p pURL graph gives better performance
Projection 0.59 0.58
Connection 0.63 0.59We can also predict type of reformulation ( i li ti
Projection + Connection 0.63 0.60
(specialization vs. generalization) with 0 80 accuracy
Connection
All 0.71 0.67
Faloutsos, LeskovecECML/PKDD 2007
Part 3-70
0.80 accuracy
111
![Page 114: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/114.jpg)
CMU SCS
Query reformulation: the modelQuery reformulation: the model
Queries likely to be reformulated have:
Search result nodes have low degreeConnector nodes are hubsM t dMany connector nodesResults came from many different domainsmany different domainsResults are sparsely knit
Faloutsos, LeskovecECML/PKDD 2007
Part 3-71
knit
CMU SCS
Query transitionsQuery transitions
Predict if and how will user transform the queryq y
transition
Q: Strawberry Q: Strawberry shortcake
Faloutsos, LeskovecECML/PKDD 2007
Part 3-72
Q: Strawberry shortcake
Q: Strawberry shortcake pictures
112
![Page 115: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/115.jpg)
CMU SCS
Query transitionQuery transition
With 75% accuracy we can say whether a query is likely to be reformulated: q y y
Def: Likely reformulated p(reformulated) > 0.6With 87% accuracy we can predictWith 87% accuracy we can predict whether observed transition is specialization or generalizationWith 76% we can predict whether the userWith 76% we can predict whether the user will specialize or generalize
Faloutsos, LeskovecECML/PKDD 2007
Part 3-73
CMU SCS
ConclusionConclusionW i t d d W b j tiWe introduced Web projections
A general approach of using context-sensitive sets of web pages to focus attention on relevant subsetof web pages to focus attention on relevant subsetof the web graphAnd then using rich graph-theoretic features of the g g psubgraph as input to statistical models to learn predictive models
We demonstrated Web projections using search result graphs for
P di ti lt t litPredicting result set qualityPredicting user behavior when reformulating queries
Faloutsos, LeskovecECML/PKDD 2007
Part 3-74
queries
113
![Page 116: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/116.jpg)
CMU SCS
Part 3 4: Center piecePart 3.4: Center piece subgraphsg p
CMU SCS
MasterMind ‘CePS’MasterMind – CePS
w/ Hanghang Tong, KDD 2006htong <at> cs.cmu.edu
Faloutsos, LeskovecECML/PKDD 2007
Part 3-76
114
![Page 117: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/117.jpg)
CMU SCS
Center Piece Subgraph(Ceps)Center-Piece Subgraph(Ceps)Gi Q dGiven Q query nodesFind Center-piece ( )b≤p ( )
App.App.Social NetworksLaw Inforcement BLaw Inforcement, …
Id
B
Idea:Proximity -> random walk A C
Faloutsos, LeskovecECML/PKDD 2007
Part 3-77with restarts
CMU SCS
C St d ANDCase Study: AND query
R A l Ji i HR. Agrawal Jiawei Han
V. Vapnik M. Jordan
Faloutsos, LeskovecECML/PKDD 2007
Part 3-78
115
![Page 118: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/118.jpg)
CMU SCS
C St d ANDCase Study: AND query
Faloutsos, LeskovecECML/PKDD 2007
Part 3-79
CMU SCS databases
ML/StatisticML/Statistics
2_SoftAndFaloutsos, LeskovecECML/PKDD 2007
Part 3-80
_query
116
![Page 119: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/119.jpg)
CMU SCS
DetailsDetails
Main idea: use random walk with restarts, to measure ‘proximity’ p(i,j) of node j to p y p( ,j) jnode i
Faloutsos, LeskovecECML/PKDD 2007
Part 3-81
CMU SCS
ExampleExample
5
P b (RW ill fi ll t t j)
411
12•Starting from 1
Prob (RW will finally stay at j)
310 13
g
•Randomly to neighbor
2 6
•Some p to return to 1
1 789
Faloutsos, LeskovecECML/PKDD 2007
Part 3-82
89
117
![Page 120: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/120.jpg)
CMU SCS
Individual Score CalculationIndividual Score Calculation
Q1 Q2 Q3Node 1 0.5767 0.0088
0.0088
5
Node 2Node 3Node 4Node 5
0.00880.1235 0.0076
0.00760 0283 0 0283
11 124
0.00760.00240.0333
Node 5Node 6Node 7Node 8
0.0283 0.0283 0.0283
0.0076 0.12350.0076
10 133
0.1260
0.1235
0.02830.0024
0.0076
Node 9Node 10Node 11Node 12
0.0088 0.5767 0.0088
0.0076 0.0076 0 1235
19 8
62
0.1260 0.0333
0 0088
7
Node 12Node 13
0.12350.0088 0.0088
0.57670.0333 0.0024
9 80.5767 0.0088
Faloutsos, LeskovecECML/PKDD 2007
Part 3-830.1260
0.1260 0.0024 0.0333
0 1260 0 0333
CMU SCS
Individual Score CalculationIndividual Score Calculation
Q1 Q2 Q3Node 1 0.5767 0.0088
0.0088
5
Node 2Node 3Node 4Node 5
0.00880.1235 0.0076
0.00760 0283 0 0283
11 124
0.00760.00240.0333
Node 5Node 6Node 7Node 8
0.0283 0.0283 0.0283
0.0076 0.12350.0076
10 133
0.1260
0.1235
0.02830.0024
0.0076
Node 9Node 10Node 11Node 12
0.0088 0.5767 0.0088
0.0076 0.0076 0 1235
19 8
62
0.1260 0.0333
0 0088
7
Node 12Node 13
0.12350.0088 0.0088
0.57670.0333 0.0024 Individual Score matrix
9 80.5767 0.0088
Faloutsos, LeskovecECML/PKDD 2007
Part 3-840.1260
0.1260 0.0024 0.0333
0 1260 0 0333
Individual Score matrix
118
![Page 121: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/121.jpg)
CMU SCS
AND: Combining ScoresAND: Combining Scores
Q: How to combine scores?A: Multiply…= prob. 3 random… prob. 3 random particles coincide on node jnode j
Faloutsos, LeskovecECML/PKDD 2007
Part 3-85
CMU SCS
K SoftAnd: Combining ScoresdetailsK_SoftAnd: Combining Scores
Generalization –SoftAND:
We want nodes close to k of Q (k<Q) query ( ) ynodes.
Q: How to do that?Q: How to do that?
Faloutsos, LeskovecECML/PKDD 2007
Part 3-86
119
![Page 122: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/122.jpg)
CMU SCS
K SoftAnd: Combining ScoresdetailsK_SoftAnd: Combining Scores
Generalization –softAND:
We want nodes close to k of Q (k<Q) query ( ) ynodes.
Q: How to do that?Q: How to do that? A: Prob(at least k-out-
of-Q will meet eachof-Q will meet each other at j)
Faloutsos, LeskovecECML/PKDD 2007
Part 3-87
CMU SCS
AND query vs K SoftAnd queryAND query vs. K_SoftAnd query
x 1e-4 5
0.4505
11 124
0.07100.10100.1010
10 133
0.1010
0.0710
0.22670.1010
0.0710
1 7
62
0.1010 0.1010
And Query 2_SoftAnd Query9 8
0.4505 0.4505
Faloutsos, LeskovecECML/PKDD 2007
Part 3-88
120
![Page 123: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/123.jpg)
CMU SCS
1 SoftAnd query = OR query0 0103
1_SoftAnd query = OR query
5
0.0103
11 124
0.13870.16170.1617
10 133
0.1617
0.08490.1617
3
62
0.1387 0.1387
1 7
9 80 0103
0.1617 0.1617
0.0103
Faloutsos, LeskovecECML/PKDD 2007
Part 3-89
0.0103 0.0103
CMU SCS
Challenges in CepsChallenges in Ceps
Q1: How to measure the importance?A: RWRA: RWR
Q2: How to do it efficiently?y
Faloutsos, LeskovecECML/PKDD 2007
Part 3-90
121
![Page 124: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/124.jpg)
CMU SCS
Graph Partition: Efficiency IssueGraph Partition: Efficiency Issue
St i htf dStraightforward waysolve a linear system: ti li t # ftime: linear to # of edges
ObservationSkewed distSkewed dist.communities
How to exploit them?Graph partition
Faloutsos, LeskovecECML/PKDD 2007
Part 3-91
p p
CMU SCS
Even better:Even better:
We can correct for the deleted edges (Tong+, ICDM’06, best paper award)( g , , p p )
Faloutsos, LeskovecECML/PKDD 2007
Part 3-92
122
![Page 125: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/125.jpg)
CMU SCS
Experimental SetupExperimental Setup
DatasetDBLP/authorshipDBLP/authorshipAuthor-Paper315k nodes1 8M edges1.8M edges
Faloutsos, LeskovecECML/PKDD 2007
Part 3-93
CMU SCS
Query Time vs. Pre-Computation Time
Log Query Time
•Quality: 90%+•Quality: 90%+ •On-line:
•Up to 150x speedupPre computation:
L P t ti Ti
•Pre-computation: •Two orders saving
Log Pre-computation Time
Faloutsos, LeskovecECML/PKDD 2007
Part 3-94
123
![Page 126: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/126.jpg)
CMU SCS
Query Time vs StorageQuery Time vs. Storage
Log Query Time
•Quality: 90%+•Quality: 90%+ •On-line:
•Up to 150x speedupPre storage:
L St
•Pre-storage: •Three orders saving
Log Storage
Faloutsos, LeskovecECML/PKDD 2007
Part 3-95
CMU SCS
ConclusionsB
Conclusions
Q1 H t th i t ?A C
Q1:How to measure the importance?A1: RWR+K SoftAnd_(Q2: How to find connection subgraph?A2:”Extract” Alg )A2: Extract Alg.)Q3:How to do it efficiently?A3:Graph Partition and Sherman-Morrison
~90% quality90% quality6:1 speedup; 150x speedup (ICDM’06, b p award)
Faloutsos, LeskovecECML/PKDD 2007
Part 3-96
b.p. award)
124
![Page 127: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/127.jpg)
CMU SCS
Part 3 5: FraudPart 3.5: Fraud detection on e-bayy
CMU SCS
E bay Fraud detectionE-bay Fraud detection
/ P l Ch &w/ Polo Chau &Shashank Pandit, CMU
‘non-delivery’ fraud:yseller takes $$ and disappears
Faloutsos, LeskovecECML/PKDD 2007
Part 3-98
and disappears
125
![Page 128: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/128.jpg)
CMU SCS
Idea: ‘Accomplices’, and Belief Propagation
3 t f d h t f d3 types of nodes: honest, fraud, accomplices‘accomplices’ never do fraud – they just give high ratings to fraudsters-to-beg g g
B.P.:If I am honest my neighbors are eitherIf I am honest, my neighbors are either honest or ‘accomplices’If I’m an accomplice, my neighbors are either honest or fraud
Faloutsos, LeskovecECML/PKDD 2007
Part 3-99...
CMU SCS
E bay Fraud detection NetProbeE-bay Fraud detection - NetProbe
Faloutsos, LeskovecECML/PKDD 2007
Part 3-100
126
![Page 129: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/129.jpg)
CMU SCS
ConclusionsConclusions
MasterMind -> CePS and random walkscommunities over time -> tensorscommunities over time > tensorsvirus propagation -> eigenvaluefraud detection -> belief propagation
Faloutsos, LeskovecECML/PKDD 2007
Part 3-101
CMU SCS
ReferencesReferences
Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsosg,NetProbe: A Fast and Scalable System for Fraud Detection in Online AuctionFraud Detection in Online Auction Networks, WWW 2007.
Faloutsos, LeskovecECML/PKDD 2007
Part 3-102
127
![Page 130: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/130.jpg)
CMU SCS
ReferencesReferences
Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan Fast Random Walk with Restart and Its Applications ICDM 2006, Hong Kong. Hanghang Tong, Christos Faloutsos Center-g g gPiece Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA
Faloutsos, LeskovecECML/PKDD 2007
Part 3-103
CMU SCS
ReferencesReferencesJ L k J Kl i b d Ch i tJure Leskovec, Jon Kleinberg and Christos Faloutsos Graphs over Time: Densification Laws Shrinking Diameters and PossibleLaws, Shrinking Diameters and Possible Explanations KDD 2005, Chicago, IL. ("Best Research Paper" award)Research Paper award). Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg Christos Faloutsos RealisticKleinberg, Christos Faloutsos Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, g p(ECML/PKDD 2005)
Faloutsos, LeskovecECML/PKDD 2007
Part 3-104
128
![Page 131: presented by September 21, 2007 Warsaw, Poland 2007... · 2007. 8. 10. · CMU SCS Mining Large Graphs:Mining Large Graphs: Laws and toolsLaws and tools Christos Faloutsos and Jure](https://reader036.vdocument.in/reader036/viewer/2022081619/60ebd44cdff1dd4ee844ca37/html5/thumbnails/131.jpg)
CMU SCS
ReferencesReferencesJi S D h T Ch i t F l tJimeng Sun, Dacheng Tao, Christos Faloutsos Beyond Streams and Graphs: Dynamic Tensor Analysis KDD 2006 Philadelphia PAAnalysis, KDD 2006, Philadelphia, PA Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos Less is More: Compact MatrixFaloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM, Minneapolis Minnesota Apr 2007Minneapolis, Minnesota, Apr 2007. Web Projections: Learning from Contextual Subgraphs of the Web by Jure LeskovecSubgraphs of the Web, by Jure Leskovec, Susan Dumais, Eric Horvitz, WWW 2007
Faloutsos, LeskovecECML/PKDD 2007
Part 3-105
129