cs224w: social and information network analysis jure...
TRANSCRIPT
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Today 6:00-7:30pm: SNAP.PY recitation, Nvidia Auditorium 9/26 4:15-5:45pm: Review of Probability, Gates B01
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2
Properties
Small diameter, Edge clustering
Patterns of signed edge creation
Viral Marketing, Blogosphere, Memetracking
Scale-Free
Densification power law, Shrinking diameters
Strength of weak ties, Core-periphery
Models
Erdös-Renyi model, Small-world model
Structural balance, Theory of status
Independent cascade model, Game theoretic model
Preferential attachment, Copying model
Microscopic model of evolving networks
Kronecker Graphs
Algorithms
Decentralized search
Models for predicting edge signs
Influence maximization, Outbreak detection, LIM
PageRank, Hubs and authorities
Link prediction, Supervised random walks
Community detection: Girvan-Newman, Modularity
For example, last time we talked about Observations and Models for the Web graph: 1) We took a real system: the Web 2) We represented it as a directed graph 3) We used the language of graph theory Strongly Connected Components
4) We designed a computational experiment: Find In- and Out-components of a given node v
5) We learned something about the structure of the Web: BOWTIE!
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 3
v
Out(v)
Undirected graphs Links: undirected
(symmetrical, reciprocal relations)
Undirected links: Collaborations Friendship on Facebook
Directed graphs Links: directed
(asymmetrical relations)
Directed links: Phone calls Following on Twitter
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 4
A
B
D
C
L
M F
G
H
I
A G
F
B C
D
E
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 5
Aij = 1 if there is a link from node i to node j
Aij = 0 otherwise
=
0111100010011010
A
=
0110000000011000
A
Note that for a directed graph (right) the matrix is not symmetric.
4
1 2 3
4
1 2 3
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 6
Und
irect
ed
A
Node degree, ki: the number of edges adjacent to node i
Dire
cted
A G
F
B C
D
E
In directed networks we define an in-degree and out-degree. The (total) degree of a node is the sum of in- and out-degrees.
2=inCk 1=out
Ck 3=Ck
outin kk =
Avg. degree:
Source: Node with kin = 0 Sink: Node with kout = 0 N
Ek =
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 7
2)1(
2max−
=
=
NNNE
The maximum number of edges in an undirected graph on N nodes is
An undirected graph with the number of edges E = Emax is called a complete graph, and its average degree is N-1
Most real-world networks are sparse E << Emax (or k << N-1)
WWW (Stanford-Berkeley): N=319,717 ⟨k⟩=9.65 Social networks (LinkedIn): N=6,946,668 ⟨k⟩=8.87 Communication (MSN IM): N=242,720,596 ⟨k⟩=11.1 Coauthorships (DBLP): N=317,080 ⟨k⟩=6.62 Internet (AS-Skitter): N=1,719,037 ⟨k⟩=14.91 Roads (California): N=1,957,027 ⟨k⟩=2.82 Proteins (S. Cerevisiae): N=1,870 ⟨k⟩=2.39 (Source: Leskovec et al., Internet Mathematics, 2009)
Consequence: Adjacency matrix is filled with zeros! (Density of the matrix (E/N2): WWW=1.51×10-5, MSN IM = 2.27×10-8)
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 8
Unweighted (undirected)
Weighted (undirected)
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 9
3
1
4
2
Aij =
0 1 1 01 0 1 11 1 0 00 1 0 0
3
1
4
2
Aij =
0 2 0.5 02 0 1 4
0.5 1 0 00 4 0 0
NEkAnonzeroE
AAAN
jiij
jiijii
2)(21
0
1,==
==
∑=
Examples: Friendship, Hyperlink Examples: Collaboration, Internet, Roads
Self-edges (self-loops) (undirected)
Multigraph (undirected)
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 10
3
1
4
2 3
1
4
2
Aij =
1 1 1 01 0 1 11 1 0 00 1 0 1
∑ ∑≠= =
+=
=≠N
jiji
N
iiiij
jiijii
AAE
AAA
,1, 121
0
NEkAnonzeroE
AAAN
jiij
jiijii
2)(21
0
1,==
==
∑=
Aij =
0 2 1 02 0 1 31 1 0 00 3 0 0
Examples: Proteins, Hyperlinks Examples: Communication, Collaboration
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 11
WWW >
Facebook friendships >
Citation networks >
Collaboration networks >
Mobile phone calls >
Protein Interactions >
> directed multigraph with self-edges > undirected, unweighted > unweighted, directed, acyclic > undirected multigraph or weighted graph > directed, (weighted?) multigraph > undirected, unweighted with self-interactions
Bipartite graph is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to one in V; that is, U and V are independent sets
Examples: Authors-to-papers (they authored) Actors-to-Movies (they appeared in) Users-to-Movies (they rated)
“Folded” networks: Author collaboration networks Movie co-rating networks
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 12
U V
A
B
C
D
E
A B
C
D
E
Folded version of the graph above
Degree distribution P(k): Probability that a randomly chosen node has degree k Nk = # nodes with degree k
Normalized histogram: P(k) = Nk / N ➔ plot
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 14
k
P(k)
1 2 3 4
0.1 0.2 0.3 0.4 0.5 0.6
k
Nk
A path is a sequence of nodes in which each node is linked to the next one
Path can intersect itself and pass through the same edge multiple times E.g.: ACBDCDEG In a directed graph a path
can only follow the direction of the “arrow”
Pn = {i0,i1,i2,...,in}
Pn = {(i0 ,i1),(i1,i2 ),( i2 ,i3 ),...,( in−1,in )}
C
A B
D E
H
F
G
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 15
Distance (shortest path, geodesic) between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes *If the two nodes are disconnected, the
distance is usually defined as infinite
In directed graphs paths need to follow the direction of the arrows
Consequence: Distance is not symmetric: hA,C ≠ hC, A
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 17
B
A
D
C
B
A
D
C
hB,D = 2
hB,C = 1, hC,B = 2
Diameter: the maximum (shortest path) distance between any pair of nodes in a graph
Average path length for a connected graph (component) or a strongly connected (component of a) directed graph
Many times we compute the average only over the
connected pairs of nodes (that is, we ignore “infinite” length paths)
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 18
∑≠
=iji
ijhE
h,max2
1where hij is the distance from node i to node j
Clustering coefficient: What portion of i’s neighbors are connected? Node i with degree ki Ci ∈ [0,1]
Average clustering coefficient:
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 20
Ci=0 Ci=1/3 Ci=1
i i i
where ei is the number of edges between the neighbors of node i
∑=N
iiC
NC 1
Clustering coefficient: What portion of i’s neighbors are connected? Node i with degree ki
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 21
where ei is the number of edges between the neighbors of node i
C
A B
D E
H
F
G
kB=2, eB=1, CB=2/2 = 1
kD=4, eD=2, CD=4/12 = 1/3
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 22
Degree distribution: P(k)
Path length: h
Clustering coefficient: C
MSN Messenger activity in June 2006: 245 million users logged in 180 million users engaged in
conversations More than 30 billion
conversations More than 255 billion
exchanged messages
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 24
25 9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
Network: 180M people, 1.3B edges 26 9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
Contact Conversation Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 9/25/2014 27
Messaging as an undirected graph • Edge (u,v) if users u and v
exchanged at least 1 msg • N=180 million people • E=1.3 billion edges
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 28
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 29
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 30
Note: We plotted the same data as on the previous slide, just the axes are now logarithmic.
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 31
∑=
=kki
ik
ki
CN
C:
1Ck: average Ci of nodes i of degree k:
Avg. clustering of the MSN: C = 0.1140
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 32
Number of links between pairs of
nodes
Avg. path length 6.6 90% of the nodes can be reached in < 8 hops
Steps #Nodes 0 1
1 10
2 78
3 3,96
4 8,648
5 3,299,252
6 28,395,849
7 79,059,497
8 52,995,778
9 10,321,008
10 1,955,007
11 518,410
12 149,945
13 44,616
14 13,740
15 4,476
16 1,542
17 536
18 167
19 71
20 29
21 16
22 10
23 3
24 2
25 3
# no
des
as w
e do
BFS
out
of a
rand
om n
ode
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 33
Degree distribution:
Path length: 6.6
Clustering coefficient: 0.11
Heavily skewed avg. degree= 14.4
Are these values “expected”? Are they “surprising”?
To answer this we need a null-model!
P(k) = δ(k-4) ki =4 for all nodes 𝐶𝐶 = 1
𝑁𝑁(12
(𝑁𝑁 − 4) + 2 + 2 23) = ½ as N → ∞
Path length: ℎ𝑚𝑚𝑚𝑚𝑚𝑚 = 𝑁𝑁−12
= 𝑂𝑂(𝑁𝑁)
Avg. shortest path-length: ℎ� < 2𝑁𝑁 𝑁𝑁−1
𝑁𝑁−12
𝑁𝑁(𝑁𝑁−1)2
= 𝑂𝑂(𝑁𝑁)
So, we have: Constant degree, Constant avg. clustering coeff. Linear avg. path-length
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 34
Note about calculations: We are interested in quantities as graphs get large (N→∞)
We will use big-O: f(x) = O(g(x)) as x→∞ if f(x) < g(x)*c for all x > x0 and some constant c.
P(k) = δ(k-6) k =6 for each inside node
C = 6/15 for inside nodes Path length:
In general, for lattices:
Average path-length is (D… lattice dimensionality)
Constant degree, constant clustering coefficient 9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 35
DNh /1≈
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 37
What did we learn so far?
MSN Network is neither a chain
nor a grid
Erdös-Renyi Random Graphs [Erdös-Renyi, ‘60] Two variants: Gn,p: undirected graph on n nodes and each
edge (u,v) appears i.i.d. with probability p
Gn,m : undirected graph with n nodes, and m uniformly at random picked edges
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 9/25/2014 39
What kinds of networks does such model produce?
n and p do not uniquely determine the graph! The graph is a result of a random process
We can have many different realizations given the same n and p
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 40
n = 10 p= 1/6
How likely is a graph on E edges? P(E): the probability that a given Gnp
generates a graph on exactly E edges:
where Emax=n(n-1)/2 is the maximum possible number of edges in an undirected graph of n nodes
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 41
EEE ppE
EEP −−
= max)1()( max
P(E) is exactly the Binomial distribution >>> Number of successes in a sequence of Emax independent yes/no experiments
What is expected degree of a node? Let Xv be a rnd. var. measuring the degree of node v We want to know:
For the calculation we will need: Linearity of expectation For any random variables Y1,Y2,…,Yk If Y=Y1+Y2+…Yk, then E[Y]= ∑i E[Yi]
An easier way: Decompose Xv to Xv= Xv,1+Xv,2+…+Xv,n-1 where Xv,u is a {0,1}-random variable
which tells if edge (v,u) exists or not
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 42
How to think about this? • Prob. of node u linking to node v is p • u can link (flips a coin) to all other (n-1) nodes • Thus, the expected degree of node u is: p(n-1)
)( ][1
0jXPjXE v
n
jv == ∑
−
=
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 43
Degree distribution: P(k)
Path length: h
Clustering coefficient: C
What are values of these properties for Gnp?
Fact: Degree distribution of Gnp is Binomial. Let P(k) denote a fraction of nodes with
degree k:
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 44
knk ppk
nkP −−−
−= 1)1(
1)(
Select k nodes out of n-1
Probability of having k edges
Probability of missing the rest of the n-1-k edges
)1( −= npk By the law of large numbers, as the network size increases, the distribution becomes increasingly
narrow—we are increasingly confident that the degree of a node is in the vicinity of k.
P(k
)
Mean, variance of a binomial distribution
k
Remember:
Edges in Gnp appear i.i.d. with prob. p
So:
Then:
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 45
nk
nkp
kkkkpCii
ii ≈−
==−
−⋅=
1)1()1(
Clustering coefficient of a random graph is small. For a fixed avg. degree (that is p=1/n), C decreases with the graph size n.
)1(2
−=
ii
ii kk
eC
Number of distinct pairs of neighbors of node i of degree ki Each pair is connected
with prob. p
Where ei is the number of edges between i’s neighbors
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 46
Degree distribution:
Clustering coefficient: C=p=k/n
Path length: next!
knk ppk
nkP −−−
−= 1)1(
1)(
To prove the diameter of a Gnp we define few concepts Define: Random k-Regular graph Assume each node has k spokes (half-edges) k=1:
k=2:
k=3:
Randomly pair them up! Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 9/25/2014 48
Graph is a set of pairs
Graph is a set of cycles
Arbitrarily complicated graphs
Graph G(V, E) has expansion α: if∀ S ⊆ V: # of edges leaving S ≥ α⋅ min(|S|,|V\S|)
Or equivalently:
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 49
|)\||,min(|#min SVS
SleavingedgesVS⊆
=α
S V \ S
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 50
S nodes α·S edges
S’ nodes α·S’ edges
(A big) graph with “good” expansion
|)\||,min(|#min SVS
SleavingedgesVS⊆
=α
Expansion is measure of robustness: To disconnect l nodes, we need to cut ≥ α⋅ l edges
Low expansion:
High expansion:
Social networks: “Communities”
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 51
|)\||,min(|#min SVS
SleavingedgesVS⊆
=α
k-regular graph (every node has degree k): Expansion is at most k (when S is a single node)
Is there a graph on n nodes (n→∞), of fixed max deg. k, so that expansion α remains const?
Examples: n×n grid: k=4: α =2n/(n2/4)→0
(S=n/2 × n/2 square in the center)
Complete binary tree: α →0 for |S|=(n/2)-1
Fact: For a random 3-regular graph on n nodes, there is some const α (α >0, independent. of n) such that w.h.p. the expansion of the graph is ≥ α
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 52
S
S
|)\||,min(|#min SVS
SleavingedgesVS⊆
=α
Fact: In a graph on n nodes with expansion α for all pairs of nodes s and t there is a path of O((log n) / α) edges connecting them.
Proof: Proof strategy: We want to show that from any
node s there is a path of length O((log n)/α) to any other node t
Let Sj be a set of all nodes found within j steps of BFS from s. How does Sj increase as a function of j?
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 53
s
S0
S1
S2
Proof (continued): Let Sj be a set of all nodes found
within j steps of BFS from s. We want to relate Sj and Sj+1
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 54
=+≥+ kS
SS jjj
α1
At most k edges “collide” at a node
Expansion
s
S0
S1
S2
|Sj| nodes
At least α|Sj| edges
|Sj+1| nodes
Each of degree k
Proof (continued): In how many steps of BFS
do we reach >n/2 nodes? Need j so that:
Let’s set: Then:
In 2k/α·log n steps |Sj| grows to Θ(n).
So, the diameter of G is O(log(n)/ α) 9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 55
s 2
1 nk
Sj
j ≥
+=
αt
αnkj 2log
=
221 2
2
log
log
nnk
n
nk
>=≥
+
αα n
nk
k2
2
log
log
21 ≥
+
ααClaim:
Remember n>0, α ≤ k then:
In log(n) steps, we reach >n/2 nodes
In log(n) steps, we reach >n/2 nodes
( )
nnnx
nn
ex
xk
22
2
22
logloglog
loglog11
211 and
: then 0 if
211:k if
>=
+
∞→=→
=+=
αα
α
In j steps, we reach >n/2 nodes
In j steps, we reach >n/2 nodes
⇒ Diameter = 2·j
⇒ Diameter = 2 log(n)
x
x xe
+=
∞→
11lim
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 56
Degree distribution:
Path length: O(log n)
Clustering coefficient: C = p = k / n
knk ppk
nkP −−−
−= 1)1(
1)(
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 57
Degree distribution:
Path length: 6.6 O(log n)
Clustering coefficient: 0.11 k / n ≈ 8·10-8
≈ 8.2
MSN Gnp
Are real networks like random graphs? Giant connected component: Average path length: Clustering Coefficient: Degree Distribution:
Problems with the random networks model: Degreed distribution differs from that of real networks Giant component in most real network does NOT
emerge through a phase transition No local structure – clustering coefficient is too low
Most important: Are real networks random? The answer is simply: NO! 9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 58
If Gnp is wrong, why did we spend time on it? It is the reference model for the rest of the class. It will help us calculate many quantities, that can
then be compared to the real data It will help us understand to what degree is a
particular property the result of some random process
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 59
So, while Gnp is WRONG, it will turn out to be extremly USEFUL!
What happens to Gnp when we vary p?
Remember, expected degree We want E[Xv] be independent of n So let: p=c/(n-1) Observation: If we build random graph Gnp
with p=c/(n-1) we have many isolated nodes Why?
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 61
c
n
nn e
ncpvP −
∞→
−− →
−−=−=
11
11)1(]0 degree has [
c
cx
x
cxn
ne
xxnc −
−−
∞→
⋅−−
∞→
=
−=
−=
−−
11111
1 limlim1
11
−=
nc
xUse substitution e
x
x xe
+=
∞→
11limBy definition:
pnXE v )1(][ −=
How big do we have to make p before we are likely to have no isolated nodes?
We know: P[v has degree 0] = e-c
Event we are asking about is: I = some node is isolated where Iv is the event that v is isolated
We have:
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 62
Nv
vII∈
=
( ) ( )∑∈∈
−=≤
=
Nvv
Nvv
cneIPIPIP Union bound
∑≤i
ii
i AA
Ai
We just learned: P(I) = n e-c
Let’s try: c = ln n then: n e-c = n e-ln n =n⋅1/n= 1 c = 2 ln n then: n e-2 ln n = n⋅1/n2 = 1/n
So if: p = ln n then: P(I) = 1 p = 2 ln n then: P(I) = 1/n → 0 as n→∞
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 63
Graph structure of Gnp as p changes:
Emergence of a Giant Component: avg. degree k=2E/n or p=k/(n-1) k=1-ε: all components are of size Ω(log n) k=1+ε: 1 component of size Ω(n), others have size Ω(log n)
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 64
0 1 p
1/(n-1) Giant component
appears
c/(n-1) Avg. deg const. Lots of isolated
nodes.
log(n)/(n-1) Fewer isolated
nodes.
2*log(n)/(n-1) No isolated nodes.
Empty graph
Complete graph
Gnp, n=100k, p(n-1) = 0.5 … 3
9/25/2014 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 65
Fraction of nodes in the largest component