Damping Effect on PageRank Distribution
IEEE High Performance Extreme Computing, Waltham, MA, USA
September 26, 2018
Tiancheng Liu Yuchen Qian Xi Chen Xiaobai Sun
Department of Computer Science, Duke University, USA
Outline
• Personalized PageRank model:
  invention by Brin and Page (1998)
  in need of innovative extension
• The PageRank model family:
  an analytic apparatus with increased
  description power and scope
• Analysis:
  damping effects on PageRank distributions
• Algorithm:
  exploiting structures of the personalized,
  stochastic Krylov (PSK) space
• Findings:
  by experiments on real-world network data
Sparse graphs in sparse matrix representations
[Figure: a 20-node link graph (nodes x1, ..., x20), with its adjacency matrix and its probability transition matrix shown as 20 × 20 sparsity patterns]
link graph G(V, E): directed edge (u, v) ∈ E
adjacency matrix A: A(v, u) = 1
d_in: in-degrees; d_out: out-degrees
probability transition matrix P: P = A · diag(1 ./ d_out)
factor form in storage
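The construction P = A · diag(1 ./ d_out) can be sketched with NumPy as follows. This is a minimal dense illustration, not the authors' storage scheme: the function name `transition_matrix` and the toy edge list are hypothetical, and the sketch assumes every node has at least one out-link.

```python
import numpy as np

def transition_matrix(edges, n):
    """Return (A, P) for a directed graph on nodes 0..n-1.

    A[v, u] = 1 for each directed edge (u, v); P = A @ diag(1 / d_out).
    """
    A = np.zeros((n, n))
    for u, v in edges:
        A[v, u] = 1.0
    d_out = A.sum(axis=0)              # column sums are the out-degrees
    if np.any(d_out == 0):
        # Assumption of this sketch: no dangling nodes.
        raise ValueError("every node needs at least one out-link")
    P = A / d_out                      # broadcasting divides column u by d_out[u]
    return A, P

# Hypothetical toy graph with 4 nodes.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 0), (3, 2)]
A, P = transition_matrix(edges, 4)
d_in = A.sum(axis=1)                   # row sums are the in-degrees
```

Each column of P sums to 1, so P maps probability vectors to probability vectors. For large graphs one would presumably keep A and d_out separately in sparse form, which is one reading of "factor form in storage".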
Precursor: Personalized PageRank
Web surfing modeled as a random walk on Mα(v), a Markov chain with a personalized term S
$M_\alpha(v) = \alpha P + (1-\alpha) S, \quad S = v e^{T}$
α: damping factor; P: link graph; v: personalized vector; e^T: gathering vector
[Figure: the personalized Markov chain on nodes x1, ..., x20, drawn as α × (link graph) + (1 − α) × (personalized direct links)]
Bernoulli decision at each click:
follow P-links with probability α ∈ (0, 1),
a.k.a. the damping factor, or S-links with probability 1 − α
The personalized term S:
direct links to v-nodes (yellow)
gathering/broadcasting
rank-1, stochastic
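A small numerical illustration of the personalized chain may help. The 3-node matrix P and the vectors below are hypothetical; the sketch only checks that S = v e^T is rank-1 and stochastic, and that one step of the walk mixes a P-step and a jump to v.

```python
import numpy as np

alpha = 0.85
# Hypothetical 3-node column-stochastic link matrix.
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
v = np.array([1.0, 0.0, 0.0])      # personalized vector: all mass on node 0
e = np.ones(3)                     # gathering vector

S = np.outer(v, e)                 # S = v e^T
M = alpha * P + (1 - alpha) * S    # personalized Markov chain

x = np.array([0.2, 0.3, 0.5])      # any probability vector
step = M @ x                       # one step of the random walk
```

Because e^T x = 1 for a probability vector x, the step reduces to α P x + (1 − α) v, so S never has to be formed explicitly.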
[Figure: the same decomposition in matrix form for the 20-node example, with α = 0.85 and 1 − α = 0.15]
Equivalent expressions of PageRank distribution vector
Purpose: multi-aspect investigation for interpretation and computational analysis
1. Steady-state distribution of M_α
   $M_\alpha x = [\alpha P + (1-\alpha) v e^{T}] x = x$
   the power method: $M_\alpha^{k} x_0 \to x$
   asymptotic walk on M_α, memoryless of x_0
2. Solution to a sparse linear system
   $(I - \alpha P) x = (1-\alpha) v$
   many iterative solution methods
3. Explicit representation
   $x = (1-\alpha) \sum_{k \ge 0} \alpha^{k} (P^{k} v)$
   in Neumann series with P, v, α
   cumulative propagation of v on P
4. Differential transition equation
   $\dot{x}(\alpha) = [P (I-\alpha P)^{-1} - (1-\alpha)^{-1} I] x(\alpha)$
   spectrum-based method
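The first three expressions can be compared numerically on a toy problem. The sketch below uses a hypothetical 3-node P and a uniform v; the iteration counts are arbitrary choices that are large enough for α = 0.85.

```python
import numpy as np

alpha = 0.85
# Hypothetical 3-node column-stochastic link matrix and uniform personalization.
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
n = 3
v = np.full(n, 1.0 / n)
M = alpha * P + (1 - alpha) * np.outer(v, np.ones(n))

# 1. Power method on M_alpha, starting from an arbitrary distribution x0.
x_power = np.array([1.0, 0.0, 0.0])
for _ in range(500):
    x_power = M @ x_power

# 2. Linear system (I - alpha P) x = (1 - alpha) v.
x_solve = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)

# 3. Truncated Neumann series (1 - alpha) sum_k alpha^k P^k v.
x_series = np.zeros(n)
term = v.copy()
for k in range(500):
    x_series += (1 - alpha) * alpha**k * term
    term = P @ term
```

All three vectors agree to rounding error on this example, and each sums to 1.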
PageRank model family: characterizing various propagation patterns
Model description in equivalent expressions:
• Propagation kernel functions:
  propagation patterns
• Cumulative propagation on P
• Linear systems
• Differential transitions:
  PageRank distribution response to damping variation
[Figure: a few particular subfamilies of propagation kernel functions. Panels: geometric kernels (Brin-Page); Poisson kernels (Chung); Conway-Maxwell-Poisson kernels (slow); Conway-Maxwell-Poisson kernels (fast); negative binomial kernels; logarithmic kernels]
Propagation kernel functions
Propagation kernel function f_ρ(λ):
$f_\rho(\lambda) = \sum_{k} w_k(\rho) \lambda^{k}$
λ: graph eigenvalue; w_k(ρ): discrete pmf
PageRank vector (model solution) with particular
network P and personalized distribution vector v:
$x = f_\rho(P) v = \sum_{k} w_k(\rho) \cdot P^{k} v$
w_k(ρ): damping on the k-th step; P^k v: k-th step propagation
{w_k(ρ)}: any probability mass function (pmf)
of variable ρ, with or without additional parameters
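The model solution x = Σ_k w_k(ρ) P^k v can be sketched for any pmf by accumulating the propagated vectors one step at a time. The helper name `propagate`, the toy matrix, and the truncation length K below are assumptions for illustration only.

```python
import math
import numpy as np

def propagate(P, v, weights):
    """Return sum_k weights[k] * P^k v using one matrix-vector product per step."""
    x = np.zeros(len(v))
    term = np.array(v, dtype=float)
    for w in weights:
        x += w * term
        term = P @ term
    return x

# Hypothetical 3-node column-stochastic link matrix and uniform personalization.
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
v = np.full(3, 1.0 / 3.0)

K = 200                      # truncation length, an arbitrary choice
alpha, beta = 0.85, 3.0

# Geometric pmf: w_k = (1 - alpha) alpha^k.
geometric = [(1 - alpha) * alpha**k for k in range(K)]

# Poisson pmf: w_k = e^{-beta} beta^k / k!, built by recurrence to avoid large factorials.
poisson, w = [], math.exp(-beta)
for k in range(K):
    poisson.append(w)
    w *= beta / (k + 1)

x_geo = propagate(P, v, geometric)
x_poi = propagate(P, v, poisson)
```

Since the weights sum to 1 (up to truncation) and each P^k v is a probability vector, both results are probability vectors.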
[Figure: PageRank distributions of 3 propagation patterns with P for link graph Twitter(www) [1]. Each panel shows # of nodes (bin counts) against PageRank value on a logarithmic axis, for several values of the damping variable]
[1] H. Kwak et al. (2009)
Propagation pattern kernels: CMP sub-family
Conway-Maxwell-Poisson (CMP):
$w_k(\rho, \nu) = \frac{\rho^{k}}{(k!)^{\nu} Z}$
ρ: damping variable; ν: damping speed; Z: normalization constant
Damping speed parameter ν ≥ 0:
  ν = 0: geometric (Brin-Page, 1998)
  ν = 1: Poisson (Chung, 2007)
  ν < 1: slow decay with k
  ν > 1: fast decay with k
[Figure: slow and fast propagation patterns of the CMP distribution. Panels: slow damping speed, 0 ≤ ν ≤ 1 (ρ = 0.9), including the Brin-Page model and Chung's model; fast damping speed, ν ≥ 1 (ρ = 5)]
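The CMP weights can be evaluated numerically as in the sketch below. The function name `cmp_weights`, the truncation length K, and the log-space evaluation are choices made here for illustration; Z is obtained by summing the truncated series rather than from a closed form.

```python
import math

def cmp_weights(rho, nu, K=200):
    """Truncated CMP weights w_k proportional to rho^k / (k!)^nu, for k = 0..K-1.

    For nu = 0 the series only converges when rho < 1.
    """
    # Work with logarithms so that large k! values do not overflow.
    logs = [k * math.log(rho) - nu * math.lgamma(k + 1) for k in range(K)]
    top = max(logs)
    unnormalized = [math.exp(value - top) for value in logs]
    Z = sum(unnormalized)              # numerical normalization constant
    return [u / Z for u in unnormalized]

geo = cmp_weights(0.5, 0.0)            # nu = 0: geometric weights (1 - rho) rho^k
poi = cmp_weights(2.0, 1.0)            # nu = 1: Poisson weights e^{-rho} rho^k / k!
fast = cmp_weights(5.0, 2.0)           # nu > 1: faster decay with k
```

With ν = 0 and ν = 1 the output reproduces the geometric and Poisson pmfs, which matches the two special cases listed on the slide.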
Propagation pattern kernels: NB sub-family
Negative Binomial (NB), at step k:
$w_k(\rho, r) = \binom{k + r - 1}{k} \rho^{k} (1-\rho)^{r}$
ρ: damping variable; r: distribution shape
Distribution shape parameter r:
  r = 1: geometric distribution
  r → ∞: Poisson distribution, with $r \cdot \frac{\rho}{1-\rho} = \text{const}$
[Figure: propagation patterns of the NB distribution]
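The two limiting cases of the NB kernel can be checked numerically. In this sketch the function name `nb_weights` and the truncation length are hypothetical, and the binomial coefficient is evaluated through log-gamma so that non-integer r is also allowed.

```python
import math

def nb_weights(rho, r, K=200):
    """Return w_k = C(k + r - 1, k) rho^k (1 - rho)^r for k = 0..K-1."""
    weights = []
    for k in range(K):
        log_binom = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        log_w = log_binom + k * math.log(rho) + r * math.log1p(-rho)
        weights.append(math.exp(log_w))
    return weights

# r = 1 gives the geometric pmf (1 - rho) rho^k.
geo = nb_weights(0.85, 1.0)

# Large r with r * rho / (1 - rho) held at lam approaches the Poisson pmf with mean lam.
lam, r = 3.0, 1.0e4
rho = lam / (r + lam)                  # chosen so that r * rho / (1 - rho) == lam
near_poisson = nb_weights(rho, r, K=30)
poisson = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(30)]
```

The quantity r · ρ/(1 − ρ) is the mean of the NB pmf, so holding it constant keeps the propagation weight center fixed while r changes the shape.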
Propagation pattern kernels: logarithmic distribution
Logarithmic, at step k ≥ 1:
$w_k(\rho) = \frac{-1}{\ln(1-\rho)} \frac{\rho^{k}}{k}, \quad \rho \in (0, 1)$
A unique new model in the model family:
  weights decay faster than the geometric distribution
  weights decay slower than the Poisson distribution
  no extra control parameters
[Figure: propagation patterns of logarithmic distributions]
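A short sketch of the logarithmic weights follows. The function name `log_weights` and the truncation length are assumptions; the checks confirm normalization, the closed-form mean, and that successive weights shrink by more than the factor ρ.

```python
import math

def log_weights(rho, K=2000):
    """Return w_k = -rho^k / (k ln(1 - rho)) for k = 1..K (there is no k = 0 term)."""
    c = -1.0 / math.log1p(-rho)
    return [c * rho**k / k for k in range(1, K + 1)]

rho = 0.9
w = log_weights(rho)
total = sum(w)
mean = sum(k * wk for k, wk in enumerate(w, start=1))
closed_form_mean = (-1.0 / math.log1p(-rho)) * rho / (1 - rho)
ratios = [w[i + 1] / w[i] for i in range(50)]   # each equals rho * k / (k + 1) < rho
```

The ratio w_{k+1} / w_k = ρ k / (k + 1) stays below ρ, which is the sense in which the weights decay faster than geometric weights with the same ρ.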
Propagation pattern kernels: precursor models and new model
Precursor models:
  Brin-Page model [1]: geometric distribution
  $w_k(\alpha) = (1-\alpha) \alpha^{k}$
  Chung's model [2]: Poisson distribution
  $w_k(\beta) = e^{-\beta} \frac{\beta^{k}}{k!}$
New model in the family:
  log-γ model: logarithmic distribution
  $w_k(\gamma) = \frac{-1}{\ln(1-\gamma)} \frac{\gamma^{k}}{k}$
[Figure: three plots over steps 0 to 20, one per model]
[1] L. Page and S. Brin, 1998  [2] F. Chung, PNAS, 2007
Cumulative propagation on P
[Figure: link graph P and personalized vector v; propagation on P produces the vectors v, Pv, P^2 v, ..., P^{m-1} v, which each kernel weights and accumulates into a PageRank vector]
geometric kernel (Brin-Page): $x(\alpha) = z_\alpha \sum_{k} \alpha^{k} P^{k} v$
Poisson kernel (Chung): $x(\beta) = z_\beta \sum_{k} \frac{\beta^{k}}{k!} P^{k} v$
logarithmic kernel (log-γ): $x(\gamma) = z_\gamma \sum_{k} \frac{\gamma^{k}}{k} P^{k} v$
Linear systems
Closed-form expression of the coefficient matrix
$A_\rho(P) x = v, \quad A_\rho(P) = f_\rho^{-1}(P)$
Particular instances
Brin-Page model:
$A_\alpha(P) = (1-\alpha)^{-1} (I - \alpha P)$
Chung's model:
$A_\beta(P) = e^{\beta (I - P)}$
log-γ model:
$A_\gamma(P) = \ln(1-\gamma) [\ln(I - \gamma P)]^{-1}$
– Except for the Brin-Page model, explicit formation of the coefficient matrix is not necessary
– This formulation is used for derivation of the
differential transition equation (next)
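The relation A_ρ(P) x = v can be checked numerically for the first two models. The sketch works from the kernel series: the Poisson kernel gives f_β(P) = e^{−β} e^{βP}, so its inverse is e^{β} e^{−βP}, applied here as a truncated series. The toy matrix, truncation lengths, and helper name `series_apply` are assumptions.

```python
import math
import numpy as np

# Hypothetical 3-node column-stochastic link matrix and uniform personalization.
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
v = np.full(3, 1.0 / 3.0)
I = np.eye(3)

def series_apply(coeffs, P, y):
    """Return sum_k coeffs[k] * P^k y."""
    out = np.zeros(len(y))
    term = np.array(y, dtype=float)
    for c in coeffs:
        out += c * term
        term = P @ term
    return out

alpha, beta, K = 0.85, 2.0, 80

# Brin-Page: x = f_alpha(P) v, then check (1 - alpha)^{-1} (I - alpha P) x = v.
x_bp = series_apply([(1 - alpha) * alpha**k for k in range(400)], P, v)
back_bp = (I - alpha * P) @ x_bp / (1 - alpha)

# Chung: x = e^{-beta} e^{beta P} v, then check e^{beta} e^{-beta P} x = v.
def exp_coeffs(t):
    return [t**k / math.factorial(k) for k in range(K)]

x_ch = math.exp(-beta) * series_apply(exp_coeffs(beta), P, v)
back_ch = math.exp(beta) * series_apply(exp_coeffs(-beta), P, x_ch)
```

Both round trips return v to rounding error on this example. Only matrix-vector products with P are used for Chung's model, so the coefficient matrix itself is never formed.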
Differential transition
Effect of damping variation in one model:
node-wise trajectory of the PageRank vector x(ρ)
$\dot{x}(\rho) = \frac{d}{d\rho} x(\rho) = \frac{\partial}{\partial\rho} f_\rho(P) v = Q_\rho(P) x(\rho)$
at any particular value of ρ
Brin-Page model:
$Q_\alpha(P) = P (I - \alpha P)^{-1} - (1-\alpha)^{-1} I$
Chung's model:
$Q = -(I - P)$
log-γ model:
$Q_\gamma(P) = \frac{(1-\gamma)^{-1}}{\ln(1-\gamma)} I - P (I - \gamma P)^{-1} [\ln(I - \gamma P)]^{-1}$
◦ Matrix-vector multiplication for Chung's model
◦ A linear solver may be used once again for the Brin-Page model
• An efficient spectrum-based algorithm for all
models, without eigen-decomposition of P
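The Brin-Page transition equation can be compared with a finite difference on a toy problem. The matrix, the step size h, and the function name `x_bp` below are hypothetical; the product Q_α(P) x is formed with one extra linear solve rather than an explicit inverse.

```python
import numpy as np

# Hypothetical 3-node column-stochastic link matrix and uniform personalization.
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
v = np.full(3, 1.0 / 3.0)
I = np.eye(3)

def x_bp(alpha):
    """Brin-Page PageRank vector from (I - alpha P) x = (1 - alpha) v."""
    return np.linalg.solve(I - alpha * P, (1 - alpha) * v)

alpha, h = 0.85, 1e-6
x = x_bp(alpha)

# Analytical derivative: Q x = P (I - alpha P)^{-1} x - x / (1 - alpha).
xdot_analytic = P @ np.linalg.solve(I - alpha * P, x) - x / (1 - alpha)

# Central finite difference in alpha.
xdot_numeric = (x_bp(alpha + h) - x_bp(alpha - h)) / (2 * h)
```

The entries of the derivative sum to zero, as expected for a trajectory that stays on the probability simplex.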
Inter-model correspondence
statistically similar damping level of propagation on P:
at expected propagation weight center
$\mu(w_k(\rho)) = \sum_{k \in N_w} k \cdot w_k(\rho)$
Brin-Page ←→ Chung's: $\frac{\alpha}{1-\alpha} = \beta$
Brin-Page ←→ log-γ: $\frac{\alpha}{1-\alpha} = \frac{\gamma}{1-\gamma} \cdot \frac{-1}{\ln(1-\gamma)}$
[Figure: pmfs associated with the Brin-Page, Chung's, and log-γ models at corresponding damping variables (α = 0.85, β = 5.66, γ = 0.94)]
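The correspondence can be evaluated directly: β follows in closed form, while γ has to be found numerically. The sketch below uses bisection, which is a choice made here and not necessarily the authors' procedure; the function names are hypothetical. It assumes α > 0.5, since the mean of the logarithmic pmf is never below 1.

```python
import math

def beta_from_alpha(alpha):
    """Match the Poisson mean beta to the geometric mean alpha / (1 - alpha)."""
    return alpha / (1 - alpha)

def log_mean(gamma):
    """Mean of the logarithmic pmf: gamma / ((1 - gamma) * (-ln(1 - gamma)))."""
    return gamma / ((1 - gamma) * -math.log1p(-gamma))

def gamma_from_alpha(alpha, iterations=200):
    """Solve log_mean(gamma) = alpha / (1 - alpha) by bisection on (0, 1)."""
    target = alpha / (1 - alpha)
    lo, hi = 1e-12, 1 - 1e-12
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if log_mean(mid) < target:     # the mean increases with gamma
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta = beta_from_alpha(0.85)
gamma = gamma_from_alpha(0.85)
```

For α = 0.85 this gives β ≈ 5.67 and γ ≈ 0.9415, in line with the values quoted for the figure.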
Intra-model damping effect by KL divergence and its derivative
Aggregated effect of damping variation: KL divergence
of PageRank vectors (scalar)
$KL(x(\rho), x(\rho_o)) = \sum_{i} x_i(\rho) \log \frac{x_i(\rho)}{x_i(\rho_o)}$
$\frac{d}{d\rho} KL(x(\rho), x(\rho_o)) = \dot{x}(\rho)^{T} (\log x(\rho) - \log x(\rho_o) + e)$
[Figure: damping variation in KL and dKL/dρ (Twitter(www), Brin-Page model), for ρ from 0.7 to 1. Curves: KL divergence; analytical derivative; empirical derivatives (with parameter values 0.004 and 0.002)]
* dKL/dρ in red, KL in blue
* the reference damping factor is denoted ρ_o
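The analytical derivative of the KL divergence can be checked against a finite difference for the Brin-Page model. Everything below (toy matrix, α values, step h, function names) is a hypothetical setup; a uniform v keeps all PageRank entries positive so the logarithms are defined.

```python
import numpy as np

# Hypothetical 3-node column-stochastic link matrix and uniform personalization.
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
v = np.full(3, 1.0 / 3.0)
I = np.eye(3)

def pagerank(alpha):
    return np.linalg.solve(I - alpha * P, (1 - alpha) * v)

def kl(x, y):
    return float(np.sum(x * np.log(x / y)))

alpha_ref, alpha, h = 0.85, 0.90, 1e-6
x_ref = pagerank(alpha_ref)
x = pagerank(alpha)

# Trajectory derivative from the Brin-Page transition equation.
xdot = P @ np.linalg.solve(I - alpha * P, x) - x / (1 - alpha)

# Analytical derivative: xdot^T (log x - log x_ref + e).
dkl_analytic = float(xdot @ (np.log(x) - np.log(x_ref) + 1.0))

# Empirical derivative by central difference.
dkl_numeric = (kl(pagerank(alpha + h), x_ref) - kl(pagerank(alpha - h), x_ref)) / (2 * h)
```

The constant term e contributes nothing in exact arithmetic, because the entries of the trajectory derivative sum to zero.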
Personalized, stochastic Krylov space
Personalized, stochastic Krylov (PSK) space:
$PSK(P, v) = \mathrm{span}\{v, Pv, P^{2}v, \dots, P^{k}v, \dots\}, \quad v \ge 0, \; e^{T} v = 1$
Properties:
◦ Any convex combination of the Krylov vectors is
a probability distribution
◦ The same PSK space is shared by all models,
housing all model solutions and their trajectories
◦ The PSK space is of finite dimension m
◦ Let $K = [v, Pv, P^{2}v, \dots, P^{m-1}v]$ and $K = QR$.
There exists a Hessenberg matrix H such that
$PQ = QH$, $Qe_1 = v$, and $g(P)v = Q\, g(H)\, e_1$ for any function g
[Figure: link graph P and personalized vector v, with the Krylov vectors v, Pv, P^2 v, ..., P^{m-1} v]
PageRank vector:
$x(\rho) = f_\rho(P) v \in PSK(P, v)$
PageRank vector trajectory:
$\dot{x}(\rho) = Q_\rho(P) x(\rho) \in PSK(P, v)$
Efficient algorithm for damping effect analysis
intra-model, inter-model damping variations, across all models under consideration
based on the PSK properties, without eigen-decomposition
[Diagram: algorithm pipeline]
P (n × n), v (n × 1)
→ Krylov space construction: Krylov matrix K (n × m)
→ QR decomposition: Q (n × m), R (m × m)
→ PQ = QH: Hessenberg matrix H (m × m)
→ g(P)v = Q g(H) e_1: PageRank distributions {x(ρ)} and PageRank distribution trajectories {ẋ(ρ)}
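The pipeline can be sketched on a toy graph. Instead of forming K and factoring it, the sketch below uses the Arnoldi process, a swapped-in technique that builds an orthonormal Q and a Hessenberg H with PQ = QH in exact arithmetic. Since the first column of an orthonormal Q is v/‖v‖₂, the sketch carries that norm as an explicit scale factor. The matrix, tolerance, and function name `arnoldi` are assumptions.

```python
import numpy as np

def arnoldi(P, v, tol=1e-12):
    """Return (Q, H, scale) with P Q = Q H, Q orthonormal, and Q[:, 0] * scale = v."""
    n = len(v)
    Q = np.zeros((n, n))
    H = np.zeros((n, n))
    scale = np.linalg.norm(v)
    Q[:, 0] = v / scale
    m = n
    for j in range(n):
        w = P @ Q[:, j]
        for i in range(j + 1):             # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        h = np.linalg.norm(w)
        if j + 1 == n or h < tol:          # the Krylov space is exhausted
            m = j + 1
            break
        H[j + 1, j] = h
        Q[:, j + 1] = w / h
    return Q[:, :m], H[:m, :m], scale

# Hypothetical 4-node column-stochastic link matrix and uniform personalization.
P = np.array([[0.0, 0.0, 1.0, 0.5],
              [0.5, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0]])
v = np.full(4, 0.25)
alpha = 0.85

Q, H, scale = arnoldi(P, v)
m = H.shape[0]
e1 = np.zeros(m)
e1[0] = 1.0

# Brin-Page kernel applied to the small matrix H instead of P.
y = (1 - alpha) * np.linalg.solve(np.eye(m) - alpha * H, e1)
x_krylov = scale * (Q @ y)

x_direct = np.linalg.solve(np.eye(4) - alpha * P, (1 - alpha) * v)
```

Once Q and H are available, changing the damping value or the kernel only requires work on the m × m matrix H, which is the point of the pipeline.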
Data: real-world large social and knowledge network snapshots
Dataset            Total #nodes   #nodes in LSCC   [max(d_out), µ(d_out), max(d_in)]
Google [1]              875,713          434,818   [4209, 8.86, 382]
Wikilink [2]         12,150,976        7,283,915   [7527, 50.48, 920207]
DBpedia [3]          18,268,992        3,796,073   [8104, 26.76, 414924]
Twitter(www) [4]     41,652,230       33,479,734   [2936232, 42.65, 768552]
Twitter(mpi) [5]     52,579,682       40,012,384   [778191, 47.57, 3438929]
Friendster [6]       68,349,466       48,928,140   [3124, 32.76, 3124]
[1] Google Inc. (2002)  [2] Wikipedia Foundation (2017)  [3] DBpedia (2017)  [4] H. Kwak et al. (2009)  [5] M. Cha et al. (2010)  [6] ArchiveTeam (2011)
Sparse real-world networks under Dulmage-Mendelsohn permutation
[Figure: sparsity patterns of the six networks under Dulmage-Mendelsohn permutation. Panels: Google (τ = 8); DBpedia (τ = 2); Wikilink (τ = 2); Twitter(www) (τ = 2); Twitter(mpi) (τ = 3); Friendster (τ = 3)]
Each point represents a 1000 × 1000 block; a block with ≥ τ non-zeros is colored blue.
Personalized stochastic Krylov space: small-world phenomenon
[Figure: effective PSK(P, v) dimension m by R_ii in QR decomposition. Panels: Google (m = 62); DBpedia (m = 19); Wikilink (m = 27); Twitter(www) (m = 25); Twitter(mpi) (m = 30); Friendster (m = 24). Horizontal axis from 0 to 80; vertical axis from −16 to 0]
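One way to read "effective dimension by R_ii" is to count the diagonal entries of R that remain above a relative threshold. The sketch below follows that reading; the threshold, the function name `effective_dimension`, and the toy cyclic graph are assumptions and not taken from the slides.

```python
import numpy as np

def effective_dimension(P, v, max_steps, tol=1e-12):
    """Count the diagonal entries of R (from K = QR) above tol, relative to R_11."""
    cols = [np.array(v, dtype=float)]
    for _ in range(max_steps - 1):
        cols.append(P @ cols[-1])          # next Krylov vector
    K = np.column_stack(cols)
    R = np.linalg.qr(K, mode="r")
    d = np.abs(np.diag(R))
    return int(np.sum(d / d[0] > tol)), d

# Hypothetical 4-node directed cycle 0 -> 1 -> 2 -> 3 -> 0.
P = np.zeros((4, 4))
for u in range(4):
    P[(u + 1) % 4, u] = 1.0

m_even, _ = effective_dimension(P, np.array([0.5, 0.0, 0.5, 0.0]), max_steps=4)
m_single, _ = effective_dimension(P, np.array([1.0, 0.0, 0.0, 0.0]), max_steps=4)
```

With v spread over nodes 0 and 2, the Krylov vectors repeat after two steps, so the effective dimension is 2; with all mass on one node it is 4.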
Damping effect: KL and dKL/dρ across models
[Figure: KL divergence, analytical derivative, and empirical derivatives (with parameter values 0.004 and 0.002) against the damping variable. Panels by reference value: α_0 = 0.85; γ_0 = 0.94146; β_0 = 5.6; α_0 = 0.95; γ_0 = 0.98831; β_0 = 19]
Twitter(www) dataset
substantially different sensitivity patterns across models
the Brin-Page model and the log-γ model are sensitive when the damping parameter approaches 1
Chung's model is less sensitive to damping parameter change, especially with large β
Damping effect: KL and dKL/dρ across datasets
[Figure: KL divergence, analytical derivative, and empirical derivatives for α from 0.7 to 1, one panel per dataset. Panels: (unlabeled); DBpedia; Wikilink; Twitter(www); Twitter(mpi); Friendster]
Brin-Page model, α_0 = 0.85
similar trend across 6 datasets
low variation with relatively small α
substantially larger variation when α → 1
Intra-model variation: PageRank vector profiles across models
[Figure: PageRank vector profile (normalized histogram of PageRank values), Twitter(www) dataset. Panels: Brin-Page model; Chung's model; log-γ model. Vertical axis: # of nodes (bin counts); horizontal axis logarithmic]
Intra-model variation: PageRank vector profiles across datasets
[Figure: PageRank vector profiles, one panel per dataset. Panels: (unlabeled); DBpedia; Wikilink; Twitter(www); Twitter(mpi); Friendster. Vertical axis: # of nodes (bin counts); horizontal axis logarithmic]
Brin-Page model, α_0 = 0.85
Recap
Intellectual merits
◦ Rich family of PageRank models
capturing, differentiating various activities
and propagation patterns with
quantitative form and speed
◦ Unified analysis of damping effects
easily instantiated on particular network P
and personalized vector v
◦ The PSK space
residence for all model solutions,
foundation for efficient model solution
methods
Experimental findings
• Model utility
inter-model difference in PageRank
distribution profile is much greater than
intra-model difference
• Bump/peak in PageRank distribution
single, with minority support
• The PSK dimension
with small-world networks, the dimension
of personalized, stochastic Krylov space is
low, which leads to upper bounds on
algorithm complexity