Model Reduction for Edge-Weighted Personalized PageRank
D. Bindel
11 Apr 2016
D. Bindel Purdue 11 Apr 2016 1 / 33
The Computational Science & Engineering Picture
Application: MEMS, smart grids, networks
Analysis: linear algebra, approximation theory, symmetry + structure
Computation: HPC / cloud, simulators, solvers
Collaborators
Wenlei Xie (LinkedIn), David Bindel (Cornell), Johannes Gehrke (Microsoft), Al Demers (Cornell)
PageRank Problem
(Figure: unweighted, node-weighted, and edge-weighted networks.)
Goal: Find “important” vertices in a network
Basic approach uses only topology
Weights incorporate prior info about important nodes/edges
PageRank Model
(Figure: unweighted, node-weighted, and edge-weighted networks.)
Random surfer model: x^(t+1) = α P x^(t) + (1 − α) v, where P = A D^(−1)
Stationary distribution: M x = b, where M = I − αP and b = (1 − α)v
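The random-surfer iteration above can be sketched in a few lines; this is a minimal dense NumPy version, with a hypothetical 3-node graph as input:

```python
import numpy as np

def pagerank(P, v, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for x = alpha*P*x + (1-alpha)*v.

    P: column-stochastic transition matrix (n x n), v: teleport vector.
    """
    x = v.copy()
    for _ in range(max_iter):
        x_new = alpha * (P @ x) + (1 - alpha) * v
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

# Tiny hypothetical 3-node graph: P = A D^{-1}, uniform teleportation.
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 1., 0.]])
D_inv = np.diag(1.0 / A.sum(axis=0))  # normalize each column
P = A @ D_inv
v = np.ones(3) / 3
x = pagerank(P, v)
```

Since P is column stochastic and v sums to one, the fixed point is a probability vector.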
Edge Weight vs Node Weight Personalization
Introduce personalization parameters w ∈ R^d in two ways:
Node weights: v_i = v_i(w), giving M x(w) = b(w)
Edge weights: a_ij = a_ij(w), giving M(w) x(w) = b
Edge Weight vs Node Weight Personalization
Node weight personalization is well-studied
Topic-sensitive PageRank: fast methods based on linearity
Localized PageRank: fast methods based on sparsity
Some work on edge weight personalization
ObjectRank/ScaleRank: personalize weights for different edge types
But lots of work incorporates edge weights without personalization
Our goal: General, fast methods for edge weight personalization
Edge Weight Parameterizations
Different ways to personalize ⇒ different algorithm options
1. Linear: Take an edge of type i with probability α w_i

   P(w) = Σ_{i=1}^d w_i P^(i)

2. Scaled linear: Take an edge with probability proportional to a (linear) edge weight

   P(w) = A(w) D(w)^(−1),  A(w) = Σ_{i=1}^d w_i A^(i),  D(w) = Σ_{i=1}^d w_i D^(i)
3 Fully nonlinear: Both A and P depend nonlinearly on w
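The first two parameterizations are easy to assemble from per-type matrices; a dense sketch (function names are illustrative, and the small adjacency data is hypothetical):

```python
import numpy as np

def P_linear(w, P_list):
    """Linear parameterization: P(w) = sum_i w_i P^(i)."""
    return sum(wi * Pi for wi, Pi in zip(w, P_list))

def P_scaled_linear(w, A_list):
    """Scaled linear: P(w) = A(w) D(w)^{-1}, where A(w) = sum_i w_i A^(i)
    and D(w) holds the column sums of A(w)."""
    A = sum(wi * Ai for wi, Ai in zip(w, A_list))
    d = A.sum(axis=0)   # column sums (total out-weight per node)
    return A / d        # divides column j by d_j via broadcasting

# Two hypothetical edge types on a 3-node graph.
A1 = np.array([[0., 1., 0.], [1., 0., 1.], [0., 0., 0.]])
A2 = np.array([[0., 0., 1.], [0., 0., 0.], [1., 1., 0.]])
P = P_scaled_linear([0.7, 0.3], [A1, A2])
```

In the scaled-linear case the normalization D(w) depends on w, which is exactly why interpolation methods later need column sums as well as individual edges.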
Model Reduction
Approximation ansatz: x ≈ Uy, turning the expensive full model (Mx = b) into a reduced model (M̃y = b̃) via a reduced basis U.

Model reduction procedure from the physical simulation world:
Offline: Construct reduced basis U ∈ R^(n×k)
Offline: Choose ≥ k equations to pick the approximation x̂ = Uy
Online: Solve for y(w) given w and reconstruct x̂
Reduced Basis Construction: SVD (aka POD/PCA/KL)
Snapshot matrix X = [x(w_1)  x(w_2)  ...  x(w_r)] at sample points w_1, ..., w_r; X ≈ U Σ V^T, and the leading left singular vectors form the reduced basis U.
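The offline basis construction is a truncated SVD of the snapshot matrix; a minimal sketch with random stand-in snapshots:

```python
import numpy as np

def pod_basis(X, k):
    """Reduced basis from the snapshot matrix X = [x(w_1) ... x(w_r)]:
    the leading k left singular vectors of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s

# r = 10 snapshots of an n = 50 dimensional state (random stand-in data).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
U, s = pod_basis(X, k=4)
```

The singular values s give the decay plots shown later, and σ_{k+1} controls how well any k-dimensional basis can do.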
Choosing Good Spaces
What is the best possible approximation x̂ = Uy?
min_y ‖Uy − x(w)‖_2 ≤ σ_{k+1}‖x‖_2 + e_interp(w)

where

e_interp(w) = ‖ x(w) − Σ_{j=1}^r x(w_j) c_j(w) ‖_2

is the error in an interpolant.
Pay attention where x has large derivatives!
Also suggests sampling strategies (sparse grids, adaptive methods)
Approximation Ansatz
Want r = MUy − b ≈ 0. Consider two approximation conditions:

Method             | Ansatz     | Properties
Bubnov-Galerkin    | U^T r = 0  | Good accuracy empirically; fast for P(w) linear
DEIM (collocation) | min ‖r_I‖  | Fast even for nonlinear P(w); complex cost/accuracy tradeoff

Petrov-Galerkin is a bit more accurate than Bubnov-Galerkin – future work.
Bubnov-Galerkin Method
U^T (MUy − b) = 0.

Linear case: w_i = probability of transition with edge type i

M(w) = I − α Σ_i w_i P^(i),   M̃(w) = I − α Σ_i w_i P̃^(i)

where we can precompute P̃^(i) = U^T P^(i) U
Nonlinear: Cost to form M̃(w) comparable to cost of PageRank!
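For the linear case, the offline/online split above can be sketched directly (function names are illustrative; the sanity check uses U = I so the reduced solve matches the full one):

```python
import numpy as np

def galerkin_offline(U, P_list):
    """Precompute reduced operators P~^(i) = U^T P^(i) U (each k x k)."""
    return [U.T @ Pi @ U for Pi in P_list]

def galerkin_online(w, Pt_list, U, b, alpha=0.85):
    """Online stage: assemble M~(w), solve for y, reconstruct x^ = U y.
    Cost is O(d k^2 + k^3), independent of the graph size n."""
    k = U.shape[1]
    Mt = np.eye(k) - alpha * sum(wi * Pti for wi, Pti in zip(w, Pt_list))
    y = np.linalg.solve(Mt, U.T @ b)
    return U @ y

# Sanity check with U = I (no reduction): recovers the full solve.
P = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 1.],
              [0., 0., 0., 0.],
              [0., 0., 0., 0.]])
b = 0.15 * np.ones(4) / 4
U_full = np.eye(4)
Pt = galerkin_offline(U_full, [P])
x_hat = galerkin_online([1.0], Pt, U_full, b)
```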
Discrete Empirical Interpolation Method (DEIM)
(MUy − b)_I = 0   (equations in the chosen index set I)

Ansatz: minimize ‖r_I‖ for chosen indices I
Only need a few rows of M (and the associated rows of U)
If given A(w), also need column sums for normalization.
Difference from physics applications: high-degree nodes!
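The DEIM solve itself is a small least-squares problem over the selected rows; a dense sketch (in practice only the rows `M[idx, :]` would ever be assembled, and here they are sliced from a full random stand-in matrix for illustration):

```python
import numpy as np

def deim_solve(M, b, U, idx):
    """Enforce (MUy - b)_I ~ 0 in least squares for selected row indices idx.
    Only the rows M[idx, :] of the full operator are needed."""
    Mt = M[idx, :] @ U                              # |I| x k system
    y, *_ = np.linalg.lstsq(Mt, b[idx], rcond=None)
    return U @ y

# Stand-in data: with U = I and all rows selected, this is an exact solve.
rng = np.random.default_rng(1)
M = np.eye(5) + 0.1 * rng.standard_normal((5, 5))
b = rng.standard_normal(5)
x_hat = deim_solve(M, b, np.eye(5), list(range(5)))
```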
Error Behavior
Similar error analysis framework for both Galerkin and DEIM
Consistency + Stability = Accuracy
Consistency: Does the subspace contain good approximants?
Stability: Is the approximation subproblem far from singular?
Characterize stability by a quasi-optimality condition
‖x − Uy‖ ≤ C min_z ‖x − Uz‖
Standard Quasi-Optimality Approach
Define a solution projector:
Πx = approximate solution when true solution is x

Note that ΠU = U.

The error projector I − Π maps a true solution to its error:

e = x − Πx = (I − Π)x

Note that (I − Π)U = 0.

If e_min = x − Uz is the smallest-norm error in the space, then

e = (I − Π)x − (I − Π)Uz = (I − Π)e_min

Therefore, a bound ‖I − Π‖ ≤ 1 + ‖Π‖ establishes quasi-optimality.
Quasi-Optimality: Galerkin and DEIM
Galerkin: Π = U M̃^(−1) W^T M,  where M̃ ≡ W^T M U
DEIM:     Π = U M̃^† M_{I,:},   where M̃ ≡ M_{I,:} U

Key to stability: M̃ far from singular
Suggests pivoting schemes for "good" I in DEIM
Also helps to explicitly enforce Σ_i x̂_i = 1
Can bound ‖Π‖ offline for Galerkin + linear parameterization.
Interpolation Costs
Consider subgraph relevant to one interpolation equation:
(Figure: a node i ∈ I and its incoming neighbors, with edge weights such as 1/3 and 1/50.)

Really care about weights of edges incident on I
Need more edges to normalize (unless A(w) linear)
Cost to include i ∈ I: |{j, k : a_ij ≠ 0 and a_kj ≠ 0}|
High in/out degrees are expensive but informative
Interpolation Cost and Accuracy
Key question: how to choose I to balance cost vs accuracy?
Want to pick I once, so look at rows of

Z = [ M(w_1)U   M(w_2)U   ... ]

for sample parameters w_i.
Pivoted QR-like greedy row selection with proxy measures for
Cost: Nonzeros in row (+ assoc columns if normalization required)
Accuracy: Residual when projecting row onto those previously selected
Several heuristics for the cost/accuracy tradeoff (see paper)
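A minimal sketch of the accuracy side of this greedy selection, assuming no cost weighting: repeatedly pick the row of Z with the largest residual after projecting onto the span of the rows already chosen (the cost proxy would additionally penalize rows with many nonzeros):

```python
import numpy as np

def greedy_rows(Z, m):
    """Pivoted-QR-style greedy row selection (accuracy proxy only):
    pick the row with the largest residual norm, then deflate its
    direction from every remaining row."""
    R = Z.astype(float).copy()
    idx = []
    for _ in range(m):
        norms = np.linalg.norm(R, axis=1)
        i = int(np.argmax(norms))
        idx.append(i)
        q = R[i] / norms[i]            # normalized new direction
        R = R - np.outer(R @ q, q)     # remove that component everywhere
    return idx

# Stand-in data: 8 candidate rows, pick 3.
rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 3))
idx = greedy_rows(Z, 3)
```

After deflation a selected row has (numerically) zero residual, so it is never picked twice.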
Online Costs
If ℓ = # PR components needed, online costs are:

Form M̃:     O(dk²) for B-G (more complex for DEIM)
Factor M̃:   O(k³)
Solve for y: O(k²)
Form Uy:     O(kℓ)

Online costs do not depend on graph size! (unless you want the whole PR vector)
Example Networks
DBLP (citation network): 3.5M nodes / 18.5M edges; seven edge types ⇒ seven parameters; P(w) linear; competition: ScaleRank
Weibo (micro-blogging): 1.9M nodes / 50.7M edges; weight edges by topical similarity of posts; number of parameters = number of topics (5, 10, 20)
(Studied global and local PageRank – see paper for the latter.)
Singular Value Decay
(Plot: ith largest singular value, roughly 10^(-1) to 10^6, for DBLP-L, Weibo-S5, Weibo-S10, and Weibo-S20; r = 1000 samples, k = 100.)
DBLP Accuracy
(Bar chart: Kendall@100 and normalized L1 error for Galerkin, DEIM-100, DEIM-120, DEIM-200, and ScaleRank.)
DBLP Running Times (All Nodes)
(Bar chart: running time in seconds, split into coefficients and construction phases, for Galerkin, DEIM-100, DEIM-120, DEIM-200, and ScaleRank.)
Weibo Accuracy
(Plot: Kendall@100 and normalized L1 error vs. number of parameters: 5, 10, 20.)
Weibo Running Times (All Nodes)
(Plot: running time in seconds, split into coefficients and construction, vs. number of parameters: 5, 10, 20.)
Application: Learning to Rank
Goal: Given T = {(i_q, j_q)}_{q=1}^{|T|}, find w that mostly ranks i_q over j_q.
(c.f. Backstrom and Leskovec, WSDM 2011)
Standard idea: Gradient descent
∂x/∂w_j = M(w)^(−1) [ α (∂P(w)/∂w_j) x(w) ]
Dominant cost: d + 1 solves with the PageRank system M(w)
One PageRank solve to evaluate a loss function
One PageRank solve per parameter to evaluate gradients
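The gradient formula above follows from differentiating the stationarity equation; filling in the one-line derivation:

```latex
\frac{\partial}{\partial w_j}\bigl[M(w)\,x(w)\bigr] = \frac{\partial b}{\partial w_j} = 0
\;\Longrightarrow\;
\frac{\partial M}{\partial w_j}\,x + M\,\frac{\partial x}{\partial w_j} = 0.
\qquad
\text{Since } M(w) = I - \alpha P(w),\;
\frac{\partial M}{\partial w_j} = -\alpha\,\frac{\partial P}{\partial w_j},
\;\text{so}\;
\frac{\partial x}{\partial w_j}
  = M(w)^{-1}\!\left[\alpha\,\frac{\partial P(w)}{\partial w_j}\,x(w)\right].
```

Each of the d gradient components therefore needs one solve with the same matrix M(w), which is what makes factorization reuse pay off.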
Application: Learning to Rank
Goal: Given T = {(i_q, j_q)}_{q=1}^{|T|}, find w that mostly ranks i_q over j_q.
(c.f. Backstrom and Leskovec, WSDM 2011)
Standard: Gradient descent on full problem
One PR computation for objective
One PR computation for each gradient component
Costs d + 1 PR computations per step
With model reduction
Rephrase objective in reduced coordinate space
Use factorization to solve PR for objective
Re-use same factorization for gradient
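The factorization reuse can be sketched for the reduced linear parameterization (function name and the tiny 2×2 reduced operators are illustrative stand-ins):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def reduced_grad_step(w, Pt_list, bt, alpha=0.85):
    """One optimization step's linear algebra: factor M~(w) once (O(k^3)),
    then reuse the factorization for the objective solve and all d
    gradient solves (O(k^2) each)."""
    k = bt.shape[0]
    Mt = np.eye(k) - alpha * sum(wi * Pti for wi, Pti in zip(w, Pt_list))
    lu = lu_factor(Mt)
    y = lu_solve(lu, bt)  # reduced PageRank coordinates
    # dy/dw_j = M~^{-1} [ alpha * P~^(j) y ]: one cheap solve per parameter
    grads = [lu_solve(lu, alpha * (Ptj @ y)) for Ptj in Pt_list]
    return y, grads

# Stand-in reduced data: k = 2, d = 1.
Pt_list = [np.array([[0.0, 1.0], [1.0, 0.0]])]
bt = np.array([0.15, 0.15])
y, grads = reduced_grad_step([1.0], Pt_list, bt)
```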
DBLP Learning Task
(Plot: objective function value vs. iteration for Standard, Galerkin, and DEIM-200.)
(8 papers for training + 7 params)
The Punchline
Test case: DBLP, 3.5M nodes, 18.5M edges, 7 params
Cost per Iteration:
Method     | Standard | Bubnov-Galerkin | DEIM-200
Time (sec) | 159.3    | 0.002           | 0.033
Roads Not Taken
In the paper (but not the talk)
Selecting interpolation equations for DEIM
Localized PageRank experiments (Weibo and DBLP)
Comparison to BCA for localized PageRank
Room for future work! Analysis, applications, systems, ...
Questions?
Edge-Weighted Personalized PageRank: Breaking a Decade-Old Performance Barrier
Wenlei Xie, David Bindel, Johannes Gehrke, and Al Demers
KDD 2015, paper 117
Sponsors:
NSF (IIS-0911036 and IIS-1012593)
iAd Project from the National Research Council of Norway
Welcome to SIAM ALA in Hong Kong!
Hong Kong Baptist University, 4-8 May 2018