fast matrix computations for pair-wise and column-wise katz scores and commute times
DESCRIPTION
A seminar I gave at the University of Chicago about ideas for computing the Katz matrices and commute times quickly.
TRANSCRIPT
![Page 1: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/1.jpg)
FAST MATRIX COMPUTATIONS FOR
PAIRWISE AND COLUMN-WISE KATZ
SCORES AND COMMUTE TIMES
David F. Gleich
Purdue University
University of Chicago
Statistical and Scientific Computing Seminar
October 6th, 2011
With Pooya Esfandiar, Francesco Bonchi, Chen Greif,
Laks V. S. Lakshmanan, and Byung-Won On
David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 1 / 47
![Page 2: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/2.jpg)
MAIN RESULTS
A – adjacency matrix
L – Laplacian matrix
Katz score:
Commute time:
For Katz: compute one score fast, compute the top-k fast
For commute: compute one score fast
![Page 3: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/3.jpg)
OUTLINE
Why study these measures?
Katz Rank and Commute Time
How else do people compute them?
Quadrature rules for pairwise scores
Sparse linear system solves for top-k
As many results as we have time for…
![Page 4: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/4.jpg)
WHY? LINK PREDICTION
Liben-Nowell and Kleinberg (2003, 2006) found that path-based link prediction was more effective
Neighborhood based
Path based
![Page 5: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/5.jpg)
NOTE
All graphs are undirected
All graphs are connected
![Page 6: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/6.jpg)
LEO KATZ
![Page 7: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/7.jpg)
NOT QUITE, WIKIPEDIA
: adjacency, : random walk
PageRank
Katz
These are equivalent if the graph has constant degree
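A hedged reconstruction, since the formulas on this slide were lost in extraction. In standard notation (the symbols $P$, $D$, $\alpha$, $\beta$, $e$, $n$ are assumptions, not recovered from the slide), with $P = A D^{-1}$ the random-walk matrix and $D$ the diagonal degree matrix, the two scores solve

$$
\text{PageRank: } (I - \alpha P)\,x = \frac{1-\alpha}{n}\,e,
\qquad
\text{Katz: } (I - \beta A)\,k = \beta A e .
$$

If every node has degree $d$, then $P = A/d$ and $Ae = d\,e$, so choosing $\beta = \alpha/d$ makes the two system matrices identical and the right-hand sides proportional; $x$ and $k$ then differ only by a scalar factor.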
![Page 8: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/8.jpg)
WHAT KATZ ACTUALLY SAID
Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometrika 18(1):39-43
“we assume that each link independently has the
same probability of being effective” …
“we conceive a constant , depending
on the group and the context of the particular
investigation, which has the force of a probability
of effectiveness of a single link. A k-step chain
then, has probability of being effective.”
“We wish to find the column sums of the matrix”
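In modern notation the quote reads as follows (a sketch; the symbols $\alpha$ for Katz's constant and $A$ for the adjacency matrix are assumptions, because the originals are missing from the transcript). The number of $k$-step chains from $i$ to $j$ is $[A^k]_{ij}$, each effective with probability $\alpha^k$, so the matrix whose column sums Katz wanted is presumably

$$
K = \sum_{k=1}^{\infty} \alpha^k A^k = (I - \alpha A)^{-1} - I ,
$$

where the closed form is the Neumann series and holds when $\alpha < 1/\rho(A)$, with $\rho(A)$ the spectral radius of $A$.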
![Page 9: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/9.jpg)
A MODERN TAKE
The Katz score (node-based) is
The Katz score (edge-based) is
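The two formulas on this slide are missing from the transcript. A minimal sketch of what they most likely denote, assuming the node-based score is a row (equivalently column) sum of the Katz matrix and the edge-based score is a single entry of it. The function name `katz_matrix`, the example graph, and the value of `alpha` are illustrative choices, not taken from the talk.

```python
import numpy as np

def katz_matrix(A, alpha):
    """Dense Katz matrix K = (I - alpha*A)^{-1} - I (only sensible for tiny graphs)."""
    n = A.shape[0]
    I = np.eye(n)
    return np.linalg.solve(I - alpha * A, I) - I

# Path graph 0-1-2-3 (undirected and connected, as the talk assumes).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
alpha = 0.25                      # below 1/max-degree = 0.5
K = katz_matrix(A, alpha)

node_scores = K @ np.ones(4)      # assumed "node-based" score: one number per node
pair_score = K[0, 3]              # assumed "edge-based" score for the candidate link (0, 3)
```

Forming the whole matrix costs a dense solve, which is exactly the preprocessing the rest of the talk tries to avoid; the sketch is only meant to pin down the quantities.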
![Page 10: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/10.jpg)
RETURNING TO THE MATRIX
Carl Neumann
![Page 11: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/11.jpg)
Carl Neumann
I’ve heard the Neumann series called the “von Neumann”
series more than I’d like! In fact, the von Neumann kernel
of a graph should be named the “Neumann” kernel!
Wikipedia page
![Page 12: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/12.jpg)
PROPERTIES OF KATZ’S MATRIX
is symmetric
exists when
is spd when
Note that 1/max-degree suffices
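The conditions on this slide lost their symbols. A small numerical check under a standard assumption: $I - \alpha A$ is symmetric positive definite whenever $\alpha < 1/\rho(A)$, and since $\rho(A) \le$ max-degree, any $\alpha$ strictly below 1/max-degree is a safe, cheap choice. The helper name `katz_system_is_spd` and the star graph are hypothetical illustrations.

```python
import numpy as np

def katz_system_is_spd(A, alpha):
    """Return True if I - alpha*A is symmetric positive definite (Cholesky succeeds)."""
    M = np.eye(A.shape[0]) - alpha * A
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

# Star graph: node 0 connected to three leaves.
A = np.zeros((4, 4))
A[0, 1:] = 1
A[1:, 0] = 1

max_degree = A.sum(axis=1).max()              # 3
rho = np.abs(np.linalg.eigvalsh(A)).max()     # spectral radius, sqrt(3) here
safe_alpha = 0.9 / max_degree                 # strictly below 1/max-degree

spd_safe = katz_system_is_spd(A, safe_alpha)  # inside the safe range
spd_large = katz_system_is_spd(A, 0.6)        # 0.6 > 1/rho, so the matrix is indefinite
```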
![Page 13: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/13.jpg)
COMMUTE TIME
Picture taken from Google images, seems to be
Bay Bridge Traffic by Jim M. Goldstein.
![Page 14: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/14.jpg)
COMMUTE TIME
Consider a uniform random walk on a graph
Also called the hitting
time from node i to j, or
the first transition time
![Page 15: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/15.jpg)
SKIPPING DETAILS
L: graph Laplacian
The all-ones vector is the only null-vector (up to scaling)
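The skipped details lead to a closed form that the transcript no longer shows. A minimal sketch, assuming the standard identity $C_{ij} = \mathrm{vol}(G)\,(L^{\dagger}_{ii} + L^{\dagger}_{jj} - 2L^{\dagger}_{ij})$, where $L^{\dagger}$ is the pseudoinverse of the Laplacian and $\mathrm{vol}(G)$ is the sum of the degrees. The function name `commute_times` and the three-node example are illustrative only.

```python
import numpy as np

def commute_times(A):
    """All-pairs commute times from the Laplacian pseudoinverse (dense, tiny graphs only)."""
    d = A.sum(axis=1)                 # degrees
    L = np.diag(d) - A                # graph Laplacian
    Lp = np.linalg.pinv(L)            # pseudoinverse, since L is singular
    diag = np.diag(Lp)
    return d.sum() * (diag[:, None] + diag[None, :] - 2 * Lp)

# Path graph 0-1-2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
C = commute_times(A)
```

On this path the walk from node 0 to node 2 takes 4 steps in expectation and 4 more to return, which matches `C[0, 2]`.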
![Page 16: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/16.jpg)
WHAT DO OTHER PEOPLE DO?
1) Just work with the linear algebra formulations
2) For Katz, truncate the Neumann series after a few (3-5) terms
3) Use low-rank approximations from EVD(A) or EVD(L)
4) For commute, use Johnson-Lindenstrauss inspired random sampling
5) Approximately decompose into smaller problems
Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009,
Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007, Wang et al. ICDM2007
![Page 17: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/17.jpg)
THE PROBLEM
All of these techniques are
preprocessing based because
most people’s goal is to compute
all the scores.
We want to avoid
preprocessing the graph.
There are a few caveats here! E.g., one could solve the linear system instead of computing the matrix inverse.
![Page 18: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/18.jpg)
WHY NO PREPROCESSING?
The graph is constantly changing
as I rate new movies.
![Page 19: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/19.jpg)
WHY NO PREPROCESSING?
Top-k predicted “links”
are movies to watch!
Pairwise scores give
user similarity
![Page 20: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/20.jpg)
PAIR-WISE ALGORITHMS
![Page 21: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/21.jpg)
PAIRWISE ALGORITHMS
Katz
Commute
Golub and Meurant
to the rescue!
![Page 22: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/22.jpg)
MMQ - THE BIG IDEA
Quadratic form
Weighted sum
Stieltjes integral
Quadrature approximation
Matrix equation
Think
A is s.p.d. use EVD
“A tautology”
Lanczos
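The chain of equalities behind these labels is missing from the transcript. A hedged sketch in the standard Golub and Meurant notation (all symbols below are assumptions): for symmetric $A = Q \Lambda Q^T$, a vector $u$, and a smooth function $f$,

$$
u^T f(A)\,u
= \sum_{i=1}^{n} f(\lambda_i)\,\tilde{u}_i^2
= \int_{\lambda_{\min}}^{\lambda_{\max}} f(\lambda)\,d\omega(\lambda)
\approx \sum_{j=1}^{k} w_j\, f(\theta_j)
= \|u\|^2\, e_1^T f(T_k)\, e_1 ,
$$

where $\tilde{u} = Q^T u$, $\omega$ is the piecewise-constant measure with jumps $\tilde{u}_i^2$ at the eigenvalues $\lambda_i$, and $T_k$ is the tridiagonal matrix from $k$ Lanczos steps started at $u$, whose eigenvalues $\theta_j$ are the nodes of the Gauss rule.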
![Page 23: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/23.jpg)
LANCZOS
, k-steps of the Lanczos method produce
and
![Page 24: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/24.jpg)
PRACTICAL LANCZOS
Only need to store the last 2 vectors in
Updating requires O(matvec) work
is not orthogonal
![Page 25: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/25.jpg)
MMQ PROCEDURE
Goal
Given
1. Run k-steps of Lanczos on starting with
2. Compute , with an additional eigenvalue at u, set
   (corresponds to a Gauss-Radau rule with u as a prescribed node)
3. Compute , with an additional eigenvalue at l, set
   (corresponds to a Gauss-Radau rule with l as a prescribed node)
4. Output as lower and upper bounds on
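The steps above can be sketched as follows for the function $f(x) = 1/x$, under several assumptions: the matrix is the Katz system $M = I - \alpha A$, the bounds `lo` and `hi` on its spectrum come from Gershgorin's theorem, and the Gauss-Radau rule is built by the standard rank-one modification of the Lanczos tridiagonal matrix. The names `lanczos` and `radau_estimate` are hypothetical, and the sketch ignores Lanczos breakdown and uses dense solves where the talk suggests cheaper updates.

```python
import numpy as np

def lanczos(matvec, v, k):
    """k Lanczos steps; only the last two basis vectors are kept in memory."""
    alphas, betas = [], []
    q_prev = np.zeros_like(v)
    q = v / np.linalg.norm(v)
    beta = 0.0
    for _ in range(k):
        w = matvec(q) - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        beta = np.linalg.norm(w)      # assumes no breakdown (beta > 0)
        alphas.append(alpha)
        betas.append(beta)
        q_prev, q = q, w / beta
    return np.array(alphas), np.array(betas)

def radau_estimate(alphas, betas, node, vnorm2):
    """Gauss-Radau estimate of v^T M^{-1} v with one prescribed node."""
    k = len(alphas)
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    rhs = np.zeros(k)
    rhs[-1] = betas[-1] ** 2
    delta = np.linalg.solve(T - node * np.eye(k), rhs)
    T_hat = np.zeros((k + 1, k + 1))  # T extended so that `node` is an eigenvalue
    T_hat[:k, :k] = T
    T_hat[k, k - 1] = T_hat[k - 1, k] = betas[-1]
    T_hat[k, k] = node + delta[-1]
    e1 = np.zeros(k + 1)
    e1[0] = 1.0
    return vnorm2 * np.linalg.solve(T_hat, e1)[0]

# Katz system on a 50-node path graph (max degree 2).
n, alpha = 50, 0.4
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
M = np.eye(n) - alpha * A
lo, hi = 1 - 2 * alpha, 1 + 2 * alpha     # Gershgorin bounds on the spectrum of M

v = np.zeros(n)
v[0] = 1.0
a, b = lanczos(lambda x: M @ x, v, k=10)
lower = radau_estimate(a, b, hi, v @ v)   # prescribed node at the upper end
upper = radau_estimate(a, b, lo, v @ v)   # prescribed node at the lower end
exact = v @ np.linalg.solve(M, v)         # reference value, for checking only
```

Because the odd derivatives of $1/x$ are negative on the positive axis, prescribing the node at `hi` gives a lower bound and prescribing it at `lo` gives an upper bound, so `lower <= exact <= upper` and the gap can serve as a stopping criterion.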
![Page 26: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/26.jpg)
PRACTICAL MMQ
Increase k to become more accurate
Bad eigenvalue bounds yield worse results
and are easy to compute
not required, we can iteratively
update its LU factorization
![Page 27: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/27.jpg)
PRACTICAL MMQ
![Page 28: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/28.jpg)
ONE LAST STEP FOR KATZ
Katz
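The formula for this step did not survive extraction. A plausible reading, stated as an assumption: quadrature bounds apply to quadratic forms $u^T f(M) u$, while a pairwise Katz score is a bilinear form $e_i^T f(M) e_j$, so the last step is the polarization identity

$$
e_i^T f(M)\, e_j
= \tfrac{1}{4}\Big[ (e_i + e_j)^T f(M)\,(e_i + e_j) - (e_i - e_j)^T f(M)\,(e_i - e_j) \Big],
\qquad M = I - \alpha A,\; f(x) = 1/x .
$$

Lower and upper bounds on each of the two quadratic forms then combine into bounds on the pairwise score. For commute time no such step is needed, because $(e_i - e_j)^T L^{\dagger} (e_i - e_j)$ is already a quadratic form.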
![Page 29: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/29.jpg)
COLUMN-WISE
ALGORITHMS
![Page 30: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/30.jpg)
COLUMN-WISE COMMUTE-TIME
requires entire diagonal of
Paige and Saunders, 1975
Each vector is computed by a Lanczos-based CG algorithm.
![Page 31: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/31.jpg)
COLUMN-WISE COMMUTE TIME
following Hein et al. 2010 is a MUCH better and faster approximation.
![Page 32: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/32.jpg)
KATZ SCORES ARE LOCALIZED
Up to 50 neighbors is 99.65% of the total mass
![Page 33: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/33.jpg)
PARTICIPATION RATIOS
Participation ratios: “effective non-zeros” in a vector
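The slide's formula is not recoverable from the transcript. A minimal sketch using one common definition of the participation ratio, $(\sum_i x_i^2)^2 / \sum_i x_i^4$, which is an assumption here. It equals $n$ for a vector with $n$ equal entries and 1 for a vector with a single non-zero.

```python
import numpy as np

def participation_ratio(x):
    """Effective number of non-zeros: (sum x_i^2)^2 / (sum x_i^4)."""
    x = np.asarray(x, dtype=float)
    s2 = np.sum(x ** 2)
    return s2 ** 2 / np.sum(x ** 4)

uniform = participation_ratio(np.ones(10))         # all ten entries matter equally
localized = participation_ratio([0.0, 0.0, 5.0, 0.0])  # a single entry carries everything
```

A Katz column whose participation ratio is far below the number of nodes is localized, which is what makes a sparse top-k algorithm plausible.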
![Page 34: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/34.jpg)
TOP-K ALGORITHM FOR KATZ
Approximate
where is sparse
Keep sparse too
Ideally, don’t “touch” all of
![Page 35: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/35.jpg)
INSPIRATION - PAGERANK
Approximate
where is sparse
Keep sparse too? YES!
Ideally, don’t “touch” all of ? YES!
McSherry WWW2005, Berkhin 2007, Andersen et al. FOCS2008 – Thanks to Reid Andersen for telling me McSherry did this too.
![Page 36: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/36.jpg)
THE ALGORITHM - MCSHERRY
For
Start with the Richardson iteration
Rewrite
Richardson converges if
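The equations on this slide are missing. A hedged sketch of the standard Richardson iteration for a system $Mx = b$ (the symbols are assumptions; for PageRank one would take $M = I - \alpha P$):

$$
x^{(k+1)} = x^{(k)} + r^{(k)}, \qquad r^{(k)} = b - M x^{(k)},
$$

which can be rewritten as a recurrence on the residual alone,

$$
r^{(k+1)} = (I - M)\, r^{(k)} .
$$

The iteration converges if $\rho(I - M) < 1$. For $M = I - \alpha P$ with a column-stochastic $P$, $I - M = \alpha P$ has spectral radius $\alpha < 1$.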
![Page 37: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/37.jpg)
THE ALGORITHM
Note is sparse.
If , then is sparse.
Idea
only add one component of to
![Page 38: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/38.jpg)
THE ALGORITHM
For
Init:
How to pick ?
![Page 39: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/39.jpg)
THE ALGORITHM FOR KATZ
For
Init:
Pick as max
Storing the non-zeros of the residual in a heap makes picking the max take O(log n) time. See Andersen et al. FOCS2008 for more.
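The update rules on this slide were lost. A minimal sketch of the idea as described in the surrounding slides, assuming the system being solved is $(I - \alpha A)x = e_i$ and that each step moves the largest residual entry into the solution. The function name `katz_column_push`, the adjacency-list input, and the tolerance are hypothetical, and a plain `max` over a dictionary stands in for the heap mentioned above.

```python
import numpy as np

def katz_column_push(neighbors, i, alpha, tol=1e-8, max_steps=100000):
    """Approximate x = (I - alpha*A)^{-1} e_i by relaxing one residual entry per step."""
    x = {}                # sparse solution
    r = {i: 1.0}          # sparse residual, r = e_i - (I - alpha*A) x
    for _ in range(max_steps):
        j = max(r, key=lambda v: abs(r[v]))   # largest residual entry
        rj = r[j]
        if abs(rj) < tol:
            break
        x[j] = x.get(j, 0.0) + rj             # add one component of r to x
        r[j] = 0.0
        for v in neighbors[j]:                # only the neighbors of j are touched
            r[v] = r.get(v, 0.0) + alpha * rj
    return x

# Path graph 0-1-2-3-4 as adjacency lists; alpha below 1/max-degree = 0.5.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
alpha = 0.3
x = katz_column_push(neighbors, 0, alpha)
approx = np.array([x.get(v, 0.0) for v in range(5)])

# Dense reference solution, for checking only.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
exact = np.linalg.solve(np.eye(5) - alpha * A, np.eye(5)[:, 0])
```

With a loose tolerance the loop stops after touching only a few nodes near `i`, which is how a top-k set can be found without a full pass over the graph. Subtracting 1 from the entry for node `i` converts the result to the Katz scores $((I - \alpha A)^{-1} - I)e_i$.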
![Page 40: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/40.jpg)
CONVERGENCE?
If 1/max-degree then is sub-stochastic and the PageRank based proof applies because the matrix is diagonally dominant
For , then for symmetric , this algorithm is the Gauss-Southwell procedure and it still converges.
![Page 41: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/41.jpg)
RESULTS – DATA, PARAMETERS
All unweighted, connected graphs
Easy
Hard
![Page 42: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/42.jpg)
KATZ BOUND CONVERGENCE
![Page 43: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/43.jpg)
COMMUTE BOUND CONVERGENCE
![Page 44: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/44.jpg)
KATZ SET CONVERGENCE
For arXiv graph.
![Page 45: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/45.jpg)
TIMING
![Page 46: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/46.jpg)
CONCLUSIONS
These algorithms are faster than many alternatives.
For pairwise commute, stopping criteria are simpler with bounds
For top-k problems, we often need less than 1 matvec for good enough results
![Page 47: Fast matrix computations for pair-wise and column-wise Katz scores and commute times](https://reader034.vdocument.in/reader034/viewer/2022051611/54b721974a795903798b47ae/html5/thumbnails/47.jpg)
By AngryDogDesign on DeviantArt
Paper at WAW2010, J. Internet Mathematics
Slides should be online soon
Code is online already
www.cs.purdue.edu/homes/dgleich/codes/fast-katz-2011