fast relaxation methods for the matrix exponential
DESCRIPTION
The matrix exponential is a matrix computing primitive used in link prediction and community detection. We describe a fast method to compute it using relaxation on a large linear system of equations. This enables us to compute a column of the matrix exponential in sublinear time, or under a second on a standard desktop computer.

TRANSCRIPT
Relaxation methods for the matrix exponential
on large networks
David F. Gleich, Purdue University
Joint work with Kyle Kloster @ Purdue, supported by NSF CAREER 1149756-CCF
Code www.cs.purdue.edu/homes/dgleich/codes/nexpokit
Mines David Gleich · Purdue 1
Models and algorithms for high performance matrix and network computations
[Slide shows pages from a paper by P. G. Constantine, D. F. Gleich, Y. Hou, and J. Templeton on simulation data analysis, including Fig. 4.5: error in the reduced order model compared to the prediction standard deviation for bubble radii s = 0.39 cm and s = 1.95 cm.]
Tensor eigenvalues and a power method
Tensor methods for network alignment

Network alignment is the problem of computing an approximate isomorphism between two networks. In collaboration with Mohsen Bayati, Amin Saberi, Ying Wang, and Margot Gerritsen, the PI has developed a state of the art belief propagation method (Bayati et al., 2009).

FIGURE 6 – Previous work from the PI tackled network alignment with matrix methods for edge overlap: edges (i, i′) and (j, j′) matched between graphs A and B through the link graph L. This proposal is for matching triangles using tensor methods: edges (i, i′), (j, j′), and (k, k′). If xi, xj, and xk are indicators associated with the edges (i, i′), (j, j′), and (k, k′), then we want to include the product xi xj xk in the objective, yielding a tensor problem.

We propose to study tensor methods to perform network alignment with triangle and other higher-order graph moment matching. Similar ideas were proposed by Svab (2007); Chertok and Keller (2010) also proposed using triangles to aid in network alignment problems. In Bayati et al. (2011), we found that triangles were a key missing component in a network alignment problem with a known solution. Given that preserving a triangle requires three edges between two graphs, this yields a tensor problem:
maximize    Σ_{i∈L} wi xi  +  Σ_{i∈L} Σ_{j∈L} xi xj S_{i,j}  +  Σ_{i∈L} Σ_{j∈L} Σ_{k∈L} xi xj xk T_{i,j,k}   (the last sum is the triangle overlap term)
subject to  x is a matching.
Here, T_{i,j,k} = 1 when the edges corresponding to i, j, and k in L result in a triangle in the induced matching. Maximizing this objective is an intractable problem. We plan to investigate a heuristic based on a rank-1 approximation of the tensor T and a maximum-weight matching based rounding. Similar heuristics have been useful in other matrix-based network alignment algorithms (Singh et al., 2007; Bayati et al., 2009). The work involves enhancing the Symmetric-Shifted-Higher-Order Power Method due to Kolda and Mayo (2011) to incredibly large and sparse tensors. On this aspect, we plan to collaborate with Tamara G. Kolda. In an initial evaluation of this triangle matching on synthetic problems, using the tensor rank-1 approximation alone produced results that identified the correct solution whereas all matrix approaches could not.

vision for the future

All of these projects fit into the PI's vision for modernizing the matrix-computation paradigm to match the rapidly evolving space of network computations. This vision extends beyond the scope of the current proposal. For example, the web is a huge network with over one trillion unique URLs (Alpert and Hajaj, 2008), and search engines have indexed over 180 billion of them (Cuil, 2009). Yet, why do we need to compute with the entire network? By way of analogy, note that we do not often solve partial differential equations or model macro-scale physics by explicitly simulating the motion or interaction of elementary particles. We need something equivalent for the web and other large networks. Such investigations may take many forms: network models, network geometry, or network model reduction. It is the vision of the PI that the language, algebra, and methodology of matrix computations will
maximize    Σ_{ijk} T_{ijk} xi xj xk
subject to  ‖x‖2 = 1
Human protein interaction network: 48,228 triangles. Yeast protein interaction network: 257,978 triangles. The tensor T has ~100,000,000,000 nonzeros. We work with it implicitly.
[x^(next)]_i = ρ · ( Σ_{jk} T_{ijk} xj xk + γ xi ),  where ρ ensures the 2-norm

SSHOPM method due to Kolda and Mayo
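To make the update concrete, here is a small NumPy sketch of a shifted higher-order power iteration on a tiny dense symmetric tensor. This is an illustration only: the tensor, its size, and the shift γ = 1 are made up, and the talk's T is handled implicitly at scale rather than stored densely.

```python
import numpy as np

# A tiny dense symmetric 3-way tensor stands in for T; symmetrize a random one.
rng = np.random.default_rng(1)
n = 5
T = rng.random((n, n, n))
T = sum(T.transpose(p) for p in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6

gamma = 1.0                     # SSHOPM-style shift (value assumed for the demo)
x = np.ones(n) / np.sqrt(n)
for _ in range(100):
    y = np.einsum('ijk,j,k->i', T, x, x) + gamma * x   # T x x + gamma x
    x = y / np.linalg.norm(y)                          # rho enforces ||x||_2 = 1

print(np.einsum('ijk,i,j,k->', T, x, x, x))            # tensor objective value
```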
Simulation data analysis  SIMAX ’09, SISC ’11, MapReduce ’11, ICASSP ’12
Network alignment  ICDM ’09, SC ’11, TKDE ’13
Fast & scalable network centrality  SC ’05, WAW ’07, SISC ’10, WWW ’10, …
Data clustering  WSDM ’12, KDD ’12, CIKM ’13 …
Massive matrix computations on multi-threaded and distributed architectures:  Ax = b,  min ‖Ax − b‖,  Ax = λx

Image from rockysprings, deviantart, CC share-alike

Everything in the world can be explained by a matrix, and we see how deep the rabbit hole goes. The talk ends, you believe -- whatever you want to.
Matrix exponentials

exp(A) is defined as  exp(A) = Σ_{k=0}^∞ (1/k!) A^k  (always converges)

dx/dt = Ax(t)  ⟺  x(t) = exp(tA) x(0)  (the evolution operator for an ODE)

A is n × n, real.

This is a special case of a function of a matrix f(A); others are f(x) = 1/x, f(x) = sinh(x), …
This talk: a column of the matrix exponential

x = exp(P) ec,  where x is the solution, P the matrix, and ec the column.
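As a concrete (and deliberately naive) illustration of the object x = exp(P) ec, here is a NumPy/SciPy sketch on a made-up four-node graph. Dense expm is exactly what the talk avoids on large networks; it is used here only because the example is tiny.

```python
import numpy as np
from scipy.linalg import expm

# A tiny undirected graph: adjacency matrix A; P = A D^{-1} is column stochastic.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=0)            # divide each column by its degree

c = 0
ec = np.zeros(4); ec[c] = 1.0    # indicator of column c
x = expm(P) @ ec                 # the column of the matrix exponential

# For column-stochastic P, 1^T P^k ec = 1 for every k, so the entries of x
# sum to sum_k 1/k! = e.
print(x.sum())
```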
Matrix computations in a red-pill

Solve a problem better by exploiting its structure!

This talk: a column of the matrix exponential

x = exp(P) ec,  where x is the solution (localized), P the matrix (large, sparse, stochastic), and ec the column.
Localized solutions

x = exp(P) ec with length(x) = 513,969 and nnz(x) = 513,969: every entry is nonzero, but most are tiny.
[Plots: plot(x); approximation error vs. number of nonzeros retained]
Our mission: find the solution with work roughly proportional to the localization, not the matrix.

Our algorithm: www.cs.purdue.edu/homes/dgleich/codes/nexpokit
[Plot: approximation error vs. number of nonzeros used by our algorithm]
Outline
1. Motivation and setup
2. Converting x = exp(P) ec into a linear system
3. Relaxation methods for linear systems from large networks
4. Error analysis
5. Experiments
[Slide shows the first pages of Moler and Van Loan, “Nineteen Dubious Ways to Compute the Exponential of a Matrix,” SIAM Review, Vol. 20, No. 4, 1978, and “Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later,” SIAM Review, Vol. 45, No. 1, pp. 3–49, 2003.]
Matrix exponentials on large networks

exp(A) = Σ_{k=0}^∞ (1/k!) A^k.  If A is the adjacency matrix, then A^k counts the number of length-k paths between node pairs. Large entries denote important nodes or edges. Used for link prediction and centrality. [Estrada 2000, Farahat et al. 2002, 2006]

exp(P) = Σ_{k=0}^∞ (1/k!) P^k.  If P is a transition matrix, then P^k is the probability of a length-k walk between node pairs. Used for link prediction, kernels, and clustering or community detection. [Kondor & Lafferty 2002, Kunegis & Lommatzsch 2009, Chung 2007]
Another useful matrix exponential

P column stochastic, e.g. P = A^T D^{-1}, where A is the adjacency matrix.

If A is symmetric:
exp(P^T) = exp(D^{-1} A) = D^{-1} exp(A D^{-1}) D = D^{-1} exp(P) D
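The identity exp(P^T) = D^{-1} exp(P) D is easy to check numerically; an illustrative NumPy/SciPy sketch on a made-up symmetric adjacency matrix:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # symmetric adjacency matrix
d = A.sum(axis=0)
D, D_inv = np.diag(d), np.diag(1.0 / d)
P = A.T @ D_inv                  # column stochastic, P = A^T D^{-1}

lhs = expm(P.T)                  # exp(P^T) = exp(D^{-1} A)
rhs = D_inv @ expm(P) @ D        # = D^{-1} exp(A D^{-1}) D = D^{-1} exp(P) D
print(np.allclose(lhs, rhs))
```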
Another useful matrix exponential

P column stochastic, e.g. P = A^T D^{-1}, where A is the adjacency matrix.

If A is symmetric, for the negative normalized Laplacian −L = D^{-1/2} A D^{-1/2} − I:
exp(−L) = exp(D^{-1/2} A D^{-1/2} − I)
        = (1/e) exp(D^{-1/2} A D^{-1/2})
        = (1/e) D^{-1/2} exp(A D^{-1}) D^{1/2}
        = (1/e) D^{-1/2} exp(P) D^{1/2}

This is the heat kernel of a graph: it solves the heat equation dx(t)/dt = −L x(t) at t = 1.
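The heat-kernel identity exp(−L) = (1/e) D^{-1/2} exp(P) D^{1/2} can likewise be verified numerically (illustrative NumPy/SciPy sketch; the graph is made up):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # symmetric adjacency matrix
d = A.sum(axis=0)
D_sqrt, D_inv_sqrt = np.diag(np.sqrt(d)), np.diag(1.0 / np.sqrt(d))
P = A @ np.diag(1.0 / d)                    # column stochastic (A symmetric)

L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian
lhs = expm(-L)
rhs = (1.0 / np.e) * D_inv_sqrt @ expm(P) @ D_sqrt
print(np.allclose(lhs, rhs))
```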
Matrix exponentials on large networks. Is a single column interesting? Yes!

exp(P) ec = Σ_{k=0}^∞ (1/k!) P^k ec
Link prediction scores for node c. A community relative to node c.

But … modern networks are large ~ O(10^9) nodes, sparse ~ O(10^11) edges, constantly changing … and so we’d like speed over accuracy.
Newman’s netscience collaboration network: 379 vertices, 1828 non-zeros.

x = exp(P) ec
[Figure: ec has a single one at node c; x is “zero” on most nodes]
The issue with existing methods

We want good results in less than one matvec. Our graphs have small diameter and fast fill-in.

Krylov methods: a few matvecs, quick loss of sparsity due to orthogonality.
exp(P) ec ≈ ρ V exp(H) e1   [Sidje 1998, ExpoKit]

Direct expansion: a few matvecs, quick loss of sparsity due to fill-in.
exp(P) ec ≈ Σ_{k=0}^N (1/k!) P^k ec
Outline
1. Motivation and setup ✓
2. Converting x = exp(P) ec into a linear system
3. Relaxation methods for linear systems from large networks
4. Error analysis
5. Experiments
Our underlying method

Direct expansion:  x = exp(P) ec ≈ Σ_{k=0}^N (1/k!) P^k ec = xN

This method is stable for stochastic P: no cancellation, no unbounded norm, etc.

Lemma: ‖x − xN‖1 ≤ 1/(N! N)
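The lemma is easy to sanity-check numerically. An illustrative NumPy/SciPy sketch on a made-up four-node graph (the dense expm is only the reference answer):

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=0)          # column stochastic
N, c = 11, 0
ec = np.zeros(4); ec[c] = 1.0

term = ec.copy()               # term holds P^k ec / k!
xN = ec.copy()
for k in range(1, N + 1):
    term = P @ term / k
    xN += term

err = np.abs(expm(P) @ ec - xN).sum()   # ||x - xN||_1
bound = 1.0 / (factorial(N) * N)        # the lemma's bound
print(err <= bound)
```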
Our underlying method as a linear system

Direct expansion:  x = exp(P) ec ≈ Σ_{k=0}^N (1/k!) P^k ec = xN

The expansion is the solution of the block bidiagonal system

[ I                ] [ v0 ]   [ ec ]
[ −P/1  I          ] [ v1 ]   [ 0  ]
[      −P/2  ⋱     ] [ ⋮  ] = [ ⋮  ]
[            ⋱  I  ] [ ⋮  ]   [ ⋮  ]
[         −P/N  I  ] [ vN ]   [ 0  ]

with xN = Σ_{i=0}^N vi; compactly,  (I ⊗ I_N − S_N ⊗ P) v = e1 ⊗ ec.

Lemma: we approximate xN well if we approximate v well.
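For a tiny example the block system can be formed explicitly; an illustrative NumPy/SciPy sketch (dense, made-up graph) confirming that the solution blocks sum to the truncated Taylor approximation:

```python
import numpy as np
from scipy.linalg import expm

n, N = 4, 11
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=0)                     # column stochastic

# S has 1/k on the subdiagonal, so I - kron(S, P) is the block bidiagonal
# system: row k reads -(P/k) v_{k-1} + v_k = 0, and v_0 = ec.
S = np.zeros((N + 1, N + 1))
for k in range(1, N + 1):
    S[k, k - 1] = 1.0 / k

c = 0
ec = np.zeros(n); ec[c] = 1.0
b = np.kron(np.eye(N + 1)[0], ec)         # e1 (x) ec
M = np.eye((N + 1) * n) - np.kron(S, P)

v = np.linalg.solve(M, b)
xN = v.reshape(N + 1, n).sum(axis=0)      # xN = sum_i v_i
err = np.abs(expm(P) @ ec - xN).sum()
print(err)                                # truncation error only, about 1/(N! N)
```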
Our mission (2): approximately solve Ax = b when A, b are sparse and x is localized.
Outline
1. Motivation and setup ✓
2. Converting x = exp(P) ec into a linear system ✓
3. Relaxation methods for linear systems from large networks
4. Error analysis
5. Experiments
Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation & “push” methods

Be greedy. Don’t look at the whole system. Look at equations that are violated and try to fix them.
Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation & “push” methods

Algebraically:
  Ax = b
  r^(k) = b − A x^(k)
  x^(k+1) = x^(k) + ej ej^T r^(k)
  r^(k+1) = r^(k) − rj^(k) A ej

Procedurally:
  Solve(A,b)
    x = sparse(size(A,1),1)
    r = b
    while (1)
      pick j where r(j) != 0
      z = r(j)
      x(j) = x(j) + z
      for i where A(i,j) != 0
        r(i) = r(i) - z*A(i,j)
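An illustrative Python rendering of the procedural loop (dense and greedy, i.e. the Gauss-Southwell choice of j; it assumes diag(A) = I, as on the later terminology slide):

```python
import numpy as np

def relax_solve(A, b, tol=1e-10, max_steps=100_000):
    """Relaxation ('push') for Ax = b with diag(A) = I: repeatedly pick a
    violated equation j and fix it by updating x(j)."""
    x = np.zeros_like(b)
    r = b.astype(float).copy()
    for _ in range(max_steps):
        j = int(np.argmax(np.abs(r)))   # Gauss-Southwell: largest residual
        z = r[j]
        if abs(z) < tol:
            break
        x[j] += z                       # x^{k+1} = x^k + e_j e_j^T r^k
        r -= z * A[:, j]                # r^{k+1} = r^k - r_j^k A e_j
    return x

# Example system (I - alpha*P) x = e_c for a small column-stochastic P.
alpha = 0.85
Adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
P = Adj / Adj.sum(axis=0)
M = np.eye(3) - alpha * P               # diag(M) = I since P has a zero diagonal
b = np.array([1.0, 0.0, 0.0])
x = relax_solve(M, b)
print(np.abs(M @ x - b).max())          # residual after convergence
```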
It’s called the “push” method because of PageRank

(I − αP) x = v
r^(k) = v − (I − αP) x^(k)
x^(k+1) = x^(k) + ej ej^T r^(k)
“r^(k+1) = r^(k) − rj^(k) A ej” becomes

ri^(k+1) = 0                          if i = j
ri^(k+1) = ri^(k) + α P_{i,j} rj^(k)  if P_{i,j} ≠ 0
ri^(k+1) = ri^(k)                     otherwise

  PageRankPush(links,v,alpha)
    x = sparse(size(A,1),1)
    r = v
    while (1)
      pick j where r(j) != 0
      z = r(j)
      x(j) = x(j) + z
      r(j) = 0
      z = alpha * z / deg(j)
      for i where “j links to i”
        r(i) = r(i) + z
Demo
Justification of terminology

This method is frequently “rediscovered” (3 times for PageRank!)

Let Ax = b, diag(A) = I.
It’s Gauss-Seidel if j is chosen cyclically.
It’s Gauss-Southwell if j is the largest entry in the residual.
It’s coordinate descent if A is symmetric, pos. definite.
It’s a relaxation step for any A.

Works great for other problems too! [Bonchi, Gleich, et al. J. Internet Math. 2012]
Back to the exponential

[ I                ] [ v0 ]   [ ec ]
[ −P/1  I          ] [ v1 ]   [ 0  ]
[      −P/2  ⋱     ] [ ⋮  ] = [ ⋮  ]
[            ⋱  I  ] [ ⋮  ]   [ ⋮  ]
[         −P/N  I  ] [ vN ]   [ 0  ]

xN = Σ_{i=0}^N vi,   (I ⊗ I_N − S_N ⊗ P) v = e1 ⊗ ec

Solve this system via the same method.
Optimization 1: build the system implicitly.
Optimization 2: don’t store vi, just store the sum xN.
Code (inefficient, but working) for Gauss-Southwell to solve:

function x = nexpm(P,c,tol)
n = size(P,1); N = 11; sumr = 1;
r = zeros(n,N+1); r(c,1) = 1; x = zeros(n,1);  % the residual and solution
while sumr >= tol                     % use max iteration too
  [ml,q] = max(r(:)); i = mod(q-1,n)+1; k = ceil(q/n);  % use a heap in practice for max
  r(q) = 0; x(i) = x(i)+ml; sumr = sumr-ml;    % zero the residual, add to solution
  [nset,~,vals] = find(P(:,i)); ml = ml/k;     % look up the neighbors of node i
  for j=1:numel(nset)                 % for all neighbors
    if k==N, x(nset(j)) = x(nset(j)) + vals(j)*ml;       % add to solution
    else, r(nset(j),k+1) = r(nset(j),k+1) + vals(j)*ml;  % or add to next residual
      sumr = sumr + vals(j)*ml;
    end
  end
end

Todo: use a dictionary for x, r and use a heap or queue for the residual.
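For readers without MATLAB, here is an illustrative Python port of the same sketch, with the same inefficiencies and the same caveats (dense residual array, linear-scan max instead of a heap; the four-node test graph is made up):

```python
import numpy as np
from scipy.linalg import expm

def nexpm(P, c, tol, max_steps=1_000_000):
    """Gauss-Southwell on the implicit block system for exp(P) e_c.

    Columns of r are the residual blocks for v_0 .. v_{N-1}; pushes out of
    the last block go straight into the solution, as in the MATLAB sketch."""
    n = P.shape[0]
    N = 11
    r = np.zeros((n, N))              # residual blocks
    r[c, 0] = 1.0
    x = np.zeros(n)                   # solution accumulator
    sumr = 1.0
    for _ in range(max_steps):        # max-iteration guard
        if sumr < tol:
            break
        i, k = np.unravel_index(np.argmax(r), r.shape)  # use a heap in practice
        ml = r[i, k]
        r[i, k] = 0.0
        x[i] += ml                    # zero the residual, add to the solution
        sumr -= ml
        ml /= k + 1                   # next block's equation is v_{k+1} = P v_k/(k+1)
        if k == N - 1:
            x += P[:, i] * ml         # last block: add directly to the solution
        else:
            r[:, k + 1] += P[:, i] * ml   # push to the next residual block
            sumr += P[:, i].sum() * ml
    return x

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=0)                 # column stochastic
x = nexpm(P, 0, 1e-7)
err = np.abs(x - expm(P) @ np.eye(4)[0]).sum()
print(err)                            # small: truncation plus the residual tolerance
```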
Outline
1. Motivation and setup ✓
2. Converting x = exp(P) ec into a linear system ✓
3. Relaxation methods for linear systems from large networks ✓
4. Error analysis
5. Experiments
Error analysis for Gauss-Southwell

(I ⊗ I_N − S_N ⊗ P) v = e1 ⊗ ec

Theorem. Assume P is column-stochastic and v^(0) = 0. Then:
(Nonnegativity, the “easy” part) the iterates and residuals are nonnegative: v^(l) ≥ 0 and r^(l) ≥ 0.
(Convergence, the “annoying” part) the residual goes to 0:
  ‖r^(l)‖1 ≤ ∏_{k=1}^{l} (1 − 1/(2dk)) ≤ l^(−1/(2d)),
where d is the largest degree.

Proof sketch: Gauss-Southwell picks the largest residual ⇒ bound the update by the average number of nonzeros in the residual (sloppy) ⇒ algebraic convergence with a slow rate, but each update is REALLY fast: O(d_max log n).

If d is log log n, then our method runs in sub-linear time (but so does just about anything).
Overall error analysis

Components: truncation to N terms; residual to error; approximate solve.

Theorem. After ℓ steps of Gauss-Southwell,
  ‖xN^(ℓ) − x‖1 ≤ 1/(N! N) + (1/e) · ℓ^(−1/(2d))
More recent error analysis

Theorem (Gleich and Kloster, 2013, arXiv:1310.3423). Consider solving personalized PageRank using the Gauss-Southwell relaxation method in a graph with a Zipf law in the degrees with exponent p = 1 and max-degree d; then the work involved in getting a solution with 1-norm error ε is
  work = O( log(1/ε) · (1/ε)^(3/2) · d^2 · (log d)^2 )
Outline
1. Motivation and setup ✓
2. Converting x = exp(P) ec into a linear system ✓
3. Relaxation methods for linear systems from large networks ✓
4. Error analysis ✓
5. Experiments
Our implementations

C++ mex implementation with a heap to implement Gauss-Southwell.
C++ mex implementation with a queue to store all residual entries ≥ 1/(tol nN); at completion, the residual norm ≤ tol.
We use the queue except for the runtime comparison.
Accuracy vs. tolerance

For the pgp social graph (pgp-cc, 10k vertices), we study the precision in finding the 100 largest nodes as we vary the tolerance. This set of 100 does not include the node’s immediate neighbors. (Boxplot over 50 trials)
[Plot: precision at 100 vs. log10 of residual tolerance, from −2 to −7]
Accuracy vs. work

For the dblp collaboration graph (dblp-cc, 225k vertices), we study the precision in finding the 100 largest nodes as we vary the work. This set of 100 does not include the node’s immediate neighbors. (One column, but representative)
[Plot: precision at 10, 25, 100, and 1000 vs. effective matrix-vector products, for tol = 10^−4 and tol = 10^−5]
Runtime

[Plot: runtime in seconds vs. |E| + |V| for TSGS, TSGSQ, EXPV, MEXPV, and TAYLOR on the Flickr social network, 500k nodes, 5M edges]
Outline
1. Motivation and setup ✓
2. Converting x = exp(P) ec into a linear system ✓
3. Coordinate descent methods for linear systems from large networks ✓
4. Error analysis ✓
5. Experiments ✓
References and ongoing work
Kloster and Gleich, Workshop on Algorithms for the Web-graph, 2013. Also see the journal version on arXiv. www.cs.purdue.edu/homes/dgleich/codes/nexpokit
• Error analysis using the queue (almost done …)
• Better linear systems for faster convergence
• Asynchronous coordinate descent methods
• Scaling up to billion node graphs (done …)
Supported by NSF CAREER 1149756-CCF www.cs.purdue.edu/homes/dgleich