Fast relaxation methods for the matrix exponential


Uploaded by david-gleich on 15-Jan-2015

DESCRIPTION

The matrix exponential is a matrix-computation primitive used in link prediction and community detection. We describe a fast method to compute it using relaxation on a large linear system of equations. This enables us to compute a column of the matrix exponential in sublinear time, or under a second on a standard desktop computer.

TRANSCRIPT

Page 1: Fast relaxation methods for the matrix exponential

Relaxation methods for the matrix exponential on large networks

David F. Gleich, Purdue University

Joint work with Kyle Kloster @ Purdue, supported by NSF CAREER 1149756-CCF

Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit

Page 2: Fast relaxation methods for the matrix exponential

Models and algorithms for high-performance matrix and network computations

[Figure 4.5 from Constantine, Gleich, Hou, and Templeton: Error in the reduced-order model compared to the prediction standard deviation for one realization of the bubble locations at the final time, for two values of the bubble radius, s = 0.39 and s = 1.95 cm. (Colors are visible in the electronic version.)]

…the varying conductivity fields took approximately twenty minutes to construct using Cubit after substantial optimizations.

Working with the simulation data involved a few pre- and post-processing steps: interpret 4 TB of Exodus II files from Aria, globally transpose the data, compute the TSSVD, and compute predictions and errors. The preprocessing steps took approximately 8-15 hours. We collected precise timing information, but we do not report it as these times are from a multi-tenant, unoptimized Hadoop cluster where other jobs with sizes ranging between 100 GB and 2 TB of data sometimes ran concurrently. Also, during our computations, we observed failures in hard disk drives and issues causing entire nodes to fail. Given that the cluster has 40 cores, there was at most 2400 cpu-hours consumed via these calculations—compared to the 131,072 hours it took to compute 4096 heat transfer simulations on Red Sky. Thus, evaluating the ROM was about 50 times faster than computing a full simulation.

We used 20,000 reducers to convert the Exodus II simulation data. This choice determined how many map tasks each subsequent step utilized—around 33,000. We also found it advantageous to store matrices in blocks of about 16 MB per record. The reduction in the data enabled us to use a laptop to compute the coefficients of the ROM and apply to the far face for the UQ study in Section 4.4.

Here are a few pertinent challenges we encountered while performing this study. Generating 8192 meshes with different material properties and running independent…

Tensor eigenvalues and a power method

Tensor methods for network alignment

Network alignment is the problem of computing an approximate isomorphism between two networks. In collaboration with Mohsen Bayati, Amin Saberi, Ying Wang, and Margot Gerritsen, the PI has developed a state-of-the-art belief propagation method (Bayati et al., 2009).

FIGURE 6 – Previous work from the PI tackled network alignment with matrix methods for edge overlap; this proposal is for matching triangles using tensor methods. If x_i, x_j, and x_k are indicators associated with the edges (i, i'), (j, j'), and (k, k'), then we want to include the product x_i x_j x_k in the objective, yielding a tensor problem.

We propose to study tensor methods to perform network alignment with triangle and other higher-order graph moment matching. Similar ideas were proposed by Svab (2007); Chertok and Keller (2010) also proposed using triangles to aid in network alignment problems. In Bayati et al. (2011), we found that triangles were a key missing component in a network alignment problem with a known solution. Given that preserving a triangle requires three edges between two graphs, this yields a tensor problem:

maximize  \sum_{i \in L} w_i x_i + \sum_{i \in L} \sum_{j \in L} x_i x_j S_{i,j} + \underbrace{\sum_{i \in L} \sum_{j \in L} \sum_{k \in L} x_i x_j x_k T_{i,j,k}}_{\text{triangle overlap term}}

subject to  x is a matching.

Here, T_{i,j,k} = 1 when the edges corresponding to i, j, and k in L result in a triangle in the induced matching. Maximizing this objective is an intractable problem. We plan to investigate a heuristic based on a rank-1 approximation of the tensor T and a maximum-weight-matching-based rounding. Similar heuristics have been useful in other matrix-based network alignment algorithms (Singh et al., 2007; Bayati et al., 2009). The work involves enhancing the Symmetric-Shifted-Higher-Order Power Method due to Kolda and Mayo (2011) to incredibly large and sparse tensors. On this aspect, we plan to collaborate with Tamara G. Kolda. In an initial evaluation of this triangle matching on synthetic problems, using the tensor rank-1 approximation alone produced results that identified the correct solution whereas all matrix approaches could not.

vision for the future

All of these projects fit into the PI's vision for modernizing the matrix-computation paradigm to match the rapidly evolving space of network computations. This vision extends beyond the scope of the current proposal. For example, the web is a huge network with over one trillion unique URLs (Alpert and Hajaj, 2008), and search engines have indexed over 180 billion of them (Cuil, 2009). Yet, why do we need to compute with the entire network? By way of analogy, note that we do not often solve partial differential equations or model macro-scale physics by explicitly simulating the motion or interaction of elementary particles. We need something equivalent for the web and other large networks. Such investigations may take many forms: network models, network geometry, or network model reduction. It is the vision of the PI that the language, algebra, and methodology of matrix computations will…

maximize  \sum_{ijk} T_{ijk} x_i x_j x_k
subject to  \|x\|_2 = 1

The human protein interaction network has 48,228 triangles; the yeast protein interaction network has 257,978 triangles. The tensor T has ~100,000,000,000 nonzeros. We work with it implicitly.

[x^{(\text{next})}]_i = \rho \cdot \Bigl( \sum_{jk} T_{ijk} x_j x_k + \gamma x_i \Bigr)

where \rho ensures the 2-norm constraint and \gamma is the shift.

SSHOPM method due to Kolda and Mayo

Simulation data analysis: SIMAX '09, SISC '11, MapReduce '11, ICASSP '12
Network alignment: ICDM '09, SC '11, TKDE '13
Fast & scalable network centrality: SC '05, WAW '07, SISC '10, WWW '10, …
Data clustering: WSDM '12, KDD '12, CIKM '13, …

Ax = b, \quad \min \|Ax - b\|, \quad Ax = \lambda x

Massive matrix computations on multi-threaded and distributed architectures

Page 3: Fast relaxation methods for the matrix exponential

Image from rockysprings, deviantart, CC share-alike

Everything in the world can be explained by a matrix, and we see how deep the rabbit hole goes. The talk ends, you believe -- whatever you want to.

Page 4: Fast relaxation methods for the matrix exponential

Matrix exponentials

exp(A) is defined as

exp(A) = \sum_{k=0}^{\infty} \frac{1}{k!} A^k    (always converges)

\frac{dx}{dt} = Ax(t) \;\Leftrightarrow\; x(t) = \exp(tA)\, x(0)    (evolution operator for an ODE)

A is n \times n, real.

exp(A) is a special case of a function of a matrix f(A); others are f(x) = 1/x, f(x) = \sinh(x), …
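As a quick illustration (my check, not from the slides), the truncated series can be compared with MATLAB's expm on a small dense matrix:

A = randn(5);                 % small dense test matrix
X = eye(5); term = eye(5);
for k = 1:30                  % truncate the series at 30 terms
    term = term*A/k;          % term now holds A^k / k!
    X = X + term;
end
norm(X - expm(A))             % near machine precision for modest norm(A)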

Page 5: Fast relaxation methods for the matrix exponential

This talk: a column of the matrix exponential

x = exp(P) e_c

x    the solution
P    the matrix
e_c  the column

Page 6: Fast relaxation methods for the matrix exponential


Matrix computations in a red-pill

Solve a problem better by exploiting its structure!

Page 7: Fast relaxation methods for the matrix exponential

This talk: a column of the matrix exponential

x = exp(P) e_c

x    the solution (localized)
P    the matrix (large, sparse, stochastic)
e_c  the column

Page 8: Fast relaxation methods for the matrix exponential

Localized solutions

[Plots: plot(x) for x = exp(P) e_c, where length(x) = 513,969 and nnz(x) = 513,969 but most entries are negligibly small, alongside a log-log plot of error versus number of nonzeros retained.]

Page 9: Fast relaxation methods for the matrix exponential

Our mission: find the solution with work roughly proportional to the localization, not the matrix.

Page 10: Fast relaxation methods for the matrix exponential

Our algorithm: www.cs.purdue.edu/homes/dgleich/codes/nexpokit

[Log-log plot of error versus number of nonzeros retained for our algorithm, on the same axes as the previous slide.]

Page 11: Fast relaxation methods for the matrix exponential

Outline

1.  Motivation and setup
2.  Converting x = exp(P) e_c into a linear system
3.  Relaxation methods for linear systems from large networks
4.  Error analysis
5.  Experiments

Page 12: Fast relaxation methods for the matrix exponential

SIAM REVIEW, Vol. 20, No. 4, October 1978. © Society for Industrial and Applied Mathematics

NINETEEN DUBIOUS WAYS TO COMPUTE THE EXPONENTIAL OF A MATRIX

CLEVE MOLER AND CHARLES VAN LOAN

[Scan of the first page of the 1978 paper; its abstract and introduction match the 2003 reprint transcribed below.]

SIAM REVIEW, Vol. 45, No. 1, pp. 3–49. © 2003 Society for Industrial and Applied Mathematics

Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later*

Cleve Moler† and Charles Van Loan‡

Abstract. In principle, the exponential of a matrix could be computed in many ways. Methods involving approximation theory, differential equations, the matrix eigenvalues, and the matrix characteristic polynomial have been proposed. In practice, consideration of computational stability and efficiency indicates that some of the methods are preferable to others but that none are completely satisfactory.

Most of this paper was originally published in 1978. An update, with a separate bibliography, describes a few recent developments.

Key words. matrix, exponential, roundoff error, truncation error, condition

AMS subject classifications. 15A15, 65F15, 65F30, 65L99

PII. S0036144502418010

1. Introduction. Mathematical models of many physical, biological, and economic processes involve systems of linear, constant coefficient ordinary differential equations

ẋ(t) = Ax(t).

Here A is a given, fixed, real or complex n-by-n matrix. A solution vector x(t) is sought which satisfies an initial condition

x(0) = x_0.

In control theory, A is known as the state companion matrix and x(t) is the system response.

In principle, the solution is given by x(t) = e^{tA} x_0, where e^{tA} can be formally defined by the convergent power series

e^{tA} = I + tA + \frac{t^2 A^2}{2!} + \cdots.

The effective computation of this matrix function is the main topic of this survey.

*Published electronically February 3, 2003. A portion of this paper originally appeared in SIAM Review, Volume 20, Number 4, 1978, pages 801–836. http://www.siam.org/journals/sirev/45-1/41801.html
†The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 ([email protected]).
‡Department of Computer Science, Cornell University, 4130 Upson Hall, Ithaca, NY 14853-7501 ([email protected]).


Page 13: Fast relaxation methods for the matrix exponential

Matrix exponentials on large networks

exp(A) = \sum_{k=0}^{\infty} \frac{1}{k!} A^k

If A is the adjacency matrix, then A^k counts the number of length-k walks between node pairs [Estrada 2000; Farahat et al. 2002, 2006]. Large entries denote important nodes or edges. Used for link prediction and centrality.

exp(P) = \sum_{k=0}^{\infty} \frac{1}{k!} P^k

If P is a transition matrix, then P^k gives the probabilities of length-k walks between node pairs [Kondor & Lafferty 2002; Kunegis & Lommatzsch 2009; Chung 2007]. Used for link prediction, kernels, and clustering or community detection.
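A tiny check of the walk-counting claim (my example, not from the talk):

A = [0 1 0; 1 0 1; 0 1 0];    % path graph 1-2-3
A2 = A^2                       % A2(1,3) = 1: the single length-2 walk 1-2-3
                               % A2(2,2) = 2: the walks 2-1-2 and 2-3-2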

Page 14: Fast relaxation methods for the matrix exponential

Another useful matrix exponential

P column stochastic, e.g., P = A^T D^{-1} where A is the adjacency matrix. If A is symmetric,

exp(P^T) = \exp(D^{-1} A) = D^{-1} \exp(A D^{-1})\, D = D^{-1} \exp(P)\, D
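A minimal numerical check of this identity (my sketch, not from the slides; it assumes A is symmetric with no isolated nodes, so D is invertible):

A = [0 1 1; 1 0 1; 1 1 0];     % small symmetric adjacency matrix
D = diag(sum(A,1));
P = A'/D;                      % P = A^T D^{-1}, column stochastic
lhs = expm(P');                % exp(D^{-1} A)
rhs = D \ expm(A/D) * D;       % D^{-1} exp(A D^{-1}) D
norm(lhs - rhs)                % agrees to machine precision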

Page 15: Fast relaxation methods for the matrix exponential

Another useful matrix exponential

P column stochastic, e.g., P = A^T D^{-1} where A is the adjacency matrix. If A is symmetric,

exp(-L) = \exp(D^{-1/2} A D^{-1/2} - I)
        = \frac{1}{e} \exp(D^{-1/2} A D^{-1/2})
        = \frac{1}{e} D^{-1/2} \exp(A D^{-1})\, D^{1/2}
        = \frac{1}{e} D^{-1/2} \exp(P)\, D^{1/2}

Here -L is the negative normalized Laplacian, and exp(-L) is the heat kernel of a graph: it solves the heat equation dx(t)/dt = -L x(t) at t = 1.
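The same kind of small check works for the heat-kernel identity (my sketch, same assumptions as before):

A = [0 1 1; 1 0 1; 1 1 0];
D = diag(sum(A,1)); Dh = sqrt(D);
L = eye(3) - Dh\A/Dh;                     % normalized Laplacian I - D^{-1/2} A D^{-1/2}
lhs = expm(-L);
rhs = (1/exp(1)) * (Dh \ expm(A/D) * Dh); % (1/e) D^{-1/2} exp(A D^{-1}) D^{1/2}
norm(lhs - rhs)                           % agrees to machine precision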

Page 16: Fast relaxation methods for the matrix exponential

Matrix exponentials on large networks: is a single column interesting? Yes!

exp(P) e_c = \sum_{k=0}^{\infty} \frac{1}{k!} P^k e_c

gives link prediction scores for node c, or a community relative to node c.

But modern networks are large (~O(10^9) nodes), sparse (~O(10^{11}) edges), and constantly changing, and so we'd like speed over accuracy.

Page 17: Fast relaxation methods for the matrix exponential

Newman's netscience collaboration network: 379 vertices, 1828 non-zeros.

x = exp(P) e_c

e_c has a single one here; x is "zero" on most nodes.

Page 18: Fast relaxation methods for the matrix exponential

The issue with existing methods

We want good results in less than one matvec. Our graphs have small diameter and fast fill-in.

Krylov methods: a few matvecs, quick loss of sparsity due to orthogonality.

exp(P) e_c \approx \rho\, V \exp(H)\, e_1    [Sidje 1998, ExpoKit]

Direct expansion: a few matvecs, quick loss of sparsity due to fill-in.

exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c

Page 19: Fast relaxation methods for the matrix exponential

Outline

1.  Motivation and setup
2.  Converting x = exp(P) e_c into a linear system
3.  Relaxation methods for linear systems from large networks
4.  Error analysis
5.  Experiments

Page 20: Fast relaxation methods for the matrix exponential

Our underlying method

Direct expansion: a few matvecs, quick loss of sparsity due to fill-in. This method is stable for stochastic P: no cancellation, unbounded norm, etc.

x = \exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N

Lemma. \|x - x_N\|_1 \le \frac{1}{N!\,N}
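In code, x_N needs only N sparse matvecs; a minimal sketch (my code; assumes P is sparse and column stochastic, c is the seed node, and N = 11 as in the implementation shown later):

N = 11; ec = zeros(size(P,1),1); ec(c) = 1;
xN = ec; term = ec;
for k = 1:N
    term = (P*term)/k;        % term = P^k e_c / k!
    xN = xN + term;           % accumulate the truncated Taylor sum
end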

Page 21: Fast relaxation methods for the matrix exponential

Our underlying method as a linear system

Direct expansion:

x = \exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N

\begin{bmatrix} I \\ -P/1 & I \\ & -P/2 & \ddots \\ & & \ddots & I \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad x_N = \sum_{i=0}^{N} v_i

or compactly, (I \otimes I_N - S_N \otimes P)\, v = e_1 \otimes e_c.

Lemma. We approximate x_N well if we approximate v well.
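A minimal sketch (my construction, not code from the talk) that builds the block system explicitly for a small graph and checks \sum_i v_i against expm; it assumes, as in our paper, that S_N has 1/k in its (k+1, k) entry:

n = 4; N = 11;
A = [0 1 1 0; 1 0 1 0; 1 1 0 1; 0 0 1 0];   % small undirected graph
P = A ./ sum(A,1);                           % column stochastic
c = 1; ec = zeros(n,1); ec(c) = 1;
S = spdiags(1./(1:N+1)', -1, N+1, N+1);      % S(k+1,k) = 1/k
M = speye((N+1)*n) - kron(S, P);             % I - S_N (kron) P
v = M \ kron([1; zeros(N,1)], ec);           % right-hand side e_1 (kron) e_c
xN = sum(reshape(v, n, N+1), 2);             % x_N = sum_i v_i
norm(xN - expm(full(P))*ec, 1)               % bounded by the 1/(N! N) lemma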

Page 22: Fast relaxation methods for the matrix exponential

Our mission (2): approximately solve Ax = b when A and b are sparse and x is localized.

Page 23: Fast relaxation methods for the matrix exponential

Outline

1.  Motivation and setup
2.  Converting x = exp(P) e_c into a linear system
3.  Relaxation methods for linear systems from large networks
4.  Error analysis
5.  Experiments

Page 24: Fast relaxation methods for the matrix exponential

Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation & “push” methods

Be greedy: don't look at the whole system. Look at equations that are violated and try to fix them.

Page 25: Fast relaxation methods for the matrix exponential

Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation & “push” methods

Algebraically:

Ax = b
r^{(k)} = b - A x^{(k)}
x^{(k+1)} = x^{(k)} + e_j e_j^T r^{(k)}
r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j

Procedurally (the slide's pseudocode, tightened into runnable MATLAB):

function x = Solve(A,b)
x = sparse(size(A,1),1);
r = b;
while any(r)                    % in practice, stop when norm(r,1) < tol
    j = find(r,1);              % pick j where r(j) ~= 0
    z = r(j);
    x(j) = x(j) + z;            % relax equation j
    for i = find(A(:,j))'       % for i where A(i,j) ~= 0
        r(i) = r(i) - z*A(i,j); % update the residual
    end
end

Page 26: Fast relaxation methods for the matrix exponential

It’s called the “push” method because of PageRank

(I - \alpha P)\, x = v

r^{(k)} = v - (I - \alpha P)\, x^{(k)}
x^{(k+1)} = x^{(k)} + e_j e_j^T r^{(k)}
“r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j”, which entrywise is

r_i^{(k+1)} = \begin{cases} 0 & i = j \\ r_i^{(k)} + \alpha P_{i,j}\, r_j^{(k)} & P_{i,j} \ne 0 \\ r_i^{(k)} & \text{otherwise} \end{cases}

PageRankPush(links, v, alpha)
  x = sparse(n,1)
  r = v
  while (1)
      pick j where r(j) != 0
      z = r(j)
      x(j) = x(j) + z
      r(j) = 0
      z = alpha * z / deg(j)
      for i where "j links to i"
          r(i) = r(i) + z
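For reference, a minimal runnable version of that pseudocode (my sketch; the function name, explicit work queue, and tol threshold are illustrative, not from the talk). It assumes A is the adjacency matrix and P = A^T D^{-1}:

function x = pagerank_push(A, v, alpha, tol)
% Approximately solve (I - alpha*P) x = v by pushing residual mass to out-neighbors.
n = size(A,1);
d = full(sum(A,2));             % out-degrees
x = zeros(n,1);
r = v;                          % residual starts at the right-hand side
Q = find(r > tol);              % queue of violated equations
while ~isempty(Q)
    j = Q(1); Q(1) = [];        % pop (a real queue or heap is faster)
    z = r(j);
    if z <= tol, continue; end  % skip stale queue entries (defensive)
    x(j) = x(j) + z;            % relax equation j
    r(j) = 0;
    z = alpha * z / d(j);
    for i = find(A(j,:))        % push to j's out-neighbors
        if r(i) <= tol && r(i) + z > tol
            Q(end+1) = i;       % i just became violated; enqueue it
        end
        r(i) = r(i) + z;
    end
end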

Page 27: Fast relaxation methods for the matrix exponential

It’s called the “push” method because of PageRank


Demo

Page 28: Fast relaxation methods for the matrix exponential

Justification of terminology

This method is frequently "rediscovered" (three times for PageRank!).

Let Ax = b with diag(A) = I. It's Gauss-Seidel if j is chosen cyclically. It's Gauss-Southwell if j is the largest entry in the residual. It's coordinate descent if A is symmetric, positive definite. It's a relaxation step for any A.

Works great for other problems too! [Bonchi, Gleich, et al., J. Internet Math. 2012]

Page 29: Fast relaxation methods for the matrix exponential

Back to the exponential

\begin{bmatrix} I \\ -P/1 & I \\ & -P/2 & \ddots \\ & & \ddots & I \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad x_N = \sum_{i=0}^{N} v_i, \qquad (I \otimes I_N - S_N \otimes P)\, v = e_1 \otimes e_c

Solve this system via the same method.
Optimization 1: build the system implicitly.
Optimization 2: don't store the v_i, just store the sum x_N.

Page 30: Fast relaxation methods for the matrix exponential

Code (inefficient, but working) for Gauss-Southwell to solve x = exp(P) e_c:

function x = nexpm(P,c,tol)
n = size(P,1); N = 11; sumr = 1;
r = zeros(n,N+1); r(c,1) = 1;           % the residual (one column per term)
x = zeros(n,1);                          % the solution
while sumr >= tol                        % use a max-iteration cap too
    [ml,q] = max(r(:));                  % use a heap in practice for the max
    i = mod(q-1,n)+1; k = ceil(q/n);
    r(q) = 0; x(i) = x(i) + ml;          % zero the residual, add to solution
    sumr = sumr - ml;
    [nset,~,vals] = find(P(:,i));        % look up the neighbors of node i
    ml = ml/k;
    for j = 1:numel(nset)                % for all neighbors
        if k == N
            x(nset(j)) = x(nset(j)) + vals(j)*ml;          % add to solution
        else
            r(nset(j),k+1) = r(nset(j),k+1) + vals(j)*ml;  % or add to the next residual
            sumr = sumr + vals(j)*ml;
        end
    end
end

Todo: use a dictionary for x, r and use a heap or queue for the residual.
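A quick way to exercise it (my example; assumes P is sparse and column stochastic):

A = [0 1 1; 1 0 1; 1 1 0];
P = sparse(A ./ sum(A,1));
x = nexpm(P, 1, 1e-6);               % approximate column 1 of exp(P)
norm(x - expm(full(P))*[1;0;0], 1)   % small: about tol plus the truncation error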

Page 31: Fast relaxation methods for the matrix exponential

Outline

1.  Motivation and setup
2.  Converting x = exp(P) e_c into a linear system
3.  Relaxation methods for linear systems from large networks
4.  Error analysis
5.  Experiments

Page 32: Fast relaxation methods for the matrix exponential

Error analysis for Gauss-Southwell

Theorem. Assume P is column-stochastic and v^{(0)} = 0, and apply Gauss-Southwell to (I \otimes I_N - S_N \otimes P)\, v = e_1 \otimes e_c.

(Nonnegativity, the "easy" part) Iterates and residuals are nonnegative: v^{(l)} \ge 0 and r^{(l)} \ge 0.

(Convergence, the "annoying" part) The residual goes to 0:

\|r^{(l)}\|_1 \le \prod_{k=1}^{l} \Bigl(1 - \frac{1}{2dk}\Bigr) \le l^{-1/(2d)}

where d is the largest degree.
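The final inequality is the standard bound 1 - t \le e^{-t} together with \sum_{k=1}^{l} 1/k \ge \ln l (a one-line check, not on the slide):

\prod_{k=1}^{l}\left(1-\frac{1}{2dk}\right)
\le \exp\left(-\frac{1}{2d}\sum_{k=1}^{l}\frac{1}{k}\right)
\le \exp\left(-\frac{\ln l}{2d}\right)
= l^{-1/(2d)}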

Page 33: Fast relaxation methods for the matrix exponential

Proof sketch

Gauss-Southwell picks the largest residual
⇒ bound the update by the average number of nonzeros in the residual (sloppy)
⇒ algebraic convergence with a slow rate, but each update is REALLY fast: O(d_max log n).

If d is log log n, then our method runs in sub-linear time (but so does just about anything).

Page 34: Fast relaxation methods for the matrix exponential

Overall error analysis

Components: truncation to N terms; residual to error; approximate solve.

Theorem. After \ell steps of Gauss-Southwell,

\|x_N^{(\ell)} - x\|_1 \le \frac{1}{N!\,N} + \frac{1}{e} \cdot \ell^{-1/(2d)}

Page 35: Fast relaxation methods for the matrix exponential

More recent error analysis

Theorem (Gleich and Kloster, 2013, arXiv:1310.3423). Consider solving personalized PageRank using the Gauss-Southwell relaxation method in a graph with a Zipf law in the degrees with exponent p = 1 and max degree d. Then the work involved in getting a solution with 1-norm error \varepsilon is

work = O\Bigl( \log\bigl(\tfrac{1}{\varepsilon}\bigr)\, \bigl(\tfrac{1}{\varepsilon}\bigr)^{3/2} d^2 (\log d)^2 \Bigr)

Page 36: Fast relaxation methods for the matrix exponential

Outline

1.  Motivation and setup
2.  Converting x = exp(P) e_c into a linear system
3.  Relaxation methods for linear systems from large networks
4.  Error analysis
5.  Experiments

Page 37: Fast relaxation methods for the matrix exponential

Our implementations

C++ mex implementation with a heap to implement Gauss-Southwell. C++ mex implementation with a queue that stores all residual entries ≥ tol/(nN): since each of the at most nN dropped residual entries is below tol/(nN), the residual norm at completion is ≤ tol. We use the queue except for the runtime comparison.

Page 38: Fast relaxation methods for the matrix exponential

Accuracy vs. tolerance

[Boxplot: precision at 100 versus log10 of residual tolerance, from -2 to -7.]

For the pgp social graph (pgp-cc, 10k vertices), we study the precision in finding the 100 largest nodes as we vary the tolerance. This set of 100 does not include the node's immediate neighbors. (Boxplot over 50 trials.)

Page 39: Fast relaxation methods for the matrix exponential

Accuracy vs. work

For the dblp collaboration graph (dblp-cc, 225k vertices), we study the precision in finding the 100 largest nodes as we vary the work. This set of 100 does not include the node's immediate neighbors. (One column, but representative.)

[Plot: precision at 10, 25, 100, and 1000 versus effective matrix-vector products, for tol = 10^-4 and tol = 10^-5.]

Page 40: Fast relaxation methods for the matrix exponential

Runtime

[Log-log plot: runtime in seconds versus |E| + |V| for TSGS, TSGSQ, EXPV, MEXPV, and TAYLOR; the largest graph is the Flickr social network with 500k nodes and 5M edges.]

Page 41: Fast relaxation methods for the matrix exponential

Outline

1.  Motivation and setup
2.  Converting x = exp(P) e_c into a linear system
3.  Coordinate descent methods for linear systems from large networks
4.  Error analysis
5.  Experiments

Page 42: Fast relaxation methods for the matrix exponential

References and ongoing work

Kloster and Gleich, Workshop on Algorithms for the Web-graph, 2013. Also see the journal version on arXiv. www.cs.purdue.edu/homes/dgleich/codes/nexpokit

•  Error analysis using the queue (almost done …)
•  Better linear systems for faster convergence
•  Asynchronous coordinate descent methods
•  Scaling up to billion-node graphs (done …)

Supported by NSF CAREER 1149756-CCF www.cs.purdue.edu/homes/dgleich