The power and Arnoldi methods in an algebra of circulants

David F. Gleich (Purdue) CCAM Seminar 1 / 29 The power and Arnoldi methods in an algebra of circulants David F. Gleich Computer Science Purdue University CCAM Seminar April 19th, 2013 In collaboration with Chen Greif and Jim Varah (UBC) Supported by a research grant from NSERC and the Sandia National Labs John von Neumann fellowship

Upload: david-gleich

Post on 15-Jan-2015


DESCRIPTION

My talk from the CCAM seminar on April 19th on our NLA paper with Chen Greif and Jim Varah (http://dx.doi.org/10.1002/nla.1845)

TRANSCRIPT

Page 1: The power and Arnoldi methods in an algebra of circulants


David F. Gleich (Purdue) CCAM Seminar 1 / 29

The power and Arnoldi methods in an algebra of circulants

David F. Gleich

Computer Science, Purdue University

CCAM Seminar, April 19th, 2013

In collaboration with Chen Greif and Jim Varah (UBC)

Supported by a research grant from NSERC and the Sandia National Labs John von Neumann fellowship

Page 2: The power and Arnoldi methods in an algebra of circulants


Introduction

Kilmer, Martin, and Perrone (2008) presented a circulant algebra: a set of operations that generalize matrix algebra to three-way data and provided an SVD.

The essence of this approach amounts to viewing three-dimensional objects as two-dimensional arrays (i.e., matrices) of one-dimensional arrays (i.e., vectors).

Braman (2010) developed spectral and other decompositions.

We have extended this algebra with the ingredients required for iterative methods such as the power method and Arnoldi method, and have characterized the behavior of these algorithms.

Page 3: The power and Arnoldi methods in an algebra of circulants


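A look at the power method

Require $A$, $x^{(0)}$, $\tau$
$x^{(0)} \leftarrow x^{(0)} \|x^{(0)}\|^{-1}$
for $k = 1, \ldots,$ until convergence
    $y^{(k)} \leftarrow A x^{(k-1)}$
    $\alpha^{(k)} \leftarrow \|y^{(k)}\|$
    $x^{(k)} \leftarrow y^{(k)} (\alpha^{(k)})^{-1}$
    if $\|\mathrm{sign}(x_1^{(k)})\, x^{(k)} - \mathrm{sign}(x_1^{(k-1)})\, x^{(k-1)}\| < \tau$
        return $x^{(k)}$
    end if
end for

Require a scalar inverse, norm, absolute value, ...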
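As a point of reference, the pseudocode above can be sketched for ordinary real matrices in NumPy. The function name `power_method`, the default tolerance, the iteration cap, and the test matrix are illustrative assumptions, not part of the talk.

```python
import numpy as np

def power_method(A, x0, tol=1e-10, max_iter=10_000):
    """Power method for an ordinary matrix, following the slide's pseudocode."""
    x = x0 / np.linalg.norm(x0)
    alpha = 0.0
    for _ in range(max_iter):
        y = A @ x
        alpha = np.linalg.norm(y)          # eigenvalue magnitude estimate
        x_new = y / alpha
        # Compare iterates after fixing the sign of the first entry.
        change = np.sign(x_new[0]) * x_new - np.sign(x[0]) * x
        x = x_new
        if np.linalg.norm(change) < tol:
            break
    return alpha, x

# Hypothetical test matrix with eigenvalues (5 ± sqrt(5)) / 2.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, v = power_method(A, np.array([1.0, 1.0]))
```

Each step uses a scalar inverse, a norm, and a sign, which are the operations the circulant algebra has to supply.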

Page 4: The power and Arnoldi methods in an algebra of circulants


Three-way arrays

Given an $m \times n \times k$ table of data, we view this data as an $m \times n$ matrix where each "scalar" is a vector of length $k$.

\[ A \in \mathbb{K}_k^{m \times n} \]

We denote the space of length-$k$ scalars as $\mathbb{K}_k$. These scalars interact like circulant matrices.

Page 5: The power and Arnoldi methods in an algebra of circulants


Circulants

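Circulant matrices are a commutative class that is closed under the standard matrix operations.

\[
\begin{bmatrix}
\alpha_1 & \alpha_k & \cdots & \alpha_2 \\
\alpha_2 & \alpha_1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \alpha_k \\
\alpha_k & \cdots & \alpha_2 & \alpha_1
\end{bmatrix}
\]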

We’ll see more of their properties shortly!

Page 6: The power and Arnoldi methods in an algebra of circulants


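The circ operation

We denote the space of length-$k$ scalars as $\mathbb{K}_k$. These scalars interact like circulant matrices.

\[ \alpha = \{\alpha_1 \; \ldots \; \alpha_k\} \in \mathbb{K}_k. \]

\[
\alpha \leftrightarrow \mathrm{circ}(\alpha) \equiv
\begin{bmatrix}
\alpha_1 & \alpha_k & \cdots & \alpha_2 \\
\alpha_2 & \alpha_1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \alpha_k \\
\alpha_k & \cdots & \alpha_2 & \alpha_1
\end{bmatrix}.
\]

\[ \alpha + \beta \leftrightarrow \mathrm{circ}(\alpha) + \mathrm{circ}(\beta) \quad \text{and} \quad \alpha \circ \beta \leftrightarrow \mathrm{circ}(\alpha)\,\mathrm{circ}(\beta); \]

\[ 0 = \{0 \; 0 \; \ldots \; 0\}, \qquad 1 = \{1 \; 0 \; \ldots \; 0\}. \]

$\mathbb{K}_k$ is the ring of length-$k$ circulants.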
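The correspondence between scalars in the ring and circulant matrices can be checked numerically. This is a minimal sketch; the helper names `circ` and `cmul` are assumptions chosen for illustration, and the product is read off as the first column of the matrix product.

```python
import numpy as np

def circ(a):
    """k-by-k circulant matrix whose first column is a."""
    a = np.asarray(a)
    return np.column_stack([np.roll(a, j) for j in range(len(a))])

def cmul(a, b):
    """alpha ∘ beta: the first column of circ(a) @ circ(b)."""
    return circ(a) @ np.asarray(b)

alpha = np.array([2, 3, 1])
beta = np.array([3, 1, 1])
one = np.array([1, 0, 0])      # multiplicative identity {1 0 0}

prod = cmul(alpha, beta)       # {10 12 8}
```

Because the product of two circulants is again a circulant, storing only the first column loses no information.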

Page 7: The power and Arnoldi methods in an algebra of circulants


The circ operation on matrices

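\[
A \circ x =
\begin{bmatrix}
\sum_{j=1}^{n} A_{1,j} \circ x_j \\
\vdots \\
\sum_{j=1}^{n} A_{m,j} \circ x_j
\end{bmatrix}
\leftrightarrow
\begin{bmatrix}
\mathrm{circ}(A_{1,1}) & \cdots & \mathrm{circ}(A_{1,n}) \\
\vdots & \ddots & \vdots \\
\mathrm{circ}(A_{m,1}) & \cdots & \mathrm{circ}(A_{m,n})
\end{bmatrix}
\begin{bmatrix}
\mathrm{circ}(x_1) \\
\vdots \\
\mathrm{circ}(x_n)
\end{bmatrix}.
\]

Define

\[
\mathrm{circ}(A) \equiv
\begin{bmatrix}
\mathrm{circ}(A_{1,1}) & \cdots & \mathrm{circ}(A_{1,n}) \\
\vdots & \ddots & \vdots \\
\mathrm{circ}(A_{m,1}) & \cdots & \mathrm{circ}(A_{m,n})
\end{bmatrix},
\qquad
\mathrm{circ}(x) \equiv
\begin{bmatrix}
\mathrm{circ}(x_1) \\
\vdots \\
\mathrm{circ}(x_n)
\end{bmatrix}.
\]

$A \circ x \leftrightarrow \mathrm{circ}(A)\,\mathrm{circ}(x)$: matrix-vector products.

$x \circ \alpha \leftrightarrow \mathrm{circ}(x)\,\mathrm{circ}(\alpha)$: vector-scalar products.

This is equivalent to Kilmer, Martin, and Perrone (2008).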
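The block definitions can be sketched as follows, assuming a matrix in the algebra is stored as an array of shape (m, n, k) and a vector as shape (n, k). The storage layout and the names `circ_matrix`, `circ_vector`, and `matvec` are illustrative assumptions, and the vector `x` is made up for the check.

```python
import numpy as np

def circ(a):
    a = np.asarray(a)
    return np.column_stack([np.roll(a, j) for j in range(len(a))])

def circ_matrix(A):
    """(m*k)-by-(n*k) block matrix circ(A) for A of shape (m, n, k)."""
    m, n, _ = A.shape
    return np.block([[circ(A[i, j]) for j in range(n)] for i in range(m)])

def circ_vector(x):
    """(n*k)-by-k stack of circ(x_j) for x of shape (n, k)."""
    return np.vstack([circ(xj) for xj in x])

def matvec(A, x):
    """A ∘ x: entry i is sum_j A[i, j] ∘ x[j]."""
    m, n, _ = A.shape
    return np.array([sum(circ(A[i, j]) @ x[j] for j in range(n))
                     for i in range(m)])

A = np.array([[[2, 3, 1], [8, -2, 0]],
              [[-2, 0, 2], [3, 1, 1]]])
x = np.array([[1, 0, 2], [0, 1, -1]])   # hypothetical vector
y = matvec(A, x)
```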

Page 8: The power and Arnoldi methods in an algebra of circulants


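A look at the power method

Require $A$, $x^{(0)}$, $\tau$ $\longrightarrow$ $A$, $x^{(0)}$, $\tau$
$x^{(0)} \leftarrow x^{(0)} \|x^{(0)}\|^{-1}$ $\longrightarrow$ $x^{(0)} \circ \|x^{(0)}\|^{-1} \leftrightarrow \mathrm{circ}(x^{(0)})\,\mathrm{circ}(\|x^{(0)}\|)^{-1}$
for $k = 1, \ldots,$ until convergence
    $y^{(k)} \leftarrow A x^{(k-1)}$ $\longrightarrow$ $y^{(k)} \leftarrow A \circ x^{(k-1)} \leftrightarrow \mathrm{circ}(A)\,\mathrm{circ}(x^{(k-1)})$
    $\alpha^{(k)} \leftarrow \|y^{(k)}\|$ $\longrightarrow$ $\ldots$
    $x^{(k)} \leftarrow y^{(k)} (\alpha^{(k)})^{-1}$
    if $\|\mathrm{sign}(x_1^{(k)})\, x^{(k)} - \mathrm{sign}(x_1^{(k-1)})\, x^{(k-1)}\| < \tau$
        return $x^{(k)}$
    end if
end for

Require a scalar inverse ✓, norm (?), absolute value (?), ...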

Page 9: The power and Arnoldi methods in an algebra of circulants


Circulants and Fourier transforms

Let $C$ be a $k \times k$ circulant matrix. Then the eigenvector matrix of $C$ is given by the $k \times k$ discrete Fourier transform matrix $F$, where

\[ F_{ij} = \frac{1}{\sqrt{k}}\, \omega^{(i-1)(j-1)} \]

and $\omega = e^{2\pi\iota/k}$.

- This matrix is complex symmetric, $F^T = F$, and unitary, $F^* = F^{-1}$. Thus, $C = F D F^*$, $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$.
- Multiplying a vector by $F$ or $F^*$ can be accomplished via the fast Fourier transform in $O(k \log k)$ time instead of $O(k^2)$ for the typical matrix-vector product algorithm.
- Computing the matrix $D$ can be done in time $O(k \log k)$ as well.

d = fft(a)
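The diagonalization can be verified directly. The sketch below assumes NumPy's FFT sign convention, under which `np.fft.fft(a)` returns the eigenvalues of the circulant whose first column is `a`, in the order matching the columns of $F$; the size `k = 5` and the random entries are arbitrary choices.

```python
import numpy as np

k = 5
rng = np.random.default_rng(0)
a = rng.standard_normal(k)

# Circulant with first column a.
C = np.column_stack([np.roll(a, j) for j in range(k)])

# DFT matrix F with entries omega^{(i-1)(j-1)} / sqrt(k), omega = exp(2*pi*i/k).
idx = np.arange(k)
omega = np.exp(2j * np.pi / k)
F = omega ** np.outer(idx, idx) / np.sqrt(k)

d = np.fft.fft(a)                       # eigenvalues in O(k log k)
C_rebuilt = F @ np.diag(d) @ F.conj().T
```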

Page 10: The power and Arnoldi methods in an algebra of circulants


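cft and icft

We define the "Circulant Fourier Transform" or cft

\[ \mathrm{cft} : \alpha \in \mathbb{K}_k \mapsto \mathbb{C}^{k \times k} \]

and its inverse

\[ \mathrm{icft} : \mathbb{C}^{k \times k} \mapsto \mathbb{K}_k \]

as follows:

\[
\mathrm{cft}(\alpha) \equiv
\begin{bmatrix} \hat{\alpha}_1 & & \\ & \ddots & \\ & & \hat{\alpha}_k \end{bmatrix}
= F^* \mathrm{circ}(\alpha) F,
\]

\[
\mathrm{icft}\left(
\begin{bmatrix} \hat{\alpha}_1 & & \\ & \ddots & \\ & & \hat{\alpha}_k \end{bmatrix}
\right)
\equiv \alpha \leftrightarrow F\, \mathrm{cft}(\alpha)\, F^*,
\]

where $\hat{\alpha}_j$ are the eigenvalues of $\mathrm{circ}(\alpha)$ as produced in the Fourier transform. These transformations satisfy $\mathrm{icft}(\mathrm{cft}(\alpha)) = \alpha$ and provide a convenient way of moving between operations in $\mathbb{K}_k$ and the more familiar environment of diagonal matrices in $\mathbb{C}^{k \times k}$.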

Page 11: The power and Arnoldi methods in an algebra of circulants


Operations

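Let $\alpha, \beta \in \mathbb{K}_k$. Note that

\[ \alpha + \beta = \mathrm{icft}(\mathrm{cft}(\alpha) + \mathrm{cft}(\beta)), \quad \text{and} \]

\[ \alpha \circ \beta = \mathrm{icft}(\mathrm{cft}(\alpha)\, \mathrm{cft}(\beta)). \]

In the Fourier space (the output of the cft operation) these operations are both $O(k)$ time because they occur between diagonal matrices. These simplifications generalize to matrix-based operations too. For example,

\[ A \circ x = \mathrm{icft}(\mathrm{cft}(A)\, \mathrm{cft}(x)). \]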
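For scalars, these identities reduce to elementwise arithmetic on FFT coefficients. A minimal sketch, in which the function names `cft` and `icft` mirror the slide's notation but the dense diagonal-matrix representation is only an illustrative assumption (a practical code would keep just the diagonal):

```python
import numpy as np

def circ(a):
    a = np.asarray(a)
    return np.column_stack([np.roll(a, j) for j in range(len(a))])

def cft(a):
    """Diagonal matrix of the eigenvalues of circ(a)."""
    return np.diag(np.fft.fft(a))

def icft(D):
    """Recover the scalar in K_k from its diagonal eigenvalue matrix."""
    return np.fft.ifft(np.diag(D))

alpha = np.array([2.0, 3.0, 1.0])
beta = np.array([3.0, 1.0, 1.0])

total = icft(cft(alpha) + cft(beta))
product = icft(cft(alpha) @ cft(beta))
```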

Page 12: The power and Arnoldi methods in an algebra of circulants


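Operations (cont.)

In the Fourier space, this system is a series of independent matrix-vector products:

\[
\mathrm{cft}(A)\, \mathrm{cft}(x) =
\begin{bmatrix} \hat{A}_1 & & \\ & \ddots & \\ & & \hat{A}_k \end{bmatrix}
\begin{bmatrix} \hat{x}_1 \\ \vdots \\ \hat{x}_k \end{bmatrix}
=
\begin{bmatrix} \hat{A}_1 \hat{x}_1 \\ \vdots \\ \hat{A}_k \hat{x}_k \end{bmatrix}.
\]

We use $\hat{A}_j$ and $\hat{x}_j$ to denote the blocks of Fourier coefficients, or equivalently, circulant eigenvalues. This formulation takes

\[
\underbrace{O(mnk \log k + nk \log k)}_{\text{cft and icft}} + \underbrace{O(kmn)}_{\text{matvecs}}
\]

operations instead of $O(mnk^2)$ using the circ formulation.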
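The independent products can be sketched with an FFT along the last axis and one batched contraction. The (m, n, k) storage layout, the function names, and the random test data are assumptions for illustration; `matvec_circ` is the slower reference that multiplies by each circulant block explicitly.

```python
import numpy as np

def matvec_fft(A, x):
    """A ∘ x via k independent m-by-n products in Fourier space."""
    A_hat = np.fft.fft(A, axis=2)                   # blocks A_hat[:, :, f]
    x_hat = np.fft.fft(x, axis=1)                   # blocks x_hat[:, f]
    y_hat = np.einsum("ijf,jf->if", A_hat, x_hat)   # one matvec per frequency
    return np.fft.ifft(y_hat, axis=1)

def matvec_circ(A, x):
    """Reference A ∘ x using explicit circulant blocks."""
    m, n, k = A.shape
    y = np.zeros((m, k))
    for i in range(m):
        for j in range(n):
            C = np.column_stack([np.roll(A[i, j], s) for s in range(k)])
            y[i] += C @ x[j]
    return y

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))
x = rng.standard_normal((4, 5))
y_fast = matvec_fft(A, x)
y_ref = matvec_circ(A, x)
```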

Page 13: The power and Arnoldi methods in an algebra of circulants


Operations (cont.)

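More operations are simplified in the Fourier space too. Let $\mathrm{cft}(\alpha) = \mathrm{diag}[\hat{\alpha}_1, \ldots, \hat{\alpha}_k]$. Because the $\hat{\alpha}_j$ values are the eigenvalues of $\mathrm{circ}(\alpha)$, we have:

\[ \mathrm{abs}(\alpha) = \mathrm{icft}(\mathrm{diag}[|\hat{\alpha}_1|, \ldots, |\hat{\alpha}_k|]), \]

\[ \overline{\alpha} = \mathrm{icft}(\mathrm{diag}[\overline{\hat{\alpha}_1}, \ldots, \overline{\hat{\alpha}_k}]) = \mathrm{icft}(\mathrm{cft}(\alpha)^*), \quad \text{and} \]

\[ \mathrm{angle}(\alpha) = \mathrm{icft}(\mathrm{diag}[\hat{\alpha}_1 / |\hat{\alpha}_1|, \ldots, \hat{\alpha}_k / |\hat{\alpha}_k|]). \]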
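These three definitions translate to a few lines each when only the FFT coefficients are kept. The names `c_abs`, `c_conj`, `c_angle`, and `c_mul` are hypothetical helpers, and `c_angle` assumes that no Fourier coefficient is zero.

```python
import numpy as np

def c_abs(a):
    return np.fft.ifft(np.abs(np.fft.fft(a)))

def c_conj(a):
    return np.fft.ifft(np.conj(np.fft.fft(a)))

def c_angle(a):
    a_hat = np.fft.fft(a)
    return np.fft.ifft(a_hat / np.abs(a_hat))   # assumes no zero eigenvalue

def c_mul(a, b):
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

alpha = np.array([2.0, 3.0, 1.0])
magnitude = c_abs(alpha)
phase = c_angle(alpha)
```

As with ordinary complex numbers, the product of the magnitude and the angle recovers the original scalar.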

Page 14: The power and Arnoldi methods in an algebra of circulants


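Decompositional interpretation of cft

Algebraically, the cft operation for a matrix $A \in \mathbb{K}_k^{m \times n}$ is

\[ \mathrm{cft}(A) = P_m (I_m \otimes F^*)\, \mathrm{circ}(A)\, (I_n \otimes F) P_n^T, \]

where $P_m$ and $P_n$ are permutation matrices. We can equivalently write this directly in terms of the eigenvalues of each of the circulant blocks of $\mathrm{circ}(A)$:

\[
\mathrm{cft}(A) \equiv
\begin{bmatrix} \hat{A}_1 & & \\ & \ddots & \\ & & \hat{A}_k \end{bmatrix},
\qquad
\hat{A}_j =
\begin{bmatrix}
\lambda_j^{1,1} & \cdots & \lambda_j^{1,n} \\
\vdots & \ddots & \vdots \\
\lambda_j^{m,1} & \cdots & \lambda_j^{m,n}
\end{bmatrix},
\]

where $\lambda_1^{r,s}, \ldots, \lambda_k^{r,s}$ are the diagonal elements of $\mathrm{cft}(A_{r,s})$. The inverse operation, icft, takes a block diagonal matrix and returns the matrix in $\mathbb{K}_k^{m \times n}$:

\[ \mathrm{icft}(\hat{A}) \leftrightarrow (I_m \otimes F) P_m^T \hat{A} P_n (I_n \otimes F^*). \]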

Page 15: The power and Arnoldi methods in an algebra of circulants


Back to figure

Page 16: The power and Arnoldi methods in an algebra of circulants


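Example

Let
\[
A = \begin{bmatrix} \{2\;3\;1\} & \{8\;{-2}\;0\} \\ \{-2\;0\;2\} & \{3\;1\;1\} \end{bmatrix}.
\]
The results of the circ and cft operations are:

\[
\mathrm{circ}(A) =
\left[\begin{array}{rrr|rrr}
2 & 1 & 3 & 8 & 0 & -2 \\
3 & 2 & 1 & -2 & 8 & 0 \\
1 & 3 & 2 & 0 & -2 & 8 \\ \hline
-2 & 2 & 0 & 3 & 1 & 1 \\
0 & -2 & 2 & 1 & 3 & 1 \\
2 & 0 & -2 & 1 & 1 & 3
\end{array}\right],
\]

\[
(I \otimes F^*)\, \mathrm{circ}(A)\, (I \otimes F) =
\left[\begin{array}{ccc|ccc}
6 & & & 6 & & \\
& -\sqrt{3}\iota & & & 9 + \sqrt{3}\iota & \\
& & \sqrt{3}\iota & & & 9 - \sqrt{3}\iota \\ \hline
0 & & & 5 & & \\
& -3 + \sqrt{3}\iota & & & 2 & \\
& & -3 - \sqrt{3}\iota & & & 2
\end{array}\right],
\]

\[
\mathrm{cft}(A) =
\left[\begin{array}{cc|cc|cc}
6 & 6 & & & & \\
0 & 5 & & & & \\ \hline
& & -\sqrt{3}\iota & 9 + \sqrt{3}\iota & & \\
& & -3 + \sqrt{3}\iota & 2 & & \\ \hline
& & & & \sqrt{3}\iota & 9 - \sqrt{3}\iota \\
& & & & -3 - \sqrt{3}\iota & 2
\end{array}\right].
\]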
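The diagonal blocks of cft(A) in this example can be reproduced with an FFT along the last axis. This sketch assumes the (m, n, k) storage layout used for illustration here and NumPy's FFT sign convention.

```python
import numpy as np

A = np.array([[[2, 3, 1], [8, -2, 0]],
              [[-2, 0, 2], [3, 1, 1]]], dtype=float)

A_hat = np.fft.fft(A, axis=2)
blocks = [A_hat[:, :, j] for j in range(3)]   # the diagonal blocks of cft(A)
```

The entry for {8 -2 0} at the second frequency is $8 - 2\omega^{-1} = 9 + \sqrt{3}\iota$, since $\omega^{-1} = -\tfrac{1}{2} - \tfrac{\sqrt{3}}{2}\iota$.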

Page 17: The power and Arnoldi methods in an algebra of circulants


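A look at the power method

Require $A$, $x^{(0)}$, $\tau$
$x^{(0)} \leftarrow x^{(0)} \|x^{(0)}\|^{-1}$
for $k = 1, \ldots,$ until convergence
    $y^{(k)} \leftarrow A x^{(k-1)}$
    $\alpha^{(k)} \leftarrow \|y^{(k)}\|$
    $x^{(k)} \leftarrow y^{(k)} (\alpha^{(k)})^{-1}$
    if $\|\mathrm{sign}(x_1^{(k)})\, x^{(k)} - \mathrm{sign}(x_1^{(k-1)})\, x^{(k-1)}\| < \tau$
        return $x^{(k)}$
    end if
end for

Require a scalar inverse ✓, norm ✓, absolute value ✓, ...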

Page 18: The power and Arnoldi methods in an algebra of circulants


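These operations can now be straightforwardly defined

Inverse of a scalar:

\[ \alpha^{-1} \leftrightarrow \mathrm{circ}(\alpha)^{-1}. \]

More generally, function of a scalar:

\[ f(\alpha) \leftrightarrow f(\mathrm{circ}(\alpha)). \]

Angle:

\[ \mathrm{angle}(x)\,|x| = x, \qquad \mathrm{angle}(\alpha) \leftrightarrow \mathrm{circ}(\mathrm{abs}(\alpha))^{-1}\, \mathrm{circ}(\alpha). \]

The norm of a vector in $\mathbb{K}_k^n$ produces a scalar in $\mathbb{K}_k$:

\[ \|x\| \leftrightarrow \left(\mathrm{circ}(x)^* \mathrm{circ}(x)\right)^{1/2} = \left( \sum_{i=1}^{n} \mathrm{circ}(x_i)^* \mathrm{circ}(x_i) \right)^{1/2}. \]

Inner product:

\[ \langle x, y \rangle \leftrightarrow \mathrm{circ}(y)^* \mathrm{circ}(x). \]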
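In Fourier space each of these definitions becomes an elementwise formula. The sketch below uses hypothetical helper names (`c_inv`, `c_norm`, `c_inner`) and made-up real vectors, and checks the results against the circulant-matrix definitions; `c_inv` assumes every Fourier coefficient is nonzero.

```python
import numpy as np

def circ(a):
    a = np.asarray(a)
    return np.column_stack([np.roll(a, j) for j in range(len(a))])

def c_inv(a):
    """alpha^{-1}: invert each eigenvalue (assumes none is zero)."""
    return np.fft.ifft(1.0 / np.fft.fft(a))

def c_norm(x):
    """||x|| for x of shape (n, k): a scalar in K_k."""
    x_hat = np.fft.fft(x, axis=1)
    return np.fft.ifft(np.sqrt(np.sum(np.abs(x_hat) ** 2, axis=0)))

def c_inner(x, y):
    """<x, y> <-> circ(y)^* circ(x)."""
    x_hat = np.fft.fft(x, axis=1)
    y_hat = np.fft.fft(y, axis=1)
    return np.fft.ifft(np.sum(np.conj(y_hat) * x_hat, axis=0))

alpha = np.array([2.0, 3.0, 1.0])
x = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])   # hypothetical vectors
y = np.array([[2.0, 1.0, 0.0], [1.0, 1.0, 3.0]])

gram = sum(circ(xi).T @ circ(xi) for xi in x)       # circ(x)^* circ(x), real case
N = circ(c_norm(x).real)
```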

Page 19: The power and Arnoldi methods in an algebra of circulants


Example

Run the power method on

\[
\begin{bmatrix} \{2\;3\;1\} & \{0\;0\;0\} \\ \{0\;0\;0\} & \{3\;1\;1\} \end{bmatrix}
\]

Result

\[ \lambda = (1/3)\,\{10\;4\;4\} \]
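This result can be reproduced with a small sketch of the power method carried out in Fourier space. Several choices here are assumptions rather than the talk's algorithm: the function name, the starting vector, and the stopping rule, which watches the change in the norm estimate instead of the sign-normalized eigenvector change.

```python
import numpy as np

def circulant_power_method(A, x0, tol=1e-12, max_iter=1000):
    """Power method in the circulant algebra; A is (n, n, k), x0 is (n, k)."""
    A_hat = np.fft.fft(A, axis=2)
    x_hat = np.fft.fft(x0, axis=1)
    x_hat = x_hat / np.linalg.norm(x_hat, axis=0)      # x <- x ∘ ||x||^{-1}
    alpha_old = np.zeros(A.shape[2])
    for _ in range(max_iter):
        y_hat = np.einsum("ijf,jf->if", A_hat, x_hat)  # y <- A ∘ x
        alpha = np.linalg.norm(y_hat, axis=0)          # cft of ||y||
        x_hat = y_hat / alpha                          # x <- y ∘ alpha^{-1}
        if np.max(np.abs(alpha - alpha_old)) < tol:    # simplified stopping rule
            break
        alpha_old = alpha
    return np.fft.ifft(alpha), np.fft.ifft(x_hat, axis=1)

A = np.array([[[2, 3, 1], [0, 0, 0]],
              [[0, 0, 0], [3, 1, 1]]], dtype=float)
x0 = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])      # hypothetical start
lam, x = circulant_power_method(A, x0)
```

The norm estimate only recovers abs(λ); it matches λ here because the dominant eigenvalue in each Fourier block (6, 2, 2) is real and positive.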


Page 21: The power and Arnoldi methods in an algebra of circulants


Example

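\[
A = \begin{bmatrix} \{2\;3\;1\} & \{0\;0\;0\} \\ \{0\;0\;0\} & \{3\;1\;1\} \end{bmatrix}
\]

\[
\hat{A}_1 = \begin{bmatrix} 6 & 0 \\ 0 & 5 \end{bmatrix}, \quad
\hat{A}_2 = \begin{bmatrix} -\iota\sqrt{3} & 0 \\ 0 & 2 \end{bmatrix}, \quad
\hat{A}_3 = \begin{bmatrix} \iota\sqrt{3} & 0 \\ 0 & 2 \end{bmatrix}.
\]

\[ \lambda_1 = \mathrm{icft}(\mathrm{diag}[6 \;\; 2 \;\; 2]) = (1/3)\,\{10\;4\;4\} \]
\[ \lambda_2 = \mathrm{icft}(\mathrm{diag}[5 \;\; {-\iota\sqrt{3}} \;\; \iota\sqrt{3}]) = (1/3)\,\{5\;8\;2\} \]
\[ \lambda_3 = \mathrm{icft}(\mathrm{diag}[6 \;\; {-\iota\sqrt{3}} \;\; \iota\sqrt{3}]) = \{2\;3\;1\} \]
\[ \lambda_4 = \mathrm{icft}(\mathrm{diag}[5 \;\; 2 \;\; 2]) = \{3\;1\;1\}. \]

The corresponding eigenvectors are

\[
x_1 = \begin{bmatrix} \{1/3\;\;1/3\;\;1/3\} \\ \{2/3\;\;{-1/3}\;\;{-1/3}\} \end{bmatrix}; \quad
x_2 = \begin{bmatrix} \{2/3\;\;{-1/3}\;\;{-1/3}\} \\ \{1/3\;\;1/3\;\;1/3\} \end{bmatrix};
\]

\[
x_3 = \begin{bmatrix} \{1\;0\;0\} \\ \{0\;0\;0\} \end{bmatrix}; \quad
x_4 = \begin{bmatrix} \{0\;0\;0\} \\ \{1\;0\;0\} \end{bmatrix}.
\]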

Page 22: The power and Arnoldi methods in an algebra of circulants


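Canonical set

There are more eigenvalues

\[ \lambda_5 = \mathrm{icft}(\mathrm{diag}[6 \;\; {-\iota\sqrt{3}} \;\; 2]), \qquad \lambda_6 = \mathrm{icft}(\mathrm{diag}[6 \;\; 2 \;\; \iota\sqrt{3}]), \]
\[ \lambda_7 = \mathrm{icft}(\mathrm{diag}[5 \;\; {-\iota\sqrt{3}} \;\; 2]), \qquad \lambda_8 = \mathrm{icft}(\mathrm{diag}[5 \;\; 2 \;\; \iota\sqrt{3}]). \]

Altogether there is a polynomial number of eigenvalues, which exceeds the dimension of the matrix.

Definition. A canonical set of eigenvalues and eigenvectors is a set of minimum size, ordered such that $\mathrm{abs}(\lambda_1) \ge \mathrm{abs}(\lambda_2) \ge \ldots \ge \mathrm{abs}(\lambda_k)$, which contains the information to reproduce any eigenvalue or eigenvector of $A$.

In this case, the only canonical set is $\{(\lambda_1, x_1), (\lambda_2, x_2)\}$. (We need two, and have $\mathrm{abs}(\lambda_1) \ge \mathrm{abs}(\lambda_2)$.)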
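The count of eigenvalues and the canonical pair can be reproduced by taking eigenvalues of each Fourier block and combining them. This is a sketch under the assumption that sorting each block's spectrum by decreasing magnitude is enough to identify the canonical set, which holds for this example; the variable names are illustrative.

```python
import itertools
import numpy as np

A = np.array([[[2, 3, 1], [0, 0, 0]],
              [[0, 0, 0], [3, 1, 1]]], dtype=float)
n, _, k = A.shape
A_hat = np.fft.fft(A, axis=2)

# Eigenvalues of each Fourier block, sorted by decreasing magnitude.
spectra = []
for j in range(k):
    ev = np.linalg.eigvals(A_hat[:, :, j])
    spectra.append(ev[np.argsort(-np.abs(ev))])
spectra = np.array(spectra)                       # shape (k, n)

# Every choice of one eigenvalue per block gives an eigenvalue in K_k.
all_eigs = [np.fft.ifft(np.array(choice))
            for choice in itertools.product(*spectra)]

# Canonical set: the i-th largest eigenvalue from every block.
canonical = [np.fft.ifft(spectra[:, i]) for i in range(n)]
```

For this 2-by-2 matrix with k = 3 there are 2^3 = 8 eigenvalues in total, while the canonical set has only two.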

Page 23: The power and Arnoldi methods in an algebra of circulants


Keeping it real...

Let $A \in \mathbb{K}_k^{n \times n}$ be real-valued with diagonalizable $\hat{A}_j$ matrices. If $k$ is odd, then the eigendecomposition $X \circ \Lambda \circ X^{-1}$ is real-valued if and only if $\hat{A}_1$ has real-valued eigenvalues. If $k$ is even, then $X \circ \Lambda \circ X^{-1}$ is real-valued if and only if $\hat{A}_1$ and $\hat{A}_{k/2+1}$ have real-valued eigenvalues.

Page 24: The power and Arnoldi methods in an algebra of circulants


The power method converges

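Let $A \in \mathbb{K}_k^{n \times n}$ have a canonical set of eigenvalues $\lambda_1, \ldots, \lambda_n$ where $|\lambda_1| > |\lambda_2|$. Then the power method in the circulant algebra converges to an eigenvector $x_1$ with eigenvalue $\lambda_1$. Here we use the ordering

\[ \alpha < \beta \leftrightarrow \mathrm{cft}(\alpha) < \mathrm{cft}(\beta) \text{ elementwise.} \]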

Page 25: The power and Arnoldi methods in an algebra of circulants


The Arnoldi processÉ Let A be an n× n matrix with real valued entries. Then the

Arnoldi method is a technique to build an orthogonal basisfor the Krylov subspace Kt(A,v) = span{v,Av, . . . ,At−1v},where v is an initial vector.

É We have the decomposition

AQt = Qt+1Ht+1,t

where Qt is an n× t matrix, and Ht+1,t is a (t + 1)× t upperHessenberg matrix.

É Using our repertoire of operations, the Arnoldi method inthe circulant algebra is equivalent to individual Arnoldiprocesses on each matrix Aj.

É Equivalent to a block Arnoldi process.É Using the cft and icft operations, we produce an Arnoldi

factorization:A ◦Qt = Qt+1 ◦Ht+1,t.
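The equivalence with per-block Arnoldi processes can be sketched as follows. The `arnoldi` routine is a textbook modified Gram-Schmidt version written for this illustration (it assumes no breakdown), and the sizes and random data are arbitrary.

```python
import numpy as np

def arnoldi(A, v, t):
    """t steps of Arnoldi: returns Q (n x (t+1)) and H ((t+1) x t)."""
    n = A.shape[0]
    Q = np.zeros((n, t + 1), dtype=complex)
    H = np.zeros((t + 1, t), dtype=complex)
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(t):
        w = A @ Q[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt
            H[i, j] = np.vdot(Q[:, i], w)
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]          # assumes no breakdown
    return Q, H

rng = np.random.default_rng(1)
n, k, t = 6, 4, 3
A = rng.standard_normal((n, n, k))
v = rng.standard_normal((n, k))
A_hat = np.fft.fft(A, axis=2)
v_hat = np.fft.fft(v, axis=1)

# One independent Arnoldi process per Fourier block.
Q_hat = np.zeros((n, t + 1, k), dtype=complex)
H_hat = np.zeros((t + 1, t, k), dtype=complex)
for f in range(k):
    Q_hat[:, :, f], H_hat[:, :, f] = arnoldi(A_hat[:, :, f], v_hat[:, f], t)

# Back in the algebra: Q and H with entries in K_k.
Q = np.fft.ifft(Q_hat, axis=2)
H = np.fft.ifft(H_hat, axis=2)

# A ∘ Q_t and Q_{t+1} ∘ H_{t+1,t}, compared block by block in Fourier space.
lhs = np.einsum("ijf,jsf->isf", A_hat, Q_hat[:, :t, :])
rhs = np.einsum("ijf,jsf->isf", Q_hat, H_hat)
```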

Page 26: The power and Arnoldi methods in an algebra of circulants


Example

Consider

\[ -\Delta u(x, y) = f(x, y), \qquad u(x, 0) = u(x, 1), \qquad u(0, y) = u(1, y) = 0 \]

for $(x, y) \in [0,1] \times [0,1]$ with a uniform mesh and the standard 5-point discrete Laplacian:

\[ -\Delta u(x_i, y_j) \approx -u(x_{i-1}, y_j) - u(x_i, y_{j-1}) + 4u(x_i, y_j) - u(x_{i+1}, y_j) - u(x_i, y_{j+1}). \]

Apply the boundary conditions and organize the unknowns of $u$ in y-major order. An approximate solution is given by solving an $N(N-1) \times N(N-1)$ block-tridiagonal, circulant-block system.

Page 27: The power and Arnoldi methods in an algebra of circulants


The Linear System

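\[
\underbrace{\begin{bmatrix}
C & -I & & \\
-I & C & \ddots & \\
& \ddots & \ddots & -I \\
& & -I & C
\end{bmatrix}}_{\mathbf{A}}
\underbrace{\begin{bmatrix}
u(x_1, \cdot) \\ u(x_2, \cdot) \\ \vdots \\ u(x_{N-1}, \cdot)
\end{bmatrix}}_{\mathbf{u}}
=
\underbrace{\begin{bmatrix}
f(x_1, \cdot) \\ f(x_2, \cdot) \\ \vdots \\ f(x_{N-1}, \cdot)
\end{bmatrix}}_{\mathbf{f}},
\qquad
C = \underbrace{\begin{bmatrix}
4 & -1 & & -1 \\
-1 & 4 & \ddots & \\
& \ddots & \ddots & -1 \\
-1 & & -1 & 4
\end{bmatrix}}_{N \times N}.
\]

That is, $\mathbf{A}\mathbf{u} = \mathbf{f}$, or $A \circ u = f$, where $A$ is an $(N-1) \times (N-1)$ matrix of $\mathbb{K}_N$ elements, $u$ and $f$ have compatible sizes, and $\mathbf{A} = \mathrm{circ}(A)$, $\mathbf{u} = \mathrm{vec}(u)$, $\mathbf{f} = \mathrm{vec}(f)$.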

Page 28: The power and Arnoldi methods in an algebra of circulants


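The canonical eigenvalues of $A$ are

\[ \lambda_j = \{4 + 2\cos(j\pi/N), -1, 0, \ldots, 0, -1\}. \]

To see this result, let $\lambda(\mu) = \{\mu, -1, 0, \ldots, 0, -1\}$. Then

\[
(A - \lambda(\mu) \circ I) =
\begin{bmatrix}
(4 - \mu) \circ 1 & -1 \circ 1 & & \\
-1 \circ 1 & (4 - \mu) \circ 1 & \ddots & \\
& \ddots & \ddots & -1 \circ 1 \\
& & -1 \circ 1 & (4 - \mu) \circ 1
\end{bmatrix}.
\]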
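The eigenvalue formula can be checked numerically for a small mesh. The sketch below assumes the (n, n, k) storage layout used for illustration here and an arbitrary size N = 8; it compares the sorted eigenvalues of each Fourier block of $A$ against the Fourier coefficients of the claimed $\lambda_j$.

```python
import numpy as np

N = 8
d = np.zeros(N)
d[0], d[1], d[-1] = 4.0, -1.0, -1.0     # diagonal element {4, -1, 0, ..., 0, -1}
e = np.zeros(N)
e[0] = -1.0                             # off-diagonal element {-1, 0, ..., 0}

# A is an (N-1)-by-(N-1) tridiagonal matrix of K_N elements.
A = np.zeros((N - 1, N - 1, N))
for i in range(N - 1):
    A[i, i] = d
    if i + 1 < N - 1:
        A[i, i + 1] = e
        A[i + 1, i] = e

# The Fourier blocks are real symmetric because d and e are symmetric sequences.
A_hat = np.fft.fft(A, axis=2).real
computed = np.array([np.linalg.eigvalsh(A_hat[:, :, m])[::-1] for m in range(N)])

# Fourier coefficients of the claimed eigenvalues, j = 1, ..., N-1.
predicted = np.empty((N, N - 1))
for col, j in enumerate(range(1, N)):
    lam = d.copy()
    lam[0] = 4 + 2 * np.cos(j * np.pi / N)
    predicted[:, col] = np.fft.fft(lam).real
```

Each column of `predicted` is one canonical eigenvalue viewed in Fourier space, so agreement with `computed` confirms both the values and the ordering by magnitude.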

Page 29: The power and Arnoldi methods in an algebra of circulants


[Figure: log-scale plot of magnitude against iteration, with curves labeled "Eigenvalue Error" and "Eigenvector Change" and fit lines $\left(\frac{2 + 2\cos(2\pi/n)}{2 + 2\cos(\pi/n)}\right)^{2i}$, $\left(\frac{6 + 2\cos(2\pi/n)}{6 + 2\cos(\pi/n)}\right)^{2i}$, and $\left(\frac{6 + 2\cos(2\pi/n)}{6 + 2\cos(\pi/n)}\right)^{i}$.]

Figure: The convergence behavior of the power method in the circulant algebra. The gray lines show the error in each eigenvalue in Fourier space. These curves track the predictions made based on the eigenvalues as discussed in the text. The red line shows the magnitude of the change in the eigenvector. We use this as the stopping criterion. It also decays as predicted by the ratio of eigenvalues. The blue fit lines have been visually adjusted to match the behavior in the convergence tail.

[Figure: log-scale plot of magnitude against Arnoldi iteration, with curves labeled "Absolute error" and "Residual magnitude".]

Figure: The convergence behavior of a GMRES procedure using the circulant Arnoldi process. The gray lines show the error in each Fourier component and the red line shows the magnitude of the residual. We observe poor convergence in one Fourier component until the Arnoldi basis captures all of the eigenvalues after $N/2 + 1 = 26$ iterations. These results show how the two computations are performing individual power methods or Arnoldi processes in Fourier space.

Page 30: The power and Arnoldi methods in an algebra of circulants


The End

Paper available online from http://www.cs.ubc.ca/~greif: "The power and Arnoldi methods in an algebra of circulants", David Gleich, Chen Greif and Jim Varah

Thank you!