the power and arnoldi methods in an algebra of circulants


David F. Gleich (Purdue), CCAM Seminar

The power and Arnoldi methods in an algebra of circulants

David F. Gleich

Computer Science, Purdue University

CCAM Seminar, April 19th, 2013

In collaboration with Chen Greif and Jim Varah (UBC)

Supported by a research grant from NSERC and the Sandia National Labs John von Neumann fellowship



Introduction

Kilmer, Martin, and Perrone (2008) presented a circulant algebra: a set of operations that generalize matrix algebra to three-way data, and provided an SVD.

The essence of this approach amounts to viewing three-dimensional objects as two-dimensional arrays (i.e., matrices) of one-dimensional arrays (i.e., vectors).

Braman (2010) developed spectral and other decompositions.

We have extended this algebra with the ingredients required for iterative methods such as the power method and the Arnoldi method, and have characterized the behavior of these algorithms.



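A look at the power method

Require: $A$, $x^{(0)}$, $\tau$
$x^{(0)} \leftarrow x^{(0)} \, \|x^{(0)}\|^{-1}$
for $k = 1, \ldots$, until convergence
    $y^{(k)} \leftarrow A x^{(k-1)}$
    $\alpha^{(k)} \leftarrow \|y^{(k)}\|$
    $x^{(k)} \leftarrow y^{(k)} \, (\alpha^{(k)})^{-1}$
    if $\|\mathrm{sign}(x_1^{(k)})\, x^{(k)} - \mathrm{sign}(x_1^{(k-1)})\, x^{(k-1)}\| < \tau$
        return $x^{(k)}$
    end if
end for

This requires a scalar inverse, a norm, an absolute value, ...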
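As a point of reference, here is a minimal NumPy sketch of the classical power method above for an ordinary real matrix. The function name `power_method`, the defaults for `tol` and `maxit`, and the use of `numpy.sign` for sign(·) are our choices, not from the slides. Note that $\alpha^{(k)} = \|y^{(k)}\|$ converges to $|\lambda_1|$; the sign normalization in the stopping test keeps a negative dominant eigenvalue from making the iterate flip forever.

```python
import numpy as np


def power_method(A, x0, tol=1e-10, maxit=10_000):
    """Classical power method with the sign-normalized stopping test.

    Returns (alpha, x): alpha = ||A x|| estimates |lambda_1| and x is a
    unit-norm estimate of the dominant eigenvector. Assumes the first entry
    of the dominant eigenvector is nonzero (needed by sign(x_1)).
    """
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)                 # x0 <- x0 * ||x0||^{-1}
    alpha = 0.0
    for _ in range(maxit):
        y = A @ x                             # y <- A x
        alpha = np.linalg.norm(y)             # alpha <- ||y||
        x_new = y / alpha                     # x <- y * alpha^{-1}
        # Compare successive iterates up to sign, using the sign of entry 1.
        if np.linalg.norm(np.sign(x_new[0]) * x_new - np.sign(x[0]) * x) < tol:
            return alpha, x_new
        x = x_new
    return alpha, x


A = np.array([[2.0, 1.0], [1.0, 3.0]])
alpha, x = power_method(A, [1.0, 1.0])
print(alpha)  # approximately (5 + sqrt(5)) / 2
```

Running it on `-A` returns the same `alpha`, because the sign-normalized test ignores the alternating sign of the iterates.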



Three-way arrays

Given an $m \times n \times k$ table of data, we view this data as an $m \times n$ matrix where each "scalar" is a vector of length $k$:

$$A \in \mathbb{K}_k^{m \times n}.$$

We denote the space of length-$k$ scalars as $\mathbb{K}_k$. These scalars interact like circulant matrices.



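Circulants

Circulant matrices form a commutative class that is closed under the standard matrix operations:

$$\begin{bmatrix} \alpha_1 & \alpha_k & \cdots & \alpha_2 \\ \alpha_2 & \alpha_1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \alpha_k \\ \alpha_k & \cdots & \alpha_2 & \alpha_1 \end{bmatrix}$$

We'll see more of their properties shortly!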



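The circ operation

We denote the space of length-$k$ scalars as $\mathbb{K}_k$. These scalars interact like circulant matrices:

$$\alpha = \{\alpha_1 \; \ldots \; \alpha_k\} \in \mathbb{K}_k, \qquad \alpha \leftrightarrow \mathrm{circ}(\alpha) \equiv \begin{bmatrix} \alpha_1 & \alpha_k & \cdots & \alpha_2 \\ \alpha_2 & \alpha_1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \alpha_k \\ \alpha_k & \cdots & \alpha_2 & \alpha_1 \end{bmatrix}.$$

$$\alpha + \beta \leftrightarrow \mathrm{circ}(\alpha) + \mathrm{circ}(\beta) \quad\text{and}\quad \alpha \circ \beta \leftrightarrow \mathrm{circ}(\alpha)\,\mathrm{circ}(\beta);$$

$$0 = \{0 \; 0 \; \ldots \; 0\}, \qquad 1 = \{1 \; 0 \; \ldots \; 0\}.$$

$\mathbb{K}_k$ is the ring of length-$k$ circulants.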
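To make the correspondence concrete, here is a small NumPy sketch of circ for a single $\mathbb{K}_k$ scalar, stored (by our choice) as a length-$k$ array holding the first column, and of the product $\alpha \circ \beta$, which is the first column of $\mathrm{circ}(\alpha)\,\mathrm{circ}(\beta)$, i.e., a circular convolution. The helper names `circ` and `kmul` are hypothetical.

```python
import numpy as np


def circ(a):
    """k-by-k circulant matrix whose first column is the K_k scalar a."""
    a = np.asarray(a)
    return np.column_stack([np.roll(a, s) for s in range(a.size)])


def kmul(a, b):
    """a ∘ b: first column of circ(a) @ circ(b), i.e. circular convolution."""
    return circ(a) @ np.asarray(b)


alpha = np.array([2, 3, 1])
beta = np.array([1, 0, 2])
one = np.array([1, 0, 0])          # the multiplicative identity 1
print(circ(alpha))                 # rows [2 1 3], [3 2 1], [1 3 2]
print(kmul(alpha, beta))           # [8 5 5]
```

Storing only the first column keeps a scalar at $O(k)$ memory; the full circulant is formed here only to mirror the definition.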



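The circ operation on matrices

$$A \circ x = \begin{bmatrix} \sum_{j=1}^{n} A_{1,j} \circ x_j \\ \vdots \\ \sum_{j=1}^{n} A_{m,j} \circ x_j \end{bmatrix} \leftrightarrow \begin{bmatrix} \mathrm{circ}(A_{1,1}) & \cdots & \mathrm{circ}(A_{1,n}) \\ \vdots & \ddots & \vdots \\ \mathrm{circ}(A_{m,1}) & \cdots & \mathrm{circ}(A_{m,n}) \end{bmatrix} \begin{bmatrix} \mathrm{circ}(x_1) \\ \vdots \\ \mathrm{circ}(x_n) \end{bmatrix}.$$

Define

$$\mathrm{circ}(A) \equiv \begin{bmatrix} \mathrm{circ}(A_{1,1}) & \cdots & \mathrm{circ}(A_{1,n}) \\ \vdots & \ddots & \vdots \\ \mathrm{circ}(A_{m,1}) & \cdots & \mathrm{circ}(A_{m,n}) \end{bmatrix}, \qquad \mathrm{circ}(x) \equiv \begin{bmatrix} \mathrm{circ}(x_1) \\ \vdots \\ \mathrm{circ}(x_n) \end{bmatrix}.$$

$A \circ x \leftrightarrow \mathrm{circ}(A)\,\mathrm{circ}(x)$: matrix-vector products.

$x \circ \alpha \leftrightarrow \mathrm{circ}(x)\,\mathrm{circ}(\alpha)$: vector-scalar products.

This is equivalent to Kilmer, Martin, and Perrone (2008).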
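The block definitions translate directly into code. In this sketch we store $A \in \mathbb{K}_k^{m \times n}$ as an $m \times n \times k$ NumPy array and $x \in \mathbb{K}_k^n$ as an $n \times k$ array, a layout of our choosing; `circ_mat`, `circ_vec`, and `kmatvec` are hypothetical helper names. The check confirms $\mathrm{circ}(A)\,\mathrm{circ}(x) = \mathrm{circ}(A \circ x)$ on a small integer example.

```python
import numpy as np


def circ(a):
    """k-by-k circulant matrix whose first column is a."""
    a = np.asarray(a)
    return np.column_stack([np.roll(a, s) for s in range(a.size)])


def circ_mat(A):
    """circ(A) for A of shape (m, n, k): mk-by-nk, block (r, s) = circ(A[r, s])."""
    m, n, _ = A.shape
    return np.block([[circ(A[r, s]) for s in range(n)] for r in range(m)])


def circ_vec(x):
    """circ(x) for x of shape (n, k): an nk-by-k block column of circulants."""
    return np.vstack([circ(xi) for xi in x])


def kmatvec(A, x):
    """A ∘ x: entry r is sum_j A[r, j] ∘ x[j], each ∘ a circular convolution."""
    m, n, k = A.shape
    y = np.zeros((m, k), dtype=np.result_type(A, x))
    for r in range(m):
        for j in range(n):
            y[r] += circ(A[r, j]) @ x[j]
    return y


A = np.array([[[2, 3, 1], [8, -2, 0]],
              [[-2, 0, 2], [3, 1, 1]]])      # (m, n, k) = (2, 2, 3)
x = np.array([[1, 2, 0], [0, -1, 3]])
lhs = circ_mat(A) @ circ_vec(x)               # circ(A) circ(x)
rhs = circ_vec(kmatvec(A, x))                 # circ(A ∘ x)
```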



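A look at the power method

Require: $A$, $x^{(0)}$, $\tau$ → $A$, $x^{(0)}$, $\tau$
$x^{(0)} \leftarrow x^{(0)} \, \|x^{(0)}\|^{-1}$ → $x^{(0)} \circ \|x^{(0)}\|^{-1} \leftrightarrow \mathrm{circ}(x^{(0)})\,\mathrm{circ}(\|x^{(0)}\|)^{-1}$
for $k = 1, \ldots$, until convergence
    $y^{(k)} \leftarrow A x^{(k-1)}$ → $y^{(k)} \leftarrow A \circ x^{(k-1)} \leftrightarrow \mathrm{circ}(A)\,\mathrm{circ}(x^{(k-1)})$
    $\alpha^{(k)} \leftarrow \|y^{(k)}\|$ → ...
    $x^{(k)} \leftarrow y^{(k)} \, (\alpha^{(k)})^{-1}$
    if $\|\mathrm{sign}(x_1^{(k)})\, x^{(k)} - \mathrm{sign}(x_1^{(k-1)})\, x^{(k-1)}\| < \tau$
        return $x^{(k)}$
    end if
end for

This requires a scalar inverse ✓, a norm (?), an absolute value (?), ...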



Circulants and Fourier transforms

Let $C$ be a $k \times k$ circulant matrix. Then the eigenvector matrix of $C$ is given by the $k \times k$ discrete Fourier transform matrix $F$, where

$$F_{j\ell} = \frac{1}{\sqrt{k}}\, \omega^{(j-1)(\ell-1)} \quad\text{and}\quad \omega = e^{2\pi\iota/k}.$$

• This matrix is complex symmetric, $F^T = F$, and unitary, $F^* = F^{-1}$. Thus, $C = F D F^*$ with $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$.

• Multiplying a vector by $F$ or $F^*$ can be accomplished via the fast Fourier transform in $O(k \log k)$ time instead of $O(k^2)$ for the typical matrix-vector product algorithm.

• Computing the matrix $D$ can be done in $O(k \log k)$ time as well; with $a$ the first column of $C$, $D = \mathrm{diag}(d)$, where

d = fft(a)
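A quick numerical check of these facts, assuming NumPy's `numpy.fft.fft` sign convention, which matches $F$ above when $a$ is the first column of $C$. `dft_matrix` is a hypothetical helper that forms $F$ explicitly, something one would only do for a check like this.

```python
import numpy as np


def circ(a):
    """k-by-k circulant matrix whose first column is a."""
    a = np.asarray(a)
    return np.column_stack([np.roll(a, s) for s in range(a.size)])


def dft_matrix(k):
    """F[j, l] = omega**(j*l) / sqrt(k) with omega = exp(2*pi*i/k), 0-based indices."""
    idx = np.arange(k)
    omega = np.exp(2j * np.pi / k)
    return omega ** np.outer(idx, idx) / np.sqrt(k)


a = np.array([2.0, 3.0, 1.0])     # first column of C
C = circ(a)
F = dft_matrix(a.size)
d = np.fft.fft(a)                 # eigenvalues of C in O(k log k)
D = np.diag(d)
```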



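cft and icft

We define the "Circulant Fourier Transform" or cft

$$\mathrm{cft} : \alpha \in \mathbb{K}_k \mapsto \mathbb{C}^{k \times k}$$

and its inverse

$$\mathrm{icft} : \mathbb{C}^{k \times k} \mapsto \mathbb{K}_k$$

as follows:

$$\mathrm{cft}(\alpha) \equiv \begin{bmatrix} \hat\alpha_1 & & \\ & \ddots & \\ & & \hat\alpha_k \end{bmatrix} = F^*\,\mathrm{circ}(\alpha)\,F, \qquad \mathrm{icft}\left(\begin{bmatrix} \hat\alpha_1 & & \\ & \ddots & \\ & & \hat\alpha_k \end{bmatrix}\right) \equiv \alpha \leftrightarrow F\,\mathrm{cft}(\alpha)\,F^*,$$

where the $\hat\alpha_j$ are the eigenvalues of $\mathrm{circ}(\alpha)$ as produced in the Fourier transform. These transformations satisfy $\mathrm{icft}(\mathrm{cft}(\alpha)) = \alpha$ and provide a convenient way of moving between operations in $\mathbb{K}_k$ and the more familiar environment of diagonal matrices in $\mathbb{C}^{k \times k}$.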
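In code, cft and icft of a single scalar reduce to one FFT and one inverse FFT; the diagonal matrix is formed here only to mirror the definition. This sketch assumes NumPy's FFT convention and uses hypothetical `circ` and `dft_matrix` helpers to confirm $\mathrm{cft}(\alpha) = F^*\,\mathrm{circ}(\alpha)\,F$.

```python
import numpy as np


def circ(a):
    a = np.asarray(a)
    return np.column_stack([np.roll(a, s) for s in range(a.size)])


def dft_matrix(k):
    """F[j, l] = omega**(j*l) / sqrt(k), omega = exp(2*pi*i/k)."""
    idx = np.arange(k)
    return np.exp(2j * np.pi / k) ** np.outer(idx, idx) / np.sqrt(k)


def cft(alpha):
    """cft(alpha) = diag(alpha_hat), the eigenvalues of circ(alpha)."""
    return np.diag(np.fft.fft(alpha))


def icft(D):
    """Inverse of cft: read alpha_hat off the diagonal and invert the FFT."""
    return np.fft.ifft(np.diag(D))


alpha = np.array([2.0, 3.0, 1.0])
F = dft_matrix(alpha.size)
```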



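Operations

Let $\alpha, \beta \in \mathbb{K}_k$. Note that

$$\alpha + \beta = \mathrm{icft}(\mathrm{cft}(\alpha) + \mathrm{cft}(\beta)), \quad\text{and}\quad \alpha \circ \beta = \mathrm{icft}(\mathrm{cft}(\alpha)\,\mathrm{cft}(\beta)).$$

In the Fourier space (the output of the cft operation), these operations both take $O(k)$ time because they occur between diagonal matrices. These simplifications generalize to matrix-based operations too. For example,

$$A \circ x = \mathrm{icft}(\mathrm{cft}(A)\,\mathrm{cft}(x)).$$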
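The scalar identities can be checked with a few lines of NumPy; `kadd` and `kmul_fourier` are hypothetical names in this sketch. Each is an elementwise operation between FFTs, so the work in Fourier space is $O(k)$ and the transforms cost $O(k \log k)$.

```python
import numpy as np


def circ(a):
    a = np.asarray(a)
    return np.column_stack([np.roll(a, s) for s in range(a.size)])


def kadd(alpha, beta):
    """alpha + beta = icft(cft(alpha) + cft(beta))."""
    return np.fft.ifft(np.fft.fft(alpha) + np.fft.fft(beta))


def kmul_fourier(alpha, beta):
    """alpha ∘ beta = icft(cft(alpha) cft(beta)): an elementwise product of FFTs."""
    return np.fft.ifft(np.fft.fft(alpha) * np.fft.fft(beta))


alpha = np.array([2.0, 3.0, 1.0])
beta = np.array([1.0, 0.0, 2.0])
gamma = kmul_fourier(alpha, beta)   # approximately [8, 5, 5] (complex dtype)
```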



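Operations (cont.)

In the Fourier space, this system is a series of independent matrix-vector products:

$$\mathrm{cft}(A)\,\mathrm{cft}(x) = \begin{bmatrix} A_1 & & \\ & \ddots & \\ & & A_k \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix} = \begin{bmatrix} A_1 x_1 \\ \vdots \\ A_k x_k \end{bmatrix}.$$

We use $A_j$ and $x_j$ to denote the blocks of Fourier coefficients or, equivalently, circulant eigenvalues. This formulation takes

$$\underbrace{O(mnk \log k + nk \log k)}_{\text{cft and icft}} + \underbrace{O(kmn)}_{\text{matvecs}}$$

operations instead of $O(mnk^2)$ using the circ formulation.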
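A sketch of the Fourier-space matrix-vector product, assuming (our choice) that $A$ is stored as an $m \times n \times k$ array, $x$ as an $n \times k$ array, and NumPy's FFT is applied along the circulant axis. `np.einsum` performs the $k$ independent products $A_j x_j$; the result is compared with the $O(mnk^2)$ circ formulation.

```python
import numpy as np


def circ(a):
    a = np.asarray(a)
    return np.column_stack([np.roll(a, s) for s in range(a.size)])


def circ_mat(A):
    """circ(A) for A of shape (m, n, k)."""
    m, n, _ = A.shape
    return np.block([[circ(A[r, s]) for s in range(n)] for r in range(m)])


def kmatvec_fourier(A, x):
    """A ∘ x for A (m, n, k) and x (n, k): cft, k small matvecs, icft."""
    Ah = np.fft.fft(A, axis=2)                  # Ah[:, :, j] is the block A_j
    xh = np.fft.fft(x, axis=1)                  # xh[:, j] is x_j
    yh = np.einsum('rsj,sj->rj', Ah, xh)        # y_j = A_j x_j for every j
    return np.fft.ifft(yh, axis=1)


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 8))
x = rng.standard_normal((3, 8))
y_fourier = kmatvec_fourier(A, x)
# circ(A) times the first column of circ(x), which is x stacked row by row.
y_circ = (circ_mat(A) @ x.reshape(-1)).reshape(4, 8)
```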



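Operations (cont.)

More operations are simplified in the Fourier space too. Let $\mathrm{cft}(\alpha) = \mathrm{diag}[\hat\alpha_1, \ldots, \hat\alpha_k]$. Because the $\hat\alpha_j$ values are the eigenvalues of $\mathrm{circ}(\alpha)$, we have:

$$\mathrm{abs}(\alpha) = \mathrm{icft}(\mathrm{diag}[|\hat\alpha_1|, \ldots, |\hat\alpha_k|]),$$

$$\bar\alpha = \mathrm{icft}(\mathrm{diag}[\bar{\hat\alpha}_1, \ldots, \bar{\hat\alpha}_k]) = \mathrm{icft}(\mathrm{cft}(\alpha)^*), \quad\text{and}$$

$$\mathrm{angle}(\alpha) = \mathrm{icft}(\mathrm{diag}[\hat\alpha_1/|\hat\alpha_1|, \ldots, \hat\alpha_k/|\hat\alpha_k|]).$$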
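These three operations in NumPy, as a sketch with hypothetical names `kabs`, `kconj`, and `kangle`; `kangle` assumes no Fourier coefficient is zero. For a real scalar, $\bar\alpha$ comes out as $\{\alpha_1\;\alpha_k\;\ldots\;\alpha_2\}$, the first column of $\mathrm{circ}(\alpha)^T$.

```python
import numpy as np


def kabs(alpha):
    """abs(alpha) = icft(diag |alpha_hat_j|)."""
    return np.fft.ifft(np.abs(np.fft.fft(alpha)))


def kconj(alpha):
    """Conjugate of alpha: icft(cft(alpha)^*)."""
    return np.fft.ifft(np.conj(np.fft.fft(alpha)))


def kangle(alpha):
    """angle(alpha) = icft(diag alpha_hat_j / |alpha_hat_j|); needs alpha_hat_j != 0."""
    ah = np.fft.fft(alpha)
    return np.fft.ifft(ah / np.abs(ah))


def kmul(alpha, beta):
    """alpha ∘ beta via FFTs."""
    return np.fft.ifft(np.fft.fft(alpha) * np.fft.fft(beta))


alpha = np.array([2.0, 3.0, 1.0])
```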



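Decompositional interpretation of cft

Algebraically, the cft operation for a matrix $A \in \mathbb{K}_k^{m \times n}$ is

$$\mathrm{cft}(A) = P_m (I_m \otimes F^*)\,\mathrm{circ}(A)\,(I_n \otimes F) P_n^T,$$

where $P_m$ and $P_n$ are permutation matrices. We can equivalently write this directly in terms of the eigenvalues of each of the circulant blocks of $\mathrm{circ}(A)$:

$$\mathrm{cft}(A) \equiv \begin{bmatrix} A_1 & & \\ & \ddots & \\ & & A_k \end{bmatrix}, \qquad A_j = \begin{bmatrix} \lambda_j^{1,1} & \cdots & \lambda_j^{1,n} \\ \vdots & \ddots & \vdots \\ \lambda_j^{m,1} & \cdots & \lambda_j^{m,n} \end{bmatrix},$$

where $\lambda_1^{r,s}, \ldots, \lambda_k^{r,s}$ are the diagonal elements of $\mathrm{cft}(A_{r,s})$. The inverse operation, icft, takes a block diagonal matrix and returns the matrix in $\mathbb{K}_k^{m \times n}$:

$$\mathrm{icft}(A) \leftrightarrow (I_m \otimes F) P_m^T A P_n (I_n \otimes F^*).$$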
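A small check of the Kronecker form, written as a sketch. The helper `perm` builds one possible choice of $P_m$ (row $rk + j$ moves to $jm + r$, with 0-based indices, grouping equal Fourier indices together); the slide does not spell out the permutation, so this construction is our assumption. The result should match the block diagonal matrix assembled directly from FFTs of each entry, and the icft formula should recover $\mathrm{circ}(A)$.

```python
import numpy as np


def circ(a):
    a = np.asarray(a)
    return np.column_stack([np.roll(a, s) for s in range(a.size)])


def circ_mat(A):
    m, n, _ = A.shape
    return np.block([[circ(A[r, s]) for s in range(n)] for r in range(m)])


def dft_matrix(k):
    idx = np.arange(k)
    return np.exp(2j * np.pi / k) ** np.outer(idx, idx) / np.sqrt(k)


def perm(m, k):
    """Permutation matrix sending index r*k + j to j*m + r (0-based)."""
    P = np.zeros((m * k, m * k))
    for r in range(m):
        for j in range(k):
            P[j * m + r, r * k + j] = 1.0
    return P


def cft_mat(A):
    """cft(A) = P_m (I_m ⊗ F*) circ(A) (I_n ⊗ F) P_n^T for A of shape (m, n, k)."""
    m, n, k = A.shape
    F = dft_matrix(k)
    inner = np.kron(np.eye(m), F.conj().T) @ circ_mat(A) @ np.kron(np.eye(n), F)
    return perm(m, k) @ inner @ perm(n, k).T


def block_diag_fft(A):
    """diag(A_1, ..., A_k) with (A_j)[r, s] = fft(A[r, s])[j]."""
    m, n, k = A.shape
    Ah = np.fft.fft(A, axis=2)
    B = np.zeros((m * k, n * k), dtype=complex)
    for j in range(k):
        B[j * m:(j + 1) * m, j * n:(j + 1) * n] = Ah[:, :, j]
    return B


rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3, 4))
```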



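Example

Let $A = \begin{bmatrix} \{2\;3\;1\} & \{8\;{-2}\;0\} \\ \{-2\;0\;2\} & \{3\;1\;1\} \end{bmatrix}$. The results of the circ and cft operations are:

$$\mathrm{circ}(A) = \begin{bmatrix} 2 & 1 & 3 & 8 & 0 & -2 \\ 3 & 2 & 1 & -2 & 8 & 0 \\ 1 & 3 & 2 & 0 & -2 & 8 \\ -2 & 2 & 0 & 3 & 1 & 1 \\ 0 & -2 & 2 & 1 & 3 & 1 \\ 2 & 0 & -2 & 1 & 1 & 3 \end{bmatrix},$$

$$(I_2 \otimes F^*)\,\mathrm{circ}(A)\,(I_2 \otimes F) = \begin{bmatrix} 6 & & & 6 & & \\ & -\sqrt{3}\iota & & & 9+\sqrt{3}\iota & \\ & & \sqrt{3}\iota & & & 9-\sqrt{3}\iota \\ 0 & & & 5 & & \\ & -3+\sqrt{3}\iota & & & 2 & \\ & & -3-\sqrt{3}\iota & & & 2 \end{bmatrix},$$

$$\mathrm{cft}(A) = \begin{bmatrix} 6 & 6 & & & & \\ 0 & 5 & & & & \\ & & -\sqrt{3}\iota & 9+\sqrt{3}\iota & & \\ & & -3+\sqrt{3}\iota & 2 & & \\ & & & & \sqrt{3}\iota & 9-\sqrt{3}\iota \\ & & & & -3-\sqrt{3}\iota & 2 \end{bmatrix}.$$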
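The Fourier blocks of this example can be reproduced with one FFT per entry. This is a sketch that assumes NumPy's FFT convention, which agrees with $F$ as defined on the "Circulants and Fourier transforms" slide.

```python
import numpy as np

A = np.array([[[2, 3, 1], [8, -2, 0]],
              [[-2, 0, 2], [3, 1, 1]]], dtype=float)   # (m, n, k) = (2, 2, 3)
Ah = np.fft.fft(A, axis=2)                             # Ah[:, :, j] is A_{j+1}
blocks = [Ah[:, :, j] for j in range(A.shape[2])]
for j, Aj in enumerate(blocks, start=1):
    print(f"A_{j} =")
    print(np.round(Aj, 4))
```

The $(1,2)$ entries come out as $9 \pm \sqrt{3}\iota$ because $8 - 2e^{\mp 2\pi\iota/3} = 9 \pm \sqrt{3}\iota$, and $A_3$ is the complex conjugate of $A_2$, as expected for real data.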



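A look at the power method

Require: $A$, $x^{(0)}$, $\tau$
$x^{(0)} \leftarrow x^{(0)} \, \|x^{(0)}\|^{-1}$
for $k = 1, \ldots$, until convergence
    $y^{(k)} \leftarrow A x^{(k-1)}$
    $\alpha^{(k)} \leftarrow \|y^{(k)}\|$
    $x^{(k)} \leftarrow y^{(k)} \, (\alpha^{(k)})^{-1}$
    if $\|\mathrm{sign}(x_1^{(k)})\, x^{(k)} - \mathrm{sign}(x_1^{(k-1)})\, x^{(k-1)}\| < \tau$
        return $x^{(k)}$
    end if
end for

This requires a scalar inverse ✓, a norm ✓, an absolute value ✓, ...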



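These operations can now be straightforwardly defined.

Inverse of a scalar:

$$\alpha^{-1} \leftrightarrow \mathrm{circ}(\alpha)^{-1}.$$

More generally, a function of a scalar:

$$f(\alpha) \leftrightarrow f(\mathrm{circ}(\alpha)).$$

Angle:

$$\mathrm{angle}(\alpha) \circ |\alpha| = \alpha, \qquad \mathrm{angle}(\alpha) \leftrightarrow \mathrm{circ}(\mathrm{abs}(\alpha))^{-1}\,\mathrm{circ}(\alpha).$$

The norm of a vector in $\mathbb{K}_k^n$ produces a scalar in $\mathbb{K}_k$:

$$\|x\| \leftrightarrow \left(\mathrm{circ}(x)^*\,\mathrm{circ}(x)\right)^{1/2} = \left(\sum_{i=1}^{n} \mathrm{circ}(x_i)^*\,\mathrm{circ}(x_i)\right)^{1/2}.$$

Inner product:

$$\langle x, y \rangle \leftrightarrow \mathrm{circ}(y)^*\,\mathrm{circ}(x).$$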
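After an FFT, the inverse, norm, and inner product all become per-frequency operations. The sketch below uses hypothetical names `kinv`, `knorm`, and `kinner`, stores vectors as $n \times k$ arrays (our choice), and checks each against its circ definition using only matrix products, so no matrix square root is needed to validate the norm.

```python
import numpy as np


def circ(a):
    a = np.asarray(a)
    return np.column_stack([np.roll(a, s) for s in range(a.size)])


def circ_vec(x):
    """nk-by-k block column [circ(x_1); ...; circ(x_n)]."""
    return np.vstack([circ(xi) for xi in x])


def kinv(alpha):
    """alpha^{-1}: invert each Fourier coefficient (assumes none is zero)."""
    return np.fft.ifft(1.0 / np.fft.fft(alpha))


def knorm(x):
    """||x|| in K_k: per-frequency 2-norm of the Fourier coefficients of x (n, k)."""
    xh = np.fft.fft(x, axis=1)
    return np.fft.ifft(np.sqrt(np.sum(np.abs(xh) ** 2, axis=0)))


def kinner(x, y):
    """<x, y> <-> circ(y)^* circ(x): per frequency, sum_i conj(y_i) x_i."""
    xh = np.fft.fft(x, axis=1)
    yh = np.fft.fft(y, axis=1)
    return np.fft.ifft(np.sum(np.conj(yh) * xh, axis=0))


rng = np.random.default_rng(2)
alpha = rng.standard_normal(5)
x = rng.standard_normal((3, 5))
y = rng.standard_normal((3, 5))
nx = knorm(x)
```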



Example

Run the power method on

$$\begin{bmatrix} \{2\;3\;1\} & \{0\;0\;0\} \\ \{0\;0\;0\} & \{3\;1\;1\} \end{bmatrix}.$$

Result:

$$\lambda = (1/3)\,\{10\;4\;4\}$$
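A sketch of the circulant power method that reproduces this result by running the iteration in Fourier space, where it becomes $k$ independent power iterations, one per block $A_j$. Two details are our choices rather than the slides': the stopping test normalizes phase using each component's largest-magnitude entry instead of $\mathrm{sign}(x_1)$ (in this example the dominant eigenvector has a zero first entry in two Fourier components), and the eigenvalue is recovered as a per-component Rayleigh quotient. The test $\|\cdot\| < \tau$ is read under the elementwise cft ordering, so every Fourier component must be below `tol`.

```python
import numpy as np


def circulant_power_method(A, x0, tol=1e-12, maxit=2000):
    """Power method in K_k for A of shape (n, n, k) and x0 of shape (n, k).

    Returns (lam, x): lam in K_k (length k) and x in K_k^n (shape (n, k)).
    """
    Ah = np.fft.fft(A, axis=2)                      # Fourier blocks A_j
    xh = np.fft.fft(np.asarray(x0, dtype=float), axis=1)
    xh = xh / np.linalg.norm(xh, axis=0)            # x <- x ∘ ||x||^{-1}
    cols = np.arange(xh.shape[1])
    for _ in range(maxit):
        yh = np.einsum('rsj,sj->rj', Ah, xh)        # y <- A ∘ x
        x_new = yh / np.linalg.norm(yh, axis=0)     # x <- y ∘ ||y||^{-1}
        # Phase of the largest-magnitude entry in each Fourier component
        # (our substitute for sign(x_1); assumes those entries are nonzero).
        p = np.argmax(np.abs(x_new), axis=0)
        ph_new = x_new[p, cols] / np.abs(x_new[p, cols])
        ph_old = xh[p, cols] / np.abs(xh[p, cols])
        change = np.linalg.norm(x_new * np.conj(ph_new) - xh * np.conj(ph_old), axis=0)
        xh = x_new
        if np.all(change < tol):                    # ||change|| < tau entrywise in cft
            break
    lam_h = np.einsum('rj,rsj,sj->j', np.conj(xh), Ah, xh)   # x_j^* A_j x_j
    return np.fft.ifft(lam_h), np.fft.ifft(xh, axis=1)


A = np.array([[[2, 3, 1], [0, 0, 0]],
              [[0, 0, 0], [3, 1, 1]]], dtype=float)
x0 = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
lam, x = circulant_power_method(A, x0)
print(np.round(3 * lam.real, 6))   # expected approximately [10, 4, 4]
```

In Fourier space the dominant eigenvalues are $6$, $2$, and $2$; the slowest component converges at the rate $\sqrt{3}/2$, which sets the iteration count.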



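Example

$$A = \begin{bmatrix} \{2\;3\;1\} & \{0\;0\;0\} \\ \{0\;0\;0\} & \{3\;1\;1\} \end{bmatrix}, \quad A_1 = \begin{bmatrix} 6 & 0 \\ 0 & 5 \end{bmatrix}, \quad A_2 = \begin{bmatrix} -\iota\sqrt{3} & 0 \\ 0 & 2 \end{bmatrix}, \quad A_3 = \begin{bmatrix} \iota\sqrt{3} & 0 \\ 0 & 2 \end{bmatrix}.$$

$$\lambda_1 = \mathrm{icft}(\mathrm{diag}[6\;\;2\;\;2]) = (1/3)\,\{10\;4\;4\}$$

$$\lambda_2 = \mathrm{icft}(\mathrm{diag}[5\;\;{-\iota\sqrt{3}}\;\;\iota\sqrt{3}]) = (1/3)\,\{5\;8\;2\}$$

$$\lambda_3 = \mathrm{icft}(\mathrm{diag}[6\;\;{-\iota\sqrt{3}}\;\;\iota\sqrt{3}]) = \{2\;3\;1\}$$

$$\lambda_4 = \mathrm{icft}(\mathrm{diag}[5\;\;2\;\;2]) = \{3\;1\;1\}.$$

The corresponding eigenvectors are

$$x_1 = \begin{bmatrix} \{1/3\;\;1/3\;\;1/3\} \\ \{2/3\;\;{-1/3}\;\;{-1/3}\} \end{bmatrix}; \quad x_2 = \begin{bmatrix} \{2/3\;\;{-1/3}\;\;{-1/3}\} \\ \{1/3\;\;1/3\;\;1/3\} \end{bmatrix}; \quad x_3 = \begin{bmatrix} \{1\;0\;0\} \\ \{0\;0\;0\} \end{bmatrix}; \quad x_4 = \begin{bmatrix} \{0\;0\;0\} \\ \{1\;0\;0\} \end{bmatrix}.$$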
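The four eigenpairs can be verified directly by checking $A \circ x_i = x_i \circ \lambda_i$ with FFT-based products. This is a sketch; `kmatvec` and `kscale` are hypothetical helper names, and the array layout ($n \times n \times k$ for $A$, $n \times k$ for vectors) is our choice.

```python
import numpy as np


def kmatvec(A, x):
    """A ∘ x for A (n, n, k), x (n, k), via k independent Fourier-space matvecs."""
    Ah = np.fft.fft(A, axis=2)
    xh = np.fft.fft(x, axis=1)
    return np.fft.ifft(np.einsum('rsj,sj->rj', Ah, xh), axis=1)


def kscale(x, lam):
    """x ∘ lam: multiply every entry of x (n, k) by the scalar lam (k,)."""
    return np.fft.ifft(np.fft.fft(x, axis=1) * np.fft.fft(lam), axis=1)


A = np.array([[[2, 3, 1], [0, 0, 0]],
              [[0, 0, 0], [3, 1, 1]]], dtype=float)
pairs = [
    (np.array([10, 4, 4]) / 3, np.array([[1, 1, 1], [2, -1, -1]]) / 3),
    (np.array([5, 8, 2]) / 3,  np.array([[2, -1, -1], [1, 1, 1]]) / 3),
    (np.array([2, 3, 1.0]),    np.array([[1, 0, 0], [0, 0, 0.0]])),
    (np.array([3, 1, 1.0]),    np.array([[0, 0, 0], [1, 0, 0.0]])),
]
residuals = [np.max(np.abs(kmatvec(A, x) - kscale(x, lam))) for lam, x in pairs]
print(residuals)   # all at rounding level
```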



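Canonical set

There are more eigenvalues:

$$\lambda_5 = \mathrm{icft}(\mathrm{diag}[6\;\;{-\iota\sqrt{3}}\;\;2]), \qquad \lambda_6 = \mathrm{icft}(\mathrm{diag}[6\;\;2\;\;\iota\sqrt{3}]),$$

$$\lambda_7 = \mathrm{icft}(\mathrm{diag}[5\;\;{-\iota\sqrt{3}}\;\;2]), \qquad \lambda_8 = \mathrm{icft}(\mathrm{diag}[5\;\;2\;\;\iota\sqrt{3}]),$$

altogether a polynomial number of them (here $2^3 = 8$, one choice of eigenvalue of $A_j$ for each Fourier component $j$), which exceeds the dimension of the matrix.

Definition. A canonical set of eigenvalues and eigenvectors is a set of minimum size, ordered such that $\mathrm{abs}(\lambda_1) \geq \mathrm{abs}(\lambda_2) \geq \cdots \geq \mathrm{abs}(\lambda_n)$, which contains the information to reproduce any eigenvalue or eigenvector of $A$.

In this case, the only canonical set is $\{(\lambda_1, x_1), (\lambda_2, x_2)\}$. (We need two, and have $\mathrm{abs}(\lambda_1) \geq \mathrm{abs}(\lambda_2)$.)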
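A sketch that enumerates every eigenvalue of this example and builds the canonical pair. The construction (sort each $A_j$'s eigenvalues by decreasing magnitude, then take the $i$-th largest from every component to form $\lambda_i$) is our illustration for this diagonalizable example, not a general algorithm from the slides.

```python
import itertools

import numpy as np

A = np.array([[[2, 3, 1], [0, 0, 0]],
              [[0, 0, 0], [3, 1, 1]]], dtype=float)
Ah = np.fft.fft(A, axis=2)
n, k = A.shape[0], A.shape[2]
eigs = [np.linalg.eigvals(Ah[:, :, j]) for j in range(k)]   # eigenvalues of each A_j

# Every choice of one eigenvalue per Fourier component is an eigenvalue of A.
all_lams = [np.fft.ifft(np.array(choice)) for choice in itertools.product(*eigs)]
print(len(all_lams))   # n**k = 8

# Candidate canonical set: order each component by decreasing magnitude,
# then take the i-th largest from every component to form lambda_i.
ordered = np.array([sorted(e, key=abs, reverse=True) for e in eigs])   # shape (k, n)
canonical = [np.fft.ifft(ordered[:, i]) for i in range(n)]
```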



Keeping it real...

Let $A \in \mathbb{K}_k^{n \times n}$ be real-valued with diagonalizable $A_j$ matrices. If $k$ is odd, then the eigendecomposition $X \circ \Lambda \circ X^{-1}$ is real-valued if and only if $A_1$ has real-valued eigenvalues. If $k$ is even, then $X \circ \Lambda \circ X^{-1}$ is real-valued if and only if $A_1$ and $A_{k/2+1}$ have real-valued eigenvalues.
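Why only $A_1$ (and $A_{k/2+1}$ for even $k$) matter: for real data the FFT is conjugate symmetric, $A_j = \bar{A}_{k-j+2}$, so eigenvalues in paired components can be chosen as conjugates, while the self-paired components must contribute real eigenvalues for the icft to be real. The sketch below (hypothetical function name; NumPy's 0-based indexing puts $A_1$ at index 0 and $A_{k/2+1}$ at index $k/2$) checks that condition, assuming diagonalizable blocks as in the statement.

```python
import numpy as np


def real_eigendecomposition_possible(A, tol=1e-10):
    """For real A of shape (n, n, k): True if the self-conjugate Fourier blocks
    (index 0, and index k//2 when k is even) have only real eigenvalues."""
    Ah = np.fft.fft(A, axis=2)
    k = A.shape[2]
    self_paired = [0] if k % 2 else [0, k // 2]
    return all(np.all(np.abs(np.linalg.eigvals(Ah[:, :, j]).imag) < tol)
               for j in self_paired)


rng = np.random.default_rng(3)
B = rng.standard_normal((3, 3, 4))
S = B + B.transpose(1, 0, 2)          # symmetric in (r, s): A_1 and A_3 are real symmetric
R = np.zeros((2, 2, 5))
R[0, 1, 0], R[1, 0, 0] = -1.0, 1.0    # every A_j is [[0, -1], [1, 0]], eigenvalues ±i
Ah = np.fft.fft(B, axis=2)
```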



The power method converges

Let $A \in \mathbb{K}_k^{n \times n}$ have a canonical set of eigenvalues $\lambda_1, \ldots, \lambda_n$ where $|\lambda_1| > |\lambda_2|$. Then the power method in the circulant algebra converges to an eigenvector $x_1$ with eigenvalue $\lambda_1$, where we use the ordering

$$\alpha < \beta \leftrightarrow \mathrm{cft}(\alpha) < \mathrm{cft}(\beta) \text{ elementwise}.$$



The Arnoldi process

• Let $A$ be an $n \times n$ matrix with real-valued entries. Then the Arnoldi method is a technique to build an orthogonal basis for the Krylov subspace $\mathcal{K}_t(A, v) = \mathrm{span}\{v, Av, \ldots, A^{t-1}v\}$, where $v$ is an initial vector.

• We have the decomposition

$$A Q_t = Q_{t+1} H_{t+1,t},$$

where $Q_t$ is an $n \times t$ matrix and $H_{t+1,t}$ is a $(t+1) \times t$ upper Hessenberg matrix.

• Using our repertoire of operations, the Arnoldi method in the circulant algebra is equivalent to individual Arnoldi processes on each matrix $A_j$.

• This is equivalent to a block Arnoldi process.

• Using the cft and icft operations, we produce an Arnoldi factorization:

$$A \circ Q_t = Q_{t+1} \circ H_{t+1,t}.$$
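A sketch of the circulant Arnoldi process as $k$ independent Arnoldi processes, one per Fourier block $A_j$. The inner `arnoldi` is a textbook modified Gram-Schmidt version (our choice of orthogonalization), and the names `circulant_arnoldi` and `kmatmul` are hypothetical. The check verifies $A \circ Q_t = Q_{t+1} \circ H_{t+1,t}$, orthonormality of each Fourier block of $Q$, and the Hessenberg structure of each Fourier block of $H$.

```python
import numpy as np


def arnoldi(A, v, t):
    """Arnoldi with modified Gram-Schmidt: A Q[:, :t] = Q H, Q is n x (t+1),
    H is (t+1) x t. Assumes no breakdown (every H[i+1, i] is nonzero)."""
    n = A.shape[0]
    Q = np.zeros((n, t + 1), dtype=complex)
    H = np.zeros((t + 1, t), dtype=complex)
    Q[:, 0] = v / np.linalg.norm(v)
    for i in range(t):
        w = A @ Q[:, i]
        for r in range(i + 1):
            H[r, i] = np.vdot(Q[:, r], w)        # conj(q_r)^T w
            w = w - H[r, i] * Q[:, r]
        H[i + 1, i] = np.linalg.norm(w)
        Q[:, i + 1] = w / H[i + 1, i]
    return Q, H


def circulant_arnoldi(A, v, t):
    """Arnoldi in K_k for A (n, n, k), v (n, k): one Arnoldi process per Fourier
    block A_j. Returns Q of shape (n, t+1, k) and H of shape (t+1, t, k)."""
    n, _, k = A.shape
    Ah = np.fft.fft(A, axis=2)
    vh = np.fft.fft(v, axis=1)
    Qh = np.zeros((n, t + 1, k), dtype=complex)
    Hh = np.zeros((t + 1, t, k), dtype=complex)
    for j in range(k):
        Qh[:, :, j], Hh[:, :, j] = arnoldi(Ah[:, :, j], vh[:, j], t)
    return np.fft.ifft(Qh, axis=2), np.fft.ifft(Hh, axis=2)


def kmatmul(A, B):
    """A ∘ B for K_k matrices of shapes (m, p, k) and (p, q, k)."""
    C = np.einsum('rsj,stj->rtj', np.fft.fft(A, axis=2), np.fft.fft(B, axis=2))
    return np.fft.ifft(C, axis=2)


rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6, 5))
v = rng.standard_normal((6, 5))
t = 4
Q, H = circulant_arnoldi(A, v, t)
```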



Example

Consider

$$-\Delta u(x, y) = f(x, y), \qquad u(x, 0) = u(x, 1), \qquad u(0, y) = u(1, y) = 0$$

for $(x, y) \in [0,1] \times [0,1]$ with a uniform mesh and the standard 5-point discrete Laplacian:

$$-\Delta u(x_i, y_j) \approx -u(x_{i-1}, y_j) - u(x_i, y_{j-1}) + 4u(x_i, y_j) - u(x_{i+1}, y_j) - u(x_i, y_{j+1}).$$

Applying the boundary conditions and organizing the unknowns of $u$ in $y$-major order, we obtain an approximate solution by solving an $N(N-1) \times N(N-1)$ block-tridiagonal, circulant-block system.



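The Linear System

$$\underbrace{\begin{bmatrix} C & -I & & \\ -I & C & \ddots & \\ & \ddots & \ddots & -I \\ & & -I & C \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} u(x_1, \cdot) \\ u(x_2, \cdot) \\ \vdots \\ u(x_{N-1}, \cdot) \end{bmatrix}}_{u} = \underbrace{\begin{bmatrix} f(x_1, \cdot) \\ f(x_2, \cdot) \\ \vdots \\ f(x_{N-1}, \cdot) \end{bmatrix}}_{f}, \qquad C = \underbrace{\begin{bmatrix} 4 & -1 & & -1 \\ -1 & 4 & \ddots & \\ & \ddots & \ddots & -1 \\ -1 & & -1 & 4 \end{bmatrix}}_{N \times N}.$$

That is, $A u = f$, or $\mathbf{A} \circ \mathbf{u} = \mathbf{f}$, where $\mathbf{A}$ is an $(N-1) \times (N-1)$ matrix of $\mathbb{K}_N$ elements, $\mathbf{u}$ and $\mathbf{f}$ have compatible sizes, and $A = \mathrm{circ}(\mathbf{A})$, $u = \mathrm{vec}(\mathbf{u})$, $f = \mathrm{vec}(\mathbf{f})$.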
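A sketch that assembles this system for a small hypothetical $N$ and solves it two ways: densely, and in the circulant algebra, where after an FFT in $y$ each Fourier component $j$ is an $(N-1) \times (N-1)$ tridiagonal system with diagonal $\hat c_j = 4 - 2\cos(2\pi j/N)$ and off-diagonals $-1$. The right-hand side is random, each component is solved with a dense `np.linalg.solve` for simplicity, and, as in the 5-point formula above, no $h^2$ mesh scaling is applied.

```python
import numpy as np


def circ(a):
    a = np.asarray(a)
    return np.column_stack([np.roll(a, s) for s in range(a.size)])


N = 8                                       # hypothetical mesh parameter
m = N - 1                                   # interior points in x
c = np.zeros(N)
c[0], c[1], c[-1] = 4.0, -1.0, -1.0         # first column of C
C = circ(c)
off = np.eye(m, k=1) + np.eye(m, k=-1)      # positions of the -I blocks
A = np.kron(np.eye(m), C) - np.kron(off, np.eye(N))   # N(N-1) x N(N-1)

rng = np.random.default_rng(5)
f = rng.standard_normal((m, N))             # row i holds f(x_{i+1}, .)
u_dense = np.linalg.solve(A, f.reshape(-1)).reshape(m, N)

# Circulant-algebra solve: FFT along y, one small system per Fourier index j.
c_hat = np.fft.fft(c).real                  # 4 - 2 cos(2 pi j / N)
f_hat = np.fft.fft(f, axis=1)
u_hat = np.empty_like(f_hat)
for j in range(N):
    Aj = c_hat[j] * np.eye(m) - off         # the Fourier block A_j
    u_hat[:, j] = np.linalg.solve(Aj, f_hat[:, j])
u_fourier = np.fft.ifft(u_hat, axis=1).real
```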



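The canonical eigenvalues of $\mathbf{A}$ are

$$\lambda_j = \{4 + 2\cos(j\pi/N),\; -1,\; 0,\; \ldots,\; 0,\; -1\}, \qquad j = 1, \ldots, N-1.$$

To see this result, let $\lambda(\mu) = \{\mu, -1, 0, \ldots, 0, -1\}$. Then

$$\mathbf{A} - \lambda(\mu) \circ \mathbf{I} = \begin{bmatrix} (4-\mu) \circ 1 & -1 \circ 1 & & \\ -1 \circ 1 & (4-\mu) \circ 1 & \ddots & \\ & \ddots & \ddots & -1 \circ 1 \\ & & -1 \circ 1 & (4-\mu) \circ 1 \end{bmatrix}.$$

Every entry is a real multiple of $1 \in \mathbb{K}_N$, so this is the ordinary tridiagonal Toeplitz matrix with diagonal $4 - \mu$ and off-diagonals $-1$. Its eigenvalues are $4 - \mu + 2\cos(j\pi/N)$ for $j = 1, \ldots, N-1$, so it is singular exactly when $\mu = 4 + 2\cos(j\pi/N)$.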
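A numerical check of this claim for a small hypothetical $N$: for each $i$, $\mathbf{A} - \lambda_i \circ \mathbf{I}$ should be singular in every Fourier component. The Fourier coefficients of $\lambda_i$ are $4 + 2\cos(i\pi/N) - 2\cos(2\pi j/N)$, so each shifted block reduces to the tridiagonal matrix with diagonal $-2\cos(i\pi/N)$ and off-diagonals $-1$. This sketch measures singularity by the smallest singular value.

```python
import numpy as np

N = 8                                      # hypothetical mesh parameter
m = N - 1
c = np.zeros(N)
c[0], c[1], c[-1] = 4.0, -1.0, -1.0        # diagonal entry of A, an element of K_N
c_hat = np.fft.fft(c)                      # its Fourier coefficients
off = np.eye(m, k=1) + np.eye(m, k=-1)

worst = 0.0
for i in range(1, N):
    lam = np.zeros(N)
    lam[0], lam[1], lam[-1] = 4 + 2 * np.cos(i * np.pi / N), -1.0, -1.0
    lam_hat = np.fft.fft(lam)
    for j in range(N):
        shifted = (c_hat[j] - lam_hat[j]) * np.eye(m) - off   # A_j - lam_hat_j I
        smin = np.linalg.svd(shifted, compute_uv=False)[-1]   # smallest singular value
        worst = max(worst, smin)
print(worst)   # at rounding level: every shifted block is singular
```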



[Figure: eigenvalue error and eigenvector change (magnitude, log scale) versus iteration. Fit lines: $\left(\frac{2+2\cos(2\pi/n)}{2+2\cos(\pi/n)}\right)^{2i}$, $\left(\frac{6+2\cos(2\pi/n)}{6+2\cos(\pi/n)}\right)^{2i}$, and $\left(\frac{6+2\cos(2\pi/n)}{6+2\cos(\pi/n)}\right)^{i}$.]

Figure: The convergence behavior of the power method in the circulant algebra. The gray lines show the error in each eigenvalue in Fourier space. These curves track the predictions made based on the eigenvalues as discussed in the text. The red line shows the magnitude of the change in the eigenvector. We use this as the stopping criterion. It also decays as predicted by the ratio of eigenvalues. The blue fit lines have been visually adjusted to match the behavior in the convergence tail.

[Figure: absolute error and residual magnitude (log scale) versus Arnoldi iteration, 0 to 50.]

Figure: The convergence behavior of a GMRES procedure using the circulant Arnoldi process. The gray lines show the error in each Fourier component and the red line shows the magnitude of the residual. We observe poor convergence in one Fourier component until the Arnoldi basis captures all of the eigenvalues after $N/2 + 1 = 26$ iterations. These results show how the two computations are performing individual power methods or Arnoldi processes in Fourier space.



The End

Paper available online from http://www.cs.ubc.ca/~greif: "The power and Arnoldi methods in an algebra of circulants", David Gleich, Chen Greif and Jim Varah

Thank you!
