The power and Arnoldi methods in an algebra of circulants
DESCRIPTION
My talk from the CCAM seminar on April 19th on our NLA paper with Chen Greif and Jim Varah (http://dx.doi.org/10.1002/nla.1845)
TRANSCRIPT
David F. Gleich (Purdue) CCAM Seminar 1 / 29
The power and Arnoldi methods in an algebra of circulants
David F. Gleich
Computer Science, Purdue University
CCAM Seminar, April 19th, 2013
In collaboration with Chen Greif and Jim Varah (UBC)
Supported by a research grant from NSERC and the Sandia National Labs John von Neumann fellowship
Introduction
Kilmer, Martin, and Perrone (2008) presented a circulant algebra: a set of operations that generalize matrix algebra to three-way data, and provided an SVD.
The essence of this approach amounts to viewing three-dimensional objects as two-dimensional arrays (i.e., matrices) of one-dimensional arrays (i.e., vectors).
Braman (2010) developed spectral and other decompositions.
We have extended this algebra with the ingredients required for iterative methods such as the power method and Arnoldi method, and have characterized the behavior of these algorithms.
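A look at the power method
Require $A$, $x^{(0)}$, $\tau$
$x^{(0)} \leftarrow x^{(0)} \|x^{(0)}\|^{-1}$
for $k = 1, \ldots,$ until convergence
    $y^{(k)} \leftarrow A x^{(k-1)}$
    $\alpha^{(k)} \leftarrow \|y^{(k)}\|$
    $x^{(k)} \leftarrow y^{(k)} (\alpha^{(k)})^{-1}$
    if $\|\mathrm{sign}(x_1^{(k)})\, x^{(k)} - \mathrm{sign}(x_1^{(k-1)})\, x^{(k-1)}\| < \tau$
        return $x^{(k)}$
    end if
end for
Requires a scalar inverse, norm, absolute value, ...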
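As a point of reference before moving to circulants, the iteration on this slide can be sketched for ordinary real matrices with NumPy. The function name `power_method`, the tolerance, and the small test matrix are assumptions made for illustration and are not taken from the talk.

```python
import numpy as np

def power_method(A, x0, tol=1e-10, max_iter=10_000):
    """Power method with the sign-normalized stopping test from the slide."""
    x = x0 / np.linalg.norm(x0)
    alpha = 0.0
    for _ in range(max_iter):
        y = A @ x
        alpha = np.linalg.norm(y)          # needs a norm
        x_new = y / alpha                  # needs a scalar inverse
        # Compare iterates after fixing the sign of the first entry.
        diff = np.sign(x_new[0]) * x_new - np.sign(x[0]) * x
        x = x_new
        if np.linalg.norm(diff) < tol:
            break
    return alpha, x

# Hypothetical 2 x 2 symmetric test matrix, chosen only for illustration.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
alpha, x = power_method(A, np.ones(2))
```

Each commented step marks an operation (norm, inverse, sign) that has to be given a meaning once the scalars become length-k vectors.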
Three-way arrays
Given an m × n × k table of data, we view this data as an m × n matrix where each “scalar” is a vector of length k.
$A \in \mathbb{K}_k^{m \times n}$
We denote the space of length-k scalars as $\mathbb{K}_k$. These scalars interact like circulant matrices.
Circulants
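Circulant matrices are a commutative class that is closed under the standard matrix operations.
$$\begin{bmatrix} \alpha_1 & \alpha_k & \cdots & \alpha_2 \\ \alpha_2 & \alpha_1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \alpha_k \\ \alpha_k & \cdots & \alpha_2 & \alpha_1 \end{bmatrix}$$
We’ll see more of their properties shortly!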
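The circ operation
We denote the space of length-k scalars as $\mathbb{K}_k$. These scalars interact like circulant matrices.
$\alpha = \{\alpha_1 \ \ldots \ \alpha_k\} \in \mathbb{K}_k$.
$$\alpha \leftrightarrow \mathrm{circ}(\alpha) \equiv \begin{bmatrix} \alpha_1 & \alpha_k & \cdots & \alpha_2 \\ \alpha_2 & \alpha_1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \alpha_k \\ \alpha_k & \cdots & \alpha_2 & \alpha_1 \end{bmatrix}.$$
$\alpha + \beta \leftrightarrow \mathrm{circ}(\alpha) + \mathrm{circ}(\beta)$ and $\alpha \circ \beta \leftrightarrow \mathrm{circ}(\alpha)\,\mathrm{circ}(\beta)$;
$0 = \{0 \ 0 \ \ldots \ 0\}$, $1 = \{1 \ 0 \ \ldots \ 0\}$
$\mathbb{K}_k$ is the ring of length-k circulants.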
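The correspondence between length-k scalars and circulant matrices can be checked numerically. The sketch below assumes NumPy; the helper names `circ` and `cmul` are chosen here for illustration.

```python
import numpy as np

def circ(alpha):
    """k x k circulant matrix whose first column is alpha."""
    alpha = np.asarray(alpha)
    k = len(alpha)
    idx = (np.arange(k)[:, None] - np.arange(k)[None, :]) % k
    return alpha[idx]

def cmul(alpha, beta):
    """Product of two scalars in K_k: first column of circ(alpha) @ circ(beta)."""
    return circ(alpha) @ np.asarray(beta)

alpha = np.array([2.0, 3.0, 1.0])
beta = np.array([3.0, 1.0, 1.0])
one = np.array([1.0, 0.0, 0.0])

print(circ(alpha))
print(np.allclose(circ(cmul(alpha, beta)), circ(alpha) @ circ(beta)))  # True
print(np.allclose(cmul(alpha, beta), cmul(beta, alpha)))               # True
```

Storing only the first column is enough because a circulant is determined by it, which is why the product can return a vector rather than a full matrix.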
The circ operation on matrices
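$$A \circ x = \begin{bmatrix} \sum_{j=1}^{n} A_{1,j} \circ x_j \\ \vdots \\ \sum_{j=1}^{n} A_{m,j} \circ x_j \end{bmatrix} \leftrightarrow \begin{bmatrix} \mathrm{circ}(A_{1,1}) & \cdots & \mathrm{circ}(A_{1,n}) \\ \vdots & \ddots & \vdots \\ \mathrm{circ}(A_{m,1}) & \cdots & \mathrm{circ}(A_{m,n}) \end{bmatrix} \begin{bmatrix} \mathrm{circ}(x_1) \\ \vdots \\ \mathrm{circ}(x_n) \end{bmatrix}.$$
Define
$$\mathrm{circ}(A) \equiv \begin{bmatrix} \mathrm{circ}(A_{1,1}) & \cdots & \mathrm{circ}(A_{1,n}) \\ \vdots & \ddots & \vdots \\ \mathrm{circ}(A_{m,1}) & \cdots & \mathrm{circ}(A_{m,n}) \end{bmatrix}, \qquad \mathrm{circ}(x) \equiv \begin{bmatrix} \mathrm{circ}(x_1) \\ \vdots \\ \mathrm{circ}(x_n) \end{bmatrix}.$$
$A \circ x \leftrightarrow \mathrm{circ}(A)\,\mathrm{circ}(x)$: matrix-vector products.
$x \circ \alpha \leftrightarrow \mathrm{circ}(x)\,\mathrm{circ}(\alpha)$: vector-scalar products.
This is equivalent to Kilmer, Martin, and Perrone (2008).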
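A look at the power method
Require $A$, $x^{(0)}$, $\tau$ $\longrightarrow$ $A$, $x^{(0)}$, $\tau$ (now with elements in $\mathbb{K}_k$)
$x^{(0)} \leftarrow x^{(0)} \|x^{(0)}\|^{-1}$ $\longrightarrow$ $x^{(0)} \circ \|x^{(0)}\|^{-1} \leftrightarrow \mathrm{circ}(x^{(0)})\,\mathrm{circ}(\|x^{(0)}\|)^{-1}$
for $k = 1, \ldots,$ until convergence
    $y^{(k)} \leftarrow A x^{(k-1)}$ $\longrightarrow$ $y^{(k)} \leftarrow A \circ x^{(k-1)} \leftrightarrow \mathrm{circ}(A)\,\mathrm{circ}(x^{(k-1)})$
    $\alpha^{(k)} \leftarrow \|y^{(k)}\|$ $\longrightarrow$ $\ldots$
    $x^{(k)} \leftarrow y^{(k)} (\alpha^{(k)})^{-1}$
    if $\|\mathrm{sign}(x_1^{(k)})\, x^{(k)} - \mathrm{sign}(x_1^{(k-1)})\, x^{(k-1)}\| < \tau$
        return $x^{(k)}$
    end if
end for
Requires a scalar inverse (✓), norm (?), absolute value (?), ...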
Circulants and Fourier transforms
Let C be a k × k circulant matrix. Then the eigenvector matrix of C is given by the k × k discrete Fourier transform matrix F, where
$$F_{ij} = \frac{1}{\sqrt{k}}\, \omega^{(i-1)(j-1)}$$
and $\omega = e^{2\pi\iota/k}$.
• This matrix is complex symmetric, $F^T = F$, and unitary, $F^* = F^{-1}$. Thus, $C = F D F^*$, $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$.
• Multiplying a vector by $F$ or $F^*$ can be accomplished via the fast Fourier transform in $O(k \log k)$ time instead of $O(k^2)$ for the typical matrix-vector product algorithm.
• Computing the matrix D can be done in $O(k \log k)$ time as well:
d = fft(a)
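A small numerical check of these statements, assuming NumPy and its FFT sign convention (`np.fft.fft` uses $e^{-2\pi\iota jn/k}$). The size k = 5 and the vector `a` are arbitrary choices made for illustration.

```python
import numpy as np

k = 5
a = np.array([4.0, -1.0, 0.0, 0.0, -1.0])       # first column of the circulant
idx = (np.arange(k)[:, None] - np.arange(k)[None, :]) % k
C = a[idx]

# F_ij = omega^((i-1)(j-1)) / sqrt(k); the indices are 0-based in the code.
i, j = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
omega = np.exp(2j * np.pi / k)
F = omega ** (i * j) / np.sqrt(k)

D = F.conj().T @ C @ F                          # expected to be diagonal
d = np.fft.fft(a)                               # eigenvalues in O(k log k)

print(np.allclose(F, F.T))                      # complex symmetric
print(np.allclose(F.conj().T @ F, np.eye(k)))   # unitary
print(np.allclose(D, np.diag(d)))               # C = F D F^*
```

With this convention the diagonal of D agrees with `fft(a)` entry by entry, so the explicit matrix F is never needed in practice.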
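cft and icft
We define the “Circulant Fourier Transform” or cft
$$\mathrm{cft} : \alpha \in \mathbb{K}_k \mapsto \mathbb{C}^{k \times k}$$
and its inverse
$$\mathrm{icft} : \mathbb{C}^{k \times k} \mapsto \mathbb{K}_k$$
as follows:
$$\mathrm{cft}(\alpha) \equiv \begin{bmatrix} \hat{\alpha}_1 & & \\ & \ddots & \\ & & \hat{\alpha}_k \end{bmatrix} = F^* \mathrm{circ}(\alpha) F,$$
$$\mathrm{icft}\left( \begin{bmatrix} \hat{\alpha}_1 & & \\ & \ddots & \\ & & \hat{\alpha}_k \end{bmatrix} \right) \equiv \alpha \leftrightarrow F\, \mathrm{cft}(\alpha)\, F^*,$$
where $\hat{\alpha}_j$ are the eigenvalues of $\mathrm{circ}(\alpha)$ as produced in the Fourier transform. These transformations satisfy $\mathrm{icft}(\mathrm{cft}(\alpha)) = \alpha$ and provide a convenient way of moving between operations in $\mathbb{K}_k$ and the more familiar environment of diagonal matrices in $\mathbb{C}^{k \times k}$.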
Operations
Let $\alpha, \beta \in \mathbb{K}_k$. Note that
$\alpha + \beta = \mathrm{icft}(\mathrm{cft}(\alpha) + \mathrm{cft}(\beta))$, and
$\alpha \circ \beta = \mathrm{icft}(\mathrm{cft}(\alpha)\, \mathrm{cft}(\beta))$.
In the Fourier space (the output of the cft operation) these operations are both $O(k)$ time because they occur between diagonal matrices. These simplifications generalize to matrix-based operations too. For example,
$A \circ x = \mathrm{icft}(\mathrm{cft}(A)\, \mathrm{cft}(x))$.
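The two scalar identities can be sketched with NumPy's FFT, storing only the diagonal of each cft output. The function names `cft` and `icft` mirror the slide, but the implementation is an assumption of this sketch.

```python
import numpy as np

def cft(alpha):
    """Diagonal of cft(alpha): the eigenvalues of circ(alpha), via the FFT."""
    return np.fft.fft(alpha)

def icft(d):
    """Inverse of cft, acting on the stored diagonal."""
    return np.fft.ifft(d)

alpha = np.array([2.0, 3.0, 1.0])
beta = np.array([3.0, 1.0, 1.0])

total = icft(cft(alpha) + cft(beta)).real    # alpha + beta
prod = icft(cft(alpha) * cft(beta)).real     # product in K_k, O(k) work between the FFTs
print(total)
print(prod)
```

The product is a circular convolution of the two vectors, which is why an elementwise multiplication of Fourier coefficients reproduces it.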
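Operations (cont.)
In the Fourier space, this system is a series of independent matrix-vector products:
$$\mathrm{cft}(A)\, \mathrm{cft}(x) = \begin{bmatrix} \hat{A}_1 & & \\ & \ddots & \\ & & \hat{A}_k \end{bmatrix} \begin{bmatrix} \hat{x}_1 & & \\ & \ddots & \\ & & \hat{x}_k \end{bmatrix} = \begin{bmatrix} \hat{A}_1 \hat{x}_1 & & \\ & \ddots & \\ & & \hat{A}_k \hat{x}_k \end{bmatrix}.$$
We use $\hat{A}_j$ and $\hat{x}_j$ to denote the blocks of Fourier coefficients, or equivalently, circulant eigenvalues. This formulation takes
$$\underbrace{O(mnk \log k + nk \log k)}_{\text{cft and icft}} + \underbrace{O(kmn)}_{\text{matvecs}}$$
operations instead of $O(mnk^2)$ using the circ formulation.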
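The blockwise product can be sketched as follows, assuming NumPy and an array layout in which the last axis holds the length-k scalars. The name `circ_matvec` and the test data are illustrative assumptions.

```python
import numpy as np

def circ_matvec(A, x):
    """Matrix-vector product in the circulant algebra.

    A has shape (m, n, k) and x has shape (n, k).
    """
    A_hat = np.fft.fft(A, axis=-1)                  # cft: m*n FFTs of length k
    x_hat = np.fft.fft(x, axis=-1)                  # cft: n FFTs of length k
    # k independent (m x n) times (n,) products, one per Fourier index.
    y_hat = np.einsum("ijk,jk->ik", A_hat, x_hat)
    return np.fft.ifft(y_hat, axis=-1)              # icft

A = np.array([[[2, 3, 1], [8, -2, 0]],
              [[-2, 0, 2], [3, 1, 1]]], dtype=float)
x = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
y = circ_matvec(A, x).real
print(y)
```

The `einsum` call is where the O(kmn) term of the cost estimate comes from; the FFT calls account for the logarithmic terms.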
Operations (cont.)
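More operations are simplified in the Fourier space too. Let $\mathrm{cft}(\alpha) = \mathrm{diag}[\hat{\alpha}_1, \ldots, \hat{\alpha}_k]$. Because the $\hat{\alpha}_j$ values are the eigenvalues of $\mathrm{circ}(\alpha)$, we have:
$\mathrm{abs}(\alpha) = \mathrm{icft}(\mathrm{diag}[|\hat{\alpha}_1|, \ldots, |\hat{\alpha}_k|])$,
$\overline{\alpha} = \mathrm{icft}(\mathrm{diag}[\overline{\hat{\alpha}_1}, \ldots, \overline{\hat{\alpha}_k}]) = \mathrm{icft}(\mathrm{cft}(\alpha)^*)$, and
$\mathrm{angle}(\alpha) = \mathrm{icft}(\mathrm{diag}[\hat{\alpha}_1 / |\hat{\alpha}_1|, \ldots, \hat{\alpha}_k / |\hat{\alpha}_k|])$.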
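These three definitions translate directly into FFT calls. The sketch below assumes NumPy; the names `cabs`, `cconj`, and `cangle` are hypothetical, and `cangle` additionally assumes that no Fourier coefficient is zero.

```python
import numpy as np

fft, ifft = np.fft.fft, np.fft.ifft

def cabs(alpha):
    """abs(alpha): absolute values of the Fourier coefficients."""
    return ifft(np.abs(fft(alpha)))

def cconj(alpha):
    """Conjugate of alpha: conjugated Fourier coefficients."""
    return ifft(np.conj(fft(alpha)))

def cangle(alpha):
    """angle(alpha): unit-modulus Fourier coefficients (assumes none is zero)."""
    d = fft(alpha)
    return ifft(d / np.abs(d))

alpha = np.array([2.0, 3.0, 1.0])
# The product of angle(alpha) and abs(alpha) should recover alpha.
recovered = ifft(fft(cangle(alpha)) * fft(cabs(alpha)))
print(np.allclose(recovered, alpha))   # True
```

For a real scalar, the conjugate corresponds to the transposed circulant, so `cconj(alpha)` returns the first row of circ(alpha).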
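Decompositional interpretation of cft
Algebraically, the cft operation for a matrix $A \in \mathbb{K}_k^{m \times n}$ is
$$\mathrm{cft}(A) = P_m (I_m \otimes F^*)\, \mathrm{circ}(A)\, (I_n \otimes F) P_n^T,$$
where $P_m$ and $P_n$ are permutation matrices. We can equivalently write this directly in terms of the eigenvalues of each of the circulant blocks of $\mathrm{circ}(A)$:
$$\mathrm{cft}(A) \equiv \begin{bmatrix} \hat{A}_1 & & \\ & \ddots & \\ & & \hat{A}_k \end{bmatrix}, \qquad \hat{A}_j = \begin{bmatrix} \lambda_j^{1,1} & \cdots & \lambda_j^{1,n} \\ \vdots & \ddots & \vdots \\ \lambda_j^{m,1} & \cdots & \lambda_j^{m,n} \end{bmatrix},$$
where $\lambda_1^{r,s}, \ldots, \lambda_k^{r,s}$ are the diagonal elements of $\mathrm{cft}(A_{r,s})$. The inverse operation icft takes a block diagonal matrix and returns the matrix in $\mathbb{K}_k^{m \times n}$:
$$\mathrm{icft}(\hat{A}) \leftrightarrow (I_m \otimes F) P_m^T \hat{A} P_n (I_n \otimes F^*).$$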
Back to figure
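Example
Let $A = \begin{bmatrix} \{2 \ 3 \ 1\} & \{8 \ {-2} \ 0\} \\ \{-2 \ 0 \ 2\} & \{3 \ 1 \ 1\} \end{bmatrix}$. The results of the circ and cft operations are:
$$\mathrm{circ}(A) = \begin{bmatrix} 2 & 1 & 3 & 8 & 0 & -2 \\ 3 & 2 & 1 & -2 & 8 & 0 \\ 1 & 3 & 2 & 0 & -2 & 8 \\ -2 & 2 & 0 & 3 & 1 & 1 \\ 0 & -2 & 2 & 1 & 3 & 1 \\ 2 & 0 & -2 & 1 & 1 & 3 \end{bmatrix},$$
$$(I \otimes F^*)\, \mathrm{circ}(A)\, (I \otimes F) = \begin{bmatrix} 6 & & & 6 & & \\ & -\sqrt{3}\iota & & & 9+\sqrt{3}\iota & \\ & & \sqrt{3}\iota & & & 9-\sqrt{3}\iota \\ 0 & & & 5 & & \\ & -3+\sqrt{3}\iota & & & 2 & \\ & & -3-\sqrt{3}\iota & & & 2 \end{bmatrix},$$
$$\mathrm{cft}(A) = \begin{bmatrix} 6 & 6 & & & & \\ 0 & 5 & & & & \\ & & -\sqrt{3}\iota & 9+\sqrt{3}\iota & & \\ & & -3+\sqrt{3}\iota & 2 & & \\ & & & & \sqrt{3}\iota & 9-\sqrt{3}\iota \\ & & & & -3-\sqrt{3}\iota & 2 \end{bmatrix}.$$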
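The diagonal blocks of cft(A) for this example can be recomputed with one FFT along the last axis. This sketch assumes NumPy and a (2, 2, 3) array layout chosen here for illustration; the printed blocks can be compared against the matrices on the slide.

```python
import numpy as np

A = np.array([[[2, 3, 1], [8, -2, 0]],
              [[-2, 0, 2], [3, 1, 1]]], dtype=float)

# A_hat[:, :, j] is the j-th diagonal block of cft(A) (0-based j).
A_hat = np.fft.fft(A, axis=-1)
for j in range(A.shape[-1]):
    print(f"block {j + 1}:")
    print(np.round(A_hat[:, :, j], 4))
```

Each block collects the j-th eigenvalue of every circulant entry, which is the effect of the permutations in the decompositional form of cft.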
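A look at the power method
Require $A$, $x^{(0)}$, $\tau$
$x^{(0)} \leftarrow x^{(0)} \|x^{(0)}\|^{-1}$
for $k = 1, \ldots,$ until convergence
    $y^{(k)} \leftarrow A x^{(k-1)}$
    $\alpha^{(k)} \leftarrow \|y^{(k)}\|$
    $x^{(k)} \leftarrow y^{(k)} (\alpha^{(k)})^{-1}$
    if $\|\mathrm{sign}(x_1^{(k)})\, x^{(k)} - \mathrm{sign}(x_1^{(k-1)})\, x^{(k-1)}\| < \tau$
        return $x^{(k)}$
    end if
end for
Requires a scalar inverse (✓), norm (✓), absolute value (✓), ...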
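These operations can now be straightforwardly defined
Inverse of a scalar:
$\alpha^{-1} \leftrightarrow \mathrm{circ}(\alpha)^{-1}$.
More generally, function of a scalar:
$f(\alpha) \leftrightarrow f(\mathrm{circ}(\alpha))$
Angle:
$\mathrm{angle}(\alpha) \circ |\alpha| = \alpha$, $\quad \mathrm{angle}(\alpha) \leftrightarrow \mathrm{circ}(\mathrm{abs}(\alpha))^{-1}\, \mathrm{circ}(\alpha)$.
The norm of a vector in $\mathbb{K}_k^n$ produces a scalar in $\mathbb{K}_k$:
$$\|x\| \leftrightarrow (\mathrm{circ}(x)^* \mathrm{circ}(x))^{1/2} = \left( \sum_{i=1}^{n} \mathrm{circ}(x_i)^* \mathrm{circ}(x_i) \right)^{1/2}.$$
Inner product: $\langle x, y \rangle \leftrightarrow \mathrm{circ}(y)^* \mathrm{circ}(x)$.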
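In Fourier space these definitions reduce to elementwise operations on the coefficients. The sketch below assumes NumPy and stores a vector in $\mathbb{K}_k^n$ as an (n, k) array; the names `cinv`, `cnorm`, and `cinner` are hypothetical.

```python
import numpy as np

fft, ifft = np.fft.fft, np.fft.ifft

def cinv(alpha):
    """Inverse scalar, matching circ(alpha)^{-1}; assumes no zero eigenvalue."""
    return ifft(1.0 / fft(alpha))

def cnorm(x):
    """Norm of x (shape (n, k)); the result is a scalar in K_k."""
    x_hat = fft(x, axis=-1)
    return ifft(np.sqrt(np.sum(np.abs(x_hat) ** 2, axis=0)))

def cinner(x, y):
    """Inner product <x, y>, matching circ(y)^* circ(x)."""
    return ifft(np.sum(np.conj(fft(y, axis=-1)) * fft(x, axis=-1), axis=0))

alpha = np.array([2.0, 3.0, 1.0])
x = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

identity = ifft(fft(alpha) * fft(cinv(alpha)))   # should be {1 0 0}
print(np.round(identity.real, 12))
print(np.round(cnorm(x).real, 12))
```

The norm takes an ordinary 2-norm separately for each Fourier index, so its output is a length-k scalar rather than a single real number.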
Example
Run the power method on
$$\begin{bmatrix} \{2 \ 3 \ 1\} & \{0 \ 0 \ 0\} \\ \{0 \ 0 \ 0\} & \{3 \ 1 \ 1\} \end{bmatrix}$$
Result: $\lambda = (1/3)\, \{10 \ 4 \ 4\}$
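This run can be reproduced with a short sketch that carries out the iteration in Fourier space, assuming NumPy. The starting vector and the stopping rule (stalling of the norm-based eigenvalue estimate) are assumptions of this sketch; the slides use a sign-normalized eigenvector test instead.

```python
import numpy as np

def circulant_power_method(A, x0, tol=1e-12, max_iter=5000):
    """Power method in the circulant algebra, run in Fourier space.

    A has shape (n, n, k) and x0 has shape (n, k). Returns the norm-based
    eigenvalue estimate in K_k and the final vector in K_k^n.
    """
    A_hat = np.fft.fft(A, axis=-1)                     # k blocks of size n x n
    x_hat = np.fft.fft(x0, axis=-1)
    x_hat = x_hat / np.linalg.norm(x_hat, axis=0)      # normalize x by its norm
    alpha_prev = np.full(A.shape[-1], np.inf)
    alpha_hat = alpha_prev
    for _ in range(max_iter):
        y_hat = np.einsum("ijk,jk->ik", A_hat, x_hat)  # y = A times x, per block
        alpha_hat = np.linalg.norm(y_hat, axis=0)      # one norm per Fourier block
        x_hat = y_hat / alpha_hat                      # x = y times alpha^{-1}
        # Simplified stopping test (assumption): eigenvalue estimates stall.
        if np.max(np.abs(alpha_hat - alpha_prev)) < tol:
            break
        alpha_prev = alpha_hat
    return np.fft.ifft(alpha_hat), np.fft.ifft(x_hat, axis=-1)

A = np.zeros((2, 2, 3))
A[0, 0] = [2, 3, 1]
A[1, 1] = [3, 1, 1]
x0 = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])     # hypothetical start vector
lam, x = circulant_power_method(A, x0)
print(np.round(lam.real, 6))    # approximately [3.333333 1.333333 1.333333]
```

Because the estimate comes from a norm, it recovers abs of the dominant eigenvalue; in this example the dominant Fourier coefficients 6, 2, 2 are positive, so the two coincide.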
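Example
$$A = \begin{bmatrix} \{2 \ 3 \ 1\} & \{0 \ 0 \ 0\} \\ \{0 \ 0 \ 0\} & \{3 \ 1 \ 1\} \end{bmatrix}$$
$$\hat{A}_1 = \begin{bmatrix} 6 & 0 \\ 0 & 5 \end{bmatrix}, \quad \hat{A}_2 = \begin{bmatrix} -\iota\sqrt{3} & 0 \\ 0 & 2 \end{bmatrix}, \quad \hat{A}_3 = \begin{bmatrix} \iota\sqrt{3} & 0 \\ 0 & 2 \end{bmatrix}.$$
$\lambda_1 = \mathrm{icft}(\mathrm{diag}[6 \ \ 2 \ \ 2]) = (1/3)\, \{10 \ 4 \ 4\}$
$\lambda_2 = \mathrm{icft}(\mathrm{diag}[5 \ \ {-\iota\sqrt{3}} \ \ \iota\sqrt{3}]) = (1/3)\, \{5 \ 8 \ 2\}$
$\lambda_3 = \mathrm{icft}(\mathrm{diag}[6 \ \ {-\iota\sqrt{3}} \ \ \iota\sqrt{3}]) = \{2 \ 3 \ 1\}$
$\lambda_4 = \mathrm{icft}(\mathrm{diag}[5 \ \ 2 \ \ 2]) = \{3 \ 1 \ 1\}$.
The corresponding eigenvectors are
$$x_1 = \begin{bmatrix} \{1/3 \ \ 1/3 \ \ 1/3\} \\ \{2/3 \ \ {-1/3} \ \ {-1/3}\} \end{bmatrix}; \quad x_2 = \begin{bmatrix} \{2/3 \ \ {-1/3} \ \ {-1/3}\} \\ \{1/3 \ \ 1/3 \ \ 1/3\} \end{bmatrix};$$
$$x_3 = \begin{bmatrix} \{1 \ 0 \ 0\} \\ \{0 \ 0 \ 0\} \end{bmatrix}; \quad x_4 = \begin{bmatrix} \{0 \ 0 \ 0\} \\ \{1 \ 0 \ 0\} \end{bmatrix}.$$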
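The eigenpairs on this slide can be checked numerically. The sketch below assumes NumPy; the helpers `matvec` and `scale` are illustrative names, and only the first and third pairs are verified explicitly.

```python
import numpy as np

fft, ifft = np.fft.fft, np.fft.ifft

def matvec(A, x):
    """Matrix-vector product in the circulant algebra, computed per Fourier block."""
    return ifft(np.einsum("ijk,jk->ik", fft(A, axis=-1), fft(x, axis=-1)), axis=-1)

def scale(x, lam):
    """Vector-scalar product: multiply every element of x by lam in K_k."""
    return ifft(fft(x, axis=-1) * fft(lam), axis=-1)

A = np.zeros((2, 2, 3))
A[0, 0] = [2, 3, 1]
A[1, 1] = [3, 1, 1]

lam1 = ifft(np.array([6, 2, 2])).real              # (1/3){10 4 4}
x1 = np.array([[1, 1, 1], [2, -1, -1]]) / 3.0
lam3 = np.array([2.0, 3.0, 1.0])
x3 = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])

print(np.allclose(matvec(A, x1), scale(x1, lam1)))  # True
print(np.allclose(matvec(A, x3), scale(x3, lam3)))  # True

# The remaining eigenvalues follow from the same inverse transform.
s3 = np.sqrt(3.0)
lam2 = ifft(np.array([5, -1j * s3, 1j * s3])).real
lam4 = ifft(np.array([5, 2, 2])).real
print(lam2, lam4)
```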
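Canonical set
There are more eigenvalues:
$\lambda_5 = \mathrm{icft}(\mathrm{diag}[6 \ \ {-\iota\sqrt{3}} \ \ 2])$, $\quad \lambda_6 = \mathrm{icft}(\mathrm{diag}[6 \ \ 2 \ \ \iota\sqrt{3}])$,
$\lambda_7 = \mathrm{icft}(\mathrm{diag}[5 \ \ {-\iota\sqrt{3}} \ \ 2])$, $\quad \lambda_8 = \mathrm{icft}(\mathrm{diag}[5 \ \ 2 \ \ \iota\sqrt{3}])$.
Altogether there is a polynomial number of eigenvalues, which exceeds the dimension of the matrix.
Definition. A canonical set of eigenvalues and eigenvectors is a set of minimum size, ordered such that $\mathrm{abs}(\lambda_1) \ge \mathrm{abs}(\lambda_2) \ge \ldots \ge \mathrm{abs}(\lambda_k)$, which contains the information to reproduce any eigenvalue or eigenvector of A.
In this case, the only canonical set is $\{(\lambda_1, x_1), (\lambda_2, x_2)\}$. (We need two pairs, and we have $\mathrm{abs}(\lambda_1) \ge \mathrm{abs}(\lambda_2)$.)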
Keeping it real...
Let $A \in \mathbb{K}_k^{n \times n}$ be real-valued with diagonalizable $\hat{A}_j$ matrices. If k is odd, then the eigendecomposition $X \circ \Lambda \circ X^{-1}$ is real-valued if and only if $\hat{A}_1$ has real-valued eigenvalues. If k is even, then $X \circ \Lambda \circ X^{-1}$ is real-valued if and only if $\hat{A}_1$ and $\hat{A}_{k/2+1}$ have real-valued eigenvalues.
The power method converges
Let $A \in \mathbb{K}_k^{n \times n}$ have a canonical set of eigenvalues $\lambda_1, \ldots, \lambda_n$ where $|\lambda_1| > |\lambda_2|$. Then the power method in the circulant algebra converges to an eigenvector $x_1$ with eigenvalue $\lambda_1$. Here we use the ordering
$\alpha < \beta \leftrightarrow \mathrm{cft}(\alpha) < \mathrm{cft}(\beta)$ elementwise.
The Arnoldi process
• Let A be an n × n matrix with real-valued entries. Then the Arnoldi method is a technique to build an orthogonal basis for the Krylov subspace $\mathcal{K}_t(A, v) = \mathrm{span}\{v, Av, \ldots, A^{t-1}v\}$, where v is an initial vector.
• We have the decomposition
$$A Q_t = Q_{t+1} H_{t+1,t}$$
where $Q_t$ is an n × t matrix, and $H_{t+1,t}$ is a (t + 1) × t upper Hessenberg matrix.
• Using our repertoire of operations, the Arnoldi method in the circulant algebra is equivalent to individual Arnoldi processes on each matrix $\hat{A}_j$.
• Equivalent to a block Arnoldi process.
• Using the cft and icft operations, we produce an Arnoldi factorization:
$$A \circ Q_t = Q_{t+1} \circ H_{t+1,t}.$$
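The equivalence with per-block Arnoldi processes suggests the following sketch, assuming NumPy. It runs a textbook Arnoldi iteration with modified Gram-Schmidt on each Fourier block and transforms the factors back; the function names, the random test data, and the absence of breakdown handling are all assumptions of this sketch.

```python
import numpy as np

def arnoldi(A, v, t):
    """Standard Arnoldi: Q is n x (t+1), H is (t+1) x t, and A Q[:, :t] = Q H."""
    n = len(v)
    Q = np.zeros((n, t + 1), dtype=complex)
    H = np.zeros((t + 1, t), dtype=complex)
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(t):
        w = A @ Q[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt
            H[i, j] = np.vdot(Q[:, i], w)
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]          # assumes no breakdown
    return Q, H

def circulant_arnoldi(A, v, t):
    """Arnoldi in the circulant algebra: one Arnoldi process per Fourier block."""
    n, _, k = A.shape
    A_hat, v_hat = np.fft.fft(A, axis=-1), np.fft.fft(v, axis=-1)
    Q_hat = np.zeros((n, t + 1, k), dtype=complex)
    H_hat = np.zeros((t + 1, t, k), dtype=complex)
    for j in range(k):
        Q_hat[:, :, j], H_hat[:, :, j] = arnoldi(A_hat[:, :, j], v_hat[:, j], t)
    return np.fft.ifft(Q_hat, axis=-1), np.fft.ifft(H_hat, axis=-1)

def circ_matmul(X, Y):
    """Matrix product in the circulant algebra for shapes (m, p, k) and (p, n, k)."""
    Z_hat = np.einsum("ipk,pjk->ijk", np.fft.fft(X, axis=-1), np.fft.fft(Y, axis=-1))
    return np.fft.ifft(Z_hat, axis=-1)

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5, 4))
v = rng.standard_normal((5, 4))
Q, H = circulant_arnoldi(A, v, t=3)
residual = np.max(np.abs(circ_matmul(A, Q[:, :3]) - circ_matmul(Q, H)))
print(residual)   # should be near machine precision
```

The residual measures how well the computed factors satisfy the circulant Arnoldi factorization stated on the slide.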
Example
Consider
$$-\Delta u(x, y) = f(x, y), \qquad u(x, 0) = u(x, 1), \qquad u(0, y) = u(1, y) = 0$$
for $(x, y) \in [0, 1] \times [0, 1]$ with a uniform mesh and the standard 5-point discrete Laplacian:
$$-\Delta u(x_i, y_j) \approx -u(x_{i-1}, y_j) - u(x_i, y_{j-1}) + 4u(x_i, y_j) - u(x_{i+1}, y_j) - u(x_i, y_{j+1}).$$
After applying the boundary conditions and organizing the unknowns of u in y-major order, an approximate solution is given by solving an $N(N-1) \times N(N-1)$ block-tridiagonal, circulant-block system.
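The Linear System
$$\underbrace{\begin{bmatrix} C & -I & & \\ -I & C & \ddots & \\ & \ddots & \ddots & -I \\ & & -I & C \end{bmatrix}}_{\mathbf{A}} \underbrace{\begin{bmatrix} u(x_1, \cdot) \\ u(x_2, \cdot) \\ \vdots \\ u(x_{N-1}, \cdot) \end{bmatrix}}_{\mathbf{u}} = \underbrace{\begin{bmatrix} f(x_1, \cdot) \\ f(x_2, \cdot) \\ \vdots \\ f(x_{N-1}, \cdot) \end{bmatrix}}_{\mathbf{f}}, \qquad C = \underbrace{\begin{bmatrix} 4 & -1 & & -1 \\ -1 & 4 & \ddots & \\ & \ddots & \ddots & -1 \\ -1 & & -1 & 4 \end{bmatrix}}_{N \times N}.$$
That is, $\mathbf{A}\mathbf{u} = \mathbf{f}$, or $A \circ u = f$, where A is an $(N-1) \times (N-1)$ matrix of $\mathbb{K}_N$ elements, u and f have compatible sizes, and $\mathbf{A} = \mathrm{circ}(A)$, $\mathbf{u} = \mathrm{vec}(u)$, $\mathbf{f} = \mathrm{vec}(f)$.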
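The canonical eigenvalues of A are
$$\lambda_j = \{4 + 2\cos(j\pi/N), \ -1, \ 0, \ \ldots, \ 0, \ -1\}.$$
To see this result, let $\lambda(\mu) = \{\mu, \ -1, \ 0, \ \ldots, \ 0, \ -1\}$. Then
$$(A - \lambda(\mu) \circ I) = \begin{bmatrix} (4-\mu) \circ 1 & -1 \circ 1 & & \\ -1 \circ 1 & (4-\mu) \circ 1 & \ddots & \\ & \ddots & \ddots & -1 \circ 1 \\ & & -1 \circ 1 & (4-\mu) \circ 1 \end{bmatrix}.$$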
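The stated eigenvalues can be compared with a direct computation on each Fourier block. This sketch assumes NumPy, uses a small N = 6 chosen for illustration, and builds the blocks from the tridiagonal structure of the linear system in this example.

```python
import numpy as np

N = 6
c = np.zeros(N)
c[0], c[1], c[-1] = 4.0, -1.0, -1.0                 # diagonal scalar {4, -1, 0, ..., 0, -1}
c_hat = np.fft.fft(c).real                          # its Fourier coefficients (real here)
T = np.eye(N - 1, k=1) + np.eye(N - 1, k=-1)        # off-diagonal pattern of A

# Candidate eigenvalues lambda_i = {4 + 2 cos(i pi / N), -1, 0, ..., 0, -1}.
lams = np.tile(c, (N - 1, 1))
lams[:, 0] = 4 + 2 * np.cos(np.arange(1, N) * np.pi / N)
lams_hat = np.fft.fft(lams, axis=-1).real           # shape (N-1, N)

max_err = 0.0
for j in range(N):
    A_hat_j = c_hat[j] * np.eye(N - 1) - T          # j-th Fourier block of A
    computed = np.sort(np.linalg.eigvalsh(A_hat_j))
    predicted = np.sort(lams_hat[:, j])
    max_err = max(max_err, np.max(np.abs(computed - predicted)))
print(max_err)   # should be near machine precision
```

Each Fourier block is a symmetric tridiagonal Toeplitz matrix, so the agreement follows from the known eigenvalues of such matrices shifted by the block's diagonal entry.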
[Plot: magnitude (log scale) versus iteration, showing the eigenvalue error and the eigenvector change, with fit lines $\left(\frac{2 + 2\cos(2\pi/n)}{2 + 2\cos(\pi/n)}\right)^{2i}$, $\left(\frac{6 + 2\cos(2\pi/n)}{6 + 2\cos(\pi/n)}\right)^{2i}$, and $\left(\frac{6 + 2\cos(2\pi/n)}{6 + 2\cos(\pi/n)}\right)^{i}$.]
Figure: The convergence behavior of the power method in the circulant algebra. The gray lines show the error in each eigenvalue in Fourier space. These curves track the predictions made based on the eigenvalues as discussed in the text. The red line shows the magnitude of the change in the eigenvector. We use this as the stopping criterion. It also decays as predicted by the ratio of eigenvalues. The blue fit lines have been visually adjusted to match the behavior in the convergence tail.
[Plot: magnitude (log scale) versus Arnoldi iteration, showing the absolute error and the residual magnitude.]
Figure: The convergence behavior of a GMRES procedure using the circulant Arnoldi process. The gray lines show the error in each Fourier component and the red line shows the magnitude of the residual. We observe poor convergence in one Fourier component until the Arnoldi basis captures all of the eigenvalues after N/2 + 1 = 26 iterations. These results show how the two computations are performing individual power methods or Arnoldi processes in Fourier space.
The End
Paper available online from http://www.cs.ubc.ca/~greif
“The power and Arnoldi methods in an algebra of circulants”, David Gleich, Chen Greif and Jim Varah
Thank you!