iterative methods andcombinatorial preconditioners
Post on 02-Jan-2016
35 Views
Preview:
DESCRIPTION
TRANSCRIPT
2003-09-10
Maverick Woo
Iterative Methods andIterative Methods andCombinatorial PreconditionersCombinatorial PreconditionersIterative Methods andIterative Methods andCombinatorial PreconditionersCombinatorial Preconditioners
2
This talk is not about…
3
4
CreditsCreditsCreditsCreditsSolving Symmetric Diagonally-
Dominant Systems By Preconditioning
Bruce Maggs, Gary Miller, Ojas Parekh, R. Ravi, mw
Combinatorial Preconditioners for Large, Sparse, Symmetric, Diagonally-
Dominant Linear SystemsKeith Gremban (CMU PhD 1996)
5
Linear SystemsLinear SystemsLinear SystemsLinear Systems
n by n matrix n by 1 vector
A useful way to do matrix algebra in your head:Matrix-vector multiplication = Linear combination of matrix columns
Ax = bknown unknown known
6
Matrix-Vector MultiplicationMatrix-Vector MultiplicationMatrix-Vector MultiplicationMatrix-Vector Multiplication
Using BTAT = (AB)T,xTA should also be interpreted asa linear combination of the rows of A.
0
@11 12 1321 22 2331 32 33
1
A
0
@abc
1
A = a
0
@112131
1
A +b
0
@122232
1
A + c
0
@132333
1
A
=
0
@11a+ 12b+ 13c21a+ 22b+ 23c31a+ 32b+ 33c
1
A
7
How to Solve?How to Solve?How to Solve?How to Solve?
Find A-1
Guess x repeatedly until we guess a solutionGaussian Elimination
Ax = b
Strassen had a fastermethod to find A-1
Strassen had a fastermethod to find A-1
8
Large Linear Systems in The Real World
9
Circuit Voltage ProblemCircuit Voltage ProblemCircuit Voltage ProblemCircuit Voltage Problem
Given a resistive network and the net current flow at each terminal, find the voltage at each node.
2A
2A
Conductance is the reciprocal of resistance. It’s unit is siemens (S).
Conductance is the reciprocal of resistance. It’s unit is siemens (S).
3S
4A
2S
1S
1S
A node with an external connection is a terminal. Otherwise it’s a junction.
A node with an external connection is a terminal. Otherwise it’s a junction.
10
Kirchhoff’s Law of CurrentKirchhoff’s Law of CurrentKirchhoff’s Law of CurrentKirchhoff’s Law of Current
At each node,net current flowing at each node = 0.
Consider v1. We have
which after regrouping yields
2A
2A
3S
4A
2S
1S
1S
v1
v2v4
v3
I=VCI=VC
3(v1 ¡ v2) + (v1 ¡ v3) = 2:
2+ 3(v2 ¡ v1) + (v3 ¡ v1) = 0;
11
Summing UpSumming UpSumming UpSumming Up
2A
2A
3S
4A
2S
1S
1S
v1
v2v4
v3
12
SolvingSolvingSolvingSolving
2A
2A
3S
4A
2S
1S
1S
2V2V
3V
0V
13
Did I say “LARGE”?Did I say “LARGE”?Did I say “LARGE”?Did I say “LARGE”?
Imagine this being the power grid of America.
2A
2A
3S
4A
2S
1S
1S
v1
v2v4
v3
14
LaplaciansLaplaciansLaplaciansLaplacians
Given a weighted, undirected graph G = (V, E), we can represent it as a Laplacian matrix.
3 2
1
1
v1
v2v4
v3
15
LaplaciansLaplaciansLaplaciansLaplacians
Laplacians have many interesting properties, such as
Diagonals ¸ 0 denotes total incident weights
Off-diagonals < 0 denotes individual edge weights
Row sum = 0Symmetric
3 2
1
1
v1
v2v4
v3
16
Net Current FlowNet Current FlowNet Current FlowNet Current Flow
LemmaSuppose an n by n matrix A is the Laplacian of a resistive network G with n nodes.If y is the n-vector specifying the voltage at each node of G, then Ay is the n-vector representing the net current flow at each node.
17
Power DissipationPower DissipationPower DissipationPower Dissipation
LemmaSuppose an n by n matrix A is the Laplacian of a resistive network G with n nodes.If y is the n-vector specifying the voltage at each node of G, then yTAy is the total power dissipated by G.
18
SparsitySparsitySparsitySparsity
Laplacians arising in practice are usually sparse.The i-th row has (d+1) nonzeros if vi has d neighbors.
3 2
1
1
v1
v2v4
v3
19
Sparse MatrixSparse MatrixSparse MatrixSparse Matrix
An n by n matrix is sparse when there are O(n) nonzeros.
A reasonably-sized power grid has way more junctions and each junction has only a couple of neighbors.3 2
1
1
v1
v2v4
v3
20
Had Gauss owned a supercomputer…
(Would he really work on Gaussian Elimination?)
21
A Model ProblemA Model ProblemA Model ProblemA Model Problem
Let G(x, y) and g(x, y) be continuous functions defined in R and S respectively, where R and S are respectively the region and the boundary of the unit square (as in the figure).
R S
1
0 1
22
A Model ProblemA Model ProblemA Model ProblemA Model Problem
We seek a function u(x, y) that satisfies
Poisson’s equation in R
and the boundary condition in S.
@2u@x2
+@2u@y2 = G(x;y)
u(x;y) = g(x;y)
R S
1
0 1
If g(x,y) = 0, this iscalled a Dirichlet
boundary condition.
If g(x,y) = 0, this iscalled a Dirichlet
boundary condition.
23
DiscretizationDiscretizationDiscretizationDiscretization
Imagine a uniform grid with a small spacing h.
1
0 1
h
24
Five-Point DifferenceFive-Point DifferenceFive-Point DifferenceFive-Point Difference
Replace the partial derivatives by difference quotients
The Poisson’s equation now becomes
@2u=@y2 s [u(x;y + h) + u(x;y ¡ h) ¡ 2u(x;y)]=h2
@2u=@x2 s [u(x + h;y) + u(x ¡ h;y) ¡ 2u(x;y)]=h2
4u(x;y) ¡ u(x +h;y) ¡ u(x ¡ h;y)
¡ u(x;y+h) ¡ u(x;y ¡ h) = ¡ h2G(x;y)
Exercise:Derive the 5-pt diff. eqt. from first principle (limit).
Exercise:Derive the 5-pt diff. eqt. from first principle (limit).
25
For each point in For each point in RR
The total number of equations is .
Now write them in the matrix form, we’ve got one BIG linear system to solve!
4u(x;y) ¡ u(x +h;y) ¡ u(x ¡ h;y)
¡ u(x;y+h) ¡ u(x;y ¡ h) = ¡ h2G(x;y)
4u(x;y) ¡ u(x +h;y) ¡ u(x ¡ h;y)
¡ u(x;y+h) ¡ u(x;y ¡ h) = ¡ h2G(x;y)
(1h¡ 1)2
26
An ExampleAn ExampleAn ExampleAn Example
Consider u3,1, we have
which can be rearranged to
4u(x;y) ¡ u(x +h;y) ¡ u(x ¡ h;y)
¡ u(x;y+h) ¡ u(x;y ¡ h) = ¡ h2G(x;y)
4
0 4
u31 u41u21
u32
u30
4u(3;1) ¡ u(4;1) ¡ u(2;1)
¡ u(3;2) ¡ u(3;0) = ¡ G(3;1)
4u(3;1) ¡ u(2;1) ¡ u(3;2)
= ¡ G(3;1) + u(4;1) + u(3;0)
27
An ExampleAn ExampleAn ExampleAn Example
Each row and column can have a maximum of 5 nonzeros.
4u(x;y) ¡ u(x +h;y) ¡ u(x ¡ h;y)
¡ u(x;y+h) ¡ u(x;y ¡ h) = ¡ h2G(x;y)
28
Sparse Matrix AgainSparse Matrix AgainSparse Matrix AgainSparse Matrix Again
Really, it’s rare to see large dense matrices arising from applications.
29
Laplacian???Laplacian???Laplacian???Laplacian???
I showed you a system that is not quite Laplacian.We’ve got way too many boundary points in a 3x3 example.
30
Making It LaplacianMaking It LaplacianMaking It LaplacianMaking It Laplacian
We add a dummy variable and force it to zero.(How to force? Well, look at the rank of this matrix first…)
31
Sparse Matrix Sparse Matrix RepresentationRepresentationSparse Matrix Sparse Matrix RepresentationRepresentation
A simple schemeAn array of columns, where each column Aj is a linked-list of tuples (i, x).
32
Solving Sparse SystemsSolving Sparse SystemsSolving Sparse SystemsSolving Sparse Systems
Gaussian Elimination again?Let’s look at one elimination step.
33
Solving Sparse SystemsSolving Sparse SystemsSolving Sparse SystemsSolving Sparse Systems
Gaussian Elimination introduces fill.
34
Solving Sparse SystemsSolving Sparse SystemsSolving Sparse SystemsSolving Sparse Systems
Gaussian Elimination introduces fill.
35
FillFillFillFillOf course it depends on the elimination order.
Finding an elimination order with minimal fill is hopelessGarey and Johnson-GT46, Yannakakis SIAM JADM 1981O(log n) ApproximationSudipto Guha, FOCS 2000Nested Graph Dissection and Approximation Algorithms(n log n) lower bound on fill(Maverick still has not dug up the paper…)
36
When Fill Matters…When Fill Matters…When Fill Matters…When Fill Matters…
Find A-1
Guess x repeatedly until we guessed a solutionGaussian Elimination
Ax = b
37
Inverse Of Sparse MatricesInverse Of Sparse MatricesInverse Of Sparse MatricesInverse Of Sparse Matrices
…are not necessarily sparse either!
B B-1
38
And the winner is…And the winner is…And the winner is…And the winner is…
Find A-1
Guess x repeatedly until we guessed a solutionGaussian Elimination
Ax = b Can we be so lucky?
Can we be so lucky?
39
Iterative Methods
Checkpoint• How large linear system actually
arise in practice• Why Gaussian Elimination may not
be the way to go
40
The Basic IdeaThe Basic IdeaThe Basic IdeaThe Basic Idea
Start off with a guess x (0).Using x (i ) to compute x (i+1) until
converge.
We hopethe process converges in a small number of iterationseach iteration is efficient
41
Residual
The RF Method [Richardson, The RF Method [Richardson, 1910]1910]The RF Method [Richardson, The RF Method [Richardson, 1910]1910]
Domain Range
x(i+1) = x(i ) ¡ (Ax(i ) ¡ b)
x(i+1)
Ax(i)
x
x(i)
bTest
Correct
A
42
Why should it converge at Why should it converge at all?all?Why should it converge at Why should it converge at all?all?
Domain Range
x(i+1)
Ax(i)
x
x(i)
bTest
Correct
x(i+1) = x(i ) ¡ (Ax(i ) ¡ b)
43
It only converges when…It only converges when…It only converges when…It only converges when…
TheoremA first-order stationary iterative
method
converges iff
x(i+1) = x(i ) ¡ (Ax(i ) ¡ b)
½(G) < 1.
(A) is the maximumabsolute eigenvalue of A(A) is the maximum
absolute eigenvalue of A
x(i+1) = Gx(i ) + k
44
Fate?Fate?Fate?Fate?
Once we are given the system, we do not have any control on A and b.
How do we guarantee even convergence?
Ax = b
45
PreconditioningPreconditioningPreconditioningPreconditioning
Instead of dealing with A and b,we now deal with B-1A and B-1b.
B -1Ax = B -1b
The word “preconditioning”originated with Turing in 1948,
but his idea was slightly different.
The word “preconditioning”originated with Turing in 1948,
but his idea was slightly different.
46
Preconditioned RFPreconditioned RFPreconditioned RFPreconditioned RF
Since we may precompute B-1b by solving By = b, each iteration is dominated by computing B-
1Ax(i), which is a multiplication step Ax(i) and a direct-solve step Bz = Ax(i).
Hence a preconditioned iterative method is in fact a hybrid.
x(i+1) = x(i ) ¡ (B -1Ax(i ) ¡ B -1b)
47
The Art of PreconditioningThe Art of PreconditioningThe Art of PreconditioningThe Art of Preconditioning
We have a lot of flexibility in choosing B.Solving Bz = Ax(i) must be fastB should approximate A well for a low iteration count
I A
Trivial
What’s the
point?
B
48
ClassicsClassicsClassicsClassics
JacobiLet D be the diagonal sub-matrix of A.Pick B = D.
Gauss-SeidelLet L be the lower triangular part of A w/ zero diagonalsPick B = L + D.
x(i+1) = x(i ) ¡ (B -1Ax(i ) ¡ B -1b)
49
““Combinatorial”Combinatorial”““Combinatorial”Combinatorial”
We choose to measure how well B approximates A
by comparing combinatorial properties of (the graphs represented by) A and B.
Hence the term “Combinatorial Preconditioner”.
50
Questions?Questions?Questions?Questions?
51
Graphs as Matrices
52
Edge Vertex Incidence Edge Vertex Incidence MatrixMatrixEdge Vertex Incidence Edge Vertex Incidence MatrixMatrix
Given an undirected graph G = (V, E), let be a |E| £ |V | matrix of {-1, 0, 1}.
For each edge (u, v), set e,u to -1 and e,v to 1. Other entries are all zeros.
53
Edge Vertex Incidence Edge Vertex Incidence MatrixMatrixEdge Vertex Incidence Edge Vertex Incidence MatrixMatrix
ab
cd
e fg
h
ij
a b c d e f g h i j
54
Weighted GraphsWeighted GraphsWeighted GraphsWeighted Graphs
Let W be an |E| £ |E| diagonalmatrix where We,e is the weightof the edge e.
ab
cd
e fg
h
ij
3
62
1
8
9
3 2
4
5
1
55
LaplacianLaplacianLaplacianLaplacian
The Laplacian of G is defined tobe TW .
ab
cd
e fg
h
ij
3
62
1
8
9
3 2
4
5
1
56
Properties of LaplaciansProperties of LaplaciansProperties of LaplaciansProperties of Laplacians
Let L = TW .“Prove by example”
ab
cd
e fg
h
ij
3
62
1
8
9
3 2
4
5
1
57
Properties of LaplaciansProperties of LaplaciansProperties of LaplaciansProperties of Laplacians
Let L = TW .“Prove by example”
ab
cd
e fg
h
ij
3
62
1
8
9
3 2
4
5
1
58
Properties of LaplaciansProperties of LaplaciansProperties of LaplaciansProperties of Laplacians
A matrix A is Positive SemiDefinite if
Since L = TW , it’s easy to see that for all x
8x;xTAx ¸ 0:
xT(¡TW¡ )x = (W12 ¡ x)T(W
12 ¡ x) ¸ 0:
59
LaplacianLaplacianLaplacianLaplacian
The Laplacian of G is defined tobe TW .
ab
cd
e fg
h
ij
60
Graph Embedding
A Primer
61
Graph EmbeddingGraph EmbeddingGraph EmbeddingGraph Embedding
Vertex in G Vertex in HEdge in G Path in H
ab
cd
e fg
h
ij
a b e
c
f g
h i j
d
Guest Host
62
DilationDilationDilationDilation
For each edge e in G, define dil(e) to be the number of edges in its corresponding path in H.
ab
cd
e fg
h
ij
a b e
c
f g
h i j
d
Guest Host
63
CongestionCongestionCongestionCongestion
For each edge e in H, define cong(e) to be the number of embedding paths that uses e.Guest Host
ab
cd
e fg
h
ij
a b e
c
f g
h i j
d
64
Support Theory
65
DisclaimerDisclaimerDisclaimerDisclaimer
The presentation to follow is only “essentially correct”.
66
SupportSupportSupportSupport
DefinitionThe support required by a matrix B for a matrix A,both n by n in size, is defined as ¾(A=B) := minf¿ 2 Rj8x;xT(¿B ¡ A)x ¸ 0g
A / B)
Think of B supportingA at the bottom.
A / B)
Think of B supportingA at the bottom.
67
Support With LaplaciansSupport With LaplaciansSupport With LaplaciansSupport With Laplacians
Life is Good when the matrices are Laplacians.Remember the resistive circuit analogy?
68
Power DissipationPower DissipationPower DissipationPower Dissipation
LemmaSuppose an n by n matrix A is the Laplacian of a resistive network G with n nodes.If y is the n-vector specifying the voltage at each node of G, then yTAy is the total power dissipated by G.
69
Circuit-SpeakCircuit-SpeakCircuit-SpeakCircuit-Speak
Read this loud in circuit-speak:“The support for A by B is the minimum number so thatfor all possible voltage settings, copies of B burn at least as much energy asone copy of A.”
¾(A=B) := minf¿ 2 Rj8x;xT(¿B ¡ A)x ¸ 0g
70
Congestion-Dilation LemmaCongestion-Dilation LemmaCongestion-Dilation LemmaCongestion-Dilation Lemma
Given an embedding from G to H,
71
TransitivityTransitivityTransitivityTransitivity
Pop QuizFor Laplacians, prove this in circuit-
speak.
¾(A=C) · ¾(A=B) ¢¾(B=C)
72
Generalized Condition Generalized Condition NumberNumberGeneralized Condition Generalized Condition NumberNumber
DefinitionThe generalized condition number of a pair of PSD matrices is
A is Positive Semi-definiteiff 8x, xTAx ¸ 0.
A is Positive Semi-definiteiff 8x, xTAx ¸ 0.
73
Preconditioned Conjugate Preconditioned Conjugate GradientGradientPreconditioned Conjugate Preconditioned Conjugate GradientGradient
Solving the system Ax = b using PCG with preconditioner B requires at most
iterations to find a solution such that
Convergence rate is dependent on the actual iterative method used.
Convergence rate is dependent on the actual iterative method used.
74
Support Trees
75
Information FlowInformation FlowInformation FlowInformation FlowIn many iterative methods, the only operation using A directly is to compute Ax(i) in each iteration.
Imagine each node is an agent maintaining its value.The update formula specifies how each agent should update its value for round (i+1) given all the values in round i.
x(i+1) = x(i ) ¡ (Ax(i ) ¡ b)
76
The Problem With The Problem With MultiplicationMultiplicationThe Problem With The Problem With MultiplicationMultiplication
Only neighbors can “communicate” in a multiplication, which happens once per iteration.
2A
2A
3S
4A
2S
1S
1S
v1
v2v4
v3
77
Diameter As A Natural Lower Diameter As A Natural Lower BoundBoundDiameter As A Natural Lower Diameter As A Natural Lower BoundBound
In general, for a node to settle on its final value, it needs to “know” at least the initial values of the other nodes.
2A
2A
3S
4A
2S
1S
1S
v1
v2v4
v3
Diameter is the maximumshortest path distance
between any pair of nodes.
Diameter is the maximumshortest path distance
between any pair of nodes.
78
Preconditioning As Preconditioning As ShortcuttingShortcuttingPreconditioning As Preconditioning As ShortcuttingShortcutting
By picking B carefully, we can introduce shortcuts for faster communication.
But is it easy to find shortcutsin a sparse graph to reduceits diameter?
x(i+1) = x(i ) ¡ (B -1Ax(i ) ¡ B -1b)
2A
2A
3S
4A
2S
1S
1S
v1
v2v4
v3
79
Square MeshSquare MeshSquare MeshSquare Mesh
Let’s pick the complete graph induced on all the mesh points.
Mesh: O(n) edgesComplete graph: O(n2) edges
80
Bang!Bang!Bang!Bang!
So exactly how do we propose to solve a dense n by n system faster than a sparse one?
B can have at most O(n) edges, i.e., sparse…
81
Support TreeSupport TreeSupport TreeSupport TreeBuild a Steiner tree to introduce
shortcuts!
If we pick a balancedtree, no nodes will be farther than O(log n)hops away.
Need to specify weightson the tree edges
2A
2A
3S
4A
2S
1S
1S
v2v4
v3
v1
82
Mixing SpeedMixing SpeedMixing SpeedMixing Speed
The speed of communication is proportional to the corresponding coefficients on the paths between nodes.
2A
2A
3S
4A
2S
1S
1S
v1
v2v4
v3
83
Setting WeightsSetting WeightsSetting WeightsSetting Weights
The nodes should be able to talk at least as fast as they could without shortcuts.
How about setting all the weights to 1?
2A
2A
3S
4A
2S
1S
1S
v2v4
v3
v1
?
???
? ?
84
Recall PCG’s Convergence Recall PCG’s Convergence RateRateRecall PCG’s Convergence Recall PCG’s Convergence RateRate
Solving the system Ax = b using PCG with preconditioner B requires at most
iterations to find a solution such that
85
Size MattersSize MattersSize MattersSize Matters
How big is the preconditioner matrix B?(2n-1) by (2n-1)
The major contribution(among another) of our paper is deal with thisdisparity in size.
2A
2A
3S
4A
2S
1S
1S
v2v4
v3
v1
86
The Search Is HardThe Search Is HardThe Search Is HardThe Search Is Hard
Finding the “right” preconditioner is really a tradeoffSolving Bz = Ax(i) must be fastB should approximate A well for a low iteration count
It could very well be harder than we think.
87
How to Deal With Steiner Nodes?
88
The Trouble of Steiner NodesThe Trouble of Steiner NodesThe Trouble of Steiner NodesThe Trouble of Steiner Nodes
Computation
Definitions, e.g.,¾(B=A) := minf¿ 2 Rj8x;xT(¿A ¡ B)x ¸ 0g
x(i+1) = x(i ) ¡ (B -1Ax(i ) ¡ B -1b)
89
Generalized SupportGeneralized SupportGeneralized SupportGeneralized Support
Let where W is n by n.
Then (B/A) is defined to be
where y = -T -1Ux.
B =
µT UUT W
¶
minf¿ 2 Rj8x;¿xTAx ¸µ
yx
¶TB
µyx
¶g
90
Circuit-SpeakCircuit-SpeakCircuit-SpeakCircuit-Speak
Read this loud in circuit-speak:“The support for B by A is the minimum number so thatfor all possible voltage settings at
the terminals, copies of A burn at least as much energy asone copy of B.”
¾(B=A) := minf¿ 2 Rj8x;¿xTAx ¸µ
yx
¶TB
µyx
¶g
91
Thomson’s PrincipleThomson’s PrincipleThomson’s PrincipleThomson’s Principle
Fix the voltages at theterminals. The voltages at the junctions will be set such that the total power dissipation in the circuit is minimized.
2V 1
V
3V
5V
¾(B=A) := minf¿ 2 Rj8x;¿xTAx ¸µ
yx
¶TB
µyx
¶g
92
Racke’s Decomposition Tree
93
Laminar DecompositionLaminar DecompositionLaminar DecompositionLaminar Decomposition
A laminar decomposition naturally defines a tree.
ab
cd
e fg
h
ij
a b e
c
f g
h i j
d
94
Racke, FOCS 2002Racke, FOCS 2002Racke, FOCS 2002Racke, FOCS 2002
Given a graph G of n nodes, there exists a laminar decomposition tree T with all the “right” propertiesas a preconditioner for G.
Except his advisor didn’t tell him about this…
95
For More DetailsFor More DetailsFor More DetailsFor More Details
Our paper is available onlinehttp://www.cs.cmu.edu/~maverick/
Our contributionsAnalysis of Racke’s decomposition tree as a preconditionerProvided tools for reasoning support between Laplacians of different dimensions
96
The Search Is NOT OverThe Search Is NOT OverThe Search Is NOT OverThe Search Is NOT Over
Future DirectionsRacke’s tree can take exponential time to find
Many recent improvements, but not quite there yet
Insisting on balanced tree can hurt in many easy cases
top related