Implementation of Block Algebraic Iterative Reconstruction Methods
Per Christian Hansen joint work with
Hans Henrik B. Sørensen
May 2014 2 P. C. Hansen – Implementation of Block AIR Methods
About Me …
• Interests: inverse problems, tomography, regularization algorithms, matrix computations, image deblurring, signal processing, Matlab software, …
• Head of the project High-Definition Tomography, funded by an ERC Advanced Research Grant.
• Author of several Matlab software packages.
• Author of four books.
Forward problem
Outline of Talk
We consider reconstruction problems in computed tomography: reconstruct a 2D or 3D object from its projections, i.e., (noisy) measurements of the damping of rays that go through the domain.
We obtain a very large system of equations A x = b with a very sparse matrix A, which must be solved by an iterative method.
1. Classical iterative reconstruction techniques
2. Performance considerations
3. An overview of block methods
4. How to compare the block methods
5. Numerical results
Analogy: the “Sudoku” Problem – 数独
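[Figure: 2×2 grid of unknowns with the given sums 3, 7, 4 and 6.]
\[
\begin{pmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} 3 \\ 7 \\ 4 \\ 6 \end{pmatrix}
\]
This matrix is rank deficient and there are infinitely many solutions.
[Figure: the same grid with the additional sum 5.]
\[
\begin{pmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
1 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} 3 \\ 7 \\ 4 \\ 6 \\ 5 \end{pmatrix}
\]
Unique solution!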
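The rank argument can be checked numerically. The following is a minimal sketch using NumPy (not part of the talk); the variable names are illustrative, and the grid layout in the comment assumes that the unknowns are stored column by column.

```python
import numpy as np

# Assumed layout of the 2x2 grid: [[x1, x3], [x2, x4]].
# Rows 1-2 of A4 are the row sums (3, 7), rows 3-4 the column sums (4, 6).
A4 = np.array([[1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 1, 0, 0],
               [0, 0, 1, 1]], dtype=float)
b4 = np.array([3, 7, 4, 6], dtype=float)

# One extra equation, x1 + x4 = 5, removes the ambiguity.
A5 = np.vstack([A4, [1, 0, 0, 1]])
b5 = np.append(b4, 5.0)

rank4 = np.linalg.matrix_rank(A4)   # 4 unknowns, but fewer independent equations
rank5 = np.linalg.matrix_rank(A5)   # full column rank

# The 5x4 system is consistent, so least squares returns its exact solution.
x_unique, *_ = np.linalg.lstsq(A5, b5, rcond=None)
print(rank4, rank5, x_unique)
```

The null space of the 4×4 matrix is spanned by (1, −1, −1, 1), which is why any single extra equation that is not orthogonal to this vector fixes the solution.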
3D Tomography Test Problem
• Parallel X-rays are sent through the object.
• The object is discretized in an array of N×N×N voxels.
• Projections are recorded on detectors with p×p pixels.
• The directions of the rays are "evenly" distributed over the half-sphere using Lebedev quadrature points.
Setting Up the Algebraic Model
Damping of the i-th X-ray through the domain (Beer's law):
\[
b_i = \int_{\mathrm{ray}_i} \chi(s)\, d\ell, \qquad \chi(s) = \text{attenuation coef.}
\]
Discretization leads to a large, sparse, ill-conditioned system:
A x = b
Here A represents the geometry, x the image, b the projections, and e the noise:
\[
\bar{b} = A\,\bar{x}, \qquad b = \bar{b} + e
\]
[Figure: 2D example.]
A Note About the 3D Algebraic Model
Each ray corresponds to a particular row of the matrix A.
Each ray intersects only a very small number of voxels.
Hence, many rows of A are structurally orthogonal.
Noise Sensitivity
Assume that A has full rank, and consider the two problems:
\[
A\,\bar{x} = \bar{b} \quad \text{(no noise)}, \qquad A\,x \approx b = \bar{b} + e
\]
Let us define the "naive" solution:
\[
x_{\mathrm{naive}} = A^{-1} b = x_{\mathrm{exact}} + A^{-1} e, \qquad \|A^{-1} e\| \gg \|x_{\mathrm{exact}}\|
\]
The last term dominates because A is very ill conditioned! We must use regularization to compute an approximate solution that is less sensitive to the noise.
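A small numerical illustration of this effect, as a sketch only: the Hilbert matrix below is a stand-in for an ill-conditioned matrix and is not a tomography matrix from the talk; the size, noise level and seed are arbitrary assumptions.

```python
import numpy as np

n = 10
i = np.arange(n)
# Hilbert matrix: full rank but very ill conditioned (assumed stand-in for A).
A = 1.0 / (i[:, None] + i[None, :] + 1.0)

x_exact = np.ones(n)
b_bar = A @ x_exact                       # noise-free data
rng = np.random.default_rng(0)
e = 1e-6 * rng.standard_normal(n)         # tiny perturbation of the data
b = b_bar + e

x_naive = np.linalg.solve(A, b)           # x_naive = A^{-1} b
err = np.linalg.norm(x_naive - x_exact)   # approximately ||A^{-1} e||

print("cond(A)          =", np.linalg.cond(A))
print("||e|| / ||b_bar|| =", np.linalg.norm(e) / np.linalg.norm(b_bar))
print("err / ||x_exact|| =", err / np.linalg.norm(x_exact))
```

Even though the data are perturbed only slightly, the error in the naive solution is far larger than the exact solution itself.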
Some Large-Scale Reconstruction Algorithms
Bayesian Methods
My knowledge here is very limited …
Transform-Based Methods
The forward problem is formulated as a certain transform → find a stable way to compute the inverse transform.
Examples: the inverse Radon transform for tomography → filtered back-projection, FDK.
Algebraic Iterative Methods
The forward problem is formulated as a discretized problem → solve A x = b using an iterative method.
Examples: Cimmino, Kaczmarz, CGLS.
ART (Algebraic Reconstruction Technique)
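Algorithm: ART (Classical Kaczmarz)
Let $x^0 = 0$ and repeat the sweep below for $k = 1, 2, 3, \ldots$
Algorithm: $x^k \leftarrow \text{ART-sweep}(\lambda, A, b, x^{k-1})$
\[
\begin{aligned}
x^{k,0} &= x^{k-1} \\
x^{k,i} &= P\!\left( x^{k,i-1} + \lambda\, \frac{b_i - a_i^T x^{k,i-1}}{\|a_i\|_2^2}\, a_i \right), \qquad i = 1, \ldots, m \\
x^k &= x^{k,m}
\end{aligned}
\]
Relaxation parameter: $\lambda$.
Parallelism at the level of an inner product.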
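The sweep can be sketched in a few lines of NumPy. This is an illustrative dense-matrix version, not the implementation used in the talk. The slides do not define P; the sketch assumes it is an optional projection (here, onto the nonnegative orthant), passed as the hypothetical argument `project`.

```python
import numpy as np

def art_sweep(lam, A, b, x, project=None):
    """One ART (Kaczmarz) sweep over all rows of A, starting from x."""
    x = np.array(x, dtype=float)          # work on a copy
    for a_i, b_i in zip(A, b):
        norm2 = a_i @ a_i
        if norm2 == 0.0:                  # skip empty rows
            continue
        # Relaxed projection onto the hyperplane a_i^T x = b_i.
        x += lam * (b_i - a_i @ x) / norm2 * a_i
        if project is not None:           # assumed meaning of P
            x = project(x)
    return x

def art(lam, A, b, n_iter, project=None):
    """Start from x^0 = 0 and repeat the sweep n_iter times."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = art_sweep(lam, A, b, x, project)
    return x

# Tiny consistent test system with a nonnegative solution.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
x_true = np.array([1.0, 2.0])
b = A @ x_true
x_art = art(1.0, A, b, n_iter=50, project=lambda v: np.maximum(v, 0.0))
print(x_art)
```

Each row update needs the inner product of $a_i$ with the current iterate, which is why the available parallelism is limited to that inner product.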
SIRT (Simultaneous Iter. Reconstr. Tech.)
Algorithm: SIRT
Let $x^0 = 0$. For $k = 1, 2, 3, \ldots$
\[
x^k = x^{k-1} + \lambda\, A^T M\, (b - A\, x^{k-1})
\]
Relaxation parameter: $\lambda$.
$M$: matrix that defines the specific method.
Cimmino: $M = \frac{1}{m}\,\mathrm{diag}\!\left(1/\|a_i\|_2^2\right)$.
Parallelism at the level of a matrix-vector product.
No evidence that $x^0 \neq 0$ gives better solutions or smaller computing time.
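A minimal NumPy sketch of SIRT with the Cimmino choice of M follows. It is dense and unoptimized, assumes that A has no zero rows, and uses illustrative function names that are not taken from any package.

```python
import numpy as np

def cimmino_weights(A):
    """Diagonal of M = (1/m) diag(1 / ||a_i||_2^2); assumes no zero rows."""
    m = A.shape[0]
    return 1.0 / (m * np.sum(A * A, axis=1))

def sirt(lam, A, b, n_iter):
    """x^k = x^{k-1} + lam * A^T M (b - A x^{k-1}), starting from x^0 = 0."""
    x = np.zeros(A.shape[1])
    w = cimmino_weights(A)
    for _ in range(n_iter):
        # Two matrix-vector products per iteration: A @ x and A.T @ (...).
        x = x + lam * (A.T @ (w * (b - A @ x)))
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
x_true = np.array([1.0, 2.0])
b = A @ x_true
x_sirt = sirt(1.5, A, b, n_iter=500)
print(x_sirt)
```

With the Cimmino weights, $A^T M A$ is an average of orthogonal projectors, so its eigenvalues are at most 1 and any $0 < \lambda < 2$ gives convergence on a consistent system.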
Performance
ART and SIRT (Cimmino) for very small λ = 0.01 and for "optimal" λ.
Slow convergence.
ART can converge a lot faster than SIRT.
[Figure: relative error $\|x^k - \bar{x}\|_2 / \|\bar{x}\|_2$.]
Performance
Test Problem:
• Parallel-beam tomography.
• 13 projections.
• 3D Shepp-Logan phantom, Schabel (2006).
[Figure: relative error $\|x^k - \bar{x}\|_2 / \|\bar{x}\|_2$ versus iterations k; curve shown: ART.]
Performance 1 core
Intel Xeon E5620 2.40 GHz (1 core)
Same number of flops! The difference is due to the cache: ART uses row $a_i$ twice once it is loaded.
[Figure legend: ART, SIRT.]
Performance 4 cores
Intel Xeon E5620 2.40 GHz (4 cores)
[Figure legend: ART, SIRT.]
Four cores are better suited for block matrix-vector operations.
Our Dilemma
ART has faster convergence than SIRT – i.e., more reduction of the error per iteration.
SIRT can better take advantage of multi-core architecture than ART.
How to achieve the "best of both worlds"? → Block methods!
Block Methods
In each iteration we can:
• Treat the blocks sequentially or simultaneously (i.e., in parallel).
• Treat each block by an iterative or by a direct computation.
We obtain several methods:
• Sequential processing + ART on each block → classical ART
• Sequential processing + SIRT on each block
• Sequential processing + pseudoinverse of $A_\ell$
• Parallel processing + ART on each block
• Parallel processing + SIRT on each block → classical SIRT
• Parallel processing + pseudoinverse of $A_\ell$
Block-Sequential Methods
Algorithm: Block-Sequential
Initialization: choose an arbitrary $x^0 \in \mathbb{R}^n$
Iteration: for $k = 1, 2, 3, \ldots$
\[
\begin{aligned}
x^{k,0} &= x^{k-1} \\
x^{k,\ell} &= P\!\left( x^{k,\ell-1} + \lambda\, A_\ell^T M_\ell\, (b_\ell - A_\ell\, x^{k,\ell-1}) \right), \qquad \ell = 1, 2, \ldots, p \\
x^k &= x^{k,p}
\end{aligned}
\]
References: Eggermont, Herman, Lent (1981); Elfving (1980).
Variant by Elfving (1980):
\[
M_\ell = (A_\ell A_\ell^T)^\dagger \quad \Rightarrow \quad A_\ell^T M_\ell = A_\ell^\dagger
\]
Parallelism given by the tradeoff:
The convergence depends on the number of blocks p:
If p = 1, we recover SIRT.
If p = m, we recover ART.
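One block-sequential iteration can be sketched as follows. This is an illustration under stated assumptions: the blocks are contiguous row ranges, each block uses the Cimmino weights for $M_\ell$, the projection P is omitted (taken as the identity), and the function name is hypothetical.

```python
import numpy as np

def block_sequential_sweep(lam, A, b, x, p):
    """One block-sequential iteration: a Cimmino (SIRT) step on each of p blocks in turn."""
    x = np.array(x, dtype=float)
    for rows in np.array_split(np.arange(A.shape[0]), p):
        A_l, b_l = A[rows], b[rows]
        # M_l = (1/m_l) diag(1 / ||a_i||^2) for the rows of this block.
        w = 1.0 / (len(rows) * np.sum(A_l * A_l, axis=1))
        # The block step uses matrix-vector products with A_l (multicore friendly);
        # the next block already sees the updated x (ART-like error reduction).
        x = x + lam * (A_l.T @ (w * (b_l - A_l @ x)))
    return x

rng = np.random.default_rng(1)
A = rng.random((6, 4))
b = rng.random(6)
x0 = rng.random(4)
lam = 0.8

x_p1 = block_sequential_sweep(lam, A, b, x0, p=1)   # one block: a SIRT step
x_pm = block_sequential_sweep(lam, A, b, x0, p=6)   # one row per block: an ART sweep
x_p3 = block_sequential_sweep(lam, A, b, x0, p=3)   # something in between
```

The two extreme choices of p reproduce the classical methods, which is the tradeoff stated on the slide.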
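Block-Parallel Methods
Algorithm: Block-Parallel
Initialization: choose an arbitrary $x^0 \in \mathbb{R}^n$
Iteration: for $k = 1, 2, 3, \ldots$
for $\ell = 1, \ldots, p$ execute in parallel
\[
x^{k,\ell} = \text{ART-sweep}(\lambda, A_\ell, b_\ell, x^{k-1})
\]
\[
x^k = \frac{1}{p} \sum_{\ell=1}^{p} x^{k,\ell}
\]
Reference: Censor, Elfving, Herman (2001)
Parallelism is given by: the loop over the blocks $\ell$, which is executed in parallel.
The convergence depends on p:
If p = 1, we recover ART.
If p = m, we recover SIRT.
Variants:
Elfving (1980), inner step:
\[
x^{k,\ell} = P\!\left( x^{k-1,\ell} + \lambda\, A_\ell^\dagger\, (b_\ell - A_\ell\, x^{k-1,\ell}) \right)
\]
CARP algorithm, Gordon & Gordon (2005):
\[
x^k = \sum_{\ell=1}^{p} D_\ell\, x^{k,\ell}, \qquad D_\ell \text{ depends on sparsity structure}
\]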
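The block-parallel iteration with plain averaging can be sketched as below. This only illustrates the structure of the algorithm: the blocks are assumed to be contiguous row ranges, P is omitted, and a Python thread pool stands in for the multicore implementation (it is not expected to reproduce the timings in the talk).

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def art_sweep(lam, A, b, x):
    """One ART sweep over the rows of A; returns a new vector, x is not modified."""
    x = np.array(x, dtype=float)
    for a_i, b_i in zip(A, b):
        x += lam * (b_i - a_i @ x) / (a_i @ a_i) * a_i
    return x

def block_parallel_step(lam, A, b, x, p):
    """One block-parallel iteration: independent ART sweeps per block, then averaging."""
    blocks = np.array_split(np.arange(A.shape[0]), p)
    with ThreadPoolExecutor() as pool:
        # Every block starts from the same x^{k-1}, so the sweeps are independent.
        results = list(pool.map(
            lambda rows: art_sweep(lam, A[rows], b[rows], x), blocks))
    return np.mean(results, axis=0)       # x^k = (1/p) * sum of x^{k,l}

rng = np.random.default_rng(2)
A = rng.random((6, 4))
b = rng.random(6)
x0 = rng.random(4)
lam = 0.8

y_p1 = block_parallel_step(lam, A, b, x0, p=1)   # one block: an ART sweep
y_pm = block_parallel_step(lam, A, b, x0, p=6)   # one row per block: a Cimmino step
```

With one row per block, the average of the single-row updates is exactly the Cimmino step with $M = \frac{1}{m}\,\mathrm{diag}(1/\|a_i\|_2^2)$.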
Block Sequential
4 blocks, Intel Xeon E5620 2.40 GHz (4 cores)
The "building blocks" are SIRT iterations, suited for multicore. The blocks are treated sequentially! Hence the error reduction per iteration is close to that of ART.
[Figure legend: ART, SIRT, Block-Seq.]
Block Parallel
Intel Xeon E5620 2.40 GHz (4 cores)
[Figure legend: ART, SIRT, Block-Seq., Block-Par.]
Fair Comparison of the Methods …
The relaxation parameter λ makes comparisons difficult …
It is quite easy to make an unfair comparison between the methods: choose a bad λ for the method you don't like.
To make a fair comparison between the methods, we choose the value of λ that is (near) optimal for each method!
What do we mean by "(near) optimal"?
– Choose a test problem with a known solution.
– Find the parameter λ that gives fastest semi-convergence.
Recall that we do not want the "naive" solution:
\[
x_{\mathrm{naive}} = A^{-1} b = x_{\mathrm{exact}} + A^{-1} e, \qquad \|A^{-1} e\| \gg \|x_{\mathrm{exact}}\|
\]
Illustration of Semi-Convergence
[Figure: illustration of semi-convergence; the label $A^{-1} b$ marks the naive solution.]
Training for Optimal λ
Semi-convergence and relaxation parameter λ: the optimal λ reaches the minimum error in the fewest iterations.
[Figure: error curves versus iteration k, with the optimal λ marked.]
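The training idea can be sketched on a toy problem with a known solution. Everything specific here is an assumption for illustration: the iteration is SIRT with M = I (Landweber), the 3×3 diagonal problem and its large noise level are chosen so that semi-convergence is visible within a few thousand iterations, and the selection rule with the tolerance `tol` is one possible reading of "minimum error in fewest iterations".

```python
import numpy as np

def landweber_errors(lam, A, b, x_bar, n_iter):
    """Relative errors ||x^k - x_bar|| / ||x_bar|| for SIRT with M = I, x^0 = 0."""
    x = np.zeros(A.shape[1])
    errs = np.empty(n_iter)
    for k in range(n_iter):
        x = x + lam * (A.T @ (b - A @ x))
        errs[k] = np.linalg.norm(x - x_bar) / np.linalg.norm(x_bar)
    return errs

def train_lambda(lams, A, b, x_bar, n_iter, tol=1.05):
    """Return (lambda, minimum error, iteration of the minimum) for the fastest lambda."""
    stats = []
    for lam in lams:
        errs = landweber_errors(lam, A, b, x_bar, n_iter)
        k_min = int(np.argmin(errs))
        stats.append((lam, errs[k_min], k_min + 1))
    best_err = min(s[1] for s in stats)
    # Among the lambdas that (nearly) reach the best error, take the fastest one.
    candidates = [s for s in stats if s[1] <= tol * best_err]
    return min(candidates, key=lambda s: s[2])

# Toy problem with known solution x_bar and deterministic "noise" e.
A = np.diag([1.0, 0.6, 0.05])
x_bar = np.ones(3)
e = 0.2 * np.array([1.0, -1.0, 1.0])
b = A @ x_bar + e

lams = np.linspace(0.2, 1.8, 9)
lam_opt, err_opt, k_opt = train_lambda(lams, A, b, x_bar, n_iter=2000)
errs = landweber_errors(lam_opt, A, b, x_bar, 2000)
print(lam_opt, err_opt, k_opt, errs[-1])
```

The error history for the selected λ first decreases and later grows again as the iterates approach the naive solution, which is the semi-convergence behaviour that the training exploits.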
Convergence Results I
Only convergence is considered here; the number of cores is irrelevant.
Convergence Results II
Only convergence is considered here; the number of cores is irrelevant.
Blocks of Structurally Orthogonal Rows
In 3D tomography, it is easy to find sets of rows that are orthogonal due to the structure of zeros/nonzeros.
Thus, a re-ordering of the rows can produce blocks with mutually orthogonal rows (= the traces of the rays are non-overlapping).
When a block has structurally orthogonal rows, then ART, SIRT and "pinv" are equivalent. It is worthwhile to utilize this!
PART algorithm, Gordon (2006)
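A sketch of such a re-ordering, together with a check that an ART sweep and a pseudoinverse step coincide on a block of structurally orthogonal rows. The greedy grouping below is a simple illustrative heuristic with a hypothetical function name; it is not the PART algorithm, and the small matrix is made up.

```python
import numpy as np

def group_structurally_orthogonal(A):
    """Greedily group rows so that rows within a block share no nonzero column."""
    blocks, supports = [], []      # row indices and used columns, per block
    for i, row in enumerate(A):
        cols = set(np.flatnonzero(row))
        for rows, used in zip(blocks, supports):
            if used.isdisjoint(cols):      # no overlap with any row in this block
                rows.append(i)
                used |= cols
                break
        else:                              # no existing block fits: open a new one
            blocks.append([i])
            supports.append(set(cols))
    return blocks

A = np.array([[1.0, 2.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 3.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 1.0, 2.0]])
rng = np.random.default_rng(0)
x = rng.standard_normal(6)
b = rng.standard_normal(4)
lam = 0.7

blocks = group_structurally_orthogonal(A)
equal = []
for rows in blocks:
    A_l, b_l = A[rows], b[rows]
    x_art = x.copy()
    for a_i, b_i in zip(A_l, b_l):         # sequential ART sweep on the block
        x_art += lam * (b_i - a_i @ x_art) / (a_i @ a_i) * a_i
    # Direct step with the pseudoinverse of the block.
    x_pinv = x + lam * (np.linalg.pinv(A_l) @ (b_l - A_l @ x))
    equal.append(bool(np.allclose(x_art, x_pinv)))
print(blocks, equal)
```

Because the rows in a block touch disjoint sets of unknowns, the row updates do not interact, so they can be applied in any order or all at once.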
Single-Core Results
Intel Core i7-3820 3.60 GHz (1 core)
• Block-Seq: block-sequential-SIRT
• Block-Par: block-parallel-ART (Censor, Elfving, Herman)
• CARP: block-parallel-ART (Gordon, Gordon)
• PART: utilizes structural orthogonality
• ART (1 thread)
Multi-Core Performance – 4 cores
• Block-seq-SIRT
• Block-par-ART (Censor, Elfving, Herman)
• Block-par-ART (Gordon, Gordon)
• PART: utilizes structural orthogonality
• ART (1 thread)
Multi-core Results – 4 Cores
Intel Core i7-3820 3.60 GHz (4 cores)
The advantage of PART over standard ART is due to the improved use of multicore architecture.
• Block-Seq: block-sequential-SIRT
• Block-Par: block-parallel-ART (Censor, Elfving, Herman)
• CARP: block-parallel-ART (Gordon, Gordon)
• PART: utilizes structural orthogonality
• ART (1 thread)
Multi-core Results – 32 Cores
4 socket AMD Opteron 6282 SE 2.60 GHz (32 cores)
With many cores, PART is a clear winner.
• Block-Seq: block-sequential-SIRT
• Block-Par: block-parallel-ART (Censor, Elfving, Herman)
• CARP: block-parallel-ART (Gordon, Gordon)
• PART: utilizes structural orthogonality
• ART (1 thread)
Conclusions
Block algebraic iterative reconstruction techniques can achieve an initial convergence rate similar to that of ART,
combined with the shorter computing time of SIRT, because they can utilize the multicore architecture.
With a suitable row ordering and choice of blocks, we can produce blocks of structurally orthogonal rows.
PART has identical convergence to ART and very good scaling properties in practice.
Next step: target GPUs (up to 2688 cores).