A DOMAIN DECOMPOSITION ALGORITHM FOR THE NUMERICAL SOLUTION OF MAXWELL'S
EQUATIONS
Yijun Lu
M.Sc., Huazhong University of Science and Technology, 1990
B.Sc., Huazhong University of Science and Technology, 1985
A THESIS SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
in the Department of Mathematics and Statistics
© Yijun Lu 1995
SIMON FRASER UNIVERSITY
August 1995
All rights reserved. This work may not be
reproduced in whole or in part, by photocopy
or other means, without the permission of the author.
APPROVAL
Name: Yijun Lu
Degree: Master of Science
Title of thesis: A Domain Decomposition Algorithm for the Numerical Solution
of Maxwell's Equations
Examining Committee:
Chairman: Dr. B. R. Alspach
Dr. C. Y. Shen
Senior Supervisor
Dr. M. Singh
Dr. G. A. C. Graham
Dr. R. W. Lardner
External Examiner
Department of Mathematics and Statistics
Simon Fraser University
Date Approved: August 8, 1995
PARTIAL COPYRIGHT LICENSE
I hereby grant to Simon Fraser University the right to lend my thesis, project or extended essay (the title of which is shown below) to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. I further agree that permission for multiple copying of this work for scholarly purposes may be granted by me or the Dean of Graduate Studies. It is understood that copying or publication of this work for financial gain shall not be allowed without my written permission.
Title of Thesis/Project/Extended Essay
A Domain Decomposition Algorithm for the Numerical Solution
of Maxwell's Equations
Author: (signature)
June 27, 1995
(date)
Abstract
The development of scalable parallel algorithms for implementation on massively parallel
processing (MPP) computers has become an important research topic in scientific computing.
The long-term goal of this work is to develop an accurate, efficient, flexible and scalable
parallel algorithm for solving Maxwell's equations.
A domain decomposition technique together with an implicit finite difference scheme has
been used to design a parallel algorithm to solve for the electromagnetic scattering by an
infinite square metallic cylinder in the time domain. The implicit difference scheme yields
first-order discretization accuracy, unconditional stability, and a large system of linear
equations at each time step. The domain decomposition technique reduces the solution of this
large system to that of many independent smaller subsystems. A concept of balance factor
is proposed to analyze the speedup of the algorithm for several different cases where the
computational domain is decomposed into 4 and 8 subregions and the size of the computational
domain varies from $1.6\lambda \times 1.6\lambda$ to $7.2\lambda \times 7.2\lambda$.
The present algorithm has been implemented on a coarse-grain parallel vector supercomputer,
the CRAY C98, running in dedicated mode, and obtains a speedup close to the number
of available CPUs for a perfectly balanced case. The present algorithm can also be adapted
to MPP computers.
Dedication
To my grandmother
Acknowledgements
I would like to thank my senior supervisor, Dr. C.Y. Shen, for his encouragement, patient
guidance and constant support during the preparation of this thesis. I would also like to
thank Cray Research, Inc. and Mr. Evans Harrigan for providing the dedicated time on
CRAY supercomputers. Finally, financial support from the Department of Mathematics
and Statistics at Simon Fraser University is much appreciated.
Contents
Approval
Abstract
Dedication
Acknowledgements
1 Introduction
1.1 FDTD method
1.2 Domain Decomposition Technique
1.3 Outline of the Thesis
2 A Domain Decomposition Algorithm
2.1 Implicit Finite Difference Approximation
2.2 Treatment of Boundary Conditions
2.3 Domain Decomposition Algorithm
2.4 Summary
3 Numerical Results
3.1 Test Problems
3.2 Programming and Parallelization
3.3 Code Validation
3.3.1 Correctness
3.3.2 Stability
3.3.3 Speedup
3.4 Summary
4 Concluding Remarks
Bibliography
Appendix
List of Figures
1.1 Positions of the field components on a unit cell of FDTD lattice
1.2 A domain and its decomposition
2.1 Staggered spatial mesh scheme
2.2 Left boundary mesh scheme for i = 1
3.1 Geometry of the test problem
3.2 Decomposition: four subdomains
3.3 Decomposition: eight subdomains
3.4 Comparison 1: solid line for one domain, -o- points for 4 and 8 subdomains
3.5 Comparison 2: solid line for one domain, -o- points for 4 and 8 subdomains
3.6 Stability 1: h=1/20, Nx=Ny=41, dt=3.0e-10. No evidence of instability
3.7 Stability 2: h=1/20, Nx=Ny=41, dt=5.0e-10. Instability is detected at n=70
3.8 Stability 3: h=1/50, Nx=Ny=101, dt=5.0e-10. Instability is postponed to n=200
3.9 Speedup: 4 subdomain case
3.10 Speedup: 8 subdomain case
3.11 Balance factor: 4 subdomain case
3.12 Balance factor: 8 subdomain case
3.13 Speedup as a function of balance factor: 4 subdomain case
3.14 Speedup as a function of balance factor: 8 subdomain case
Chapter 1
Introduction
Electromagnetic wave propagation problems can be formulated as initial and/or boundary
value problems involving Maxwell's equations. The determination of the resultant
electromagnetic fields due to various modes of excitation is an important problem both in
theory and in practice. In the case of electromagnetic scattering, exact analytical solutions
can be obtained only for very simple scatterers such as the sphere or the circular cylinder.
For most scatterers, we must resort to numerical methods to determine the scattered field.
We will briefly discuss the FDTD method for solving Maxwell's equations in §1.1 and the
domain decomposition technique for solving elliptic problems in §1.2, and give an outline of
the thesis in §1.3.
1.1 FDTD method
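The finite-difference time-domain (FDTD) algorithm [4] was developed to solve the two
Maxwell curl equations directly in the time domain [28]:
\[ \frac{\partial \mathbf{H}}{\partial t} = -\frac{1}{\mu}\,\nabla \times \mathbf{E}, \tag{1.1} \]
\[ \frac{\partial \mathbf{E}}{\partial t} = \frac{1}{\varepsilon}\,\nabla \times \mathbf{H} - \frac{\sigma}{\varepsilon}\,\mathbf{E}, \tag{1.2} \]
where $\mathbf{E} = (E_x, E_y, E_z)$ is the electric field and $\mathbf{H} = (H_x, H_y, H_z)$ the
magnetic field. The constants $\varepsilon$, $\sigma$ and $\mu$ are the electric permittivity,
electric conductivity and magnetic permeability, respectively.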
The FDTD algorithm uses a second-order finite difference approximation to the space
and time derivatives of each field component. Figure 1.1 shows a typical spatial mesh
scheme.
Figure 1.1: Positions of the field components on a unit cell of FDTD lattice
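All quantities on the right-hand side of each difference equation are known from computations
performed at the previous time step. This results in a fully explicit system whereby
chronological values of the electric and magnetic field components at each location are
obtained in a temporal leapfrog manner. For example, the finite-difference equations for the
field components $H_x$ and $E_z$ are as follows:
\[
\begin{aligned}
H_x^{n+1/2}(i, j+\tfrac12, k+\tfrac12) = {} & H_x^{n-1/2}(i, j+\tfrac12, k+\tfrac12) \\
& + \frac{\Delta t}{\mu}\Big[(\Delta z)^{-1}\big(E_y^{n}(i, j+\tfrac12, k+1) - E_y^{n}(i, j+\tfrac12, k)\big) \\
& \qquad + (\Delta y)^{-1}\big(E_z^{n}(i, j, k+\tfrac12) - E_z^{n}(i, j+1, k+\tfrac12)\big)\Big]
\end{aligned}
\tag{1.3}
\]
\[
\begin{aligned}
E_z^{n+1}(i, j, k+\tfrac12) = {} & \Big(1 - \frac{\sigma \Delta t}{\varepsilon}\Big) E_z^{n}(i, j, k+\tfrac12) \\
& + \frac{\Delta t}{\varepsilon}\Big[(\Delta x)^{-1}\big(H_y^{n+1/2}(i+\tfrac12, j, k+\tfrac12) - H_y^{n+1/2}(i-\tfrac12, j, k+\tfrac12)\big) \\
& \qquad + (\Delta y)^{-1}\big(H_x^{n+1/2}(i, j-\tfrac12, k+\tfrac12) - H_x^{n+1/2}(i, j+\tfrac12, k+\tfrac12)\big)\Big]
\end{aligned}
\tag{1.4}
\]
where $F^n(i, j, k) = F(i\Delta x, j\Delta y, k\Delta z, n\Delta t)$ for any function $F(x, y, z, t)$.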
The FDTD lattice shown in Figure 1.1 was first proposed by Yee [42] in the mid-1960s,
but its use was very limited until the early 1980s, when Mur [25] introduced absorbing
boundary conditions to truncate the infinite scattering field to a finite solution
domain. The basis of this absorbing (or radiation) boundary condition is a two-term Taylor
series approximation of a one-way wave equation [11]. A detailed discussion of absorbing
boundary conditions can be found in [17, 18].
Various aspects of the FDTD method have been studied by many researchers. A stability
condition $v\Delta t < \left(\frac{1}{\Delta x^2} + \frac{1}{\Delta y^2} + \frac{1}{\Delta z^2}\right)^{-1/2}$
was given in [29] for computations with the explicit difference scheme, where $v$ is the wave
velocity. Extensions of the FDTD method to handle curved surfaces and irregular nonorthogonal
meshes were given in [21]. An FDTD algorithm in curvilinear coordinates was discussed in
[12, 13]. A convergence analysis of the FDTD scheme on nonuniform grids was provided in [23].
Many papers have been published on the application of the FDTD method to various
electromagnetic problems. An accurate simulation of an incident wave of arbitrary duration,
pulse shape, angle of incidence and polarization was reported independently in [25] and
[39]. In [39], Umashankar and Taflove provided means to obtain unambiguous sinusoidal
steady-state data from the transient response. Accurate computations of the far field and of
the monostatic/bistatic radar cross section were given in [39, 31, 34]. A computation of the
coupling of wires and wire bundles in free space and in a metal cavity was reported in
[40, 32]. Also see [26, 30, 35, 33, 36, 37, 41] for other applications.
Research is ongoing for each of the problems mentioned above. Key questions include the
efficient use of computer resources and good resolution for large and complex problems. The
complexity of many scattering problems demands faster computing speed and larger memory.
Multiprocessor supercomputers and massively parallel processors have been used in
electromagnetic computing. For problems involving complex scatterers, it is desirable to use
different techniques to deal with different parts of the scattering domain. In such cases, a
uniform stability condition is difficult to implement. Therefore it is necessary to design a
method that does not require a stability condition.
1.2 Domain Decomposition Technique
The domain decomposition approach is ideally suited to the parallel solution of the very
large systems of linear or nonlinear algebraic equations that arise from the discretization
of a boundary value problem.
To simplify the discussion, we consider the following two-dimensional Poisson problem:
There are two variants of the domain decomposition method: those in which the subdomains
overlap and those in which they do not. We shall consider the latter. If the given domain
$\Omega$ is divided into two subdomains as shown in Figure 1.2, and the equations
(1.5)-(1.6) are discretized by a finite difference or finite element approximation, we
arrive at a linear system which can be expressed as
\[ A u = b. \tag{1.7} \]
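We number the unknowns associated with the interior points of the subdomains first,
followed by those on the interface $\Gamma$. The linear system (1.7) can then be written in
partitioned form as
\[
\begin{pmatrix} A_{11} & 0 & A_{13} \\ 0 & A_{22} & A_{23} \\ A_{13}^T & A_{23}^T & A_{33} \end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}.
\]
Figure 1.2: A domain and its decomposition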
If we eliminate $u_1$ and $u_2$ from the above equations by using block Gaussian elimination,
we obtain the Schur complement system
\[ C u_3 = f_3, \tag{1.8} \]
where
\[ C = A_{33} - A_{13}^T A_{11}^{-1} A_{13} - A_{23}^T A_{22}^{-1} A_{23}, \]
and
\[ f_3 = b_3 - A_{13}^T A_{11}^{-1} b_1 - A_{23}^T A_{22}^{-1} b_2. \]
If $u_3$ can be found from (1.8), then we can solve the following two independent subdomain
problems
\[ A_{11} u_1 = b_1 - A_{13} u_3 \]
and
\[ A_{22} u_2 = b_2 - A_{23} u_3. \]
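The elimination above can be illustrated with a small numerical sketch. The code below is not taken from the thesis: it assumes NumPy, dense blocks, and a hypothetical one-dimensional model matrix (the tridiagonal matrix with stencil -1, 2, -1) split at its middle unknown, chosen only because it is easy to check by hand.

```python
import numpy as np

def schur_solve(A11, A22, A13, A23, A33, b1, b2, b3):
    """Solve the partitioned system by eliminating u1 and u2 (Schur complement)."""
    # One solve per subdomain gives A_kk^{-1} A_k3 and A_kk^{-1} b_k together.
    Z1 = np.linalg.solve(A11, np.column_stack([A13, b1]))
    Z2 = np.linalg.solve(A22, np.column_stack([A23, b2]))
    C = A33 - A13.T @ Z1[:, :-1] - A23.T @ Z2[:, :-1]
    f3 = b3 - A13.T @ Z1[:, -1] - A23.T @ Z2[:, -1]
    u3 = np.linalg.solve(C, f3)                 # interface unknowns, as in (1.8)
    u1 = np.linalg.solve(A11, b1 - A13 @ u3)    # the two subdomain problems
    u2 = np.linalg.solve(A22, b2 - A23 @ u3)    # are independent of each other
    return u1, u2, u3

# Hypothetical model problem: 7 unknowns on a line, interface at the middle one.
n = 7
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
left, iface, right = [0, 1, 2], [3], [4, 5, 6]
u1, u2, u3 = schur_solve(
    A[np.ix_(left, left)], A[np.ix_(right, right)],
    A[np.ix_(left, iface)], A[np.ix_(right, iface)],
    A[np.ix_(iface, iface)], b[left], b[right], b[iface])
```

Concatenating `u1`, `u3`, `u2` restores the natural ordering of the unknowns, so the result can be compared directly with a solve of the full system.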
In general, the matrix $C$ is expensive to compute explicitly. The preconditioned conjugate
gradient (PCG) method is an attractive alternative for solving (1.8). In this case,
$C$ does not have to be formed explicitly. All that is required is the matrix-vector product
$Cw$ for a given vector $w$. It is very important to keep the number of iterations low. A
number of preconditioners have been suggested in the literature to improve the convergence of
the method (see [1, 2, 6, 10, 9, 14, 15]).
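A minimal sketch of the matrix-free idea follows, assuming NumPy and dense blocks. For brevity it uses plain conjugate gradients without a preconditioner, which is a simplification of the PCG method mentioned above; the function names are illustrative and do not come from the thesis.

```python
import numpy as np

def schur_matvec(w, A11, A22, A13, A23, A33):
    """Return C @ w without forming C: each product costs one solve per subdomain."""
    return (A33 @ w
            - A13.T @ np.linalg.solve(A11, A13 @ w)
            - A23.T @ np.linalg.solve(A22, A23 @ w))

def conjugate_gradient(matvec, rhs, tol=1e-12, max_iter=200):
    """Unpreconditioned CG for a symmetric positive definite operator."""
    x = np.zeros_like(rhs, dtype=float)
    r = rhs - matvec(x)          # initial residual
    p = r.copy()                 # initial search direction
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol:   # stop on a small residual norm
            break
        Cp = matvec(p)
        alpha = rr / (p @ Cp)
        x += alpha * p
        r -= alpha * Cp
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x
```

Given the blocks of a partitioned system and the right-hand side $f_3$, the call `conjugate_gradient(lambda w: schur_matvec(w, A11, A22, A13, A23, A33), f3)` returns an approximation to $u_3$. A preconditioner would enter as an extra solve applied to the residual in each iteration.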
Domain decomposition methods have been widely studied for solving elliptic boundary
value problems [7]. The parallel implementation of domain decomposition techniques was
reported in [15]. But little is known about the applicability or the effectiveness of the
technique when it is applied to hyperbolic or parabolic problems. The main objective of this
thesis is to demonstrate that the domain decomposition strategy can be efficiently utilized
to solve Maxwell's equations in the time domain.
1.3 Outline of the Thesis
In order to avoid the stability condition of the traditional FDTD algorithm and to make
efficient use of multiprocessor supercomputers such as the CRAY C90 to solve Maxwell's
equations, a domain decomposition algorithm is proposed in this thesis. Based upon an
implicit difference discretization, the algorithm solves systems of linear equations at
every time step. Implicit difference approximations for a two-dimensional scattering problem
are described in §2.1. The treatment of an absorbing boundary condition is considered in
§2.2. The domain decomposition algorithm is given in §2.3. In Chapter 3, we discuss the
implementation of the algorithm on a CRAY C98 system. The parallelization and the
programming of the algorithm are explored in §3.2. The correctness of the numerical results
and the stability of the algorithm are examined in §3.3. A concept of balance factor, which
describes the workload allocation among parallel processors, is also introduced to analyze
the speedup of the algorithm. Finally, some concluding remarks are given in Chapter 4.
Chapter 2
A Domain Decomposition
Algorithm
An implicit finite difference time-domain scheme is presented in §2.1. When this scheme is
combined with the appropriate approximation of an absorbing boundary condition (§2.2),
a linear system of equations with unknowns involving both the electric and magnetic fields
can be obtained. A domain decomposition algorithm is developed in §2.3 to solve a linear
system of equations involving only the electric field.
2.1 Implicit Finite Difference Approximation
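Let us consider the transverse magnetic (TM) wave in two dimensions. In an isotropic
medium, the three field components $H_x$, $H_y$ and $E_z$ satisfy the following Maxwell's
equations
\[ \frac{\partial H_x}{\partial t} = -\frac{1}{\mu}\frac{\partial E_z}{\partial y}, \tag{2.1} \]
\[ \frac{\partial H_y}{\partial t} = \frac{1}{\mu}\frac{\partial E_z}{\partial x}, \tag{2.2} \]
\[ \frac{\partial E_z}{\partial t} = \frac{1}{\varepsilon}\left(\frac{\partial H_y}{\partial x} - \frac{\partial H_x}{\partial y} - \sigma E_z\right) \tag{2.3} \]
in a region $\Omega$. By using an absorbing boundary condition, which will be discussed in
§2.2, the infinite scattering domain $\Omega$ can be truncated into a bounded region.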
Figure 2.1: Staggered spatial mesh scheme
Consider a staggered spatial mesh as shown in Figure 2.1 with mesh parameters $\Delta x$ and
$\Delta y$.
Let $F^n(i, j) = F(x_0 + i\Delta x, y_0 + j\Delta y, t_0 + n\Delta t)$ for any function
$F(x, y, t)$, where $(x_0, y_0, t_0)$ is a fixed point and $\Delta t$ is the temporal
discretization parameter. We have the following finite difference approximations to
(2.1)-(2.3):
\[ \frac{1}{\Delta t}\Big(H_x^{n}(i, j+\tfrac12) - H_x^{n-1}(i, j+\tfrac12)\Big) = -\frac{1}{\mu\Delta y}\Big(E_z^{n}(i, j+1) - E_z^{n}(i, j)\Big) \tag{2.4} \]
\[ \frac{1}{\Delta t}\Big(H_y^{n}(i+\tfrac12, j) - H_y^{n-1}(i+\tfrac12, j)\Big) = \frac{1}{\mu\Delta x}\Big(E_z^{n}(i+1, j) - E_z^{n}(i, j)\Big) \tag{2.5} \]
\[
\frac{1}{\Delta t}\Big(E_z^{n}(i, j) - E_z^{n-1}(i, j)\Big) = \frac{1}{\varepsilon}\Big[\frac{1}{\Delta x}\Big(H_y^{n}(i+\tfrac12, j) - H_y^{n}(i-\tfrac12, j)\Big)
- \frac{1}{\Delta y}\Big(H_x^{n}(i, j+\tfrac12) - H_x^{n}(i, j-\tfrac12)\Big) - \sigma E_z^{n}(i, j)\Big]
\tag{2.6}
\]
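An alternative to the above differencing schemes is to use the central difference, which
yields the following discretizations:
\[
\frac{1}{\Delta t}\Big(H_x^{n}(i, j+\tfrac12) - H_x^{n-1}(i, j+\tfrac12)\Big) = -\frac{1}{2\mu\Delta y}\Big[E_z^{n}(i, j+1) - E_z^{n}(i, j) + E_z^{n-1}(i, j+1) - E_z^{n-1}(i, j)\Big]
\tag{2.7}
\]
\[
\frac{1}{\Delta t}\Big(H_y^{n}(i+\tfrac12, j) - H_y^{n-1}(i+\tfrac12, j)\Big) = \frac{1}{2\mu\Delta x}\Big[E_z^{n}(i+1, j) - E_z^{n}(i, j) + E_z^{n-1}(i+1, j) - E_z^{n-1}(i, j)\Big]
\tag{2.8}
\]
\[
\begin{aligned}
\frac{1}{\Delta t}\Big(E_z^{n}(i, j) - E_z^{n-1}(i, j)\Big) = \frac{1}{2\varepsilon}\Big\{
& \frac{1}{\Delta x}\Big[H_y^{n}(i+\tfrac12, j) - H_y^{n}(i-\tfrac12, j) + H_y^{n-1}(i+\tfrac12, j) - H_y^{n-1}(i-\tfrac12, j)\Big] \\
& - \frac{1}{\Delta y}\Big[H_x^{n}(i, j+\tfrac12) - H_x^{n}(i, j-\tfrac12) + H_x^{n-1}(i, j+\tfrac12) - H_x^{n-1}(i, j-\tfrac12)\Big] \\
& - \sigma\Big[E_z^{n}(i, j) + E_z^{n-1}(i, j)\Big]\Big\}
\end{aligned}
\tag{2.9}
\]
It is not difficult to show that the implicit difference schemes (2.4)-(2.6) and
(2.7)-(2.9) are unconditionally stable and have first-order and second-order accuracy,
respectively.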
2.2 Treatment of Boundary Conditions
As mentioned above, the exterior problem (2.1)-(2.3) needs to be restricted to a finite
computational domain. The scattering field is truncated into some regularly shaped domain
such as a circle or a rectangle, which will also be denoted by $\Omega$. It is necessary to
impose some boundary conditions on the outer boundary $\partial\Omega$ so that the scattered
wave from the interior of $\Omega$ can pass through $\partial\Omega$ without being reflected.
In general, it is difficult to find an efficient boundary condition which will perform the
above task perfectly [17].
In this thesis, Engquist and Majda's [11, 17] first-order absorbing (or radiation) boundary
condition will be used, i.e.
where $v = (\mu\varepsilon)^{-1/2}$ is the velocity of wave propagation, and the partial
derivative with respect to $n$ denotes the derivative in the direction of the outer normal of
the boundary $\partial\Omega$.
By choosing the outer boundary as a rectangle, i.e. $\Omega = (a, b) \times (c, d)$, (2.10) becomes
Using the same discretization parameters $\Delta t$, $\Delta x$ and $\Delta y$ as in §2.1, we
can write difference approximations of (2.11)-(2.12) as
\[ \frac{1}{\Delta t}\Big(E_z^{n}(1, j) - E_z^{n-1}(1, j)\Big) + \frac{v}{\Delta x}\Big(E_z^{n-1}(2, j) - E_z^{n-1}(1, j)\Big) = 0 \tag{2.15} \]
\[ \frac{1}{\Delta t}\Big(E_z^{n}(n_x, j) - E_z^{n-1}(n_x, j)\Big) - \frac{v}{\Delta x}\Big(E_z^{n-1}(n_x, j) - E_z^{n-1}(n_x - 1, j)\Big) = 0 \tag{2.16} \]
where $n_x$ is the number of mesh points in the x-direction. Figure 2.2 shows the boundary
situation for $i = 1$. Notice that we use the explicit discretization here to make the
problem easier to handle.
Figure 2.2: Left boundary mesh scheme for i = 1
From (2.15) and (2.16), we have
\[ E_z^{n}(1, j) = E_z^{n-1}(1, j) - \frac{v\Delta t}{\Delta x}\Big(E_z^{n-1}(2, j) - E_z^{n-1}(1, j)\Big) \tag{2.17} \]
\[ E_z^{n}(n_x, j) = E_z^{n-1}(n_x, j) + \frac{v\Delta t}{\Delta x}\Big(E_z^{n-1}(n_x, j) - E_z^{n-1}(n_x - 1, j)\Big) \tag{2.18} \]
The discretizations of (2.13) and (2.14) are similar to (2.17) and (2.18). Notice that
a weighted mean is needed for the computation of the boundary value at each vertex of
the rectangular boundary $\partial\Omega$, since we can have two values, for example, at the
point with coordinates $(a, c)$. The values of $H_x$ and $H_y$ on $\partial\Omega$ can easily
be computed from $E_z$ by using (2.4) and (2.5).
Once an incident wave is introduced, the boundary condition on the scatterer surface
can be obtained. For example, for a perfectly conducting body, we have the condition
$E_{\mathrm{scat}} = -E_{\mathrm{inc}}$ applied to the E-field tangential to the surface of
the scatterer. In the TM wave case, we have
\[ (E_z)_{\mathrm{scat}} = -(E_z)_{\mathrm{inc}}. \tag{2.19} \]
2.3 Domain Decomposition Algorithm
A linear system of equations at each time step $t_n = n\Delta t$ ($n = 1, 2, \ldots$) can be
obtained by combining (2.4)-(2.6) with the discretized boundary conditions given in §2.2.
But this linear system contains unknowns involving the electric field as well as the magnetic
field, so its solution is difficult to compute. In this section, we eliminate $H_x$ and $H_y$
from (2.6) by using (2.4) and (2.5) and develop a domain decomposition algorithm to solve
the reduced linear system.
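In fact, equations (2.4)-(2.6) can be rewritten as
\[ H_x^{n}(i, j+\tfrac12) = -\frac{\Delta t}{\mu\Delta y}\Big(E_z^{n}(i, j+1) - E_z^{n}(i, j)\Big) + H_x^{n-1}(i, j+\tfrac12) \tag{2.20} \]
\[ H_y^{n}(i+\tfrac12, j) = \frac{\Delta t}{\mu\Delta x}\Big(E_z^{n}(i+1, j) - E_z^{n}(i, j)\Big) + H_y^{n-1}(i+\tfrac12, j) \tag{2.21} \]
and
\[
\Big(1 + \frac{\sigma\Delta t}{\varepsilon}\Big) E_z^{n}(i, j)
- \frac{\Delta t}{\varepsilon\Delta x}\Big[H_y^{n}(i+\tfrac12, j) - H_y^{n}(i-\tfrac12, j)\Big]
+ \frac{\Delta t}{\varepsilon\Delta y}\Big[H_x^{n}(i, j+\tfrac12) - H_x^{n}(i, j-\tfrac12)\Big] = E_z^{n-1}(i, j)
\tag{2.22}
\]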
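Substituting (2.20) and (2.21) into (2.22), we find
\[
\begin{aligned}
\Big(1 + \frac{\sigma\Delta t}{\varepsilon}\Big) E_z^{n}(i, j)
& - \frac{\Delta t}{\varepsilon\Delta x}\Big(\frac{\Delta t}{\mu\Delta x}\big(E_z^{n}(i+1, j) - E_z^{n}(i, j)\big) + H_y^{n-1}(i+\tfrac12, j) \\
& \qquad\quad - \frac{\Delta t}{\mu\Delta x}\big(E_z^{n}(i, j) - E_z^{n}(i-1, j)\big) - H_y^{n-1}(i-\tfrac12, j)\Big) \\
& + \frac{\Delta t}{\varepsilon\Delta y}\Big(-\frac{\Delta t}{\mu\Delta y}\big(E_z^{n}(i, j+1) - E_z^{n}(i, j)\big) + H_x^{n-1}(i, j+\tfrac12) \\
& \qquad\quad + \frac{\Delta t}{\mu\Delta y}\big(E_z^{n}(i, j) - E_z^{n}(i, j-1)\big) - H_x^{n-1}(i, j-\tfrac12)\Big) = E_z^{n-1}(i, j)
\end{aligned}
\]
After some simplifications, we have
\[ -b E_z^{n}(i, j-1) - a E_z^{n}(i-1, j) + d E_z^{n}(i, j) - a E_z^{n}(i+1, j) - b E_z^{n}(i, j+1) = f^{n-1}(i, j) \tag{2.23} \]
where
\[ a = \frac{\Delta t^2}{\varepsilon\mu\Delta x^2}, \qquad b = \frac{\Delta t^2}{\varepsilon\mu\Delta y^2}, \qquad d = 1 + \frac{\sigma\Delta t}{\varepsilon} + 2(a + b), \]
and
\[
f^{n-1}(i, j) = E_z^{n-1}(i, j) + \frac{\Delta t}{\varepsilon\Delta x}\Big[H_y^{n-1}(i+\tfrac12, j) - H_y^{n-1}(i-\tfrac12, j)\Big]
- \frac{\Delta t}{\varepsilon\Delta y}\Big[H_x^{n-1}(i, j+\tfrac12) - H_x^{n-1}(i, j-\tfrac12)\Big]
\tag{2.24}
\]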
The combination of the difference equations (2.23) and the discretized boundary conditions
in §2.2 gives us a linear system in which only the electric field $E_z$ at the n-th time
level appears as the unknown. It is worth pointing out that the linear system (2.23) is
strictly diagonally dominant since $d > 2(a + b)$.
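The structure of this system can be made concrete with a short sketch. The code below assumes NumPy, a dense matrix, and a rectangular block of interior nodes whose outside neighbours are known boundary values (so they would move to the right-hand side); these choices and the function names are illustrative only. The coefficients follow from substituting (2.20) and (2.21) into (2.22): $a = \Delta t^2/(\varepsilon\mu\Delta x^2)$, $b = \Delta t^2/(\varepsilon\mu\Delta y^2)$ and $d = 1 + \sigma\Delta t/\varepsilon + 2(a+b)$.

```python
import numpy as np

def assemble_ez_matrix(nx, ny, dt, dx, dy, eps, mu, sigma):
    """Dense five-point coefficient matrix of (2.23) on an nx-by-ny block of E_z nodes."""
    a = dt**2 / (eps * mu * dx**2)
    b = dt**2 / (eps * mu * dy**2)
    d = 1.0 + sigma * dt / eps + 2.0 * (a + b)
    A = np.zeros((nx * ny, nx * ny))
    for j in range(ny):              # mesh rows, bottom to top
        for i in range(nx):          # left to right within a row
            k = j * nx + i
            A[k, k] = d
            if i > 0:
                A[k, k - 1] = -a     # neighbour (i-1, j)
            if i < nx - 1:
                A[k, k + 1] = -a     # neighbour (i+1, j)
            if j > 0:
                A[k, k - nx] = -b    # neighbour (i, j-1)
            if j < ny - 1:
                A[k, k + nx] = -b    # neighbour (i, j+1)
    return A

def is_strictly_diagonally_dominant(A):
    """True if |A[k, k]| exceeds the sum of the other absolute entries in every row."""
    diag = np.abs(np.diag(A))
    off = np.abs(A).sum(axis=1) - diag
    return bool(np.all(diag > off))
```

Because each row has at most two entries equal to $-a$ and two equal to $-b$, the off-diagonal row sum never exceeds $2(a+b)$, which is smaller than $d$ for any positive $\Delta t$.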
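The linear system can be written in the form of
\[ A x = b. \tag{2.25} \]
Assume that the domain $\Omega$ is decomposed into $N$ subdomains $\Omega_i$, $i = 1, 2, \ldots, N$.
The union of the $N$ interfaces $\Gamma_i$, $i = 1, 2, \ldots, N$, which separate these $N$
subdomains from each other is denoted by $\Gamma$. We have the following relations:
\[
\Omega = \Omega_1 \cup \Omega_2 \cup \cdots \cup \Omega_N \cup \Gamma, \qquad
\Omega_i \cap \Omega_j = \emptyset \ \text{for } i \neq j, \qquad
\Gamma = \Gamma_1 \cup \Gamma_2 \cup \cdots \cup \Gamma_N
\tag{2.26}
\]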
There is no direct coupling between any two different subdomains. The coupling between
the subdomains and the interfaces is represented by the matrices $A_i$, $i = 1, 2, \ldots, N$,
which are sparse. Most rows of $A_i$ ($i = 1, 2, \ldots, N$) are zero and there are at most two
nonzero entries in any nonzero row. If we use the same ordering of the unknowns as that
described in §1.2, then the system (2.25) becomes
\[
\begin{pmatrix}
A_{11} & & & & A_1 \\
& A_{22} & & & A_2 \\
& & \ddots & & \vdots \\
& & & A_{NN} & A_N \\
A_1^T & A_2^T & \cdots & A_N^T & A_\Gamma
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \\ x_\Gamma \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \\ f \end{pmatrix}
\tag{2.27}
\]
where the partitions of the coefficient matrix, the unknown vector and the right hand side
are obvious.
If we let $n_i$, $i = 1, 2, \ldots, N$, be the number of unknowns in each of the subdomains,
and $n_\Gamma$ be the number of unknowns on the interface $\Gamma$, then the matrices $A_{ii}$
and $A_i$ are of order $n_i \times n_i$ and $n_i \times n_\Gamma$ respectively for
$i = 1, 2, \ldots, N$. Likewise $A_\Gamma$ is of order $n_\Gamma \times n_\Gamma$.
The Schur complement system corresponding to (2.27) is
\[ C x_\Gamma = f - \sum_{i=1}^{N} A_i^T A_{ii}^{-1} b_i, \tag{2.28} \]
where
\[ C = A_\Gamma - \sum_{i=1}^{N} A_i^T A_{ii}^{-1} A_i. \]
Once the Schur complement system (2.28) has been solved, we can obtain the rest of the
solution of the system (2.25) by solving the following subdomain problems
\[ A_{ii} x_i = g_i, \tag{2.29} \]
where $g_i = b_i - A_i x_\Gamma$, $i = 1, 2, \ldots, N$. It is clear that the problems (2.29)
are independent of each other and that the solutions can be sought in a parallel manner.
A domain decomposition algorithm for solving (2.27) can be described as follows:
ALGORITHM 1
1. Solve (2.28) for $x_\Gamma$;
2. Solve the linear systems (2.29) simultaneously.
For the numerical solution of the Maxwell's equations (2.1)-(2.3) with a given incident
wave, we have the following algorithm:
ALGORITHM 2
1. n := 1, initialize the calculation;
2. Use ALGORITHM 1 to solve for the $E_z$ values at the n-th time level;
3. Use (2.20) and (2.21) to compute $H_x$ and $H_y$ from $E_z$;
4. n := n + 1; if $n \le N_{\max}$ (the number of time steps), go to step 2; otherwise exit.
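Step 3 of ALGORITHM 2 is an explicit update and can be sketched with array slicing. The layout of the staggered arrays below is an assumption made for this illustration (the thesis does not specify one), and NumPy is assumed.

```python
import numpy as np

def update_h_fields(Ez, Hx_old, Hy_old, dt, dx, dy, mu):
    """Advance H_x and H_y by one step from the new E_z, following (2.20)-(2.21).

    Assumed array layout:
      Ez[i, j] ~ E_z(i, j),        shape (nx, ny)
      Hx[i, j] ~ H_x(i, j + 1/2),  shape (nx, ny - 1)
      Hy[i, j] ~ H_y(i + 1/2, j),  shape (nx - 1, ny)
    """
    # (2.20): difference of E_z in the y-direction, with a minus sign
    Hx_new = Hx_old - dt / (mu * dy) * (Ez[:, 1:] - Ez[:, :-1])
    # (2.21): difference of E_z in the x-direction
    Hy_new = Hy_old + dt / (mu * dx) * (Ez[1:, :] - Ez[:-1, :])
    return Hx_new, Hy_new
```

Each entry of the new fields depends only on two neighbouring $E_z$ values and the old field value at the same location, so the update can be carried out independently in every subdomain.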
Remarks
1. By using the same procedure as that used to generate (2.23), we can derive the
following difference equations from (2.7)-(2.9):
\[ -b E_z^{n}(i, j-1) - a E_z^{n}(i-1, j) + d E_z^{n}(i, j) - a E_z^{n}(i+1, j) - b E_z^{n}(i, j+1) = f^{n-1}(i, j) \tag{2.30} \]
where
and
All the discussions for the difference equations (2.4)-(2.6) can be applied to the
difference equations (2.7)-(2.9).
2. If each subdomain is a rectangle and the unknowns $E_z$ inside the subdomain are
ordered from left to right and from bottom to top, then each of the matrices $A_{ii}$
($i = 1, 2, \ldots, N$) in (2.27) has the form
where
and
Since $\Gamma_i$, $i = 1, 2, \ldots, N$, satisfy (2.26), no points from different $\Gamma_i$'s
are expected to appear in the same difference equation (2.23). This accounts for the special
structure of the matrix $A_\Gamma$. In fact, $A_\Gamma$ is block diagonal. That is,
\[ A_\Gamma = \mathrm{diag}(B_{11}, B_{22}, \ldots, B_{NN}), \]
with each of $B_{ii}$, $i = 1, 2, \ldots, N$, being of the same form as the matrix $T$.
With these special forms, the systems in (2.29) can be solved directly, and the inverse
of $A_{ii}$ ($i = 1, 2, \ldots, N$) is easy to obtain. Furthermore, for the time-dependent
problem (2.1)-(2.3), the matrices $A_{ii}$ ($i = 1, 2, \ldots, N$) and $C$ are the same at
every time step, which favors the strategy of computing $A_{ii}^{-1}$ and $C$ explicitly at
the beginning of the computation.
3. Equation (2.23) is a two-level difference scheme involving $(E_z^{n}, H_x^{n}, H_y^{n})$
and $(E_z^{n-1}, H_x^{n-1}, H_y^{n-1})$. But $f^{n-1}(i, j)$ can be expressed in terms of
$E_z^{n-1}(i, j)$ and $E_z^{n-2}(i, j)$, that is
\[ f^{n-1}(i, j) = \Big(2 + \frac{\sigma\Delta t}{\varepsilon}\Big) E_z^{n-1}(i, j) - E_z^{n-2}(i, j). \tag{2.32} \]
So (2.23) can be transformed to a three-level difference scheme. The right hand side
$f^{n-1}(i, j)$ of (2.30) cannot have a simple form similar to (2.32). The equation (2.23)
in which $f^{n-1}(i, j)$ is in the form (2.32) cannot be obtained directly from
\[ \frac{\partial^2 E_z}{\partial t^2} + \frac{\sigma}{\varepsilon}\frac{\partial E_z}{\partial t} = \frac{1}{\varepsilon\mu}\left(\frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2}\right) \tag{2.33} \]
which is formulated by eliminating $H_x$ and $H_y$ from (2.3) using (2.1) and (2.2). But
we can use the following discretization for (2.33):
where
and similarly for the other terms in the right hand side of the above equation. In this
case, (2.34) is equivalent to
where $d = 2 + \frac{\sigma\Delta t}{\varepsilon} + 2(a + b)$. As we can see, this is a
three-level scheme. We have to use arrays $E_z^{n-1}$, $E_z^{n}$ as well as $H_x$ and $H_y$
for the computation at the n-th time level. Therefore, (2.23) or (2.30) is a better scheme
because we need to use fewer working arrays to implement the computation.
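The unconditional stability claimed for the scheme (2.4)-(2.6) can be checked numerically from its three-level form. Substituting a Fourier mode $E_z^{n}(i, j) = \xi^{n} e^{\mathrm{i}(i\theta_x + j\theta_y)}$ into (2.23), with $f^{n-1}$ written as in (2.32), gives the quadratic $(1 + \kappa + s)\xi^2 - (2 + \kappa)\xi + 1 = 0$, where $\kappa = \sigma\Delta t/\varepsilon$ and $s = 4a\sin^2(\theta_x/2) + 4b\sin^2(\theta_y/2)$. This derivation and the sketch below are an illustration under those assumptions and do not reproduce the thesis's own proof; the function names are hypothetical.

```python
import cmath
import math

def amplification_roots(a, b, kappa, theta_x, theta_y):
    """Roots xi of (1 + kappa + s) xi^2 - (2 + kappa) xi + 1 = 0."""
    s = 4.0 * a * math.sin(theta_x / 2) ** 2 + 4.0 * b * math.sin(theta_y / 2) ** 2
    lead = 1.0 + kappa + s
    disc = cmath.sqrt((2.0 + kappa) ** 2 - 4.0 * lead)
    return ((2.0 + kappa) + disc) / (2.0 * lead), ((2.0 + kappa) - disc) / (2.0 * lead)

def max_growth(a, b, kappa, samples=64):
    """Largest |xi| over a uniform grid of wave numbers in [0, pi] x [0, pi]."""
    worst = 0.0
    for p in range(samples + 1):
        for q in range(samples + 1):
            roots = amplification_roots(a, b, kappa,
                                        math.pi * p / samples, math.pi * q / samples)
            worst = max(worst, max(abs(r) for r in roots))
    return worst
```

For $\kappa = 0$ the roots are complex with $|\xi|^2 = 1/(1 + s) \le 1$ whatever the values of $a$ and $b$, that is, for any ratio of $\Delta t$ to the mesh size, which is what `max_growth` reports when sampled.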
2.4 Summary
Two implicit finite difference schemes have been presented for the discretization of the
two-dimensional Maxwell's equations. The use of staggered grids yields symmetric schemes.
The unconditional stability of the difference schemes is a desirable property.
To truncate an infinite scattering domain into a finite computational domain, an absorbing
boundary condition is necessary. Here Engquist and Majda's first-order absorbing
boundary condition is employed, and its difference approximation is discussed.
For the linear system involving only $E_z$ unknowns, which is derived from the difference
discretization of Maxwell's equations and the absorbing boundary condition, a domain
decomposition algorithm is developed. Our main concern is how to solve the problem on
multiprocessor systems such as the CRAY C90. The idea of allocating the workload among
several processors requires the decomposition of the solution domain. The numerical results
in Chapter 3 will show various properties of our algorithm.
Chapter 3
Numerical Results
In this chapter, we shall consider the implementation of the algorithm given in the previous
chapter on CRAY supercomputers. In Section 3.1, we describe an electromagnetic scattering
problem which will be used for our numerical experiments. Section 3.2 is devoted to the
parallelization of the domain decomposition algorithm. Section 3.3 presents numerical results
which illustrate the correctness and the stability of the algorithm, and demonstrate the
speedup for different sizes of the problem and different numbers of subdomains.
3.1 Test Problems
We shall calculate the scattered field of a perfectly conducting square cylinder illuminated
by an incident plane wave. The cylinder is assumed to be infinite in the z-direction. The
incident wave is assumed to be a +x-directed TM wave. Because there is no variation of
either the scatterer geometry or the incident fields in the z-direction, this problem may be
treated as a two-dimensional scattering problem, with only $E_z$, $H_x$ and $H_y$ present.
Thus the mesh scheme of Figure 2.1 is used. The configuration of the problem is illustrated
in Figure 3.1. The scattered electric field on the surface of the conducting cylinder is set
equal to the negative of the incident E-field. Engquist and Majda's first-order boundary
condition is enforced on the outer boundary of the rectangular computational domain.
For the purpose of domain decomposition, we shall consider the following three cases:
1. One domain only as shown in Figure 3.1;
2. Four subdomains as shown in Figure 3.2;
3. Eight subdomains as shown in Figure 3.3.
Figure 3.1: Geometry of the test problem
Figure 3.2: Decomposition: four subdomains
Figure 3.3: Decomposition: eight subdomains
Throughout the discussion below, the incident plane wave is taken to be [22]
\[ E(t, x) = \left[1 - e^{-(t/t_0)^2}\right] \sin\frac{2\pi}{\lambda}(ct - x) \tag{3.1} \]
where $c$ is the speed of light, $\lambda$ is the wavelength, chosen to be $25h$, and
$h = \Delta x = \Delta y$. The exponential term in (3.1) provides a smooth transition from
zero to a sinusoidal variation, and $t_0$ is chosen to be $20\Delta t$. The parameters in the
Maxwell's equations (2.1)-(2.3) are taken to be
In this case, the velocity of the wave is
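The incident field (3.1) is straightforward to evaluate. The sketch below is illustrative only: the function and parameter names are hypothetical, the ramp time is passed in as `t0` (chosen as 20 time steps in the text), and the default value of `c` is the vacuum speed of light, which is an assumption about the medium.

```python
import math

def incident_wave(t, x, wavelength, t0, c=2.99792458e8):
    """Evaluate E(t, x) = [1 - exp(-(t/t0)^2)] * sin(2*pi/wavelength * (c*t - x))."""
    ramp = 1.0 - math.exp(-(t / t0) ** 2)   # smooth switch-on from zero
    return ramp * math.sin(2.0 * math.pi / wavelength * (c * t - x))
```

At $t = 0$ the ramp factor vanishes, so the field starts from zero everywhere; once $t$ is several multiples of `t0` the ramp is essentially one and the field is a pure sinusoid travelling in the +x direction.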
3.2 Programming and Parallelization
Our numerical experiments are carried out on a CRAY C98, which has eight CPUs. Its machine
accuracy is around $0.8 \times 10^{-14}$, and it has 512 MW of central memory and a 4 ns
clock period.
There are two classes of global variables used in the programming of the domain
decomposition algorithm, namely, the internal variables associated with nodes within
subdomains and the interface variables associated with nodes belonging to two or more
subdomains.
After $A_{ii}^{-1}$, $i = 1, 2, \ldots, N$, are computed by the LINPACK subroutines SGEFA and
SGEDI, we could use the following subroutines to implement the computation of the algorithm:
MATMUT1 compute $Y_i := A_i^T A_{ii}^{-1}$ for $i = 1, 2, \ldots, N$;
MATMUT2 implement $Y_i A_i$ to get $A_i^T A_{ii}^{-1} A_i$, $i = 1, 2, \ldots, N$;
FORMUL implement (2.24) or (2.31) over each subdomain $\Omega_i$, $i = 1, 2, \ldots, N$;
RHS1 formulate the right hand side $f$ related to the interface;
RHS formulate the right hand sides $b_i$, $i = 1, 2, \ldots, N$, related to the subdomains;
MATVECT implement $Y_i b_i$ to obtain $A_i^T A_{ii}^{-1} b_i$ for $i = 1, 2, \ldots, N$;
DISINT distribute interface solutions to each segment of the interface;
MULTI solve the systems (2.29) for $i = 1, 2, \ldots, N$;
DISVAL use (2.20) and (2.21) to calculate the $H_x$ and $H_y$ fields in every subdomain.
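The division of labour among these subroutines can be mirrored in a compact sketch. The class below is not the thesis code: it assumes NumPy, dense blocks and explicit inverses, and its names are hypothetical. Comments indicate which subroutine each line roughly corresponds to.

```python
import numpy as np

class SchurSolver:
    """Precompute A_ii^{-1}, Y_i = A_i^T A_ii^{-1} and C once; reuse them at every time step."""

    def __init__(self, A_sub, A_cpl, A_gamma):
        # A_sub[i]: A_ii (n_i x n_i); A_cpl[i]: A_i (n_i x n_gamma); A_gamma: n_gamma x n_gamma
        self.A_cpl = A_cpl
        self.A_inv = [np.linalg.inv(Aii) for Aii in A_sub]             # role of SGEFA/SGEDI
        self.Y = [Ai.T @ Ainv for Ai, Ainv in zip(A_cpl, self.A_inv)]  # MATMUT1
        C = A_gamma - sum(Yi @ Ai for Yi, Ai in zip(self.Y, A_cpl))    # MATMUT2
        self.C_inv = np.linalg.inv(C)

    def solve(self, b_sub, f):
        """One application of ALGORITHM 1 for right-hand sides b_1..b_N and f."""
        rhs = f - sum(Yi @ bi for Yi, bi in zip(self.Y, b_sub))        # MATVECT
        x_gamma = self.C_inv @ rhs                                     # step 1: interface
        x_sub = [Ainv @ (bi - Ai @ x_gamma)                            # step 2: MULTI
                 for Ainv, bi, Ai in zip(self.A_inv, b_sub, self.A_cpl)]
        return x_sub, x_gamma
```

Since the matrices do not change from one time step to the next, only `solve` runs inside the time loop, and each list comprehension in it is a set of independent per-subdomain tasks.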
The domain decomposition algorithm formulated in §2.3 essentially reduces the solution
of a large linear system to that of several disjoint smaller subsystems. This is a typical
example of coarse-grained parallelism. This kind of parallelism can be efficiently
implemented on CRAY C90 systems.
Coarse-grained parallelism can also be found in
1. The calculation of $A_{ii}^{-1}$ ($i = 1, 2, \ldots, N$);
2. The matrix products $A_i^T A_{ii}^{-1} A_i$ ($i = 1, 2, \ldots, N$);
3. The matrix-vector products $A_i^T A_{ii}^{-1} b_i$ ($i = 1, 2, \ldots, N$);
4. The computation of the $H_x$ and $H_y$ fields from $E_z$ over the $N$ subdomains.
As discussed in §3.1, $N$ will be chosen to be 4 or 8 for the implementation of the
algorithm on a CRAY C98 system.
Autotasking techniques are used to realize the parallelization of the algorithm. For
example, if the same subroutine SAMPLE(A) is called N times with different arguments
$A := A_i$ ($i = 1, 2, \ldots, N$), we can store these arguments in a 2-D array A(M, N) such
that
A(·, i) := A_i
Then, in the CRAY autotasking environment, these calls become
CFPP$ CNCALL
      DO 10 i = 1,N
         CALL SAMPLE( A(1,i) )
   10 CONTINUE
In this way the N calls of the subroutine SAMPLE are distributed to N processors and
executed concurrently.
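The same pattern of dispatching N independent calls can be sketched in Python with the standard library. This is only an analogy to the autotasking loop, not a port of it: `sample` is a hypothetical stand-in for SUBROUTINE SAMPLE, and the columns play the role of A(·, i).

```python
from concurrent.futures import ThreadPoolExecutor

def sample(column):
    """Stand-in for SUBROUTINE SAMPLE: any task that touches only its own column."""
    return sum(v * v for v in column)

def run_concurrently(columns, workers=None):
    """Apply `sample` to every column, one task per subdomain; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers or max(1, len(columns))) as pool:
        return list(pool.map(sample, columns))
```

As with the CNCALL directive, this is only valid when the calls have no side effects on shared data. In CPython, threads give a real speedup only when the per-call work releases the interpreter lock, so a process pool may be the better fit for pure-Python workloads.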
3.3 Code Validation
The desirable properties of the domain decomposition algorithm are stability and appli-
cability to multiprocessor supercomputers. Various numerical results which support our
previous discussion are presented below. Throughout the numerical computation, only the
finite difference scheme ( 2.4)-( 2.6) is implemented.
3.3.1 Correctness
Figure 3.4: Comparison 1: solid line for one domain, -o- points for 4 and 8 subdomains
Figure 3.5: Comparison 2: solid line for one domain, -o- points for 4 and 8 subdomains
Prior to demonstrating the stability and speedup of ALGORITHM 2, we need to confirm
the correctness of the algorithm. For this purpose, we coded a sequential algorithm that
solves the linear system of equations (2.23) over the whole computational domain (see
Figure 3.1) by Gaussian elimination.
The computational results for the two algorithms are illustrated in Figures 3.4
and 3.5 for different parameters. The graphs show that the electric fields $E_z$ obtained by
the two methods at a fixed point are exactly the same.
3.3.2 Stability
As indicated in [29], care must be taken in setting the discretization parameters $\Delta t$
and $h$ for the conventional FDTD method. For the two-dimensional case, the stability
condition for FDTD becomes (3.2). For example, $\Delta t < 1.18 \times 10^{-10}$ when
$h = 1/20$. Because of the unconditional stability of the implicit finite difference scheme
(2.4)-(2.6), it is not necessary for our algorithm to satisfy (3.2) in the interior of
$\Omega$. However, for the discretization of the absorbing boundary condition, an explicit
finite difference scheme is used. According to [17], the stability condition (3.3) must be
satisfied at the outer boundary. This stability condition (3.3) is a sufficient condition.
We have considered several choices of $\Delta t$ for the same value of $h = 1/20$.
With $\lambda = 1.0$ and $N_x = N_y = 41$, we ran our algorithm with
$\Delta t = 3.0 \times 10^{-10}$, which violates (3.3). There is no evidence of instability
for time steps $N_{\max} \le 5000$. This result is plotted in Figure 3.6 for the values of
$E_z$ at the point (6, 21) and time steps between 1 and 600. If
$\Delta t = 5.0 \times 10^{-10}$, the result is given in Figure 3.7, which shows that
instability can be detected at about $n = 70$. This kind of instability can be postponed
by enlarging the outer boundary. Figure 3.8 illustrates the E-field at the same physical
point and with the same parameters as in Figure 3.6 except that $N_x = N_y = 101$. One can
see that the instability is postponed to $n = 200$. To reduce the instability generated by
the discretized absorbing boundary condition, other approximations to (2.10), including an
implicit scheme, will be studied in the future.
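The time-step figures quoted above can be checked numerically. The sketch below assumes that the explicit two-dimensional limit is the standard Yee-scheme bound $\Delta t \le h / (v\sqrt{2})$ and that the boundary condition is governed by the ratio $v \Delta t / h$; both forms are assumptions here, since the displayed conditions are not reproduced in this text. The constants follow the values set in the Appendix program.

```python
import math

# Free-space constants as set in the Appendix program.
MU = 4.0 * math.pi * 1.0e-7
EPS = (1.0 / (36.0 * math.pi)) * 1.0e-9
V = 1.0 / math.sqrt(MU * EPS)  # wave speed, about 3.0e8 m/s


def explicit_dt_limit_2d(h, v=V):
    """Assumed form of the explicit 2-D FDTD limit: dt <= h / (v * sqrt(2))."""
    return h / (v * math.sqrt(2.0))


def courant_number(dt, h, v=V):
    """Ratio v*dt/h; an assumed reading of what the boundary condition bounds by 1."""
    return v * dt / h


h = 1.0 / 20.0
limit = explicit_dt_limit_2d(h)
ratios = {dt: courant_number(dt, h) for dt in (3.0e-10, 5.0e-10)}
```

Under these assumptions `limit` reproduces the quoted bound of about 1.18e-10 for h = 1/20, and both time steps used in the experiments give a ratio well above 1.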
Figure 3.6: Stability 1: h=1/20, Nx=Ny=41, dt=3.e-10; no evidence of instability
Figure 3.7: Stability 2: h=1/20, Nx=Ny=41, dt=5.e-10; instability is detected at n=70
Figure 3.8: Stability 3: h=1/50, Nx=Ny=101, dt=5.0e-10; instability is postponed to n=200
3.3.3 Speedup
Autotasking provides a mechanism for automatic multitasking on CRAY systems. Multi-
tasking is used to decrease wall-clock execution time for a program relative to that required
for single-processor execution. A multitasked program generally has the same amount of
work for the processors to perform as does the corresponding unitasked program. However,
when the work is spread across many processors, the wall-clock time required to complete
the work should be less.
Notice that multitasking decreases only wall-clock time. In fact, multitasking generally
increases CPU time because of extra code required for starting, stopping and synchronizing
processors.
Theoretically, speedup is defined to be the ratio of the execution time of the best
sequential algorithm to that of the parallel algorithm. However, it is not a trivial task to
determine an optimal sequential algorithm for a particular application and computer
architecture.
Following [15], speedup here is measured by comparing the uniprocessor and multiprocessor
implementations of the same algorithm. On a dedicated CRAY system, the speedup can be
calculated in the following manner:
$$\text{Speedup} = \frac{\text{wall-clock execution time (single-processor)}}{\text{wall-clock execution time (multitasked)}}$$
With $N$ CPUs, a speedup as close as possible to $N$ is desired.
By using this definition, we obtain the following two figures (Figures 3.9 and 3.10) that
show the speedups of our algorithm implemented on a CRAY C98 system for various domain
sizes at four subdomains and at eight subdomains.
Figure 3.9: Speedup: 4 subdomain case
Figure 3.10: Speedup: 8 subdomain case
[Table columns: Case | $N_x (= N_y)$ | Wall-clock time (seconds): Sequential, Parallel | Speedup]
It should be noted from Figures 3.9 and 3.10 that the speedup decreases as the domain
size increases. To analyze this situation, we need to introduce Amdahl's law [5]. The
formulation of Amdahl's law for multitasking is shown in the following equation:
$$S_m = \frac{1}{f_s + \dfrac{f_p}{N}} \tag{3.4}$$
where
$S_m$ - Maximum expected speedup from multitasking;
$N$ - Number of processors available for parallel execution;
$f_p$ - Fraction of a program that can be executed in parallel;
$f_s$ - Fraction of a program that is sequential, and $f_s + f_p = 1$.
The speedup from multitasking, $S_m$, is in terms of wall-clock time, not CPU time.
If half of the execution time of a program can be spent in parallel execution (50%
parallelism) on an eight-processor CRAY C98 system, the theoretical potential speedup would
be 1.78. If 95% of the program executes in parallel, the theoretical speedup would be 5.93.
Therefore, based on Amdahl's law, it is clear that significant speedup cannot be obtained
unless a significant portion of the execution is done in parallel. To obtain speedup equal
to the physical number of processors requires the execution program to use all processors
effectively 100% of the time with no overhead. Because this is virtually impossible, perfor-
mance is dominated by the fraction of the time spent executing serial code.
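The two figures quoted above follow directly from the usual statement of Amdahl's law, $S = 1 / (f_s + f_p / N)$. A minimal sketch, with function and variable names chosen here purely for illustration:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Maximum expected speedup when parallel_fraction of the run can use all processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


half = amdahl_speedup(0.50, 8)    # 50% parallelism on eight processors
mostly = amdahl_speedup(0.95, 8)  # 95% parallelism on eight processors
ideal = amdahl_speedup(1.0, 8)    # fully parallel, no overhead
```

Rounded to two decimals, `half` and `mostly` give the 1.78 and 5.93 stated in the text, and only the fully parallel case reaches the processor count of 8.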
However, Amdahl's law assumes that the parallel portion $f_p$ can be allocated evenly to
the available processors, that is, that the workload is perfectly balanced among
processors. If this cannot be achieved, some balance information should be introduced
into Amdahl's law. Here we propose the following
Definition. The balance factor $a$ for a given concurrent processing of a multitasked
program implemented on a computer system with $N$ processors is defined by
$$a = \frac{\text{Total size of all subtasks}}{N \times (\text{size of the largest subtask})}.$$
Since there may be several concurrent processings in a multitasked program, we may have
several different balance factors. If these balance factors are equal, then we can describe a
modified Amdahl's law as
$$S_m = \frac{1}{f_s + \dfrac{f_p}{aN}}.$$
It is easy to see that $1/N \le a \le 1$. In the perfectly balanced case, we have $a = 1$ and
the above formula degenerates to the original Amdahl's law (3.4).
Once a multitasked program is given, the sequential portion $f_s$ and the parallel
portion $f_p$ are generally fixed. Therefore, on a dedicated system with $N$ processors, the
speedup is mainly determined by the balance factor $a$.
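The balance factor is simple to compute from a list of subtask sizes. In the sketch below, the individual sizes `[400, 200, 200, 400]` are an assumption chosen to match the smallest (200) and largest (400) subtasks reported for case 1 of the four-subdomain runs, and the balanced form of Amdahl's law is likewise an assumed reading: it supposes that the parallel phase lasts as long as the largest subtask, which gives $1 / (f_s + f_p / (aN))$.

```python
def balance_factor(subtask_sizes, processors):
    """Total size of all subtasks divided by N times the size of the largest one."""
    return sum(subtask_sizes) / (processors * max(subtask_sizes))


def balanced_amdahl_speedup(parallel_fraction, processors, a):
    """Assumed modified Amdahl's law: the parallel phase is paced by the largest subtask."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / (a * processors))


# Hypothetical size lists consistent with the reported smallest/largest values.
a_uneven = balance_factor([400, 200, 200, 400], 4)
a_even = balance_factor([400] * 8, 8)

# With the fractions held fixed, a better balance gives a larger speedup.
slower = balanced_amdahl_speedup(0.95, 4, a_uneven)
faster = balanced_amdahl_speedup(0.95, 4, 1.0)
```

Here `a_uneven` evaluates to 0.75 and `a_even` to 1.0, the two values reported for the first cases of the four- and eight-subdomain runs.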
Figure 3.11: Balance factor: 4 subdomain case

Case   Nx (= Ny)   Size of subtask        a
                   smallest   largest
1      40          200        400         0.750

Figure 3.12: Balance factor: 8 subdomain case

Case   Nx (= Ny)   Size of subtask        a
                   smallest   largest
1      60          400        400         1.000
Figures 3.11 and 3.12 display the balance factors corresponding to the cases in Figures 3.9
and 3.10. One can see from these two figures that, as the size of the computational
domain increases, the balance factor decreases. This explains the situation illustrated in
Figures 3.9 and 3.10.
Figures 3.13 and 3.14 show the speedups graphically with respect to the balance factor $a$.
The speedup is an increasing function of the balance factor. In the perfectly balanced case
(case 1 in Figure 3.12), a speedup of 7.81 is obtained.
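A measured speedup can be turned back into an estimate of the serial fraction. The sketch below uses the Karp-Flatt metric, which is Amdahl's law solved for the serial fraction; applying it to the 7.81 figure assumes that Amdahl's law holds exactly for that run with perfect balance, which is an idealization rather than a claim made in the text.

```python
def experimental_serial_fraction(speedup, processors):
    """Karp-Flatt metric: the serial fraction implied by a measured speedup."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


def amdahl_speedup(serial_fraction, processors):
    """Amdahl's law, used here only to check the round trip."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


# Speedup of 7.81 reported on eight CPUs in the perfectly balanced case.
f_s = experimental_serial_fraction(7.81, 8)
recovered = amdahl_speedup(f_s, 8)
```

Under that idealization the implied serial fraction is roughly a third of one percent, which indicates how little sequential work is compatible with a speedup that close to 8.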
Figure 3.13: Speedup as a function of balance factor: 4 subdomain case
Figure 3.14: Speedup as a function of balance factor: 8 subdomain case
3.4 Summary
Numerical results concerning the correctness, stability and speedup of the domain
decomposition algorithm given in Chapter 2 are presented in this chapter.
For the programming of our algorithm, the subroutines listed in §3.2 are important.
The construction of the coupling matrices $A_i$, $i = 1, 2, \ldots, N$, in (2.27) is
complicated. The method described in the last part of §3.2 plays a key role in the use of
autotasking techniques on CRAY systems. The Fortran program is listed in the Appendix.
After illustrating the correctness and stability of the algorithm, we focus on its
speedup. To discuss the performance of our parallel programs, the concept of a balance
factor is incorporated into Amdahl's law. The results show that, once the sequential and
parallel portions are fixed for a given multitasked program, the speedup is an increasing
function of the balance factor. On a dedicated CRAY C98 system with eight CPUs, a
speedup of 7.81, which is very close to the physical number of processors, is obtained in the
perfectly balanced case.
Chapter 4
Concluding Remarks
Based upon an implicit finite difference discretization of the two-dimensional Maxwell's
equations, a domain decomposition algorithm has been developed to solve the problem on
a multiprocessor supercomputer. The domain decomposition technique reduces the solution
of a large linear system to that of several independent smaller subsystems. This is a
typical example of large-granularity parallelism and may be implemented efficiently on
CRAY C90 systems. Numerical results have shown the correctness and the stability of the
algorithm.
A concept of balance factor is proposed to analyze the speedups for different sizes of the
test problem. In the case where the sequential and parallel portions are fixed for a given
multitasked program, the speedup of the algorithm is mainly determined by the balance
factor. A speedup close to the physical number of available processors for the execution of
our program can be obtained in the perfectly balanced case.
It is our main interest to solve electromagnetic scattering problems on CRAY systems.
To achieve the efficient use of multiprocessor architectures, one has to divide the problem
into several independent parts. So the decomposition of the solution domain is an important
strategy. Domain decomposition techniques can also be applied to complex problems. In
such cases, we may decompose the computational domain into several regular subdomains
and use different discretization schemes and different solution methods for each subdomain.
Thus it is possible to combine the advantages of finite difference, finite element and
spectral methods and provide opportunities for devising more efficient and accurate
algorithms.
For three-dimensional problems, we will not have a linear system similar to (2.23) or
(2.27) because $E_x$, $E_y$ and $E_z$ will all be involved in the system. Splitting the three
spatial directions, or the idea of the ADI (alternating direction implicit) method, may be
helpful in solving the three-dimensional problem.
As mentioned in Section 3.4, the formulation of the coupling matrices $A_i$ in (2.27) is
tedious. To avoid this, we may use an explicit finite difference scheme on the interfaces
and an implicit scheme in each subdomain. Once the computation at the $(n-1)$-th time step is
completed, we can apply the explicit scheme with a smaller time step size $\Delta t$ several
times to get the numerical solutions on the interfaces at the $n$-th time step. These
solutions are taken to be the boundary values of the subdomains. Then the problems
associated with different subdomains can be solved independently. This kind of domain
decomposition approach is easy to implement on CRAY C90 systems even for three-dimensional
problems. More research should be done on the stability, accuracy and effectiveness of the
domain decomposition method when it is applied to three-dimensional problems.
Bibliography
[1] P.E. Bjørstad and O.B. Widlund, Iterative methods for the solution of elliptic problems
on regions partitioned into substructures, SIAM J. Numer. Anal., 23, 1986, 1097-1120.
[2] J.H. Bramble, J.E. Pasciak and A.H. Schatz, The construction of preconditioners for
elliptic problems by substructuring I, Math. Comp., 47, 103-134, 1986.
[3] V.J.Brankovic et al., An efficient two-dimensional graded mesh finite-difference time-
domain algorithm for shielded or open waveguide structures, IEEE, MTT-40, 2272-
2277, 1992.
[4] A.C.Cangellaris et al., Analysis of the numerical error caused by the stair-stepped
approximation of a conducting boundary in FDTD simulations of electromagnetic
phenomena, IEEE AP-39, 1518-1525, 1991.
[5] CF77 Volume 4: Parallel Processing Guide, SG-3074 5.0, Cray Research, Inc.
[6] T.F. Chan, Analysis of preconditioners for domain decomposition, SIAM J. Numer.
Anal., 24, 382-390, 1987.
[7] T.F. Chan, R. Glowinski, J. Périaux and O.B. Widlund (Eds), Third International
Symposium on Domain Decomposition Methods for Partial Differential Equations, SIAM,
Philadelphia, 1990.
[8] T.F. Chan and D.E. Keyes, Interface preconditionings for domain-decomposed
convection-diffusion operators, In [7], 245-262.
[9] T.F. Chan and D.C. Resasco, A domain-decomposed fast Poisson solver on a rectangle,
SIAM J. Sci. Statist. Comput., 8, s14-s26, 1987.
[10] D.Colton and R.Kress, Integral equation methods in scattering theory, John Wiley &
Sons, Inc., NY, 1983.
[11] B.Engquist and A.Majda, Absorbing boundary conditions for the numerical simulation
of waves, Math. Comp., Vol. 31, 629-651, 1977.
[12] M.Fusco, FDTD algorithm in curvilinear coordinates, IEEE Trans. Antennas Propa-
gat., vol. 38, 76-89, 1990.
[13] M.A.Fusco et al., A three-dimensional FDTD algorithm in curvilinear coordinates,
IEEE AP-39, 1463-1471, 1991.
[14] G.H. Golub and D.F. Mayers, The use of pre-conditioning over irregular regions, Lecture
at Sixth International Conference on Computing Methods in Applied Sciences and
Engineering, Versailles, France, December 1983.
[15] W.D. Gropp and D.E. Keyes, Complexity of parallel implementation of domain de-
composition techniques for elliptic partial differential equations, SIAM J. Sci. Statist.
Comput., 9, 1988, 312-326.
[16] B.Gustafsson and J.Oliger, Stable boundary approximations for implicit time
discretizations for gas dynamics, SIAM J. Sci. Statist. Comput., v.3, 1982, 408-421.
[17] R.L.Higdon, Numerical absorbing boundary conditions for the wave equation, Math.
Comput., v.49, 65-91, 1987.
[18] R.L.Higdon, Absorbing boundary conditions for difference approximations to the multi-
dimensional wave equation, Math.Comput., v.47, No.176,437-459, 1986.
[19] R.Holland et al., Finite-difference analysis of EMP coupling to lossy dielectric struc-
tures, IEEE EMC-22, 203-209, 1983.
[20] T.G. Jurgens, A.Taflove, K.Umashankar and T.G. Moore, Finite-difference time-
domain modeling of curved surfaces, IEEE AP-40, 357-366, 1992.
[21] N.Madsen and R.Ziolkowski, Numerical solution of Maxwell's equations in time domain
using irregular nonorthogonal grids, Wave Motion, vol. 10, 583-596, 1988.
[22] K.Mei et al., Superabsorption - A method to improve absorbing boundary conditions,
IEEE AP-40, 1001-1010, 1992.
[23] P.Monk and E.Suli, A convergence analysis of Yee's scheme on nonuniform grids, SIAM
J. Numer. Anal., Vol. 31, No. 2, 393-412, 1994.
[24] T.G.Moore et al., Theory and application of radiation boundary operators, IEEE AP-
36, 1797-1812, 1988.
[25] G.Mur, Absorbing boundary conditions for the finite-difference approximation of the
time-domain electromagnetic-field equations, IEEE Trans. on Elect. Compatibility,
EMC-23, 377-382, 1981.
[26] A.T.Perlik, T.Opsahl and A.Taflove, Predicting scattering of electromagnetic fields
using FDTD on a connection machine, IEEE Trans. Magn., vol. 25, 2910-2912, 1989.
[27] D.B.Shorthouse et al., The incorporation of static field solutions into the finite difference
time domain algorithm, IEEE MTT-40, 986-994, 1992.
[28] J.A.Stratton, Electromagnetic Theory, McGraw-Hill Book Company, NY, 1941.
[29] A.Taflove and M.E.Brodwin, Numerical solution of steady-state electromagnetic scat-
tering problems using the time-dependent Maxwell's equations, IEEE AP-23, 623-630,
1975.
[30] A.Taflove et al., Detailed FD-TD analysis of electromagnetic fields penetrating narrow
slots and lapped joints in thick conducting screens, IEEE AP-36, 247-257, 1988.
[31] A.Taflove and K.R.Umashankar, A hybrid moment method/finite-difference time-
domain approach to electromagnetic coupling and aperture penetration into complex
geometries, IEEE Trans. Antennas Propagat., vol. AP-30, 617-627, 1982.
[32] A.Taflove and K.R.Umashankar, Radar cross section of general three-dimensional
scatterers, IEEE Trans. Electromagn. Compat., Vol. EMC-25, 433-440, 1983.
[33] A.Taflove and K.R.Umashankar, The finite-difference time-domain (FD-TD) method
for electromagnetic scattering and interaction problems, J. Electromag. Waves Appl.,
Vol. 1, 243-267, 1987.
[34] A.Taflove, K.R.Umashankar and T.G.Jurgens, Validation of FD-TD modeling of the
radar cross section of three-dimensional scatterers, IEEE Trans. Antennas Propagat.,
Vol. AP-33, 662-666, 1985.
[35] A.Taflove, Application of the finite-difference time-domain method to sinusoidal steady-
state electromagnetic penetration problems, IEEE Trans. Electromagn. Compat., vol.
EMC-22, 191-202, 1980.
[36] P.A.Tirkas et al., Modeling of thin dielectric structures using the finite-difference time-
domain technique, IEEE AP-39, 1338-1344, 1991.
[37] P.A.Tirkas et al., Finite-difference time-domain method for antenna radiation, IEEE
AP-40, 334-340, 1992.
[38] P.A.Tirkas et al., Higher order absorbing boundary conditions for the finite-difference
time-domain method, IEEE AP-40, 1215-1222, 1992.
[39] K.R.Umashankar and A.Taflove, A novel method to analyze electromagnetic scattering
of complex objects, IEEE Trans. Electromagn. Compat., vol. EMC-24, 397-405, 1982.
[40] K.R.Umashankar, A.Taflove and B.Beker, Calculation and experimental validation of
induced currents on coupled wires in an arbitrary shaped cavity, IEEE Trans. Antennas
Propagat., Vol. AP-35, 1248-1257, 1987.
[41] Chen Wu et al., Accurate characterization of planar printed antennas using finite-
difference time-domain method, IEEE AP-40, 526-534, 1992.
[42] K.S.Yee, Numerical solution of initial boundary value problems involving Maxwell's
equations on isotropic media, IEEE AP-14, 302-307, 1966.
APPENDIX. FORTRAN PROGRAMS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C                                                           C
C This program is used to solve Maxwell's equations in 2-D  C
C by implicit FDTD method with domain decomposition. The    C
C scattering region is decomposed into FOUR subdomains, and C
C the subproblems in the subdomains are solved in parallel  C
C                                                           C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C                                                           C
C NOTE: (1) use first-order difference approximation        C
C       (2) use the Engquist-Majda's 1st-order B.C.         C
C       (3) use direct method to solve linear system        C
C                                                           C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
program maxws4p
      parameter(n=4)
C
c ----- CASE 1 -----
C
      parameter(n1x=41,n1y=11,n2x=11,n2y=21)
      parameter(n3x=11,n3y=n2y,n4x=n1x,n4y=11)
      parameter(mx=n1x,my=n2y)
C
c ----- CASE 2 -----
C
c     parameter(n1x=61,n1y=21,n2x=21,n2y=21)
c     parameter(n3x=21,n3y=n2y,n4x=n1x,n4y=21)
c     parameter(mx=n1x,my=n2y)
C
c ----- CASE 3 -----
C
c     parameter(n1x=81,n1y=31,n2x=31,n2y=21)
c     parameter(n3x=31,n3y=n2y,n4x=n1x,n4y=31)
c     parameter(mx=n1x,my=n1y)
C
c ----- CASE 4 -----
C
c     parameter(n1x=101,n1y=41,n2x=41,n2y=21)
c     parameter(n3x=41,n3y=n2y,n4x=n1x,n4y=41)
c     parameter(mx=n1x,my=n1y)
C
c ----- CASE 5 -----
C
c     parameter(n1x=121,n1y=51,n2x=51,n2y=21)
c     parameter(n3x=51,n3y=n2y,n4x=n1x,n4y=51)
c     parameter(mx=n1x,my=n1y)
C
c ----- CASE 6 -----
C
c     parameter(n1x=141,n1y=61,n2x=61,n2y=21)
c     parameter(n3x=61,n3y=n2y,n4x=n1x,n4y=61)
c     parameter(mx=n1x,my=n1y)
C
parameter(m=(nlx-l)*(nly-1)) par~eter(mi=2*(n2x+n3x)-8,m2=nlx) par~eter(mll=(nlx-2)*~nly-2),m22~(n2~-2)*(n2y-2)) parameter(m33r(n3~-2)*(n3y-2),m4rlr(n4~-2)*(n4y-2))
parameter(ixa2x-5,jyll) dimension lngth(n) ,nx(n) ,ny(n) ,n-i(n)
C
c     mx is the largest among n1x, n2x, n3x, and n4x.
c     my is the largest among n1y, n2y, n3y, and n4y.
c     mll is the largest among m11, m22, m33, and m44.
c     m1 is the total number of mesh points on the interface.
C
dimension hxn(u,my ,n) ,hyn(u,my ,n) ,ezn(ax,my ,n) dimension hyin(n2x,n),ezin(n2x,n) dimension a(ml1 ,mll ,n) ,a5(mll ,ml ,n) ,y(ml ,mll,n) dimension c(ml,ml,n),a55(ml,ml),bS(ml),u5~ml) dimension b(mll.n),u(mil,n) dimension z(ml1 ,n) integer ipvt(ml1.n) dimension r(20) real mu,eps,wl,pi
C
c ------- set up constants ------------ C
      pi=4.0*atan(1.0)
      mu=4.0*pi*(1.0e-7)
      eps=(1.0/(36.0*pi))*(1.0e-9)
      v=1.0/sqrt(mu*eps)
      write(*,*) 'speed v=',v
      write(*,*)'Please input the number of time levels nmax=?'
      read(*,*) nmax
      write(*,*)'Please input time step size dt=?'
      read(*,*) dt
      wl=1.0
      h=wl/10.0
c     h=wl/25.0
      write(*,*)'dt=',dt
      dx=h
      dy=dx
      tw=20.0*dt
      tx=dt/dx
      ty=dt/dy
      txv=v*tx
      tyv=v*ty
C c set up coefficient matrices
&PP$ CICALL do 15 i=l,n
call matrix(a(l,l,i),mll,nx(i).ny(i),d) 15 continue
C c set up A55 matrix C c ---------- C11 C
1-0 do 18 i=l,n
call matrixO(a55,ml,n-i(i),l,ll,d) la11
18 continue C c -------- A15 C
lr=(nly-3)*(nlx-2) lc-0 do 2040 i=l,n2x-2
aS(lr+i,lc+i 2040 continue
lr=lr+(nlx-2)-(n3x-2 lc=lc+(n2x-2) do 2050 i=l,n3x-2
aS(lr+i,lc+i 2050 continue C -
l i m o do 2060 i=l,n2x-2
aS(lr+i,lc+i,2)=-d 2060 continue
lr=(n2y-3)*(n2x-2) lc=(n2~-2)+(n3~-2) do 2070 i=l.n2x-2
a5(ir+i ,lc+i ,2)=-d 2070 continue
C lr=O lc=n2x-2 do 2080 i=l,n3x-2
aS(lr+i,lc+i,J)=-d 2080 continue
lr=(n3y-3)*(n3x-2) lc=2*(n2~-2)+(113x-2) do 2090 i=l,n3x-2
aS(lr+i,lc+i,3)=-d 2090 continue
C c --------- A45 C
lrlO lc=(n2~-2)+(n3~-2) do 2100 i=l,nZx-2
aS(lr+i,lc+i,4)=-d 2100 continue
lr=lr+(n4~-2)-(n3~-2) lc=lc+(nZx-2) do 2110 i=l.n3x-2
a5(ir+i ,lc+i ,4)=-d 2110 continue
C c find the inverse matricer
EFPPS CICALL do 2115 i4.n
call sgefa(a(l,l,i) ,mil ,lngh(i) ,ipvt(l,i) ,info) 2115 continue
:FPPS CICALL do 2116 i=l,n
call rgedi(a(l,l,i),mll,lngth(i) ,ipvt(l,i), det ,z(l ,i) ,0l)
2116 continue
EFPPS CICALL do 2118 i-1,n
call matmut2(y(1,lpi) ,a6(l,l,i) ,c(l,l,i) ,mll.ln@h(i)+l) 2118 continue C
do 2119 i=l,n call sum(a55,c(l,l,i),ml)
21 19 continue C c find the inverse of C (stored still in A55) C
call sgefa(a55,ml ,mi ,ipvt ,info) call agedi(a55,ml,ml,ipvt,det,z,Ol)
C c start to compute Hx,Hy ,Ez C
C c --- set up outer boundary values ----
r24.0-rl do 1100 i=l,nlx
ezn(i,l.l)=r2*ezn(i,l,l)+rl*ezn(i,2,1) 1100 continue
sl=ezn(l,l,l) s2=ezn(nlx 1,l) do 1110 j=i,nly
ezn(l,j,l)=r2*ezn(l,j,l)+rl*ezn(2, j,l) ezn(nlx,j,l)=r2*ezn(n1x,j,l)+rl*ezn(nlx-l,j,1)
1110 continue ezn(l,l.l)=0.5*(sl+ezn(l,l,l)) ezn(nlx,l,l)rO.5*(s2+ezn(nlx,1,1)) do 1120 j=l,n2y
ezn(l,j,2)=r2*ezn(i,j,2)+rl*ezn(2,j.2) ezn(n3x,j,3)=r2*ezn(n3x,j,3)+rl+ezn(n3~-l,j,3)
1120 continue do 1130 j=l,n4y
ezn(l,j,4)=r2*ezn(l,j,4)+rl*ezn(2,j,4) ezn(n4x,j,4)=r2*ezn(n4x,j,4)+rl*ezn(n4~-i,j,4)
1130 continue sl=ezn(l,n4y,4) s2=ezn(n4x,n4y,4) do 1140 i=l,n4x
ezn(i,n4y,4)=r2*ezn(i,n4y,4)+rl*ezn(i,n4y-1,4) 1140 continue
ezn(l,n4y,4)r0.5*(sl+ezn(l,n4y.4)) ezn(n4x,n4y,4)=0.5*(.2+ezn(n4x,n4y,4))
C ezin(l,l)=ezn(l,niy,l) ezin(n3~,2)=ezn(nlx,nly.l) ezin(l,3)=ezn(l,l,4) ezin(n3~,4)=ezn(n4~,1,4)
C C --- set up values related to the last time level only --- C
CFPPS
1142 C c form C c ---- C
1000 C c ---- C
1010 C c ---- C
1020 C c ---- C
1030 C c ----
interface information related to the last time level only
interface 12 (1) ---- do 1000 i=2,n2x-1
s=ezin(i,l)+r(S)*(hyin(i,l)-hyinci-1,l)) s=s-r(S)*(hxn(i,1,2)-hxn(i,nly-l,1)) ezin(i,l)=s
continue
interface 24 (3) ---- do 1010 i=2,n2x-1
s=ezin(i,3)+r(S)*(hyin(i,3)-hyin(i-1,3)) s=s-r(S)*(hm(i.1,4)-hxn(i,n2y-1,2)) ezin(i ,3)=s
continue
interface 13 (2) ---- nx13mlx-n3x do 1020 i=2,n3x-1
s=ezin(i,2)+r(9)*(hyin(i,2)-hyinci-1,2)) s=s-r(S)*(hxn(i,i,3)-hxn(nxl3+i,nly-l ,I)) ezin(i,2)=s
continue
interface 34 (4) ---- nx43m4x-n3x do 1030 i=Z,n3x-l
s=ezin(i,4)+r(9)*(hyin(i,4)-hyin(i-1,4)) s=s-r(9)*(hxn(nx43+i,l,4)-hxn(i,n3y-l,3)) ezin(i,4)=s
continue
set up interior boundary values (scatterer surface) ----
subdomain 1 ---- s=nn*dt/tv s=s*s s=exp(-s) s=1.0-s sl=2.O*pi/vl cndt=v*nn*dt do 1040 i=n2x,nlx-n3x+l
ii=i-n2x-(n2y-1)/2 sZ=sl*(cndt-(ii)+dx) ezn(i,nly,l)= - srsin(s2)
continua
subdomain 2 ---- s=ezn(n2x,nly,l) do 1060 j=l,n2y
ezn(n2x, j ,2)=s continue
subdomain 3 ---- s=ezn(nlx-n3x+l,nly,1) do 1070 j=l,n3y
ezn(1, j ,3)=s continue
subdomain 4 ---- do 1080 i=n2x,n4x-n3x+l
ezn(i,1,4)=ezn(i,nly,l) continue
c set up RHS b5 on interfaces C
1 =o do 1082 i=l,n
call rhsl(b5,ezin(l,i),ml,n2x,n-i(i),l,r) l=l+(n-i(i)-2)
1082 continue
&PPS CICALL do 301 1 i=l ,n
call rhs(b(1,i) ,ezn(l,l,i) ,mll,mx,my,nx(i) ,ny(i) ,r) 3011 continue
C do 3012 i=l,n
call matvect(y(1 ,l ,i) ,b(l,i) ,bS,mll ,ml ,nx(i) ,ny(i)) 3012 continue
C c solve the linear system on the interfaces C
do 510 j=l,ml s=s+a55(i,j)*bS(j)
510 continue u5(i)=s
500 continue C C ---- obtain Ez and Hy on the four interfaces C
l=O call disint(u5,ezin(l,l),hyin(l,l),n2x,l,r) l=l+(n2x-2) call disint(u5,ezin(l,2),hyin(l,2),n3x,l,r) l=l+(n3x-2) call disint (u5 ,ezin(l,3) .hyin(l,3) ,n2x,l ,r) l=l+(n2xy2) call disint(u5,ezin(i,4),hyin(l,4),n3r ..r)
C C ---- solve independent linear systems in the four subdomains P
CFPPS CICALL do 512 i=l,n
call multi(a(l,l,i),a5(1,l,i),u5,m1l,ml, nx(i),ny(i),b(l,i),u(l,i))
512 continue C C ---- obtain Ez,Hx and Hy over subdomains independently C
C c artifical boundary values C
do 2000 i=l,n2x ezn(i,nly,l)=ezin(i,l) ezn(i,l,2)=ezin(i,l) ezn(i,n2y,2)=ezin(i,3) ezn(i,l,4)=ezin(i,3)
2000 continue nxl3lnlx-n3x do 2010 i=l,n3x
ezn(nxl3+i,nly,l)=ezin(i,2) ezn(i,l,3)=ezin(i,2) ezn(i,n3~,3)=ezin(i,4) ezn(nx13+i,l,4)=ezin(i,4)
2010 continue c c ............................................ C c In each subdomain, distribute u into Ezn, then get the c values of Hxn and Hyn at each mesh point.
&PP$ CICALL do 2012 i-1,n
call disval(u(1,i) ,ezn(l,l,i) ,hxn(l,l.i), , hyn(l,l,i) ,u,my,nx(i) ,ny(i),r) 2012 continue
C C ---- record the solutions on the nn-th time level ---- C
      write(7,*) nn
      write(8,*) ezn(ix,jy,2)
      write(*,*)'Ez2n(',ix,',',jy,') =====>',ezn(ix,jy,2)
      if(nn.lt.nmax) goto 100
      stop
      end
C ............................................. C
subroutine matrixO(a,ml.nx,l,ll,d) dimension a(m1,ml)
10 continue ll=lr return
end C ............................................. C
subroutine matrix(a,m,nx,ny,dd) dimension a(.,*)
li=lr+i if(j.eq.1) goto 40 a(1r lr-m)=-d cont h u e a(lr,lr)=d4 if(i.gt.1) a(lr,lr-I)=-d if(i.1t.u) a(lr,lr+i)=-d if(j.0q.m~) goto 50 a(lr,lr+u)=-d continue
C
      subroutine matmut1(a,x,y,n,m,m1)
      dimension a(n,n),x(n,m1),y(m1,n)
C
      do 10 i=1,m1
         do 20 j=1,m
            s=0.0
            do 30 k=1,m
               s=s+x(k,i)*a(k,j)
 30         continue
            y(i,j)=s
 20      continue
 10   continue
      return
      end
C ............................................. C
      subroutine matmut2(y,x,c,n,m,m1)
      dimension y(m1,n),x(n,m1),c(m1,m1)
C
      do 10 i=1,m1
         do 20 j=1,m1
            s=0.0
            do 30 k=1,m
               s=s+y(i,k)*x(k,j)
 30         continue
            c(i,j)=s
 20      continue
 10   continue
      return
      end
c ............................................ C
subroutine sum(a,c,ml) dimension a(ml .ml) , c h i ,ml)
20 continue 10 continue
return end
C ............................................ C
subroutine disint(uS,ez,hy,nx,l,r' dimension u5(*) ,ez(*) ,hy(*) ,r(*)
li=l+i-1 ez(i)=u5(li)
10 continue
C
      subroutine multi(a,x,u5,m,m1,nx,ny,b,u)
      dimension a(m,m),x(m,m1),u5(m1),b(m),u(m)
C
      l=(ny-2)*(nx-2)
      do 10 i=1,l
         s=0.0
         do 20 j=1,m1
            s=s+x(i,j)*u5(j)
 20      continue
         b(i)=b(i)-s
 10   continue
      do 30 i=1,l
         s=0.0
         do 40 j=1,l
            s=s+a(i,j)*b(j)
 40      continue
         u(i)=s
 30   continue
      return
      end
C ............................................ C
      subroutine rhs(b,ezn,mll,m,n,nx,ny,r)
      dimension b(mll),ezn(m,n),r(*)
C
      d=r(11)
      mx=nx-2
      my=ny-2
C
c     set up RHS
C
      do 60 j=1,my
         k=(j-1)*mx
         do 70 i=1,mx
            ki=k+i
            b(ki)=ezn(i+1,j+1)
            if(i.eq.1) b(ki)=b(ki)+d*ezn(i,j+1)
            if(i.eq.mx) b(ki)=b(ki)+d*ezn(i+2,j+1)
            if(j.eq.1) b(ki)=b(ki)+d*ezn(i+1,j)
            if(j.eq.my) b(ki)=b(ki)+d*ezn(i+1,j+2)
 70      continue
 60   continue
      return
      end
C ........................................... C
subroutine rhsl(b5,ez,ml,n,nx,l,r) dimension b5(ri) ,ex(n) ,r(*)
C
C subroutine disval(b,ez,hx,hy,m,n,nx,ny,r) dimension b(*) ,ez(m,n) ,hx(m,n) ,hy(m,n) dimension r(*)
C
k=(j-l)*u do 90 i=l u
k:=k+i ez(i+l,j+l)=b(ki)
90 continue 80 continue
      do 100 j=1,ny-1
         do 110 i=1,nx
            hx(i,j)=-r(6)*(ez(i,j+1)-ez(i,j))+hx(i,j)
 110     continue
 100  continue
      do 120 j=1,ny
         do 130 i=1,nx-1
            hy(i,j)=r(6)*(ez(i+1,j)-ez(i,j))+hy(i,j)
 130     continue
 120  continue
      return
      end
C .......................................... C
      subroutine matvect(y,b,b5,m,m1,nx,ny)
      dimension y(m1,m),b(m),b5(m1)
C
      l=(ny-2)*(nx-2)
      do 10 i=1,m1
         s=0.0
         do 20 j=1,l
            s=s+y(i,j)*b(j)
 20      continue
         b5(i)=b5(i)-s
 10   continue
      return
      end
C .......................................... C
      subroutine formul(hx,hy,ez,m,n,nx,ny,r9)
      dimension hx(m,n),hy(m,n),ez(m,n)
C
      do 10 j=2,ny-1
         do 20 i=2,nx-1
            s=r9*(hy(i,j)-hy(i-1,j))
            s=s-r9*(hx(i,j)-hx(i,j-1))
            ez(i,j)=ez(i,j)+s
 20      continue
 10   continue
      return
      end
C This program is used to solve Maxwell's equations in 2-D li C
C by implicit FDTD method with domain decomposition. The C C scattering region i 6 decomposed into EIGHT subdomains, and C C the subproblems in the subdomains are solved in parallel C - (i
C IOTE: (1) use first-order difference approximation C (2) use the Engquist-llajda's 1st-order B.C. C (3) use direct method to solve linear system P
program maxws8p ,- - C ---- n is the required SCPUs ----
CASE 1 -----
CASE 2 -----
CASE 3 -----
CASE 4 -----
CASE 5 -----
c ----- CASE 6 ----- C
dimension hxn(u,my ,n) ,hyn(u,my ,n) ,ezn(u,my ,n) dimension hxin(my,n),hyin(u,n),ezin(u,n)
dimension a(m,m,n) ,a9(m,ml ,n) ,y(ml ,m,n) dimension c(m1 ,ml ,n) ,a99(ml,ml) ,b9(ml) ,u9(ml) dimension b(m,n),u(m,n) dimension r(20) dimension z(m,n) integer ipvt(m,n) real mu, eps ,sl
      open(unit=7,file='n.m',status='unknown')
      open(unit=8,file='ez.m',status='unknown')
C c ------- set up constants ------------
      pi=4.0*atan(1.0)
      mu=4.0*pi*(1.0e-7)
      eps=(1.0/(36.0*pi))*(1.0e-9)
      v=1.0/sqrt(mu*eps)
      write(*,*) 'speed v=',v
      write(*,*) 'Please input the number of time step nmax=?'
      read(*,*) nmax
      write(*,*)'Please input time step size dt=?'
      read(*,*) dt
      dx=h
      dy=dx
      tw=20.0*dt
      tx=dt/dx
      ty=dt/dy
      txv=v*tx
      tyv=v*ty
      r(1)=txv
      r(2)=tyv
      r(6)=tx/mu
      r(9)=tx/eps
      r(10)=ty/eps
      r(11)=r(6)*r(9)
      d=r(11)
C c set up coefficient C CFPPS CICALL
do 15 i 4 . n call
15 continue
C
c set up A99 matrix C C ---------- Cll
l=O do 18 ir1.n
call l=1l
18 continue
materices
matrix(a(i,i,i),m,nx(i),ny(i),d)
lr= -1 lc=O do 2040 j=l,nly-2
lr=lr+(nlx-2) lc=lc+1 aS(lr,lc,l)=-d
2040 continue lr=lr- nlx-2) lc=lc+lniy-2)
-- -- - lc=lc+1 a9(1r,lc,l)=-d
2042 continue
continue
ir=ir+(kx-2) lc=lc+l a9(lr,lc,3)=-d
2046 continue lr=lr lc=lc+(ntx-2) do 2048 i=l.(n3x-2)
lr=ir+l lc=lc+l a9(lr,lc,3)=-d
2048 continue
do 2066 i=l ,n4;-2 a9(lr+i,lc+i,4)=-d
2050 continue lr=(n4y-3)*(n4x-2) lc=lc+(nlx-2)+(n3x-2) do 2052 i-l,n4x-2
ag(lr+i,lc+i,4)=-d 2052 continue
2054 continue lrr(n6r-3)*(nSx-2) lc=lc+(n3~-2)+(n4~-2) do 2066 i=l,n5x-2
ag(lr+i,lc+i,S)=-d 2056 continue
C
c --------- A69
      lr=0
      lc=(n1y-2)+(n2y-2)+(n1x-2)+(n3x-2)
      do 2058 i=1,(n6x-2)
         a9(lr+i,lc+i,6)=-d
 2058 continue
      lr=-1
      lc=lc+(n4x-2)+(n5x-2)
      do 2060 i=1,(n6x-2)
         lr=lr+(n6x-2)
         lc=lc+1
         a9(lr,lc,6)=-d
 2060 continue
         lc=lc+1
         a9(lr,lc,7)=-d
         a9(lr+n7x-3,lc+n7y-2,7)=-d
 2062 continue
C
      lr=0
      lc=(n1y-2)+(n2y-2)+(n1x-2)+(n3x-2)+(n4x-2)
      do 2064 i=1,(n8x-2)
         a9(lr+i,lc+i,8)=-d
 2064 continue
      lr=-(n8x-2)+1
      lc=lc+(n5x-2)+(n6y-2)
      do 2066 j=1,(n8y-2)
         lr=lr+(n8x-2)
         lc=lc+1
         a9(lr,lc,8)=-d
 2066 continue
C
c find the inverse matrices
C
CFPP$ CNCALL
      do 2115 i=1,n
         call sgefa(a(1,1,i),m,lngth(i),ipvt(1,i),info)
 2115 continue
CFPP$ CNCALL
      do 2116 i=1,n
         call sgedi(a(1,1,i),m,lngth(i),ipvt(1,i),
     *              det,z(1,i),01)
 2116 continue
C
c get Schur complement
C
CFPP$ CNCALL
      do 2117 i=1,n
         call matmut1(a(1,1,i),a9(1,1,i),y(1,1,i),m,lngth(i),m1)
 2117 continue
C
CFPP$ CNCALL
      do 2118 i=1,n
         call matmut2(y(1,1,i),a9(1,1,i),c(1,1,i),m,lngth(i),m1)
 2118 continue
C
      do 2119 i=1,n
         call sum(a99,c(1,1,i),m1)
 2119 continue
C
c find the inverse of C (stored still in A99)
C
      call sgefa(a99,m1,m1,ipvt,info)
      call sgedi(a99,m1,m1,ipvt,det,z,01)
C
c start to compute Hx,Hy,Ez
C
c     write(*,*)'----- START TO COMPUTE FOR THE',nn,
c    *          '-TH TIME LEVEL -----'
C
c --- set up outer boundary values ----
         ezn(i,1,1)=r2*ezn(i,1,1)+r1*ezn(i,2,1)
      continue
      s1=ezn(1,1,1)
      do 1110 j=1,n1y
         ezn(1,j,1)=r2*ezn(1,j,1)+r1*ezn(2,j,1)
 1110 continue
      ezn(1,1,1)=0.5*(s1+ezn(1,1,1))
      do 1120 i=1,n2x
         ezn(i,1,2)=r2*ezn(i,1,2)+r1*ezn(i,2,2)
 1120 continue
      do 1122 i=1,n3x
         ezn(i,1,3)=r2*ezn(i,1,3)+r1*ezn(i,2,3)
 1122 continue
      s1=ezn(n3x,1,3)
         ezn(n3x,j,3)=r2*ezn(n3x,j,3)+r1*ezn(n3x-1,j,3)
      continue
      ezn(n3x,1,3)=0.5*(s1+ezn(n3x,1,3))
      do 1130 j=1,n4y
         ezn(1,j,4)=r2*ezn(1,j,4)+r1*ezn(2,j,4)
         ezn(n5x,j,5)=r2*ezn(n5x,j,5)+r1*ezn(n5x-1,j,5)
 1130 continue
      do 1140 i=1,n6x
         ezn(i,n6y,6)=r2*ezn(i,n6y,6)+r1*ezn(i,n6y-1,6)
 1140 continue
      s1=ezn(1,n6y,6)
      do 1142 j=1,n6y
         ezn(1,j,6)=r2*ezn(1,j,6)+r1*ezn(2,j,6)
 1142 continue
      ezn(1,n6y,6)=0.5*(s1+ezn(1,n6y,6))
      do 1144 i=1,n7x
         ezn(i,n7y,7)=r2*ezn(i,n7y,7)+r1*ezn(i,n7y-1,7)
 1144 continue
      do 1146 i=1,n8x
         ezn(i,n8y,8)=r2*ezn(i,n8y,8)+r1*ezn(i,n8y-1,8)
 1146 continue
      s1=ezn(n8x,n8y,8)
      do 1148 j=1,n8y
         ezn(n8x,j,8)=r2*ezn(n8x,j,8)+r1*ezn(n8x-1,j,8)
 1148 continue
      ezn(n8x,n8y,8)=0.5*(s1+ezn(n8x,n8y,8))
C --- set up values related to the last time level only ---
C
CFPP$ CNCALL
      do 1150 i=1,n
         call fomul(hxn(1,1,i),hyn(1,1,i),ezn(1,1,i),
     *              mx,my,nx(i),ny(i),r9)
 1150 continue
C
c -- form interface information related to the last time level only --
C
C ---- interface 12 (1) ----
C
      do 1000 j=2,n2y-1
         s=ezin(j,1)-r(9)*(hxin(j,1)-hxin(j-1,1))
         s=s+r(9)*(hyn(1,j,2)-hyn(n1x-1,j,1))
         ezin(j,1)=s
 1000 continue
C ---- interface 23 (2) ----
      do 1010 j=2,n2y-1
         s=ezin(j,2)-r(9)*(hxin(j,2)-hxin(j-1,2))
         s=s+r(9)*(hyn(1,j,3)-hyn(n2x-1,j,2))
         ezin(j,2)=s
 1010 continue
C ---- interface 14 (3) ----
      do 1012 i=2,n4x-1
         s=ezin(i,3)+r(9)*(hyin(i,3)-hyin(i-1,3))
         s=s-r(9)*(hxn(i,1,4)-hxn(i,n1y-1,1))
         ezin(i,3)=s
 1012 continue
C ---- interface 35 (4) ----
      do 1014 i=2,n5x-1
         s=ezin(i,4)+r(9)*(hyin(i,4)-hyin(i-1,4))
         s=s-r(9)*(hxn(i,1,5)-hxn(i,n3y-1,3))
         ezin(i,4)=s
 1014 continue
C ---- interface 46 (5) ----
C ---- interface 58 (6) ----
      do 1016 i=2,n5x-1
         s=ezin(i,6)+r(9)*(hyin(i,6)-hyin(i-1,6))
         s=s-r(9)*(hxn(i,1,8)-hxn(i,n5y-1,5))
         ezin(i,6)=s
 1016 continue
C ---- interface 67 (7) ----
      do 1017 j=2,n7y-1
         s=ezin(j,7)-r(9)*(hxin(j,7)-hxin(j-1,7))
         s=s+r(9)*(hyn(1,j,7)-hyn(n6x-1,j,6))
         ezin(j,7)=s
 1017 continue
C ---- interface 78 (8) ----
      do 1018 j=2,n7y-1
         s=ezin(j,8)-r(9)*(hxin(j,8)-hxin(j-1,8))
         s=s+r(9)*(hyn(1,j,8)-hyn(n7x-1,j,7))
         ezin(j,8)=s
 1018 continue
C ---- set up interior boundary values (scatterer surface) ----
 1040 continue
C
c ---- subdomain 4 & 5 ----
C
      s1=ezn(1,n2y,2)
      s2=ezn(n2x,n2y,2)
      do 1060 j=1,n4y
         ezn(n4x,j,4)=s1
         ezn(1,j,5)=s2
 1060 continue
C
C ---- subdomains 1,3,6,8 ----
C
c set up RHS b9 on interfaces
C
      l=0
      do 1082 i=1,n
         call rhs1(b9,ezin(1,i),m1,mx,n_i(i),l,r)
         l=l+(n_i(i)-2)
 1082 continue
C
      do 2900 i=1,n4x-1
         ezn(i,n1y,1)=0.0
         ezn(i,1,4)=0.0
         ezn(i,n4y,4)=0.0
         ezn(i,1,6)=0.0
 2900 continue
      do 2902 i=2,n5x
         ezn(i,n3y,3)=0.0
         ezn(i,1,5)=0.0
         ezn(i,n5y,5)=0.0
         ezn(i,1,8)=0.0
 2902 continue
      do 2904 j=1,n2y-1
         ezn(n1x,j,1)=0.0
         ezn(1,j,2)=0.0
         ezn(n2x,j,2)=0.0
         ezn(1,j,3)=0.0
 2904 continue
      do 2906 j=2,n7y
         ezn(n6x,j,6)=0.0
         ezn(1,j,7)=0.0
         ezn(n7x,j,7)=0.0
         ezn(1,j,8)=0.0
 2906 continue
CFPP$ CNCALL
      do 3012 i=1,n
         call rhs(b(1,i),ezn(1,1,i),m,mx,my,nx(i),ny(i),r)
 3012 continue
C
      do 3014 i=1,n
         call matvect(y(1,1,i),b(1,i),b9,m,m1,nx(i),ny(i))
 3014 continue
C
c solve the linear system on the interfaces
      do 500 i=1,m1
         s=0.0
         do 510 j=1,m1
            s=s+a99(i,j)*b9(j)
 510     continue
         u9(i)=s
 500  continue
c obtain Ez and Hy on the four interfaces
      d=r(6)
      l=0
      call disint(u9,ezin(1,1),hxin(1,1),n2y,l,-d)
      l=l+(n2y-2)
      call disint(u9,ezin(1,2),hxin(1,2),n2y,l,-d)
      l=l+(n2y-2)
      call disint(u9,ezin(1,3),hyin(1,3),n4x,l,d)
      l=l+(n4x-2)
      call disint(u9,ezin(1,4),hyin(1,4),n5x,l,d)
      l=l+(n5x-2)
      call disint(u9,ezin(1,5),hyin(1,5),n4x,l,d)
      l=l+(n4x-2)
      call disint(u9,ezin(1,6),hyin(1,6),n5x,l,d)
      l=l+(n5x-2)
      call disint(u9,ezin(1,7),hxin(1,7),n7y,l,-d)
      l=l+(n7y-2)
      call disint(u9,ezin(1,8),hxin(1,8),n7y,l,-d)
c solve independent linear systems in the four subdomains
C
CFPP$ CNCALL
      do 512 i=1,n
         call multi(a(1,1,i),a9(1,1,i),u9,m,m1,
     *              nx(i),ny(i),b(1,i),u(1,i))
 512  continue
C
c ---- obtain Ez,Hx and Hy over subdomains independently
      do 2000 j=1,n2y
         ezn(n1x,j,1)=ezin(j,1)
         ezn(1,j,2)=ezin(j,1)
         ezn(n2x,j,2)=ezin(j,2)
         ezn(1,j,3)=ezin(j,2)
 2000 continue
      do 2002 i=1,n4x
         ezn(i,n1y,1)=ezin(i,3)
         ezn(i,1,4)=ezin(i,3)
         ezn(i,n4y,4)=ezin(i,5)
         ezn(i,1,6)=ezin(i,5)
 2002 continue
      do 2004 i=1,n5x
         ezn(i,n3y,3)=ezin(i,4)
         ezn(i,1,5)=ezin(i,4)
         ezn(i,n5y,5)=ezin(i,6)
         ezn(i,1,8)=ezin(i,6)
 2004 continue
      do 2006 j=1,n7y
         ezn(n6x,j,6)=ezin(j,7)
         ezn(1,j,7)=ezin(j,7)
         ezn(n7x,j,7)=ezin(j,8)
         ezn(1,j,8)=ezin(j,8)
 2006 continue
CFPP$ CNCALL
      do 2012 i=1,n
         call disval(u(1,i),ezn(1,1,i),hxn(1,1,i),
     *               hyn(1,1,i),mx,my,nx(i),ny(i),r)
 2012 continue
C
c ---- record the solutions on the n-th time level ----
C
      write(7,*) nn
      write(8,*) ezn(ix,jy,4)
      write(*,*)'Ez4n(',ix,',',jy,') ======',ezn(ix,jy,4)
      if(nn.lt.nmx) goto 100
      stop
C
      subroutine matrix0(a,m1,nx,l,l1,d)
      dimension a(m1,m1)
      mx=nx-2
      d4=4.0*d+1.0
      lr=l
      do 10 i=1,mx
         lr=lr+1
         a(lr,lr)=d4
         if(i.gt.1) a(lr,lr-1)=-d
         if(i.lt.mx) a(lr,lr+1)=-d
 10   continue
      l1=lr
      return
      end
C
c .........................................
C
      subroutine matrix(a,m,nx,ny,dd)
      dimension a(m,m)
      d=dd
      d4=4.0*d+1.0
      mx=nx-2
      my=ny-2
      lr=0
      do 20 j=1,my
         do 30 i=1,mx
            lr=lr+1
            if(j.eq.1) goto 40
            a(lr,lr-mx)=-d
 40         continue
            a(lr,lr)=d4
            if(i.gt.1) a(lr,lr-1)=-d
            if(i.lt.mx) a(lr,lr+1)=-d
            if(j.eq.my) goto 50
            a(lr,lr+mx)=-d
 50         continue
 30      continue
 20   continue
      return
      end
C .......................................
C
      subroutine matmut1(a,x,y,n,m,m1)
      dimension a(n,n),x(n,m1),y(m1,n)
continue y(i,i)=s
C
      subroutine matmut2(y,x,c,n,m,m1)
      dimension y(m1,n),x(n,m1),c(m1,m1)
C
      do 10 i=1,m1
         do 20 j=1,m1
            s=0.0
            do 30 k=1,m
               s=s+y(i,k)*x(k,j)
 30         continue
            c(i,j)=s
 20      continue
 10   continue
      return
      end
C .............................................
C
      subroutine sum(a,c,m1)
      dimension a(m1,m1),c(m1,m1)
C
      do 10 i=1,m1
         do 20 j=1,m1
            a(i,j)=a(i,j)-c(i,j)
 20      continue
 10   continue
      return
      end
C
C
      subroutine disint(u9,ez,hy,nx,l,d)
      dimension u9(*),ez(*),hy(*)
C
      do 10 i=2,nx-1
         li=l+i-1
         ez(i)=u9(li)
 10   continue
c     d=r(6)
      do 20 i=1,nx-1
         hy(i)=hy(i)+d*(ez(i+1)-ez(i))
 20   continue
      return
      end
C ............................................
C
      subroutine multi(a,x,u9,m,m1,nx,ny,b,u)
      dimension a(m,m),x(m,m1),u9(m1),b(m),u(m)
C
      l=(ny-2)*(nx-2)
      do 10 i=1,l
         s=0.0
         do 20 j=1,m1
            s=s+x(i,j)*u9(j)
 20      continue
         b(i)=b(i)-s
 10   continue
      do 30 i=1,l
         s=0.0
         do 40 j=1,l
            s=s+a(i,j)*b(j)
 40      continue
         u(i)=s
 30   continue
      return
      end
C .............................................
C
      subroutine rhs(b,ezn,m11,m,n,nx,ny,r)
      dimension b(m11),ezn(m,n),r(*)
C
      d=r(11)
      mx=nx-2
      my=ny-2
C
c set up RHS
C
      do 80 j=1,my
         k=(j-1)*mx
         do 70 i=1,mx
            ki=k+i
            b(ki)=ezn(i+1,j+1)
            if(i.eq.1) b(ki)=b(ki)+d*ezn(i,j+1)
            if(i.eq.mx) b(ki)=b(ki)+d*ezn(i+2,j+1)
            if(j.eq.1) b(ki)=b(ki)+d*ezn(i+1,j)
            if(j.eq.my) b(ki)=b(ki)+d*ezn(i+1,j+2)
 70      continue
 80   continue
      return
      end
C ...........................................
C
      subroutine rhs1(b9,ez,m1,n,nx,l,r)
      dimension b9(m1),ez(n),r(*)
C
      d=r(11)
 10   continue
      return
      end
C
c ..........................................
C
      subroutine disval(b,ez,hx,hy,m,n,nx,ny,r)
      dimension b(*),ez(m,n),hx(m,n),hy(m,n)
      dimension r(*)
C
      mx=nx-2
      my=ny-2
      do 80 j=1,my
         k=(j-1)*mx
         do 90 i=1,mx
            ki=k+i
            ez(i+1,j+1)=b(ki)
 90      continue
 80   continue
      do 100 j=1,ny-1
         do 110 i=2,nx-1
            hx(i,j)=-r(6)*(ez(i,j+1)-ez(i,j))+hx(i,j)
 110     continue
 100  continue
      do 120 j=1,ny
         do 130 i=1,nx-1
            hy(i,j)=r(6)*(ez(i+1,j)-ez(i,j))+hy(i,j)
 130     continue
 120  continue
      return
      end
C ...........................................
C
      subroutine matvect(y,b,b9,m,m1,nx,ny)
      dimension y(m1,m),b(m),b9(m1)
C
      l=(ny-2)*(nx-2)
C
      subroutine fomul(hx,hy,ez,m,n,nx,ny,r9)
      dimension hx(m,n),hy(m,n),ez(m,n)
      do 10 j=2,ny-1
         do 20 i=2,nx-1
            s=r9*(hy(i,j)-hy(i-1,j))
            s=s-r9*(hx(i,j)-hx(i,j-1))
            ez(i,j)=ez(i,j)+s
 20      continue
 10   continue
      return
      end