
A DOMAIN DECOMPOSITION ALGORITHM FOR THE NUMERICAL SOLUTION OF MAXWELL'S

EQUATIONS

Yijun Lu

M.Sc., Huazhong University of Science and Technology, 1990

B.Sc., Huazhong University of Science and Technology, 1985

A THESIS SUBMITTED IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

in the Department of Mathematics and Statistics

© Yijun Lu 1995

SIMON FRASER UNIVERSITY

August 1995

All rights reserved. This work may not be

reproduced in whole or in part, by photocopy

or other means, without the permission of the author.

APPROVAL

Name: Yijun Lu

Degree: Master of Science

Title of thesis: A Domain Decomposition Algorithm for the Numerical Solu-

tion of Maxwell's Equations

Examining Committee:

Chairman: Dr. B. R. Alspach

Dr. C. Y. Shen

Senior Supervisor

Dr. M. Singh

Dr. G. A. C. Graham

Dr. R. W. Lardner

External Examiner

Department of Mathematics and Statistics

Simon Fraser University

Date Approved: August 8, 1995

PARTIAL COPYRIGHT LICENSE

I hereby grant to Simon Fraser University the right to lend my thesis, project or extended essay (the title of which is shown below) to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. I further agree that permission for multiple copying of this work for scholarly purposes may be granted by me or the Dean of Graduate Studies. It is understood that copying or publication of this work for financial gain shall not be allowed without my written permission.

Title of Thesis/Project/Extended Essay

A Domain Decomposition Algorithm for the Numerical Solution

of Maxwell's Equations

Author: (signature)

June 27, 1995

(date)

Abstract

The development of scalable parallel algorithms for implementation on massively parallel processing (MPP) computers has become an important research topic in scientific computing. The long-term goal of this work is to develop an accurate, efficient, flexible and scalable parallel algorithm for solving Maxwell's equations.

A domain decomposition technique together with an implicit finite difference scheme has been used to design a parallel algorithm to solve for the electromagnetic scattering by an infinite square metallic cylinder in the time domain. The implicit difference scheme yields first-order discretization accuracy, unconditional stability, and a large system of linear equations at each time step. The domain decomposition technique reduces the solution of this large system to that of many independent smaller subsystems. A concept of balance factor is proposed to analyze the speedup of the algorithm for several different cases where the computational domain is decomposed into 4 and 8 subregions and the size of the computational domain varies from 1.6λ × 1.6λ to 7.2λ × 7.2λ.

The present algorithm has been implemented on a coarse-grain parallel vector supercom-

puter CRAY C98, running in the dedicated mode, to obtain a speedup close to the number

of available CPU's for a perfectly balanced case. The present algorithm can also be adapted

to MPP computers.

Dedication

To my grandmother

Acknowledgements

I would like to thank my senior supervisor, Dr. C.Y. Shen, for his encouragement, patient guidance and constant support during the preparation of this thesis. I would also like to

thank Cray Research, Inc. and Mr. Evans Harrigan for providing the dedicated time on

CRAY supercomputers. Finally, financial support from the Department of Mathematics

and Statistics at Simon Fraser University is much appreciated.

Contents

Approval
Abstract
Dedication
Acknowledgements
1 Introduction
  1.1 FDTD method
  1.2 Domain Decomposition Technique
  1.3 Outline of the Thesis
2 A Domain Decomposition Algorithm
  2.1 Implicit Finite Difference Approximation
  2.2 Treatment of Boundary Conditions
  2.3 Domain Decomposition Algorithm
  2.4 Summary
3 Numerical Results
  3.1 Test Problems
  3.2 Programming and Parallelization
  3.3 Code Validation
    3.3.1 Correctness
    3.3.2 Stability
    3.3.3 Speedup
  3.4 Summary
4 Concluding Remarks
Bibliography
Appendix

List of Figures

1.1 Positions of the field components on a unit cell of FDTD lattice
1.2 A domain and its decomposition
2.1 Staggered spatial mesh scheme
2.2 Left boundary mesh scheme for i=1
3.1 Geometry of the test problem
3.2 Decomposition: four subdomains
3.3 Decomposition: eight subdomains
3.4 Comparison 1: solid line for one domain, -0- points for 4 and 8 subdomains
3.5 Comparison 2: solid line for one domain, -0- points for 4 and 8 subdomains
3.6 Stability 1: h=1/20, Nx=Ny=41, dt=3.e-10; no evidence of instability
3.7 Stability 2: h=1/20, Nx=Ny=41, dt=5.e-10; instability is detected at n=70
3.8 Stability 3: h=1/50, Nx=Ny=101, dt=5.0e-10; instability is postponed to n=200
3.9 Speedup: 4 subdomain case
3.10 Speedup: 8 subdomain case
3.11 Balance factor: 4 subdomain case
3.12 Balance factor: 8 subdomain case
3.13 Speedup as a function of balance factor: 4 subdomain case
3.14 Speedup as a function of balance factor: 8 subdomain case

Chapter 1

Introduction

Electromagnetic wave propagation problems can be formulated as initial and/or boundary value problems involving Maxwell's equations. The determination of the resultant electromagnetic fields due to various modes of excitation is an important problem both in theory and in practice. In the case of electromagnetic scattering, exact analytical solutions can be obtained only for very simple scatterers such as the sphere or the circular cylinder. For most scatterers, we must resort to numerical methods to determine the scattered field.

We will briefly discuss the FDTD method for solving Maxwell's equations in §1.1 and the domain decomposition technique for solving elliptic problems in §1.2, and give an outline of the thesis in §1.3.

1.1 FDTD method

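The finite-difference time-domain (FDTD) algorithm [4] was developed to solve Maxwell's two curl equations directly in the time domain [28]:

\[ \frac{\partial \mathbf{H}}{\partial t} = -\frac{1}{\mu}\,\nabla \times \mathbf{E}, \tag{1.1} \]

\[ \frac{\partial \mathbf{E}}{\partial t} = \frac{1}{\varepsilon}\,\nabla \times \mathbf{H} - \frac{\sigma}{\varepsilon}\,\mathbf{E}, \tag{1.2} \]

where $\mathbf{E} = (E_x, E_y, E_z)$ is the electric field and $\mathbf{H} = (H_x, H_y, H_z)$ the magnetic field. The constants $\varepsilon$, $\sigma$ and $\mu$ are the electric permittivity, electric conductivity and magnetic permeability, respectively.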

The FDTD algorithm uses a second-order finite difference approximation to the space

and time derivatives of each field component. Figure 1.1 shows a typical spatial mesh

scheme.


Figure 1.1: Positions of the field components on a unit cell of FDTD lattice

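All quantities on the right-hand side of each difference equation are known from computations performed at the previous time step. This results in a fully explicit system whereby chronological values of the electric and magnetic field components at each location are obtained in a temporal leapfrog manner. For example, the finite-difference equations for the field components $H_x$ and $E_z$ are as follows:

\[ H_x^{n+1/2}(i, j+\tfrac12, k+\tfrac12) = H_x^{n-1/2}(i, j+\tfrac12, k+\tfrac12) + \frac{\Delta t}{\mu}\Big[ (\Delta z)^{-1}\big(E_y^{n}(i, j+\tfrac12, k+1) - E_y^{n}(i, j+\tfrac12, k)\big) + (\Delta y)^{-1}\big(E_z^{n}(i, j, k+\tfrac12) - E_z^{n}(i, j+1, k+\tfrac12)\big) \Big] \tag{1.3} \]

\[ E_z^{n+1}(i, j, k+\tfrac12) = \Big(1 - \frac{\sigma\Delta t}{\varepsilon}\Big) E_z^{n}(i, j, k+\tfrac12) + \frac{\Delta t}{\varepsilon}\Big[ (\Delta x)^{-1}\big(H_y^{n+1/2}(i+\tfrac12, j, k+\tfrac12) - H_y^{n+1/2}(i-\tfrac12, j, k+\tfrac12)\big) + (\Delta y)^{-1}\big(H_x^{n+1/2}(i, j-\tfrac12, k+\tfrac12) - H_x^{n+1/2}(i, j+\tfrac12, k+\tfrac12)\big) \Big] \tag{1.4} \]

where $F^n(i, j, k) = F(i\Delta x, j\Delta y, k\Delta z, n\Delta t)$ for any function $F(x, y, z, t)$.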
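To illustrate the leapfrog update described above, the following is a minimal sketch of a one-dimensional analogue, not the three-dimensional scheme itself. The normalized units (ε = μ = 1, σ = 0, Δx = 1), the Gaussian initial pulse, the perfectly conducting end walls and the function name `fdtd_1d` are all assumptions made for this illustration.

```python
import math


def fdtd_1d(nx=200, nsteps=300, courant=0.5):
    """Leapfrog update of a 1-D analogue of (1.3)-(1.4); returns the peak |E_z|."""
    # Assumed normalized units (not from the thesis): eps = mu = 1, sigma = 0,
    # dx = 1, so the wave speed is 1 and dt = courant * dx.
    dx = 1.0
    dt = courant * dx
    # E_z lives on integer nodes, H_y on the half nodes between them.
    ez = [math.exp(-((i - nx / 2) / 10.0) ** 2) for i in range(nx)]
    ez[0] = ez[-1] = 0.0      # assumed perfectly conducting walls
    hy = [0.0] * (nx - 1)
    peak = 0.0
    for _ in range(nsteps):
        # Advance H_y using the E_z values that are already known.
        for i in range(nx - 1):
            hy[i] += dt / dx * (ez[i + 1] - ez[i])
        # Advance E_z using the H_y values just computed.
        for i in range(1, nx - 1):
            ez[i] += dt / dx * (hy[i] - hy[i - 1])
        peak = max(peak, max(abs(e) for e in ez))
    return peak
```

With `courant=0.5` the pulse stays bounded, whereas a value slightly above 1 (for example `courant=1.1`) makes the computed field grow without bound, which is the one-dimensional form of the stability restriction on explicit schemes.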

The FDTD lattice shown in Figure 1.1 was first proposed by Yee [42] in the mid-1960s, but its use was very limited until the early 1980s, when Mur [25] introduced absorbing boundary conditions which were employed to truncate the infinite scattering field to a finite solution domain. The basis of this absorbing (or radiation) boundary condition is a two-term Taylor series approximation of a one-way wave equation [11]. A detailed discussion of the absorbing boundary conditions can be found in [17, 18].

Various aspects of the FDTD method have been studied by many researchers. A stability condition $v\Delta t < \big(\frac{1}{\Delta x^2} + \frac{1}{\Delta y^2} + \frac{1}{\Delta z^2}\big)^{-1/2}$ was given in [29] for the computations of the explicit difference scheme, where $v$ is the wave velocity. Extensions of the FDTD method to handle curved surfaces and irregular nonorthogonal meshes were given in [21]. An FDTD algorithm in curvilinear coordinates was discussed in [12, 13]. A convergence analysis of the FDTD scheme on nonuniform grids was provided in [23].

Many papers have been published on the application of the FDTD method to various electromagnetic problems. An accurate simulation of an incident wave of arbitrary duration, pulse shape, angle of incidence and polarization was reported independently in [25] and [39]. In [39], Umashankar and Taflove provided means to obtain unambiguous sinusoidal steady-state data from the transient response. Accurate computations of far-field and monostatic/bistatic radar cross section were given in [39, 31, 34]. A computation of coupling of wires and wire bundles in free space and in a metal cavity was reported in [40, 32]. Also see [26, 30, 35, 33, 36, 37, 41] for other applications.


Research is ongoing for each of the problems mentioned above. Key questions include the efficient use of computer resources and good resolution for large and complex problems. The complexity of many scattering problems demands faster computing speed and larger memory from the computer system. Multiprocessor supercomputers and massively parallel processors have been used in electromagnetic computing. For problems involving complex scatterers it is desirable to use different techniques to deal with different parts of the scattering domain. In such cases, a uniform stability condition is difficult to enforce. Therefore it is necessary to design a method that does not require a stability condition.

1.2 Domain Decomposition Technique

The domain decomposition approach is well suited for the parallel solution of very large systems of linear or nonlinear algebraic equations that arise from the discretization of a boundary value problem.

To simplify the discussion, we consider the following two-dimensional Poisson problem:

There are two variants of the domain decomposition method: those in which the subdomains overlap and those in which they do not. We shall consider the latter. If the given domain $\Omega$ is divided into two subdomains as shown in Figure 1.2, and the equations (1.5)-(1.6) are discretized by using a finite difference or finite element approximation, we arrive at a linear system which can be expressed as

\[ A u = b. \tag{1.7} \]

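We number the unknowns associated with the interior points of the subdomains first, followed by those on the interface $\Gamma$. The linear system (1.7) can then be written in partitioned form as

\[ \begin{pmatrix} A_{11} & 0 & A_{13} \\ 0 & A_{22} & A_{23} \\ A_{13}^T & A_{23}^T & A_{33} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}. \]

Figure 1.2: A domain and its decomposition

If we eliminate $u_1$ and $u_2$ from the above equations by using block Gaussian elimination, we obtain the Schur complement system

\[ C u_3 = f_3, \tag{1.8} \]

where

\[ C = A_{33} - A_{13}^T A_{11}^{-1} A_{13} - A_{23}^T A_{22}^{-1} A_{23}, \]

and

\[ f_3 = b_3 - A_{13}^T A_{11}^{-1} b_1 - A_{23}^T A_{22}^{-1} b_2. \]

If $u_3$ can be found from (1.8), then we can solve the following two independent subdomain problems

\[ A_{11} u_1 = b_1 - A_{13} u_3 \]

and

\[ A_{22} u_2 = b_2 - A_{23} u_3. \]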
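The elimination above can be tried on a very small example. The sketch below is an illustration under stated assumptions, not the thesis code: it uses the one-dimensional model problem $-u'' = 1$ with zero boundary values (scaled so that the matrix is tridiagonal with entries -1, 2, -1 and the right-hand side is 1), two subdomains of `m` nodes each and a single interface node, so that $C$ is a scalar. The helper names `solve`, `tridiag` and `schur_solve` are hypothetical.

```python
def solve(a, b):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(m[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (m[k][n] - s) / m[k][k]
    return x


def tridiag(n):
    """Matrix with 2 on the diagonal and -1 on the two off-diagonals."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def schur_solve(m):
    """Two subdomains of m nodes each, separated by one interface node."""
    a11, a22 = tridiag(m), tridiag(m)
    a13 = [0.0] * (m - 1) + [-1.0]   # last node of subdomain 1 touches the interface
    a23 = [-1.0] + [0.0] * (m - 1)   # first node of subdomain 2 touches the interface
    a33 = 2.0
    b1, b2, b3 = [1.0] * m, [1.0] * m, 1.0
    # Schur complement C and right-hand side f3 (scalars: one interface node).
    c = a33 - dot(a13, solve(a11, a13)) - dot(a23, solve(a22, a23))
    f3 = b3 - dot(a13, solve(a11, b1)) - dot(a23, solve(a22, b2))
    u3 = f3 / c
    # The two subdomain problems are independent once u3 is known.
    u1 = solve(a11, [bi - ai * u3 for bi, ai in zip(b1, a13)])
    u2 = solve(a22, [bi - ai * u3 for bi, ai in zip(b2, a23)])
    return u1 + [u3] + u2
```

For `m = 3` the result agrees with the solution of the undivided seven-node system, whose entries are $i(8-i)/2$ for $i = 1, \ldots, 7$.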

In general, the matrix $C$ is expensive to compute explicitly. The preconditioned conjugate gradient (PCG) method is an attractive alternative for solving (1.8). In this case, $C$ does not have to be formed explicitly. All that is required is the matrix-vector product $Cw$ for a given vector $w$. It is very important to keep the number of iterations low. A number of preconditioners have been suggested in the literature to improve the convergence of the method (see [1, 2, 6, 10, 9, 14, 15]).
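The following sketch shows how the product $Cw$ can be applied through subdomain solves without ever forming $C$. It is a simplified illustration and rests on several assumptions: a one-dimensional chain with matrix entries -1, 2, -1 and unit right-hand side, three subdomains of `m` nodes separated by two interface nodes, and plain (unpreconditioned) conjugate gradients rather than PCG. All function names are hypothetical.

```python
def thomas(rhs):
    """Solve tridiag(-1, 2, -1) x = rhs with the Thomas algorithm."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = -0.5, rhs[0] / 2.0
    for i in range(1, n):
        den = 2.0 + c[i - 1]
        c[i] = -1.0 / den
        d[i] = (rhs[i] + d[i - 1]) / den
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def coupling(m, w0, w1):
    """Subdomain-interface coupling applied to w (every coupling entry is -1)."""
    r1, r2, r3 = [0.0] * m, [0.0] * m, [0.0] * m
    r1[-1] -= w0
    r2[0] -= w0
    r2[-1] -= w1
    r3[0] -= w1
    return r1, r2, r3


def schur_matvec(m, w):
    """Apply C to w using three independent subdomain solves."""
    y1, y2, y3 = (thomas(r) for r in coupling(m, w[0], w[1]))
    # The interface block is 2*I; the transposed coupling contributes -y at the ends.
    return [2.0 * w[0] + y1[-1] + y2[0], 2.0 * w[1] + y2[-1] + y3[0]]


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def cg(matvec, f, tol=1e-12, maxit=50):
    """Plain conjugate gradients for a symmetric positive definite operator."""
    x, r = [0.0] * len(f), f[:]
    p, rr = r[:], dot(r, r)
    for _ in range(maxit):
        if rr ** 0.5 < tol:
            break
        cp = matvec(p)
        alpha = rr / dot(p, cp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ci for ri, ci in zip(r, cp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def solve_chain(m):
    b = [1.0] * m
    z = thomas(b)
    f = [1.0 + z[-1] + z[0], 1.0 + z[-1] + z[0]]   # interface right-hand side
    u = cg(lambda w: schur_matvec(m, w), f)
    xs = [thomas([bi - ri for bi, ri in zip(b, r)]) for r in coupling(m, u[0], u[1])]
    return xs[0] + [u[0]] + xs[1] + [u[1]] + xs[2]
```

Each call to `schur_matvec` costs one solve per subdomain, which is why the number of iterations matters: in this toy case $C$ is only 2 × 2 and two iterations suffice, but for a long interface a good preconditioner is what keeps the count small.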

Domain decomposition methods have been widely studied for solving elliptic boundary value problems [7]. The parallel implementation of domain decomposition techniques was reported in [15]. But little is known about the applicability or the effectiveness of the technique when it is applied to hyperbolic or parabolic problems. The main objective of this thesis is to demonstrate that the domain decomposition strategy can be efficiently utilized to solve Maxwell's equations in the time domain.

1.3 Outline of the Thesis

In order to avoid the stability condition of the traditional FDTD algorithm and to efficiently utilize multiprocessor supercomputers such as the CRAY C90 series to solve Maxwell's equations, a domain decomposition algorithm is proposed in this thesis. Based upon an implicit difference discretization, the algorithm solves systems of linear equations at every time step. Implicit difference approximations for a two-dimensional scattering problem are described in §2.1. The treatment of an absorbing boundary condition is considered in §2.2. The domain decomposition algorithm is given in §2.3. In Chapter 3, we discuss the implementation of the algorithm on a CRAY C98 system. The parallelization and the programming of the algorithm are explored in §3.2. The correctness of the numerical results and the stability of the algorithm are examined in §3.3. Also, a concept of balance factor, which is used to describe the workload allocation among parallel processors, is introduced to analyze the speedup of the algorithm. Finally, some concluding remarks are given in Chapter 4.

Chapter 2

A Domain Decomposition

Algorithm

An implicit finite difference time-domain scheme is presented in §2.1. When this scheme is combined with the appropriate approximation of an absorbing boundary condition (§2.2), a linear system of equations with unknowns involving both the electric and magnetic fields can be obtained. A domain decomposition algorithm is developed in §2.3 to solve a linear system of equations involving only the electric field.

2.1 Implicit Finite Difference Approximation

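Let us consider the transverse magnetic (TM) wave in two dimensions. In an isotropic medium, the three field components $H_x$, $H_y$ and $E_z$ satisfy the following Maxwell's equations

\[ \frac{\partial H_x}{\partial t} = -\frac{1}{\mu}\frac{\partial E_z}{\partial y} \tag{2.1} \]

\[ \frac{\partial H_y}{\partial t} = \frac{1}{\mu}\frac{\partial E_z}{\partial x} \tag{2.2} \]

\[ \frac{\partial E_z}{\partial t} = \frac{1}{\varepsilon}\Big(\frac{\partial H_y}{\partial x} - \frac{\partial H_x}{\partial y} - \sigma E_z\Big) \tag{2.3} \]

in a region $\Omega$. By using an absorbing boundary condition, which will be discussed in §2.2, the infinite scattering domain $\Omega$ can be truncated to a bounded region.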

Figure 2.1: Staggered spatial mesh scheme

Consider a staggered spatial mesh as shown in Figure 2.1 with mesh parameters $\Delta x$ and $\Delta y$. Let $F^n(i, j) = F(x_0 + i\Delta x, y_0 + j\Delta y, t_0 + n\Delta t)$ for any function $F(x, y, t)$, where $(x_0, y_0, t_0)$ is a fixed point and $\Delta t$ is the temporal discretization parameter. We have the following finite difference approximations to (2.1)-(2.3):

\[ \frac{1}{\Delta t}\big(H_x^n(i, j+\tfrac12) - H_x^{n-1}(i, j+\tfrac12)\big) = -\frac{1}{\mu\Delta y}\big(E_z^n(i, j+1) - E_z^n(i, j)\big) \tag{2.4} \]

\[ \frac{1}{\Delta t}\big(H_y^n(i+\tfrac12, j) - H_y^{n-1}(i+\tfrac12, j)\big) = \frac{1}{\mu\Delta x}\big(E_z^n(i+1, j) - E_z^n(i, j)\big) \tag{2.5} \]

\[ \frac{1}{\Delta t}\big(E_z^n(i, j) - E_z^{n-1}(i, j)\big) = \frac{1}{\varepsilon}\Big[\frac{1}{\Delta x}\big(H_y^n(i+\tfrac12, j) - H_y^n(i-\tfrac12, j)\big) - \frac{1}{\Delta y}\big(H_x^n(i, j+\tfrac12) - H_x^n(i, j-\tfrac12)\big) - \sigma E_z^n(i, j)\Big] \tag{2.6} \]

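An alternative to the above differencing schemes is to use the central difference, which yields the following discretizations:

\[ \frac{1}{\Delta t}\big(H_x^n(i, j+\tfrac12) - H_x^{n-1}(i, j+\tfrac12)\big) = -\frac{1}{2\mu\Delta y}\big[E_z^n(i, j+1) - E_z^n(i, j) + E_z^{n-1}(i, j+1) - E_z^{n-1}(i, j)\big] \tag{2.7} \]

\[ \frac{1}{\Delta t}\big(H_y^n(i+\tfrac12, j) - H_y^{n-1}(i+\tfrac12, j)\big) = \frac{1}{2\mu\Delta x}\big[E_z^n(i+1, j) - E_z^n(i, j) + E_z^{n-1}(i+1, j) - E_z^{n-1}(i, j)\big] \tag{2.8} \]

\[ \frac{1}{\Delta t}\big(E_z^n(i, j) - E_z^{n-1}(i, j)\big) = \frac{1}{2\varepsilon}\Big\{\frac{1}{\Delta x}\big[H_y^n(i+\tfrac12, j) - H_y^n(i-\tfrac12, j) + H_y^{n-1}(i+\tfrac12, j) - H_y^{n-1}(i-\tfrac12, j)\big] - \frac{1}{\Delta y}\big[H_x^n(i, j+\tfrac12) - H_x^n(i, j-\tfrac12) + H_x^{n-1}(i, j+\tfrac12) - H_x^{n-1}(i, j-\tfrac12)\big] - \sigma\big[E_z^n(i, j) + E_z^{n-1}(i, j)\big]\Big\} \tag{2.9} \]

It is not difficult to show that the implicit difference schemes (2.4)-(2.6) and (2.7)-(2.9) are unconditionally stable and have first-order and second-order accuracy, respectively.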
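The stability claim can be checked numerically for a single Fourier mode. The sketch below is a simplified illustration under stated assumptions: a one-dimensional, lossless (σ = 0) analogue on a staggered mesh, for which the spatial difference operator has the purely imaginary eigenvalues $\pm i\omega$ with $\omega\Delta t = 2(v\Delta t/\Delta x)\sin(\theta/2)$. The backward-Euler factor corresponds to a fully implicit scheme of the type (2.4)-(2.6), the Crank-Nicolson factor to a time-averaged scheme of the type (2.7)-(2.9), and the leapfrog factor to the explicit FDTD update; the function name and the scheme labels are hypothetical.

```python
import math


def amplification(theta, courant, scheme):
    """Magnitude of the per-step amplification factor of one Fourier mode."""
    # omega * dt for the staggered central difference in space (1-D, sigma = 0).
    omega_dt = 2.0 * courant * math.sin(theta / 2.0)
    if scheme == "backward_euler":
        # g = 1 / (1 - i*omega*dt)
        return 1.0 / math.sqrt(1.0 + omega_dt ** 2)
    if scheme == "crank_nicolson":
        # g = (1 + i*omega*dt/2) / (1 - i*omega*dt/2)
        return abs(complex(1.0, omega_dt / 2.0) / complex(1.0, -omega_dt / 2.0))
    if scheme == "leapfrog":
        # g + 1/g = 2 - 4 s^2 with s = omega*dt/2
        s = omega_dt / 2.0
        if s <= 1.0:
            return 1.0
        return (2.0 * s * s - 1.0) + 2.0 * s * math.sqrt(s * s - 1.0)
    raise ValueError(scheme)
```

For any value of `courant` the two implicit factors never exceed one, while the leapfrog factor exceeds one as soon as `courant * sin(theta / 2)` is larger than one. The backward-Euler factor is strictly below one for every nonzero mode, so the first-order scheme is dissipative as well as stable.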

2.2 Treatment of Boundary Conditions

As mentioned above, the exterior problem (2.1)-(2.3) needs to be restricted to a finite computational domain. The scattering field is truncated to some regularly shaped domain such as a circle or a rectangle, which will also be denoted by $\Omega$. It is necessary to impose some boundary conditions on the outer boundary $\partial\Omega$ so that the scattered wave from the interior of $\Omega$ can pass through $\partial\Omega$ without being reflected. In general, it is difficult to find an efficient boundary condition which will perform the above task perfectly [17].

In this thesis, Engquist and Majda's [11, 17] first-order absorbing (or radiation) boundary condition will be used, i.e.

where $v = (\mu\varepsilon)^{-1/2}$ is the velocity of wave propagation, and the partial derivative with respect to $n$ denotes the derivative in the direction of the outer normal of the boundary $\partial\Omega$. By choosing the outer boundary as a rectangle, i.e. $\Omega = (a, b) \times (c, d)$, (2.10) becomes


Using the same discretization parameters $\Delta t$, $\Delta x$ and $\Delta y$ as in §2.1, we can write difference approximations of (2.11)-(2.12) as

\[ \frac{1}{\Delta t}\big(E_z^n(1, j) - E_z^{n-1}(1, j)\big) + \frac{v}{\Delta x}\big(E_z^{n-1}(2, j) - E_z^{n-1}(1, j)\big) = 0 \tag{2.15} \]

\[ \frac{1}{\Delta t}\big(E_z^n(n_x, j) - E_z^{n-1}(n_x, j)\big) - \frac{v}{\Delta x}\big(E_z^{n-1}(n_x, j) - E_z^{n-1}(n_x - 1, j)\big) = 0 \tag{2.16} \]

where $n_x$ is the number of mesh points in the x-direction. Figure 2.2 shows the boundary situation for $i = 1$.

Figure 2.2: Left boundary mesh scheme for i = 1

Notice that we use the explicit discretization here to make the problem easier to handle. From (2.15) and (2.16), we have

The discretizations of (2.13) and (2.14) are similar to (2.17) and (2.18). Notice that a weighted mean is needed for the computation of the boundary value at each vertex of the rectangular boundary $\partial\Omega$, since two values are available there, for example at the point with coordinates $(a, c)$. The values of $H_x$ and $H_y$ on $\partial\Omega$ can easily be computed from $E_z$ by using (2.4) and (2.5).

Once an incident wave is introduced, the boundary condition on the scatterer surface can be obtained. For example, for a perfectly conducting body, we have the condition $E_{scat} = -E_{inc}$ applied to the E-field tangential to the surface of the scatterer. In the TM wave case, we have

\[ (E_z)_{scat} = -(E_z)_{inc}. \tag{2.19} \]

2.3 Domain Decomposition Algorithm

A linear system of equations at each time step $t_n = n\Delta t$ ($n = 1, 2, \ldots$) can be obtained by combining (2.4)-(2.6) with the discretized boundary conditions given in §2.2. But this linear system contains unknowns involving the electric field as well as the magnetic field, so the solution is expensive to compute. In this section, we eliminate $H_x$ and $H_y$ from (2.6) by using (2.4) and (2.5), and develop a domain decomposition algorithm to solve the reduced linear system.

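In fact, equations (2.4)-(2.6) can be rewritten as

\[ H_x^n(i, j+\tfrac12) = -\frac{\Delta t}{\mu\Delta y}\big(E_z^n(i, j+1) - E_z^n(i, j)\big) + H_x^{n-1}(i, j+\tfrac12) \tag{2.20} \]

\[ H_y^n(i+\tfrac12, j) = \frac{\Delta t}{\mu\Delta x}\big(E_z^n(i+1, j) - E_z^n(i, j)\big) + H_y^{n-1}(i+\tfrac12, j) \tag{2.21} \]

and

\[ \Big(1 + \frac{\sigma\Delta t}{\varepsilon}\Big)E_z^n(i, j) - \frac{\Delta t}{\varepsilon\Delta x}\big[H_y^n(i+\tfrac12, j) - H_y^n(i-\tfrac12, j)\big] + \frac{\Delta t}{\varepsilon\Delta y}\big[H_x^n(i, j+\tfrac12) - H_x^n(i, j-\tfrac12)\big] = E_z^{n-1}(i, j) \tag{2.22} \]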

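Substituting (2.20) and (2.21) into (2.22), we find

\[ \begin{aligned} \Big(1 + \frac{\sigma\Delta t}{\varepsilon}\Big)E_z^n(i, j) &- \frac{\Delta t}{\varepsilon\Delta x}\Big(\frac{\Delta t}{\mu\Delta x}\big(E_z^n(i+1, j) - E_z^n(i, j)\big) + H_y^{n-1}(i+\tfrac12, j) - \frac{\Delta t}{\mu\Delta x}\big(E_z^n(i, j) - E_z^n(i-1, j)\big) - H_y^{n-1}(i-\tfrac12, j)\Big) \\ &+ \frac{\Delta t}{\varepsilon\Delta y}\Big(-\frac{\Delta t}{\mu\Delta y}\big(E_z^n(i, j+1) - E_z^n(i, j)\big) + H_x^{n-1}(i, j+\tfrac12) + \frac{\Delta t}{\mu\Delta y}\big(E_z^n(i, j) - E_z^n(i, j-1)\big) - H_x^{n-1}(i, j-\tfrac12)\Big) = E_z^{n-1}(i, j) \end{aligned} \]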

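After some simplifications, we have

\[ -b\,E_z^n(i, j-1) - a\,E_z^n(i-1, j) + d\,E_z^n(i, j) - a\,E_z^n(i+1, j) - b\,E_z^n(i, j+1) = f^{n-1}(i, j) \tag{2.23} \]

where

\[ a = \frac{\Delta t^2}{\varepsilon\mu\Delta x^2}, \qquad b = \frac{\Delta t^2}{\varepsilon\mu\Delta y^2}, \qquad d = 1 + \frac{\sigma\Delta t}{\varepsilon} + 2(a + b), \]

and

\[ f^{n-1}(i, j) = E_z^{n-1}(i, j) + \frac{\Delta t}{\varepsilon\Delta x}\big[H_y^{n-1}(i+\tfrac12, j) - H_y^{n-1}(i-\tfrac12, j)\big] - \frac{\Delta t}{\varepsilon\Delta y}\big[H_x^{n-1}(i, j+\tfrac12) - H_x^{n-1}(i, j-\tfrac12)\big] \tag{2.24} \]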

The combination of the difference equations (2.23) and the discretized boundary conditions in §2.2 gives us a linear system in which only the electric field $E_z$ at the n-th time level is involved as the unknown. It is worth pointing out that the linear system (2.23) is strictly diagonally dominant since $d > 2(a + b)$.
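The diagonal dominance can be verified directly on a small block of nodes. The sketch below is an illustration only: it assumes that the off-diagonal coefficients of the five-point equation are $a = \Delta t^2/(\varepsilon\mu\Delta x^2)$ in the x-direction and $b = \Delta t^2/(\varepsilon\mu\Delta y^2)$ in the y-direction, that the diagonal is $d = 1 + \sigma\Delta t/\varepsilon + 2(a+b)$, and that neighbours outside the block carry known values and are moved to the right-hand side. The function names and the normalized parameter values in the usage note are hypothetical.

```python
def stencil_coefficients(dt, dx, dy, eps, mu, sigma):
    """Coefficients a, b, d of the five-point equation for E_z."""
    a = dt * dt / (eps * mu * dx * dx)
    b = dt * dt / (eps * mu * dy * dy)
    d = 1.0 + sigma * dt / eps + 2.0 * (a + b)
    return a, b, d


def assemble(nx, ny, a, b, d):
    """Dense matrix on an nx-by-ny block of nodes, ordered left to right, bottom to top."""
    n = nx * ny
    mat = [[0.0] * n for _ in range(n)]
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            mat[k][k] = d
            if i > 0:
                mat[k][k - 1] = -a      # neighbour (i-1, j)
            if i < nx - 1:
                mat[k][k + 1] = -a      # neighbour (i+1, j)
            if j > 0:
                mat[k][k - nx] = -b     # neighbour (i, j-1)
            if j < ny - 1:
                mat[k][k + nx] = -b     # neighbour (i, j+1)
    return mat


def strictly_diagonally_dominant(mat):
    """True if every diagonal entry exceeds the sum of the other magnitudes in its row."""
    return all(
        abs(row[k]) > sum(abs(v) for c, v in enumerate(row) if c != k)
        for k, row in enumerate(mat)
    )
```

With the normalized values `dt=2, dx=dy=1, eps=mu=1, sigma=0` the coefficients are `a = b = 4` and `d = 17`, so each row has a margin of at least one over its off-diagonal sum, whatever the time step. The margin does not grow with Δt, however, so the matrix becomes relatively less dominant as the time step increases.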

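The linear system can be written in the form

\[ A x = b. \tag{2.25} \]

Assume that the domain $\Omega$ is decomposed into $N$ subdomains $\Omega_i$, $i = 1, 2, \ldots, N$. The union of the $N$ interfaces $\Gamma_i$, $i = 1, 2, \ldots, N$, which separate these $N$ subdomains from each other, is denoted by $\Gamma$. We have the following relations:

\[ \Omega = \Omega_1 \cup \Omega_2 \cup \cdots \cup \Omega_N \cup \Gamma, \qquad \Omega_i \cap \Omega_j = \emptyset \ \text{for } i \neq j, \qquad \Gamma = \Gamma_1 \cup \Gamma_2 \cup \cdots \cup \Gamma_N. \tag{2.26} \]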


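There is no direct coupling between any two different subdomains. The coupling between the subdomains and the interfaces is represented by the matrices $A_i$, $i = 1, 2, \ldots, N$, which are sparse. Most rows of $A_i$ ($i = 1, 2, \ldots, N$) are zero, and there are at most two nonzero entries in any nonzero row. If we use the same ordering of the unknowns as that described in §1.2, then the system (2.25) becomes

\[ \begin{pmatrix} A_{11} & & & & A_1 \\ & A_{22} & & & A_2 \\ & & \ddots & & \vdots \\ & & & A_{NN} & A_N \\ A_1^T & A_2^T & \cdots & A_N^T & A_\Gamma \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \\ y \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \\ f \end{pmatrix} \tag{2.27} \]

where the partitions of the coefficient matrix, the unknown vector and the right-hand side are obvious.

If we let $n_i$, $i = 1, 2, \ldots, N$, be the number of unknowns in each of the subdomains, and $n_\Gamma$ be the number of unknowns on the interface $\Gamma$, then the matrices $A_{ii}$ and $A_i$ are of order $n_i \times n_i$ and $n_i \times n_\Gamma$, respectively, for $i = 1, 2, \ldots, N$. Likewise, $A_\Gamma$ is of order $n_\Gamma \times n_\Gamma$.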

The Schur complement system corresponding to (2.27) is

\[ C y = \tilde{f}, \tag{2.28} \]

where

\[ C = A_\Gamma - \sum_{i=1}^{N} A_i^T A_{ii}^{-1} A_i \]

and

\[ \tilde{f} = f - \sum_{i=1}^{N} A_i^T A_{ii}^{-1} b_i. \]

Once the Schur complement system (2.28) has been solved, we can obtain the rest of the solution of the system (2.25) by solving the following subdomain problems

\[ A_{ii} x_i = g_i, \tag{2.29} \]

where $g_i = b_i - A_i y$, $i = 1, 2, \ldots, N$. It is clear that the problems (2.29) are independent of each other and that their solutions can be computed in parallel.


A domain decomposition algorithm for solving (2.27) can be described as follows:

ALGORITHM 1

1. Solve (2.28) for the interface unknowns;

2. Solve the linear systems (2.29) simultaneously.

For the numerical solution of Maxwell's equations (2.1)-(2.3) with a given incident wave, we have the following algorithm:

ALGORITHM 2

1. n := 1, initialize the calculation;

2. Use ALGORITHM 1 to solve for the $E_z$ values at the n-th time level;

3. Use (2.20) and (2.21) to compute $H_x$ and $H_y$ from $E_z$;

4. n := n + 1; if n ≤ Nmax (number of time steps), go to (2); else exit.
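The time-marching loop can be sketched in one dimension, where the linear system for $E_z$ is tridiagonal. The code below is a simplified illustration and not the thesis implementation: it assumes normalized units (ε = μ = 1, σ = 0, Δx = 1), perfectly conducting end walls in place of the absorbing boundary condition, and a single-domain Thomas solve standing in for ALGORITHM 1. The function names are hypothetical.

```python
import math


def solve_tridiagonal(a, d, rhs):
    """Solve the system with diagonal d and off-diagonals -a (Thomas algorithm)."""
    n = len(rhs)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = -a / d, rhs[0] / d
    for i in range(1, n):
        den = d + a * cp[i - 1]
        cp[i] = -a / den
        dp[i] = (rhs[i] + a * dp[i - 1]) / den
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_march(nx=101, nsteps=200, courant=5.0):
    """1-D analogue of ALGORITHM 2; returns (initial energy, final energy)."""
    dx = 1.0
    dt = courant * dx                 # far beyond the explicit stability limit
    a = (dt / dx) ** 2                # off-diagonal coefficient
    d = 1.0 + 2.0 * a                 # diagonal coefficient (sigma = 0)
    # Step 1: initialize with an assumed Gaussian pulse and zero magnetic field.
    ez = [math.exp(-((i - nx // 2) / 8.0) ** 2) for i in range(nx)]
    ez[0] = ez[-1] = 0.0
    hy = [0.0] * (nx - 1)             # hy[i] sits at the half node i + 1/2
    energy0 = sum(e * e for e in ez)
    for _ in range(nsteps):           # Step 4: loop over the time levels
        # Step 2: build the right-hand side from old values and solve for E_z.
        rhs = [ez[i] + dt / dx * (hy[i] - hy[i - 1]) for i in range(1, nx - 1)]
        ez[1:nx - 1] = solve_tridiagonal(a, d, rhs)
        # Step 3: update H_y from the new E_z.
        for i in range(nx - 1):
            hy[i] += dt / dx * (ez[i + 1] - ez[i])
    energy = sum(e * e for e in ez) + sum(h * h for h in hy)
    return energy0, energy
```

Even with `courant=5.0` the discrete energy does not grow, which reflects the unconditional stability of the fully implicit scheme. It decays noticeably, which reflects the dissipation of the first-order scheme.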

Remarks

1. By using the same procedure as that used to generate (2.23), we can derive the following difference equations from (2.7)-(2.9):

\[ -b\,E_z^n(i, j-1) - a\,E_z^n(i-1, j) + d\,E_z^n(i, j) - a\,E_z^n(i+1, j) - b\,E_z^n(i, j+1) = f^{n-1}(i, j) \tag{2.30} \]

where

and


All the discussions for the difference equations ( 2.4)-( 2.6) can be applied to the dif-

ference equations ( 2.7)-( 2.9).

2. If each subdomain is a rectangle and the unknowns $E_z$ inside the subdomain are ordered from left to right and from bottom to top, then each of the matrices $A_{ii}$ ($i = 1, 2, \ldots, N$) in (2.27) has the form

where

and

Since $\Gamma_i$, $i = 1, 2, \ldots, N$, satisfy (2.26), no points from different $\Gamma_i$'s are expected to appear in the same difference equation (2.23). This accounts for the special structure of the matrix $A_\Gamma$. In fact, $A_\Gamma$ is block diagonal, that is,

with each of $B_{ii}$, $i = 1, 2, \ldots, N$, being of the same form as the matrix $T$.


With these special forms, the systems in (2.29) can be solved directly, and the inverse of $A_{ii}$ ($i = 1, 2, \ldots, N$) is easy to obtain. Furthermore, for the time-dependent problem (2.1)-(2.3), the matrices $A_{ii}$ ($i = 1, 2, \ldots, N$) and $C$ are the same at every time step, which favors the strategy of computing $A_{ii}^{-1}$ and $C$ explicitly at the beginning of the computation.

3. Equation (2.23) is a two-level difference scheme involving $(E_z^n, H_x^n, H_y^n)$ and $(E_z^{n-1}, H_x^{n-1}, H_y^{n-1})$. But $f^{n-1}(i, j)$ can be expressed in terms of $E_z^{n-1}(i, j)$ and $E_z^{n-2}(i, j)$, that is

\[ f^{n-1}(i, j) = \Big(2 + \frac{\sigma\Delta t}{\varepsilon}\Big)E_z^{n-1}(i, j) - E_z^{n-2}(i, j). \tag{2.32} \]

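So (2.23) can be transformed into a three-level difference scheme. The right-hand side $f^{n-1}(i, j)$ of (2.30) cannot be put in a simple form similar to (2.32). The equation (2.23) in which $f^{n-1}(i, j)$ is in the form (2.32) cannot be obtained directly from

\[ \frac{\partial^2 E_z}{\partial t^2} + \frac{\sigma}{\varepsilon}\frac{\partial E_z}{\partial t} = \frac{1}{\varepsilon\mu}\Big(\frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2}\Big) \tag{2.33} \]

which is formulated by eliminating $H_x$ and $H_y$ from (2.3) using (2.1) and (2.2). But we can use the following discretization for (2.33):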

where

and similarly for the other terms on the right-hand side of the above equation. In this case, (2.34) is equivalent to

where $d = 2 + \frac{\sigma\Delta t}{\varepsilon} + 2(a + b)$. As we can see, this is a three-level scheme. We have to use arrays holding $E_z$ at two time levels as well as $H_x$ and $H_y$ for the computation at the n-th time level. Therefore, (2.23) or (2.30) is a better scheme because fewer working arrays are needed to implement the computation.

2.4 Summary

Two implicit finite difference schemes have been presented for the discretization of the 2-

dimensional Maxwell's equations. The use of staggered grids yields the symmetric schemes.

The unconditional stability of the difference schemes is a desirable property.

To truncate an infinite scattering domain into a finite computational domain, an ab-

sorbing boundary condition is necessary. Here Engquist and Majda's first-order absorbing

boundary condition is employed, and its difference approximation is discussed.

For the linear system involving only the $E_z$ unknowns, which is derived from the difference discretization of Maxwell's equations and the absorbing boundary condition, a domain decomposition algorithm is developed. Our main concern is how to solve the problem on multiprocessor systems such as the CRAY C90 series. The idea of allocating the workload among several processors requires the decomposition of the solution domain. The numerical results in Chapter 3 will show various properties of our algorithm.

Chapter 3

Numerical Results

In this chapter, we shall consider the implementation of the algorithm given in the previous chapter on CRAY supercomputers. In section 3.1, we describe an electromagnetic scattering problem which will be used for our numerical experiments. Section 3.2 is devoted to the parallelization of the domain decomposition algorithm. Section 3.3 presents numerical results which illustrate the correctness and the stability of the algorithm, and demonstrate the speedup for different sizes of the problem and different numbers of subdomains.

3.1 Test Problems

We shall calculate the scattering field of a perfectly conducting square cylinder illuminated

by an incident plane wave. The cylinder is assumed to be infinite in the z direction. The

incident wave is assumed to be a +x-directed TM wave. Because there is no variation of

either scatterer geometry or incident fields in the z-direction, this problem may be treated

as a two-dimensional scattering problem, with only $E_z$, $H_x$ and $H_y$ present. Thus the mesh

scheme of Figure 2.1 is used. The configuration of the problem is illustrated in Figure 3.1.

The scattering electric field on the surface of the conducting cylinder is set to be equal to

the negative of the incident E-field. Engquist and Majda's first-order boundary condition

is enforced on the outer boundary of the rectangular computational domain.

For the purpose of domain decomposition, we shall consider the following three cases:

1. One domain only, as shown in Figure 3.1;

Figure 3.1: Geometry of the test problem

2. Four subdomains, as shown in Figure 3.2;

Figure 3.2: Decomposition: four subdomains

3. Eight subdomains, as shown in Figure 3.3.

Figure 3.3: Decomposition: eight subdomains

Throughout the discussion below, the incident plane wave is taken to be [22]

\[ E(t, x) = \big[1 - e^{-(t/t_0)^2}\big] \sin\frac{2\pi}{\lambda}(ct - x) \tag{3.1} \]

where $c$ is the speed of light, $\lambda$ is the wavelength, chosen to be $25h$, and $h = \Delta x = \Delta y$. The exponential term in (3.1) provides a smooth transition from zero to a sinusoidal variation, and $t_0$ is chosen to be $20\Delta t$. The parameters in the Maxwell's equations (2.1)-(2.3) are

taken to be

In this case, the velocity of the wave is

3.2 Programming and Parallelization

Our numerical experiments are carried out on a CRAY C98, which has eight CPUs, 512 MW of central memory and a 4 ns clock period. Its machine accuracy is around $0.8 \times 10^{-14}$.

There are two classes of global variables being used in the programming of the do-

main decomposition algorithm, namely, the internal variables associated with nodes within

subdomains and the interface variables associated with nodes belonging to two or more

subdomains.

After $A_{ii}^{-1}$, $i = 1, 2, \ldots, N$, are computed by the LINPACK subroutines SGEFA and SGEDI, we could use the following subroutines to implement the computation of the algorithm:

MATMUT1   compute $Y_i := A_i^T A_{ii}^{-1}$ for $i = 1, 2, \ldots, N$;

MATMUT2   form $Y_i A_i$ to get $A_i^T A_{ii}^{-1} A_i$, $i = 1, 2, \ldots, N$;

FORMUL    evaluate (2.24) or (2.31) over each subdomain $\Omega_i$, $i = 1, 2, \ldots, N$;

RHS1      formulate the right-hand side $f$ related to the interface;

RHS       formulate the right-hand sides $b_i$, $i = 1, 2, \ldots, N$, related to the subdomains;

MATVECT   form $Y_i b_i$ to obtain $A_i^T A_{ii}^{-1} b_i$ for $i = 1, 2, \ldots, N$;

DISINT    distribute the interface solutions to each segment of the interface;

MULTI     solve the systems (2.29) for $i = 1, 2, \ldots, N$;

DISVAL    use (2.20) and (2.21) to calculate the $H_x$ and $H_y$ fields in every subdomain.

The domain decomposition algorithm formulated in §2.3 essentially reduces the solution of a large linear system to that of several disjoint smaller subsystems. This is a typical example of coarse-grained parallelism. This kind of parallelism can be efficiently implemented on the CRAY C90 systems.

Coarse-grained parallelism can also be found in

1. The calculation of $A_{ii}^{-1}$ ($i = 1, 2, \ldots, N$);

2. The matrix products $A_i^T A_{ii}^{-1} A_i$ ($i = 1, 2, \ldots, N$);

3. The matrix-vector products $A_i^T A_{ii}^{-1} b_i$ ($i = 1, 2, \ldots, N$);

4. The computation of the $H_x$ and $H_y$ fields from $E_z$ over the $N$ subdomains.

As discussed in §3.1, $N$ will be chosen to be 4 or 8 for the implementation of the algorithm on a CRAY C98 system.

Autotasking techniques are used to realize the parallelization of the algorithm. For example, if the same subroutine SAMPLE(A) is called N times with different parameters A := A_i (i = 1, 2, ..., N), we can put these parameters into a 2-D array A(M, N) such that

A(·, i) := A_i

Then in the CRAY autotasking environment, these calls become

CFPP$ CNCALL
      DO 10 i = 1,N
         CALL SAMPLE( A(1,i) )
   10 CONTINUE

In this way the N calls of the subroutine SAMPLE are distributed to N processors and executed concurrently.
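For readers without access to a CRAY system, the same pattern of N independent calls on the columns of an array can be mimicked loosely in Python. The sketch below is an analogy only and makes no claim about how autotasking works internally: the function `sample` is a hypothetical stand-in for SAMPLE, and the thread pool from the standard library is used purely to show the dispatch structure.

```python
from concurrent.futures import ThreadPoolExecutor


def sample(column):
    """Stand-in for SAMPLE: work that reads and writes only its own column."""
    return [2.0 * v for v in column]


def run_all(columns, workers=4):
    """Dispatch one call of sample per column and collect the results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sample, columns))
```

As with the CNCALL directive, this is only valid because the calls share no data. In CPython, threads do not speed up pure-Python arithmetic, so a process pool or a compiled kernel would be needed to observe a real speedup.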

3.3 Code Validation

The desirable properties of the domain decomposition algorithm are stability and applicability to multiprocessor supercomputers. Various numerical results which support our previous discussion are presented below. Throughout the numerical computation, only the finite difference scheme (2.4)-(2.6) is implemented.

3.3.1 Correctness


Figure 3.4: Comparison 1: solid line for one domain, -o- points for 4 and 8 subdomains

Figure 3.5: Comparison 2: solid line for one domain, -o- points for 4 and 8 subdomains

Prior to demonstrating the stability and speedup of ALGORITHM 2, we need to confirm the correctness of the algorithm. For this purpose, we coded a sequential algorithm that solves the linear system of equations (2.23) over the whole computational domain (see Figure 3.1) by Gaussian elimination.


The computational results for the two kinds of algorithms are illustrated in Figures 3.4 and 3.5 with different parameters. The graphs show that the electric fields $E_z$ obtained by the different methods at a fixed point are exactly the same.

3.3.2 Stability

As indicated in [29], care must be taken in setting the discretization parameters $\Delta t$ and h for the conventional FDTD method. For the two-dimensional case, the stability condition for FDTD becomes

$$\Delta t \le \frac{h}{\sqrt{2}\,v}, \qquad (3.2)$$

where $v = 1/\sqrt{\mu\varepsilon}$ is the wave speed. For example, $\Delta t < 1.18 \times 10^{-10}$ when h = 1/20. Because of the unconditional stability of the implicit finite difference scheme (2.4)-(2.6), it is not necessary for our algorithm to satisfy (3.2) in the interior of $\Omega$. However, for the discretization of the absorbing boundary condition, an explicit finite difference scheme is used. According to [17], we must have

$$\frac{v\,\Delta t}{h} \le 1 \qquad (3.3)$$

in order to satisfy the stability condition at the outer boundary. This stability condition (3.3) is a sufficient condition. We have considered several choices of $\Delta t$ for the same value of h = 1/20.

When $\lambda = 1.0$ and $N_x = N_y = 41$, we implemented our algorithm by taking $\Delta t = 3.0 \times 10^{-10}$, which makes $v\,\Delta t / h > 1$. There is no evidence of instability for $N_{\max} \le 5000$ time steps.

This result is plotted in Figure 3.6 for the values of $E_z$ at the point (6,21) and time steps between 1 and 600. If $\Delta t = 5.0 \times 10^{-10}$, the result is given in Figure 3.7, which shows that the instability can be detected at about n = 70. This kind of instability can be postponed by enlarging the outer boundary. Figure 3.8 illustrates the E-field at the same physical point and with the same parameters as in Figure 3.6 except $N_x = N_y = 101$. One can see that the instability is postponed to n = 200. To reduce the instability generated by the discretized absorbing boundary condition, other approximations to (2.10), including implicit schemes, will be studied in the future.
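The time-step values quoted in this subsection can be checked with a short Python sketch. It assumes the standard two-dimensional Courant bound $\Delta t \le h/(\sqrt{2}\,v)$ for the explicit FDTD scheme and a boundary Courant number $v\,\Delta t/h$ for the explicit boundary update; both forms, and the function names, are assumptions of this sketch. The constants follow the values of mu and eps set in the appendix program.

```python
import math

# Free-space constants as set in the appendix program.
MU = 4.0 * math.pi * 1.0e-7
EPS = 1.0e-9 / (36.0 * math.pi)
V = 1.0 / math.sqrt(MU * EPS)  # wave speed, about 3.0e8 m/s


def fdtd_dt_limit_2d(h, v=V):
    """Largest time step allowed by the assumed 2-D Courant bound."""
    return h / (math.sqrt(2.0) * v)


def boundary_courant_number(dt, h, v=V):
    """Ratio v*dt/h that enters the assumed boundary condition."""
    return v * dt / h


h = 1.0 / 20.0
limit = fdtd_dt_limit_2d(h)                   # about 1.18e-10
ratio_stable_run = boundary_courant_number(3.0e-10, h)
ratio_unstable_run = boundary_courant_number(5.0e-10, h)
```

Under these assumptions both test runs use time steps above the explicit FDTD limit, and both have a boundary ratio greater than 1, which is consistent with the boundary condition being sufficient rather than necessary.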


Figure 3.6: Stability 1: h=1/20, Nx=Ny=41, dt=3.e-10; no evidence of instability (horizontal axis: n, time step)

Figure 3.7: Stability 2: h=1/20, Nx=Ny=41, dt=5.e-10; instability is detected at n=70 (horizontal axis: n, time step)

Figure 3.8: Stability 3: h=1/50, Nx=Ny=101, dt=5.0e-10; instability is postponed to n=200 (horizontal axis: n, time step)

3.3.3 Speedup

Autotasking provides a mechanism for automatic multitasking on CRAY systems. Multi-

tasking is used to decrease wall-clock execution time for a program relative to that required

for single-processor execution. A multitasked program generally has the same amount of

work for the processors to perform as does the corresponding unitasked program. However,

when the work is spread across many processors, the wall-clock time required to complete

the work should be less.

Notice that multitasking decreases only wall-clock time. In fact, multitasking generally

increases CPU time because of extra code required for starting, stopping and synchronizing

processors.

Theoretically, speedup is defined as the ratio of the execution time of the best sequential algorithm to that of the parallel algorithm. However, it is not a trivial task to determine an optimal sequential algorithm for a particular application and computer architecture. Following [15], speedup here refers to measurements relative to the uni- and multiprocessor implementations of the same algorithm. On a dedicated CRAY system, the speedup can be calculated in the following manner:

$$\text{Speedup} = \frac{\text{wall-clock execution time (single-processor)}}{\text{wall-clock execution time (multitasked)}}$$

With N CPUs, a speedup as close as possible to N is desired.

Using this definition, we obtain Figures 3.9 and 3.10, which show the speedups of our algorithm implemented on a CRAY C98 system for various domain sizes with four and with eight subdomains.

Figure 3.9: Speedup: 4 subdomain case

Figure 3.10: Speedup: 8 subdomain case

(Table columns: Case; Nx (= Ny); wall-clock time in seconds, sequential and parallel; speedup)


It should be noted from Figures 3.9 and 3.10 that the speedup decreases as the domain size increases. To analyze this situation, we need to introduce Amdahl's law [5]. The formulation of Amdahl's law for multitasking is shown in the following equation:

$$S_m = \frac{1}{f_s + \dfrac{f_p}{N}} \qquad (3.4)$$

where

$S_m$ - Maximum expected speedup from multitasking;

N - Number of processors available for parallel execution;

$f_p$ - Fraction of a program that can be executed in parallel;

$f_s$ - Fraction of a program that is sequential, and $f_s + f_p = 1$.

The speedup from multitasking, $S_m$, is in terms of wall-clock time, not CPU time.

If half of the execution time for a program can be spent in parallel execution (50% parallelism) on an eight-processor CRAY C98 system, the theoretical potential speedup would be 1.78. If 95% of the program executes in parallel, the theoretical speedup would be 5.93. Therefore, based on Amdahl's law, it is clear that significant speedup cannot be obtained unless a significant portion of the execution is done in parallel. To obtain a speedup equal to the physical number of processors requires the program to use all processors effectively 100% of the time with no overhead. Because this is virtually impossible, performance is dominated by the fraction of the time spent executing serial code.
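The two figures quoted above follow from the usual form of Amdahl's law, S = 1 / (f_s + f_p / N) with f_s = 1 - f_p. A minimal Python check, with a function name chosen here for illustration only:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Maximum expected speedup for a given parallel fraction and CPU count."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# The two cases discussed in the text, on an eight-processor system.
for fp in (0.50, 0.95):
    print(f"f_p = {fp:.2f}: S_m = {amdahl_speedup(fp, 8):.2f}")
# prints:
# f_p = 0.50: S_m = 1.78
# f_p = 0.95: S_m = 5.93
```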

However, Amdahl's law assumes that the parallel portion $f_p$ can be allocated evenly to the available processors, that is, that the workload is perfectly balanced among the processors. If this cannot be achieved, some kind of balance information should be introduced into Amdahl's law. Here we propose the following

Definition. The balance factor $\alpha$ for a given concurrent processing of a multitasked program implemented on a computer system with N processors is defined by

$$\alpha = \frac{\text{Total size of all subtasks}}{N \times (\text{size of the largest subtask})}.$$

Since there may be several concurrent processings for a multitasked program, we may have several different balance factors. If these balance factors are equal, then we can describe a modified Amdahl's law as

$$S_m = \frac{1}{f_s + \dfrac{f_p}{\alpha N}}.$$

It is easy to see that $1/N \le \alpha \le 1$. In the perfectly balanced case, we have $\alpha = 1$ and the above formula degenerates to the original Amdahl's law (3.4).

Once a multitasked program is given, the sequential portion $f_s$ as well as the parallel portion $f_p$ is generally fixed. Therefore on a dedicated system with N processors, the speedup is mainly determined by the balance factor $\alpha$.
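A minimal Python sketch of the balance factor follows. The subtask sizes in the example are hypothetical (only the smallest and largest sizes of a case are tabulated), and the balanced form of the speedup, S = 1 / (f_s + f_p / (alpha N)), is an assumption of this sketch chosen so that it reduces to Amdahl's law when alpha = 1.

```python
def balance_factor(subtask_sizes, processors):
    """Total size of all subtasks over N times the size of the largest one."""
    return sum(subtask_sizes) / (processors * max(subtask_sizes))


def balanced_speedup(parallel_fraction, processors, alpha):
    """Assumed modified Amdahl's law with balance factor alpha (sketch only)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / (alpha * processors))


# Hypothetical sizes: two subtasks of size 400 and two of size 200 on 4 CPUs.
alpha_unbalanced = balance_factor([400, 200, 200, 400], 4)   # 0.75
# Eight equal subtasks on 8 CPUs are perfectly balanced.
alpha_balanced = balance_factor([400] * 8, 8)                # 1.0
```

With the parallel fraction held fixed, `balanced_speedup` grows with `alpha`, which matches the qualitative behaviour described in the text.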

Figure 3.11: Balance factor: 4 subdomain case

Case   Nx (= Ny)   Size of subtask (smallest / largest)   α
1      40          200 / 400                              0.750

Figure 3.12: Balance factor: 8 subdomain case

Case   Nx (= Ny)   Size of subtask (smallest / largest)   α
1      60          400 / 400                              1.000


Figures 3.11 and 3.12 display the balance factors corresponding to the cases in Figures 3.9 and 3.10. One can see from these two figures that the balance factor decreases as the size of the computational domain increases. This explains the decreasing speedups in Figures 3.9 and 3.10.

Figures 3.13 and 3.14 show graphically the speedups with respect to the balance factor $\alpha$. The speedup is an increasing function of the balance factor. In the perfectly balanced case (case 1 in Figure 3.12), a speedup of 7.81 is obtained.

Figure 3.13: Speedup as a function of balance factor: 4 subdomain case

3.4 Summary

Numerical results concerning correctness, stability and speedup of the domain decomposi-

tion algorithm given in Chapter 2 are demonstrated in this Chapter.

For the programming of our algorithm, the subroutines listed in §3.2 are important. The construction of the coupling matrices $A_i$, i = 1, 2, ..., N, in (2.27) is complicated. The method described in the last part of §3.2 plays a key role in the use of autotasking techniques on CRAY systems. The Fortran program is listed in the Appendix.

Figure 3.14: Speedup as a function of balance factor: 8 subdomain case

After illustrating the correctness and stability of the algorithm, we focus on its speedup. To discuss the performance of our parallel programs, the concept of a balance factor is incorporated into Amdahl's law. The results show that, once the sequential and parallel portions are fixed for a given multitasked program, the speedup is an increasing function of the balance factor. On a dedicated CRAY C98 system with eight CPUs, a speedup of 7.81, which is very close to the physical number of processors, is obtained in the perfectly balanced case.

Chapter 4

Concluding Remarks

Based upon an implicit finite difference discretization of the two-dimensional Maxwell's equations, a domain decomposition algorithm has been developed to solve the problem on a multiprocessor supercomputer. The domain decomposition technique reduces the solution of a large linear system to that of several independent smaller subsystems. This is a typical example of large-granularity parallelism and may be efficiently implemented on CRAY C90 systems. Numerical results have shown the correctness and the stability of the algorithm.

A concept of balance factor is proposed to analyze the speedups for different sizes of the

test problem. In the case where the sequential and parallel portions are fixed for a given

multitasked program, the speedup of the algorithm is mainly determined by the balance

factor. A speedup close to the physical number of available processors for the execution of

our program can be obtained in the perfectly balanced case.

It is our main interest to solve electromagnetic scattering problems on CRAY systems. To achieve the efficient use of multiprocessor architectures, one has to divide the problem into several independent parts, so the decomposition of the solution domain is an important strategy. Domain decomposition techniques can also be applied to complex problems. In such cases, we may decompose the computational domain into several regular subdomains and use different discretization schemes and different solution methods for each subdomain. Thus it is possible to combine the advantages of finite difference, finite element and spectral methods and provide opportunities for devising more efficient and accurate algorithms.

For three-dimensional problems, we will not have a linear system similar to (2.23) or (2.27) because $E_x$, $E_y$ and $E_z$ will be involved in the system. The splitting of the three spatial directions or the idea of the ADI (alternating direction implicit) method may be helpful to solve the three-dimensional problem.

As mentioned in section 3.4, the formulation of the coupling matrices $A_i$ in (2.27) is tedious. To avoid this, we may use an explicit finite difference scheme on the interfaces and an implicit scheme in each subdomain. Once the computation at the (n - 1)-th time step is completed, we can apply the explicit scheme several times with a smaller time step size $\Delta t$ to get the numerical solutions on the interfaces at the n-th time step. These solutions are taken to be the boundary values of the subdomains. Then the problems associated with different subdomains can be solved independently. This kind of domain decomposition approach is easy to implement on CRAY C90 systems even for three-dimensional problems. More research should be done on the stability, accuracy and effectiveness of the domain decomposition method when it is applied to three-dimensional problems.

Bibliography

[1] P.E. Bjørstad and O.B. Widlund, Iterative methods for the solution of elliptic problems on regions partitioned into substructures, SIAM J. Numer. Anal., 23, 1986, 1097-1120.

[2] J.H. Bramble, J.E. Pasciak and A.H. Schatz, The construction of preconditioners for

elliptic problems by substructuring I, Math. Comp., 47, 103-134, 1986.

[3] V.J.Brankovic et al., An efficient two-dimensional graded mesh finite-difference time-

domain algorithm for shielded or open waveguide structures, IEEE, MTT-40, 2272-

2277, 1992.

[4] A.C.Cangellaris et al., Analysis of the numerical error caused by the stair-stepped approximation of a conducting boundary in FDTD simulations of electromagnetic phenomena, IEEE AP-39, 1518-1525, 1991.

[5] CF77 Volume 4: Parallel Processing Guide, SG-3074 5.0, Cray Research, Inc.

[6] T.F. Chan, Analysis of preconditioners for domain decomposition, SIAM J. Numer.

Anal., 24, 382-390, 1987.

[7] T.F. Chan, R. Glowinski, J. Périaux and O.B. Widlund (Eds), Third International Symposium on Domain Decomposition Methods for Partial Differential Equations, SIAM, Philadelphia, 1990.

[8] T.F. Chan and D.E. Keyes, Interface preconditionings for domain-decomposed

convection-diffusion operators, In [7], 245-262.

[9] T.F. Chan and D.C. Resasco, A domain-decomposed fast Poisson solver on a rectangle,

SIAM J. Sci. Statist. Comput., 8, s14-s26, 1987.


[10] D.Colton and R.Kress, Integral equation methods in scattering theory, John Wiley & Sons, Inc., NY, 1983.

[11] B.Engquist and A.Majda, Absorbing boundary conditions for the numerical simulation of waves, Math. Comp., Vol. 31, 629-651, 1977.

[12] M.Fusco, FDTD algorithm in curvilinear coordinates, IEEE Trans. Antennas Propa-

gat., vol. 38, 76-89, 1990.

[13] M.A.Fusco et al., A three-dimensional FDTD algorithm in curvilinear coordinates,

IEEE AP-39, 1463-1471, 1991.

[14] G.H. Golub and D.F. Mayers, The use of pre-conditioning over irregular regions, Lecture

at Sixth International Conference on Computing Methods in Applied Sciences and

Engineering, Versailles, France, December 1983.

[15] W.D. Gropp and D.E. Keyes, Complexity of parallel implementation of domain de-

composition techniques for elliptic partial differential equations, SIAM J. Sci. Statist.

Comput., 9, 1988, 312-326.

[16] B.Gustafsson and J.Oliger, Stable boundary approximations for implicit time discretizations for gas dynamics, SIAM J.Sci.Statist.Comput., v.3, 1982, 408-421.

[17] R.L.Higdon, Numerical absorbing boundary conditions for the wave equation, Math.

Comput., v.49, 65-91, 1987.

[18] R.L.Higdon, Absorbing boundary conditions for difference approximations to the multi-

dimensional wave equation, Math.Comput., v.47, No.176,437-459, 1986.

[19] R.Holland et al., Finite-difference analysis of EMP coupling to lossy dielectric struc-

tures, IEEE EMC-22, 203-209, 1983.

[20] T.G.Jurgens, A.Taflove, K.Umashankar and T.G.Moore, Finite-difference time-domain modeling of curved surfaces, IEEE AP-40, 357-366, 1992.

[21] N.Madsen and R.Ziolkowski, Numerical solution of Maxwell's equations in time domain using irregular nonorthogonal grids, Wave Motion, vol. 10, 583-596, 1988.

[22] K.Mei et al., Superabsorption - A method to improve absorbing boundary conditions, IEEE AP-40, 1001-1010, 1992.


[23] P.Monk and E.Süli, A convergence analysis of Yee's scheme on nonuniform grids, SIAM J. Numer. Anal., Vol. 31, No. 2, 393-412, 1994.

[24] T.G.Moore et al., Theory and application of radiation boundary operators, IEEE AP-

36, 1797-1812, 1988.

[25] G.Mur, Absorbing boundary conditions for the finite-difference approximation of the

time-domain electromagnetic-field equations, IEEE Trans. on Elect. Compatibility,

EMC-23, 377-382, 1981.

[26] A.T.Perlik, T.Opsahl and A.Taflove, Predicting scattering of electromagnetic fields using FDTD on a connection machine, IEEE Trans. Magn., vol. 25, 2910-2912, 1989.

[27] D.B.Shorthouse et al., The incorporation of static field solutions into the finite difference

time domain algorithm, IEEE MTT-40, 986-994, 1992.

[28] J.A.Stratton, Electromagnetic Theory, McGraw-Hill Book Company, NY, 1941.

[29] A.Taflove and M.E.Brodwin, Numerical solution of steady-state electromagnetic scat-

tering problems using the time-dependent Maxwell's equations, IEEE AP-23, 623-630,

1975.

[30] A.Taflove et al., Detailed FD-TD analysis of electromagnetic fields penetrating narrow slots and lapped joints in thick conducting screens, IEEE AP-36, 247-257, 1988.

[31] A.Taflove and K.R.Umashankar, A hybrid moment method/finite-difference time-domain approach to electromagnetic coupling and aperture penetration into complex geometries, IEEE Trans. Antennas Propagat., vol. AP-30, 617-627, 1982.

[32] A.Taflove and K.R.Umashankar, Radar cross section of general three-dimensional scatterers, IEEE Trans. Electromagn. Compat., Vol. EMC-25, 433-440, 1983.

[33] A.Taflove and K.R.Umashankar, The finite-difference time-domain (FD-TD) method for electromagnetic scattering and interaction problems, J. Electromag. Waves Appl., Vol. 1, 243-267, 1987.

[34] A.Taflove, K.R.Umashankar and T.G.Jurgens, Validation of FD-TD modeling of the radar cross section of three-dimensional scatterers, IEEE Trans. Antennas Propagat., Vol. AP-33, 662-666, 1985.


[35] A.Taflove, Application of the finite-difference time-domain method to sinusoidal steady-state electromagnetic penetration problems, IEEE Trans. Electromagn. Compat., vol. EMC-22, 191-202, 1980.

[36] P.A.Tirkas et al., Modeling of thin dielectric structures using the finite-difference time-

domain technique, IEEE AP-39, 1338-1344, 1991.

[37] P.A.Tirkas et al., Finite-difference time-domain method for antenna radiation, IEEE

AP-40, 334-340, 1992.

[38] P.A.Tirkas et al., Higher order absorbing boundary conditions for the finite-difference time-domain method, IEEE AP-40, 1215-1222, 1992.

[39] K.R.Umashankar and A.Taflove, A novel method to analyze electromagnetic scattering of complex objects, IEEE Trans. Electromagn. Compat., vol. EMC-24, 397-405, 1982.

[40] K.R.Umashankar, A.Taflove and B.Beker, Calculation and experimental validation of

induced currents on coupled wires in an arbitrary shaped cavity, IEEE Trans. Antennas

Propagat., Vol. AP-35, 1248-1257, 1987.

[41] Chen Wu et al., Accurate characterization of planar printed antennas using finite-

difference time-domain method, IEEE AP-40, 526-534, 1992.

[42] K.S.Yee, Numerical solution of initial boundary value problems involving Maxwell's

equations on isotropic media, IEEE AP-14, 302-307, 1966.

APPENDIX. FORTRAN PROGRAMS

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C                                                              C
C  This program is used to solve Maxwell's equations in 2-D    C
C  by implicit FDTD method with domain decomposition. The      C
C  scattering region is decomposed into FOUR subdomains, and   C
C  the subproblems in the subdomains are solved in parallel    C
C                                                              C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C                                                              C
C  NOTE: (1) use first-order difference approximation          C
C        (2) use the Engquist-Majda's 1st-order B.C.           C
C        (3) use direct method to solve linear system          C
C                                                              C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

program maxws4p

parameter(nx4) C c ----- CASE 1 ----- C

parameter(nlx~4l,nly=ll,n2x=ll,n2y=21) parameter(n3x~ll.n3ya2y,n4x=nlx,n4y=ll) parameter(urlllx,myn2y)

C c ----- CASE 2 ----- C c parameter(nlx=61,niy~2l,n2~=21,n2y=21) c parameter(n3~=2l,n3yaZy,n4x=nix,n4y=21) c parareter(u=nlx,mpn2y) C c ----- CASE 3 ----- C c parameter(nlx=81,nly=31,n2~=31,n2y=21) c parameter(n3~=3l,n3ya2y,n4x=nlx,n4y=31) c parameter(u=nlx,mpnly) C ----- CASE 4 ----- C c parameter(nlx~l0l,nlp41,n2x~41,n2y~21) c parameter(n3~=41,n3ya2y,n4x=nlx,n4y=41) c parameter(mx=nlx,mpnly) C c ----- CASE 5 ----- C c parameter(nlx=l21,nlp51,n2x=51,n2y=21) c parameter(n3~~51,n3ya2y,n4x=nlx,n4y=51) c parameter(urlllx,myrnly) C ----- CASE 6 ----- C C. parameter(nlx=14l,nlp61,n2~~61,n2y21) c parameter(n3~=6l,n3ylnZy,n4x~nlx,n4y~61) c parameter(u=nlx,mpnly) C

parameter(m=(nlx-l)*(nly-1)) par~eter(mi=2*(n2x+n3x)-8,m2=nlx) par~eter(mll=(nlx-2)*~nly-2),m22~(n2~-2)*(n2y-2)) parameter(m33r(n3~-2)*(n3y-2),m4rlr(n4~-2)*(n4y-2))

parameter(ixa2x-5,jyll) dimension lngth(n) ,nx(n) ,ny(n) ,n-i(n)

C c u is the largest among nlx,n2x,n3x, and n4x. c my is the largest among nly,n2y,n3y, and n4y. c mil is the largest among mll,m22,m33, and m44. c ml is the total number of mesh points on the interface. C

dimension hxn(u,my ,n) ,hyn(u,my ,n) ,ezn(ax,my ,n) dimension hyin(n2x,n),ezin(n2x,n) dimension a(ml1 ,mll ,n) ,a5(mll ,ml ,n) ,y(ml ,mll,n) dimension c(ml,ml,n),a55(ml,ml),bS(ml),u5~ml) dimension b(mll.n),u(mil,n) dimension z(ml1 ,n) integer ipvt(ml1.n) dimension r(20) real mu,eps,wl,pi

C


c ------- set up constants ------------ C

      pi=4.0*atan(1.0)
      mu=4.0*pi*(1.0e-7)
      eps=(1.0/(36.0*pi))*(1.0e-9)
      v=1.0/sqrt(mu*eps)
      write(*,*) 'speed v=',v
      write(*,*) 'Please input the number of time levels nax=?'
      read(*,*) nax
      write(*,*) 'Please input time step size dt=?'
      read(*,*) dt
      wl=1.0
      h=wl/10.0
c     h=wl/25.0
      write(*,*) 'dt=',dt
      dx=h
      dy=dx
      tw=20.0*dt
      tx=dt/dx
      ty=dt/dy
      txv=v*tx
      tyv=v*ty

C c set up coefficient matrices

CFPP$ CNCALL
      do 15 i=1,n
        call matrix(a(1,1,i),m11,nx(i),ny(i),d)
   15 continue

C c set up A55 matrix C c ---------- C11 C

1-0 do 18 i=l,n

call matrixO(a55,ml,n-i(i),l,ll,d) la11

18 continue C c -------- A15 C

lr=(nly-3)*(nlx-2) lc-0 do 2040 i=l,n2x-2

aS(lr+i,lc+i 2040 continue

lr=lr+(nlx-2)-(n3x-2 lc=lc+(n2x-2) do 2050 i=l,n3x-2

aS(lr+i,lc+i 2050 continue C -


l i m o do 2060 i=l,n2x-2

aS(lr+i,lc+i,2)=-d 2060 continue

lr=(n2y-3)*(n2x-2) lc=(n2~-2)+(n3~-2) do 2070 i=l.n2x-2

a5(ir+i ,lc+i ,2)=-d 2070 continue

C lr=O lc=n2x-2 do 2080 i=l,n3x-2

aS(lr+i,lc+i,J)=-d 2080 continue

lr=(n3y-3)*(n3x-2) lc=2*(n2~-2)+(113x-2) do 2090 i=l,n3x-2


C c --------- A45 C

lrlO lc=(n2~-2)+(n3~-2) do 2100 i=l,nZx-2


lr=lr+(n4~-2)-(n3~-2) lc=lc+(nZx-2) do 2110 i=l.n3x-2

a5(ir+i ,lc+i ,4)=-d 2110 continue

C c find the inverse matricer

CFPP$ CNCALL
      do 2115 i=1,n
        call sgefa(a(1,1,i),m11,lngth(i),ipvt(1,i),info)
 2115 continue

CFPP$ CNCALL
      do 2116 i=1,n
        call sgedi(a(1,1,i),m11,lngth(i),ipvt(1,i),det,z(1,i),01)
 2116 continue

CFPP$ CNCALL
      do 2118 i=1,n
        call matmut2(y(1,1,i),a5(1,1,i),c(1,1,i),m11,lngth(i),m1)
 2118 continue
C

do 2119 i=l,n call sum(a55,c(l,l,i),ml)

21 19 continue C c find the inverse of C (stored still in A55) C

call sgefa(a55,ml ,mi ,ipvt ,info) call agedi(a55,ml,ml,ipvt,det,z,Ol)

C c start to compute Hx,Hy ,Ez C

C c --- set up outer boundary values ----


r24.0-rl do 1100 i=l,nlx

ezn(i,l.l)=r2*ezn(i,l,l)+rl*ezn(i,2,1) 1100 continue

sl=ezn(l,l,l) s2=ezn(nlx 1,l) do 1110 j=i,nly

ezn(l,j,l)=r2*ezn(l,j,l)+rl*ezn(2, j,l) ezn(nlx,j,l)=r2*ezn(n1x,j,l)+rl*ezn(nlx-l,j,1)

1110 continue ezn(l,l.l)=0.5*(sl+ezn(l,l,l)) ezn(nlx,l,l)rO.5*(s2+ezn(nlx,1,1)) do 1120 j=l,n2y

ezn(l,j,2)=r2*ezn(i,j,2)+rl*ezn(2,j.2) ezn(n3x,j,3)=r2*ezn(n3x,j,3)+rl+ezn(n3~-l,j,3)

1120 continue do 1130 j=l,n4y

ezn(l,j,4)=r2*ezn(l,j,4)+rl*ezn(2,j,4) ezn(n4x,j,4)=r2*ezn(n4x,j,4)+rl*ezn(n4~-i,j,4)

1130 continue sl=ezn(l,n4y,4) s2=ezn(n4x,n4y,4) do 1140 i=l,n4x

ezn(i,n4y,4)=r2*ezn(i,n4y,4)+rl*ezn(i,n4y-1,4) 1140 continue

ezn(l,n4y,4)r0.5*(sl+ezn(l,n4y.4)) ezn(n4x,n4y,4)=0.5*(.2+ezn(n4x,n4y,4))

C ezin(l,l)=ezn(l,niy,l) ezin(n3~,2)=ezn(nlx,nly.l) ezin(l,3)=ezn(l,l,4) ezin(n3~,4)=ezn(n4~,1,4)

C C --- set up values related to the last time level only --- C

CFPPS

1142 C c form C c ---- C

1000 C c ---- C

1010 C c ---- C

1020 C c ---- C

1030 C c ----

interface information related to the last time level only

interface 12 (1) ---- do 1000 i=2,n2x-1

s=ezin(i,l)+r(S)*(hyin(i,l)-hyinci-1,l)) s=s-r(S)*(hxn(i,1,2)-hxn(i,nly-l,1)) ezin(i,l)=s

continue

interface 24 (3) ---- do 1010 i=2,n2x-1

s=ezin(i,3)+r(S)*(hyin(i,3)-hyin(i-1,3)) s=s-r(S)*(hm(i.1,4)-hxn(i,n2y-1,2)) ezin(i ,3)=s

continue

interface 13 (2) ---- nx13mlx-n3x do 1020 i=2,n3x-1

s=ezin(i,2)+r(9)*(hyin(i,2)-hyinci-1,2)) s=s-r(S)*(hxn(i,i,3)-hxn(nxl3+i,nly-l ,I)) ezin(i,2)=s

continue

interface 34 (4) ---- nx43m4x-n3x do 1030 i=Z,n3x-l

s=ezin(i,4)+r(9)*(hyin(i,4)-hyin(i-1,4)) s=s-r(9)*(hxn(nx43+i,l,4)-hxn(i,n3y-l,3)) ezin(i,4)=s

continue

set up interior boundary values (scatterer surface) ----


subdomain 1 ---- s=nn*dt/tv s=s*s s=exp(-s) s=1.0-s sl=2.O*pi/vl cndt=v*nn*dt do 1040 i=n2x,nlx-n3x+l

ii=i-n2x-(n2y-1)/2 sZ=sl*(cndt-(ii)+dx) ezn(i,nly,l)= - srsin(s2)

continua

subdomain 2 ---- s=ezn(n2x,nly,l) do 1060 j=l,n2y

ezn(n2x, j ,2)=s continue

subdomain 3 ---- s=ezn(nlx-n3x+l,nly,1) do 1070 j=l,n3y

ezn(1, j ,3)=s continue

subdomain 4 ---- do 1080 i=n2x,n4x-n3x+l

ezn(i,1,4)=ezn(i,nly,l) continue

c set up RHS b5 on interfaces C

1 =o do 1082 i=l,n

call rhsl(b5,ezin(l,i),ml,n2x,n-i(i),l,r) l=l+(n-i(i)-2)

1082 continue

&PPS CICALL do 301 1 i=l ,n

call rhs(b(1,i) ,ezn(l,l,i) ,mll,mx,my,nx(i) ,ny(i) ,r) 3011 continue

C do 3012 i=l,n

call matvect(y(1 ,l ,i) ,b(l,i) ,bS,mll ,ml ,nx(i) ,ny(i)) 3012 continue

C c solve the linear system on the interfaces C


do 510 j=l,ml s=s+a55(i,j)*bS(j)

510 continue u5(i)=s

500 continue C C ---- obtain Ez and Hy on the four interfaces C

l=O call disint(u5,ezin(l,l),hyin(l,l),n2x,l,r) l=l+(n2x-2) call disint(u5,ezin(l,2),hyin(l,2),n3x,l,r) l=l+(n3x-2) call disint (u5 ,ezin(l,3) .hyin(l,3) ,n2x,l ,r) l=l+(n2xy2) call disint(u5,ezin(i,4),hyin(l,4),n3r ..r)

C C ---- solve independent linear systems in the four subdomains P

CFPPS CICALL do 512 i=l,n

call multi(a(l,l,i),a5(1,l,i),u5,m1l,ml, nx(i),ny(i),b(l,i),u(l,i))

512 continue C C ---- obtain Ez,Hx and Hy over subdomains independently C

C c artifical boundary values C

do 2000 i=l,n2x ezn(i,nly,l)=ezin(i,l) ezn(i,l,2)=ezin(i,l) ezn(i,n2y,2)=ezin(i,3) ezn(i,l,4)=ezin(i,3)

2000 continue nxl3lnlx-n3x do 2010 i=l,n3x

ezn(nxl3+i,nly,l)=ezin(i,2) ezn(i,l,3)=ezin(i,2) ezn(i,n3~,3)=ezin(i,4) ezn(nx13+i,l,4)=ezin(i,4)

2010 continue c c ............................................ C c In each subdomain, distribute u into Ezn, then get the c values of Hxn and Hyn at each mesh point.

&PP$ CICALL do 2012 i-1,n

call disval(u(1,i) ,ezn(l,l,i) ,hxn(l,l.i), , hyn(l,l,i) ,u,my,nx(i) ,ny(i),r) 2012 continue

C C ---- record the solutions on the nn-th time level ---- C

srite(7,*) nn srite(8,*) ezn(ix,jy ,2) srite(*,*)'Ez2n(', ix.',',jy,') =====a', ezn(ix,jy,2) if(nn.lt.nax) goto 100 stop end

C ............................................. C

subroutine matrixO(a,ml.nx,l,ll,d) dimension a(m1,ml)

10 continue ll=lr return


end C ............................................. C

subroutine matrix(a,m,nx,ny,dd) dimension a(.,*)

li=lr+i if(j.eq.1) goto 40 a(1r lr-m)=-d cont h u e a(lr,lr)=d4 if(i.gt.1) a(lr,lr-I)=-d if(i.1t.u) a(lr,lr+i)=-d if(j.0q.m~) goto 50 a(lr,lr+u)=-d continue

C
      subroutine matmut1(a,x,y,n,m,m1)
      dimension a(n,n),x(n,m1),y(m1,n)
C
c     y = (transpose of x) * a, using the leading m rows of x and a
C
      do 10 i=1,m1
        do 20 j=1,m
          s=0.0
          do 30 k=1,m
            s=s+x(k,i)*a(k,j)
   30     continue
          y(i,j)=s
   20   continue
   10 continue
      return
      end

C ............................................. C

      subroutine matmut2(y,x,c,n,m,m1)
      dimension y(m1,n),x(n,m1),c(m1,m1)
C
c     c = y * x, summing over the leading m columns of y
C
      do 10 i=1,m1
        do 20 j=1,m1
          s=0.0
          do 30 k=1,m
            s=s+y(i,k)*x(k,j)
   30     continue
          c(i,j)=s
   20   continue
   10 continue
      return
      end

c ............................................ C

subroutine sum(a,c,ml) dimension a(ml .ml) , c h i ,ml)

20 continue 10 continue

return end

C ............................................ C

subroutine disint(uS,ez,hy,nx,l,r' dimension u5(*) ,ez(*) ,hy(*) ,r(*)


li=l+i-1 ez(i)=u5(li)

10 continue

C
      subroutine multi(a,x,u5,m,m1,nx,ny,b,u)
      dimension a(m,m),x(m,m1),u5(m1),b(m),u(m)
C
c     b = b - x*u5, then u = a*b (a holds the inverse matrix)
C
      l=(ny-2)*(nx-2)
      do 10 i=1,l
        s=0.0
        do 20 j=1,m1
          s=s+x(i,j)*u5(j)
   20   continue
        b(i)=b(i)-s
   10 continue
      do 30 i=1,l
        s=0.0
        do 40 j=1,l
          s=s+a(i,j)*b(j)
   40   continue
        u(i)=s
   30 continue
      return
      end

C ............................................ C

subroutine rhs(b,ezn,mll,m,n,nr,ny,r) dimension b(ml1) ,ezn(m,n) ,r(*)

C d=r(ll) u=nx-2 my=ny-2

C c set up RHS C

do 60 j=l ,my k=(j-l)*mx do 70 i=l u

k:=k+i b(ki)=ezn(i+l,j+l) if(i.eq.1) b(ki)=b(ki)+d*ezn(i,j+l) if(i.9q.u) b(ki)=b(ki)+d*ezn(i+2!j+l) if(j.eq.1) b(ki)=b(ki)+d*ezn(i+l,j) if(j.9q.m~) b(ki)=b(ki)+d*ezn(i+l,j+2)


return end

C ........................................... C

subroutine rhsl(b5,ez,ml,n,nx,l,r) dimension b5(ri) ,ex(n) ,r(*)

C

C subroutine disval(b,ez,hx,hy,m,n,nx,ny,r) dimension b(*) ,ez(m,n) ,hx(m,n) ,hy(m,n) dimension r(*)

C


k=(j-l)*u do 90 i=l u

k:=k+i ez(i+l,j+l)=b(ki)


do 100 j=l,ny-1 do 110 i=l,nx

hx(i,j)=-r(6)*(ez(i,j+l)-ez(i,j))+hx(i,j) 110 cont inue 100 continue

do 120 j=l,ny do 130 i=l,nx-1

hy(i,j)=r(6)+(ez(i+l,j)-ez(i,j))+hy(i,j) 130 continue 120 continue

return end

C .......................................... C

      subroutine matvect(y,b,b5,m,m1,nx,ny)
      dimension y(m1,m),b(m),b5(m1)
C
c     b5 = b5 - y*b, using the first (nx-2)*(ny-2) entries of b
C
      l=(ny-2)*(nx-2)
      do 10 i=1,m1
        s=0.0
        do 20 j=1,l
          s=s+y(i,j)*b(j)
   20   continue
        b5(i)=b5(i)-s
   10 continue
      return
      end

C .......................................... C

subroutine fomul(hx,hy,ez,m,n,nx,ny,r9) dimension hx(6.n) ,hy(m,n) ,ez(m,n)

C

do 10 j=2,ny-1 do 20 i=2,nx-1

s=rg*(hy(i,j)-hy(i-1,j)) s=s-rS*(hx(i, j)-hx(i, j-1)) ez(i,j)=ez(i,j)+s


return end


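CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C                                                              C
C  This program is used to solve Maxwell's equations in 2-D    C
C  by implicit FDTD method with domain decomposition. The      C
C  scattering region is decomposed into EIGHT subdomains, and  C
C  the subproblems in the subdomains are solved in parallel    C
C                                                              C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C                                                              C
C  NOTE: (1) use first-order difference approximation          C
C        (2) use the Engquist-Majda's 1st-order B.C.           C
C        (3) use direct method to solve linear system          C
C                                                              C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC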

program maxws8p ,- - C ---- n is the required SCPUs ----

CASE 1 -----

CASE 2 -----

CASE 3 -----

CASE 4 -----

CASE 5 -----

c ----- CASE 6 ----- C


dimension hxn(u,my ,n) ,hyn(u,my ,n) ,ezn(u,my ,n) dimension hxin(my,n),hyin(u,n),ezin(u,n)

dimension a(m,m,n) ,a9(m,ml ,n) ,y(ml ,m,n) dimension c(m1 ,ml ,n) ,a99(ml,ml) ,b9(ml) ,u9(ml) dimension b(m,n),u(m,n) dimension r(20) dimension z(m,n) integer ipvt(m,n) real mu, eps ,sl

open(unit-7,file-'n.m',status="unknown") open(unit4,file-'ez.m',status="unknoon")

C c ------- set up constants ------------

pi=4.O*atan(l.O) mu=4.O*pi*(l.Oe-7) eps=(l.O/(36.0*pi))*(l1Oe-9) v=l.O/sqrt(mu*eps) write(*,*) 'speed v-',v write (*,*) 'Please input the number of time step lax=?' read(*,*) n u write(*,*)'Please input time step size dt=?' read(*.*) dt

dx-h dy-dx tw=20.0*dt tx=dt/dx ty=dt/dy txv=v*tx tyv=v*ty r(l)=txv r(2)=tyv r 6)=tx/mu rt9)-tx/eps r(lO)=ty/eps r(ll)=r(6)*r(9) d=r(ll)


C c set up coefficient C CFPPS CICALL

do 15 i 4 . n call

15 continue

C

c set up A99 matrix C C ---------- Cll

l=O do 18 ir1.n

call l=1l

18 continue

materices

matrix(a(i,i,i),m,nx(i),ny(i),d)

lr= -1 lc=O do 2040 j=l,nly-2

lr=lr+(nlx-2) lc=lc+1 aS(lr,lc,l)=-d

2040 continue lr=lr- nlx-2) lc=lc+lniy-2)

-- -- - lc=lc+1 a9(1r,lc,l)=-d

2042 continue

continue

ir=ir+(kx-2) lc=lc+l a9(lr,lc,3)=-d

2046 continue lr=lr lc=lc+(ntx-2) do 2048 i=l.(n3x-2)

lr=ir+l lc=lc+l a9(lr,lc,3)=-d

2048 continue

do 2066 i=l ,n4;-2 a9(lr+i,lc+i,4)=-d

2050 continue lr=(n4y-3)*(n4x-2) lc=lc+(nlx-2)+(n3x-2) do 2052 i-l,n4x-2

ag(lr+i,lc+i,4)=-d 2052 continue


2054 continue lrr(n6r-3)*(nSx-2) lc=lc+(n3~-2)+(n4~-2) do 2066 i=l,n5x-2

ag(lr+i,lc+i,S)=-d 2056 continue

c
c --------- A69
c
      lr=0
      lc=(n1y-2)+(n2y-2)+(n1x-2)+(n3x-2)
      do 2058 i=1,(n6x-2)

      lr=-1
      lc=lc+(n4x-2)+(n5x-2)
      do 2060 i=1,(n6x-2)
      lr=lr+(n6x-2)
      lc=lc+1
      a9(lr,lc,6)=-d
 2060 continue

      lc=lc+1
      a9(lr,lc,7)=-d
      a9(lr+n7x-3,lc+n7y-2,7)=-d
 2062 continue
c
      lr=0
      lc=(n1y-2)+(n2y-2)+(n1x-2)+(n3x-2)+(n4x-2)
      do 2064 i=1,(n8x-2)
      a9(lr+i,lc+i,8)=-d
 2064 continue
      lr=-(n8x-2)+1
      lc=lc+(n5x-2)+(n6y-2)
      do 2066 j=1,(n8y-2)
      lr=lr+(n8x-2)
      lc=lc+1
      a9(lr,lc,8)=-d

 2066 continue
c
c     find the inverse matrices
c
CFPP$ CNCALL
      do 2115 i=1,n
      call sgefa(a(1,1,i),m,lngth(i),ipvt(1,i),info)
 2115 continue
CFPP$ CNCALL
      do 2116 i=1,n
      call sgedi(a(1,1,i),m,lngth(i),ipvt(1,i),det,z(1,i),01)
 2116 continue
c
c     get Schur complement
c
CFPP$ CNCALL
      do 2117 i=1,n
      call matmut1(a(1,1,i),a9(1,1,i),y(1,1,i),m,lngth(i),m1)
 2117 continue
c
CFPP$ CNCALL
      do 2118 i=1,n
      call matmut2(y(1,1,i),a9(1,1,i),c(1,1,i),m,lngth(i),m1)
 2118 continue
c
      do 2119 i=1,n
      call sum(a99,c(1,1,i),m1)
 2119 continue
c
c     find the inverse of C (stored still in A99)
c
      call sgefa(a99,m1,m1,ipvt,info)
      call sgedi(a99,m1,m1,ipvt,det,z,01)

c
c     start to compute Hx,Hy,Ez
c
c     write(*,*)'----- START TO COMPUTE FOR THE',nn,'-TH TIME LEVEL -----'
c
c --- set up outer boundary values ----

      ezn(i,1,1)=r2*ezn(i,1,1)+r1*ezn(i,2,1)
      continue
      s1=ezn(1,1,1)
      do 1110 j=1,n1y
      ezn(1,j,1)=r2*ezn(1,j,1)+r1*ezn(2,j,1)
 1110 continue
      ezn(1,1,1)=0.5*(s1+ezn(1,1,1))
      do 1120 i=1,n2x
      ezn(i,1,2)=r2*ezn(i,1,2)+r1*ezn(i,2,2)
 1120 continue
      do 1122 i=1,n3x
      ezn(i,1,3)=r2*ezn(i,1,3)+r1*ezn(i,2,3)
 1122 continue
      s1=ezn(n3x,1,3)

      ezn(n3x,j,3)=r2*ezn(n3x,j,3)+r1*ezn(n3x-1,j,3)
      continue
      ezn(n3x,1,3)=0.5*(s1+ezn(n3x,1,3))
      do 1130 j=1,n4y
      ezn(1,j,4)=r2*ezn(1,j,4)+r1*ezn(2,j,4)
      ezn(n5x,j,5)=r2*ezn(n5x,j,5)+r1*ezn(n5x-1,j,5)
 1130 continue
      do 1140 i=1,n6x
      ezn(i,n6y,6)=r2*ezn(i,n6y,6)+r1*ezn(i,n6y-1,6)
 1140 continue
      s1=ezn(1,n6y,6)
      do 1142 j=1,n6y
      ezn(1,j,6)=r2*ezn(1,j,6)+r1*ezn(2,j,6)
 1142 continue
      ezn(1,n6y,6)=0.5*(s1+ezn(1,n6y,6))
      do 1144 i=1,n7x
      ezn(i,n7y,7)=r2*ezn(i,n7y,7)+r1*ezn(i,n7y-1,7)
 1144 continue
      do 1146 i=1,n8x
      ezn(i,n8y,8)=r2*ezn(i,n8y,8)+r1*ezn(i,n8y-1,8)
 1146 continue
      s1=ezn(n8x,n8y,8)
      do 1148 j=1,n8y
      ezn(n8x,j,8)=r2*ezn(n8x,j,8)+r1*ezn(n8x-1,j,8)
 1148 continue
      ezn(n8x,n8y,8)=0.5*(s1+ezn(n8x,n8y,8))

c --- set up values related to the last time level only ---
c
CFPP$ CNCALL
      do 1150 i=1,n
      call fomul(hxn(1,1,i),hyn(1,1,i),ezn(1,1,i),mx,my,
     *           nx(i),ny(i),r9)
 1150 continue
c
c -- form interface information related to the last time level only --
c
c ---- interface 12 (1) ----
c
      do 1000 j=2,n2y-1
      s=ezin(j,1)-r(9)*(hxin(j,1)-hxin(j-1,1))
      s=s+r(9)*(hyn(1,j,2)-hyn(n1x-1,j,1))
      ezin(j,1)=s
 1000 continue
c ---- interface 23 (2) ----
      do 1010 j=2,n2y-1
      s=ezin(j,2)-r(9)*(hxin(j,2)-hxin(j-1,2))
      s=s+r(9)*(hyn(1,j,3)-hyn(n2x-1,j,2))
      ezin(j,2)=s
 1010 continue
c ---- interface 14 (3) ----
      do 1012 i=2,n4x-1
      s=ezin(i,3)+r(9)*(hyin(i,3)-hyin(i-1,3))
      s=s-r(9)*(hxn(i,1,4)-hxn(i,n1y-1,1))
      ezin(i,3)=s
 1012 continue
c ---- interface 35 (4) ----
      do 1014 i=2,n5x-1
      s=ezin(i,4)+r(9)*(hyin(i,4)-hyin(i-1,4))
      s=s-r(9)*(hxn(i,1,5)-hxn(i,n3y-1,3))
      ezin(i,4)=s
 1014 continue
c ---- interface 46 (5) ----

c ---- interface 58 (6) ----
      do 1016 i=2,n5x-1
      s=ezin(i,6)+r(9)*(hyin(i,6)-hyin(i-1,6))
      s=s-r(9)*(hxn(i,1,8)-hxn(i,n5y-1,5))
      ezin(i,6)=s
 1016 continue
c ---- interface 67 (7) ----
      do 1017 j=2,n7y-1
      s=ezin(j,7)-r(9)*(hxin(j,7)-hxin(j-1,7))
      s=s+r(9)*(hyn(1,j,7)-hyn(n6x-1,j,6))
      ezin(j,7)=s
 1017 continue
c ---- interface 78 (8) ----
      do 1018 j=2,n7y-1
      s=ezin(j,8)-r(9)*(hxin(j,8)-hxin(j-1,8))
      s=s+r(9)*(hyn(1,j,8)-hyn(n7x-1,j,7))
      ezin(j,8)=s
 1018 continue

c ---- set up interior boundary values (scatterer surface) ----

 1040 continue
c
c ---- subdomain 4 & 5 ----
c
      s1=ezn(1,n2y,2)
      s2=ezn(n2x,n2y,2)
      do 1060 j=1,n4y
      ezn(n4x,j,4)=s1
      ezn(1,j,5)=s2
 1060 continue
c
c ---- subdomains 1,3,6,8 ----

c
c     set up RHS b9 on interfaces
c
      l=0
      do 1082 i=1,n
      call rhs1(b9,ezin(1,i),m1,mx,n_i(i),l,r)
      l=l+(n_i(i)-2)
 1082 continue
c
      do 2900 i=1,n4x-1
      ezn(i,n1y,1)=0.0
      ezn(i,1,4)=0.0
      ezn(i,n4y,4)=0.0
      ezn(i,1,6)=0.0
 2900 continue
      do 2902 i=2,n5x
      ezn(i,n3y,3)=0.0
      ezn(i,1,5)=0.0
      ezn(i,n5y,5)=0.0
      ezn(i,1,8)=0.0
 2902 continue
      do 2904 j=1,n2y-1
      ezn(n1x,j,1)=0.0
      ezn(1,j,2)=0.0
      ezn(n2x,j,2)=0.0
      ezn(1,j,3)=0.0
 2904 continue
      do 2906 j=2,n7y
      ezn(n6x,j,6)=0.0
      ezn(1,j,7)=0.0
      ezn(n7x,j,7)=0.0
      ezn(1,j,8)=0.0
 2906 continue

CFPP$ CNCALL
      do 3012 i=1,n
      call rhs(b(1,i),ezn(1,1,i),m,mx,my,nx(i),ny(i),r)
 3012 continue
c
      do 3014 i=1,n
      call matvect(y(1,1,i),b(1,i),b9,m,m1,nx(i),ny(i))
 3014 continue
c
c     solve the linear system on the interfaces
c
      do 500 i=1,m1
      s=0.0
      do 510 j=1,m1
      s=s+a99(i,j)*b9(j)
  510 continue
      u9(i)=s
  500 continue
c
c     obtain Ez and Hy on the four interfaces
c
      d=r(6)
      l=0
      call disint(u9,ezin(1,1),hxin(1,1),n2y,l,-d)
      l=l+(n2y-2)
      call disint(u9,ezin(1,2),hxin(1,2),n2y,l,-d)
      l=l+(n2y-2)
      call disint(u9,ezin(1,3),hyin(1,3),n4x,l,d)
      l=l+(n4x-2)
      call disint(u9,ezin(1,4),hyin(1,4),n5x,l,d)
      l=l+(n5x-2)
      call disint(u9,ezin(1,5),hyin(1,5),n4x,l,d)
      l=l+(n4x-2)
      call disint(u9,ezin(1,6),hyin(1,6),n5x,l,d)
      l=l+(n5x-2)
      call disint(u9,ezin(1,7),hxin(1,7),n7y,l,-d)
      l=l+(n7y-2)
      call disint(u9,ezin(1,8),hxin(1,8),n7y,l,-d)
c
c     solve independent linear systems in the four subdomains
c
CFPP$ CNCALL
      do 512 i=1,n
      call multi(a(1,1,i),a9(1,1,i),u9,m,m1,
     *           nx(i),ny(i),b(1,i),u(1,i))
  512 continue

c
c ---- obtain Ez,Hx and Hy over subdomains independently
c
      do 2000 j=1,n2y
      ezn(n1x,j,1)=ezin(j,1)
      ezn(1,j,2)=ezin(j,1)
      ezn(n2x,j,2)=ezin(j,2)
      ezn(1,j,3)=ezin(j,2)
 2000 continue
      do 2002 i=1,n4x
      ezn(i,n1y,1)=ezin(i,3)
      ezn(i,1,4)=ezin(i,3)
      ezn(i,n4y,4)=ezin(i,5)
      ezn(i,1,6)=ezin(i,5)
 2002 continue
      do 2004 i=1,n5x
      ezn(i,n3y,3)=ezin(i,4)
      ezn(i,1,5)=ezin(i,4)
      ezn(i,n5y,5)=ezin(i,6)
      ezn(i,1,8)=ezin(i,6)
 2004 continue
      do 2006 j=1,n7y
      ezn(n6x,j,6)=ezin(j,7)
      ezn(1,j,7)=ezin(j,7)
      ezn(n7x,j,7)=ezin(j,8)
      ezn(1,j,8)=ezin(j,8)
 2006 continue
CFPP$ CNCALL
      do 2012 i=1,n
      call disval(u(1,i),ezn(1,1,i),hxn(1,1,i),
     *            hyn(1,1,i),mx,my,nx(i),ny(i),r)
 2012 continue
c
c ---- record the solutions on the n-th time level ----
c
      write(7,*) nn
      write(8,*) ezn(ix,jy,4)
      write(*,*)'Ez4n(',ix,',',jy,') ======',ezn(ix,jy,4)
      if(nn.lt.nmx) goto 100
      stop


c
      subroutine matrix0(a,m1,nx,l,l1,d)
      dimension a(m1,m1)
      mx=nx-2
      d4=4.0*d+1.0
      lr=l
      do 10 i=1,mx
      lr=lr+1
      a(lr,lr)=d4
      if(i.gt.1) a(lr,lr-1)=-d
      if(i.lt.mx) a(lr,lr+1)=-d
   10 continue
      l1=lr
      return
      end

c
c .........................................
c
      subroutine matrix(a,m,nx,ny,dd)
      dimension a(m,m)
      d=dd
      d4=4.0*d+1.0
      mx=nx-2
      my=ny-2
      lr=0
      do 20 j=1,my
      do 30 i=1,mx
      lr=lr+1
      if(j.eq.1) goto 40
      a(lr,lr-mx)=-d
   40 continue
      a(lr,lr)=d4
      if(i.gt.1) a(lr,lr-1)=-d
      if(i.lt.mx) a(lr,lr+1)=-d
      if(j.eq.my) goto 50
      a(lr,lr+mx)=-d
   50 continue
   30 continue
   20 continue
      return
      end
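The routine above fills one subdomain block: 4d+1 on the diagonal and -d for each of the up to four grid neighbours, with the interior points numbered row by row. A minimal sketch of how it might be exercised follows. The driver `demo`, the grid size nx=ny=4 and the value d=0.5 are illustrative assumptions, not taken from the thesis, and the array is cleared first because the routine only writes the nonzero entries.

```fortran
c     Hypothetical driver (not part of the thesis listing).
      program demo
      parameter (m=4)
      real a(m,m)
c     the routine only sets nonzeros, so clear the array first
      do 2 j=1,m
      do 1 i=1,m
      a(i,j)=0.0
    1 continue
    2 continue
c     nx=ny=4 gives a 2x2 block of interior points; d=0.5 is arbitrary
      call matrix(a,m,4,4,0.5)
      do 3 i=1,m
      write(*,'(4f6.1)') (a(i,j),j=1,m)
    3 continue
      end
c
c     cleaned copy of the routine from the listing
      subroutine matrix(a,m,nx,ny,dd)
      dimension a(m,m)
      d=dd
      d4=4.0*d+1.0
      mx=nx-2
      my=ny-2
      lr=0
      do 20 j=1,my
      do 30 i=1,mx
      lr=lr+1
      if(j.eq.1) goto 40
      a(lr,lr-mx)=-d
   40 continue
      a(lr,lr)=d4
      if(i.gt.1) a(lr,lr-1)=-d
      if(i.lt.mx) a(lr,lr+1)=-d
      if(j.eq.my) goto 50
      a(lr,lr+mx)=-d
   50 continue
   30 continue
   20 continue
      return
      end
```

With these values every diagonal entry is 3.0, each point couples to its two in-grid neighbours with -0.5, and the two anti-diagonal corner entries stay zero, so the matrix is symmetric.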

c .......................................
c
      subroutine matmut1(a,x,y,n,m,m1)
      dimension a(n,n),x(n,m1),y(m1,n)

      continue
      y(i,j)=s

c
      subroutine matmut2(y,x,c,n,m,m1)
      dimension y(m1,n),x(n,m1),c(m1,m1)
c
      do 10 i=1,m1
      do 20 j=1,m1
      s=0.0
      do 30 k=1,m
      s=s+y(i,k)*x(k,j)
   30 continue
      c(i,j)=s
   20 continue
   10 continue
      return
      end

c .............................................
c
      subroutine sum(a,c,m1)
      dimension a(m1,m1),c(m1,m1)
c
      do 10 i=1,m1
      do 20 j=1,m1
      a(i,j)=a(i,j)-c(i,j)
   20 continue
   10 continue
      return
      end
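Taken together, matmut1, matmut2 and sum appear to accumulate the Schur complement, A99 minus the sum over subdomains of (A9i^T Ai^-1) A9i. The sketch below works through one subdomain with two interior unknowns and one interface unknown. It assumes that matmut1 forms y = x^T a, which the dimension statement y(m1,n) suggests but which cannot be confirmed from the listing. The program `schur` and all numerical values are hypothetical.

```fortran
c     Hypothetical driver (not part of the thesis listing).
      program schur
      parameter (n=2,m1=1)
      real ainv(n,n),x(n,m1),y(m1,n),c(m1,m1),a99(m1,m1)
c     subdomain block A=(3,-.5;-.5,3), inverted by hand
      det=3.0*3.0-0.5*0.5
      ainv(1,1)=3.0/det
      ainv(1,2)=0.5/det
      ainv(2,1)=0.5/det
      ainv(2,2)=3.0/det
c     coupling column x (the a9 block) and interface block a99
      x(1,1)=0.0
      x(2,1)=-0.5
      a99(1,1)=3.0
c     y = x^T * ainv (assumed role of matmut1)
      do 30 i=1,m1
      do 20 j=1,n
      s=0.0
      do 10 k=1,n
      s=s+x(k,i)*ainv(k,j)
   10 continue
      y(i,j)=s
   20 continue
   30 continue
c     c = y * x (as in matmut2)
      do 60 i=1,m1
      do 50 j=1,m1
      s=0.0
      do 40 k=1,n
      s=s+y(i,k)*x(k,j)
   40 continue
      c(i,j)=s
   50 continue
   60 continue
c     a99 = a99 - c (as in sum)
      a99(1,1)=a99(1,1)-c(1,1)
      write(*,*) 'Schur complement =',a99(1,1)
      end
```

For this data the correction term is 0.75/8.75, so the printed value should be about 2.914.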

c
      subroutine disint(u9,ez,hy,nx,l,d)
      dimension u9(*),ez(*),hy(*)
c
      do 10 i=2,nx-1
      li=l+i-1
      ez(i)=u9(li)
   10 continue
c     d=r(6)
      do 20 i=1,nx-1
      hy(i)=hy(i)+d*(ez(i+1)-ez(i))
   20 continue
      return
      end

c ............................................
c
      subroutine multi(a,x,u9,m,m1,nx,ny,b,u)
      dimension a(m,m),x(m,m1),u9(m1),b(m),u(m)
c
      l=(ny-2)*(nx-2)
      do 10 i=1,l
      s=0.0
      do 20 j=1,m1
      s=s+x(i,j)*u9(j)
   20 continue
      b(i)=b(i)-s
   10 continue
      do 30 i=1,l
      s=0.0
      do 40 j=1,l
      s=s+a(i,j)*b(j)
   40 continue
      u(i)=s
   30 continue
      return
      end
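Once the interface values u9 are known, multi recovers the interior unknowns of one subdomain as u = Ai^-1 (b - A9i u9), where the array a already holds the inverse. The following sketch checks this on a 3x3 system whose exact solution is (1,1,1). The driver `backsb`, the grid size nx=4, ny=3 (two interior points) and the data are assumptions made for illustration only.

```fortran
c     Hypothetical driver (not part of the thesis listing).
      program backsb
      parameter (m=2,m1=1)
      real ainv(m,m),x(m,m1),u9(m1),b(m),u(m)
c     inverse of the subdomain block A=(3,-.5;-.5,3)
      det=3.0*3.0-0.5*0.5
      ainv(1,1)=3.0/det
      ainv(1,2)=0.5/det
      ainv(2,1)=0.5/det
      ainv(2,2)=3.0/det
c     coupling to the single interface unknown
      x(1,1)=0.0
      x(2,1)=-0.5
c     interface value and interior right-hand side
      u9(1)=1.0
      b(1)=2.5
      b(2)=2.0
c     nx=4, ny=3 gives (ny-2)*(nx-2)=2 interior unknowns
      call multi(ainv,x,u9,m,m1,4,3,b,u)
      write(*,*) u(1),u(2)
      end
c
c     cleaned copy of the routine from the listing
      subroutine multi(a,x,u9,m,m1,nx,ny,b,u)
      dimension a(m,m),x(m,m1),u9(m1),b(m),u(m)
      l=(ny-2)*(nx-2)
      do 10 i=1,l
      s=0.0
      do 20 j=1,m1
      s=s+x(i,j)*u9(j)
   20 continue
      b(i)=b(i)-s
   10 continue
      do 30 i=1,l
      s=0.0
      do 40 j=1,l
      s=s+a(i,j)*b(j)
   40 continue
      u(i)=s
   30 continue
      return
      end
```

Both printed values should equal 1.0 up to rounding. Note that multi overwrites b with the corrected right-hand side, so a caller that still needs the original b has to copy it first.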

c .............................................
c
      subroutine rhs(b,ezn,mll,m,n,nx,ny,r)
      dimension b(mll),ezn(m,n),r(*)
c
      d=r(11)
      mx=nx-2
      my=ny-2
c
c     set up RHS
c
      do 80 j=1,my
      k=(j-1)*mx
      do 70 i=1,mx
      ki=k+i
      b(ki)=ezn(i+1,j+1)
      if(i.eq.1) b(ki)=b(ki)+d*ezn(i,j+1)
      if(i.eq.mx) b(ki)=b(ki)+d*ezn(i+2,j+1)
      if(j.eq.1) b(ki)=b(ki)+d*ezn(i+1,j)
      if(j.eq.my) b(ki)=b(ki)+d*ezn(i+1,j+2)
   70 continue
   80 continue
      return
      end

c ...........................................
c
      subroutine rhs1(b9,ez,m1,n,nx,l,r)
      dimension b9(m1),ez(n),r(*)
c
      d=r(11)

   10 continue
      return
      end

c
c ..........................................
c
      subroutine disval(b,ez,hx,hy,m,n,nx,ny,r)
      dimension b(*),ez(m,n),hx(m,n),hy(m,n)
      dimension r(*)
c
      mx=nx-2
      my=ny-2
      do 80 j=1,my
      k=(j-1)*mx
      do 90 i=1,mx
      ki=k+i
      ez(i+1,j+1)=b(ki)
   90 continue
   80 continue
      do 100 j=1,ny-1
      do 110 i=2,nx-1
      hx(i,j)=-r(6)*(ez(i,j+1)-ez(i,j))+hx(i,j)
  110 continue
  100 continue
      do 120 j=1,ny
      do 130 i=1,nx-1
      hy(i,j)=r(6)*(ez(i+1,j)-ez(i,j))+hy(i,j)
  130 continue
  120 continue
      return
      end

c ...........................................
c
      subroutine matvect(y,b,b9,m,m1,nx,ny)
      dimension y(m1,m),b(m),b9(m1)
c
      l=(ny-2)*(nx-2)

c
      subroutine fomul(hx,hy,ez,m,n,nx,ny,r9)
      dimension hx(m,n),hy(m,n),ez(m,n)
      do 10 j=2,ny-1
      do 20 i=2,nx-1
      s=r9*(hy(i,j)-hy(i-1,j))
      s=s-r9*(hx(i,j)-hx(i,j-1))
      ez(i,j)=ez(i,j)+s
   20 continue
   10 continue
      return
      end
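The routine fomul adds the explicit contribution of the previous time level to Ez at the interior points: r9 times the backward difference of Hy in x minus the backward difference of Hx in y. A small sketch under assumed data follows. The driver `ezstep`, the 3x3 grid, the linear Hy profile and r9=2.0 are all hypothetical choices.

```fortran
c     Hypothetical driver (not part of the thesis listing).
      program ezstep
      parameter (m=3,n=3)
      real hx(m,n),hy(m,n),ez(m,n)
c     Hx=0, Hy grows linearly in i, Ez starts at zero
      do 2 j=1,n
      do 1 i=1,m
      hx(i,j)=0.0
      hy(i,j)=real(i)
      ez(i,j)=0.0
    1 continue
    2 continue
c     nx=ny=3 leaves the single interior point (2,2)
      call fomul(hx,hy,ez,m,n,3,3,2.0)
      write(*,*) ez(2,2)
      end
c
c     cleaned copy of the routine from the listing
      subroutine fomul(hx,hy,ez,m,n,nx,ny,r9)
      dimension hx(m,n),hy(m,n),ez(m,n)
      do 10 j=2,ny-1
      do 20 i=2,nx-1
      s=r9*(hy(i,j)-hy(i-1,j))
      s=s-r9*(hx(i,j)-hx(i,j-1))
      ez(i,j)=ez(i,j)+s
   20 continue
   10 continue
      return
      end
```

Here the Hy difference across the interior point is 1 and the Hx difference is 0, so ez(2,2) becomes 2.0 while the boundary values of Ez are left untouched.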
