
Menck, J.: An Approximate Newton-Like Coupling of Subsystems

Abstract. Complex technical systems are often assembled from well-studied subsystems. Here, we elaborate on the obvious idea of coupling existing subsystem solvers to solve a coupled system. More specifically, we present a matrix-free iterative method that is inspired by a Newton type coupling of the subsystems but aims at efficiently controlled linear convergence. We prove its convergence and propose a control mechanism to optimize its efficiency. We illustrate the method's properties with the help of a numerical example.

Note: (As yet) we only deal with the stationary case; mathematically speaking, we are looking for roots of systems of nonlinear equations.

Address. Jürgen Menck, Technische Universität Hamburg-Harburg, Arbeitsbereich Mathematik, Kasernenstraße 12, D-21073 Hamburg, Federal Republic of Germany, http://www.tu-harburg.de/mat/, [email protected].

Keywords. Newton's method, approximate Newton's method, block-structured Newton's method, coupled systems, stationary process simulation, work control.

1 Introduction

Complex technical systems are often assembled by coupling well-known subsystems together. Typically, all of these units can be attacked with reliable solvers, maybe even specialized software packages, as long as they stand alone. It may however be doubtful how to handle them if they all interact. There are basically two substantially different approaches to the problem:

- The analytic approach: Re-analyse the units, recover explicit sets of equations, assemble a set of equations for the coupled system, then use (possibly optimized) standard solvers (e.g. Newton).
- The synthetic approach: Try to combine the given subsystem solvers to get a solver for the coupled system.

The analytic approach is for example the basis of the well-established flowsheeting software SPEEDUP [5]. While it certainly has a lot of advantages, there can still be good reasons to stick to the synthetic approach:

- It may seem preferable to keep a modular view of the coupled system because it helps judging the validity of the results.
- It may be too expensive to re-analyse the units, or it may be unreasonable to ignore the development costs represented by the subsystem solvers.
- It may seem dangerous to re-analyse the units. While the subsystem solvers are probably well-tested and tuned, the new equations may be incorrect or technically awkward.
- It may even be impossible to re-analyse the units. In fact, they may be represented by some sort of black box solvers, particularly some sort of commercial software that forbids access to the source code.

In many applications the synthetic approach to the problem has thus been favoured. Unfortunately, the most popular way to couple the solvers is also the most naive: Everything is chained in some sort of block Gauss-Seidel process or block Jacobi process, which may result in divergence or very slow convergence of the hybrid solver (a sketch of this naive chaining follows below). Sometimes the convergence can be improved by re-ordering the systems in a clever way, but this requires extensive testing. Consequently, attempts have been made to couple the solvers in a more sophisticated way, thus ending up with a locally convergent process. Our own process is based on a well-established concept which we will call T(angential) B(lock) N(ewton), TBN for short.
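For contrast with the methods developed below, the following is a minimal sketch of the naive chaining just mentioned. The subsystem maps `Phi[i]` and the coupling update `update_y` are hypothetical callables standing in for the given solvers; nothing here controls the contraction of the composite iteration, which is exactly the weakness of the approach.

```python
import numpy as np

def gauss_seidel_coupling(Phi, update_y, x, y, sweeps=100, tol=1e-8):
    """Naive block Gauss-Seidel chaining of subsystem solvers.

    Phi      : list of subsystem fixed-point maps, Phi[i](x_i, y) -> new x_i
    update_y : heuristic update of the coupling variables from the x_i
    Divergence or very slow convergence is possible.
    """
    for _ in range(sweeps):
        x_old = [xi.copy() for xi in x]
        y_old = y.copy()
        for i in range(len(x)):        # each unit sees the newest data in turn
            x[i] = Phi[i](x[i], y)
        y = update_y(x, y)             # then the coupling variables are refreshed
        drift = max(max(np.linalg.norm(x[i] - x_old[i]) for i in range(len(x))),
                    np.linalg.norm(y - y_old))
        if drift < tol:
            break
    return x, y
```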


Basically, TBN computes Newton steps for the coupled system while retaining the given block structure. As this interpretation of the method suggests, TBN is guaranteed to converge under quite general assumptions, and it will even converge quadratically if executed exactly. Numerical experience however shows that if the subsystems are handled with linearly convergent solvers, it will typically be too expensive to force quadratic convergence on the hybrid method. Consequently we concentrate on controlling the behaviour of a linearly convergent approximation of TBN and making it as efficient as possible.

Tony F. Chan ([7], [8]) seems to have been the first to apply the TBN concept to our basic problem. His method, however, differs significantly from ours in some respects, and a proof of convergence required restrictive assumptions. The present paper is closely related to the work of Artlich and Mackens ([2], [3], [4]), although the emphasis on linear convergence is a novel feature.

2 Problem

Let us now outline the details of our setting: Suppose we are given $k \in \mathbb{N}$ subsystems, each depending on a set $x_i \in \mathbb{R}^{k_i}$ of $k_i$ internal variables and a set $y \in \mathbb{R}^{k_c}$ of common external (or "coupling") variables. Each system is represented by its respective solver, which is assumed to be an iterative process

  $x_i^{n+1} := \Phi_i(x_i^n, y)$,  $n \in \mathbb{N}$.   (1)

(Note that a direct solver will also qualify.) To keep notations simple, we merge the subsystems into a large system

  $x = \Phi(x, y)$,  $x = (x_1, \ldots, x_k)$,  $\Phi = (\Phi_1, \ldots, \Phi_k)$.   (2)

Actually, the characteristic orders of magnitude of the $\|x_i\|$ and the contraction numbers of the $\Phi_i$ may differ considerably between the different units. So in practice it may be advisable to balance the systems by rescaling the internal variables and executing some of the slower solvers repeatedly, thus using

  $x_i^{n+1} := \Lambda_i \Phi_i^{\nu_i}(\Lambda_i^{-1} x_i^n, y)$,  $\Lambda_i$ a positive diagonal matrix, $\nu_i \in \mathbb{N}$,   (3)

instead of (1). For the purpose of the present paper, consider everything balanced in a reasonable way. Our composite system reads

  $x = \Phi(x, y)$,   (4)
  $g(x, y) = 0$,   (5)

where $g$ represents the coupling of the units. Setting

  $f(x, y) := x - \Phi(x, y)$,   (6)

we can reformulate this as a root finding problem:

  $f(x, y) = 0$,   (7)
  $g(x, y) = 0$.   (8)

We assume that $g$ consists of exactly as many equations as $y$ has components, such that the coupled system is "square". The aim of our method will be to reduce suitable norms of the residual errors in (7) and (8). Thus we finally end up with an optimization problem,

  $\max(\|f(x, y)\|, \|g(x, y)\|) = \min!$   (9)

If there is a (locally) unique solution of (7) and (8), this system will be locally equivalent to (9).

Suppose our starting point lies in the vicinity of a solution $(\bar{x}, \bar{y})$ of (7), (8) at which the joint Jacobian of $f$ and $g$ is nonsingular. Intermediate iterates of our method may move away from the solution a bit, but our estimates will make it easy to contain them inside a suitable region $U$ if the starting point is "good enough". We assume that $f$ and $g$ are at least $C^2$ inside $U$, that their joint Jacobian

  $J = \begin{pmatrix} f_x & f_y \\ g_x & g_y \end{pmatrix}$

remains nonsingular, and that $\Phi$ is uniformly contractive in the sense that $\|D_x\Phi(x, y)\| \le q < 1$ holds for all $(x, y)$.
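To fix ideas, here is a minimal sketch of the merged formulation in Python; `Phi` and `g` are assumptions standing in for the (balanced) composite fixed-point map and the coupling equations, which the paper treats as given.

```python
import numpy as np

def f(x, y, Phi):
    """Subsystem residual f(x, y) = x - Phi(x, y), cf. (6)."""
    return x - Phi(x, y)

def residual(x, y, Phi, g):
    """The quantity to be reduced in (9): max(||f||, ||g||)."""
    return max(np.linalg.norm(f(x, y, Phi)), np.linalg.norm(g(x, y)))
```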


Of course the last assumption restricts the choice of norms for $f$. Typically one would use a combination of norms which make the $\Phi_i$ contractive. In the sequel, suppose that adequate norms have been chosen. Matrix norms will always be assumed to be the subordinate matrix norms with respect to these vector norms, i.e.

  $\|A\| = \sup_{w \neq 0} \frac{\|Aw\|}{\|w\|}$.   (10)

3 Exact Tangential Block-Newton

Our method is based on what we call the T(angential) B(lock) N(ewton) Algorithm. This is basically a blocked Newton's iteration for solving coupled systems of the form (7), (8). Variants of this method have been known and used for many years, see for example [6], [7], [14], [15], [16], [17], [18], [21], [25]. One TBN iteration step consists of one Newton step for $f$ with respect to $x$,

  $\Delta x := -(D_x f(x^n, y^n))^{-1} f(x^n, y^n)$,   (11)
  $x_+ := x^n + \Delta x$,   (12)

and one Newton step for $g$ along the tangential space of the manifold $M := \{(x, y) \mid f(x, y) = f(x_+, y^n)\}$ through $(x_+, y^n)$,

  $\Delta y := -S^{-1}(x_+, y^n)\, g(x_+, y^n)$,   (13)
  $x^{n+1} := x_+ - C \Delta y$,   (14)
  $y^{n+1} := y^n + \Delta y$.   (15)

The above matrices $C$ and $S$ are defined as follows:

  $C(x, y) := (D_x f(x, y))^{-1} D_y f(x, y)$,   (16)
  $S(x, y) := -D_x g(x, y)\, C(x, y) + D_y g(x, y)$.   (17)

$C$ is the "correction" matrix that generates the tangential directions of $M$ in the sense that they are all of the form $(-C\Delta y, \Delta y)$. The matrix $S$ is the total derivative of $g$ with respect to these directions. It is also the Schur complement of $f_x$ in the abovementioned Jacobian $J$ (thus the letter "S").

One might say that the TBN Algorithm achieves a one-sided decoupling of $f$ and $g$: While the first part of one composite step may perturb $\|g\|$ considerably, the second one will not interfere with $\|f\|$ up to at least first order terms of $\Delta y$. Thus in a neighbourhood of the solution, both $\|f\|$ and $\|g\|$ will be reduced by the composite step. In fact, up to a quadratically small perturbation, exact TBN resembles Newton's method for the large system. (This can be seen by executing a formal block Gaussian elimination step in the linear system defining the Newton step.) Consequently the exact TBN scheme will even converge quadratically (cf. [18]).

[Figure 1: qualitative behaviour of TBN in $(x, y)$-space. The plot shows a TBN trajectory relative to the curves $f(x, y) = 0$ and $g(x, y) = 0$, with $x$ and $y$ on the axes.]
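For orientation, the following sketch restates one exact TBN step (11)-(17) with dense Jacobian blocks. The Jacobian callables are assumptions made for the illustration; forming them explicitly is precisely what the approximate variant below avoids.

```python
import numpy as np

def tbn_step(x, y, f, g, fx, fy, gx, gy):
    """One exact TBN step, cf. (11)-(17), with dense Jacobian blocks.

    fx, fy, gx, gy are callables returning the Jacobian blocks at a point.
    """
    # Newton step for f with respect to x, (11)-(12)
    dx = np.linalg.solve(fx(x, y), -f(x, y))
    x_plus = x + dx
    # Tangential data at (x_plus, y), (16)-(17)
    C = np.linalg.solve(fx(x_plus, y), fy(x_plus, y))   # C = fx^{-1} fy
    S = -gx(x_plus, y) @ C + gy(x_plus, y)              # Schur complement of fx
    # Newton step for g along the tangential directions, (13)-(15)
    dy = np.linalg.solve(S, -g(x_plus, y))
    return x_plus - C @ dy, y + dy
```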


4 Approximate Tangential Block-Newton

In [18] it was recognized that the key processes involved in TBN,

1. computing the Newton step $x_+ - x^n$,
2. multiplying $y$ type vectors by $C$,
3. solving the linear system $S \Delta y = -g$,

may in some sense be perturbed quadratically without losing the convergence properties of TBN. Taking this into account, [2], [3], and [4] proposed to use $\Phi$ iterations for 1. and 2. and to replace the exact linear solver in 3. by a suitable Krylov space iteration (more specifically, BiCGStab). While the iterative approach is still the basis of our modified algorithm, numerical results have shown that maintaining quadratic convergence within this framework may in fact be practically not feasible, as it may lead to a prohibitive number of $\Phi$ iterations per step. Since $\Phi$ itself is only linearly convergent, it seems wiser to aim at an overall linearly convergent variant of TBN. Unfortunately this also means abandoning the security of Newton's method. Considerably more sophisticated control mechanisms may be required to ensure convergence, but in the long run the effort will pay off. In this section we will introduce the framework of our modified method, the A(pproximate) T(angential) B(lock) N(ewton) algorithm. Details about its actual implementation, particularly about the setting of the method parameters, will follow in the next sections.

4.1 The f Step

We substitute the original Newton step for $f$ by $\Phi$ iterations. It is more or less intuitively clear that this makes sense, since both variants are designed to make $\|f\|$ smaller. In fact, if we only executed enough $\Phi$ iterations, the resulting step $\Delta x$ would be identical with the Newton step up to second order perturbations. But as we have already pointed out, this is not our intention. Instead we design the $f$ step such that its accuracy and length can be controlled using step parameters. To this end we introduce

  $0 < \lambda \le 1$,  $\kappa_1 \in \mathbb{N}$,

where $\kappa_1$ denotes the number of successive $\Phi$ iterations used and $\lambda$ allows damping of the resulting step:

  $\Delta x := \Phi^{\kappa_1}(x^n, y^n) - x^n$,   (18)
  $x_+ := x^n + \lambda \Delta x$.   (19)

In terms of efficiency, $\kappa_1$ helps balancing the quality and the costs of $\Delta x$. $\lambda$ will turn out to be the natural damping parameter of the composite ATBN step if the parameter $\mu$ introduced below is adapted reasonably. For a local analysis of ATBN, $\lambda$ can thus be assumed to be 1. However, we prefer to keep the value variable for later globalization purposes.

4.2 The g Step

In the $g$ step, we replace $C$ and $S$ by approximations based on the Neumann series of $f_x^{-1}$. Furthermore, instead of computing (13) exactly, we approximate the solution of the associated linear system $S(x_+, y^n) \Delta y = -g(x_+, y^n)$ using an iterative solver. The modified step has three parameters,

  $0 < \mu \le 1$,  $\kappa_2 \in \mathbb{N}$,  $0 < \varepsilon_1 \le 1$,

where $\kappa_2$ denotes the number of $\Phi$ iterations used for approximating $C$, $\varepsilon_1$ is the error bound for the relative residuum of the linear equation, and $\mu$ offers the opportunity of damping the computed $g$ step $\Delta y$. Our modified matrices are the following:

  $\tilde{C}(x, y) := \sum_{i=0}^{\kappa_2} (D_x\Phi(x, y))^i D_y f(x, y)$,   (20)
  $\tilde{S}(x, y) := -D_x g(x, y)\, \tilde{C}(x, y) + D_y g(x, y)$.   (21)
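As an illustration of (20), the action of $\tilde{C}$ on a vector can be accumulated from $\kappa_2$ successive products with $D_x\Phi$. The following minimal sketch assumes that matrix-vector products with $D_x\Phi$ and $D_y f$ are available as callables; section 7 shows how to obtain such products matrix free.

```python
import numpy as np

def Ctilde_matvec(w, dPhi_dx_matvec, dfy_matvec, kappa2):
    """Action of the truncated-Neumann approximation (20) on a vector w:
    Ctilde w = sum_{i=0}^{kappa2} (D_x Phi)^i (D_y f) w.
    No matrix is ever formed; only matvec callables are needed."""
    v = dfy_matvec(w)           # (D_y f) w, the i = 0 term
    result = v.copy()
    for _ in range(kappa2):     # add one more power of D_x Phi per sweep
        v = dPhi_dx_matvec(v)
        result += v
    return result
```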


As mentioned above, $\tilde{C}(x, y)$ is derived from $C$ by replacing $f_x^{-1}$ by a truncated Neumann series, and $\tilde{S}$ is the resulting approximation of $S$. Note that the above expressions are only intended for analytical use, not for use in the actual implementation. Many applications of ATBN will involve large numbers of variables, which can make computing and storing the matrices quite expensive. On the other hand, if we approach (22) using a "transpose free" iterative method, the only operations needed from these matrices are in fact matrix-vector products, and we will point out in section 7 how to achieve those matrix free. It will also become clear in that section why we refer to $\kappa_2$ as the number of $\Phi$ iterations used for $\tilde{C}$.

With the above matrix approximations, the $g$ step takes the form

  Find $\Delta y$ s.t. $\|\tilde{S}(x_+, y^n) \Delta y + g(x_+, y^n)\| \le \varepsilon_1 \|g(x_+, y^n)\|$,   (22)
  $x^{n+1} := x_+ - \mu \tilde{C} \Delta y$,   (23)
  $y^{n+1} := y^n + \mu \Delta y$.   (24)

For (22), use BiCGStab or some other transpose free iterative solver. (For details, refer to section 7.)

While $\kappa_2$ can be viewed as balancing the accuracy of the tangential space and the computational costs, $\varepsilon_1$ defines the required quality of the Newton step for given $\tilde{C}$. This will become clear in section 5. $\mu$ is again a damping parameter. We will propose an optimal choice of $\mu$ depending, among other things, on the value of $\lambda$. Surprisingly, due to the formulation of the minimization problem (9) and the approximations involved, $\lambda = 1$ will not imply $\mu = 1$. Numerical experiments suggest that typically $\mu$ will instead be slightly smaller than 1.
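The following sketch assembles the $g$ step (22)-(24) around a minimal, unpreconditioned BiCGStab in the sense of [24]. It is an illustration only (no breakdown safeguards), and the matrix-vector product callables, assumed to be evaluated at the current point $(x_+, y^n)$, are assumptions.

```python
import numpy as np

def bicgstab(matvec, b, rtol, maxit=200):
    """Minimal unpreconditioned BiCGStab; stops once the relative residual
    drops below rtol, as required in (22). No breakdown handling."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    r_hat = r.copy()
    rho = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = r + beta * (p - omega * v)
        v = matvec(p)
        alpha = rho / (r_hat @ v)
        s = r - alpha * v
        t = matvec(s)
        omega = (t @ s) / (t @ t)
        x += alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) <= rtol * bnorm:
            break
    return x

def g_step(x_plus, y, g, Ctilde_mv, Stilde_mv, mu, eps1):
    """The approximate g step (22)-(24) for given matvec callables."""
    dy = bicgstab(Stilde_mv, -g(x_plus, y), rtol=eps1)
    return x_plus - mu * Ctilde_mv(dy), y + mu * dy
```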


5 Analysis of the ATBN Step

In this section we analyse the ATBN step as defined in section 4. First we study the effect of the $f$ step on the norms of $f$ and $g$, then the effect of the $g$ step. Finally we combine the results to get estimates for the effect of the combined step. This section may in part seem too technical, but it is a necessary basis for the convergence proof in section 6 and for the development of a reasonable control mechanism in section 7.

In the sequel, $f_n$ and $g_n$ will denote the values of $f$ and $g$ at $(x^n, y^n)$, respectively. Similarly, we shall denote the values at $(x_+, y^n)$ by $f_+$ and $g_+$.

First of all we will show that the $f$ step basically reduces $f$ as in

  $\|f_+\| \le ((1 - \lambda) + \lambda q^{\kappa_1}) \|f_n\| + \lambda O(\|f_n\|^2)$.   (25)

This is indeed what we intended: The undamped step ($\lambda = 1$) will reduce the norm of $f$ by a factor of $q^{\kappa_1}$, and damping can be roughly described as linearly interpolating the values for $\lambda = 0$ and $\lambda = 1$. In this and the following estimates, we keep track of the damping parameters in the remainder terms because we want to stress how damping can force convergence even if the norms of $f$ and $g$ are not yet particularly small. See the convergence theorem in section 6 for an illustration. Basically, this is already a nod in the direction of globalization.

To prove the result, we note that by the implicit function theorem there is a mapping $y \mapsto x^* = x^*(y)$ s.t. in a neighbourhood of $(\bar{x}, \bar{y})$, $(x^*, y)$ solves $f = 0$. An easy induction proves

  $\Phi^{\kappa}(x^n, y^n) = x^*(y^n) + (D_x\Phi(x^*(y^n), y^n))^{\kappa} (x^n - x^*(y^n)) + O(\|f_n\|^2)$  $\forall \kappa \in \mathbb{N}$,   (26)

and (suppressing subscripts $n$ for once) we get

  $f_+ = f + \lambda D_x f(x, y) \Delta x + \lambda^2 O(\|\Delta x\|^2)$
  $= f + \lambda (I - D_x\Phi(x, y)) (\Phi^{\kappa_1} - x) + \lambda^2 O(\|\Delta x\|^2)$
  $= f + \lambda (I - D_x\Phi(x^*, y) + O(\|f\|)) \left[ ((D_x\Phi(x^*, y))^{\kappa_1} - I)(x - x^*) + O(\|f\|^2) \right] + \lambda^2 O(\|f\|^2)$
  $= f + \lambda (I - D_x\Phi(x^*, y)) ((D_x\Phi(x^*, y))^{\kappa_1} - I)(x - x^*) + \lambda O(\|f\|^2)$
  $= f + \lambda ((D_x\Phi(x^*, y))^{\kappa_1} - I)(I - D_x\Phi(x^*, y))(x - x^*) + \lambda O(\|f\|^2)$
  $= [(1 - \lambda) I + \lambda (D_x\Phi(x^*, y))^{\kappa_1}]\, f + \lambda O(\|f\|^2)$,

which proves (25). To estimate $\|g_+\|$, we first note that the definition of $\Delta x$ directly implies

  $\|\Delta x\| \le \frac{1 - q^{\kappa_1}}{1 - q} \|f_n\|$,   (27)

and since

  $g_+ = g_n + \lambda D_x g(x^n, y^n) \Delta x + \lambda^2 O(\|\Delta x\|^2)$,   (28)

we get

  $\|g_+\| \le \|g_n\| + \lambda \beta\, \frac{1 - q^{\kappa_1}}{1 - q} \|f_n\| + \lambda^2 O(\|f_n\|^2)$, where   (29)
  $\beta := \max_{(x,y) \in U} \|D_x g(x, y)\|$.   (30)

This result can be interpreted as follows: Up to first order terms, the change of $\|g\|$ will be proportional to the step length. This is mirrored by the factor $\lambda$ in the second right hand side term. To eliminate the unknown quantity $\|\Delta x\|$ from the estimate, we have used an upper bound which clearly shows that the step length will depend on the residuum $\|f_n\|$ before the step and on the number of iterations $\kappa_1$. Due to the contractivity of $\Phi$, there is a saturation as $\kappa_1 \to \infty$, i.e. the $\Phi$ iteration steps most harmful to $\|g\|$ are the first few. The factor $\beta$ is a measure for the sensitivity of $\|g\|$ w.r.t. changes of $x$. If $f$ and $g$ were decoupled, $\beta$ would be 0.

Let us now have a look at the $g$ step. Its effect on the norm of $g$ itself is readily computed: Using (21), (23) and (24), we get

  $g_{n+1} = g_+ + \mu \tilde{S}(x_+, y^n) \Delta y + \mu^2 O(\|\Delta y\|^2)$,   (31)

and since multiplying (22) by $\tilde{S}^{-1}$ yields

  $\|\Delta y + \tilde{S}^{-1}(x_+, y^n)\, g_+\| \le \varepsilon_1 \|\tilde{S}^{-1}(x_+, y^n)\| \cdot \|g_+\|$,   (32)

the $y$ step size is bounded according to

  $\|\Delta y\| \le (1 + \varepsilon_1) \|\tilde{S}^{-1}(x_+, y^n)\| \cdot \|g_+\|$.   (33)

This implies that we can replace $O(\|\Delta y\|^2)$ by $O(\|g_+\|^2)$, s.t. we can combine (22) and (31) to get

  $\|g_{n+1}\| \le ((1 - \mu) + \mu \varepsilon_1) \|g_+\| + \mu^2 O(\|g_+\|^2)$.   (34)

To better understand this estimate, let us first note that the damping parameter $\mu$ again achieves a linear interpolation between the extremes $\mu = 0$ and $\mu = 1$. From now on, consider the undamped case $\mu = 1$. In this case the $g$ step basically reduces $\|g_+\|$ by a factor of $\varepsilon_1$. This means that up to first order terms the parameter $\varepsilon_1$ is the contraction rate that we demand from the approximate Newton step for $g$. As $\varepsilon_1 \to 0$, the behaviour of the step will resemble superlinear convergence, which hints at the fact that $\Delta y$ will converge against the exact Newton step for given $\tilde{C}$. (Obviously, (22) quite literally becomes the equation of the Newton step.) Note that like in the case of the $f$ step, quadratic decrease of the error would increase the computational costs; thus it is not a feature that we really strive for in our actual implementation.

If you are puzzled by the fact that the accuracy of $\tilde{C}$ has no effect on the improvement of $\|g_+\|$, remember that the approximate directions used in the definition of (22) are also used for the actual update of $(x^{n+1}, y^{n+1})$. Consequently the undamped step will (up to first order terms) actually reduce $\|g\|$ as predicted in (22), and the error in $\tilde{C}$ will only affect $\|f_{n+1}\|$.

To compute an estimate for $\|f_{n+1}\|$, we use $\tilde{C} - C = -(D_x\Phi)^{\kappa_2 + 1} C$ to write the increment as

  $f_{n+1} - f_+ = -\mu (D_x f(x_+, y^n) \tilde{C}(x_+, y^n) - D_y f(x_+, y^n)) \Delta y + \mu^2 O(\|\Delta y\|^2)$
  $= -\mu D_x f(x_+, y^n) (\tilde{C}(x_+, y^n) - C(x_+, y^n)) \Delta y - \mu (D_x f(x_+, y^n) C(x_+, y^n) - D_y f(x_+, y^n)) \Delta y + \mu^2 O(\|\Delta y\|^2)$
  $= -\mu D_x f(x_+, y^n) (\tilde{C}(x_+, y^n) - C(x_+, y^n)) \Delta y + \mu^2 O(\|\Delta y\|^2)$
  $= \mu D_x f(x_+, y^n) (D_x\Phi(x_+, y^n))^{\kappa_2 + 1} C(x_+, y^n) \Delta y + \mu^2 O(\|\Delta y\|^2)$
  $= \mu (D_x\Phi(x_+, y^n))^{\kappa_2 + 1} D_x f(x_+, y^n) C(x_+, y^n) \Delta y + \mu^2 O(\|\Delta y\|^2)$
  $= \mu (D_x\Phi(x_+, y^n))^{\kappa_2 + 1} D_y f(x_+, y^n) \Delta y + \mu^2 O(\|\Delta y\|^2)$,   (35)

and using (33), we finally get

  $\|f_{n+1}\| \le \|f_+\| + \mu (1 + \varepsilon_1) q^{\kappa_2 + 1} \sigma \|g_+\| + \mu^2 O(\|g_+\|^2)$, where   (36)
  $\sigma := \max_{(x,y) \in U} \|D_y f(x, y)\| \cdot \|\tilde{S}^{-1}(x, y)\|$.   (37)

In most respects this estimate resembles (29) with the roles of $f$ and $g$ reversed: Up to first order terms, the change of $\|f_+\|$ will be proportional to the damping factor $\mu$, and there is an analogous relation to the size of $\|\Delta y\|$ which gets obscured by the fact that we have replaced this unknown quantity by other known quantities. The size of $\|g_+\|$ has basically the same effect as the size of $\|f_n\|$ in the aforementioned estimate.

The basic new feature here is the factor $q^{\kappa_2 + 1}$. It shows how improving the quality of $\tilde{C}$ will lessen the sensitivity of $f$ with respect to the $g$ step. For $\kappa_2 \to \infty$, $\tilde{C}$ would converge towards the exact matrix $C$, and the one-sided decoupling mentioned in the context of exact TBN would become apparent, since up to first order terms the $g$ step would not affect $\|f_+\|$ at all.

It may be noted that while $\kappa_2$ was not involved in (34), the present estimate is also very nearly independent of $\varepsilon_1$. Thus the effects of these parameters are quite easy to separate, which helps understanding the way in which ATBN will work.

Finally, let us have a look at the constant $\sigma$. In analogy to $\beta$, this quantity can be described as a measure for the sensitivity of $\|f\|$ w.r.t. certain variations. More precisely, $\sigma$ consists of two components: $\|\tilde{S}^{-1}\|$ measures the response of $g$ to variations in the approximate tangential directions. If for example $f$ and $g$ are strongly coupled, small angles may occur between tangential directions of $\{f \equiv \mathrm{const.}\}$ and tangential directions of $\{g \equiv \mathrm{const.}\}$, with the unfortunate consequence that $\|g\|$ may react poorly to variations along the direction used for the $g$ step. In such a situation $\|\tilde{S}^{-1}\|$, and probably the Newton step as well, will become large. In keeping with this interpretation, $\|\tilde{S}^{-1}\|$ has entered our formula via (33).

The second component of $\sigma$ is $\|D_y f(x, y)\|$: This quantity measures the sensitivity of $f$ w.r.t. variations in $y$ that are not accompanied by any tangential correction in $x$. It has to be multiplied by the aforementioned factor $q^{\kappa_2 + 1}$ to indicate the sensitivity w.r.t. the directions generated using $\tilde{C}$.

In the sequel, let $m_n$ denote $m_n := \max(\|f_n\|, \|g_n\|)$. Then, by combining the above inequalities, we can finally estimate the effect of the combined step on the norms of $f$ and $g$:

  $\|f_{n+1}\| \le ((1 - \lambda) + \lambda q^{\kappa_1}) \|f_n\| + \mu (1 + \varepsilon_1) q^{\kappa_2 + 1} \sigma \|g_n\| + \lambda \mu\, q^{\kappa_2 + 1}\, \frac{1 - q^{\kappa_1}}{1 - q}\, \sigma \beta \|f_n\| + \mu^2 O(m_n^2) + \lambda^2 O(m_n^2)$,   (38)
  $\|g_{n+1}\| \le ((1 - \mu) + \mu \varepsilon_1) \left( \|g_n\| + \lambda \beta\, \frac{1 - q^{\kappa_1}}{1 - q} \|f_n\| \right) + \mu^2 O(m_n^2) + \lambda^2 O(m_n^2)$.   (39)

Roughly speaking, the first right-hand side term in (38) represents the (intended) reduction of $\|f\|$ by the $f$ step, the second one (with factor $\mu$) accounts for the perturbation of $\|f\|$ that would be expected from the $g$ step if executed before the $f$ step, and the third term (with factor $\lambda\mu$) is some sort of a feedback in that it models the additional perturbation of $\|f\|$ caused by having to start the $g$ step with an already perturbed $g$. The factor $((1 - \mu) + \mu \varepsilon_1)$ on the right-hand side of (39) accounts for the (intended) reduction of $\|g\|$ in the $g$ step, while the perturbation in the other bracket (with factor $\lambda$) is the unwelcome effect of the $f$ step on $g$ that is again being mirrored in the feedback term of (38). For further details, please refer to the preceding discussion of the separate $f$ and $g$ steps.

6 Linear Convergence

In the present section we will prove the local linear convergence of ATBN on the condition that the method parameters are set in a reasonable way. Since the result will not rely on a specific control mechanism, we will postpone presenting ours to section 7.


To avoid complicating matters by dealing with different cases, we first replace $\|f_n\|$ and $\|g_n\|$ by the maximum $m_n = \max(\|f_n\|, \|g_n\|)$ on the right hand sides of (38) and (39). Our actual control mechanism will be more subtle than that, but for the purpose of the proof the simplified model suffices:

  $\|g_{n+1}\| \le q_{g,2}\, m_n + \mu^2 O(m_n^2) + \lambda^2 O(m_n^2)$, where   (40)
  $q_{g,1} := 1 + \lambda \beta\, \frac{1 - q^{\kappa_1}}{1 - q}$,   (41)
  $q_{g,2} := ((1 - \mu) + \mu \varepsilon_1)\, q_{g,1}$, and   (42)
  $\|f_{n+1}\| \le q_{f,2}\, m_n + \mu^2 O(m_n^2) + \lambda^2 O(m_n^2)$, where   (43)
  $q_{f,1} := (1 - \lambda) + \lambda q^{\kappa_1}$,   (44)
  $q_{f,2} := q_{f,1} + \mu (1 + \varepsilon_1) q^{\kappa_2 + 1} \sigma\, q_{g,1}$.   (45)

Viewing the factors $q_{f,2}$ and $q_{g,2}$ as functions of the damping parameters, we find

  $\frac{\partial}{\partial \lambda} q_{g,2} \ge 0$,  $\frac{\partial}{\partial \mu} q_{g,2} < 0$,  $\frac{\partial}{\partial \lambda} q_{f,2} < 0$,  $\frac{\partial}{\partial \mu} q_{f,2} \ge 0$.   (46)

In particular, increasing $\mu$ will increase $q_{f,2}$ and decrease $q_{g,2}$. Thus, supposing all other parameters are fixed, the optimal choice of $\mu$ will be

  $\mu = \max(0, \min(1, \mu^*))$, where   (47)
  $\mu^* := \frac{q_{g,1} - q_{f,1}}{q_{g,1}\left((1 + \varepsilon_1) q^{\kappa_2 + 1} \sigma + (1 - \varepsilon_1)\right)} = \frac{\lambda (1 - q^{\kappa_1}) \left(1 + \frac{\beta}{1 - q}\right)}{q_{g,1}\left((1 + \varepsilon_1) q^{\kappa_2 + 1} \sigma + (1 - \varepsilon_1)\right)}$,   (48)

because $\mu^*$ achieves $q_{f,2} = q_{g,2}$. (Moreover, $\mu^*$ is the only value of $\mu$ to achieve this as long as $\lambda \neq 0$.) Elementary calculations using $1 \le q_{g,1} \le (1 + \beta)/(1 - q)$ yield

  $K_- \lambda \le \mu^* \le K_+ \lambda$, where   (49)
  $K_- = \frac{(1 - q)^2}{(1 + \beta)(1 + 2\sigma q)}$,   (50)
  $K_+ = \frac{1 + \beta}{(1 - q)(1 - \varepsilon_1)}$.   (51)

Thus basically $\mu^*$ and $\lambda$ are proportional, and $\mu^* > 0$ is satisfied for every $0 < \lambda \le 1$. Our choice of $\mu$ now implies

  $m_{n+1} \le \max(q_{f,2},\, \varepsilon_1 q_{g,1})\, m_n + \lambda^2 O(m_n^2)$,   (52)

and using the upper bound $K_+$, a short calculation yields

  $\frac{m_{n+1}}{m_n} \le \max(q_\lambda, q_{\min}) + \lambda^2 O(m_n)$, where   (53)
  $q_\lambda = (1 - \lambda) + \lambda \left( q^{\kappa_1} + q^{\kappa_2} \cdot 2q\sigma\, \frac{(1 + \beta)^2}{(1 - q)^2 (1 - \varepsilon_1)} \right)$,   (54)
  $q_{\min} = \frac{\varepsilon_1}{1 - \varepsilon_1} \cdot \frac{(1 + \beta)^2}{(1 - q)^2}$.   (55)

As the names suggest, $q_\lambda$ is the reduction factor associated with $\mu^*$ while $q_{\min}$ relates to the case $\mu = 1 < \mu^*$. For $\kappa_1, \kappa_2 \to \infty$ and $\varepsilon_1 \to 0$ the estimate simply reads

  $\frac{m_{n+1}}{m_n} \le (1 - \lambda) + \lambda^2 O(m_n)$.   (56)

This illustrates that asymptotically the ATBN step indeed behaves like a damped Newton step; in particular, the residuum will be reduced quadratically as $\lambda \to 1$. If we introduce an $\varepsilon_1 > 0$ again, (56) will hold true as long as $(1 - \lambda)$ remains larger than $q_{\min}$: As we have already remarked, $\varepsilon_1$ is a bound for the reduction of $\|g\|$ in the $g$ step. On the other hand, if we keep $\varepsilon_1 = 0$ and let $\kappa_1$ or $\kappa_2$ be a finite number, the lower bound $q_{\min}$ will disappear, but instead $q_\lambda$ will fail to approximate 0 as $\lambda \to 1$: Obviously, computing the $g$ step exactly will not yet ensure quadratic convergence. Also note that the formula (54) confirms quite clearly that the necessity of using large $\kappa_2$ depends very much on $\sigma$ and $\beta$, as should have been expected.
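A small sketch of the quantities (41)-(48); the routine names are ours, and the inputs $q$, $\beta$, $\sigma$ would in practice come from the estimates developed in section 7.

```python
def model_rates(lam, mu, eps1, q, beta, sigma, kappa1, kappa2):
    """First-order reduction factors (41)-(45) of the simplified model."""
    qg1 = 1.0 + lam * beta * (1.0 - q**kappa1) / (1.0 - q)
    qf1 = (1.0 - lam) + lam * q**kappa1
    qg2 = ((1.0 - mu) + mu * eps1) * qg1
    qf2 = qf1 + mu * (1.0 + eps1) * q**(kappa2 + 1) * sigma * qg1
    return qf2, qg2

def optimal_mu(lam, eps1, q, beta, sigma, kappa1, kappa2):
    """mu from (47)-(48): the damping that balances qf2 and qg2."""
    qg1 = 1.0 + lam * beta * (1.0 - q**kappa1) / (1.0 - q)
    qf1 = (1.0 - lam) + lam * q**kappa1
    mu_star = (qg1 - qf1) / (qg1 * ((1.0 + eps1) * q**(kappa2 + 1) * sigma
                                    + (1.0 - eps1)))
    return max(0.0, min(1.0, mu_star))
```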


More importantly, (53) shows that for any combination of sufficiently large $\kappa_1$, $\kappa_2$ and sufficiently small $\varepsilon_1$ the ATBN algorithm will exhibit linear convergence. Furthermore it is apparently possible and reasonable to choose $\lambda = 1$ near the solution; $\lambda$ is thus a true damping parameter for the composite ATBN step which will only be needed for globalization purposes. Finally, the attainable factor $m_{n+1}/m_n$ is only bounded by $O(m_n)$ from below, which means that (theoretically) any convergence rate can be achieved. This is not to be misunderstood as a recommendation: Always bear in mind that the convergence rate $m_{n+1}/m_n$ refers to the composite ATBN step as a black box solver. It does not allow for an evaluation of the computational costs involved. We will come back to this point in the following theorem:

Theorem (Convergence of Approximate Tangential Block-Newton)
Let the assumptions of section 2 hold true.

a) ATBN can be made to converge linearly with any given convergence rate $0 < q_{\mathrm{comp.}} < 1$. More precisely, there is a neighbourhood of the solution such that for sufficiently large $\kappa_1$, $\kappa_2$ and sufficiently small $\varepsilon_1$

  $\frac{m_{n+1}}{m_n} \le (1 - \lambda) + \lambda\, q_{\mathrm{comp.}}$   (57)

will hold for all $0 \le \lambda \le 1$ and suitably chosen $\mu = \mu(\lambda)$. In particular, the convergence rate $q_{\mathrm{comp.}}$ is achieved for $\lambda = 1$. Using smaller $\lambda$ will in most cases enlarge the domain of attraction of the solution (as well as the domain of validity for (57)).

b) ATBN can be made to achieve an effective convergence rate $q \le q_{\mathrm{eff.}} < 1$ in the following sense: Assume that the iterative solver always succeeds in solving the modified $g$ equation (22) and that the number of iteration steps needed is bounded for any given $\varepsilon_1$. Let $\omega = \omega(x, y, \kappa_1, \kappa_2, \varepsilon_1, \lambda)$ denote the number of $\Phi$ evaluations needed to compute the ATBN step. Then there is a neighbourhood of the solution and a $q \le q_{\mathrm{eff.}} < 1$ such that for $\lambda = 1$ and suitably chosen $\kappa_1$, $\kappa_2$, $\varepsilon_1$ and $\mu$ the ATBN step will satisfy

  $\left( \frac{m_{n+1}}{m_n} \right)^{1/\omega} \le q_{\mathrm{eff.}}$   (58)

The proof will be sketched below. To convince yourself that the assumption of b) on the linear solver is reasonable, suppose $w_0 = 0$ is used as the start approximation. The corresponding linear residuum $\tilde{S} w_0 + g_+$ will then be $g_+$, and (22) can be interpreted as a condition to reduce the linear residuum by a (fixed) factor of $\varepsilon_1$. Any acceptable iteration process should achieve that within a finite number of steps, and in fact there should also be a bound on that number which (in a neighbourhood of $(\bar{x}, \bar{y})$) only depends on $\varepsilon_1$. Unfortunately it is known that for the kind of solvers we want to use, namely BiCGStab or comparable Krylov subspace methods, the convergence behaviour is very hard to predict. Even if the notorious breakdowns of Lanczos based algorithms are excluded, the methods may fail to converge in finite precision arithmetic, see [12]. On the other hand, even if exact arithmetic is assumed and the iterates of the method at hand satisfy a minimization property, the convergence may still be intolerably slow, see for example [11]. In other words, the typical iteration process will not be acceptable for all kinds of $\tilde{S}$. The problem seems intrinsic, and the best help is probably providing alternative solvers to choose from if the default method should fail. Note that if one Krylov subspace method fails to solve a problem efficiently, another one may still succeed ([19]).

Proof. Part a) of the theorem is an immediate consequence of (53), which is almost also true for part b): Under the assumptions of b) the number of $\Phi$ evaluations needed to compute the step inside a given neighbourhood of the solution can be bounded in dependence of the parameters $\kappa_1$, $\kappa_2$ and $\varepsilon_1$ only,

  $\omega(x, y, \kappa_1, \kappa_2, \varepsilon_1, \lambda) \le \bar{\omega}(\kappa_1, \kappa_2, \varepsilon_1)$.   (59)

Thus, choosing a neighbourhood, a $q_{\mathrm{comp.}}$ and suitable $\kappa_1$, $\kappa_2$ and $\varepsilon_1$ according to a) yields

  $\left( \frac{m_{n+1}}{m_n} \right)^{1/\omega} \le \left( \frac{m_{n+1}}{m_n} \right)^{1/\bar{\omega}} \le (q_{\mathrm{comp.}})^{1/\bar{\omega}}$,   (60)

which proves that indeed a $q_{\mathrm{eff.}}$ exists and $(q_{\mathrm{comp.}})^{1/\bar{\omega}}$ is a possible choice. Optimizing the value of $q_{\mathrm{eff.}}$ will be the task of an intelligent control mechanism (cf. the following section).
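Part b) is best read through the effective rate $(m_{n+1}/m_n)^{1/\omega}$: a cheap step with a modest reduction can beat an expensive step with a large one. A tiny illustration with made-up numbers:

```python
def effective_rate(reduction, n_phi_evals):
    """Effective rate per Phi evaluation, cf. (58): the per-step
    reduction factor taken to the power 1/omega."""
    return reduction ** (1.0 / n_phi_evals)

# Hypothetical comparison: a cautious step costing 100 Phi evaluations
# with a reduction of 1e-2 beats an aggressive step costing 600
# evaluations with a reduction of 1e-6.
print(effective_rate(1e-2, 100))   # ~0.955
print(effective_rate(1e-6, 600))   # ~0.977
```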


7 Practicalities

The present section is dedicated to the actual implementation of ATBN. It divides into two subsections: The first, "Matrix Implementation", addresses the handling of the matrices $\tilde{C}$ and $\tilde{S}$ as defined in section 4. It points out how computing and storing these matrices can be avoided by interpreting matrix-vector products as directional derivatives. In the second subsection, "Parameters", we develop a control mechanism for the method parameters $\mu$, $\kappa_1$, and $\kappa_2$ which aims at minimizing the effective error reduction $q_{\mathrm{eff.}}$ defined in the convergence theorem, or rather the effective reduction derived from (38) and (39). Note that there is a difference due to certain simplifications involved in computing (53).

7.1 Matrix Implementation

As mentioned in section 4, we do not recommend computing the matrices $\tilde{C}$ and $\tilde{S}$ as defined. Instead we propose to organize ATBN such that only matrix-vector products with these matrices are needed, and to compute these products with the help of appropriate differencing schemes. One way to do this is solving (22) with the help of Krylov subspace methods.

This outline presumes that the number $k_c$ of coupling variables (i.e. the number of components of $y$) is reasonably large. Otherwise it might be preferable to assemble the matrix $\tilde{S}$ and use a direct solver to compute $\Delta y$ from $\tilde{S} \Delta y = -g_+$. One way to achieve this would be to compute the columns of $\tilde{S}$ as $\tilde{S} e_i$ for $i = 1, \ldots, k_c$ using the differencing scheme described below.

Krylov subspace methods, however hard to analyse in the case of nonsymmetric system matrices, are widely popular and well-respected. Although systems can be constructed which make them fail or perform very poorly, it is generally believed that they will work well for certain classes of large systems with "a lot of structure", particularly in combination with an appropriate preconditioner. However, since in general our matrix $\tilde{S}$ will not be symmetric, many of the usual methods, like CGNE, CGNR, LSQR, BiCG, BiORes, or QMR, will not only require matrix-vector products with $\tilde{S}$ but also matrix-vector products with its transpose $\tilde{S}^T$, and since it is not clear how to compute these via clever differencing schemes, we restrict ourselves to the class of transpose free algorithms. These are mainly orthogonalization methods based on the Arnoldi process, like GMRES ([20]) or truncated versions of it (e.g. GMRES(m)), or "squared" methods based on the two-sided Lanczos process, like CGS ([23]), BiCGStab ([24]), BiCGStab(ℓ) ([22]), or TFQMR ([10]). (For an overview of Lanczos-type solvers, one can also refer to [13].)

Assuming we use one of these solvers, we only need to know how to approximate matrix-vector products of $\tilde{C}$ and $\tilde{S}$. In the case of $\tilde{C}$, this can be done using the differencing scheme

  $\tilde{C}(x, y)\, w \simeq \psi_w^{\kappa_2 + 1}(0; x, y)$, where   (61)
  $\psi_w(r; x, y) := \frac{\Phi(x + h_2 r, y) - \Phi(x, y)}{h_2} + \frac{f(x, y + h_1 w) - f(x, y)}{h_1}$.   (62)

The formula is an immediate consequence of $\psi_w(r; x, y) \simeq \Phi_x(x, y)\, r + f_y(x, y)\, w$. It is clear from this argument that $h_1$ and $h_2$ are supposed to be small numbers suited to approximate the implied directional derivatives. As a rule of thumb, they should be of the order of $\sqrt{\varepsilon}$, where $\varepsilon$ is the machine precision. More precisely, we use

  $h_1 := \sqrt{\varepsilon}\, \frac{\max(\|y\|, 1)}{\max(\|w\|, \varepsilon_2)}$,   (63)
  $h_2 := \sqrt{\varepsilon}\, \frac{\max(\|x\|, 1)}{\max(\|r\|, \varepsilon_2)}$.   (64)

Here $\varepsilon_2$ is a very small positive number designed to avoid zerodivide errors. The choice of the $h_i$ in these formulae is inspired by [9].
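A minimal sketch of the differencing scheme (61)-(64): iterating $\psi_w$, started at $r = 0$, exactly $\kappa_2 + 1$ times yields an approximation of $\tilde{C}(x, y)\, w$. `Phi` and `f` are the callables of section 2, and `eps2` plays the role of $\varepsilon_2$; the function name is ours.

```python
import numpy as np

def Ctilde_matvec_fd(w, x, y, Phi, f, kappa2, eps2=1e-30):
    """Matrix-free product Ctilde(x, y) w via (61)-(64): kappa2 + 1
    applications of the difference operator psi_w of (62)."""
    sqrt_eps = np.sqrt(np.finfo(float).eps)
    h1 = sqrt_eps * max(np.linalg.norm(y), 1.0) / max(np.linalg.norm(w), eps2)
    fy_w = (f(x, y + h1 * w) - f(x, y)) / h1   # (D_y f) w, reused in every sweep
    Phi_xy = Phi(x, y)                         # reference value for the x difference
    r = np.zeros_like(x)
    for _ in range(kappa2 + 1):                # one fresh Phi evaluation per sweep
        h2 = sqrt_eps * max(np.linalg.norm(x), 1.0) / max(np.linalg.norm(r), eps2)
        r = (Phi(x + h2 * r, y) - Phi_xy) / h2 + fy_w   # psi_w(r; x, y), cf. (62)
    return r
```

This also makes transparent why $\kappa_2$ counts $\Phi$ iterations: each application of $\psi_w$ costs one new evaluation of $\Phi$.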


Assuming that $\tilde{C}$ is handled as shown, we can approximate matrix-vector products of $\tilde{S}$ according to

  $\tilde{S}(x, y)\, w \simeq \frac{g(x - h_3 \tilde{C}(x, y) w,\; y + h_3 w) - g(x, y)}{h_3}$.   (65)

This follows from the fact that if we define

  $\tilde{g}_{(x,y)}(\Delta y) := g(x - \tilde{C}(x, y) \Delta y,\; y + \Delta y)$   (66)

to represent $g$ on the approximate tangential space, directional derivatives with respect to the coordinate directions will correspond to products of $\tilde{S}$ and the respective direction vectors:

  $\frac{\partial}{\partial w} \tilde{g}_{(x,y)}(0) = \tilde{S}(x, y)\, w$.   (67)

Similar to $h_1$ and $h_2$, we choose as $h_3$

  $h_3 := \sqrt{\varepsilon}\, \frac{\max(\|(x, y)\|, 1)}{\max(\|(\tilde{C} w, w)\|, \varepsilon_2)}$.   (68)

7.2 Parameters

We will now discuss the control mechanism for ATBN's method parameters. In keeping with the previous sections, we shall concentrate on the local convergence behaviour of the method. Consequently, we suppose that the remainder terms of our estimates will be negligible and that it will be possible to let $\lambda = 1$ throughout. For the sake of later generalizations we will develop our control mechanism for arbitrary given $\lambda$, but adaptation strategies for $\lambda$ itself as well as other aspects of globalization will be postponed to a forthcoming paper.

The following control mechanism may be enhanced by including special strategies for certain special situations: For example, the cases $\|f\| \ll \|g\|$ and $\|f\| \gg \|g\|$ can be treated separately by executing only the $g$ or the $f$ step, respectively; these cases may actually occur during a startup phase, depending on the starting values.

Our control mechanism basically works like this:

1. $\lambda = 1$ unless complications occur (in which case $\lambda$ may have to be damped).
2. $\varepsilon_1$ is provided by the user. $\kappa_1$ and $\kappa_2$ will adapt to $\varepsilon_1$, but as a rule of thumb, $\varepsilon_1$ should be chosen small but not too small, to avoid unnecessary costs in computing (22). It may be a good idea to warn the user if the given value of $\varepsilon_1$ does not seem reasonable, or to adapt the value automatically.
3. We supply a formula to choose an optimal $\mu$ provided all the other parameters are fixed.
4. We eliminate $\mu$ from the model by inserting this optimal value. Thus we can consider the estimated effective reduction factor $q_{\mathrm{eff.}}$ as a function of $\kappa_1$ and $\kappa_2$. We minimize this function.
5. We use the values $\kappa_i$ and the associated optimal $\mu$ that we have just computed. At this point a damping strategy can attack if the true results differ significantly from those of the model.

The decision to let the user supply $\varepsilon_1$ stems from the observation that there is a certain redundancy among the ATBN method parameters, and for the typical case of a fairly poor contraction rate $q$ the values of the $\kappa_i$ will be much more significant for the quality of the step than $\varepsilon_1$. Furthermore it would be difficult to predict the number of iteration steps associated with a given value of $\varepsilon_1$.

Let us now work out the details of $\mu$ and the $\kappa_i$: If we view the right-hand sides of the estimates (38) and (39) as functions of $\lambda$ and $\mu$, a similar argument as in (46) will show that if everything else is fixed, the optimal choice of $\mu$ again will be

  $\mu = \min(1, \mu^*)$,   (69)

where $\mu^*$ is the value of $\mu$ for which the right-hand sides of (38) and (39) coincide. Note that this is not exactly the $\mu^*$ from (48) because we have not substituted $m_n$ for the values of $\|f_n\|$ and $\|g_n\|$. To find an approximation, we interpolate $\|f\|$ and $\|g\|$ linearly between $(x_+, y^n)$ and $(x_+ - \tilde{C}\Delta y,\, y^n + \Delta y)$, that is, between $\mu = 0$ and $\mu = 1$. With "++" denoting the values for $\mu = 1$, the result takes the form

  $\mu^* = \frac{\|g_+\| - \|f_+\|}{(\|f_{++}\| - \|f_+\|) - (\|g_{++}\| - \|g_+\|)}$,   (70)


  $m_{n+1} \simeq \frac{\|f_{++}\| \cdot \|g_+\| - \|f_+\| \cdot \|g_{++}\|}{(\|f_{++}\| - \|f_+\|) - (\|g_{++}\| - \|g_+\|)}$.   (71)

(The formula for $m_{n+1}$ has been included because it can be helpful in checking the validity of the linear interpolation.) The choice of $\mu$ can be fine-tuned with respect to special situations (like e.g. $\|f_{++}\| \approx \|f_+\|$) but $\mu^*$ is a reasonable standard.

To choose $\kappa_1$ and $\kappa_2$, we now provide a model for the composite step: We use the right hand sides of (25), (36), (29), (34) to model the norms of $f$ and $g$. We eliminate $\mu$ from the expressions by inserting the optimal value just computed. Then we get:

  $\|f_+\| \simeq ((1 - \lambda) + \lambda q^{\kappa_1}) \|f_n\|$,   (72)
  $\|g_+\| \simeq \|g_n\| + \lambda \beta\, \frac{1 - q^{\kappa_1}}{1 - q} \|f_n\|$,   (73)
  $m_{n+1} \simeq \max(\varepsilon_1 \|g_+\|,\, m^*_{n+1})$, where   (74)
  $m^*_{n+1} := \frac{(1 - \varepsilon_1) \|f_+\| + (1 + \varepsilon_1)\, \sigma q^{\kappa_2 + 1} \|g_+\|}{(1 - \varepsilon_1) + (1 + \varepsilon_1)\, \sigma q^{\kappa_2 + 1}}$.   (75)

If we want to evaluate these estimates for given values of the $\kappa_i$, we will have to supply $q$, $\beta$, and $\sigma$. To this purpose, we again refer to our estimates (25), (29), and (36). Supposing we have computed $\|f_+\|$, $\|g_+\|$, and $\|f_{++}\|$ in the current step, we can conclude:

  $q \simeq \left( \frac{\|f_+\| - (1 - \lambda)\|f_n\|}{\lambda \|f_n\|} \right)^{1/\kappa_1}$,   (76)
  $\sigma \simeq \frac{\big|\, \|f_{++}\| - \|f_+\| \,\big|}{q^{\kappa_2 + 1} (1 + \varepsilon_1) \|g_+\|}$ (with the above $q$ estimate),   (77)
  $\beta \simeq \frac{\big|\, \|g_+\| - \|g_n\| \,\big|}{\lambda \|\Delta x\|}$.   (78)

Once we have computed these values, we can feed them back into the model and optimize the next step. It is also possible and probably reasonable to use convex combinations of the current approximations and earlier values to smooth out oscillations between the steps. None of this is of course possible for the very first step: Here we will have to work with an educated guess or some default values.

Finally we suppose that the number $\ell$ of (outer) steps that the linear solver needs to satisfy (22) remains approximately constant, which enables us to estimate the number $\omega$ of $\Phi$ evaluations belonging to a given pair $(\kappa_1, \kappa_2)$. Of course $\omega$ depends on the linear solver and on details of the actual implementation, but a typical number for a BiCGStab based approach would be

  $\omega \simeq (3 + \kappa_1) + 2(\ell + 1)(\kappa_2 + 1)$.   (79)

Using the $m_{n+1}$ estimate (74), we optimize the $\omega$-th root of the reduction factor $m_{n+1}/m_n$ for $1 \le \kappa_1, \kappa_2 \le \kappa_{\max}$, where $\kappa_{\max}$ is some upper bound for the admissible number of $\Phi$ iterations in a row (cf. part b) of the convergence theorem). Again, note that the present estimate of the reduction factor is sharper than the one in (53) because the latter involved several simplifications that we have avoided here.
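Step 4 of the control mechanism then amounts to a small two-dimensional search. The following sketch (with our own function name and a brute-force loop; the actual implementation may be smarter) evaluates the model (72)-(75) together with the cost estimate (79) and minimizes the estimated effective reduction:

```python
import itertools

def choose_kappas(fn, gn, lam, eps1, q, beta, sigma, ell, kappa_max=200):
    """Sketch of step 4: minimize the estimated effective reduction
    (m_{n+1}/m_n)^(1/omega) over 1 <= kappa1, kappa2 <= kappa_max,
    using the model (72)-(75) and the cost estimate (79)."""
    mn = max(fn, gn)
    best = None
    for k1, k2 in itertools.product(range(1, kappa_max + 1), repeat=2):
        f_plus = ((1.0 - lam) + lam * q**k1) * fn                        # (72)
        g_plus = gn + lam * beta * (1.0 - q**k1) / (1.0 - q) * fn        # (73)
        a = (1.0 + eps1) * sigma * q**(k2 + 1)
        m_star = ((1.0 - eps1) * f_plus + a * g_plus) / ((1.0 - eps1) + a)  # (75)
        m_next = max(eps1 * g_plus, m_star)                              # (74)
        omega = (3 + k1) + 2 * (ell + 1) * (k2 + 1)                      # (79)
        q_eff = (m_next / mn) ** (1.0 / omega)
        if best is None or q_eff < best[0]:
            best = (q_eff, k1, k2)
    return best   # (estimated q_eff, kappa1, kappa2)
```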


8 A Numerical Example

In the present section we shall demonstrate some properties of ATBN and our control mechanism by studying a well-known model problem. We chose the Bratu problem on the unit square,

  $-\Delta u(\xi_1, \xi_2) = \lambda \exp(u(\xi_1, \xi_2))$ for $0 < \xi_1, \xi_2 < 1$,
  $u(\xi_1, \xi_2) = 0$ if $\xi_1 = 0$ or $\xi_1 = 1$ or $\xi_2 = 0$ or $\xi_2 = 1$.   (80)

In terms of Chemical Engineering, (80) can be viewed as a model problem for diffusion and exothermic reaction (cf. [1]). We used the standard 5 point discretization of the Laplacian on a uniform grid. We divided the unit square into four identical squares; this substructuring will naturally produce a coupled system if the interior nodes of the subsquares are interpreted as interior variables and the common nodes as coupling variables. We treated the parameter $\lambda$ (the so-called Thiele modulus) as a further coupling variable and introduced the equation

  $u(0.5, 0.5) = u_{\max}$   (81)

with a prespecified $u_{\max}$ to compensate for it. As the name suggests, $u_{\max}$ will be the maximum value of $u(\xi_1, \xi_2)$; technically, however, it is just a parameter of our problem. By separating the diagonal part of the discretized Laplacian from the rest of the equations, we constructed an obvious Jacobi type iteration process $\Phi$ (a minimal sketch follows below). It is known that this $\Phi$ will be contractive as long as $u_{\max}$ is small enough. (The substructuring of the square allows for larger values than would be admissible without it.)

For the following computations we used $(2 \cdot 7 + 1)^2 = 225$ interior grid points and set $u(0.5, 0.5) = 8$. The following figures are supposed to demonstrate the behaviour of the method as well as some specifics of the control mechanism.
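For concreteness, here is a minimal sketch of such a Jacobi type sweep on the full grid, ignoring the substructuring into four subsquares and the extra coupling equation (81); only the diagonal separation of the discrete Laplacian is shown.

```python
import numpy as np

def bratu_jacobi(u, lam, h):
    """One Jacobi-type sweep for the 5-point discretization of (80):
    the diagonal of the discrete Laplacian is separated from the rest,
    u_ij <- (sum of the four neighbours + h^2 * lam * exp(u_ij)) / 4.
    u carries a zero boundary frame; only interior nodes are updated."""
    un = u.copy()
    un[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
                      + h**2 * lam * np.exp(u[1:-1, 1:-1])) / 4.0
    return un

# e.g. a grid with (2*7+1)^2 = 225 interior points as in the experiments:
n = 15
u = np.zeros((n + 2, n + 2))
u = bratu_jacobi(u, lam=1.0, h=1.0 / (n + 1))
```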


[Figure 2: ATBN at $\varepsilon_1 = 0.1$. Residua (logarithmic scale, $10^{-10}$ to $10^2$) against the number of $\Phi$ evaluations (0 to 7000).]

[Figure 3: $\kappa_1$, $\kappa_2$ and $\ell$ for $\varepsilon_1 = 0.1$, plotted against the number of ATBN steps (0 to 35).]

[Figure 4: ATBN at $\varepsilon_1 = 0.01$. Residua (logarithmic scale, $10^{-10}$ to $10^2$) against the number of $\Phi$ evaluations (0 to 7000).]

[Figure 5: $\kappa_1$, $\kappa_2$ and $\ell$ for $\varepsilon_1 = 0.01$, plotted against the number of ATBN steps (0 to 14).]

The residua plots are to be read as follows:

- The horizontal axes represent the numbers of $\Phi$ evaluations used. The vertical axes represent the residual norms.
- The dashed lines indicate the values of $\|f\|$, the dash-dotted ones the values of $\|g\|$. They interpolate linearly between the values at three critical phases of the composite ATBN step: a) before the $f$ step, b) before the $g$ step, c) after a virtual undamped $g$ step. It will become clear why this makes sense.
- The solid lines connect the values of $\max(\|f\|, \|g\|)$ before and after the actual composite ATBN step. They can be used to measure the success of the step: the steeper the slope, the better the step.

Figures 2 and 3 demonstrate the overall behaviour of the method for $\varepsilon_1 = 10^{-1}$. The steps are relatively small; after a startup phase the method stabilizes itself at $\kappa_1 = 21 \pm 1$ and $\kappa_2 = 15 \pm 1$. As we assumed, the number $\ell$ of BiCGStab iteration steps remains almost constant after the startup phase.

Figures 4 and 5 show how the control mechanism adapts to a reduction of $\varepsilon_1$: Since the quality of the $g$ step will improve and since the costs of solving (22) will increase, the mechanism decides to improve the $f$ step and the quality of the tangential directions as well. The overall costs of the step actually hardly increase: The solid lines in figures 4 and 2 both meet the level $10^{-8}$ at approximately 6000 $\Phi$ iterations. The control mechanism will always provide some sort of compensation for a maladjusted $\varepsilon_1$, but there can be no guarantee that it will always be as successful as in this example; it is of course preferable not to choose $\varepsilon_1$ unnecessarily small in the first place.

It is perhaps worth noting that the slightly odd oscillations of the $\kappa_i$ between very small and relatively large values during the startup phase are due to the fact that the estimated effective reduction associated with $(\kappa_1, \kappa_2)$ tends to have two separate local minima. Due to changes in the estimates of $\sigma$, $\beta$ and $q$ and gaps between $\|f_n\|$ and $\|g_n\|$, it is not clear from the outset which one will dominate in the long run. See figures 6 and 7 for the model estimates of the effective reduction associated with different combinations of the $\kappa_i$ for steps 7 and 8 in figure 5, respectively.

[Figure 6: model of step # 7, $\varepsilon_1 = 0.01$.]  [Figure 7: model of step # 8, $\varepsilon_1 = 0.01$.]

Finally, let us take a closer look at one isolated step of the method: Figures 8 and 9 demonstrate the influence of the $\kappa_i$ on the performance of the step. The value of $\varepsilon_1$ was $10^{-2}$, and the model had suggested to use $\kappa_1 = 66$ and $\kappa_2 = 47$. For figure 8 we accepted $\kappa_2$ and varied $\kappa_1$, for figure 9 vice versa. We always used the optimal $\mu$ associated with the actual $\kappa_i$ values.

First of all, both figures clearly reflect the fact that the $f$ step is typically much more expensive than the $g$ step. This should be no surprise because the $g$ step always involves computing tangential corrections. Since the quality of these corrections is governed by the value of $\kappa_2$, the influence of this parameter on the computational costs of the ATBN step is much stronger than the influence of $\kappa_1$. Nevertheless it is still not wise to choose $\kappa_1$ too large: The amplification of $\|f\|$ during the $g$ step depends crucially on $\|g\|$, and thus a disproportionately small value of $\|f_+\|$ will only be maintained if $\tilde{C}$ is very accurate. The loss of efficiency caused by an oversized $\kappa_1$ will not be too grave though, owing to the fact that the $f$ step will remain relatively cheap (cf. figure 8, $\kappa_1 = 150$). Conversely, a small $\kappa_1$ can be very annoying because it will prevent exploiting the qualities of $\kappa_2$ and $\varepsilon_1$, which were typically rather expensive to achieve (cf. figure 8, $\kappa_1 = 30$).


[Figure 8: varying $\kappa_1$ ($\kappa_1 = 30, 66, 150$) in an ATBN step. Residua ($10^{-11}$ to $10^{-4}$) against the number of $\Phi$ evaluations (0 to 1200).]

[Figure 9: varying $\kappa_2$ ($\kappa_2 = 30, 47, 75, 100$) in an ATBN step. Residua ($10^{-8}$ to $10^{-4}$) against the number of $\Phi$ evaluations (0 to 2500).]

To a certain extent, figure 9 mirrors figure 8 in that increasing $\kappa_1$ corresponds to decreasing $\kappa_2$ and vice versa. If $\kappa_2$ is too small, the $g$ step will partly spoil the success of the $f$ step (cf. figure 9, $\kappa_2 = 30$); if $\kappa_2$ is too large, the quality of the resulting tangential corrections can not be fully exploited. Note that figure 9 shows $\varepsilon_1$ to become the main impediment for large $\kappa_2$: In the cases of $\kappa_2 = 75$ and $\kappa_2 = 100$ the ATBN step clearly is hampered by the limited ability of the $g$ step to reduce $\|g_+\|$, that is, by the value of $\varepsilon_1$.

Acknowledgements. The author thanks Wolfgang Mackens for his fruitful suggestions and his helpful criticism.

References

[1] Aris, R.: The Mathematical Theory of Diffusion and Reaction in Permeable Catalysts, Vol. 1: The Theory of the Steady State, Clarendon Press: Oxford (1975).
[2] Artlich, S.: Zweidimensionale Simulation der Kohleverbrennung in Druckwirbelschichtfeuerungen, Dissertation, TU Hamburg-Harburg 1996, VDI-Verlag, Reihe 6, Nr. 346: Düsseldorf (1996).
[3] Artlich, S.: Combustion of Coal in Pressurized Fluidized Bed Reactors, in Scientific Computing in Chemical Engineering, F. Keil, W. Mackens, H. Voß, J. Werther (eds), Springer-Verlag: Heidelberg (1996).
[4] Artlich, S. and Mackens, W.: Newton-Coupling of Fixed Point Iterations, in Numerical Treatment of Coupled Systems, W. Hackbusch and G. Wittum (eds), Vieweg-Verlag: Braunschweig, Wiesbaden (1995), 1-10.
[5] AspenTech: SPEEDUP User Manual, Aspen Technology Inc., Cambridge, MA (1990).
[6] Blomgren, P. and Chan, T.F.: Modular Solvers for Constrained Image Restoration Problems, U.C.L.A. Computational and Applied Mathematics Report 97-52, University of California, Los Angeles (1997).
[7] Chan, T.F.: An Approximate Newton Method for Coupled Nonlinear Systems, SIAM J. Numer. Anal. 22 (1985), 904-913.
[8] Chan, T.F.: An Efficient Modular Algorithm for Coupled Nonlinear Systems, in Numerical Analysis. Proceedings of the Fourth IIMAS Workshop held at Guanajuato, Mexico, July 23-27, 1984, J.P. Hennart (ed), Springer Lect. Notes in Math. 1230, Springer-Verlag: Berlin (1986), 73-85.
[9] Dennis, J.E. and Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Inc.: Englewood Cliffs (1983).
[10] Freund, R.W.: A Transpose-Free Quasi-Minimal Residual Algorithm for Non-Hermitian Linear Systems, SIAM J. Sci. Comput. 14 (1993), 470-482.


[11] Greenbaum, A.: Any Nonincreasing Convergence Curve is Possible for GMRES, SIAM J. Matrix Anal. Appl. 17 (1996), 465-469.
[12] Greenbaum, A.: Estimating the Attainable Accuracy of Recursively Computed Residual Methods, SIAM J. Matrix Anal. Appl. 18 (1997), 535-551.
[13] Gutknecht, M.H.: Lanczos-Type Solvers for Nonsymmetric Linear Systems of Equations, Technical Report TR-97-04, CSCS/SCSC, ETH Zürich (1997).
[14] Hoyer, W. and Schmidt, J.W.: Newton-Type Decomposition Methods for Equations Arising in Network Analysis, ZAMM 64 (1984), 397-405.
[15] Hoyer, W., Schmidt, J.W. and Shabani, N.: Superlinearly Convergent Decomposition Methods for Block-Tridiagonal Nonlinear Systems of Equations, Numer. Funct. Anal. and Optimiz. 10 (1989), 961-975.
[16] Lanzkron, P.J., Rose, D.J. and Wilkes, J.T.: An Analysis of Approximate Nonlinear Elimination, SIAM J. Sci. Comput. 17 (1996), 538-559.
[17] Mackens, W.: Some Notes on Block-Gauss-Seidel Newton Iterations for the Solution of Sparse Nonlinear Systems, Bericht Nr. 37 des Instituts für Geometrie und Praktische Mathematik der RWTH Aachen (1986).
[18] Mackens, W.: Quadratic Convergence of the Recursive Block-Gauss-Seidel-Newton Iteration, Bericht Nr. 44 des Instituts für Geometrie und Praktische Mathematik der RWTH Aachen (1987).
[19] Nachtigal, N.M., Reddy, S.C. and Trefethen, L.N.: How Fast are Nonsymmetric Matrix Iterations?, SIAM J. Matrix Anal. Appl. 13 (1992), 778-795.
[20] Saad, Y. and Schultz, M.H.: GMRES: A Generalized Minimum Residual Algorithm for Solving Nonsymmetric Linear Systems, SIAM J. Sci. Statist. Comput. 7 (1986), 856-869.
[21] Schmidt, J.W., Hoyer, W. and Haufe, C.: Consistent Approximation in Newton-Type Decomposition Methods, Numer. Math. 47 (1985), 413-425.
[22] Sleijpen, G.L.G. and Fokkema, D.R.: BiCGstab(ℓ) for Linear Equations Involving Unsymmetric Matrices with Complex Spectrum, Electronic Trans. Numer. Anal. 1 (1993), 11-32.
[23] Sonneveld, P.: CGS, a Fast Lanczos-Type Solver for Nonsymmetric Linear Systems, SIAM J. Sci. Statist. Comput. 10 (1989), 36-52.
[24] van der Vorst, H.A.: Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems, SIAM J. Sci. Statist. Comput. 13 (1992), 631-644.
[25] Zhang, X., Byrd, R.H. and Schnabel, R.B.: Parallel Methods for Nonlinear Block Bordered Systems of Equations, SIAM J. Sci. Stat. Comput. 13 (1992), 841-859.