

A limited memory Broyden method to solve high-dimensional systems of nonlinear equations

Thesis

submitted in fulfilment of the requirements for the degree of Doctor at the Universiteit Leiden, on the authority of the Rector Magnificus Dr. D. D. Breimer, professor in the Faculty of Mathematics and Natural Sciences and that of Medicine, according to the decision of the Board for Doctorates, to be defended on Tuesday 9 December 2003 at 15:15

by

Bartholomeus Andreas van de Rotten

born in Uithoorn on 20 October 1976


Composition of the doctoral committee:

supervisors: prof. dr. S.M. Verduyn Lunel, prof. dr. ir. A. Bliek (Universiteit van Amsterdam)

referee: prof. dr. D. Estep (Colorado State University)

other members: prof. dr. G. van Dijk, dr. ir. H.C.J. Hoefsloot (Universiteit van Amsterdam), prof. dr. R. van der Hout, dr. W.H. Hundsdorfer (CWI, Amsterdam), prof. dr. L.A. Peletier, prof. dr. M.N. Spijker


    A limited memory Broyden method

    to solve high-dimensional systems

    of nonlinear equations


Bart van de Rotten
Mathematisch Instituut, Universiteit Leiden, The Netherlands

    ISBN: 90-9017576-8

    Printed by PrintPartners Ipskamp

The research that led to this thesis was funded by N.W.O. (Nederlandse organisatie voor Wetenschappelijk Onderzoek) grant 616-61-410 and supported by the Thomas Stieltjes Institute for Mathematics.


Contents

Introduction

Part I: Basics of limited memory methods

1 An introduction to iterative methods
  1.1 Iterative methods in one variable
  1.2 The method of Newton
  1.3 The method of Broyden

2 Solving linear systems with Broyden's method
  2.1 Exact convergence for linear systems
  2.2 Two theorems of Gerber and Luk
  2.3 Linear transformations

3 Limited memory Broyden methods
  3.1 New representations of Broyden's method
  3.2 Broyden Rank Reduction method
  3.3 Broyden Base Reduction method
  3.4 The approach of Byrd

Part II: Features of limited memory methods

4 Features of Broyden's method
  4.1 Characteristics of the Jacobian
  4.2 Solving linear systems with Broyden's method
  4.3 Introducing coupling
  4.4 Comparison of selected limited memory Broyden methods

5 Features of the Broyden rank reduction method
  5.1 The reverse flow reactor
  5.2 Singular value distributions of the update matrices
  5.3 Computing on a finer grid using the same amount of memory
  5.4 Comparison of selected limited memory Broyden methods

Part III: Limited memory methods applied to periodically forced processes

6 Periodic processes in packed bed reactors
  6.1 The advantages of periodic processes
  6.2 The model equations of a cooled packed bed reactor

7 Numerical approach for solving periodically forced processes
  7.1 Discretization of the model equations
  7.2 Tests for the discretized model equations
  7.3 Bifurcation theory and continuation techniques

8 Efficient simulation of periodically forced reactors in 2D
  8.1 The reverse flow reactor
  8.2 The behavior of the reverse flow reactor
  8.3 Dynamic features of the full two-dimensional model

Notes and comments

Bibliography

A Test functions

B Matlab code of the limited memory Broyden methods

C Estimation of the model parameters

Samenvatting (Waarom Broyden?)

Nawoord

Curriculum Vitae


    Introduction

Periodic chemical processes form a field of major interest in chemical reactor engineering. Examples of such processes are the pressure swing adsorber (PSA), the thermal swing adsorber (TSA), the reverse flow reactor (RFR), and the more recently developed pressure swing reactor (PSR). The state of a chemical reactor that contains a periodically forced process is given by the temperature profiles and concentration profiles of the reactants. Starting with an initial state, the reactor generally goes through a transient phase during many periods before converging to a periodic limiting state. This periodic limiting state is also known as the cyclic steady state (CSS). Because the reactor operates in this state most of the time, it is interesting to investigate the dependence of the cyclic steady state on the operating parameters of the reactor.

The simulation of periodically forced processes in packed bed reactors leads to systems of partial differential equations. In order to investigate the behavior of the system numerically, we discretize the equations in space. The action of the process during one period can be computed by integrating the obtained system of ordinary differential equations in time for one period. The map that assigns to an initial state of the process the state after one period is called the period map. We denote the period map by $f : \mathbb{R}^n \to \mathbb{R}^n$, which, in general, is highly nonlinear. The dynamical process in the reactor can now be formulated by the dynamical system

$$x_{k+1} = f(x_k), \qquad k = 0, 1, 2, \ldots,$$

where $x_k$ denotes the state of the reactor after $k$ periods. Periodic states of the reactor are fixed points of the period map $f$, and a stable cyclic steady state can be computed by taking the limit of $x_k$ as $k \to \infty$. Depending on the convergence properties of the system at hand, the transient phase of the process might be very long, and efficient methods to find the fixed points of $f$ are essential.



Fixed points of the map $f$ correspond to zeros of $g : \mathbb{R}^n \to \mathbb{R}^n$, where $g$ is given by

$$g(x) = f(x) - x.$$

So, the basic equation we want to solve is

$$g(x) = 0 \quad \text{for } x \in \mathbb{R}^n. \tag{1}$$

Because (1) is a system of $n$ nonlinear equations, iterative algorithms are needed to approximate a zero of the function $g$. The iterative algorithms produce a sequence $\{x_k\}$ of approximations to the zero $x^*$ of $g$. A function evaluation can be a rather expensive task, and it is generally accepted that the most efficient iterative algorithms for solving (1) are those that use the least number of function evaluations.

In his thesis [49], Van Noorden compares several iterative algorithms for the determination of the CSS of periodically forced processes, by solving (1). He deduces that for this type of problem, the Newton-Picard method and the method of Broyden are especially promising.

In this thesis, the study of periodically forced processes is extended to more complex models. Because the dimension of the discretized system of such models is very large, memory constraints arise and care must be taken in the choice of iterative methods. Therefore, it is necessary to develop limited memory algorithms to solve (1), that is, algorithms that use a restricted amount of memory. Since the method of Broyden is popular in chemical reactor engineering, we focus on approaches aimed at reducing the memory needed by the method of Broyden. We call the resulting algorithms limited memory Broyden methods.

    Basics of limited memory methods

The standard iterative algorithm is the method of Newton. Let $x_0 \in \mathbb{R}^n$ be an initial guess in the neighborhood of a zero $x^*$ of $g$. Newton's method defines a sequence $\{x_k\}$ in $\mathbb{R}^n$ of approximations of $x^*$ by

$$x_{k+1} = x_k - J_g^{-1}(x_k)\, g(x_k), \qquad k = 0, 1, 2, \ldots, \tag{2}$$

where $J_g(x)$ is the Jacobian of $g$ at the point $x$. An advantage of the method of Newton is that the convergence is quadratic in a neighborhood of a zero, i.e.,

$$\|x_{k+1} - x^*\| < c\, \|x_k - x^*\|^2,$$


for a certain constant $c > 0$. Since it is not always possible to determine the Jacobian of $g$ analytically, we often have to approximate $J_g$ using finite differences. The number of function evaluations per iteration step in the resulting approximate Newton method is $(n + 1)$.

In 1965, Broyden [8] proposed a method that uses only one function evaluation per iteration step instead of $(n + 1)$. The main idea of Broyden's method is to approximate the Jacobian of $g$ by a matrix $B_k$. Thus the scheme (2) is replaced by

$$x_{k+1} = x_k - B_k^{-1} g(x_k), \qquad k = 0, 1, 2, \ldots. \tag{3}$$

After every iteration step, the Broyden matrix $B_k$ is updated using a rank-one matrix. If $g(x)$ is an affine function, so $g(x) = Ax + b$ for some $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$, then

$$g(x_{k+1}) - g(x_k) = A(x_{k+1} - x_k)$$

holds. According to this equality, the updated Broyden matrix $B_{k+1}$ is chosen such that it satisfies the equation

$$y_k = B_{k+1} s_k, \tag{4}$$

with

$$s_k = x_{k+1} - x_k \quad \text{and} \quad y_k = g(x_{k+1}) - g(x_k).$$

Equation (4) is called the secant equation, and algorithms for which this condition is satisfied are called secant methods. If we assume that $B_{k+1}$ and $B_k$ are identical on the orthogonal complement of the linear space spanned by $s_k$, the condition in (4) results in the following update scheme for the Broyden matrix $B_k$:

$$B_{k+1} = B_k + (y_k - B_k s_k)\, \frac{s_k^T}{s_k^T s_k} = B_k + \frac{g(x_{k+1})\, s_k^T}{s_k^T s_k}, \tag{5}$$

where the second equality follows from $B_k s_k = -g(x_k)$, see (3).
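To make the scheme concrete, the following minimal Python sketch implements iteration (3) with update (5); the function name `broyden` and its arguments are our own illustration, not code from the thesis.

```python
import numpy as np

def broyden(g, x0, B0=None, tol=1e-12, max_iter=100):
    """Minimal sketch of Broyden's method: iteration (3) with update (5)."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size) if B0 is None else np.asarray(B0, dtype=float)
    gx = g(x)                                # one function evaluation per step
    for _ in range(max_iter):
        if np.linalg.norm(gx) < tol:
            break
        s = np.linalg.solve(B, -gx)          # Broyden step: B_k s_k = -g(x_k)
        x = x + s
        gx = g(x)                            # g(x_{k+1}), reused next iteration
        B = B + np.outer(gx, s) / s.dot(s)   # rank-one update (5)
    return x
```

Note that, in line with the count above, the loop performs exactly one evaluation of $g$ per iteration step.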

In 1973, Broyden, Dennis and Moré [11] published a proof that the method of Broyden is locally q-superlinearly convergent, i.e.,

$$\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0.$$

In 1979, Gay [22] proved that for linear problems the method of Broyden is in fact exactly convergent in $2n$ iterations. Moreover, he showed that this implies locally $2n$-step, quadratic convergence for nonlinear problems,

$$\|x_{k+2n} - x^*\| \le c\, \|x_k - x^*\|^2,$$


with $c > 0$. This proof of exact convergence was simplified and sharpened in 1981 by Gerber and Luk [23]. In practice, these results imply that the method of Broyden needs more iterations to converge than the method of Newton. Yet, since only one function evaluation is made for every iteration step, the method of Broyden might significantly reduce the amount of CPU time needed to solve the problem.

In Chapter 1, we discuss the method of Newton and the method of Broyden in more detail, and in particular describe the derivation and convergence properties. Subsequently, we consider the method of Broyden for linear systems of equations in Chapter 2. We deduce that Broyden's method uses selective information of the system to solve it.

Both Newton's and Broyden's method need to store an $(n \times n)$-matrix, see (2) and (3). Therefore, for high-dimensional systems, this might lead to severe memory constraints. From the early seventies, there has been serious attention paid to the issue of reducing the amount of storage required for iterative methods. Different techniques have appeared for solving large nonlinear problems [62].

The problem that we consider is the general nonlinear equation (1), so nothing is known beforehand about the structure of the Jacobian of the system. In Chapter 3, we develop several limited memory methods that do not depend on the structure of the Jacobian and are based on the method of Broyden. In addition to a large reduction of the memory used, these limited memory methods give more insight into the original method of Broyden, since we investigate the question of how much and which information can be dropped without destroying the property of superlinear convergence. In Section 3.2, we derive our main algorithm, the Broyden Rank Reduction method (BRR). To introduce the idea of the BRR method, we first consider an example.

Example 1. The period map $f : \mathbb{R}^n \to \mathbb{R}^n$ to be considered is a small (take $\varepsilon = 1.0 \cdot 10^{-2}$) quadratic perturbation of two times the identity map,

$$f(x) = \begin{pmatrix} 2x_1 - \varepsilon x_2^2 \\ \vdots \\ 2x_{n-1} - \varepsilon x_n^2 \\ 2x_n \end{pmatrix}. \tag{6}$$

The unique fixed point of the function $f$, $x^* = 0$, can be found by applying Broyden's method to solve (1) with $g(x) = f(x) - x$ up to a certain residual,

$$\|g(x)\| < 1.0 \cdot 10^{-12},$$


using the initial estimate vector $x_0 = (1, \ldots, 1)$. In order to obtain a good example of memory reduction, we choose $n = 100{,}000$. Starting with a simple initial matrix $B_0 = -I$, the first Broyden matrix is given by

$$B_1 = B_0 + c_1 d_1^T,$$

where $c_1 = g(x_1)/\|s_0\|$ and $d_1 = s_0/\|s_0\|$. Because $B_0$ does not have to be stored, it is more economical to store the vectors $c_1$ and $d_1$ instead of the matrix $B_1$ itself. Applying another update, we obtain the second Broyden matrix,

$$B_2 = B_1 + c_2 d_2^T = B_0 + c_1 d_1^T + c_2 d_2^T, \tag{7}$$

where $c_2 = g(x_2)/\|s_1\|$ and $d_2 = s_1/\|s_1\|$. Now $4n = 400{,}000$ locations are used to store the vector pairs $\{c_1, d_1\}$ and $\{c_2, d_2\}$. In the next iteration step, we would need $6n = 600{,}000$ storage locations to store all of the vector pairs. We consider the update matrix in the fifth iteration, which consists of the first five rank-one updates to the Broyden matrix,

$$Q = c_1 d_1^T + c_2 d_2^T + \cdots + c_5 d_5^T.$$

Because $Q$ is the sum of five rank-one matrices, it has rank less than or equal to five, and if we compute the singular value decomposition of $Q$ (see Section 3.2 for details), we see that $Q$ can be written as

$$Q = \sigma_1 u_1 v_1^T + \cdots + \sigma_5 u_5 v_5^T,$$

where $\{u_1, \ldots, u_5\}$ and $\{v_1, \ldots, v_5\}$ are orthonormal sets of vectors and

$$\sigma_1 = 2.49, \quad \sigma_2 = 1.61, \quad \sigma_3 = 0.214 \cdot 10^{-5}, \quad \sigma_4 = 0.121 \cdot 10^{-12}, \quad \sigma_5 = 0.00.$$

This suggests that we can ignore the singular value $\sigma_5$ in the singular value decomposition of $Q$ without changing the update matrix $Q$. We replace the matrix $Q$ by $\tilde{Q}$ with

$$\tilde{Q} = Q - \sigma_5 u_5 v_5^T = \sigma_1 u_1 v_1^T + \cdots + \sigma_4 u_4 v_4^T.$$

We define $c_i := \sigma_i u_i$ and $d_i := v_i$ for $i = 1, \ldots, 4$. The difference between the original Broyden matrix $B_5 = B_0 + Q$ and the reduced matrix $\tilde{B}_5 = B_0 + \tilde{Q}$ can be estimated as

$$\|B_5 - \tilde{B}_5\| = \|B_0 + Q - B_0 - \tilde{Q}\| = \|\sigma_5 u_5 v_5^T\| = \sigma_5 \|u_5\| \|v_5\| = \sigma_5,$$


which is equal to zero in this case. After this reduction we can store a new pair of update vectors,

$$c_5 := g(x_6)/\|s_5\| \quad \text{and} \quad d_5 := s_5/\|s_5\|.$$

Continuing, in every iteration step we first remove the singular value $\sigma_5$ of $Q$ before computing the new update. This leads to Algorithm 3.11 of Section 3.2, the Broyden Rank Reduction method, with parameter $p = 5$. Surprisingly, the fifth singular value of the update matrix remains zero in all subsequent iterations until the process has converged, see Figure 1. Therefore, if in every iteration we save the four largest singular values of the update matrix and drop the fifth singular value, we do not alter the Broyden matrix. In fact, we apply the method of Broyden itself.
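In Python, the reduction step of this example can be sketched as follows. We keep the update matrix in factored form $Q = CD^T$, where the columns of $C$ and $D$ hold the vectors $c_i$ and $d_i$; the helper name `reduce_rank` and the QR-based route to the SVD are our own choices, not Algorithm 3.11 itself.

```python
import numpy as np

def reduce_rank(C, D, p):
    """Sketch of the BRR reduction step, assuming Q = C @ D.T with p columns.

    Computes the SVD of Q and drops the smallest singular value, freeing
    one column pair for the next Broyden update.
    """
    Qc, Rc = np.linalg.qr(C)             # C = Qc Rc, Qc: n x p
    Qd, Rd = np.linalg.qr(D)             # D = Qd Rd, Qd: n x p
    U, s, Vt = np.linalg.svd(Rc @ Rd.T)  # SVD of the small p x p core
    u = Qc @ U                           # left singular vectors of Q
    v = Qd @ Vt.T                        # right singular vectors of Q
    C_new = u[:, :p-1] * s[:p-1]         # c_i := sigma_i u_i, i = 1,...,p-1
    D_new = v[:, :p-1]                   # d_i := v_i
    return C_new, D_new
```

Working on the $p \times p$ core keeps the cost at $O(np^2)$, so the full $n \times n$ update matrix is never formed.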

Figure 1: The singular values of the update matrix, plotted against the iteration k, during the BRR process with p = 5.

The rate of convergence of this process is plotted in Figure 2, together with the rate of convergence of the BRR method for other values of $p$.

If $p$ is larger than 5, the rate of convergence does not increase. We observe that the residual $\|g(x_k)\|$ is approximately $10^{-14}$ after 14 iterations. For $p = 5$, the number of required storage locations is reduced from $n^2 = 10^{10}$ for the Broyden matrix of the original method to $2pn = 10^6$ for the BRR method.

Note that $p$ cannot be arbitrarily small, and care is needed to find the optimal $p$. Clearly, the BRR process does not converge for $p = 2$. In the first iterations, it might not be harmful to remove the second singular value of the update matrix, but after 8 iterations the process fails to keep the fast q-superlinear convergence and starts to diverge. For $p = 3$ we observe the same kind of behavior, where the difficulties start in the 9th iteration.


Figure 2: The convergence rate (residual $\|g(x_k)\|$ against iteration k) of the Broyden Rank Reduction method when computing a fixed point of the function f given by (6), for p = 1, 2, 3, 4, 5.

The reduction applied to the update matrix $Q$ can also be explained as follows. For the method of Broyden, the action of the Broyden matrix has to be known in all $n$ directions, for example, in order to compute the new Broyden step $s_k$, see (3). In this situation the BRR method is satisfied with the action of the Broyden matrix in only $p$ directions. These directions are produced by the Broyden process itself.

    Features of limited memory methods

In Part I, we discuss most of the properties of Broyden's method and the newly developed Broyden Rank Reduction method. The good results in practical applications of Broyden's method can only be explained to a limited degree. Therefore, in Part II we apply the method of Broyden and the BRR method to several test functions. We are especially interested in nonlinear test functions, since it is known that for linear systems of equations, Broyden's method is far less efficient than, for example, GMRES [61] and Bi-CGSTAB [70, 71].

In the neighborhood of the solution, a function can often be considered as approximately affine. Moreover, the rate of convergence of Broyden's method applied to the linearization of a function indicates how many iterations it might need for the function itself. In Part I, we show that if we apply Broyden's method to an affine function $g(x) = Ax + b$, the difference $\|B_k - A\|$ between the Broyden matrix and the Jacobian of the function does not increase as $k$ increases, if measured in the matrix norm induced by the $l_2$-vector norm.


For nonlinear functions $g$, the difference $\|B_k - J_g(x^*)\|_F$, measured in the Frobenius norm, may increase. However, we can choose a neighborhood $N_1$ of the solution $x^*$ and a neighborhood $N_2$ of the Jacobian $J_g(x^*)$, such that if $(x_0, B_0) \in N_1 \times N_2$, the difference $\|B_k - J_g(x^*)\|_F$ never exceeds two times the initial difference $\|B_0 - J_g(x^*)\|_F$.

Example 2. Let the matrix $A$ be given by the sum

$$A = \begin{pmatrix} 2 & -1 & & \\ & \ddots & \ddots & \\ & & \ddots & -1 \\ & & & 2 \end{pmatrix} + S, \tag{8}$$

where the elements of $S$ are between zero and one. The matrix $S$ contains in fact the values of a gray-scale picture of a cat. We consider the system of linear equations $Ax = 0$ and apply the method of Broyden from initial condition $x_0 = (1, \ldots, 1)$ and with initial Broyden matrix $B_0 = -I$. The dimension of the problem is $n = 100$. Since $A$ is invertible, the Theorem of Gay implies that it will take Broyden's method less than 200 iterations to solve the problem exactly. It turns out that in the simulation, about 219 iterations are needed to reach a residual of $\|g(x_k)\| < 10^{-12}$. The finite arithmetic of the computer has probably introduced a nonlinearity into the system, so that the conditions of Gay's Theorem are not completely fulfilled.

In Figure 3, we have plotted the matrix $A$ and the Broyden matrix at different iterations. We observe that in some way the Broyden matrix $B_k$ tries to approximate the Jacobian $A$. Although the final Broyden matrix is certainly not equal to the Jacobian, it approximates the Jacobian to such an extent that the solution of the problem $Ax = 0$ can be found.

After 50 iterations, the rough contour of the cat can be recognized in the Broyden matrix $B_{50}$. While reconstructing the two main diagonals of the Jacobian, the picture of the cat is sharpened. Note that the light spot at the left side of the image of the Jacobian is considered less interesting by the method of Broyden. On the other hand, the nose and eyes of the cat are clearly detected.

    Limited memory methods applied to periodic processes

We now consider an application in chemical reactor engineering. In case of significant temperature fluctuations, it turns out to be essential to include a second space dimension in the model of packed bed reactors. Moreover, to


Figure 3: The Jacobian as given by (8) and the Broyden matrices B50, B100 and B218 at three different iterations of the Broyden process (n = 100). Black corresponds to values smaller than 0 and white to values larger than 1.

obtain an accurate approximation of the periodic state of the reactor, it is necessary to use a fine grid. This implies that the number of equations, $n$, is very large. Combining the integration of the system of ordinary differential equations for the evaluation of the function $g$ with a fine grid in the reactor makes it practically impossible to solve (1) using classical iterative algorithms for nonlinear equations. To overcome severe memory constraints, many authors have reverted to pseudo-homogeneous one-dimensional models and to coarse


grid discretization, which renders such models inadequate or inaccurate.

The radial transport of heat and matter is essential in non-isothermal packed bed reactors [72]. A highly exothermic reaction, a large width of the reactor, and efficient cooling of the reactor at the wall cause radial temperature gradients to be present, see Figure 4. Clearly, for reactors operating under these conditions the radial dimension must be taken into account explicitly.

Figure 4: Qualitative conversion and temperature distribution (against axial and radial distance) of the cooled reverse flow reactor in the cyclic steady state, using the two-dimensional model (10)-(12) with the parameter values of Tables C.1 and C.2.

As an example, we consider a reverse flow reactor, which is considered in detail in Chapter 8. The reverse flow reactor (RFR) is a catalytic packed-bed reactor in which the flow direction is periodically reversed to trap a hot zone within the reactor. Upon entering the reactor, the cold feed gas is heated up regeneratively by the hot bed so that a reaction can occur. The reaction is assumed to be exothermic. At the other end of the reactor, the hot product gas is cooled by the colder catalyst particles. The beginning and end of the reactor thus effectively work as heat exchangers. The cold feed gas purges the high-temperature (reaction) front in the downstream direction. Before the hot reaction zone exits the reactor, the feed flow direction is reversed. The flow-reversal period, denoted by $t_f$, is usually constant and predefined. One complete cycle of the RFR consists of two flow-reversal periods. Overheating of the catalyst and hot-spot formation are avoided by a limited degree of cooling.

In Chapter 6, we derive the balance equations of the two-dimensional model of a general packed bed reactor. Here we give a short summary of the derivation. We start with the one-dimensional pseudo-homogeneous model of Khinast, Jeong and Luss [33], which takes into account the axial heat and mass dispersion.


The concentration and temperature depend on the axial and the radial direction, $c = c(z, r, t)$ and $T = T(z, r, t)$. The second spatial dimension is incorporated by including the radial components of the diffusion terms,

$$D_{\mathrm{rad}}\, \frac{1}{r} \frac{\partial}{\partial r}\!\left(r \frac{\partial c}{\partial r}\right) \quad \text{and} \quad \lambda_{\mathrm{rad}}\, \frac{1}{r} \frac{\partial}{\partial r}\!\left(r \frac{\partial T}{\partial r}\right),$$

in the component balance and the energy balance, respectively. The cooling term in the energy balance disappears. Instead, at the wall of the reactor the boundary condition

$$-\lambda_{\mathrm{rad}} \left.\frac{\partial T}{\partial r}\right|_{r=R} = U_w \bigl(T(R) - T_c\bigr) \tag{9}$$

is added to the system. Equation (9) describes the heat loss at the reactor wall to the surrounding cooling jacket.

In summary, we can now give the complete two-dimensional model. The component balance is given by

$$\frac{\partial c}{\partial t} = D_{\mathrm{ax}} \frac{\partial^2 c}{\partial z^2} - u \frac{\partial c}{\partial z} - r(c, T) + D_{\mathrm{rad}}\, \frac{1}{r} \frac{\partial}{\partial r}\!\left(r \frac{\partial c}{\partial r}\right), \tag{10}$$

the energy balance is given by

$$\bigl((\rho c_p)_s (1 - \varepsilon) + (\rho c_p)_g\, \varepsilon\bigr) \frac{\partial T}{\partial t} = \lambda_{\mathrm{ax}} \frac{\partial^2 T}{\partial z^2} - u (\rho c_p)_g \frac{\partial T}{\partial z} + (-\Delta H)\, r(c, T) + \lambda_{\mathrm{rad}}\, \frac{1}{r} \frac{\partial}{\partial r}\!\left(r \frac{\partial T}{\partial r}\right), \tag{11}$$

and the boundary conditions are given by

$$\begin{aligned}
-\lambda_{\mathrm{ax}} \left.\frac{\partial T}{\partial z}\right|_{z=0} &= u (\rho c_p)_g \bigl(T_0 - T(0)\bigr), & \left.\frac{\partial T}{\partial z}\right|_{z=L} &= 0,\\
-D_{\mathrm{ax}} \left.\frac{\partial c}{\partial z}\right|_{z=0} &= u \bigl(c_0 - c(0)\bigr), & \left.\frac{\partial c}{\partial z}\right|_{z=L} &= 0,\\
\left.\frac{\partial c}{\partial r}\right|_{r=0} &= 0, & \left.\frac{\partial c}{\partial r}\right|_{r=R} &= 0,\\
\left.\frac{\partial T}{\partial r}\right|_{r=0} &= 0, & -\lambda_{\mathrm{rad}} \left.\frac{\partial T}{\partial r}\right|_{r=R} &= U_w \bigl(T(R) - T_c\bigr).
\end{aligned} \tag{12}$$

    The values of the parameters in this model are derived in Appendix C andsummarized in Tables C.1 and C.2.


In Chapter 7, we describe a numerical approach to deal with the partial differential equations in order to compute the cyclic steady state of the process. When we are able to evaluate the period map of the process using discretization techniques and integration routines, we can apply the limited memory Broyden methods of Chapter 3.

We define $f : \mathbb{R}^n \to \mathbb{R}^n$ to be the period map of the RFR over one flow-reversal period, associated with the balance equations (10)-(12). We use 100 equidistant grid points in the axial direction and 25 grid points in the radial direction. The state vector, denoted by $x$, consists of the temperature and the concentration at every grid point. This implies that $n = 5000$.

In Chapter 8, we propose the use of the Broyden Rank Reduction method to simulate a full two-dimensional model of the reverse flow reactor with radial gradients taken into account. A disadvantage of the method of Broyden is that an initial approximation of the solution has to be chosen, as well as an initial Broyden matrix. This problem is naturally solved by the application. The reverse flow reactor is usually started in a preheated state, for example $T = 2T_0$, where the reactor is filled with the carrier gas without a trace of the reactants, that is, $c = 0$. This initial state of the reactor is chosen as the initial state of the Broyden process. In the first periods, the state of the reverse flow reactor converges relatively fast to the cyclic steady state. Thereafter the rate of convergence decreases, and the dynamical process takes many periods before it reaches the cyclic steady state. By taking the initial Broyden matrix equal to minus the identity, $B_0 = -I$, the first iteration of the Broyden process is a dynamical simulation step,

$$x_1 = x_0 - B_0^{-1} g(x_0) = x_0 + f(x_0) - x_0 = f(x_0).$$

We apply the BRR method for different values of $p$ to approximate a zero of the function $g(x) = f(x) - x$ with a residual of $10^{-8}$. Figure 5 shows that the BRR method converges in 49 iterations for $p = 10$. Here, instead of the $25{,}000{,}000$ ($= n^2$) storage locations required for a standard Broyden iteration, only $100{,}000$ ($= 2pn$) are needed for the Broyden matrix. If we allow a few more iterations, $p$ can even be chosen equal to 5, and the number of storage locations is reduced further. If $p$ is chosen too small ($p = 2$), the (fast) convergence is lost. For complete details of the computations, see Chapter 5.

The BRR method makes it possible to compute the cyclic steady state of the reverse flow reactor efficiently, with radial gradients integrated in the model.


Figure 5: The convergence rate (residual $\|g(x_k)\|$ against iteration k) of the method of Broyden and the BRR method, for different values of p (p = 10, 5, 2), applied to the period map of the reverse flow reactor using the two-dimensional model (10)-(12) with the parameter values of Tables C.1 and C.2.


Part I

Basics of limited memory methods


Chapter 1

An introduction to iterative methods

A general nonlinear system of algebraic equations can be written as

$$g(x) = 0, \tag{1.1}$$

where $x = (x_1, \ldots, x_n)$ is a vector in $\mathbb{R}^n$, the $n$-dimensional real vector space. The function $g : \mathbb{R}^n \to \mathbb{R}^n$ is assumed to be continuously differentiable in an open, convex domain $D$. The Jacobian of $g$, denoted by $J_g$, is assumed to be Lipschitz continuous in $D$, that is, there exists a constant $\gamma \ge 0$ such that for every $u$ and $v$ in $D$,

$$\|J_g(u) - J_g(v)\| \le \gamma \|u - v\|.$$

We write $J_g \in \mathrm{Lip}_\gamma(D)$.

In general, systems of nonlinear equations cannot be solved analytically, and we have to consider numerical approaches. These approaches are generally built on the concept of iterations. Steps involving similar computations are performed over and over again until the solution is approximated. The oldest and most famous iterative method might be the method of Newton, also called the Newton-Raphson method. In Section 1.2, we derive and discuss the method of Newton and describe its convergence properties. In addition, we discuss some quasi-Newton methods, based on the method of Newton.

The quasi-Newton method of most interest to this work is the method of Broyden, proposed by Charles Broyden in 1965 [8]. In Section 1.3, we derive this method in the same way as Broyden did. We prove the local convergence of the method and discuss its most interesting features.

As a simple introduction to quasi-Newton methods, we first consider a scalar problem.


    1.1 Iterative methods in one variable

The algorithms we discuss in this section are the scalar versions of Newton's method, Broyden's method and other quasi-Newton methods, as discussed in Sections 1.2 and 1.3. The multi-dimensional versions of the methods are more complex, but an understanding of the scalar case will help in understanding the multi-dimensional case. The theorems in this section are special cases of the theorems in Sections 1.2 and 1.3. So, we often omit the proof, unless it gives insight into the algorithms.

The scalar version of Newton's method

The standard iterative method to solve (1.1) is the method of Newton, which can be described by a single expression. We choose an initial guess $x_0 \in \mathbb{R}$ to the solution $x^*$ and compute the sequence $\{x_k\}$ using the iteration scheme

$$x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)}, \qquad k = 0, 1, 2, \ldots. \tag{1.2}$$

This iteration scheme involves solving a local affine model for the function $g$ instead of solving the nonlinear equation (1.1) directly. A clear choice for the affine model, denoted by $l_k(x)$, is the tangent line to the graph of $g$ at the point $(x_k, g(x_k))$. So, the function is linearized at the point $x_k$, i.e.,

$$l_k(x) = g(x_k) + g'(x_k)(x - x_k), \tag{1.3}$$

and $x_{k+1}$ is defined to be the zero of this affine function, which yields (1.2). We illustrate this idea with an example.

Example 1.1. Let $g : \mathbb{R} \to \mathbb{R}$ be given by

$$g(x) = x^2 - 2. \tag{1.4}$$

The derivative of this function is $g'(x) = 2x$, and an exact zero of $g$ is $x^* = \sqrt{2}$. As initial condition, we take $x_0 = 3$. The first affine model equals the tangent line to $g$ at $x_0$,

$$l_0(x) = g(x_0) + g'(x_0)(x - x_0) = 7 + 6(x - 3) = 6x - 11.$$

The next iterate $x_1$ is determined to be the intersection point of the tangent line and the $x$-axis, $x_1 = \frac{11}{6}$, see Figure 1.1. Next, we repeat the same step starting from the new estimate $x_1$.
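As a hedged illustration, the scheme (1.2) applied to Example 1.1 can be written in a few lines of Python; the function name `newton_scalar` and its arguments are our own, not notation from the thesis.

```python
def newton_scalar(g, dg, x0, tol=1e-12, max_iter=50):
    """Minimal sketch of the scalar Newton iteration (1.2)."""
    x = x0
    for _ in range(max_iter):
        if abs(g(x)) < tol:
            break
        x = x - g(x) / dg(x)  # zero of the tangent-line model (1.3)
    return x

# Example 1.1: g(x) = x^2 - 2, g'(x) = 2x, x0 = 3 converges to sqrt(2)
root = newton_scalar(lambda x: x**2 - 2, lambda x: 2 * x, 3.0)
```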


Figure 1.1: The first two steps of the scalar version of Newton's method (1.2) for $x^2 - 2 = 0$, starting at $x_0 = 3$.

An important fact is that the method of Newton is locally q-quadratically convergent. In every iteration, the number of accurate digits is doubled when the iteration starts close to the true solution. The scalar version of Theorem 1.10 reads as follows.

Theorem 1.2. Let $g : \mathbb{R} \to \mathbb{R}$ be continuously differentiable in an open interval $D$, where $g' \in \mathrm{Lip}_\gamma(D)$. Assume that for some $\rho > 0$, $|g'(x)| \ge \rho$ for every $x \in D$. If $g(x) = 0$ has a solution $x^* \in D$, then there exists an $\varepsilon > 0$ such that if $|x_0 - x^*| < \varepsilon$, the sequence $\{x_k\}$ generated by

$$x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)}, \qquad k = 0, 1, \ldots,$$

exists and converges to $x^*$. Furthermore, for $k \ge 0$,

$$|x_{k+1} - x^*| \le \frac{\gamma}{2\rho} |x_k - x^*|^2.$$

The condition that $g'(x)$ has a nonzero lower bound in $D$ simply means that $g'(x^*)$ must be nonzero for Newton's method to converge quadratically. If $g'(x^*) = 0$, then $x^*$ is a multiple root, and Newton's method converges only linearly [18]. In addition, if $|g'(x)| \ge \rho$ on $D$, the continuity of $g'$ implies that $x^*$ is the only solution in $D$.

Theorem 1.2 guarantees convergence only for a starting point $x_0$ that lies in a neighborhood of the solution $x^*$. If $|x_0 - x^*|$ is too large, Newton's method might not converge. So, the method is useful for its fast local convergence, but we need to combine it with a more robust algorithm that can converge from starting points further away from the true solution.

    The secant method

In many practical applications, the nonlinear equation cannot be given in closed form. For example, the function $g$ might be the output of a computational or experimental procedure. In this case, $g'(x)$ is not available, and we have to modify Newton's method, which requires the derivative $g'(x)$ to model $g$ around the current estimate $x_k$ by the tangent line to $g(x)$ at $x_k$. The tangent line can be approximated by the secant line through $g(x)$ at $x_k$ and a nearby point $x_k + h_k$. The slope of this line is given by

$$a_k = \frac{g(x_k + h_k) - g(x_k)}{h_k}. \tag{1.5}$$

The function $g(x)$ is modeled by

$$l_k(x) = g(x_k) + a_k(x - x_k). \tag{1.6}$$

Iterative methods that solve (1.6) in every iteration step are called quasi-Newton methods. These methods follow the scheme

$$x_{k+1} = x_k - \frac{g(x_k)}{a_k}, \qquad k = 0, 1, \ldots. \tag{1.7}$$

Of course, we have to choose $h_k$ in the right way. For $h_k$ sufficiently small, $a_k$ is a finite-difference approximation to $g'(x_k)$. In Theorem 1.5, we show that using $a_k$ given by (1.5) with sufficiently small $h_k$ works as well as using the derivative itself. However, in every iteration two function evaluations are needed. If computing $g(x)$ is very expensive, using $h_k = x_{k-1} - x_k$ may be a better choice, where $x_{k-1}$ is the previous iterate. Substituting $h_k = x_{k-1} - x_k$ in (1.5) gives

$$a_k = \frac{g(x_{k-1}) - g(x_k)}{x_{k-1} - x_k}, \tag{1.8}$$

and only one function evaluation is required, since $g(x_{k-1})$ was already computed in the previous iteration. This quasi-Newton method is called the secant


method, because the local model uses the secant line through the points $x_k$ and $x_{k-1}$. Since $a_0$ is not defined by the secant method, $a_0$ is often chosen using (1.5) with $h_0$ small, or $a_0 = 1$.

    While it may seem locally ad hoc, it turns out to work well. The method isslightly slower than a finite-difference method, but usually it is more efficient interms of the total number of function evaluations required to obtain a specifiedaccuracy.
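A minimal Python sketch of the scheme (1.7) with slope (1.8) may clarify the bookkeeping; the names are our own, and `a0` is the initial slope discussed above.

```python
def secant_scalar(g, x0, a0, tol=1e-12, max_iter=50):
    """Minimal sketch of the secant method (1.7)-(1.8)."""
    x_prev, x = x0, x0 - g(x0) / a0            # first step uses a0
    for _ in range(max_iter):
        if abs(g(x)) < tol:
            break
        a = (g(x_prev) - g(x)) / (x_prev - x)  # secant slope (1.8)
        x_prev, x = x, x - g(x) / a            # quasi-Newton step (1.7)
    return x

# x0 = 3 with a0 = g'(3) = 6 reproduces the secant column of Table 1.1 below
root = secant_scalar(lambda x: x**2 - 2, 3.0, 6.0)
```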

To prove the convergence of the secant method, we need the following lemma, which also plays a role in the multi-dimensional setting.

Lemma 1.3. Let $g : \mathbb{R} \to \mathbb{R}$ be continuously differentiable in an open interval $D$, and let $g' \in \mathrm{Lip}_\gamma(D)$. Then for any $x$, $y$ in $D$,

$$|g(y) - g(x) - g'(x)(y - x)| \le \frac{\gamma (y - x)^2}{2}.$$

Proof. The fundamental theorem of calculus gives that $g(y) - g(x) = \int_x^y g'(z)\,dz$, which implies

$$g(y) - g(x) - g'(x)(y - x) = \int_x^y \bigl(g'(z) - g'(x)\bigr)\,dz. \tag{1.9}$$

Under the change of variables

$$z = x + t(y - x), \qquad dz = (y - x)\,dt,$$

(1.9) becomes

$$g(y) - g(x) - g'(x)(y - x) = \int_0^1 \bigl(g'(x + t(y - x)) - g'(x)\bigr)(y - x)\,dt.$$

Applying the triangle inequality to the integral and using the Lipschitz continuity of $g'$ yields

$$|g(y) - g(x) - g'(x)(y - x)| \le |y - x| \int_0^1 \gamma |t(y - x)|\,dt = \gamma |y - x|^2/2.$$

We analyze one step of the quasi-Newton process (1.7). By construction,

$$\begin{aligned}
x_{k+1} - x^* &= a_k^{-1}\bigl(a_k(x_k - x^*) - g(x_k) + g(x^*)\bigr)\\
&= a_k^{-1}\bigl(g(x^*) - g(x_k) - g'(x_k)(x^* - x_k) + (g'(x_k) - a_k)(x^* - x_k)\bigr)\\
&= a_k^{-1}\Bigl(\int_{x_k}^{x^*} \bigl(g'(z) - g'(x_k)\bigr)\,dz + (g'(x_k) - a_k)(x^* - x_k)\Bigr).
\end{aligned}$$


If we define $e_k = |x_k - x^*|$ and use $g' \in \mathrm{Lip}_\gamma(D)$ in the same way as in the proof of Lemma 1.3, we obtain

$$e_{k+1} \le |a_k^{-1}| \Bigl(\frac{\gamma}{2} e_k^2 + |g'(x_k) - a_k|\, e_k\Bigr). \tag{1.10}$$

    . (1.10)

    In order to use (1.10), we have to know how close the finite differenceapproximation ak is to g

    (xk) as a function of hk.

    Lemma 1.4. Letg : R R be continuously differentiable in an open intervalD and let g Lip(D). If xk, xk + hk D and ak is defined by (1.5), then

    |ak g(xk)| |hk|2

    . (1.11)

Proof. From Lemma 1.3, we have

$$|g(x_k + h_k) - g(x_k) - h_k g'(x_k)| \le \frac{\gamma |h_k|^2}{2}.$$

Dividing both sides by $|h_k|$ gives the desired result.

Substituting (1.11) in (1.10) gives

$$e_{k+1} \le \frac{\gamma}{2} |a_k^{-1}| (e_k + |h_k|)\, e_k. \tag{1.12}$$

    Using this inequality, it is not difficult to prove the following theorem.

Theorem 1.5. Let $g : \mathbb{R} \to \mathbb{R}$ be continuously differentiable in an open interval $D$ and let $g' \in \mathrm{Lip}_\gamma(D)$. Assume that $|g'(x)| \ge \rho$ for some $\rho > 0$ and for every $x \in D$. If $g(x) = 0$ has a solution $x^* \in D$, then there exist positive constants $\varepsilon$, $\bar{h}$ such that if $\{h_k\}$ is a real sequence with $0 < |h_k| \le \bar{h}$, and if $|x_0 - x^*| < \varepsilon$, then the sequence $\{x_k\}$ given by

$$x_{k+1} = x_k - \frac{g(x_k)}{a_k}, \qquad a_k = \frac{g(x_k + h_k) - g(x_k)}{h_k},$$

for $k = 0, 1, \ldots$, is well defined and converges q-linearly to $x^*$. If $\lim_{k \to \infty} h_k = 0$, then the convergence is q-superlinear. If there exists some constant $c_1$ such that

$$|h_k| \le c_1 |x_k - x^*|,$$

or equivalently, a constant $c_2$ such that

$$|h_k| \le c_2 |g(x_k)|, \tag{1.13}$$


then the convergence is q-quadratic. If there exists some constant $c_3$ such that

$$|h_k| \le c_3 |x_k - x_{k-1}|, \tag{1.14}$$

then the convergence is at least two-step q-quadratic.

If we would like the finite-difference methods to converge q-quadratically, we just set $h_k = c_2 |g(x_k)|$. Indeed, if $x_k$ is close enough to $x^*$, the mean value theorem and the fact that $g(x^*) = 0$ imply that $|g(x_k)| \le c\, |x_k - x^*|$ for some $c > 0$. Note that the secant method, $h_k = x_{k-1} - x_k$, is included as a special case of (1.14). We restrict the proof of Theorem 1.5 to the two-step q-quadratic convergence of the secant method.

Proof (of Theorem 1.5). We first prove that the secant method is q-linearly convergent. Choose $\varepsilon = \rho/(4\gamma)$ and $\bar{h} = \rho/(2\gamma)$. Suppose $x_0$ and $x_1$ are in $D$ and in addition $|x_0 - x^*| < \varepsilon$, $|x_1 - x^*| < \varepsilon$ and $|h_1| = |x_1 - x_0| < \bar{h}$. Since $|g'(x)| \ge \rho$ for all $x \in D$, (1.11) implies that

$$|a_1| = |a_1 - g'(x_1) + g'(x_1)| \ge |g'(x_1)| - |a_1 - g'(x_1)| \ge \rho - \frac{\gamma |h_1|}{2} \ge \rho - \frac{\rho}{4} = \frac{3\rho}{4}.$$

From (1.12), this gives

$$e_2 \le \frac{2\gamma}{3\rho}\,(e_1 + |h_1|)\, e_1 \le \frac{2\gamma}{3\rho}\Bigl(\frac{\rho}{4\gamma} + \frac{\rho}{2\gamma}\Bigr) e_1 = \frac{1}{2}\, e_1.$$

Therefore, we have $|x_2 - x^*| \le \frac{1}{2}\varepsilon < \varepsilon$ and

$$|h_2| = |x_2 - x_1| \le e_2 + e_1 \le \frac{3}{2}\varepsilon = \frac{3}{4}\bar{h} < \bar{h}.$$

Using the same arguments, we obtain

$$e_{k+1} \le \frac{2\gamma}{3\rho}(e_k + |h_k|)\, e_k \le \frac{1}{2}\, e_k \quad \text{for all } k = 1, 2, \ldots. \tag{1.15}$$

To prove the two-step q-quadratic convergence of the secant method, we note that

$$|h_k| = |x_k - x_{k-1}| \le e_k + e_{k-1}, \qquad k = 1, 2, \ldots.$$


Using the linear convergence, we derive from (1.15)

$$e_{k+1} \le \frac{2\gamma}{3\rho}(e_k + e_k + e_{k-1})\, e_k \le \frac{2\gamma}{3\rho}(2e_{k-1})\, \frac{1}{2} e_{k-1} = \frac{2\gamma}{3\rho}\, e_{k-1}^2.$$

This implies the two-step q-quadratic convergence.

In numerical simulations, we have to deal with the restrictions arising from the finite arithmetic of the computer. In particular, we should not choose $h_k$ too small, because then $fl(x_k) = fl(x_k + h_k)$, where $fl(a)$ is the floating point representation of $a$, and the finite-difference approximation of $g'(x_k)$ is not defined. Additionally, it can also happen that $fl(g(x_k)) = fl(g(x_k + h_k))$, although the derivative is not equal to zero. This is one reason that the secant process can fail. Of course, it depends on the function and on how accurately we want to approximate its zero.

In the next example, we consider the secant method again, applied to the function $g(x) = x^2 - 2$.

Example 1.6. Let $g : \mathbb{R} \to \mathbb{R}$ be given by (1.4). We apply the secant method, (1.7) and (1.8), to solve $g(x) = 0$, starting from the initial condition $x_0 = 3$. In the first iteration step of the secant method, we use $a_0 = g'(x_0)$. In Figure 1.2, the first two steps of the secant method are displayed. Table 1.1 shows that the secant method needs just a few more iterations than Newton's method to approximate $\sqrt{2}$ with the same precision.

iteration   Newton's method    Secant method
0           3.0                3.0
1           1.833333333333     1.833333333333
2           1.462121212121     1.551724137931
3           1.414998429895     1.431239388795
4           1.414213780047     1.414998429895
5           1.414213562373     1.414218257349
6           -                  1.414213563676
7           -                  1.414213562373

Table 1.1: The method of Newton (1.2) and the secant method, (1.7) and (1.8), approximating $\sqrt{2}$.


Figure 1.2: The first two steps of the secant method, defined by (1.7) and (1.8), for $x^2 - 2 = 0$, starting from $x_0 = 3$.

    1.2 The method of Newton

In this section, we derive the method of Newton in the multi-dimensional setting. In Theorem 1.10, we prove that in $\mathbb{R}^n$ Newton's method is locally q-quadratically convergent. We then discuss the finite-difference version of Newton's method and other types of quasi-Newton methods, which leads to the introduction of the method of Broyden in Section 1.3.

    Derivation of the algorithm

Similar to the one-dimensional setting, the method of Newton is based on finding the root of an affine approximation to $g$ at the current iterate $x_k$. The local model is derived from the equality

$$g(x_k + s) = g(x_k) + \int_0^1 J_g(x_k + t s)\, s\, dt,$$

where $J_g$ denotes the Jacobian of $g$. If the integral is approximated by $J_g(x_k)s$, the model in the current iterate becomes

$$l_k(x_k + s) = g(x_k) + J_g(x_k)s. \tag{1.16}$$


We solve this affine model for $s$, that is, find $s_k \in \mathbb{R}^n$ such that

$$l_k(x_k + s_k) = 0.$$

This Newton step, $s_k$, is added to the current iterate,

$$x_{k+1} = x_k + s_k.$$

The new iterate $x_{k+1}$ is not expected to equal $x^*$, but only to be a better estimate than $x_k$. Therefore, we build the Newton iteration into an algorithm, starting from an initial guess $x_0$.

Algorithm 1.7 (Newton's method). Choose an initial estimate $x_0 \in \mathbb{R}^n$, pick $\varepsilon > 0$, and set $k := 0$. Repeat the following sequence of steps until $\|g(x_k)\| < \varepsilon$.

i) Solve $J_g(x_k) s_k = -g(x_k)$ for $s_k$,
ii) $x_{k+1} := x_k + s_k$.
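A minimal Python sketch of Algorithm 1.7, assuming the Jacobian is available as a callable; the names are our own illustration.

```python
import numpy as np

def newton(g, jac, x0, eps=1e-12, max_iter=50):
    """Minimal sketch of Algorithm 1.7.

    g   : callable, R^n -> R^n
    jac : callable returning the Jacobian J_g(x) as an (n, n) array
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        if np.linalg.norm(g(x)) < eps:
            break
        s = np.linalg.solve(jac(x), -g(x))  # step i): J_g(x_k) s_k = -g(x_k)
        x = x + s                           # step ii)
    return x
```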

In order to judge every iterative method described in this thesis, we consider the rate of convergence of each method on a test function, the discrete integral equation function, as described in Appendix A. We have chosen this function from a large set of test functions, called the CUTE collection, cf. [18, 47]. It is a commonly chosen problem and, in addition, the method of Broyden is able to compute a zero of this function rather easily. The first time we use this test function, we explicitly give the expression of the function. In future examples, we refer to Appendix A.

Example 1.8. We apply Algorithm 1.7 to find a zero of the discrete integral equation function, given by

$$g_i(x) = x_i + \frac{h}{2}\Bigl[(1 - t_i) \sum_{j=1}^{i} t_j (x_j + t_j + 1)^3 + t_i \sum_{j=i+1}^{n} (1 - t_j)(x_j + t_j + 1)^3\Bigr], \tag{1.17}$$

for $i = 1, \ldots, n$, where $h = 1/(n + 1)$ and $t_i = i h$, $i = 1, \ldots, n$. We start with the initial vector $x_0$ given by

$$x_0 = \bigl(t_1(t_1 - 1), \ldots, t_n(t_n - 1)\bigr).$$
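For illustration, a direct (unoptimized) Python transcription of (1.17) and the initial vector, under our reading of the formula above; the names are hypothetical.

```python
import numpy as np

def g_discrete_integral(x):
    """Sketch of the discrete integral equation function (1.17)."""
    n = x.size
    h = 1.0 / (n + 1)
    t = h * np.arange(1, n + 1)
    w = (x + t + 1.0) ** 3                       # common factor (x_j + t_j + 1)^3
    g = np.empty(n)
    for i in range(n):
        lo = np.dot(t[: i + 1], w[: i + 1])      # sum over j <= i
        hi = np.dot(1.0 - t[i + 1:], w[i + 1:])  # sum over j > i
        g[i] = x[i] + 0.5 * h * ((1.0 - t[i]) * lo + t[i] * hi)
    return g

n = 100
t = np.arange(1, n + 1) / (n + 1)
x0 = t * (t - 1.0)                               # initial vector of Example 1.8
```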

In Table 1.2, the convergence properties of Newton's method are described for different dimensions $n$ of the problem. The initial residual $\|g(x_0)\|$ and the final residual $\|g(x_{k^*})\|$ are given, where $k^*$ is the number of iterations used. The variable $R$ is a measure of the rate of convergence, defined by

$$R = \log\bigl(\|g(x_0)\| / \|g(x_{k^*})\|\bigr)/k^*. \tag{1.18}$$

The residual $\|g(x_k)\|$, $k = 0, \ldots, k^*$, is plotted in Figure 1.3. We observe that the dimension of the problem does not influence the convergence of Newton's method in the case of this test function.

method   n     ||g(x_0)||   ||g(x_{k*})||       k*   R
Newton   10    0.2518       6.3085 · 10^-15     3    10.4393
Newton   100   0.7570       1.7854 · 10^-14     3    10.4594
Newton   200   1.0678       2.4858 · 10^-14     3    10.4637

Table 1.2: The convergence properties of Algorithm 1.7 applied to the discrete integral equation function (1.17) for different dimensions n.

Figure 1.3: The convergence rate of Algorithm 1.7 applied to the discrete integral equation function (1.17) for different dimensions n (n = 10, 100, 200).

Note that if $g$ is an affine function, Newton's method solves the problem in one iteration. Even if only a component function of $g$ is affine, each iterate generated by Newton's method is a zero of this component function; that is, if $g_1$ is affine, then $g_1(x_1) = g_1(x_2) = \ldots = 0$. We illustrate this with an example.


Example 1.9. We consider the Rosenbrock function $g : \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$g(x) = \begin{pmatrix} 10(x_2 - x_1^2) \\ 1 - x_1 \end{pmatrix}.$$

As initial condition, we choose $x_0 = (-1.2, 1)$. The function value at $x_0$ equals $g(x_0) = (-4.4, 2.2)$. Note that the second component of the Rosenbrock function is affine in $x$. This explains the zero in the function value of $x_1 = (1, -3.84)$, which equals $g(x_1) = (-48.4, 0)$. As said before, all future iterates will be a zero of the second component function. This implies that the first component of $x_k$ will be equal to 1 for all future iterations. So, the first component of the Rosenbrock function has become affine, and the next iterate yields the solution $x_2 = (1, 1)$.

Some problems arise in implementing Algorithm 1.7. The Jacobian of $g$ is often not analytically available, for example, if $g$ itself is not given in analytic form. A finite-difference method or less expensive methods should be used to approximate the Jacobian. Secondly, if $J_g(x_k)$ is ill-conditioned, then solving $J_g(x_k)s_k = -g(x_k)$ will not give a reliable solution.

Local convergence of Newton's method

In this section, we give a proof of the local q-quadratic convergence of Newton's method and discuss its implications. The proof is a prototype of the proofs for convergence of the quasi-Newton methods. All the convergence results in this thesis are local, i.e., there exists an $\varepsilon > 0$ such that the iterative method converges for all $x_0$ in an open neighborhood $N(x^*, \varepsilon)$ of the solution $x^*$. Here,

$$N(x^*, \varepsilon) = \{x \in \mathbb{R}^n \mid \|x - x^*\| < \varepsilon\}.$$

Theorem 1.10. Let $g : \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable in an open, convex set $D \subset \mathbb{R}^n$. Assume that $J_g \in \mathrm{Lip}_\gamma(D)$ and that there exist $x^* \in \mathbb{R}^n$ and $\beta > 0$ such that $g(x^*) = 0$ and $J_g(x^*)$ is nonsingular with $\|J_g(x^*)^{-1}\| \le \beta$. Then there exists an $\varepsilon > 0$ such that for all $x_0 \in N(x^*, \varepsilon)$ the sequence $\{x_k\}$ generated by

$$x_{k+1} = x_k - J_g(x_k)^{-1} g(x_k), \qquad k = 0, 1, 2, \ldots,$$

is well defined, converges to $x^*$, and satisfies

$$\|x_{k+1} - x^*\| \le \beta\gamma\, \|x_k - x^*\|^2 \quad \text{for } k = 0, 1, 2, \ldots. \tag{1.19}$$


We can show that the convergence is q-quadratic by choosing $\varepsilon$ such that $J_g(x)$ is nonsingular for all $x \in N(x^*, \varepsilon)$. The reason is that if the Jacobian is nonsingular, the local error in the affine model (1.16) is at most of order $O(\|x_k - x^*\|^2)$. This is a consequence of the following lemma.

Lemma 1.11. Let $g : \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable in the open, convex set $D \subset \mathbb{R}^n$ and let $x \in D$. If $J_g \in \mathrm{Lip}_\gamma(D)$, then for any $y \in D$,

$$\|g(y) - g(x) - J_g(x)(y - x)\| \le \frac{\gamma}{2}\, \|y - x\|^2.$$

Proof. According to the fundamental theorem of calculus, we have

$$g(y) - g(x) - J_g(x)(y - x) = \int_0^1 J_g(x + t(y - x))(y - x)\,dt - J_g(x)(y - x) = \int_0^1 \bigl(J_g(x + t(y - x)) - J_g(x)\bigr)(y - x)\,dt. \tag{1.20}$$

We can bound the integral on the right-hand side of (1.20) in terms of the integrand. Together with the Lipschitz continuity of $J_g$ at $x \in D$, this implies

$$\|g(y) - g(x) - J_g(x)(y - x)\| \le \int_0^1 \|J_g(x + t(y - x)) - J_g(x)\|\, \|y - x\|\,dt \le \int_0^1 \gamma \|t(y - x)\|\, \|y - x\|\,dt = \gamma\, \|y - x\|^2 \int_0^1 t\,dt = \frac{\gamma}{2}\|y - x\|^2.$$

The next theorem says that matrix inversion is continuous in norm. Furthermore, it gives a relation between the norms of the inverses of two nearby matrices that is useful later in analyzing algorithms.

Theorem 1.12. Let $\|\cdot\|$ be the induced $l_2$-norm on $\mathbb{R}^{n \times n}$ and let $E \in \mathbb{R}^{n \times n}$. If $\|E\| < 1$, then $(I - E)^{-1}$ exists and

$$\|(I - E)^{-1}\| \le \frac{1}{1 - \|E\|}.$$

If $A$ is nonsingular and $\|A^{-1}(B - A)\| < 1$, then $B$ is nonsingular and

$$\|B^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}(B - A)\|}.$$


    The proof of Theorem 1.12 can be found in [18].

Proof (of Theorem 1.10). We choose

$$\varepsilon \le \frac{1}{2\beta\gamma} \tag{1.21}$$

so that $N(x^*, \varepsilon) \subset D$. By induction on $k$, we show that (1.19) holds for each iteration step and that

$$\|x_{k+1} - x^*\| \le \frac{1}{2}\|x_k - x^*\|,$$

which implies that $x_{k+1} \in N(x^*, \varepsilon)$ if $x_k \in N(x^*, \varepsilon)$.

We first consider the basis step ($k = 0$). Using the Lipschitz continuity of $J_g$ at $x^*$, $\|x_0 - x^*\| \le \varepsilon$ and (1.21), we obtain

$$\|J_g(x^*)^{-1}(J_g(x_0) - J_g(x^*))\| \le \|J_g(x^*)^{-1}\|\, \|J_g(x_0) - J_g(x^*)\| \le \beta\gamma\, \|x_0 - x^*\| \le \frac{1}{2}.$$

Theorem 1.12 implies that $J_g(x_0)$ is nonsingular and

$$\|J_g(x_0)^{-1}\| \le \frac{\|J_g(x^*)^{-1}\|}{1 - \|J_g(x^*)^{-1}(J_g(x_0) - J_g(x^*))\|} \le 2\|J_g(x^*)^{-1}\| \le 2\beta. \tag{1.22}$$

This implies that $x_1$ is well defined and, additionally,

$$x_1 - x^* = x_0 - x^* - J_g(x_0)^{-1} g(x_0) = x_0 - x^* - J_g(x_0)^{-1}\bigl(g(x_0) - g(x^*)\bigr) = J_g(x_0)^{-1}\bigl(g(x^*) - g(x_0) - J_g(x_0)(x^* - x_0)\bigr). \tag{1.23}$$

The second factor in (1.23) gives the difference between $g(x^*)$ and the affine model $l_0(x)$ evaluated at $x^*$. Therefore, by Lemma 1.11 and (1.22),

$$\|x_1 - x^*\| \le \|J_g(x_0)^{-1}\|\, \|g(x^*) - g(x_0) - J_g(x_0)(x^* - x_0)\| \le 2\beta \cdot \frac{\gamma}{2}\|x_0 - x^*\|^2 = \beta\gamma\|x_0 - x^*\|^2.$$

We have shown (1.19) for $k = 0$. Since $\|x_0 - x^*\| \le \varepsilon \le 1/(2\beta\gamma)$, it follows that $\|x_1 - x^*\| \le \frac{1}{2}\|x_0 - x^*\|$, which yields $x_1 \in N(x^*, \varepsilon)$ as well. This completes the proof for $k = 0$.

The proof of the induction step proceeds in the same way.


Note that if $g$ is affine, the Jacobian is constant and the Lipschitz constant $\gamma$ can be chosen to be zero. We then have

$$\|x_1 - x^*\| \le \beta\gamma\|x_0 - x^*\|^2 = 0,$$

and the method of Newton converges exactly in one single iteration. If $g$ is a nonlinear function, the relative nonlinearity of $g$ at $x^*$ is given by $\gamma_{\mathrm{rel}} = \beta\gamma$. So, for $x \in D$,

$$\|J_g(x^*)^{-1}(J_g(x) - J_g(x^*))\| \le \|J_g(x^*)^{-1}\|\, \|J_g(x) - J_g(x^*)\| \le \beta\gamma\, \|x - x^*\| = \gamma_{\mathrm{rel}}\, \|x - x^*\|.$$

The radius of guaranteed convergence of Newton's method is inversely proportional to the relative nonlinearity, $\gamma_{\mathrm{rel}}$, of $g$ at $x^*$. The bound for the region of convergence is a worst-case estimate. In directions from $x^*$ in which $g$ is less nonlinear, the region of convergence may very well be much larger.

We conclude with a summary of the characteristics of Newton's method.

Advantages of Newton's method

- q-quadratically convergent from good starting points if $J_g(x^*)$ is nonsingular,
- exact solution in one iteration for an affine function $g$ (exact at each iteration for any affine component function of $g$).

Disadvantages of Newton's method

- not globally convergent for many problems,
- requires the Jacobian $J_g(x_k)$ at each iteration step,
- each iteration step requires the solution of a system of linear equations that might be singular or ill-conditioned.

    Quasi-Newton methods

    We have already indicated that it is not always possible to compute the Ja-cobian of the function g, or that it is very expensive. In this case, we have toapproximate the Jacobian, for example, by using finite differences.

Algorithm 1.13 (Discrete Newton method). Choose an initial estimate $x_0 \in \mathbb{R}^n$, pick $\varepsilon > 0$, and set $k := 0$. Repeat the following sequence of steps until $\|g(x_k)\| < \varepsilon$.

i) Compute

$$A_k = \Bigl( \bigl(g(x_k + h_k e_1) - g(x_k)\bigr)/h_k \;\; \cdots \;\; \bigl(g(x_k + h_k e_n) - g(x_k)\bigr)/h_k \Bigr),$$

ii) Solve $A_k s_k = -g(x_k)$ for $s_k$,
iii) $x_{k+1} := x_k + s_k$.
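Step i) can be sketched in Python as follows; the helper name `fd_jacobian` is our own illustration.

```python
import numpy as np

def fd_jacobian(g, x, h=1e-8):
    """Sketch of step i) of Algorithm 1.13: forward-difference Jacobian.
    Column j approximates J_g(x) e_j."""
    n = x.size
    gx = g(x)
    A = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        A[:, j] = (g(x + h * e) - gx) / h
    return A
```

Note that this costs one evaluation for $g(x)$ plus one per column, i.e. the $(n + 1)$ function evaluations per iteration step mentioned in the Introduction.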

Example 1.14. We consider the discrete integral equation function $g$, given by (A.5). We assume that $h_k \equiv h$ and apply Algorithm 1.13 for different values of $h$. We start with the initial condition $x_0$ given by (A.6) and set $\varepsilon = 10^{-12}$. The convergence properties of the discrete Newton method are described in Table 1.3. The rate of convergence is plotted in Figure 1.4. The difference $\|J_g(x_k) - A_k\|$ between the real Jacobian and the approximated Jacobian turns out to be of order $10^{-5}$ for $h = 1.0 \cdot 10^{-4}$, of order $10^{-7}$ for $h = 1.0 \cdot 10^{-8}$, and of order $10^{-3}$ for $h = 1.0 \cdot 10^{-12}$.

method            n     h             ||g(x_0)||   ||g(x_{k*})||      k*   R
Discrete Newton   100   1.0 · 10^-4   0.7570       4.4908 · 10^-16    4    8.7652
Discrete Newton   100   1.0 · 10^-8   0.7570       8.6074 · 10^-14    3    9.9351
Discrete Newton   100   1.0 · 10^-12  0.7570       2.1317 · 10^-13    4    7.2246

Table 1.3: The convergence properties of Algorithm 1.13, applied to the discrete integral equation function (A.5) for different values of h.

If the finite-difference step size $h_k$ is properly chosen, the discrete Newton method is also q-quadratically convergent. This is the conclusion of the next theorem. We denote the $l_1$ vector norm and the corresponding induced matrix norm by $\|\cdot\|_1$.

Theorem 1.15. Let $g$ and $x^*$ satisfy the assumptions of Theorem 1.10. Then there exist $\varepsilon, \bar{h} > 0$ such that if $\{h_k\}$ is a real sequence with $0 < |h_k| \le \bar{h}$ and $x_0 \in N(x^*, \varepsilon)$, the sequence $\{x_k\}$ generated by

$$x_{k+1} = x_k - A_k^{-1} g(x_k), \qquad k = 0, 1, 2, \ldots,$$

where

$$A_k = \Bigl( \bigl(g(x_k + h_k e_1) - g(x_k)\bigr)/h_k \;\; \cdots \;\; \bigl(g(x_k + h_k e_n) - g(x_k)\bigr)/h_k \Bigr),$$

is well defined and converges q-linearly to $x^*$. Additionally, if

$$\lim_{k \to \infty} h_k = 0,$$


Figure 1.4: The convergence rate of the discrete Newton method (Algorithm 1.13) applied to the discrete integral equation function (A.5) for different values of h ($h = 10^{-4}, 10^{-8}, 10^{-12}$).

then the convergence is q-superlinear. If there exists a constant $c_1$ such that

$$|h_k| \le c_1 \|x_k - x^*\|_1,$$

or equivalently a constant $c_2$ such that

$$|h_k| \le c_2 \|g(x_k)\|_1,$$

then the convergence is q-quadratic.

For the proof of Theorem 1.15 we refer to [18]. Another way to avoid computations of the Jacobian in every iteration is to compute the Jacobian in the first iteration, $A = J_g(x_0)$, and use this matrix in all subsequent iterations as an approximation of $J_g(x_k)$. This method is called the Newton-Chord method. It turns out that the Newton-Chord method is locally linearly convergent [38].

Algorithm 1.16 (Newton-Chord method). Choose an initial estimate $x_0 \in \mathbb{R}^n$, pick $\varepsilon > 0$, set $k := 0$, and compute the Jacobian $A := J_g(x_0)$. Repeat the following sequence of steps until $\|g(x_k)\| < \varepsilon$.

i) Solve $A s_k = -g(x_k)$ for $s_k$,
ii) $x_{k+1} := x_k + s_k$.
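A minimal Python sketch of Algorithm 1.16, assuming the initial Jacobian is supplied as an array; the names are our own illustration.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def newton_chord(g, J0, x0, eps=1e-12, max_iter=200):
    """Minimal sketch of Algorithm 1.16: the Jacobian J0 = J_g(x0) is
    factorized once and reused in every iteration step."""
    x = np.asarray(x0, dtype=float)
    lu = lu_factor(J0)                 # factor A = J_g(x_0) only once
    for _ in range(max_iter):
        if np.linalg.norm(g(x)) < eps:
            break
        x = x + lu_solve(lu, -g(x))    # steps i) and ii) with the fixed A
    return x
```

Factoring $A$ once reduces the per-iteration linear algebra cost from $O(n^3)$ to $O(n^2)$, which is the point of the method.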

Example 1.17. Let $g$ be the discrete integral equation function given by (A.5). We apply Algorithm 1.16 and Algorithm 1.7 to approximate the zero of $g$. As initial estimate we choose $x_0$ given by (A.6), multiplied by a factor 1, 10 or 100. The convergence properties of the Newton-Chord method and Newton's method are described in Table 1.4. In Figure 1.5, we can observe the linear convergence of the Newton-Chord method. The rate of convergence of the Newton-Chord method is very low in case of the initial condition $100x_0$. Clearly, for all initial conditions, the Newton-Chord method needs more iterations to converge than the original method of Newton, see Figure 1.6.

method         n     factor   ||g(x_0)||      ||g(x_{k*})||      k*    R
Newton-Chord   100   1        0.7570          2.3372 · 10^-14    8     3.8886
Newton-Chord   100   10       18.5217         2.9052 · 10^-13    16    1.9866
Newton-Chord   100   100      3.8215 · 10^3   2.1287             200   0.0375
Newton         100   1        0.7570          1.7854 · 10^-14    3     10.4594
Newton         100   10       18.5217         2.7007 · 10^-16    4     9.6917
Newton         100   100      3.8215 · 10^3   3.9780 · 10^-13    9     4.0890

Table 1.4: The convergence properties of Algorithm 1.16 and Algorithm 1.7 applied to the discrete integral equation function (A.5) for different initial conditions (x_0, 10x_0 and 100x_0).

Figure 1.5: The convergence rate of Algorithm 1.16 applied to the discrete integral equation function (A.5) for different initial conditions ($x_0$, $10x_0$, $100x_0$).


Figure 1.6: The convergence rate of Algorithm 1.7 applied to the discrete integral equation function (A.5) for different initial conditions (x0, 10x0 and 100x0).

    1.3 The method of Broyden

The Newton-Chord method of the previous section saves us the expensive computation of the Jacobian Jg(xk) in every iterate xk of the process, by approximating it by the Jacobian at the initial condition, A = Jg(x0). Additional information about the Jacobian obtained during the process is neglected. This information consists of the function values of g at the iterates, needed to compute the step sk. In this section, we start with the basic idea for a class of methods that adjust the approximation of the Jacobian Jg(xk) using only the function value g(xk). We single out the method proposed by C.G. Broyden in 1965 [8], which has a q-superlinear and even 2n-step q-quadratic local convergence rate and turns out to be very successful in practice. This algorithm, which is analogous to the method of Newton, is called the method of Broyden.

    A derivation of the algorithm

Recall that in one dimension we use the local model (1.6),

    lk+1(x) = g(xk+1) + ak+1(x − xk+1),

for the nonlinear function g. Note that lk+1(xk+1) = g(xk+1) for all choices of ak+1 ∈ R. If we set ak+1 = g′(xk+1), we obtain Newton's method. If g′(xk+1) is not available, we force the scheme to satisfy lk+1(xk) = g(xk), that is,

    g(xk) = g(xk+1) + ak+1(xk − xk+1),


which yields the secant approximation (1.8),

    ak+1 = (g(xk+1) − g(xk)) / (xk+1 − xk).

The next iterate xk+2 is the zero of the local model lk+1. Therefore we arrive at the quasi-Newton update

    xk+2 = xk+1 − g(xk+1)/ak+1.

The price we have to pay is a reduction in local convergence rate, from q-quadratic to 2-step q-quadratic convergence.
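The one-dimensional scheme fits in a few lines of Python; this sketch assumes a scalar function g and two distinct starting points x0 and x1 (the name secant_1d and the defaults are ours).

    def secant_1d(g, x0, x1, eps=1e-12, max_iter=100):
        # One-dimensional secant method: a_{k+1} is a difference quotient.
        for _ in range(max_iter):
            a = (g(x1) - g(x0)) / (x1 - x0)   # secant approximation of g'
            x0, x1 = x1, x1 - g(x1) / a       # step to the zero of the local model
            if abs(g(x1)) < eps:
                break
        return x1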

In multiple dimensions, we apply an analogous affine model

    lk+1(x) = g(xk+1) + Bk+1(x − xk+1).

For Newton's method Bk+1 equals the Jacobian Jg(xk+1). We enforce the same requirement that led to the one-dimensional secant method. So, we assume that lk+1(xk) = g(xk), which implies that

    g(xk) = g(xk+1) + Bk+1(xk − xk+1).    (1.24)

Furthermore, if we define the current step by sk = xk+1 − xk, and the yield of the current step by yk = g(xk+1) − g(xk), Equation (1.24) reduces to

    Bk+1 sk = yk.    (1.25)

We refer to (1.25) as the secant equation. For completeness we first give the definition of a secant method.

Definition 1.18. The iterative process

    xk+1 = xk − Bk^{-1} g(xk)

is called a secant method if the matrix Bk satisfies the secant equation (1.25) in every iteration step.

The crux of the problem in extending the secant method to more than one dimension is that (1.25) does not completely specify the matrix Bk+1. In fact, if sk ≠ 0, there is an n(n − 1)-dimensional affine subspace of matrices satisfying (1.25). Constructing a successful secant approximation consists of selecting a good approach to choose from all these possibilities. The choice should enhance the Jacobian approximation properties of Bk+1 or facilitate its use in a quasi-Newton algorithm.


A possible strategy is to use the former function evaluations. That is, in addition to the secant equation, we set

    g(xl) = g(xk+1) + Bk+1(xl − xk+1),  l = k − m, . . . , k − 1.

This is equivalent to

    g(xl) = g(xl+1) + Bk+1(xl − xl+1),  l = k − m, . . . , k − 1,

so,

    Bk+1 sl = yl,  l = k − m, . . . , k − 1.    (1.26)

For m = n − 1 and linearly independent sk−m, . . . , sk the matrix Bk+1 is uniquely determined by (1.25) and (1.26). Unfortunately, most of the time sk−m, . . . , sk tend to be linearly dependent, making the computation of Bk+1 a poorly posed numerical problem.

The approach that leads to the successful secant approximation is quite different. Aside from the secant equation, no new information about either the Jacobian or the model is given. The idea is to preserve as much as possible of what we already have. Therefore, we try to minimize the change in the affine model, subject to the secant equation (1.25). The difference between the new and the old affine model at any x is given by

    lk+1(x) − lk(x) = g(xk+1) + Bk+1(x − xk+1) − g(xk) − Bk(x − xk)
                    = yk − Bk+1 sk + (Bk+1 − Bk)(x − xk)
                    = (Bk+1 − Bk)(x − xk).

The last equality is due to the secant equation. Now if we write an arbitrary x ∈ Rn as

    x − xk = αsk + q,  where q^T sk = 0, α ∈ R,

the expression that we want to minimize becomes

    lk+1(x) − lk(x) = α(Bk+1 − Bk)sk + (Bk+1 − Bk)q.    (1.27)

We have no control over the first term on the right-hand side of (1.27), since it equals

    (Bk+1 − Bk)sk = yk − Bk sk.    (1.28)

However, we can make the second term on the right-hand side of (1.27) zero for all x ∈ Rn, by choosing Bk+1 such that

    (Bk+1 − Bk)q = 0,  for all q ⊥ sk.    (1.29)


This implies that (Bk+1 − Bk) has to be a rank-one matrix of the form u sk^T, with u ∈ Rn. Equation (1.28) now implies that u = (yk − Bk sk)/(sk^T sk). This leads to the Broyden or secant update

    Bk+1 = Bk + (yk − Bk sk) sk^T / (sk^T sk).    (1.30)

The word update indicates that we are not approximating the Jacobian at the new iterate, Jg(xk+1), from scratch. Rather, a former approximation Bk is updated into a new one, Bk+1. This type of updating is shared by all successful multi-dimensional secant approximation techniques.

We arrive at the algorithm for Broyden's method.

Algorithm 1.19 (Broyden's method). Choose an initial estimate x0 ∈ Rn and a nonsingular initial Broyden matrix B0. Set k := 0 and repeat the following sequence of steps until ‖g(xk)‖ < ε:

i) Solve Bk sk = −g(xk) for sk,
ii) xk+1 := xk + sk,
iii) yk := g(xk+1) − g(xk),
iv) Bk+1 := Bk + (yk − Bk sk) sk^T / (sk^T sk).
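The algorithm translates almost line for line into Python. The following is a minimal sketch with dense storage and no safeguards (the function name and defaults are ours); steps i)-iv) are marked in the comments.

    import numpy as np

    def broyden(g, x0, B0, eps=1e-12, max_iter=100):
        x, B = x0.copy(), B0.copy()
        for _ in range(max_iter):
            gx = g(x)
            if np.linalg.norm(gx) < eps:
                break
            s = np.linalg.solve(B, -gx)                 # i)   B_k s_k = -g(x_k)
            x = x + s                                   # ii)  x_{k+1} = x_k + s_k
            y = g(x) - gx                               # iii) y_k = g(x_{k+1}) - g(x_k)
            B = B + np.outer(y - B @ s, s) / (s @ s)    # iv)  rank-one update (1.30)
        return x

Each iteration requires only a single new function evaluation; the dense matrix B is exactly what the limited memory representations of Chapter 3 are designed to avoid storing.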

In this section we use the Frobenius norm, denoted by ‖·‖F. The norm is defined by

    ‖A‖F = ( Σ_{i=1}^{n} Σ_{j=1}^{n} A_{ij}^2 )^{1/2}.    (1.31)

So, it equals the l2-vector norm of the matrix written as an n^2-vector. For y, s ∈ Rn the set of all matrices that satisfy the secant equation As = y is denoted by

    Q(y, s) = {A ∈ R^{n×n} | As = y}.

In the preceding, we have followed the steps of Broyden when developing his iterative method in [8], but the derivation of the Broyden update can be made much more rigorous. The Broyden update is the minimum change to Bk consistent with the secant equation (1.25), if (Bk+1 − Bk) is measured in the Frobenius norm. That is, of all matrices A that satisfy the secant equation (1.25), the new Broyden matrix Bk+1 yields the minimum of ‖A − Bk‖F. This will be proved in Lemma 1.20.


Lemma 1.20. Let B ∈ R^{n×n} and s, y ∈ Rn be arbitrary. If s ≠ 0, then the unique solution A = B̄ of

    min_{A ∈ Q(y,s)} ‖A − B‖F    (1.32)

is given by

    B̄ = B + (y − Bs) s^T / (s^T s).

Proof. We compute for any A ∈ Q(y, s),

    ‖B̄ − B‖F = ‖(y − Bs) s^T / (s^T s)‖F = ‖(A − B) s s^T / (s^T s)‖F
             ≤ ‖A − B‖F ‖s s^T / (s^T s)‖2 = ‖A − B‖F.

Note that Q(y, s) is a convex (in fact, affine) subset of R^{n×n}. Because the Frobenius norm is strictly convex, the solution to (1.32) is unique on the convex subset Q(y, s).
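Lemma 1.20 is easy to check numerically: any other member of Q(y, s) differs from the Broyden update by a matrix that annihilates s, and is therefore at least as far from B in the Frobenius norm. A small Python sketch (all data random; the size n = 5 and all names are our own choices):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    B = rng.standard_normal((n, n))
    s = rng.standard_normal(n)
    y = rng.standard_normal(n)

    # Broyden update: the least-change secant update of Lemma 1.20.
    B_bar = B + np.outer(y - B @ s, s) / (s @ s)

    # Another member of Q(y, s): add a correction C with C s = 0.
    C = rng.standard_normal((n, n))
    C = C - np.outer(C @ s, s) / (s @ s)      # now C @ s = 0
    A = B_bar + C

    assert np.allclose(A @ s, y)              # A satisfies the secant equation
    # np.linalg.norm on matrices defaults to the Frobenius norm.
    assert np.linalg.norm(B_bar - B) <= np.linalg.norm(A - B) + 1e-12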

We have not yet defined what should be chosen for the initial approximation B0 to the Jacobian at the initial estimate, Jg(x0). The finite-difference approximation turns out to be a good start. It also makes the minimum-change characteristic of Broyden's update, as given in Lemma 1.20, more appealing. Another choice, which avoids the computation of Jg(x0), is to take the initial approximation equal to minus the identity,

    B0 = −I.    (1.33)

Suppose the function g is defined by g(x) = f(x) − x, where f is the period map of a dynamical process

    xk+1 = f(xk),  k = 0, 1, . . . .    (1.34)

A fixed point of the process (1.34) is a zero of the function g. By choosing B0 = −I, the first iteration of Broyden's method is just a dynamical simulation step,

    x1 = x0 − B0^{-1} g(x0) = x0 + (f(x0) − x0) = f(x0).

So, in this way, we let the system choose the direction of the first step. In addition, the initial Broyden matrix is easy to store and can be directly implemented in the computer code. This makes the reduction methods discussed in Chapter 3 effective.
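With the broyden sketch given after Algorithm 1.19, this choice requires no extra code. The short check below, with a toy map f of our own invention, confirms that the first step is exactly one simulation step.

    import numpy as np

    def f(x):                       # a stand-in period map, not from the text
        return 0.5 * np.cos(x)

    n = 4
    x0 = np.full(n, 0.3)
    g = lambda x: f(x) - x          # zeros of g are fixed points of f
    B0 = -np.eye(n)                 # the choice (1.33)

    x1 = x0 - np.linalg.solve(B0, g(x0))
    assert np.allclose(x1, f(x0))   # first Broyden step = dynamical simulation step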

    We now apply the method of Broyden to the test function (A.5).


Example 1.21. Let g be the discrete integral equation function given by (A.5). We define the initial condition x0 by (A.6) and we set ε = 10^{-12}. We apply Algorithm 1.19 for different dimensions of the problem. The convergence results for the method of Broyden are described in Table 1.5. Although Broyden's method needs more iterations to converge than Newton's method, it avoids the computation of the Jacobian. The method of Broyden makes only one function evaluation per iteration, compared to the n + 1 function evaluations of Algorithm 1.13. The rate of convergence again does not depend on the dimension of the problem, see also Figure 1.7.

    method    n     ‖g(x0)‖   ‖g(xk)‖            k    R
    Broyden   10    0.2518    4.8980 · 10^{-14}  21   1.3937
    Broyden   100   0.7570    4.4398 · 10^{-13}  21   1.3412
    Broyden   200   1.0678    6.3644 · 10^{-13}  21   1.3404

Table 1.5: The convergence results for Algorithm 1.19 applied to the discrete integral equation function (A.5) for different dimensions n.

Figure 1.7: The convergence rate of Algorithm 1.19 applied to the discrete integral equation function (A.5) for different dimensions n (n = 10, 100 and 200).

Example 1.22. Let g be the discrete integral equation function given by (A.5). We apply Algorithm 1.19 to approximate the zero of g. As we did for the Newton-Chord method and the method of Newton, we multiply the initial condition x0, given by (A.6), by a factor 1, 10 and 100. The convergence results


for Broyden's method are described in Table 1.6. For the initial condition 100x0, the method of Broyden fails to converge.

    method    n     factor   ‖g(x0)‖         ‖g(xk)‖            k    R
    Broyden   100   1        0.7570          4.4398 · 10^{-13}  21   1.3412
    Broyden   100   10       18.5217         8.7765 · 10^{-13}  33   0.9297
    Broyden   100   100      3.8215 · 10^3   1.0975 · 10^{+20}  13   −2.9151

Table 1.6: The convergence results for Broyden's method (Algorithm 1.19) applied to the discrete integral equation function (A.5) for different initial conditions (x0, 10x0 and 100x0).

    Superlinear convergence

In order to prove the convergence of Broyden's method, we first need the following extension of Lemma 1.11.

Lemma 1.23. Let g : Rn → Rn be continuously differentiable in the open, convex set D ⊂ Rn, and let x ∈ D. If Jg ∈ Lipγ(D), then for every u and v in D,

    ‖g(v) − g(u) − Jg(x)(v − u)‖ ≤ γ max{‖v − x‖, ‖u − x‖} ‖v − u‖.    (1.35)

Moreover, if Jg(x) is invertible, there exist ε > 0 and ρ > 0 such that

    (1/ρ)‖v − u‖ ≤ ‖g(v) − g(u)‖ ≤ ρ‖v − u‖,    (1.36)

for all u, v ∈ D for which max{‖v − x‖, ‖u − x‖} ≤ ε.

Proof. The proof of Equation (1.35) is similar to the proof of Lemma 1.11.

Equation (1.35) together with the triangle inequality implies that for u, v satisfying max{‖v − x‖, ‖u − x‖} ≤ ε,

    ‖g(v) − g(u)‖ ≤ ‖Jg(x)(v − u)‖ + ‖g(v) − g(u) − Jg(x)(v − u)‖
                 ≤ (‖Jg(x)‖ + γ max{‖v − x‖, ‖u − x‖}) ‖v − u‖
                 ≤ (‖Jg(x)‖ + γε) ‖v − u‖.

Similarly,

    ‖g(v) − g(u)‖ ≥ ‖Jg(x)(v − u)‖ − ‖g(v) − g(u) − Jg(x)(v − u)‖
                 ≥ (1/‖Jg(x)^{-1}‖) ‖v − u‖ − γ max{‖v − x‖, ‖u − x‖} ‖v − u‖
                 ≥ (1/‖Jg(x)^{-1}‖ − γε) ‖v − u‖.


Thus if ε < 1/(γ‖Jg(x)^{-1}‖), then 1/‖Jg(x)^{-1}‖ − γε > 0 and (1.36) holds if we choose ρ large enough such that

    ρ > ‖Jg(x)‖ + γε  and  1/ρ < 1/‖Jg(x)^{-1}‖ − γε.

Recall that there exists an η > 0 such that

    ‖A‖ ≤ η‖A‖F,    (1.37)

where ‖·‖ is the l2-operator norm induced by the corresponding vector norm.

By L(Rn) we denote the space of all linear maps from Rn to Rn, i.e., all (n × n)-matrices. So, an element of the power set P{L(Rn)} is a set of linear maps from Rn to Rn. The function Φ appearing in Theorem 1.24 is a set-valued function that assigns to a pair (x, B) of a vector x ∈ Rn and a matrix B ∈ L(Rn) a set of matrices Φ(x, B) = {B̄}. This set can consist of one single element and it can contain B itself.

Theorem 1.24. Let g : Rn → Rn be continuously differentiable in the open, convex set D ⊂ Rn, and assume that Jg ∈ Lipγ(D). Assume that there exists an x* ∈ D such that g(x*) = 0 and Jg(x*) is nonsingular. Let Φ : Rn × L(Rn) → P{L(Rn)} be defined in a neighborhood N = N1 × N2 of (x*, Jg(x*)), where N1 is contained in D and N2 contains only nonsingular matrices. Suppose there are nonnegative constants α1 and α2 such that for each (x, B) in N, and for x̄ = x − B^{-1} g(x), the function Φ satisfies

    ‖B̄ − Jg(x*)‖F ≤ (1 + α1 max{‖x̄ − x*‖, ‖x − x*‖}) ‖B − Jg(x*)‖F + α2 max{‖x̄ − x*‖, ‖x − x*‖}    (1.38)

for each B̄ ∈ Φ(x, B). Then for arbitrary r ∈ (0, 1), there are positive constants ε(r) and δ(r) such that for ‖x0 − x*‖ < ε(r) and ‖B0 − Jg(x*)‖F < δ(r), and Bk+1 ∈ Φ(xk, Bk), k ≥ 0, the sequence

    xk+1 = xk − Bk^{-1} g(xk)    (1.39)

is well defined and converges to x*. Furthermore,

    ‖xk+1 − x*‖ ≤ r ‖xk − x*‖    (1.40)

for each k ≥ 0, and {‖Bk‖}, {‖Bk^{-1}‖} are uniformly bounded.


Proof. Let r ∈ (0, 1) be given and set β ≥ ‖Jg(x*)^{-1}‖. Choose ε(r) = ε and δ(r) = δ such that

    (2δα1 + α2) ε / (1 − r) ≤ δ,    (1.41)

and, for η given by (1.37),

    β(1 + r)(γε + 2δη) ≤ r.    (1.42)

If necessary, further restrict ε and δ so that (x, B) lies in the neighborhood N whenever ‖B − Jg(x*)‖F < 2δ and ‖x − x*‖ < ε. Suppose that ‖B0 − Jg(x*)‖F < δ and ‖x0 − x*‖ < ε. Then ‖B0 − Jg(x*)‖ < ηδ < 2ηδ, and since (1.42) yields

    2βδη(1 + r) ≤ r,    (1.43)

Theorem 1.12 gives

    ‖B0^{-1}‖ ≤ β / (1 − 2βδη) ≤ β / (1 − r/(1 + r)) = β(1 + r).

Lemma 1.23 now implies that

    ‖x1 − x*‖ = ‖x0 − B0^{-1} g(x0) − x*‖
             ≤ ‖B0^{-1}‖ (‖g(x0) − g(x*) − Jg(x*)(x0 − x*)‖ + ‖B0 − Jg(x*)‖ ‖x0 − x*‖)
             ≤ β(1 + r)(γε + 2δη) ‖x0 − x*‖,

and by (1.42) it follows that ‖x1 − x*‖ ≤ r‖x0 − x*‖. Hence, ‖x1 − x*‖ < ε, and thus x1 ∈ D.

We complete the proof with an induction argument. Assume that both ‖Bk − Jg(x*)‖F ≤ 2δ and ‖xk+1 − x*‖ ≤ r‖xk − x*‖ for k = 0, 1, . . . , m − 1. It follows from (1.38) that

    ‖Bk+1 − Jg(x*)‖F − ‖Bk − Jg(x*)‖F ≤ (α1 ‖Bk − Jg(x*)‖F + α2) max{‖xk+1 − x*‖, ‖xk − x*‖}
                                      ≤ (2δα1 + α2) max{r‖xk − x*‖, ‖xk − x*‖}
                                      ≤ (2δα1 + α2) r^k ‖x0 − x*‖ ≤ (2δα1 + α2) r^k ε,

and by summing both sides from k = 0 to m − 1, we obtain

    ‖Bm − Jg(x*)‖F ≤ ‖B0 − Jg(x*)‖F + (2δα1 + α2) ε / (1 − r),


which by (1.41) implies that ‖Bm − Jg(x*)‖F ≤ 2δ. To complete the induction step we only need to prove that ‖xm+1 − x*‖ ≤ r‖xm − x*‖. This follows by an argument similar to the one for m = 1. In fact, since ‖Bm − Jg(x*)‖F ≤ 2δ, Lemma 1.12 and (1.43) imply that

    ‖Bm^{-1}‖ ≤ β(1 + r),

and by Lemma 1.23 it follows that

    ‖xm+1 − x*‖ ≤ ‖Bm^{-1}‖ (‖g(xm) − g(x*) − Jg(x*)(xm − x*)‖ + ‖Bm − Jg(x*)‖ ‖xm − x*‖)
               ≤ β(1 + r)(γε + 2δη) ‖xm − x*‖,

and ‖xm+1 − x*‖ ≤ r‖xm − x*‖ follows from (1.42).

Corollary 1.25. Assume that the hypotheses of Theorem 1.24 hold. If some subsequence of {‖Bk − Jg(x*)‖} converges to zero, then the sequence {xk} converges q-superlinearly to x*.

Proof. We would like to show that

    lim_{k→∞} ‖xk+1 − x*‖ / ‖xk − x*‖ = 0.

By Theorem 1.24 there are numbers ε(1/2) and δ(1/2) such that if ‖B0 − Jg(x*)‖F < δ(1/2) and ‖x0 − x*‖ < ε(1/2), the sequence {xk} converges to x*. Let r ∈ (0, 1) be arbitrary. Because a subsequence of {‖Bk − Jg(x*)‖F} converges to zero and {xk} converges to x*, there exists an m ≥ 0 such that ‖Bm − Jg(x*)‖F < δ(r) and ‖xm − x*‖ < ε(r). So, ‖xk+1 − x*‖ ≤ r‖xk − x*‖ for each k ≥ m. Since r ∈ (0, 1) was arbitrary, the proof is completed.

It should be clear that some condition like the one in Corollary 1.25 is necessary to guarantee q-superlinear convergence. For example, the Newton-Chord iteration scheme (see Algorithm 1.16),

    xk+1 = xk − Jg(x0)^{-1} g(xk),

satisfies (1.38) with α1 = α2 = 0, but is, in general, only linearly convergent. One of the interesting aspects of the following theorem is that q-superlinear convergence is guaranteed for the method of Broyden without any subsequence of {‖Bk − Jg(x*)‖} necessarily converging to zero.


Theorem 1.26. Let g : Rn → Rn be continuously differentiable in the open, convex set D ⊂ Rn, and assume that Jg ∈ Lipγ(D). Let x* be a zero of g for which Jg(x*) is nonsingular. Then the update function Φ(x, B) = {B̄ | s ≠ 0}, where

    B̄ = B + (y − Bs) s^T / (s^T s),    (1.44)

is well defined in a neighborhood N = N1 × N2 of (x*, Jg(x*)), and the corresponding iteration

    xk+1 = xk − Bk^{-1} g(xk)    (1.45)

with Bk+1 ∈ Φ(xk, Bk), k ≥ 0, is locally and q-superlinearly convergent at x*.

Before we can prove the theorem we need some preparations. The idea of the proof of Theorem 1.26 is as follows. If B̄ is given by (1.44), then Lemma 1.23 and standard properties of the matrix norms ‖·‖2 and ‖·‖F imply that there exists a neighborhood N of (x*, Jg(x*)) such that condition (1.38) is satisfied for every (x, B) in N. Subsequently, Theorem 1.24 yields that iteration (1.45) is locally and linearly convergent. The q-superlinear convergence is a consequence of the following two lemmas.

Lemma 1.27. Let xk ∈ Rn, k ≥ 0. If {xk} converges q-superlinearly to x* ∈ Rn, then in any norm ‖·‖,

    lim_{k→∞} ‖xk+1 − xk‖ / ‖xk − x*‖ = 1.

Define the error in the current iteration by

    ek = xk − x*.    (1.46)

The statement of the lemma is illustrated in Figure 1.8. Clearly, if

    lim_{k→∞} ‖ek+1‖/‖ek‖ = 0,  then  lim_{k→∞} ‖sk‖/‖ek‖ = 1.

Proof (of Lemma 1.27). With ek given by (1.46) we compute

    lim_{k→∞} | ‖sk‖/‖ek‖ − 1 | = lim_{k→∞} | ‖sk‖ − ‖ek‖ | / ‖ek‖
                                ≤ lim_{k→∞} ‖sk + ek‖ / ‖ek‖
                                = lim_{k→∞} ‖ek+1‖ / ‖ek‖ = 0,


where the final equality is the definition of q-superlinear convergence, provided ek ≠ 0 for all k.

Figure 1.8: Schematic drawing of two subsequent iterates.

Note that Lemma 1.27 is also of interest for the stopping criteria in our algorithms. It shows that whenever an algorithm achieves at least q-superlinear convergence, any stopping test that uses ‖sk‖ is essentially equivalent to the same test using ‖ek‖, which is the quantity we are really interested in.

Lemma 1.28. Let D ⊂ Rn be an open, convex set, g : Rn → Rn continuously differentiable, and Jg ∈ Lipγ(D). Assume that Jg(x*) is nonsingular for some x* ∈ D. Let {Ak} be a sequence of nonsingular matrices in L(Rn). Suppose for some x0 ∈ D that the sequence of points generated by

    xk+1 = xk − Ak^{-1} g(xk)    (1.47)

remains in D and satisfies lim_{k→∞} xk = x*, where xk ≠ x* for every k. Then {xk} converges q-superlinearly to x* in some norm ‖·‖ and g(x*) = 0, if and only if

    lim_{k→∞} ‖(Ak − Jg(x*)) sk‖ / ‖sk‖ = 0,    (1.48)

where sk = xk+1 − xk.

Proof. Define ek = xk − x*. First we assume that (1.48) holds, and show that g(x*) = 0 and that {xk} converges q-superlinearly to x*. Equation (1.47) gives

    0 = Ak sk + g(xk) = (Ak − Jg(x*)) sk + g(xk) + Jg(x*) sk,

so that

    g(xk+1) = −(Ak − Jg(x*)) sk + (g(xk+1) − g(xk) − Jg(x*) sk),    (1.49)


and

    ‖g(xk+1)‖/‖sk‖ ≤ ‖(Ak − Jg(x*)) sk‖/‖sk‖ + ‖g(xk+1) − g(xk) − Jg(x*) sk‖/‖sk‖
                  ≤ ‖(Ak − Jg(x*)) sk‖/‖sk‖ + γ max{‖xk+1 − x*‖, ‖xk − x*‖},    (1.50)

where the second inequality follows from Lemma 1.23. Equation (1.50) together with lim_{k→∞} ek = 0 and (1.48) gives

    lim_{k→∞} ‖g(xk+1)‖/‖sk‖ = 0.    (1.51)

Since lim_{k→∞} sk = 0, it follows that

    g(x*) = lim_{k→∞} g(xk) = 0.

From Lemma 1.23, there exist ρ > 0 and k0 ≥ 0 such that

    ‖g(xk+1)‖ = ‖g(xk+1) − g(x*)‖ ≥ (1/ρ) ‖ek+1‖,    (1.52)

for all k ≥ k0. Combining (1.51) and (1.52) gives

    0 = lim_{k→∞} ‖g(xk+1)‖/‖sk‖ ≥ lim_{k→∞} (1/ρ) ‖ek+1‖/‖sk‖
      ≥ lim_{k→∞} (1/ρ) ‖ek+1‖/(‖ek‖ + ‖ek+1‖) = lim_{k→∞} (1/ρ) rk/(1 + rk),

where rk = ‖ek+1‖/‖ek‖. This implies

    lim_{k→∞} rk = 0,

which completes the proof of q-superlinear convergence.

The proof of the reverse implication, that q-superlinear convergence and g(x*) = 0 imply (1.48), is the derivation above read in more or less the reverse order. From Lemma 1.23, there exist ρ > 0 and k0 ≥ 0 such that

    ‖g(xk+1)‖ = ‖g(xk+1) − g(x*)‖ ≤ ρ ‖ek+1‖,


for all k ≥ k0. Therefore,

    0 = lim_{k→∞} ‖ek+1‖/‖ek‖ ≥ lim_{k→∞} (1/ρ) ‖g(xk+1)‖/‖ek‖
      = lim_{k→∞} (1/ρ) (‖g(xk+1)‖/‖sk‖)(‖sk‖/‖ek‖).    (1.53)

The q-superlinear convergence implies that lim_{k→∞} ‖sk‖/‖ek‖ = 1 according to Lemma 1.27. Together with (1.53) this gives that (1.51) holds. Finally, from (1.49) and Lemma 1.23,

    ‖(Ak − Jg(x*)) sk‖/‖sk‖ ≤ ‖g(xk+1)‖/‖sk‖ + ‖g(xk+1) − g(xk) − Jg(x*) sk‖/‖sk‖
                            ≤ ‖g(xk+1)‖/‖sk‖ + γ max{‖xk+1 − x*‖, ‖xk − x*‖},

which together with (1.51) and lim_{k→∞} ek = 0 proves (1.48).

Due to the Lipschitz continuity of Jg, it is easy to show that Lemma 1.28 remains true if (1.48) is replaced by

    lim_{k→∞} ‖(Ak − Jg(xk)) sk‖/‖sk‖ = 0.    (1.54)

This condition has an interesting interpretation. Because sk = −Ak^{-1} g(xk), Equation (1.54) is equivalent to

    lim_{k→∞} ‖Jg(xk)(sk^N − sk)‖/‖sk‖ = 0,

where sk^N = −Jg(xk)^{-1} g(xk) is the Newton step from xk. Thus the necessary and sufficient condition for the q-superlinear convergence of a secant method is that the secant steps converge, in magnitude and direction, to the Newton steps from the same points.

After stating a final lemma we are able to prove the main theorem of this section.

Lemma 1.29. Let s ∈ Rn be nonzero and E ∈ R^{n×n}. Then

    ‖E (I − s s^T/(s^T s))‖F = ( ‖E‖F^2 − (‖Es‖/‖s‖)^2 )^{1/2}    (1.55)
                             ≤ ‖E‖F − (1/(2‖E‖F)) (‖Es‖/‖s‖)^2.    (1.56)


Proof. Note that I − s s^T/(s^T s) is a Euclidean projection, and so is s s^T/(s^T s). So by the Pythagorean theorem,

    ‖E‖F^2 = ‖E s s^T/(s^T s)‖F^2 + ‖E (I − s s^T/(s^T s))‖F^2.

Together with the equality

    ‖E s s^T/(s^T s)‖F = ‖Es‖/‖s‖,

this proves (1.55). Because (θ^2 − α^2)^{1/2} ≤ θ − α^2/(2θ) for any |α| ≤ θ, Equation (1.55) implies (1.56).

Proof (of Theorem 1.26). In order to be able to use both Theorem 1.24 and Lemma 1.28, we first derive an estimate for ‖B̄ − Jg(x*)‖F. Assume that x and x̄ are in D and s ≠ 0. Define E = B − Jg(x*), Ē = B̄ − Jg(x*), e = x − x*, and ē = x̄ − x*. Note that

    Ē = B̄ − Jg(x*) = B − Jg(x*) + (y − Bs) s^T/(s^T s)
      = (B − Jg(x*)) (I − s s^T/(s^T s)) + (y − Jg(x*) s) s^T/(s^T s).

Therefore,

    ‖Ē‖F ≤ ‖(B − Jg(x*))(I − s s^T/(s^T s))‖F + ‖y − Jg(x*) s‖/‖s‖
         ≤ ‖E (I − s s^T/(s^T s))‖F + γ max{‖ē‖, ‖e‖}.    (1.57)

For the last inequality of (1.57) Lemma 1.23 is used. Because I − s s^T/(s^T s) is an orthogonal projection, it has l2-norm equal to one,

    ‖I − s s^T/(s^T s)‖ = 1.

Therefore, inequality (1.57) can be reduced to

    ‖Ē‖F ≤ ‖E‖F + γ max{‖ē‖, ‖e‖}.    (1.58)

We define the neighborhood N2 of Jg(x*) by

    N2 = {B ∈ L(Rn) | ‖Jg(x*)^{-1}‖ ‖B − Jg(x*)‖ < 1/2}.


Then any B ∈ N2 is nonsingular and satisfies

    ‖B^{-1}‖ ≤ ‖Jg(x*)^{-1}‖ / (1 − ‖Jg(x*)^{-1}‖ ‖B − Jg(x*)‖) ≤ 2‖Jg(x*)^{-1}‖.

To define the neighborhood N1 of x*, choose ε > 0 and ρ > 0 as in Lemma 1.23, so that max{‖x̄ − x*‖, ‖x − x*‖} ≤ ε implies that x and x̄ belong to D and that (1.36) holds for u = x and v = x̄, i.e.,

    (1/ρ)‖x̄ − x‖ ≤ ‖g(x̄) − g(x)‖ ≤ ρ‖x̄ − x‖.    (1.59)

In particular, if ‖x − x*‖ ≤ ε and B ∈ N2, then x ∈ D and

    ‖s‖ = ‖B^{-1} g(x)‖ ≤ ‖B^{-1}‖ ‖g(x) − g(x*)‖ ≤ 2ρ‖Jg(x*)^{-1}‖ ‖x − x*‖.

Let N1 be the set of all x ∈ Rn such that ‖x − x*‖ < ε/2 and

    2ρ‖Jg(x*)^{-1}‖ ‖x − x*‖ < ε/2.

If N = N1 × N2 and (x, B) ∈ N, then

    ‖x̄ − x*‖ ≤ ‖s‖ + ‖x − x*‖ < ε.

Hence, x̄ ∈ D and moreover, (1.59) shows that s = 0 if and only if x = x*. So, the update function Φ is well defined in N. Equation (1.58) then shows that the update function associated with iteration (1.45) satisfies the hypotheses of Theorem 1.24 (with α1 = 0 and α2 = γ), and therefore the algorithm given by (1.45) is locally convergent at x*. In addition, we can choose r ∈ (0, 1) in (1.40) arbitrarily. We take r = 1/2, so

    ‖xk+1 − x*‖ ≤ (1/2)‖xk − x*‖.    (1.60)

Considering Lemma 1.28, a sufficient condition for {xk} to converge q-superlinearly to x* is

    lim_{k→∞} ‖Ek sk‖/‖sk‖ = 0.    (1.61)

In order to justify Equation (1.61) we write Equation (1.57) as

    ‖Ek+1‖F ≤ ‖Ek (I − sk sk^T/(sk^T sk))‖F + γ max{‖ek+1‖, ‖ek‖}.    (1.62)

Using Equation (1.60) and Lemma 1.29 in (1.62), we obtain

    ‖Ek+1‖F ≤ ‖Ek‖F − ‖Ek sk‖^2 / (2‖Ek‖F ‖sk‖^2) + γ‖ek‖,


or

    ‖Ek sk‖^2/‖sk‖^2 ≤ 2‖Ek‖F ( ‖Ek‖F − ‖Ek+1‖F + γ‖ek‖ ).    (1.63)

Theorem 1.24 gives that {‖Bk‖} is uniformly bounded for k ≥ 0. This implies that there exists an M > 0, independent of k, such that

    ‖Ek‖F = ‖Bk − Jg(x*)‖F ≤ ‖Bk‖F + ‖Jg(x*)‖F ≤ M.

By Equation (1.60) we obtain

    Σ_{k=0}^∞ ‖ek‖ ≤ 2‖e0‖ ≤ 2ε.

Thus from (1.63),

    ‖Ek sk‖^2/‖sk‖^2 ≤ 2M ( ‖Ek‖F − ‖Ek+1‖F + γ‖ek‖ ),    (1.64)

and summing the left and right sides of (1.64) for k = 0, 1, . . . , m yields

    Σ_{k=0}^m ‖Ek sk‖^2/‖sk‖^2 ≤ 2M ( ‖E0‖F − ‖Em+1‖F + γ Σ_{k=0}^m ‖ek‖ )
                               ≤ 2M ( ‖E0‖F + 2γε ) ≤ 2M ( M + 2γε ).    (1.65)

Because (1.65) is true for any m ≥ 0, we obtain

    Σ_{k=0}^∞ ‖Ek sk‖^2/‖sk‖^2 < ∞,

which implies (1.61) and completes the proof.

The inverse notation of Broyden's method

A restriction of the method of Broyden is that an n-dimensional linear system has to be solved to compute the Broyden step, see Algorithm 1.19. To avoid this, one could store the inverse of the Broyden matrix instead of the matrix itself; the linear solve then reduces to a matrix-vector multiplication. If Hk is the inverse of Bk, then

    sk = −Hk g(xk),


and the secant equation becomes

    Hk+1 yk = sk.    (1.66)

Equation (1.66) again does not define a unique matrix but a class of matrices. In Section 1.3, the new Broyden matrix Bk+1 has been chosen so that, in addition to the secant equation (1.25), it satisfies Bk+1 q = Bk q in any direction q orthogonal to sk. This was sufficient to define Bk+1 uniquely, and the update was given by (1.30).

It is possible, using Householder's modification formula, to compute the new inverse Broyden matrix Hk+1 = Bk+1^{-1} with very little effort from Hk. Householder's formula, also called the Sherman-Morrison formula, states that if A is a nonsingular (n × n)-matrix, u and v are vectors in Rn, and 1 + v^T A^{-1} u ≠ 0, then A + u v^T is nonsingular and

    (A + u v^T)^{-1} = A^{-1} − (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u).    (1.67)

This formula is a particular case of the Sherman-Morrison-Woodbury formula derived in the next theorem.

Theorem 1.30. Let A ∈ R^{n×n} be nonsingular and let U, V ∈ R^{n×p} be arbitrary matrices with p ≤ n. If I + V^T A^{-1} U is nonsingular, then (A + U V^T)^{-1} exists and

    (A + U V^T)^{-1} = A^{-1} − A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1}.    (1.68)

Proof. Formula (1.68) is easily verified by computing

    (A^{-1} − A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1}) (A + U V^T)

and

    (A + U V^T) (A^{-1} − A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1}),

which both yield the identity. Therefore A + U V^T is invertible and its inverse is given by (1.68).
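A quick numerical check of (1.68) in Python, with random, well-conditioned data of our own choosing (for p = 1 the same lines verify the Sherman-Morrison formula (1.67)):

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 6, 2
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # almost surely nonsingular
    U = rng.standard_normal((n, p))
    V = rng.standard_normal((n, p))

    Ainv = np.linalg.inv(A)
    K = np.eye(p) + V.T @ Ainv @ U                    # assumed nonsingular here
    lhs = np.linalg.inv(A + U @ V.T)
    rhs = Ainv - Ainv @ U @ np.linalg.solve(K, V.T @ Ainv)
    assert np.allclose(lhs, rhs)                      # Sherman-Morrison-Woodbury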

Equation (1.67) gives that if sk^T Hk yk ≠ 0, then

    Hk+1 = Bk+1^{-1} = (Bk + (yk − Bk sk) sk^T/(sk^T sk))^{-1}
         = Bk^{-1} − (Bk^{-1} yk − sk) sk^T Bk^{-1} / (sk^T Bk^{-1} yk)
         = Hk + (sk − Hk yk) sk^T Hk / (sk^T Hk yk).    (1.69)
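In inverse form, one Broyden step costs only matrix-vector products and a rank-one update; a minimal Python sketch (names and defaults ours):

    import numpy as np

    def broyden_inverse(g, x0, H0, eps=1e-12, max_iter=100):
        # Broyden's method storing H_k = B_k^{-1}, updated via (1.69).
        x, H = x0.copy(), H0.copy()
        for _ in range(max_iter):
            gx = g(x)
            if np.linalg.norm(gx) < eps:
                break
            s = -H @ gx                    # s_k = -H_k g(x_k): no linear solve
            x = x + s
            y = g(x) - gx
            Hy = H @ y
            H = H + np.outer(s - Hy, s @ H) / (s @ Hy)   # rank-one update (1.69)
        return x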


The iterative scheme

    xk+1 = xk − Hk g(xk),  k = 0, 1, 2, . . . ,

together with the rank-one update (1.69) equals Algorithm 1.19. Instead of assuming that Bk+1 q = Bk q in any direction q orthogonal to sk, we could also require that

    Hk+1 q = Hk q  for  q^T yk = 0.

This is, in some sense, the complement of the first method of Broyden. Since Hk+1 satisfies (1.66), it is readily seen that for this method Hk+1 is uniquely given by

    Hk+1 = Hk + (sk − Hk yk) yk^T / (yk^T yk).

This update scheme, however, appears in practice to be unsatisfactory; it is called the second or bad method of Broyden [8].
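In the inverse-form sketch above, the bad method amounts to a one-line change of the update (shown here only for comparison, with the same variable names):

    # "Bad" Broyden: update H along y_k instead of s_k^T H_k.
    H = H + np.outer(s - H @ y, y) / (y @ y)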


    Chapter 2

Solving linear systems with Broyden's method

One important property of a good iterative method is that it should use only a finite number of iterations to solve a system of linear equations

    Ax + b = 0,    (2.1)

where A ∈ R^{n×n} and b ∈ Rn. As we have seen in Section 1.2, the method of Newton satisfies this condition; in fact, it solves a system of linear equations in one single iteration step. Although computer simulations indicate that the method of Broyden also converges in finitely many steps, for a long time it was not possible to prove this algebraically. In 1979, fourteen years after Charles Broyden proposed his algorithm, David Gay published a proof that Broyden's method converges in at most 2n iteration steps for any system of linear equations (2.1) with A nonsingular [22]. In addition, Gay proved under which conditions the method of Broyden needs exactly 2n iterations.
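Gay's bound is easy to observe numerically. The sketch below (with test data of our own choosing) runs Broyden's method with B0 = −I on a random nonsingular linear system and counts iterations; in exact arithmetic the count never exceeds 2n.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # almost surely nonsingular
    b = rng.standard_normal(n)
    g = lambda x: A @ x + b

    x, B = np.zeros(n), -np.eye(n)
    k = 0
    while np.linalg.norm(g(x)) > 1e-10 and k < 4 * n:
        s = np.linalg.solve(B, -g(x))
        x = x + s
        B = B + np.outer(g(x), s) / (s @ s)   # y - B s = g(x_{k+1}), since B s = -g(x_k)
        k += 1
    print(k, "iterations for n =", n)          # Gay: at most 2n in exact arithmetic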

For many examples, however, it turns out that Broyden's method needs far fewer iterations. In 1981, Richard Gerber and Franklin Luk [23] published an approach to compute the exact number of iterations that Broyden's method needs to solve (2.1).

In this chapter, we discuss the theorems of Gay (Section 2.1) and of Gerber and Luk (Section 2.2), and we give examples to illustrate the theorems. In Section 2.3, we show that the method of Broyden is invariant under unitary transformations and, in some weak sense, also under nonsingular transformations. This justifies restricting ourselves to examples where A is in Jordan canonical block form, cf. Section 4.2.

    But first we start again with the problem in the one-dimensional setting.



    The one-dimensional case

Consider the function g : R → R given by

    g(x) = ax + b,

where a ≠ 0. It is clear that Newton's method converges in one iteration starting from any initial point x0 different from the solution x*. Indeed,

    x1 = x0 − g(x0)/g′(x0) = x0 − (ax0 + b)/a = −b/a.

    an