mast30013 techniques in operations research

1

MAST30013 Techniques in Operations Research

Newton's Method: A Comparative Analysis of Algorithmic Convergence Efficiency

T.Lee [email protected]

J.Rigby [email protected]

L.Russell [email protected]

Department of Mathematics and Statistics

University of Melbourne

mailto:[email protected]



2

Summary:

Objective

By considering a specific problem the aim of this project is to provide an example of the

implementation of traditional Newtonian Methods in multivariate minimization applications. The

effectiveness of three methods: Newton’s, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) and

Symmetric Rank 1 (SR1) are to be examined. By analysing the performance of these algorithms in

minimising a quadratic and convex function, a recommendation will be given as to the best method

to apply to such a case.

These methods cannot be applied to all multivariate functions and for a given problem a technique

may 'succeed' where others 'fail'. This success may manifest as convergence at a faster rate or in the

form of algorithmic robustness. Investigation into which conditions facilitate the successful and

efficient application of the chosen algorithms will be undertaken.

Minimization algorithms require varying degrees of further information in addition to the objective

function. Newtonian methods, in a relative sense, require more information than other common

methods like the Steepest Descent Method (SDM). The advantages and disadvantages of this further

requisite information will be examined.

Findings and Conclusions

For the specific quadratic and convex non-linear program outlined subsequently in the introduction,

it was found that Newton’s Method performed best for both the constrained and unconstrained

problem. It was computationally quicker, usually requiring less iterations to solve the problems and

achieved greater convergence accuracy than the BFGS, SR1 and SDM methods, consistent with the

outlined theory.

The ‘ideal’ nature of the specific case required further evaluation of the other methods without

consideration given to Newton’s method. For a constrained problem the SR1 method achieved non-

trivial convergence success partly attributed to the algorithms flexibility in not always choosing a

descent direction (that is, positive definiteness of the approximated hessian is not imposed).

Recommendations

Whilst theoretical convergence rates are greater for Newton’s method than for Quasi-Newton

methods and the SDM in turn, the applicability of certain algorithms is highly dependent on the

specifics of a problem. For more complex general cases the advantages of using quasi-Newtonian

methods become apparent as hessian inversion becomes more computationally taxing.

Analysis suggests that when seeking to minimize quadratic and convex nonlinear programs,

Newton's Method appears to perform better than any of the other tested methods.

3

Introduction:

The Objective Function

This project investigates the advantages and disadvantages of using three variations of Newton's

Method and contrasts their convergence efficiencies against one another as well as the Steepest

Descent Method for the general case and the specified problem:

min 𝑓(𝒙) = 𝒄𝑇𝒙 +1

2𝒙𝑇𝐻𝒙

where

𝒄 = [5.04, −59.4, 146.4, 96.6]𝑇 , 𝐻 = [

0.16 −1.2 2.4 −1.4−1.2 12.0 −27.0 16.82.4 −27.0 64.8 −42.0

−1.4 16.8 −42.0 28.0

]

A constrained case, where the objective function is subject to the constraint 𝒙𝑇𝒙 = 1, is considered

and analysed with Newtonian methods implementing an L2 penalty program. The results are

contrasted against output from the MATLAB Optimisation Tool.

In the following sections this report will detail a method of analysing the constrained and

unconstrained objective function and compare algorithmic output to analytical solutions. Results will

be contrasted with theory and meaningful conclusions made where grounded by empirical evidence.

Any results requiring further substantiation will be discussed subsequently.

Newton's Method

Newton's Method (also known as the Newton-Raphson Method) is a method for minimising an

unconstrained multivariate function. Given a starting point, this method approximates the objective

function with a second order Taylor polynomial and proceeds to minimise this approximation by

moving in the 'Newton direction'. The output is subsequently used as the new starting point and the

process is iteratively repeated (Chong and Zak, 2008).

Newton's Method seeks to increase the rate of convergence by using second order information of

the function it is operating on. This second order information is the hessian function, denoted 𝛻2𝑓.

The 'Newton direction' mentioned above is defined to be:

𝒅𝑘 ≔ −𝛻2𝑓(𝒙𝑘)−1

𝛻𝑓(𝒙𝑘)

where 𝒙𝑘 denotes a particular iterate point of the algorithm. It is defined this way such that if the

second order approximation to a given function held exactly, then it would be minimised in one step

with a step size of 1.

BFGS and SR1

Minimisation via Newton's Method requires the calculation of the gradient and hessian matrices. In

addition, the hessian must also be inverted. This inversion can be quite computationally expensive

(standard techniques are known to be 𝑂(𝑛3) )(Hauser, 2012) and may not be defined. Additionally,

4

the Newton direction is not necessarily guaranteed to be a descent direction if the hessian is not

positive definite. These two potential problems may affect the quality of any implementation of

Newton’s method and give rise to the need for quasi-Newton methods.

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) and Symmetric Rank 1 (SR1) Quasi-Newton methods

have been formulated specifically in order to bypass these concerns by approximating the hessian

from successively calculated gradient vectors (Farzin and Wah, 2012). Both methods attempt to

satisfy the secant equation at each iteration:

𝒙𝑘+1 − 𝒙𝑘 = 𝐻𝑘+1(∇𝑓(𝒙𝑘+1) − ∇𝑓(𝒙𝑘))

During each iteration the approximated hessian is said to ‘updated’ in a manner dependent on the

method:

BFGS Update

𝐻𝑘+1 = 𝐻𝑘 +1 + ⟨𝒓𝑘 , 𝒈𝑘⟩

⟨𝒔𝑘 , 𝒈𝑘⟩𝒔𝑘(𝒔𝑘)𝑇 − [𝒔𝑘(𝒓𝑘)𝑇 + 𝒓𝑘(𝒔𝑘)𝑇]

where

𝒔𝑘 = 𝒙𝑘+1 − 𝒙𝑘 , 𝒈𝑘 = 𝛁f(𝒙k+1) − 𝛁f(𝒙𝑘), 𝒓𝑘 = 𝐻𝑘𝒈𝑘

⟨𝒔𝑘 , 𝒈𝑘⟩

𝐻𝑘+1 is an approximation to the inverse of the hessian. The BFGS update always satisfies the secant

equation and maintains positive definiteness of the hessian approximation (if initialized as such).

Additionally the BFGS update satisfies the useful symmetry property 𝐻𝑘+1 = 𝐻𝑘+1𝑇 . It can also be

shown that 𝐻𝑘+1 differs from its predecessor by a rank-2 matrix (Nocedal and Wright, 1999).

SR1 Update

The SR1 update is a simpler rank-1 update that also maintains the symmetry of the hessian and

seeks to (but does not always) satisfy the secant equation. (Nocedal and Wright, 1999)

𝐻𝑘+1 = 𝐻𝑘 +(∆𝒙𝑘 − 𝐻𝑘𝒚𝑘)(∆𝒙𝑘 − 𝐻𝑘𝒚𝑘)𝑇

(∆𝒙𝑘 − 𝐻𝑘𝒚𝑘)𝑇𝒚𝑘

where

𝒚𝑘 = ∇𝑓(𝒙𝑘 + ∆𝒙𝑘) − ∇𝑓(𝒙𝑘).

As with the BFGS method, 𝐻𝑘+1 is an approximation to the inverse of the hessian. This update does

not guarantee that the update be positive definite and subsequently does not ensure that following

iterations always move in descent directions. In practice, the approximated hessians generated by

the SR1 method exhibit faster converge towards the true hessian inverse than the BFGS method

(Conn, Gould and Toint, 1991). A known drawback affecting the robustness of the SR1 method is

that the denominator can vanish (Nocedal & Wright 1999). Where this is the case algorithmic

robustness can be increased simply by skipping the updating process of troublesome iteration.

5

Method:

Analysis Method

First a theoretical analysis of the function was completed. Then each of the methods of minimisation

was applied to the outlined objective function in both its constrained and unconstrained form using

MATLAB algorithms (see appendices: MATLAB). The unconstrained results from these three methods

have been compared to each other and to the Steepest Descent Method in order to draw

conclusions about the best method to apply. The constrained program results were compared to

results calculated from the MATLAB Optimisation Tool. The criteria analysed were, time taken and

number of iterations taken to converge on global minimum (within a tolerance value), accuracy of 𝒙∗

and algorithmic robustness.

Analysis of Unconstrained Objective Function

The properties of the considered problem need to be determined to draw meaningful conclusions

from output results. Knowledge about the behaviour of this function, specifically the type of

function, location and number of any stationary points, the class of stationary points and whether

they are local or global can be determined for this case.

Functions of the following form are defined as quadratic:

𝑓(𝑥) = 𝛼 + ⟨𝒄, 𝒙⟩ + 1

2⟨𝒙, 𝐵𝒙⟩ = 𝛼 + 𝒄𝑇𝒙 +

1

2𝒙𝑇𝐵𝒙

The specified function is of this form with 𝛼 = 0, 𝐵 = 𝐻 and 𝒄 = 𝑐 as detailed above. This

identification means that:

𝛻𝑓(𝒙) = 𝒄 + 𝐻𝒙

𝛻2𝑓(𝒙) = 𝐻

A particular feature of quadratic functions is that they are convex if and only if their hessian is

positive semi-definite. Calculating the eigenvalues of H (via MATLAB) reveals:

𝜆1 = 0.0066657144469

𝜆2 = 0.0591221937911

𝜆3 = 1.4840596546051

𝜆4 = 103.4101524371569

Since each eigenvalue is positive, this tells us that H is positive definite (see appendices: Proofs).

Positive definite matrices are also positive semi-definite, and so the quadratic function is also

convex.

The function under investigation is quadratic and convex, hence solvable via matrix algebra. For a

convex and at least 𝐶1 function, 𝛻𝑓(𝒙∗) = 0 if and only if 𝒙∗ is a global minimum of f. Note that by

the fact that the function is quadratic, it must be at least 𝐶1. Therefore:

𝛻𝑓(𝒙) = 0

⟺ 𝐻𝒙∗ + 𝒄 = 0

6

⟺ 𝐻𝒙∗ = −𝒄

⟺ 𝒙∗ = −𝐻−1𝒄

For the specified function:

𝐻−1 = [

100 50 33.333 2550 33.333 25 20

33.333 25 20 16.66725 20 16.667 14.286

]

And recall:

𝒄 = [

5.04−59.4146.4−96.6

]

This implies that:

𝒙∗ = − [

100 50 33.333 2550 33.333 25 20

33.333 25 20 16.66725 20 16.667 14.286

] ∗ [

5.04−59.4146.4−96.6

] = [

10

−12

]

So 𝒙∗ = [1 0 −1 2]𝑇 is the global minimum to the nonlinear function. There will be no other

minimums of this function. It is expected that the algorithms converge to this point.

Thus, the result in calculating f(x*) = -167.28.

Note that the inverse of the hessian matrix for the program is a constant matrix, that is to say that it

does not change for all elements of ℝ4. Recall that one of the potential problems with Newton's

method was that the hessian may not be invertible or positive definite. Both of these cases are not

true for this specified problem.

Implementation of Algorithms

Each algorithm under investigation has been written into MATLAB code and included in the

appendices. They each call on a common univariate line-search algorithm, the Golden Section

Search, one of the better methods in the class of robust interval reducing methods.(Arora, 2011)

Similarly, they implement an algorithm for finding an upper bound on the location of the minimum

on a half-open interval which doubles the incremented step size with each iteration.

Each algorithm has the following parameters that can be altered:

𝑥0

The starting point for the algorithm.

Tolerance 1

The stopping criteria for the particular algorithm. In all cases, this is a check of the

magnitude of the gradient vector at a particular iteration point 𝒙𝑘 against 0. If it is

‘close enough’ to zero, the algorithm will end. ‘Close enough’ is defined as the value

set for this tolerance.

7

Tolerance 2

The stopping criteria for the Golden Section Search as detailed above. This value sets

how large the interval estimate will be when the line search is complete.

T

The parameter used the Multi Variable Half Open Interval Search nested in each of

the algorithms. 2(𝑘−1)𝑇 is the increase to the upper bound during each iteration

when trying to find an interval on which the minimum of the approximation must

exist.

𝐻0

This is the ‘starting hessian’, thought of as an approximation to the inverse hessian

of the program. It is only present in the BFGS and SR1 methods.

In this paper, 𝒙0, tolerance 1 and 𝐻0 values (where appropriate) have been varied and the effects

analysed. The effect of changing tolerance 2 and the value of T have not been analysed because they

do not relate directly (see above) to the methods under investigation.

It is expected that the result of altering 𝒙0 will depend on the distance of 𝒙0 from the global

minimum. The expectation is that the closer 𝒙0 is to the minimum the less time and number of

iterations the algorithm will be expected to take to converge. Having a more strict tolerance (that is,

bringing tolerance 1 closer to 0) should result in an increased number of iterations and time taken.

The algorithm in question must get closer to the true global minimum in order to comply with the

more strict tolerance, and hence is expected to require more computational time. Additionally, the

effect of changing 𝐻0 will be expected to depend on how well 𝐻0 approximates the true hessian

inverse of the function. A better approximation (such as giving the algorithm the function’s true

hessian inverse to begin with) would be expected to reduce the amount of computation time

required for convergence. However, the idea behind the BFGS and SR1 methods is to avoid

calculating the inverse of the hessian directly. As such, the hessian inverse will not be used as a 𝐻0 in

order to simulate more realistic conditions under which these two methods might be implemented.

Constrained Case

In solving the constrained case, two approaches were taken. The first was to solve the nonlinear

program using the MATLAB Optimization Tool, specifically via the interior point and active set

algorithms. This gave the solution point as well as some data and intuition in regards to the time

taken and iterations required to solve such a constrained problem (alternate analytical approaches

could have been used to provide this reference value, see discussion). Since the algorithms

implemented by the MATLAB Optimization Tool are specifically designed to solve constrained

nonlinear programs, the expectation was that they would outperform the Newtonian algorithms

under investigation. The second method for solving the constrained case was via the L2 penalty

method. Converting the constrained nonlinear program into an unconstrained nonlinear program

allowed for the Newtonian algorithms to be implemented.

The specified constraint is:

𝒙𝑇𝒙 = 1

⟺ 𝑥12 + 𝑥2

2 + 𝑥32 + 𝑥4

2 − 1 = 0

8

The L2 penalty method requires that this constraint be converted into a penalty term and added to

the objective function. So, rather than minimizing the original objective function with the above

constraint, the function to be minimised was instead:

𝑃𝑘(𝑥) = 𝒄𝑇𝒙 +1

2𝒙𝑇𝐻𝒙 +

𝑘

2(𝑥1

2 + 𝑥22 + 𝑥3

2 + 𝑥42 − 1)2

where 𝒄 and 𝐻 are defined as above, and 𝑘 being the parameter of the penalty term.

The algorithms under investigation require the calculation of both the gradient function and the

hessian matrix. As shown above, in the unconstrained case, the hessian is a symmetric, positive

definite matrix, perfect for implementation of Newtonian methods. When the hessian matrix is

calculated in order to implement these algorithms for solving the constrained case, the positive

definiteness of the matrix is potentially lost. This is a possible cause for any non-convergence issues

arising from implementation of the Newtonian methods. For the equations for the gradient function

and hessian matrix see appendices: MATLAB.

The L2 penalty method analytically finds the minimum point by evaluating 𝒙∗ = lim𝑘→∞ 𝒙𝑘. When

solving for the minimum point numerically using the Newtonian algorithms, a small value of 𝑘 was

chosen and increased in order to simulate this limiting process. It was expected that as the value of

𝑘 increased the minimum point the algorithms found converged on the value of 𝑥∗ found by the

MATLAB Optimization tool.

Theoretical Convergence

Under specific circumstances, the various methods exhibit differing rates of convergence.

For an initial point sufficiently close to the minimum, if the hessian is positive definite, a

local/global minimum actually exists, the step size at each iteration satisfies the Armijo-

Goldstein and Wolfe conditions and f is C3, then the rate of convergence for Newton's

Method is quadratic. (see appendices for univariate proof).

Quasi-Newton methods are known to exhibit superlinear convergence under certain

circumstances:

The BFGS Method can be shown to converge to the global minimum at a superlinear

rate if the starting hessian is positive definite and the objective function is twice

differentiable and convex. (Powell, 1976)

Likewise, the SR1 method, exhibits superlinear convergence under the same

conditions. (Nocedal and Wright, 1999).

As an aside, the Steepest Descent Method is known to converge at a linear rate. In addition,

it is not adversely affected, as the Newtonian methods are known to be, by horizontal

asymptotes where divergence is sometimes observed.

The correlation of results to these theoretical rates was examined. It is expected that Newton's

Method will converge the quickest. The quasi Newtonian methods are expected to be the next

fastest methods to converge followed lastly by the Steepest Descent Method.

9

Results, Conclusions and Recommendations:

Results for Unconstrained Case

The performance of the chosen methods was analysed for the specified objective function by

choosing 10 starting points, 𝒙0 (8 randomly generated, [0 0 0 0] and the known minimum at [1 0 -1

2]).

Point 𝑥1 𝑥2 𝑥3 𝑥4

1 0 0 0 0

2 1 0 -1 2

3 0.81 0.91 0.13 0.91

4 3.16 0.49 1.39 2.73

5 9.57 -4.85 8.00 -1.42

6 14.66 -4.04 12.09 3.42

7 1.60 3.35 -16.46 18.81

8 -0.30 -7.47 -17.90 1.86

9 2.23 11.37 15.51 17.57

10 7.52

-10.79

-10.14 17.14

Table 1: List of starting points

*Starting points truncated to two decimal places.

Results shown here were generated with tolerance values of 0.01 for T and tolerances 1,2 (see

appendices for full list of all results).

Disregarding data sets that failed to return a value for x* and sets where the number of iterations

exceeded the average by >500% the following table of average values was generated:

Newton's Method Steepest Descent BFGS (hessian) BFGS (identity) SR1 (hessian) SR1 (identity)

x(1) 0.99998 1.01376 1.06218 0.94505 1.00033 0.996722222

x(2) -0.00007 0.01406 0.03947 -0.02716 0.00152 0.002833333

x(3) -1.00019 -0.9882 -0.97035 -1.01809 -0.99881 -0.996133333

x(4) 1.99986 2.00992 2.02388 1.98639 2.00088 2.003922222

f(x*) -167.28 -167.2749 -167.27892 -167.27965 -167.28 -167.2799889

Elapsed Time (s) 0.0040865 0.1965647 0.0100982 0.014235375 0.0119615 0.0125613

Iterations 9 1010 9.2 5.75 24.6 24.1

Elapsed Time per Iteration (s) 0.000454056 0.000194619 0.00109763 0.002475717 0.00048624 0.000521216

Table 2: Summary of algorithms performances (averaged values given robust algorithm implementation)

The BFGS and SR1 methods were both ran using 𝐻 (as defined in the introduction) and the 4x4

identity as 𝐻0 inputs.

All algorithms converged on the global minimum at 𝒙∗ = [1 0 − 1 2]; 𝑓(𝒙∗) = −167.28 for all

staring point except in one instance of the SR1 method initialsed with the identity matrix from point

7. However given the ‘shallowness’ of the function and the tolerances, outlying 𝒙∗ values were

occasionally generated. This ‘shallowness’ refers to, based on the results in the appendix, the

relatively small magnitude of the gradient vector at many points of the objective function close to

10

the minimum. To this end, the BFGS Method and Steepest Descent Method often stopped only part

way to the global minimum.

For a given set of parameters it was often the case that the BFGS method regularly converged faster

than Newton’s method albeit with less precision. However, for this ideal quadratic objective function

Newton’s Method was less prone to ‘getting stuck’ (see appendices data: BFGS (identity) had runs

where iteration values were 28905 and 126675) and always took the least time to converge. The

accuracy of calculated 𝒙∗ was greatest for Newton’s Method followed by SR1, BFGS and Steepest

Descent in decreasing order of accuracy. As expected the quasi-Newtonian methods each converged

faster (from these points, by an order of magnitude) than the Steepest Descent method. The

average iteration of the Steepest Descent method, however, took less time to compute than the

other methods. A cause of this could be that all other methods require operations on a 4x4 matrix

such as an inversion of a hessian or an update of the inverse hessian approximation at each

iteration, whereas the Steepest Descent Method requires only calculation of a gradient vector.

Tolerance Variation (see appendices for tables of results)

With tolerance 2 held constant at 0.01 and T held constant at 1:

For Newton's Method, the algorithm converged quickest with tolerance 1 values set at 0.0001.

When given the identity matrix for the starting iteration the efficiency of the SR1 method appeared

to generally increase as the tolerance 1 became stricter. When given the hessian of the program to

start with, there was no real discernible pattern as to what effect varying the tolerance 1 had. The

BFGS method, when given either the identity or the program's hessian to start with, behaved as

intuitively expected and computational time increased with tightened tolerances. Hence the results

varied and were not always consistent with our expectations. Further investigation and more data

would allow for greater quantitative analysis (see discussion).

With regard to the BFGS method, on occasion the value of T was altered to get the algorithm to

converge. This issue did not arise when implementing Newton’s Method or with the SR1 Method. As

T is only used in the Multi Variable Half Open Section Search portion of the algorithm, this points to

a possible incompatibility between this particular algorithm and the BFGS method under certain

conditions. Finding an alternative method for doing this task (e.g. using a step size that meets the

Armijo-Goldstein and Wolff conditions) would rectify this issue and may increase the robustness of

the BFGS method.

Summary

In summary, each method found the global minimum to the desired accuracy in a vast majority of

cases; however, Newton’s method was the most accurate method and took less computational time.

The convergence rates were mostly-consistent with theory in that the quasi-Newtonian algorithms

generally converged slower than Newton’s method and faster than the Steepest Descent Method.

Results for Constrained Case

Algorithmic performance given the original objective function constrained by 𝑥𝑇𝑥 = 1 was next

analysed. Firstly, the constrained case was solved using MATLAB’s Optimization Toolbox:

11

Interior Point Algorithm Active Set Algorithm

x(1) -0.025 -0.025

x(2) 0.311 0.311

x(3) -0.789 -0.789

x(4) 0.53 0.53

f(x*) -133.56022058 -133.56022058

Average Iterations 22.2 33.6

Elapsed Time (s) 1.05 2.01

Elapsed Time per Iteration (s) 0.047297 0.059821

Table 3: MATLAB Optimisation Tool Results

The following values were obtained using the first five starting points listed in Table 1.

To solve this constrained problem using Newton's Method and its variants, the L2 penalty method

was used. In applying the algorithms the following results were returned, with the penalty term

= 10,000,000 :

Newton's Method BFGS (Hessian) BFGS (Identity)* SR1 (Hessian) SR1 (Identity)

x(1) -0.02477 N/A -0.02477 -0.02477 -0.02477

x(2) 0.31073 N/A 0.31073 0.31073 0.31073

x(3) -0.78876 N/A -0.78876 -0.78876 -0.78876

x(4) 0.52980 N/A 0.52980 0.52980 0.52980

f(x*) -133.56022894 N/A -133.56022894 -133.5603044 -133.5603044

Elapsed Time (s) 0.2051092 N/A 0.112444 2.104064 1.678844

Iterations 169 N/A 111 2037 659

Elapsed Time per Iteration (s)

0.001214 N/A 0.001013 0.001033 0.002546

Table 4: Newtonian Methods Constrained Problem Results

*BFGS (identity) only returned results for the [0 0 0 0] starting point. As such, its results are not

averaged. The rest of the results are averaged data returned for the five starting points.

The BFGS algorithm’s (see appendices: MATLAB) L2 implementation was especially fragile given this

constraint. It did not return any results when given the program's hessian for the staring iteration

and only found the minimum once when given the identity as a starting hessian. In that one case

where it found a minimum, the BFGS method found the same minimum as the other two algorithms,

closely matching MATLAB optimization output.

Slightly more robust was the SR1 method. It managed to find the minimum from more starting

points than the BFGS method did, although not from all starting points. This slightly increased

robustness did come at a cost however, with the SR1 method requiring a very large number of

iterations and a longer timeframe in which to operate, making it more computationally expensive to

use. The SR1 update does not impose positive definiteness on the updated hessian and this fact may

have contributed at the algorithms increased success rate when compared to the BFGS method.

In general, the relative reduced robustness of the BFGS and SR1 implementations, given the

specified problem, possibly stems from the fact that they are both quasi-Newton Methods derived

from the Secant Method. As quasi-Newton Methods are designed to avoid the computationally

expensive hessian inversion, the hessian is instead approximated using finite differences of the

function gradient, and data is interpolated with iterations (Indiana University, 2012). As Quasi-

12

Newton methods are multivariate generalisations of the secant method, the same problem exists for

both methods – namely, that if the initial values used are not close enough to x*, the methods may

fail to converge entirely. (Ohio University, 2012)

In contrast, Newton's Method performed exceptionally well. It found the minimum from all starting

points and did so relatively quickly in terms of both time and number of iterations.

Compared to the MATLAB optimization algorithms, all of the Newtonian algorithms took more

iterations to converge, as expected. Surprisingly, Newton's Method was able to outperform

MATLAB’s optimization algorithms with regard to speed.

Hessian Analysis (both programs)

For both the constrained and unconstrained cases, the BFGS Method converged in less iterations

and more accurately when it was started with the identity matrix as the inverse hessian

approximation. It was also more robust when starting with the identity, always converging in the

unconstrained case and at least finding the global minimum once in the constrained case. The

method was quicker in terms of elapsed time when given the program's hessian to start with. It is

therefore recommended that the identity be used as 𝐻0 when minimising a function such as this via

the BFGS Method.

In the unconstrained case, the difference in choosing 𝐻0 as the starting inverse hessian

approximation for the SR1 Method was negligible. There is no significant difference in the time

taken, accuracy or average number or iterations required to warrant recommending one particular

𝐻0 over the other. In the constrained case, whilst there may be no difference between the results in

terms of accuracy, there is a more pronounced disparity between iterations and time taken. Starting

with the function's hessian caused the SR1 Method to take nearly three times as many iterations and

almost a 20% longer timeframe. Therefore, using the identity matrix for 𝐻0 for this constrained case

is a much better alternative.

As was noted earlier, the hessian of this nonlinear function is a constant 4x4 matrix regardless of the

algorithm's current 𝒙𝑘. This means that it is not very computationally taxing to compute the hessian

and it only needs to be computed once. This deals with the problem of the hessian inversion being

computationally expensive. Since it is known that the hessian is positive definite, so too is the

inverse of the hessian. Thus, the Newton Direction will always be a descent direction. Whilst this is

only true for this function (and functions of similar forms), it means that Newton's Method behaved

very well in this particular problem.

Summary

Given an ideal function such as this, that is to say a quadratic and convex nonlinear program, based

on the above analysis, Newton's Method outperformed both of its variants (BFGS and SR1) and the

Steepest Descent Method. With the problem formulated using the L2 penalty method it is the best

algorithm to use for such a program.

http://www.math.ohiou.edu/courses/math3600/lecture6.pdf

13

Discussion:

The objective function analysed by this project was particularly suited to minimization via Newton’s

Method. For other programs, especially non-convex and non-quadratic ones, the results obtained by

this paper may not hold. The BFGS and SR1 method were formulated precisely because the relative

effectiveness of Newton’s method diminishes with increasing complexity.

In addition, the methodology implemented one class of many available algorithms which specifically

used the Golden Section Search in conjunction with a particular open interval search algorithm. A

variety of methods could have also been used to determine an appropriate step size to move during

each iteration. For example, step sizes satisfying the Armijo-Goldstein and Wolfe conditions would

be an appropriate choice. Hence, a whole family of dissimilar results could have been generated

from the same starting points using different algorithms which could just as easily be considered

Newtonian.

The analysis was very origin-centric in that the starting points were all within a relatively similar

distance from [0 0 0 0]𝑇 and hence [1 0 −1 2]𝑇 and [−0.025 0.311 − 0.789 0.53]𝑇 , the

global minimums. Analysis from starting iterations further from the minimums should yield results

consistent with those generated by this report; further investigation is needed.

As discussed in the previous section, in such a nonlinear program as this, the inverse of the hessian

needs to only be calculated once. As long as it is known the hessian does not change for any point in

ℝ4 and inversion of that constant hessian matrix is computationally feasible, the coding for

Newton's Method used here could be adjusted to remove the evaluation and inversion of the

hessian at each iteration. Such a change would result in less calculations per iteration speeding up

the algorithm. The results returned for this particular case would be even better if such an

adjustment was made. The drawback of doing so is that the adjusted method could only be applied

to cases where the hessian is a constant matrix, severely restricting its applicability.

For the constrained case an analytical solve of using the KKT method would have been possible

albeit complicated and not solvable by simple linear algebra operations due to the quadratic nature

of the constraint. If this project had gone down this path instead of utilizing MATLAB’s optimization

tools an exact value of 𝒙∗ could have been used as a point of reference.

The shortcomings of the BFGS algorithm’s implementation of the L2 penalty method requires further

analysis and perhaps troubleshooting.

Finally, the analysis of varying the tolerances of the algorithms used by this report could have been

furthered with more systematically obtained data. It would be expected that for a general case

decreasing the ‘strictness’ of the major tolerance would decrease computational time taken (this

was not always the case: see results). By way of contrast, varying the tolerances of the open interval

search and golden section search would have been expected to exhibit different effects for different

starting points and for different problems. To optimize an algorithm a balance must be struck

between accuracy and time taken to generate an appropriate step length. Hence, ideal tolerance

values exist for different algorithm, different starting points and for each iteration. Further

investigation may reveal common properties of these ideal tolerances given the algorithms used.

14

References:

Arora, J. (2011). Introduction to Optimum Design [electronic resource]. p.42 Burlington Elsevier

Science.

Chong, E. Zak, S. (2008). An Introduction To Optimization 3rd Edition. pp. 155-156. John Wiley and

Sons.

Conn, A., Gould, N. and Toint, P. (1991). "Convergence of quasi-Newton matrices generated by the

symmetric rank one update". Mathematical Programming (Springer Berlin/ Heidelberg) 50 (1): pp.

177–195.

Farzin, K. and Wah, L. (2012). On the performance of a new symmetric rank-one method with restart

for solving unconstrained optimization problems. Computers and Mathematics with Applications,

Volume 64, Issue 6, September 2012, pp. 2141-2152,

http://www.sciencedirect.com/science/article/pii/S089812211200449X

Hauser, K. (2012). Lecture 6: Multivariate Newton's Method and Quasi-Newton Methods. p. 5.

Available online: http://homes.soic.indiana.edu/classes/spring2012/csci/b553-

hauserk/newtons_method.pdf

Indiana University. (2012). Lecture 6: Multivariate Newton's Method and Quasi-Newton methods.

Available online: http://homes.soic.indiana.edu/classes/spring2012/csci/b553-

hauserk/newtons_method.pdf

Indian Institute of Technology. (2002). Convergence of Newton-Raphson method. Available online:

http://ecourses.vtu.ac.in/nptel/courses/Webcourse-contents/IIT-

KANPUR/Numerical%20Analysis/numerical-analysis/Rathish-kumar/ratish-1/f3node7.html

Nocedal, J. and Wright, S.J. (1999). Numerical Optimization, pp. 220, 144.

Ohio University (2012). Lecture 6: Secant Methods. Available online:

http://www.math.ohiou.edu/courses/math3600/lecture6.pdf

Powell, M. (1976). 'Superlinear convergence Some global convergence properties of a variable metric algorithm for minimization without exact line searches', Nonlinear Programming, Vol 4, Society for Industrial and Applied Mathematics, p. 53

15

Appendices:

Proofs

Positive Eigenvalues imply invertiblity of matrix:

𝐷𝑒𝑓𝑖𝑛𝑒 𝑎 𝑝𝑜𝑙𝑦𝑛𝑜𝑚𝑖𝑎𝑙 𝑝(𝑡) = (𝑡 − 𝜆1)(𝑡 − 𝜆2) … (𝑡 − 𝜆𝑛)

𝑇ℎ𝑒 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡 𝑡𝑒𝑟𝑚: (−1)𝑛𝜆1𝜆2 … 𝜆𝑛

𝐿𝑒𝑡 𝑝(𝑡) = det (𝑡𝐼 − 𝐴)

𝑊ℎ𝑒𝑟𝑒 𝐴 𝑖𝑠 𝑎 𝑠𝑞𝑢𝑎𝑟𝑒 𝑚𝑎𝑡𝑟𝑖𝑥: 𝑝(0) = 𝑑𝑒𝑡(−𝐴) = (−1)𝑛det (𝐴)

det(𝐴) = 𝜆1𝜆2 … 𝜆𝑛

if 𝜆1, 𝜆2,, … , 𝜆𝑛 > 0

det(𝐴) ≠ 0

𝑇ℎ𝑒𝑟𝑒𝑓𝑜𝑟𝑒 𝐴 𝑖𝑠 𝑖𝑛𝑣𝑒𝑟𝑡𝑖𝑏𝑒

Newton’s method Converges Quadratically for Univariate Case:

𝐿𝑒𝑡 𝑥𝑖 𝑏𝑒 𝑎 𝑟𝑜𝑜𝑡 𝑜𝑓 𝑓(𝑥) = 0

𝐿𝑒𝑡 𝑥𝑛 𝑏𝑒 𝑎𝑛 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒 𝑜𝑓 𝑥𝑖: |𝑥𝑖 − 𝑥𝑛| = 𝜀 < 1

𝐵𝑦 𝑇𝑎𝑦𝑙𝑜𝑟 𝑠𝑒𝑟𝑖𝑒𝑠 𝑒𝑥𝑝𝑎𝑛𝑠𝑖𝑜𝑛:

0 = 𝑓(𝑥𝑖) = 𝑓(𝑥𝑛 + 𝜀) = 𝑓(𝑥𝑛) + 𝑓′(𝑥𝑛)( 𝑥𝑖 − 𝑥𝑛) + 0.5𝑓′′(𝜉)( 𝑥𝑖 − 𝑥𝑛)2

𝐹𝑜𝑟 𝑠𝑜𝑚𝑒 𝜉 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝑥𝑖, 𝑥𝑛

𝐹𝑜𝑟 𝑁𝑒𝑤𝑡𝑜𝑛′𝑠 𝑚𝑒𝑡ℎ𝑜𝑑: −𝑓′(𝑥𝑛)(𝑥𝑛+1 − 𝑥𝑛) = 𝑓(𝑥𝑛)

𝑇ℎ𝑒𝑟𝑒𝑓𝑜𝑟𝑒: 0 = 𝑓′(𝑥𝑛)(𝑥𝑖 − 𝑥𝑛+1) + 0.5𝑓′′(𝜉)( 𝑥𝑖 − 𝑥𝑛)2

( 𝑥𝑖 − 𝑥𝑛), (𝑥𝑖 − 𝑥𝑛+1) 𝑎𝑟𝑒 𝑡ℎ𝑒 𝑒𝑟𝑟𝑜𝑟 𝑡𝑒𝑟𝑚𝑠 𝑓𝑜𝑟 𝑠𝑢𝑐𝑒𝑠𝑠𝑖𝑣𝑒 𝑖𝑡𝑒𝑟𝑎𝑡𝑖𝑜𝑛𝑠

(𝑥𝑖 − 𝑥𝑛+1) ∝ ( 𝑥𝑖 − 𝑥𝑛)2

Q.E.D.

(Indian Institute of Technology, 2002)

16

Data

All tolerance values 0.01:

Newton's Method:

x* f(x*) Elapsed Time (s) Iterations

1.0000 -0.0000 -1.0000 2.0000 -167.28 0.009944 10

1 0 -1 2 -167.28 0.000134 0

1.0000 -0.0001 -1.0001 2.0001 -167.28 0.004207 9

0.9999 -0.0000 -1.0002 2.0000 -167.28 0.001905 9

1.0002 -0.0001 -1.0002 1.9999 -167.28 0.00208 10

0.9999 0.0000 -1.0001 2.0000 -167.28 0.005061 11

1.0000 0.0001 -1.0004 1.9995 -167.28 0.003064 10

1.0000 0.0001 -0.9999 2.0000 -167.28 0.005054 11

0.9999 -0.0008 -1.0011 1.9990 -167.28 0.004321 9

0.9999 0.0001 -0.9999 2.0001 -167.28 0.005095 11

Steepest Descent:


0.2126 -0.4083 -1.2812 1.7836 -167.2769 0.00655 34

1 0 -1 2 -167.28 0.000116 0

1.1539 0.1993 -0.8189 2.1600 -167.2789 0.038846 106

2.1659 0.6656 -0.5248 2.3718 -167.2728 0.170743 798

1.7844 0.3563 -0.7776 2.1590 -167.2768 0.077666 418

2.1659 0.6648 -0.5257 2.3709 -167.2728 0.481597 2600

-0.1767 -0.6709 -1.4786 1.6256 -167.2727 0.274945 1399

-0.1749 -0.6699 -1.4779 1.6262 -167.2727 0.273263 1332

2.1767 0.6709 -0.5213 2.3744 -167.2727 0.303484 1575

-0.1702 -0.6672 -1.4760 1.6277 -167.2727 0.338437 1838

BFGS (hessian for starting iterate):


0.1915 -0.4173 -1.2823 1.7865 -167.2767 0.004697 6

1 0 -1 2 -167.28 0.000144 0

1.2740 0.1562 -0.8887 2.0870 -167.2796 0.002729 7

0.9981 0.0014 -0.9982 2.0018 -167.28 0.004855 10

2.1585 0.6613 -0.5280 2.3692 -167.2729 0.006476 7

1.0040 -0.0019 -1.0030 1.9967 -167.28 0.02334 11

0.9987 0.0005 -0.9991 2.0011 -167.28 0.016815 14

0.9946 -0.0028 -1.0020 1.9983 -167.28 0.016214 13

1.0029 -0.0024 -1.0021 1.9982 -167.28 0.010847 10

0.9995 -0.0003 -1.0001 2.0000 -167.28 0.014865 14

17

BFGS (identity for starting iterate):


0.2028 -0.4142 -1.2815 1.7865 -167.2768 0.001249 3

1 0 -1 2 -167.28 0.000297 0

1.2434 0.1361 -0.9034 2.0753 -167.2797 0.005207 5

1.0024 0.0016 -0.9992 2.0004 -167.28 0.003858 7

1.0118 0.0074 -0.9948 2.0039 -167.28 3.954961 28905

0.9991 -0.0006 -1.0005 1.9996 -167.28 0.079164 9

0.9976 -0.0041 -1.0046 1.9955 -167.28 0.004674 7

0.9953 -0.0024 -1.0012 1.9992 -167.28 0.008941 7

1.0007 0.0009 -0.9994 2.0004 -167.28 0.010493 8

0.9974 0.0037 -0.9963 2.0031 -167.28 16.554056 126675

SR1 (hessian for starting iterate):


1.0001 0.0001 -0.9999 1.9999 -167.28 0.011665 14

1 0 -1 2 -167.28 0.000299 0

1.0000 0.0000 -1.0000 2.0000 -167.28 0.007054 21

0.9985 -0.0008 -1.0003 2.0000 -167.28 0.013603 20

1.0173 0.0085 -0.9961 2.0015 -167.28 0.015117 19

0.9996 0.0115 -0.9892 2.0093 -167.28 0.007496 19

1.0000 0.0000 -1.0000 2.0000 -167.28 0.012073 37

0.9812 -0.0119 -1.0090 1.9927 -167.28 0.028734 52

1.0066 0.0056 -0.9956 2.0038 -167.28 0.013331 39

1.0000 0.0022 -0.9980 2.0016 -167.28 0.010243 25

SR1 (identity for starting iterate):


1.0014 -0.0007 -0.9997 2.0009 -167.28 0.023934 17

1 0 -1 2 -167.28 0.000314 0

1.0004 0.0001 -1.0001 1.9997 -167.28 0.015986 25

1.0053 0.0051 -0.9963 2.0025 -167.28 0.009545 30

0.9998 -0.0005 -1.0002 2.0000 -167.28 0.012679 37

1.0000 -0.0000 -1.0000 2.0000 -167.28 0.01236 34

0.9723 0.0241 -0.9692 2.0305 -167.2799 0.003297 9

NaN NaN NaN NaN NaN 0.006176 7

0.9991 -0.0003 -1.0002 1.9998 -167.28 0.013616 41

0.9922 -0.0023 -0.9995 2.0019 -167.28 0.027706 41

18

Newton's Method

x0 Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations

[0 0 0 0] 0.1 0.01 1 [ NaN -Inf -Inf -Inf] NaN 0.3685380 353

[0 0 0 0] 0.01 0.01 1 [1 0 -1 2] -167.279999996088 0.0294290 4

[0 0 0 0] 0.001 0.01 1 [1 0 -1 2] -167.279999999938 0.0160210 3

[0 0 0 0] 0.0001 0.01 1 [1 0 -1 2] -167.279999999976 0.0058770 2

[0 0 0 0] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0072560 2

[0 0 0 0] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0104300 3

[1 0 -1 2] 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0003140 0

[1 0 -1 2] 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0001230 0

[1 0 -1 2] 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000790 0

[1 0 -1 2] 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000750 0

[1 0 -1 2] 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000740 0

[1 0 -1 2] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000720 0

[0.81 0.91 0.13 0.91] 0.1 0.01 1 [NaN NaN NaN NaN] NaN 0.3632520 353

[0.81 0.91 0.13 0.91] 0.01 0.01 1 [1 0 -1 2] -167.279999998378 0.0060940 4

[0.81 0.91 0.13 0.91] 0.001 0.01 1 [1 0 -1 2] -167.279999999974 0.0052550 3

[0.81 0.91 0.13 0.91] 0.0001 0.01 1 [1 0 -1 2] -167.279999999990 0.0043530 2

[0.81 0.91 0.13 0.91] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0052190 2

[0.81 0.91 0.13 0.91] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0058110 3

[3.16 0.49 1.39 2.73] 0.1 0.01 1 [NaN NaN NaN NaN] NaN 0.3621590 353

[3.16 0.49 1.39 2.73] 0.01 0.01 1 [1 0 -1 2] -167.279999997557 0.0058490 4

[3.16 0.49 1.39 2.73] 0.001 0.01 1 [1 0 -1 2] -167.279999999961 0.0052710 3

[3.16 0.49 1.39 2.73] 0.0001 0.01 1 [1 0 -1 2] -167.279999999985 0.0043780 2

19

[3.16 0.49 1.39 2.73] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0051530 2

[3.16 0.49 1.39 2.73] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0116540 2

[9.57 -4.85 -8.00 -1.42] 0.1 0.01 1 [NaN NaN NaN NaN] NaN 0.3284300 352

[9.57 -4.85 -8.00 -1.42] 0.01 0.01 1 [1 0 -1 2] -167.279999995272 0.0051650 4

[9.57 -4.85 -8.00 -1.42] 0.001 0.01 1 [1 0 -1 2] -167.279999999925 0.0045340 3

[9.57 -4.85 -8.00 -1.42] 0.0001 0.01 1 [1 0 -1 2] -167.279999999971 0.0037930 2

[9.57 -4.85 -8.00 -1.42] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0045140 2

[9.57 -4.85 -8.00 -1.42] 0.000001 0.01 1 [1 0 -1 2] -167.279999999999 0.0072210 3

20

SR1 Method

x0 Starting Hessian Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations

[0 0 0 0] Identity 0.1 0.01 1 [1 0 -1 2] -167.279999128240 0.1688580 89

[0 0 0 0] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999575407 0.0692100 39

[0 0 0 0] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999999853 0.0587820 26

[0 0 0 0] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999999 0.1066900 38

[0 0 0 0] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999973 0.0380570 10

[0 0 0 0] Identity 0.000001 0.01 1 [NaN NaN NaN NaN] NaN 0.0307350 9

[0 0 0 0] Program's Hessian 0.1 0.01 1 [1.42 0.12 -0.94 2.05] -167.278456189735 0.0620180 31

[0 0 0 0] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999919554 0.0949530 57





[1 0 -1 2] Identity 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0001780 0

[1 0 -1 2] Identity 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0000800 0

[1 0 -1 2] Identity 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000720 0

[1 0 -1 2] Identity 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000700 0

[1 0 -1 2] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000700 0

[1 0 -1 2] Identity 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000690 0

[1 0 -1 2] Program's Hessian 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0000880 0






[0.81 0.91 0.13 0.91] Identity 0.1 0.01 1 [1.40 0.52 -0.52 2.43] -167.272369332640 0.1391750 94

[0.81 0.91 0.13 0.91] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999927619 0.0512890 30

21

[0.81 0.91 0.13 0.91] Identity 0.001 0.01 1 [NaN NaN NaN NaN] NaN 0.0337580 17

[0.81 0.91 0.13 0.91] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999999 0.0577020 26

[0.81 0.91 0.13 0.91] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999955 0.0280640 10

[0.81 0.91 0.13 0.91] Identity 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0415910 13

[0.81 0.91 0.13 0.91] Program's Hessian 0.1 0.01 1 [1.03 .01 -0.99 2.00] -167.279983904904 0.2144600 156

[0.81 0.91 0.13 0.91] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999620384 0.0843110 46





[3.16 0.49 1.39 2.73] Identity 0.1 0.01 1 [1 0 -1 2] -167.279998954778 0.1396510 89

[3.16 0.49 1.39 2.73] Identity 0.01 0.01 1 [NaN, NaN, NaN, NaN] NaN 0.0152440 9



[3.16 0.49 1.39 2.73] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999990 0.0244790 9








[9.57 -4.85 -8.00 -1.42] Identity 0.1 0.01 1 [0.35 -0.05 -0.92 2.12] -167.272798242689 0.0848760 72

[9.57 -4.85 -8.00 -1.42] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999999861 0.0446150 31

[9.57 -4.85 -8.00 -1.42] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999953795 0.0294250 17

[9.57 -4.85 -8.00 -1.42] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999996731 0.0385920 15

[9.57 -4.85 -8.00 -1.42] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999715 0.0541070 22

[9.57 -4.85 -8.00 -1.42] Identity 0.000001 0.01 1 [1 0 -1 2] -167.279999999999 0.0781950 31

[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.1 0.01 1 [1.03 0.02 -0.99 2.00] -167.279959843584 0.0323530 23

22

[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999991203 0.0652370 41





23

Steepest Descent Method

x0 Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations

[0 0 0 0] 0.1 0.01 1 [-Inf NaN NaN NaN] NaN 0.8100010 377

[0 0 0 0] 0.01 0.01 1 [0.22 -0.41 -1.28 1.78] -167.276916941098 0.2438320 196

[0 0 0 0] 0.001 0.01 1 [0.89 -0.06 -1.05 1.96] -167.279933780104 8.3233250 6416

[0 0 0 0] 0.0001 0.01 1 [0.99 -0.01 -1 2] -167.279999267465 3.8602080 2417

[0 0 0 0] 0.00001 0.01 1 [1 0 -1 2] -167.279999997389 22.7155470 12213

[0 0 0 0] 0.000001 0.01 1 [1 0 -1 2] -167.279999999942 18.2888800 8432

[1 0 -1 2] 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0011080 0

[1 0 -1 2] 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0001110 0

[1 0 -1 2] 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000690 0

[1 0 -1 2] 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000660 0

[1 0 -1 2] 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000660 0

[1 0 -1 2] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000630 0

[0.81 0.91 0.13 0.91] 0.1 0.01 1 [-Inf NaN NaN NaN] NaN 0.3082390 377

[0.81 0.91 0.13 0.91] 0.01 0.01 1 [1.16 0.19 -0.83 2.15] -167.279100661797 1.0436020 984

[0.81 0.91 0.13 0.91] 0.001 0.01 1 [1.12 0.07 -0.95 2.04] -167.279928943921 1.9716170 1514

[0.81 0.91 0.13 0.91] 0.0001 0.01 1 [1 0 -1 2] -167.279999266127 2.4358950 1833

[0.81 0.91 0.13 0.91] 0.00001 0.01 1 [1 0 -1 2] -167.279999999681 1.9436270 999

[0.81 0.91 0.13 0.91] 0.000001 0.01 1 [1 0 -1 2] -167.279999999926 5.8643590 2553

[3.16 0.49 1.39 2.73] 0.1 0.01 1 [-Inf NaN NaN NaN] NaN 0.3068300 377

[3.16 0.49 1.39 2.73] 0.01 0.01 1 [2.02 0.58 -0.59 2.32] -167.274489679393 6.4725080 5980

[3.16 0.49 1.39 2.73] 0.001 0.01 1 [1.11 0.061 -0.96 2.03] -167.279938709635 17.6293290 15514

[3.16 0.49 1.39 2.73] 0.0001 0.01 1 [1.01 0.01 -1 2] -167.279999278511 8.2242300 5651

[3.16 0.49 1.39 2.73] 0.00001 0.01 1 [1 0 -1 2] -167.279999997974 1.9071060 1631

[3.16 0.49 1.39 2.73] 0.000001 0.01 1 [1 0 -1 2] -167.279999999979 4.4262350 1784

[9.57 -4.85 -8.00 -1.42] 0.1 0.01 1 [-Inf Inf NaN NaN] NaN 0.3026670 376

[9.57 -4.85 -8.00 -1.42] 0.01 0.01 1 [1.75 0.35 -0.77 2.16] -167.277168000179 3.8968330 3948

[9.57 -4.85 -8.00 -1.42] 0.001 0.01 1 [1.12 0.07 -0.95 2.04] -167.279928743320 5.0956410 3914

24

[9.57 -4.85 -8.00 -1.42] 0.0001 0.01 1 [1 0 -1 2] -167.279999264172 3.6328800 2661

[9.57 -4.85 -8.00 -1.42] 0.00001 0.01 1 [1 0 -1 2] -167.279999998846 25.4060450 16377

[9.57 -4.85 -8.00 -1.42] 0.000001 0.01 1 [1 0 -1 2] -167.279999999938 4.1823240 C

25

BFGS Method

x0 Starting Hessian Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations

[0 0 0 0] Identity 0.1 0.01 1 [0.22 -0.41 -1.28 1.79] -167.276910497404 0.0077950 5

[0 0 0 0] Identity 0.01 0.01 1 [0.61 -0.31 -1.26 1.78] -167.278342371136 0.0092760 5

[0 0 0 0] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999999944 0.0167320 6

[0 0 0 0] Identity 0.0001 0.01 0.0001 [1 0 -1 2] -167.279999999619 0.0260180 7

[0 0 0 0] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999982 0.0232830 6

[0 0 0 0] Identity 0.000001 0.01 1 [1 0 -1 2] -167.279999999999 0.0269040 6

[0 0 0 0] Program's Hessian 0.1 0.01 1 [0.19 -0.41 -1.28 1.78] -167.276642869312 0.0094350 7

[0 0 0 0] Program's Hessian 0.01 0.01 1 [0.19 -0.42 -1.28 1.79] -167.276717196021 0.0057380 4


[0 0 0 0] Program's Hessian 0.0001 0.01 0.00000001 [NaN, NaN, NaN, NaN] NaN 0.0415680 13

[0 0 0 0] Program's Hessian 0.00001 0.01 0.1 [1 0 -1 2] -167.279999999938 0.0834650 22

[0 0 0 0] Program's Hessian 0.000001 0.01 0.00000001 [NaN NaN NaN NaN] NaN 1.4213690 2330

[1 0 -1 2] Identity 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0002520 0

[1 0 -1 2] Identity 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0001150 0

[1 0 -1 2] Identity 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000720 0

[1 0 -1 2] Identity 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000700 0

[1 0 -1 2] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000680 0

[1 0 -1 2] Identity 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000670 0







[0.81 0.91 0.13 0.91] Identity 0.1 0.01 1 [1.05 0.30 -0.69 2.29] -167.275083269658 0.0100910 4

[0.81 0.91 0.13 0.91] Identity 0.01 0.01 1 [1.24 0.14 -0.90 2.08] -167.279686990669 0.0090960 5

26

[0.81 0.91 0.13 0.91] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999999376 0.0185580 7

[0.81 0.91 0.13 0.91] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999860 0.0183590 6

[0.81 0.91 0.13 0.91] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0260800 8

[0.81 0.91 0.13 0.91] Identity 0.000001 0.01 0.01 [1 0 -1 2] -167.279999999999 0.1026530 43

[0.81 0.91 0.13 0.91] Program's Hessian 0.1 0.01 1 [1.02 0.32 -0.66 2.33] -167.273452580260 0.0098400 7

[0.81 0.91 0.13 0.91] Program's Hessian 0.01 0.01 1 [1.28 0.16 -0.89 2.09] -167.279598023217 0.0122180 7


[0.81 0.91 0.13 0.91] Program's Hessian 0.0001 0.01 0.000001 [NaN NaN NaN NaN] NaN 1.0398700 1601


[0.81 0.91 0.13 0.91] Program's Hessian 0.000001 0.01 0.000000001 [NaN NaN NaN NaN] NaN 0.0878040 78

[3.16 0.49 1.39 2.73] Identity 0.1 0.01 1 [ 2.95 0.95 -0.37 2.47] -167.260921728458 0.0148570 7

[3.16 0.49 1.39 2.73] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999971584 0.0201850 8

[3.16 0.49 1.39 2.73] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999998218 0.0196480 7

[3.16 0.49 1.39 2.73] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999995 0.0273770 9

[3.16 0.49 1.39 2.73] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999995 0.0163080 5

[3.16 0.49 1.39 2.73] Identity 0.000001 0.01 0.0001 [1 0 -1 2] -167.279999999999 0.0242500 5

[3.16 0.49 1.39 2.73] Program's Hessian 0.1 0.01 1 [3 1.49 0.19 2.99] -167.244594503201 0.0097640 7





[3.16 0.49 1.39 2.73] Program's Hessian 0.000001 0.01 1 [NaN NaN NaN NaN] NaN 32.2398260 11663

[9.57 -4.85 -8.00 -1.42] Identity 0.1 0.01 1 [ 2.00 0.58 -0.58 2.33] -167.274520285135 0.0099090 6

[9.57 -4.85 -8.00 -1.42] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999526260 0.0190540 8

[9.57 -4.85 -8.00 -1.42] Identity 0.001 0.01 0.1 [1 0 -1 2] -167.279999999709 0.0216050 7

[9.57 -4.85 -8.00 -1.42] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999999 0.0256380 8

[9.57 -4.85 -8.00 -1.42] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0312400 9

[9.57 -4.85 -8.00 -1.42] Identity 0.000001 0.01 0.1 [1 0 -1 2] -167.279999999999 0.0293100 7


27






28

Newton's Method For k = 10,000,000 All tolerances at 0.01

T set to 1 (adjusted as necessary to get convergence)

Starting Point xmin fmin Iterations Elapsed Time (s)

[0 0 0 0] [-0.024771532289335 0.310732926395592 -0.788760177923239 0.529801104907597] -133.560228943395 39 0.03467

[1 0 -1 2] [-0.024771837343072 0.310733028722944 -0.788760301760888 0.529800846242057] -133.560228943387 143 0.253257

[0.81 0.91 0.13 0.91] [-0.024771522265604 0.310732911352949 -0.788760170844310 0.529801124775638] -133.560228943395 173 0.183464

[3.16 0.49 1.39 2.73] [-0.024771527241936 0.310732906803061 -0.788760174684506 0.529801121482055] -133.560228943395 240 0.243349

[9.57 -4.85 -8.00 -1.42] [-0.024771536889997 0.310732912478593 -0.788760169890084 0.529801124846407] -133.560228943395 250 0.310806

BFGS ID

[0 0 0 0] [-0.024770871284853 0.310733636611044 -0.788758902260001 0.529802618520184] -133.560228943197 111 0.112444

[1 0 -1 2] Refused to return an answer

[0.81 0.91 0.13 0.91] Refused to return an answer


[9.57 -4.85 -8.00 -1.42] Refused to return an answer

BFGS Hess

[0 0 0 0] Refused to return an answer

[1 0 -1 2] Refused to return an answer




SR1 ID

[0 0 0 0] [-0.024771567761620 0.310733441325913 -0.788761623923610 0.529802126005156] -133.560304375650 378 1.013921

[1 0 -1 2] [-0.024771544747263 0.310733182847815 -0.788761385804056 0.529802633470826] -133.560304375634 819 2.392423

[0.81 0.91 0.13 0.91] [-0.024777470366075 0.310725229879482 -0.788760807092803 0.529807865053035] -133.560304368405 619 1.81805


[9.57 -4.85 -8.00 -1.42] [-0.024770231258035 0.310730257079946 -0.788765912375838 0.529797671405605] -133.560304373593 821 1.490982

29

SR1 Hess

[0 0 0 0] [-0.024773909306389 0.310738355421821 -0.788757020809805 0.529805987358458] -133.560304372956 1976 1.343549

[1 0 -1 2] [-0.024775941428827 0.310732519348041 -0.788760367888054 0.529804332231798] -133.560304374565 1734 2.006957

[0.81 0.91 0.13 0.91] [-0.024769720656432 0.310730507404563 -0.788765053684288 0.529798826909580] -133.560304374229 2254 2.432939

[3.16 0.49 1.39 2.73] [-0.024767341442569 0.310733094291290 -0.788764366771071 0.529798443639664] -133.560304374047 2184 2.63281


30

MATLAB

Objective Function File: f.m

function val = f(x) % This m-file is the objective function for our unconstrained % nonlinear program.

% Definitions. c = [5.04; -59.4; 146.4; -96.6]; hessian = [ 0.16 -1.2 2.4 -1.4; -1.2 12 -27 16.8; 2.4 -27 64.8 -42; -1.4 16.8 -42 28]; xs = [x(1); x(2); x(3); x(4)];

val = c' * xs + 0.5 * xs' * hessian * xs;

end

Gradient Function File: gradf.m

function grad = gradf(x) % This is the gradient function of our objective function, f.m.

% Definitions. c = [5.04 -59.4 146.4 -96.6]; hessian = [ 0.16 -1.2 2.4 -1.4; -1.2 12 -27 16.8; 2.4 -27 64.8 -42; -1.4 16.8 -42 28]; xs = [x(1) x(2) x(3) x(4)];

grad = c + xs * hessian;

end

Hessian Function File: hessf.m

function hessian = hessf(x) % The hessian of our objective function, f.m.

% Note the the hessian is independent of x. hessian = [ 0.16 -1.2 2.4 -1.4; -1.2 12 -27 16.8; 2.4 -27 64.8 -42; -1.4 16.8 -42 28];

end

Objective Function File, Penalty Method: fpen.m function val = f(x,k) % The objective function implemented with the L2 penalty method.

31

% Evaluate with increasing values of k to simulate evaluating the limit % as k approaches infinity.

% Ensure that k is the same for fpen, gradfpen and hessfpen.

% Definitions k = 10000000; c = [5.04 -59.4 146.4 -96.6]; hessian = [ 0.16 -1.2 2.4 -1.4; -1.2 12 -27 16.8; 2.4 -27 64.8 -42; -1.4 16.8 -42 28]; xs = [x(1); x(2); x(3); x(4)]; constraint = 1-(x(1))^2-(x(2))^2-(x(3))^2-(x(4))^2;

% Function val = c * xs + 0.5 * xs' * hessian * xs + (k/2)*(constraint)^2;

end

Gradient Function File, Penalty Method: gradfpen.m function grad = gradf(x,k) % The gradient function implemented with the L2 penalty method. % Evaluate with increasing values of k to simulate evaluating the limit % as k approaches infinity.

% Ensure that k is the same for fpen, gradfpen and hessfpen.

%Definitions k = 10000000; g = ((x(1))^2+(x(2))^2+(x(3))^2+(x(4))^2-1); c = [5.04 -59.4 146.4 -96.6]; hessian = [ 0.16 -1.2 2.4 -1.4; -1.2 12 -27 16.8; 2.4 -27 64.8 -42; -1.4 16.8 -42 28]; xs = [x(1) x(2) x(3) x(4)];

% Function grad =c + xs * hessian + [2*k*x(1)*g; 2*k*x(2)*g; 2*k*x(3)*g; 2*k*x(4)*g]';

end

Hessian Function File, Penalty Method: hessfpen.m function hessian = hessf(x,k) % This is the hessian of the L2 penalty method for our program. % Evaluate with increasing values of k to simulate evaluating the limit % as k approaches infinity.

% Ensure that k is the same for fpen, gradfpen and hessfpen

%Definitions k = 10000000; h11 = 2*k*(3*(x(1))^2 + (x(2))^2 + (x(3))^2 + (x(4))^2-1);

32

h22 = 2*k*((x(1))^2 + 3*(x(2))^2 + (x(3))^2 + (x(4))^2-1); h33 = 2*k*((x(1))^2 + (x(2))^2 + 3*(x(3))^2 + (x(4))^2-1); h44 = 2*k*((x(1))^2 + (x(2))^2 + (x(3))^2 + 3*(x(4))^2-1); unchessian = [ 0.16 -1.2 2.4 -1.4; -1.2 12 -27 16.8; 2.4 -27 64.8 -42; -1.4 16.8 -42 28];

% Function hessian = unchessian + [h11 4*k*x(1)*x(2) 4*k*x(1)*x(3) 4*k*x(1)*x(4);... 4*k*x(1)*x(2) h22 4*k*x(2)*x(3) 4*k*x(4)*x(2);... 4*k*x(1)*x(3) 4*k*x(3)*x(2) h33 4*k*x(3)*x(4);... 4*k*x(1)*x(4) 4*k*x(4)*x(2) 4*k*x(3)*x(4) h44];

end

Constraint File: MATLAB Optimisation Tool function [c, ceq] = xtx(x) % The constraint as required for the MATLAB Optimization Tool.

% c is the set of nonlinear inequality constraints. Empty in our case. c = [];

% ceq is the set of nonlinear equality constraints. ceq = x(1)^2 + x(2)^2 + x(3)^2 + x(4)^2 - 1;

end

33

SR1 Quasi-Newton Method: NewtonMethod_SR1.m % INPUT: % % f - the multivariable function to minimise (a separate % user-defined MATLAB function m-file) % % gradf - function which returns the gradient vector of f evaluated % at x (also a separate user-defined MATLAB function % m-file) % % x0 - the starting iterate % % tolerance1 - tolerance for stopping criterion of algorithm % % tolerance2 - tolerance for stopping criterion of line minimisation (eg:

% in golden section search)

%

% H0 - a matrix used as the first approximation to the hessian.

% Updated as the algorithm progresses % % T - parameter used by the "improved algorithm for % finding an upper bound for the minimum" along % each given descent direction % % OUTPUT: % % xminEstimate - estimate of the minimum % % fminEstimate - the value of f at xminEstimate

function [xminEstimate, fminEstimate, iteration] = NewtonMethod_SR1(f,... gradf, x0,H0,tolerance1, tolerance2, T) tic %starts timer

k = 0; % initialize iteration counter iteration_number=0; %initialise count

xk = x0; % row vector xk_old=x0; % row vector H_old=H0; % square matrix while ( norm(feval(gradf, xk)) >= tolerance1 ) iteration_number = iteration_number + 1; H_old = H_old / max(max(H_old)); % Correction if det H_old gets % too large or small dk = transpose(-H_old*transpose(feval(gradf, xk))); % gives dk as % a row vector

% minimise f with respect to t in the direction dk, which involves % two steps:

% (1) find upper and lower bound, [a,b], for the stepsize t using % the "improved procedure" presented in the lecture notes

[a, b] = multiVariableHalfOpen(f, xk, dk, T);

% (2) use golden section algorithm (suitably modified for % functions of more than one variable) to estimate the

34

% stepsize t in [a,b] which minimises f in the direction dk % starting at xk

[tmin, fmin] = multiVariableGoldenSectionSearch(f, a, b, tolerance2,... xk, dk);

% note: we do not actually need fmin, but we do need tmin

% update the iteration counter and the current iterate k = k + 1;

xk = xk + tmin*dk; xk_new = xk_old + tmin*dk;

% update the hessian approximation gk = (feval(gradf, xk_new) - feval(gradf, xk_old))'; %column vector s = (xk_new - xk)' - (H_old * gk); st = s';

H_new = H_old + (s * st) / (st * gk);

% keep track of the old values xk_old=xk_new; H_old=H_new;

end

% assign output values toc xminEstimate = xk; fminEstimate = feval(f,xminEstimate) iteration = iteration_number

mast30013 techniques in operations research

Documents