
EXTRAPOLATED LEAST SQUARES OPTIMIZATION APPLIED TO LENS DESIGN.

Item Type text; Dissertation-Reproduction (electronic)

Authors HUBER, EDWARD DAVID.

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.

Download date 15/08/2021 03:15:57

Link to Item http://hdl.handle.net/10150/186701


INFORMATION TO USERS

This was produced from a copy of a document sent to us for microfilming. While the most advanced technological means to photograph and reproduce this document have been used, the quality is heavily dependent upon the quality of the material submitted.

The following explanation of techniques is provided to help you understand markings or notations which may appear on this reproduction.

1. The sign or "target" for pages apparently lacking from the document photographed is "Missing Page(s)". If it was possible to obtain the missing page(s) or section, they are spliced into the film along with adjacent pages. This may have necessitated cutting through an image and duplicating adjacent pages to assure you of complete continuity.

2. When an image on the film is obliterated with a round black mark it is an indication that the film inspector noticed either blurred copy because of movement during exposure, or duplicate copy. Unless we meant to delete copyrighted materials that should not have been filmed, you will find a good image of the page in the adjacent frame. If copyrighted materials were deleted you will find a target note listing the pages in the adjacent frame.

3. When a map, drawing or chart, etc., is part of the material being photographed the photographer has followed a definite method in "sectioning" the material. It is customary to begin filming at the upper left hand corner of a large sheet and to continue from left to right in equal sections with small overlaps. If necessary, sectioning is continued again, beginning below the first row and continuing on until complete.

4. For any illustrations that cannot be reproduced satisfactorily by xerography, photographic prints can be purchased at additional cost and tipped into your xerographic copy. Requests can be made to our Dissertations Customer Services Department.

5. Some pages in any document may have indistinct print. In all cases we have filmed the best available copy.

University Microfilms International
300 N. Zeeb Rd., Ann Arbor, MI 48106


8217423

Huber, Edward David

EXTRAPOLATED LEAST SQUARES OPTIMIZATION APPLIED TO LENS DESIGN

The University of Arizona

University Microfilms

International, 300 N. Zeeb Road, Ann Arbor, MI 48106

Copyright 1982

by

Huber, Edward David

All Rights Reserved

PH.D. 1982


EXTRAPOLATED LEAST SQUARES OPTIMIZATION

APPLIED TO LENS DESIGN

by

Edward David Huber

A Dissertation Submitted to the Faculty of the

COMMITTEE ON OPTICAL SCIENCES (GRADUATE)

In Partial Fulfillment of the Requirements For the Degree of

DOCTOR OF PHILOSOPHY

In the Graduate College

THE UNIVERSITY OF ARIZONA

1982

© Copyright 1982 Edward David Huber


THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Final Examination Committee, we certify that we have read

the dissertation prepared by Edward David Huber entitled Extrapolated Least Squares Optimization Applied to Lens Design

and recommend that it be accepted as fulfilling the dissertation requirement

for the Degree of Doctor of Philosophy


Final approval and acceptance of this dissertation is contingent upon the candidate's submission of the final copy of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

Dissertation Director


STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at The University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library.

Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the copyright holder.

SIGNED:


To my three sons

Dylan, Scott and Craig


ACKNOWLEDGMENTS

I would like to thank Professor Robert R. Shannon for serving as

my dissertation advisor and for his assistance, encouragement and

guidance during my pursuit of my dissertation research. I also wish to

express my gratitude to Professor Roland V. Shack for the many enjoyable

and enlightening conversations we shared, and to Professor Murray

Sargent III for his continued assistance throughout the project. I

would also like to thank the faculty and staff of the Optical Sciences

Center at the University of Arizona for providing the fine curriculum

and academic environment in which I pursued my research; the University

Computer Center for allocating the necessary computer resources for my

project; and Sherrie Cornett for the preparation of this dissertation

for publication.

I would also like to express my appreciation to my fellow students

with whom I have shared many hours of study, the exchange of ideas and

good times; especially to Tom Kuper, Art Gmitro and Gene Gindi.

I wish to express my sincere appreciation to my wife Joyce, my

family and my parents without whose understanding, encouragement and

sacrifices this undertaking would never have been possible.


TABLE OF CONTENTS

                                                                    Page

LIST OF ILLUSTRATIONS  . . . . . . . . . . . . . . . . . . . . . .   vii

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . .    xi

ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   xii

1. INTRODUCTION  . . . . . . . . . . . . . . . . . . . . . . . . .     1

       Mathematical Notation . . . . . . . . . . . . . . . . . . .     2

2. REVIEW OF APPROACHES TO OPTIMIZATION IN OPTICAL DESIGN  . . . .     9

       Introduction  . . . . . . . . . . . . . . . . . . . . . . .     9
       Gradients in Optimization Methods . . . . . . . . . . . . .    12
       First and Second Derivatives in Optimization  . . . . . . .    13
       Orthonormalization in Optimization  . . . . . . . . . . . .    19
           Conjugate Gradient Method . . . . . . . . . . . . . . .    21
           Orthonormalization in Least Squares Methods . . . . . .    22
       Matrix Conditioning . . . . . . . . . . . . . . . . . . . .    25
       The Metric in Optimization  . . . . . . . . . . . . . . . .    30
       Optimization Steplength and Direction . . . . . . . . . . .    35

3. EXTRAPOLATED LEAST SQUARES OPTIMIZATION . . . . . . . . . . . .    40

       Introduction  . . . . . . . . . . . . . . . . . . . . . . .    40
       Derivation of Least Squares Optimization  . . . . . . . . .    42
       Derivation of the Extrapolated Least Squares (ELS)
           Optimization Method . . . . . . . . . . . . . . . . . .    45
       Approximations - Analyzed and Discussed with Examples . . .    50
           Conventional Least Squares Methods - Linear
               Approximations  . . . . . . . . . . . . . . . . . .    51
           Extrapolated Least Squares Method - Compared to
               Conventional Least Squares Method . . . . . . . . .    54
           The Least Squares Method's Second Derivative Term . . .    73
       Modifications to Extrapolated Least Squares Methods . . . .    79
       Computational Savings for the Extrapolated Least Squares
           Optimization Method . . . . . . . . . . . . . . . . . .    80

4. LENS DESIGN WITH EXTRAPOLATED LEAST SQUARES OPTIMIZATION  . . .    84

       Introduction  . . . . . . . . . . . . . . . . . . . . . . .    84
       Optical Design Problem Selection  . . . . . . . . . . . . .    85
       Demonstration Lens Optimization . . . . . . . . . . . . . .    86

5. CONCLUSION  . . . . . . . . . . . . . . . . . . . . . . . . . .   101

APPENDIX A: THE SECOND DERIVATIVE TERM IN THE LEAST SQUARES
    OPTIMIZATION . . . . . . . . . . . . . . . . . . . . . . . . .   104

APPENDIX B: THE "LENS IN A HOLE" OPTICAL DESIGN PROBLEM  . . . . .   110

APPENDIX C: THE COMPUTER PROGRAM . . . . . . . . . . . . . . . . .   112

REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . .   117

LIST OF ILLUSTRATIONS

Figure                                                                Page

1.1. Notation description for scalar quantity (outer product) derivatives . . . 4
1.2. Notation description for vector quantity (outer product) derivatives . . . 5
1.3. Notation description for inner products (a vector in this case) . . . 8
2.1. Merit function contours showing optimization solution steps taken for different system metrics . . . 34
3.1. Rosenbrock parabolic valley problem (2nd order) solved using damped least squares optimization . . . 52
3.2. Rosenbrock parabolic valley problem (2nd order) solved using damped least squares optimization with linear extension of the reused metric type . . . 53
3.3. Rosenbrock parabolic valley problem (2nd order) solved using damped least squares optimization with linear extension of the (Robb) step length scaling type . . . 55
3.4. Rosenbrock parabolic valley problem (2nd order) solved using ELS extrapolated damped least squares optimization . . . 57
3.5. Rosenbrock parabolic valley (2nd order) solved using ELS extrapolated damped least squares optimization with linear extension of the reused metric type . . . 58
3.6. Second derivative matrix of merit factors and the approximating matrix of principal diagonal elements . . . 60
3.7. Buchele tilted parabolic valley problem (2nd order) solved using damped least squares optimization . . . 62
3.8. Buchele tilted parabolic valley problem (2nd order) using ELS extrapolated damped least squares optimization . . . 63
3.9. The full second derivative matrix and the matrix of its principal diagonal shown for the Rosenbrock and Buchele example problems . . . 64
3.10. Rosenbrock parabolic valley problem (4th order) solved using damped least squares optimization . . . 66
3.11. Rosenbrock parabolic valley problem (4th order) solved using damped least squares optimization with linear extension of the reused metric type . . . 67
3.12. Rosenbrock parabolic valley problem (4th order) solved using damped least squares optimization with linear extension of the (Robb) step length scaling type . . . 68
3.13. Rosenbrock parabolic valley problem (4th order) solved using ELS extrapolated damped least squares optimization . . . 69
3.14. Rosenbrock parabolic valley problem (4th order) solved using ELS extrapolated damped least squares optimization with linear extension of the reused metric type . . . 70
3.15. Rosenbrock parabolic valley problem (8th order) solved using damped least squares optimization with linear extension of the reused metric type . . . 71
3.16. Rosenbrock parabolic valley problem (8th order) solved using ELS extrapolated damped least squares optimization with linear extension of the reused metric type . . . 72
3.17. Rosenbrock parabolic valley problem (2nd order) solved using damped least squares optimization with exact and pseudo second derivative principal diagonal damping factors . . . 75
3.18. Rosenbrock parabolic valley problem (2nd order) solved using ELS extrapolated damped least squares optimization with pseudo second derivative damping factors . . . 76
3.19. Buchele tilted parabolic valley problem (2nd order) solved using damped least squares optimization with exact and pseudo second derivative principal diagonal damping factors . . . 77
3.20. A two dimensional example showing the optimization path length for several least squares optimization techniques . . . 81
4.1. Lens design with a vanishing merit function using conventional and ELS extrapolated damped least squares optimization . . . 88
4.2. Lens design with a vanishing merit function using conventional and ELS extrapolated damped least squares optimization (with linear extension of the reused metric type) . . . 89
4.3. Lens design with a stagnating merit function using conventional and ELS extrapolated damped least squares optimization . . . 90
4.4. Lens design with a stagnating merit function using conventional and ELS extrapolated damped least squares optimization (with linear extension of the reused metric type) . . . 91
4.5. Lens design with a vanishing merit function using conventional damped least squares optimization (with and without linear extension of the reused metric type) . . . 94
4.6. Lens design with a vanishing merit function using ELS extrapolated damped least squares optimization (with and without linear extension of the reused metric type) . . . 95
4.7. Lens design with a stagnating merit function using conventional damped least squares optimization (with and without linear extension of the reused metric type) . . . 96
4.8. Lens design with a stagnating merit function using ELS extrapolated damped least squares optimization (with and without linear extension of the reused metric type) . . . 97
4.9. Lens design with a vanishing merit function using conventional and ELS extrapolated damped least squares optimization (with linear extension of the reused metric type) . . . 99
4.10. Lens design with a stagnating merit function using conventional and ELS extrapolated damped least squares optimization (with linear extension of the reused metric type) . . . 100

LIST OF TABLES

Table                                                                 Page

2.1. General texts and articles on optimization . . . 11
2.2. References for optimization methods using differencing techniques to approximate derivatives . . . 15
2.3. References for variable metric optimization methods . . . 18
2.4. References for second order terms in least squares optimization . . . 20
2.5. References for conjugate gradient optimization methods . . . 23
2.6. References for Grey's approach to least squares optimization . . . 26
2.7. References for least squares optimization . . . 29
2.8. References for adaptive optic optimization methods . . . 29
2.9. References about matrix conditioning and manipulation techniques . . . 31
3.1. First order least squares optimization derivation . . . 44
3.2. Second order least squares optimization derivation . . . 46
4.1. Merit function used for the preliminary design of the double Gauss lens . . . 87

ABSTRACT

A new approach to least squares optimization has been developed which uses extrapolation factors to introduce variable metric techniques into the least squares optimization methods used in optical design. This new approach retains derivative information between successive optimization iterative steps to form approximate second derivatives in order to develop extrapolation factors. These extrapolation factors are used to update and refine important system parameters including the merit function, the first derivative matrix and the system metric without requiring the reevaluation of the system derivatives. This extrapolated least squares (ELS) optimization method does not simply add damping terms to the diagonal elements of the system metric to control optimization step lengths as is done in the various damped least squares (DLS) optimization methods; rather, the total system metric is updated to reflect the current optimization progress made to within the limit of the extrapolated quadratic approximation to the problem. The ELS and conventional least squares optimization methods are compared in numerous optimization problem examples including several test functions as well as typical optical design problems. The extrapolated least squares (ELS) optimization method is shown to reduce computational overhead and to accelerate convergence of least squares types of optimization problems.


CHAPTER 1

INTRODUCTION

The Extrapolated Least Squares (ELS) optimization method is introduced as a new approach to least squares optimization. This method incorporates a variable metric technique of optimization into the familiar family of least squares optimization methods used in optical design. The ELS method retains the first derivative information between optimization iterations in order to approximate second derivatives which are used in developing extrapolation factors. These extrapolation factors are used to update the system's first derivative matrix and the optical system metric. This technique does not simply assist in selecting damping factors (which only involve the diagonal elements of the metric), but it actually updates and improves the system metric used to describe the optical problem between iterations.

The ELS method effectively extends the region over which the optimization problem can be linearized for iterative solution by least squares optimization techniques. With the application of these extrapolation techniques the optimization iterative procedures can progress without recalculating the derivative matrix for the optical systems. This procedure has general application and can be incorporated into a wide variety of least squares optimization techniques that employ the numerous damping and scaling schemes which have been developed to accelerate optimization convergence.
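As a rough illustration of the idea of carrying first derivative information forward between iterations, a generic secant-type (Broyden) correction of a stored derivative matrix can be sketched in Python/NumPy. This sketch is illustrative only; the actual ELS extrapolation factors are derived in Chapter 3, and the function name and data here are assumed.

    import numpy as np

    def secant_update(J_prev, dx, df):
        """Generic secant-type (Broyden) correction of a stored first
        derivative matrix, using the step dx and the observed change df
        in the residual vector, so that the updated matrix maps dx onto
        df.  Illustrative only; not the ELS formulas of Chapter 3."""
        dx = np.asarray(dx, dtype=float)
        df = np.asarray(df, dtype=float)
        correction = np.outer(df - J_prev @ dx, dx) / (dx @ dx)
        return J_prev + correction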


A well-referenced review of several optimization methods is presented in Chapter 2. The emphasis here is on the techniques used in and the approaches to the various optimization procedures, rather than on the detailed derivations of the particular optimization methods. This fairly qualitative review of the procedures in current use in optimization is intended to provide a background against which the new extrapolated least squares (ELS) optimization method can be compared and a context into which it can be set.

A detailed derivation of the extrapolated least squares optimization method is presented in Chapter 3. The ELS derivation is made in association with the derivations of several conventional least squares optimization methods in order to develop the motivation for the new approach used. This also serves to highlight the unique as well as the shared features of the new ELS and conventional least squares optimization methods. These features are further highlighted by the many test problems and the graphical results which exemplify specific aspects of the optimization methods. In Chapter 4 the extrapolated and conventional least squares optimization methods are applied to optical design problems. The optical design problems of Chapter 4 as well as the test problems of Chapter 3 are used to clearly demonstrate the superior features of the extrapolated least squares optimization method over conventional least squares approaches.

Mathematical Notation

In an effort to clearly present the ideas developed in this dissertation, an emphasis is placed on the basic concepts of the approach rather than on the mathematical detail. To this end, a simplified notation is used throughout this dissertation which clearly displays mathematical relationships without explicitly detailing the specific mathematical operations involved. A brief discussion of this notational scheme should clarify this approach.

In the discussion and development of the extrapolated least squares optimization method, a variety of tensor quantities are encountered, including tensors up to rank 3. This includes: scalars, or rank zero tensors; vectors and one dimensional arrays, or rank one tensors; two dimensional arrays, or rank two tensors; and three dimensional arrays, or rank three tensors. These tensors often arise out of differentiation operations. In the context of this dissertation, differentiation is applied as an outer product of a differential or gradient operator. As shown in Figure 1.1 the scalar ψ becomes a vector when its gradient is computed. This vector becomes a two dimensional array when it is differentiated, and upon further differentiation a three dimensional array is formed. It is thus seen that the second derivative of a scalar gives rise to a two dimensional array. When starting with a vector quantity, however, as shown in Figure 1.2, its first derivative is a two dimensional array and its second derivative is a three dimensional array. The point here is that the rank of the tensor after differentiation depends upon the rank of the original tensor before differentiation.

There are several summation notations and conventions that could be used to represent these vectors, matrices or tensors and their associated operations. The convention used in this dissertation,

Figure 1.1. Notation description for scalar quantity (outer product) derivatives. [Figure: the scalar ψ, its gradient (a vector), its second derivative (a two dimensional array), and its third derivative (a three dimensional array).]

Figure 1.2. Notation description for vector quantity (outer product) derivatives. [Figure: the vector f, its first derivative matrix $\dot f_{ij} = \partial f_i / \partial x_j$ (a two dimensional array), and its second derivative (a three dimensional array).]

however, is quite simple. Scalars or simple constants will be represented by Greek letters, usually κ or ψ. All other tensor quantities are represented by upper or lower case English letters. The rank of the tensor quantity is determined by the reader from the specific usage of that particular quantity. The specific usage is determined by two factors. First is its order of differentiation as discussed above, and this is given simply by dots over the tensor quantities. The number of dots equals the derivative order. For example $\ddot\psi$ is the second derivative tensor of a scalar ψ (a Greek letter), and thus $\ddot\psi$ is a two dimensional matrix, or tensor of rank 2, as shown in Figure 1.1. For a vector quantity f, $\ddot f$ is a second derivative tensor of rank three, or a three dimensional array, as shown in Figure 1.2. If one is trying to follow the rank of these tensor quantities very carefully throughout the mathematical development, the notation presented here may be counter-productive. However, in general, knowing the rank of the various tensor quantities involved is not crucial to the development of the concepts presented, and thus reference to tensor rank is eliminated in order to distill the notation to a concise form.

The second factor affecting the rank of the tensor quantities is their usage in tensor products. The convention used here is that all products are inner or scalar products to be taken in the order that they are written. Thus the product of a vector times itself is written as $\psi = f^T f$, where $f^T$ is the transpose of the vector f and the quantity written as $f^T f$ is the inner or scalar product of the vectors. In the simplified notation of this dissertation this product is written simply


as $ff$. Since an inner product is always assumed, it follows that the transpose of the first term must be taken to form an inner product, as shown in Figure 1.3. Thus $f^T f$ and $ff$ have the same mathematical meaning in the notation used in this dissertation. Examples taken from a later chapter are the equivalent equations $\dot f \dot f\,\Delta x + \dot f f = 0$, which is also written as $\dot f^T \dot f\,\Delta x + \dot f^T f = 0$, where $f$ and $\Delta x$ are vectors and $\dot f$ is a two dimensional array. In the second equation the $\dot f f$ product is written as $\dot f^T f$ to emphasize that the scalar product is involved.

Figure 1.3. Notation description for inner products (a vector in this case): $ff = f^T f = f_1^2 + f_2^2 + f_3^2 + \cdots$
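The shorthand can also be checked numerically. The following minimal Python/NumPy sketch (with assumed example values for f and its first derivative matrix) forms the inner products used above and solves $\dot f^T \dot f\,\Delta x + \dot f^T f = 0$:

    import numpy as np

    # Assumed example data (purely illustrative):
    # f   -- vector of aberration values          (rank-1 tensor)
    # fd  -- its first derivative matrix, "f-dot"  (rank-2 tensor)
    f  = np.array([0.3, -0.1, 0.2])
    fd = np.array([[1.0, 0.5],
                   [0.2, 1.5],
                   [0.7, 0.1]])

    # Products are inner products taken in the order written, so
    # "fd fd" means fd^T fd and "fd f" means fd^T f.
    metric = fd.T @ fd                       # the (f-dot^T f-dot) coefficient of delta-x
    rhs    = fd.T @ f                        # the (f-dot^T f) term
    dx     = np.linalg.solve(metric, -rhs)   # solves fd^T fd dx + fd^T f = 0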


CHAPTER 2

REVIEW OF APPROACHES TO OPTIMIZATION IN OPTICAL DESIGN

Introduction

A variety of optimization techniques have been developed for optical design problems since the 1950's and the advent of modern computers. The optimization techniques which have proven successful in optical design are those which effectively and efficiently handle large nonlinear problems involving many variables. Many of these optimization procedures utilize Taylor series to approximate the nonlinear functions describing the behavior of lenses. Three general features found in these optimization approaches include the following: first, the optimization search is pursued locally rather than globally. That is, after the search is initiated, optimization follows a path which, based upon the available sampling of the local configuration of the parameter space, leads to the solution minimum in this region. Minima found in this manner are often called local minima since undiscovered, better minima may exist in remote regions of the solution space. Second, the nonlinear optical problem is linearized by truncating the approximating series expansion for the system's nonlinear functions. This serves to facilitate the mathematical requirements of the optimization methods; however, it tends to further localize the optimization process because the region of solution is restricted to a neighborhood where the series approximations remain valid. This results in a stepwise or iterative


optimization process. Third, the optimization search is usually carried out using a descent or gradient technique. These methods often require the evaluation of the derivatives of the system functions in order to determine the local slopes of the parameter space contours. Although evaluating derivatives requires a modest amount of computation, the information gained can be used very effectively to pursue the optimization minimum.

There are several optimization methods which have been developed from gradient methods and the Taylor series approximations as outlined above. Some of the more familiar methods include direct gradient, conjugate gradient and least squares optimization techniques. Numerous variations on these optimization techniques have been developed in order to accelerate convergence, generalize the application or in some fashion improve the optimization performance. In this chapter, both specific and general characteristics of the various optimization methods are discussed in order to point out the unique features of a particular optimization approach and to display common features shared by several. The objective of this chapter is not to provide a review that explicitly details the various optimization approaches. Several general texts on optimization and review articles on optimization in optical design are listed in Table 2.1 which provide thorough discussions of these optimization techniques. This review emphasizes the approaches used in and the techniques applied to these optimization methods and their variants to provide the background for the development in this dissertation of the new extrapolated least squares (ELS) optimization method.


Table 2.1. General texts and articles on optimization.

General Texts

Fletcher (1971)
Forsythe, Malcolm, and Moler (1977)
Jacoby, Kowalik, and Pizzo (1972)
Pierre (1969)

Optical Texts

Feder (1963b)
Jamieson (1971)
Lavi and Vogl (1966)
Rigler and Pegis (1980)


Gradients in Optimization Methods

The gradient of the optical merit function is the basic tool or

compass used to guide the optimization process through the parametric

space describing the optical system. Locally the gradient indicates the

direction of improved optical performance. When the gradient vanishes

or goes to zero, the solution minimum (or a function inflection) has

been located. However, when an optimization method is called a "gradient

method," such as the direct gradient (Gauss) method or the second-order

gradient (Newton) method, the term gradient is used in a slightly

restrictive sense. In the "gradient methods," the gradient of the

optical system merit function is incorporated directly into the optimization process, and the particular type or form of the merit function

does not matter. That is, the merit function may equal the sum of a set

of optical aberrations

$$\phi = f_1 + f_2 + f_3 + \cdots \qquad (2.1)$$

or it may equal the sum of the squares (the variance) of the aberrations

$$\phi = f_1^2 + f_2^2 + f_3^2 + \cdots = f^T f \qquad (2.2)$$

but the gradient operation in "gradient methods" does not penetrate into

or operate on the particular configuration of φ in the derivation of a

specific method. Thus in the direct-gradient method for example, the

optimization step is given simply as

$$\Delta x = -\kappa \nabla\phi = -\kappa\,\dot\phi \qquad (2.3)$$
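A minimal computational sketch of this step (Python/NumPy, with an assumed quadratic merit function; not the dissertation's program of Appendix C) is:

    import numpy as np

    def direct_gradient_step(grad_phi, x, kappa=0.1):
        """One direct gradient (steepest descent) step, Eq. (2.3):
        delta-x = -kappa * grad(phi)."""
        return x - kappa * grad_phi(x)

    # Assumed quadratic merit function phi = x1**2 + 4*x2**2:
    grad = lambda x: np.array([2.0 * x[0], 8.0 * x[1]])
    x = np.array([1.0, 1.0])
    for _ in range(50):
        x = direct_gradient_step(grad, x, kappa=0.1)   # x approaches the minimum at (0, 0)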


For a non-gradient method such as the least squares optimization

method, the gradient operation does penetrate into and does operate on the

component factors of the merit function. Thus the type of merit function

becomes important. In the case of the least squares method, the merit

function takes the form of Eq. (2.2) and the gradient of the merit function now takes the form

$$\dot\phi = 2\,\dot f f \qquad (2.4)$$

Although the merit functions used for a least squares optimization method

and a gradient method may be identical, the development of the optimization procedures and the use of the gradient of the merit function are

seen to be fundamentally different.

First and Second Derivatives in Optimization

The importance of the gradient or first derivative of the optical

system merit function in optimization was discussed above. The first

derivative provides, for a modest amount of computation, some very useful

information. In particular, it gives the localized slopes of the configuration space described by the merit function. It also identifies

the solution minimum because at the minimum the first derivative goes to

zero as the slopes flatten out and vanish. Although the first derivative

aids in identifying the solution minimum, it often creates computational

difficulties when it is used in calculating the iterative step length

while becoming vanishingly small near the minimum. This can be seen in

an approach that uses a Taylor series, truncated at the first order term,

to approximate the nonlinear merit function. The truncated merit function is written as

$$\phi(x) \approx \phi(x_0) + \dot\phi(x_0)\,\Delta x \qquad (2.5)$$

Then, solving for the iterative step length Δx,

$$\Delta x = -\dot\phi(x_0)^{-1}\,\phi(x_0) \qquad (2.6)$$

the first derivative now appears as an inverse function.

to the optimization problem is approached and the derivative term gets

vanishingly small, it is apparent here that this inverse factor diverges

and the solution step computation can become erroneous. Convergence

problems are encountered in several optimization methods which use first

derivatives and a variety of schemes and procedural variations have been

developed to rectify these problems. Where the convergence problems are

avoided, the first derivative methods are attractive because the amount

of computation required for larger optimization problems remains reasonable. Several variants of optimization methods have been developed which

use differencing techniques to approximate the first derivatives from

the merit function values obtained during optimization. Several references to these approaches are listed in Table 2.2.

An alternate optimization approach with superior convergence

properties but which requires the calculation of second derivatives can

be developed in a fashion similar to that given above. This second

order gradient method is often called the Newton method and is derived

from a second order Taylor series approximation of the nonlinear merit

function. The merit function is truncated after the second order term


Table 2.2. References for optimization methods using differencing techniques to approximate derivatives.

First Derivative Differencing Approximations

Barnes (1965)

Broyden (1965)

Powell (1964, 1965)

Second Derivative Differencing Approximations

Brunner (1971)

Dilworth (1978)

Feder (1957)

Stewart (1967)


and is written as

$$\phi(x) \approx \phi(x_0) + \dot\phi(x_0)\,\Delta x + \tfrac{1}{2}\,\ddot\phi(x_0)\,\Delta x\,\Delta x \qquad (2.7)$$

Since the gradient of the merit function vanishes at the minimum, Eq.

(2.7) can be rewritten after differentiation as

$$0 \approx \dot\phi(x_0) + \ddot\phi(x_0)\,\Delta x \qquad (2.8)$$

The iterative step length Δx is now given as

$$\Delta x = -\ddot\phi(x_0)^{-1}\,\dot\phi(x_0) \qquad (2.9)$$
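A compact numerical statement of this Newton step, again as an illustrative Python/NumPy sketch with an assumed quadratic example, is:

    import numpy as np

    def newton_step(grad_phi, hess_phi, x):
        """Second order gradient (Newton) step, Eq. (2.9):
        delta-x = -[phi-double-dot(x0)]^(-1) phi-dot(x0)."""
        return x - np.linalg.solve(hess_phi(x), grad_phi(x))

    # Assumed quadratic example phi = x1**2 + 4*x2**2, minimized in one step:
    grad = lambda x: np.array([2.0 * x[0], 8.0 * x[1]])
    hess = lambda x: np.array([[2.0, 0.0], [0.0, 8.0]])
    x_min = newton_step(grad, hess, np.array([1.0, 1.0]))   # -> array([0., 0.])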

The first-order gradient is now removed from the inverse factor and if

the second-order gradient term does not also vanish at the solution

minimum, the convergence properties of the Newton solution in Eq. (2.9)

are superior to those of the first-order gradient method. However, the

amount of additional computation required to numerically calculate the

second derivatives in a typical optical design optimization problem

makes the second-order gradient methods impractical. For an optical

system with n variables (typically n is between 5 and 20 and often

higher), then for the same number of optical aberration function evalua-

tions a first order gradient method could take approximately n additional

iterative steps for each iterative step taken by a second order gradient

method. In papers by Feder (1957) and Brunner (1971) the second deriva-

tive calculation in the Newton method is approximated by differencing

the values of the first derivatives at each iterative step in order to

greatly reduce the computation required. Similar differencing


techniques are used in a variant of the Newton method, called the Variable Metric Method. Several references to these variable metric methods

are listed in Table 2.3. This type of differencing approximation for

second derivatives is discussed in more detail in the following chapter,

and several references are given in Table 2.2.
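The flavor of such differencing approximations can be sketched as follows (Python/NumPy; a rough illustration only, not the exact Feder, Brunner or Dilworth formulas):

    import numpy as np

    def diag_second_derivatives(grad_prev, grad_curr, dx):
        """Rough estimate of the principal-diagonal second derivatives by
        differencing the gradients from two successive iterates (in the
        spirit of the Feder/Brunner/Dilworth approximations; not their
        exact formulas)."""
        g0 = np.asarray(grad_prev, dtype=float)
        g1 = np.asarray(grad_curr, dtype=float)
        dx = np.asarray(dx, dtype=float)
        diag = np.zeros_like(dx)
        ok = np.abs(dx) > 1e-12          # skip components with a negligible step
        diag[ok] = (g1[ok] - g0[ok]) / dx[ok]
        return diag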

Since least squares optimization techniques are by their very

nature linear, when a nonlinear problem is to be optimized using a least

squares method, the optimized function is linearized by truncating the

Taylor series after the first order term. The errors introduced by

truncating the higher order series terms tend to reduce the optimization

convergence rate of least squares methods. A detailed discussion

covering the derivation of the least squares optimization method is

presented in the next chapter. Two methods of derivation are displayed

in Table 3.1. In the second approach of Table 3.1 the gradient of the

merit function is linearized rather than the aberration function itself.

This procedure introduces a second derivative term (see Eq. 2.10) into

the least squares solution in spite of the fact that the Taylor series

is truncated before the second order term

$$\Delta x = -\left(\dot f^T \dot f + \ddot f f\right)^{-1} \dot f^T f \qquad (2.10)$$

This second-order gradient term is, in general, considered troublesome

in least squares optimization methods because of the large amount of

additional computation required to calculate the second derivatives.

This term is usually neglected as adequately small; several justifications that have been developed in the literature for eliminating this

Page 36: EXTRAPOLATED LEAST SQUARES OPTIMIZATION ......4. For any illustrations that cannot be reproduced satisfactorily by xerography, photographic prints can be purchased at additional cost

Table 2.3. References for variable metric optimization methods.

Adachi (1971)

Huang and Levy (1970)

Davidon (1959, 1966, 1968)

Fletcher (1970)

Fletcher and Powell (1963)

Goldfarb (1970)

Greenstadt (1970)

Huang (1970)

Jamieson (1971)

Myers (1968)

Pearson (1969)

Rigler and Pegis (1980)

Stewart (1967)


term are referenced in Table 2.4. After the second derivative term of

Eq. (2.10) has been eliminated, the least squares solution step from the

derivation of the second approach shown in Table 3.1 becomes identical

with the conventional least squares solution step

$$\Delta x = -\left(\dot f^T \dot f\right)^{-1} \dot f^T f \qquad (2.11)$$

which is derived directly in the first approach.
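In computational form, the conventional least squares step of Eq. (2.11) amounts to solving the normal equations, as in this illustrative Python/NumPy sketch:

    import numpy as np

    def least_squares_step(f, fdot):
        """Conventional least squares solution step, Eq. (2.11):
        delta-x = -(fdot^T fdot)^(-1) fdot^T f."""
        f = np.asarray(f, dtype=float)
        return -np.linalg.solve(fdot.T @ fdot, fdot.T @ f)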

In a paper by Buchele (1968) the second derivative term is retained in the least squares solution step as given in Eq. (2.10), but

the second derivative is approximated to reduce the required computation.

Buchele suggested using only the principal diagonal elements of the full

second derivative matrix and then incorporating the second derivative

term as a damping factor in the least squares solution. In order to

further reduce the computation required when the second order gradient

term is retained in the least squares optimization method, Dilworth

(1978) uses the differencing techniques of Feder (1957) and Brunner

(1971) to approximate the principal diagonal elements of the second

derivative matrix for use in least squares problems. These approximations

to the second derivative and the use of the second derivative term in

least squares optimization are discussed further in the next chapter and

in Appendix A.

Orthonormalization in Optimization

In the optimization processes discussed above, the highly nonlinear optical merit functions are linearized so that optimization can

proceed within a nearly linear, localized domain. An additional problem


Table 2.4. References for second order terms in least squares optimization.

Buchele (1968)

Grey (1963a)

Dilworth (1978)

Faggiano (1980)

Feder (1957, 1966)

Kidger and Wynne (1967)

Powell (1965)

Wynne and Wormell (1963)


to be addressed is the high degree of interaction between the aberrations

themselves with respect to the optical system variables. Within the

linearized domain this interaction or linear dependence between the

system aberrations and the system variables poses two problems for the

optimization process. First, the functional relationship between a

system variable and the aberrations may be distributed among several

linearly dependent aberrations. Thus as some aberrations are reduced,

others may be forced to increase. Second, the linear dependence between

the aberrations causes serious numerical problems related to the ill-conditioned nature of the derivative matrix. As a means of addressing these problems in the optimization process, the coordinate system used to specify the optical problem may be orthonormalized in order to decouple the system variables. A variety of orthonormalization techniques are available, such as the Gram-Schmidt process or Householder transformations. In the transformed coordinate space, the aberrations can be corrected independently (to within the limits of the linear approximations) and the degree of functional linear dependence can be established.

The use of orthonormalization to detect linear dependencies and improve

matrix conditioning is discussed in the following section which looks at

matrix conditioning. The use of orthonormalization in gradient and

least squares methods of optimization is discussed in this section.

Conjugate Gradient Method

The inclusion of orthonormalization procedures in gradient

methods of optimization significantly improves their convergence characteristics. In the case of the optimum gradient methods as given in Eq.


(2.3), each iterative step is taken perpendicular to the local contour

tangent. This results in a zig-zagging optimization path that converges

quite slowly. When the coordinate directions are orthonormalized during

optimization using a Gram-Schmidt process, Eq. (2.3) becomes

$$\Delta x_i = -\dot\phi_i + \left(\frac{\dot\phi_i^{\,2}}{\dot\phi_{i-1}^{\,2}}\right)\Delta x_{i-1} \qquad (2.12)$$
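In computational form, an illustrative Python/NumPy sketch using the textbook Fletcher-Reeves expression, which matches the structure of Eq. (2.12), is:

    import numpy as np

    def conjugate_direction(grad_curr, grad_prev, dx_prev):
        """Conjugated search direction in the textbook Fletcher-Reeves form,
        matching the structure of Eq. (2.12):
        dx_i = -grad_i + (|grad_i|^2 / |grad_{i-1}|^2) * dx_{i-1}."""
        g1 = np.asarray(grad_curr, dtype=float)
        g0 = np.asarray(grad_prev, dtype=float)
        beta = (g1 @ g1) / (g0 @ g0)
        return -g1 + beta * np.asarray(dx_prev, dtype=float)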

Each iterative step now can be interpreted as directed along the principal axes of the elliptical contours of the linearized hyperspace.

Optimization now proceeds directly towards the solution minimum at each

iterative step and the rate of convergence is greatly increased. This

type of optimization process is called a conjugate gradient method because the term for a general rectangular orthonormalized matrix is a "conjugate" matrix (an orthonormal matrix is square). Although the orthonormalization process does require a significant amount of additional computation, the improvement in its convergence properties greatly outweighs this expense. The conjugate gradient method has been used for

optical design problems, but it has generally been discarded in favor of

least squares optimization methods. Several references to conjugate

gradient methods are listed in Table 2.5.

Orthonormalization in Least Squares Methods

An alternative approach to the usual least squares optimization

procedures was developed by Grey (1963a, 1963b) which employs orthonormal

decomposition techniques. In this procedure the least squares problem


Table 2.5. References for conjugate gradient optimization methods.

Feder (1963b)

Fletcher and Reeves (1964)

Hestenes and Stiefel (1952)

Jamieson (1971)

Powell (1962)

Rigler and Pegis (1980)


is solved with the variables decoupled, using an orthonormal transform procedure which takes advantage of the scalar-product form of the $(\dot f^T \dot f)$ coefficient term to the Δx-variable. Using a Gram-Schmidt orthonormal decomposition of $\dot f$ gives

$$F_i = \left(\dot f_i - \sum_{j<i}\langle \dot f_i, F_j\rangle F_j\right)\Big/\left\|\dot f_i - \sum_{j<i}\langle \dot f_i, F_j\rangle F_j\right\| \qquad (2.14)$$

and the associated triangular matrix S is formed that satisfies

$$\dot f = F S \qquad (2.15)$$

Since the matrix is easily manipulated and the above scalar product $(\dot f^T \dot f)$-matrix becomes diagonalized in the transformed space, this method greatly reduces the required amount of computation when attempting to solve the least squares problem of Eq. (2.13) using an orthonormal approach. Using Eq. (2.15) and the orthonormality of the F-matrix developed according to Eq. (2.14), Eq. (2.13) can be rewritten

$$(FS)^T(FS)\,\Delta x + (FS)^T f = 0 \qquad (2.16)$$

and simplified to

$$S\,\Delta x + F^T f = 0 \qquad (2.17)$$

The change in the decoupled variables, u, associated with the transformed problem is defined as

$$\Delta u = S\,\Delta x \qquad (2.18)$$


and the change in the solution vector previously given as Eq. (2.11) can

now be written as

$$\Delta x = -S^{-1} F^T f \qquad (2.19)$$
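Numerically, the same decoupled solution can be obtained from any orthonormal (QR-type) factorization of the derivative matrix; in the following illustrative Python/NumPy sketch the library QR routine simply stands in for the Gram-Schmidt process of Eq. (2.14):

    import numpy as np

    def orthonormal_least_squares_step(f, fdot):
        """Least squares step computed through an orthonormal decomposition
        fdot = F S, analogous to Eqs. (2.14)-(2.19); numpy's QR factorization
        stands in here for the Gram-Schmidt process."""
        F, S = np.linalg.qr(fdot)                    # F orthonormal, S upper triangular
        du = -(F.T @ np.asarray(f, dtype=float))     # step in the decoupled u-variables
        return np.linalg.solve(S, du)                # back-transform: dx = S^(-1) du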

The orthonormalization procedure described above proceeds columnwise; this can be interpreted as constructing the orthonormal coordinate system one coordinate at a time. Grey uses this fact to great advantage when optimizing nonlinear problems. First, the linear dependence of the individual variables can be identified and addressed during orthonormalization. Second, the individual optimization step length Δuᵢ (in the orthonormalized space) can be determined independently by a relaxation process, and the predicted error f is automatically refined as optimization progresses. This combination of procedures in Grey's method of approaching the least squares optimization problem has made it a powerful method for solving the nonlinear problems encountered in optical design. Several references for Grey's method of orthonormalization applied to the least squares problem are listed in Table 2.6.

Matrix Conditioning

The solution of an optimization problem usually involves solving

an equation of the following general form

A x = b (2.20)

for the solution vector x. If the matrix A is a square, non-singular,

well conditioned full rank matrix then the x vector can be found directly

by inverting the A-matrix, as follows:


Table 2.6. References for Grey's approach to least squares optimization.

Bauer (1965)

Cornwell et al. (1973)

Grey (1963a, 1963b)

Grey et al. (1966)

Jamieson (1971)

Pegis et al. (1966)

Rigler and Pegis (1980)

Seppala (1974)

$$x = A^{-1} b \qquad (2.21)$$

If the A-matrix is rectangular, an exact solution cannot necessarily be

found and a minimal solution is sought. To facilitate solving Eq.

(2.20) under these circumstances a generalized inverse is developed. In

the case where Eq. (2.20) is over determined, the rectangular A-matrix

has more rows than columns and the problem can be solved in the least

squares sense. That is, Eq. (2.20) is rewritten as

$$A^T A\,x = A^T b \qquad (2.22)$$

and the solution step is now found as

$$x = \left(A^T A\right)^{-1} A^T b \qquad (2.23)$$

Note that the matrix formed by the inner product $A^T A$ is square, and

thus invertible. When Eq. (2.20) is under determined, the rectangular

A-matrix has more columns than rows and a minimal solution to the problem

is found. That is, Eq. (2.20) is rewritten as

$$A\,x = A\left[A^T\left(A A^T\right)^{-1}\right] b = b \qquad (2.24)$$

and the solution step is now found as

$$x = A^T\left(A A^T\right)^{-1} b \qquad (2.25)$$

Note that the matrix formed by the inner product $A A^T$ is square, and thus

invertible; but that the dimension of this matrix is set by the number


of rows of the A-matrix, whereas the square matrix formed in the least

squares matrix above has the dimension of the columns of the A-matrix.
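Both cases can be written compactly, as in this illustrative Python/NumPy sketch of the generalized inverse solutions of Eqs. (2.22) through (2.25):

    import numpy as np

    def generalized_inverse_solution(A, b):
        """Least squares (over-determined) or minimal-norm (under-determined)
        solution of A x = b, following Eqs. (2.22)-(2.25)."""
        m, n = A.shape
        b = np.asarray(b, dtype=float)
        if m >= n:                                    # more rows than columns
            return np.linalg.solve(A.T @ A, A.T @ b)  # x = (A^T A)^(-1) A^T b
        return A.T @ np.linalg.solve(A @ A.T, b)      # x = A^T (A A^T)^(-1) b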


The generalized inverse method would fail, however, if the square

matrices formed by the inner products were poorly conditioned or

singular. This often occurs when two of the matrix columns or transform

vectors are linearly dependent. In the least squares procedure, this

problem is addressed by introducing a positive definite damping factor, κ, before inverting the matrix of the generalized inverse, that is

$$\Delta x = -\left(\dot f^T \dot f + \kappa^2 I\right)^{-1} \dot f^T f \qquad (2.26)$$
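An illustrative Python/NumPy sketch of such a damped (Levenberg-type) least squares step follows; the damping constant κ here is simply an assumed input:

    import numpy as np

    def damped_least_squares_step(f, fdot, kappa):
        """Damped (Levenberg-type) least squares step corresponding to Eq. (2.26);
        the positive definite term kappa**2 * I keeps the matrix invertible when
        fdot^T fdot is ill conditioned or singular."""
        n = fdot.shape[1]
        M = fdot.T @ fdot + (kappa ** 2) * np.eye(n)
        return -np.linalg.solve(M, fdot.T @ np.asarray(f, dtype=float))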

The use of damping for least squares problems was introduced by Levenberg (1944). Several references for damped least squares methods are listed in Table 2.7. The generalized inverse of the form in Eq. (2.25), which was suggested by Hopkins and Spencer (1962), is used for the adaptive optimization method of Glatzel (1961). Several references

for the adaptive method are listed in Table 2.8. In their approach, the

linearly dependent vectors are removed from the generalized inverse

during the inversion process.

As discussed briefly in the previous section on orthonormalization, Grey removes linear dependencies during the orthonormalization

process. The application of orthonormal decomposition and transformation

techniques to the problems of matrix conditioning and inversion has been

addressed in many papers in the general literature in the 1960's and

1970's. The requirements of a matrix inverse were redefined in a more


Table 2.7. References for least squares optimization.

Damping and Least Squares Methods

Buchele (1968), Dilworth (1978), Doyle (1972), Faggiano (1980), Feder (1957, 1963b, 1966), Hoerl and Kennard (1970a, 1970b), Hopkins and Spencer (1962), Jacoby, Kowalik, and Pizzo (1972), Jamieson (1971), Jones (1970), Kidger and Wynne (1967), Lawson and Hanson (1974), Levenberg (1944), Marquardt (1963, 1970), McCarthy (1955), Meiron (1959, 1965), Morrison (1968), Nunn and Wynne (1960), Rigler and Pegis (1980), Robb (1979), Rosen and Eldert (1954), Rosen and Chung (1956), Spencer (1963), Tabata and Ito (1975), Wynne (1959), Wynne and Wormell (1963)

Least Squares Optimization Problems and Tests

Box (1965, 1966), Cornwell and Rigler (1972), Feder (1963a), Hopkins and Feder (1963), Huber (1978), Juergens (1980), Rosenbrock (1960, 1965), Wampler (1969, 1980)

Table 2.8. References for adaptive optic optimization methods.

Glatzel (1961, 1966)

Glatzel and Wilson (1968)

Jamieson (1971)

Krautter (1970)

Rayces (1980)


generalized fashion, and the matrices satisfying these requirements were called pseudo-inverses. A variety of decomposition techniques were developed which include: modified Gram-Schmidt methods, also including iterative refinements; Givens and Householder transform methods; and singular value decomposition methods. A discussion of these

numerical procedures is not presented here, but several texts and

published papers on these topics are referenced in Table 2.9.

The Metric in Optimization

The nature of an optimization problem is that the solution is

achieved by a search or a stepwise iterative process rather than by a

single mathematical manipulation. To work towards the solution miRimum

of the merit function, the optimization process proceeds by reducing the

value of the merit function as progress is made. The optimization is

thus seen to be directed along the negative gradient of the merit

function; that is, it is directed down the slope of contours of the

hyperspace described by the merit function. In general, an optimization

problem can be written with a generalized merit function as

$\phi + \lambda(\|y\| - c)$    (2.27)

where $\|y\|$, written as

$\|y\| = x^T M x = c$, a constant,    (2.28)

is the measure of vector length with respect to the system metric, M.

The metric length is held constant as a side condition to the minimization problem, and $\lambda$ is the Lagrange multiplier. The solution to the


Table 2.9. References about matrix conditioning and manipulation techniques.

Bjorck (1967a, 1967b, 1968)

Fletcher (1968)

Forsythe, Malcolm, and Moler (1977)

Golub (1965)

Golub and Kahan (1965)

Golub and Reinsch (1970)

Lawson and Hanson (1974)

Martin and Tee (1961)

Peters and Wilkinson (1970)

Wampler (1979)


generalized problem is

$\Delta x = -M^{-1} \dot{\phi}$    (2.29)

where M is a matrix of constants defining the system metric. The system metric provides the relationship between the system variables and the merit function and establishes the measure of vector lengths, $\|y\|$, for the system. Here $\Delta y$ is the vector length of the optimization step $\Delta x$ in

the optical system with the metric M. The relationship between the

optimization step in terms of a metric as given in Eq. (2.29) and that

of the direct gradient method of Eq. (2.3) is quite simple. In this

case the metric M takes on the value of the identity matrix, I, times

the scalar K. In the second order gradient or Newton method the metric

is given by $M = \ddot{\phi}(x_0)$ from Eq. (2.9). Similar relationships may be developed for least squares optimization problems by noting that $\dot{\phi}$ is proportional to $\dot{f}^T f$ in these equations. The metric is thus seen to be the coefficient of the $(\dot{f}^T f)$ quantity. The metrics for various least squares solutions include the following: $M = (\dot{f}^T \dot{f} + \ddot{f}^T f)$ for the general least squares Eq. (2.10), $M = (\dot{f}^T \dot{f})$ for the least squares Eqs. (2.11) and (2.23), and $M = (\dot{f}^T \dot{f} + K^2 I)$ for the damped least squares Eq. (2.26).

The metric scales the coordinate system in which a particular merit function is optimized. This is particularly useful in optical design problems where the variables may be very unlike one another. This

can be seen from dimensional analysis alone. The units of an optical

element's thickness may be millimeters, whereas the units for that

element's curvature may be reciprocal millimeters, and its index of


refraction would be unitless. Just as one must select the appropriate scales for the coordinate axes of a two-dimensional graph or plot, the metric scales the hyperspace for the optimization problem. The contours

of equal merit function values can be described as a multidimensional

ellipsoid. For the unconditioned optical design problem, the axis scales

could differ greatly. In the simple two-dimensional example shown in Figure 2.1a, the major and minor axes differ substantially, giving a very

elongated ellipse. If the direct gradient method is used, with its

metric equal to the identity matrix, the direction of optimization is

the gradient or simply the normal to the tangent at that point. As seen

in Figure 2.1a, this step is not very well directed toward the solution

minimum, although it does reduce the merit function. If a new metric is

selected for this problem such as the one appropriate for the conjugate

gradient method, then the coordinate axes of the ellipse are rotated

as in Figure 2.1b to better define distances in the variable space. In

this case it can be seen in Figure 2.1b that the gradient in the rotated

(orthonormalized) system is directed toward the solution minimum (to

within a quadratic approximation of the problem). When the Newton method

is used, the solution is achieved for a problem up to a quadratic

approximation in a single step along each second-order gradient direction

with a step length of unity for each. For the two dimensional example

this metric stretches the coordinate system into a circle, scaling the

coordinate axes to produce contours of constant radius, as shown in

Figure 2.1c.
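The effect of the metric on the step direction can also be seen numerically in the following sketch (the quadratic coefficients are invented for illustration): for an elongated quadratic merit function the direct gradient step, whose metric is the identity times a scalar, is poorly directed, while the Newton step, whose metric is the second derivative matrix, reaches the minimum in a single move.

import numpy as np

# Elongated quadratic merit function: phi(x) = 0.5 * x^T H x, minimum at origin.
H = np.array([[100.0, 0.0],
              [0.0,   1.0]])      # very different axis scales
x0 = np.array([1.0, 1.0])

grad = H @ x0                      # first order gradient at x0

# Direct gradient step: metric M = K * I (K chosen small enough to descend).
K = 1.0 / 100.0
step_gradient = -K * grad          # poorly directed toward the minimum

# Newton step: metric M = second derivative matrix H.
step_newton = -np.linalg.solve(H, grad)   # lands exactly on the minimum

print(x0 + step_gradient)          # [0., 0.99]  -- barely moved along x2
print(x0 + step_newton)            # [0., 0.]    -- solution in one step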

Since the accuracy of a solution is limited by the approximations

made for the nonlinear merit function, the contours of Figure 2.1 are not


Figure 2.1. Merit function contours showing optimization solution steps taken for different system metrics.

(a) Direct gradient metric.


(b) Conjugate gradient metric (orthonormalized coordinate axes).

(c) Newton (second order gradient) metric.


exact and the optimization must continue iteratively. A variant of the

Newton method called the Variable Metric Method (Davidon, 1959) updates

the system metric as optimization progresses iteratively. This proce-

dure updates the system metric with a correction, $h_i$, by approximating

the second order gradient using differencing techniques and the readily

calculated values of the iterative step length and the system's first

order gradient. This technique greatly reduces the frequency that second

derivatives must be recalculated. The inverse metric for a particular

optimization step is given as

$M_i^{-1} = H_i = H_{i-1} + h_i$    (2.30)

where the current inverse metric $H_i$ is calculated by adding the correction $h_i$ to the inverse metric, $H_{i-1}$, from the previous iteration. Several references for variable metric methods are listed in Table 2.2.
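Davidon's particular correction term is not reproduced here, so the sketch below illustrates Eq. (2.30) with the Davidon-Fletcher-Powell form of the update, purely as an example of how $h_i$ can be built from the last step and the change in the first order gradient.

import numpy as np

def dfp_update(H_prev, dx, dg):
    """One variable-metric update H_i = H_{i-1} + h_i (DFP form, illustrative).

    H_prev : previous inverse metric H_{i-1}
    dx     : last optimization step, x_i - x_{i-1}
    dg     : change in the first order gradient over that step
    """
    Hd = H_prev @ dg
    h = np.outer(dx, dx) / (dx @ dg) - np.outer(Hd, Hd) / (dg @ Hd)
    return H_prev + h

# Illustrative data: quadratic with Hessian A, so the gradient change is A @ dx.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
H = np.eye(2)                       # start from the identity inverse metric
dx = np.array([0.5, -0.2])
dg = A @ dx
H = dfp_update(H, dx, dg)
print(H)                            # moves toward the true inverse of A

Only first order gradients and the steps already taken are needed, which is why the frequency of second derivative recalculation drops.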

In damped least squares optimization methods, the system metric

may be changed at each iterative step with the damping factor. This is

done to select the best optimization step at each iteration. A well-chosen system metric may be reused for several iterations under certain conditions. However, since the metric is not modified in order to provide the metric required for the next optimization iterative step, damped least squares methods are not considered variable metric methods.

Optimization Steplength and Direction

The many optimization techniques which are available provide a

wide variety of approaches to optimization problems. Some of the


optimization routines vary only in their numerical methods while others

differ in their search strategies. Several aspects of these search

strategies are reviewed in this section to discuss the many similarities

between the different optimization methods.

The most basic optimization strategy involves selecting the

direction in which to look for the solution minimum and then selecting

the steplength, which determines how far the optimization will progress

in that direction. For the various gradient methods such as the direct

gradient, conjugate gradient or variable metric methods, the direction

of optimization is usually derived from the basic nature of the particular

method. That is, optimization is directed along the negative gradient or

the orthonormalized basis vectors of the direct or conjugate gradient methods, respectively. The selection of the optimization steplength is developed in a somewhat ad-hoc fashion in an effort to accommodate

the approximations made when dealing with nonlinear problems. Rosen­

brock's (1960) modification to the direct gradient method is an example

of introducing an ad-hoc procedure for determining the optimization step­

length. In the optimum gradient method each successive iterative step is

taken perpendicular to the last. As a result optimization progresses in

a zig-zag fashion towards the solution minimum. In Rosenbrock's proce­

dure, each step is taken as a specified percentage of the optimum steplength.

Though each optimization step does not reduce the merit function to its

lowest value in that iterative step, the following step is not taken in

a perpendicular direction, and thus its steplength is longer and better

directed towards the solution minimum.


In the damped least squares optimization methods both the steplength and direction of each iterative least squares solution step are varied. This is accomplished by modifying the system metric with an appropriate damping factor. Several damping techniques have been developed; these include additive ($M = \dot{f}^T \dot{f} + K^2 I$), multiplicative ($M = \dot{f}^T \dot{f}(1 + K^2 I)$), and second derivative ($M = \dot{f}^T \dot{f} + K^2 \ddot{f}^T f I$) damping. The

solution step is directed toward the undamped least squares solution

when the damping factor K is small. When the damping factor K is large

and dominates, optimization would be directed toward the direct gradient

solution for additive damping, or toward the first-order gradient solu-

tion for multiplicative damping, or toward the Newton solution for the

second derivative damping. When K falls between these extreme values

the solution step is directed between the least squares solution and one

of the other gradient solutions. Also as the value of K is increased,

the steplength is further damped or shortened. Thus damped least squares

optimization directed toward the least squares solution has longer step-

lengths than those directed toward the gradient solutions. Jones (1970)

presented an algorithm which selects the appropriate size for the damping

factor of an additive damped least squares method by finding the inter-

mediate solution direction along a spiral curve between the direct

gradient solution and the least squares solution. Many other methods for

selecting the damping factor have been presented which are often of a

more ad-hoc nature. For example Marquardt (1963) presents a method in

which the damping factor is scaled based upon the success of the previous

step, and Tabata and Ito (1975) offer a different set of scale factors


for the same technique. The influence of damping was also studied in

terms of ridge traces by Hoerl and Kennard (1970a, 1970b).
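The following sketch shows the general shape of such a scheme (the additive damping step and the scale factor of ten are illustrative assumptions, not the specific constants of Marquardt or of Tabata and Ito): the damping factor is reduced after a step that lowers the merit function and increased after one that raises it.

import numpy as np

def damped_step(f, fdot, K):
    """Additive damped least squares step for damping factor K."""
    M = fdot.T @ fdot + K ** 2 * np.eye(fdot.shape[1])
    return -np.linalg.solve(M, fdot.T @ f)

def adjust_damping(merit_new, merit_old, K, scale=10.0):
    """Generic Marquardt-style rule (the factor of 10 is an assumption):
    shrink K after a successful step, grow it after a failed one."""
    return K / scale if merit_new < merit_old else K * scale

# Illustrative linear residual model f(x) = A @ x - b, so fdot = A everywhere.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 1.0, 1.0])
x, K = np.zeros(2), 1.0
for _ in range(5):
    f = A @ x - b
    dx = damped_step(f, A, K)
    f_trial = A @ (x + dx) - b
    if f_trial @ f_trial < f @ f:          # successful step: accept it
        x = x + dx
    K = adjust_damping(f_trial @ f_trial, f @ f, K)
print(x, K)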


Damping has been the most generally accepted means of controlling

the iterative step direction and length of least squares optimization

methods. There are, however, some alternative approaches. In Grey's

(1963a, 1963b) orthonormalization approach to optimization, the step

directions are established by the basis vectors during orthonormalization.

Since damping is not used to control the solution step length, maximum

and minimum step length limits are included in the optimization algorithm

in an ad-hoc fashion. Further scaling of the steplength is also included

to find the optimum steplength in each of the orthogonal directions. Robb (1979) suggested using simple scaling of a solution

steplength following a successful damped least squares step to accelerate

convergence.

In the adaptive optimization method of Glatzel (1961), Glatzel

and Wilson (1968) and Rayces (1980) the solution step direction is deter­

mined by the minimal solution to an underdetermined problem as in Eq.

(2.25). The steplength is controlled by two factors. The merit function

target values are adjusted to intermediate values which reduces the

amount of correction required for a particular iterative step. This has

an effect on the steplength in very much the same way that damping

affects least squares problems. The solution steplengths are also scaled

in a two-step procedure. First, the variable with the largest steplength is scaled to a specified fraction of its original value and all other variables are scaled proportionally. Second, the variables are then


rescaled incrementally until a minimum is found for that iteration. The

techniques which are used to control the steplength and direction of an

optimization iterative step are thus seen not to be unique to a particu­

lar optimization method. There are many scaling and damping procedures

shared among several of the optimization methods. Much of the common

ground shared by the various optimization methods consists of pragmatic strategies

required to handle difficult nonlinear optimization problems.


CHAPTER 3

EXTRAPOLATED LEAST SQUARES OPTIMIZATION

Introduction

In the optimization of a complex optical system there is no

simple analytic relationship between the lens element parameters such as

curvatures or thicknesses (the independent variables) and the system

aberrations (the dependent variables). Without such an analytic ex­

pression to describe the optical design problem, one is unable to di-

rectly predict the system configuration which minimizes the optical

aberrations. Thus one is restricted to the information obtainable in

the neighborhood of some judiciously chosen starting configuration. The

objective of optimization procedures is to utilize this local information

to efficiently select a new configuration which reduces the system aber­

rations and which upon repeated application leads to a minimum of

aberrations.

The use of such local information to work towards the optimiza­

tion minimum is the basis of the various optimization techniques dis­

cussed earlier. The motivation for the optimization techniques to be

developed in this dissertation is to substantially extend the size of the

neighborhood within which one obtains useful information for the optimi­

zation process by retaining and utilizing the history of past iterative

optimization steps in order to predict the next optimization step more

effectively.


For a particular optical system configuration, the information

calculated locally is usually the value of the dependent variable and its

first derivative with respect to the independent variables. Some methods

require the calculation of second derivatives, but for larger optical

design problems this may be a formidable computational task. In least

squares optimization methods, only the calculation of the values of the

dependent variables and their first derivatives is required.

The optimization technique to be developed here is a modification

of least squares methods. The first derivative matrices are retained

between iterations and then by simple differencing, the change in the

derivative matrix is found. This gives the approximate directional

derivative of the derivative matrix. This also provides an approximation

to the principal diagonal elements of the second derivative matrix.

These second derivative elements contain a substantial amount of new

information beyond that which is available from a localized first de­

rivative and it is information which is already available from prior

calculations. The basis of the new optimization approach is to in­

corporate these second derivative matrix elements into extrapolation

factors. These extrapolation factors are applied to the first derivative

matrix and the associated system metric in order to extend the useful

working neighborhood of each optimization iterative step. In the de­

rivation of this extrapolation approach to the least squares optimization

problem, the inclusion of second derivative factors is fundamental; the

use of only the principal diagonal elements of the second derivative

matrix is not. Using the principal diagonal of the second derivative is


just an approximation to the full second derivative which improves the

calculational efficiency of the procedure. Because of this approxi­

mation, and several others to be discussed later, there are limitations

to the useful application of this approach to the optimization problem.

Experience with this extrapolated least squares (ELS) optimization

approach has shown that, within the limitations of the approximations

made, the region over which a least squares iterative step can be taken

is greatly extended, and the convergence of the optimization problem is

greatly accelerated.

Derivation of Least Squares Optimization

In this chapter a mathematical derivation for the extrapolation

approach to least squares optimization is presented. The derivation of

conventional least squares methods is also presented in order to demon­

strate some of the basic differences in the way the least squares optimi­

zation problem is formulated. The least squares optimization problem can

be developed in four basic steps. These include: 1) defining a merit

function, $\phi$, for the problem at hand as the square of the vector of

merit factors, that is, the sum of the squares of the individual merit

factors (in lens design these are usually aberrations); 2) finding the

minimum of this merit function by differentiation ($\dot{\phi} = 0$); 3) linearizing

the problem by applying an approximating series expansion that is

truncated after the first order term so that linear algebraic techniques

can be employed to solve the problem; and 4) finally the problem, as

formulated, is solved in order to determine the location of, or at least

a step in the direction of the minimum.
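For a toy problem with two invented linear merit factors (these are stand-ins, not lens aberrations), the four steps reduce to a few lines:

import numpy as np

# Step 1: define the merit function phi = f^T f through its merit factors.
def f(x):
    # Illustrative merit factors (stand-ins for aberrations, invented here).
    return np.array([x[0] + 2.0 * x[1] - 1.0,
                     3.0 * x[0] - x[1] + 0.5])

def fdot(x):
    # First derivative matrix of the merit factors with respect to x.
    return np.array([[1.0, 2.0],
                     [3.0, -1.0]])

x0 = np.array([0.0, 0.0])
# Steps 2 and 3: setting d(phi)/dx = 2 fdot^T f = 0 and linearizing f about
# x0 gives the normal equations fdot^T fdot dx = -fdot^T f.
# Step 4: solve the linear system for the step toward the minimum.
dx = -np.linalg.solve(fdot(x0).T @ fdot(x0), fdot(x0).T @ f(x0))
print(x0 + dx)      # lands on the exact minimum because these factors are linear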


Two different approaches are taken for the derivation of the

conventional least squares problem. The difference between these ap­

proaches is found in the manner in which the problem is linearized. In

one case the vector of merit factors is linearized, while in the other

case it is the derivative of the merit function that is linearized. In

both cases the linearization is accomplished by a Taylor series expansion

truncated after the linear term. A comparison of the two methods is made

in Table 3.1. After all approximations are made, the results from the two

approaches are shown to be identical. However, the difference between

these methods of linearization leads to significantly different inter­

pretations of the formulation of the least squares problem. In the

former, the vector of merit factors is linearized. In this approach the

least squares problem is accurately defined only within a neighborhood of

the location in the parameter space where the vector of merit factors is

specified (that is, a particular lens configuration in an optical system).

This is quite different from the second case where the derivative of the

merit function is linearized. Since the derivative of the merit function

is set to zero in order to locate its minimum, in this approach the least

squares problem is accurately defined only within a neighborhood of the

solution minimum. This difference becomes apparent in the second case

if one looks into the effect of the second derivative term as if it had

been retained. As discussed in detail in Appendix A, this term can be­

come large, negative and cause the optimization process to diverge when

it is included remote from the solution minimum.

If one were to attempt to improve upon the conventional least

squares approach by simply retaining Taylor series terms to second order,


Table 3.1. First order least squares optimization derivation.

Approach 1:
  Merit Function:          $\phi = f^T f$
  Minimize:                $\dot{\phi} = 2\dot{f}^T f = 0$
  Linearize:               $f(x) \approx f(x_0) + \dot{f}(x_0)\Delta x$
  Make Substitutions:      $f(x_0) = f_0$, so $f = f_0 + \dot{f}_0 \Delta x$ and $\dot{f} = \dot{f}_0$, giving $\dot{f}_0^T [f_0 + \dot{f}_0 \Delta x] = 0$
  Find Step Length:        $\Delta x = -[\dot{f}_0^T \dot{f}_0]^{-1} \dot{f}_0^T f_0$
  Least Squares Solution:  $\Delta x = -[\dot{f}_0^T \dot{f}_0]^{-1} \dot{f}_0^T f_0$

Approach 2:
  Merit Function:          $\phi = f^T f$
  Minimize:                $\dot{\phi} = 0$
  Linearize:               $\dot{\phi} \approx \dot{\phi}(x_0) + \ddot{\phi}(x_0)\Delta x$
  Make Substitutions:      $\dot{\phi}(x_0) = \dot{\phi}_0 = 2\dot{f}_0^T f_0$ and $\ddot{\phi}_0 = 2(\dot{f}_0^T \dot{f}_0 + \ddot{f}_0^T f_0)$, giving $\dot{f}_0^T f_0 + [\dot{f}_0^T \dot{f}_0 + \ddot{f}_0^T f_0]\Delta x = 0$
  Find Step Length:        $\Delta x = -[\dot{f}_0^T \dot{f}_0 + \ddot{f}_0^T f_0]^{-1} \dot{f}_0^T f_0$
  Least Squares Solution:  $\Delta x = -[\dot{f}_0^T \dot{f}_0]^{-1} \dot{f}_0^T f_0$ (with the $\ddot{f}_0^T f_0$ term neglected)


the derivations would proceed as shown in Table 3.2. This shows that

the first approach now picks up the second derivative term ($\ddot{f}^T f$) and that both approaches now look the same up to first order (in $\Delta x$). The first grouping of second order terms is also identical in both cases, but the

first has an additional third order term while the second has an addi-

tional third derivative term. When these second and higher order terms

are included, however, one is unable to apply linear algebraic techniques

to these systems of equations; and, as it appears in this approach, one

must proceed by simply truncating the series before the second order

terms.

Derivation of the Extrapolated Least Squares (ELS) Optimization Method

An alternate approach to that taken above is developed by in-

cluding in the derivation of the least squares optimization problem the

assumption that it is an iterative process, as one would expect for non-

linear problems. It is then possible to develop a method which both

utilizes much of the available information from the iterative process as

well as includes higher order terms in the approximating series expansion.

This is accomplished by reviewing the previous iterative steps in order

to develop extrapolation factors which can be used to expand the region

over which the least squares optimization process can progress. These

extrapolation factors are calculated from the first and second derivative matrices, $\dot{f}_0$ and $\ddot{f}_0$, and the solution step ($\Delta x_0$) calculated during the previous iteration. The procedure involves rewriting the function and first derivative matrices, $f_1$ and $\dot{f}_1$, for the new solution position, $x_1 = x_0 + \Delta x_0$,


Table 3.2. Second order least squares optimization derivation.

Approach 1:
  Merit Function:        $\phi = f^T f$
  Minimize:              $\dot{\phi} = 2\dot{f}^T f = 0$
  Linearize:             $f(x) \approx f(x_0) + \dot{f}(x_0)\Delta x + \tfrac{1}{2}\ddot{f}(x_0)\Delta x^2$
  Make Substitutions:    $f(x_0) = f_0$, so $f = f_0 + \dot{f}_0 \Delta x + \tfrac{1}{2}\ddot{f}_0 \Delta x^2$ and $\dot{f} = \dot{f}_0 + \ddot{f}_0 \Delta x$, giving $(\dot{f}_0 + \ddot{f}_0 \Delta x)^T (f_0 + \dot{f}_0 \Delta x + \tfrac{1}{2}\ddot{f}_0 \Delta x^2) = 0$
  Collect Terms:         $\dot{f}_0^T f_0 + (\dot{f}_0^T \dot{f}_0 + \ddot{f}_0^T f_0)\Delta x + \tfrac{3}{2}\ddot{f}_0^T \dot{f}_0 \Delta x^2 + \tfrac{1}{2}\ddot{f}_0^T \ddot{f}_0 \Delta x^3 = 0$

Approach 2:
  Merit Function:        $\phi = f^T f$
  Minimize:              $\dot{\phi} = 0$
  Linearize:             $\dot{\phi} \approx \dot{\phi}(x_0) + \ddot{\phi}(x_0)\Delta x + \tfrac{1}{2}\dddot{\phi}(x_0)\Delta x^2$
  Make Substitutions:    $\dot{\phi}(x_0) = \dot{\phi}_0$, with $\dot{\phi}_0 = 2\dot{f}_0^T f_0$, $\ddot{\phi}_0 = 2(\dot{f}_0^T \dot{f}_0 + \ddot{f}_0^T f_0)$, and $\dddot{\phi}_0 = 2(3\ddot{f}_0^T \dot{f}_0 + \dddot{f}_0^T f_0)$
  Collect Terms:         $\dot{f}_0^T f_0 + (\dot{f}_0^T \dot{f}_0 + \ddot{f}_0^T f_0)\Delta x + \tfrac{1}{2}(3\ddot{f}_0^T \dot{f}_0 + \dddot{f}_0^T f_0)\Delta x^2 = 0$


using a Taylor series approximation to second order. The new function

and derivative matrices are now written as

$f_1 = f(x_0) + \dot{f}(x_0)\Delta x_0 + \tfrac{1}{2}\ddot{f}(x_0)\Delta x_0^2 + \cdots$    (3.1)

$\dot{f}_1 = \dot{f}(x_0) + \ddot{f}(x_0)\Delta x_0 + \cdots$    (3.2)

where the $\Delta x_0$-step is a specific vector quantity determined from the previous iteration. Note that $\Delta x_0$ is a known quantity which now

serves as a constant for the purposes of calculating the extrapolation

factors. These new matrices are now substituted into the conventional

least squares solution equations in order to refine their description of

the system following the last iterative step. Thus $f_1$ and $\dot{f}_1$ from Eqs. (3.1) and (3.2) are substituted into

$\dot{f}_0^T [f_0 + \dot{f}_0 \Delta x] = 0$    (3.3)

for $f_0$ and $\dot{f}_0$ of Table 3.1. This gives

$\dot{f}_1^T [f_1 + \dot{f}_1 \Delta x] = 0 .$    (3.4)

Note that $\Delta x$ is a variable that is yet to be determined by the optimization process, while $\Delta x_0$ is a fixed quantity determined from the last optimization step. The products of Eq. (3.4) are expanded in Eq. (3.5) to show the terms collected with respect to the powers of the $\Delta x$'s (with both $\Delta x$ and $\Delta x_0$ combined), and in Eq. (3.6) to show the terms collected solely with respect to the variable $\Delta x$:


$\dot{f}_0^T f_0$    (Zeroth Order Term)

$+\,[\dot{f}_0^T \dot{f}_0 \Delta x_0 + \ddot{f}_0^T f_0 \Delta x_0 + \dot{f}_0^T \dot{f}_0 \Delta x]$    (First Order Terms)

$+\,[\tfrac{1}{2}\dot{f}_0^T \ddot{f}_0 \Delta x_0^2 + \ddot{f}_0^T \dot{f}_0 \Delta x_0^2 + (\dot{f}_0^T \ddot{f}_0 + \ddot{f}_0^T \dot{f}_0)\Delta x_0 \Delta x]$    (Second Order Terms)

$+\,[\tfrac{1}{2}\ddot{f}_0^T \ddot{f}_0 \Delta x_0^3 + \ddot{f}_0^T \ddot{f}_0 \Delta x_0^2 \Delta x] = 0$    (Third Order Terms)    (3.5)

$\{\dot{f}_0^T f_0 + (\dot{f}_0^T \dot{f}_0 + \ddot{f}_0^T f_0)\Delta x_0 + (\tfrac{1}{2}\dot{f}_0^T \ddot{f}_0 + \ddot{f}_0^T \dot{f}_0)\Delta x_0^2 + \tfrac{1}{2}\ddot{f}_0^T \ddot{f}_0 \Delta x_0^3\}$

$+\,[\dot{f}_0^T \dot{f}_0 + (\dot{f}_0^T \ddot{f}_0 + \ddot{f}_0^T \dot{f}_0)\Delta x_0 + \ddot{f}_0^T \ddot{f}_0 \Delta x_0^2]\,\Delta x = 0 .$    (3.6)

A term by term comparison of Eq. (3.5) to the second order derivation of

Table 3.2 shows similar as well as dissimilar terms for each derivation.

The form of Eq. (3.6) shows the suitability of a solution in terms of

the variable $\Delta x$. However, collecting terms within the Taylor series

follows quite naturally during computation, and Eq. (3.4) is rewritten

more simply as

$\dot{f}_1^T f_1 + \dot{f}_1^T \dot{f}_1 \Delta x = 0 .$    (3.7)

The iterative solution step is then calculated by

$\Delta x = -[\dot{f}_1^T \dot{f}_1]^{-1} \dot{f}_1^T f_1 .$    (3.8)

The precise similarity between this extrapolated solution with its

imbedded higher order terms and the classical solution shown in Table

3.1 is clear. A similar derivation could be developed for the second


approach used in Table 3.1; however, it should be noted that two Taylor series approximations would be required: one to approximate the merit function derivative, $\dot{\phi}$, near the solution, and the other to approximate the merit factor vector, f, for extrapolation.
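Expressed in code, the extrapolated step of Eqs. (3.1), (3.2), (3.7), and (3.8) amounts to a few array operations. The sketch below is an illustrative reading of the derivation (the function name els_step and the array shapes are assumptions), not the program used in the dissertation.

import numpy as np

def els_step(f0, fdot0, fddot0, dx0):
    """One extrapolated least squares (ELS) step (illustrative sketch).

    f0     : merit factor vector at the last full iteration          (m,)
    fdot0  : first derivative matrix at that iteration               (m, n)
    fddot0 : principal-diagonal second derivative approximation      (m, n)
    dx0    : solution step already taken from that iteration         (n,)
    """
    # Eq. (3.1): extrapolate the merit factor vector to the new position.
    f1 = f0 + fdot0 @ dx0 + 0.5 * fddot0 @ (dx0 ** 2)
    # Eq. (3.2): extrapolate the first derivative matrix (column j by dx0[j]).
    fdot1 = fdot0 + fddot0 * dx0
    # Eqs. (3.7)-(3.8): reuse the ordinary least squares solution step.
    dx = -np.linalg.solve(fdot1.T @ fdot1, fdot1.T @ f1)
    return dx, f1, fdot1

Each such step reuses the saved matrices and requires only the solution of the least squares system, which is the source of the computational savings examined in the examples that follow.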

The numerical methods used to calculate the solution to the

iterative step $\Delta x$ for the conventional least squares problem as given in

Eq. (3.3) (and developed in Table 3.1), and for the extrapolated least

squares problem as given in Eq. (3.7) are identical. The essential dif­

ference between these two techniques is that the former requires the

calculation of a new derivative matrix at each iterative step while the

latter does not. Eliminating or reducing the need to recalculate the

derivative matrix at each iteration greatly reduces the computational

requirements of a least squares optimization problem. This is very often

the case in optical design problems because there are large numbers of

system variables and aberrations where each one often requires the

tracing of several rays through each of the many surfaces of a complex

optical system. The questions to be addressed then are: how effective

is the extrapolation technique compared to a conventional iterative

optimization step; and how much less computation is required? To answer

these questions requires that a closer look be taken at the approximations made and at the types of problems which are to be solved. Both the effectiveness and efficiency of the extrapolation techniques vary

with the problem at hand. A variety of problems are presented below

which exemplify specific features, advantages and limitations of the

extrapolated least squares optimization technique. As is demonstrated,


the various approximations incorporated into the development of the extrapolation factors result in specific restrictions which limit the

effectiveness of the approach. Within the scope of these limitations,

the extrapolation optimization method is shown to perform with high

efficiency, and simple strategies are developed to identify limiting

situations which would cause stagnation of the optimization progress.

Approximations - Analyzed and Discussed with Examples

The inclusion of approximations in the development of the con­

ventional as well as the extrapolation method of the least squares optimi­

zation problem introduces restrictions to the optimization procedures.

These restrictions occur where the validity of the approximations begins to break down. As the approximations fail, the behavior of the

optimization process becomes unpredictable. This usually results in the

stagnation of the optimization process and progress towards the solution

minimum is halted. At this point a new optimization iteration is

initiated where the approximations are renewed. In order to demonstrate

and test various features of the optimization processes, several poly­

nomial functions and optical systems were selected as optimization

examples. The polynomial functions are selected so that specific

features of the optimization procedure are tested and displayed in

isolation. These polynomial examples are presented next. In the optical

design problems, isolating specific optimization features is more diffi­

cult and these are discussed in Chapter 4.


Conventional Least Squares Methods - Linear Approximations


A problem was presented by Rosenbrock (1960) that has been used

for comparison in the general literature as a test problem for many

optimization methods. The polynomial function describes a parabolic

valley with steep walls but with only a gradual slope along its bottom

to a minimum. Starting at one end of the parabolic valley one proceeds,

stepping iteratively, along the contoured space towards the minimum. A

conventional damped least squares solution to this problem is shown in

Figure 3.1. Here about twelve linear or straight line steps are taken

along the contour forming a segmented path to the minimum. Each line

segment represents an iterative step, each requiring the recalculation

of the derivative matrix and the calculation of the least squares solu-

tion, as in Eq. (3.3). The length of each line segment represents the

region over which the linearization approximation holds. Beyond this

length the merit function diverges as the parabolic contour bends away

from the linear approximating line. A variety of accelerating tech-

niques have been developed to reduce the number of iterations that require recalculation of the first derivative matrices in order to reach a

minimum. One method reuses the derivative matrix from the previous

optimization step to resolve the least squares problem (Robb, 1979). As

is shown in Figure 3.2, this is useful in fairly linear regions where the

bend in the contours is only slight. Where the curvature of the contours

is stronger, this method fails. Here the number of iterations that re-

quire that the derivative matrix be recalculated is now reduced to six.
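For reference, the Rosenbrock valley can be posed as a pair of merit factors whose squares sum to the familiar test function 100(x2 - x1^2)^2 + (1 - x1)^2 (the constants and the starting point below are the common textbook choices and are assumed here, not quoted from the figures), after which one conventional damped least squares step is a few lines of NumPy:

import numpy as np

def rosenbrock_factors(x):
    """Merit factors whose squares sum to 100*(x2 - x1^2)^2 + (1 - x1)^2."""
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])

def rosenbrock_jacobian(x):
    """First derivative matrix of the merit factors."""
    return np.array([[-20.0 * x[0], 10.0],
                     [-1.0, 0.0]])

x = np.array([-1.2, 1.0])                 # assumed starting point
K = 1.0                                   # damping factor
f, J = rosenbrock_factors(x), rosenbrock_jacobian(x)
dx = -np.linalg.solve(J.T @ J + K ** 2 * np.eye(2), J.T @ f)
print(x + dx)                             # one conventional damped step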

Figure 3.1. Rosenbrock parabolic valley problem (2nd order) solved using damped least squares optimization.

Figure 3.2. Rosenbrock parabolic valley problem (2nd order) solved using damped least squares optimization with linear extension of the reused metric type.


In the method suggested by Robb (1979) the solution vector from the pre-

vious iteration is simply scaled. In Figure 3.3 this method is also

shown to work well in the nearly linear regions and to fail in the

regions of greater nonlinearity or stronger curvature. Here, the minimum

is reached with only about eight iterations that require the derivatives

to be recalculated. There are also many schemes that have been developed

to accelerate convergence which use different methods of selecting

damping factors. These methods only affect the direction and size of a

particular iterative step. The two methods discussed above are attempts

to extend the region of solution beyond a particular optimization step

in order to continue optimization without recalculating the derivative

matrix. In several of the examples which follow, these linear extension

techniques (Robb, 1979) are included for problems using both conventional

and extrapolated least squares optimization methods. These examples

are included to provide clear demonstrations of the limits of conven-

tional least squares approaches to optimization, even when linear ex-

tension techniques or sub-iterations are employed. The advantages of

the extrapolated least squares optimization method, which includes second

order terms, over these conventional techniques are thus clearly demon-

strated by contrast.

Extrapolated Least Squares Method - Compared to Conventional Least Squares Methods

When the extrapolation method of optimization is applied as in-

dicated in Eq. (3.8), the Rosenbrock problem is solved with just two

iterations which require the recomputation of the first derivative

Figure 3.3. Rosenbrock parabolic valley problem (2nd order) solved using damped least squares optimization with linear extension of the (Robb) step length scaling type.


matrices, as shown in Figure 3.4. As can be seen in this figure, the

iterative steps of the extrapolation method look about the same as those

of the conventional least squares approach of Figure 3.1. The extrapo-

lated least squares steps, however, do not require the re-evaluation of

the derivative matrix but only the resolving of the least squares problem

of Eq. (3.8). With this extrapolation approach there are two full

iterative steps followed by eleven of the more economical extrapolation

steps. When the reused derivative matrix linear extension technique is

also included with the extrapolation method, the number of extrapolated

iterative steps is seen to be reduced from eleven to five in Figure 3.5.

Second Derivative Approximations. If the second derivative had

been calculated in the first iteration along with the first derivative,

the problem would then have only required one full iteration in which

the first derivative matrix also needed to be calculated. The two

iterations were required here to approximate the second derivative matrix

by an approximation to its principal diagonal. This approximation is

made by finding the difference between the first derivative matrices of

the previous two iterations and then dividing, term by term, with the

solution vector ($\Delta x$) elements:

$\dfrac{\partial^2 f_i}{\partial x_j\,\partial x_j} \;\approx\; \dfrac{\dot{f}_{ij} - \dot{f}_{ij}'}{\Delta x_j}$    (3.9)

where $\dot{f}_{ij}$ and $\dot{f}_{ij}'$ are corresponding elements of the first derivative matrices from the current and previous iterations.
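In code, the differencing of Eq. (3.9) is a single elementwise operation on matrices that are already available (the guard against very small step components is an added assumption):

import numpy as np

def diagonal_second_derivative(fdot_new, fdot_old, dx, eps=1e-12):
    """Approximate d^2 f_i / dx_j^2 from two saved first derivative matrices.

    Each column j of the difference is divided by the j-th element of the
    last solution step, as in Eq. (3.9); tiny steps are guarded against.
    """
    dx_safe = np.where(np.abs(dx) > eps, dx, eps)
    return (fdot_new - fdot_old) / dx_safe      # broadcasts over columns

# Illustrative 2-aberration, 2-variable example.
fdot_old = np.array([[1.0, 0.5], [0.2, 2.0]])
fdot_new = np.array([[1.3, 0.4], [0.2, 2.6]])
dx = np.array([0.1, -0.2])
print(diagonal_second_derivative(fdot_new, fdot_old, dx))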

A second derivative calculated in this fashion involves three approxi-

mating steps. The first two approximations arise from the differencing

of the first derivative matrix by using a difference step along the

Figure 3.4. Rosenbrock parabolic valley problem (2nd order) solved using ELS extrapolated damped least squares optimization.

Figure 3.5. Rosenbrock parabolic valley problem (2nd order) solved using ELS extrapolated damped least squares optimization with linear extension of the reused metric type.


direction of optimization. A finite sized difference step ($\Delta x$) is used instead of an infinitesimal differential ($\partial/\partial x$). This approximation

becomes weaker the more nonlinear the problem. Also, since this differ­

ence is taken in the direction of the last optimization step rather than

along the coordinate axes, the quantity which is actually calculated is

a projection onto the coordinate axes of the directional derivative of

the first derivative matrix. This approximation loses accuracy when the

off diagonal or mixed partial second derivatives become large. Finally,

by using only these approximate diagonal elements of the second derivative

matrix, all the off diagonal elements are effectively set equal to zero.

This approximation also begins to fail as the off diagonal elements

become large. However, where these approximations do hold this second

derivative approximation provides great computational economy. As is

shown in Figure 3.6, the number of computations required for calculating

the second derivative for an optical system with m aberrations and n variables is only (m × n) when using the principal diagonal approximation as opposed to (m × n × n) when the full second derivative is used. This

amounts to 300 derivative calculations as compared to 4500 calculations

for a system with 20 aberrations and 15 variables. Also since these

principal diagonal elements are found by simple differencing of pre­

viously calculated quantities, only a single subtraction and division is

required for each term. There is no need here to recompute any quanti­

ties from the actual optical system, so that all ray tracing and aber­

ration computation is avoided.

In the Rosenbrock problem, the off diagonal matrix elements are

zero due to the symmetry of the function used. The use of the principal

Figure 3.6. Second derivative matrix of merit factors and the approximating matrix of principal diagonal elements.


diagonal is therefore an exact representation of the second derivative

in this specific case. For a function with a tilted parabolic contour,

the mixed second partial derivatives of the off diagonal matrix elements

would not be zero. This is the case for the problem suggested by

Buchele in order to demonstrate the effect of large off diagonal second

derivative matrix terms. As is shown in Figures 3.7 and 3.8 the solution

to the Buchele problem is not achieved in just two iterations when using

the extrapolation method as was the case for the symmetric Rosenbrock

problem. This problem requires six iterations that recalculate the

first derivative matrices. This however is still far fewer than the 24

iterations required by the conventional least squares approach (or the

19 iterations required by Buchele).

The Buchele problem was resolved using full, exact second

derivatives in the extrapolation procedure in order to verify that the

derivative approximation was the sole cause for the additional optimiza­

tion iterations. With this as the only modification to the extrapolation

optimization procedure, the Buchele problem was solved with only two

iterations requiring the recalculation of the first derivative matrices.

The full second derivative matrix and the matrix of the principal diagonal

elements are presented in Figure 3.9 for the Rosenbrock and Buchele

problems. Exact and approximated matrices are presented for the Buchele

problem in Figures 9c and 9d, respectively. In the approximated second

derivative, note the errors in the values of the principal diagonal

terms as well as the erroneous setting to zero of off diagonal terms.

The Buchele problem demonstrates how the approximations involved in

Figure 3.7. Buchele tilted parabolic valley problem (2nd order) solved using damped least squares optimization.

Figure 3.8. Buchele tilted parabolic valley problem (2nd order) solved using ELS extrapolated damped least squares optimization.

Figure 3.9. The full second derivative matrix and the matrix of its principal diagonal are shown for the Rosenbrock and Buchele example problems.

(a) The general matrix for the problems.
(b) Matrix elements for the Rosenbrock problem.
(c) The exact calculation of matrix elements for the Buchele problem.
(d) The matrix elements as calculated using differencing techniques for the Buchele problem.


calculating the second derivative do reduce the effectiveness of the

extrapolation optimization method, but that the method does continue to

substantially accelerate convergence to the solution minimum.

Quadratic Approximation. Achieving the solution to a problem in

two or perhaps one iteration with the calculation of the first derivative

matrix as was demonstrated for the Rosenbrock problem is not generally

possible. That is, just as the least squares optimization method finds

the minimum to a linear problem in a single step, the extrapolated least

squares optimization method can solve a quadratic optimization problem

with a single computation of the first and second derivative matrices.

Though the first and second derivatives are reapplied in an iterative sequence of least squares steps towards the optimization minimum in the extrapolated least squares optimization approach, they need not be

recalculated as long as the approximations that are made continue to

hold during the optimization process. This was demonstrated in the

Buchele problem for the second derivative principal diagonal approxi­

mations.

If the symmetry of the Rosenbrock problem is retained, but the

nonlinearity of the function defining the U-shaped valley is increased,

then the approximation made by truncating the Taylor series in the

extrapolation factors at the second order term becomes the limiting

approximation. These higher order effects upon the extrapolation

optimization process are seen in Figures 3.10 through 3.16 where fourth

and eighth order functions are optimized. Here the problem minima are

found for the fourth order problem not in one or two iterations

Figure 3.10. Rosenbrock parabolic valley problem (4th order) solved using damped least squares optimization.

Figure 3.11. Rosenbrock parabolic valley problem (4th order) solved using damped least squares optimization with linear extension of the reused metric type.

Figure 3.12. Rosenbrock parabolic valley problem (4th order) solved using damped least squares optimization with linear extension of the (Robb) step length scaling type.

Figure 3.13. Rosenbrock parabolic valley problem (4th order) solved using ELS extrapolated damped least squares optimization.

Figure 3.14. Rosenbrock parabolic valley problem (4th order) solved using ELS extrapolated damped least squares optimization with linear extension of the reused metric type.

Figure 3.15. Rosenbrock parabolic valley problem (8th order) solved using damped least squares optimization with linear extension of the reused metric type.

Figure 3.16. Rosenbrock parabolic valley problem (8th order) solved using ELS extrapolated damped least squares optimization with linear extension of the reused metric type. (The figure legend marks conventional DLS, ELS, and linear extension steps, and the point at which the ELS steps stagnate.)


(requiring derivative matrix recalculations) but in five and three itera-

tions (Figures 3.13 and 3.14) using the extrapolation and extrapolation

plus linear extension methods, respectively. This compares with thirty

and fourteen iterations required when using the conventional least squares

optimization methods shown in Figures 3.10 and 3.11 without and with

linear extension techniques. Figures 3.15 and 3.16 show that the eighth

order problem is solved in eighteen iterations when conventional least

squares methods are used, whereas only seven iterations are required when

the extrapolation technique is applied to the problem.

The Least Squares Method's Second Derivative Term

The inclusion of the second derivative term ($\ddot{f}^T f$) as shown in the

second approach of Table 3.1 in the least squares optimization problem was

considered by Buchele (1968) and Dilworth (1978). Both use the principal

diagonal of the second derivative matrix as an economical means of com-

puting an approximate second derivative in order to arrive at damping

factors which are added to the diagonal elements of the system metric.

Buchele called this approach "second derivative diagonal damping" while

Dilworth called it the "pseudo-second derivative" (PSD) method (because

of the differencing approximation used for calculating the principal

diagonal of the second derivative). Buchele found little advantage in

this method of damping when applied to his test problem (see Figure 3.7),

whereas Dilworth found it to significantly accelerate convergence for

several optical problems. In the examples considered above this second

derivative term was ignored as it generally is in least squares optimiza-

tion problems. In this section the Rosenbrock and Buchele problems will


be solved using the pseudo-second derivative method of Dilworth. Figure

3.17 shows the PSD solution to the Rosenbrock problem. There are three

features of the PSD solution that are easily noted on this figure.

First, the PSD method is seen to make sudden changes in its direction at

some points during the optimization progression. This appears to indi­

cate that this method has a high degree of sensitivity to local gradients.

This brings in the second point which is that the PSD method more rapidly

goes to the lower contour levels of the solution space, that is, the

merit function is decreased more rapidly. The third point is that

although the PSD method does find its way more rapidly to lower merit

function values in the parameter space, Figure 3.17 shows that the some­

times zig-zag stepping pattern actually increases the number of total

steps required in order to move through the solution space to reach the

solution minimum. This is somewhat reminiscent of the convergence

pattern experienced with direct gradient optimization methods. This at

least was the case for the Rosenbrock problem where 18 iterations were

required as opposed to 12 for conventional damping as shown in Figure 3.1.

When the extrapolation method was incorporated into the problem, the

solution was again attained in just two iterations that required the re­

calculation of the first derivative matrix plus twelve extrapolated steps

(see Figure 3.18).
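This use of the second derivative term can be sketched generically (the scale factor K and the exact placement of the approximate second derivative contribution on the metric diagonal are assumptions about the general idea, not the particular formulas of Buchele or Dilworth):

import numpy as np

def psd_damped_step(f, fdot, fddot_diag, K=1.0):
    """Least squares step with second-derivative diagonal damping (illustrative).

    fddot_diag holds approximate d^2 f_i / dx_j^2 elements (m x n); the
    vector fddot_diag.T @ f supplies the damping placed on the metric diagonal.
    """
    M = fdot.T @ fdot + K ** 2 * np.diag(fddot_diag.T @ f)
    return -np.linalg.solve(M, fdot.T @ f)

Note that the added diagonal entries can become negative far from the minimum, which is the failure mode discussed below for the Buchele problem.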

When the Buchele problem was attempted much of the behavior of

the PSD method as discussed above is seen again in Figure 3.19, however,

this time the problem stagnated and progress was halted. This was caused

by the failure of the approximating technique of the PSD method. Due to

Figure 3.17. Rosenbrock parabolic valley problem (2nd order) solved using damped least squares optimization with exact and pseudo second derivative principal diagonal damping factors.

Figure 3.18. Rosenbrock parabolic valley problem (2nd order) solved using ELS extrapolated damped least squares optimization with pseudo second derivative damping factors.

Figure 3.19. Buchele tilted parabolic valley problem (2nd order) solved using damped least squares optimization with exact and pseudo second derivative principal diagonal damping factors.


the large mixed partial derivatives inherent in the Buchele problem, the

principal diagonal becomes poorly approximated and as discussed above in

detail, when it becomes negative, divergence becomes unavoidable. When

the problem was successfully restarted at this point with conventional

damping on the first iteration the PSD method was again able to proceed

to a solution. Also shown in this figure is the problem solved with

exact second derivative diagonal damping as suggested by Buchele. In

this case stagnation did not occur. Although some differences are seen

in the optimization paths of the exact and PSD method of second derivative

diagonal damping, they both are seen to proceed very closely along the

same optimization routes.

One additional comment is in order concerning least squares

optimization damping using the second derivative term. In general it

was found that the PSD method did achieve the solution more rapidly than

other damping methods when the PSD method was employed in a region

actually near the solution minimum. In the Rosenbrock problem for

instance, though several additional PSD steps were necessary to reach

the solution, in the region near the solution, the PSD method reached

lower merit function values much more quickly. This was also the case

in the Buchele problem where the PSD method failed about midway to the

solution, but after restarting the problem, it actually provided superior

convergence rates in the final optimization steps nearer to the solution

minimum. Based upon the discussion presented earlier about the behavior

of convergence with respect to the $\ddot{f}^T f$ term, this is just what would be

expected when this second derivative term is included in the least squares

optimization procedure.


Modifications to Extrapolated Least Squares Methods

The results obtainable from the application of extrapolation

techniques to least squares optimization problems are not exactly predictable because of the many approximations incorporated into the proce-

dure, as discussed above. Thus one would expect that the effectiveness

of the extrapolation approach would vary with the particular problem

under consideration. In order to tailor the extrapolation technique to

most effectively solve a particular problem, there are modifications

which can be incorporated into the methods of application of the extrapolation optimization approach. One modification that was found to be very effective was to limit the application of extrapolation to the first

derivative matrix only. Rather than approximate the function as (f1)

from Eq. (3.1), it is actually recalculated at each iterative step as

(f[X1]). This also allows the merit function (φ) to be evaluated exactly

at each iterative step. Mathematically this modifies Eqs. (3.7) and (3.8)

which now appear as follows:

0 = A1ᵀ f[X1] + (A1ᵀ A1) Δx                                        (3.10)

and

Δx = -(A1ᵀ A1)⁻¹ A1ᵀ f[X1]                                         (3.11)

Though this modification does require some additional computation, it

provides two important benefits. First, since the merit function can be

evaluated exactly, optimization divergence is detected precisely and

unquestionably, even as the extrapolation approximations begin to fail.


Second, the evaluation of the optimization step (Δx) as given in Eq. (3.11) is more accurate when the merit factors are evaluated exactly as f[X1] at each step rather than as in Eq. (3.8). This can be seen

pictorially in Figure 3.20 where the linear extension, extrapolated and

modified extrapolated methods of least squares optimization techniques

are compared. In Figure 3.20b it is seen that with the linear extension

technique the direction of the optimization step is unchanged and that

progress soon stagnates. In Figure 3.20c the extrapolation method step

direction is modified as optimization progresses but the values of the

merit factors are seen to be approximated only quadratically and thus

do not exactly follow the actual function's contour. In Figure 3.20d

the modified extrapolation approach is shown to provide a tighter fit to

the actual function contour as optimization progresses. For problems in

which the second derivative approximations tend to be weak, this modifi-

cation is found to add a significant degree of stability to the extra-

polated optimization process. The basic extrapolation approach can be modified in any of a number of ways in order to best suit a particular problem or a general approach; the goal here is to achieve computational efficiency as well as ease of application of the approach.
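A minimal sketch of this modified approach is given below, assuming a generic least squares problem and an undamped solution step (both simplifications not taken from the text). The pseudo-second-derivative array has one entry per element of the derivative matrix, and the merit factors are recomputed exactly at every sub-step so that divergence is detected unambiguously; the function names are illustrative placeholders.

import numpy as np

def modified_els(f, jac, A2, x0, n_sub=5):
    """Sketch of ELS sub-iterations with derivative-matrix-only extrapolation.

    f   : callable returning the merit factor vector at x        (m,)
    jac : callable returning the first derivative matrix at x    (m, n)
    A2  : approximate second derivatives, one per Jacobian entry (m, n)
    """
    x = np.asarray(x0, dtype=float)
    A = jac(x)                          # costly full derivative matrix, once
    phi = np.sum(f(x) ** 2)             # exact merit function
    for _ in range(1 + n_sub):
        r = f(x)                        # merit factors recomputed exactly
        dx, *_ = np.linalg.lstsq(A, -r, rcond=None)   # least squares step
        x_trial = x + dx
        phi_trial = np.sum(f(x_trial) ** 2)
        if phi_trial >= phi:            # exact divergence test: stop and
            break                       # fall back to a full iteration
        A = A + A2 * dx[None, :]        # extrapolate the derivative matrix only
        x, phi = x_trial, phi_trial
    return x, phi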

Computational Savings for the Extrapolated Least Squares Optimization Method

The use of the extrapolated least squares (ELS) optimization

method reduces the amount of computation required to perform a least

squares type of optimization. The computational savings arise from the

replacement of direct calculations of optical system parameters such as


Figure 3.20. A two dimensional example showing the optimization path length for several least squares optimization techniques. (a) Conventional damped least squares optimization. (b) Conventional damped least squares optimization with linear extension. (c) ELS extrapolated damped least squares optimization. (d) Modified ELS extrapolated damped least squares optimization (derivative matrix extrapolation only). (Legend: contour minimum; contour of ELS extrapolated minimum; conventional DLS iteration step; linear extension step; ELS iterative step.)


its merit function or the derivative matrix with the extrapolated values

for these system parameters. To determine the improved efficiency of

the ELS optimization method over the conventional least squares methods

involves looking at two factors. First, one must look at the difference

in the amount of computation that is required between extrapolating and

directly calculating the values of the system parameters. The amount of

this difference depends upon the type of merit function used for a

particular problem. That is, for larger optical design problems that

have many lens surfaces and system variables or that have an extensive

merit function which requires numerous ray traces and aberration calcu­

lations, the computational savings using the ELS method will be greater

than for smaller simpler problems.

The second factor to be considered is the amount of progress

that is achieved with an ELS iterative step relative to a conventional

least squares iterative step. The point here is to note that the least

squares equations must be solved to determine each iterative solution

step for either the ELS or the conventional least squares optimization

approaches. It is thus important to know what fraction of time of an

iterative optimization cycle is spent computing system parameters as

opposed to solving the least squares system of equations. This ratio is

different from one optimization program to another because of the differ­

ent numerical methods used to solve the least squares equations and to

select damping factors. Also the relative efficiency of each conventional

or ELS solution step calculation is important because the approximations

used in the ELS method may reduce the effectiveness of the optimization


procedure. As an example consider a conventional damped least squares

optimization cycle which requires 50% of the time to compute the system

parameters and 50% to solve the system equations. Then, if the ELS

method were to reduce the time required to compute the system parameters

to a negligible amount, and if the approximations used in the ELS proce­

dure do not reduce its effectiveness, then two ELS iterative steps could

be taken for each conventional least squares step. In a case where

calculating the first derivative matrix requires five times more time

than for finding the solution step, then five additional ELS iterative

steps could be taken for each conventional least squares step. As the

effectiveness of the ELS procedure is reduced by the approximations made

in this method, the amount of savings gained by the ELS procedure is

reduced. In order to determine the computational savings gained by

employing the ELS optimization method, the numerical procedures of the

specific optical design computer program as well as the specific kind of

optical design problem must be considered.
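The arithmetic of this trade-off can be written out as a small illustrative estimate; the 50/50 and 5:1 numbers below are the examples from the preceding paragraph, and the effectiveness factor is an assumed scalar, not a measured quantity.

def els_steps_per_conventional_cycle(p, eff=1.0):
    """Estimated ELS solution steps affordable in the time of one conventional
    DLS cycle.  p is the fraction of that cycle spent computing system
    parameters (ray traces, derivative matrix); eff is an assumed relative
    effectiveness of an ELS step (1.0 means no loss from the approximations).
    """
    return eff / (1.0 - p)

print(els_steps_per_conventional_cycle(0.5))      # 2.0  (the 50/50 example)
print(els_steps_per_conventional_cycle(5.0 / 6))  # ~6.0 (derivative matrix five
                                                  #  times costlier than the solve)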


CHAPTER 4

LENS DESIGN WITH EXTRAPOLATED LEAST SQUARES OPTIMIZATION

Introduction

The extrapolated least squares (ELS) optimization techniques de-

veloped in the previous chapter are now applied to optical systems. In

general, optimization of optical design problems is a formidable task

because the optical systems are usually described by a large number of

highly nonlinear relationships that involve a large number of system

variables. Part of the difficulty in effectively developing an optical

design strategy is the inability to meaningfully display and visualize

the significant relationships for the problem at hand. In the previous

chapter several contour plots were provided to graphically display the

progress of the optimization process. Those examples were restricted to

involve only two independent variables so that they could be plotted as

two dimensional contours. For the many variable optical problem, its

associated multidimensional contours result in a hypersurface which

cannot be pictured. For this reason the optimization progress of the

optical design problems considered here is charted simply as the change

in the value of the merit function versus the number of iterative steps.

Though many other factors could be considered, the plots of merit

function versus iteration steps provide the information necessary to

demonstrate the important differences between the conventional and ex­

trapolated least squares optimization methods.


Optical Design Problem Selection and Demonstration

A double Gauss photographic objective was selected as a typical

optical design problem which would be optimized using an optical design

computer program. The problem selected was the "Lens in a Hole" problem,

presented at the First International Lens Design Conference at Haverford

College (Sinclair, 1975). The problem specifications are listed in

Appendix B. Two merit functions are used with the lens design problem in

order to demonstrate the optimization characteristics of the extrapolated

and conventional least squares methods for the two types of merit function

which are typically found in optical design problems. Although many

types of merit functions have been devised by optical designers attempting

to obtain the best or at least an adequate optical design, in terms of

the optimization algorithm, these methods tend to fall into two groups.

First there are those which are defined so that the merit function goes

readily to zero, or at least is reduced to a negligible value such as

below 10⁻⁵, 10⁻¹², or even 10⁻²⁵; and second, there are those which tend

to stagnate at a modest value such as 0.1 or 0.01 of the original value.

The latter types of merit functions usually contain merit factors which

tend to work in opposition to each other. This approach is often

effective in improving the performance of complex optical systems such

as a double Gauss. An example of this type of merit factor combination

is to balance third with fifth order spherical aberrations and also to

attempt to force third order spherical aberration to zero. Balancing

the third and fifth order spherical aberrations without restricting the

third order spherical aberration would give a merit factor of the


former type, where it could be driven readily very close to zero in a

double Gauss optical system. The point here is not to attempt to devise

a merit function that necessarily produces a high quality double Gauss

lens, but rather to provide merit functions that behave in a manner

which one typically finds when designing real optical systems. The two

merit functions used for the preliminary design of the double Gauss lens

are displayed in Table 4.1. The computer program used is discussed in

Appendix C.

Lens Optimization

The progress of an optimization procedure is usually measured by

the amount that the merit function is reduced. By using the merit

function reduction to compare the progress of the conventional and

extrapolated least squares optimization techniques, the relative effective­

ness of these methods can be determined. Since the primary feature of

the extrapolated least squares method is its ability to continue optimi­

zation without recomputing the derivative matrix, several plots show the

reduction of the merit function versus the iterations requiring the re­

calculation of the first derivative matrix. The accelerated reduction

of the merit function when extrapolation is employed in the least squares

optimization process is shown in Figures 4.1 through 4.4 which compare

extrapolated to conventional least squares optimization methods. Figures

4.1 and 4.2 show optimization with a merit function that rapidly goes to

zero while the merit function of Figures 4.3 and 4.4 stagnates at a

value near 0.4. In all cases the extrapolation method is shown to have

substantially accelerated the reduction of the merit functions for each


Table 4.1. Merit functions used for the preliminary design of the double Gauss lens.

Merit Function Constructed to Stagnate During Optimization

      Aberrations                                        Target Values   Weights
  1.  Edge Thickness-surface 2                           2 mm            0.7
  2.  Spherical-3*                                       0               1.0
  3.  Spherical-3 + Spherical-5                          0               0.5
  4.  Coma-3                                             0               1.0
  5.  Coma-3 + Coma-5                                    0               0.5
  6.  Astigmatism-3                                      0               3.0
  7.  Astigmatism-3 + Astigmatism-5                      0               1.0
  8.  Transverse-oblique-spherical                       0               1.0
  9.  0.5 Transverse-oblique-spherical +
      Astigmatism-3 + Astigmatism-5                      0               1.0
 10.  Distortion-3                                       0               1.0
 11.  Distortion-3 + Distortion-5                        0               0.5
 12.  Petzval-3                                          0               1.0
 13.  Petzval-3 + Petzval-5                              0               0.5

Merit Function Constructed to Readily Converge to Zero During Optimization

      Aberrations                                        Target Values   Weights
  1.  Edge Thickness-surface 2                           2 mm            0.7
  2.  Spherical-3*                                       0               1.0
  3.  Coma-3                                             0               1.0
  4.  Astigmatism-3                                      0               3.0
  5.  Distortion-3                                       0               1.0

*The numbers following an aberration indicate the order of the aberration polynomial coefficient.
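As an illustration of how such a merit function is assembled, the sketch below builds the second (readily converging) merit function of Table 4.1 as a sum of weighted squared residuals; applying the weight to the residual before squaring is an assumed convention, and the aberration values passed in are placeholders that would normally come from the ray trace and aberration calculations.

TERMS = [  # (aberration, target value, weight) from Table 4.1
    ("edge_thickness_surface_2", 2.0, 0.7),
    ("spherical_3",              0.0, 1.0),
    ("coma_3",                   0.0, 1.0),
    ("astigmatism_3",            0.0, 3.0),
    ("distortion_3",             0.0, 1.0),
]

def merit_function(values):
    """phi = sum over terms of [w * (aberration - target)]**2."""
    return sum((w * (values[name] - target)) ** 2
               for name, target, w in TERMS)

# Example call with made-up aberration values, purely to show the usage:
print(merit_function({"edge_thickness_surface_2": 2.3, "spherical_3": 0.01,
                      "coma_3": -0.02, "astigmatism_3": 0.05,
                      "distortion_3": 0.1}))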


Figure 4.1. Lens design with a vanishing merit function using conventional and ELS extrapolated damped least squares optimization. (Log of merit function versus number of iterations; curves compare conventional DLS and ELS.)


Figure 4.2. Lens design with a vanishing merit function using conventional and ELS extrapolated damped least squares optimization (with linear extension of the reused metric type). (Log of merit function versus number of iterations; curves compare conventional DLS, ELS, and linear extension.)


Figure 4.3. Lens design with a stagnating merit function using conventional and ELS extrapolated damped least squares optimization. (Merit function versus number of iterations; curves compare conventional DLS and ELS.)


Figure 4.4. Lens design with a stagnating merit function using conventional and ELS extrapolated damped least squares optimization (with linear extension of the reused metric type). (Merit function versus number of iterations; curves compare conventional DLS, ELS, and linear extension.)


recalculation of the first derivative matrix at each iteration. Figures

4.1 and 4.3 show that extrapolation improves convergence when compared

to conventional least squares methods when no linear extension techniques

are used. In the case where the merit function rapidly converges to

zero the extrapolation technique is shown to very effectively continue

the optimization progress. Note in the second iteration that after the

conventional least squares step, the first two extrapolation steps made

substantial progress while those that followed stagnated. In the

following iteration the application of extrapolation is seen to make

very substantial progress (and this is accomplished without recalculating

the first derivative matrix). In the case where the merit function

stagnates, as shown in Figure 4.3, the application of the extrapolation

method is not as dramatic but it is no less effective. In this case

the success of the individual extrapolation steps is seen to be variable;

but following the initial conventional least squares step of each itera­

tion, the extrapolation steps do substantially add to the progress of this

more difficult problem. To verify that the extrapolation method produces

progress beyond the conventional linear extension subiteration steps,

linear extension of the type that reuses the derivative matrix was in­

cluded in tests with both the conventional and extrapolated least squares

optimization methods. As seen in Figures 4.2 and 4.4, including con­

ventional linear extension into optimization did not substantially alter

the progress of this problem. What is also demonstrated here is that

after the linear extension methods have stagnated, the extrapolation

method is able to continue the progress of the problem (without requiring

the recalculation of the first derivative matrix).


By replotting the above graphs to compare each solution with and

without linear extension in Figures 4.5 through 4.8, some additional

observations are made concerning linear extension procedures. In Figure

4.5 linear extension is applied to conventional least squares optimiza­

tion methods that involve a merit function that is reducible to zero.

Here the linear extension makes a substantial contribution to reducing

the value of the merit function after the conventional least squares

step, however the next least squares step is seen to be correspondingly

shortened. As a result the advantage gained by using linear extension

is seen to be nominal here. When extrapolated methods are employed with

this merit function, linear extension is seen to fail when used with the

extrapolated step. This is probably due to the errors that accumulate

in the extrapolated system metric which would then tend to misdirect a

linear extension step. In the problems with the stagnating merit function

the application of linear extension is seen to be particularly ineffec­

tive, as in Figures 4.7 and 4.8. Linear extension is seen to fail except

in a couple of the extrapolated steps. The general failure of linear extension in these problems might be expected, due to the extreme

difficulty encountered with a merit function composed of conflicting and

highly nonlinear merit factors.

The advantages of using nonlinear extrapolation with the least squares optimization of optical systems were demonstrated in the discussion

above. The advantage gained by using extrapolated over conventional

least squares optimization methods is based upon the reduction of the

number of times that the first derivative matrix must be recalculated.


Figure 4.5. Lens design with a vanishing merit function using conventional damped least squares optimization (with and without linear extension of the reused metric type). (Log of merit function versus number of iterations.)


Figure 4.6. Lens design with a vanishing merit function using ELS extrapolated damped least squares optimization (with and without linear extension of the reused metric type). (Log of merit function versus number of iterations.)


Figure 4.7. Lens design with a stagnating merit function using conventional damped least squares optimization (with and without linear extension of the reused metric type). (Merit function versus number of iterations.)


Figure 4.8. Lens design with a stagnating merit function using ELS extrapolated damped least squares optimization (with and without linear extension of the reused metric type). (Merit function versus number of iterations.)


In general this results in a significant reduction in computation during

optimization. It would, however, be expected that the effectiveness of the extrapolation method would be less than that of the conventional

method when measured on the basis of the individual least squares solu­

tion steps. This is seen in Figures 4.9 and 4.10 where the horizontal

axes of Figures 4.2 and 4.4 have been stretched out to show progress per

solution step rather than for each iterative recalculation of the first

derivative matrix. Thus it is seen from the discussion and graphs in

this chapter that the nonlinear ELS extrapolation method of least squares

optimization makes far better use of each calculation of the first

derivative matrix, and this results in significant computational savings

during optimization.


Figure 4.9. Lens design with a vanishing merit function using conven­tional and ELS extrapolated damped least squares optimization (with linear extension of the reused metric type).

Optimization progress is shown here with respect to solution calculations rather than matrix recalculations.


Figure 4.10. Lens design with a stagnating merit function using conventional and ELS extrapolated damped least squares optimization (with linear extension of the reused metric type). Optimization progress is shown here with respect to solution calculations rather than derivative matrix recalculations. (Merit function versus number of least squares solution calculations; curves compare conventional DLS and ELS.)


CHAPTER 5

CONCLUSION

The extrapolated least squares (ELS) optimization method developed

in this dissertation introduces a variable metric optimization technique

into least squares optimization procedures. The extrapolation factors are

calculated and used to update the optical system merit function, the

first derivative matrix and the optical system metric all with a nominal

amount of computation. These updated quantities are then used to re-solve

the least squares problem without requiring that the first derivative

matrix be recalculated. By varying the system metric in this fashion

several iterative solution steps are taken between each full iteration

which does require the recalculation of the first derivative matrix.

This in effect greatly expands the neighborhood in which the least squares

optimization process can use each calculation of the optical system's

first derivative matrix. In this neighborhood the ELS method of optimi­

zation provides a quadratic approximation to the exact optimization

problem. The least squares or any of the various damped least squares

optimization techniques can proceed iteratively in this expanded neighbor­

hood without recalculating the system's first derivative matrix as long

as this quadratic and other approximations hold. It is important to note

that the ELS optimization method does not simply introduce damping

factors to the diagonal elements of the system metric to control the


optimization step lengths; but that the total system metric is actually

updated and refined to reflect the optimization progress when the ELS

method is employed.

Using a variable metric in a least squares optimization procedure

greatly reduces the amount of computation required to solve the problem

and accelerates convergence to a solution. This is because the recalcu­

lation of the first derivative matrix requires a substantial amount of

computation, often including numerous ray traces and aberration function

evaluations, whereas the metric is refined in the ELS method with only

a nominal amount of additional computation. The extrapolation factors

used to vary or refine the optical system metric are developed from an

approximate second derivative which is economically computed from in­

formation that is normally calculated and available in conventional least

squares optimization methods. This information, which consists primarily

of derivative information retained between iterations, is usually dis­

carded in most least squares methods or used simply to develop damping

factors. As was demonstrated in the numerous examples presented in this

dissertation, the application of this information in terms of extrapo­

lation factors significantly enhances the conventional approach to least

squares optimization.
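One way to form such an approximate second derivative, in the spirit of the pseudo-second-derivative methods cited earlier, is to difference the first derivative matrices retained from two successive full iterations, as sketched below; the guard against very small steps is an added safeguard rather than something specified in the text.

import numpy as np

def pseudo_second_derivative(A_prev, A_curr, dx, eps=1e-12):
    """Estimate d2(f_i)/d(x_j)2 as (A_curr - A_prev)[i, j] / dx[j]."""
    dx_safe = np.where(np.abs(dx) < eps, eps, dx)   # avoid division by ~0
    return (A_curr - A_prev) / dx_safe[None, :]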

The use of ELS optimization procedures has the potential of being

important when large optical systems are designed on the slower mini- or

micro-computer systems. This is because the ELS procedures can reduce

the number of times that the costly first derivative matrix must be

calculated. Also ELS methods or extrapolation techniques in general


could be used to investigate solutions quickly and economically for a

design search on an interactive basis.


The ELS extrapolated least squares optimization method was used

on a variety of test functions and optical design problems. The computer

program (see Appendix C) used to conduct the investigation of ELS proce­

dures was developed explicitly for this purpose. Because the entire

optical design program including the optimization subroutines was de­

veloped as part of this dissertation project, some areas of the program

development were not undertaken due to time and resource limitations.

As a result, optical systems were evaluated using only paraxial ray

tracing and the development of optical aberration coefficients up to

seventh order. Also, system constraints were included in the merit

function, not explicitly in the optimization algorithm. Though this did

not limit the general investigation of ELS procedures in a fundamental

way, further work with ELS optimization methods using real ray tracing

techniques including constraint calculations is currently under consider­

ation. This could include merit functions that use merit factors which

would involve a large amount of computation such as modulation transfer

functions, radial energy distributions, and diffraction calculations.

In these types of larger optical design problems with merit factors

that are more expensive to compute, the efficiencies of the ELS extrapo­

lation methods are expected to be the greatest.


APPENDIX A

THE SECOND DERIVATIVE TERM IN THE

LEAST SQUARES OPTIMIZATION

The inclusion of the second derivative term (as shown in Table

3.1 - second approach) can significantly affect the least squares optimi-

zation process. In general, however, this term is assumed to be small

and is then dropped. Several papers have included discussion along this

line to justify the elimination of this term (see Table 2.4 for several

references). Buchele (1968) and Dilworth (1978) and others have in-

corporated the second derivative term into the least squares optimization

process with varied success. A different approach for analyzing the

impact of this second derivative term is presented in this appendix.

Here the specific characteristics of the second derivative term are

analyzed in relation to the least squares problem rather than attempting

to demonstrate its relatively insignificant size.

This can be demonstrated by looking at the second last line of Table 3.1 with

Δx = -(ḟḟ + f̈f)⁻¹ ḟf                                               (A.1)

and noting that the ḟf factor is proportional to the derivative of the merit function (ḟf = ½∇φ). Since progress towards the solution to the

minimization problem advances in the general direction of the negative

gradient of the merit function, it is seen that the inverted quantity in



Eq. (A.1) (often called the system metric) must be positive. The ḟḟ term is always positive; the f̈f term, however, may be positive or negative and this requires closer examination. The f̈f term is positive when f and f̈ are both positive or both negative. These situations correspond to the expected concave shaped merit function near the solution minimum. If f and f̈ are of opposite signs then the f̈f term will be negative. For the system metric to remain positive, the f̈f term must then be smaller in magnitude than the ḟḟ term. In general either f or f̈ could be large, and their product could be negative and larger in magnitude than ḟḟ. However, f and f̈ may be of opposite signs at the minimum of the merit function when there is a zero crossing of the merit factor. This can be seen in Figure A.1 and in Table A.1. Here the function is of odd order, and near the zero crossing the f and f̈ terms are seen to be of opposite sign. The merit function is also seen to be concave in the region, with a minimum of zero at the zero crossing point of the function f. The solution step in this region very near the minimum (with x-values between .75 and 1.0) is seen in Figure A.1b to be directed toward the merit function minimum, for both the regular least squares solution and also for the solution that includes the f̈f term. As can be seen in Table A.1 the f̈f term is negative and smaller in magnitude than the ḟḟ term. For a small region further away from the solution minimum, the solution step is, however, misdirected away from the merit function minimum when the f̈f second derivative term is included in the least squares problem. As can be seen in Table A.1 at the location x = +0.5, the f̈f term is negative and larger in magnitude than the ḟḟ term. In this region the solution step is directed away from the solution minimum (see


Figure A-1. Examination of solution steps at a merit function minimum with an odd order merit factor and a zero crossing. (a) The merit function and merit factor at the solution minimum. (b) Solution step lengths and direction for least squares and second derivative damped least squares optimization; the misdirected second derivative damped step is marked with an asterisk.


Table A-1. Solution step data at a merit function minimum with an odd order merit factor and a zero crossing.

                              x = -1/2     x = +1/2*    x = 3/4      x = .9

  f                           -1.125       -.875        -.57812      -.271
  ḟ                             .75          .75         1.6875       2.43
  f̈                           -3            3            4.5          5.4
  f̈f                           3.375       -2.625       -2.60156     -1.4634
  ḟḟ                            .5625        .5625       2.8476       5.90490
  ḟḟ + f̈f                      3.9375      -2.0625*       .24609      4.44150
  ḟf                           -.84375     -.65625       -.97559      -.65854
  Δx = -(ḟḟ)⁻¹ ḟf (a)         +1.5000      +1.16667     +.34259      +.11152
  Δx = -(ḟḟ + f̈f)⁻¹ ḟf (b)    +0.21429     -.31818*     +.24009      +.14827

*Denotes factors for misdirected solution step away from solution direction.
(a) Least squares solution step.  (b) Second derivative damped least squares solution step.


Figure A.1b) when the second derivative term is included in the least squares problem, whereas the regular least squares solution is properly directed toward the merit function minimum. Thus at the minimum the f̈f term is zero, and only in a neighborhood of this minimum is the f̈f term guaranteed to be small and less in magnitude than the ḟḟ term, as required.
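The f, ḟ, and f̈ entries of Table A-1 are consistent with the single merit factor f(x) = x³ - 1, which has its zero crossing at x = 1; assuming that form purely for illustration, the short check below reproduces the misdirected second derivative damped step at x = +1/2.

def steps(x):
    f   = x**3 - 1.0                              # merit factor
    fd  = 3.0 * x**2                              # f-dot, first derivative
    fdd = 6.0 * x                                 # f-double-dot, second derivative
    dx_ls  = -(fd * f) / (fd * fd)                # least squares step
    dx_2nd = -(fd * f) / (fd * fd + fdd * f)      # second derivative damped step
    return dx_ls, dx_2nd

for x in (-0.5, 0.5, 0.75, 0.9):
    print(x, steps(x))
# At x = +0.5 the metric (fd*fd + fdd*f) is negative (-2.0625), so the damped
# step (-0.318) points away from the minimum at x = 1, while the plain least
# squares step (+1.167) is properly directed toward it.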

It is interesting to note that the f̈f term is brought into the

first formulation of the least squares problem only after the second

order terms are included in the Taylor series approximation (see Table

3.2) while in the second formulation this term arises directly from the

first order approximation (as shown in Table 3.1). This is because the

second formulation is developed within a more "well behaved" region of

the solution space as discussed above. In fact, if the f̈f term is in-

cluded remote from the solution minimum and the metric does become nega­

tive, its inclusion induces a divergent optimization step no matter how

small the step size. Under these conditions, one has a case where the

inclusion of a factor from a higher order term in a series gives a less

accurate approximation to the problem.

Based upon the discussion presented here, it is seen that the

primary reason for excluding or including the second derivative term is

the remoteness or nearness of the optimization step to the solution minimum, and not the size of the term. Far from the solution minimum the second derivative term may or may not improve the description of the

problem. This depends upon the particular problem. Within an appro­

priately small neighborhood of the solution minimum, the second deriva­

tive term will improve the approximate description of the problem at


hand. The elimination of the second derivative term, however, general­

izes the least squares process in the sense that it becomes an effective

optimization procedure remote from or near to the solution minimum.


APPENDIX B

THE "LENS IN A HOLE" OPTICAL DESIGN PROBLEM

The "Lens in a Hole" problem was presented at the International

Lens Design Conference at Haverford College in 1975 as a demonstration

project to "probe different aspects of contemporary lens design"

(Sinclair, 1975).

The problem involved re-optimizing a well corrected lens system

to operate at markedly different f-number and field angle specifications.

The starting lens configuration was a double-Gauss lens adapted from a

U.S. patent. The double-Gauss lens specifications are listed in Table

B-1. Three re-optimization problems were presented. The one selected

for use as a demonstration problem in this dissertation had the following

requirements:

1) focal length: 100.0

2) f-number: f/3.0

3) field angle: 60° full field

4) vignetting not to exceed: 15% at ±21° field, 35% at ±30° field

5) distortion not to exceed: 1%

6) air spaces between lens elements to exceed: 0.2 mm

7) minimum glass thicknesses to exceed: 1.5 mm.



Table B-1. "Lens in a Hole" starting configuration specifications.

  Curvature      Thickness     Material

  +.015528          8.0        SSK-2
  +.004             0.5        AIR
  +.024752         14.6        SK-10
  +.004411          4.0        F-5
  +.040785         12.0        AIR
   0.0              8.0        AIR
  -.034788          4.0        F-5
  +.021552         13.0        SK-10
  -.026583          0.5        AIR
  +.005590          8.1        SK-10
  -.009615         65.8        AIR

  Effective Focal Length, EFL: 100.8
  f-number: f/2
  Field Angle: 36°
  U.S. Patent 2,117,252 by Lee


APPENDIX C

THE COMPUTER PROGRAM

A general purpose least squares optimization computer program was

developed to aid in investigating the development of new optimization

concepts. The program is set up to optimize either lens systems or

generalized functions (such as for the Rosenbrock, 1960, or Buchele,

1968, problems). A variety of optimization options are available in the

computer program that could be selected for specific optimization tests.

These included: 1) least squares optimization (without damping); 2) DLS damped least squares optimization including additive or multiplicative damping; 3) second derivative damping including exact and PSD pseudo-second-derivative damping; 4) linear extension procedures including reused metric and solution scaling techniques; and 5) the use of ELS extrapolated least squares optimization. The flow diagram in Figure C-1 lays out the general

structure of the optimization subroutine and shows the manner in which

the various options fit into this structure.

As can be seen in the flow diagram, the new ELS extrapolation

procedure can be readily integrated into computer programs that use the

conventional least squares optimization procedures. The ELS procedure

can be viewed as a step that simply short circuits the full iterative

cycle. It is thus seen that when the ELS option is used, the recalcula­

tion of the full first derivative matrix is avoided as optimization



Figure C-1. Flow diagram of the general structure of the least squares optimization subroutine, including ELS extrapolated least squares optimization. (Blocks include the DLS damping factor search, the PSD damping factor search, solving the least squares problem for a new ΔX, step length scaling, and solution re-set and check steps.)


continues iteratively. The primary function of the ELS subroutine is to

check for optimization stagnation or convergence and to calculate extrapo­

lated values for the first derivative matrix according to Eq. (3.2) and

the merit factors according to Eq. (3.1). The flow diagram for the ELS

extrapolation subroutine is given in Figure C-2 and the program listing

for this subroutine is included at the end of this appendix. It should

be pointed out that the program listing is taken directly from a computer

program which is in the developmental stage and it was not prepared or

documented as computer code for general distribution.


Figure C-2. Flow diagram of the ELS extrapolation subroutine (extrapolation for the first derivative matrix only). (Blocks include the ELS update of the first derivative matrix, A(IJ) = A(IJ) + A2(IJ)*DX(J) with IJ = I + (J-1)*MRQ, and the convergence printout.)


      LOGICAL FUNCTION ELS(DUM)
      COMMON/DEF/MRQ,NVB,LRQ(50),LRQI(50),LRQJ(50),XRQ(50),LVB(15),
     C LVBS(15),WT(20),WWT(240),VT(15),VVT(225),MQ,MR,MS,OB(100),
     C OB1(20),OA(100),OA1(20),DEFQ(100),IWT,IVT,IDEFQ(18)
      COMMON/OPTZ/XO(15),X(15),X2(15),DX(15),A(240),AA(240),A2(240),
     C ATA(225),ATA1(225),ORG(20),ORG2(20),TAR(20),B(20),ATF(20),PSD(20)
     C ,TAYLOR1(20),TAYLOR2(20),DTAYLOR(20),DLINEAR(20),D2LINER(20),
     C T1PC(20),T2PC(20),RES(2),RO(2),R1(2),R2(2),
     C P,PP,DP,IP,H,AITER3,LENS,ITER,ITERQ,ITER1,ITER2,TOLDX,TOLIT,
     C TOL2IT,TOLSIT,TOLSLN,RSIQ,R1S2,DXO(15),DX1(15),DX2(15),X1(15),
     C OPTZQ(33),ITERO,ITAR1,ITAR2,ITARB,
     C IPSD,ITEROQ,ITER1Q,ITER2Q,ITAR1Q,ITAR2Q,ITARBQ,IPSDQ,IMQ,IR2,
     C ITRSTQ,ITRDXQ,ITSSTQ,ITSDXQ,IT2STQ,IT2DXQ
      COMMON/INDXS/I01,I02,I03,I04,I05,I06,IDVG,IDVGQ,INDXQ(28),
     C XINDXQ(20)
      LOGICAL STAG
C *** TEST FOR MAX. NO. OF ELS CYCLES
      IF(ITAR2 .GT. ITAR2Q) GO TO 1110
      PRINT*,' ********** ***** RE-ITERATION ',ITAR2,' ***** **********'
C *** THIS IS DX/PER TOTAL SUB-ITERATION
C *** CALL MAINUP TO UPDATE TAYLOR2(J) FOR NEW SUB-ITER ORG(J)
C *** AA-MATRIX CORR. WITH 2ND DERIV.
C *** NEG. IT2DXQ TO INDEX STAG CALC.
      PRINT*,'***** SUBR2R',ITAR2,' R2( ',IR2,' ) = ',R2(IR2),' RES( ',
     C IR2,' ) = ',RES(IR2),' *****'
C *** TEST FOR STAGNATION CHECK REQUIREMENT
      IF(ITAR2 .LE. 1) GO TO 1100
C *** SOLUTION STAGNATION CHECK
      IF(STAG(DX2,R2,IR2,IT2DX,IT2DXQ,IT2ST,IT2STQ,TOL2IT)) GO TO 1160
 1100 CONTINUE
C *** DIVERGENCE CHECK
      IF(RES(IR2) .GT. RO(IR2)) GO TO 1180
      ITAR2=ITAR2+1
C *** CALC. SOLUTION VECTOR INCREMENT -DX
      DO 1120 J=1,NVB
      DX(J)=X2(J)-X1(J)
      DX2(J)=DX(J)
C *** X MAY CHANGE DURING SUB-ITERATIONS
      X(J)=X2(J)
C *** X1 WILL CHANGE ONLY DURING NEXT RE-ITER
      X1(J)=X2(J)
C *** ELS UPDATE OF FIRST DERIVATIVE MATRIX -A
      DO 1120 I=1,MRQ
      IJ=I+(J-1)*MRQ
      A(IJ)=A(IJ)+A2(IJ)*DX(J)
 1120 CONTINUE
      R2(IR2)=RES(IR2)
C *** MOVE ITERATIVE ORIGIN TO NEWEST SOLUTION POINT
      DO 1150 I=1,MRQ
 1150 ORG(I)=ORG2(I)
C *** SUCCESSFUL ELS CYCLE ... CONTINUE
      ELS=.TRUE.
      RETURN
 1110 ITAR2=1
      PRINT*,' RE-ITER CYCLE LIMIT ',ITAR2Q,' ... CONTINUE.'
      GO TO 1130
 1160 PRINT*,' RE-ITER STAGNATION ... CONTINUE.'
      GO TO 1130
 1180 PRINT*,' DIVERGED RE-ITER ',ITAR2,' FAILED ... CONTINUE.'
 1130 CONTINUE
      ITAR2=1
C *** UNSUCCESSFUL ELS CYCLE ... CONTINUE
      ELS=.FALSE.
      RETURN
      END


REFERENCES

Adachi, Norihiko, "On Variable-Metric Algorithms," J. Optim Th, 7, 391 (1971).

Barnes, J. G. P., "An Algorithm for Solving Non-linear Equations Based on the Secant Method," Computer J., 8, 66 (1965).

Bauer, F. L., "Elimination with Weighted Row Combinations for Solving Linear Equations and Least Squares Problems," Numr Math, 7..., 338 (1965).

Björck, Åke, "Solving Linear Least Squares Problems by Gram-Schmidt Orthogonalization," BIT, 7, 1 (1967a).

Björck, Åke, "Iterative Refinement of Linear Least Squares Solutions I," BIT, 7, 257 (1967b).

Björck, Åke, "Iterative Refinement of Linear Least Squares Solutions II," BIT, 8, 8 (1968).

Box, M. J., "A New Method of Constrained Optimization and a Comparison with Other Techniques," Computer J., 8, 42 (1965).

Box, M. J., "A Comparison of Several Current Optimization Methods, and the Use of Transformations in Constrained Problems," Computer J., 9, 67 (1966).

Broyden, C. G., "A Class of Methods for Solving Nonlinear Simultaneous Equations," Math Comput, ~, 577 (1965).

Brunner, H., "Automatisches Korrigieren unter Berücksichtigung der zweiten Ableitungen der Gütefunktion," Optica Acta, 18, 743 (1971).

Buchele, Donald R., "Damping Factors for the Least-Squares Method of Optical Design," Appl. Opt., 7, 2433 (1968).

Cornwell, L. W., R. J. Pegis, A. K. Rigler, and T. P. Vogl, "Grey's Method for Nonlinear Optimization," J. Opt. Soc., 63, 576 (1973).

Cornwell, L. W., and A. K. Rigler, "Comparison of Four Nonlinear Optimization Methods," Appl. Opt., 11, 1659 (1972).


Davidon, Wm. C., "Variable Metric Method for Minimization," AEC Res. and Dev. Rept. ANL-5990 (Rev), Argonne Nat. Lab., Lemont, Ill. (1959).

Davidon, Wm. C., "Variable Metric Method for Minimization," AEC Res. and Dev. Rept. ANL-5990 (Rev 2), Argonne Nat. Lab., Lemont, Ill. (1966).

Davidon, Wm. C., "Variance Algorithm for Minimization," Computer J., 10, 406 (1968).

Dilworth, Donald C., "Pseudo-Second-Derivative Matrix and its Application to Automatic Lens Design," Appl. Opt., 17, 3372 (1978).

Doyle, Thomas C., "Nonlinear Least Squares Optimization of a Continuous N-parameter System," Preprint LA-DC-72-l0l8 (LASL, Los Alamos, NM) (1972).

Faggiano, Andrea, "Automatic Lens Design with Pseudo-Second-Derivative Matrix: A Contribution," Appl. Opt. ~, 4226 (1980).

Feder, Donald P., "Automatic Lens Design Methods," J. Opt. Soc., 47, 902 (1957) .

Feder, Donald P., "The Symposium Lens," Appl. Opt., ~, 272 (1963a).

Feder, Donald P., "Automatic Optical Design," Appl. Opt., ~, 1209 (1963b).

Feder, Donald P., "Lens Design Viewed as an Optimization Process," in Recent Advances in Optimization Techniques, A. Lavi and T. Vogl (eds.), John Wiley & Sons, New York (1966).

Fletcher, R., "Generalized Inverse Methods for the Best Least Squares Solution of Systems of Non-linear Equations," Computer J., 10, 392 (1968).

Fletcher, R., "A New Approach to Variable Metric Algorithms," Computer J., ~, 317 (1970).

Fletcher, R., "Methods for the Solution of Optimization Problems," Sympo­sium on Computer-Aided Engineering Proc., May 11-13, Waterloo, Ontario, Canada (1971), pp. 123-155.

Fletcher, R., and M. J. D. Powell, "A Rapidly Convergent Decent Method for Minimization," Computer J., ~, 163 (1963).

Fletcher, R., and C. M. Reeves, "Function Minimization by Conjugate Gradients," Computer J., 7..., 149 (1964).

Forsythe, George E., M. A. Malcolm, and C. B. Moler, Computer Methods for Mathematical Computations, Prentice-Hall, Englewood Cliffs, N.J. (1977).


Glatzel, E., "Ein Neues Verfahren zur Automatischen Korrektion Optischer Systeme," Optik, 18, 577 (1961).

Glatzel, E., "A New Method for Automatic Correction of Optical Systems," Proc. of Conf. Lens Design with Large Computers, Univ. of Rochester, N.Y. (1966), pp. 23-41.

Glatzel, E., and R. Wilson, "Adaptive Automatic Correction in Optical Design," App1. Opt., 7..., 265 (1968).

Goldfarb, Donald, "A Family of Variable-Metric Methods Derived by Variational Means," Math Comput, 24, 23 (1970).

Golub, G., "Numerical Methods for Solving Linear Least Squares Problems," Numr Math, 7..., 206 (1965).

Golub, G., and W. Kahan, "Calculating the Singular Values and the Pseudo­Inverse of a Matrix," Siam J. Num AnI, Ser. B, ~, 205 (1965).

Golub, G. H., and C. Reinsch, "Singular Value Decomposition and Least Squares Solutions," Numr Math, ..!.!' 403 (1970).

Greenstadt, J., "Variations on Variable-Metric Methods," Math. Compt., ~, 1 (1970).

Grey, D. S., "Aberration Theories for Semiautomatic Lens Design by Electronic Computers, I Preliminary Remarks," J. Opt. Soc., 53, 672 (1963a).

Grey, D. S., "Aberration Theories for Semiautomatic Lens Design by Electronic Computers, II A Specific Computer Program," J. Opt. Soc., 53, 677 (1963b).

Grey, D. S., "Boundary Conditions in Optimization Problems," in Recent Advances in Optimization Techniques, A. Lavi and T. Vogl (eds.), John Wiley & Sons, New York (1966).

Hestenes, Magnus R., and Eduard Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems," J. of Res. NBS, 49, 409 (1952).

Hoerl, Arthur E., and Robert W. Kennard, "Ridge Regression: Biased Estimation for Nonorthogonal Problems," Technometrics, g, 55 (1970a) .

Hoerl, Arthur E., and Robert W. Kennard, "Ridge Regression: Applications to Nonorthogonal Problems," Technometrics, g, 69 (1970b).

Hopkins, R. E., and D. P. Feder, "The Symposium Lens - An Epilogue," Appl. Opt., ~, 1227 (1963).


Hopkins, R. E., and G. Spencer, "Creative Thinking and Computing Machines in Optical Design," J. Opt. Soc., g, 172 (1962).

Huang, H. Y., "Unified Approach to Quadratically Convergent Algorithms for Functional Minimization," J. Optim Th, ~, 405 (1970).

Huang, H. Y., and A. V. Levy, "Numerical Experiments on Quadratically Convergent Algorithms for Function Minimization," J. Optm Th, .§..' 269 (1970) .

Huber, Edward D., "An Intercomparison of Lens Design Computer Programs -A New User's Viewpoint," Proc. SPIE on Computer Aided Optical Design, SPIE, 147, 45 (1978).

Jacoby, S. L. S., J. S. Kowalik, and J. T. Pizzo, Iterative Methods for Nonlinear Optimization Problems, Prentice Hall, Englewood Cliffs, N.J. (1972).

Jamieson, T. H., Optimization Techniques in Lens Design, Monographs on Applied Optics No.5, American Elsevier Pub. Co., New York (1971).

Jones, A., "Spiral - A New Algorithm for Non-linear Parameter Estimation using Least Squares," Computer J., ~, 301 (1970).

Juergens, Richard C., "The Sample Problem: A Comparative Study of Lens Design and Users," International Lens Design Conference Proc., SPIE, 237, 348 (1980).

Kidger, M. J., and C. G. Wynne, "Experiments with Lens Optimization Procedures," Optica Acta, ~, 279 (1967).

Krautter, Martin, "Zusammenhänge der in Programmen zur Automatischen Korrektion Verwendeten Lösungsmatrizen" ("Relations Between Matrix-Solutions that are used in Programs for Automatic Lens Design"), Optik, 30, 334 (1970).

Lavi, Abraham, and T. P. Vogl, Recent Advances in Optimization Techniques, John Wiley & Sons, New York (1966).

Lawson, Charles L., and Richard J. Hanson, Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, N.J. (1974).

Levenberg, Kenneth, "A Method for the Solution of Certain Non-linear Problems in Least Squares," Q. Appl Math, ~, 164 (1944).

Marquardt, Donald W., "An Algorithm for Least-Squares Estimation of Nonlinear Parameters," J. SIAM, 11, 431 (1963).

Marquardt, Donald W., "Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation," Technometrics, 12, 591 (1970).


Martin, D. W., and G. J. Tee, "Iterative Methods for Linear Equations with Symmetric Positive Definite Matrix," Computer J., 4, 242 (1961).

McCarthy, Charles A., "A Note on the Automatic Correction of Third Order Aberrations," J. Opt. Soc., 45, 1087 (1955).

Meiron, Joseph, "Automatic Lens Design by the Least Squares Method," J. Opt. Soc., 49, 293 (1959).

Meiron, Joseph, "Damped Least-Squares Method for Automatic Lens Design," J. Opt. Soc., 55, 1105 (1965).

Morrison, David D., "Optimization by Least Squares," SIAM J. Num AnI, ~, 83 (1968).

Myers, Geraldine E., "Properties of the Conjugate-Gradient and Davidon Methods," J. Optim Th, ~, 209 (1968).

Nunn, M., and C. G. Wynne, "Lens Designing by Electronic Computer II," Proc. Phys. Soc. (London), 1, 316 (1960).

Pearson, J. D., "Variable Metric Methods of Minimization," Computer J., 12, 171 (1969).

Pegis, R. J., D. S. Grey, T. P. Vogl, and A. K. Rigler, "The Generalized Orthonormal Optimization Program and its Application," in Recent Advances in Optimization Techniques, A. Lavi and T. Vogl (eds.), John Wiley & Sons, New York (1966).

Peters, G., and J. H. Wilkinson, "The Least Squares Problem and Pseudo­Inverses," Computer J., .!l, 309 (1970).

Pierre, Donald A., Optimization Theory with Applications, John Wiley & Sons, New York (1969).

Powell, M. J. D., "An Iterative Method for Finding Stationary Values of a Function of Several Variables," Computer J., 5, 147 (1962).

Powell, M. J. D., "An Efficient Method for Finding the Minimum of a Function of Several Variables without Calculating Derivatives," Computer J., 7, 155 (1964).

Powell, M. J. D., "A Method for Minimizing a Sum of Squares of Non-linear Functions without Calculating Derivatives," Computer J., 7, 303 (1965).

Rayces, Juan L., "Ten Years of Lens Design with Glatzel' s Adaptive Method," International Lens Design Conference Proc., SPIE, 237, 75 (1980).


Rigler, A. K., and R. J. Pegis, "Optimization Methods in Optics," in The Computer in Optical Research, B. R. Frieden (ed.), Springer­Verlag, New York (1980).

Robb, Paul N., "Accelerating Convergence in Automatic Lens Design," Appl. Opt., 18, 4191 (1979).

Rosen, Saul, and An-Min Chung, "Application of the Least-Squares Method," J. Opt. Soc., 46, 223 (1956).

Rosen, Saul, and Cornelius Eldert, "Least-Squares Method for Optical Correction," J. Opt. Soc., 44, 250 (1954).

Rosenbrock, H. H., "An Automatic Method for Finding the Greatest or Least Value of a Function," Computer J., 3, 175 (1960).

Rosenbrock, H. H., "Correspondence," Response to Letter to Editor from Roger H. Moore, Computer J., 8, 224 (1965).

Seppala, Lynn G., "Optical Interpretation of the Merit Function in Grey's Lens Design Program," Appl. Opt., g, 671 (1974).

Sinclair, Douglas C., "Lens in a Hole Problem," paper presented at the International Lens Design Conference, June 23-27, Haverford College, Haverford, PA (1975).

Spencer, Gordon H., "A Flexible Automatic Lens Correction Procedure," Appl. Opt., ~, 1257 (1963).

Stewart, G. W. III, "A Modification of Davidon's Minimization Method to Accept Difference Approximations of Derivatives," J. Assoc. Comput Mach, ~, 72 (1967).

Tabata, T., and R. Ito, "Effective Treatment of the Interpolation Factor in Marquardt's Nonlinear Least-Squares Fit Algorithm," Computer J., 18, 25 (1975).

Wampler, Roy H., "An Evaluation of Linear Least Squares Computer Programs," J. of Res. NBS-B Math. Sci, 73B, 59 (1969).

Wampler, Roy H., "Solutions to Weighted Least Squares Problems by Modified Gram-Schmidt with Iterative Refinement," J. Assoc. Comput Mach, ~, 457 (1979).

Wampler, Roy H., "Test Procedures and Test Problems for Least Squares Algorithms," J. of Econometrics, g, 3 (1980).


Wynne, C. G., "Lens Designing by Electronic Digital Computer I," Proc. Phys. Soc. (London), 73, 777 (1959).

Wynne, C. G., and P. M. J. H. Wormell, "Lens Design by Computer," Appl. Opt., ~, 1233 (1963).