constrained non linear programming

Upload: ajeeta23

Post on 14-Apr-2018

225 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/27/2019 constrained non linear programming

    1/40

    149

    CONSTRAINED NONLINEAR PROGRAMMING

    We now turn to methods for general constrained nonlinear programming. These may

    be broadly classified into two categories:

    1. TRANSFORMATION METHODS: In this approach the constrained nonlinear

    program is transformed into an unconstrained problem (or a series of unconstrained

    problems) and solved using one (or some variant) of the algorithms we have already

    seen.

    Transformation methods may further be classified as

    (1) Exterior Penalty Function Methods

    (2) Interior Penalty (or Barrier) Function Methods

    (3) Augmented Lagrangian (or Multiplier) Methods

    2. PRIMAL METHODS: These are methods that work directly on the original

    problem.

    We shall look at each of these in turn; we first start with transformation methods.

  • 7/27/2019 constrained non linear programming

    2/40

    150

    Exterior Penalty Function Methods

    These methods generate a sequence of infeasible points whose limit is an optimal

    solution to the original problem. The original constrained problem is transformed into

    an unconstrained one (or a series of unconstrained problems).The constraints are

    incorporated into the objective by means of a "penalty parameter" which penalizes

    any constraint violation. Consider the problem

    Minf(x)

    st gj(x) 0,j=1,2,,m, hj(x) = 0,j=1,2,,p, xR

    n

    Define a penalty functionp as follows:

    Max0,

    where is some positive integer. It is easy to see that

    Ifgj(x) 0 then Max{0, gj(x)} = 0

    Ifgj(x) > 0 then Max{0, gj(x)} = gj(x) (violation)

    Now define the auxiliary function as

    P(x)=f(x) + (x),where >> 0.Thus if a constraint is violated,

    (x)>0 and a "penalty" of

    (x) is incurred.

  • 7/27/2019 constrained non linear programming

    3/40

    151

    The (UNCONSTRAINED) problem now is:

    Minimize P(x)=f(x) + (x)

    st xRn

    EXAMPLE:

    Minimizef(x)=x, st g(x)=2-x 0,xR (Note that minimum is atx* =2)

    Let (x)= Max[{0,g(x)}]2 ,

    i.e., 2 if 20if 2

    If(x)=0, then the optimal solution to Minimize is atx*= - and this isinfeasible; so 2

    Minimize hasoptimumsolution 2 12.

    :As , 2

    1

    2 2

    Penalty Function Auxiliary Function

    3 2 1 0 1 2 1 0 1 2 3

    1p

    2p

    p

    f+ 2p

    f+ 1p

    1=0.5

    2=1.5

  • 7/27/2019 constrained non linear programming

    4/40

    152

    In general:

    1. We convert the constrained problem to an unconstrained one by using thepenalty function.

    2. The solution to the unconstrained problem can be made arbitrarily close to thatof the original one by choosing sufficiently large .

    In practice, ifis very large, too much emphasis is placed on feasibility. Often,

    algorithms for unconstrained optimization would stop prematurely because the step

    sizes required for improvement become very small.

    Usually, we solve a sequence of problems with successively increasing

    values; the optimal point from one iteration becomes the starting point for the next

    problem.

    PROCEDURE: Choose the following initially:

    1. a tolerance ,2. an increase factor,3. a starting pointx1, and4. an initial 1.

    At Iteration i

    1. Solve the problem Minimizef(x)+ip(x); stxX (usually Rn). Usexias thestarting point and let the optimal solution bex

    i+1

    2. Ifip(xi+1) < then STOP; else let i+1 =(i)and start iteration (i+1)

  • 7/27/2019 constrained non linear programming

    5/40

    153

    EXAMPLE: Minimizef(x) =x12+ 2x2

    2

    st g(x) = 1x1 x2 0;xR2

    Define the penalty function

    1 if 00if 0

    The unconstrained problem is

    Minimizex12+ 2x2

    2+ (x)

    If(x)=0, then the optimal solution isx* =(0,0) INFEASIBLE!(x) = (1x1 x2)2, f(x) =x12 + 2x22+ [(1x1 x2)2], and the necessaryconditions for the optimal solution (f(x)=0) yield the following: 2 21 1 0,

    4 21 1 0

    Thus

    2

    2 3 and

    2 3

    Starting with =0.1, =10 andx1=(0,0) and using a tolerance of 0.005 (say), we have

    the following:

    Iter. (i) i xi+1 g(x

    i+1) ip(x

    i+1)

    1

    2

    34

    5

    0.1

    1.0

    10100

    1000

    (0.087, 0.043)

    (0.4, 0.2)

    (0.625, 0.3125)(0.6622, 0.3311)

    (0.666, 0.333)

    0.87

    0.40

    0.06250.0067

    0.001

    0.0757

    0.16

    0.0390.00449

    0.001

    Thus the optimal solution isx*= (2/3, 1/3).

  • 7/27/2019 constrained non linear programming

    6/40

    154

    INTERIOR PENALTY (BARRIER) FUNCTION METHODS

    These methods also transform the original problem into an unconstrained one;

    however the barrier functions prevent the current solution from ever leaving the

    feasible region. These require that the interior of the feasible sets be nonempty, which

    is impossible if equality constraints are present. Therefore, they are used with

    problems having only inequality constraints.

    A barrier functionB is one that is continuous and nonnegative over the interior

    of {x | g(x)0}, i.e., over the set {x | g(x)

  • 7/27/2019 constrained non linear programming

    7/40

    155

    Q. Why should be SMALL?

    A. Ideally we would likeB(x)=0 ifgj(x)

  • 7/27/2019 constrained non linear programming

    8/40

    156

    EXAMPLE: Minimizef(x) =x12+ 2x2

    2

    st g(x) = 1x1 x2 0;xR2

    Define the barrier function

    1The unconstrained problem is

    Minimizex12+ 2x2

    2+ (x) =x12 + 2x22 - 1

    The necessary conditions for the optimal solution (f(x)=0) yield the following:

    2 / 1 0, 4 / 1 0

    Solving,weget 1 1 33 and 1 1 3

    6

    Since the negative signs lead to infeasibility, we and

    Starting with =1, =0.1 andx

    1

    =(0,0) and using a tolerance of 0.005 (say), we have

    the following:

    Iter. (i) i xi+1 g(x

    i+1) iB(x

    i+1)

    1

    2

    3

    4

    5

    1

    0.1

    0.01

    0.001

    0.0001

    (1.0, 0.5)

    (0.714, 0.357)

    (0.672, 0.336)

    (0.6672, 0.3336)

    (0.6666, 0.3333)

    -0.5

    -0.071

    -0.008

    -0.0008

    -0.0001

    0.693

    0.265

    0.048

    0.0070

    0.0009

    Thus the optimal solution isx*= (2/3, 1/3).

  • 7/27/2019 constrained non linear programming

    9/40

    157

    Penalty Function Methods and Lagrange Multipliers

    Consider the penalty function approach to the problem below

    Minimize f(x)

    st gj(x) 0;j = 1,2,,m

    hj(x) = 0;j = m+1,m+2,,l xkRn.

    Suppose we use the usual interior penalty function described earlier, withp=2.

    The auxiliary function that we minimize is then given by

    P(x) =f(x) + p(x) =f(x) +{j[Max(0, gj(x))]2

    + j[hj(x)]2

    }

    =f(x) +{j[Max(0, gj(x))]2

    + j[hj(x)]2}

    The necessary condition for this to have a minimum is that

    P(x) = f(x) + p(x) = 0, i.e.,

    f(x) + j2[Max(0, gj(x)]gj(x) + j2[hj(x)]hj(x) = 0 (1)

    Suppose that the solution to (1) for a fixed (say k>0) is given byxk.

    Let us also designate

    2k[Max{0, gj(xk

    )}] = j(k); j=1,2,,m (2)

    2k[hj(xk)] = j(k); j=m+1,,l (3)

    so that for=kwe may rewrite (1) as

  • 7/27/2019 constrained non linear programming

    10/40

    158

    f(x) + jj(k) gj(x) + jj(k) hj(x) = 0 (4)

    Now consider the Lagrangian for the original problem:

    L(x,) =f(x) + jjgj(x) + jjhj(x)

    The usual K-K-T necessary conditions yield

    f(x) + j=1..m {jgj(x)} + j=m+1..l {jhj(x)} = 0 (5)

    plus the original constraints, complementary slackness for the inequality constraints

    and j0 forj=1,,m.

    Comparing (4) and (5) we can see that when we minimize the auxiliary function

    using =k, thej(k) values given by (2) and (3) estimate the Lagrange multipliers

    in (5). In fact it may be shown that as the penalty function method proceeds and

    kand thexkx*, the optimum solution, the values ofj(k) j*, the optimum

    Lagrange multiplier value for constraintj.

    Consider our example on page 153 once again. For this problem the Lagrangian is

    given byL(x,)=x12

    + 2x22+ (1-x1-x2).

    The K-K-T conditions yield

    L/x1 = 2x1-= 0; L/x2 = 4x2-= 0; (1-x1-x2)=0

    Solving these results inx1*=2/3;x2

    *=1/3;

    *=4/3; (=0 yields an infeasible solution).

  • 7/27/2019 constrained non linear programming

    11/40

    159

    Recall that for fixed kthe optimum value ofxk was [2k/(2+3k) k/(2+3k)]T. As

    we saw, when k, these converge to the optimum solution ofx*= (2/3,1/3).

    Now, suppose we use (2) to define () = 2[Max(0, gj(x

    k

    )]

    = 2[1 - {2/(2+3)} - {/(2+3)}] (since gj(xk)>0 if>0)

    = 2[1 - {3/(2+3)}] = 4/(2+3)

    Then it is readily seen that Lim() = Lim4/(2+3) = 4/3 = *

    Similar statements can also be made for the barrier (interior penalty) function

    approach, e.g., if we use the log-barrier function we have

    P(x) =f(x) + p(x) =f(x) +j-log(-gj(x)),

    so that

    P(x) = f(x) - j{/gj(x)}gj(x) = 0 (6)

    If (like before) for a fixed kwe denote the solution byxkand define -k/gj(x

    k) =

    j(k), then from (6) and (5) we see that j(k) approximates j. Furthermore, as

    k0and thexk

    x*

    , it can be shown that j(k) j*

    For our example (page 156) -k/gj(xk) =k/1

    =

    -2k/(1-1 3 4/3, as k0.

  • 7/27/2019 constrained non linear programming

    12/40

    160

    Penalty and Barrier function methods have been referred to as SEQUENTIAL

    UNCONSTRAINED MINIMIZATION TECHNIQUES (SUMT) and studied in

    detail by Fiacco and McCormick. While they are attractive in the simplicity of the

    principle on which they are based, they also possess several undesirable properties.

    When the parameters are very large in value, the penalty functions tend to be

    ill-behaved near the boundary of the constraint set where the optimum points usually

    lie. Another problem is the choice of appropriate 1and values. The rate at which

    change (i.e. the values) can seriously affect the computational effort to find a

    solution. Also as increases, the Hessian of the unconstrained function becomes ill-

    conditioned.

    Some of these problems are addressed by the so-called "multiplier" methods.

    Here there is no need forto go to infinity, and the unconstrained function is better

    conditioned with no singularities. Furthermore, they also have faster rates of

    convergence than SUMT.

  • 7/27/2019 constrained non linear programming

    13/40

    161

    Ill-Conditioning of the Hessian Matrix

    Consider the Hessian matrix of the Auxiliary function P(x)=x12+2x2

    2+(1-x1-x2)

    2:

    H =

    242

    222. Suppose we want to find its eigenvalues by solving

    |H-I| = (2+2-)*(4+2-) - 42

    = 2

    (6+4) + (8+12) = 0

    This quadratic equation yields

    = 14)23(2

    Taking the ratio of the largest and the smallest eigenvalue yields

    14)23(14)23(

    2

    2

    . It should be clear that as , the limit of the preceding ratio

    also goes to . This indicates that as the iterations proceed and we start to increase

    the value of , the Hessian of the unconstrained function that we are minimizing

    becomes increasingly ill-conditioned. This is a common situation and is especially

    problematic if we are using a method for the unconstrained optimization that requires

    the use of the Hessian.

  • 7/27/2019 constrained non linear programming

    14/40

    162

    MULTIPLIER METHODS

    Consider the problem: Minf(x), st gj(x)0,j=1,2,...,m.

    In multiplier methods the auxiliary function is given by

    P(x) =f(x) + Max0,

    where j>0 and 0are parameters associated with thejth constraint. This is also

    often referred to as theAUGMENTED LAGRANGIANfunction

    Note that if=0 and j=, this reduces to the usual exterior penalty function.

    The basic idea behind multiplier methods is as follows:

    Letxibe the minimum of the auxiliary function Pi(x) at some iteration i. From the

    optimality conditions we have

    P 0,

    Now, 0, , . Therefore P implies

    ,

    Let us assume for a moment that are chosen at each iteration i so that they satisfy

    the following:

    (1)

    It is easily verified that (1) is equivalent to requiring that

    0, 0and 0

    , 0

  • 7/27/2019 constrained non linear programming

    15/40

    163

    Therefore as long as (2) is satisfied, we have for the pointxithat minimize Pi(x)

    P (3)

    If we let

    and use the fact that j>0,then (2) reduces to

    0, 0and 0, (A)and (3) reduces to

    . (B)

    It is readily seen that (A) and (B) are merely the KARUSH-KUHN-TUCKER

    necessary conditions forxi to be a solution to the original problem below!!

    Minf(x), st gj(x)0,j=1,2,...,m.

    We may therefore conclude that

    and

    Herex*is the solution to the problem, and j

    *is the optimum Lagrange multiplier

    associated with constraintj.

    From the previous discussion it should be clear that at each iteration jshould be

    chosen in such a way that , 0, which results inxix*.Note thatxiisobtained here by minimizing the auxiliary function with respect tox.

  • 7/27/2019 constrained non linear programming

    16/40

    164

    ALGORITHM: A general algorithmic approach for the multiplier method may now

    be stated as follows:

    STEP 0: Set i=0, choose vectorsxi, and .STEP 1: Set i=i+1.STEP 2: Starting atxi-1, Minimize Pi(x) to findxi.STEP 3: Check convergence criteria and go to Step 4 only if the criteria are not

    satisfied.

    STEP 4: Modify

    based on satisfying (1). Also modify if necessary and go toStep 2.

    (Usually is set to 0)

    One example of a formula for changing is the following:

    ,

  • 7/27/2019 constrained non linear programming

    17/40

    165

    PRIMAL METHODS

    By a primal method we mean one that works directly on the original constrained

    problem:

    Minf(x)

    st gj(x)0j=1,2,...,m.

    Almost all methods are based on the following general strategy:

    Letx

    i

    be a design at iteration i. A new designx

    i+1

    is found by the expressionx

    i+1

    =x

    i

    + i, where iis a step size parameter and is a search direction (vector). Thedirection vectoris typically determined by an expression of the form:

    where fand gj are gradient vectors of the objective and constraint functions andJis an active set" given byJ={j|gj(x

    i)+ 0, >0}.

    Unlike with transformation methods, it is evident that here the gradient vectors of

    individual constraint functions need to be evaluated. This is a characteristic of all

    primal methods.

  • 7/27/2019 constrained non linear programming

    18/40

    166

    METHOD OF FEASIBLE DIRECTIONS

    The first class of primal methods we look at are the feasible directions methods,

    where eachxiis within the feasible region. An advantage is that if the process is

    terminated before reaching the true optimum solution (as is usually necessary with

    large-scale problems) the terminating point is feasible and hence acceptable. The

    general strategy at iteration i is:

    (1)Letxibe a feasible point.(2)

    Choose a direction

    so that

    a)xi+ is feasible at least for some "sufficiently" small >0, andb)f(xi+)

  • 7/27/2019 constrained non linear programming

    19/40

    167

    Further, if we also havef(x'+d)

  • 7/27/2019 constrained non linear programming

    20/40

    168

    Geometric Interpretation

    Minimizef(x), st gj(x) 0,j=1,2,,m

    (i) (ii)In (ii)dany positive step in this direction is infeasible (however we will often

    consider this a feasible direction)

    FARKAS LEMMA: GivenARmn,bRm,xRn,yRm,the following statements

    are equivalent to each other:

    1. yTA 0 yTb 0

    2. z such thatAz=b, z0

    dTgj(x*)

  • 7/27/2019 constrained non linear programming

    21/40

    169

    Application to NLP

    Let b -f(x*) Ajgj(x*)y d(direction vector), zjjforjJx* {j |gj(x*)=0}

    Farkas Lemma then implies that

    (1) dTgj(x*) 0 jJx* -dTf(x*) 0, i.e.,dTf(x*) 0

    (2) j0such that jJx*jgj(x*) = -f(x*)are equivalent! Note that (1) indicates that

    directions satisfyingdTgj(x*) 0jJx* are feasible directions directions satisfying dTf(x*) 0 are ascent (non-improving) directions

    On the other hand (2) indicates the K-K-T conditions. They are equivalent!

    So at the optimum, there is no feasible direction that is improving

    Similarly for the general NLP problem

    g2(x*)

    Cone generated by gradients

    of tight constraints

    K-K-T implies that the steepest descent direction is in

    the cone generated by gradients of tight constraints

    *

    g3(x)=0

    g2(x)=0

    g3(x

    *

    )

    -f(x*)

    g1(x)=0

    g1(x

    *)

    Feasible region

  • 7/27/2019 constrained non linear programming

    22/40

    170

    Minimizef(x)

    st gj(x) 0,j=1,2,...,m

    hk(x) 0, k=1,2,...,p

    the set of feasible directions is

    Dx'= {d| gjT(x')d< 0 jJx', and hkT(x') d= 0 k}====================================================

    In a feasible directions algorithm we have the following:

    STEP 1 (direction finding)

    To generate an improving feasible direction fromx', we need to finddsuch that

    Fx'Dx', i.e.,

    fT(x')d< 0 and gjT(x')d< 0 forjJx', hkT(x') d= 0 kWe therefore solve the subproblem

    Minimize fT(x')dst gjT(x')d< 0 forjJx'

    (0 forj gjlinear)

    hkT (x')d= 0 for all k,along with some normalizing constraints such as

    -1 di 1 fori=1,2,...,n or d2=dTd 1.

    If the objective of this subproblem is negative, we have an improving feasible

    directiondi.

  • 7/27/2019 constrained non linear programming

    23/40

    171

    In practice, the actual subproblem we solve is slightly different since strict inequality

    (

  • 7/27/2019 constrained non linear programming

    24/40

    172

    NOTE: (1) In practice the setJx' is usually defined asJx'={ j |gj(xi)+0}, where the

    tolerance iis referred to as the "constraint thickness," which may be

    reduced as the algorithm proceeds.

    (2) This procedure is usually attributed to ZOUTENDIJK

    EXAMPLE

    Minimize 2x12+x2

    2- 2x1x2 - 4x1 - 6x2

    st x1 + x2 8

    -x1 + 2x2 10

    -x1 0

    -x2 0 (A QP)

    For this, we have

    1

    1 , 1

    2 , 1

    0 , 0

    1,

    and 4 2 42 2 6.

    Let us begin with 00,withf(x0)=0

    ITERATION 1 J={3,4}. STEP 1 yields

    Minz Minz

    st fT(x1)dz d1(4x1-2x2-4) + d2(2x2-2x1 -6) zg3T(xi)dz -d1zg4T(xi)dz -d2z-1d1, d21 d1,d2 [-1,1]

  • 7/27/2019 constrained non linear programming

    25/40

    173

    i.e., Minz

    st -4d1- 6d2 z, -d1z, -d2z d1,d2 [-1,1]

    yieldingz*

    = -1 (

  • 7/27/2019 constrained non linear programming

    26/40

    174

    Thus max = { | 8-8, -4+810, -40, -40} = 4

    We therefore solve: Minimize 2(4-)2+ 4

    2 2(4-)4 4(4-) 6(4), st 04.

    =1=1

    34

    , withf(x2)= -26.

    ITERATION 3 J={}. STEP 1 yields

    Minz

    st d1(4x1-2x2-4) + d2(2x2-2x1 -6) z

    -1d1, d21

    i.e., Minz

    st -4d2 z, d1,d2 [-1,1]

    yieldingz*

    = -4 (

  • 7/27/2019 constrained non linear programming

    27/40

    175

    ITERATION 4 J={1}. STEP 1 yields

    Minz

    st d1(4x1-2x2-4) + d2(2x2-2x1 -6) z

    d1+ d2z

    -1d1, d21

    i.e., Minz

    st -2d1- 10d2 z, d1d2z, d1,d2 [-1,1]

    yieldingz* = 0, withd* = 00. STOP

    Therefore the optimum solution is given by 35, withf(x*)= -29.

    1 2 3 4 5 6 87

    1

    2

    3

    4

    5

    6

    8

    7

    x0

    x1

    x2

    x3

    -x1+2 x2=10

    x1+ x2=8

    path followed by algorithm

    feasible region

  • 7/27/2019 constrained non linear programming

    28/40

    176

    GRADIENT PROJECTION METHOD (ROSEN)

    Recall that the steepest descent direction is -f(x). However, for constrained

    problems, moving along -f(x) may destroy feasibility. Rosen's method works by

    projecting -f(x) on to the hyperplane tangent to the set of active constraints. By

    doing so it tries to improve the objective while simultaneously maintaining

    feasibility. It usesd=-Pf(x) as the search direction, whereP is a projection matrix.

    Properties ofP (nn matrix)

    1)PT =P andPTP = P (i.e.,Pis idempotent)2)P is positive semidefinite3)P is a projection matrix if, and only if,I-P is also a projection matrix.4) Let Q=I-P andp= Px1, q=Qx2 wherex1, x2Rn. Then

    (a)pTq= q

    Tp =0 (p and q are orthogonal)

    (b) AnyxRn can be uniquely expressed asx=p+q, wherep=Px, q=(I-P)x

    x2

    x1

    x

    p

    q

    givenx, we can

    findp andq

  • 7/27/2019 constrained non linear programming

    29/40

    177

    Consider the simpler case where all constraints are linear. Assume that there are 2

    linear constraints intersecting along a line as shown below.

    Obviously, moving along -f would take us outside the feasible region. Say we

    project -f on to the feasible region using the matrixP.

    THEOREM: As long asP is a projection matrix andPf0,d=-Pfwill be animproving direction.

    PROOF: fTd= fT(-Pf) = -fTPTPf= -Pf2 < 0. (QED)

    For -Pf to also be feasible we must have g1T (-Pf) and g2T (-Pf) 0 (bydefinition).

    -f

    ( -P)(-f)g2

    g1

    g1(x)=0 g2(x)=0

    P(-f)

  • 7/27/2019 constrained non linear programming

    30/40

    178

    Say we pickP such that g1T(-Pf) = g2T(-Pf) = 0, so that -Pf is a feasibledirection). Thus if we denoteMas the (2n) matrix where Row 1 is g1T andRow 2 is g2T, then we haveM(-Pf)= 0 (*)

    To find the form of the matrixP, we make use of Property 4(b) to write the vector -fas -f= P(-f) + [I-P](-f) = -Pf - [I-P]f

    Now, -g1 and -g2 are both orthogonal to -Pf, and so is [I-P](-f). Hence we must

    have

    [I-P](-f) =- [I-P]f= 1g1+ 2g2

    -[I-P]f = MT, where = -M[I-P]f = MMT(MMT)-1M[I-P]f= (MMT)-1MMT= = { (MMT)-1Mf (MMT)-1 MPf}= { (MMT)-1Mf+(MMT)-1M(-Pf)}= (MMT)-1Mf (from (*) aboveM(-Pf)=0)

    Hence we have

    -f = -Pf - [I-P]f= -Pf +MT-Pf +MT(MMT)-1Mfi.e.,Pf = f -MT(MMT)-1Mf= [I-MT(MMT)-1M]fi.e., P = [I-M

    T(MM

    T)

    -1]

  • 7/27/2019 constrained non linear programming

    31/40

    179

    Question: What ifPf(x) = 0?Answer: Then 0 =Pf = [IMT(MMT)-1M]f= f +MTw, wherew = -(MM

    T)

    -1Mf. Ifw0,thenx satisfies the Karush-Kuhn-Tucker conditions and

    we may stop; if not, a new projection matrix could be identified such thatd= -fis an improving feasible direction. In order to identify this matrix we merely pickany component ofw that is negative, and drop the corresponding row from the matrix

    M. Then we use the usual formula to obtain .

    ====================

    Now consider the general problem

    Minimizef(x)

    st gj(x) 0,j=1,2,...,m

    hk(x) 0, k=1,2,...,p

    The gradient projection algorithm can be generalized to nonlinear constraints by

    projecting the gradient on to the tangent hyperplane atx (rather than on to the surface

    of the feasible region itself).

  • 7/27/2019 constrained non linear programming

    32/40

    180

    STEP 1: SEARCH DIRECTION

    Letxibe a feasible point and letJbe the "active set", i.e.,J= {j | gj(x

    i)=0}, or more

    practically,J= {j | gj(xi)+0}.

    Suppose eachjJis approximated using the Taylor series expansion by:

    gj(x) = gj(xi) + (x-x

    i)gj(xi).

    Notice that this approximation is a linear function ofx with slope gj(xi). LetMmatrix whose rows are gjT(xi) forjJ(active constraints) and hkT(xi)

    fork=1,2,...,p (equality constraints)

    Let letP=[I-MT(MMT)-1M] (projection matrix). IfMdoes not exist, letP=I. Letdi = -Pf(xi).

    a) Ifdi =0 andMdoes not exist STOPb) Ifdi =0 andMexists find

    = -(MMT)

    -1Mf(xi)

    where u corresponds tog and v toh. Ifu>0 STOP, else delete the row ofM

    corresponding to some uj

  • 7/27/2019 constrained non linear programming

    33/40

    181

    STEP 2. STEP SIZE (Line Search)

    SolveMinimize

    where max= Supremum {| gj(xi+d

    i) 0, hj(x

    i+d

    i) =0}.

    Call the optimal solution *and findx

    i+1= x

    i+

    *d

    i. If all constraints are not linear

    make a "correction move" to returnx into the feasible region. The need for this is

    shown below. Generally, it could be done by solvingg(x)=0 starting atxi+1using the

    Newton Raphson approach, or by moving orthogonally fromxi+1

    etc.

    g1, g2 are linear constraints g1 is a nonlinear constraint

    NO CORRECTION REQD. REQUIRES CORRECTION

    g1=0

    g2=0 g1f

    Pf

    O.F. contours f

    Pfg1=0

    O.F. contours

    correction move

    xi+1

  • 7/27/2019 constrained non linear programming

    34/40

    182

    LINEARIZATION METHODS

    The general idea of linearization methods is to replace the solution of the NLP by the

    solution of a series of linear programs which approximate the original NLP.

    The simplest method is RECURSIVE LP:

    Transform Minf(x), st gj(x), j=1,2,...,m,

    into the LP

    Min {f(x

    i

    ) + (x-x

    i

    )

    Tf(x

    i

    )}

    st gj(xi) + (x-x

    i)

    Tgj(xi) 0, for allj.Thus we start with some initialx

    iwhere the objective and constraints (usually only

    the tight ones) are linearized, and then solve the LP to obtain a newxi+1

    and

    continue...

    Note thatf(xi) is a constant andx-xi=dimplies the problem is equivalent to

    Min dTf(xi)

    st dTgj(xi) -gj(xi), for allj.

    and this is a direction finding LP problem, andxi+1

    =xi+dimplies that the step

    size=1.0!

    EXAMPLE: Minf(x)= 4x1 x22

    -12,

    st g1(x) =x12

    +x22

    - 25 0,

    g2(x) = -x12

    -x22

    +10x1 + 10x2 - 34 0

    If this is linearized at the pointxi=(2,4), then since

  • 7/27/2019 constrained non linear programming

    35/40

    183

    42 4

    8; 22

    48;

    2 102 10 62

    it yields the following linear program (VERIFY)

    Min(x) = 4x1 8x2 + 4,st 1(x) = 4x1 + 8x2 - 45 0,

    2(x) = 6x1 + 2x2 - 14 0

    The method however has limited application since it does not converge if the local

    minimum occurs at a point that is not a vertex of the feasible region; in such cases it

    oscillates between adjacent vertices. To avoid this one may limit the step size (for

    instance, the Frank-Wolfe method minimizesfbetweenxiandx

    i+dto getx

    i+1).

    f=-20

    f=-10

    f=0

    =-30

    =01=0

    2=0

    1=0

    2=0

    *

  • 7/27/2019 constrained non linear programming

    36/40

    184

    RECURSIVE QUADRATIC PROGRAMMING (RQP)

    RQP is similar in logic to RLP, except that it solves a sequence of quadratic

    programming problems. RQP is a better approach because the second order terms

    help capture curvature information.

    Min dTL(xi) + dTBd

    stdTgj (xi) -gj(xi) j

    whereB is the Hessian (or an approximation of the Hessian) of the Lagrangian

    function of the objective functionfat the pointxi. Notice that this a QP indwith

    linear constraints. Once the above QP is solved to getdi, we obtainx

    iby finding an

    optimum step size alongxi+d

    iand continue. The matrixB is never explicitly

    computed --- it is usually updated by schemes such as the BFGS approach using

    information from f(xi) and f(xi+1) only. (Recall the quasi-Newton methods forunconstrained optimization)

  • 7/27/2019 constrained non linear programming

    37/40

    185

    KELLEY'S CUTTING PLANE METHOD

    One of the better linearization methods is the cutting plane algorithm of J.E.Kelley

    developed for convex programming problems.

    Suppose we have a set of nonlinear constraints gj(x)0 forj=1,2,...,m that determine

    the feasible region G. Suppose further that we can find a polyhedral set H that

    entirely contains the region determined by these constraints, i.e., GH.

    In general, a cutting plane algorithm would operate as follows:

    1. Solve the problem with linear constraints.2. If the optimal solution to this problem is feasible in the original (nonlinear)

    constraints then STOP; it is also optimal for the original problem.

    3. Otherwise add an extra linear constraint (i.e., an extra line/hyperplane) so that thecurrent optimal point becomes infeasible for the new problem with the additional

    constraint (we thus "cut out" part of the infeasibility).

    Now return to Step 1.

    1=0

    2=0

    3=0G

    1=0

    H

    2=03=0

    4=0

    G:

    j(x): nonlinearj(x): linear; hence

    His polyhedral

  • 7/27/2019 constrained non linear programming

    38/40

    186

    JUSTIFICATION

    Step 1: It is in general easier to solve problems with linear constraints than ones with

    nonlinear constraints.

    Step 2: G is determined by gj(x)0,j=1,...,m, where each gjis convex, while

    H is determined by hk0 k, where each hkis linear. G is wholly contained inside

    the polyhedral set H.

    Thus every point that satisfies gj(x)0 automatically satisfies hk(x)0, but not

    vice-versa. Thus G places "extra" constraints on the design variables and therefore

    the optimal solution with G can never be any better than the optimal solution with H.

    Therefore, if for some region H, the optimal solutionx*

    is also feasible in G, it

    MUST be optimal forG also. (NOTE: In general, the optimal solution forH at some

    iteration will not be feasible in G).

    Step 3: Generating a CUT

    In Step 2, let Maximum , whereJis the set of violated constraints,

    J={j | gj(x)>0}, i.e., is the value of the most violated constraint atxi

    . By the

    Taylor series approximation aroundxi,

    +

    , for everyxRn (since is convex).

  • 7/27/2019 constrained non linear programming

    39/40

    187

    If then

    But since is violated atxi, 0. Therefore 0xRn.

    is violated for allx,

    the original problem is infeasible.

    If , then consider the following linear constraint:

    i.e., 0 ()

    (Note thatxi, and are all known)

    The current pointxiis obviously infeasible in this constraint --- since hk(x

    i) 0

    implies that 0, which contradicts the fact thatis violated atxi , (i.e, (xi) >0).

    So, if we add this extra linear constraint, then the optimal solution would

    change because the current optimal solution would no longer be feasible for the new

    LP with the added constraint. Hence () defines a suitable cutting plane.

    NOTE: If the objective for the original NLP was linear, then at each iteration we

    merely solve a linear program!

  • 7/27/2019 constrained non linear programming

    40/40

    188

    An Example (in R2) of how a cutting plane algorithm might proceed.

    (--------- cutting plane).

    Nonnegativity + 5 linear

    constraints

    G

    3H

    21

    4=x*

    5 6

    G

    3H

    21

    45

    True optimum

    x*

    forG

    Nonnegativity + 6 linear

    constraints

    Nonnegativity + 4 linear

    constraints

    G

    3H

    21

    4

    Nonnegativity + 3 linear

    constraints

    G

    3H

    21

    h4

    h5

    h6