

Journal of Process Control 15 (2005) 407–415

Introducing the bounded derivative network—superceding the application of neural networks in control

Paul Turner *, John Guiver

Aspen Technology Inc., 1293 Eldridge Parkway, Houston, TX, USA

Received 7 January 2003; received in revised form 11 March 2004; accepted 10 August 2004

* Corresponding author. Tel.: +44 1325 751 332; fax: +44 1925 844484. E-mail address: [email protected] (P. Turner).

doi:10.1016/j.jprocont.2004.08.001

Abstract

The Bounded Derivative Network (BDN), the analytical integral of a neural network, is a natural and elegant evolution of universal approximating technology for use in automatic control schemes. This modeling approach circumvents the many real problems associated with standard neural networks in control such as model saturation (zero gain), arbitrary model gain inversion, 'black box' representation and inability to interpolate sensibly in regions of sparse excitation. Although extrapolation is typically not an advantage unless the understanding of the process is complete, the BDN can incorporate process knowledge in order that its extrapolation capability is inherently sensible in areas of data sparsity. This ability to impart process knowledge on the BDN model enables it to be safely incorporated into a model based control scheme.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Nonlinear control; Neural Networks

1. Introduction

Recent publications have seriously questioned the relevance of neural networks (see Figs. 1 and 2) in automatic control [10,11,2,15]. The sophisticated suppression mechanisms that accompany this technology for automatic control applications [6,3] clearly show the lack of trust engineers have in the current technology. When the list of their control related inadequacies [10] is combined with these methods of suppressing the influence of the neural network on the controller, the contribution of neural networks to enhanced control performance comes into serious question. On investigation of the Bounded Derivative Network, many naturally evolving properties are identified that make this technology a more appropriate modeling paradigm for automatic control applications. This paper describes the algorithm in detail and presents the application of this technology to a commercial polypropylene process. The BDN technology is not a black box modeling paradigm (in much the same way a linear model is not a black box). The ability to specify minimum and maximum gains on each input/output relationship and also the gain trajectory (i.e. whether the gain increases or decreases with input value) creates a transparent model, the properties of which are globally understood.

Fig. 1. Artificial neural network (block diagram: inputs, scaling block, summation and bounded transform, scaling block, output; with model bias and model coefficients (multipliers, weights)).

Fig. 2. Hyperbolic tangent nonlinearity: output = tanh(Z) plotted against the input variable Z over [−5, 5].

2. Neural network dangers in control

The derivative of a neural network model is indeed bounded. However, the lower gain bound of a neural network is always either 0 (resulting in catastrophic controller singularities) or of opposite sign to the upper gain (which will cause plant valves to open when they should be closing on monotonic processes). The important point to stress here is that these problems are intrinsic to neural networks. The regions of 'zero gain' and 'Gaussian decay' cannot be algorithmically eliminated—they are always there. Indeed, any attempt to eliminate the problem via algorithm (as opposed to architecture) actually exacerbates the situation as this simply masks the problem and creates latent dangers. This has severe implications for closed loop control applications [15]. The BDN (in contrast) can be globally constrained to guarantee such conditions will never occur. This renders the model transparent and well behaved in regions of interpolation and extrapolation.

3. Introducing the bounded derivative network

The Bounded Derivative Network is essentially the analytical integral of a neural network. The example demonstrated here is the analytical integral of a hyperbolic tangent neural network, but the principles apply equally to the integral of the sigmoidal neural model architecture.

The first step in building the integral of a neural network is to integrate the activation function. A bias term is added to the hyperbolic tangent so that the integral of the function (the new bounded derivative activation function) can be forced to be monotonic by setting the bias greater than +1. The integral of a hyperbolic tangent can be calculated as follows:

$$\int \left(\tanh(ax + a_0) + k\right) dx = \frac{1}{a}\log\left(\cosh(ax + a_0)\right) + kx + c \quad (1)$$

In the final model, the output of each activation function will be multiplied by an independent coefficient. The $\frac{1}{a}$ term can be absorbed into this coefficient. When the bounded derivative network is differentiated, a partial differentiation is performed between the input in question and the model output. Hence the $a_0$ term in Eq. (1) will be a combination of the bias to that particular hidden node and the weighted sum of the other inputs. The Bounded Derivative activation function is therefore as follows:

$$f(\cdot) = k_1 \log\left(\cosh\left(a_0 + \sum_{i=1}^{n_i} a_i x_i\right)\right) + k_2\left(a_0 + \sum_{i=1}^{n_i} a_i x_i\right) \quad (2)$$

The partial derivative of this function with respect to any of the inputs is a hyperbolic tangent transformation (plus a constant).
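To make Eq. (2) concrete, the following minimal Python sketch (not the authors' implementation) evaluates the bounded derivative activation for a single pre-activation z and confirms by finite differences that its derivative is a hyperbolic tangent plus a constant. The values k1 = 1 and k2 = 1.1 are illustrative choices, with k2 > 1 playing the role of the monotonicity-forcing bias described above:

```python
import numpy as np

def bdn_activation(z, k1=1.0, k2=1.1):
    """Bounded derivative activation of Eq. (2): k1*log(cosh(z)) + k2*z."""
    # log(cosh(z)) computed stably as |z| + log(1 + exp(-2|z|)) - log(2)
    return k1 * (np.abs(z) + np.log1p(np.exp(-2.0 * np.abs(z))) - np.log(2.0)) + k2 * z

def bdn_activation_grad(z, k1=1.0, k2=1.1):
    """Analytical derivative: k1*tanh(z) + k2, a tanh plus a constant."""
    return k1 * np.tanh(z) + k2

# Finite-difference check that the derivative really is tanh plus a constant
z = np.linspace(-5.0, 5.0, 11)
fd = (bdn_activation(z + 1e-6) - bdn_activation(z - 1e-6)) / 2e-6
assert np.allclose(fd, bdn_activation_grad(z), atol=1e-5)
```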

The final step in the integration of the neural network is the integration of the model bias. This results in a linear sum of the input variables added to the output node of the network. Although this step is technically redundant (due to the linear term in Eq. (2)), it provides a simple exposure of a bias to each input/output gain profile that can be manipulated post identification if required. Fig. 3 displays the architecture of this formulation. The pre and post scaling is assumed to be external to the architecture outlined in Fig. 3.

Fig. 3. The Bounded Derivative Network (Layer 0: input; Layer 1: constant bias; Layer 2: transform; Layer 3: linear hidden; Layer 4: non-linear activation; Layer 5: linear activation; Layer 6: output).

One of the many advantages of this architecture is that theoretical global bounds on the minimum and maximum input/output gains (i.e. the partial derivatives of the model with respect to each input) can be calculated and therefore constrained. The general equation of the architecture detailed in Fig. 3 is as follows:


$$y = w^{(6,1)}_{11} + \sum_i w^{(6,2)}_{1i} w^{(2,0)}_{ii} x_i + \sum_j w^{(6,5)}_{1j}\left(w^{(5,4)}_{jj}\,\log\cosh\left(w^{(3,1)}_{j1} + \sum_i w^{(3,2)}_{ji}\left(w^{(2,0)}_{ii} x_i\right)\right) + w^{(5,3)}_{jj}\left(w^{(3,1)}_{j1} + \sum_i w^{(3,2)}_{ji}\left(w^{(2,0)}_{ii} x_i\right)\right)\right) \quad (3)$$

Weights between nodes in each layer are notated as $w^{(p,q)}_{ij}$, which represents the connection weight from the jth node in the qth layer to the ith node in the pth layer (q < p). The additional term $w^{(2,0)}_{kk}$ is an extra degree of freedom that can be used to govern the sign of the derivative and also block unwanted inputs or states from entering the nonlinear mapping.

The derivative of the Bounded Derivative Network model described in (3) with respect to one of the input variables can be calculated as

$$\frac{\partial y}{\partial x_k} = w^{(2,0)}_{kk}\left(w^{(6,2)}_{1k} + \sum_j w^{(6,5)}_{1j} w^{(3,2)}_{jk}\left(w^{(5,3)}_{jj} + w^{(5,4)}_{jj}\tanh\left(w^{(3,1)}_{j1} + \sum_i w^{(3,2)}_{ji} w^{(2,0)}_{ii} x_i\right)\right)\right) \quad (4)$$

It can be noted that Eq. (4) is the equation of a Multi-layer Perceptron with a single hidden layer. This demonstrates that the derivative of the Bounded Derivative Network is indeed a neural network. This is significant in that this model architecture is a universal approximator for both the input/output mapping of the model and for its first derivatives. This ability to approximate both pattern based and underlying behavior is the basis of the powerful, intelligent extrapolation capability observed with this new architecture. This is in stark contrast to a neural network, for which the derivatives of the model architecture are unconstrained, and in fact are compromised in order to obtain a more accurate mapping.
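As an illustration, Eqs. (3) and (4) translate directly into a few lines of Python. The sketch below uses random hypothetical weights named after the $w^{(p,q)}$ notation (it is not the commercial implementation) and checks Eq. (4) against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4  # illustrative sizes

# Hypothetical weights, named after the w^(p,q)_ij notation of Eq. (3)
w61 = rng.normal()                     # output bias w^(6,1)_11
w62 = rng.normal(size=n_in)            # direct linear weights w^(6,2)_1i
w20 = np.ones(n_in)                    # input transform w^(2,0)_ii (set to 1 here)
w31 = rng.normal(size=n_hid)           # hidden biases w^(3,1)_j1
w32 = rng.normal(size=(n_hid, n_in))   # hidden weights w^(3,2)_ji
w53 = rng.normal(size=n_hid)           # linear activation weights w^(5,3)_jj
w54 = np.abs(rng.normal(size=n_hid))   # log-cosh weights w^(5,4)_jj, kept positive
w65 = rng.normal(size=n_hid)           # output weights w^(6,5)_1j

def logcosh(z):
    # numerically stable log(cosh(z))
    return np.abs(z) + np.log1p(np.exp(-2.0 * np.abs(z))) - np.log(2.0)

def bdn(x):
    """BDN output y of Eq. (3)."""
    s = w31 + w32 @ (w20 * x)  # hidden pre-activations
    return w61 + w62 @ (w20 * x) + w65 @ (w54 * logcosh(s) + w53 * s)

def bdn_gain(x, k):
    """Partial derivative dy/dx_k of Eq. (4): a one-hidden-layer MLP."""
    s = w31 + w32 @ (w20 * x)
    return w20[k] * (w62[k] + np.sum(w65 * w32[:, k] * (w53 + w54 * np.tanh(s))))

# Finite-difference check that Eq. (4) really is the derivative of Eq. (3)
x, h = rng.normal(size=n_in), 1e-6
for k in range(n_in):
    e = np.zeros(n_in); e[k] = h
    assert np.isclose((bdn(x + e) - bdn(x - e)) / (2 * h), bdn_gain(x, k), atol=1e-5)
```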

To demonstrate the universal approximating capability of the Bounded Derivative Network, it can be assumed that y = f(x) is a given function with a compact domain and is the function to be approximated with the Bounded Derivative Network. y is a scalar, x is a vector. Let $\varepsilon$ be the tolerance within which the function is to be approximated. The derivative of y with respect to the elements of the vector x can be written as follows:

$$g(x) = \frac{dy}{dx} = \frac{df(x)}{dx} \quad (5)$$

g(x) can be universally approximated with a standard neural network within the tolerance $\varepsilon/V$, where V is the hypervolume of the compact domain. Hence:

$$g(x) = NN(x) + e(x) \quad (6)$$

where $|e(x)| < \varepsilon/V$ for all x in the compact domain, NN(x) is a neural network model of g(x), and $\varepsilon$ is the desired model error tolerance for the Bounded Derivative Network (BDN). Integrating Eq. (6) gives:

$$f(x) = BDN(x) + \int e(x)\,dx \quad (7)$$

where $\left|\int e(x)\,dx\right| < \left|V\left(\frac{\varepsilon}{V}\right)\right| = |\varepsilon|$.

The above analysis demonstrates that the integral of a universal approximator (i.e. the Bounded Derivative Network) is also a universal approximator.


Another significant advantage of this architecture is that the global bounds on the derivative can be calculated and therefore constrained during model identification. In order to make the constraints easier to calculate, $w^{(5,4)}_{jj}$ is constrained to be positive. This will not suppress the approximating capability of the model as any constraints placed on these weights can be offset by $w^{(6,5)}_{1j}$. The theoretical minimum and maximum gain of Eq. (4) can be calculated as follows:

Fig. 4. State space Bounded Derivative Network: inputs pass through the [A, B] state equations to produce states, which feed the Bounded Derivative Network to give the output prediction.

$$\left.\frac{\partial y}{\partial x_k}\right|_{\mathrm{bound}(1)} = w^{(2,0)}_{kk}\left(\sum_j w^{(6,5)}_{1j} w^{(3,2)}_{jk} w^{(5,3)}_{jj} - \sum_j\left|w^{(6,5)}_{1j} w^{(3,2)}_{jk} w^{(5,4)}_{jj}\right| + w^{(6,2)}_{1k}\right) \quad (8)$$

$$\left.\frac{\partial y}{\partial x_k}\right|_{\mathrm{bound}(2)} = w^{(2,0)}_{kk}\left(\sum_j w^{(6,5)}_{1j} w^{(3,2)}_{jk} w^{(5,3)}_{jj} + \sum_j\left|w^{(6,5)}_{1j} w^{(3,2)}_{jk} w^{(5,4)}_{jj}\right| + w^{(6,2)}_{1k}\right) \quad (9)$$

If $w^{(2,0)}_{kk}$ is positive (as in this case), then $\partial y/\partial x_k|_{\mathrm{bound}(1)}$ is a minimum theoretical gain and $\partial y/\partial x_k|_{\mathrm{bound}(2)}$ is a maximum theoretical gain. If $w^{(2,0)}_{kk}$ is negative then the situation is reversed. The actual model gains are globally guaranteed to lie within these theoretical limits.
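A minimal sketch of Eqs. (8) and (9), reusing the same hypothetical weights as above: the bounds follow by replacing the tanh term of Eq. (4), which lies in (−1, 1), with ±1, so randomly sampled gains should never leave the resulting interval:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                     # illustrative sizes
w62 = rng.normal(size=n_in)            # w^(6,2)_1i
w20 = np.ones(n_in)                    # w^(2,0)_ii
w31 = rng.normal(size=n_hid)           # w^(3,1)_j1
w32 = rng.normal(size=(n_hid, n_in))   # w^(3,2)_ji
w53 = rng.normal(size=n_hid)           # w^(5,3)_jj
w54 = np.abs(rng.normal(size=n_hid))   # w^(5,4)_jj, constrained positive
w65 = rng.normal(size=n_hid)           # w^(6,5)_1j

def gain(x, k):
    """dy/dx_k from Eq. (4)."""
    s = w31 + w32 @ (w20 * x)
    return w20[k] * (w62[k] + np.sum(w65 * w32[:, k] * (w53 + w54 * np.tanh(s))))

def gain_bounds(k):
    """Theoretical global gain bounds of Eqs. (8) and (9) for input k."""
    lin = np.sum(w65 * w32[:, k] * w53) + w62[k]
    spread = np.sum(np.abs(w65 * w32[:, k] * w54))
    return w20[k] * (lin - spread), w20[k] * (lin + spread)

# Empirical check: gains sampled over a wide input range stay inside the bounds
for k in range(n_in):
    b1, b2 = gain_bounds(k)
    g = [gain(x, k) for x in rng.normal(scale=10.0, size=(2000, n_in))]
    assert min(b1, b2) <= min(g) and max(g) <= max(b1, b2)
```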

Attempts have been made to constrain the gains of neural networks [4], but these are not global constraints and so they simply move the problem to some 'unobserved' region of the input domain. In addition to this, the model performance cited in Hartman's paper yields a model error standard deviation that is greater than the standard deviation of the predicted variable and hence is actually less accurate than assuming that the predicted variable will always equal its mean value. It is not possible to constrain the gain of a neural network to never equal zero since this is an intrinsic feature of the sigmoidal nonlinearity. Constraining the gains at randomly sampled intervals of the input domain simply pushes these zero and near zero gain regions into other areas of the input domain. This is most undesirable for online applications and has significant safety and performance implications [15]. To avoid the problems associated with these hidden 'singularities', on-line neural networks become so heavily enveloped in safety logic that their impact on the process and controller performance is effectively eliminated.

Eqs. (8) and (9) represent the theoretical global bounds on the derivative of the Bounded Derivative Network (it is important to note that the actual bounds are globally guaranteed to be within the limits of the theoretical bounds). This results in a smooth and elegant transition to a linear approximation (constant gain) in regions of extrapolation.

4. Training algorithm

The training algorithm for the BDN consists of the following (a minimal sketch follows the list):

(a) The formulation of a constrained optimisation problem where the constraint sets are determined via process knowledge (such as global minimum and maximum input/output gains). The model predictions are calibrated against historical process data.

(b) The calculation of the first and second derivatives of the objective with respect to the parameters (i.e. weights). This procedure uses ordered derivatives [17]—i.e. these derivatives are obtained by propagating derivative information back through the network.

(c) The solution of the optimisation problem, which can utilize any standard NLP (Non-Linear Programming) solver.
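As an illustration only, the sketch below trains a tiny single-input BDN on synthetic data using scipy's SLSQP solver, with Eqs. (8) and (9) imposed as gain-bound constraints. Everything here is hypothetical (the data, the three-node network, the bounds m = 0.2 and M = 3.0), and SLSQP's finite-difference gradients stand in for the ordered-derivative calculations of step (b):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(-2.0, 2.0, 60)
y = np.log(np.cosh(x)) + 1.2 * x  # hypothetical monotonic "process" data

m_gain, M_gain = 0.2, 3.0  # hypothetical user-specified gain bounds

def unpack(p):
    # single input, three hidden nodes: w61, w62, then w31, w32, w53, w54, w65
    return p[0], p[1], p[2:5], p[5:8], p[8:11], p[11:14], p[14:17]

def predict(p, x):
    w61, w62, w31, w32, w53, w54, w65 = unpack(p)
    s = np.outer(x, w32) + w31  # hidden pre-activations, shape (N, 3)
    lc = np.abs(s) + np.log1p(np.exp(-2.0 * np.abs(s))) - np.log(2.0)
    return w61 + w62 * x + (np.abs(w54) * lc + w53 * s) @ w65

def sse(p):
    return np.sum((predict(p, x) - y) ** 2)

def gain_bounds(p):
    # Eqs. (8) and (9) with w^(2,0) = 1; abs(w54) mirrors the positivity constraint
    w61, w62, w31, w32, w53, w54, w65 = unpack(p)
    lin = np.sum(w65 * w32 * w53) + w62
    spread = np.sum(np.abs(w65 * w32 * np.abs(w54)))
    return lin - spread, lin + spread

# Gain-bound constraints: minimum bound >= m_gain, maximum bound <= M_gain
cons = [{"type": "ineq", "fun": lambda p: gain_bounds(p)[0] - m_gain},
        {"type": "ineq", "fun": lambda p: M_gain - gain_bounds(p)[1]}]
res = minimize(sse, rng.normal(scale=0.3, size=17), method="SLSQP", constraints=cons)
print("SSE:", round(res.fun, 4), "gain bounds:", gain_bounds(res.x))
```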

5. The state space bounded derivative network

The Bounded Derivative Network can be extended to dynamic model form utilizing the Wiener model architecture displayed in Fig. 4. The theoretical global bound calculations described in Eqs. (8) and (9) now refer to the steady state gain bounds on the state vectors. Each input has a specific contribution to each state which now needs to be incorporated into the constraint calculations.

Eqs. (8) and (9) refer to the derivatives between the inputs to the model and the target output. However, in the State Space Bounded Derivative Model case, the inputs to the model are normalized dynamic states. The contribution of each 'unscaled' state to the steady state gain of a specific input/output relationship can be derived from the state equations. The dynamic state equations can be written as

$$x_t = A x_{t-1} + B u_t \quad (10)$$

The values of the states at steady-state (assuming stable dynamics) can be calculated as

$$x_{ss} = \left[(I - A)^{-1} B\right] u_t \quad (11)$$


We now define the matrix W = [(I − A)⁻¹B], which is a matrix of dimension (number of states, number of inputs) and therefore provides a set of multipliers which map the contribution of each 'unscaled' state to the steady state gain of each input. It is then the sum of these individual steady state gains that makes the overall input/output gain, and it is this summation that requires constraining.

In addition, the input variables may be directly connected to the nonlinear model as well as the states (the equivalent of a D matrix connection in linear state space theory). If input variables are to be included as inputs to the Bounded Derivative Network as well as the states, then their contribution to the overall steady state gain must be considered. This is indicated by the steady state matrix W = [(I − A)⁻¹B] being appended with an identity matrix equal in size to the number of input variables.

The dimensions of this W matrix are therefore (number of states + number of inputs, number of inputs). Hence, each column of W refers to a specific input variable and each row refers to a specific input to the Bounded Derivative Network (either a state variable or a direct input variable). Indeed, multiple W matrices can be created to represent not only the steady state gains but also dynamic gains across the model prediction horizon.
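A minimal sketch of this construction, with hypothetical A and B matrices:

```python
import numpy as np

# Hypothetical stable two-state, one-input system (illustrative values only)
A = np.array([[0.8, 0.1],
              [0.0, 0.7]])
B = np.array([[0.5],
              [0.2]])
n_states, n_inputs = B.shape

# Steady state multipliers W = (I - A)^{-1} B, one row per state (from Eq. (11))
W = np.linalg.solve(np.eye(n_states) - A, B)

# If inputs also feed the BDN directly (the D-matrix-style connection),
# append an identity block so each direct input maps one-to-one to itself
W_full = np.vstack([W, np.eye(n_inputs)])
print(W_full.shape)  # (n_states + n_inputs, n_inputs)
```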

Eqs. (8) and (9) will now need to be modified to include this new matrix W. However, the constraints need to be calculated in terms of the unscaled states and the unscaled output. The resulting constraints for the input/output gains can now be calculated as follows:

$$\left.\frac{\partial y}{\partial u_i}\right|_{\mathrm{bound}(1)} = \sum_{k=1}^{\text{number of states}} W_{ki}\,\frac{\sigma(y)}{\sigma(u_i)}\,w^{(2,0)}_{kk}\left(\sum_j w^{(6,5)}_{1j} w^{(3,2)}_{jk} w^{(5,3)}_{jj} - \sum_j\left|w^{(6,5)}_{1j} w^{(3,2)}_{jk} w^{(5,4)}_{jj}\right| + w^{(6,2)}_{1k}\right) \quad (12)$$

$$\left.\frac{\partial y}{\partial u_i}\right|_{\mathrm{bound}(2)} = \sum_{k=1}^{\text{number of states}} W_{ki}\,\frac{\sigma(y)}{\sigma(u_i)}\,w^{(2,0)}_{kk}\left(\sum_j w^{(6,5)}_{1j} w^{(3,2)}_{jk} w^{(5,3)}_{jj} + \sum_j\left|w^{(6,5)}_{1j} w^{(3,2)}_{jk} w^{(5,4)}_{jj}\right| + w^{(6,2)}_{1k}\right) \quad (13)$$

where $u_i$ refers to a specific model input, x now refers to a state input variable to the Bounded Derivative Network, and $\sigma(\cdot)$ represents the standard deviation of the unscaled variable in question.

6. Constraint calculations

For automatic control, the ability to globally constrain the derivatives of the model between an upper and a lower limit is of paramount importance. The BDN allows the engineer to formulate model constraints (based on fundamental process knowledge) in order to generate a model with global behavior that is known and understood. The constraint equations can be summarised as follows:

Constraint Set 1. Constrain $w^{(6,5)}_{1j} w^{(3,2)}_{jk} > 0$

Constraint Set 2a. Constrain $w^{(5,4)}_{jj} \geq 0$

Constraint Set 2b. Constrain $w^{(4,3)}_{jj} = 1$

Constraint Set 3a. Constrain $w^{(2,0)}_{kk} = 1$

OR

Constraint Set 3b. Constrain $w^{(2,0)}_{kk} = 0$

Constraint Set 3b allows the disconnection of direct inputs to the model if required.

Constraint Sets 1 and 2 allow a simple calculation of the modulus component of Eqs. (12) and (13).

Constraint Set 4. (Minimum Gain Bound Constraint) Constrain

$$\sum_{k=1}^{k_{max}} W_{ki}\,\frac{\sigma(y)}{\sigma(x_k)}\,w^{(2,0)}_{kk}\left(\sum_j w^{(6,5)}_{1j} w^{(3,2)}_{jk}\left(w^{(5,3)}_{jj} - \mathrm{sign}(W_{ki})\,w^{(5,4)}_{jj}\right) + w^{(6,2)}_{1k}\right) \geq m_i$$

Constraint Set 5. (Maximum Gain Bound Constraint) Constrain

$$\sum_{k=1}^{k_{max}} W_{ki}\,\frac{\sigma(y)}{\sigma(x_k)}\,w^{(2,0)}_{kk}\left(\sum_j w^{(6,5)}_{1j} w^{(3,2)}_{jk}\left(w^{(5,3)}_{jj} + \mathrm{sign}(W_{ki})\,w^{(5,4)}_{jj}\right) + w^{(6,2)}_{1k}\right) \leq M_i$$

$k_{max}$ includes all state variables and direct input connections; i refers to the ith input variable; k refers to the kth state (or direct input); $m_i$ is the user specified minimum gain bound for input i; $M_i$ is the user specified maximum gain bound for input i.
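Because Constraint Sets 1 and 2 fix the signs inside the modulus of Eqs. (12) and (13), the bound collapses to the sign(W_ki) form used in Sets 4 and 5. The following minimal sketch (all argument names are hypothetical) evaluates it:

```python
import numpy as np

def gain_bound(W_col, sigma_y, sigma_x, w20, w62, w32, w53, w54, w65, sense):
    """Constraint Sets 4/5: bound on one input's overall steady state gain.

    sense = -1 gives the minimum bound (Set 4), sense = +1 the maximum
    (Set 5). W_col holds W_ki over all states/direct inputs k of input i.
    Relies on Constraint Sets 1 and 2 (w65*w32 > 0, w54 >= 0), under which
    the modulus of Eqs. (12)/(13) reduces to the sign(W_ki) form."""
    total = 0.0
    for k, W_ki in enumerate(W_col):
        inner = np.sum(w65 * w32[:, k] * (w53 + sense * np.sign(W_ki) * w54))
        total += W_ki * (sigma_y / sigma_x[k]) * w20[k] * (inner + w62[k])
    return total
```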

In addition to these five constraints, the model behavior can also be constrained such that the relationship between the model gains and the input variables is growth or decay (e.g. for growth, increasing the input variable will also increase the model gain). These constraints are therefore optional:

Constraint Sets 6 and 7. (For Growth/Decay Specification) For both Growth and Decay, the following constraint must be enforced:

$$c_6:\; 1 - \left(w^{(5,3)}_{jj}\right)^2 \leq 0$$

If a 'Growth' gain trajectory is required, the following constraint applies:

$$c_7:\; -w^{(5,3)}_{jj} w^{(6,5)}_{1j} < 0 \quad \text{(Growth)}$$

If a 'Decay' gain trajectory is required, then the following constraint applies:

$$c_7:\; w^{(5,3)}_{jj} w^{(6,5)}_{1j} < 0 \quad \text{(Decay)}$$

7. Analytical constraint derivatives

The analytical constraint derivatives can be explicitly calculated for each constraint and for each model coefficient. For example, the derivative of constraint 4 with respect to the output weights can be calculated as

$$\frac{\partial c_{4i}}{\partial w^{(6,5)}_{1j}} = -\sum_{k=1}^{k_{max}} W_{ki}\,\frac{\sigma(y)}{\sigma(x_k)}\,w^{(2,0)}_{kk} w^{(3,2)}_{jk}\left(w^{(5,3)}_{jj} - \mathrm{sign}(W_{ki})\,w^{(5,4)}_{jj}\right) \quad (14)$$

The above constraint derivative is calculated for each output weight coefficient. Model training is performed by using an NLP solver fed by the constraints and constraint derivatives outlined above, and using gradient backpropagation to calculate the derivatives of the training objective with respect to the weights.

The capability to impose these constraints prior to model identification ensures several desirable features for a control model not possible with neural networks. The following properties are then globally guaranteed:

• Monotonicity (if required);
• Model approximation occurs at both input/output level and first derivative (i.e. accurate gain approximation);
• Absolute minimum and maximum gain prediction capability (no more zero gain predictions—an intrinsic problem with neural networks);
• Guaranteed invertibility both in the steady state and dynamic model form;
• Elegant and intelligent extrapolation capability.

8. Extrapolation

In an environment of models built on sparsely distributed historical data, extrapolation and interpolation capability are inherently linked. Whereas a neural network will predict zero gains in regions of sparse interpolation or extrapolation, the BDN gain predictions are inherently sensible. There is always a risk in extrapolating if the system is not fully understood. However, this same argument can be made for the many thousands of processes currently successfully controlled using linear models. In these applications all extrapolation and interpolation is linear. The extrapolation is determined based on the known gain behavior around the observed operating point. In the case of the BDN model the same approach can be taken. The engineer can specify a minimum and maximum acceptable gain for each input/output relationship and the model can then be calibrated on historical process data. Then, when the model is forced to extrapolate or interpolate, the resultant gain predictions are constrained to be in line with known process behavior. Indeed, the log(cosh()) activation function can be approximated as a multivariable quadratic spline between two linear hyperplanes. This enables the BDN technology to create an intelligent spline between two known operating regions where the spline is governed by constraints based on fundamental process knowledge. This is in contrast to a neural network, where the derivatives of the model naturally attenuate to zero in unfamiliar operating regions, resulting in inappropriately large controller gains and unstable control performance.
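The spline view is easy to check numerically: log(cosh(z)) behaves like z²/2 near the origin (a quadratic) and like |z| − log 2 far from it (a linear hyperplane). A minimal sketch:

```python
import numpy as np

def logcosh(z):
    # numerically stable log(cosh(z))
    return np.abs(z) + np.log1p(np.exp(-2.0 * np.abs(z))) - np.log(2.0)

z_small = np.linspace(-0.1, 0.1, 5)
z_large = np.array([-8.0, 8.0, 15.0])

# Near zero: quadratic behavior, log(cosh(z)) ~ z^2 / 2
assert np.allclose(logcosh(z_small), z_small**2 / 2.0, atol=1e-5)

# Far from zero: linear behavior, log(cosh(z)) ~ |z| - log(2)
assert np.allclose(logcosh(z_large), np.abs(z_large) - np.log(2.0), atol=1e-6)
```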

Attempts to use short cut connections to enhance the extrapolation capability of neural networks (such as that described by Zhao et al. [16]) are less capable because they essentially reduce the model to the same linear prediction in all regions of extrapolation or sparse interpolation. This rather arbitrary behavior is unsuitable for the majority of nonlinear control applications.

9. Overfitting

Overfitting is a phenomenon associated with black box models with high parameterisation that are geared toward pattern recognition rather than underlying system identification. Multi-layer Perceptrons and especially Radial Basis Function networks are prone to overfitting because, as well as the large number of parameters, the transfer functions can be pieced together easily to make arbitrary surfaces. Thus functions can be built up with a set of functions with localized domain. BDNs do not have this problem, as the global extent of the transfer functions makes it difficult to build up arbitrary functions in this way. So there is an inherent barrier against overfitting. In addition, gain constraints and growth/decay constraints make the training of a BDN a much more constrained optimisation problem than the unconstrained training of an ANN.

10. Examples

Fig. 5 displays a typical input/output curve for a monotonic Bounded Derivative Network. The elegant linear extrapolation ranges can be observed at either end of the input range.

Fig. 5. Bounded derivative activation function (elegant linear extrapolation): the transformation y = log(cosh(x)) + 1.1x plotted over x ∈ [−5, 5], with a non-linear domain flanked by linear extrapolation at either end.

An additional benefit of the global gain constraints available to the Bounded Derivative Network is that the susceptibility to overfitting during model identification is substantially reduced. This means that significantly less data is required for model building than would be required for a neural network. As a result, model maintenance is also made much more efficient.

The Bounded Derivative Network has now been widely applied worldwide (for several years) in many commercial nonlinear polymer production control applications [12,13]. Some of the many benefits include faster implementation times, replication (copying of models from one process line to another due to extrapolation capability), intrinsically safe online models, no requirement for model suppression mechanisms (utilize the full nonlinear model at every control execution) and reduced maintenance.

Fig. 6. Bounded Derivative Network on polypropylene data.

Versteeg et al. [12] describe the application of the state-space Bounded Derivative Network to a commercial polyethylene process within a total nonlinear control solution. This approach performs a full model inversion on the universally approximating model. In this particular application, the State Space Bounded Derivative Network was able to capture both positional and directionally dependent nonlinear dynamics from the historical process data. The controller is seen to take advantage of this by maximizing transition speeds in both directions (increasing and decreasing reactor gas concentrations). The authors report that the fastest ever recorded transition on this commercial polyethylene facility was achieved utilizing this new control technology in place of the existing gain adapted multivariable advanced control solution.

11. Application of the bounded derivative network to a commercial polypropylene process

The following example demonstrates the application of the Bounded Derivative Model architecture to a commercial polypropylene process for the purpose of multivariable control. Figs. 6 and 7 show the model fit to a product quality parameter (melt flow index) and the corresponding gain curve with respect to Hydrogen mass flow.

The model consists of just two inputs and a single output, the two input variables being Hydrogen mass flow and catalyst flow. The training model R² error is 0.93, which represents a standard deviation error of about 8% of the range of the melt index. This level of accuracy for a control model (which only includes causal variables and no model update or noise models) is more than sufficient to capture the nonlinearities of the process. The gain curves can then be checked against what is known on the process, and several key features correlate with process knowledge. Firstly, the process gain for this application is known to decrease with increasing hydrogen.

Fig. 7. Bounded Derivative Network on polypropylene data: gain analysis.

Also, the gain range of between 0.5 and 5.0 is well matched to process understanding. Figs. 8 and 9 show a neural network applied to the same data set. This direct comparison between a BDN model (Figs. 6 and 7) and a neural network model (Figs. 8 and 9) clearly demonstrates the advantages of the BDN technology for control.

The training model error for the neural network prediction is better than that achieved with the Bounded Derivative Network, resulting in an R² model error of 0.97 (which represents a model error standard deviation of 5% of the MFI range). The unconstrained neural network has utilized the high degree of curvature in the derivative in order to fit the data more accurately. However, this has been achieved at the expense of the accuracy of the model derivative—i.e. the process gain predictions. Fig. 9 clearly shows the nonsensical shapes of the neural network derivative when trained on sparsely distributed historical data. This phenomenon is not an over-fitting problem (the Gaussian peaks in the derivative are architectural). The neural network models in this example were trained using cross validation, with training terminating when the model error on the testing data reached a minimum. The high curvature of the derivatives seen in Fig. 9 is a direct result of the fact that the derivative of a neural network activation function is derived from 'Gaussian' shaped peak functions, combined with the natural data sparsity associated with real historical process data. The poor performance of the neural network in regions of extrapolation and sparse interpolation is clearly observed in Fig. 8. The natural tendency of a neural network derivative to decay to zero in regions of data sparsity makes it particularly unsuited for control.

Fig. 8. Neural network model on polypropylene data.

Fig. 9. Neural network on polypropylene data: gain analysis.

12. Conclusions

There are many issues with neural networks that make them an unsuitable choice of modeling paradigm for automatic control applications. Arbitrary model saturation, problematic gain inversions and poor extrapolation capability are all a result of the inability to globally constrain the behavior of the model. In multivariable control applications, the ability to interpolate into sparsely excited regions of process operation is a fundamental requirement, since 'extrapolative' assumptions are often made over the optimisation horizons (such as constant disturbances or pushing the plant to new and more profitable operating regimes). It is therefore questionable as to why neural networks have taken such a prominent role in this field. The authors suggest that the reason for this is that there has, up until now, been no suitable alternative. This paper has presented the Bounded Derivative Network, the unique properties of which make it ideally suited for automatic control applications. Apart from being a universal approximating architecture, the Bounded Derivative Network also offers the following key capabilities:

• Global constraint capability on minimum and maximum gains (on a per input basis);
• Elegant and intelligent extrapolation capability;
• Globally guaranteed model inversion capability;
• Global elimination of 'zero gain' regions of the input domain.

All of these additional properties are essential features for any model utilized in an automatic multivariable control scheme. The fact that neural networks do not offer any of these capabilities brings their application into serious question. In response to this compelling argument for the use of the Bounded Derivative Network in automatic control schemes, the authors present this algorithm as a natural and essential replacement for neural networks in control.

References

[2] A. Fache, O. Dubois, A. Billat, On the invertibility of the RBF model in a predictive control strategy, in: Proceedings of the European Symposium on Artificial Neural Networks (ESANN), Belgium, 21–23, 1999, pp. 381–386.

[3] C. Klimasauskas, J. Guiver, Hybrid linear-neural network process control, United States Patent Number 6,278,962 (2001).

[4] E. Hartman, Training feedforward neural networks with gain constraints, Neural Computation 12 (2000) 811–829.

[6] G.D. Martin, Method and apparatus for dynamic and steady state modeling over a desired path between two end points, United States Patent Number 5,933,345 (1999).

[10] P. Turner, J. Guiver, The State Space Bounded Derivative Network—Superceding the Application of Neural Networks in Control, European Control Conference 2003, Technical Session 8, Applications General, Cambridge, UK, 2003.

[11] P. Turner, J. Guiver, Neural network APC: Fact or fantasy? Control Solutions Magazine, Optimizing your process supplement (June 2001) 16–20.

[12] J. Versteeg, P. Turner, B. Lines, J. Guiver, Aspen Apollo—Nonlinear Controller Pilot Project, Proceedings of Aspenworld 2002, Washington, DC, 2002.

[13] P. Turner, J. Guiver, B. Lines, M. Keenan, On the commercial application of next generation nonlinear model based predictive control, Session Applications of Model Predictive Control, paper 144f, 2003 AIChE Spring Meeting, New Orleans, LA, 2003.

[15] P. Lisboa, Final Report on the Task Force for Safety Critical Systems, Eunite 2003. Available from <http://www.eunite.org/eunite/task_forces/running/FinalReportSafetyCriticalEunite2003.pdf>.

[16] H. Zhao, J. Guiver, R. Neelakantan, L. Biegler, A nonlinear industrial model predictive controller using integrated PLS and neural net state space model, IFAC Journal, Control Engineering Practice 9 (2001) 125–133.

[17] P.J. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting, John Wiley and Sons, Inc., 1994.