An Introduction to Connectionist Learning Control Systems
TRANSCRIPT
From the book: Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches,
White, D. & Sofge, D., eds., Van Nostrand Reinhold, 1992.
Walter L. Baker and Jay A. Farrell
The Charles Stark Draper Laboratory, Inc., 555 Technology Square, Cambridge, MA 02139
Abstract
An important, perhaps even defining, attribute of an intelligent control
system is its ability to improve its performance in the future, based on past
experiences with its environment. The concept of learning is usually used to
describe the process by which this is achieved. This introductory chapter will
focus on control systems that are explicitly designed to exploit learning
behavior. In particular, the use of connectionist learning systems in this
context will be motivated and described. The basic paradigm which emerges is
that a control system can be viewed as a mapping, from plant outputs and
control objectives to actuation commands, with learning as the process of
modifying this mapping to improve future closed-loop system performance.
The feedback required for learning (i.e., the information required to correctly
generate the desired mapping) is obtained through direct interactions with
the plant (and its environment). Thus, learning can be used to compensate for
limited or inaccurate a priori design information by exploiting empirical data
that is gained experientially. A key advantage of connectionist learning
control systems is their ability to accommodate poorly modeled, nonlinear dy-
namical systems.
Contemporary learning control methodologies based on the perspective
outlined above will be described and compared to more traditional control
strategies including generalized robust and adaptive control methods. The
discussion that follows will identify both the distinguishing characteristics of
connectionist learning control systems and the benefits of augmenting tradi-
tional control approaches with learning.
1 Introduction
Intelligent control systems are intended to maintain closed-loop system integrity and
performance over a wide range of operating conditions and events. This objective
can be difficult to achieve due to the complexity of both the plant and the perfor-
mance objectives, and due to the presence of uncertainty. Such complications may
result from nonlinear or time-varying behavior, poorly modeled plant dynamics,
high dimensionality, multiple inputs and outputs, complex objective functions,
operational constraints, imperfect measurements, and the possibility of actuator,
sensor, or other component failures. Each of these effects, if present, must be ad-
dressed if the system is to operate reliably in an autonomous fashion. Although
learning systems may be used to address several of these difficulties, the main focus
of this introductory chapter will be on the control of complex dynamical systems that
are poorly modeled and nonlinear.
At this point, a basic question arises: What is a learning control system? While most,
if not all, researchers in the intelligent systems field would accept the general
statement that "the ability to learn is a key attribute of an intelligent system," very
few would be able to agree on any particular statement that attempted to be more pre-
cise. The stumbling blocks, of course, are the words "learn" and (ironically)
"intelligent." Similarly, it is difficult to provide a precise and completely satisfactory
definition for the term "learning control system." One interpretation that is,
however, consistent with the prevailing literature is that:
A learning control system is one that has the ability to improve its
performance in the future, based on experiential information it has
gained in the past, through closed-loop interactions with the plant and
its environment.1
There are several implications of this statement. One implication is that a learning
control system has some autonomous capability, since it has the ability to improve its
own performance. Another is that it is dynamic, since it may vary over time. Yet
1. To help focus the discussion that follows and avoid any unnecessary controversy, we will further limit our subject to include, primarily, the type of learning that one might associate with sensorimotor control, and exclude more sophisticated learning behaviors (e.g., planning and exploration — see Chapter 9).
another implication is that it has memory, since it can exploit past experience to
improve future performance. Finally, to improve its performance, the learning
system must operate in the context of an objective function and, moreover, it must
receive performance feedback that characterizes the appropriateness of its current
behavior in that context.
In a fundamental sense, the control design problem is to find an appropriate
functional mapping, from measured plant outputs ym and desired plant outputs yd, to
a control action u that will produce satisfactory behavior in the closed-loop system.
In other words, the problem is to choose a function (a control law) u = k(ym, yd, t) that
achieves certain performance objectives when applied to the open-loop system. In
turn, the solution to this problem may naturally involve other mappings; e.g., a
mapping from the current plant operating condition to the parameters of a
controller or local plant model, or a mapping from measured plant outputs to
estimated plant state. Accordingly, a learning system that could be used to synthesize
such mappings on-line would be an advantageous component of an intelligent
control system. To successfully employ learning systems in this manner, one must
have an effective means for their implementation and incorporation into the overall
control system architecture. The belief that connectionist systems offer a suitable
means with which to implement learning systems has been the impetus for a large
body of recent research, including much that is reported in this book. Perhaps a
more cogent statement of affairs is that, in the context of control, learning can be
viewed as the automatic incremental synthesis of multivariable functional mappings
and, moreover, that connectionist systems provide a useful framework for realizing
such mappings.
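The mapping view can be made concrete with a minimal sketch. Below, a control law is treated as a pure functional mapping from measured and desired plant outputs to a control action; learning would then amount to adjusting this function. The function names and gain value are illustrative assumptions, not from the text:

```python
# A control law viewed as a functional mapping: u = k(ym, yd).
# Illustrative sketch; names and the gain value are assumptions.

def make_proportional_law(gain):
    """Return a control-law mapping from (measured, desired) outputs to an action."""
    def k(ym, yd):
        return gain * (yd - ym)   # act on the output error
    return k

law = make_proportional_law(gain=2.0)
u = law(ym=1.0, yd=3.0)           # control action for an output error of 2.0
```

Synthesizing the mapping then means choosing (or incrementally adjusting) the function itself, not merely evaluating it.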
The remaining sections of this chapter are designed to satisfy three major goals:
first, to introduce and motivate the concept of learning in control (with special em-
phasis on the view that learning can be considered as the synthesis of an
appropriate functional mapping); second, to illustrate how a learning system can be
implemented and the role that connectionist systems can play in achieving this; and
third, to illustrate how learning can be incorporated into control system architec-
tures. Along the way, we will identify key issues that arise in the application of
learning to automatic control, and in the use of connectionist systems in such
2. Throughout this chapter, we will use boldface type to denote vector and matrix quantities, and italics to denote scalars.
applications. We will conclude with a short list of research areas that may be critical
to future success in this area and that appear to offer great potential for reward.
Because this chapter serves only as an introduction, the reader will be directed
(where appropriate) to subsequent chapters or specific references for further infor-
mation on connectionist systems and learning control.
2 Basic Issues in Learning Control
We will now address a number of basic questions. In particular: What is the role of
learning in the context of intelligent control? What are the alternatives to
learning? How is learning related to adaptation? What is the role of performance
feedback? How can a learning system be realized or implemented? Each of these
questions will be briefly considered in this section; later, in the sections that follow,
we shall develop more detailed and complete answers.
2 . 1 Role of Learning
Following Tsypkin (1973), the necessity for applying learning arises in situations
where a system must operate in conditions of uncertainty, and when the available
a priori information is so limited that it is impossible or impractical to design in
advance a system that has fixed properties and also performs sufficiently well. In
the context of intelligent control, learning can be viewed as a means of solving those
problems that lack sufficient a priori information to allow a complete and fixed
control system design to be derived in advance. Thus, a central role of learning in
intelligent control is to enable a wider class of problems to be solved, by reducing the
prior uncertainty to the point where satisfactory solutions can be obtained on-line.
This result is achieved empirically, by means of performance feedback, association,
and memory (or knowledge base) adjustment.
The principal benefits of learning control, given the present state of its
technological development, derive from the ability of learning systems to
automatically synthesize mappings that can be used advantageously within a control
system architecture. Examples of such mappings include a controller mapping that
relates measured and desired plant outputs to an appropriate set of control actions
(Fig. 1a), a related control parameter mapping that generates parameters (e.g., gains)
for a separate controller (Fig. 1b), a model state (or estimator) mapping that produces
state estimates (Fig. 1c), and a model parameter mapping that relates the plant
operating condition to an accurate set of model parameters (Fig. 1d). In general,
these mappings may represent dynamic functions (i.e., functions that involve
temporal differentiation or integration).
Learning is required when these mappings cannot be determined completely in
advance because of a priori uncertainty (e.g., modeling error). In a typical learning
control application, the desired mapping is stationary (i.e., does not depend explicitly
on time), and is expressed (implicitly) in terms of an objective function involving
the outputs of both the plant and the learning system. The objective function is used
to provide performance feedback to the learning system, which must then associate
this feedback with specific adjustable elements of the mapping that is currently
stored in its memory. The underlying idea is that experience can be used to improve
the mapping furnished by the learning system.
[Figure 1 omitted; its four panels show where a learned mapping can appear in the control loop:]
Figure 1a. Controller mapping: u = fu(ym, yd, t).
Figure 1b. Control parameter mapping: k = fk(ym, t).
Figure 1c. Model state (or estimator) mapping: x = fx(ym, u, t).
Figure 1d. Model parameter mapping: p = fp(ym, u, t).
2.2 Relation to Alternative Approaches
There are, of course, several alternative approaches that have been used to
accommodate poorly modeled, nonlinear dynamical behavior. These include
generalized robust and adaptive strategies, as well as what might be termed
"manually executed" learning techniques. The relations between these approaches
and the learning control approach are important and are discussed in the following
paragraphs.
Robust (or "Fixed") Approach. Robust control system design techniques attempt to
treat the problem of model uncertainty as best as possible in advance, so as to produce
a fixed design with guaranteed stability and performance properties for any specific
scenario contained within a given uncertainty set. A tradeoff exists between perfor-
mance and robustness, since robust control designs are usually achieved at the
expense of resulting closed-loop system performance (relative to a control design
based on an exact model with perfect certainty). Advanced robust control system
design methods have been developed to minimize this inherent performance /
robustness tradeoff. Although robust design methods are currently limited to linear
problems, nonlinear problems with model uncertainty can sometimes be approached
by interpolating among (gain scheduling) a representative set of robust point
designs over the full operating envelope of the plant, thus decreasing the amount of
model uncertainty that each linear point design must address. Nevertheless, the per-
formance resulting from any fixed control design is always limited by the
availability and accuracy of the a priori design information. If there is enough
uncertainty or complexity so that a fixed control law design will not suffice or cannot
be satisfactorily determined, then high closed-loop system performance can only be
obtained in one of three ways: (i) through improved modeling to reduce the
uncertainty, (ii) via an automatic on-line adjustment technique, or (iii) through an
iterative procedure involving experimental testing, evaluation, and manual tuning
of the nominal control law design.
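The gain-scheduling idea mentioned above, interpolating among a set of robust point designs, can be sketched in a few lines. The scheduling variable, breakpoints, and gain values below are illustrative assumptions:

```python
# Gain-scheduling sketch: linearly interpolate a gain among robust point
# designs tabulated against an operating-condition variable (e.g., airspeed).
# Table values are illustrative assumptions, not from the text.
import bisect

speeds = [100.0, 200.0, 300.0]   # scheduling variable at each point design
gains  = [4.0,   2.5,   1.5]     # robust gain designed at each point

def scheduled_gain(v):
    """Interpolate the point-design gains at operating condition v."""
    v = min(max(v, speeds[0]), speeds[-1])       # clamp to the design envelope
    i = bisect.bisect_right(speeds, v) - 1
    i = min(i, len(speeds) - 2)
    frac = (v - speeds[i]) / (speeds[i + 1] - speeds[i])
    return gains[i] + frac * (gains[i + 1] - gains[i])
```

Each point design only has to be robust over its own neighborhood, which is what reduces the uncertainty any single linear design must absorb.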
Adaptive Approach. In contrast to the robust or "fixed" design approach, adaptive
control approaches attempt to treat the problem of uncertainty through on-line
means. An adaptive control system can adjust itself to accommodate new situations,
such as changes in the observed dynamical behavior of the plant. In essence,
adaptive techniques monitor the input / output behavior of the plant to, either
explicitly or implicitly, identify the parameters of an assumed dynamical model. The
control system parameters are then adjusted to achieve some desired performance
objective. Thus, adaptive techniques seek to achieve increased performance by
improving some representation, which depends on knowledge of the plant, based on
on-line measurement information. An adaptive control system will attempt to adapt
whenever the behavior of the plant changes by a significant degree. If the
dynamical characteristics of the plant vary considerably over its operating envelope
(e.g., due to nonlinearity), then the control system may be required to adapt
continually. This is generally undesirable, since degradation in performance can be
associated with these adaptation periods. Note that adaptation can occur even in the
absence of time-varying dynamics and disturbances, since the controller must
readapt every time a different dynamical regime is encountered (i.e., one for which
the current control law is inadequate), even if it is returning to an operating con-
dition it has encountered and handled before.
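As a minimal illustration of the indirect adaptive idea, the sketch below identifies an unknown scalar plant gain on-line with a normalized least-mean-squares update and sets the control from the current estimate (a certainty-equivalence scheme). The plant, step size, and setpoint are illustrative assumptions:

```python
# Indirect adaptive sketch: estimate an unknown plant gain on-line,
# then control using the current estimate. All values are illustrative.

true_b = 2.0     # unknown plant gain: y = true_b * u
b_hat  = 0.5     # initial estimate
mu     = 0.5     # adaptation step size

for _ in range(50):
    u = 1.0 / b_hat                          # drive the output toward yd = 1
    y = true_b * u                           # plant response
    err = y - b_hat * u                      # prediction error of current model
    b_hat += mu * err * u / (1e-9 + u * u)   # normalized LMS parameter update
```

Each pass shrinks the parameter error; the transient before convergence is exactly the period of degraded performance the text describes.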
"Manually Executed" Learning Approach. Another approach that has been applied
in applications with complex nonlinear dynamical behavior is based on a kind of
manually executed learning control system. In fact, this is the predominant design
practice used to develop flight control systems for aircraft. In this approach, the
control law is developed through an iterative process that integrates multiple control
law point designs to approximate the required nonlinear control law. This approach
often results in numerous design iterations, each involving manual redesign of the
nominal control law (for certain operating conditions), followed by extensive
computer simulation to evaluate the modified control law. After the initial control
system has been designed, extensive empirical evaluation and tuning (manual
adjustment and redesign) of the nominal control system is often required. This arises
because the models used during the design process do not always accurately reflect
the actual plant dynamics. During this procedure, the role of the "learning system"
is played by a combination of control system design engineers, modeling specialists,
and evaluators (including the test pilots). An interesting perspective to consider is
that the learning control approaches discussed within this book potentially offer a
means of automating this manual design, evaluation, and tuning process for certain
applications.
2.3 Adaptation vs. Learning
At first glance, the rough characterization of a learning system provided in Section 1
does not appear to offer any means of distinguishing a learning control system from
an adaptive one. To some degree this is unavoidable, since there is no definitive
dividing line between the two processes. If, however, one considers the differences
between representative adaptive and learning control systems, then several
distinguishing qualities emerge. For example, learning control systems make exten-
sive use of memory; adaptive control systems also rely on memory for their opera-
tion, but to a significantly lesser degree. In the discussion that follows we will see
that both adaptive and learning control systems can be based on parameter
adjustment algorithms, and that both make use of experiential information gained
through closed-loop interactions with the plant. Nevertheless, we intend to clearly
differentiate the goals and behavioral characteristics of adaptation from those of
learning. A control system that treats every distinct operating situation as a novel
one is limited to adaptive operation, whereas a system that correlates past
experiences with past situations, and that can recall and exploit those past expe-
riences, is capable of learning. From a teleological perspective, adaptation and
learning are different. The key differences (which are essentially a matter of
degree, emphasis, and intended purpose) are summarized in Table 1 and discussed in
the paragraphs that follow.
Adaptive control has a temporal emphasis: its objective is to maintain some desired
closed-loop behavior in the face of disturbances and dynamics that appear to be time-
varying. In actuality, the apparent temporal variation may be caused by
nonlinearities, when the operating point of the plant changes (resulting in temporal
changes in the local linearized behavior of a nonlinear, time-invariant plant).
Because the functional forms used by most adaptive control laws are generally inca-
pable of representing, over a wide range of operating conditions, the required
control action as a function of the current plant state, it can be said that adaptive
controllers lack "memory" in the sense that they must readapt to compensate for all
apparent temporal variations in the dynamical behavior of a plant, even those which
are due to time-invariant nonlinearities and have been experienced previously.

Table 1. Adaptation vs. Learning.

    ADAPTATION                                 LEARNING
    reactive: maintain desired behavior        constructional: synthesize desired behavior
      (local optimization)                       (global optimization)
    temporal emphasis                          spatial emphasis
    no "memory" ⇒ no anticipation              "memory" ⇒ anticipation
    fast dynamics                              slow dynamics
    novel situations & slowly                  structural uncertainty & nonlinear
      time-varying behavior                      dependencies
This point is further illustrated with Fig. 2. If the curved surface in this figure
represents the time-invariant, nonlinear dynamical behavior of a simple plant, and
the plant is operating near the point x1, then an adaptive control system might
generate a control law that is suitable for the local linearized behavior of the plant
about this point (the local linear behavior is indicated by the tangent plane at x1). If
at a later time, the operating point of the plant changes to x2, then the adaptive
controller must generate a new control law that is compatible with the local lin-
earized behavior about x2. Moreover, if the operating point ever returns to x1,
adaptive controller will have to "solve" the local design problem anew to determine
an appropriate control law for a situation it has operated in before. As can be plainly
seen, the local model used by the adaptive controller will be time-varying, even
though the underlying plant behavior is time-invariant.
Even under ideal circumstances, whenever the behavior of the plant changes, the
dynamical nature of the adaptive process will cause delays in the production of the
desired control actions. As a result, inappropriate control actions may be applied for
a finite period. In the case of a nonlinear time-invariant plant, this results in
degraded performance since unnecessary transient behavior due to inappropriate
control will occur every time the dynamical behavior of the plant changes by a
significant degree. Furthermore, the adaptive process is also inefficient since
(presumably, in this special case) the desired control law could actually be
represented purely as a function of the current state of the plant and the desired
plant outputs, so that no adaptation would be required.
[Figure 2 omitted: a curved surface representing the nonlinear dynamical behavior of the plant, with tangent planes at operating points x1 and x2.]
Figure 2. A cartoon illustrating the difference between adaptation and learning.
In general, adaptive controllers operate by optimizing a small set of adjustable pa-
rameters to account for plant behavior that is local in both state-space and time (e.g.,
the local linearized behavior at the current time). To be effective, adaptive control
systems must have relatively fast dynamics so that they can quickly react to
changing plant behavior. In some instances, however, the linearized dynamics may
vary so fast that the adaptive system cannot maintain desired performance through
adaptive action alone. As also argued by Fu (1964), it is in this type of situation
(where the variation in dynamical behavior is due to nonlinearity) that a learning
system is needed. Because the learning system retains information, it can, in
principle, react more rapidly to purely state-dependent variations once it has
learned.
Learning controllers exploit an automatic mechanism that associates, throughout
some operating envelope, a suitable control action or set of control system or plant
model parameters with the current operating condition. In this way, the presence
and effect of previously unknown nonlinearities can be accounted for and antici-
pated (in the future), based on past experience. Once such a control system has
"learned," transient behavior that would otherwise be induced in an adaptive system
by state-dependent variations in the dynamics no longer occurs, resulting in greater
efficiency and improved performance over completely adaptive strategies.
Referring back to the example of Fig. 2, the learning augmented control approach
would seek to develop a control law that was suitable throughout the operational
envelope of the plant. Thus, the results of past closed-loop interactions with the
plant would be compiled into a global control law mapping and (once learning had
occurred) used to generate the appropriate control action (as a function of the
current operating condition) without the delays and transients associated with pure
adaptive action.
Learning systems operate by optimizing over a relatively large set of adjustable
parameters (and potentially variable structural elements) to construct a mapping
that captures the state-space dependencies (e.g., nonlinearities) of the problem,
again, throughout the operating envelope. In effect, this optimization is global in
the state-space of interest, and assumes that the desired mapping is stationary or
quasi-stationary. To successfully execute the optimization, learning systems make
extensive use of past information and employ relatively slow learning dynamics.
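The contrast with pure adaptation can be sketched crudely: a learning controller retains what it has learned as a mapping from operating condition to control parameters, so a revisited condition is handled by recall rather than readaptation. The regions, gain values, and stand-in adaptation routine below are illustrative assumptions:

```python
# Learning-as-memory sketch: store the control gain learned for each
# operating region; revisits recall it instead of readapting.
# Regions and gain values are illustrative assumptions.

def adapt_gain(region):
    """Stand-in for a slow on-line adaptation transient the learner avoids repeating."""
    return {"A": 1.0, "B": 3.0}[region]

memory = {}   # learned mapping: operating region -> control gain

def gain_for(region):
    if region not in memory:          # novel region: adapt, then remember
        memory[region] = adapt_gain(region)
    return memory[region]             # revisited region: recall immediately

for region in ["A", "B", "A", "A", "B"]:
    g = gain_for(region)
```

In the trace above, `adapt_gain` runs only twice, once per novel region; the other three visits are served from memory, which is the anticipation the text attributes to learning.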
As defined, the processes of adaptation and learning are complementary: each has
unique desirable characteristics from the point of view of intelligent control. For
example, adaptive behavior is needed to accommodate (slowly) time-varying dynam-
ics and novel situations (e.g., those which have never before been experienced), but
is often inefficient for problems involving significant nonlinearity. Learning
approaches, in contrast, have the opposite characteristic: they are well-equipped to
accommodate poorly modeled nonlinear dynamical behavior, but are not well-suited
to applications involving time-varying dynamics. We suggest that a control system
might actually comprise three subsystems: an a priori compensator, an
adaptive control system, and a learning control system. Similar strategies were
proposed as early as 1966 (Sklansky (1966)).
2.4 Performance Feedback
Thus far, we have used the term "fixed" to describe those control systems in which
the parameters and structure are determined in an open-loop, performance-
independent fashion. Included in this class are all static (memoryless) and dynamic
compensators with constant or scheduled parameters. In a fixed control system, the
structure, parameters, and scheduling dependencies of the controller are completely
determined through an off-line design based on a priori design information. Ex-
amples of control systems that do not belong to this category include those that
incorporate adaptation or learning.
The fixed control system design strategies may be regarded as being part of the
spectrum depicted in Fig. 3, where the individual techniques are organized according
to the degree to which past information is used by the control system to improve its
subsequent performance. The ordering of the first two categories is easily explained.
Static feedback approaches require only the current plant measurements, and
generate control actions using a static (memoryless) function. Examples of this class
include proportional output error and full-state linear feedback controllers (e.g.,
based on pole-placement, LQR, or H∞ design), as well as nonlinear state feedback
controllers (e.g., based on approximate linearization or dynamic inversion designs).
Dynamic compensators make use of current and past plant output measurements to
generate control actions. The functional mapping (based on current measurements
only) described by such a compensator is a dynamic function (involving temporal
differentiation or integration) that requires internal state variables to characterize
past plant behavior. Examples include state feedback controller / state estimator
combinations (e.g., LQG based designs) and classical frequency domain compensators
(e.g., PID or lead-lag compensators). In the case of static feedback and dynamic com-
pensators that are fixed, the control law is designed off-line and does not vary on-
line (in the sense that its parameters and structure do not depend on the perfor-
mance of the closed-loop system).
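The distinction between static feedback and dynamic compensation can be shown in miniature: a proportional law is a memoryless function of the current measurements, while a PI compensator carries an integrator state that summarizes past output error. The gains and sampling interval below are illustrative assumptions:

```python
# Static feedback vs. dynamic compensation, in miniature.
# Gains and sample time are illustrative assumptions.

def p_law(ym, yd, kp=2.0):
    return kp * (yd - ym)              # static: current measurements only

class PICompensator:
    def __init__(self, kp=2.0, ki=0.5, dt=0.01):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0            # internal state: accumulated past error
    def step(self, ym, yd):
        e = yd - ym
        self.integral += e * self.dt   # memory of past plant behavior
        return self.kp * e + self.ki * self.integral
```

Both are "fixed" in the sense of this section: the integrator state varies on-line, but the parameters and structure never depend on measured closed-loop performance.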
[Figure 3 omitted: a spectrum running from fixed designs (static feedback, dynamic compensation) to flexible designs (adaptive control, learning control), ordered by increasing use of past information.]
Figure 3. A spectrum of control strategies.
In contrast, control systems designs that are not fixed (and hence, involve adaptation
or learning), employ performance feedback information to adjust the parameters or
structure of the control law. Note that the distinction between "fixed" and "non-
fixed" control systems may depend on a somewhat arbitrary distinction between the
adjustable parameters and states of a control system.3 As is illustrated in Fig. 3,
adaptive and learning controllers both utilize past plant information to a greater
degree than either of the fixed control system design strategies. As previously indi-
cated, however, learning in the context of control can be construed as the ability to
automatically develop and retain a desired control law for a given plant, based on
closed-loop interactions with the plant (and its environment). The ability to develop
the requisite control law on-line clearly differentiates learning control approaches
from fixed design approaches (including those that are gain scheduled). The ability
to retain the control law as a function of operating condition further differentiates
learning strategies from adaptive ones. Thus, learning control systems utilize past
experience to the greatest extent, in their attempt to store control or model informa-
tion as a function of the plant operating condition. Note that extensive utilization of
past experience may lead to an increase in the computational resources required for
on-line operation. The expected return, of course, is increased performance.
3. If a control system "parameter" is adjusted in a dynamic fashion (i.e., it is not strictly a static function of other control system variables), then technically, it is a state variable. Nevertheless, a useful distinction can often be made between adjustable parameters and states, based on the design of the control system and the time rate of change of the quantities in question. For example, the "states" of a controller are usually determined by signals internal to the control loop (i.e., determined more or less directly from plant measurements), while the "parameters" of a controller are usually determined by an external agent. Moreover, the time rate of change of the state variables is usually significantly faster than that of the adjustable parameters.
2.5 Implementation of Learning Systems
From the previous discussion, it is clear that a learning system must be capable of
accumulating and manipulating experiential information, storing and retrieving
compiled knowledge, and adapting its stored knowledge to accommodate new
experiences. A key implementation point is that a learning system will require an
efficient representational framework to retain empirically derived knowledge. In
addition, the structure and operational attributes of a learning system will be deter-
mined, in part, by the quantity and quality of the a priori design information that is
available, including the anticipated characteristics of the experiential information
that is expected to be measurable on-line.
One simple way to implement a learning system is to use a discrete-input, analog-
output mapping; that is, to partition the input-space into a number of disjoint
regions, so that the current output is determined by "looking up" the analog output
value associated with the current input region. Although the output values of such a
device are analog, the overall mapping will be discontinuous, as depicted in Fig. 4a.
Many early learning control systems were based on this type of architecture (e.g.,
BOXES — Michie & Chambers (1968)). Assuming that the learning system output is
used directly for control action, a nonlinear control law can be developed by
learning the appropriate output value for each input region (resulting in a
"staircase" approximation to the actual desired control law). The connection between
learning and function synthesis is readily apparent in this case. One drawback of
this approach is the combinatorial growth in the number of regions required, as
either the input-space dimension or the number of partitions per input-space
dimension is increased; another is the discontinuous nature of the control laws that
can be represented.
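A minimal sketch of such a discrete-input, analog-output scheme (in the spirit of, though not taken from, the BOXES work) partitions a scalar input space into cells and learns one analog output per cell as a running average of observed targets. The cell count and target function are illustrative assumptions:

```python
# Lookup-table learning sketch: a "staircase" approximation learned
# cell by cell. Cell count and target function are illustrative.
import math

N_CELLS = 10                      # partitions of the input space [0, 1)
values  = [0.0] * N_CELLS         # stored analog output per cell
counts  = [0] * N_CELLS

def cell(x):
    return min(int(x * N_CELLS), N_CELLS - 1)

def update(x, target):
    """Fold the observed target into x's cell (incremental running mean)."""
    i = cell(x)
    counts[i] += 1
    values[i] += (target - values[i]) / counts[i]

def lookup(x):
    return values[cell(x)]        # constant within each cell: a staircase

# Learn a smooth target from repeated samples over the input space.
for k in range(1000):
    x = (k % 100) / 100.0
    update(x, math.sin(x))
```

Every input in the same cell retrieves the same output, which is the discontinuous staircase the text describes, and doubling the resolution in each of d input dimensions multiplies the number of cells by 2^d.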
A more sophisticated learning system can be developed via an appropriate
mathematical framework that is capable of representing a family of continuous
functions (see Fig. 4b); this framework can have a fixed or variable structure, and a
potentially large number of free parameters. Such architectures, including
artificial neural networks, are often used in contemporary learning control systems.
In this case, the learning process is designed to automatically adjust the parameters
(or structure) of the functional form to achieve the desired input / output mapping.
Such representations have important advantages over simple look-up table ap-
proaches; for instance, continuous functional forms are generally more efficient in
terms of the number of free parameters, and hence, the amount of memory, needed to
approximate a smooth function. Furthermore, they are capable of automatically
providing local generalization of learning experiences.
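A small radial-basis-function approximator illustrates this kind of continuous mapping structure: incremental gradient updates adjust the weights, and because each basis function responds only near its center, an update at one input generalizes locally to nearby inputs. The centers, width, and learning rate below are illustrative assumptions:

```python
# Continuous-mapping sketch: a radial-basis-function approximator
# trained by incremental gradient descent on squared error.
# Centers, width, and learning rate are illustrative assumptions.
import math

centers = [i / 10.0 for i in range(11)]   # fixed RBF centers on [0, 1]
weights = [0.0] * len(centers)
WIDTH, RATE = 0.1, 0.5

def phi(x, c):
    return math.exp(-((x - c) / WIDTH) ** 2)

def predict(x):
    return sum(w * phi(x, c) for w, c in zip(weights, centers))

def learn(x, target):
    err = target - predict(x)
    for j, c in enumerate(centers):       # local update: distant weights barely move
        weights[j] += RATE * err * phi(x, c)

# Incrementally synthesize a smooth target mapping from samples.
for k in range(2000):
    x = (k % 100) / 100.0
    learn(x, math.sin(x))
```

Eleven weights here stand in for the hundreds of cells a comparably accurate lookup table would need, and the approximation is smooth between training points rather than piecewise constant.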
Figure 4a. A "staircase" approximation based on a discrete-input, analog-output mapping.
Figure 4b. A smooth approximation based on a continuous mapping structure.
3 Connectionist Learning Systems for Control
Although the motivation, models, and goals of much of connectionist learning
systems research are derived from the biological and behavioral sciences, current
research in the application of this work to problems in automatic control has yielded
an emerging theory with a firm basis in the established mathematical disciplines of
function approximation, estimation, and optimization. This approach coincides with
traditional connectionist implementations of learning systems in a number of ways:
namely, in the use of massively parallel architecture, extensive memory, local
computation, and automatic self-adjustment of parameters and structure. This
approach differs, however, in its view of the "learning" process itself. This new
perspective will be discussed further in this section.
A commonly held notion is that learning results in an association between input stimuli and desired output actions. By interpreting the word "association" in a mathematically rigorous manner, one is naturally led to the central idea underlying
many contemporary learning control systems. In these systems, "learning" is
viewed as a process of automatically synthesizing multivariable functional
mappings, based on a criterion for optimality and experiential information that is
gained incrementally over time (Baker & Farrell (1990)). Most importantly, this pro-
cess can be realized through the selective adjustment of the parameters and structure
of an appropriate representational framework.
Brief reviews of some early learning control system implementations are presented
in Sklansky (1966), Fu (1970), Franklin (1989), and Farrell & Baker (1992). The
advantages of contemporary implementation schemes relative to those of the past are
numerous and will be considered in this section. In this presentation we shall
decompose the overall problem of learning control into two sub-problems: first, how
can a learning system be realized or implemented and, second, how can such a
learning system be employed within a control system architecture? In this section
we discuss implementations of learning systems and show how connectionist net-
works can be used to realize the automatic function synthesis capabilities we desire.
Key benefits are derived from the smoothness, generality, and representational effi-
ciency of the mappings that can be obtained. Answers to the second question are
discussed in Section 4.
To further develop the main theme of this section, we will proceed by elaborating on
the notion of learning as automatic function synthesis. After a more formal in-
troduction to the concept and its key issues, the applicability of connectionist learning systems in this context will be described, followed by subsections covering the
issues of incremental and spatially localized learning.
3.1 Learning as Function Synthesis
To allow for a concrete discussion of the key concepts, we will make use of the follow-
ing definitions. Let M denote a general memory device that has been incorporated
into a learning system, and let D represent the domain over which this memory is
applicable. If x ∈ D, then the expression u = M(x) will be used to signify the "recall" of item u, by "situation" x, from the memory M. The desired mapping to be stored by this memory (via learning) will be denoted by M*(x). For purposes of control, x
could be used to represent plant states or outputs, or even more general control
situations; similarly, u could be used to represent control action directly or the
parameters of a control system or plant model. In the discussion that follows, we will
assume (without loss of generality) that x corresponds to the plant state (i.e., a point
in the state-space) or to a small set of plant states (i.e., a small closed region of the
state-space), and that u corresponds to control action. If the desired mapping M* were explicitly known (which is not generally the case), then a basic learning scheme would involve the following three steps: (1) Given x, use the current mapping M to generate u = M(x). (2) Compare u with the desired u*, given by u* = M*(x). (3) Update the mapping M to reduce the discrepancy between u and u*.4
For a wide and important class of learning control problems, the desired mapping is
known (or assumed) to be continuous in advance. In such situations, memory im-
plementations with efficient storage mechanisms can be proposed. By assuming that
the desired mapping M* is continuous, an approximate mapping M can be
implemented by any scheme capable of approximating arbitrary continuous
functions. In such cases, the memory M is represented as a continuous function,
parameterized by a vector p; i.e., M = M(x; p). To apply this approach to the learning scheme described above, the learning update step (3) would be achieved by appropriately adjusting the parameter vector p by an amount ∆p (yet to be determined). By "appropriate," we mean that the adjusted parameter vector p + ∆p is such that the resulting u = M(x; p + ∆p) would be "better" than the original u, relative to the desired response u*. As new learning experiences became available, the
mapping M would be incrementally improved. Knowledge recall would be achieved
by evaluating the functional mapping at a particular point in its input domain.
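The three-step scheme can be sketched in a few lines for a memory M(x; p) that is linear in p (the polynomial basis, target mapping, and learning rate below are illustrative assumptions, chosen only so the loop visibly converges):

```python
import numpy as np

# Three-step learning with a parameterized memory M(x; p).  The "desired
# mapping" M* is assumed known here only through sampled values u* = M*(x).
rng = np.random.default_rng(0)

def features(x):
    return np.array([1.0, x, x ** 2])      # fixed basis, so M(x; p) = phi(x).p

def M_star(x):
    return 0.5 - x + 2.0 * x ** 2          # target mapping (illustration only)

p = np.zeros(3)                            # memory parameters, initially blank
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    u = features(x) @ p                    # step 1: recall u = M(x; p)
    e = M_star(x) - u                      # step 2: compare with desired u*
    p += 0.1 * e * features(x)             # step 3: adjust p to shrink the error
print(np.round(p, 2))                      # approaches the target coefficients
```

Here the error is available directly because M* is known; the footnote's point is that the same loop survives when only an estimate of the error (or of the cost gradient) can be formed.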
In this parameterized approach to function synthesis, the knowledge that is gained
over time is stored in a distributed fashion in the parameter space of the memory.
This feature, which arises naturally in any practical implementation of a continuous
mapping, can be most desirable from a learning control point of view (depending on
the way it is achieved, as discussed below). Distributed learning is advantageous
when previous learning under similar circumstances can be combined to provide a
suitable response for the current situation. This fusion process effectively broadens
the scope and influence of each learning experience and is referred to as generalization.
4. If the desired mapping is not explicitly known, then a variant of this scheme can still be used in which the second step is approximated. As will be discussed, this can be accomplished (directly) by estimating the desired values of the mapping, or (indirectly) by estimating the gradient of an objective function with respect to the mapping.
There are several important ramifications of generalization. First, it has the effect of
eliminating "blank spots" in the memory (i.e., specific points at which no learning
has occurred), since some response (albeit not necessarily the desired one) will
always be generated. Second, it has the effect of constraining the set of possible
input / output mappings that can be achieved by the memory, since in most cases
neighboring input situations will result in similar outputs (i.e., the mapping would
become a smooth or piecewise-smooth function). Finally, generalization complicates
the learning process, since the adjustment of the mapping following a learning
experience can no longer be considered as an independent, point-by-point process.
In spite of this, the advantages accorded by generalization usually far outweigh the
difficulties it evokes.
Generalization is an intrinsic feature of function synthesis approaches that rely on
parameterized continuous mappings. In any practical implementation having a
finite number of adjustable parameters, each adjustable parameter will affect the
realized function over a region of non-zero measure. When a single parameter pj (from the set p = {p1, p2, . . . , pN}) is adjusted to improve the approximation at a specific point x, the continuous mapping M (i.e., at least one of the outputs of M = {M1, M2, . . . , Mm}) will be affected throughout the region of "influence" of that parameter. This region of influence is determined by the partial derivatives ∂Mi/∂pj (one for each output of M), which are functions of the input x. Under these conditions, the effect of a learning experience will be generalized automatically, and extended to all parts of the mapping in which the "sensitivity" functions ∂Mi/∂pj are non-zero. The greatest effect will occur where ∂Mi/∂pj is largest; little or no change
will occur wherever this quantity is small or zero. The nature of this generalization
may or may not be beneficial to the learning process depending on whether the ex-
tent of the generalization is local or global. These issues are further discussed in
Subsection 3.4.
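The local-versus-global distinction can be seen directly from the sensitivity functions themselves. In the illustrative sketch below (functions and thresholds are arbitrary choices, not from the chapter), the sensitivity of a sigmoidal unit's output weight is non-negligible over roughly half the input range, while that of a Gaussian unit's height parameter is confined near the unit's center:

```python
import numpy as np

# Sensitivity functions dM/dp for two single-parameter examples:
#   M(x) = w * sigmoid(x)  ->  dM/dw = sigmoid(x)    (global influence)
#   M(x) = c * exp(-x**2)  ->  dM/dc = exp(-x**2)    (local influence)
x = np.linspace(-5.0, 5.0, 1001)
sig_sens = 1.0 / (1.0 + np.exp(-x))
gauss_sens = np.exp(-x ** 2)

def influenced_fraction(s, eps=0.01):
    # Fraction of the input range where the sensitivity exceeds a threshold
    return float(np.mean(np.abs(s) > eps))

print(influenced_fraction(sig_sens), influenced_fraction(gauss_sens))
```

A single update to the sigmoidal weight therefore perturbs the mapping over most of the domain, whereas the Gaussian parameter's effect is spatially confined.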
For function synthesis approaches based on parameterized representations, the
learning process requires an algorithm that will specify an appropriate ∆p so as to
achieve some desired objective. When the mathematical structure used to implement
the mapping is continuously differentiable and the objective function J can be
treated as a "cost" to be minimized, then the construction of ∆p can be straight-
forward. In the special case where the adjustable parameters p appear linearly in
the gradient vector ∂J/∂p of the cost function J with respect to the adjustable parameters p, the optimization could be treated as a linear algebra problem; in general (i.e., for most applications), nonlinear optimization methods must be used. One nonlinear technique that is suitable for on-line learning is the gradient learning algorithm: ∆p = −W ⋅ ∂J/∂p, where W is a positive definite matrix that determines the "learning rate," and the gradient ∂J/∂u is defined to be a column vector. If
a second-order Taylor expansion is used to provide a local approximation of the
objective function J (about the current parameter vector p ), then the "optimum" W
which minimizes this local quadratic cost function in a single step can be shown to
be equal to the inverse of the Hessian matrix H (of J ), so that
Wopt = H−1 = (∂²J/∂p∂pT)−1    (1)
Eqn. (1) is only valid when the local Hessian matrix is positive definite. Because it is
difficult to compute and invert the Hessian on-line, the weight matrix W usually only approximates the inverse of the full Hessian, as in the Levenberg-Marquardt method
(see Press, et al. (1988)). Often, in fact, a single learning rate coefficient α is used to
set W = αI .
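For a purely quadratic cost the Hessian is constant, and Eqn. (1) can be checked directly. In the sketch below (H and the minimizer are arbitrary illustrative choices), the step with W = H−1 lands on the minimum at once, while the common simplification W = αI converges only gradually:

```python
import numpy as np

# Quadratic cost J(p) = 0.5 * (p - p_star)^T H (p - p_star), so that
# grad J = H (p - p_star) and the Hessian of J is exactly H.
H = np.array([[4.0, 1.0],
              [1.0, 2.0]])                 # positive definite Hessian
p_star = np.array([1.0, -2.0])             # minimizer of J

def grad(p):
    return H @ (p - p_star)

p_newton = np.zeros(2)
p_newton = p_newton - np.linalg.inv(H) @ grad(p_newton)   # W_opt = H^{-1}
print(p_newton)                            # one step reaches p_star exactly

p_scaled = np.zeros(2)
for _ in range(100):                       # simple choice W = alpha * I
    p_scaled = p_scaled - 0.1 * grad(p_scaled)
print(np.round(p_scaled, 3))               # converges, but over many steps
```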
More insight can be gained into the gradient learning algorithm through an
application of the chain rule, which yields: ∆p = −W ⋅ ∂uT/∂p ⋅ ∂J/∂u (where the Jacobian ∂uT/∂p is defined as a matrix of gradient column vectors ∂ui/∂p, so that ∂J/∂p = ∂uT/∂p ⋅ ∂J/∂u). This form of the gradient learning rule involves two types of
information: the Jacobian of the outputs of the mapping with respect to the ad-
justable parameters, and the gradient of the objective function with respect to the
mapping outputs. The gradient ∂J/∂u is determined both by the specification of the
objective function J and the manner in which the mapping outputs affect this
function (which, in turn, is determined by the way in which the learning system is
used within the control system architecture). The Jacobian ∂uT/∂p is completely determined by the approximation structure M and, hence, is known a priori as a function of the input x. Note that the performance feedback information provided to the learning system is the output gradient ∂J/∂u. This gradient vector provides the learning system with considerably more information than the scalar J; in particular, ∂J/∂u indicates both a direction and magnitude for ∆p (since ∂uT/∂p is known), whereas performance feedback based solely on the scalar J does neither.
To give an illustrative example, a simple quadratic objective function might be
defined as
J = (1/2) ΣE eiTei    (2)

where J is the cost to be minimized (over a finite set of evaluation points xi ∈ E = {x1, x2, . . . , xR}) and the output errors ei = ui* − ui = M*(xi) − M(xi) are assumed to
be known. In the special case where the objective function is given by Eqn. (2) and
W = αI , the learning rule is
∆p = α ΣE (∂uiT/∂p) ⋅ ei

If the objective function is a strictly convex function of p, then the gradient algo-
rithm will find the optimum value p* that minimizes J . For most practical learning
control problems, however, the situation is much more complicated. The objective
function J to be minimized may involve terms that are only known implicitly (e.g.,
the desired output u* may not be explicitly known or, equivalently, the output error
e of the mapping may not be measurable); moreover, J may be significantly more
complex than that shown in Eqn. (2) (e.g., J may be a dynamic rather than a static
function). Finally, for reasons that will be discussed in Subsection 3.3, objective
functions defined over a finite set of evaluation points (as in Eqn. (2)) cannot usually
be used directly for on-line learning control.
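A minimal sketch of this learning rule for a mapping that is linear in its parameters (the evaluation set, target values, and learning rate below are illustrative assumptions):

```python
import numpy as np

# Batch rule  dp = alpha * sum_i (du_i^T/dp) e_i  over a fixed evaluation set,
# for the scalar mapping u = M(x; p) = p[0] + p[1] * x (linear in p).
def phi(x):
    return np.array([1.0, x])              # du/dp for this mapping

E = np.linspace(-1.0, 1.0, 5)              # finite evaluation set of Eqn. (2)

def u_star(x):
    return 3.0 * x - 1.0                   # desired outputs, assumed known here

p = np.zeros(2)
alpha = 0.1
for _ in range(200):
    dp = sum((u_star(x) - phi(x) @ p) * phi(x) for x in E)
    p = p + alpha * dp                     # one update per pass over all of E
print(np.round(p, 3))                      # recovers the target coefficients
```

Because this objective is a strictly convex quadratic in p, the gradient iteration converges to the unique minimizer; the complications discussed above arise precisely when such a fixed, fully known evaluation set is unavailable.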
As with all gradient based optimization techniques, there exists a possibility of
converging to a local minimum if the objective function is not unimodal. This point
together with the preceding discussion suggests two desiderata for learning control
systems employing gradient learning methods: first, the architecture should allow
for the determination (or accurate estimation) of the gradient ∂J/∂u and, second, the
cost function J should be a convex function of the adjustable parameters p . Note
that it may be possible to determine or estimate ∂J/∂u without ever knowing u* (this
point will be exploited in Section 4).
3.2 Connectionist Learning Systems
Connectionist systems, including what are often called "artificial neural networks,"
have been suggested by many authors to be ideal structures for the implementation
of learning control systems. A typical connectionist system is organized in a
network architecture that is comprised of nodes and connections between nodes.
Each node can be thought of as a simple processing unit, with a number of adjustable
parameters (which do not have to appear linearly in the nodal input / output
relationship). Typically, the number of different node types in a network is small
compared to the total number of nodes. Common examples of connectionist systems
include multilayer sigmoidal (Rumelhart, et al. (1990)) and radial basis function
(Poggio & Girosi (1990)) networks.5 The popularity of such systems arises, in part,
because they are relatively simple in form, are amenable to gradient learning
methods, and can be implemented in parallel computational hardware. For example,
"error back-propagation" (which is discussed in the appendix of Chapter 8) is an ef-
ficient implementation of a gradient algorithm to modify the adjustable network pa-
rameters in a multilayer sigmoidal network, based on the squared error at the output
of the network.
Perhaps more importantly, however, it is well known that several classes of
connectionist systems have the universal approximation property. This property
implies that any continuous function can be approximated to a given degree of accu-
racy by a sufficiently large network (Funahashi (1989), Hornik, et al. (1989)).
Although the universal approximation property is important, it is held by so many
different approximation structures that it does not form a suitable basis upon which
to distinguish them. Thus, we must ask what other attributes are important in the
context of learning control. In particular, we must look beyond the initial biological
motivations for connectionist systems and determine whether they indeed hold any
advantage over more traditional approximation schemes. An important factor to
consider is the environment in which learning will occur. Thus, for example, the
quantity, quality, and content of the information that is likely to be available to the
learning system during its operation critically impact its performance, and should be
accounted for in the selection of a suitable learning approach.
The particular scenarios that we will consider involve the use of passive learning strategies; that is, learning schemes that are opportunistic and exploit whatever
5. We do not consider any recurrent networks (i.e., networks having internal feedback and, hence, internal state) in this discussion for the simple reason that any recurrent network representing a continuous or discrete-time dynamic mapping can be expressed as an equivalent dynamical system comprised of two static mappings separated by either an integration or unit delay operator. In other words, the problem can always be decomposed into two component problems: that of estimating the parameters of the static mappings and that of estimating the state of the dynamical system (e.g., via an extended Kalman filter (Livstone, et al. (1992))).
information happens to be available during the normal course of operation of the
closed-loop system. In contrast, one might also consider active learning strategies,
in which the learning control system not only attempts to drive the outputs of the
plant along a desired trajectory, but also explicitly seeks to improve the accuracy of
the mapping maintained by the learning system. This is achieved by introducing
"probing" signals that direct the plant into regions of its state-space where
insufficient learning has occurred. Active learning control is analogous to dual (adaptive) control (Åström & Wittenmark (1989)). Because we wish to focus on
passive learning strategies, the learning systems we consider must be capable of
accommodating on-line measurements and performance feedback that arise during
the normal operation of the closed-loop system. This situation presents special
challenges, as discussed in the next subsection.
3.3 Incremental Learning Issues
If the goal is to have learning occur on-line, in conjunction with a plant that can be
nominally modeled as the discrete-time dynamical system
xk+1 = f(xk, uk),  yk = h(xk, uk)    (3)

where f(⋅,⋅) and h(⋅,⋅) are continuous, then an objective function of the form given by
Eqn. (2) cannot be used directly. The main problem is that the set of possible inputs
to the mapping maintained by the learning system will not consist of a finite set of
discrete points. Consequently, there will be no easy way to select a finite set of
representative evaluation points zi ∈E, nor will it be possible to guarantee that any
or all of them are ever visited. In general, the inputs z to the learning system will
be composed of measured or estimated values of {x, u, y} — which represent a con-
tinuum. Fortunately, various alternative objective functions that approximate Eqn.
(2) are feasible and are often used in practice. For example, one approach would be
to allow the set E to grow on-line to include all zi as they are encountered; i.e.,
Ek = {z1, z2, . . . , zk}    (4)
In the special case where the adjustable parameters p appear linearly in the
gradient ∂J/∂p of Eqn. (2) and E is given by Eqn. (4), recursive linear estimation
techniques (e.g., RLS) could be used to obtain the "optimum" parameter vector p*
(corresponding to the particular set E). In most connectionist networks, however,
some or all of the adjustable parameters appear nonlinearly in ∂J/∂p; hence, linear
optimization methods cannot be used. Moreover, evaluation sets of the form given by
Eqn. (4) are difficult to employ in a nonlinear setting.
By far, the most common objective function used for on-line learning in control
applications is the point-wise function given by
J = (1/2) eTe    (5)
Eqn. (5) can be considered as a special case of Eqn. (2) when the evaluation set E
contains a single point at each sampling instant. Learning algorithms that seek to
minimize point-wise objective functions in lieu of objective functions defined over a
continuum are referred to as incremental learning algorithms; they are related to a
broad class of stochastic approximation methods (Gelb (1974)). Incremental gradient
learning algorithms operate by approximating the actual gradient ∂J/∂p of Eqn. (2) with an instantaneous estimate of the gradient, based on Eqn. (5). Incremental
gradient learning algorithms of this form are related to stochastic gradient methods
(Haykin (1991)). The use of point-wise objective functions to approximate batch (ensemble) objective functions (i.e., those in which E contains more than one point)
will generally not be successful unless special attention is given to the distribution
of the evaluation points, the form of the learning algorithm, and the structure of the
network. We will have more to say concerning this point in the next subsection.
One well-known and widely used stochastic gradient algorithm is the least-mean-
square (LMS) algorithm (Widrow & Hoff (1960)). The LMS parameter adjustment law
is ∆p = −α ⋅ ∂J/∂p, where the gradient ∂J/∂p is based on Eqn. (5). Given certain
assumptions (e.g., linearity, stationarity, Gaussian-distributed random variables,
etc.), LMS can be shown to be convergent, relative to the objective function of Eqn.
(2), with E given by Eqn. (4). In this case, the LMS algorithm is guaranteed to be
convergent in the mean and mean-square, i.e.,
lim k→∞ E(pk) = popt    and    lim k→∞ E(Jk) = Jsubopt > Jmin
if the learning rate coefficient α (a constant) satisfies conditions related to the
eigenvalues of the correlation matrix of z (e.g., α cannot be too large) (Haykin
(1991)). In the first limit, as the number of learning experiences goes to infinity,
the expected value of the parameter vector approaches that of the optimum
parameter vector popt corresponding to the Wiener solution for this problem (which
achieves Jmin). In the second limit, the expected value of the cost (which is the
mean-square error), also approaches a limit, but not the minimum value achieved by
the optimum (Wiener) solution. Under these same conditions, convergence of the
parameter vector (not its expected value) to the optimum value, i.e.,
lim k→∞ pk = popt
can be obtained if the learning rate coefficient decreases at a special rate over time
(e.g., αk ~ 1/k) (Gelb (1974)). Although the theory supporting the stability and convergence of the LMS algorithm only applies to the special case of a linear
network (among other assumptions), the basic strategy underlying LMS has been
used to formulate a simple learning algorithm for nonlinear networks. In this case,
the parameter adjustment law becomes
∆p = α ⋅ (∂uT/∂p) ⋅ e    (6)

where ∂J/∂p is based on Eqn. (5) (with e = u* − u), so that the performance feedback signal provided to the network is ∂J/∂u = −e. Eqn. (6) represents the standard
incremental gradient algorithm presently used by most practitioners for on-line
learning control; it is equivalent to incremental "error back-propagation" (e.g., see
Rumelhart, et al. (1990)).
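The role of the decaying learning rate can be sketched for a one-parameter linear "network" u = p ⋅ x with noisy targets (the target slope, noise level, and rate schedules below are arbitrary illustrative choices): a constant α leaves a persistent parameter jitter, while αk ~ 1/k lets the parameter itself settle:

```python
import numpy as np

# Incremental (LMS-style) updates  p += alpha_k * e * x  for u = p * x,
# with noisy targets u* = 2x + noise, comparing a constant learning rate
# against a decaying schedule alpha_k ~ 1/k.
rng = np.random.default_rng(1)
p_const, p_decay = 0.0, 0.0
for k in range(1, 20001):
    x = rng.uniform(-1.0, 1.0)
    target = 2.0 * x + 0.5 * rng.standard_normal()     # noisy desired output
    p_const += 0.5 * (target - p_const * x) * x        # constant alpha = 0.5
    p_decay += (2.0 / k) * (target - p_decay * x) * x  # alpha_k = 2 / k
print(round(p_const, 3), round(p_decay, 3))
```

With the constant rate the estimate hovers around the optimum (convergence in the mean only); with the decaying rate it converges to the optimum itself, at the price of losing the ability to track later changes in the plant.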
3.4 Spatially Localized Learning
Special constraints are placed on a learning system whenever learning is to occur
on-line, during closed-loop operation; these constraints can impact the network
architecture, learning algorithm, and training process. Assuming a passive learn-
ing system is being employed, the learning experiences (training examples) cannot
be selected freely, since the plant state (and outputs) are constrained by the system
dynamics, and the desired plant outputs are constrained by the specifications of the
control problem (without regard to learning). Under these conditions, the system
state may remain in small regions of its state-space for extended periods of time (e.g.,
near setpoints). In turn, this implies that the measurements z used for incremental
learning will remain in small regions of the input domain of the mapping being
synthesized. Such "fixation" can cause undesirable side-effects in situations where
parameter adjustments (based on incremental learning algorithms) have a non-local
effect on the mapping maintained by the learning system.
For example, if a parameter that has a non-local effect on the mapping is repeatedly
adjusted to correct the mapping in a particular region of the input domain, this may
cause the mapping in other regions to deteriorate and, thus, can effectively "erase"
learning that has previously taken place. Such undesirable behavior arises because
the parameter adjustments dictated by an incremental learning algorithm are made
on the basis of a single evaluation point, without regard to the remainder of the
mapping. Another unfortunate phenomenon is inherent in all incremental
learning algorithms: conflicting demands on the adjustable parameters are created
because, for instance, the vector pi* that minimizes J in Eqn. (5) at some point zi will generally differ from the vector pj* that minimizes this function at some other point zj. The idiosyncrasies associated with passive incremental learning in closed-loop
control (i.e., fixation coupled with non-local learning, and conflicting parameter
updates), have precipitated the development and analysis of spatially localized
learning systems.
The basic idea underlying spatially localized learning arises from the observation
that learning is facilitated in situations where a clear association can be made
between a subset of the adjustable elements of the learning system and a localized
region of the input-space. Further consideration of this point in the context of the
difficulties described above, suggests several desired traits for learning systems that
rely on incremental gradient learning algorithms. These objectives can be
expressed in terms of the previously mentioned "sensitivity" functions ∂Mi/∂pj, which are the partial derivatives of the mapping outputs Mi with respect to the adjustable parameters pj. At each point x in the input domain of the mapping, it is
desired that the following properties hold:
• for each Mi, there exists at least one pj such that the function ∂Mi/∂pj is relatively large in the vicinity of x (coverage)
• for all Mi and pj, if the function ∂Mi/∂pj is relatively large in the vicinity of x, then it must be relatively small elsewhere (localization)
Under these conditions, incremental gradient learning is supported throughout the
input domain of the mapping, but its effects are limited to the local region in the
vicinity of each learning point. Thus, experience and consequent learning in one
part of the input domain have only a marginal effect on the knowledge that has
already been accrued in other parts of the mapping. For similar reasons, problems
due to conflicting demands on the adjustable parameters are also reduced.
Several existing learning system designs, including BOXES (Michie & Chambers
(1968)), CMAC (Albus (1975)), radial basis function networks (Poggio & Girosi (1990)),
and local basis / influence function networks (Baker & Farrell (1990)), generally do
exhibit the spatially localized learning property. In contrast, the ubiquitous
sigmoidal (or perceptron) network often does not exhibit this property. To combat
the problems associated with non-localized learning and conflicting parameter
updates, a number of simple corrective procedures have been used with sigmoidal
networks, including local batch learning, very slow learning rates, distributed (un-
correlated) input sequences, and randomizing input buffers (e.g., see Baird & Baker
(1990)).
To give a simple example of spatially localized learning, we will briefly describe local
basis / influence function networks and, in particular, the linear-Gaussian network. This approach relies on a combination of local basis and influence function nodal
units to achieve a compromise between the spatially localized learning properties of
quantized learning systems (e.g., those based on "bins") and the efficient
representation and generalization capabilities of other connectionist networks. The
complete network mapping is constructed from a set of "basis" functions fi(x) that have applicability only over spatially localized regions of the input-space. The influence functions γi(x) are coupled in a one-to-one fashion with the basis functions fi(x), and are used to describe the domain over the input-space (the "sphere of influence") of each local basis function. In other words, relative to some point xo in the input domain, each influence function γi(x) is defined as a non-negative function, with a maximum at xo, that tends to zero for all points x that are "far away" from xo. The overall input / output relationship is given by
y(x) = Σi=1..n Γi(x) fi(x)    (7a)

where Γi(x) are the normalized influence functions, defined to be

Γi(x) = γi(x) / Σj=1..n γj(x),  with 0 < Γi(x) ≤ 1 and Σi=1..n Γi(x) = 1    (7b)
By design, each adjustable parameter in this network affects the overall mapping
only over the limited region of its input-space described by the associated
(normalized) influence function. Thus, the aforementioned "fixation" problem is
avoided. Note also that (local) generalization is an inherent property of the network, and that standard incremental gradient learning methods can still be used.
To further illustrate the basic concept, we will consider a specific realization
employing linear functions (with an offset) as the local basis units, and Gaussian
functions as the influence function units. In this linear-Gaussian network, the functions fi(x) and γi(x) are defined to be:

fi(x) = Mi (x − xio) + bi,  γi(x) = ci exp{ −(x − xio)T Qi (x − xio) }    (8)

where, for each node pair i in the network, the matrices Mi and Qi, the vectors xio and bi, and the scalar ci are all potentially adjustable (Qi must be positive definite). The vector xio represents the local origin shared by the linear-Gaussian pair, the idea being that the overall mapping is approximated by fi(x) in the "vicinity" of xio (as characterized by Γi(x), relative to all other Γj≠i(x)). Because of its unique structure,
physical meaning is more easily attributed to each parameter and to the overall
structure of the network. As a result, a priori knowledge and partial solutions are
easily incorporated (e.g., linear control point designs corresponding to the fi(x)). In
fact, linear functions were chosen as the local basis units due to their simplicity and
compatibility with conventional gain scheduled mappings (alternative local basis
units may be more desirable if certain a priori knowledge is available about the
regional functional structure of the desired mapping). Due to its special structure,
this network also allows on-line variable structure learning schemes to be used,
where nodal unit pairs can be added or removed from the network to achieve more
accurate or more efficient mappings. An example of a simple linear-Gaussian
network comprised of 5 pairs of local basis / influence function units is shown in
Fig. 5; the influence functions (lower part of the figure) have been separated from
each other somewhat so that each of the local linear functions is clearly visible in
the overall input / output mapping (upper part of the figure).
Figure 5. A somewhat exaggerated example of a linear-Gaussian network mapping
(ℜ2 → ℜ), together with its underlying set of influence functions.
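To make Eqns. (7) and (8) concrete, the sketch below implements a scalar-input linear-Gaussian network with fixed node centers and widths, trained by the incremental gradient rule of Eqn. (6); the centers, widths, learning rate, and target function are all illustrative choices:

```python
import numpy as np

# A minimal scalar linear-Gaussian network: local linear units
# f_i(x) = m_i (x - xo_i) + b_i blended by normalized Gaussian influences.
class LinearGaussianNet:
    def __init__(self, centers, width):
        self.xo = np.asarray(centers, dtype=float)   # local origins xo_i
        self.m = np.zeros_like(self.xo)              # local slopes
        self.b = np.zeros_like(self.xo)              # local offsets
        self.q = 1.0 / (2.0 * width ** 2)            # fixed scalar Q_i

    def influence(self, x):
        g = np.exp(-self.q * (x - self.xo) ** 2)     # gamma_i(x), Eqn. (8)
        return g / g.sum()                           # Gamma_i(x), Eqn. (7b)

    def __call__(self, x):
        f = self.m * (x - self.xo) + self.b          # local basis units
        return float(self.influence(x) @ f)          # y(x), Eqn. (7a)

    def learn(self, x, u_star, alpha=0.5):
        # Incremental gradient step (Eqn. (6)); only nodes with significant
        # influence at x receive a meaningful adjustment.
        G = self.influence(x)
        e = u_star - self(x)
        self.b += alpha * e * G                      # dy/db_i = Gamma_i(x)
        self.m += alpha * e * G * (x - self.xo)      # dy/dm_i = Gamma_i(x)(x - xo_i)

rng = np.random.default_rng(0)
net = LinearGaussianNet(centers=np.linspace(-1.0, 1.0, 9), width=0.25)
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    net.learn(x, np.sin(2.0 * x))                    # illustrative target mapping
print(abs(net(0.5) - np.sin(1.0)))                   # small approximation error
```

Because each Gaussian influence is nearly zero away from its center, a learning step taken near one end of the domain leaves the mapping at the other end essentially untouched, which is precisely the localization property discussed above.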
Learning algorithms for spatially localized networks can capitalize on localization in
two ways. First, spatial localization implies that at each instant of time only a small
subset of the nodal units (and hence a small subset of the adjustable parameters)
have a significant effect on the network mapping. Thus, the efficiency of both
calculating the network outputs and of updating the network parameters can be im-
proved by ignoring all "insignificant" nodal units. For example, this can be realized
in a linear-Gaussian network by utilizing only those nodal unit pairs with the largest
normalized influences; that is, those whose combined (normalized) influence equals
or exceeds some predefined threshold (e.g., 0.95). This approach (which can be
considered as a means of achieving a "sparse" computational problem) can greatly
increase the throughput of a network when implemented in sequential
computational hardware. Furthermore, since the system state may remain in partic-
ular regions of its state-space for extended periods of time, it is expected that the
approximation error will not tend uniformly to zero. Instead, the error will be lowest
in those areas where the greatest amount of learning has occurred. This leads to
conflicting constraints on the learning rate: it should be small, to filter the effects of
noise, in those regions where the approximation error is small; at the same time, it
should be larger, for fast learning, in those regions where the approximation error
is large (relative to the ambient noise level). Resolution of this conflict is possible
through the use of spatially localized learning rates, where individual learning rate
coefficients are maintained for each (spatially localized) adjustable parameter and
updated in response to the local learning conditions. In this case, the elements of the
weighting matrix W would vary individually over time.
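The two ideas above (sparse updating plus spatially localized learning rates) can be sketched as follows, assuming a normalized-influence representation; the threshold, rates, and influence values below are invented for illustration.

```python
def active_units(rho, threshold=0.95):
    """Indices of the nodal units with the largest normalized influences,
    taking just enough units for their combined influence to reach the
    threshold (e.g., 0.95)."""
    order = sorted(range(len(rho)), key=lambda i: rho[i], reverse=True)
    active, combined = [], 0.0
    for i in order:
        active.append(i)
        combined += rho[i]
        if combined >= threshold:
            break
    return active

def local_update(weights, rates, rho, error):
    """Gradient-style update applied only to the active units, each with
    its own spatially localized learning-rate coefficient."""
    for i in active_units(rho):
        weights[i] += rates[i] * rho[i] * error
    return weights

# Invented normalized influences at the current state, and per-unit rates.
rho = [0.60, 0.30, 0.08, 0.02]
weights = local_update([0.0] * 4, [0.5] * 4, rho, error=1.0)
```

Only the three dominant units are touched; the fourth unit's weight is left unmodified, which is what keeps previously learned behavior in other regions of the state-space intact.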
The computational memory requirements for spatially localized networks fall
somewhere between those for non-local connectionist networks (on the low side) and
those for discrete-input, analog-output mapping architectures (on the high side). By
requiring each parameter to have only a localized effect on the overall mapping, we
should expect an increase in the number of parameters required to obtain a mapping
comparable in accuracy to a (potentially more efficient) non-local technique.
Nevertheless, for automatic control applications, training speed and approximation
accuracy should have priority over memory requirements, since memory is
generally inexpensive relative to the cost of inaccurate or inappropriate control ac-
tions.
4 Learning Control System Architectures
Having motivated and discussed the basic features of connectionist learning systems
for control, this section briefly describes hybrid control system architectures that
exhibit both adaptive and learning behaviors. These hybrid structures incorporate
adaptation and learning in a synergistic manner. In such schemes, an adaptive
system is coupled with a connectionist learning system to provide real-time
adaptation to novel situations and slowly time-varying dynamics, in conjunction
with learning to accommodate stationary or quasi-stationary state-space dependen-
cies (e.g., memoryless nonlinearities). The adaptive control system reacts to
discrepancies between the desired and observed behaviors of the plant, to maintain
the requisite closed-loop system performance. These discrepancies may arise from
time-varying dynamics, disturbances, or unmodeled dynamics. In practice, little can
be done to anticipate time-varying dynamics and disturbances; thus, these
phenomena are usually handled through feedback in the adaptive system. In
contrast, the effects of some unmodeled dynamics (in particular, static non-
linearities) can be predicted from previous experience. This is the task given to the
learning system. Initially, all unmodeled behavior is handled by the adaptive system;
eventually, however, the learning system is able to anticipate previously
experienced, yet initially unmodeled behavior. Thus, the adaptive system can
concentrate on novel situations (where little or no learning has occurred) and
slowly time-varying behavior.
Two general hybrid architectures are outlined in this section. The discussion of
these architectures parallels the usual presentation of direct and indirect adaptive
control strategies. In each approach, the learning system is used to alleviate the
burden on the adaptive controller of continually reacting to predictable state-space
dependencies in the dynamical behavior of the plant (e.g., stationary, memoryless
nonlinearities). Note that various technical issues must be addressed to guarantee
the successful implementation of these approaches. For example, to ensure both the
stability and robustness of the closed-loop system (which includes both the adaptive
and learning systems, as well as the plant), one must address issues related to:
controllability and observability; the effects of noise, disturbances, model-order
errors, and other uncertainties; parameter convergence, sufficiency of excitation,
and nonstationarity; computational requirements, time-delays, and the effects of
finite precision arithmetic. Many (if not all) of these issues arise in the im-
plementation of traditional adaptive control systems; as such, there are some existing
sources one may refer to in the hope of addressing these issues (e.g., Åström &
Wittenmark (1989), Clauberg & Farrell (1991), Narendra & Annaswamy (1989), Slotine
& Li (1991)). Although these topics are well beyond the scope of this chapter, in some
instances the learning augmented approach appears to offer operational advantages
over the corresponding adaptive approach (with respect to such implementation
issues). For example, a typical adaptive system would require persistent excitation to
ensure the generation of accurate control or model parameters, under varying plant
operating conditions. A learning system, however, would only require suf f ic ient
excitation, during some training phase, to allow the stationary, state-space dependen-
cies of the parameters to be captured.
4.1 Direct Implementation
In the typical direct adaptive control approach (see Fig. 6), each control action u is
generated based on the measured ym and desired yd plant outputs, internal state of
the controller, and estimates of the pertinent control law parameters k . The
estimates of the control law parameters are adjusted, at each time-step, based on the
error e between the measured plant outputs and the outputs of a reference system
yr . Of course, care must be taken to ensure that the plant is actually capable of
attaining the performance specified by the selected reference system. Direct
adaptive control approaches do not rely upon an explicit plant model, and thus avoid
the need to perform on-line system identification.
The controller in Fig. 6 is structured so that normal adaptive operation would result if
the learning system were not implemented. The reference represents the desired
behavior for the augmented plant (controller plus plant), while the adaptive
mechanism is used to transform the reference error directly into a correction ∆k for
the current control system parameters. The adaptation algorithm can be developed
and implemented in several different ways (e.g., via gradient or Lyapunov based
techniques — see Åström & Wittenmark (1989), Slotine & Li (1991)). Learning aug-
mentation can be accomplished by using the learning system to store the required
control system parameters as a function of the operating condition of the plant
(Farrell & Baker (1992), Vos, et al. (1991)). Alternatively, learning can be used to
store the appropriate control action as a function of the actual and desired plant
outputs (Farrell & Baker (1991)). The architecture in Fig. 6 shows the first case.
Figure 6. Direct adaptive / learning approach.
When the learning system is used to store the control system parameters as a func-
tion of the plant operating condition, the adaptive system would provide any
required perturbation to the control parameters k generated by the learning system.
The signal from the control block to the learning system in Fig. 6 is the perturbation
in the control parameters δk to be associated with the previous operating condition.
This association (incremental learning) process is used to combine the estimate from
the adaptive system with the control parameters that have already been learned for
that operating condition. At each sampling instant, the learning system generates
an estimate of the control system parameters k associated with that operating
condition, and then passes this estimate to the controller where it is combined with
the perturbation parameter estimates maintained by the adaptive system, and used to
generate the control action u . In the ideal limit where perfect learning has
occurred, and there is an absence of noise, disturbances, and time-varying dynamics,
the correct parameter values would always be supplied by the learning system, so
that both the perturbations δk and corrections ∆k generated by the adaptive system
would become zero.6 Under more realistic assumptions, there would be some small
6. In this case, the system architecture is similar to that used in gain scheduling, with the proviso that learning has occurred on-line with the actual plant, while a gain schedule is developed off-line via a model.
degradation in performance due to adaptation (e.g., δk and ∆k might not be zero due
to noise).
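A toy sketch of this incremental-learning bookkeeping follows, with the learning system caricatured as a lookup table indexed by a discretized operating condition; the table size, absorption rate, and all numbers are invented for illustration (a real system would use a connectionist network rather than a table).

```python
class DirectLearningAugmentation:
    """The learning system caricatured as a table of control parameters,
    one row per discretized operating condition; the adaptive system's
    perturbation is slowly absorbed into long-term storage."""

    def __init__(self, n_params, n_cells, absorb_rate=0.1):
        self.table = [[0.0] * n_params for _ in range(n_cells)]
        self.absorb_rate = absorb_rate

    def recall(self, cell):
        # learned control parameters k for this operating condition
        return list(self.table[cell])

    def absorb(self, cell, delta_k):
        # incremental learning: fold a fraction of the adaptive
        # perturbation into the stored parameters
        for j, d in enumerate(delta_k):
            self.table[cell][j] += self.absorb_rate * d

# One control cycle (all numbers invented): combine learned k with the
# adaptive perturbation, then let the learning system absorb it.
ls = DirectLearningAugmentation(n_params=2, n_cells=10)
cell = 3                      # current (discretized) operating condition
delta_k = [0.5, -0.2]         # perturbation from the adaptive mechanism
k = [kl + d for kl, d in zip(ls.recall(cell), delta_k)]
ls.absorb(cell, delta_k)
```

If the same operating condition recurs with a similar perturbation, the stored parameters grow toward the correct values and the adaptive perturbation needed shrinks toward zero, matching the ideal limit described above.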
In the case where the learning system is trained to store control action directly as a
function of the actual and desired operating conditions of the plant, the adaptive
system would provide any required perturbation to the control action generated by
the learning system. Note that a dynamic mapping would have to be synthesized by
the learning system if a dynamic feedback law were desired (which was not
necessary in the first case). The advantage of this approach over the previous one is
that a more general control law can be learned. The disadvantage is that additional
memory is required and that a more difficult learning problem must be addressed.
4.2 Indirect Implementation
In the typical indirect adaptive control approach (see Fig. 7), each control action u is
generated based on the measured ym and desired yd plant outputs, internal state of
the controller, and estimated parameters pa of a local plant model. The parameters k
for a local control law are explicitly designed on-line, based on the observed plant
behavior. If the behavior of the plant changes (e.g., due to nonlinearity), an
estimator automatically updates its model of the plant as quickly as possible, based on
the information available from the (generally noisy) output measurements. The
indirect approach has the important advantage that powerful design methods
(including optimal control techniques) may potentially be used on-line. Note, how-
ever, that computational requirements are usually greater for indirect approaches
since both model identification and control law design are performed on-line.
If the learning system in Fig. 7 were not implemented, then this structure would
represent the operation of a traditional indirect adaptive control system. The signal
pa is the adaptive estimate of the plant model parameters. This signal is used to
calculate the control law parameters k . Incorporation of the learning system would
allow the plant model parameters to be learned as a function of the plant operating
condition. The model parameters generated by the learning system allow previously
experienced plant behavior to be anticipated, leading to improved control law design
(Baird & Baker (1990)). In this case, the output of the learning system pl to both the
control design block and the estimator is an a priori estimate of the model parameters
associated with the current operating condition. An a posteriori parameter estimate
p post from the estimator (involving both filtering and posterior smoothing) is used to
update the mapping stored by the learning system. The system uses model parameter
estimates from both the adaptive and learning systems to execute the control law
design and determine the appropriate control law parameters. In situations where
the design procedure is complex and time-consuming, the control law parameters
might also be stored (via a separate mapping in the learning system) as a function of
the plant operating condition. Thus, control law design could be performed at a
lower rate, assuming that the control parameter mapping maintained by the
learning system was sufficiently accurate to provide reasonable control in lieu of
design at a higher rate.
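A correspondingly simplified sketch of one indirect cycle: the learning system (again caricatured as a table over discretized operating conditions) supplies the a priori model-parameter estimate, the estimator's correction yields the a posteriori estimate, and a fraction of the difference is folded back into the stored mapping. The correction and blend values are invented; an actual adaptive estimator would produce the correction from the noisy output measurements.

```python
def indirect_cycle(learn_table, cell, correction, blend=0.2):
    """One cycle of the indirect scheme: recall the a priori (learned)
    model parameters, apply the estimator's correction to obtain the
    a posteriori estimate, and fold part of the difference back into
    the stored mapping."""
    p_prior = list(learn_table[cell])
    p_post = [p + c for p, c in zip(p_prior, correction)]
    learn_table[cell] = [p + blend * (q - p)
                         for p, q in zip(p_prior, p_post)]
    return p_post

# Invented numbers: an empty learned mapping and one estimator correction.
table = [[0.0, 0.0] for _ in range(10)]   # model parameters per operating cell
p_post = indirect_cycle(table, cell=4, correction=[1.0, -0.5])
# p_post would feed the on-line control law design; the updated table row
# supplies a better a priori estimate the next time cell 4 is visited.
```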
Figure 7. Indirect adaptive / learning approach.
4.3 Summary of Adaptive / Learning Control Architectures
In both of the hybrid implementations described in this section, the learning system
(prior to any on-line interaction) would only contain knowledge derived from the
design model. During initial closed-loop operation, the adaptive system would be used
to accommodate any inadequacies in the a priori design knowledge. Subsequently, as
experience with the actual plant was accumulated, the learning system would be used
to anticipate the appropriate control or model parameters as a function of the
current plant operating condition. The adaptive system would remain active to
handle novel situations and limitations of the learning system (e.g., finite accuracy).
With perfect learning, but no noise, disturbances, or time-varying behavior in the
plant, the contribution from the adaptive system would eventually become zero. In
the presence of noise and disturbances, the contribution from the adaptive system
would become small, but non-zero (depending on the hybrid scheme used, however,
the effect of this contribution might be negligible). In the general case involving
all of these effects, the hybrid control system should perform better than either sub-
system individually. Recalling the discussion in Subsection 2.3, it can be seen that
adaptation and learning are complementary behaviors, and that they can be used
simultaneously (for purposes of automatic control) in a synergistic fashion.
5 Conclusion
The main objectives of this chapter were to introduce, motivate, and describe the
salient features of learning in control systems; to distinguish these features and
their associated benefits from related approaches; and to suggest means for both
implementing learning systems and incorporating them into control system
architectures. We have intentionally avoided the urge to identify and categorize the
ever-growing variety of learning systems and learning control structures. Instead,
we have focused on the key issues underlying their motivation, operation, and
implementation for the purpose of intelligent control.
In the framework we have presented, learning can be construed as the purposive
adjustment of the parameters (and possibly structure) of an appropriate
representational framework to achieve a desired mapping. Learning systems (for
control) can be realized through connectionist network structures coupled with
automatic function synthesis mechanisms. Two general control system architectures
have been presented to demonstrate how learning systems can be used to augment
traditional adaptive control system structures in a synergistic fashion. Further
discussion of implementation issues and other learning control system structures is
given throughout this book, and also in the available literature. Intelligent control
systems incorporating learning have the potential to accommodate uncertainty
through on-line interaction with the actual plant and improve efficiency and
performance through on-line self-optimization.
Presently, scientific and engineering substantiation of many of the potential
benefits associated with learning control remains to be produced. Nevertheless, on
the strength of the preliminary results that have been obtained, as well as the basic
theory that has been developed, further examination of the fundamental issues
underlying learning control and its application to intelligent control is warranted.
In particular, new research and development efforts aimed at demonstrating the
feasibility and potential benefits of learning control, and at identifying future
research directions, are needed. The fact that conventional and relatively inexpensive
computational facilities can be sufficient to support real-time implementations has
already been demonstrated (e.g., Farrell & Baker (1991)).
To close this chapter, we suggest the following topics as areas for future research
that appear to offer significant potential for improving the capabilities of existing
learning control systems.
1. Incremental function synthesis. How large a network is needed to adequately
represent a desired mapping (representational power)? What computational re-
sources will be required for implementation (representational efficiency /
throughput)? Can convergence and stability of the adjustable parameters be
guaranteed under any (even simplifying) conditions? If so, how fast will it
occur? Are there potential pitfalls associated with certain types of representation
schemes and learning algorithms (e.g., non-spatially localized learning systems)?
2. Variable structure learning. The representational power and computational
requirements of a connectionist network are determined to a great extent by its
structure. When the structure is determined and fixed a priori, conservative
design practices easily lead to over-design and an inefficient use of resources.
Thus, too few resources may be assigned to approximate complex portions of the
desired mapping, limiting the approximation accuracy, or too many resources
may be applied to approximate relatively simple parts of the mapping, resulting
in excessive computational requirements. Variable structure learning schemes
are one solution to this dilemma; however, efficient modification rules that dictate
when, where, and how changes are to be made, remain to be identified.
3. Coupling of adaptive and learning control systems. Both of the hybrid
architectures described in Section 4 relied on a combination of adaptive and
learning phenomena. For example, in the direct approach, control system
parameter estimates from the adaptive and learning systems had to be combined,
while in the indirect approach, model parameter estimates were combined prior
to on-line control law design. Is there an optimal way to perform such fusion?
Considering optimal linear estimation techniques such as Kalman filtering, one is
naturally led to the idea that any reasonable blending of adaptive and learning
estimates will depend on maintaining measures of their individual "quality" (e.g.,
error covariance). How can the quality (or confidence) associated with the
current state of the learning system (which will vary throughout its input
domain) be represented and maintained?
4. Higher level learning. The a priori specification of a realistic objective function
that can be achieved over an entire operational envelope is a difficult problem.
Accordingly, one topic for future research might be directed at adjusting the
objective function on-line to seek increased performance where possible, while
decreasing strain on the system where necessary. Another research topic might
involve planning and exploration techniques (e.g., see Chapter 9) to perform dual
control and learning, and thus ensure adequate training throughout the op-
erational envelope. These, and most other, higher level forms of learning control
all involve more complex optimization problems than are usually considered at
the present time.
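For the fusion question raised in topic 3, a minimal sketch of what such a blending rule might look like in the scalar case, using inverse-variance (Kalman-style) weighting; the estimates and variances below are invented for illustration.

```python
def fuse(p_adapt, var_adapt, p_learn, var_learn):
    """Blend adaptive and learned scalar estimates by inverse-variance
    weighting (the scalar least-squares / Kalman measurement rule)."""
    w = var_learn / (var_adapt + var_learn)   # weight on the adaptive estimate
    p = w * p_adapt + (1.0 - w) * p_learn
    var = (var_adapt * var_learn) / (var_adapt + var_learn)
    return p, var

# Invented example: the learned estimate is trusted more (smaller variance).
p, var = fuse(1.2, 0.04, 1.0, 0.01)
```

The estimate with the smaller error variance dominates the blend, and the fused variance is smaller than either input variance, which is the usual least-squares fusion property; the hard part flagged in the text is maintaining those variances for a learning system whose quality varies over its input domain.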
Acknowledgment
This article is based on work that was supported, in part, by: the Charles Stark Draper
Laboratory, Inc., under IR&D Project No. 276; the Air Force Wright Laboratory, under
Contract No. F33615-88-C-1740; the National Science Foundation, under Grant No. ECS-
9014065; and the Naval Air Warfare Center, under Contract No. N62269-91-C-0033.
Any opinions, findings, conclusions, or recommendations expressed in this material
are those of the authors and do not necessarily reflect the views of the sponsoring
agencies or of the Charles Stark Draper Laboratory, Inc.
The authors would also like to acknowledge the contributions by all members, past
and present, of the Learning Control Research Group at Draper Laboratory — their
help is greatly appreciated.
Bibliography
Albus, J. (1975). "A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC)," ASME Journal of Dynamic Systems, Measurement, and Control, Vol. 97, pp. 220-227.
Åström, K. & Wittenmark, B. (1989). Adaptive Control, Addison-Wesley.
Baird, L. & Baker, W. (1990). "A Connectionist Learning System for Nonlinear Control," Proceedings, 1990 AIAA Conference on Guidance, Navigation, and Control.
Baker, W. & Farrell, J. (1990). "Connectionist Learning Systems for Control," Proceedings, SPIE OE/Boston '90.
Clauberg, B. & Farrell, J. (1991). "Issues in the Implementation of an Indirect Adaptive Control System," Draper Laboratory Report CSDL-P-3136, Cambridge, MA.
Farrell, J. & Baker, W. (1991). "Learning Augmented Control for Advanced Autonomous Underwater Vehicles," Proceedings, 18th Annual AUVS Technical Symposium and Exhibit.
Farrell, J. & Baker, W. (1992). "Learning Control Systems," in Antsaklis, P. & Passino, K., eds., Intelligent and Autonomous Control Systems, Kluwer Academic.
Franklin, J. (1989). "Historical Perspective and State of the Art in Connectionist Learning Control," Proceedings, 28th IEEE Conference on Decision and Control.
Fu, K. (1964). "Learning Control Systems," in Tou, J. & Wilcox, R., eds., Computer and Information Sciences, Spartan.
Fu, K. (1970). "Learning Control Systems — Review and Outlook," IEEE Transactions on Automatic Control, Vol. AC-15, No. 2.
Funahashi, K. (1989). "On the Approximate Realization of Continuous Mappings by Neural Networks," Neural Networks, Vol. 2, pp. 183-192.
Gelb, A., ed. (1974). Applied Optimal Estimation, MIT Press.
Haykin, S. (1991). Adaptive Filter Theory, 2nd ed., Prentice Hall.
Hornik, K., Stinchcombe, M., & White, H. (1989). "Multilayer Feedforward Networks Are Universal Approximators," Neural Networks, Vol. 2, pp. 359-366.
Livstone, M., Farrell, J., & Baker, W. (1992). "A Computationally Efficient Algorithm for Training Recurrent Connectionist Networks," Proceedings, 1992 American Control Conference.
Michie, D. & Chambers, R. (1968). "BOXES: An Experiment in Adaptive Control," in Dale, E. & Michie, D., eds., Machine Intelligence 2, Oliver and Boyd.
Narendra, K. & Annaswamy, A. (1989). Stable Adaptive Systems, Prentice-Hall.
Poggio, T. & Girosi, F. (1990). "Networks for Approximation and Learning,"Proceedings of the IEEE, Vol. 78, No. 9, pp. 1481-1497.
Press, W., Flannery, B., Teukolsky, S., & Vetterling, W. (1988). Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press.
Rumelhart, D., Hinton, G., & Williams, R. (1986). "Learning Internal Representations by Error Propagation," in Rumelhart, D. & McClelland, J., eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, MIT Press / Bradford.
Sklansky, J. (1966). "Learning Systems for Automatic Control," IEEE Transactions on Automatic Control, Vol. AC-11, No. 1.
Slotine, J. & Li, W. (1991). Applied Nonlinear Control, Prentice-Hall.
Tsypkin, Y. (1973). Foundations of the Theory of Learning Systems, Academic Press.
Vos, D., Baker, W., & Millington, P. (1991). "Learning Augmented Gain Scheduling Control," Proceedings, 1991 AIAA Conference on Guidance, Navigation, and Control.
Widrow, B. & Hoff, M. (1960). "Adaptive Switching Circuits," 1960 WESCON Convention Record, Part IV, pp. 96-104.