An Introduction to Connectionist Learning Control Systems
TRANSCRIPT
From the book: Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches,
White, D. & Sofge, D., eds., Van Nostrand Reinhold, 1992.
Walter L. Baker and Jay A. Farrell
The Charles Stark Draper Laboratory, Inc., 555 Technology Square, Cambridge, MA 02139
Abstract
An important, perhaps even defining, attribute of an intelligent control
system is its ability to improve its performance in the future, based on past
experiences with its environment. The concept of learning is usually used to
describe the process by which this is achieved. This introductory chapter will
focus on control systems that are explicitly designed to exploit learning
behavior. In particular, the use of connectionist learning systems in this
context will be motivated and described. The basic paradigm which emerges is
that a control system can be viewed as a mapping, from plant outputs and
control objectives to actuation commands, with learning as the process of
modifying this mapping to improve future closed-loop system performance.
The feedback required for learning (i.e., the information required to correctly
generate the desired mapping) is obtained through direct interactions with
the plant (and its environment). Thus, learning can be used to compensate for
limited or inaccurate a priori design information by exploiting empirical data
that is gained experientially. A key advantage of connectionist learning
control systems is their ability to accommodate poorly modeled, nonlinear dy-
namical systems.
Contemporary learning control methodologies based on the perspective
outlined above will be described and compared to more traditional control
strategies including generalized robust and adaptive control methods. The
discussion that follows will identify both the distinguishing characteristics of
connectionist learning control systems and the benefits of augmenting tradi-
tional control approaches with learning.
1 Introduction
Intelligent control systems are intended to maintain closed-loop system integrity and
performance over a wide range of operating conditions and events. This objective
can be difficult to achieve due to the complexity of both the plant and the perfor-
mance objectives, and due to the presence of uncertainty. Such complications may
result from nonlinear or time-varying behavior, poorly modeled plant dynamics,
high dimensionality, multiple inputs and outputs, complex objective functions,
operational constraints, imperfect measurements, and the possibility of actuator,
sensor, or other component failures. Each of these effects, if present, must be ad-
dressed if the system is to operate reliably in an autonomous fashion. Although
learning systems may be used to address several of these difficulties, the main focus
of this introductory chapter will be on the control of complex dynamical systems that
are poorly modeled and nonlinear.
At this point, a basic question arises: What is a learning control system? While most,
if not all, researchers in the intelligent systems field would accept the general
statement that "the ability to learn is a key attribute of an intelligent system," very
few would be able to agree on any particular statement that attempted to be more pre-
cise. The stumbling blocks, of course, are the words "learn" and (ironically)
"intelligent." Similarly, it is difficult to provide a precise and completely satisfactory
definition for the term "learning control system." One interpretation that is,
however, consistent with the prevailing literature is that:
A learning control system is one that has the ability to improve its
performance in the future, based on experiential information it has
gained in the past, through closed-loop interactions with the plant and
its environment.1
There are several implications of this statement. One implication is that a learning
control system has some autonomous capability, since it has the ability to improve its
own performance. Another is that it is dynamic, since it may vary over time. Yet
1. To help focus the discussion that follows and avoid any unnecessary controversy, we will further limit our subject to include, primarily, the type of learning that one might associate with sensorimotor control, and exclude more sophisticated learning behaviors (e.g., planning and exploration — see Chapter 9).
another implication is that it has memory, since it can exploit past experience to
improve future performance. Finally, to improve its performance, the learning
system must operate in the context of an objective function and, moreover, it must
receive performance feedback that characterizes the appropriateness of its current
behavior in that context.
In a fundamental sense, the control design problem is to find an appropriate
functional mapping, from measured plant outputs ym and desired plant outputs yd, to
a control action u that will produce satisfactory behavior in the closed-loop system.
In other words, the problem is to choose a function (a control law) u = k(ym, yd, t) that
achieves certain performance objectives when applied to the open-loop system. In
turn, the solution to this problem may naturally involve other mappings; e.g., a
mapping from the current plant operating condition to the parameters of a
controller or local plant model, or a mapping from measured plant outputs to
estimated plant state. Accordingly, a learning system that could be used to synthesize
such mappings on-line would be an advantageous component of an intelligent
control system. To successfully employ learning systems in this manner, one must
have an effective means for their implementation and incorporation into the overall
control system architecture. The belief that connectionist systems offer a suitable
means with which to implement learning systems has been the impetus for a large
body of recent research, including much that is reported in this book. Perhaps a
more cogent statement of affairs is that, in the context of control, learning can be
viewed as the automatic incremental synthesis of multivariable functional mappings
and, moreover, that connectionist systems provide a useful framework for realizing
such mappings.
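The mapping view can be made concrete with a minimal sketch. Below, a control law is treated as a pure functional mapping from measured and desired plant outputs to a control action; learning would then amount to adjusting this function. The function names and gain value are illustrative assumptions, not from the text:

```python
# A control law viewed as a functional mapping: u = k(ym, yd).
# Illustrative sketch; names and the gain value are assumptions.

def make_proportional_law(gain):
    """Return a control-law mapping from (measured, desired) outputs to an action."""
    def k(ym, yd):
        return gain * (yd - ym)   # act on the output error
    return k

law = make_proportional_law(gain=2.0)
u = law(ym=1.0, yd=3.0)           # control action for an output error of 2.0
```

Synthesizing the mapping then means choosing (or incrementally adjusting) the function itself, not merely evaluating it.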
The remaining sections of this chapter are designed to satisfy three major goals:
first, to introduce and motivate the concept of learning in control (with special em-
phasis on the view that learning can be considered as the synthesis of an
appropriate functional mapping); second, to illustrate how a learning system can be
implemented and the role that connectionist systems can play in achieving this; and
third, to illustrate how learning can be incorporated into control system architec-
tures. Along the way, we will identify key issues that arise in the application of
learning to automatic control, and in the use of connectionist systems in such
2. Throughout this chapter, we will use boldface type to denote vector and matrix quantities, and italics to denote scalars.
applications. We will conclude with a short list of research areas that may be critical
to future success in this area and that appear to offer great potential for reward.
Because this chapter serves only as an introduction, the reader will be directed
(where appropriate) to subsequent chapters or specific references for further infor-
mation on connectionist systems and learning control.
2 Basic Issues in Learning Control
We will now address a number of basic questions. In particular: What is the role of
learning in the context of intelligent control? What are the alternatives to
learning? How is learning related to adaptation? What is the role of performance
feedback? How can a learning system be realized or implemented? Each of these
questions will be briefly considered in this section; later, in the sections that follow,
we shall develop more detailed and complete answers.
2 . 1 Role of Learning
Following Tsypkin (1973), the necessity for applying learning arises in situations
where a system must operate in conditions of uncertainty, and when the available
a priori information is so limited that it is impossible or impractical to design in
advance a system that has fixed properties and also performs sufficiently well. In
the context of intelligent control, learning can be viewed as a means of solving those
problems that lack sufficient a priori information to allow a complete and fixed
control system design to be derived in advance. Thus, a central role of learning in
intelligent control is to enable a wider class of problems to be solved, by reducing the
prior uncertainty to the point where satisfactory solutions can be obtained on-line.
This result is achieved empirically, by means of performance feedback, association,
and memory (or knowledge base) adjustment.
The principal benefits of learning control, given the present state of its
technological development, derive from the ability of learning systems to
automatically synthesize mappings that can be used advantageously within a control
system architecture. Examples of such mappings include a controller mapping that
relates measured and desired plant outputs to an appropriate set of control actions
(Fig. 1a), a related control parameter mapping that generates parameters (e.g., gains)
for a separate controller (Fig. 1b), a model state (or estimator) mapping that produces
state estimates (Fig. 1c), and a model parameter mapping that relates the plant
operating condition to an accurate set of model parameters (Fig. 1d). In general,
these mappings may represent dynamic functions (i.e., functions that involve
temporal differentiation or integration).
Learning is required when these mappings cannot be determined completely in
advance because of a priori uncertainty (e.g., modeling error). In a typical learning
control application, the desired mapping is stationary (i.e., does not depend explicitly
on time), and is expressed (implicitly) in terms of an objective function involving
the outputs of both the plant and the learning system. The objective function is used
to provide performance feedback to the learning system, which must then associate
this feedback with specific adjustable elements of the mapping that is currently
stored in its memory. The underlying idea is that experience can be used to improve
the mapping furnished by the learning system.
[Figure 1 omitted; its four panels show where a learned mapping can appear in the control loop:]
Figure 1a. Controller mapping: u = fu(ym, yd, t).
Figure 1b. Control parameter mapping: k = fk(ym, t).
Figure 1c. Model state (or estimator) mapping: x = fx(ym, u, t).
Figure 1d. Model parameter mapping: p = fp(ym, u, t).
2.2 Relation to Alternative Approaches
There are, of course, several alternative approaches that have been used to
accommodate poorly modeled, nonlinear dynamical behavior. These include
generalized robust and adaptive strategies, as well as what might be termed
"manually executed" learning techniques. The relations between these approaches
and the learning control approach are important and are discussed in the following
paragraphs.
Robust (or "Fixed") Approach. Robust control system design techniques attempt to
treat the problem of model uncertainty as best as possible in advance, so as to produce
a fixed design with guaranteed stability and performance properties for any specific
scenario contained within a given uncertainty set. A tradeoff exists between perfor-
mance and robustness, since robust control designs are usually achieved at the
expense of resulting closed-loop system performance (relative to a control design
based on an exact model with perfect certainty). Advanced robust control system
design methods have been developed to minimize this inherent performance /
robustness tradeoff. Although robust design methods are currently limited to linear
problems, nonlinear problems with model uncertainty can sometimes be approached
by interpolating among (gain scheduling) a representative set of robust point
designs over the full operating envelope of the plant, thus decreasing the amount of
model uncertainty that each linear point design must address. Nevertheless, the per-
formance resulting from any fixed control design is always limited by the
availability and accuracy of the a priori design information. If there is enough
uncertainty or complexity so that a fixed control law design will not suffice or cannot
be satisfactorily determined, then high closed-loop system performance can only be
obtained in one of three ways: (i) through improved modeling to reduce the
uncertainty, (ii) via an automatic on-line adjustment technique, or (iii) through an
iterative procedure involving experimental testing, evaluation, and manual tuning
of the nominal control law design.
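The gain-scheduling idea mentioned above, interpolating among a set of robust point designs, can be sketched in a few lines. The scheduling variable, breakpoints, and gain values below are illustrative assumptions:

```python
# Gain-scheduling sketch: linearly interpolate a gain among robust point
# designs tabulated against an operating-condition variable (e.g., airspeed).
# Table values are illustrative assumptions, not from the text.
import bisect

speeds = [100.0, 200.0, 300.0]   # scheduling variable at each point design
gains  = [4.0,   2.5,   1.5]     # robust gain designed at each point

def scheduled_gain(v):
    """Interpolate the point-design gains at operating condition v."""
    v = min(max(v, speeds[0]), speeds[-1])       # clamp to the design envelope
    i = bisect.bisect_right(speeds, v) - 1
    i = min(i, len(speeds) - 2)
    frac = (v - speeds[i]) / (speeds[i + 1] - speeds[i])
    return gains[i] + frac * (gains[i + 1] - gains[i])
```

Each point design only has to be robust over its own neighborhood, which is what reduces the uncertainty any single linear design must absorb.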
Adaptive Approach. In contrast to the robust or "fixed" design approach, adaptive
control approaches attempt to treat the problem of uncertainty through on-line
means. An adaptive control system can adjust itself to accommodate new situations,
such as changes in the observed dynamical behavior of the plant. In essence,
adaptive techniques monitor the input / output behavior of the plant to, either
explicitly or implicitly, identify the parameters of an assumed dynamical model. The
control system parameters are then adjusted to achieve some desired performance
objective. Thus, adaptive techniques seek to achieve increased performance by
improving some representation, which depends on knowledge of the plant, based on
on-line measurement information. An adaptive control system will attempt to adapt
whenever the behavior of the plant changes by a significant degree. If the
dynamical characteristics of the plant vary considerably over its operating envelope
(e.g., due to nonlinearity), then the control system may be required to adapt
continually. This is generally undesirable, since degradation in performance can be
associated with these adaptation periods. Note that adaptation can occur even in the
absence of time-varying dynamics and disturbances, since the controller must
readapt every time a different dynamical regime is encountered (i.e., one for which
the current control law is inadequate), even if it is returning to an operating con-
dition it has encountered and handled before.
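As a minimal illustration of the indirect adaptive idea, the sketch below identifies an unknown scalar plant gain on-line with a normalized least-mean-squares update and sets the control from the current estimate (a certainty-equivalence scheme). The plant, step size, and setpoint are illustrative assumptions:

```python
# Indirect adaptive sketch: estimate an unknown plant gain on-line,
# then control using the current estimate. All values are illustrative.

true_b = 2.0     # unknown plant gain: y = true_b * u
b_hat  = 0.5     # initial estimate
mu     = 0.5     # adaptation step size

for _ in range(50):
    u = 1.0 / b_hat                          # drive the output toward yd = 1
    y = true_b * u                           # plant response
    err = y - b_hat * u                      # prediction error of current model
    b_hat += mu * err * u / (1e-9 + u * u)   # normalized LMS parameter update
```

Each pass shrinks the parameter error; the transient before convergence is exactly the period of degraded performance the text describes.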
"Manually Executed" Learning Approach. Another approach that has been applied
in applications with complex nonlinear dynamical behavior is based on a kind of
manually executed learning control system. In fact, this is the predominant design
practice used to develop flight control systems for aircraft. In this approach, the
control law is developed through an iterative process that integrates multiple control
law point designs to approximate the required nonlinear control law. This approach
often results in numerous design iterations, each involving manual redesign of the
nominal control law (for certain operating conditions), followed by extensive
computer simulation to evaluate the modified control law. After the initial control
system has been designed, extensive empirical evaluation and tuning (manual
adjustment and redesign) of the nominal control system is often required. This arises
because the models used during the design process do not always accurately reflect
the actual plant dynamics. During this procedure, the role of the "learning system"
is played by a combination of control system design engineers, modeling specialists,
and evaluators (including the test pilots). An interesting perspective to consider is
that the learning control approaches discussed within this book potentially offer a
means of automating this manual design, evaluation, and tuning process for certain
applications.
2.3 Adaptation vs. Learning
At first glance, the rough characterization of a learning system provided in Section 1
does not appear to offer any means of distinguishing a learning control system from
an adaptive one. To some degree this is unavoidable, since there is no definitive
dividing line between the two processes. If, however, one considers the differences
between representative adaptive and learning control systems, then several
distinguishing qualities emerge. For example, learning control systems make exten-
sive use of memory; adaptive control systems also rely on memory for their opera-
tion, but to a significantly lesser degree. In the discussion that follows we will see
that both adaptive and learning control systems can be based on parameter
adjustment algorithms, and that both make use of experiential information gained
through closed-loop interactions with the plant. Nevertheless, we intend to clearly
differentiate the goals and behavioral characteristics of adaptation from those of
learning. A control system that treats every distinct operating situation as a novel
one is limited to adaptive operation, whereas a system that correlates past
experiences with past situations, and that can recall and exploit those past expe-
riences, is capable of learning. From a teleological perspective, adaptation and
learning are different. The key differences (which are essentially a matter of
degree, emphasis, and intended purpose) are summarized in Table 1 and discussed in
the paragraphs that follow.
Adaptive control has a temporal emphasis: its objective is to maintain some desired
closed-loop behavior in the face of disturbances and dynamics that appear to be time-
varying. In actuality, the apparent temporal variation may be caused by
nonlinearities, when the operating point of the plant changes (resulting in temporal
changes in the local linearized behavior of a nonlinear, time-invariant plant).
Because the functional forms used by most adaptive control laws are generally inca-
pable of representing, over a wide range of operating conditions, the required
control action as a function of the current plant state, it can be said that adaptive
controllers lack "memory" in the sense that they must readapt to compensate for all
apparent temporal variations in the dynamical behavior of a plant, even those which
are due to time-invariant nonlinearities and have been experienced previously.

Table 1. Adaptation vs. Learning.

    ADAPTATION                                 LEARNING
    reactive: maintain desired behavior        constructional: synthesize desired behavior
      (local optimization)                       (global optimization)
    temporal emphasis                          spatial emphasis
    no "memory" ⇒ no anticipation              "memory" ⇒ anticipation
    fast dynamics                              slow dynamics
    novel situations & slowly                  structural uncertainty & nonlinear
      time-varying behavior                      dependencies
This point is further illustrated with Fig. 2. If the curved surface in this figure
represents the time-invariant, nonlinear dynamical behavior of a simple plant, and
the plant is operating near the point x1, then an adaptive control system might
generate a control law that is suitable for the local linearized behavior of the plant
about this point (the local linear behavior is indicated by the tangent plane at x1). If
at a later time, the operating point of the plant changes to x2, then the adaptive
controller must generate a new control law that is compatible with the local lin-
earized behavior about x2. Moreover, if the operating point ever returns to x1,
adaptive controller will have to "solve" the local design problem anew to determine
an appropriate control law for a situation it has operated in before. As can be plainly
seen, the local model used by the adaptive controller will be time-varying, even
though the underlying plant behavior is time-invariant.
Even under ideal circumstances, whenever the behavior of the plant changes, the
dynamical nature of the adaptive process will cause delays in the production of the
desired control actions. As a result, inappropriate control actions may be applied for
a finite period. In the case of a nonlinear time-invariant plant, this results in
degraded performance since unnecessary transient behavior due to inappropriate
control will occur every time the dynamical behavior of the plant changes by a
significant degree. Furthermore, the adaptive process is also inefficient since
(presumably, in this special case) the desired control law could actually be
represented purely as a function of the current state of the plant and the desired
plant outputs, so that no adaptation would be required.
[Figure 2 omitted: a curved surface representing the nonlinear dynamical behavior of the plant, with tangent planes at operating points x1 and x2.]
Figure 2. A cartoon illustrating the difference between adaptation and learning.
In general, adaptive controllers operate by optimizing a small set of adjustable pa-
rameters to account for plant behavior that is local in both state-space and time (e.g.,
the local linearized behavior at the current time). To be effective, adaptive control
systems must have relatively fast dynamics so that they can quickly react to
changing plant behavior. In some instances, however, the linearized dynamics may
vary so fast that the adaptive system cannot maintain desired performance through
adaptive action alone. As also argued by Fu (1964), it is in this type of situation
(where the variation in dynamical behavior is due to nonlinearity) that a learning
system is needed. Because the learning system retains information, it can, in
principle, react more rapidly to purely state-dependent variations once it has
learned.
Learning controllers exploit an automatic mechanism that associates, throughout
some operating envelope, a suitable control action or set of control system or plant
model parameters with the current operating condition. In this way, the presence
and effect of previously unknown nonlinearities can be accounted for and antici-
pated (in the future), based on past experience. Once such a control system has
"learned," transient behavior that would otherwise be induced in an adaptive system
by state-dependent variations in the dynamics no longer occurs, resulting in greater
efficiency and improved performance over completely adaptive strategies.
Referring back to the example of Fig. 2, the learning augmented control approach
would seek to develop a control law that was suitable throughout the operational
envelope of the plant. Thus, the results of past closed-loop interactions with the
plant would be compiled into a global control law mapping and (once learning had
occurred) used to generate the appropriate control action (as a function of the
current operating condition) without the delays and transients associated with pure
adaptive action.
Learning systems operate by optimizing over a relatively large set of adjustable
parameters (and potentially variable structural elements) to construct a mapping
that captures the state-space dependencies (e.g., nonlinearities) of the problem,
again, throughout the operating envelope. In effect, this optimization is global in
the state-space of interest, and assumes that the desired mapping is stationary or
quasi-stationary. To successfully execute the optimization, learning systems make
extensive use of past information and employ relatively slow learning dynamics.
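The contrast with pure adaptation can be sketched crudely: a learning controller retains what it has learned as a mapping from operating condition to control parameters, so a revisited condition is handled by recall rather than readaptation. The regions, gain values, and stand-in adaptation routine below are illustrative assumptions:

```python
# Learning-as-memory sketch: store the control gain learned for each
# operating region; revisits recall it instead of readapting.
# Regions and gain values are illustrative assumptions.

def adapt_gain(region):
    """Stand-in for a slow on-line adaptation transient the learner avoids repeating."""
    return {"A": 1.0, "B": 3.0}[region]

memory = {}   # learned mapping: operating region -> control gain

def gain_for(region):
    if region not in memory:          # novel region: adapt, then remember
        memory[region] = adapt_gain(region)
    return memory[region]             # revisited region: recall immediately

for region in ["A", "B", "A", "A", "B"]:
    g = gain_for(region)
```

In the trace above, `adapt_gain` runs only twice, once per novel region; the other three visits are served from memory, which is the anticipation the text attributes to learning.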
As defined, the processes of adaptation and learning are complementary: each has
unique desirable characteristics from the point of view of intelligent control. For
example, adaptive behavior is needed to accommodate (slowly) time-varying dynam-
ics and novel situations (e.g., those which have never before been experienced), but
is often inefficient for problems involving significant nonlinearity. Learning
approaches, in contrast, have the opposite characteristic: they are well-equipped to
accommodate poorly modeled nonlinear dynamical behavior, but are not well-suited
to applications involving time-varying dynamics. We suggest that a control system
might actually comprise three subsystems: an a priori compensator, an
adaptive control system, and a learning control system. Similar strategies were
proposed as early as 1966 (Sklansky (1966)).
2.4 Performance Feedback
Thus far, we have used the term "fixed" to describe those control systems in which
the parameters and structure are determined in an open-loop, performance-
independent fashion. Included in this class are all static (memoryless) and dynamic
compensators with constant or scheduled parameters. In a fixed control system, the
structure, parameters, and scheduling dependencies of the controller are completely
determined through an off-line design based on a priori design information. Ex-
amples of control systems that do not belong to this category include those that
incorporate adaptation or learning.
The fixed control system design strategies may be regarded as being part of the
spectrum depicted in Fig. 3, where the individual techniques are organized according
to the degree to which past information is used by the control system to improve its
subsequent performance. The ordering of the first two categories is easily explained.
Static feedback approaches require only the current plant measurements, and
generate control actions using a static (memoryless) function. Examples of this class
include proportional output error and full-state linear feedback controllers (e.g.,
based on pole-placement, LQR, or H∞ design), as well as nonlinear state feedback
controllers (e.g., based on approximate linearization or dynamic inversion designs).
Dynamic compensators make use of current and past plant output measurements to
generate control actions. The functional mapping (based on current measurements
only) described by such a compensator is a dynamic function (involving temporal
differentiation or integration) that requires internal state variables to characterize
past plant behavior. Examples include state feedback controller / state estimator
combinations (e.g., LQG based designs) and classical frequency domain compensators
(e.g., PID or lead-lag compensators). In the case of static feedback and dynamic com-
pensators that are fixed, the control law is designed off-line and does not vary on-
line (in the sense that its parameters and structure do not depend on the perfor-
mance of the closed-loop system).
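The distinction between static feedback and dynamic compensation can be shown in miniature: a proportional law is a memoryless function of the current measurements, while a PI compensator carries an integrator state that summarizes past output error. The gains and sampling interval below are illustrative assumptions:

```python
# Static feedback vs. dynamic compensation, in miniature.
# Gains and sample time are illustrative assumptions.

def p_law(ym, yd, kp=2.0):
    return kp * (yd - ym)              # static: current measurements only

class PICompensator:
    def __init__(self, kp=2.0, ki=0.5, dt=0.01):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0            # internal state: accumulated past error
    def step(self, ym, yd):
        e = yd - ym
        self.integral += e * self.dt   # memory of past plant behavior
        return self.kp * e + self.ki * self.integral
```

Both are "fixed" in the sense of this section: the integrator state varies on-line, but the parameters and structure never depend on measured closed-loop performance.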
[Figure 3 omitted: a spectrum running from fixed designs (static feedback, dynamic compensation) to flexible designs (adaptive control, learning control), ordered by increasing use of past information.]
Figure 3. A spectrum of control strategies.
In contrast, control systems designs that are not fixed (and hence, involve adaptation
or learning), employ performance feedback information to adjust the parameters or
structure of the control law. Note that the distinction between "fixed" and "non-
fixed" control systems may depend on a somewhat arbitrary distinction between the
adjustable parameters and states of a control system.3 As is illustrated in Fig. 3,
adaptive and learning controllers both utilize past plant information to a greater
degree than either of the fixed control system design strategies. As previously indi-
cated, however, learning in the context of control can be construed as the ability to
automatically develop and retain a desired control law for a given plant, based on
closed-loop interactions with the plant (and its environment). The ability to develop
the requisite control law on-line clearly differentiates learning control approaches
from fixed design approaches (including those that are gain scheduled). The ability
to retain the control law as a function of operating condition further differentiates
learning strategies from adaptive ones. Thus, learning control systems utilize past
experience to the greatest extent, in their attempt to store control or model informa-
tion as a function of the plant operating condition. Note that extensive utilization of
past experience may lead to an increase in the computational resources required for
on-line operation. The expected return, of course, is increased performance.
3. If a control system "parameter" is adjusted in a dynamic fashion (i.e., it is not strictly a static function of other control system variables), then technically, it is a state variable. Nevertheless, a useful distinction can often be made between adjustable parameters and states, based on the design of the control system and the time rate of change of the quantities in question. For example, the "states" of a controller are usually determined by signals internal to the control loop (i.e., determined more or less directly from plant measurements), while the "parameters" of a controller are usually determined by an external agent. Moreover, the time rate of change of the state variables is usually significantly faster than that of the adjustable parameters.
2.5 Implementation of Learning Systems
From the previous discussion, it is clear that a learning system must be capable of
accumulating and manipulating experiential information, storing and retrieving
compiled knowledge, and adapting its stored knowledge to accommodate new
experiences. A key implementation point is that a learning system will require an
efficient representational framework to retain empirically derived knowledge. In
addition, the structure and operational attributes of a learning system will be deter-
mined, in part, by the quantity and quality of the a priori design information that is
available, including the anticipated characteristics of the experiential information
that is expected to be measurable on-line.
One simple way to implement a learning system is to use a discrete-input, analog-
output mapping; that is, to partition the input-space into a number of disjoint
regions, so that the current output is determined by "looking up" the analog output
value associated with the current input region. Although the output values of such a
device are analog, the overall mapping will be discontinuous, as depicted in Fig. 4a.
Many early learning control systems were based on this type of architecture (e.g.,
BOXES — Michie & Chambers (1968)). Assuming that the learning system output is
used directly for control action, a nonlinear control law can be developed by
learning the appropriate output value for each input region (resulting in a
"staircase" approximation to the actual desired control law). The connection between
learning and function synthesis is readily apparent in this case. One drawback of
this approach is the combinatorial growth in the number of regions required, as
either the input-space dimension or the number of partitions per input-space
dimension is increased; another is the discontinuous nature of the control laws that
can be represented.
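A minimal sketch of such a discrete-input, analog-output scheme (in the spirit of, though not taken from, the BOXES work) partitions a scalar input space into cells and learns one analog output per cell as a running average of observed targets. The cell count and target function are illustrative assumptions:

```python
# Lookup-table learning sketch: a "staircase" approximation learned
# cell by cell. Cell count and target function are illustrative.
import math

N_CELLS = 10                      # partitions of the input space [0, 1)
values  = [0.0] * N_CELLS         # stored analog output per cell
counts  = [0] * N_CELLS

def cell(x):
    return min(int(x * N_CELLS), N_CELLS - 1)

def update(x, target):
    """Fold the observed target into x's cell (incremental running mean)."""
    i = cell(x)
    counts[i] += 1
    values[i] += (target - values[i]) / counts[i]

def lookup(x):
    return values[cell(x)]        # constant within each cell: a staircase

# Learn a smooth target from repeated samples over the input space.
for k in range(1000):
    x = (k % 100) / 100.0
    update(x, math.sin(x))
```

Every input in the same cell retrieves the same output, which is the discontinuous staircase the text describes, and doubling the resolution in each of d input dimensions multiplies the number of cells by 2^d.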
A more sophisticated learning system can be developed via an appropriate
mathematical framework that is capable of representing a family of continuous
functions (see Fig. 4b); this framework can have a fixed or variable structure, and a
potentially large number of free parameters. Such architectures, including
artificial neural networks, are often used in contemporary learning control systems.
In this case, the learning process is designed to automatically adjust the parameters
(or structure) of the functional form to achieve the desired input / output mapping.
Such representations have important advantages over simple look-up table ap-
proaches; for instance, continuous functional forms are generally more efficient in
terms of the number of free parameters, and hence, the amount of memory, needed to
approximate a smooth function. Furthermore, they are capable of automatically
providing local generalization of learning experiences.
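A small radial-basis-function approximator illustrates this kind of continuous mapping structure: incremental gradient updates adjust the weights, and because each basis function responds only near its center, an update at one input generalizes locally to nearby inputs. The centers, width, and learning rate below are illustrative assumptions:

```python
# Continuous-mapping sketch: a radial-basis-function approximator
# trained by incremental gradient descent on squared error.
# Centers, width, and learning rate are illustrative assumptions.
import math

centers = [i / 10.0 for i in range(11)]   # fixed RBF centers on [0, 1]
weights = [0.0] * len(centers)
WIDTH, RATE = 0.1, 0.5

def phi(x, c):
    return math.exp(-((x - c) / WIDTH) ** 2)

def predict(x):
    return sum(w * phi(x, c) for w, c in zip(weights, centers))

def learn(x, target):
    err = target - predict(x)
    for j, c in enumerate(centers):       # local update: distant weights barely move
        weights[j] += RATE * err * phi(x, c)

# Incrementally synthesize a smooth target mapping from samples.
for k in range(2000):
    x = (k % 100) / 100.0
    learn(x, math.sin(x))
```

Eleven weights here stand in for the hundreds of cells a comparably accurate lookup table would need, and the approximation is smooth between training points rather than piecewise constant.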
Figure 4a. A "staircase" approximation based on a discrete-input, analog-output mapping.
Figure 4b. A smooth approximation based on a continuous mapping structure.
3 Connectionist Learning Systems for Control
Although the motivation, models, and goals of much of connectionist learning
systems research are derived from the biological and behavioral sciences, current
research in the application of this work to problems in automatic control has yielded
an emerging theory with a firm basis in the established mathematical disciplines of
function approximation, estimation, and optimization. This approach coincides with
traditional connectionist implementations of learning systems in a number of ways:
namely, in the use of massively parallel architecture, extensive memory, local
computation, and automatic self-adjustment of parameters and structure. This
approach differs, however, in its view of the "learning" process itself. This new
perspective will be discussed further in this section.
A commonly held notion is that learning results in an association between input stimuli and desired output actions. By interpreting the word "association" in a mathematically rigorous manner, one is naturally led to the central idea underlying
many contemporary learning control systems. In these systems, "learning" is
viewed as a process of automatically synthesizing multivariable functional
mappings, based on a criterion for optimality and experiential information that is
gained incrementally over time (Baker & Farrell (1990)). Most importantly, this pro-
cess can be realized through the selective adjustment of the parameters and structure
of an appropriate representational framework.
Brief reviews of some early learning control system implementations are presented
in Sklansky (1966), Fu (1970), Franklin (1989), and Farrell & Baker (1992). The
advantages of contemporary implementation schemes relative to those of the past are
numerous and will be considered in this section. In this presentation we shall
decompose the overall problem of learning control into two sub-problems: first, how
can a learning system be realized or implemented and, second, how can such a
learning system be employed within a control system architecture? In this section
we discuss implementations of learning systems and show how connectionist net-
works can be used to realize the automatic function synthesis capabilities we desire.
Key benefits are derived from the smoothness, generality, and representational effi-
ciency of the mappings that can be obtained. Answers to the second question are
discussed in Section 4.
To further develop the main theme of this section, we will proceed by elaborating on
the notion of learning as automatic function synthesis. After a more formal in-
troduction to the concept and its key issues, the applicability of connectionist learning systems in this context will be described, followed by subsections covering the
issues of incremental and spatially localized learning.
3.1 Learning as Function Synthesis
To allow for a concrete discussion of the key concepts, we will make use of the follow-
ing definitions. Let M denote a general memory device that has been incorporated
into a learning system, and let D represent the domain over which this memory is
applicable. If x ∈ D, then the expression u = M(x) will be used to signify the "recall" of item u, by "situation" x, from the memory M. The desired mapping to be stored by this memory (via learning) will be denoted by M*(x). For purposes of control, x
could be used to represent plant states or outputs, or even more general control
situations; similarly, u could be used to represent control action directly or the
parameters of a control system or plant model. In the discussion that follows, we will
assume (without loss of generality) that x corresponds to the plant state (i.e., a point
in the state-space) or to a small set of plant states (i.e., a small closed region of the
state-space), and that u corresponds to control action. If the desired mapping M* were explicitly known (which is not generally the case), then a basic learning scheme would involve the following three steps: (1) Given x, use the current mapping M to generate u = M(x). (2) Compare u with the desired u*, given by u* = M*(x). (3) Update the mapping M to reduce the discrepancy between u and u*.4
For a wide and important class of learning control problems, the desired mapping is
known (or assumed) to be continuous in advance. In such situations, memory im-
plementations with efficient storage mechanisms can be proposed. By assuming that
the desired mapping M* is continuous, an approximate mapping M can be
implemented by any scheme capable of approximating arbitrary continuous
functions. In such cases, the memory M is represented as a continuous function,
parameterized by a vector p; i.e., M = M(x; p). To apply this approach to the learning scheme described above, the learning update step (3) would be achieved by appropriately adjusting the parameter vector p by an amount ∆p (yet to be determined). By "appropriate," we mean that the adjusted parameter vector p + ∆p is such that the resulting u = M(x; p + ∆p) would be "better" than the original u, relative to the desired response u*. As new learning experiences became available, the
mapping M would be incrementally improved. Knowledge recall would be achieved
by evaluating the functional mapping at a particular point in its input domain.
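The three-step scheme can be sketched in a few lines for a memory M(x; p) that is linear in p (the polynomial basis, target mapping, and learning rate below are illustrative assumptions, chosen only so the loop visibly converges):

```python
import numpy as np

# Three-step learning with a parameterized memory M(x; p).  The "desired
# mapping" M* is assumed known here only through sampled values u* = M*(x).
rng = np.random.default_rng(0)

def features(x):
    return np.array([1.0, x, x ** 2])      # fixed basis, so M(x; p) = phi(x).p

def M_star(x):
    return 0.5 - x + 2.0 * x ** 2          # target mapping (illustration only)

p = np.zeros(3)                            # memory parameters, initially blank
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    u = features(x) @ p                    # step 1: recall u = M(x; p)
    e = M_star(x) - u                      # step 2: compare with desired u*
    p += 0.1 * e * features(x)             # step 3: adjust p to shrink the error
print(np.round(p, 2))                      # approaches the target coefficients
```

Here the error is available directly because M* is known; the footnote's point is that the same loop survives when only an estimate of the error (or of the cost gradient) can be formed.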
In this parameterized approach to function synthesis, the knowledge that is gained
over time is stored in a distributed fashion in the parameter space of the memory.
This feature, which arises naturally in any practical implementation of a continuous
mapping, can be most desirable from a learning control point of view (depending on
the way it is achieved, as discussed below). Distributed learning is advantageous
when previous learning under similar circumstances can be combined to provide a
suitable response for the current situation. This fusion process effectively broadens
the scope and influence of each learning experience and is referred to as generalization.
4. If the desired mapping is not explicitly known, then a variant of this scheme can still be used in which the second step is approximated. As will be discussed, this can be accomplished (directly) by estimating the desired values of the mapping, or (indirectly) by estimating the gradient of an objective function with respect to the mapping.
There are several important ramifications of generalization. First, it has the effect of
eliminating "blank spots" in the memory (i.e., specific points at which no learning
has occurred), since some response (albeit not necessarily the desired one) will
always be generated. Second, it has the effect of constraining the set of possible
input / output mappings that can be achieved by the memory, since in most cases
neighboring input situations will result in similar outputs (i.e., the mapping would
become a smooth or piecewise-smooth function). Finally, generalization complicates
the learning process, since the adjustment of the mapping following a learning
experience can no longer be considered as an independent, point-by-point process.
In spite of this, the advantages accorded by generalization usually far outweigh the
difficulties it evokes.
Generalization is an intrinsic feature of function synthesis approaches that rely on
parameterized continuous mappings. In any practical implementation having a
finite number of adjustable parameters, each adjustable parameter will affect the
realized function over a region of non-zero measure. When a single parameter pj (from the set p = {p1, p2, . . . , pN}) is adjusted to improve the approximation at a specific point x, the continuous mapping M (i.e., at least one of the outputs of M = {M1, M2, . . . , Mm}) will be affected throughout the region of "influence" of that parameter. This region of influence is determined by the partial derivatives ∂Mi/∂pj (one for each output of M), which are functions of the input x. Under these conditions, the effect of a learning experience will be generalized automatically, and extended to all parts of the mapping in which the "sensitivity" functions ∂Mi/∂pj are non-zero. The greatest effect will occur where ∂Mi/∂pj is largest; little or no change
will occur wherever this quantity is small or zero. The nature of this generalization
may or may not be beneficial to the learning process depending on whether the ex-
tent of the generalization is local or global. These issues are further discussed in
Subsection 3.4.
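The local-versus-global distinction can be seen directly from the sensitivity functions themselves. In the illustrative sketch below (functions and thresholds are arbitrary choices, not from the chapter), the sensitivity of a sigmoidal unit's output weight is non-negligible over roughly half the input range, while that of a Gaussian unit's height parameter is confined near the unit's center:

```python
import numpy as np

# Sensitivity functions dM/dp for two single-parameter examples:
#   M(x) = w * sigmoid(x)  ->  dM/dw = sigmoid(x)    (global influence)
#   M(x) = c * exp(-x**2)  ->  dM/dc = exp(-x**2)    (local influence)
x = np.linspace(-5.0, 5.0, 1001)
sig_sens = 1.0 / (1.0 + np.exp(-x))
gauss_sens = np.exp(-x ** 2)

def influenced_fraction(s, eps=0.01):
    # Fraction of the input range where the sensitivity exceeds a threshold
    return float(np.mean(np.abs(s) > eps))

print(influenced_fraction(sig_sens), influenced_fraction(gauss_sens))
```

A single update to the sigmoidal weight therefore perturbs the mapping over most of the domain, whereas the Gaussian parameter's effect is spatially confined.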
For function synthesis approaches based on parameterized representations, the
learning process requires an algorithm that will specify an appropriate ∆p so as to
achieve some desired objective. When the mathematical structure used to implement
the mapping is continuously differentiable and the objective function J can be
treated as a "cost" to be minimized, then the construction of ∆p can be straight-
forward. In the special case where the adjustable parameters p appear linearly in
the gradient vector ∂J/∂p of the cost function J with respect to the adjustable parameters p, the optimization could be treated as a linear algebra problem; in general (i.e., for most applications), nonlinear optimization methods must be used. One nonlinear technique that is suitable for on-line learning is the gradient learning algorithm: ∆p = −W ⋅ ∂J/∂p, where W is a positive definite matrix that determines the "learning rate," and the gradient ∂J/∂u is defined to be a column vector. If
a second-order Taylor expansion is used to provide a local approximation of the
objective function J (about the current parameter vector p ), then the "optimum" W
which minimizes this local quadratic cost function in a single step can be shown to
be equal to the inverse of the Hessian matrix H (of J ), so that
Wopt = H−1 = (∂²J/∂p∂pT)−1    (1)
Eqn. (1) is only valid when the local Hessian matrix is positive definite. Because it is
difficult to compute and invert the Hessian on-line, the weight matrix W usually only approximates the inverse of the full Hessian, as in the Levenberg-Marquardt method
(see Press, et al. (1988)). Often, in fact, a single learning rate coefficient α is used to
set W = αI .
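For a purely quadratic cost the Hessian is constant, and Eqn. (1) can be checked directly. In the sketch below (H and the minimizer are arbitrary illustrative choices), the step with W = H−1 lands on the minimum at once, while the common simplification W = αI converges only gradually:

```python
import numpy as np

# Quadratic cost J(p) = 0.5 * (p - p_star)^T H (p - p_star), so that
# grad J = H (p - p_star) and the Hessian of J is exactly H.
H = np.array([[4.0, 1.0],
              [1.0, 2.0]])                 # positive definite Hessian
p_star = np.array([1.0, -2.0])             # minimizer of J

def grad(p):
    return H @ (p - p_star)

p_newton = np.zeros(2)
p_newton = p_newton - np.linalg.inv(H) @ grad(p_newton)   # W_opt = H^{-1}
print(p_newton)                            # one step reaches p_star exactly

p_scaled = np.zeros(2)
for _ in range(100):                       # simple choice W = alpha * I
    p_scaled = p_scaled - 0.1 * grad(p_scaled)
print(np.round(p_scaled, 3))               # converges, but over many steps
```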
More insight can be gained into the gradient learning algorithm through an
application of the chain rule, which yields: ∆p = −W ⋅ ∂uT/∂p ⋅ ∂J/∂u (where the Jacobian ∂uT/∂p is defined as a matrix of gradient column vectors ∂ui/∂p, so that ∂J/∂p = ∂uT/∂p ⋅ ∂J/∂u). This form of the gradient learning rule involves two types of
information: the Jacobian of the outputs of the mapping with respect to the ad-
justable parameters, and the gradient of the objective function with respect to the
mapping outputs. The gradient ∂J/∂u is determined both by the specification of the
objective function J and the manner in which the mapping outputs affect this
function (which, in turn, is determined by the way in which the learning system is
used within the control system architecture). The Jacobian ∂uT/∂p is completely determined by the approximation structure M and, hence, is known a priori as a function of the input x. Note that the performance feedback information provided to the learning system is the output gradient ∂J/∂u. This gradient vector provides the learning system with considerably more information than the scalar J; in particular, ∂J/∂u indicates both a direction and magnitude for ∆p (since ∂uT/∂p is known), whereas performance feedback based solely on the scalar J does neither.
To give an illustrative example, a simple quadratic objective function might be
defined as
J = (1/2) ΣE eiTei    (2)

where J is the cost to be minimized (over a finite set of evaluation points xi ∈ E = {x1, x2, . . . , xR}) and the output errors ei = ui* − ui = M*(xi) − M(xi) are assumed to
be known. In the special case where the objective function is given by Eqn. (2) and
W = αI , the learning rule is
∆p = α ΣE (∂uiT/∂p) ⋅ ei

If the objective function is a strictly convex function of p, then the gradient algo-
rithm will find the optimum value p* that minimizes J . For most practical learning
control problems, however, the situation is much more complicated. The objective
function J to be minimized may involve terms that are only known implicitly (e.g.,
the desired output u* may not be explicitly known or, equivalently, the output error
e of the mapping may not be measurable); moreover, J may be significantly more
complex than that shown in Eqn. (2) (e.g., J may be a dynamic rather than a static
function). Finally, for reasons that will be discussed in Subsection 3.3, objective
functions defined over a finite set of evaluation points (as in Eqn. (2)) cannot usually
be used directly for on-line learning control.
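A minimal sketch of this learning rule for a mapping that is linear in its parameters (the evaluation set, target values, and learning rate below are illustrative assumptions):

```python
import numpy as np

# Batch rule  dp = alpha * sum_i (du_i^T/dp) e_i  over a fixed evaluation set,
# for the scalar mapping u = M(x; p) = p[0] + p[1] * x (linear in p).
def phi(x):
    return np.array([1.0, x])              # du/dp for this mapping

E = np.linspace(-1.0, 1.0, 5)              # finite evaluation set of Eqn. (2)

def u_star(x):
    return 3.0 * x - 1.0                   # desired outputs, assumed known here

p = np.zeros(2)
alpha = 0.1
for _ in range(200):
    dp = sum((u_star(x) - phi(x) @ p) * phi(x) for x in E)
    p = p + alpha * dp                     # one update per pass over all of E
print(np.round(p, 3))                      # recovers the target coefficients
```

Because this objective is a strictly convex quadratic in p, the gradient iteration converges to the unique minimizer; the complications discussed above arise precisely when such a fixed, fully known evaluation set is unavailable.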
As with all gradient based optimization techniques, there exists a possibility of
converging to a local minimum if the objective function is not unimodal. This point
together with the preceding discussion suggests two desiderata for learning control
systems employing gradient learning methods: first, the architecture should allow
for the determination (or accurate estimation) of the gradient ∂J/∂u and, second, the
cost function J should be a convex function of the adjustable parameters p . Note
that it may be possible to determine or estimate ∂J/∂u without ever knowing u* (this
point will be exploited in Section 4).
3.2 Connectionist Learning Systems
Connectionist systems, including what are often called "artificial neural networks,"
have been suggested by many authors to be ideal structures for the implementation
of learning control systems. A typical connectionist system is organized in a
network architecture that is comprised of nodes and connections between nodes.
Each node can be thought of as a simple processing unit, with a number of adjustable
parameters (which do not have to appear linearly in the nodal input / output
relationship). Typically, the number of different node types in a network is small
compared to the total number of nodes. Common examples of connectionist systems
include multilayer sigmoidal (Rumelhart, et al. (1990)) and radial basis function
(Poggio & Girosi (1990)) networks.5 The popularity of such systems arises, in part,
because they are relatively simple in form, are amenable to gradient learning
methods, and can be implemented in parallel computational hardware. For example,
"error back-propagation" (which is discussed in the appendix of Chapter 8) is an ef-
ficient implementation of a gradient algorithm to modify the adjustable network pa-
rameters in a multilayer sigmoidal network, based on the squared error at the output
of the network.
Perhaps more importantly, however, it is well known that several classes of
connectionist systems have the universal approximation property. This property
implies that any continuous function can be approximated to a given degree of accu-
racy by a sufficiently large network (Funahashi (1989), Hornik, et al. (1989)).
Although the universal approximation property is important, it is held by so many
different approximation structures that it does not form a suitable basis upon which
to distinguish them. Thus, we must ask what other attributes are important in the
context of learning control. In particular, we must look beyond the initial biological
motivations for connectionist systems and determine whether they indeed hold any
advantage over more traditional approximation schemes. An important factor to
consider is the environment in which learning will occur. Thus, for example, the
quantity, quality, and content of the information that is likely to be available to the
learning system during its operation critically impact its performance, and should be
accounted for in the selection of a suitable learning approach.
The particular scenarios that we will consider involve the use of passive learning strategies; that is, learning schemes that are opportunistic and exploit whatever
5. We do not consider any recurrent networks (i.e., networks having internal feedback and, hence, internal state) in this discussion for the simple reason that any recurrent network representing a continuous or discrete-time dynamic mapping can be expressed as an equivalent dynamical system comprised of two static mappings separated by either an integration or unit delay operator. In other words, the problem can always be decomposed into two component problems: that of estimating the parameters of the static mappings and that of estimating the state of the dynamical system (e.g., via an extended Kalman filter (Livstone, et al. (1992))).
information happens to be available during the normal course of operation of the
closed-loop system. In contrast, one might also consider active learning strategies,
in which the learning control system not only attempts to drive the outputs of the
plant along a desired trajectory, but also explicitly seeks to improve the accuracy of
the mapping maintained by the learning system. This is achieved by introducing
"probing" signals that direct the plant into regions of its state-space where
insufficient learning has occurred. Active learning control is analogous to dual (adaptive) control (Åström & Wittenmark (1989)). Because we wish to focus on
passive learning strategies, the learning systems we consider must be capable of
accommodating on-line measurements and performance feedback that arise during
the normal operation of the closed-loop system. This situation presents special
challenges, as discussed in the next subsection.
3.3 Incremental Learning Issues
If the goal is to have learning occur on-line, in conjunction with a plant that can be
nominally modeled as the discrete-time dynamical system
xk+1 = f(xk, uk),  yk = h(xk, uk)    (3)

where f(⋅,⋅) and h(⋅,⋅) are continuous, then an objective function of the form given by
Eqn. (2) cannot be used directly. The main problem is that the set of possible inputs
to the mapping maintained by the learning system will not consist of a finite set of
discrete points. Consequently, there will be no easy way to select a finite set of
representative evaluation points zi ∈E, nor will it be possible to guarantee that any
or all of them are ever visited. In general, the inputs z to the learning system will
be composed of measured or estimated values of {x, u, y} — which represent a con-
tinuum. Fortunately, various alternative objective functions that approximate Eqn.
(2) are feasible and are often used in practice. For example, one approach would be
to allow the set E to grow on-line to include all zi as they are encountered; i.e.,
Ek = {z1, z2, . . . , zk}    (4)
In the special case where the adjustable parameters p appear linearly in the
gradient ∂J/∂p of Eqn. (2) and E is given by Eqn. (4), recursive linear estimation
techniques (e.g., RLS) could be used to obtain the "optimum" parameter vector p*
(corresponding to the particular set E). In most connectionist networks, however,
some or all of the adjustable parameters appear nonlinearly in ∂J/∂p; hence, linear
optimization methods cannot be used. Moreover, evaluation sets of the form given by
Eqn. (4) are difficult to employ in a nonlinear setting.
By far, the most common objective function used for on-line learning in control
applications is the point-wise function given by
J = (1/2) eTe    (5)
Eqn. (5) can be considered as a special case of Eqn. (2) when the evaluation set E
contains a single point at each sampling instant. Learning algorithms that seek to
minimize point-wise objective functions in lieu of objective functions defined over a
continuum are referred to as incremental learning algorithms; they are related to a
broad class of stochastic approximation methods (Gelb (1974)). Incremental gradient
learning algorithms operate by approximating the actual gradient ∂J/∂p of Eqn. (2) with an instantaneous estimate of the gradient, based on Eqn. (5). Incremental
gradient learning algorithms of this form are related to stochastic gradient methods
(Haykin (1991)). The use of point-wise objective functions to approximate batch (ensemble) objective functions (i.e., those in which E contains more than one point)
will generally not be successful unless special attention is given to the distribution
of the evaluation points, the form of the learning algorithm, and the structure of the
network. We will have more to say concerning this point in the next subsection.
One well-known and widely used stochastic gradient algorithm is the least-mean-
square (LMS) algorithm (Widrow & Hoff (1960)). The LMS parameter adjustment law
is ∆p = −α ⋅ ∂J/∂p, where the gradient ∂J/∂p is based on Eqn. (5). Given certain
assumptions (e.g., linearity, stationarity, Gaussian-distributed random variables,
etc.), LMS can be shown to be convergent, relative to the objective function of Eqn.
(2), with E given by Eqn. (4). In this case, the LMS algorithm is guaranteed to be
convergent in the mean and mean-square, i.e.,
lim k→∞ E(pk) = popt    and    lim k→∞ E(Jk) = Jsubopt > Jmin
if the learning rate coefficient α (a constant) satisfies conditions related to the
eigenvalues of the correlation matrix of z (e.g., α cannot be too large) (Haykin
(1991)). In the first limit, as the number of learning experiences goes to infinity,
the expected value of the parameter vector approaches that of the optimum
parameter vector popt corresponding to the Wiener solution for this problem (which
achieves Jmin). In the second limit, the expected value of the cost (which is the
mean-square error), also approaches a limit, but not the minimum value achieved by
the optimum (Wiener) solution. Under these same conditions, convergence of the
parameter vector (not its expected value) to the optimum value, i.e.,
lim k→∞ pk = popt
can be obtained if the learning rate coefficient decreases at a special rate over time
(e.g., αk ~ 1/k) (Gelb (1974)). Although the theory supporting the stability and convergence of the LMS algorithm only applies to the special case of a linear
network (among other assumptions), the basic strategy underlying LMS has been
used to formulate a simple learning algorithm for nonlinear networks. In this case,
the parameter adjustment law becomes
∆p = α ⋅ (∂uT/∂p) ⋅ e    (6)

where ∂J/∂p is based on Eqn. (5) (with e = u* − u), so that the performance feedback signal provided to the network is ∂J/∂u = −e. Eqn. (6) represents the standard
incremental gradient algorithm presently used by most practitioners for on-line
learning control; it is equivalent to incremental "error back-propagation" (e.g., see
Rumelhart, et al. (1990)).
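The role of the decaying learning rate can be sketched for a one-parameter linear "network" u = p ⋅ x with noisy targets (the target slope, noise level, and rate schedules below are arbitrary illustrative choices): a constant α leaves a persistent parameter jitter, while αk ~ 1/k lets the parameter itself settle:

```python
import numpy as np

# Incremental (LMS-style) updates  p += alpha_k * e * x  for u = p * x,
# with noisy targets u* = 2x + noise, comparing a constant learning rate
# against a decaying schedule alpha_k ~ 1/k.
rng = np.random.default_rng(1)
p_const, p_decay = 0.0, 0.0
for k in range(1, 20001):
    x = rng.uniform(-1.0, 1.0)
    target = 2.0 * x + 0.5 * rng.standard_normal()     # noisy desired output
    p_const += 0.5 * (target - p_const * x) * x        # constant alpha = 0.5
    p_decay += (2.0 / k) * (target - p_decay * x) * x  # alpha_k = 2 / k
print(round(p_const, 3), round(p_decay, 3))
```

With the constant rate the estimate hovers around the optimum (convergence in the mean only); with the decaying rate it converges to the optimum itself, at the price of losing the ability to track later changes in the plant.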
3.4 Spatially Localized Learning
Special constraints are placed on a learning system whenever learning is to occur
on-line, during closed-loop operation; these constraints can impact the network
architecture, learning algorithm, and training process. Assuming a passive learn-
ing system is being employed, the learning experiences (training examples) cannot
be selected freely, since the plant state (and outputs) are constrained by the system
dynamics, and the desired plant outputs are constrained by the specifications of the
control problem (without regard to learning). Under these conditions, the system
state may remain in small regions of its state-space for extended periods of time (e.g.,
near setpoints). In turn, this implies that the measurements z used for incremental
learning will remain in small regions of the input domain of the mapping being
synthesized. Such "fixation" can cause undesirable side-effects in situations where
parameter adjustments (based on incremental learning algorithms) have a non-local
effect on the mapping maintained by the learning system.
For example, if a parameter that has a non-local effect on the mapping is repeatedly
adjusted to correct the mapping in a particular region of the input domain, this may
cause the mapping in other regions to deteriorate and, thus, can effectively "erase"
learning that has previously taken place. Such undesirable behavior arises because
the parameter adjustments dictated by an incremental learning algorithm are made
on the basis of a single evaluation point, without regard to the remainder of the
mapping. Another unfortunate phenomenon is inherent in all incremental
learning algorithms: conflicting demands on the adjustable parameters are created
because, for instance, the vector pi* that minimizes J in Eqn. (5) at some point zi will generally differ from the vector pj* that minimizes this function at some other point zj. The idiosyncrasies associated with passive incremental learning in closed-loop
control (i.e., fixation coupled with non-local learning, and conflicting parameter
updates), have precipitated the development and analysis of spatially localized
learning systems.
The basic idea underlying spatially localized learning arises from the observation
that learning is facilitated in situations where a clear association can be made
between a subset of the adjustable elements of the learning system and a localized
region of the input-space. Further consideration of this point in the context of the
difficulties described above, suggests several desired traits for learning systems that
rely on incremental gradient learning algorithms. These objectives can be
expressed in terms of the previously mentioned "sensitivity" functions ∂Mi/∂pj, which are the partial derivatives of the mapping outputs Mi with respect to the adjustable parameters pj. At each point x in the input domain of the mapping, it is
desired that the following properties hold:
• for each Mi, there exists at least one pj such that the function ∂Mi/∂pj is relatively large in the vicinity of x (coverage)
• for all Mi and pj, if the function ∂Mi/∂pj is relatively large in the vicinity of x, then it must be relatively small elsewhere (localization)
Under these conditions, incremental gradient learning is supported throughout the
input domain of the mapping, but its effects are limited to the local region in the
vicinity of each learning point. Thus, experience and consequent learning in one
part of the input domain have only a marginal effect on the knowledge that has
already been accrued in other parts of the mapping. For similar reasons, problems
due to conflicting demands on the adjustable parameters are also reduced.
Several existing learning system designs, including BOXES (Michie & Chambers
(1968)), CMAC (Albus (1975)), radial basis function networks (Poggio & Girosi (1990)),
and local basis / influence function networks (Baker & Farrell (1990)), generally do
exhibit the spatially localized learning property. In contrast, the ubiquitous
sigmoidal (or perceptron) network often does not exhibit this property. To combat
the problems associated with non-localized learning and conflicting parameter
updates, a number of simple corrective procedures have been used with sigmoidal
networks, including local batch learning, very slow learning rates, distributed (un-
correlated) input sequences, and randomizing input buffers (e.g., see Baird & Baker
(1990)).
To give a simple example of spatially localized learning, we will briefly describe local
basis / influence function networks and, in particular, the linear-Gaussian network. This approach relies on a combination of local basis and influence function nodal
units to achieve a compromise between the spatially localized learning properties of
quantized learning systems (e.g., those based on "bins") and the efficient
representation and generalization capabilities of other connectionist networks. The
complete network mapping is constructed from a set of "basis" functions fi(x) that have applicability only over spatially localized regions of the input-space. The influence functions γi(x) are coupled in a one-to-one fashion with the basis functions fi(x), and are used to describe the domain over the input-space (the "sphere of influence") of each local basis function. In other words, relative to some point xo in the input domain, each influence function γi(x) is defined as a non-negative function, with a maximum at xo, that tends to zero for all points x that are "far away" from xo. The overall input / output relationship is given by
y(x) = Σi=1..n Γi(x) fi(x)    (7a)

where Γi(x) are the normalized influence functions, defined to be

Γi(x) = γi(x) / Σj=1..n γj(x),  with 0 < Γi(x) ≤ 1 and Σi=1..n Γi(x) = 1    (7b)
By design, each adjustable parameter in this network affects the overall mapping
only over the limited region of its input-space described by the associated
(normalized) influence function. Thus, the aforementioned "fixation" problem is
avoided. Note also that (local) generalization is an inherent property of the network, and that standard incremental gradient learning methods can still be used.
To further illustrate the basic concept, we will consider a specific realization
employing linear functions (with an offset) as the local basis units, and Gaussian
functions as the influence function units. In this linear-Gaussian network, the functions fi(x) and γi(x) are defined to be:

fi(x) = Mi (x − xio) + bi,  γi(x) = ci exp{ −(x − xio)T Qi (x − xio) }    (8)

where, for each node pair i in the network, the matrices Mi and Qi, the vectors xio and bi, and the scalar ci are all potentially adjustable (Qi must be positive definite). The vector xio represents the local origin shared by the linear-Gaussian pair, the idea being that the overall mapping is approximated by fi(x) in the "vicinity" of xio (as characterized by Γi(x), relative to all other Γj≠i(x)). Because of its unique structure,
physical meaning is more easily attributed to each parameter and to the overall
structure of the network. As a result, a priori knowledge and partial solutions are
easily incorporated (e.g., linear control point designs corresponding to the fi(x)). In
fact, linear functions were chosen as the local basis units due to their simplicity and
compatibility with conventional gain scheduled mappings (alternative local basis
units may be more desirable if certain a priori knowledge is available about the
regional functional structure of the desired mapping). Due to its special structure,
this network also allows on-line variable structure learning schemes to be used,
where nodal unit pairs can be added or removed from the network to achieve more
accurate or more efficient mappings. An example of a simple linear-Gaussian
network comprised of 5 pairs of local basis / influence function units is shown in
Fig. 5; the influence functions (lower part of the figure) have been separated from
each other somewhat so that each of the local linear functions is clearly visible in
the overall input / output mapping (upper part of the figure).
Figure 5. A somewhat exaggerated example of a linear-Gaussian network mapping
(ℜ2 → ℜ), together with its underlying set of influence functions.
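To make Eqns. (7) and (8) concrete, the sketch below implements a scalar-input linear-Gaussian network with fixed node centers and widths, trained by the incremental gradient rule of Eqn. (6); the centers, widths, learning rate, and target function are all illustrative choices:

```python
import numpy as np

# A minimal scalar linear-Gaussian network: local linear units
# f_i(x) = m_i (x - xo_i) + b_i blended by normalized Gaussian influences.
class LinearGaussianNet:
    def __init__(self, centers, width):
        self.xo = np.asarray(centers, dtype=float)   # local origins xo_i
        self.m = np.zeros_like(self.xo)              # local slopes
        self.b = np.zeros_like(self.xo)              # local offsets
        self.q = 1.0 / (2.0 * width ** 2)            # fixed scalar Q_i

    def influence(self, x):
        g = np.exp(-self.q * (x - self.xo) ** 2)     # gamma_i(x), Eqn. (8)
        return g / g.sum()                           # Gamma_i(x), Eqn. (7b)

    def __call__(self, x):
        f = self.m * (x - self.xo) + self.b          # local basis units
        return float(self.influence(x) @ f)          # y(x), Eqn. (7a)

    def learn(self, x, u_star, alpha=0.5):
        # Incremental gradient step (Eqn. (6)); only nodes with significant
        # influence at x receive a meaningful adjustment.
        G = self.influence(x)
        e = u_star - self(x)
        self.b += alpha * e * G                      # dy/db_i = Gamma_i(x)
        self.m += alpha * e * G * (x - self.xo)      # dy/dm_i = Gamma_i(x)(x - xo_i)

rng = np.random.default_rng(0)
net = LinearGaussianNet(centers=np.linspace(-1.0, 1.0, 9), width=0.25)
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    net.learn(x, np.sin(2.0 * x))                    # illustrative target mapping
print(abs(net(0.5) - np.sin(1.0)))                   # small approximation error
```

Because each Gaussian influence is nearly zero away from its center, a learning step taken near one end of the domain leaves the mapping at the other end essentially untouched, which is precisely the localization property discussed above.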
Learning algorithms for spatially localized networks can capitalize on localization in
two ways. First, spatial localization implies that at each instant of time only a small
subset of the nodal units (and hence a small subset of the adjustable parameters)
have a significant effect on the network mapping. Thus, the efficiency of both
calculating the network outputs and of updating the network parameters can be im-
proved by ignoring all "insignificant" nodal units. For example, this can be realized
in a linear-Gaussian network by utilizing only those nodal unit pairs with the largest
normalized influences; that is, those whose combined (normalized) influence equals
or exceeds some predefined threshold (e.g., 0.95). This approach (which can be
considered as a means of achieving a "sparse" computational problem) can greatly
increase the throughput of a network when implemented in sequential
computational hardware. Furthermore, since the system state may remain in partic-
ular regions of its state-space for extended periods of time, it is expected that the
approximation error will not tend uniformly to zero. Instead, the error will be lowest
in those areas where the greatest amount of learning has occurred. This leads to
conflicting constraints on the learning rate: it should be small, to filter the effects of
noise, in those regions where the approximation error is small; at the same time, it
should be larger, for fast learning, in those regions where the approximation error
is large (relative to the ambient noise level). Resolution of this conflict is possible
through the use of spatially localized learning rates, where individual learning rate
coefficients are maintained for each (spatially localized) adjustable parameter and
updated in response to the local learning conditions. In this case, the elements of the
weighting matrix W would vary individually over time.
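The two ideas above (sparse updating plus spatially localized learning rates) can be sketched as follows, assuming a normalized-influence representation; the threshold, rates, and influence values below are invented for illustration.

```python
def active_units(rho, threshold=0.95):
    """Indices of the nodal units with the largest normalized influences,
    taking just enough units for their combined influence to reach the
    threshold (e.g., 0.95)."""
    order = sorted(range(len(rho)), key=lambda i: rho[i], reverse=True)
    active, combined = [], 0.0
    for i in order:
        active.append(i)
        combined += rho[i]
        if combined >= threshold:
            break
    return active

def local_update(weights, rates, rho, error):
    """Gradient-style update applied only to the active units, each with
    its own spatially localized learning-rate coefficient."""
    for i in active_units(rho):
        weights[i] += rates[i] * rho[i] * error
    return weights

# Invented normalized influences at the current state, and per-unit rates.
rho = [0.60, 0.30, 0.08, 0.02]
weights = local_update([0.0] * 4, [0.5] * 4, rho, error=1.0)
```

Only the three dominant units are touched; the fourth unit's weight is left unmodified, which is what keeps previously learned behavior in other regions of the state-space intact.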
The computational memory requirements for spatially localized networks fall
somewhere between those for non-local connectionist networks (on the low side) and
those for discrete-input, analog-output mapping architectures (on the high side). By
requiring each parameter to have only a localized effect on the overall mapping, we
should expect an increase in the number of parameters required to obtain a mapping
comparable in accuracy to a (potentially more efficient) non-local technique.
Nevertheless, for automatic control applications, training speed and approximation
accuracy should have priority over memory requirements, since memory is
generally inexpensive relative to the cost of inaccurate or inappropriate control ac-
tions.
4 Learning Control System Architectures
Having motivated and discussed the basic features of connectionist learning systems
for control, this section briefly describes hybrid control system architectures that
exhibit both adaptive and learning behaviors. These hybrid structures incorporate
adaptation and learning in a synergistic manner. In such schemes, an adaptive
system is coupled with a connectionist learning system to provide real-time
adaptation to novel situations and slowly time-varying dynamics, in conjunction
with learning to accommodate stationary or quasi-stationary state-space dependen-
cies (e.g., memoryless nonlinearities). The adaptive control system reacts to
discrepancies between the desired and observed behaviors of the plant, to maintain
the requisite closed-loop system performance. These discrepancies may arise from
time-varying dynamics, disturbances, or unmodeled dynamics. In practice, little can
be done to anticipate time-varying dynamics and disturbances; thus, these
phenomena are usually handled through feedback in the adaptive system. In
contrast, the effects of some unmodeled dynamics (in particular, static non-
linearities) can be predicted from previous experience. This is the task given to the
learning system. Initially, all unmodeled behavior is handled by the adaptive system;
eventually, however, the learning system is able to anticipate previously
experienced, yet initially unmodeled behavior. Thus, the adaptive system can
concentrate on novel situations (where little or no learning has occurred) and
slowly time-varying behavior.
Two general hybrid architectures are outlined in this section. The discussion of
these architectures parallels the usual presentation of direct and indirect adaptive
control strategies. In each approach, the learning system is used to alleviate the
burden on the adaptive controller of continually reacting to predictable state-space
dependencies in the dynamical behavior of the plant (e.g., stationary, memoryless
nonlinearities). Note that various technical issues must be addressed to guarantee
the successful implementation of these approaches. For example, to ensure both the
stability and robustness of the closed-loop system (which includes both the adaptive
and learning systems, as well as the plant), one must address issues related to:
controllability and observability; the effects of noise, disturbances, model-order
errors, and other uncertainties; parameter convergence, sufficiency of excitation,
and nonstationarity; computational requirements, time-delays, and the effects of
finite precision arithmetic. Many (if not all) of these issues arise in the im-
plementation of traditional adaptive control systems; as such, there are some existing
sources one may refer to in the hope of addressing these issues (e.g., Åström &
Wittenmark (1989), Clauberg & Farrell (1991), Narendra & Annaswamy (1989), Slotine
& Li (1991)). Although these topics are well beyond the scope of this chapter, in some
instances the learning augmented approach appears to offer operational advantages
over the corresponding adaptive approach (with respect to such implementation
issues). For example, a typical adaptive system would require persistent excitation to
ensure the generation of accurate control or model parameters, under varying plant
operating conditions. A learning system, however, would only require suf f ic ient
excitation, during some training phase, to allow the stationary, state-space dependen-
cies of the parameters to be captured.
4.1 Direct Implementation
In the typical direct adaptive control approach (see Fig. 6), each control action u is
generated based on the measured ym and desired yd plant outputs, internal state of
the controller, and estimates of the pertinent control law parameters k . The
estimates of the control law parameters are adjusted, at each time-step, based on the
error e between the measured plant outputs and the outputs of a reference system
yr . Of course, care must be taken to ensure that the plant is actually capable of
attaining the performance specified by the selected reference system. Direct
adaptive control approaches do not rely upon an explicit plant model, and thus avoid
the need to perform on-line system identification.
The controller in Fig. 6 is structured so that normal adaptive operation would result if
the learning system were not implemented. The reference represents the desired
behavior for the augmented plant (controller plus plant), while the adaptive
mechanism is used to transform the reference error directly into a correction ∆k for
the current control system parameters. The adaptation algorithm can be developed
and implemented in several different ways (e.g., via gradient or Lyapunov based
techniques — see Åström & Wittenmark (1989), Slotine & Li (1991)). Learning aug-
mentation can be accomplished by using the learning system to store the required
control system parameters as a function of the operating condition of the plant
(Farrell & Baker (1992), Vos, et al. (1991)). Alternatively, learning can be used to
store the appropriate control action as a function of the actual and desired plant
outputs (Farrell & Baker (1991)). The architecture in Fig. 6 shows the first case.
Figure 6. Direct adaptive / learning approach.
When the learning system is used to store the control system parameters as a func-
tion of the plant operating condition, the adaptive system would provide any
required perturbation to the control parameters k generated by the learning system.
The signal from the control block to the learning system in Fig. 6 is the perturbation
in the control parameters δk to be associated with the previous operating condition.
This association (incremental learning) process is used to combine the estimate from
the adaptive system with the control parameters that have already been learned for
that operating condition. At each sampling instant, the learning system generates
an estimate of the control system parameters k associated with that operating
condition, and then passes this estimate to the controller where it is combined with
the perturbation parameter estimates maintained by the adaptive system, and used to
generate the control action u . In the ideal limit where perfect learning has
occurred, and there is an absence of noise, disturbances, and time-varying dynamics,
the correct parameter values would always be supplied by the learning system, so
that both the perturbations δk and corrections ∆k generated by the adaptive system
would become zero.6 Under more realistic assumptions, there would be some small
6. In this case, the system architecture is similar to that used in gain scheduling, with the proviso that learning has occurred on-line with the actual plant, while a gain schedule is developed off-line via a model.
degradation in performance due to adaptation (e.g., δk and ∆k might not be zero due
to noise).
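A toy sketch of this incremental-learning bookkeeping follows, with the learning system caricatured as a lookup table indexed by a discretized operating condition; the table size, absorption rate, and all numbers are invented for illustration (a real system would use a connectionist network rather than a table).

```python
class DirectLearningAugmentation:
    """The learning system caricatured as a table of control parameters,
    one row per discretized operating condition; the adaptive system's
    perturbation is slowly absorbed into long-term storage."""

    def __init__(self, n_params, n_cells, absorb_rate=0.1):
        self.table = [[0.0] * n_params for _ in range(n_cells)]
        self.absorb_rate = absorb_rate

    def recall(self, cell):
        # learned control parameters k for this operating condition
        return list(self.table[cell])

    def absorb(self, cell, delta_k):
        # incremental learning: fold a fraction of the adaptive
        # perturbation into the stored parameters
        for j, d in enumerate(delta_k):
            self.table[cell][j] += self.absorb_rate * d

# One control cycle (all numbers invented): combine learned k with the
# adaptive perturbation, then let the learning system absorb it.
ls = DirectLearningAugmentation(n_params=2, n_cells=10)
cell = 3                      # current (discretized) operating condition
delta_k = [0.5, -0.2]         # perturbation from the adaptive mechanism
k = [kl + d for kl, d in zip(ls.recall(cell), delta_k)]
ls.absorb(cell, delta_k)
```

If the same operating condition recurs with a similar perturbation, the stored parameters grow toward the correct values and the adaptive perturbation needed shrinks toward zero, matching the ideal limit described above.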
In the case where the learning system is trained to store control action directly as a
function of the actual and desired operating conditions of the plant, the adaptive
system would provide any required perturbation to the control action generated by
the learning system. Note that a dynamic mapping would have to be synthesized by
the learning system if a dynamic feedback law were desired (which was not
necessary in the first case). The advantage of this approach over the previous one is
that a more general control law can be learned. The disadvantage is that additional
memory is required and that a more difficult learning problem must be addressed.
4.2 Indirect Implementation
In the typical indirect adaptive control approach (see Fig. 7), each control action u is
generated based on the measured ym and desired yd plant outputs, internal state of
the controller, and estimated parameters pa of a local plant model. The parameters k
for a local control law are explicitly designed on-line, based on the observed plant
behavior. If the behavior of the plant changes (e.g., due to nonlinearity), an
estimator automatically updates its model of the plant as quickly as possible, based on
the information available from the (generally noisy) output measurements. The
indirect approach has the important advantage that powerful design methods
(including optimal control techniques) may potentially be used on-line. Note, how-
ever, that computational requirements are usually greater for indirect approaches
since both model identification and control law design are performed on-line.
If the learning system in Fig. 7 were not implemented, then this structure would
represent the operation of a traditional indirect adaptive control system. The signal
pa is the adaptive estimate of the plant model parameters. This signal is used to
calculate the control law parameters k . Incorporation of the learning system would
allow the plant model parameters to be learned as a function of the plant operating
condition. The model parameters generated by the learning system allow previously
experienced plant behavior to be anticipated, leading to improved control law design
(Baird & Baker (1990)). In this case, the output of the learning system pl to both the
control design block and the estimator is an a priori estimate of the model parameters
associated with the current operating condition. An a posteriori parameter estimate
p post from the estimator (involving both filtering and posterior smoothing) is used to
update the mapping stored by the learning system. The system uses model parameter
estimates from both the adaptive and learning systems to execute the control law
design and determine the appropriate control law parameters. In situations where
the design procedure is complex and time-consuming, the control law parameters
might also be stored (via a separate mapping in the learning system) as a function of
the plant operating condition. Thus, control law design could be performed at a
lower rate, assuming that the control parameter mapping maintained by the
learning system was sufficiently accurate to provide reasonable control in lieu of
design at a higher rate.
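A correspondingly simplified sketch of one indirect cycle: the learning system (again caricatured as a table over discretized operating conditions) supplies the a priori model-parameter estimate, the estimator's correction yields the a posteriori estimate, and a fraction of the difference is folded back into the stored mapping. The correction and blend values are invented; an actual adaptive estimator would produce the correction from the noisy output measurements.

```python
def indirect_cycle(learn_table, cell, correction, blend=0.2):
    """One cycle of the indirect scheme: recall the a priori (learned)
    model parameters, apply the estimator's correction to obtain the
    a posteriori estimate, and fold part of the difference back into
    the stored mapping."""
    p_prior = list(learn_table[cell])
    p_post = [p + c for p, c in zip(p_prior, correction)]
    learn_table[cell] = [p + blend * (q - p)
                         for p, q in zip(p_prior, p_post)]
    return p_post

# Invented numbers: an empty learned mapping and one estimator correction.
table = [[0.0, 0.0] for _ in range(10)]   # model parameters per operating cell
p_post = indirect_cycle(table, cell=4, correction=[1.0, -0.5])
# p_post would feed the on-line control law design; the updated table row
# supplies a better a priori estimate the next time cell 4 is visited.
```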
Figure 7. Indirect adaptive / learning approach.
4.3 Summary of Adaptive / Learning Control Architectures
In both of the hybrid implementations described in this section, the learning system
(prior to any on-line interaction) would only contain knowledge derived from the
design model. During initial closed-loop operation, the adaptive system would be used
to accommodate any inadequacies in the a priori design knowledge. Subsequently, as
experience with the actual plant was accumulated, the learning system would be used
to anticipate the appropriate control or model parameters as a function of the
current plant operating condition. The adaptive system would remain active to
handle novel situations and limitations of the learning system (e.g., finite accuracy).
With perfect learning, but no noise, disturbances, or time-varying behavior in the
plant, the contribution from the adaptive system would eventually become zero. In
the presence of noise and disturbances, the contribution from the adaptive system
would become small, but non-zero (depending on the hybrid scheme used, however,
the effect of this contribution might be negligible). In the general case involving
all of these effects, the hybrid control system should perform better than either sub-
system individually. Recalling the discussion in Subsection 2.3, it can be seen that
adaptation and learning are complementary behaviors, and that they can be used
simultaneously (for purposes of automatic control) in a synergistic fashion.
5 Conclusion
The main objectives of this chapter were to introduce, motivate, and describe the
salient features of learning in control systems; to distinguish these features and
their associated benefits from related approaches; and to suggest means for both
implementing learning systems and incorporating them into control system
architectures. We have intentionally avoided the urge to identify and categorize the
ever-growing variety of learning systems and learning control structures. Instead,
we have focused on the key issues underlying their motivation, operation, and
implementation for the purpose of intelligent control.
In the framework we have presented, learning can be construed as the purposive
adjustment of the parameters (and possibly structure) of an appropriate
representational framework to achieve a desired mapping. Learning systems (for
control) can be realized through connectionist network structures coupled with
automatic function synthesis mechanisms. Two general control system architectures
have been presented to demonstrate how learning systems can be used to augment
traditional adaptive control system structures in a synergistic fashion. Further
discussion of implementation issues and other learning control system structures is
given throughout this book, and also in the available literature. Intelligent control
systems incorporating learning have the potential to accommodate uncertainty
through on-line interaction with the actual plant and improve efficiency and
performance through on-line self-optimization.
Presently, scientific and engineering substantiation of many of the potential
benefits associated with learning control remains to be produced. Nevertheless, on
the strength of the preliminary results that have been obtained, as well as the basic
theory that has been developed, further examination of the fundamental issues
underlying learning control and its application to intelligent control is warranted.
In particular, new research and development efforts aimed at demonstrating the
feasibility and potential benefits of learning control, and at identifying future
research directions, are needed. The fact that conventional and relatively inexpensive
computational facilities can be sufficient to support real-time implementations has
already been demonstrated (e.g., Farrell & Baker (1991)).
To close this chapter, we suggest the following topics as areas for future research
that appear to offer significant potential for improving the capabilities of existing
learning control systems.
1. Incremental function synthesis. How large a network is needed to adequately
represent a desired mapping (representational power)? What computational re-
sources will be required for implementation (representational efficiency /
throughput)? Can convergence and stability of the adjustable parameters be
guaranteed under any (even simplifying) conditions? If so, how fast will it
occur? Are there potential pitfalls associated with certain types of representation
schemes and learning algorithms (e.g., non-spatially localized learning systems)?
2. Variable structure learning. The representational power and computational
requirements of a connectionist network are determined to a great extent by its
structure. When the structure is determined and fixed a priori, conservative
design practices easily lead to over-design and an inefficient use of resources.
Thus, too few resources may be assigned to approximate complex portions of the
desired mapping, limiting the approximation accuracy, or too many resources
may be applied to approximate relatively simple parts of the mapping, resulting
in excessive computational requirements. Variable structure learning schemes
are one solution to this dilemma; however, efficient modification rules that dictate
when, where, and how changes are to be made, remain to be identified.
3. Coupling of adaptive and learning control systems. Both of the hybrid
architectures described in Section 4 relied on a combination of adaptive and
learning phenomena. For example, in the direct approach, control system
parameter estimates from the adaptive and learning systems had to be combined,
while in the indirect approach, model parameter estimates were combined prior
to on-line control law design. Is there an optimal way to perform such fusion?
Considering optimal linear estimation techniques such as Kalman filtering, one is
naturally led to the idea that any reasonable blending of adaptive and learning
estimates will depend on maintaining measures of their individual "quality" (e.g.,
error covariance). How can the quality (or confidence) associated with the
current state of the learning system (which will vary throughout its input
domain) be represented and maintained?
4. Higher level learning. The a priori specification of a realistic objective function
that can be achieved over an entire operational envelope is a difficult problem.
Accordingly, one topic for future research might be directed at adjusting the
objective function on-line to seek increased performance where possible, while
decreasing strain on the system where necessary. Another research topic might
involve planning and exploration techniques (e.g., see Chapter 9) to perform dual
control and learning, and thus ensure adequate training throughout the op-
erational envelope. These, and most other, higher level forms of learning control
all involve more complex optimization problems than are usually considered at
the present time.
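For the fusion question raised in topic 3, a minimal sketch of what such a blending rule might look like in the scalar case, using inverse-variance (Kalman-style) weighting; the estimates and variances below are invented for illustration.

```python
def fuse(p_adapt, var_adapt, p_learn, var_learn):
    """Blend adaptive and learned scalar estimates by inverse-variance
    weighting (the scalar least-squares / Kalman measurement rule)."""
    w = var_learn / (var_adapt + var_learn)   # weight on the adaptive estimate
    p = w * p_adapt + (1.0 - w) * p_learn
    var = (var_adapt * var_learn) / (var_adapt + var_learn)
    return p, var

# Invented example: the learned estimate is trusted more (smaller variance).
p, var = fuse(1.2, 0.04, 1.0, 0.01)
```

The estimate with the smaller error variance dominates the blend, and the fused variance is smaller than either input variance, which is the usual least-squares fusion property; the hard part flagged in the text is maintaining those variances for a learning system whose quality varies over its input domain.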
Acknowledgment
This article is based on work that was supported, in part, by: the Charles Stark Draper
Laboratory, Inc., under IR&D Project No. 276; the Air Force Wright Laboratory, under
Contract No. F33615-88-C-1740; the National Science Foundation, under Grant No. ECS-
9014065; and the Naval Air Warfare Center, under Contract No. N62269-91-C-0033.
Any opinions, findings, conclusions, or recommendations expressed in this material
are those of the authors and do not necessarily reflect the views of the sponsoring
agencies or of the Charles Stark Draper Laboratory, Inc.
The authors would also like to acknowledge the contributions by all members, past
and present, of the Learning Control Research Group at Draper Laboratory — their
help is greatly appreciated.
Bibliography
Albus, J. (1975). "A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC)," ASME Journal of Dynamic Systems, Measurement, and Control, Vol. 97, pp. 220-227.
Åström, K. & Wittenmark, B. (1989). Adaptive Control, Addison-Wesley.
Baird, L. & Baker, W. (1990). "A Connectionist Learning System for Nonlinear Control," Proceedings, 1990 AIAA Conference on Guidance, Navigation, and Control.
Baker, W. & Farrell, J. (1990). "Connectionist Learning Systems for Control," Proceedings, SPIE OE/Boston '90.
Clauberg, B. & Farrell, J. (1991). "Issues in the Implementation of an Indirect Adaptive Control System," Draper Laboratory Report CSDL-P-3136, Cambridge, MA.
Farrell, J. & Baker, W. (1991). "Learning Augmented Control for Advanced Autonomous Underwater Vehicles," Proceedings, 18th Annual AUVS Technical Symposium and Exhibit.
Farrell, J. & Baker, W. (1992). "Learning Control Systems," in Antsaklis, P. & Passino, K., eds., Intelligent and Autonomous Control Systems, Kluwer Academic.
Franklin, J. (1989). "Historical Perspective and State of the Art in Connectionist Learning Control," Proceedings, 28th IEEE Conference on Decision and Control.
Fu, K. (1964). "Learning Control Systems," in Tou, J. & Wilcox, R., eds., Computer and Information Sciences, Spartan.
Fu, K. (1970). "Learning Control Systems — Review and Outlook," IEEE Transactions on Automatic Control, Vol. AC-15, No. 2.
Funahashi, K. (1989). "On the Approximate Realization of Continuous Mappings by Neural Networks," Neural Networks, Vol. 2, pp. 183-192.
Gelb, A., ed. (1974). Applied Optimal Estimation, MIT Press.
Haykin, S. (1991). Adaptive Filter Theory, 2nd ed., Prentice Hall.
Hornik, K., Stinchcombe, M., & White, H. (1989). "Multilayer Feedforward Networks Are Universal Approximators," Neural Networks, Vol. 2, pp. 359-366.
Livstone, M., Farrell, J., & Baker, W. (1992). "A Computationally Efficient Algorithm for Training Recurrent Connectionist Networks," Proceedings, 1992 American Control Conference.
Michie, D. & Chambers, R. (1968). "BOXES: An Experiment in Adaptive Control," in Dale, E. & Michie, D., eds., Machine Intelligence 2, Oliver and Boyd.
Narendra, K. & Annaswamy, A. (1989). Stable Adaptive Systems, Prentice-Hall.
Poggio, T. & Girosi, F. (1990). "Networks for Approximation and Learning,"Proceedings of the IEEE, Vol. 78, No. 9, pp. 1481-1497.
Press, W., Flannery, B., Teukolsky, S., & Vetterling, W. (1988). Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press.
Rumelhart, D., Hinton, G., & Williams, R. (1986). "Learning Internal Representations by Error Propagation," in Rumelhart, D. & McClelland, J., eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, MIT Press / Bradford.
Sklansky, J. (1966). "Learning Systems for Automatic Control," IEEE Transactions on Automatic Control, Vol. AC-11, No. 1.
Slotine, J. & Li, W. (1991). Applied Nonlinear Control, Prentice-Hall.
Tsypkin, Y. (1973). Foundations of the Theory of Learning Systems, Academic Press.
Vos, D., Baker, W., & Millington, P. (1991). "Learning Augmented Gain Scheduling Control," Proceedings, 1991 AIAA Conference on Guidance, Navigation, and Control.
Widrow, B. & Hoff, M. (1960). "Adaptive Switching Circuits," 1960 WESCON Convention Record, Part IV, pp. 96-104.