Automation & Robotics Research Institute (ARRI)
The University of Texas at Arlington

F.L. Lewis
Moncrief-O’Donnell Endowed Chair

Head, Controls & Sensors Group

http://ARRI.uta.edu/acs

Nonlinear Network Structures for Feedback Control

Organized and invited by Professor Jie Huang, CUHK

SCUT / CUHK Lectures on Advances in Control
March 2005

Relevance: Machine Feedback Control

[Figure: gun-pointing system on a moving tank platform. Labels: qd; Az and El angles qr1, qr2; barrel flexible modes qf; compliant coupling; turret with backlash and compliant drive train; terrain and vehicle vibration disturbances d(t); barrel tip position.]

Single-Wheel/Terrain System with Nonlinearities

[Figure: single-wheel suspension model. Labels: vehicle mass m; parallel damper with mc, kc, cc; active damping uc (if used); vibratory modes qf(t); forward speed y(t); vertical motion z(t); surface roughness ρ(t); series damper + suspension + wheel with k, c; w(t).]

High-Speed Precision Motion Control with unmodeled dynamics, vibration suppression, disturbance rejection, friction compensation, deadzone/backlash control

Vehicle Suspension

Industrial Machines

Military Land Systems

Aerospace

Newton’s Law

$F = ma = m\ddot{x}$

$\ddot{x} = \frac{F(t)}{m} \equiv u(t)$

[Figure: mass m driven by force F(t), with position p(t) and velocity v(t).]

Mechanical Motion Systems (Vehicles, Robots)

Lagrange’s Eqs. of Motion

$M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + F(\dot{q}) + G(q) + \tau_d = B(q)\tau$

Terms, left to right: inertia; Coriolis/centripetal force; friction; gravity; disturbances; actuator problems $B(q)$; control input $\tau$.

Darwinian Selection & Population Dynamics

Volterra’s fishes

$\dot{x}_1 = a x_1 - b x_1 x_2$
$\dot{x}_2 = -c x_2 + d x_1 x_2$

$x_1$ = prey, $x_2$ = predator

Stable Limit Cycle

Effects of Overcrowding: Limited food and resources

$\dot{x}_1 = a x_1 - b x_1 x_2 - e x_1^2$
$\dot{x}_2 = -c x_2 + d x_1 x_2$

Stable Equilibrium POINT

Favorable to Prey!
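The predator-prey model with overcrowding can be checked numerically. The following is a minimal sketch, assuming illustrative parameter values (a = b = c = d = 1, e = 0.2) and a fixed-step fourth-order Runge-Kutta integrator; these values, the initial state, and the function names are assumptions made for illustration and do not come from the slides.

```python
def derivatives(x1, x2, a, b, c, d, e):
    """Right-hand side of the predator-prey model (x1 = prey, x2 = predator)."""
    dx1 = a * x1 - b * x1 * x2 - e * x1 * x1   # e > 0 adds the overcrowding term
    dx2 = -c * x2 + d * x1 * x2
    return dx1, dx2


def simulate(x1, x2, t_final, dt=0.01, a=1.0, b=1.0, c=1.0, d=1.0, e=0.0):
    """Integrate with classical RK4 and return the final state (x1, x2)."""
    p = (a, b, c, d, e)
    for _ in range(int(round(t_final / dt))):
        k1 = derivatives(x1, x2, *p)
        k2 = derivatives(x1 + 0.5 * dt * k1[0], x2 + 0.5 * dt * k1[1], *p)
        k3 = derivatives(x1 + 0.5 * dt * k2[0], x2 + 0.5 * dt * k2[1], *p)
        k4 = derivatives(x1 + dt * k3[0], x2 + dt * k3[1], *p)
        x1 += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        x2 += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x1, x2


# With overcrowding (e > 0) the interior equilibrium is
# x1 = c/d, x2 = (a - e*c/d)/b, which is (1.0, 0.8) for these values.
x1_end, x2_end = simulate(2.0, 1.0, t_final=200.0, e=0.2)
print(x1_end, x2_end)
```

Setting `e=0.0` in the same call gives the model without overcrowding, where the state keeps cycling instead of settling.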

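Dynamical System Models

Nonlinear system

Continuous-Time Systems: $\dot{x} = f(x) + g(x)u$, $y = h(x)$
Discrete-Time Systems: $x_{k+1} = f(x_k) + g(x_k)u_k$, $y_k = h(x_k)$

Linear system

Continuous-Time Systems: $\dot{x} = Ax + Bu$, $y = Cx$
Discrete-Time Systems: $x_{k+1} = Ax_k + Bu_k$, $y_k = Cx_k$

[Figure: block diagram with input u through g(x), an integrator 1/s (continuous time) or delay z^{-1} (discrete time), feedback through f(x), and output map h(x) giving y.]

Control Inputs u, Internal States x, Measured Outputs y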

Issues in Feedback Control

[Figure: feedback loop. Blocks: system, feedback controller, feedforward controller. Signals: desired trajectories, control inputs, measured outputs, disturbances, sensor noise.]

Stability
Tracking
Boundedness
Robustness to disturbances and to unknown dynamics

Definitions of System Stability

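$\dot{x} = f(x)$, $x_{k+1} = f(x_k)$

Asymptotic Stability
Marginal Stability
Uniform Ultimate Boundedness

[Figure: sample trajectories x(t) versus t for each definition. Labels: equilibrium xe; band from xe − B to xe + B with constant bound B; times t0 and t0 + T; d; B(d).]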

Controller Topologies

Indirect Scheme
[Figure: controller, plant, and system identifier. Signals: desired output yd(t), control u(t), output y(t), estimated output ŷ(t), identification error.]

Direct Scheme
[Figure: controller and plant. Signals: desired output yd(t), control u(t), output y(t), tracking error.]

Feedback/Feedforward Scheme
[Figure: controller #1, controller #2, and plant. Signals: desired output yd(t), control u(t), output y(t), tracking error.]

Cell Homeostasis

The individual cell is a complex feedback control system. It pumps ions across the cell membrane to maintain homeostasis, and has only limited energy to do so.

Cellular Metabolism

Permeability control of the cell membrane

http://www.accessexcellence.org/RC/VL/GG/index.html

Optimality in Biological Systems

Optimality in Control Systems Design
R. Kalman 1960

Rocket Orbit Injection

http://microsat.sm.bmstu.ru/e-library/Launch/Dnepr_GEO.pdf

Dynamics

$\dot{r} = w$
$\dot{w} = \frac{v^2}{r} - \frac{\mu}{r^2} + \frac{F}{m}\sin\phi$
$\dot{v} = -\frac{wv}{r} + \frac{F}{m}\cos\phi$
$\dot{m} = -Fm$

Objectives
Get to orbit in minimum time
Use minimum fuel

Performance Index, Cost, or Value function

CT: $J = \int_0^T [Q(x) + R(u)]\,dt = \int_0^T r(x,u)\,dt$

DT: $J = \sum_{k=0}^{N} r(x_k,u_k)$

$J$ is the strategic utility; $r(x,u)$ is the utility.

Minimum energy: $r(x,u) = x^T Q x + u^T R u$
Minimum fuel: $r(x,u) = |u|$
Minimum time: $r(x,u) = 1$. Then $J = \int_0^T r(x,u)\,dt = T$

Discounting: $J = \sum_{k=0}^{N} \gamma^k r(x_k,u_k)$, $J = \int_0^T e^{-\gamma t} r(x,u)\,dt$

INTELLIGENT CONTROL TOOLS

Fuzzy Associative Memory (FAM): Input x, Input Membership Fns., Fuzzy Logic Rule Base, Output Membership Fns., Output u

Neural Network (NN): Input x, NN Input, NN Output, Output u

Both FAM and NN define a function u = f(x) from inputs to outputs

FAM and NN can both be used for:
1. Classification and Decision-Making
2. Control

NN Includes Adaptive Control (Adaptive control is a 1-layer NN)

Neural Network Properties

Learning

Recall

Function approximation

Generalization

Classification

Association

Pattern recognition

Clustering

Robustness to single node failure

Repair and reconfiguration

Nervous system cell. http://www.sirinet.net/~jgjohnso/index.html

First groups working on NN Feedback Control in CS community, c. 1995

Werbos
Narendra
Sanner & Slotine
F.C. Chen & Khalil
Lewis
Polycarpou & Ioannou
Christodoulou & Rovithakis
A.J. Calise, McFarland, Naira Hovakimyan
Edgar Sanchez & Poznyak
Sam Ge, Zhang, et al.
Jun Wang, Chinese Univ. Hong Kong

Industry Standard: PD Controller

[Figure: unity-gain tracking loop. The desired trajectory qd and actual trajectory q form the error e; the block [Λ I] gives r; the gain Kv gives the control input τ to the robot system.]

$r(t) = \dot{e}(t) + \Lambda e(t)$

Easy to implement with COTS controllers
Fast
Can be implemented with a few lines of code, e.g. MATLAB

But cannot handle:
High-order unmodeled dynamics
Unknown disturbances
High performance specifications for nonlinear systems
Actuator problems such as friction, deadzones, backlash
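The PD tracking loop really does take only a few lines. The sketch below assumes a unit-mass plant q'' = τ, the desired trajectory qd = sin(t), and illustrative gains; the plant, gains, step size, and function name are assumptions made for illustration and are not taken from the slides. It uses the filtered error r = e' + Λe with e = qd − q and the control τ = Kv r.

```python
import math


def pd_tracking_error(kv=20.0, lam=5.0, t_final=10.0, dt=0.001):
    """Simulate q'' = tau for a unit mass under tau = Kv * r, r = e' + Lambda * e.

    Returns the largest |e| seen during the final two seconds.
    """
    q, q_dot = 0.0, 0.0
    worst = 0.0
    steps = int(round(t_final / dt))
    for i in range(steps):
        t = i * dt
        e = math.sin(t) - q            # tracking error e = qd - q, with qd = sin(t)
        e_dot = math.cos(t) - q_dot
        r = e_dot + lam * e            # filtered tracking error
        tau = kv * r                   # PD control law
        q_dot += dt * tau              # unit mass: q'' = tau
        q += dt * q_dot                # semi-implicit Euler step
        if t >= t_final - 2.0:
            worst = max(worst, abs(e))
    return worst


error_kv20 = pd_tracking_error(kv=20.0)
error_kv40 = pd_tracking_error(kv=40.0)
print(error_kv20, error_kv40)
```

In this sketch the residual error shrinks as Kv grows but does not vanish, because nothing compensates the plant dynamics; that gap is what the NN inner loop is meant to close.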

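Two-layer feedforward static neural network (NN)

[Figure: inputs x1, x2, ..., xn; first-layer weights V^T; hidden layer of L units with activation σ(.); second-layer weights W^T; outputs y1, y2, ..., ym.]

Summation eqs: $y_i = \sigma\left(\sum_{k=1}^{K} w_{ik}\,\sigma\left(\sum_{j=1}^{n} v_{kj} x_j + v_{k0}\right) + w_{i0}\right)$

Matrix eqs: $y = W^T \sigma(V^T x)$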
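A forward pass of the two-layer NN can be sketched as follows. This assumes the matrix form y = W^T σ(V^T x) with a sigmoid hidden layer and a linear output layer, and it absorbs the thresholds by augmenting x and the hidden-layer output with a leading 1; the storage layout and function names are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def nn_forward(V, W, x):
    """Two-layer NN output y = W^T sigma(V^T x) with thresholds.

    V has n+1 rows and L columns; W has L+1 rows and m columns.
    Row 0 of each matrix holds the thresholds, so x and the hidden
    output are each augmented with a leading 1.
    """
    xa = [1.0] + list(x)
    n_hidden = len(V[0])
    hidden = [sigmoid(sum(V[j][k] * xa[j] for j in range(len(xa))))
              for k in range(n_hidden)]
    ha = [1.0] + hidden
    n_out = len(W[0])
    return [sum(W[k][i] * ha[k] for k in range(len(ha))) for i in range(n_out)]


# n = 2 inputs, L = 3 hidden units, m = 1 output.
V = [[0.0, 0.0, 0.0] for _ in range(3)]   # all first-layer weights zero
W = [[0.1], [1.0], [1.0], [1.0]]          # output threshold 0.1, unit weights
y = nn_forward(V, W, [0.3, -0.7])
print(y)
```

With V = 0 every hidden unit outputs σ(0) = 0.5, so the single output is the threshold plus 3 × 0.5 regardless of x.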

Control System Design Approach

Robot dynamics: $M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + G(q) + F(\dot{q}) + \tau_d = \tau$

Tracking error: $e(t) = q_d(t) - q(t)$

Sliding variable: $r = \dot{e} + \Lambda e$

Error dynamics: $M\dot{r} = -V_m r + f(x) + \tau_d - \tau$

[Figure: PD tracking loop. qd and q form e; the block [Λ I] gives r; a controller block, still to be chosen, gives τ to the robot system.]

The equations give the FB controller structure

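Control System Design Approach

Robot dynamics: $M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + G(q) + F(\dot{q}) + \tau_d = \tau$

Tracking error definition: $e(t) = q_d(t) - q(t)$, $r = \dot{e} + \Lambda e$

Error dynamics: $M\dot{r} = -V_m r + f(x) + \tau_d - \tau$

Approx. unknown function by NN (Universal Approximation Property): $f(x) = W^T\sigma(V^T x) + \varepsilon$

Define control input: $\tau = \hat{W}^T\sigma(\hat{V}^T x) + K_v r - v$

Closed-loop dynamics:
$M\dot{r} = -V_m r - K_v r + W^T\sigma(V^T x) - \hat{W}^T\sigma(\hat{V}^T x) + \varepsilon + \tau_d + v(t)$
$M\dot{r} = -V_m r - K_v r + \tilde{f} + \tau_d + v(t)$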

Neural Network Robot Controller

[Figure: PD tracking loop (qd, e, [Λ I], r, Kv, τ, robot system, q) with a nonlinear inner loop containing the NN estimate f̂(x), a feedforward loop from the desired acceleration, and a robust control term v(t).]

Feedback linearization
Universal Approximation Property

Problem: nonlinear in the NN weights, so that standard proof techniques do not work

Easy to implement with a few more lines of code
Learning feature allows for on-line updates to NN memory as dynamics change
Handles unmodelled dynamics, disturbances, actuator problems such as friction
NN universal basis property means no regression matrix is needed
Nonlinear controller allows faster & more precise motion

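Stability Proof based on Lyapunov Extension

Define a Lyapunov Energy Function

$L = \frac{1}{2} r^T M r + \frac{1}{2}\mathrm{tr}(\tilde{W}^T\tilde{W}) + \frac{1}{2}\mathrm{tr}(\tilde{V}^T\tilde{V})$

Differentiate

$\dot{L} = -r^T K_v r + \frac{1}{2} r^T(\dot{M} - 2V_m) r + \mathrm{tr}\,\tilde{W}^T(\dot{\tilde{W}} + \hat{\sigma} r^T - \hat{\sigma}'\hat{V}^T x r^T) + \mathrm{tr}\,\tilde{V}^T(\dot{\tilde{V}} + x r^T\hat{W}^T\hat{\sigma}') + r^T(w + v)$

Using certain special tuning rules, one can show that the energy derivative is negative outside a compact set.

[Figure: compact set in the plane of $\|r(t)\|$ and $\|\tilde{W}(t)\|$, outside of which $\dot{L}$ is negative.]

This proves that all signals are bounded

Problems:
1. How to characterize the NN weight errors as ‘small’? Use the Frobenius norm
2. Nonlinearity in the parameters requires extra care in the proof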

Theorem 1 (NN Weight Tuning for Stability)

Let the desired trajectory $q_d(t)$ and its derivatives be bounded. Let the initial tracking error be within a certain allowable set $U$. Let $Z_M$ be a known upper bound on the Frobenius norm of the unknown ideal weights $Z$. Take the control input as

$\tau = \hat{W}^T\sigma(\hat{V}^T x) + K_v r - v$ with $v(t) = -K_Z(\|\hat{Z}\|_F + Z_M) r$.

Let weight tuning be provided by

$\dot{\hat{W}} = F\hat{\sigma} r^T - F\hat{\sigma}'\hat{V}^T x r^T - \kappa F\|r\|\hat{W}$, $\dot{\hat{V}} = G x(\hat{\sigma}'^T\hat{W} r)^T - \kappa G\|r\|\hat{V}$

with any constant matrices $F = F^T > 0$, $G = G^T > 0$, and scalar tuning parameter $\kappa > 0$. Initialize the weight estimates as $\hat{W} = 0$, $\hat{V}$ = random.

Then the filtered tracking error $r(t)$ and NN weight estimates $\hat{W}, \hat{V}$ are uniformly ultimately bounded. Moreover, arbitrarily small tracking error may be achieved by selecting large control gains $K_v$.

Backprop terms: Werbos
Extra robustifying terms: Narendra’s e-mod extended to NLIP systems
Forward Prop term?

Can also use simplified tuning: Hebbian
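To make the structure of the tuning law concrete, the sketch below evaluates a simplified one-layer version, dW/dt = F σ r^T − κ F ||r|| W, which keeps the backprop-style outer-product term and the robustifying e-mod term but drops the first-layer term involving σ' and V. This simplification, the choice F = f_gain · I, and all names are assumptions made for illustration; it is not the full two-layer rule of Theorem 1.

```python
import math


def w_hat_dot(W_hat, sigma, r, f_gain, kappa):
    """Simplified one-layer tuning law with F = f_gain * I.

    Returns F*sigma*r^T - kappa*F*||r||*W_hat as a list of rows.
    W_hat has len(sigma) rows and len(r) columns.
    """
    r_norm = math.sqrt(sum(ri * ri for ri in r))
    return [[f_gain * (sigma[i] * r[j] - kappa * r_norm * W_hat[i][j])
             for j in range(len(r))]
            for i in range(len(sigma))]


def euler_step(W_hat, W_dot, dt):
    """Advance the weight estimate by one explicit Euler step."""
    return [[W_hat[i][j] + dt * W_dot[i][j] for j in range(len(W_hat[i]))]
            for i in range(len(W_hat))]


rate = w_hat_dot([[1.0]], sigma=[0.5], r=[2.0], f_gain=10.0, kappa=0.1)
rate_at_zero_error = w_hat_dot([[1.0]], sigma=[0.5], r=[0.0], f_gain=10.0, kappa=0.1)
updated = euler_step([[1.0]], rate, dt=0.01)
print(rate, rate_at_zero_error, updated)
```

Both terms scale with the filtered error r, so in this sketch the weights stop moving when r is zero, and the κ term pulls large weights back toward zero whenever r is nonzero.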

[Figure: W2 weights versus time, for $x_d = [0.5\sin(t)\;\; 0.5\cos(t)]^T$.]

NN weights converge to the best learned values for the given system

NN Friction Compensator

Trajectory Tracking Controller

[Figure: desired trajectory (position and velocity) versus time in seconds.]

[Figure: tracking errors, length in meters versus time in seconds, (a) position and (b) velocity. Solid = fixed gain controller, dashed = NN controller.]

[Figure: NN robot controller. Tracking loop (qd, e, [Λ I], r, Kv, τ, robot system, q), NN estimate f̂(x), feedforward loop from the desired acceleration, and robust control term v(t).]

Static NN => Dynamic NN Feedback Controller

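Dynamic NN and Passivity

[Figure: linear system with blocks B, 1/s, C and feedback A, state x, inputs u1 and u2, connected with H(s).]

Closed-Loop System wrt Neural Network is a Dynamic (Recursive) NN

Discrete time case: $x_{k+1} = A x_k + W^T\sigma(V^T x_k) + u_k$

The backprop tuning algorithms

$\dot{\hat{W}} = F\hat{\sigma} r^T - F\hat{\sigma}'\hat{V}^T x r^T$, $\dot{\hat{V}} = G x(\hat{\sigma}'^T\hat{W} r)^T$

make the closed-loop system passive

The enhanced tuning algorithms

$\dot{\hat{W}} = F\hat{\sigma} r^T - F\hat{\sigma}'\hat{V}^T x r^T - \kappa F\|r\|\hat{W}$
$\dot{\hat{V}} = G x(\hat{\sigma}'^T\hat{W} r)^T - \kappa G\|r\|\hat{V}$

make the closed-loop system state-strict passive

SSP gives extra robustness properties to disturbances and HF dynamics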

What about practical Systems?

Force Control

Flexible pointing systems

Vehicle active suspension

SBIR Contracts

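Flexible Systems with Vibratory Modes

Flexible link pointing system

$\begin{bmatrix} M_{rr} & M_{rf} \\ M_{fr} & M_{ff} \end{bmatrix}\begin{bmatrix} \ddot{q}_r \\ \ddot{q}_f \end{bmatrix} + \begin{bmatrix} V_{rr} & V_{rf} \\ V_{fr} & V_{ff} \end{bmatrix}\begin{bmatrix} \dot{q}_r \\ \dot{q}_f \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & K_{ff} \end{bmatrix}\begin{bmatrix} q_r \\ q_f \end{bmatrix} + \begin{bmatrix} F_r \\ 0 \end{bmatrix} + \begin{bmatrix} G_r \\ 0 \end{bmatrix} = \begin{bmatrix} B_r \\ B_f \end{bmatrix}\tau$

Labels: acceleration; velocity; position; flex. modes. The first block row is the rigid dynamics and the second block row is the flexible dynamics.

Problem: only one control input!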

Neural network controller for Flexible-Link robot arm

[Figure: tracking loop (qd, e, [Λ I], r, Kv) with NN estimate f̂(x) and robust control term v(t); manifold equation and Br^{-1} block; fast vibration suppression loop with fast PD gains on the flexible modes qf, producing τF, which adds to give the torque τ applied to the robot system.]

Singular Perturbations
Add an extra feedback loop
Use passivity to show stability

Coupled Systems

Robot mechanical dynamics: $M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + F(\dot{q}) + G(q) + \tau_d = K_T i$

Motor electrical dynamics: $L\dot{i} + R(i,\dot{q}) + \tau_e = u_e$

Problem: only one control input!

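Vehicle Active Suspension control

[Figure: quarter-vehicle model. Sprung mass (car body) $m_s$ with displacement $z_s$; unsprung mass (tire) $m_u$ with displacement $z_u$; actuator force F acting on the two masses with opposite signs (F+, F−); tire stiffness $K_t$; terrain height $z_r$.]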

Neural network backstepping controller for Flexible-Joint robot arm

[Figure: tracking loop (qd, e, [Λ I], r, Kr) with robust control term vi(t); nonlinear FB linearization loop with NN#1 estimate F̂1(x) and block 1/KB1 giving the desired current id; backstepping loop with error η, gain Kη, and NN#2 estimate F̂2(x) giving the voltage ue applied to the robot system with current i.]

Backstepping

Add an extra feedback loop
Two NN needed
Use passivity to show stability

Advantages over traditional Backstepping: no regression functions needed

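Actuator Nonlinearities

[Figure: deadzone τ = D(u) versus u, with breakpoints d+ and −d− and slopes m+ and m−. Backlash τ versus u, with breakpoints d+ and d− and slope m.]

[Figure: the feedback controller produces the applied control inputs; the actuator nonlinearity maps them to the actual control inputs of the system; the system outputs are fed back.]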
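The deadzone and the precompensation idea can be illustrated with a small sketch. It assumes a piecewise-linear deadzone with known breakpoints and slopes, with the left breakpoint stored as a negative number; the numerical values and function names are assumptions made for illustration. The exact inverse shown here needs the deadzone parameters, which is the knowledge an adaptive or NN precompensator has to learn when they are unknown.

```python
def deadzone(u, d_plus=0.5, d_minus=-0.4, m_plus=1.0, m_minus=1.2):
    """Piecewise-linear deadzone tau = D(u); the output is zero for d_minus < u < d_plus."""
    if u >= d_plus:
        return m_plus * (u - d_plus)
    if u <= d_minus:
        return m_minus * (u - d_minus)
    return 0.0


def deadzone_inverse(w, d_plus=0.5, d_minus=-0.4, m_plus=1.0, m_minus=1.2):
    """Precompensator: returns u such that deadzone(u) equals the desired output w."""
    if w > 0.0:
        return w / m_plus + d_plus
    if w < 0.0:
        return w / m_minus + d_minus
    return 0.0


desired = [-2.0, -0.3, 0.0, 0.7, 3.0]
achieved = [deadzone(deadzone_inverse(w)) for w in desired]
print(achieved)
print(deadzone(0.1))   # a small command is swallowed by the deadzone
```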

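NN in Feedforward Loop: Deadzone Compensation

[Figure: tracking loop (qd, e, [Λ^T I], r, Kv, robust term v) with estimate of nonlinear function f̂(x) and feedforward of the desired acceleration; the signal w passes through the NN deadzone precompensator (networks I and II) to give u, which passes through the deadzone D(u) to give the torque τ applied to the mechanical system with output q.]

$\dot{\hat{W}}_i = T\sigma_i(U_i^T w)\, r^T \hat{W}^T \sigma'(U^T u)\, U^T - k_1 T\|r\|\hat{W}_i - k_2 T\|r\|\|\hat{W}_i\|\hat{W}_i$

$\dot{\hat{W}} = -S\sigma'(U^T u)\, U^T \hat{W}_i \sigma_i(U_i^T w)\, r^T - k_1 S\|r\|\hat{W}$

Acts like a 2-layer NN with enhanced backprop tuning!

little critic network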

Performance Results

PD control: deadzone chops out the middle
[Figure: x2(k) and e2(k) versus time.]

NN control fixes the problem
[Figure: x2(k) and e2(k) versus time.]

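[Figure: controller for a nonlinear system with backlash. Tracking loop (xd, e, [Λ^T I], r, Kv) with robust term v1, estimate of nonlinear function f̂(x), and feedforward of yd^(n) through [0 Λ^T], giving the desired torque τdes; backstepping loop with the derivative of τdes, gain Kb, filter, robust term v2, integrator 1/s, and NN compensator output y_nn, producing u ahead of the backlash, which gives the torque τ.]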

Dynamic inversion NN compensator for system with Backlash

U.S. patent- Selmic, Lewis, Calise, McFarland

Performance Results

PD control: backlash chops off tops & bottoms
[Figure: PD controller with backlash. Position x1(t) and error e1(t), velocity x2(t) and error e2(t), versus time.]

NN control fixes the problem
[Figure: PD controller with NN backlash compensation. Position x1(t) and tracking error e1(t) versus time.]

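NN Observers
Needed when all states are not measured

[Figure: NN observer driven by the measured x1 and the torque τ(t), with gains kD, kp, Kv, M^{-1}(.), two integrators, and NN ĥo(x̂1, x̂2), producing x̂1, x̂2, ẑ2 and the estimation error x̃1; NN controller with ĥc(x̂1, x̂2), [λ I], Kv, and robust term vc, driven by qd and the estimated filtered error r̂(t), producing τ(t) for the robot.]

Neural Network Observer

$\dot{\hat{z}}_1 = \hat{x}_2 + k_D\tilde{x}_1$
$\dot{\hat{z}}_2 = \hat{W}_o^T\sigma_o(\hat{x}_1,\hat{x}_2) + M^{-1}(x_1)\tau(t) + K\tilde{x}_1$

$\dot{\hat{W}}_o = -k_D F_o\sigma_o(\hat{x})\tilde{x}_1^T - \kappa_o F_o\|\tilde{x}_1\|\hat{W}_o - \kappa_o F_o\hat{W}_o$

Neural Network Controller

$\tau(t) = \hat{W}_c^T\sigma_c(\hat{x}_1,\hat{x}_2) + K_v\hat{r}(t) - v_c(t)$

$\dot{\hat{W}}_c = F_c\sigma_c(\hat{x}_1,\hat{x}_2)\hat{r}^T - \kappa_c F_c\|\hat{r}\|\hat{W}_c$

$\hat{r}(t) = \dot{\hat{e}}(t) + \Lambda e(t)$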

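NN Control for Discrete Time Systems

dynamics: $x(k+1) = f(x(k)) + g(x(k))u(k)$

NN Tuning

$\hat{W}_i(k+1) = \hat{W}_i(k) - \alpha_i\hat{\varphi}_i(k)\hat{y}_i^T(k) - \Gamma\|I - \alpha_i\hat{\varphi}_i(k)\hat{\varphi}_i^T(k)\|\hat{W}_i(k)$

where $\hat{y}_i(k) \equiv \hat{W}_i^T(k)\hat{\varphi}_i(k) + K_v r(k)$ for $i = 1,\ldots,N-1$, and $\hat{y}_N(k) \equiv r(k+1)$ for the last layer

Error-based tuning
Gradient descent with momentum
Extra robust term

U.S. Patent: Jagannathan, Lewis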

Neural Network Properties

Learning

Recall

Function approximation

Generalization

Classification

Association

Pattern recognition

Clustering

Robustness to single node failure

Repair and reconfiguration

Nervous system cell. http://www.sirinet.net/~jgjohnso/index.html

USED

???

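Relation Between Fuzzy Systems and Neural Networks

FL Membership Functions for 2-D Input Vector x

[Figure: membership functions over the (x1, x2) plane, with values between 0 and 1, membership sets X1^i, X1^{i+1} along x1 and X2^j, X2^{j+1} along x2, and centers x1^i, x1^{i+1}, x2^j, x2^{j+1}.]

Separable Gaussian activation functions for RBF NN

Separable triangular activation functions for CMAC NN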

Two-layer NN as FL System

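[Figure: two-layer NN with inputs x1, x2, ..., xn, first-layer weights V^T, hidden layer of L units with activation σ(.), second-layer weights W^T, and outputs y1, y2, ..., ym. Labels: standard thresholds θ1, θ2; vector thresholds θ11, θ12, ..., θ1n.]

FL system = NN with VECTOR thresholds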

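Gaussian membership function

$\phi_{A_i^l}(z_i, a_i^l, b_i^l) = e^{-\left(a_i^l (z_i - b_i^l)\right)^2}$

Tuning laws

$\dot{W} = K_W(\hat{\Phi} - A a - B b) r^T - k_W K_W\|r\| W$

$\dot{a} = K_a A^T W r - k_a K_a\|r\| a$

$\dot{b} = K_b B^T W r - k_b K_b\|r\| b$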

Fuzzy Logic Controllers

[Figure: tracking loop with xd(t), x(t), e(t), [Λ^T I], r(t), Kv and the controlled plant; a fuzzy system (input membership functions, fuzzy rule base, output membership functions) supplies g(x, xd).]

Dynamic Focusing of Awareness

[Figure: initial MFs and final MFs.]

Elastic Fuzzy Logic (c.f. P. Werbos)
Weights importance of factors in the rules

$\phi_B(z,a,b,c) = \phi_B(z,a,b)^{c^2}$

$\phi(z,a,b,c) = \left[\frac{\cos^2(a(z-b))}{1 + a^2(z-b)^2}\right]^{c^2}$

[Figure: effect of change of membership function spread "a"; effect of change of membership function elasticities "c".]

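Elastic Fuzzy Logic Control

[Figure: tracking loop with xd(t), x(t), e(t), [Λ^T I], r(t), Kv and the controlled plant; a fuzzy system (input membership functions, fuzzy rule base, output membership functions) supplies g(x, xd).]

Control: $u(t) = -K_v r - g(x, x_d)$

Tune Membership Functions

$\dot{a} = K_a A^T W r - k_a K_a\|r\| a$

$\dot{b} = K_b B^T W r - k_b K_b\|r\| b$

$\dot{c} = K_c C^T W r - k_c K_c\|r\| c$

Tune Control Rep. Values

$\dot{W} = K_W(\hat{\Phi} - A a - B b - C c) r^T - k_W K_W\|r\| W$

Better Performance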

Fuzzy Logic Critic NN controller

[Figure: unknown plant with disturbance d(t), control u(t), and state x(t); the desired trajectory and x(t) form r(t); a performance evaluator (FL critic) turns the instantaneous utility r(t) into R(t), which tunes the action generating NN f̂(x).]

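Learning FL Critic Controller

[Figure: unknown plant with disturbance d(t), control u(t), and state x(t); tracking loop with reference xd(t), r, gains Kv, robust term v(t), and action generating NN estimate ĝ(x) (two-layer NN with weights V^T, W^T); FL critic (input membership functions, fuzzy rule base, output membership functions, weights Ŵ1, V̂1) producing R from r, with signal ρ and its derivative; equation references (6-14) and (6-15).]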

,ˆ)ˆ('ˆˆ1111 VrVWrHV TTT Φμ −−=

&

,ˆ)ˆ(ˆ1111 WRrVW TT Γμ −−=

&

2211122222ˆˆ)ˆ('ˆ)()(ˆ WVrVWRrW TTTTT ΓμχσΓχσΓ −−=

&

Tune Action generating NN (controller)

Tune Fuzzy Logic Critic

FL Critic

Action generating NN

Critic requires MEMORY

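Reinforcement Learning NN Controller

[Figure: the user input (reference signal) and the plant output q(t) feed a performance measurement mechanism that produces the utility r(t); the critic element converts it to the reinforcement signal R(t), which tunes the action generating neural net. The control action u(t) combines the NN output ĝ(x), the gain Kv term, and the robust term v(t); the plant has disturbance d(t) and friction fr(t). The NN has input preprocessing, an input layer x1, ..., xn, a hidden layer z1 = 1, z2, ..., zN with activation σ( ), weights W, and an output layer y1, ..., ym.]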

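High-Level NN Controllers Need Exotic Lyapunov Fns.

Reinforcement NN control

Simplified critic signal: $R(t) = \mathrm{sgn}(r(t)) = \pm 1$

Tuning Law only contains R(t): $\dot{\hat{W}} = F\sigma(x)R^T - \kappa F\hat{W}$

Lyapunov Fn: $L(t) = \sum_{i=1}^{n}|r_i| + \frac{1}{2}\mathrm{tr}(\tilde{W}^T F^{-1}\tilde{W})$

Lyap. Deriv. contains R(t) !!: $\dot{L} = \mathrm{sgn}(r)^T\dot{r} + \mathrm{tr}(\tilde{W}^T F^{-1}\dot{\tilde{W}})$

$L(t) = \ln(1 + e^{\alpha r(t)}) + \ln(1 + e^{-\alpha r(t)}) + \frac{1}{2}\mathrm{tr}(\tilde{W}^T F^{-1}\tilde{W})$

$\dot{L} = \dot{r}^T\left(\frac{\alpha}{1 + e^{-\alpha r(t)}} + \frac{-\alpha}{1 + e^{\alpha r(t)}}\right) + \mathrm{tr}(\tilde{W}^T F^{-1}\dot{\tilde{W}})$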
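The link between the logarithmic Lyapunov function and the sign-type critic signal can be checked directly. The short derivation below is written for scalar r and uses the shorthand s = αr, a symbol introduced here only for illustration; it shows that the derivative of the two logarithmic terms is a smooth approximation of α sgn(r).

```latex
\begin{align*}
\frac{d}{dr}\Big[\ln(1+e^{\alpha r}) + \ln(1+e^{-\alpha r})\Big]
  &= \frac{\alpha e^{\alpha r}}{1+e^{\alpha r}} - \frac{\alpha e^{-\alpha r}}{1+e^{-\alpha r}}
   = \frac{\alpha}{1+e^{-\alpha r}} - \frac{\alpha}{1+e^{\alpha r}} \\
  &= \alpha\,\frac{e^{s}-1}{e^{s}+1}
   = \alpha\tanh\!\Big(\frac{s}{2}\Big), \qquad s = \alpha r, \\
\lim_{\alpha\to\infty}\tanh\!\Big(\frac{\alpha r}{2}\Big) &= \operatorname{sgn}(r) \qquad (r \neq 0).
\end{align*}
```

So for large α the smooth function behaves like the sum of absolute values used with the simplified critic signal, while remaining differentiable at r = 0.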

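Adaptive Reinforcement Learning

Critic is output of NN #1: $R = \hat{W}_1^T\sigma(\chi_1) + \rho$

Action is output of second NN: $\hat{g}(x, x_d) = \hat{W}_2^T\sigma(\chi_2)$

$\dot{\hat{W}}_1 = -\sigma(\chi_1)R^T - \hat{W}_1$

$\dot{\hat{W}}_2 = \Gamma\sigma(\chi_2)\left(r + V_1\sigma'(\chi_1)\hat{W}_1 R\right)^T - \Gamma\hat{W}_2$

The tuning algorithm treats this as a SINGLE 2-layer NN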

Principe- Entropy

0000 ),(ln),())(),,(,( dxduuxpuxpuptxuxH ∫ ∫−=

Renyi’s entropy
Correntropy

Brockett: Minimum-Attention Control
awareness & effort (partial derivatives in PM)

$V(x_0,u) = \int r(x,u)\,dt + \iint\left[a\left(\frac{\partial u}{\partial t}\right)^2 + b\left(\frac{\partial u}{\partial x}\right)^2\right]dx\,dt$

Encode Information into the Value Function

Information-Theoretic Learning

Before:

1. Neural Networks for Feedback Control
Based on FB Control Approach
Unknown system dynamics
On-line tuning

2. Neural Network Solution of Optimal Design Equations
Nearly Optimal Control
Based on HJ Optimal Design Equations
Known system dynamics
Preliminary Off-line tuning

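H-Infinity Control Using Neural Networks

L2 Gain Problem

System:
$\dot{x} = f(x) + g(x)u + k(x)d$
$y = x$, $z = \psi(x,u)$
$u = l(y)$

[Figure: system with disturbance d, control u, performance output z, and measured output y; the control u = l(y) closes the loop.]

where $\|z\|^2 = h^T h + \|u\|^2$

Find control u(t) so that

$\frac{\int_0^\infty \|z(t)\|^2\,dt}{\int_0^\infty \|d(t)\|^2\,dt} = \frac{\int_0^\infty (h^T h + \|u\|^2)\,dt}{\int_0^\infty \|d(t)\|^2\,dt} \le \gamma^2$

for all L2 disturbances and a prescribed gain $\gamma^2$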

Standard Bounded L2 Gain Problem

Zero-Sum differential game

Game theory value function: $J(u,d) = \int_0^\infty \left(h^T h + \|u\|^2 - \gamma^2\|d\|^2\right)dt$

Take $\|u\|^2 = u^T R u$ and $\|d\|^2 = d^T d$

Hamilton-Jacobi Isaacs (HJI) equation

$0 = V_x^T f + h^T h - \frac{1}{4}V_x^T g R^{-1} g^T V_x + \frac{1}{4\gamma^2}V_x^T k k^T V_x$

Stationary Point

Optimal control: $u^* = -\frac{1}{2}R^{-1}g^T(x)V_x$
Worst-case disturbance: $d^* = \frac{1}{2\gamma^2}k^T(x)V_x$

If HJI has a positive definite solution V and the associated closed-loop system is AS, then the L2 gain is bounded by $\gamma^2$

Problems to solve HJI: Beard proposed a successive solution method using Galerkin approx.

Viscosity Solution
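The stationary point and the HJI equation follow from the Hamiltonian of the zero-sum game in a few lines. The worked derivation below assumes the unconstrained case with ‖u‖² = uᵀRu, ‖d‖² = dᵀd, and symmetric R; the subscript notation V_x for the gradient of V is used as on the slide.

```latex
\begin{align*}
H(x,V_x,u,d) &= V_x^T\,(f + g u + k d) + h^T h + u^T R u - \gamma^2 d^T d \\[4pt]
\frac{\partial H}{\partial u} = g^T V_x + 2 R u = 0
  &\;\Longrightarrow\; u^* = -\tfrac{1}{2} R^{-1} g^T V_x \\
\frac{\partial H}{\partial d} = k^T V_x - 2\gamma^2 d = 0
  &\;\Longrightarrow\; d^* = \tfrac{1}{2\gamma^2}\, k^T V_x \\[4pt]
V_x^T g u^* + u^{*T} R u^*
  &= -\tfrac{1}{2} V_x^T g R^{-1} g^T V_x + \tfrac{1}{4} V_x^T g R^{-1} g^T V_x
   = -\tfrac{1}{4} V_x^T g R^{-1} g^T V_x \\
V_x^T k d^* - \gamma^2 d^{*T} d^*
  &= \tfrac{1}{2\gamma^2} V_x^T k k^T V_x - \tfrac{1}{4\gamma^2} V_x^T k k^T V_x
   = \tfrac{1}{4\gamma^2} V_x^T k k^T V_x \\[4pt]
0 = H(x,V_x,u^*,d^*)
  &= V_x^T f + h^T h - \tfrac{1}{4} V_x^T g R^{-1} g^T V_x + \tfrac{1}{4\gamma^2} V_x^T k k^T V_x
\end{align*}
```

The Hamiltonian is convex in u and concave in d, so u* minimizes and d* maximizes it, which is the saddle-point structure of the game.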

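Bounded L2 Gain Problem for Constrained Input Systems

Control constrained by saturation function $\phi(\cdot)$
[Figure: tanh(p) versus p, saturating at 1 and −1.]

Encode constraint into Value function

$J(u,d) = \int_0^\infty\left(h^T h + 2\int_0^u \phi^{-T}(\nu)\,d\nu - \gamma^2\|d\|^2\right)dt$

$\|u\|_q^2 = 2\int_0^u \phi^{-T}(\nu)\,d\nu$

This is a quasi-norm. Weaker than a norm: the homogeneity property is replaced by the weaker symmetry property $\|x\|_q = \|-x\|_q$

(Used by Lyshevski for H2 control)

Hamiltonian

$H(x,V_x,u,d) \equiv \left(\frac{\partial V}{\partial x}\right)^T(f + gu + kd) + h^T h + 2\int_0^u \phi^{-T}(\nu)\,d\nu - \gamma^2 d^T d$

Stationarity conditions

$0 = \frac{\partial H}{\partial u} = g^T V_x + 2\phi^{-1}(u)$ (Leibniz’s Formula)

$0 = \frac{\partial H}{\partial d} = k^T V_x - 2\gamma^2 d$

Optimal inputs (solve for u(t))

$u^* = -\phi\left(\frac{1}{2}g^T(x)V_x\right)$. Note u(t) is bounded!

$d^* = \frac{1}{2\gamma^2}k^T(x)V_x$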

Cannot solve HJI !!

CT Policy Iteration for H-Infinity Control (c.f. Howard)

Successive Solution: Algorithm 1

Let $\gamma$ be prescribed and fixed.
Let $u_0$ be a stabilizing control with region of asymptotic stability (RAS) $\Omega_0$.

1. Outer loop: update control
Initial disturbance $d^0 = 0$

2. Inner loop: update disturbance
Solve Value Equation (consistency equation)

$\left(\frac{\partial V_i^j}{\partial x}\right)^T(f + g u_j + k d^i) + h^T h + 2\int_0^{u_j}\phi^{-T}(\nu)\,d\nu - \gamma^2 (d^i)^T d^i = 0$

Inner loop update

$d^{i+1} = \frac{1}{2\gamma^2}k^T(x)\frac{\partial V_i^j}{\partial x}$

Go to 2. Iterate i until convergence to $d^\infty, V_\infty^j$ with RAS $\Omega_\infty^j$

Outer loop update

$u_{j+1} = -\phi\left(\frac{1}{2}g^T(x)\frac{\partial V_\infty^j}{\partial x}\right)$

Go to 1. Iterate j until convergence to $u_\infty, V_\infty^\infty$ with RAS $\Omega_\infty^\infty$

Results for this Algorithm

For every iteration on the disturbance $d^i$ one has
$V_i^j \le V_{i+1}^j$: the value function increases
$\Omega_i^j \supseteq \Omega_{i+1}^j$: the RAS decreases

For every iteration on the control $u_j$ one has
$V_\infty^j \ge V_\infty^{j+1}$: the value function decreases
$\Omega_\infty^j \subseteq \Omega_\infty^{j+1}$: the RAS does not decrease

The algorithm converges to $V^*(\Omega_0), \Omega_0, u^*(\Omega_0), d^*(\Omega_0)$, the optimal solution on the RAS $\Omega_0$

Sometimes the algorithm converges to the optimal HJI solution $V^*, \Omega^*, u^*, d^*$

For this to occur it is required that $\Omega^* \subseteq \Omega_0$

Neural Network Approximation for Computational Technique

Problem: cannot solve the Value Equation!

Neural Network to approximate $V^{(i)}(x)$

$V_L^{(i)}(x) = \sum_{j=1}^{L} w_j^{(i)}\sigma_j(x) = W_L^{T(i)}\sigma_L(x)$

Value function gradient approximation is

$\frac{\partial V_L^{(i)}}{\partial x} = \frac{\partial\sigma_L^T}{\partial x}W_L^{(i)} = \nabla\sigma_L^T(x)W_L^{(i)}$

Substitute into Value Equation to get

$0 = w_i^{jT}\nabla\sigma(x)\dot{x} + r(x,u_j,d^i) = w_i^{jT}\nabla\sigma(x)f(x,u_j,d^i) + h^T h + \|u_j\|^2 - \gamma^2\|d^i\|^2$

Therefore, one may solve for NN weights at iteration (i,j)

Neural Network Feedback Controller

Optimal Solution

$u = -\phi\left(\frac{1}{2}g^T(x)\nabla\sigma_L^T W_L\right)$

$d = \frac{1}{2\gamma^2}k^T(x)\nabla\sigma_L^T W_L$

A NN feedback controller with nearly optimal weights

Example: Linear system

1u ≤1 1

2 2

0 0.5 0,

1 1.5 1

x xu

x x

−= +

−

⎡ ⎤ ⎡ ⎤⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦⎣ ⎦ ⎣ ⎦

&

&

$V_{15}(x_1,x_2) = w_1 x_1^2 + w_2 x_2^2 + w_3 x_1 x_2 + w_4 x_1^4 + w_5 x_2^4 + w_6 x_1^3 x_2 + w_7 x_1^2 x_2^2 + w_8 x_1 x_2^3 + w_9 x_1^6 + w_{10} x_2^6 + w_{11} x_1^5 x_2 + w_{12} x_1^4 x_2^2 + w_{13} x_1^3 x_2^3 + w_{14} x_1^2 x_2^4 + w_{15} x_1 x_2^5$

Activation functions = even polynomial basis up to order 6

RAS found by integrating $\dot{x} = -f(x)$. That is, reverse time $dt = -d\tau$

[Figure: initial gain found by LQR; optimal NN solution.]

Rotational-Translational Actuator Benchmark Problem

Control input is torque N; F is a disturbance

$\dot{x} = f(x) + g(x)u + k(x)d$

$f(x) = \begin{bmatrix} x_2 \\ \dfrac{-x_1 + \varepsilon x_4^2\sin x_3}{1 - \varepsilon^2\cos^2 x_3} \\ x_4 \\ \dfrac{\varepsilon\cos x_3\,(x_1 - \varepsilon x_4^2\sin x_3)}{1 - \varepsilon^2\cos^2 x_3} \end{bmatrix}$, $g(x) = \begin{bmatrix} 0 \\ \dfrac{-\varepsilon\cos x_3}{1 - \varepsilon^2\cos^2 x_3} \\ 0 \\ \dfrac{1}{1 - \varepsilon^2\cos^2 x_3} \end{bmatrix}$

$\varepsilon = 0.2$

[Figure: state evolution for both controllers in the (x1, x2) plane. Curves: switching surface method; nonquadratic functionals method.]

Minimum-Time Control
Encode into Value Function

$V = \int_0^\infty\left[\tanh(x^T Q x) + 2\int_0^u\left(\phi^{-1}(\mu)\right)^T R\,d\mu\right]dt$

Before:

1. Neural Networks for Feedback Control
Based on FB Control Approach
Unknown system dynamics
On-line tuning

2. Neural Network Solution of Optimal Design Equations
Nearly Optimal Control
Based on HJ Optimal Design Equations
Known system dynamics
Preliminary Off-line tuning

3. Approximate Dynamic Programming
Nearly Optimal Control
Based on recursive equation for the optimal value
Usually Known system dynamics (except Q learning)
The Goal: unknown dynamics, On-line tuning

IEEE Trans. Neural Networks
Special Issue on Neural Networks for Feedback Control

Lewis, Wunsch, Prokhorov, Jie Huang, Parisini

Due date 1 December

Bring together:
Feedback control system community
Approximate Dynamic Programming community
Neural Network community

Discrete-Time Systems

$x_{k+1} = f(x_k,u_k)$

Value: $V(x_k) = \sum_{i=k}^{N}\gamma^{i-k}r(x_i,u_i)$

Value in difference form (recursive form, consistency equation): $V_h(x_k) = r(x_k,h(x_k)) + \gamma V_h(x_{k+1})$

Howard Policy Iteration: iterate the following until convergence

1. Find the value for the prescribed policy (solve completely)
$V_j(x_k) = r(x_k,h_j(x_k)) + \gamma V_j(x_{k+1})$

2. Policy improvement
$h_{j+1}(x_k) = \arg\min_{u_k}\left(r(x_k,u_k) + \gamma V_j(x_{k+1})\right)$
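Howard policy iteration can be traced by hand on a scalar linear-quadratic problem. The sketch below assumes the system x[k+1] = a x[k] + b u[k] with utility r = q x² + ρ u², linear policies u = −k x, and quadratic values V(x) = p x²; the system, the numerical values, and the function name are assumptions made for illustration. For this case both steps have closed forms.

```python
def policy_iteration(a, b, q, rho, gamma, k0, iterations=50):
    """Howard policy iteration for x[k+1] = a*x + b*u with r = q*x^2 + rho*u^2.

    Policies are u = -k*x and values are V(x) = p*x^2.
    """
    k = k0
    p = 0.0
    for _ in range(iterations):
        a_cl = a - b * k
        if gamma * a_cl * a_cl >= 1.0:
            raise ValueError("policy does not give a finite discounted cost")
        # 1. Policy evaluation: solve p = q + rho*k^2 + gamma*a_cl^2*p completely.
        p = (q + rho * k * k) / (1.0 - gamma * a_cl * a_cl)
        # 2. Policy improvement: minimize q*x^2 + rho*u^2 + gamma*p*(a*x + b*u)^2 over u.
        k = gamma * a * b * p / (rho + gamma * b * b * p)
    return p, k


p, k = policy_iteration(a=1.0, b=1.0, q=1.0, rho=1.0, gamma=0.9, k0=0.5)
# Residual of the discounted Riccati equation that the optimal p must satisfy.
riccati_residual = p - (1.0 + 0.9 * p - (0.9 * p) ** 2 / (1.0 + 0.9 * p))
print(p, k, riccati_residual)
```

The check at the end uses the fact that, for this scalar problem, the fixed point of the two steps is the solution of the discounted Riccati equation.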

Four ADP Methods proposed by Werbos

Critic NN to approximate:

Heuristic dynamic programming: Value $V(x_k)$
Dual heuristic programming: Gradient $\partial V/\partial x$
AD Heuristic dynamic programming (Watkins Q Learning): Q function $Q(x_k,u_k)$
AD Dual heuristic programming: Gradients $\partial Q/\partial x$, $\partial Q/\partial u$

Action NN to approximate the Control

Bertsekas- Neurodynamic Programming

Barto & Bradtke- Q-learning proof (Imposed a settling time)

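Continuous-Time Systems

$\dot{x} = f(x) + g(x)u + k(x)d$, $y = x$, $z = \psi(x,u)$, $u = l(y)$

[Figure: system with disturbance d, control u, performance output z, measured output y, and feedback u = l(y).]

Value: $V(x(t)) = \int_t^T r(x,u,d)\,dt$

Value in differential form (consistency equation):
$0 = \left(\frac{\partial V}{\partial x}\right)^T(f + gu + kd) + r(x,u,d) \equiv H(x,\frac{\partial V}{\partial x},u,d)$

$u^*(x(t)) = -\frac{1}{2}g^T(x)\frac{\partial V^*}{\partial x}$

$d^*(x(t)) = \frac{1}{2\gamma^2}k^T(x)\frac{\partial V^*}{\partial x}$

HJB equation

$0 = \left(\frac{dV^*}{dx}\right)^T f + h^T h - \frac{1}{4}\left(\frac{dV^*}{dx}\right)^T g g^T\frac{dV^*}{dx} + \frac{1}{4\gamma^2}\left(\frac{dV^*}{dx}\right)^T k k^T\frac{dV^*}{dx}$

Discrete-time counterpart: $V_h(x_k) = r(x_k,h(x_k)) + \gamma V_h(x_{k+1})$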

Continuous Time Policy Iteration (c.f. Howard work in DT Systems)

Select a stabilizing initial control

1. Outer loop: update control
Initial disturbance set to zero

2. Inner loop: update disturbance
Solve Lyapunov equation

$\left(\frac{\partial V_i^j}{\partial x}\right)^T(f + g u_j + k d^i) + h^T h + \|u_j\|^2 - \gamma^2\|d^i\|^2 = 0$

Inner loop disturbance update

$d^{i+1} = \frac{1}{2\gamma^2}k^T(x)\frac{\partial V_i^j}{\partial x}$

Go to 2. Until convergence

Outer loop update

$u_{j+1} = -\frac{1}{2}g^T(x)\frac{\partial V_i^j}{\partial x}$

Go to 1. Until convergence

Abu-Khalaf and Lewis: H inf
Saridis: H2
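The two-loop continuous-time policy iteration can be followed on a scalar linear example where each Lyapunov equation has a closed-form solution. The sketch below assumes dx/dt = a x + b u + k d with hᵀh = q x², ‖u‖² = ρ u², linear policies u = −gain_u · x and d = gain_d · x, and value V(x) = p x²; the system, the numbers, and the names are assumptions made for illustration.

```python
import math


def hinf_policy_iteration(a, b, k, q, rho, gamma, gain0,
                          outer_iters=20, inner_iters=50):
    """Two-loop policy iteration for dx/dt = a*x + b*u + k*d with V(x) = p*x^2."""
    gain_u = gain0                 # stabilizing initial control u = -gain_u * x
    p = 0.0
    gain_d = 0.0
    for _ in range(outer_iters):
        gain_d = 0.0               # initial disturbance set to zero
        for _ in range(inner_iters):
            a_cl = a - b * gain_u + k * gain_d
            if a_cl >= 0.0:
                raise ValueError("closed loop is not asymptotically stable")
            # Lyapunov equation: 2*p*a_cl + q + rho*gain_u^2 - gamma^2*gain_d^2 = 0
            p = (q + rho * gain_u ** 2 - gamma ** 2 * gain_d ** 2) / (-2.0 * a_cl)
            # Inner loop update: d = (1/(2*gamma^2)) * k * dV/dx = (k*p/gamma^2) * x
            gain_d = k * p / gamma ** 2
        # Outer loop update: u = -(1/2) * (1/rho) * b * dV/dx = -(b*p/rho) * x
        gain_u = b * p / rho
    return p, gain_u, gain_d


p, gain_u, gain_d = hinf_policy_iteration(a=1.0, b=1.0, k=1.0, q=1.0,
                                          rho=1.0, gamma=2.0, gain0=3.0)
# Scalar game Riccati equation: 2*a*p + q - p^2*(b^2/rho - k^2/gamma^2) = 0
p_exact = (2.0 + math.sqrt(7.0)) / 1.5
print(p, gain_u, gain_d, p_exact)
```

In this example the inner-loop values rise toward their limit and the outer-loop values fall, which matches the monotonicity results stated for the algorithm.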

Neural Network Approximation of Value Function

$\hat{V}_i^j(x,w) = w_i^{jT}\sigma(x)$

Lyapunov equation becomes

$0 = w_i^{jT}\nabla\sigma(x)\dot{x} + r(x,u_j,d^i) = w_i^{jT}\nabla\sigma(x)f(x,u_j,d^i) + h^T h + \|u_j\|^2 - \gamma^2\|d^i\|^2$

Control action

$u^*(x) = -\frac{1}{2}R^{-1}g^T(x)\nabla\sigma^T(x)w^*$

CT Nearly Optimal NN feedback

CT Approx Policy Iteration: Abu-Khalaf & Lewis

Nearly optimal FB control
Off-line tuning
Known dynamics

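Continuous-time adaptive critic
Abu-Khalaf & Lewis (c.f. Doya)

Critic NN: $V(x) = w^T\sigma(x)$, on-line tuning

Hamiltonian (CT consistency check)

$H(x,\frac{\partial V}{\partial x},u) = \dot{V} + r(x,u) = \left(\frac{\partial V}{\partial x}\right)^T\dot{x} + r(x,u) = \left(\frac{\partial V}{\partial x}\right)^T f(x,u) + r(x,u) = 0$

residual eq error

$\delta = \frac{d(w^T\sigma)}{dt} + r(x,u) = w^T\nabla\sigma(x)\dot{x} + r(x,u) = w^T\nabla\sigma(x)f(x,u) + r(x,u)$

$E = \frac{1}{2}\delta^2$

gradient: $\frac{\partial E}{\partial w} = \delta\frac{\partial\delta}{\partial w} = \delta\,\nabla\sigma(x)f(x,u)$

Update weights using, e.g., gradient descent (or RLS)

$\dot{w} = -\alpha\,\nabla\sigma(x)f(x,u)\,\delta$

Target value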

Action NN

Target action: $Y_2 = -\frac{1}{2}R^{-1}g^T(x)\nabla\sigma^T(x)w = \phi_2^T w$, where $w$ are the critic weights

Activation fns depend on system dynamics: $\phi_2^T(x) = -\frac{1}{2}R^{-1}g^T(x)\nabla\sigma^T(x)$

Action NN: $\hat{Y}_2 = \phi_2^T(x)v$

$e_2(x) = \hat{Y}_2 - Y_2 = -\frac{1}{2}R^{-1}g^T(x)\nabla\sigma^T(x)[v - w] = \phi_2^T(x)[v - w]$

Update weights by gradient descent: $\dot{v} = -\beta\phi_2(x)e_2(x)$

Alternative: simply set $u(x) = Y_2 = -\frac{1}{2}R^{-1}g^T(x)\nabla\sigma^T(x)w = \phi_2^T w$

Does not work: proof development so far indicates that the critic NN must be tuned faster than the action NN, i.e. α > β

c.f. Bradtke & Barto DT Q learning work

Small Time-Step Approximate Tuning for Continuous-Time Adaptive Critics

Sampled data systems

$H(x_t,\frac{\partial V}{\partial x},u_t) = \dot{V}_t + r(x_t,u_t) \approx \frac{V_{t+1} - V_t}{\Delta t} + r(x_t,u_t) \approx \frac{r^D(x_t,u_t) + V_{t+1} - V_t}{\Delta t}$

Baird’s Advantage function

$A(x_t,u_t) = \frac{r^D(x_t,u_t) + V^*(x_{t+1}) - V^*(x_t)}{\Delta t}$

This is not in the standard DT form $V_h(x_k) = r(x_k,h(x_k)) + \gamma V_h(x_{k+1})$

Optimal Control
Lewis & Syrmos 1995

For More Information
Journal papers on http://arri.uta.edu/acs

In Progress: M. Abu-Khalaf, Jie Huang, F.L. Lewis
Nearly Optimal Control by HJ Equation Solution Using Neural Networks

Theorem 1. Necessary and Sufficient Conditions for H-infinity Static OPFB Control

Assume that Q > 0. Then system (1) is output-feedback stabilizable with L2 gain bounded by γ if and only if:

i. (A, C) is detectable

ii. There exist matrices K* and L such that

$K^* C = R^{-1}(B^T P + L)$

where $P > 0$, $P^T = P$, is a solution of

$PA + A^T P + Q + \frac{1}{\gamma^2}PDD^T P - PBR^{-1}B^T P + L^T R^{-1}L = 0$

c.f. results by Kucera and De Souza

Note there is an (A,B) stabilizability condition hidden in the existence of a solution to the Riccati eq.

ONLY TWO COUPLED EQUATIONS

Solution Algorithm 1 (c.f. Geromel)

1. Initialize: Set n = 0, $L_0 = 0$, and select γ, Q, R

2. n-th iteration: solve for $P_n$ in the ARE

$P_n A + A^T P_n + Q + \frac{1}{\gamma^2}P_n DD^T P_n - P_n BR^{-1}B^T P_n + L_n^T R^{-1}L_n = 0$

Evaluate gain and update L

$K_{n+1} = R^{-1}(B^T P_n + L_n)C^T(CC^T)^{-1}$

$L_{n+1} = RK_{n+1}C - B^T P_n$

Until Convergence

Based on ARE, so no initial stabilizing gain needed !!

Tries to project gain onto nullspace perp. of C using degrees of freedom in L

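Aircraft Autopilot Design

F-16 Normal Acceleration Regulator Design

[Figure: G command system. The reference r and the performance output z = nz form the error e; an integrator 1/s gives ε; gains ke and kI, together with gains kα and kq on the fed-back α and q, give the control u; blocks 20.2/(s + 20.2) and 10/(s + 10); elevator input δe to the aircraft. Labels: system dynamics, actuator dynamics, sensor dynamics.]

$y = [\alpha_F \;\; q \;\; e \;\; \varepsilon]^T$

$u = -Ky = -[k_\alpha \;\; k_q \;\; k_e \;\; k_I]\,y$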

Theorem 2 (new work)

Parametrization of all H-infinity Static SVFB Controls

Assume that Q > 0. Then K is a stabilizing SVFB with L2 gain bounded by γ if and only if:

i. (A, B) is stabilizable

ii. There exists a matrix L such that

$K = R^{-1}(B^T P + L)$

where $P > 0$, $P^T = P$, is a solution of

$PA + A^T P + Q + \frac{1}{\gamma^2}PDD^T P - PBR^{-1}B^T P + L^T R^{-1}L = 0$

OPFB is a special case

Chaos in Dynamic Neural Networks (c.f. Ron Chen)

[Figure: linear system with blocks B, 1/s, C and feedback A, state x, inputs u1 and u2, connected with H(s).]

$x_{k+1} = A x_k + W^T\sigma(V^T x_k) + u_k$

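% MATLAB file for chaotic NN from Jun Wang's paper
function [ki,x,y,z] = tcnn(N)
% Returns the step index ki, neuron output x, internal state y, and weight z.
y(1) = rand; ki(1) = 1; z(1) = 0.08;   % initial conditions
a = 0.9; e = 1/250; Io = 0.65;
g = 0.0001; b = 0.001;
for k = 1:N-1
    ki(k+1) = k+1;
    x(k) = 1/(1+exp(-y(k)/e));         % sigmoid output
    y(k+1) = a*y(k) + g - z(k)*(x(k) - Io);
    z(k+1) = (1-b)*z(k);               % z decays geometrically
end
x(N) = 1/(1+exp(-y(N)/e));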

$y_{k+1} = \alpha y_k + g - z_k\left(\frac{1}{1 + e^{-y_k/\rho}} - I_0\right)$
$z_{k+1} = (1 - \beta)z_k$

Jun Wang
