
Automation & Robotics Research Institute (ARRI), The University of Texas at Arlington

F.L. Lewis, Moncrief-O'Donnell Endowed Chair

Head, Controls & Sensors Group

http://ARRI.uta.edu/acs

Nonlinear Network Structures for Feedback Control

Organized and invited by Professor Jie Huang, CUHK

SCUT / CUHK Lectures on Advances in Control, March 2005

Relevance - Machine Feedback Control

[Figure: tank gun-barrel pointing system. Moving tank platform; turret with backlash and compliant drive train; compliant coupling; barrel flexible modes qf; gimbal angles qr1, qr2 (Az/El); desired command qd; terrain and vehicle vibration disturbances d(t); barrel tip position.]

[Figure: single-wheel/terrain suspension system. Vehicle mass m; parallel damper (kc, cc) with active damping uc (if used); series damper + suspension + wheel (k, c); auxiliary mass mc; vibratory modes qf(t); forward speed y(t); vertical motion z(t); surface roughness rho(t); tire deflection w(t).]

Single-Wheel/Terrain System with Nonlinearities

High-Speed Precision Motion Control with unmodeled dynamics, vibration suppression, disturbance rejection, friction compensation, deadzone/backlash control

Vehicle Suspension

Industrial Machines

Military Land Systems

Aerospace

Newton’s Law

[Figure: mass m with position p(t), velocity v(t), applied force F(t).]

F = ma = m\ddot{x}

\ddot{x}(t) = \frac{F(t)}{m} \equiv u(t)

Mechanical Motion Systems (Vehicles, Robots)

Lagrange's Equations of Motion

M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + G(q) + F(\dot{q}) + \tau_d = B(\tau)

inertia | Coriolis/centripetal force | gravity | friction | disturbances | B(·) - actuator problems | \tau - control input

\dot{x}_1 = a x_1 - b x_1 x_2
\dot{x}_2 = -c x_2 + d x_1 x_2

Darwinian Selection & Population Dynamics

x1 = prey, x2 = predator

Volterra’s fishes

Stable Limit Cycle

\dot{x}_1 = a x_1 - b x_1 x_2 - e x_1^2
\dot{x}_2 = -c x_2 + d x_1 x_2

Effects of Overcrowding - Limited food and resources

Stable Equilibrium POINT

Favorable to Prey!
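A minimal MATLAB sketch of the two population models above, assuming illustrative coefficient values (the slides give none): without the overcrowding term the trajectory settles into the limit cycle, with it the trajectory spirals into the stable equilibrium point.

% Predator-prey models above; coefficients a,b,c,d,e are illustrative.
a = 1.0; b = 0.5; c = 0.8; d = 0.4;   % Volterra terms
e = 0.1;                              % overcrowding term
% Without overcrowding: stable limit cycle
volterra = @(t,x) [ a*x(1) - b*x(1)*x(2);
                   -c*x(2) + d*x(1)*x(2)];
% With overcrowding (-e*x1^2): stable equilibrium point
crowded  = @(t,x) [ a*x(1) - b*x(1)*x(2) - e*x(1)^2;
                   -c*x(2) + d*x(1)*x(2)];
x0 = [3; 1];                          % initial prey/predator populations
[t1,x1] = ode45(volterra, [0 50], x0);
[t2,x2] = ode45(crowded,  [0 50], x0);
plot(x1(:,1), x1(:,2), x2(:,1), x2(:,2));
xlabel('x_1 (prey)'); ylabel('x_2 (predator)');
legend('limit cycle', 'stable equilibrium');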

Dynamical System Models

Continuous-Time Systems / Discrete-Time Systems

Nonlinear system:
\dot{x} = f(x) + g(x)u, \qquad y = h(x)
x_{k+1} = f(x_k) + g(x_k)u_k, \qquad y_k = h(x_k)

Linear system:
\dot{x} = Ax + Bu, \qquad y = Cx
x_{k+1} = Ax_k + Bu_k, \qquad y_k = Cx_k

[Figure: state-space block diagram with integrator 1/s (or delay z^{-1}) and blocks f(x), g(x), h(x); control inputs u, internal states x, measured outputs y.]

Issues in Feedback Control

[Figure: feedback controller and feedforward controller around the system; desired trajectories, control inputs, measured outputs, sensor noise, disturbances.]

Stability, Tracking, Boundedness
Robustness to disturbances, to unknown dynamics

Definitions of System Stability

\dot{x} = f(x), \qquad x_{k+1} = f(x_k)

[Figure: sketches of x(t) vs. t illustrating Asymptotic Stability, Marginal Stability, and Uniform Ultimate Boundedness - for UUB the state enters the band x_e \pm B (a constant bound B, which may depend on the disturbance d, B(d)) after a time T, i.e. for t \ge t_0 + T.]

Controller Topologies

Indirect Scheme: [Figure: controller drives the plant with control u(t); a system identifier produces the estimated output \hat{y}(t); the identification error \hat{y}(t) - y(t) tunes the identifier; the desired output y_d(t) drives the controller.]

Direct Scheme: [Figure: the tracking error y_d(t) - y(t) drives the controller directly; control u(t), output y(t).]

Feedback/Feedforward Scheme: [Figure: controller #1 acts on the tracking error (feedback); controller #2 acts on the desired output y_d(t) (feedforward); both contribute to the control u(t).]

Cell Homeostasis - The individual cell is a complex feedback control system. It pumps ions across the cell membrane to maintain homeostasis, and has only limited energy to do so.

Cellular Metabolism

Permeability control of the cell membrane

http://www.accessexcellence.org/RC/VL/GG/index.html

Optimality in Biological Systems

Optimality in Control Systems DesignR. Kalman 1960

Rocket Orbit Injection

http://microsat.sm.bmstu.ru/e-library/Launch/Dnepr_GEO.pdf

Dynamics (polar coordinates; thrust F applied at angle \phi):

\dot{r} = w
\dot{w} = \frac{v^2}{r} - \frac{\mu}{r^2} + \frac{F}{m}\sin\phi
\dot{v} = -\frac{v\,w}{r} + \frac{F}{m}\cos\phi

and the mass m(t) decreases as fuel is burned.

Objectives: Get to orbit in minimum time. Use minimum fuel.

Performance Index, Cost, or Value function

Continuous-Time:
J = \int_0^T [Q(x) + R(u)]\,dt = \int_0^T r(x,u)\,dt

Discrete-Time:
J = \sum_{k=0}^{N} r(x_k, u_k)

(J = strategic utility, r = instantaneous utility)

Minimum energy: r(x,u) = x^T Q x + u^T R u
Minimum fuel: r(x,u) = |u|
Minimum time: r(x,u) = 1, then J = \int_0^T r(x,u)\,dt = T

Discounting:
J = \sum_{k=0}^{N} \gamma^k r(x_k, u_k), \qquad J = \int_0^T e^{-\gamma t} r(x,u)\,dt
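As a small illustration of the discrete-time discounted cost above, the sketch below sums J = \sum_k \gamma^k r(x_k,u_k) along a stored trajectory using the minimum-energy utility; the trajectory, weights Q, R, and discount \gamma are placeholders, not values from the talk.

% Discounted DT cost J = sum_k gamma^k * r(x_k,u_k), quadratic utility.
gamma = 0.95; N = 100;
Q = eye(2); R = 1;
x = randn(2, N+1); u = randn(1, N);    % placeholder state/input trajectory
J = 0;
for k = 0:N-1
    r_k = x(:,k+1)'*Q*x(:,k+1) + u(:,k+1)'*R*u(:,k+1);   % minimum-energy utility
    J = J + gamma^k * r_k;
end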

Input Membership Fns. Output Membership Fns.

Fuzzy Logic Rule Base

NN

Input

NN

Output

Fuzzy Associative Memory (FAM) Neural Network (NN)

INTELLIGENT CONTROL TOOLS

Input x Output u

Input x Output u

Both FAM and NN define a function u= f(x) from inputs to outputs

FAM and NN can both be used for: 1. Classification and Decision-Making, 2. Control

(Includes Adaptive Control)

NN Includes Adaptive Control (Adaptive control is a 1-layer NN)

Neural Network Properties

Learning

Recall

Function approximation

Generalization

Classification

Association

Pattern recognition

Clustering

Robustness to single node failure

Repair and reconfiguration

Nervous system cell. http://www.sirinet.net/~jgjohnso/index.html

First groups working on NN Feedback Control in CS community

Werbos

Narendra; Sanner & Slotine; F.C. Chen & Khalil; Lewis; Polycarpou & Ioannou; Christodoulou & Rovithakis

A.J. Calise, McFarland, Naira Hovakimyan; Edgar Sanchez & Poznyak; Sam Ge, Zhang, et al.

Jun Wang, Chinese Univ. Hong Kong

c. 1995

Industry Standard - PD Controller

[Figure: unity-gain tracking loop. Desired trajectory q_d enters the filter [\Lambda\ I]; the filtered tracking error r drives the gain K_v, producing torque \tau for the Robot System with actual trajectory q.]

r(t) = \dot{e}(t) + \Lambda e(t)

Easy to implement with COTS controllers. Fast. Can be implemented with a few lines of code - e.g. MATLAB.
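A minimal MATLAB sketch of this PD tracking law, with the robot replaced by a unit point mass m\ddot{q} = \tau and illustrative gains; it is only meant to show how few lines the controller r = \dot{e} + \Lambda e, \tau = K_v r takes.

% PD outer tracking loop: r = edot + Lambda*e, tau = Kv*r.
% Plant here is a unit point mass, not the full robot dynamics of the slides.
Lambda = 5; Kv = 20; m = 1; dt = 1e-3; T = 5;
q = 0; qdot = 0;                       % plant state
N = round(T/dt); e_log = zeros(N,1);
for k = 1:N
    t = k*dt;
    qd = sin(t); qd_dot = cos(t);      % desired trajectory and its derivative
    e  = qd - q;  e_dot = qd_dot - qdot;
    r  = e_dot + Lambda*e;             % filtered tracking error
    tau = Kv*r;                        % PD control law
    qddot = tau/m;                     % unit-mass plant
    qdot = qdot + dt*qddot; q = q + dt*qdot;   % Euler integration
    e_log(k) = e;
end
plot((1:N)*dt, e_log); xlabel('time (s)'); ylabel('tracking error e(t)');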

But -- cannot handle:
High-order unmodeled dynamics
Unknown disturbances
High performance specifications for nonlinear systems
Actuator problems such as friction, deadzones, backlash

Control input

Two-layer feedforward static neural network (NN)

[Figure: inputs x_1, ..., x_n feed a hidden layer of L neurons with activation \sigma(\cdot) through first-layer weights V^T; the hidden outputs feed the outputs y_1, ..., y_m through second-layer weights W^T.]

Summation equations:
y_i = \sum_{k=1}^{L} w_{ik}\,\sigma\!\left(\sum_{j=1}^{n} v_{kj} x_j + v_{k0}\right) + w_{i0}

Matrix equations:
y = W^T \sigma(V^T x)
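A short MATLAB sketch of the matrix-form forward pass y = W^T\sigma(V^T x); the sizes, sigmoid activation, and random weights are illustrative placeholders.

% Forward pass of the two-layer NN y = W'*sigma(V'*x) in matrix form.
% The first entry of x is fixed at 1 so the first rows carry the thresholds.
n = 3; L = 10; m = 2;                  % inputs, hidden neurons, outputs
V = 0.1*randn(n+1, L);                 % first-layer weights (incl. thresholds)
W = 0.1*randn(L+1, m);                 % second-layer weights (incl. thresholds)
sigma = @(z) 1./(1 + exp(-z));         % sigmoid activation
x = [1; randn(n,1)];                   % augmented input vector
z = sigma(V'*x);                       % hidden-layer outputs, L x 1
y = W'*[1; z];                         % NN outputs, m x 1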

Control System Design Approach

Robot dynamics:
M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + G(q) + F(\dot{q}) + \tau_d = \tau

Tracking error definition:
e(t) = q_d(t) - q(t), \qquad r = \dot{e} + \Lambda e \quad \text{(sliding variable)}

Error dynamics:
M\dot{r} = -V_m r + f(x) + \tau_d - \tau

PD Tracking Loop

[Figure: desired trajectory q_d enters the filter [\Lambda\ I]; the filtered error r drives an as-yet-unknown controller block "?" producing \tau for the Robot System with output q.]

The equations give the FB controller structure.

Control System Design Approach

Robot dynamics:
M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + G(q) + F(\dot{q}) + \tau_d = \tau

Tracking error definition:
e(t) = q_d(t) - q(t), \qquad r = \dot{e} + \Lambda e

Error dynamics:
M\dot{r} = -V_m r + f(x) + \tau_d - \tau

Approximate the unknown function by a NN (Universal Approximation Property):
f(x) = W^T\sigma(V^T x) + \varepsilon

Define the control input:
\tau = \hat{W}^T\sigma(\hat{V}^T x) + K_v r - v

Closed-loop dynamics:
M\dot{r} = -(K_v + V_m) r + W^T\sigma(V^T x) - \hat{W}^T\sigma(\hat{V}^T x) + \varepsilon + \tau_d + v(t)
M\dot{r} = -(K_v + V_m) r + \tilde{f} + \tau_d + v(t)

Neural Network Robot Controller

[Figure: desired trajectory q_d, \ddot{q}_d enters the filter [\Lambda\ I] (PD tracking loop with gain K_v); a nonlinear inner loop supplies \hat{f}(x) (feedforward loop); a robust control term v(t) is added; torque \tau drives the Robot System with output q.]

Universal Approximation Property

Feedback linearization.

Problem - nonlinear in the NN weights, so that standard proof techniques do not work.

Easy to implement with a few more lines of code.
Learning feature allows for on-line updates to NN memory as dynamics change.
Handles unmodelled dynamics, disturbances, actuator problems such as friction.
NN universal basis property means no regression matrix is needed.
Nonlinear controller allows faster & more precise motion.

Stability Proof based on Lyapunov Extension

Define a Lyapunov Energy Function

L = \tfrac{1}{2} r^T M r + \tfrac{1}{2}\,\mathrm{tr}(\tilde{W}^T F^{-1}\tilde{W}) + \tfrac{1}{2}\,\mathrm{tr}(\tilde{V}^T G^{-1}\tilde{V})

Differentiate:

\dot{L} = -r^T K_v r + \tfrac{1}{2} r^T(\dot{M} - 2V_m) r + \mathrm{tr}\,\tilde{W}^T\!\left(F^{-1}\dot{\tilde{W}} + \hat{\sigma} r^T - \hat{\sigma}'\hat{V}^T x\, r^T\right) + \mathrm{tr}\,\tilde{V}^T\!\left(G^{-1}\dot{\tilde{V}} + x r^T\hat{W}^T\hat{\sigma}'\right) + r^T(w + v)

(w collects the NN approximation error and disturbance terms.)

Using certain special tuning rules, one can show that the energy derivative \dot{L} is negative outside a compact set in r(t) and \tilde{W}(t). This proves that all signals are bounded.

Problems:
1. How to characterize the NN weight errors as 'small'? - use the Frobenius norm.
2. Nonlinearity in the parameters requires extra care in the proof.

Theorem 1 (NN Weight Tuning for Stability)

Let the desired trajectory q_d(t) and its derivatives be bounded. Let the initial tracking error be within a certain allowable set U. Let Z_M be a known upper bound on the Frobenius norm of the unknown ideal weights Z. Take the control input as

\tau = \hat{W}^T\sigma(\hat{V}^T x) + K_v r - v, \qquad v(t) = -K_Z(\|\hat{Z}\|_F + Z_M)\, r.

Let weight tuning be provided by

\dot{\hat{W}} = F\hat{\sigma} r^T - F\hat{\sigma}'\hat{V}^T x\, r^T - \kappa F\|r\|\hat{W}, \qquad \dot{\hat{V}} = G x(\hat{\sigma}'^T\hat{W} r)^T - \kappa G\|r\|\hat{V}

with any constant matrices F = F^T > 0, G = G^T > 0, and scalar tuning parameter \kappa > 0. Initialize the weight estimates as \hat{W} = 0, \hat{V} = random.

Then the filtered tracking error r(t) and NN weight estimates \hat{W}, \hat{V} are uniformly ultimately bounded. Moreover, arbitrarily small tracking error may be achieved by selecting large control gains K_v.

Backprop terms - Werbos
Extra robustifying terms - Narendra's e-mod extended to NLIP systems
Forward prop term?
Can also use simplified tuning - Hebbian
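A hedged MATLAB sketch of one Euler step of the Theorem 1 control and tuning laws; all gains, sizes, and the sigmoid activation are illustrative assumptions, and x and r stand in for the actual NN input and filtered tracking error.

% One Euler step of the Theorem 1 tuning laws (gains and sizes illustrative).
n = 4; L = 10; m = 2; dt = 1e-3;
F = 10*eye(L); G = 10*eye(n); kappa = 0.01; Kv = 20*eye(m); Kz = 10; ZM = 100;
Vhat = randn(n, L); What = zeros(L, m);         % initialize Vhat random, What = 0
x = randn(n,1); r = randn(m,1);                 % NN input and filtered error (placeholders)
z    = Vhat'*x;
sig  = 1./(1 + exp(-z));                        % hidden-layer output sigma_hat
dsig = diag(sig.*(1 - sig));                    % Jacobian sigma_hat' (L x L)
ZF   = sqrt(norm(What,'fro')^2 + norm(Vhat,'fro')^2);   % ||Zhat||_F
v    = -Kz*(ZF + ZM)*r;                         % robustifying term
tau  = What'*sig + Kv*r - v;                    % control input of Theorem 1
Whatdot = F*sig*r' - F*dsig*(Vhat'*x)*r' - kappa*F*norm(r)*What;
Vhatdot = G*x*(dsig'*What*r)' - kappa*G*norm(r)*Vhat;
What = What + dt*Whatdot; Vhat = Vhat + dt*Vhatdot;     % Euler update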

[Plot: W2 weights vs. time for the disturbance d = [0.5\sin(t)\ \ 0.5\cos(t)]^T.]

NN weights converge to the best learned values for the given system.

NN Friction Compensator - Trajectory Tracking Controller

[Plots: desired trajectory (position and velocity) and tracking errors vs. time; solid = fixed-gain controller, dashed = NN controller.]

[Figure: NN robot controller as before - desired trajectory q_d, \ddot{q}_d into [\Lambda\ I], tracking loop with gain K_v, nonlinear inner loop \hat{f}(x), feedforward loop, robust control term v(t), torque \tau to the Robot System with output q.]

Static NN => Dynamic NN Feedback Controller

Dynamic NN and Passivity

[Figure: dynamic NN block diagram H(s) - integrator 1/s with state x, feedback matrix A, input matrix B with inputs u_1, u_2, output matrix C.]

Discrete-time case:
x_{k+1} = A x_k + W^T\sigma(V^T x_k) + u_k

Closed-loop system w.r.t. the neural network is a dynamic (recurrent) NN.

The backprop tuning algorithms
\dot{\hat{V}} = G x(\hat{\sigma}'^T\hat{W} r)^T, \qquad \dot{\hat{W}} = F\hat{\sigma} r^T - F\hat{\sigma}'\hat{V}^T x\, r^T
make the closed-loop system passive.

\dot{\hat{W}} = F\hat{\sigma} r^T - F\hat{\sigma}'\hat{V}^T x\, r^T - \kappa F\|r\|\hat{W}, \qquad \dot{\hat{V}} = G x(\hat{\sigma}'^T\hat{W} r)^T - \kappa G\|r\|\hat{V}

The enhanced tuning algorithms

make the closed-loop system state-strict passive

SSP gives extra robustness properties to disturbances and HF dynamics

Force Control

Flexible pointing systems

Vehicle active suspension - SBIR Contracts

What about practical Systems?

Flexible Systems with Vibratory Modes

\begin{bmatrix} M_{rr} & M_{rf} \\ M_{fr} & M_{ff} \end{bmatrix}\begin{bmatrix} \ddot{q}_r \\ \ddot{q}_f \end{bmatrix} + \begin{bmatrix} V_{rr} & V_{rf} \\ V_{fr} & V_{ff} \end{bmatrix}\begin{bmatrix} \dot{q}_r \\ \dot{q}_f \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & K \end{bmatrix}\begin{bmatrix} q_r \\ q_f \end{bmatrix} + \begin{bmatrix} F \\ G \end{bmatrix} = \begin{bmatrix} B_r \\ B_f \end{bmatrix}\tau

Rigid dynamics

Flexible dynamics

Problem- only one control input !

Flexible link pointing system

[Figure: flexible-link pointing system; the NN input includes acceleration, velocity, position, and the flexible modes.]

Neural network controller for Flexible-Link robot arm

[Figure: desired trajectory q_d, \dot{q}_d, \ddot{q}_d into [\Lambda\ I]; tracking loop with gain K_v; nonlinear inner loop \hat{f}(x); robust control term v(t); rigid-mode torque \tau plus a fast vibration suppression loop \tau_F = B_r^{-1}\xi driven by fast PD gains and the manifold equation, acting on the flexible modes q_f, \dot{q}_f of the Robot System.]

Singular Perturbations
Add an extra feedback loop
Use passivity to show stability

Coupled Systems

M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + F(\dot{q}) + G(q) + \tau_d = \tau = K_T i
L\,\dot{i} + R(i,\dot{q}) = u_e

Motor electrical dynamics

Robot mechanical dynamics

Problem- only one control input !

[Figure: vehicle active suspension quarter-car model. Sprung mass (car body) m_s with position z_s; unsprung mass (tire) m_u with position z_u; actuator forces F^+ and F^-; tire stiffness K_t; terrain height z_r; active suspension control.]

Neural network backstepping controller for Flexible-Joint robot arm

[Figure: desired trajectory q_d, \dot{q}_d, \ddot{q}_d into [\Lambda\ I]; tracking loop with gain K_r; nonlinear FB linearization loop with NN #1 estimating \hat{F}_1(x); robust control term v_i(t); desired current i_d; backstepping loop with NN #2 estimating \hat{F}_2(x), gain K_\eta, error \eta, and 1/K_{B1} producing the motor voltage u_e; Robot System with current i and output q.]

Backstepping

Advantages over traditional Backstepping- no regression functions needed

Add an extra feedback loop
Two NN needed
Use passivity to show stability

[Figure: actuator nonlinearities. Deadzone \tau = D(u): zero output for u in the deadband [-d_-, d_+], slopes m_-, m_+ outside it. Backlash: hysteresis between u and \tau with deadband d_+, d_- and slope m.]

Actuator Nonlinearities

[Figure: feedback controller produces the applied control inputs; the actuator nonlinearity converts them into the actual control inputs driving the System, which produces the outputs.]

NN in Feedforward Loop - Deadzone Compensation

[Figure: desired trajectory q_d, \ddot{q}_d and filtered error r = [\Lambda^T\ I]e drive the gain K_v; an estimate \hat{f}(x) of the nonlinear function is added; an NN deadzone precompensator (NN I and NN II) inverts D(u) so that the desired torque w reaches the Mechanical System as \tau; output q.]

The two precompensator NN weight sets are tuned by coupled laws of the same enhanced-backprop form as Theorem 1 (\sigma' cross terms plus robustifying k-terms), driven by the filtered error r and the desired torque w; a little critic network supplies the second update. The precompensator acts like a 2-layer NN with enhanced backprop tuning!

Performance Results

[Plots: state x_2(k) and tracking error e_2(k) vs. time, with PD control alone and with NN deadzone compensation.]

PD control - the deadzone chops out the middle of the response.
NN control fixes the problem.

[Figure: dynamic inversion NN backlash compensator. The nonlinear system with state x is driven through the backlash element; the outer loop forms the filtered error r = [\Lambda^T\ I](x_d - x), the gain K_v, the estimate \hat{f}(x) of the nonlinear function, and the desired torque \tau_{des}; a backstepping loop with filter v_2, gain K_b, NN compensator output y_{nn}, robustifying term v_1, and integrator 1/s generates \dot{\tau}_{des} and inverts the backlash so that \tau tracks \tau_{des}.]

Dynamic inversion NN compensator for system with Backlash

U.S. patent- Selmic, Lewis, Calise, McFarland

Performance Results

PD control - backlash chops off the tops and bottoms of the response.
NN control fixes the problem.

[Plots: position x_1(t), velocity x_2(t), and tracking errors e_1(t), e_2(t) vs. time for the PD controller with backlash, and position x_1(t) and tracking error e_1(t) for the PD controller with NN backlash compensation.]

Neural Network Observer
Neural Network Controller

[Figure: observer/controller structure. A NN observer \hat{h}_o(\hat{x}_1,\hat{x}_2) reconstructs the state estimate \hat{x} = [\hat{x}_1\ \hat{x}_2]^T from the measured position x_1 = q and the torque \tau(t); the estimation error is \tilde{x}_1. A NN controller \hat{h}_c(\hat{x}_1,\hat{x}_2) with gains K_v, k_D, k_P, M^{-1}(\cdot) and filter [\lambda\ I] forms the estimated filtered error \hat{r}(t) from e = \hat{x} - [q_d\ \dot{q}_d]^T and produces \tau(t), plus a robust term v_c.]

Observer:
\dot{\hat{z}}_1 = \hat{x}_2 + k_D\tilde{x}_1
\dot{\hat{z}}_2 = \hat{W}_o^T\sigma_o(\hat{x}_1,\hat{x}_2) + M^{-1}(x_1)\tau + k\,\tilde{x}_1

Observer NN tuning:
\dot{\hat{W}}_o = -k_D F_o\hat{\sigma}_o\tilde{x}_1^T - \kappa_o F_o\|\tilde{x}_1\|\hat{W}_o

Controller:
\tau(t) = \hat{W}_c^T\hat{\sigma}_c(\hat{x}_1,\hat{x}_2) + K_v\hat{r}(t) - v_c(t), \qquad \hat{r}(t) = \dot{\hat{e}}(t) + \Lambda\hat{e}(t)

Controller NN tuning:
\dot{\hat{W}}_c = F_c\sigma_c(\hat{x}_1,\hat{x}_2)\hat{r}^T - \kappa_c F_c\|\hat{r}\|\hat{W}_c

NN Observers - needed when all states are not measured.

NN Control for Discrete-Time Systems

Dynamics:
x(k+1) = f(x(k)) + g(x(k))u(k)

NN tuning:
\hat{W}_i(k+1) = \hat{W}_i(k) - \alpha_i\hat{\varphi}_i(k)\hat{y}_i^T(k) - \Gamma\|I - \alpha_i\hat{\varphi}_i(k)\hat{\varphi}_i^T(k)\|\hat{W}_i(k)

where \hat{y}_i(k) \equiv \hat{W}_i^T(k)\hat{\varphi}_i(k) + K_v r(k) for i = 1,\dots,N-1, and \hat{y}_N(k) \equiv r(k+1) for the last layer.

Error-based tuning

Gradient descent with momentum

Extra robust term

U.S. Patent- Jagannathan, Lewis

Neural Network Properties

Learning

Recall

Function approximation

Generalization

Classification

Association

Pattern recognition

Clustering

Robustness to single node failure

Repair and reconfiguration

Nervous system cell. http://www.sirinet.net/~jgjohnso/index.html

USED

???

FL Membership Functions for 2-D Input Vector x

[Figure: overlapping triangular membership functions X_1^i, X_1^{i+1} on the x_1 axis and X_2^j, X_2^{j+1} on the x_2 axis, defining a grid of fuzzy cells over the (x_1, x_2) plane.]

Relation Between Fuzzy Systems and Neural Networks

Separable Gaussian activation functions for RBF NN

Separable triangular activation functions for CMAC NN

Two-layer NN as FL System

[Figure: two-layer NN with inputs x_1,\dots,x_n, hidden layer of L neurons with activation \sigma(\cdot), weights V^T and W^T, and outputs y_1,\dots,y_m; instead of the standard scalar thresholds \theta_1, \theta_2, each hidden neuron carries a vector threshold \theta_1 = [\theta_{11}\ \theta_{12}\ \cdots\ \theta_{1n}]^T.]

FL system = NN with VECTOR thresholds

Gaussian membership function:
A_i^l(z_i) = e^{-\left(a_i^l (z_i - b_i^l)\right)^2}

Tuning laws:
\dot{W} = K_W(\hat{\Phi} - A a - B b) r^T - k K_W \|r\| W
\dot{a} = K_a A^T W r - k K_a \|r\| a
\dot{b} = K_b B^T W r - k K_b \|r\| b

Fuzzy Logic Controllers

[Figure: fuzzy logic controller g(x, x_d) built from input membership functions, a fuzzy rule base, and output membership functions; the tracking error e(t) = x_d(t) - x(t) is filtered by [\Lambda^T\ I] to give r(t), which drives the gain K_v; the FL output is subtracted and the result drives the Controlled Plant with state x(t).]

Dynamic Focusing of Awareness

Initial MFs

Final MFs

Effect of change of membership function spread "a"

Effect of change of membership function elasticities "c"

Elastic Fuzzy Logic - c.f. P. Werbos

Each membership function is raised to a power set by an elasticity c, which weights the importance of the corresponding factor in the rules:
\phi_B(z,a,b,c) = \left[\phi(z,a,b)\right]^{c^2}

[Figure: elastic fuzzy logic controller, same structure as the FL controller above, with control
u(t) = -K_v r - g(x, x_d).]

Tuning laws:
\dot{W} = K_W(\hat{\Phi} - A a - B b - C c) r^T - k K_W \|r\| W
\dot{a} = K_a A^T W r - k K_a \|r\| a
\dot{b} = K_b B^T W r - k K_b \|r\| b
\dot{c} = K_c C^T W r - k K_c \|r\| c

Elastic Fuzzy Logic Control: tune the control representative values and tune the membership functions - better performance.

[Figure: reinforcement-learning structure. A performance evaluator compares the plant state x(t) with the desired trajectory and produces the instantaneous utility r(t); an action-generating NN (containing the estimate \hat{f}(x)) produces the control u(t) for the unknown plant subject to disturbance d(t); the critic signal R(t) tunes the action NN. In the second diagram an FL critic generates R(t).]

Fuzzy Logic Critic NN controller

[Figure: learning FL critic controller. The action-generating NN \hat{g}(x) (two-layer, weights \hat{V}_1, \hat{W}_1) plus the robustifying term v(t) and gain K_v form u(t) for the unknown plant; a performance evaluator produces the utility r from x(t), x_d(t), and the disturbance d(t); the FL critic (input membership functions, fuzzy rule base with integrator, output membership functions, reference \rho, \dot{\rho}) produces the reinforcement signal R that tunes the action NN.]

Learning FL Critic Controller

Tune the action-generating NN (controller) weights \hat{V}_1, \hat{W}_1 and the FL critic weights \hat{W}_2 on-line; each update is driven by the filtered error r and the critic signal R and includes a -\mu(\cdot) robustifying term.

FL Critic
Action generating NN
Critic requires MEMORY

[Figure: the user supplies a reference signal q_d(t); a performance-measurement mechanism produces the reinforcement signal R(t) from the utility; the critic element tunes the action-generating neural net (input preprocessing, hidden layer with activations \sigma(\cdot), output layer W), which together with \hat{g}(x), the gain K_v, and a robust term v(t) produces the control action u(t) for the plant with output q(t) under disturbance d(t).]

Reinforcement Learning NN Controller

High-Level NN Controllers Need Exotic Lyapunov Fns.

Reinforcement NN control

Simplified critic signal:
R(t) = \mathrm{sgn}(r(t)) = \pm 1

Lyapunov function:
L(t) = \sum_{i=1}^{n} |r_i| + \tfrac{1}{2}\mathrm{tr}(\tilde{W}^T F^{-1}\tilde{W})
\dot{L}(t) = \mathrm{sgn}(r)^T\dot{r} + \mathrm{tr}(\tilde{W}^T F^{-1}\dot{\tilde{W}})

A smoothed version:
L(t) = \tfrac{1}{\alpha}\ln(1 + e^{\alpha r(t)}) + \tfrac{1}{\alpha}\ln(1 + e^{-\alpha r(t)}) + \tfrac{1}{2}\mathrm{tr}(\tilde{W}^T F^{-1}\tilde{W})
\dot{L} = \left(\frac{e^{\alpha r}}{1 + e^{\alpha r}} - \frac{e^{-\alpha r}}{1 + e^{-\alpha r}}\right)^T\dot{r} + \mathrm{tr}(\tilde{W}^T F^{-1}\dot{\tilde{W}})

The Lyapunov derivative contains R(t) !!

Tuning law contains only R(t):
\dot{\hat{W}} = F\sigma(x) R^T - \kappa F\hat{W}

Adaptive Reinforcement Learning

,)(ˆ11 ρχσ +⋅= TWR

Critic is output of NN #1

)(ˆ),(ˆ 22 χσTd Wxxg =

Action is output of second NN

,ˆ)(ˆ111 WRW T −−= χσ&

( ) ,ˆˆ)(')(ˆ211122 WRWVrW

TT Γ−+⋅Γ= χσχσ&

The tuning algorithm treats this as a SINGLE 2-layer NN

Principe - Entropy:
H(x,u,t,p(\cdot)) = -\int\!\!\int p(x_0,u_0)\ln p(x_0,u_0)\,dx_0\,du_0

Renyi's entropy, Correntropy

Brockett - Minimum-Attention Control: awareness & effort (partial derivatives in the performance measure)
V(x,u) = \int\!\!\int \left[ r(x,u) + a\left(\frac{\partial u}{\partial t}\right)^2 + b\left(\frac{\partial u}{\partial x}\right)^2 \right] dx\,dt

Encode Information into the Value Function

Information-Theoretic Learning

Before -

1. Neural Networks for Feedback Control
Based on FB Control Approach
Unknown system dynamics
On-line tuning

2. Neural Network Solution of Optimal Design Equations
Nearly Optimal Control based on HJ Optimal Design Equations
Known system dynamics
Preliminary off-line tuning

L2 Gain Problem

System:
\dot{x} = f(x) + g(x)u + k(x)d
y = x \quad \text{(measured output)}, \qquad z = \psi(x,u) \quad \text{(performance output)}, \qquad u = l(y) \quad \text{(control)}, \qquad d \quad \text{(disturbance)}

where \|z\|^2 = h^T h + \|u\|^2.

Find the control u(t) so that
\frac{\int_0^\infty \|z(t)\|^2\,dt}{\int_0^\infty \|d(t)\|^2\,dt} = \frac{\int_0^\infty \left(h^T h + \|u\|^2\right) dt}{\int_0^\infty \|d(t)\|^2\,dt} \le \gamma^2
for all L2 disturbances and a prescribed gain \gamma^2.

H-Infinity Control Using Neural Networks

Zero-Sum differential game

Standard Bounded L2 Gain Problem

Take \|u\|^2 = u^T R u and \|d\|^2 = d^T d.

Hamilton-Jacobi-Isaacs (HJI) equation:
0 = V_x^T f + h^T h - \tfrac{1}{4} V_x^T g R^{-1} g^T V_x + \tfrac{1}{4\gamma^2} V_x^T k k^T V_x

Stationary point:
u^* = -\tfrac{1}{2} R^{-1} g^T(x) V_x \quad \text{(optimal control)}
d^* = \tfrac{1}{2\gamma^2} k^T(x) V_x \quad \text{(worst-case disturbance)}

If the HJI equation has a positive definite solution V and the associated closed-loop system is AS, then the L2 gain is bounded by \gamma^2.

The HJI equation is hard to solve. Beard proposed a successive solution method using Galerkin approximation. Viscosity solution.

Game theory value function:
J(u,d) = \int_0^\infty \left( h^T h + \|u\|^2 - \gamma^2 \|d\|^2 \right) dt

Bounded L2 Gain Problem for Constrained Input Systems

This is a quasi-norm

\|u\|_q^2 = 2\int_0^u \phi^{-1}(\nu)^T\,d\nu

Weaker than a norm - the homogeneity property is replaced by the weaker symmetry property \|x\|_q = \|-x\|_q.

(Used by Lyshevsky for H2 control)

Control constrained by the saturation function \phi(\cdot), e.g. \phi(p) = \tanh(p) with limits \pm 1.

J(u,d) = \int_0^\infty \left( h^T h + 2\int_0^u \phi^{-1}(\nu)^T\,d\nu - \gamma^2 d^T d \right) dt

Encode constraint into Value function

Hamiltonian

H(x, V_x, u, d) \equiv \frac{\partial V}{\partial x}^T (f + gu + kd) + h^T h + 2\int_0^u \phi^{-1}(\nu)^T\,d\nu - \gamma^2 d^T d

Stationarity conditions:
0 = \frac{\partial H}{\partial u} = g^T V_x + 2\phi^{-1}(u)
0 = \frac{\partial H}{\partial d} = k^T V_x - 2\gamma^2 d

Optimal inputs:
u^* = -\phi\!\left(\tfrac{1}{2} g^T(x) V_x\right) \quad \text{Note: u(t) is bounded!}
d^* = \tfrac{1}{2\gamma^2} k^T(x) V_x

Leibniz’s Formula

Solve for u(t)

Cannot solve HJI !!

Successive Solution - Algorithm 1: Let \gamma be prescribed and fixed. Let u_0 be a stabilizing control with region of asymptotic stability \Omega_0.

1. Outer loop - update control. Initial disturbance d^0 = 0.

2. Inner loop - update disturbance. Solve the Value Equation
0 = \frac{\partial V_j^{i\,T}}{\partial x}\left(f + g u_j + k d^i\right) + h^T h + 2\int_0^{u_j}\phi^{-1}(\nu)^T\,d\nu - \gamma^2 \|d^i\|^2

Inner loop update:
d^{i+1} = \tfrac{1}{2\gamma^2} k^T(x)\frac{\partial V_j^i}{\partial x}
Go to 2. Iterate i until convergence to V_j^\infty, d^\infty, with RAS \Omega_j^\infty.

Outer loop update:
u_{j+1} = -\phi\!\left(\tfrac{1}{2} g^T(x)\frac{\partial V_j^\infty}{\partial x}\right)
Go to 1. Iterate j until convergence to u^\infty, V^\infty, with RAS \Omega^\infty.

CT Policy Iteration for H-Infinity Control--- c.f. Howard

Consistency equation

Results for this Algorithm

The algorithm converges to V^*(\Omega_0), u^*(\Omega_0), d^*(\Omega_0), the optimal solution on the RAS \Omega_0. For this to occur it is required that \Omega_0 \subseteq \Omega^*.

Sometimes the algorithm converges to the optimal HJI solution V^*, \Omega^*, u^*, d^*.

For every iteration on the disturbance d^i one has V_i^j \le V_{i+1}^j (the value function increases) and \Omega_i^j \supseteq \Omega_{i+1}^j (the RAS decreases).

For every iteration on the control u_j one has V_\infty^j \ge V_\infty^{j+1} (the value function decreases) and \Omega_\infty^j \subseteq \Omega_\infty^{j+1} (the RAS does not decrease).

Problem - cannot solve the Value Equation!

Neural Network Approximation for Computational Technique

Neural network to approximate V^{(i)}(x):
V_L^{(i)}(x) = \sum_{j=1}^{L} w_j^{(i)}\sigma_j(x) = W_L^{(i)T}\sigma_L(x)

Value function gradient approximation:
\frac{\partial V_L^{(i)}}{\partial x} = \frac{\partial\sigma_L^T(x)}{\partial x}W_L^{(i)} = \nabla\sigma_L^T(x)\,W_L^{(i)}

Substitute into the Value Equation to get
0 = w_j^{(i)T}\nabla\sigma(x)\,\dot{x} + r(x,u_j,d^i) = w_j^{(i)T}\nabla\sigma(x)\left(f + g u_j + k d^i\right) + h^T h + \|u_j\|^2 - \gamma^2\|d^i\|^2

Therefore, one may solve for the NN weights at iteration (i, j).

Neural Network Feedback Controller

Optimal solution:
u = -\phi\!\left(\tfrac{1}{2} g^T(x)\nabla\sigma_L^T W_L\right)
d = \tfrac{1}{2\gamma^2} k^T(x)\nabla\sigma_L^T W_L

A NN feedback controller with nearly optimal weights.

Example: Linear system

1u ≤1 1

2 2

0 0.5 0,

1 1.5 1

x xu

x x

−= +

⎡ ⎤ ⎡ ⎤⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦⎣ ⎦ ⎣ ⎦

&

&

V_{15}(x_1,x_2) = w_1 x_1^2 + w_2 x_2^2 + w_3 x_1 x_2 + w_4 x_1^4 + w_5 x_2^4 + w_6 x_1^3 x_2 + w_7 x_1^2 x_2^2 + w_8 x_1 x_2^3 + w_9 x_1^6 + w_{10} x_2^6 + w_{11} x_1^5 x_2 + w_{12} x_1^4 x_2^2 + w_{13} x_1^3 x_2^3 + w_{14} x_1^2 x_2^4 + w_{15} x_1 x_2^5

Activation functions = even polynomial basis up to order 6.

RAS found by integrating \dot{x} = -f(x), that is, in reverse time dt = -d\tau.

[Plots: initial gain found by LQR; optimal NN solution.]
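A small MATLAB sketch of the initial design step mentioned above: the initial gain is found by LQR (Control System Toolbox lqr), then the closed loop is simulated with the saturation |u| <= 1. The system matrices are taken from the example as reconstructed above, and Q, R are illustrative.

% Initial stabilizing gain by LQR, then simulate with input saturation.
A = [0 0.5; -1 -1.5]; B = [0; 1];
Q = eye(2); R = 1;
K = lqr(A, B, Q, R);                           % initial gain (unconstrained design)
sat = @(u) max(-1, min(1, u));                 % actuator saturation |u| <= 1
f = @(t,x) A*x + B*sat(-K*x);                  % closed loop with saturated LQR
[t, x] = ode45(f, [0 20], [0.5; -0.5]);
plot(x(:,1), x(:,2)); xlabel('x_1'); ylabel('x_2');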

Rotational-Translational Actuator Benchmark Problem

Control input is the torque N; F is a disturbance.

Rotational-Translational Actuator Benchmark Problem

\dot{x} = f(x) + g(x)u + k(x)d

f(x) = \begin{bmatrix} x_2 \\ \dfrac{-x_1 + \varepsilon x_4^2\sin x_3}{1-\varepsilon^2\cos^2 x_3} \\ x_4 \\ \dfrac{\varepsilon\cos x_3\,(x_1 - \varepsilon x_4^2\sin x_3)}{1-\varepsilon^2\cos^2 x_3} \end{bmatrix}, \qquad g(x) = \begin{bmatrix} 0 \\ \dfrac{-\varepsilon\cos x_3}{1-\varepsilon^2\cos^2 x_3} \\ 0 \\ \dfrac{1}{1-\varepsilon^2\cos^2 x_3} \end{bmatrix}, \qquad \varepsilon = 0.2

[Plot: state evolution (x_1 vs. x_2) for both controllers - switching surface method and nonquadratic functionals method.]

Minimum-Time Control - encode into the value function:
V = \int_0^\infty \left[ x^T Q x + 2\int_0^u \tanh^{-1}(\mu)^T R\,d\mu \right] dt

Before -

1. Neural Networks for Feedback Control
Based on FB Control Approach
Unknown system dynamics
On-line tuning

2. Neural Network Solution of Optimal Design Equations
Nearly Optimal Control based on HJ Optimal Design Equations
Known system dynamics
Preliminary off-line tuning

3. Approximate Dynamic Programming
Nearly Optimal Control based on a recursive equation for the optimal value
Usually known system dynamics (except Q learning)
The Goal - unknown dynamics, on-line tuning

IEEE Trans. Neural NetworksSpecial Issue on Neural Networks for Feedback Control

Lewis, Wunsch, Prokhorov, Jie Huang, Parisini

Due date 1 December

Bring together:
Feedback control system community
Approximate Dynamic Programming community
Neural Network community

Discrete-Time Systems

x_{k+1} = f(x_k, u_k)

Value in difference form (recursive form, consistency equation):
V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1})

where V(x_k) = \sum_{i=k}^{N} \gamma^{i-k} r(x_i, u_i).

Howard Policy Iteration - iterate the following until convergence:
1. Find the value for the prescribed policy (solve completely):
V_j(x_k) = r(x_k, h_j(x_k)) + \gamma V_j(x_{k+1})
2. Policy improvement:
h_{j+1}(x_k) = \arg\min_{u_k}\left[ r(x_k, u_k) + \gamma V_j(x_{k+1}) \right]
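A minimal MATLAB sketch of Howard policy iteration specialized to the discrete-time LQR case (\gamma = 1), where policy evaluation reduces to a Lyapunov equation and policy improvement to the LQR argmin; A, B, Q, R and the initial stabilizing gain are illustrative.

% Howard policy iteration for DT LQR: r(x,u) = x'Qx + u'Ru, policy u = -K*x.
A = [1 0.1; 0 0.95]; B = [0; 0.1]; Q = eye(2); R = 1;
K = [1 1];                                     % initial stabilizing policy
n = size(A,1);
for j = 1:20
    Acl = A - B*K;
    Qk  = Q + K'*R*K;
    % Policy evaluation: P = Acl'*P*Acl + Qk  (vectorized Lyapunov solve)
    P = reshape((eye(n^2) - kron(Acl', Acl')) \ Qk(:), n, n);
    % Policy improvement: argmin_u [x'Qx + u'Ru + (Ax+Bu)'P(Ax+Bu)]
    K = (R + B'*P*B) \ (B'*P*A);
end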

Four ADP Methods proposed by Werbos

Heuristic dynamic programming

Dual heuristic programming

AD Heuristic dynamic programming

AD Dual heuristic programming

(Watkins Q Learning)

Critic NN to approximate:
Value V(x_k), or its gradient \partial V/\partial x
Q function Q(x_k, u_k), or its gradients \partial Q/\partial x, \partial Q/\partial u

Action NN to approximate the Control

Bertsekas- Neurodynamic Programming

Barto & Bradtke- Q-learning proof (Imposed a settling time)

u^*(x(t)) = -\tfrac{1}{2} g^T(x)\frac{\partial V^*}{\partial x}

System:
\dot{x} = f(x) + g(x)u + k(x)d, \qquad y = x, \qquad z = \psi(x,u), \qquad u = l(y)

[Figure: plant with disturbance d, control u, performance output z, measured output y.]

Continuous-Time Systems

Hamiltonian (consistency equation):
0 = \frac{\partial V}{\partial x}^T (f + gu + kd) + r(x,u,d) \equiv H\!\left(x, \frac{\partial V}{\partial x}, u, d\right)

Worst-case disturbance:
d^*(x(t)) = \tfrac{1}{2\gamma^2} k^T(x)\frac{\partial V}{\partial x}

HJB (HJI) equation:
0 = \left(\frac{dV^*}{dx}\right)^T f + h^T h - \tfrac{1}{4}\left(\frac{dV^*}{dx}\right)^T g g^T\frac{dV^*}{dx} + \tfrac{1}{4\gamma^2}\left(\frac{dV^*}{dx}\right)^T k k^T\frac{dV^*}{dx}

Value in differential form:
V(x(t)) = \int_t^T r(x,u,d)\,dt

compare the DT form V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1}).

Continuous-Time Policy Iteration

Select a stabilizing initial control. Initial disturbance set to zero.

1. Outer loop - update control.

2. Inner loop - update disturbance. Solve the Lyapunov equation
0 = \frac{\partial V_j^{i\,T}}{\partial x}\left(f + g u_j + k d^i\right) + h^T h + \|u_j\|^2 - \gamma^2\|d^i\|^2

Inner loop disturbance update:
d^{i+1} = \tfrac{1}{2\gamma^2} k^T(x)\frac{\partial V_j^i}{\partial x}
Go to 2. Repeat until convergence.

Outer loop update:
u_{j+1} = -\tfrac{1}{2} g^T(x)\frac{\partial V_j}{\partial x}
Go to 1. Repeat until convergence.

Abu-Khalaf and Lewis- H inf

c.f. Howard work in DT Systems

Saridis – H2
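A minimal MATLAB sketch of the outer (control-update) loop in the linear-quadratic case with no disturbance (the Saridis H2 setting): each pass solves a Lyapunov equation for the current policy and then improves the gain. The system matrices, weights, and initial gain are illustrative.

% CT policy iteration, linear-quadratic case, no disturbance loop.
% Each pass: solve Acl'*P + P*Acl + Q + K'*R*K = 0, then K = R^{-1}*B'*P.
A = [0 1; -1 -2]; B = [0; 1]; Q = eye(2); R = 1;
K = [0 0];                                     % A is stable, so K = 0 is admissible
n = size(A,1);
for j = 1:10
    Acl = A - B*K;
    Qk  = Q + K'*R*K;
    M   = kron(eye(n), Acl') + kron(Acl', eye(n));   % Lyapunov operator
    P   = reshape(-M \ Qk(:), n, n);                 % policy evaluation
    K   = R \ (B'*P);                                % policy improvement
end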

Neural Network Approximation of the Value Function

\hat{V}_j^i(x, w) = w_j^{i\,T}\sigma(x)

The Lyapunov equation becomes
0 = w_j^{(i)T}\nabla\sigma(x)\left(f + g u_j + k d^i\right) + h^T h + \|u_j\|^2 - \gamma^2\|d^i\|^2

Control action:
u^*(x) = -\tfrac{1}{2} R^{-1} g^T(x)\nabla\sigma^T(x)\,w^*

CT Nearly Optimal NN feedback

CT Approximate Policy Iteration - Abu-Khalaf & Lewis
Nearly optimal FB control
Off-line tuning
Known dynamics

Continuous-time adaptive critic

Hamiltonian (CT consistency check):
H\!\left(x, \frac{\partial V}{\partial x}, u\right) = r(x,u) + \dot{V} = r(x,u) + \left(\frac{\partial V}{\partial x}\right)^T\dot{x} = r(x,u) + \left(\frac{\partial V}{\partial x}\right)^T f(x,u) = 0

Critic NN (on-line tuning): V(x) = w^T\sigma(x)

Residual equation error:
\delta = \frac{d(w^T\sigma)}{dt} + r(x,u) = w^T\nabla\sigma(x)\,\dot{x} + r(x,u) = w^T\nabla\sigma(x) f(x,u) + r(x,u)

E = \tfrac{1}{2}\delta^2

Gradient:
\frac{\partial E}{\partial w} = \frac{\partial\delta}{\partial w}\,\delta = \nabla\sigma(x) f(x,u)\,\delta

Update weights using, e.g., gradient descent (or RLS):
\dot{w} = -\alpha\,\nabla\sigma(x) f(x,u)\,\delta

Abu-Khalaf & Lewis (c.f. Doya)
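A hedged MATLAB sketch of a single gradient-descent step of this critic tuning; the polynomial basis, example dynamics, and utility are illustrative stand-ins, not the systems used in the talk.

% One gradient-descent step of the CT critic tuning above.
alpha = 0.5; dt = 1e-3;
w = zeros(3,1);                                     % critic weights, V = w'*sigma(x)
x = [0.5; -0.2]; u = -0.1;                          % current state and control
sigma_grad = [2*x(1) 0; x(2) x(1); 0 2*x(2)];       % grad of basis [x1^2, x1*x2, x2^2]
f  = [x(2); -x(1) - x(2) + u];                      % example dynamics xdot = f(x,u)
r  = x'*x + u^2;                                    % utility r(x,u)
delta = w'*(sigma_grad*f) + r;                      % residual equation error
E = 0.5*delta^2;                                    % squared residual (for reference)
w = w - dt*alpha*(sigma_grad*f)*delta;              % gradient-descent update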

Action NN

Target action (uses the critic weights; the activation functions \phi depend on the system dynamics):
Y = -\tfrac{1}{2} R^{-1} g^T(x)\nabla\sigma^T(x)\,w = \phi^T(x)\,w

Action NN output:
\hat{Y}_2 = \phi^T(x)\,v

Action error:
e_2(x) = \hat{Y}_2 - Y = -\tfrac{1}{2} R^{-1} g^T(x)\nabla\sigma^T(x)\,[v - w] = \phi^T(x)\,[v - w]

Update the action weights by gradient descent:
\dot{v} = -\beta\,\phi(x)\,e_2(x)

Alternative: simply set u(x) = Y = -\tfrac{1}{2} R^{-1} g^T(x)\nabla\sigma^T(x)\,w = \phi^T(x)\,w.
This does not work - the proof development so far indicates that the critic NN must be tuned faster than the action NN, i.e. \alpha > \beta.

c.f. Bradtke & Barto DT Q learning work

Small Time-Step Approximate Tuning for Continuous-Time Adaptive Critics

H\!\left(x, \frac{\partial V}{\partial x}, u\right) = r(x,u) + \dot{V} \approx r(x,u) + \frac{V_{t+1} - V_t}{\Delta t} \approx \frac{r_D(x_t,u_t) + V_{t+1} - V_t}{\Delta t}

Baird's Advantage function:
A(x_t, u_t) = r_D(x_t, u_t) + \frac{V^*(x_{t+1}) - V^*(x_t)}{\Delta t}

Compare V_h(x_k) = r(x_k, h(x_k)) + \gamma V_h(x_{k+1}) - this is not in standard DT form.

Sampled data systems

Optimal Control, Lewis & Syrmos 1995

For More Information: journal papers on http://arri.uta.edu/acs

In Progress: M. Abu-Khalaf, Jie Huang, F.L. Lewis, Nearly Optimal Control by HJ Equation Solution Using Neural Networks

Theorem 1. Necessary and Sufficient Conditions for H-infinity Static OPFB Control

Assume that Q > 0. Then system (1) is output-feedback stabilizable with L2 gain bounded by \gamma if and only if:

i. (A, C) is detectable

ii. There exist matrices K* and L such that

K^* C = R^{-1}(B^T P + L)

where P > 0, P^T = P, is a solution of

0 = A^T P + P A + Q + \tfrac{1}{\gamma^2} P D D^T P - P B R^{-1} B^T P + L^T R^{-1} L

c.f. results by Kucera and De Souza

Note there is an (A,B) stabilizability condition hidden in the existence of Solution to the Riccati eq.

ONLY TWO COUPLED EQUATIONS

1. Initialize: set n = 0, L_0 = 0, and select \gamma, Q, R.

2. n-th iteration: solve for P_n in the ARE
0 = A^T P_n + P_n A + Q + \tfrac{1}{\gamma^2} P_n D D^T P_n - P_n B R^{-1} B^T P_n + L_n^T R^{-1} L_n

Evaluate the gain and update L:
K_{n+1} = R^{-1}(B^T P_n + L_n) C^T (C C^T)^{-1}
L_{n+1} = R K_{n+1} C - B^T P_n

Solution Algorithm 1- c.f. Geromel

Until Convergence

Based on ARE, so no initial stabilizing gain needed !!

Tries to project gain onto nullspace perp. of C using degrees of freedom in L

Aircraft Autopilot Design

F-16 Normal Acceleration Regulator Design

[Figure: F-16 normal acceleration regulator. Aircraft short-period dynamics with states \alpha, q; actuator dynamics 20.2/(s + 20.2) driving the elevator \delta_e; command system G = 10/(s + 10) shaping the reference r; performance output z = n_z; integrator 1/s on the error e; measured output y = [\alpha\ q\ e\ \varepsilon]^T; control u = -K y = -[k_\alpha\ k_q\ k_e\ k_I]\,y; system, actuator, and sensor dynamics shown as separate blocks.]

Theorem 2 - new work

Parametrization of all H-infinity Static SVFB Controls

Assume that Q > 0. Then K is a stabilizing SVFB with L2 gain bounded by \gamma if and only if:

i. (A, B) is stabilizable

ii. There exists a matrix L such that

K = R^{-1}(B^T P + L)

where P > 0, P^T = P, is a solution of

0 = A^T P + P A + Q + \tfrac{1}{\gamma^2} P D D^T P - P B R^{-1} B^T P + L^T R^{-1} L

OPFB is a special case

[Figure: dynamic NN block diagram H(s) as before - integrator 1/s with state x, feedback matrix A, input matrix B with inputs u_1, u_2, output matrix C.]

x_{k+1} = A x_k + W^T\sigma(V^T x_k) + u_k

Chaos in Dynamic Neural Networks - c.f. Ron Chen

%MATLAB file for chaotic NN from Jun Wang's paper
function [ki,x,y,z] = tcnn(N)
% transiently chaotic neural network (single neuron)
y(1) = rand; ki(1) = 1; z(1) = 0.08;
a = 0.9; e = 1/250; Io = 0.65; g = 0.0001; b = 0.001;
for k = 1:N-1
    ki(k+1) = k+1;
    x(k) = 1/(1 + exp(-y(k)/e));            % neuron output (sigmoid)
    y(k+1) = a*y(k) + g - z(k)*(x(k) - Io); % internal state update
    z(k+1) = (1-b)*z(k);                    % decaying self-feedback (annealing)
end
x(N) = 1/(1 + exp(-y(N)/e));

z_{k+1} = (1-\beta) z_k
y_{k+1} = \alpha y_k + g - z_k\left(\frac{1}{1 + e^{-y_k/\rho}} - I_o\right)

Jun Wang
