
Stochastic Optimal Control of Unknown Linear Networked Control System in the Presence of Random Delays and Packet Losses

OBJECTIVES

• Develop a Q-learning-based stochastic suboptimal controller for an unknown networked control system (NCS) with random delays and packet losses

• Develop an adaptive estimator (AE)-based stochastic optimal controller

• Investigate the effects of delays and packet losses on the stability of the NCS with unknown dynamics

Student: Hao Xu, ECE Department

BACKGROUND

• Networked control can reduce installation costs and increase productivity through the use of wireless communication technology.

• The main challenges in controlling networked systems are network delays and packet losses. These effects not only degrade the performance of the NCS but can also destabilize the system.

• Approximate dynamic programming (ADP) techniques aim to solve optimal control problems of complex systems forward in time, without knowledge of the system dynamics.

Figure 1 The wireless networked control system

• The proposed optimal controller design combines Q-learning with an adaptive estimator (AE), whereas the suboptimal controller design uses the Q-learning scheme alone.

• The delays and packet losses are incorporated into the dynamic model used for controller development.

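Networked Control System Model

• The networked control system is represented as a time-varying linear system in the augmented state $z_k$. This state stacks the plant state $x_k$ with the $d$ most recent control inputs.

• Figure 2 depicts a block diagram representation: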

Figure 2 Block diagram of Networked control system
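To illustrate how the delays and packet losses enter the model, here is a minimal sketch that assembles the augmented matrices $A_{zk}$ and $B_{zk}$ for $z_k = [x_k^T\ u_{k-1}^T \cdots u_{k-d}^T]^T$ at a single step $k$. It assumes the discretized blocks $A_s$ and $B_k^0, \ldots, B_k^d$ are already known, along with the indicators $Ip_k^0, \ldots, Ip_k^d$. The indicator convention is an assumption: 1 means $u_{k-i}$ is applied, 0 means it was lost. The function name `augmented_ncs_matrices` is hypothetical and not from the poster.

```python
import numpy as np


def augmented_ncs_matrices(A_s, B_list, ip):
    """Assemble A_zk and B_zk for z_k = [x_k; u_{k-1}; ...; u_{k-d}] at one step k.

    A_s    : (n, n) discretized plant matrix e^{AT}
    B_list : [B_k^0, B_k^1, ..., B_k^d], each (n, m), for this step's delays
    ip     : [Ip_k^0, ..., Ip_k^d]; assumed 1 if u_{k-i} is applied, 0 if lost
    """
    n = A_s.shape[0]
    m = B_list[0].shape[1]
    d = len(B_list) - 1
    if d < 1:
        raise ValueError("need at least one delayed input (d >= 1)")
    dim = n + d * m
    A_z = np.zeros((dim, dim))
    B_z = np.zeros((dim, m))
    # First block row: x_{k+1} = A_s x_k + sum_i Ip_k^i B_k^i u_{k-i}
    A_z[:n, :n] = A_s
    for i in range(1, d + 1):
        col = n + (i - 1) * m  # column block holding u_{k-i}
        A_z[:n, col:col + m] = ip[i] * B_list[i]
    B_z[:n, :] = ip[0] * B_list[0]
    # Shift register: u_k becomes the new u_{k-1} slot ...
    B_z[n:n + m, :] = np.eye(m)
    # ... and u_{k-i} moves down into the u_{k-i-1} slot (u_{k-d} drops out)
    for i in range(1, d):
        row = n + i * m
        col = n + (i - 1) * m
        A_z[row:row + m, col:col + m] = np.eye(m)
    return A_z, B_z
```

Each step draws a new delay realization, which gives a new `B_list`, and a new packet-loss pattern `ip`. Rebuilding the matrices per step is what makes the NCS model time-varying.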

Faculty Advisor: Dr. Jagannathan Sarangapani, ECE Department

Q-learning Stochastic Suboptimal Control

1. Define the Q-function as the expected one-step cost plus the cost-to-go $J_{k+1}$.

2. Define the update law that tunes the Q-function iteratively, where $r(z_k,u_k)$ is the one-step cost.

3. By using the mean values of the delays and packet losses instead of their random realizations, the H matrix becomes time-invariant.

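4. Define the update law that tunes the H matrix online in the least-squares sense:

• 1) Vectorize the H matrix: $h = \operatorname{vec}(H)$.

• 2) Update law: choose $h^{i+1}$ so that $h^T w(z_k)$ fits the target $d(z_k, H^i)$ in the least-squares sense. Here $w(z_k)$ is the Kronecker product quadratic basis vector, and $d(z_k, H^i)$ is the one-step cost plus the current Q-function estimate at the next step.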

5. Compute the stochastic suboptimal control $u_k = -K z_k$ with $K = H_{uu}^{-1} H_{uz}$.

6. Convergence: as $i \to \infty$, $H^i \to H^*$ and $K^i \to K^*$ at the same time.
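The least-squares Q-learning steps above can be sketched for a deterministic mean-value system $z_{k+1} = A_z z_k + B_z u_k$. This is a minimal illustration under stated assumptions, not the poster's implementation:

- the integral over $z$ in the update law is replaced by a sum over randomly sampled states;
- random exploratory inputs are used so that every entry of $H$ is identifiable;
- $A_z$ and $B_z$ are used only to generate the next-state data.

The names `kron_basis` and `q_learning_gain` are hypothetical.

```python
import numpy as np


def kron_basis(z, u):
    """Kronecker basis w = [z; u] (x) [z; u], so vec(H)^T w = [z; u]^T H [z; u]."""
    v = np.concatenate([z, u])
    return np.kron(v, v)


def q_learning_gain(A_z, B_z, Q_z, R_z, iters=200, samples=60, seed=0):
    """Least-squares Q-learning (value iteration on H) for z' = A_z z + B_z u.

    A_z, B_z are used ONLY to produce the next state z' for each sample;
    the H update itself sees only (z, u, z') data and the one-step cost.
    """
    rng = np.random.default_rng(seed)
    n, m = B_z.shape
    Z = rng.standard_normal((samples, n))  # sampled states (replace the integral over z)
    U = rng.standard_normal((samples, m))  # exploratory inputs (assumption)
    Z_next = Z @ A_z.T + U @ B_z.T         # observed next states
    W = np.array([kron_basis(z, u) for z, u in zip(Z, U)])
    # One-step cost r(z, u) = z^T Q_z z + u^T R_z u for every sample
    r = np.einsum("ij,jk,ik->i", Z, Q_z, Z) + np.einsum("ij,jk,ik->i", U, R_z, U)
    P = np.zeros((n, n))                   # P^0 = 0, i.e. Q^0 is identically zero
    for _ in range(iters):
        # Target d(z, H^i) = r(z, u) + min_u' Q^i(z', u') = r + z'^T P^i z'
        d = r + np.einsum("ij,jk,ik->i", Z_next, P, Z_next)
        h, *_ = np.linalg.lstsq(W, d, rcond=None)  # minimum-norm LS fit of h = vec(H)
        H = h.reshape(n + m, n + m, order="F")     # undo column-major vec(H)
        H_uz, H_uu = H[n:, :n], H[n:, n:]
        K = np.linalg.solve(H_uu, H_uz)            # u = -K z
        T = np.vstack([np.eye(n), -K])
        P = T.T @ H @ T                            # min_u Q^{i+1}(z, u) = z^T P z
    return K, H
```

The full Kronecker basis contains every cross term twice, so the least-squares problem is rank-deficient. `numpy.linalg.lstsq` returns the minimum-norm solution, and that solution is the symmetric $H$. Under these assumptions, the learned gain matches what model-based Riccati value iteration produces after the same number of iterations.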

Simulation Results

• Consider the linear time-invariant inverted pendulum dynamics.

Once the random delays and packet losses introduced by the NCS are taken into account, the originally time-invariant system is discretized and represented as a time-varying system. (Note: because the delays and packet losses are random, the NCS model matrices change with the time index k.)

• Performance evaluation of the proposed suboptimal and optimal controllers

1) Stability:

Figure 5 Stability performance

As shown in Figure 5-(a), a PID controller designed without accounting for delays and packet losses leaves the NCS unstable. In contrast, with the proposed Q-learning suboptimal control and AE-based optimal control, the NCS remains stable (Figure 5-(b), (c)).

2) Optimality:

Figure 6 Optimal performance

As shown in Figure 6-(a), the proposed AE-based optimal controller achieves a lower cost-to-go $J_k$ than the proposed Q-learning suboptimal controller. In Figure 6-(b), the proposed AE-based optimal control drives the NCS states to zero faster than the Q-learning suboptimal control. This indicates that the proposed AE-based optimal control is more effective than the Q-learning suboptimal control.
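The sketch below shows how a delay turns the constant pendulum model into time-varying input matrices. It discretizes the inverted-pendulum dynamics for a single input delay $0 \le \tau < T$: $u_{k-1}$ acts during the first $\tau$ seconds of the sampling interval, and $u_k$ acts for the remainder. Three things here are assumptions:

- The matrix entries are those listed with the simulation panel, but the minus signs on the $-0.1818$ and $-0.4545$ terms follow the standard cart-pendulum model.
- The sampling period and delay values in the usage lines are hypothetical.
- The single-delay split is a simplified special case of the poster's $B_k^i$.

The sketch relies on `scipy.linalg.expm` and the identity $\exp\left(\begin{bmatrix} A & B \\ 0 & 0 \end{bmatrix} t\right) = \begin{bmatrix} e^{At} & \int_0^t e^{As}\,ds\,B \\ 0 & I \end{bmatrix}$.

```python
import numpy as np
from scipy.linalg import expm

# Inverted-pendulum model x_dot = A x + B u (signs assumed from the standard model)
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, -0.1818, 2.6727, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, -0.4545, 31.1818, 0.0]])
B = np.array([[0.0], [1.8182], [0.0], [4.5455]])


def exp_and_integral(A, B, t):
    """Return (e^{A t}, int_0^t e^{A s} ds B) from one augmented matrix exponential."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    E = expm(M * t)
    return E[:n, :n], E[:n, n:]


def discretize_with_delay(A, B, T, tau):
    """x_{k+1} = A_s x_k + B0 u_k + B1 u_{k-1} for one input delay 0 <= tau < T."""
    if not 0.0 <= tau < T:
        raise ValueError("delay must satisfy 0 <= tau < T")
    A_s, _ = exp_and_integral(A, B, T)
    Phi_rest, B0 = exp_and_integral(A, B, T - tau)  # u_k acts on [kT + tau, (k+1)T)
    _, G_tau = exp_and_integral(A, B, tau)          # u_{k-1} acts on [kT, kT + tau)
    B1 = Phi_rest @ G_tau
    return A_s, B0, B1


T, tau = 0.01, 0.004  # hypothetical sampling period and delay (s)
A_s, B0, B1 = discretize_with_delay(A, B, T, tau)
print(np.max(np.abs(np.linalg.eigvals(A_s))))  # > 1: the open-loop pendulum is unstable
```

Drawing a new $\tau$ at every step gives a different $(B_0, B_1)$ pair each sampling period. $B_0 + B_1$ always equals the undelayed input matrix $\int_0^T e^{As}\,ds\,B$.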

AE-based Stochastic Optimal Control

1. When random delays and packet losses are considered, the H matrix becomes time-varying; however, it is assumed to change slowly.

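2. Set up the stochastic Q-function with a time-varying kernel matrix $H_k$.

3. Represent the Q-function with the adaptive estimator $\hat Q(z_k,u_k) = \hat h_k^T \bar w_k$, where $\hat h_k = \operatorname{vec}(\hat H_k)$ and $\bar w_k$ is the Kronecker product quadratic polynomial basis vector of $w_k = [z_k^T\ u_k^T]^T$.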

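4. Define the update law that tunes the estimated H matrix:

• 1) Residual error $e_{hk}$, where $\Delta W_{k-1} = \bar w_k - \bar w_{k-1}$ and $r(z_{k-1},u_{k-1})$ is the one-step cost.

• 2) Update law for the time-varying matrix $H_k$, where $0 < \alpha_h < 1$ is a constant and $\Delta W_k = \bar w_{k+1} - \bar w_k$.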

5. Determine the AE stochastic optimal control input $\hat u_k = -(\hat H_k^{uu})^{-1} \hat H_k^{uz} z_k$.

6. Convergence: as $k \to \infty$, $\hat h_k \to h_k$ and $z_k \to 0$; then $\hat J_k \to J_k^*$ and $\hat u_k \to u_k^*$.
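Steps 3 to 5 above can be sketched numerically, under the following assumptions:

- $\bar w_k$ is the upper-triangular quadratic basis $[w_{k1}^2, w_{k1}w_{k2}, \ldots, w_{kl}^2]$.
- $\hat h_k$ holds the matching entries of $\hat H_k$ with off-diagonal terms doubled, so that $\hat h_k^T \bar w_k = w_k^T \hat H_k w_k$. This is one common reading of the poster's $\operatorname{vec}(\hat H_k)$, not a confirmed detail.
- $\Delta W_k \neq 0$, so the scalar $\Delta W_k^T \Delta W_k$ can be inverted.

All function names are hypothetical.

```python
import numpy as np


def upper_quad_basis(w):
    """Quadratic basis [w1^2, w1 w2, ..., w_{l-1} w_l, w_l^2] (upper-triangular products)."""
    i, j = np.triu_indices(len(w))
    return w[i] * w[j]


def h_to_H(h, l):
    """Symmetric H with w^T H w = h^T upper_quad_basis(w) (off-diagonal weights halved)."""
    H = np.zeros((l, l))
    i, j = np.triu_indices(l)
    H[i, j] = h
    return (H + H.T) / 2.0


def residual(h_hat, r_prev, dW_prev):
    """e_hk = r(z_{k-1}, u_{k-1}) + h_hat_k^T dW_{k-1}, dW_{k-1} = wbar_k - wbar_{k-1}."""
    return r_prev + h_hat @ dW_prev


def ae_update(dW, e_hk, r_k, alpha_h=0.5):
    """h_hat_{k+1} = dW_k (dW_k^T dW_k)^{-1} (alpha_h e_hk - r(z_k, u_k)), 0 < alpha_h < 1."""
    return dW * (alpha_h * e_hk - r_k) / (dW @ dW)


def ae_control(h_hat, n_z, n_u, z):
    """u_hat_k = -(H_uu)^{-1} H_uz z_k computed from the current estimate h_hat."""
    H = h_to_H(h_hat, n_z + n_u)
    return -np.linalg.solve(H[n_z:, n_z:], H[n_z:, :n_z] @ z)
```

With this update, the residual evaluated on the same $\Delta W_k$ becomes exactly $\alpha_h e_{hk}$. The Bellman residual therefore shrinks by the factor $\alpha_h$ at every step, which is why the constant must lie in $(0, 1)$.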

CONCLUSIONS

• The proposed Q-learning-based suboptimal and AE-based optimal control designs for NCS with unknown dynamics in the presence of random delays and packet losses outperform a traditional PID controller.

• Both the Q-learning-based suboptimal control and the AE-based optimal control keep the NCS stable.

• The proposed AE-based optimal control is more effective than the proposed Q-learning-based suboptimal control.

AE-based Stochastic Optimal Control (2)

• Figure 3 presents the block diagram of the AE-based stochastic optimal regulator for the NCS.

Figure 3 Stochastic optimal regulator block diagram

FUTURE WORK

• Design suboptimal and optimal controllers for nonlinear networked control systems (NNCS) with unknown dynamics in the presence of random delays and packet losses

• Design a novel wireless network protocol to decrease the effects of random delays and packet losses.

• Optimize the NNCS globally by jointly designing its control part and its wireless network part.

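[Figure 2 block labels: Plant, Actuator, Sensor, Controller, Wireless Network; "Delay and packet losses" on both network links (controller-to-actuator delay $\tau_{ca}(t)$ and sensor-to-controller delay $\tau_{sc}(t)$, each with packet-loss indicator $Ip(t)$); sampling period T.]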

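Networked control system model:

$$ z_{k+1} = A_{zk} z_k + B_{zk} u_k, \qquad y_k = C_z z_k $$

$$ A_{zk} = \begin{bmatrix} A_s & Ip_k^1 B_k^1 & \cdots & Ip_k^{d-1} B_k^{d-1} & Ip_k^d B_k^d \\ 0 & 0 & \cdots & 0 & 0 \\ 0 & I_m & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I_m & 0 \end{bmatrix}, \qquad B_{zk} = \begin{bmatrix} Ip_k^0 B_k^0 \\ I_m \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad C_z = \begin{bmatrix} C & 0 & \cdots & 0 \end{bmatrix} $$

where $A_s = e^{AT}$ for sampling period $T$. Each $B_k^i$ ($i = 0, 1, \ldots, d$) is obtained by integrating $e^{A(T-s)} B$ over the portion of the sampling interval during which $u_{k-i}$ is applied, so it depends on the random delays.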

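Packet-loss indicator:

$$ Ip_k^i = \begin{cases} 1, & \text{if } u_{k-i} \text{ was received during } [kT, (k+1)T) \\ 0, & \text{if } u_{k-i} \text{ was lost during } [kT, (k+1)T) \end{cases} $$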

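Augmented state:

$$ z_k = \begin{bmatrix} x_k^T & u_{k-1}^T & \cdots & u_{k-d}^T \end{bmatrix}^T \in \mathbb{R}^{n+dm} $$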

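Q-learning suboptimal control equations:

$$ Q(z_k,u_k) = E\left[ z_k^T Q_z z_k + u_k^T R_z u_k + J_{k+1} \right] $$

$$ Q^{i+1}(z_k,u_k) = r(z_k,u_k) + \min_{u_{k+1}} Q^i(z_{k+1},u_{k+1}), \qquad r(z_k,u_k) = z_k^T Q_z z_k + u_k^T R_z u_k $$

$$ Q^i(z_k,u_k) = \begin{bmatrix} z_k \\ u_k \end{bmatrix}^T H^i \begin{bmatrix} z_k \\ u_k \end{bmatrix}, \qquad H^i = \begin{bmatrix} H^i_{zz} & H^i_{zu} \\ H^i_{uz} & H^i_{uu} \end{bmatrix} = \begin{bmatrix} Q_z + A_z^T P^i A_z & A_z^T P^i B_z \\ B_z^T P^i A_z & R_z + B_z^T P^i B_z \end{bmatrix} $$

where $A_z$ and $B_z$ are the system matrices evaluated at the mean delays and packet losses.

$$ h^i = \operatorname{vec}(H^i), \qquad h^{i+1} = \arg\min_{h} \int \left| h^T w(z_k) - d(z_k, H^i) \right|^2 dz $$

$$ d(z_k, H^i) = z_k^T Q_z z_k + u_k^{iT} R_z u_k^i + Q^i(z_{k+1}, u_{k+1}^i), \qquad w(z_k) = \begin{bmatrix} z_k \\ u_k^i \end{bmatrix} \otimes \begin{bmatrix} z_k \\ u_k^i \end{bmatrix} $$

$$ u_k^i = -K^i z_k, \qquad K^i = (H^i_{uu})^{-1} H^i_{uz} = (R_z + B_z^T P^i B_z)^{-1} B_z^T P^i A_z $$

$$ H^i \to H^*, \quad K^i \to K^*, \quad Q^i(z_k,u_k) \to Q^*(z_k,u_k) \quad \text{as } i \to \infty $$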
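The Q-function is quadratic in $(z_k, u_k)$, so the minimizing input follows from setting its gradient with respect to $u_k$ to zero. A short derivation, assuming $H^i$ is symmetric (so $H^i_{zu} = H^{iT}_{uz}$) and $H^i_{uu}$ is positive definite:

```latex
Q^i(z_k,u_k) = z_k^T H^i_{zz} z_k + 2\,u_k^T H^i_{uz} z_k + u_k^T H^i_{uu} u_k

\frac{\partial Q^i}{\partial u_k} = 2\,H^i_{uz} z_k + 2\,H^i_{uu} u_k = 0
\;\Longrightarrow\; u_k^i = -(H^i_{uu})^{-1} H^i_{uz}\, z_k = -K^i z_k

\min_{u_k} Q^i(z_k,u_k) = z_k^T \left( H^i_{zz} - H^i_{zu} (H^i_{uu})^{-1} H^i_{uz} \right) z_k
```

Substituting the block entries of $H^i$ into the last line gives the Riccati-type update $Q_z + A_z^T P^i A_z - A_z^T P^i B_z (R_z + B_z^T P^i B_z)^{-1} B_z^T P^i A_z$.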

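AE-based stochastic optimal control equations:

$$ Q(z_k,u_k) = E\left[ z_k^T Q_z z_k + u_k^T R_z u_k + J_{k+1} \right] = \begin{bmatrix} z_k \\ u_k \end{bmatrix}^T H_k \begin{bmatrix} z_k \\ u_k \end{bmatrix} $$

$$ \hat Q(z_k,u_k) = w_k^T \hat H_k w_k = \hat h_k^T \bar w_k, \qquad \hat h_k = \operatorname{vec}(\hat H_k), \qquad w_k = \begin{bmatrix} z_k^T & u_k^T \end{bmatrix}^T \in \mathbb{R}^{l}, \quad l = n + (d+1)m $$

$$ \bar w_k = \begin{bmatrix} w_{k1}^2 & w_{k1} w_{k2} & \cdots & w_{k,l-1} w_{kl} & w_{kl}^2 \end{bmatrix}^T $$

$$ e_{hk} = r(z_{k-1},u_{k-1}) + \hat h_k^T \Delta W_{k-1}, \qquad \Delta W_{k-1} = \bar w_k - \bar w_{k-1}, \qquad r(z_{k-1},u_{k-1}) = z_{k-1}^T Q_z z_{k-1} + u_{k-1}^T R_z u_{k-1} $$

$$ \hat h_{k+1} = \Delta W_k \left( \Delta W_k^T \Delta W_k \right)^{-1} \left( \alpha_h e_{hk}^T - r^T(z_k,u_k) \right), \qquad 0 < \alpha_h < 1 $$

$$ \hat u_k = -\hat K_k z_k = -\left( \hat H_k^{uu} \right)^{-1} \hat H_k^{uz} z_k $$

$$ \hat h_k \to h_k, \; z_k \to 0 \ \text{as } k \to \infty \quad \Longrightarrow \quad \hat J_k \to J_k^*, \; \hat u_k \to u_k^* $$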
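Substituting the update law into the residual one step later shows why the constant must satisfy $0 < \alpha_h < 1$. This is a short check, assuming $\Delta W_k \neq 0$ so the scalar $\Delta W_k^T \Delta W_k$ is invertible (all quantities multiplying it are scalars):

```latex
e_{h,k+1} = r(z_k,u_k) + \hat h_{k+1}^T \Delta W_k
          = r(z_k,u_k) + \left( \alpha_h e_{hk} - r(z_k,u_k) \right)
            \left( \Delta W_k^T \Delta W_k \right)^{-1} \Delta W_k^T \Delta W_k
          = \alpha_h e_{hk}

\Longrightarrow\; e_{hk} = \alpha_h^{\,k}\, e_{h0} \to 0 \quad \text{as } k \to \infty
```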

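[Figure 3 block labels: linear networked control system with unknown dynamics, $z_{k+1} = A_{zk} z_k + B_{zk} u_k$ (unknown $A_{zk}$ and $B_{zk}$); adaptive estimator of the Q-function $Q(z_k,u_k)$ (cost-function network $\hat h_k^T \bar w_k$) supplying $\hat H_k^{uu}$ and $\hat H_k^{uz}$; controller $u_k = -(\hat H_k^{uu})^{-1} \hat H_k^{uz} z_k$; estimated cost $\hat J_k = w_k^T \hat H_k w_k$.]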

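Inverted pendulum dynamics:

$$ \dot x = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -0.1818 & 2.6727 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & -0.4545 & 31.1818 & 0 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1.8182 \\ 0 \\ 4.5455 \end{bmatrix} u $$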

[Figure 5 panels: (a) State regulation error of NCS with delay and packet losses; (b) State regulation errors with Q-learning suboptimal control of NCS with unknown dynamics; (c) State regulation errors with proposed AE optimal control of NCS with unknown dynamics. Each panel plots the regulation errors e1, e2, e3, e4 against time (s).]

[Figure 6 panels: (a) System total costs J(X_k) (scale ×10^5) with Q-learning suboptimal control and proposed AE optimal control with unknown dynamics; (b) Control inputs with Q-learning suboptimal control and proposed AE optimal control with unknown dynamics. Both panels are plotted against time (s).]

Cost-to-go:

$$ J_k = E\left[ \sum_{i=k}^{\infty} \left( z_i^T Q_z z_i + u_i^T R_z u_i \right) \right] $$
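The expectation in $J_k$ is taken over the random delays and packet losses. As a rough illustration only, the sketch below estimates a finite-horizon version of $J_0$ by Monte Carlo for a simplified loop with Bernoulli packet losses and no delay. A lost packet means zero input reaches the plant at that step, mirroring an indicator value of 0. The plant, gain, loss probability, horizon, and the name `mc_cost` are all hypothetical, not the poster's simulation settings.

```python
import numpy as np


def mc_cost(A, B, K, Q, R, x0, p_loss, horizon=200, runs=500, seed=0):
    """Monte Carlo estimate of E[sum_{i=0}^{horizon-1} x_i^T Q x_i + u_i^T R u_i].

    The controller computes u_i = -K x_i every step (and this u_i is charged in
    the cost), but the plant receives it only when the packet is delivered,
    which happens with probability 1 - p_loss; otherwise zero input is applied.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(runs):
        x = x0.astype(float).copy()
        cost = 0.0
        for _ in range(horizon):
            u = -K @ x
            cost += x @ Q @ x + u @ R @ u
            delivered = rng.random() >= p_loss  # Bernoulli packet delivery
            x = A @ x + (B @ u if delivered else 0.0)
        total += cost
    return total / runs
```

Averaging over many independent loss sequences approximates the expectation. With `p_loss=0` the estimate reduces to the deterministic closed-loop cost.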
