stochastic optimal control of unknown linear networked control system in the presence of random...
Stochastic Optimal Control of Unknown Linear Networked Control System in the Presence of Random Delays and Packet Losses
OBJECTIVES
• Develop a Q-learning-based stochastic suboptimal controller for an unknown networked control system (NCS) with random delays and packet losses
• Develop an adaptive estimator (AE)-based stochastic optimal controller
• Investigate the effects of delays and packet losses on the stability of the NCS with unknown dynamics
Student: Hao Xu, ECE Department
BACKGROUND
• Networked control can reduce installation costs and increase productivity through the use of wireless communication technology.
• The main challenges in controlling a networked system are network delays and packet losses. These effects not only degrade the performance of the NCS but can also destabilize it.
• Approximate dynamic programming (ADP) techniques aim to solve optimal control problems for complex systems forward in time, without knowledge of the system dynamics.
Figure 1 The wireless networked control system
• The proposed optimal controller design combines Q-learning with an adaptive estimator (AE), whereas the suboptimal controller design uses Q-learning alone.
• The delays and packet losses are incorporated into the dynamic model that is used for controller development.
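Networked Control System Model
• Networked control system representation: the plant state and the delayed control inputs are stacked into an augmented state $z_k$, and the delays and packet losses enter through the time-varying matrices $A_{zk}$ and $B_{zk}$
• Figure 2 depicts a block diagram representation:
Figure 2 Block diagram of the networked control system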
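The augmented model can be assembled mechanically once the per-delay input matrices are available. The following is a minimal sketch, not the authors' implementation. It assumes the augmented state stacks the plant state with the d most recent inputs, z_k = [x_k; u_{k-1}; ...; u_{k-d}], that `B_list` holds the input matrices B_0, ..., B_d for the current step, and that `ip` holds the scalar packet-loss weights that multiply them. The function name and all numerical values are hypothetical.

```python
import numpy as np

def augmented_matrices(A_s, B_list, ip):
    """Build A_z and B_z for z_k = [x_k; u_{k-1}; ...; u_{k-d}] (d >= 1).

    B_list = [B_0, ..., B_d]; ip = [Ip_k, Ip_{k-1}, ..., Ip_{k-d}] (assumed layout).
    """
    n = A_s.shape[0]
    m = B_list[0].shape[1]
    d = len(B_list) - 1
    size = n + d * m
    A_z = np.zeros((size, size))
    B_z = np.zeros((size, m))

    # First block row: plant dynamics driven by the delayed inputs.
    A_z[:n, :n] = A_s
    for i in range(1, d + 1):
        A_z[:n, n + (i - 1) * m : n + i * m] = ip[i] * B_list[i]

    # Shift register: the slot of u_{k-j+1} in z_{k+1} copies the slot j-1 of z_k.
    for j in range(2, d + 1):
        A_z[n + (j - 1) * m : n + j * m, n + (j - 2) * m : n + (j - 1) * m] = np.eye(m)

    # The current input enters the plant through B_0 and fills the first input slot.
    B_z[:n] = ip[0] * B_list[0]
    B_z[n : n + m] = np.eye(m)
    return A_z, B_z

# Hypothetical example: 2 states, 1 input, inputs delayed by up to d = 2 samples.
A_s = np.array([[1.0, 0.1], [0.0, 1.0]])
B_list = [np.array([[0.0], [0.05]]), np.array([[0.0], [0.03]]), np.array([[0.0], [0.02]])]
A_z, B_z = augmented_matrices(A_s, B_list, ip=[1.0, 1.0, 0.0])
```

Rebuilding `A_z` and `B_z` at every step with the delays and weights of that step is what makes the augmented model time-varying.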
Faculty Advisor: Dr. Jagannathan Sarangapani, ECE Department
Q-learning Stochastic Suboptimal Control
1. Define the Q-function:
2. Define the update law to tune the Q-function
where $H_i$ is the matrix of the quadratic Q-function at iteration $i$
3. Use the mean values of the delays and packet losses instead of the random delays and packet losses, so that the H matrix becomes time-invariant.
4. Define the update law to tune the H matrix online in the least-squares sense
• 1) Vectorize the H matrix:
• 2) Update law:
where $d(\cdot)$ is the least-squares target and $w(z_k)$ stacks $z_k$ and the control input
5. Develop the stochastic suboptimal control
6. Convergence: as $i \to \infty$, $Q_i \to Q^*$, and $H_i \to H^*$, $K_i \to K^*$ at the same time.
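The steps above can be sketched as Q-function value iteration with a least-squares fit of the H matrix. This is a minimal sketch under stated assumptions, not the authors' implementation: a small time-invariant pair (A, B) stands in for the mean-value NCS model and is used only to generate state transitions, the learner itself only sees the samples, and the names `quad_basis`, `unvech`, `q_learning_lqr` and all numerical values are hypothetical.

```python
import numpy as np

def quad_basis(W):
    """Quadratic basis w_a * w_b (a <= b) for each row of W."""
    a, b = np.triu_indices(W.shape[1])
    return W[:, a] * W[:, b]

def unvech(h, l):
    """Rebuild the symmetric H so that w^T H w equals h^T quad_basis(w)."""
    a, b = np.triu_indices(l)
    H = np.zeros((l, l))
    H[a, b] = h
    return (H + H.T) / 2  # off-diagonal coefficients are split in half

def q_learning_lqr(A, B, Q, R, iters=300, samples=40, seed=0):
    rng = np.random.default_rng(seed)
    n, m = B.shape
    H = np.zeros((n + m, n + m))
    for _ in range(iters):
        H_uu, H_uz = H[n:, n:], H[n:, :n]
        # min over u of [z; u]^T H [z; u] is z^T P z (P = 0 for the initial H = 0).
        if np.allclose(H_uu, 0.0):
            P = np.zeros((n, n))
        else:
            P = H[:n, :n] - H_uz.T @ np.linalg.solve(H_uu, H_uz)
        # Exploratory samples (z_k, u_k) and the observed next states z_{k+1}.
        Z = rng.standard_normal((samples, n))
        U = rng.standard_normal((samples, m))
        Z_next = Z @ A.T + U @ B.T
        # Target: one-step cost plus the minimized Q-value at the next state.
        d = (np.einsum('ki,ij,kj->k', Z, Q, Z)
             + np.einsum('ki,ij,kj->k', U, R, U)
             + np.einsum('ki,ij,kj->k', Z_next, P, Z_next))
        h, *_ = np.linalg.lstsq(quad_basis(np.hstack([Z, U])), d, rcond=None)
        H = unvech(h, n + m)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])  # control law u = -K z
    return H, K

A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
H, K = q_learning_lqr(A, B, np.eye(2), np.eye(1))
```

The gain is read directly from the learned blocks of H, so the controller never needs A or B explicitly. With noise-free samples the least-squares fit is exact, which is why the sketch converges to the Riccati gain of the stand-in model.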
Simulation Results
• Consider the linear time-invariant inverted pendulum dynamics
Once the random delays and packet losses of the NCS are taken into account, the original time-invariant system is discretized and represented as a time-varying system. (Note: because the delays and packet losses are random, the matrices of the NCS model change with the time index k.)
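As a point of reference for the discretization step, the sketch below computes the delay-free, loss-free zero-order-hold model of the pendulum. It is only an illustration under stated assumptions: the matrix entries are read from the poster's pendulum model with the two damping terms assumed negative, the sampling period `T = 0.05` is hypothetical, and the matrix exponential is approximated by a truncated Taylor series to keep the example NumPy-only.

```python
import numpy as np

def expm_taylor(M, terms=40):
    """Truncated Taylor series of exp(M); adequate for small norm(M)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for j in range(1, terms):
        term = term @ M / j
        result = result + term
    return result

def zoh_discretize(A, B, T):
    """Zero-order hold: exp([[A, B], [0, 0]] T) = [[A_d, B_d], [0, I]]."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A * T
    M[:n, n:] = B * T
    E = expm_taylor(M)
    return E[:n, :n], E[:n, n:]

# Inverted pendulum on a cart; signs of the damping terms are an assumption.
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, -0.1818, 2.6727, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, -0.4545, 31.1818, 0.0]])
B = np.array([[0.0], [1.8182], [0.0], [4.5455]])

T = 0.05  # hypothetical sampling period
A_d, B_d = zoh_discretize(A, B, T)
```

In the NCS setting the single input matrix is replaced by one matrix per delayed input, each integrated over the sub-interval in which that input is active, so the discrete model changes from step to step.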
• Performance evaluation of the proposed suboptimal and optimal controllers
1) Stability:
Figure 5 Stability performance
As shown in Figure 5, a PID controller that does not account for delays and packet losses leaves the NCS unstable (Figure 5(a)). With the proposed Q-learning suboptimal controller or the AE optimal controller, the NCS remains stable (Figure 5(b), (c)).
2) Optimality:
Figure 6 Optimal performance
As shown in Figure 6(a), the proposed AE-based optimal controller achieves a lower cost-to-go ($J_k$) than the proposed Q-learning suboptimal controller. In Figure 6(b), the proposed AE-based optimal controller drives the NCS states to zero faster than the Q-learning suboptimal controller. These results indicate that the proposed AE-based optimal control is more effective than Q-learning suboptimal control.
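A cost comparison of this kind can be computed from logged trajectories. The following is a minimal sketch that assumes a quadratic stage cost z^T Q z + u^T R u and evaluates the cost-to-go along a single finite simulation run, so it is a sample-path approximation of the expected cost rather than the expectation itself. The function name and the data are hypothetical.

```python
import numpy as np

def cost_to_go(Z, U, Q, R):
    """Tail sums J_k = sum_{i >= k} (z_i^T Q z_i + u_i^T R u_i) over one run.

    Z has one state per row, U has one input per row.
    """
    stage = (np.einsum('ki,ij,kj->k', Z, Q, Z)
             + np.einsum('ki,ij,kj->k', U, R, U))
    return np.cumsum(stage[::-1])[::-1]

# Hypothetical two-step trajectory.
Z = np.array([[1.0, 0.0], [0.0, 2.0]])
U = np.array([[1.0], [0.0]])
J = cost_to_go(Z, U, Q=np.eye(2), R=2.0 * np.eye(1))
```

Averaging `J` over many runs with independently drawn delays and packet losses would approximate the expectation.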
AE-based Stochastic Optimal Control
1. When random delays and packet losses are considered, the H matrix becomes time-varying. However, we assume that it changes slowly.
2. Set up the stochastic Q-function:
3. Use the adaptive estimator to represent the Q-function:
where $\hat{h}_k = \mathrm{vech}(\hat{H}_k)$ and $\bar{w}_k$ is the Kronecker product quadratic polynomial basis vector
4. Define the update law to tune the approximated H matrix
• 1) Represent the residual error:
where $\Delta W_{k-1} = \bar{w}_k - \bar{w}_{k-1}$ and $r(z_{k-1}, u_{k-1})$ is the one-step cost
• 2) Update law for the time-varying matrix H:
where $0 < \alpha_h < 1$ is a constant
5. Determine the AE stochastic optimal control input
6. Convergence: as $k \to \infty$, $z_k \to 0$, $\hat{h}_k \to h_k$, $\hat{J}_k \to J_k^*$ and $\hat{u}_k \to u_k^*$
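The residual and update law in step 4 can be illustrated numerically. The sketch below is not the authors' implementation and rests on an assumed form of the equations: the residual is taken as e_k = r_{k-1} + h_k^T dW_{k-1}, and the update as h_{k+1} = dW_k (dW_k^T dW_k)^{-1} (alpha_h e_k - r_k), with dW_k a single regression vector so that the inverse is a scalar division. Random vectors stand in for the basis differences and one-step costs, and all names and values are hypothetical.

```python
import numpy as np

def ae_update(h_hat, dW_prev, r_prev, dW, r, alpha_h=0.5):
    """One adaptive-estimator step under the assumed residual and update law."""
    e = r_prev + h_hat @ dW_prev                    # residual e_k
    h_next = dW * (alpha_h * e - r) / (dW @ dW)     # parameter update h_{k+1}
    return h_next, e

rng = np.random.default_rng(0)
l = 6  # length of the parameter vector (hypothetical)
h_hat = rng.standard_normal(l)
dW_prev, r_prev = rng.standard_normal(l), 1.3

residuals = []
for _ in range(5):
    dW, r = rng.standard_normal(l), rng.uniform(0.5, 2.0)
    h_hat, e = ae_update(h_hat, dW_prev, r_prev, dW, r)
    residuals.append(e)
    dW_prev, r_prev = dW, r
```

Under these assumptions, substituting the update into the next residual gives e_{k+1} = r_k + h_{k+1}^T dW_k = alpha_h e_k, so each entry of `residuals` is `alpha_h` times the previous one and the residual decays geometrically for 0 < alpha_h < 1.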
CONCLUSIONS
• The proposed Q-learning-based suboptimal and AE-based optimal control designs for an NCS with unknown dynamics in the presence of random delays and packet losses outperform a traditional controller.
• Both Q-learning-based suboptimal control and AE-based optimal control keep the NCS stable.
• The proposed AE-based optimal control is more effective than the proposed Q-learning-based suboptimal control.
AE-based Stochastic Optimal Control (2)
• Figure 3 presents the block diagram of the AE-based stochastic optimal regulator for the NCS
Figure 3 Stochastic optimal regulator block diagram
FUTURE WORK
• Design suboptimal and optimal controllers for nonlinear networked control systems (NNCS) with unknown dynamics in the presence of random delays and packet losses
• Design a novel wireless network protocol to reduce the effects of random delays and packet losses
• Optimize the NNCS globally with respect to both the control part and the wireless network part
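[Block diagram of the networked control system. Blocks: Plant, Actuator, Sensor, Controller, and a Wireless Network with "Delay and Packet losses" on each of its two links. Signal labels: $T$, $\tau_{ca}(t)$, $Ip(t)$, $\tau_{sc}(t)$, $Ip(t)$.]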
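Networked control system representation:
$$z_{k+1} = A_{zk} z_k + B_{zk} u_k, \qquad y_k = C_z z_k$$
$$A_{zk} = \begin{bmatrix} A_s & Ip_{k-1} B_1^k & \cdots & Ip_{k-i} B_i^k & \cdots & Ip_{k-d} B_d^k \\ 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & I_m & \cdots & 0 & \cdots & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & \cdots & \cdots & I_m & 0 \end{bmatrix}, \quad B_{zk} = \begin{bmatrix} Ip_k B_0^k \\ I_m \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad C_z = \begin{bmatrix} C & 0 & \cdots & 0 \end{bmatrix}$$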
$$B_i^k = \int_{\tau_i^k - iT}^{\tau_{i-1}^k - (i-1)T} e^{A(T-s)}\, ds\, B, \qquad i = 1, 2, \ldots, d$$
$$Ip_{k-i} = \begin{cases} 0, & \text{if } u_{k-i} \text{ was received during } [kT, (k+1)T) \\ 1, & \text{if } u_{k-i} \text{ was lost during } [kT, (k+1)T) \end{cases}$$
$$z_k = \begin{bmatrix} x_k^T & u_{k-1}^T & \cdots & u_{k-d}^T \end{bmatrix}^T \in \mathbb{R}^{n+dm}$$
Q-learning, step 1 (Q-function):
$$Q(z_k, u_k) = E\left[ z_k^T Q_z z_k + u_k^T R_z u_k + J_{k+1} \right]$$
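Q-learning, step 2 (update law for the Q-function):
$$Q_{i+1}(z_k, u_k) = r(z_k, u_k) + \min_{u_{k+1}} Q_i(z_{k+1}, u_{k+1}) = r(z_k, u_k) + J_i^*(z_{k+1})$$
$$= z_k^T Q_z z_k + u_k^T R_z u_k + \begin{bmatrix} z_{k+1}^T & u_{k+1}^{*T} \end{bmatrix} H_i \begin{bmatrix} z_{k+1}^T & u_{k+1}^{*T} \end{bmatrix}^T$$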
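$$H_i = \begin{bmatrix} H_i^{zz} & H_i^{zu} \\ H_i^{uz} & H_i^{uu} \end{bmatrix} = \begin{bmatrix} Q_z + A_z^T P_i A_z & A_z^T P_i B_z \\ B_z^T P_i A_z & R_z + B_z^T P_i B_z \end{bmatrix}$$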
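Q-learning, step 4 (least-squares update of the H matrix):
$$h = \mathrm{vech}(H)$$
$$h_{i+1} = \arg\min_{h_{i+1}} \int \left| h_{i+1}^T \bar{w}(z_k) - d(\bar{w}(z_k), H_i) \right|^2 dz_k$$
$$d(\bar{w}(z_k), H_i) = z_k^T Q_z z_k + u_i^T(z_k) R_z u_i(z_k) + Q_i(z_{k+1}, u_i(z_{k+1}))$$
where $\bar{w}(z_k)$ is the quadratic basis vector of $w(z_k) = \begin{bmatrix} z_k^T & u_i^{*T}(z_k) \end{bmatrix}^T$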
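Q-learning, step 5 (stochastic suboptimal control):
$$u_i^*(z_k) = -K_i z_k, \qquad K_i = \left( R_z + B_z^T P_i B_z \right)^{-1} B_z^T P_i A_z = \left( H_i^{uu} \right)^{-1} H_i^{uz}$$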
Q-learning, step 6 (convergence):
$$i \to \infty: \quad Q_i(z_k, u_k) \to Q^*(z_k, u_k), \qquad H_i \to H^*, \quad K_i \to K^*$$
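AE, step 2 (stochastic Q-function):
$$Q(z_k, u_k) = E\left[ z_k^T Q_z z_k + u_k^T R_z u_k + J_{k+1} \right] = \begin{bmatrix} z_k^T & u_k^T \end{bmatrix} H_k \begin{bmatrix} z_k^T & u_k^T \end{bmatrix}^T$$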
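AE, step 3 (adaptive estimator of the Q-function):
$$\hat{Q}(z_k, u_k) = w_k^T \hat{H}_k w_k = \hat{h}_k^T \bar{w}_k$$
$$\hat{h}_k = \mathrm{vech}(\hat{H}_k), \qquad w_k = \begin{bmatrix} z_k^T & u^T(z_k) \end{bmatrix}^T \in \mathbb{R}^{l}, \quad l = n + (d+1)m$$
$$\bar{w}_k = \left( w_{k1}^2, \ldots, w_{k1} w_{kl}, w_{k2}^2, \ldots, w_{k(l-1)} w_{kl}, w_{kl}^2 \right)$$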
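AE, step 4.1 (residual error):
$$e_{hk} = r(z_{k-1}, u_{k-1}) + \hat{h}_k^T \Delta W_{k-1}$$
$$\Delta W_{k-1} = \bar{w}_k - \bar{w}_{k-1}, \qquad r(z_{k-1}, u_{k-1}) = z_{k-1}^T Q_z z_{k-1} + u_{k-1}^T R_z u_{k-1}$$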
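AE, step 4.2 (update law for the time-varying matrix H):
$$\hat{h}_{k+1} = \Delta W_k \left( \Delta W_k^T \Delta W_k \right)^{-1} \left( \alpha_h e_{hk}^T - r^T(z_k, u_k) \right), \qquad 0 < \alpha_h < 1$$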
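AE, step 5 (stochastic optimal control input):
$$\hat{u}_k = -\hat{K}_k z_k = -\left( \hat{H}_k^{uu} \right)^{-1} \hat{H}_k^{uz} z_k$$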
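AE, step 6 (convergence):
$$k \to \infty: \quad z_k \to 0, \quad \hat{h}_k \to h_k, \quad \hat{J}_k \to J_k^*, \quad \hat{u}_k \to u_k^*$$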
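[Figure 3 block labels: "Linear Network Control System with Unknown $A_{zk}$ and $B_{zk}$" ($z_{k+1} = A_{zk} z_k + B_{zk} u_k$); "Adaptive Estimator of function $Q(z_k, u_k)$"; "Cost Function Network" ($\hat{h}_k^T \bar{w}_k$, $\hat{J}_k = w_k^T \hat{H}_k w_k$); control law $u_k = -\left( H_k^{uu} \right)^{-1} H_k^{uz} z_k$; feedback signal $z_k$.]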
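Inverted pendulum dynamics:
$$\begin{bmatrix} \dot{x} \\ \ddot{x} \\ \dot{\phi} \\ \ddot{\phi} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -0.1818 & 2.6727 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & -0.4545 & 31.1818 & 0 \end{bmatrix} \begin{bmatrix} x \\ \dot{x} \\ \phi \\ \dot{\phi} \end{bmatrix} + \begin{bmatrix} 0 \\ 1.8182 \\ 0 \\ 4.5455 \end{bmatrix} u$$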
[Figure 5 plots, regulation error values (e1, e2, e3, e4) versus time (sec): (a) State regulation error of NCS with delay and packet losses; (b) State regulation errors with Q-learning suboptimal control of NCS with unknown dynamics; (c) State regulation errors with proposed AE optimal control of NCS with unknown dynamics.]
[Figure 6 plots, Q-learning suboptimal control versus proposed AE optimal control with unknown dynamics: (a) System total costs J(Xk) versus time (sec); (b) Control inputs versus time (sec).]
Cost-to-go:
$$J_k = E\left[ \sum_{i=k}^{\infty} \left( z_i^T Q_z z_i + u_i^T R_z u_i \right) \right]$$