Stochastic Dynamic Programming for Network
Resource Allocation
Yahong Rosa Zheng
Department of Electrical & Computer Engineering
Missouri University of Science and Technology, USA
The work was supported by NSF, ONR, and Dept. of Transportation.
Introduction Problem Formulation Resource Allocation Numerical Results Related Works
Summary of My Research Projects
[Figure: overview of research projects, arranged under four column headings (DSP, Communication, Networks, Applications). Topics shown: Nearfield Array Signal Processing; Fading Channel Modeling; Compressive Sensing; FPGA/DSP Hardware Implementation; MIMO Wireless Transceiver; Secure Cognitive/Relay Networks; Network Resource Allocation; Underwater Sensor Networks; Massive MIMO for 5G Cellular; Intelligent Transportation Systems; Smart & Connected Communities; Underwater Infrastructure Monitoring; SAR & MRI.]
Y. R. Zheng Stochastic Dynamic Programming 2 \ 25
Markov Decision Process
• States: $S_1, S_2, S_3, \dots$ and Actions: $a_0, a_1, \dots$;
• Transition probabilities $T(S_j \mid a_i, S_i)$: probability of transition to State $S_j$ from State $S_i$ by taking Action $a_i$;
• Expected rewards $R(S_j \mid a_i, S_i)$: reward of transition to State $S_j$ by taking Action $a_i$ at State $S_i$.
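[Figure: State Transition Diagram. Three states $S_1, S_2, S_3$ with actions $a_0, a_1$; edges are labeled with transition probabilities, and two transitions carry rewards of +7 and -8.]
[Figure: MDP model. Time-unrolled chain with rows for states ($s_{t-1}, s_t, s_{t+1}$), actions ($a_{t-1}, a_t, a_{t+1}$), and value functions ($r_{t-1}, r_t, r_{t+1}$), linked by the transition function $T$.]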
Solution to an MDP
A policy $\pi : S \to A$ maps each state to a desirable action.
The optimal solution maximizes the expected discounted rewards over a horizon $[0, T]$:
$$\sum_{t=0}^{T} \gamma^t R(s_{t+1} \mid \pi(s_t), s_t)$$
The value function of State $s$, with action $a = \pi(s)$, is defined as
$$V(s) = \sum_{s'} T(s' \mid a, s)\,[R(s' \mid a, s) + \gamma V(s')]$$
Bellman equation (value iteration):
$$V_{i+1}(s) = \max_a \sum_{s'} T(s' \mid a, s)\,[R(s' \mid a, s) + \gamma V_i(s')]$$
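The value-iteration update can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: the two-state MDP, the dictionary layout `transitions[s][a] = [(prob, next_state, reward), ...]`, and the stopping tolerance are all assumptions made for the example.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-10, max_iter=100_000):
    """Iterate V_{i+1}(s) = max_a sum_{s'} T(s'|a,s) [R(s'|a,s) + gamma V_i(s')]."""
    def q_value(outcomes, V):
        # Expected one-step reward plus discounted value of the successor state.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)

    V = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        new_V = {s: max(q_value(o, V) for o in actions.values())
                 for s, actions in transitions.items()}
        delta = max(abs(new_V[s] - V[s]) for s in V)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(actions, key=lambda a: q_value(actions[a], V))
              for s, actions in transitions.items()}
    return V, policy


# Hypothetical toy MDP (not the diagram on the slide).
toy_mdp = {
    "s0": {"stay": [(1.0, "s0", 1.0)], "go": [(1.0, "s1", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 3.0)]},
}
V, policy = value_iteration(toy_mdp, gamma=0.5)
```

In this toy case, staying in `s1` forever is worth 3 / (1 - 0.5) = 6, so moving from `s0` to `s1` (worth 0.5 · 6 = 3) beats staying in `s0` (worth 2).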
Applications: Wireless Network with Energy Harvesting
• Wireless Internet of Things (IoT) for ITS;
• Smart sensors for underwater infrastructure monitoring;
• Cellular base stations powered by renewable energy.
[Figure: IoT for Intelligent Transportation Systems. Labels: vehicle, gateway node, Wi-Fi antenna, cellphone antenna, mobile tower.]
[Figure: Smart Rocks for Bridge Scour Monitoring. Labels: smart rocks, natural rocks, scour hole, pier, piles, bridge deck, river bank, soil deposits, current flow, transducer, hydrophone, acoustic communication through water.]
Network Resource Allocation
Design Goals:
• Utilize harvested energy as much as possible;
• Ensure network performance.
Research Challenges:
• Harvested energy is unstable and follows statistical models;
• Mobile communication channels are time-variant and stochastic.
Existing Approaches:
• Online vs. Offline ⇒ Dynamic vs. Static;
• Channel State Information: Instantaneous vs. Statistical;
• Gaussian Inputs vs. Finite Alphabet Inputs
[Figure: Wireless Base Station Powered by Renewable Energy Sources]
System Model
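$$y^{(i)}_{u,k,d} = H^{(i)}_{u,k} \cdot P_{u,k,d} \cdot x^{(i)}_{u,k,d} + n^{(i)}_{u,k} \quad \text{with} \quad H^{(i)}_{u,k} = \sqrt{\alpha_{u,k}} \cdot \Psi^{1/2}_{u,k}\, \Theta^{(i)}_{u,k}\, \Phi^{1/2}_{u,k}$$
$i$: index for channel realization; $u = 1:U$: user index;
$k = 1:K$: subchannel index; $d = 1:D$: MCR (modulation and coding rate) index.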
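One channel realization from the correlated model above could be drawn as follows. This is a sketch under stated assumptions: the entries of $\Theta$ are taken to be i.i.d. circularly symmetric complex Gaussian with unit variance (the slide does not specify this), $\Psi$ is treated as the receive-side and $\Phi$ as the transmit-side correlation matrix, and the helper names `psd_sqrt` and `sample_channel` are hypothetical.

```python
import numpy as np


def psd_sqrt(A):
    """Hermitian square root of a positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T


def sample_channel(alpha, Psi, Phi, rng):
    """Draw H = sqrt(alpha) * Psi^{1/2} Theta Phi^{1/2} with Theta ~ i.i.d. CN(0, 1) (assumed)."""
    nr, nt = Psi.shape[0], Phi.shape[0]
    Theta = (rng.standard_normal((nr, nt))
             + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    return np.sqrt(alpha) * psd_sqrt(Psi) @ Theta @ psd_sqrt(Phi)


rng = np.random.default_rng(0)
Psi = np.array([[1.0, 0.5], [0.5, 1.0]])   # illustrative receive correlation
Phi = np.eye(2)                            # illustrative transmit correlation
H = sample_channel(alpha=2.0, Psi=Psi, Phi=Phi, rng=rng)
```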
MIMO Communication Networks
OFDMA: Orthogonal Frequency-Division Multiple Access
SC-FDMA: Single-Carrier Frequency-Division Multiple Access
[Figure: OFDMA and SC-FDMA transceiver block diagrams, drawn for one user and repeated for Users 2:U. Blocks include: MCR (bits $b_u$, rate $r_d$), precoder $P_{u,k,d}$ acting on $x_{u,k,d}$, DFT precode ($F$, SC-FDMA only), subcarrier mapping with $s_{u,k,d}$, IDFT, P/S and CP insertion, front end, channel $H$, DFT, subcarrier de-mapping, equalize, IDFT ($F^\dagger$), $P^{-1}_{u,k,d}$, detect, decode. The channel CDI is $\Omega_{u,k} = \{\alpha_{u,k}, \Phi_{u,k}, \Psi_{u,k}\}$.]
Spectral efficiency $r_d = c_d \cdot N_t \cdot \log_2 M_d$, where $c_d$ is the coding rate and $M_d$ is the constellation size of the modulation;
Linear precoders $P_{u,k,d}$: $N_t \times N_t$ complex matrices;
Binary indicators $s_{u,k,d}[t] \in \{0, 1\}$: resource assignment table.
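As a worked instance of the spectral-efficiency formula, the sketch below tabulates $r_d$ for an illustrative MCR set. The set itself (BPSK, QPSK, 16-QAM with coding rates 1/2, 2/3, 3/4, 4/5, giving D = 12 schemes) and $N_t = 2$ are taken from the simulation slides of this talk; the function and variable names are hypothetical.

```python
from fractions import Fraction
from math import log2


def spectral_efficiency(code_rate, n_t, m):
    """r_d = c_d * N_t * log2(M_d), in bits per channel use."""
    return code_rate * n_t * log2(m)


constellations = {"BPSK": 2, "QPSK": 4, "16-QAM": 16}
code_rates = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(4, 5)]

# One entry per (modulation, coding rate) pair for N_t = 2 transmit antennas.
mcr_table = {
    (name, c): spectral_efficiency(c, n_t=2, m=m)
    for name, m in constellations.items()
    for c in code_rates
}
```

For example, QPSK at rate 1/2 gives 0.5 · 2 · 2 = 2 bits per channel use, and 16-QAM at rate 3/4 gives 0.75 · 2 · 4 = 6.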
Objectives of Resource Allocation
Network Throughput = Expected Sum of Average Mutual Information (AMI)
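$$I(P_{u,k,d} \mid \Omega_{u,k}) = N_t \log M_d - \frac{1}{M_d^{N_t}} \sum_{m=1}^{M_d^{N_t}} \mathbb{E}_{H_{u,k}} \mathbb{E}_{n_{u,k}} \log \sum_{n=1}^{M_d^{N_t}} e^{-v_{mn}/\sigma^2}$$
where $v_{mn} = \|H_{u,k} P_{u,k,d} (x_{m,d} - x_{n,d}) + n_{u,k}\|^2 - \|n_{u,k}\|^2$.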
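To make the finite-alphabet mutual information concrete, here is a rough Monte Carlo sketch for QPSK. Several things are assumptions of the sketch rather than the talk's method: the channel $H$ is held fixed (the outer expectation over $H$ would be an extra average over channel draws), the inner sum uses $\exp(-v_{mn}/\sigma^2)$ as in the standard form of this expression, logarithms are base 2, and the names `qpsk_vectors` and `mutual_info_mc` are hypothetical.

```python
import itertools
import numpy as np


def qpsk_vectors(nt):
    """All M^Nt unit-energy QPSK symbol vectors, shape (4**nt, nt)."""
    pts = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    return np.array(list(itertools.product(pts, repeat=nt)))


def mutual_info_mc(H, P, sigma2, n_noise=500, seed=0):
    """Monte Carlo estimate (bits per channel use) for a fixed channel H."""
    rng = np.random.default_rng(seed)
    nr, nt = H.shape
    X = qpsk_vectors(nt)
    HP = H @ P
    # D[m, n, :] = H P (x_m - x_n)
    D = (X[:, None, :] - X[None, :, :]) @ HP.T
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal((n_noise, nr))
                                   + 1j * rng.standard_normal((n_noise, nr)))
    noise_energy = np.sum(np.abs(noise) ** 2, axis=-1)[:, None, None]
    # v[s, m, n] = ||D[m, n] + noise[s]||^2 - ||noise[s]||^2
    v = np.sum(np.abs(D[None] + noise[:, None, None, :]) ** 2, axis=-1) - noise_energy
    inner = np.log2(np.sum(np.exp(-v / sigma2), axis=2))   # shape (samples, m)
    return nt * np.log2(4) - inner.mean()


H = np.eye(2)
P = np.eye(2)
high_snr = mutual_info_mc(H, P, sigma2=1e-3, n_noise=200)
low_snr = mutual_info_mc(H, P, sigma2=100.0, n_noise=200)
```

At high SNR the estimate saturates at $N_t \log_2 M = 4$ bits for 2 × 2 QPSK, which is the finite-alphabet ceiling that Gaussian-input capacity formulas do not capture.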
Contrast to Gaussian inputs with instantaneous channel state information:
$$I_G(P_{u,k} \mid H_{u,k}) = \log\det\left(I + H_{u,k} P_{u,k} P^\dagger_{u,k} H^\dagger_{u,k} / \sigma^2\right)$$
The solution is classic waterfilling for Gaussian inputs or mercury waterfilling for finite-alphabet inputs.
However, the receiver cannot distinguish symbol from noise;
the channel estimated at $t_1$ is different from that at $t_2$.
[Figure: QPSK modulation vs. Gaussian inputs. Probability plotted over the magnitude of x and the magnitude of y.]
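For reference, classic waterfilling over parallel subchannels (the Gaussian-input solution mentioned above) can be sketched as follows. The parameterization by effective gains $g_i = |h_i|^2/\sigma^2$ and the function name `waterfilling` are assumptions of this example; mercury waterfilling for finite alphabets is not covered.

```python
import numpy as np


def waterfilling(gains, total_power):
    """Maximize sum_i log2(1 + g_i p_i) subject to sum_i p_i = total_power, p_i >= 0."""
    g = np.asarray(gains, dtype=float)
    order = np.argsort(-g)              # strongest subchannel first
    g_sorted = g[order]
    p = np.zeros_like(g)
    # Try the n strongest subchannels; drop the weakest until all powers are positive.
    for n in range(len(g), 0, -1):
        level = (total_power + np.sum(1.0 / g_sorted[:n])) / n   # water level
        p_active = level - 1.0 / g_sorted[:n]
        if p_active[-1] > 0:
            p[order[:n]] = p_active
            break
    return p


p_both = waterfilling([4.0, 1.0], total_power=1.0)     # both subchannels active
p_single = waterfilling([4.0, 0.5], total_power=1.0)   # weak subchannel switched off
```

With gains (4, 1) the water level is 1.125, giving powers (0.875, 0.125); with gains (4, 0.5) the weaker subchannel sits above the water level and receives no power.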
Online vs. Offline
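Energy dynamics: $B[t+1] = \min\left(B[t] + G[t] - (E[t] - \tau P_r)^+,\; B_{\max}\right)$;
Battery states: $B[t] \in \{B_1, \dots, B_M\}$; Harvested energy: $G[t] \in \{G_1, \dots, G_J\}$;
Actions: energy allocation $E[t] \in \{E_1, \dots, E_N\}$, where
$$E[t] = \sum_{u=1}^{U} \sum_{k=1}^{K} \sum_{d=1}^{D} s_{u,k,d}[t] \cdot E_{u,k,d}[t] \quad \text{with} \quad E_{u,k,d}[t] = \tau \cdot \mathrm{Tr}\left(P_{u,k,d}[t] \cdot P^H_{u,k,d}[t]\right)$$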
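The battery recursion is easy to check numerically. The sketch below is only an illustration: `battery_update` is a hypothetical helper, `tau_pr` stands for the product $\tau P_r$, and the feasibility check mirrors the requirement that the energy spent cannot exceed the stable supply plus the stored energy.

```python
def battery_update(b, g, e, tau_pr, b_max):
    """B[t+1] = min(B[t] + G[t] - (E[t] - tau*Pr)^+, Bmax)."""
    drawn = max(e - tau_pr, 0.0)        # energy taken from the battery
    if drawn > b:
        raise ValueError("E[t] exceeds tau*Pr + B[t]")
    return min(b + g - drawn, b_max)


# Simulate a short trajectory with illustrative numbers.
battery = 1.0
trajectory = [battery]
for harvested, spent in [(2, 1), (0, 3), (2, 0)]:
    battery = battery_update(battery, harvested, spent, tau_pr=1.0, b_max=10.0)
    trajectory.append(battery)
```

Only the part of $E[t]$ above $\tau P_r$ drains the battery, so spending at or below the stable supply leaves the stored energy untouched.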
Online Resource Allocation
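SDP:
$$\underset{\mathcal{P},\, \mathcal{S}}{\text{maximize}} \; R^{(j)}_{SUM}(\mathcal{P}, \mathcal{S}) \qquad (j = 1:T;\ \text{time slot index})$$
$$\text{subject to} \quad B[T+1] \ge Q_B, \quad E[t] \le \tau P_r + B[t],$$
$$\sum_{u=1}^{U} \sum_{d=1}^{D} s_{u,k,d}[t] \le 1, \quad \forall k, t.$$
Value function:
$$R^{(j)}_{SUM}(\mathcal{P}, \mathcal{S}) = R_j\left(\mathbf{P}[1], \mathbf{S}[1] \mid \Omega, B[1]\right) + \sum_{t=j+1}^{T} \mathbb{E}_B\left\{ R_t\left(\mathbf{P}[t], \mathbf{S}[t] \mid \Omega, B[t]\right) \right\}$$
where $\mathbf{P}[t] = \{P_{u,k,d}[t] : \forall u, k, d\} \longrightarrow \mathcal{P} = \{\mathbf{P}[t] : \forall t\}$
and $\mathbf{S}[t] = \{s_{u,k,d}[t] : \forall u, k, d\} \longrightarrow \mathcal{S} = \{\mathbf{S}[t] : \forall t\}$,
$$R_t\left(\mathbf{P}[t], \mathbf{S}[t] \mid \Omega, B[t]\right) = W \cdot \sum_{u=1}^{U} \rho_u \sum_{k=1}^{K} \sum_{d=1}^{D} r_d \cdot s_{u,k,d}[t]$$
with $I\left(P_{u,k,d}[t] \mid \Omega_{u,k}\right) \ge r_d \cdot s_{u,k,d}[t], \ \forall u, k, d$.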
Break the Curse of Dimensionality
Existing energy harvesting designs
• Offline designs with non-causal energy arrival profile: [Ozel2011,JSAC], [Gregori2013,TC]
• Online designs with causal energy arrival profile: SISO: [Ho2012,TSP]; Heuristic: [Ng2013,TWC]
Our approach:
• Layered decomposition achieves the globally optimal solution with low complexity
W. Zeng, Y.R. Zheng, and R. Schober, “Online Resource Allocation for Energy Harvesting Downlink Multiuser Systems: Precoding with Modulation, Coding Rate, and Subchannel Selection,” IEEE Trans. Wireless Commun., Oct. 2015.
The Upper Layer: a One-Dimensional SDP Problem
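$$\underset{\mathbf{E}}{\text{maximize}} \; R_{UL}(\mathbf{E}) = R^\star_{ML}\left(E[1] \mid B[1]\right) + \sum_{t=2}^{T} \mathbb{E}_B\left\{ R^\star_{ML}\left(E[t] \mid B[t]\right) \right\} \quad (1a)$$
$$\text{subject to} \quad B[T+1] \ge Q_B \quad (1b)$$
$$B[t+1] = \min\left\{ B[t] + G[t] - (E[t] - \tau P_r)^+,\; B_{\max} \right\}, \ \forall t \quad (1c)$$
$$E[t] \le \tau P_r + B[t], \ \forall t \quad (1d)$$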
[Figure: Trellis of battery states versus time slots, including a virtual time slot; an arrow marks the direction for backward recursion.]
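The backward recursion over the one-dimensional battery state can be sketched as below. This is a simplified illustration, not the paper's algorithm: energies are integers, `tau_pr` stands for $\tau P_r$, the harvest distribution `harvest_pmf` is assumed i.i.d. across slots, and `reward(e)` is a hypothetical stand-in for the middle-layer optimum $R^\star_{ML}$. The terminal constraint $B[T+1] \ge Q_B$ is imposed by giving infeasible final states a value of minus infinity.

```python
import math


def backward_recursion(T, b_max, harvest_pmf, reward, tau_pr=0, q_b=0):
    """Finite-horizon DP over battery states 0..b_max; returns first-slot values and policy."""
    neg_inf = float("-inf")
    # Terminal values enforce B[T+1] >= Q_B, constraint (1b).
    value = [0.0 if b >= q_b else neg_inf for b in range(b_max + 1)]
    policy = []
    for _ in range(T):                              # slots T, T-1, ..., 1
        new_value = [neg_inf] * (b_max + 1)
        best = [None] * (b_max + 1)
        for b in range(b_max + 1):
            for e in range(tau_pr + b + 1):         # constraint (1d)
                drawn = max(e - tau_pr, 0)
                future = sum(p * value[min(b + g - drawn, b_max)]   # dynamics (1c)
                             for g, p in harvest_pmf.items())
                total = reward(e) + future
                if total > new_value[b]:
                    new_value[b], best[b] = total, e
        value = new_value
        policy.insert(0, best)
    return value, policy


# Two slots, no harvesting, concave reward: splitting the energy evenly is best.
values, policy = backward_recursion(
    T=2, b_max=2, harvest_pmf={0: 1.0}, reward=lambda e: math.log2(1 + e))
```

With a battery of 2 units, spending one unit per slot yields 1 + 1 = 2, which beats spending everything in one slot (about 1.585).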
The Middle Layer: a Knapsack Problem
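$$\underset{\mathbf{S}}{\text{maximize}} \; R_{ML}(\mathbf{S}) = W \cdot \sum_{u=1}^{U} \rho_u \sum_{k=1}^{K} \sum_{d=1}^{D} r_d \cdot s_{u,k,d} \quad (2a)$$
$$\text{subject to} \quad \sum_{u=1}^{U} \sum_{k=1}^{K} \sum_{d=1}^{D} s_{u,k,d} \cdot \varphi^\star_{u,k,d} \le E \le \tau P_r + B \quad (2b)$$
$$\sum_{u=1}^{U} \sum_{d=1}^{D} s_{u,k,d} \le 1, \ \forall k, \qquad s_{u,k,d} \in \{0, 1\}, \ \forall u, k, d \quad (2c)$$
Method of Lagrange multipliers with $\mu$ for the energy constraint:
Step 1: Solve $K$ convex optimization problems:
$$\underset{\mathbf{S}_k}{\text{maximize}} \; L_k(\mathbf{S}_k, \mu) \quad \text{subject to} \quad \sum_{u=1}^{U} \sum_{d=1}^{D} s_{u,k,d} \le 1$$
where
$$L_k(\mathbf{S}_k, \mu) = \sum_{u=1}^{U} \sum_{d=1}^{D} s_{u,k,d} \cdot \left( W \cdot \rho_u r_d - \mu \cdot \varphi^\star_{u,k,d} \right)$$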
The Middle Layer Solution
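Step 2: For a given $\mu$, compute
$$g(\mu) = \underset{\mathbf{S}}{\text{maximize}} \; L(\mathbf{S}, \mu) = \sum_{k=1}^{K} L_k(\mathbf{S}_k, \mu) + \mu \cdot E$$
Step 3: Solve
$$\underset{\mu \ge 0}{\text{minimize}} \; g(\mu)$$
Since $g(\mu)$ is piecewise linear in $\mu$, Step 3 is solved by a subgradient-aided bisection search.
The subgradient of $g(\mu)$ is evaluated at the midpoint $\mu = (\mu_l + \mu_h)/2$ as
$$\nabla_\mu g(\mu) = E - \sum_{u=1}^{U} \sum_{k=1}^{K} \sum_{d=1}^{D} s^{(\mu)}_{u,k,d} \cdot \varphi^\star_{u,k,d}$$
where $s^{(\mu)}_{u,k,d}$ is the optimal solution of Step 1 for the given $\mu$.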
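Steps 1 to 3 can be sketched as a small dual bisection. The layout is an assumption made for illustration: `gain[k][c]` holds $W \rho_u r_d$ and `cost[k][c]` holds $\varphi^\star_{u,k,d}$ for candidate `c` (a user/MCR pair) on subchannel `k`, and `mu_hi` is a hypothetical initial upper bracket assumed large enough that nothing is selected there. The sketch returns the feasible selection at the upper end of the final bracket.

```python
def per_subchannel_choice(gain, cost, mu):
    """Step 1: on each subchannel pick the candidate maximizing gain - mu * cost, if positive."""
    picks = []
    for g_k, c_k in zip(gain, cost):
        scores = [g - mu * c for g, c in zip(g_k, c_k)]
        best = max(range(len(scores)), key=scores.__getitem__)
        picks.append(best if scores[best] > 0 else None)
    return picks


def energy_used(cost, picks):
    return sum(cost[k][c] for k, c in enumerate(picks) if c is not None)


def middle_layer(gain, cost, energy, mu_hi=1e3, tol=1e-9):
    """Steps 2-3: bisection on mu using the sign of the subgradient E - (energy used)."""
    picks = per_subchannel_choice(gain, cost, 0.0)
    if energy_used(cost, picks) <= energy:
        return picks                      # energy constraint inactive
    lo, hi = 0.0, mu_hi
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        subgrad = energy - energy_used(cost, per_subchannel_choice(gain, cost, mu))
        if subgrad < 0:
            lo = mu                       # over budget: raise the energy price
        else:
            hi = mu                       # within budget: try a lower price
    return per_subchannel_choice(gain, cost, hi)


gain = [[4.0, 6.0], [3.0, 8.0]]           # illustrative W * rho_u * r_d values
cost = [[1.0, 3.0], [1.0, 3.0]]           # illustrative phi* values
picks = middle_layer(gain, cost, energy=4.0)
```

In this toy instance the search settles near $\mu = 1$, where subchannel 0 falls back to its cheap candidate and subchannel 1 keeps the high-rate one, using exactly the 4 units of energy available. With binary variables and only a few subchannels, a dual search like this sketch is not guaranteed to close the duality gap in every instance.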
The Lower Layer: Power Allocation
Rate maximization:
$$\underset{P}{\text{maximize}} \; I(P \mid \Omega_{u,k}) \quad \text{subject to} \quad \tau \cdot \mathrm{Tr}\left(P \cdot P^H\right) \le E$$
Two Optimization Methods:
• Convex optimization w.r.t. the Gram matrix $(HP)^\dagger HP$;
• Maximize a tight upper bound of the mutual information.
[Figure: Mutual Information (bits/s/Hz) vs. SNR (dB), from -20 to 20 dB, for H = [2 1; 1 1], Hd = svd(H) = [2.6180 0; 0 0.3820]. Curves: Hd, Gaussian, CWF; Hd, Gaussian, No WF; Hd, QPSK, Optimal P; Hd, QPSK, Max Div; Hd, QPSK, M/WF; Hd, QPSK, No WF; Hd, QPSK, CWF.]
The dual problem:
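$$\varphi^\star_{u,k,d} = \underset{P}{\text{minimize}} \; \varphi(P) = \tau \cdot \mathrm{Tr}\left(P \cdot P^H\right) \quad (4a)$$
$$\text{subject to} \quad I(P \mid \Omega_{u,k}) \ge r_d, \ \forall u, k, d. \quad (4b)$$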
C. Xiao, Y. R. Zheng, and Z. Ding, “Globally Optimal Linear Precoders for Finite Alphabet Signals over Complex Vector Gaussian Channels,” IEEE Trans. SP, July 2011.
W. Zeng, C. Xiao, M. Wang, and J. Lu, “Linear precoding for finite alphabet inputs over MIMO fading channels with statistical CSI,” IEEE Trans. SP, June 2012.
The Lower Layer Solution
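Step 1: Take the SVD of the channel correlation matrix, $\Phi_{u,k} = U_{u,k} \Sigma_{u,k} V_{u,k}$.
Let $P = U \cdot \sqrt{\mathrm{Diag}(\lambda)} \cdot V$. Set $U = U_{u,k}$. Set $V$ as the modulation diversity matrix.
Step 2: Approximate the average mutual information by
$$I_A(\lambda \mid \Omega_{u,k}) = N_t \log M_d - \frac{1}{M_d^{N_t}} \sum_{m=1}^{M_d^{N_t}} \log \sum_{n=1}^{M_d^{N_t}} \prod_q \left( 1 + \frac{\alpha_{u,k}\, \psi^{(q)}_{u,k}}{2\sigma^2}\, e^{(d)H}_{mn}\, \mathrm{Diag}(\lambda)\, \Sigma_{u,k}\, e^{(d)}_{mn} \right)^{-1}$$
where $\psi^{(q)}_{u,k}$ is the $q$-th eigenvalue of the receive correlation matrix $\Psi_{u,k}$.
Step 3: Solve for the power allocation $\lambda$ by
$$\underset{\lambda}{\text{minimize}} \; \tau \cdot \mathbf{1}^T \lambda \quad \text{subject to} \quad I_A\left(\lambda \mid \Omega_{u,k}\right) \ge r_d \ \text{and} \ \lambda \ge 0$$
using an interior-point algorithm with a log-barrier function.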
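The objective $\tau \cdot \mathbf{1}^T \lambda$ in Step 3 follows from the precoder structure in Step 1: when $U$ and $V$ are unitary, $\tau\,\mathrm{Tr}(PP^H)$ reduces to $\tau \sum_q \lambda_q$. The sketch below checks this numerically. It assumes unitary $U$ and $V$ (random ones stand in for the SVD factor and the modulation diversity matrix), and the helper names are hypothetical.

```python
import numpy as np


def random_unitary(n, rng):
    """Random unitary matrix from the QR factorization of a complex Gaussian matrix."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(a)
    return q


def build_precoder(U, lam, V):
    """P = U * sqrt(Diag(lambda)) * V."""
    return U @ np.diag(np.sqrt(lam)) @ V


def precoder_energy(P, tau=1.0):
    """E = tau * Tr(P P^H)."""
    return tau * np.trace(P @ P.conj().T).real


rng = np.random.default_rng(1)
lam = np.array([0.7, 0.3])                # illustrative power allocation
P = build_precoder(random_unitary(2, rng), lam, random_unitary(2, rng))
energy = precoder_energy(P, tau=1.0)
```

The energy depends only on $\lambda$, which is why the lower layer can optimize the power allocation separately from the choice of $U$ and $V$.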
Simulation Setup
• Network Resource Profile
  - number of subchannels $K = 128$; number of MCR schemes $D = 12$;
  - number of users $U = 9$; user priority weights $\rho_u = 1$ for all $u$;
• MIMO Fading Channel Profile
  - COST231-Hata propagation model with 6 multipath taps
  - per user: $2 \times 2$ MIMO
  - transmit correlation matrix $\Psi_t = \Psi(0.95)$
  - receive correlation matrix $\Psi_r = \Psi(0.5)$
  - the $(i, j)$-th element of $\Psi(\rho)$ follows the exponential model
    $[\Psi(\rho)]_{i,j} = \rho^{|i-j|}$, $\rho \in [0, 1)$ and $i, j = 1, \dots, N_t$ or $N_r$
• Energy Harvesting Profile
  - initial battery state: 1 unit; battery capacity $B_{\max} = 10$
  - harvested energy $E_k$, $\forall k$, takes its value in the set $\{0, 1, 2\}$ with equal probability;
  - the length of time slot $\tau = 1$
  - battery state set and power choice set are discretized as $\{0, 1, \dots, B_{\max}\}$
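The exponential correlation model and the harvesting profile from this setup are simple to reproduce. A minimal sketch, with hypothetical helper names `exp_corr` and `draw_harvest`:

```python
import numpy as np


def exp_corr(rho, n):
    """Exponential correlation matrix with entries rho^|i-j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def draw_harvest(T, rng):
    """Harvested energy per slot, uniform over {0, 1, 2}."""
    return rng.integers(0, 3, size=T)


Psi_t = exp_corr(0.95, 2)     # transmit correlation
Psi_r = exp_corr(0.5, 2)      # receive correlation
harvest = draw_harvest(40, np.random.default_rng(0))
```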
Performance: Upper Layer
[Figure: Sum of Average Mutual Information (Mbit), from 400 to 1000, vs. Power from Stable Energy Source $P_r$, from 0 to 3. Curves: Proposed Precoder, No Precoding, and Max. Capacity, each with Dynamic Programming and with the Greedy Method.]
Achievable AMI vs. regular power source Pr with T = 40 and Bmax = 10.
Performance: Middle Layer
[Figure: Sum of Average Mutual Information (Mbps), from 5 to 30, vs. Energy Consumption (dB), from -10 to 10. Curves: Max. Capacity: Gaussian inputs with Subchannel Selection; Linear Programming: MCR and Subchannel Selection; Proposed Precoder: MCR and Subchannel Selection; No Precoding: MCR and Subchannel Selection; Max. Capacity: MCR and Subchannel Selection.]
The proposed algorithm approaches the Gaussian input upper bound.
Performance: Lower Layer
[Figure: Average Mutual Information (bps/Hz), from 1 to 7, vs. Energy Consumption (dB), from -20 to 10. Curves: Max. Capacity: Gaussian; Proposed Precoder: BPSK, QPSK, 16-QAM; No Precoding: BPSK, QPSK, 16-QAM; Max. Capacity: BPSK, QPSK, 16-QAM. Rate levels are marked for BPSK, QPSK, and 16-QAM with coding rates 1/2, 2/3, 3/4, 4/5.]
Energy consumption vs. AMI for 2× 2 correlated MIMO channels.
Concluding Remarks
• Online resource allocation for energy harvesting wireless networks: a multi-dimensional SDP problem.
• An innovative layered decomposition approach solves the SDP optimization with preserved optimality.
• Computational complexity with $KUD$ subchannels, $MJN$ battery/energy levels, and $2N_t^2 KUD$ real-valued precoder coefficients:
Direct computation of SDP:
$$O\left( TMJ \cdot \left( N^{2N_t^2 KUD} + 2^{KUD} \right) \cdot f_O \right)$$
$f_O$ = complexity for computing the MIMO Average Mutual Information.
The proposed 3-layer algorithm:
$$O\left( TMJN + N \cdot KUD + \sqrt{N_t} \cdot KUD \cdot f_A \right)$$
$f_A = 7.2 \times 10^{-8} f_O$ for QPSK; $f_A = 3.2 \times 10^{-6} f_O$ for BPSK.
Security in Cognitive Radio Networks
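$$\underset{P}{\text{maximize}} \; R_{sc}(P \mid \Omega) = \sum_{i=1}^{I} \min_{1 \le j \le J} \left[ \mathbb{E}_{H_i} I(x_i; y_i) - \mathbb{E}_{G_j} I(x_i; z_j) \right]^+$$
subject to $\mathrm{Tr}(P^\dagger P) \le \gamma_0$ and $\mathrm{Tr}(P^\dagger \Theta_{f_k} P) \le \gamma_k, \ \forall k$,
where $\Omega = \{\Theta_{h_i}, \Theta_{g_j}, \Theta_{f_k}\}, \ \forall i, j, k$.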
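[Figure: Cognitive radio network. The secondary transmitter ST connects to secondary receivers SR-1, ..., SR-I over channels $H_1, \dots, H_I$, to eavesdroppers ED-1, ..., ED-J over channels $G_1, \dots, G_J$, and to primary receivers PR-1, ..., PR-K over channels $F_1, \dots, F_K$.]
ST: secondary transmitter, which designs the precoder $P$;
PR-k: $k$th primary user with channel $F_k$;
SR-i: $i$th secondary receiver with channel $H_i$;
ED-j: $j$th eavesdropper with channel $G_j$.
Algorithms: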
• ST utilizes channel statistical information Ω;
• Difference of Convex Functions in terms of $W = P^\dagger H^\dagger H P$;
• Generalized Quadratic Matrix Programming (GQMP).
J. Jin, Y.R. Zheng, W. Chen, and C. Xiao, “Generalized Quadratic Matrix Programming: A Unified Framework for Linear Precoding With Arbitrary Input Distributions,” IEEE Trans. Signal Processing. Accepted June 2017.
High Data-Rate Underwater Acoustic Communications
Severe Multipath and Doppler Fading Channels Cause Phase Rotation.
[Figure: Scattering functions, delay (ms) vs. Doppler (Hz), for the moving-source and fixed-source cases.]
[Figure: Constellation plots (imaginary vs. real) in four panels: received signals of first channel; equalized signals of 4 channels; symbols after initial phase correction; phase-corrected signals of 4 channels.]
Algorithms:
• Multiple-Input Multiple-Output (MIMO);
• TD and FD Turbo Equalization;
• Adaptive Channel Estimation or Direct Adaptation.
[Figure: Turbo receiver block diagram. A MIMO LC-MMSE equalizer, fed by the received signal $y$ and an iterative MIMO channel estimator (with pilots and noise variance $\sigma^2$), exchanges extrinsic LLRs with $N$ MAP decoders through interleavers $\Pi$ and deinterleavers $\Pi^{-1}$, with MUX and DEMUX blocks.]
Y.R. Zheng, J. Wu, and C. Xiao, “Turbo Equalization for Underwater Acoustic Communications,” IEEE Commun. Mag., Vol. 53, No. 11, pp. 79-87, Nov. 2015.
Y.R. Zheng and Weimin Duan, “Improved Turbo Receivers for underwater acoustic communications,” US Patent Application #62483358. Pending. April 9, 2017.
Compressive Sensing Using L1 Optimization
Near-Field SAR Image Reconstruction
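[Figure: Processing flow for near-field SAR image reconstruction. The forward (reference) SAR transform $\Omega$ takes the measurements $f(x, y, \omega)$ through a 2D FFT, the spectrum $F(k_x, k_y, \omega)$, a mapping between $\omega$ and $k_z$, and an inverse NUFFT to the image $s(x, y, z)$; the reverse transform $\Omega^{-1}$ (R-SAR) maps the image back to $\hat{f}(x, y, \omega)$, with truncation and truncation-repair steps.]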
3D SAR Transform by NUFFT:
$$s(x, y, z) = \mathcal{F}^{-1}_{2D}\left\{ \mathcal{F}^{-1}_{NU}\left\{ F(k_x, k_y, \omega) \right\} \right\}$$
with $F(k_x, k_y, \omega) = \mathcal{F}_{2D}\{f(x, y, \omega)\} \exp(-j z_0 k_z)$ and wavenumber $k_z = \sqrt{4k^2 - k_x^2 - k_y^2}$.
Matrix form CS-SAR by minimizing:
$$J(s) = \lambda_\ell \|\Psi s\|_1 + \lambda_t \mathrm{TV}(s) + \left\| f - \Phi \Omega^{-1} s \right\|_2^2$$
where $\Psi$: sparsifying domain; $\Phi$: measurement matrix; $\Omega$: SAR transform.
Algorithms:
• Adaptive Bases Selection within CS Iterations (e.g., OMP, ADM, CG);
• Split-Bregman Algorithm for 3D CS-SAR w/ Fast Convergence;
• Adaptive Measurement via Non-reference Image Quality Assessment.
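To give a feel for L1-regularized reconstruction, the sketch below minimizes a simplified version of the objective with plain iterative soft-thresholding (ISTA). This is a swapped-in technique, not the Split-Bregman algorithm named above, and the simplifications are assumptions: $\Psi = I$, no TV term, real-valued data, and a generic matrix `A` standing in for $\Phi\Omega^{-1}$.

```python
import numpy as np


def soft_threshold(x, thr):
    """Proximal operator of thr * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)


def objective(A, f, s, lam):
    """J(s) = lam * ||s||_1 + ||f - A s||_2^2 (simplified objective)."""
    return lam * np.sum(np.abs(s)) + np.sum((f - A @ s) ** 2)


def ista(A, f, lam, n_iter=500):
    """Iterative soft-thresholding with step 1/L, L = 2 * ||A||_2^2."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ s - f)       # gradient of the data-fit term
        s = soft_threshold(s - grad / L, lam / L)
    return s


# Sparse recovery from undersampled, noiseless measurements (illustrative sizes).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
s_true = np.zeros(50)
s_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
f = A @ s_true
s_hat = ista(A, f, lam=0.1)
```

With step size $1/L$ the objective does not increase from one iteration to the next, so the final value is below the starting value at $s = 0$.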
D. Bi, L. Ma, X. Xie, Y. Xie, X. Li, and Y. R. Zheng, “A Splitting Bregman-Based Compressed Sensing Approach for Radial UTE MRI,” IEEE Transactions on Applied Superconductivity, Vol. 26, No. 7, pp. 1-5, Oct. 2016.
X. Yang, Y.R. Zheng, M. T. Ghasr, and K. Donnell Hilgedick, “Microwave Imaging from Sparse Measurements for Nearfield Synthetic Aperture Radar (SAR),” IEEE Trans. Instrumentation Measurement, In press Mar. 2017.