Download - Stochastic Dynamic Programming for Network Resource Allocationyrz218/Resources/Zheng2017... · Stochastic Dynamic Programming for Network Resource Allocation Yahong Rosa Zheng Department

Stochastic Dynamic Programming for Network

Resource Allocation

Yahong Rosa Zheng

Department of Electrical & Computer EngineeringMissouri University of Science and Technology, USA

The work was supported by NSF, ONR, and Dept. of Transportation.

Introduction Problem Formulation Resource Allocation Numerical Results Related Works

Summary of My Research Projects

Secure

Cognitive/Relay

Networks

FPGA/DSP

Hardware

Implementation

MIMO

Wireless

Transceiver

Network

Resource

Networks

MIMO for

Massive

5G Cellular

Intelligent

Transportation

Systems

Smart &

Connected

Communities

Underwater

Infrastructure

Monitoring

Sensor

Underwater

Processing

Signal

Nearfield Array

Fading

Modeling

Channel

SAR & MRI

Sensing Allocation

Compressive

DSP Communication Networks Applications

Y. R. Zheng Stochastic Dynamic Programming 2 \ 25


Markov Decision Process

• States: S1, S2, S3, · · · and Actions: a0, a1, · · · ;• Transition Probabilities: T (Sj |ai , Si )– probability of transition to State Sj from State Si by taking Action ai ;

• Expected Rewards: R(Sj |ai , Si) - reward of transition to State Sj bytaking Action ai at State Si .

a0

a0

a0

a1

a1

a1S1

S2

S3

0.3

0.5

0.3

0.6

0.8

0.6

0.2

0.1

0.4

0.5

+7

-8

0.7

State Transition Diagram

T TStates

Value

Actions

Functions

stst−1 st+1

rtrt−1 rt+1

atat−1 at+1

MDP model



Solution to an MDP

A policy π : S → A maps each state to a desirable action.Optimal solution optimizes the expected discounted rewards over a horizon[0,T ]:

T∑

t=0

γtR (st+1|π(st), st)

Value function of State s is defined as

V (s) =∑

s′

T (s ′|a, s) [R(s ′|a, s) + γV (s ′)]

Bellman equation: value iteration

Vi+1(s) = maxa

∑

s′

T (s ′|a, s) [R(s ′|a, s) + γVi(s′)]



Applications: Wireless Network with Energy Harvesting

• Wireless Internet of Things (IoT) for ITS;

• Smart sensors for underwater infrastructure monitoring;

• Cellular base stations powered by renewable energy.

IoT for Intelligent Transportation Systems

antennaWi-Fi

Vehicle

GatewayNode

MobileTower

Cellphone Wi-Fiantenna

Cellphoneantenna

Piles

Transducer

Soil deposits

Hydrophone Current flow

River bank

Smart rocks

Pier

Bridge Deck

Natural rocks

Scour hole

Acoustic

Water

Communication

Smart Rocks for Bridge Scour Monitoring



Network Resource Allocation

Design Goals:

• Utilize harvested energy as much as possible;

• Ensure network performance.

Research Challenges:

• Harvested energy is non-stable: followsstatistical models;

• Mobile communication channels aretime-variant and stochastic.

Existing Approaches:

• Online vs. Offline ⇒ Dynamic vs. Static;

• Channel State Information: Instantaneous vs.Statistics

• Gaussian Inputs vs. Finite Alphabet Inputs

Wireless Base Station Powered by

Renewable Energy Sources



System Model

y(i)u,k,d = H

(i)u,k · Pu,k,d · x(i)u,k,d + n

(i)u,k with H

(i)u,k =

√αu,k ·Ψ

12

u,kΘ(i)u,kΦ

12

u,k

i : index for channel realization; u = 1 : U user index

k = 1 : K subchannel index; d = 1 : D MCR index.



MIMO Communication Networks

OFDMA – Orthogonal Frequency-Division Multiple AccessSC-FDMA – Single-Carrier Frequency-Division Multiple Access

Equ

aliz

e

Subcarrier Mapping End

Front

Subcarrier Mapping

IDFT

IDFTMCR DFT

SubcarrierDe−map

Detect

Decode

DFTPrecode

MCR Precode Detect

Decode

EndFront

DFT

Equ

aliz

e

SubcarrierDe−map

Users 2:U

CPIP/S

...

SC−FDMA:...

Decode

OFDMA:

CPIP/S

Cha

nnel

Channel CDI

IDFTDecode

...

...

Users 2:U

xu,k,d

xu,k,d

bu

burd

rd

H

Ωu,k = αu,k , Φu,k ,Ψu,k

Pu,k,d P−1u,k,d

su,k,d

su,k,d

su,k,d

su,k,dF

FF†

F†

Pu,k,d P−1u,k,d

Spectral efficiency rd = cd · Nt · log2 Md .where cd – coding rate; Md – Constellation Size of Modulation;

Linear precoders Pu,k,d : Nt × Nt complex matrices;Binary indicators su,k,d [t] ∈ 0, 1 – resource assignment table



Objectives of Resource Allocation

Network Throughput = Expected Sum of Average Mutual Information (AMI)

I(Pu,k,d |Ωu,k ) = Nt logMd −1

MNt

d

MNtd

∑

m=1

EHu,kEnu,k log

MNtd

∑

n=1

evmn/σ

2

where vmn = ‖Hu,kPu,k,d (xm,d − xn,d) + nu,k‖2 − ‖nu,k‖2.

Contrast to Gaussian Inputs with InstantaneousChannel State Information:

IG(Pu,k |Hu,k ) = log det(

I+Hu,kPu,kP†u,kH

†u,k/σ

2)

Solution is classic waterfilling for Gaussian inputsor mercury waterfilling for finite alphabet.However, Rx cannot distinguish symbol from noise;

Channel estimated at t1 is different from that at t2.−2

−1

0

1

2

−2

−1

0

1

20

0.05

0.1

0.15

0.2

0.25

Magnitude of xMagnitude of y

Pro

babili

ty

QPSK modulation vs. Gaussian inputs



Online vs. Offline

Energy dynamics: B[t + 1] = min(B[t] + G [t]− (E [t]− τPr )+,Bmax);

Battery states: B[t] = B1 : BM ; Harvested energy G [t] = G1 : GJ ;Actions: energy allocation E [t] = E1 : EN ;where

E [t] =

U∑

u=1

K∑

k=1

D∑

d=1

su,k,d [t] · Eu,k,d [t]

with Eu,k,d [t] = τ · Tr(

Pu,k,d [t] · PHu,k,d [t]

)



Online Resource Allocation

SDP:maximize

P,SR

(j)SUM

(

P ,S)

(j = 1 : T ; time slot index)

subject to B[T + 1] ≥ QB , E [t] ≤ τPr + B[t]

U∑

u=1

D∑

d=1

su,k,d [t] ≤ 1, ∀k , t.

Value function:

R(j)SUM

(

P ,S)

= Rj

(

P [1],S[1]∣

∣Ω,B[1])

+

T∑

t=j+1

EB

Rt

(

P [t],S[t]∣

∣Ω,B[t])

where P[t] = Pu,k,d [t] : ∀u, k , d −→ P = P[t] : ∀t

S[t] = su,k,d [t] : ∀u, k , d −→ S = S[t] : ∀t

Rt

(

P[t],S[t]∣

∣Ω,B[t])

= W ·

U∑

u=1

ρu

K∑

k=1

D∑

d=1

rd · su,k,d [t]

with I(

Pu,k,d [t]∣

∣Ωu,k

)

≥ rd · su,k,d [t], ∀u, k , d



Break the Curse of Dimensionality

Existing energy harvesting designs

• Offline designs with non-causalenergy arrival profile:[Ozel2011,JSAC],[Gregori2013,TC]

• Online designs with causalenergy arrival profile:SISO – [Ho2012,TSP]Heuristic – [Ng2013,TWC]

Our approach:

• Layered decomposition achievesglobal optimal solution with lowcomplexity

W. Zeng, Y.R. Zheng, and R. Schober, “Online Resource Allocation for Energy Harvesting Downlink Multiuser Systems: Precoding withModulation, Coding Rate, and Subchannel Selection,” IEEE Trans. Wireless Commun., Oct. 2015.



The Upper Layer: a One-Dimensional SDP Problem

maximizeE

RUL

(

E)

= R⋆

ML(E [1]∣

∣B[1])

+

T∑

t=2

EB

R⋆

ML

(

E [t]∣

∣B[t])

(1a)

subject to B[T + 1] ≥ QB (1b)

B[t + 1] = min

B[t] + G [t]− (E [t]− τPr )+,Bmax

, ∀t (1c)

E [t] ≤ τPr + B[t], ∀t (1d)

......

......

...

Time Slot

... ...Time Slot Time Slot Time Slot Time Slot

State

State

State

Direction for Backward Recursion

...

Time Slot

...

Virtual

Time Slot

... ...

... ...

......



The Middle Layer: a Knapsack Problem

maximizeS

RML(S) = W ·U∑

u=1

ρu

K∑

k=1

D∑

d=1

rd · su,k,d (2a)

subject to

U∑

u=1

K∑

k=1

D∑

d=1

su,k,d · ϕ⋆

u,k,d ≤ E ≤ τPr + B (2b)

U∑

u=1

D∑

d=1

su,k,d ≤ 1, ∀k , su,k,d ∈ 0, 1, ∀u, k , d (2c)

Method of Lagrangian Multiplier with µ for the energy constraint:Step 1 Solve K convex optimization problems:

maximizeSk

Lk(

Sk , µ)

subject to

U∑

u=1

D∑

d=1

su,k,d ≤ 1

where Lk(Sk , µ) =

U∑

u=1

D∑

d=1

su,k,d ·(

W · ρurd − µ · ϕ⋆

u,k,d

)



The Middle Layer Solution

Step 2 For a given µ, compute

g(µ) = maximizeS

L(S, µ) =

K∑

k=1

Lk(Sk , µ) + µ · E

Step 3 Solve forminimize

µ≥0g(µ)

Since g(µ) is piecewise linear in µ → Step 3 is solved by subgradient-aidedbisection search.The gradient is computed at g(µ) at µ = (µl + µh)/2 as

∇µg(µ) = E −U∑

u=1

K∑

k=1

D∑

d=1

s(µ)u,k,d · ϕ⋆

u,k,d

where s(µ)u,k,d : Optimal solution in Step 1 with a given µ.



The Lower Layer: Power Allocation

Rate maximization:

maximizeP

I(P∣

∣Ωu,k )

subject to τ · Tr(

P · PH)

≤ E

Two Optimization Methods:

• Convex optimization w.r.t GramMatrix (HP)†HP;

• Maximize a Tight Upper Bound ofMutual Information. −20 −10 0 10 20

0

1

2

3

4

H = [2 1; 1 1], Hd = svd(H) = [2.6180 0; 0 0.3820]

Mut

ual

Info

rmat

ion

(bi

ts/s

/Hz)

SNR (dB)

Hd, Gaussian, CWF

Hd, Gaussian, No WF

Hd, QPSK, Optimal P

Hd, QPSK, Max Div

Hd, QPSK, M/WF

Hd, QPSK, No WF

Hd, QPSK, CWF

The dual problem:

ϕ⋆

u,k,d = minimizeP

ϕ(P) = τ · Tr(

P · PH)

(4a)

subject to I(P∣

∣Ωu,k) ≥ rd , ∀u, k , d . (4b)

C. Xiao, Y. R. Zheng, and Z. Ding, “Globally Optimal Linear Precoders for Finite Alphabet Signals over Complex Vector Gaussian Channels,”IEEE Trans. SP, July 2011.W. Zeng, C. Xiao, M. Wang, and J. Lu, “Linear precoding for finite alphabet inputs over MIMO fading channels with statistical CSI,” IEEE Trans.

SP, June 2012.



The Lower Layer Solution

Step 1: SVD channel correlation matrix Φu,k = Uu,kΣu,kVu,k

Let P = U ·√

Diag(λ) · V. Set U = Uu,k .Set V as the modulation diversity matrix.

Step 2: Approximate the average mutual information by

IA(λ|Ωu,k ) = Nt logMd−1

MNtd

MNtd

∑

m=1

log

MNtd

∑

n=1

∏

q

(

1+αu,kψ

(q)u,k

2σ2e(d)Hmn Diag(λ)Σu,k e

(d)mn

)−1

.

where ψ(q)u,k is the q-th eigenvalue of the receive correlation matrix Ψu,k .

Step 3: Solve for the power allocation λ by

minimizeλ

τ · 1Tλ

subject to IA(

λ∣

∣Ωu,k

)

≥ rd and λ ≥ 0

With the interior-point algorithm with the log-barrier function.



Simulation Setup

• Network Resource Profile # of subchannels K = 128, # of MCR schemes D = 12; # of Users U = 9, user priority weights ρu = 1 for all u ;

• MIMO Fading Channel Profile COST231-Hata propagation model with 6 multipath taps Per User: 2× 2 MIMO transmit correlation matrix Ψt = Ψ(0.95) receive correlation matrix Ψr = Ψ(0.5) the (i , j)-th element of Ψ(ρ) follows the exponential model

[Ψ(ρ)]i,j = ρ|i−j|, ρ ∈ [0, 1) and i , j = 1, · · · ,Nt or Nr

• Energy Harvesting Profile initial battery state: 1 unit; battery capacity Bmax = 10 harvested energy Ek , ∀k , takes its value in set 0,1,2 with equal

probability; the length of time slot τ = 1. battery state set and power choice set are discretized as 0, 1, · · · ,Bmax



Performance: Upper Layer

0 0.5 1 1.5 2 2.5 3400

500

600

700

800

900

1000

Power from Stable Energy Source: Pr

Sum

of A

vera

ge M

utu

al In

form

ation (

Mbit)

Proposed Precoder: Dynamic Programming

Proposed Precoder: Greedy Method

No Precoding: Dynamic Programming

No Precoding: Greedy Method

Max. Capacity: Dynamic Programming

Max. Capacity: Greedy Method

Achievable AMI vs. regular power source Pr with T = 40 and Bmax = 10.



Performance: Middle Layer

−10 −5 0 5 105

10

15

20

25

30

Energy Consumption (dB)

Sum

of A

vera

ge M

utu

al In

form

ation (

Mbps)

Max. Capacity: Gaussian inputs with Subchannel Selection

Linear Programming: MCR and Subchannel Selection

Proposed Precoder: MCR and Subchannel Selection

No Precoding: MCR and Subchannel Selection

Max. Capacity: MCR and Subchannel Selection

The proposed algorithm approaches the Gaussian input upper bound.



Performance: Lower Layer

−20 −15 −10 −5 0 5 10

1

2

3

4

5

6

7

Energy Consumption (dB)

Avera

ge M

utu

al In

form

ation (

bps/H

z)

Max. Capacity: Gaussian

Proposed Precoder: BPSK, QPSK, 16−QAM

No Precoding: BPSK, QPSK, 16−QAM

Max. Capacity: BPSK, QPSK, 16−QAM

BPSK, 1/2, 2

/3, 3/4, 4

/5Q

PSK, 1/2

, 2/3

, 3/4

, 4/5

16−Q

AM, 1

/2, 2

/3, 3

/4, 4

/5

Energy consumption vs. AMI for 2× 2 correlated MIMO channels.



Concluding Remarks

• Online resource allocation for energy harvesting wireless networks:multi-dimensional SDP problem.

• An innovative layered decomposition approach solves the SDPoptimization with preserved optimality

• Computational complexity with KUD subchannels, MJN battery/energylevels, and 2N2

t KUD real-valued precoder coefficients:

Direct computation of SDP:

O(

TMJ · (N2N2t KUD + 2KUD) · fO

)

fO = complexity for computing theMIMO Average Mutual Information.

The Proposed 3-layer algorithm:

O(

TMJN+N ·KUD+√

Nt ·KUD ·fA)

.

fA = 7.2× 10−8fO for QPSKfA = 3.2× 10−6fO for BPSK.



Security in Cognitive Radio Networks

maximizeP

Rsc

(

P|Ω)

=

I∑

i=1

min1≤j≤J

[

EHiI(xi ; yi)− EGj

I(xi ; zj)]+

subject to Tr(P†P) ≤ γ0 & Tr(P†ΘfkP) ≤ γk , ∀k .where Ω = Θhi ,Θgj ,Θfk, ∀i , j , k .

SR-I

SR-2

SR-1PR-1

PR-2

PR-K

ED-1 ED-2 ED-J

... ...

ST

...

H1

H2

HI

...

G1 G2... GJ

F1

F2

...

FK

ST – secondary transmitter to design precoder P;PR-k – kth primary user with channel Fk ;SR-i – ith secondary receiver with channel Hi ;ED-j – jth eavesdropper with channel Gj ;Algorithms:

• ST utilizes channel statistical information Ω;

• Difference of Convex Functions in terms ofW = P†H†HP;

• Generalized Quadratic Matrix Programming(GQMP).

J. Jin, Y.R. Zheng, W. Chen, and C. Xiao, “Generalized Quadratic Matrix Programming: A Unified Framework for Linear Precoding WithArbitrary Input Distributions,” IEEE Trans. Signal Processing. Accepted June 2017.



High Data-Rate Underwater Acoustic Communications

Severe Multipath and Doppler Fading Channels Cause Phase Rotation.

Doppler (Hz)

Del

ay (

ms)

Scattering function (moving−source)

10 15 200

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0.5

1

1.5

2

2.5

3

x 106

Doppler (Hz)

Del

ay (

ms)

Scattering function (fixed−source)

−5 0 50

2

4

6

8

10

12

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2x 10

6

−1 −0.5 0 0.5 1

x 105

−1

−0.5

0

0.5

1

x 105

Imag

inar

y

Real

Received signals of first channel

−1.5 −1 −0.5 0 0.5 1 1.5−1.5

−1

−0.5

0

0.5

1

1.5

Imag

inar

y

Real

Equalized signals of 4 channels

−1.5 −1 −0.5 0 0.5 1 1.5

−1.5

−1

−0.5

0

0.5

1

1.5

Imag

inar

y

Real

Symbols after initial phase correction

−1 −0.5 0 0.5 1

−1

−0.5

0

0.5

1

Imag

inar

y

Real

Phase−corrected signals of 4 channels

Algorithms:• Multiple-Input

Multiple-Output(MIMO);

• TD and FD TurboEqualization;

• Adaptive ChannelEstimation or DirectAdaptation.

MIMOLC-MMSE equalizer

H

MAP decoder

y

Iterative MIMOchannel estimator

Pilot

2σ

1Π

1,ˆ

pb

Π

1,( )Ee kL c

1,( )De kL c

1, '( )Ee kL c

1, '( )De kL c

DE

MU

XM

UX

LLRs

MAP decoder

1 Π

,ˆ

N pb

Π

,( )Ee N kL c

,( )De N kL c

, '( )Ee N kL c

, '( )De N kL c

⋮

⋮

⋮ ⋮ ⋮

LLRs

s

Y.R.Zheng, J. Wu, and C. Xiao, “Turbo Equalization for Underwater Acoustic Communications,” IEEE Commun. Mag., Vol. 53, No. 11, pp.79-87, Nov. 2015.Y.R.Zheng and Weimin Duan, “Improved Turbo Receivers for underwater acoustic communications,” US Patent Application #62483358.Pending. April 9, 2017.



Compressive Sensing Using L1 Optimization

Near-Field SAR Image Reconstruction

( )zkkS yx ,,

( )zyxs ,,

( )zyx kkkF ,,′

( )ω,, yx kkF

Truncation Repair

SAR R-SAR

(Truncation)

( )ω,, yxf

.1−NU!

.12−D" .2D#

.NU$

222

2 zyx kkk ++= νω222

2yxz kkk −−%

&

'()

*=νω

.2D#

ReferenceForward

ReverseReference

.12−D"

( )ω,,ˆ yxf

( )zyx kkkF ,,

I

)(Ω )( 1−Ω

3D SAR Transform by NUFFT:s(x , y , z) = F−1

2D

F−1NU

F (kx , ky , ω)

with F (kx , ky , ω) = F2D f (x , y , ω) exp(−jz0kz)and wavenumber kz =

√

4k2 − k2x − k2

y

Matrix form CS-SAR by minimizating:

J (s) = λℓ ‖Ψs‖1 + λtTV(s) +∥

∥

∥f −ΦΩ

−1s∥

∥

∥

2

2

where Ψ – sparsifying domain; Φ –measurement matrix; Ω – SAR transform.Algorithms:

• Adaptive Bases Selection within CSIterations (i.e. OMP, ADM, CG);

• Split-Bregman Algorithm for 3D CS-SARw/ Fast Convergence;

• Adaptive Measurement via Non-referenceImage Quality Assessment.

D. Bi and L. Ma and X. Xie and Y. Xie and X. Li and Y. R. Zheng, “A Splitting Bregman-Based Compressed Sensing Approach for Radial UTEMRI,” IEEE Transactions on Applied Superconductivity, Vol. 26, No. 7, pp. 1-5. Oct. 2016.X. Yang, Y.R. Zheng, M. T. Ghasr, and K. Donnell Hilgedick, “Microwave Imaging from Sparse Measurements for Nearfield Synthetic ApertureRadar (SAR),” IEEE Trans. Instrumentation Measurement, In press Mar. 2017.


Download - Stochastic Dynamic Programming for Network Resource Allocationyrz218/Resources/Zheng2017... · Stochastic Dynamic Programming for Network Resource Allocation Yahong Rosa Zheng Department

Top Related