TRANSCRIPT
Wireless Systems Design in the Beyond 5G Era: Promises of Deep Learning and Deep Reinforcement Learning
Ekram Hossain
09 November 2020
Outline
• Part I: Introduction
• Part II: NN, DNN, and Deep Learning
• Part III: DL and DRL Models for Resource Allocation
• Part IV: Conclusion
Evolution toward next generation wireless networks
[Figure: Evolution from a single-tier cellular network to a 4G/5G multi-tier cellular network (dense small cells with pico cells, cell-edge users served via relays, relay-aided D2D communication and D2D links, M2M devices/sensors with M2M gateways, indoor femto access points with femto gateways, C-RAN with remote radio heads and fronthaul links, wireless access, backhaul, and core/Internet connectivity) and further to the enhanced 5G RAN and B5G: SDN controller, edge servers, enhanced Evolved Packet Core (EPC), mm-wave massive MIMO BSs, aerial and underwater drones, cell-free massive MIMO, VLC, group mobility, and the smart grid/energy Internet.]
Next generation wireless networks: Beyond 5G
• Internet-of-Everything
• Converged terrestrial and non-terrestrial communication networks
• Massive capacity, massive connectivity, energy efficiency
• Programmable radio environment
• Convergence of communications and computing
• Virtualization/network slicing and “SDN”-ization of converged communication networks
Network management and control in B5G era
• AI-based super-intelligent RAN for 6G: resource allocation (communications and computing) optimization, cell and radio design, network personalization, etc.
• Optimal solutions are obtained by applying exhaustive search methods, genetic algorithms, and combinatorial and branch-and-bound techniques
◦ These incur significantly high time and computational complexity
Network management and control in B5G era
• Sub-optimal solutions are obtained based on techniques such as
◦ Lagrangian relaxations, iterative distributed optimization, heuristic algorithms, and game theory
• These are also very computation intensive and/or may not be feasible for large cellular networks due to high signaling overhead
• Sub-optimal solutions can be far from the optimal solutions, and their convergence properties and optimality gaps may be unknown.
Machine learning basics
Essence of data-driven machine learning:
• A pattern exists which cannot be pinned down mathematically.
• Data is available.
• Learn from the input data set by analyzing the data statistically.
• Make decisions or predict outputs without being explicitly programmed.
• Typical applications: image recognition, speech recognition, medical diagnosis, etc.
• Applications to cellular communications: decoding, modulation classification, spectrum sensing, localization, mobility prediction, handoff management, resource optimization, load balancing/cell association, power control, fault detection, fault classification, caching, traffic classification, congestion prediction
Machine learning basics
Learning diagram

[Figure: The learning diagram. An unknown target function f: X → Y (the ideal credit-approval function) generates the training examples (x_1, y_1), ..., (x_N, y_N) (historical records of credit customers). A learning algorithm A selects, from a hypothesis set H (the set of candidate formulas), a final hypothesis g ≈ f (the final credit-approval formula).]

* Yaser Abu-Mostafa, lecture slides on “Learning from Data”, https://work.caltech.edu/lectures.html#lectures
Resource management based on machine learning
• With machine learning (ML), solutions are learned by exploiting the data samples
• ML tools can be used to obtain practical solutions for radio resource allocation problems in a large cellular network, given the past optimal or near-optimal resource allocation decisions.
• ML-based resource allocation algorithms can be implemented online.
• Note: Deep learning (DL) is a sub-class of ML based on “deep neural networks”, which use multiple layers of nonlinear processing units.
Resource management based on machine learning
• No performance guarantee (suboptimal performance)
• Lack of interpretability (black-box mapping of inputs to outputs)
• Depends on the availability of data

O. Simeone, “A very brief introduction to machine learning with applications to communication systems,” IEEE Trans. on Cognitive Communications and Networking, vol. 4, no. 4, Dec. 2018.
When to use machine learning?
• No mathematical model or efficient algorithm exists (modeling and/or algorithmic deficit)
• The task involves a function that maps well-defined inputs to well-defined outputs
• The function does not change rapidly over time
• Large data sets can be made available
• Error can be tolerated, and optimal solutions are not required
Resource management based on machine learning
Example: Sub-band and power allocation in multi-cell networks

• Consider a downlink cellular network with $K$ base stations ($k \in \{1, \cdots, K\}$) and $F$ frequency sub-bands
• The bandwidth of each sub-band is $B$; the power allocated by cell $k$ in frequency sub-band $f$ is $P_{k,f}$, which is discrete
• The total power of cell $k$ is limited by a maximum value $P_k^{\max}$ such that $\sum_{f \in \mathcal{F}} P_{k,f} \le P_k^{\max}$, $\forall k \in \{1, \cdots, K\}$; $\mathcal{U}_k$ = set of users associated with cell $k$
• Vector $A_{k,f}$ denotes the allocation of sub-band $f$ in cell $k$, and $P_{k,f}$ denotes the allocation of power in sub-band $f$ in cell $k$

K. I. Ahmed, H. Tabassum, and E. Hossain, “Deep learning for radio resource allocation in multi-cell networks,” https://arxiv.org/abs/1808.00667
Resource management based on machine learning
Sub-band and power allocation in multi-cell networks (contd.)
Sum-utility for all cells:

$$U = \sum_{k \in \{1, \cdots, K\}} \sum_{u \in \mathcal{U}_k} \sum_{f=1}^{F} \mathbb{I}(A_{k,f} = u)\, B \log\left(1 + \alpha\, \mathrm{SINR}_{u,k,f}\right)$$

where $\alpha = -1.5/\log(5\,\mathrm{BER})$
Resource management based on machine learning
Sub-band and power allocation in multi-cell networks (contd.)

Optimization problem:

$$\underset{A_{k,f},\, P_{k,f}}{\text{maximize}} \;\; U \qquad \text{subject to} \;\; \sum_{f \in \mathcal{F}} P_{k,f} \le P_k^{\max}, \;\; \forall k \in \{1, \cdots, K\}$$

• A non-convex combinatorial optimization problem
• The optimal solution can be obtained by an exhaustive search. With $K$ cells, $F$ sub-bands, and $P$ discrete power levels, there are $(P^F)^K = P^{KF}$ possible combinations for the power settings at the base stations
• e.g., for $K = 15$, $F = 5$, $P = 5$: $3125^{15}$ possible combinations ⟹ infeasible (see the sketch below)
• For joint sub-band and power allocation: $\left(\sum_k U_k\right)^F \times P^{KF}$ possibilities
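To get a feel for the scale, a quick back-of-the-envelope check in Python (a minimal sketch; K, F, and P are the values from this slide):

```python
# Size of the exhaustive power-allocation search space.
K, F, P = 15, 5, 5                 # cells, sub-bands, discrete power levels

assert 3125 ** 15 == 5 ** 75       # 3125 = 5^5, so (P^F)^K = P^(K*F)
print(f"{P ** (K * F):.3e}")       # ~2.6e+52 candidate power settings
```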
Resource management based on machine learning
Example: Optimal user association for sum-rate maximization in an interference channel

• Assume $K$ transmitters and $M$ receivers; each transmitter can serve only one user; one receiver can be served by at most $a_m$ transmitters; and there is a minimum rate constraint for each user
• SINR for transmission from $k$ to $m$:

$$\gamma_{k,m} = \frac{p_k g_{km}}{\sigma^2 + \sum_{l \ne k} p_l g_{lm}}$$

• User association variable:

$$\rho_{k,m} = \begin{cases} 1, & \text{if transmitter } k \text{ serves receiver } m \\ 0, & \text{otherwise} \end{cases}$$

• Sum rate $= B \sum_{k=1}^{K} \sum_{m=1}^{M} \rho_{k,m} \log_2(1 + \gamma_{k,m})$

A. Zappone, M. di Renzo, and M. Debbah, “Wireless networks design in the era of deep learning: Model-based, AI-based, or both?,” arXiv:1902.02647v1
Resource management based on machine learning
Problem formulation:

$$\max_{\rho} \; \sum_{k=1}^{K} \sum_{m=1}^{M} \rho_{k,m} \log_2(1 + \gamma_{k,m})$$

$$\text{s.t.} \quad \sum_{m=1}^{M} \rho_{k,m} \le 1, \;\; \forall k \in \{1, \cdots, K\}$$

$$\sum_{k=1}^{K} \rho_{k,m} \le a_m, \;\; \forall m \in \{1, \cdots, M\}$$

$$\sum_{m=1}^{M} \rho_{k,m} \log_2(1 + \gamma_{k,m}) \ge R_k^{(\min)}, \;\; \forall k \in \{1, \cdots, K\}$$

$$\rho_{k,m} \in \{0, 1\}, \;\; \forall k \in \{1, \cdots, K\}, \; \forall m \in \{1, \cdots, M\}$$
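For a toy instance, this integer program can be solved by brute force; a minimal sketch (the channel gains, powers, and constraint values are all illustrative, not from the paper):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K, M = 3, 2                         # transmitters, receivers (toy instance)
p = np.ones(K)                      # transmit powers p_k
g = rng.rayleigh(size=(K, M)) ** 2  # channel gains g_{km}
sigma2, a, R_min = 0.1, [2, 2], 0.05

def rate(k, m):
    """log2(1 + gamma_{k,m}) with interference from all other transmitters."""
    interference = sum(p[l] * g[l, m] for l in range(K) if l != k)
    return np.log2(1 + p[k] * g[k, m] / (sigma2 + interference))

best, best_assoc = -np.inf, None
# assoc[k] = the one receiver served by transmitter k (sum_m rho_{k,m} = 1).
for assoc in itertools.product(range(M), repeat=K):
    if any(assoc.count(m) > a[m] for m in range(M)):
        continue                    # receiver m served by at most a_m transmitters
    rates = [rate(k, assoc[k]) for k in range(K)]
    if any(r < R_min for r in rates):
        continue                    # minimum-rate constraint for each user
    if sum(rates) > best:
        best, best_assoc = sum(rates), assoc

print(best_assoc, best)             # prints None/-inf if no feasible association
```

Brute force scales as $M^K$, which is exactly why the talk turns to learned solutions for realistic network sizes.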
Machine learning basics
• Categories: supervised, unsupervised, and reinforcement learning
• Supervised learning: given a data set $D = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$, predict $y$ so as to generalize the input-output mapping in $D$ to inputs $x$ outside $D$.
• Classification (discrete output) and regression (continuous output) problems
• Performance metrics: classification accuracy (or error rate) and root mean square error (RMSE)
• Common supervised learning techniques: Bayesian classification, K-Nearest Neighbor (KNN), Neural Network (NN), Support Vector Machine (SVM), Decision Tree (DT) classification, recommender systems
Neural Networks (NNs)
• Defines a mapping $g(x, \theta): \mathbb{R}^n \to \mathbb{R}^k$ of an input vector $x \in \mathbb{R}^n$ to an output vector $y \in \mathbb{R}^k$.
• Consists of basic components known as neurons or nodes
• Three layers: input layer, hidden layers, and output layer.
• Nodes can perform non-linear functions.

[Figure: A neural network]
Neural networks
Mathematically, for neuron $k$ in a hidden layer:

$$v_k = \sum_{j=1}^{m} w_{kj} x_j + b_k, \qquad y_k = \varphi(v_k)$$

Commonly used learning algorithm: the back-propagation algorithm

[Figure: Non-linear model of a neuron]
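As a one-line illustration of this model, a single neuron's forward computation in NumPy (a minimal sketch; the weights and inputs are made-up numbers):

```python
import numpy as np

def neuron(x, w, b, phi=np.tanh):
    """Non-linear model of a neuron: y_k = phi(sum_j w_kj * x_j + b_k)."""
    v = np.dot(w, x) + b        # induced local field v_k
    return phi(v)               # activation output y_k

x = np.array([0.5, -1.2, 3.0])      # inputs x_1..x_m (illustrative)
w = np.array([0.1, 0.4, -0.2])      # synaptic weights w_k1..w_km
print(neuron(x, w, b=0.05))
```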
Neural Networks
Activation functions:
• Sigmoidal: $\varphi(x) = \frac{1}{1+e^{-x}}$ (generally used in the output layer)
• Hyperbolic tangent: $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ (generally used in the output layer)
• Rectified linear unit: $\mathrm{ReLU}(x) = \max(0, x)$ (generally used in the hidden layers)
• Refinements of ReLU, e.g., Leaky ReLU, exponential ReLU
• Softmax: $\frac{e^{x_i}}{\sum_j e^{x_j}}$ (generally used in the output layer)
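These are one-liners in NumPy (np.tanh is built in; the max-subtraction in softmax is a standard numerical-stability trick, not from the slide):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):      # a common refinement of ReLU
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    z = np.exp(x - np.max(x))       # shift inputs for numerical stability
    return z / z.sum()

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x), np.tanh(x), relu(x), leaky_relu(x), softmax(x))
```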
Neural Networks
Training procedure:
• Target function: $y = f^*(x)$, where $x$ is the input vector and $y$ is the output vector
• Model: $y = f(x; \theta)$, where $\theta$ denotes the unknown parameters, i.e., the weights and biases
• The goal is to learn $\theta$ precisely so that our model comes as close as possible to the original one
Neural Networks
Training procedure: feed-forward and back-propagation neural network
• A training dataset composed of inputs and outputs is typically used to train the model
• Initialize the weights and biases randomly and feed the inputs to the input layer.
• Forward pass: propagation of information from the input layer to the output layer.
• A cost function (e.g., mean squared error [MSE]) measures the quality of the model by calculating the error between the predicted and the original values
Neural Networks
Training procedure (contd.):
• Backward pass: the error signal is propagated backward through the hidden layers, updating the weights and biases of each layer
• E.g., standard gradient descent: an algorithm that optimizes the weights of the neurons based on the gradient of the loss function
• The training process continues until the error rate reaches a threshold value.
Example: Learning XOR
• Target function: $f^*(X_1, X_2)$
• Model function: $y = f(X_1, X_2; \theta)$
• The learning algorithm will adapt the parameters $\theta$ to make $f$ as similar as possible to $f^*$.
• $[X_1, X_2] \in \{[0, 0], [0, 1], [1, 0], [1, 1]\}$
• MSE loss function:

$$e(\theta) = \frac{1}{4} \sum_{X_1, X_2} \left(f^*(X_1, X_2) - f(X_1, X_2; \theta)\right)^2$$
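A minimal NumPy sketch of learning XOR with this MSE loss (one hidden layer of four sigmoid units and hand-coded backpropagation; the architecture, learning rate, and step count are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# The four XOR training examples and their targets f*(X1, X2).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# theta = (W1, b1, W2, b2): one hidden layer of 4 units.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 2.0

for step in range(10_000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)        # hidden activations
    f = sigmoid(h @ W2 + b2)        # model output f(X1, X2; theta)

    # Backward pass: gradients of e(theta) through the sigmoids.
    d_f = 2 * (f - y) / len(X) * f * (1 - f)
    d_h = (d_f @ W2.T) * h * (1 - h)

    # Gradient-descent update of theta.
    W2 -= lr * h.T @ d_f;  b2 -= lr * d_f.sum(axis=0)
    W1 -= lr * X.T @ d_h;  b1 -= lr * d_h.sum(axis=0)

print(np.round(f.ravel(), 2))  # should approach [0, 1, 1, 0] (init-dependent)
```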
Deep Learning (DL) and Deep Neural Network (DNN)
• DL: a branch of machine learning which can be supervised, semi-supervised, or unsupervised.
• DNN: a deep learning architecture consisting of many hidden layers
• Improves performance at a faster rate when the dimension of the training data (no. of features) increases

[Figure: A deep neural network]
Reinforcement Learning
• An agent learns by interacting with the environment.
• The agent senses its current state and then chooses an action.
• For every action, the agent either receives a reward for a good move or a penalty for a bad move.
• The primary job of the agent is to maximize the expected cumulative reward through a series of actions.
• Markov decision processes (MDPs) formally describe an environment for RL where the environment is fully observable.
Q Learning
• An MDP models a sequential decision-making problem where the state transition ($P$) and the reward function ($R$) depend only on the current state and the applied action.
• If the state transition $P$ (which gives the next state for any state-action pair) and the reward function $R$ are known, the MDP can be solved through dynamic programming.
• If $P$ and $R$ are not known, the Q-learning method can be used to obtain the optimal action-value function.
• An action-value function determines the value of being in a certain state and taking a certain action in that state.
Q Learning
• Q-learning is a technique that evaluates which action to take based on an action-value function (the $Q(s, a)$ value).
• Main idea of Q-learning: “explore” all possibilities of state-action pairs and estimate the $Q(s, a)$ value (i.e., the long-term reward that will be received by applying an action in a state).
• Update rule derived from the Bellman equation (with $s_{t+1}$ the next state):

$$Q(s_t, a) \leftarrow (1 - \alpha)\, Q(s_t, a) + \alpha \left[ R(s_t, a) + \gamma \max_{a'} Q(s_{t+1}, a') \right]$$

• Optimal policy: $\pi^*(s) = \arg\max_a Q^*(s, a)$
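A tabular Q-learning sketch on a toy chain MDP (the environment, the random behavior policy, and the constants α, γ are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2            # toy chain MDP: action 1 moves right
alpha, gamma = 0.1, 0.9
Q = np.zeros((n_states, n_actions))   # tabular action-value function Q(s, a)

def step(s, a):
    """Toy dynamics: reward 1 whenever the last state is reached."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

for episode in range(500):
    s = int(rng.integers(n_states))   # random start for good exploration
    for t in range(20):
        # Random behavior policy: Q-learning is off-policy, so "exploring"
        # state-action pairs like this still estimates the optimal Q.
        a = int(rng.integers(n_actions))
        s_next, r = step(s, a)
        # Q-learning update from the Bellman equation.
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * Q[s_next].max())
        s = s_next

print(np.argmax(Q, axis=1))   # greedy policy pi*(s): always move right
```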
Deep Reinforcement Learning (DRL)
• In RL, a policy is stored in tabular form.
• Traditional RL is limited to low-dimensional state and action spaces.
◦ Not feasible for large action and state spaces
• Therefore, instead of a tabular method, a function approximator such as a DNN can be used.
DQL (contd.)
So, what are the steps involved in reinforcement learning using deep Q-learning networks (DQNs)?
1. All the past experience is stored by the user in memory
2. The next action is determined by the maximum output of the Q-network
3. The loss function here is the mean squared error of the predicted Q-value and the target Q-value, Q*. This is basically a regression problem. However, we do not know the target or actual value here, as we are dealing with a reinforcement learning problem. Going back to the Q-value update equation derived from the Bellman equation, the term $R(s_t, a) + \gamma \max_{a'} Q(s_{t+1}, a')$ represents the target. We can argue that the network is predicting its own value, but since $R$ is the unbiased true reward, the network is going to update its gradient using backpropagation to finally converge.

https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/
DL-based resource allocation model
System model
• Recall: a downlink multi-cell network with $K$ cells and $F$ frequency sub-bands
• $C_{u,k}$ is the CQI vector of user $u$ in cell $k$: $C_{u,k} := (C_{u,k,1}, C_{u,k,2}, \ldots, C_{u,k,F})$.
• A location indicator $V_{u,k}$ is used to indicate whether user $u$ of cell $k$ is a cell-centre user or a cell-edge user:

$$V_{u,k} = \begin{cases} 1, & \text{if } R_{u,k} > R/2 \\ 0, & \text{otherwise} \end{cases}$$

where $R_{u,k}$ = distance of user $u$ in cell $k$ from the BS and $R$ = cell radius.
Supervised DL approach
• The DL model takes the $C_{u,k}$ vector along with $V_{u,k}$ of all users as input and predicts the power and sub-band allocations as output; e.g., for a particular user the input is:

  CQI vector: 13 9 7 | Location indicator: 1

• For 5 BSs, 5 users/cell, and 3 frequency sub-bands ⟹ the input data size is 5 × 5 × (3 + 1) = 100
• For $K$ cells, $U$ users/cell, and $F$ sub-bands, the size of the input data = $K \times U \times (F + 1)$ (see the sketch below)
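A sketch of assembling such an input vector (the array layout and the random CQI values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
K, U, F = 5, 5, 3                    # cells, users per cell, sub-bands

cqi = rng.integers(0, 16, size=(K, U, F))   # C_{u,k}: per-sub-band CQI values
v = rng.integers(0, 2, size=(K, U, 1))      # V_{u,k}: cell-centre/edge indicator

x = np.concatenate([cqi, v], axis=2).reshape(-1)   # flatten into the DNN input
print(x.shape)                        # (100,) = K * U * (F + 1)
```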
Supervised DL approach
• Convert the output data from decimal base to $n$-bit binary and also use their complements in the output
• Example output for a particular cell (36 bits), with 3-bit binary representation along with complements (see the sketch below):

              Power allocation           Sub-band allocation
  Value:       1        1        3        3        5        2
  bin/comp:  001/110  001/110  011/100  011/100  101/010  010/101

• With 5 cells, the total number of outputs = 5 × 36 = 180.
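A minimal sketch of this binary-plus-complement encoding (the helper `encode` is ours, not from the paper):

```python
def encode(value, n_bits=3):
    """n-bit binary of a value together with its bitwise complement."""
    bits = format(value, f"0{n_bits}b")
    comp = "".join("1" if b == "0" else "0" for b in bits)
    return bits, comp

cell_output = [1, 1, 3, 3, 5, 2]      # power and sub-band levels of one cell
encoded = [s for v in cell_output for s in encode(v)]
print(encoded)                        # ['001', '110', '001', '110', ...]
print(sum(len(s) for s in encoded))   # 36 output bits per cell
```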
Supervised DL approach
Data generation and testing:
• To train and test the DNN model, a genetic algorithm (GA) is used.
• The training dataset consists of input data and labeled target data (or output data)

[Figure: DNN model for power and sub-band allocation]
Supervised DL approach
System operation:
• All users in the network periodically send their CQI values to their serving BSs.
• Each BS then sends the processed information of all its users to a central entity (e.g., an SDN controller), which runs the DNN model.
• The DNN model generates the allocation vector and power vector for all the BSs
• Once the prediction is made, the SDN controller sends the allocation and power vectors back to the BSs.
Supervised DL approach
Simulation parameters:
K = 5 BSs, R = 500 m, $P^{\max}$ = 40 W, directional antennas, number of users per cell = 5, B = 2.88 MHz, $N_0$ = −174 dBm/Hz, F = 3, P = 3, and power levels = {6.4, 12.8, 19.2} W.

Input data generation:
• Using the GA, produce around 17000 samples for the DNN model.
• Use 80% of this data set for training and the remaining 20% for testing.

Comparison between exhaustive search and genetic algorithm:

  Parameter             Exhaustive Search   Genetic Algorithm
  Avg. execution time   1460.00 sec         118.76 sec
  Max. execution time   2247.50 sec         158.65 sec
  Min. execution time   1509.00 sec         95.13 sec
  Accuracy              100%                85.25%
Supervised DL approach
Simulation results:

[Figure: Test accuracy (%) vs. number of hidden layers (2 to 5).]
[Figure: Training and test accuracy (%) vs. number of training samples (4000 to 12000).]
Supervised DL approach
Simulation results:
• We achieve the maximum test accuracy of 86.31% with 4 hidden layers.
• Continuously increasing the number of hidden layers will not improve the accuracy significantly, and in some cases the model may even start to learn noisy features.
• The training accuracy keeps increasing with the number of training examples and, after a certain number of training examples, starts to saturate.
• With a further increase in training data, the model starts to overfit the data, leading to reduced test accuracy
DQL Model for Power Allocation
State and action:
• Use the $C_{u,k}$ vector along with $V_{u,k}$ of all users in the network as the state.
• For $K$ cells, $U$ users/cell, and $F$ sub-bands in total, the state size is $K \times U \times (F + 1)$.
• An action corresponds to a power allocation.
◦ With $P$ power levels, the total number of power combinations in a cell = $P^F$
◦ Let $m$ be the total number of combinations possible in a cell after applying the maximum power constraint (see the sketch below).
◦ With $K$ cells, the agent has to take $K$ actions from a total of $K \times m$ actions.
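A sketch of enumerating the per-cell action set (the power levels and the 40 W cap follow the simulation setup later in the talk; names are illustrative):

```python
import itertools

levels = [6.4, 12.8, 19.2]     # P = 3 discrete power levels (watts)
F, P_max = 3, 40.0             # sub-bands per cell, maximum cell power

# All P^F candidate per-cell power vectors, then keep the feasible ones.
candidates = list(itertools.product(levels, repeat=F))
feasible = [c for c in candidates if sum(c) <= P_max]

print(len(candidates), len(feasible))   # 27 candidates, m feasible actions
```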
DQL Model for Power Allocation
Training:
• Use deep Q-learning with the experience replay approach to train the DQL-based power allocation model.

Testing:
• Need to compare the power allocation solution with the optimal power allocation
• A genetic algorithm (GA) is used to find the near-optimal power allocation.

K. I. Ahmed, H. Tabassum, and E. Hossain, “A deep Q-learning method for downlink power allocation in multi-cell networks,” https://arxiv.org/abs/1904.13032
DQL With Experience Replay
• All the past experience is stored
• The next action is determined by the maximum output of the Q-network
• The loss function here is the mean squared error of the predicted Q-value and the target Q-value.
• However, how do we obtain the target (actual) value? Recall the Q-value update equation: the target is the term $R(s_t, a) + \gamma \max_{a'} Q(s_{t+1}, a')$.

https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/
DQL With Experience Replay
• Instead of using one neural network for learning, we can use two: a prediction network and a target network
• The target network has the same architecture as the prediction network, but with frozen parameters.
• Every C iterations (a hyper-parameter), the parameters of the prediction network are copied to the target network (see the sketch below).

https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/
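A structural sketch of DQL with experience replay and a target network (everything here is illustrative: a toy linear Q-function stands in for the prediction/target DNNs and the environment is a dummy; only the replay memory, ε-greedy action choice, MSE-on-frozen-target update, and the every-C-iterations copy follow the scheme above):

```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions = 4, 3
C, gamma, lr, eps, batch = 100, 0.9, 0.01, 0.1, 32

# Linear Q-function Q(s) = W s + b stands in for the prediction DNN.
W = rng.normal(scale=0.1, size=(n_actions, state_dim)); b = np.zeros(n_actions)
W_tgt, b_tgt = W.copy(), b.copy()        # target network: frozen copy
replay = deque(maxlen=10_000)            # experience replay memory

def q(s, W, b):
    return W @ s + b

s = rng.normal(size=state_dim)
for it in range(1, 2001):
    # epsilon-greedy action from the maximum output of the Q-network.
    a = int(np.argmax(q(s, W, b))) if rng.random() > eps else int(rng.integers(n_actions))
    r, s_next = float(s[a]), rng.normal(size=state_dim)   # dummy reward/transition
    replay.append((s, a, r, s_next))                      # store the experience
    s = s_next

    if len(replay) >= batch:
        for s_b, a_b, r_b, sn_b in random.sample(list(replay), batch):
            target = r_b + gamma * np.max(q(sn_b, W_tgt, b_tgt))  # frozen target
            td = q(s_b, W, b)[a_b] - target     # gradient of the squared error
            W[a_b] -= lr * td * s_b
            b[a_b] -= lr * td

    if it % C == 0:                  # every C iterations: sync target network
        W_tgt, b_tgt = W.copy(), b.copy()

print(np.round(q(s, W, b), 3))       # Q-values of the final dummy state
```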
DQL Model for Power Allocation
Testing (contd.):
• The DQL model is compared with different power allocation schemes in terms of the total network throughput.
• Power allocation schemes used for comparison: WMMSE, random power allocation, and maximum power allocation
◦ WMMSE: a distributed approach to sum-rate maximization under a power constraint

Q. Shi et al., “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,” IEEE Trans. on Signal Processing, vol. 59, no. 9, pp. 4331–4340.
Performance of DQL Model
Simulation parameters:
K = 5 BSs, R = 500 m, $P^{\max}$ = 40 W, number of users per cell = 5, B = 2.88 MHz, $N_0$ = −174 dBm/Hz, F = 3, P = 5, and power levels = {6.4, 9.6, 12.8, 16, 19.2} W.

Training parameters:

  Parameter                  Value
  Number of hidden layers    2
  Layers                     {Input, Hidden Layer, Hidden Layer, Output}
  No. of neurons per layer   {100, 1440, 720, 360}
  Replay memory size         80,000
  Batch size                 64
  Update target frequency    1000
  Learning rate              0.00025
  Optimizer                  RMSprop
Performance of DQL Model
Simulation results:

[Figure: Normalized throughput vs. testing sample, comparing GA (near-optimal), DRL, maximum power, random power, and WMMSE, with each scheme's average marked.]
[Figure: Normalized error (%) vs. testing sample for DRL, maximum power, random power, and WMMSE, with each scheme's average error marked.]
Performance of DQL Model
Simulation results (contd.):

[Figure: Average normalized throughput vs. learning rate (0.00025, 0.0025, 0.025) for the 5-cell network.]
[Figure: Average normalized throughput vs. number of hidden layers (1, 2, 3) for the 5-cell network.]
DQL Model for Power Allocation
Simulation results (contd.):
• The proposed DQL model performs better than the other power allocation schemes
• For a high learning rate, the performance of DQL degrades
• A continuous increase in the number of hidden layers degrades the performance significantly
• In our setup, we achieve the maximum value of the average normalized throughput for a DQN with 2 hidden layers and a learning rate of 0.0025.
Final remarks
• 6G wireless networks will need to be “super intelligent” (self-reconfigurable, context-aware, secure)
• AI and ML are suitable for addressing complex “6G wireless” problems (e.g., resource allocation in massive networks, network control, security)
• Application of deep learning in scenarios with massive connections and spatio-temporal correlations
Final remarks
• It is crucial to develop efficient offline methods to generate optimal/near-optimal data samples for training the DNN architectures.
• Deep reinforcement learning models are particularly suitable (e.g., for solving optimization/“end-to-end” learning problems)
◦ Very problem specific
◦ Leverage state-of-the-art DRL techniques
• Distributed and federated learning techniques
Thank you!