
Page 1

Wireless Systems Design in the Beyond 5G Era: Promises of Deep Learning and Deep Reinforcement Learning

Ekram Hossain, 09 November 2020

Page 2

Outline

Part I: Introduction

Part II: NN, DNN, and Deep Learning

Part III: DL and DRL Models for Resource Allocation

Part IV: Conclusion

Page 3

Evolution toward next generation wireless networks

[Figure: Evolution of wireless networks, from a single-tier cellular network to a 4G/5G multi-tier cellular network (base stations, cell-edge users, relays, relay-aided D2D communication and D2D links, pico cells, dense small cells, indoor femto access points with femto gateways, M2M devices/sensors with M2M gateways, C-RAN with remote radio heads and fronthaul links, wireless access and backhaul links to the Internet and core network), and onward to an enhanced 5G RAN and B5G (SDN controller, edge servers, enhanced Evolved Packet Core (EPC), mm-wave massive MIMO BSs, cell-free massive MIMO, VLC, aerial and underwater drones, smart grid/energy Internet, group mobility).]

Page 4

Next generation wireless networks: Beyond 5G

Internet-of-Everything

Converged terrestrial and non-terrestrial communication networks

Massive capacity, massive connectivity, energy efficiency

Programmable radio environment

Convergence of communications and computing

Virtualization/network slicing and "SDN"-ization of converged communication networks

Page 5

Network management and control in B5G era

AI-based super-intelligent RAN for 6G: resource allocation (communications and computing) optimization, cell and radio design, network personalization, etc.

Optimal solutions are obtained by applying exhaustive search methods, genetic algorithms, and combinatorial and branch-and-bound techniques.

These incur significantly high time and computational complexity.

Page 6

Network management and control in B5G era

Sub-optimal solutions are obtained based on techniques such as Lagrangian relaxation, iterative distributed optimization, heuristic algorithms, and game theory.

These are also very computation-intensive and/or may not be feasible for large cellular networks due to high signaling overhead.

Sub-optimal solutions can be far from the optimal ones, and their convergence properties and optimality gaps may be unknown.

Page 7

Machine learning basics

Essence of data-driven machine learning:

A pattern exists which cannot be pinned down mathematically.

Data is available.

Learn from the input data set by analyzing the data statistically.

Make decisions or predict outputs without being explicitly programmed.

Typical applications: image recognition, speech recognition, medical diagnosis, etc.

Applications to cellular communications: decoding, modulation classification, spectrum sensing, localization, mobility prediction, handoff management, resource optimization, load balancing/cell association, power control, fault detection, fault classification, caching, traffic classification, congestion prediction

Page 8

Machine learning basics: Learning diagram

[Figure: The learning diagram. An unknown target function $f: X \to Y$ (the ideal credit approval function) generates the training examples $(x_1, y_1), \ldots, (x_N, y_N)$ (historical records of credit customers). A learning algorithm $A$ searches a hypothesis set $H$ (the set of candidate formulas) to produce a final hypothesis $g \approx f$ (the final credit approval formula).]

∗ Yaser Abu-Mostafa, lecture slides on "Learning from Data," https://work.caltech.edu/lectures.html#lectures

Page 9

Resource management based on machine learning

With machine learning (ML), solutions are learned by exploiting the data samples.

ML tools can be used to obtain practical solutions for radio resource allocation problems in a large cellular network, given the past optimal or near-optimal resource allocation decisions.

ML-based resource allocation algorithms can be implemented online.

Note: Deep learning (DL) is a sub-class of ML based on "deep neural networks," which use multiple layers of nonlinear processing units.

Page 10

Resource management based on machine learning

No performance guarantee (suboptimal performance)

Lack of interpretability (blackbox mapping of inputs to outputs)

Depends on the availability of data

O. Simeone, "A very brief introduction to machine learning with applications to communication systems," IEEE Trans. on Cognitive Communications and Networking, vol. 4, no. 4, Dec. 2018.

Page 11

When to use machine learning?

No mathematical model or efficient algorithm exists (modeling and/or algorithmic deficit).

The task involves a function that maps well-defined inputs to well-defined outputs.

The function does not change rapidly over time.

Large data sets can be made available.

Errors can be tolerated and optimal solutions are not needed.

Page 12

Resource management based on machine learning

Example: Sub-band and power allocation in multi-cell networks

Consider a downlink cellular network of $K$ base stations ($k \in \{1, \cdots, K\}$) and $F$ frequency sub-bands.

The bandwidth of each sub-band is $B$; the power allocated by cell $k$ in frequency sub-band $f$ is $P_{k,f}$, which is discrete.

The total power of cell $k$ is limited by a maximum value $P_k^{\max}$ such that $\sum_{f \in F} P_{k,f} \le P_k^{\max}, \ \forall k \in \{1, \cdots, K\}$; $U_k$ = set of users associated with cell $k$.

The vector $A_{k,f}$ denotes the allocation of sub-band $f$ in cell $k$, and $P_{k,f}$ denotes the allocation of power in sub-band $f$ in cell $k$.

K. I. Ahmed, H. Tabassum, and E. Hossain, "Deep learning for radio resource allocation in multi-cell networks," https://arxiv.org/abs/1808.00667

Page 13

Resource management based on machine learning

Sub-band and power allocation in multi-cell networks (contd.)

Sum-utility for all cells:

$$U = \sum_{k \in \{1,\cdots,K\}} \sum_{u \in U_k} \sum_{f=1}^{F} \left[ \mathbb{I}(A_{k,f} = u)\, B \log\left(1 + \alpha\, \mathrm{SINR}_{u,k,f}\right) \right]$$

where $\alpha = -1.5/\log(5\,\mathrm{BER})$.
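As an illustration (a hypothetical sketch, not from the talk), this sum-utility can be evaluated directly from the formula for a given allocation; the SINR values and user assignments below are toy inputs:

```python
import numpy as np

# Hypothetical sketch (toy shapes/values): evaluate the sum-utility U.
K, U_per_cell, F = 2, 3, 3
B, BER = 2.88e6, 1e-3
alpha = -1.5 / np.log(5 * BER)                 # alpha = -1.5/log(5 BER), natural log assumed
rng = np.random.default_rng(1)
sinr = rng.exponential(2.0, size=(U_per_cell, K, F))  # SINR_{u,k,f}
A = rng.integers(U_per_cell, size=(K, F))      # A[k, f] = assigned user u

# U = sum_k sum_u sum_f I(A_{k,f} = u) * B * log(1 + alpha * SINR_{u,k,f});
# indexing by A[k, f] applies the indicator implicitly.
U_total = sum(B * np.log(1 + alpha * sinr[A[k, f], k, f])
              for k in range(K) for f in range(F))
print(f"Sum-utility: {U_total:.3e}")
```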

Page 14

Resource management based on machine learning

Sub-band and power allocation in multi-cell networks (contd.)

Optimization problem:

$$\underset{A_{k,f},\, P_{k,f}}{\text{maximize}} \quad U \qquad \text{subject to} \quad \sum_{f \in F} P_{k,f} \le P_k^{\max}, \ \forall k \in \{1, \cdots, K\}$$

This is a non-convex combinatorial optimization problem; the optimal solution can be obtained by an exhaustive search.

With $K$ cells, $F$ sub-bands, and $P$ discrete power levels, there are $(P^F)^K = P^{KF}$ possible combinations for power setting at the base stations.

E.g., for $K = 15$, $F = 5$, $P = 5$: $3125^{15}$ possible combinations $\Longrightarrow$ infeasible (see the quick check below).

For joint sub-band and power allocation: $\left(\sum_k U_k\right)^F \times P^{KF}$ possibilities.
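A quick back-of-the-envelope check (my own sketch, not from the slides) confirms why exhaustive search is hopeless at this scale:

```python
# Hypothetical sketch: size of the exhaustive-search space for power
# setting with K cells, F sub-bands, and P discrete power levels.
K, F, P = 15, 5, 5
power_combinations = P ** (K * F)            # (P^F)^K = P^(KF) = 3125^15
print(f"P^(K*F) = {power_combinations:.3e}") # ~2.6e52 combinations

# Even at a billion evaluations per second this never finishes:
years = power_combinations / 1e9 / (3600 * 24 * 365)
print(f"~{years:.1e} years at 1e9 evaluations/s")
```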

Page 15

Resource management based on machine learning

Example: Optimal user association for sum-rate maximization in an interference channel

Assume $K$ transmitters and $M$ receivers; each transmitter can serve only one user, one receiver can be served by at most $a_m$ transmitters, and there is a minimum rate constraint for each user.

SINR for transmission from $k$ to $m$:

$$\gamma_{k,m} = \frac{p_k\, g_{km}}{\sigma^2 + \sum_{l \neq k} p_l\, g_{lm}}$$

User association variable:

$$\rho_{k,m} = \begin{cases} 1, & \text{if transmitter } k \text{ serves receiver } m \\ 0, & \text{otherwise.} \end{cases}$$

Sum rate $= B \sum_{k=1}^{K} \sum_{m=1}^{M} \rho_{k,m} \log_2(1 + \gamma_{k,m})$

A. Zappone, M. di Renzo, and M. Debbah, "Wireless networks design in the era of deep learning: Model-based, AI-based, or both?," arXiv:1902.02647v1

Page 16

Resource management based on machine learning

Problem formulation:

$$\begin{aligned}
\max_{\rho} \quad & \sum_{k=1}^{K} \sum_{m=1}^{M} \rho_{k,m} \log_2(1 + \gamma_{k,m}) \\
\text{s.t.} \quad & \sum_{m=1}^{M} \rho_{k,m} \le 1, \quad \forall k \in \{1, \cdots, K\} \\
& \sum_{k=1}^{K} \rho_{k,m} \le a_m, \quad \forall m \in \{1, \cdots, M\} \\
& \sum_{m=1}^{M} \rho_{k,m} \log_2(1 + \gamma_{k,m}) \ge R_k^{(\min)}, \quad \forall k \in \{1, \cdots, K\} \\
& \rho_{k,m} \in \{0, 1\}, \quad \forall k \in \{1, \cdots, K\}, \ \forall m \in \{1, \cdots, M\}
\end{aligned}$$
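For intuition, a tiny instance of this integer program can be solved by brute force (a hypothetical sketch; the powers, gains, and constraint values are toy numbers, and the minimum-rate constraint forces every transmitter to serve exactly one receiver):

```python
import itertools
import numpy as np

# Hypothetical sketch: brute-force the user-association problem for a
# tiny instance (K transmitters, M receivers). All numbers are toy values.
K, M, B = 3, 2, 1.0
a = [2, 2]                       # a_m: max transmitters per receiver
R_min = [0.1] * K                # loose toy minimum rate per transmitter
rng = np.random.default_rng(0)
p = rng.uniform(1, 2, K)         # transmit powers p_k
g = rng.uniform(0.1, 1, (K, M))  # channel gains g_km
sigma2 = 0.1

def rate(k, m):
    interference = sum(p[l] * g[l, m] for l in range(K) if l != k)
    gamma = p[k] * g[k, m] / (sigma2 + interference)   # SINR gamma_km
    return B * np.log2(1 + gamma)

best, best_assoc = -1.0, None
for assoc in itertools.product(range(M), repeat=K):    # assoc[k] = served m
    loads = [assoc.count(m) for m in range(M)]
    if any(loads[m] > a[m] for m in range(M)):
        continue                                       # receiver overloaded
    rates = [rate(k, assoc[k]) for k in range(K)]
    if any(r < R_min[k] for k, r in enumerate(rates)):
        continue                                       # min-rate violated
    if sum(rates) > best:
        best, best_assoc = sum(rates), assoc
print("Best sum rate:", best, "association:", best_assoc)
```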

Page 17

Machine learning basics

Categories: Supervised, unsupervised, and reinforcement learning

Supervised learning: given a data set $D = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \cdots, (\mathbf{x}_N, y_N)\}$, predict $y$ so as to generalize the input-output mapping in $D$ to inputs $\mathbf{x}$ outside $D$.

Classification (discrete output) and regression (continuous output) problems

Performance metrics: classification accuracy (or error rate) and root mean square error (RMSE)

Common supervised learning techniques: Bayesian classification, K-Nearest Neighbor (KNN), Neural Network (NN), Support Vector Machine (SVM), Decision Tree (DT) classification, recommender systems

Page 18

Neural Networks (NNs)

An NN defines a mapping $g(\mathbf{x}, \theta): \mathbb{R}^n \to \mathbb{R}^k$ of an input vector $\mathbf{x} \in \mathbb{R}^n$ to an output vector $\mathbf{y} \in \mathbb{R}^k$.

NNs consist of basic components known as neurons or nodes.

Three types of layers: input layer, hidden layers, and output layer.

Nodes can perform non-linear functions.

[Figure: Neural network]

Page 19

Neural networks

Mathematically, for neuron $k$ in a hidden layer:

$$v_k = \sum_{j=1}^{m} w_{kj} x_j + b_k, \qquad y_k = \varphi(v_k)$$
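In code, a single neuron is just a weighted sum followed by a non-linearity; a minimal sketch (toy weights, assuming a sigmoid for $\varphi$):

```python
import numpy as np

# Minimal sketch of one neuron: v_k = sum_j w_kj * x_j + b_k, y_k = phi(v_k)
def neuron(x, w, b, phi=lambda v: 1.0 / (1.0 + np.exp(-v))):
    v = np.dot(w, x) + b        # induced local field v_k
    return phi(v)               # activation y_k = phi(v_k)

x = np.array([0.5, -1.2, 3.0])  # toy inputs x_j
w = np.array([0.1, 0.4, -0.2])  # toy synaptic weights w_kj
print(neuron(x, w, b=0.05))     # neuron output y_k
```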

Commonly used learning algorithm: the back-propagation algorithm

[Figure: Non-linear model of a neuron]

Page 20

Neural Networks

Activation functions:

Sigmoid: $\varphi(x) = \frac{1}{1 + e^{-x}}$ (generally used in the output layer)

Hyperbolic tangent: $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ (generally used in the output layer)

Rectified linear unit: $\mathrm{ReLU}(x) = \max(0, x)$ (generally used in the hidden layers)

Refinements of ReLU, e.g., Leaky ReLU and exponential ReLU

Softmax: $\frac{e^{x_i}}{\sum_j e^{x_j}}$ (generally used in the output layer)
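A minimal NumPy sketch of these activations (the max-shift in the softmax is a standard numerical-stability detail, not from the slide):

```python
import numpy as np

# Common activation functions from the slide, as NumPy one-liners.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))              # output layer
tanh = np.tanh                                            # output layer
relu = lambda x: np.maximum(0.0, x)                       # hidden layers
leaky_relu = lambda x, a=0.01: np.where(x > 0, x, a * x)  # ReLU refinement

def softmax(x):
    z = x - np.max(x)            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()           # e^{x_i} / sum_j e^{x_j}

print(softmax(np.array([1.0, 2.0, 3.0])))  # sums to 1
```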

Page 21

Neural Networks

Training procedure:

Target function: $\mathbf{y} = f^*(\mathbf{x})$. Here, $\mathbf{x}$ is the input vector and $\mathbf{y}$ is the output vector.

Model: $\mathbf{y} = f(\mathbf{x}; \theta)$, where $\theta$ denotes the unknown parameters, i.e., the weights and biases.

The goal is to learn $\theta$ so that the model is as close as possible to the original function.

Page 22

Neural Networks

Training procedure: feed-forward and back-propagation neural network

A training data set composed of inputs and outputs is typically used to train the model.

Initialize the weights and biases randomly and feed the inputs to the input layer.

Forward pass: propagation of information from the input layer to the output layer.

A cost function (e.g., mean squared error [MSE]) measures the quality of the model by calculating the error between the predicted and the original value.

Page 23

Neural Networks

Training procedure (contd.):

Backward pass: the error signal is propagated backward through the hidden layers, updating the weights and biases of each layer.

E.g., standard gradient descent: an algorithm that optimizes the weights of the neurons based on the gradient of the loss function.

The training process continues until the error rate reaches a threshold value.

Page 24

Example: Learning XOR

Target function: $f^*(X_1, X_2)$

Model function: $y = f(X_1, X_2; \theta)$

The learning algorithm adapts the parameters $\theta$ to make $f$ as similar as possible to $f^*$.

$[X_1, X_2] \in \{[0, 0], [0, 1], [1, 0], [1, 1]\}$

MSE loss function:

$$e(\theta) = \frac{1}{4} \sum_{X_1, X_2} \left( f^*(X_1, X_2) - f(X_1, X_2; \theta) \right)^2$$
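A minimal end-to-end sketch of this example (my own illustration, not from the talk): a small feed-forward network trained on XOR by back-propagating the MSE loss with gradient descent:

```python
import numpy as np

# Sketch: a 2-4-1 sigmoid network learning XOR with MSE + gradient descent.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets f*(X1, X2)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for step in range(10_000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)                     # f(X1, X2; theta)
    # Backward pass: gradients of e(theta) through both sigmoid layers
    d_out = (y_hat - y) * y_hat * (1 - y_hat) / len(X)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent parameter updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(0, keepdims=True)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(0, keepdims=True)

print(np.round(y_hat.ravel(), 2))   # approaches [0, 1, 1, 0]
```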

Page 25

Deep Learning (DL) and Deep Neural Network (DNN)

DL: a branch of machine learning which can be supervised, semi-supervised, or unsupervised.

DNN: a deep learning architecture consisting of many hidden layers.

Improves performance at a faster rate when the dimension of the training data (number of features) increases.

[Figure: Deep neural network]

Page 26

Reinforcement Learning

An agent learns by interacting with the environment.

The agent senses its current state and then chooses an action.

For every action, the agent either receives a reward for a good move or a penalty for a bad move.

The primary job of the agent is to maximize the expected cumulative reward through a series of actions.

Markov decision processes (MDPs) formally describe an environment for RL where the environment is fully observable.

Page 27

Q Learning

An MDP models a sequential decision-making problem where the state transition ($P$) and reward function ($R$) depend only on the current state and the applied action.

In case the state transition $P$ (which gives the next state for any state-action pair) and the reward function $R$ are known, the MDP can be solved through dynamic programming.

If $P$ and $R$ are not known, the Q-learning method can be used to obtain the optimal action-value function.

An action-value function determines the value of being in a certain state and taking a certain action in that state.

Page 28

Q Learning

Q-learning is a technique that evaluates which action to take based on an action-value function (the $Q(s, a)$ value).

Main idea of Q-learning: "explore" all possibilities of state-action pairs and estimate the $Q(s, a)$ value (i.e., the long-term reward that will be received by applying an action in a state).

Q-value update (derived from the Bellman equation):

$$Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t) + \alpha \left[ R(s_t, a_t) + \gamma \max_{a'} Q(s_{t+1}, a') \right]$$

Optimal policy: $\pi^*(s) = \arg\max_a Q^*(s, a)$
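To make the update concrete, here is a minimal tabular Q-learning sketch on a toy chain MDP (the environment and hyper-parameters are my own illustrative assumptions, not from the talk):

```python
import numpy as np

# Sketch: tabular Q-learning on a toy 5-state chain. Actions: 0 = left,
# 1 = right; stepping right from the last state gives reward 1 and ends
# the episode.
n_states, n_actions = 5, 2
alpha, gamma, eps = 0.1, 0.9, 0.1    # learning rate, discount, exploration
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if (s == n_states - 1 and a == 1) else 0.0
    return s_next, r, r > 0          # next state, reward, done

for episode in range(500):
    s = 0
    for _ in range(50):
        # epsilon-greedy exploration of state-action pairs
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next, r, done = step(s, a)
        target = r if done else r + gamma * Q[s_next].max()
        # Q(s,a) <- (1 - alpha) Q(s,a) + alpha [R + gamma max_a' Q(s',a')]
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
        s = s_next
        if done:
            break

print("pi*(s) = argmax_a Q(s, a):", Q.argmax(axis=1))  # all 1s: go right
```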

Page 29

Deep Reinforcement Learning (DRL)

In RL, a policy is stored in tabular form.

Traditional RL is limited to low-dimensional state and action spaces.

Not feasible for large action and state spaces.

Therefore, instead of a tabular method, a function approximator such as a DNN can be used.

Page 30

Deep Q-Learning (DQL) (contd.)


So, what are the steps involved in reinforcement learning using deep Q-learning networks (DQNs)?

1. All the past experience is stored by the user in memory.
2. The next action is determined by the maximum output of the Q-network.
3. The loss function here is the mean squared error of the predicted Q-value and the target Q-value $Q^*$. This is basically a regression problem. However, we do not know the target or actual value here, as we are dealing with a reinforcement learning problem. Going back to the Q-value update equation derived from the Bellman equation, the target is the term $R + \gamma \max_{a'} Q(s_{t+1}, a')$.

The network is effectively predicting its own value as the target, but since $R$ is the unbiased true reward, the network is going to update its gradient using backpropagation to finally converge.

https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/

Page 31

DL-based resource allocation model

System model

Recall: downlink multi-cell network with $K$ cells and $F$ frequency sub-bands.

$C_{u,k}$ is the CQI vector of user $u$ in cell $k$: $C_{u,k} := (C_{u,k,1}, C_{u,k,2}, \ldots, C_{u,k,F})$.

A location indicator $V_{u,k}$ is used to indicate whether user $u$ of cell $k$ is a cell-centre user or a cell-edge user:

$$V_{u,k} = \begin{cases} 1, & \text{if } R_{u,k} > R/2 \\ 0, & \text{otherwise} \end{cases}$$

where $R_{u,k}$ = distance of user $u$ in cell $k$ from the BS and $R$ = cell radius.

Page 32

Supervised DL approach

The DL model takes the $C_{u,k}$ vector along with $V_{u,k}$ of all users as input and predicts power and sub-band allocations as output. E.g., for a particular user the input is:

CQI vector: (13, 9, 7); location indicator: 1

For 5 BSs, 5 users/cell, and 3 frequency sub-bands $\Longrightarrow$ the input data size is $5 \times 5 \times (3 + 1) = 100$.

For $K$ cells, $U$ users/cell, and $F$ sub-bands, the size of the input data $= K \times U \times (F + 1)$.
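A sketch of how such an input vector could be assembled (a hypothetical layout; the paper may order the features differently):

```python
import numpy as np

# Hypothetical sketch: flatten per-user CQI vectors plus the location
# indicator into one input vector of size K * U * (F + 1).
K, U, F = 5, 5, 3
rng = np.random.default_rng(0)
cqi = rng.integers(0, 16, size=(K, U, F))     # C_{u,k,f}: CQI per sub-band
V = rng.integers(0, 2, size=(K, U, 1))        # V_{u,k}: cell-edge indicator

x = np.concatenate([cqi, V], axis=-1).ravel() # one (F+1)-block per user
assert x.size == K * U * (F + 1)              # = 5 * 5 * 4 = 100
```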

Page 33

Supervised DL approach

Convert the output data from decimal base to $n$-bit binary and also use their complements in the output.

Example output for a particular cell (36 bits), with 3-bit binary representation along with complements:

Power allocation: 1, 1, 3; binary (complement): 001 (110), 001 (110), 011 (100)
Sub-band allocation: 3, 5, 2; binary (complement): 011 (100), 101 (010), 010 (101)

With 5 cells, the total number of outputs $= 5 \times 36 = 180$.
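A sketch of this binary-plus-complement label encoding (my own illustration of the scheme described above):

```python
# Hypothetical sketch of the n-bit binary-plus-complement encoding for
# one cell's output labels, as described on the slide.
def encode_cell(power_alloc, subband_alloc, n_bits=3):
    """Encode each decimal value as n-bit binary followed by its bitwise
    complement, concatenated into one flat label vector."""
    bits = []
    for v in power_alloc + subband_alloc:
        b = format(v, f"0{n_bits}b")                         # e.g. 3 -> '011'
        comp = "".join("1" if c == "0" else "0" for c in b)  # '011' -> '100'
        bits.extend(int(c) for c in b + comp)
    return bits

# Example from the slide: power (1, 1, 3), sub-bands (3, 5, 2) -> 36 bits
label = encode_cell([1, 1, 3], [3, 5, 2])
assert len(label) == 36
```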

Page 34

Supervised DL approach

Data generation and Testing:

To train and test our DNN model, we use GA.

Training dataset consists of input data and labeled target data (oroutput data)

DNN model for power and sub-band allocation:

Page 35

Supervised DL approach

System operation:

All users in the network periodically send their CQI values totheir serving BSs.

BS then sends the processed information of all users to a centralentity (e.g. SDN controller), which runs the DNN model.

DNN model generates allocation vector and power vector for allthe BSs

Once the prediction is made, SDN controller sends back theallocation and power vectors to BSs.

Page 36

Supervised DL approach

Simulation parameters: $K = 5$ BSs, $R = 500$ m, $P^{\max} = 40$ W, directional antenna, 5 users per cell, $B = 2.88$ MHz, $N_0 = -174$ dBm/Hz, $F = 3$, $P = 3$, and power levels $= \{6.4, 12.8, 19.2\}$ W.

Input data generation: using the GA, produce around 17,000 samples for the DNN model. Use 80% of this data set for training and the remaining 20% for testing.

Comparison between exhaustive search and the genetic algorithm:

Parameter           | Exhaustive Search | Genetic Algorithm
Avg. execution time | 1460.00 sec.      | 118.76 sec.
Max. execution time | 2247.50 sec.      | 158.65 sec.
Min. execution time | 1509.00 sec.      | 95.13 sec.
Accuracy            | 100%              | 85.25%

Page 37

Supervised DL approach

Simulation results:

[Figure: Test accuracy vs. number of hidden layers (2 to 5 hidden layers; test accuracy on a 50–100% scale)]

[Figure: Training accuracy and test accuracy vs. number of training samples (4000 to 12000 samples; accuracy on a 50–100% scale)]

Page 38

Supervised DL approach

Simulation results:

We achieve a maximum test accuracy of 86.31% with 4 hidden layers.

Continuously increasing the number of hidden layers does not improve the accuracy significantly, and in some cases the model may even start to learn noisy features.

The training accuracy keeps increasing with the number of training examples and, after a certain number of training examples, it starts to saturate.

With a further increase in training data, the model starts to overfit the data, leading to reduced test accuracy.

Page 39

DQL Model for Power Allocation

State and action:

Use the $C_{u,k}$ vector along with $V_{u,k}$ of all users in the network as the state.

For $K$ cells, $U$ users/cell, and $F$ sub-bands in total, the state size is $K \times U \times (F + 1)$.

An action corresponds to a power allocation. With $P$ power levels, the total number of power combinations in a cell $= P^F$.

Let $m$ be the total number of combinations possible in a cell after applying the maximal power constraint.

With $K$ cells, the agent has to take $K$ actions from a total of $K \times m$ actions (see the counting sketch below).
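A sketch of counting the per-cell action space $m$ under the maximum-power constraint (the power levels and $P^{\max}$ follow the DQL simulation parameters given later in the talk; the counting itself is my own illustration):

```python
from itertools import product

# Sketch: count per-cell power-allocation actions that respect the
# total-power constraint sum_f P_{k,f} <= Pmax, for F = 3 sub-bands and
# P = 5 discrete power levels (values from the DQL simulation setup).
levels = [6.4, 9.6, 12.8, 16.0, 19.2]   # discrete power levels (W)
F, P_max = 3, 40.0

all_combos = list(product(levels, repeat=F))          # P^F = 125 combinations
feasible = [c for c in all_combos if sum(c) <= P_max]
m = len(feasible)
print(f"P^F = {len(all_combos)}, feasible actions m = {m}")
```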

Page 40

DQL Model for Power Allocation

Training:

Use deep Q-learning with experience replay to train the DQL-based power allocation model.

Testing:

Need to compare the power allocation solution with the optimal power allocation.

A genetic algorithm (GA) is used to find the near-optimal power allocation.

K. I. Ahmed, H. Tabassum, and E. Hossain, "A deep Q-learning method for downlink power allocation in multi-cell networks," https://arxiv.org/abs/1904.13032

Page 41

DQL With Experience Replay

All the past experience is stored.

The next action is determined by the maximum output of the Q-network.

The loss function here is the mean squared error of the predicted Q-value and the target Q-value.

However, how do we obtain the target value? Recall the Q-value update equation: the target is the term $R + \gamma \max_{a'} Q(s_{t+1}, a')$.

https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/

Page 42

DQL With Experience Replay

Instead of using one neural network for learning, we can use two: a prediction network and a target network.

The target network has the same architecture as the prediction network, but with frozen parameters.

Every $C$ iterations (a hyper-parameter), the parameters of the prediction network are copied to the target network (see the sketch below).

https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/
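Putting the pieces together, a minimal PyTorch sketch of deep Q-learning with experience replay and a target network (my own illustration; the layer sizes and hyper-parameters echo the training table on the next page, but the environment that fills the replay memory, and the discount factor, are assumptions):

```python
import random
from collections import deque
import torch
import torch.nn as nn

# Sketch: DQL with experience replay + frozen target network. The replay
# memory is assumed to hold (s, a, r, s2, done) tuples with s, s2 stored
# as float32 tensors; the environment loop is not shown.
state_dim, n_actions = 100, 360               # sizes from the training table
gamma, batch_size, C = 0.9, 64, 1000          # gamma assumed; C = update period

def make_net():
    return nn.Sequential(nn.Linear(state_dim, 1440), nn.ReLU(),
                         nn.Linear(1440, 720), nn.ReLU(),
                         nn.Linear(720, n_actions))

q_net = make_net()                            # prediction network
target_net = make_net()                       # same architecture, frozen
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)
replay = deque(maxlen=80_000)                 # replay memory

def train_step(step):
    if len(replay) < batch_size:
        return
    # Sample a random mini-batch to break temporal correlations
    s, a, r, s2, done = zip(*random.sample(replay, batch_size))
    s, s2 = torch.stack(s), torch.stack(s2)
    a = torch.tensor(a)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    # Predicted Q-values of the actions actually taken
    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    # Target: R + gamma * max_a' Q_target(s', a'), zeroed at episode end
    with torch.no_grad():
        q_tgt = r + gamma * target_net(s2).max(dim=1).values * (1 - done)
    loss = nn.functional.mse_loss(q_pred, q_tgt)  # MSE(predicted, target)
    opt.zero_grad(); loss.backward(); opt.step()
    if step % C == 0:                         # refresh the frozen copy
        target_net.load_state_dict(q_net.state_dict())
```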

Page 43

DQL Model for Power Allocation

Testing (contd.):

The DQL model is compared with different power allocation schemes in terms of the total network throughput.

Power allocation schemes used for comparison: WMMSE, random power allocation, and maximum power allocation.

WMMSE: a distributed approach to sum-rate maximization under a power constraint.

Q. Shi et al., "An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel," IEEE Trans. on Signal Processing, vol. 59, no. 9, pp. 4331–4340.

Page 44

Performance of DQL Model

Simulation parameters: $K = 5$ BSs, $R = 500$ m, $P^{\max} = 40$ W, 5 users per cell, $B = 2.88$ MHz, $N_0 = -174$ dBm/Hz, $F = 3$, $P = 5$, and power levels $= \{6.4, 9.6, 12.8, 16, 19.2\}$ W.

Training parameters:

Parameter               | Value
Number of hidden layers | 2
Layers                  | {Input, Hidden Layer, Hidden Layer, Output}
Neurons per layer       | {100, 1440, 720, 360}
Replay memory size      | 80,000
Batch size              | 64
Target update frequency | 1000
Learning rate           | 0.00025
Optimizer               | RMSprop

Page 45

Performance of DQL Model

Simulation results:

[Figure: Normalized throughput vs. testing sample (samples 160–260; normalized throughput 0.90–1.00), comparing GA (near-optimal), DRL, maximum power, random, and WMMSE allocations along with their averages]

[Figure: Normalized error (%) vs. testing sample (samples 180–260; error 0–8%), comparing DRL, maximum power, random, and WMMSE allocations along with their average errors]

Page 46

Performance of DQL Model

Simulation results (contd.):

[Figure: Average normalized throughput vs. learning rate (0.00025, 0.0025, 0.025; throughput 0.9900–0.9930) for the 5-cell setup]

[Figure: Average normalized throughput vs. number of hidden layers (1, 2, 3; throughput 0.9900–0.9930) for the 5-cell setup]

Page 47

DQL Model for Power Allocation

Simulation results (contd.):

The proposed DQL model performs better than the other power allocation schemes.

For a high learning rate, the performance of DQL degrades.

Continuously increasing the number of hidden layers degrades the performance significantly.

In our setup, we achieve the maximum average normalized throughput for a DQN with 2 hidden layers and a learning rate of 0.0025.

Page 48

Final remarks

6G wireless networks will need to be "super intelligent" (self-reconfigurable, context-aware, secure).

AI and ML are suitable for addressing complex "6G wireless" problems (e.g., resource allocation in massive networks, network control, security).

Application of deep learning in scenarios with massive connections and spatio-temporal correlations.

Page 49

Final remarks

It is crucial to develop efficient offline methods to generate optimal/near-optimal data samples for training the DNN architectures.

Deep reinforcement learning models are particularly suitable (e.g., to solve optimization/"end-to-end" learning problems).

Solutions are very problem-specific.

Leverage state-of-the-art DRL techniques.

Distributed and federated learning techniques.

Page 50

Thank you!