TRANSCRIPT
Wireless Systems Design in the Beyond 5G Era: Promises of Deep Learning and Deep Reinforcement Learning
Ekram Hossain
09 November 2020
Outline
• Part I: Introduction
• Part II: NN, DNN, and Deep Learning
• Part III: DL and DRL Models for Resource Allocation
• Part IV: Conclusion
Evolution toward next generation wireless networks
[Figure: Evolution from a single-tier cellular network to a 4G/5G multi-tier cellular network (dense small cells with pico cells, cell-edge users served via relays, relay-aided D2D communication and D2D links, M2M devices/sensors with M2M gateways, indoor femto access points with femto gateways, C-RAN with remote radio heads and fronthaul links, wireless access, backhaul, and core/Internet connectivity) and further to the enhanced 5G RAN and B5G: SDN controller, edge servers, enhanced Evolved Packet Core (EPC), mm-wave massive MIMO BSs, aerial and underwater drones, cell-free massive MIMO, VLC, group mobility, and the smart grid/energy Internet.]
Next generation wireless networks: Beyond 5G
• Internet-of-Everything
• Converged terrestrial and non-terrestrial communication networks
• Massive capacity, massive connectivity, energy efficiency
• Programmable radio environment
• Convergence of communications and computing
• Virtualization/network slicing and “SDN”-ization of converged communication networks
Network management and control in B5G era
• AI-based super-intelligent RAN for 6G: resource allocation (communications and computing) optimization, cell and radio design, network personalization, etc.
• Optimal solutions are obtained by applying exhaustive search methods, genetic algorithms, and combinatorial and branch-and-bound techniques
◦ These incur significantly high time and computational complexity
Network management and control in B5G era
• Sub-optimal solutions are obtained based on techniques such as
◦ Lagrangian relaxations, iterative distributed optimization, heuristic algorithms, and game theory
• These are also very computation intensive and/or may not be feasible for large cellular networks due to high signaling overhead
• Sub-optimal solutions can be far from the optimal solutions, and their convergence properties and optimality gaps may be unknown.
Machine learning basics
Essence of data-driven machine learning:
• A pattern exists which cannot be pinned down mathematically.
• Data is available.
• Learn from the input data set by analyzing the data statistically.
• Make decisions or predict outputs without being explicitly programmed.
• Typical applications: image recognition, speech recognition, medical diagnosis, etc.
• Applications to cellular communications: decoding, modulation classification, spectrum sensing, localization, mobility prediction, handoff management, resource optimization, load balancing/cell association, power control, fault detection, fault classification, caching, traffic classification, congestion prediction
Machine learning basics
Learning diagram

[Figure: The learning diagram. An unknown target function f: X → Y (the ideal credit-approval function) generates the training examples (x_1, y_1), ..., (x_N, y_N) (historical records of credit customers). A learning algorithm A selects, from a hypothesis set H (the set of candidate formulas), a final hypothesis g ≈ f (the final credit-approval formula).]

* Yaser Abu-Mostafa, lecture slides on “Learning from Data”, https://work.caltech.edu/lectures.html#lectures
Resource management based on machine learning
• With machine learning (ML), solutions are learned by exploiting the data samples
• ML tools can be used to obtain practical solutions for radio resource allocation problems in a large cellular network, given the past optimal or near-optimal resource allocation decisions.
• ML-based resource allocation algorithms can be implemented online.
• Note: Deep learning (DL) is a sub-class of ML based on “deep neural networks”, which use multiple layers of nonlinear processing units.
Resource management based on machine learning
• No performance guarantee (suboptimal performance)
• Lack of interpretability (black-box mapping of inputs to outputs)
• Depends on the availability of data

O. Simeone, “A very brief introduction to machine learning with applications to communication systems,” IEEE Trans. on Cognitive Communications and Networking, vol. 4, no. 4, Dec. 2018.
When to use machine learning?
• No mathematical model or efficient algorithm exists (modeling and/or algorithmic deficit)
• The task involves a function that maps well-defined inputs to well-defined outputs
• The function does not change rapidly over time
• Large data sets can be made available
• Error can be tolerated, and optimal solutions are not required
Resource management based on machine learning
Example: Sub-band and power allocation in multi-cell networks

• Consider a downlink cellular network with $K$ base stations ($k \in \{1, \cdots, K\}$) and $F$ frequency sub-bands
• The bandwidth of each sub-band is $B$; the power allocated by cell $k$ in frequency sub-band $f$ is $P_{k,f}$, which is discrete
• The total power of cell $k$ is limited by a maximum value $P_k^{\max}$ such that $\sum_{f \in \mathcal{F}} P_{k,f} \le P_k^{\max}$, $\forall k \in \{1, \cdots, K\}$; $\mathcal{U}_k$ = set of users associated with cell $k$
• Vector $A_{k,f}$ denotes the allocation of sub-band $f$ in cell $k$, and $P_{k,f}$ denotes the allocation of power in sub-band $f$ in cell $k$

K. I. Ahmed, H. Tabassum, and E. Hossain, “Deep learning for radio resource allocation in multi-cell networks,” https://arxiv.org/abs/1808.00667
Resource management based on machine learning
Sub-band and power allocation in multi-cell networks (contd.)
Sum-utility for all cells:

$$U = \sum_{k \in \{1, \cdots, K\}} \sum_{u \in \mathcal{U}_k} \sum_{f=1}^{F} \mathbb{I}(A_{k,f} = u)\, B \log\left(1 + \alpha\, \mathrm{SINR}_{u,k,f}\right)$$

where $\alpha = -1.5/\log(5\,\mathrm{BER})$
Resource management based on machine learning
Sub-band and power allocation in multi-cell networks (contd.)

Optimization problem:

$$\underset{A_{k,f},\, P_{k,f}}{\text{maximize}} \;\; U \qquad \text{subject to} \;\; \sum_{f \in \mathcal{F}} P_{k,f} \le P_k^{\max}, \;\; \forall k \in \{1, \cdots, K\}$$

• A non-convex combinatorial optimization problem
• The optimal solution can be obtained by an exhaustive search. With $K$ cells, $F$ sub-bands, and $P$ discrete power levels, there are $(P^F)^K = P^{KF}$ possible combinations for the power settings at the base stations
• e.g., for $K = 15$, $F = 5$, $P = 5$: $3125^{15}$ possible combinations ⟹ infeasible (see the sketch below)
• For joint sub-band and power allocation: $\left(\sum_k U_k\right)^F \times P^{KF}$ possibilities
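To get a feel for the scale, a quick back-of-the-envelope check in Python (a minimal sketch; K, F, and P are the values from this slide):

```python
# Size of the exhaustive power-allocation search space.
K, F, P = 15, 5, 5                 # cells, sub-bands, discrete power levels

assert 3125 ** 15 == 5 ** 75       # 3125 = 5^5, so (P^F)^K = P^(K*F)
print(f"{P ** (K * F):.3e}")       # ~2.6e+52 candidate power settings
```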
Resource management based on machine learning
Example: Optimal user association for sum-rate maximization in an interference channel

• Assume $K$ transmitters and $M$ receivers; each transmitter can serve only one user; one receiver can be served by at most $a_m$ transmitters; and there is a minimum rate constraint for each user
• SINR for transmission from $k$ to $m$:

$$\gamma_{k,m} = \frac{p_k g_{km}}{\sigma^2 + \sum_{l \ne k} p_l g_{lm}}$$

• User association variable:

$$\rho_{k,m} = \begin{cases} 1, & \text{if transmitter } k \text{ serves receiver } m \\ 0, & \text{otherwise} \end{cases}$$

• Sum rate $= B \sum_{k=1}^{K} \sum_{m=1}^{M} \rho_{k,m} \log_2(1 + \gamma_{k,m})$

A. Zappone, M. di Renzo, and M. Debbah, “Wireless networks design in the era of deep learning: Model-based, AI-based, or both?,” arXiv:1902.02647v1
Resource management based on machine learning
Problem formulation:

$$\max_{\rho} \; \sum_{k=1}^{K} \sum_{m=1}^{M} \rho_{k,m} \log_2(1 + \gamma_{k,m})$$

$$\text{s.t.} \quad \sum_{m=1}^{M} \rho_{k,m} \le 1, \;\; \forall k \in \{1, \cdots, K\}$$

$$\sum_{k=1}^{K} \rho_{k,m} \le a_m, \;\; \forall m \in \{1, \cdots, M\}$$

$$\sum_{m=1}^{M} \rho_{k,m} \log_2(1 + \gamma_{k,m}) \ge R_k^{(\min)}, \;\; \forall k \in \{1, \cdots, K\}$$

$$\rho_{k,m} \in \{0, 1\}, \;\; \forall k \in \{1, \cdots, K\}, \; \forall m \in \{1, \cdots, M\}$$
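For a toy instance, this integer program can be solved by brute force; a minimal sketch (the channel gains, powers, and constraint values are all illustrative, not from the paper):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K, M = 3, 2                         # transmitters, receivers (toy instance)
p = np.ones(K)                      # transmit powers p_k
g = rng.rayleigh(size=(K, M)) ** 2  # channel gains g_{km}
sigma2, a, R_min = 0.1, [2, 2], 0.05

def rate(k, m):
    """log2(1 + gamma_{k,m}) with interference from all other transmitters."""
    interference = sum(p[l] * g[l, m] for l in range(K) if l != k)
    return np.log2(1 + p[k] * g[k, m] / (sigma2 + interference))

best, best_assoc = -np.inf, None
# assoc[k] = the one receiver served by transmitter k (sum_m rho_{k,m} = 1).
for assoc in itertools.product(range(M), repeat=K):
    if any(assoc.count(m) > a[m] for m in range(M)):
        continue                    # receiver m served by at most a_m transmitters
    rates = [rate(k, assoc[k]) for k in range(K)]
    if any(r < R_min for r in rates):
        continue                    # minimum-rate constraint for each user
    if sum(rates) > best:
        best, best_assoc = sum(rates), assoc

print(best_assoc, best)             # prints None/-inf if no feasible association
```

Brute force scales as $M^K$, which is exactly why the talk turns to learned solutions for realistic network sizes.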
Machine learning basics
• Categories: supervised, unsupervised, and reinforcement learning
• Supervised learning: given a data set $D = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$, predict $y$ so as to generalize the input-output mapping in $D$ to inputs $x$ outside $D$.
• Classification (discrete output) and regression (continuous output) problems
• Performance metrics: classification accuracy (or error rate) and root mean square error (RMSE)
• Common supervised learning techniques: Bayesian classification, K-Nearest Neighbor (KNN), Neural Network (NN), Support Vector Machine (SVM), Decision Tree (DT) classification, recommender systems
Neural Networks (NNs)
• Defines a mapping $g(x, \theta): \mathbb{R}^n \to \mathbb{R}^k$ of an input vector $x \in \mathbb{R}^n$ to an output vector $y \in \mathbb{R}^k$.
• Consists of basic components known as neurons or nodes
• Three layers: input layer, hidden layers, and output layer.
• Nodes can perform non-linear functions.

[Figure: A neural network]
Neural networks
Mathematically, for neuron $k$ in a hidden layer:

$$v_k = \sum_{j=1}^{m} w_{kj} x_j + b_k, \qquad y_k = \varphi(v_k)$$

Commonly used learning algorithm: the back-propagation algorithm

[Figure: Non-linear model of a neuron]
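As a one-line illustration of this model, a single neuron's forward computation in NumPy (a minimal sketch; the weights and inputs are made-up numbers):

```python
import numpy as np

def neuron(x, w, b, phi=np.tanh):
    """Non-linear model of a neuron: y_k = phi(sum_j w_kj * x_j + b_k)."""
    v = np.dot(w, x) + b        # induced local field v_k
    return phi(v)               # activation output y_k

x = np.array([0.5, -1.2, 3.0])      # inputs x_1..x_m (illustrative)
w = np.array([0.1, 0.4, -0.2])      # synaptic weights w_k1..w_km
print(neuron(x, w, b=0.05))
```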
Neural Networks
Activation functions:
• Sigmoidal: $\varphi(x) = \frac{1}{1+e^{-x}}$ (generally used in the output layer)
• Hyperbolic tangent: $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ (generally used in the output layer)
• Rectified linear unit: $\mathrm{ReLU}(x) = \max(0, x)$ (generally used in the hidden layers)
• Refinements of ReLU, e.g., Leaky ReLU, exponential ReLU
• Softmax: $\frac{e^{x_i}}{\sum_j e^{x_j}}$ (generally used in the output layer)
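These are one-liners in NumPy (np.tanh is built in; the max-subtraction in softmax is a standard numerical-stability trick, not from the slide):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):      # a common refinement of ReLU
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    z = np.exp(x - np.max(x))       # shift inputs for numerical stability
    return z / z.sum()

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x), np.tanh(x), relu(x), leaky_relu(x), softmax(x))
```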
Neural Networks
Training procedure:
• Target function: $y = f^*(x)$, where $x$ is the input vector and $y$ is the output vector
• Model: $y = f(x; \theta)$, where $\theta$ denotes the unknown parameters, i.e., the weights and biases
• The goal is to learn $\theta$ precisely so that our model comes as close as possible to the original one
Neural Networks
Training procedure: feed-forward and back-propagation neural network
• A training dataset composed of inputs and outputs is typically used to train the model
• Initialize the weights and biases randomly and feed the inputs to the input layer.
• Forward pass: propagation of information from the input layer to the output layer.
• A cost function (e.g., mean squared error [MSE]) measures the quality of the model by calculating the error between the predicted and the original values
Neural Networks
Training procedure (contd.):
• Backward pass: the error signal is propagated backward through the hidden layers, updating the weights and biases of each layer
• E.g., standard gradient descent: an algorithm that optimizes the weights of the neurons based on the gradient of the loss function
• The training process continues until the error rate reaches a threshold value.
Example: Learning XOR
• Target function: $f^*(X_1, X_2)$
• Model function: $y = f(X_1, X_2; \theta)$
• The learning algorithm will adapt the parameters $\theta$ to make $f$ as similar as possible to $f^*$.
• $[X_1, X_2] \in \{[0, 0], [0, 1], [1, 0], [1, 1]\}$
• MSE loss function:

$$e(\theta) = \frac{1}{4} \sum_{X_1, X_2} \left(f^*(X_1, X_2) - f(X_1, X_2; \theta)\right)^2$$
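A minimal NumPy sketch of learning XOR with this MSE loss (one hidden layer of four sigmoid units and hand-coded backpropagation; the architecture, learning rate, and step count are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# The four XOR training examples and their targets f*(X1, X2).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# theta = (W1, b1, W2, b2): one hidden layer of 4 units.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 2.0

for step in range(10_000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)        # hidden activations
    f = sigmoid(h @ W2 + b2)        # model output f(X1, X2; theta)

    # Backward pass: gradients of e(theta) through the sigmoids.
    d_f = 2 * (f - y) / len(X) * f * (1 - f)
    d_h = (d_f @ W2.T) * h * (1 - h)

    # Gradient-descent update of theta.
    W2 -= lr * h.T @ d_f;  b2 -= lr * d_f.sum(axis=0)
    W1 -= lr * X.T @ d_h;  b1 -= lr * d_h.sum(axis=0)

print(np.round(f.ravel(), 2))  # should approach [0, 1, 1, 0] (init-dependent)
```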
Deep Learning (DL) and Deep Neural Network (DNN)
• DL: a branch of machine learning which can be supervised, semi-supervised, or unsupervised.
• DNN: a deep learning architecture consisting of many hidden layers
• Improves performance at a faster rate when the dimension of the training data (no. of features) increases

[Figure: A deep neural network]
Reinforcement Learning
• An agent learns by interacting with the environment.
• The agent senses its current state and then chooses an action.
• For every action, the agent either receives a reward for a good move or a penalty for a bad move.
• The primary job of the agent is to maximize the expected cumulative reward through a series of actions.
• Markov decision processes (MDPs) formally describe an environment for RL where the environment is fully observable.
Q Learning
• An MDP models a sequential decision-making problem where the state transition ($P$) and the reward function ($R$) depend only on the current state and the applied action.
• If the state transition $P$ (which gives the next state for any state-action pair) and the reward function $R$ are known, the MDP can be solved through dynamic programming.
• If $P$ and $R$ are not known, the Q-learning method can be used to obtain the optimal action-value function.
• An action-value function determines the value of being in a certain state and taking a certain action in that state.
Q Learning
• Q-learning is a technique that evaluates which action to take based on an action-value function (the $Q(s, a)$ value).
• Main idea of Q-learning: “explore” all possibilities of state-action pairs and estimate the $Q(s, a)$ value (i.e., the long-term reward that will be received by applying an action in a state).
• Update rule derived from the Bellman equation (with $s_{t+1}$ the next state):

$$Q(s_t, a) \leftarrow (1 - \alpha)\, Q(s_t, a) + \alpha \left[ R(s_t, a) + \gamma \max_{a'} Q(s_{t+1}, a') \right]$$

• Optimal policy: $\pi^*(s) = \arg\max_a Q^*(s, a)$
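A tabular Q-learning sketch on a toy chain MDP (the environment, the random behavior policy, and the constants α, γ are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2            # toy chain MDP: action 1 moves right
alpha, gamma = 0.1, 0.9
Q = np.zeros((n_states, n_actions))   # tabular action-value function Q(s, a)

def step(s, a):
    """Toy dynamics: reward 1 whenever the last state is reached."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

for episode in range(500):
    s = int(rng.integers(n_states))   # random start for good exploration
    for t in range(20):
        # Random behavior policy: Q-learning is off-policy, so "exploring"
        # state-action pairs like this still estimates the optimal Q.
        a = int(rng.integers(n_actions))
        s_next, r = step(s, a)
        # Q-learning update from the Bellman equation.
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * Q[s_next].max())
        s = s_next

print(np.argmax(Q, axis=1))   # greedy policy pi*(s): always move right
```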
Deep Reinforcement Learning (DRL)
• In RL, a policy is stored in tabular form.
• Traditional RL is limited to low-dimensional state and action spaces.
◦ Not feasible for large action and state spaces
• Therefore, instead of a tabular method, a function approximator such as a DNN can be used.
DQL (contd.)
So, what are the steps involved in reinforcement learning using deep Q-learning networks (DQNs)?
1. All the past experience is stored by the user in memory
2. The next action is determined by the maximum output of the Q-network
3. The loss function here is the mean squared error of the predicted Q-value and the target Q-value, Q*. This is basically a regression problem. However, we do not know the target or actual value here, as we are dealing with a reinforcement learning problem. Going back to the Q-value update equation derived from the Bellman equation, the term $R(s_t, a) + \gamma \max_{a'} Q(s_{t+1}, a')$ represents the target. We can argue that the network is predicting its own value, but since $R$ is the unbiased true reward, the network is going to update its gradient using backpropagation to finally converge.

https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/
DL-based resource allocation model
System model
• Recall: a downlink multi-cell network with $K$ cells and $F$ frequency sub-bands
• $C_{u,k}$ is the CQI vector of user $u$ in cell $k$: $C_{u,k} := (C_{u,k,1}, C_{u,k,2}, \ldots, C_{u,k,F})$.
• A location indicator $V_{u,k}$ is used to indicate whether user $u$ of cell $k$ is a cell-centre user or a cell-edge user:

$$V_{u,k} = \begin{cases} 1, & \text{if } R_{u,k} > R/2 \\ 0, & \text{otherwise} \end{cases}$$

where $R_{u,k}$ = distance of user $u$ in cell $k$ from the BS and $R$ = cell radius.
Supervised DL approach
• The DL model takes the $C_{u,k}$ vector along with $V_{u,k}$ of all users as input and predicts the power and sub-band allocations as output; e.g., for a particular user the input is:

  CQI vector: 13 9 7 | Location indicator: 1

• For 5 BSs, 5 users/cell, and 3 frequency sub-bands ⟹ the input data size is 5 × 5 × (3 + 1) = 100
• For $K$ cells, $U$ users/cell, and $F$ sub-bands, the size of the input data = $K \times U \times (F + 1)$ (see the sketch below)
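A sketch of assembling such an input vector (the array layout and the random CQI values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
K, U, F = 5, 5, 3                    # cells, users per cell, sub-bands

cqi = rng.integers(0, 16, size=(K, U, F))   # C_{u,k}: per-sub-band CQI values
v = rng.integers(0, 2, size=(K, U, 1))      # V_{u,k}: cell-centre/edge indicator

x = np.concatenate([cqi, v], axis=2).reshape(-1)   # flatten into the DNN input
print(x.shape)                        # (100,) = K * U * (F + 1)
```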
Supervised DL approach
• Convert the output data from decimal base to $n$-bit binary and also use their complements in the output
• Example output for a particular cell (36 bits), with 3-bit binary representation along with complements (see the sketch below):

              Power allocation           Sub-band allocation
  Value:       1        1        3        3        5        2
  bin/comp:  001/110  001/110  011/100  011/100  101/010  010/101

• With 5 cells, the total number of outputs = 5 × 36 = 180.
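A minimal sketch of this binary-plus-complement encoding (the helper `encode` is ours, not from the paper):

```python
def encode(value, n_bits=3):
    """n-bit binary of a value together with its bitwise complement."""
    bits = format(value, f"0{n_bits}b")
    comp = "".join("1" if b == "0" else "0" for b in bits)
    return bits, comp

cell_output = [1, 1, 3, 3, 5, 2]      # power and sub-band levels of one cell
encoded = [s for v in cell_output for s in encode(v)]
print(encoded)                        # ['001', '110', '001', '110', ...]
print(sum(len(s) for s in encoded))   # 36 output bits per cell
```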
Supervised DL approach
Data generation and testing:
• To train and test the DNN model, a genetic algorithm (GA) is used.
• The training dataset consists of input data and labeled target data (or output data)

[Figure: DNN model for power and sub-band allocation]
Supervised DL approach
System operation:
• All users in the network periodically send their CQI values to their serving BSs.
• Each BS then sends the processed information of all its users to a central entity (e.g., an SDN controller), which runs the DNN model.
• The DNN model generates the allocation vector and power vector for all the BSs
• Once the prediction is made, the SDN controller sends the allocation and power vectors back to the BSs.
Supervised DL approach
Simulation parameters:
K = 5 BSs, R = 500 m, $P^{\max}$ = 40 W, directional antennas, number of users per cell = 5, B = 2.88 MHz, $N_0$ = −174 dBm/Hz, F = 3, P = 3, and power levels = {6.4, 12.8, 19.2} W.

Input data generation:
• Using the GA, produce around 17000 samples for the DNN model.
• Use 80% of this data set for training and the remaining 20% for testing.

Comparison between exhaustive search and genetic algorithm:

  Parameter             Exhaustive Search   Genetic Algorithm
  Avg. execution time   1460.00 sec         118.76 sec
  Max. execution time   2247.50 sec         158.65 sec
  Min. execution time   1509.00 sec         95.13 sec
  Accuracy              100%                85.25%
Supervised DL approach
Simulation results:

[Figure: Test accuracy (%) vs. number of hidden layers (2 to 5).]
[Figure: Training and test accuracy (%) vs. number of training samples (4000 to 12000).]
Supervised DL approach
Simulation results:
• We achieve the maximum test accuracy of 86.31% with 4 hidden layers.
• Continuously increasing the number of hidden layers will not improve the accuracy significantly, and in some cases the model may even start to learn noisy features.
• The training accuracy keeps increasing with the number of training examples and, after a certain number of training examples, starts to saturate.
• With a further increase in training data, the model starts to overfit the data, leading to reduced test accuracy
DQL Model for Power Allocation
State and action:
• Use the $C_{u,k}$ vector along with $V_{u,k}$ of all users in the network as the state.
• For $K$ cells, $U$ users/cell, and $F$ sub-bands in total, the state size is $K \times U \times (F + 1)$.
• An action corresponds to a power allocation.
◦ With $P$ power levels, the total number of power combinations in a cell = $P^F$
◦ Let $m$ be the total number of combinations possible in a cell after applying the maximum power constraint (see the sketch below).
◦ With $K$ cells, the agent has to take $K$ actions from a total of $K \times m$ actions.
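A sketch of enumerating the per-cell action set (the power levels and the 40 W cap follow the simulation setup later in the talk; names are illustrative):

```python
import itertools

levels = [6.4, 12.8, 19.2]     # P = 3 discrete power levels (watts)
F, P_max = 3, 40.0             # sub-bands per cell, maximum cell power

# All P^F candidate per-cell power vectors, then keep the feasible ones.
candidates = list(itertools.product(levels, repeat=F))
feasible = [c for c in candidates if sum(c) <= P_max]

print(len(candidates), len(feasible))   # 27 candidates, m feasible actions
```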
DQL Model for Power Allocation
Training:
• Use deep Q-learning with the experience replay approach to train the DQL-based power allocation model.

Testing:
• Need to compare the power allocation solution with the optimal power allocation
• A genetic algorithm (GA) is used to find the near-optimal power allocation.

K. I. Ahmed, H. Tabassum, and E. Hossain, “A deep Q-learning method for downlink power allocation in multi-cell networks,” https://arxiv.org/abs/1904.13032
DQL With Experience Replay
• All the past experience is stored
• The next action is determined by the maximum output of the Q-network
• The loss function here is the mean squared error of the predicted Q-value and the target Q-value.
• However, how do we obtain the target (actual) value? Recall the Q-value update equation: the target is the term $R(s_t, a) + \gamma \max_{a'} Q(s_{t+1}, a')$.

https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/
DQL With Experience Replay
• Instead of using one neural network for learning, we can use two: a prediction network and a target network
• The target network has the same architecture as the prediction network, but with frozen parameters.
• Every C iterations (a hyper-parameter), the parameters of the prediction network are copied to the target network (see the sketch below).

https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/
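A structural sketch of DQL with experience replay and a target network (everything here is illustrative: a toy linear Q-function stands in for the prediction/target DNNs and the environment is a dummy; only the replay memory, ε-greedy action choice, MSE-on-frozen-target update, and the every-C-iterations copy follow the scheme above):

```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions = 4, 3
C, gamma, lr, eps, batch = 100, 0.9, 0.01, 0.1, 32

# Linear Q-function Q(s) = W s + b stands in for the prediction DNN.
W = rng.normal(scale=0.1, size=(n_actions, state_dim)); b = np.zeros(n_actions)
W_tgt, b_tgt = W.copy(), b.copy()        # target network: frozen copy
replay = deque(maxlen=10_000)            # experience replay memory

def q(s, W, b):
    return W @ s + b

s = rng.normal(size=state_dim)
for it in range(1, 2001):
    # epsilon-greedy action from the maximum output of the Q-network.
    a = int(np.argmax(q(s, W, b))) if rng.random() > eps else int(rng.integers(n_actions))
    r, s_next = float(s[a]), rng.normal(size=state_dim)   # dummy reward/transition
    replay.append((s, a, r, s_next))                      # store the experience
    s = s_next

    if len(replay) >= batch:
        for s_b, a_b, r_b, sn_b in random.sample(list(replay), batch):
            target = r_b + gamma * np.max(q(sn_b, W_tgt, b_tgt))  # frozen target
            td = q(s_b, W, b)[a_b] - target     # gradient of the squared error
            W[a_b] -= lr * td * s_b
            b[a_b] -= lr * td

    if it % C == 0:                  # every C iterations: sync target network
        W_tgt, b_tgt = W.copy(), b.copy()

print(np.round(q(s, W, b), 3))       # Q-values of the final dummy state
```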
DQL Model for Power Allocation
Testing (contd.):
• The DQL model is compared with different power allocation schemes in terms of the total network throughput.
• Power allocation schemes used for comparison: WMMSE, random power allocation, and maximum power allocation
◦ WMMSE: a distributed approach to sum-rate maximization under a power constraint

Q. Shi et al., “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,” IEEE Trans. on Signal Processing, vol. 59, no. 9, pp. 4331–4340.
Performance of DQL Model
Simulation parameters:
K = 5 BSs, R = 500 m, $P^{\max}$ = 40 W, number of users per cell = 5, B = 2.88 MHz, $N_0$ = −174 dBm/Hz, F = 3, P = 5, and power levels = {6.4, 9.6, 12.8, 16, 19.2} W.

Training parameters:

  Parameter                  Value
  Number of hidden layers    2
  Layers                     {Input, Hidden Layer, Hidden Layer, Output}
  No. of neurons per layer   {100, 1440, 720, 360}
  Replay memory size         80,000
  Batch size                 64
  Update target frequency    1000
  Learning rate              0.00025
  Optimizer                  RMSprop
Performance of DQL Model
Simulation results:

[Figure: Normalized throughput vs. testing sample, comparing GA (near-optimal), DRL, maximum power, random power, and WMMSE, with each scheme's average marked.]
[Figure: Normalized error (%) vs. testing sample for DRL, maximum power, random power, and WMMSE, with each scheme's average error marked.]
Performance of DQL Model
Simulation results (contd.):

[Figure: Average normalized throughput vs. learning rate (0.00025, 0.0025, 0.025) for the 5-cell network.]
[Figure: Average normalized throughput vs. number of hidden layers (1, 2, 3) for the 5-cell network.]
DQL Model for Power Allocation
Simulation results (contd.):
• The proposed DQL model performs better than the other power allocation schemes
• For a high learning rate, the performance of DQL degrades
• A continuous increase in the number of hidden layers degrades the performance significantly
• In our setup, we achieve the maximum value of the average normalized throughput for a DQN with 2 hidden layers and a learning rate of 0.0025.
Final remarks
• 6G wireless networks will need to be “super intelligent” (self-reconfigurable, context-aware, secure)
• AI and ML are suitable for addressing complex “6G wireless” problems (e.g., resource allocation in massive networks, network control, security)
• Application of deep learning in scenarios with massive connections and spatio-temporal correlations
Final remarks
• It is crucial to develop efficient offline methods to generate optimal/near-optimal data samples for training the DNN architectures.
• Deep reinforcement learning models are particularly suitable (e.g., for solving optimization/“end-to-end” learning problems)
◦ Very problem specific
◦ Leverage state-of-the-art DRL techniques
• Distributed and federated learning techniques
Thank you!