
Towards Deep Learning-based Wireless Communication Systems: Design Perspective

Wonjun Kim, Yongjun Ahn, Jinhong Kim, and Byonghyo Shim
Seoul National University, Seoul, Korea

Abstract—Deep learning (DL), a branch of artificial intelligence (AI) techniques, has shown great promise in various fields such as image classification and segmentation, speech recognition, and language translation, among others. This remarkable success of DL has stimulated increasing interest in applying this paradigm to wireless communication systems. Since DL principles are inductive in nature and clearly distinct from the conventional wireless communication principles based on Shannon theory, one who tries to apply DL techniques to wireless applications might easily get lost or stuck in the middle of the bewildering variety of DL techniques. The primary purpose of this article is to provide a friendly overview, insightful knowledge, and useful tips that wireless researchers need to know when designing DL-based wireless systems. Specifically, we discuss key issues and possible solutions related to DL model selection, training data acquisition, and neural network design encountered in real wireless system design.

I. INTRODUCTION

ARTIFICIAL intelligence (AI) is a powerful tool to perform tasks that seem to be simple for human beings but are extremely difficult for conventional (rule-based) computer programs. Deep learning (DL), a branch of AI techniques introduced by LeCun, Bengio, and Hinton [1], has shown great promise in many practical applications. In the past few years, we have witnessed the great success of DL in various fields such as the traditional Go game, image classification, speech recognition, and language translation, among others [2]–[4]. Recently, DL techniques have also been applied to various wireless communication applications such as multiple-input-multiple-output (MIMO) detection, channel estimation, spectrum sensing, and resource scheduling.

When one tries to apply AI techniques to wireless applications, one can be easily overwhelmed by the many knobs to control and small details to be aware of. In contrast to conventional communication systems, where the performance analysis and the algorithm design are done analytically, DL requires lots of hands-on experience and heuristic knowledge in the design of the neural network, the generation of the training dataset, and the choice of the training strategy. In fact, since the DL design process is data-driven and inductive in nature, one can easily get lost or stuck in the middle when trying to solve a wireless communication problem using DL techniques.

The primary goal of this paper is to present a friendly overview of DL-based wireless systems to serve as a starting point to facilitate the use of DL in wireless system design. The successful design of a DL-based system comes down to the choice of a proper DL model for the target wireless application, the detailed neural network architecture design, and the training data acquisition along with the training strategy selection. We get to the heart of these issues without going into too much detail. With this purpose in mind, we discuss key principles of DL-based design and then provide several useful design tips learned from our experience, with plentiful wireless communication examples including channel estimation, MIMO beamforming, power management, and angle-of-arrival (AoA) detection.

The rest of this article is organized as follows. In Section II, we briefly review the design principles of conventional and DL-based wireless systems and discuss the learning techniques used for wireless system design. In Section III, we explain the training dataset collection and neural network architecture design issues. We discuss future issues and conclude the paper in Section IV.

II. ARTIFICIAL INTELLIGENCE-BASED WIRELESS COMMUNICATIONS

In this section, we briefly discuss two design principles: conventional wireless systems and AI-based systems. We then discuss how specific communication functions are mapped to DL techniques.

A. Design Principles of Conventional and AI-based Wireless Systems

When designing wireless systems, the whole system is divided into several functional blocks such as the channel encoder, symbol mapper, channel estimator, MIMO detector, and channel decoder. In each functional block, system modeling, performance analysis, and algorithm design are performed. The system model, typically expressed as a clean-cut linear equation, defines the relationship between the observation (e.g., received signal) and the latent variables to be recovered (e.g., transmit signal). Using this, theoretical analysis is conducted to obtain the performance limit such as the capacity bound or achievable degrees of freedom (DoF), and then a proper algorithm achieving near-optimal performance is developed (e.g., the MMSE channel estimator or the maximum-likelihood (ML)-based symbol detector). For example, in the mmWave channel estimator design, a propagation channel is modeled by geometric parameters such as the angle-of-departure/arrival (AoD/AoA), path delay, and path gain, and then the compressed sensing (CS) technique is employed to find the sparse parameters used to reconstruct the mmWave channel.


TABLE I
SUMMARY OF DL TECHNIQUES

Learning technique     | Applicable problem                                  | Loss function                                                        | Application example
Supervised learning    | Detection problem using the classification training | Cross entropy, Kullback-Leibler (KL) divergence                      | MIMO detection, Active user detection
Supervised learning    | Estimation problem using the regression training    | Mean squared error (MSE), Mean absolute error (MAE)                  | Channel estimation, DoA estimation
Unsupervised learning  | Optimization problem                                | Objective function to be optimized (e.g., sum-rate, cell throughput) | MIMO beamforming, Resource scheduling
Reinforcement learning | Sequential decision making problem                  | Cumulative reward (e.g., total power consumption)                    | Power management, Spectrum sensing

As the wireless environments and systems become more complicated, it is very difficult to come up with a simple yet tractable system model. Further, due to the excessive assumptions on the fading/noise/interference distribution, input statistics, and traffic/mobility pattern, the obtained analytic results will leave a considerable gap from the real-world performance.

As an entirely new paradigm to deal with this problem, AI has been popularly used in various applications such as computer vision, speech recognition, robot control, and autonomous driving [3], [4]. A holy grail of AI is to let the machine learn the complicated, often highly nonlinear, relationship between the input dataset and the desired output without human intervention. As a technique to implement AI, DL, an approach using deeply stacked neural networks in the training, has been widely used in recent years. In a nutshell, DL-based systems are distinct from conventional systems in two main respects: data-driven training and end-to-end learning of a black box. Instead of following the analytical avenue, the DL model approximates the desired function as a whole using the training dataset. In the training phase, the DL parameters (weights and biases) are updated to identify the end-to-end mapping between the input dataset and the desired output. Once the training is finished, the DL model returns the predicted output for the input in the inference phase. This means that what we essentially need to do is to just feed a training dataset into a properly designed DL model. It seems simple but requires lots of hands-on experience to get the most bang for the buck.

B. Learning Techniques for DL-based Wireless Communication

When one tries to apply DL to wireless systems, perhaps the first thing to consider is what learning technique to use. Depending on the design goal, training dataset, and learning mechanism, DL techniques can be roughly divided into three categories: supervised learning, unsupervised learning, and reinforcement learning.

1) Supervised learning: the primary goal of supervised learning is to learn a mapping function between the input dataset and the desired solution, called the label. To scrutinize the quality of a designed neural network and reflect it in the weight update process, we need a loss function that measures how far the predicted output is from the label. The difference between the predicted output and the label, in the form of cross entropy or mean squared error (MSE), is used as a loss function. Typically, there are two types of supervised learning: classification, to find the categorical class of a given input (e.g., whether a device is active or not), and regression, to return a numerical value (e.g., the estimated channel). The classification task is suitable for detection problems such as MIMO detection, automatic modulation classification (AMC), and active user detection (AUD), and the regression task is a good fit for estimation problems such as channel estimation, angle (DoA/AoD) estimation, and log likelihood ratio (LLR) generation [5]–[8].

In AUD, for example, we identify a few active (data-transmitting) users among all possible users in a cell, so the problem can be well interpreted as a multi-label classification problem identifying a few labels among all possible classes. By employing the set of received vectors as inputs and the active user indices as outputs, a deep neural network (DNN) is trained to find the indices of active users. In the channel estimation problem, on the other hand, the desired task is to produce a real-valued channel estimate from the received pilot signals. Using the MSE between the real channel and the DL output (i.e., the estimate of the real channel) as a loss function, the DNN learns the regression mapping between the received vector and the channel.
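To make the two supervised setups concrete, the following PyTorch sketch pairs a multi-label classifier (with a cross-entropy-type loss) for AUD with a regression network (with an MSE loss) for channel estimation. All dimensions and hyperparameters are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the two supervised-learning setups discussed above.
import torch
import torch.nn as nn

N_USERS, N_RX = 64, 32            # hypothetical: number of users, received-vector length

# Multi-label classification for active user detection (AUD):
# input = received vector, output = per-user activity probability.
aud_net = nn.Sequential(
    nn.Linear(N_RX, 256), nn.ReLU(),
    nn.Linear(256, N_USERS),      # logits; the sigmoid is folded into the loss below
)
aud_loss = nn.BCEWithLogitsLoss() # binary cross entropy per user (multi-label)

# Regression for channel estimation:
# input = received pilot signal, output = real-valued channel estimate.
N_PILOT, N_CH = 128, 64           # hypothetical pilot/channel dimensions
ce_net = nn.Sequential(
    nn.Linear(N_PILOT, 256), nn.ReLU(),
    nn.Linear(256, N_CH),
)
ce_loss = nn.MSELoss()            # MSE between the true channel and the DNN output

# One gradient step for each model on a random mini-batch (placeholder data).
y, act = torch.randn(16, N_RX), torch.randint(0, 2, (16, N_USERS)).float()
p, h = torch.randn(16, N_PILOT), torch.randn(16, N_CH)
for net, loss_fn, x, target in [(aud_net, aud_loss, y, act), (ce_net, ce_loss, p, h)]:
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss = loss_fn(net(x), target)
    opt.zero_grad(); loss.backward(); opt.step()
```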

2) Unsupervised learning: unsupervised learning is used when the ground-truth label is unavailable. In this case, clearly, one cannot compute the difference between the generated output and the label, so the design goal (i.e., the objective function) is used as a loss function instead. In the resource allocation problem, for example, it is very difficult to find an optimal resource scheduling maximizing the quality-of-service (QoS) since the problem is a highly nonlinear mixed-integer program [9]. In this case, by employing the QoS function itself (e.g., throughput, latency, reliability) as a loss function, the DL model can be trained. Another example fitting this category is the MIMO beamforming problem. The essence of this problem is to find the downlink beamforming vector maximizing the users' sum rate [10]. Since the sum rate maximization problem is non-convex, it is in general very difficult to find the optimal beamforming vector, so supervised learning might not be an appropriate option. When we try to solve the problem using unsupervised learning, we set the downlink channel between the BS and the user as the training dataset, the beamforming vector as the output, and the sum rate, which is a function of the beamforming vector, as the loss function. In short, unsupervised learning is useful when the desired output to be used as a label is unavailable for reasons such as the nonlinearity/nonconvexity of the problem.

Fig. 1. Illustration of synthetic data generation strategy.
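The beamforming example above can be sketched as follows: the negative sum rate itself plays the role of the loss, so no labeled beamforming vectors are needed. The antenna/user counts, power constraint, and network sizes are illustrative assumptions.

```python
# Minimal sketch of unsupervised training with the negative sum rate as the loss.
import torch
import torch.nn as nn

N_TX, N_USERS, P_MAX, NOISE = 8, 4, 1.0, 0.1      # hypothetical system parameters

# DNN maps the (real-valued, flattened) downlink channel to beamforming vectors.
net = nn.Sequential(
    nn.Linear(2 * N_TX * N_USERS, 256), nn.ReLU(),
    nn.Linear(256, 2 * N_TX * N_USERS),
)

def neg_sum_rate(h, w):
    """Loss = -sum rate; h, w are complex tensors of shape [batch, users, tx antennas]."""
    # enforce the total power constraint by normalizing each sample's beamformers
    w = w * torch.sqrt(torch.tensor(P_MAX)) / w.flatten(1).norm(dim=1, keepdim=True).unsqueeze(-1)
    gains = (h @ w.conj().transpose(1, 2)).abs() ** 2   # |h_k^H w_j|^2
    sig = gains.diagonal(dim1=1, dim2=2)                # desired-signal power per user
    interf = gains.sum(dim=2) - sig                     # inter-user interference
    rate = torch.log2(1.0 + sig / (interf + NOISE))
    return -rate.sum(dim=1).mean()

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
h = torch.randn(32, N_USERS, N_TX, dtype=torch.cfloat)  # random training channels (no labels)
x = torch.view_as_real(h).flatten(1)                    # real-valued network input
w = torch.view_as_complex(net(x).view(32, N_USERS, N_TX, 2).contiguous())
loss = neg_sum_rate(h, w)
opt.zero_grad(); loss.backward(); opt.step()
```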

3) Reinforcement learning (RL): RL is a goal-oriented learning technique where an agent learns how to solve a task by trial and error. In the learning process, the agent observes the state of an environment, takes an action, and then receives a reward for the action. RL is suitable for sequential decision-making problems whose purpose is to find a series of actions maximizing a performance metric. Recently, deep RL (DRL) has been popularly used since it can effectively handle the large-scale state-action pairs in dynamically varying wireless environments. For example, when we try to improve the energy efficiency of an ultra-dense network (UDN), we can use DRL to control the on/off mode or user association pattern of the small-cell base stations (SBSs) [11]. In the DRL implementation, a digital unit (DU), playing the role of the agent, observes the state (e.g., channel state and user rate constraints) and then determines the action (on/off mode selection of the SBSs) based on the reward. To minimize the energy consumption, the reward should be set to be high/low when the consumed energy is small/large. By playing a sufficient number of episodes (sequences of states, actions, and rewards), the DRL-based DNN learns the SBS control policy minimizing the long-term energy consumption of the wireless network.
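A compact Q-learning-style sketch of this SBS on/off control loop is given below. The environment model (state, toy energy model) and all hyperparameters are illustrative assumptions standing in for the actual UDN setup of [11].

```python
# Compact DQN-style sketch for small-cell on/off control.
import torch
import torch.nn as nn

N_SBS, STATE_DIM, GAMMA, EPS = 4, 16, 0.95, 0.1   # hypothetical parameters
N_ACTIONS = 2 ** N_SBS                            # each action = one on/off pattern

q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def env_step(state, action):
    """Placeholder environment: the reward is high when the consumed energy is small."""
    n_on = bin(action).count("1")                 # number of SBSs switched on
    energy = n_on + 0.1 * torch.rand(1).item()    # toy energy model
    reward = -energy                              # minimize long-term energy
    next_state = torch.randn(STATE_DIM)           # e.g., new CSI / rate constraints
    return reward, next_state

state = torch.randn(STATE_DIM)
for _ in range(100):                              # one short episode
    # epsilon-greedy action selection over the on/off patterns
    if torch.rand(1).item() < EPS:
        action = torch.randint(N_ACTIONS, (1,)).item()
    else:
        action = q_net(state).argmax().item()
    reward, next_state = env_step(state, action)
    # one-step temporal-difference (Q-learning) update
    target = reward + GAMMA * q_net(next_state).max().detach()
    loss = (q_net(state)[action] - target) ** 2
    opt.zero_grad(); loss.backward(); opt.step()
    state = next_state
```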

III. ISSUES TO BE CONSIDERED FOR DL-BASED WIRELESS COMMUNICATION SYSTEMS

Two key ingredients of DL-based wireless systems are sufficient and comprehensive training samples and a properly designed neural network. In this section, we delve into these issues.

A. Training Dataset Acquisition

When the number of samples is not sufficient, the designed DL model will be closely fitted to the training dataset, making it difficult to make a reasonable inference for unseen data. This problem, where the trained DL model lacks the generalization capability, is often called overfitting. In the modulation classification task, for example, if the received signals are generated from the BPSK and QPSK modulations exclusively, then the trained classifier cannot accurately identify 16-QAM modulated symbols. To prevent the overfitting problem, the dataset should be large enough to cover all possible scenarios. This is not easy, in particular for wireless systems, since the number of required real transmissions will be humongous. In acquiring the training dataset, we basically have three options:

!"#$%

!"#"$%&'$

!"#$%$&'"()#*+$"

()*+$),)#%&'$

!"#$(*"#),-". .#/#

-')*"./)*&$)01&)'#

2$%)#)#3./%&%*"&.4'$.(560%*"/

+',,1#)+%&)'#.*7*&",

8"%9

'$

:%;"

Fig. 2. Description of GAN-based data generation strategy.

• Collection from the actual received signals
• Synthetic data generation using the analytic system model
• Real-like training set generation using a generative adversarial network (GAN)

In the data acquisition, a straightforward option is to collect the real transmit/receive signal pairs. Doing so, however, will cause a significant overhead since it requires too many training data transmissions. For example, when collecting one million received signals in 5G NR systems, it will take more than 15 minutes (10^6 symbols × 0.1 subframe/symbol × 8 ms/subframe).

To reduce the overhead, one can consider a synthetically generated dataset (see Fig. 1). In fact, in the design, test, and performance evaluation phases of most wireless systems, analytic models have been widely used. For example, propagation channels such as the extended pedestrian A (EPA) channel or extended vehicular A (EVA) channel have been popularly employed in the generation of training datasets [12]. Since the synthetic data can be generated easily with simple programming, the time and effort to collect a huge training dataset can be saved. However, there might be some, arguably non-trivial, performance degradation caused by the model mismatch.
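As a concrete illustration, the following sketch generates (received pilot, channel) training pairs from an analytic multipath model. The exponentially decaying power-delay profile is a simplified stand-in for standardized models such as EPA/EVA, and all dimensions, tap counts, and SNR values are illustrative assumptions.

```python
# Minimal sketch of synthetic training-data generation from an analytic channel model.
import numpy as np

rng = np.random.default_rng(0)
N_SAMPLES, N_PILOT, N_TAPS, SNR_DB = 10000, 64, 4, 10   # hypothetical parameters

pilot = (rng.integers(0, 2, N_PILOT) * 2 - 1).astype(float)   # known BPSK pilot sequence

def synthetic_channel():
    """Random multipath channel with an exponentially decaying power-delay profile."""
    pdp = np.exp(-np.arange(N_TAPS))
    return (rng.normal(size=N_TAPS) + 1j * rng.normal(size=N_TAPS)) * np.sqrt(pdp / 2)

X, Y = [], []
for _ in range(N_SAMPLES):
    h = synthetic_channel()
    y = np.convolve(pilot, h)[:N_PILOT]                  # pilot passed through the channel
    noise_std = np.sqrt(10 ** (-SNR_DB / 10) / 2)
    y += noise_std * (rng.normal(size=N_PILOT) + 1j * rng.normal(size=N_PILOT))
    X.append(np.concatenate([y.real, y.imag]))           # DNN input (received pilot)
    Y.append(np.concatenate([h.real, h.imag]))           # label (true channel taps)

X, Y = np.array(X), np.array(Y)                          # ready-to-use training set
```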

Yet another intriguing option is to use artificial but realistic samples generated by a DL technique. This approach is particularly useful when the analytic model is unknown or non-existent (e.g., underwater acoustic communication and satellite communication) and the real measured data is not enough. In this case, the GAN technique, an approach to generate samples having the same distribution as the input dataset, can be employed [13]. In essence, a GAN consists of two neural networks: a generator and a discriminator (see Fig. 2). The generator produces real-like data from random noise, and the discriminator tries to distinguish whether the generated output is real or fake. To train these two networks, the min-max loss function, typically expressed as the cross-entropy distance between the distribution of the real-like data and that of the real data, is often used [13]. When the training is finished properly, the generator output is fairly reliable and hence the discriminator cannot judge whether the generator output is real or fake, which implies that we can readily use the generator output as training data.
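A minimal GAN training loop in this spirit is sketched below, following the generator/discriminator roles of Fig. 2. The sample dimension, noise dimension, network sizes, and learning rates are illustrative assumptions.

```python
# Minimal GAN sketch for generating real-like training samples.
import torch
import torch.nn as nn

SAMPLE_DIM, NOISE_DIM = 128, 32          # hypothetical: e.g., a flattened received signal

G = nn.Sequential(nn.Linear(NOISE_DIM, 256), nn.ReLU(), nn.Linear(256, SAMPLE_DIM))
D = nn.Sequential(nn.Linear(SAMPLE_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_data = torch.randn(10000, SAMPLE_DIM)   # placeholder for measured samples

for step in range(1000):
    real = real_data[torch.randint(len(real_data), (64,))]
    fake = G(torch.randn(64, NOISE_DIM))

    # Discriminator: label measured data as real (1) and generated data as fake (0).
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator classify generated data as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, G(noise) can serve as additional (real-like) training data.
```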

In order to verify the validity of the training data generation strategies we discussed, we evaluate the MSE performance of the DL-based channel estimator (see Fig. 3). As shown in Fig. 3, we observe that the GAN-based data generation and the synthetic data generation are effective and fairly competitive. Interestingly, the performance gap between the actual data transmission and the GAN-based data generation is insignificant since the distribution of the GAN-generated data matches well with that of the real data. In contrast, the gap between the synthetic data generation and the actual data transmission is a bit larger (around 2 dB at MSE = 10^-4) due to the model mismatch between the real channels and the synthetic channels generated from the analytic model.

Fig. 3. MSE performance of the DL-based channel estimator using three distinct strategies.

B. DNN Architecture Design

In the design of DNN-based wireless systems, one should consider the input characteristics (e.g., temporal/spatial/geometric correlation), wireless environments (e.g., mmWave/THz/V2X/UAV links), and system configurations (e.g., bandwidth, power, number of antennas).

1) Baseline network: a natural first step of the DNN design is to choose the baseline architecture. Based on the connection shape between neighboring layers, neural networks can be divided into three types: the fully-connected network (FCN), the convolutional neural network (CNN), and the recurrent neural network (RNN).

The FCN can be used universally since each hidden unit (neuron) is connected to all neurons in the next layer. When the input dataset has a spatial structure (e.g., the 2D time/frequency resource grid or the 2D antenna array in MIMO systems), a CNN might be an appealing option. In a CNN, each neuron is computed by the convolution between a 2D spatial filter and a part (e.g., a rectangular-shaped region) of the neurons in the previous layer. Due to the local connectivity within the convolution filter, the CNN facilitates the extraction of spatially correlated features. For example, in mmWave MIMO beamforming, the 2D beam radiation patterns across the uniform rectangular array (URA) antennas can be extracted using a CNN.

On the other hand, when the input sequence is temporally correlated, which is true for most communication channels, an RNN or a long short-term memory (LSTM) network might be a good choice. By employing the current inputs together with the outputs of the previous hidden layer, temporally correlated features can be extracted. For instance, by applying an RNN to mmWave beam tracking, the change of the Doppler frequency caused by the mobile's movement can be extracted.
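The following sketch contrasts the three baseline architectures on typical input shapes: a 2D time/frequency grid for the CNN and a short sequence for the LSTM. All sizes are illustrative assumptions.

```python
# Brief sketch of the three baseline architectures on typical wireless input shapes.
import torch
import torch.nn as nn

# FCN: every neuron connected to all neurons of the next layer (universal choice).
fcn = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 16))

# CNN: local 2D filters extract spatially correlated features from a 2D input
# (e.g., a 12 x 14 time/frequency grid with 2 channels for real/imaginary parts).
cnn = nn.Sequential(
    nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * 12 * 14, 16),
)

# RNN/LSTM: the hidden state carries information across time steps, which suits
# temporally correlated inputs such as a channel observed over successive slots.
lstm = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)

x_fcn = torch.randn(4, 64)            # [batch, features]
x_cnn = torch.randn(4, 2, 12, 14)     # [batch, channels, freq, time]
x_seq = torch.randn(4, 10, 8)         # [batch, time steps, features]
y_fcn = fcn(x_fcn)                    # -> [4, 16]
y_cnn = cnn(x_cnn)                    # -> [4, 16]
y_seq, _ = lstm(x_seq)                # -> [4, 10, 32]; last step often used as a feature
```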

2) Activation layer: the activation layer is used to 1) embed nonlinearity in the hidden layers and 2) generate the desired type of output in the final layer.

In each hidden layer, the weighted sum of inputs passes through the activation layer to determine whether the information generated by the hidden unit is activated (delivered to the next layer) or not. To this end, the rectified linear unit (ReLU) function or the hyperbolic tangent function can be used [1]. By imposing nonlinearity on the linearly transformed input, one can better capture nonlinear operations (e.g., successive interference cancellation) and system nonlinearities (e.g., amplifier distortion or nonlinear RF filtering).

In the final layer of the DNN, the activation layer is used to make sure that the generated output is of the desired type. Indeed, to compute the loss function, the DL output and the true label should be of the same type. In the classification problem, the ground-truth label for each class is a probability, so the final output should also be in the form of a probability. When there are several active users in AUD, or the occupied/empty bands are non-unique in the spectrum sensing problem, it is desirable to use the sigmoid function [1], which returns an individual probability for each class. Whereas, when the problem is modeled as a multi-class classification problem, such as the MIMO detection problem, a softmax function [6] is a good fit since it normalizes the output vector into a probability distribution over all classes.
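The distinction between the two final-layer choices is summarized in the short sketch below; the class counts are illustrative assumptions.

```python
# Small sketch of the two final-layer activation choices.
import torch
import torch.nn as nn

logits = torch.randn(4, 8)                 # raw network outputs for a batch of 4

# Multi-label case (e.g., AUD, spectrum sensing): sigmoid gives an independent
# probability per class, paired with binary cross entropy.
p_multi_label = torch.sigmoid(logits)
bce = nn.BCEWithLogitsLoss()               # applies the sigmoid internally

# Multi-class case (e.g., MIMO detection): softmax normalizes the outputs into a
# probability distribution over all classes, paired with categorical cross entropy.
p_multi_class = torch.softmax(logits, dim=1)
ce = nn.CrossEntropyLoss()                 # applies log-softmax internally
```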

3) Input normalization: in the training process, the neural network computes the gradient of the loss function with respect to each weight and then updates the weight in the negative direction of the gradient. Therefore, when the input varies over a wide range (e.g., in a multi-user communication scenario), the variation in the weight update process will also be large, severely degrading the training stability and convergence speed. To prevent such ill-behavior, one should perform normalization on the outputs of each layer. Typically, there are two types of normalization strategies: layer normalization and batch normalization.

When the input vector contains signals from multiple users with different wireless geometries, the variation of the received signals would be quite large. Layer normalization is a good fit for this case. By normalizing each input vector, the layer normalization scheme ensures that the normalized input distribution has a fixed mean and variance.

Whereas, when the input data consists of several different types of information, batch normalization (BN) can be a better option. In a mini-batch consisting of multiple input vectors, the elements in each row (i.e., elements with the same input type) are normalized. For example, in the DRL-based power management problem discussed in Section II-B, both the channel state information (CSI) and the required data rate are used as inputs to the DRL. Since the scales of the two components would be quite different, layer normalization will not work and would simply mess up the input dataset. To avoid this hassle, the CSI and the data rate need to be normalized separately using BN.
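A short sketch of the two normalization options is given below; the feature sizes and the split of the input into a CSI part and a rate-requirement part are illustrative assumptions.

```python
# Short sketch of layer normalization versus batch normalization.
import torch
import torch.nn as nn

x = torch.randn(32, 64)                    # batch of received vectors

# Layer normalization: each input vector is normalized individually, so received
# signals from users with very different path losses end up on a common scale.
ln = nn.LayerNorm(64)
x_ln = ln(x)

# Batch normalization: each feature (column) is normalized across the mini-batch,
# so heterogeneous input types keep their own statistics.
csi, rate_req = torch.randn(32, 48), 1e6 * torch.rand(32, 16)   # very different scales
bn_csi, bn_rate = nn.BatchNorm1d(48), nn.BatchNorm1d(16)
x_bn = torch.cat([bn_csi(csi), bn_rate(rate_req)], dim=1)       # normalized separately
```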

Fig. 4. Exemplary DNN architecture designed for the AoA detection.

4) Dropout layer: when we use a DNN consisting of multiple hidden layers, the final output is determined by the activated hidden units in each layer. So, for highly correlated inputs (e.g., samples generated from a non-orthogonal sparse codebook or bit streams with small Hamming distance), their activation patterns will also be similar, so the final inference can be easily corrupted in the presence of perturbations (e.g., noise, inter-user interference, and channel estimation errors). In order to mitigate this problem, a dropout layer, where activated hidden units are dropped out randomly, can be used in the training phase [14]. In this scheme, by temporarily removing part of the incoming and outgoing connections at random, the ambiguity (similarity) of the activation patterns among correlated data can be better resolved.
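A minimal sketch of inserting a dropout layer is shown below; the drop probability and layer sizes are illustrative assumptions. Note that dropout is applied only in the training phase and bypassed at inference.

```python
# Minimal sketch of adding a dropout layer to a DNN.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),
    nn.Dropout(p=0.3),                 # randomly zeroes 30% of activations per step
    nn.Linear(256, 64),
)

net.train()                            # training phase: dropout is applied
y_train = net(torch.randn(8, 64))
net.eval()                             # inference phase: dropout is bypassed
y_infer = net(torch.randn(8, 64))
```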

5) Ensemble learning: ensemble learning, a method to average out multiple outputs (inferences) of independently trained networks, is conceptually analogous to the receive diversity technique in that it enhances the output quality without requiring additional wireless resources (e.g., frequency, time, and transmit power). In the multi-user communication scenario, for example, the trained network might be closely fitted to a certain wireless environment, so the trained DNN might not generate a reliable prediction for inputs obtained from an unobserved wireless scenario. In this case, ensemble learning comes to the rescue. The key idea of ensemble learning is to train multiple neural networks with different training sets and initial parameters obtained from different wireless conditions and then combine the generated outputs to improve the quality of the final inference. Using an ensemble learning-based DNN, one can considerably mitigate the overfitting caused by the wireless environments.
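A brief sketch of the ensemble averaging step is given below; the ensemble size and model dimensions are illustrative assumptions, and the per-member training loops are omitted.

```python
# Brief sketch of ensemble averaging over independently trained networks.
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 16))

# In practice each member would be trained on data from a different wireless
# condition (or with a different initialization); training loops are omitted here.
ensemble = [make_model() for _ in range(5)]

x = torch.randn(8, 64)
with torch.no_grad():
    y = torch.stack([m(x) for m in ensemble]).mean(dim=0)   # averaged inference
```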

6) Loss function: since the DNN weights are updated in a direction that minimizes the loss function, the loss function should well reflect the design goal. When the ground-truth label is available, one can use the cross entropy, MSE, or mean absolute error (MAE). If this is not the case, as we discussed for unsupervised learning, one might use the design goal (e.g., throughput or energy consumption) as the loss function.

If there exist multiple constraints for the problem at hand, these constraints should be combined together in the loss function. For example, in the DL-based power management in a UDN, the DNN is trained to minimize the total consumed power and, at the same time, should satisfy the rate requirement of a mobile. To do so, we set the loss function as the weighted sum of the power consumption loss and the rate constraint loss. By controlling the weight of each loss, one can achieve a trade-off between the consumed power and the data rate.
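The following fragment sketches such a weighted-sum loss for the power-management example: total transmit power plus a penalty that activates only when the achieved rate falls below the requirement. The weight, toy rate model, and dimensions are illustrative assumptions.

```python
# Small sketch of a weighted-sum loss combining power consumption and a rate constraint.
import torch

ALPHA = 0.5                                     # trade-off weight (hypothetical)

def power_rate_loss(tx_power, achieved_rate, rate_req):
    power_loss = tx_power.sum(dim=1)                             # total consumed power
    rate_loss = torch.relu(rate_req - achieved_rate).sum(dim=1)  # penalize only rate violations
    return (power_loss + ALPHA * rate_loss).mean()

# Example usage with placeholder values for a batch of 8 allocations over 4 users.
tx_power = torch.rand(8, 4, requires_grad=True)
achieved_rate = torch.log2(1.0 + 10.0 * tx_power)                # toy rate model
loss = power_rate_loss(tx_power, achieved_rate, rate_req=torch.full((8, 4), 2.0))
loss.backward()
```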

7) Weight update strategy: in order to update the network parameter set, the gradient of the loss function should be computed first. A straightforward way to update the parameters is the batch gradient descent (BGD) method, where the gradient of the loss function is computed over the entire training dataset. Since the whole dataset is used in each and every training iteration, the training cost is quite expensive and the training speed will be very slow. Further, in a non-static scenario where the channel characteristics are varying, the parameters corresponding to the dynamically changing wireless environments (e.g., Doppler spread, scatterer locations) would not be updated properly. A better option, in fact the widely used option, is the stochastic gradient descent (SGD) method. In contrast to BGD, SGD uses a small number of samples in each training iteration, so it can update the network parameters as soon as a few samples are obtained.

8) Knowledge distillation: when one trains a DL model on an Internet of Things (IoT) device, on-device energy consumption is a big concern since most IoT devices are battery-powered. To reduce the training overhead, the knowledge distillation (KD) technique [15], an approach to generate a relatively small-sized DL model from a trained large model, can be employed. The key idea of KD is to train a small network (a.k.a. the student network) using the output of a large network (a.k.a. the teacher network). In the generation of the loss function, the output of the student network is compared against the output of the teacher network as well as the ground-truth label. In doing so, the student network, implemented in the IoT device, can easily capture the underlying features (e.g., similarities and differences among the classes) extracted by the teacher network implemented in the digital unit (DU). For a properly designed student network using KD, we see that the performance of the student network is fairly comparable to that of the teacher network.
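A minimal distillation loss, combining the hard-label loss with a soft loss against the teacher's temperature-scaled outputs, is sketched below. The network sizes, temperature, and weighting are illustrative assumptions.

```python
# Minimal knowledge-distillation sketch: the small student learns from both the
# ground-truth labels and the soft outputs of the large teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, 16))  # large (DU side)
student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))    # small (IoT side)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T, BETA = 2.0, 0.5                                    # temperature and loss weight

x = torch.randn(32, 64)
labels = torch.randint(0, 16, (32,))
with torch.no_grad():
    soft_targets = F.softmax(teacher(x) / T, dim=1)   # teacher's softened predictions

student_logits = student(x)
hard_loss = F.cross_entropy(student_logits, labels)
soft_loss = F.kl_div(F.log_softmax(student_logits / T, dim=1), soft_targets,
                     reduction="batchmean") * T * T
loss = BETA * hard_loss + (1 - BETA) * soft_loss
opt.zero_grad(); loss.backward(); opt.step()
```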

In Fig. 4, we present the DNN architecture for AoA detection using the techniques we discussed. Due to the sparse scattering in the mmWave band, a propagation path can be characterized by a few AoAs. By identifying these angles, the receiver can align the beam direction to the transmitter, thereby maximizing the signal-to-noise ratio (SNR). In the DNN, we use the received signal and the steering vectors corresponding to the possible AoAs as inputs and the set of detected AoAs as outputs. Since an input is a composite of the received signal and steering vectors, we use BN to normalize each component separately. Also, to generate the individual probability for each angle, we use the sigmoid activation function in the final layer.

Fig. 5. AoA detection performance of various DNNs as a function of SNR. The detection success probability corresponds to the percentage of detected AoAs among all angles.

To judge the effectiveness of the DNN architecture consisting of the normalization, dropout, and ensemble learning, we evaluate the success probability of the AoA detection. In our simulations, we train 1) an FCN, 2) an FCN with BN, 3) an FCN with the dropout layer, 4) an FCN with the ensemble network, and 5) an FCN with all the techniques we discussed. As shown in Fig. 5, the performance gain introduced by the detailed DNN techniques is considerable. For example, the FCN with the dropout layer achieves a significant gain (4.8 dB) over the conventional FCN since the highly correlated steering vectors can be better resolved using the dropout technique. The gain obtained from BN is also significant (4.7 dB) since it reduces the variation of the received signal caused by device location changes. Finally, when the gains induced by all the techniques are combined together, we can achieve very reliable AoA detection performance, which can never be achieved by the basic FCN even in the high SNR regime.

IV. CONCLUSION AND DISCUSSION

In this article, we presented an overview of DL-based wireless systems with emphasis on the design issues related to DL model selection, training set acquisition, and DNN architecture design. As the automated services and applications using machines, vehicles, and sensors proliferate, we expect that DL will become more popular and eventually become a dominating design paradigm in the 6G era. To deal with various frequency bands (i.e., sub-6 GHz/mmWave/THz), wireless resources (massive MIMO antennas, intelligent reflecting surfaces, relays), and geographical environments, we need to go beyond the state-of-the-art DL techniques used mainly for the purpose of function approximation and exploit more aggressive and advanced DL techniques. For example, when we try to train a DL model for a desired task, transfer learning, an approach to use a pre-trained model for a similar task, can be employed. By recycling most of the parameters in the pre-trained model and then training only a small part of the parameters, the new model can learn the information distinct to the desired task while sharing the common features between the two tasks. Another approach worth investigating is meta learning, a technique to learn the desired task quickly using the DL models of similar tasks. By setting the parameters minimizing the sum of the loss functions of similar tasks as the initial parameters, the DL model for the desired task can be learned with a reduced training overhead.
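To illustrate the transfer-learning idea, the sketch below reuses a pre-trained feature extractor (its weights frozen) and retrains only a small task-specific head. The model split and sizes are illustrative assumptions.

```python
# Short sketch of transfer learning: freeze the pre-trained layers, retrain the head.
import torch
import torch.nn as nn

# Pre-trained model for a similar task (weights assumed to be already trained).
backbone = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 128), nn.ReLU())
head = nn.Linear(128, 16)                      # new head for the desired task
model = nn.Sequential(backbone, head)

for p in backbone.parameters():                # freeze (recycle) the shared feature layers
    p.requires_grad = False

# Only the small head is updated for the new task.
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
x, y = torch.randn(32, 64), torch.randint(0, 16, (32,))
loss = nn.functional.cross_entropy(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```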

Our hope is that this article will serve as a useful guide for communication researchers who want to apply DL techniques in their wireless applications. For the test code of the wireless examples discussed in this paper, check out http://islab.snu.ac.kr/publication.

ACKNOWLEDGMENT

This work was supported by the Samsung Research Funding & Incubation Center.

REFERENCES

[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, May 2015.

[2] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, Jan. 2016.

[3] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2012.

[4] A. Graves, A. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proc. Int. Conf. Acoustics, Speech, and Signal Process. (ICASSP), 2013.

[5] N. Samuel, T. Diskin, and A. Wiesel, "Learning to detect," IEEE Trans. Signal Process., vol. 67, no. 10, pp. 2554-2564, 2019.

[6] W. Kim, Y. Ahn, and B. Shim, "Deep neural network-based active user detection for grant-free NOMA systems," IEEE Trans. Commun., vol. 68, no. 4, pp. 2143-2155, 2020.

[7] H. Huang, J. Yang, H. Huang, Y. Song, and G. Gui, "Deep learning for super-resolution channel estimation and DOA estimation based massive MIMO system," IEEE Trans. Veh. Technol., vol. 67, no. 9, pp. 8549-8560, 2018.

[8] X. Wei, C. Hu, and L. Dai, "Deep learning for beamspace channel estimation in millimeter-wave massive MIMO systems," IEEE Trans. Commun., vol. 69, no. 1, Jan. 2021.

[9] M. Alenezi, K. K. Chai, A. S. Alam, Y. Chen, and S. Jimaa, "Unsupervised learning clustering and dynamic transmission scheduling for efficient dense LoRaWAN networks," IEEE Access, 2020.

[10] J. Guo, C. K. Wen, and S. Jin, "Deep learning-based CSI feedback for beamforming in single- and multi-cell massive MIMO systems," IEEE J. Sel. Areas Commun., 2020.

[11] H. Ju, S. Kim, Y. Kim, H. Lee, and B. Shim, "Energy-efficient ultra-dense network via deep reinforcement learning," in Proc. IEEE Workshop Signal Process. Adv. Wireless Commun. (SPAWC), 2020.

[12] 3GPP TS 36.104, "Evolved Universal Terrestrial Radio Access (E-UTRA); Base Station (BS) Radio Transmission and Reception," 3rd Generation Partnership Project; Technical Specification Group Radio Access Network.

[13] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2014.

[14] N. Srivastava et al., "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.

[15] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.


Wonjun Kim (Member, IEEE) is currently pursuing the Ph.D. degree in the School of Electrical and Computer Engineering, Seoul National University, Korea. His research interests include compressed sensing and deep learning techniques for 5G and 6G wireless communications.

Yongjun Ahn (Member, IEEE) is currently pursuing the Ph.D. degree in the School of Electrical and Computer Engineering, Seoul National University, Korea. His research interests include 5G NR standardization and 6G wireless system design.

Jinhong Kim (Member, IEEE) is currently pursuing the Ph.D. degree in the School of Electrical and Computer Engineering, Seoul National University, Korea. His research focuses on deep learning architecture design for 6G wireless communications.

Byonghyo Shim (Senior Member, IEEE) is a professor in the Department of Electrical and Computer Engineering, Seoul National University, Korea. From 2005 to 2007, he worked for Qualcomm Inc., and from 2007 to 2014, he was with Korea University. His research interests include 5G and 6G communications, statistical signal processing, information theory, and deep learning.