A Brief Introduction to Machine Learning (With Applications to Communications)
Osvaldo Simeone
King’s College London
August 8, 2018
Goals and Learning Outcomes
Goals:
- Provide an introduction to main areas in machine learning with a focus on probabilistic methods
- Offer some pointers to specific applications for telecom

Learning outcomes:
- Recognize scenarios in which machine learning can and cannot be useful
- Identify specific classes of machine learning methods that apply to a given problem, with applications to telecom networks
For More...
O. Simeone, "A Brief Introduction to Machine Learning for Engineers," arXiv:1709.02840.

O. Simeone, "A Very Brief Introduction to Machine Learning with Applications to Communication Systems," arXiv:1808.02342.
What is Machine Learning?

Traditional engineering approach:
- Acquisition of domain knowledge...
- ... mathematical (physics-based) modelling...
- ... and optimized algorithm design with performance guarantees
What is Machine Learning?

Machine learning approach:
- Selection of a general-purpose model and a learning algorithm...
- ... learning based on data (examples) and use of the trained (black-box) "machine"
When to Use Machine Learning?
Advantages:
- lower cost
- faster development
- reduced implementation complexity

Disadvantages:
- suboptimal performance
- lack of interpretability
- limited applicability
When to Use Machine Learning?
(Slightly modified) criteria by [Brynjolfsson and Mitchell '17]:
- traditional engineering flow not suitable: model deficit or algorithm deficit
- the task involves a function that maps well-defined inputs to well-defined outputs
- the task provides clear feedback with clearly definable goals and metrics
- large data sets exist or can be created containing input-output pairs
- the task does not involve long chains of logic or reasoning that depend on diverse background knowledge or common sense
- the task does not require detailed explanations for how the decision was made
- the task has a tolerance for error and no need for provably correct or optimal solutions
- the phenomenon or function being learned should not change rapidly over time
Taxonomy of Machine Learning Methods
Supervised learning
Unsupervised learning
Reinforcement learning
Taxonomy of Machine Learning Methods
Supervised vs unsupervised learning
Taxonomy of Machine Learning Methods
Reinforcement learning: feedback-based sequential decision making
[Diagram, © D. Silver: agent-environment loop with state $s_t$, action $a_t$, and reward $r_t$]
Communication Networks
Fog network architecture [5GPPP]
[Figure: fog architecture spanning the core cloud, core network, edge cloud, access network, and wireless edge]
Data collection and processing can take place at the edge and/or at the cloud.
Data in Communication Networks
Data at the edge:
- PHY: baseband signals, (multi-RAT) channel quality
- MAC/Link: throughput, FER, random access load and latency
- Network: location, traffic loads across services, users' device types, battery levels
- Application: users' preferences, content demands, computing loads, QoS metrics
Data in Communication Networks
Data at the cloud:
- Network: mobility patterns, network-wide traffic statistics, outage rates
- Application: users' behavior patterns, subscription information, service usage statistics, TCP/IP traffic statistics
Learning in Communication Networks
Which tasks?
- traditional engineering flow not suitable: model deficit or algorithm deficit (depends)
- the task involves a function that maps well-defined inputs to well-defined outputs ✓
- the task provides clear feedback with clearly definable goals and metrics ✓
- large data sets exist or can be created containing input-output pairs ✓
- the task does not involve long chains of logic or reasoning that depend on diverse background knowledge or common sense ✓
- the task does not require detailed explanations for how the decision was made ✓
- the task has a tolerance for error and no need for provably correct or optimal solutions (depends)
- the phenomenon or function being learned should not change rapidly over time (depends)
Overview
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Supervised Learning
Supervised learning:
- regression: continuous labels
- classification: discrete labels
Supervised Learning: Regression
[Figure: noisy training points from a one-dimensional regression problem]

Training set $\mathcal{D}$: $N$ training points $(x_n, t_n)$, $n = 1, \ldots, N$
$x_n$ = covariates, domain points, or explanatory variables
$t_n$ = dependent variables, labels, or responses (continuous)

Goal: predict the label $t$ for a new, that is, as of yet unobserved, domain point $x$
Supervised Learning: Classification
[Figure: two classes of labelled points in a two-dimensional feature space, with a new unlabelled point marked "?"]

Training set $\mathcal{D}$: $N$ training points $(x_n, t_n)$, $n = 1, \ldots, N$
$x_n$ = covariates, domain points, or explanatory variables
$t_n$ = dependent variables, labels, or responses (discrete)

Goal: predict the label (class) $t$ for a new, that is, as of yet unobserved, domain point $x$
Supervised Learning
Impossible task without assuming a model (inductive bias), by the no free lunch theorem

Memorizing vs. learning: retrieving the value $t_n$ corresponding to an already observed pair $(x_n, t_n) \in \mathcal{D}$ vs. predicting the value $t$ for an unseen $x$
Defining Supervised Learning
Training set $\mathcal{D}$: $(x_n, t_n) \overset{\text{i.i.d.}}{\sim} p(x, t)$, $n = 1, \ldots, N$

Based on the training set $\mathcal{D}$, we derive a predictor $\hat{t}(x)$.

Test pair: $(x, t) \sim p(x, t)$, independent of $\mathcal{D}$

Quality of the prediction $\hat{t}(x)$ for a pair $(x, t)$: the loss

$\ell(t, \hat{t}(x))$,

for some loss function $\ell(t, \hat{t})$, e.g., $\ell(t, \hat{t}) = (t - \hat{t})^2$ (quadratic) or $\ell(t, \hat{t}) = 1(t \neq \hat{t})$ (probability of error)
Defining Supervised Learning
Goal: minimize the average loss on the test pair (generalization loss)

$L_p(\hat{t}) = \mathbb{E}_{(x,t) \sim p(x,t)}[\ell(t, \hat{t}(x))]$

Alternative viewpoints to the frequentist framework: Bayesian and Minimum Description Length (MDL)
When the True Distribution $p(x, t)$ is Known...

... we don't need data $\mathcal{D}$
... and we have a standard inference problem, i.e., estimation (regression) or detection (classification).

The solution can be directly computed from the posterior distribution

$p(t|x) = \frac{p(x, t)}{p(x)}$

as

$t^*(x) = \arg\min_{\hat{t}} \ \mathbb{E}_{t \sim p(t|x)}[\ell(t, \hat{t}) \mid x]$
When the Model $p(x, t)$ is Known...

With the quadratic loss, the conditional mean: $t^*(x) = \mathbb{E}_{t \sim p(t|x)}[t \mid x]$

With the probability of error, maximum a posteriori (MAP): $t^*(x) = \arg\max_t p(t|x)$

Example: with the joint distribution

x \ t    t = 0    t = 1
x = 0    0.05     0.45
x = 1    0.40     0.10

we have $p(t = 1 | x = 0) = 0.45 / (0.45 + 0.05) = 0.9$, and

$t^*(x = 0) = 0.9 \times 1 + 0.1 \times 0 = 0.9$ for the quadratic loss,
$t^*(x = 0) = 1$ for the probability of error (MAP).
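As a quick numerical check, the following sketch reproduces these computations; the joint-distribution values are those in the table above, and the code itself is purely illustrative:

```python
# Optimal predictors from a known joint distribution p(x, t).
import numpy as np

# Joint distribution: rows indexed by x in {0, 1}, columns by t in {0, 1}.
p_xt = np.array([[0.05, 0.45],
                 [0.40, 0.10]])

p_x = p_xt.sum(axis=1)                 # marginal p(x)
p_t_given_x = p_xt / p_x[:, None]      # posterior p(t | x)

print(p_t_given_x[0, 1])               # p(t = 1 | x = 0) = 0.9

# Optimal predictors at x = 0:
print(p_t_given_x[0] @ np.array([0, 1]))  # conditional mean (quadratic loss): 0.9
print(p_t_given_x[0].argmax())            # MAP (probability of error): 1
```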
When the True Distribution $p(x, t)$ is Not Known...

... we need data $\mathcal{D}$
... and we have a learning problem

1. Model selection (inductive bias): define a parametric model, either generative, $p(x, t|\theta)$, or discriminative, $p(t|x, \theta)$
2. Learning: given data $\mathcal{D}$, optimize a learning criterion to obtain the parameter vector $\theta$
3. Inference: use the model to obtain the predictor $\hat{t}(x)$ (to be tested on new data)
Logistic Regression
Example: binary classification ($t \in \{0, 1\}$)

1. Model selection (inductive bias): logistic regression (discriminative model)

$\phi(x) = [\phi_1(x) \cdots \phi_{D'}(x)]^T$ is a vector of features (e.g., a bag-of-words model for a text).
Logistic Regression
Parametric probabilistic model:

$p(t = 1 \mid x, w) = \sigma(w^T \phi(x))$,

where $\sigma(a) = (1 + \exp(-a))^{-1}$ is the sigmoid function.
Logistic Regression

2. Learning: to be discussed
3. Inference: with the probability of error loss, MAP classification:

$\underbrace{w^T \phi(x)}_{\text{logit or LLR}} \;\underset{t=0}{\overset{t=1}{\gtrless}}\; 0$
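To make the model and decision rule concrete, here is a minimal sketch; the feature map and weight values below are illustrative assumptions, not taken from the slides:

```python
# Logistic model p(t = 1 | x, w) = sigmoid(w^T phi(x)) and its MAP decision rule.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def phi(x):
    return np.array([1.0, x])      # hypothetical features: bias term plus raw input

w = np.array([-1.0, 2.0])          # hypothetical trained weight vector

x = 0.8
logit = w @ phi(x)                 # w^T phi(x): the logit, or LLR
print(sigmoid(logit))              # p(t = 1 | x, w), about 0.65 here
print(int(logit >= 0))             # MAP decision: t = 1 iff the logit is >= 0
```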
Multi-Layer Neural Networks
1. Model selection (inductive bias): multi-layer neural network (discriminative model)

Multiple layers of learnable weights enable feature learning.
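A minimal sketch of the idea (layer sizes, ReLU activations, and random weights are illustrative assumptions): the output layer is the logistic model above, but it acts on features computed by an earlier learnable layer rather than on a fixed $\phi(x)$:

```python
# A two-layer neural network classifier: learned features + logistic output.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

x = rng.standard_normal(4)          # input vector
W1 = rng.standard_normal((8, 4))    # first layer of learnable weights
w2 = rng.standard_normal(8)         # second layer of learnable weights

h = np.maximum(0.0, W1 @ x)         # learned features (hidden layer, ReLU)
p_t1 = sigmoid(w2 @ h)              # p(t = 1 | x, theta), acting on h rather than a fixed phi(x)
print(p_t1)
```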
Supervised Learning
1. Model selection (inductive bias): define a parametric model, either generative, $p(x, t|\theta)$, or discriminative, $p(t|x, \theta)$
2. Learning: given data $\mathcal{D}$, optimize a learning criterion to obtain the parameter vector $\theta$
3. Inference: use the model to obtain the predictor $\hat{t}(x)$ (to be tested on new data)
Learning: Maximum Likelihood
ML selects a value of $\theta$ that is the most likely to have generated the observed training set $\mathcal{D}$:

maximize $p(\mathcal{D}|\theta)$
$\iff$ maximize $\ln p(\mathcal{D}|\theta)$ (log-likelihood, or LL)
$\iff$ minimize $-\ln p(\mathcal{D}|\theta)$ (negative log-likelihood, or NLL)

For discriminative models:

minimize $-\ln p(t_{\mathcal{D}} \mid x_{\mathcal{D}}, \theta) = -\sum_{n=1}^{N} \ln p(t_n \mid x_n, \theta)$
Learning: Maximum Likelihood

The problem rarely has analytical solutions and is typically addressed by Stochastic Gradient Descent (SGD). For discriminative models, we have

$\theta^{\text{new}} \leftarrow \theta^{\text{old}} + \gamma\, \nabla_\theta \ln p(t_n \mid x_n, \theta)\big|_{\theta = \theta^{\text{old}}}$,

where $\gamma$ is the learning rate. With multi-layer neural networks, this approach yields the backpropagation algorithm.
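The following sketch applies this SGD rule to the logistic regression model above; the synthetic data, learning rate, and number of passes are illustrative assumptions:

```python
# ML learning of logistic regression via SGD on the NLL.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Synthetic training set: t = 1 with probability sigmoid(2*x - 1).
N = 1000
x = rng.uniform(0, 1, N)
phi = np.stack([np.ones(N), x], axis=1)          # features [1, x]
t = (rng.uniform(size=N) < sigmoid(2 * x - 1)).astype(float)

w = np.zeros(2)
gamma = 0.1                                       # learning rate
for epoch in range(20):
    for n in rng.permutation(N):                  # one SGD pass over the data
        # Gradient of ln p(t_n | x_n, w) w.r.t. w is (t_n - sigmoid(w^T phi_n)) * phi_n.
        w += gamma * (t[n] - sigmoid(w @ phi[n])) * phi[n]

print(w)  # should land near the ground-truth weights [-1, 2]
```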
Model Selection
How to select a model (inductive bias)?

Model selection typically requires choosing the model order, i.e., the capacity of the model.

Ex.: for logistic regression,
- model order $M$: number of features
Model Selection
Example: regression using a discriminative model $p(t|x)$ under which

$t = \underbrace{\sum_{m=0}^{M} w_m x^m}_{\hat{t}(x):\ \text{polynomial of order } M} + \ \mathcal{N}(0, 1)$

[Figure: noisy training points generated from this model]
Model Selection
With $M = 1$, using ML learning of the coefficients:

[Figure: the resulting linear fit ($M = 1$) over the training points]
Model Selection: Underfitting...
With $M = 1$, the ML predictor $\hat{t}(x)$ underfits the data:
- the model is not rich enough to capture the variations present in the data;
- large training loss

$L_{\mathcal{D}}(\theta) = \frac{1}{N} \sum_{n=1}^{N} (t_n - \hat{t}(x_n))^2$
Model Selection
With $M = 9$, using ML learning of the coefficients:

[Figure: the degree-9 fit ($M = 9$) oscillating through the training points, shown against the $M = 1$ fit]
Model Selection: ... vs Overfitting
With $M = 9$, the ML predictor overfits the data:
- the model is too rich and, in order to account for the observations in the training set, it appears to yield inaccurate predictions outside it;
- presumably, we have a large generalization loss

$L_p(\hat{t}) = \mathbb{E}_{(x,t) \sim p(x,t)}[(t - \hat{t}(x))^2]$
Model Selection
$M = 3$ seems to be a reasonable choice...

... but how do we know, given that we have no data outside of the training set?

[Figure: the fits for $M = 1$, $M = 3$, and $M = 9$ compared on the same training points]
Model Selection: Validation

Keep some data (validation set) to estimate the generalization error for different values of $M$.
(See cross-validation for a more efficient way to use the data.)
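A minimal sketch of validation-based model order selection; the data-generating process (a sinusoidal ground truth), sample sizes, and noise level are illustrative assumptions, not taken from the slides:

```python
# Model order selection for polynomial regression via a held-out validation set.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(0, 1, n)
    t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)  # assumed ground truth
    return x, t

x_tr, t_tr = sample(15)      # training set
x_va, t_va = sample(100)     # validation set (held out from training)

for M in range(1, 10):
    w = np.polyfit(x_tr, t_tr, deg=M)            # ML fit of the polynomial coefficients
    rmse_tr = np.sqrt(np.mean((np.polyval(w, x_tr) - t_tr) ** 2))
    rmse_va = np.sqrt(np.mean((np.polyval(w, x_va) - t_va) ** 2))
    print(f"M={M}: training RMSE {rmse_tr:.2f}, validation RMSE {rmse_va:.2f}")
# The training error keeps falling with M; the validation error is U-shaped,
# flagging underfitting at small M and overfitting at large M.
```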
Model Selection: Validation

Validation allows model order selection.

[Figure: root average squared loss vs. $M$; the training loss decreases with $M$, while the generalization loss estimated via validation is U-shaped, marking underfitting at small $M$ and overfitting at large $M$]

Validation can also be used more generally to select other hyperparameters (e.g., the learning rate).
Model Selection: Validation

Model order selection should depend on the amount of data... It is a problem of bias (asymptotic error) versus generalization gap.

[Figure: root average quadratic loss vs. training set size $N$ for $M = 1$ and $M = 7$, comparing the training loss with the generalization loss estimated via validation]
Application to Communication Networks
Fog network architecture [5GPPP]
At the Edge: Overview
At the edge:
- PHY: detection and decoding, precoding and power allocation, modulation recognition, localization, interference cancelation, joint source-channel coding, equalization in the presence of non-linearities
- MAC/Link: radio resource allocation, scheduling, multi-RAT handover, dynamic spectrum access, admission control
- Network: proactive caching
- Application: computing resource allocation, content request prediction
At the Edge: PHY

Channel detection and decoding – classification
[Cammerer et al '17]
At the Edge: PHY

Channel detection and decoding – classification
Model deficit
[Farsad and Goldsmith '18]
At the Edge: PHY

Channel equalization in the presence of non-linearities, e.g., for optical links – regression
Algorithm deficit
[Wang et al '16]
At the Edge: PHY
Channel equalization in the presence of non-linearities, e.g., for satellite links with non-linear amplifiers – regression

Algorithm deficit
[Bouchired et al '98]
At the Edge: PHY
Channel decoding for modulation schemes with complex optimal decoders, e.g., continuous phase modulation – classification

Algorithm deficit
[De Veciana and Zakhor '92]
At the Edge: PHY
Channel decoding – classification

Leverage domain knowledge to set up the parametrized model to be learned
[Nachmani et al '16]
At the Edge: PHY

Channel equalization to compensate for hardware impairments – regression
Leverage domain knowledge to design the decoder
[Schibisch et al '18]
At the Edge: PHY

Modulation recognition – classification
Algorithm deficit
[Agirman-Tosun et al '11]
At the Edge: PHY

Localization (output: coordinates) – regression
Model deficit
[Fang and Lin '08]
At the Edge: PHY

Precoding and power allocation – regression
Algorithm deficit
[Sun et al '17]
At the Edge: PHY

Interference cancellation – regression
Model deficit
[Balatsoukas-Stimming '17]
At the Edge: MAC/Link

Spectrum sensing – classification
Model deficit
[Tumuluru et al '10]
At the Edge: MAC/Link

mmWave channel quality prediction using depth images – regression
Model deficit
[Okamoto et al '18]
At the Edge: Network and Application

Content prediction for proactive caching – classification
Model deficit
[Chen et al '17]
At the Cloud: Overview
At the cloud:
- Network: routing (classification vs. look-up tables), SDN flow table updating, proactive caching, congestion control
- Application: cloud/fog computing, Internet traffic classification
At the Cloud: Network

Link prediction for wireless routing – classification/regression
Model deficit
[Wang et al '06]
At the Cloud: Network

Link prediction for optical routing – classification/regression
Model deficit
[Musumeci et al '18]
At the Cloud: Network

Congestion prediction for smart routing – classification
Model deficit
[Tang et al '17]
At the Cloud: Network and Application

Traffic classification – classification
Model deficit
[Nguyen et al '08]
Overview
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Unsupervised Learning
Unsupervised learning tasks operate over unlabelled data sets.

General goal: discover properties of the data, e.g., for compressed representation

"Some of us see unsupervised learning as the key towards machines with common sense." (Y. LeCun)
“Defining” Unsupervised Learning
Training set $\mathcal{D}$: $x_n \overset{\text{i.i.d.}}{\sim} p(x)$, $n = 1, \ldots, N$

Goal: learn some useful properties of the distribution $p(x)$

Alternative viewpoints to the frequentist framework: Bayesian and MDL
Unsupervised Learning Tasks
Density estimation: estimate $p(x)$, e.g., for use in plug-in estimators or compression algorithms, or to detect outliers

Clustering: partition all points in $\mathcal{D}$ into groups of similar objects (e.g., document clustering)

Dimensionality reduction, representation, and feature extraction: represent each data point $x_n$ in a space of lower dimensionality, e.g., to highlight independent explanatory factors, and/or to ease visualization, interpretation, or successive tasks

Generation of new samples: learn a machine that produces samples approximately distributed according to $p(x)$, e.g., to produce artificial scenes for games or films
Unsupervised Learning
1. Model selection (inductive bias): define a parametric model $p(x|\theta)$
2. Learning: given data $\mathcal{D}$, optimize a learning criterion to obtain the parameter vector $\theta$
3. Clustering, feature extraction, sample generation...
Models
Unsupervised learning models typically involve hidden or latent variables.

$z_n$ = hidden, or latent, variables for each data point $x_n$

Ex.: $z_n$ = cluster index of $x_n$
(a) Directed Generative Models
Model the data $x$ as being caused by $z$:

$p(x|\theta) = \sum_z p(z|\theta)\, p(x|z, \theta)$
(a) Directed Generative Models
Ex.: document clustering
- $x$ is a document, and $z$ is (interpreted as) its topic
- $p(z|\theta)$ = distribution of topics
- $p(x|z, \theta)$ = distribution of words in a document given the topic

Basic representatives:
- mixture of Gaussians
- likelihood-free models
(d) Autoencoders
Model the encoding from the data to the hidden variables, as well as the decoding from the hidden variables back to the data:

$p(z|x, \theta)$ and $p(x|z, \theta)$
(d) Autoencoders
Ex.: compression
- $x$ is an image and $z$ is (interpreted as) a compressed (e.g., sparse) representation
- $p(z|x, \theta)$ = compression of the image into the representation
- $p(x|z, \theta)$ = decompression of the representation into an image

Basic representatives: Principal Component Analysis (PCA), dictionary learning, neural network-based autoencoders
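As a concrete instance, the following sketch implements PCA viewed as a linear autoencoder on synthetic data; the dimensions and noise level are illustrative assumptions, and the deterministic encoder/decoder stand in for $p(z|x, \theta)$ and $p(x|z, \theta)$:

```python
# PCA as a linear autoencoder: encode to a low-dimensional z, decode back to x.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 points in R^5 lying near a 2-dimensional subspace.
Z = rng.standard_normal((200, 2))
A = rng.standard_normal((2, 5))
X = Z @ A + 0.05 * rng.standard_normal((200, 5))

mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
W = Vt[:2].T                      # top-2 principal directions (5 x 2)

z = (X - mu) @ W                  # encoder: compressed representation
X_hat = z @ W.T + mu              # decoder: reconstruction

print(np.mean((X - X_hat) ** 2))  # small reconstruction error
```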
Learning: Maximum Likelihood
Focus on directed generative models (a).

To simplify the notation, consider a single data point $x$ (sum over the data set $\mathcal{D}$ to generalize).

ML problem:

$\max_\theta \ \ln p(x|\theta) = \ln\!\Big(\sum_z p(x, z|\theta)\Big)$

Key issue: one needs to marginalize over the latent variables, whose distribution is not known, in order to evaluate the LL.
ELBO
To tackle this issue, a standard approach is the introduction of a variational distribution $q(z)$ and the use of the Evidence Lower BOund (ELBO).

For any fixed value $x$ and any distribution $q(z)$ on the latent variables $z$ (possibly dependent on $x$), the ELBO $\mathcal{L}(q, \theta)$ is defined as

$\mathcal{L}(q, \theta) = \mathbb{E}_{z \sim q(z)}\big[\,\underbrace{\ln p(x, z|\theta) - \ln q(z)}_{\text{learning signal}}\,\big]$
ELBO

The ELBO is a global lower bound on the LL function:

$\ln p(x|\theta) \geq \mathcal{L}(q, \theta)$,

where equality holds at a value $\theta_0$ if and only if the distribution $q(z)$ satisfies $q(z) = p(z|x, \theta_0)$.

[Figure: the LL function together with the ELBOs obtained for $\theta_0 = 2$ and $\theta_0 = 3$; each ELBO touches the LL at its own $\theta_0$]
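The bound follows from Jensen's inequality applied to the concave logarithm; spelling out the one-line derivation:

```latex
\ln p(x|\theta)
  = \ln \sum_z q(z)\,\frac{p(x,z|\theta)}{q(z)}
  \;\geq\; \sum_z q(z)\,\ln \frac{p(x,z|\theta)}{q(z)}
  = \mathcal{L}(q,\theta)
```

Equality holds if and only if the ratio $p(x, z|\theta)/q(z)$ is constant in $z$, i.e., $q(z) = p(z|x, \theta)$.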
Expectation-Maximization (EM) Algorithm

[Figure: EM iteratively builds and maximizes ELBOs under the LL curve, moving the iterate from $\theta^{\text{old}}$ to $\theta^{\text{new}}$]
Expectation-Maximization (EM) Algorithm

Initialize the parameter vector $\theta^{\text{old}}$. Then, for each iteration:
- E step: for a fixed parameter vector $\theta^{\text{old}}$,
  $\max_q \mathcal{L}(q, \theta^{\text{old}}) \rightarrow q^{\text{new}}(z) = p(z|x, \theta^{\text{old}})$
  (Bayesian inference of the latent variables)
- M step: for a fixed variational distribution $q^{\text{new}}(z)$,
  $\max_\theta \mathcal{L}(q^{\text{new}}, \theta) \rightarrow \max_\theta \mathbb{E}_{z \sim q^{\text{new}}(z)}[\ln p(x, z|\theta)]$
  (solve a supervised learning problem)
Expectation-Maximization (EM) Algorithm
EM guarantees monotonically improving objective values, which ensures convergence to a local optimum of the original problem.

[Figure: successive ELBOs push the iterates from $\theta^{\text{old}}$ to $\theta^{\text{new}}$ along the LL curve]
Example: Mixture of Gaussians
Directed generative model:

$z \sim \text{Bern}(\pi)$
$x \mid z = k \sim \mathcal{N}(\mu_k, \Sigma_k)$
Example: Mixture of Gaussians

[Figures: successive EM iterations on the mixture-of-Gaussians example, with the estimated components progressively fitting the data clusters]
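A hedged numerical sketch of EM for this model in one dimension; the data-generating parameters, the initialization, and the iteration count are illustrative assumptions:

```python
# EM for a two-component Gaussian mixture in 1D.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from the model: z ~ Bern(0.3), x | z=k ~ N(mu_k, sigma_k^2).
N = 1000
z_true = rng.uniform(size=N) < 0.3
x = np.where(z_true, 2.0 + 0.5 * rng.standard_normal(N),
                     -1.0 + 1.0 * rng.standard_normal(N))

pi, mu, var = 0.5, np.array([-2.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    # E step: responsibilities q(z_n = 1) = p(z_n = 1 | x_n, theta_old).
    lik = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r1 = pi * lik[:, 1] / ((1 - pi) * lik[:, 0] + pi * lik[:, 1])
    r = np.stack([1 - r1, r1], axis=1)
    # M step: maximize E_q[ln p(x, z | theta)], available in closed form here.
    Nk = r.sum(axis=0)
    pi = Nk[1] / N
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(pi, mu, var)  # should recover roughly pi = 0.3, mu = (-1, 2), var = (1, 0.25)
```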
Scaling EM
The EM algorithm may be impractical for large-scale problems: one needs to compute the posterior in the E step and to average over $z$ in the M step.

Solutions:
- E step: parametrize the variational distribution as $q(z|\varphi)$ or $q(z|x, \varphi)$ and maximize the ELBO over $\varphi$ (variational autoencoder)
- M step: approximate $\mathbb{E}_{z \sim q^{\text{new}}(z)}[\ln p(x, z|\theta)]$ via Monte Carlo
- use gradient descent for the E and/or M steps
Learning: Beyond Maximum Likelihood
ML tends to provide inclusive and "blurry" estimates of the data distribution.

[Figure: a single broad Gaussian fit by ML that covers both modes of a bimodal data distribution]

This can be a problem for tasks such as data generation.
Learning: Beyond Maximum Likelihood
ML can be proven to minimize the KL divergence

$\text{KL}(p_{\mathcal{D}}(x) \,\|\, p(x|\theta)) = \mathbb{E}_{x \sim p_{\mathcal{D}}(x)}\!\left[\ln \frac{p_{\mathcal{D}}(x)}{p(x|\theta)}\right]$

between the empirical distribution $p_{\mathcal{D}}(x) = N[x]/N$ (with counts $N[x] = |\{n : x_n = x\}|$) and the model.
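To see why, note that the KL divergence and the average NLL differ only by a term that does not depend on $\theta$:

```latex
\mathrm{KL}(p_{\mathcal{D}} \,\|\, p(\cdot|\theta))
  = \underbrace{-H(p_{\mathcal{D}})}_{\text{constant in } \theta}
    - \mathbb{E}_{x \sim p_{\mathcal{D}}}[\ln p(x|\theta)]
  = -H(p_{\mathcal{D}})
    + \frac{1}{N}\sum_{n=1}^{N} \big(-\ln p(x_n|\theta)\big)
```

where $H(p_{\mathcal{D}})$ is the entropy of the empirical distribution, so minimizing the KL divergence over $\theta$ is equivalent to minimizing the NLL.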
Learning: Beyond Maximum Likelihood

The KL divergence is part of the larger class of $f$-divergences between two distributions $p(x)$ and $q(x)$:

$D_f(p \,\|\, q) = \max_{T(x)} \ \mathbb{E}_{x \sim p}[T(x)] - \mathbb{E}_{x \sim q}[g(T(x))]$,

for some concave increasing function $g(\cdot)$.

[Figure: a discriminator $T(x)$ that outputs large values when $x \sim p(x)$ and small values when $x \sim q(x)$]
Learning: Generative Adversarial Networks (GANs)
Generalizing the ML problem, GANs attempt to solve

$\min_\theta \max_\varphi \ \mathbb{E}_{x \sim p_{\mathcal{D}}}[T_\varphi(x)] - \mathbb{E}_{x \sim p(x|\theta)}[g(T_\varphi(x))]$

for some differentiable function $T_\varphi(x)$ of the parameter vector $\varphi$.

The choice of the divergence (via the discriminator) is tailored to the data.

Can be applied to likelihood-free models.
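The following toy sketch instantiates this minimax problem with the standard logistic (GAN) choice of losses; everything here (1D Gaussian data, a quadratic-feature discriminator, hand-derived gradients, untuned hyperparameters) is an illustrative assumption rather than a method from the slides:

```python
# A toy GAN: the generator fits a 1D Gaussian by playing the minimax game
# against a logistic discriminator with features [1, x, x^2].
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
feats = lambda x: np.stack([np.ones_like(x), x, x**2], axis=1)

mu, log_sig = 0.0, 0.0        # generator: x = mu + exp(log_sig) * z, z ~ N(0, 1)
w = np.zeros(3)               # discriminator logit: w^T [1, x, x^2]
B, lr_d, lr_g = 256, 0.05, 0.02

for _ in range(4000):
    x_real = 1.5 + 0.5 * rng.standard_normal(B)   # data distribution p_D
    z = rng.standard_normal(B)
    x_fake = mu + np.exp(log_sig) * z

    # Discriminator ascent on E_pD[ln D(x)] + E_model[ln(1 - D(x))].
    d_r, d_f = sigmoid(feats(x_real) @ w), sigmoid(feats(x_fake) @ w)
    w += lr_d * (feats(x_real).T @ (1 - d_r) - feats(x_fake).T @ d_f) / B

    # Generator descent on -E_model[ln D(x)] (non-saturating variant),
    # with the gradient chained by hand through x_fake.
    d_f = sigmoid(feats(x_fake) @ w)
    g_x = -(1 - d_f) * (w[1] + 2 * w[2] * x_fake)  # d(loss)/d(x_fake)
    mu -= lr_g * g_x.mean()
    log_sig -= lr_g * (g_x * np.exp(log_sig) * z).mean()

print(mu, np.exp(log_sig))  # should drift toward roughly 1.5 and 0.5
```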
Learning: Generative Adversarial Networks (GANs)
[Figure: image samples synthesized by a GAN, NVIDIA]
Applications to Communication Networks
Fog network architecture [5GPPP]
At the Edge: Overview
At the edge:
- PHY: end-to-end encoding/decoding, CSI compression and feedback, fingerprinting for localization, blind source separation, blind channel equalization
- MAC/Link: clustering for resource allocation, clustering for self-organizing multi-hop networks
At the Edge: PHY

End-to-end encoding/decoding for wireless channels – autoencoders
[O'Shea and Hoydis '17]
At the Edge: PHY

End-to-end encoding/decoding for optical channels – autoencoders
Algorithm deficit
[Karanov et al '18]
At the Edge: PHY

End-to-end encoding/decoding for Gaussian channels with feedback – autoencoders based on Recurrent Neural Networks (RNNs)
Algorithm deficit
[Kim et al '18]
At the Edge: PHY

Channel State Information (CSI) compression and feedback – autoencoders
Model deficit
[Wen et al '17]
At the Edge: PHY

Fingerprinting for localization – autoencoders
Model deficit
[Xiao et al '17]
At the Edge: PHY

Mimicking a propagation channel – GAN (see also [Ye et al '18])
Model deficit
[O'Shea et al '18]
At the Edge: PHY
Mimicking and identifying a propagation channel (e.g., satellite) – generative models

Leveraging domain knowledge improves the learned model.
[Ibnkahla '00]
At the Edge: MAC/Link

Generating artificial examples to augment the training set for spectrum sensing – GAN
[Nakashima et al '18]
At the Edge: MAC/Link

Resource allocation – clustering
Algorithm deficit
[Abdelnasser et al '14]
At the Cloud: Overview
At the cloud:
- Network: clustering for group-based access control, anomaly detection
- Application: community detection in social media, Internet traffic clustering
At the Cloud: Network

Self-organizing multi-hop networks – clustering
Algorithm deficit
[Abbassi and Younis '07]
At the Cloud: Network

Anomaly detection – density estimation
Model deficit
[Musumeci et al '18]
At the Cloud: Application

Community detection in social networks – clustering
Model deficit
[Abbe et al '16]
Concluding Remarks
Machine learning tools can leverage the availability of data and computing resources in modern communication systems.

Supervised, unsupervised, and reinforcement learning paradigms lend themselves to different key communication (sub)tasks.

Machine learning is not a universal solution: a case-by-case analysis of its advantages and disadvantages is needed.
Concluding Remarks
Engineering the integration of traditional model-based techniques and data-driven machine learning methods
[Reich '96]
Acknowledgements
This work has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 725731).
References
[O'Shea and Hoydis '17] T. J. O'Shea and J. Hoydis, "An Introduction to Machine Learning Communications Systems," 2017.
[Cammerer et al '17] S. Cammerer, T. Gruber, J. Hoydis, and S. ten Brink, "Scaling Deep Learning-based Decoding of Polar Codes via Partitioning," arXiv:1702.06901, 2017.
[Balatsoukas-Stimming '17] A. Balatsoukas-Stimming, "Non-Linear Digital Self-Interference Cancellation for In-Band Full-Duplex Radios Using Neural Networks," arXiv:1711.00379, 2017.
[Sun et al '17] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, "Learning to Optimize: Training Deep Neural Networks for Wireless Resource Management," arXiv:1705.09412, 2017.
[de Kerret et al '17] P. de Kerret, D. Gesbert, and M. Filippone, "Decentralized Deep Scheduling for Interference Channels," arXiv, 2017.
[Wen et al '17] C.-K. Wen, W.-T. Shih, and S. Jin, "Deep Learning for Massive MIMO CSI Feedback," arXiv.
[Agirman-Tosun et al '11] H. Agirman-Tosun et al., "Modulation classification of MIMO-OFDM signals by independent component analysis and support vector machines," in Proc. Asilomar, 2011.
[Fang and Lin '08] S.-H. Fang and T.-N. Lin, "Indoor location system based on discriminant-adaptive neural network in IEEE 802.11 environments," IEEE Transactions on Neural Networks, vol. 19, no. 11, pp. 1973-1978, 2008.
[Tumuluru et al '10] V. K. Tumuluru, P. Wang, and D. Niyato, "A neural network based spectrum prediction scheme for cognitive radio," in Proc. IEEE ICC, 2010.
[Chen et al '17] M. Chen et al., "Echo state networks for proactive caching in cloud-based radio access networks with mobile users," IEEE Transactions on Wireless Communications, 2017.
[Xiao et al '17] C. Xiao, D. Yang, Z. Chen, and G. Tan, "3-D BLE Indoor Localization Based on Denoising Autoencoder," IEEE Access, vol. 5, pp. 12751-12760, 2017.
[Abdelnasser et al '14] A. Abdelnasser et al., "Clustering and resource allocation for dense femtocells in a two-tier cellular OFDMA network," IEEE Transactions on Wireless Communications, 2014.
[Nguyen et al '08] T. T. Nguyen and G. Armitage, "A survey of techniques for internet traffic classification using machine learning," IEEE Communications Surveys & Tutorials, vol. 10, no. 4, pp. 56-76, 2008.
[Wang et al '06] Y. Wang, M. Martonosi, and L.-S. Peh, "A supervised learning approach for routing optimizations in wireless sensor networks," in Proc. ACM Workshop on Multi-hop Ad Hoc Networks, 2006.
[Abbassi and Younis '07] A. A. Abbasi and M. Younis, "A survey on clustering algorithms for wireless sensor networks," Computer Communications, 2007.
[Abbe et al '16] E. Abbe, A. S. Bandeira, and G. Hall, "Exact recovery in the stochastic block model," IEEE Transactions on Information Theory, 2016.
[Wang et al '18] Z. Wang et al., "Handover Control in Wireless Systems via Asynchronous Multi-User Deep Reinforcement Learning," arXiv.
[Wang et al '16] D. Wang et al., "Nonlinearity Mitigation Using a Machine Learning Detector Based on k-Nearest Neighbors," IEEE Photonics Technology Letters, 2016.
[Venkatraman et al '10] P. Venkatraman et al., "Opportunistic bandwidth sharing through reinforcement learning," IEEE Transactions on Vehicular Technology, 2010.
[Iannello et al '12] F. Iannello, O. Simeone, and U. Spagnolini, "Optimality of myopic scheduling and whittle indexability for energy harvesting sensors," in Proc. CISS, 2012.
[Xu et al '17] Z. Xu et al., "A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs," in Proc. IEEE ICC, 2017.
[Mnih et al '15] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, 2015.
[Bogale et al '18] T. Bogale et al., "Machine Intelligence Techniques for Next-Generation Context-Aware Wireless Networks," arXiv.
[Tang et al '17] F. Tang et al., "On Removing Routing Protocol from Future Wireless Networks: A Real-time Deep Learning Approach for Intelligent Traffic Control," IEEE Wireless Communications, 2018.
[Sallent et al '15] O. Sallent et al., "Learning-based coexistence for LTE operation in unlicensed bands," in Proc. IEEE ICC, 2015.
[Kato et al '17] N. Kato et al., "The Deep Learning Vision for Heterogeneous Network Traffic Control: Proposal, Challenges, and Future Perspective," IEEE Wireless Communications, June 2017.
[Siracusano and Bifulco '18] G. Siracusano and R. Bifulco, "In-network Neural Networks," arXiv:1801.05731.
[He et al '17] Y. He et al., "Deep Reinforcement Learning (DRL)-based Resource Management in Software-Defined and Virtualized Vehicular Ad Hoc Networks," in Proc. ACM SDAIVN, 2017.
[Farsad and Goldsmith '18] N. Farsad and A. Goldsmith, "Neural Network Detection of Data Sequences in Communication Systems," arXiv:1802.02046.
[Emigh et al '15] M. Emigh et al., "A model based approach to exploration of continuous-state MDPs using Divergence-to-Go," in Proc. IEEE Machine Learning for Signal Processing (MLSP), 2015.
[Caciularu and Burshtein '18] A. Caciularu and D. Burshtein, "Blind Channel Equalization using Variational Autoencoders," in Proc. IEEE ICC Workshops, 2018.
[Musumeci et al '18] F. Musumeci, C. Rottondi, A. Nag, I. Macaluso, D. Zibar, M. Ruffini, and M. Tornatore, "A Survey on Application of Machine Learning Techniques in Optical Networks," arXiv:1803.07976.
[Okamoto et al '18] H. Okamoto et al., "Machine-Learning-Based Future Received Signal Strength Prediction Using Depth Images for mmWave Communications," arXiv:1804.00709.
[Davaslioglu and Sagduyu '18] K. Davaslioglu and Y. E. Sagduyu, "Generative Adversarial Learning for Spectrum Sensing," in Proc. IEEE ICC, 2018.
[Aoudia and Hoydis '18] F. Ait Aoudia and J. Hoydis, "End-to-End Learning of Communications Systems Without a Channel Model," arXiv:1804.02276.
[Karanov et al '18] B. Karanov et al., "End-to-end Deep Learning of Optical Fiber Communications," arXiv:1804.04097.
[Nachmani et al '16] E. Nachmani et al., "Learning to decode linear codes using deep learning," in Proc. Allerton, 2016.
[O'Shea et al '18] T. J. O'Shea, T. Roy, and N. West, "Approximating the Void: Learning Stochastic Channel Models from Observation with Variational Generative Adversarial Networks," arXiv:1805.06350.
[Zhao et al '18] Z. Zhao et al., "Deep Reinforcement Learning for Network Slicing," arXiv:1805.06591.
[Ibnkahla '00] M. Ibnkahla, "Applications of neural networks to digital communications – a survey," Signal Processing, vol. 80, no. 7, pp. 1185-1215, 2000.
[Bouchired et al '98] S. Bouchired, D. Roviras, and F. Castanie, "Equalisation of satellite mobile channels with neural network techniques," Space Communications, 1998.
[De Veciana and Zakhor '92] G. De Veciana and A. Zakhor, "Neural net-based continuous phase modulation receivers," IEEE Transactions on Communications, 1992.
[Schibisch et al '18] S. Schibisch et al., "Online Label Recovery for Deep Learning-based Communication through Error Correcting Codes," arXiv:1807.00747.
[Ye et al '18] H. Ye et al., "Channel Agnostic End-to-End Learning based Communication Systems with Conditional GAN," arXiv:1807.00447.
[Kim et al '18] H. Kim et al., "Deepcode: Feedback Codes via Deep Learning," arXiv:1807.00801.