A Brief Introduction to Machine Learning (With Applications to Communications)
Osvaldo Simeone
King’s College London
August 8, 2018
Goals and Learning Outcomes
Goals:
- Provide an introduction to main areas in machine learning with a focus on probabilistic methods
- Offer some pointers to specific applications for telecom

Learning outcomes:
- Recognize scenarios in which machine learning can and cannot be useful
- Identify specific classes of machine learning methods that apply to a given problem, with applications to telecom networks
For More...
O. Simeone, "A Brief Introduction to Machine Learning for Engineers," arXiv:1709.02840.

O. Simeone, "A Very Brief Introduction to Machine Learning with Applications to Communication Systems," arXiv:1808.02342.
What is Machine Learning?

Traditional engineering approach:
- Acquisition of domain knowledge...
- ... mathematical (physics-based) modelling...
- ... and optimized algorithm design with performance guarantees
What is Machine Learning?

Machine learning approach:
- Selection of a general-purpose model and a learning algorithm...
- ... learning based on data (examples) and use of the trained (black-box) "machine"
When to Use Machine Learning?
Advantages:
- lower cost
- faster development
- reduced implementation complexity

Disadvantages:
- suboptimal performance
- lack of interpretability
- limited applicability
When to Use Machine Learning?
(Slightly modified) criteria by [Brynjolfsson and Mitchell '17]:
- traditional engineering flow not suitable: model deficit or algorithm deficit
- the task involves a function that maps well-defined inputs to well-defined outputs
- the task provides clear feedback with clearly definable goals and metrics
- large data sets exist or can be created containing input-output pairs
- the task does not involve long chains of logic or reasoning that depend on diverse background knowledge or common sense
- the task does not require detailed explanations for how the decision was made
- the task has a tolerance for error and no need for provably correct or optimal solutions
- the phenomenon or function being learned should not change rapidly over time
Taxonomy of Machine Learning Methods
Supervised learning
Unsupervised learning
Reinforcement learning
Taxonomy of Machine Learning Methods
Supervised vs unsupervised learning
Taxonomy of Machine Learning Methods
Reinforcement learning: feedback-based sequential decision making
[Diagram, © D. Silver: agent-environment loop with state $s_t$, action $a_t$, and reward $r_t$]
Communication Networks
Fog network architecture [5GPPP]
[Figure: fog architecture spanning the core cloud, core network, edge cloud, access network, and wireless edge]
Data collection and processing can take place at the edge and/or at the cloud.
Data in Communication Networks
Data at the edge:
- PHY: baseband signals, (multi-RAT) channel quality
- MAC/Link: throughput, FER, random access load and latency
- Network: location, traffic loads across services, users' device types, battery levels
- Application: users' preferences, content demands, computing loads, QoS metrics
Data in Communication Networks
Data at the cloud:
- Network: mobility patterns, network-wide traffic statistics, outage rates
- Application: users' behavior patterns, subscription information, service usage statistics, TCP/IP traffic statistics
Learning in Communication Networks
Which tasks?
- traditional engineering flow not suitable: model deficit or algorithm deficit (depends)
- the task involves a function that maps well-defined inputs to well-defined outputs ✓
- the task provides clear feedback with clearly definable goals and metrics ✓
- large data sets exist or can be created containing input-output pairs ✓
- the task does not involve long chains of logic or reasoning that depend on diverse background knowledge or common sense ✓
- the task does not require detailed explanations for how the decision was made ✓
- the task has a tolerance for error and no need for provably correct or optimal solutions (depends)
- the phenomenon or function being learned should not change rapidly over time (depends)
Overview
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Supervised Learning
Supervised learning:
- regression: continuous labels
- classification: discrete labels
Supervised Learning: Regression
[Figure: noisy training points from a one-dimensional regression problem]

Training set $\mathcal{D}$: $N$ training points $(x_n, t_n)$, $n = 1, \ldots, N$
$x_n$ = covariates, domain points, or explanatory variables
$t_n$ = dependent variables, labels, or responses (continuous)

Goal: predict the label $t$ for a new, that is, as of yet unobserved, domain point $x$
Supervised Learning: Classification
[Figure: two classes of labelled points in a two-dimensional feature space, with a new unlabelled point marked "?"]

Training set $\mathcal{D}$: $N$ training points $(x_n, t_n)$, $n = 1, \ldots, N$
$x_n$ = covariates, domain points, or explanatory variables
$t_n$ = dependent variables, labels, or responses (discrete)

Goal: predict the label (class) $t$ for a new, that is, as of yet unobserved, domain point $x$
Supervised Learning
Impossible task without assuming a model (inductive bias), by the no free lunch theorem

Memorizing vs. learning: retrieving the value $t_n$ corresponding to an already observed pair $(x_n, t_n) \in \mathcal{D}$ vs. predicting the value $t$ for an unseen $x$
Defining Supervised Learning
Training set $\mathcal{D}$: $(x_n, t_n) \overset{\text{i.i.d.}}{\sim} p(x, t)$, $n = 1, \ldots, N$

Based on the training set $\mathcal{D}$, we derive a predictor $\hat{t}(x)$.

Test pair: $(x, t) \sim p(x, t)$, independent of $\mathcal{D}$

Quality of the prediction $\hat{t}(x)$ for a pair $(x, t)$: the loss

$\ell(t, \hat{t}(x))$,

for some loss function $\ell(t, \hat{t})$, e.g., $\ell(t, \hat{t}) = (t - \hat{t})^2$ (quadratic) or $\ell(t, \hat{t}) = 1(t \neq \hat{t})$ (probability of error)
Defining Supervised Learning
Goal: minimize the average loss on the test pair (generalization loss)

$L_p(\hat{t}) = \mathbb{E}_{(x,t) \sim p(x,t)}[\ell(t, \hat{t}(x))]$

Alternative viewpoints to the frequentist framework: Bayesian and Minimum Description Length (MDL)
When the True Distribution $p(x, t)$ is Known...

... we don't need data $\mathcal{D}$
... and we have a standard inference problem, i.e., estimation (regression) or detection (classification).

The solution can be directly computed from the posterior distribution

$p(t|x) = \frac{p(x, t)}{p(x)}$

as

$t^*(x) = \arg\min_{\hat{t}} \ \mathbb{E}_{t \sim p(t|x)}[\ell(t, \hat{t}) \mid x]$
When the Model $p(x, t)$ is Known...

With the quadratic loss, the conditional mean: $t^*(x) = \mathbb{E}_{t \sim p(t|x)}[t \mid x]$

With the probability of error, maximum a posteriori (MAP): $t^*(x) = \arg\max_t p(t|x)$

Example: with the joint distribution

x \ t    t = 0    t = 1
x = 0    0.05     0.45
x = 1    0.40     0.10

we have $p(t = 1 | x = 0) = 0.45 / (0.45 + 0.05) = 0.9$, and

$t^*(x = 0) = 0.9 \times 1 + 0.1 \times 0 = 0.9$ for the quadratic loss,
$t^*(x = 0) = 1$ for the probability of error (MAP).
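As a quick numerical check, the following sketch reproduces these computations; the joint-distribution values are those in the table above, and the code itself is purely illustrative:

```python
# Optimal predictors from a known joint distribution p(x, t).
import numpy as np

# Joint distribution: rows indexed by x in {0, 1}, columns by t in {0, 1}.
p_xt = np.array([[0.05, 0.45],
                 [0.40, 0.10]])

p_x = p_xt.sum(axis=1)                 # marginal p(x)
p_t_given_x = p_xt / p_x[:, None]      # posterior p(t | x)

print(p_t_given_x[0, 1])               # p(t = 1 | x = 0) = 0.9

# Optimal predictors at x = 0:
print(p_t_given_x[0] @ np.array([0, 1]))  # conditional mean (quadratic loss): 0.9
print(p_t_given_x[0].argmax())            # MAP (probability of error): 1
```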
When the True Distribution $p(x, t)$ is Not Known...

... we need data $\mathcal{D}$
... and we have a learning problem

1. Model selection (inductive bias): define a parametric model, either generative, $p(x, t|\theta)$, or discriminative, $p(t|x, \theta)$
2. Learning: given data $\mathcal{D}$, optimize a learning criterion to obtain the parameter vector $\theta$
3. Inference: use the model to obtain the predictor $\hat{t}(x)$ (to be tested on new data)
Logistic Regression
Example: binary classification ($t \in \{0, 1\}$)

1. Model selection (inductive bias): logistic regression (discriminative model)

$\phi(x) = [\phi_1(x) \cdots \phi_{D'}(x)]^T$ is a vector of features (e.g., a bag-of-words model for a text).
Logistic Regression
Parametric probabilistic model:

$p(t = 1 \mid x, w) = \sigma(w^T \phi(x))$,

where $\sigma(a) = (1 + \exp(-a))^{-1}$ is the sigmoid function.
Logistic Regression

2. Learning: to be discussed
3. Inference: with the probability of error loss, MAP classification:

$\underbrace{w^T \phi(x)}_{\text{logit or LLR}} \;\underset{t=0}{\overset{t=1}{\gtrless}}\; 0$
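To make the model and decision rule concrete, here is a minimal sketch; the feature map and weight values below are illustrative assumptions, not taken from the slides:

```python
# Logistic model p(t = 1 | x, w) = sigmoid(w^T phi(x)) and its MAP decision rule.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def phi(x):
    return np.array([1.0, x])      # hypothetical features: bias term plus raw input

w = np.array([-1.0, 2.0])          # hypothetical trained weight vector

x = 0.8
logit = w @ phi(x)                 # w^T phi(x): the logit, or LLR
print(sigmoid(logit))              # p(t = 1 | x, w), about 0.65 here
print(int(logit >= 0))             # MAP decision: t = 1 iff the logit is >= 0
```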
Multi-Layer Neural Networks
1. Model selection (inductive bias): multi-layer neural network (discriminative model)

Multiple layers of learnable weights enable feature learning.
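A minimal sketch of the idea (layer sizes, ReLU activations, and random weights are illustrative assumptions): the output layer is the logistic model above, but it acts on features computed by an earlier learnable layer rather than on a fixed $\phi(x)$:

```python
# A two-layer neural network classifier: learned features + logistic output.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

x = rng.standard_normal(4)          # input vector
W1 = rng.standard_normal((8, 4))    # first layer of learnable weights
w2 = rng.standard_normal(8)         # second layer of learnable weights

h = np.maximum(0.0, W1 @ x)         # learned features (hidden layer, ReLU)
p_t1 = sigmoid(w2 @ h)              # p(t = 1 | x, theta), acting on h rather than a fixed phi(x)
print(p_t1)
```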
Supervised Learning
1. Model selection (inductive bias): define a parametric model, either generative, $p(x, t|\theta)$, or discriminative, $p(t|x, \theta)$
2. Learning: given data $\mathcal{D}$, optimize a learning criterion to obtain the parameter vector $\theta$
3. Inference: use the model to obtain the predictor $\hat{t}(x)$ (to be tested on new data)
Learning: Maximum Likelihood
ML selects a value of $\theta$ that is the most likely to have generated the observed training set $\mathcal{D}$:

maximize $p(\mathcal{D}|\theta)$
$\iff$ maximize $\ln p(\mathcal{D}|\theta)$ (log-likelihood, or LL)
$\iff$ minimize $-\ln p(\mathcal{D}|\theta)$ (negative log-likelihood, or NLL)

For discriminative models:

minimize $-\ln p(t_{\mathcal{D}} \mid x_{\mathcal{D}}, \theta) = -\sum_{n=1}^{N} \ln p(t_n \mid x_n, \theta)$
Learning: Maximum Likelihood

The problem rarely has analytical solutions and is typically addressed by Stochastic Gradient Descent (SGD). For discriminative models, we have

$\theta^{\text{new}} \leftarrow \theta^{\text{old}} + \gamma\, \nabla_\theta \ln p(t_n \mid x_n, \theta)\big|_{\theta = \theta^{\text{old}}}$,

where $\gamma$ is the learning rate. With multi-layer neural networks, this approach yields the backpropagation algorithm.
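The following sketch applies this SGD rule to the logistic regression model above; the synthetic data, learning rate, and number of passes are illustrative assumptions:

```python
# ML learning of logistic regression via SGD on the NLL.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Synthetic training set: t = 1 with probability sigmoid(2*x - 1).
N = 1000
x = rng.uniform(0, 1, N)
phi = np.stack([np.ones(N), x], axis=1)          # features [1, x]
t = (rng.uniform(size=N) < sigmoid(2 * x - 1)).astype(float)

w = np.zeros(2)
gamma = 0.1                                       # learning rate
for epoch in range(20):
    for n in rng.permutation(N):                  # one SGD pass over the data
        # Gradient of ln p(t_n | x_n, w) w.r.t. w is (t_n - sigmoid(w^T phi_n)) * phi_n.
        w += gamma * (t[n] - sigmoid(w @ phi[n])) * phi[n]

print(w)  # should land near the ground-truth weights [-1, 2]
```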
Model Selection
How to select a model (inductive bias)?

Model selection typically requires choosing the model order, i.e., the capacity of the model.

Ex.: for logistic regression,
- model order $M$: number of features
Model Selection
Example: regression using a discriminative model $p(t|x)$ under which

$t = \underbrace{\sum_{m=0}^{M} w_m x^m}_{\hat{t}(x):\ \text{polynomial of order } M} + \ \mathcal{N}(0, 1)$

[Figure: noisy training points generated from this model]
Model Selection
With $M = 1$, using ML learning of the coefficients:

[Figure: the resulting linear fit ($M = 1$) over the training points]
Model Selection: Underfitting...
With $M = 1$, the ML predictor $\hat{t}(x)$ underfits the data:
- the model is not rich enough to capture the variations present in the data;
- large training loss

$L_{\mathcal{D}}(\theta) = \frac{1}{N} \sum_{n=1}^{N} (t_n - \hat{t}(x_n))^2$
Model Selection
With $M = 9$, using ML learning of the coefficients:

[Figure: the degree-9 fit ($M = 9$) oscillating through the training points, shown against the $M = 1$ fit]
Model Selection: ... vs Overfitting
With $M = 9$, the ML predictor overfits the data:
- the model is too rich and, in order to account for the observations in the training set, it appears to yield inaccurate predictions outside it;
- presumably, we have a large generalization loss

$L_p(\hat{t}) = \mathbb{E}_{(x,t) \sim p(x,t)}[(t - \hat{t}(x))^2]$
Model Selection
$M = 3$ seems to be a reasonable choice...

... but how do we know, given that we have no data outside of the training set?

[Figure: the fits for $M = 1$, $M = 3$, and $M = 9$ compared on the same training points]
Model Selection: Validation

Keep some data (validation set) to estimate the generalization error for different values of $M$.
(See cross-validation for a more efficient way to use the data.)
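A minimal sketch of validation-based model order selection; the data-generating process (a sinusoidal ground truth), sample sizes, and noise level are illustrative assumptions, not taken from the slides:

```python
# Model order selection for polynomial regression via a held-out validation set.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(0, 1, n)
    t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)  # assumed ground truth
    return x, t

x_tr, t_tr = sample(15)      # training set
x_va, t_va = sample(100)     # validation set (held out from training)

for M in range(1, 10):
    w = np.polyfit(x_tr, t_tr, deg=M)            # ML fit of the polynomial coefficients
    rmse_tr = np.sqrt(np.mean((np.polyval(w, x_tr) - t_tr) ** 2))
    rmse_va = np.sqrt(np.mean((np.polyval(w, x_va) - t_va) ** 2))
    print(f"M={M}: training RMSE {rmse_tr:.2f}, validation RMSE {rmse_va:.2f}")
# The training error keeps falling with M; the validation error is U-shaped,
# flagging underfitting at small M and overfitting at large M.
```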
Model Selection: Validation

Validation allows model order selection.

[Figure: root average squared loss vs. $M$; the training loss decreases with $M$, while the generalization loss estimated via validation is U-shaped, marking underfitting at small $M$ and overfitting at large $M$]

Validation can also be used more generally to select other hyperparameters (e.g., the learning rate).
Model Selection: Validation

Model order selection should depend on the amount of data... It is a problem of bias (asymptotic error) versus generalization gap.

[Figure: root average quadratic loss vs. training set size $N$ for $M = 1$ and $M = 7$, comparing the training loss with the generalization loss estimated via validation]
Application to Communication Networks
Fog network architecture [5GPPP]
At the Edge: Overview
At the edge:
- PHY: detection and decoding, precoding and power allocation, modulation recognition, localization, interference cancelation, joint source-channel coding, equalization in the presence of non-linearities
- MAC/Link: radio resource allocation, scheduling, multi-RAT handover, dynamic spectrum access, admission control
- Network: proactive caching
- Application: computing resource allocation, content request prediction
At the Edge: PHY

Channel detection and decoding – classification
[Cammerer et al '17]
At the Edge: PHY

Channel detection and decoding – classification
Model deficit
[Farsad and Goldsmith '18]
At the Edge: PHY

Channel equalization in the presence of non-linearities, e.g., for optical links – regression
Algorithm deficit
[Wang et al '16]
At the Edge: PHY
Channel equalization in the presence of non-linearities, e.g., for satellite links with non-linear amplifiers – regression

Algorithm deficit
[Bouchired et al '98]
At the Edge: PHY
Channel decoding for modulation schemes with complex optimal decoders, e.g., continuous phase modulation – classification

Algorithm deficit
[De Veciana and Zakhor '92]
At the Edge: PHY
Channel decoding – classification

Leverage domain knowledge to set up the parametrized model to be learned
[Nachmani et al '16]
At the Edge: PHY

Channel equalization to compensate for hardware impairments – regression
Leverage domain knowledge to design the decoder
[Schibisch et al '18]
At the Edge: PHY

Modulation recognition – classification
Algorithm deficit
[Agirman-Tosun et al '11]
At the Edge: PHY

Localization (output: coordinates) – regression
Model deficit
[Fang and Lin '08]
At the Edge: PHY

Precoding and power allocation – regression
Algorithm deficit
[Sun et al '17]
At the Edge: PHY

Interference cancellation – regression
Model deficit
[Balatsoukas-Stimming '17]
At the Edge: MAC/Link

Spectrum sensing – classification
Model deficit
[Tumuluru et al '10]
At the Edge: MAC/Link

mmWave channel quality prediction using depth images – regression
Model deficit
[Okamoto et al '18]
At the Edge: Network and Application

Content prediction for proactive caching – classification
Model deficit
[Chen et al '17]
At the Cloud: Overview
At the cloud:
- Network: routing (classification vs. look-up tables), SDN flow table updating, proactive caching, congestion control
- Application: cloud/fog computing, Internet traffic classification
At the Cloud: Network

Link prediction for wireless routing – classification/regression
Model deficit
[Wang et al '06]
At the Cloud: Network

Link prediction for optical routing – classification/regression
Model deficit
[Musumeci et al '18]
At the Cloud: Network

Congestion prediction for smart routing – classification
Model deficit
[Tang et al '17]
At the Cloud: Network and Application

Traffic classification – classification
Model deficit
[Nguyen et al '08]
Overview
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Unsupervised Learning
Unsupervised learning tasks operate over unlabelled data sets.

General goal: discover properties of the data, e.g., for compressed representation

"Some of us see unsupervised learning as the key towards machines with common sense." (Y. LeCun)
“Defining” Unsupervised Learning
Training set $\mathcal{D}$: $x_n \overset{\text{i.i.d.}}{\sim} p(x)$, $n = 1, \ldots, N$

Goal: learn some useful properties of the distribution $p(x)$

Alternative viewpoints to the frequentist framework: Bayesian and MDL
Unsupervised Learning Tasks
Density estimation: estimate $p(x)$, e.g., for use in plug-in estimators or compression algorithms, or to detect outliers

Clustering: partition all points in $\mathcal{D}$ into groups of similar objects (e.g., document clustering)

Dimensionality reduction, representation, and feature extraction: represent each data point $x_n$ in a space of lower dimensionality, e.g., to highlight independent explanatory factors, and/or to ease visualization, interpretation, or successive tasks

Generation of new samples: learn a machine that produces samples approximately distributed according to $p(x)$, e.g., to produce artificial scenes for games or films
Unsupervised Learning
1. Model selection (inductive bias): define a parametric model $p(x|\theta)$
2. Learning: given data $\mathcal{D}$, optimize a learning criterion to obtain the parameter vector $\theta$
3. Clustering, feature extraction, sample generation...
Models
Unsupervised learning models typically involve hidden or latent variables.

$z_n$ = hidden, or latent, variables for each data point $x_n$

Ex.: $z_n$ = cluster index of $x_n$
(a) Directed Generative Models
Model the data $x$ as being caused by $z$:

$p(x|\theta) = \sum_z p(z|\theta)\, p(x|z, \theta)$
(a) Directed Generative Models
Ex.: document clustering
- $x$ is a document, and $z$ is (interpreted as) its topic
- $p(z|\theta)$ = distribution of topics
- $p(x|z, \theta)$ = distribution of words in a document given the topic

Basic representatives:
- mixture of Gaussians
- likelihood-free models
(d) Autoencoders
Model the encoding from the data to the hidden variables, as well as the decoding from the hidden variables back to the data:

$p(z|x, \theta)$ and $p(x|z, \theta)$
(d) Autoencoders
Ex.: compression
- $x$ is an image and $z$ is (interpreted as) a compressed (e.g., sparse) representation
- $p(z|x, \theta)$ = compression of the image into the representation
- $p(x|z, \theta)$ = decompression of the representation into an image

Basic representatives: Principal Component Analysis (PCA), dictionary learning, neural network-based autoencoders
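As a concrete instance, the following sketch implements PCA viewed as a linear autoencoder on synthetic data; the dimensions and noise level are illustrative assumptions, and the deterministic encoder/decoder stand in for $p(z|x, \theta)$ and $p(x|z, \theta)$:

```python
# PCA as a linear autoencoder: encode to a low-dimensional z, decode back to x.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 points in R^5 lying near a 2-dimensional subspace.
Z = rng.standard_normal((200, 2))
A = rng.standard_normal((2, 5))
X = Z @ A + 0.05 * rng.standard_normal((200, 5))

mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
W = Vt[:2].T                      # top-2 principal directions (5 x 2)

z = (X - mu) @ W                  # encoder: compressed representation
X_hat = z @ W.T + mu              # decoder: reconstruction

print(np.mean((X - X_hat) ** 2))  # small reconstruction error
```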
Learning: Maximum Likelihood
Focus on directed generative models (a).

To simplify the notation, consider a single data point $x$ (sum over the data set $\mathcal{D}$ to generalize).

ML problem:

$\max_\theta \ \ln p(x|\theta) = \ln\!\Big(\sum_z p(x, z|\theta)\Big)$

Key issue: one needs to marginalize over the latent variables, whose distribution is not known, in order to evaluate the LL.
ELBO
To tackle this issue, a standard approach is the introduction of a variational distribution $q(z)$ and the use of the Evidence Lower BOund (ELBO).

For any fixed value $x$ and any distribution $q(z)$ on the latent variables $z$ (possibly dependent on $x$), the ELBO $\mathcal{L}(q, \theta)$ is defined as

$\mathcal{L}(q, \theta) = \mathbb{E}_{z \sim q(z)}\big[\,\underbrace{\ln p(x, z|\theta) - \ln q(z)}_{\text{learning signal}}\,\big]$
ELBO

The ELBO is a global lower bound on the LL function:

$\ln p(x|\theta) \geq \mathcal{L}(q, \theta)$,

where equality holds at a value $\theta_0$ if and only if the distribution $q(z)$ satisfies $q(z) = p(z|x, \theta_0)$.

[Figure: the LL function together with the ELBOs obtained for $\theta_0 = 2$ and $\theta_0 = 3$; each ELBO touches the LL at its own $\theta_0$]
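The bound follows from Jensen's inequality applied to the concave logarithm; spelling out the one-line derivation:

```latex
\ln p(x|\theta)
  = \ln \sum_z q(z)\,\frac{p(x,z|\theta)}{q(z)}
  \;\geq\; \sum_z q(z)\,\ln \frac{p(x,z|\theta)}{q(z)}
  = \mathcal{L}(q,\theta)
```

Equality holds if and only if the ratio $p(x, z|\theta)/q(z)$ is constant in $z$, i.e., $q(z) = p(z|x, \theta)$.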
Expectation-Maximization (EM) Algorithm

[Figure: EM iteratively builds and maximizes ELBOs under the LL curve, moving the iterate from $\theta^{\text{old}}$ to $\theta^{\text{new}}$]
Expectation-Maximization (EM) Algorithm

Initialize the parameter vector $\theta^{\text{old}}$. Then, for each iteration:
- E step: for a fixed parameter vector $\theta^{\text{old}}$,
  $\max_q \mathcal{L}(q, \theta^{\text{old}}) \rightarrow q^{\text{new}}(z) = p(z|x, \theta^{\text{old}})$
  (Bayesian inference of the latent variables)
- M step: for a fixed variational distribution $q^{\text{new}}(z)$,
  $\max_\theta \mathcal{L}(q^{\text{new}}, \theta) \rightarrow \max_\theta \mathbb{E}_{z \sim q^{\text{new}}(z)}[\ln p(x, z|\theta)]$
  (solve a supervised learning problem)
Expectation-Maximization (EM) Algorithm
EM guarantees monotonically improving objective values, which ensures convergence to a local optimum of the original problem.

[Figure: successive ELBOs push the iterates from $\theta^{\text{old}}$ to $\theta^{\text{new}}$ along the LL curve]
Example: Mixture of Gaussians
Directed generative model:

$z \sim \text{Bern}(\pi)$
$x \mid z = k \sim \mathcal{N}(\mu_k, \Sigma_k)$
Example: Mixture of Gaussians

[Figures: successive EM iterations on the mixture-of-Gaussians example, with the estimated components progressively fitting the data clusters]
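A hedged numerical sketch of EM for this model in one dimension; the data-generating parameters, the initialization, and the iteration count are illustrative assumptions:

```python
# EM for a two-component Gaussian mixture in 1D.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from the model: z ~ Bern(0.3), x | z=k ~ N(mu_k, sigma_k^2).
N = 1000
z_true = rng.uniform(size=N) < 0.3
x = np.where(z_true, 2.0 + 0.5 * rng.standard_normal(N),
                     -1.0 + 1.0 * rng.standard_normal(N))

pi, mu, var = 0.5, np.array([-2.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    # E step: responsibilities q(z_n = 1) = p(z_n = 1 | x_n, theta_old).
    lik = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r1 = pi * lik[:, 1] / ((1 - pi) * lik[:, 0] + pi * lik[:, 1])
    r = np.stack([1 - r1, r1], axis=1)
    # M step: maximize E_q[ln p(x, z | theta)], available in closed form here.
    Nk = r.sum(axis=0)
    pi = Nk[1] / N
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(pi, mu, var)  # should recover roughly pi = 0.3, mu = (-1, 2), var = (1, 0.25)
```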
Scaling EM
The EM algorithm may be impractical for large-scale problems: one needs to compute the posterior in the E step and to average over $z$ in the M step.

Solutions:
- E step: parametrize the variational distribution as $q(z|\varphi)$ or $q(z|x, \varphi)$ and maximize the ELBO over $\varphi$ (variational autoencoder)
- M step: approximate $\mathbb{E}_{z \sim q^{\text{new}}(z)}[\ln p(x, z|\theta)]$ via Monte Carlo
- use gradient descent for the E and/or M steps
Learning: Beyond Maximum Likelihood
ML tends to provide inclusive and "blurry" estimates of the data distribution.

[Figure: a single broad Gaussian fit by ML that covers both modes of a bimodal data distribution]

This can be a problem for tasks such as data generation.
Learning: Beyond Maximum Likelihood
ML can be proven to minimize the KL divergence

$\text{KL}(p_{\mathcal{D}}(x) \,\|\, p(x|\theta)) = \mathbb{E}_{x \sim p_{\mathcal{D}}(x)}\!\left[\ln \frac{p_{\mathcal{D}}(x)}{p(x|\theta)}\right]$

between the empirical distribution $p_{\mathcal{D}}(x) = N[x]/N$ (with counts $N[x] = |\{n : x_n = x\}|$) and the model.
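To see why, note that the KL divergence and the average NLL differ only by a term that does not depend on $\theta$:

```latex
\mathrm{KL}(p_{\mathcal{D}} \,\|\, p(\cdot|\theta))
  = \underbrace{-H(p_{\mathcal{D}})}_{\text{constant in } \theta}
    - \mathbb{E}_{x \sim p_{\mathcal{D}}}[\ln p(x|\theta)]
  = -H(p_{\mathcal{D}})
    + \frac{1}{N}\sum_{n=1}^{N} \big(-\ln p(x_n|\theta)\big)
```

where $H(p_{\mathcal{D}})$ is the entropy of the empirical distribution, so minimizing the KL divergence over $\theta$ is equivalent to minimizing the NLL.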
Learning: Beyond Maximum Likelihood

The KL divergence is part of the larger class of $f$-divergences between two distributions $p(x)$ and $q(x)$:

$D_f(p \,\|\, q) = \max_{T(x)} \ \mathbb{E}_{x \sim p}[T(x)] - \mathbb{E}_{x \sim q}[g(T(x))]$,

for some concave increasing function $g(\cdot)$.

[Figure: a discriminator $T(x)$ that outputs large values when $x \sim p(x)$ and small values when $x \sim q(x)$]
Learning: Generative Adversarial Networks (GANs)
Generalizing the ML problem, GANs attempt to solve

$\min_\theta \max_\varphi \ \mathbb{E}_{x \sim p_{\mathcal{D}}}[T_\varphi(x)] - \mathbb{E}_{x \sim p(x|\theta)}[g(T_\varphi(x))]$

for some differentiable function $T_\varphi(x)$ of the parameter vector $\varphi$.

The choice of the divergence (via the discriminator) is tailored to the data.

Can be applied to likelihood-free models.
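The following toy sketch instantiates this minimax problem with the standard logistic (GAN) choice of losses; everything here (1D Gaussian data, a quadratic-feature discriminator, hand-derived gradients, untuned hyperparameters) is an illustrative assumption rather than a method from the slides:

```python
# A toy GAN: the generator fits a 1D Gaussian by playing the minimax game
# against a logistic discriminator with features [1, x, x^2].
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
feats = lambda x: np.stack([np.ones_like(x), x, x**2], axis=1)

mu, log_sig = 0.0, 0.0        # generator: x = mu + exp(log_sig) * z, z ~ N(0, 1)
w = np.zeros(3)               # discriminator logit: w^T [1, x, x^2]
B, lr_d, lr_g = 256, 0.05, 0.02

for _ in range(4000):
    x_real = 1.5 + 0.5 * rng.standard_normal(B)   # data distribution p_D
    z = rng.standard_normal(B)
    x_fake = mu + np.exp(log_sig) * z

    # Discriminator ascent on E_pD[ln D(x)] + E_model[ln(1 - D(x))].
    d_r, d_f = sigmoid(feats(x_real) @ w), sigmoid(feats(x_fake) @ w)
    w += lr_d * (feats(x_real).T @ (1 - d_r) - feats(x_fake).T @ d_f) / B

    # Generator descent on -E_model[ln D(x)] (non-saturating variant),
    # with the gradient chained by hand through x_fake.
    d_f = sigmoid(feats(x_fake) @ w)
    g_x = -(1 - d_f) * (w[1] + 2 * w[2] * x_fake)  # d(loss)/d(x_fake)
    mu -= lr_g * g_x.mean()
    log_sig -= lr_g * (g_x * np.exp(log_sig) * z).mean()

print(mu, np.exp(log_sig))  # should drift toward roughly 1.5 and 0.5
```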
Learning: Generative Adversarial Networks (GANs)
[Figure: image samples synthesized by a GAN, NVIDIA]
Applications to Communication Networks
Fog network architecture [5GPPP]
At the Edge: Overview
At the edge:
- PHY: end-to-end encoding/decoding, CSI compression and feedback, fingerprinting for localization, blind source separation, blind channel equalization
- MAC/Link: clustering for resource allocation, clustering for self-organizing multi-hop networks
At the Edge: PHY

End-to-end encoding/decoding for wireless channels – autoencoders
[O'Shea and Hoydis '17]
At the Edge: PHY

End-to-end encoding/decoding for optical channels – autoencoders
Algorithm deficit
[Karanov et al '18]
At the Edge: PHY

End-to-end encoding/decoding for Gaussian channels with feedback – autoencoders based on Recurrent Neural Networks (RNNs)
Algorithm deficit
[Kim et al '18]
At the Edge: PHY

Channel State Information (CSI) compression and feedback – autoencoders
Model deficit
[Wen et al '17]
At the Edge: PHY

Fingerprinting for localization – autoencoders
Model deficit
[Xiao et al '17]
At the Edge: PHY

Mimicking a propagation channel – GAN (see also [Ye et al '18])
Model deficit
[O'Shea et al '18]
At the Edge: PHY
Mimicking and identifying a propagation channel (e.g., satellite) – generative models

Leveraging domain knowledge improves the learned model.
[Ibnkahla '00]
At the Edge: MAC/Link

Generating artificial examples to augment the training set for spectrum sensing – GAN
[Nakashima et al '18]
At the Edge: MAC/Link

Resource allocation – clustering
Algorithm deficit
[Abdelnasser et al '14]
At the Cloud: Overview
At the cloud:
- Network: clustering for group-based access control, anomaly detection
- Application: community detection in social media, Internet traffic clustering
At the Cloud: Network

Self-organizing multi-hop networks – clustering
Algorithm deficit
[Abbassi and Younis '07]
At the Cloud: Network

Anomaly detection – density estimation
Model deficit
[Musumeci et al '18]
At the Cloud: Application

Community detection in social networks – clustering
Model deficit
[Abbe et al '16]
Concluding Remarks
Machine learning tools can leverage the availability of data and computing resources in modern communication systems.

Supervised, unsupervised, and reinforcement learning paradigms lend themselves to different key communication (sub)tasks.

Machine learning is not a universal solution: a case-by-case analysis of its advantages and disadvantages is needed.
Concluding Remarks
Engineering the integration of traditional model-based techniques and data-driven machine learning methods
[Reich '96]
Acknowledgements
This work has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 725731).
References
[O'Shea and Hoydis '17] T. J. O'Shea and J. Hoydis, "An Introduction to Machine Learning Communications Systems," 2017.
[Cammerer et al '17] S. Cammerer, T. Gruber, J. Hoydis, and S. ten Brink, "Scaling Deep Learning-based Decoding of Polar Codes via Partitioning," arXiv:1702.06901, 2017.
[Balatsoukas-Stimming '17] A. Balatsoukas-Stimming, "Non-Linear Digital Self-Interference Cancellation for In-Band Full-Duplex Radios Using Neural Networks," arXiv:1711.00379, 2017.
[Sun et al '17] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, "Learning to Optimize: Training Deep Neural Networks for Wireless Resource Management," arXiv:1705.09412, 2017.
[de Kerret et al '17] P. de Kerret, D. Gesbert, and M. Filippone, "Decentralized Deep Scheduling for Interference Channels," arXiv, 2017.
[Wen et al '17] C.-K. Wen, W.-T. Shih, and S. Jin, "Deep Learning for Massive MIMO CSI Feedback," arXiv.
[Agirman-Tosun et al '11] H. Agirman-Tosun et al., "Modulation classification of MIMO-OFDM signals by independent component analysis and support vector machines," in Proc. Asilomar, 2011.
[Fang and Lin '08] S.-H. Fang and T.-N. Lin, "Indoor location system based on discriminant-adaptive neural network in IEEE 802.11 environments," IEEE Transactions on Neural Networks, vol. 19, no. 11, pp. 1973-1978, 2008.
[Tumuluru et al '10] V. K. Tumuluru, P. Wang, and D. Niyato, "A neural network based spectrum prediction scheme for cognitive radio," in Proc. IEEE ICC, 2010.
[Chen et al '17] M. Chen et al., "Echo state networks for proactive caching in cloud-based radio access networks with mobile users," IEEE Transactions on Wireless Communications, 2017.
[Xiao et al '17] C. Xiao, D. Yang, Z. Chen, and G. Tan, "3-D BLE Indoor Localization Based on Denoising Autoencoder," IEEE Access, vol. 5, pp. 12751-12760, 2017.
[Abdelnasser et al '14] A. Abdelnasser et al., "Clustering and resource allocation for dense femtocells in a two-tier cellular OFDMA network," IEEE Transactions on Wireless Communications, 2014.
[Nguyen et al '08] T. T. Nguyen and G. Armitage, "A survey of techniques for internet traffic classification using machine learning," IEEE Communications Surveys & Tutorials, vol. 10, no. 4, pp. 56-76, 2008.
[Wang et al '06] Y. Wang, M. Martonosi, and L.-S. Peh, "A supervised learning approach for routing optimizations in wireless sensor networks," in Proc. ACM Workshop on Multi-hop Ad Hoc Networks, 2006.
[Abbassi and Younis '07] A. A. Abbasi and M. Younis, "A survey on clustering algorithms for wireless sensor networks," Computer Communications, 2007.
[Abbe et al '16] E. Abbe, A. S. Bandeira, and G. Hall, "Exact recovery in the stochastic block model," IEEE Transactions on Information Theory, 2016.
[Wang et al '18] Z. Wang et al., "Handover Control in Wireless Systems via Asynchronous Multi-User Deep Reinforcement Learning," arXiv.
[Wang et al '16] D. Wang et al., "Nonlinearity Mitigation Using a Machine Learning Detector Based on k-Nearest Neighbors," IEEE Photonics Technology Letters, 2016.
[Venkatraman et al '10] P. Venkatraman et al., "Opportunistic bandwidth sharing through reinforcement learning," IEEE Transactions on Vehicular Technology, 2010.
[Iannello et al '12] F. Iannello, O. Simeone, and U. Spagnolini, "Optimality of myopic scheduling and whittle indexability for energy harvesting sensors," in Proc. CISS, 2012.
[Xu et al '17] Z. Xu et al., "A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs," in Proc. IEEE ICC, 2017.
[Mnih et al '15] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, 2015.
[Bogale et al '18] T. Bogale et al., "Machine Intelligence Techniques for Next-Generation Context-Aware Wireless Networks," arXiv.
[Tang et al '17] F. Tang et al., "On Removing Routing Protocol from Future Wireless Networks: A Real-time Deep Learning Approach for Intelligent Traffic Control," IEEE Wireless Communications, 2018.
[Sallent et al '15] O. Sallent et al., "Learning-based coexistence for LTE operation in unlicensed bands," in Proc. IEEE ICC, 2015.
[Kato et al '17] N. Kato et al., "The Deep Learning Vision for Heterogeneous Network Traffic Control: Proposal, Challenges, and Future Perspective," IEEE Wireless Communications, June 2017.
[Siracusano and Bifulco '18] G. Siracusano and R. Bifulco, "In-network Neural Networks," arXiv:1801.05731.
[He et al '17] Y. He et al., "Deep Reinforcement Learning (DRL)-based Resource Management in Software-Defined and Virtualized Vehicular Ad Hoc Networks," in Proc. ACM SDAIVN, 2017.
[Farsad and Goldsmith '18] N. Farsad and A. Goldsmith, "Neural Network Detection of Data Sequences in Communication Systems," arXiv:1802.02046.
[Emigh et al '15] M. Emigh et al., "A model based approach to exploration of continuous-state MDPs using Divergence-to-Go," in Proc. IEEE Machine Learning for Signal Processing (MLSP), 2015.
[Caciularu and Burshtein '18] A. Caciularu and D. Burshtein, "Blind Channel Equalization using Variational Autoencoders," in Proc. IEEE ICC Workshops, 2018.
[Musumeci et al '18] F. Musumeci, C. Rottondi, A. Nag, I. Macaluso, D. Zibar, M. Ruffini, and M. Tornatore, "A Survey on Application of Machine Learning Techniques in Optical Networks," arXiv:1803.07976.
[Okamoto et al '18] H. Okamoto et al., "Machine-Learning-Based Future Received Signal Strength Prediction Using Depth Images for mmWave Communications," arXiv:1804.00709.
[Davaslioglu and Sagduyu '18] K. Davaslioglu and Y. E. Sagduyu, "Generative Adversarial Learning for Spectrum Sensing," in Proc. IEEE ICC, 2018.
[Aoudia and Hoydis '18] F. Ait Aoudia and J. Hoydis, "End-to-End Learning of Communications Systems Without a Channel Model," arXiv:1804.02276.
[Karanov et al '18] B. Karanov et al., "End-to-end Deep Learning of Optical Fiber Communications," arXiv:1804.04097.
[Nachmani et al '16] E. Nachmani et al., "Learning to decode linear codes using deep learning," in Proc. Allerton, 2016.
[O'Shea et al '18] T. J. O'Shea, T. Roy, and N. West, "Approximating the Void: Learning Stochastic Channel Models from Observation with Variational Generative Adversarial Networks," arXiv:1805.06350.
[Zhao et al '18] Z. Zhao et al., "Deep Reinforcement Learning for Network Slicing," arXiv:1805.06591.
[Ibnkahla '00] M. Ibnkahla, "Applications of neural networks to digital communications – a survey," Signal Processing, vol. 80, no. 7, pp. 1185-1215, 2000.
[Bouchired et al '98] S. Bouchired, D. Roviras, and F. Castanie, "Equalisation of satellite mobile channels with neural network techniques," Space Communications, 1998.
[De Veciana and Zakhor '92] G. De Veciana and A. Zakhor, "Neural net-based continuous phase modulation receivers," IEEE Transactions on Communications, 1992.
[Schibisch et al '18] S. Schibisch et al., "Online Label Recovery for Deep Learning-based Communication through Error Correcting Codes," arXiv:1807.00747.
[Ye et al '18] H. Ye et al., "Channel Agnostic End-to-End Learning based Communication Systems with Conditional GAN," arXiv:1807.00447.
[Kim et al '18] H. Kim et al., "Deepcode: Feedback Codes via Deep Learning," arXiv:1807.00801.