
A Brief Introduction to Machine Learning (With Applications to Communications)

Osvaldo Simeone

King’s College London

August 8, 2018

Osvaldo Simeone Brief Intro to ML + Comm 1 / 129

Goals and Learning Outcomes

Goals:
- Provide an introduction to the main areas of machine learning, with a focus on probabilistic methods
- Offer some pointers to specific applications for telecom

Learning outcomes:
- Recognize scenarios in which machine learning can and cannot be useful
- Identify specific classes of machine learning methods that apply to a given problem, with applications to telecom networks

For More...

O. Simeone, “A Brief Introduction to Machine Learning for Engineers,” arXiv:1709.02840.

O. Simeone, “A Very Brief Introduction to Machine Learning with Applications to Communication Systems,” arXiv:1808.02342.

What is Machine Learning?

Traditional engineering approach:
- Acquisition of domain knowledge...
- ... mathematical (physics-based) modelling...
- ... and optimized algorithm design with performance guarantees

What is Machine Learning?

Machine learning approach:
- Selection of a general purpose model and a learning algorithm...
- ... learning based on data (examples) and use of the trained (black-box) “machine”

When to Use Machine Learning?

Advantages:
- lower cost
- faster development
- reduced implementation complexity

Disadvantages:
- suboptimal performance
- lack of interpretability
- limited applicability

When to Use Machine Learning?

(Slightly modified) criteria by [Brynjolfsson and Mitchell '17]:
- the traditional engineering flow is not suitable: model deficit or algorithm deficit
- the task involves a function that maps well-defined inputs to well-defined outputs
- the task provides clear feedback with clearly definable goals and metrics
- large data sets exist or can be created containing input-output pairs
- the task does not involve long chains of logic or reasoning that depend on diverse background knowledge or common sense
- the task does not require detailed explanations for how the decision was made
- the task has a tolerance for error and no need for provably correct or optimal solutions
- the phenomenon or function being learned should not change rapidly over time

Taxonomy of Machine Learning Methods

Supervised learning

Unsupervised learning

Reinforcement learning

[Figure: supervised vs unsupervised learning]

Reinforcement learning: feedback-based sequential decision making

[Figure: reinforcement learning loop with state s_t, action a_t, and reward r_t; credit: D. Silver]

Communication Networks

[Figure: fog network architecture [5GPPP], comprising the wireless edge, access network, edge cloud, core network, and core cloud]

Data collection and processing can take place at the edge and/or at the cloud.

Data in Communication Networks

Data at the edge:
- PHY: Baseband signals, (multi-RAT) channel quality
- MAC/Link: Throughput, FER, random access load and latency
- Network: Location, traffic loads across services, users’ device types, battery levels
- Application: Users’ preferences, content demands, computing loads, QoS metrics

Data at the cloud:
- Network: Mobility patterns, network-wide traffic statistics, outage rates
- Application: Users’ behavior patterns, subscription information, service usage statistics, TCP/IP traffic statistics

Learning in Communication Networks

Which tasks?
- the traditional engineering flow is not suitable: model deficit or algorithm deficit (depends)
- the task involves a function that maps well-defined inputs to well-defined outputs ✓
- the task provides clear feedback with clearly definable goals and metrics ✓
- large data sets exist or can be created containing input-output pairs ✓
- the task does not involve long chains of logic or reasoning that depend on diverse background knowledge or common sense ✓
- the task does not require detailed explanations for how the decision was made ✓
- the task has a tolerance for error and no need for provably correct or optimal solutions (depends)
- the phenomenon or function being learned should not change rapidly over time (depends)

Overview

Supervised Learning

Unsupervised Learning

Reinforcement Learning


Supervised Learning

Supervised learning:
- regression: continuous labels
- classification: discrete labels

Supervised Learning: Regression

[Figure: scatter plot of the training points; horizontal axis from 0 to 1, vertical axis from -1.5 to 1.5]

- Training set $D$: $N$ training points $(x_n, t_n)$, $n = 1, \ldots, N$
- $x_n$ = covariates, domain points, or explanatory variables
- $t_n$ = dependent variables, labels, or responses (continuous)
- Goal: Predict the label $t$ for a new, that is, as of yet unobserved, domain point $x$

Supervised Learning: Classification

[Figure: scatter plot of labelled training points with a new, unlabelled point marked "?"; horizontal axis from 4 to 9, vertical axis from 0.5 to 4.5]

- Training set $D$: $N$ training points $(x_n, t_n)$, $n = 1, \ldots, N$
- $x_n$ = covariates, domain points, or explanatory variables
- $t_n$ = dependent variables, labels, or responses (discrete)
- Goal: Predict the label (class) $t$ for a new, that is, as of yet unobserved, domain point $x$

Supervised Learning

Prediction is an impossible task without assuming a model (inductive bias), by the no free lunch theorem.

Memorizing vs. learning: retrieving the value $t_n$ corresponding to an already observed pair $(x_n, t_n) \in D$ vs. predicting the value $t$ for an unseen $x$.

Defining Supervised Learning

Training set $D$:

$$(x_n, t_n) \overset{\text{i.i.d.}}{\sim} p(x, t), \quad n = 1, \ldots, N$$

Based on the training set $D$, we derive a predictor $\hat{t}(x)$.

Test pair, independent of $D$:

$$(x, t) \sim p(x, t)$$

Quality of the prediction $\hat{t}(x)$ for a pair $(x, t)$:

$$\ell(t, \hat{t}(x))$$

for some loss function $\ell(t, \hat{t})$, e.g., $\ell(t, \hat{t}) = (t - \hat{t})^2$ (quadratic) or $\ell(t, \hat{t}) = 1(t \neq \hat{t})$ (probability of error).

Defining Supervised Learning

Goal: minimize the average loss on the test pair (generalization loss)

$$L_p(\hat{t}) = \mathrm{E}_{(x,t) \sim p_{xt}}[\ell(t, \hat{t}(x))]$$

Alternative viewpoints to the frequentist framework: Bayesian and Minimum Description Length (MDL)
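As an illustrative sketch only, the generalization loss can be approximated by averaging the loss over many independent test pairs. The data distribution used here (x uniform on [0, 1], t = 2x plus Gaussian noise) and the names `empirical_loss`, `quadratic_loss`, and `error_loss` are assumptions made for this example, not part of the slides.

```python
import random


def quadratic_loss(t, t_hat):
    """Quadratic loss (t - t_hat)^2."""
    return (t - t_hat) ** 2


def error_loss(t, t_hat):
    """Probability-of-error (0-1) loss: 1 if the prediction is wrong."""
    return 1.0 if t != t_hat else 0.0


def empirical_loss(predictor, loss, pairs):
    """Average loss of a predictor over a list of (x, t) pairs."""
    return sum(loss(t, predictor(x)) for x, t in pairs) / len(pairs)


# Hypothetical distribution: x uniform on [0, 1], t = 2x + Gaussian noise (std 0.5).
rng = random.Random(0)
test_pairs = []
for _ in range(10000):
    x = rng.random()
    test_pairs.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

good = empirical_loss(lambda x: 2.0 * x, quadratic_loss, test_pairs)
poor = empirical_loss(lambda x: 0.0, quadratic_loss, test_pairs)
print(good, poor)  # the first value is close to the noise variance
```

With enough test pairs, the average approaches the expectation that defines the generalization loss for the assumed distribution.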

When the True Distribution $p(x, t)$ is Known...

... we don’t need data $D$

... and we have a standard inference problem, i.e., estimation (regression) or detection (classification).

The solution can be directly computed from the posterior distribution

$$p(t|x) = \frac{p(x, t)}{p(x)}$$

as

$$t^*(x) = \arg\min_{\hat{t}} \mathrm{E}_{t \sim p_{t|x}}[\ell(t, \hat{t}) \mid x]$$
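For the quadratic loss, the minimization over the prediction can be carried out in closed form. The following is a standard argument, sketched here with the shorthand $\mu(x) = \mathrm{E}_{t \sim p_{t|x}}[t \mid x]$ (a symbol introduced only for this derivation):

```latex
\begin{align}
\mathrm{E}_{t \sim p_{t|x}}\big[(t - \hat{t})^2 \mid x\big]
  &= \mathrm{E}\big[(t - \mu(x))^2 \mid x\big]
     + 2\,(\mu(x) - \hat{t})\,\mathrm{E}\big[t - \mu(x) \mid x\big]
     + (\mu(x) - \hat{t})^2 \\
  &= \mathrm{Var}(t \mid x) + (\mu(x) - \hat{t})^2 .
\end{align}
```

The cross term vanishes because $\mathrm{E}[t - \mu(x) \mid x] = 0$, and the first term does not depend on $\hat{t}$, so the minimum is attained at $\hat{t} = \mu(x)$, the conditional mean.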

When the Model $p(x, t)$ is Known...

With quadratic loss, conditional mean: $t^*(x) = \mathrm{E}_{t \sim p_{t|x}}[t \mid x]$

With probability of error, maximum a posteriori (MAP): $t^*(x) = \arg\max_t p(t|x)$

Example: with joint distribution

x\t    0      1
0      0.05   0.45
1      0.4    0.1

we have

$$p(t = 1 \mid x = 0) = \frac{0.45}{0.05 + 0.45} = 0.9$$

and

$$t^*(x = 0) = 0.9 \times 1 + 0.1 \times 0 = 0.9 \text{ for quadratic loss,}$$

$$t^*(x = 0) = 1 \text{ for probability of error (MAP).}$$
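The example with the joint distribution table can be reproduced numerically. This is a minimal sketch; the function names `posterior`, `conditional_mean`, and `map_predictor` are chosen here for illustration.

```python
# Joint distribution p(x, t) from the example, stored as joint[x][t].
joint = {0: {0: 0.05, 1: 0.45},
         1: {0: 0.40, 1: 0.10}}


def posterior(joint, x):
    """Return p(t | x) as a dict, using p(t|x) = p(x, t) / p(x)."""
    p_x = sum(joint[x].values())  # marginal p(x)
    return {t: p_xt / p_x for t, p_xt in joint[x].items()}


def conditional_mean(joint, x):
    """Optimal predictor under the quadratic loss."""
    return sum(t * p for t, p in posterior(joint, x).items())


def map_predictor(joint, x):
    """Optimal predictor under the probability-of-error loss."""
    post = posterior(joint, x)
    return max(post, key=post.get)


for x in (0, 1):
    print(x, posterior(joint, x), conditional_mean(joint, x), map_predictor(joint, x))
```

For x = 1 the same functions give a conditional mean of 0.2 and the MAP decision t = 0, since p(t = 1 | x = 1) = 0.1 / 0.5.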

When the True Distribution $p(x, t)$ is Not Known...

... we need data $D$

... and we have a learning problem

1. Model selection (inductive bias): Define a parametric model

$$\underbrace{p(x, t|\theta)}_{\text{generative model}} \quad \text{or} \quad \underbrace{p(t|x, \theta)}_{\text{discriminative model}}$$

2. Learning: Given data $D$, optimize a learning criterion to obtain the parameter vector $\theta$

3. Inference: Use the model to obtain the predictor $\hat{t}(x)$ (to be tested on new data)

Logistic Regression

Example: Binary classification ($t \in \{0, 1\}$)

1. Model selection (inductive bias): logistic regression (discriminative model)

$\phi(x) = [\phi_1(x) \cdots \phi_{D'}(x)]^T$ is a vector of features (e.g., bag-of-words model for a text).

Parametric probabilistic model:

$$p(t = 1|x, w) = \sigma(w^T \phi(x))$$

where $\sigma(a) = (1 + \exp(-a))^{-1}$ is the sigmoid function.

2. Learning: To be discussed

3. Inference: With probability of error loss, MAP classification

$$\underbrace{w^T \phi(x)}_{\text{logit or LLR}} \; \underset{t=0}{\overset{t=1}{\gtrless}} \; 0$$
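The logistic regression model and its MAP decision rule can be sketched in a few lines. The feature map phi(x) = [1, x, x^2] and the weight values below are hypothetical choices made only to have something concrete to evaluate.

```python
import math


def sigmoid(a):
    """Sigmoid function sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def features(x):
    """Hypothetical feature map phi(x) = [1, x, x^2] for a scalar input x."""
    return [1.0, x, x * x]


def logit(x, w):
    """Logit (LLR) w^T phi(x)."""
    return sum(wi * fi for wi, fi in zip(w, features(x)))


def prob_t1(x, w):
    """Model p(t = 1 | x, w) = sigma(w^T phi(x))."""
    return sigmoid(logit(x, w))


def map_classify(x, w):
    """MAP decision: t = 1 if the logit is positive, else t = 0."""
    return 1 if logit(x, w) > 0 else 0


w = [-1.0, 0.0, 1.0]  # assumed weights, giving logit = x^2 - 1
print(prob_t1(0.0, w), map_classify(0.0, w))
print(prob_t1(2.0, w), map_classify(2.0, w))
```

Thresholding the logit at 0 is equivalent to thresholding p(t = 1 | x, w) at 1/2, because the sigmoid is increasing and equals 1/2 at 0.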

Multi-Layer Neural Networks

1. Model selection (inductive bias): multi-layer neural network (discriminative model)

Multiple layers of learnable weights enable feature learning.


Learning: Maximum Likelihood

ML selects a value of $\theta$ that is the most likely to have generated the observed training set $D$:

$$\text{maximize } p(D|\theta)$$

$$\iff \text{maximize } \ln p(D|\theta) \quad \text{(log-likelihood, or LL)}$$

$$\iff \text{minimize } -\ln p(D|\theta) \quad \text{(negative log-likelihood, or NLL)}$$

For discriminative models:

$$\text{minimize } -\ln p(t_D|x_D, \theta) = -\sum_{n=1}^{N} \ln p(t_n|x_n, \theta)$$

Learning: Maximum Likelihood

The problem rarely has analytical solutions and is typically addressed by Stochastic Gradient Descent (SGD).

For discriminative models, we have

$$\theta^{\text{new}} \leftarrow \theta^{\text{old}} + \gamma \nabla_\theta \ln p(t_n|x_n, \theta)|_{\theta = \theta^{\text{old}}}$$

where $\gamma$ is the learning rate.

With multi-layer neural networks, this approach yields the backpropagation algorithm.
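A minimal sketch of the SGD update for the logistic regression model, under stated assumptions: the toy data set, the features phi(x) = [1, x], the learning rate, and the number of epochs are all hypothetical. For this model, the gradient of ln p(t | x, w) with respect to w is (t - sigma(w^T phi(x))) phi(x).

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def nll(data, w):
    """Negative log-likelihood: -sum_n ln p(t_n | x_n, w)."""
    total = 0.0
    for phi, t in data:
        p1 = sigmoid(sum(wi * fi for wi, fi in zip(w, phi)))
        total -= math.log(p1 if t == 1 else 1.0 - p1)
    return total


def sgd(data, gamma=0.1, epochs=50, seed=0):
    """Run SGD with one training pair per update, reshuffled every epoch."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for phi, t in rng.sample(data, len(data)):
            p1 = sigmoid(sum(wi * fi for wi, fi in zip(w, phi)))
            # Gradient of ln p(t | x, w) with respect to w is (t - p1) * phi.
            w = [wi + gamma * (t - p1) * fi for wi, fi in zip(w, phi)]
    return w


# Hypothetical toy data: features phi(x) = [1, x]; label 1 is more likely for large x.
data_rng = random.Random(1)
data = []
for _ in range(200):
    x = data_rng.uniform(-3.0, 3.0)
    t = 1 if data_rng.random() < sigmoid(2.0 * x) else 0
    data.append(([1.0, x], t))

w = sgd(data)
print(nll(data, [0.0, 0.0]), nll(data, w))
```

The printed values compare the NLL at the all-zero initialization with the NLL after training; a constant learning rate is used here for simplicity, whereas decaying schedules are a common alternative.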


Model Selection

How to select a model (inductive bias)?

Model selection typically requires choosing the model order, i.e., the capacity of the model.

Ex.: For logistic regression,
- Model order M: Number of features

Model Selection

Example: Regression using a discriminative model $p(t|x)$

$$t = \underbrace{\sum_{m=0}^{M} w_m x^m}_{\hat{t}(x):\ \text{polynomial of order } M} + \mathcal{N}(0, 1)$$

[Figure: scatter plot of the training points; horizontal axis from 0 to 1, vertical axis from -1.5 to 1.5]

Model Selection

With M = 1, using ML learning of the coefficients:

[Figure: training points and the fitted polynomial for M = 1; horizontal axis from 0 to 1, vertical axis from -3 to 3]

Model Selection: Underfitting...

With $M = 1$, the ML predictor $\hat{t}(x)$ underfits the data:
- the model is not rich enough to capture the variations present in the data;
- large training loss

$$L_D(\theta) = \frac{1}{N} \sum_{n=1}^{N} (t_n - \hat{t}(x_n))^2$$

Model Selection

With M = 9, using ML learning of the coefficients:

[Figure: training points and the fitted polynomials for M = 1 and M = 9; horizontal axis from 0 to 1, vertical axis from -3 to 3]

Model Selection: ... vs Overfitting

With $M = 9$, the ML predictor overfits the data:
- the model is too rich and, in order to account for the observations in the training set, it appears to yield inaccurate predictions outside it;
- presumably we have a large generalization loss

$$L_p(\hat{t}) = \mathrm{E}_{(x,t) \sim p_{xt}}[(t - \hat{t}(x))^2]$$
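Underfitting and overfitting can be reproduced with a small experiment. Under the Gaussian noise model, ML learning of the polynomial coefficients reduces to least squares. The ground truth sin(2*pi*x), the noise level, and the sample sizes below are assumptions for illustration and are not taken from the slides.

```python
import numpy as np


def fit_polynomial(x, t, M):
    """ML weights w for t_hat(x) = sum_{m=0}^{M} w_m x^m via least squares."""
    Phi = np.vander(x, M + 1, increasing=True)  # columns x^0, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w


def predict(w, x):
    """Evaluate the polynomial with weights w at the points x."""
    return np.vander(x, len(w), increasing=True) @ w


def avg_squared_loss(w, x, t):
    """Average quadratic loss of the polynomial predictor on (x, t)."""
    return float(np.mean((t - predict(w, x)) ** 2))


rng = np.random.default_rng(0)
# Assumed ground truth sin(2*pi*x) plus Gaussian noise; not taken from the slides.
x_train = rng.uniform(0.0, 1.0, size=10)
t_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(10)
x_test = rng.uniform(0.0, 1.0, size=1000)
t_test = np.sin(2 * np.pi * x_test) + 0.3 * rng.standard_normal(1000)

train_loss, test_loss = {}, {}
for M in (1, 3, 9):
    w = fit_polynomial(x_train, t_train, M)
    train_loss[M] = avg_squared_loss(w, x_train, t_train)
    test_loss[M] = avg_squared_loss(w, x_test, t_test)
    print(M, train_loss[M], test_loss[M])
```

The training loss cannot increase with M, because each lower-order model is a special case of the higher-order ones; the loss on the fresh test points is the quantity that reveals overfitting.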

Model Selection

M = 3 seems to be a reasonable choice...

... but how do we know, given that we have no data outside of the training set?

[Figure: training points and the fitted polynomials for M = 1, M = 3, and M = 9; horizontal axis from 0 to 1, vertical axis from -3 to 3]

Model Selection: Validation

Keep some data (validation set) to estimate the generalization error for different values of M.

(See cross-validation for a more efficient way to use the data.)
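Hold-out validation for model order selection can be sketched as follows. The data model (sin(2*pi*x) plus Gaussian noise), the 20/10 split, and the range of M are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed data model (not from the slides): sin(2*pi*x) plus Gaussian noise.
x = rng.uniform(0.0, 1.0, size=30)
t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(30)

# Hold out the last 10 points as the validation set.
x_tr, t_tr = x[:20], t[:20]
x_val, t_val = x[20:], t[20:]


def rms_loss(w, x, t):
    """Root average squared loss of the polynomial with weights w."""
    pred = np.vander(x, len(w), increasing=True) @ w
    return float(np.sqrt(np.mean((t - pred) ** 2)))


val_loss = {}
for M in range(1, 10):
    Phi = np.vander(x_tr, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, t_tr, rcond=None)  # ML fit on training data only
    val_loss[M] = rms_loss(w, x_val, t_val)
    print(M, rms_loss(w, x_tr, t_tr), val_loss[M])

# Select the model order with the smallest validation loss.
best_M = min(val_loss, key=val_loss.get)
print("selected model order:", best_M)
```

The validation points are never used for fitting, so the validation loss serves as an estimate of the generalization loss for each candidate M.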

Model Selection: Validation

Validation allows model order selection.

[Figure: root average squared loss versus model order M (1 to 9), with curves for the training loss and the generalization loss (via validation); the underfitting and overfitting regimes are marked]

Validation can also be used more generally to select other hyperparameters (e.g., learning rate).

Model Selection: Validation

Model order selection should depend on the amount of data...

It is a problem of bias (asymptotic error) versus generalization gap.

[Figure: root average quadratic loss for M = 1 and M = 7, with curves for the training loss and the generalization loss (via validation); horizontal axis from 0 to 70]
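The dependence of the training and validation losses on the amount of data can be explored with a sketch like the one below. The data model, the training set sizes, and the size of the held-out set are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)


def sample(n):
    """Assumed data model (not from the slides): sin(2*pi*x) plus Gaussian noise."""
    x = rng.uniform(0.0, 1.0, size=n)
    return x, np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)


def rms_loss(w, x, t):
    """Root average quadratic loss of the polynomial with weights w."""
    pred = np.vander(x, len(w), increasing=True) @ w
    return float(np.sqrt(np.mean((t - pred) ** 2)))


x_val, t_val = sample(2000)  # large held-out set to estimate the generalization loss
results = {}
for M in (1, 7):
    for N in (10, 20, 40, 70):
        x_tr, t_tr = sample(N)
        Phi = np.vander(x_tr, M + 1, increasing=True)
        w, *_ = np.linalg.lstsq(Phi, t_tr, rcond=None)
        # Store (training loss, validation loss) for this model order and data size.
        results[(M, N)] = (rms_loss(w, x_tr, t_tr), rms_loss(w, x_val, t_val))
        print(M, N, results[(M, N)])
```

For each M, the difference between the two stored values is an estimate of the generalization gap, while the value that the validation loss settles at for large N reflects the bias of the model class.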

Application to Communication Networks

[Figure: fog network architecture [5GPPP], comprising the wireless edge, access network, edge cloud, core network, and core cloud]

At the Edge: Overview

At the edge:
- PHY: Detection and decoding, precoding and power allocation, modulation recognition, localization, interference cancelation, joint source-channel coding, equalization in the presence of non-linearities
- MAC/Link: Radio resource allocation, scheduling, multi-RAT handover, dynamic spectrum access, admission control
- Network: Proactive caching
- Application: Computing resource allocation, content request prediction

At the Edge: PHYChannel detection and decoding – classification

[Cammerer et al '17]

At the Edge: PHY

Channel detection and decoding – classification

Model deficit

[Farsad and Goldsmith '18]

At the Edge: PHY

Channel equalization in the presence of non-linearities, e.g., for optical links – regression

Algorithm deficit

[Wang et al ‘16]

At the Edge: PHY

Channel equalization in the presence of non-linearities, e.g., for satellite links with non-linear amplifiers – regression

Algorithm deficit

[Bouchired et al ’98]


At the Edge: PHY

Channel decoding for modulation schemes with complex optimal decoders, e.g., continuous phase modulation – classification

Algorithm deficit

[De Veciana and Zakhor '92]


At the Edge: PHY

Channel decoding – classification

Leverage domain knowledge to set up the parametrized model to be learned

[Nachmani et al ‘16]

At the Edge: PHY

Channel equalization to compensate for hardware impairments – regression

Leverage domain knowledge to design the decoder

[Schibisch et al ‘18]

At the Edge: PHY

Modulation recognition – classification

Algorithm deficit

[Agirman-Tosun et al '11]

At the Edge: PHY

Localization – regression

Model deficit

(coordinates)

[Fang and Lin ‘08]

At the Edge: PHY

Precoding and power allocation – regression

Algorithm deficit

[Sun et al ’17]

At the Edge: PHY

Interference cancellation – regression

Model deficit

[Balatsoukas-Stimming ‘17]

At the Edge: MAC/Link

Spectrum sensing – classification

Model deficit

[Tumuluru et al '10]

At the Edge: MAC/Link

mmWave channel quality prediction using depth images – regression

Model deficit

[Okamoto et al '18]

At the Edge: Network and Application

Content prediction for proactive caching – classification

Model deficit

[Chen et al '17]

At the Cloud: Overview

At the cloud:
- Network: Routing (classification vs look-up tables), SDN flow table updating, proactive caching, congestion control
- Application: Cloud/fog computing, Internet traffic classification

At the Cloud: Network

Link prediction for wireless routing – classification/regression

Model deficit

[Wang et al '06]

At the Cloud: Network

Link prediction for optical routing – classification/regression

Model deficit

[Musumeci et al ’18]

At the Cloud: Network

Congestion prediction for smart routing – classification

Model deficit

[Tang et al ‘17]

At the Cloud: Network and Application

Traffic classification – classification

Model deficit

[Nguyen et al '08]


Overview

Supervised Learning

Unsupervised Learning

Reinforcement Learning


Unsupervised Learning

Unsupervised learning tasks operate over unlabelled data sets.

General goal: discover properties of the data, e.g., for compressed representation

“Some of us see unsupervised learning as the key towards machines with common sense.” (Y. LeCun)


“Defining” Unsupervised Learning

Training set D: $x_n \underset{\text{i.i.d.}}{\sim} p(x)$, $n = 1, \ldots, N$

Goal: Learn some useful properties of the distribution p(x)

Alternative viewpoints to frequentist framework: Bayesian and MDL


Unsupervised Learning Tasks

Density estimation: estimate p(x), e.g., for use in plug-in estimators, compression algorithms, or outlier detection

Clustering: partition all points in D into groups of similar objects (e.g., document clustering)

Dimensionality reduction, representation and feature extraction: represent each data point $x_n$ in a space of lower dimensionality, e.g., to highlight independent explanatory factors, and/or to ease visualization, interpretation, or successive tasks

Generation of new samples: learn a machine that produces samples approximately distributed according to p(x), e.g., to produce artificial scenes for games or films


Unsupervised Learning

1. Model selection (inductive bias): Define a parametric model p(x|θ)

2. Learning: Given data D, optimize a learning criterion to obtain the parameter vector θ

3. Clustering, feature extraction, sample generation...


Models

Unsupervised learning models typically involve hidden or latent variables.

$z_n$ = hidden, or latent, variables for each data point $x_n$

Ex.: $z_n$ = cluster index of $x_n$


(a) Directed Generative Models

Model data x as being caused by z:

$$p(x|\theta) = \sum_z p(z|\theta)\, p(x|z,\theta)$$
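As a rough illustration of the marginalization above, the following sketch evaluates p(x|θ) and draws samples by first generating z and then x given z. The two-valued latent variable, the Gaussian conditionals, and all numerical values are hypothetical choices made for this example; they do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model parameters theta (illustrative values only).
p_z = np.array([0.3, 0.7])        # p(z | theta) over two latent values
means = np.array([-2.0, 1.5])     # mean of p(x | z, theta) for each z
stds = np.array([0.5, 1.0])       # standard deviation of p(x | z, theta) for each z

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def marginal_pdf(x):
    """p(x | theta) = sum_z p(z | theta) p(x | z, theta)."""
    x = np.asarray(x, dtype=float)
    return sum(p_z[k] * gaussian_pdf(x, means[k], stds[k]) for k in range(len(p_z)))

def ancestral_sample(n):
    """Draw z first, then x given z, so that x is 'caused' by z."""
    z = rng.choice(len(p_z), size=n, p=p_z)
    x = means[z] + stds[z] * rng.standard_normal(n)
    return z, x

z, x = ancestral_sample(5)
print(z, x, marginal_pdf(x))
```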


(a) Directed Generative Models

Ex.: Document clustering
- x is a document, and z is (interpreted as) topic
- p(z|θ) = distribution of topics
- p(x|z,θ) = distribution of words in document given topic

Basic representatives:
- Mixture of Gaussians
- Likelihood-free models


(d) Autoencoders

Model encoding from data to hidden variables, as well as decoding from hidden variables back to data:

$$p(z|x,\theta) \quad \text{and} \quad p(x|z,\theta)$$


(d) Autoencoders

Ex.: Compression
- x is an image and z is (interpreted as) a compressed (e.g., sparse) representation
- p(z|x,θ) = compression of image to representation
- p(x|z,θ) = decompression of representation into an image

Basic representatives: Principal Component Analysis (PCA), dictionary learning, neural network-based autoencoders
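To make the encoder/decoder view concrete, PCA can be read as an autoencoder in which both mappings are deterministic and linear. The sketch below assumes a synthetic data set and uses hypothetical helper names `encode` and `decode`; it is meant as an illustration, not as the construction used in the slides.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data set: N points in D = 5 dimensions lying close to a K = 2 subspace.
N, D, K = 500, 5, 2
basis = rng.standard_normal((K, D))
X = rng.standard_normal((N, K)) @ basis + 0.05 * rng.standard_normal((N, D))

mean = X.mean(axis=0)
Xc = X - mean

# Principal directions: top-K right singular vectors of the centered data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:K].T                       # D x K matrix with orthonormal columns

def encode(x):
    """Deterministic linear encoder: x -> z (K-dimensional representation)."""
    return (x - mean) @ W

def decode(z):
    """Deterministic linear decoder: z -> reconstruction of x."""
    return z @ W.T + mean

Z = encode(X)
X_hat = decode(Z)
mse = float(np.mean((X - X_hat) ** 2))
print(Z.shape, mse)
```

The reconstruction error stays small here because the synthetic data is nearly two-dimensional; for data without such structure, a larger K or a non-linear autoencoder may be needed.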


Learning: Maximum Likelihood

Focus on directed generative models (a)

To simplify the notation, consider a single data point x (sum over the data set D to generalize).

ML problem:

$$\max_\theta \; \ln p(x|\theta) = \ln\Big(\sum_z p(x,z|\theta)\Big)$$

Key issue: Need to marginalize over latent variables, whose distribution is not known, in order to evaluate the log-likelihood (LL).


ELBO

To tackle this issue, a standard approach is the introduction of a variational distribution q(z) and the use of the Evidence Lower BOund (ELBO).

For any fixed value x and any distribution q(z) on the latent variables z (possibly dependent on x), the ELBO $\mathcal{L}(q,\theta)$ is defined as

$$\mathcal{L}(q,\theta) = \mathrm{E}_{z\sim q(z)}\big[\underbrace{\ln p(x,z|\theta) - \ln q(z)}_{\text{learning signal}}\big]$$

ELBO

The ELBO is a global lower bound on the LL function

$$\ln p(x|\theta) \geq \mathcal{L}(q,\theta),$$

where equality holds at a value $\theta_0$ if and only if the distribution q(z) satisfies

$$q(z) = p(z|x,\theta_0).$$
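One way to see why the bound holds, and when it is tight, is the following standard decomposition. It is written in the notation of the slides, but the derivation itself is added here for illustration and does not appear in the original deck.

```latex
\begin{align}
\ln p(x|\theta)
  &= \mathrm{E}_{z\sim q(z)}\left[\ln \frac{p(x,z|\theta)}{p(z|x,\theta)}\right] \\
  &= \mathrm{E}_{z\sim q(z)}\left[\ln p(x,z|\theta) - \ln q(z)\right]
   + \mathrm{E}_{z\sim q(z)}\left[\ln \frac{q(z)}{p(z|x,\theta)}\right] \\
  &= \mathcal{L}(q,\theta) + \mathrm{KL}\big(q(z)\,\|\,p(z|x,\theta)\big).
\end{align}
```

Since the KL divergence is non-negative and equals zero only when its two arguments coincide, the ELBO never exceeds the LL, and the two match exactly when q(z) is the posterior of z given x.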

[Figure: log-likelihood (LL) plotted over the range −4 to 4, together with the ELBO for θ0 = 3 and for θ0 = 2.]
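The lower-bound property can also be checked numerically. The toy model below (a binary latent variable, unit-variance Gaussian conditionals, and a single observation `x_obs`) is an assumption made for this sketch and is not necessarily the model behind the figure.

```python
import numpy as np

x_obs = 1.0                     # a single hypothetical observation
prior = np.array([0.5, 0.5])    # p(z) for z in {0, 1}

def log_joint(theta):
    """ln p(x_obs, z | theta) for z = 0, 1, with x|z=0 ~ N(0, 1) and x|z=1 ~ N(theta, 1)."""
    comp_means = np.array([0.0, theta])
    log_lik = -0.5 * (x_obs - comp_means) ** 2 - 0.5 * np.log(2.0 * np.pi)
    return np.log(prior) + log_lik

def log_likelihood(theta):
    """ln p(x_obs | theta) = ln sum_z p(x_obs, z | theta)."""
    return float(np.logaddexp.reduce(log_joint(theta)))

def posterior(theta):
    """p(z | x_obs, theta)."""
    return np.exp(log_joint(theta) - log_likelihood(theta))

def elbo(q, theta):
    """L(q, theta) = E_{z~q}[ln p(x_obs, z | theta) - ln q(z)]."""
    return float(np.sum(q * (log_joint(theta) - np.log(q))))

theta0 = 2.0
q = posterior(theta0)           # variational distribution fixed to the posterior at theta0
thetas = np.linspace(-4.0, 4.0, 9)
for theta in thetas:
    print(f"theta={theta:+.1f}  LL={log_likelihood(theta):.4f}  ELBO={elbo(q, theta):.4f}")
```

In this sketch the ELBO touches the LL at theta0 and lies below it elsewhere, which is the behavior the bound predicts.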


Expectation-Maximization (EM) Algorithm

[Figure: LL curve with the iterates θold and θnew marked.]


Expectation-Maximization (EM) Algorithm

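Initialize parameter vector $\theta^{\text{old}}$.

For each iteration:
- E step: For fixed parameter vector $\theta^{\text{old}}$,

  $$\max_q \mathcal{L}(q,\theta^{\text{old}}) \;\rightarrow\; q^{\text{new}}(z) = p(z|x,\theta^{\text{old}}).$$

  Bayesian inference of the latent variables

- M step: For fixed variational distribution $q^{\text{new}}(z)$,

  $$\max_\theta \mathcal{L}(q^{\text{new}},\theta) \;\rightarrow\; \max_\theta \mathrm{E}_{z\sim q^{\text{new}}(z)}[\ln p(x,z|\theta)]$$

  Solve a supervised learning problem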
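The simplification in the M step follows from splitting the ELBO into two terms. The short derivation below is added for illustration, using the standard definition of the entropy H(q) = −E[ln q(z)]; it is not part of the original slides.

```latex
\begin{align}
\mathcal{L}(q^{\text{new}},\theta)
  &= \mathrm{E}_{z\sim q^{\text{new}}(z)}\left[\ln p(x,z|\theta)\right]
   - \mathrm{E}_{z\sim q^{\text{new}}(z)}\left[\ln q^{\text{new}}(z)\right] \\
  &= \mathrm{E}_{z\sim q^{\text{new}}(z)}\left[\ln p(x,z|\theta)\right]
   + H(q^{\text{new}}).
\end{align}
```

The entropy term does not depend on θ, so maximizing the ELBO over θ is equivalent to maximizing the expected complete-data log-likelihood, which has the form of a supervised learning problem with the latent variables weighted by the posterior computed in the E step.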


Expectation-Maximization (EM) Algorithm

EM guarantees non-decreasing LL values across iterations, which ensures convergence to a local optimum of the original ML problem.

[Figure: LL curve with the iterates θold and θnew marked.]


Example: Mixture of Gaussians

Directed generative model:

$$z \sim \mathrm{Bern}(\pi)$$

$$x \,|\, z = k \sim \mathcal{N}(\mu_k, \Sigma_k)$$
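A minimal sketch of EM for this model is given below, restricted to scalar data so that each covariance reduces to a variance. The synthetic data set, the initialization, and the function name `em_gmm` are assumptions made for the example; here π denotes the probability of z = 1.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 1-D data set drawn from two Gaussian clusters.
x = np.concatenate([rng.normal(-2.0, 0.7, 150), rng.normal(2.5, 1.0, 250)])

def log_gauss(x, mu, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

def em_gmm(x, n_iter=50):
    # Arbitrary initialization of theta = (pi, mu_0, mu_1, var_0, var_1).
    pi = 0.5
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    ll_trace = []
    for _ in range(n_iter):
        # E step: q(z_n = k) = p(z_n = k | x_n, theta_old), computed in the log domain.
        log_w = np.stack([np.log(1.0 - pi) + log_gauss(x, mu[0], var[0]),
                          np.log(pi) + log_gauss(x, mu[1], var[1])], axis=1)
        log_norm = np.logaddexp(log_w[:, 0], log_w[:, 1])
        ll_trace.append(float(log_norm.sum()))      # LL of the data set at theta_old
        q = np.exp(log_w - log_norm[:, None])
        # M step: weighted maximum-likelihood updates of pi, means and variances.
        nk = q.sum(axis=0)
        pi = nk[1] / len(x)
        mu = (q * x[:, None]).sum(axis=0) / nk
        var = (q * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var, ll_trace

pi, mu, var, ll_trace = em_gmm(x)
print(pi, mu, var, ll_trace[-1])
```

The recorded `ll_trace` allows a quick check that the LL does not decrease from one iteration to the next.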


Scaling EM

EM algorithm may be impractical for large-scale problems: need to compute the posterior in the E step and to average over z in the M step.

Solutions:
- E step: Parametrize the variational distribution q(z|ϕ) or q(z|x,ϕ) and maximize the ELBO over ϕ (variational autoencoder)
- M step: Approximate $\mathrm{E}_{z\sim q^{\text{new}}(z)}[\ln p(x,z|\theta)]$ via Monte Carlo
- Use gradient descent for E and/or M steps
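The Monte Carlo idea for the M step can be sketched as follows: instead of summing over every value of z, the expectation is replaced by an average over samples drawn from q(z). The four-valued latent variable and the tabulated values of ln p(x, z|θ) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical discrete latent variable with four values.
q = np.array([0.1, 0.2, 0.3, 0.4])                # variational distribution q(z)
log_joint = np.array([-3.0, -1.5, -2.2, -0.7])    # ln p(x, z | theta) for each z (made-up values)

# Exact expectation, feasible here only because z takes few values.
exact = float(np.sum(q * log_joint))

def monte_carlo_estimate(num_samples):
    """Average ln p(x, z | theta) over samples z ~ q(z)."""
    z = rng.choice(len(q), size=num_samples, p=q)
    return float(np.mean(log_joint[z]))

for S in (10, 100, 10_000):
    print(S, monte_carlo_estimate(S), exact)
```

The estimate is unbiased, and its accuracy improves with the number of samples, so the sample budget trades computation against the noise in the M-step objective.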


Learning: Beyond Maximum Likelihood

ML tends to provide inclusive and “blurry” estimates of the data distribution.

[Figure: probability density plot over the range −5 to 5, with values between 0 and 0.35.]

This can be a problem for tasks such as data generation.


Learning: Beyond Maximum Likelihood

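ML can be proven to minimize the KL divergence

$$\mathrm{KL}\big(p_D(x)\,\|\,p(x|\theta)\big) = \mathrm{E}_{x\sim p_D(x)}\left[\ln \frac{p_D(x)}{p(x|\theta)}\right]$$

between the empirical distribution

$$p_D(x) = \frac{N[x]}{N} \quad (\text{with counts } N[x] = |\{n : x_n = x\}|)$$

and the model.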
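The connection between ML and the KL divergence can be made explicit in a few lines. The derivation below is a standard argument added for illustration, with H(p_D) denoting the entropy of the empirical distribution.

```latex
\begin{align}
\mathrm{KL}\big(p_D(x)\,\|\,p(x|\theta)\big)
  &= \sum_x p_D(x) \ln p_D(x) - \sum_x \frac{N[x]}{N} \ln p(x|\theta) \\
  &= -H(p_D) - \frac{1}{N}\sum_{n=1}^{N} \ln p(x_n|\theta).
\end{align}
```

The entropy term does not depend on θ, so minimizing the KL divergence over θ is the same as maximizing the average log-likelihood of the training set.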

Learning: Beyond Maximum Likelihood

The KL divergence is part of the larger class of f-divergences between two distributions p(x) and q(x):

$$D_f(p\,\|\,q) = \max_{T(x)} \; \mathrm{E}_{x\sim p}[T(x)] - \mathrm{E}_{x\sim q}[g(T(x))],$$

for some convex increasing function g(·).

[Figure: a discriminator T(x) receives samples x ∼ p(x) and x ∼ q(x); it decides for p(x) if T(x) is large and for q(x) if T(x) is small.]
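The variational form can be checked on a small discrete example. The sketch below assumes the choice g(t) = exp(t − 1), which is the one associated with the KL divergence, and uses two made-up distributions on a four-letter alphabet; the discriminator is simply a table of values T[x].

```python
import numpy as np

rng = np.random.default_rng(5)

# Two hypothetical distributions on a 4-letter alphabet.
p = np.array([0.1, 0.4, 0.3, 0.2])
q = np.array([0.25, 0.25, 0.25, 0.25])

def g(t):
    """Choice of g associated with the KL divergence (an assumption of this sketch)."""
    return np.exp(t - 1.0)

def objective(T):
    """E_{x~p}[T(x)] - E_{x~q}[g(T(x))] for a discriminator given as a table T[x]."""
    return float(np.sum(p * T) - np.sum(q * g(T)))

kl = float(np.sum(p * np.log(p / q)))
T_opt = 1.0 + np.log(p / q)        # maximizer of the objective for this g
print(kl, objective(T_opt))

# Randomly drawn discriminators should never exceed the KL divergence.
T_random = rng.standard_normal((1000, 4))
best_random = max(objective(T) for T in T_random)
print(best_random)
```

The optimal discriminator is large where p(x) exceeds q(x) and small otherwise, in line with the interpretation of T(x) as a detector that separates samples of p from samples of q.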


Learning: Generative Adversarial Networks (GANs)

Generalizing the ML problem, GANs attempt to solve the problem

$$\min_\theta \max_\varphi \; \mathrm{E}_{x\sim p_D}[T_\varphi(x)] - \mathrm{E}_{x\sim p(x|\theta)}[g(T_\varphi(x))]$$

for some differentiable function $T_\varphi(x)$ of the parameter vector ϕ.

Choice of the divergence (via the discriminator) is tailored to data.

Can be applied to likelihood-free models.


Learning: Generative Adversarial Networks (GANs)

[NVIDIA]


Applications to Communication Networks

Fog network architecture [5GPPP]

[Figure: fog network architecture, with labels for the wireless edge, edge cloud, access network, core network, and core cloud, plus overall "Edge" and "Cloud" labels.]


At the Edge: Overview

At the edge:
- PHY: E2E encoding/decoding, CSI compression and feedback, fingerprinting for localization, blind source separation, blind channel equalization
- MAC/Link: Clustering for resource allocation, clustering for self-organizing multi-hop networks

At the Edge: PHY

End-to-end encoding/decoding for wireless channels – autoencoders

[O’Shea and Hoydis ’17]

At the Edge: PHY

End-to-end encoding/decoding for optical channels – autoencoders

Algorithm deficit

[Karanov et al ‘18]

At the Edge: PHY

End-to-end encoding/decoding for Gaussian channels with feedback – autoencoders based on Recurrent Neural Networks (RNNs)

Algorithm deficit

[Kim et al ‘18]

At the Edge: PHY

Channel State Information (CSI) compression and feedback – autoencoders

Model deficit

[Wen et al ‘17]

At the Edge: PHY

Fingerprinting for localization – autoencoders

Model deficit

[Xiao et al '17]

At the Edge: PHY

Mimicking a propagation channel – GAN (see also [Ye et al ’18])

Model deficit

[O’Shea et al ‘18]


At the Edge: PHY

Mimicking and identifying a propagation channel (e.g., satellite) – generative models

Leveraging domain knowledge improves the learned model.

[Ibnkahla ‘00]


At the Edge: MAC/Link

Generating artificial examples to augment training set for spectrum sensing – GAN

[Nakashima et al '18]

Osvaldo Simeone Brief Intro to ML + Comm 113 / 129

Page 135: A Brief Introduction to Machine Learning (With ...I O er some pointers to speci c applications for telecom Learning outcomes: I Recognize scenarios in which machine learning can and

At the Edge: MAC/Link

Resource allocation – clustering

Algorithm deficit

[Abdelnasser et al '14]
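
As a rough illustration of the clustering idea on this slide, the sketch below groups hypothetical femtocell positions with plain k-means (Lloyd's algorithm), so that resources could then be allocated per cluster. It is not the scheme of the cited paper: the positions, the choice of k = 2, and the function name `kmeans` are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


# Hypothetical femtocell coordinates (arbitrary units): two separated groups.
positions = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0),
             (10.0, 10.0), (11.0, 10.5), (10.5, 11.0)]
centroids, labels = kmeans(positions, k=2)
print(labels)
```

Femtocells that share a label would form one group for joint resource allocation. In practice the distance could be replaced by an interference-based metric, which is a design choice outside this sketch.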


At the Cloud: Overview

At the cloud:

- Network: Clustering for group-based access control, anomaly detection

- Application: Community detection in social media, Internet traffic clustering


At the Cloud: Network

Self-organizing multi-hop networks – clustering

Algorithm deficit

[Abbassi and Younis '07]


At the Cloud: Network

Anomaly detection – density estimation

Model deficit

[Musumeci et al ’18]
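
A minimal sketch of anomaly detection by density estimation, under stated assumptions: a one-dimensional Gaussian density is fitted by maximum likelihood to measurements taken under normal operation, and a new measurement is flagged when the learned density assigns it a low log-likelihood. The received-power readings, the three-standard-deviation threshold rule, and the function names are hypothetical and are not taken from the cited survey.

```python
import math
import statistics


def fit_gaussian(samples):
    """Maximum-likelihood fit of a 1-D Gaussian density to the samples."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples, mu)  # population std = ML estimate
    return mu, sigma


def log_density(x, mu, sigma):
    """Log of the Gaussian pdf N(x | mu, sigma^2)."""
    return (-0.5 * math.log(2 * math.pi * sigma**2)
            - (x - mu) ** 2 / (2 * sigma**2))


def is_anomaly(x, mu, sigma, threshold):
    """Flag x when its log-density under the learned model is below threshold."""
    return log_density(x, mu, sigma) < threshold


# Hypothetical received-power readings (dBm) collected under normal operation.
normal_power = [-20.1, -19.8, -20.3, -20.0, -19.9, -20.2, -19.7, -20.0]
mu, sigma = fit_gaussian(normal_power)

# Assumed rule: flag anything less likely than a point 3 sigma from the mean.
threshold = log_density(mu + 3 * sigma, mu, sigma)

print(is_anomaly(-25.0, mu, sigma, threshold))  # far from the training data
print(is_anomaly(-20.0, mu, sigma, threshold))  # typical reading
```

A single Gaussian is the simplest possible density model; richer models (mixtures, autoencoder-based scores) follow the same pattern of learning a density and thresholding it.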


At the Cloud: Application

Community detection in social networks – clustering

Model deficit

[Abbe et al ’16]


Concluding Remarks

Machine learning tools can leverage the availability of data and computing resources in modern communication systems.

Supervised, unsupervised and reinforcement learning paradigms lend themselves to different key communication (sub)tasks.

Not a universal solution – case-by-case analysis of advantages and disadvantages.


Concluding Remarks

Engineering the integration of traditional model-based techniques and data-driven machine learning methods

[Reich ’96]


Acknowledgements

This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 725731).


References

[O’Shea and Hoydis ’17] T. J. O’Shea, J. Hoydis, An Introduction to Machine Learning Communications Systems, 2017.

[Cammerer et al ’17] Cammerer S, Gruber T, Hoydis J, ten Brink S. Scaling Deep Learning-based Decoding of Polar Codes via Partitioning. arXiv preprint arXiv:1702.06901. 2017 Feb 22.

[Balatsoukas-Stimming ’17] Balatsoukas-Stimming A. Non-Linear Digital Self-Interference Cancellation for In-Band Full-Duplex Radios Using Neural Networks. arXiv preprint arXiv:1711.00379. 2017 Nov 1.

[Sun et al ’17] Sun H, Chen X, Shi Q, Hong M, Fu X, Sidiropoulos ND. Learning to Optimize: Training Deep Neural Networks for Wireless Resource Management. arXiv preprint arXiv:1705.09412. 2017 May 26.

[de Kerret et al ’17] Paul de Kerret, David Gesbert, Maurizio Filippone, Decentralized Deep Scheduling for Interference Channels, arXiv, 2017.

[Wen et al ’17] Wen, Chao-Kai; Shih, Wan-Ting; Jin, Shi, Deep Learning for Massive MIMO CSI Feedback, arXiv.


[Agirman-Tosun et al ’11] Agirman-Tosun H, et al. “Modulation classification of MIMO-OFDM signals by independent component analysis and support vector machines.” in Proc. Asilomar 2011.

[Fang and Lin ’08] Fang SH, Lin TN. Indoor location system based on discriminant-adaptive neural network in IEEE 802.11 environments. IEEE Transactions on Neural Networks. 2008 Nov;19(11):1973-8.

[Tumuluru et al ’10] Tumuluru VK, Wang P, Niyato D. A neural network based spectrum prediction scheme for cognitive radio. in Proc. ICCC 2010.

[Chen et al ’17] Chen M, et al. Echo state networks for proactive caching in cloud-based radio access networks with mobile users. IEEE Transactions on Wireless Communications. 2017.

[Xiao et al ’17] Xiao C, Yang D, Chen Z, Tan G. 3-D BLE Indoor Localization Based on Denoising Autoencoder. IEEE Access. 2017;5:12751-60.

[Abdelnasser et al ’14] Abdelnasser A, et al. Clustering and resource allocation for dense femtocells in a two-tier cellular OFDMA network. IEEE Transactions on Wireless Communications, 2014.


[Nguyen et al ’08] Nguyen TT, Armitage G. A survey of techniques for internet traffic classification using machine learning. IEEE Communications Surveys & Tutorials. 2008 Oct 1;10(4):56-76.

[Wang et al ’06] Wang, Yong, Margaret Martonosi, and Li-Shiuan Peh. "A supervised learning approach for routing optimizations in wireless sensor networks." Proc. ACM Workshop on Multi-hop ad hoc Networks, 2006.

[Abbassi and Younis ’07] Abbasi AA, Younis M. A survey on clustering algorithms for wireless sensor networks. Computer Communications, 2007.

[Abbe et al ’16] Abbe E, Bandeira AS, Hall G. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 2016.

[Wang et al ’18] Wang, Zhi, et al. Handover Control in Wireless Systems via Asynchronous Multi-User Deep Reinforcement Learning, arXiv.

[Wang et al ’16] Wang D, et al. Nonlinearity Mitigation Using a Machine Learning Detector Based on k-Nearest Neighbors. IEEE Photonics Technology Letters. 2016.


[Venkatraman et al ’10] Venkatraman P, et al. Opportunistic bandwidth sharing through reinforcement learning. IEEE Trans. Veh. Techn., 2010.

[Iannello et al ’12] Iannello F, Simeone O, Spagnolini U. Optimality of myopic scheduling and Whittle indexability for energy harvesting sensors. in Proc. CISS 2012.

[Xu et al ’17] Xu Z, et al. A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs, in Proc. IEEE ICC 2017.

[Mnih et al ’15] Mnih V, et al. Human-level control through deep reinforcement learning. Nature, 2015.

[Bogale et al ’18] T. Bogale, et al., Machine Intelligence Techniques for Next-Generation Context-Aware Wireless Networks, arXiv.

[Tang et al ’17] F. Tang et al., "On Removing Routing Protocol from Future Wireless Networks: A Real-time Deep Learning Approach for Intelligent Traffic Control," IEEE Wireless Communications, 2018.

[Sallent et al ’15] O. Sallent, et al., "Learning-based coexistence for LTE operation in unlicensed bands," in Proc. IEEE ICC 2015.


[Kato et al ’17] N. Kato et al., "The Deep Learning Vision for Heterogeneous Network Traffic Control: Proposal, Challenges, and Future Perspective," IEEE Wireless Communications, June 2017.

[Siracusano and Bifulco ’18] Siracusano, Giuseppe; Bifulco, Roberto, In-network Neural Networks, arXiv:1801.05731.

[He et al ’17] He Y, et al. Deep Reinforcement Learning (DRL)-based Resource Management in Software-Defined and Virtualized Vehicular Ad Hoc Networks, in Proc. ACM SDAIVN 2017.

[Farsad and Goldsmith ’18] Farsad, Nariman; Goldsmith, Andrea, Neural Network Detection of Data Sequences in Communication Systems, arXiv:1802.02046.

[Emigh et al ’15] Emigh, Matthew, et al. "A model based approach to exploration of continuous-state MDPs using Divergence-to-Go." in Proc. IEEE Machine Learning for Signal Processing (MLSP), 2015.

[Caciularu and Burshtein ’18] Avi Caciularu, David Burshtein, Blind Channel Equalization using Variational Autoencoders, in Proc. IEEE ICC workshop, 2018.


[Musumeci et al ’18] Francesco Musumeci, Cristina Rottondi, Avishek Nag, Irene Macaluso, Darko Zibar, Marco Ruffini, Massimo Tornatore, A Survey on Application of Machine Learning Techniques in Optical Networks, arXiv:1803.07976.

[Okamoto et al ’18] Hironao Okamoto, et al., Machine-Learning-Based Future Received Signal Strength Prediction Using Depth Images for mmWave Communications, arXiv:1804.00709.

[Davaslioglu and Sagduyu ’18] Kemal Davaslioglu, Yalin E. Sagduyu, Generative Adversarial Learning for Spectrum Sensing, in Proc. IEEE ICC 2018.

[Aoudia and Hoydis ’18] Faycal Ait Aoudia, Jakob Hoydis, End-to-End Learning of Communications Systems Without a Channel Model, arXiv:1804.02276.

[Karanov et al ’18] Boris Karanov, et al, End-to-end Deep Learning of Optical Fiber Communications, arXiv:1804.04097.

[Nachmani et al ’16] Nachmani, et al. "Learning to decode linear codes using deep learning." Allerton 2016.


[O’Shea et al ’18] Timothy J. O’Shea, Tamoghna Roy, Nathan West, “Approximating the Void: Learning Stochastic Channel Models from Observation with Variational Generative Adversarial Networks”, arXiv:1805.06350.

[Zhao et al ’18] Zhifeng Zhao et al, “Deep Reinforcement Learning for Network Slicing”, arXiv:1805.06591.

[Ibnkahla ’00] Ibnkahla M. Applications of neural networks to digital communications – a survey. Signal Processing. 2000 Jul 1;80(7):1185-215.

[Bouchired et al ’98] Bouchired S, Roviras D, Castanie F. Equalisation of satellite mobile channels with neural network techniques. Space Communications. 1998.

[De Veciana and Zakhor ’92] De Veciana G, Zakhor A. Neural net-based continuous phase modulation receivers. IEEE Transactions on Communications. 1992.


[Schibisch et al ’18] Stefan Schibisch, et al, “Online Label Recovery for Deep Learning-based Communication through Error Correcting Codes,” arXiv:1807.00747.

[Ye et al ’18] Hao Ye, et al, “Channel Agnostic End-to-End Learning based Communication Systems with Conditional GAN,” arXiv:1807.00447.

[Kim et al ’18] Hyeji Kim, et al, “Deepcode: Feedback Codes via Deep Learning,” arXiv:1807.00801.
