Bayesian Deep Learning (MLSS 2019) · 27 August 2019


Bayesian Deep Learning (MLSS 2019)

Yarin Gal

University of Oxford · yarin@cs.ox.ac.uk

Unless specified otherwise, photos are either original work or taken from Wikimedia, under Creative Commons license

Contents

Today:
- Introduction
- The Language of Uncertainty
- Bayesian Probabilistic Modelling
- Bayesian Probabilistic Modelling of Functions

Bayesian Deep Learning: Introduction

With great power...
- Many engineering advances in ML
- Systems applied to toy data → deployed in real-life settings
- Control handed over to automated systems, with many scenarios which can become life-threatening to humans
- Medical: automated decision making or recommendation systems
- Automotive: autonomous control of drones and self-driving cars
- High-frequency trading: ability to affect economic markets on a global scale
- But all of these can be quite dangerous...



Example: Medical Diagnostics
- dAIbetes: an exciting new startup (not really)
- claims to automatically diagnose diabetic retinopathy
- accuracy 99% on their 4 train/test patients
- engineer trained two deep learning systems to predict probability y given input fundus image x

The engineer runs their system on your fundus image x* (RHS):
- Which model, f1 or f2, would you want the engineer to use for your diagnosis? None of these! ('I don't know')

Example: Autonomous Driving

Autonomous systems
- Range from simple robotic vacuums to self-driving cars
- Largely divided into systems which
  - control behaviour with rule-based systems
  - learn and adapt to the environment

Both can make use of ML tools
- ML for low-level feature extraction (perception)
- reinforcement learning


Example: Autonomous Driving (cont.)

Real-world example: assisted driving
- first fatality of assisted driving (June 2016)
- low-level system failed to distinguish the white side of a trailer from the bright sky

If the system had identified its own uncertainty:
- alert the user to take control over steering
- propagate uncertainty to decision making

Point estimates

In medical / robotics / science...
✗ can't use ML models giving a single point estimate (single value) in prediction
✓ must use ML models giving an answer that says '10 but I'm uncertain', or '10 ± 5'
- Give me a distribution over possible outcomes!


Example: Autonomous Driving (cont.)

ML pipeline in self-driving cars
- process raw sensory input with perception models
  - e.g. image segmentation to find where other cars and pedestrians are
- output fed into a prediction model
  - e.g. where the other car will go
- output fed into 'higher-level' decision making procedures
  - e.g. rule-based system ("cyclist to your left → do not steer left")
- industry is starting to use uncertainty for lots of components in the pipeline
  - e.g. pedestrian prediction models predict a distribution over pedestrian locations in X timesteps
  - or uncertainty in perception components



Sources of uncertainty
- Above are some examples of uncertainty
- Many other sources of uncertainty
- Test data is very dissimilar to training data
  - model trained on diabetes fundus photos of subpopulation A; never saw subpopulation B
  - "images are outside the data distribution the model was trained on"
- desired behaviour
  - return a prediction (attempting to extrapolate)
  - + information that the image lies outside the data distribution
  - (model retrained with subpop. B labels → low uncertainty on these)


Sources of uncertainty (cont.)
- Uncertainty in the model parameters that best explain the data
  - a large number of possible models can explain a dataset
  - uncertain which model parameters to choose to predict with
  - affects how we predict with new test points

Sources of uncertainty (cont.)
- Training labels are noisy
  - measurement imprecision
  - expert mistakes
  - crowd-sourced labels
- even infinite data → ambiguity inherent in the data itself

Deep learning models are deterministic

Deep learning does not capture uncertainty:
- regression models output a single scalar/vector
- classification models output a probability vector (erroneously interpreted as model uncertainty)

[Figure: regression fit extrapolated far beyond the data; y-axis "Percentage of Firsts", x-axis years 2010-2045]


Deep learning models are deterministic (cont.)

But when combined with probability theory, deep learning can capture uncertainty in a principled way
→ known as Bayesian Deep Learning

Teaser: Uncertainty in Autonomous Driving

Teaser

Define model and train on data x_train, y_train:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Dropout

inputs = Input(shape=(1,))
x = Dense(512, activation="relu")(inputs)
x = Dropout(0.5)(x, training=True)    # dropout stays active at test time
x = Dense(512, activation="relu")(x)
x = Dropout(0.5)(x, training=True)
outputs = Dense(1)(x)

model = tf.keras.Model(inputs, outputs)
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(x_train, y_train)

Teaser

import numpy as np
import matplotlib.pyplot as plt

# do stochastic forward passes on x_test:
samples = [model.predict(x_test) for _ in range(100)]
m = np.mean(samples, axis=0)  # predictive mean
v = np.var(samples, axis=0)   # predictive variance

# plot mean and uncertainty
plt.plot(x_test, m)
plt.fill_between(x_test, m - 2*v**0.5, m + 2*v**0.5,
                 alpha=0.1)   # plot two std (95% confidence)

Playground (working code): bdl101.ml/play

Bayesian deep learning

All resources (including these slides): bdl101.ml


Bayesian deep learning

Today and tomorrow we'll understand why this code makes sense, and get a taste of
- the formal language of uncertainty (Bayesian probability theory)
- tools to use this language in ML (Bayesian probabilistic modelling)
- techniques to scale to real-world deep learning systems (modern variational inference)
- developing big deep learning systems which convey uncertainty
- with real-world examples

Basic concepts marked green (if you want to use them as tools); advanced topics marked amber (if you want to develop new stuff in BDL)

Bayesian Probability Theory:
the Language of Uncertainty

Deriving the laws of probability theory from rational degrees of belief


Betting game 1 (some philosophy for the soul)

import numpy as np

def toss():
    if np.random.rand() < 0.5:
        print('Heads')
    else:
        print('Tails')

- unit wager: a 'promise note' where the seller commits to pay the note owner £1 if the outcome of toss() is 'Heads'; a tradeable note; e.g. ...
- would you pay p = £0.01 for a unit wager on 'heads'?
  - pay a penny to buy a note where I commit to paying £1 if 'heads'
- p = £0.99?
  - pay 99 pence for a note where I commit to paying £1 if 'heads'
- up to £0.05?, £0.95 or above?, ...


Betting game 1b (some philosophy for the soul)

Same toss() game as above.
- Unit wager: note seller commits to paying £1 if outcome = 'heads'
- would you sell a unit wager at £p for 'heads'?
  - you get £p for the note, and have to pay £1 if heads
- up to £0.05?, £0.95 or above?, ...

Betting game 1c (some philosophy for the soul)

Same toss() game as above.
- Unit wager: note seller commits to paying £1 if outcome = 'heads'
- what if you have to set a price £p, and commit to either sell a unit wager at £p for 'heads', or buy one?
  - I decide whether to sell to you, or buy from you
  - I sell: you pay £p to buy a note where I commit to paying £1 if heads
  - I buy: you get £p for the note, and have to pay me £1 if heads
- up to £0.05?, £0.95 or above?, ...


Beliefs as willingness to wager
- A person with degree of belief p in event A is assumed to be willing to pay ≤ £p for a unit wager on A
- and is willing to sell such a wager for any price ≥ £p
- This p captures our degree of belief about the event A taking place (aka uncertainty, confidence)



Rational beliefs
- Two notes (unit wagers):
  - Note 1: 'outcome = heads'
  - Note 2: 'outcome = tails'
- you decide price p for note 1 and q for note 2; I decide whether to buy from you or sell you each note at the price you determined
- if p + q < 1 then I will buy note 1 from you for £p and also note 2 for £q
  - whatever the outcome, you give me £1; but because I gave you p + q < 1, you lost £(1 − p − q)
- Dutch book: a set of unit wager notes where you decide the odds (wager prices) and I decide whether to buy or sell each note ... and you are guaranteed to always lose money.

A set of beliefs is called rational if no Dutch book exists.
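The guaranteed-loss argument above can be written out in a few lines of Python (a sketch; the prices p and q are illustrative):

```python
# Dutch book sketch: you sell me a 'heads' note at price p and a 'tails'
# note at price q. Exactly one of the two notes pays out £1, so my profit
# as the buyer is 1 - (p + q) -- independent of the outcome, and strictly
# positive whenever p + q < 1.
def buyer_profit(p, q):
    cost = p + q      # what I pay you for the two notes
    payout = 1.0      # exactly one of 'heads'/'tails' happens, paying £1
    return payout - cost

p, q = 0.25, 0.5      # irrational prices: p + q = 0.75 < 1
print(buyer_profit(p, q))   # → 0.25, whatever the coin shows
```

With p + q = 1 the buyer's guaranteed profit vanishes, which is exactly the constraint rational beliefs must satisfy.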


Formalism (rational beliefs = probability theory)

Setup
- Define a sample space X of simple events (possible outcomes)
  - e.g. the experiment of flipping two coins: X = {HH, HT, TH, TT}
- Let A be an event (a subset of X). A holding true = at least one of the outcomes in A happened
  - e.g. "at least one heads" ↔ A = {HH, HT, TH}
- Write pA for your belief in event A (your wager on A happening, assuming all wagers are unit wagers)

Can show that {pA} for A ⊆ X are rational beliefs iff {pA} satisfy the laws of probability theory
- Already showed that pA + pA^c = 1
- Try to devise other betting games at home (bdl101.ml/betting)

Can derive the laws of probability theory from rational beliefs!
- → if you want to be rational, you must follow the laws of probability (otherwise someone can take advantage of your model)
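The two-coin setup can be played with directly; a small sketch, where the uniform assignment (fair, independent coins) is one example of a rational belief set:

```python
from itertools import product

# Sample space of the two-coin experiment.
X = {a + b for a, b in product("HT", repeat=2)}   # {'HH', 'HT', 'TH', 'TT'}

def p(event):
    """Belief p_A for an event A (a subset of X) under the uniform assignment."""
    return len(event & X) / len(X)

A = {"HH", "HT", "TH"}          # "at least one heads"
print(p(A))                     # → 0.75
assert p(A) + p(X - A) == 1.0   # the constraint the betting game enforced
```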

Probability as belief vs frequency

The above is known as Bayesian probability theory
- forms an interpretation of the laws of probability, and formalises our notion of uncertainty in events
- vs 'frequency as probability'
  - only applicable to repeatable events (e.g., try to answer 'will Trump win 2020')
  - also other issues, e.g. p-hacking
  - a psychology journal banning p-values
- (although there are problems with Bayesian arguments as well)

'Real-world' example

[https://xkcd.com/1132/]

Bayesian Probabilistic Modelling (an Introduction)

Simple idea: "If you're doing something which doesn't follow from the laws of probability, then you're doing it wrong"

Bayesian Probabilistic Modelling
- can't do ML without assumptions
- must make some assumptions about how the data was generated
- there always exists some underlying process that generated the observations
- in Bayesian probabilistic modelling we make our assumptions about the underlying process explicit
- want to infer the underlying process (find the distribution that generated the data)

- e.g. astrophysics: gravitational lensing
  - there exists a physics process magnifying far away galaxies
  - Nature chose a lensing coeff → gravitational lensing mechanism → transformed galaxy
  - We observe transformed galaxies, and want to infer the lensing coeff

- e.g. cats vs dogs classification
  - there exist some underlying rules we don't know
  - e.g. "if has pointy ears then cat"
  - We observe pairs (image, "cat"/"no cat"), and want to infer the underlying mapping from images to labels

- e.g. Gaussian density estimation
  - I tell you the process I used to generate data, and give you 5 data points: xn ~ N(xn; µ, σ²), σ = 1
  - you observe the points {x1, ..., x5}, and want to infer my µ
  - Reminder: the Gaussian density with mean µ and variance σ² is

    p(x | µ, σ) = (1 / √(2πσ²)) · exp(−(x − µ)² / (2σ²))
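The density formula can be sanity-checked numerically; a standalone sketch using only the standard library:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Gaussian density N(x; mu, sigma^2), written out from the formula above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# peak of the standard normal is 1/sqrt(2*pi) ≈ 0.3989
assert abs(gauss_pdf(0, 0, 1) - 1 / math.sqrt(2 * math.pi)) < 1e-12

# the density integrates to ~1 (crude Riemann sum over [-8, 8])
total = sum(gauss_pdf(-8 + i * 1e-3, 0, 1) * 1e-3 for i in range(16000))
assert abs(total - 1.0) < 1e-3
```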

- e.g. Gaussian density estimation
  ✗ Which µ generated my data?
  ✓ What's the probability that µ = 10 generated my data? (want to infer a distribution over µ!)


- e.g. Gaussian density estimation
  - These are the hypotheses we'll play with
  - I chose a Gaussian (one of those) from which I generated the data

Generative story / model

In Bayesian probabilistic modelling
- want to represent our beliefs / assumptions about how the data was generated explicitly
- e.g. via a generative story ['My assumptions are...']:
  - Someone (me / Nature / etc.) selected parameters µ*, σ*
  - Generated N data points xn ~ N(µ*, σ*²)
  - Gave us D = {x1, ..., xN}
  - → how would you formalise this process?
- Bayesian probabilistic model:
  - prior [what I believe the params might look like]: µ ~ N(0, 10), σ = 1
  - likelihood [how I believe the data was generated given the params]: xn | µ, σ ~ N(µ, σ²)
  - will update the prior belief on µ conditioned on the data you give me (infer a distribution over µ): µ | {x1, ..., xN}

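The generative story can be run as code. A sketch, with one assumption flagged: we read the prior N(0, 10) as mean 0 and variance 10, matching the N(µ, σ²) convention used for the likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

# Generative story: 'Nature' draws mu from the prior, then generates
# N iid observations from the likelihood.
def generate(N, sigma=1.0):
    mu = rng.normal(0.0, np.sqrt(10.0))    # prior draw: mu ~ N(0, 10) (variance 10)
    x = rng.normal(mu, sigma, size=N)      # likelihood: x_n | mu ~ N(mu, sigma^2)
    return mu, x

mu_star, D = generate(5)                   # Nature picks mu*, hands us D
print(D.shape)                             # → (5,)
```

Inference then goes in the opposite direction: given only D, recover a distribution over µ.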

Can you find my Gaussian?

How can you infer µ? (find a distribution)

Everything follows the laws of probability:
- Sum rule:
  p(X = x) = Σ_y p(X = x, Y = y) = ∫ p(X = x, Y) dY
- Product rule:
  p(X = x, Y = y) = p(X = x | Y = y) p(Y = y)
- Bayes rule:
  p(X = x | Y = y, H) = p(Y = y | X = x, H) p(X = x | H) / p(Y = y | H)

Note: H is often omitted from the conditional for brevity
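All three rules can be verified on a small discrete joint distribution (the numbers below are arbitrary; any normalised table works):

```python
import numpy as np

# A 2x2 joint p(X, Y); rows index X, columns index Y.
joint = np.array([[0.10, 0.25],
                  [0.30, 0.35]])
assert np.isclose(joint.sum(), 1.0)

p_x = joint.sum(axis=1)              # sum rule: p(X=x) = sum_y p(X=x, Y=y)
p_y = joint.sum(axis=0)

p_x_given_y = joint / p_y            # product rule rearranged: p(X|Y) = p(X,Y) / p(Y)
p_y_given_x = joint / p_x[:, None]   # p(Y|X) = p(X,Y) / p(X); rows still index X

# Bayes rule: p(X=x | Y=y) = p(Y=y | X=x) p(X=x) / p(Y=y)
bayes = p_y_given_x * p_x[:, None] / p_y[None, :]
assert np.allclose(bayes, p_x_given_y)
```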

Can you find my Gaussian?

Remember: products, ratios, marginals, and conditionals of Gaussians are Gaussian!

Summary (and playground) here: bdl101.ml/gauss

Inference

Bayes rule:

p(X = x | Y = y, H) = p(Y = y | X = x, H) p(X = x | H) / p(Y = y | H),

and in probabilistic modelling:

p(µ | D, σ, H) = p(D | µ, σ, H) p(µ | σ, H) / p(D | σ, H)
[Posterior = Likelihood × Prior / Model evidence]

with model evidence p(D | σ, H) = ∫ p(D | µ, σ, H) p(µ | σ, H) dµ (sum rule).

Likelihood

I we explicitly assumed data comes iid from a Gaussian
I compute p(D|µ, σ) = multiply all p(xn|µ, σ) (product rule)
I prob of observing data points for given params

33 of 54
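For this Gaussian-mean model the posterior is available in closed form (products of Gaussians are Gaussian). A minimal numerical sketch, assuming the prior N(0, 10) is parametrised by its variance and using σ = 1 as on the earlier slide:

```python
import numpy as np

# Conjugate update for the mean of a Gaussian with known noise sigma = 1.
# Prior: mu ~ N(0, s2) with s2 = 10 (assumed here to be a variance).
rng = np.random.default_rng(0)
s2, sigma2 = 10.0, 1.0
x = rng.normal(2.0, 1.0, size=50)   # data from a hidden mu* = 2

# Standard Gaussian-Gaussian posterior: precisions add, and the mean is a
# precision-weighted combination (the prior mean is 0 here).
post_var = 1.0 / (1.0 / s2 + len(x) / sigma2)
post_mean = post_var * x.sum() / sigma2
print(post_mean, post_var)  # mean tracks the sample mean; variance shrinks with N
```

With more data the posterior variance keeps shrinking, matching the slide's promise that we "become more certain if you give me more data".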

The Likelihood in more detail

Likelihood

I we explicitly assumed data comes iid from a Gaussian
I compute p(D|µ, σ) = multiply all p(xn|µ, σ) (product rule)
I prob of observing data points for given params

34 of 54


Likelihood as a function of parameters

Reducing dataset from 5 points to 1:

I What does the likelihood look like?

35 of 54


Likelihood as a function of parameters

Reducing dataset from 5 points to 1:

I What does the likelihood look like?

and with smaller σ..

I Trying to max lik will get “absolutely certain that σ = 0 & µ = 0”
I Does this make sense? (I told you xn ∼ N!)
I MLE failure

35 of 54
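The MLE failure above is easy to reproduce: with a single data point, setting µ to that point and shrinking σ makes the log-likelihood grow without bound (a minimal sketch with an illustrative x1):

```python
import numpy as np

x1 = 0.7  # a single (illustrative) observation

def log_lik(mu, sigma):
    # log N(x1; mu, sigma^2)
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x1 - mu)**2 / (2 * sigma**2)

# At mu = x1 the squared-error term vanishes, so the log-likelihood reduces
# to -0.5*log(2*pi*sigma^2), which diverges to +inf as sigma -> 0: MLE failure.
for sigma in [1.0, 0.1, 0.01, 0.001]:
    print(sigma, log_lik(x1, sigma))
```

Maximum likelihood happily reports infinite confidence in a degenerate model; the Bayesian posterior over µ, by contrast, stays spread out when there is little data.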

Likelihood as a function of parameters

Reducing dataset from 5 points to 1:

I What does the likelihood look like?

And with all data:

Likelihood function shows how well every value of µ, σ predicted what would happen.

35 of 54


The Posterior in more detail

p(µ | D, σ, H) = p(D | µ, σ, H) p(µ | σ, H) / p(D | σ, H)
[Posterior = Likelihood × Prior / Model evidence]

with model evidence p(D | σ, H) = ∫ p(D | µ, σ, H) p(µ | σ, H) dµ (sum rule). In contrast to the likelihood, the posterior would say

‘with the data you gave me, this is what I currently think µ could be, and I might become more certain if you give me more data’

I normaliser = marginal likelihood = evidence = sum of likelihood * prior

I (but often difficult to calculate... more in the next lecture)

36 of 54

The Posterior in more detail

I Eg, inference w prior = ‘we believe data is equally likely to have come from one of the 5 Gaussians w σ = 1’

p(µ = µi | σ, H) = 1/5 and p(µ ≠ µi for all i | σ, H) = 0

then marginal likelihood is

p(D | σ, H) = Σi p(D | µ = µi, σ, H) p(µ = µi | σ, H) = Σi p(D | µ = µi, σ, H) · 1/5

and posterior is

p(µ = µi | σ, D, H) = (1/5) p(D | µ = µi, σ, H) / Σj (1/5) p(D | µ = µj, σ, H)

37 of 54
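This discrete-prior posterior is a few lines to compute (a sketch; the five candidate means and the data are illustrative):

```python
import numpy as np

mus = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # the 5 candidate Gaussians, sigma = 1
x = np.array([0.9, 1.2, 0.8, 1.1])           # toy observations

# log p(D | mu_i, sigma=1): iid, so sum the Gaussian log-densities (product rule)
log_lik = np.array([-0.5 * np.sum((x - m)**2) - len(x) * 0.5 * np.log(2 * np.pi)
                    for m in mus])

# posterior p(mu_i | D): the uniform 1/5 prior cancels after normalisation
post = np.exp(log_lik - log_lik.max())
post /= post.sum()
print(dict(zip(mus.tolist(), post.round(3))))  # mass concentrates near mu = 1
```

Subtracting the max log-likelihood before exponentiating is the usual numerical-stability trick; it cancels in the normalised ratio.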

The Posterior in more detail

where p(D | µ = µi, σ = 1, H) is given by the likelihood.

I marginal likelihood of sigma=1 = p(D | σ = 1, H) = ‘prob that data came from single Gaussian with param σ = 1’

I similarly, marginal likelihood of hypothesis = p(D | H) = ‘prob that data came from single Gaussian (with some µ, σ)’

38 of 54

Bayesian Deep Learning

Bayesian Probabilistic Modelling of Functions

39 of 54


Why uncertainty over functions

I Example going beyond beliefs over statements (‘heads happened’) / scalars (µ)

I Would want to know uncertainty (ie belief) of system in prediction

I Want to know distribution over outputs for each input x = dist over functions

I First, some preliminaries.. (history, and notation)

40 of 54


Linear regression

Linear regression [Gauss, 1809]

I Given a set of N input-output pairs {(x1, y1), ..., (xN, yN)}
  I eg average number of accidents for different driving speeds

I assumes exists linear func mapping vectors xi ∈ R^Q to yi ∈ R^D (with yi potentially corrupted with observation noise)

I model is linear trans. of inputs: f(x) = Wx + b, w W some D by Q matrix over reals, b real vector with D elements

I Different params W, b define different linear trans
I aim: find params that (eg) minimise 1/N Σi ||yi − (Wxi + b)||²

I but relation between x and y need not be linear

41 of 54
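The least-squares objective above has a familiar closed-form solution; a minimal sketch on synthetic 1D data (the data-generating numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
x = rng.uniform(-3, 3, size=(N, 1))
y = 2.0 * x + 0.5 + rng.normal(0, 0.1, size=(N, 1))  # hidden W = 2, b = 0.5

# Absorb the intercept b as an extra all-ones input column, then solve the
# least-squares problem min ||y - X theta||^2 in closed form.
X = np.hstack([x, np.ones((N, 1))])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
W_hat, b_hat = theta[0, 0], theta[1, 0]
print(W_hat, b_hat)  # close to the hidden 2.0 and 0.5
```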


Linear basis function regression

Linear basis function regression [Gergonne, 1815; Smith, 1918]

I input x fed through K fixed scalar-valued non-linear trans. φk(x)

I collect into a feature vector φ(x) = [φ1(x), ..., φK(x)]

I do linear regression with φ(x) vector instead of x itself

I with scalar input x, trans. can be
  I wavelets parametrised by k: cos(kπx) e^(−x²/2)
  I polynomials of degrees k: x^k
  I sinusoidals with various frequencies: sin(kx)

I When φk(x) := xk (the k’th element of x) and K = Q, basis function regr. = linear regr.

I basis functions often assumed fixed and orthogonal to each other (optimal combination is sought)

I but need not be fixed and mutually orth. → param. basis functions

42 of 54
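A sketch of the recipe with polynomial basis functions φk(x) = x^k: the model stays linear in the weights, yet fits a non-linear target (degree K and the sin target are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + rng.normal(0, 0.05, size=200)  # non-linear relation

# feature vector phi(x) = [x^0, x^1, ..., x^K], then ordinary linear regression
K = 5
Phi = np.stack([x**k for k in range(K + 1)], axis=1)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

rmse = np.sqrt(np.mean((y - Phi @ w)**2))
print(rmse)  # small: linear in the weights, non-linear in x
```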


Parametrised basis functions

Parametrised basis functions [Bishop, 2006; many others]

I eg basis functions φk^(wk,bk) where scalar-valued function φk is applied to the inner-product wk^T x + bk

I φk often def’d to be identical for all k (only params change)
  I eg φk(·) = tanh(·), giving φk^(wk,bk)(x) = tanh(wk^T x + bk)

I feature vector = basis functions’ outputs = input to linear trans.

I in vector form:
  I W1 a matrix of dimensions Q by K
  I b1 a vector with K elements
  I φ^(W1,b1)(x) = φ(W1x + b1)
  I W2 a matrix of dimensions K by D
  I b2 a vector with D elements
  I model output: f^(W1,b1,W2,b2)(x) = φ^(W1,b1)(x) W2 + b2
  I want to find W1, b1, W2, b2 that minimise 1/N Σi ||yi − f^(W1,b1,W2,b2)(xi)||²

43 of 54
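In code, the vector form is one tanh layer followed by a linear transformation (a sketch with random, unfitted parameters; the slide's W1 acts on row vectors, so its transpose is stored here to act on a column vector x):

```python
import numpy as np

rng = np.random.default_rng(3)
Q, K, D = 2, 16, 1   # input dim, number of basis functions, output dim

# Random (unfitted) parameters, purely for illustration.
W1, b1 = rng.normal(size=(K, Q)), rng.normal(size=K)  # transpose of slide's Q-by-K W1
W2, b2 = rng.normal(size=(K, D)), rng.normal(size=D)

def f(x):
    phi = np.tanh(W1 @ x + b1)  # feature vector phi^(W1,b1)(x), K units
    return phi @ W2 + b2        # linear regression on the features

out = f(np.array([0.5, -1.0]))
print(out.shape)  # a D-dimensional output
```

In practice the parameters would be fitted by minimising the squared loss above, typically with gradient descent.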


Hierarchy of parametrised basis functions

Hierarchy of parametrised basis functions [Rumelhart et al., 1985]

I called “NNs” for historical reasons

I layers
  I = ‘feature vectors’ in hierarchy
  I linear trans. = ‘inner product’ layer = ‘fully connected’ layer
  I ‘input layer’, ‘output layer’, ‘hidden layers’
  I trans. matrix = weight matrix = W; intercept = bias = b

I units
  I elements in a layer

I feature vector (overloaded term)
  I often refers to the penultimate layer (at top of model just before softmax / last linear trans.)
  I denote feature vector φ(x) = [φ1(x), .., φK(x)] with K units (a K by 1 vector)
  I denote feature matrix Φ(X) = [φ(x1)^T, ..., φ(xN)^T], N by K matrix

44 of 54


Hierarchy of parametrised basis functions

I regression
  I compose multiple basis function layers into a regression model
  I result of last trans. also called “model output”; often no non-linearity here

I classification
  I further compose a softmax function at the end; also called “logistic” for 2 classes
  I “squashes” its input → probability vector; prob vector also called model output / softmax vector / softmax layer

I “building blocks”
  I layers are simple
  I modularity in layer composition → versatility of deep models
  I many engineers work in field → lots of tools that scale well

45 of 54


Assumptions for the moment

I we’ll use deep nets, and denote W to be the weight matrix of the last layer and b the bias of last layer

I (for the moment) look only at last layer W, everything else fixed – ie weights other than W do not change

I later we’ll worry about other layers

I assume that y is scalar
  I so W is K by 1
  I write wk for the k’th elem

I assume that output layer’s b is zero (or, obs y’s are normalised)
  I both will simplify derivations here (but pose no difficulty otherwise)

I then f^W(x) = Σ wk φk(x) = W^T φ(x) with φ(x) a ‘frozen’ feature vec for some NN

I some notation you’ll need to remember... X, x, N, xn, Q, D, K, D = {(x1, y1), .., (xN, yN)} = X, Y

46 of 54


Generative story

Want to put dist over functions..

I difficult to put belief over funcs., but easy to put over NN params

I assumptions for the moment: our data was generated from the fixed φ (NN) using some W (which we want to infer)

Generative story [what we assume about the data]

I Nature chose W which def’s a func: f^W(x) := W^T φ(x)

I generated func. values with inputs x1, .., xN: fn := f^W(xn)

I corrupted func. values with noise [also called “obs noise”]: yn := fn + εn, εn ∼ N(0, σ²) [additive Gaussian noise w param σ]

I we’re given observations {(x1, y1), ..., (xN, yN)} and σ = 1

47 of 54
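Acting out this generative story numerically (a sketch: the frozen feature map φ is an arbitrary random tanh layer standing in for a fixed NN, and all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
K, N, sigma = 10, 30, 1.0

# a frozen random feature map phi: R -> R^K (stand-in for a fixed NN)
A, c = rng.normal(size=(K, 1)), rng.normal(size=K)
def phi(x):                                      # x: vector of N scalar inputs
    return np.tanh(A @ x[None, :] + c[:, None])  # K by N feature matrix

W = rng.normal(size=K)                # "Nature chose W"
x = np.linspace(-3, 3, N)
f = W @ phi(x)                        # f_n := W^T phi(x_n)
y = f + rng.normal(0, sigma, size=N)  # y_n := f_n + eps_n, eps_n ~ N(0, sigma^2)
print(y[:3])
```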

Model

I qs
  I how can we find function value f∗ for a new x∗?
  I how can we find our confidence in this prediction?
  I → ‘everything follows from the laws of probability theory’

I we build a model:
  I put prior dist over params W

p(W) = N(W; 0K, s²IK)

  I likelihood [conditioned on W generate obs by adding gaussian noise]

p(y | W, x) = N(y; W^T φ(x), σ²)

I prior belief “wk is more likely to be in interval [−1, 1] than in [100, 200]” means that the func. values are likely to be more smooth than erratic (we’ll see later why)

I we want to infer W (find dist over W given D)

48 of 54
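The prior p(W) = N(0, s²IK) induces a prior over functions: every draw of W gives one function W^T φ(x). A sketch (the frozen φ is again an arbitrary random tanh layer, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
K, s = 10, 1.0

# frozen random feature map phi (stand-in for a fixed NN)
A, c = rng.normal(size=(K, 1)), rng.normal(size=K)
def phi(x):
    return np.tanh(A @ x[None, :] + c[:, None])  # K by len(x)

x = np.linspace(-3, 3, 50)
# 5 draws W ~ N(0, s^2 I_K), each evaluated as a function on the grid
draws = np.array([rng.normal(0, s, size=K) @ phi(x) for _ in range(5)])
print(draws.shape)  # five sampled functions on a 50-point grid
```

Plotting the rows of `draws` would show what the prior "believes" functions look like before seeing any data.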


Analytic inference w functions [new technique!]

I posterior variance

Σ′ = (σ⁻² Σn φ(xn)φ(xn)^T + s⁻² IK)⁻¹

and in vector form: Σ′ = (σ⁻² Φ(X)^T Φ(X) + s⁻² IK)⁻¹

I posterior mean

µ′ = Σ′ σ⁻² Σn yn φ(xn)

and in vector form: µ′ = Σ′ σ⁻² Φ(X)^T Y

49 of 54
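The vector-form posterior above is two lines of linear algebra (a sketch; the feature matrix Φ(X) is a random stand-in rather than a real NN's features):

```python
import numpy as np

rng = np.random.default_rng(6)
N, K, s, sigma = 40, 8, 1.0, 1.0

Phi = rng.normal(size=(N, K))                    # stand-in for Phi(X)
W_true = rng.normal(size=K)
Y = Phi @ W_true + rng.normal(0, sigma, size=N)  # data per the generative story

# Sigma' = (sigma^-2 Phi^T Phi + s^-2 I_K)^-1,  mu' = Sigma' sigma^-2 Phi^T Y
Sigma_post = np.linalg.inv(Phi.T @ Phi / sigma**2 + np.eye(K) / s**2)
mu_post = Sigma_post @ (Phi.T @ Y) / sigma**2
print(np.max(np.abs(mu_post - W_true)))  # posterior mean lands near W_true
```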


Analytic predictions with functions

How do we predict function values y∗ for new x∗?

I use prob theory to perform preds!

p(y∗ | x∗, X, Y) = ∫ p(y∗, W | x∗, X, Y) dW           (sum rule)
                 = ∫ p(y∗ | x∗, W, X, Y) p(W | X, Y) dW   (product rule)
                 = ∫ p(y∗ | x∗, W) p(W | X, Y) dW         (model assumptions)

I how to eval? [a new technique!]
  I likelihood p(y∗ | x∗, W) is Gaussian
  I posterior p(W | X, Y) is Gaussian (from above)
  I so predictive p(y∗ | x∗, X, Y) is Gaussian..

50 of 54
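Since likelihood and posterior are both Gaussian, the integral yields a Gaussian predictive; for Bayesian linear regression on features the standard result is mean µ′^T φ(x∗) and variance φ(x∗)^T Σ′ φ(x∗) + σ² (the slide's homework asks you to derive the variance). A sketch with a random stand-in feature matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
N, K, s, sigma = 40, 8, 1.0, 1.0

Phi = rng.normal(size=(N, K))                     # stand-in for Phi(X)
Y = Phi @ rng.normal(size=K) + rng.normal(0, sigma, size=N)

# posterior over W, as on the previous slide
Sigma_post = np.linalg.inv(Phi.T @ Phi / sigma**2 + np.eye(K) / s**2)
mu_post = Sigma_post @ (Phi.T @ Y) / sigma**2

phi_star = rng.normal(size=K)                     # phi(x*) for a new input
pred_mean = mu_post @ phi_star
pred_var = phi_star @ Sigma_post @ phi_star + sigma**2
print(pred_mean, pred_var)  # pred_var >= sigma^2: noise floor plus model uncertainty
```

The predictive variance never drops below the observation-noise floor σ², and grows for inputs whose features are poorly constrained by the data.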


Analytic predictions with functions

I Homework: Predictive variance

51 of 54


What you should be able to do now

I perform density estimation with scalars

I know when MLE fails (and why)

I use Bayes law to make more informed decisions in your life

I win against your friends in a series of bets

I argue with frequentists about how to interpret the laws of probability

I argue with philosophers about the nature of subjective beliefs

I use Bayesian probability in ML correctly

I perform predictions in Bayesian probabilistic modelling correctly

52 of 54

What we will cover next

In the next lecture we’ll

I decompose uncertainty into epistemic and aleatoric components

I use uncertainty in regression correctly

I develop tools to scale the ideas above to large deep models

I develop big deep learning systems which convey uncertainty
  I w real-world examples

53 of 54

Bayesian deep learning

All resources (including these slides): bdl101.ml

54 of 54
