10701/slides/13_Active_Learning.pdf
TRANSCRIPT
Introduction to Machine Learning
Active Learning
Barnabás Póczos
Credits
Some of the slides are taken from Nina Balcan.
Classic Supervised Learning Paradigm is Insufficient Nowadays
Modern applications: massive amounts of raw data (billions of webpages, images, sensor measurements).
Only a tiny fraction can be annotated by human experts.
Modern ML: New Learning Approaches
Modern applications: massive amounts of raw data. The Large Synoptic Survey Telescope produces 15 terabytes of data every night.
We need techniques that minimize the need for expert/human intervention => Active Learning
Contents
Active Learning Intro
▪ Batch Active Learning vs Selective Sampling Active Learning
▪ Exponential improvement in # of labels
▪ Sampling bias: active learning can hurt performance
Active Learning with SVM
Gaussian Process Regression
▪ Properties of multivariate Gaussian distributions
▪ Ridge regression
▪ GP = Bayesian Ridge Regression + Kernel trick
Active Learning with Gaussian Processes
Additional resources
• Two Faces of Active Learning. Sanjoy Dasgupta. 2011.
• Active Learning. Burr Settles. 2012.
• Active Learning. Balcan and Urner. Encyclopedia of Algorithms. 2015.
Batch Active Learning
Underlying data distribution D. The Data Source gives the Learning Algorithm a set of unlabeled examples.
The Learning Algorithm repeatedly sends the Expert a request for the label of an example and receives a label for that example.
The Algorithm outputs a classifier w.r.t. D.
• Learner can choose specific examples to be labeled.
• Goal: use fewer labeled examples [pick informative examples to be labeled].
Selective Sampling Active Learning
Underlying data distribution D. Unlabeled examples x1, x2, x3, … arrive from the Data Source one at a time, and for each one the Learning Algorithm must decide: request its label or let it go?
(E.g., request label: the Expert returns a label y1 for example x1; let x2 go; request label: a label y3 for example x3.)
The Algorithm outputs a classifier w.r.t. D.
• Selective sampling AL (online AL): stream of unlabeled examples; when each arrives, make a decision whether to ask for its label or not.
• Goal: use fewer labeled examples [pick informative examples to be labeled].
What Makes a Good Active Learning Algorithm?
• Need to choose the label requests carefully, to get informative labels.
• Guaranteed to output a relatively good classifier for most learning problems.
• Doesn’t make too many label requests; hopefully far fewer than passive learning.
Can adaptive querying really do better than passive/random sampling?
• YES! (sometimes)
• We often need far fewer labels for active learning than for passive.
• This is predicted by theory and has been observed in practice.
Can adaptive querying help? [CAL92, Dasgupta04]
• Threshold functions on the real line: h_w(x) = 1(x ≥ w), C = {h_w : w ∈ ℝ}; points to the right of w are labeled +, points to the left −.
• Get N unlabeled examples; N = O(1/ϵ) guarantees that a consistent classifier has error ≤ ϵ.
• How can we recover the correct labels with ≪ N queries?
Active Algorithm
• Do binary search (query at the halfway point)! Just need O(log N) labels; all remaining labels are inferred.
• Output a classifier consistent with the N inferred labels.
Exponential improvement: active learning needs only O(log 1/ϵ) labels, while passive supervised learning needs Ω(1/ϵ) labels to find an ϵ-accurate threshold.
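The binary-search argument can be sketched in a few lines. A minimal sketch; `oracle` is a hypothetical stand-in for the expert answering label queries:

```python
# Active learning of a 1-D threshold classifier via binary search.
# `oracle` is a hypothetical stand-in for the human expert's label queries.

def learn_threshold(xs, oracle):
    """Locate the -/+ boundary among sorted points with O(log N) label queries."""
    xs = sorted(xs)
    if oracle(xs[0]) == +1:        # every point is positive
        return xs[0], 1
    if oracle(xs[-1]) == -1:       # every point is negative
        return None, 2
    lo, hi, queries = 0, len(xs) - 1, 2
    while hi - lo > 1:             # invariant: xs[lo] is -, xs[hi] is +
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == +1:
            hi = mid
        else:
            lo = mid
    return xs[hi], queries         # leftmost + point; all N labels now inferred

xs = [i / 100 for i in range(100)]                       # N = 100 unlabeled points
w_hat, n = learn_threshold(xs, lambda x: +1 if x >= 0.37 else -1)
# w_hat recovers the boundary with only about 2 + log2(N) queries instead of N
```

The loop halves the unlabeled gap each query, which is exactly the O(log N) vs Ω(N) gap claimed above.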
Active SVM
Uncertainty sampling in SVMs is common and quite useful in practice.
Active SVM Algorithm
• At any time during the algorithm, we have a “current guess” w_t of the separator: the max-margin separator of all labeled points so far.
• Request the label of the example closest to the current separator.
E.g., [Tong & Koller, ICML 2000; Jain, Vijayanarasimhan & Grauman, NIPS 2010; Schohn & Cohn, ICML 2000]
Active SVM
Active SVM seems to be quite useful in practice.
Algorithm (batch version)
Input: S_u = {x1, …, x_mu} drawn i.i.d. from the underlying source D
Start: query for the labels of a few random x_i's.
For t = 1, …:
• Find w_t, the max-margin separator of all labeled points so far.
• Request the label of the example closest to the current separator, i.e., the one minimizing |x_i ⋅ w_t| (highest uncertainty).
[Tong & Koller, ICML 2000; Jain, Vijayanarasimhan & Grauman, NIPS 2010]
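The loop above can be sketched as follows. To keep the sketch dependency-free, a least-squares linear fit stands in for the max-margin separator; the part being illustrated is the query rule, argmin |x_i ⋅ w_t|, and `y_true` plays the role of the expert:

```python
import numpy as np

# Uncertainty sampling in the style of the batch Active SVM loop.
# NOTE: a least-squares fit is an illustrative stand-in for the SVM's
# max-margin separator; the data and seed are arbitrary.
rng = np.random.default_rng(0)

X = rng.normal(size=(200, 2))                        # pool S_u of unlabeled points
y_true = np.where(X @ np.array([1.0, 1.0]) >= 0, 1, -1)   # hidden expert labels

labeled = list(rng.choice(len(X), size=4, replace=False)) # a few random seed queries
pool = [i for i in range(len(X)) if i not in labeled]

for t in range(20):
    # "Current guess" w_t: linear fit to all labeled points so far.
    w, *_ = np.linalg.lstsq(X[labeled], y_true[labeled].astype(float), rcond=None)
    # Query the pooled example closest to the separator: argmin |x_i . w_t|.
    i = min(pool, key=lambda j: abs(X[j] @ w))
    pool.remove(i)
    labeled.append(i)                                # the expert supplies the label

accuracy = float(np.mean(np.sign(X @ w) == y_true))  # evaluated on the whole pool
```

Note that the queried points cluster near the boundary, which is precisely where the sampling-bias danger discussed next comes from.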
DANGER!!!
• Uncertainty sampling works sometimes…
• However, we need to be very, very careful!!!
• Myopic, greedy techniques can suffer from sampling bias: the active learning algorithm samples from a different (x, y) distribution than the true data.
• This bias is created by the querying strategy; as time goes on, the sample becomes less and less representative of the true data source. [Dasgupta10]
DANGER!!!
• Main tension: want to choose informative points, but also want to guarantee that the classifier we output does well on true random examples from the underlying distribution.
• Observed in practice too!!!!
Other Interesting Active Learning Techniques used in Practice
Interesting open question: analyze under what conditions they are successful.
Density-Based Sampling
Centroid of largest unsampled cluster
[Jaime G. Carbonell]
Uncertainty Sampling
Closest to decision boundary (Active SVM)
[Jaime G. Carbonell]
Maximal Diversity Sampling
Maximally distant from labeled x’s
[Jaime G. Carbonell]
Ensemble-Based Possibilities
• Uncertainty + diversity criteria
• Density + uncertainty criteria
[Jaime G. Carbonell]
What You Should Know so far
• Active learning can be really helpful: it can provide exponential improvements in label complexity (both theoretically and practically)!
• We need to be very careful due to sampling bias.
• Common heuristics (e.g., those based on uncertainty sampling).
Gaussian Processes for Regression
Additional resources
http://www.gaussianprocess.org/
Some of these slides are taken from D. Lizotte, R. Parr, and C. Guestrin.
Additional resources
• Nonmyopic Active Learning of Gaussian Processes: An Exploration–Exploitation Approach. A. Krause and C. Guestrin, ICML 2007.
• Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. A. Krause, A. Singh, and C. Guestrin, Journal of Machine Learning Research 9 (2008).
• Bayesian Active Learning for Posterior Estimation. K. Kandasamy, J. Schneider, and B. Póczos. International Joint Conference on Artificial Intelligence (IJCAI), 2015.
Why GPs for Regression?
Regression methods: linear regression, multilayer perceptron, ridge regression, support vector regression, kNN regression, etc.
Motivation: all the above regression methods give point estimates. We would like a method that also provides confidence in its estimates.
Application in active learning: such a method can be used for active learning; query the next point (and its label) where the uncertainty is highest.
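The "query where uncertainty is highest" rule can be sketched with the GP machinery developed later in the lecture. A minimal numpy sketch; the RBF kernel, length-scale, and training locations are illustrative choices, and `posterior_var` uses the standard GP predictive-variance formula:

```python
import numpy as np

def rbf(a, b, ell=0.5):
    """Squared-exponential kernel k(x, z) = exp(-(x - z)^2 / (2 ell^2))."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell**2))

def posterior_var(X_train, X_query, noise=1e-4):
    """GP posterior variance at X_query given noisy observations at X_train."""
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_train, X_query)
    return rbf(X_query, X_query).diagonal() - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)

grid = np.linspace(0, 1, 101)              # candidate query locations
X_train = np.array([0.1, 0.15])            # labels so far cluster on the left
var = posterior_var(X_train, grid)
next_x = grid[np.argmax(var)]              # query where uncertainty is highest
```

Since both observations sit near the left edge, the variance grows with distance from them and the rule sends the next query to the far right of the interval.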
Why GPs for Regression?
GPs can answer the following questions:
• Here’s where the function will most likely be (the expected function).
• Here are some examples of what it might look like (samples from the posterior distribution; e.g., the blue, red, and green functions).
• Here is a prediction of what you’ll see if you evaluate your function at x′, with confidence.
Properties of Multivariate Gaussian
Distributions
1D Gaussian Distribution
Parameters:
• Mean, µ
• Variance, σ²
p(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))
Multivariate Gaussian
Multivariate Gaussian
A 2-dimensional Gaussian is defined by
• a mean vector µ = [µ1, µ2]
• a covariance matrix:
Σ = [ σ²_{1,1}  σ²_{1,2}
      σ²_{2,1}  σ²_{2,2} ]
where σ²_{i,j} = E[(x_i − µ_i)(x_j − µ_j)] is the (co)variance.
Note: Σ is symmetric and positive semi-definite: ∀x: xᵀΣx ≥ 0.
Multivariate Gaussian examples
µ = (0, 0),  Σ = [ 1    0.8
                   0.8  1   ]
Multivariate Gaussian examples
µ = (0, 0),  Σ = [ 1    0.8
                   0.8  1   ]
Useful Properties of Gaussians
Marginal distributions of Gaussians are Gaussian.
Given:
(x_a, x_b) ~ N( [µ_a, µ_b], [ Σ_aa  Σ_ab
                              Σ_ba  Σ_bb ] )
the marginal distribution is:
x_a ~ N(µ_a, Σ_aa)
Marginal distributions of Gaussians are Gaussian
Block Matrix Inversion
Definition (Schur complements): for the block matrix M = [A B; C D], the Schur complement of block D is M/D = A − B D⁻¹ C, and the Schur complement of block A is M/A = D − C A⁻¹ B.
Theorem:
M⁻¹ = [ (M/D)⁻¹               −(M/D)⁻¹ B D⁻¹
        −D⁻¹ C (M/D)⁻¹         D⁻¹ + D⁻¹ C (M/D)⁻¹ B D⁻¹ ]
Useful Properties of Gaussians
Conditional distributions of Gaussians are Gaussian.
Notation:
(x_a, x_b) ~ N( [µ_a, µ_b], [ Σ_aa  Σ_ab
                              Σ_ba  Σ_bb ] )
Conditional Distribution:
x_a | x_b ~ N( µ_a + Σ_ab Σ_bb⁻¹ (x_b − µ_b),  Σ_aa − Σ_ab Σ_bb⁻¹ Σ_ba )
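The conditioning formula can be sanity-checked numerically. A sketch with scalar blocks (all parameter values below are arbitrary illustrative choices), comparing the closed-form conditional moments against joint samples whose x_b falls near the conditioning value:

```python
import numpy as np

# Closed-form conditional moments for scalar blocks (illustrative values).
mu_a, mu_b = 1.0, -1.0
S_aa, S_ab, S_bb = 2.0, 0.6, 1.0

x_b = 0.5                                            # the observed value of x_b
cond_mean = mu_a + S_ab / S_bb * (x_b - mu_b)        # = 1.9
cond_var = S_aa - S_ab / S_bb * S_ab                 # = 1.64

# Monte Carlo check: condition by keeping joint samples with x_b near 0.5.
rng = np.random.default_rng(0)
cov = np.array([[S_aa, S_ab], [S_ab, S_bb]])
draws = rng.multivariate_normal([mu_a, mu_b], cov, size=500_000)
near = draws[np.abs(draws[:, 1] - x_b) < 0.02, 0]
# near.mean() approximates cond_mean, near.var() approximates cond_var
```

The empirical moments of the retained slice match the formula, which is the whole content of the slide: conditioning a Gaussian just shifts the mean toward the observation and shrinks the variance.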
Higher Dimensions
Visualizing > 3-dimensional Gaussian random variables is… difficult.
Means and variances of marginal variables are practical, but then we don’t see the correlations between those variables.
Visualizing an 8-dimensional Gaussian variable f: plot the index i = 1, …, 8 on the horizontal axis, with the marginal mean µ(i) and a band of width σ(i) around it.
Marginals are Gaussian, e.g., f(6) ~ N(µ(6), σ²(6)).
Yet Higher Dimensions
Why stop there?
Getting Ridiculous
Why stop there?
Gaussian Process
Definition of GP:
• A probability distribution indexed by an arbitrary set (integers, reals, finite-dimensional vectors, etc.)
• Each element (indexed by x) is a Gaussian distribution over the reals with mean µ(x)
• These distributions are dependent/correlated as defined by k(x, z)
• Any finite subset of indices defines a multivariate Gaussian distribution
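The last point of the definition can be made concrete: evaluating a zero-mean GP at any finite set of indices yields an ordinary multivariate Gaussian with covariance matrix [k(x_i, x_j)]. A sketch with an illustrative squared-exponential choice of k:

```python
import numpy as np

def k(x, z, ell=1.0):
    """Squared-exponential covariance, one illustrative choice of k(x, z)."""
    return np.exp(-(x - z) ** 2 / (2 * ell**2))

xs = np.linspace(-3, 3, 7)                  # a finite subset of the index set
mu = np.zeros(len(xs))                      # zero mean function mu(x) = 0
K = k(xs[:, None], xs[None, :])             # 7 x 7 covariance matrix [k(x_i, x_j)]

rng = np.random.default_rng(1)
f = rng.multivariate_normal(mu, K)          # one draw: the GP sampled at xs
```

Drawing `f` repeatedly and plotting each draw against `xs` gives exactly the "samples from the prior" pictures on the following slides.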
Gaussian Process
Distribution over functions…
If our regression model is a GP, then its output is no longer a point estimate: it provides regression estimates with confidence.
The domain (index set) of the functions can be almost anything:
• Reals
• Real vectors
• Graphs
• Strings
• Sets
• …
Bayesian Updates for GPs
• How can we do regression and learn the GP from data?
• We will be Bayesians today:
• Start with a GP prior
• Get some data
• Compute a posterior
Samples from the prior distribution
Picture is taken from Rasmussen and Williams
Samples from the posterior distribution
Picture is taken from Rasmussen and Williams
Prior
Zero-mean Gaussians with covariance k(x, z)
Data
Posterior
Ridge Regression
Linear regression: min_w Σ_i (y_i − w ⋅ x_i)²
Ridge regression: min_w Σ_i (y_i − w ⋅ x_i)² + λ‖w‖²
The Gaussian Process is a Bayesian generalization of the kernelized ridge regression.
Weight Space View
GP = Bayesian ridge regression in feature space
+ kernel trick to carry out the computations
The training data: (x_1, y_1), …, (x_n, y_n)
Bayesian Analysis of Linear Regression with Gaussian noise
Linear regression: f(x) = xᵀw
Linear regression with noise: y = xᵀw + ε, where ε ~ N(0, σ_n²)
Bayesian Analysis of Linear Regression with Gaussian noise
The likelihood: p(y | X, w) = Π_i N(y_i ; x_iᵀw, σ_n²) = N(Xᵀw, σ_n² I)
Bayesian Analysis of Linear Regression with Gaussian noise
The prior: w ~ N(0, Σ_p)
Now we can calculate the posterior: p(w | X, y) ∝ p(y | X, w) p(w)
Bayesian Analysis of Linear Regression with Gaussian noise
After “completing the square”:
w | X, y ~ N(µ_w, A⁻¹), where µ_w = σ_n⁻² A⁻¹ X y and A = σ_n⁻² X Xᵀ + Σ_p⁻¹
MAP estimation: ŵ = µ_w, which coincides with Ridge Regression (with λ = σ_n² and Σ_p = I).
Bayesian Analysis of Linear Regression with Gaussian noise
The posterior covariance matrix A⁻¹ doesn’t depend on the observed targets y;
a strange property of Gaussian Processes.
Projections of Inputs into Feature Space
Bayesian linear regression suffers from limited expressiveness.
To overcome the problem => go to a feature space and do linear regression there:
(a) explicit features
(b) implicit features (kernels)
Explicit Features
Linear regression in the feature space: f(x) = φ(x)ᵀw, with a feature map φ: ℝᵈ → ℝᴺ
Explicit Features
The predictive distribution after the feature map:
f* | x*, X, y ~ N( σ_n⁻² φ(x*)ᵀ A⁻¹ Φ y,  φ(x*)ᵀ A⁻¹ φ(x*) ), where A = σ_n⁻² Φ Φᵀ + Σ_p⁻¹
Reminder: this is what we had without feature maps, with x* and X in place of φ(x*) and Φ.
Explicit Features
Shorthands: φ* = φ(x*), Φ = [φ(x_1), …, φ(x_n)]
The predictive distribution after the feature map:
f* | x*, X, y ~ N( σ_n⁻² φ*ᵀ A⁻¹ Φ y,  φ*ᵀ A⁻¹ φ* )
Explicit Features
The predictive distribution after the feature map:
f* | x*, X, y ~ N( σ_n⁻² φ*ᵀ A⁻¹ Φ y,  φ*ᵀ A⁻¹ φ* )   (*)
A problem with (*) is that it needs an N×N matrix inversion (N = dimension of the feature space)…
Theorem: (*) can be rewritten as
f* | x*, X, y ~ N( φ*ᵀ Σ_p Φ (K + σ_n² I)⁻¹ y,  φ*ᵀ Σ_p φ* − φ*ᵀ Σ_p Φ (K + σ_n² I)⁻¹ Φᵀ Σ_p φ* ), with K = Φᵀ Σ_p Φ
Proofs
• Mean expression. We need: σ_n⁻² A⁻¹ Φ = Σ_p Φ (K + σ_n² I)⁻¹
• Variance expression. We need the matrix inversion lemma.
Matrix inversion lemma (Woodbury): (Z + U W Vᵀ)⁻¹ = Z⁻¹ − Z⁻¹ U (W⁻¹ + Vᵀ Z⁻¹ U)⁻¹ Vᵀ Z⁻¹
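The mean identity can be checked numerically. A sketch with random explicit features and prior covariance Σ_p = I, so the two prediction routes become an N×N solve versus an m×m (kernel-form) solve; all sizes and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 6, 50                      # N = feature dimension, m = number of data points
Phi = rng.normal(size=(N, m))     # columns are explicit features phi(x_i)
y = rng.normal(size=m)
s2 = 0.1                          # noise variance sigma_n^2
phi_star = rng.normal(size=N)     # features of a test point

# Weight-space form: invert the N x N matrix A = sigma_n^-2 Phi Phi^T + Sigma_p^-1.
A = Phi @ Phi.T / s2 + np.eye(N)              # Sigma_p = I here
pred_weight_space = phi_star @ np.linalg.solve(A, Phi @ y) / s2

# Kernel form: only inner products appear, at the price of an m x m solve.
K = Phi.T @ Phi                               # K = Phi^T Sigma_p Phi
k_star = Phi.T @ phi_star
pred_kernel = k_star @ np.linalg.solve(K + s2 * np.eye(m), y)
```

The two predictions agree to machine precision, which is the content of the mean identity: one can always work with whichever of the N×N or m×m systems is smaller.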
From Explicit to Implicit Features
From Explicit to Implicit Features
The feature space always appears in the form of:
k(x, z) = φ(x)ᵀ Σ_p φ(z)
Lemma: with ψ(x) = Σ_p^{1/2} φ(x), we have k(x, z) = ψ(x)ᵀ ψ(z), an inner product.
No need to know the explicit N-dimensional features; their inner product is enough.
GP pseudo code
Inputs: training inputs X, targets y, covariance function k(·,·), noise variance σ_n², test input x*
GP pseudo code (continued)
Outputs: predictive mean, predictive variance, and log marginal likelihood log p(y | X)
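These inputs and outputs match the standard Cholesky-based GP regression routine (Rasmussen & Williams, Algorithm 2.1), sketched below with an illustrative RBF kernel and toy data (not the slide's Netlab example):

```python
import numpy as np

def gp_regression(X, y, x_star, kernel, noise_var):
    """GP regression via Cholesky (Rasmussen & Williams, Algorithm 2.1).

    Returns the predictive mean and variance at x_star and the
    log marginal likelihood log p(y | X).
    """
    n = len(X)
    K = kernel(X, X) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                        # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    k_star = kernel(X, np.atleast_1d(x_star))[:, 0]
    mean = k_star @ alpha                            # predictive mean
    v = np.linalg.solve(L, k_star)
    var = kernel(np.atleast_1d(x_star), np.atleast_1d(x_star))[0, 0] - v @ v
    log_ml = -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)
    return mean, var, log_ml

# Illustrative RBF kernel and toy training data.
rbf = lambda a, b: np.exp(-(np.asarray(a)[:, None] - np.asarray(b)[None, :]) ** 2 / 0.5)
X = np.array([-1.0, 0.0, 1.0])
y = np.sin(X)
mean, var, log_ml = gp_regression(X, y, 0.5, rbf, 1e-6)
```

The Cholesky factorization is used both for numerical stability and because the log-determinant in the marginal likelihood falls out of the diagonal of L for free.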
Results
Results using Netlab, Sin function
Results using Netlab, Sin function
Increased # of training points
Results using Netlab, Sin function
Increased noise
Results using Netlab, Sinc function
Applications: Sensor Placement
Temperature modeling with GP
Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. A. Krause, A. Singh, and C. Guestrin, Journal of Machine Learning Research (2008)
Applications: Sensor Placement
Entropy criterion: select the placements with maximum joint entropy.
Mutual information criterion: select the placements with maximum mutual information between the chosen locations and the rest.
An example of placements chosen using the entropy and mutual information criteria on temperature data: diamonds indicate the positions chosen using entropy; squares, the positions chosen using MI.
What You Should Know
• Properties of the multivariate Gaussian distribution
• Gaussian process = Bayesian Ridge Regression + kernel trick
• GP algorithm
• GP application in active learning
Thanks for the Attention! ☺