Budgeted Nonparametric Learning from Data Streams
Ryan Gomes and Andreas Krause
California Institute of Technology
Application Examples
Clustering Millions of Internet Images
Torralba et al. 80 Million tiny images. IEEE PAMI Nov. 2008
Application Examples
Nonlinear Regression in Embedded Systems
[Figure: actuator state vs. control input]
Data Streams
• Can’t access data set all at once
• Can’t control order of data access (random access may be available)
Charikar et al. Better streaming algorithms for clustering problems. STOC 2003
Data Streams
maximum wait until an element is revisited
elements available at iteration t
Nonparametric Methods
• Highly flexible, use training examples to make predictions
• In streaming environment: select budget of K examples to do prediction
Problem Statement
active set at iteration t:
monotone utility function: $F(A) \le F(B)$ when $A \subseteq B$
Given sequence of available elements, maintain active sets, where final active set satisfies:
Exemplar Based Clustering
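The formula on this slide did not survive extraction. As a minimal sketch, one common way to turn a k-medoid style loss into a monotone utility is to measure the loss reduction relative to a fixed "phantom" exemplar. The function names, the squared Euclidean distance, and the phantom point below are all assumptions made for illustration, not the authors' definition.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def clustering_loss(exemplars, data):
    """Mean distance from each data point to its nearest exemplar."""
    return sum(min(sq_dist(x, e) for e in exemplars) for x in data) / len(data)

def clustering_utility(exemplars, data, phantom):
    """Loss reduction relative to using only the (hypothetical) phantom exemplar."""
    base = clustering_loss([phantom], data)
    return base - clustering_loss(list(exemplars) + [phantom], data)

# Toy data: two tight groups, with a phantom exemplar placed far away.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
phantom = (100.0, 100.0)
u_one = clustering_utility([(0.0, 0.0)], data, phantom)
u_two = clustering_utility([(0.0, 0.0), (10.0, 10.0)], data, phantom)
```

Adding an exemplar can only lower each point's nearest-exemplar distance, so this utility never decreases as the active set grows.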
Gaussian Process Regression
information gain
M. Seeger et al. Fast forward selection to speed up sparse gaussian process regression. (AISTATS 2003)
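The information gain formula itself was lost from this slide. A standard form for a Gaussian process with noise variance σ² is ½ log det(I + σ⁻² K_AA), where K_AA is the kernel matrix of the selected points. The sketch below computes that quantity in plain Python; the RBF kernel, its length scale, and the noise level are illustrative assumptions.

```python
import math

def rbf(x, y, length=1.0):
    """Squared-exponential kernel (an assumed choice of kernel)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * length ** 2))

def log_det_spd(m):
    """log det of a symmetric positive-definite matrix via Cholesky factorization."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))

def information_gain(points, noise_var=0.1):
    """0.5 * log det(I + K_AA / noise_var) for the selected points."""
    n = len(points)
    m = [[(1.0 if i == j else 0.0) + rbf(points[i], points[j]) / noise_var
          for j in range(n)] for i in range(n)]
    return 0.5 * log_det_spd(m)

ig_one = information_gain([(0.0,)])
ig_two = information_gain([(0.0,), (0.5,)])
```

The second point adds less information than the first did on its own, because the two inputs are correlated under the kernel.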
Gaussian Process Regression
expected variance reduction
Submodularity
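If $A \subseteq B$ and $s \notin B$, then $F(A \cup \{s\}) - F(A) \ge F(B \cup \{s\}) - F(B)$
$F_C$, $F_V$, and $F_H$ are all submodular! “diminishing returns”
[Figure: adding an element to the smaller set gives the greater change; adding it to the larger set gives the smaller change]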
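To make the diminishing-returns property concrete, here is a small check on a toy set-coverage function. Coverage is a textbook submodular function and is not one of the utilities from the talk; the item names and sets are made up for illustration.

```python
def coverage(sets_by_item, chosen):
    """Number of distinct elements covered by the chosen items."""
    covered = set()
    for item in chosen:
        covered |= sets_by_item[item]
    return len(covered)

def marginal_gain(f, base, s):
    """Change in f when element s is added to the set `base`."""
    return f(base | {s}) - f(base)

sets_by_item = {"a": {1, 2}, "b": {2, 3}, "s": {3, 4}}
f = lambda chosen: coverage(sets_by_item, chosen)

small = frozenset({"a"})        # A
large = frozenset({"a", "b"})   # B, a superset of A
gain_small = marginal_gain(f, small, "s")
gain_large = marginal_gain(f, large, "s")
```

Here adding "s" to the smaller set covers two new elements, while adding it to the larger set covers only one, since "b" already covers element 3.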
StreamGreedy
Repeat:
1.
2.
3.
Until for consecutive iterations
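The three steps on this slide were lost in extraction, so the following is only a sketch of a swap-based streaming greedy rule in the same spirit, not the authors' exact procedure. The function name, the improvement threshold `eta`, and the `patience` stopping window are assumptions: the active set is filled up to the budget, then each arriving element replaces the active element whose swap improves the utility the most, if any swap improves it by more than `eta`.

```python
def stream_greedy_sketch(stream, utility, budget, eta=0.0, patience=None):
    """Keep at most `budget` elements, swapping in new ones when utility improves."""
    active = []
    unchanged = 0
    for x in stream:
        if len(active) < budget:
            active.append(x)          # fill the budget first
            unchanged = 0
            continue
        current = utility(active)
        best_gain, best_idx = eta, None
        for i in range(len(active)):
            candidate = active[:i] + active[i + 1:] + [x]
            gain = utility(candidate) - current
            if gain > best_gain:      # strict improvement beyond the threshold
                best_gain, best_idx = gain, i
        if best_idx is None:
            unchanged += 1
        else:
            active = active[:best_idx] + active[best_idx + 1:] + [x]
            unchanged = 0
        if patience is not None and unchanged >= patience:
            break                     # no swap for `patience` consecutive elements
    return active

# Toy 1-D stream with three well-separated groups; utility is negative mean distance.
reference = [0, 1, 10, 11, 20, 21]

def neg_loss(active):
    return -sum(min(abs(x - e) for e in active) for x in reference) / len(reference)

result = stream_greedy_sketch(reference, neg_loss, budget=3)
```

On this toy stream the sketch ends with one element from each of the three groups, which matches the intuition that an exemplar from a new cluster is worth more than a second one from a cluster already covered.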
Optimality of StreamGreedy
• Clustering-consistency
• $F_C$, $F_V$, and $F_H$ are clustering-consistent when data consists of very well-separated clusters
• Preferable to select exemplar from new cluster rather than two from same cluster
Optimality of StreamGreedy
Theorem: If F is monotonic, submodular, and clustering-consistent then StreamGreedy finds
after at most iterations.
Approximation Guarantee
Theorem: Assume F is monotonic submodular and further assume F is bounded by constant B. Then StreamGreedy finds
after at most iterations.
• Typically, data does not consist of well-separated clusters
• Maximizing F is NP-hard in general
Limited Stream Access
Approximate and
Uniform subsample approximation
“validation set”
within accuracy.
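As a rough illustration of the uniform subsample idea, the sketch below compares a clustering loss computed on a full data set with the same loss computed on a small random "validation set". The synthetic data, the sample size of 300, and the use of `random.Random.sample` on an in-memory list are assumptions made for the example; a true stream would need a streaming sampler instead.

```python
import random

def mean_min_dist(exemplars, points):
    """Mean distance from each point to its nearest exemplar (1-D data)."""
    return sum(min(abs(x - e) for e in exemplars) for x in points) / len(points)

rng = random.Random(0)  # fixed seed so the example is repeatable

# Synthetic 1-D data: three Gaussian groups of 2000 points each.
data = [rng.gauss(c, 1.0) for c in (0.0, 10.0, 20.0) for _ in range(2000)]

# Uniform subsample used as the validation set.
validation = rng.sample(data, 300)

exemplars = [0.0, 10.0, 20.0]
full = mean_min_dist(exemplars, data)          # loss on all 6000 points
approx = mean_min_dist(exemplars, validation)  # loss on the 300-point subsample
```

The two values should be close, and the gap typically shrinks as the validation set grows.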
Approximation Guarantee
Theorem: Assume F is monotonic submodular and may be evaluated to ε-precision. Further, assume F is bounded by constant B. Then StreamGreedy finds
after at most iterations.
•May only be able to approximately evaluate F
MNIST Convergence
with distance
• Convergence rate comparable to online k-means
• Quantization performance difference due to exemplar constraint
[Figure panels: example based centers; unconstrained centers]
Validation Set Size
• Good performance with small validation sets
• Larger validation set needed for larger number of clusters K
Tiny Images
StreamGreedy Online K-means
> 1.5 million 28 x 28 pixel RGB images
• Online K-means finds many singleton or empty clusters
Tiny Images
[Figure panels: StreamGreedy exemplars; online k-means centers]
Tiny Images
StreamGreedy Cluster Examples
[Figure panels: nearest to exemplar; randomly chosen]
Run time vs. Accuracy
• Vary and
• StreamGreedy performance saturates with run time
• Outperforms Online K-means in less time
Gaussian Process Regression
Kin-40k dataset
outperforms but requires sufficient validation set
Conclusions
• Flexible framework
• Theoretical performance guarantees:
• Exemplar based clustering with non-metric similarities in streaming environment
• Leads to efficient algorithms
• Excellent empirical performance
StreamGreedy