
Post on 21-Dec-2015


Budgeted Nonparametric Learning from Data Streams

Ryan Gomes and Andreas Krause
California Institute of Technology


Application Examples: Clustering Millions of Internet Images

Torralba et al. 80 Million tiny images. IEEE PAMI Nov. 2008


Application Examples: Nonlinear Regression in Embedded Systems

[Figure: plot with axis labels Control Input and Actuator State]


Data Streams

• Can’t access data set all at once
• Can’t control order of data access (random access may be available)

Charikar et al. Better streaming algorithms for clustering problems. STOC 2003


Data Streams

maximum wait until an element is revisited

elements available at iteration t
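
A minimal sketch of the stream model on this slide, assuming the stream walks through the data in a fixed circular order. The names `cyclic_stream`, `batch_size`, and `max_wait` are hypothetical and do not come from the slides; they only illustrate "elements available at iteration t" and a bounded wait until an element is revisited.

```python
from itertools import islice
from math import ceil


def cyclic_stream(data, batch_size):
    """Yield (t, batch), where batch holds the elements available at iteration t.

    Assumption for illustration only: the stream walks through `data` in a
    fixed circular order, so no element waits longer than
    ceil(len(data) / batch_size) iterations before it is available again.
    """
    n = len(data)
    size = min(batch_size, n)
    start = 0
    t = 0
    while True:
        yield t, [data[(start + i) % n] for i in range(size)]
        start = (start + size) % n
        t += 1


data = ["a", "b", "c", "d", "e"]
max_wait = ceil(len(data) / 2)  # bound on the revisit wait for batch_size=2
for t, batch in islice(cyclic_stream(data, 2), 4):
    print(t, batch)
```

A real stream need not be cyclic; the only property the sketch tries to capture is that the wait before an element reappears is bounded.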


Nonparametric Methods

• Highly flexible, use training examples to make predictions

• In streaming environment: select budget of K examples to do prediction


Problem Statement

active set at iteration t:

monotone utility function: $F(A) \le F(B)$ when $A \subseteq B$

Given sequence of available elements, maintain active sets, where final active set satisfies:


Exemplar Based Clustering


Gaussian Process Regression

information gain

M. Seeger et al. Fast forward selection to speed up sparse Gaussian process regression. (AISTATS 2003)


Gaussian Process Regression

expected variance reduction


Submodularity

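“Diminishing returns”: if $A \subseteq B$ and $s \notin B$, then $F(A \cup \{s\}) - F(A) \ge F(B \cup \{s\}) - F(B)$

FC, FV, and FH are all submodular!

[Figure: adding an element to the smaller set gives a greater change; adding it to the larger set gives a smaller change]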
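
The diminishing-returns property can be checked by brute force on a tiny ground set. The sketch below assumes a k-medoid-style clustering utility (reduction in mean squared distance relative to a single "phantom" exemplar) as a stand-in for FC; that form, the `phantom` parameter, and all function names are assumptions for illustration, not definitions taken from the slides.

```python
from itertools import combinations


def clustering_utility(exemplars, points, phantom=0.0):
    """Drop in mean squared distance relative to using only a phantom exemplar."""
    def loss(centers):
        return sum(min((x - c) ** 2 for c in centers) for x in points) / len(points)
    return loss([phantom]) - loss([phantom, *exemplars])


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def count_violations(utility, ground, tol=1e-9):
    """Count triples (A, B, s), A inside B and s outside B, that break diminishing returns."""
    ground = frozenset(ground)
    bad = 0
    for big in subsets(ground):
        for small in subsets(big):
            for s in ground - big:
                gain_small = utility(small | {s}) - utility(small)
                gain_big = utility(big | {s}) - utility(big)
                if gain_small < gain_big - tol:
                    bad += 1
    return bad


points = [1.0, 1.2, 5.0, 5.1, 9.0]
clustering_bad = count_violations(lambda s: clustering_utility(s, points), points)
squared_bad = count_violations(lambda s: len(s) ** 2, {1, 2, 3})
print(clustering_bad, squared_bad)
```

The first count should be zero, since this utility is a facility-location-style function. The second uses `len(s) ** 2`, whose marginal gains grow with the set, so it should report violations. Exhaustive checking is only feasible for very small ground sets.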


StreamGreedy

Repeat steps 1 to 3 until the stopping condition has held for the required number of consecutive iterations.
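
The individual steps are not spelled out on this slide, so the following is only a sketch of one plausible swap-based streaming loop in the same spirit, not the authors' exact procedure. It assumes that each iteration makes at most one addition or swap, and that the loop stops after `patience` consecutive iterations without a change. The names `stream_greedy`, `budget`, `patience`, and the toy utility are all hypothetical.

```python
def stream_greedy(stream, utility, budget, patience):
    """Swap-based streaming selection (illustrative sketch, not the original).

    stream   : iterable of batches, each a list of currently available elements
    utility  : function mapping a frozenset of elements to a number
    budget   : maximum active-set size K
    patience : stop after this many consecutive iterations without a change
    """
    active = frozenset()
    unchanged = 0
    for batch in stream:
        best, best_value = active, utility(active)
        for x in batch:
            if x in active:
                continue
            if len(active) < budget:
                candidates = [active | {x}]  # room left: try adding x
            else:
                candidates = [(active - {y}) | {x} for y in active]  # try swaps
            for cand in candidates:
                value = utility(cand)
                if value > best_value:
                    best, best_value = cand, value
        if best == active:
            unchanged += 1
            if unchanged >= patience:
                break
        else:
            active, unchanged = best, 0
    return active


# Toy data, made up for illustration: three well-separated 1-D clusters.
points = [0.9, 1.0, 1.1, 5.0, 5.1, 9.0, 9.2]


def utility(exemplars, phantom=0.0):
    """Assumed k-medoid-style utility: loss reduction versus a phantom exemplar."""
    def loss(centers):
        return sum(min((p - c) ** 2 for c in centers) for p in points) / len(points)
    return loss([phantom]) - loss([phantom, *exemplars])


batches = [points[i:i + 2] for i in range(0, len(points), 2)] * 5
result = stream_greedy(batches, utility, budget=3, patience=3)
print(sorted(result))
```

On this toy input the loop ends up with one exemplar per cluster, which matches the intuition that picking an exemplar from a new cluster is preferable to picking two from the same cluster.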


Optimality of StreamGreedy

• Clustering-consistency
• FC, FV, and FH are clustering-consistent when data consists of very well-separated clusters
• Preferable to select exemplar from new cluster rather than two from same cluster


Optimality of StreamGreedy

Theorem: If F is monotonic, submodular, and clustering-consistent, then StreamGreedy finds an optimal active set within a bounded number of iterations.


Approximation Guarantee

• Typically, data does not consist of well-separated clusters
• Maximizing F is NP-hard in general

Theorem: Assume F is monotonic submodular and further assume F is bounded by a constant B. Then StreamGreedy finds an approximately optimal active set within a bounded number of iterations.


Limited Stream Access

Approximate the utility using a uniform subsample of the data (the “validation set”), to within a given accuracy.
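
A minimal sketch of the validation-set idea, assuming the quantity being approximated is a mean over data points (here a mean squared distance to the nearest exemplar). The function names and the synthetic data are hypothetical. The full-data value is computed only because the toy data fits in memory; with limited stream access only the subsample estimate would be available.

```python
import random


def clustering_loss(exemplars, points):
    """Mean squared distance from each point to its nearest exemplar."""
    return sum(min((x - e) ** 2 for e in exemplars) for x in points) / len(points)


def validation_loss(exemplars, points, sample_size, rng):
    """Estimate clustering_loss from a uniform subsample (the validation set)."""
    validation = rng.sample(points, sample_size)
    return clustering_loss(exemplars, validation)


rng = random.Random(0)
# Synthetic data, made up for illustration: three 1-D clusters.
points = [rng.gauss(center, 0.5) for center in (1.0, 5.0, 9.0) for _ in range(2000)]
exemplars = [1.0, 5.0, 9.0]

full = clustering_loss(exemplars, points)
approx = validation_loss(exemplars, points, 500, rng)
print(f"full={full:.4f} approx={approx:.4f} gap={abs(full - approx):.4f}")
```

Because the estimate is an average over a uniform sample, its error shrinks as the validation set grows.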


Approximation Guarantee

• May only be able to approximately evaluate F

Theorem: Assume F is monotonic submodular and may be evaluated to ε-precision. Further, assume F is bounded by a constant B. Then StreamGreedy finds an approximately optimal active set within a bounded number of iterations.


MNIST Convergence

with distance

• Convergence rate comparable to online k-means
• Quantization performance difference due to exemplar constraint

[Figure panels: Example based centers; Unconstrained centers]


Validation Set Size

• Good performance with small validation sets
• Larger validation set needed for larger number of clusters K


Tiny Images

More than 1.5 million 28 x 28 pixel RGB images

• Online K-means finds many singleton or empty clusters

[Figure panels: StreamGreedy; Online K-means]


Tiny Images

[Figure panels: StreamGreedy Exemplars; Online k-means centers]


Tiny Images: StreamGreedy Cluster Examples

[Figure panels: Nearest to exemplar; Randomly Chosen]


Run time vs. Accuracy

• Vary and
• StreamGreedy performance saturates with run time
• Outperforms Online K-means in less time


Gaussian Process Regression: Kin-40k dataset

outperforms but requires sufficient validation set


Conclusions

• Flexible framework
• Theoretical performance guarantees:
• Exemplar based clustering with non-metric similarities in streaming environment
• Leads to efficient algorithms
• Excellent empirical performance

StreamGreedy
