strata new-york-2012
DESCRIPTION
This set of slides describes several on-line learning algorithms which, taken together, can provide significant benefit to real-time applications.
TRANSCRIPT
![Page 1: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/1.jpg)
1©MapR Technologies - Confidential
Online Learning
Bayesian bandits and more
![Page 2: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/2.jpg)
whoami – Ted Dunning
Ted [email protected]@apache.org
@ted_dunning
We’re hiring at MapR
For slides and other info http://www.slideshare.net/tdunning
![Page 3: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/3.jpg)
Online
Scalable
Incremental
![Page 4: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/4.jpg)
Scalability and Learning
What does scalable mean?
What are inherent characteristics of scalable learning?
What are the logical implications?
![Page 5: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/5.jpg)
Scalable ≈ On-line
If you squint just right
![Page 6: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/6.jpg)
unit of work ≈ unit of time
![Page 7: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/7.jpg)
Learning
State
Infinite Data
Stream
![Page 8: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/8.jpg)
Pick One
![Page 9: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/9.jpg)
![Page 10: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/10.jpg)
![Page 11: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/11.jpg)
Now pick again
![Page 12: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/12.jpg)
A Quick Diversion
You see a coin
– What is the probability of heads?
– Could it be larger or smaller than that?
I flip the coin and, while it is in the air, ask again
I catch the coin and ask again
I look at the coin (and you don’t) and ask again
Why does the answer change?
– And did it ever have a single value?
![Page 13: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/13.jpg)
Which One to Play?
One may be better than the other
The better coin pays off at some rate
Playing the other will pay off at a lesser rate
– Playing the lesser coin has “opportunity cost”
But how do we know which is which?
– Explore versus Exploit!
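One common way to balance exploring and exploiting, in keeping with the deck's "Bayesian bandits" title, is Thompson sampling: keep a Beta distribution over each coin's payoff rate, draw one sample from each, and play the coin with the largest draw. The slide does not spell out an algorithm, so the following is only a minimal sketch under assumptions: a uniform Beta(1, 1) prior, hypothetical payoff rates of 0.3 and 0.7, and an illustrative function name.

```python
import random


def thompson_two_coins(p_heads, rounds, seed=0):
    """Play `rounds` flips, choosing a coin each time by Thompson sampling.

    p_heads holds the true (hidden) payoff rate of each coin; these
    values are hypothetical and only used to simulate the flips.
    """
    rng = random.Random(seed)
    wins = [0] * len(p_heads)
    losses = [0] * len(p_heads)
    for _ in range(rounds):
        # Draw one plausible payoff rate per coin from its Beta posterior
        # (uniform Beta(1, 1) prior, so the parameters are counts + 1).
        samples = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        # Flip the chosen coin and record the outcome.
        if rng.random() < p_heads[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    plays = [w + l for w, l in zip(wins, losses)]
    return plays, wins


plays, wins = thompson_two_coins([0.3, 0.7], rounds=2000)
print("plays per coin:", plays)
print("wins per coin: ", wins)
```

Early on, both posteriors are wide, so either coin can produce the largest draw (exploration). As evidence accumulates, the better coin's samples win more often and it is played most of the time (exploitation), which keeps the opportunity cost of the lesser coin small.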
![Page 14: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/14.jpg)
A First Conclusion
Probability as expressed by humans is subjective and depends on information and experience
![Page 15: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/15.jpg)
A Second Conclusion
A single number is a bad way to express uncertain knowledge
A distribution of values might be better
![Page 16: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/16.jpg)
I Dunno
![Page 17: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/17.jpg)
5 and 5
![Page 18: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/18.jpg)
2 and 10
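The slide images are not reproduced in this transcript, so the reading here is an assumption: "I Dunno" is taken as no observations at all, and "5 and 5" and "2 and 10" as counts of heads and tails (the order is a guess). Under a uniform Beta(1, 1) prior, the belief about the probability of heads after such counts is a Beta distribution, and a rough sketch of how its center and spread change looks like this:

```python
import math


def beta_summary(heads, tails):
    """Mean and standard deviation of the Beta posterior for P(heads),
    assuming a uniform Beta(1, 1) prior (an assumption, not from the slides)."""
    a, b = heads + 1, tails + 1
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


for heads, tails in [(0, 0), (5, 5), (2, 10)]:
    mean, sd = beta_summary(heads, tails)
    print(f"{heads} and {tails}: mean = {mean:.3f}, sd = {sd:.3f}")
```

With no data the distribution is flat across all values. Five heads and five tails keeps the mean at one half but narrows the spread, while two and ten shifts the mass toward a low probability of heads. A single number would hide these differences.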
![Page 19: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/19.jpg)
The Cynic Among Us
![Page 20: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/20.jpg)
Demo
![Page 21: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/21.jpg)
An Example
![Page 22: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/22.jpg)
An Example
![Page 23: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/23.jpg)
The Cluster Proximity Features
Every point can be described by the nearest cluster
– 4.3 bits per point in this case
– Significant error that can be decreased (to a point) by increasing the number of clusters
Or by the proximity to the 2 nearest clusters (2 x 4.3 bits + 1 sign bit + 2 proximities)
– Error is negligible
– Unwinds the data into a simple representation
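As an illustration of the second representation, the sketch below computes, for one point, the indices of its two nearest centroids and the distance to each. The centroids and the function name are hypothetical, Euclidean distance is assumed, and the sign bit mentioned on the slide is left out because the transcript does not say how it is defined.

```python
import math


def proximity_features(point, centroids):
    """Return (nearest index, second-nearest index, distance to nearest,
    distance to second nearest) for one point, using Euclidean distance."""
    ranked = sorted(range(len(centroids)),
                    key=lambda i: math.dist(point, centroids[i]))
    first, second = ranked[0], ranked[1]
    return (first, second,
            math.dist(point, centroids[first]),
            math.dist(point, centroids[second]))


# Hypothetical centroids, for illustration only.
centroids = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
print(proximity_features((1.0, 0.0), centroids))

# If the index cost is log2(number of clusters), 4.3 bits would
# correspond to roughly 20 clusters (an inference, not stated on the slide).
print(round(math.log2(20), 2))
```

Storing only the nearest index discards where the point sits inside its cluster; adding the second index and the two distances recovers most of that position, which is one way to read the slide's claim that the error becomes negligible.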
![Page 24: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/24.jpg)
Diagonalized Cluster Proximity
![Page 25: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/25.jpg)
Lots of Clusters Are Fine
![Page 26: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/26.jpg)
Surrogate Method
Start with sloppy clustering into κ = k log n clusters
Use these clusters as a weighted surrogate for the data
Cluster surrogate data using ball k-means
Results are provably high quality for highly clusterable data
Sloppy clustering can be done on-line
Surrogate can be kept in memory
Ball k-means pass can be done at any time
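The two-stage shape of the surrogate method can be sketched in a few lines. This is a simplified stand-in, not the algorithm from the talk: the sloppy pass here uses a fixed distance threshold (a hypothetical parameter) instead of targeting κ = k log n clusters, and plain weighted Lloyd iterations with a farthest-point start replace ball k-means. All names and the synthetic data are illustrative.

```python
import math
import random


def sloppy_cluster(points, threshold):
    """Single pass over the stream: fold each point into its nearest
    centroid if it lies within `threshold`, otherwise start a new centroid.
    Returns the weighted surrogate as a list of (centroid, weight)."""
    surrogate = []  # entries are [centroid_as_list, weight]
    for p in points:
        best, best_dist = None, float("inf")
        for entry in surrogate:
            dist = math.dist(p, entry[0])
            if dist < best_dist:
                best, best_dist = entry, dist
        if best is not None and best_dist <= threshold:
            best[1] += 1
            w = best[1]
            # Incremental mean update keeps the centroid at the average
            # of every point folded into it so far.
            best[0] = [c + (x - c) / w for c, x in zip(best[0], p)]
        else:
            surrogate.append([list(p), 1])
    return [(tuple(c), w) for c, w in surrogate]


def weighted_kmeans(surrogate, k, iterations=10):
    """Weighted Lloyd iterations over the in-memory surrogate."""
    # Deterministic start: heaviest centroid, then farthest-point picks.
    centers = [max(surrogate, key=lambda e: e[1])[0]]
    while len(centers) < k:
        centers.append(max(
            surrogate,
            key=lambda e: min(math.dist(e[0], c) for c in centers))[0])
    dims = len(centers[0])
    for _ in range(iterations):
        sums = [[0.0] * dims for _ in range(k)]
        totals = [0.0] * k
        for c, w in surrogate:
            j = min(range(k), key=lambda i: math.dist(c, centers[i]))
            totals[j] += w
            for a in range(dims):
                sums[j][a] += w * c[a]
        centers = [tuple(s / totals[j] for s in sums[j]) if totals[j] > 0
                   else centers[j] for j in range(k)]
    return centers


# Synthetic stream: two well-separated blobs, 500 points each.
rng = random.Random(42)
stream = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (10, 10)] * 500]
surrogate = sloppy_cluster(stream, threshold=1.0)
centers = weighted_kmeans(surrogate, k=2)
print(len(stream), "points ->", len(surrogate), "surrogate centroids")
print(sorted(centers))
```

Only the first function touches the raw stream, and it does so one point at a time, so it fits the on-line setting. The second function works on the much smaller weighted surrogate and can be rerun whenever a fresh clustering is needed.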
![Page 27: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/27.jpg)
Algorithm Costs
O(k d log n) per point for Lloyd’s algorithm
… not so good for k = 2000, n = 10^8
Surrogate methods
… O(d log κ) = O(d (log k + log log n)) per point
This is a big deal:
– k d log n = 2000 x 10 x 26 ≈ 500,000
– log k + log log n = 11 + 5 = 16
– 30,000 times faster makes the grade as a bona fide big deal
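The arithmetic on this slide can be checked directly. The sketch assumes base-2 logarithms, d = 10 (the middle factor in the slide's product) and n = 10^8 (which matches log n ≈ 26); these readings are inferred from the numbers rather than stated outright.

```python
import math

k, d, n = 2000, 10, 10**8  # d and n are inferred from the slide's numbers

log_n = math.log2(n)                               # about 26.6
lloyd = k * d * log_n                              # k d log n
surrogate_logs = math.log2(k) + math.log2(log_n)   # log k + log log n
surrogate_cost = d * surrogate_logs                # d (log k + log log n)

print(f"k d log n           = {lloyd:,.0f}")
print(f"log k + log log n   = {surrogate_logs:.1f}")
print(f"ratio without d     = {lloyd / surrogate_logs:,.0f}")
print(f"ratio with factor d = {lloyd / surrogate_cost:,.0f}")
```

The first ratio is the one the slide rounds to 30,000. Keeping the factor d in the surrogate cost, as the O(d (log k + log log n)) expression does, gives a ratio ten times smaller, which is still a speedup of several thousand.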
![Page 28: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/28.jpg)
30,000 times faster sounds good
![Page 29: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/29.jpg)
30,000 times faster sounds good
but that isn’t the big news
![Page 30: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/30.jpg)
30,000 times faster sounds good
but that isn’t the big news
these algorithms do on-line clustering
![Page 31: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/31.jpg)
Parallel Speedup?
✓
![Page 32: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/32.jpg)
What about deployment?
![Page 33: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/33.jpg)
Learning
State
Infinite Data
Stream
![Page 34: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/34.jpg)
Mapper
State
Data Split
![Page 35: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/35.jpg)
Mapper
State
Data Split
Need shared memory!
Mapper
Mapper
![Page 36: Strata new-york-2012](https://reader031.vdocument.in/reader031/viewer/2022020115/554f5b17b4c905524c8b54d3/html5/thumbnails/36.jpg)
whoami – Ted Dunning
We’re hiring at MapR
Ted [email protected]@apache.org
@ted_dunning
For slides and other info http://www.slideshare.net/tdunning