boston.rb: collaborative filtering
-
8/14/2019 Boston.rb: Collaborative Filtering
Collaborative Filtering
Tyler McM
-
... which for the purposes of this talk means:
Recommendations
-
Netflix
Google Reader
Pandora
-
Last.fm
...and of course... Amazon
-
(shameless plug)
-
         Item A  Item B  Item C
Bob        5       1       5
Suzie      5       1       ?
Joe        1       5       1
I like to think of it as a fill-in-the-blank puzzle.
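One way to picture the puzzle in code is a nested dictionary where the blank is an explicit `None`. This is only a sketch: the `ratings` layout and the `blanks` helper are illustrative names, not something taken from the talk's code.

```python
# Ratings copied from the slide; None marks the blank we want to fill in.
ratings = {
    "Bob":   {"Item A": 5, "Item B": 1, "Item C": 5},
    "Suzie": {"Item A": 5, "Item B": 1, "Item C": None},
    "Joe":   {"Item A": 1, "Item B": 5, "Item C": 1},
}


def blanks(table):
    """Return the (user, item) pairs that have no rating yet."""
    return [
        (user, item)
        for user, row in table.items()
        for item, rating in row.items()
        if rating is None
    ]


print(blanks(ratings))
```

Everything that follows in the talk is about choosing a sensible value for each of those blanks.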
-
Dataset      Dataset      Dataset
Correlate    Correlate    Correlate
          Recommendations
          Content Booster
              Output
-
Data
Data > Algorithms
-
Amazon uses a simple item-to-item correlation system
How can they get away with that?
~ 20 million items
n million users
-
If every user bought 200 items, their user-item matrix would be 0.001% full.
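The 0.001% figure can be checked with a line of arithmetic. The sketch below uses only the numbers quoted on the slides (about 20 million items, 200 purchases per user); the names are illustrative. The number of users cancels out, because every user's row is filled to the same fraction.

```python
# Figures quoted on the slides: ~20 million items, 200 purchases per user.
TOTAL_ITEMS = 20_000_000
ITEMS_PER_USER = 200


def fill_fraction(items_per_user: int, total_items: int) -> float:
    """Fraction of one user's row, and so of the whole matrix, that is non-empty."""
    return items_per_user / total_items


fraction = fill_fraction(ITEMS_PER_USER, TOTAL_ITEMS)
print(f"{fraction:.3%}")  # share of cells that hold a purchase
```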
-
purchases
ratings
views
votes
tell-a-friend
wishlists
wedding registry
baby registry
shopping cart
anything you can measure!
-
Data > Algorithms
more different data > more of the same data
-
Correlation
Find patterns in the data sets
-
Pearson
Singular Value Decomposition
Kendall tau coefficient
Spearman's rho
point biserial correlation coefficient
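As a rough illustration of the first option, here is a minimal Pearson correlation between two users, computed only over the items both have rated. The data is the Bob/Suzie/Joe puzzle from earlier in the talk. The `pearson` function and its choice to return 0.0 when there is too little overlap are assumptions made for this sketch, not the behaviour of the talk's library.

```python
import math

# Ratings from the fill-in-the-blank slide; Suzie has not rated Item C.
ratings = {
    "Bob":   {"Item A": 5, "Item B": 1, "Item C": 5},
    "Suzie": {"Item A": 5, "Item B": 1},
    "Joe":   {"Item A": 1, "Item B": 5, "Item C": 1},
}


def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    common = [item for item in a if item in b]
    n = len(common)
    if n < 2:
        return 0.0  # assumption: too little overlap counts as "no signal"
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user who rates everything the same has no direction
    return cov / math.sqrt(var_a * var_b)


print(pearson(ratings["Bob"], ratings["Suzie"]))  # they agree on A and B
print(pearson(ratings["Joe"], ratings["Suzie"]))  # they disagree on A and B
```

With only two shared items the result can only be +1, -1, or undefined, which is one reason sparse data makes correlations noisy.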
-
Word of caution: watch for O(n^2) here.
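The quadratic growth comes from comparing every user (or item) with every other one. A small sketch, with illustrative names, of how quickly the number of pairs climbs:

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n users (or items): n * (n - 1) / 2."""
    return n * (n - 1) // 2


users = ["Bob", "Suzie", "Joe"]
pairs = list(combinations(users, 2))  # every pair gets its own correlation
print(pairs)

for n in (1_000, 1_000_000):
    print(n, pair_count(n))
```

At a million users that is roughly half a trillion correlations for a full matrix, so in practice some pairs have to be skipped or the work done incrementally.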
-
Recommendation
This is the part where we figure out what you'll like.
-
         Bob     Suzie   Joe
Bob      --      -0.74   0.856
Suzie    0.87    --      0.1
Joe      0.74    -0.9    --
So we have all these correlation matrices, one for each of the datasets that we correlated.
-
So let's say we have a user named Fred. His correlations with the other users:
Joe    0.9
Bob    0.75
Suzie  0.5
-
Joe:    Item A 5, Item B 4
Bob:    Item B 5, Item C 2
Suzie:  Item C 2, Item A 2
-
Item A:  Joe 5, Suzie 2
Item B:  Joe 4, Bob 5
Item C:  Bob 2, Suzie 2
-
Item A 3.93
Item B 4.45
Item C 2
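The slides do not spell out the formula, but the numbers are consistent with a correlation-weighted average: each neighbour's rating counts in proportion to how strongly that neighbour correlates with Fred. The sketch below reproduces the three steps (ratings by user, regrouped by item, then averaged). Function and variable names are illustrative assumptions.

```python
# Fred's correlation with each other user, as given on the slides.
similarity = {"Joe": 0.9, "Bob": 0.75, "Suzie": 0.5}

# What those users rated.
ratings_by_user = {
    "Joe":   {"Item A": 5, "Item B": 4},
    "Bob":   {"Item B": 5, "Item C": 2},
    "Suzie": {"Item C": 2, "Item A": 2},
}


def group_by_item(by_user):
    """Regroup {user: {item: rating}} into {item: {user: rating}}."""
    by_item = {}
    for user, items in by_user.items():
        for item, rating in items.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item


def predict(by_item, weights):
    """Weighted average of neighbours' ratings, weighted by correlation."""
    predictions = {}
    for item, user_ratings in by_item.items():
        total_weight = sum(weights[user] for user in user_ratings)
        if total_weight == 0:
            continue  # no usable neighbours for this item
        weighted_sum = sum(weights[user] * r for user, r in user_ratings.items())
        predictions[item] = weighted_sum / total_weight
    return predictions


predictions = predict(group_by_item(ratings_by_user), similarity)
for item, score in sorted(predictions.items()):
    print(item, round(score, 2))
```

This simple form assumes positive weights. Negatively correlated neighbours would need separate handling, which the slides do not cover.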
-
Content Boosting
Your users reveal their preferences in their actions.
If I mark every horror movie in your system as a 1... I don't like horror movies.
If I rate every Will Smith movie as 5 stars... I probably like Will Smith.
-
All items have properties.
Movies have genres, actors, studios, locations, etc...
Comics have genres, writers, artists, publishers, etc...
Kittens have color, gender, breed, cute captions, etc...
-
Movie                  Rating  Genre    Will Smith?
I Am Legend            5       Action   Will Smith
Cloverfield            4       Action   No Will Smith
Independence Day       4       Action   Will Smith
Sleepless in Seattle   1       Romance  No Will Smith
So what do my preferences say about me?
My mean rating is 3.5, so...
Action: +0.8
Romance: -2.5
Will Smith: +1
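The offsets above match a simple calculation: for each property, take the mean rating of the movies that have it and subtract the user's overall mean. A minimal sketch of that arithmetic, with illustrative names (`rated`, `property_offsets`), follows. How the talk's code then applies these offsets to the recommendations is not shown on the slides, so it is left out here.

```python
from collections import defaultdict

# (title, rating, properties) for the four movies on the slide.
rated = [
    ("I Am Legend", 5, ["Action", "Will Smith"]),
    ("Cloverfield", 4, ["Action"]),
    ("Independence Day", 4, ["Action", "Will Smith"]),
    ("Sleepless in Seattle", 1, ["Romance"]),
]


def property_offsets(rated_items):
    """Return (overall mean, {property: mean rating with property - overall mean})."""
    overall_mean = sum(rating for _, rating, _ in rated_items) / len(rated_items)
    ratings_by_property = defaultdict(list)
    for _, rating, properties in rated_items:
        for prop in properties:
            ratings_by_property[prop].append(rating)
    offsets = {
        prop: sum(rs) / len(rs) - overall_mean
        for prop, rs in ratings_by_property.items()
    }
    return overall_mean, offsets


mean_rating, offsets = property_offsets(rated)
print(mean_rating)
for prop, offset in offsets.items():
    print(f"{prop}: {offset:+.1f}")
```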
-
Your recommendations are only as good as the amount and quality of your data.
Content Boosting is thus especially useful if you have limited data.
-
Output
I have nothing interesting to say about output...
Moving on.
-
Now let's look at some code.
-
http://github.com/tyler/collaborative_filter