Boston.rb: Collaborative Filtering

Upload: senor-smiles

Post on 30-May-2018

TRANSCRIPT

  • 8/14/2019 Boston.rb: Collaborative Filtering

    1/50

    Collaborative Filtering

    Tyler McM

    ... which for the purposes of this talk means:

    Recommendations

    Netflix

    Google Reader

    Pandora

    Last.fm

    ...and of course... Amazon

    (shameless plug)

              Item A   Item B   Item C
    Bob       5        1        5
    Suzie     5        1        ?
    Joe       1        5        1

    I like to think of it as a fill-in-the-blank puzzle.
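
To make the puzzle concrete, here is a minimal Ruby sketch of the ratings table as a nested hash, with `nil` standing in for the blank. The constant `PUZZLE_RATINGS` and the helper `blanks` are names invented for this illustration; they are not taken from the talk or its code.

```ruby
# Ratings from the table above; nil marks the cell we want to predict.
PUZZLE_RATINGS = {
  "Bob"   => { "Item A" => 5, "Item B" => 1, "Item C" => 5 },
  "Suzie" => { "Item A" => 5, "Item B" => 1, "Item C" => nil },
  "Joe"   => { "Item A" => 1, "Item B" => 5, "Item C" => 1 }
}

# Collect every [user, item] pair that has no rating yet.
def blanks(ratings)
  ratings.flat_map do |user, items|
    items.select { |_item, score| score.nil? }.keys.map { |item| [user, item] }
  end
end

p blanks(PUZZLE_RATINGS)
```

The only blank here is Suzie's rating for Item C, which is the value a recommender would try to fill in.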

    Dataset      Dataset      Dataset

    Correlate    Correlate    Correlate

    Recommendations

    Content Booster

    Output

    Data

    Data > Algorithms

    Amazon uses a simple item-to-item correlation system

    How can they get away with that?

    ~ 20 million items

    n million users

    If every user bought 200 items, their user-item matrix would be only 0.001% full (200 out of roughly 20 million items per user).
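
The 0.001% figure follows directly from the numbers on the slides. A small Ruby sketch of the arithmetic (the constant and method names are made up for this example):

```ruby
# Figures from the slides: roughly 20 million items, 200 purchases per user.
ITEM_COUNT     = 20_000_000
ITEMS_PER_USER = 200

# Each user's row has item_count cells and only items_per_user are filled,
# so the fill rate is the same no matter how many users there are.
def fill_percentage(items_per_user, item_count)
  items_per_user.to_f / item_count * 100
end

puts format("%.3f%% full", fill_percentage(ITEMS_PER_USER, ITEM_COUNT))  # prints "0.001% full"
```

Because the user count cancels out, the "n million users" figure does not affect the percentage.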

    Data

    purchases

    ratings

    views

    votes

    tell-a-friend

    wishlists

    wedding registry

    baby registry

    shopping cart

    anything you can measure!
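
One way to handle many kinds of signal is to keep a separate user-by-item table for each of them, so that each can be correlated on its own. The sketch below assumes a flat event log; the `EVENTS` layout, the signal names used as keys, and the `build_datasets` helper are all hypothetical and not taken from the talk's code.

```ruby
# Hypothetical event log: [signal, user, item, value]. The rows are made up
# for illustration only.
EVENTS = [
  [:purchases, "Bob",   "Item A", 1],
  [:ratings,   "Bob",   "Item A", 5],
  [:ratings,   "Suzie", "Item B", 1],
  [:wishlists, "Joe",   "Item C", 1]
]

# Build one user => item => value table per signal.
def build_datasets(events)
  datasets = Hash.new { |h, signal| h[signal] = Hash.new { |u, user| u[user] = {} } }
  events.each do |signal, user, item, value|
    datasets[signal][user][item] = value
  end
  datasets
end

datasets = build_datasets(EVENTS)
p datasets.keys
p datasets[:ratings]
```

Keeping the signals apart avoids having to decide up front how much a wishlist entry is worth compared with a purchase.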

    Data > Algorithms

    more different data > more of the same data

    Correlation

    Find patterns in the data sets

    Pearson

    Singular Value Decomposition

    Kendall tau coefficient

    Spearman's rho

    point biserial correlation coefficient
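
As an illustration of the first measure in the list, here is a minimal Ruby sketch of the Pearson correlation coefficient between two users' ratings. It is a textbook formulation, not the code from the talk; the method name `pearson` and the choice to return 0.0 when one list has no variance are assumptions of this sketch.

```ruby
# Pearson correlation between two equal-length lists of numbers.
# Returns a value between -1 and 1, or 0.0 when either list has no variance
# (returning 0.0 in that case is a choice made for this sketch).
def pearson(xs, ys)
  raise ArgumentError, "lists must be the same length" unless xs.size == ys.size
  n = xs.size
  return 0.0 if n.zero?

  mean_x = xs.sum.to_f / n
  mean_y = ys.sum.to_f / n

  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  spread_x   = xs.sum { |x| (x - mean_x)**2 }
  spread_y   = ys.sum { |y| (y - mean_y)**2 }

  denominator = Math.sqrt(spread_x * spread_y)
  denominator.zero? ? 0.0 : covariance / denominator
end

# Bob and Joe from the ratings table rate every item in opposite ways.
bob = [5, 1, 5]
joe = [1, 5, 1]
puts pearson(bob, joe)
```

For Bob and Joe the result is -1 up to floating-point rounding: a perfect negative correlation. In practice only the items both users have rated would be passed in.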

    Word of caution: watch for O(n^2) here.
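
The quadratic cost comes from comparing every user (or item) with every other one. A short Ruby sketch of how fast the number of pairs grows (`pair_count` is a name made up for this example):

```ruby
# Number of distinct pairs among n users (or items): n * (n - 1) / 2.
def pair_count(n)
  n * (n - 1) / 2
end

# Sanity check against Ruby's own enumeration for a small n.
users = %w[Bob Suzie Joe Fred]
puts users.combination(2).count   # 6 pairs
puts pair_count(users.size)       # 6 as well

# The quadratic growth is what hurts at scale.
[1_000, 1_000_000].each do |n|
  puts "#{n} users -> #{pair_count(n)} correlations"
end
```

A thousand users need about half a million correlations; a million users need about half a trillion.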

    Recommendation

    This is the part where we figure out what you'll like.

              Bob      Suzie    Joe
    Bob                -0.74    0.856
    Suzie     0.87              0.1
    Joe       0.74     -0.9

    So we have all these correlation matrices.

    One for each of the datasets that we correlated.

    Joe 0.9

    Bob 0.75

    Suzie 0.5

    So let's say we have a user named Fred...

    Joe     Item A   5
            Item B   4
    Bob     Item B   5
            Item C   2
    Suzie   Item C   2
            Item A   2

    Item A  Joe     5
            Suzie   2
    Item B  Joe     4
            Bob     5
    Item C  Bob     2
            Suzie   2

    Item A 3.93

    Item B 4.45

    Item C 2
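
The predicted scores above are consistent with a similarity-weighted average: each rating is weighted by how strongly its author correlates with Fred. The Ruby sketch below reproduces the numbers under that assumption. The slides do not spell out the formula, and the names `FRED_SIMILARITY`, `NEIGHBOR_RATINGS`, and `predict` are invented for this example.

```ruby
# Fred's correlation with each other user (from the slides).
FRED_SIMILARITY = { "Joe" => 0.9, "Bob" => 0.75, "Suzie" => 0.5 }

# What each of those users rated (from the slides).
NEIGHBOR_RATINGS = {
  "Joe"   => { "Item A" => 5, "Item B" => 4 },
  "Bob"   => { "Item B" => 5, "Item C" => 2 },
  "Suzie" => { "Item C" => 2, "Item A" => 2 }
}

# Predict a score per item as the similarity-weighted average of the
# ratings given by the users who rated it.
def predict(similarity, ratings)
  totals  = Hash.new(0.0)  # item => sum of similarity * rating
  weights = Hash.new(0.0)  # item => sum of similarity

  ratings.each do |user, items|
    weight = similarity.fetch(user, 0.0)
    next unless weight > 0  # this sketch ignores zero or negative correlations
    items.each do |item, rating|
      totals[item]  += weight * rating
      weights[item] += weight
    end
  end

  totals.map { |item, total| [item, (total / weights[item]).round(2)] }.to_h
end

p predict(FRED_SIMILARITY, NEIGHBOR_RATINGS)
```

For Item A this works out to (0.9 * 5 + 0.5 * 2) / (0.9 + 0.5) = 3.93, for Item B to (0.9 * 4 + 0.75 * 5) / (0.9 + 0.75) = 4.45, and for Item C to 2.0.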

    Content Boosting

    Your users reveal their preferences in their actions.

    If I mark every horror movie in your system as a 1... I don't like horror movies.

    If I rate every Will Smith movie as 5 stars... I probably like Will Smith.

    All Items have properties.

    Movies have genres, actors, studio, locations, etc...

    Comics have genres, writers, artists, publishers, etc...

    Kittens have color, gender, breed, cute captions, etc...

    Movie                  Rating   Genre     Actor
    I Am Legend            5        Action    Will Smith
    Cloverfield            4        Action    No Will Smith
    Independence Day       4        Action    Will Smith
    Sleepless in Seattle   1        Romance   No Will Smith

    So what do my preferences say about me?

    My mean rating is 3.5, so...

    Action: +0.8

    Romance: -2.5

    Will Smith: +1
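
These offsets match a simple rule: take the mean rating of the movies that have a given property and subtract my overall mean rating. The slides do not state the formula, so the Ruby sketch below is an assumption that happens to reproduce the three numbers; `MY_RATINGS` and `property_offsets` are names made up for this example.

```ruby
# My ratings from the slides, with each movie's properties.
MY_RATINGS = [
  { title: "I Am Legend",          rating: 5, properties: ["Action", "Will Smith"] },
  { title: "Cloverfield",          rating: 4, properties: ["Action"] },
  { title: "Independence Day",     rating: 4, properties: ["Action", "Will Smith"] },
  { title: "Sleepless in Seattle", rating: 1, properties: ["Romance"] }
]

# For each property, average the ratings of the items that have it and
# subtract the user's overall mean rating.
def property_offsets(rated_items)
  overall_mean = rated_items.sum { |item| item[:rating] }.to_f / rated_items.size

  by_property = Hash.new { |h, k| h[k] = [] }
  rated_items.each do |item|
    item[:properties].each { |prop| by_property[prop] << item[:rating] }
  end

  by_property.map do |prop, scores|
    [prop, (scores.sum.to_f / scores.size - overall_mean).round(1)]
  end.to_h
end

p property_offsets(MY_RATINGS)
```

The overall mean is (5 + 4 + 4 + 1) / 4 = 3.5. Action averages 4.33, giving +0.8; Will Smith averages 4.5, giving +1.0; Romance averages 1, giving -2.5.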

    Your recommendations are only as good as the amount and quality of your data.

    Content Boosting is thus especially useful if you have limited data.

    Output

    I have nothing interesting to say about output...

    Moving on.

    Now let's look at some code.

    http://github.com/tyler/collaborative_filter