1
Recommender Systems and Collaborative Filtering
Jon Herlocker, Assistant Professor
School of Electrical Engineering and Computer Science
Oregon State University, Corvallis, OR
(also President, MusicStrands, Inc.)
2
Personalized Recommender Systems and Collaborative Filtering (CF)
3
Outline
• The recommender system space
• Pure collaborative filtering (CF)
• CF algorithms for prediction
• Evaluation of CF algorithms
• CF in web search (if time)
4
Recommender Systems
• Help people make decisions
  – Examples: where to spend attention, where to spend money
• Help maintain awareness
  – Examples: new products, new information
• In both cases: many options, limited resources
5
Stereotypical Integrator of RS Has:
• Large product (item) catalog
  – With product attributes
• Large user base
  – With user attributes (age, gender, city, country, …)
• Evidence of customer preferences
  – Explicit ratings (powerful, but harder to elicit)
  – Observations of user activity (purchases, page views, emails, prints, …)
6
The RS Space
• Users and items are connected by observed preferences (ratings, purchases, page views, laundry lists, play lists)
• Item-item links: derived from similar attributes, similar content, explicit cross-references
• User-user links: derived from similar attributes, explicit connections
7
Individual Personalization
(Same diagram: users and items connected by observed preferences, item-item links, and user-user links.)
8
Classic CF
(Same diagram: users and items connected by observed preferences, item-item links, and user-user links.)
In the end, most models will be hybrid.
9
Collaborative Filtering
The CF process: your opinions on items you have experienced, combined with the community's opinions, produce predictions for items you have not yet seen.
10
Find a Restaurant!
Restaurants: Pizza Pipeline, Local Boyz, El Tapio, Adinky Deli, Izzys, Cha-da Thai
Ratings are letter grades (A–F); not everyone has rated every restaurant:
Jon: D A B D ? ?
Tami: A F D F
Mickey: A A A A A A
Goofy: D A C
John: A C A C A
Ben: F A F
Nathan: D A A
11
PizzaPipeline
LocalBoyz
ElTapio
AdinkyDeli
Izzys Cha-daThai
Jon D A B D ? ?Tami A F D FMickey A A A A A AGoofy D A CJohn A C A C ABen F A FNathan D A A
Find a Restaurant!Find a Restaurant!
12
PizzaPipeline
LocalBoyz
ElTapio
AdinkyDeli
Izzys Cha-daThai
Jon D A B D ? ?Tami A F D FMickey A A A A A AGoofy D A CJohn A C A C ABen F A FNathan D A A
Find a Restaurant!Find a Restaurant!
13
PizzaPipeline
LocalBoyz
ElTapio
AdinkyDeli
Izzys Cha-daThai
Jon D A B D ? ?Tami A F D FMickey A A A A A AGoofy D A CJohn A C A C ABen F A FNathan D A A
Find a Restaurant!Find a Restaurant!
14
PizzaPipeline
LocalBoyz
ElTapio
AdinkyDeli
Izzys Cha-daThai
Jon D A B D ? ?Tami A F D FMickey A A A A A AGoofy D A CJohn A C A C ABen F A FNathan D A A
Find a Restaurant!Find a Restaurant!
15
Find a Restaurant!
Jon's row becomes: D A B D A ? (CF predicts an A for Izzys)
16
Find a Restaurant!
Jon's row becomes: D A B D A F (CF also predicts an F for Cha-da Thai)
17
Advantages of Pure CF
• No expensive and error-prone user attributes or item attributes
• Incorporates quality and taste
• Works on any rate-able item
• One data model => many content domains
• Serendipity
• Users understand it!
18
Predictive Algorithms for Collaborative Filtering
19
Predictive Algorithms for Collaborative Filtering
• Frequently proposed taxonomy for collaborative filtering systems
  – Model-based methods
    • Build a model offline
    • Use the model to generate recommendations
    • Original data not needed at predict-time
  – Instance-based methods
    • Use the ratings directly to generate recommendations
20
Model-Based Algorithms
• Probabilistic Bayesian approaches, clustering, PCA, SVD, etc.
• Key ideas
  – Reduced-dimension representations (aggregations) of the original data
  – Ability to reconstruct an approximation of the original data
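To make the key ideas concrete, here is a minimal sketch (not any particular published algorithm) of the reduced-dimension idea: fill the missing cells of a toy ratings matrix with the global mean, take a truncated SVD, and read predictions off the low-rank reconstruction. The toy data and the mean imputation are illustrative assumptions.

```python
import numpy as np

# Toy user-by-item ratings matrix; 0 marks an unrated cell (assumption).
R = np.array([
    [4.0, 5.0, 1.0, 0.0],
    [5.0, 4.0, 0.0, 1.0],
    [1.0, 0.0, 5.0, 4.0],
])
mask = R > 0
filled = np.where(mask, R, R[mask].mean())   # naive mean imputation

# Truncated SVD: keep k dimensions, reconstruct a rank-k approximation.
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(approx[0, 3])   # the model's prediction for user 0's unrated item 3
```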
21
Stereotypical model-based approaches
• Lower dimensionality => faster performance
• Can explain recommendations
• Can over-generalize
• Not using the latest data
• Force a choice of aggregation dimensions ahead of time
22
Instance-Based Methods
• Primarily nearest-neighbor approaches
• Key ideas
  – Predict over raw ratings data (sometimes called memory-based methods); see the sketch below
  – Highly personalized recommendations
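As a concrete illustration, here is a minimal sketch of classic user-based nearest-neighbor prediction: a user's predicted rating for an item is that user's mean rating plus a Pearson-weighted average of the neighbors' mean-centered ratings of the item. The toy ratings dictionary and the function names are assumptions for illustration, not CoFE's API.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items two users have both rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = sqrt(sum((a[i] - ma) ** 2 for i in common)) \
        * sqrt(sum((b[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Predict `user`'s rating of `item` from similarity-weighted neighbors."""
    mu = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = pearson(ratings[user], r)
        num += w * (r[item] - sum(r.values()) / len(r))  # mean-centered
        den += abs(w)
    return mu + num / den if den else mu

# Toy data (assumption): user -> {item: rating on a 0-4 scale}.
ratings = {"jon":    {"pizza": 1, "boyz": 4, "tapio": 3},
           "mickey": {"pizza": 4, "boyz": 4, "tapio": 4, "izzys": 4},
           "ben":    {"pizza": 0, "boyz": 4, "izzys": 0}}
print(predict(ratings, "jon", "izzys"))
```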
23
Stereotypical instance-based approaches
• Use the most up-to-date ratings
• Are simple and easy to explain to users
• Are unstable when there are few ratings
• Have linear (w.r.t. users and items) run-times
• Allow a different aggregation method for each user, possibly chosen at runtime
24
Evaluating CF Recommender Systems
25
Evaluation – User Tasks
• Evaluation depends on the user task
• Most common tasks
  – Annotation in context: predict ratings for individual items
  – Find good items: produce top-N recommendations
• Other possible tasks
  – Find all good items
  – Recommend sequence
  – Many others…
26
Novelty and Trust - Confidence
• Tradeoff
  – High-confidence recommendations
    • Recommendations are obvious
    • Low utility for the user
    • However, they build trust
  – Recommendations with high predictions yet lower confidence
    • Higher variability of error
    • Higher novelty => higher utility for the user
27
(Diagram: the community's ratings data is split into a training set and a test set, with a group of test users held out for evaluation.)
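A minimal sketch of this setup, assuming a dict-of-dicts ratings store and a per-user holdout fraction (both illustrative choices, not a standard API):

```python
import random

def split_ratings(ratings, test_users, holdout=0.2, seed=42):
    """Hold out a fraction of each test user's ratings; train on the rest."""
    rng = random.Random(seed)
    train, test = {}, {}
    for user, items in ratings.items():
        if user in test_users:
            k = max(1, int(holdout * len(items)))
            held = set(rng.sample(sorted(items), k))
            test[user] = {i: items[i] for i in held}
            train[user] = {i: r for i, r in items.items() if i not in held}
        else:
            train[user] = dict(items)   # community users: all ratings train
    return train, test
```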
28
Predictive Accuracy Metrics
• Mean absolute error (MAE)
• Most common metric
• Characteristics
  – Assumes errors at all levels in the ranking have equal weight
  – Sensitive to small changes
  – Good for the “Annotate in Context” task
  – May not be appropriate for the “Find Good Items” task
$$|E| = \frac{\sum_{i=1}^{N} |p_i - r_i|}{N}$$

where $p_i$ is the predicted rating and $r_i$ the user's actual rating for item $i$, over $N$ rated items.
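In code the metric is nearly a one-liner; a minimal sketch over parallel lists of predictions and withheld ratings (names are illustrative):

```python
def mae(predictions, ratings):
    """Mean absolute error over N (prediction, rating) pairs."""
    assert len(predictions) == len(ratings)
    return sum(abs(p - r) for p, r in zip(predictions, ratings)) / len(ratings)

print(mae([4.1, 2.5, 3.0], [4, 3, 5]))  # 0.867 on this toy data
```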
29
Classification Accuracy Metrics
• Precision/Recall
  – Precision: ratio of “good” items recommended to the number of items recommended
  – Recall: ratio of “good” items recommended to the total number of “good” items
• Characteristics
  – Simple, easy to understand
  – Binary classification of “goodness”
  – Appropriate for “Find Good Items”
  – Can be dangerous due to lack of ratings for recommended items
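A minimal sketch, assuming a binary notion of “good” items (e.g., those the user rated above some threshold); the function name and inputs are illustrative:

```python
def precision_recall(recommended, good):
    """Precision and recall of a recommendation list against 'good' items."""
    hits = len(set(recommended) & set(good))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(good) if good else 0.0
    return precision, recall

print(precision_recall(["a", "b", "c", "d"], ["b", "d", "e"]))  # (0.5, 0.667)
```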
30
ROC Curves
• “Relative Operating Characteristic” or “Receiver Operating Characteristic”
• Characteristics
  – Binary classification
  – Not a single-number metric
  – Covers performance of the system at all points in the recommendation list
  – More complex
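A minimal sketch of how the curve is traced: sort items by predicted relevance, treat each prefix of the ranking as a filter cutoff, and record a (false-positive rate, true-positive rate) point at each step. The function and its inputs are illustrative assumptions.

```python
def roc_points(scores, relevant):
    """ROC points for a ranking; scores: item -> score, relevant: set of items."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    pos = sum(1 for i in ranked if i in relevant)
    neg = len(ranked) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for item in ranked:              # each prefix of the ranking is a cutoff
        if item in relevant:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg if neg else 0.0, tp / pos if pos else 0.0))
    return points
```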
31
Figure 1. A possible representation of the density functions for relevant and non-relevant items, plotted against predicted level of relevance; a vertical filter cutoff separates recommended from non-recommended items.
33
Prediction-to-Rating Correlation Metrics
• Pearson, Spearman, Kendall
• Characteristics
  – Compare a non-binary ranking to a non-binary ranking
  – Rank correlation metrics suffer from “weak orderings”
  – Can only be computed on rated items
  – Provide a single score
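A minimal sketch using SciPy (an assumed dependency), computing all three correlations between predictions and a user's actual ratings on the rated items:

```python
from scipy.stats import pearsonr, spearmanr, kendalltau

predictions = [4.2, 3.1, 4.9, 2.0, 3.6]
ratings     = [4,   3,   5,   1,   4]    # only items the user actually rated

print(pearsonr(predictions, ratings)[0])    # linear correlation
print(spearmanr(predictions, ratings)[0])   # rank correlation
print(kendalltau(predictions, ratings)[0])  # rank correlation (pairwise)
```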
34
Half-life Utility Metric
• Characteristics
  – Explicitly incorporates the idea of decreasing user utility
  – Tuning parameters reduce comparability
  – Weak orderings can result in different utilities for the same system ranking
  – All items rated less than the max contribute equally
  – Only metric to really consider non-uniform utility
$$R_a = \sum_{j} \frac{\max(r_{a,j} - d,\; 0)}{2^{(j-1)/(\alpha-1)}}$$

where $r_{a,j}$ is user $a$'s rating of the item at rank $j$, $d$ is the neutral (“don't care”) rating, and $\alpha$ is the half-life: the rank whose item has half the weight of the top-ranked item.
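A minimal sketch of the formula, with ratings listed in recommended order; the default values of d and alpha are illustrative assumptions:

```python
def half_life_utility(ranked_ratings, d=3.0, alpha=5.0):
    """Sum each rank-j rating's utility max(r - d, 0), halved every alpha-1 ranks."""
    return sum(max(r - d, 0.0) / 2 ** ((j - 1) / (alpha - 1))
               for j, r in enumerate(ranked_ratings, start=1))

print(half_life_utility([5, 4, 2, 5]))  # user's ratings in recommended order
```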
35
Does it Matter What Metric You Use?
• An empirical study to gain some insight…
36
Analysis of 432 variations of an algorithm on a 100,000-rating movie dataset.
37
Comparison among results provided by all the per-user correlation metrics and the mean average precision per user metric. These metrics have strong linear
relationships with each other.
39
Comparison between metrics that are averaged overall rather than per user. Note the linear relationship between the different metrics.
40
A comparison of representative metrics from the three subsets depicted in the previous slides. Within each subset the metrics strongly agree, but this figure shows that metrics from different subsets do not correlate well.
41
Does it Matter What Metric You Use?
• Yes.
42
Want to try CF?
• CoFE “Collaborative Filtering Engine”
  – Open-source Java
  – Easy to add new algorithms
  – Includes testing infrastructure (this month)
  – Reference implementations of many popular CF algorithms
  – One high-performance algorithm
    • Production ready (see Furl.net)
http://eecs.oregonstate.edu/iis/CoFE
43
Improving Web Search Using CF
With Janet Webster, OSU Libraries
44
Controversial Claim
• Improvements in text analysis will not substantially improve the search experience
• Yet the field focuses on improving results as measured by the Mean Average Precision (MAP) metric
45
Evidence of the Claim
• Human-subjects study by Turpin and Hersh (SIGIR 2001)
  – Compared human performance with:
    • a 1970s search model (basic TF/IDF)
    • a recent OKAPI search model with greatly improved MAP
  – Task: locating medical information
  – Result: no statistical difference
46
Bypass the Hard Problem!
• The hard problem: automatic analysis of text, software “understanding” language
• We propose: let humans assist with the analysis of text!
  – Enter collaborative filtering
50
The Human Element
• Capture and leverage the experience of every user
  – Recommendations are based on human evaluation
    • Explicit votes
    • Inferred (implicit) votes
• Recommend (question, document) pairs, as sketched below
  – Not just documents
  – Humans can determine whether questions are similar
• The system gets smarter with each use
  – Not just with each new document
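A minimal, hypothetical sketch of the idea as described on this slide: store votes on (question, document) pairs and, for a new question, recommend documents that earned positive votes on similar past questions. The word-overlap similarity, function names, and toy data are illustrative assumptions, not SERF's actual implementation.

```python
def similar(q1, q2):
    """Crude word-overlap (Jaccard) similarity between two questions."""
    w1, w2 = set(q1.lower().split()), set(q2.lower().split())
    return len(w1 & w2) / len(w1 | w2) if w1 | w2 else 0.0

def recommend(votes, question, threshold=0.3):
    """votes: list of (question, document, vote) with vote in {+1, -1}."""
    scores = {}
    for past_q, doc, vote in votes:
        sim = similar(question, past_q)
        if sim >= threshold:
            scores[doc] = scores.get(doc, 0.0) + sim * vote
    return sorted((d for d, s in scores.items() if s > 0),
                  key=lambda d: -scores[d])

votes = [("how do I renew a book", "docA", +1),
         ("how do I renew a library book", "docA", +1),
         ("library hours", "docB", +1)]
print(recommend(votes, "renew a book online"))  # ['docA']
```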
51
Research Issues
• Basic issues
  – Is the concept sound?
  – What are the roadblocks?
• More mathematical issues
  – Algorithms for ranking recommendations (question, document, votes)
  – Robustness with unreliable data
• Text/content analysis
  – Improved NLP for matching questions
  – Incorporating more information into the information context
• More social issues
  – Training users for a new paradigm
  – Privacy
  – Integrating with existing reference library practices and systems
  – Offensive material in questions
  – Most effective user interface metaphors
52
Initial Results
53
Initial Results
Three months of SERF usage: 1194 search transactions
• Only Google results: 706 transactions (59.13%)
  – Clicked: 172 (24.4%); no clicks: 534 (75.6%)
  – Average visited documents: 1.598
  – Average rating: 14.727 (49% voted as useful)
• Google results + recommendations: 488 transactions (40.87%)
  – Clicked: 197 (40.4%); no clicks: 291 (59.6%)
  – Average visited documents: 2.196
  – Average rating: 20.715 (69% voted as useful)
  – First click was a recommendation: 141 (71.6%); first click was a Google result: 56 (28.4%)
(Rating scale: a “yes” vote counts as 30, a “no” vote as 0.)
58
SERF Project Summary
• No large leaps in language understanding expected
  – Understanding the meaning of language is *very* hard
• Collaborative filtering (CF) bypasses this problem
  – Humans do the analysis
• The technology is widely applicable
59
Talk Messages
• A model for understanding the options in recommender systems
• Survey of popular predictive algorithms for CF
• Survey of evaluation metrics
• Empirical data showing that the choice of metric can matter
• Evidence that CF could significantly improve web search
60
Links & Contacts
• CoFE: http://eecs.oregonstate.edu/iis/CoFE
• SERF: http://osulibrary.oregonstate.edu/
• [email protected]
• +1 (541) 737-8894