collaborative filtering recommendation system
Collaborative Filtering Recommender System
VIMALENDU SHEKHAR, MILIND GOKHALE, RENUKA DESHMUKH
Recommender systems are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item.
They help users decide what to wear, what to buy, which stocks to purchase, etc.
They are applied in a variety of domains such as movies, books, and research articles.
Before computer-based recommenders, people relied on recommendations from their peers.
This method doesn’t take the personal preferences of the user into account.
It also limits the search space.
Computer-based recommender systems overcome this by expanding the search space and providing more fine-tuned results.
Tasks of Recommender Systems
Predict Task- Predict the user’s preference for an item.
Recommend Task- Produce the best ranked list of n items for the user’s need.
Collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.
For recommender systems, collaborative filtering is a method of making automatic predictions about the interests of a user by collecting preference information from many users.
Based on the idea that people who agreed in their evaluation of certain items in the past are likely to agree again in the future.
User - User Collaborative Filtering
Basic Idea- Find other users whose past rating behavior is similar to that of the current user and use their ratings on other items to predict what the current user will like.
Required: A ratings matrix and a similarity function that computes the similarity between two users.
The selection of neighbors can be random or based on a threshold value.
User u’s prediction for item i is denoted p_{u,i}.
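The idea above can be sketched in Python. This is a minimal illustration, not the authors' method: it assumes Pearson correlation as the similarity function, a similarity threshold for neighbor selection, and one common formulation of p_{u,i} (the user's mean rating plus a similarity-weighted average of the neighbors' mean-centered ratings). The names `pearson_sim`, `predict_user_user`, and the toy matrix `R` are hypothetical.

```python
import numpy as np

def pearson_sim(a, b):
    """Pearson correlation over the items both users rated (NaN = unrated)."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < 2:
        return 0.0
    x = a[both] - a[both].mean()
    y = b[both] - b[both].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0

def predict_user_user(R, u, i, threshold=0.0):
    """Predict p_{u,i}: u's mean rating plus the similarity-weighted
    average of the neighbors' mean-centered ratings of item i."""
    mean_u = np.nanmean(R[u])
    num, den = 0.0, 0.0
    for v in range(R.shape[0]):
        if v == u or np.isnan(R[v, i]):
            continue  # skip the user themselves and users who did not rate i
        s = pearson_sim(R[u], R[v])
        if s > threshold:  # threshold-based neighbor selection (assumption)
            num += s * (R[v, i] - np.nanmean(R[v]))
            den += abs(s)
    return float(mean_u) if den == 0 else float(mean_u + num / den)

# Toy ratings matrix (rows = users, columns = items, NaN = unrated).
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, 5.0, 5.0, 1.0],
    [1.0, 2.0, 1.0, 5.0],
])
pred = predict_user_user(R, u=0, i=2)
```

Here user 1 agrees with user 0 and becomes the only neighbor; user 2 is negatively correlated and falls below the threshold.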
Item-Item Collaborative Filtering
Basic Idea- Recommend items that are similar to the user’s highly preferred items.
Provides performance gains by lending itself well to pre-computing the similarity matrix.
User u’s prediction for item i is denoted p_{u,i}.
Cosine similarity or conditional probability is used to compute item-item similarity.
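A minimal Python sketch of the item-item approach follows. It assumes cosine similarity between item columns (with 0 standing for "unrated") and predicts p_{u,i} as a similarity-weighted average of the user's ratings on the k most similar items they have rated. The function names, the value of k, and the toy matrix are illustrative assumptions.

```python
import numpy as np

def item_cosine_similarity(R):
    """R: users x items, 0 = unrated. Returns an items x items cosine matrix.
    This matrix can be pre-computed offline."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0           # avoid division by zero for unrated items
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)          # an item is not its own neighbor
    return S

def predict_item_item(R, S, u, i, k=2):
    """Weighted average of u's ratings on the k rated items most similar to i."""
    rated = np.flatnonzero(R[u])
    rated = rated[rated != i]
    top = rated[np.argsort(S[i, rated])[::-1][:k]]
    weights = S[i, top]
    if weights.sum() == 0:
        return float("nan")           # no usable neighbors
    return float(weights @ R[u, top] / np.abs(weights).sum())

R = np.array([
    [5.0, 5.0, 0.0],
    [4.0, 4.0, 1.0],
    [1.0, 1.0, 5.0],
])
S = item_cosine_similarity(R)
pred = predict_item_item(R, S, u=0, i=2)
```

Because `S` depends only on the ratings matrix, it can be computed once and reused for every prediction, which is the performance gain mentioned above.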
Problem:
User-User or Item-Item CF: The user-item ratings domain is a high-dimensional vector space, and thus contains redundancy.
Information Retrieval: The term-document matrix is likewise a high-dimensional representation of terms and documents, and suffers from synonymy, polysemy, and noise.
Can we reduce the number of dimensions to a constant k?
Truncated SVD – Dimensionality reduction by singular value decomposition.
Applications: Information retrieval (LSA/LSI, latent semantic analysis/indexing) and CF.
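As a rough illustration of truncated SVD, the sketch below keeps only the k largest singular values of a small, fully observed ratings matrix. It assumes missing ratings have already been filled in somehow (for example with item means), which the slides do not specify; `truncated_svd` and the toy data are hypothetical.

```python
import numpy as np

def truncated_svd(R, k):
    """Return the rank-k factors (U_k, s_k, Vt_k) of R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Toy dense ratings matrix (4 users x 3 items); assumed to have no gaps.
R = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])
U_k, s_k, Vt_k = truncated_svd(R, k=2)

# Rank-2 approximation of R: users and items now live in a 2-dimensional space.
R_k = U_k @ np.diag(s_k) @ Vt_k
```

Each row of `U_k` is a user and each column of `Vt_k` is an item, both expressed in the same k latent dimensions.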
The core idea of probabilistic methods is to compute either P(i|u), the probability that user u will purchase or view item i, or the probability distribution P(r_{u,i}|u) over user u’s rating of item i.
Cross-Sell System: uses pairwise conditional probabilities with the naïve Bayes assumption to make recommendations in unary e-commerce domains.
Based on user purchase histories, the algorithm estimates P(a|b) (the probability that a user purchases a given that they have purchased b) for each pair of items a, b.
The user’s currently viewed item or shopping basket is combined with these pairwise probabilities to recommend items optimizing the expected value of site-defined objective functions.
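The pairwise estimate P(a|b) can be sketched as a simple co-occurrence count over purchase histories. This is an illustration only: it estimates P(a|b) as (number of baskets containing both a and b) / (number of baskets containing b), and it ranks candidates by that probability alone rather than by a site-defined objective function. The function names and basket data are hypothetical.

```python
from collections import Counter
from itertools import permutations

def pairwise_conditional_probs(baskets):
    """Estimate P(a|b) for every ordered pair of items seen together."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)                        # unary data: bought or not
        item_counts.update(items)
        pair_counts.update(permutations(items, 2)) # ordered pairs (a, b)
    return {(a, b): n / item_counts[b] for (a, b), n in pair_counts.items()}

def recommend(current_item, probs, n=3):
    """Rank items a by P(a | current_item); ties broken alphabetically."""
    candidates = [(a, p) for (a, b), p in probs.items() if b == current_item]
    ranked = sorted(candidates, key=lambda t: (-t[1], t[0]))
    return [a for a, _ in ranked[:n]]

baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread"],
    ["jam", "butter"],
]
probs = pairwise_conditional_probs(baskets)
suggestions = recommend("jam", probs)
```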
Probabilistic Matrix Factorization
Probabilistic latent semantic analysis/indexing (PLSA/PLSI)
PLSA decomposes the probability P(i|u) by introducing a set Z of latent factors. Here z is a factor on the basis of which user u decides which item i to view or purchase.
P(i|u) is therefore P(i|u) = \sum_{z \in Z} P(i|z) P(z|u).
Thus users are represented as a mixture of preference profiles (feature preferences), and item preferences are attributed to the preference profiles rather than directly to the users.
Û is the matrix of the mixtures of preference profiles for each user.
T̂ is the matrix of preference profile probabilities of selecting various items.
Σ is a diagonal matrix such that σ_z = P(z).
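To make the mixture idea concrete, the sketch below computes P(i|u) as a sum over latent factors of P(i|z) P(z|u), written as a matrix product. The probability tables are hand-picked, hypothetical values; in practice they would be fitted to data (typically with expectation maximization), which is not shown here.

```python
import numpy as np

def plsa_item_probs(P_z_given_u, P_i_given_z):
    """P(i|u) = sum over z of P(z|u) * P(i|z), for all users and items at once."""
    return P_z_given_u @ P_i_given_z

# Hypothetical parameters: 2 users, 2 latent preference profiles, 3 items.
P_z_given_u = np.array([
    [0.9, 0.1],   # user 0 mostly follows profile 0
    [0.2, 0.8],   # user 1 mostly follows profile 1
])
P_i_given_z = np.array([
    [0.7, 0.2, 0.1],   # profile 0 favors item 0
    [0.1, 0.3, 0.6],   # profile 1 favors item 2
])

P_i_given_u = plsa_item_probs(P_z_given_u, P_i_given_z)
```

Each row of `P_i_given_u` is a probability distribution over items for one user, so the highest entries are natural recommendation candidates.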
Hybrids can be particularly beneficial when the algorithms involved cover different use cases or different aspects of the data set.
7 Classes of Hybrid Recommenders
Weighted – takes scores produced by several recommenders and combines them
Switching – switches between different algorithms according to the context
Mixed – presents several recommender results but does not combine them into a single list
Feature-combining – uses multiple recommendation data sources to get a single meta-recommender algorithm
Cascading – chains the algorithms (output of one as input to the other)
Feature-augmenting – uses the output of one algorithm as one of the inputs to another algorithm
Meta-level – trains a model using one algorithm and gives it as input to another algorithm
Example: Netflix Prize – Feature-weighted linear stacking uses functions g_j of item meta-features, such as number of ratings or genre, to alter the blending ratio of the various algorithms’ predictions on an item-by-item basis.
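A small sketch of the blending idea: each algorithm's weight is a linear function of the item's meta-feature values g_j, so the blending ratio changes from item to item. The coefficient matrix `V` is hand-set and hypothetical (in practice such coefficients would be learned, for example by linear regression), and `fwls_blend` is an illustrative name, not an API from any library.

```python
import numpy as np

def fwls_blend(preds, meta, V):
    """Blend per-algorithm predictions with meta-feature-dependent weights.

    preds: (n_algorithms,) predictions for one user-item pair
    meta:  (n_meta,) meta-feature values g_j for the item
    V:     (n_algorithms, n_meta) coefficients; weight_k = sum_j V[k, j] * g_j
    """
    weights = V @ meta
    return float(weights @ preds)

# Two algorithms; meta-features are a constant 1 and a popularity indicator.
V = np.array([
    [0.5,  0.1],   # algorithm 0 gains weight on popular items
    [0.5, -0.1],   # algorithm 1 loses weight on popular items
])
preds = np.array([4.0, 3.0])

unpopular = fwls_blend(preds, np.array([1.0, 0.0]), V)  # equal blend
popular = fwls_blend(preds, np.array([1.0, 2.0]), V)    # leans on algorithm 0
```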
User-based Algo: more tractable when there are more items than users
Item-based Algo: more tractable when there are more users than items
Minimal offline computation but higher online computation
Matrix Factorization methods:
- Require an expensive offline model
+ Fast for online use
+ Reduced impact of ratings noise
+ Reduced direct impact of users’ ratings on each other’s ratings
Probabilistic Models: applicable when the recommendation process should follow models of user behavior.
Evaluating Recommender Systems
It can be costly to try algorithms on real sets of users and measure the effects.
Offline Algorithmic Evaluations: Pre-test algorithms in order to understand user testing.
It is beneficial for performing direct, objective comparison of different algorithms in a reproducible fashion.
EachMovie: by DEC Systems Research Center – 2.8M user ratings of movies
MovieLens: three data sets – 100K timestamped user ratings; 1M ratings; and 10M ratings plus 100K timestamped records of users tagging movies
Jester: ratings of 100 jokes from 73,421 users between April ’99 and May ’03, and ratings of 150 jokes from 63,974 users between Nov ’06 and May ’09
BookCrossing: 1.1M ratings from 279K users for 271K books
Netflix: 100M datestamped ratings of 17K movies from 480K users
Offline Evaluation Structure
The users in the data set are split into two groups: a training set and a test set.
A recommender model is built against the training set.
The ratings of the users in the test set are then split into two parts: a query set and a target set.
The recommender is given the query set as a user history and asked to recommend items or to predict ratings for the items in the target set; it is then evaluated on how well its recommendations or predictions match those held out in the target set.
This whole process is frequently repeated, as in k-fold cross-validation, by splitting the users into k equal sets.
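The split described above can be sketched as follows. This is one possible arrangement under stated assumptions: a single random train/test split of users (not the full k-fold loop), and a random split of each test user's ratings into query and target halves. The function name, the fractions, and the data layout (`{user: {item: rating}}`) are hypothetical.

```python
import random

def split_users(ratings_by_user, test_fraction=0.2, query_fraction=0.5, seed=0):
    """Split users into train/test, then each test user's ratings
    into a query set (shown to the recommender) and a target set (held out)."""
    rng = random.Random(seed)
    users = sorted(ratings_by_user)
    rng.shuffle(users)
    n_test = max(1, int(len(users) * test_fraction))
    test_users, train_users = users[:n_test], users[n_test:]

    train = {u: ratings_by_user[u] for u in train_users}
    query, target = {}, {}
    for u in test_users:
        items = sorted(ratings_by_user[u])
        rng.shuffle(items)
        cut = int(len(items) * query_fraction)
        query[u] = {i: ratings_by_user[u][i] for i in items[:cut]}
        target[u] = {i: ratings_by_user[u][i] for i in items[cut:]}
    return train, query, target

# Toy data: 10 users, each with 4 rated items.
ratings_by_user = {
    f"u{n}": {f"i{m}": 3 for m in range(4)} for n in range(10)
}
train, query, target = split_users(ratings_by_user)
```

Repeating this with k disjoint groups of test users, each used once, gives the k-fold variant.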
Prediction Accuracy: MAE
(MAE) Mean Absolute Error: the average absolute difference between predicted and actual ratings.
Example: on a 5-star scale [1, 5], an MAE of 0.7 means that the algorithm, on average, was off by 0.7 stars.
This is useful for understanding the results in a particular context, but makes it difficult to compare results across data sets, as they have differing rating ranges.
(NMAE) Normalized Mean Absolute Error: divides MAE by the range of possible ratings, giving a common metric range of [0, 1].
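A minimal sketch of both metrics, assuming predictions and actual ratings are given as parallel lists; the function names and sample values are illustrative.

```python
def mae(predictions, ratings):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predictions, ratings)) / len(ratings)

def nmae(predictions, ratings, r_min, r_max):
    """MAE divided by the rating range, so the result lies in [0, 1]."""
    return mae(predictions, ratings) / (r_max - r_min)

predictions = [4, 3, 5]
ratings = [5, 3, 3]

error = mae(predictions, ratings)                 # absolute errors: 1, 0, 2
normalized = nmae(predictions, ratings, 1, 5)     # 5-star scale [1, 5]
```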
Prediction Accuracy: RMSE
(RMSE) Root Mean Square Error: amplifies the larger absolute errors.
Netflix Prize: a $1M prize was awarded for a 10% improvement in RMSE over Netflix’s internal algorithm.
Further, RMSE can also be normalized like NMAE by dividing by the range of the rating scale.
Out of the three techniques, which one to use depends on how the results are to be compared.
Mostly these metrics are used for evaluation of predict tasks.
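The sketch below shows how RMSE amplifies large errors: two sets of predictions with the same mean absolute error (1 star) get different RMSE values. The function name and sample values are illustrative.

```python
import math

def rmse(predictions, ratings):
    """Root mean square error between predicted and actual ratings."""
    squared = [(p - r) ** 2 for p, r in zip(predictions, ratings)]
    return math.sqrt(sum(squared) / len(squared))

actual = [0, 0, 0, 0]
evenly_off = [1, 1, 1, 1]   # four errors of 1 star each
one_big_miss = [0, 0, 0, 4] # a single error of 4 stars

rmse_even = rmse(evenly_off, actual)
rmse_big = rmse(one_big_miss, actual)
```

Both prediction sets are off by 1 star on average, but the single 4-star miss doubles the RMSE.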
Accuracy over time
Temporal versions of MAE and RMSE have been introduced to measure the accuracy of recommender systems over time, as more users are added to the system.
Hence the timestamped datasets prove to be very useful for measuring accuracy over time.
n_t - number of ratings computed up through time t
t_{u,i} - the time of rating r_{u,i}
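A sketch of a time-averaged RMSE, assuming the form in which the squared errors of all ratings with t_{u,i} ≤ t are averaged over n_t before taking the square root. The record layout `(timestamp, prediction, rating)` and the function name are assumptions made for illustration.

```python
import math

def temporal_rmse(records, t):
    """RMSE over all ratings made up through time t.

    records: iterable of (timestamp, prediction, rating) tuples
    Returns None when no rating has a timestamp <= t.
    """
    squared = [(p - r) ** 2 for ts, p, r in records if ts <= t]
    if not squared:
        return None
    n_t = len(squared)  # number of ratings up through time t
    return math.sqrt(sum(squared) / n_t)

records = [
    (1, 4.0, 5.0),
    (2, 3.0, 3.0),
    (3, 5.0, 1.0),
]
early = temporal_rmse(records, t=2)  # uses the first two ratings
late = temporal_rmse(records, t=3)   # uses all three ratings
```

Evaluating the function at increasing values of t traces how accuracy evolves as the data set grows.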
Decision Support Metrics
This framework examines the capacity for a retrieval system to accurately identify resources relevant to a query, measuring separately its capacity to find all relevant items and avoid finding irrelevant items.
A confusion matrix is used for measuring this.
High Precision System: Example – Movie recommendation
High Recall System: Example – Legal precedent search
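Precision and recall can be computed from the true-positive cell of the confusion matrix. A minimal sketch, assuming the recommended and relevant items are given as collections of item IDs (names and data are illustrative):

```python
def precision_recall(recommended, relevant):
    """Precision: share of recommended items that are relevant.
    Recall: share of relevant items that were recommended."""
    recommended, relevant = set(recommended), set(relevant)
    true_positives = len(recommended & relevant)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

recommended = ["a", "b", "c", "d"]
relevant = ["a", "c", "e"]
precision, recall = precision_recall(recommended, relevant)
```

A movie recommender mainly needs high precision (few bad suggestions), while a legal precedent search mainly needs high recall (no relevant case missed).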
Offline evaluation, though useful, is limited to operating on past data.
Recommender systems with similar metric performance can still give different results, and a decrease in error may or may not make the system better at meeting user needs.
For this, online user testing is needed.
Field Trials: the recommender is deployed in a live system and users’ interactions with the system are recorded.
Virtual Lab Studies: these generally have a small user base who are invited to participate, instead of the live user base.
Building a Data Set
The need for preference data can be decomposed into two types of information needs:
User information: the user’s preferences
Item information: what kinds of users like or dislike each item
User–item preferences: Set of characteristics, user preferences for those characteristics, and those characteristics’ applicability to various items.
Item–item model: What items are liked by the same users as well as the current user’s preferences.
Cold-start problem: providing recommendations when there is not yet data available.
Item cold-start: A new item has been added to the database (e.g., when a new movie or book is released) but has not yet received enough ratings to be recommendable.
User cold-start: A new user has joined the system but their preferences are not yet known.
Sources of Preference Data
Preference data (ratings) comes from two primary sources.
Explicit ratings: Preferences the user has explicitly stated for particular items.
Implicit ratings: Preferences inferred by the system from observable user activity, such as purchases or clicks.
Many recommender systems obtain ratings by having users explicitly state their preferences for items. These stated preferences are then used to estimate the user’s preference for items they have not rated.
Drawback: There can, for many reasons, be a discrepancy between what the users say and what they do.
Preferences can also be inferred from user behavior.
Usenet – Reading Domain
Time spent reading
Saving or replying
Copying text into new articles
Mentions of URLs
Intelligent Music Management System
Infers the user’s preference for various songs in their library as they skip them or allow them to play to completion
Page views
Item purchases (as gifts or for personal use)
Shared accounts can be misleading
Types of preference data
GroupLens used a 5-star scale
Jester uses a semi-continuous −10 to +10 graphical scale
Ringo used a 7-star scale
Pandora music uses a “like”/“dislike” method
Dealing with Noise
Noise in ratings can be introduced by normal human error and other factors.
Natural noise in ratings can be detected by asking users to re-rate items.
Another approach is to detect and ignore noisy ratings: compare each rating to the user’s predicted preference for that item, and discard from the prediction and recommendation process any rating that differs from the prediction by more than some threshold.
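The threshold approach can be sketched as a simple filter. The `predict` callable stands in for whatever recommender produces the predicted preference, and the threshold of 2.0 is an arbitrary illustrative value; both are assumptions, as is the `(user, item, rating)` record layout.

```python
def filter_noisy_ratings(ratings, predict, threshold=2.0):
    """Separate ratings that deviate from the predicted preference
    by more than `threshold` from those that do not."""
    kept, discarded = [], []
    for user, item, rating in ratings:
        if abs(rating - predict(user, item)) > threshold:
            discarded.append((user, item, rating))
        else:
            kept.append((user, item, rating))
    return kept, discarded

# Stand-in predictor (hypothetical): always predicts 3.0 stars.
def constant_predictor(user, item):
    return 3.0

ratings = [
    ("u1", "i1", 4.0),   # off by 1.0: kept
    ("u1", "i2", 1.0),   # off by exactly 2.0: kept (does not exceed threshold)
    ("u2", "i1", 5.5),   # off by 2.5: discarded as noise
]
kept, discarded = filter_noisy_ratings(ratings, constant_predictor)
```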