Recommender Systems
Recommender Systems
Based on Rajaraman and Ullman, Mining of Massive Datasets, and Francesco Ricci et al., Recommender Systems Handbook.
Recommender System
All of these thrive on User Generated Content (UGC)!
Recommender System
Central theme:
• Predict ratings for unrated items
• Recommend top-k items
RS – Major Approaches
• Basic question: given a (highly incomplete/sparse) user–item ratings matrix, predict the missing entries.
(Slide shows an example utility matrix: rows are users, columns are items, and only a few ratings per row are known.)
RS – Approaches
• Content-based: how similar is an item to the items the user has rated/liked in the past? – Use metadata for measuring similarity. + Works even when no ratings are available for the affected items. – Requires metadata!
• Collaborative Filtering: identify items (users) with their rating vectors; no need for metadata; but the cold-start problem remains.
RS – Approaches
• CF can be memory-based (as sketched on the previous slide): an item's characteristics are captured by the ratings it has received (its rating vector).
• Or it can be model-based: model user/item behavior via latent factors (to be learned from data). – Dimensionality reduction – the original ratings matrix is usually (very) low rank. Matrix completion:
• using singular value decomposition (SVD). • Using matrix factorization (MF) [and variants].
• MovieLens – example of RS using CF.
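As a rough illustration of the model-based idea, here is a minimal matrix-factorization sketch: plain SGD on the observed entries only. All data, names, and hyperparameters are illustrative assumptions, not the MovieLens system.

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Learn user/item latent factors from observed (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))   # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                  # error on one observed rating
            P[u] += lr * (err * Q[i] - reg * P[u]) # gradient step with L2 regularization
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

# Toy data: 3 users x 3 items, some entries unknown.
obs = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 2), (2, 1, 1), (2, 2, 1)]
P, Q = factorize(obs, 3, 3)
pred = P @ Q.T   # completed matrix: pred[u][i] estimates every rating, known or not
```

The low-rank assumption is exactly what makes `pred = P @ Q.T` fill in the blanks: each user and item is summarized by k numbers instead of a full row or column.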
Collaborative Filtering
Key concepts/questions
• How is user feedback expressed: explicit ratings or implicit signals? • How to measure similarity? • How many nearest neighbors to pick (if memory- or neighborhood-based)? • How to predict unknown ratings? • Distinguished (also called active) user and (target) item.
A Naïve Algorithm (memory-based)
• Find the top-k most similar neighbors of the distinguished user (using the chosen similarity or proximity measure).
• For every item rated by sufficiently many of these neighbors, compute a predicted rating by aggregating the ratings given by the chosen neighbors.
• Sort items by predicted rating and recommend the top items to the user.
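The steps above can be sketched as a toy implementation; the similarity measure (cosine over co-rated items), the choice of k, and the data are all illustrative assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two rating rows; 0 means 'not rated'."""
    mask = (a > 0) & (b > 0)              # compare only co-rated items
    if not mask.any():
        return 0.0
    va, vb = a[mask], b[mask]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

def predict(R, u, i, k=2):
    """Predict R[u, i] from the top-k most similar users who rated item i."""
    sims = [(cosine(R[u], R[v]), v) for v in range(len(R)) if v != u and R[v, i] > 0]
    top = sorted(sims, reverse=True)[:k]                  # top-k neighbors
    num = sum(s * R[v, i] for s, v in top)                # similarity-weighted ratings
    den = sum(abs(s) for s, _ in top)
    return num / den if den else 0.0

R = np.array([[5, 4, 0],
              [4, 5, 3],
              [1, 2, 5]], dtype=float)
print(round(predict(R, 0, 2, k=2), 2))
```

Note that both neighbors pull the prediction toward their own ratings in proportion to how similar they are to the active user.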
An Example

     HP1  HP2  HP3  TW   SW1  SW2  SW3
A     4              5    1
B     5    5    4
C                    2    4    5
D          3                        3
• Jaccard(A,B) = 1/5 < 2/4 = Jaccard(A,C)! • Cosine – OK, but it ignores users' internal "rating scales" (easy vs. hard graders). • See the Rajaraman et al. book for "rounded" Jaccard/cosine. • A more principled approach: subtract the corresponding user's mean rating from each rating, then apply Jaccard/cosine.
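Assuming this is the well-known example from the Rajaraman–Ullman book (the item labels HP1..SW3 and the quoted Jaccard values match that table), the numbers can be checked directly:

```python
# Users as {item: rating} dicts, following the (assumed) book example.
A = {"HP1": 4, "TW": 5, "SW1": 1}
B = {"HP1": 5, "HP2": 5, "HP3": 4}
C = {"TW": 2, "SW1": 4, "SW2": 5}

def jaccard(x, y):
    """Jaccard similarity of the *sets* of rated items (ratings ignored)."""
    xs, ys = set(x), set(y)
    return len(xs & ys) / len(xs | ys)

print(jaccard(A, B))  # 1/5 = 0.2
print(jaccard(A, C))  # 2/4 = 0.5
```

Jaccard sees only which items were rated, which is exactly why it can rank C closer to A than B is, despite A and B agreeing on rating values.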
An Example
     HP1   HP2   HP3   TW    SW1   SW2   SW3
A    2/3               5/3   -7/3
B    1/3   1/3   -2/3
C                      -5/3  1/3   4/3
D          0                              0
• See what just happened to the ratings! • Users' behaviors and items are now better separated. • Cosine can now be positive or negative: check (A,B) and (A,C).
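The sign change can be verified on the same (assumed) example: mean-centering is applied per user, and missing entries are treated as 0.

```python
import math

users = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
}
# Subtract each user's mean rating from each of their ratings.
centered = {
    u: {i: r - sum(rs.values()) / len(rs) for i, r in rs.items()}
    for u, rs in users.items()
}

def cos(x, y):
    """Cosine similarity of sparse vectors; absent items count as 0."""
    dot = sum(x[i] * y.get(i, 0.0) for i in x)
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny)

print(cos(centered["A"], centered["B"]) > 0)   # True: A and B now agree
print(cos(centered["A"], centered["C"]) < 0)   # True: A and C now disagree
```

After centering, a below-average rating is negative for every user regardless of how harshly they grade, so agreement and disagreement show up as the sign of the dot product.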
Prediction using Memory/Neighborhood-based approaches
• A popular approach uses the Pearson correlation coefficient:

  r̂(u,i) = r̄(u) + [ Σ_{v∈N(u)} sim(u,v) · (r(v,i) − r̄(v)) ] / Σ_{v∈N(u)} |sim(u,v)|

• where sim(u,v) is the Pearson correlation of u's and v's ratings over their co-rated items, i.e., the cosine of the "vectors of deviations from the mean".
• Σ_{v∈N(u)} |sim(u,v)| – the normalization factor.
• See the RecSys Handbook and [Adomavicius and Tuzhilin, TKDE 2005] for alternatives.
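A sketch of this prediction rule on toy data; the user/item names and ratings are made up for illustration.

```python
import math

ratings = {  # user -> {item: rating}
    "u": {"i1": 4, "i2": 5, "i3": 1},
    "v": {"i1": 5, "i2": 5, "i3": 2, "i4": 4},
    "w": {"i1": 2, "i2": 1, "i3": 5, "i4": 2},
}

def mean(rs):
    return sum(rs.values()) / len(rs)

def pearson(a, b):
    """Pearson correlation over co-rated items: cosine of deviation vectors."""
    common = set(a) & set(b)
    ma, mb = mean(a), mean(b)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    da = math.sqrt(sum((a[i] - ma) ** 2 for i in common))
    db = math.sqrt(sum((b[i] - mb) ** 2 for i in common))
    return num / (da * db) if da and db else 0.0

def predict(user, item):
    """User's own mean plus the similarity-weighted average neighbor offset."""
    r = ratings[user]
    nbrs = [(pearson(r, ratings[v]), v)
            for v in ratings if v != user and item in ratings[v]]
    num = sum(s * (ratings[v][item] - mean(ratings[v])) for s, v in nbrs)
    den = sum(abs(s) for s, _ in nbrs)
    return mean(r) + (num / den if den else 0.0)

print(round(predict("u", "i4"), 2))
```

Note the two uses of means: once inside `pearson` (to define the deviation vectors) and once in `predict` (to translate neighbors' offsets back onto the active user's own scale).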
User-User vs. Item-Item
• User-user CF: what we just discussed! • Item-item – dual in principle: find the items most similar to the distinguished item; for every user who did not rate the distinguished item but rated sufficiently many items from its similarity group, compute the prediction analogously.
• In practice, item-item has been found to work better than user-user.
Simpler Alternatives for Rating Estimation
• Simple average of the ratings by the most similar neighbors.
• Weighted average.
• User's mean plus an offset: the weighted average of the offsets (deviations from their own means) of the most similar neighbors (Pearson!).
• Or take the popular vote of the most similar neighbors: e.g., a user has 5 most similar neighbors who have rated the item – some rated it 1; others rated it 3, 4, or 5. – Simple majority decides. – Suppose all weights are 1.0; then ties can arise and tie-breaking is arbitrary.
Item-based CF
• Dual to user-based CF, in principle.
• "People who bought this also bought that."
• Natural connection to association rules (each user = a transaction).
• Predict the unknown rating of a user on an item as the aggregate of that user's ratings on similar items.
• E.g., using mean-centering and the Pearson correlation for item-item similarity:

  r̂(u,i) = r̄(i) + [ Σ_{j∈N(i;u)} sim(i,j) · (r(u,j) − r̄(j)) ] / Σ_{j∈N(i;u)} |sim(i,j)|

where r̄(j) is the mean rating of item j by the various users, sim(i,j) the similarity between items i and j, and the denominator the usual normalization factor.
Item-based CF Computation Illustrated
• Similarities: computing the similarity between all pairs of items is prohibitive!
• But do we need to?
• How efficiently can we compute the similarity of all pairs of items for which the similarity is positive?
(Slide shows a user–item matrix with user u's rated items marked X and a target item i highlighted.)
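One answer the question hints at: only item pairs co-rated by at least one user can have nonzero co-rating similarity, so candidate pairs can be enumerated user by user instead of over all O(n²) pairs. A minimal sketch with illustrative data:

```python
from collections import defaultdict
from itertools import combinations

user_items = {              # user -> items that user has rated
    "u1": ["a", "b", "c"],
    "u2": ["b", "c"],
    "u3": ["d"],
}

co_counts = defaultdict(int)        # (item, item) -> number of co-raters
for items in user_items.values():
    for i, j in combinations(sorted(items), 2):   # pairs within one user's list
        co_counts[(i, j)] += 1

print(dict(co_counts))
```

Item "d" never appears in any pair because nobody co-rated it with anything, so its similarities need never be computed; the counts themselves are also the raw material for co-occurrence-based similarity measures.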
Item-based CF – Recommendation Generation
(Slide shows the same matrix: user u's rated items marked X, with arrows asking which items are similar to each of them.)
How efficiently can we generate recommendations for a given user?
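One common answer, sketched here on assumed data: precompute each item's neighbor list offline, then score only the neighbors of the items the user has already rated, rather than all items.

```python
item_neighbors = {            # item -> most similar items, precomputed offline
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
}
rated = ["a", "b"]            # the given user's items

scores = {}
for item in rated:
    for nbr in item_neighbors.get(item, []):
        if nbr not in rated:                         # don't re-recommend rated items
            scores[nbr] = scores.get(nbr, 0) + 1     # here: simple co-occurrence count

recs = sorted(scores, key=scores.get, reverse=True)
print(recs)
```

The work is proportional to the user's profile size times the neighbor-list length, independent of the total number of items; in a real system the `+ 1` would be replaced by a similarity-weighted rating as in the prediction formula.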
Some empirical facts re. user-based vs. item-based CF
• User profiles are typically thinner than item profiles; depends on application domain. – Certainly holds for movies (Netflix).
• As users provide more ratings, user-user similarities can change more dynamically than item-item similarities.
• Can we precompute item-item sim. and speed up prediction computation?
• What about refreshing sim. against updates? Can we do it incrementally? How often should we do this?
• Why not do this for user-user?
User & Item-based CF are both personalized
• Non-personalized would estimate an unknown rating as a global average.
• Every user gets the same recommendation list, modulo items s/he may have already rated.
• Personalized clearly leads to better predictions.