models for information retrieval and recommendation
TRANSCRIPT
![Page 1: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/1.jpg)
Recommendation and Information Retrieval:
Two sides of the same coin?
Prof.dr.ir. Arjen P. de Vries
CWI, TU Delft, Spinque
![Page 2: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/2.jpg)
Outline
• Recommendation Systems
  – Collaborative Filtering (CF)
• Probabilistic approaches
  – Language modelling for Information Retrieval
  – Language modelling for log-based CF
  – Brief: adaptations for rating-based CF
• Vector Space Model (“Back to the Future”)
  – User and item spaces, orthonormal bases and “the spectral theorem”
![Page 3: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/3.jpg)
Recommendation
• Informally:
  – Search for information “without a query”
• Three types:
  – Content-based recommendation
  – Collaborative filtering (CF)
    • Memory-based
    • Model-based
  – Hybrid approaches
![Page 4: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/4.jpg)
Recommendation
• Informally:
  – Search for information “without a query”
• Three types:
  – Content-based recommendation
  – Collaborative filtering
    • Memory-based
    • Model-based
  – Hybrid approaches
Today’s focus!
![Page 5: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/5.jpg)
Popularity-based
Content-based
Online News
![Page 6: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/6.jpg)
![Page 7: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/7.jpg)
Music
Collaborative Filtering
![Page 8: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/8.jpg)
Collaborative Filtering
• Collaborative filtering (originally introduced by Pattie Maes as “social information filtering”)
  1. Compare user judgments
  2. Recommend differences between similar users
• Leading principle: people’s tastes are not randomly distributed
  – A.k.a. “You are what you buy”
![Page 9: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/9.jpg)
Rating Matrix
![Page 10: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/10.jpg)
Users
![Page 11: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/11.jpg)
Items
![Page 12: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/12.jpg)
Rating
![Page 13: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/13.jpg)
User Profile
![Page 14: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/14.jpg)
Item Profile
![Page 15: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/15.jpg)
Unknown Rating
![Page 16: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/16.jpg)
Collaborative Filtering
If user Boris watched Love Actually, how would he rate it?
![Page 17: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/17.jpg)
Collaborative Filtering
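• Standard item-based formulation (Adomavicius & Tuzhilin 2005), where $I_u$ is the set of items rated by user $u$:

$$\mathrm{rat}(u,i) = \sum_{j \in I_u} \frac{\mathrm{sim}(i,j)}{\sum_{j' \in I_u} \mathrm{sim}(i,j')}\,\mathrm{rat}(u,j)$$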
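The item-based formulation can be illustrated with a minimal Python sketch. The slide does not say how sim(i, j) is computed, so cosine similarity over rating columns is an assumption here, and the users, films and ratings are made-up illustration data.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "anna": {"Love Actually": 5, "Notting Hill": 4, "Alien": 1},
    "carl": {"Love Actually": 1, "Notting Hill": 2, "Alien": 5},
    "boris": {"Notting Hill": 4, "Alien": 2},
}

def cosine_sim(i, j):
    """Cosine similarity between the rating columns of items i and j (an assumed choice of sim)."""
    dot = sum(r[i] * r[j] for r in ratings.values() if i in r and j in r)
    norm_i = sqrt(sum(r[i] ** 2 for r in ratings.values() if i in r))
    norm_j = sqrt(sum(r[j] ** 2 for r in ratings.values() if j in r))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

def predict(u, i):
    """rat(u, i): similarity-weighted average of the ratings u gave to other items."""
    num = den = 0.0
    for j, r_uj in ratings[u].items():
        if j == i:
            continue
        s = cosine_sim(i, j)
        num += s * r_uj
        den += s
    return num / den if den else None

prediction = predict("boris", "Love Actually")
```

Because the weights are normalised by their sum, the prediction always lies between the lowest and highest rating in the user's own profile.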
![Page 18: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/18.jpg)
Collaborative Filtering
• Benefits over content-based approach
  – Overcomes problems with finding suitable features to represent e.g. art, music
  – Serendipity
  – Implicit mechanism for qualitative aspects like style
• Problems: large groups, broad domains
![Page 19: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/19.jpg)
Prediction vs. Ranking
• Original formulations focused on modelling the users’ item ratings: rating prediction
  – Evaluation of algorithms (e.g., Netflix prize) by Mean Absolute Error (MAE) or Root Mean Square Error (RMSE) between predicted and actual ratings
![Page 20: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/20.jpg)
Recency-based
![Page 21: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/21.jpg)
Prediction vs. Ranking
• Original formulations focused on modelling the users’ item ratings: rating prediction
  – Evaluation of algorithms (e.g., Netflix prize) by Mean Absolute Error (MAE) or Root Mean Square Error (RMSE) between predicted and actual ratings
• For the end user, the ranking of recommended items is the essential problem: relevance ranking
  – Evaluation by precision at fixed rank (P@N)
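The three evaluation measures named on this slide can be written down in a few lines. This is a minimal sketch using their usual textbook definitions; the sample predictions and the relevant-item set are invented for illustration.

```python
from math import sqrt

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def precision_at_n(ranked_items, relevant, n):
    """Fraction of the top-n recommended items that are relevant (P@N)."""
    return sum(1 for item in ranked_items[:n] if item in relevant) / n

# Hypothetical data.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 4.0]
ranking = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
```

MAE and RMSE score every predicted rating equally, whereas P@N only looks at the head of the ranked list, which is what the end user actually sees.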
![Page 22: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/22.jpg)
Relevance Ranking
• Core problem of Information Retrieval!
![Page 23: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/23.jpg)
Generative Model
• A statistical model for generating data
  – Probability distribution over samples in a given ‘language’

$$P(w_1 w_2 w_3 w_4 \mid M) = P(w_1 \mid M)\, P(w_2 \mid M, w_1)\, P(w_3 \mid M, w_1 w_2)\, P(w_4 \mid M, w_1 w_2 w_3)$$

© Victor Lavrenko, Aug. 2002
![Page 24: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/24.jpg)
Unigram models etc.
• Unigram Models

$$P(w_1 w_2 w_3 w_4) = P(w_1)\, P(w_2)\, P(w_3)\, P(w_4)$$

• N-gram Models (here, N=2)

$$P(w_1 w_2 w_3 w_4) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_2)\, P(w_4 \mid w_3)$$

© Victor Lavrenko, Aug. 2002
![Page 25: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/25.jpg)
Fundamental Problem
• Usually we don’t know the model M
  – But have a sample representative of that model
• First estimate a model from a sample
• Then compute the observation probability

$$P(\text{observation} \mid M(\text{sample}))$$

© Victor Lavrenko, Aug. 2002
![Page 26: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/26.jpg)
Language Models…
• Unigram Language Models (LM)
  – Urn metaphor
• $P(w_1 w_2 w_3 w_4) \sim P(w_1)\, P(w_2)\, P(w_3)\, P(w_4) = 4/9 \cdot 2/9 \cdot 4/9 \cdot 3/9$
© Victor Lavrenko, Aug. 2002
![Page 27: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/27.jpg)
… for Information Retrieval
• Rank models (documents) by probability of generating the query Q:

$$P(Q \mid D_1) = 4/9 \cdot 2/9 \cdot 4/9 \cdot 3/9 = 96/9^4$$
$$P(Q \mid D_2) = 3/9 \cdot 3/9 \cdot 3/9 \cdot 3/9 = 81/9^4$$
$$P(Q \mid D_3) = 2/9 \cdot 3/9 \cdot 2/9 \cdot 4/9 = 48/9^4$$
$$P(Q \mid D_4) = 2/9 \cdot 5/9 \cdot 2/9 \cdot 2/9 = 40/9^4$$
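The arithmetic on this slide can be checked with exact fractions. The per-token probabilities below are copied from the four lines above; the document names `doc1` to `doc4` are placeholders, since the slide identifies the documents only by pictures.

```python
from fractions import Fraction
from math import prod

# Probability of each of the four query tokens under each document model.
token_probs = {
    "doc1": [Fraction(4, 9), Fraction(2, 9), Fraction(4, 9), Fraction(3, 9)],
    "doc2": [Fraction(3, 9), Fraction(3, 9), Fraction(3, 9), Fraction(3, 9)],
    "doc3": [Fraction(2, 9), Fraction(3, 9), Fraction(2, 9), Fraction(4, 9)],
    "doc4": [Fraction(2, 9), Fraction(5, 9), Fraction(2, 9), Fraction(2, 9)],
}

# Unigram query likelihood: the product of the token probabilities.
scores = {doc: prod(probs) for doc, probs in token_probs.items()}

# Rank documents by decreasing probability of generating the query.
ranking = sorted(scores, key=scores.get, reverse=True)
```

All four products share the denominator 9^4 = 6561, so the ranking is decided by the numerators 96, 81, 48 and 40.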
![Page 28: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/28.jpg)
Zero-frequency Problem
• Suppose some event is not in our example
  – Model will assign zero probability to that event
  – And to any set of events involving the unseen event
• Happens frequently in natural language text, and it is incorrect to infer zero probabilities
  – Especially when dealing with incomplete samples
![Page 29: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/29.jpg)
Smoothing
• Idea: shift part of probability mass to unseen events
• Interpolate document-based model with a background model (of “general English”)
  – Reflects expected frequency of events
  – Plays role of IDF

$$\lambda\, P(w \mid \text{document}) + (1-\lambda)\, P(w \mid \text{background})$$
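Linear interpolation smoothing can be sketched as follows. This assumes that λ weights the document model and (1 − λ) the background model, and that the background is estimated from the pooled counts of all documents; the two toy documents and the query are invented.

```python
from collections import Counter
from math import log

# Hypothetical two-document collection.
docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog chased the cat".split(),
}
doc_counts = {name: Counter(words) for name, words in docs.items()}
# Background model: term counts pooled over the whole collection (an assumption).
background = Counter(w for words in docs.values() for w in words)

def smoothed_prob(term, counts, lam):
    """lam * P(term | document) + (1 - lam) * P(term | background)."""
    p_doc = counts[term] / sum(counts.values())
    p_bg = background[term] / sum(background.values())
    return lam * p_doc + (1 - lam) * p_bg

def query_log_likelihood(query, counts, lam=0.5):
    """Log-probability of generating the query; assumes every query term occurs in the collection."""
    return sum(log(smoothed_prob(t, counts, lam)) for t in query)

query = ["dog", "mat"]
scores = {name: query_log_likelihood(query, c) for name, c in doc_counts.items()}
```

Neither document contains both query terms, so an unsmoothed model would give both a probability of zero; with smoothing the scores stay finite and the documents can still be ranked.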
![Page 30: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/30.jpg)
Relevance Ranking
• Core problem of Information Retrieval!
  – Question arising naturally: are CF and IR, from a modelling perspective, really two different problems then?

Jun Wang, Arjen P. de Vries, Marcel JT Reinders, A User-Item Relevance Model for Log-Based Collaborative Filtering, ECIR 2006
![Page 31: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/31.jpg)
User-Item Relevance Models
• Idea: CF by a probabilistic retrieval model
![Page 32: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/32.jpg)
User-Item Relevance Models
• Idea: CF by a probabilistic retrieval model
• Treat user profile as query and answer the following question:
  – “What is the probability that this item is relevant to this user, given his or her profile?”
• To this end, apply the language modelling approach to IR as a formal model to compute the user-item relevance
![Page 33: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/33.jpg)
Implicit or explicit relevance?
• Rating-based CF:
  – Users explicitly rate “items”
  – We use “items” to represent contents (movie, music, etc.)
• Log-based CF:
  – User profiles are gathered by logging the interactions: music play-list, web surf log, etc.
![Page 34: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/34.jpg)
User-Item Relevance Models
• Existing User-based/Item-based approaches
  – Heuristic implementations of “word-of-mouth”
  – Unclear how to best deal with the sparse data!
• User-Item Relevance Models
  – Give probabilistic justification
  – Integrate smoothing to tackle the problem of sparsity
![Page 35: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/35.jpg)
User-Item Relevance Models
[Diagram: is the target item relevant to the target user? Item representation: other items that the target user liked. User representation: other users who liked the target item.]
![Page 36: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/36.jpg)
User-Item Relevance Models
• Introduce the following random variables

$$\text{Users: } U \in \{u_1, \dots, u_K\} \qquad \text{Items: } I \in \{i_1, \dots, i_M\}$$
$$\text{Relevance: } R \in \{r, \bar{r}\} \quad (r\text{: “relevant”},\ \bar{r}\text{: “not relevant”})$$

• Rank items by their log odds of relevance

$$\mathrm{RSV}_U(I) = \log \frac{P(R = r \mid U, I)}{P(R = \bar{r} \mid U, I)}$$
![Page 37: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/37.jpg)
Item Representation
[Diagram: the target user u_k is represented by the query items {i_b}, the other items that the target user liked; is the target item i_m relevant?]
![Page 38: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/38.jpg)
User-Item Relevance Models
• Item representation
  – Use the items that the target user liked to represent the target user
  – Assume the item “ratings” are independent
  – Linear interpolation smoothing to address sparsity

$$\mathrm{RSV}_{u_k}(i_m) = \log \frac{P(r \mid i_m, u_k)}{P(\bar{r} \mid i_m, u_k)} = \log \frac{P(u_k \mid r, i_m)\, P(r \mid i_m)}{P(u_k \mid \bar{r}, i_m)\, P(\bar{r} \mid i_m)}$$

$$= \dots \propto \sum_{\forall i_b:\ i_b \in L_{u_k} \wedge\, c(i_b, i_m) > 0} \log\left(1 + \frac{(1-\lambda)\, P_{ml}(i_b \mid i_m, r)}{\lambda\, P(i_b \mid r)}\right) + \log P(i_m \mid r)$$

$$P(i_b \mid i_m, r) = (1-\lambda)\, P_{ml}(i_b \mid i_m, r) + \lambda\, P_{ml}(i_b \mid r)$$

$\lambda \in [0,1]$ is a parameter to adjust the strength of smoothing
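The item-based relevance score can be sketched on a toy interaction log. The count-based estimates used below are assumptions made for illustration: P_ml(i_b | i_m, r) is taken as the co-occurrence count divided by the count of i_m, and P(i | r) as the item count divided by the total number of logged interactions. The profiles and the name `rsv_item_based` are hypothetical.

```python
from math import log

# Hypothetical log data: user -> set of items the user liked (played, viewed, ...).
profiles = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"c", "d"},
    "uk": {"a"},  # the target user
}

item_count = {}
for items in profiles.values():
    for item in items:
        item_count[item] = item_count.get(item, 0) + 1
total = sum(item_count.values())

def cooccurrence(a, b):
    """c(a, b): number of users whose profile contains both items."""
    return sum(1 for items in profiles.values() if a in items and b in items)

def rsv_item_based(user, target, lam=0.5):
    """Co-occurrence terms over the user's query items plus the popularity term; needs 0 < lam <= 1."""
    score = log(item_count[target] / total)          # log P(i_m | r)
    for query_item in profiles[user]:
        c = cooccurrence(query_item, target)
        if c > 0:
            p_cond = c / item_count[target]          # assumed estimate of P_ml(i_b | i_m, r)
            p_bg = item_count[query_item] / total    # assumed estimate of P(i_b | r)
            score += log(1 + (1 - lam) * p_cond / (lam * p_bg))
    return score

candidates = [i for i in item_count if i not in profiles["uk"]]
ranking = sorted(candidates, key=lambda i: rsv_item_based("uk", i), reverse=True)
```

Items that never co-occur with the user's profile fall back to the popularity term alone, which is how the smoothed model copes with sparse logs.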
![Page 39: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/39.jpg)
User-Item Relevance Models
• Probabilistic justification of Item-based CF
  – The RSV of a target item is the combination of its popularity and its co-occurrence with items (query items) that the target user liked.

$$\mathrm{RSV}_{u_k}(i_m) = \underbrace{\sum_{\forall i_b:\ i_b \in L_{u_k} \wedge\, c(i_b, i_m) > 0} \log\left(1 + \frac{(1-\lambda)\, P_{ml}(i_b \mid i_m, r)}{\lambda\, P(i_b \mid r)}\right)}_{\text{co-occurrence}} + \underbrace{\log P(i_m \mid r)}_{\text{popularity}}$$
![Page 40: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/40.jpg)
User-Item Relevance Models
• Probabilistic justification of Item-based CF
  – The RSV of a target item is the combination of its popularity and its co-occurrence with items (query items) that the target user liked
• Item co-occurrence should be emphasized if more users express interest in both target & query item
• Item co-occurrence should be suppressed when the popularity of the query item is high

$$\mathrm{RSV}_{u_k}(i_m) = \sum_{\forall i_b:\ i_b \in L_{u_k} \wedge\, c(i_b, i_m) > 0} \log\left(1 + \frac{(1-\lambda)\, P_{ml}(i_b \mid i_m, r)}{\lambda\, P(i_b \mid r)}\right) + \log P(i_m \mid r)$$

  – $P_{ml}(i_b \mid i_m, r)$: co-occurrence between target item and query item
  – $P(i_b \mid r)$: popularity of query item
![Page 41: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/41.jpg)
User Representation
[Diagram: the target item i_m is represented by {u_b}, the other users who liked the target item; is it relevant to the target user u_k?]
![Page 42: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/42.jpg)
User-Item Relevance Models
• User representation
  – Represent target item by users who like it
  – Assume the user profiles are independent
  – Linear interpolation smoothing to address sparsity

$$\mathrm{RSV}_{u_k}(i_m) = \log \frac{P(r \mid i_m, u_k)}{P(\bar{r} \mid i_m, u_k)} = \log \frac{P(i_m \mid r, u_k)\, P(r \mid u_k)}{P(i_m \mid \bar{r}, u_k)\, P(\bar{r} \mid u_k)}$$

$$= \dots \propto \sum_{\forall u_b:\ u_b \in L_{i_m}} \log\left(1 + \frac{(1-\lambda)\, P_{ml}(u_b \mid u_k, r)}{\lambda\, P(u_b \mid r)}\right)$$

$$P(u_b \mid u_k, r) = (1-\lambda)\, P_{ml}(u_b \mid u_k, r) + \lambda\, P_{ml}(u_b \mid r)$$

$\lambda \in [0,1]$ is a parameter to adjust the strength of smoothing
![Page 43: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/43.jpg)
User-Item Relevance Models
• Probabilistic justification of User-based CF
  – The RSV of a target item towards a target user is calculated by the target user’s co-occurrence with other users who liked the target item
• User co-occurrence is emphasized if more items liked by target user are also liked by the other user
• User co-occurrence should be suppressed when this user liked many items

$$\mathrm{RSV}_{u_k}(i_m) = \sum_{\forall u_b:\ u_b \in L_{i_m}} \log\left(1 + \frac{(1-\lambda)\, P_{ml}(u_b \mid u_k, r)}{\lambda\, P(u_b \mid r)}\right)$$

  – $P_{ml}(u_b \mid u_k, r)$: co-occurrence between the target user and the other users
  – $P(u_b \mid r)$: popularity of the other users
![Page 44: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/44.jpg)
![Page 45: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/45.jpg)
Empirical Results
• Data Set:
  – Music play-lists from audioscrobbler.com
  – 428 users and 516 items
  – 80% of users as training set and 20% of users as test set
  – Half of the items in the test set as ground truth, the others as user profiles
• Measurement
  – Recommendation Precision: (number of correct items) / (number of recommended items)
  – Averaged over 5 runs
  – Compared with the suggestion lib developed in GroupLens
![Page 46: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/46.jpg)
P@N vs. lambda
![Page 47: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/47.jpg)
Effectiveness (P@N)
![Page 48: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/48.jpg)
So far…
• User-Item relevance models
  – Give a probabilistic justification for CF
  – Deal with the problem of sparsity
  – Provide state-of-the-art performance
![Page 49: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/49.jpg)
Rating Prediction?
• Previous log-based CF method neither predicts nor uses rating information
  – Ranks items solely by usage frequency
  – Appropriate for, e.g., music recommendation in a service like Spotify or personalised TV
![Page 50: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/50.jpg)
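[Figure: user-item rating matrix with users u_1, …, u_a, …, u_A as rows and items i_1, …, i_b, …, i_B as columns; the rating x_{a,b} of user u_a for item i_b is unknown.]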
![Page 51: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/51.jpg)
[Figure: rating prediction for the unknown rating x_{a,b} of user u_a for item i_b, with items sorted by item similarity; the ratings used are labelled SIR.]
![Page 52: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/52.jpg)
[Figure: rating prediction for the unknown rating x_{a,b} of user u_a for item i_b, with users sorted by user similarity; the ratings used are labelled SUR.]
![Page 53: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/53.jpg)
Sparseness
• Whether you choose SIR or SUR, in many cases, the neighborhood extends to include “not-so-similar” users and/or items
• Idea: take into consideration the similar item ratings made by similar users as an extra source for prediction

Jun Wang, Arjen P. de Vries, Marcel JT Reinders, Unifying user-based and item-based collaborative filtering approaches by similarity fusion, SIGIR 2006
![Page 54: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/54.jpg)
[Figure: rating prediction for the unknown rating x_{a,b}, with users sorted by user similarity and items sorted by item similarity; three regions of the rating matrix are labelled SIR, SUR and SUIR.]
![Page 55: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/55.jpg)
Similarity Fusion
[Figure: two binary indicator variables I_1 and I_2 (each 1 or 0) select which of the SIR, SUR and SUIR ratings x_{k,m} is used to predict x_{a,b}.]
![Page 56: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/56.jpg)
Sketch of Derivation

$$P(x_{k,m} \mid SUR, SIR, SUIR) = \sum_{I_2} P(x_{k,m} \mid I_2, SUR, SIR, SUIR)\, P(I_2)$$

$$= P(x_{k,m} \mid I_2 = 1, SUR, SIR, SUIR)\, P(I_2 = 1) + P(x_{k,m} \mid I_2 = 0, SUR, SIR, SUIR)\,(1 - P(I_2 = 1))$$

$$= P(x_{k,m} \mid SUIR)\,\delta + P(x_{k,m} \mid SUR, SIR)\,(1-\delta)$$

$$= P(x_{k,m} \mid SUIR)\,\delta + \big(P(x_{k,m} \mid SUR)\,\lambda + P(x_{k,m} \mid SIR)\,(1-\lambda)\big)(1-\delta)$$

See SIGIR 2006 paper for more details
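The final line of the derivation is a two-level mixture, which a few lines of Python make concrete. This is only a sketch of that last combination step: the names λ (`lam`) and δ (`delta`) for the two mixing weights, and the three input distributions over rating values, are illustrative assumptions, and how those distributions are estimated is not shown.

```python
def fuse(p_sur, p_sir, p_suir, lam, delta):
    """Mix three rating distributions: SUIR with weight delta, and SUR/SIR (mixed by lam) with 1 - delta."""
    values = set(p_sur) | set(p_sir) | set(p_suir)
    return {
        x: p_suir.get(x, 0.0) * delta
        + (p_sur.get(x, 0.0) * lam + p_sir.get(x, 0.0) * (1 - lam)) * (1 - delta)
        for x in values
    }

def expected_rating(dist):
    """Point prediction: the mean of the fused rating distribution."""
    return sum(x * p for x, p in dist.items())

# Hypothetical distributions over rating values 1..5.
p_sur = {4: 0.5, 5: 0.5}
p_sir = {3: 1.0}
p_suir = {4: 1.0}

fused = fuse(p_sur, p_sir, p_suir, lam=0.5, delta=0.2)
prediction = expected_rating(fused)
```

Setting `delta=0` and `lam=1` recovers the SUR predictor alone, and `delta=0`, `lam=0` the SIR predictor, so both appear as special cases of the fused one.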
![Page 57: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/57.jpg)
User-Item Relevance Models
[Diagram, theoretical level, spanning the Information Retrieval field and the Machine Learning field. Labels: User Representation; Item Representation; Combination rules; Similarity Fusion; Individual Predictor; Latent Predictor Space, Latent semantic analysis, manifold learning etc.; Relevance Feedback, Query expansion etc.]
![Page 58: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/58.jpg)
Remarks
• SIGIR 2006 paper estimates probabilities directly from the similarity distance given between users and items
• TOIS 2008 paper below applies Parzen window kernel density estimation to the rating data itself, to give a full probabilistic derivation
  – Shows how the “kernel trick” lets us generalize the distance measure, such that a cosine (projection) kernel (length-normalized dot product) can be chosen, while keeping Gaussian kernel Parzen windows

Jun Wang, Arjen P. de Vries, and Marcel J. T. Reinders. Unified relevance models for rating prediction in collaborative filtering. ACM TOIS 26 (3), June 2008
![Page 59: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/59.jpg)
Relevance Feedback
• Relevance Models for query expansion in IR
  – Language model estimated from known relevant or from top-k documents (Pseudo-RFB)
  – Expand query with terms generated by the LM
• Application to recommendation
  – User profile used to identify neighbourhood; a Relevance Model estimated from that neighbourhood used to expand the profile
  – Deploy probabilistic clustering method PPC to construct the neighbourhood
  – Very good empirical results on P@N

Javier Parapar, Alejandro Bellogín, Pablo Castells, Álvaro Barreiro. Relevance-Based Language Modelling for Recommender Systems. Information Processing & Management 49 (4), pp. 966-980
![Page 60: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/60.jpg)
CF =~ IR?
Follow-up question:
Can we go beyond “model level” equivalences observed so far, and actually cast the CF problem such that we can use the full IR machinery?

Alejandro Bellogín, Jun Wang, and Pablo Castells. Text Retrieval Methods for Item Ranking in Collaborative Filtering. ECIR 2011
![Page 61: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/61.jpg)
IR System
[Diagram: term occurrences (term-doc matrix) feed an inverted index; the query goes through a query process into the text retrieval engine, which produces the output.]
![Page 62: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/62.jpg)
CF RecSys?!
[Diagram: user profiles (user-item matrix) and item similarity feed an inverted index; the user profile (as query) goes through a user profile process into the text retrieval engine, which produces the output.]
![Page 63: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/63.jpg)
Collaborative Filtering
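• Standard item-based formulation

$$\mathrm{rat}(u,i) = \sum_{j \in I_u} \frac{\mathrm{sim}(i,j)}{\sum_{j' \in I_u} \mathrm{sim}(i,j')}\,\mathrm{rat}(u,j)$$

• More general

$$\mathrm{rat}(u,i) = \sum_{j \in g(u)} f(u,i,j) = \sum_{j \in g(u)} f_1(u,j)\, f_2(i,j)$$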
![Page 64: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/64.jpg)
Text Retrieval
• In (Metzler & Zaragoza, 2009)

$$s(q,d) = \sum_{t \in g(q)} s(q,d,t)$$

  – In particular: factored form

$$s(q,d,t) = w_1(q,t)\, w_2(d,t)$$
![Page 65: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/65.jpg)
Text Retrieval
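• Examples
  – TF:

$$w_1(q,t) = \mathrm{qf}(t) \qquad w_2(d,t) = \mathrm{tf}(t,d)$$

  – TF-IDF:

$$w_1(q,t) = \mathrm{qf}(t) \qquad w_2(d,t) = \mathrm{tf}(t,d)\,\log\frac{N}{\mathrm{df}(t)}$$

  – BM25:

$$w_1(q,t) = \frac{(k_3+1)\,\mathrm{qf}(t)}{k_3 + \mathrm{qf}(t)}$$

$$w_2(d,t) = \log\frac{N - \mathrm{df}(t) + 0.5}{\mathrm{df}(t) + 0.5} \cdot \frac{(k_1+1)\,\mathrm{tf}(t,d)}{k_1\left((1-b) + b\,\mathrm{dl}(d)/\overline{\mathrm{dl}}\right) + \mathrm{tf}(t,d)}$$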
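The factored scoring form with the three weighting schemes can be sketched as below. The toy collection is invented, and the parameter defaults k1 = 1.2, b = 0.75 and k3 = 8 are commonly used values assumed here, not values taken from the slides.

```python
from collections import Counter
from math import log

# Hypothetical toy collection.
docs = {
    "d1": "collaborative filtering for music recommendation".split(),
    "d2": "language models for information retrieval".split(),
    "d3": "probabilistic models for collaborative filtering".split(),
    "d4": "music retrieval".split(),
    "d5": "news recommendation".split(),
}
N = len(docs)
tf = {d: Counter(words) for d, words in docs.items()}
df = Counter(t for words in docs.values() for t in set(words))
dl = {d: len(words) for d, words in docs.items()}
avg_dl = sum(dl.values()) / N

def w1_tf(qf):
    return qf

def w2_tf(d, t):
    return tf[d][t]

def w2_tfidf(d, t):
    return tf[d][t] * log(N / df[t])

def w1_bm25(qf, k3=8.0):
    return (k3 + 1) * qf / (k3 + qf)

def w2_bm25(d, t, k1=1.2, b=0.75):
    idf = log((N - df[t] + 0.5) / (df[t] + 0.5))
    length_norm = k1 * ((1 - b) + b * dl[d] / avg_dl)
    return idf * (k1 + 1) * tf[d][t] / (length_norm + tf[d][t])

def score(query, d, w1, w2):
    """s(q, d): sum over query terms of w1(q, t) * w2(d, t); terms unseen in the collection are skipped."""
    qf = Counter(query)
    return sum(w1(qf[t]) * w2(d, t) for t in qf if t in df)

query = ["collaborative", "filtering"]
tf_scores = {d: score(query, d, w1_tf, w2_tf) for d in docs}
tfidf_scores = {d: score(query, d, w1_tf, w2_tfidf) for d in docs}
bm25_scores = {d: score(query, d, w1_bm25, w2_bm25) for d in docs}
```

Swapping the pair (w1, w2) changes the retrieval model without touching the summation, which is what makes the factored form convenient for comparing schemes.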
![Page 66: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/66.jpg)
IR =~ CF?
• In item-based Collaborative Filtering

$$\mathrm{tf}(t,d) \equiv \mathrm{sim}(i,j) \qquad \mathrm{qf}(t) \equiv \mathrm{rat}(u,j)$$
$$t \equiv j \qquad d \equiv i \qquad q \equiv u$$

• Apply different models
  – With different normalizations s_qd and norms L1 and L2

s_qd                  Document: no norm    Document: norm (/|D|)
Query: no norm        s00                  s01
Query: norm (/|Q|)    s10                  s11
![Page 67: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/67.jpg)
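IR =~ CF!
• TF L1 s01 is equivalent to item-based CF

$$\mathrm{rat}(u,i) = \sum_{j \in I_u} \frac{\mathrm{sim}(i,j)}{\sum_{j' \in I_u} \mathrm{sim}(i,j')}\,\mathrm{rat}(u,j)$$

$$s(q,d) = \sum_{t \in g(q)} w_1(q,t)\, w_2(d,t) = \sum_{t \in g(q)} \mathrm{qf}(t)\, \frac{\mathrm{tf}(t,d)}{\sum_{t' \in g(q)} \mathrm{tf}(t',d)}$$

$$\mathrm{tf}(t,d) \equiv \mathrm{sim}(i,j) \qquad \mathrm{qf}(t) \equiv \mathrm{rat}(u,j)$$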
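The claimed equivalence can be checked numerically on a toy example. This sketch reads TF L1 s01 as the TF model with the document weights L1-normalised over the query terms; the ratings and similarities are made up, and the item names `j1` to `j3` are placeholders.

```python
def item_based_cf(rat_u, sim_i):
    """rat(u, i): similarity-weighted average of the ratings in the user's profile."""
    den = sum(sim_i[j] for j in rat_u)
    return sum(sim_i[j] * rat_u[j] for j in rat_u) / den

def tf_l1_s01(qf, tf_d):
    """s(q, d) with w1 = qf(t) and w2 = tf(t, d), L1-normalised over the query terms."""
    norm = sum(tf_d[t] for t in qf)
    return sum(qf[t] * (tf_d[t] / norm) for t in qf)

# The user profile plays the role of the query: qf(t) corresponds to rat(u, j).
rat_u = {"j1": 4.0, "j2": 2.0, "j3": 5.0}
# The target item plays the role of the document: tf(t, d) corresponds to sim(i, j).
sim_i = {"j1": 0.7, "j2": 0.1, "j3": 0.2}

cf_prediction = item_based_cf(rat_u, sim_i)
ir_score = tf_l1_s01(qf=rat_u, tf_d=sim_i)
```

Both functions compute the same quantity once ratings are read as query term frequencies and item similarities as document term frequencies, so a text retrieval engine indexed this way reproduces the item-based prediction.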
![Page 68: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/68.jpg)
Empirical Results
• Movielens 1M
  – Movielens100k: comparable results
• TF L1 s01 equivalent to item-based CF (baseline)

[Bar chart: nDCG (axis from 0.00 to 0.40) per method, in the order shown: TF L1 s01, TF-IDF L1 s01, TF-IDF L2 s11, BM25 L2 s11, TF L1 s10, BM25 L1 s01, TF-IDF L1 s10, BM25 L1 s00, TF-IDF L2 s10, TF-IDF L1 s00, TF L2 s10, BM25 L2 s10, TF L1 s00, BM25 L1 s11, BM25 L1 s10, BM25 L2 s01, TF L2 s11, TF-IDF L2 s01, TF L2 s01.]
![Page 69: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/69.jpg)
Vector Space Model
• Challenge:
  – No shared “words” to relate documents to queries
• Solution:
  – First project users and items in a common space
• Two extreme settings:
  – Project users into a space with dimensionality of the number of items
  – Project items into a space with dimensionality of the number of users

A. Bellogín, J. Wang, P. Castells. Bridging Memory-Based Collaborative Filtering and Text Retrieval. Information Retrieval Journal
![Page 70: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/70.jpg)
Item Space
• User
• Item
• Rank
• Predict rating:
![Page 71: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/71.jpg)
User space
• User
• Item
• Rank
• Predict rating:
![Page 72: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/72.jpg)
Linear Algebra
• Users and items in shared orthonormal space:
• Consider covariance matrix
• Spectral theorem now states that an orthonormal basis of eigenvectors exists
![Page 73: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/73.jpg)
Linear Algebra
• Use this basis to represent items and users:
• The dot product then has a remarkable form (of the IR models discussed):
![Page 74: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/74.jpg)
Subspaces…
• Number of items (n) vs. number of users (m):
  – If n < m, a linear dependency must exist between users in terms of the item space components
  – In this case, it has been known empirically that item-based algorithms tend to perform better
• Dimension of sub-space key for the performance of the algorithm?
• ~ better estimation (more data per item) in the probabilistic versions
![Page 75: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/75.jpg)
Subspaces…
• Matrix Factorization methods are captured by assuming a lower-dimensionality space to project items and users into (usually considered “model-based” rather than “memory-based”)
  ~ Latent Semantic Indexing (a VSM method replicated as pLSA and variants)
![Page 76: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/76.jpg)
Ratings into Inverted File
• Note: distribution of item occurrences not Zipfian like text, so existing implementations (including choice of compression etc.) may be sub-optimal for CF runtime performance
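Storing ratings in an inverted file can be sketched with plain dictionaries. The layout below is only one possible choice and is an assumption: each item acts as a "term" whose posting list holds (user, rating) pairs, with the rating in the place where a text index stores the term frequency. The function name and the toy ratings are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(ratings):
    """Map each item to a posting list of (user, rating) pairs, sorted by user id."""
    index = defaultdict(list)
    for user in sorted(ratings):
        for item, rating in ratings[user].items():
            index[item].append((user, rating))
    return dict(index)

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 3, "c": 4},
    "u3": {"b": 1},
}
index = build_inverted_index(ratings)

# Posting-list lengths play the role of document frequencies in a text index.
posting_lengths = {item: len(postings) for item, postings in index.items()}
```

Inspecting `posting_lengths` on real rating data is a quick way to see how far the distribution departs from the Zipfian shape that text-oriented compression schemes are tuned for.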
![Page 77: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/77.jpg)
Weighting schemes
![Page 78: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/78.jpg)
Empirical results 1M
![Page 79: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/79.jpg)
Empirical results 10M
![Page 80: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/80.jpg)
Rating prediction
![Page 81: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/81.jpg)
Concluding Remarks
• The probabilistic models are elegant (often deploying impressive maths), but what do they really add in understanding IR & CF – i.e., beyond the (often claimed to be “ad-hoc”) approaches of the VSM?
![Page 82: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/82.jpg)
Concluding Remarks
• Clearly, the models in CF & IR are closely related
• Should these then really be studied in two different (albeit overlapping) communities, RecSys vs. SIGIR?
![Page 83: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/83.jpg)
Meanwhile at TREC…
![Page 84: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/84.jpg)
Contextual Suggestions
• Given a user profile and a context, make suggestions
  – AKA Context-aware Recommendation, zero-query Information Retrieval, …
![Page 85: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/85.jpg)
“Entertain me”
• Recommend “things to do”, where
  – User profile consists of opinions about attractions
  – Context consists of a specific geo-location
![Page 86: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/86.jpg)
TREC-CS (1/3)
• Given a user profile
  – 70 – 100 POIs represented by a title, description and URL (situated in Chicago / Santa Fe)
  – Rated on a scale 0 – 4

125, Adler Planetarium & Astronomy Museum, "Interactive exhibits & high-tech sky shows entertain stargazers -- lakefront views are a bonus.", http://www.adlerplanetarium.org/
131, Lincoln Park Zoo, "Lincoln Park Zoo is a free 35-acre zoo located in Lincoln Park in Chicago, Illinois. The zoo was founded in 1868, making it one of the oldest zoos in the U.S. It is also one of a few free admission zoos in the United States.", http://www.lpzoo.org/

700, 125, 4, 4
700, 131, 0, 1
![Page 87: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/87.jpg)
TREC CS (2/3)
• … and a context
  – Corresponding to a metropolitan area in the USA, e.g., 109, Kalamazoo, MI
![Page 88: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/88.jpg)
TREC CS (3/3)
• Suggest Web pages / snippets
  – From the Open Web, or from ClueWeb

700, 109, 1, "About KIA History Kalamazoo Institute of Arts KIA History", "The Kalamazoo Institute of Arts is a nonprofit art museum and school. Since , the institute has offered art classes and free admission programming, including exhibitions, lectures, events, activities and a permanent collection. The KIAs mission is to cultivate the creation and appreciation of the visual arts for the communities", clueweb12-1811wb-14-09165
![Page 89: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/89.jpg)
Common approach
![Page 90: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/90.jpg)
References
• Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE TKDE 17(6), 734-749 (2005)
• Alejandro Bellogín, Jun Wang, and Pablo Castells. Text Retrieval Methods for Item Ranking in Collaborative Filtering. ECIR 2011.
• Metzler, D., Zaragoza, H.: Semi-parametric and non-parametric term weighting for information retrieval. ECIR 2009.
• Javier Parapar, Alejandro Bellogín, Pablo Castells, Álvaro Barreiro. Relevance-Based Language Modelling for Recommender Systems. Information Processing & Management 49 (4), pp. 966-980
• A. Bellogín, J. Wang, P. Castells. Bridging Memory-Based Collaborative Filtering and Text Retrieval. Information Retrieval (to appear)
• Jun Wang, Arjen P. de Vries, Marcel JT Reinders, Unifying user-based and item-based collaborative filtering approaches by similarity fusion, SIGIR 2006
• Jun Wang, Arjen P. de Vries, Marcel JT Reinders, A User-Item Relevance Model for Log-Based Collaborative Filtering, ECIR 2006
• Jun Wang, Arjen P. de Vries, and Marcel J. T. Reinders. Unified relevance models for rating prediction in collaborative filtering. ACM TOIS 26 (3), June 2008.
• Jun Wang, Stephen Robertson, Arjen P. de Vries, and Marcel J.T. Reinders. Probabilistic relevance ranking for collaborative filtering. Information Retrieval 11 (6):477-497, 2008
![Page 91: Models for Information Retrieval and Recommendation](https://reader035.vdocument.in/reader035/viewer/2022070522/58ef06331a28ab18618b4695/html5/thumbnails/91.jpg)
Thanks
• Alejandro Bellogín
• Jun Wang
• Thijs Westerveld
• Victor Lavrenko