Evaluation of Collaborative Filtering Algorithms for Recommending Articles on CiteULike
June 29th, 2009
HT 2009, Workshop “Web 3.0: Merging Semantic Web and Social Web”
Dr. Peter Brusilovsky, Associate Professor
Denis Parra, PhD Student
School of Information Sciences
University of Pittsburgh
Outline
• Motivation
• Methods
  – CCF
  – NwCF
  – BM25
• The Study
• Description of the Data
• Results
• Conclusions
Motivation
Based on the information available on CiteULike:
• Develop user-centered recommendations of scientific articles.
• Investigate the potential of users’ tags in collaborative tagging systems to provide recommendations.
• Compare the accuracy of user-based collaborative filtering methods.
Why CiteULike?
• Popular collaborative tagging system, more topic-oriented than delicious: article references.
• Familiarity with the system.
Methods: CCF (1 / 2)
• Classic Collaborative Filtering (CCF): user-based recommendations, using Pearson correlation (users’ similarity) and adjusted ratings to rank the items to recommend [1]
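$$
\mathrm{userSim}(u,n) = \frac{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)(r_{ni} - \bar{r}_n)}{\sqrt{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in CR_{u,n}} (r_{ni} - \bar{r}_n)^2}}
$$

$$
\mathrm{pred}(u,i) = \bar{r}_u + \frac{\sum_{n \in \mathrm{neighbors}(u)} \mathrm{userSim}(u,n) \cdot (r_{ni} - \bar{r}_n)}{\sum_{n \in \mathrm{neighbors}(u)} \mathrm{userSim}(u,n)}
$$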
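The two CCF steps (Pearson similarity over the items both users rated, then a similarity-weighted, mean-adjusted prediction) can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the `ratings` layout (user -> {item: rating}), the function names, the choice to take each user's mean over all of their ratings, and the decision to skip neighbors who have not rated the item are assumptions.

```python
from math import sqrt

def mean_rating(ratings, user):
    """Mean of all ratings given by `user` (assumed definition of r-bar)."""
    values = ratings[user].values()
    return sum(values) / len(values)

def user_sim(ratings, u, n):
    """Pearson correlation between users u and n over their co-rated items."""
    common = set(ratings[u]) & set(ratings[n])
    if not common:
        return 0.0
    mean_u, mean_n = mean_rating(ratings, u), mean_rating(ratings, n)
    num = sum((ratings[u][i] - mean_u) * (ratings[n][i] - mean_n) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    den_n = sqrt(sum((ratings[n][i] - mean_n) ** 2 for i in common))
    # Guard against a zero denominator (constant ratings on the common items).
    return num / (den_u * den_n) if den_u and den_n else 0.0

def pred(ratings, u, i, neighbors):
    """Predicted rating of item i for user u from the given neighbors."""
    mean_u = mean_rating(ratings, u)
    num = den = 0.0
    for n in neighbors:
        if i not in ratings[n]:
            continue  # assumption: only neighbors who rated i contribute
        sim = user_sim(ratings, u, n)
        num += sim * (ratings[n][i] - mean_rating(ratings, n))
        den += sim
    # Fall back to the user's own mean when no neighbor provides evidence.
    return mean_u + num / den if den else mean_u

# Hypothetical toy data, for illustration only.
toy = {
    "u": {"a": 5, "b": 3, "c": 4},
    "n": {"a": 4, "b": 2, "c": 3, "d": 5},
}
print(user_sim(toy, "u", "n"), pred(toy, "u", "d", ["n"]))
```

As in the formula, the denominator sums the raw similarities, so the sketch implicitly assumes the neighborhood contains positively correlated users.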
Methods: NwCF (1 / 2)
• Neighbor-weighted Collaborative Filtering (NwCF): similar to CCF, yet incorporates the number of neighbors rating an item into the ranking formula for recommended items
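$$
\mathrm{userSim}(u,n) = \frac{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)(r_{ni} - \bar{r}_n)}{\sqrt{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in CR_{u,n}} (r_{ni} - \bar{r}_n)^2}}
$$

$$
\mathrm{pred}'(u,i) = \log_{10}(1 + \mathrm{nbr}(i)) \cdot \mathrm{pred}(u,i)
$$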
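The neighbor weighting can be illustrated with a short Python sketch: the base prediction is multiplied by log10(1 + nbr(i)), where nbr(i) is the number of neighbors who rated item i. The function names and the toy values are assumptions made for illustration; the base prediction is passed in as a plain number so the sketch stays self-contained.

```python
from math import log10

def count_neighbor_raters(ratings, item, neighbors):
    """nbr(i): how many of the given neighbors have rated `item`."""
    return sum(1 for n in neighbors if item in ratings[n])

def pred_neighbor_weighted(base_pred, n_raters):
    """pred'(u, i) = log10(1 + nbr(i)) * pred(u, i)."""
    return log10(1 + n_raters) * base_pred

# Hypothetical example: two items with the same base prediction (4.0),
# one rated by a single neighbor and one rated by nine neighbors.
print(pred_neighbor_weighted(4.0, 1))
print(pred_neighbor_weighted(4.0, 9))
```

With the same base prediction, the item backed by more neighbors ranks higher, which is the effect the weighting is meant to produce.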
Methods: BM25 (1 / 2)
• BM25: We obtain the similarity between users (neighbors) by treating each user’s set of tags as a “document” and computing the Okapi BM25 (probabilistic IR model) Retrieval Status Value (RSV) [2].
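$$
\mathrm{pred}'(u,i) = \log_{10}(1 + \mathrm{nbr}(i)) \cdot \mathrm{pred}(u,i)
$$

$$
RSV_d = \sum_{t \in q} IDF \cdot \frac{(k_1 + 1)\, tf_{td}}{k_1\left((1 - b) + b \times (L_d / L_{ave})\right) + tf_{td}} \cdot \frac{(k_3 + 1)\, tf_{tq}}{k_3 + tf_{tq}}
$$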
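A minimal Python sketch of the RSV computation follows, with the target user's tags playing the role of the query q and a candidate neighbor's tags playing the role of the document d. Several details are assumptions, since the slide does not give them: the IDF is taken as log(N / df_t), the parameter values k1 = 1.2, b = 0.75 and k3 = 8 are common defaults rather than the values used in the study, and all function and argument names are hypothetical.

```python
from math import log

def bm25_rsv(query_tf, doc_tf, doc_len, avg_len, df, n_docs,
             k1=1.2, b=0.75, k3=8.0):
    """Okapi BM25 Retrieval Status Value of one 'document' for one 'query'.

    query_tf / doc_tf: tag -> frequency for the target user / the neighbor.
    doc_len / avg_len: neighbor's tag count and the average over all users.
    df: tag -> number of users who used the tag; n_docs: number of users.
    """
    score = 0.0
    for tag, tf_tq in query_tf.items():
        tf_td = doc_tf.get(tag, 0)
        if tf_td == 0 or tag not in df:
            continue  # tags not shared with the neighbor contribute nothing
        idf = log(n_docs / df[tag])  # assumed IDF form
        doc_part = ((k1 + 1) * tf_td) / (k1 * ((1 - b) + b * (doc_len / avg_len)) + tf_td)
        query_part = ((k3 + 1) * tf_tq) / (k3 + tf_tq)
        score += idf * doc_part * query_part
    return score

# Hypothetical toy profiles: one shared tag ("ir") and one unshared tag each.
target = {"ir": 1, "hci": 2}
neighbor = {"ir": 1, "nlp": 3}
print(bm25_rsv(target, neighbor, doc_len=4, avg_len=4,
               df={"ir": 1, "hci": 2, "nlp": 1}, n_docs=10))
```

Ranking all other users by this score and keeping the top ones yields a neighborhood that can then feed the same prediction formula as before.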
The Study
• 7 subjects
• For each subject, four lists of 10 recommendations each were created (CCF, NwCF, BM25_10, BM25_20)
• The four lists were combined and randomly ordered (because the lists overlapped, each subject saw fewer than 40 items)
• Subjects were asked to evaluate relevance (relevant / somewhat relevant / not relevant) and novelty (novel / somewhat novel / not novel)
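The list-building step described above (merge the four top-10 lists, drop duplicates, present the result in random order) could look roughly like the following Python sketch. The function name, the dictionary layout and the article identifiers are hypothetical, and the optional seed is only there to make the shuffle reproducible.

```python
import random

def build_evaluation_list(lists_by_method, seed=None):
    """Merge per-method recommendation lists, deduplicate, and shuffle."""
    seen = set()
    merged = []
    for recommendations in lists_by_method.values():
        for article in recommendations:
            if article not in seen:  # overlapping items are shown only once
                seen.add(article)
                merged.append(article)
    random.Random(seed).shuffle(merged)  # hide which method produced which item
    return merged

# Hypothetical article ids; real lists would hold 10 items per method.
lists = {
    "CCF": ["a1", "a2", "a3"],
    "NwCF": ["a2", "a3", "a4"],
    "BM25_10": ["a4", "a5", "a6"],
    "BM25_20": ["a1", "a6", "a7"],
}
print(build_evaluation_list(lists, seed=42))
```

Shuffling the merged list keeps subjects from inferring which method produced an item from its position.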
Description of the Data
Crawled CiteULike (CUL) for 20 “center users” (only 7 were used for the study)
Annotation: tuple {user, article, tag}
Item          # of unique instances
users         358
articles      186,122
tags          51,903
annotations   902,711
Results
Figure panels: (a) nDCG, (b) Average Novelty, (c) Precision_2@5, (d) Precision_2@10, (e) Precision_2_1@5, (f) Precision_2_1@10
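For readers unfamiliar with the metrics named in the panels, the Python sketch below shows one common way to compute precision@k and nDCG from graded judgments. It rests on assumptions the slides do not state: relevance is encoded as 2 (relevant), 1 (somewhat relevant) and 0 (not relevant); Precision_2 is read as counting only grade 2 as a hit and Precision_2_1 as counting grades 2 and 1; and DCG uses the grade / log2(position + 1) form. The judgments in the example are made up.

```python
from math import log2

def precision_at_k(grades, k, hit_grades):
    """Fraction of the top-k items whose grade is in `hit_grades`."""
    return sum(1 for g in grades[:k] if g in hit_grades) / k

def dcg(grades):
    """Discounted cumulative gain with a log2(position + 1) discount."""
    return sum(g / log2(pos + 1) for pos, g in enumerate(grades, start=1))

def ndcg(grades):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal else 0.0

# Hypothetical judgments for one ranked list (2 = relevant, 1 = somewhat, 0 = not).
judged = [2, 1, 0, 2, 0]
print(precision_at_k(judged, 5, {2}))     # assumed reading of Precision_2@5
print(precision_at_k(judged, 5, {2, 1}))  # assumed reading of Precision_2_1@5
print(ndcg(judged))
```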
Conclusions
• The rating scale must be considered carefully in a CF approach.
• NwCF, which incorporates the number of raters, decreases the uncertainty produced by items with too few ratings.
• The tag-based user similarity approach shows interesting results, which lead us to consider it a valid alternative to Pearson correlation when using CF algorithms.
• We will incorporate more users in our future studies to make the results more conclusive.