evaluation of collaborative filtering algorithms for recommending articles on citeulike

Evaluation of Collaborative Filtering Algorithms for Recommending Articles on CiteULike

June 29th, 2009

HT 2009, Workshop “Web 3.0: Merging Semantic Web and Social Web”

Dr. Peter Brusilovsky, Associate Professor
Denis Parra, PhD Student
School of Information Sciences
University of Pittsburgh

Outline

• Motivation
• Methods
  – CCF
  – NwCF
  – BM25
• The Study
• Description of the Data
• Results
• Conclusions

Motivation

Based on information available on CiteULike:
• Develop user-centered recommendations of scientific articles.
• Investigate the potential of users’ tags in collaborative tagging systems to provide recommendations.
• Compare the accuracy of user-based collaborative filtering methods.

Why CiteULike?
• Popular collaborative tagging system, more topic-oriented than delicious: article references.
• Familiarity with the system.

CiteULike

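Methods: CCF (1 / 2)

• Classic Collaborative Filtering (CCF): user-based recommendations, using Pearson correlation (users’ similarity) and adjusted ratings to rank items to recommend [1]

$$
\mathrm{userSim}(u,n) = \frac{\sum_{i \subset CR_{u,n}} (r_{ui} - \bar{r}_u)(r_{ni} - \bar{r}_n)}{\sqrt{\sum_{i \subset CR_{u,n}} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \subset CR_{u,n}} (r_{ni} - \bar{r}_n)^2}}
$$

$$
\mathrm{pred}(u,i) = \bar{r}_u + \frac{\sum_{n \subset \mathrm{neighbors}(u)} \mathrm{userSim}(u,n) \cdot (r_{ni} - \bar{r}_n)}{\sum_{n \subset \mathrm{neighbors}(u)} \mathrm{userSim}(u,n)}
$$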
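The two CCF formulas can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: CR_{u,n} is read as the set of items rated by both u and n, each user's mean is taken over all of that user's ratings, and only neighbors who rated the target item contribute to the prediction. The names `user_sim`, `predict`, and the toy `ratings` dictionary are hypothetical.

```python
from math import sqrt


def mean_rating(user_ratings):
    """Mean of all ratings given by one user (assumption: mean over all items)."""
    return sum(user_ratings.values()) / len(user_ratings)


def user_sim(ratings_u, ratings_n):
    """Pearson correlation between two users over their co-rated items."""
    common = set(ratings_u) & set(ratings_n)
    if not common:
        return 0.0
    mean_u, mean_n = mean_rating(ratings_u), mean_rating(ratings_n)
    num = sum((ratings_u[i] - mean_u) * (ratings_n[i] - mean_n) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_n = sqrt(sum((ratings_n[i] - mean_n) ** 2 for i in common))
    if den_u == 0 or den_n == 0:
        return 0.0  # undefined correlation; treated as "no similarity" here
    return num / (den_u * den_n)


def predict(u, item, ratings, neighbors):
    """Adjusted-rating prediction pred(u, i) from the neighbors who rated the item."""
    mean_u = mean_rating(ratings[u])
    num = den = 0.0
    for n in neighbors:
        if item not in ratings[n]:
            continue  # assumption: neighbors without a rating are skipped
        sim = user_sim(ratings[u], ratings[n])
        num += sim * (ratings[n][item] - mean_rating(ratings[n]))
        den += sim
    if den == 0:
        return mean_u  # fall back to the user's own mean
    return mean_u + num / den


# Toy data (hypothetical): item ids mapped to ratings on a 1-5 scale.
ratings = {
    "u": {"a": 5, "b": 3, "c": 1},
    "n1": {"a": 4, "b": 3, "c": 2, "d": 5},
}
sim_u_n1 = user_sim(ratings["u"], ratings["n1"])
pred_u_d = predict("u", "d", ratings, ["n1"])
```

With a single neighbor the similarity cancels out, so the prediction is simply the user's mean plus the neighbor's deviation from their own mean on that item.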

Methods: CCF (2 / 2)

[Figure: worked example of CCF on sample user ratings (values 1 to 5)]

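Methods: NwCF (1 / 2)

• Neighbor-weighted Collaborative Filtering (NwCF): similar to CCF, yet incorporates the “amount of neighbors rating an item” in the ranking formula of recommended items

$$
\mathrm{userSim}(u,n) = \frac{\sum_{i \subset CR_{u,n}} (r_{ui} - \bar{r}_u)(r_{ni} - \bar{r}_n)}{\sqrt{\sum_{i \subset CR_{u,n}} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \subset CR_{u,n}} (r_{ni} - \bar{r}_n)^2}}
$$

$$
\mathrm{pred}'(u,i) = \log_{10}(1 + \mathrm{nbr}(i)) \cdot \mathrm{pred}(u,i)
$$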
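A small sketch of the NwCF re-weighting may help show its effect on ranking. It assumes nbr(i) is the number of neighbors who rated item i; the function name `nwcf_score` and the candidate values are hypothetical, chosen only for illustration.

```python
from math import log10


def nwcf_score(pred, nbr):
    """pred'(u, i) = log10(1 + nbr(i)) * pred(u, i)."""
    return log10(1 + nbr) * pred


# Hypothetical candidates: (CCF prediction, number of neighbors who rated the item).
candidates = {"A": (4.5, 1), "B": (4.0, 9)}

# Rank items by the neighbor-weighted score, highest first.
ranked = sorted(candidates, key=lambda i: nwcf_score(*candidates[i]), reverse=True)
```

Item A has the higher CCF prediction but rests on a single neighbor, so its weight is log10(2), roughly 0.30. Item B is backed by nine neighbors, its weight is log10(10) = 1, and it moves to the top of the list.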

Methods: NwCF (2 / 2)

[Figure: worked example of NwCF on sample user ratings (values 1 to 5)]

Methods: BM25 (1 / 2)

• BM25: We obtain the similarity between users (neighbors) using their set of tags as “documents” and performing an Okapi BM25 (probabilistic IR model) Retrieval Status Value [2] calculation.

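$$
\mathrm{pred}'(u,i) = \log_{10}(1 + \mathrm{nbr}(i)) \cdot \mathrm{pred}(u,i)
$$

$$
RSV_d = \sum_{t \in q} IDF \cdot \frac{(k_1 + 1)\, tf_{td}}{k_1 \left( (1 - b) + b \times (L_d / L_{ave}) \right) + tf_{td}} \cdot \frac{(k_3 + 1)\, tf_{tq}}{k_3 + tf_{tq}}
$$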
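The BM25 Retrieval Status Value can be sketched as follows, treating each user's tags as a bag-of-words "document" and the target user's tags as the query. This is an illustration under assumptions rather than the study's code: IDF is taken as log(N / df_t) following [2], the parameter values k1 = 1.2, b = 0.75, k3 = 1.2 are common defaults and are not stated on the slides, and the function name `bm25_rsv` and the toy tag lists are hypothetical.

```python
from collections import Counter
from math import log


def bm25_rsv(query_tags, doc_tags, all_docs, k1=1.2, b=0.75, k3=1.2):
    """Okapi BM25 score of one tag "document" against a tag "query"."""
    n_docs = len(all_docs)
    avg_len = sum(len(d) for d in all_docs) / n_docs
    tf_d = Counter(doc_tags)
    tf_q = Counter(query_tags)
    score = 0.0
    for t in tf_q:
        if tf_d[t] == 0:
            continue  # a tag absent from the document contributes nothing
        df = sum(1 for d in all_docs if t in d)
        idf = log(n_docs / df)  # assumption: IDF = log(N / df_t)
        length_norm = (1 - b) + b * (len(doc_tags) / avg_len)
        doc_part = ((k1 + 1) * tf_d[t]) / (k1 * length_norm + tf_d[t])
        query_part = ((k3 + 1) * tf_q[t]) / (k3 + tf_q[t])
        score += idf * doc_part * query_part
    return score


# Toy tag profiles (hypothetical): one list of tags per candidate neighbor.
docs = [
    ["ir", "ranking", "ir"],
    ["biology", "genes"],
    ["ranking", "web"],
]
query = ["ir", "ranking"]
scores = [bm25_rsv(query, d, docs) for d in docs]
```

The candidate whose tags overlap most with the query, and on the rarer tag, receives the highest score; a candidate sharing no tags scores zero.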

Methods: BM25 (2 / 2)

Query terms Doc_1 Doc_2 Doc_3

The Study

• 7 subjects

• For each subject, four lists of 10 recommendations each were created (CCF, NwCF, BM25_10, BM25_20)

• The four lists were combined and sorted randomly (because the lists overlapped, each subject saw fewer than 40 items)

• Subjects were asked to evaluate relevance (relevant / somewhat relevant / not relevant) and novelty (novel / somewhat novel / not novel)

Description of the Data

Crawled CiteULike (CUL) for 20 “center users” (only 7 were used for the study)

Annotation: tuple {user, article, tag}

Item          # of unique instances
users         358
articles      186,122
tags          51,903
annotations   902,711

Results

(a) nDCG (b) Average Novelty (c) Precision_2@5

(d) Precision_2@10 (e) Precision_2_1@5 (f) Precision_2_1@10
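The panel metrics can be sketched as below. Several points are assumptions, since the slides do not define them: relevance judgments are coded as relevant = 2, somewhat relevant = 1, not relevant = 0; Precision_2@k counts only items rated 2, while Precision_2_1@k counts items rated 2 or 1; and nDCG uses the common log2 discount. The function names and the sample judgments are hypothetical.

```python
from math import log2


def precision_at_k(rels, k, relevant_levels):
    """Fraction of the top-k items whose judgment is in relevant_levels."""
    return sum(1 for r in rels[:k] if r in relevant_levels) / k


def dcg(rels):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(r / log2(pos + 1) for pos, r in enumerate(rels, start=1))


def ndcg(rels):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


# Hypothetical judgments for one ranked list (2 = relevant, 1 = somewhat, 0 = not).
judged = [2, 0, 1, 2, 0]
p2_at_5 = precision_at_k(judged, 5, {2})
p21_at_5 = precision_at_k(judged, 5, {2, 1})
ndcg_value = ndcg(judged)
```

Under this reading, Precision_2_1 is never lower than Precision_2 for the same list, and nDCG rewards lists that place the relevant items nearer the top.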

Conclusions

• The rating scale must be considered carefully in a CF approach.

• NwCF, which incorporates the number of raters, decreases the uncertainty produced by items with too few ratings.

• The tag-based user similarity approach shows interesting results, which lead us to consider it a valid alternative to Pearson correlation when using CF algorithms.

• We will incorporate more users in our future studies to make the results more conclusive.

Questions?

Bibliography

• [1] Schafer, J., Frankowski, D., Herlocker, J. and Sen, S. 2007. Collaborative Filtering Recommender Systems. The Adaptive Web (May 2007), 291-324.

• [2] Manning, C., Raghavan, P. and Schütze, H. 2008. Introduction to Information Retrieval. Cambridge University Press.
