Document Recommendation in Social Tagging Services Z. Guan, C. Wang, J. Bu, C. Chen, K. Yang, D. Cai, and X. He Zhejiang University, China WWW 2010 July 22, 2010 Hyunwoo Kim



Document Recommendation in Social Tagging Services

Z. Guan, C. Wang, J. Bu, C. Chen, K. Yang, D. Cai, and X. He
Zhejiang University, China
WWW 2010

July 22, 2010
Hyunwoo Kim


Contents
– Introduction
– Multi-type Interrelated Objects Embedding
– Experiments
– Conclusion


Introduction [1/5]

Social tagging services
– Allowing users to annotate various online resources with tags
– Facilitating the users in finding and organizing online resources
– Providing meaningful collaborative semantic data

Recommender systems
– Focusing on user rating data in traditional studies
– Social tagging data is becoming more and more prevalent recently

In this paper
– The problem of document recommendation using purely tagging data


Introduction [2/5]

Searching in most tagging services
– Keyword-based search
– The number of returned results is very large
– Returning only resources which literally match the given tags
– Ignoring semantically related tags

Example: when searching for automobile, resources tagged with car may not be retrieved


Introduction [3/5]

Differences between tagging data and rating data
– Tagging data doesn’t have users’ explicit preference information on resources
– Tagging data: user, tag and resource
– Rating data: user and resource

Collaborative filtering method


Introduction [4/5]

Multi-type Interrelated Objects Embedding (MIOE)
– Annotation relationships between tags and documents
– Usage relationships between tags and users
– Bookmarking relationships between users and documents
– Affinity relationships among documents
– 3 bipartite graphs and 1 affinity graph

Optimal semantic space
– Preserving the connectivity structure of these graphs
– Representing users, tags and documents in the same space

if (two objects are strongly connected) {
    the corresponding edge has a high weight;
    the two objects should be mapped close to each other in the space;
}
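As a rough illustration of the three bipartite graphs, the sketch below builds weighted adjacency matrices from raw (user, tag, document) tagging records. The count-based weights, the function name `build_relation_matrices` and the toy records are assumptions made for this example; the slides do not state how edge weights are computed. The affinity graph among documents is omitted because it needs a document similarity measure that is not specified here.

```python
import numpy as np

def build_relation_matrices(triples, n_users, n_tags, n_docs):
    """Build usage, annotation and bookmarking matrices from tagging records.

    triples: iterable of (user_index, tag_index, doc_index).
    Weights are simple co-occurrence counts (an assumption for illustration).
    """
    R_ut = np.zeros((n_users, n_tags))  # usage: how often user i used tag j
    R_td = np.zeros((n_tags, n_docs))   # annotation: how often tag i was applied to doc j
    R_ud = np.zeros((n_users, n_docs))  # bookmarking: 1 if user i bookmarked doc j
    for u, t, d in triples:
        R_ut[u, t] += 1
        R_td[t, d] += 1
        R_ud[u, d] = 1
    return R_ut, R_td, R_ud

# Toy data: user 0 tags doc 0 with tags 0 and 1; user 1 tags docs 1 and 2 with tag 0.
triples = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 0, 2)]
R_ut, R_td, R_ud = build_relation_matrices(triples, n_users=2, n_tags=2, n_docs=3)
```

Here `R_ut[1, 0]` is 2 because user 1 used tag 0 twice, which corresponds to a heavier edge in the usage graph.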


Introduction [5/5]

Goal of MIOE
– Given a user, the closest documents which have not been bookmarked by this user are recommended to her
– Naturally capturing the correlations among tags
– Applicable to any social tagging data as long as a notion of similarity between resources is defined
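The recommendation step described above can be sketched as a nearest-neighbour lookup once user and document coordinates are available. The function name `recommend`, the parameter `k` and the toy coordinates are hypothetical; how the coordinates are learned is not covered by this sketch.

```python
import numpy as np

def recommend(user_vec, doc_vecs, bookmarked, k=3):
    """Return indices of the k closest documents the user has not bookmarked."""
    # Euclidean distance from the user to every document in the shared space.
    dists = np.linalg.norm(doc_vecs - user_vec, axis=1)
    order = np.argsort(dists)  # closest first
    return [int(d) for d in order if int(d) not in bookmarked][:k]

# Toy 2-D coordinates (assumed, not learned).
user_vec = np.array([0.0, 0.0])
doc_vecs = np.array([[0.1, 0.0], [1.0, 1.0], [0.2, 0.1], [0.5, 0.5]])
top = recommend(user_vec, doc_vecs, bookmarked={0}, k=2)
```

Document 0 is the closest but is skipped because the user already bookmarked it, so the result is documents 2 and 3.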


Multi-type Interrelated Objects Embedding [1/7]

The basic intuition behind MIOE

if (a user u has used a tag t many times) {
    she has a strong interest in the topic represented by the tag t;
}

if (t has been applied to document d many times) {
    d is strongly related to the topic represented by t;
}

We should recommend such a document d to the user u;
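The intuition above can be made concrete with a two-hop count through the tags. This is only an illustration of the user, tag, document chain and not MIOE itself, which learns an embedding instead of multiplying counts. The matrices and their values are made up for the example.

```python
import numpy as np

# Usage counts: rows are users, columns are tags (toy values).
R_ut = np.array([[5, 0],    # user 0 used tag 0 five times
                 [0, 3]])   # user 1 used tag 1 three times

# Annotation counts: rows are tags, columns are documents (toy values).
R_td = np.array([[4, 0, 1],   # tag 0 was applied to doc 0 four times, doc 2 once
                 [0, 2, 0]])  # tag 1 was applied to doc 1 twice

# scores[u, d] sums over tags t: (times u used t) * (times t was applied to d).
scores = R_ut @ R_td
best_doc_per_user = scores.argmax(axis=1)
```

User 0 uses tag 0 heavily and tag 0 is applied most often to document 0, so document 0 gets the highest score for user 0.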


MIOE [2/7]

- Learning the Optimal Semantic Space


MIOE [3/7]

- Learning the Optimal Semantic Space

Representing users, tags and documents in the same space

Two strongly connected objects should be mapped close to each other in the learned space

[Figure: documents, users and tags plotted in one learned space with axes x, y and z]


MIOE [4/7]

- Learning the Optimal Semantic Space

The problem
– Finding a semantic space for users, tags and documents which best preserves the connectivity structures of the graphs
– Annotation relationship, usage relationship, bookmarking relationship and affinity relationship

Given a user, recommending a list of documents in which the user would be interested with the highest probabilities

M. Belkin et al., “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”, Advances in Neural Information Processing Systems 14, 2001

W. Min et al., “Locality Pursuit Embedding”, Pattern Recognition 37, 2004

X. He et al., “Learning a Maximum Margin Subspace for Image Retrieval”, IEEE Transactions on Knowledge and Data Engineering 20, 2008


MIOE [5/7]

- Learning the Optimal Semantic Space

Projections*
– PCA (Principal Component Analysis)
– LPE (Locality Pursuit Embedding)

* W. Min et al., “Locality Pursuit Embedding”, Pattern Recognition 37, 2004


MIOE [6/7]

- Learning the Optimal Semantic Space

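Distance metric: Euclidean distance

One dimension, $A(a)$ and $B(b)$:
$d(A, B) = |b - a|$

Two dimensions, $A(a_1, a_2)$ and $B(b_1, b_2)$:
$d(A, B) = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2}$

Three dimensions, $A(a_1, a_2, a_3)$ and $B(b_1, b_2, b_3)$:
$d(A, B) = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2 + (b_3 - a_3)^2}$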
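The one-, two- and three-dimensional cases follow a single pattern. For a learned space of dimension $k$ (the symbol $k$ is introduced here for illustration and does not appear on the slide), the same distance would read:

```latex
d(A, B) = \sqrt{\sum_{i=1}^{k} (b_i - a_i)^2},
\qquad A = (a_1, \dots, a_k),\; B = (b_1, \dots, b_k)
```

Setting $k = 1, 2, 3$ recovers the three cases listed on this slide.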


MIOE [7/7]

- Learning the Optimal Semantic Space

In practice
– New objects continually join the tagging data
– Re-computing the optimal space for each new object is costly

Solution
– Approximating the positions of new objects in the learned space by using approximated eigenfunctions based on the kernel trick*
– Re-computing the optimal space periodically

* Y. Bengio et al., “Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering”, Advances in Neural Information Processing Systems 16, 2003
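To give a feel for placing a new object without re-learning the space, the sketch below uses a much simpler heuristic: it positions the new object at the weighted average of the coordinates of the known objects it is connected to. This is a stand-in chosen for brevity, not the eigenfunction approximation of Bengio et al. that the slide refers to. The function name `place_new_object` and the toy values are assumptions.

```python
import numpy as np

def place_new_object(weights, coords):
    """Approximate a new object's position as a weighted average of neighbours.

    weights: connection strengths to already embedded objects (e.g. tag counts).
    coords:  coordinates of those embedded objects, one row per object.
    """
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total == 0:
        raise ValueError("new object has no connections to known objects")
    return weights @ coords / total

# A new document annotated with three already embedded tags (toy values).
tag_coords = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]])
new_doc = place_new_object([1, 1, 2], tag_coords)
```

The new document lands at (0.5, 2.0), pulled most strongly toward the tag with weight 2. Periodic re-computation of the full space would still be needed, as the slide notes.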


Experiments [1/6]

Data sets: Del.icio.us and CiteULike

 | Del.icio.us | CiteULike
No. of users | 300 | 300
No. of tags | 14,790 | 10,753
No. of documents | 12,819 | 11,558
No. of bookmarks | 122,879 | 34,061

Compared algorithms
– User-CF: a version of the user-based CF algorithm for unary data
– Funk-SVD: Singular Value Decomposition to approximate the original user-item matrix using a low-rank matrix
– TVS: Tag Vector Similarity, representing users and documents in the tag space as TF-IDF tag profile vectors
– CVS: Content Vector Similarity, maintaining multiple profiles for a user to better capture the user’s interests


Experiments [2/6]

Evaluation methodology
– Total 300 users
– 270 users as training users
– 30 users as test users
– 50% of bookmarks are used for model construction (training)
– Remaining 50% of bookmarks are used for evaluation (ground truth)

Evaluation metrics
– Precision
– Mean Average Precision (MAP)
– Normalized Discounted Cumulative Gain (NDCG)
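The three metrics can be sketched as follows for a single test user, assuming binary relevance (a held-out bookmark is relevant, everything else is not). The exact definitions used in the paper may differ, for example in the cut-off or in how average precision is normalized. MAP is the mean of `average_precision` over all test users. Function names and the toy ranking are hypothetical.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    """DCG with binary gains, divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal > 0 else 0.0

# Toy example: four recommendations, two of which are held-out bookmarks.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
p2 = precision_at_k(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
ndcg4 = ndcg_at_k(ranked, relevant, 4)
```

In this toy case precision at 2 is 0.5, and average precision is (1/1 + 2/3) / 2, since the relevant documents appear at ranks 1 and 3.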


Experiments [3/6]


Experiments [4/6]


Experiments [5/6]


Experiments [6/6]

Case studies
– Recommended Web pages

Frequently used tags: blog(30), design(25), rails(22), programming(20), reference(15), javascript(15), ajax(15), development(14), software(13), apple(13), ruby(11)

Rank | URL | Description
1 | http://www.aptana.com/ | An IDE for Web application development and deployment.
2 | http://www.squidfingers.com/ | The personal portfolio of a graphic designer who designs background patterns and wallpapers for web pages.
3 | http://www.fudgie.org/ | A multiple-server log file visualizer written in Ruby.

– Nearest tags

Selected Tag | Six Nearest Tags
shopping | product, buy, consumer, merchandise, products, shop
funny | humor, humour, culture, weird, interesting, cool
food | kitchen, eating, foodblog, craving, gourmet, cooking
music | mp3, songza, socialpl, deezer, musicsearch, pandora
travel | trip, bookings, trvl, charter, transporation, travelsearch


Conclusion

Focusing on the problem of document recommendation in social tagging services
Modeling it as a representation learning problem
Proposing a novel semantic space learning algorithm (MIOE)
Optimal semantic space for users, tags and documents by keeping related objects close in the target space

Future work
– Examining the tag ambiguity issue, which is harmful to MIOE
– Improving MIOE’s scalability so that it can be applied to very large datasets

Thank You

Appendix [1/9]

Q(f, g, p): cost function
f: |U| x 1 vector for U, f_i is the coordinate of u_i on the line
g: |T| x 1 vector for T, g_i is the coordinate of t_i on the line
p: |D| x 1 vector for D, p_i is the coordinate of d_i on the line
Rut, Rtd, Rud: weighted adjacency matrices
W: affinity matrix
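The formula for Q itself did not survive in this transcript. A plausible form consistent with the definitions above penalizes strongly connected objects that lie far apart on the line. The trade-off weights $\alpha, \beta, \gamma, \eta$ are assumptions introduced here, and the paper's exact expression may differ:

```latex
Q(f, g, p) =
    \alpha \sum_{i=1}^{|U|} \sum_{j=1}^{|T|} R^{ut}_{ij} (f_i - g_j)^2
  + \beta  \sum_{i=1}^{|T|} \sum_{j=1}^{|D|} R^{td}_{ij} (g_i - p_j)^2
  + \gamma \sum_{i=1}^{|U|} \sum_{j=1}^{|D|} R^{ud}_{ij} (f_i - p_j)^2
  + \eta   \sum_{i=1}^{|D|} \sum_{j=1}^{|D|} W_{ij} (p_i - p_j)^2
```

Each bipartite term expands into quadratic forms with diagonal row-sum and column-sum matrices, which is where matrices of that kind enter the derivation. For the usage term, with $D^{ut}$ holding the row sums and $D^{tu}$ the column sums of $R^{ut}$:

```latex
\sum_{i,j} R^{ut}_{ij} (f_i - g_j)^2
  = f^{\top} D^{ut} f + g^{\top} D^{tu} g - 2 f^{\top} R^{ut} g
```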


Appendix [2/9]

Dut: diagonal matrix, (i, i)-th element equals the sum of the i-th row of Rut

Dtu: diagonal matrix, (i, i)-th element equals the sum of the i-th column of Rut


Appendix [3/9]

Dtd: diagonal matrix, (i, i)-th element equals the sum of the i-th row of Rtd

Ddt: diagonal matrix, (i, i)-th element equals the sum of the i-th column of Rtd

Dud: diagonal matrix, (i, i)-th element equals the sum of the i-th row of Rud

Ddu: diagonal matrix, (i, i)-th element equals the sum of the i-th column of Rud

Appendix [4/9]

Using graph Laplacian matrix*

D: diagonal matrix, (i, i)-th elements equal to the sum of the i-th row of W

W: affinity matrix

* M. Belkin et al., “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”, Advances in Neural Information Processing Systems 14, 2001
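The role of the graph Laplacian can be checked numerically. With the standard definition L = D - W, the weighted sum of squared coordinate differences over all ordered pairs equals twice the quadratic form of p with L. The small symmetric matrix `W` and vector `p` below are made-up values for the check.

```python
import numpy as np

# Symmetric affinity matrix among three documents (toy values).
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0]])
D = np.diag(W.sum(axis=1))  # (i, i)-th element is the sum of the i-th row of W
L = D - W                   # graph Laplacian

p = np.array([0.2, -1.0, 3.0])  # one coordinate per document

# Left side: sum over all ordered pairs of W_ij * (p_i - p_j)^2.
pairwise = sum(W[i, j] * (p[i] - p[j]) ** 2 for i in range(3) for j in range(3))
# Right side: 2 * p^T L p.
quadratic = 2 * p @ L @ p
```

Both sides evaluate to 10.72 here, so minimizing the pairwise affinity term amounts to minimizing a quadratic form in L.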

Appendix [5/9]

Using Rayleigh quotient* in order to remove an arbitrary scaling factor

* J. Ham et al., “Semisupervised alignment of manifolds”, the Annual Conference on Uncertainty in Artificial Intelligence, 2005

Appendix [6/9]

Using Rayleigh quotient


Appendix [7/9]

By the Rayleigh-Ritz theorem*
– The solution of this optimization problem is given by the eigenvector corresponding to the second smallest eigenvalue of $\tilde{L}$

* H. Lutkepohl, “Handbook of Matrices”, Wiley, 1996

Appendix [8/9]

Maximizing the global variance in the target subspace instead of maximizing

The variance of f, g and p*

* F. R. K. Chung, “Spectral Graph Theory”, American Mathematical Society, 1997


Appendix [9/9]

The optimization problem becomes

This optimization problem can be solved by finding the generalized eigenvector corresponding to the second smallest eigenvalue of
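The matrices of the final eigenproblem are not recoverable from this transcript, so the sketch below substitutes plain Laplacian eigenmaps on the joint graph of users, tags and documents, with all four relations weighted equally. It solves the generalized problem L h = λ D h through the symmetrically normalized Laplacian and skips the trivial constant eigenvector. The function name `joint_embedding`, the equal weighting and the toy data are assumptions, not the paper's formulation.

```python
import numpy as np

def joint_embedding(R_ut, R_td, R_ud, W, dim=2):
    """Embed users, tags and documents in one space (Laplacian eigenmaps sketch)."""
    n_u, n_t = R_ut.shape
    n_d = R_td.shape[1]
    # Joint symmetric adjacency; objects are ordered users, tags, documents.
    A = np.block([
        [np.zeros((n_u, n_u)), R_ut,                 R_ud],
        [R_ut.T,               np.zeros((n_t, n_t)), R_td],
        [R_ud.T,               R_td.T,               W],
    ])
    deg = A.sum(axis=1)
    if np.any(deg <= 0):
        raise ValueError("every object needs at least one connection")
    L = np.diag(deg) - A
    # L h = lambda D h is equivalent to a symmetric problem for v = D^(1/2) h.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Drop the first (constant) solution, keep the next `dim` eigenvectors.
    H = d_inv_sqrt[:, None] * eigvecs[:, 1:dim + 1]
    return H[:n_u], H[n_u:n_u + n_t], H[n_u + n_t:]

# Two loosely linked groups: (user 0, tag 0, doc 0) and (user 1, tag 1, doc 1).
R_ut = np.array([[3.0, 0.0], [0.0, 3.0]])
R_td = np.array([[2.0, 0.0], [0.0, 2.0]])
R_ud = np.array([[1.0, 0.0], [0.0, 1.0]])
W = np.array([[0.0, 0.1], [0.1, 0.0]])  # weak affinity between the two documents
users, tags, docs = joint_embedding(R_ut, R_td, R_ud, W, dim=1)
```

In this toy graph user 0 ends up closer to document 0 than to document 1, which is the behaviour the embedding is meant to produce. A dense eigendecomposition like this scales poorly, which matches the scalability concern raised in the conclusion.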