Document Recommendation in Social Tagging Services
Z. Guan, C. Wang, J. Bu, C. Chen, K. Yang, D. Cai, and X. He
Zhejiang University, China
WWW 2010
July 22, 2010
Hyunwoo Kim
Contents
– Introduction
– Multi-type Interrelated Objects Embedding
– Experiments
– Conclusion
Introduction [1/5]
Social tagging services
– Allowing users to annotate various online resources with tags
– Facilitating the users in finding and organizing online resources
– Providing meaningful collaborative semantic data
Recommender systems
– Focusing on user rating data in traditional studies
– Social tagging data is becoming more and more prevalent recently
In this paper
– The problem of document recommendation using purely tagging data
Introduction [2/5]
Searching in most tagging services
– Keyword-based search
– The number of returned results is very large
– Returning resources which literally match the given tags
– Ignoring semantically related tags
Searching for automobile → resources tagged with car may not be retrieved
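A minimal sketch of the literal-match problem described above. The tag index and resource names are made up for illustration and are not taken from the paper.

```python
# Toy tag index: resource -> set of tags (all names here are hypothetical).
tag_index = {
    "doc_sedan_review": {"car", "review"},
    "doc_engine_guide": {"automobile", "engine"},
    "doc_pasta_recipe": {"food", "cooking"},
}

def keyword_search(index, query_tag):
    """Return resources whose tag set literally contains query_tag."""
    return sorted(r for r, tags in index.items() if query_tag in tags)

literal_hits = keyword_search(tag_index, "automobile")
print(literal_hits)  # the resource tagged only with "car" is not returned
```

A search for automobile misses the resource tagged only with car, even though the two tags are semantically related.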
Introduction [3/5]
Differences between tagging data and rating data
– Tagging data doesn’t have users’ explicit preference information on resources
– Tagging data: user, tag and resource
– Rating data: user and resource
Collaborative filtering method
Introduction [4/5]
Multi-type Interrelated Objects Embedding (MIOE)
– Annotation relationships between tags and documents
– Usage relationships between tags and users
– Bookmarking relationships between users and documents
– Affinity relationships among documents
– 3 bipartite graphs and 1 affinity graph
Optimal semantic space
– Preserving the connectivity structure of these graphs
– Representing users, tags and documents in the same space
if (two objects are strongly connected) {
    the corresponding edge has a high weight;
    the two objects should be mapped close to each other in the space;
}
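A rough sketch of how the three bipartite graphs could be assembled from raw tagging records. The records, the use of plain counts as edge weights, and the name `build_relations` are assumptions for illustration; the slides do not state the weighting scheme, and the document affinity graph is left out because it needs a separate similarity measure.

```python
from collections import Counter

# Hypothetical tagging records: (user, tag, document).
records = [
    ("alice", "python", "doc1"),
    ("alice", "python", "doc2"),
    ("alice", "web", "doc2"),
    ("bob", "web", "doc2"),
    ("bob", "design", "doc3"),
]

def build_relations(records):
    """Count co-occurrences for the three bipartite graphs."""
    r_ut = Counter()  # usage: (user, tag) -> times the user used the tag
    r_td = Counter()  # annotation: (tag, doc) -> times the tag was applied to the doc
    r_ud = Counter()  # bookmarking: (user, doc) -> 1 if the user bookmarked the doc
    for user, tag, doc in records:
        r_ut[(user, tag)] += 1
        r_td[(tag, doc)] += 1
        r_ud[(user, doc)] = 1
    return r_ut, r_td, r_ud

r_ut, r_td, r_ud = build_relations(records)
print(r_ut[("alice", "python")])  # 2
print(r_td[("web", "doc2")])      # 2
print(r_ud[("bob", "doc3")])      # 1
```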
Introduction [5/5]
Goal of MIOE
– Given a user, the closest documents which have not been bookmarked by this user are recommended to her
– Naturally capturing the correlations among tags
– Applicable to any social tagging data as long as a notion of similarity between resources is defined
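The recommendation step can be sketched as a nearest-neighbor lookup, assuming the space has already been learned. The 2-D coordinates, the bookmark sets, and the function name `recommend` are hypothetical.

```python
import math

# Hypothetical 2-D coordinates in an already learned space.
user_pos = {"alice": (0.0, 0.0)}
doc_pos = {
    "doc1": (0.1, 0.0),
    "doc2": (1.0, 1.0),
    "doc3": (0.2, 0.1),
    "doc4": (3.0, 0.0),
}
bookmarked = {"alice": {"doc1"}}

def recommend(user, k):
    """Return the k closest documents the user has not bookmarked yet."""
    origin = user_pos[user]
    candidates = [d for d in doc_pos if d not in bookmarked[user]]
    candidates.sort(key=lambda d: math.dist(origin, doc_pos[d]))
    return candidates[:k]

top2 = recommend("alice", 2)
print(top2)  # ['doc3', 'doc2']
```

doc1 is the closest document overall but is skipped because alice has already bookmarked it.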
Multi-type Interrelated Objects Embedding [1/7]
The basic intuition behind MIOE
if (a user u has used a tag t many times) {
    she has a strong interest in the topic represented by the tag t;
}
if (t has been applied to document d many times) {
    d is strongly related to the topic represented by t;
}
We should recommend such a document d to the user u.
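The intuition above can be illustrated with a simple two-hop score: multiply how often u used t by how often t was applied to d, and sum over tags. This only sketches the intuition, not the MIOE algorithm itself; the counts and the name `path_scores` are made up.

```python
# Hypothetical counts: how often a user used a tag, and how often a tag
# was applied to a document.
user_tag = {"u": {"python": 5, "cooking": 1}}
tag_doc = {
    "python": {"d1": 4, "d2": 1},
    "cooking": {"d3": 6},
}

def path_scores(user):
    """Score each document by summing count(u, t) * count(t, d) over tags t."""
    scores = {}
    for tag, n_ut in user_tag[user].items():
        for doc, n_td in tag_doc.get(tag, {}).items():
            scores[doc] = scores.get(doc, 0) + n_ut * n_td
    return scores

scores = path_scores("u")
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)  # {'d1': 20, 'd2': 5, 'd3': 6}
print(ranked)  # ['d1', 'd3', 'd2']
```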
MIOE [2/7]
- Learning the Optimal Semantic Space
MIOE [3/7]
- Learning the Optimal Semantic Space
Representing users, tags and documents in the same space
Two strongly connected objects should be mapped close to each other in the learned space
[Figure: documents, users and tags embedded in one space with axes x, y and z]
MIOE [4/7]
- Learning the Optimal Semantic Space
The problem
– Finding a semantic space for users, tags and documents which best preserves the connectivity structures of the graphs
– Annotation relationship, usage relationship, bookmarking relationship and affinity relationship
Given a user, recommending a list of documents in which the user is most likely to be interested
M. Belkin et al., “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”, Advances in Neural Information Processing Systems 14, 2001
W. Min et al., “Locality Pursuit Embedding”, Pattern Recognition 37, 2004
X. He et al., “Learning a Maximum Margin Subspace for Image Retrieval”, IEEE Transactions on Knowledge and Data Engineering 20, 2008
MIOE [5/7]
- Learning the Optimal Semantic Space
Projections*
– PCA (Principal Component Analysis)
– LPE (Locality Pursuit Embedding)
* W. Min et al., “Locality Pursuit Embedding”, Pattern Recognition 37, 2004
MIOE [6/7]
- Learning the Optimal Semantic Space
Distance metric: Euclidean distance
A(a), B(b): $d(A, B) = |b - a|$
A(a_1, a_2), B(b_1, b_2): $d(A, B) = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2}$
A(a_1, a_2, a_3), B(b_1, b_2, b_3): $d(A, B) = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2 + (b_3 - a_3)^2}$
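The same distance can be written once for any dimension. A small sketch; the function name `euclidean` and the sample points are illustrative.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))

d1 = euclidean((1.0,), (4.0,))                    # 1-D case
d2 = euclidean((0.0, 0.0), (3.0, 4.0))            # 2-D case
d3 = euclidean((1.0, 2.0, 3.0), (3.0, 5.0, 9.0))  # 3-D case
print(d1, d2, d3)  # 3.0 5.0 7.0
```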
MIOE [7/7]
- Learning the Optimal Semantic Space
In practice
– New objects continually join the tagging data
– Re-computing the optimal space for each new object is costly
Solution
– Approximating the positions of new objects in the learned space by using approximated eigenfunctions based on the kernel trick*
– Re-computing the optimal space periodically
* Y. Bengio et al., “Out-of-sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering”, Advances in Neural Information Processing Systems 16, 2003
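To give a feel for placing a new object without re-learning the space, here is a deliberately simpler stand-in: put the new object at the weighted mean of the objects it is connected to. This is not the eigenfunction approximation of Bengio et al.; the function name `place_new_object`, the weights, and the coordinates are assumptions for illustration.

```python
def place_new_object(neighbor_weights, positions):
    """Weighted mean of the positions of already embedded neighbors.

    neighbor_weights: {object_id: edge weight to the new object}
    positions: {object_id: tuple of coordinates in the learned space}
    """
    total = sum(neighbor_weights.values())
    if total == 0:
        raise ValueError("new object has no weighted neighbors")
    dim = len(next(iter(positions.values())))
    coords = [0.0] * dim
    for obj, weight in neighbor_weights.items():
        for i, x in enumerate(positions[obj]):
            coords[i] += weight * x / total
    return tuple(coords)

# A new document tagged once with "python" and three times with "web".
tag_positions = {"python": (0.0, 0.0), "web": (2.0, 4.0)}
new_doc = place_new_object({"python": 1, "web": 3}, tag_positions)
print(new_doc)  # (1.5, 3.0)
```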
Experiments [1/6]
Data sets: Del.icio.us and CiteULike

| | Del.icio.us | CiteULike |
|---|---|---|
| No. of users | 300 | 300 |
| No. of tags | 14,790 | 10,753 |
| No. of documents | 12,819 | 11,558 |
| No. of bookmarks | 122,879 | 34,061 |

Compared algorithms
– User-CF: a version of the user-based CF algorithm for unary data
– Funk-SVD: Singular Value Decomposition to approximate the original user-item matrix using a low-rank matrix
– TVS: Tag Vector Similarity, representing users and documents in the tag space as TF-IDF tag profile vectors
– CVS: Content Vector Similarity to maintain multiple for a user to better capture the user’s interests
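A minimal sketch of the TVS baseline idea: users and documents become TF-IDF weighted tag vectors and are compared by cosine similarity. The exact TF-IDF variant used in the paper is not given on the slides; raw counts times log(N / df), with document frequency computed over document profiles, is an assumption here, as are all names and counts.

```python
import math

# Hypothetical tag counts for documents and for one user.
doc_tags = {
    "d1": {"python": 3, "web": 1},
    "d2": {"cooking": 2},
    "d3": {"web": 2, "design": 1},
}
user_tags = {"python": 4, "web": 1}

def idf_weights(profiles):
    """Inverse document frequency of each tag over the document profiles."""
    n = len(profiles)
    df = {}
    for counts in profiles.values():
        for tag in counts:
            df[tag] = df.get(tag, 0) + 1
    return {tag: math.log(n / count) for tag, count in df.items()}

def tfidf(counts, idf):
    """Weight raw tag counts by IDF; tags without an IDF value get weight 0."""
    return {tag: c * idf.get(tag, 0.0) for tag, c in counts.items()}

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tag, 0.0) for tag, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

idf = idf_weights(doc_tags)
user_vec = tfidf(user_tags, idf)
sims = {d: cosine(user_vec, tfidf(c, idf)) for d, c in doc_tags.items()}
ranked = sorted(sims, key=sims.get, reverse=True)
print(ranked)  # ['d1', 'd3', 'd2']
```

d2 shares no tag with the user and gets similarity 0, which is the literal-match limitation that an embedding of related tags is meant to soften.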
Experiments [2/6]
Evaluation methodology
– Total 300 users
– 270 users as training users
– 30 users as test users
50% of the bookmarks are used for model construction (training)
The remaining 50% of the bookmarks are used for evaluation (ground truth)
Evaluation metrics
– Precision
– Mean Average Precision (MAP)
– Normalized Discounted Cumulative Gain (NDCG)
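The three metrics can be sketched for a single test user as follows. The slides do not give the exact formulas, so binary relevance, a log2 position discount, and normalizing average precision by the number of relevant documents are assumptions; MAP would be the mean of `average_precision` over all test users.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Mean of precision@i over the ranks i where a relevant item appears."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG with a log2 position discount."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0

# Hypothetical ranking and held-out bookmarks for one test user.
ranked_docs = ["d1", "d2", "d3", "d4"]
held_out = {"d1", "d3"}
p2 = precision_at_k(ranked_docs, held_out, 2)
ap = average_precision(ranked_docs, held_out)
ndcg4 = ndcg_at_k(ranked_docs, held_out, 4)
print(p2, ap, ndcg4)
```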
Experiments [3/6]
Experiments [4/6]
Experiments [5/6]
Experiments [6/6]
Case studies
– Recommended Web pages

Frequently used tags: blog(30), design(25), rails(22), programming(20), reference(15), javascript(15), ajax(15), development(14), software(13), apple(13), ruby(11)

| Rank | URL | Description |
|---|---|---|
| 1 | http://www.aptana.com/ | An IDE for Web application development and deployment. |
| 2 | http://www.squidfingers.com/ | The personal portfolio of a graphic designer who designs background patterns and wallpapers for web pages. |
| 3 | http://www.fudgie.org/ | A multiple server log file visualizer written in Ruby. |

– Nearest tags

| Selected Tag | Six Nearest Tags |
|---|---|
| shopping | product, buy, consumer, merchandise, products, shop |
| funny | humor, humour, culture, weird, interesting, cool |
| food | kitchen, eating, foodblog, craving, gourmet, cooking |
| music | mp3, songza, socialpl, deezer, musicsearch, pandora |
| travel | trip, bookings, trvl, charter, transporation, travelsearch |
Conclusion
Focusing on the problem of document recommendation in social tagging services
Modeling as a representation learning problem
Proposing a novel semantic space learning algorithm (MIOE)
Optimal semantic space for users, tags and documents by keeping related objects close in the target space
Future work
– Examining the tag ambiguity issue, which is harmful to MIOE
– Improving MIOE’s scalability so that it can be applied to very large datasets
Thank You
Appendix [1/9]
Q(f, g, p): cost function
f: |U| x 1 vector for U, fi is the coordinate of ui on the line
g: |T| x 1 vector for T, gi is the coordinate of ti on the line
p: |D| x 1 vector for D, pi is the coordinate of di on the line
Rut, Rtd, Rud: weighted adjacency matrices
W: affinity matrix
Appendix [2/9]
Dut: diagonal matrix, (i, i)-th elements equal to the sum of the i-th row of Rut
Dtu: diagonal matrix, (i, i)-th elements equal to the sum of the i-th column of Rut
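The two degree matrices for Rut can be computed from row sums and column sums. A small sketch with a made-up 2 x 3 matrix; the helper `diag` and the values are illustrative only.

```python
# Hypothetical user-tag weight matrix R_ut: rows are users, columns are tags.
r_ut = [
    [2, 0, 1],
    [0, 3, 1],
]

def diag(values):
    """Square matrix with the given values on the diagonal and 0 elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

row_sums = [sum(row) for row in r_ut]        # one entry per user
col_sums = [sum(col) for col in zip(*r_ut)]  # one entry per tag

d_ut = diag(row_sums)  # |U| x |U|
d_tu = diag(col_sums)  # |T| x |T|
print(d_ut)  # [[3, 0], [0, 4]]
print(d_tu)  # [[2, 0, 0], [0, 3, 0], [0, 0, 2]]
```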
Appendix [3/9]
Dtd: diagonal matrix, (i, i)-th elements equal to the sum of the i-th row of Rtd
Ddt: diagonal matrix, (i, i)-th elements equal to the sum of the i-th column of Rtd
Dud: diagonal matrix, (i, i)-th elements equal to the sum of the i-th row of Rud
Ddu: diagonal matrix, (i, i)-th elements equal to the sum of the i-th column of Rud
Appendix [4/9]
Using graph Laplacian matrix*
D: diagonal matrix, (i, i)-th elements equal to the sum of the i-th row of W
W: affinity matrix
* M. Belkin et al., “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”, Advances in Neural Information Processing Systems 14, 2001
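The reason a graph Laplacian shows up is the standard identity f^T L f = 1/2 Σ Wij (fi − fj)^2 for L = D − W with symmetric W, which turns a sum of pairwise penalties into a quadratic form. A small numeric check on a made-up 3-node affinity matrix; the function names are illustrative.

```python
# Hypothetical symmetric affinity matrix W for three documents.
w = [
    [0, 2, 0],
    [2, 0, 1],
    [0, 1, 0],
]

def laplacian(w):
    """L = D - W, where D holds the row sums of W on its diagonal."""
    n = len(w)
    degrees = [sum(row) for row in w]
    return [[(degrees[i] if i == j else 0) - w[i][j] for j in range(n)]
            for i in range(n)]

def quadratic_form(m, f):
    """Compute f^T M f."""
    n = len(f)
    return sum(f[i] * m[i][j] * f[j] for i in range(n) for j in range(n))

def pairwise_penalty(w, f):
    """Compute 1/2 * sum_ij W_ij (f_i - f_j)^2."""
    n = len(f)
    return 0.5 * sum(w[i][j] * (f[i] - f[j]) ** 2
                     for i in range(n) for j in range(n))

f = [0.0, 1.0, 3.0]
lap = laplacian(w)
print(lap)                     # [[2, -2, 0], [-2, 3, -1], [0, -1, 1]]
print(quadratic_form(lap, f))  # 6.0
print(pairwise_penalty(w, f))  # 6.0
```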
Appendix [5/9]
Using Rayleigh quotient* in order to remove an arbitrary scaling factor
* J. Ham et al., “Semisupervised alignment of manifolds”, the Annual Conference on Uncertainty in Artificial Intelligence, 2005
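Why a quotient removes the scaling factor: the raw cost f^T L f can be driven toward zero just by shrinking f, while a Rayleigh quotient is unchanged by rescaling. The sketch below uses the generic quotient f^T L f / f^T f on a made-up Laplacian; the denominator used on the slide is not recoverable from the transcript, so this form is an assumption.

```python
# Hypothetical 3-node graph Laplacian (L = D - W).
lap = [
    [2, -2, 0],
    [-2, 3, -1],
    [0, -1, 1],
]

def quadratic_form(m, f):
    """Compute f^T M f."""
    n = len(f)
    return sum(f[i] * m[i][j] * f[j] for i in range(n) for j in range(n))

def rayleigh_quotient(m, f):
    """Compute f^T M f / f^T f, which does not depend on the scale of f."""
    return quadratic_form(m, f) / sum(x * x for x in f)

f = [0.0, 1.0, 3.0]
shrunk = [0.1 * x for x in f]

raw_cost = quadratic_form(lap, f)
raw_cost_shrunk = quadratic_form(lap, shrunk)
rq = rayleigh_quotient(lap, f)
rq_shrunk = rayleigh_quotient(lap, shrunk)
print(raw_cost, raw_cost_shrunk)  # the raw cost falls by a factor of 100
print(rq, rq_shrunk)              # the quotient stays the same
```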
Appendix [6/9]
Using Rayleigh quotient
Appendix [7/9]
By the Rayleigh-Ritz theorem*
– The solution of this optimization problem is given by the eigenvector corresponding to the second smallest eigenvalue of $\tilde{L}$
* H. Lutkepohl, “Handbook of Matrices”, Wiley, 1996
Appendix [8/9]
Maximizing the global variance in the target sub-space instead of maximizing
The variance of f, g and p*
* F. R. K. Chung, “Spectral Graph Theory”, American Mathematical Society, 1997
Appendix [9/9]
The optimization problem becomes
This optimization problem can be solved by finding the generalized eigenvector corresponding to the second smallest eigenvalue of