social bookmarking and collaborative filtering

+ Social Bookmarking and Collaborative Filtering
Christopher G. Wagner

+ What is Social Bookmarking?
Bookmark storage
  Stored online rather than locally in a browser
  No folders: items can belong to more than one “folder”
Finding others with similar interests
Using the interests of others to locate more interesting sites

+ Views of Social Bookmarks
View personal bookmarks and tags
View all items with a particular tag or tags
  A new way of searching
View the tags of another user
Create private and public groups for sharing
View ratings of bookmarks
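The tag-based views above can be sketched with a small in-memory store. This is only an illustration, not how del.icio.us or any other service is implemented; the class name `BookmarkStore`, its methods, and the example URLs are all hypothetical.

```python
from collections import defaultdict


class BookmarkStore:
    """Toy store in which a bookmark can carry any number of tags (no folders)."""

    def __init__(self):
        # (user, url) -> set of tags that user attached to that url
        self._tags = defaultdict(set)

    def add(self, user, url, tags):
        self._tags[(user, url)].update(tags)

    def bookmarks_of(self, user):
        """View one user's bookmarks."""
        return sorted(url for (u, url) in self._tags if u == user)

    def tags_of(self, user):
        """View every tag a user has applied."""
        result = set()
        for (u, _), tags in self._tags.items():
            if u == user:
                result |= tags
        return sorted(result)

    def urls_with_tags(self, *tags):
        """View all urls that some user tagged with every one of the given tags."""
        wanted = set(tags)
        return sorted({url for (_, url), t in self._tags.items() if wanted <= t})


store = BookmarkStore()
store.add("alice", "http://example.org/primes", ["math", "reference"])
store.add("bob", "http://example.org/primes", ["math"])
store.add("bob", "http://example.org/knots", ["math", "topology"])

math_urls = store.urls_with_tags("math")
topology_urls = store.urls_with_tags("math", "topology")
```

Because a URL is keyed by its tags rather than by a folder path, the same bookmark shows up under every tag it carries.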

+ Joshua Schachter’s del.icio.us

+ Joshua’s ‘math’ Tag

+ The ‘math’ Tag

+ The del.icio.us Interface

+ My del.icio.us

+ A del.icio.us Network

+ The ‘for:’ Tag

+ Social Bookmarking Projects
Del.icio.us
Furl.net
Flickr.com
Simpy.com
Gmail.com
Clusty.com
Stumbleupon.com
IBM’s dogear

+ What is Collaborative Filtering?
“Collaborative filtering (CF) is the method of making automatic predictions (filtering) about the interests of a user by collecting taste information from many users (collaborating).”
-Wikipedia (http://en.wikipedia.org/wiki/Collaborative_filtering)
Take advantage of users’ input and behavior to make recommendations.
“System for helping people find relevant content”
-Rashmi Sinha (http://www.rashmisinha.com)

+ Traditional Collaborative Filtering
Each user is represented by an N-dimensional vector, where N is the number of items
Elements of the vector can be ratings, indicators of purchase, etc.
  Typically multiplied by the inverse frequency (the inverse of the number of users who purchased or rated the item), so that less popular items carry more weight
Use an algorithm to measure the similarity of vectors, e.g. cosine similarity
$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
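A minimal sketch of the cosine measure with inverse-frequency weighting, assuming binary purchase vectors. The function names and the three example users are made up for illustration.

```python
import math


def inverse_frequency_weights(user_vectors):
    """One weight per item: 1 / (number of users with a nonzero entry)."""
    n_items = len(user_vectors[0])
    weights = []
    for j in range(n_items):
        count = sum(1 for v in user_vectors if v[j] != 0)
        weights.append(1.0 / count if count else 0.0)
    return weights


def cosine(a, b):
    """cos(A, B) = (A . B) / (|A| |B|); defined as 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Rows are users, columns are items (1 = purchased). Hypothetical data.
users = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
]
weights = inverse_frequency_weights(users)
weighted = [[x * w for x, w in zip(u, weights)] for u in users]
similarity = cosine(weighted[0], weighted[2])
```

Item 0 was bought by all three users, so it gets the smallest weight and contributes least to the similarity between any two of them.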

+ Problems
M customers, N items
O(MN) in the worst case
Typically O(M+N)
  Still problematic when M, N ~ 10^6

+ Cluster Models
Treat finding similar customers as a classification problem
  Create clusters of customers
  Assign the user to the “nearest” cluster
  Base recommendations on the user’s cluster
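The assign-and-recommend steps might look like the sketch below. It assumes the clusters have already been built offline and are summarized by centroid vectors, and it uses cosine similarity as the notion of “nearest”; both are assumptions, as are the function names and the data.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_cluster(user, centroids):
    """Index of the centroid most similar to the user's vector."""
    return max(range(len(centroids)), key=lambda k: cosine(user, centroids[k]))


def recommend_from_cluster(user, centroid, top_n=2):
    """Items popular in the cluster that the user does not already have."""
    candidates = [
        (weight, item)
        for item, (owned, weight) in enumerate(zip(user, centroid))
        if not owned and weight > 0
    ]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in candidates[:top_n]]


# Hypothetical centroids: fraction of each cluster's customers owning each item.
centroids = [
    [0.9, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.7],
]
user = [1, 0, 0, 0]

cluster = nearest_cluster(user, centroids)
recommended = recommend_from_cluster(user, centroids[cluster])
```

The user is compared against a handful of centroids rather than against every other customer, which is where the speed comes from; the cost is that recommendations are only as specific as the cluster.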

+ Search Based Methods
Construct searches based on keywords from user’s existing items
Not practical if user has many items
Recommendations tend to be poor

+ Types of Collaborative Filtering
Active
  Sending pointers to a resource
  User ratings
Passive
  Observing user behavior
Item Based
  Items become the focus, not users

+ Active Collaborative Filtering
Uses a peer-to-peer approach
Users want to actively share information, recommendations, evaluations, ratings, etc.
  Usually, the information is from a user who has direct experience with the product
Biased opinions
Less data available

+ Netflix Queue

+ Netflix Ratings

+ Netflix Recommendations

+ Netflix Prize
October 2, 2006 - October 2, 2011
Improve their recommendation system by at least 10% over the current method
$1M Grand Prize
$50k Yearly Prizes

+ Passive Collaborative Filtering
Monitor the user’s activity
  Purchasing an item
  Repeated use of an item
  Number of times queried
Makes use of implicit filters
  Requires nothing additional from users
  Doesn’t capture the user’s evaluation
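One way to turn monitored activity into implicit preference scores is to sum a weight per observed event. The event names and the weights in `EVENT_WEIGHTS` are arbitrary assumptions chosen for illustration, not values from any real system.

```python
from collections import defaultdict

# Assumed weights: a purchase counts for more than a use, a use for more than a query.
EVENT_WEIGHTS = {"purchase": 3.0, "use": 1.0, "query": 0.5}


def implicit_scores(events):
    """events: iterable of (user, item, kind) tuples taken from an activity log."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, kind in events:
        scores[user][item] += EVENT_WEIGHTS[kind]
    return {user: dict(items) for user, items in scores.items()}


# Hypothetical activity log.
log = [
    ("ann", "book", "purchase"),
    ("ann", "book", "use"),
    ("ann", "book", "use"),
    ("ann", "lamp", "query"),
    ("raj", "lamp", "query"),
    ("raj", "lamp", "query"),
]
scores = implicit_scores(log)
```

The resulting scores can fill the user vectors in place of explicit ratings. They reflect how much attention an item received, not whether the user liked it.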

+ Google’s Sponsored Links
www.AreYouASlackerMom.com
www.royalsaharajasper.com
Related to Pi Mu Epsilon
  “Will pay stipend to Grad”
  “Cheap Faculty Flights”
  “Greek Ringtone”

+ Google’s Personalized Search

+ Item-to-Item Collaborative Filtering
•Focus is on finding similar items, not similar customers
•Originally proposed by Vucetic and Obradovic in 2000
•Matches user’s items to similar items to create recommendations
•Association Rule Mining

+ Amazon Slide
Similar to impulse items in a checkout line
Tailored to each user

+ Amazon’s Recommendations

+ Amazon’s Similar Items

+ Amazon’s Algorithm
For each item in product catalog, I1
  For each customer C who purchased I1
    For each item I2 purchased by customer C
      Record that a customer purchased I1 and I2
  For each item I2
    Compute the similarity between I1 and I2
•Only items purchased by a common customer are compared, not all pairs of items
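The loop above could be written in Python roughly as follows. The slide does not say which similarity measure is used; this sketch assumes cosine similarity over binary purchase vectors, which reduces to the co-purchase count divided by the square root of the product of the two items' buyer counts. The function name and data are hypothetical.

```python
import math
from collections import defaultdict


def build_similar_items(purchases):
    """purchases: dict mapping customer -> set of purchased items.

    Returns dict item -> {other item: similarity}, covering only pairs
    that share at least one customer.
    """
    customers_by_item = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            customers_by_item[item].add(customer)

    similar = {}
    for i1, buyers in customers_by_item.items():        # for each item I1
        co_counts = defaultdict(int)
        for customer in buyers:                         # each customer C who bought I1
            for i2 in purchases[customer]:              # each item I2 bought by C
                if i2 != i1:
                    co_counts[i2] += 1                  # record the co-purchase
        similar[i1] = {                                 # similarity for each recorded I2
            i2: count / math.sqrt(len(buyers) * len(customers_by_item[i2]))
            for i2, count in co_counts.items()
        }
    return similar


# Hypothetical purchase history.
purchases = {
    "c1": {"A", "B"},
    "c2": {"A", "B", "C"},
    "c3": {"C", "D"},
}
similar = build_similar_items(purchases)
```

Items A and D never share a customer, so that pair is never touched; this is what keeps the work well below comparing all pairs of items.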

+ Run Time of Algorithm
Worst case O(N^2 M)
In practice, more like O(NM)
Runs offline, so it does not affect the customer
For a customer, you only have to aggregate the items similar to their purchases and make recommendations, which is fast
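The online step can be a simple lookup and sum over a precomputed similar-items table. The sketch below assumes scores are aggregated by adding similarities, which is one plausible choice rather than a documented one; the table contents and the function name are made up.

```python
from collections import defaultdict


def recommend(owned, similar_items, top_n=3):
    """Rank items the customer lacks by summed similarity to items they own."""
    scores = defaultdict(float)
    for item in owned:
        for other, sim in similar_items.get(item, {}).items():
            if other not in owned:
                scores[other] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical table produced by the offline computation.
similar_items = {
    "A": {"B": 1.0, "C": 0.5},
    "B": {"A": 1.0, "C": 0.5},
    "C": {"A": 0.5, "B": 0.5, "D": 0.75},
    "D": {"C": 0.75},
}

for_a_buyer = recommend({"A"}, similar_items)
for_c_buyer = recommend({"C"}, similar_items)
```

The cost depends on how many items the customer owns and how many neighbors each has, not on the total number of customers or catalog items.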

+ Collaborative Filtering With Tags
User input is usually a barrier, but not so with tags
A user’s bookmarks reveal information about their interests, which is useful for finding others with similar interests
Applications to corporate repositories of information (IBM’s dogear)
Both active (tags) and passive (logs) filtering
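As a rough illustration of finding users with similar interests from their tags, the sketch below compares tag sets with the Jaccard index. The slides do not name a measure, so Jaccard is an assumption, and the user names and tags are invented.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_users(target, tags_by_user):
    """Other users ranked by tag overlap with the target user."""
    scored = [
        (jaccard(tags_by_user[target], tags), user)
        for user, tags in tags_by_user.items()
        if user != target
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(user, score) for score, user in scored]


# Hypothetical tag vocabularies.
tags_by_user = {
    "alice": {"math", "python", "teaching"},
    "bob": {"math", "python"},
    "carol": {"cooking"},
}
neighbors = similar_users("alice", tags_by_user)
```

Bookmarks saved by the top-ranked neighbors that the target user has not yet saved are natural candidates for recommendation.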

+ References
G. Linden, B. Smith, and J. York, “Amazon.com Recommendations: Item-to-Item Collaborative Filtering,” IEEE Internet Computing, 2003, pp. 76-80.
R. Sinha, “Collaborative Filtering strikes back (this time with tags)”, http://www.rashmisinha.com/archives/05_10/tags-collaborative-filtering.html.
S. Vucetic and Z. Obradovic, “A Regression-Based Approach for Scaling-Up Personalized Recommender Systems in E-Commerce,” Workshop on Web Mining for E-Commerce, at the 6th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD), Boston, MA, 2000.
R. Wash and E. Rader, “Collaborative Filtering with del.icio.us”, working paper.
R. Wash and E. Rader, “Incentives for Contribution in del.icio.us: The Role of Tagging in Information Discovery”, working paper.
Wikipedia, “Collaborative Filtering”, http://en.wikipedia.org/wiki/Collaborative_filtering.