Collective Intelligence, part II

Data Mining and Machine Learning- in a nutshell Arizona State University Data Mining and Machine Learning Lab Collective Intelligence 1 DATA MINING AND MACHINE LEARNING IN A NUTSHELL COLLECTIVE INTELLIGENCE Mohammad-Ali Abbasi http://www.public.asu.edu/~mabbasi2/ SCHOOL OF COMPUTING, INFORMATICS, AND DECISION SYSTEMS ENGINEERING ARIZONA STATE UNIVERSITY http://dmml.asu.edu/

Uploaded by: mohammad-ali-abbasi

Posted on 27-Jan-2015

DESCRIPTION

Collective Intelligence, part ii, Huan Liu, Mohammad Ali Abbasi

TRANSCRIPT

Page 1: Collective Intelligence, part II

DATA MINING AND MACHINE LEARNING IN A NUTSHELL

COLLECTIVE INTELLIGENCE

Mohammad-Ali Abbasi
http://www.public.asu.edu/~mabbasi2/

SCHOOL OF COMPUTING, INFORMATICS, AND DECISION SYSTEMS ENGINEERING, ARIZONA STATE UNIVERSITY

http://dmml.asu.edu/

Page 2: Collective Intelligence, part II

Filtering & Making Recommendations

• Recommendation Systems

• Collaborative Filtering

• Content-Based Filtering

Page 3: Collective Intelligence, part II

Page 4: Collective Intelligence, part II

Page 5: Collective Intelligence, part II

Collaborative Filtering: Example

Page 6: Collective Intelligence, part II

Page 7: Collective Intelligence, part II

Page 8: Collective Intelligence, part II

Collaborative Filtering: Example

Page 9: Collective Intelligence, part II

Page 10: Collective Intelligence, part II

Collaborative Filtering

• Collaborative filtering is a method of making personalized suggestions for other products based on your previous shopping habits and those of other shoppers.

• It makes automatic predictions (filtering) about the interests of a user by collecting taste information from many users.

• In most cases the goal is to predict user preferences for items by learning aggregate user-item relationships from historical records.

Page 11: Collective Intelligence, part II

What they are and why we need them

• Recommender systems (RS) as an information-filtering problem

• RS as a machine-learning problem

• Enhance user experience
  – Assist users in finding information
  – Reduce search and navigation time

• Increase productivity

• Increase credibility

• Mutually beneficial proposition

Page 12: Collective Intelligence, part II

Personalization

• Recommenders are instances of personalization software.

• Personalization concerns adapting to the individual needs, interests, and preferences of each user.

• Includes:
  – Recommending
  – Filtering
  – Predicting (e.g. form or calendar appointment completion)

• From a business perspective, it is viewed as part of Customer Relationship Management (CRM).

Page 13: Collective Intelligence, part II

Netflix Prize

• The Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences.

On September 21, 2009, the “BellKor’s Pragmatic Chaos” team won the $1M Grand Prize.

Page 14: Collective Intelligence, part II

Types of Collaborative Filtering

• Memory-Based
  – Uses user rating data to compute the similarity between users or items, then uses this similarity to make a recommendation.
  – Similarity methods: Pearson correlation, vector cosine

• Model-Based
  – Models are developed with data mining and machine learning algorithms that find patterns in training data and use them to make predictions on real data.
  – Model-based algorithms: Bayesian networks, clustering models, latent semantic models (SVD), probabilistic latent semantic analysis, latent Dirichlet allocation, Markov decision process (MDP) based models
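The list above names latent semantic models (SVD) among the model-based approaches. The sketch below only illustrates that idea and is not the deck's method: it swaps in a plain stochastic-gradient matrix factorization (a common SVD-style variant that fits only the observed ratings), and the toy ratings, the factor count `k`, and the hyperparameters `lr`, `reg`, and `epochs` are all illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # (user index, item index) -> observed rating
Factors = List[List[float]]


def predict(mu: float, P: Factors, Q: Factors, u: int, i: int) -> float:
    """Global mean plus the dot product of the user and item latent factors."""
    return mu + sum(a * b for a, b in zip(P[u], Q[i]))


def train_mf(ratings: Ratings, n_users: int, n_items: int, k: int = 2,
             lr: float = 0.01, reg: float = 0.02, epochs: int = 300,
             seed: int = 0) -> Tuple[float, Factors, Factors]:
    """Fit k latent factors per user and per item by SGD on observed ratings only."""
    rng = random.Random(seed)
    mu = sum(ratings.values()) / len(ratings)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    entries = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (u, i), r in entries:
            err = r - predict(mu, P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


def rmse(ratings: Ratings, mu: float, P: Factors, Q: Factors) -> float:
    se = sum((r - predict(mu, P, Q, u, i)) ** 2 for (u, i), r in ratings.items())
    return math.sqrt(se / len(ratings))


# Toy data (hypothetical): 4 users x 4 items, 10 observed ratings on a 1-5 scale.
ratings: Ratings = {(0, 0): 5, (0, 1): 4, (0, 3): 1, (1, 0): 4, (1, 3): 1,
                    (2, 1): 1, (2, 2): 5, (2, 3): 4, (3, 2): 4, (3, 3): 5}
before = rmse(ratings, *train_mf(ratings, 4, 4, epochs=0))  # untrained factors
model = train_mf(ratings, 4, 4)
after = rmse(ratings, *model)
print(f"training RMSE: {before:.3f} -> {after:.3f}")
print(f"predicted rating of user 1 for unrated item 1: {predict(*model, 1, 1):.2f}")
```

The learned model is just the mean and the two factor tables, so a prediction for any user-item pair, rated or not, is a single dot product; this is what makes model-based methods cheap at query time compared with scanning all users.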

Page 15: Collective Intelligence, part II

Memory-Based Algorithms

• Customer-Based Algorithms

• Item-Based Algorithms

• Cluster Models

Page 16: Collective Intelligence, part II

Recommendation Algorithms: Customer-Based Algorithms

• Most algorithms start by finding a set of customers whose purchased and rated items overlap the user’s purchased and rated items.

• The algorithm aggregates items from these similar customers, eliminates items the user has already purchased or rated, and recommends the remaining items to the user.

• Two popular versions of these algorithms:
  – collaborative filtering
  – cluster models

Page 17: Collective Intelligence, part II

Collaborative Filtering

• A traditional collaborative filtering algorithm represents a customer as an N-dimensional vector of items, where N is the number of distinct catalog items. The components of the vector are positive for purchased or positively rated items and negative for negatively rated items.
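A minimal sketch of this representation, assuming a +1 / -1 / 0 encoding: the slide says only that components are positive or negative, so the unit magnitudes, the toy catalog, and the helper names `customer_vector` and `cosine` are illustrative. Vector cosine, one of the similarity methods listed earlier, then compares two customers.

```python
import math
from typing import Dict, List


def customer_vector(history: Dict[str, int], catalog: List[str]) -> List[int]:
    """One component per catalog item: +1 purchased or positively rated,
    -1 negatively rated, 0 if the customer has no interaction with the item."""
    return [history.get(item, 0) for item in catalog]


def cosine(a: List[int], b: List[int]) -> float:
    """Cosine of the angle between two customer vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


catalog = ["book", "dvd", "game", "headphones"]  # N = 4 distinct catalog items
alice = customer_vector({"book": 1, "dvd": 1, "game": -1}, catalog)
bob = customer_vector({"book": 1, "dvd": 1}, catalog)
carol = customer_vector({"book": -1, "game": 1}, catalog)
print(alice)                 # [1, 1, -1, 0]
print(cosine(alice, bob))    # positive: both liked the same items
print(cosine(alice, carol))  # negative: opposite opinions on book and game
```

In a real catalog N is in the millions and almost every component is zero, so production systems store these vectors sparsely rather than as dense lists.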

Page 18: Collective Intelligence, part II

The user-oriented neighborhood method

• Joe likes the three movies on the left.

• To make a prediction for him, the system finds similar users who also liked those movies, and then determines which other movies they liked.

• In this case, all three liked Saving Private Ryan, so that is the first recommendation.

• Two of them liked Dune, so that is next, and so on.
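The walk-through above can be sketched as follows. The figure's movies are not named in the text, so `Movie A` to `Movie E`, the other users' histories, and the `min_overlap` threshold are hypothetical; only Saving Private Ryan and Dune come from the slide. Neighbors are the users who share liked movies with Joe, and candidates are ranked by how many neighbors liked them.

```python
from collections import Counter
from typing import Dict, List, Set


def neighborhood_recommend(target: Set[str], others: Dict[str, Set[str]],
                           min_overlap: int = 1) -> List[str]:
    """Rank movies liked by similar users that the target user has not liked yet."""
    # Similar users: those who liked at least `min_overlap` of the target's movies.
    neighbors = [liked for liked in others.values() if len(liked & target) >= min_overlap]
    # Count how many neighbors liked each movie the target has not liked.
    votes = Counter(movie for liked in neighbors for movie in liked - target)
    # Most-liked first; ties broken alphabetically so the order is stable.
    return sorted(votes, key=lambda movie: (-votes[movie], movie))


joe = {"Movie A", "Movie B", "Movie C"}  # placeholders for the three movies in the figure
others = {
    "user1": {"Movie A", "Movie B", "Saving Private Ryan", "Dune"},
    "user2": {"Movie B", "Movie C", "Saving Private Ryan", "Dune"},
    "user3": {"Movie A", "Movie C", "Saving Private Ryan", "Movie D"},
    "user4": {"Movie E"},  # no overlap with Joe, so not a neighbor
}
print(neighborhood_recommend(joe, others))
# ['Saving Private Ryan', 'Dune', 'Movie D']
```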

Page 19: Collective Intelligence, part II

Recommendation Algorithms: Item-Based Algorithms

• These algorithms focus on finding similar items, not similar customers.

• For each of the user’s purchased and rated items, the algorithm attempts to find similar items. It then aggregates the similar items and recommends them. Examples:
  – search-based methods
  – Amazon’s item-to-item collaborative filtering
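A minimal item-based sketch, assuming cosine similarity between item rating vectors and a plain sum of similarities as the aggregation step (weighting each similarity by the user's own rating is an equally plausible choice). The ratings and the names `item_vectors`, `item_cosine`, and `item_based_recommend` are hypothetical, and this is not Amazon's production algorithm.

```python
import math
from collections import defaultdict
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def item_vectors(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    """Transpose user -> {item: rating} into item -> {user: rating} vectors."""
    vectors: Dict[str, Dict[str, float]] = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors[item][user] = rating
    return vectors


def item_cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse item vectors indexed by user."""
    dot = sum(r * b[u] for u, r in a.items() if u in b)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def item_based_recommend(user: str, ratings: Ratings, top_n: int = 3) -> List[str]:
    vectors = item_vectors(ratings)
    owned = ratings[user]
    scores: Dict[str, float] = defaultdict(float)
    for item in owned:                        # each purchased/rated item...
        for other, vec in vectors.items():    # ...is matched to similar items
            if other not in owned:
                scores[other] += item_cosine(vectors[item], vec)
    # Aggregate: rank candidates by summed similarity, dropping unrelated items.
    ranked = sorted((i for i in scores if scores[i] > 0), key=lambda i: (-scores[i], i))
    return ranked[:top_n]


ratings: Ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5},
    "u3": {"B": 4, "D": 5},
    "u4": {"C": 5, "E": 4},
    "me": {"A": 5},
}
print(item_based_recommend("me", ratings))  # ['B', 'C']
```

Because item-item similarities change slowly, they can be computed offline; the per-request work is then only the lookup and aggregation over the user's own items.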

Page 20: Collective Intelligence, part II

Cluster Models

• To find customers who are similar to the user, cluster models divide the customer base into many segments and treat the task as a classification problem.

• The algorithm’s goal is to assign the user to the segment containing the most similar customers.

• It then uses the purchases and ratings of the customers in the segment to generate recommendations.
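A minimal sketch of these two steps, assuming the segments were already computed offline and that a user is assigned to the segment whose average rating vector (centroid) has the highest cosine similarity to the user's ratings. The segment names, members, and helper functions are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List

Profile = Dict[str, float]  # item -> rating


def cosine(a: Profile, b: Profile) -> float:
    dot = sum(r * b[i] for i, r in a.items() if i in b)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(members: List[Profile]) -> Profile:
    """Average rating per item over a segment (missing ratings count as 0)."""
    total: Profile = defaultdict(float)
    for member in members:
        for item, rating in member.items():
            total[item] += rating
    return {item: value / len(members) for item, value in total.items()}


def cluster_recommend(user: Profile, segments: Dict[str, List[Profile]],
                      top_n: int = 2) -> List[str]:
    # Classification step: assign the user to the segment with the most similar centroid.
    best = max(segments, key=lambda name: cosine(user, centroid(segments[name])))
    # Recommend the segment's most highly rated items that the user has not rated.
    totals: Profile = defaultdict(float)
    for member in segments[best]:
        for item, rating in member.items():
            if item not in user:
                totals[item] += rating
    return sorted(totals, key=lambda item: (-totals[item], item))[:top_n]


segments = {  # hypothetical pre-computed segments, e.g. from an offline clustering run
    "sci-fi fans": [{"Dune": 5, "Alien": 4}, {"Dune": 4, "Blade Runner": 5}],
    "drama fans": [{"Casablanca": 5, "Titanic": 4}, {"Titanic": 5, "Casablanca": 3}],
}
print(cluster_recommend({"Dune": 5}, segments))  # ['Blade Runner', 'Alien']
```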

Page 21: Collective Intelligence, part II

Clustering Example

• Clustering based on Gender and Genre

Page 22: Collective Intelligence, part II

Amazon Item-Based Collaborative Filtering

• Rather than matching the user to similar customers, the item-to-item method matches each of the user’s purchased and rated items to similar items, then combines those similar items into a recommendation list.

Page 23: Collective Intelligence, part II

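Inside the Algorithms: Customer-Based Algorithms

• $v_{i,j}$ = vote of user $i$ on item $j$

• $I_i$ = items for which user $i$ has voted

• Mean vote for user $i$:

  $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$

• Predicted vote of the “active user” $a$ on item $j$ is a weighted sum over the $n$ similar users:

  $p_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$

  where $w(a,i)$ are the weights of the $n$ similar users and $\kappa$ is a normalizer.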
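A minimal sketch of the mean vote and the weighted-sum prediction. The weights $w(a,i)$ are taken as given, and the normalizer is assumed to be $\kappa = 1 / \sum_i |w(a,i)|$ over the neighbors who voted on the item (a common choice that makes the absolute weights sum to one); the users, items, and weights are hypothetical.

```python
from typing import Dict

Votes = Dict[str, Dict[str, float]]  # user -> {item: vote}


def mean_vote(votes: Votes, user: str) -> float:
    """Mean vote of a user: average of the votes over the items in I_user."""
    return sum(votes[user].values()) / len(votes[user])


def predict_vote(votes: Votes, weights: Dict[str, float], active: str, item: str) -> float:
    """p_{a,j} = mean_a + kappa * sum_i w(a,i) * (v_{i,j} - mean_i)."""
    # Only neighbors that actually voted on the item contribute a v_{i,j} term.
    raters = [i for i in weights if item in votes[i]]
    norm = sum(abs(weights[i]) for i in raters)
    base = mean_vote(votes, active)
    if norm == 0:
        return base  # no usable neighbors: fall back to the active user's mean vote
    kappa = 1.0 / norm  # assumption: normalize so the absolute weights sum to 1
    return base + kappa * sum(weights[i] * (votes[i][item] - mean_vote(votes, i))
                              for i in raters)


votes: Votes = {
    "a": {"x": 4, "y": 2},           # active user, mean 3.0
    "u1": {"x": 5, "y": 3, "z": 4},  # mean 4.0
    "u2": {"x": 2, "z": 1},          # mean 1.5
}
weights = {"u1": 0.8, "u2": 0.2}     # hypothetical w(a, i) values
print(predict_vote(votes, weights, "a", "z"))  # about 2.9
```

Subtracting each neighbor's mean vote before weighting is what lets users who rate on different scales be combined, which addresses the "different scales" limitation discussed later.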

Page 24: Collective Intelligence, part II

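Customer-Based Algorithms: Computing the Weights

• K-nearest neighbor:

  $w(a,i) = 1$ if $i \in \text{neighbors}(a)$, and $0$ otherwise

• Pearson correlation coefficient (Resnick ’94, GroupLens):

  $w(a,i) = \frac{\sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)}{\sqrt{\sum_j (v_{a,j} - \bar{v}_a)^2 \sum_j (v_{i,j} - \bar{v}_i)^2}}$

  where the sums run over the items $j$ voted on by both users

• Cosine similarity (from IR):

  $w(a,i) = \sum_j \frac{v_{a,j}}{\sqrt{\sum_{k \in I_a} v_{a,k}^2}} \frac{v_{i,j}}{\sqrt{\sum_{k \in I_i} v_{i,k}^2}}$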
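The three weighting schemes can be sketched as follows, assuming each user's votes are stored as an item-to-vote dictionary and that the neighbor set for k-nearest neighbor has already been chosen; the function names and toy votes are hypothetical.

```python
import math
from typing import Dict, Set

Votes = Dict[str, float]  # item -> vote, for one user


def mean(v: Votes) -> float:
    return sum(v.values()) / len(v)


def knn_weight(i: str, neighbors_of_a: Set[str]) -> float:
    """w(a,i) = 1 if i is one of a's nearest neighbors, else 0."""
    return 1.0 if i in neighbors_of_a else 0.0


def pearson_weight(va: Votes, vi: Votes) -> float:
    """Pearson correlation over co-voted items, centered on each user's mean vote."""
    common = va.keys() & vi.keys()
    ma, mi = mean(va), mean(vi)
    num = sum((va[j] - ma) * (vi[j] - mi) for j in common)
    den = math.sqrt(sum((va[j] - ma) ** 2 for j in common) *
                    sum((vi[j] - mi) ** 2 for j in common))
    return num / den if den else 0.0


def cosine_weight(va: Votes, vi: Votes) -> float:
    """Cosine over co-voted items, with norms taken over all of each user's votes."""
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_i = math.sqrt(sum(v * v for v in vi.values()))
    dot = sum(va[j] * vi[j] for j in va.keys() & vi.keys())
    return dot / (norm_a * norm_i) if norm_a and norm_i else 0.0


a = {"x": 1, "y": 2, "z": 3}
b = {"x": 2, "y": 4, "z": 6}  # same pattern as a, doubled
c = {"x": 3, "y": 2, "z": 1}  # reversed pattern
print(pearson_weight(a, b), pearson_weight(a, c))  # 1.0 -1.0
print(cosine_weight(a, b))                         # about 1.0
print(knn_weight("b", {"b"}))                      # 1.0
```

Note the difference the example exposes: Pearson correlation is unaffected by one user voting on a doubled scale, while plain cosine ignores the users' mean votes entirely.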

Page 25: Collective Intelligence, part II

Customer-Based Algorithms: Computing the Weights

• Cosine with “inverse user frequency” $f_j = \log(n/n_j)$, where $n$ is the number of users and $n_j$ is the number of users voting for item $j$
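A minimal sketch, assuming the inverse user frequency is applied by multiplying each vote $v_{i,j}$ by $f_j$ before the cosine is computed (the slide gives only the definition of $f_j$, so this transformation step is an assumption); the users and items are hypothetical. An item that every user votes for gets $f_j = 0$ and stops contributing to similarity.

```python
import math
from typing import Dict

Votes = Dict[str, Dict[str, float]]  # user -> {item: vote}


def inverse_user_frequency(votes: Votes) -> Dict[str, float]:
    """f_j = log(n / n_j): n users in total, n_j of them voted on item j."""
    n = len(votes)
    n_j: Dict[str, int] = {}
    for user_votes in votes.values():
        for item in user_votes:
            n_j[item] = n_j.get(item, 0) + 1
    return {item: math.log(n / count) for item, count in n_j.items()}


def iuf_cosine(va: Dict[str, float], vi: Dict[str, float], f: Dict[str, float]) -> float:
    """Cosine similarity after scaling every vote v_{i,j} by f_j."""
    ta = {j: v * f[j] for j, v in va.items()}
    ti = {j: v * f[j] for j, v in vi.items()}
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_i = math.sqrt(sum(v * v for v in ti.values()))
    dot = sum(ta[j] * ti[j] for j in ta.keys() & ti.keys())
    return dot / (norm_a * norm_i) if norm_a and norm_i else 0.0


votes: Votes = {
    "u1": {"hit": 1, "niche": 1},
    "u2": {"hit": 1, "niche": 1},
    "u3": {"hit": 1},  # everyone voted for "hit", so it carries no information
}
f = inverse_user_frequency(votes)
print(f["hit"])                                 # 0.0
print(iuf_cosine(votes["u1"], votes["u2"], f))  # shared niche item: about 1.0
print(iuf_cosine(votes["u1"], votes["u3"], f))  # only the universal item shared: 0.0
```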

Page 26: Collective Intelligence, part II

Customer-Based Algorithms: Evaluation

• Split users into train/test sets

• For each user $a$ in the test set:
  – split $a$’s votes into observed ($I$) and to-predict ($P$)
  – measure the average absolute deviation between predicted and actual votes in $P$
  – predict votes in $P$, and form a ranked list
  – assume (a) the utility of the $k$-th item in the list is $\max(v_{a,j} - d, 0)$, where $d$ is a “default vote”, and (b) the probability of reaching rank $k$ drops exponentially in $k$; score the list by its expected utility $R_a$

• Average $R_a$ over all test users
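The two scores can be sketched as below. The slide does not give the exact discount, so the exponential decay is written with an assumed half-life parameter $\alpha$ (the rank at which the chance of reaching an item falls to one half): $R_a = \sum_k \max(v_{a,k} - d, 0) / 2^{(k-1)/(\alpha-1)}$, where $v_{a,k}$ is the actual vote on the $k$-th ranked item. The default vote $d$, $\alpha$, and the toy numbers are illustrative.

```python
from typing import List, Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute deviation between predicted and actual votes in P."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def expected_utility(ranked_votes: Sequence[float], default_vote: float,
                     half_life: float) -> float:
    """R_a for one user: ranked_votes[k - 1] is the actual vote on the k-th ranked item.

    Utility at rank k is max(v - d, 0); the chance of reaching rank k halves every
    (half_life - 1) positions, i.e. it decays exponentially in k. Needs half_life > 1.
    """
    return sum(max(v - default_vote, 0.0) / 2 ** ((k - 1) / (half_life - 1))
               for k, v in enumerate(ranked_votes, start=1))


def average(scores: List[float]) -> float:
    """Average R_a over all test users."""
    return sum(scores) / len(scores)


print(mean_absolute_error([3, 4], [2, 5]))                       # 1.0
print(expected_utility([5, 3, 4], default_vote=3, half_life=2))  # 2.25
print(average([2.25, 1.75]))                                     # 2.0
```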

Page 27: Collective Intelligence, part II

Collaborative Filtering Systems: Review

• User-based:
  – Look for users who share the same rating patterns with the active user (the user for whom the prediction is made).
  – Use the ratings from those like-minded users to calculate a prediction for the active user.

• Item-based:
  – Build an item-item matrix determining relationships between pairs of items.
  – Using the matrix and the data on the current user, infer the user’s taste.

Page 28: Collective Intelligence, part II

Collaborative Filtering highlights

• Use other users’ recommendations (ratings) to judge an item’s utility

• Key is to find users/user groups whose interests match those of the current user

• Vector space model widely used (vector components are user-specified ratings)

• More users, more ratings: better results

• Can recommend items dissimilar to those seen in the past, too

• Example: Movielens.org

Page 29: Collective Intelligence, part II

Collaborative Filtering Limitations

• Different users might use different scales. Possible solution: weighted ratings, i.e. deviations from average rating

• Finding similar users/user groups isn’t very easy

• New user: No preferences available

• New item: No ratings available

• Demographic filtering is required

• Multi-criteria ratings are required

Page 30: Collective Intelligence, part II

Challenges in Recommendation algorithms

• A large retailer might have huge amounts of data: tens of millions of customers and millions of distinct catalog items.

• Many applications require the result set to be returned in real time, in no more than half a second, while still producing high-quality recommendations.

• New customers typically have extremely limited information, based on only a few purchases or product ratings.

• Older customers can have a glut of information, based on thousands of purchases and ratings.

• Customer data is volatile: Each interaction provides valuable customer data, and the algorithm must respond immediately to new information.

Page 31: Collective Intelligence, part II

Model-Based Algorithms

Page 32: Collective Intelligence, part II

Model-Based Methods

• Model-based or content-based methods treat the recommendation problem as a search for related items.

• Given the user’s purchased and rated items, the algorithm constructs a search query to find other popular items by the same author, artist, or director, or with similar keywords or subjects.
  – If a customer buys the Godfather DVD Collection, for example, the system might recommend other crime drama titles, other titles starring Marlon Brando, or other movies directed by Francis Ford Coppola.
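The search-query idea can be sketched as follows, assuming each catalog item is described by a set of (field, value) attributes and that candidates are ranked by how many query attributes they match, with popularity as the tie-breaker. Apart from the Godfather attributes taken from the example above, the titles, attributes, and popularity counts are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Attributes = Set[Tuple[str, str]]  # e.g. {("director", "Francis Ford Coppola")}


def search_based_recommend(owned: List[str], catalog: Dict[str, Attributes],
                           popularity: Dict[str, int], top_n: int = 3) -> List[str]:
    # The "search query" is the union of the attributes of everything the user owns.
    query: Attributes = set().union(*(catalog[title] for title in owned))
    scores = {t: len(attrs & query) for t, attrs in catalog.items() if t not in owned}
    hits = [t for t, s in scores.items() if s > 0]
    # More matching attributes first, then the more popular item.
    return sorted(hits, key=lambda t: (-scores[t], -popularity[t]))[:top_n]


catalog = {
    "The Godfather DVD Collection": {("genre", "crime drama"), ("star", "Marlon Brando"),
                                     ("director", "Francis Ford Coppola")},
    "title_1": {("genre", "crime drama"), ("star", "actor_x"), ("director", "director_y")},
    "title_2": {("genre", "war"), ("star", "Marlon Brando"),
                ("director", "Francis Ford Coppola")},
    "title_3": {("genre", "comedy"), ("star", "actor_z"), ("director", "director_w")},
    "title_4": {("genre", "crime drama"), ("star", "Marlon Brando"),
                ("director", "director_y")},
}
popularity = {"The Godfather DVD Collection": 90, "title_1": 70, "title_2": 60,
              "title_3": 95, "title_4": 80}
print(search_based_recommend(["The Godfather DVD Collection"], catalog, popularity))
# ['title_4', 'title_2', 'title_1']
```

The very popular `title_3` is never suggested because it shares no attributes with the query, which also hints at the overspecialization limitation listed below.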

Page 33: Collective Intelligence, part II

Content-based RS highlights

• Recommend items similar to those the user preferred in the past

• User profiling is the key

• Items/content usually denoted by keywords

• Matching “user preferences” with “item characteristics” … works for textual information

• Vector Space Model widely used
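A minimal vector-space sketch, assuming raw keyword counts (TF-IDF weighting is a common refinement, omitted here for brevity) and a user profile built by summing the keyword vectors of the items the user liked; candidates are ranked by cosine similarity to that profile. The articles and keywords are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_recommend(liked: List[str], items: Dict[str, List[str]],
                      top_n: int = 2) -> List[str]:
    # User profile: keyword counts summed over every item the user liked.
    profile: Counter = Counter()
    for title in liked:
        profile.update(items[title])
    # Score each unseen item by how closely its keyword vector matches the profile.
    scores = {t: cosine(profile, Counter(kw)) for t, kw in items.items() if t not in liked}
    ranked = sorted((t for t in scores if scores[t] > 0), key=lambda t: (-scores[t], t))
    return ranked[:top_n]


items = {
    "article_1": ["python", "machine", "learning"],
    "article_2": ["machine", "learning", "recommender"],
    "article_3": ["cooking", "pasta"],
    "article_4": ["python", "web"],
}
print(content_recommend(["article_1"], items))  # ['article_2', 'article_4']
```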

Page 34: Collective Intelligence, part II

Content-based RS: Limitations

• Not all content is well represented by keywords, e.g. images

• Items represented by same set of features are indistinguishable

• Overspecialization: items unlike those already rated are never recommended

• Users with thousands of purchases are a problem

• New user: No history available

• Shouldn’t show items that are too different, or too similar

Page 35: Collective Intelligence, part II

Other issues, not addressed much

• Combining and weighting different types of information sources
  – How much is a web page link worth vs. a link in a newsgroup?

• Spamming: how to prevent vendors from biasing results?

• Efficiency issues: how to handle a large community?

• What do we measure when we evaluate CF?
  – Predicting the actual rating may be useless!
  – Example: music recommendations:
    • Beatles, Eric Clapton, Stones, Elton John, Led Zep, the Who, ...
  – What’s useful and new? For this we need a model of the user’s prior knowledge, not just their tastes.
    • Subjectively better recs result from “poor” distance metrics

Page 36: Collective Intelligence, part II

Page 37: Collective Intelligence, part II

Mohammad-Ali Abbasi (Ali) is a Ph.D. student at the Data Mining and Machine Learning Lab, Arizona State University. His research interests include Data Mining, Machine Learning, Social Computing, and Social Media Behavior Analysis.

http://www.public.asu.edu/~mabbasi2/