content-based book recommending using learning for text categorization

CONTENT-BASED BOOK RECOMMENDING USING LEARNING FOR TEXT CATEGORIZATION TRIVIKRAM BHAT UNIVERSITY OF TEXAS AT ARLINGTON DATA MINING CSE6362 BASED ON PAPER BY RAYMOND J. MOONEY AND LORIENE ROY UNIVERSITY OF TEXAS, AUSTIN

Upload: steven-goff

Post on 31-Dec-2015


Page 1: CONTENT-BASED BOOK RECOMMENDING USING LEARNING FOR TEXT CATEGORIZATION

CONTENT-BASED BOOK RECOMMENDING USING LEARNING FOR TEXT CATEGORIZATION

TRIVIKRAM BHAT

UNIVERSITY OF TEXAS AT ARLINGTON

DATA MINING, CSE6362

BASED ON THE PAPER BY RAYMOND J. MOONEY AND LORIENE ROY

UNIVERSITY OF TEXAS, AUSTIN

Page 2

OVERVIEW

• Introduction

• Techniques

• Drawbacks of Existing Systems

• Advantages of Content Based Systems

• LIBRA

• System Description

• Experimental Results

• Future Work

• Conclusions

Page 3

INTRODUCTION

General goal of a Recommender System

• Make personalized suggestions based on previous examples of a user's likes and dislikes

Types

• Existing systems that use Social Filtering methods (base recommendations on other users' preferences)

• Content-Based systems (use information about an item itself to make suggestions)

Page 4

INTRODUCTION

Companies

• Firefly

• Net Perceptions

• LikeMinds

• Amazon (Book Recommending)

• Barnes and Noble (Book Recommending)

Page 5

TECHNIQUES

Social / Collaborative Filtering

• Maintain a Database of user preferences

• Find other users whose known preferences correlate significantly with a given user

Content Based Filtering

• Allows a system to uniquely characterize each user without having to match their interests to someone else’s

• Items are recommended based on information about the item itself
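The collaborative step described above, finding users whose known preferences correlate with a given user, can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say which correlation measure the existing systems use, so Pearson correlation over co-rated items is assumed here, and the user names, book ids, and ratings are hypothetical.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items that both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # not enough overlap to estimate a correlation
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a user who gives identical ratings has no usable signal
    return cov / sqrt(var_x * var_y)


def most_similar_users(target, all_ratings, top_n=1):
    """Return the top_n other users ranked by correlation with `target`."""
    scored = [
        (pearson(all_ratings[target], ratings), user)
        for user, ratings in all_ratings.items()
        if user != target
    ]
    scored.sort(reverse=True)
    return [user for _, user in scored[:top_n]]


# Hypothetical ratings on a 1-10 scale, keyed by user and then by book id.
ratings = {
    "ann": {"b1": 9, "b2": 2, "b3": 7},
    "bob": {"b1": 8, "b2": 1, "b3": 6},
    "cat": {"b1": 2, "b2": 9, "b3": 3},
}
print(most_similar_users("ann", ratings))
```

A content-based system skips this user-to-user matching entirely and works only from the text associated with each item.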

Page 6

DRAWBACKS OF EXISTING SYSTEMS

• Assume that a given user's tastes are generally the same as another user's

• Assume that a sufficient number of ratings is available

• Tend to recommend popular titles

• Need sufficient information about other users, which raises concerns about privacy and access to customer data

Page 7

ADVANTAGES OF CONTENT BASED SYSTEMS

• Items are recommended based on the content of the item rather than on other users' preferences

• Provides a way to list content features that caused the item to be recommended

• Allows users to provide initial subject information to aid the system

Page 8

LIBRA (Learning Intelligent Book Recommending Agent)

• Uses a database of book information extracted from web pages at Amazon.com

• Users select a set of training books and rate them on a scale of 1-10

• System learns a profile of the user using a Bayesian learning algorithm

• Produces a ranked list of the most recommended additional titles from the system catalog

Page 9

SYSTEM DESCRIPTION

Extracting information and building a database

• Perform Amazon subject search

• Download book description URLs

• Information Extraction using slots to get valuable information about each book

• Current slots include title, authors, published reviews, subject terms, related authors and related titles

• A simple extraction system is sufficient as the layout of Amazon's automatically generated pages is regular

• Some preprocessing is done (for example, author names are reduced to unique tokens of the form first-initial_last-name)
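The author-name preprocessing step can be illustrated with a small sketch. The slides only give the token form (first initial, underscore, last name); the function name `author_token`, the lowercasing, and the handling of middle initials and single-word names are assumptions made for this example, not details from LIBRA.

```python
import re


def author_token(full_name):
    """Reduce an author name to a single token: first initial + '_' + last name.

    Assumption: middle names and initials are dropped and the token is lowercased.
    """
    parts = re.findall(r"[A-Za-z]+", full_name.lower())
    if not parts:
        return ""
    if len(parts) == 1:
        return parts[0]  # a single-word name is kept as-is
    return f"{parts[0][0]}_{parts[-1]}"


print(author_token("Raymond J. Mooney"))  # r_mooney
print(author_token("Loriene Roy"))        # l_roy
```

Treating each author as one token keeps a common surname from matching unrelated authors, and keeps a first name from being counted as an ordinary word.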

Page 10

SYSTEM DESCRIPTION

Learning a Profile

• User selects titles (for example, by a particular author)

- Need not perform a random scan of the entire database

• Users rate the selected titles on a scale of 1-10

• A Naïve Bayesian text classifier is used to classify a book as either positive (rating 6-10) or negative (rating 1-5)

• N training books $B_e$ ($1 \le e \le N$)

• Each has 2 real weights

- Positive weight $\alpha_{e1} = (r - 1)/9$

- Negative weight $\alpha_{e0} = 1 - \alpha_{e1}$

- $r$ = user rating ($1 \le r \le 10$)
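As a quick check of the weighting scheme above, the following sketch maps a 1-10 rating to its positive and negative weights. The function name `rating_weights` is hypothetical; the arithmetic is exactly the two formulas on this slide.

```python
def rating_weights(r):
    """Map a rating r in [1, 10] to (positive weight, negative weight)."""
    if not 1 <= r <= 10:
        raise ValueError("rating must be between 1 and 10")
    positive = (r - 1) / 9
    return positive, 1 - positive


for r in (1, 5, 6, 10):
    pos, neg = rating_weights(r)
    print(f"rating {r:2d}: positive = {pos:.3f}, negative = {neg:.3f}")
```

A rating of 10 counts fully as a positive example and a rating of 1 fully as a negative one. Intermediate ratings contribute partially to both categories, so a book rated 6 is only a weak positive example.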

Page 11

SYSTEM DESCRIPTION

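Parameters

• $P(c_j) = \sum_{e=1}^{N} \alpha_{ej} / N$

• $P(w_k \mid c_j, s_m) = \sum_{e=1}^{N} \alpha_{ej}\, n_{kem} / L(c_j, s_m)$

– $\alpha_{ej}$ = weight of training book $B_e$ for category $c_j$ ($j = 1$ positive, $j = 0$ negative)

– $n_{kem}$ = number of times word $w_k$ appears in example $B_e$ in slot $s_m$

– $L(c_j, s_m) = \sum_{e=1}^{N} \alpha_{ej}\, |d_m|$ denotes the total weighted length of the documents in category $c_j$ and slot $s_m$

– $d_m$ = the document (bag of words) in slot $s_m$ of book $B_e$; each book is a vector of such documents, one per slot

• Strength: measures how much more likely a word in a slot is to appear in a positively rated book than in a negatively rated book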
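The parameter estimates above can be sketched as follows. This is a minimal illustration, not LIBRA's code: the data layout (a rating plus a dict of slot name to word list), the function names, and the add-one smoothing constant are assumptions. Strength is computed here as the log of the ratio of the positive to the negative word probability, which is one natural reading of "how much more likely".

```python
from collections import defaultdict
from math import log


def train(books):
    """books: list of (rating, {slot: [word, ...]}) pairs with ratings in 1..10."""
    n = len(books)
    prior = [0.0, 0.0]            # index 0 = negative category, 1 = positive
    counts = defaultdict(float)   # (j, slot, word) -> sum over books of weight * count
    length = defaultdict(float)   # (j, slot) -> total weighted length L(c_j, s_m)
    vocab = defaultdict(set)      # slot -> distinct words seen in that slot
    for rating, slots in books:
        a1 = (rating - 1) / 9
        alpha = (1 - a1, a1)      # (negative weight, positive weight)
        for j in (0, 1):
            prior[j] += alpha[j] / n
            for slot, words in slots.items():
                length[(j, slot)] += alpha[j] * len(words)
                vocab[slot].update(words)
                for w in words:
                    counts[(j, slot, w)] += alpha[j]
    return prior, counts, length, vocab


def word_prob(model, word, j, slot, smooth=1.0):
    """Smoothed estimate of P(word | category j, slot); smoothing is an assumption."""
    _, counts, length, vocab = model
    v = max(len(vocab.get(slot, ())), 1)
    numerator = counts.get((j, slot, word), 0.0) + smooth
    return numerator / (length.get((j, slot), 0.0) + smooth * v)


def strength(model, word, slot):
    """Log-ratio of the word's probability in positive vs. negative books."""
    return log(word_prob(model, word, 1, slot) / word_prob(model, word, 0, slot))


# Hypothetical two-book training set.
books = [
    (10, {"words": ["mars", "manned", "mission"], "author": ["r_zubrin"]}),
    (1, {"words": ["romance", "mission"], "author": ["a_nobody"]}),
]
model = train(books)
print(model[0])
print(strength(model, "mars", "words"), strength(model, "romance", "words"))
```

Without smoothing, a word that never occurs in negatively rated books would give a zero denominator in the strength ratio, which is why the sketch adds a small constant to every count.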

Page 12

Sample Positive Profile Features

Slot             Word          Strength
WORDS            ZUBRIN        9.85
WORDS            SMOLIN        9.39
WORDS            TREFIL        8.77
WORDS            DOT           8.67
SUBJECTS         COMPARATIVE   8.39
AUTHOR           D GOLDSMITH   8.04
WORDS            ALH           7.97
WORDS            MANNED        7.97
RELATED TITLES   SETTLE        7.91

Page 13

SYSTEM DESCRIPTION

Producing, Explaining and Revising Recommendations

• Once a profile is learnt, it is used to predict the preferred ranking of the remaining books

• Recommendations are reviewed by the user, who may assign their own rating to examples they believe to be incorrectly ranked

• The system is then retrained; this cycle can be repeated several times to improve the results
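The ranking step can be sketched as below, assuming a learned profile is available as a table of per-word strengths (log-ratios of positive to negative word probability). Under the naive Bayes independence assumption, summing these strengths over a book's words gives its log-odds of being positive, up to the prior term. The catalog layout, the function names `score` and `rank`, and the strength values are hypothetical.

```python
def score(book, strengths, prior_log_odds=0.0):
    """Sum the strengths of every (slot, word) in the book; unseen words add 0."""
    total = prior_log_odds
    for slot, words in book["slots"].items():
        for word in words:
            total += strengths.get((slot, word), 0.0)
    return total


def rank(catalog, strengths, rated_titles):
    """Rank the books the user has not rated yet, best first."""
    unrated = [b for b in catalog if b["title"] not in rated_titles]
    return sorted(unrated, key=lambda b: score(b, strengths), reverse=True)


# Hypothetical profile and catalog.
strengths = {
    ("words", "mars"): 2.0,
    ("words", "manned"): 1.0,
    ("words", "romance"): -1.5,
}
catalog = [
    {"title": "A", "slots": {"words": ["mars", "manned"]}},
    {"title": "B", "slots": {"words": ["romance"]}},
    {"title": "C", "slots": {"words": ["mars", "romance"]}},
    {"title": "D", "slots": {"words": ["mars"]}},
]
ranked = rank(catalog, strengths, rated_titles={"D"})
print([b["title"] for b in ranked])
```

The same per-word contributions can be shown to the user as an explanation of why a title was recommended. Re-rating a misranked title and retraining simply changes the strengths table before the next call to `rank`.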

Page 14

EXPERIMENTAL RESULTS

Data Collection

• Several data sets were assembled (LIT1, LIT2, MYST, SCI, SF)

• In order to present a quantitative picture of performance on a realistic sample, books were selected at random

• If the user was not familiar with a book, the user was asked to give a rating based on the information provided by the Amazon page describing the book

Page 15

EXPERIMENTAL RESULTS

Performance Evaluation

• Performed 10-fold cross-validation on the examples

• Various metrics were used to measure the performance

– Classification accuracy (Acc): The percentage of examples correctly classified as positive or negative

– Precision (Pr): The percentage of examples classified as positive that are actually positive
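The two metrics and the 10-fold split can be sketched as follows. The function names and the interleaved fold assignment are assumptions for illustration; the slides do not say how the folds were drawn. Labels are booleans, with True meaning positive (a rating of 6-10).

```python
def accuracy(actual, predicted):
    """Percentage of examples whose predicted label matches the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


def precision(actual, predicted):
    """Percentage of examples predicted positive that are actually positive."""
    flagged = [a for a, p in zip(actual, predicted) if p]
    if not flagged:
        return 0.0  # assumption: report 0 when nothing was predicted positive
    return 100.0 * sum(flagged) / len(flagged)


def k_fold_splits(n_examples, k=10):
    """Yield (train_indices, test_indices) pairs; each example is tested once."""
    for fold in range(k):
        test = list(range(fold, n_examples, k))
        held_out = set(test)
        train = [i for i in range(n_examples) if i not in held_out]
        yield train, test


actual = [True, False, True, True]
predicted = [True, False, False, True]
print(accuracy(actual, predicted), precision(actual, predicted))
```

In a full evaluation, the classifier would be trained on each `train` split and scored on the matching `test` split, with the metrics averaged over the ten folds.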

Page 16

EXPERIMENTAL RESULTS

Discussion

• User-selected examples vs. randomly selected examples

– User-selected examples are better as the user can accurately rate the selection

– Randomly selected examples tend to cover the complete dataset

• Conclusion – Avoid prematurely committing to a specific methodology

Page 17

EXPERIMENTAL RESULTS

• Can Collaborative and Content-Based approaches be combined to produce better results?

• Slots that carry collaborative information: related authors, related titles

• When these slots were removed, performance degraded

• Conclusion: using both approaches together produces better results

Page 18

FUTURE WORK

• Web-based interface (with a larger body of users)

• Compare LIBRA's content-based approach to a standard collaborative approach

• Maximize the utility of the small training set by using various machine learning techniques

– Unsupervised learning

– Active learning (incremental approach)

• One effective approach: provide highly rated examples, generate initial recommendations, review the results, give low ratings to bad items, and retrain the system to get new recommendations

Page 19

CONCLUSIONS

• Content-Based Approach holds the promise of being able to effectively recommend items that have not been rated

• Provides accurate recommendations without any background knowledge of other users' preferences

• Combining collaborative techniques with the content-based approach does provide better results

• www.cs.utexas.edu/users/ml/recommender.html

• Partially supported by NSF

Page 20

QUESTIONS?