
Recommender Systems

Sumir Chandra
The Applied Software Systems Laboratory
Rutgers University

Introduction
- Information overload makes decisions hard: too many domains, too little experience, too much data - books, movies, music, websites, articles, etc.
- A recommender system provides recommendations to users based on the opinions/behaviors of others: efficient attention, better matches, non-obvious connections, keeps users coming back for more
- E.g. e-commerce: Reel.com, Levi's, eBay, Excite; commerce: call centers, direct marketing

Introduction (contd.)
- Data sources: purchase data, browsing & searching data, user feedback, text comments, expert recommendations
- Taxonomy:
  - text comments (expert/user reviews)
  - attribute based (this author also wrote ...)
  - item-to-item correlation (people who bought this item also bought ...)
  - people-to-people correlation (users like you ...)
- Primary transformation: recommendation aggregation, or good matching between recommender and seeker

Correlations
Item-to-item correlation
- Connects users to items they may be unaware of
- Based on keywords or features of the object
- Key statistic (high/low): # people who bought A & B / # people who bought A (see sketch below)
People-to-people correlation
- Collaborative filtering: assumes the user will prefer what like-minded users prefer, and prefer what dissimilar users dislike
- Object ranking by users
- CF techniques: majority rules, nearest neighbor, weighted averages (prediction, standard deviation, covariance), positive or negative
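A minimal sketch of both statistics, assuming toy purchase and rating data (the user and item names below are invented for illustration): the item-to-item ratio of co-purchases, and a similarity-weighted average prediction for people-to-people collaborative filtering.

    # Hypothetical purchase data: user -> set of items bought.
    purchases = {
        "u1": {"A", "B", "C"},
        "u2": {"A", "B"},
        "u3": {"A", "D"},
        "u4": {"B", "C"},
    }

    def cooccurrence_ratio(item_a, item_b, data):
        """Item-to-item statistic: # people who bought A & B / # people who bought A."""
        bought_a = [u for u, items in data.items() if item_a in items]
        bought_both = [u for u in bought_a if item_b in data[u]]
        return len(bought_both) / len(bought_a) if bought_a else 0.0

    # Hypothetical ratings: user -> {item: rating}, for collaborative filtering.
    ratings = {
        "u1": {"A": 5, "B": 4, "C": 1},
        "u2": {"A": 4, "B": 5, "C": 2},
        "u3": {"A": 1, "B": 2, "C": 5},
    }

    def similarity(u, v):
        """Crude taste similarity over co-rated items: 1 / (1 + mean abs. difference)."""
        common = set(ratings[u]) & set(ratings[v])
        if not common:
            return 0.0
        diff = sum(abs(ratings[u][i] - ratings[v][i]) for i in common) / len(common)
        return 1.0 / (1.0 + diff)

    def predict(user, item):
        """Weighted average of other users' ratings, weighted by taste similarity."""
        pairs = [(similarity(user, v), r[item]) for v, r in ratings.items()
                 if v != user and item in r]
        total = sum(s for s, _ in pairs)
        return sum(s * x for s, x in pairs) / total if total else None

    print(cooccurrence_ratio("A", "B", purchases))  # 2 of the 3 A-buyers also bought B
    print(predict("u1", "C"))                       # pulled toward u2's rating of C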

Design Issues
Technical Design Space
- Content of evaluation: from a single bit to unstructured textual annotations; trade-off between ease of use and computational overhead
- Explicit/implicit evaluation: nature of the recommendation
- User identity: real names, pseudonyms, anonymous
- Evaluation aggregation: an active research area - weighted voting, content analyses, referral chains, etc.
- Evaluation usage: filtering out negatives, sorting items by numeric evaluations, display

Design Issues (contd.)
Domain-Space: Characteristics of items evaluated
- Domain to which the items belong
- Sheer volume: variable
- Lifetime: rate of gathering and distributing evaluations
- Cost structure: cost of missing a good item, of sampling a bad one, and of incorrect decisions
Domain-Space: Characteristics of participants and evaluations
- Set of recommenders
- Recommendation density: do recommenders tend to evaluate many items in common? (see sketch below)
- Set of consumers
- Consumer taste variability: taste matching works better for a larger set; personalized aggregation works better when tastes differ
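A small sketch of the recommendation-density idea, assuming made-up evaluation sets: the average number of items a pair of recommenders has evaluated in common.

    from itertools import combinations

    # Hypothetical data: recommender -> set of items they have evaluated.
    evaluations = {
        "r1": {"A", "B", "C"},
        "r2": {"B", "C", "D"},
        "r3": {"X", "Y"},
    }

    def recommendation_density(evals):
        """Average number of commonly evaluated items over all recommender pairs."""
        pairs = list(combinations(evals, 2))
        if not pairs:
            return 0.0
        return sum(len(evals[a] & evals[b]) for a, b in pairs) / len(pairs)

    print(recommendation_density(evaluations))  # (2 + 0 + 0) / 3 pairs = 0.67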

Design Issues (contd.)
Social Implications
- Free riders: take but do not give; mandatory or monetary incentives; weighted voting to avoid unfair evaluation; discourage the "vote early and often" phenomenon
- Privacy: information vs. privacy; privacy blends; attributed credit for recommendation efforts; blind refereeing as in the peer-review system
- Advertisers: charge recipients through subscription or pay-per-use; advertiser support; charge owners of the evaluated media

Recommender System Types
- Collaborative/social-filtering system: aggregates consumers' preferences and makes recommendations to other users based on similarity in behavioral patterns
- Content-based system: supervised machine learning used to induce a classifier that discriminates between interesting and uninteresting items for the user
- Knowledge-based system: knowledge about users and products used to reason about what meets the user's requirements, using discrimination trees, decision support tools, or case-based reasoning (CBR)

Content-based Collaborative Information Filtering
Research Assistant Agent Project (RAAP), Nagoya Institute of Technology, Japan
- Registration and research profile -> bookmark database
- Interesting page -> agent suggestion -> classification -> reconfirm or change
- In parallel, the agent checks for newly classified bookmarks -> recommends them to other users -> accept/reject on login
- Text categorization: positive/negative examples; the most similar classifier for a candidate class is chosen using term weighting, with the TF-IDF scheme from Information Retrieval
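The TF-IDF term weighting mentioned in the last bullet can be sketched as follows; the bookmark texts are invented and the whitespace tokenizer is deliberately naive.

    import math
    from collections import Counter

    # Hypothetical bookmark texts; a real system would tokenize, stem, and stop-list.
    docs = [
        "grid computing adaptive mesh refinement",
        "adaptive partitioning of structured grid applications",
        "recommender systems for electronic commerce",
    ]

    def tfidf(documents):
        """Per-document term weights: tf(t, d) * log(N / df(t))."""
        tokenized = [d.split() for d in documents]
        n = len(tokenized)
        df = Counter(t for toks in tokenized for t in set(toks))
        weights = []
        for toks in tokenized:
            tf = Counter(toks)
            weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
        return weights

    for w in tfidf(docs):
        print(w)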

Content-based Collaborative Information Filtering (contd.)
- Relevance feedback: positive/negative prototypes; the similarity measure is sim_t(c, D) = (Qt+ . Dt) - (Qt- . Dt) (sketch below)
- Feature selection: removal of non-informative terms using Information Gain (IG), based on the probability of a term being present
- Learning to recommend: the agent keeps two matrices - a user vs. category matrix (for successful classifications) and each user's confidence factor (0.1 to 1) w.r.t. other users - used to compute correlations
- Circular references are avoided by verifying that a recommended document is not already registered in the target user's database
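A minimal sketch of that relevance-feedback similarity, assuming sparse dict-based term vectors (the vectors and weights below are invented): the document's match against the positive prototype minus its match against the negative prototype.

    def dot(u, v):
        """Sparse dot product over dict-based term vectors."""
        return sum(w * v.get(t, 0.0) for t, w in u.items())

    def similarity(q_pos, q_neg, d):
        """sim_t(c, D) = (Qt+ . Dt) - (Qt- . Dt) for category c's prototype vectors."""
        return dot(q_pos, d) - dot(q_neg, d)

    # Toy TF-IDF-style vectors: the document matches the positive prototype more.
    q_pos = {"grid": 0.8, "partitioning": 0.6}
    q_neg = {"commerce": 0.9}
    d = {"grid": 0.5, "partitioning": 0.4, "commerce": 0.1}
    print(similarity(q_pos, q_neg, d))  # 0.8*0.5 + 0.6*0.4 - 0.9*0.1 = 0.55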

Knowledge-based Systems
- FindMe technique: knowledge-based similarity retrieval; the user selects a source item -> requests similar items
- "Tweak" application: same, but the candidate set is filtered prior to sorting, leaving only candidates that satisfy the tweak
- Car Navigator: conversational interaction/navigation focused around high-level responses
- PickAFlick: multiple task-specific retrieval strategies
- RentMe: set of query menus, NLP used to generate the database
- Recommender Personal Shopper (RPS): a domain-independent implementation of the FindMe algorithm

Knowledge-based Systems (contd.)
- Similarity measures: goal-based, with priorities assigned to goals
- Sorting algorithm: metric-based bucket sorting
- Retrieval algorithm: priority-ordered metric constraints, plus tweaks, forming an SQL query (see sketch below)
- Product data: creation of a product database in which unique items are associated with sets of features
- Metrics: similarity and directional metrics with preference
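A rough sketch of the filter-then-sort idea behind FindMe-style retrieval, under assumed data: candidates that fail the tweak are discarded, and the survivors are ordered by a coarse similarity bucket. The item catalog, the "cheaper" tweak, and the bucket metric are all invented for illustration; the real systems express these steps as priority-ordered SQL queries.

    # Hypothetical product catalog with feature sets.
    cars = [
        {"name": "sedan-1", "price": 22000, "seats": 5, "mpg": 32},
        {"name": "sedan-2", "price": 19000, "seats": 5, "mpg": 35},
        {"name": "suv-1",   "price": 31000, "seats": 7, "mpg": 24},
    ]

    def satisfies_tweak(candidate, source, tweak):
        """Keep only candidates that improve on the tweaked feature."""
        if tweak == "cheaper":
            return candidate["price"] < source["price"]
        return True

    def similarity_bucket(candidate, source):
        """Coarse bucket: how many non-price features match the source item."""
        return sum(candidate[f] == source[f] for f in ("seats", "mpg"))

    def find_similar(source, items, tweak=None):
        """Filter by tweak first, then bucket-sort by similarity to the source item."""
        pool = [c for c in items if c is not source
                and (tweak is None or satisfies_tweak(c, source, tweak))]
        return sorted(pool, key=lambda c: similarity_bucket(c, source), reverse=True)

    print([c["name"] for c in find_similar(cars[0], cars, tweak="cheaper")])  # ['sedan-2']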

- Hybrid system: a knowledge-based system combined with collaborative filtering

Recommender Tradeoffs

Knowledge-based
  Pluses:  A. No ramp-up required
           B. Detailed qualitative preference feedback
           C. Sensitive to preference changes
  Minuses: H. Knowledge engineering
           I. Suggestion ability is static

Collaborative filtering
  Pluses:  D. Can identify niches precisely
           E. Domain knowledge not needed
           F. Quality improves over time
           G. Personalized recommendations
  Minuses: J. Quality dependent on a large historical data set
           K. Subject to statistical anomalies in the data
           L. Insensitive to preference changes

Ideal Hybrid
  Pluses:  A, B, C, D, F, G
  Minuses: H

ARMaDA Recommender
- No single partitioning scheme performs best for all types of applications and systems
- The optimal partitioning technique depends on the input parameters and the application's runtime state
- Partitioning behavior is characterized by the tuple {partitioner, application, computer system} (PAC)
- PAC quality is characterized by a 5-component metric: communication, load imbalance, data migration, partitioning time, partitioning overhead
- An octant approach characterizes the application/system state
- Adaptive meta-partitioner -> fully dynamic PAC (illustrative sketch below)
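An illustrative-only sketch of a rule-based meta-partitioner, assuming a drastically simplified state characterization: two state flags stand in for the octant analysis, and the octant-to-partitioner mapping is loosely based on the RM-3D discussion that follows (it is not ARMaDA's actual rule set).

    # Assumed mapping from octant label to preferred partitioning scheme.
    OCTANT_RULES = {
        "I":   "pBD-ISP",
        "III": "G-MISP+SP",
    }
    DEFAULT_PARTITIONER = "CGD"

    def characterize_octant(localized_adaptation: bool, high_activity: bool) -> str:
        """Toy characterization: collapse two application-state bits into an octant label."""
        if localized_adaptation and not high_activity:
            return "I"
        if localized_adaptation and high_activity:
            return "III"
        return "other"

    def recommend_partitioner(localized_adaptation, high_activity):
        """Meta-partitioner rule: look up the octant, fall back to a default scheme."""
        octant = characterize_octant(localized_adaptation, high_activity)
        return OCTANT_RULES.get(octant, DEFAULT_PARTITIONER)

    print(recommend_partitioner(localized_adaptation=True, high_activity=False))  # pBD-ISP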

Dynamic Characterization

RM-3D Switching Test
- Richtmyer-Meshkov fingering instability in 3 dimensions
- The application trace has 51 time-step iterations
- RM-3D has more localized adaptation and lower activity dynamics
- Depending on the computer system, RM-3D resides in octants I and III for most of its execution
- Partitioning schemes pBD-ISP and G-MISP+SP are suited to these octants
- Evaluation pipeline: application trace -> partitioner -> output trace -> simulator -> metric measurements
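A minimal sketch of that evaluation pipeline, with stub stages standing in for the partitioner and simulator (the function names and metric keys below are placeholders, not ARMaDA code).

    def run_partitioner(trace, partitioner):
        """Stub: apply a partitioning scheme to each time step of the application trace."""
        return [{"step": step, "partitioner": partitioner} for step in trace]

    def simulate(output_trace):
        """Stub: a real simulator would replay the output trace and measure the
        5-component PAC metric listed on the ARMaDA Recommender slide."""
        return {
            "communication": 0.0,
            "load_imbalance": 0.0,
            "data_migration": 0.0,
            "partitioning_time": 0.0,
            "partitioning_overhead": 0.0,
        }

    trace = list(range(51))  # RM-3D trace: 51 time-step iterations
    metrics = simulate(run_partitioner(trace, "pBD-ISP"))
    print(metrics)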

RM-3D Switching Test (contd.)

Test Runs
- CGD: complete run
- pBD-ISP: complete run
- CGD+pBD-ISP_load (for improved load balance):
  iterations 0-12 -> CGD, 13-22 -> pBD-ISP, 23-26 -> CGD, 27-36 -> pBD-ISP, 37-48 -> CGD, 49-51 -> pBD-ISP
- CGD+pBD-ISP_data (for reduced data migration):
  iterations 0-10 -> CGD, 11-28 -> pBD-ISP, 29-34 -> CGD, 35-51 -> pBD-ISP
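The switching schedules above can be written down directly; this small sketch (using the iteration ranges from the slide) looks up which partitioner is active at a given time-step iteration.

    # Switching schedules, taken from the Test Runs slide.
    SCHEDULES = {
        "CGD+pBD-ISP_load": [  # for improved load balance
            (0, 12, "CGD"), (13, 22, "pBD-ISP"), (23, 26, "CGD"),
            (27, 36, "pBD-ISP"), (37, 48, "CGD"), (49, 51, "pBD-ISP"),
        ],
        "CGD+pBD-ISP_data": [  # for reduced data migration
            (0, 10, "CGD"), (11, 28, "pBD-ISP"), (29, 34, "CGD"), (35, 51, "pBD-ISP"),
        ],
    }

    def partitioner_at(schedule_name, iteration):
        """Return the partitioner active at the given iteration of a switching schedule."""
        for lo, hi, partitioner in SCHEDULES[schedule_name]:
            if lo <= iteration <= hi:
                return partitioner
        raise ValueError(f"iteration {iteration} outside schedule {schedule_name}")

    print(partitioner_at("CGD+pBD-ISP_load", 30))  # pBD-ISP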

RM-3D Switching Test (contd.)

Metric                        CGD        pBD-ISP    CGD+pBD-ISP_load   CGD+pBD-ISP_data
Avg. max. load imbalance      18.9048 %  37.9821 %  34.749 %           39.3693 %
Avg. avg. data movement       127.275    18.3137    187.431            110.216
Avg. avg. intra-level comm.   1063.43    429.804    691.608            723.569
Avg. avg. inter-level comm.   451.49     0          265.882            127.667
Avg. max. no. of boxes        210.333    2.98039    16.9804            84.8824

Conclusions
- Yes! Experimental results conform to the theoretical observations
- Recommender systems in ARMaDA can result in performance optimization
- Future work:
  - more robust rule set and switching policies
  - partitioner/hierarchy optimization at switch points
  - integration of the recommender engine within ARMaDA
  - partitioner and application characterization research to form the policy rule base
