SOFIANE ABBAR, HABIBUR RAHMAN, SARAVANAN THIRUMURUGANATHAN, CARLOS CASTILLO, GAUTAM DAS QATAR COMPUTING RESEARCH INSTITUTE UNIVERSITY OF TEXAS AT ARLINGTON Ranking Item Features by Mining Online User-Item Interactions

Upload: sheila-joseph

Post on 05-Jan-2016


TRANSCRIPT

Page 1

Ranking Item Features by Mining Online User-Item Interactions

SOFIANE ABBAR, HABIBUR RAHMAN, SARAVANAN THIRUMURUGANATHAN, CARLOS CASTILLO, GAUTAM DAS

QATAR COMPUTING RESEARCH INSTITUTE

UNIVERSITY OF TEXAS AT ARLINGTON

Page 2

Outline

- Introduction
- Motivation and Challenges
- Model and Extensions
- Experimental Evaluation
- Related Work
- Conclusion

Page 3

Business owners rely on user feedback for the success of their businesses.

It is important for them to understand which features make an item popular.

Users leave feedback on items in the form of reviews, tags, likes, +1's, etc.

Can we leverage this information to find the ranking of features within an item?

Can we find the global ranking or popularity of the features?

Page 4

Introduction

The main focus of this paper is a novel problem: how to rank the features of each item from user-item interactions.

The principal problem investigated in this paper is the FEATURE RANKING (FR) PROBLEM:

Given a set of features and rudimentary user-item interactions (either at the aggregate or the individual level), identify the most important features per item (alternatively, a ranked list of features per item).

Page 5

The paper proposes a probabilistic model that describes user-item interactions in terms of a user preference distribution over features and a feature-item transition matrix that determines the probability that an item will be chosen, given a feature.
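In symbols, the model on this slide can be sketched as follows. The index ranges (n items, m features) and the entry names W_{ij} and h_{uj} follow the notation of the later slides; writing the choice probability as a sum over features is the usual reading of such a two-step model and is stated here as an assumption, not quoted from the paper.

```latex
% W_{ij}: probability that item i is chosen given feature j (each column of W sums to 1)
% h_{uj}: probability that user u picks feature j (the entries of h_u sum to 1)
P(i \mid u) = \sum_{j=1}^{m} W_{ij}\, h_{uj},
\qquad
\sum_{i=1}^{n} W_{ij} = 1,
\qquad
\sum_{j=1}^{m} h_{uj} = 1 .
```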

Page 6

The paper uses a database of items, where each item is described by a set of attributes, some of which are multi-valued. Each distinct attribute value of an item is called a feature (equivalently, an item can be described as a set of features).

Sparsity assumption. The paper assumes that, among all the features available, each user expresses preference over a relatively small fraction of them.

Page 7

Motivation

Consider Netflix, for example: a simple user-item interaction would record whether the user watched a movie. While some users could have watched the movie because it starred Tom Hanks, others could have watched it because, in addition, it was directed by Steven Spielberg. Similarly, while some users might buy a car because of its manufacturer, others might buy it for its model and transmission type.

Page 8

Example

Page 9

Challenges

Page 10

Models

A ranking is a relationship between a set of items such that, for any two items, the first is either "ranked higher than", "ranked lower than" or "ranked equal to" the second. Here, the ranking captures the popularity of item features and is used to suggest popular item features.

Page 11

Feature Ranking with Aggregate Interaction Information

The model assumes that user u first picks a single feature j based on their individual preference vector hu and then selects an item i containing j with probability proportional to Wij.
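The two-step choice described on this slide can be simulated directly. The sketch below is illustrative only: the matrix `W`, the vector `h_u`, and the helper names are made-up toy values and hypothetical functions, not data or code from the paper. It draws a feature from the user's preferences, then an item from that feature's column of W, and compares the empirical item frequencies with the product W·hu.

```python
import random

# Toy feature-item transition matrix W (rows: items, columns: features).
# Column j is a distribution over items, so each column sums to 1.
W = [
    [0.5, 0.0],
    [0.5, 0.25],
    [0.0, 0.75],
]
# Toy preference vector of one user over the two features (sums to 1).
h_u = [0.6, 0.4]


def simulate_visit(W, h_u, rng):
    """Draw one visit: pick a feature from h_u, then an item from that feature's column of W."""
    feature_ids = list(range(len(h_u)))
    item_ids = list(range(len(W)))
    j = rng.choices(feature_ids, weights=h_u)[0]
    column_j = [row[j] for row in W]
    i = rng.choices(item_ids, weights=column_j)[0]
    return i, j


def expected_item_distribution(W, h_u):
    """Marginal probability of each item: (W h_u)_i = sum_j W_ij * h_uj."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h_u)) for row in W]


rng = random.Random(0)  # fixed seed so the run is repeatable
num_draws = 20000
counts = [0] * len(W)
for _ in range(num_draws):
    item, _feature = simulate_visit(W, h_u, rng)
    counts[item] += 1

empirical = [c / num_draws for c in counts]
expected = expected_item_distribution(W, h_u)
print("empirical:", empirical)
print("expected: ", expected)
```

With enough draws the two printed vectors should agree closely, which is what lets the aggregate visit vector v be modelled as W·h.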

Page 12

FR-AGG-W

Algorithm:
Input: Database D and aggregate visit vector v
1: W = Estimate feature-item transition matrix
2: constraints = { ∀i ∈ [1, n] hi ≥ 0, ||h||1 = 1 }
3: h = argmin_h Error(v, Wh) subject to constraints
4: Compute Xi = Wi· ◦ h ∀i ∈ [1, n]
5: return X = {X1, X2, . . . , Xn}
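A minimal sketch of steps 3 to 5, under stated assumptions: Error is taken to be the squared L2 distance, and the constrained minimization is done with plain projected gradient descent onto the probability simplex. Both choices, along with the toy `W` and `v` and all function names, are illustrative and are not taken from the paper, which may use a different solver.

```python
def project_to_simplex(x):
    """Euclidean projection of x onto {h : h_j >= 0, sum_j h_j = 1}."""
    sorted_desc = sorted(x, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for k, value in enumerate(sorted_desc, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / k
        if value - candidate > 0:
            theta = candidate
    return [max(x_j - theta, 0.0) for x_j in x]


def matvec(W, h):
    """Matrix-vector product W h for W stored as a list of rows."""
    return [sum(w * h_j for w, h_j in zip(row, h)) for row in W]


def estimate_h(W, v, iterations=500):
    """Minimize ||v - W h||^2 over the simplex by projected gradient descent."""
    n, m = len(W), len(W[0])
    h = [1.0 / m] * m
    # Safe step size: the gradient's Lipschitz constant is at most 2 * ||W||_F^2.
    step = 1.0 / (2.0 * sum(w * w for row in W for w in row))
    for _ in range(iterations):
        residual = [p - v_i for p, v_i in zip(matvec(W, h), v)]
        gradient = [2.0 * sum(W[i][j] * residual[i] for i in range(n)) for j in range(m)]
        h = project_to_simplex([h_j - step * g for h_j, g in zip(h, gradient)])
    return h


def item_feature_visits(W, h):
    """X_i = W_i. ∘ h: element-wise product of row i of W with h."""
    return [[w * h_j for w, h_j in zip(row, h)] for row in W]


# Toy inputs: 3 items, 2 features; v is an aggregate visit distribution.
W = [[0.5, 0.0], [0.5, 0.25], [0.0, 0.75]]
v = [0.3, 0.4, 0.3]

h = estimate_h(W, v)
X = item_feature_visits(W, h)
# Rank the features of each item by decreasing X_ij.
rankings = [sorted(range(len(h)), key=lambda j: -x_i[j]) for x_i in X]
print(h, rankings)
```

The projection keeps every iterate feasible, so the constraints of step 2 hold throughout rather than being patched in at the end.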

Page 13

FR-AGG-h

Algorithm:
Input: Database D and aggregate visit vector v
1: W̄ = Estimate (binary) feature-item presence matrix
2: h = Estimate aggregate preference vector
3: constraints = { W ≤ W̄ and ∀j ||W·j||1 = 1 and ∀i, j Wij ≥ 0 }
4: W = argmin_W Error(v, Wh) subject to constraints
5: Compute Xi = Wi· ◦ h ∀i ∈ [1, n]
6: return X = {X1, X2, . . . , Xn}
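The optimization in step 4 can be sketched in the same spirit. The assumptions here are loud ones: Error is the squared L2 distance, the preference vector h is simply given (the slide does not say how step 2 estimates it), and the solver is projected gradient descent with each column of W projected onto the simplex restricted to the items that contain the feature. The function names and toy inputs are hypothetical.

```python
def project_to_simplex(x):
    """Euclidean projection of x onto {w : w_i >= 0, sum_i w_i = 1}."""
    sorted_desc = sorted(x, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for k, value in enumerate(sorted_desc, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / k
        if value - candidate > 0:
            theta = candidate
    return [max(x_i - theta, 0.0) for x_i in x]


def estimate_W(presence, h, v, iterations=300):
    """Minimize ||v - W h||^2 with W >= 0, W zero wherever presence is zero,
    and every column of W summing to 1."""
    n, m = len(presence), len(h)
    # Start from the uniform distribution over the items that contain each feature
    # (assumes every feature appears in at least one item).
    column_sizes = [sum(presence[i][j] for i in range(n)) for j in range(m)]
    W = [[presence[i][j] / column_sizes[j] for j in range(m)] for i in range(n)]
    # The gradient 2 (W h - v) h^T has Lipschitz constant 2 * ||h||^2.
    step = 1.0 / (2.0 * sum(h_j * h_j for h_j in h))
    for _ in range(iterations):
        residual = [sum(W[i][j] * h[j] for j in range(m)) - v[i] for i in range(n)]
        for j in range(m):
            support = [i for i in range(n) if presence[i][j]]
            moved = [W[i][j] - step * 2.0 * residual[i] * h[j] for i in support]
            for i, w_ij in zip(support, project_to_simplex(moved)):
                W[i][j] = w_ij
    return W


# Toy inputs: item 0 has feature 0, item 1 has both features, item 2 has feature 1.
presence = [[1, 0], [1, 1], [0, 1]]
h = [0.6, 0.4]
v = [0.3, 0.4, 0.3]

W = estimate_W(presence, h, v)
X = [[w * h_j for w, h_j in zip(row, h)] for row in W]
print(W)
```

Entries outside the presence pattern are never touched, so the constraint that W is bounded by the presence matrix holds by construction.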

Page 14

Variant Problem 1 (FR-AGG): Given a database D and an aggregate interaction vector v, estimate the item-feature visit vector Xi for each item i (where Xi = Wi· ◦ h) such that Error(v, Wh) is minimized.

Variant Problem 2 (FR-INDIV): Given a database D and an individual interaction matrix V, estimate the item-feature visit vector Xi for each item i (where Xi = Wi· ◦ h, and h is the average of the columns of H) such that Error(V, WH) is minimized.

Page 15

Network Flow

In this approach, they consider a graph-based representation of the problem that maps to the element.

This algorithm finds the feature-to-item transition matrix (W) by minimizing the error |v − Wh|.

Page 16

Extensions

Feature Ranking with Composite Features.

Baselines.

Algorithms
- FR-AGG-W-LS
- FR-AGG-h-LS
- FR-AGG-h-NF

Evaluation Metrics
- precision@1
- nDCG@k

Ranking quality
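The two metrics named above can be computed as in the sketch below. It uses one common nDCG variant (linear gain, log2 position discount); the slides do not say which variant the paper uses, and the relevance grades in the example are invented for illustration.

```python
import math


def precision_at_1(predicted_ranking, true_top_feature):
    """1.0 if the top-ranked feature is the true most prominent feature, else 0.0."""
    return 1.0 if predicted_ranking[0] == true_top_feature else 0.0


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance scores."""
    return sum(rel / math.log2(position + 2) for position, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_ranking, relevance, k):
    """nDCG@k: DCG of the predicted order divided by the DCG of the ideal order."""
    gains = [relevance.get(feature, 0.0) for feature in predicted_ranking]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical ground truth for one movie: graded relevance of each feature.
relevance = {"Tom Hanks": 3.0, "Steven Spielberg": 2.0, "Drama": 1.0}
predicted = ["Steven Spielberg", "Tom Hanks", "Drama"]

p1 = precision_at_1(predicted, "Tom Hanks")
ndcg = ndcg_at_k(predicted, relevance, k=3)
print(p1, round(ndcg, 4))
```

precision@1 only checks the first position, while nDCG@k rewards a ranking that is almost right, which is why the two are reported side by side.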

Page 17

Proposed method (FR-INDIV-MNMF)

They choose the Kullback-Leibler divergence D(V||WH) to measure the reconstruction error between V and WH. This choice (instead of other measures such as L2 distance) allows them to design an algorithm that preserves the column stochasticity constraints in the solution. They then propose a four-step algorithm to solve the problem of ranking item features in the presence of an individual interaction matrix.
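The slide does not spell out the divergence. A common choice in the NMF literature is the generalized Kullback-Leibler divergence shown below; it is given here as an assumption about what D(V||WH) denotes, with i indexing items and k indexing users.

```latex
D(V \,\|\, WH) \;=\; \sum_{i,k} \left( V_{ik} \log \frac{V_{ik}}{(WH)_{ik}} \;-\; V_{ik} \;+\; (WH)_{ik} \right)
```

When V and WH have the same column sums, the last two terms cancel and the expression reduces to the ordinary KL divergence summed over columns.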

Page 18

Step 1: Imposing sparsity constraints over H.

They impose a (row) sparsity constraint over the factor W by assuming a sparse binary matrix W̄ such that W ≤ W̄. An entry (W̄)ij = 0 iff item i does not contain feature j.

A seemingly similar approach can be used to impose (column) sparsity constraints over the factor H by defining a sparse binary matrix H̄ such that H ≤ H̄, where an entry (H̄)jk = 0 if user k has not visited any item that contains feature j.

However, this straightforward approach may not generate adequate sparsity constraints, since the union of distinct features of the items that a user has visited may be quite large.

Page 19

Step 2: Iterative algorithm with multiplicative update rules.

In the second step, they propose modifications to the multiplicative update algorithm to discover factors W and H such that the reconstruction error D(V||WH) is minimized.
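For orientation, the sketch below runs the standard Lee and Seung multiplicative updates for the KL objective on a toy problem. It is not the authors' modified M-NMF rule, which the slides do not reproduce; it only illustrates the property such schemes rely on: entries initialized to zero stay zero, so a sparsity pattern placed in the starting W (or H) is preserved. All values and function names are illustrative.

```python
import math

EPS = 1e-12  # guards against division by zero


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def kl_divergence(V, R):
    """Generalized KL divergence D(V || R)."""
    return sum(
        (v * math.log(v / max(r, EPS)) if v > 0 else 0.0) - v + r
        for v_row, r_row in zip(V, R)
        for v, r in zip(v_row, r_row)
    )


def ratio_matrix(V, W, H):
    """Element-wise V / (W H)."""
    R = matmul(W, H)
    return [[v / max(r, EPS) for v, r in zip(v_row, r_row)] for v_row, r_row in zip(V, R)]


def update_H(V, W, H):
    """H_jk <- H_jk * (sum_i W_ij * Q_ik) / (sum_i W_ij), with Q = V / (W H)."""
    n, m, p = len(W), len(H), len(H[0])
    Q = ratio_matrix(V, W, H)
    col_sums = [max(sum(W[i][j] for i in range(n)), EPS) for j in range(m)]
    return [[H[j][k] * sum(W[i][j] * Q[i][k] for i in range(n)) / col_sums[j]
             for k in range(p)] for j in range(m)]


def update_W(V, W, H):
    """W_ij <- W_ij * (sum_k H_jk * Q_ik) / (sum_k H_jk), with Q = V / (W H)."""
    n, m, p = len(W), len(H), len(H[0])
    Q = ratio_matrix(V, W, H)
    row_sums = [max(sum(H[j]), EPS) for j in range(m)]
    return [[W[i][j] * sum(H[j][k] * Q[i][k] for k in range(p)) / row_sums[j]
             for j in range(m)] for i in range(n)]


# Toy data: 3 items, 2 features, 2 users. Zeros in W encode "item lacks feature".
V = [[0.3, 0.1], [0.4, 0.3], [0.3, 0.6]]
W = [[0.5, 0.0], [0.5, 0.5], [0.0, 0.5]]
H = [[0.5, 0.5], [0.5, 0.5]]

initial_error = kl_divergence(V, matmul(W, H))
for _ in range(200):
    H = update_H(V, W, H)
    W = update_W(V, W, H)
final_error = kl_divergence(V, matmul(W, H))
print(initial_error, final_error)
```

Because every update multiplies the current entry by a non-negative factor, non-negativity and the zero pattern come for free; only the column sums need separate treatment.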

Page 20

Step 3: Imposing stochastic constraints on W and H.

The matrices W and H produced by Step 2 satisfy the sparsity requirements; however, they may not satisfy the column stochastic constraints, which require that the weights of each column of W and H sum to 1. This step describes a procedure for further modifying W and H such that the stochastic constraints are satisfied. It makes use of a theorem by Ho and Van Dooren.
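The theorem itself is not reproduced in this transcript. A natural reading, offered here as an assumption, is a diagonal rescaling: divide each column of W by its sum and multiply the matching row of H by the same amount. This leaves the product WH unchanged and makes W column stochastic; H then becomes column stochastic as well whenever the columns of WH sum to 1. The function name and toy matrices are hypothetical.

```python
def impose_column_stochasticity(W, H):
    """Rescale so every column of W sums to 1 while leaving the product W H unchanged.

    Assumes every column of W has a non-zero sum.
    """
    n, m, p = len(W), len(H), len(H[0])
    column_sums = [sum(W[i][j] for i in range(n)) for j in range(m)]
    W_scaled = [[W[i][j] / column_sums[j] for j in range(m)] for i in range(n)]
    H_scaled = [[H[j][k] * column_sums[j] for k in range(p)] for j in range(m)]
    return W_scaled, H_scaled


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Toy factors whose product has unit column sums but which are not stochastic themselves.
W = [[1.0, 0.0], [1.0, 0.125], [0.0, 0.375]]
H = [[0.3, 0.1], [0.8, 1.6]]

W_s, H_s = impose_column_stochasticity(W, H)
before = matmul(W, H)
after = matmul(W_s, H_s)
print(W_s, H_s)
```

The rescaling is exact and needs no further iterations, so it can be applied once after the multiplicative updates have converged.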

Page 21

Step 4: Computing item-feature visit vectors Xi.

Once the feature-item transition matrix W and the individual preference matrix H are obtained, the feature ranking of any item can be computed as follows. First, compute the aggregate preference vector h by averaging all column vectors H·j of H; then perform a component-wise multiplication between the item's feature transition vector Wi· and h, i.e. Xi = Wi· ◦ h.
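Step 4 amounts to a column average followed by an element-wise product. A small sketch with toy matrices and made-up feature labels (the function names are hypothetical):

```python
def aggregate_preference(H):
    """Average the columns of H (one column per user) into a single vector h."""
    num_users = len(H[0])
    return [sum(row) / num_users for row in H]


def rank_item_features(W_row, h, feature_names):
    """Return (feature, score) pairs for one item, sorted by X_i = W_i. ∘ h, highest first.

    Features the item does not contain (W_ij = 0) are left out.
    """
    scores = [(name, w * h_j) for name, w, h_j in zip(feature_names, W_row, h) if w > 0]
    return sorted(scores, key=lambda pair: -pair[1])


feature_names = ["Tom Hanks", "Steven Spielberg"]
W = [[0.5, 0.0], [0.5, 0.25], [0.0, 0.75]]  # rows: items, columns: features
H = [[0.6, 0.2], [0.4, 0.8]]                # rows: features, columns: users

h = aggregate_preference(H)
rankings = [rank_item_features(row, h, feature_names) for row in W]
for item_id, ranking in enumerate(rankings):
    print(item_id, ranking)
```

Note that an item's ranking depends on both factors: a feature that is globally popular (large h entry) can still rank low for an item whose W entry for it is small.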

Page 22

FR-INDIV-MNMF

Algorithm:
Input: Database D and individual interaction matrix V
1: W̄ = Estimate feature-item presence matrix
2: H0 = Initialize a column-wise sparse individual preference matrix using setCover (Step 1)
3: Compute W1, H1 = M-NMF(W̄, H0) (Step 2)
4: W, H = Impose stochastic constraints (Step 3)
5: Compute h = average(H)
6: Compute Xi = Wi· ◦ h ∀i ∈ [1, n] (Step 4)
7: return X = {X1, X2, . . . , Xn}
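Line 2 of the algorithm initializes H0 with a set cover, but the slides do not describe how. One plausible reading, given purely as an assumption, is a greedy set cover per user: pick a small set of features such that every item the user visited contains at least one of them, and allow non-zero preference only on those features. The helper names and the uniform starting weights are hypothetical.

```python
def greedy_feature_cover(visited_items, item_features):
    """Greedily pick features until every visited item contains at least one picked feature."""
    uncovered = set(visited_items)
    chosen = []
    while uncovered:
        candidates = sorted({f for item in uncovered for f in item_features[item]})
        if not candidates:
            break  # the remaining items have no features at all
        # Pick the feature that covers the most still-uncovered items.
        best = max(candidates,
                   key=lambda f: sum(1 for item in uncovered if f in item_features[item]))
        chosen.append(best)
        uncovered = {item for item in uncovered if best not in item_features[item]}
    return chosen


def initial_preference_column(chosen, all_features):
    """Uniform weight on the chosen features, zero elsewhere (one column of H0)."""
    weight = 1.0 / len(chosen) if chosen else 0.0
    return [weight if f in chosen else 0.0 for f in all_features]


item_features = {
    "Forrest Gump": {"Tom Hanks", "Robert Zemeckis"},
    "Saving Private Ryan": {"Tom Hanks", "Steven Spielberg"},
    "Jaws": {"Steven Spielberg"},
}
all_features = sorted({f for features in item_features.values() for f in features})
visited = list(item_features)

chosen = greedy_feature_cover(visited, item_features)
h0_column = initial_preference_column(chosen, all_features)
print(chosen, h0_column)
```

In this toy case the cover uses fewer features than the union of all features seen by the user, which is the extra sparsity that Step 1 is after.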

Page 23

Experiment

They conduct a comprehensive set of experiments to evaluate the effectiveness and efficiency of various methods for ranking item features. Ranking quality is measured in two scenarios: prediction of the most prominent feature (precision@1) and overall ranking of item features (nDCG@k).

Dataset: MovieLens joined with cast data from IMDB

Page 24

Result

Page 25

Related Work

Nonnegative Matrix Factorization (NMF)

Attributes ranking

Feature Ranking.

Page 26

Conclusion

In this paper, they consider the feature ranking problem, which ranks the features of an item using only user-item interaction information such as visits. They define two variants of the problem based on the granularity of the interaction information available and propose different algorithms (based on constrained convex optimization, network flow approximation and marginal NMF) to solve these variants. In the future, they wish to investigate a variant where users can choose an item through a weighted combination of features.

Page 27

Thank You