

Ranking Item Features by Mining Online User Item Interactions

Sofiane Abbar (2), Habibur Rahman (1,2), Saravanan Thirumuruganathan (1,2), Carlos Castillo (2) and Gautam Das (1,2)

(1) The University of Texas at Arlington, (2) Qatar Computing Research Institute

Motivation

[Figure: feedback loop between business owners, advertised products, users and social media. Edge labels: sell, consume, feedback.]

- Business owners rely on users' feedback for the success of their businesses.
- It is important for them to understand which features make an item popular.
- Users give feedback on items in the form of reviews, tags, likes, +1's, etc.
- Can we leverage this information to rank the features of an item?
- Can we find the global ranking, or popularity, of the features?

Motivating Example

[Figure: bipartite graph representing items and features. Items: A Few Good Men (10000 visits) and Sleepers (10 visits). Features: Tom Cruise, Kevin Bacon, Brad Pitt.]

Goal 1: rank the actors within each item.

Goal 2: rank the actors globally.

Challenges

Actor          Visit count (naive transfer)   Rank (naive)
Tom Cruise     10000                          2
Kevin Bacon    10000 + 10                     1
Brad Pitt      10                             3

Table: ranking of actors using the tag-cloud approach.

Users rarely write an explicit review of their preferences, such as: I like the movie "A Few Good Men" because of "Tom Cruise".

- Users do not give any direct cue as to why an item is liked.
- Transferring visits/likes from items to features leads to an incorrect ranking.
- Although Kevin Bacon gets the highest rank, this does not explain why Sleepers gets so few likes.
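The naive transfer can be reproduced in a few lines. The sketch below is only an illustration: the cast lists are an assumption inferred from the visit counts in the table (Kevin Bacon appearing in both movies), and the function name `naive_feature_counts` is hypothetical.

```python
from collections import defaultdict

# Visit counts from the motivating example.
item_visits = {"A Few Good Men": 10000, "Sleepers": 10}

# Assumed item-feature edges, inferred from the naive counts in the table.
item_features = {
    "A Few Good Men": ["Tom Cruise", "Kevin Bacon"],
    "Sleepers": ["Kevin Bacon", "Brad Pitt"],
}


def naive_feature_counts(item_visits, item_features):
    """Credit every feature of an item with all of that item's visits."""
    counts = defaultdict(int)
    for item, visits in item_visits.items():
        for feature in item_features[item]:
            counts[feature] += visits
    return dict(counts)


counts = naive_feature_counts(item_visits, item_features)
# Features ordered from most to least credited.
ranking = sorted(counts, key=counts.get, reverse=True)
```

Kevin Bacon comes out on top only because he is credited with the visits of both movies, which says nothing about which actor actually attracts users.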

Model

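[Figure: a user node u is connected to the actors Tom Cruise (TC), Kevin Bacon (KB) and Brad Pitt (BP) with preference weights h_TC, h_KB and h_BP. The actors are connected to the items A Few Good Men (AF, 10000 visits) and Sleepers (SL, 10 visits) with transition weights W_AF,TC, W_AF,KB, W_AF,BP, W_SL,TC, W_SL,KB and W_SL,BP.]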

- A typical user picks an actor $j$ according to the initial preference probability $h_j$ of that actor.
- Once an actor is selected, the user picks an item (movie) $i$ according to the transition probability $W_{ij}$.
- Hence, we can model this problem as $Wh \approx v$, where $v$ holds the observed visits per item.
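To make the two-step choice concrete, here is a small forward computation of Wh. All numbers are hypothetical, and normalizing each column of W (one distribution over items per actor) is an assumption of this sketch that follows the generative reading of the bullets above.

```python
import numpy as np

features = ["Tom Cruise", "Kevin Bacon", "Brad Pitt"]
items = ["A Few Good Men", "Sleepers"]

# Hypothetical preference vector h over the features (sums to 1).
h = np.array([0.6, 0.3, 0.1])

# Hypothetical transition matrix W with shape (items, features).
# W[i, j] is the probability of picking item i once feature j was chosen,
# so in this sketch every column sums to 1.
W = np.array([
    [1.0, 0.9, 0.0],  # A Few Good Men
    [0.0, 0.1, 1.0],  # Sleepers
])

# Expected share of visits per item under the model.
v_hat = W @ h
```

With these values most of the traffic lands on A Few Good Men because the preferred actor appears only there; the inverse problem is to recover h or W from the observed v.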

Network Flow Modelling

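[Figure: flow network. A source (the user u) sends a flow of 0.33 to each of the three features. The flows on the feature-to-item edges are unknown (?). The items A Few Good Men (AF) and Sleepers (SL) are connected to the sink with flows 0.7 and 0.3.]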

- We can model this problem as a maximum flow problem in a network.
- We assume a total flow of 1.0 inside the network. The user is modelled as a source from which a uniform flow is directed towards the features.
- The algorithm finds the feature-to-item transition matrix $W$ by minimizing the error $|v - Wh|$.
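One possible reading of this construction is sketched below with a textbook Edmonds-Karp maximum flow. It is not the authors' algorithm: the integer flow units (30 units standing for a total flow of 1.0), the restriction to the assumed cast edges, and the rule that turns edge flows into entries of W are all assumptions made for illustration, and the sketch does not explicitly minimize |v − Wh|.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Residual capacities, including zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Walk back from the sink to collect the path edges.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck
    # Flow on an original edge = capacity minus remaining residual capacity.
    flow = {(u, v): cap - residual[u][v]
            for u, nbrs in capacity.items() for v, cap in nbrs.items()}
    return total, flow


UNITS = 30   # assumed integer units standing for a total flow of 1.0
BIG = UNITS  # large enough to be non-binding on feature-to-item edges
capacity = {
    "source": {"TC": 10, "KB": 10, "BP": 10},  # uniform flow to features
    "TC": {"AF": BIG},
    "KB": {"AF": BIG, "SL": BIG},
    "BP": {"SL": BIG},
    "AF": {"sink": 21},  # 0.7 of the total flow
    "SL": {"sink": 9},   # 0.3 of the total flow
}
total, flow = max_flow(capacity, "source", "sink")

# Assumed rule: W[item, feature] is the share of the feature's flow sent to the item.
W = {(item, feat): flow[(feat, item)] / capacity["source"][feat]
     for feat in ("TC", "KB", "BP") for item in capacity[feat]}
```

In this toy network A Few Good Men can only be fed by Tom Cruise and Kevin Bacon, so both send it their whole flow and one unit of Brad Pitt's flow stays unrouted.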

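Solution (Aggregated User-Item Interaction)

[Figure: v = W h, with v (items × user, example visit counts 1000, 200, 400, 50, 10), W (items × features) and h (features × user). Panel "If h is unknown": the entries of W are known (X) and the entries of h are unknown (?). Panel "If W is unknown": the entries of h are known (X) and the entries of W are unknown (?).]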

- If h is unknown: solve for $h$ by minimizing $\|v - Wh\|_2$ using ordinary least squares, subject to $\sum_{i=1}^{|h|} h_i = 1$.
- If W is unknown: solve for $W$ by minimizing $\|v - Wh\|_2$ using ordinary least squares, subject to $\sum_{j=1}^{|h|} W_{ij} = 1$ for every item $i$.
- Once we have found $W$ and $h$, we can find the ranking of features for each item.
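The case where h is unknown can be sketched as an equality-constrained least-squares problem solved through its KKT system. This is a minimal illustration on made-up data, not the authors' implementation: the function name `solve_h` is hypothetical, non-negativity of h is not enforced, and ranking the features of item i by W[i, j] * h[j] is an assumed rule, since the ranking step is not spelled out.

```python
import numpy as np


def solve_h(W, v):
    """Minimize ||v - W h||_2 subject to sum(h) == 1 via the KKT system."""
    n_features = W.shape[1]
    ones = np.ones((n_features, 1))
    # Stationarity: 2 W^T W h + lam * 1 = 2 W^T v.  Feasibility: 1^T h = 1.
    kkt = np.block([[2 * W.T @ W, ones],
                    [ones.T, np.zeros((1, 1))]])
    rhs = np.concatenate([2 * W.T @ v, [1.0]])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n_features]  # drop the Lagrange multiplier


# Made-up item-feature matrix (4 items, 3 features); each row sums to 1.
W = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])
h_true = np.array([0.5, 0.3, 0.2])
v = W @ h_true  # noise-free observations for the toy example

h_est = solve_h(W, v)

# Assumed ranking rule: order the features of item i by W[i, j] * h[j].
contributions = W * h_est
feature_ranking = np.argsort(-contributions, axis=1, kind="stable")
```

Because the toy observations are noise-free and W has full column rank, the estimate matches the vector that generated v; with real data the fit is only approximate and a non-negative solver may be preferable.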

Solution (Individual User-Item Interaction)

[Figure: V = W H, with V (items × users), W (items × features) and H (features × users). The entries of W and H are unknown (?).]

- Solve for W and H using marginal Non-Negative Matrix Factorization (MNMF) with stochasticity and sparsity constraints.
- H gives each individual user's preference vector over the features.
- From H, we can calculate h, the global preference vector over the features. This can then be used to find the ranking of features for each item.
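As a rough illustration of the factorization step, the sketch below uses plain multiplicative-update NMF rather than the marginal NMF named above, and it omits the stochasticity constraint. The known item-feature pattern is kept by zero-initializing the entries of W outside a mask. The data, the function name `masked_nmf` and the way h is aggregated from H are all assumptions.

```python
import numpy as np


def masked_nmf(V, mask, n_iter=500, eps=1e-9, seed=0):
    """Multiplicative-update NMF, V ~ W @ H, with W zero outside mask."""
    rng = np.random.default_rng(seed)
    n_features, n_users = mask.shape[1], V.shape[1]
    # Zeros in W stay zero under multiplicative updates, which keeps the
    # known item-feature pattern (a simple stand-in for a sparsity constraint).
    W = rng.random(mask.shape) * mask
    H = rng.random((n_features, n_users))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data: 4 items, 3 features, 6 users (all values are made up).
mask = np.array([[1, 1, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 1, 1]], dtype=float)
rng = np.random.default_rng(1)
V = (rng.random(mask.shape) * mask) @ rng.random((3, 6))

W, H = masked_nmf(V, mask)

# Assumed aggregation: sum each feature's weight over users, then normalize.
h = H.sum(axis=1) / H.sum()
global_ranking = np.argsort(-h)
```

Each column of H plays the role of one user's preference vector, and h summarizes them into a single global preference over the features.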

Experiments

- Dataset: we considered 1500 movies (items) and 3500 distinct actors (features) from MovieLens, where each movie has at least 50 ratings.

[Figure: Precision@1 plots, including Precision@1 versus pco (prolificity cut-off), for FR-AGG-W-LS, FR-AGG-h-LS, FR-INDIV-MNMF and BLnb.]

[Figure: running time in seconds (log scale) versus n and versus l, for FR-AGG-W-LS, FR-AGG-h-LS, FR-INDIV-MNMF and FR-AGG-h-NF.]

- Both the aggregated and the individual user-item interaction methods produced better rankings than the baseline methods.
- The network flow method outperforms the other methods in terms of efficiency.
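The ranking quality in the plots is reported as Precision@1. A minimal sketch of that metric follows. The ground-truth sets and the predicted rankings are hypothetical, since the poster does not state how the ground truth was obtained.

```python
def precision_at_1(predicted_rankings, relevant_features):
    """Fraction of items whose top-ranked feature is in the relevant set."""
    hits = sum(
        1
        for item, ranking in predicted_rankings.items()
        if ranking and ranking[0] in relevant_features.get(item, set())
    )
    return hits / len(predicted_rankings)


# Hypothetical predictions and ground truth, for illustration only.
predicted = {
    "A Few Good Men": ["Tom Cruise", "Kevin Bacon"],
    "Sleepers": ["Kevin Bacon", "Brad Pitt"],
}
ground_truth = {
    "A Few Good Men": {"Tom Cruise"},
    "Sleepers": {"Brad Pitt"},
}

score = precision_at_1(predicted, ground_truth)
```

Only the first position of each ranking counts, so in this example one of the two items is scored as a hit.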


http://dbxlab.uta.edu/