Reputation Model Based On Rating Data and Application in
Recommender Systems
PhD Final Seminar
Student Name: Ahmad Abdel-Hafez
Principal Supervisor: Assoc. Prof. Yue Xu
Associate Supervisor: Assoc. Prof. Dian Tjondronegoro
Panel Members:
Assoc. Prof. Yue Xu (Chair)
Assoc. Prof. Richi Nayak (EECS)
Dr. Ernest Foo (EECS)
Assoc. Prof. Dian Tjondronegoro (IS)
Presentation Contents
• Background on reputation models and reputation-aware recommender systems.
• Research problems and objectives.
• Contributions.
• Normal distribution-based reputation model.
• Beta distribution-based reputation model.
• Reputation-aware recommender system.
• Evaluating the proposed methods.
• Conclusions.
Background - Reputation Models
• Many websites nowadays provide rating systems that let customers rate the available products.
• Reputation systems provide methods for collecting and aggregating users’ opinions in order to calculate product reputations.
• The naïve aggregation method is the arithmetic mean of the ratings.
Reputation System Components
• Pipeline: Feedback Collection → Reputation Engine → Reputation Presentation.
• Input: user feedback (ratings); output: item reputation score.
Reputation Engine
• Input: user feedback (ratings); output: item reputation score.
• Components:
– Detecting and filtering malicious ratings.
– Rating normalisation (user rating tendency).
– Relevant weight calculation: user trustworthiness, user reliability, user expertise, time decay.
– Reputation score discount or reward.
– Rating aggregation.
Background - Reputation Models
1. Weighted Mean:
– uses weights for ratings that represent:
• the reviewer’s reputation, expertise, or reliability [1,2,5,7];
• the time when the rating was given, assuming that newer ratings should carry more weight [1,3].
2. Bayesian Reputation Models:
– Jøsang and Haller [3] introduced a multinomial Bayesian probability distribution reputation system based on Dirichlet probability distribution.
Background - Reputation Models
3. Fuzzy Reputation Models:
– Bharadwaj and Al-Shamri [4] proposed a fuzzy computational model for trust and reputation, which uses fuzzy rules to aggregate the calculated reputation values.
4. Other Reputation Models:
– Flow reputation models: Google PageRank calculates a page’s reputation from the number of links pointing to it from other pages [5].
– Probabilistic reputation models: TRAVOS calculates user trust based on past direct transactions between users [6].
Background - Reputation-Aware Recommender Systems
• On the other hand, recommender systems are widely used on websites that contain a massive number of items, where personalisation becomes a necessity.
• Most available recommender systems focus on users’ preferences and rarely involve product reputation in the recommendation process.
Literature Review – Reputation-Aware Recommender Systems
1. Recently, Ku and Tai [15] investigated the effect of recommender and reputation systems on purchase intentions for recommended products.
2. Jøsang et al. [11] proposed the Cascading Minimum Common Belief Fusion method to combine reputation scores with recommendation scores.
3. Most recently, Wang et al. [8] proposed a trust-based probabilistic recommendation model. The trust here attaches to products and is derived from product reputations and purchase frequencies.
Research Problems - Problem 1: Sparse Dataset
• A sparse dataset is one in which the majority of items have a small rating count.
• Reputation systems depend on historical feedback to generate reputation scores for items. Therefore, when the available feedback is sparse, it becomes more difficult to produce accurate reputation scores.
Research Question 1:
How can statistical data be used in the rating aggregation process to enhance the accuracy of reputation scores over a sparse dataset?
Research Problems - Problem 2: Item Popularity
• An item is considered popular if it has a large rating count compared to the other items in the same dataset.
• A reputation score is assumed to reflect the popularity of an item; specifically, unpopular items should not have high reputation scores.
Research Question 2:
How can item popularity, represented by the count of ratings of an item, be reflected in the reputation calculation process?
Research Problems - Problem 3: Reputation and Recommendation
• In most Collaborative Filtering (CF) based recommender systems, item reputations are not considered as part of the recommendation process. This may produce a recommendation for an item with a low reputation score, which the user is unlikely to consume.
Research Question 3:
How can item reputation be incorporated into the recommendation process to provide more accurate product recommendations for users?
Research Objectives
• Objective 1: To provide a literature review of reputation systems and reputation-aware recommender systems.
• Objective 2: To propose a new reputation model that deals with the sparsity problem.
• Objective 3: To propose a new reputation model that considers item popularity in the item reputation score.
Research Objectives
• Objective 4: To propose a new method for merging item reputations with recommender-generated lists.
• Objective 5: To conduct experiments and evaluate the performance of the proposed approaches.
Contributions- In Reputation Models
• We propose two novel reputation models:
– The normal distribution-based reputation model with uncertainty (NDRU), which employs rating-level frequencies and an uncertainty factor, and produces more accurate reputation scores over sparse datasets.
– The beta distribution-based reputation model (BetaDR), which uses the item relative rating count in its rating aggregation process and beats the state-of-the-art methods in reputation score accuracy on several well-known datasets.
Contributions- In Recommender Systems
• We propose a reputation-aware top-n recommender system:
– We adopt the concept of voting systems and propose the weighted Borda count (WBC) method to combine item reputations with item recommendation scores.
– We propose using a personalised reputation-ranked list of items.
– We propose a method to calculate user coherence and use it to determine the contribution weights of item reputations and recommendation scores.
Definitions
• A rating level is one of the possible rating values that can be assigned to an item by a user.
• In a five-star rating system we have 5 rating levels. More instances of rating levels 1 and 2 than of levels 4 and 5 indicate that the item is not favoured by a large number of customers.
• Usually the middle rating levels, such as 3 in a [1–5] rating scale, are the most frequent (we call these “popular rating levels”), while 1 and 5 are the least frequent (we call these “rare rating levels”).
Motivation
• Using weighted mean reputation models, if we don’t consider other factors such as time and user credibility, then the weight for each rating is 1/n, where n is the number of ratings given to the item.

Weights for 7 ratings:

Rating | Rating Weight | Level Weight
2 | 0.1429 | 0.5714
2 | 0.1429 |
2 | 0.1429 |
2 | 0.1429 |
3 | 0.1429 | 0.1429
5 | 0.1429 | 0.2857
5 | 0.1429 |
Mean: 3.0
Normal Distribution-based Reputation Model (NDR)
• Our method can be described as a weighted mean, where the weights are generated using the normal distribution function.
• [Figure: weights for 100 ratings vs. unified weights for 100 ratings where each score has the same frequency (= 20).]
Rationale
• We propose to ‘award’ ratings at more frequent and popular rating levels, and ‘punish’ ratings at less frequent and rare rating levels.
Rating | Rating Weight (Naïve) | Rating Weight (NDR) | Level Weight (Naïve) | Level Weight (NDR)
2 | 0.1429 | 0.0765 | 0.5714 | 0.6042
2 | 0.1429 | 0.1334 | |
2 | 0.1429 | 0.1861 | |
2 | 0.1429 | 0.2080 | |
3 | 0.1429 | 0.1861 | 0.1429 | 0.1861
5 | 0.1429 | 0.1334 | 0.2857 | 0.2099
5 | 0.1429 | 0.0765 | |
Aggregate: Naïve 3.0, NDR 2.8158
Weighting Based on the Normal Distribution
• The weights for the ratings are calculated using the normal distribution density function:

a_i = ( 1 / (σ√(2π)) ) × e^( −(x_i − μ)² / (2σ²) ) ,   x_i = ( (k − 1) × i ) / (n − 1) + 1

• where a_i is the weight for the rating at index i, i = 0, …, n − 1; μ is the mean; σ is the standard deviation; and k is the number of levels in the rating system.
Calculating the NDR Reputation Score
• The final reputation score is calculated as a weighted mean over the rating levels, where LW_l is called the level weight:

w_i = a_i / Σ_{j=0..n−1} a_j ,   so that Σ_{i=0..n−1} w_i = 1

LW_l = Σ_{j=0..|R_l|−1} w_j^l ,   NDR_p = Σ_{l=1..k} l × LW_l

• where R_l is the set of ratings at level l and w_j^l is the normalised weight of the j-th rating at level l.
Enhanced NDR Model by Adding Uncertainty (NDRU)
• We slightly modify the proposed NDR method by incorporating an uncertainty factor, which is important for dealing with sparse datasets:

NDRU_p = Σ_{l=1..k} l × ( n × LW_l + C × b ) / ( C + n )

• where C = 2 is an a priori constant and b = 1/k is the base rate for any of the k rating values.
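The NDR weighting and the NDRU smoothing above can be sketched in Python. This is a minimal illustration, not the thesis implementation: in particular, setting μ and σ to the mean and standard deviation of the deployed x values is an assumption, so the exact weights may differ slightly from the slide example.

```python
import math

def ndr_weights(n, k):
    """Normal-distribution weights for n sorted ratings on a k-level scale."""
    xs = [(k - 1) * i / (n - 1) + 1 for i in range(n)]   # x_i deployed over [1, k]
    # Assumption: mu and sigma are the mean and std dev of the x values.
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    # The 1/(sigma * sqrt(2*pi)) factor cancels in the normalisation below.
    a = [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in xs]
    total = sum(a)
    return [ai / total for ai in a]                       # w_i, summing to 1

def ndr_score(ratings, k):
    """NDR_p = sum_l l * LW_l, with LW_l the total weight of ratings at level l."""
    ratings = sorted(ratings)
    w = ndr_weights(len(ratings), k)
    lw = {level: 0.0 for level in range(1, k + 1)}
    for rating, wi in zip(ratings, w):
        lw[rating] += wi
    return sum(level * lw[level] for level in lw), lw

def ndru_score(ratings, k, C=2):
    """NDRU: level weights smoothed towards the base rate b = 1/k."""
    n, b = len(ratings), 1 / k
    _, lw = ndr_score(ratings, k)
    return sum(level * (n * lw[level] + C * b) / (C + n) for level in lw)
```

On the earlier 7-rating example ([2, 2, 2, 2, 3, 5, 5]), the NDR score falls below the arithmetic mean of 3.0, because ratings at the rare tail levels are down-weighted.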
Motivation - Item Popularity Problem
• A less popular item (Item 1) is expected to have a lower reputation score than a popular item (Item 2) with a similar rating distribution.

Rating | Item 1 Frequency | Item 2 Frequency
1 | 1 | 25
2 | 1 | 25
3 | 1 | 25
4 | 1 | 25
5 | 5 | 125
Count | 9 | 225
Mean | 3.89 | 3.89
NDR | 4.089 | 4.052
Median | 5 | 5
Beta Distribution-Based Reputation Model (BetaDR)
• A weighted mean method in which the weights are generated using the beta distribution probability density function.
• The beta distribution is flexible enough to produce different distribution shapes, reflecting the weighting tendency suitable for each item.
• The proposed model considers statistical information about the dataset, including the ratings count.
Weighting Using the BetaDR
• The weights for the ratings are calculated using the beta distribution probability density function:

Beta(x_i) = ( Γ(α + β) / ( Γ(α) Γ(β) ) ) × x_i^(α−1) × (1 − x_i)^(β−1)   (1)

x_i = ( 0.98 × i ) / ( n − 1 ) + 0.01   (2)

• Equation (2) evenly deploys the values of x_i, ensuring 0 < x_i < 1, with x_0 = 0.01 and x_{n−1} = 0.99.
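A minimal Python sketch of Eqs. (1)–(2): the beta PDF evaluated at the evenly deployed points, then normalised to a unit sum as in the NDR weighting. Function and variable names are mine.

```python
import math

def beta_weights(n, alpha, beta):
    """Evaluate the beta PDF (Eq. 1) at the deployed points x_i (Eq. 2), then normalise."""
    coef = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    xs = [0.98 * i / (n - 1) + 0.01 for i in range(n)]   # x_0 = 0.01 ... x_{n-1} = 0.99
    w = [coef * x ** (alpha - 1) * (1 - x) ** (beta - 1) for x in xs]
    total = sum(w)
    return [wi / total for wi in w]
```

α = β = 5 yields a bell shape that favours mid-list ratings; α = β = 0.5 yields a U shape that favours the extremes.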
Symmetric vs Asymmetric Shapes
• Symmetric shapes ensure the reputation model is fair and unbiased. They occur when α = β.
• [Chart: examples of beta distributions with α, β > 1 over rating indexes 1–7 — Shape 1 (α < β), Shape 2 (α = β), Shape 3 (α > β).]
Symmetric Beta-Shapes
• The different symmetric shapes of the beta distribution; the x-axis represents rating indexes and the y-axis represents weights.
• [Chart: beta distribution probability density function (PDF) — U shape (α = β = 0.5), uniform distribution (α = β = 1), bell shape (α = β = 5).]
Shape Parameters Equation
α = β = (μ/σ)² × IRRC  if σ ≠ 0 ;   α = β = IRRC  if σ = 0   (3)

• We always use a symmetric shape for the beta distribution (α = β).
• α = β < 1: the beta distribution generates a “U” shape.
• α = β > 1: the beta distribution generates a “bell” shape.
IRRC
• IRRC (Item Relative Rating Count) is the ratio of the rating count of item p to the average rating count over all items in the dataset:

IRRC = n_p / n̄ ,   n̄ = ( Σ_{i∈P} n_i ) / |P|
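A sketch of the IRRC ratio and the shape-parameter rule of Eq. (3). The exponent placement in Eq. (3) — squaring the μ/σ ratio — is my reading of the slide layout, so treat it as an assumption.

```python
import math

def irrc(item_counts, item_id):
    """Item Relative Rating Count: the item's rating count over the dataset average."""
    avg = sum(item_counts.values()) / len(item_counts)
    return item_counts[item_id] / avg

def shape_parameter(ratings, rel_count):
    """Eq. (3): alpha = beta = (mu/sigma)^2 * IRRC when sigma != 0, else IRRC."""
    n = len(ratings)
    mu = sum(ratings) / n
    sigma = math.sqrt(sum((r - mu) ** 2 for r in ratings) / n)
    return (mu / sigma) ** 2 * rel_count if sigma > 0 else rel_count
```

Popular items (IRRC > 1) push α = β upwards, towards the bell shape; rarely rated items push it downwards, towards the U shape.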
Weights from Beta Distribution-based Reputation Model (BetaDR)
Example

Rating Level | Item 1 Freq. | Item 1 weights per level (Avg / NDR / BetaDR) | Item 2 Freq. | Item 2 weights per level (Avg / NDR / BetaDR)
1 | 1 | 0.111 / 0.060 / 0.396 | 25 | 0.111 / 0.069 / 0.025
2 | 1 | 0.111 / 0.091 / 0.043 | 25 | 0.111 / 0.096 / 0.083
3 | 1 | 0.111 / 0.123 / 0.027 | 25 | 0.111 / 0.122 / 0.134
4 | 1 | 0.111 / 0.147 / 0.022 | 25 | 0.111 / 0.140 / 0.168
5 | 5 | 0.556 / 0.578 / 0.511 | 125 | 0.556 / 0.573 / 0.590
Aggregate | 9 | 3.89 / 4.089 / 3.206 | 225 | 3.89 / 4.052 / 4.215
• Recommenders focus on generating personalised results without considering the global opinions of users about the recommended items.
• The reputation of an item reflects its quality, which can affect a user’s opinion.
• We propose a method to combine the recommender and reputation systems to enhance the accuracy of the top-n recommendation results.
Reputation-Aware Recommender System
A Block Diagram of the Proposed Reputation-Aware Recommender System
• Inputs: user profiles, item profiles, and historical rating data.
• Recommendation side: user ratings → user similarity → generating nearest neighbours → recommendation-ranked list of items.
• Reputation side: item ratings → item reputations → reputation-ranked list of items, refined through item clustering into personalised item reputations.
• Output: a combined list of item recommendations.
• The top items on the reputation-based list are not necessarily the items that a particular user likes.
• We cluster items based on user ratings or based on item categories.
• Personalised reputation is defined by demoting, in the reputation-ranked list, all items that do not belong to the user’s preference:

PIR_{u,p} = S(p) if p ∈ C_i and C_i ∈ F_u ;  0 otherwise

• where S(p) is the reputation score of item p, C_i is an item cluster, and F_u is the set of clusters preferred by user u.
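The PIR rule can be sketched as a simple filter: an item keeps its reputation score only when its cluster falls inside the user's preferred clusters. The dictionary-based interface is hypothetical.

```python
def personalized_reputation(rep_scores, item_cluster, preferred_clusters):
    """PIR_{u,p} = S(p) if the item's cluster C_i is in F_u, else 0."""
    return {p: (s if item_cluster.get(p) in preferred_clusters else 0.0)
            for p, s in rep_scores.items()}
```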
Weighted Borda-Count
• The BC method gives each item a number of points corresponding to its rank in each list:

BC(p) = BC_rec(p) + BC_rep(p)

• We introduce a weight into the traditional BC method to emphasise the difference between the two lists:

WBC(p) = ω × BC_rec(p) + (1 − ω) × BC_rep(p)
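The BC and WBC combination can be sketched as follows; the points scheme (the top of an n-item list gets n − 1 points) is the standard Borda convention, assumed here.

```python
def borda_points(ranked):
    """Standard Borda count: the item ranked first in an n-item list gets n - 1 points."""
    n = len(ranked)
    return {item: n - 1 - i for i, item in enumerate(ranked)}

def weighted_borda_count(rec_list, rep_list, omega):
    """WBC(p) = omega * BC_rec(p) + (1 - omega) * BC_rep(p); returns the re-ranked items."""
    bc_rec, bc_rep = borda_points(rec_list), borda_points(rep_list)
    items = set(rec_list) | set(rep_list)
    wbc = {p: omega * bc_rec.get(p, 0) + (1 - omega) * bc_rep.get(p, 0) for p in items}
    return sorted(items, key=lambda p: wbc[p], reverse=True)
```

With ω = 1 the combined list follows the recommender; with ω = 0 it follows the reputation list.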
WBC Example (with ω = 0.7)
User Coherence
ω_u = ( Σ_{c∈C} [ Σ_{p∈I(u,c)} ( r_{u,p} − r̄_{u,c} )² / |I(u,c)| ] ) / |C|

Coherence_u = − Σ_{c∈C} Σ_{p∈I(u,c)} ( r_{u,p} − r̄_{u,c} )²
• In order to define the value of ω that will enhance the accuracy of the recommender system, we propose to use user coherence.
• User coherence is defined as the stability of the rating levels a user gives to items of the same category [14].
• A coherent user has no problem getting accurate recommendations, unlike incoherent users.
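The coherence computation can be sketched as the average per-category variance of a user's ratings. This is a reconstruction of the slide's formula, so the exact normalisation (and how ω_u then maps to the mixing weight ω) is an assumption.

```python
def coherence_weight(user_ratings, item_category):
    """omega_u: mean over categories of the variance of the user's ratings in that
    category. Lower values indicate a more coherent (stable) rater."""
    by_cat = {}
    for item, r in user_ratings.items():
        by_cat.setdefault(item_category[item], []).append(r)
    total = 0.0
    for rs in by_cat.values():
        mean = sum(rs) / len(rs)
        total += sum((r - mean) ** 2 for r in rs) / len(rs)
    return total / len(by_cat)
```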
Evaluating the Proposed Reputation Models
• Hypothesis 1: Embedding the item relative rating count and the standard deviation of an item’s ratings in the reputation calculation will produce more accurate reputation scores on dense datasets.
• Hypothesis 2: Using an uncertainty parameter alongside the rating-level frequencies in the reputation calculation will produce more accurate reputation scores when the dataset is sparse.
Experiment 1: Ratings prediction
• The first experiment predicts an item’s ratings using the item’s reputation score.
• The hypothesis is that the more accurate the reputation model, the closer its generated scores are to the actual users’ ratings.
• The mean absolute error (MAE) metric is used to measure the prediction accuracy.
MAE = ( 1 / |P| ) Σ_{p∈P} ( Σ_{r∈R_p} | r_p − r | ) / |R_p|

where r_p is the reputation score of item p and R_p is the set of ratings given to p.
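A sketch of the MAE computation above, averaging the absolute error per item and then over items; the function and container names are mine.

```python
def reputation_mae(rep_scores, item_ratings):
    """MAE = (1/|P|) * sum_p ( sum_{r in R_p} |r_p - r| / |R_p| )."""
    total = 0.0
    for p, ratings in item_ratings.items():
        total += sum(abs(rep_scores[p] - r) for r in ratings) / len(ratings)
    return total / len(item_ratings)
```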
Experiment 2: Item Ranking List Similarity
• We compare two lists of items ranked based on their reputation scores generated using different methods.
• n_d and n_c are the numbers of discordant and concordant pairs between the two lists, respectively:

τ = ( n_c − n_d ) / ( ½ n(n − 1) )

n_d = |{(i, j) : A(i) < A(j), B(i) > B(j)}| ,   n_c = |{(i, j) : A(i) < A(j), B(i) < B(j)}|
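The Kendall τ similarity can be sketched with a straightforward O(n²) pair scan; this assumes both lists rank the same n items with no ties.

```python
def kendall_tau(list_a, list_b):
    """tau = (n_c - n_d) / (n(n-1)/2) over all item pairs of two rankings."""
    rank_a = {p: i for i, p in enumerate(list_a)}
    rank_b = {p: i for i, p in enumerate(list_b)}
    items = list(rank_a)
    n = len(items)
    n_c = n_d = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (rank_a[items[i]] - rank_a[items[j]]) * (rank_b[items[i]] - rank_b[items[j]])
            if sign > 0:
                n_c += 1      # concordant: same relative order in both lists
            elif sign < 0:
                n_d += 1      # discordant: opposite relative order
    return (n_c - n_d) / (n * (n - 1) / 2)
```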
Experiment 3: Item Ranking Accuracy
• The IMDb website provides a special calculation for the top-250 movies of all time.
• We compare the top-250 movies generated by each of the implemented reputation models with the IMDb-produced top-250 movies:

AP = ( Σ_{i∈C} P@i ) / |E|

G_p@t = 5 × ( 1 − | I_{p,imdb} − I_{p,rep} | / t ) if p ∈ Top-t_imdb ;  0 otherwise

DCG_p@t = Σ_{x=1..t} ( 2^{G_p@t} − 1 ) / log₂(x + 1) ,   nDCG_p@t = DCG_p@t / IDCG_p@t
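A sketch of the graded-gain nDCG above. It reconstructs the garbled formulas, so the exact gain definition (up to 5 points, shrinking with the normalised rank gap, zero outside IMDb's top-t) should be read as my interpretation.

```python
import math

def gain(p, imdb_rank, rep_rank, t):
    """G_p@t: up to 5 points, reduced by the normalised rank gap; 0 outside IMDb's top-t."""
    if imdb_rank.get(p, t + 1) > t:
        return 0.0
    return 5.0 * (1 - abs(imdb_rank[p] - rep_rank[p]) / t)

def ndcg_at(rep_list, imdb_rank, t):
    """nDCG@t = DCG@t / IDCG@t, with DCG@t = sum_x (2^G_x - 1) / log2(x + 1)."""
    rep_rank = {p: i + 1 for i, p in enumerate(rep_list)}
    gains = [gain(p, imdb_rank, rep_rank, t) for p in rep_list[:t]]
    dcg = sum((2 ** g - 1) / math.log2(x + 1) for x, g in enumerate(gains, 1))
    idcg = sum((2 ** g - 1) / math.log2(x + 1)
               for x, g in enumerate(sorted(gains, reverse=True), 1))
    return dcg / idcg if idcg > 0 else 0.0
```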
Experiment 4: Reputation-Aware Recommender Accuracy
• We implement the traditional user-based CF as the top-n recommender system.
• The method we use to combine the reputation models with recommender system is the proposed weighted Borda count method (WBC) presented in Chapter 5.
Precision = | Relevant ∩ Recommended | / | Recommended | ,   Recall = | Relevant ∩ Recommended | / | Relevant |

F1-Score = 2 × ( Precision × Recall ) / ( Precision + Recall )
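The three accuracy metrics above in a minimal sketch over recommended and relevant item sets:

```python
def precision_recall_f1(recommended, relevant):
    """Top-n accuracy metrics from the relevant/recommended set overlap."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```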
Merging Method
• [Diagram: the reputation model and the recommender system are combined through the merging method.]
Baseline Models
• Naïve method: arithmetic mean.
• IMDb: a true Bayesian estimation.
• Dirichlet reputation model [3] (2007).
• Fuzzy reputation model [4] (2009).
• PerContRep model [7] (2014).
• Trusti Model [8] (2015).
Datasets
• Dense datasets:

Dataset | #Users | #Items | #Ratings | #Rating Levels | Average Ratings Count Per Item (ARCPI)
ML-100K | 943 | 1,682 | 100,000 | 5 | 59.453
ML-1M | 6,040 | 3,952 | 1,000,209 | 5 | 253.089
ML-10M | 71,567 | 10,681 | 10,000,054 | 10 | 936.246
IMDb | – | 13,479 | 385,581,991 | 10 | 28,606.127

• Sparse datasets:

Dataset | #Users | #Items | #Ratings | #Rating Levels | Sparsity | ARCPI
Book Crossing | 17,854 | 113,481 | 277,906 | 10 | 0.99998 | 2.44892
Only 4 ratings per movie (4RPM) | 1,361 | 3,706 | 14,261 | 5 | 0.99717 | 3.84808
Only 6 ratings per movie (6RPM) | 1,760 | 3,706 | 21,054 | 5 | 0.99677 | 5.68105
Only 8 ratings per movie (8RPM) | 2,098 | 3,706 | 27,723 | 5 | 0.99643 | 7.48057
Results of Rating Prediction Experiment

Method/Dataset ML-100K ML-1M ML-10M
Naïve 0.796652 0.763668 1.437024
IMDb 0.781558 0.748331 1.401215
Fuzzy 0.795871 0.761167 1.430607
Dirichlet 0.796720 0.764013 1.437291
PerContRep 0.786162 0.752248 1.410826
Trusti 0.783254 0.750015 1.409872
NDR 0.791346 0.756602 1.423804
NDRU 0.790925 0.756756 1.424101
BetaDR 0.770330 0.732876 1.372640
Method/Dataset 4RPM 6RPM 8RPM Book Crossing Dataset
Naïve 0.5577 0.5610 0.5720 1.6957
IMDb 0.5601 0.5618 0.5721 1.6834
Fuzzy 0.5583 0.5628 0.5736 1.6922
Dirichlet 0.5351 0.5514 0.5705 1.6159
PerContRep 0.5602 0.5621 0.5728 1.6904
Trusti 0.5604 0.5622 0.5729 1.6908
NDR 0.5575 0.5598 0.5693 1.6874
NDRU 0.5339 0.5498 0.5676 1.5924
BetaDR 0.5571 0.5592 0.5689 1.6826
Results of Item Ranking List Similarity Experiment
[Charts: Kendall similarity (τ, from 0 to 1) of the NDR, NDRU, and BetaDR ranked lists, computed over the top X% of the ranked lists (X = 0–100%) — Kendall similarities with the naïve method using the ML-100K dataset, and Kendall similarities with the Trusti method using the 8RPM dataset.]
Method/Metric P@10 P@50 P@100 P@250 AP nDCG@10 nDCG@50 nDCG@100 nDCG@250
Naïve 0.1 0.12 0.23 0.304 0.2262 0.0009 0.0045 0.0129 0.0403
Fuzzy 0.1 0.12 0.25 0.312 0.2312 0.0009 0.0042 0.0110 0.0397
Dirichlet 0.1 0.2 0.27 0.336 0.2620 0.0009 0.0082 0.0185 0.0511
PerContRep 0.1 0.12 0.25 0.308 0.2345 0.0009 0.0045 0.0140 0.0417
Trusti 0.7 0.82 0.77 0.756 0.7671 0.5600 0.6212 0.5989 0.5822
NDR 0.1 0.1 0.19 0.28 0.1972 0.0009 0.0038 0.0100 0.0337
NDRU 0.1 0.14 0.19 0.308 0.2217 0.0009 0.0054 0.0114 0.0407
BetaDR 0.2 0.24 0.45 0.432 0.3712 0.1327 0.0277 0.0686 0.1133
Results of Item Ranking Accuracy Experiment
Method/Metric P@10 P@50 P@100 P@250 AP nDCG@10 nDCG@50 nDCG@100 nDCG@250
Naïve 0.0 0.0 0.0 0.0156 0.00288 0.0 0.0 0.0 0.00001
Fuzzy 0.0125 0.0185 0.02825 0.0758 0.02652 0.00121 0.00329 0.01642 0.09841
Dirichlet 0.365 0.575 0.5585 0.5882 0.56438 0.05065 0.16393 0.19453 0.24182
PerContRep 0.125 0.385 0.4515 0.5032 0.42906 0.00371 0.04145 0.07726 0.12991
Trusti 0.7 0.642 0.353 0.2604 0.41111 0.55510 0.50833 0.30792 0.17065
NDR 0.005 0.0055 0.00625 0.0746 0.02312 0.0 0.0 0.0 0.00088
NDRU 0.395 0.598 0.573 0.593 0.57775 0.05813 0.17051 0.19782 0.26946
BetaDR 0.095 0.02 0.0115 0.1282 0.03529 0.05877 0.01245 0.00635 0.00829
Reputation Method Used with CF | ML-100K | ML-1M | ML-10M
(each dataset: Precision, Recall, F1-score)
N/A 0.0236 0.0412 0.0300 0.0301 0.0327 0.0314 0.0379 0.0306 0.0339
IMDb 0.0305 0.0532 0.0388 0.0369 0.0596 0.0456 0.0401 0.0718 0.0515
Fuzzy 0.0283 0.0494 0.0359 0.0271 0.0399 0.0323 0.0311 0.0407 0.0353
Dirichlet 0.0297 0.0519 0.0377 0.0351 0.0539 0.0425 0.0382 0.0523 0.0442
PerContRep 0.0282 0.0491 0.0358 0.0264 0.0391 0.0316 0.0304 0.0389 0.0341
Trusti 0.0301 0.0519 0.0381 0.0360 0.0561 0.0439 0.0395 0.0681 0.0500
NDR 0.0286 0.0494 0.0362 0.0289 0.0439 0.0349 0.0349 0.0442 0.0390
NDRU 0.0297 0.0518 0.0377 0.0352 0.0537 0.0425 0.0385 0.0533 0.0447
BetaDR 0.0307 0.0540 0.0392 0.0380 0.0643 0.0478 0.0412 0.0791 0.0542
Reputation Method Used with CF | 4RPM | 6RPM | 8RPM
(each dataset: Precision, Recall, F1-score)
N/A 0.0080 0.1459 0.0152 0.0088 0.1575 0.0167 0.0096 0.1601 0.0181
IMDb 0.0067 0.0747 0.0123 0.0078 0.1027 0.0145 0.0088 0.1396 0.0166
Fuzzy 0.0071 0.1319 0.0135 0.0084 0.1445 0.0159 0.0096 0.1472 0.0180
Dirichlet 0.0082 0.1468 0.0155 0.0095 0.1593 0.0179 0.0105 0.1629 0.0197
PerContRep 0.0075 0.1362 0.0142 0.0087 0.1481 0.0164 0.0098 0.1506 0.0184
Trusti 0.0043 0.0486 0.0079 0.0064 0.0826 0.0119 0.0083 0.0967 0.0153
NDR 0.0080 0.1461 0.0152 0.0090 0.1581 0.0170 0.0102 0.1611 0.0192
NDRU 0.0083 0.1479 0.0157 0.0097 0.1601 0.0183 0.0111 0.1646 0.0208
BetaDR 0.0081 0.1461 0.0154 0.0093 0.1579 0.0176 0.0103 0.1620 0.0194
Results of Reputation-aware Recommender Accuracy Experiment
Evaluating the Reputation-Aware Recommender System
• Hypothesis 3: The accuracy of recommender systems can be enhanced by combining the item reputation factor using WBC as a merging method. The accuracy enhancement takes place with both sparse and dense datasets.
Recommender Systems and Reputation Models Employed
• We use the BetaDR reputation model in the dense-dataset evaluation and the NDRU reputation model in the sparse-dataset evaluation.
• We use two recommender systems for the top-n recommendation experiment:
– The traditional user-based CF [12].
– The reliability-aware recommender system [13].
Datasets
• For the dense datasets, we use the ML-100K and ML-1M datasets described before.
• For the sparse datasets:

Statistic | MovieLens 5% (ML5) | MovieLens 10% (ML10)
Number of ratings | 6,515 | 13,077
Sparsity | 0.99589 | 0.99175
Minimum ratings per user | 5 | 10
Maximum ratings per user | 36 | 73
Average ratings per user | 6.849 | 13.867
Minimum ratings per movie | 0 | 0
Maximum ratings per movie | 59 | 114
Average ratings per movie | 3.840 | 7.774
Baseline Methods
• Borda Count Method (BC) [9].
• Coombs Method [10].
• Baldwin Method [9].
• Proportional Representation (naïve method).
• CasMin Method [11] (2013).
• A Trust-Based Probabilistic Recommendation Model (Trusti) [8] (2015).
Results on Dense Dataset

Method Used to Combine Reputation with TCF | ML-100K | ML-1M
(each dataset: Precision, Recall, F1-score)
TCF 0.0257 0.0446 0.0326 0.0238 0.0281 0.0258
Naïve (PR) 0.0218 0.0391 0.0280 0.0206 0.0244 0.0223
BC 0.0455 0.0728 0.0560 0.0401 0.0585 0.0476
Baldwin 0.0295 0.0522 0.0377 0.0297 0.0388 0.0337
Coombs 0.0293 0.0512 0.0373 0.0286 0.0383 0.0328
CasMin 0.0078 0.0154 0.0104 0.0061 0.0141 0.0085
Trusti 0.0326 0.0543 0.0407 0.0340 0.0460 0.0391
WBC 0.0476 0.0832 0.0606 0.0412 0.0594 0.0486
WBC-P 0.0624 0.1199 0.0820 0.0447 0.0698 0.0545
Method Used to Combine Reputation with RA-CF | ML-100K | ML-1M
(each dataset: Precision, Recall, F1-score)
RA-CF 0.0338 0.0555 0.0420 0.0272 0.0331 0.0299
Naïve (PR) 0.0289 0.0473 0.0359 0.0248 0.0314 0.0277
BC 0.0459 0.0695 0.0553 0.0318 0.0474 0.0380
Baldwin 0.0386 0.0630 0.0479 0.0296 0.0418 0.0346
Coombs 0.0385 0.0628 0.0477 0.0266 0.0409 0.0322
CasMin 0.0092 0.0169 0.0119 0.0075 0.0183 0.0106
Trusti 0.0413 0.0655 0.0506 0.0352 0.0480 0.0406
WBC 0.0506 0.0786 0.0616 0.0397 0.0663 0.0497
WBC-P 0.0626 0.1195 0.0821 0.0423 0.0738 0.0537
Results on Sparse Dataset

Method Used to Combine Reputation with TCF | ML5 | ML10
(each dataset: Precision, Recall, F1-score)
TCF 0.0023 0.0410 0.0044 0.0028 0.0265 0.0051
Naïve (PR) 0.0012 0.0187 0.0022 0.0023 0.0218 0.0041
BC 0.0056 0.0985 0.0106 0.0074 0.0832 0.0136
Baldwin 0.0030 0.0516 0.0057 0.0031 0.0308 0.0057
Coombs 0.0030 0.0515 0.0056 0.0030 0.0303 0.0055
CasMin 0.0004 0.0046 0.0007 0.0005 0.0051 0.0009
Trusti 0.0029 0.0517 0.0056 0.0029 0.0288 0.0053
WBC 0.0059 0.0999 0.0111 0.0078 0.0888 0.0143
WBC-P 0.0211 0.3837 0.0400 0.0219 0.2531 0.0403
Method Used to Combine Reputation with RA-CF | ML5 | ML10
(each dataset: Precision, Recall, F1-score)
RA-CF 0.0229 0.4137 0.0435 0.0250 0.3009 0.0462
Naïve (PR) 0.0223 0.4034 0.0423 0.0236 0.2850 0.0436
BC 0.0095 0.1708 0.0180 0.0132 0.1542 0.0244
Baldwin 0.0223 0.4021 0.0422 0.0247 0.2966 0.0455
Coombs 0.0222 0.4020 0.0420 0.0242 0.2956 0.0447
CasMin 0.0012 0.0187 0.0023 0.0019 0.0234 0.0035
Trusti 0.0204 0.4071 0.0389 0.0216 0.3055 0.0403
WBC 0.0230 0.4217 0.0436 0.0255 0.3019 0.0470
WBC-P 0.0245 0.4397 0.0464 0.0337 0.3975 0.0621
Conclusions
• The first reputation model we propose is the NDRU model.
– It uses the normal distribution to generate rating weights.
– It employs the uncertainty factor.
– It is more accurate when used with sparse datasets.
• The second reputation model we propose is the BetaDR model.
– It uses the beta distribution to generate the weights for the ratings.
– It uses the item relative rating count (IRRC), together with the standard deviation and mean of the item’s ratings.
– This model proved to produce more accurate results when used with dense datasets.
Conclusions (Cont.)
• We propose the weighted Borda count (WBC) method to combine reputation scores with recommender systems in order to enhance the accuracy of top-n recommendations.
• We propose generating personalised reputation scores for each user to refine reputation scores and make them useful in recommender systems.
• We observed that merging reputation models with recommenders has the potential to enhance recommendation accuracy.
Relevant Publications
• Journal Papers
1. Abdel-Hafez, A., & Xu, Y. (2013). A survey of user modelling in social media websites. Computer and Information Science, 6(4), pp. 59-71.
2. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2015). A normal-distribution based rating aggregation method for generating product reputations. Web Intelligence, 13(1), pp. 43-51. IOS Press.
• Conference Papers
3. Abdel-Hafez, A., Xu, Y., & Tjondronegoro, D. (2012). Product reputation model: an opinion mining based approach. Paper presented at the 1st International Workshop on Sentiment Discovery from Affective Data (SDAD’12), pp. 16-20, CEUR workshop.
4. Abdel-Hafez, A., & Xu, Y. (2013). Ontology-based product's reputation model. Paper presented at the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), pp. 37-40. IEEE.
5. Abdel-Hafez, A., Tang, X., Tian, N., & Xu, Y. (2014). A reputation-enhanced recommender system. Paper presented at Advanced Data Mining and Applications (ADMA’14), pp. 185-198. Springer International Publishing.
6. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2014). A normal-distribution based reputation model. Paper presented at the Trust, Privacy, and Security in Digital Business (TrustBus’14), pp. 144-155. Springer International Publishing.
7. Abdel-Hafez, A., Phung, Q. V., & Xu, Y. (2014). Utilizing voting systems for ranking user tweets. Paper presented at the 2014 Recommender Systems Challenge (RecSysChallenge’14), pp. 23-28. ACM.
8. Abdel-Hafez, A., Xu, Y., & Tian, N. (2014). Item reputation-aware recommender systems. Paper presented at the 16th International Conference on Information Integration and Web-based Applications & Services (iiWAS’14), pp. 79-86. ACM.
9. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2014). A rating aggregation method for generating product reputations. Paper presented at the 25th ACM conference on Hypertext and social media (Hypertext’14), pp. 291-293.ACM.
10. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2015). An accurate rating aggregation method for generating item reputation. Paper to be presented at the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA'15). IEEE. (Accepted)
11. Abdel-Hafez, A., & Xu, Y. (2015). Exploiting the beta distribution-based reputation model in recommender system. Paper to be presented at the 28th Australasian Joint Conference on Artificial Intelligence. (Accepted)
References
[1] Ayday, E., Lee, H., & Fekri, F. (2009). An iterative algorithm for trust and reputation management. In IEEE International Symposium on Information Theory (ISIT), 2051-2055, IEEE.
[2] Riggs, T., & Wilensky, R. (2001). An algorithm for automated rating of reviewers. Paper presented at the Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries.
[3] Jøsang, A., & Haller, J. (2007). Dirichlet reputation systems. In Availability, Reliability and Security (ARES 2007), The Second International Conference on, 112-119, IEEE.
[4] Bharadwaj, K. K., & Al-Shamri, M. Y. H. (2009). Fuzzy computational models for trust and reputation systems. Electronic Commerce Research and Applications, 8(1), 37-47.
[5] Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: bringing order to the web.
[6] Teacy, W. L., Patel, J., Jennings, N. R., & Luck, M. (2006). Travos: Trust and reputation in the context of inaccurate information sources. Autonomous Agents and Multi-Agent Systems, 12(2), 183-198.
[7] Yan, Z., Chen, Y., & Shen, Y. (2014). PerContRep: a practical reputation system for pervasive content services. The Journal of Supercomputing, 70(3), 1051-1074.
[8] Wang, Y., Yin, G., Cai, Z., Dong, Y., & Dong, H. (2015). A trust-based probabilistic recommendation model for social networks. Journal of Network and Computer Applications, 55(2015), 59-67.
[9] De Grazia, A. (1953). Mathematical derivation of an election system. Isis, 44(1/2), 42-51.
[10] Coombs, C. H. (1964). A theory of data. New York: Wiley.
[11] Jøsang, A., Pini, M. S., Santini, F., & Xu, Y. (2013). Combining Recommender and Reputation Systems to Produce Better Online Advice. Modeling Decisions for Artificial Intelligence, 8234, 126-138.
[12] Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2000). Analysis of recommendation algorithms for e-commerce. In Proceedings of the 2nd ACM Conference on Electronic Commerce, 158-167, ACM.
[13] Hernando, A., Bobadilla, J., Ortega, F., & Tejedor, J. (2013). Incorporating reliability measurements into the predictions of a recommender system. Information Sciences, 218(2013), 1-16.
[14] Said, A., Jain, B. J., Narr, S., & Plumbaum, T. (2012). Users and noise: The magic barrier of recommender systems. In User Modeling, Adaptation, and Personalization (pp. 237-248). Springer International Publishing.
[15] Ku, Y.-C., & Tai, Y.-M. (2013). What Happens When Recommendation System Meets Reputation System? The Impact of Recommendation Information on Purchase Intention. In System Sciences (HICSS), 2013 46th Hawaii International Conference on, 1376-1383, IEEE.
Thank you.
Questions?