Reputation Model Based On Rating Data and Application in
Recommender Systems
PhD Final Seminar
Student Name: Ahmad Abdel-Hafez
Principal Supervisor: Assoc. Prof. Yue Xu
Associate Supervisor: Assoc. Prof. Dian Tjondronegoro
Panel Members:
Assoc. Prof. Yue Xu (Chair)
Assoc. Prof. Richi Nayak (EECS)
Dr. Ernest Foo (EECS)
Assoc. Prof. Dian Tjondronegoro (IS)
Presentation Contents
• Background on reputation models and reputation-aware recommender systems.
• Research problems and objectives.
• Contributions.
• Normal distribution-based reputation model.
• Beta distribution-based reputation model.
• Reputation-aware recommender system.
• Evaluating the proposed methods.
• Conclusions.
Background - Reputation Models
• Many websites nowadays provide rating systems that let customers rate the available products.
• Reputation systems provide methods for collecting and aggregating users’ opinions in order to calculate product reputations.
• The naïve aggregation method is the arithmetic mean of the ratings.
Reputation System Components
• Pipeline: Feedback Collection → Reputation Engine → Reputation Presentation.
• Input: user feedback (ratings); output: item reputation score.
Reputation Engine
• Input: user feedback (ratings); output: item reputation score.
• Components:
– Detecting and filtering malicious ratings.
– Rating normalisation (user rating tendency).
– Relevant weight calculation: user trustworthiness, user reliability, user expertise, time decay.
– Reputation score discount or reward.
– Rating aggregation.
Background - Reputation Models
1. Weighted Mean:
– uses weights for ratings that represent:
• the reviewer’s reputation, expertise, or reliability [1,2,5,7];
• the time when the rating was given, assuming that newer ratings should carry more weight [1,3].
2. Bayesian Reputation Models:
– Jøsang and Haller [3] introduced a multinomial Bayesian probability distribution reputation system based on Dirichlet probability distribution.
Background - Reputation Models
3. Fuzzy Reputation Models:
– Bharadwaj and Al-Shamri [4] proposed a fuzzy computational model for trust and reputation, which uses fuzzy rules to aggregate the calculated reputation values.
4. Other Reputation Models:
– Flow reputation models: Google PageRank calculates a page’s reputation from the number of links pointing to it from other pages [5].
– Probabilistic reputation models: TRAVOS calculates user trust based on past direct transactions between users [6].
Background - Reputation-Aware Recommender Systems
• On the other hand, recommender systems are widely used on websites that contain a massive number of items, where personalisation becomes a necessity.
• Most available recommender systems focus on users’ preferences and rarely involve product reputation in the recommendation process.
Literature Review – Reputation-Aware Recommender Systems
1. Recently, Ku and Tai [15] investigated the effect of recommender and reputation systems on purchase intentions for recommended products.
2. Jøsang et al. [11] proposed the Cascading Minimum Common Belief Fusion method to combine reputation scores with recommendation scores.
3. Most recently, Wang et al. [8] proposed a trust-based probabilistic recommendation model. The trust here attaches to products and is derived from product reputations and purchase frequencies.
Research Problems - Problem 1: Sparse Dataset
• A sparse dataset is one in which the majority of items have a small rating count.
• Reputation systems depend on historical feedback to generate reputation scores for items. Therefore, when the available feedback is sparse, it becomes more difficult to produce accurate reputation scores.
Research Question 1:
How can statistical data be used in the rating aggregation process to enhance the accuracy of reputation scores over a sparse dataset?
Research Problems - Problem 2: Item Popularity
• An item is considered popular if it has a large rating count compared to the other items in the same dataset.
• A reputation score is assumed to reflect the popularity of an item; specifically, unpopular items should not have high reputation scores.
Research Question 2:
How can item popularity, represented by the count of ratings of an item, be reflected in the reputation calculation process?
Research Problems - Problem 3: Reputation and Recommendation
• In most Collaborative Filtering (CF) based recommender systems, item reputations are not considered as part of the recommendation process. This may produce a recommendation for an item with a low reputation score, which the user is unlikely to consume.
Research Question 3:
How can item reputation be incorporated into the recommendation process to provide more accurate product recommendations for users?
Research Objectives
• Objective 1: To provide a literature review of reputation systems and reputation-aware recommender systems.
• Objective 2: To propose a new reputation model that deals with the sparsity problem.
• Objective 3: To propose a new reputation model that considers item popularity in the item reputation score.
Research Objectives
• Objective 4: To propose a new method for merging item reputations with recommender-generated lists.
• Objective 5: To conduct experiments and evaluate the performance of the proposed approaches.
Contributions- In Reputation Models
• We propose two novel reputation models:
– The normal distribution-based reputation model with uncertainty (NDRU), which employs rating-level frequencies and an uncertainty factor, and produces more accurate reputation scores over sparse datasets.
– The beta distribution-based reputation model (BetaDR), which uses the item relative rating count in its rating aggregation process and beats the state-of-the-art methods in reputation score accuracy on several well-known datasets.
Contributions- In Recommender Systems
• We propose a reputation-aware top-n recommender system:
– We adopt the concept of voting systems and propose the weighted Borda count (WBC) method to combine item reputations with item recommendation scores.
– We propose using a personalised reputation-ranked list of items.
– We propose a method to calculate user coherence and use it to determine the contribution weights of item reputations and recommendation scores.
Definitions
• A rating level is one of the possible rating values that can be assigned to an item by a user.
• In a five-star rating system we have 5 rating levels. More instances of rating levels 1 and 2 than of levels 4 and 5 indicate that the item is not favoured by a large number of customers.
• Usually the middle rating levels, such as 3 in a [1–5] rating scale, are the most frequent (we call these “popular rating levels”), while 1 and 5 are the least frequent (we call these “rare rating levels”).
Motivation
• Using weighted mean reputation models, if we don’t consider other factors such as time and user credibility, then the weight for each rating is 1/n, where n is the number of ratings given to the item.

Weights for 7 ratings:

Rating | Rating Weight | Level Weight
2 | 0.1429 | 0.5714
2 | 0.1429 |
2 | 0.1429 |
2 | 0.1429 |
3 | 0.1429 | 0.1429
5 | 0.1429 | 0.2857
5 | 0.1429 |
Mean: 3.0
Normal Distribution-based Reputation Model (NDR)
• Our method can be described as a weighted mean, where the weights are generated using the normal distribution function.
• [Figure: weights for 100 ratings vs. unified weights for 100 ratings where each score has the same frequency (= 20).]
Rationale
• We propose to ‘award’ ratings at more frequent and popular rating levels, and ‘punish’ ratings at less frequent and rare rating levels.
Rating | Rating Weight (Naïve) | Rating Weight (NDR) | Level Weight (Naïve) | Level Weight (NDR)
2 | 0.1429 | 0.0765 | 0.5714 | 0.6042
2 | 0.1429 | 0.1334 | |
2 | 0.1429 | 0.1861 | |
2 | 0.1429 | 0.2080 | |
3 | 0.1429 | 0.1861 | 0.1429 | 0.1861
5 | 0.1429 | 0.1334 | 0.2857 | 0.2099
5 | 0.1429 | 0.0765 | |
Aggregate: Naïve 3.0, NDR 2.8158
Weighting Based on the Normal Distribution
• The weights for the ratings are calculated using the normal distribution density function:

a_i = ( 1 / (σ√(2π)) ) × e^( −(x_i − μ)² / (2σ²) ) ,   x_i = ( (k − 1) × i ) / (n − 1) + 1

• where a_i is the weight for the rating at index i, i = 0, …, n − 1; μ is the mean; σ is the standard deviation; and k is the number of levels in the rating system.
Calculating the NDR Reputation Score
• The final reputation score is calculated as a weighted mean over the rating levels, where LW_l is called the level weight:

w_i = a_i / Σ_{j=0..n−1} a_j ,   so that Σ_{i=0..n−1} w_i = 1

LW_l = Σ_{j=0..|R_l|−1} w_j^l ,   NDR_p = Σ_{l=1..k} l × LW_l

• where R_l is the set of ratings at level l and w_j^l is the normalised weight of the j-th rating at level l.
Enhanced NDR Model by Adding Uncertainty (NDRU)
• We slightly modify the proposed NDR method by incorporating an uncertainty factor, which is important for dealing with sparse datasets:

NDRU_p = Σ_{l=1..k} l × ( n × LW_l + C × b ) / ( C + n )

• where C = 2 is an a priori constant and b = 1/k is the base rate for any of the k rating values.
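The NDR weighting and the NDRU smoothing above can be sketched in Python. This is a minimal illustration, not the thesis implementation: in particular, setting μ and σ to the mean and standard deviation of the deployed x values is an assumption, so the exact weights may differ slightly from the slide example.

```python
import math

def ndr_weights(n, k):
    """Normal-distribution weights for n sorted ratings on a k-level scale."""
    xs = [(k - 1) * i / (n - 1) + 1 for i in range(n)]   # x_i deployed over [1, k]
    # Assumption: mu and sigma are the mean and std dev of the x values.
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    # The 1/(sigma * sqrt(2*pi)) factor cancels in the normalisation below.
    a = [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in xs]
    total = sum(a)
    return [ai / total for ai in a]                       # w_i, summing to 1

def ndr_score(ratings, k):
    """NDR_p = sum_l l * LW_l, with LW_l the total weight of ratings at level l."""
    ratings = sorted(ratings)
    w = ndr_weights(len(ratings), k)
    lw = {level: 0.0 for level in range(1, k + 1)}
    for rating, wi in zip(ratings, w):
        lw[rating] += wi
    return sum(level * lw[level] for level in lw), lw

def ndru_score(ratings, k, C=2):
    """NDRU: level weights smoothed towards the base rate b = 1/k."""
    n, b = len(ratings), 1 / k
    _, lw = ndr_score(ratings, k)
    return sum(level * (n * lw[level] + C * b) / (C + n) for level in lw)
```

On the earlier 7-rating example ([2, 2, 2, 2, 3, 5, 5]), the NDR score falls below the arithmetic mean of 3.0, because ratings at the rare tail levels are down-weighted.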
Motivation - Item Popularity Problem
• A less popular item (Item 1) is expected to have a lower reputation score than a popular item (Item 2) with a similar rating distribution.

Rating | Item 1 Frequency | Item 2 Frequency
1 | 1 | 25
2 | 1 | 25
3 | 1 | 25
4 | 1 | 25
5 | 5 | 125
Count | 9 | 225
Mean | 3.89 | 3.89
NDR | 4.089 | 4.052
Median | 5 | 5
Beta Distribution-Based Reputation Model (BetaDR)
• A weighted mean method in which the weights are generated using the beta distribution probability density function.
• The beta distribution is flexible enough to produce different distribution shapes, reflecting the weighting tendency suitable for each item.
• The proposed model considers statistical information about the dataset, including the ratings count.
Weighting Using the BetaDR
• The weights for the ratings are calculated using the beta distribution probability density function:

Beta(x_i) = ( Γ(α + β) / ( Γ(α) Γ(β) ) ) × x_i^(α−1) × (1 − x_i)^(β−1)   (1)

x_i = ( 0.98 × i ) / ( n − 1 ) + 0.01   (2)

• Equation (2) evenly deploys the values of x_i, ensuring 0 < x_i < 1, with x_0 = 0.01 and x_{n−1} = 0.99.
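A minimal Python sketch of Eqs. (1)–(2): the beta PDF evaluated at the evenly deployed points, then normalised to a unit sum as in the NDR weighting. Function and variable names are mine.

```python
import math

def beta_weights(n, alpha, beta):
    """Evaluate the beta PDF (Eq. 1) at the deployed points x_i (Eq. 2), then normalise."""
    coef = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    xs = [0.98 * i / (n - 1) + 0.01 for i in range(n)]   # x_0 = 0.01 ... x_{n-1} = 0.99
    w = [coef * x ** (alpha - 1) * (1 - x) ** (beta - 1) for x in xs]
    total = sum(w)
    return [wi / total for wi in w]
```

α = β = 5 yields a bell shape that favours mid-list ratings; α = β = 0.5 yields a U shape that favours the extremes.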
Symmetric vs Asymmetric Shapes
• Symmetric shapes ensure the reputation model is fair and unbiased. They occur when α = β.
• [Chart: examples of beta distributions with α, β > 1 over rating indexes 1–7 — Shape 1 (α < β), Shape 2 (α = β), Shape 3 (α > β).]
Symmetric Beta-Shapes
• The different symmetric shapes of the beta distribution; the x-axis represents rating indexes and the y-axis represents weights.
• [Chart: beta distribution probability density function (PDF) — U shape (α = β = 0.5), uniform distribution (α = β = 1), bell shape (α = β = 5).]
Shape Parameters Equation
α = β = (μ/σ)² × IRRC  if σ ≠ 0 ;   α = β = IRRC  if σ = 0   (3)

• We always use a symmetric shape for the beta distribution (α = β).
• α = β < 1: the beta distribution generates a “U” shape.
• α = β > 1: the beta distribution generates a “bell” shape.
IRRC
• IRRC (Item Relative Rating Count) is the ratio of the rating count of item p to the average rating count over all items in the dataset:

IRRC = n_p / n̄ ,   n̄ = ( Σ_{i∈P} n_i ) / |P|
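A sketch of the IRRC ratio and the shape-parameter rule of Eq. (3). The exponent placement in Eq. (3) — squaring the μ/σ ratio — is my reading of the slide layout, so treat it as an assumption.

```python
import math

def irrc(item_counts, item_id):
    """Item Relative Rating Count: the item's rating count over the dataset average."""
    avg = sum(item_counts.values()) / len(item_counts)
    return item_counts[item_id] / avg

def shape_parameter(ratings, rel_count):
    """Eq. (3): alpha = beta = (mu/sigma)^2 * IRRC when sigma != 0, else IRRC."""
    n = len(ratings)
    mu = sum(ratings) / n
    sigma = math.sqrt(sum((r - mu) ** 2 for r in ratings) / n)
    return (mu / sigma) ** 2 * rel_count if sigma > 0 else rel_count
```

Popular items (IRRC > 1) push α = β upwards, towards the bell shape; rarely rated items push it downwards, towards the U shape.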
Weights from Beta Distribution-based Reputation Model (BetaDR)
Example

Rating Level | Item 1 Freq. | Item 1 weights per level (Avg / NDR / BetaDR) | Item 2 Freq. | Item 2 weights per level (Avg / NDR / BetaDR)
1 | 1 | 0.111 / 0.060 / 0.396 | 25 | 0.111 / 0.069 / 0.025
2 | 1 | 0.111 / 0.091 / 0.043 | 25 | 0.111 / 0.096 / 0.083
3 | 1 | 0.111 / 0.123 / 0.027 | 25 | 0.111 / 0.122 / 0.134
4 | 1 | 0.111 / 0.147 / 0.022 | 25 | 0.111 / 0.140 / 0.168
5 | 5 | 0.556 / 0.578 / 0.511 | 125 | 0.556 / 0.573 / 0.590
Aggregate | 9 | 3.89 / 4.089 / 3.206 | 225 | 3.89 / 4.052 / 4.215
• Recommenders focus on generating personalised results without considering the global opinions of users about the recommended items.
• The reputation of an item reflects its quality, which can affect a user’s opinion.
• We propose a method to combine the recommender and reputation systems to enhance the accuracy of the top-n recommendation results.
Reputation-Aware Recommender System
A Block Diagram of the Proposed Reputation-Aware Recommender System
• Inputs: user profiles, item profiles, and historical rating data.
• Recommendation side: user ratings → user similarity → generating nearest neighbours → recommendation-ranked list of items.
• Reputation side: item ratings → item reputations → reputation-ranked list of items, refined through item clustering into personalised item reputations.
• Output: a combined list of item recommendations.
• The top items on the reputation-based list are not necessarily the items that a particular user likes.
• We cluster items based on user ratings or based on item categories.
• Personalised reputation is defined by demoting, in the reputation-ranked list, all items that do not belong to the user’s preference:

PIR_{u,p} = S(p) if p ∈ C_i and C_i ∈ F_u ;  0 otherwise

• where S(p) is the reputation score of item p, C_i is an item cluster, and F_u is the set of clusters preferred by user u.
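The PIR rule can be sketched as a simple filter: an item keeps its reputation score only when its cluster falls inside the user's preferred clusters. The dictionary-based interface is hypothetical.

```python
def personalized_reputation(rep_scores, item_cluster, preferred_clusters):
    """PIR_{u,p} = S(p) if the item's cluster C_i is in F_u, else 0."""
    return {p: (s if item_cluster.get(p) in preferred_clusters else 0.0)
            for p, s in rep_scores.items()}
```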
Weighted Borda-Count
• The BC method gives each item a number of points corresponding to its rank in each list:

BC(p) = BC_rec(p) + BC_rep(p)

• We introduce a weight into the traditional BC method to emphasise the difference between the two lists:

WBC(p) = ω × BC_rec(p) + (1 − ω) × BC_rep(p)
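The BC and WBC combination can be sketched as follows; the points scheme (the top of an n-item list gets n − 1 points) is the standard Borda convention, assumed here.

```python
def borda_points(ranked):
    """Standard Borda count: the item ranked first in an n-item list gets n - 1 points."""
    n = len(ranked)
    return {item: n - 1 - i for i, item in enumerate(ranked)}

def weighted_borda_count(rec_list, rep_list, omega):
    """WBC(p) = omega * BC_rec(p) + (1 - omega) * BC_rep(p); returns the re-ranked items."""
    bc_rec, bc_rep = borda_points(rec_list), borda_points(rep_list)
    items = set(rec_list) | set(rep_list)
    wbc = {p: omega * bc_rec.get(p, 0) + (1 - omega) * bc_rep.get(p, 0) for p in items}
    return sorted(items, key=lambda p: wbc[p], reverse=True)
```

With ω = 1 the combined list follows the recommender; with ω = 0 it follows the reputation list.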
WBC Example (with ω = 0.7)
User Coherence
ω_u = ( Σ_{c∈C} [ Σ_{p∈I(u,c)} ( r_{u,p} − r̄_{u,c} )² / |I(u,c)| ] ) / |C|

Coherence_u = − Σ_{c∈C} Σ_{p∈I(u,c)} ( r_{u,p} − r̄_{u,c} )²
• In order to define the value of ω that will enhance the accuracy of the recommender system, we propose to use user coherence.
• User coherence is defined as the stability of the rating levels a user gives to items of the same category [14].
• A coherent user has no problem getting accurate recommendations, unlike incoherent users.
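The coherence computation can be sketched as the average per-category variance of a user's ratings. This is a reconstruction of the slide's formula, so the exact normalisation (and how ω_u then maps to the mixing weight ω) is an assumption.

```python
def coherence_weight(user_ratings, item_category):
    """omega_u: mean over categories of the variance of the user's ratings in that
    category. Lower values indicate a more coherent (stable) rater."""
    by_cat = {}
    for item, r in user_ratings.items():
        by_cat.setdefault(item_category[item], []).append(r)
    total = 0.0
    for rs in by_cat.values():
        mean = sum(rs) / len(rs)
        total += sum((r - mean) ** 2 for r in rs) / len(rs)
    return total / len(by_cat)
```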
Evaluating the Proposed Reputation Models
• Hypothesis 1: Embedding the item relative rating count and the standard deviation of an item’s ratings in the reputation calculation will produce more accurate reputation scores on dense datasets.
• Hypothesis 2: Using an uncertainty parameter alongside the rating-level frequencies in the reputation calculation will produce more accurate reputation scores when the dataset is sparse.
Experiment 1: Ratings prediction
• The first experiment predicts an item’s ratings using the item’s reputation score.
• The hypothesis is that the more accurate the reputation model, the closer its generated scores are to the actual users’ ratings.
• The mean absolute error (MAE) metric is used to measure the prediction accuracy.
MAE = ( 1 / |P| ) Σ_{p∈P} ( Σ_{r∈R_p} | r_p − r | ) / |R_p|

where r_p is the reputation score of item p and R_p is the set of ratings given to p.
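A sketch of the MAE computation above, averaging the absolute error per item and then over items; the function and container names are mine.

```python
def reputation_mae(rep_scores, item_ratings):
    """MAE = (1/|P|) * sum_p ( sum_{r in R_p} |r_p - r| / |R_p| )."""
    total = 0.0
    for p, ratings in item_ratings.items():
        total += sum(abs(rep_scores[p] - r) for r in ratings) / len(ratings)
    return total / len(item_ratings)
```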
Experiment 2: Item Ranking List Similarity
• We compare two lists of items ranked based on their reputation scores generated using different methods.
• n_d and n_c are the numbers of discordant and concordant pairs between the two lists, respectively:

τ = ( n_c − n_d ) / ( ½ n(n − 1) )

n_d = |{(i, j) : A(i) < A(j), B(i) > B(j)}| ,   n_c = |{(i, j) : A(i) < A(j), B(i) < B(j)}|
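The Kendall τ similarity can be sketched with a straightforward O(n²) pair scan; this assumes both lists rank the same n items with no ties.

```python
def kendall_tau(list_a, list_b):
    """tau = (n_c - n_d) / (n(n-1)/2) over all item pairs of two rankings."""
    rank_a = {p: i for i, p in enumerate(list_a)}
    rank_b = {p: i for i, p in enumerate(list_b)}
    items = list(rank_a)
    n = len(items)
    n_c = n_d = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (rank_a[items[i]] - rank_a[items[j]]) * (rank_b[items[i]] - rank_b[items[j]])
            if sign > 0:
                n_c += 1      # concordant: same relative order in both lists
            elif sign < 0:
                n_d += 1      # discordant: opposite relative order
    return (n_c - n_d) / (n * (n - 1) / 2)
```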
Experiment 3: Item Ranking Accuracy
• The IMDb website provides a special calculation for the top-250 movies of all time.
• We compare the top-250 movies generated by each of the implemented reputation models with the IMDb-produced top-250 movies:

AP = ( Σ_{i∈C} P@i ) / |E|

G_p@t = 5 × ( 1 − | I_{p,imdb} − I_{p,rep} | / t ) if p ∈ Top-t_imdb ;  0 otherwise

DCG_p@t = Σ_{x=1..t} ( 2^{G_p@t} − 1 ) / log₂(x + 1) ,   nDCG_p@t = DCG_p@t / IDCG_p@t
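A sketch of the graded-gain nDCG above. It reconstructs the garbled formulas, so the exact gain definition (up to 5 points, shrinking with the normalised rank gap, zero outside IMDb's top-t) should be read as my interpretation.

```python
import math

def gain(p, imdb_rank, rep_rank, t):
    """G_p@t: up to 5 points, reduced by the normalised rank gap; 0 outside IMDb's top-t."""
    if imdb_rank.get(p, t + 1) > t:
        return 0.0
    return 5.0 * (1 - abs(imdb_rank[p] - rep_rank[p]) / t)

def ndcg_at(rep_list, imdb_rank, t):
    """nDCG@t = DCG@t / IDCG@t, with DCG@t = sum_x (2^G_x - 1) / log2(x + 1)."""
    rep_rank = {p: i + 1 for i, p in enumerate(rep_list)}
    gains = [gain(p, imdb_rank, rep_rank, t) for p in rep_list[:t]]
    dcg = sum((2 ** g - 1) / math.log2(x + 1) for x, g in enumerate(gains, 1))
    idcg = sum((2 ** g - 1) / math.log2(x + 1)
               for x, g in enumerate(sorted(gains, reverse=True), 1))
    return dcg / idcg if idcg > 0 else 0.0
```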
Experiment 4: Reputation-Aware Recommender Accuracy
• We implement the traditional user-based CF as the top-n recommender system.
• The method we use to combine the reputation models with recommender system is the proposed weighted Borda count method (WBC) presented in Chapter 5.
Precision = | Relevant ∩ Recommended | / | Recommended | ,   Recall = | Relevant ∩ Recommended | / | Relevant |

F1-Score = 2 × ( Precision × Recall ) / ( Precision + Recall )
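The three accuracy metrics above in a minimal sketch over recommended and relevant item sets:

```python
def precision_recall_f1(recommended, relevant):
    """Top-n accuracy metrics from the relevant/recommended set overlap."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```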
Merging Method
• [Diagram: the reputation model and the recommender system are combined through the merging method.]
Baseline Models
• Naïve method: arithmetic mean.
• IMDb: a true Bayesian estimation.
• Dirichlet reputation model [3] (2007).
• Fuzzy reputation model [4] (2009).
• PerContRep model [7] (2014).
• Trusti Model [8] (2015).
Datasets
• Dense datasets:

Dataset | #Users | #Items | #Ratings | #Rating Levels | Average Ratings Count Per Item (ARCPI)
ML-100K | 943 | 1,682 | 100,000 | 5 | 59.453
ML-1M | 6,040 | 3,952 | 1,000,209 | 5 | 253.089
ML-10M | 71,567 | 10,681 | 10,000,054 | 10 | 936.246
IMDb | – | 13,479 | 385,581,991 | 10 | 28,606.127

• Sparse datasets:

Dataset | #Users | #Items | #Ratings | #Rating Levels | Sparsity | ARCPI
Book Crossing | 17,854 | 113,481 | 277,906 | 10 | 0.99998 | 2.44892
Only 4 ratings per movie (4RPM) | 1,361 | 3,706 | 14,261 | 5 | 0.99717 | 3.84808
Only 6 ratings per movie (6RPM) | 1,760 | 3,706 | 21,054 | 5 | 0.99677 | 5.68105
Only 8 ratings per movie (8RPM) | 2,098 | 3,706 | 27,723 | 5 | 0.99643 | 7.48057
Results of Rating Prediction Experiment

Method/Dataset ML-100K ML-1M ML-10M
Naïve 0.796652 0.763668 1.437024
IMDb 0.781558 0.748331 1.401215
Fuzzy 0.795871 0.761167 1.430607
Dirichlet 0.796720 0.764013 1.437291
PerContRep 0.786162 0.752248 1.410826
Trusti 0.783254 0.750015 1.409872
NDR 0.791346 0.756602 1.423804
NDRU 0.790925 0.756756 1.424101
BetaDR 0.770330 0.732876 1.372640
Method/Dataset 4RPM 6RPM 8RPM Book Crossing Dataset
Naïve 0.5577 0.5610 0.5720 1.6957
IMDb 0.5601 0.5618 0.5721 1.6834
Fuzzy 0.5583 0.5628 0.5736 1.6922
Dirichlet 0.5351 0.5514 0.5705 1.6159
PerContRep 0.5602 0.5621 0.5728 1.6904
Trusti 0.5604 0.5622 0.5729 1.6908
NDR 0.5575 0.5598 0.5693 1.6874
NDRU 0.5339 0.5498 0.5676 1.5924
BetaDR 0.5571 0.5592 0.5689 1.6826
Results of Item Ranking List Similarity Experiment
[Charts: Kendall similarity (τ, from 0 to 1) of the NDR, NDRU, and BetaDR ranked lists, computed over the top X% of the ranked lists (X = 0–100%) — Kendall similarities with the naïve method using the ML-100K dataset, and Kendall similarities with the Trusti method using the 8RPM dataset.]
Method/Metric P@10 P@50 P@100 P@250 AP nDCG@10 nDCG@50 nDCG@100 nDCG@250
Naïve 0.1 0.12 0.23 0.304 0.2262 0.0009 0.0045 0.0129 0.0403
Fuzzy 0.1 0.12 0.25 0.312 0.2312 0.0009 0.0042 0.0110 0.0397
Dirichlet 0.1 0.2 0.27 0.336 0.2620 0.0009 0.0082 0.0185 0.0511
PerContRep 0.1 0.12 0.25 0.308 0.2345 0.0009 0.0045 0.0140 0.0417
Trusti 0.7 0.82 0.77 0.756 0.7671 0.5600 0.6212 0.5989 0.5822
NDR 0.1 0.1 0.19 0.28 0.1972 0.0009 0.0038 0.0100 0.0337
NDRU 0.1 0.14 0.19 0.308 0.2217 0.0009 0.0054 0.0114 0.0407
BetaDR 0.2 0.24 0.45 0.432 0.3712 0.1327 0.0277 0.0686 0.1133
Results of Item Ranking Accuracy Experiment
Method/Metric P@10 P@50 P@100 P@250 AP nDCG@10 nDCG@50 nDCG@100 nDCG@250
Naïve 0.0 0.0 0.0 0.0156 0.00288 0.0 0.0 0.0 0.00001
Fuzzy 0.0125 0.0185 0.02825 0.0758 0.02652 0.00121 0.00329 0.01642 0.09841
Dirichlet 0.365 0.575 0.5585 0.5882 0.56438 0.05065 0.16393 0.19453 0.24182
PerContRep 0.125 0.385 0.4515 0.5032 0.42906 0.00371 0.04145 0.07726 0.12991
Trusti 0.7 0.642 0.353 0.2604 0.41111 0.55510 0.50833 0.30792 0.17065
NDR 0.005 0.0055 0.00625 0.0746 0.02312 0.0 0.0 0.0 0.00088
NDRU 0.395 0.598 0.573 0.593 0.57775 0.05813 0.17051 0.19782 0.26946
BetaDR 0.095 0.02 0.0115 0.1282 0.03529 0.05877 0.01245 0.00635 0.00829
Reputation Method Used with CF | ML-100K | ML-1M | ML-10M
(each dataset: Precision, Recall, F1-score)
N/A 0.0236 0.0412 0.0300 0.0301 0.0327 0.0314 0.0379 0.0306 0.0339
IMDb 0.0305 0.0532 0.0388 0.0369 0.0596 0.0456 0.0401 0.0718 0.0515
Fuzzy 0.0283 0.0494 0.0359 0.0271 0.0399 0.0323 0.0311 0.0407 0.0353
Dirichlet 0.0297 0.0519 0.0377 0.0351 0.0539 0.0425 0.0382 0.0523 0.0442
PerContRep 0.0282 0.0491 0.0358 0.0264 0.0391 0.0316 0.0304 0.0389 0.0341
Trusti 0.0301 0.0519 0.0381 0.0360 0.0561 0.0439 0.0395 0.0681 0.0500
NDR 0.0286 0.0494 0.0362 0.0289 0.0439 0.0349 0.0349 0.0442 0.0390
NDRU 0.0297 0.0518 0.0377 0.0352 0.0537 0.0425 0.0385 0.0533 0.0447
BetaDR 0.0307 0.0540 0.0392 0.0380 0.0643 0.0478 0.0412 0.0791 0.0542
Reputation Method Used with CF | 4RPM | 6RPM | 8RPM
(each dataset: Precision, Recall, F1-score)
N/A 0.0080 0.1459 0.0152 0.0088 0.1575 0.0167 0.0096 0.1601 0.0181
IMDb 0.0067 0.0747 0.0123 0.0078 0.1027 0.0145 0.0088 0.1396 0.0166
Fuzzy 0.0071 0.1319 0.0135 0.0084 0.1445 0.0159 0.0096 0.1472 0.0180
Dirichlet 0.0082 0.1468 0.0155 0.0095 0.1593 0.0179 0.0105 0.1629 0.0197
PerContRep 0.0075 0.1362 0.0142 0.0087 0.1481 0.0164 0.0098 0.1506 0.0184
Trusti 0.0043 0.0486 0.0079 0.0064 0.0826 0.0119 0.0083 0.0967 0.0153
NDR 0.0080 0.1461 0.0152 0.0090 0.1581 0.0170 0.0102 0.1611 0.0192
NDRU 0.0083 0.1479 0.0157 0.0097 0.1601 0.0183 0.0111 0.1646 0.0208
BetaDR 0.0081 0.1461 0.0154 0.0093 0.1579 0.0176 0.0103 0.1620 0.0194
Results of Reputation-aware Recommender Accuracy Experiment
Evaluating the Reputation-Aware Recommender System
• Hypothesis 3: The accuracy of recommender systems can be enhanced by combining the item reputation factor using WBC as a merging method. The accuracy enhancement takes place with both sparse and dense datasets.
Recommender Systems and Reputation Models Employed
• We use the BetaDR reputation model in the dense-dataset evaluation and the NDRU reputation model in the sparse-dataset evaluation.
• We use two recommender systems for the top-n recommendation experiment:
– The traditional user-based CF [12].
– The reliability-aware recommender system [13].
Datasets
• For the dense datasets, we use the ML-100K and ML-1M datasets described before.
• For the sparse datasets:

Statistic | MovieLens 5% (ML5) | MovieLens 10% (ML10)
Number of ratings | 6,515 | 13,077
Sparsity | 0.99589 | 0.99175
Minimum ratings per user | 5 | 10
Maximum ratings per user | 36 | 73
Average ratings per user | 6.849 | 13.867
Minimum ratings per movie | 0 | 0
Maximum ratings per movie | 59 | 114
Average ratings per movie | 3.840 | 7.774
Baseline Methods
• Borda Count Method (BC) [9].
• Coombs Method [10].
• Baldwin Method [9].
• Proportional Representation (naïve method).
• CasMin Method [11] (2013).
• A Trust-Based Probabilistic Recommendation Model (Trusti) [8] (2015).
Results on Dense Dataset

Method Used to Combine Reputation with TCF | ML-100K | ML-1M
(each dataset: Precision, Recall, F1-score)
TCF 0.0257 0.0446 0.0326 0.0238 0.0281 0.0258
Naïve (PR) 0.0218 0.0391 0.0280 0.0206 0.0244 0.0223
BC 0.0455 0.0728 0.0560 0.0401 0.0585 0.0476
Baldwin 0.0295 0.0522 0.0377 0.0297 0.0388 0.0337
Coombs 0.0293 0.0512 0.0373 0.0286 0.0383 0.0328
CasMin 0.0078 0.0154 0.0104 0.0061 0.0141 0.0085
Trusti 0.0326 0.0543 0.0407 0.0340 0.0460 0.0391
WBC 0.0476 0.0832 0.0606 0.0412 0.0594 0.0486
WBC-P 0.0624 0.1199 0.0820 0.0447 0.0698 0.0545
Method Used to Combine Reputation with RA-CF | ML-100K | ML-1M
(each dataset: Precision, Recall, F1-score)
RA-CF 0.0338 0.0555 0.0420 0.0272 0.0331 0.0299
Naïve (PR) 0.0289 0.0473 0.0359 0.0248 0.0314 0.0277
BC 0.0459 0.0695 0.0553 0.0318 0.0474 0.0380
Baldwin 0.0386 0.0630 0.0479 0.0296 0.0418 0.0346
Coombs 0.0385 0.0628 0.0477 0.0266 0.0409 0.0322
CasMin 0.0092 0.0169 0.0119 0.0075 0.0183 0.0106
Trusti 0.0413 0.0655 0.0506 0.0352 0.0480 0.0406
WBC 0.0506 0.0786 0.0616 0.0397 0.0663 0.0497
WBC-P 0.0626 0.1195 0.0821 0.0423 0.0738 0.0537
Results on Sparse Dataset

Method Used to Combine Reputation with TCF | ML5 | ML10
(each dataset: Precision, Recall, F1-score)
TCF 0.0023 0.0410 0.0044 0.0028 0.0265 0.0051
Naïve (PR) 0.0012 0.0187 0.0022 0.0023 0.0218 0.0041
BC 0.0056 0.0985 0.0106 0.0074 0.0832 0.0136
Baldwin 0.0030 0.0516 0.0057 0.0031 0.0308 0.0057
Coombs 0.0030 0.0515 0.0056 0.0030 0.0303 0.0055
CasMin 0.0004 0.0046 0.0007 0.0005 0.0051 0.0009
Trusti 0.0029 0.0517 0.0056 0.0029 0.0288 0.0053
WBC 0.0059 0.0999 0.0111 0.0078 0.0888 0.0143
WBC-P 0.0211 0.3837 0.0400 0.0219 0.2531 0.0403
Method Used to Combine Reputation with RA-CF | ML5 | ML10
(each dataset: Precision, Recall, F1-score)
RA-CF 0.0229 0.4137 0.0435 0.0250 0.3009 0.0462
Naïve (PR) 0.0223 0.4034 0.0423 0.0236 0.2850 0.0436
BC 0.0095 0.1708 0.0180 0.0132 0.1542 0.0244
Baldwin 0.0223 0.4021 0.0422 0.0247 0.2966 0.0455
Coombs 0.0222 0.4020 0.0420 0.0242 0.2956 0.0447
CasMin 0.0012 0.0187 0.0023 0.0019 0.0234 0.0035
Trusti 0.0204 0.4071 0.0389 0.0216 0.3055 0.0403
WBC 0.0230 0.4217 0.0436 0.0255 0.3019 0.0470
WBC-P 0.0245 0.4397 0.0464 0.0337 0.3975 0.0621
Conclusions
• The first reputation model we propose is the NDRU model.
– It uses the normal distribution to generate rating weights.
– It employs the uncertainty factor.
– It is more accurate when used with sparse datasets.
• The second reputation model we propose is the BetaDR model.
– It uses the beta distribution to generate the weights for the ratings.
– It uses the item relative rating count (IRRC), together with the standard deviation and mean of the item’s ratings.
– This model proved to produce more accurate results when used with dense datasets.
Conclusions (Cont.)
• We propose the weighted Borda count (WBC) method to combine reputation scores with recommender systems in order to enhance the accuracy of top-n recommendations.
• We propose generating personalised reputation scores for each user to refine reputation scores and make them useful in recommender systems.
• We observed that merging reputation models with recommenders has the potential to enhance recommendation accuracy.
Relevant Publications
• Journal Papers
1. Abdel-Hafez, A., & Xu, Y. (2013). A survey of user modelling in social media websites. Computer and Information Science, 6(4), pp. 59-71.
2. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2015). A normal-distribution based rating aggregation method for generating product reputations. Web Intelligence, 13(1), pp. 43-51. IOS Press.
• Conference Papers
3. Abdel-Hafez, A., Xu, Y., & Tjondronegoro, D. (2012). Product reputation model: an opinion mining based approach. Paper presented at the 1st International Workshop on Sentiment Discovery from Affective Data (SDAD’12), pp. 16-20, CEUR workshop.
4. Abdel-Hafez, A., & Xu, Y. (2013). Ontology-based product's reputation model. Paper presented at the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), pp. 37-40. IEEE.
5. Abdel-Hafez, A., Tang, X., Tian, N., & Xu, Y. (2014). A reputation-enhanced recommender system. Paper presented at Advanced Data Mining and Applications (ADMA’14), pp. 185-198. Springer International Publishing.
6. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2014). A normal-distribution based reputation model. Paper presented at the Trust, Privacy, and Security in Digital Business (TrustBus’14), pp. 144-155. Springer International Publishing.
7. Abdel-Hafez, A., Phung, Q. V., & Xu, Y. (2014). Utilizing voting systems for ranking user tweets. Paper presented at the 2014 Recommender Systems Challenge (RecSysChallenge’14), pp. 23-28. ACM.
8. Abdel-Hafez, A., Xu, Y., & Tian, N. (2014). Item reputation-aware recommender systems. Paper presented at the 16th International Conference on Information Integration and Web-based Applications & Services (iiWAS’14), pp. 79-86. ACM.
9. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2014). A rating aggregation method for generating product reputations. Paper presented at the 25th ACM conference on Hypertext and social media (Hypertext’14), pp. 291-293.ACM.
10. Abdel-Hafez, A., Xu, Y., & Jøsang, A. (2015). An accurate rating aggregation method for generating item reputation. Paper to be presented at the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA'15). IEEE. (Accepted)
11. Abdel-Hafez, A., & Xu, Y. (2015). Exploiting the beta distribution-based reputation model in recommender system. Paper to be presented at the 28th Australasian Joint Conference on Artificial Intelligence. (Accepted)
References
[1] Ayday, E., Lee, H., & Fekri, F. (2009). An iterative algorithm for trust and reputation management. In IEEE International Symposium on Information Theory (ISIT), 2051-2055, IEEE.
[2] Riggs, T., & Wilensky, R. (2001). An algorithm for automated rating of reviewers. Paper presented at the Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries.
[3] Jøsang, A., & Haller, J. (2007). Dirichlet reputation systems. In Availability, Reliability and Security (ARES 2007), The Second International Conference on, 112-119, IEEE.
[4] Bharadwaj, K. K., & Al-Shamri, M. Y. H. (2009). Fuzzy computational models for trust and reputation systems. Electronic Commerce Research and Applications, 8(1), 37-47.
[5] Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: bringing order to the web.
[6] Teacy, W. L., Patel, J., Jennings, N. R., & Luck, M. (2006). Travos: Trust and reputation in the context of inaccurate information sources. Autonomous Agents and Multi-Agent Systems, 12(2), 183-198.
[7] Yan, Z., Chen, Y., & Shen, Y. (2014). PerContRep: a practical reputation system for pervasive content services. The Journal of Supercomputing, 70(3), 1051-1074.
[8] Wang, Y., Yin, G., Cai, Z., Dong, Y., & Dong, H. (2015). A trust-based probabilistic recommendation model for social networks. Journal of Network and Computer Applications, 55(2015), 59-67.
[9] De Grazia, A. (1953). Mathematical derivation of an election system. Isis, 44(1/2), 42-51.
[10] Coombs, C. H. (1964). A theory of data. New York: Wiley.
[11] Jøsang, A., Pini, M. S., Santini, F., & Xu, Y. (2013). Combining Recommender and Reputation Systems to Produce Better Online Advice. Modeling Decisions for Artificial Intelligence, 8234, 126-138.
[12] Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2000). Analysis of recommendation algorithms for e-commerce. In Proceedings of the 2nd ACM Conference on Electronic Commerce, 158-167, ACM.
[13] Hernando, A., Bobadilla, J., Ortega, F., & Tejedor, J. (2013). Incorporating reliability measurements into the predictions of a recommender system. Information Sciences, 218(2013), 1-16.
[14] Said, A., Jain, B. J., Narr, S., & Plumbaum, T. (2012). Users and noise: The magic barrier of recommender systems. In User Modeling, Adaptation, and Personalization (pp. 237-248). Springer International Publishing.
[15] Ku, Y.-C., & Tai, Y.-M. (2013). What Happens When Recommendation System Meets Reputation System? The Impact of Recommendation Information on Purchase Intention. In System Sciences (HICSS), 2013 46th Hawaii International Conference on, 1376-1383, IEEE.
Thank you.
Questions?