
A survey of unfair rating problem and detection methods in reputation management systems

Long Ngo
Tartu University
[email protected]

Abstract

A reputation management system (RMS) is a system that helps users in an online community to compute the reputation of others. One important issue in this kind of system is how to eliminate the effect of unfair ratings. In this paper we survey how this problem has been addressed so far. Besides briefly explaining the solutions, we discuss the pros and cons of each solution and future research directions on this issue.

KEYWORDS: reputation management, deception, unfair rating, filter, survey

1 Introduction

The Internet is everywhere today and it has been shaping our ways of living to a certain degree. One of the changes it has brought is that people can now join online communities in which we may not know others' real identities or how to reach them. As a result, a question arises: "Can we trust someone whom we can only see on the Internet?". This has become a big question for the development of e-commerce infrastructure. The difficulty comes from the differences between someone's reputation in an online community and in real life. In an online environment, anyone can easily appear, do good things, do bad things (to a certain extent), and even disappear. But if we want to build Internet communities where people can exchange resources and trade, we must make reputation something valuable, to keep people responsible and well behaved. Hence, what we need are RMSes which allow us to calculate the reputation of a user

in order to conclude whether he is trustworthy or not. A simple approach is to decide according to our own experience with this user only, ignoring what he has done with others. But this does not help much, because one user often has few past transactions with another, and they could be few and far between or too old. Thus we have to collect opinions from the community. In a practical RMS, after seeing or reading or receiving something from someone, a user publishes his rating about it so that others can retrieve the rating whenever they need it. The problem is that some people may intentionally make wrong ratings, or unfair ratings, and those ratings then lead users to wrong decisions. As a result, together with the need for a good RMS comes the issue of controlling unfair ratings.

One well-known example of an RMS is eBay, an online auction site. On this site, people sell goods to and buy goods from people they never meet, from all over the world. So how does a user know that a seller (or a buyer) is worth working with? The answer is that he will look at the history of ratings given by other users after they have bought something from this seller. If most of them say that this seller is a good one, then we can believe him to a certain extent. However, the question "can we trust him or not" actually turns into the question "can we trust the people who have rated him or not". That is why we are concerned about the problem of unfair ratings. If people say that a seller is good but are actually lying, then we will trust the wrong person.

In this paper, we will try to draw an overview of how people are trying to handle the problem of unfair ratings. We will see what types of methods have been proposed so far and what the typical methods in each type are. We will also


analyse the strengths and weaknesses of each method, and sometimes compare them. Because researchers have already proposed a number of methods and we have to look at several of them, we cannot cover everything in full detail. However, we will try to convey the main idea, just enough to understand the point of each method. Also, our purpose is not to explain RMSes themselves but their filtering methods, so we will mention how an RMS is built only where it helps to understand the related filtering techniques for unfair ratings.

Because of the limited scope of this paper, we cannot cover everything from scratch. So before going into the issue, we have to assume that readers have basic knowledge about online rating systems and some practical experience with them, such as knowing what eBay is. Even though the paper is written so that it tries to explain everything necessary, sometimes readers will not be able to follow without knowing some things about RMSes. Specifically, readers are assumed to have basic knowledge about Bayesian rating systems and the Beta distribution [5].

Because we collect knowledge from different sources, we may encounter different terminologies for the same thing, but we will try to explain everything clearly to keep the paper understandable. It should also be noted that, in all of the RMSes mentioned in this paper, a buyer can always be a seller and vice versa.

The rest of the paper is organised as follows. Section 2 explains the types of unfair rating. Section 3 describes some methods, organised in groups, to cope with this problem. Section 4 then discusses the methods mentioned in Section 3, before we conclude in Section 5.

2 Unfair rating in reputation management systems

There are several ways to cheat in online RMSes. Malicious users can steal others' identities or information, hide a disreputable account and start a new one after defrauding, discriminate against other users, compete unfairly, and so on. This paper focuses on coping with a popular and simple way of cheating, which is making unfair ratings. Generally speaking, a buyer in an RMS makes a numerical rating R,

after a transaction, describing how the buyer thinks about the quality of the service provided by a seller. Normally a rating is a vector which represents different aspects of quality. Without loss of generality, we can assume it is just one number, even a 1-bit number (1 for positive, 0 for negative), and the greater the number, the higher the quality. Recall that the terms buyer and seller should be understood flexibly, because any user can act in both roles. In the case of P2P networks, where a service could be a file to be downloaded, the buyer could be the requester and the seller the file provider.

Actually, different buyers may have different tastes for a service, and this issue is a concern in every RMS [4]. Hence, we cannot expect all honest users to give the same rating for one service. A bad user, however, is one who intentionally makes ratings that do not reflect the quality of service he has received.

There are three known types of unfair ratings:

• ballot-stuffing: A buyer gives a positive rating for a seller although the buyer has not received any good quality service from the seller. The reason could be that the buyer colludes with the seller or is a virtual buyer created by the seller [3]. The aim is to exaggerate the seller's reputation or to cover up some bad actions he has done. Other buyers are thereby fooled into believing that this seller is very worth working with.

• bad-mouthing: A buyer gives a negative rating for a seller although the buyer has not received any bad quality service from the seller. The reason may be that the buyer does not like the seller or is asked to do so by the seller's rivals. The result is that the seller's reputation decreases wrongfully.

In some RMSes, which accept positive ratings only, this type of cheating does not exist [3].

• complementary [7, 9]: A buyer gives the rating opposite to the real quality of service he has received. Specifically, if he gets good service, he rates it bad, and vice versa.

To make this clearer, we look at Yu and Singh's deception models, which illustrate these unfair rating types [9].


In the models, let α be the exaggeration coefficient (0 < α < 1), x the true rating, and x′ the provided rating (we assume x is on the scale [0, 1]). The deception models are then described as in Figure 1.

Figure 1: Deception models for unfair rating [9]

x′ in the models is calculated as follows.

• Normal: x′ = x

• Complementary: x′ = 1− x

• Exaggerated positive (ballot-stuffing): x′ = α + x − α ∗ x

• Exaggerated negative (bad-mouthing): x′ = x − α ∗ x

The meaning of the formulas for the exaggerated positive and negative cases is that they keep the false ratings always higher or lower than the correct one. In fact, exaggerated positive rating (ballot-stuffing) could be the case where a user always gives 1; that corresponds to α = 1. The same holds for exaggerated negative ratings, where α = 1 means the user always gives 0.
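To make the models concrete, here is a small Python sketch (our own illustration, not code from [9]; it assumes the reconstructed exaggerated-negative formula x′ = x − α ∗ x):

def normal(x, alpha):
    return x

def complementary(x, alpha):
    return 1 - x

def ballot_stuffing(x, alpha):
    # Exaggerated positive: maps every true rating into [alpha, 1].
    return alpha + x - alpha * x

def bad_mouthing(x, alpha):
    # Exaggerated negative: maps every true rating into [0, 1 - alpha].
    return x - alpha * x

# With alpha close to 1, a ballot-stuffer reports almost always 1 and a
# bad-mouther almost always 0.
for model in (normal, complementary, ballot_stuffing, bad_mouthing):
    print(model.__name__, [round(model(x, 0.5), 2) for x in (0.0, 0.5, 1.0)])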

We have talked about the types of unfair rating in RMSes. In the next section we will see what methods can be used to cope with this problem.

3 Methods for detecting unfair ratings

There are a number of research projects related to the unfair rating problem. According to [8, 6], they can be put into two groups: endogenous and exogenous.

• Endogenous discounting of unfair ratings: Methods in this group try to recognise an unfair rating by calculating and comparing its statistical characteristics only. The idea is that false ratings often have statistical patterns different from true ones. Research in this group includes [8, 7, 4, 2].

• Exogenous discounting of unfair ratings: Methods in this group use other factors, often the reputation of the raters, to decide the weight of a given rating when calculating the final reputation. The idea is based on the assumption that users with low reputation tend to give low quality ratings. The reputations of raters can be determined from other sources of information; one example is ratings of raters given by others. Examples of this approach are described in [3, 1, 9].

In the rest of the paper, we use this classification. Proposals could also be divided in other ways, such as into proposals for centralised or decentralised RMSes, but not in this paper.

Because of the scope of this paper, only some typical recent research projects are discussed. It should also be noted that many of the proposals are based on RMSes which are basically Bayesian rating systems. The reason is that Bayesian theory is useful for calculating future probability from past events. Although understanding Bayesian reputation systems and the Beta distribution helps to understand most of the methods, we do not explain this background in detail here, but we will try to convey enough to follow along. More details can be found in [5].

3.1 Endogenous discounting of unfair ratings

Methods in this group statistically analyse ratings from other agents to decide which raters can be trusted, and then give suitable weights to their ratings.


Figure 2: 1% and 99% quantiles of beta (p|8,2) [8]

3.1.1 Filtering out by comparing individual ratings with the total one

Whitby et al. gave this proposal in [8]. The filtering algorithm is clear and simple. Suppose that an agent wants to calculate the reputation of an agent Z. We assume that it has the aggregate rating ρt(X, Z)∗, which is the cumulative rating of Z by X, from every rater X in the system (we can simply regard it as the information a buyer requests from others whenever he needs to calculate the reputation of a seller; for this paper, it is not necessary to understand exactly what it is and why). Then, from every ρt(X, Z) we derive a lower and an upper bound. (To get the idea of the algorithm it is not necessary to know how the bounds are computed, but it is worth noting that the lower bound is the q quantile and the upper bound the 1 − q quantile of ρt(X, Z). Figure 2 depicts an example of the bounds with q = 0.01; the 0.01 quantile is the value below which 1% of the data fall and above which 99% fall.) Combining all of the ratings ρt(X, Z), we can calculate the final reputation score Rt(Z) of Z. Now we have all the quantities we need. The criterion for considering a rating unfair is simple: if Rt(Z) is not inside the lower and upper bounds calculated from a rating ρt(X, Z), that rating is unfair and its rater is removed. This process is repeated until no remaining ρt(X, Z) conflicts with Rt(Z). The pseudocode of the algorithm from [8] is as follows.

∗Computation of this value is designed for Beta reputation systems. Readers can refer to [5] for more details, but this is not necessary for getting the idea of the method.

Data: a set of individual ratings ρt(X, Z)
Result: the final set of fair raters
C is the set of all raters;
F is the set of all assumed fair raters;
Z is the target agent (whose reputation we want to compute);
F := C;
repeat
    ρt(Z) := Σ_{X∈F} ρt(X, Z); // this is how the aggregate rating is calculated
    Rt(Z) := E(ρt(Z)); // expectation of ρt(Z), the total calculated reputation
    for each rater R in F do
        f := beta(ρt(R, Z)); // f is the Beta distribution of R's rating
        l := q quantile of f; // lower bound
        u := (1 − q) quantile of f; // upper bound
        if l > Rt(Z) or u < Rt(Z) then
            F := F \ R; // throw out the unfair rater
        end
    end
until F does not change;

Algorithm 1: Filtering by comparing

In the end, Rt(Z) is the reputation from which we hope all unfair ratings have been excluded.
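For illustration, here is a minimal Python sketch of this iteration (our own, not from [8]); it assumes each cumulative rating ρt(X, Z) is kept as a Beta(α, β) parameter pair and that ratings aggregate by summing parameters:

from scipy.stats import beta

def filter_unfair_raters(ratings, q=0.01):
    """ratings: dict mapping rater X to its (alpha, beta) pair for agent Z.
    Returns the surviving set of fair raters and the final reputation."""
    fair = set(ratings)                                # F := C
    while True:
        # Aggregate the surviving ratings and take the expectation.
        a = sum(ratings[x][0] for x in fair)
        b = sum(ratings[x][1] for x in fair)
        R = a / (a + b)                                # Rt(Z) = E[Beta(a, b)]
        # Remove every rater whose (q, 1 - q) quantile band excludes R.
        unfair = {x for x in fair
                  if not beta.ppf(q, *ratings[x]) <= R <= beta.ppf(1 - q, *ratings[x])}
        if not unfair:                                 # F did not change
            return fair, R
        fair -= unfair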

This way of filtering sounds reasonable because it removes every rating that is too different from the rest, using quantiles of the majority opinion. The algorithm is also clear and easy to understand. However, because of its iteration it may have performance problems when there is a large number of ratings. Another issue is that it assumes ratings follow a Beta distribution, and this assumption may not hold when there are few ratings [7].

3.1.2 Filtering out by using entropy

Weng et al. in [7] described a method using entropy. The algorithm is explained in the context of a Bayesian rating system. However, it can be applied not only in Beta-distribution RMSes but also in other types, as long as we can measure the entropy of ratings.

The general idea of the method is that if a rating is too different from the majority's opinion, then it could be an unfair one. This idea is much the same as in some other methods; here, however, entropy is used to decide how different a rating is from the others.

To make this more understandable, let us briefly recap what entropy is. The entropy of a source of information shows how uncertain the source is. The formal formula for the entropy of a variable V is: H(V) = −Σ_v Pr(v) log(Pr(v)), where v is a possible value that V can take and Pr(v) is the corresponding probability of V being v.

Now we should explain a little about how the probabilities can be calculated in the case of RMSes. This is a key point, because the filtering method takes probabilities as input and does not care how they are calculated; that depends on the RMS. In [7], the authors use Beta reputation systems as an example. In that kind of system, the following formulas are used:

Prp = α / (α + β),   Prn = β / (α + β)

where α and β are the sums of positive and negative ratings respectively (the authors also mention a forgetting factor, which we skip to keep things simple), and Prp and Prn are the probabilities that a rating is positive or negative, correspondingly. Note also that we do not take all ratings into account, but only those in a time window W of a specific size, i.e. only recent ratings. The entropy of a seller's behaviour observed by a buyer B is then calculated as follows:

H(rB) = −Prp log(Prp) − Prn log(Prn)

To compare ratings of different buyers, the authors defined the quality of a rating as follows:

Q(rB) = (Hmax(rB) − H(rB)) / (Hmax(rB) − Hmin(rB))   (3.1.1)

where Hmax(rB) and Hmin(rB) are the maximum and minimum uncertainties of rB.

By measuring all ratings from a buyer X in this way, the buyer B will aggregate another buyer's ratings only if

|Q(rX) − Q(r)| ≤ ε

where Q(rX) is the quality of the ratings from buyer X, Q(r) is the current quality of buyer B's aggregate ratings, and ε is the screening threshold (which must be between 0 and 1). The effectiveness and sensitivity of this screening algorithm depend on choosing a suitable ε.

The pseudocode for the algorithm is outlined as follows [7].

Data: a set of B's local ratings and a set of others' ratings about a seller
Result: the final set of fair raters
B is the buyer initiating the rating aggregation;
C is the set of buyers whose ratings are requested;
X is a particular buyer in the set C;
measure the quality of buyer B's local rating Q(rB) using (3.1.1);
Q(r) := Q(rB); // initially
for all X in C do
    measure the rating quality Q(rX) of the ratings reported by X;
    if |Q(rX) − Q(r)| ≤ ε then
        aggregate X's ratings by updating α and β, then update Q(r);
    else
        discard X's ratings; // consider them unfair
    end
end

Algorithm 2: Filtering by comparing entropy
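A compact Python sketch of this screening (our own illustration; for simplicity we assume binary ratings, Hmax = 1 bit at Prp = 0.5, and Hmin = 0 for a fully certain stream):

import math

def quality(alpha, beta_):
    """Quality of a rating stream with alpha positive and beta_ negative
    ratings, following eq. (3.1.1)."""
    p = alpha / (alpha + beta_)
    h = -sum(pr * math.log2(pr) for pr in (p, 1 - p) if pr > 0)
    return (1.0 - h) / (1.0 - 0.0)       # (Hmax - H) / (Hmax - Hmin)

def screen(local, others, eps=0.1):
    """local: (alpha, beta) of B's own ratings; others: rater -> (alpha, beta).
    Returns the raters whose rating quality stays within eps of Q(r)."""
    a, b = local
    q = quality(a, b)                    # Q(r) := Q(rB)
    fair = []
    for x, (ax, bx) in others.items():
        if abs(quality(ax, bx) - q) <= eps:
            a, b = a + ax, b + bx        # aggregate X's ratings
            q = quality(a, b)            # update Q(r)
            fair.append(x)
        # otherwise X's ratings are discarded as unfair
    return fair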

This approach uses local ratings as the initial base of comparison for other ratings. This makes sense because one should believe in oneself more strongly than in others. However, when the local ratings are very different from the rest, the buyer may predict wrongly, and he may keep predicting wrongly several times before his set of local ratings is corrected (by learning from mistakes). Of course, we can increase the threshold to make the algorithm less sensitive, but that may compromise the system by accepting more potentially unfair ratings.


Figure 3: Reputation tree of a rater [2]

3.1.3 Building reputation tree

Chen and Singh in [2] proposed a method that uses the ratings themselves to measure the reputation of raters. The idea is that a person with higher reputation is less likely to give an unfair rating. Given a rater's reputation, we can use it as the weight of that rater when calculating a score by aggregating ratings about an object. To avoid confusion, the authors here use the word score instead of reputation for an object (a service, a seller, anything that a user wants to ask others' opinions about).

In this method, ratings are put into categories at different levels, forming a reputation tree. Figure 3 shows a tree with a rater as the root and his ratings for objects as the leaves.

The main idea of building a reputation tree is that ratings about related objects are put into categories, and categories may have parent categories. Every rating is evaluated to deduce its local match (LM), denoting the rating's quality, and its local confidence (LC), denoting the confidence in the LM (we will discuss how to do this later). Each category has a global match (GM) and a global confidence (GC): its aggregated quality derived from its children's qualities, and the confidence in that quality. The top of the tree also has its own GM and GC. The GM and GC of a rater are computed bottom-up, as described below (only briefly, to avoid complexity; for details please read [2]):

• Evaluate a rating: Ratings are the leaves of the tree, so if we can measure their qualities, we can gradually climb up to compute the quality of the top (the quality of the rater). How to compute a rating's quality is complex, so we skip it. The general idea is that the more similar a rating about an object is to others' opinions, the higher its quality and confidence. We now assume that we can somehow calculate the LM and LC of the ratings, which are the leaves of a tree.

• To calculate the quality of a node from its children's qualities:

GM = ( Σ_{i=1}^{N} Mi × Ci ) / ( Σ_{i=1}^{N} Ci )

GC = 1 − ( Σ_{i=1}^{N} Ci² × (1 − Ci)⁴ / ( Σ_{i=1}^{N} Ci )² )^{1/4}

where

N is the total number of items rated by the rater under this category,

Mi is the match (local or global) for item number i,

Ci is the confidence in Mi.

In this way, the reputation of a node is a weighted sum of its children's reputations, with their confidences as the weights. Thus, if a user thinks a child node has a certain reputation with high confidence, that reputation contributes strongly to the reputation of the parent node. Confidence values are calculated differently, but we do not go into details here.

• To get the reputation of the rater (see the sketch below):

Reputation = (GM + 1)^GC − 1
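The following Python sketch illustrates the bottom-up aggregation (the node structure and names are our own, not Chen and Singh's; it also assumes the exponent reading of the reputation formula above):

from dataclasses import dataclass, field

@dataclass
class Node:
    match: float = 0.0          # LM for a leaf, GM for a category
    confidence: float = 0.0     # LC for a leaf, GC for a category
    children: list = field(default_factory=list)

def aggregate(node):
    """Fill in (GM, GC) of every category node, bottom-up."""
    if not node.children:                      # a leaf: (LM, LC) already set
        return node.match, node.confidence
    pairs = [aggregate(child) for child in node.children]
    csum = sum(c for _, c in pairs)
    node.match = sum(m * c for m, c in pairs) / csum
    node.confidence = 1 - (sum(c**2 * (1 - c)**4 for _, c in pairs)
                           / csum**2) ** 0.25
    return node.match, node.confidence

def reputation(root):
    gm, gc = aggregate(root)
    return (gm + 1) ** gc - 1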

The point is that each rater's reputation is used as the weight of his rating for an item. The score (or reputation) of an item is the average of all ratings related to it, using the raters' reputations as weights. Recall that this method assumes a rater with high reputation is less likely to rate unfairly.

This method introduced some interesting ideas, such as the reputation of raters and reputation confidence. The bottom-up way of calculating reputation in the tree also sounds logical. However, to our knowledge there are still some problems. First, the method does not take into account the category of the object whose reputation we want to compute. If someone always gives correct ratings for cars and wrong ratings for movies, he is likely to give the same rating quality for a new car or a new movie; an expert in cars cannot be considered an expert in movies and vice versa. But in this method his rating's weight is the same, no matter whether the object is a car or a movie. Second, it could require substantial computing resources, because of the amount of computation for the reputation of any rater. Not to mention that every time a rater makes a new rating, his reputation must be recalculated.

3.1.4 Controlled anonymity and agent clustering

Dellarocas in [4] proposed a pair of methods to eliminate bad-mouthing and ballot-stuffing separately.

Controlled anonymity

The key idea of this method is a marketplace in which the controller knows the identities of all users but does not publish them. Instead of a seller's fixed identity, buyers see a pseudonym that the controller changes after every transaction. However, the controller publishes the seller's reputation along with his pseudonym, so that buyers can tell whether the seller is good or bad. Because a bad buyer does not know who the person he "hates" is, he cannot give a bad-mouthing rating.

This method assumes that it is difficult to guess who is who from the remaining information. For example, it should be difficult or impossible to identify someone by his reputation or by the services he offers, which may require that for a given service there are enough providers to make guessing difficult. Also, this method cannot be applied in cases that require identities to be published, for instance a marketplace of restaurant and hotel services.

One inherent problem of this method is that sometimes users really need to recognise those who have provided services to them. For example, it is impossible for a buyer to recommend a good seller to others. And if a buyer has had a bad experience with a seller and wants to avoid him in the future, that is also impossible.

Clustering

The above method can prevent users from giving bad-mouthing ratings, but it cannot prevent someone from inflating a seller's reputation. The reason is that any seller can use tricks to signal others so that they can recognise him; for example, he can adopt a strange behaviour that makes him unique among the rest. Therefore, the author proposed a clustering method to eliminate ballot-stuffing.

Aware of raters' different tastes for the same thing, the author considers personalised reputation ratings. The idea is that the RMS should provide a different estimated reputation of a seller's service based on each individual buyer's taste. This makes sense because buyers with different tastes expect to see different estimated reputations for the same service. The calculation of the unbiased personalised reputation of a seller s for a buyer b proceeds as follows.

First, we find the nearest-neighbour set N of b. N includes only the buyers who have rated s and who are the nearest neighbours of b in terms of similarity of opinions about commonly rated sellers. In some cases this step alone filters out all unfair raters. However, if a bad seller is clever enough to have spies who are similar enough to b to appear in N, then we need further effort. In this case, a clustering algorithm is used to split N into two clusters: Nl, which contains the lower ratings, and Nu, which contains the higher ratings. Basically, Nl is considered to contain all the fair ratings and Nu all the unfair ones (see the sketch below). The author also proposed an enhanced method to eliminate unfair rating flooding: if there is a user who gives similar ratings with very high frequency, he is likely a bad user flooding unfair ratings.
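A hypothetical sketch of the divide-and-filter step in Python (the function names and the use of k-means are our choices; the point is only that the lower cluster is kept, since ballot-stuffing inflates ratings):

import numpy as np
from sklearn.cluster import KMeans

def fair_neighbour_ratings(neighbour_ratings):
    """neighbour_ratings: ratings of seller s given by b's nearest
    neighbours. Returns the ratings deemed fair (the lower cluster N_l)."""
    x = np.asarray(neighbour_ratings, dtype=float).reshape(-1, 1)
    if len(x) < 2:
        return x.ravel()
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(x)
    low = int(np.argmin(km.cluster_centers_.ravel()))
    return x[km.labels_ == low].ravel()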

This method is grounded in practical observations; however, it does not distinguish old ratings from new ones. Because reputation can change over time, ratings from different times should be weighted differently.

To sum up, the ideas of these methods are interesting and clear. They are not "one-size-fits-all" methods, but in certain cases they could work well.

3.2 Exogenous discounting of unfair ratings

Methods of this type use not only statistical information about ratings but also other types of information. In this section we will have a look at some of the proposed methods.


3.2.1 Clustering ratings and checking IP addresses

Cornelli et al. proposed a very practical method in [3]. The paper describes a reputation management solution for P2P networks in great detail; filtering out suspect ratings is part of the solution.

In this design for P2P networks, a node b requests ratings about another node s from other nodes before deciding whether to download something from s. The paper proposed only a scheme to detect raters who commit ballot-stuffing and did not address bad-mouthing. The underlying idea is that a ballot-stuffing action in a P2P network likely originates from nodes with common properties, because those nodes are created from a common source. Thus we can use clustering techniques to see whether the raters are well distributed. If there is a low number of clusters and many raters fall in the same cluster, the node s that b wants to work with is considered untrustworthy. Note that in this case we also want to detect unfair ratings, but the result is not a set of fair raters; it is an answer to whether a node is honest or not.

However, if we can do that, nothing prevents a malicious user from anticipating it: he can distribute his forged nodes across different clusters, so that the ratings pass the clustering check. Another problem is that if we cluster based on IP addresses, the result may be incorrect because many users can share the same proxy IP address. The paper therefore adds a check phase afterwards: after clustering, some raters are chosen randomly and contacted directly to check whether they are real. If there are too many raters whom b cannot contact, we can conclude that s dishonestly created those users.
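A rough sketch of the two phases (our own illustration; the /24 grouping, the thresholds, and the contact callback are all assumptions, not details from [3]):

import random

def prefix24(ip):
    """Group IPv4 addresses by their first three octets."""
    return ".".join(ip.split(".")[:3])

def suspicious_by_clustering(rater_ips, max_share=0.5):
    """Flag s if one address cluster holds too large a share of its raters."""
    clusters = {}
    for ip in rater_ips:
        clusters[prefix24(ip)] = clusters.get(prefix24(ip), 0) + 1
    return max(clusters.values()) / len(rater_ips) > max_share

def spot_check(raters, contact, sample=5, min_alive=0.6):
    """contact(r) -> True if rater r answers when contacted directly;
    fail if too many randomly chosen raters are unreachable."""
    chosen = random.sample(list(raters), min(sample, len(raters)))
    alive = sum(1 for r in chosen if contact(r))
    return alive / len(chosen) >= min_alive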

The method is explained very practically, and it is somewhat specific to P2P networks. Because it relies on IP addresses, which is a very technology-specific way of checking for bad users, it may not adapt well in the future when bad users devise new ways to pass this check.

3.2.2 Using trust ratings in mobile ad-hoc networks

In [1] Buchegger explained a method, also for the P2P environment. In principle, this method accepts a rating if and only if the rater is trustworthy or his information is not too different from the object's current reputation. The final output is a decision on whether a user is regular or misbehaving.

First, let us define some terms.

First-hand information Fi,j: A user i updates his rating for a user j after every transaction, or after a period without any transaction between them, using the following formulas:

• After a new transaction

α = uα + s

β = uβ + (1− s)

where:

u is the discount factor for past experiences,

s is the rating for this transaction; following [1], s = 1 records a negative experience (misbehaviour) and s = 0 a positive one

• After a period of inactivity

α = uα

β = uβ

The pair (α, β) is called the first-hand information about user j held by user i, denoted Fi,j. α and β relate to the numbers of recent bad and good transactions, correspondingly. Initially (α, β) = (1, 1).

Reputation Ri,j: The reputation of user j kept by user i is Ri,j, a pair (α′, β′). This pair is updated in two cases:

• When Fi,j is updated: In this case (α′, β′) isupdated in the same way as (α, β) is updated

• When user i receives first-hand information from a user k about j

– If user k is trustworthy (we will see how to know this later):

Ri,j = Ri,j + wFk,j

where w is a small positive constant.

– If user k is not trustworthy: We define E(Beta(α, β)) as the expectation of the distribution Beta(α, β) [5]. Intuitively, E(Beta(α, β)) is the probability that the rated user will misbehave in the next interaction. Let Fk,j = (αF, βF) and Ri,j = (α′, β′); then we conduct the following deviation test:

|E(Beta(α′, β′)) − E(Beta(αF, βF))| ≥ d

where d is a positive threshold. If the deviation exceeds the threshold (the test succeeds), the report is rejected; otherwise it is incorporated as above.

From Ri,j = (α′, β′), user i can draw a conclusion about user j's reputation as follows:

user j is regular if E(Beta(α′, β′)) < r, and misbehaved if E(Beta(α′, β′)) ≥ r   (3.2.2)

where r is a threshold.

Now let us return to the question of how to decide whether a user k is trustworthy. User i always keeps a pair Ti,j = (γ, δ) as the trust rating about user j. Initially Ti,j = (1, 1), and it is updated as follows.

Whenever user i receives information about a user j from a user k, he conducts the deviation test on it. Let s = 1 if the test succeeds (the report deviates too much), and s = 0 otherwise. Then Ti,k is updated:

γ = vγ + s

δ = vδ + (1 − s)

where v is the discount factor for trust, analogous to u. Then

user k is trustworthy if E(Beta(γ, δ)) < t, and untrustworthy if E(Beta(γ, δ)) ≥ t

where t is a threshold.

This is how user i decides whether user j is regular or misbehaving. It should be noted that only first-hand information is published; reputation and trust ratings are not. Therefore no second-hand information, i.e. information passed from user to user, is used.
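To summarise the bookkeeping, here is a minimal Python sketch (our own; the thresholds d, r, t, the discount factors u, v and the weight w are assumed parameters, and α counts negative experiences as above):

def expect(a, b):
    """E[Beta(a, b)] = a / (a + b), here the estimated misbehaviour probability."""
    return a / (a + b)

class NodeRecords:
    def __init__(self, u=0.9, v=0.9, w=0.1, d=0.25, r=0.5, t=0.5):
        self.u, self.v, self.w, self.d, self.r, self.t = u, v, w, d, r, t
        self.F = {}   # first-hand info:  j -> (alpha, beta)
        self.R = {}   # reputation:       j -> (alpha', beta')
        self.T = {}   # trust in raters:  k -> (gamma, delta)

    def observe(self, j, s):
        """Direct transaction with j; s = 1 if j misbehaved, else 0."""
        for table in (self.F, self.R):
            a, b = table.get(j, (1, 1))
            table[j] = (self.u * a + s, self.u * b + (1 - s))

    def report(self, k, j, fkj):
        """Handle first-hand info fkj = (aF, bF) about j reported by k."""
        a, b = self.R.get(j, (1, 1))
        deviant = abs(expect(a, b) - expect(*fkj)) >= self.d   # deviation test
        g, dl = self.T.get(k, (1, 1))
        self.T[k] = (self.v * g + deviant, self.v * dl + (1 - deviant))
        if expect(*self.T[k]) < self.t or not deviant:
            # k is trusted, or the report agrees with what we already believe.
            self.R[j] = (a + self.w * fkj[0], b + self.w * fkj[1])

    def misbehaved(self, j):
        return expect(*self.R.get(j, (1, 1))) >= self.r       # eq. (3.2.2)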

3.2.3 Using the Weighted Majority Algorithm and belief functions

Bin Yu and Munindar Singh in [9] proposed a method that updates the weight values of other users after every transaction. The whole RMS solution is based on Dempster–Shafer theory, which measures how much a user believes in others, and the Weighted Majority Algorithm, which updates weight values. However, we will try to explain the idea of eliminating unfair ratings in a different way, so that we do not have to go into those theories.

As in every RMS, whenever a user A wants to know whether another user B is trustworthy, A collects information from other users. The point here is that the information A gets measures how much another user believes in user B. That is why the information is called a belief value, and it is expressed in the following form:

m(T), m(qT) and m(T, qT)   (3.2.3)

where

• T denotes B is trustworthy

• qT denotes B is not trustworthy

• m(T), m(qT) denote how much T and qTare believed to happen

• m(T, qT) means uncertainty between Tand qT

User A stores a set of weights wi, one for each other user. If A meets a new user, wi is initially 1. The effective belief rating is therefore:

m′i(T) = wi ∗ mi(T)

m′i(qT) = wi ∗ mi(qT)

m′i(T, qT) = 1 − m′i(T) − m′i(qT)

Belief values from all witnesses are combined to get the final prediction. However, to focus on detecting unfair ratings, we will only look at how to cope with wrong ratings.

The general idea is as follows: first, we convert every belief into a prediction; put simply, the more you trust someone, the more likely you are to predict that he will behave honestly. Second, after every real transaction with a seller, we decrease the weight values of the users who guessed wrongly.


First, to convert from belief to prediction, we can turn beliefs into likelihoods according to (3.2.3) as follows:

q′i(T) = m′i(T) + m′i(T, qT)

q′i(qT) = m′i(qT) + m′i(T, qT)

where q′i is the likelihood rating of T and qT. Therefore the prediction, i.e. the probability that a user will be honest, from witness Wi is

πi = q′i(T) / (q′i(T) + q′i(qT)) = (m′i(T) + m′i(T, qT)) / (1 + m′i(T, qT))   (3.2.4)
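In code, the conversion is a one-liner per witness (a tiny helper of our own, following (3.2.4)):

def prediction(m_T, m_notT, w):
    """m_T, m_notT: witness i's raw beliefs mi(T), mi(qT); w: i's weight wi."""
    mT, mN = w * m_T, w * m_notT
    m_unc = 1 - mT - mN                  # m'i(T, qT), the uncertainty mass
    return (mT + m_unc) / (1 + m_unc)    # pi_i, eq. (3.2.4)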

After getting a new service from another user Ag, a user Ar will rate it as xk, where 0 ≤ xk ≤ 1. If Ar used this rating only, he would estimate the probability of Ag being trustworthy as follows:

ρ = 1 if xk ≥ Ωr,   ρ = 0.5 if ωr ≤ xk < Ωr,   ρ = 0 otherwise   (3.2.5)

where ωr and Ωr are Ar's lower and upper rating thresholds.

Second, user Ar can now update the weight value of each witness Wi as follows:

w′i = θwi

where θ is simply defined as

θ = 1 − |πi − ρ|²   (3.2.6)

Note that after every transaction, all users who have given ratings to user Ar are assessed and their weight values updated accordingly. By applying this mechanism, after some rounds, users who often give bad ratings will have low weights, and unfair ratings from them will not much affect the decision on whether a certain user is trustworthy.

This algorithm makes good sense in that it tries to eliminate "bad advisors". In real life, this is the case when you stop believing people once you have realised that they always tell lies or are simply "stupid". However, the algorithm should be developed further to suit more cases, for the reasons below.

First, the algorithm assumes a community in which a user works with a relatively fixed set of witnesses repeatedly. That is why the goal of the algorithm is to assess other users' weights after every "real experience". However, it also means that a user has to accept that in his first interactions he may have wrong opinions about other users. For this reason, if a new buyer at an online auction site wants to buy something only once, it is likely that he will not get what he wants, because initially he believes everyone.

Second, things can be worse under unfair rating flooding. In this case, before a new user works out who is trustworthy, he may suffer considerably from too many "bad advisors" in his first transactions.

Third, θ is always ≤ 1. Therefore there is no way to increase the weight of a user who gave "bad advice" in his first interactions but has recently always been correct. This is unfair, because he still has a low weight while every "new advisor" gets a weight of 1.

Nevertheless, this algorithm takes quite a different approach from the others. It is realistic in some cases, and better solutions might come from combining it with other algorithms.

4 Discussion

After this brief look at the methods above, we can make the following comments.

Unfair rating is a common problem in RMSes. In fact, whenever we think of building an RMS, we cannot ignore this issue. Among the papers discussed above, some include unfair rating detection as part of their RMS proposals [3, 1, 2]; others propose unfair rating detection separately, for RMSes that had been proposed before [4, 7, 8, 9].

All solutions accept only slight modifications of reputation, meaning a buyer never switches to distrusting a seller after just one bad transaction. Moreover, if a seller suddenly provides bad service, no one can predict it (of course). The problem is that, even after getting information about it from other users, a user may still make wrong decisions, because it takes time before a reputation decreases significantly. However, if a user experiences a bad service himself, the reputation he calculates for the seller drops much more than if he merely hears about it from someone else. The reason is that local information, or direct observation, carries more weight than information from others.

One remarkable point is that some methods [7, 8] can produce different results depending on the order in which ratings are processed. This means that a rating may sometimes be considered unfair and sometimes not, although this does not happen often. Methods such as clustering do not have this property, because they process the set of ratings as a whole.

Another observation is that endogenous methods are generally preferred for centralised RMSes, while exogenous ones are used more in P2P systems. The reason may lie in the characteristics of the two architectures. In centralised systems, users can get information from a trusted server, in bulk, so they can easily analyse the data statistically. In a P2P network there is no fixed trusted node, and nothing guarantees that we can always get the rating information we want, because the data is not kept in a centralised database. On the other hand, we can always get other types of information from any node we can contact (for example, its IP address). For these reasons, exogenous methods may be more suitable for P2P environments.

Although we have several methods with different ideas, approaches and techniques, it is hard to say which one is the best. Every method has its own advantages and disadvantages, as discussed above. Deciding which one to apply in a given case requires a deep understanding of both the system and the method.

We have not found any implementation of the research above. To our knowledge, the well-known auction site eBay still uses a very simple way to manage reputation, and it works. The reason could be that among the proposed methods there is no perfect one, while the current real-life implementation works well enough.

5 Conclusion

RMSes have become popular today and promise an important role in the Internet age. However, unfair rating is a big obstacle to building a really good RMS: no RMS can be perfect if it cannot fully solve the problem of unfair ratings. This issue has been studied in several recent projects, but mainly in experiments; the results have not yet been implemented in real life. At the moment, some popular RMSes still manage reputation in a simple way. For example, eBay just lets buyers and sellers rate each other after every transaction, and others can see the percentage of positive ratings of anyone. This somewhat discourages users from giving ratings, for fear of retaliation. The RMS of eBay is not perfect, but it has been acceptable so far. Still, we need better ones, not only for online markets but also for other types of online communities.

In this paper, we have looked at some recently proposed methods for detecting unfair ratings in RMSes. Although we cannot cover all the related research ever done, we still get an overview of current trends on this issue. We have also discussed the pros and cons of the methods and seen what should be researched further. Although a lot of work has been done, more effort is required to improve the way we manage unfair ratings before we can see really good RMSes in the future.

References

[1] S. Buchegger and J. Le Boudec. A Robust Reputation System for Mobile Ad-hoc Networks. Proceedings of P2PEcon, June 2004.

[2] M. Chen and J. Singh. Computing and using reputations for internet ratings. Proceedings of the 3rd ACM Conference on Electronic Commerce, pages 154–162, 2001.

[3] F. Cornelli, E. Damiani, S. Paraboschi, and P. Samarati. Choosing Reputable Servents in a P2P Network.

[4] C. Dellarocas. Immunizing online reputation reporting systems against unfair ratings and discriminatory behavior. Proceedings of the 2nd ACM Conference on Electronic Commerce, pages 150–157, 2000.

[5] A. Jøsang and R. Ismail. The beta reputation system. Proceedings of the 15th Bled Electronic Commerce Conference, 2002.

[6] A. Jøsang, R. Ismail, and C. Boyd. A survey of trust and reputation systems for online service provision. Decision Support Systems, pages 1–27, 2005.

[7] J. Weng, C. Miao, and A. Goh. Protecting Online Rating Systems from Unfair Ratings. Lecture Notes in Computer Science, 3592:50, 2005.

[8] A. Whitby, A. Jøsang, and J. Indulska. Filtering out unfair ratings in Bayesian reputation systems. Proceedings of the 7th Intl. Workshop on Trust in Agent Societies, 2004.

[9] B. Yu and M. Singh. Detecting deception in reputation management. Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, pages 73–80, 2003.