Model ingredients:
• node labels as random variables
• prior beliefs
• observed neighbor potentials
• compatibility potentials
Opinion Fraud Detection in Online Reviews using Network Effects
Leman Akoglu, Stony Brook University
Christos Faloutsos, Carnegie Mellon University
Rishi Chandy, Carnegie Mellon University
Which reviews do/should you trust?
Problem Statement
A network classification problem: Given
Classify network objects into type-specific classes:
the user-product review network (bipartite)
review sentiments (+: thumbs-up, -: thumbs-down)
users: `honest’ / `fraudster’
products: `good’ / `bad’
reviews: `genuine’ / `fake’
A Fake-Review(er) Detection System
Desired properties that such a system should have:
Property 1: Network effects. Fraudulence of reviews/reviewers is revealed in relation to others, so the review network should be used.
Property 2: Side information. Behavioral (e.g. login times) and linguistic (e.g. use of capital letters) clues should be exploited.
Property 3: Un/Semi-supervision. Methods should not expect a fully labeled training set (human labelers are at best close to random).
Property 4: Scalability. Methods should be (sub)linear in data/network size.
Property 5: Incrementality. Methods should compute fraudulence scores incrementally with the arrival of new data (hourly/daily).
Problem Formulation: A Collective Classification Approach
Objective function utilizes pairwise Markov Random Fields (Kindermann & Snell, 1980), extended with edge signs:
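The formula that followed on the slide did not survive extraction. A standard pairwise-MRF objective consistent with the description above (a reconstruction, not copied from the slide) would be:

```latex
P(y) \;=\; \frac{1}{Z} \prod_{i \in V} \phi_i(y_i)
\prod_{(i,j,s) \in E} \psi^{s}_{ij}(y_i, y_j)
```

where $\phi_i$ is the prior belief of node $i$, $\psi^{s}_{ij}$ is the compatibility potential for an edge with sign $s \in \{+,-\}$, and $Z$ is the normalization constant.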
Finding the best assignments is the inference problem, which is NP-hard for general graphs. We use a computationally tractable (linearly scalable with network size) approximate inference algorithm, Loopy Belief Propagation (LBP) (Pearl, 1982).
• Iterative process in which neighbor variables “talk” to each other, passing messages
• When consensus is reached, calculate beliefs
Signed Inference Algorithm (sIA):
Inference
“I (variable x1) believe you (variable x2) belong in these states with various likelihoods…”
I) Repeat for each node:
II) At convergence:
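The formulas for steps I and II were lost in extraction. Standard LBP updates consistent with the slide (a reconstruction, with $\psi^{s}$ the sign-dependent compatibility) would be:

```latex
% I) message update along an edge with sign s, repeated for each node:
m_{i \to j}(y_j) \;\propto\; \sum_{y_i} \psi^{s}_{ij}(y_i, y_j)\, \phi_i(y_i)
\prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(y_i)

% II) belief at convergence:
b_i(y_i) \;\propto\; \phi_i(y_i) \prod_{k \in N(i)} m_{k \to i}(y_i)
```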
Scoring:
Before After
Compatibility:
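The steps above can be sketched in code. This is a minimal illustration of belief propagation on a signed bipartite graph, not the paper's implementation: the toy edges, the epsilon value, and the exact compatibility entries are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy review network: (user, product, sign),
# with +1 = thumbs-up and -1 = thumbs-down.
edges = [("u1", "p1", +1), ("u2", "p1", +1), ("u3", "p1", -1),
         ("u3", "p2", +1), ("u1", "p2", -1)]
users = {u for u, _, _ in edges}

# Sign-dependent compatibility psi[sign][user_state][product_state];
# state 0 = honest/good, state 1 = fraudster/bad. Entries are illustrative:
# e.g. under a '+' edge, an honest user mostly reviews good products.
EPS = 0.1
psi = {
    +1: [[1 - EPS, EPS],
         [2 * EPS, 1 - 2 * EPS]],
    -1: [[EPS, 1 - EPS],
         [1 - 2 * EPS, 2 * EPS]],
}

neighbors = defaultdict(list)          # node -> [(neighbor, edge sign), ...]
for u, p, s in edges:
    neighbors[u].append((p, s))
    neighbors[p].append((u, s))

# One message per directed edge, initialized uniform over the two states.
msgs = {(a, b): [0.5, 0.5] for a in neighbors for b, _ in neighbors[a]}

def compat(src, sign, ys, yd):
    """psi is indexed (user_state, product_state); transpose for products."""
    return psi[sign][ys][yd] if src in users else psi[sign][yd][ys]

for _ in range(30):                    # fixed iteration budget for the sketch
    new = {}
    for src in neighbors:
        for dst, sign in neighbors[src]:
            out = []
            for yd in (0, 1):
                total = 0.0
                for ys in (0, 1):
                    prod = 0.5         # uninformative prior phi(ys)
                    for k, _ in neighbors[src]:
                        if k != dst:
                            prod *= msgs[(k, src)][ys]
                    total += compat(src, sign, ys, yd) * prod
                out.append(total)
            z = sum(out)
            new[(src, dst)] = [v / z for v in out]
    msgs = new

def belief(node):
    """Normalized belief over {0, 1} = {honest/good, fraudster/bad}."""
    b = [0.5, 0.5]
    for k, _ in neighbors[node]:
        b = [b[y] * msgs[(k, node)][y] for y in (0, 1)]
    z = sum(b)
    return [v / z for v in b]

for node in sorted(neighbors):
    print(node, [round(v, 3) for v in belief(node)])
```

Fraudulence scores are then read off as each user's belief of being in the fraudster state; messages are normalized at every step, which keeps the fixed-point iteration numerically stable.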
Datasets
I) SWM: all app reviews in the entertainment category (games, news, sports, etc.) from an anonymous online app store database. As of June 2012:
• 1,132,373 reviews
• 966,842 users
• 15,094 software products (apps)
Ratings: 1 (worst) to 5 (best).
II) Also simulated fake-review data (with ground truth).
Competitors
Compared to 2 iterative classifiers (modified to handle signed edges):
I) Weighted-vote Relational Classifier (wv-RC) (Macskassy & Provost, 2003)
II) HITS (honesty-goodness in mutual recursion) (Kleinberg, 1999)
Real-data Results
Performance on simulated data: (from left to right) sIA, wv-RC, HITS
Top 100 users and their product votes:
+ : (4-5) rating, o : (1-2) rating
“bot” members?
Top-scorers matter:
Conclusions
Novel framework that exploits network effects to automatically spot fake review(er)s.
• Problem formulation as collective classification in bipartite networks
• Efficient scoring/inference algorithm to handle signed edges
• Desirable properties: i) general, ii) un/semi-supervised, iii) scalable
• Experiments on real & synthetic data: better than competitors, finds real fraudsters.