Model ingredients:
• node labels as random variables
• prior beliefs
• observed neighbor potentials
• compatibility potentials
Opinion Fraud Detection in Online Reviews using Network Effects
Leman Akoglu, Stony Brook University
Christos Faloutsos, Carnegie Mellon University
Rishi Chandy, Carnegie Mellon University
Which reviews do/should you trust?
Problem Statement
A network classification problem: Given
Classify network objects into type-specific classes:
the user-product review network (bipartite)
review sentiments (+: thumbs-up, -: thumbs-down)
users: `honest’ / `fraudster’
products: `good’ / `bad’
reviews: `genuine’ / `fake’
A Fake-Review(er) Detection System
Desired properties that such a system should have:
Property 1: Network effects. Fraudulence of reviews/reviewers is revealed in relation to others, so the review network should be used.
Property 2: Side information. Behavioral (e.g. login times) and linguistic (e.g. use of capital letters) clues should be exploited.
Property 3: Un/Semi-supervision. Methods should not expect a fully labeled training set (human labelers are at best close to random).
Property 4: Scalability. Methods should be (sub)linear in data/network size.
Property 5: Incrementality. Methods should compute fraudulence scores incrementally with the arrival of new data (hourly/daily).
Problem Formulation: A Collective Classification Approach
Objective function utilizes pairwise Markov Random Fields (Kindermann & Snell, 1980), extended with edge signs:
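The formula that followed on the slide did not survive extraction. A standard pairwise-MRF objective consistent with the description above (a reconstruction, not copied from the slide) would be:

```latex
P(y) \;=\; \frac{1}{Z} \prod_{i \in V} \phi_i(y_i)
\prod_{(i,j,s) \in E} \psi^{s}_{ij}(y_i, y_j)
```

where $\phi_i$ is the prior belief of node $i$, $\psi^{s}_{ij}$ is the compatibility potential for an edge with sign $s \in \{+,-\}$, and $Z$ is the normalization constant.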
Finding the best assignments is the inference problem, which is NP-hard for general graphs. We use a computationally tractable (linearly scalable with network size) approximate inference algorithm, Loopy Belief Propagation (LBP) (Pearl, 1982).
• Iterative process in which neighbor variables “talk” to each other, passing messages
• When consensus is reached, calculate beliefs
Signed Inference Algorithm (sIA):
Inference
“I (variable x1) believe you (variable x2) belong in these states with various likelihoods…”
I) Repeat for each node:
II) At convergence:
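The formulas for steps I and II were lost in extraction. Standard LBP updates consistent with the slide (a reconstruction, with $\psi^{s}$ the sign-dependent compatibility) would be:

```latex
% I) message update along an edge with sign s, repeated for each node:
m_{i \to j}(y_j) \;\propto\; \sum_{y_i} \psi^{s}_{ij}(y_i, y_j)\, \phi_i(y_i)
\prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(y_i)

% II) belief at convergence:
b_i(y_i) \;\propto\; \phi_i(y_i) \prod_{k \in N(i)} m_{k \to i}(y_i)
```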
Scoring:
Before After
Compatibility:
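The steps above can be sketched in code. This is a minimal illustration of belief propagation on a signed bipartite graph, not the paper's implementation: the toy edges, the epsilon value, and the exact compatibility entries are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy review network: (user, product, sign),
# with +1 = thumbs-up and -1 = thumbs-down.
edges = [("u1", "p1", +1), ("u2", "p1", +1), ("u3", "p1", -1),
         ("u3", "p2", +1), ("u1", "p2", -1)]
users = {u for u, _, _ in edges}

# Sign-dependent compatibility psi[sign][user_state][product_state];
# state 0 = honest/good, state 1 = fraudster/bad. Entries are illustrative:
# e.g. under a '+' edge, an honest user mostly reviews good products.
EPS = 0.1
psi = {
    +1: [[1 - EPS, EPS],
         [2 * EPS, 1 - 2 * EPS]],
    -1: [[EPS, 1 - EPS],
         [1 - 2 * EPS, 2 * EPS]],
}

neighbors = defaultdict(list)          # node -> [(neighbor, edge sign), ...]
for u, p, s in edges:
    neighbors[u].append((p, s))
    neighbors[p].append((u, s))

# One message per directed edge, initialized uniform over the two states.
msgs = {(a, b): [0.5, 0.5] for a in neighbors for b, _ in neighbors[a]}

def compat(src, sign, ys, yd):
    """psi is indexed (user_state, product_state); transpose for products."""
    return psi[sign][ys][yd] if src in users else psi[sign][yd][ys]

for _ in range(30):                    # fixed iteration budget for the sketch
    new = {}
    for src in neighbors:
        for dst, sign in neighbors[src]:
            out = []
            for yd in (0, 1):
                total = 0.0
                for ys in (0, 1):
                    prod = 0.5         # uninformative prior phi(ys)
                    for k, _ in neighbors[src]:
                        if k != dst:
                            prod *= msgs[(k, src)][ys]
                    total += compat(src, sign, ys, yd) * prod
                out.append(total)
            z = sum(out)
            new[(src, dst)] = [v / z for v in out]
    msgs = new

def belief(node):
    """Normalized belief over {0, 1} = {honest/good, fraudster/bad}."""
    b = [0.5, 0.5]
    for k, _ in neighbors[node]:
        b = [b[y] * msgs[(k, node)][y] for y in (0, 1)]
    z = sum(b)
    return [v / z for v in b]

for node in sorted(neighbors):
    print(node, [round(v, 3) for v in belief(node)])
```

Fraudulence scores are then read off as each user's belief of being in the fraudster state; messages are normalized at every step, which keeps the fixed-point iteration numerically stable.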
Datasets
I) SWM: all app reviews in the entertainment category (games, news, sports, etc.) from an anonymous online app store database. As of June 2012:
• 1,132,373 reviews
• 966,842 users
• 15,094 software products (apps)
Ratings: 1 (worst) to 5 (best).
II) Also simulated fake-review data (with ground truth).
Competitors
Compared to 2 iterative classifiers (modified to handle signed edges):
I) Weighted-vote Relational Classifier (wv-RC) (Macskassy & Provost, 2003)
II) HITS (honesty-goodness in mutual recursion) (Kleinberg, 1999)
Real-data Results
Performance on simulated data: (from left to right) sIA, wv-RC, HITS
Top 100 users and their product votes:
+ : (4-5) rating, o : (1-2) rating
“bot” members?
Top-scorers matter:
Conclusions
Novel framework that exploits network effects to automatically spot fake review(er)s.
• Problem formulation as collective classification in bipartite networks
• Efficient scoring/inference algorithm to handle signed edges
• Desirable properties: i) general, ii) un/semi-supervised, iii) scalable
• Experiments on real & synthetic data: better than competitors, finds real fraudsters.