Transcript
Page 1: Mining Product Reputations On the Web

Mining Product Reputations on the Web

SIGKDD 02 Edmonton, Alberta, Canada

Copyright 2002 ACM

Satoshi Morinaga, Kenji Yamanishi

NEC Corporation

Kenji Tateishi, Toshikazu Fukushima

NEC Corporation

Page 2: Mining Product Reputations On the Web

Agenda
• Introduction
• Reputation mining system
• Opinion extraction
• Reputation analysis
• Experiments
• Concluding remarks

Page 3: Mining Product Reputations On the Web

Presented by Joyce Chen

Introduction

Knowing the reputation of your own and/or competitors’ products is important.

Problems:
• Handling a large volume of open answers manually
• Gathering a large volume of high-quality survey data

Solution:
• A new framework for automatically collecting and analyzing opinions on the internet
• Combines the opinion extraction technique with text mining methodologies

Previously employed in Survey Analyzer (SA, a trademark of NEC Corporation in Japan):
• Text mining focuses on open answers
• Text classification through closed answers or manual labeling

Page 4: Mining Product Reputations On the Web

Introduction (cont.)

Opinion extraction
Collects people’s opinions about products from the internet and attaches three labels:

• The name of the product referred to
• The positive/negative nature of the opinion
• Opinion-likeliness (a numerical value giving the degree of system confidence that the extracted statement is an opinion)
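The three labels can be pictured as fields on a record. The following is a minimal sketch, not the system's actual schema: the class name `LabeledOpinion`, the field names, the sample rows, and the use of a plain list as the "opinion database" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledOpinion:
    """One extracted statement plus the three labels from the slide (hypothetical schema)."""
    text: str          # the extracted statement itself
    product: str       # name of the product referred to
    polarity: str      # "positive" or "negative"
    likeliness: float  # opinion-likeliness, i.e. system confidence


# A plain list stands in for the opinion database in this sketch.
opinion_db = [
    LabeledOpinion("It is lightweight and convenient.", "Phone A", "positive", 4.0),
    LabeledOpinion("The fan is noisy.", "Phone B", "negative", 3.5),
]


def opinions_for(db, product, polarity):
    """Select the opinions about one product with one polarity."""
    return [o for o in db if o.product == product and o.polarity == polarity]
```

Later analysis steps would then query this collection by product and polarity, for example `opinions_for(opinion_db, "Phone A", "positive")`.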

Labeled opinions are put into an opinion database.

Reputation analysis

Rule analysis (extracting characteristic words)
• “monochrome” and “inexpensive”, “lightweight” and “convenient”
• Stochastic complexity

Co-occurrence analysis
Typical sentence analysis
Correspondence analysis
• Two-dimensional positioning map
• Displays the corresponding relationships among the target categories

Page 5: Mining Product Reputations On the Web

Reputation mining system

Page 6: Mining Product Reputations On the Web

Opinion extraction

Web page collection module
• Uses a crawler to collect web pages relevant to the input product names

Positive/negative determining module
• Statements are checked against a previously prepared “evaluation-expression dictionary”
• “fast”, “good”, “light” are positive expressions
• “heavy”, “easily broken”, “noisy” are negative expressions
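The dictionary check can be sketched as a simple lookup. The six expressions below come from the slide; everything else is an assumption for illustration, including the function names, the whole-word matching, and the rule that the side with more hits wins. The slides do not say how the real module resolves mixed statements.

```python
import re

# Expressions taken from the slide; a real dictionary would be far larger.
POSITIVE = {"fast", "good", "light"}
NEGATIVE = {"heavy", "easily broken", "noisy"}


def count_hits(expressions, text):
    """Count whole-word occurrences of each dictionary expression in text."""
    return sum(
        len(re.findall(r"\b" + re.escape(expr) + r"\b", text))
        for expr in expressions
    )


def polarity(statement):
    """Return 'positive', 'negative', or None (assumed majority rule)."""
    text = statement.lower()
    pos = count_hits(POSITIVE, text)
    neg = count_hits(NEGATIVE, text)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # no hits, or a tie
```

Whole-word matching is used so that "light" does not fire on "lightweight"; whether the original dictionary behaves this way is not stated.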

Page 7: Mining Product Reputations On the Web

Opinion extraction (cont.)

Opinion-likeliness calculation module
• Calculates an opinion-likeliness score for each statement
• A real value ranging from 1 to 5
• The higher the score, the higher the likelihood
• Uses syntactic property rules, learned manually from training examples or by standard machine learning
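A rule-based score of this kind might look like the sketch below. The slides do not list the actual syntactic property rules, so the three patterns and their weights here are purely hypothetical placeholders; only the 1 to 5 range comes from the slide.

```python
import re

# HYPOTHETICAL cues and weights; the real rules are not given in the slides.
RULES = [
    (re.compile(r"\b(i|we|my)\b"), 1.0),                           # first-person wording
    (re.compile(r"\b(think|feel|love|hate)\b"), 1.5),              # opinion verb
    (re.compile(r"\b(is|was|seems)\s+(very|too|really)\b"), 1.5),  # intensified predicate
]


def opinion_likeliness(statement):
    """Score a statement from 1.0 to 5.0; higher means more opinion-like."""
    text = statement.lower()
    score = 1.0 + sum(weight for pattern, weight in RULES if pattern.search(text))
    return min(5.0, max(1.0, score))  # clamp to the range stated on the slide
```

A learned variant would replace the hand-set weights with values fitted on labeled training examples.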

Page 8: Mining Product Reputations On the Web

Reputation analysis

Rule Analysis (Characteristic-Word Extraction)

Training
• Resembles decision tree generation
• Uses stochastic complexity as a criterion

Text classification rules & association rules
• Ordered sequences of IF-THEN-ELSE rules

Extract keywords indicative of a specified category
• Stochastic complexity formula
• Score(w) represents information gain
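The idea of scoring a word by information gain can be sketched as follows. The stochastic complexity formula itself is not reproduced in this transcript, so plain empirical entropy is swapped in as the code-length measure; the function names and the toy documents are assumptions for illustration only.

```python
import math


def code_length(n_pos, n_total):
    """Empirical-entropy code length (in bits) for n_total binary labels.

    Stand-in for stochastic complexity, which is not shown in the slides.
    """
    if n_total == 0 or n_pos in (0, n_total):
        return 0.0
    p = n_pos / n_total
    return -n_total * (p * math.log2(p) + (1 - p) * math.log2(1 - p))


def score(word, docs):
    """Information gain from splitting docs on whether they contain word.

    docs is a list of (set_of_words, is_target_category) pairs.
    """
    all_labels = [label for _, label in docs]
    with_w = [label for words, label in docs if word in words]
    without_w = [label for words, label in docs if word not in words]
    before = code_length(sum(all_labels), len(all_labels))
    after = (code_length(sum(with_w), len(with_w))
             + code_length(sum(without_w), len(without_w)))
    return before - after


# Toy data: True marks opinions belonging to the target category.
docs = [
    ({"monochrome", "inexpensive"}, True),
    ({"inexpensive", "small"}, True),
    ({"color", "expensive"}, False),
    ({"color", "small"}, False),
]
```

On this toy data "inexpensive" separates the two categories perfectly and gets the highest score, while "small" appears equally in both and scores zero.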

Page 9: Mining Product Reputations On the Web

Rule Analysis (cont.)

Page 10: Mining Product Reputations On the Web

Co-occurrence analysis

Extract a list of words that co-occur with characteristic words
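A minimal sketch of this step is given below, assuming that co-occurrence means "appears in the same sentence" and that words are ranked by raw count. Both assumptions, the function name, and the sample sentences are illustrative; the slides do not specify the window or the ranking measure.

```python
from collections import Counter


def co_occurring(characteristic_word, sentences, top_n=3):
    """List the words that most often share a sentence with characteristic_word."""
    counts = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        if characteristic_word in words:
            # Count each co-occurring word once per sentence.
            counts.update(words - {characteristic_word})
    return counts.most_common(top_n)


sentences = [
    "the lightweight body is convenient",
    "lightweight and convenient to carry",
    "the battery is heavy",
]
```

Here `co_occurring("lightweight", sentences, top_n=1)` picks out "convenient", which appears in both sentences that mention "lightweight".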

Page 11: Mining Product Reputations On the Web

Typical sentence analysis & Correspondence analysis

Typical sentence analysis
• Gives the user a simple overview of tendencies
• Scores are calculated on the basis of naïve Bayesian theory (posterior probability)
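Ranking sentences by a naive Bayes posterior can be sketched as below. This is an illustration under stated assumptions, not the presented system: the whitespace tokenization, the add-one smoothing, the function names, and the toy data are all choices made for the example.

```python
import math
from collections import Counter


def train(labeled_sentences):
    """Collect per-category word counts from (sentence, category) pairs."""
    word_counts, cat_counts, vocab = {}, Counter(), set()
    for sentence, cat in labeled_sentences:
        words = sentence.lower().split()
        word_counts.setdefault(cat, Counter()).update(words)
        cat_counts[cat] += 1
        vocab.update(words)
    return word_counts, cat_counts, vocab


def posterior(sentence, category, model):
    """P(category | sentence) under naive Bayes with add-one smoothing."""
    word_counts, cat_counts, vocab = model
    total = sum(cat_counts.values())
    log_joint = {}
    for cat in cat_counts:
        n_words = sum(word_counts[cat].values())
        lp = math.log(cat_counts[cat] / total)
        for w in sentence.lower().split():
            lp += math.log((word_counts[cat][w] + 1) / (n_words + len(vocab)))
        log_joint[cat] = lp
    top = max(log_joint.values())  # subtract the max for numerical stability
    norm = sum(math.exp(v - top) for v in log_joint.values())
    return math.exp(log_joint[category] - top) / norm


def typical_sentences(labeled_sentences, category, top_n=1):
    """Return the category's sentences with the highest posterior."""
    model = train(labeled_sentences)
    candidates = [s for s, c in labeled_sentences if c == category]
    ranked = sorted(candidates, key=lambda s: posterior(s, category, model), reverse=True)
    return ranked[:top_n]


data = [
    ("light and convenient", "A"),
    ("light and small", "A"),
    ("heavy and noisy", "B"),
    ("heavy but small", "B"),
]
```

With this toy data, "light and convenient" is the most typical sentence for category A, because "small" also occurs under category B and lowers the posterior of the other candidate.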

Correspondence analysis
• Creates a two-dimensional positioning map
• An extension of principal component analysis (PCA)
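For reference, the standard textbook formulation of correspondence analysis is sketched below. It assumes N is a contingency table with target categories as rows and extracted words as columns; that layout and all symbols are assumptions for illustration, since the slides give no formulas.

```latex
\begin{align}
P &= \tfrac{1}{n} N, & n &= \sum_{i,j} N_{ij}, \\
r &= P\mathbf{1}, & c &= P^{\top}\mathbf{1}, \\
S &= D_r^{-1/2}\,(P - r c^{\top})\,D_c^{-1/2} = U \Sigma V^{\top}, \\
F &= D_r^{-1/2} U \Sigma, & G &= D_c^{-1/2} V \Sigma .
\end{align}
```

Here D_r and D_c are the diagonal matrices built from the row masses r and column masses c. The first two columns of F and G give the row and column coordinates on a two-dimensional map.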

Page 12: Mining Product Reputations On the Web

Experiments – Cellular Phone

Page 13: Mining Product Reputations On the Web

Experiments - PDAs

Page 14: Mining Product Reputations On the Web

Experiments – Internet Service Providers

Page 15: Mining Product Reputations On the Web

Concluding remarks

Proposes a framework for mining product reputations on the web.

Four fundamental tasks:
• Characteristic word extraction
• Co-occurring word extraction
• Typical sentence extraction
• Correspondence analysis

The key to combining the two parts is opinion labeling.
This framework could be applied to mining reputations far beyond industrial products, e.g., events, services, companies, governments, etc.

