mining product reputations on the web
Post on 04-Jul-2015
Mining Product Reputations on the Web
SIGKDD 02 Edmonton, Alberta, Canada
Copyright 2002 ACM
Satoshi Morinaga, Kenji Yamanishi
NEC Corporation
Kenji Tateishi, Toshikazu Fukushima
NEC Corporation
Agenda
• Introduction
• Reputation mining system
• Opinion extraction
• Reputation analysis
• Experiments
• Concluding remarks
Presented by Joyce Chen
Introduction
Knowing the reputation of your own and/or competitors’ products is important.
Problems:
• Handling a large volume of open-ended answers manually is costly.
• Gathering a large volume of high-quality survey data is difficult.
Solution:
• A new framework for automatically collecting and analyzing opinions on the internet.
• It combines opinion extraction techniques with text mining methodologies.
• The text mining methodologies were previously employed in Survey Analyzer (SA, a trademark of NEC Corporation in Japan).
  • Text mining focuses on the open-ended answers.
  • Text classification is done through closed answers or manual labeling.
Introduction (cont.)
Opinion extraction
• Collects people’s opinions about products from the internet and attaches three labels:
  • The name of the product referred to
  • The positive/negative nature of the opinion
  • Opinion-likeliness (a numerical value giving the system’s degree of confidence that the extracted statement is an opinion)
• Labeled opinions are put into an opinion database.
Reputation analysis
• Rule analysis (extracting characteristic words)
  • e.g., “monochrome” and “inexpensive”, “lightweight” and “convenient”
  • Stochastic complexity
• Co-occurrence analysis
• Typical sentence analysis
• Correspondence analysis
  • Two-dimensional positioning map
  • Displays the correspondence relationships among the target categories.
Reputation mining system
Opinion extraction
Web page collection module
• Uses a crawler to collect web pages relevant to the input product names.
Positive/negative determining module
• Each statement is checked against a previously prepared “evaluation-expression dictionary”.
• “fast”, “good”, “light” are positive expressions.
• “heavy”, “easily broken”, “noisy” are negative expressions.
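The dictionary check can be pictured with a minimal Python sketch. Only the six expressions come from the slide; the function name `determine_polarity`, the whole-word matching, and the majority-vote tie handling are assumptions made for illustration, not the module's actual logic.

```python
import re

# Miniature evaluation-expression dictionary. The six entries are the examples
# given on the slide; the real dictionary is not shown in the source.
EVALUATION_EXPRESSIONS = {
    "fast": "positive",
    "good": "positive",
    "light": "positive",
    "heavy": "negative",
    "easily broken": "negative",
    "noisy": "negative",
}


def determine_polarity(statement):
    """Return (polarity, matched_expressions) for one statement.

    Polarity is "positive", "negative", or None when nothing matches or the
    counts tie. Majority voting is an assumption of this sketch.
    """
    text = statement.lower()
    # Match whole words or phrases only, so "light" does not fire on "lightweight".
    matched = [
        expr
        for expr in EVALUATION_EXPRESSIONS
        if re.search(r"\b" + re.escape(expr) + r"\b", text)
    ]
    positives = sum(1 for e in matched if EVALUATION_EXPRESSIONS[e] == "positive")
    negatives = len(matched) - positives
    if positives > negatives:
        return "positive", matched
    if negatives > positives:
        return "negative", matched
    return None, matched
```

A call such as `determine_polarity("The fan is noisy and the case is heavy")` yields a negative label together with the two matched expressions. Negation ("not good") is ignored here for brevity.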
Opinion extraction (cont.)
Opinion-likeliness calculation module
• Calculates an opinion-likeliness score for each statement.
• The score is a real value ranging from 1 to 5.
• The higher the score, the higher the likelihood that the statement is an opinion.
• Uses syntactic property rules, which are either
  • learned manually from training examples, or
  • learned by standard machine learning.
Reputation analysis
Rule Analysis (Characteristic-Word Extraction)
• Training
  • Resembles decision tree generation
  • Uses stochastic complexity as a criterion
• Text classification rules & association rules
  • Ordered sequences of IF-THEN-ELSE rules
• Extracts keywords indicative of a specified category
  • Stochastic complexity formula
  • Score(w) represents information gain
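The formula itself did not survive in this transcript. The following Python sketch shows one way Score(w) could be computed as an information gain, assuming the usual asymptotic approximation of stochastic complexity for a binary sequence, m·H(m₊/m) + ½·log(mπ/2). That approximation, the function names, and the toy data are assumptions of this sketch and may differ from the formula on the original slide.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source; 0 at the endpoints."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def stochastic_complexity(m, m_pos):
    """Approximate code length (bits) of m binary labels, m_pos of them positive.

    Assumed form: m * H(m_pos / m) + 0.5 * log2(m * pi / 2).
    """
    if m == 0:
        return 0.0
    return m * binary_entropy(m_pos / m) + 0.5 * math.log2(m * math.pi / 2.0)


def score(docs, word):
    """Information gain from splitting docs on whether they contain `word`.

    docs is a list of (set_of_words, in_target_category) pairs.
    """
    with_w = [label for words, label in docs if word in words]
    without_w = [label for words, label in docs if word not in words]
    before = stochastic_complexity(len(docs), sum(label for _, label in docs))
    after = (stochastic_complexity(len(with_w), sum(with_w))
             + stochastic_complexity(len(without_w), sum(without_w)))
    return before - after


# Toy data (hypothetical): True marks opinions in the target category.
docs = [
    ({"lightweight", "small"}, True),
    ({"lightweight", "battery"}, True),
    ({"heavy", "battery"}, False),
    ({"heavy", "small"}, False),
]
ranked = sorted(["lightweight", "battery", "small"],
                key=lambda w: score(docs, w), reverse=True)
```

In the toy data "lightweight" separates the two categories perfectly, so it ranks first, while "battery" and "small" appear equally in both categories and receive a negative gain.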
Rule Analysis (cont.)
Co-occurrence analysis
Extract a list of words that co-occur with characteristic words
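As a rough illustration of the step above, the sketch below counts which words appear in the same statement as a given characteristic word. Raw counts, whitespace tokenization, and the function name `co_occurring_words` are assumptions for illustration; the slide does not say which co-occurrence measure the system uses.

```python
from collections import Counter


def co_occurring_words(statements, characteristic_word, top_n=5):
    """Return the top_n (word, count) pairs that share a statement with
    `characteristic_word`. Each statement contributes at most 1 per word."""
    counts = Counter()
    for statement in statements:
        words = set(statement.lower().split())
        if characteristic_word in words:
            counts.update(words - {characteristic_word})
    return counts.most_common(top_n)


# Hypothetical opinions for one product.
statements = [
    "the screen is monochrome and inexpensive",
    "monochrome screen but inexpensive",
    "great color screen",
]
top = co_occurring_words(statements, "monochrome", top_n=2)
```

In practice function words such as "the" and "is" would be filtered out before counting.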
Typical sentence analysis & Correspondence analysis
Typical sentence analysis
• Gives the user a simple overview of tendencies.
• Scores are calculated on the basis of naïve Bayesian theory (posterior probability).
Correspondence analysis
• Creates a two-dimensional positioning map.
• An extension of principal component analysis (PCA).
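For the typical sentence analysis, a posterior-probability score can be sketched as follows. This is a minimal naïve Bayes classifier with add-one smoothing that ranks the sentences of a category by P(category | sentence); the smoothing, the tokenization, and all names are assumptions for illustration rather than the system's actual scoring.

```python
import math
from collections import Counter


def train(labeled):
    """labeled: list of (sentence, category). Returns (word_counts, cat_counts, vocab)."""
    word_counts, cat_counts, vocab = {}, Counter(), set()
    for sentence, category in labeled:
        words = sentence.lower().split()
        word_counts.setdefault(category, Counter()).update(words)
        cat_counts[category] += 1
        vocab.update(words)
    return word_counts, cat_counts, vocab


def posterior(sentence, category, model):
    """P(category | sentence) under naive Bayes with add-one smoothing."""
    word_counts, cat_counts, vocab = model
    total = sum(cat_counts.values())
    log_joint = {}
    for cat in cat_counts:
        n_words = sum(word_counts[cat].values())
        log_p = math.log(cat_counts[cat] / total)
        for word in sentence.lower().split():
            log_p += math.log((word_counts[cat][word] + 1) / (n_words + len(vocab)))
        log_joint[cat] = log_p
    # Normalize in log space to avoid underflow on long sentences.
    peak = max(log_joint.values())
    denom = sum(math.exp(v - peak) for v in log_joint.values())
    return math.exp(log_joint[category] - peak) / denom


def typical_sentences(labeled, category, top_n=3):
    """Sentences of `category`, ordered by decreasing posterior probability."""
    model = train(labeled)
    candidates = [s for s, c in labeled if c == category]
    return sorted(candidates, key=lambda s: posterior(s, category, model),
                  reverse=True)[:top_n]


# Hypothetical labeled opinions for two products, "A" and "B".
labeled = [
    ("battery life is good", "A"),
    ("good battery good screen", "A"),
    ("screen is noisy", "B"),
    ("noisy and heavy", "B"),
]
most_typical = typical_sentences(labeled, "A", top_n=1)
```

The sentence with the highest posterior is the one whose wording is most specific to its category, which is what makes it a useful summary for the user.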
Experiments – Cellular Phone
Experiments – PDAs
Experiments – Internet Service Providers
Concluding remarks
Proposes a framework for mining product reputations on the web.
Four fundamental tasks:
• Characteristic word extraction
• Co-occurring word extraction
• Typical sentence extraction
• Correspondence analysis
The key to combining the two parts (opinion extraction and reputation analysis) is opinion labeling.
This framework could be applied to mining reputations far beyond industrial products, e.g., events, services, companies, and governments.