MINING FEATURE-OPINION PAIRS AND THEIR RELIABILITY
SCORES FROM WEB OPINION SOURCES
Presented by Sole
A. Kamal, M. Abulaish, and T. Anwar
International Conference on Web Intelligence, Mining and Semantics (WIMS) 2012
Introduction
Opinion data: user-generated content
Opinion sources: forums, discussion groups, blogs (from both customers and manufacturers)
Introduction
Problems with reviews: information overload, time-consuming, biased information
Solution: extract feature-opinion pairs from reviews and determine the reliability score of each pair
Related Work
Relatively new area of study in information retrieval
Classification of positive/negative reviews using NLP, text mining, and probabilistic approaches
Identifying patterns in text to extract attribute-value pairs
Proposed Approach
Architecture of the system
Pre-processing
Review crawler: noisy reviews are removed
Eliminates reviews created with no purpose or only to increase/decrease the popularity of a product
Markup language is filtered out
Remaining content is divided into manageable sizes; boundaries are determined based on heuristics, e.g., granularity of words, stemming, synonyms
Document Parser
Text analysis
Assigns a part-of-speech (POS) tag to each word
Converts each sentence into a set of dependency relations between pairs of words
Facilitates information extraction:
Noun phrases → product features
Adjectives → opinions
Adverbs → degree of expressiveness of opinions
Feature and Opinion Learner
Analyzes the dependency relations generated by the document parser
Generates all possible information components from the documents
Information component: <f, m, o>, where f refers to a feature, m to a modifier, and o to an opinion
Feature and Opinion Learner: Rule 1
In a dependency relation R, if there exist relationships nn(w1,w2) and nsubj(w3,w1) such that POS(w1)=POS(w2)=NN*, POS(w3)=JJ* and w1,w2 are not stop-words, or if there exists a relationship nsubj(w3,w4) such that POS(w3)=JJ*, POS(w4)=NN* and w3,w4 are not stop-words, then either (w1,w2) or w4 is considered as a feature and w3 as an opinion.
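Rule 1 can be traced in plain Python. The sketch below is illustrative, not the authors' code: it assumes the parser output is modeled as (relation, governor, dependent) tuples and a word-to-POS-tag dict, and the stop-word list is a hypothetical stand-in.

```python
# Illustrative sketch of Rule 1 (not the paper's implementation).
# Dependency relations are (name, governor, dependent) tuples; POS tags are
# matched by prefix, so "NN", "NNS", "NNP" all count as NN*.

STOP_WORDS = {"the", "a", "is", "it"}  # hypothetical stop-word list

def rule1(relations, pos):
    """Return (feature, opinion) pairs matched by Rule 1."""
    pairs = []
    nn = {(g, d) for name, g, d in relations if name == "nn"}
    for name, g, d in relations:
        if name != "nsubj" or not pos.get(g, "").startswith("JJ"):
            continue  # need nsubj(w3, .) with POS(w3)=JJ*
        w3, subj = g, d
        # Case 1: nn(w1, w2) and nsubj(w3, w1) -> compound feature (w1, w2)
        for w1, w2 in nn:
            if subj == w1 and pos.get(w1, "").startswith("NN") \
                    and pos.get(w2, "").startswith("NN") \
                    and w1 not in STOP_WORDS and w2 not in STOP_WORDS:
                pairs.append(((w1, w2), w3))
        # Case 2: nsubj(w3, w4) with POS(w4)=NN* -> single-word feature w4
        if pos.get(subj, "").startswith("NN") and subj not in STOP_WORDS:
            pairs.append((subj, w3))
    return pairs

# "The screen is attractive" parses (roughly) to nsubj(attractive, screen)
rels = [("det", "screen", "The"), ("nsubj", "attractive", "screen"),
        ("cop", "attractive", "is")]
pos = {"The": "DT", "screen": "NN", "is": "VBZ", "attractive": "JJ"}
print(rule1(rels, pos))  # -> [('screen', 'attractive')]
```

The same tuple-plus-POS representation carries through Rules 2 to 6, which only add extra relation lookups (advmod, acomp, dobj, amod) on top of this matching step.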
Feature and Opinion Learner: Rule 2
In a dependency relation R, if there exist relationships nn(w1,w2) and nsubj(w3,w1) such that POS(w1)=POS(w2)=NN*, POS(w3)=JJ* and w1,w2 are not stop-words, or if there exists a relationship nsubj(w3,w4) such that POS(w3)=JJ*, POS(w4)=NN* and w3,w4 are not stop-words, then either (w1,w2) or w4 is considered as a feature and w3 as an opinion. Thereafter, the relationship advmod(w3,w5) relating w3 with some adverbial word w5 is searched. If an advmod relationship is present, the information component is identified as <(w1,w2) or w4, w5, w3>, otherwise as <(w1,w2) or w4, -, w3>.
Feature and Opinion Learner: Rule 3
In a dependency relation R, if there exist relationships nn(w1,w2) and nsubj(w3,w1) such that POS(w1)=POS(w2)=NN*, POS(w3)=VB* and w1,w2 are not stop-words, or if there exists a relationship nsubj(w3,w4) such that POS(w3)=VB*, POS(w4)=NN* and w4 is not a stop-word, then we search for an acomp(w3,w5) relation. If an acomp relationship exists such that POS(w5)=JJ* and w5 is not a stop-word, then either (w1,w2) or w4 is assumed to be the feature and w5 an opinion. Thereafter, the modifier is searched and the information component is generated in the same way as in Rule 2.
Feature and Opinion Learner: Rule 4
In a dependency relation R, if there exist relationships nn(w1,w2) and nsubj(w3,w1) such that POS(w1)=POS(w2)=NN*, POS(w3)=VB* and w1,w2 are not stop-words, or if there exists a relationship nsubj(w3,w4) such that POS(w3)=VB*, POS(w4)=NN* and w4 is not a stop-word, then we search for a dobj(w3,w5) relation. If a dobj relationship exists such that POS(w5)=NN* and w5 is not a stop-word, then either (w1,w2) or w4 is assumed to be the feature and w5 an opinion.
Feature and Opinion Learner: Rule 5
In a dependency relation R, if there exists an amod(w1,w2) relation such that POS(w1)=NN*, POS(w2)=JJ*, and w1 and w2 are not stop-words, then w2 is assumed to be an opinion and w1 a feature.
Feature and Opinion Learner: Rule 6
In a dependency relation R, if there exist relationships nn(w1,w2) and nsubj(w3,w1) such that POS(w1)=POS(w2)=NN*, POS(w3)=VB* and w1,w2 are not stop-words, or if there exists a relationship nsubj(w3,w4) such that POS(w3)=VB*, POS(w4)=NN* and w4 is not a stop-word, then we search for a dobj(w3,w5) relation. If a dobj relationship exists such that POS(w5)=NN* and w5 is not a stop-word, then either (w1,w2) or w4 is assumed to be the feature and w5 an opinion. Thereafter, the relationship amod(w5,w6) is searched. If an amod relationship is present with POS(w6)=JJ* and w6 not a stop-word, then the information component is identified as <(w1,w2) or w4, w5, w6>, otherwise as <(w1,w2) or w4, w5, ->.
Feature and Opinion Learner: Example
Consider the following opinion sentences related to the Nokia N95:
"The screen is very attractive and bright"
"The sound some times comes out very clear"
"Nokia N95 has a pretty screen"
"Yes, the push mail is the 'Best' in the business"
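Assuming typed-dependency output for these sentences (hypothetical parses, not taken from the paper), the rules can be traced by hand. A small sketch, with relations again as (name, governor, dependent) tuples, shows Rule 5 firing on the "pretty screen" sentence and Rule 2's advmod search attaching the modifier "very":

```python
# Illustrative trace of Rule 5 and Rule 2's modifier search (assumed parses).

def rule5(relations, pos):
    """Rule 5: amod(w1, w2) with POS(w1)=NN*, POS(w2)=JJ* -> (feature, opinion)."""
    return [(g, d) for name, g, d in relations
            if name == "amod"
            and pos.get(g, "").startswith("NN")
            and pos.get(d, "").startswith("JJ")]

def with_modifier(relations, opinion):
    """Rule 2's follow-up: look for advmod(opinion, w5); return '-' if absent."""
    for name, g, d in relations:
        if name == "advmod" and g == opinion:
            return d
    return "-"

# "Nokia N95 has a pretty screen" -> amod(screen, pretty)
rels1 = [("nn", "N95", "Nokia"), ("nsubj", "has", "N95"),
         ("dobj", "has", "screen"), ("amod", "screen", "pretty")]
pos1 = {"Nokia": "NNP", "N95": "NNP", "has": "VBZ",
        "pretty": "JJ", "screen": "NN"}
feature, opinion = rule5(rels1, pos1)[0]
print(feature, opinion)  # -> screen pretty

# "The screen is very attractive" -> nsubj(attractive, screen),
# advmod(attractive, very); Rule 2 yields the component <screen, very, attractive>
rels2 = [("nsubj", "attractive", "screen"), ("advmod", "attractive", "very")]
print(("screen", with_modifier(rels2, "attractive"), "attractive"))
# -> ('screen', 'very', 'attractive')
```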
Reliability Score Generator
Reliability score
Removes noise due to parsing errors
Addresses contradicting opinions in reviews
Reliability Score Generator
HITS algorithm
A higher score for a pair reflects a tight integrity of the two components in the pair
The hub and authority scores are computed iteratively
The feature score is calculated using the term frequency and inverse sentence frequency in each sentence of the document
Features act as authorities and feature-opinion pairs as hubs, with edge weights based on the feature score and opinion score:
AS^(t+1)(d_k) = Σ_{p_ij ∈ V_p} W_kij × HS^(t)(p_ij)   (authority score of feature d_k)
HS^(t+1)(p_ij) = Σ_{d_k ∈ V_d} W_kij × AS^(t)(d_k)   (hub score of pair p_ij)
Reliability Score Generator
Pseudocode
function HubsAndAuthorities(G)          // G := set of pages
    for each page p in G do             // initialization
        p.auth = 1                      // p.auth is the authority score of the page p
        p.hub = 1                       // p.hub is the hub score of the page p
    for step from 1 to k do             // run the algorithm for k steps
        for each page p in G do         // update all authority values first
            p.auth = 0
            for each page q in p.incomingNeighbors do   // pages that link to p
                p.auth += q.hub
        for each page p in G do         // then update all hub values
            p.hub = 0
            for each page r in p.outgoingNeighbors do   // pages that p links to
                p.hub += r.auth
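The pseudocode translates directly into runnable Python. The sketch below adds per-step score normalization for numerical stability and, unlike the paper's weighted variant (which multiplies each edge by W_kij), treats every edge uniformly; the feature-opinion graph at the bottom is a made-up toy example:

```python
def hits(edges, k=20):
    """Unweighted HITS. edges: dict mapping each hub to the authorities it points to."""
    hubs = set(edges)
    auths = {a for targets in edges.values() for a in targets}
    hub = {h: 1.0 for h in hubs}
    auth = {a: 1.0 for a in auths}
    for _ in range(k):
        # update all authority values first, then normalize
        for a in auth:
            auth[a] = sum(hub[h] for h in hubs if a in edges[h])
        norm = sum(v * v for v in auth.values()) ** 0.5
        auth = {a: v / norm for a, v in auth.items()}
        # then update all hub values, then normalize
        for h in hub:
            hub[h] = sum(auth[a] for a in edges[h])
        norm = sum(v * v for v in hub.values()) ** 0.5
        hub = {h: v / norm for h, v in hub.items()}
    return auth, hub

# toy bipartite graph: opinion pairs (hubs) point at the features (authorities)
# they describe
edges = {("attractive", "screen"): ["screen"],
         ("bright", "screen"): ["screen"],
         ("clear", "sound"): ["sound"]}
auth, hub = hits(edges)
# "screen" is endorsed by two pairs, so it earns the higher authority score
assert auth["screen"] > auth["sound"]
```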
Experimental Results
Dataset
400 reviews
4,333 noun (or verb) and adjective pairs
1,366 candidate features obtained after filtering
Sample list of extracted features, opinions, and modifiers
Experimental Results
Metrics
True positive (TP): number of feature-opinion pairs that the system identifies correctly
False positive (FP): number of feature-opinion pairs that are falsely identified by the system
False negative (FN): number of feature-opinion pairs that the system fails to identify
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-Measure = (2 × Precision × Recall) / (Precision + Recall)
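These formulas are easy to verify in a few lines of Python. The helper below uses the standard definitions (the counts in the first call are made up for illustration); feeding the precision (79.3%) and recall (70.6%) reported on the results slide into the F-Measure formula reproduces the reported 74.7%:

```python
def precision_recall_f(tp, fp, fn):
    """Standard precision, recall, and F-measure from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# sanity check with made-up counts: 3 correct pairs, 1 spurious, 1 missed
print(precision_recall_f(3, 1, 1))  # -> (0.75, 0.75, 0.75)

# the reported precision and recall imply the reported F-measure
p, r = 0.793, 0.706
print(round(2 * p * r / (p + r), 3))  # -> 0.747
```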
Experimental Results
Feature and Opinion Learner
Precision: 79.3%, Recall: 70.6%, F-Measure: 74.7%
Observations
Direct and strong relationships between nouns and adjectives cause non-relevant feature-opinion pairs
Lack of grammatical correctness in reviews affects the results yielded by NLP parsers
Recall values lower than precision indicate the system's inability to extract certain feature-opinion pairs correctly
Experimental Results
Sample results for different products
Observations
The lack of variation in metric values indicates the applicability of the proposed approach regardless of the domain of the review documents
Experimental Results
Reliability Score Generator
Top-5 hub-scored feature-opinion pairs and their reliability scores
Sample feature-opinion pairs along with their hub and reliability scores
Conclusions
Future work
Refine rules to improve precision and identify implicit features
Handle informal text common in reviews
Pipeline: Reviews + Rules → Feature-opinion pairs → HITS algorithm → Reliability scores