An Effective Statistical Approach to Blog Post Opinion Retrieval
Ben He, Craig Macdonald, Jiyin He, Iadh Ounis
(CIKM 2008)
Introduction
Blogs have recently emerged as a new grassroots publishing medium.
A key feature that distinguishes blog content from other Web content is its subjective nature.
Bloggers tend to express opinions and comments towards some given targets, such as persons, organizations or products.
Under the TREC opinion finding task, only a handful of groups achieved an improvement over their baseline, using techniques such as NLP or SVM classifiers.
These approaches either involve considerable manual effort in collecting evidence for opinions, or lead to little improvement over a baseline that does not include any opinion finding feature.
This paper proposes a lightweight, automatic, statistical dictionary-based approach.
It also shows that, despite its apparent simplicity, the approach provides statistically significant improvements over robust baselines, including the best TREC baseline run, without any manual effort.
The Statistical Dictionary-based Approach to Opinion Retrieval
1. Automatically generates a dictionary from the collection without requiring manual effort.
2. Assigns a weight to each term in the dictionary, which represents how opinionated the term is.
3. Assigns an opinion score to each document in the collection using the top weighted terms from the dictionary as a query.
4. Appropriately combines the opinion score with the initial relevance score produced by the retrieval baseline.
Dictionary Generation
To derive the dictionary, we filter out terms that are too frequent or too rare in the collection.
We remove these terms because a term that appears too often carries too little information, while a term that appears too rarely is too specific; in either case it cannot be generalized across queries as an indicator of opinion.
We first rank all terms in the collection by their within-collection frequencies in descending order.
Terms whose ranks fall in the range (s·#terms, u·#terms) are selected for the dictionary, where #terms is the number of unique terms in the collection.
We apply s = 0.00007 and u = 0.001.
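
The rank-based filter above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_dictionary` is hypothetical, ranks are taken to be 1-based, and the range is treated as an open interval, none of which is specified on the slide.

```python
from collections import Counter

def build_dictionary(tokens, s=0.00007, u=0.001):
    """Keep terms whose frequency rank r satisfies s*#terms < r < u*#terms.

    Assumptions (not stated in the slides): ranks are 1-based and the
    interval is open at both ends.
    """
    freq = Counter(tokens)
    # most_common() orders terms by descending within-collection frequency.
    ranked = [term for term, _ in freq.most_common()]
    n_terms = len(ranked)  # number of unique terms
    return [
        term
        for rank, term in enumerate(ranked, start=1)
        if s * n_terms < rank < u * n_terms
    ]
```

With the values s = 0.00007 and u = 0.001, a vocabulary of one million unique terms would keep ranks 71 through 999 under these assumptions.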
Term Weighting
D(Rel): the relevant document set.
D(opRel): the opinionated relevant document set.
For each term t in the opinion term dictionary, we measure wopn(t), the divergence of the term’s distribution in D(opRel) from that in D(Rel).
This divergence value measures how a term stands out from the opinionated documents, compared with all relevant documents.
The higher the divergence is, the more opinionated the term is.
A commonly used measure for term weighting is the KL divergence from a term’s distribution in a document set to its distribution in the whole collection.
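
As a rough sketch of this measure applied to the two sets defined earlier, the function below computes a per-term KL contribution with D(opRel) as the document set and D(Rel) as the reference. The exact formula is an assumption here (the common form P·log2(P/Q) with maximum-likelihood estimates), and the function and parameter names are hypothetical.

```python
import math

def kl_weight(tf_op, tokens_op, tf_rel, tokens_rel):
    """Per-term KL contribution: P(t|D(opRel)) * log2(P(t|D(opRel)) / P(t|D(Rel))).

    tf_op / tf_rel: frequency of the term in D(opRel) / D(Rel).
    tokens_op / tokens_rel: total number of tokens in each set.
    The maximum-likelihood estimates used here are an assumption.
    """
    if tf_op == 0 or tf_rel == 0:
        return 0.0  # term absent from one set: no usable estimate in this sketch
    p_op = tf_op / tokens_op
    p_rel = tf_rel / tokens_rel
    return p_op * math.log2(p_op / p_rel)
```

A term that is relatively twice as frequent in D(opRel) as in D(Rel) gets a positive weight, while a term with the same relative frequency in both sets gets a weight of zero.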
The KL divergence measure considers only the divergence from one distribution to the other, ignoring how frequently a term occurs in the opinionated documents.
As a result, the weights in the opinion dictionary might be biased towards terms that have high KL divergence values but carry little information in the opinionated document set D(opRel).
Another method is the Bo1 term weighting model, which measures how informative a term is in the set D(opRel) against D(Rel).
λ = tfrel / Nrel
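
A minimal sketch of a Bo1-style weight follows. It assumes the standard Bose-Einstein form from the divergence-from-randomness framework, w(t) = tfx · log2((1 + λ) / λ) + log2(1 + λ). It also assumes that tfx is the term's frequency in D(opRel), tfrel its frequency in D(Rel), and Nrel the number of documents in D(Rel). The function name is hypothetical.

```python
import math

def bo1_weight(tf_x, tf_rel, n_rel):
    """Bo1-style opinion weight of a term (standard form, assumed here).

    tf_x:   frequency of the term in D(opRel)      (assumed meaning)
    tf_rel: frequency of the term in D(Rel)        (assumed meaning)
    n_rel:  number of documents in D(Rel)          (assumed meaning)
    """
    if tf_rel == 0 or n_rel == 0:
        return 0.0  # lambda undefined or zero: treat the term as uninformative
    lam = tf_rel / n_rel
    return tf_x * math.log2((1 + lam) / lam) + math.log2(1 + lam)
```

Unlike a pure divergence, this weight grows linearly with tf_x, so a term that occurs often in the opinionated documents is rewarded directly.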
Generating the Opinion Score
We take the top X weighted terms from the opinion dictionary (X = 100 in the experiments) and submit them to the retrieval system as a query Qopn.
The retrieval system assigns a relevance score to each document in the collection.
Such a relevance score reflects the extent to which the top weighted opinionated terms are informative in the document, capturing the overall opinionated nature of the document.
This is called the opinion score: Score(d, Qopn).
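
The step above can be sketched as follows. The scoring function is a simple normalized tf-idf stand-in chosen for illustration, not the weighting model used in the experiments, and the names `opinion_scores`, `docs` and `term_weights` are hypothetical.

```python
import math
from collections import Counter

def opinion_scores(docs, term_weights, top_x=100):
    """Score every document against Q_opn, the top_x highest weighted terms.

    docs: mapping of document id -> list of tokens.
    term_weights: mapping of dictionary term -> opinion weight.
    The tf-idf style score below is a stand-in for a real retrieval model.
    """
    q_opn = sorted(term_weights, key=term_weights.get, reverse=True)[:top_x]
    n_docs = len(docs)
    # Document frequency of each term across the collection.
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in q_opn:
            if tf[term]:
                score += tf[term] / len(tokens) * math.log2(1 + n_docs / df[term])
        scores[doc_id] = score
    return scores
```

A document that contains none of the query terms receives a score of zero, so it gains nothing from the opinion component.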
Score Combination
1. Linear combination:
2. Log. combination:
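
The two combination formulas are not reproduced in this transcript. The sketch below is therefore an assumption about their shape rather than a statement of the slide's content. It takes the linear combination to be an interpolation (1 - a)·Score(d, Q) + a·Score(d, Qopn). It takes the log combination to be -k / log2 P(op|d, Qopn) + Score(d, Q), where P(op|d, Qopn) is the opinion score normalized by the sum of opinion scores over the collection. Function names are hypothetical.

```python
import math

def linear_combination(rel_score, opn_score, a=0.25):
    """Assumed linear form: interpolate relevance and opinion scores."""
    return (1 - a) * rel_score + a * opn_score

def log_combination(rel_score, opn_score, total_opn_score, k=250.0):
    """Assumed log form: add -k / log2 P(op|d) to the relevance score.

    P(op|d) is taken as opn_score / total_opn_score, where total_opn_score
    is the sum of opinion scores over all documents.
    """
    p_op = opn_score / total_opn_score if total_opn_score > 0 else 0.0
    if not 0.0 < p_op < 1.0:
        return rel_score  # log undefined or zero: leave the score unchanged
    return -k / math.log2(p_op) + rel_score
```

Because log2 P(op|d) is negative for any probability below 1, the added term is positive and grows as the document's share of the total opinion score grows.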
Experiment: Data
Dataset: Blog06 collection.
We use the permalinks, which are the blog posts and their associated comments.
Each term is stemmed using Porter’s English stemmer, and standard English stopwords are removed.
Experiment: Baseline
InLB document weighting model:
b=0.2337
Experiment: External Opinion Dictionary
We also manually compile a dictionary from various external linguistic resources.
The dictionary contains approximately 12,000 English words, mostly adjectives, adverbs and nouns, which are presumed to be subjective.
In this paper, we refer to the manually edited dictionary as the external dictionary and to the automatically derived one as the internal dictionary.
Experiment: Evaluation
We use the Bo1 term weighting method and set a = 0.25 and k = 250.
Conclusions and Future Work
This paper has proposed an effective and practical approach to retrieving opinionated blog posts without the need for manual effort.
The use of the automatically generated internal dictionary provides a retrieval performance that is as good as the use of an external dictionary manually compiled from various linguistic resources.
In the future:
1. Extend the work to detecting the polarity or the orientation of the retrieved opinionated documents.
2. Study the connection of the opinion finding task to question answering.
Example: extracting the opinionated sentences within a blog post about a given target.