towards answering opinion questions: separating facts from...

79
Introduction The Approach Experiments Conclusion Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences Hong Yu Vasileios Hatzivassiloglou Columbia University, New York 24. November, 2008 Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP-03). Presented by Lasse Soelberg Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 1 / 35

Upload: others

Post on 21-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Towards Answering Opinion Questions: SeparatingFacts from Opinions and Identifying the Polarity of

Opinion Sentences

Hong Yu Vasileios Hatzivassiloglou

Columbia University, New York

24. November, 2008

Proceedings of the 2003 Conference on Empirical Methods inNatural Language Processing (EMNLP-03).

Presented by Lasse SoelbergHong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 1 / 35

Page 2: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

IntroductionGoals of the Paper

Layout

1 IntroductionIntroductionGoals of the Paper

2 The ApproachDocument ClassificationFinding Opinion SentencesIdentifying the Polarity

3 ExperimentsDataEvaluationResults

4 ConclusionConclusionRelated WorkArticle Evaluation

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 2 / 35

Page 3: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

IntroductionGoals of the Paper

Introduction

Towards Answering Opinion Questions

Question-answering systems.

Easier to use factual statements.

Extend to also use subjective opinion statements.

Simple Question

Who was elected as the new US President in 2008?

Complex Question

What has caused the current financial crisis?

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 3 / 35

Page 4: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

IntroductionGoals of the Paper

Introduction

Towards Answering Opinion Questions

Question-answering systems.

Easier to use factual statements.

Extend to also use subjective opinion statements.

Simple Question

Who was elected as the new US President in 2008?

Complex Question

What has caused the current financial crisis?

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 3 / 35

Page 5: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

IntroductionGoals of the Paper

Introduction

Towards Answering Opinion Questions

Question-answering systems.

Easier to use factual statements.

Extend to also use subjective opinion statements.

Simple Question

Who was elected as the new US President in 2008?

Complex Question

What has caused the current financial crisis?

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 3 / 35

Page 6: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

IntroductionGoals of the Paper

Separating Facts from Opinions and Identifying thePolarity of Opinion Sentences

Document Classification

Classifying articles as either subjective or objective

Finding Opinion Sentences

In both subjective and objective articles

Identify the Polarity of Opinion Sentences

Determine if the opinions are positive or negative

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 4 / 35

Page 7: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

IntroductionGoals of the Paper

Separating Facts from Opinions and Identifying thePolarity of Opinion Sentences

Document Classification

Classifying articles as either subjective or objective

Finding Opinion Sentences

In both subjective and objective articles

Identify the Polarity of Opinion Sentences

Determine if the opinions are positive or negative

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 4 / 35

Page 8: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

IntroductionGoals of the Paper

Separating Facts from Opinions and Identifying thePolarity of Opinion Sentences

Document Classification

Classifying articles as either subjective or objective

Finding Opinion Sentences

In both subjective and objective articles

Identify the Polarity of Opinion Sentences

Determine if the opinions are positive or negative

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 4 / 35

Page 9: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Layout

1 IntroductionIntroductionGoals of the Paper

2 The ApproachDocument ClassificationFinding Opinion SentencesIdentifying the Polarity

3 ExperimentsDataEvaluationResults

4 ConclusionConclusionRelated WorkArticle Evaluation

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 5 / 35

Page 10: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Document Types

Training Sets

Articles from Wall Street Journal, which is annotated withdocument types.

Subjective Articles (Opinion)

Editorials

Letter to the Editor

Objective Articles (Fact)

News

Business

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 6 / 35

Page 11: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Document Types

Training Sets

Articles from Wall Street Journal, which is annotated withdocument types.

Subjective Articles (Opinion)

Editorials

Letter to the Editor

Objective Articles (Fact)

News

Business

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 6 / 35

Page 12: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Document Types

Training Sets

Articles from Wall Street Journal, which is annotated withdocument types.

Subjective Articles (Opinion)

Editorials

Letter to the Editor

Objective Articles (Fact)

News

Business

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 6 / 35

Page 13: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Classification

Naive Bayes

Calculating the likelihood that the document is either subjective orobjective.

Bayes Rule

P(c |d) = P(c)P(d |c)P(d)

where c is a class, d is a document and single words are used asfeature.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 7 / 35

Page 14: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Classification

Naive Bayes

Calculating the likelihood that the document is either subjective orobjective.

Bayes Rule

P(c |d) = P(c)P(d |c)P(d)

where c is a class, d is a document and single words are used asfeature.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 7 / 35

Page 15: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Three Different Approaches

Rely on Expectation

Documents classified as opinions tends to have mostly opinionsentences, and documents classified as facts tends to have morefactual sentences.

The Three Approaches

Similarity Approach

Naive Bayes Classifier

Multiple Naive Bayes Classifier

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 8 / 35

Page 16: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Three Different Approaches

Rely on Expectation

Documents classified as opinions tends to have mostly opinionsentences, and documents classified as facts tends to have morefactual sentences.

The Three Approaches

Similarity Approach

Naive Bayes Classifier

Multiple Naive Bayes Classifier

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 8 / 35

Page 17: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Similarity Approach

Hypothesis

Opinion sentences within a given topic will be more similar toother opinion sentences than to factual sentences.

SimFinder

Measures sentence similarity based on shared words, phrases andWordNet synsets.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 9 / 35

Page 18: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Similarity Approach

Hypothesis

Opinion sentences within a given topic will be more similar toother opinion sentences than to factual sentences.

SimFinder

Measures sentence similarity based on shared words, phrases andWordNet synsets.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 9 / 35

Page 19: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Variants

The score variant

Select documents with the same topic as the sentence.

Average the similarities with each sentence in the documents.

Assign the sentence to the category with the highest average.

The frequency variant

Count how many of the sentences, for each category, that exceedsa predetermined threshold (set to 0.65).

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 10 / 35

Page 20: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Variants

The score variant

Select documents with the same topic as the sentence.

Average the similarities with each sentence in the documents.

Assign the sentence to the category with the highest average.

The frequency variant

Count how many of the sentences, for each category, that exceedsa predetermined threshold (set to 0.65).

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 10 / 35

Page 21: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Naive Bayes Classifier

Bayes Rule

P(c |d) = P(c)P(d |c)P(d)

Some of Features Used

Words

Bigrams

Trigrams

Parts of Speech

Counts of positive and negative words

Counts of the polarities of semantically oriented words

Average semantic orientation score of the words

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 11 / 35

Page 22: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Naive Bayes Classifier

Bayes Rule

P(c |d) = P(c)P(d |c)P(d)

Some of Features Used

Words

Bigrams

Trigrams

Parts of Speech

Counts of positive and negative words

Counts of the polarities of semantically oriented words

Average semantic orientation score of the words

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 11 / 35

Page 23: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Multiple Naive Bayes Classifier

Problem

The designation of all sentences as opinions or facts is anapproximation.

Solution

Use multiple Naive Bayes classifiers, each using a different subsetof the features.

The Goal

Reduce the training set to the sentences most likely to be correctlylabelled.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 12 / 35

Page 24: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Multiple Naive Bayes Classifier

Problem

The designation of all sentences as opinions or facts is anapproximation.

Solution

Use multiple Naive Bayes classifiers, each using a different subsetof the features.

The Goal

Reduce the training set to the sentences most likely to be correctlylabelled.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 12 / 35

Page 25: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Multiple Naive Bayes Classifier

Problem

The designation of all sentences as opinions or facts is anapproximation.

Solution

Use multiple Naive Bayes classifiers, each using a different subsetof the features.

The Goal

Reduce the training set to the sentences most likely to be correctlylabelled.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 12 / 35

Page 26: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Multiple Naive Bayes Classifier

The Approach

Train separate classifiers C1,C2, ...,Cm given separate featuresets F1,F2, ...,Fm.

Assume sentences inherit the document classification.

Train C1 on the entire training set, and use it to predict labelsfor the training set.

Remove sentences with labels different from the assumption,and train C2 on the remaining sentences.

Continue iteratively until no more sentences can be removed.

Five Feature Sets

Starting with only words and adding in bigrams, trigrams, part ofspeech and polarity.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 13 / 35

Page 27: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Multiple Naive Bayes Classifier

The Approach

Train separate classifiers C1,C2, ...,Cm given separate featuresets F1,F2, ...,Fm.

Assume sentences inherit the document classification.

Train C1 on the entire training set, and use it to predict labelsfor the training set.

Remove sentences with labels different from the assumption,and train C2 on the remaining sentences.

Continue iteratively until no more sentences can be removed.

Five Feature Sets

Starting with only words and adding in bigrams, trigrams, part ofspeech and polarity.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 13 / 35

Page 28: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Multiple Naive Bayes Classifier

The Approach

Train separate classifiers C1,C2, ...,Cm given separate featuresets F1,F2, ...,Fm.

Assume sentences inherit the document classification.

Train C1 on the entire training set, and use it to predict labelsfor the training set.

Remove sentences with labels different from the assumption,and train C2 on the remaining sentences.

Continue iteratively until no more sentences can be removed.

Five Feature Sets

Starting with only words and adding in bigrams, trigrams, part ofspeech and polarity.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 13 / 35

Page 29: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Multiple Naive Bayes Classifier

The Approach

Train separate classifiers C1,C2, ...,Cm given separate featuresets F1,F2, ...,Fm.

Assume sentences inherit the document classification.

Train C1 on the entire training set, and use it to predict labelsfor the training set.

Remove sentences with labels different from the assumption,and train C2 on the remaining sentences.

Continue iteratively until no more sentences can be removed.

Five Feature Sets

Starting with only words and adding in bigrams, trigrams, part ofspeech and polarity.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 13 / 35

Page 30: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Multiple Naive Bayes Classifier

The Approach

Train separate classifiers C1,C2, ...,Cm given separate featuresets F1,F2, ...,Fm.

Assume sentences inherit the document classification.

Train C1 on the entire training set, and use it to predict labelsfor the training set.

Remove sentences with labels different from the assumption,and train C2 on the remaining sentences.

Continue iteratively until no more sentences can be removed.

Five Feature Sets

Starting with only words and adding in bigrams, trigrams, part ofspeech and polarity.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 13 / 35

Page 31: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Multiple Naive Bayes Classifier

The Approach

Train separate classifiers C1,C2, ...,Cm given separate featuresets F1,F2, ...,Fm.

Assume sentences inherit the document classification.

Train C1 on the entire training set, and use it to predict labelsfor the training set.

Remove sentences with labels different from the assumption,and train C2 on the remaining sentences.

Continue iteratively until no more sentences can be removed.

Five Feature Sets

Starting with only words and adding in bigrams, trigrams, part ofspeech and polarity.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 13 / 35

Page 32: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Identifying the Polarity of Opinion Sentences

What We Have

Sentences that are distinguished as either opinions or facts.

What We Want

Separate the opinion sentences into three classes

Positive sentences.

Negative Sentences.

Neutral sentences.

How We Do It

By the number and strength of semantically oriented words (eitherpositive or negative) in the sentence.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 14 / 35

Page 33: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Identifying the Polarity of Opinion Sentences

What We Have

Sentences that are distinguished as either opinions or facts.

What We Want

Separate the opinion sentences into three classes

Positive sentences.

Negative Sentences.

Neutral sentences.

How We Do It

By the number and strength of semantically oriented words (eitherpositive or negative) in the sentence.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 14 / 35

Page 34: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Identifying the Polarity of Opinion Sentences

What We Have

Sentences that are distinguished as either opinions or facts.

What We Want

Separate the opinion sentences into three classes

Positive sentences.

Negative Sentences.

Neutral sentences.

How We Do It

By the number and strength of semantically oriented words (eitherpositive or negative) in the sentence.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 14 / 35

Page 35: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Semantically Oriented Words

Hypothesis

Positive words co-occur more than expected by chance, and so donegative words.

Approach

Measure the words co-occurence with words from a known seed setof semantically oriented words.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 15 / 35

Page 36: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Semantically Oriented Words

Hypothesis

Positive words co-occur more than expected by chance, and so donegative words.

Approach

Measure the words co-occurence with words from a known seed setof semantically oriented words.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 15 / 35

Page 37: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Semantically Oriented Words

Log-likelihood ratio

L(Wi ,POSj) = log

( Freq(Wi ,POSj ,ADJp)+ε

Freq(Wall ,POSj ,ADJp)

Freq(Wi ,POSj ,ADJn)+ε

Freq(Wall ,POSj ,ADJn)

)Where Wi is a word in the sentence, ADJp is positive seed wordset, ADJn is negative seed word set, POSj is part of speechcollocation frequency ratio with ADJp and ADJn and ε is asmoothing constant (0.5).

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 16 / 35

Page 38: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Sentence Polarity Tagging

Determine the orientation of an opinion sentence

Specify cutoffs tp and tn.

Calculate the sentences average log-likelihood score.

Positive sentences have average scores greater than tp.

Negative sentences have average scores lower than tn.

Neutral sentences have average scores between tp and tn.

Optimal tp and tn values

Are obtained from the training data via density estimation, using asmall subset of hand-labeled sentences.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 17 / 35

Page 39: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Sentence Polarity Tagging

Determine the orientation of an opinion sentence

Specify cutoffs tp and tn.

Calculate the sentences average log-likelihood score.

Positive sentences have average scores greater than tp.

Negative sentences have average scores lower than tn.

Neutral sentences have average scores between tp and tn.

Optimal tp and tn values

Are obtained from the training data via density estimation, using asmall subset of hand-labeled sentences.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 17 / 35

Page 40: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Seed Set

Seed words used

The seed words were subsets of 1.336 adjectives that weremanually classified as either positive or negative.

Seed Set Size

To see whether seed set sizes would influence the result, seed setsof 1, 20, 100 and over 600 positive and negative pairs of adjectiveswere used.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 18 / 35

Page 41: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

Document ClassificationFinding Opinion SentencesIdentifying the Polarity

Seed Set

Seed words used

The seed words were subsets of 1.336 adjectives that weremanually classified as either positive or negative.

Seed Set Size

To see whether seed set sizes would influence the result, seed setsof 1, 20, 100 and over 600 positive and negative pairs of adjectiveswere used.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 18 / 35

Page 42: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Layout

1 IntroductionIntroductionGoals of the Paper

2 The ApproachDocument ClassificationFinding Opinion SentencesIdentifying the Polarity

3 ExperimentsDataEvaluationResults

4 ConclusionConclusionRelated WorkArticle Evaluation

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 19 / 35

Page 43: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Data

Data Used

The data is from the TREC 8,9 and 11 collections, which consistsof more than 1.7 million newswire articles from six differentsources.

Wall Street journal

Some articles are marked with document type

Editorial (2,877)

Letter to Editor (1,695)

Business (2,009)

News (3,714)

2,000 articles from each type is randomly selected.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 20 / 35

Page 44: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Data

Data Used

The data is from the TREC 8,9 and 11 collections, which consistsof more than 1.7 million newswire articles from six differentsources.

Wall Street journal

Some articles are marked with document type

Editorial (2,877)

Letter to Editor (1,695)

Business (2,009)

News (3,714)

2,000 articles from each type is randomly selected.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 20 / 35

Page 45: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Evaluation Metrics

Recall

The fraction of the relevant documents that are retrieved.

recall = |{relevant documents}∩{retrieved documents}||{relevant documents}|

Precision

The fraction of the retrieved documents that are relevant.

precision = |{relevant documents}∩{retrieved documents}||{retrieved documents}|

F-measure

The weighted harmonic mean of recall and precision.

F = 2·precision·recall(precision+recall)

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 21 / 35

Page 46: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Evaluation Metrics

Recall

The fraction of the relevant documents that are retrieved.

recall = |{relevant documents}∩{retrieved documents}||{relevant documents}|

Precision

The fraction of the retrieved documents that are relevant.

precision = |{relevant documents}∩{retrieved documents}||{retrieved documents}|

F-measure

The weighted harmonic mean of recall and precision.

F = 2·precision·recall(precision+recall)

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 21 / 35

Page 47: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Evaluation Metrics

Recall

The fraction of the relevant documents that are retrieved.

recall = |{relevant documents}∩{retrieved documents}||{relevant documents}|

Precision

The fraction of the retrieved documents that are relevant.

precision = |{relevant documents}∩{retrieved documents}||{retrieved documents}|

F-measure

The weighted harmonic mean of recall and precision.

F = 2·precision·recall(precision+recall)

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 21 / 35

Page 48: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Examples

Common Attributes

Body of 1,000 documents.

100 relevant documents.

Example 1

50 retrieved documents, all relevant.

Precision = 1.00

, Recall = 0.5

, F-Measure = 0.67

Example 2

Retrieves all documents.

Recall = 1.00

, precision = 0.1

, F-Measure = 0.18

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 22 / 35

Page 49: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Examples

Common Attributes

Body of 1,000 documents.

100 relevant documents.

Example 1

50 retrieved documents, all relevant.

Precision = 1.00

, Recall = 0.5

, F-Measure = 0.67

Example 2

Retrieves all documents.

Recall = 1.00

, precision = 0.1

, F-Measure = 0.18

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 22 / 35

Page 50: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Examples

Common Attributes

Body of 1,000 documents.

100 relevant documents.

Example 1

50 retrieved documents, all relevant.

Precision = 1.00

, Recall = 0.5

, F-Measure = 0.67

Example 2

Retrieves all documents.

Recall = 1.00

, precision = 0.1

, F-Measure = 0.18

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 22 / 35

Page 51: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Examples

Common Attributes

Body of 1,000 documents.

100 relevant documents.

Example 1

50 retrieved documents, all relevant.

Precision = 1.00, Recall = 0.5

, F-Measure = 0.67

Example 2

Retrieves all documents.

Recall = 1.00

, precision = 0.1

, F-Measure = 0.18

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 22 / 35

Page 52: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Examples

Common Attributes

Body of 1,000 documents.

100 relevant documents.

Example 1

50 retrieved documents, all relevant.

Precision = 1.00, Recall = 0.5, F-Measure = 0.67

Example 2

Retrieves all documents.

Recall = 1.00

, precision = 0.1

, F-Measure = 0.18

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 22 / 35

Page 53: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Examples

Common Attributes

Body of 1,000 documents.

100 relevant documents.

Example 1

50 retrieved documents, all relevant.

Precision = 1.00, Recall = 0.5, F-Measure = 0.67

Example 2

Retrieves all documents.

Recall = 1.00

, precision = 0.1

, F-Measure = 0.18

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 22 / 35

Page 54: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Examples

Common Attributes

Body of 1,000 documents.

100 relevant documents.

Example 1

50 retrieved documents, all relevant.

Precision = 1.00, Recall = 0.5, F-Measure = 0.67

Example 2

Retrieves all documents.

Recall = 1.00

, precision = 0.1

, F-Measure = 0.18

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 22 / 35

Page 55: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Examples

Common Attributes

Body of 1,000 documents.

100 relevant documents.

Example 1

50 retrieved documents, all relevant.

Precision = 1.00, Recall = 0.5, F-Measure = 0.67

Example 2

Retrieves all documents.

Recall = 1.00, precision = 0.1

, F-Measure = 0.18

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 22 / 35

Page 56: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Examples

Common Attributes

Body of 1,000 documents.

100 relevant documents.

Example 1

50 retrieved documents, all relevant.

Precision = 1.00, Recall = 0.5, F-Measure = 0.67

Example 2

Retrieves all documents.

Recall = 1.00, precision = 0.1, F-Measure = 0.18

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 22 / 35

Page 57: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Gold Standards

Document-level Standard

Already available from Wall Streel Journal.

News and Business is mapped to facts.

Editorial and Letter to the Editor is mapped to opinions.

Sentence-level Standard

There is no automated standard that can distinguish betweenfacts and opinions, or between positive and negative opinions.

Human evaluators classify a set of sentences between factsand opinions and determine the type of opinions.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 23 / 35

Page 58: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Gold Standards

Document-level Standard

Already available from Wall Streel Journal.

News and Business is mapped to facts.

Editorial and Letter to the Editor is mapped to opinions.

Sentence-level Standard

There is no automated standard that can distinguish betweenfacts and opinions, or between positive and negative opinions.

Human evaluators classify a set of sentences between factsand opinions and determine the type of opinions.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 23 / 35

Page 59: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Gold Standards

Document-level Standard

Already available from Wall Streel Journal.

News and Business is mapped to facts.

Editorial and Letter to the Editor is mapped to opinions.

Sentence-level Standard

There is no automated standard that can distinguish betweenfacts and opinions, or between positive and negative opinions.

Human evaluators classify a set of sentences between factsand opinions and determine the type of opinions.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 23 / 35

Page 60: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Topics and Articles

Topics

Four topics are chosen for the evaluation

Gun control

Illegal aliens

Social security

Welfare reform

Articles

25 articles were randomly chosen for each topic from the TRECcorpus. The articles were found using the Lucene search engine.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 24 / 35

Page 61: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Topics and Articles

Topics

Four topics are chosen for the evaluation

Gun control

Illegal aliens

Social security

Welfare reform

Articles

25 articles were randomly chosen for each topic from the TRECcorpus. The articles were found using the Lucene search engine.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 24 / 35

Page 62: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Sentences

Selection of Sentences

Four sentences chosen from each document.

The sentences were grouped into ten 50-sentence blocks.

Each block shares ten sentences with the preceding andfollowing block.

Standard A

The 300 sentences appearing once, and one judgement from theremaining 100 sentences.

Standard B

The subset of the 100 sentences appearing twice, which were givenidentical labels.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 25 / 35

Page 63: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Sentences

Selection of Sentences

Four sentences chosen from each document.

The sentences were grouped into ten 50-sentence blocks.

Each block shares ten sentences with the preceding andfollowing block.

Standard A

The 300 sentences appearing once, and one judgement from theremaining 100 sentences.

Standard B

The subset of the 100 sentences appearing twice, which were givenidentical labels.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 25 / 35

Page 64: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Sentences

Selection of Sentences

Four sentences chosen from each document.

The sentences were grouped into ten 50-sentence blocks.

Each block shares ten sentences with the preceding andfollowing block.

Standard A

The 300 sentences appearing once, and one judgement from theremaining 100 sentences.

Standard B

The subset of the 100 sentences appearing twice, which were givenidentical labels.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 25 / 35

Page 65: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Document Classification

Training

The classifier was trained on 4,000 articles from WSJ andevaluated on other 4,000 articles.

The result

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 26 / 35

Page 66: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Document Classification

Training

The classifier was trained on 4,000 articles from WSJ andevaluated on other 4,000 articles.

The result

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 26 / 35

Page 67: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Sentence Classification

Three Approaches

Similarity approach

Bayes classifier

Multiple Bayes classifiers

The Similarity Approach

{recall, precision}

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 27 / 35

Page 68: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Sentence Classification

Three Approaches

Similarity approach

Bayes classifier

Multiple Bayes classifiers

The Similarity Approach

{recall, precision}

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 27 / 35

Page 69: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Sentence Classification

Bayes classifiers

{recall, precision}

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 28 / 35

Page 70: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Sentence Classification

Seed Set Size

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 29 / 35

Page 71: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

DataEvaluationResults

Polarity Classification

Accuracy of Sentence Polarity Tagging

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 30 / 35

Page 72: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

ConclusionRelated WorkArticle Evaluation

Layout

1 IntroductionIntroductionGoals of the Paper

2 The ApproachDocument ClassificationFinding Opinion SentencesIdentifying the Polarity

3 ExperimentsDataEvaluationResults

4 ConclusionConclusionRelated WorkArticle Evaluation

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 31 / 35

Page 73: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

ConclusionRelated WorkArticle Evaluation

Article Conclusion

Document Level

A fairly straightforward Bayesian classifier using lexical informationcan distinguish between mostly factual and opinion documentswith very high precision and recall.

Sentence Level

Three techniques were described for opinion/fact classificationachieving up to 91% precision and recall on opinion sentences.

Polarity

Examined an automatic method for assigning polarity information(positive, negative or neutral), which assigns the correct polarity in90% of the cases.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 32 / 35

Page 74: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

ConclusionRelated WorkArticle Evaluation

Article Conclusion

Document Level

A fairly straightforward Bayesian classifier using lexical informationcan distinguish between mostly factual and opinion documentswith very high precision and recall.

Sentence Level

Three techniques were described for opinion/fact classificationachieving up to 91% precision and recall on opinion sentences.

Polarity

Examined an automatic method for assigning polarity information(positive, negative or neutral), which assigns the correct polarity in90% of the cases.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 32 / 35

Page 75: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

ConclusionRelated WorkArticle Evaluation

Article Conclusion

Document Level

A fairly straightforward Bayesian classifier using lexical informationcan distinguish between mostly factual and opinion documentswith very high precision and recall.

Sentence Level

Three techniques were described for opinion/fact classificationachieving up to 91% precision and recall on opinion sentences.

Polarity

Examined an automatic method for assigning polarity information(positive, negative or neutral), which assigns the correct polarity in90% of the cases.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 32 / 35

Page 76: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

ConclusionRelated WorkArticle Evaluation

Related Work

Other work

There is a lot of research in the area of automated opiniondetection.

Prior works include SimFinder and classification of subjectivewords.

Recent works includes Chinese web opinion mining andgerman news article.

Our Project - Herning Municipality

Citizens entering the homecare system gets a function evaluation,in order to establish their needs for help.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 33 / 35

Page 77: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

ConclusionRelated WorkArticle Evaluation

Relation to Our Project

Function Evaluation

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 34 / 35

Page 78: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

ConclusionRelated WorkArticle Evaluation

Evaluation of the Article

The Good

Good choice of titel.

Good written description of the use of their methods.

They keep a good flow through the article.

The Not So Good

No definition of recall and precision, not even a reference.

SimFinder is presented as state-of-the-art. Made by one of theauthors.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 35 / 35

Page 79: Towards Answering Opinion Questions: Separating Facts from ...people.cs.aau.dk/~simas/dat5_08/presentations/lasse.pdf · Towards Answering Opinion Questions: Separating Facts from

IntroductionThe Approach

ExperimentsConclusion

ConclusionRelated WorkArticle Evaluation

Evaluation of the Article

The Good

Good choice of titel.

Good written description of the use of their methods.

They keep a good flow through the article.

The Not So Good

No definition of recall and precision, not even a reference.

SimFinder is presented as state-of-the-art. Made by one of theauthors.

Hong Yu, Vasileios Hatzivassiloglou Towards Answering Opinion Questions 35 / 35