Construction of a Sentimental Word Dictionary


Upload: floria

Post on 05-Jan-2016



TRANSCRIPT

Page 1: Construction of a Sentimental Word Dictionary

Construction of a Sentimental Word Dictionary

Eduard C. Dragut
Cyber Center, Purdue University

Weiyi Meng
Computer Science Department, Binghamton University

Clement Yu
Computer Science Department, University of Illinois at Chicago

Prasad Sistla
Computer Science Department, University of Illinois at Chicago

Sentimental Word Dictionary

The proposed dictionary has the majority sentiment property: each word with a given part of speech has polarity p (positive or negative) if the majority sense of the word with that part of speech has polarity p.

E.g., the word “bland” has 3 senses, out of which 2 have negative polarity and 1 has positive polarity.

Why is such a property important?
• Deduction of other sentimental words with the same property.
• Detection of inconsistencies in input dictionaries.
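The majority sentiment property can be illustrated with a small sketch. It assumes that each sense carries a relative frequency and that "majority" means the frequencies of the senses with polarity p sum to more than 0.5 (the threshold that appears on the inference-rule slides). The function name and the equal frequencies given to the three senses of “bland” are illustrative assumptions, not values taken from WordNet.

```python
from collections import defaultdict


def majority_polarity(senses):
    """Return the polarity whose senses cover more than half of the
    relative-frequency mass, or None if no polarity has a majority.

    `senses` is a list of (polarity, relative_frequency) pairs.
    """
    mass = defaultdict(float)
    for polarity, freq in senses:
        mass[polarity] += freq
    for polarity, total in mass.items():
        if total > 0.5:
            return polarity
    return None


# "bland": 2 negative senses and 1 positive sense (from the slide).
# Equal relative frequencies are an assumption made for this example.
bland = [("negative", 1 / 3), ("negative", 1 / 3), ("positive", 1 / 3)]
print(majority_polarity(bland))  # negative
```

A word whose mass is split evenly between the two polarities gets no polarity under this reading, which is why the function can return None.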

Motivation

Why? It facilitates opinion mining and opinion retrieval.

What? Reviews and comments about products, services, and government policies.

Where? The Web has plenty of reviews, comments and reports about products, services, etc.

Bottom line: Many approaches rely on lexicons of polar words.

Contribution

• A deduction approach: a set of about 20 inference rules.
• The resulting sentimental word dictionary contains approximately 50% more words than the seed dictionary.
• The accuracy of the deduced polarities is comparable to that of human judgment: 3 students were asked to evaluate 100 randomly chosen words.

Building the Sentimental Dictionary

Construct the sentimental dictionary on top of the electronic dictionary WordNet.

The idea:
1. Start from a small number of words whose polarities are known.
2. Propagate the polarities to the synsets, using inference rules.
3. Determine the polarities of new words, using the majority sentiment definition.
4. Go to 2, until no more polarities are inferred.
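The four steps amount to a fixed-point iteration, which can be sketched as follows. This is a minimal illustration, not the authors' implementation: the single propagation rule used in step 2 is an assumption chosen because it follows directly from the majority definition (the slides mention about 20 rules without listing them), and the words, synset ids, and frequencies are toy data rather than real WordNet entries.

```python
def build_dictionary(seed, senses):
    """Iterate steps 2-4 until no new polarity is inferred.

    seed:   {word: polarity} with known polarities (step 1).
    senses: {word: [(synset_id, relative_frequency), ...]}.
    Returns (word polarities, synset polarities).
    """
    word_pol = dict(seed)
    synset_pol = {}
    changed = True
    while changed:  # step 4: repeat until a fixed point is reached
        changed = False
        # Step 2: propagate word polarities to synsets.
        # Illustrative rule (an assumption, not one of the authors' rules):
        # a synset must have polarity p if, without it, the synsets that
        # could still have polarity p sum to at most 0.5.
        for word, p in word_pol.items():
            candidates = [(s, f) for s, f in senses.get(word, [])
                          if synset_pol.get(s, p) == p]
            total = sum(f for _, f in candidates)
            for s, f in candidates:
                if s not in synset_pol and total - f <= 0.5:
                    synset_pol[s] = p
                    changed = True
        # Step 3: give a polarity to each new word whose synsets with a
        # known polarity p cover more than half of its frequency mass.
        for word, word_senses in senses.items():
            if word in word_pol:
                continue
            for p in ("positive", "negative"):
                mass = sum(f for s, f in word_senses
                           if synset_pol.get(s) == p)
                if mass > 0.5:
                    word_pol[word] = p
                    changed = True
                    break
    return word_pol, synset_pol


# Toy data (hypothetical synset ids and frequencies).
senses = {
    "advance": [("s1", 0.5), ("s2", 0.5)],
    "progress": [("s1", 0.75), ("s3", 0.25)],
}
words, synsets = build_dictionary({"advance": "positive"}, senses)
print(words)
print(synsets)
```

In this toy run, “advance” passes its polarity to both of its synsets, and “progress” then becomes positive because its shared synset s1 carries 0.75 of its frequency mass. Synset s3 stays undetermined, since “progress” would still be positive without it.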

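Synset Polarity Inference

Example of an inference rule with one word and two synsets

Hypothesis: w is a word with polarity p (positive or negative) and two synsets, s1 and s2, each with relative frequency 0.5.
Conclusion: both s1 and s2 have polarity p.

• Example:
– “advance” has positive polarity in General Inquirer;
– It has two senses with identical relative frequencies in WordNet;
– Hence, we deduce that both its synsets have positive polarities.

[Diagram: word w with polarity p linked to synsets s1 and s2 with relative frequencies 0.5 and 0.5; after inference, s1 and s2 both carry polarity p.]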
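Why the conclusion follows can be written out briefly. The sketch assumes that "w has polarity p" means the relative frequencies of the synsets of w with polarity p sum to more than 0.5; the symbols f and S_p are introduced here for illustration only.

```latex
Let $S_p(w)$ be the set of synsets of $w$ with polarity $p$, and let
$f(w, s)$ be the relative frequency of synset $s$ for $w$. By hypothesis,
\[
  \sum_{s \in S_p(w)} f(w, s) > 0.5,
  \qquad
  f(w, s_1) = f(w, s_2) = 0.5 .
\]
If only one of $s_1, s_2$ belonged to $S_p(w)$, the sum would be exactly
$0.5$, which is not greater than $0.5$; if neither did, the sum would be $0$.
Hence $S_p(w) = \{s_1, s_2\}$, i.e. both synsets have polarity $p$.
```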

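Synset Polarity Inference

Example of an inference rule

Hypothesis: w is a word with polarity p (positive or negative) and synsets s1, s2, …, sn. Some of these synsets have known polarities ≠ p; the remaining synsets have unknown polarities, and the sum of their relative frequencies is > 0.5.
Conclusion: the synsets with unknown polarities all have polarity p.

• Example:
– “consummate” has positive polarity in General Inquirer.

[Diagram: word w with polarity p linked to synsets s1, s2, …, sn; one group has known polarities ≠ p, the other group has unknown polarities and relative frequencies summing to > 0.5; after inference, the synsets in the second group all have polarity p.]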

Experiments: Automatic Discovery

Data sets:
• WordNet [Fellbaum98]
• 3 sentimental dictionaries: General Inquirer [Stone96], Appraisal Lexicon [Taboada04] and Opinion Finder [Wilson05]
• Take the union of the 3 sentimental dictionaries

POS         Input Words   Inferred Words   Inferred Synsets
Noun        2,315         1,460            1,683
Verb        1,617         844              1,079
Adjective   2,937         1,407            1,907
Adverb      925           364              430
Total       7,794         4,075            5,099
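The claim that the resulting dictionary contains approximately 50% more words than the seed can be checked against the table with a few lines of arithmetic. The counts are copied from the table; the helper name is an assumption introduced for this sketch.

```python
# Counts copied from the table above: (input words, inferred words).
counts = {
    "Noun": (2315, 1460),
    "Verb": (1617, 844),
    "Adjective": (2937, 1407),
    "Adverb": (925, 364),
}

total_input = sum(i for i, _ in counts.values())
total_inferred = sum(n for _, n in counts.values())
# The per-POS rows add up to the "Total" row of the table.
assert (total_input, total_inferred) == (7794, 4075)


def growth_percent(input_words, inferred_words):
    """Inferred words as a percentage of the input (seed) words."""
    return 100.0 * inferred_words / input_words


for pos, (i, n) in counts.items():
    print(f"{pos:<10}{growth_percent(i, n):5.1f}%")
print(f"{'Total':<10}{growth_percent(total_input, total_inferred):5.1f}%")
```

The overall figure comes out at about 52.3%, consistent with "approximately 50% more words"; nouns grow the most in relative terms and adverbs the least.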

Experiments: Accuracy

• 100 words were randomly chosen from the 4,075 inferred words according to their distributions: nouns (22.2%), adjectives (37.5%), adverbs (11.5%) and verbs (28.8%).
• 3 humans judged their deduced polarities.
• The agreement between humans is 62%.
• The agreement between humans and automatic deduction is 63.3%.
• Reason for low agreement between humans: preconceived notions of the polarities of words/phrases, e.g., “eat at”.

Related Work

WordNet-based:
• SentiWordNet [Esuli06] assigns degrees of polarities.
• Q-WordNet [Agerri10] starts from 6 synsets with “known” polarities: “positive”, “negative”, “good”, “bad”, “inferior” and “superior”. It propagates the polarities using the semantic relationships, e.g., antonym, hypernym, etc.
• Measuring relative distance of a term from exemplars [Kamps04], e.g., “good” and “bad”.

Corpora-based

References

[Fellbaum98] C. Fellbaum. WordNet: An on-line lexical database and some of its applications. 1998.
[Stone96] P. Stone, D. Dunphy, M. Smith, and J. Ogilvie. The General Inquirer: A computer approach to content analysis. MIT Press, 1996.
[Taboada04] M. Taboada and J. Grieve. Analyzing appraisal automatically. In AAAI Spring Symposium on Exploring Attitude and Affect in Text, 2004.
[Wilson05] T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In HLT/EMNLP, 2005.
[Esuli06] A. Esuli and F. Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In LREC, 2006.
[Agerri10] R. Agerri and A. García-Serrano. Q-WordNet: Extracting polarity from WordNet senses. In LREC, 2010.
[Kamps04] J. Kamps, M. Marx, R. Mokken, and M. de Rijke. Using WordNet to measure semantic orientation of adjectives. In LREC, 2004.