Deeper Sentiment Analysis Using Machine Translation Technology
Kanayama Hiroshi, Nasukawa Tetsuya
Tokyo Research Laboratory, IBM Japan
Coling 2004


Abstract

This paper proposes a new paradigm for sentiment analysis: translation from text documents to a set of sentiment units, making use of an existing transfer-based machine translation engine.

Introduction

Sentiment analysis (SA) is the task of obtaining a person's feelings as expressed in positive or negative (favorable or unfavorable) comments, questions, and requests. SA is becoming a useful tool for commercial activities.

This paper describes a method to extract a set of sentiment units from sentences, which is the key component of SA.

A sentiment unit is a tuple of a sentiment, a predicate, and its arguments. For example, the review

"It has an excellent lens, but the price is too high. I don't think the quality of the recharger has any problem."

yields three sentiment units:

[favorable] excellent (lens)
[unfavorable] high (price)
[favorable] problematic+neg (recharger)

Three sentiment units indicate that the camera has good features in its lens and recharger, and a bad feature in its price.

The extraction of these sentiment units is not a trivial task because many syntactic and semantic operations are required.

A sentiment unit should be constructed as the smallest possible informative unit, so that the organizing processes after extraction can handle it easily.
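To show how such small units support an outline of opinions, here is a minimal sketch (not the authors' implementation) that groups the three units above by polarity. The tuple layout `(polarity, predicate, arguments)` and the `summarize` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical tuple layout: (polarity, predicate, arguments); "+neg" marks negation.
units = [
    ("favorable", "excellent", ("lens",)),
    ("unfavorable", "high", ("price",)),
    ("favorable", "problematic+neg", ("recharger",)),
]

def summarize(units):
    """Group the arguments of each sentiment unit by its polarity."""
    summary = defaultdict(list)
    for polarity, _predicate, arguments in units:
        summary[polarity].extend(arguments)
    return dict(summary)

print(summarize(units))
# {'favorable': ['lens', 'recharger'], 'unfavorable': ['price']}
```

This matches the reading on the slide: good lens and recharger, bad price.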

We implemented an accurate sentiment analyzer by making use of an existing transfer-based machine translation engine (Watanabe, 1992), replacing its translation patterns and bilingual lexicons with sentiment patterns and a sentiment polarity lexicon.

The analyzer uses deep analysis techniques such as those used for machine translation, where all of the syntactic and semantic phenomena must be handled.

Our SA system attaches importance to each individual sentiment expression rather than to the quantitative tendencies of reputation.

Sentiment Unit

A predicate is a word, typically a verb or an adjective, which conveys the main notion of the sentiment unit.

An argument is also a word, typically a noun, which modifies the predicate with a case postposition (particle) in Japanese. Arguments roughly correspond to the subject and object of the predicate in English.

For example, the sentence "ABC123 has an excellent lens." yields the sentiment unit:

[fav] excellent <ABC123, lens>
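A sentiment unit fits naturally in a small record. The sketch below is an assumption, not the engine's internal format: it uses a Python dataclass with hypothetical field names and only reproduces the bracket notation used on these slides.

```python
from dataclasses import dataclass, field

@dataclass
class SentimentUnit:
    """Hypothetical record: a polarity, a predicate, and the predicate's arguments."""
    polarity: str                                        # "fav" or "unf"
    predicate: str                                       # typically a verb or an adjective
    arguments: list[str] = field(default_factory=list)   # typically nouns

    def render(self) -> str:
        # Slide notation: [polarity] predicate <arg1, arg2>
        return f"[{self.polarity}] {self.predicate} <{', '.join(self.arguments)}>"

unit = SentimentUnit("fav", "excellent", ["ABC123", "lens"])
print(unit.render())  # [fav] excellent <ABC123, lens>
```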

Semantically similar representations should be aggregated to organize the extracted sentiments. Predicates may have features such as negation, facility, and difficulty:

"ABC123 doesn't have an excellent lens."  →  [unf] excellent+neg <ABC123, lens>
"Easy to break."  →  [unf] break+facil
"Difficult to learn."  →  [unf] learn+diff

Each sentiment unit also keeps its surface string, the corresponding part of the original text. The surface string is used for reference when viewing the output of SA.
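The following is a minimal sketch of how features and surface strings might interact with aggregation. It assumes that only the `neg` feature flips the lexicon polarity (the slide does not say how `facil` and `diff` affect polarity), and that the helpers `make_unit` and `aggregate` are hypothetical. The second input sentence is an invented paraphrase that exists only to show two surface strings collapsing into one normalized unit.

```python
from collections import defaultdict

def make_unit(lexicon_polarity, predicate, features, arguments, surface):
    """Build a (unit, surface string) pair.

    Assumption: only the 'neg' feature inverts the lexicon polarity.
    """
    flip = {"fav": "unf", "unf": "fav"}
    polarity = flip[lexicon_polarity] if "neg" in features else lexicon_polarity
    label = "+".join([predicate, *features])      # e.g. "excellent+neg"
    return (polarity, label, tuple(arguments)), surface

def aggregate(pairs):
    """Group surface strings under identical normalized units."""
    groups = defaultdict(list)
    for unit, surface in pairs:
        groups[unit].append(surface)
    return dict(groups)

pairs = [
    make_unit("fav", "excellent", ["neg"], ["ABC123", "lens"],
              "ABC123 doesn't have an excellent lens."),
    # Hypothetical paraphrase, included only to demonstrate aggregation.
    make_unit("fav", "excellent", ["neg"], ["ABC123", "lens"],
              "The lens of ABC123 is not excellent."),
]
for unit, surfaces in aggregate(pairs).items():
    print(unit, len(surfaces))
# ('unf', 'excellent+neg', ('ABC123', 'lens')) 2
```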

Implementation: Transfer-based Machine Translation Engine

The transfer-based machine translation system consists of three parts: a source-language syntactic parser, a bilingual transfer component that handles the syntactic tree structures, and a target-language generator.
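The three-stage architecture, and the idea of replacing the bilingual lexicon with a sentiment lexicon, can be sketched as a toy. The assumptions here are strong: the parser only splits a romanized "NOUN ga PREDICATE" string, the tree is a flat dict, and both lexicons and all function names are hypothetical. Between the MT and SA configurations, only the data handed to the transfer stage and the generator change.

```python
def parse(sentence):
    """Hypothetical stand-in for the Japanese parser: handles only 'NOUN ga PRED'."""
    noun, _particle, predicate = sentence.split()
    return {"pred": predicate, "args": [noun]}

def transfer(tree, lexicon):
    """Transfer stage: map every word through whichever lexicon is plugged in."""
    return {"pred": lexicon.get(tree["pred"], tree["pred"]),
            "args": [lexicon.get(a, a) for a in tree["args"]]}

def generate_english(tree):
    return f"The {tree['args'][0]} is {tree['pred']}."

def generate_unit(tree):
    return f"{tree['pred']} <{', '.join(tree['args'])}>"

bilingual = {"renzu": "lens", "warui": "bad"}   # hypothetical MT lexicon
sentiment = {"warui": "[unf] bad"}               # hypothetical polarity lexicon

tree = parse("renzu ga warui")
print(generate_english(transfer(tree, bilingual)))  # The lens is bad.
print(generate_unit(transfer(tree, sentiment)))     # [unf] bad <renzu>
```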

Techniques Required for Sentiment Analysis

Full syntactic parsing plays an important role in extracting sentiments correctly, because the results of a shallow parser alone are not always reliable. For example, an expression such as "I don't think X is good" is not a favorable opinion about X, even though "X is good" appears on the surface. Therefore we use top-down pattern matching on the tree structures produced by full parsing in order to find each sentiment fragment.

In our method, the top node is examined first to see whether the node and its combination of child nodes match one of the patterns in the pattern repository. In this top-down manner, the node "don't think" in the above example is examined before "X is good".
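Here is a minimal sketch of the top-down idea. It assumes a toy tree in which the `think` node carries a negation flag. The `Node` class and the `PRINCIPAL` and `AUXILIARY` tables are hypothetical, whereas the real engine matches patterns against full Japanese parse trees. The root is examined first, so the negation on "don't think" is picked up before "X is good" is reached. For contrast, a context-free scan of every node reports a favorable opinion instead.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    neg: bool = False
    children: list["Node"] = field(default_factory=list)

PRINCIPAL = {"good": "fav", "bad": "unf"}   # hypothetical predicate polarities
AUXILIARY = {"think"}                        # hypothetical: descend into the complement

def match_top_down(node, negated=False):
    """Examine a node before its children; return (polarity, predicate, args) or None."""
    negated ^= node.neg
    if node.word in PRINCIPAL:
        polarity = PRINCIPAL[node.word]
        if negated:                          # negation inverts favorability
            polarity = "unf" if polarity == "fav" else "fav"
        args = [child.word for child in node.children]
        return polarity, node.word + ("+neg" if negated else ""), args
    if node.word in AUXILIARY:
        for child in node.children:
            found = match_top_down(child, negated)
            if found:
                return found
    return None                              # not reachable from the root

def shallow_scan(node):
    """Contrast: report every polar word regardless of its context."""
    hits = [(PRINCIPAL[node.word], node.word)] if node.word in PRINCIPAL else []
    for child in node.children:
        hits += shallow_scan(child)
    return hits

# "I don't think X is good."
tree = Node("think", neg=True, children=[Node("I"), Node("good", children=[Node("X")])])
print(match_top_down(tree))  # ('unf', 'good+neg', ['X'])
print(shallow_scan(tree))    # [('fav', 'good')]
```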

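There are three types of patterns: principal patterns, auxiliary patterns, and nominal patterns.

Principal patterns map an expression directly to a sentiment unit. For example, one principal pattern converts the Japanese expression "noun ga warui" ('noun is bad') into the sentiment unit "[unf] bad <noun>".

Another principal pattern converts the expression "noun wo ki-ni iru" ('like noun') into the sentiment unit "[fav] like <noun>".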
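A principal pattern can be pictured as a lookup keyed on the predicate and the case particle that marks its argument. In the sketch below, the pattern table, the `apply_principal` helper, and the dict of case-marked arguments are hypothetical simplifications of the real tree patterns.

```python
# Hypothetical principal patterns:
# (Japanese predicate, case particle) -> (polarity, English predicate)
PRINCIPAL_PATTERNS = {
    ("warui", "ga"): ("unf", "bad"),        # "noun ga warui"
    ("ki-ni iru", "wo"): ("fav", "like"),   # "noun wo ki-ni iru"
}

def apply_principal(predicate, case_args):
    """case_args maps case particles to argument nouns, e.g. {"ga": "renzu"}."""
    for particle, noun in case_args.items():
        match = PRINCIPAL_PATTERNS.get((predicate, particle))
        if match:
            polarity, english = match
            return f"[{polarity}] {english} <{noun}>"
    return None   # no principal pattern fits this predicate/particle combination

print(apply_principal("warui", {"ga": "renzu"}))       # [unf] bad <renzu>
print(apply_principal("ki-ni iru", {"wo": "dezain"}))  # [fav] like <dezain>
```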

Auxiliary patterns expand the scope of matching. For example, an auxiliary pattern matches phrases such as "X-wa yoi-to omowa-nai" ('I don't think X is good') and produces a sentiment unit with the negation feature. When this pattern is attached to a principal pattern, the favorability of the principal pattern is inverted.

Nominal patterns convert a noun phrase such as "renzu-no shitsu" ('quality of the lens') into just "lens". For example, "The quality of the lens is good." yields [fav] good <lens> rather than [fav] good <quality>.

Nominal patterns are also used for compound nouns such as "juden jikan" ('recharging time'). A sentiment unit "long <time>" is not informative, but "long <recharging time>" can be regarded as an [unf] sentiment.
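Both uses of nominal patterns can be sketched as a function that normalizes a romanized noun phrase before it becomes an argument. The token tuples and the `TRANSPARENT_HEADS` and `COMPOUNDS` tables are hypothetical. Without a matching pattern, the function falls back to the head noun, which is exactly the uninformative `<time>` case described above.

```python
# Hypothetical nominal patterns: heads such as "shitsu" (quality) are transparent,
# so "A-no shitsu" is reduced to A.
TRANSPARENT_HEADS = {"shitsu"}
# Hypothetical compound-noun patterns: keep the whole compound as the argument.
COMPOUNDS = {("juden", "jikan")}   # "recharging time"

def normalize_argument(tokens):
    """Reduce a romanized noun phrase (a tuple of tokens) to an informative argument."""
    if len(tokens) == 3 and tokens[1] == "no" and tokens[2] in TRANSPARENT_HEADS:
        return tokens[0]            # "renzu-no shitsu" -> "renzu" (lens)
    if tuple(tokens) in COMPOUNDS:
        return " ".join(tokens)     # "juden jikan" stays whole
    return tokens[-1]               # default: the head noun

print(normalize_argument(("renzu", "no", "shitsu")))  # renzu
print(normalize_argument(("juden", "jikan")))         # juden jikan
print(normalize_argument(("jikan",)))                 # jikan
```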

Disambiguation of sentiment polarity

Some adjectives and verbs may be used as both favorable and unfavorable predicates. This variation in sentiment polarity can be disambiguated naturally, in the same manner as word sense disambiguation in machine translation. For example, the Japanese adjective takai ('high' or 'expensive') is favorable in "The resolution is high" [fav] but unfavorable in "ABC123 is expensive" [unf]. The semantic category assigned to the argument noun holds the information used for this type of disambiguation.
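This mirrors how an MT lexicon picks between the translations "high" and "expensive", so the same category information can also pick the polarity. The sketch below assumes hypothetical semantic categories and a hypothetical entry table for takai; the category names and the `disambiguate_takai` helper are illustrative only.

```python
# Hypothetical semantic categories for argument nouns.
SEMANTIC_CATEGORY = {"kaizoudo": "performance",   # resolution
                     "nedan": "cost",             # price
                     "ABC123": "product"}
# Hypothetical entries for the ambiguous adjective "takai" (high / expensive).
TAKAI = {"performance": ("fav", "high"),
         "cost": ("unf", "expensive"),
         "product": ("unf", "expensive")}

def disambiguate_takai(argument):
    """Choose polarity and English predicate from the argument's semantic category."""
    entry = TAKAI.get(SEMANTIC_CATEGORY.get(argument))
    if entry is None:
        return None   # unknown category: leave the polarity undecided
    polarity, english = entry
    return f"[{polarity}] {english} <{argument}>"

print(disambiguate_takai("kaizoudo"))  # [fav] high <kaizoudo>
print(disambiguate_takai("ABC123"))    # [unf] expensive <ABC123>
```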

Resources

Principal patterns: verbal and adjectival patterns, with a sentiment polarity assigned to each word (3,752 words in total).

Auxiliary/Nominal patterns: 95 auxiliary patterns and 36 nominal patterns were created manually.

Polarity lexicon: some nouns were assigned a sentiment polarity, e.g. [unf] for 'noise' (There are many ...).

Some patterns and lexicons are domain-dependent. Fortunately, the translation engine used here can selectively use domain-dependent dictionaries, so we can prepare patterns that are especially suited to the domain of digital cameras.
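One simple way to model selective domain dictionaries is a layered lookup in which domain entries shadow general ones. This is a sketch under assumptions: Python's `collections.ChainMap` stands in for the engine's dictionary mechanism, and every entry is hypothetical. For instance, treating omoi ('heavy') as unfavorable for cameras is an invented example.

```python
from collections import ChainMap

# Hypothetical general-purpose polarity lexicon.
GENERAL = {"kirei": "fav", "omoi": "neutral", "noise": "unf"}
# Hypothetical domain-dependent dictionaries, consulted before the general one.
DOMAINS = {"digital_camera": {"omoi": "unf"}}

def lexicon_for(domain=None):
    """Return a lookup in which domain entries override general entries."""
    return ChainMap(DOMAINS.get(domain, {}), GENERAL)

camera = lexicon_for("digital_camera")
print(camera["omoi"], camera["kirei"])   # unf fav
print(lexicon_for()["omoi"])             # neutral
```

Because `ChainMap` only layers references, switching domains does not copy or modify the general lexicon.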

Evaluation

The data were taken from bulletin boards on the WWW discussing digital cameras. A total of 200 randomly selected sentences were analyzed by our system. The resources were created by looking at other parts of the same domain's texts.

Experiment 1

To evaluate the reliability of the extracted sentiment polarity, we used three metrics: weak precision, strong precision, and recall.

Two methods were compared:
(a) the MT method, based on the machine translation engine;
(b) the lexicon-only method, which emulates the shallow parsing approach. It uses a simple polarity lexicon of adjectives and verbs, performs no disambiguation, and handles only direct negation of an adjective or verb.
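The slide does not define how weak and strong precision differ. The sketch below therefore shows only generic precision and recall over sets of sentiment units, with the matching criterion left as a parameter. The data and the relaxed criterion that ignores polarity are hypothetical illustrations, not the paper's definitions or results.

```python
def precision_recall(system, gold, match=lambda s, g: s == g):
    """Precision: share of system units matching some gold unit.
    Recall: share of gold units matched by some system unit."""
    hits_sys = sum(any(match(s, g) for g in gold) for s in system)
    hits_gold = sum(any(match(s, g) for s in system) for g in gold)
    precision = hits_sys / len(system) if system else 0.0
    recall = hits_gold / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical units: (polarity, predicate, argument).
gold = [("fav", "excellent", "lens"), ("unf", "high", "price")]
system = [("fav", "excellent", "lens"), ("fav", "high", "price")]

print(precision_recall(system, gold))  # (0.5, 0.5)
# Illustrative relaxed criterion that ignores polarity (not the paper's definition).
print(precision_recall(system, gold, match=lambda s, g: s[1:] == g[1:]))  # (1.0, 1.0)
```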

The MT method outputs a sentiment unit only when the expression is reachable from the root node of the syntactic tree through a combination of sentiment fragments, while the lexicon-only method picks up sentiment units from any node in the syntactic tree.

The following sentence is an example where the lexicon-only method output a wrong sentiment unit, while the MT method did not output it:

gashitsu-ga kirei-da-to iu hyouka-ha uke-masen-deshi-ta.
'There was no opinion that the picture was sharp.'
Lexicon-only output: [fav] clear <picture>

In the lexicon-only method, some errors also occurred due to ambiguity in the sentiment polarity of an adjective or verb, e.g. "Capabilities are high.", since high/expensive is always assigned the [unf] polarity.
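Both error types can be reproduced by a toy version of the lexicon-only baseline. The assumptions are as follows: the space-separated tokenization, the `LEXICON` and `DIRECT_NEGATION` tables, and the `lexicon_only` helper are all hypothetical. Each lexicon word has one fixed polarity, and negation is honored only when it immediately follows the word.

```python
# Hypothetical flat lexicon: one fixed polarity per word, no disambiguation.
LEXICON = {"kirei": "fav", "takai": "unf", "warui": "unf"}
DIRECT_NEGATION = {"nai"}   # hypothetical marker checked only right after the word

def lexicon_only(tokens):
    """Emit (polarity, word) for every lexicon hit anywhere in the sentence."""
    units = []
    for i, token in enumerate(tokens):
        if token in LEXICON:
            polarity = LEXICON[token]
            if i + 1 < len(tokens) and tokens[i + 1] in DIRECT_NEGATION:
                polarity = "fav" if polarity == "unf" else "unf"
            units.append((polarity, token))
    return units

# 'There was no opinion that the picture was sharp.' (hypothetical tokenization)
print(lexicon_only("gashitsu ga kirei da to iu hyouka ha uke masen deshi ta".split()))
# [('fav', 'kirei')]
# 'Capabilities are high.' (seinou ga takai): takai is always unfavorable here.
print(lexicon_only("seinou ga takai".split()))
# [('unf', 'takai')]
```

The distant negation "masen" is never connected to "kirei", and takai cannot be disambiguated, which are the two failure modes described above.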

Experiment 2

This experiment compares the scope of the extracted sentiment units between the MT method (a) and (c), a method that supports only naïve predicate-argument structures and does not use nominal patterns.

The output of the MT method was less redundant and more informative than that of the naïve method. For example:

"It seems the function was enhanced last May."
(a) [fav] enhance <function>
(c) [fav] enhance <function, May>

"A zoom is more desirable."
(a) [fav] desirable <zoom>
(c) [fav] desirable <hou>

Conclusion

We have shown that deep syntactic and semantic analysis makes the reliable extraction of sentiment units possible. Outlining sentiments also becomes useful because variant expressions are aggregated and the arguments are output in an informative form.

By regarding the extraction of sentiment units as a kind of translation, we can apply many techniques that have been studied for machine translation, such as word sense disambiguation and anaphora resolution, to accelerate the further enhancement of sentiment analysis.