a probabilistic approach to tweets' sentiment classification - acii 2013 conference

A Probabilistic Approach to Tweets’ Sentiment Classification

Francesco Colace, Massimo De Santo, Luca Greco

DIEM – Università degli Studi di Salerno

{fcolace, desanto, lgreco}@unisa.it

ACII 2013 – Geneva, 2-5 September 2013

Motivation

Web 2.0 (or Web X.Y) rules!

Social networks, blogs, microblogs, review-collector sites: a huge quantity of heterogeneous and opinionated data

Motivation

Open issues:

o How to manage this information?
o How to extract the sentiment inside the data?
o How to understand something about the users?
o How to evaluate the opinion of people about some topics or products?

Sentiment Analysis

Outline

Brief introduction to Sentiment Analysis
o Related Works

Towards a Sentiment Analysis Framework
o The Proposed Approach
• The LDA Approach
• The Mixed Graph of Terms
• A sentiment mining algorithm

Experimental results

Conclusions and Future Works

Sentiment Analysis

Sentiment:
o a thought, view, or attitude, based mainly on emotion instead of reason

Sentiment Analysis (also known as Opinion Mining):
o use of Natural Language Processing (NLP) and computational techniques to automate the extraction and classification of sentiment from unstructured texts

Sentiment Analysis: Why?

Consumer information
o Product reviews (Amazon, e-Bay, …)

Marketing
o Consumer attitudes
o Trends

Politics
o Politicians want to know voters’ points of view
o Voters want to know politicians’ stances and who else supports them

Social
o Find like-minded individuals or communities

Sentiment Analysis: Open Issues

Which features to adopt?
o Words
o Sentences

How to interpret features for sentiment detection?
o As a bag of words
o By the use of annotated lexicons
o According to syntactic patterns
o Analyzing the paragraph structure

Sentiment Analysis: Approaches

Naïve Bayes

Maximum Entropy Classifier

Markov Blanket Classifier

… … …

Latent Dirichlet Allocation (LDA)

The Proposed Approach: from the Bag-of-Words …

With the Bag of Words approach, a document is represented as an unordered collection of its words together with their frequencies
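A minimal sketch of a bag-of-words representation may help here. The tokeniser (lowercasing plus a simple regular expression) and the function name `bag_of_words` are assumptions made for illustration; the slides do not specify any preprocessing.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return the word counts of a text; positions in the text are not kept."""
    # Assumed tokenisation: lowercase, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


a = bag_of_words("great movie, really great")
b = bag_of_words("really great, great movie")
```

Both `a` and `b` hold the same counts, since the representation keeps only which words occur and how often.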

Problems:

o Which words best express the sentiment in a text?

o How to compare the «bags of words» derived from texts with the same sentiment?

o Can a bag of words represent the documents’ domain of interest?

… to mixed Graph of Terms (mGT)

The mixed Graph of Terms is a «graph based» representation of documents

In the proposed approach, a mixed Graph of Terms is obtained by automatically extracting words with probabilistic clustering techniques such as Latent Dirichlet Allocation (LDA)

In a mixed Graph of Terms, words are linked according to their mutual occurrence probability, and «aggregating_word» and «aggregated_words» can be distinguished
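To make "mutual occurrence probability" concrete, here is a rough sketch. In the slides these probabilities come from LDA; the code below substitutes plain document-level co-occurrence counts as a simpler stand-in, so it is not the authors' procedure. The function name and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_probabilities(documents):
    """Estimate P(w_i, w_j) as the fraction of documents containing both words."""
    pair_counts = Counter()
    for doc in documents:
        # Each distinct word counts once per document; sorting fixes the pair order.
        vocab = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(vocab, 2))
    n = len(documents)
    return {pair: count / n for pair, count in pair_counts.items()}


docs = ["great movie", "great plot great movie", "boring movie"]
probs = cooccurrence_probabilities(docs)
```

Pairs with a high estimated probability would be candidates for edges in the graph; how the threshold is chosen is left open here.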

Our proposal: a mixed Graph of Terms can be used as a «sentiment filter»

mGT: a different point of view

In the proposed approach, two different layers can be distinguished in a mixed Graph of Terms:

The Aggregator Layer: the words with the highest degree of interconnection with the other words in the documents

The “Aggregated Words” Layer: the words with the highest degree of interconnection with one or more Aggregator Words
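One possible way to hold the two layers in memory is sketched below. The class `MixedGraphOfTerms`, its field names, and the example weights are all hypothetical; the slides describe the layers but not a concrete data structure.

```python
from dataclasses import dataclass, field


@dataclass
class MixedGraphOfTerms:
    # Aggregator layer: weighted links between pairs of aggregator words.
    aggregator_links: dict = field(default_factory=dict)  # {(agg_a, agg_b): weight}
    # Aggregated-words layer: each aggregator maps to the words attached to it.
    aggregated_links: dict = field(default_factory=dict)  # {agg: {word: weight}}

    def add_aggregator_link(self, a, b, weight):
        # Store the pair in sorted order so (a, b) and (b, a) share one entry.
        self.aggregator_links[tuple(sorted((a, b)))] = weight

    def add_aggregated_word(self, aggregator, word, weight):
        self.aggregated_links.setdefault(aggregator, {})[word] = weight

    def aggregators(self):
        """All words that appear in the aggregator layer."""
        words = set(self.aggregated_links)
        for a, b in self.aggregator_links:
            words.update((a, b))
        return words


# Illustrative values only, not taken from the slides.
mgt = MixedGraphOfTerms()
mgt.add_aggregator_link("movie", "actor", 0.8)
mgt.add_aggregated_word("movie", "great", 0.6)
mgt.add_aggregated_word("actor", "brilliant", 0.4)
```

Keeping the two layers in separate dictionaries mirrors the distinction between aggregator words and aggregated words without committing to a particular graph library.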

Latent Dirichlet Allocation

In natural language processing, Latent Dirichlet Allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar

For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics

The basic idea is that the documents are represented as random mixtures over latent topics, where a topic is characterized by a distribution over words
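The generative story described above can be sketched with the standard library alone. This only simulates how LDA assumes documents are produced; it does not perform inference. The two toy topics, the Dirichlet parameters, and the function names are assumptions chosen for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document: a topic mixture first, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)  # per-document mixture over topics
    words = []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # topic for this word
        vocab, word_probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=word_probs)[0])  # word from that topic
    return theta, words


# Two toy topics, each a distribution over words (illustrative values only).
TOPICS = [
    {"great": 0.5, "love": 0.3, "movie": 0.2},
    {"boring": 0.5, "waste": 0.3, "movie": 0.2},
]

rng = random.Random(0)
theta, doc = generate_document(TOPICS, alpha=[0.5, 0.5], length=8, rng=rng)
```

Fitting LDA means going the other way: recovering plausible topics and mixtures from observed documents.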

By the use of the Latent Dirichlet Allocation technique a set of documents can be represented as a mixed Graph of Terms

Extraction of a Mixed Graph of Terms

mGT: an example

Sentiment Classification by the use of mGT

Step_1: Learn a mixed Graph of Terms from labelled documents (i.e. Positive or Negative), obtaining:
o mGT positive
o mGT negative

Step_2: Use the mixed Graphs of Terms as filters in order to classify the sentiment of texts
o Comparing concepts that are both in the mGTs and in the text
o Comparing words that are both in the mGTs and in the text
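The word-comparison part of Step_2 could look roughly like the sketch below. Each mGT is flattened to a hypothetical word-to-weight dictionary, and the scoring rule (sum the weights of shared words, pick the larger total) is an assumption; the slides do not give the exact rule, and the concept comparison is omitted.

```python
def overlap_score(tokens, mgt_weights):
    """Sum the weights of the mGT words that also occur in the text."""
    present = set(tokens)
    return sum(weight for word, weight in mgt_weights.items() if word in present)


def classify(text, positive_mgt, negative_mgt):
    """Label a text by whichever mGT it overlaps with more strongly."""
    tokens = text.lower().split()
    pos = overlap_score(tokens, positive_mgt)
    neg = overlap_score(tokens, negative_mgt)
    if pos == neg:
        return "undecided"
    return "positive" if pos > neg else "negative"


# Illustrative weights only; real values would come from the learned mGTs.
POSITIVE_MGT = {"great": 0.6, "love": 0.5, "brilliant": 0.4}
NEGATIVE_MGT = {"boring": 0.6, "waste": 0.5, "awful": 0.4}

label = classify("I love this great show", POSITIVE_MGT, NEGATIVE_MGT)
```

Because scoring is only a lookup over the words of the text, classification stays cheap once the two graphs have been built.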

Sentiment Classification by the use of mGT

Experimental Results

Dataset: Movie Reviews

Approach                   Accuracy (%)
Support Vector Machine*    82.90
Naive Bayes*               81.50
Maximum Entropy*           81.00
mGT-LDA                    88.50

*[Bo Pang, 2002]

Dataset: Real Tweets related to Politics
Training Set: 3980 Tweets
Test Set: 32185 Tweets

Approach       Accuracy (%)
mGT-LDA        87.10
SVM            79.20
Naive Bayes    76.60
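For reference, accuracy figures like those in the tables are usually the percentage of test items whose predicted label matches the gold label. The sketch below assumes that definition; the slides do not spell it out, and the labels used are made up.

```python
def accuracy(predicted, gold):
    """Percentage of items whose predicted label matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("label lists must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Made-up labels for illustration; not data from the experiments.
gold = ["positive", "negative", "positive", "negative"]
pred = ["positive", "negative", "negative", "negative"]
score = accuracy(pred, gold)
```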

http://193.205.190.209/elezioni2013/


Masterchef - http://193.205.190.209/tvshow/masterchef/

Conclusions

Pros:
o Language independent
o Fast classification
o Continuous upgrade
o Small training set

Cons:
o In general, the mGT building process takes a long time
o An annotated lexicon is needed

Future Works

To improve the classification by continuously updating the training set

To introduce SentiWordNet as the annotated lexicon

To adopt an ontological formalism for a better representation of the mGT

To build a larger dataset of tweets

Any Questions?

Don’t forget to tweet your sentiment!!!
