

DATA ANALYSIS USING DEEP NEURAL NETWORKS

¹Kushal Chakraborty, ²D. Malathi

¹,²Dept. of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India

Abstract: Social media has become very popular in the modern age of the Internet, and it is a large reservoir of opinionated data. Nowadays, people use social media not only to gain information but also to share their opinions and reviews on a wide variety of topics. Data analysis is defined as the systematic, objective and exhaustive study of the data and facts relevant to a real-world problem. It is the process of inspecting, cleaning, transforming and modelling data in order to find useful information, suggest conclusions and support decision making. It comprises various techniques and approaches. Sentiment analysis and opinion mining is one such approach: it enables us to determine the overall view or opinion that people hold or express about a product, a movie or any other topic. Data analysis also covers other aspects of the data. These include the locations in the world where people have talked about the topic and the number of people who have talked, and are still talking, about it (which indicates the topic's popularity). The uses of data analysis are innumerable. It enables companies to research markets, sales and products, and it reveals the public's response to particular government policies. In this paper we perform a detailed analysis of the tweets relevant to a particular topic of interest using Deep Neural Networks, and we provide a comprehensive analytical solution about the entity.

Keywords: Data Analysis; Sentiment; Opinion; Recursive Neural Network; Emoticon; Hashtag; Entity; Locations.

1. Introduction

Sentiment Analysis is the field of study that employs text analysis, computational linguistics and established natural language processing techniques for two purposes. The first is to analyze people's opinions, sentiments, evaluations, attitudes and emotions. The second is to identify, quantify and study affective states and subjective information. It is applied to voice-of-the-customer material concerning entities such as products, services, organizations, individuals, issues, events and their attributes. It addresses the question "What is the psychology of the people?". In this paper, we have implemented Deep Learning Neural Networks to answer user queries about:

• The quality of a particular product

• The rating of a particular movie

• The analysis of the upcoming elections

• The analysis of new policies of the government

The objectives of our research work are:

• To find the sentiment of the public's reviews regarding a particular entity of interest, i.e. whether it is very negative, negative, neutral, positive or very positive.

• To find the locations in the world where the topic has been discussed most.

• To find all the relevant hashtags.

• To find all the topics related to our topic of interest.

• To find the number of people who have spoken about the topic at a particular point in time, which indicates the popularity of the topic.

Several terms associated with sentiment analysis differ only subtly in meaning:

• Opinion: A judgement formed about something

• Sentiment: A view or opinion about something

• Evaluation: An assessment of an entity's worth or significance

• Appraisal: The act of assessing something or someone

• Attitude: A way of thinking or feeling about something

• Emotion: A feeling derived from one's circumstances

2. Various Techniques of Sentiment Analysis

Various techniques of sentiment analysis found in the literature survey are given in Table 1. Many authors have implemented these techniques at sentence level, and the table also gives the accuracy obtained and the limitations of each.

International Journal of Pure and Applied Mathematics, Volume 118, No. 22, 2018, 177-187, ISSN: 1314-3395 (on-line version), url: http://acadpubl.eu/hub, Special Issue, ijpam.eu

The authors in [3] used existing linguistic features and resources to extract information from the informal language used on Twitter, applying supervised machine learning approaches to the problem. Apoorv Agarwal et al. [5] used tree kernel and feature-based models to implement sentiment analysis. They used a supervised machine learning technique and classified sentiment into three classes: positive, negative and neutral. The authors in [6] provided a technique for automatically classifying the sentiment of Twitter messages based on a keyword, classifying the sentiment into two classes: positive and negative.

In paper [9], the authors proposed a technique that summarizes the comments on a product based on the votes given by customers. It provides ratings of the essential aspects of the product so that the customer can view the target product from different viewpoints. The authors in [10] developed an Android app for polarity analysis of reviews and comments. The task was carried out in several steps: acquisition of reviews, identification of polarity and features, and parsing. The detection of sarcastic sentences is discussed in [11]. For sentiment analysis it is very important to detect sarcasm in a sentence, if present. The authors discussed various approaches to sarcasm detection, such as rule-based approaches and deep learning techniques.

Table 1. Various Techniques of Sentiment Analysis

Author | Technique/Approach | Dataset | Accuracy/Drawback
Khan et al. [17] | Rule-based method / Sentence level | 1000 reviews each on movies and airlines; 2600 reviews on hotels | 91% at document level and 86% at sentence level / Based on the WordNet database
Ana C. E. S. Lima et al. [1] | Emoticon-, word- and hybrid-based approaches / Sentence level | Tweets related to Brazilian TV shows | 90% average / Criteria used to infer the sentiments are static in nature
Samaneh Moghaddam et al. [20] | ILDA / Document level | Various review websites that use ratings | 73% / Correspondence between identified clusters and ratings is not explicit
Jorge Carrillo et al. [19] | Machine learning / Document and sentence level | 25 reviews per hotel from 60 different hotels on booking.com | 71.7% for 3 categories and 46.9% for 5 categories / Not applicable to reviews written in languages other than English
Samaneh Moghaddam et al. [18] | Opinion Digger / Sentence level | Reviews from rating websites such as Amazon.com | Ranking loss of 0.49 / Requires guidelines and known aspects; based on the WordNet database
Tai et al. [23] | Dependency Tree-LSTM (Long Short-Term Memory) / Sentence level | Stanford Sentiment Treebank dataset | 48.4% / N/A
Ouyang et al. [16] | Convolutional Neural Network / Sentence level | Movie reviews from rottentomatoes.com | 45.4% / N/A

In paper [12], the authors performed a detailed analysis of hashtags and memes, which have become very important components of tweets. They revealed some interesting facts about both expected and unexpected hashtags. In paper [13], the authors discussed approaches that use sentiment analysis to determine whether a user is a bot or a human. They used the SentiBot framework and the India Election Dataset for this purpose.


The authors in [14] used the sentiment predictions of several websites as noisy labels to train a model. They used 1000 manually labelled tweets for tuning and another 1000 manually labelled tweets for testing.

A technique for aspect-level sentiment analysis of different entities is proposed in [2]. The authors also discussed the various levels of sentiment analysis in current use and the challenges faced by researchers in the field.

3. Aspect Level Sentiment Analysis Algorithm

We propose a system that performs sentence-level sentiment analysis on tweets and categorizes the tweets into five categories based on the score:

• 0 – Very Negative or Negative

• 1 – Somewhat Negative

• 2 – Neutral

• 3 – Somewhat Positive

• 4 – Positive

A. Twitter

Twitter is a social networking service and microblogging platform that allows users to post brief, real-time messages called tweets. Tweets have several characteristics:

• The maximum length of a tweet is 140 characters, although Twitter has been experimenting with increasing the limit to 280 characters.

• The volume of data available on Twitter is vast. Using the Twitter API and Twitter4J, it is easy to collect millions of tweets for our experiments.

• Twitter users post messages from many types of devices. Tweets contain a large number of errors, misspellings, non-English words and slang, which makes preprocessing absolutely necessary.

• Twitter users post brief, concise messages about a wide variety of topics, and these messages are freely available. This makes Twitter well suited to our data analysis project.

The following is a brief glossary of the components of tweets:

• Handle: A Twitter handle is a username preceded by "@". Mentioning a handle in a tweet refers to that user and alerts them. A handle can contain at most 15 characters. Each handle has a distinct URL, formed by appending the handle to twitter.com. Handles are also sometimes referred to as the "target".

• Emoticon: A pictorial facial expression, generally represented by a combination of punctuation marks and letters, that expresses the user's sentiment or mood. Emoticons are very important from the sentiment analysis point of view.

• Hashtag: A topic, keyword or phrase preceded by the "#" symbol. Hashtags are used to categorize messages and to find relevant topics on Twitter.

• Timeline: A dynamically updated list of tweets in which the most recent tweet is displayed at the top.

• Retweet: A tweet, originally written by someone else, that a user forwards (resends) to their followers. Retweeting is a common activity on Twitter, and the number of retweets shows how popular a tweet is.

Example: "PM Shri @narendramodi congratulates @isro team for successfully launching of 100 satellites in a single mission. #National Youth Day".

This tweet is taken from the timeline of BJP on Twitter. It contains two handles, "narendramodi" and "isro", and the hashtag "National Youth Day". In summary, the sentiment of the tweet is positive, the topic it refers to is National Youth Day, and the entity is "Satellites".
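The components defined above can be picked out of raw tweet text with regular expressions. The sketch below is an illustrative assumption, not the parser used in the system described here. The function name `extract_components`, the small emoticon pattern and the "RT @user:" retweet prefix convention are hypothetical simplifications. The handle pattern simply applies the rule that a handle has at most 15 letters, digits or underscores. Because a hashtag ends at the first space, the tag written as "#National Youth Day" in the example is read as "#National".

```python
import re

# Handle: "@" plus 1-15 letters, digits or underscores; the lookarounds reject
# e-mail addresses and names that are longer than 15 characters.
HANDLE_RE = re.compile(r"(?<![\w@])@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])")
# Hashtag: "#" plus word characters; a space ends the tag.
HASHTAG_RE = re.compile(r"(?<![\w#])#(\w+)")
# A few ASCII emoticons such as :) ;-) =D :P (illustrative, not exhaustive).
EMOTICON_RE = re.compile(r"[:;=]-?[)(DPp]")
# Assumed convention: a retweet's text starts with "RT @user:".
RETWEET_RE = re.compile(r"^RT @([A-Za-z0-9_]{1,15}):")

def extract_components(text):
    """Return the handles, hashtags, emoticons and retweet source of a tweet."""
    rt = RETWEET_RE.match(text)
    return {
        "handles": HANDLE_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(text),
        "emoticons": EMOTICON_RE.findall(text),
        "retweet_of": rt.group(1) if rt else None,
    }

tweet = ("PM Shri @narendramodi congratulates @isro team for successfully "
         "launching of 100 satellites in a single mission. #National Youth Day")
parts = extract_components(tweet)
print(parts["handles"])   # ['narendramodi', 'isro']
print(parts["hashtags"])  # ['National']
```

A real deployment would more likely read these fields from the structured entities Twitter returns with each tweet than re-parse the text.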

B. Deep Neural Network

The deep neural network used in this project is the Recursive Neural Network. It recursively applies the same set of weights over a structured input of variable size, traversing the structure in topological order, to produce a structured prediction. In its simplest form, the Recursive Neural Network can be represented as follows:

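As shown in Fig. 1, let $v_1$ and $v_2$ be the $n$-dimensional vectors of the two child nodes, and let $W \in \mathbb{R}^{n \times 2n}$ be the weight matrix. The parent vector $p_v$ is calculated as

$$p_v = \mathrm{func}\left(W \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}\right) \qquad (1)$$

where $p_v$ is also an $n$-dimensional vector. Its score is

$$\mathrm{score} = W_s^{T}\, p_v \qquad (2)$$

where $W_s \in \mathbb{R}^{n}$, so that $W_s^{T} \in \mathbb{R}^{1 \times n}$ and the score is a scalar.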

Figure 1. A Simple Architecture of Recursive Neural Network
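Equations (1) and (2) reduce to a matrix-vector product over the concatenated children, followed by a dot product. The pure-Python sketch below assumes tanh as `func`. It uses small hand-picked weights (n = 2) purely for illustration; they are not trained parameters from the model in this paper.

```python
import math

def compose(W, v1, v2, func=math.tanh):
    """Eq. (1): p_v = func(W [v1; v2]) with W of shape n x 2n."""
    n = len(v1)
    assert len(W) == n and all(len(row) == 2 * n for row in W), "W must be n x 2n"
    stacked = v1 + v2  # concatenation [v1; v2], length 2n
    return [func(sum(w * x for w, x in zip(row, stacked))) for row in W]

def score(Ws, pv):
    """Eq. (2): score = W_s^T p_v, a scalar."""
    return sum(w * p for w, p in zip(Ws, pv))

# Hypothetical toy weights: n = 2, so W is 2 x 4 and W_s has 2 entries.
W = [[0.5, -0.2, 0.1, 0.3],
     [0.0, 0.4, -0.3, 0.2]]
v1, v2 = [1.0, 0.0], [0.0, 1.0]
pv = compose(W, v1, v2)
print(len(pv), score([1.0, -1.0], pv))
```

The parent vector has the same dimension n as each child. This is what allows the same `compose` call to be reapplied higher up the tree.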


Figure 2. Architecture of Recursive Neural Network

Fig. 2 shows the architecture of the Recursive Neural Network. It has 140 nodes in the input layer, 8 hidden layers and an output layer that provides the final sentiment of the sentence. The process used to calculate the sentiment is described in the next subsection.

C. Sentiment Analysis

In this paper, we use a Recursive Neural Network for sentiment analysis, based on the concepts of semantic vector spaces and compositionality. A semantic vector space converts each unique word into a vector whose elements capture the various contexts in which that word is used in the corpus. This can be achieved with the Word2vec or GloVe tools. Although such vectors can detect words that share common contexts, they cannot capture the semantics of longer phrases or sentences. For example, consider the two sentences:

• The country of my birth

• The place where I was born

Both sentences convey almost the same meaning, although they use different words. The individual word vectors of the two sentences alone do not let us deduce that their meanings are the same. This is why we move to another technique, called compositionality.

Compositionality is a technique to deduce the meaning of longer phrases, as shown in Fig. 3. It works as follows:

• Each sentence is given to the compositional model.

• The sentence is represented as a binary tree.

• Using a compositionality function, the Recursive Neural Network computes the parent vectors bottom-up.

• These parent vectors are given as features to the classifier for sentiment classification.

• A softmax classifier performs the sentiment classification, i.e. it assigns a sentiment tag to each vector.

• Tanh and sigmoid functions are used as activation functions to compute the parent vectors.
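The steps above can be sketched as a recursive function over a binary parse tree. Leaves look up word vectors, internal nodes apply the composition function, and every node receives a softmax distribution over the five classes of Section 3. Everything here is an assumption for illustration. The toy 2-dimensional word vectors, the weights `W` and `Ws` and the nested-tuple tree format are hypothetical, and tanh is used as the activation. A trained model would learn these parameters from labelled data.

```python
import math

LABELS = ["Very Negative or Negative", "Somewhat Negative", "Neutral",
          "Somewhat Positive", "Positive"]  # classes 0-4 from Section 3

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(tree, emb, W, Ws):
    """Return (vector, class distribution) for a node of a binary tree.

    A tree is either a word (str) or a pair (left, right) of subtrees.
    Children are computed first, so parents are built bottom-up.
    """
    if isinstance(tree, str):
        vec = emb[tree]
    else:
        left, _ = forward(tree[0], emb, W, Ws)
        right, _ = forward(tree[1], emb, W, Ws)
        vec = [math.tanh(x) for x in matvec(W, left + right)]  # parent vector
    return vec, softmax(matvec(Ws, vec))  # sentiment distribution at this node

# Hypothetical toy parameters: n = 2, five classes.
emb = {"not": [-1.0, 0.5], "very": [0.2, 0.9], "good": [0.8, 0.1]}
W = [[0.6, 0.1, 0.4, -0.2],
     [-0.3, 0.7, 0.2, 0.5]]
Ws = [[-1.0, 0.0], [-0.5, 0.2], [0.0, 0.1], [0.5, 0.2], [1.0, 0.0]]

# Trigram "a (b c)" as in Fig. 3: pa1 = func(b, c), then pa2 = func(a, pa1).
vec, probs = forward(("not", ("very", "good")), emb, W, Ws)
print(LABELS[probs.index(max(probs))], [round(p, 3) for p in probs])
```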

Figure 3. Computing the parent vectors and sentiment of a trigram. func(b, c) and func(a, pa1) are compositionality functions, and pa1 and pa2 are the parent vectors. A sentiment score is obtained at each step using the softmax classifier.

Figure 4. Comprehensive diagram showing the process of computing parent vectors using the Recursive Neural Network

The process of computing parent vectors using the Recursive Neural Network is shown in Fig. 4. The recursive formula for computing the parent vector can be written as

$$h_t = \tanh\left(W_h h_{t-1} + W_x x + b\right) \qquad (3)$$

or, equivalently,

$$h_t = \tanh\left(\begin{bmatrix} W_h & W_x \end{bmatrix} \begin{bmatrix} h_{t-1} \\ x \end{bmatrix} + b\right) \qquad (4)$$

In these equations:

• $W_h$ is the weight matrix associated with the phrase, and $W_x$ is the weight matrix associated with a word.

• $h_{t-1}$ is the vector of the previously computed phrase, and $h_t$ is the vector of the currently computed phrase.

• $x$ is the vector of the current word, and $b$ is the bias.


The Recursive Neural Network repeats this process until it obtains the sentiment of the entire sentence.
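Equation (3) can be applied repeatedly from left to right, folding each new word vector into the running phrase vector. Equation (4) computes the same value from the block matrix $[W_h\ W_x]$ and the stacked vector $[h_{t-1}; x]$. The sketch below performs that fold and checks the equivalence numerically. The dimensions, the weights and the zero initial vector $h_0$ are hypothetical choices made for illustration only.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def step_eq3(Wh, Wx, b, h_prev, x):
    """Eq. (3): h_t = tanh(W_h h_{t-1} + W_x x + b)."""
    return [math.tanh(a + c + d)
            for a, c, d in zip(matvec(Wh, h_prev), matvec(Wx, x), b)]

def step_eq4(Wh, Wx, b, h_prev, x):
    """Eq. (4): h_t = tanh([W_h W_x] [h_{t-1}; x] + b)."""
    W_block = [rh + rx for rh, rx in zip(Wh, Wx)]  # horizontal concatenation
    return [math.tanh(z + d) for z, d in zip(matvec(W_block, h_prev + x), b)]

def encode(words, Wh, Wx, b):
    """Fold Eq. (3) over a sequence of word vectors, starting from h_0 = 0."""
    h = [0.0] * len(b)
    for x in words:
        h = step_eq3(Wh, Wx, b, h, x)
    return h

# Hypothetical sizes: phrase vectors of size 2, word vectors of size 3.
Wh = [[0.5, -0.1], [0.2, 0.3]]
Wx = [[0.4, 0.0, -0.2], [0.1, 0.6, 0.3]]
b = [0.05, -0.05]
sentence = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.3, 0.3, 0.3]]

h = encode(sentence, Wh, Wx, b)
h_prev = [0.1, -0.2]
same = all(abs(p - q) < 1e-12 for p, q in
           zip(step_eq3(Wh, Wx, b, h_prev, sentence[0]),
               step_eq4(Wh, Wx, b, h_prev, sentence[0])))
print(h, same)
```

Both forms compute the same thing. Eq. (4) is simply convenient when the two weight matrices are stored as a single parameter.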

4. Data Analysis System Design

In this section we present the flow diagram of the system used for data analytics in Fig. 5, and the sequence of execution of our proposed model in Fig. 6.

A. Flow Diagram of the system

Figure 5. Twitter Data Analysis Project Block Diagram

B. Steps of execution of the system

Fig. 5 shows the block diagram of the project used for data analysis. It consists of two main modules: the Webapp and the Data fetch API. The Webapp constitutes the front end of the project and the Data fetch API constitutes the back end. The proposed system executes in the following steps:

• The user logs into our webapp using their Twitter credentials. Anyone who wants to use our app for data analytics must have a Twitter account.

• After logging in successfully, the user creates an entity with a particular name. All the analytical results about the entity (for example, a product) are stored here.

• The user then provides one or more keywords about which he/she wants to find the solution.

• All the tweets relevant to these keywords are fetched from the Twitter Streaming API and enqueued in RabbitMQ.

• The tweets are then passed to the analytics module, where the actual analysis takes place.

• The results of the analysis are segregated and stored in the database in their respective tables.

• Finally, a comprehensive analytical solution for the topic of interest is displayed to the user.
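The segregation and storage step can be pictured with an in-memory SQLite database. The schema below is a hypothetical simplification; the paper does not describe the project's actual tables. It uses a `tweets` table holding the author, date and sentiment class of each tweet, and a `hashtags` table linked to it. The rows are made up. The three queries produce the kind of series shown in Figs. 10, 11 and 12: hashtags ranked by frequency, the number of distinct people talking per date, and the mean sentiment score per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (
        id INTEGER PRIMARY KEY,
        author TEXT NOT NULL,
        day TEXT NOT NULL,          -- ISO date, e.g. '2018-01-12'
        sentiment INTEGER NOT NULL  -- class 0-4 assigned by the classifier
    );
    CREATE TABLE hashtags (
        tweet_id INTEGER REFERENCES tweets(id),
        tag TEXT NOT NULL
    );
""")

# Hypothetical analysed tweets: (id, author, day, sentiment class, hashtags).
results = [
    (1, "alice", "2018-01-12", 4, ["elections"]),
    (2, "bob", "2018-01-12", 1, ["elections", "modi"]),
    (3, "alice", "2018-01-12", 3, []),
    (4, "carol", "2018-01-13", 2, ["modi"]),
]
for tid, author, day, sentiment, tags in results:
    conn.execute("INSERT INTO tweets VALUES (?, ?, ?, ?)", (tid, author, day, sentiment))
    conn.executemany("INSERT INTO hashtags VALUES (?, ?)", [(tid, t) for t in tags])

# Fig. 11-style series: number of distinct people talking per date.
people_per_day = conn.execute(
    "SELECT day, COUNT(DISTINCT author) FROM tweets GROUP BY day ORDER BY day").fetchall()
# Fig. 12-style series: mean sentiment score per date.
sentiment_per_day = conn.execute(
    "SELECT day, AVG(sentiment) FROM tweets GROUP BY day ORDER BY day").fetchall()
# Fig. 10-style list: hashtags ranked by frequency (ties broken alphabetically).
top_tags = conn.execute(
    "SELECT tag, COUNT(*) AS n FROM hashtags GROUP BY tag ORDER BY n DESC, tag").fetchall()
print(people_per_day, sentiment_per_day, top_tags)
```

Counting distinct authors rather than tweets keeps one prolific user from inflating the "number of people talking" series.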

5. Experiments and Result Discussion

A. Dataset

Large datasets of tweets are not freely available. Therefore, we train our neural network on the public dataset available at http://nlp.stanford.edu/sentiment. This dataset is a corpus of movie review excerpts from the rottentomatoes.com website. The original dataset contains 10662 sentences from movie reviews, half labelled positive and half negative. Each label gives the net sentiment of the review. In 2013, R. Socher aimed to achieve 5-class sentiment classification, i.e. negative, somewhat negative, neutral, somewhat positive and positive. He used Amazon Mechanical Turk to relabel the sentiments in the original dataset. The details of the dataset are as follows:

• Number of classes – 5

• Maximum sentence length – 53

• Number of sentences in the dataset used – 11855

• Size of the vocabulary – 17833

• Number of vocabulary words present in the Google News word vectors – 16262

• Number of sentences in the test set – 2210

B. Data Collection and Preprocessing

Tweets are collected using the Twitter Streaming API. We have used Twitter4J, an unofficial Java library for the Twitter API, to integrate our web application with the Twitter service. We added it to our project using the Maven build tool.

After collection, the Twitter data is preprocessed using the Stanford CoreNLP toolkit [7]. The toolkit contains various annotators that perform the preprocessing tasks. The most important annotators used in our project are:

• tokenize: tokenizes the text into a sequence of smaller units called tokens. For English, it uses a PTB-style tokenizer.

• ssplit: splits the sequence of tokens into sentences.

• pos: assigns a part-of-speech tag to each token.

• ner: recognizes various types of named entities, such as person, location, organization, money and date.
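CoreNLP's tokenize and ssplit annotators are Java components. The following is only a rough, stdlib-only stand-in that mimics their roles on tweet text; it is not CoreNLP's PTB-style tokenizer. Two simplifying assumptions are made. The token pattern keeps @handles, #hashtags, URLs and a few emoticons intact. A sentence is taken to end at a run of ".", "!" or "?". In the project itself these steps are performed by CoreNLP.

```python
import re

# Alternatives are tried in order, so more specific token types come first.
TOKEN_RE = re.compile(r"""
    https?://\S+        # URLs
  | [@#]\w+             # handles and hashtags
  | [:;=]-?[)(DPp]      # a few ASCII emoticons
  | \w+(?:'\w+)?        # words, keeping simple contractions such as can't
  | [.!?]+              # sentence-final punctuation, including runs like '!!'
  | [^\w\s]             # any other single punctuation mark
""", re.VERBOSE)

def tokenize(text):
    """Split tweet text into tokens (rough stand-in for the tokenize annotator)."""
    return TOKEN_RE.findall(text)

def ssplit(tokens):
    """Group tokens into sentences, closing one after each . ! or ? token."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if re.fullmatch(r"[.!?]+", tok):
            sentences.append(current)
            current = []
    if current:  # trailing tokens without final punctuation form a sentence
        sentences.append(current)
    return sentences

tokens = tokenize("Great launch by @isro!! Can't wait for the next one :) #ISRO")
for sentence in ssplit(tokens):
    print(sentence)
```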


C. Word2vec

The Word2vec model was introduced by Mikolov et al., a group of researchers at Google. It is an open-source tool for learning vector representations of words, released under the Apache License 2.0. This representation is also called "word embeddings". In terms of distributional representation it is stated as: "Any word $w_i$ in the corpus is given a distributional representation by an embedding $w_i \in \mathbb{R}^d$, i.e. a $d$-dimensional vector which is usually learnt". Word2vec uses a shallow, two-layer neural network that is trained to reconstruct the lexical contexts of words. A large corpus of text is given as input, and a vector space is produced as output. This vector space typically has several hundred dimensions, and each unique word in the corpus is assigned a vector in it. When the word vectors are positioned in this space, words that share similar contexts in the corpus lie close to each other. Internally, Word2vec mainly uses two models to produce word embeddings: the Continuous Bag of Words (CBOW) model and the Skip-Gram model. An important advantage of Word2vec is that it is fast to train, even on large datasets.
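To make the two models concrete, the sketch below builds the training examples each one learns from, using a symmetric context window. Skip-Gram predicts each context word from the centre word, so it yields (centre, context) pairs. CBOW predicts the centre word from its whole context. The window size is an arbitrary choice here, and the neural network trained on these examples is omitted. The `cosine` function shows how closeness between word vectors is commonly measured; the vectors passed to it are made up.

```python
import math

def skipgram_pairs(tokens, window=2):
    """(centre, context) pairs: Skip-Gram predicts each context word from the centre."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((centre, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

def cbow_examples(tokens, window=2):
    """(context words, centre) examples: CBOW predicts the centre from its context."""
    examples = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        examples.append(([tokens[j] for j in range(lo, hi) if j != i], centre))
    return examples

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

tokens = "the country of my birth".split()
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1)[2])         # (['country', 'my'], 'of')
print(round(cosine([1.0, 2.0], [2.0, 4.0]), 6))   # 1.0
```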

In this paper, we use pre-trained vectors learnt from the Google News dataset, which consists of about 100 billion words. The vectors are freely available and can be downloaded from https://code.google.com/p/word2vec/. They are contained in the file GoogleNews-vectors-negative300.bin, which holds 300-dimensional vectors for 3 million words and phrases. Such high-dimensional vectors are very difficult to work with directly. We therefore apply a well-known dimensionality reduction technique widely used in data science, t-Distributed Stochastic Neighbour Embedding (t-SNE).

D. t-Distributed Stochastic Neighbour Embedding

t-SNE is a prize-winning machine learning technique for dimensionality reduction. The main problem with Word2vec is that it produces a very high-dimensional vector for each unique word. Vectors in such a space cannot be visualized or worked with directly, so we use t-SNE to visualize them in a two- or three-dimensional space. It was developed by Geoffrey Hinton and Laurens van der Maaten [8].

It consists of two main steps:

1) It constructs a probability distribution over pairs of high-dimensional objects, here the high-dimensional vectors that represent the words. Words with similar vectors have a high probability of being selected, while words with dissimilar vectors have an extremely small probability of being selected. Euclidean distance is used as the similarity metric.

2) It defines a similar probability distribution over the points in the low-dimensional map. It then minimizes the Kullback-Leibler divergence between the two distributions with respect to the locations of the points in the map.
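The two steps can be sketched for a handful of points. This is a simplification, not a full t-SNE implementation. It uses a single fixed Gaussian bandwidth `sigma` instead of the per-point perplexity calibration described in [8]. It also only evaluates the Kullback-Leibler cost for a given map rather than minimizing it by gradient descent. The low-dimensional affinities use the heavy-tailed Student-t kernel that gives t-SNE its name. The input points and both maps are made up.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def high_dim_affinities(X, sigma=1.0):
    """Step 1: symmetric P from Gaussian conditionals p_{j|i} on Euclidean distance."""
    n = len(X)
    cond = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2))
             for j in range(n)]
        s = sum(w)
        cond.append([v / s for v in w])
    # p_ij = (p_{j|i} + p_{i|j}) / 2n, so all entries sum to 1
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

def low_dim_affinities(Y):
    """Step 2: Q from the Student-t kernel (1 + ||y_i - y_j||^2)^-1, normalized."""
    n = len(Y)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
         for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def kl_divergence(P, Q):
    """KL(P || Q) = sum_ij p_ij log(p_ij / q_ij), skipping the zero diagonal."""
    return sum(p * math.log(p / q)
               for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow) if p > 0)

X = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [3.0, 3.0, 3.0], [3.1, 2.9, 3.0]]  # two clusters
good_map = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]]  # keeps the clusters
bad_map = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.1], [5.1, 5.0]]   # mixes them up
P = high_dim_affinities(X)
kl_good = kl_divergence(P, low_dim_affinities(good_map))
kl_bad = kl_divergence(P, low_dim_affinities(bad_map))
print(kl_good, kl_bad)
```

The map that keeps neighbouring words together has a much lower cost. Gradient descent on this cost is what moves the points towards such a layout.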

E. RabbitMQ

RabbitMQ is an open-source message broker: it accepts and forwards messages. The main data structure used inside RabbitMQ is the queue, and RabbitMQ provides the functionality required to access queues. A program that sends messages to a queue is called a producer, and a program that receives messages from a queue is called a consumer. Messages are added to and removed from a queue in FIFO (first in, first out) order. The features of RabbitMQ are:

• It is highly reliable.

• It provides highly flexible routing.

• It supports clustering.

• It provides highly available queues.

In the proposed model, we use RabbitMQ to store all the tweets relevant to our entity of interest. The analysis module later consumes these tweets from RabbitMQ to provide an analytical solution about the entity.
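The producer/consumer pattern described above can be illustrated without a broker using Python's thread-safe `queue.Queue`, which is also FIFO. This is only an in-process analogy. It provides none of RabbitMQ's persistence, routing or clustering. A real deployment would talk to a RabbitMQ server through a client library, for example pika for Python or the RabbitMQ Java client. The `None` sentinel used to stop the consumer is a convention of this sketch, and lower-casing stands in for the real analysis.

```python
import queue
import threading

def producer(q, tweets):
    """Enqueue incoming tweets (stands in for the publisher fed by the Twitter stream)."""
    for tweet in tweets:
        q.put(tweet)
    q.put(None)  # sentinel: no more messages

def consumer(q, analysed):
    """Dequeue tweets in FIFO order and hand them to the analysis step."""
    while True:
        tweet = q.get()
        if tweet is None:
            break
        analysed.append(tweet.lower())  # placeholder for the real analysis

q = queue.Queue()
analysed = []
tweets = ["Tweet one", "Tweet two", "Tweet three"]
threads = [threading.Thread(target=producer, args=(q, tweets)),
           threading.Thread(target=consumer, args=(q, analysed))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(analysed)  # ['tweet one', 'tweet two', 'tweet three']
```

Decoupling collection from analysis this way lets tweets keep arriving while a slower analysis step works through the backlog.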

F. Screenshots of the system

Figure 6. Entity Creation


Figure 7. Tweet Collection through RabbitMQ

Figure 8. Most Retweeted Tweets

Figure 9. Relevant Topics


Figure 10. Relevant Hashtags

Figure 11. X-axis: Date, Y-axis: Number of people talking

Figure 12. X-axis: Date, Y-axis: Sentiment score

Figure 13. Green spots represent hotspots, which indicate the locations where people have talked about the keyword.

In this section we show screenshots of our project. In Fig. 6 we create an entity named BJP. The entity is the place where the statistics produced by the data analysis are stored. Here we also specify the keywords for which we want a comprehensive data analysis; the keywords used here are "elections" and "modi". Optionally, we can also specify handles, which increases the chances of receiving related tweets faster.

Fig. 7 shows the tweets being collected through RabbitMQ. In the first graph, the X-axis represents time and the Y-axis represents the number of tweets collected. The second graph shows the rate at which tweets are collected. Fig. 8 shows the most retweeted tweets along with the number of times each has been retweeted, which indicates the popularity of the topic. Fig. 9 shows the topics relevant or related to the keywords specified while creating the entity.

Fig. 10 shows the relevant hashtags obtained from the tweets. Fig. 11 shows the number of people who have talked about the topic on each date. The X-axis represents the date and the Y-axis represents the number of people, which also indicates the popularity of the topic. Fig. 12 shows the sentiment score of the tweets on each date, with the date on the X-axis and the sentiment score on the Y-axis. Fig. 13 shows the geographical locations where people have talked about the keywords. The green dots are hotspots marking the locations in the world where people have tweeted about them.

6. Conclusion

Big data and data analytics have become important subjects in computer science and key fields in the corporate world. Among the most challenging subfields of data analysis are trend prediction and sentiment analysis. One of the greatest challenges in sentiment analysis is the detection of sarcasm: it is very hard to determine the sentiment of a sarcastic sentence. In future work, we will focus on two aspects. The first is increasing the accuracy of the sentiment analysis task. The second is detecting sarcastic sentences, which will in turn enable our project to provide a better analytical solution.

References

[1] Ana C. E. S. Lima and Leandro N. de Castro, "Automatic Sentiment Analysis of Twitter Messages", Fourth IEEE Conference on Computational Aspects of Social Networks (CASoN), 21-23 Nov 2012, Sao Carlos.

[2] Kiruthika M, Sanjana Woona and Priyanka Giri, "Sentiment Analysis of Twitter Data", International Journal of Innovations in Engineering and Technology, Volume 6, Issue 4, April 2016, ISSN 2319-1058.

[3] Efthymios Kouloumpis, Theresa Wilson and Johanna Moore, "Twitter Sentiment Analysis: The Good the Bad and the OMG!", Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, 2011.

[4] Alec Go, Richa Bhayani and Lei Huang, "Twitter Sentiment Classification using Distant Supervision", CS224N Project Report, pp. 1-12, 2009.

[5] Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow and Rebecca Passonneau, "Sentiment Analysis of Twitter Data", Proceedings of the Workshop on Language in Social Media (LSM 2011), pp. 30-38, Portland, Oregon, 23 June 2011, Association for Computational Linguistics.

[6] David M. Blei, Andrew Y. Ng and Michael I. Jordan, "Latent Dirichlet Allocation", Journal of Machine Learning Research, Volume 3, 2003.

[7] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard and David McClosky, "The Stanford CoreNLP Natural Language Processing Toolkit", Association for Computational Linguistics (ACL) System Demonstrations, 2014.

[8] L. J. P. van der Maaten and G. E. Hinton, "Visualizing High-Dimensional Data Using t-SNE", Journal of Machine Learning Research, Volume 9, Nov 2008, pp. 2579-2605.

[9] Yue Lu, ChengXiang Zhai and Neel Sundaresan, "Rated Aspect Summarization of Short Comments", In WWW 2009, pp. 131-140.

[10] Shital S. Dabhade and Sonal S. Honale, "An Application for Sentiment Analysis Based on Expressive Feature in the Sentence", International Journal of Advance Research in Computer Science and Management Studies, Volume 3, Issue 5, May 2015, ISSN 2321-7782.

[11] Aditya Joshi, Pushpak Bhattacharyya and Mark J. Carman, "Automatic Sarcasm Detection: A Survey", ACM Computing Surveys, 2017, 22 pages.

[12] Dimitrios Kotsakos, Panos Sakkos, Ioannis Katakis and Dimitrios Gunopulos, "Hashtag: Meme or Event?", IEEE/ACM Conference on ASONAM, 17-20 August 2014, Beijing, China.

[13] John P. Dickerson, Vadim Kagan and V. S. Subrahmanian, "Using Sentiment to Detect Bots on Twitter: Are Humans More Opinionated than Bots?", IEEE/ACM Conference on ASONAM, 17-20 August 2014, Beijing, China.

[14] Luciano Barbosa and Junlan Feng, "Robust Sentiment Detection on Twitter from Biased and Noisy Data", Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 36-44, 2010.

[15] Xi Ouyang, Pan Zhou, Cheng Hua Li and Lijun Liu, "Sentiment Analysis Using Convolutional Neural Networks", 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing.

[16] Aurangzeb Khan, Baharum Baharudin and Khairullah Khan, "Sentiment Classification from Online Customer Reviews Using Lexical Contextual Sentence Structure", In Software Engineering and Computer Systems (ICSECS International Conference on Software Engineering and Computer Systems), pp. 317-331, Springer, 2011.

[17] Samaneh Moghaddam and Martin Ester, "Opinion Digger: An Unsupervised Opinion Miner from Unstructured Product Reviews", In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1825-1828, ACM, 2010.

[18] Jorge Carrillo de Albornoz, Laura Plaza, Pablo Gervas and Alberto Diaz, "A Joint Model of Feature Mining and Sentiment Analysis for Product Review Rating", In Advances in Information Retrieval, pp. 55-66, Springer, 2011.

[19] Samaneh Moghaddam and Martin Ester, "ILDA: Interdependent LDA Model for Learning Latent Aspects and Their Ratings from Online Product Reviews", In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 665-674, ACM, 2011.


[20] Simon O. Haykin, "Neural Networks and Learning Machines", Third Edition, Pearson Education India, 1 April 2016, 944 pages.

[21] Daniel Jurafsky and James H. Martin, "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Second Edition, 2013, 940 pages.

[22] Kai Sheng Tai, Richard Socher and Christopher D. Manning, "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks", In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pp. 1556-1566, Beijing, China, July 26-31, 2015.

[23] S. V. Manikanthan and T. Padmapriya, "Recent Trends in M2M Communications in 4G Networks and Evolution Towards 5G", International Journal of Pure and Applied Mathematics, ISSN 1314-3395, Vol. 115, Issue 8, Sep 2017.

[24] T. Padmapriya and V. Saminadan, "Utility Based Vertical Handoff Decision Model for LTE-A Networks", International Journal of Computer Science and Information Security, ISSN 1947-5500, Vol. 14, No. 11, November 2016.
