
Naïve Multi-label Classification of YouTube Comments Using Comparative Opinion Mining

By Nidhi Baranwal, MCA 5th semester

Introduction

• People connect with each other in cyberspace and express their sentiments in the form of comments. YouTube is regarded as the leader in video sharing.

• Some opinions shared by users have comparative content: a user watches a video comparing two options and states a preference, backed by some reasoning.

• In this paper, the Naïve Bayes machine learning algorithm is used to perform multi-label classification and determine the sentiments of the commenters.

• To reduce computational requirements, the approach makes a naïve assumption: the words surrounding the keywords related to a particular option are enough to understand the user's sentiment.
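The neighborhood assumption can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the window size of 2, the regex tokenizer, and the keyword set are all assumptions made for the example.

```python
import re

def keyword_window(comment, keywords, size=2):
    """Return the words within `size` positions of each keyword occurrence."""
    tokens = re.findall(r"[a-z0-9']+", comment.lower())
    context = []
    for i, tok in enumerate(tokens):
        if tok in keywords:
            lo = max(0, i - size)
            hi = i + size + 1
            # Keep the neighbours, skip the keyword itself.
            context.extend(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return context

example = "I think the iPhone camera is great but battery is bad"
window = keyword_window(example, {"iphone"})
print(window)
```

Only the words returned by `keyword_window` would be passed to the classifier for the option in question, which keeps the feature set small.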

Classification?

• Classification is the task of predicting the class (label) of an instance based on data.

• Supervised learning (example: Naïve Bayes):
  - We give the system a set of instances to learn from.
  - The system builds knowledge of some structure.
  - The system can then classify new instances.

Types of Classification

• Binary classification: each instance belongs to exactly one of two classes.

• Multiclass classification: each instance belongs to exactly one of more than two classes.

• Multi-label classification: each instance can belong to multiple classes at the same time.

• Hierarchical multi-label classification: the classes are organized in a hierarchy.
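The difference between these types shows up in the shape of the target attached to each instance. The sketch below uses hypothetical label names and values chosen only for illustration.

```python
# Hypothetical targets, for illustration only.
binary_target = "spam"          # exactly one of {"spam", "not_spam"}
multiclass_target = "neutral"   # exactly one of {"positive", "negative", "neutral"}
multilabel_target = {           # several labels at once, each with its own value
    "iphone": "positive",
    "android": "negative",
}

def is_valid_multilabel(target, labels, values):
    """Check that every label is present and carries an allowed value."""
    return set(target) == set(labels) and all(v in values for v in target.values())

ok = is_valid_multilabel(
    multilabel_target,
    labels=["iphone", "android"],
    values={"positive", "negative", "neutral"},
)
print(ok)
```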

Opinion Mining?

• Opinion mining, or sentiment analysis, is concerned with how people think about a particular thing, person, or idea.

• It is the process of determining whether a piece of writing is positive, negative, or neutral.

• In comparative sentiment analysis we have to deal with multi-aspect comments: the commenter compares more than one thing, person, or idea on the basis of some aspects.

Tasks Involved

• Finding relevant comments involves the following tasks:

1. Gathering data (collecting comments).
2. Removing noisy and irrelevant data.
3. Manually assigning sentiments to the comments to build the training corpus.
4. Developing and evaluating the classification model.

Naïve Bayes Classifier

• Simple classification of words based on Bayes' theorem.

• It is a ‘bag of words’ approach to content analysis: text is represented as the collection of its words, discarding grammar and word order but keeping multiplicity.

• Applications: sentiment detection, email spam detection, document categorization, etc.

• Probabilistic analysis of Naïve Bayes: for a document d and class c, by Bayes' theorem

$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$$
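A minimal bag-of-words Naïve Bayes classifier can be sketched as follows. It is a toy under stated assumptions, not the WEKA or MEKA implementation: it uses add-one (Laplace) smoothing, ignores words never seen in training, and works in log space. Because P(d) is the same for every class, it is dropped when comparing classes. The training sentences are made up.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, class). Returns priors, per-class word counts, vocabulary."""
    class_counts = Counter(c for _, c in docs)
    word_counts = defaultdict(Counter)
    for tokens, c in docs:
        word_counts[c].update(tokens)  # bag of words: order dropped, counts kept
    vocab = {w for tokens, _ in docs for w in tokens}
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def predict_nb(tokens, priors, word_counts, vocab):
    """Return the class maximising log P(c) + sum of log P(w | c)."""
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in tokens:
            if w in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

toy = [
    ("great camera love it".split(), "positive"),
    ("battery is bad".split(), "negative"),
    ("love the screen".split(), "positive"),
    ("bad slow laggy".split(), "negative"),
]
model = train_nb(toy)
print(predict_nb("love the camera".split(), *model))
```

The word-by-word sum in `predict_nb` is where the "naïve" independence assumption enters: each word contributes to the score independently of the others, given the class.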

Data Analysis

• The study used an iPhone vs Android video, which had over 8000 comments.

• The comments were then filtered, and only comparative comments were used in the research.

• The resulting dataset is about 400 comments, roughly 5% of the original data.
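The slides do not say how the comparative comments were selected. One simple heuristic, shown here purely as an assumption, is to keep a comment only if it mentions both options. The keyword lists are hypothetical.

```python
import re

# Hypothetical keyword lists for the two options being compared.
OPTION_KEYWORDS = {
    "iphone": {"iphone", "ios", "apple"},
    "android": {"android", "samsung", "google"},
}

def mentioned_options(comment):
    """Return the set of options whose keywords appear in the comment."""
    tokens = set(re.findall(r"[a-z0-9]+", comment.lower()))
    return {opt for opt, kws in OPTION_KEYWORDS.items() if tokens & kws}

def is_comparative(comment):
    """Treat a comment as comparative if it mentions both options."""
    return len(mentioned_options(comment)) == 2

comments = ["iPhone is smoother than Android", "first!", "Android all the way"]
comparative = [c for c in comments if is_comparative(c)]
print(comparative)
```

A rule like this would miss implicit comparisons such as "Android all the way", so a manual pass over the filtered comments would still be needed.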

Methodology followed

• Data collection

• Class assignment (2 labels and 9 classes)

• Difficulties in assigning annotations:
  - handling symbols and short forms
  - ambiguity in comments, of various types

• Finding parts of speech and neighboring words of keywords in the comments

• Using tools and steps for classification

• Finding better results
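One reading of "2 labels and 9 classes" is that each of the two labels (one per option) takes one of three sentiment values, so joining them gives 3 × 3 = 9 classes. The slides do not spell this out, so the label names and sentiment values below are assumptions.

```python
from itertools import product

LABELS = ("iphone", "android")                  # assumed label names
VALUES = ("positive", "negative", "neutral")    # assumed sentiment values

# Every combination of per-label sentiment becomes one joined class.
JOINED_CLASSES = ["_".join(combo) for combo in product(VALUES, repeat=len(LABELS))]

def join_labels(annotation):
    """Map a multi-label annotation to its single joined class name."""
    return "_".join(annotation[label] for label in LABELS)

def split_labels(joined):
    """Recover the per-label annotation from a joined class name."""
    return dict(zip(LABELS, joined.split("_")))

annotation = {"iphone": "positive", "android": "negative"}
print(len(JOINED_CLASSES), join_labels(annotation))
```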

Tools and Steps used

• We used the specialized software WEKA (single-label and joined-label classification) and MEKA (multi-label classification) to perform the machine learning tasks.

• The following steps were taken to develop the classification model:
  - Data processing and class balancing
  - Classification
  - Naïve Bayes probabilistic classifier
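The slides do not name the class-balancing method. As one possible approach, assumed here for illustration, random oversampling duplicates examples of the smaller classes until every class is as large as the biggest one.

```python
import random
from collections import Counter

def oversample(rows, label_of, seed=0):
    """Duplicate random rows of minority classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

rows = [("good", "positive")] * 5 + [("bad", "negative")] * 2
balanced = oversample(rows, label_of=lambda r: r[1])
counts = Counter(label for _, label in balanced)
print(counts)
```

Balancing should be applied to the training split only, so that duplicated comments do not leak into the evaluation data.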

Results obtained

• The results are not satisfactory on the performance measures used, but the naïve assumption about the neighboring words of keywords performed well compared to the others.

• Single-label and joined-label comments give poorer results than multi-label.
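The slides do not list the performance measures. Two measures commonly reported for multi-label problems are sketched below as an assumption about what such an evaluation might include: a Hamming-style loss (the share of individual label values predicted wrongly) and exact-match accuracy (the share of comments with every label correct). The data is made up.

```python
def hamming_loss(true, pred):
    """Fraction of individual label values that were predicted wrongly."""
    wrong = sum(t[k] != p[k] for t, p in zip(true, pred) for k in t)
    total = sum(len(t) for t in true)
    return wrong / total

def exact_match(true, pred):
    """Fraction of instances whose labels were all predicted correctly."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

true = [
    {"iphone": "positive", "android": "negative"},
    {"iphone": "neutral", "android": "positive"},
]
pred = [
    {"iphone": "positive", "android": "negative"},
    {"iphone": "neutral", "android": "negative"},
]
print(hamming_loss(true, pred), exact_match(true, pred))
```

Exact match is the stricter of the two, since a comment counts as correct only when every label is right. This is also what a joined-label classifier is effectively scored on.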

Contd…

THANKS
