Naïve Multi-label classification of YouTube comments
using comparative opinion mining
By Nidhi Baranwal, MCA 5th semester
Introduction
• People connect with each other in cyberspace and express their sentiments in the form of comments. YouTube is widely considered the leading platform for video sharing.
• Some of the opinions users share are comparative: a user watches a video comparing two options and states a preference, backed by some reasoning.
• In this paper, the Naïve Bayes machine learning algorithm is used to perform multi-label classification to determine the commentators' sentiments.
• To reduce computational requirements, the approach makes a naïve assumption: the words around keywords related to a particular option are enough to understand the user's sentiment toward that option.
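The naïve neighbourhood assumption can be sketched as follows. This is a minimal illustration, not the paper's implementation: the keyword sets in `OPTION_KEYWORDS`, the regex tokenizer, and the default window of 3 tokens on each side are all hypothetical choices.

```python
import re

# Hypothetical keyword sets for the two options being compared.
OPTION_KEYWORDS = {
    "iphone": {"iphone", "ios", "apple"},
    "android": {"android", "samsung", "pixel"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_windows(text: str, window: int = 3) -> dict[str, list[str]]:
    """For each option, collect the words within `window` tokens of its keywords.

    Under the naive assumption, only these neighbouring words (not the whole
    comment) are used as evidence for the sentiment toward that option.
    """
    tokens = tokenize(text)
    contexts: dict[str, list[str]] = {option: [] for option in OPTION_KEYWORDS}
    for i, token in enumerate(tokens):
        for option, keywords in OPTION_KEYWORDS.items():
            if token in keywords:
                start, end = max(0, i - window), min(len(tokens), i + window + 1)
                # Keep the neighbours on both sides, but not the keyword itself.
                contexts[option].extend(tokens[start:i] + tokens[i + 1:end])
    return contexts
```

For example, `keyword_windows("The iPhone camera is great but Android battery lasts longer", window=2)` gives `{'iphone': ['the', 'camera', 'is'], 'android': ['great', 'but', 'battery', 'lasts']}`. Note that "great" lands in the Android window even though it describes the iPhone camera: a small window keeps computation cheap but can attach a word to the wrong option.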
Classification?
• Classification is the task of predicting the class (label) of an instance from its data.
• Supervised learning, for example Naïve Bayes:
  - We give the system a set of labeled instances to learn from
  - The system builds a model of the underlying structure
  - The system can then classify new instances
Types of Classification
• Binary classification: each instance belongs to exactly one of two classes
• Multiclass classification: each instance belongs to exactly one of more than two classes
• Multi-label classification: each instance can belong to several classes at the same time
• Hierarchical multi-label classification: classes are organized in a hierarchy
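To make the multi-label case concrete, the sketch below uses four hypothetical labels for a comparative comment (a sentiment toward each of two options). It shows a multi-label target as a 0/1 indicator vector, plus two standard ways of reducing it to simpler problems: binary relevance (one binary problem per label) and label powerset (each label set joined into a single multiclass label). Treating the "joined label" classification mentioned under Tools as label powerset is our assumption.

```python
# Hypothetical label set: sentiment toward each of the two compared options.
LABELS = ["iphone_positive", "iphone_negative", "android_positive", "android_negative"]


def to_indicator_vector(labels: set[str]) -> list[int]:
    """Multi-label target: one 0/1 entry per possible label."""
    return [1 if label in labels else 0 for label in LABELS]


def binary_relevance_targets(samples: list[set[str]]) -> dict[str, list[int]]:
    """Split one multi-label problem into one binary problem per label."""
    return {label: [1 if label in s else 0 for s in samples] for label in LABELS}


def label_powerset_targets(samples: list[set[str]]) -> list[str]:
    """Join each label set into a single multiclass label (sorted for stability)."""
    return ["+".join(sorted(s)) if s else "none" for s in samples]
```

For `samples = [{"iphone_positive", "android_negative"}, {"android_positive"}]`, the first comment's indicator vector is `[1, 0, 0, 1]`, and the label powerset targets are `["android_negative+iphone_positive", "android_positive"]`.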
Opinion Mining?
• Opinion mining, or sentiment analysis, is concerned with "how people think about a particular thing, person, or idea".
• It is the process of determining whether a piece of writing is positive, negative, or neutral.
• In comparative sentiment analysis, we have to deal with multi-aspect comments: the commentator compares more than one thing, person, or idea on the basis of some aspects.
Tasks Involved
• To find and classify relevant comments, the following tasks are involved:
1. Gathering data (collecting comments)
2. Removing noisy and irrelevant data
3. Manually assigning sentiments to the comments to build a training corpus
4. Developing and evaluating the classification model
Naïve Bayes Classifier
• A simple text classifier based on Bayes' theorem.
• It takes a ‘bag of words’ approach to analyzing content: text is represented as the collection of its words, discarding grammar and word order but keeping multiplicity.
• Applications: sentiment detection, email spam detection, document categorization, etc.
• Probabilistic analysis of Naïve Bayes: for a document $d$ and a class $c$, by Bayes' theorem,
$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$$
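Because $P(d)$ is the same for every class, a classifier only needs to compare $P(d \mid c)\,P(c)$, and the naïve assumption that words are conditionally independent given the class turns this into $\hat{c} = \arg\max_c P(c) \prod_i P(w_i \mid c)$. Below is a minimal multinomial Naïve Bayes sketch in plain Python, assuming add-one (Laplace) smoothing and log probabilities to avoid underflow. The paper used the WEKA/MEKA implementations, so this is illustrative only.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words token lists, with add-one smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)      # document counts per class, for P(c)
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for tokens, label in zip(docs, labels):
            # Bag of words: word order is discarded, multiplicity is kept.
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def log_score(self, tokens: list[str], label: str) -> float:
        """log P(c) + sum_i log P(w_i | c); P(d) is omitted since it is shared by all classes."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.n_docs)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((counts[w] + 1) / denom)
        return score

    def predict(self, tokens: list[str]) -> str:
        """Return the class with the highest log posterior score."""
        return max(self.class_counts, key=lambda c: self.log_score(tokens, c))
```

With the toy corpus `[["great", "camera"], ["love", "battery", "great"], ["bad", "lag"], ["hate", "lag", "slow"]]` labelled `["pos", "pos", "neg", "neg"]`, `predict(["great", "battery"])` returns `"pos"` and `predict(["lag", "slow"])` returns `"neg"`.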
Data Analysis
• The study used an iPhone vs. Android comparison video with over 8,000 comments.
• The comments were then filtered, and only comparative comments were used in the research.
• The resulting dataset contains about 400 comments, almost 5% of the original dataset.
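The slides do not state how comparative comments were identified. One simple rule, sketched here as an assumption, keeps a comment only if it mentions both options. The keyword sets `IPHONE` and `ANDROID` are hypothetical.

```python
import re

# Hypothetical keyword sets; the paper's actual filtering criteria are not given.
IPHONE = {"iphone", "ios", "apple"}
ANDROID = {"android", "samsung", "pixel"}


def is_comparative(comment: str) -> bool:
    """Treat a comment as comparative if it mentions both options."""
    tokens = set(re.findall(r"[a-z0-9']+", comment.lower()))
    return bool(tokens & IPHONE) and bool(tokens & ANDROID)


def filter_comparative(comments: list[str]) -> list[str]:
    """Keep only the comparative comments, preserving their order."""
    return [c for c in comments if is_comparative(c)]
```

As a check on the figures above: keeping about 400 of over 8,000 comments retains at most 400 / 8000 = 0.05 of the data. The exact share is therefore slightly under 5%, which matches "almost 5%".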
Methodology followed
• Data collection
• Class assignment (2 labels and 9 classes)
• Difficulties in assigning annotations:
  - handling symbols and short forms
  - ambiguity in comments, of various types
• Finding the parts of speech and neighbouring words of keywords in the comments
• Applying tools and steps for classification
• Finding better results
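A possible way to handle symbols and short forms before annotation is sketched below. It lowercases the text, turns punctuation, emoji, and other symbols into spaces, and expands short forms using a lookup table. The `SHORT_FORMS` dictionary is a hypothetical example, not the list used in the paper.

```python
import re

# Hypothetical short-form dictionary; the paper's actual list is not given.
SHORT_FORMS = {"u": "you", "r": "are", "gr8": "great", "b4": "before", "thx": "thanks"}


def normalize(comment: str) -> str:
    """Lowercase, replace symbols with spaces, and expand known short forms."""
    text = comment.lower()
    # Anything that is not a letter, digit, apostrophe, or whitespace becomes a space.
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    tokens = [SHORT_FORMS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)
```

For example, `normalize("iPhone r gr8!!! :)")` returns `"iphone are great"`.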
Tools and Steps used
• We used the specialized machine learning software WEKA (for single-label and joined-label classification) and MEKA (for multi-label classification).
• The following steps were taken to develop the classification model:
  - Data processing and class balancing
  - Classification with the Naïve Bayes probabilistic classifier
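The slides do not say which balancing method was used. The sketch below uses random oversampling as one common example: it duplicates randomly chosen samples of smaller classes until every class is as large as the biggest one. The function name `oversample` and its fixed `seed` are illustrative assumptions.

```python
import random
from collections import defaultdict


def oversample(samples: list, labels: list, seed: int = 0) -> tuple[list, list]:
    """Randomly duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra copies with replacement from this class's own samples.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

Oversampling only copies existing minority samples and adds no new information. It should be applied to the training split only, so that duplicates do not leak into the evaluation data.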
Results obtained
• The results across the different performance measures are not satisfactory, but the naïve assumption about the words neighbouring the keywords performed well compared to the other approaches.
• Single-label and joined-label classification gave poorer results than multi-label classification.
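The slides do not list which performance measures were used. Two common multi-label measures are sketched below as examples. Hamming loss is the fraction of individual (comment, label) decisions that are wrong. Subset accuracy, or exact match, is the fraction of comments whose entire label set is predicted correctly. The label names in the usage example are hypothetical.

```python
def hamming_loss(y_true: list[set[str]], y_pred: list[set[str]], n_labels: int) -> float:
    """Fraction of (instance, label) pairs predicted wrongly."""
    # The symmetric difference holds labels that are missing from, or wrongly added to, the prediction.
    errors = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return errors / (len(y_true) * n_labels)


def subset_accuracy(y_true: list[set[str]], y_pred: list[set[str]]) -> float:
    """Fraction of instances whose entire label set is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With `y_true = [{"iphone_positive", "android_negative"}, {"android_positive"}]`, `y_pred = [{"iphone_positive"}, {"android_positive"}]` and 4 possible labels, one of 8 label decisions is wrong, so the Hamming loss is 0.125 and the subset accuracy is 0.5. Subset accuracy is much stricter. This helps explain why multi-label results can look poor even when most individual labels are correct.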
THANKS