Scalable Sentiment Classification for Big Data Analysis Using Naive Bayes Classifier
![Page 1: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/1.jpg)
2013 IEEE International Conference on Big Data
Scalable Sentiment Classification for Big Data Analysis Using Naive Bayes Classifier
Bingwei Liu, Erik Blasch, Yu Chen, Dan Shen and Genshe Chen
![Page 2: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/2.jpg)
outline
✤ introduction
✤ Naive Bayes Classification
✤ implementation of Naive Bayes in Hadoop
✤ experimental study
![Page 3: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/3.jpg)
introduction
A typical method to obtain valuable information is to extract the sentiment or opinion from a message
This paper aims to evaluate the scalability of the Naive Bayes classifier (NBC) on large datasets
![Page 4: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/4.jpg)
introduction
NBC is able to scale up to analyze the sentiment of millions of movie reviews with increasing throughput
The accuracy of NBC improves as the dataset size increases and approaches 82%
![Page 5: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/5.jpg)
Naive Bayes Classification
Naive Bayes classifiers are simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features
They are a popular method for text categorization (the problem of judging which category a document belongs to)
![Page 6: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/6.jpg)
Naive Bayes Classification
prior probability: P(A)
posterior probability: P(A|B)
![Page 7: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/7.jpg)
Naive Bayes Classification
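Bayes' theorem

$$P(\mathrm{POS} \mid d_1) = \frac{P(\mathrm{POS}) \times P(d_1 \mid \mathrm{POS})}{P(d_1)}$$

$$P(\mathrm{POS} \mid \text{excellent}, \text{terrible}) = \frac{P(\mathrm{POS}) \times P(\text{excellent}, \text{terrible} \mid \mathrm{POS})}{P(\text{excellent}, \text{terrible})}$$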
![Page 8: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/8.jpg)
Naive Bayes Classification
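$$P(\mathrm{POS} \mid \text{excellent}, \text{terrible}) = \frac{P(\mathrm{POS}) \times P(\text{excellent}, \text{terrible} \mid \mathrm{POS})}{P(\text{excellent}, \text{terrible})}$$

If the words are independent given the class:

$$P(\text{excellent}, \text{terrible} \mid \mathrm{POS}) = P(\text{excellent} \mid \mathrm{POS}) \times P(\text{terrible} \mid \mathrm{POS})$$

$$P(\mathrm{POS} \mid \text{excellent}, \text{terrible}) = \frac{P(\mathrm{POS}) \times P(\text{excellent} \mid \mathrm{POS}) \times P(\text{terrible} \mid \mathrm{POS})}{P(\text{excellent}, \text{terrible})}$$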
![Page 9: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/9.jpg)
Naive Bayes Classification
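| | class | excellent | terrible |
|---|---|---|---|
| d1 | POS | 5 | 1 |
| d2 | NEG | 2 | 6 |

$$P(\mathrm{POS} \mid \text{excellent}, \text{terrible}) = \frac{P(\mathrm{POS}) \times P(\text{excellent} \mid \mathrm{POS}) \times P(\text{terrible} \mid \mathrm{POS})}{P(\text{excellent}, \text{terrible})}$$

From d1: P(excellent|POS) = 5/6, P(terrible|POS) = 1/6

d3 : (excellent,8),(terrible,2)

The denominator P(excellent,terrible) is the same for both classes, so only the numerators are compared:

$$P(\mathrm{POS} \mid \text{excellent}, \text{terrible}) \propto \frac{1}{2} \times \left(\frac{5}{6}\right)^{8} \times \left(\frac{1}{6}\right)^{2}$$

$$P(\mathrm{NEG} \mid \text{excellent}, \text{terrible}) \propto \frac{1}{2} \times \left(\frac{2}{8}\right)^{8} \times \left(\frac{6}{8}\right)^{2}$$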
![Page 10: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/10.jpg)
Naive Bayes Classification
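d3 : (excellent,8),(terrible,2)

$$P(\mathrm{POS} \mid \text{excellent}, \text{terrible}) \propto \frac{1}{2} \times \left(\frac{5}{6}\right)^{8} \times \left(\frac{1}{6}\right)^{2} = 0.00323011165$$

$$P(\mathrm{NEG} \mid \text{excellent}, \text{terrible}) \propto \frac{1}{2} \times \left(\frac{2}{8}\right)^{8} \times \left(\frac{6}{8}\right)^{2} = 0.00000429153$$

d3 is POS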
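
The worked example can be checked with a few lines of Python. This is a minimal sketch that uses only the counts shown on the slides; the names `train_counts`, `priors`, and `class_score` are made up for illustration, and the shared denominator P(excellent,terrible) is left out because it does not affect which class wins.

```python
from fractions import Fraction

# Word counts per class from the training table (d1 is POS, d2 is NEG).
train_counts = {
    "POS": {"excellent": 5, "terrible": 1},
    "NEG": {"excellent": 2, "terrible": 6},
}
# One training document per class, so each prior is 1/2.
priors = {"POS": Fraction(1, 2), "NEG": Fraction(1, 2)}

# Word counts of the document to classify (d3).
d3 = {"excellent": 8, "terrible": 2}


def class_score(label):
    """Unnormalized posterior: prior times the product of P(word|class)^count."""
    total = sum(train_counts[label].values())
    score = priors[label]
    for word, count in d3.items():
        score *= Fraction(train_counts[label][word], total) ** count
    return score


scores = {label: class_score(label) for label in train_counts}
prediction = max(scores, key=scores.get)
print({label: float(s) for label, s in scores.items()}, prediction)
```

The two scores match the values on the slide (about 0.00323 for POS and 0.00000429 for NEG), so d3 is labeled POS.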
![Page 11: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/11.jpg)
Naive Bayes Classification
$$\frac{1}{2} \times \left(\frac{5}{6}\right)^{8} \times \left(\frac{1}{6}\right)^{2}$$
![Page 12: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/12.jpg)
Naive Bayes Classification
N is the total number of documents, N_c is the number of documents in class c
N_wi is the frequency of a word w_i in class c.
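
The formula these symbols belong to did not survive in the transcript. As a hedged sketch, the usual estimates are P(c) = N_c / N for the prior and the relative frequency of w_i among all words of class c for the likelihood; whether the paper uses exactly this form (or adds smoothing) is an assumption here. The function name `estimate_parameters` and the `alpha` smoothing knob are hypothetical.

```python
from collections import Counter


def estimate_parameters(labeled_docs, alpha=0.0):
    """Estimate P(c) = Nc / N and P(w|c) from (label, tokens) pairs.

    alpha is an additive smoothing constant (0 gives plain relative
    frequencies); it is an assumption, not taken from the slides.
    """
    n_docs = len(labeled_docs)                                     # N
    docs_per_class = Counter(label for label, _ in labeled_docs)   # Nc
    word_counts = {label: Counter() for label in docs_per_class}   # Nwi per class
    for label, tokens in labeled_docs:
        word_counts[label].update(tokens)
    vocab = set().union(*word_counts.values())

    priors = {c: docs_per_class[c] / n_docs for c in docs_per_class}
    likelihoods = {}
    for c, counts in word_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        likelihoods[c] = {w: (counts[w] + alpha) / denom for w in vocab}
    return priors, likelihoods


# The two training documents from the worked example.
docs = [
    ("POS", ["excellent"] * 5 + ["terrible"]),
    ("NEG", ["excellent"] * 2 + ["terrible"] * 6),
]
priors, likelihoods = estimate_parameters(docs)
```

With `alpha=0` this reproduces the 1/2 priors and the 5/6, 1/6, 2/8, 6/8 likelihoods of the worked example.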
![Page 13: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/13.jpg)
implementation of Naive Bayes in Hadoop
pre-processing raw dataset
![Page 14: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/14.jpg)
implementation of Naive Bayes in Hadoop
1000 positive and 1000 negative reviews
![Page 15: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/15.jpg)
implementation of Naive Bayes in Hadoop
(word,posSum,negSum)
the frequency of each word across all positive and all negative documents
(excellent,1000,10)
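
The training step can be sketched as a map and a reduce function in plain Python. This is not the authors' Hadoop job: the whitespace tokenizer, the `"pos"`/`"neg"` label strings, and the names `map_review` and `reduce_counts` are assumptions made for illustration.

```python
from collections import defaultdict


def map_review(label, text):
    """Map step: emit (word, (pos_count, neg_count)) for each token."""
    for word in text.lower().split():
        yield word, ((1, 0) if label == "pos" else (0, 1))


def reduce_counts(pairs):
    """Reduce step: sum the per-class counts of each word."""
    totals = defaultdict(lambda: [0, 0])
    for word, (pos, neg) in pairs:
        totals[word][0] += pos
        totals[word][1] += neg
    # One (word, posSum, negSum) record per word, sorted by word.
    return [(word, pos, neg) for word, (pos, neg) in sorted(totals.items())]


# Two toy reviews (made up for this sketch).
reviews = [("pos", "excellent excellent plot"), ("neg", "terrible plot")]
pairs = [kv for label, text in reviews for kv in map_review(label, text)]
model = reduce_counts(pairs)
print(model)
```

Each output record has the (word,posSum,negSum) shape shown on the slide.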
![Page 16: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/16.jpg)
implementation of Naive Bayes in Hadoop
(word,posSum,negSum) joined with (word,count,docID) gives (docID,count,word,posSum,negSum)
(excellent,1000,10) joined with (excellent,20,5) gives (5,20,excellent,1000,10)
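
A minimal in-memory sketch of this join, assuming the model fits in a dictionary (a real Hadoop job would join on the word key across the cluster). The name `join_with_model` is hypothetical, and skipping words that never appeared in training is an assumption.

```python
def join_with_model(model_records, test_records):
    """Join (word,posSum,negSum) with (word,count,docID) on the word."""
    model = {word: (pos_sum, neg_sum) for word, pos_sum, neg_sum in model_records}
    joined = []
    for word, count, doc_id in test_records:
        if word in model:  # assumption: unseen words are dropped
            pos_sum, neg_sum = model[word]
            joined.append((doc_id, count, word, pos_sum, neg_sum))
    return joined


# The record pair from the slide.
joined = join_with_model([("excellent", 1000, 10)], [("excellent", 20, 5)])
print(joined)
```

For the slide's inputs this yields the single record (5,20,excellent,1000,10).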
![Page 17: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/17.jpg)
implementation of Naive Bayes in Hadoop
(docID,count,word,posSum,negSum)
(5,10,excellent,20,5)
(5,2,terrible,5,20)
positive score: 10 x log(20) + 2 x log(5)
negative score: 10 x log(5) + 2 x log(20)
(docID,predict,correct)
(5,pos,true)
(6,neg,false)
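
The scoring on this slide can be sketched as follows. The code sums count x log(posSum) and count x log(negSum) per document exactly as written on the slide, with no normalization or prior term; it assumes every count is positive (log(0) is undefined) and that ties go to the positive class. The name `classify` and the `true_labels` mapping are hypothetical.

```python
import math
from collections import defaultdict


def classify(joined_records, true_labels):
    """Turn (docID,count,word,posSum,negSum) records into (docID,predict,correct)."""
    scores = defaultdict(lambda: [0.0, 0.0])
    for doc_id, count, word, pos_sum, neg_sum in joined_records:
        scores[doc_id][0] += count * math.log(pos_sum)  # positive score
        scores[doc_id][1] += count * math.log(neg_sum)  # negative score
    results = []
    for doc_id, (pos_score, neg_score) in sorted(scores.items()):
        predicted = "pos" if pos_score >= neg_score else "neg"
        results.append((doc_id, predicted, predicted == true_labels[doc_id]))
    return results


# The two records for document 5 from the slide; its true label is assumed "pos".
records = [(5, 10, "excellent", 20, 5), (5, 2, "terrible", 5, 20)]
results = classify(records, {5: "pos"})
print(results)
```

Working in log space turns the product of many small probabilities into a sum, which avoids floating-point underflow on long documents.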
![Page 18: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/18.jpg)
experimental study
7 nodes: one name node and six data nodes
Each VM is allocated two virtual CPUs and 4 GB of memory
Host: a Dell server with 12 Intel Xeon E5-2630 2.3 GHz cores and 32 GB of memory
Xen Cloud Platform (XCP) 1.6 is used as the hypervisor
![Page 19: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/19.jpg)
experimental study
training data
![Page 20: Scalable sentiment classification for big data analysis using naive bayes classifier](https://reader033.vdocument.in/reader033/viewer/2022052413/55a5f9c61a28ab6b588b457d/html5/thumbnails/20.jpg)
experimental study