estimating review score from words
CMPE 545 Artificial Neural Networks. Estimating review score from words. Işık Barış Fidaner.
![Page 1: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/1.jpg)
Estimating review score from words
Işık Barış Fidaner
CMPE 545 Artificial Neural Networks
![Page 2: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/2.jpg)
$$\text{Metascore} = \frac{1}{N} \sum_{i=1}^{N} \text{score}_i$$
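As a small illustration of the averaging formula on this slide, here is a minimal sketch that assumes the metascore is the plain unweighted mean of the N individual review scores. The function name and the example scores are made up for illustration.

```python
def metascore(scores):
    """Unweighted mean of the individual review scores (assumed reading of the slide)."""
    if not scores:
        raise ValueError("need at least one review score")
    return sum(scores) / len(scores)


example_scores = [90, 70, 80]  # hypothetical scores, not taken from the database
print(metascore(example_scores))  # 80.0
```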
![Page 3: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/3.jpg)
Score: the rating given to this product
Reviewer: the source of this review
Quote: a few sentences that summarize this review
rt =
xt = ?
Bag of words representation: existence of some words in the quote
+ exuberant
+ embrace
+ affectionate
![Page 4: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/4.jpg)
Purposes
1. A new database that relates text to score
(...)An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon.(...)
90?
![Page 5: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/5.jpg)
Purposes
2. Quantify meaning with machine learning
| riveting | exhilarating | affectionate | crafted | exuberant | dull | lacking | embrace |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
Review quote:
An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon.
xt
73
70
65
wT
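The binary vector on this slide can be reproduced with a short sketch. The tokenizer (lowercasing and splitting on non-letters) is an assumption, since the transcript does not say how quotes were split into words, and the function names are hypothetical.

```python
import re

# The eight example words from the slide, in the order shown there
# ("exhilarating" spelled in its standard form).
VOCABULARY = ["riveting", "exhilarating", "affectionate", "crafted",
              "exuberant", "dull", "lacking", "embrace"]


def tokenize(quote):
    """Assumed tokenizer: lowercase, keep runs of letters and apostrophes."""
    return set(re.findall(r"[a-z']+", quote.lower()))


def encode(quote, vocabulary=VOCABULARY):
    """Return 1 for each vocabulary word that occurs in the quote, else 0."""
    words = tokenize(quote)
    return [1 if word in words else 0 for word in vocabulary]


quote = ("An affectionate, exuberant picture that seeks to bring even those "
         "who don't know Klingon from Portuguese into the embrace of a "
         "pop-culture phenomenon.")
print(encode(quote))  # [0, 0, 1, 0, 1, 0, 0, 1]
```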
![Page 6: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/6.jpg)
Purposes
3. Meta-metacritic deductions, such as
Positive words
riveting, exhilarating, crafted, superb, extraordinary, brilliant
Negative words
unfunny, tedious, fails, mess, dull, lacking
![Page 7: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/7.jpg)
Obtaining the database
• Developed a PHP web crawler
• It ran for a few days
• TV show reviews
  – 8,335 records
• Music album reviews
  – 62,293 records
• Movie reviews
  – 113,456 records
MySQL
PHP
![Page 8: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/8.jpg)
Bag of words assumption
• Features affect the result independently
An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon.
=
phenomenon from an exuberant picture those into a portuguese don’t pop-culture affectionate to embrace bring klingon of who know seeks
• Semantic organization does not matter
![Page 9: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/9.jpg)
Bag of words assumption
• The problem with modifiers:
This is not good. Is this not good?
• We rely on the information encoded in the vocabulary, not grammar
• Opinions expressed clearly and simply:
Excellent, wonderful! This is dreadful.
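The modifier problem can be made concrete with a minimal sketch: under a bag-of-words view, the statement and the question from this slide become the same set of words. The tokenizer is an assumption chosen for illustration.

```python
import re


def bag_of_words(text):
    """Assumed tokenizer: lowercase words, order and punctuation discarded."""
    return frozenset(re.findall(r"[a-z']+", text.lower()))


statement = "This is not good."
question = "Is this not good?"

# Both sentences reduce to the same four words, so a bag-of-words model
# cannot tell them apart, and "good" is present despite the negation.
print(bag_of_words(statement) == bag_of_words(question))  # True
```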
![Page 10: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/10.jpg)
Word selection
1. Quote count (QC)
2. Product count (PC)
3. Score mean (SM)
4. Score stdev (SS)
• Meaningful words (SS < SSmax = 20)
• Frequently used words (PC > PCmin = 20)
• Non-grammatical words (PC < PCmax = 100)
~20 thousand words → ~300 words
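The selection step might be sketched as follows. The definitions are assumptions, since the transcript only names the four statistics: QC is taken as the number of quotes containing a word, PC as the number of distinct products whose quotes contain it, and SM and SS as the mean and population standard deviation of the scores of those quotes. The record layout and function names are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev


def word_statistics(records):
    """records: iterable of (product_id, score, quote) tuples (assumed layout)."""
    scores = defaultdict(list)   # word -> scores of the quotes containing it
    products = defaultdict(set)  # word -> ids of products it was used for
    for product_id, score, quote in records:
        for word in set(re.findall(r"[a-z']+", quote.lower())):
            scores[word].append(score)
            products[word].add(product_id)
    return {
        word: {"QC": len(s), "PC": len(products[word]),
               "SM": mean(s), "SS": pstdev(s)}
        for word, s in scores.items()
    }


def select_words(stats, ss_max=20, pc_min=20, pc_max=100):
    """Keep words with SS < ss_max and pc_min < PC < pc_max (slide thresholds)."""
    return sorted(word for word, s in stats.items()
                  if s["SS"] < ss_max and pc_min < s["PC"] < pc_max)


# Tiny made-up data set; pc_min is lowered so that anything survives.
records = [
    (1, 30, "The plot is dull"),
    (2, 35, "A dull show"),
    (3, 90, "The cast is superb"),
    (4, 40, "Dull and the pacing drags"),
]
print(select_words(word_statistics(records), pc_min=2))  # ['dull']
```

In this toy data "the" is used for as many products as "dull", but its scores are spread widely, so the SS threshold filters it out.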
![Page 11: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/11.jpg)
Significant words for TV and movies
unfunny
waste, disappointment, supposed, fails
fancy words! casual words!
Movies are overrated!
TV takes too much time!
![Page 12: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/12.jpg)
Significant words for music albums
masterpiece, artists
Music is art
date, modern
Music ages quickly
personality
Albums are attached to the musician’s personality
![Page 13: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/13.jpg)
The input vector and estimation
• Example input vector (divided by quote size)
  – xt = [1 0 0 1 0 0 0 1 0 0 0 0 ... 0] / 3
• Estimation function
• There is a weight for every selected word
• xt chooses the subset of contained words
• Estimation is the sum of w0 and the arithmetic mean of the weights of contained words
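The estimation function itself is not preserved in this transcript. Going by the description above, it can be read as w0 plus the dot product of the weight vector with the normalized input vector. The sketch below follows that reading; the weights, the bias, and the function names are hypothetical.

```python
def input_vector(flags):
    """Divide the 0/1 word flags by the number of contained words."""
    count = sum(flags)
    if count == 0:
        return [0.0] * len(flags)  # no selected word occurs in the quote
    return [flag / count for flag in flags]


def estimate(x, w, w0):
    """w0 plus the dot product of w and x (the assumed estimation function)."""
    return w0 + sum(wj * xj for wj, xj in zip(w, x))


flags = [1, 0, 0, 1, 0, 0, 0, 1]                   # three contained words
w = [6.0, -3.0, 2.0, 9.0, 1.0, -8.0, -5.0, 0.0]    # made-up weights
w0 = 60.0                                          # made-up bias

# Close to 65: w0 plus the arithmetic mean of the contained weights 6, 9 and 0.
print(estimate(input_vector(flags), w, w0))
```

Because the vector is divided by the number of contained words, the dot product is the arithmetic mean of their weights, so long and short quotes are put on the same scale.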
![Page 14: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/14.jpg)
Linear and SVM regression
• Linear regression uses the squared-difference error.
• This implies these update equations:
• SVM regression uses the ε-sensitive error function.
• With these simpler update equations:
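The update equations are not preserved in this transcript. As a rough sketch only, the standard stochastic gradient steps for the two error functions on a linear model look like this; the learning rate `eta`, the tolerance `eps`, and all function names are assumptions, and the slide's own equations may differ in form.

```python
def predict(w0, w, x):
    """Linear model output: w0 plus the dot product of w and x."""
    return w0 + sum(wj * xj for wj, xj in zip(w, x))


def linear_update(w0, w, x, r, eta):
    """One gradient step on the squared error 0.5 * (r - y)^2."""
    err = r - predict(w0, w, x)
    return w0 + eta * err, [wj + eta * err * xj for wj, xj in zip(w, x)]


def svm_update(w0, w, x, r, eta, eps):
    """One gradient step on the eps-sensitive error max(0, |r - y| - eps)."""
    err = r - predict(w0, w, x)
    if abs(err) <= eps:
        return w0, list(w)             # inside the tolerance band: no change
    step = eta if err > 0 else -eta    # only the sign of the error is used
    return w0 + step, [wj + step * xj for wj, xj in zip(w, x)]


print(linear_update(0.0, [0.0, 0.0], [1.0, 0.0], 10.0, 0.5))    # (5.0, [5.0, 0.0])
print(svm_update(0.0, [0.0, 0.0], [1.0, 0.0], 10.0, 0.5, 1.0))  # (0.5, [0.5, 0.0])
```

The squared-error step scales with the size of the error, while the ε-sensitive step has a fixed size and ignores small errors entirely, which is one way to see why a few extreme scores disturb it less.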
![Page 15: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/15.jpg)
Linear regression learning
Unstable learning in validation set
Error of 17 points
Error of 14 points
![Page 16: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/16.jpg)
SVM regression learning
Robustness increased, because SVM error function is linear and tolerant to error.
Error of 13 points
Error of 11 points
Better results with SVM!
![Page 17: Estimating review score from words](https://reader036.vdocument.in/reader036/viewer/2022062519/56814e95550346895dbc40ff/html5/thumbnails/17.jpg)
Possible improvements
• Non-linear model that actually weighs the importance of words
• Normalization by estimating reviewer parameters
• Adding two-word combinations to the input vector
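The last suggestion, two-word combinations, could look like the following minimal sketch, which extends the feature list with adjacent word pairs. The tokenizer and the function name are assumptions, and this is not described in the slides beyond the bullet above.

```python
import re


def unigrams_and_bigrams(quote):
    """Single words followed by adjacent two-word combinations."""
    words = re.findall(r"[a-z']+", quote.lower())
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


print(unigrams_and_bigrams("This is not good."))
# ['this', 'is', 'not', 'good', 'this is', 'is not', 'not good']
```

With a feature such as "not good" in the vocabulary, the model has a chance to give a negated word its own weight.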