Industrialize Sentiment Analysis
for Comment Moderation
Maggie Xiong
Huffington Post
Basic Comment Moderation Process
User comments on an article
Moderator publishes or rejects a comment based on a set of guidelines (the “10 commandments”)
Comments for different articles come in every second. We would need a small army to handle the moderation.
The comment should contribute to the discussion, conveying a respectful message, thought or idea, whether or not it agrees with another user or the author.
The comment should not intentionally misspell words, use non-alphabetic characters, or use extra or missing spaces to bypass moderation.
The comment should not attack, demean, belittle, or stereotype any person or group....
JuLiA to the Rescue
Sentiment analysis suite: JuLiA
Supports various preprocessing options: stemming, stopwords, etc.
Includes a number of popular ML algorithms: SVM, naïve Bayes, AdaBoost (decision tree), etc.
Uses Hadoop to parallelize the training of different models and the exploration of the parameter space
Train 1000s of models with different parameter setups in parallel
Pick the winner for production
Ensemble the different winners for even higher accuracy
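As a rough illustration of the preprocessing options listed above, the sketch below lowercases and tokenizes a comment, optionally drops stopwords, and optionally applies a crude suffix-stripper. The stopword list, the suffix list, and the function names are hypothetical; the suffix-stripper is only a stand-in for a real stemmer and is not taken from JuLiA.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "on", "he", "she", "it"}

# Suffixes removed by the crude stemmer below, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip one common suffix; a stand-in for a real stemmer such as Porter's."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, remove_stopwords=True, stem=True):
    """Lowercase, tokenize on letters and apostrophes, then apply the chosen options."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return tokens


print(preprocess("He needs to work on the issues"))
```

Each combination of the two flags yields a different feature set, which is one way a preprocessing choice could become a dimension of the parameter space being explored.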
Training Data
Goldset
About 20,000 comments (~13,000 train, ~7,000 holdout)
Publish-or-reject votes from 3 moderators
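One way a goldset like this could be turned into labeled train and holdout sets is sketched below. The slides do not say how the three votes were combined or how the split was drawn; the majority rule, the label convention (1 = publish, 0 = reject), the 0.65 fraction (roughly 13,000 of 20,000), and the sample rows are all assumptions.

```python
import random


def majority_label(votes):
    """Return 1 (publish) if most moderators voted publish, else 0 (reject)."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def split_goldset(examples, train_fraction=0.65, seed=0):
    """Shuffle deterministically and split into (train, holdout)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up rows: (comment text, votes from 3 moderators).
voted = [
    ("comment a", (1, 1, 0)),
    ("comment b", (0, 0, 1)),
    ("comment c", (1, 1, 1)),
    ("comment d", (0, 0, 0)),
]
goldset = [(text, majority_label(votes)) for text, votes in voted]
train, holdout = split_goldset(goldset)
print(len(train), len(holdout))
```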
Article: Christian and Gay? One Politician's Personal Interview (VIDEO)
Comment: I'm curious if you have ever watched the film "For The Bible Tells Me So" or if you have read the book "Torn" by Justin Lee. Bottom line: Biblical interpretation varies. If that's your interpretation of the scripture then make sure you abide by it.
Article: Rick Santorum On Middle Class: 'That's Marxism Talk,' 'There's No Class In America'
Comment: what an angry petty little man he is. issues too. lots of issues he needs to work on. He certainly has nothing of value to offer or to say. he's a screwed up little prick
Article: Paul Ryan Spending Cuts Face Backlash From Moderate Republicans
Comment: You seem to take a negative view of democrats and draw reference to a study "I co-authored with Robert Book".....sort of like a Muslim professor writing a book on Christianity your biases disqualify you from offering anything other than a self serving opinion....now of course I'm just using republican/fox news logic here"
Training Process
73 923 balanced_winnow 5 1 10 …
73 923 balanced_winnow 5 2 10 …
73 923 balanced_winnow 5 3 10 …
73 923 balanced_winnow 5 1 20 …
73 923 balanced_winnow 5 2 20 …
73 923 balanced_winnow 5 3 20 …
…
Train Request (a parameter set per line)
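A train request of this shape could be generated by enumerating a grid of values, one combination per line. The sketch below reproduces only the visible leading fields of the sample lines; the slides do not name the fields, so the argument names (`prefix`, `fixed`, `inner_values`, `outer_values`) are hypothetical, and the fields elided by "…" are left out rather than guessed.

```python
from itertools import product


def train_request_lines(prefix, algorithm, fixed, inner_values, outer_values):
    """Build one space-separated parameter set per line.

    The meaning of each field is not given in the slides; only the
    position of each value is copied from the sample lines.
    """
    lines = []
    # The last listed field changes slowest in the sample, so it is the outer loop.
    for outer, inner in product(outer_values, inner_values):
        fields = [*prefix, algorithm, fixed, inner, outer]
        lines.append(" ".join(str(f) for f in fields))
    return lines


lines = train_request_lines((73, 923), "balanced_winnow", 5, (1, 2, 3), (10, 20))
print("\n".join(lines))
```

Because each line is independent of the others, a file like this can be handed out line by line to parallel workers, which fits the per-line layout of the request.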
Investments are taxed as capital gains..... 1
It was the overleveraged and underregulated banks … 1
I am afraid we may be headed for … 1
In the famous words of Homer Simpson, “it takes 2 to lie …” 0
…
Training Data
Model 1, Model 2, Model 3, Model 4, Model 5, …, Model k
Hadoop Cluster
Results
Single best model: Naïve Bayes
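For readers unfamiliar with the technique, here is a minimal multinomial naïve Bayes text classifier with additive smoothing. It is a generic textbook sketch, not JuLiA's implementation; the slides do not say which naïve Bayes variant or settings won. The toy training rows and the label convention (1 = publish, 0 = reject) are made up.

```python
import math
from collections import Counter


def train_nb(examples, alpha=1.0):
    """Fit a multinomial naive Bayes model on (tokens, label) pairs.

    alpha is the additive smoothing constant and must be positive.
    """
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in examples:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"class_docs": class_docs, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}


def predict_nb(model, tokens):
    """Return the label with the highest log posterior."""
    total_docs = sum(model["class_docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_docs"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + model["alpha"] * vocab_size
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[tok] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data for illustration.
toy_train = [
    ("thoughtful respectful point".split(), 1),
    ("interesting idea thanks".split(), 1),
    ("angry petty insult".split(), 0),
    ("petty insult again".split(), 0),
]
nb_model = train_nb(toy_train)
print(predict_nb(nb_model, "respectful idea".split()))
print(predict_nb(nb_model, "petty insult".split()))
```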
Model decision on goldset approved comments
Model decision on goldset rejected comments
Pool for Better Results
Logistic regression using multiple model results
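Pooling of this kind is often called stacking: the outputs of several base models become the input features of a logistic regression. The sketch below assumes each base model contributes a 0/1 decision per comment and fits the regression by plain batch gradient descent. The slides do not say whether decisions or scores were pooled, nor how the regression was fitted, so the data, learning rate, and epoch count are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def pooled_probability(weights, bias, x):
    """Probability of label 1 given the base models' outputs x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Made-up rows: decisions of three base models per comment (1 = publish).
rows = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1),
        (0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights, bias = fit_logistic(rows, labels)
print(pooled_probability(weights, bias, (1, 1, 0)) > 0.5)
```

Unlike a fixed majority vote, the fitted weights can favor the base models that agree more often with the goldset labels.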
Model decision on goldset approved comments
Model decision on goldset rejected comments
Further Steps
Improve the training data set
Data gathered within moderators' normal workflow
More votes per comment
More comments
Per-vertical models
Incorporate comment-to-article similarity
In addition to saving his own life, Zimmerman likely save a couple other lives as well.
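Comment-to-article similarity could be computed in many ways; the slides do not specify one. A minimal sketch, assuming a bag-of-words representation and cosine similarity, is shown below. The function names and the sample strings are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up article and comment text for illustration.
article = bag_of_words("Spending cuts face backlash from moderate lawmakers")
on_topic = bag_of_words("The spending cuts will face more backlash")
off_topic = bag_of_words("Great recipe, thanks for sharing")
print(cosine_similarity(article, on_topic) > cosine_similarity(article, off_topic))
```

A score like this could serve as one extra feature next to the text-based model outputs, so the same comment can be judged differently depending on the article it is attached to.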