Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis
DESCRIPTION
Presentation given at LREC 2014 on detecting the scope of sarcasm in tweets.
TRANSCRIPT
University of Sheffield, NLP
Diana Maynard, Mark Greenwood
University of Sheffield, UK
Who cares about sarcastic tweets?
Twitter is full of mindless drivel
● OMMMFG!!! JUST HEARD EMINEM'S “RAPGOD”. SMFH!!! these other dudes might as well stop rapping if they not on this level
● i've got dressed but only because I need biscuits
● I used to be so bad at naming any k idol group members pmsl I would get so confused and now I'm pro ;)))
● im gonna learn to be a lifegaurd hopfully so while everyone else is working in a shop actually doing stuff il be sitting on a pool side.yay
What are people reading about?
● Of the top 10 Twitter accounts with the highest number of followers:
● 7 pop stars
● 2 social media sites
● and Barack Obama
● Why on earth do we care about this stuff?
Even the mindless drivel could be useful
● OMMMFG!!! JUST HEARD EMINEM'S “RAPGOD”. SMFH!!! these other dudes might as well stop rapping if they not on this level
● i've got dressed but only because I need biscuits
● I used to be so bad at naming any k idol group members pmsl I would get so confused and now I'm pro ;)))
● im gonna learn to be a lifegaurd hopfully so while everyone else is working in a shop actually doing stuff il be sitting on a pool side.yay
➔ English people like biscuits. A lot.
➔ What do young people think about their future careers?
➔ People who like K Idol and RapGod also like Apple products
Sarcasm is a part of British culture
● The BBC has its own webpage on sarcasm designed to teach non-native English speakers how to be sarcastic successfully in conversation
How do you know when someone is being sarcastic?
• Use of hashtags in tweets such as #sarcasm, #irony, #whoknew etc.
It's not like I wanted to eat breakfast anyway #sarcasm
• Large collections of tweets based on hashtags can be used to make a training set for machine learning
• But you still have to know what to do with sarcasm once you've found it
• Sarcasm generally entails saying the opposite of what you mean
– But it doesn't necessarily just invert the polarity of an opinion
– “It's not like I wanted to eat breakfast anyway” is negative when uttered sarcastically, but non-opinionated when uttered neutrally.
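The idea of building a training set from hashtags can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the function name `label_by_hashtag` and the choice to strip the marker hashtag from the text (so a classifier cannot simply memorise it) are assumptions; the marker tags are the ones named on the slide.

```python
import re

# Marker hashtags named on the slide; a real list would be longer.
SARCASM_TAGS = {"#sarcasm", "#irony", "#whoknew"}
HASHTAG_RE = re.compile(r"#\w+")


def label_by_hashtag(tweet):
    """Return (text_without_marker_tags, is_sarcastic) for one tweet.

    Hypothetical helper: the label comes purely from the presence of a
    marker hashtag, which is then removed from the text.
    """
    tags = [t.lower() for t in HASHTAG_RE.findall(tweet)]
    is_sarcastic = any(t in SARCASM_TAGS for t in tags)

    def drop_marker(match):
        # Keep ordinary hashtags, drop only the sarcasm markers.
        return "" if match.group().lower() in SARCASM_TAGS else match.group()

    text = HASHTAG_RE.sub(drop_marker, tweet)
    return " ".join(text.split()), is_sarcastic


print(label_by_hashtag("It's not like I wanted to eat breakfast anyway #sarcasm"))
```

A tweet without any marker hashtag is simply labelled non-sarcastic here, which is exactly the weakness noted above: sarcasm that is not explicitly flagged goes unnoticed.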
My friend Barry likes Apple products
Or does he?
Understanding sarcasm is hard
Sarcastic or not?
How about now?
It often requires world knowledge
Capitalisation indicates sarcasm
But not always
What does sarcasm do to polarity?
● Sarcasm often indicated by hashtags in tweets such as #sarcasm, #irony, #whoknew etc.
● It's very hard to identify sarcasm outside these parameters
● In general, when someone is being sarcastic, they're saying the opposite of what they mean
● So as long as you know which bit of the utterance is the sarcastic bit, you can simply reverse the polarity
Eating breakfast food for lunch. Living the dream. #toast #rebel #sarcasm
● If there is no polarity on the original statement, the sarcastic version is probably negative
It's not like I wanted to eat breakfast anyway #sarcasm
● If there's more than one hashtag, you need to look at the combination, and any sentiments they express
Getting the scope of hashtags right
Eating breakfast food for lunch. Living the dream.
#toast #rebel #sarcasm
Analysing Hashtags
What's in a hashtag?
● Hashtags often contain smushed words
● #SteveJobs
● #CombineAFoodAndABand
● #southamerica
● For NER we want the individual tokens so we can link them to the right entity
● For opinion mining, individual words in the hashtags often indicate sentiment, sarcasm etc.
● #greatidea
● #worstdayever
● We need to retokenise hashtags so that we can use the content in our application
How to analyse hashtags?
● Camelcasing makes it relatively easy to separate the words, using an adapted tokeniser, but many people don't bother
● We use a simple approach based on dictionary matching the longest consecutive strings, working L to R
● We use a combination of dictionaries (Linux dictionary, slang dictionary, plus gazetteers of Named Entities, modified manually)
● #lifeisgreat -> #-life-is-great
● #lovinglife -> #-loving-life
● It's not foolproof, however
● #greatstart -> #-greats-tart
● In an experiment with 2010 English hashtags (4538 tokens): P=98.12%, R=96.41%, F1=97.25%.
● We could use a language modelling approach based on bigrams and trigrams, but since hashtags are often novel, it might not help much
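A minimal sketch of greedy longest-match splitting, working left to right, might look like this. It is an illustration under stated assumptions rather than the authors' tokeniser: `split_hashtag`, `format_split` and the tiny `TOY_DICTIONARY` are hypothetical stand-ins for the combined Linux, slang and Named Entity dictionaries, and the single-character fallback for unknown material is an added choice.

```python
def split_hashtag(tag, dictionary):
    """Split a hashtag into words by greedy longest dictionary match, left to right."""
    body = tag.lstrip("#").lower()
    tokens = []
    i = 0
    while i < len(body):
        # Try the longest candidate starting at i first, then shrink it.
        for j in range(len(body), i, -1):
            if body[i:j] in dictionary:
                tokens.append(body[i:j])
                i = j
                break
        else:
            # Assumed fallback: no entry starts here, so emit one character.
            tokens.append(body[i])
            i += 1
    return tokens


def format_split(tokens):
    """Render tokens in the '#-word-word' notation used on the slide."""
    return "#-" + "-".join(tokens)


# Toy dictionary, just large enough for the slide's examples.
TOY_DICTIONARY = {"life", "is", "great", "greats", "loving", "start", "tart"}

for tag in ["#lifeisgreat", "#lovinglife", "#greatstart"]:
    print(tag, "->", format_split(split_hashtag(tag, TOY_DICTIONARY)))
```

Because "greats" is longer than "great" and is in the dictionary, the greedy match takes it first and leaves "tart", which reproduces the #greatstart failure shown above.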
Identifying the scope of sarcasm
I am not happy that I woke up at 5:15 this morning. #greatstart #sarcasm
You are really mature. #lying #sarcasm
Rules for identifying scope
I am not happy that I woke up at 5:15 this morning. #greatstart #sarcasm
● negative sentiment + positive hashtag + sarcasm hashtag
● The positive hashtag becomes negative with sarcasm
You are really mature. #lying #sarcasm
● positive sentiment + sarcasm hashtag + sarcasm hashtag
● The positive sentiment is turned negative by both sarcasm hashtags
● When in doubt, it's usually safe to assume that a sarcastic statement carries negative sentiment
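One way to read these rules is sketched below. It is a loose interpretation, not the authors' rule set: polarity is assumed to be -1, 0 or +1, the sentence polarity is assumed to come from an upstream sentiment component, and `apply_sarcasm`, `SARCASM_TAGS` and `TAG_POLARITY` are hypothetical names with toy contents (treating #lying as a sarcasm marker follows the slide's second example).

```python
# Toy lexicons for illustration only.
SARCASM_TAGS = {"sarcasm", "irony", "whoknew", "lying"}
TAG_POLARITY = {"greatstart": 1, "greatidea": 1, "worstdayever": -1}


def apply_sarcasm(sentence_polarity, hashtags):
    """Return the overall polarity (-1, 0, +1) of a tweet after sarcasm rules.

    sentence_polarity: polarity of the text before sarcasm is considered.
    hashtags: the tweet's hashtags, with or without the leading '#'.
    """
    tags = [h.lstrip("#").lower() for h in hashtags]
    if not any(t in SARCASM_TAGS for t in tags):
        return sentence_polarity  # no sarcasm marker: nothing changes

    tag_polarities = [TAG_POLARITY[t] for t in tags if t in TAG_POLARITY]
    if tag_polarities:
        # Sarcasm scopes over the sentiment hashtags: flip them and
        # leave the sentence polarity as it is.
        total = sentence_polarity - sum(tag_polarities)
    else:
        # Sarcasm scopes over the sentence itself.
        total = -sentence_polarity

    if total == 0:
        return -1  # when in doubt, assume sarcasm is negative
    return 1 if total > 0 else -1


print(apply_sarcasm(-1, ["#greatstart", "#sarcasm"]))  # woke up at 5:15
print(apply_sarcasm(1, ["#lying", "#sarcasm"]))        # "really mature"
print(apply_sarcasm(0, ["#sarcasm"]))                  # breakfast example
```

All three calls come out negative, matching the slide examples; a tweet with no sarcasm marker keeps whatever polarity it was given.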
Experiments with sarcastic hashtags
● Collected a corpus of 134 tweets containing #sarcasm
● Manually annotated sentences with sentiment
● 266 sentences, of which 68 opinionated (25%)
● 62 negative, 6 positive (yes, this is biased...)
● Adding sarcasm detection improved accuracy of polarity detection from 27.27% to 77.28%
● Even though we know these sentences are sarcastic, we don't always get polarity right
● After implementing rules for sarcasm scope, 91% accuracy
Conclusions
● Unlike most work on sarcasm detection, we don't try to identify sarcasm where it's not explicitly indicated
● We instead examine the effect that known sarcasm has on the sentiment expressed in tweets
● We retokenise hashtags so that we can make use of information within them in order to identify sarcasm scope
● We develop a set of rules for determining sarcasm scope, and improve polarity detection as a result
● Lots more work could be done on this topic, but it's a #greatstart #really
Questions?