an introduction to emoji data science

Post on 21-Jan-2017

Hamdan Azhar // hamdan@prismoji.com // @hamdanazhar

// November 5, 2016

🐍s, 🌹s, & major 🔑s: an introduction to emoji data science

🗃📊🗞

why emoji data science?

emojis data science

Overarching goals

■Understanding what emojis mean

■Using emojis to understand the topics we use them to discuss

■Getting past the “so what” hurdle and defining good questions to ask

the birth of

My reaction to this article, in emoji

So we decided to look at some actual data

Getting the data

■ Use the Twitter API to sample 100,000 tweets for five hashtags related to Britain’s EU Referendum
  • Hashtags: #NotMyVote, #VoteRemain, #EURef, #Brexit, #VoteLeave
  • Data pulled for June 24, the day after the referendum
  • English-language tweets only

■ After removing retweets, we’re left with 23,989 unique tweets, i.e. the “Brexit dataset”

■ Of these, 1,505 tweets (6.3%) contain at least one emoji
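The retweet-removal step can be sketched roughly as follows. The deck does not say how retweets were identified, so this sketch assumes a retweet is any tweet whose text starts with "RT @" and also drops exact duplicate texts. It is written in Python for illustration, although the original analysis was done in R. The function names `remove_retweets` and `share` are hypothetical.

```python
def remove_retweets(tweets):
    """Drop retweets (assumed to start with 'RT @') and exact duplicate texts."""
    seen = set()
    unique = []
    for text in tweets:
        normalized = text.strip()
        if normalized.startswith("RT @"):
            continue  # assumption: classic retweets carry this prefix
        if normalized in seen:
            continue  # skip verbatim duplicates
        seen.add(normalized)
        unique.append(normalized)
    return unique


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    sample = ["RT @someone: wow #Brexit", "wow #Brexit", "wow #Brexit", "hmm #EURef"]
    print(remove_retweets(sample))
    # Check the share of emoji tweets reported on the slide.
    print(round(share(1505, 23989), 1))  # 6.3
```

The last line reproduces the slide's figure: 1,505 of 23,989 tweets is about 6.3%.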

Analyzing the data

■ Use regular expressions in R, along with Unicode emoji dictionaries, to extract emojis from tweets

■ Compute emoji counts in the Brexit dataset

■ Compare with counts for all >10B emoji tweets on Twitter since 2013 (from emojitracker.com)

■ Extract hashtags from tweets and compute hashtag profiles for various emojis
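A minimal sketch of the extraction and counting steps, in Python rather than the R used for the original analysis. The tiny `EMOJI_NAMES` dictionary stands in for a full Unicode emoji dictionary, and all names here are hypothetical. Longer sequences are matched first so that multi-codepoint emojis such as flags are kept whole.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny emoji dictionary (emoji -> name).
EMOJI_NAMES = {
    "\U0001F602": "face with tears of joy",
    "\U0001F1EC\U0001F1E7": "flag of united kingdom",  # two regional indicators
    "\U0001F44D": "thumbs up sign",
    "\U0001F62D": "loudly crying face",
}

# Build one alternation pattern, longest emoji first.
EMOJI_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOJI_NAMES, key=len, reverse=True))
)
HASHTAG_RE = re.compile(r"#\w+")


def extract_emojis(text):
    """Return every dictionary emoji found in the text, in order."""
    return EMOJI_RE.findall(text)


def extract_hashtags(text):
    """Return lowercased hashtags found in the text, in order."""
    return [h.lower() for h in HASHTAG_RE.findall(text)]


def emoji_counts(tweets):
    """Count emoji occurrences across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_emojis(text))
    return counts


if __name__ == "__main__":
    tweets = [
        "Well that happened \U0001F602\U0001F602 #Brexit",
        "\U0001F1EC\U0001F1E7 #VoteLeave #brexit",
    ]
    for emoji, n in emoji_counts(tweets).most_common():
        print(EMOJI_NAMES[emoji], n)
```

This counts every occurrence of an emoji; counting each emoji at most once per tweet would be an equally reasonable choice, and the deck does not say which was used.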

emoji  emoji name               brexit rank  general rank  brexit index*  general index*  overindex**
😂     face with tears of joy   1            1             100            100
🇬🇧     flag of united kingdom   2            363           87             0.2             400x
👍     thumbs up sign           3            18            26             11              2.3x
👏     clapping hands sign      4            45            24             6               3.9x
❤      heavy black heart        5            3             21             45
😭     loudly crying face       6            7             17             29
😔     pensive face             7            13            14             18
😩     weary face               8            11            13             22
😢     crying face              9            27            12             9               1.3x
🙈     see-no-evil monkey       10           24            12             9               1.3x

* Index is an estimate of how prevalent a given emoji is in Brexit tweets and general tweets, with the most common emoji (😂) being defined as 100

** Reflects how much more likely a given emoji is to be used in a Brexit tweet vs. generally on Twitter (general rank and index obtained from emojitracker.com). An emoji overindexes on Brexit if both brexit rank < general rank AND brexit index > general index.
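The index and overindex definitions in the footnotes can be sketched as below. This is an illustration in Python, not the original R code. It assumes the overindex multiple is the ratio of the two indices, which roughly matches the table (for example 26 / 11 is about 2.3x). The counts in the example are made up, and the function names are hypothetical.

```python
def index_scores(counts):
    """Scale counts so that the most common emoji has an index of 100."""
    top = max(counts.values())
    return {emoji: 100.0 * c / top for emoji, c in counts.items()}


def ranks(counts):
    """Rank emojis by count, with 1 being the most common."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {emoji: i + 1 for i, emoji in enumerate(ordered)}


def overindex(topic_counts, general_counts):
    """Return {emoji: multiple} for emojis that overindex on the topic.

    An emoji overindexes if its topic rank is better (numerically smaller)
    than its general rank AND its topic index exceeds its general index.
    The multiple is assumed to be topic index / general index.
    """
    t_idx, g_idx = index_scores(topic_counts), index_scores(general_counts)
    t_rank, g_rank = ranks(topic_counts), ranks(general_counts)
    result = {}
    for emoji in topic_counts:
        if emoji not in general_counts or g_idx[emoji] == 0:
            continue  # no usable general baseline for this emoji
        if t_rank[emoji] < g_rank[emoji] and t_idx[emoji] > g_idx[emoji]:
            result[emoji] = t_idx[emoji] / g_idx[emoji]
    return result


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    topic = {"joy": 100, "uk_flag": 87, "thumbs_up": 26}
    general = {"joy": 1000, "heart": 450, "thumbs_up": 110, "uk_flag": 2}
    for emoji, multiple in overindex(topic, general).items():
        print(emoji, f"{multiple:.1f}x")
```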

Which emojis over-index most heavily for Brexit? (above and beyond their usual popularity on Twitter)

Finding the “hashtag signature” of a given emoji

■ We know the distribution of hashtags in our entire dataset

■ We can pick a given emoji and compute the distribution of hashtags for tweets that use that emoji

■ By comparing these two distributions, we can estimate which hashtags an emoji is most likely to be used with
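One way to compare the two distributions is sketched below in Python (the original analysis used R). The deck does not state how the comparison was made. This sketch assumes a simple ratio: the hashtag's share among tweets with the emoji, divided by its share in the whole dataset. Values above 1 suggest the emoji is used with that hashtag more than expected. Function names are hypothetical.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#\w+")


def hashtag_distribution(tweets):
    """Share of each hashtag among all hashtag uses (once per tweet)."""
    counts = Counter()
    for text in tweets:
        counts.update({h.lower() for h in HASHTAG_RE.findall(text)})
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()} if total else {}


def hashtag_signature(tweets, emoji):
    """Ratio of each hashtag's share in emoji tweets to its overall share."""
    overall = hashtag_distribution(tweets)
    with_emoji = hashtag_distribution([t for t in tweets if emoji in t])
    return {h: with_emoji.get(h, 0.0) / share for h, share in overall.items()}


if __name__ == "__main__":
    joy = "\U0001F602"
    tweets = [
        joy + " #brexit",
        joy + " #brexit",
        "#voteleave",
        "#brexit sad day",
    ]
    for hashtag, ratio in sorted(hashtag_signature(tweets, joy).items()):
        print(hashtag, round(ratio, 2))
```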

Chart values: 15%, 17%, 20%, 29%, 19%

Hashtag signatures of the top emojis of Brexit

http://motherboard.vice.com/read/the-emojis-of-great-brexit

Taylor Swift is winning hearts (and minds)

Source: Analysis of 100,000 public tweets mentioning @taylorswift13 and @kanyewest from Aug. 1-4, 2016. (PRISMOJI)

Chart legend: higher association with @taylorswift13 | equal | higher association with @kanyewest

Hearts vs. Snakes: The emoji battle underlying the epic Taylor Swift – Kanye West feud

Source: Analysis of 100,000 public tweets mentioning @taylorswift13 and @kanyewest from Aug. 1-4, 2016. (PRISMOJI)

#taylorswiftwhatup is the most common hashtag in tweets about both Taylor and Kanye

Source: Analysis of 100,000 public tweets mentioning @taylorswift13 and @kanyewest from Aug. 1-4, 2016. (PRISMOJI)

Our common emoji language of #fanlove

Source: Analysis of 250,000 public tweets mentioning @beyonce, @justinbieber, @djkhaled, @drake, and @rihanna from Aug. 1-4, 2016. (PRISMOJI)

Sometimes love hurts

Examples of in tweets involving #fanlove

Source: Analysis of 250,000 public tweets mentioning @beyonce, @justinbieber, @djkhaled, @drake, and @rihanna from Aug. 1-4, 2016. (PRISMOJI)

Some more examples

#firstsevenjobs

Source: Analysis of 32,979 public tweets with the hashtags #firstsevenjobs and #first7jobs from Aug. 8, 2016. (PRISMOJI)

Understanding gendered emojis on Twitter

#wcw vs #mcm: All hearts are not created equal

Chart legend: higher association with #mcm | higher association with #wcw

Source: Analysis of 100,000 public tweets with the hashtags #wcw and #mcm from June 27-29, 2016. (PRISMOJI)

#Rio2016 Olympics

Source: Analysis of 449,680 public tweets mentioning #rio2016 from Aug. 6-22, 2016. (PRISMOJI)

Chart legend: higher association with FIRST 3 DAYS | higher association with LAST 3 DAYS

Third Presidential Debate

Source: Analysis of public tweets during third presidential debate on Oct. 20, 2016. (PRISMOJI)

Three takeaways I’d like you to leave with

■Understanding emojis as data can yield interesting insights

■More work is needed to learn more about what emojis mean, and what they reveal about our world

■You can play around with emoji data too

Thank you!

• Email: hamdan@prismoji.com
• Twitter: @hamdanazhar
• prismoji.com
• hamdanazhar.com
