

Tracking the Yak: An Empirical Study of Yik Yak

Martin Saveski, Sophie Chou, Deb Roy
Laboratory for Social Machines, MIT Media Lab

web.media.mit.edu/~msaveski/assets/publications/2016...

USAGE OF VULGARITIES

RQ1: Is vulgar language more acceptable on Yik Yak than on Twitter?

TOPIC MODELLING

RQ2: Do unique topics emerge in conversations on Yik Yak?

[Figure: x-axis: Topic # (1 to 200); y-axis: Fraction of Yaks/Tweets that belong to this topic (0.0 to 1.0).]

DISCUSSION

Limitations
•  The content of banned yaks is missing, and those yaks may contain unique topics.
•  Geotagged tweets may not be representative of all tweets.
•  The data cover a short time period: 19 days.

Future Directions
•  How topics vary across locations.
•  Yik Yak as a social signal: are there any correlations between the activity on Yik Yak and the characteristics of the schools?

LDA topic modeling with 200 topics. The colors show the proportion of tweets (blue) and yaks (green) that belong to each topic.

We find that, on average, topics tend to be evenly split across both platforms and that there are no topics that are unique to Yik Yak.
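As a rough illustration of how the per-topic split between platforms could be computed, here is a minimal sketch. It assumes each post has already been assigned a single topic (for example its most probable LDA topic) and that counts are normalized by platform size so the larger Yik Yak sample does not dominate. Neither choice is stated on the poster, and the function name and the "yak"/"tweet" labels are hypothetical.

```python
from collections import Counter


def platform_share_per_topic(assignments, num_topics):
    """Return a (yak_share, tweet_share) pair for every topic.

    assignments: list of (platform, topic_id) tuples, where platform is
    "yak" or "tweet" (hypothetical labels) and topic_id is in [0, num_topics).

    Assumption: each platform's count is divided by that platform's total
    number of posts before the two rates are compared.
    """
    totals = Counter(platform for platform, _ in assignments)
    counts = Counter(assignments)
    shares = []
    for topic in range(num_topics):
        yak_rate = counts[("yak", topic)] / totals["yak"] if totals["yak"] else 0.0
        tweet_rate = counts[("tweet", topic)] / totals["tweet"] if totals["tweet"] else 0.0
        denom = yak_rate + tweet_rate
        if denom == 0:
            shares.append((0.0, 0.0))  # topic has no posts on either platform
        else:
            shares.append((yak_rate / denom, tweet_rate / denom))
    return shares


# Toy data, not from the study.
example = [("yak", 0), ("yak", 0), ("yak", 1), ("yak", 1), ("tweet", 0), ("tweet", 1)]
print(platform_share_per_topic(example, num_topics=3))
```

Under this normalization, a topic that is evenly split yields a pair close to (0.5, 0.5), and a topic unique to Yik Yak would yield (1.0, 0.0).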

Posts on Yik Yak are only slightly more likely to contain vulgar words than posts on Twitter. We find that yaks that contain vulgar words are 38% more likely to be downvoted and have negative ratings. Banned yaks are 61% more likely to contain vulgar words.
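Statements of the form "X% more likely" can be read as a ratio of two rates. The sketch below shows that arithmetic together with a simple word-list check. The poster does not say which vulgarity lexicon or tokenizer was used, so the lexicon, the tokenization rule, and the counts in the example are placeholders rather than the study's data.

```python
import re


def contains_any(text, lexicon):
    """True if any lowercase token of `text` appears in `lexicon`.

    Assumption: tokens are runs of letters, digits, or apostrophes.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return any(token in lexicon for token in tokens)


def relative_increase(hits_a, total_a, hits_b, total_b):
    """Percent by which the event rate in group A exceeds the rate in group B."""
    rate_a = hits_a / total_a
    rate_b = hits_b / total_b
    return (rate_a / rate_b - 1.0) * 100.0


# Made-up counts: 138 of 1000 flagged posts downvoted vs 100 of 1000 others.
print(round(relative_increase(138, 1000, 100, 1000), 1))
```

With these placeholder counts the flagged group's rate is 1.38 times the other group's rate, which is what "38% more likely" expresses.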

ACTIVITY PATTERNS

[Figure: daily posting patterns. x-axis: Time of Day (ticks at 9 AM, Noon, 6 PM, Midnight, 3 AM); y-axis: Fraction of Posts (0.00 to 0.08); series: Yik Yak, Twitter.]

[Figure: weekly posting patterns. x-axis: day of week (Mon to Sun); y-axis: Volume (%) (0 to 16); series: Twitter, Yik Yak.]

[Word cloud figure with two panels. Words recovered from each panel:]

Yik Yak: think, yak, yik, people, girl, guy, girls, sex, guys, don, roommate, yaks, campus, someone, fuck, doesn, feel, actually, anyone, want, comment, boyfriend, class, dick, semester, exam, frat, acorn, also, sometimes, yeah, girlfriend, attractive, tinder, study, studying, gpa, anchor, classes, sorority, relationship, hook, sailboat, yellow, person, finals, major, mean, pike, engineering, friends, socks, gay, probably, aren, everyone, else, college, shovel, pink, friend, isn, yakarma, balloon, greek, compass, yikyak, h2p, horny, sexual, purple, freshman, 000, penis, boot, saying, icon, mushroom, upvote, student, honestly, know, maybe, post, white, asian, racist, well, dating, exams, roommates, porn, really, flashlight, comments, binoculars, hot, find, grades, professor

Twitter: boston, atlanta, austin, nigga, game, park, philadelphia, francisco, niggas, san, tweet, airport, twitter, team, via, report, photo, pittsburgh, center, great, win, season, new, today, ain, wit, tweets, tonight, 2015, city, philly, event, easter, providence, birthday, fans, day, playoffs, show, international, home, play, ass, intersection, aint, oakland, follow, coach, bout, song, madison, fenway, ima, trending, series, smh, atl, favorite, trends, iphone, supplemental, flight, hotel, fan, house, conference, lil, logan, top20, theatre, sunday, video, playoff, sox, baseball, games, prom, bruce, jenner, 1st, museum, tweeting, gotta, hartsfield, ave, station, ready, bar, mike, yall, nurse, bit, album, church, love, giants, phillies, apr, lmao, eatingndastreets313

Yaks vs Tweets. Yaks are dominated by words specific to college life, while tweets contain city names and sports-related words.

Upvoted vs Downvoted Yaks. Upvoted yaks are characterized by mostly general and slightly positive words, while downvoted yaks contain mostly inappropriate and racist words.

[Word cloud figure with two panels. Words recovered from each panel:]

Upvoted: feel, friends, time, thanks, week, sleep, h2p, bed, alone, class, yes, feeling, thank, definitely, really, hours, yeah, weeks, awkward, also, finals, someone, relationship, anxiety, life, talk, well, lot, nap, roommate, don, way, year, love, start, netflix, haha, ask, work, late, hug, find, may, contact, semester, sometimes, always, depression, luck, feels, depends, though, day, never, years, appreciate, good, great, even, together, studying, actually, pizza, friend, long, things, minutes, probably, sounds, thought, might, pretty, sure, early, usually, almost, hope, professor, study, motivation, keep, makes, classes, months, times, exams, asleep, grades, college, struggle, helps, summer, awesome, much, done, finally, happy, procrastination, worry, coffee

Downvoted: ass, white, black, anyone, middle, bit, pike, girls, page, pussy, hillary, paul, kik, storm, john, fuck, jack, nigga, racist, vote, sunny, dick, women, joe, tonight, santa, fat, zbt, ben, indians, yeti, jimmy, madison, jerry, frank, sig, walker, carry, aaron, bitch, cock, suck, jon, feminists, austin, asshole, gay, wanna, clinton, tiny, blacks, asian, hilary, james, hitler, steve, cunt, dtf, hernandez, michael, reee, bitches, 420, geeds, taylor, president, cunningham, kappa, scott, ladies, phi, ross, sammy, geed, yak, ann, addy, harry, party, theta, smith, mike, kim, weed, chris, downvote, rand, cathy, matt, george, liberal, ted, sailboat, jews, sigma, mary, jackson, chase, potter, sean, ike, slut, windy, wen, rape, snapchat, retarded, utunrated, stfu, david, thomas, burge, liberals, niggas, racism, gun, snap, anybody, jenna, jason
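The poster does not state how the words in the word-cloud comparisons above were selected. One common way to find words that discriminate between two collections is to rank them by a smoothed log-ratio of their relative frequencies. The sketch below assumes that approach, with whitespace tokenization and add-alpha smoothing. It is an illustration, not the authors' method, and the function name and parameters are hypothetical.

```python
import math
from collections import Counter


def discriminative_words(docs_a, docs_b, top_n=10, alpha=1.0):
    """Return (top words for A, top words for B) by smoothed log frequency ratio.

    Assumptions: lowercase whitespace tokenization and add-alpha smoothing,
    neither of which is taken from the poster.
    """
    counts_a = Counter(w for doc in docs_a for w in doc.lower().split())
    counts_b = Counter(w for doc in docs_b for w in doc.lower().split())
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    scores = {
        w: math.log((counts_a[w] + alpha) / total_a)
        - math.log((counts_b[w] + alpha) / total_b)
        for w in vocab
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    top_a = sorted(vocab, key=lambda w: (-scores[w], w))[:top_n]
    top_b = sorted(vocab, key=lambda w: (scores[w], w))[:top_n]
    return top_a, top_b


# Toy documents, not from the study.
campus = ["exam finals exam", "roommate exam"]
city = ["game playoffs game", "airport game"]
print(discriminative_words(campus, city, top_n=2))
```

The same function could be applied to upvoted versus downvoted yaks by passing those two collections instead.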

CONTENT ANALYSIS

Daily Posting Patterns. Twitter users tend to post earlier in the day. The peak of activity on Yik Yak is at midnight, three hours later than on Twitter.

Weekly Posting Patterns. The posting activity on Yik Yak is highest on weekdays and declines during weekends. The volume of tweets is steady across the week.
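The daily and weekly curves described above amount to normalized histograms over post timestamps. A minimal sketch follows, assuming the timestamps have already been converted to each campus's local time and that the input is non-empty. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_fractions(timestamps):
    """Fraction of posts in each hour of the day (index 0 is midnight)."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    return [counts[h] / total for h in range(24)]


def weekday_volume_percent(timestamps):
    """Percent of posts on each weekday (index 0 is Monday)."""
    counts = Counter(ts.weekday() for ts in timestamps)
    total = sum(counts.values())
    return [100.0 * counts[d] / total for d in range(7)]


def peak_hour(timestamps):
    """Hour of day with the largest fraction of posts."""
    fractions = hourly_fractions(timestamps)
    return max(range(24), key=fractions.__getitem__)


# Toy timestamps, not from the study.
posts = [
    datetime(2015, 4, 6, 0, 5),   # a Monday, just after midnight
    datetime(2015, 4, 6, 0, 30),
    datetime(2015, 4, 7, 21, 0),  # a Tuesday evening
]
print(peak_hour(posts), weekday_volume_percent(posts)[:2])
```

Computing `peak_hour` separately for yaks and tweets would give the two peaks that the daily comparison refers to.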


ABSTRACT

To investigate the effects of anonymity on user behavior, we study the new and controversial social app, Yik Yak. As a comparison, we look at Twitter, which has similar limitations on lengths of posts, but is public and global rather than anonymous and local. We find that posts on Yik Yak are only slightly more likely to contain vulgarities, and we do not find any significant bias in topic distributions on Yik Yak versus on Twitter; however, differences in vocabulary and most discriminative words used suggest the need for further analysis.

THE YIK YAK APP

Yik Yak is a Twitter-like anonymous social smartphone app. Within a 10-mile radius, users can leave posts (yaks) of up to 200 characters and interact with other community members' posts. Users can upvote and downvote posts; if a post receives 5 downvotes, it is deleted, creating a self-selecting mechanism for censoring content.
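The posting and moderation rules described here can be modeled in a few lines. This is a toy model, not the app's implementation: it counts raw downvotes exactly as the sentence above is worded, although the text does not say whether the threshold applies to raw downvotes or to a net score. The class and constant names are hypothetical.

```python
from dataclasses import dataclass

MAX_YAK_LENGTH = 200     # characters, as described above
DELETE_AT_DOWNVOTES = 5  # as described above; raw-count reading is an assumption


@dataclass
class Yak:
    text: str
    upvotes: int = 0
    downvotes: int = 0
    deleted: bool = False

    def __post_init__(self):
        # Reject posts longer than the stated character limit.
        if len(self.text) > MAX_YAK_LENGTH:
            raise ValueError("a yak may contain at most 200 characters")

    def upvote(self):
        if not self.deleted:
            self.upvotes += 1

    def downvote(self):
        if self.deleted:
            return
        self.downvotes += 1
        # Community moderation: the post disappears at the threshold.
        if self.downvotes >= DELETE_AT_DOWNVOTES:
            self.deleted = True


yak = Yak("Anyone else studying for finals?")
for _ in range(5):
    yak.downvote()
print(yak.deleted)
```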

DATA COLLECTION

For Yik Yak, we set up scrapers to run in 5-minute intervals, with a 12-hour lookback to collect replies. For Twitter, we approximated Yik Yak hotspots by looking at geotagged tweets in a 10-mile radius around the same locations.

•  19 days
•  20 college campuses in the USA
•  1.9M yaks
•  1M tweets
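Selecting geotagged tweets within a 10-mile radius of a location can be sketched with the haversine great-circle distance. The poster does not say how the radius filter was implemented, so this is only one plausible approach. The `lat`/`lon` field names, the function names, and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def tweets_near(tweets, center, radius_miles=10.0):
    """Keep tweets whose coordinates fall within radius_miles of center.

    tweets: list of dicts with "lat" and "lon" keys (hypothetical schema).
    center: (lat, lon) tuple for the location of interest.
    """
    lat0, lon0 = center
    return [
        t for t in tweets
        if haversine_miles(lat0, lon0, t["lat"], t["lon"]) <= radius_miles
    ]


# Toy coordinates, not from the study.
campus = (42.36, -71.09)
tweets = [
    {"id": 1, "lat": 42.37, "lon": -71.10},  # about a mile away
    {"id": 2, "lat": 40.71, "lon": -74.00},  # far outside the radius
]
print([t["id"] for t in tweets_near(tweets, campus)])
```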