
‘When Lolcat meets Philosoraptors’: An analysis of online social dynamics using images

Bodhisattwa Prasad Majumder (15BM6JP11), Jaideep Karkhanis (15BM6JP18)

Jayanta Mandi (15BM6JP19), Shashank Kumar (15BM6JP43)

The Overall Activity In The Community (Troll Malayalam)

All of the images in the data have been clustered to form meme templates.

Meme Templates

Zipf’s Law

Appearances of different templates
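A Zipf-like pattern can be checked by regressing log frequency on log rank: a slope close to -1 is the classic signature. The sketch below is only an illustration of that check, not the authors' procedure; the function name `zipf_exponent` and the example counts are hypothetical.

```python
import math

def zipf_exponent(counts):
    """Return the negated least-squares slope of log(frequency) vs log(rank)."""
    # Rank templates from most to least frequent, ignoring empty counts.
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

# Hypothetical appearance counts that follow frequency = 1200 / rank exactly.
example_counts = [1200 / r for r in range(1, 51)]
print(round(zipf_exponent(example_counts), 3))
```

An exponent near 1 on real template counts would support the Zipf's Law reading; a simple log-log regression is a rough diagnostic rather than a formal test.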

Meme Arrival Prediction: RPP

• To model the arrival of memes we used the Reinforced Poisson Process (RPP), which is based on the rich-get-richer intuition.

• RPP models the popularity dynamics of a piece of content from its arrival time series by modeling the intrinsic attractiveness and the rate of decay of the content.

• Measuring prediction accuracy on a random set of 10 meme templates, we achieved a Mean Absolute Percentage deviation of 13.15.
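The ingredients named above can be sketched in a few lines. The rate below follows the general RPP form from reference [4]: intrinsic attractiveness times a decay term times a reinforcement term. The log-normal decay, the `prior` constant and all parameter names are assumptions made for illustration; the slides do not state the exact specification that was fitted. The `mape` helper shows how a mean absolute percentage deviation is computed.

```python
import math

def lognormal_pdf(t, mu, sigma):
    """Log-normal density, used here as an assumed decay (aging) function."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))

def rpp_rate(t, arrivals_so_far, attractiveness, mu, sigma, prior=1.0):
    """Arrival rate at time t: attractiveness * decay * reinforcement."""
    # Reinforcement grows with past arrivals: the rich-get-richer term.
    reinforcement = prior + arrivals_so_far
    return attractiveness * lognormal_pdf(t, mu, sigma) * reinforcement

def mape(actual, predicted):
    """Mean absolute percentage deviation, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical observed vs predicted appearance counts for two templates.
print(mape([100, 200], [110, 180]))
```

With every other parameter held fixed, a template that has already appeared more often gets a proportionally higher arrival rate, which is the reinforcement the first bullet refers to.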


Clustering meme templates into temporal profiles

• From the temporal characteristics of the templates we found evidence for eight distinct profiles.

• The features used to generate the temporal profiles are: lifespan, mean arrival time, IQR of arrival time, maximum appearances per day, mean appearances per day, and variance of the number of appearances per day.
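The six features can be computed from the arrival times of a single template. This is a minimal sketch under assumptions the slides do not spell out: arrival times are given in days since the start of the data, daily counts include the zero-count days inside the template's lifespan, and percentiles use linear interpolation. The function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pvariance

def percentile(sorted_values, q):
    """Linearly interpolated percentile of a sorted list, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def temporal_features(arrival_days):
    """Temporal features for one template from its arrival times (in days)."""
    times = sorted(arrival_days)
    first_day = math.floor(times[0])
    last_day = math.floor(times[-1])
    per_day = Counter(math.floor(t) for t in times)
    # Count appearances for every calendar day between first and last arrival.
    daily_counts = [per_day.get(d, 0) for d in range(first_day, last_day + 1)]
    return {
        "lifespan": times[-1] - times[0],
        "mean_arrival_time": mean(times),
        "iqr_arrival_time": percentile(times, 75) - percentile(times, 25),
        "max_per_day": max(daily_counts),
        "mean_per_day": mean(daily_counts),
        "variance_per_day": pvariance(daily_counts),
    }

# Hypothetical template seen twice on day 0, once on day 2 and once on day 4.
print(temporal_features([0.0, 0.5, 2.0, 4.0]))
```

Feature vectors of this kind, one per template, are what a clustering step would then group into profiles.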

Profile Identification

Cluster_id  lifespan     mean_appearence  IQR          max_appearence_per_day  mean_appearence_per_day  variance_of_appearence__frequency_per_day
1           0.198293672  0.124086078      0.009514603  0.043582375             0.001633                 0.000230229
2           0.739070804  0.763098891      0            0.034482759             0.000212441              2.45E-05
3           0.044207932  0.027915294      0.000780965  0.042821406             0.009104218              0.001159038
4           0.001135672  8.43E-05         1.10E-07     0.03588337              0.073150402              0.00011302
5           0.746343489  0.343120768      0.546262826  0.036511156             0.000348047              4.18E-05
6           0.42752443   0.356217302      0.032258219  0.035057471             0.000453536              5.28E-05
7           0.004399919  0.000485943      2.47E-05     0.159382518             0.270711471              0.017347505
8           0.803442073  0.149837737      0.143304667  0.069480185             0.001269147              0.000251875

Short description of each profile:

1. Neither the lifespan nor the mean arrival time is very low

2. Mostly persistent but very infrequent: these templates remain in the system memory but are very rarely used

3. Transient profiles: low lifespan, but not as low as spam images

4. Singleton templates that appeared only once

5. Persistent, but with a higher mean inter-arrival time than classic memes

6. Medium lifespan and medium inter-arrival time

7. Spam images, which appear more than once a day but have a very short lifespan

8. Persistent and classic memes, with the highest lifespan

Examples and verification

[Figure: example images per profile. Panels: Spam; Persistent and classic; Persistent but appears less than classic; Persistent but infrequent; Misclassified as Spam]

Observation: Some images are used like spam but are not spam. Temporal characteristics alone cannot separate them, but once the like metadata is considered they are clearly identifiable.

For spam, the 90th percentile of likes is 12, whereas for non-spam images that are transient and have high mean appearances per day, the 90th percentile of likes is 674.

Meme Usage Curve: Alive and Dead memes

• As discussed earlier, the memes are clustered into 8 profiles.

• For each cluster, the ECDF of the lifetime is calculated.

• For each appearance of a meme, the active life is assumed to follow the ECDF of the profile containing the meme.
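One way to read the steps above: build an empirical CDF from the lifetimes in a profile, then treat 1 - ECDF(age) as the probability that an appearance of that age is still alive. Summing this over past appearances gives an expected number of active memes at a given time. This reading, and the names `make_ecdf` and `expected_active`, are assumptions for illustration only.

```python
from bisect import bisect_right

def make_ecdf(lifetimes):
    """Return the empirical CDF of the observed lifetimes as a function."""
    ordered = sorted(lifetimes)
    n = len(ordered)

    def ecdf(x):
        # Fraction of observed lifetimes that are <= x.
        return bisect_right(ordered, x) / n

    return ecdf

def expected_active(t, appearance_times, ecdf):
    """Expected number of past appearances still active at time t."""
    # An appearance at time s has age t - s; it survives with prob 1 - ECDF(age).
    return sum(1.0 - ecdf(t - s) for s in appearance_times if s <= t)

# Hypothetical lifetimes (in days) for one profile, and three appearances.
profile_ecdf = make_ecdf([1, 2, 3, 4])
print(expected_active(3, [0, 2, 5], profile_ecdf))
```

Evaluating `expected_active` over a grid of days traces out a usage curve for the memes of a profile.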

ECDF approach

Meme Usage Curve

Meme Usage Curve: Post Images

Cumulative Meme Usage Curve: Post Images

Global vs Local: Usage Plot

Global vs Local: Vocabulary growth curve

Global vs Local: Active meme per day

Global vs Local: Comparative measures

Aside from the vocabulary growth curve, we define two other metrics to compare Global and Local memes.

For Global memes: mean inter-arrival time of 1.693241 days

For Local memes: mean inter-arrival time of 0.2808407 days

Popularity Index (the share of memes whose likes exceed the 90th percentile of the like distribution):

3.63% of Global memes are popular, whereas 12.4% of Local memes are popular
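Both measures are simple to compute once arrival times and like counts are available. The sketch below assumes the 90th-percentile threshold has already been computed from the overall like distribution and is passed in as `threshold`. Whether the inter-arrival mean is taken per meme or over the pooled group is not stated in the slides, so the function simply works on whatever list of times it is given. All names and numbers are hypothetical.

```python
def mean_interarrival(arrival_days):
    """Mean gap, in days, between consecutive arrivals."""
    times = sorted(arrival_days)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

def popularity_index(group_likes, threshold):
    """Percentage of memes in a group whose likes exceed the threshold."""
    popular = sum(1 for likes in group_likes if likes > threshold)
    return 100.0 * popular / len(group_likes)

# Hypothetical arrival days and like counts.
print(mean_interarrival([0, 1, 3]))
print(popularity_index([5, 50, 500, 5000], threshold=100))
```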

Rationale: Like Prediction of a post

We have observed that post images tend to get more likes than comment images.

The question is: can we predict the popularity of a post from its comment stream?

The comment stream involves user interaction, and the experiment looks for information hidden in the temporal evolution and popularity of the comment stream that can predict the popularity of the post.

Like Prediction of a post

Response Variable = Popularity of the post, where the post likes have been used as a proxy for popularity.

We defined 4 popularity zones based on the likes distribution.

Zones: 0-1000, 2000-6000, 6000-11000, > 11000

Features:

Max_likes = Maximum likes received by any comment under the post

Mean_likes = Average number of likes received by all comments

Total_likes = Total number of likes received by all comments under the post

Total_comments = Total number of comments on the post

Unique_users = Total number of unique users who commented on the post

Earliest_time = Time of the earliest comment

Latest_time = Time of the latest comment

Post_popularity = Aggregate of the popularity index of all users who commented on the post

Post_day = The day of the week the post was posted

Post_time = The time of day the post was posted (factored into time slots, e.g. early morning, late night, evening, noon)

User popularity is derived from the commenting behavior of a particular user: a score is computed from the likes the user receives for each comment.
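The comment-level features can be derived from a plain list of comment records. The sketch below covers only the features whose definitions are unambiguous; Post_popularity, Post_day and Post_time are left out because their exact scoring and time slots are not specified. The record keys `user`, `likes` and `time` are hypothetical.

```python
def comment_features(comments):
    """Aggregate one post's comments into model features.

    comments: list of dicts with hypothetical keys 'user', 'likes', 'time'.
    """
    likes = [c["likes"] for c in comments]
    times = [c["time"] for c in comments]
    return {
        "Max_likes": max(likes),
        "Mean_likes": sum(likes) / len(likes),
        "Total_likes": sum(likes),
        "Total_comments": len(comments),
        "Unique_users": len({c["user"] for c in comments}),
        "Earliest_time": min(times),
        "Latest_time": max(times),
    }

# Hypothetical comment stream for a single post (time in hours after posting).
example = [
    {"user": "a", "likes": 10, "time": 0.5},
    {"user": "b", "likes": 30, "time": 2.0},
    {"user": "a", "likes": 20, "time": 7.5},
]
print(comment_features(example))
```

One such feature row per post, paired with the post's like zone, would form the training table for a classifier.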

Results (Classifier used: Boosted Trees)

Result taking all comments (23K): Test Accuracy = 75.7%

Result taking only image comments (11K): Test Accuracy = 77.28%

The baseline model was an SVM, which gave ~60% accuracy.

Conclusion

• The reappearance frequency of memes follows a pattern similar to Zipf’s Law

• The arrival times of individual templates can be predicted by a Reinforced Poisson Process

• Based on temporal characteristics, templates can be profiled

• The growth of the Post vocabulary is almost saturating

• Borrowing of ideas from the Global subculture can be seen at the inception, but later Local memes have taken over in usage and popularity

• Finally, the popularity of a post image can be predicted from image comments alone, which shows the predictive power of image comments over text comments

References

[1] Monojit Choudhury and Animesh Mukherjee, The Structure and Dynamics of Linguistic Networks

[2] Bradley E. Wiggins and G. Bret Bowers, Memes as genre: A structurational analysis of the memescape

[3] Lada A. Adamic et al., Information Evolution in Social Networks

[4] Hua-Wei Shen et al., Modeling and Predicting Popularity Dynamics via Reinforced Poisson Processes
