‘When Lolcat meets Philosoraptors’: An analysis of online social dynamics using images
Bodhisattwa Prasad Majumder (15BM6JP11), Jaideep Karkhanis (15BM6JP18), Jayanta Mandi (15BM6JP19), Shashank Kumar (15BM6JP43)
-
The Overall Activity In The Community (Troll Malayalam)
-
All images in the dataset have been clustered to form meme templates.
Meme Templates
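The slides do not say how images were grouped into templates. As one hedged illustration, the sketch below uses a swapped-in technique, average (perceptual) hashing followed by greedy grouping by Hamming distance. It assumes each image has already been downscaled to a small grayscale grid; `average_hash`, `cluster_templates` and the `max_distance` threshold are hypothetical names and values, not the authors' method.

```python
from typing import List

# Hypothetical input format: a small grayscale grid (values 0-255),
# i.e. an image that has already been downscaled, e.g. to 8x8.
Grid = List[List[int]]


def average_hash(pixels: Grid) -> int:
    """Bit mask with a 1 wherever a pixel is brighter than the grid mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def cluster_templates(hashes: List[int], max_distance: int = 5) -> List[int]:
    """Greedy grouping: an image joins the first template whose representative
    hash is within max_distance bits, otherwise it starts a new template.
    Returns one template id per image."""
    representatives: List[int] = []
    labels: List[int] = []
    for h in hashes:
        for tid, rep in enumerate(representatives):
            if hamming(h, rep) <= max_distance:
                labels.append(tid)
                break
        else:
            representatives.append(h)
            labels.append(len(representatives) - 1)
    return labels
```

Greedy grouping keeps the sketch short; a real pipeline would more likely compare against all representatives or use a proper clustering algorithm.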
-
Zipf’s Law
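A Zipf-like pattern shows up as a roughly straight line when template frequency is plotted against frequency rank on log-log axes. A minimal sketch of estimating that slope by least squares, assuming the input holds one template id per appearance (`zipf_fit` is a hypothetical name):

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def zipf_fit(template_ids: Iterable[str]) -> Tuple[float, float]:
    """Least-squares fit of log(frequency) = c - alpha * log(rank).

    Returns (alpha, c); classic Zipf behaviour corresponds to alpha near 1.
    """
    freqs = sorted(Counter(template_ids).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct templates")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, my - slope * mx
```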
-
Appearances of different templates
-
Meme Arrival Prediction: RPP
• To model the arrival of memes, we used the Reinforced Poisson Process (RPP) [4], which is based on the rich-get-richer intuition.
• RPP models the popularity dynamics of a piece of content from its time-series arrival data by modeling the content's intrinsic attractiveness and rate of decay.
• On a random set of 10 meme templates, the prediction achieved a mean absolute percentage deviation of 13.15%.
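The RPP prediction step can be sketched as follows. This follows our reading of Shen et al. [4], which should be checked against the paper: arrivals are reinforced by the count so far plus a prior m, and decay through a log-normal relaxation function F(t; μ, σ), giving an expected cumulative count of (m + n_T)·exp(λ(F(t) − F(T))) − m after observing n_T arrivals by time T, where λ is the intrinsic attractiveness. Parameter fitting is omitted, and the function names and the choice of m are assumptions. The `mape` helper shows how a mean absolute percentage deviation such as the one reported above is computed.

```python
import math
from typing import Sequence


def lognormal_cdf(t: float, mu: float, sigma: float) -> float:
    """F(t; mu, sigma): cumulative log-normal relaxation (aging), 0 for t <= 0."""
    if t <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def rpp_predict(n_observed: int, t_obs: float, t_future: float,
                lam: float, mu: float, sigma: float, m: float) -> float:
    """Expected cumulative arrivals at t_future, given n_observed arrivals by t_obs.

    lam: intrinsic attractiveness; (mu, sigma): log-normal decay parameters;
    m: prior count in the reinforcement term (assumed hyperparameter).
    """
    growth = lam * (lognormal_cdf(t_future, mu, sigma) - lognormal_cdf(t_obs, mu, sigma))
    return (m + n_observed) * math.exp(growth) - m


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)
```

At t_future = t_obs the prediction reduces to the observed count, and it grows monotonically afterwards because F is non-decreasing.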
-
Meme Arrival Prediction: RPP
-
Clustering meme templates into temporal profiles
• From the temporal characteristics, we found eight profiles with distinct temporal behavior.
• The features used to generate the temporal profiles are: lifespan, mean arrival time, IQR of arrival time, maximum appearances per day, mean appearances per day, and variance of the number of appearances per day.
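A sketch of computing the six features for one template from its arrival times in days. Two readings are assumptions: "mean arrival time" and "IQR of arrival time" are treated as statistics of the inter-arrival gaps, and features are min-max rescaled across templates, which is inferred only from the 0 to 1 range of the profile table. The clustering step itself is not specified in the slides and is left out; all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def _quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of a sorted, non-empty sequence (0 <= q <= 1)."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def temporal_features(arrivals_days: Sequence[float]) -> Dict[str, float]:
    """Six temporal features of one template from its arrival times (days)."""
    t = sorted(arrivals_days)
    gaps = [b - a for a, b in zip(t, t[1:])] or [0.0]  # singleton: no gaps
    gaps.sort()
    per_day = Counter(math.floor(x) for x in t)
    first, last = math.floor(t[0]), math.floor(t[-1])
    # Include days with zero appearances inside the lifespan.
    counts = [per_day.get(d, 0) for d in range(first, last + 1)]
    mean_c = sum(counts) / len(counts)
    return {
        "lifespan": t[-1] - t[0],
        "mean_inter_arrival": sum(gaps) / len(gaps),
        "iqr_inter_arrival": _quantile(gaps, 0.75) - _quantile(gaps, 0.25),
        "max_per_day": float(max(counts)),
        "mean_per_day": mean_c,
        "var_per_day": sum((c - mean_c) ** 2 for c in counts) / len(counts),
    }


def min_max_normalize(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Rescale each feature to [0, 1] across templates (constant features become 0)."""
    keys = list(rows[0].keys())
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [{k: (r[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0 for k in keys}
            for r in rows]
```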
-
Profile Identification
Cluster  Lifespan     Mean appearance  IQR          Max appearances/day  Mean appearances/day  Variance of appearances/day
1        0.198293672  0.124086078      0.009514603  0.043582375          0.001633              0.000230229
2        0.739070804  0.763098891      0            0.034482759          0.000212441           2.45E-05
3        0.044207932  0.027915294      0.000780965  0.042821406          0.009104218           0.001159038
4        0.001135672  8.43E-05         1.10E-07     0.03588337           0.073150402           0.00011302
5        0.746343489  0.343120768      0.546262826  0.036511156          0.000348047           4.18E-05
6        0.42752443   0.356217302      0.032258219  0.035057471          0.000453536           5.28E-05
7        0.004399919  0.000485943      2.47E-05     0.159382518          0.270711471           0.017347505
8        0.803442073  0.149837737      0.143304667  0.069480185          0.001269147           0.000251875
Short description of each profile:
1. Lifespan and mean arrival time that are not particularly low.
2. Mostly persistent but very infrequent: these templates remain in the community's memory but are rarely used.
3. Short lifespan, though not as short as spam images; transient profiles.
4. Singleton templates that appeared only once.
5. Persistent, but with a higher mean inter-arrival time than classic memes.
6. Medium lifespan and medium inter-arrival time.
7. Spam images that appear more than once a day but have a very short lifespan.
8. Persistent, classic memes with the longest lifespan.
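If the rows of the profile table are read as cluster centroids in the same rescaled feature space (an assumption; the slides do not say how the values were obtained), a new template can be assigned to the nearest profile by Euclidean distance. `CENTROIDS`, `PROFILE_LABELS` and `nearest_profile` are illustrative names, and the labels paraphrase the profile descriptions.

```python
import math
from typing import Dict, Sequence, Tuple

# Rows of the profile table, read as cluster centroids (assumption).
# Column order: lifespan, mean appearance, IQR, max appearances/day,
# mean appearances/day, variance of appearances/day.
CENTROIDS: Dict[int, Tuple[float, ...]] = {
    1: (0.198293672, 0.124086078, 0.009514603, 0.043582375, 0.001633, 0.000230229),
    2: (0.739070804, 0.763098891, 0.0, 0.034482759, 0.000212441, 2.45e-05),
    3: (0.044207932, 0.027915294, 0.000780965, 0.042821406, 0.009104218, 0.001159038),
    4: (0.001135672, 8.43e-05, 1.10e-07, 0.03588337, 0.073150402, 0.00011302),
    5: (0.746343489, 0.343120768, 0.546262826, 0.036511156, 0.000348047, 4.18e-05),
    6: (0.42752443, 0.356217302, 0.032258219, 0.035057471, 0.000453536, 5.28e-05),
    7: (0.004399919, 0.000485943, 2.47e-05, 0.159382518, 0.270711471, 0.017347505),
    8: (0.803442073, 0.149837737, 0.143304667, 0.069480185, 0.001269147, 0.000251875),
}

# Short labels paraphrasing the profile descriptions.
PROFILE_LABELS = {
    1: "lifespan and arrival time not particularly low",
    2: "persistent but infrequent",
    3: "transient",
    4: "singleton",
    5: "persistent, less frequent than classic",
    6: "medium lifespan and inter-arrival time",
    7: "spam",
    8: "persistent and classic",
}


def nearest_profile(features: Sequence[float]) -> int:
    """Cluster id whose centroid is closest in Euclidean distance."""
    return min(CENTROIDS, key=lambda cid: math.dist(features, CENTROIDS[cid]))
```

For example, `PROFILE_LABELS[nearest_profile((0.005, 0.0005, 0.0, 0.16, 0.27, 0.017))]` evaluates to `"spam"`.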
-
Examples and verification
[Figure: example templates from the Spam and Persistent and classic profiles]
-
Examples and verification
[Figure: example templates from the Spam, Persistent but appears less than classic, and Persistent but infrequent profiles]
-
Examples and verification
[Figure: example templates labelled Spam and Misclassified as Spam]
Observation: Some images are used like spam and look like spam when only their temporal characteristics are considered, yet they are not spam; once the likes metadata is considered, they are clearly distinguishable.
For spam, the 90th percentile of likes is 12, whereas for non-spam images that are transient and have high mean appearances per day, the 90th percentile of likes is 674.
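The likes-based check can be sketched as a 90th-percentile test on each template's likes. The percentile uses linear interpolation between ranks (one of several common definitions), and the `cutoff` of 100 in the hypothetical `looks_like_spam` is chosen only to sit between the two reported percentiles; it is not a threshold from the study.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), linearly interpolated between closest ranks."""
    s = sorted(values)
    if not s:
        raise ValueError("empty sequence")
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def looks_like_spam(likes: Sequence[int], cutoff: float = 100.0) -> bool:
    """Hypothetical rule: a template flagged as spam by its temporal profile
    stays spam only if the 90th percentile of its likes is below `cutoff`."""
    return percentile(likes, 90) < cutoff
```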
-
Meme Usage Curve: Alive and Dead memes
• As previously discussed, the memes are clustered into 8 profiles.
• For each cluster, the ECDF of the lifetime is calculated.
• For each appearance of a meme, its active life is assumed to follow the lifetime ECDF of the profile containing that meme.
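One way to read this assumption in code: an appearance at time a of a meme in profile p counts as still alive at time t ≥ a with probability 1 − F_p(t − a), where F_p is the lifetime ECDF of profile p; summing over appearances gives an expected usage curve. This interpretation and the function names are assumptions, not the authors' code.

```python
import bisect
from typing import Callable, Dict, Sequence, Tuple


def ecdf(samples: Sequence[float]) -> Callable[[float], float]:
    """Return F(x) = fraction of samples <= x."""
    s = sorted(samples)
    n = len(s)
    return lambda x: bisect.bisect_right(s, x) / n


def expected_alive(appearances: Sequence[Tuple[float, int]],
                   lifetimes_by_profile: Dict[int, Sequence[float]],
                   t: float) -> float:
    """Expected number of appearances still alive at time t.

    appearances: (appearance time, profile id) pairs.
    lifetimes_by_profile: observed lifetimes of the memes in each profile.
    """
    cdfs = {p: ecdf(v) for p, v in lifetimes_by_profile.items()}
    total = 0.0
    for a, p in appearances:
        if t >= a:  # appearances in the future contribute nothing yet
            total += 1.0 - cdfs[p](t - a)
    return total
```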
-
ECDF approach
-
Meme Usage Curve
-
Meme Usage Curve: Post Images
-
Cumulative Meme Usage Curve: Post Images
-
Global vs Local: Usage Plot
-
Global vs Local: Vocabulary growth curve
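A vocabulary growth curve counts the distinct templates seen after each successive appearance, so a flattening curve indicates saturation. A minimal sketch, assuming appearances are given in time order; `recent_growth` is a hypothetical helper measuring new templates per appearance over the last `window` appearances.

```python
from typing import Iterable, List


def vocabulary_growth(template_ids: Iterable[str]) -> List[int]:
    """Distinct templates seen after each appearance (appearances in time order)."""
    seen = set()
    curve: List[int] = []
    for tid in template_ids:
        seen.add(tid)
        curve.append(len(seen))
    return curve


def recent_growth(curve: List[int], window: int) -> float:
    """New templates per appearance over the last `window` appearances;
    values near 0 suggest the vocabulary is saturating."""
    if not 0 < window < len(curve):
        raise ValueError("window must be between 1 and len(curve) - 1")
    return (curve[-1] - curve[-1 - window]) / window
```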
-
Global vs Local: Active meme per day
-
Global vs Local: Comparative measures
Aside from the vocabulary growth curve, we define two other metrics to compare Global and Local memes.
Mean inter-arrival time: 1.693241 days for Global memes, 0.2808407 days for Local memes
Popularity index (the share of memes whose likes exceed the 90th percentile of the like distribution): 3.63% of Global memes are popular, whereas 12.4% of Local memes are popular
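Both comparative measures are straightforward once each meme appearance is tagged Global or Local. In this sketch, mean inter-arrival time is taken over the pooled, sorted arrival times of a group, and the 90th percentile is linearly interpolated; both choices are assumptions, as are the function names.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), linearly interpolated between closest ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def mean_inter_arrival(times_days: Sequence[float]) -> float:
    """Mean gap (days) between consecutive arrivals of a group's memes."""
    t = sorted(times_days)
    if len(t) < 2:
        raise ValueError("need at least two arrivals")
    gaps = [b - a for a, b in zip(t, t[1:])]
    return sum(gaps) / len(gaps)


def popularity_index(group_likes: Sequence[float], all_likes: Sequence[float]) -> float:
    """Percentage of a group's memes whose likes exceed the overall 90th percentile."""
    threshold = percentile(all_likes, 90)
    return 100.0 * sum(1 for x in group_likes if x > threshold) / len(group_likes)
```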
-
Rationale: Like Prediction of a post
We have observed that post images tend to get more likes than comment images.
The question is whether we can predict the popularity of a post from its comment stream.
The comment stream captures user interaction; the experiment looks for information hidden in the temporal evolution and popularity of the comment stream that can predict the popularity of the post.
-
Like Prediction of a post
Response variable = popularity of the post, with the number of post likes used as a proxy for popularity. We defined 4 popularity zones based on the likes distribution.
Zones: 0-1000, 2000-6000, 6000-11000, > 11000
Features:
• Max_likes = maximum number of likes received by any comment under the post
• Mean_likes = average number of likes received by the comments under the post
• Total_likes = total number of likes received by all comments under the post
• Total_comments = total number of comments on the post
• Unique_users = number of unique users who commented on the post
• Earliest_time = time of the earliest comment
• Latest_time = time of the latest comment
• Post_popularity = aggregate popularity index of all users who commented on the post
• Post_day = day of the week on which the post was made
• Post_time = time of day at which the post was made, binned into time slots (e.g. early morning, noon, evening, late night)
User popularity is derived from each user's commenting behavior: a user's score is based on the likes they receive for each comment.
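A sketch of building the comment-stream features for one post. Several details are assumptions: user popularity is taken as mean likes per comment, Post_popularity sums the scores of distinct commenters, and the time-slot boundaries are invented for illustration since the slides only name example slots. The `Comment` type and helper names are hypothetical; the resulting dictionary would then be passed to a classifier such as boosted trees.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical slot boundaries: (start hour inclusive, end hour exclusive, label).
SLOTS = [(0, 5, "late night"), (5, 9, "early morning"), (9, 12, "morning"),
         (12, 15, "noon"), (15, 18, "afternoon"), (18, 22, "evening"),
         (22, 24, "late night")]


@dataclass
class Comment:
    user: str
    likes: int
    time: datetime


def time_slot(hour: int) -> str:
    return next(label for start, end, label in SLOTS if start <= hour < end)


def user_scores(comments: List[Comment]) -> Dict[str, float]:
    """Assumed user popularity score: mean likes per comment for each user."""
    likes_by_user: Dict[str, List[int]] = {}
    for c in comments:
        likes_by_user.setdefault(c.user, []).append(c.likes)
    return {u: sum(v) / len(v) for u, v in likes_by_user.items()}


def post_features(post_time: datetime, comments: List[Comment],
                  user_score: Dict[str, float]) -> Dict[str, object]:
    if not comments:
        raise ValueError("post has no comments")
    likes = [c.likes for c in comments]
    times = [c.time for c in comments]
    users = {c.user for c in comments}
    return {
        "Max_likes": max(likes),
        "Mean_likes": sum(likes) / len(likes),
        "Total_likes": sum(likes),
        "Total_comments": len(comments),
        "Unique_users": len(users),
        "Earliest_time": min(times),
        "Latest_time": max(times),
        # Assumption: "aggregate" = sum over distinct commenters.
        "Post_popularity": sum(user_score.get(u, 0.0) for u in users),
        "Post_day": DAYS[post_time.weekday()],
        "Post_time": time_slot(post_time.hour),
    }
```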
-
Results (classifier used: Boosted Trees)
• All comments (23K): test accuracy = 75.7%
• Only image comments (11K): test accuracy = 77.28%
The baseline model was an SVM, which gave ~60% accuracy.
-
Conclusion
• The reappearance frequency of memes follows a pattern similar to Zipf’s Law.
• The arrival times of individual templates can be predicted by a Reinforced Poisson Process.
• Based on temporal characteristics, templates can be profiled.
• The growth of the Post vocabulary is almost saturating.
• Borrowing of ideas from the Global subculture can be seen at the inception, but Local memes later took over in both usage and popularity.
• Finally, the popularity of a post image can be predicted from image comments alone, which shows the predictive power of image comments over text comments.
-
References
[1] Monojit Choudhury and Animesh Mukherjee, The Structure and Dynamics of Linguistic Networks
[2] Bradley E. Wiggins and G. Bret Bowers, Memes as genre: A structurational analysis of the memescape
[3] Lada A. Adamic et al., Information Evolution in Social Networks
[4] Hua-Wei Shen et al., Modeling and Predicting Popularity Dynamics via Reinforced Poisson Processes