‘When Lolcat meets Philosoraptors’: An analysis of online social dynamics using images

Bodhisattwa Prasad Majumder (15BM6JP11), Jaideep Karkhanis (15BM6JP18), Jayanta Mandi (15BM6JP19), Shashank Kumar (15BM6JP43)



  • The Overall Activity In The Community (Troll Malayalam)

  • All images in the data have been clustered to form meme templates.

    Meme Templates

  • Zipf’s Law

  • Appearances of different templates
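One common way to check for a Zipf-like pattern in template appearances is to regress log frequency on log rank and read off the slope. The sketch below is only an illustration of that check, not the procedure used in the slides; the function name `zipf_exponent` and the example counts are hypothetical.

```python
import math

def zipf_exponent(counts):
    """Estimate s in frequency ~ rank**(-s) by least squares on log-log axes."""
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two templates with nonzero counts")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying curve; return its magnitude.
    return -sxy / sxx

# Hypothetical appearance counts that follow f(r) = 1200 / r exactly.
example_counts = [1200 / r for r in range(1, 51)]
print(round(zipf_exponent(example_counts), 3))
```

An exponent near 1 corresponds to the classic Zipf shape; real appearance counts would only follow it approximately.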

  • Meme Arrival Prediction: RPP

    • To model the arrival of memes, we used a Reinforced Poisson Process (RPP), which builds on the intuition that the rich get richer.

    • RPP models the popularity dynamics of a piece of content from its arrival time series by modeling the content's intrinsic attractiveness and rate of decay.

    • On a random set of 10 meme templates, the predictions achieved a mean absolute percentage deviation of 13.15.
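The following sketch illustrates the two ingredients named above. It assumes the rate function from the Reinforced Poisson Processes paper by Shen et al. (attractiveness times a log-normal relaxation term times a reinforcement term) and the standard definition of mean absolute percentage deviation. The slides do not state which relaxation function or parameter values were used, so every name and value here is an assumption.

```python
import math

def lognormal_relaxation(t, mu, sigma):
    """Log-normal density in t; models how attention decays as a template ages."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))

def rpp_rate(t, prior_arrivals, attractiveness, mu, sigma, m=1.0):
    """Arrival rate at age t: attractiveness * relaxation * reinforcement."""
    reinforcement = m + prior_arrivals  # rich-get-richer term
    return attractiveness * lognormal_relaxation(t, mu, sigma) * reinforcement

def mean_abs_pct_deviation(actual, predicted):
    """Mean of |actual - predicted| / |actual|, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical observed and predicted appearance counts for two templates.
print(mean_abs_pct_deviation([100, 200], [110, 180]))
```

With all other parameters fixed, a template with more prior arrivals gets a proportionally higher rate, which is the reinforcement effect the first bullet describes.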

  • Meme Arrival Prediction: RPP

  • Clustering meme templates into temporal profiles

    • From the temporal characteristics of the templates, we found eight distinct profiles.

    • The features used to generate the temporal profiles are: lifespan, mean arrival time, IQR of arrival time, maximum appearances per day, mean appearances per day, and variance of the number of appearances per day.
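As a rough illustration, the six features listed above could be computed from one template's appearance times as follows. This is a sketch under stated assumptions: times are given in days since the start of the data, per-day statistics include the zero-appearance days between the first and last appearance, and the function name and dictionary keys are hypothetical. Any scaling applied before clustering is not shown.

```python
from collections import Counter
from statistics import mean, pvariance, quantiles

def temporal_features(arrival_days):
    """Raw temporal features for one template.

    arrival_days: non-negative appearance times in days since the start of
    the data (a hypothetical input format).
    """
    days = sorted(arrival_days)
    if not days:
        raise ValueError("a template needs at least one appearance")
    # quantiles() needs two or more points; a singleton has no spread.
    if len(days) >= 2:
        q1, _, q3 = quantiles(days, n=4)
        iqr = q3 - q1
    else:
        iqr = 0.0
    # Count appearances on every day from first to last, zeros included.
    counts = Counter(int(d) for d in days)
    per_day = [counts.get(d, 0) for d in range(int(days[0]), int(days[-1]) + 1)]
    return {
        "lifespan": days[-1] - days[0],
        "mean_arrival_time": mean(days),
        "iqr_arrival_time": iqr,
        "max_per_day": max(per_day),
        "mean_per_day": mean(per_day),
        "var_per_day": pvariance(per_day),
    }

# Hypothetical template seen twice on day 0 and once on day 2.
print(temporal_features([0.0, 0.5, 2.0]))
```

The resulting feature vectors, one per template, would then be passed to a clustering algorithm; the slides do not name which one was used.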

  • Profile Identification

    Cluster  Lifespan     Mean arrival time  IQR of arrival time  Max appearances/day  Mean appearances/day  Variance of appearances/day
    1        0.198293672  0.124086078        0.009514603          0.043582375          0.001633              0.000230229
    2        0.739070804  0.763098891        0                    0.034482759          0.000212441           2.45E-05
    3        0.044207932  0.027915294        0.000780965          0.042821406          0.009104218           0.001159038
    4        0.001135672  8.43E-05           1.10E-07             0.03588337           0.073150402           0.00011302
    5        0.746343489  0.343120768        0.546262826          0.036511156          0.000348047           4.18E-05
    6        0.42752443   0.356217302        0.032258219          0.035057471          0.000453536           5.28E-05
    7        0.004399919  0.000485943        2.47E-05             0.159382518          0.270711471           0.017347505
    8        0.803442073  0.149837737        0.143304667          0.069480185          0.001269147           0.000251875

    Short description of each profile:

    1. Lifespan and mean arrival time that are low, but not extremely low.
    2. Mostly persistent but very infrequent: these templates stay in the system's memory but are rarely used.
    3. Transient profiles: low lifespan, but not as low as spam images.
    4. Singleton templates that appeared only once.
    5. Persistent, but with a higher mean inter-arrival time than classic memes.
    6. Medium lifespan and medium inter-arrival time.
    7. Spam images, which appear more than once a day but have a very short lifespan.
    8. Persistent, classic memes with the highest lifespan.

  • Examples and verification

    Spam; Persistent and classic

    Persistent and classic

  • Examples and verification

    Spam; Persistent but appears less often than classic

    Persistent but infrequent

  • Examples and verification

    Spam; Misclassified as spam

    Observation: Some images are used like spam but are not spam. Temporal characteristics alone cannot tell them apart, but the like metadata separates them clearly.

    For spam, the 90th percentile of likes is 12, whereas for non-spam images that are transient and have high mean appearances per day, the 90th percentile of likes is 674.
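The comparison above rests on a per-group 90th percentile of likes. A minimal sketch of that computation, using the nearest-rank percentile definition (an assumption, since the slides do not say which definition was used) and made-up like counts:

```python
import math

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical like counts; not taken from the Troll Malayalam data.
spam_likes = [0, 0, 1, 1, 2, 3, 3, 5, 7, 9]
transient_non_spam_likes = [15, 30, 60, 95, 140, 220, 310, 400, 520, 800]

p90_spam = percentile_nearest_rank(spam_likes, 90)
p90_non_spam = percentile_nearest_rank(transient_non_spam_likes, 90)
print(p90_spam, p90_non_spam)
```

A large gap between the two percentiles is what lets likes act as a second signal when the temporal features of the two groups overlap.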

  • Meme Usage Curve: Alive and Dead memes

    • As discussed previously, the memes are clustered into eight profiles.

    • For each cluster, the ECDF of the lifetime is calculated.

    • For each appearance of a meme, the active life is assumed to follow the ECDF of the profile containing the meme.
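One possible reading of the ECDF approach is sketched below: build the empirical CDF of lifetimes for a profile, then treat `1 - ECDF(age)` as the probability that an appearance is still active at a given age. This interpretation, the function names, and the sample lifetimes are all assumptions made for illustration.

```python
from bisect import bisect_right

def make_ecdf(lifetimes):
    """Return a function x -> fraction of observed lifetimes that are <= x."""
    if not lifetimes:
        raise ValueError("lifetimes must be non-empty")
    ordered = sorted(lifetimes)
    n = len(ordered)

    def ecdf(x):
        return bisect_right(ordered, x) / n

    return ecdf

def expected_active(day, appearance_days, ecdf):
    """Expected number of past appearances still active on `day`."""
    return sum(1.0 - ecdf(day - t0) for t0 in appearance_days if t0 <= day)

# Hypothetical lifetimes (in days) of the templates in one profile.
profile_ecdf = make_ecdf([1, 2, 3, 4])
print(expected_active(2, [0, 1, 5], profile_ecdf))
```

Evaluating `expected_active` for every day in the observation window gives a usage curve in which each appearance fades out according to its profile's lifetime distribution.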

  • ECDF approach

  • Meme Usage Curve

  • Meme Usage Curve: Post Images

  • Cumulative Meme Usage Curve: Post Images

  • Global vs Local: Usage Plot

  • Global vs Local: Vocabulary growth curve

  • Global vs Local: Active meme per day

  • Global vs Local: Comparative measures

    Aside from the vocabulary growth curve, we define two other metrics to compare global and local memes.

    Mean inter-arrival time: 1.693241 days for global memes, 0.2808407 days for local memes.

    Popularity index (a meme counts as popular if its likes exceed the 90th percentile of the like distribution): 3.63% of global memes are popular, whereas 12.4% of local memes are popular.
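Both measures are simple to compute once arrival times and like counts are available per group. The sketch below assumes the threshold is the 90th percentile of the pooled like distribution (nearest-rank definition) and that inter-arrival times are gaps between consecutive appearances; the helper names and data are hypothetical.

```python
import math

def mean_interarrival(arrival_days):
    """Mean gap, in days, between consecutive appearances."""
    days = sorted(arrival_days)
    if len(days) < 2:
        raise ValueError("need at least two appearances")
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    return sum(gaps) / len(gaps)

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def popularity_index(group_likes, all_likes, pct=90):
    """Percent of memes in the group whose likes exceed the pooled percentile."""
    threshold = percentile_nearest_rank(all_likes, pct)
    popular = sum(1 for likes in group_likes if likes > threshold)
    return 100.0 * popular / len(group_likes)

# Hypothetical data: pooled likes 1..20 and one small group of four memes.
pooled_likes = list(range(1, 21))
group_likes = [5, 19, 20, 1]
print(mean_interarrival([0, 1, 4]), popularity_index(group_likes, pooled_likes))
```

Because the threshold comes from the pooled distribution, one group can sit above and the other below the overall share of popular memes, as the reported 3.63% and 12.4% do.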

  • Rationale : Like Prediction of a post

    We have observed that post images tend to get more likes than comment images.

    The question is whether the popularity of a post can be predicted from its comment stream.

    The comment stream captures user interaction. The experiment looks for information hidden in the temporal evolution and popularity of the comment stream that can predict the popularity of the post.

  • Like Prediction of a post

    Response variable = popularity of the post, where the number of post likes is used as a proxy for popularity. We defined four popularity zones based on the distribution of likes.

    Zones: 0-1000, 2000-6000, 6000-11000, > 11000

    Features:

    • Max_likes = maximum number of likes received by any comment under the post

    • Mean_likes = average number of likes received by all comments

    • Total_likes = total number of likes received by all comments under the post

    • Total_comments = total number of comments on the post

    • Unique_users = number of unique users who commented on the post

    • Earliest_time = time of the earliest comment

    • Latest_time = time of the latest comment

    • Post_popularity = aggregate of the popularity index of all users who commented on the post

    • Post_day = day of the week on which the post was made

    • Post_time = time of day at which the post was made (binned into time slots, e.g. early morning, noon, evening, late night)

    User popularity is derived from the commenting behavior of a particular user: a score is computed from the likes the user receives for each comment.
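The comment-derived features above can be sketched as a single aggregation over a post's comments. The record layout (dictionaries with `user`, `likes`, and `time` keys) is hypothetical, and the user popularity score is assumed here to be the mean likes per comment, summed over commenters for `Post_popularity`; the slides do not give the exact scoring formula.

```python
from collections import defaultdict

def user_popularity(all_comments):
    """Assumed score: mean likes per comment for each user."""
    likes_by_user = defaultdict(list)
    for comment in all_comments:
        likes_by_user[comment["user"]].append(comment["likes"])
    return {user: sum(v) / len(v) for user, v in likes_by_user.items()}

def comment_features(comments, scores):
    """Aggregate one post's comment stream into the features listed above."""
    if not comments:
        raise ValueError("post has no comments")
    likes = [c["likes"] for c in comments]
    times = [c["time"] for c in comments]
    users = {c["user"] for c in comments}
    return {
        "Max_likes": max(likes),
        "Mean_likes": sum(likes) / len(likes),
        "Total_likes": sum(likes),
        "Total_comments": len(comments),
        "Unique_users": len(users),
        "Earliest_time": min(times),
        "Latest_time": max(times),
        # Assumed aggregate: sum of the scores of the distinct commenters.
        "Post_popularity": sum(scores.get(u, 0.0) for u in users),
    }

# Hypothetical comment stream for one post; times are minutes after posting.
example_comments = [
    {"user": "a", "likes": 10, "time": 5},
    {"user": "b", "likes": 2, "time": 1},
    {"user": "a", "likes": 0, "time": 9},
]
example_scores = user_popularity(example_comments)
example_features = comment_features(example_comments, example_scores)
print(example_features)
```

In practice the user scores would be computed over all comments in the data, not just those of one post, and `Post_day` and `Post_time` would be added from the post's own timestamp.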

  • Results (classifier used: Boosted Trees)

    Result taking all comments (23K): test accuracy = 75.7%

    Result taking only image comments (11K): test accuracy = 77.28%

    The baseline model, an SVM, gave ~60% accuracy.

  • Conclusion

    • The reappearance frequency of memes follows a pattern similar to Zipf’s Law.

    • The arrival times of individual templates can be predicted by a Reinforced Poisson Process.

    • Based on temporal characteristics, templates can be profiled.

    • The growth of the post vocabulary is almost saturating.

    • Ideas borrowed from the global subculture are visible at the inception, but local memes later took over in both usage and popularity.

    • Finally, the popularity of a post image can be predicted from image comments alone, which shows the predictive power of image comments over text comments.

  • References

    [1] Monojit Choudhury and Animesh Mukherjee, The Structure and Dynamics of Linguistic Networks

    [2] Bradley E. Wiggins and G. Bret Bowers, Memes as genre: A structurational analysis of the memescape

    [3] Lada A. Adamic et al., Information Evolution in Social Networks

    [4] Hua-Wei Shen et al., Modeling and Predicting Popularity Dynamics via Reinforced Poisson Processes