
REVEALing hidden concepts in social media

Alethiometer: a framework for assessing trustworthiness and content validity in social media

Eva Jaho, Efstratios Tzoannos, Aris Papadopoulos, Nikos Sarris

Motivation and challenge
- 5 Vs of Big Data
- 3 Cs of Veracity

Alethiometer framework






C1 Contributor
What can we find out about the source of information?

Contributor modalities

Reputation
- Analyse comments over time to discover sentiments and opinions towards a source.
- Measured by the number of upvotes or likes.

History
- Information about activity on different social media platforms, combined with validity data.
- Measured by the update frequency of valid posts.

Popularity
- Information about how the source's activity is followed (readings, recommendations).
- Measured by the number of friends/followers, and the number of responses.

Influence
- Information about activities triggered by this source (re-posts, discussions or comments).
- Measured by number of retweets/shares, Klout influence score.

Presence
- Information about type of source (individual, organisation, officially verified account, fake identity, etc.) and its presence on multiple social media platforms.
- Measured by the number of accounts in different social media.

C2 Content
Does the posted content look reliable?

Content modalities

Reputation of linked web content
- Measured in terms of domain reputation, page rank (Google PageRank or Alexa rank), or properties of the contributors to the content.

Provenance
- Finding the original occurrence of the content and its whole path across sources, places and time, and measuring the reputation of these sources.

Popularity
- Information about how many people are following this content.
- Measured by the number of followers, and the number of responses.

Influence
- Analyse whether this content is triggering discussions or other actions in the social sphere.
- Measured by number of retweets/shares.

Originality
- Check whether the content or parts thereof have been used in the past (e.g., reused text or images).

Authenticity
- Check whether the content has been changed with respect to its original state (e.g., changed text or attached multimedia content).

Objectivity and Diversity
- Measured by the variation of opinions found for people, content, or general entities.

C3 Context
Do the 'what', 'when' and 'where' stick together?

Context modalities

Cross-checking
- Measured by the number of different reports or mentions about the same thing coming from independent sources.

Coherence
- Measurement of text coherence (e.g., Coh-Metrix) and coherence between the content and tags, attached web-links, or attached multimedia.

Proximity
- Measurement of coherence between reference location/time and publication location/time.
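The Proximity measurement could be sketched roughly as follows. This is only an illustration, not the framework's actual method: it assumes locations are available as latitude/longitude pairs in degrees and times as datetime objects, and the names `haversine_km` and `proximity` are hypothetical. The slides do not say how the resulting distance and time gap are turned into a rating.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def proximity(ref_loc, ref_time, pub_loc, pub_time):
    """Return (distance in km, absolute time gap in hours) between the
    location/time a post refers to and the location/time it was published."""
    distance = haversine_km(*ref_loc, *pub_loc)
    gap_hours = abs((pub_time - ref_time).total_seconds()) / 3600.0
    return distance, gap_hours
```

Small values for both outputs would suggest that the post was published close to the event it describes, in space and in time.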

How to combine all these parameters?

Approach for rating of modality parameters

Rate each parameter on a 5-point discrete scale, from 0 to 4:
- [0, a0) → 0, [a0, a1) → 1, [a1, a2) → 2, [a2, a3) → 3, [a3, ∞) → 4.
- a0: 20th percentile, a1: 40th percentile, a2: 60th percentile, a3: 80th percentile (the thresholds are set so that the ratings follow a uniform distribution).
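The percentile-based rating can be sketched as below. This is a minimal illustration under stated assumptions: the thresholds a0..a3 are estimated from a sample of observed values with Python's `statistics.quantiles`, the "inclusive" interpolation method is an arbitrary choice, and the function names `quintile_thresholds` and `rate` are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

def quintile_thresholds(values):
    """Return [a0, a1, a2, a3]: the 20th, 40th, 60th and 80th percentiles."""
    return quantiles(values, n=5, method="inclusive")

def rate(value, thresholds):
    """Map a raw value to a rating 0..4 using half-open intervals:
    [0, a0) -> 0, [a0, a1) -> 1, [a1, a2) -> 2, [a2, a3) -> 3, [a3, inf) -> 4."""
    # bisect_right counts the thresholds that are <= value, which is
    # exactly the index of the interval the value falls into.
    return bisect_right(thresholds, value)

# Hypothetical sample of raw parameter values (e.g. follower counts).
sample = list(range(0, 101, 10))
thresholds = quintile_thresholds(sample)
ratings = [rate(v, thresholds) for v in sample]
```

Because the thresholds are quintiles of the observed values, each rating covers roughly one fifth of the population.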

Weight the parameter ratings, either uniformly or according to their significance, to derive a total score.
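One possible way to combine the ratings is a weighted average, sketched below. The slides do not give the combination formula, so normalising by the sum of weights is an assumption, and the function name `total_score`, the parameter names and the weights are hypothetical.

```python
def total_score(ratings, weights=None):
    """Weighted average of parameter ratings (each in 0..4).

    ratings: dict mapping parameter name -> rating.
    weights: dict mapping parameter name -> non-negative weight;
             if omitted, all parameters are weighted uniformly.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if weights is None:
        weights = {name: 1.0 for name in ratings}
    total_weight = sum(weights[name] for name in ratings)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(ratings[name] * weights[name] for name in ratings)
    return weighted_sum / total_weight

# Hypothetical ratings for three contributor modalities.
example = {"reputation": 4, "popularity": 2, "influence": 0}
uniform = total_score(example)
weighted = total_score(example, {"reputation": 2, "popularity": 1, "influence": 1})
```

Normalising keeps the total on the same 0 to 4 scale as the individual ratings, whatever weights are chosen.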

Are all these parameters necessary?

Preliminary statistical results

Parameters studied
- Number of followers
- Number of tweets
- User account age

Sample: ~10 M tweets, 5 K users
Collection period: July-September 2013

Empirical distributions
- Heavy-tailed distributions
- Multimodal heavy-tailed distributions with three different peaks (6.7 months, 23.3 months, 4.4 yrs)

Correlation coefficients
- Friends - followers: 0.1222
- Friends - tweets: 0.08
- Followers - tweets: 0.0197

Conclusion:
- All parameters are relatively independent of one another
- They need to be studied independently
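Pairwise correlation coefficients of this kind can be computed as sketched below. The slides do not state which coefficient was used; this sketch assumes Pearson's correlation coefficient, and the per-user values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-user counts, not data from the study.
friends = [120, 300, 45, 980, 210]
followers = [80, 15000, 60, 400, 2500]
r = pearson(friends, followers)
```

A coefficient close to 0, as in the values reported above, indicates little linear dependence between two parameters.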

Summary and future work

Summary
- Defined Alethiometer: a framework taking into account all aspects: Contributor, Content and Context
- Showed an approach for combining the ratings of all parameters
- Attested the relative independence of parameters and the need to consider a variety of measures (also previously emphasized in the literature)

Future work
- Investigate statistical properties of other modalities
- Extract the significance of modalities
- Study correlation between content, contributor and context modalities

Find us at http://ilab.atc.gr
Follow us @iLabATC

Thank you! [email protected]
Questions & Answers
