
Classifying Twitter Content

Dr Stephen Dann, Australian National University

@stephendann

Presented at Marketing Science, Houston, June 11, 2011

If you’re on Twitter, questions can be sent to @stephendann or posted with the hashtag #mktsci2011

Why here, why now?

Why this presentation?
– MSI interest in the role of social media in branding
– Attitudinal metrics from the web can predict transactions

Why this method?
– Try to further avoid the criteriaflation issue
• Hence announcing that a coding structure exists

What outcome?
– I could use a good set of equations

Series of Projects

Blog Post reacting to Pear Analytics 2009

First Monday Paper (Dann, 2010)

Marketing Science, Method <- You are here

USF Social Marketing, Social Media in Social Marketing (next week)

AMSRS Conference, Crisis Communication Analysis (September)

ANZMAC, Categories in Detail (December)

Twitter.

Twitter matters because of what it is: at its heart, a platform that offers an exchange of ideas and information on an unprecedented scale.

Source: “Why Twitter Matters”, Marketing, Idea Hub, American Express OPEN Forum, http://www.openforum.com/idea-hub/topics/marketing/article/why-twitter-matters-ann-handley (accessed Fri Oct 02 2009, AUS Eastern Standard Time)

Twitter in Plain English

How to analyze a living medium?

Hawthorne Effect

Uncertainty Principle

Sample Size / Twitter Volume

Why do any coding?

• Twitter is not about the aggregate firehose
– There are those who disagree, and I have cited many of them. However, few, if any, actually read the impossibly fast-updating full timeline

• Twitter is about how you use it
– Twitter becomes something in co-creation
– Twitter timeline as documented history
– Tracking near-past behaviour

Raw Counts

Tweetstats – www.tweetstats.com

Text Analysis

Tweetstats – www.tweetstats.com

Wordle – wordle.com
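As a rough illustration of the kind of text analysis a word-cloud tool such as Wordle builds on, the Python sketch below counts word frequencies across a set of tweets. It is a minimal sketch, not how Wordle or Tweetstats actually work: the stopword list, the decision to drop URLs and @mentions, and the sample tweets are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; Wordle and similar
# tools ship their own, much longer lists.
STOPWORDS = {"the", "a", "an", "and", "for", "of", "to", "in", "on", "is", "at", "rt"}

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
WORD = re.compile(r"[#\w']+")  # keeps hashtags such as #mktsci2011 intact


def word_frequencies(tweets, top_n=10):
    """Return the top_n (word, count) pairs across all tweets,
    ignoring case, links, @mentions and stopwords."""
    counts = Counter()
    for text in tweets:
        text = URL.sub(" ", text)        # drop links
        text = MENTION.sub(" ", text)    # drop @mentions
        for word in WORD.findall(text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)


# Invented sample tweets, not drawn from the study data.
sample = [
    "Flood warning for the river district http://example.com",
    "RT @council: flood update at 5pm",
    "Stay safe in the flood #qldfloods",
]
print(word_frequencies(sample, top_n=3))
```

Raw frequency like this says nothing about what a tweet is for, which is the gap a manual coding scheme is meant to fill.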

Prior Analysis

boyd et al 2010; Crawford 2009; DiMicco et al 2008; Fahmi 2009; Gay et al 2009; Heany and McClurg 2009; Hohl 2009; Honeycutt and Herring 2009; Jansen et al 2009; Java et al 2007; Lariscy et al 2009; Makice 2009; Miller 2008; Naaman et al 2010; Pear Analytics 2009; Steiner 2009; Zhao and Rosson 2009

Dann (2010) based on:

Schema

Developed from a grounded theory approach

60+ Twitter articles

Use behaviours, content analysis, sentiment analysis

10,000+ tweets

Manual coding

Supporting analysis

Linguistic analysis (LIWC)

Automated analysis

Leximancer Analysis

Framework

Six categories:
1. Conversational
2. News Events
3. Pass along
4. Phatic
5. Status
6. Spam
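One way to carry the six top-level categories into a coding script is as an enumeration. The Python sketch below is illustrative only: the short codes are taken from the coding sheet used later in the deck, reading [X] as Spam is inferred from its “Delete” footnote, and the class and property names are hypothetical.

```python
from enum import Enum


class TweetCategory(Enum):
    # Values are the letter codes on the coding sheet later in the deck;
    # treating X as Spam is an inference from the "Delete" footnote.
    CONVERSATIONAL = "C"
    NEWS_EVENTS = "N"
    PASS_ALONG = "PA"
    PHATIC = "P"
    STATUS = "S"
    SPAM = "X"

    @property
    def discard(self) -> bool:
        """Spam is marked 'Delete' rather than kept for analysis."""
        return self is TweetCategory.SPAM


print(TweetCategory("PA").name)    # PASS_ALONG
print(TweetCategory.SPAM.discard)  # True
```

Looking a category up by its code (`TweetCategory("PA")`) lets the same labels that coders write on the sheet be checked against the framework.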

Conversational

• The core of the interpersonal exchange on Twitter, and the binding activity that links different users together into a sense of community, companionship and conversation (Cahill 2009, Cranefield and Yoong 2009, Honeycutt and Herring 2009, Java et al 2009, Perlmutter 2009, Steiner 2009, Ratkiewicz et al 2010)

• Four identifiable subcomponents: action, query, referral and response

News Events

• A broad selection of media releases, citizen journalism, professional journalism, PR and publicity (Mäkinen and Wangu Kuira 2008, Power and Forte 2008, Java et al 2009, Phelan et al 2009, Chu et al 2010, Petrovic et al 2010, Zhou et al 2010, Phuvipadawat and Murata 2011, Cheong and Lee 2011)

• Seven categories: announcements, hashtagged events, headlines, sport, natural disasters, transport and weather

Pass along

• Twitter used as a short-form publishing outlet for recommended links, other Twitter remarks, or links to the author’s own content (Java et al 2007, Mischaud 2007, Heany and McClurg 2009, Java et al 2009, Pear Analytics 2009, Naaman et al 2010, Zhang et al 2010, Bakshy et al 2011)

• Five categories: automated endorsement, endorsement, retweet, secondary social media and user-generated content

Phatic

• Use of Twitter as a means to maintain a presence within a community, and connections to other users of the service, without direct conversation (Java et al 2007, Miller 2008, Henneburg et al 2009, Keenan and Shiri 2009, Makice 2009, Pear Analytics 2009, Fernando 2010, Marwick and boyd 2010, Zhang et al 2010)

• Four categories: undirected broadcast statements, fourth-wall-breaking meta commentary, greetings and unclassifiable content

Status

• Use of the service to answer the original Twitter question of “What are you doing?” by reporting the user’s sense of “Me-Now”, or statements of immediately transpired activity (Gaonkar et al 2008, Bollen et al 2009, Java et al 2009, Chu et al 2010, Dodds et al 2011, Naaman et al 2010, Zhang et al 2010)

• Eight categories: activity, automated status, location, mechanical, personal statements, physical, temporal and work
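Collected together, the subcategory lists on the five category slides above form a two-level taxonomy. The Python sketch below records them as a plain dictionary so a coder can check which top-level category a subcategory belongs to; the names TAXONOMY, subcategory_counts and find_category are hypothetical. Spam is omitted because the deck names no subcategories for it, and no mapping to the numbered codes on the coding sheet is implied.

```python
# Subcategories as listed on the category slides (Spam omitted).
TAXONOMY = {
    "Conversational": ["action", "query", "referral", "response"],
    "News Events": ["announcements", "hashtagged events", "headlines",
                    "sport", "natural disasters", "transport", "weather"],
    "Pass along": ["automated endorsement", "endorsement", "retweet",
                   "secondary social media", "user generated content"],
    "Phatic": ["undirected broadcast statements",
               "fourth wall breaking meta commentary", "greetings",
               "unclassifiable content"],
    "Status": ["activity", "automated status", "location", "mechanical",
               "personal statements", "physical", "temporal", "work"],
}


def subcategory_counts(taxonomy):
    """Number of subcategories under each top-level category."""
    return {category: len(subs) for category, subs in taxonomy.items()}


def find_category(subcategory, taxonomy=TAXONOMY):
    """Return the top-level category owning a subcategory, or None."""
    wanted = subcategory.lower()
    for category, subs in taxonomy.items():
        if wanted in subs:
            return category
    return None


print(subcategory_counts(TAXONOMY))
print(find_category("Retweet"))  # Pass along
```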

Sub categories

• Conversational
– Response
– Referral
– Query
– Action

• News Events
– Headlines
– Hashtagged Event
– Natural disasters
– Transport
– Weather
– Sport
– Announcement

• Pass along
– Retweet
– Endorsement
– Secondary Social Media
– User generated content
– Automated Endorsement

• Phatic

• Status

Marketing Science Style

• N = 11,672
– Three public sector organisation timelines: local government, police force, energy company
– Two hashtags: natural disaster, conference
– One personal timeline data set

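Category        %     n      Dann   #Dis.  #Conf  Police  Counc.  Ener.
Conversational  29%   3415   1473   30     427    585     785     115
News Events     8%    884    13     17     29     784     31      10
Pass along      50%   5787   278    533    351    2780    949     896
Phatic          3%    398    213    12     60     69      24      20
Status          10%   1188   834    10     153    126     34      31
Total           100%  11672  2811   602    1020   4344    1823    1072

Key: Dann = personal timeline; #Dis. = natural disaster hashtag; #Conf = conference hashtag; Police = police force; Counc. = local government; Ener. = energy company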
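The percentage column and totals can be checked directly from the counts. The Python sketch below recomputes the column totals, the overall shares and the share of each category within a single source, rounding to whole percentages as the table does. The variable and function names are hypothetical, and the source labels are the column headers as given.

```python
SOURCES = ["Dann", "#Dis.", "#Conf", "Police", "Counc.", "Ener."]

# Counts per category, one value per source, copied from the table.
TABLE = {
    "Conversational": [1473, 30, 427, 585, 785, 115],
    "News Events":    [13, 17, 29, 784, 31, 10],
    "Pass along":     [278, 533, 351, 2780, 949, 896],
    "Phatic":         [213, 12, 60, 69, 24, 20],
    "Status":         [834, 10, 153, 126, 34, 31],
}


def column_totals(table):
    """Total coded tweets per source."""
    return [sum(column) for column in zip(*table.values())]


def overall_shares(table):
    """Rounded percentage of all coded tweets falling in each category."""
    row_totals = {cat: sum(vals) for cat, vals in table.items()}
    grand_total = sum(row_totals.values())
    return {cat: round(100 * n / grand_total) for cat, n in row_totals.items()}


def source_shares(table, source):
    """Rounded percentage of one source's tweets in each category."""
    j = SOURCES.index(source)
    source_total = sum(vals[j] for vals in table.values())
    return {cat: round(100 * vals[j] / source_total) for cat, vals in table.items()}


print(column_totals(TABLE))            # [2811, 602, 1020, 4344, 1823, 1072]
print(overall_shares(TABLE))
print(source_shares(TABLE, "Police"))
```

Computed this way, the Police column is dominated by pass along (about 64% of its 4,344 tweets), while the Dann column, presumably the personal timeline, is mostly conversational (about 52% of 2,811).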

Uses of the Data

[Figure: percentage of tweets in each category (Conversational, News Events, Pass along, Phatic, Status), on a 0% to 80% scale, across the Pre-crisis, Flood, Inter-crisis, Cyclone and Post-crisis periods]

Here’s where you come in…

The Challenge: 140 characters of text

[C] [S] [PA] [N] [P] [X]*

[C1] [C2] [C3] [C4]

[S1] [S2] [S3] [S4] [S5] [S6] [S7] [S8]

[PA1] [PA2] [PA3] [PA4] [PA5]

[N1] [N2] [N3] [N4] [N5] [N6] [N7]

[P1] [P2] [P3] [P4]

[X1] [X2] [X3] [X4]

Time: Day / Month / Year

* Spam gets a category indicated as “Delete”
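The coding sheet implies a small grammar for labels: a top-level letter code, optionally followed by a sub-code number, plus a Day/Month/Year date. The Python sketch below validates labels against that grammar. It assumes Status runs to S8 (the Status slide lists eight categories), treats X as spam to be flagged for deletion per the footnote, and uses hypothetical names (MAX_SUBCODE, parse_code, code_tweet).

```python
import re
from datetime import date

# Highest sub-code on the sheet for each top-level code.
# Assumptions: Status runs to S8 (eight Status categories) and X is spam.
MAX_SUBCODE = {"C": 4, "S": 8, "PA": 5, "N": 7, "P": 4, "X": 4}

# "PA" is tried before "P" so that "PA3" is read as Pass along, sub-code 3.
LABEL = re.compile(r"\[?(PA|C|S|N|P|X)(\d+)?\]?")


def parse_code(label):
    """Split '[PA3]', 'N2' or 'C' into (category, subcode or None).

    Raises ValueError for labels that are not on the coding sheet.
    """
    match = LABEL.fullmatch(label.strip().upper())
    if match is None:
        raise ValueError(f"unknown code: {label!r}")
    category, digits = match.groups()
    if digits is None:
        return category, None
    sub = int(digits)
    if not 1 <= sub <= MAX_SUBCODE[category]:
        raise ValueError(f"{category}{sub} is not on the coding sheet")
    return category, sub


def code_tweet(label, day, month, year):
    """Build one coded record; spam (X) is flagged for deletion."""
    category, sub = parse_code(label)
    return {
        "category": category,
        "subcode": sub,
        "date": date(year, month, day),
        "delete": category == "X",
    }


print(parse_code("[PA3]"))            # ('PA', 3)
print(code_tweet("X2", 11, 6, 2011))  # "delete" is True for spam
```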

Future plans

Segments and Use-Case Scenarios

Forward facing strategic guidelines

Predictive Models

Certain level of automation, but not autonomous coding
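To make “a certain level of automation, but not autonomous coding” concrete, the sketch below has the machine propose a code and a human make the final call. The surface cues it uses (a leading “RT @”, a leading @reply, a link) are hypothetical rules chosen for illustration, not part of the published scheme, and suggest, assisted_code and accept_or_status are invented names.

```python
import re

# Hypothetical surface cues, for illustration only; they are not part
# of the published coding scheme and would need validating.
RULES = [
    (re.compile(r"^RT @\w+", re.IGNORECASE), "PA", "looks like a retweet"),
    (re.compile(r"^@\w+"), "C", "starts with an @reply"),
    (re.compile(r"https?://\S+"), "PA", "contains a link"),
]


def suggest(tweet):
    """Return (code, reason) from the first matching rule, else (None, None)."""
    for pattern, code, reason in RULES:
        if pattern.search(tweet):
            return code, reason
    return None, None


def assisted_code(tweet, confirm):
    """The machine proposes; confirm(tweet, code, reason) returns the
    final code chosen by a human coder. Nothing is coded without it."""
    code, reason = suggest(tweet)
    return confirm(tweet, code, reason)


def accept_or_status(tweet, code, reason):
    """Stand-in for a human coder who accepts any suggestion and codes
    unmatched tweets as Status; in practice this would prompt a person."""
    return code if code is not None else "S"


print(assisted_code("RT @council: road closed", accept_or_status))      # PA
print(assisted_code("Having coffee before my talk", accept_or_status))  # S
```

Keeping the human decision behind a callback means the rules only ever pre-fill a suggestion, which matches the distinction the slide draws between automation and autonomous coding.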

References

Bakshy, E, Hofman, J, Mason, W and Watts, D (2011) Everyone's an influencer: Quantifying Influence on Twitter, WSDM’11, February 9–12, 2011, Hong Kong, China

Berger, E (2009) This Sentence Easily Would Fit on Twitter: Emergency Physicians Are Learning to “Tweet”, Annals of Emergency Medicine, 54 (2) 23A-25A

Bollen, J, Mao, H and Zeng, X (2011) Twitter mood predicts the stock market, Journal of Computational Science, 2 (1) 1-8

Bollen, J, Pepe, A and Mao, H (2009) Modeling public mood and emotion: Twitter sentiment and socioeconomic phenomena, WWW2010, April 26–30, 2010, Raleigh, North Carolina

boyd, d, Golder, S and Lotan, G (2010) Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, Proceedings of HICSS-43, January 2010

Bryce, T and Pieper, C (2010) Using Twitter to Receive Storm Reports, 38th Conference on Broadcast Meteorology, June 2010

Butcher, L (2010) Using Twitter to Advance Cancer Knowledge, Oncology Times, 32 (1) 8-10

Cahill, K (2009) Building a virtual branch at Vancouver Public Library using Web 2.0 tools, Program: electronic library and information systems, 43 (2) 140-155

Cheong, M and Lee, V C S (2011) A microblogging-based approach to terrorism informatics: Exploration and chronicling civilian sentiment and response to terrorism events via Twitter, Information Systems Frontiers, 13, 45-59

Chu, Z, Gianvecchio, S, Wang, H and Jajodia, S (2010) Who is Tweeting on Twitter: Human, Bot, or Cyborg?, ACSAC '10: Proceedings of the 26th Annual Computer Security Applications Conference

Cranefield, J and Yoong, P (2009) Crossings: Embedding personal professional knowledge in a complex online community environment, Online Information Review, 33 (2) 257-275

Crawford, K (2009) Following you: Disciplines of listening in social media, Continuum, 23 (4) 525-535

Cuddy, C (2009) Twittering in Health Sciences Libraries, Journal of Electronic Resources in Medical Libraries, 6 (2) 169-173

Dann, S (2010) Twitter content classification, First Monday, 15 (12), 6 December 2010, http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/2745/2681

DiMicco, J, Millen, D, Geyer, W, Dugan, C, Brownholtz, B and Muller, M (2008) Motivations for Social Networking at Work, CSCW’08, November 8–12, 711-720

Dodds, P, Harris, K, Kloumann, I, Bliss, C and Danforth, C (2011) Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter, arXiv:101.5120v3, 11 Feb 2011

Doherty, R (2010) Getting social with recruitment, Strategic HR Review, 9 (6) 11-15

Dong, A, Zhang, R, Kolari, P, Bai, J, Diaz, F, Chang, Y and Zheng, Z (2010) Time is of the Essence: Improving Recency Ranking Using Twitter Data, WWW 2010, April 26–30, 2010, Raleigh, North Carolina, USA

Efron, M (2011) Information Search and Retrieval in Microblogs, Journal of the American Society for Information Science and Technology, 62 (6) 996–1008

Fahmi, W S (2009) Bloggers' street movement and the right to the city. (Re)claiming Cairo's real and virtual "spaces of freedom", Environment and Urbanization, 21, 89-107

Fernando, I (2010) Community creation by means of a social media paradigm, The Learning Organisation, 17 (6) 500-514

Fields, E (2010) A unique Twitter use for reference services, Library Hi Tech News, 6/7, 14-15

Gaonkar, S, Li, J, Choudhury, R R, Cox, L and Schmidt, A (2008) Micro-Blog: Sharing and Querying Content Through Mobile Phones and Social Participation, MobiSys’08, June 17–20, 2008, Breckenridge, Colorado, USA

Gay, P, Plait, P, Raddick, J, Cain, F and Lakdawalla, E (2009) Live Casting: Bringing Astronomy to the Masses in Real Time, CAP Journal, June, 26-29

Grier, C, Thomas, K, Paxson, V and Zhang, M (2010) @spam: The Underground on 140 Characters or Less, CCS’10, October 4–8, 2010, Chicago, Illinois, USA

Heany, M and McClurg, S (2009) Social Networks and American Politics: Introduction to the Special Issue, American Politics Research, 37, 727-741

Henneburg, S, Scammell, M and O'Shaughnessy, N (2009) Political marketing management and theories of democracy, Marketing Theory, 9, 165-188

Hohl, M (2009) Beyond the screen: visualizing visits to a website as an experience in physical space, Visual Communication, 8 (3) 273-284

Honeycutt, C and Herring, S C (2009) Beyond Microblogging: Conversation and Collaboration via Twitter, Proceedings of the Forty-Second Hawai’i International Conference on System Sciences (HICSS-42), Los Alamitos, CA: IEEE Press, 1-10, http://ella.slis.indiana.edu/~herring/honeycutt.herring.2009.pdf

Jackson, N and Lilleker, D (2011) Microblogging, Constituency Service and Impression Management: UK MPs and the Use of Twitter, The Journal of Legislative Studies, 17 (1) 86-105

Jansen, B, Zhang, M, Sobel, K and Chowdury, A (2009) Twitter power: Tweets as electronic word of mouth, Journal of the American Society for Information Science and Technology, 60 (11) 2169–2188, http://ist.psu.edu/faculty_pages/jjansen/academic/jansen_twitter_electronic_word_of_mouth.pdf

Java, A, Song, X, Finin, T and Tseng, B (2007) Why We Twitter: Understanding Microblogging Usage and Communities, Joint 9th WEBKDD and 1st SNA-KDD Workshop ’07, August 12, 2007, 56-65

Java, A, Song, X, Finin, T and Tseng, B (2009) Why We Twitter: An Analysis of a Microblogging Community, in H. Zhang et al. (Eds.): WebKDD/SNA-KDD 2007, LNCS 5439, 118–138

Keenan, A and Shiri, A (2009) Sociability and social interaction on social networking websites, Library Review, 58 (6) 438-450

Krums (2009) “There's a plane in the Hudson. I'm on the ferry going to pick up the people”, http://twitpic.com/135xa, January 16, 2009

Lariscy, R, Avery, E J, Sweetser, K and Howes, P (2009) An examination of the role of online social media in journalists’ source mix, Public Relations Review, 35, 314–316

Lauw, H, Ntoulas, A and Kenthapadi, K (2010) Estimating the Quality of Postings in the Real-time Web, WSDM 2010 Workshop on Search in Social Media

Lerman, K and Ghosh, R (2010) Information Contagion: An Empirical Study of the Spread of News on Digg and Twitter Social Networks, Proceedings of the 4th International Conference on Weblogs and Social Media

Longueville, B, Smith, R and Luraschi, G (2009) “OMG, from here, I can see the flames!”: a use case of mining Location Based Social Networks to acquire spatiotemporal data on forest fires, ACM LBSN '09, November 3, 2009

Makice, K (2009) Phatics and the Design of Community, CHI 2009, April 4-9, 2009, Boston, Massachusetts

Mäkinen, M and Wangu Kuira, M (2008) Social Media and Postelection Crisis in Kenya, The International Journal of Press/Politics, 13, 328

Marwick, A E and boyd, d (2011) I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience, New Media & Society, 13, 114-133

Miller, V (2008) New Media, Networking and Phatic Culture, Convergence: The International Journal of Research into New Media Technologies, 14 (4) 387-400

Mischaud, E (2007) Twitter: Expressions of the Whole Self. An investigation into user appropriation of a web-based communications platform, MSc Dissertation, London School of Economics

Naaman, M, Boase, J and Lai, C-H (2010) Is it Really About Me? Message Content in Social Awareness Streams, CSCW 2010, February 6–10

Okazaki, M and Matsuo, Y (2010) Semantic Twitter: Analyzing Tweets for Real-Time Event Notification, in Breslin, J, Burg, T, Kim, H and Schmidt, J-H (Eds.) (2011) Recent Trends and Developments in Social Software, Springer Berlin / Heidelberg

Pak, A and Paroubek, P (2010) Twitter as a corpus for sentiment analysis and opinion mining, in N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner and D. Tapias (Eds.), Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10), European Language Resources Association, Valletta, Malta, May 2010, pp. 19–21

Parslow, G (2009) Commentary: Twitter for Educational Networking, Biochemistry and Molecular Biology Education, 37 (4) 255–256

Pear Analytics (2009) Twitter Study – August 2009, http://www.pearanalytics.com/wp-content/uploads/2009/08/Twitter-Study-August-2009.pdf

Perlmutter, D (2009) Political Blogging and Campaign 2008: A Roundtable, The International Journal of Press/Politics, 2008, 13, 160

Petrovic, S, Osborne, M and Lavrenko, V (2010) Streaming First Story Detection with application to Twitter, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT '10), Association for Computational Linguistics, Stroudsburg, PA, USA, 181-189

Phelan, O, McCarthy, K and Smyth, B (2009) Using Twitter to Recommend Real-Time Topical News, RecSys’09, October 23–25, 2009, New York, New York, USA

Phuvipadawat, S and Murata, T (2011) Detecting a Multi-Level Content Similarity from Microblogs Based on Community Structures and Named Entities, Journal of Emerging Technologies in Web Intelligence, 3 (1) 11-19

Power, R and Forte, D (2008) War & Peace in Cyberspace: Don’t twitter away your organisation’s secrets, Computer Fraud and Security, August, 18-20

Rath, L (2011) The Effects of Twitter in an Online Learning Environment, eLearn Magazine, http://www.elearnmag.org/subpage.cfm?section=articles&article=154-1

Ratkiewicz, J, Conover, M, Meiss, M, Gonçalves, B, Patil, S, Flammini, A and Menczer, F (2010) Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams, Technical Report arXiv:1011.3768 [cs.SI], CoRR

Sakaki, T, Okazaki, M and Matsuo, Y (2010) Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors, Proceedings of the 19th International Conference on World Wide Web (WWW '10), ACM, New York, NY, USA, 851-860

Steiner, H (2009) Reference utility of social networking sites: options and functionality, Library Hi Tech News, 5/6, 4-6

Sullivan, S J, Schneiders, A G, Cheang, C W, Kitto, E, Lee, H, Redhead, J, Ward, S, Ahmed, O H and McCrory, P R (2011) ‘What’s happening?’ A content analysis of concussion-related traffic on Twitter, British Journal of Sports Medicine, Mar 15 [Epub ahead of print]

Thelwall, M, Buckley, K and Paltoglou, G (2011) Sentiment in Twitter events, Journal of the American Society for Information Science and Technology, 62 (2) 406-418


Welch, M, Schonfeld, U, He, D and Cho, J (2011) Topical Semantics of Twitter Links, WSDM’11, February 9–12, 2011, Hong Kong, China

Wilson, D (2008) Monitoring technology trends with podcasts, RSS and Twitter, Library Hi Tech News, 10, 8-12

Zhang, J, Qu, Y, Cody, J and Wu, Y (2010) A Case Study of Micro-blogging in the Enterprise: Use, Value, and Related Issues, CHI 2010, April 10-15, 2010, Atlanta, Georgia, USA

Zhao, D and Rosson, M B (2009) How and Why People Twitter: The Role that Micro-blogging Plays in Informal Communication at Work, GROUP’09, May 10–13, 2009, 243-252

Zhou, Z, Bandari, R, Kong, J, Qian, H and Roychowdhury, V (2010) Information Resonance on Twitter: Watching Iran, 1st Workshop on Social Media Analytics (SOMA ’10), July 25, 2010, Washington, DC, USA

Questions


@stephendann

This work is licensed under the Creative Commons Attribution-Share Alike 2.5 Australia License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.5/au/