Mapping Online Publics (Part 2)


DESCRIPTION

Part 2 of the "Making Sense of Twitter: Quantitative Analysis Using Twapperkeeper and Other Tools" workshop, presented at the Communities & Technologies 2011 conference, Brisbane, 29 June 2011.

TRANSCRIPT

Page 1: Mapping Online Publics (Part 2)

Axel Bruns / Jean Burgess

ARC Centre of Excellence for Creative Industries and Innovation, Queensland University of Technology

[email protected] – @snurb_dot_info / [email protected] – @jeanburgess

http://mappingonlinepublics.net – http://cci.edu.au/

Page 2: Gathering Data

• Keyword / #hashtag archives

– Twapperkeeper.com

• No longer fully functional

– yourTwapperkeeper

• Open source solution

• Runs on your own server

• Use our modifications to enable CSV / TSV export

• Uses Twitter streaming API to track keywords

– Including #hashtags, @mentions

Page 3: Twapperkeeper / yourTwapperkeeper data

• Typical data format (#ausvotes) – the field layout assumed by the scripts below:

text, to_user_id, from_user, id, from_user_id, iso_language_code, source, profile_image_url, geo_type, geo_coordinates_0, geo_coordinates_1, created_at, time

Page 4: Processing Data

• Gawk:

– Command-line tool for processing CSV / TSV data

– Can use ready-made scripts for complex processing

– Vol. 1 of our scripts collection now online at MOP

• Regular expressions (regex):

– Key tool for working with Gawk

– Powerful way of expressing search patterns

– E.g.: @[A-Za-z0-9_]+ = any @username

– See online regex primers... (a quick gawk example follows)
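As a quick illustration of the @username pattern above (a toy one-liner, not one of the MOP scripts; tweets.txt stands in for any text file), gawk's match() sets RSTART / RLENGTH, which can be used to print the first @mention on each line:

#> gawk 'match($0, /@[A-Za-z0-9_]+/) { print substr($0, RSTART, RLENGTH) }' tweets.txt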

Page 5: atreplyfromtoonly.awk

# atreplyfromtoonly.awk - extract @replies for network visualisation
#
# This script takes a Twapperkeeper CSV/TSV archive of tweets and reworks it into
# simple network data for visualisation. The output format for this script is always
# CSV, to enable import into Gephi and other visualisation tools.
#
# Expected data format:
# text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,created_at,time
#
# Output format:
# from,to
#
# The script extracts @replies from tweets, and creates duplicate rows where multiple
# @replies are present in the same tweet - e.g. the tweet "@one @two hello" from user
# @user results in the rows user,one and user,two (usernames are stored without the @).
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - [email protected]

BEGIN {
  print "from,to"                          # CSV header row for Gephi import
}

# process only tweets whose text ($1) contains at least one @mention
/@([A-Za-z0-9_]+)/ {
  a = 1                                    # current scan position in the tweet text
  do {
    # find the next @username in the remainder of the text;
    # the three-argument match() is a gawk extension
    match(substr($1, a), /@([A-Za-z0-9_]+)/, atArray)
    # advance the scan position to just past the captured username
    a = a + atArray[1, "start"] + atArray[1, "length"] - 1
    # emit one from,to edge per mention, lowercased ($3 = from_user)
    if (atArray[1] != "") print tolower($3) "," tolower(atArray[1])
  } while (atArray[1, "start"] != 0)
}
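A typical invocation of the script above, plus the rows it would emit for the example tweet from the header comment (illustrative output, not real #ausvotes data):

#> gawk -F \t -f scripts\atreplyfromtoonly.awk input.tsv >edges.csv

edges.csv (given the tweet "@one @two hello" posted by user):

from,to
user,one
user,two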

Page 6: Running Gawk Scripts

• Gawk command line execution:

– Open terminal window

– Run command:

#> gawk -F \t -f scripts\explodetime.awk input.tsv >output.tsv

– Arguments:

• -F \t = field separator is a TAB (otherwise -F ,)

• -f scripts\explodetime.awk = run the explodetime.awk script

(adjust scripts path as required)
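The invocation above is written for the Windows command prompt. On Linux or macOS the tab separator must be quoted (an unquoted \t is eaten by the shell) and paths use forward slashes:

#> gawk -F '\t' -f scripts/explodetime.awk input.tsv >output.tsv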

Page 7: Basic #hashtag data: most active users

• Pivot table in Excel – ‘from_user’ against ‘count of text’
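If Excel is not at hand, the same tally can be sketched on the command line (assuming field 3 is from_user, as in the format above, and that the first line is a header row):

#> gawk -F '\t' 'NR>1 {count[$3]++} END {for (u in count) print count[u] "\t" u}' input.tsv | sort -rn | head -20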

Page 8: Identifying Time-Based Patterns

#> gawk -F \t -f scripts\explodetime.awk input.tsv >output.tsv

• Output:

– Additional time data:

• Original format + year,month,day,hour,minute

– Uses:

• Time series per year, month, day, hour, minute (a sketch of such a script follows)
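The actual explodetime.awk is part of the MOP scripts collection; purely to illustrate the mechanics, a minimal sketch might look like this (it assumes the trailing 'time' field, field 13, holds a Unix timestamp; strftime() is a gawk extension):

# explodetime-style sketch: append year,month,day,hour,minute columns
BEGIN { OFS = FS }
NR == 1 { print $0, "year", "month", "day", "hour", "minute"; next }  # extend header row
{
  print $0, strftime("%Y", $13), strftime("%m", $13), strftime("%d", $13), strftime("%H", $13), strftime("%M", $13)
}

If the archive has no header row, drop the NR == 1 line.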

Page 9: Basic #hashtag data: activity over time

• Pivot table – ‘day’ against ‘count of text’
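On the command line, the explodetime.awk output can be tallied per day directly (assuming the five new columns sit after the original 13 fields, so year/month/day are fields 14-16; strftime() zero-pads, so a plain sort is chronological):

#> gawk -F '\t' 'NR>1 {count[$14 "-" $15 "-" $16]++} END {for (d in count) print d "\t" count[d]}' output.tsv | sort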

Page 10: Identifying @reply Networks

#> gawk -F \t -f scripts\atreplyfromtoonly.awk input.tsv >output.tsv

• Output:

– Basic network information:

• from,to

– Uses:

• Key @reply recipients

• Network visualisation
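The from,to output lends itself to a quick tally of the key @reply recipients (the leading count keeps sort -rn happy):

#> gawk -F , 'NR>1 {count[$2]++} END {for (u in count) print count[u] "," u}' output.csv | sort -rn | head -20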

Page 11: Basic #hashtag data: @replies received

• Pivot table – ‘to’ against ‘from’

Page 12: Basic @reply Network Visualisation

• Gephi:

– Open source network visualisation tool – Gephi.org

– Frequently updated, growing number of plugins

– Load the CSV edge list into Gephi (see the note below on column headers)

– Run ‘Average Degree’ network metric

– Filter for minimum degree / indegree / outdegree

– Adjust node size and node colour settings:

• E.g. colour = outdegree, size = indegree

– Run network visualisation:

• E.g. ForceAtlas – play with settings as appropriate
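One practical note on the CSV import (this is an assumption about Gephi's spreadsheet importer, which looks for Source and Target column headers in an edge list): if your Gephi version does not let you map the from/to columns manually, renaming the header row first avoids confusion:

#> gawk -F , 'NR==1 {print "Source,Target"; next} {print}' edges.csv >edges-gephi.csv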

Page 13: Basic @reply Network Visualisation

• Degree = 100+, colour = outdegree, size = indegree

Page 14: Tracking Themes (and More) over Time

#> gawk -F \t -f multifilter.awk search="term1,term2,..." input.tsv >output.tsv

– Term examples: (julia|gillard),(tony|abbott)

– Or, to classify tweet types: .?,@[A-Za-z0-9_]+,RT @[A-Za-z0-9_]+,http (any tweet / @mentions / retweets / links)

• Output:

– Per-term match information:

• Original format + term1 match, term2 match, ...

– Uses:

• Use on output from explodetime.awk

• Graph occurrences of terms per time period (hour, day, ...) – a minimal sketch of the filter script follows
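The actual multifilter.awk is in the MOP scripts collection; as an illustration of the mechanics only, a minimal sketch that appends one match column per comma-separated search term could look like this (a header row would also be run through the filter; trim as needed):

# multifilter-style sketch (illustrative only)
BEGIN {
  OFS = FS
  IGNORECASE = 1                 # gawk extension: case-insensitive matching
}
# search="term1,term2,..." is assigned when gawk reaches that argument, i.e. after
# BEGIN has run, so split it on the first record instead
!n { n = split(search, terms, ",") }
{
  out = $0
  for (i = 1; i <= n; i++)       # one extra column per term: the term if matched, else empty
    out = out OFS ($1 ~ terms[i] ? terms[i] : "")
  print out
}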

Page 15: Tracking Themes over Time

• Pivot table – ‘day’ against keyword bundles, normalised to 100%

Page 16: Dynamic @reply Network Visualisation

• Multi-step process:

– Make sure tweets are in ascending chronological order

– Use timeframe.awk to select period to visualise:

#> gawk -F , -f timeframe.awk start="2011 01 01 00 00 00" end="2011 01 01 23 59 59" tweets.csv >tweets-1Jan.csv

• start / end = start and end of period to select (YYYY MM DD HH MM SS)

– Use preparegexfattimeintervals.awk to prepare data:

#> gawk -F , -f preparegexfattimeintervals.awk tweets-1Jan.csv >tweets-1Jan-prep.csv

– Use gexfattimeintervals.awk to convert to Gephi GEXF format:

#> gawk -F , -f gexfattimeintervals.awk decaytime="1800" tweets-1Jan-prep.csv >tweets-1Jan.gexf

• decaytime = time in seconds that an @reply remains ‘active’, once made (see the sketch below)

• This may take some time...
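To make the decaytime idea concrete, here is a sketch of the core transformation only, not the actual gexfattimeintervals.awk (a real GEXF file also needs the XML header and node declarations, which the script generates): each edge is emitted with start and end attributes, so that it disappears from the dynamic visualisation decaytime seconds after the @reply was made.

# decaytime sketch: assumed input of from,to,time rows (time = Unix timestamp),
# with decaytime= set on the command line as in the invocation above
{
  printf "<edge source=\"%s\" target=\"%s\" start=\"%s\" end=\"%s\" />\n", $1, $2, $3, $3 + decaytime
}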