
Mapping the University’s Social Media Footprint

Andrew Moffat

University of Nottingham

Nottingham, UK

[email protected]

ABSTRACT

This is a report of a study run to explore the use of social

media for public engagement by members of staff of the

University of Nottingham. The study was run as an exercise

in carrying out social media research in a responsible way,

foregrounding the rights to privacy of social media users.

Data was collected from fully consenting study participants,

who contributed the data from their Twitter accounts for

analysis while running a web tool designed to help Twitter

users monitor and manage their Twitter interactions. A

network graph visualisation was created from the data, and

different user communities identified. A study of hashtag

propagation was carried out, and certain characteristics of

successful hashtags noted. Finally, reflections were made on

the nature of social media analysis conducted in this way.

Author Keywords

Social media analysis, public engagement, Twitter, citizen-

centric research, RRI.

INTRODUCTION

Public engagement is an increasingly important part of the

work of an academic researcher. The European

Commission’s Responsible Research and Innovation (RRI)

framework, published in 2012, sets out six keys designed to

‘foster public engagement and a sustained two-way dialogue

between science and civil society’ [3]. RRI aims to embed

scientific research more deeply within society and societal

issues and concerns in order to avoid the kinds of disjuncts

between scientific research and public opinion that have

emerged in recent decades with regard to ethically sensitive

matters such as genetic modification [12]. This discursive

approach to science in society is reflected in the expectation

of research funding bodies that researchers will demonstrate

a clearly defined conduit for communicating their work to

the public, and for measuring the efficacy of that

communication. Research Councils UK, who characterise

public engagement discourse as “Pathways to Impact”, state

that: “A clearly thought through and acceptable Pathways to

Impact is an essential component of a research proposal and

a condition of funding.” [14].

The principal aim of the present study is to detect signs of

successful public engagement through Twitter. The

secondary aim is to consider the methodological

implications of the ethically-driven approach on the efficacy

of analysis. Use of network science and graph theory will

be made in the study in order to analyse the network of

interactions present in the Twitter data provided by

participants. Each of these concepts will now be discussed

in turn.

Public Engagement through Social Media

As a tool for communicating to a wide audience, social

media networks clearly have a lot to offer. With an ever-

increasing proportion of the nation’s populace using social

media to interact, not only with their immediate social

groups, but with wider society more generally (#smlondon

gives a 6% increase in active social media accounts in the

UK from 2014 to 2015) [5], the potential to spread ideas and

engage in meaningful dialogue around emerging research

themes is enormous. However, practices and attitudes

towards social media vary widely among academics, who

are, in any case, typically not among the demographic

groups that drive adoption and development of social media

(a report by Pew Research on US social media usage shows

the 18-29 age group leading usage in all major social

networks except LinkedIn) [1].

The Nature and Measurability of Impact

In our data-rich world, the response to the need for proof of

public engagement has been the emergence of metrics that

can offer conveniently comparable numeric values to

evaluate impact. Citation count, h-index and impact factor

provide researchers quantifiable proof of uptake of their

published work; however, these metrics only really give a

picture of impact within academia, and it is far more difficult

to detect and measure the extent to which scientific research

emerges from the academic community into the wider world.

Social media provide “outstanding opportunities” for public

engagement [13], and the potential to interact with a lay

audience, but there is also the potential for conversations to

remain within academic circles and not create true public

discourse despite an appearance of success.

Eysenbach shows that the level of Twitter activity linked to

the publication of a research paper correlates with the

number of citations of that paper, although he notes that

“[c]orrelation is not causation, and it is harder to decide

whether extra citations are a result of the social media buzz,

or whether it is the underlying quality of an article or

newsworthiness that drives both the buzz and the citations” [4].

While the two things are very likely connected, this raises

the question of to what extent this is an example of social

media used as a conduit for public outreach, or social media

as a metric to gauge impact.

Letierce et al. [9] present a study of the use of Twitter for

public engagement during three

different scientific conferences. After conducting a survey

of academics in their target community to investigate their

use of social media, the study captured Tweets containing

the official hashtag of each of the three events. Among the

users in their data they distinguished between hubs and

authorities, the former being prolific users of tweets

containing @ addressivity, and the latter being people

frequently mentioned by their @username. Users with high

scores in both categories seemed to be directly involved with

the event in question. A study of hashtags used in Tweets

during the conferences suggested a classification scheme of

seven categories: technical terms, events, domains,

applications, institute/people, documentation, and other.

This classification was used to try to determine whether or

not Tweeters were reaching communities external to their

own, and it was reasoned that technical terms would be

unlikely to spread beyond the expert community, while the

more high-level domains category could appeal to people

with only a general understanding of the field. However, the

paper does not make clear how users were judged to belong

to a ‘community’ or not (indeed, the term itself is not well

defined), and conclusions drawn in this regard seem more

like suppositions.

The Ethics of Social Media Analysis

The abundance of easily available data makes social media

an obvious place to conduct research. However, whereas in

the past the accessibility of data was in itself an indication of

its public nature, the blurring of public and private inherent

in computer-mediated discourse requires us to “question

whether the availability of information on the internet

necessarily makes this information ‘fair game’” [15]. Snee

considers in particular the ethical issues in studying blogs,

which, as a form of discourse, echo the intimate nature of a

personal diary, and yet access to them is unrestricted, ergo

public. In such cases it is not clear whether the data represent

published textual material, or data taken from human

subjects, and this author/subject distinction is crucial to the

ethical debate.

To what extent social media activity can be considered

public or private in nature is another contentious issue. The

ESRC Framework for Research Ethics states that “the public

nature of any communication or information on the internet

or through social media should always be critically

examined”, [2] and that users of computer-mediated

communication may not consider their comments as being

made fully in the public sphere. This is related to the idea of

‘psychological privacy’, in discussion of which Frankel and

Siang consider “a distinction between the public and private

domains based not on the accessibility of the data, but on the

psychological perception of the subjects with regard to the

information” [6].

Citizen-Centric Approaches to Social Media Analysis

(CaSMa) is a research group within the Horizon Digital

Economy Research Institute at the University of

Nottingham. CaSMa’s role is to explore the ethical

implications of social media research and data analysis, and,

more generally, the analysis without consent of large

volumes of personal data. Built upon the concept of personal

data as a new asset class, a valuable commodity that is

routinely traded and profited from, CaSMa seeks methods of

engaging in social media-based research that foreground the

ethical considerations discussed above, and avoid the ‘help

yourself’ approach to data collection that has underpinned

much social media analysis to date.

With this in mind, the current study was designed in such a

way as to collect data only from active participants, with

their express written consent. This resulted in a number of

methodological issues and features of the resulting dataset

that will be discussed later.

Network Science and Graph Theory

The relationships and/or interactions between members of a

social group can be conceptualised as a network, and

visualised as a graph. In terms of social media data, such a

network can show latent relationships, such as friends or

contacts (who may nevertheless rarely interact), or active

interactions, such as the sending of directed messages. In a

graph visualisation of a network, individuals are represented

by dots termed nodes; nodes which share some form of

relationship or interaction are connected by lines, termed

edges. Both nodes and edges can visually reflect attributes

of the individuals and relationships: frequency of interaction

between nodes can be shown in the thickness of the edge,

while some measure of the ‘importance’ of an individual can

be shown in the size of the node. Edges can be directional,

in the sense of regarding an interaction such as sending a

message as following a path from one individual to another.

The number of edges that terminate at a node (hence the

number of other nodes to which it is connected) gives it a

degree score, which in the case of directional edges can be

in-degree or out-degree. Algorithms run on the graph result

in a visual layout based on the principle that linked nodes

attract one another and unlinked nodes repel, while

statistical tests run on the network result in a number of

measures, some of which will be considered later.

Given the existence of a network of interactions, we can

suppose that information will diffuse between nodes in a

process as summarised by Guille et al: “(i) a piece of

information carried by messages, (ii) spreads along the edges

of the network according to particular mechanics, (iii)

depending on specific properties of the edges and nodes.” [8]

The three stages of this process correspond with three main

questions behind the analysis of information diffusion in

network science, these being: “(i) which pieces of

information or topics are popular and diffuse the most, (ii)

how, why and through which paths information is diffusing,

and will be diffused in the future, (iii) which members of the

network play important roles in the spreading process?” [8]

The Nature of Public Engagement on Twitter

For the purposes of the current study, public engagement is

considered to be a form of information diffusion. If the

public is successfully engaged in discourse on a topic, we

might expect that topic to arise in a certain place or

community within the wider network, then spread to other

users over time. In information science this is referred to as

infection. Moreover, we would expect that the topic should

diffuse out of the original community, in order to

differentiate true public engagement from information

diffusion within academia: academics talking to other

academics. This necessitates some consideration of how a

community is defined, and how it might be detectable in the

data. If the goal of public engagement is to stimulate

discourse in wider society, then communities should be

defined in terms of their level of knowledge of, and

involvement with, the topic in question. These two are not

necessarily linked: one may have personal experience of

mental illness, but not have an expert knowledge of it. The

goal of creating the graph visualisation of the network is to

attempt to identify communities and observe information

diffusing into a wider context.

BACKGROUND

The current study was carried out within the wider context

of an existing project being conducted by researchers at

Horizon Digital Economy Research, University of

Nottingham. Some account of this project must be given to

justify the rationale for the current study, though the work

described below did not directly form part of the latter.

The wider context: POET

The Public Outreach and Engagement Tool (POET) project

being conducted by the CaSMa research team in the Horizon

Digital Economy Research Institute aims to explore ways in

which researchers can harness the potential of social media

for public engagement. To this end, interviews were

conducted with 13 academics within the University of

Nottingham to investigate their practices and attitudes

towards social media. Participants reported feeling a sense

of obligation to use Twitter as a means of communicating

their research, and increased pressure generally to

demonstrate public engagement in their work. The

interviews corroborated an earlier study [18] in finding a

general perception of Twitter as being better suited to

professional outreach, in contrast to the perception of

Facebook as a more personal platform. In particular, the

usefulness of Twitter as a communicative medium during

conferences, both by attendees and non-attendees who

nevertheless had an interest in the event, was expressed by 7

of the 13 participants.

Based on the findings of these interviews, the decision was

made to focus on Twitter as a platform for outreach, and to

develop a software tool to allow users to monitor and

manage their Twitter interactions. Participants in the

original interviews, as well as other members of staff

including some in administrative roles, were invited to

attend a participatory design session, in which their views

were solicited regarding the development of the tool.

Following this preparatory work, the current project was

conceived as an exploration of the use of Twitter for public

engagement by members of the University of Nottingham

staff, using data generated by a pilot implementation of this

tool.

The Tool

The tool uses the Twitter API to access the user’s Twitter

feed, and present the information graphically (figure 1). In

order to allow data to be extracted for analysis, there is an

option to ‘Export my data’, which downloads the Twitter

feed data as a compressed file in JSON (JavaScript Object

Notation) format.

Figure 1. The Twitter web tool. Events are visible as blobs

coloured according to event type, and the passage of time is

visible from left to right.

The current study is concerned with analysis of the data

collected through the use of the tool, rather than with the

experience of using the tool itself, so this brief description

will suffice for present purposes.

METHODOLOGY

After gaining approval for the study from the Ethics

Committee of the School of Computer Science at the

University of Nottingham, individuals were contacted who

had been involved with, or shown interest in, the POET

project during the earlier interview and participatory design

workshop stages. 11 individuals agreed to participate, two

of whom offered to provide data from two different Twitter

accounts of which they were the primary contributors, giving

a total of 13 accounts. 10 of the 11 individuals were met

with in person, at which time they were fully informed of the

procedure and their consent gained by their signing of the

consent form. In accordance with both the EPSRC policy

framework on research data and the Data Protection Act,

participants were explicitly asked for their consent to their

data being retained after the end of the study, solely for

purposes of replicating the study. All participants consented

to this. They were shown how to use the tool and download

their data, and how to install the freely-available AES Crypt

encryption tool, to ensure they could send their data securely

by email. The eleventh individual was guided through the

process via email exchange.

The study was designed to reflect the principles of personal

ownership of personal data espoused by CaSMa. Rather

than ‘crawl’ or ‘scrape’ the web for tweets (which in any

case has become problematic in recent years due to Twitter

changing its policies), the only data collected was that

provided directly by active, consenting study participants.

Data collection was planned for a four-week period, with

participants asked to provide four weekly instalments of

their data, to allow for an iterative approach to data

processing and analysis. The time frame was

broadly the same for all the participants, although, as the tool

started collecting their data from the first use, the starting

point for each participant was the initial meeting and was

therefore subject to some variation. In the event, technical

problems led to data from one participant being unavailable

for analysis, resulting in a final count of 12 accounts. In

what follows, the term participant will refer to the

individuals, and participant account, abbreviated to PA,

will refer to the accounts used for data collection.

The data gained via this method comprised the full Twitter

feed of each account provided by participants, as would be

visible on an account’s profile screen. It incorporated all

activity by the account holder, as well as all activity by other

users in which the account id appeared, such as retweets,

mentions, (un)favouriting and (un)following.

Figure 2. Total numbers of events of each type in the full

dataset.

In its raw form the data consisted of a number of text files in

.json (JavaScript Object Notation) format. The data was

presented as a series of events, each event being one of the

coloured blobs visible in the web tool. A breakdown of the

numbers of different types of events in the full, final dataset

is shown in figure 2. The programming language Python

was used to extract data and assemble several different

datasets for analysis. The ‘events’ dataset recorded each

event, along with the type of event and its accompanying

data, which were considerable and operated on several

levels. Firstly, the unique Tweet ID number of the event, the

user id and screen name of the event’s initiator (referred to

as the ‘primary user’), the date and time of the event, any

text included, any hashtags and user accounts mentioned by

the @username convention, were recorded. Additionally,

where a secondary event was referenced (such as in a

retweet, for example), the same information regarding the

secondary event was recorded (‘secondary user’, ‘secondary

Tweet ID’, ‘secondary text’ etc.).
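The extraction step described above can be sketched as follows. This is a minimal illustration, not the script used in the study: the structure of the exported JSON is not documented here, so every field name in the sample (`type`, `user`, `secondary`, and so on) is a hypothetical stand-in.

```python
import json

# Hypothetical export: the real field names in the tool's JSON are not given in the text.
SAMPLE_EXPORT = """
[
  {"type": "retweet", "tweet_id": 101, "user": {"id": 1, "screen_name": "pa_one"},
   "created_at": "2016-03-01T09:30:00", "text": "RT @other: New #synbio paper",
   "hashtags": ["synbio"], "mentions": ["other"],
   "secondary": {"tweet_id": 55, "user": {"id": 2, "screen_name": "other"},
                 "text": "New #synbio paper"}},
  {"type": "follow", "tweet_id": null, "user": {"id": 3, "screen_name": "fan"},
   "created_at": "2016-03-02T12:00:00", "text": "", "hashtags": [], "mentions": []}
]
"""


def flatten_event(event):
    """Reduce one raw event to a flat record of primary and secondary data."""
    secondary = event.get("secondary") or {}
    secondary_user = secondary.get("user") or {}
    return {
        "event_type": event["type"],
        "tweet_id": event.get("tweet_id"),
        "primary_user_id": event["user"]["id"],
        "primary_screen_name": event["user"]["screen_name"],
        "created_at": event["created_at"],
        "text": event.get("text", ""),
        "hashtags": [h.lower() for h in event.get("hashtags", [])],
        "mentions": list(event.get("mentions", [])),
        # Secondary fields stay None when the event references no other tweet.
        "secondary_tweet_id": secondary.get("tweet_id"),
        "secondary_user_id": secondary_user.get("id"),
        "secondary_screen_name": secondary_user.get("screen_name"),
        "secondary_text": secondary.get("text"),
    }


events = [flatten_event(e) for e in json.loads(SAMPLE_EXPORT)]
```

Keeping the secondary fields as `None` rather than omitting them gives every record the same columns, which makes later tabulation simpler.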

The ‘profiles’ dataset recorded data pertaining to each user

ID present in the dataset: the name, screen name, ID number,

numbers of tweets, followers and following, and the user

description and location (that given manually in the profile

data, not that recorded automatically by GPS technology).

Due to the nature of the data, there were two variants of this

user profile data. Tweeting or retweeting logged full profile

data, whereas favouriting or following only logged user ID

and screen name. These were termed ‘full users’ and ‘short

users’ respectively, and the dataset showed a 60/40

distribution of the former and latter.

Analysis

The first analytical step was to create a network visualisation

of the dataset. An initial decision was made to generate a

network based on active interactions between users –

tweeting, retweeting, liking, following – rather than simply

on all of a user’s followers. This was because, in the latter

case, many of the connections may be latent or dormant, and

may not engage actively with the user. Since the public

engagement that is under scrutiny is an active process of

communication, dormant followers were not felt to be

relevant. For similar reasons, Yang and Counts [16]

conclude that an analysis of “the interaction network, rather

than the follower network, is preferable for network analyses

of Twitter”; this was therefore the approach taken. To this

end, the profiles dataset provided the set of nodes for the

network: each unique user ID formed a node. Edges in the

graph, the lines connecting the nodes, were generated

separately, by recording each occurrence of a primary user

and secondary user, or a primary user and a user mentioned

in the text of the tweet. Multiple occurrences gave added

weight to the connection, represented in the graph by the

thickness of the line between the nodes. Edges were

directed, from the primary user to the secondary.
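A minimal sketch of this edge-generation step is given below, assuming each event has already been reduced to a primary user, an optional secondary user and a list of mentioned users. The key names and sample users are illustrative assumptions rather than the study's actual data layout.

```python
from collections import Counter


def build_edges(events):
    """Count directed (source, target) pairs; the count serves as the edge weight."""
    weights = Counter()
    for event in events:
        targets = list(event.get("mentions", []))
        if event.get("secondary_user") is not None:
            targets.append(event["secondary_user"])
        for target in targets:
            # Each repeat occurrence of the same pair thickens the edge.
            weights[(event["primary_user"], target)] += 1
    return weights


events = [
    # alice retweets bob
    {"primary_user": "alice", "secondary_user": "bob", "mentions": []},
    # alice mentions bob and carol in an original tweet
    {"primary_user": "alice", "secondary_user": None, "mentions": ["bob", "carol"]},
    # carol retweets alice
    {"primary_user": "carol", "secondary_user": "alice", "mentions": []},
]
edges = build_edges(events)
```

In this toy data the pair (alice, bob) occurs twice, so it would be drawn as a thicker line than the other two edges.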

The visualisation was created using the software package

Gephi. After the data were imported, a Fruchterman-

Reingold algorithm was run to achieve a circular layout of

the nodes, in which the participants were clearly visible as

dominant nodes. The Fruchterman-Reingold is a force-

directed approach to graph visualisation, in which a

combination of attractive and repulsive forces operates on all

nodes in order to position them according to aesthetic

principles [7][11]. A modularity analysis was run on the data

to detect communities mathematically within the overall

network. Finally, nodes were sized according to the in-

degree measurement, a count of the number of edges

connected to them in an inward direction (in other words, the

number of times they are mentioned in another tweet, or

retweeted by another user, or their tweet favourited by

another user, etc.). The bigger the node, the more activity

there is by others in which that user appears. This measure

was chosen in order to base the importance of the user not

on their own productivity, but on the activity of others

referring to them.
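For illustration, in-degree can be computed directly from a weighted edge list, as in the sketch below. The text does not state whether repeated interactions were counted once or by weight, so both variants are shown; the sample edges are invented.

```python
from collections import defaultdict


def degree_scores(edge_weights):
    """Return (in_degree, out_degree, weighted_in_degree) for a directed graph.

    edge_weights maps (source, target) to the number of interactions.
    """
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    weighted_in = defaultdict(int)
    for (source, target), weight in edge_weights.items():
        out_degree[source] += 1        # one distinct outgoing edge
        in_degree[target] += 1         # one distinct incoming edge
        weighted_in[target] += weight  # every interaction counted
    return dict(in_degree), dict(out_degree), dict(weighted_in)


edge_weights = {
    ("alice", "bob"): 2,
    ("alice", "carol"): 1,
    ("carol", "alice"): 1,
    ("dave", "bob"): 3,
}
in_degree, out_degree, weighted_in = degree_scores(edge_weights)
```

Here bob has the largest in-degree under either count, despite never initiating an interaction himself, which is the property the node sizing is meant to capture.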

The resultant graph visualisation can be seen in figure 3.

Clusters of nodes (i.e. Twitter users) are visibly delimited

from one another. Modularity class is denoted by colour,

while the size of the nodes is determined by in-degree score.

Each of these two measurements will be discussed in turn.

Node colour: Modularity Class

Modularity analysis produces communities whose members

have stronger connections between each other than to the

network as a whole. In some cases this coincides with both

a single PA’s data and a visually delimited cluster, as with class

3 (PA6), class 4 (PA4), class 7 (PA9) and class 8 (PA5).

Other cases are more complicated. PA1 and PA7 are

combined in class 6, PA2 and PA3 in class 2, and PA10,

PA11 and PA8 in class 5. PAs 2 and 3 were contributed by

the same individual, as were PAs 10 and 11; this would

explain their relationship. This is not true of PAs 1 and 7

and PAs 10/11 and 8. PAs 10, 11 and 8 are accounts

associated with the University of Nottingham, and their

shared class 5, which is located relatively centrally in the

overall graph, also contains 22 of the 27 accounts present in

the data whose names contain “UoN” or “UniOfNott”,

including the official University of Nottingham account

(figure 4). We may perhaps characterise class 5 therefore as

most directly representing the public face of the University.

PAs 1 and 7 are both significant propagators of the hashtag

#synbio, which may be an indication of a shared disciplinary

focus.
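To make the idea of a modularity class more concrete, the sketch below scores a given partition with the standard modularity measure Q, which compares the share of edge weight falling inside communities with the share expected by chance. It is a simplification under stated assumptions: the graph is treated as undirected, the example graph is invented, and the code only evaluates a partition rather than searching for one as the visualisation software does.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight); community: dict mapping node to label.
    Q = sum over communities of (internal weight / m) - (degree sum / 2m) ** 2,
    where m is the total edge weight.
    """
    total = 0.0
    internal = defaultdict(float)
    degree_sum = defaultdict(float)
    for u, v, w in edges:
        total += w
        degree_sum[community[u]] += w
        degree_sum[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    if total == 0:
        return 0.0
    return sum(
        internal[c] / total - (degree_sum[c] / (2 * total)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridging edge (c, d).
edges = [
    ("a", "b", 1), ("b", "c", 1), ("c", "a", 1),
    ("d", "e", 1), ("e", "f", 1), ("f", "d", 1),
    ("c", "d", 1),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Splitting the toy graph at the bridge gives a positive score while lumping everything together gives zero, which is why densely interlinked accounts end up sharing a class.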

Interestingly, a small independent community has emerged

within the data of PA12, and been detected as a separate

modularity class (class 1) (figure 5). This community seems

connected to the phenomenon of looped edges. This occurs

when the primary user of an event also appears in the text of

the event as a mention (using the @ addressivity convention

of Twitter), creating an edge whose Source and Target are

the same node. In other words, it happens when a user

retweets a tweet in which they were mentioned. Figure 6

shows all nodes with a looped edge highlighted in green.

Excluding PAs, there are 94 nodes with looped edges, and

almost three-fifths of them (54) are situated in classes 0 and

1 combined (i.e. in the data from PA12).

Figure 3. The complete network graph visualisation.

Coloured sections are the modularity classes (communities)

detected by the software’s statistical calculations. Node size is

determined by in-degree score. Nodes labelled PA1 to PA12 are the participant accounts.

Figure 5. Detail of classes 0 and 1, both of which originate in

data from PA12.

Figure 6. Nodes with looped edges highlighted in green,

noticeably clustering around PA12.
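As a small illustration of how looped edges arise, the sketch below flags any user who is both the primary user of an event and one of the users mentioned in it, then checks the proportion reported above. The event records and user names are invented for the example.

```python
def looped_nodes(events):
    """Users who appear as primary user and as a mention in the same event."""
    return {
        e["primary_user"] for e in events if e["primary_user"] in e["mentions"]
    }


events = [
    # An original tweet mentioning two users: no loop for the author.
    {"primary_user": "x", "mentions": ["carol", "dave"]},
    # Each mentioned user retweets it; the retweeted text still names them.
    {"primary_user": "carol", "mentions": ["carol", "dave"]},
    {"primary_user": "dave", "mentions": ["carol", "dave"]},
]
looped = looped_nodes(events)

# Share of non-PA looped-edge nodes found in classes 0 and 1 (54 of 94).
share_in_classes_0_and_1 = 54 / 94
```

The share works out at roughly 0.57, consistent with the description "almost three-fifths".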

We may postulate that this clustering of looped-edge nodes

may come about as a combination of both mentioning and

retweeting. If a user x mentions multiple other users in a

single tweet, and if each of them retweets it, they will all

obtain looped edges. To investigate this possibility,

measures were developed of the number of mentions made

by each PA (rather than mentions of them by others) and of

the proportion of retweets in their data. A subset of the data

was compiled for each PA, incorporating only those events

in which the PA was the primary user. Events involving

favouriting or following were omitted. The number of items

in the ‘mentions’ field of each event was counted, and the

total divided by the number of events to produce a ratio. For

the retweets, a count was made of all the event types in the

data of each PA, and the counts for the three retweet types

summed and returned as a percentage of the total number of

events. The resulting graph is shown in figure 7.
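The two measures can be sketched as below. This is an illustration under assumptions: the event type names, including the three retweet types, are placeholders, since the actual labels used in the data are not listed in the text.

```python
# Placeholder labels: the real event type names are not given in the text.
RETWEET_TYPES = {"retweet", "retweeted", "retweeted_retweet"}
NON_TWEET_TYPES = {"favourite", "unfavourite", "follow", "unfollow"}


def mentions_per_event(events, pa):
    """Mean number of mentions in events authored by the PA itself."""
    own = [
        e for e in events
        if e["primary_user"] == pa and e["type"] not in NON_TWEET_TYPES
    ]
    if not own:
        return 0.0
    return sum(len(e["mentions"]) for e in own) / len(own)


def retweet_percentage(events):
    """Retweet-type events as a percentage of all events in a PA's data."""
    if not events:
        return 0.0
    retweets = sum(1 for e in events if e["type"] in RETWEET_TYPES)
    return 100.0 * retweets / len(events)


pa12_events = [
    {"type": "tweet", "primary_user": "pa12", "mentions": ["a", "b", "c"]},
    {"type": "tweet", "primary_user": "pa12", "mentions": ["a"]},
    {"type": "retweeted", "primary_user": "a", "mentions": ["pa12", "b", "c"]},
    {"type": "favourite", "primary_user": "b", "mentions": []},
    {"type": "retweet", "primary_user": "pa12", "mentions": []},
]
ratio = mentions_per_event(pa12_events, "pa12")
percentage = retweet_percentage(pa12_events)
```

Note that the two measures use different denominators: the mentions ratio is taken over the PA's own tweets and retweets only, whereas the retweet percentage is taken over every event in that PA's data.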

Figure 7. Comparison of PA mentioning behaviour and

retweets received.

Figure 7 shows a correspondence between the two ratios for

PA12. It is possible that this combination of multiple

mentioning and high retweet volume gives rise to the

emergence of modularity class 1 as a separate community.

However, figure 7 also shows a similar correspondence for

PAs 1, 9, and, to a lesser extent, 10, and none of these show

the development of subcommunities; therefore some other

factor must be involved in the process, and it cannot be

explained simply by the combination of mentioning and

retweets.

Figure 7 axes and legend: PA number (1 to 12) on the horizontal axis; series are mentions per event, and retweets as a proportion of PA dataset, multiplied by a factor of 10.

Figure 4. University of Nottingham-associated Twitter

accounts highlighted in yellow (one not included for reasons

of space). Modularity class 5 is outlined in red.

Node size: Degree

In graph theory, the term degree is used to refer to the

number of edges that connect to a node. When a graph is

directed, that is, when the edges represent a directional

relationship such as a message from a sender to a receiver,

degree can be subdivided into out-degree and in-degree. A

person sending an email to another would create a graph of

two nodes connected by one edge directed from sender to

receiver. For the present purposes, the ‘sender’ is regarded

as the composer of a tweet (the primary user in the data),

while the ‘receiver(s)’ is/are any user(s) who are

@mentioned in the Tweet (mentioned users in the data)

and, where relevant, the composer of material reproduced in

the tweet (the secondary user). Therefore, out-degree is a

measure of how many other users a node has mentioned or

cited, while in-degree counts how many users have

mentioned or cited the node. This latter metric was regarded

as being a better indication of public engagement, as it

showed others engaging with the node.

In-degree

The nodes in the graph shown in figure 3, then, are sized

according to their in-degree score. PA nodes are in most

cases the largest by some orders of magnitude; this is

expected due to the method of data-collection, in which only

tweets by, or pertaining to, PAs are collected. However, PAs

2, 7 and 10 are significantly smaller than the others,

indicating that these users were not mentioned or retweeted

very often. Their small size may be a contributing factor in

their being subsumed by larger modularity classes.

The node with the highest in-degree score is PA12, and other

larger-than-average nodes can be seen clustered around

PA12 in classes 0 and 1. This is another feature not seen in

the other classes, in which degree scores are uniformly low

except for the central PA. As in-degree score is related to

both mentioning and retweeting, this is clearly related to the

phenomena discussed in the previous section regarding

classes 0 and 1.

Out-degree

A different perspective on the network can be gained by

sizing nodes by out-degree. As described above, this

represents the extent to which a node mentions or cites other

users. By this measurement, PA1 is the largest node by a

substantial amount, as shown in figure 8.

Figure 8. PA nodes resized by out-degree score

The predominance of PA1 with regard to out-degree

corresponds with a measurement of the connections between

PA nodes and nodes of other modularity classes. PA1

connects to four other classes by a total of 24 nodes: 2, 2, 8

and 12, a median of 5 per class. PA3 also connects to four

others, but only with a median of 1 node per class, and PA4

connects to five other classes, but only with a median of 1.5.

PA12 shares 39 nodes with class 1, and connects to 3 others,

but with a median of 2, despite the strong link to class 1.

PA1’s higher degree of connectivity, coupled with a high

out-degree score, is perhaps indicative of a certain

eclecticism, suggesting that PA1’s Twitter activity is

relevant to a variety of other areas within the University. It

may also represent an effort on the part of PA1 to engage

others, and therefore be evidence of an intention of public

engagement.
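The cross-class measurement can be illustrated with the sketch below, which counts the distinct nodes a PA connects to in each modularity class other than its own and then summarises the counts with a median. The small graph and class labels are invented; only the PA1 figures (2, 2, 8 and 12) come from the text.

```python
from collections import defaultdict
from statistics import median


def cross_class_links(pa, edges, node_class):
    """Number of distinct neighbours of `pa` in each class other than its own."""
    neighbours = defaultdict(set)
    for source, target in edges:
        if source == pa and target != pa:
            other = target
        elif target == pa and source != pa:
            other = source
        else:
            continue
        if node_class[other] != node_class[pa]:
            neighbours[node_class[other]].add(other)
    return {cls: len(nodes) for cls, nodes in neighbours.items()}


# Invented example graph: pa1 sits in class 6.
node_class = {"pa1": 6, "x": 2, "y": 2, "z": 5, "w": 6}
edges = [("pa1", "x"), ("y", "pa1"), ("pa1", "z"), ("pa1", "w"), ("pa1", "x")]
links = cross_class_links("pa1", edges, node_class)

# Figures reported for PA1 in the text.
pa1_reported = [2, 2, 8, 12]
pa1_total = sum(pa1_reported)
pa1_median = median(pa1_reported)
```

With an even number of classes the median is the mean of the two middle values, so 2 and 8 give the reported median of 5, and the four counts sum to the reported 24 nodes.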

Information Diffusion

The next step in the analysis was to examine the data for

signs of information diffusion, taken as a sign of public

engagement. It is questionable to what extent hashtags can

be reliably expected to signal the content of a tweet, and

Twitter users expected to signpost their content with explicit,

appropriate hashtags. There are many potential motivations

for hashtag use, ranging from the purely deictic to the ironic.

A study of Twitter by Zappavigna [17] features a tweet

composed almost entirely of complex hashtags, which,

rather than being used as the content-aggregating tag that is

their ostensible purpose, are used ironically to indicate the

thoughts and feelings of the user. However, it was felt in the

current study that Twitter users deliberately trying to initiate

a discussion around a topic were likely to use hashtags from

a perception of the convention as “a well-known practice on

Twitter” [9]. Furthermore, it was felt that an analysis of

hashtags could provide a measurement of the success or

otherwise of public engagement: a hashtag used frequently

by one user in an attempt to stimulate discussion, but not

picked up and used by others, could be regarded as

unsuccessful, while one used multiple times by multiple

users could be considered to have successfully diffused.

Hashtags were therefore retrieved from the dataset, along

with counts of the number of times they appear and the

number of users invoking them. To show the number of

users engaged per use of the hashtag, the formula u / f was

initially used, where u is the number of users and f is the

frequency of occurrence. However, this proved

unsatisfactory, as it naturally gave the same weight to a

hashtag used 100 times by 100 people, and one used once by

one person. Furthermore, as it is always true that u ≤ f (a

hashtag cannot be used by two people but only appear once),

the score x = u / f satisfies 0 < x ≤ 1, with all instances of

u = f grouped together at the maximum end of the range. To

counteract this, the number of users was squared, giving the

formula u² / f. This foregrounded the importance of the

users, which was considered to be a better indication of

uptake and therefore engagement, while eliminating x = 1

values for u = f, except in the case u = 1, f = 1; as these

were located in the middle of the range they did not interfere

in the analysis. The chosen derived metric further ensured

that the bottom of the table was occupied by hashtags used

an increasing number of times by only one user. Figure 6

shows a list of the top twenty hashtags ranked by this metric.

Figure 6. Top twenty hashtags in the dataset, ranked by u² / f

score.
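
The ranking described above can be illustrated with a minimal sketch. It assumes the hashtag uses are available as (user, hashtag) pairs, which is a hypothetical input shape rather than the study's actual data format; the function name and the toy data are likewise invented for illustration.

```python
from collections import defaultdict


def score_hashtags(uses):
    """Rank hashtags by u**2 / f.

    `uses` is an iterable of (user_id, hashtag) pairs, one pair per
    appearance of a hashtag in the data (a hypothetical input shape).
    """
    frequency = defaultdict(int)   # f: how many times each hashtag appears
    users = defaultdict(set)       # distinct users invoking each hashtag
    for user_id, hashtag in uses:
        frequency[hashtag] += 1
        users[hashtag].add(user_id)

    rows = []
    for hashtag, f in frequency.items():
        u = len(users[hashtag])
        rows.append({"hashtag": hashtag, "u": u, "f": f,
                     "u_over_f": u / f, "u2_over_f": u * u / f})
    # Highest u**2 / f first; ties are broken alphabetically for stability.
    rows.sort(key=lambda r: (-r["u2_over_f"], r["hashtag"]))
    return rows


# Toy data (invented for illustration, not taken from the study).
toy_uses = (
    [(f"user{i}", "#wide") for i in range(100)]   # 100 users, once each
    + [("user0", "#once")]                         # one user, one use
    + [("user0", "#solo")] * 10                    # one user, ten uses
)
ranked = score_hashtags(toy_uses)
for row in ranked:
    print(row["hashtag"], row["u"], row["f"], row["u_over_f"], row["u2_over_f"])
```

In this toy case u / f cannot separate "#wide" from "#once" (both score 1), whereas u² / f places the widely used hashtag at the top and the hashtag repeated by a single user at the bottom.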

An exploration of the hashtags suggests a distinction

between hashtags that are related to a conversation around a

topic, and those which are related to a specific event. As

might be expected, the latter show peaks in usage over the

time of the event (for example figure 7), while the former

show more constant usage (figure 8), although where they

coincide with events and are used in conjunction with the

corresponding event hashtags, they too show peaks.

Figure 7. Frequency over time of event hashtag #microbio16.

Figure 8. Frequency over time of topic hashtag #synbio, with

peaks caused by related events.

From the top twenty list, five hashtags were manually chosen

to cover a range of subjects and include both topic and event

hashtags. These were mapped onto the network

visualisation by tagging their use onto the profile data of

accounts using them. The result can be seen in figure 9.

Figure 9. Hashtags mapped onto network graph visualisation.

Colours and tags shown below (‘False’ indicates nodes not

using any of the five sample hashtags):

A notable effect in the resulting image is that successful

hashtags seem to be spread in partnership with other nodes

of a higher than average in-degree. An inspection of the

distribution of nodes using the hashtag #hearingloss shows a

number of larger nodes around the central PA node clearly

distinguishable. An exception to this pattern is the hashtag

#microbio16, which is used in a community in which the

nodes around the central PA have similarly insignificant in-

degree scores. #microbio16 is an event hashtag, and it can

be hypothesised that the attendees of the event represented

in the data have only the connection with the PA in common

and are therefore not indexing each other (which would lead to

higher in-degree scores).
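
As a rough sketch of how this observation could be checked numerically rather than by visual inspection, the snippet below counts in-degree from a list of directed edges and picks out the hashtag users whose in-degree exceeds the network mean. The edge direction (source refers to target), the function names and the toy network are assumptions made for illustration; the study itself relied on the graph visualisation.

```python
from collections import Counter


def in_degrees(edges):
    """Count incoming edges per node from (source, target) pairs."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 0   # ensure nodes with no incoming edges are counted
        degree[target] += 1
    return degree


def above_average_users(edges, hashtag_users):
    """Return hashtag users whose in-degree exceeds the network mean."""
    degree = in_degrees(edges)
    if not degree:
        return []
    mean = sum(degree.values()) / len(degree)
    return sorted(node for node in hashtag_users if degree[node] > mean)


# Toy network (invented): three accounts refer to "pa"; "a" is also referred to.
toy_edges = [("a", "pa"), ("b", "pa"), ("c", "pa"), ("pa", "a"), ("b", "a")]
toy_hashtag_users = {"a", "b", "pa"}
prominent = above_average_users(toy_edges, toy_hashtag_users)
print(prominent)
```

A hashtag such as #microbio16, where the surrounding nodes have low in-degree, would return few or no accounts other than the PA under this check.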

An analysis of the tweets containing specific hashtags

suggests that the success of a hashtag is a product of the

success of individual tweets within the set. This can be

observed by looking at the secondary Tweet ID numbers of

events, which show exactly which prior tweet or retweet is

retweeted or favourited. Taking the hashtag #synbio as an

example, the data contain 120 events in which the hashtag

appears. 15 of these contain no secondary Tweet data (i.e.

they do not refer to another event). The remaining 105

contain 24 different secondary Tweet IDs (20% of the

number of events) which appear multiple times, as shown in

figure 10.

A similar analysis of the #privacy hashtag, judged to be less

successful by its u² / f score, shows that out of 44 events, 43

contain 18 different secondary Tweet IDs (41% of the

number of events), and that multiple take-up is much lower

(figure 11).
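
The comparison above can be sketched in code as follows. The record layout (a dict with a "secondary_id" key that is None when an event refers to no other tweet) and the function name are hypothetical, chosen only to illustrate the counting; they are not the study's data schema.

```python
from collections import Counter


def take_up_summary(events):
    """Summarise take-up of individual tweets within one hashtag's events.

    `events` is a list of dicts with a hypothetical key "secondary_id"
    holding the ID of the tweet being retweeted, quoted or favourited,
    or None when the event does not refer to another tweet.
    """
    referring = [e["secondary_id"] for e in events
                 if e["secondary_id"] is not None]
    counts = Counter(referring)          # events per secondary Tweet ID
    distinct = len(counts)
    return {
        "events": len(events),
        "non_referring": len(events) - len(referring),
        "distinct_secondary_ids": distinct,
        # Distinct IDs as a share of all events: a lower share means the
        # same few tweets are being taken up repeatedly.
        "distinct_share": distinct / len(events) if events else 0.0,
        # How many secondary IDs attracted 1, 2, 3, ... events.
        "responses_histogram": dict(sorted(Counter(counts.values()).items())),
    }


# Toy events (invented): tweet "t1" is taken up three times, "t2" once,
# and one event refers to no other tweet.
toy_events = [
    {"secondary_id": "t1"}, {"secondary_id": "t1"}, {"secondary_id": "t1"},
    {"secondary_id": "t2"}, {"secondary_id": None},
]
summary = take_up_summary(toy_events)
print(summary)
```

Applied to the figures reported above, the share of distinct secondary Tweet IDs is 24 / 120 = 0.20 for #synbio and 18 / 44 ≈ 0.41 for #privacy, so the #synbio events are concentrated on proportionally fewer tweets.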

Figure 10. Events with multiple responses in the #synbio data

subset. Y axis represents number of secondary Tweet IDs.

Figure 11. Events with multiple responses in the #privacy

data subset, showing much lower uptake.

This analysis, whilst being only performed on two specific

cases, suggests that hashtag success is determined by take-

up (retweeting, quoting and favouriting) of individual

tweets. However, it must be remembered that the data

collection method only captures data in which participants

are directly mentioned, and this heavily weights the dataset

towards these behaviours. Original, non-referential tweets

that nevertheless use the target hashtag are not collected, and

therefore will not appear in this analysis.

DISCUSSION

Evidence of public engagement

This project has presented a number of phenomena that

might be indicative of public engagement via Twitter. The

clustering of high in-degree loop-edged nodes around the

high in-degree PA12 is indicative of a great deal of

mentioning and retweeting going on in this cluster, and the

emergence of a completely new non PA-centered modularity

class (class 1), possibly as a result, might be interpreted as

the development of a self-sustaining community outside the

academic core. Various assumptions of what constitutes

‘inside’ and ‘outside’ are called into question, however.

Fields such as healthcare have an academic component, but

also a major practice-led industry component. To what

extent can enthusiastic amateurs be regarded as inside or

outside, and on what grounds? What about retired

academics? The detection of information diffusion in social

networks has a proven history, but identifying a ‘public’ and

distinguishing it in the data is needed before public

engagement can easily be detected and measured.

Citizen-centric social media research

This project has been an exercise in conducting social media

research in a way that does not intrude upon the privacy of

social media users who have not given their consent to

participate in the study. To this end the analyses here

presented have been deliberately impersonal, and have not

sought qualitative explanations for the observed phenomena

in the personal data of the users. Explanatory conclusions

have therefore been limited, and further research might focus

on the people behind the nodes in order to answer questions

regarding who is professional and who is public.

Constructing the data collection in a way that conforms to

citizen-centric principles has led to several features of the

dataset and its analysis:

PA-Centered Data

The most obvious consequence is that the data gathered by

this method are heavily weighted towards the participants

and their network communities. The data are user-centered;

they only begin with the first use by the participants of the

web tool, and the set only contains events that make

reference to the user. These two properties mean that the full

picture of a particular hashtag and its propagation is not

available in the dataset. Any discussion going on around a

hashtag but not directly referencing PA data is simply not

picked up. To conduct a wider study collecting all uses of a

hashtag over its entire lifespan would be likely to contravene

the principles of citizen-centric social media research.

A potential solution to this lies in the guidelines for internet

research issued by the Association of Internet Researchers

(AoIR), in which a contextual, case-by-case approach is

advised when determining the ethical risks inherent in a

project [10]. Where an initial study such as the present one

indicates a particular hashtag that clearly represents a

concerted attempt at public engagement, there is perhaps

scope to argue that a subsequent, wider study is ethically

justified because there is a clear expectation that

the data would be public, and that it could therefore be considered

for analysis without explicit consent on the grounds that

“(h)uman subject research norms such as informed consent

do not apply to public, published material” [15].

Gaps in the Data

A second reflection on the methodology of the study is that

detecting users within and outside the academic community

did not prove viable. While the hashtag analysis gave some

indication of the popularity of certain tags and topics,

identifying a non-academic audience within the users would

require a more rigorous analysis of the user data.

Particularly, a large-scale computational linguistic analysis

of the user descriptions might yield patterns that could

predict if a user could be considered non-academic. This

was beyond the scope of this study, however, and in any case

as 40% of the user data were represented only by their

Twitter ID number and screen name, any analysis of this

nature on the current dataset would be incomplete.

Finally, the difficult question remains that despite efforts to

conduct the data collection in a manner grounded in the

principles of citizen-centric social media research, it was

unavoidable that data was collected on Twitter users who

engaged in interactions with the study’s participants, but

who did not themselves give consent. The data only afforded

glimpses into their accounts, as opposed to the full access

provided by the consenting participants, but nevertheless,

personal data was collected.

CONCLUSION

This study was an exploration of public engagement through

social media by members of staff at the University of

Nottingham. Using network graph visualisation software,

the web of interactions formed by the Twitter data provided

by consenting participants was examined for patterns of

information diffusion, and an analysis of hashtags carried

out that suggested certain characteristics of successful

engagement. However, conclusions could not be drawn

regarding engagement with external communities, as

the data collection method left gaps in the data that rendered some

forms of analysis impractical. Prior to carrying out research

of this nature again, work needs to be done to define a

‘public’ and consider how it may be detected in Twitter data.

With this in place, further work refining and testing the

necessarily tentative and small-scale findings of this study

could have the potential to demonstrate information

diffusion into the public sphere. On a broader scale, further

research can be done to consider how to couple in-depth

research to suggest hypotheses with wider scale quantitative

research that foregrounds the right to privacy of social media

users.

REFERENCES

1. Maeve Duggan. 2015. The Demographics of Social

Media Users. Pew Research.

2. ESRC. 2015. ESRC Framework for Research

Ethics. January: 1–51. Retrieved from

http://www.esrcsocietytoday.ac.uk/about-esrc/information/framework-for-research-ethics/index.aspx

3. European Commission. 2012. Responsible

Research and Innovation: Europe’s ability to

respond to societal challenges.

http://doi.org/10.2777/11739

4. Gunther Eysenbach. 2011. Can Tweets Predict

Citations? Metrics of Social Impact Based on

Twitter and Correlation with Traditional Metrics of

Scientific Impact. Journal of Medical Internet

Research 13, 4: e123.

http://doi.org/10.2196/jmir.2012

5. Casey Fleischman. 2015. UK Digital, Social and

Mobile Statistics for 2015. smlondon. Retrieved

from http://socialmedialondon.co.uk/digital-social-mobile-statistics-2015/

6. Mark S Frankel, Sanyin Siang, Scientific Freedom,

Law Program, and Policy Programs. 1999. Ethical

and Legal Aspects of Human Subjects Research on

the Internet. Advancement Of Science, November:

20. Retrieved from

http://www.aaas.org/spp/sfrl/projects/intres/report.pdf

7. Thomas M. J. Fruchterman and Edward M.

Reingold. 1991. Graph Drawing by Force-directed

Placement. Software-Practice and Experience 21,

11: 1129–1164.

http://doi.org/10.1002/spe.4380211102

8. Adrien Guille, Hakim Hacid, Cécile Favre, and

Djamel A. Zighed. 2013. Information diffusion in

online social networks. ACM SIGMOD Record 42,

2: 17–28. http://doi.org/10.1145/2503792.2503797

9. Julie Letierce, Alexandre Passant, Stefan Decker,

and John G Breslin. 2009. Understanding how

Twitter is used to spread scientific messages.

October: 8. Retrieved from

http://journal.webscience.org/314/

10. Annette Markham and Elizabeth Buchanan. 2012.

Ethical Decision-Making and Internet Research:

Recommendations from the AoIR Ethics Working

Committee (Version 2.0). Retrieved from

www.aoir.org

11. Michael J. McGuffin. 2012. Simple algorithms for

network visualization: A tutorial. Tsinghua Science

and Technology 17, 4: 383–398.

http://doi.org/10.1109/TST.2012.6297585

12. Richard Owen, Phil Macnaghten, and Jack Stilgoe.

2012. Responsible research and innovation: From

science in society to science for society, with

society. Science and Public Policy 39, 6: 751–760.

http://doi.org/10.1093/scipol/scs093

13. Alan C Regenberg. 2010. Tweeting science and

ethics: Social media as a tool for constructive

public engagement. The American Journal of

Bioethics 10, 5: 30–31.

http://doi.org/10.1080/15265161003743497

14. Research Councils UK. Pathways to Impact.

Retrieved from

http://www.rcuk.ac.uk/innovation/impacts/

15. Helene Snee. 2012. Making ethical decisions in an

online context: Reflections on using blogs to. 8:

52–67. http://doi.org/10.4256/mio.2013.013

16. Jiang Yang and Scott Counts. 2009. Predicting the

Speed, Scale, and Range of Information Diffusion

in Twitter. Fourth International AAAI Conference

on Weblogs and Social Media: 355–358.

17. M. Zappavigna. 2011. Ambient affiliation: A

linguistic perspective on Twitter. New Media &

Society 13, 5: 788–806.

http://doi.org/10.1177/1461444810385097

18. Yimei Zhu and Rob Procter. 2012. Use of blogs,

Twitter and Facebook by PhD students for

scholarly communication: a UK study. 2012 China

New Media Communication Association Annual

Conference, Macao International Conference 9: 1–19.
