Mapping the University’s Social Media Footprint
Andrew Moffat
University of Nottingham
Nottingham, UK
ABSTRACT
This is a report of a study run to explore the use of social
media for public engagement by members of staff of the
University of Nottingham. The study was run as an exercise
in carrying out social media research in a responsible way,
foregrounding the rights to privacy of social media users.
Data was collected from fully consenting study participants,
who contributed the data from their Twitter accounts for
analysis while running a web tool designed to help Twitter
users monitor and manage their Twitter interactions. A
network graph visualisation was created from the data, and
different user communities identified. A study of hashtag
propagation was carried out, and certain characteristics of
successful hashtags noted. Finally, reflections were made on
the nature of social media analysis conducted in this way.
Author Keywords
Social media analysis, public engagement, Twitter, citizen-
centric research, RRI.
INTRODUCTION
Public engagement is an increasingly important part of the
work of an academic researcher. The European
Commission’s Responsible Research and Innovation (RRI)
framework, published in 2012, sets out six keys designed to
‘foster public engagement and a sustained two-way dialogue
between science and civil society’ [3]. RRI aims to embed
scientific research more deeply within society and societal
issues and concerns in order to avoid the kinds of disjuncts
between scientific research and public opinion that have
emerged in recent decades with regard to ethically sensitive
matters such as genetic modification [12]. This discursive
approach to science in society is reflected in the expectation
of research funding bodies that researchers will demonstrate
a clearly defined conduit for communicating their work to
the public, and for measuring the efficacy of that
communication. Research Councils UK, who characterise
public engagement discourse as “Pathways to Impact”, state
that: “A clearly thought through and acceptable Pathways to
Impact is an essential component of a research proposal and
a condition of funding.” [14].
The principal aim of the present study is to detect signs of
successful public engagement through Twitter. The
secondary aim is to consider the methodological
implications of the ethically-driven approach on the efficacy
of analysis. Use of network science and graph theory will
be made in the study in order to analyse the network of
interactions present in the Twitter data provided by
participants. Each of these concepts will now be discussed
in turn.
Public Engagement through Social Media
As a tool for communicating to a wide audience, social
media networks clearly have a lot to offer. With an ever-
increasing proportion of the nation’s populace using social
media to interact, not only with their immediate social
groups, but with wider society more generally (#smlondon
gives a 6% increase in active social media accounts in the
UK from 2014 to 2015) [5], the potential to spread ideas and
engage in meaningful dialogue around emerging research
themes is enormous. However, practices and attitudes
towards social media vary widely among academics, who
are, in any case, typically not among the demographic
groups that drive adoption and development of social media
(a report by Pew Research on US social media usage shows
the 18-29 age group leading usage in all major social
networks except LinkedIn) [1].
The Nature and Measurability of Impact
In our data-rich world, the response to the need for proof of
public engagement has been the emergence of metrics that
can offer conveniently comparable numeric values to
evaluate impact. Citation count, h-index and impact factor
provide researchers with quantifiable proof of uptake of their
published work; however, these metrics only really give a
picture of impact within academia, and it is far more difficult
to detect and measure the extent to which scientific research
emerges from the academic community into the wider world.
Social media provide “outstanding opportunities” for public
engagement [13], and the potential to interact with a lay
audience, but there is also the potential for conversations to
remain within academic circles and fail to create true public
discourse despite an appearance of success.
Eysenbach shows that the level of Twitter activity linked to
the publication of a research paper correlates with the
number of citations of that paper, although he notes that
“[c]orrelation is not causation, and it is harder to decide
whether extra citations are a result of the social media buzz,
or whether it is the underlying quality of an article or
newsworthiness that drives both the buzz and the citations”.
[4] While the two things are very likely connected, it raises
the question of to what extent this is an example of social
media used as a conduit for public outreach, or social media
as a metric to gauge impact.
Letierce et al [9] present a study of the use of Twitter for
public engagement during three
different scientific conferences. After conducting a survey
of academics in their target community to investigate their
use of social media, the study captured Tweets containing
the official hashtag of each of the three events. Among the
users in their data they distinguished between hubs and
authorities, the former being prolific users of tweets
containing @ addressivity, and the latter being people
frequently mentioned by their @username. Users with high
scores in both categories seemed to be directly involved with
the event in question. A study of hashtags used in Tweets
during the conferences suggested a classification scheme of
seven categories: technical terms, events, domains,
applications, institute/people, documentation, and other.
This classification was used to try to determine whether or
not Tweeters were reaching communities external to their
own, and it was reasoned that technical terms would be
unlikely to spread beyond the expert community, while the
more high-level domains category could appeal to people
with only a general understanding of the field. However, the
paper does not make clear how users were judged to belong
to a ‘community’ or not (indeed, the term itself is not well
defined), and conclusions drawn in this regard seem more
like suppositions.
The Ethics of Social Media Analysis
The abundance of easily available data makes social media
an obvious place to conduct research. However, whereas in
the past the accessibility of data was in itself an indication of
its public nature, the blurring of public and private inherent
in computer-mediated discourse requires us to “question
whether the availability of information on the internet
necessarily makes this information ‘fair game’” [15]. Snee
considers in particular the ethical issues in studying blogs,
which, as a form of discourse, echo the intimate nature of a
personal diary, and yet access to them is unrestricted, ergo
public. In such cases it is not clear whether the data represent
published textual material, or data taken from human
subjects, and this author/subject distinction is crucial to the
ethical debate.
To what extent social media activity can be considered
public or private in nature is another contentious issue. The
ESRC Framework for Research Ethics states that “the public
nature of any communication or information on the internet
or through social media should always be critically
examined”, [2] and that users of computer-mediated
communication may not consider their comments as being
made fully in the public sphere. This is related to the idea of
‘psychological privacy’, in discussion of which Frankel and
Siang consider “a distinction between the public and private
domains based not on the accessibility of the data, but on the
psychological perception of the subjects with regard to the
information” [6].
Citizen-Centric Approaches to Social Media Analysis
(CaSMa) is a research group within the Horizon Digital
Economy Research Institute at the University of
Nottingham. CaSMa’s role is to explore the ethical
implications of social media research and data analysis, and,
more generally, the analysis without consent of large
volumes of personal data. Built upon the concept of personal
data as a new asset class, a valuable commodity that is
routinely traded and profited from, CaSMa seeks methods of
engaging in social media-based research that foreground the
ethical considerations discussed above, and avoid the ‘help
yourself’ approach to data collection that has underpinned
much social media analysis to date.
With this in mind, the current study was designed in such a
way as to collect data only from active participants, with
their express written consent. This resulted in a number of
methodological issues and features of the resulting dataset
that will be discussed later.
Network Science and Graph Theory
The relationships and/or interactions between members of a
social group can be conceptualised as a network, and
visualised as a graph. In terms of social media data, such a
network can show latent relationships, such as friends or
contacts (who may nevertheless rarely interact), or active
interactions, such as the sending of directed messages. In a
graph visualisation of a network, individuals are represented
by dots termed nodes; nodes which share some form of
relationship or interaction are connected by lines, termed
edges. Both nodes and edges can visually reflect attributes
of the individuals and relationships: frequency of interaction
between nodes can be shown in the thickness of the edge,
while some measure of the ‘importance’ of an individual can
be shown in the size of the node. Edges can be directional,
in the sense of regarding an interaction such as sending a
message as following a path from one individual to another.
The number of edges that terminate at a node (hence the
number of other nodes to which it is connected) gives it a
degree score, which in the case of directional edges can be
in-degree or out-degree. Algorithms run on the graph result
in a visual layout based on the principle that linked nodes
attract one another, while unlinked nodes repel. Statistical
tests run on the network result in a number of
measures, some of which will be considered later.
Given the existence of a network of interactions, we can
suppose that information will diffuse between nodes in a
process as summarised by Guille et al: “(i) a piece of
information carried by messages, (ii) spreads along the edges
of the network according to particular mechanics, (iii)
depending on specific properties of the edges and nodes.” [8]
The three stages of this process correspond with three main
questions behind the analysis of information diffusion in
network science, these being: “(i) which pieces of
information or topics are popular and diffuse the most, (ii)
how, why and through which paths information is diffusing,
and will be diffused in the future, (iii) which members of the
network play important roles in the spreading process?” [8]
The Nature of Public Engagement on Twitter
For the purposes of the current study, public engagement is
considered to be a form of information diffusion. If the
public is successfully engaged in discourse on a topic, we
might expect that topic to arise in a certain place or
community within the wider network, then spread to other
users over time. In information science this is referred to as
infection. Moreover, we would expect that the topic should
diffuse out of the original community, in order to
differentiate true public engagement from information
diffusion within academia: academics talking to other
academics. This necessitates some consideration of how a
community is defined, and how it might be detectable in the
data. If the goal of public engagement is to stimulate
discourse in wider society, then communities should be
defined in terms of their level of knowledge of, and
involvement with, the topic in question. These two are not
necessarily linked: one may have personal experience of
mental illness, but not have an expert knowledge of it. The
goal of creating the graph visualisation of the network is to
attempt to identify communities and observe information
diffusing into a wider context.
BACKGROUND
The current study was carried out within the wider context
of an existing project being conducted by researchers at
Horizon Digital Economy Research, University of
Nottingham. Some account of this project must be given to
justify the rationale for the current study, though the work
described below did not directly form part of the latter.
The wider context: POET
The Public Outreach and Engagement Tool (POET) project
being conducted by the CaSMa research team in the Horizon
Digital Economy Research Institute aims to explore ways in
which researchers can harness the potential of social media
for public engagement. To this end, interviews were
conducted with 13 academics within the University of
Nottingham to investigate their practices and attitudes
towards social media. Participants reported feeling a sense
of obligation to use Twitter as a means of communicating
their research, and increased pressure generally to
demonstrate public engagement in their work. The
interviews corroborated an earlier study [18] in finding a
general perception of Twitter as being better suited to
professional outreach, in contrast to the perception of
Facebook as a more personal platform. In particular, the
usefulness of Twitter as a communicative medium during
conferences, both by attendees and non-attendees who
nevertheless had an interest in the event, was expressed by 7
of the 13 participants.
Based on the findings of these interviews, the decision was
made to focus on Twitter as a platform for outreach, and to
develop a software tool to allow users to monitor and
manage their Twitter interactions. Participants in the
original interviews, as well as other members of staff
including some in administrative roles, were invited to
attend a participatory design session, in which their views
were solicited regarding the development of the tool.
Following this preparatory work, the current project was
conceived as an exploration of the use of Twitter for public
engagement by members of the University of Nottingham
staff, using data generated by a pilot implementation of this
tool.
The Tool
The tool uses the Twitter API to access the user’s Twitter
feed, and present the information graphically (figure 1). In
order to allow data to be extracted for analysis, there is an
option to ‘Export my data’, which downloads the Twitter
feed data as a compressed file in JSON (JavaScript Object
Notation) format.
Figure 1. The Twitter web tool. Events are visible as blobs
coloured according to event type, and the passage of time is
visible from left to right.
The current study is concerned with analysis of the data
collected through the use of the tool, rather than with the
experience of using the tool itself, so this brief description
will suffice for present purposes.
METHODOLOGY
After gaining approval for the study from the Ethics
Committee of the School of Computer Science at the
University of Nottingham, individuals were contacted who
had been involved with, or shown interest in, the POET
project during the earlier interview and participatory design
workshop stages. 11 individuals agreed to participate, two
of whom offered to provide data from two different Twitter
accounts of which they were the primary contributors, giving
a total of 13 accounts. 10 of the 11 individuals were met
with in person, at which time they were fully informed of the
procedure and their consent gained by their signing of the
consent form. In accordance with both the EPSRC policy
framework on research data and the Data Protection Act,
participants were explicitly asked for their consent to their
data being retained after the end of the study, solely for
purposes of replicating the study. All participants consented
to this. They were shown how to use the tool and download
their data, and how to install the freely-available AES Crypt
encryption tool, to ensure they could send their data securely
by email. The eleventh individual was guided through the
process via email exchange.
The study was designed to reflect the principles of personal
ownership of personal data espoused by CaSMa. Rather
than ‘crawl’ or ‘scrape’ the web for tweets (which in any
case has become problematic in recent years due to Twitter
changing its policies), the only data collected was that
provided directly by active, consenting study participants.
Data collection was planned for a four-week period, with
participants asked to provide four weekly instalments of
their data, to allow for an iterative approach to data
processing and analysis. The time frame and period were
broadly the same for all the participants, although, as the tool
started collecting their data from the first use, the starting
point for each participant was the initial meeting and was
therefore subject to some variation. In the event, technical
problems led to data from one participant being unavailable
for analysis, resulting in a final count of 12 accounts. In
what follows, the term participant will refer to the
individuals, and participant account, abbreviated to PA,
will refer to the accounts used for data collection.
The data gained via this method comprised the full Twitter
feed of each account provided by participants, as would be
visible on an account’s profile screen. It incorporated all
activity by the account holder, as well as all activity by other
users in which the account id appeared, such as retweets,
mentions, (un)favouriting and (un)following.
Figure 2. Total numbers of events of each type in the full
dataset.
In its raw form the data consisted of a number of text files in
JSON (JavaScript Object Notation) format. The data was
presented as a series of events, each event being one of the
coloured blobs visible in the web tool. A breakdown of the
numbers of different types of events in the full, final dataset
is shown in figure 2. The programming language Python
was used to extract data and assemble several different
datasets for analysis. The ‘events’ dataset recorded each
event, along with the type of event and its accompanying
data, which were considerable and operated on several
levels. Firstly, the unique Tweet ID number of the event, the
user id and screen name of the event’s initiator (referred to
as the ‘primary user’), the date and time of the event, any
text included, any hashtags and user accounts mentioned by
the @username convention, were recorded. Additionally,
where a secondary event was referenced (such as in a
retweet, for example), the same information regarding the
secondary event was recorded (‘secondary user’, ‘secondary
Tweet ID’, ‘secondary text’ etc.).
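The extraction step described above might be sketched as follows. This is an illustration only: the export format of the tool is not documented in this report, so the JSON layout and every field name used here (type, tweet_id, created_at, user, mentions, secondary and so on) are assumptions, as is the sample data.

```python
import json

# Hypothetical export: the real field names in the tool's JSON are not given
# in this report, so every key below is an assumption for illustration.
SAMPLE_EXPORT = """
[
  {"type": "tweet", "tweet_id": 101, "created_at": "2016-03-01T09:00:00",
   "user": {"id": 1, "screen_name": "pa_example"},
   "text": "Talk today #synbio with @colleague",
   "hashtags": ["synbio"], "mentions": ["colleague"]},
  {"type": "retweet", "tweet_id": 102, "created_at": "2016-03-01T10:30:00",
   "user": {"id": 2, "screen_name": "colleague"},
   "text": "RT @pa_example: Talk today #synbio with @colleague",
   "hashtags": ["synbio"], "mentions": ["pa_example", "colleague"],
   "secondary": {"tweet_id": 101, "text": "Talk today #synbio with @colleague",
                 "user": {"id": 1, "screen_name": "pa_example"}}}
]
"""


def flatten_event(event):
    """Turn one raw event into a flat record for the 'events' dataset."""
    record = {
        "event_type": event["type"],
        "tweet_id": event["tweet_id"],
        "timestamp": event["created_at"],
        "primary_user_id": event["user"]["id"],
        "primary_user": event["user"]["screen_name"],
        "text": event.get("text", ""),
        "hashtags": event.get("hashtags", []),
        "mentions": event.get("mentions", []),
    }
    # Retweets and similar events reference a secondary event; record the
    # same information about it where it is present.
    secondary = event.get("secondary")
    if secondary is not None:
        record["secondary_tweet_id"] = secondary["tweet_id"]
        record["secondary_user"] = secondary["user"]["screen_name"]
        record["secondary_text"] = secondary["text"]
    return record


events = [flatten_event(e) for e in json.loads(SAMPLE_EXPORT)]
```

Keeping the secondary fields optional reflects the description above, in which secondary information was recorded only where a secondary event was referenced.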
The ‘profiles’ dataset recorded data pertaining to each user
ID present in the dataset: the name, screen name, ID number,
numbers of tweets, followers and following, and the user
description and location (that given manually in the profile
data, not that recorded automatically by GPS technology).
Due to the nature of the data, there were two variants of this
user profile data. Tweeting or retweeting logged full profile
data, whereas favouriting or following only logged user ID
and screen name. These were termed ‘full users’ and ‘short
users’ respectively, and the dataset showed a 60/40
distribution of the former and latter. Analysis
The first analytical step was to create a network visualisation
of the dataset. An initial decision was made to generate a
network based on active interactions between users –
tweeting, retweeting, liking, following – rather than simply
on all of a user’s followers. This was because, in the latter
case, many of the connections may be latent or dormant, and
may not engage actively with the user. Since the public
engagement that is under scrutiny is an active process of
communication, dormant followers were not felt to be
relevant. For similar reasons, Yang and Counts [16]
conclude that an analysis of “the interaction network, rather
than the follower network, is preferable for network analyses
of Twitter”; this was therefore the approach taken. To this
end, the profiles dataset provided the set of nodes for the
network: each unique user ID formed a node. Edges in the
graph, the lines connecting the nodes, were generated
separately, by recording each occurrence of a primary user
and secondary user, or a primary user and a user mentioned
in the text of the tweet. Multiple occurrences gave added
weight to the connection, represented in the graph by the
thickness of the line between the nodes. Edges were
directed, from the primary user to the secondary.
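As a rough sketch of the edge-generation step, the following counts each directed pair of primary user and secondary or mentioned user, and uses the count as the edge weight. The field names and sample events are hypothetical. The text does not state whether a retweeted user who is also @mentioned in the same event counts once or twice; this sketch counts both occurrences, which is an assumption.

```python
from collections import Counter

# Hypothetical flattened events; field names are illustrative assumptions.
events = [
    {"primary_user": "a", "secondary_user": "b", "mentions": ["c"]},
    {"primary_user": "a", "secondary_user": None, "mentions": ["b", "c"]},
    {"primary_user": "b", "secondary_user": "a", "mentions": []},
]


def build_edges(events):
    """Count directed (source, target) pairs; the count is the edge weight."""
    weights = Counter()
    for event in events:
        source = event["primary_user"]
        targets = list(event.get("mentions", []))
        if event.get("secondary_user"):
            targets.append(event["secondary_user"])
        for target in targets:
            weights[(source, target)] += 1
    return weights


edge_weights = build_edges(events)
```

A table of (source, target, weight) rows of this kind is the sort of input that graph software such as Gephi can import as an edge list.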
The visualisation was created using the software package
Gephi. After the data were imported, a Fruchterman-
Reingold algorithm was run to achieve a circular layout of
the nodes, in which the participants were clearly visible as
dominant nodes. The Fruchterman-Reingold algorithm is a
force-directed approach to graph visualisation, in which a
combination of attractive and repulsive forces operates on all
nodes in order to position them according to aesthetic
principles [7][11]. A modularity analysis was run on the data
to detect communities mathematically within the overall
network. Finally, nodes were sized according to the in-
degree measurement, a count of the number of edges
connected to them in an inward direction (in other words, the
number of times they are mentioned in another tweet, or
retweeted by another user, or their tweet favourited by
another user, etc.). The bigger the node, the more activity
there is by others in which that user appears. This measure
was chosen in order to base the importance of the user not
on their own productivity, but on the activity of others
referring to them.
The resultant graph visualisation can be seen in figure 3.
Clusters of nodes (i.e. Twitter users) are visibly delimited
from one another. Modularity class is denoted by colour,
while the size of the nodes is determined by in-degree score.
Each of these two measurements will be discussed in turn.
Node colour: Modularity Class
Modularity analysis produces communities whose members
have stronger connections between each other than to the
network as a whole. In some cases this coincides with both
a single PA’s data and a visually delimited cluster, as with class
3 (PA6), class 4 (PA4), class 7 (PA9) and class 8 (PA5).
Other cases are more complicated. PA1 and PA7 are
combined in class 6, PA2 and PA3 in class 2, and PA10,
PA11 and PA8 in class 5. PAs 2 and 3 were contributed by
the same individual, as were PAs 10 and 11; this would
explain their relationship. This is not true of PAs 1 and 7
and PAs 10/11 and 8. PAs 10, 11 and 8 are accounts
associated with the University of Nottingham, and their
shared class 5, which is located relatively centrally in the
overall graph, also contains 22 of the 27 accounts present in
the data whose names contain “UoN” or “UniOfNott”,
including the official University of Nottingham account
(figure 4). We may perhaps characterise class 5 therefore as
most directly representing the public face of the University.
PAs 1 and 7 are both significant propagators of the hashtag
#synbio, which may be an indication of a shared disciplinary
focus.
Interestingly, a small independent community has emerged
within the data of PA12, and been detected as a separate
modularity class (class 1) (figure 5). This community seems
connected to the phenomenon of looped edges. This occurs
when the primary user of an event also appears in the text of
the event as a mention (using the @ addressivity convention
of Twitter), creating an edge whose Source and Target are
the same node. In other words, it happens when a user
retweets a tweet in which they were mentioned. Figure 6
shows all nodes with a looped edge highlighted in green.
Excluding PAs, there are 94 nodes with looped edges, and
almost three-fifths of them (54) are situated in classes 0 and
1 combined (i.e. in the data from PA12).
Figure 3. The complete network graph visualisation.
Coloured sections are the modularity classes (communities)
detected by the software’s statistical calculations. Node size is
determined by in-degree score. Nodes labelled PA1 to PA12
mark the participant accounts.
Figure 5. Detail of classes 0 and 1, both of which originate in
data from PA12.
Figure 6. Nodes with looped edges highlighted in green,
noticeably clustering around PA12.
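The looped-edge condition described above (an edge whose Source and Target are the same node) is simple to test for in an edge list. The following is a minimal sketch over a hypothetical edge table, not the procedure used in the study.

```python
# Hypothetical weighted edge list: (source, target) -> weight.
edge_weights = {
    ("x", "y"): 1,  # x mentions y
    ("x", "z"): 1,  # x mentions z
    ("y", "y"): 1,  # y retweets a tweet that mentions y: a looped edge
    ("z", "z"): 1,  # z does the same
}


def looped_nodes(edge_weights):
    """Return the set of nodes that have an edge to themselves."""
    return {source for (source, target) in edge_weights if source == target}


loops = looped_nodes(edge_weights)
```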
We may postulate that this clustering of looped-edge nodes
may come about as a combination of both mentioning and
retweeting. If a user x mentions multiple other users in a
single tweet, and if each of them retweets it, they will all
obtain looped edges. To investigate this possibility,
measures were developed of the number of mentions made
by each PA (rather than mentions of them by others) and of
the proportion of retweets in their data. A subset of the data
was compiled for each PA, incorporating only those events
in which the PA was the primary user. Events involving
favouriting or following were omitted. The number of items
in the ‘mentions’ field of each event was counted, and the
total divided by the number of events to produce a ratio. For
the retweets, a count was made of all the event types in the
data of each PA, and the counts for the three retweet types
summed and returned as a percentage of the total number of
events. The resulting graph is shown in figure 7.
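The two measures just described could be computed along the following lines. The event type names are assumptions: the text refers to three retweet types without naming them, so the labels below are placeholders, and the sample events are invented.

```python
# Event type names are assumptions; the text mentions "three retweet types"
# without naming them.
RETWEET_TYPES = {"retweet", "retweeted_by_other", "quoted_retweet"}
EXCLUDED_TYPES = {"favourite", "unfavourite", "follow", "unfollow"}


def mentions_per_event(events, pa):
    """Mean number of mentions in events where the PA is the primary user."""
    own = [e for e in events
           if e["primary_user"] == pa and e["event_type"] not in EXCLUDED_TYPES]
    if not own:
        return 0.0
    return sum(len(e["mentions"]) for e in own) / len(own)


def retweet_percentage(events):
    """Retweet-type events as a percentage of all events in a PA's data."""
    if not events:
        return 0.0
    retweets = sum(1 for e in events if e["event_type"] in RETWEET_TYPES)
    return 100.0 * retweets / len(events)


# Hypothetical data for one participant account called "pa".
pa_events = [
    {"primary_user": "pa", "event_type": "tweet", "mentions": ["a", "b", "c"]},
    {"primary_user": "pa", "event_type": "tweet", "mentions": ["a"]},
    {"primary_user": "a", "event_type": "retweet", "mentions": ["pa"]},
    {"primary_user": "b", "event_type": "favourite", "mentions": []},
]
ratio = mentions_per_event(pa_events, "pa")  # (3 + 1) mentions over 2 events
share = retweet_percentage(pa_events)        # 1 retweet out of 4 events
```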
Figure 7. Comparison of PA mentioning behaviour and
retweets received.
Figure 7 shows a correspondence between the two ratios for
PA12. It is possible that this combination of multiple
mentioning and high retweet volume gives rise to the
emergence of modularity class 1 as a separate community.
However, figure 7 also shows a similar correspondence for
PAs 1, 9, and, to a lesser extent, 10, and none of these show
the development of subcommunities; therefore some other
factor must be involved in the process, and it cannot be
explained simply by the combination of mentioning and
retweets.
Node size: Degree
In graph theory, the term degree is used to refer to the
number of edges that connect to a node. When a graph is
directed, that is, when the edges represent a directional
relationship such as a message from a sender to a receiver,
degree can be subdivided into out-degree and in-degree. A
person sending an email to another would create a graph of
two nodes connected by one edge directed from sender to
receiver. For the present purposes, the ‘sender’ is regarded
as the composer of a tweet (the primary user in the data),
while the ‘receiver(s)’ is/are any user(s) who are
@mentioned in the Tweet (mentioned users in the data)
and, where relevant, the composer of material reproduced in
the tweet (the secondary user). Therefore, out-degree is a
measure of how many other users a node has mentioned or
cited, while in-degree counts how many users have
mentioned or cited the node. This latter metric was regarded
as being a better indication of public engagement, as it
showed others engaging with the node.
[Figure 7 plot: x-axis, PA number (1 to 12); y-axis, 0.00 to 6.00;
series, mentions per event, and retweets as a proportion of PA
dataset multiplied by a factor of 10.]
Figure 4. University of Nottingham-associated Twitter
accounts highlighted in yellow (one not included for reasons
of space). Modularity class 5 is outlined in red.
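The in-degree and out-degree definitions above can be illustrated with a short sketch. It counts distinct neighbours, following the wording "how many users have mentioned or cited the node"; whether the study's software counted distinct users or summed edge weights is not stated, so this choice is an assumption, and the edge list is invented.

```python
from collections import defaultdict

# Hypothetical directed edges (sender, receiver).
edges = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "a"), ("d", "a")]


def degrees(edges):
    """Return (in_degree, out_degree) dicts counting distinct neighbours."""
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for sender, receiver in edges:
        outgoing[sender].add(receiver)
        incoming[receiver].add(sender)
    # Collect the node set before any lookups add empty entries.
    nodes = set(incoming) | set(outgoing)
    in_degree = {n: len(incoming[n]) for n in nodes}
    out_degree = {n: len(outgoing[n]) for n in nodes}
    return in_degree, out_degree


in_degree, out_degree = degrees(edges)
```

In this toy example node "a" is mentioned or cited by three other users but addresses only two, so it would be drawn larger when sized by in-degree than by out-degree.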
In-degree
The nodes in the graph shown in figure 3, then, are sized
according to their in-degree score. PA nodes are in most
cases the largest by some orders of magnitude; this is
expected due to the method of data-collection, in which only
tweets by, or pertaining to, PAs are collected. However, PAs
2, 7 and 10 are significantly smaller than the others,
indicating that these users were not mentioned or retweeted
very often. Their small size may be a contributing factor in
their being subsumed by larger modularity classes.
The node with the highest in-degree score is PA12, and other
larger-than-average nodes can be seen clustered around
PA12 in classes 0 and 1. This is another feature not seen in
the other classes, in which degree scores are uniformly low
except for the central PA. As in-degree score is related to
both mentioning and retweeting, this is clearly related to the
phenomena discussed in the previous section regarding
classes 0 and 1.
Out-degree
A different perspective on the network can be gained by
sizing nodes by out-degree. As described above, this
represents the extent to which a node mentions or cites other
users. By this measurement, PA1 is the largest node by a
substantial amount, as shown in figure 8.
Figure 8. PA nodes resized by out-degree score
The predominance of PA1 with regard to out-degree
corresponds with a measurement of the connections between
PA nodes and nodes of other modularity classes. PA1
connects to four other classes by a total of 24 nodes: 2, 2, 8
and 12, a median of 5 per class. PA3 also connects to four
others, but only with a median of 1 node per class, and PA4
connects to five other classes, but only with a median of 1.5.
PA12 shares 39 nodes with class 1, and connects to 3 others,
but with a median of 2, despite the strong link to class 1.
PA1’s higher degree of connectivity, coupled with a high
out-degree score, is perhaps indicative of a certain
eclecticism, suggesting that PA1’s Twitter activity is
relevant to a variety of other areas within the University. It
may also represent an effort on the part of PA1 to engage
others, and therefore be evidence of an intention of public
engagement.
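The total and median quoted above for PA1 can be checked directly from the four per-class counts given in the text. Only the variable names are introduced here.

```python
from statistics import median

# Numbers of nodes connecting PA1 to each of the four other classes,
# as reported in the text.
pa1_shared = [2, 2, 8, 12]

total_nodes = sum(pa1_shared)
# With an even number of values, the median is the mean of the two
# middle values, here 2 and 8.
median_per_class = median(pa1_shared)
```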
Information Diffusion
The next step in the analysis was to examine the data for
signs of information diffusion, taken as a sign of public
engagement. It is questionable to what extent hashtags can
be reliably expected to signal the content of a tweet, and
Twitter users expected to signpost their content with explicit,
appropriate hashtags. There are many potential motivations
for hashtag use, ranging from the purely deictic to the ironic.
A study of Twitter by Zappavigna [17] features a tweet
composed almost entirely of complex hashtags, which,
rather than being used as the content-aggregating tag that is
their ostensible purpose, are used ironically to indicate the
thoughts and feelings of the user. However, it was felt in the
current study that Twitter users deliberately trying to initiate
a discussion around a topic were likely to use hashtags from
a perception of the convention as “a well-known practice on
Twitter” [9]. Furthermore, it was felt that an analysis of
hashtags could provide a measurement of the success or
otherwise of public engagement: a hashtag used frequently
by one user in an attempt to stimulate discussion, but not
picked up and used by others, could be regarded as
unsuccessful, while one used multiple times by multiple
users could be considered to have successfully diffused.
Hashtags were therefore retrieved from the dataset, along
with counts of the number of times they appear and the
number of users invoking them. To show the number of
users engaged per use of the hashtag, the ratio $x = u / f$ was
initially used, where $u$ is the number of users and $f$ is the
frequency of occurrence. However, this proved
unsatisfactory, as it gave the same weight to a
hashtag used 100 times by 100 people, and one used once by
one person. Furthermore, as it is always true that $u \le f$ (a
hashtag cannot be used by two people but only appear once),
then $0 < x \le 1$, with all instances of $u = f$ grouped together at
the maximum end of the range. To counteract this, the value
$u^2$ was used, in the formula $u^2 / f$. This foregrounded the
importance of the users, which was considered to be a better
indication of uptake and therefore engagement, while
spreading out the values previously grouped at $x = 1$ (a hashtag
with $u = f$ now scores $u$), except in cases of $u = 1$, $f = 1$; as these
were located in the middle of the range they did not interfere
in the analysis. The chosen derived metric further ensured
that the bottom of the table was occupied by hashtags used
an increasing number of times by only one user. Figure 6
shows a list of the top twenty hashtags ranked by this metric.
Figure 6. Top twenty hashtags in the dataset, ranked by
u^2 / f score.
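The difference between the two scores can be illustrated with a minimal sketch. It assumes the events are available as (user, hashtag) pairs, one per occurrence, and that hashtags are compared case-insensitively; the function name, the field names and the toy data are hypothetical and are not taken from the study's own tooling.

```python
from collections import defaultdict


def hashtag_scores(events):
    """Return per-hashtag u, f, u / f and u^2 / f, sorted by u^2 / f.

    events: iterable of (user_id, hashtag) pairs, one pair per occurrence
    (an assumed input format, not the study's actual data layout).
    """
    freq = defaultdict(int)   # f: number of occurrences of each hashtag
    users = defaultdict(set)  # distinct users of each hashtag; u is its size
    for user_id, tag in events:
        tag = tag.lower()     # assumption: hashtags are case-insensitive
        freq[tag] += 1
        users[tag].add(user_id)

    rows = []
    for tag, f in freq.items():
        u = len(users[tag])
        rows.append({
            "tag": tag,
            "u": u,
            "f": f,
            "u_over_f": u / f,
            "u2_over_f": u * u / f,
        })
    # Highest u^2 / f first, as in the ranking described above.
    rows.sort(key=lambda row: row["u2_over_f"], reverse=True)
    return rows


if __name__ == "__main__":
    toy_events = [
        (1, "#a"), (2, "#a"), (3, "#a"),             # three users, three uses
        (1, "#b"),                                   # one user, one use
        (4, "#c"), (4, "#c"), (4, "#c"), (4, "#c"),  # one user, four uses
    ]
    for row in hashtag_scores(toy_events):
        print(row)
```

In the toy data, #a and #b are indistinguishable under u / f, since both have u = f, whereas u^2 / f ranks the hashtag with more users higher and places the single-user, repeated hashtag #c last.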
An exploration of the hashtags suggests a distinction
between hashtags that are related to a conversation around a
topic, and those which are related to a specific event. As
might be expected, the latter show peaks in usage over the
time of the event (for example figure 7), while the former
show more constant usage (figure 8), although where they
coincide with events and are used in conjunction with the
corresponding event hashtags, they too show peaks.
Figure 7. Frequency over time of event hashtag #microbio16.
Figure 8. Frequency over time of topic hashtag #synbio, with
peaks caused by related events.
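A frequency-over-time series of the kind plotted in figures 7 and 8 can be sketched as follows. The sketch assumes each event carries an ISO 8601 timestamp and a set of hashtags. The input format and the function names are hypothetical, and the "peak share" summary is an illustrative heuristic that was not used in the study.

```python
from collections import Counter
from datetime import datetime


def daily_frequency(events, tag):
    """Count uses of `tag` per calendar day.

    events: iterable of (iso_timestamp, hashtags) pairs (assumed format).
    """
    counts = Counter()
    for timestamp, tags in events:
        if tag in tags:
            counts[datetime.fromisoformat(timestamp).date()] += 1
    return counts


def peak_share(counts):
    """Fraction of all uses that fall on the single busiest day.

    Illustrative heuristic only: values near 1 suggest usage concentrated
    around one event, lower values suggest more constant usage.
    """
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


if __name__ == "__main__":
    toy_events = [
        ("2016-03-21T09:00:00", {"#microbio16"}),
        ("2016-03-21T15:30:00", {"#microbio16", "#synbio"}),
        ("2016-03-22T11:00:00", {"#microbio16"}),
        ("2016-04-02T08:00:00", {"#synbio"}),
    ]
    for tag in ("#microbio16", "#synbio"):
        series = daily_frequency(toy_events, tag)
        print(tag, dict(series), peak_share(series))
```

The timestamps and counts in the toy data are invented for illustration and do not reproduce the usage shown in the figures.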
From the top twenty list, five hashtags were manually chosen
to cover a range of subjects and include both topic and event
hashtags. These were mapped onto the network
visualisation by tagging their use onto the profile data of
accounts using them. The result can be seen in figure 9.
Figure 9. Hashtags mapped onto network graph visualisation.
Colours and tags shown below (‘False’ indicates nodes not
using any of the five sample hashtags):
A notable effect in the resulting image is that successful
hashtags seem to be spread in partnership with other nodes
of a higher than average in-degree. An inspection of the
distribution of nodes using the hashtag #hearingloss shows a
number of larger nodes around the central PA node clearly
distinguishable. An exception to this pattern is the hashtag
#microbio16, which is used in a community in which the
nodes around the central PA have similarly low in-degree
scores. #microbio16 is an event hashtag, and it can
be hypothesised that the attendees of the event represented
in the data have only the connection with the PA in common
and are therefore not indexing each other (which would lead
to higher in-degree scores).
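The comparison of in-degree between the users of a hashtag and the network as a whole can be sketched as below. The sketch assumes a directed edge list in which an edge (source, target) means that the source account mentions or retweets the target, with one edge per row; the function names and the toy network are hypothetical.

```python
from collections import Counter


def in_degrees(edges):
    """Return the in-degree of every node in a directed edge list."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 0  # register nodes that have no incoming edges
        degree[target] += 1
    return degree


def mean_in_degree(degree, nodes=None):
    """Mean in-degree over `nodes`, or over the whole network if omitted."""
    nodes = list(degree) if nodes is None else list(nodes)
    if not nodes:
        return 0.0
    return sum(degree[node] for node in nodes) / len(nodes)


if __name__ == "__main__":
    toy_edges = [
        ("a", "pa"), ("b", "pa"), ("c", "pa"),  # three accounts index the PA
        ("a", "b"), ("pa", "b"),                # b is also indexed by others
    ]
    hashtag_users = {"pa", "b"}  # hypothetical set of nodes using one hashtag
    degree = in_degrees(toy_edges)
    print("network mean:", mean_in_degree(degree))
    print("hashtag users mean:", mean_in_degree(degree, hashtag_users))
```

A mean in-degree for the hashtag's users that is clearly above the network mean would correspond to the pattern described for #hearingloss, while similar values would correspond to the #microbio16 case.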
An analysis of the tweets containing specific hashtags
suggests that the success of a hashtag is a product of the
success of individual tweets within the set. This can be
observed by looking at the secondary Tweet ID numbers of
events, which show exactly which prior tweet or retweet is
retweeted or favourited. Taking the hashtag #synbio as an
example, the data contain 120 events in which the hashtag
appears. 15 of these contain no secondary Tweet data (i.e.
they do not refer to another event). The remaining 105
contain 24 different secondary Tweet IDs (20% of the
number of events) which appear multiple times, as shown in
figure 10.
A similar analysis of the #privacy hashtag, judged to be less
successful by its u^2 / f score, shows that out of 44 events, 43
contain 18 different secondary Tweet IDs (41% of the
number of events), and that multiple take-up is much lower
(figure 11).
Figure 10. Events with multiple responses in the #synbio data
subset. Y axis represents number of secondary Tweet IDs.
Figure 11. Events with multiple responses in the #privacy
data subset, showing much lower uptake.
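The counting behind this comparison can be sketched as follows, assuming each event is a record with an optional secondary Tweet ID identifying the tweet being retweeted, quoted or favourited. The record layout and the function name are hypothetical.

```python
from collections import Counter


def secondary_id_summary(events):
    """Summarise how events in a hashtag subset refer to earlier tweets.

    events: list of dicts, each with an optional "secondary_id" key
    (an assumed layout, not the study's actual schema).
    """
    total = len(events)
    referred = [
        event["secondary_id"]
        for event in events
        if event.get("secondary_id") is not None
    ]
    responses = Counter(referred)  # secondary Tweet ID -> number of responses
    return {
        "events": total,
        "referring_events": len(referred),
        "distinct_secondary_ids": len(responses),
        # Distinct secondary IDs as a share of all events in the subset.
        "distinct_per_event": len(responses) / total if total else 0.0,
        "ids_with_multiple_responses": sorted(
            tweet_id for tweet_id, n in responses.items() if n > 1
        ),
        "responses_per_id": responses,
    }


if __name__ == "__main__":
    toy_events = [
        {"id": 1, "secondary_id": 10},
        {"id": 2, "secondary_id": 10},
        {"id": 3, "secondary_id": 11},
        {"id": 4, "secondary_id": None},
        {"id": 5},
    ]
    print(secondary_id_summary(toy_events))
```

With the figures reported above, the share of distinct secondary Tweet IDs is 24 / 120 = 0.20 for #synbio and 18 / 44, or approximately 0.41, for #privacy. The lower share for #synbio means that its responses are concentrated on fewer original tweets.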
This analysis, whilst being only performed on two specific
cases, suggests that hashtag success is determined by take-
up (retweeting, quoting and favouriting) of individual
tweets. However, it must be remembered that the data
collection method only captures data in which participants
are directly mentioned, and this heavily weights the dataset
towards these behaviours. Original, non-referential tweets
that nevertheless use the target hashtag are not collected, and
therefore will not appear in this analysis.
DISCUSSION
Evidence of public engagement
This project has presented a number of phenomena that
might be indicative of public engagement via Twitter. The
clustering of high in-degree loop-edged nodes around the
high in-degree PA12 is indicative of a great deal of
mentioning and retweeting going on in this cluster, and the
emergence of a completely new non-PA-centered modularity
class (class 1), possibly as a result, might be interpreted as
the development of a self-sustaining community outside the
academic core. Various assumptions of what constitutes
‘inside’ and ‘outside’ are called into question, however.
Fields such as healthcare have an academic component, but
also a major practice-led industry component. To what
extent can enthusiastic amateurs be regarded as inside or
outside, and on what grounds? What about retired
academics? The detection of information diffusion in social
networks has a proven history, but identifying a ‘public’ and
distinguishing it in the data is needed before public
engagement can easily be detected and measured.
Citizen-centric social media research
This project has been an exercise in conducting social media
research in a way that does not intrude upon the privacy of
social media users who have not given their consent to
participate in the study. To this end the analyses here
presented have been deliberately impersonal, and have not
sought qualitative explanations for the observed phenomena
in the personal data of the users. Explanatory conclusions
have therefore been limited, and further research might focus
on the people behind the nodes in order to answer questions
regarding who is professional and who is public.
Constructing the data collection in a way that conforms to
citizen-centric principles has led to several features of the
dataset and its analysis:
PA-Centered Data
The most obvious consequence is that the data gathered by
this method are heavily weighted towards the participants
and their network communities. The data are user-centered;
they only begin with the first use by the participants of the
web tool, and the set only contains events that make
reference to the user. These two properties mean that the full
picture of a particular hashtag and its propagation is not
available in the dataset. Any discussion going on around a
hashtag but not directly referencing PA data is simply not
picked up. To conduct a wider study collecting all uses of a
hashtag over its entire lifespan would be likely to contravene
the principles of citizen-centric social media research.
A potential solution to this is in the guidelines for internet
research issued by the Association of Internet Researchers
(AoIR), in which a contextual, case-by-case approach is
advised when determining the ethical risks inherent in a
project [10]. Where an initial study such as the present
indicates a particular hashtag that clearly represents a
concerted attempt at public engagement, there is perhaps
scope to argue that a subsequent, wider study is ethically
justified on the grounds that there is a clear expectation that
the data would be public, and therefore could be considered
for analysis without explicit consent on the grounds that
“(h)uman subject research norms such as informed consent
do not apply to public, published material.” [15].
Gaps in the Data
A second reflection on the methodology of the study is that
detecting users within and outside the academic community
did not prove viable. While the hashtag analysis gave some
indication of the popularity of certain tags and topics,
identifying a non-academic audience within the users would
require a more rigorous analysis of the user data.
Particularly, a large-scale computational linguistic analysis
of the user descriptions might yield patterns that could
predict if a user could be considered non-academic. This
was beyond the scope of this study, however, and in any case
as 40% of the user data were represented only by their
Twitter ID number and screen name, any analysis of this
nature on the current dataset would be incomplete.
Finally, the difficult question remains that despite efforts to
conduct the data collection in a manner grounded in the
principles of citizen-centric social media research, it was
unavoidable that data was collected on Twitter users who
engaged in interactions with the study’s participants, but
who did not themselves give consent. The data only afforded
glimpses into their accounts, as opposed to the full access
provided by the consenting participants, but nevertheless,
personal data was collected.
CONCLUSION
This study was an exploration of public engagement through
social media by members of staff at the University of
Nottingham. Using network graph visualisation software,
the web of interactions formed by the Twitter data provided
by consenting participants was examined for patterns of
information diffusion, and an analysis of hashtags carried
out that suggested certain characteristics of successful
engagement. However, conclusions could not be drawn
regarding the engagement with external communities, as
limitations in the data arising from the collection method
rendered some forms of analysis impractical. Prior to
carrying out research
of this nature again, work needs to be done to define a
‘public’ and consider how it may be detected in Twitter data.
With this in place, further work refining and testing the
necessarily tentative and small-scale findings of this study
could have the potential to demonstrate information
diffusion into the public sphere. On a broader scale, further
research can be done to consider how to couple in-depth
research to suggest hypotheses with wider scale quantitative
research that foregrounds the right to privacy of social media
users.
REFERENCES
1. Maeve Duggan. 2015. The Demographics of Social
Media Users. Pew Research.
2. ESRC. 2015. ESRC Framework for Research
Ethics. January: 1–51. Retrieved from
http://www.esrcsocietytoday.ac.uk/about-esrc/information/framework-for-research-ethics/index.aspx
3. European Commission. 2012. Responsible
Research and Innovation: Europe’s ability to
respond to societal challenges.
http://doi.org/10.2777/11739
4. Gunther Eysenbach. 2011. Can Tweets Predict
Citations? Metrics of Social Impact Based on
Twitter and Correlation with Traditional Metrics of
Scientific Impact. Journal of Medical Internet
Research 13, 4: e123.
http://doi.org/10.2196/jmir.2012
5. Casey Fleischman. 2015. UK Digital, Social and
Mobile Statistics for 2015. smlondon. Retrieved
from http://socialmedialondon.co.uk/digital-social-mobile-statistics-2015/
6. Mark S Frankel, Sanyin Siang, Scientific Freedom,
Law Program, and Policy Programs. 1999. Ethical
and Legal Aspects of Human Subjects Research on
the Internet. Advancement Of Science, November:
20. Retrieved from
http://www.aaas.org/spp/sfrl/projects/intres/report.pdf
7. Thomas M. J. Fruchterman and Edward M.
Reingold. 1991. Graph Drawing by Force-directed
Placement. Software-Practice and Experience 21,
11: 1129–1164.
http://doi.org/10.1002/spe.4380211102
8. Adrien Guille, Hakim Hacid, Cécile Favre, and
Djamel A. Zighed. 2013. Information diffusion in
online social networks. ACM SIGMOD Record 42,
2: 17–28. http://doi.org/10.1145/2503792.2503797
9. Julie Letierce, Alexandre Passant, Stefan Decker,
and John G Breslin. 2009. Understanding how
Twitter is used to spread scientific messages.
October: 8. Retrieved from
http://journal.webscience.org/314/
10. Annette Markham and Elizabeth Buchanan. 2012.
Ethical Decision-Making and Internet Research:
Recommendations from the AoIR Ethics Working
Committee (Version 2.0): 19. Retrieved from
www.aoir.org
11. Michael J. McGuffin. 2012. Simple algorithms for
network visualization: A tutorial. Tsinghua Science
and Technology 17, 4: 383–398.
http://doi.org/10.1109/TST.2012.6297585
12. Richard Owen, Phil Macnaghten, and Jack Stilgoe.
2012. Responsible research and innovation: From
science in society to science for society, with
society. Science and Public Policy 39, 6: 751–760.
http://doi.org/10.1093/scipol/scs093
13. Alan C Regenberg. 2010. Tweeting science and
ethics: Social media as a tool for constructive
public engagement. The American Journal of
Bioethics 10, 5: 30–31.
http://doi.org/10.1080/15265161003743497
14. Research Councils UK. Pathways to Impact.
Retrieved from
http://www.rcuk.ac.uk/innovation/impacts/
15. Helene Snee. 2012. Making ethical decisions in an
online context: Reflections on using blogs to. 8:
52–67. http://doi.org/10.4256/mio.2013.013
16. Jiang Yang and Scott Counts. 2009. Predicting the
Speed, Scale, and Range of Information Diffusion
in Twitter. Fourth International AAAI Conference
on Weblogs and Social Media: 355–358.
17. M. Zappavigna. 2011. Ambient affiliation: A
linguistic perspective on Twitter. New Media &
Society 13, 5: 788–806.
http://doi.org/10.1177/1461444810385097
18. Yimei Zhu and Rob Procter. 2012. Use of blogs,
Twitter and Facebook by PhD students for
scholarly communication: a UK study. 2012 China
New Media Communication Association Annual
Conference, Macao International Conference 9: 1–
19.