PERCEPTIONS AND AWARENESS OF
DATA COLLECTION IN SOCIAL MEDIA
A study submitted in partial fulfilment
of the requirements for the degree of
MSc Information Systems
at
THE UNIVERSITY OF SHEFFIELD
by
SARA MICHELLE URREA AGUILERA
September 2016
ABSTRACT
BACKGROUND
The growth of social media and the rise of big data have raised legitimate concerns
over data collection and user privacy. As the majority of active users are aged 16 to
34, this demographic is the primary contributor of this unstructured big data and is
sought after by companies that wish to profit from the mined data. As users are
becoming more aware of data collection, and university students belong to this age
group, it is important to examine their perceptions, thoughts and concerns.
AIMS
The aim of the dissertation was to analyse and investigate students’ perceptions of
their social media data being collected and used for marketing, surveillance and
other purposes with or without their awareness.
METHODS
The research was based on qualitative methods. Interviews followed a semi-structured
interview plan and were audio recorded. The data was then sorted into themes and
sub-themes based on thematic analysis.
RESULTS
Most of the interviewees were aware that their data is being collected through social
media channels. They understand that social media is a business that needs to
make a profit somehow, and they consider some purposes positive. At the same time,
they are worried that their data may fall into the wrong hands and end up being
used for undesired purposes or to identify private individuals.
CONCLUSIONS
Overall, the students were aware of the data collection and privacy issues, and they
were simply living with them. The interviewees did not like their private information
being used, but they accepted it as part of their daily lives: as long as it does not
put them in a dangerous situation, it will not stop them from carrying out their
normal online activities. A number of recommendations for research and practice
were also suggested.
ACKNOWLEDGEMENTS
I would like to thank my supervisor during this research project, Doctor Jo Bates, for
offering her guidance and advice during this time.
I would like to thank the participants who shared their time for this project.
I would like to thank my mother, Carmita, Huguito, my father and the rest of my
family, who have supported me through this time.
I would like to thank my group of friends, Ecua-Mex, Oscar Winners, with a special
mention to Loli, Sole and Vil.
TABLE OF CONTENTS
TABLE OF CONTENTS ............................................................................................ 3
1 INTRODUCTION ............................................................................................... 5
1.1 RESEARCH AIM ........................................................................................ 7
1.2 RESEARCH OBJECTIVES ......................................................................... 7
1.3 RESEARCH QUESTIONS .......................................................................... 7
2 LITERATURE REVIEW ..................................................................................... 8
2.1 INTRODUCTION ........................................................................................ 8
2.2 BIG DATA & DATA COLLECTION.............................................................. 8
2.3 LEGAL & ETHICAL CONSIDERATIONS .................................................... 9
2.4 USER AWARENESS & PRIVACY CONCERNS ....................................... 10
2.5 CONCLUSION .......................................................................................... 13
3 METHODOLOGY ............................................................................................ 15
3.1 RESEARCH STRUCTURE ....................................................................... 15
3.2 RESPONSE RATE ................................................................................... 16
3.3 DATA COLLECTION METHODS .............................................................. 17
3.3.1 METHODS......................................................................................... 17
3.3.2 INTERVIEW ....................................................................................... 17
3.4 DATA ANALYSIS ...................................................................................... 18
3.4.1 INTRODUCTION ............................................................................... 18
3.4.2 PROCESS DESCRIPTION ................................................................ 18
3.4.3 THEMES AND SUBTHEMES DESCRIPTION ................................... 20
3.5 ETHICAL ASPECTS ................................................................................. 21
3.6 LIMITATIONS AND RISKS ....................................................................... 22
4 RESULTS AND DISCUSSION ......................................................................... 23
4.1 INTRODUCTION ...................................................................................... 23
4.2 OVERVIEW .............................................................................................. 24
4.3 AWARENESS OF DATA COLLECTION ................................................... 24
4.3.1 GENERAL AWARENESS .................................................................. 24
4.3.2 AWARENESS OF THE PURPOSE OF DATA COLLECTION ............ 25
4.4 PERCEPTIONS OF DATA COLLECTION ................................................ 25
4.4.1 GENERAL PERCEPTIONS ............................................................... 25
4.4.2 FEELINGS ......................................................................................... 26
4.4.3 PERCEPTIONS OF A POSITIVE COLLECTION PURPOSE ............. 26
4.4.4 PERCEPTIONS OF A NEGATIVE COLLECTION PURPOSE ........... 27
4.5 CONCERNS OF DATA COLLECTION ..................................................... 28
4.5.1 GENERAL CONCERNS .................................................................... 28
4.5.2 RISKS ................................................................................................ 28
4.5.3 PRECAUTIONS ................................................................................. 29
4.6 DISCUSSION ........................................................................................... 29
5 CONCLUSIONS .............................................................................................. 31
5.1 LIMITATIONS AND RECOMMENDATIONS ............................................. 32
6 REFERENCES ................................................................................................ 34
7 APPENDICES.................................................................................................. 37
7.1 RESEARCH ETHICS APPLICATION ....................................................... 37
7.2 INVITATION EMAIL .................................................................................. 42
7.3 CONSENT FORM ..................................................................................... 43
7.4 RESEARCH ETHICS APPROVAL LETTER ............................................. 45
7.5 INTERVIEW QUESTIONS ........................................................................ 46
7.6 ACCESS TO DISSERTATION .................................................................. 47
7.7 ADDRESS & FIRST EMPLOYMENT DESTINATION DETAILS ................ 50
1 INTRODUCTION
“The interest in Big Data is growing exponentially” (Eynon, 2013)
The concept of “big data” has been an increasingly trending topic over the last few
years and is only expected to grow (Marr, 2016). People from different areas, such
as research, the marketing sector and the government, are more and more
interested in maximising the use of technology in order to analyse the “massive
amounts of data” in the most aggressive ways (Eynon, 2013). But what is this “big
data”? According to Soares (2012), big data is the processed information about the
“customer experiences, organisational processes, and emergent trends” that
originates while customers live their normal lives. This unstructured big data can be
found everywhere and is considered too big to be processed by regular database
software. Big data is different from the Web, although the Internet helps to collect
and share this data. The whole idea is that big data allows a level of insight that
cannot be achieved using a small portion of information (Cukier &
Mayer-Schoenberger, 2013). The data that is mined through social media can reveal
a lot of information about users, from their location to whom they are socially
interacting with or linked to, their level of influence, and their activity
patterns, which could be used to build a profile of their likely preferences or
activities (Kennedy & Moss, 2015). The organisation of all this information translates
into a “source of business analysis” that may result in performance improvements
and lead to new opportunities (Soares, 2012), proving the importance of acquiring,
analysing and processing this data.
Given the exponential growth of big data in the past few years and the fact that it
can be found everywhere, it is easy to relate it to the exponential growth of
“social-networking data”. This major increase in social media data is reaching a
point where it may run out of control, if it has not done so already (Scarfi, 2012).
The situation is raising concerns with regard to user privacy. Personal information
related to location, social media interactions and internet usage is being sold to
the data broker industry. At the beginning, it was the USA government that was
interested in data mining for surveillance purposes; now it is business corporations
that lead this activity. Both are interested in profiting from data collection
without having to worry about a strict data protection law that would benefit the
user, who is the actual owner of the data (Peacock, 2014). With all this big data
being used for different purposes, Google was scrutinized by European governments
due to antitrust and privacy issues. Facebook may also turn into a target due to the
large amount of personal data it holds, reaching a point where diplomats will have
to decide whether they “treat information flows as similar to free trade” (Cukier &
Mayer-Schoenberger, 2013).
At the same time, users themselves have become increasingly concerned
about their data. Several surveys done in the past years showed that clients are
worried about the information that companies hold about them, how they got this
information, and what they are using it for (Phelps, Nowak & Ferrell, 2000). Another
report showed that most of the USA’s population, whether or not they used the
Internet, were concerned about their private information when they shopped online
(Malhotra, Kim & Agarwal, 2004). In an online user study, young people aged between
13 and 25 mentioned that, given the choice, they would be
willing to accept data collection for marketing purposes if they were somehow
rewarded for the loss of their privacy (Graeff & Harmon, 2002).
This has become even more important with the massive growth of social networks
(Statista, 2016A). The most recent possible breach of privacy through social media
channels concerned Facebook’s plans to use WhatsApp data, including phone
numbers, and combine it with Facebook data in order to suggest new friends and to
properly tailor the advertisements for the users. The user reaction has been
negative, with some users saying that they feel WhatsApp is no longer a
trustworthy app because it does not protect user privacy anymore (Tynan, 2016).
Some customers have been concerned about the “lack of control” (Schechner &
Koh, 2016). The main problem has been not only the users’ backlash, but also
European and British privacy regulators investigating the companies’ privacy
practices in this new plan of sharing information between the two platforms.
According to Statista (2016B), in the third quarter of 2014, 54% of active Facebook
users worldwide were between 16 and 34 years old,
making this particular demographic the primary contributor of big data and the
primary target for companies seeking to exploit it. Since more than half of the
users are young, it is important to focus on their perceptions in order to inspect
the moral effects data collection has on people. Most university students are within
this age group. Consequently, this research will focus on the awareness, perceptions
and concerns of students from the University of Sheffield with regard to their data
being collected.
1.1 RESEARCH AIM
The aim of the dissertation is to analyse and investigate students’ perceptions of
their social media data being used for marketing, surveillance and other purposes
with or without their awareness.
1.2 RESEARCH OBJECTIVES
The objectives of the dissertation are:
Examine the general trends of data collection in social media within
scholarly literature and popular media;
Investigate University students’ awareness of personal data collection in
social media and their personal feelings and perceptions in regards to it
through the use of a questionnaire;
Inspect the moral effect personal data collection in social media has on
people by analysing the data from the interviews in combination with the
literature.
1.3 RESEARCH QUESTIONS
Using the interview questionnaire, this study’s goal is to answer the following
questions:
1. Are the students aware of their personal data being collected?
2. What feelings, emotions and reactions does this situation produce in them?
3. What do they think about that situation: is it positive or negative?
4. What measures have they taken or plan to take in regards to that?
2 LITERATURE REVIEW
2.1 INTRODUCTION
The literature review provides an overview of literature related to the topic of data
collection awareness. The literature review is divided into three sections:
Big Data & Data Collection provides some critical reflections from scholars;
Legal & Ethical Considerations provides an overview of issues with
regulations and ethics of data collection;
User Awareness & Privacy Concerns expands on the privacy concerns and
the state of customer awareness of data collection.
2.2 BIG DATA & DATA COLLECTION
This part of the literature review attempts to group some critical reflections from
scholars in regards to big data and data collection. The introduction provided one
brief definition on big data and its main characteristics. However, according to
Kitchin & McArdle (2016), big data do not all share the same characteristics, and
there are multiple forms of big data. There have been many concepts of big data
that included seven characteristics to identify big data by: “exhaustivity”, “fine-
grained”, “relationality”, “extensionality”, “veracity”, “value” and “variability”. However,
according to the authors’ research, for data to be classified as big data it only has to
include a few characteristics, not all. According to them, velocity – data being
“created in real time” – and exhaustivity – “capturing” all the data and not specific
parts – are the two “most important” and decisive points that define the concept of
big data.
For Boyd & Crawford (2012), the term “Big Data” was used in the past as a way to
refer to very large data sets that required supercomputers to process. Nowadays the
analysis does not require large equipment; it can be done using desktop computers
with any standard software. It is no longer about the quantity of data, but about the
spread and the impact of the content: “Big Data is less about data that is big than it
is about a capacity to search, aggregate, and cross-reference large data sets”
(p.663). The critical point would be how the data is being handled, now that it is
easier to collect and analyse on a large scale, with different purposes, like the
marketers seeing the data as a way to target advertising, insurance providers as a
way to optimize their offerings, bankers using it to gain market insights, etc. Some
institutions used their clients’ data for internal studies, simply as a way to analyse
their behaviour. However, when that data from anonymous users was given away to
another company, it turned out to be easy to identify the original users even though
the data was anonymous. Again, the users were affected with no chance of defending
themselves, or of controlling which data they want to share and with whom.
According to Zwitter (2014), “there are three categories of big data stakeholders: big
data collectors, big data utilizers, and big data generators”. Users, the big data
generators, are the least aware of what their data is being used for. It is reaching a
point where the big data utilizers, using algorithms, are able to determine our
preferences in many areas: food, friendship, places, movies, etc. Like Boyd &
Crawford, Zwitter claims that “this information gathered from statistical data and
increasingly from big data can be used in a targeted way to get people to consume
or to behave in a certain way, e.g. through targeted marketing” (p.4). There is an
intent to manipulate: preferences taken from the data are used for
purposes that may or may not be transparent, such as offering to sell a particular
product together with a gift, some small article that the seller knows people would
like. A regular person, suspecting nothing, would simply believe what they see,
without having any idea of what is behind that proposal, which leaves a big gap with
regard to the ethics of the process.
2.3 LEGAL & ETHICAL CONSIDERATIONS
The current regulations for data usage do not reserve full rights for the actual
owner, the person who originates and provides the data. From Kennedy &
Moss’s perspective (2015), the metadata mined through social media contains highly
valuable details such as: “who is speaking and sharing, where they are located, to whom
they are linked, how influential and active they are, what their previous activity
patterns look like and what this suggests about their likely preferences and future
activities.” All the data that users produce while interacting is used by
companies in a way that is not interactive for the users, because the public is not
able to participate in or modify the processing of their information. The authors
therefore express concern about the need for regulations that protect users from
this mining of their information, which may be used to “adapt news and other content
based on the knowledge they have about audiences”, basically offering to sell what
they know the user already likes.
Different information governance regulations have been established in Europe and
the U.S. and are enforced to different levels. In the UK, data governance has been
covered by the Data Protection Act 1998, which enforces strict “data protection
principles” (Gov.uk, 2015). According to the law, data collection, storage and usage
purposes have to be declared transparently. The data collectors are required by the
Act to provide access to the collected personal information, except when the Act
says otherwise. On the other hand, Peacock (2014) states that in the USA, people’s
information, particularly that related to their leisure activities, social media
interactions and internet usage, is freely bought and sold in the consumer data
broker industry. Ethical law that should protect the user is almost non-existent,
which allows online retailers to increase their profits by using web tracking and
storing the user’s personal data. Every bit of personal interaction data is being
analysed using the most modern methods. Somehow these companies have managed to
avoid any regulation, and customers have no actual data protection laws to defend
them. Even with the user agreement, the client has no way to negotiate its terms:
the user must accept or decline, and because most people just want to use the
social media tool, they accept without even knowing what they are accepting. As a
result, the data tracking business is currently expanding, while big data storage
capacities are growing and becoming cheaper.
Once, the state was the one interested in data extraction; now it seems that
corporations are the ones leading the business of data extraction. The government
somehow takes advantage of the lack of data extraction regulation, and more than
once it has aligned with private industries to get any desired information. It is a
win-win situation for both, which may explain why the government has no interest in
creating and enforcing a strict law to protect users from data extraction. Lyon (2014)
reviewed the effects of Edward Snowden’s revelations on big data, explaining the
situation since the alliance between the National Security Agency (NSA) and the
Internet companies in order to collect data for surveillance. This surveillance process
takes data from internet providers and cell phone providers, which gives every
possible chance of getting all the desired information. The data is filtered,
analysed and stored for as long as is considered necessary, using algorithms to
identify relations or any suspicious activities.
2.4 USER AWARENESS & PRIVACY CONCERNS
According to the literature, users are aware of privacy policies in relation to their
personal data being collected by data collectors, but the agreements are
complicated and they rely on the trust and good will of the company not to misuse
the data. In his study, Obar (2015) questioned what the behaviour of a digital citizen
should be like nowadays, mentioning how unreasonable it is to expect a regular person
to understand all the technical terms used in the contract agreements they have
signed: “Imagine having to understand, manage and control, not only the myriad data
stockpiles that exist, but also the routing data associated with every data
transmission” (p.12). A regular person or any other individual without technical
knowledge is not likely to realise how a third party may take advantage of or sell
their data. It is also assumed that a regular individual is not likely to keep
informed about any progress or modification related to any contract that they have
previously signed. Per Obar, users need a definite solution, a concrete law that
could actually protect them, citing Lippmann (p.13):
“The public is interested in law, not in the laws; in the method of
law, not in the substance; in the sanctity of contract, not in a
particular contract; in understanding based on custom, not in this
custom or that.”
In contrast, Norberg, Horne & Horne (2007) state that personal privacy will keep
deteriorating if the general public does not realise that they need to start making an
effort to actually understand what they are granting permission for, and to whom,
every time they share their personal data. Earlier research showed (cited in Norberg,
Horne & Horne, 2007, p.107) that users’ concerns about privacy are associated more
with risks of “potential negative outcomes” to themselves than with customers’
“trust” in the company that is handling their data. At the same time, because of the
way the current market works with regard to people’s ignorance of their data being
collected illegally, the authors claim that users’ behaviour may not be affected
much by these negative perceptions. In theory, users recognise the risks of releasing
private information; in practice, however, people often voluntarily consent to giving
away their data on the basis of trust, especially now that people are spending more
time in “data rich transaction channels” or the world wide web, and are
ignoring the “privacy policy” of the different online sites or simply “ticking” to
give their consent without reading it.
With the amount of big data available, modern data mining tools have been shown to
facilitate monitoring and tracking consumer “purchase behaviour”, in order to gain
deeper targeted insight into consumers’ needs (Graeff & Harmon, 2002). However,
the simplicity of collecting this information also contributes to growing privacy
concerns over companies’ intentions to use the data for their internal marketing
purposes or to make a profit by selling it to third parties, with users losing control
over where the information ends up. At the same time, Malhotra, Kim & Agarwal (2004)
claimed that a user’s thoughts on a company’s intentions for their private data
depend on the user’s personal beliefs. Sayre and Horne (cited in Norberg, Horne &
Horne, 2007) established that customers were willing to share their personal data
with a company if they got a reward in return. As a result, even when customers are
aware of the importance of their data, and even when they are concerned about their
privacy, they will likely end up knowingly giving it away.
The current belief that control over information is a key factor in measuring
consumers’ privacy concerns has led to the suggestion that sellers should start
treating the exchange of private data between them and users as an implicit
contract (cited in Phelps, Nowak, & Ferrell, 2000). Taking that into consideration, a
social contract would be in place any time a customer gives personal
information to a seller. This contract would be considered violated if the customer’s
data is being collected, if the seller rents the customer’s information to a third
party without asking for the customer’s consent, or if the customer is not allowed to
remove their name from a marketing list or otherwise restrict
the propagation of their data. In this research, it is assumed that the key to
relieving users’ privacy concerns is that customers would like to “have
more control” over their data in general and more control over how this personal data
is used.
“Consumer privacy exists when people can limit their accessibility
and control the release of information about themselves, and
invasions of privacy occur when control is lost or unwillingly
reduced as a result of a marketing transaction.” (cited in Phelps,
Nowak, & Ferrell, 2000, p.29)
These privacy issues are particularly obvious among young people using social
networks and modern technology, such as smartphones. Per Pybus, Cote &
Blanke’s paper (2015), mobile applications (apps) are more vulnerable to data
leaking than platforms that run through a desktop browser. Application
configurations do not distinguish between first and third parties, that is,
between the app proprietor and the other companies that the proprietor sells the
data to. In effect, the third party is granted easy access to the cell phone data. By
default, Android and iPhone applications share the device’s SIM identifiers, the
user’s phone number, so the third party has data and the identification of the data
producer for a long period of time. Despite this situation, young people still try to
find a way to protect their privacy; however, they end up giving up, accepting the
contract agreement and surrendering to social media, which inevitably generates more
data for the third party. In their paper, the authors commented on an experiment they
conducted, in which a group of young people developed an app that tracked data as
regular apps do, but tracked their own smartphone data. This gave them the chance to
work with their own mined data and do whatever they wanted with it, showing a
different scenario, in which the owners mine their own data and decide what to do
with it.
At a time when most young adults have a Facebook account, it is
acknowledged that this app tracks people whenever they open their browsers. Even
if a person does not possess a Facebook account, or the account holder
has logged out or disabled the tracking option, the app is still able to track
any subject (Skeggs & Yuill, 2015). Part of the agreement signed with Facebook
when opening an account mentions that the user “should not create an account that
is not for your own personal use, you will not create more than one personal
account, you will keep your contact information accurate and up-to-date” (Facebook,
2015). Basically, Facebook is trying to guarantee that its users remain
authenticated, because that is where the value resides: “they extract property from
the person rather than attaching property to the person” (Skeggs & Yuill, 2015,
p.384). Facebook requires “singularity” from the user as a way to guarantee the
authenticity of the data; however, Facebook is not interested in the individual per
se. What it actually does is turn all that “individual” data “into multiple aggregate
representations to be monetized as targeted ad space”.
2.5 CONCLUSION
The literature review attempted to provide a critical overview of what big data is and
what the issues are with regard to big data collection. Legal and personal privacy
issues were discussed. It was identified that different regulations are used to
maintain the integrity of data collection and privacy in different countries, with the
UK being stricter than the USA. However, there are still no real boundaries to data
collection.
The literature highlights that people are generally aware of the risks and of the fact
that their data is being collected and used. They do not want to hand it over, but at
the end of the day, under most circumstances, they will still submit their data
voluntarily.
When giving consent to data collectors, users have no deep knowledge of what is
included in the agreements, and the final destination of their data. It is a mix of
wanting to use the online services and difficulty of understanding the terms and
conditions. Young people are more involved in the privacy issues because they
generate a lot of data, but from their concerns to their attitudes, they are also more
interested in protecting their data, which makes them a good target to investigate.
3 METHODOLOGY
3.1 RESEARCH STRUCTURE
This research was based on a qualitative approach, using a semi-structured
interview for collecting the data, and a thematic analysis methodology for studying
the data. According to Vaismoradi, Turunen & Bondas (2013), qualitative research
methodology is a group of “philosophical perspectives, assumptions, postulates, and
approaches” that an analyst uses to leave their research open to “analysis, critique,
replication, repetition, and/or adaptation and to choose research methods”. Also, this
methodology is not a single research method, but “different epistemological
perspectives” that have helped to create different approaches, such as grounded
theory, phenomenology, ethnography, action research, narrative analysis and
discourse analysis (p.398). Per Dunn (1983), this type of methodology contributes
some guidelines to improve “and evaluate particular theories and models of
knowledge creations, diffusion and utilisation”.
With regard to interview methods, Qu & Dumay (2011) claimed that structured
interviews are more effective for studying facts, while unstructured interviews are
used for research related to meaning, and the semi-structured approach
is used for “social construction”, in a way that overlaps the two other approaches.
This fits with what this study is expected to achieve, so a semi-structured
interview was chosen as the better fit.
Lythcott & Duschl (1990) emphasise that any research relies on the coherence of
choosing the correct analysis methodology. For this study, thematic analysis was
chosen as the proper approach. According to Braun & Clarke (2016), “thematic
analysis is a method for identifying, analysing and reporting patterns (themes) within
data”. It is a method that provides flexibility which the other methods cannot provide.
However, there are no strict rules to determine what the themes are, and it relies on
the “researcher judgement” in order to define those. It also does not depend on
“quantifiable measures”, but is more focused on whether “it captures something important
in relation to the research question”. In effect, the thematic analysis is particularly
useful for analysing qualitative data. The following list provides a step-by-step guide
on how to conduct a thematic analysis based on Braun and Clarke (2016):
1. Getting familiarised with the data by reading it and re-reading it several
times;
2. Generating initial codes, by trying to code or label interesting features within
the entire data set;
3. Searching for common themes, by checking the codes and trying to group
them into possible themes;
4. Reviewing themes, and rechecking if the themes actually accomplished what
was stipulated in levels 1 and 2;
5. Defining and naming themes - after rechecking, deciding which will be the
final codes, with a clear definition and name for each theme;
6. Producing a report by comparing the data analysed with the literature and
the research questions.
A pivotal step in thematic analysis is to “catalogue related patterns into sub-themes”
as per Aronson (1995). Themes are patterns of living or behaviour that
can be identified in the data; in isolation they may not have a particular meaning,
but when put together they provide a better overview of the general opinion.
Gathering sub-themes gives a better understanding of the “pattern emerging”.
Based on the qualitative methodology, the research was divided into two stages.
The first part used a semi-structured qualitative interview method to retrieve the
necessary data from the respondents, who were University students recruited
through email. The participants were notified about their responses being collected,
anonymized and used for this research. The second stage involved analysing the
data from the interviews using the qualitative thematic analysis approach and
comparing the data results with the trends inferred in the literature review.
3.2 RESPONSE RATE
Part of the research process was to recruit participants who would provide their
opinions regarding data collection through social media channels. After receiving
the Ethical Approval, a general email was sent to all University members through
the University service, seeking students over 18 years old interested in participating
in the study. The response was positive, with 10 people replying
to the email in a short time, showing genuine interest in participating in the project.
Of the 10 email responses, 7 led to meetings that could be arranged around mutual
availability. The seven respondents attended
meetings held in the Diamond building group rooms from August 4th to
9th. Two other participants were brought by one of the interviewees who had been
recruited by email. Those participants came freely because of their shared interest in
giving their opinions on the research topic, making a total of 9 interviewees.
3.3 DATA COLLECTION METHODS
3.3.1 METHODS
For the data collection, a semi-structured interview was designed with assistance
from the dissertation supervisor. After obtaining the Ethical Approval, a general
email was sent through the University system, inviting any student over 18 years old
who wanted to help with the research. The invitation email can be found in Appendix
7.2. The consent form previously approved through the Ethical Approval process was
presented to the interviewees. The participants were given time to read it through
and were asked if they had any questions. It was also explained to them verbally that the
interview would be audio recorded, transcribed and anonymised. Both
transcripts and audio files would be stored on the University secure server, with
access limited to my supervisor and me, and the data would be deleted after the
dissertation was accepted. They were informed that they could stop the interview
at any time or decline to answer any particular question. They freely signed the form, and
a signed copy was given to them.
3.3.2 INTERVIEW
To prepare for the interviews, a first draft of the questions was used
in a test run with a colleague. This draft was then sent to the supervisor for
review, and some changes to the questions were made in collaboration with
her. The final draft, which included substantial changes, was the one used for the
interviews, making it impractical to use the test run as part of the research data.
The interviews with the nine participants were audio recorded with their
confirmed consent. Afterwards, the audio recordings were transcribed and
anonymised, and stored on the University’s secure drive as part of the ethical
procedures.
The interview was designed as semi-structured because this allows the
interviewer to follow previously prepared questions while expanding on them
depending on how the interview develops, giving it the feel of a more
relaxed conversation.
The interview was divided into 9 questions. The questions’
subjects were related to:
- Discussing the most used social media platforms
- Assessing the awareness of the ways their data was collected on social
media, the awareness of how and for what purpose their data was used by
third parties.
- Analysing the feelings and emotions related to their data being collected, as
well as any positive and negative thoughts about the usage of their data.
- Addressing and pinpointing risks and concerns in regards to security, privacy
and integrity of data being collected, and actions and precautions taken in
regards to the latter.
The full set of interview questions can be found in Appendix 7.4.
3.4 DATA ANALYSIS
3.4.1 INTRODUCTION
This section describes how the thematic analysis method
was used to study the data obtained during the interviews. The audio from
the interviews was transcribed and used for the analysis. The main thoughts extracted
from the transcripts were coded into labels that summarise the main idea of those
phrases. Themes and subthemes were then identified, and the most important codes were
grouped into them accordingly. The information was analysed
per code within each subtheme, and conclusions were drawn from that.
3.4.2 PROCESS DESCRIPTION
During the data analysis, all the transcripts from the interviews were read several
times. The phrases that contained the clearest thoughts and feelings were
highlighted. Those thoughts were coded in an effort to label them as shorter,
concrete phrases, giving a total of 150 codes. The codes were to be grouped into
themes as part of the thematic analysis procedure. At first glance, three
themes were identified: Social Media, Privacy and Data Collection. For the Social Media
theme, the subthemes identified were Platforms and Frequency. For Privacy, the
subthemes were Feelings, Perceptions, Precautions and Comments. Data
Collection contained the subthemes Awareness, Comments, Concerns, Feelings and
Purposes. All the main phrases that were coded were distributed between those
subthemes.
However, after rechecking all the codes, it was found that some subthemes were
shared between themes; for example, the Feelings and Comments subthemes
were common to Privacy and Data Collection. It was also recognised that
most of the topics were related to data collection rather than privacy, so it was more
appropriate to remove the Privacy theme and focus on Data Collection. Another
issue was that the Social Media theme contained information about the participants’
platform preferences, which are closer to quantitative data that does not need to
be grouped.
After that analysis, it was decided that a better approach was to redefine the
themes and subthemes. First, Data Collection would no longer
be a theme, and the information handled in that question would be treated as
statistical data. Secondly, the Privacy theme would be removed, as the information
obtained was not from the privacy perspective, but from the data collection point of
view. The Perceptions theme was introduced to handle all feelings and
emotions related to the fact that the interviewees’ data is being collected. The
Concerns theme was used to handle the data collection issues and risks that worry
the participants and the precautions taken by them.
With the new themes and subthemes defined, the codes were grouped accordingly.
However, because of the large number of codes, it was decided
that there should be between 5 and 8 codes per subtheme, which
required selecting the most substantial codes. In the code selection
process, the codes that were repeated most frequently across the interviewees
were given higher precedence. After that, the codes that contained the most
interesting and relevant information were selected.
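The frequency-based part of this selection can be illustrated with a minimal Python sketch. It is an illustration only, not a record of how the selection was actually carried out: the function name select_codes, the tuple layout and the sample codes are all hypothetical. The sketch counts how many distinct interviewees mentioned each code and keeps the most frequent ones per subtheme; the second, judgement-based step (choosing the most interesting codes) is not captured.

```python
from collections import defaultdict


def select_codes(coded_segments, max_per_subtheme=8):
    """Rank codes per subtheme by the number of distinct interviewees.

    coded_segments: iterable of (interviewee, subtheme, code) tuples
    (a hypothetical layout chosen for this sketch).
    Returns {subtheme: [(code, interviewee_count), ...]}.
    """
    # Collect the set of interviewees who mentioned each (subtheme, code),
    # so repeated mentions by the same person are counted once.
    supporters = defaultdict(set)
    for interviewee, subtheme, code in coded_segments:
        supporters[(subtheme, code)].add(interviewee)

    by_subtheme = defaultdict(list)
    for (subtheme, code), who in supporters.items():
        by_subtheme[subtheme].append((code, len(who)))

    selected = {}
    for subtheme, codes in by_subtheme.items():
        # Most frequent first; ties broken alphabetically for determinism.
        codes.sort(key=lambda item: (-item[1], item[0]))
        selected[subtheme] = codes[:max_per_subtheme]
    return selected


# Hypothetical sample data, for illustration only.
sample = [
    ("I1", "Risks", "hacking"),
    ("I2", "Risks", "hacking"),
    ("I2", "Risks", "hacking"),
    ("I3", "Risks", "location sharing"),
    ("I1", "Feelings", "scary"),
]
print(select_codes(sample, max_per_subtheme=1))
```

With an upper limit of 8 codes per subtheme, as described above, any subtheme that ends up with fewer than 5 codes would still need to be reviewed by hand.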
3.4.3 THEMES AND SUBTHEMES DESCRIPTION
From the 9 interviews that were analysed, 3 main themes were identified:
Awareness, Perceptions and Concerns. Between two and four
subthemes were identified for each theme.
Table 3.1 Themes and sub-themes
Themes                            Sub-Themes
Awareness of Data Collection      General Awareness
                                  Awareness of the Purpose of Data Collection
Perceptions of Data Collection    General Perceptions
                                  Feelings
                                  Perceptions of a Positive collection purpose
                                  Perceptions of a Negative collection purpose
Concerns of Data Collection       General Concerns
                                  Risks
                                  Precautions
According to the main subject areas listed in the data collection methods section, a
number of themes have been identified and the questions were grouped and
analysed accordingly. Three themes have been defined: Awareness of Data
Collection, Perceptions of Data Collection and Concerns of Data Collection. The
answers related to the questions about awareness of how their data was collected in
social media and by third parties were put into the theme Awareness. The answers
related to examples of their perception of how their data was collected in social
media and by third parties, feelings and emotions, positive thoughts, were grouped
in the theme Perceptions. The answers related to risks, concerns and precautions
taken about data collection were put into the theme Concerns. Also the data from
the first subject area detailing most used social media platforms was used to
quantify the preferences of the interviewees.
Within the Awareness of Data Collection theme, the subthemes General
Awareness and Awareness of the Purpose of Data Collection were chosen to group
the interviewees’ answers related to the second main subject. The General
Awareness subtheme was used to collect the opinions regarding the interviewees’
awareness of data collection: not particularly how the data is collected, but
whether they know that the data is being collected. In the second subtheme,
Awareness of the Purpose of Data Collection, the opinions related to knowing how
the data is used, or is intended to be used, were coded.
For the Perceptions of Data Collection theme, four subthemes were established.
The General Perceptions subtheme was used to group the opinions the interviewees
have about data collection, how they think their data is being collected, and any related
idea or thought about data collection itself. In the Feelings subtheme, the particular
sentiments or feelings brought about by knowing that their data has been
collected were grouped. After the interviewees’ awareness of their data being used
for a particular purpose had been coded under the first theme, two further subthemes
were identified. Positive purposes included the opinions of the interviewees showing
acceptance of how and where their data is being used for specific purposes,
while Negative purposes covered the opinions showing disapproval of such uses.
For the third theme, the participants’ opinions were distributed between three
subthemes. General Concerns was used to classify the thoughts that worry the
interviewees the most about their data being collected. The Risks subtheme was
created to group the thoughts related to what they consider a possible harm of data
collection. The ideas related to current or future measures taken by the participants
in order to protect their data were placed in the subtheme Precautions.
3.5 ETHICAL ASPECTS
As per University procedures, an ethical application was submitted to the
Information School department for review, which classified this research as “low
risk”. The application form and the certificate of approval letter are included in
Appendices 7.1 and 7.3. The interviews included a human element; however, the
participants were not asked for any personal or sensitive information, and no names, gender
or age were recorded. The interviews were voice recorded. All data taken from the
audio records was anonymised, as the participants were assured at the
beginning of the interview. An ethics consent form was given to the participants,
containing details about the research project and the kind of data that was going to be
collected from them, and explaining that the data would be anonymised and
saved on a secure storage drive with limited access. The participants were asked
to read and sign the form if they agreed to give consent to the interview. A copy of the consent
form was given to the participants. The audio recordings, the transcripts, and
scanned versions of the consent forms were uploaded to the University secure
research data file store.
3.6 LIMITATIONS AND RISKS
After being reviewed, this research was classified as low risk in the Ethical Approval
letter. The interviewees were recruited by email, using the University service,
which restricted the participants to University students over 18 years old and living
in the United Kingdom. Participation in the research carried no risks for the
interviewees. During the interviews no demographic data was documented: no
gender, age, background or nationality. After the interviews, the audio recordings
were transcribed and the participants’ names were replaced with the letter ‘I’
(for ‘interviewee’) followed by an ascending number representing the
order in which the interviewees participated in the project, as a way to anonymise
the data and protect the interviewees’ identity, avoiding any possible risk and fulfilling
the Ethical Approval agreement. The data analysis is based on the participants’
opinions provided during the interviews, which may not be entirely trustworthy;
however, this is a minor risk inherent in qualitative research.
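The labelling scheme described above (the letter ‘I’ followed by an ascending number in order of participation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name assign_labels and the placeholder names are hypothetical and do not come from the study.

```python
def assign_labels(participants_in_order):
    """Map each participant to 'I1', 'I2', ... in order of participation.

    participants_in_order: list of identifiers in the order the interviews
    took place (hypothetical input format for this sketch).
    """
    return {
        participant: f"I{position}"
        for position, participant in enumerate(participants_in_order, start=1)
    }


# Placeholder identifiers, for illustration only.
labels = assign_labels(["first participant", "second participant"])
print(labels)
```

In a transcript, only the generated label would then appear in place of the participant's name.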
4 RESULTS AND DISCUSSION
4.1 INTRODUCTION
This chapter discusses the results of the qualitative analysis of the nine
semi-structured interviews. The following sections describe the findings obtained
using the thematic analysis method for each of the themes identified in the data
analysis section.
The Awareness of Data Collection section illustrates the main opinions found in
the interviews regarding the participants’ knowledge of their data being collected,
showing that the majority of interviewees were aware
of the subject.
The Perceptions of Data Collection section sets out the participants’ thoughts
about their data being collected, their feelings about it, and their opinions on whether
they approve or disapprove of the purposes for which their data is being collected.
It illustrates that most of them balance knowing some of the uses
that their data will have against being aware that they are not fully in control of them.
The Concerns of Data Collection section discusses the subjects that worry the
interviewees: general concerns, what they consider risky about their data being collected,
and the precautions they have taken or are willing to take in order to protect their
information. Their most common resolution was to try to stay as private
as possible, keeping control of the data they share.
4.2 OVERVIEW
Figure 4.1 Number of interviewees per Platform.
The data show that all the interviewees use Facebook.
The first question of the interview asked which social media platforms the
interviewees use on a regular basis. All interviewees answered that they have a
Facebook account that they use on a daily basis. The second most used platform was
Twitter, used by 78% of interviewees, and the third was Instagram, with 67% of
interviewees. Snapchat, LinkedIn and WhatsApp were each used by 22% of interviewees.
The rest (Yik Yak, Telegram, Skype, Weibo, WeChat, Google+, Pinterest, Myspace,
Hi5 and Wayn) were each mentioned by only one interviewee (see Figure 4.1).
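The percentages above follow from dividing the number of interviewees per platform by the nine interviewees. A minimal Python sketch of that arithmetic is given below. The counts for Twitter, Instagram, Snapchat, LinkedIn and WhatsApp are not stated directly in the text; they are inferred here from the reported percentages, and the names platform_counts and share are hypothetical.

```python
TOTAL_INTERVIEWEES = 9

# Counts inferred from the reported percentages (assumption): with nine
# interviewees, 78% and 67% can only correspond to 7 and 6 people, and
# 22% to 2 people. Facebook was used by all nine.
platform_counts = {
    "Facebook": 9,
    "Twitter": 7,
    "Instagram": 6,
    "Snapchat": 2,
    "LinkedIn": 2,
    "WhatsApp": 2,
}


def share(count, total=TOTAL_INTERVIEWEES):
    """Return the percentage of interviewees, rounded to a whole number."""
    return round(100 * count / total)


shares = {platform: share(n) for platform, n in platform_counts.items()}
for platform, pct in shares.items():
    print(f"{platform}: {pct}%")
```

By the same calculation, each platform mentioned by a single interviewee corresponds to about 11% of the sample.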
4.3 AWARENESS OF DATA COLLECTION
4.3.1 GENERAL AWARENESS
The General Awareness subtheme showed that most of the interviewees are aware that
their data is being collected through social media channels, with 8
of the 9 interviewees acknowledging that their data is being collected. For
instance, interviewee I6 mentioned “I don´t know what data they collect from me, but
they must collect data, but I don´t know why or what for?”, showing a clear
understanding that the data is being collected even without knowing
for what specific purpose. There were common comments such as “I know lots of data is
being collected” and “I am aware of these things they do, so I am not really negligent”,
from interviewees I1 and I2 respectively. Others showed particular knowledge and
conviction about how the data is collected; for example, interviewee I3 mentioned, “I am
quite aware of how they may use the data or sell”. Only one participant was not
aware of the data being collected, stating, “I don`t know that….my information
probably be taken away while I don`t know”, [Interview I4].
4.3.2 AWARENESS OF THE PURPOSE OF DATA COLLECTION
The Awareness of the Purpose of Data Collection subtheme was created to capture the
ways in which the participants expressed their knowledge that their data is being
used for a particular purpose. As shown in the subtheme above, most of the
interviewees were aware that their data is being collected. This subtheme
shows that most of them are also aware of the final use of that collected data.
Except for the interviewee who was not aware of
data collection in the first subtheme, all showed some signs of knowledge. For example, I7
mentioned: “I wouldn`t say I am overly aware of particular ways, but I am aware that
is possible, I don`t know the how`s and exactly what”. Three of them (I2, I5, I6) made
comments revealing their understanding that their data will be used for marketing
purposes. I5 stated “I know it`s valuable to collect people`s information and to be
able to pinpoint what it`s necessary marketable to people”. The other three (I7, I8,
I9) were able to be more specific and pinpoint that the purpose was targeted
marketing. For example, I8 claimed that “probably third party companies would use
that to target market products to you, to make profit”. Other purposes were also
identified, such as "surveillance" and "health care". I5 expressed that data will be
used: “Mostly in marketing, sometimes in defense…I mean marketing, surveillance,
health care…”.
4.4 PERCEPTIONS OF DATA COLLECTION
4.4.1 GENERAL PERCEPTIONS
This subtheme coded the most notable thoughts shared by the
interviewees regarding their data being collected. Some mixed reflections were
identified. Two participants agreed that third party companies
need to make money somehow; for example, I9 mentioned: “that is just the reality of
an online enterprise, they have to make money”. Two others (I5, I8) found it
unsettling that their activities were tracked. According to I5, it “is kind of creepy too,
you don`t want to be watched in your back for apps that are going to attack you”. On
the other hand, interviewee I3 found the end balance as positive, “if we did the pros
vs cons list, there are probably more pros to having this personal information out
there than there are cons, in my opinion”. I9 was positive that users would be
informed of any use of private data, “I think a lot of the information, transactions that
I am pretty sure happen with my consent and my knowledge”. I1 expressed a
preference for social media channels that provide more privacy: “I am
moving to Snapchat, because I feel it has a higher level of privacy”. There is a sense
of understanding of the business but at the same time some sense of worry that the
information is no longer under the owners' control, as stated by I3: “anybody can
actually set up an application and start actually retrieving Twitter data, you got not
control of who actually got the data and who is using it”.
4.4.2 FEELINGS
The Feelings subtheme gathers the different emotions the participants shared
regarding data collection. As in General Perceptions, there are mixed
emotions on this subject. They seem to depend on whether the participant
considers the final destination of their data acceptable, and on not being in control of
deciding that final purpose. Two described it as scary but in different contexts: for I1
it was related to not knowing the information's final destination (“It`s a bit scary,
because we don`t know what they are going to do with the information.”), while for I9
it was about data going to an unwanted destination (“Some sort of huge intelligence
network to try to then find out what every single person thinks and knows, then yeah
that can be a bit scary”). This interviewee is also one of two interviewees,
together with I7, who agreed that they do not mind their data being used by third
parties. To this end, I7 mentioned: “I am saying basically I don`t mind about me”.
However, I9 stated that it depends on the intentions behind the use and on the
user being aware of it: “I think if my information it`s being used
maliciously, what I mean by that is that, if my information is being used by a
company that wanted to use me for its own profit without at least my understanding
it, I probably be a little bit annoyed”. Another interviewee, I6, expressed awareness
of data collection and being worried about that: “I read a lot of articles about it and it
makes me really worried”, while I8 reported being less worried: “I supposed,
I`m probably not as worried as I should be”.
4.4.3 PERCEPTIONS OF A POSITIVE COLLECTION PURPOSE
This subtheme contains the most common answers on what the interviewees considered
a good use of the personal information that has been collected, the following three
being the most frequently mentioned: targeted advertising, security and sociology. Five of the nine
interviewees agreed that targeted advertising is a positive purpose. I6 mentioned: “I
guess if you thought of advertising, cause sometimes you can find good stuffs that
you did not realized you wanted”. I1 confirmed that this purpose improves the
service the user is receiving: “at the moment I am not bothered because is improving
my experience”. I2 stated that it makes life more efficient: “all those things, they make our
day better” “I am more productive with those apps… some are good, some are not,
gives you more information, sometimes they mess up”. Security was also
considered a positive purpose by 2 interviewees; for example, I5 said: “Positive
reasons, would be to help understand people first and second would be security”.
Sociology was another common purpose, also mentioned by 2 interviewees, as I7 suggested: “The
idea that we have an idea of sociology from through digital methods it`s quite good”.
Analysis of trends was also considered positive by I6: “if you look at it more broadly
maybe if you analyse data you can find trends in data”.
4.4.4 PERCEPTIONS OF A NEGATIVE COLLECTION PURPOSE
This subtheme collected the negative opinions regarding a particular
purpose of the collected data. Most of the interviewees expressed dissatisfaction
when their data was used for a marketing purpose but targeted wrongly. Three
interviewees (I2, I5, I8) mentioned situations related to that, with two of them
giving particular examples that had happened to them, such as receiving
advertisements for products that they were not interested in purchasing, or that were
not gender appropriate, like female products being advertised to a male user,
which was the case mentioned by I5:
“I mean I am not a cross dresser, so if someone else looked at my system at that
moment and saw brassieres and panties I mean it see that I am trying to shop for
my girlfriend or something for me”.
Interviewee I2 had a similar situation, “if the coding makes an error, and because
they give information that is not pertain to you, and they keep bothering you on
something”. The interviewee also mentioned that excessive advertising can
negatively affect the efficiency of everyday routine, “anything that is fusing a lot of
information on me, that affects my own productivity and creativity”, turning the
marketing service into something irritating, “I cannot use the internet for free without
getting adverts, it´s annoying, that´s the only thing, but apart from that, I am alright”.
4.5 CONCERNS OF DATA COLLECTION
4.5.1 GENERAL CONCERNS
The General Concerns subtheme gathers the thoughts that worry the participants
regarding their data being collected. The three recurring subjects, each raised by
two of the 9 interviewees, were:
- Fear of being hacked, as explained by I2: “I wonder if someone hacks on the
information, oh my God, like my bank accounts, or stuff like that”.
- The uncertainty of how and by whom the information will be handled, as
mentioned by I9: “There is always that ‘What if your information ends up in
the wrong hands?’ Sort of question”. It was also mentioned by I1 that it can
turn into a scary situation: “It´s a bit scary, because we don´t know what they
are going to do with the information”, expressed also as a fear of the
unknown by I2: “I have a feeling like behind the scenes they are doing other
things that you are not supposed to be doing, like making clones of
everybody, like they can do stuffs that we don´t know about”.
- There is also the fact that there is no way to escape from Internet history, as
I5 said: “I mean the internet always has the way of never forgetting because
information that is collected is already stored somewhere”, I2, who shared
the same thought, suggested to be careful of posting data, because “the
internet never forgets, if you don´t want it out there, don´t even share it, don´t
put it on social media”.
4.5.2 RISKS
The Risks subtheme describes the issues the interviewees considered possible
risks arising from data collection. Two of the interviewees considered
“infringement of privacy” a possible risk when asked about that; for example, I5
mentioned: “your privacy has been infringed, it´s inoperable now, I mean if you try
yourself to be a very private person, I think the moment you go in the internet, your
privacy is broken”. I3 considered that a major hazard for young people could be
sharing their location; “there are these applications where you can track geo-
surveillance on specific regions, children should be careful posting too much
personal information, location information, should be really careful, that´s the big
risk.” Another possible risk was hackers seeking information, as
mentioned by I5: “password is really the way forward right now, when people can
actually hack a secure server and download millions of passwords just to get
information”. It was also noted that personal information, such as
the date of birth, could be used to get bank account information, as pointed out by I6:
“The risk is that if I´ve got personal information out there that lead to identify me,
people can use it fraudulently against me”. The interviewee also mentioned the
common risk that there is no control over the data: “I don´t know who has that data, I
don´t know where it goes, that´s the risk to me.”
4.5.3 PRECAUTIONS
The Precautions subtheme covers the main actions taken by
the interviewees in order to protect their data. Three participants mentioned trying to
be private as a measure to protect their information. For example, I2 stated: “I just
try to be private, apart, whatever I want to be out there I put it out there”. Two of
the three (I3, I6) also specifically mentioned that they are careful about what they
post on social media channels, as stated by I6: “I am quite careful what actually I put
on those platforms, like Facebook I barely post”. The same participant also
mentioned that deleting information may help: “just removing data, and I don´t
post any much information”. Another common preventive action, mentioned by two
interviewees, was to change the “account privacy settings”, as described by I3: “if the
user doesn´t want their personal or private information being posted on social
media, they should really make their account private”. I3 also highlighted the
importance of setting the right privacy settings: “you really need to be careful with
privacy settings, so as long as you´ve got quite high privacy settings that means that
no one can actually go through your page”. Configuring a Virtual Private Network
was another precaution, mentioned by 2 interviewees, for example I5: “I´ll probably get a
VPN to hide my IP address from being public”.
4.6 DISCUSSION
With regard to awareness, most of the interviewees were aware of their data
being collected, even when they were not sure what for or how. The majority of the
interviewees were aware of the purposes for that collected data, such as targeted
marketing, surveillance, behaviour tracking and healthcare. Mixed reflections
were identified among the interviewees. The different perceptions confirmed Malhotra, Kim
& Agarwal’s (2004) claim about subjectivity of the company’s intentions to the user’s
beliefs. Some users acknowledged that third party companies need to make money,
showing a sense of understanding of the business side, but at the same time some
sense of worry that the information is no longer under the owners’ control.
Marketing, for example, was the dominant topic among interviewees, with mixed to
positive results. Interviewees identified targeted advertising, as well as security and
sociology, as positive purposes for their personal information use. However,
interviewees expressed frustration and worry with their activities being tracked. They
also expressed dissatisfaction when their data was used for a marketing purpose but
targeted wrongly or used excessively. The end balance was positive, with
interviewees identifying more positives than negatives.
Interviewees identified many concerns about data collection, many of them fears and
risks. Among the risks, they mentioned hacking, identification of users from personal data
and privacy infringement. Users feared not knowing where the data will end up and
who might end up tracking them. However, overall, the interviewees were willing to
make concessions on their data use as long as it benefitted them, as Sayre and
Horne claimed (cited in Norberg, Horne & Horne, 2007). People said they were
scared of the surveillance factor; however, they felt safe with it, so they would rather
have it, even though they did not completely like it. This means that they accepted
having their privacy invaded to protect their personal safety. None of the
interviewees mentioned concerns over a lack of legal regulations. This confirms
Obar’s (2015) opinion that regular people do not bother with the legal considerations
of data collection, even though they are the ones who tick away their consent.
The findings also confirmed that young people are thinking about preventative measures,
as Pybus, Cote & Blanke (2015) suggested. As data protection measures, users listed
trying to be private, deleting information, changing account privacy settings
and configuring a virtual private network. However, they also realised there is no
way to escape from Internet history and that the easiest approach is to keep more personal
data to themselves.
5 CONCLUSIONS
The overall aim of the dissertation was to analyse and investigate the students’
perceptions of their social media data being used for marketing, surveillance and
other purposes with or without their awareness. Numerous articles were read in
order to build the literature review and examine the general trends of data collection
with regard to big social media data; its findings were then contrasted with the
research findings. The literature review highlighted how big data
from social networks became a source of business for marketing and is also used for
surveillance purposes. On the legal side, it was noted that there is a lack of strong
laws that actually protect the users who produce this data, meaning they are no
longer in control of their information. The literature also indicated that users are aware
of privacy policies but do not bother to understand them, preferring to sign
in and trust that the company will not misuse their data. The literature review helped
build the questions for a semi-structured interview to investigate students’
awareness and perceptions. The responses were then analysed to answer the main
research questions.
RQ1: Are the students aware of their personal data being collected?
Most of the interviewees were aware that their data is being collected through social
media channels. They understood that data shared on
the internet is no longer under the control of its owner. The World Wide Web is hard to
manage, and they are aware that keeping as much information as possible to themselves is the
most practical approach.
RQ2: What feelings, emotions, reactions, does this situation produce in them?
Data collection provokes different types of reactions in the participants. For instance,
they understand that social media is a business and needs to make a profit
somehow. But at the same time they are worried that their data may fall into the wrong
hands and end up being used for undesired purposes or to identify
private individuals.
RQ3: What do they think about that situation, if it is positive or negative?
They consider some purposes positive, such as targeted advertising, which
provides them with first-hand information that can be useful when they
are actually looking for something in particular. At the same time, they said
that mistargeted advertising can become annoying. Security and
surveillance were another topic that drew mixed reactions. Participants said
that knowing their location is being used for surveillance makes them feel safe,
because it gives a sense of personal security, but at the same time
they do not want to be watched. It was also mentioned that young people are not
completely aware of the consequences of sharing their location, which may become a
problem because it is easy to track people down using geolocation apps.
RQ4: What measures have they taken or plan to take in regards to that?
Most of the interviewees agreed that they want to keep their privacy, and
some measures were taken or considered in order to do so. Some of them changed their
account settings to make their profiles private; others decided to delete
personal information from their social media profiles that could be used to identify
them or be used against them. Others tried to configure virtual private
networks so that their information is transferred securely.
Overall, the participants were aware of the situation: they knew their data is being
collected and they were just living with it. For better or for worse, they did not actually
feel affected. They did not like their private information being used, but they
accepted it as how the social media world works. As long as it does
not put them in a dangerous situation, it will not stop them from carrying on their normal
online activities.
5.1 LIMITATIONS AND RECOMMENDATIONS
This study had a number of limitations. First, the study did not take into
consideration age, gender, nationality, cultural background or any other demographic
data. Second, the research used a single
qualitative research approach. Subsequent research would benefit from
triangulation, combining interview data with quantitative data from an online
questionnaire, or from using a different qualitative approach, such as grounded theory, to
assess interview data without reviewing the literature first. Also, the results were
rather ambiguous, as expected from the literature. Given these limitations,
it is recommended that further studies be carried out. A
follow-up study could gather responses from a wider geographical area
or a wider organisational background. A more comprehensive analysis using a
combined triangulation method could look for quantitative results in a wider
sample. After a questionnaire, interviews could be conducted with a wider sample in order
to get clearer overall insights into the general trends among the participants.
Because the response to privacy concerns was found to be mixed, future
research could focus primarily on what concerns users have over their data. Given
the lack of regulation and users’ indifference, future research should try to
collect users’ opinions on whether and how data collection should be regulated. Another
specific study could focus on ways to combat privacy issues, in addition to the
options suggested by the current interviewees. In light of current events in social
media, such as the Facebook and WhatsApp data-sharing scandal, it would be extremely valuable to
investigate further how users can effectively protect their social media data.
6 REFERENCES
Aronson, J. (1995). A Pragmatic View of Thematic Analysis. The Qualitative Report,
2(1), 1-3. Retrieved from http://nsuworks.nova.edu/tqr/vol2/iss1/3
Boyd, D., & Crawford, K. (2012). Critical Questions for Big Data.
Information, Communication & Society, 15(5), 662–679. doi:
10.1080/1369118X.2012.678878
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative
Research in Psychology, 3(2), 77–101. doi: 10.1191/1478088706qp063oa
Cukier, K., & Mayer-Schoenberger, V. (2013). The Rise of big data. Foreign Affairs,
92(3), 27–40. Retrieved from
http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=87000329
&site=ehost-live
Dunn, W. N. (1983). Qualitative Methodology. Science Communication, 4(4), 590–
597. doi: 10.1177/0164025983004004007
Eynon, R. (2013). The rise of Big Data: what does it mean for education, technology,
and media research? Learning, Media and Technology, 38(3), 237–240. doi:
10.1080/17439884.2013.771783
Facebook. (2016). Statement of Rights and Responsibilities. Retrieved from
https://www.facebook.com/terms
Graeff, T. R., & Harmon, S. (2002). Collecting and using personal data: consumers’
awareness and concerns. Journal of Consumer Marketing, 19(4), 302–318.
doi: 10.1108/07363760210433627
Gov.uk (2015). Data protection. Retrieved from https://www.gov.uk/data-
protection/the-data-protection-act
Kennedy, H., & Moss, G. (2015). Known or knowing publics? Social media data
mining and the question of public agency. Big Data & Society, 2(2),
2053951715611145. doi: 10.1177/2053951715611145
Kitchin, R., & McArdle, G. (2016). What makes Big Data, Big Data? Exploring the
ontological characteristics of 26 datasets. Big Data & Society, 3(1), 1-10. doi:
10.1177/2053951716631130
Lyon, D. (2014). Surveillance, Snowden, and Big Data: Capacities, consequences,
critique. Big Data & Society, 1(2), 1-13. doi: 10.1177/2053951714541861
Lythcott, J., & Duschl, R. (1990). Qualitative research: From methods to
conclusions. Science Education, 74(4), 445–460. doi:
10.1002/sce.3730740405
Malhotra, N. K., Kim, S. S., & Agarwal, J. (2004). Internet Users’ Information Privacy
Concerns (IUIPC): The Construct, the Scale, and a Causal Model.
Information Systems Research, 15(4), 336–355. doi: 10.1287/isre.1040.0032
Marr, B. (2016, March 15). 17 Predictions About The Future Of Big Data Everyone
Should Read [Blog post]. Forbes. Retrieved from
http://www.forbes.com/sites/bernardmarr/2016/03/15/17-predictions-about-
the-future-of-big-data-everyone-should-read/#4d361826157c
Norberg, P. A., Horne, D. R., & Horne, D. A. (2007). The Privacy Paradox: Personal
Information Disclosure Intentions versus Behaviors. Journal of Consumer
Affairs, 41(1), 100–126. doi: 10.1111/j.1745-6606.2006.00070.x
Obar, J. A. (2015). Big Data and The Phantom Public: Walter Lippmann and the
fallacy of data privacy self-management. Big Data & Society, 2(2), 1-16. doi:
10.1177/2053951715608876
Peacock, S. E. (2014). How web tracking changes user agency in the age of Big
Data: The used user. Big Data & Society, 1(2), 1-11. doi:
10.1177/2053951714564228
Phelps, J., Nowak, G., & Ferrell, E. (2000). Privacy Concerns and Consumer
Willingness to Provide Personal Information. Journal of Public Policy &
Marketing, 19(1), 27–41. doi: 10.1509/jppm.19.1.27.16941
Pybus, J., Cote, M., & Blanke, T. (2015). Hacking the social life of Big Data. Big
Data & Society, 2(2), 1-10. doi: 10.1177/2053951715616649
Qu, S. Q., & Dumay, J. (2011). The qualitative research interview. Qualitative
Research in Accounting & Management, 8(3), 238–264.
doi: 10.1108/11766091111162070
Scarfi, M. (2012, June 28). Social media and the big data explosion. Forbes.
Retrieved from http://www.forbes.com/sites/onmarketing/2012/06/28/social-
media-and-the-big-data-explosion/#7b6d9a4f6aa7
Schechner, S., & Koh, Y. (2016, August 29). European Regulators Scrutinize
WhatsApp Data-Sharing Plan With Facebook. The Wall Street Journal.
Retrieved from http://www.wsj.com/articles/european-regulators-scrutinize-
whatsapp-data-sharing-plan-with-facebook-1472506175
Skeggs, B., & Yuill, S. (2015). Capital experimentation with person/a formation: how
Facebook’s monetization refigures the relationship between property,
personhood and protest. Information, Communication & Society, 19(3), 380-
396. Retrieved from
http://www.tandfonline.com/doi/full/10.1080/1369118X.2015.1111403
Soares, L. (2012). The Rise of Big Data. EDUCAUSE Review, 47(3), 60-61.
Retrieved from http://er.educause.edu/~/media/files/article-
downloads/erm1237.pdf
Statista. (2016A). Number of social network users worldwide from 2010 to 2020 (in
billions). Retrieved from http://www.statista.com/statistics/278414/number-of-
worldwide-social-network-users/
Statista. (2016B). Age distribution of active social media users worldwide as of 3rd
quarter 2014, by platform. Retrieved from
http://www.statista.com/statistics/274829/age-distribution-of-active-social-
media-users-worldwide-by-platform/
Tynan, D. (2016, August 25). WhatsApp privacy backlash: Facebook angers users
by harvesting their data. The Guardian. Retrieved from
https://www.theguardian.com/technology/2016/aug/25/whatsapp-backlash-
facebook-data-privacy-users
Vaismoradi, M., Turunen, H., & Bondas, T. (2013). Content analysis and thematic
analysis: Implications for conducting a qualitative descriptive study. Nursing
& Health Sciences, 15(3), 398–405. doi: 10.1111/nhs.12048
Zwitter, A. (2014). Big Data ethics. Big Data & Society, 1(2), 1-6. doi:
10.1177/2053951714559253
7 APPENDICES
7.1 RESEARCH ETHICS APPLICATION
7.2 INVITATION EMAIL
7.3 CONSENT FORM
The University of Sheffield Information School
Perceptions and awareness of data collection in social media.
Researchers: Sara Michelle Urrea Aguilera ([email protected])
Purpose of the research: To analyse and investigate students’ perceptions of their social data being used for marketing, surveillance and other purposes with or without their awareness.
Who will be participating? We are inviting all higher education/university students over the age of 18.
What will you be asked to do? We will ask you a series of questions about your understanding of and views about personal data collection in social media. We would like you to provide as in-depth an answer as possible.
What are the potential risks of participating? The risks of participating are the same as those experienced in everyday life.
What data will we collect? We will collect the responses from a number of interviewees. All participants will be audio recorded during the interviews, and some anonymised notes will be taken by the interviewer.
What will we do with the data? A transcript will be created of each audio recording. The collected responses will be analysed and discussed in my dissertation for the master’s degree.
Will my participation be confidential? We will anonymise the responses of all interviewees. No identifying personal information will be used for the project research after the interviews. All data will be stored in a secure location on the Information School’s research data drive, which can be accessed only by me, my supervisor, and the School’s Examinations Officer and ICT staff operating the facility. I will also back up the data and store a password-protected version on my laptop. All data will be deleted once the dissertation is accepted.
What will happen to the results of the research project? The results of this study will be included in my master’s dissertation which will be publicly available. The results may also be published e.g. as a scholarly journal article. Please contact the School in six months.
I confirm that I have read and understand the description of the research project, and that I have had an opportunity to ask questions about the project.
I understand that my participation is voluntary and that I am free to withdraw at any time without any negative consequences.
I understand that if I withdraw I can request for the data I have already provided to be deleted, however this might not be possible if the data has already been anonymised or findings published.
I understand that I may decline to answer any particular question or questions, or to do any of the activities.
I understand that my responses will be kept strictly confidential, that my name or identity will not be linked to any research materials, and that I will not be identified or identifiable in any report or reports that result from the research, unless I have agreed otherwise.
I give permission for all the research team members to have access to my responses.
I agree to take part in the research project as described above.
Participant Name (Please print) Participant Signature
Researcher Name (Please print) Researcher Signature
Date
Note: If you have any difficulties with, or wish to voice concern about, any aspect of your participation in this study, please contact Dr Jo Bates, Research Ethics Coordinator, Information School, The University of Sheffield ([email protected]), or the University Registrar and Secretary.
7.4 RESEARCH ETHICS APPROVAL LETTER
7.5 INTERVIEW QUESTIONS
QUESTIONS:
In this interview I will be asking you about your awareness and perceptions of
personal data collection in social media. As you may have noticed, we have reached
an era where we are voluntarily and involuntarily giving away our private personal
information to third party companies that profit from our data and allowing them to use
it for their own means.
1. Which social media platforms do you use on a regular basis?
2. How aware do you feel you are about the ways in which your personal data
is being collected in your social media channels, for example, Facebook,
Twitter, Weibo?
3. In which ways do you think your personal details are being collected by
social media platforms?
In case they say they don’t know or fail to elaborate:
a. By using personal data during registration
b. By uploading pictures
c. By conversing through messaging services
d. By allowing third party applications to access your data, e.g. your GPS
location.
e. By tracking your activity on other websites
4. How aware do you feel you are about the ways in which your personal social
media data is being used by third parties other than the social media
companies that collect it?
5. Can you give some examples of how you think your social media data is
being used by third parties?
6. How does it make you feel that your social media data is being collected and
used for such purposes? / What emotions do you have about sharing sensitive
personal data in this way? Why do you think you feel this way?
7. What specific risks and concerns do you think sharing personal data on
social media poses?
8. Can you think of any positive reasons for personal data being collected? What
would they be?
9. Are you currently doing anything in regards to your personal data being
collected? Is there anything that you would like to do or plan to do in the
future but haven’t got around to yet?
7.6 ACCESS TO DISSERTATION
Access to Dissertation
A Dissertation submitted to the University may be held by the Department (or
School) within which the Dissertation was undertaken and made available for
borrowing or consultation in accordance with University Regulations.
Requests for the loan of dissertations may be received from libraries in the UK and
overseas. The Department may also receive requests from other organisations, as
well as individuals. The conservation of the original dissertation is better assured if
the Department and/or Library can fulfill such requests by sending a copy. The
Department may also make your dissertation available via its web pages.
In certain cases, where confidentiality of information is concerned, if either the
author or the supervisor so requests, the Department will withhold the dissertation
from loan or consultation for the period specified below. Where no such restriction is
in force, the Department may also deposit the Dissertation in the University of
Sheffield Library.
To be completed by the Author – Select (a) or (b) by placing a tick in the
appropriate box
If you are willing to give permission for the Information School to make your
dissertation available in these ways, please complete the following:
X (a) Subject to the General Regulation on Intellectual Property, I, the author,
agree to this dissertation being made immediately available through the
Department and/or University Library for consultation, and for the
Department and/or Library to reproduce this dissertation in whole or part in
order to supply single copies for the purpose of research or private study
(b) Subject to the General Regulation on Intellectual Property, I, the author,
request that this dissertation be withheld from loan, consultation or
reproduction for a period of [ ] years from the date of its submission.
Subsequent to this period, I agree to this dissertation being made available
through the Department and/or University Library for consultation, and for
the Department and/or Library to reproduce this dissertation in whole or
part in order to supply single copies for the purpose of research or private
study
Name Sara Michelle Urrea Aguilera
Department Information School
Signed Sara Urrea Date 01/09/2016
To be completed by the Supervisor – Select (a) or (b) by placing a tick in the
appropriate box
(a) I, the supervisor, agree to this dissertation being made immediately
available through the Department and/or University Library for loan or
consultation, subject to any special restrictions (*) agreed with external
organisations as part of a collaborative project.
*Special
restrictions
(b) I, the supervisor, request that this dissertation be withheld from loan,
consultation or reproduction for a period of [ ] years from the date of its
submission. Subsequent to this period, I, agree to this dissertation being
made available through the Department and/or University Library for loan
or consultation, subject to any special restrictions (*) agreed with external
organisations as part of a collaborative project
Name
Department
Signed Date
THIS SHEET MUST BE SUBMITTED WITH DISSERTATIONS BY
DEPARTMENTAL REQUIREMENTS.