MASTER'S THESIS
Data Loss/Leakage Prevention
Hariharan Sethuraman
Mohammed Abdul Haseeb
Master (120 credits)
Master of Science in Information Security
Luleå University of Technology
Department of Computer science, Electrical and Space engineering
Master Thesis - Data Loss/Leakage Prevention Page 1
Master Thesis – A7009N
Masters in Information Security
August 2012
Data Loss/Leakage Prevention (DLP)
Department of Computer and Electrical Engineering Division of Computer and System Science SE-971 87 Luleå SWEDEN
This Master Thesis is submitted to the Department of Business Administration and Social Science, Division of Computer and System Science at LULEÅ UNIVERSITY OF TECHNOLOGY in partial fulfillment of the requirements for the degree of Master of Science in Information Security. The Master Thesis is equivalent to 26 weeks of full time studies.
Contact Information:
Project Member(s): Mohammed Abdul Haseeb, Hariharan Jadavallur Sethuraman
University Advisor(s):
Jorgen Nilsson, Svante Edzen & Soren Samuelsson
LULEÅ UNIVERSITY OF TECHNOLOGY Department of Computer & Electrical Engineering Division of Computer and System Science SE-971 87 Luleå SWEDEN
ABSTRACT
In today's business world, many organizations use information systems to manage their sensitive and business-critical information. The need to protect such a key component of the organization cannot be overemphasized. Data Loss/Leakage Prevention (DLP) has been found to be one of the effective ways of preventing data loss.
DLP solutions detect and prevent unauthorized attempts to copy or send sensitive data, whether intentional or unintentional, by people who are authorized to access the sensitive information. DLP is designed to detect potential data breach incidents in a timely manner by monitoring data.
Data Loss Prevention is found to be the data leakage/loss control mechanism that fits naturally with the organizational structure of businesses. It helps the organization protect not only structured data but also unstructured data from leakage.
DLP is considered a preventive control which, when applied, helps an organization prevent leakage of sensitive information (personally identifiable information, financial information, trade secrets, mergers and acquisitions, etc.).
A DLP solution is not only for big organizations or for particular industry sectors such as banking and finance; small organizations and other fields of business (health care, aviation, consulting, etc.) also need it because of the laws and regulatory requirements of different countries.
In this thesis we present a case study of an organization that is a Fortune 500 company headquartered in the US with operations across the globe, whose business is mainly payroll processing and HR solutions. The sensitivity of the organization's work and data led us to conduct a detailed study and case research of DLP in conjunction with the earlier technologies that the organization still has after implementing the DLP solution.
We have done a detailed study of the security gaps that DLP has narrowed in the organization by studying all the technologies in place prior to implementing DLP.
Further research on DLP is also covered in this thesis; here we consider one of the most prominent emerging technologies in the market, cloud computing, together with DLP. Security is often cited as one of the major disadvantages of adopting cloud computing; in this report we have tried to find and analyze the gap that can be filled by integrating DLP with cloud computing. A more detailed study is still needed in this area of cloud computing with DLP.
ACKNOWLEDGMENT
We would like to thank the Almighty for giving us the opportunity to study at Luleå University of Technology and to work on this Master Thesis on Data Loss Prevention. A very big 'thank you' to our professors and supervisors for their guidance and support in helping us complete this work; to our organization supervisor for mentoring us throughout the thesis with valuable input, and for the comments and suggestions that have shaped this work; to all our lecturers and friends for very useful comments and suggestions that helped fine-tune the thesis; to our families, for their love and support throughout our course, especially during the Master Thesis course; and to all those who have contributed in their various ways to our successful studies at Luleå University of Technology.
We run short of words to express our deepest thanks to our university supervisors Svante Edzen, Jorgen Nilsson and Soren Samuelsson for their constant support, help and valuable suggestions throughout the course.
DECLARATION
This Thesis is subject to the Luleå University of Technology Thesis confidentiality agreement. Quotation from it is permitted, provided that the name of the Organization in which it was carried out remains anonymous. Copyright of this Thesis rests with the authors. This Thesis may not be reproduced under any circumstances.
CONTENTS
Abstract
Acknowledgement
Declaration
CHAPTER I
1.1 Introduction
1.2 Objective
1.3 Definition of Security gap
1.4 Organization Details
1.5 Problem Definition
1.6 Thesis Outline
1.7 Research Questions
1.8 Thesis Limitation
CHAPTER II
2. Literature review on DLP technology
2.1 Introduction on DLP
2.2 Defining the Data Loss
2.3 What data is sensitive?
2.4 Why data loss is so prevalent
2.5 Sizing up the data loss
2.6 DLP Key Features
2.7 DLP Limitations
2.8 Previous Research on DLP
CHAPTER III
3. Research Methodology
3.1 Research Approach
3.2 Research Purpose
3.3 Research Strategy
3.4 Data Collection Method
3.5 Analysis Plan
CHAPTER IV
4. Theoretical Study on Data Leakage Problems
4.1 Introduction
4.2 Definition of Data Leakage
4.3 Description of Data Leakage Problems
4.4 Classification of Data Leakage
4.5 Causes for Data Leakage Problems
CHAPTER V
5. Empirical Study and Data Analysis
5.1 Why Organization is in need of DLP technology?
5.2 Organization Details and History
5.3 Usage of DLP Technology in contrast with previously used technologies in organization
5.4 Empirical study on DLP products and its implementation
5.4.1 DLP Network
5.4.2 DLP Datacenter
5.4.3 DLP Endpoint
5.4.4 Design the Deployment
5.4.5 DLP Dashboard
5.4.6 DLP Admin Console
5.4.7 Content Analysis and Policy Application
Content Blades
Policies
5.4.8 Incidents
5.4.9 Reports
CHAPTER VI
6. Analysis
6.1 Research on Security gap analysis
6.1.1 Research Question
6.1.2 Data Leakage Problems in Case settings
Introduction
Case 1
Case 2
Case 3
6.2 Analysis of DLP existence in solving Data Leakage Problem
CHAPTER VII
7.1 Conclusion
7.2 Future Research on DLP
7.3 References
CHAPTER VIII
8. Appendices
8.1 Interview Questionnaire for data analysis
8.2 Abbreviations
List of Figures
1. Pragmatic data Security Cycle
2. Internal Security Breach Causes
3. Statistics of Leading Data Loss
4. DL DLP Suite Components
5. DLP Network
6. DLP Datacenter
7. DLP Endpoint
8. Development of DLP Suite
9. DLP Dash Board
10. DLP Admin Console
11. DLP Policies
12. Incident Remediation Workflow
13. DLP Reports
CHAPTER I
1.1 Introduction on Data Loss/Leakage Prevention
Data loss means a loss of data that occurs on any device that stores data. It is a problem for anyone who uses a computer. Data loss happens when data is physically or logically removed from the organization, either intentionally or unintentionally. Data loss has become one of the biggest problems for organizations today, and organizations are responsible for overcoming it.
Data leakage is an incident in which the confidentiality of information has been compromised. It refers to an unauthorized transmission of data from within an organization to an external destination. The data that is leaked is typically private in nature and deemed confidential, whereas data loss is the loss of data due to deletion, system crash, etc. Together, both terms can be referred to as a data breach, which is one of the biggest fears that organizations face today.
Data Loss/Leakage Prevention (DLP) is a computer security term for systems that identify, monitor, and protect data in use, data in motion, and data at rest[1]. DLP identifies sensitive content by using deep content analysis to peer inside files and network communications. DLP is mainly designed to protect information assets with minimal interference in business processes. It also enforces protective controls to prevent unwanted incidents. DLP can also be used to reduce risk, improve data management practices, and even lower compliance costs.
Such systems are designed to detect and prevent unauthorized use and transmission of confidential information. Vendors also refer to them as Data Leak Prevention, Information Leak Detection and Prevention (ILDP), Information Leak Prevention (ILP), Content Monitoring and Filtering (CMF), Information Protection and Control (IPC), or Extrusion Prevention System, by analogy to intrusion prevention system[1].
In this thesis, the researchers deal with both terms, data loss and data leakage, in analyzing how DLP technology helps minimize the data loss/leakage problem. The study is performed as case research on DLP technology from an organizational perspective.
Data Loss Prevention Phases:
Why DLP?
- To meet the various mandatory compliance and regulatory requirements, e.g. the Payment Card Industry (PCI) requirements for credit card handling
- To prevent client, business or employee data from being incorrectly disclosed internally and externally
- To provide global capabilities with central configuration and enforcement
Alternate Uses of DLP:
- Sensitive asset classification
- Sensitive asset audits
- Identity and access management audits
- Applying encryption to sensitive assets
- Applying enterprise digital rights management (EDRM) privileges to sensitive assets
1.2 Objective
Because the organization was facing issues with data loss, the objective of our Master Thesis is to evaluate how well DLP fills the security gap, compared with previously used technology, in solving the data loss/leakage problem. The capability to exchange confidential information securely and easily is an important need because the organization deals with sensitive payroll data. The evaluation is done through a detailed study and case research on Data Loss Prevention technology in the organization.
1.3 Definition of Security gap
In this part, the meaning of gap is explained first, then the meaning of security gap is described, followed by the causes that might turn a gap into a security gap problem.
What is meant by a gap? A gap is sometimes called 'the space between where we are and where we want to be'. Gap analysis is undertaken as a means of bridging that space. It is a technique for determining the steps that need to be taken in moving from a current state to a desired future state. It begins with the question "what is", proceeds to "what should be", and finally highlights the 'gaps' that exist and need to be 'filled'.
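The comparison of "what is" with "what should be" can be illustrated with a minimal sketch. The control names below are hypothetical examples chosen for illustration, and the function name is an assumption of this sketch rather than part of any formal gap analysis method.

```python
def find_gaps(current_state, desired_state):
    """Return the items required in the desired state but missing today."""
    return sorted(set(desired_state) - set(current_state))


# Hypothetical control inventories, for illustration only.
what_is = {"firewall", "IDS", "IPS"}
what_should_be = {"firewall", "IDS", "IPS", "outbound content monitoring"}

# Each entry in the result is a gap that still needs to be filled.
print(find_gaps(what_is, what_should_be))
```

A set difference is enough here because the sketch only asks whether a control exists; judging how well an existing control works would need a richer model.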
What, then, is a security gap? Security gaps are the vulnerabilities or weaknesses in the organization that pose a threat and can be exploited to mount an attack.
There are two kinds of attack: external and internal. External attacks are carried out by hackers and other people from outside the organization's network, by finding a vulnerability and exploiting it. Malware infection, DDoS attacks, and man-in-the-middle attacks are a few types of attack that are carried out to gain monetary benefit or to harm the organization's assets.
An internal attack is performed from within the organization's perimeter by a disgruntled employee, contractor, or vendor, either for monetary benefit or to take confidential, sensitive data out of the organization. Software code, PCI DSS information, financial reports, and NER reports are a few examples of the data targeted by attacks performed from inside the network.
Why is the gap a problem? The gap becomes a problem when a false feeling of information security is created, because this false feeling does not protect against threats. This may happen because the organization is not aware of the information security risks to its operations, accepts an unknown level of risk by default, decides on a risk level unconsciously, relies on ineffective controls, is unable to justify security spending, etc.
Although many security frameworks and techniques are available to the organization today, the overall security structure or measures are far from acceptable. The false feeling of security has various causes, such as interests, language, education, uncertainty, knowledge, views on process control, and methods of handling information (in)security. All these can be looked at in various ways[47].
1.4 Organization Details
The organization is one of the Fortune 500 companies and has its major business in payroll processing and human resource solutions, along with tax and computing solutions.
The organization is one of four U.S. companies to hold a AAA credit rating from Standard & Poor's (S&P) and Moody's.
We have kept the organization's name anonymous because of the nature of its business: it deals with sensitive (payroll) data of more than 545,000 organizations.
1.5 Problem Definition
Data leakage has become a persistent problem in many organizations, and a major one from an organizational viewpoint. The problem arises even where sensitive data and company documents are protected by a security model. In the organization, most data loss results from insider attacks. The security technologies already implemented in the organization, such as firewalls, IDS, and IPS, are powerful, but they mainly help against outsider attacks on data. Since these technologies offer little help against insider attacks, a data leakage problem results.
Given the above problem, does the organization have what it takes to counter it? It is now evident that data must be protected from loss for the organization to keep a competitive edge. But given this data leakage problem, how is the organization supposed to prevent it?
1.6 Thesis Outline
Chapter II is a literature review of DLP technology; it provides detailed scenarios of the types of data loss and of the sensitive data that DLP can monitor and protect from leakage.
Chapter III presents the research methodology, including the type of research and the ways in which data was collected.
Chapter IV presents the theoretical study on data leakage problems.
Chapter V gives the empirical study and data analysis, with an overview of the organization and of its usage of DLP technology in comparison with previously used technology.
Chapter VI gives the results on the research question, which focuses on the security gap filled by DLP compared with previous technologies. It also describes data leakage problems in case settings, and analyzes the role of DLP in solving the data leakage problem as well as the DLP products and their implementation in the organization.
Chapter VII contains the conclusion, future research on DLP with cloud computing, and the references.
Chapter VIII contains the interview questions and appendices.
1.7 Research Questions
How is the security gap filled using DLP technology compared to previous techniques?
1.8 Thesis Limitation
Legal issues in adopting Data Loss Prevention in different countries and regions due to various
laws of Governance, Risk and Compliance.
CHAPTER II
2. Literature Review on Data Loss/Leakage Prevention
2.1 Introduction on DLP
In this chapter we present the background behind the engineering of Data Loss Prevention technology. Although the chapter may read like a problem statement, it speaks generally about the overall issues in and around organizations. In a broader sense, it gives an understanding of security breaches, with examples that happened at various organizations, and of the sources of the data loss threat. It also focuses on what type of data should be considered sensitive to the organization and monitored to avoid leakage, since not all data/information can be considered sensitive.
Data loss or data breach is one of the biggest fears that most organizations face today. The term DLP stands for Data Loss/Leakage Prevention and was introduced in 2006. Like numerous security products, DLP has improved and is beginning to influence the security industry worldwide. During our research we came across a paper stating that DLP technology had also been known as Information Loss Prevention/Protection, Information Leak Prevention/Protection, and Extrusion Prevention. DLP technology then gained popularity in early 2007[2].
In some organizations, there is a wide hole in the controlled and secure environment that was created to protect electronic assets. This hole is the way in which businesses and individuals communicate with each other over the Internet.
Whether it is email, instant messaging, webmail, a website form, file transfer, or another electronic communication in use in the company, it remains uncontrolled and unmonitored on its way to its destination, with the potential for confidential information to fall into the wrong hands.
Data loss prevention (DLP) is concerned with identifying sensitive data and is one of the most critical issues facing CIOs, CSOs and CISOs in today's strictly regulated and ultra-competitive environment. Creating and implementing a DLP strategy can seem an intimidating task, but effective solutions are available. This thesis presents best practices for preventing leaks, enforcing compliance, and protecting the company's brand value and reputation[3].
2.2 Defining the Data Loss
The data loss issue ranges from confidential information about a single customer being exposed to dozens of a company's product files and documents being sent to a competitor. This can happen in many ways, accidental or deliberate, with insiders releasing sensitive data such as customers' personal information, intellectual property, or other confidential information in violation of company policies and regulatory requirements.
Consider a few high-profile examples:
- AOL posts search engine data containing personal information about its members
- A DuPont employee leaks $400 million in intellectual property
- A former Ceridian employee accidentally posts ID and bank account data for 150 employees of an advertising firm on a website
Many more data loss incidents like the above have occurred, and the list goes on[4].
Because today's employees have ready access to the means to expose sensitive data electronically, the scope of the sensitive data loss problem is greater than protection against outside threats alone.
To cover all the ways in which loss can occur, an organization has to consider:
- Data in motion – Any data that is moving through the network to the outside via the Internet
- Data at rest – Data that resides in file systems, databases and other storage methods
- Data at the endpoint – Data at the endpoints of the network (e.g. data on devices such as USB, external drives, laptops, mobile devices, etc.)[3,4].
2.3 What Data is Sensitive?
Two important drivers, regulatory compliance and intellectual property protection, are behind data loss prevention efforts.
Regulatory Compliance: Almost every organization falls under one or more local or international regulatory mandates. Companies are required to take measures to protect private and personally identifiable information under whichever regulations they belong to. Personally identifiable information can be used to uniquely identify a particular person (whether a customer, employee, student, patient, etc.). For example, in the U.S., thirty-five states presently require a company that suffers a data loss to notify the individuals whose personally identifiable information is breached[4].
Data loss is a fundamental problem not only for companies in data-sensitive fields, but also for any organization conducting business worldwide.
Simple missteps can constitute regulatory violations, such as sending a legitimate email that contains an unencrypted ATM password or credit card information, or sharing a report that includes employee personal data or medical data with an unauthorized person.
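As an illustration of how unencrypted credit card information in a message might be spotted, the following minimal sketch combines a digit pattern with the Luhn checksum used by payment card numbers. The pattern and function names are assumptions made for this sketch; it does not describe the detection logic of any particular DLP product.

```python
import re

# Candidate card numbers: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text):
    """Return digit strings in the text that look like valid card numbers."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step reduces false alarms from other long digit runs, such as order or invoice numbers, that match the pattern but are not card numbers.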
Intellectual Property (IP) Protection: In today's environment, IP is a major concern for all organizations. Protecting important business assets, for example against someone trying to obtain secret information or an employee taking sensitive information with them, is a key driver of data loss prevention efforts.
According to a 2006 report of the United States Trade Representative, U.S. businesses are losing approximately $250 billion annually because of trade secret theft. Trade secrets can be diagrams, flowcharts, program devices, formulas, patterns, techniques, suppliers, etc. Given this breadth, chances are good that employees may not even know they are handling IP [5].
Companies need to take the necessary steps to better protect valuable IP from situations such as:
- Sending unreleased pricing information to an unauthorized or wrong email address.
- An employee sending customer information to a third party for financial gain without their knowledge.
- Sending ownership information to a distributor who might forward it on to competitors.
- Forwarding an email containing business plans to another email address.
2.4 Why Data Loss is so prevalent?
The world is now connected through electronic communication in numerous ways, no matter where we are around the globe. Access to electronic data has become crucial in day-to-day business.
For instance, many companies have offshore and/or international development offices, all of which exponentially increase the opportunity for data loss. Confidential information can travel to the far corners of the earth using simple email communication.
Over the years, organizations have spent large amounts of resources to protect their information. The majority of that effort was focused on preventing outsiders from hacking into the organization. Yet unintentional information loss by employees and partners accounts for the majority of leaks in leading firms. Research conducted on data loss prevention indicates that more than half of security breaches are caused by insiders. An employee can cause sudden damage to the company with a simple click of a mouse[3,5].
2.5 Sizing Up the Data Loss Problem
Companies that neglect safety procedures are paying for not monitoring and controlling electronic communications. Most organizations scan inbound email for dangerous content but fail to check their outgoing email and IM, allowing the unauthorized transfer of sensitive information outside the organization.
Below are the estimates stated in Deloitte's Global Security Survey report[6]:
- 49% of companies have experienced an internal security breach in the past year.
- 31% experienced a breach from a virus/worm incident.
- 28% through insider fraud.
- 18% by means of data leakage.
- 96% of respondents reported that they are concerned about employee misconduct involving their information systems.
FIGURE 1: Nearly half of all companies surveyed have experienced an internal security breach in the past year. The most common breach causes are outlined in the chart above [6].
(Source: Deloitte's Global Security Survey)
Data loss can cause an organization to violate compliance regulations. It can threaten the considerable brand value that a company has built. To protect its investment in assets such as brand, products, partnerships and employees, a company can no longer afford to ignore this hole in its corporate protection.
Given the prevalence of electronic communications, data in motion (i.e., data that is traveling through and out of the network) is one of the most significant data loss vectors to address today. For instance, an employee may send a document to their personal email address in order to work from home, or a hospital employee may send patient information to the wrong person.
There are also many channels through which confidential data can leave an organization via the Internet:
• Webmail
• HTTP (message boards, blogs and other websites)
• Instant Messaging
• Peer-to-peer sites and sessions
• FTP
Firewalls and other network security solutions currently in use do not include data loss prevention capabilities to secure data in motion. An organization's employees can still leak confidential information unless appropriate controls are in place within the company to address the data loss problem.
To prevent employees, consultants and any other authorized users from transmitting sensitive information outside the organization, companies should implement comprehensive data loss prevention solutions that encompass multiple layers such as email, Web, instant messaging and more. Such solutions should also address the risks inherent in data at rest and data at the endpoint[7].
DLP Traffic Cop
To tackle the vulnerability of data in motion, companies need a traffic cop that monitors and controls every communication that leaves the company.
A DLP solution prevents confidential data loss by monitoring communications that go outside the organization, encrypting email that contains confidential information, enabling conformity with global privacy and data security mandates, securing outsourcing and partner communications, protecting intellectual property, preventing malware-related data harvesting, enforcing acceptable use policies, and providing a deterrent for malicious users. In addition, a DLP solution can be instrumental in helping companies comply with regulations.
An outbound email that contains personally identifiable information or personal information can be encrypted automatically. Keeping DLP best practices in mind can help the organization determine the right solution for its specific data loss requirements[8].
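The traffic cop idea can be sketched as a function that inspects each outbound message and chooses an action. Everything in the sketch below is an assumption made for illustration: the `Action` values, the internal domain `example.com`, the marker string, and the US SSN-like pattern standing in for personally identifiable information. A real DLP product applies far richer content analysis and policies.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    ENCRYPT = "encrypt"
    BLOCK = "block"


@dataclass
class OutboundMessage:
    sender: str
    recipient: str
    body: str


# Hypothetical detection rules, for illustration only.
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US SSN-like format
BLOCKED_MARKER = "TRADE SECRET"
INTERNAL_DOMAIN = "example.com"


def decide(message):
    """Pick an action for one message under the assumed policy."""
    if message.recipient.lower().endswith("@" + INTERNAL_DOMAIN):
        return Action.ALLOW  # only traffic leaving the organization is checked
    if BLOCKED_MARKER in message.body.upper():
        return Action.BLOCK  # marked intellectual property must not leave
    if PII_PATTERN.search(message.body):
        return Action.ENCRYPT  # personal data may leave, but only encrypted
    return Action.ALLOW
```

The order of the checks reflects one possible design choice: blocking takes precedence over encryption, so a message carrying both marked intellectual property and personal data is stopped rather than encrypted and sent.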
A critical component of data loss prevention is the definition and enforcement of acceptable use policies for electronic communications. Acceptable use policies cover activities such as illegal activities or sending communications to certain parties without legal disclaimers. To enforce acceptable use policies, organizations require a DLP solution with capabilities such as:
- Blocking unlawful activities
- Prohibiting the distribution of copyrighted content
- Preventing the use of gambling websites
- Enforcing the message policy
- Adding legal disclaimers to outgoing mail.
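The last capability in the list, adding legal text to outgoing mail, can be sketched with Python's standard `email` package. The disclaimer wording, addresses and function name are hypothetical, and the sketch assumes a plain-text, non-multipart message.

```python
from email.message import EmailMessage

# Hypothetical disclaimer text, for illustration only.
DISCLAIMER = "This message may contain confidential information."


def add_disclaimer(message, disclaimer=DISCLAIMER):
    """Append the disclaimer to a plain-text message unless it is already present."""
    body = message.get_content()
    if disclaimer not in body:
        message.set_content(body.rstrip("\n") + "\n\n" + disclaimer + "\n")
    return message


mail = EmailMessage()
mail["From"] = "employee@example.com"
mail["To"] = "partner@example.org"
mail["Subject"] = "Quarterly figures"
mail.set_content("Please find the figures below.")

add_disclaimer(mail)
print(mail.get_content())
```

Checking for an existing disclaimer keeps the text from piling up when a message passes through the same gateway more than once.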
2.6 DLP key features
DLP allows an enterprise to:
- Monitor all network transmissions for sensitive content.
- Block or quarantine transmissions in violation of policy.
- Protect all types of data within the enterprise.
Monitoring - DLP identifies a wide range of sensitive enterprise content, from information in
confidential documents, to customer and privacy related information, to content specified by
customers, or provided out-of-the-box.
Enforcing - DLP uses information gathered from monitoring to enforce enterprise data privacy
policies and to meet designated compliance requirements.
Analysis - DLP recognizes over 900 different file types. Analysis of the data is based on the
actual content of the file and not the extension that is used with the file[8].
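Identifying a file by its content rather than its extension is commonly done by checking the leading bytes of the file against known signatures. The sketch below uses a handful of well-known signatures; the table and function names are assumptions of this sketch, and it covers only a tiny fraction of the 900 file types mentioned above.

```python
# A few well-known file signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also the container for .docx, .xlsx and .pptx
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def detect_type(data):
    """Guess a file type from its leading bytes, ignoring any file name."""
    for signature, file_type in SIGNATURES:
        if data.startswith(signature):
            return file_type
    return "unknown"


def extension_mismatch(filename, data):
    """Return True when the content type is known and disagrees with the extension."""
    detected = detect_type(data)
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return detected != "unknown" and detected != extension
```

With this check, a PDF renamed to `holiday.jpg` is still recognized as a PDF. Container formats such as `.docx` would be reported as a mismatch by this simple version, so a fuller implementation would need a mapping from extensions to acceptable content types.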
2.7 DLP Limitations
What DLP can't find:
- DLP tools can only monitor encrypted content when they have access to the key, which is typically limited to e-mail and Web traffic (with the use of a specialized SSL proxy).
- Most tools are limited in their ability to detect "stealth" traffic (i.e., tunneled protocols), unless a local host DLP agent is installed.
- Endpoint tools are nascent, and many vendor offerings are limited in terms of scanning and blocking capabilities.
Keeping the above limitations in mind, a suitable DLP technology has to be identified, as no empirical research has been undertaken in this area.
2.8 Previous Research on DLP
PR1: While exploring the web we found a thesis on Data Loss Prevention by Hannes Kasparick, written in 2008, in which the author highlights the data loss issue in organizations and its causes.
Most companies are not aware of industrial espionage and loss of intellectual property until a high-quality copy of one of their products is released at a low price by a competitor. Previous IT security measures are limited to protecting data from attackers outside the company. Over the past few years the IT security industry has developed methods to protect data from internal attackers, called "data loss", "data leakage" or "extrusion" prevention systems, respectively. Conventional access controls and firewalls cannot protect against attackers who have legitimate access to sensitive data. Data leakage prevention systems close this gap and prohibit people who have legitimate access from distributing secret data. The thesis describes the existing technologies for achieving this, and under which objectives and circumstances the use of a data leakage prevention system makes sense. In addition, the thesis describes security models and the legal environment.
At the beginning, the author shows how data can leave a company. The classification of data is the second step in protecting data. After this, the technical implementation of a data leakage prevention system is described. An example implementation of a data leakage prevention system in a fictional company offers practical advice[9].
PR2: While searching the web, we found another interesting paper on Data Loss
Prevention Technologies by Tomoyoshi Takebayashi, Hiroshi Tsuda, Takayuki Hasebe, and
Ryusuke Masuoka, written in 2009, in which the authors highlight the guaranteed safety of
sensitive data when moving data outside the organization's premises.
This paper introduces three technologies:
Secure information environment: making it safe to carry data
Secure communication: countermeasures against erroneous E-mail sending
Secure document management
These technologies enable information to be used in an expanded
workplace with guaranteed safety, without placing a burden on the user. This includes
working offline at a customer's office and at other organizations, including collaborating
companies.
The authors begin by introducing a solution for moving data safely outside the organization's
premises using a universal serial bus (USB) memory device. Next, they present
an E-mail filter for preventing the erroneous sending of E-mail containing sensitive information.
Finally, they describe document management technology for protecting a
document across its entire lifecycle, including editing. The paper also touches on data loss
prevention technologies in the emerging area of software as a service (SaaS) and cloud computing.
The paper therefore describes current activities towards data loss prevention in a
multi-organization environment. The authors conclude that these three technologies have a
complementary relationship that supports a practical, composite solution to data threats, and
that the risk of data loss via paper, which is known to be a major data leakage route, must
also be tackled. The technologies presented must therefore be linked with other security
technologies developed by Fujitsu Laboratories, which is developing data loss prevention
technologies that integrate know-how related to mobile devices, data searching technologies, and
security technologies such as encryption.
CHAPTER III
3. Research Methodology
The aim of this chapter is to give an overview of the general research methodology used in this
study, as well as the tools used in data collection and analysis.
3.1 Research Approach
There are three kinds of research approaches in scientific research: quantitative research,
qualitative research and mixed research. Different researchers give different definitions of
quantitative research, qualitative research and mixed research. Here are some of them:
Quantitative Approach: Quantitative research involves strategies of inquiry
such as experiments and surveys, and collects data on predetermined instruments that yield
statistical data. Generally, quantitative research aims at explanation and primarily answers
the question "why?". Quantitative data collection is based on precise measurement using
structured and validated data collection instruments such as closed-ended items, behavioral
responses and rating scales.
In addition, quantitative research is defined as social research that employs empirical methods
and empirical statements. The cited author states that an empirical statement is a descriptive
statement about “what is the case in the real world” rather than “what ought to be the case”.
Quantitative research is therefore essentially about collecting numerical data to explain a
particular phenomenon, and particular questions seem immediately suited to being answered using
quantitative methods[10].
Qualitative Approach: Qualitative research involves strategies of inquiry such
as narratives, phenomenologies, ethnographies, grounded theory studies, or case studies.
Generally, qualitative research is a type of scientific research that aims at understanding
and primarily answers the question "how?". Qualitative data collection is based on in-depth
interviews, participant observation, field notes and open-ended questions. Here the researcher
is the primary data collection instrument.
“Participant observation [for collecting data on naturally occurring behaviors in their usual
contexts], In-depth interviews [for collecting data on individual perspectives and experiences],
and Focus groups [also called group interviews; effective in eliciting data on cultural
groups]” are some kinds of qualitative research methods [46].
Mixed Approach: Mixed research involves the mixing of quantitative and qualitative
methods. The mixed approach involves strategies of inquiry such as collecting data either
simultaneously or sequentially to best understand the research problem. The data collection
involves gathering both numeric and textual information. A study may, for example, begin with a
broad survey and then focus on qualitative, open-ended interviews to collect detailed views from
participants [12]. There are three ways of mixing the data: merging the data, connecting
the data, and embedding the data. It is not enough to simply collect and analyze
quantitative and qualitative data; they need to be mixed together so that they form a more
complete picture of the problem than they do when standing alone [12].
Based on the above, we consider our research to follow the qualitative approach. The
research therefore does not require the statistical analysis that the quantitative approach
suggests. The aim of this research is to gain a detailed understanding of how DLP technology
minimizes the data loss problem in the organization[11].
3.2 Research Purpose
Generally, a research purpose can be classified as descriptive, correlational, explanatory, or
exploratory.
Descriptive Research: Descriptive research can be either quantitative or qualitative. It is
primarily concerned with finding out "what is" about the research topic. Descriptive
research is heavily dependent on instrumentation for measurement and observation. This
type of research attempts to describe systematically a situation, problem or phenomenon,
and describes attitudes towards an issue[12].
Correlational Research: In general, a correlational study is a quantitative method of
research that attempts to discover or establish the relationship between two or more
variables or aspects of a situation.
Explanatory Research: Explanatory research attempts to clarify why and how there is a
relationship between variables, for example to explain a behavior in the market.
The goal of explanatory research is to answer the question of why.
Exploratory Research: Exploratory research, also termed a formulative study, has the main
purpose of formulating a problem for more precise investigation or of developing a
working hypothesis from an operational point of view. Exploratory research is conducted
on a problem where there are few or no earlier studies to refer to. This kind of research is
typically used to identify and obtain information on a particular problem and to gain
general information and a deeper insight into a phenomenon or the nature of the
problem[13].
The purpose of this research is to explore how the organization ensures that data loss is
prevented through the use of DLP technology. The idea is to get a deeper insight into how this
DLP technology works and prevents data loss, and finally to propose a framework the organization
can use to successfully prevent data loss. In short, the purpose of this research is exploratory
in nature, aimed at gaining a deeper understanding.
3.3 Research strategy
Generally, a research strategy is a way of collecting and analyzing empirical evidence by
following some logic. A research design is the logic that links the data to be collected and the
conclusions to be drawn to the initial questions of the study; it ensures coherence[14]. There are
five major research strategies: experiments, surveys, archival analysis, histories, and case
studies. Each strategy has its own strengths and weaknesses and can be used for all three research
purposes: exploratory, descriptive, and explanatory[16].
Case study research involves the study of an issue explored through one or more cases within a
bounded system. The author also states that it is a qualitative approach in which the investigator
explores a case through detailed, in-depth data collection involving multiple sources of
information and reports a case description and case-based themes[12]. Case studies
exist in three variations: the single instrumental case study, the multiple (collective) case
study, and the intrinsic case study[12]. In a single instrumental case study, the researcher
focuses on one issue and then selects one bounded case to
illustrate the issue. In a collective case study, one issue is again selected, but the inquirer
selects multiple cases to illustrate the issue. The intrinsic case study focuses on the case
itself because the case presents an unusual or unique situation.
This research is therefore designed as a case study, more specifically a single instrumental
case study. The research focuses on the phenomenon “How does DLP technology
help in minimizing the data loss/leakage problem in conjunction with previously used
technologies in the organization?”, which was examined from three different perspectives:
product, people and process.
3.4 Data Collection Method
Generally, qualitative research often emphasizes the human factor in order to understand people's
behavior, knowledge, attitudes and fears. Qualitative research involves qualitative data that are
obtained through methods such as surveys or interviews, on-site observations, and focus groups.
"Data are the empirical evidence or information one gathers carefully according to rules or
procedures"[17]. We also found that the aim of a data collection strategy is to obtain answers from
different sources, which lets the researcher describe, compare, and relate one
characteristic to another and demonstrate that certain features exist in certain categories [18].
A case study is a qualitative approach in which the investigator explores a case through detailed,
in-depth data collection involving multiple sources of information (such as observations,
interviews, documents, audio-visual materials) and reports a case description and case-based themes.
Basically, there are two types of data collection methods: primary and secondary[11].
Primary data collection: This involves three types of strategies: interviewing,
questioning, and observation. It is the most substantial method in all qualitative inquiry. It
yields first-hand information collected through various methods such as observation, interviewing,
mailing, etc.
Secondary data collection: Secondary data have been collected and processed by other researchers
for purposes different from those for which they are now used. It is very common practice for
companies and organizations to collect, process, use, and store data in support of their
operations. Secondary data are mostly collected from sources such as magazines, newspapers, TV,
the internet, reviews, and research articles.
For this research, interviews, observations, documents, and reports have been extensively used as
forms of data collection. The main data were captured from the company's internal knowledge base
(real-time or empirical data), as one of our researchers works for the organization on a Data
Loss Prevention project, while the other researcher prepared and conducted interview questions
to gather the project details with respect to the thesis outline. Both closed and open-ended
questions were used during the interviews, and the interviews were conducted via email. In
addition, security journals and DLP publications such as Data Leak Prevention (ISACA) were used in
collecting the data.
3.5 Analysis Plan
Generally, once all the required data are collected, the data analysis process begins.
Data analysis is an interplay of inquiry between the experience and knowledge that we as inquirers
bring to the research and the information that is embedded in the data. Analysis is an ongoing
part of the research process, from the framing of the initial questions and the designing of the
first program strategy to the “testing” of a set of beliefs about why and how things happen the way
they do and what the results of our program efforts have been for an individual or community.
Analysis tells us different things at different points in our process of learning more and
understanding our work better.
Data analysis consists of examining, categorizing, tabulating, testing, and recombining
both quantitative and qualitative data in order to achieve the purpose of the study. The author
also states that the analysis plan should help the researcher choose a method that completes the
analysis of the research. Coding, pattern coding, and data visualization in a matrix were the
techniques used to analyze the data. Coding is the process of labeling the researcher's
interpretations of units of meaning in the data. This is where the segments of data are
summarized. Pattern coding is a way of grouping those summaries into emerging themes that are
explanatory in nature[18]. The author further states that there are four analysis
techniques for case study research:
Pattern matching: Comparing an empirically based pattern with a predicted one.
Explanation building: A type of pattern matching, where the goal is to analyze the case study
data by stipulating a set of causal links about it.
Time-series analysis: Multiple measures of different variables in order to look at changes
over time.
Program logic models: A combination of pattern matching and time-series analysis, where a
complex chain of patterns over time is stipulated.
On the basis of the theoretical framework of the study, the data analysis is done in relation to
information security as a product, process, and people. As stated earlier, after reviewing DLP
books, security journals, e-papers, and company internal data (empirical data), the next
step is to prepare interview questions for the company security officer that help the
researchers understand data loss prevention using DLP technology in the
organization. After all this is done, the data are analyzed. For this study, pattern matching
is more suitable than the other techniques, since it gives the researchers the
ability to compare and match the answers with the frame of reference of the research, in order to
understand how security gaps are filled using DLP technology in comparison with previously used
technology in the organization. Finally, for this single case study, we have condensed the data
and displayed only the valid and/or important data for the research.
CHAPTER IV
4. Theoretical study on Data Loss/Leakage Problems
4.1 Introduction
Over the past few years, the data leakage problem has rocketed to the top of IT chiefs' agendas.
Data loss (which is also called data leakage) is a direct threat to confidentiality. It leads to
the disclosure of information to parties who do not have permission to read it. This
gives an attacker a basis for further steps: the data loss may help compromise the integrity or
availability of the information concerned. In practice, despite huge efforts in securing computer
systems, the data loss problem still exists and has become a hot-button topic in organizations.
4.2 Definition of Data Leakage
Data leakage (also known as information leakage) is an incident in which the confidentiality of
information has been compromised (ISF). The data that is leaked can either be private in
nature or deemed confidential. This information can be used by attackers to further exploit
the system[19].
4.3 Description on Data Leakage Problems
“Information Security can be treated as a game between attackers and organizations”[20]. The
attacker tries to access information assets, whereas the organization acts as a defender
protecting its information assets. Organizations use technical countermeasures such as firewalls,
access control mechanisms, virus scanners, etc. to prevent illegal activities which might lead
to data leakage. As human behavior is also a cause of data leakage
in an enterprise, organizations implement IS security policies to reduce the
vulnerabilities created by human behavior[21].
Nowadays data has become more mission-critical, for example hospital patient records, a graduate
school thesis, tax information, personal finance, etc. Today users are storing more information
electronically than ever. In general, data loss can be divided into leakage, and disappearance or
damage. When an organization does not have proper control over its sensitive data, data leakage
occurs, which can have a major impact on the organization. This is called
confidentiality loss in computer security parlance. Data losses such as hacked customer
databases, stolen credit-card details, etc. can be caused by a leakage problem.
How can data leakage be detected?
According to the reviewed paper, there is a way to detect leaked data. Consider sensitive data
that was given by a data distributor to third parties (agents) and is later found in an
unauthorized place. The party responsible for the leak can be identified by means of data
allocation strategies. In some cases it is possible to inject “realistic but
fake” data records, which help to detect the data leakage and to identify the guilty party.
Through this, the distributor can assess the likelihood that the leaked data came from one or
more agents[45]. From the literature review, it was found that a guilt model is used to
detect the guilty agents using data allocation strategies such as explicit data request and
sample data request, without modifying the original data[44].
In a sample data request, the distributor has the freedom to select the data items provided to
the agents. The idea behind the sample data request is to provide agents with sets of data that
are as disjoint as possible. The problem associated with this is that in some cases the
distributed data must overlap.
In an explicit data request, the distributor must provide agents with the requested data. The idea
behind this strategy is to add fake data to the distributed data in order to minimize the overlap
of distributed data. The problem associated with this is that agents can collude and
identify the fake data.
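The fake-record idea described above can be illustrated with a minimal Python sketch. It is not the guilt model of the cited papers [44][45]; it is a simplified illustration that assumes each agent receives its explicitly requested records plus a few fake records that no other agent receives. The function names `distribute` and `suspects`, the `FAKE-` prefix and the scoring rule are hypothetical.

```python
import uuid


def distribute(records, agents, fakes_per_agent=2):
    """Give every agent its requested records plus unique fake records.

    records maps agent -> list of requested records (explicit data request).
    Returns (allocation, fake_owner): allocation maps agent -> set of records,
    fake_owner maps each fake record -> the single agent that received it.
    """
    allocation, fake_owner = {}, {}
    for agent in agents:
        # Hypothetical fake records; a real system would make them look realistic.
        fakes = {f"FAKE-{uuid.uuid4().hex[:12]}" for _ in range(fakes_per_agent)}
        for fake in fakes:
            fake_owner[fake] = agent
        allocation[agent] = set(records[agent]) | fakes
    return allocation, fake_owner


def suspects(leaked, allocation, fake_owner):
    """Rank agents by how strongly the leaked set points at them."""
    leaked = set(leaked)
    scores = {}
    for agent, given in allocation.items():
        fake_hits = sum(1 for rec in leaked if fake_owner.get(rec) == agent)
        overlap = len(leaked & given) / len(leaked) if leaked else 0.0
        scores[agent] = (fake_hits, overlap)
    # Assumption: fake-record hits are the strongest evidence, overlap breaks ties.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

If a leaked data set contains a fake record that was handed only to one agent, that agent is ranked first. As the text notes, colluding agents could compare their data sets and strip out the fake records, which this sketch does not defend against.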
When data, or a copy of it, is no longer available to an organization, this is data disappearance
or damage. A data copy is a secondary copy of data, also called a backup, which is taken as a
precaution in case any damage occurs to the stored data. For instance, in an incident in
2009, a major cell phone service provider suffered a widespread loss of customer data when
a server crash at the storage service temporarily wiped out backups of memos, photos, and
other data for more than a million smartphone customers. Normally, the smartphones would
automatically synchronize their data at power-off, storing it for use when the phone was switched
on again. Another common problem for an enterprise is laptop theft.
In the literature, the data leakage problem is described from two perspectives: a practical
perspective and a theoretical perspective. From the practical perspective, design flaws and
implementation errors are common causes of data leakage problems. From the theoretical
perspective, covert channels such as covert storage channels and covert timing channels are among
the causes of data leakage problems. The author also strongly states that information leakage is a
direct threat to confidentiality[23].
4.4 Classification of Information Leakage
The author of the paper classifies information leakage into three levels: a leak of a
document containing confidential data can be classified as an unintentional leak, an intentional
leak, or a malicious leak[24].
Unintentional Leak:
Figure: Unintentional Leak (Attach document; Zip and send; Copy & Paste)
The above figure shows the possible ways of causing an unintentional leak. Unintentional
leakage normally occurs when a user mistakenly sends confidential data or information to a third
party or to the wrong recipient. This is done without any personal intention. For instance, an
employee mistakenly sends an email with an attached document that contains confidential data to
the wrong person or to a vendor.
Intentional Leak:
Figure: Intentional Leak (Document rename; Document type change; Partial data copy; Remove keyword)
The above figure shows the possible ways of causing an intentional leak. Intentional leakage
normally occurs when a user tries to send a confidential document without being aware of company
policy and sends it anyway. This usually happens when a user bypasses the security rules
and regulations or devices without trying to gain personal benefit. For instance, an
employee renames a document or folder and copies part of the data from it.
Malicious Leak:
Figure: Malicious Leak (Character encoding; Print screen; Password protected; Self-extracting archive; Hide data)
The above figure shows the possible ways of causing a malicious leak. Malicious leakage is usually
caused when a user deliberately tries to sneak confidential data past the security rules,
policies or products. For instance, an employee takes confidential data from an enterprise
system and sends it out through email, possibly even introducing a vulnerability into the system.
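The three leak categories described above can be written down as a simple lookup, sketched here in Python. The action names come from the figures in this section; the severity ordering (unintentional < intentional < malicious) and the rule of reporting the most severe category observed are assumptions made for illustration, not part of the cited classification [24].

```python
from enum import IntEnum


class LeakType(IntEnum):
    # Assumed ordering: a higher value means a more deliberate leak.
    UNINTENTIONAL = 1
    INTENTIONAL = 2
    MALICIOUS = 3


# Actions listed in the three figures, keyed in lower case.
ACTION_TO_LEAK = {
    "attach document": LeakType.UNINTENTIONAL,
    "zip and send": LeakType.UNINTENTIONAL,
    "copy & paste": LeakType.UNINTENTIONAL,
    "document rename": LeakType.INTENTIONAL,
    "document type change": LeakType.INTENTIONAL,
    "partial data copy": LeakType.INTENTIONAL,
    "remove keyword": LeakType.INTENTIONAL,
    "character encoding": LeakType.MALICIOUS,
    "print screen": LeakType.MALICIOUS,
    "password protected": LeakType.MALICIOUS,
    "self-extracting archive": LeakType.MALICIOUS,
    "hide data": LeakType.MALICIOUS,
}


def classify(actions):
    """Return the most severe leak type among the observed actions, or None."""
    levels = [
        ACTION_TO_LEAK[a.strip().lower()]
        for a in actions
        if a.strip().lower() in ACTION_TO_LEAK
    ]
    return max(levels, default=None)
```

For example, an incident in which a user both copied and pasted text and took a screenshot would be reported as `LeakType.MALICIOUS` under this assumed rule.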
4.5 Causes for Data Loss Problems
There are many causes of data loss. After a careful review, the researchers
identified the main causes of data loss, which include, but are not limited to:
Accidental Deletion
Computer Viruses and Malware
Physical Damage
Accidental Formatting
Head Crashes
Logical Errors
Continued Use After Signs of Failure
Power Failure
Firmware Corruption
Natural Disasters
Accidental Deletion: Human error is the major cause of accidental deletion, which leads to data
loss. A user may accidentally delete important files, which can cause even bigger issues.
This type of data loss can be recovered with file recovery software, as long as the user is
prepared ahead of time and acts quickly to recover the information.
Computer Viruses and Malware: There are plenty of viruses and other malware that can
cause data loss, either by intentionally deleting files or folders or by causing hard drive
crashes. This type of data loss can be prevented by installing virus protection software on the
system.
Physical Damage: Physical damage can be caused in many ways. For instance, a hard drive
is a sensitive piece of machinery, and mishandling it can physically damage the drive platters,
resulting in loss or corruption of data. This type of data loss can be minimized
by handling drives with proper care and operating them within their recommended parameters.
Accidental Formatting: Generally, formatting a hard drive leads to loss of the information
contained on it. This mainly happens when a user selects the wrong device while
attempting to format another device. The lost information can be recovered either with a data
recovery program or by calling a data recovery specialist right away.
Head Crashes: A head crash is a hard disk failure that happens when the read/write head of the
hard disk comes into contact with its rotating platter. This can happen, for example, when a user
drops a laptop on the ground. The data stored on the hard disk may be damaged, which can easily
lead to data loss. When a hard disk has such a problem, users should never attempt to
disassemble and repair it themselves. Preventing this type of data loss requires careful handling
by the user.
Logical Errors: Logical errors can be caused in different ways, such as by system or file
corruption, software problems, and invalid entries in file locations. This can lead to major
data loss. This type of data loss can be remedied by reinstalling the operating system and
restoring files from backup. In some cases, logical errors can be fixed using disk utilities.
Continued Use After Signs of Failure: Data loss of this kind is mainly due
to ignoring the early signs of drive failure. Warning signs such as grinding noises, system
hangs, and random file deletion, which indicate that the drive may be failing, are ignored by
users, which can lead to data loss.
Power Failure: This is a common cause of data loss. A sudden power-off may lead to
loss of unsaved files and can even lead to file corruption. One of the best ways to prevent
this type of problem is the use of an uninterruptible power supply (UPS). Another way to minimize
this kind of data loss is for the user to save files frequently.
Firmware Corruption: Firmware is an important component of a hard drive. It allows data to be
read from and written to the disk. If the firmware is corrupt, the drive will fail even if its
electronic and mechanical components are fully functional. The main causes of firmware corruption
include virus attacks, unskillful attempts to update it, and self-corruption due to excessive and
prolonged use. The hard drive's firmware can be restored by low-level programming, which can
restore access to the data. CompuRecovery can also be a possible solution for recovering data from
drives with corrupt firmware. Ensuring a proper backup is a must for minimizing this type of data
loss problem.
Natural Disasters: Data loss can be caused by lightning strikes, power surges, floods, fires, and
earthquakes that damage hard drives. It is recommended to protect computer devices with a quality
surge protector. An online backup service should be used to make sure that the data are in a safe
place when a disaster occurs. These measures help in minimizing the data loss problem
in case of a natural disaster.
In addition to the above-mentioned causes of data loss, there are further reasons
why data breaches happen. No one sets out to lose data, but it happens
for a number of reasons that include cyber terrorism, human behavior, lack of processes, and
organized crime.
For better understanding, the figure below shows statistics on the leading causes of data loss.
Figure: Statistics of Leading Causes of Data Loss (Computer Virus; Natural Disaster; Hardware or System Malfunction; Human Error; Software Corruption)
Implications
Finally, the researchers note that data leakage is a silent threat. This chapter has described the
data leakage concept in detail, so that the reader can clearly understand the
data leakage problem, the causes and occurrence of data leakage, and how it
can be classified based on the incidents. The causes shown here reflect real
case scenarios that can occur in industry. Actual real case scenarios, collected while
conducting the case research in the organization, are shown in a later chapter.
CHAPTER V
5. Empirical Study of Data Loss/Leakage Prevention
In the course of our research, one of our researchers, who works for the organization, collected
real-time data and conducted face-to-face questionnaires with two or more security officers, while
the other researcher interviewed a security officer by email. Together we have organized the data
into a theoretical structure describing how DLP technology helps in
securing data in the organization. This empirical study also gives details of the organization and
its history, and explains why the organization is in need of DLP technology.
5.1 Why is the Organization in need of DLP Technology?
The organization needs the capability to exchange confidential information securely and
easily, and it uses Data Loss Prevention technology to prevent the loss of that data.
Confidential Data
o Credit Card / Client Information
o Customer privileged data
o Employee personal data
o Business Confidential data
Secure data from
o Employee Error, Employee Theft
5.2 Organization Details and History
The organization is a Fortune 500 company listed on NASDAQ, with more than $10 billion in
revenues and around 570,000 clients. It is one of the world's largest providers of business
outsourcing solutions. With over 60 years of experience, the organization offers a broad
range of tax, payroll, human resource and benefits administration solutions. Its user-friendly
solutions for employers provide value to companies of all types and sizes [24].
The organization is also a leading supplier of integrated computing solutions to motorcycle, auto,
truck, recreational vehicle, marine and heavy equipment dealers throughout the world.
The organization is one of four U.S. companies to get a AAA credit rating from Standard &
Poor's (S&P) and Moody's[25].
It serves the industry or organizations through two strategic groups – Employer Services and
Dealer Services.
Employer Services: Employer Services supports organizations from recruitment to retirement.
Employer Services has more than 545,000 organizations as its customers and provides
payroll, benefits and HR solutions[26].
Employer Services serves the market with the following business units:
National Account Services: This unit serves employers with more than 1,000
employees
Major Account Services: This unit serves employers with between 50 and 1,000
employees
Small Business Services: This unit serves employers with fewer than 50
employees
Added Value Services: This unit provides special services in insurance, retirement, tax,
and pre-employment for organizations of all sizes
Retirement Services: This unit provides administrative services for defined contribution
plans for small, medium and large companies
Total Source: This unit provides Professional Employer Organization (PEO) services
for small and medium-sized organizations
ES International Services: This unit does business across the globe with organizations
of all sizes
ES Canada: Canada's marketplace leader in providing human resources, time and
attendance management, payroll, comprehensive outsourcing services and occupational
health & safety
Dealer Services (DS): Dealer Services offers integrated technology solutions and services to
more than 25,000 automotive dealerships of all sizes throughout the world, as
well as to vehicle manufacturers.
It provides industry-leading solutions that help auto, truck, motorcycle, marine, recreational
vehicle, and heavy equipment dealers use technology to increase efficiency and reduce costs in
every area of their operations[27].
5.3 Usage of DLP Technology in contrast with other existing technologies in organization
Various technologies are used in the organization to prevent data loss. Although these
technologies are very powerful, they mainly help against outside attacks on data, whereas the
currently deployed DLP technology focuses mainly on inside attacks.
Below are the technologies currently used in the organization for preventing security breaches.
Here the researchers concentrate on how these technologies address security issues in
comparison with DLP.
Anti-malware: Malware stands for malicious software; it is software designed to
disrupt computer operation, gather sensitive information, or gain unauthorized access to
computer systems. Types of malware include viruses and worms. A virus requires user
intervention to spread, whereas a worm spreads by itself.
Malware infects computers in order to directly use the infected computer's data. Malware
steals data to obtain personal and proprietary information. Security threats that steal data
include key loggers, screen scrapers, spyware, adware, backdoors, and bots.
Anti-malware is software used to protect computers against malware attacks. This software
hooks into the operating system's core or kernel functions in the same way as malware
that attempts to operate from there. Each time the operating system performs a task, the
anti-malware software checks that the OS is doing an approved task[28].
Although anti-malware software works very effectively in a real-time environment, it
only looks for threats from outside; through scanning and signature validation it ensures that
malware infections are removed. Anti-malware software thus helps in preventing data loss
from external threats, but it has no mechanism for internal threats.
Firewalls: Firewalls are devices or software which permit or deny network transmissions
based on a set of rules and are used to protect networks from unauthorized access while
permitting legitimate communications to pass.
A firewall is software or hardware that helps keep a network secure. Its
objective is to control the incoming and outgoing traffic of a network by analyzing the
data packets and determining whether each should be allowed through. A network's
firewall forms a barrier between an internal network (secure) and an external network,
i.e. the Internet (insecure).
Different types of firewalls are used in the organization, and a firewall is one of the best
security features to implement. The major problem is that a firewall works on
Access Control Lists, often known as ACLs. An ACL rule either allows or denies matching
traffic completely. For example, if a rule is set to deny any outgoing traffic of a certain
kind, it will block all such traffic, including the legitimate traffic of that
kind[29].
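The all-or-nothing behaviour of an ACL can be illustrated with a minimal sketch. The rule format, the first-match evaluation order and the names AclRule and evaluate are assumptions made for illustration; they do not describe the firewalls used in the organization.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AclRule:
    """Hypothetical ACL entry: None in a field means 'match anything'."""
    action: str               # "allow" or "deny"
    protocol: Optional[str]   # e.g. "tcp"
    dst_port: Optional[int]   # e.g. 21 for FTP


def evaluate(rules: List[AclRule], protocol: str, dst_port: int,
             default: str = "deny") -> str:
    """Return the action of the first matching rule (first-match semantics)."""
    for rule in rules:
        if rule.protocol in (None, protocol) and rule.dst_port in (None, dst_port):
            return rule.action
    return default


# The rule sees only protocol and port, never the content being sent,
# so a legitimate FTP upload is denied together with a leaking one.
rules = [AclRule("deny", "tcp", 21), AclRule("allow", "tcp", None)]
print(evaluate(rules, "tcp", 21))    # FTP traffic, regardless of content
print(evaluate(rules, "tcp", 443))   # HTTPS traffic
```

Because the decision never looks at the payload, the only way to let a legitimate transfer through is to open the rule for everyone, which is the gap that content-aware DLP policies are meant to close.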
Configuration, patch, system and vulnerability management tools - Vulnerability
management is an ongoing process that identifies and protects valuable data and
intellectual property and mitigates vulnerabilities. Vulnerabilities are addressed by
applying patches or changing configuration settings to fix the root cause and shield
systems from threats.
Vulnerability scanners are used to identify and classify vulnerabilities. They look for
vulnerabilities known and reported by the security community, and those which are
already fixed by the relevant vendors with patches and security updates.
This aspect of security management helps in fixing the vulnerable software bugs
through which attacks are carried out. It makes the infrastructure and the
applications stronger so that outside attacks such as XSS and SQL injection can be stopped
[30].
IDS/IPS - An intrusion detection system (IDS) is a device or software application that
monitors network or system activities for malicious activity. It identifies a potential
security breach, logs the information and raises an alert.
An intrusion prevention system (IPS) also monitors network and system activities for
malicious activity. It identifies malicious activity, logs information about it, attempts
to block or stop it, and reports it.
Intrusion prevention systems are extensions of intrusion detection systems: both
monitor network traffic and system activities for malicious activity, but intrusion prevention
systems can also prevent or block the intrusions that are detected. They can perform
actions such as raising an alarm, dropping the malicious packets, and blocking the
traffic from the offending IP address.
Though IDS/IPS and firewalls all relate to network security, an intrusion detection
system (IDS) differs from a firewall in that a firewall looks outwardly for intrusions in
order to stop them from happening. Firewalls limit access between networks to prevent
intrusion and do not signal an attack from inside the network.
An IDS evaluates a suspected intrusion once it has taken place and signals an alarm. An
IDS also watches for attacks that originate from within a system. It examines network
communications, identifies heuristics and patterns (signatures) of common computer
attacks, and takes action to alert. A system that also terminates connections is an intrusion
prevention system.
IDS/IPS cannot prevent data loss, as it only monitors for and identifies possible
threats and then flags them, raises an alert, or terminates the connection[31].
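The difference between detection and prevention can be sketched in a few lines. The two signatures and the function name inspect_payload are hypothetical and far simpler than the signatures of a real IDS/IPS; the sketch only shows that the same match leads to an alert in one mode and to a block in the other.

```python
from typing import Optional, Tuple

# Hypothetical signatures: name -> substring that marks a known attack pattern.
SIGNATURES = {
    "sql-injection": "' OR 1=1",
    "path-traversal": "../",
}


def inspect_payload(payload: str, prevent: bool) -> Tuple[str, Optional[str]]:
    """Match the payload against known signatures.

    prevent=False models an IDS (alert only); prevent=True models an IPS (block).
    """
    for name, pattern in SIGNATURES.items():
        if pattern in payload:
            return ("block" if prevent else "alert", name)
    return ("pass", None)
```

In both modes the decision depends on attack patterns, not on whether the payload contains sensitive business data, which is why neither mode addresses data loss.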
SIEM - Security Information and Event Management (SIEM) is a tool used on enterprise data
networks to centralize the storage of logs generated by the software running
on the network. Its features include gathering, analyzing and presenting
information from network and security devices;
identity and access management applications; vulnerability management and policy
compliance tools; operating system, database and application logs; and external threat data.
It monitors and helps manage user and service privileges and directory services, and it
provides log auditing and review and incident response.
Although this technology can collect events and logs and store them for a certain period of
time, it doesn't have the capability to prevent data loss[32].
Identity and access management - Identity and access management includes policies,
processes, provisioning, authentication, authorization, privileges/permissions and
entitlement enforcement. The system products and applications of identity and access
management are solutions implemented for enterprises and organizations. It manages
an identity from its creation to its termination in order to prevent misuse [33].
All the above technologies are used to prevent external attacks and do very little to
prevent insider attacks/threats.
In contrast to the above technologies used for Loss Protection/Prevention, DLP provides a
policy-based approach to secure data. It enables customers to classify their sensitive data,
discover data across the enterprise, enforce controls, and generate reports to ensure compliance
with established policies.
A DLP solution provides the following capabilities, which were not available in the previous
technologies:
- Discover and protect sensitive data in the enterprise. This is done by leveraging common
policies across the infrastructure to discover and protect sensitive data in the data center,
and also in the network, and on endpoints.
- Mitigate risk. Risk is mitigated through identity-aware policy-based remediation and
enforcement.
- Reduce total cost of ownership. TCO is reduced with industry leading scalability,
automated protection of sensitive data, and the most comprehensive policy library
available.
- Simplify security operations. This is done by streamlining the security operations process
using incident handling and workflows, and by integrating DLP with the log
management tool[34].
5.4 Empirical study of DLP Products and its Implementation
The details below give insight into the basics of deployment, such as the architecture diagram,
project flow and process flow, and into how the DLP Suite can best be used, assuming that most
of the product defaults will meet the organization's needs. Such a default deployment is best
suited to situations that do not require optimal performance, or to testing product features
that are not yet in use, with limited amounts of data and a limited number of machines. In
addition, Data Loss Prevention technology helps in preventing data breaches.
Planning Deployment: The DLP Suite can be deployed in many different hardware
configurations, ranging from shared components on a few machines to all components on
separate, geographically scattered machines. How best to set up DLP depends on
the organization's data security needs and on how the organization is structured.
What Information Needs Protection?
Generally standardized information is defined as sensitive information, such as payment
card industry (PCI) data or personally identifiable information (PII). PCI data, PII
and many other kinds of standardized information can be detected
using external tools provided by DLP.
Unofficially standardized information, such as employee records or software source code,
can be identified using expert tools provided by DLP.
For information that is unique to the organization, DLP makes it easy to create custom
blades and policies, or to make fingerprints of sensitive documents and files [37].
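As an illustration of how standardized information such as payment card numbers can be detected, the following minimal sketch combines a pattern match with the Luhn checksum. It is not the detection logic of the DLP product; the regular expression and the names CARD_CANDIDATE, luhn_valid and find_card_numbers are assumptions made for this example.

```python
import re
from typing import List

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counted from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return digit strings that look like card numbers and pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step reduces false positives: an arbitrary 16-digit order number matches the pattern but usually fails the Luhn check.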
Which Data-Loss Scenarios to cover?
The DLP Suite can provide protection in any or all of these scenarios:
Data in motion: To prevent the transmission of sensitive data to external sites,
network traffic is monitored by DLP Network.
Data at rest: Stored sensitive data, which may reside on the organization's servers,
databases, or employees' computers, is protected by DLP Datacenter.
Data in use: To prevent unauthorized actions with sensitive data, DLP Endpoint
continuously monitors end users' actions[37].
5.4.1 DLP Network
To detect sensitive content in emails, web pages, or other transmissions leaving the network, the
solution is DLP Network.
DLP Network includes the following components [36]:
Network Controller: It maintains information about confidential data, content
transmission policies, and communicates with Enterprise Manager, the web-based
administrative interface for all DLP products.
Managed devices. One or more of any of these devices:
Sensor: Deployed at network egress points. Passively monitors traffic, analyzing
it for sensitive content. Can only monitor; cannot block transmissions.
Interceptor: Deployed as an inline mail transfer agent to monitor, quarantine, or
block sensitive email (SMTP) traffic. Can also be used with an email encryption
gateway to encrypt messages.
ICAP Server: Deployed in association with a proxy server to monitor or block
sensitive HTTP, HTTPS, FTP, or ActiveSync traffic. ICAP servers can also be
used to monitor internal e-mail.
5.4.2 DLP Datacenter
DLP Datacenter is mainly used to locate stored sensitive content on the organization's machines,
including employees' computer hard drives, databases, file shares, etc. It performs scans
on machines within the network in order to detect and act on cases where sensitive data
is stored.
DLP Datacenter includes the following components[36]:
Enterprise Coordinator: Manages the actions of the Site Coordinators by
sending them configuration information and policies; it also receives scan results
from them and passes that information to Enterprise Manager.
Site Coordinators: Manage the actions of groups of agents and grid workers, directing
them to start a scan, retrieving the scan results from them, and passing this information
on to the Enterprise Coordinator.
Scanning agents: Small programs that run on endpoint machines as a
service to identify sensitive data.
Grid workers: Scanning agents used for special
purposes, mainly for analyzing large data stores.
The Enterprise Coordinator can manage many geographically scattered Site Coordinators. One
Site Coordinator can manage one or more (typically local) agent-scan groups, each of which can
consist of up to hundreds of individual endpoint computers, each containing an agent.
Site Coordinators also manage grid-scan groups, in which a number of grid workers on dedicated
machines collaborate to scan a large file share. Site Coordinators also manage repository-scan
groups and database-scan groups, specialized grid-scan groups that analyze proprietary file
repositories (such as SharePoint) and databases, respectively.
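The reporting hierarchy described above can be modelled as a small data structure. This is only an illustrative sketch: the class names mirror the component names in the text, but the fields and the collect methods are assumptions and do not correspond to any product interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScanGroup:
    name: str
    kind: str  # "agent", "grid", "repository" or "database"
    findings: List[str] = field(default_factory=list)  # locations flagged as sensitive


@dataclass
class SiteCoordinator:
    site: str
    groups: List[ScanGroup] = field(default_factory=list)

    def collect(self) -> List[str]:
        """Gather the findings of every scan group managed by this site."""
        return [f"{self.site}/{g.name}: {p}" for g in self.groups for p in g.findings]


@dataclass
class EnterpriseCoordinator:
    sites: List[SiteCoordinator] = field(default_factory=list)

    def collect(self) -> Dict[str, List[str]]:
        """Roll up results per site, as they would be passed on to Enterprise Manager."""
        return {s.site: s.collect() for s in self.sites}
```

Each level only talks to the level directly below it, which is what allows one Enterprise Coordinator to manage many geographically scattered sites.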
5.4.3 DLP Endpoint
DLP Endpoint is deployed to monitor end-user actions such as copying sensitive
information or files to removable devices.
DLP Endpoint includes the following components[36]:
Enterprise and Site Coordinator: Their main purpose here is to manage the agents on
endpoint machines. Enterprise Manager communicates with the Enterprise
Coordinator, which in turn communicates with one or more Site Coordinators.
Enforcement agents: DLP Endpoint enforcement agents analyze files for sensitive
content. This is performed only in the context of a user action. An event is raised when
the combination of action and file content represents a risk.
5.4.4 Designing the Deployment
Deploying a DLP solution means setting up one to three DLP products. The process involves
installing software components on various machines, or connecting hardware appliances
containing pre-installed software to the network. In either case, the solution is then
configured using a graphical administration tool into which all three products are fully integrated.
The following diagram presents a high level architecture of a complete deployment of the DLP
Suite[36].
This architecture diagram has these characteristics:
It consists of a headquarters office, which contains employees and their related
infrastructure, plus several branch offices.
The branch offices connect with headquarters by means of a high-speed WAN. The
headquarters in addition includes a firewall and Internet connection.
For Internet access, the branch offices connect through the headquarters' Web servers.
For email, the branch offices connect to the headquarters' mail server.
Most of the corporate data repositories and all large databases are located at headquarters;
some branch offices have large file servers of their own.
DLP Datacenter
One Site Coordinator at headquarters, one at each branch office.
Agent-scan groups - one at headquarters and one at each branch office, for scanning
endpoints (users' computers).
Grid-scan groups - one for each large file server (NAS, SAN, etc.) at headquarters or the
branches.
Repository-scan groups - one for each data repository (such as SharePoint) at
headquarters.
Database-scan groups - one for each corporate database at headquarters that needs to be
scanned.
Grid-worker machines - configured for use by each grid, repository, or database-scan
group. They are installed as needed.
Scanning agents - installed as needed on each endpoint and grid-worker machine.
DLP Endpoint
One Site Coordinator at headquarters, one at each branch office (already created for DLP
Datacenter).
Endpoint groups - one at headquarters and one at each branch office, for monitoring user
actions on endpoints.
Enforcement agents - installed as needed on each endpoint.
DLP Network
Network Sensor appliance attached (by means of a network tap) at point of network
egress to the Internet.
Network Interceptor appliance attached to network, downstream of main email server (to
catch outgoing email), and upstream of an email gateway.
Network ICAP Server attached to network in conjunction with a proxy server (to catch
HTTP and FTP transmissions).
5.4.5 DLP Dashboard
Below is an excerpt of the dashboard of the deployed DLP suite. The dashboard is a graphical user
interface that provides a complete snapshot of the DLP deployment. It categorizes incidents by
Network, Endpoint and Datacenter. It also shows the number of events or incidents generated and
their status. The number of incidents triggered can be seen in the dashboard by type of policy and
by content blade. It is a user-friendly environment that offers a lot of information at a mouse click[36].
5.4.6 DLP Admin Console
The Admin console offers different options for performing administrative activities.
Status and Overview tab: shows the types of devices and their status (active or
inactive).
Users and Groups tab: provides the functionality to create, delete and modify roles and user
access to DLP.
Network, Endpoint and Datacenter tabs: display system status by device type
(Sensor, ICAP server, Grid Worker etc.).
Notification: automatic alerts are raised whenever a device or feature fails to perform its
job, e.g. an email alert is sent when any of the devices or services is affected or stops working.
Settings: provides the functionality to set various thresholds.
Support: opens a knowledge base for quick help.
This saves a lot of manual work, since in an organization with a very large deployment it
is a tedious and difficult job to keep track of all the devices and services.
5.4.7 Content Analysis and Policy Application
All three DLP products make use of content analysis (detection of sensitive content in documents
or messages) and application of policy (a specification of how to handle sensitive documents or
messages)[36].
Content Blades
Content blades are highly accurate pattern-matching detectors of sensitive content. DLP supports
two kinds of content blades:
Described-content blades are detailed descriptions of sensitive content, and may contain
terms, regular expressions, programmatic entities, and other factors to accurately detect
classes of sensitive content such as Social Security Numbers.
Approximately 150 pre-defined “expert” content blades are available for immediate use
in the DLP product; they can be customized, or other content blades can be created that are
unique to the organization.
Fingerprinted-content blades (or “fingerprints”) are mathematical descriptors of
individual sensitive documents or fragments of documents. They will “match” any copies
of those documents or fragments found anywhere in the organization.
Fingerprints of known sensitive documents are created, and then used to ensure that unauthorized
copies of the documents are not being used.
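The idea of a fingerprint that matches copies and fragments of a document can be sketched with hashed word shingles. This is an assumed, simplified technique chosen for illustration; the source does not describe how the product computes its fingerprints, and the names fingerprint and containment are hypothetical.

```python
import hashlib
from typing import Set


def fingerprint(text: str, k: int = 5) -> Set[str]:
    """Hash every run of k consecutive words (a 'shingle') of the text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        shingles = [" ".join(words)]
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in shingles}


def containment(protected: Set[str], candidate: Set[str]) -> float:
    """Fraction of the protected document's shingles that appear in the candidate."""
    if not protected:
        return 0.0
    return len(protected & candidate) / len(protected)
```

Because only hashes are stored, the protected text itself does not need to be distributed to the scanning components, and a copy embedded in a longer message still yields a high containment value.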
The DLP products use content blades to perform content analysis on intercepted messages,
stored files, and files being manipulated by users. Each document or message is assigned a score,
or risk factor, depending on how strongly it matches a content blade.
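A described-content blade and the resulting risk factor can be sketched as a set of weighted patterns. The patterns, the weights, the cap of 100 and the names SSN_BLADE and risk_factor are all assumptions for illustration; the scoring used by the product is not documented in this thesis.

```python
import re
from typing import Dict

# Hypothetical blade: regular expression -> weight added per match.
SSN_BLADE: Dict[str, int] = {
    r"\b\d{3}-\d{2}-\d{4}\b": 40,     # number shaped like a Social Security Number
    r"(?i)\bsocial security\b": 20,   # supporting term
    r"(?i)\bssn\b": 20,               # supporting abbreviation
}


def risk_factor(text: str, blade: Dict[str, int], cap: int = 100) -> int:
    """Sum weight * match count over all patterns, limited to the cap."""
    score = 0
    for pattern, weight in blade.items():
        score += weight * len(re.findall(pattern, text))
    return min(score, cap)
```

Combining a number pattern with supporting terms means that a document mentioning "SSN" next to a matching number scores higher than one containing a bare nine-digit string.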
Policies
Policies are sets of rules that specify when to create an event (a record that a sensitive document
or message has been detected) and how to act on, or remediate, that event. A policy can base its
decision on the results of content analysis (the risk factor, or severity, of the analyzed content)
and on non-content-based factors such as the identity of the message sender or the destination of
the user action. The figure below gives an excerpt of policies in DLP [36].
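A policy that combines the result of content analysis with non-content factors can be sketched as follows. The threshold, the group and domain names, and the actions "audit" and "block" are hypothetical values chosen for the example, not settings taken from the deployed product.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Transmission:
    sender_group: str         # e.g. the sender's department
    destination_domain: str   # where the message or file is going
    risk_factor: int          # score produced by content analysis


def apply_policy(
    t: Transmission,
    threshold: int = 50,
    approved_groups: FrozenSet[str] = frozenset({"payroll"}),
    approved_domains: FrozenSet[str] = frozenset({"client.example"}),
) -> Optional[str]:
    """Return None (no event), "audit" or "block" for a transmission."""
    if t.risk_factor < threshold:
        return None  # content is not sensitive enough to create an event
    if t.sender_group in approved_groups and t.destination_domain in approved_domains:
        return "audit"  # legitimate business flow: record it, let it pass
    return "block"
```

Unlike an ACL, the same sensitive content is treated differently depending on who sends it and where it goes, so legitimate business traffic need not be blocked.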
Approximately 150 pre-defined “expert” policy templates are available for immediate use in the
product; in addition, new policies can be created and existing policies customized
according to organizational requirements.
5.4.8 Incidents
When a sufficient number of events occur, DLP creates incidents that a security officer can
evaluate, taking appropriate steps to manually remediate the security issues they represent.
A dedicated workflow is followed to analyze the root cause and carry out the remediation
process. A dedicated team working on security incidents handles the workflow. A watch list is
maintained, and a vigilant eye is kept on all users; if security incidents are repeated,
appropriate action is taken, involving other departments such as legal and compliance[36].
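The step from events to incidents can be sketched as a simple count per user and policy. The source only says that an incident is created when a "sufficient number" of events occur, so the threshold of three and the grouping key used here are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def incidents_from_events(events: Iterable[Tuple[str, str]],
                          threshold: int = 3) -> List[Tuple[str, str]]:
    """events are (user, policy) pairs; return the pairs seen at least threshold times."""
    counts = Counter(events)
    return sorted(key for key, n in counts.items() if n >= threshold)
```

The same count could also feed the watch list mentioned above, since it shows which users trigger the same policy repeatedly.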
The figure below shows the high-level workflow of the Critical Incident Response Center team.
The following figure gives an understanding of the types of incidents/events, the date and time
they occurred, severity level, sender or owner details, protocol used, and the exact file name or
information, along with details of the type of policy violated.
From the GUI, incidents can also be differentiated by type (Network, Datacenter &
Endpoint). As required, they can be filtered by date range (day, week, month etc.)[36].
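Filtering incidents by type and date range, as done in the GUI, amounts to the following minimal sketch. The Incident record and the function filter_incidents are hypothetical; the field names are taken from the attributes listed in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Incident:
    kind: str        # "Network", "Datacenter" or "Endpoint"
    occurred: date
    severity: str
    policy: str      # type of policy violated


def filter_incidents(incidents: List[Incident],
                     kind: Optional[str] = None,
                     start: Optional[date] = None,
                     end: Optional[date] = None) -> List[Incident]:
    """Keep incidents of the given type inside the inclusive date range."""
    return [
        i for i in incidents
        if (kind is None or i.kind == kind)
        and (start is None or i.occurred >= start)
        and (end is None or i.occurred <= end)
    ]
```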
5.4.9 Reports
To further aid in understanding and remediating the security issues revealed by events and
incidents, Enterprise Manager includes a reporting facility that security officers and executives
can use to gauge areas of risk, risk trends, and levels of regulatory compliance.
Reports can be generated as required. DLP provides both automatic and manual
reporting features. If a report is set to automatic, it is sent automatically on the date, day and
time specified. DLP also provides a feature to create custom reports by business
function, type of violation, severity level, trend etc.[36].
The figure below shows an example of DLP reports.
Implications
This chapter describes the existence and usage of DLP technology, showing through the
implementation of the DLP suite components how sensitive or confidential data is protected
from data leakage in an organization, along with the suite's features and uses. The chapter
mainly presents an empirical study of the DLP suite components, each of which has a
specialty in preventing data loss. From this chapter, the reader can
understand the DLP suite components and their use in data loss prevention
from an organizational viewpoint.
CHAPTER VI
6. Analysis
This chapter deals with the research question along with data leakage problems in real
cases, and concludes with a data analysis of how DLP helps in solving data
leakage problems.
6.1 Research on Security gap analysis
In this part, the researcher addresses the research question stated in chapter I, along
with cases of data leakage problems in the case setting.
6.1.1 RQ: How the security gap is filled using this DLP technology compared to previous
techniques?
In this part, the researcher examines how the security gap is filled by the current DLP
technology in comparison with the techniques previously used in the organization. The researcher
concentrates mainly on the security issues that DLP solves, relative to the
previously used data leakage prevention technology, in line with the organization's expectations.
Based on the literature review and the empirical data study, the researcher analyzed how the
security gap existed in the organization. There were very few controls
in place to prevent internal threats such as data leakage. Although a firewall is placed as
an edge device to monitor incoming and outgoing traffic on the network, it was unable to
address availability, one element of the core triad of information security.
Traffic through firewalls is controlled by Access Control Lists (ACLs). The major drawback of
ACLs is that they either allow everything or deny everything that matches the control list. If a
legitimate user has to send some confidential data for business purposes and the ACL is set to
block it, the user cannot send the information, which causes many issues.
Finally, none of the existing technologies was able to monitor data at rest (storage) or data in
use (endpoints). Although antivirus provides a security control, it deals mainly
with malware on the endpoints (HDD, CD drive, etc.), and no technology
helped to find sensitive data in storage. To address all these security gaps, the organization
has implemented Data Loss Prevention technology in its environment, which classifies
data into network, storage and endpoints.
The following describes how the security gap was addressed by DLP technology.
Security Gaps addressed by DLP:
DLP has provided a comprehensive enterprise data management platform. The information security
department has benchmarked the organization's business workflow and related it to the protection
of existing IT assets. This process includes investigating and targeting key aspects of the network
infrastructure that are a source of data loss. DLP has identified the potential areas of data
leakage. Below are the security gaps that were addressed after implementing DLP [35]:
DLP helped in knowing where all the data exists, how it is accessed and who has access to
it. As the complexity of an IT infrastructure increases, so does the difficulty of maintaining
it.
DLP helped in assigning the roles of data managers and storage managers; it also
addressed the creation of a data ranking system.
Classification or categorization of sensitive data, critical data for the organization.
Identifying users with excessively liberal access controls, including higher management,
who request high privilege access levels without having proper knowledge in data
security.
The focus was on inbound email to protect against Internet threats, while outbound
email was overlooked as a major source of data loss. Accidental loss of confidential
and proprietary information through insider email is one of the largest areas of data loss,
and this gap was filled by DLP. Associated risks, such as auto-forwarding to personal
web-based accounts, have serious legal, financial and regulatory consequences.
Illegitimate use of Internet protocols and services such as IM, blogging, peer-to-peer file
sharing, unauthorized uploading (FTP) of data to Web sites and social networking sites.
The involvement of contractors and consultants requires the creation of user credentials.
Knowledge and accountability of these user accounts is necessary, as these are frequently
misused.
Portable (removable) storage media through which data can be lost, such as optical
media, flash drives, personal media devices and external hard drives, are regulated for
compliance.
All monitoring control is lost after mobile computing platforms (i.e. laptops, PDAs) are
physically removed from the corporate environment.
Enterprise storage has changed from direct-attached storage (DAS), basic networked file
shares and simple database storage to storage area networks (SANs), tiered and
hierarchical storage models, high-end storage arrays, clustered storage and virtual storage
systems using Fibre Channel and iSCSI. Remediation strategies for data leakage are
very difficult because of the wide range of hardware and software and their different
configurations; this is addressed by DLP.
DLP helped to put into practice, essential company-wide standards and procedures for all
employee data usage and information rights.
Based on the business risks associated with data loss or exposure, ranking and assessing
of corporate data is done.
Ensuring detection and classification uses effective identification with lexical
examination of data content.
Review of business critical data and frequently maintaining and updating inventory,
ensuring proper controls are in place and ensuring security protocols are up to date.
DLP helped in implementing a successful data security model that simplifies role based
access control (RBAC) and granular control of individual users.
Enforcing employee training on corporate email acceptable-use policies. Compliance and
policy management of outbound email, with an automated messaging protection policy for
the corporation.
Making certain that employees are aware of computer usage monitoring, as a deterrent to
attempts at policy circumvention.
Conducting frequent assessments of user-privilege levels to evaluate and confirm that the
appropriate settings are configured for all users.
Inserting access controls directly into sensitive information by using digital rights
management (DRM) technologies.
Using federated identity management to maintain data security when dealing with
business partners or vendors.
To track data locations and monitor data leakage threats with respect to time and user
requests, routine audits and data-flow assessments are performed, and reports are generated
and sent to higher management.
It is hard and challenging to manage new and existing data efficiently. DLP security
policies address data proliferation issues while maintaining data availability, business
efficiency, operational stability and data restoration. Data loss prevention helps with
compliance issues and is critical in protecting confidential company data and preserving
customer data privacy[35].
The points mentioned above have been addressed by DLP since its implementation. Although there
were technologies in place to prevent data loss or leakage, they could only prevent external
attacks; when placed as a control for insider data leakage, they proved very inconvenient and
affected the availability of information. For example, when firewalls were used to prevent data
leakage, a few ACLs were created that blocked data for every user all the time. A firewall cannot
check for legitimate transactions or for encrypted data being sent out for business
purposes. Another major gap filled by DLP is role-based access. After
implementation and successful datacenter scans, it was noticed that many of the shares and
folders with files containing sensitive information did not have role-based access; a few shares
were open to all users through the 'Everyone' group. DLP helped in finding the data with sensitive
information, and with an effective RBAC process all the shares were restricted.
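The check that revealed the over-exposed shares can be sketched as a join between scan results and share permissions. The input structures and the function name overexposed_shares are assumptions made for illustration; in the deployment described, the list of sensitive shares came from the datacenter scans.

```python
from typing import Dict, List, Set


def overexposed_shares(share_acl: Dict[str, Set[str]],
                       sensitive_shares: Set[str]) -> List[str]:
    """Return shares that hold sensitive data and are open to the 'Everyone' group."""
    return sorted(
        share for share in sensitive_shares
        if "Everyone" in share_acl.get(share, set())
    )
```

Neither input is sufficient on its own: permissions alone do not show which shares matter, and content scans alone do not show who can reach the data.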
DLP helped in categorizing important data, i.e. sensitive data, company confidential data
etc. With DLP Endpoint in place, a strict control now monitors what data
is being copied onto removable media and whether it is encrypted or not.
DLP Network provided control to check all data transmitted on the network over
protocols such as HTTP, SMTP, FTP, HTTPS and POP3. DLP changed the risk approach and helped in
proper auditing of all the data in the organization, especially unstructured data.
6.1.2 Data Leakage Problems in Case settings
Introduction:
A single data loss incident results in continuing cost and huge damage. It affects customers
and involves internal investigation, repairing any breach of systems, and dealing with
litigation, external audits and increased regulatory oversight. After facing several breaches,
the organization decided to check all data being sent out of the organization so that it could
locate and stop the data leaking out of its networks.
Because of this problem, the organization chose RSA DLP to provide a data loss prevention
solution. Data Loss Prevention (DLP) refers to systems designed to detect and prevent the
unauthorized transmission of information from the computer systems of an organization to
outsiders. DLP added an additional layer of security for the organization, since one of the biggest
assets of an organization is its data. With the DLP solution in place, the organization can now see
how the data is being used and by whom, and can control who sees the information and to whom
they can send it. The DLP solution will even stop information from being sent to unencrypted sites.
For a better understanding, the researcher presents some live cases of data
leakage problems that occurred in the organization.
Case 1:
Some time back, a user in the organization unintentionally sent data outside the
organization as part of a routine process. The client was a big
customer with payroll processing for more than 10,000 employees. In the day-to-day operation, the
payroll processing executive sent a file containing sensitive information on 400+ of the
client's users outside the organization without encryption. The data landed in the wrong hands,
which caused serious reputation damage and huge losses to the organization. The
sensitive data was misused, resulting in the exploitation of money belonging to many of the
client's users. This was noticed when the client raised complaints. The investigation
found that the executive had sent the data unencrypted and that it had landed in the wrong hands.
After seriously analyzing the situation, the organization implemented DLP in its environment;
it has proved to be a good technology for preventing data leakage and loss, since it
controls data on the network, data in storage and data at endpoints[36].
Case 2:
A user once unintentionally replied to an email that contained a large amount of sensitive data,
and the reply landed in a mailbox in another domain. This caused a serious issue: the
organization lost a valuable customer, as the business partner stopped doing business with it
and opted for another payroll processing organization. The employee did not realize that when an
email is sent from the organization to an outside company, everything between the two mail
servers is insecure. This led the organization to offer weekly training on the information
security program for IT staff and monthly organization-wide training[36].
Case 3:
The organization, a major global finance company, had an issue with sensitive customer data
piling up on network shares, file shares and other less controlled environments, and it had
become almost impossible to stop the proliferation of the sensitive information. DLP provided a
comprehensive data leakage solution that addressed all the data leak avenues and performed data
classification by addressing the data elements that must be protected under regulations as well
as those that pose business risk.
DLP is a robust solution for preventing data breaches and complying with privacy regulations;
it offers the performance, scalability and highly precise content detection capabilities to
satisfy existing requirements, as well as the flexibility to customize the solution. The
implementation resulted in significant benefits for the organization, including meeting
compliance objectives, improved audit performance, cost savings and a scalable, flexible,
pluggable solution[36].
6.2 Analysis of DLP existence in solving Data Leakage Problem
It all started with a security breach in the organization some time ago. The management then
commissioned a risk assessment from one of the Big 4 firms. After a thorough assessment, the
vendor reported that the organization holds a large amount of confidential, sensitive payroll
data in its environment and that this requires good data security controls to keep a strict
watch.
The vendor also recommended implementing Data Loss Prevention in the environment, as this data
was a major source of intentional or unintentional data leakage. After a detailed study of
different DLP products, it was decided to implement RSA DLP to overcome the internal data
breach threats.
With large investments and grueling effort, DLP was successfully implemented, and as scoped this
technology is a good control for keeping watch on data in use, data in storage and data on the
network.
After the infrastructure was implemented, the major and painful task was to define and
configure policies, since all data sniffing and data checks are based on the policies defined
in the system. The task was painful because there are several business regions and departments,
each with its own requirements, and meeting all of them is cumbersome.
Finally, after much effort and discussion with all the business regions to bring them to a
common platform, we defined and configured policies such as personally identifiable
information, payment card industry and workplace violence (to name a few).
Once everything was in place, we started monitoring the behavior of the technology and the
incident pattern across all three types (network, endpoint and storage).
Data in Storage: We performed DLP security scans on the Windows storage, NetApp storage and EMC
NAS storage. The scan examines the data, identifies sensitive information based on the policies
and creates an incident; this helped us reach out to the share or folder owner and restrict
access. We used the Varonis DatAdvantage tool to find the security permissions and the level of
access each group or user possesses. In this way we ensured that confidential or sensitive
information that had been lying unprotected in various shares and folders is now secured by
appropriate security permissions, with access granted only to those users who need it. This
helped in implementing RBAC in the environment.
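The policy-driven storage scan can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions and not the RSA DLP implementation: the policy names, the regular expressions, the `Incident` record and the `scan_share` function are all hypothetical, and a real product uses far richer content detection than two patterns.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Hypothetical policies: policy name -> pattern for sensitive content.
# These regexes are illustrative only.
POLICIES = {
    "PII-SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PCI-CardNumber": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


@dataclass
class Incident:
    path: str     # file in which sensitive content was found
    policy: str   # name of the policy that matched
    matches: int  # number of matching strings in the file


def scan_share(root):
    """Walk a share or folder and create one incident per (file, policy) hit."""
    incidents = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Read as text and skip undecodable bytes; binary formats are out of scope here.
        text = path.read_text(errors="ignore")
        for name, pattern in POLICIES.items():
            hits = pattern.findall(text)
            if hits:
                incidents.append(Incident(str(path), name, len(hits)))
    return incidents
```

Each incident names the file and the matching policy, which is the information needed to contact the share or folder owner and restrict access.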
Data in use: Effective DLP control has helped prevent data loss/leakage through external devices
such as USB drives and CD/DVDs. A small piece of software known as the DLP agent is deployed on
each machine and keeps track of all activities on the system. If a user tries to copy data from
the system to an external device unencrypted, a pop-up window appears with instructions to
encrypt the data before copying it, and an incident is generated based on the pattern matching.
Sophos encryption software is used to perform the 128-bit encryption.
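The endpoint behavior can be sketched as a small decision function. This is a hedged illustration in Python only: `CopyRequest`, `evaluate_copy`, the action names and the single pattern are assumptions made for the example and do not reflect the actual DLP agent's interface.

```python
from dataclasses import dataclass
import re

# Hypothetical pattern standing in for the agent's content policies.
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class CopyRequest:
    user: str
    content: str            # data the user is trying to copy
    target_removable: bool  # True for USB, CD/DVD and similar devices
    encrypted: bool         # True if the copy will be written encrypted


def evaluate_copy(req):
    """Return (action, incident) for a copy attempt seen by the agent."""
    if not req.target_removable or req.encrypted:
        return "allow", None
    # Unencrypted copy to an external device: the user is prompted to encrypt.
    incident = None
    if SENSITIVE.search(req.content):
        # The content matched a policy pattern, so an incident is also raised.
        incident = {"user": req.user, "reason": "sensitive data, unencrypted copy"}
    return "prompt_encrypt", incident
```

In this sketch the prompt depends only on the device and encryption state, while the incident depends on pattern matching, mirroring the two separate outcomes in the description.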
Data on Network: For data on the network over protocols such as SMTP, HTTP and FTP, if data is
transmitted unencrypted, an incident is triggered with the source, destination and all other
details[41].
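A minimal Python sketch of the network check follows. The set of cleartext protocols, the `Transmission` record and the incident fields are assumptions for illustration; content inspection itself is assumed to have happened upstream and is represented by a boolean flag.

```python
from dataclasses import dataclass
from typing import Optional

# Protocols treated as unencrypted in this sketch (an assumption, not a product list).
CLEARTEXT_PROTOCOLS = {"SMTP", "HTTP", "FTP", "POP3"}


@dataclass
class Transmission:
    protocol: str
    source: str
    destination: str
    contains_sensitive: bool  # result of content inspection, assumed done upstream


def check_transmission(tx: Transmission) -> Optional[dict]:
    """Return an incident record for sensitive data sent unencrypted, else None."""
    if tx.contains_sensitive and tx.protocol.upper() in CLEARTEXT_PROTOCOLS:
        return {
            "protocol": tx.protocol.upper(),
            "source": tx.source,
            "destination": tx.destination,
        }
    return None
```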
Ultimately, every business looks for a competitive advantage. For a data-driven business, DLP
increases the protection of sensitive data and delivers a platform for aggressive growth. The
cost savings associated with data protection free operating assets for investment in the
organization's growth opportunities. It helps the organization avoid legal issues and customer
loss for a comparatively small outlay, and thus attract clients, increase sales and advance the
company. DLP focuses on the normal run of business. It avoids both unnecessarily locking down
data, which would prevent employees from performing their job activities, and allowing a free
flow of data without any control. Delivering business-centric security is the main motive of
DLP[36].
CHAPTER VII
7.1 Conclusion
Data discovery and classification is a prerequisite to a successful deployment of a Data Loss
Prevention solution. Understanding the data flows and classifying information enables
organizations to protect sensitive information while avoiding relatively benign information like
family photos or grocery shopping lists.
DLP has given the organization a quick, practical framework for:
Discovering sensitive information
Protecting this information
Evaluating and refining DLP policies and rules once knowledge is obtained about the
nature of the organization's internal and external information flows
After implementation, the results were clearly seen in the organization in how the security
gaps were filled across all three modules for data loss prevention:
Data in motion (data on the network): Traffic leaving the organization's network is checked
packet by packet and analyzed against the policies defined for PCI, PII, confidential data etc.
This technology is capable of screening data on protocols such as HTTP, HTTPS, FTP, Telnet,
IMAP, SMTP and POP3. Data uploaded to different sites was found to be thoroughly checked for
data loss.
Data in storage: We noticed that sensitive data lying on storage in different areas on various
file servers was not encrypted. After deploying DLP, we scheduled scans to find this data; this
enabled us to get the sensitive data encrypted and, by working with IT teams, to restrict
access to authorized users only.
Data in use (on the endpoints): DLP is able to prevent data from being copied unencrypted to an
external device, thus closing this avenue of data loss.
Traditionally, security is part preventative and part reactive. Data Loss Prevention aims to be
preventative, but prevention is not always possible. With proper planning and some understanding
of the business processes and data structure, it was possible to cast a net that catches
violations without knowing in advance the identity of a piece of information or the location of
a piece of data.
Data Leakage Prevention has prevented data leakage in the examined implementation. The results
of the examination clearly show the advantages of using the solution. DLP has helped in
surveying the data spread over the network and protecting it from leakage. It has provided data
protection and a means of reconnaissance once implemented with a global data classification
concept.
The findings highlighted in this thesis also show the ability of DLP to help prevent maliciously
motivated leakage. The DLP solution provided awareness and content discovery features; it
raised responsiveness to confidential data and confirmed the greatly increased complexity of
the network. The discussion and further study of DLP in cloud computing, along with risk
evaluation, must be continued in more detail.
The implementation of DLP in organizational environments has certainly given a lot of clarity on
the technology and how it works. It is also necessary to study the DLP solution in more depth in
terms of software vulnerabilities (weaknesses). Even though the security of the software is not
tied to the DLP approach itself, it is necessary to ensure that the software does not have any
vulnerabilities. Otherwise, leakage cannot reliably be avoided.
Since this thesis could cover only some DLP solutions, their features and functions, the other
important work is to examine more DLP solutions. As already mentioned, DLP software should be
evaluated in terms of security. This evaluation process must continue for other DLP solutions
to obtain an overall picture of the security of these products.
7.2 Future Research on DLP
In this part, the researchers propose future research, referring to some papers on how DLP
technology can be used in cloud computing to prevent data loss/leakage, along with the benefits
of DLP in cloud computing.
DLP in Cloud Computing
Many organizations are moving data to the cloud, but this leads to security and compliance
concerns. Although moving to a cloud environment is flexible and cost effective, security
controls for the cloud are rare. Having DLP in cloud computing may increase the confidence of
organizations to move business-critical applications, but this in turn raises questions: how
does cloud DLP work, how can it actually enhance security and compliance, and how does it
address the unique requirements of cloud computing?
Data has shifted from a central storage form to a distributed model, i.e. from
mainframe/midrange to client-server, which forced security organizations to change. The risks
of data on workstations and personal devices led to an increase in data loss prevention tools
that can monitor mobile and distributed systems. Security management has to discover and track
how data is being stored and its new paths of transmission. Similarly, the shift from physical
machines to virtual machines forces another move; the virtual environment introduces many
issues, e.g. security and automation of cloud environments.
One point to note is whether cloud providers have the ability to identify sensitive data. Do
they have privacy trade-offs and controls such as encryption if they are looking for sensitive
data on a virtual machine to report on? Another point is the granularity of role-based access
and reporting. Can the performance impact of finding sensitive data be managed easily?
Data loss prevention (DLP) should find and block the loss of sensitive data. Along with
discovery of sensitive data, a cloud DLP should have the capability to prevent loss of data.
On reviewing a few papers, we found that one way to use DLP for cloud computing is to monitor
and even block data migrations to and from the cloud infrastructure. Cloud computing services
rely on HTTP as their main communications protocol. Therefore, if HTTP and HTTPS are monitored,
many potential data migrations to the cloud can be detected[38].
There are a few ways in which we can watch the network (SMTP) traffic, along with discovery
scans:
1. By an endpoint agent embedded in the cloud instance
2. By routing traffic through a dedicated DLP server or appliance on its egress to the cloud
3. By operating a cloud instance of a DLP server and routing traffic through it
Advantages of Cloud DLP
In a cloud environment, a virtual machine can be used to run a security engine that manages all
the other virtual machines on a designated set of virtual servers, based on the virtual machine
manager (VMM) technology that hosts the virtual machines. The virtual machines can then run
client software with a DLP engine that scans, recognizes and blocks communication of sensitive
information. The VMM can bring these together in a single virtual machine, enabling the DLP
engine to monitor and manage all the virtual machines that run a client and also to see data at
rest. This brings sensitive data within the scope of compliance requirements such as PCI DSS
and PII. Because DLP runs as a service, it can be enabled or disabled for virtual machines
running in the cloud data center.
A cloud environment is dynamic, and so is a DLP service, as it can be extensible and automated.
A DLP solution can be designed to use APIs to automate controls, for example a rule that
automatically moves a virtual machine with sensitive data behind a firewall or places it in
lock-down.
The flexibility and control in cloud computing make control of virtual machines more viable
than in a physical setup. A rule can require that a VM found with credit card data has its
network connectivity isolated at the application level (restricting certain protocols) to block
data leaks, and that an alert (email) is sent to administrators. By assessing a full virtual
data center, cloud DLP can find systems with sensitive data and move them from a cluster of
insecure systems to one assigned to business-critical applications with sensitive data[39].
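A rule of this kind can be sketched in Python as follows. This is an illustrative sketch under assumptions, not a cloud provider's or DLP vendor's API: `VirtualMachine`, `apply_card_data_rule`, the protocol set and the alert list are hypothetical, and the alert is only queued as a string rather than emailed. The Luhn checksum is a standard way to filter card-number candidates and is used here as a stand-in for the product's content detection.

```python
from dataclasses import dataclass, field
import re

# Candidate card numbers: runs of 13 to 19 digits.
CARD_CANDIDATE = re.compile(r"\b\d{13,19}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to filter out arbitrary digit strings."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class VirtualMachine:
    name: str
    sample_data: str  # content sampled from the VM by a discovery scan (assumed)
    blocked_protocols: set = field(default_factory=set)


def apply_card_data_rule(vm, alerts):
    """Restrict protocols on a VM holding card data and queue an alert."""
    found = any(luhn_valid(m) for m in CARD_CANDIDATE.findall(vm.sample_data))
    if found:
        # Application-level isolation: restrict selected protocols only.
        vm.blocked_protocols.update({"FTP", "SMTP", "HTTP"})
        alerts.append(f"Card data found on {vm.name}; outbound protocols restricted")
    return found
```

In a real deployment the protocol restriction and the alert would be issued through the cloud platform's management API and a mail gateway; both are left abstract here.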
Cloud DLP limitations
If the cloud platform is public, it may support only a single network interface per instance,
which creates a need for a virtual DLP version that can monitor and forward or block traffic
under that restriction. There is considerable value in using DLP to monitor data migrating to
the cloud and for content discovery on cloud-based storage, but deploying DLP inside a public
cloud may not be worthwhile. It makes more sense in a private cloud, depending on what it is
used for.
Any cloud deployment secured alongside DLP is probably an application infrastructure, which
relies more on application security and encryption.
DLP is an excellent tool for enhancing data security in the cloud. It can be used to track data
migrating to the cloud, discover sensitive information stored in the cloud and protect services
running in the cloud, provided it is tuned accordingly.
7.3 Reference List
[1] Richard E. Mackey, Available:
http://viewer.media.bitpipe.com/1240246133_118/1258558418_168/sCompliance_sSecur
ity_Data-Protection_final.pdf
[2] Bradley R. Hunter, Available: http://www.ironport.com/pdf/ironport_dlp_booklet.pdf
[3] Webspy, Available:
http://www.webspy.com/resources/whitepapers/2008%20WebSpy%20Ltd%20-
%20Information%20Security%20and%20Data%20Loss%20Prevention.pdf
[4] Data loss problems, Available: http://www.webspy.com/resources/whitepapers/2009
WebSpy Ltd-Information Security and Data Loss Prevention.pdf
[5] 2006 Report, The Office of the U.S. Trade Representative, Available:
http://www.ustr.gov/about-us/press-office/reports-and-publications/archives
[6] Deloitte's Global Security Survey report, Available:
http://www.deloitte.com/assets/Dcom-
Global/Local%20Assets/Documents/TMT/dttl_TMT%202011%20Global%20Security%2
0Survey_High%20res_191111.pdf
[7] Prathaben Kanagasingham, 2008, Available:
www.sans.org/reading_room/whitepapers/dlp/data-loss-prevention
[8] DLP Key Features, Available: http://ecs.arrow.com/suppliers/documents/RSA-
SolutionBrief-enVisionSolutions.pdf
[9] Data Leakage Prevention by Hannes Kasparick in May 2008
[10] Suphat, Fundamentals of Quantitative Research
[11] 'The security gap in risk analysis' by Serge van der Schaft, 28 February 2005.
[12] Creswell, 2007 literature review, Available:
www.wordsinspace.net/course.../MatternLiteratureReviewTips.pdf
[13] Borg & Gall, 1989. Available: www.staff.vu.edu.au/syed/research/Methodologydraft2.pdf
[14] Kothari, 2004. Available: www.tbher.org/index.php/tbher/article/download/35/35
[15] Rowley, 2002. Available: www.emeraldinsight.com/journals.htm?articleid=866789
[16] Yin 2003, Available: www.nova.edu/ssss/QR/QR13-4/baxter.pdf
[17] Numen 2000, Available: www.ingentaconnect.com/content/brill/num
[18] Bell 2010, books.google.co.in/books/.../Doing_Your_Research_Project.html?id..
[19] Miles and Huberman, 1994. Available: www.engin.umich.edu/.../maxwell-conceptual-
framework.pdf
[20] Information Leakage, http://uwcisa.uwaterloo.ca/Biblio2/Topic/KarenKarYanLeung.pdf
[21] Wang 2008, Available: www.glue.umd.edu/~sliang/papers/WangWH.LST.RSE2008.pdf
[22] Baker et al. 2011 and Loeb 2002. Available: https://ub-madoc.bib.uni-
mannheim.de/.../InES_Working_Paper_No...
[23] Information leakage by Zhenghong Wang, Nov 2012
[24] Information Leakage Prevention Accuracy and Privacy Tests, May 2006. Available:
www.docstoc.com/.../Information-Leak-Prevention-Accuracy-and-Security
[25] Data Loss Prevention Technologies by Tomoyoshi Takebayashi, Hiroshi Tsuda, Takayuki
Hasebe, and Ryusuke Masuoka (manuscript received April 14, 2009)
[26] ADP Wikipedia. Available: http://en.wikipedia.org/wiki/Automatic_Data_Processing
[27] Bloomberg News, Available: http://www.businessweek.com/news/2011-08-08/s-p-says-u-
s-downgrade-doesn-t-affect-aaa-rated-j-j-microsoft.html
[28] ADP, Available: http://www.adp.com/about-us.aspx
[29] ADP Corporate Overview, Available:
http://www.adp.com/~/media/ADPCorporateOverview_050712.ashx
[30] Understanding Anti-Malware, Available: http://www.cse-
cst.gc.ca/documents/services/csg-cspc/csg-cspc07l-eng.pdf
[31] Firewall Computing, Available: http://en.wikipedia.org/wiki/Firewall_%28computing%29
[32] Mell, Bergeron & Henning, Creating a Patch & Vulnerability management Program,
Available: http://csrc.nist.gov/publications/nistpubs/800-40-Ver2/SP800-40v2.pdf
[33] Robert Drum 2006, IDS AND IPS PLACEMENT FOR NETWORK PROTECTION,
Available: http://www.infosecwriters.com/text_resources/pdf/IDS_Placement_RDrum.pdf
[34] Wikipedia, Security Information and Event Management (SIEM), Available:
http://en.wikipedia.org/wiki/Security_information_and_event_management
[35] Identity And Access Management, Available:
http://www.karingroup.com/eng/about/what_is_identity.pdf
[36] DLP proactively protecting sensitive data, Available:
http://www.rsa.com/products/DLP/ds/9103_DLPDC_DS_0511.pdf
[37] RSA DLP Solution Brief, Available:
http://www.rsa.com/products/DLP/sb/9104_DLPST_SB_0311.pdf
[38] “Internal report from case”
[39] David Meizlik, The ROI of Data Loss Prevention, Websense
[40] DLP in Cloud, Available: http://searchcloudsecurity.techtarget.com/tip/Cloud-DLP-
Understanding-how-DLP-works-in-virtual-cloud-
environments?asrc=EM_USC_17666941
[41] Mark Rose, Available: http://www.brighttag.com/2012/03/13/data-loss-prevention-
through-the-cloud/
[42] DLP Product documentation, Available:
https://knowledge.rsasecurity.com/scolcms/set.aspx?id=9304
[43] Chris Porter (Cisco), Email Security with Cisco Iron Port.
[44] 'Data Allocation Strategies in Data Leakage Detection', by Unnati Kavali, Tejal Adhang,
Mr. Vaibhav Narawade, International Journal of Engineering Research and Applications
(IJERA). Available: www.ijera.com
[45] 'Modeling and Detection of Data Leakage fraud', by Nageswarrao, Vungarala, Manoj
Kiran. Somidi, Krishnaiah.R.V. IOSR Journal of Computer Engineering (IOSRJCE)
ISSN: 2278-0661 Volume 4, Issue 6 (Sept-Oct.2012) Available: www.iosrjournals.org
[46] Qualitative research Methods: A Data Collector's Field Guide, 2005 by Family Health
International. Available:
http://www.fhi360.org/nr/rdonlyres/emgox4xpcoyrysqspsgy5ww6mq7v4e44etd6toiejyxal
hbmk5sdnef7fqlr3q6hlwa2ttj5524xbn/datacollectorguideenrh.pdf
[47] Lubich, H.P; “The changing role of IT security in an Internet world, a business
perspective”, May 22-25, 2000; Available:
http://www.terena.nl/conference/archieve/tnc2000/proceedings/2A/2a2.html
8. Appendices
8.1 Interview Questionnaire for data analysis
Below are the questions put to a security officer regarding DLP performance in security
operations, with the aim of solving the data loss/leakage problem and the security gap problem,
and regarding the organization's employees' awareness of DLP technology.
1. We already have existing security technologies such as anti-virus, firewall and IDS/IPS.
After implementing DLP in the environment, has it helped in preventing data leakage? If yes, to
what extent?
2. Data leakage has cost the organization huge penalties; is DLP a good security control for
data leakage?
3. Is there any change in security incidents after implementing DLP in the organization? If so,
what is the difference?
4. Which traffic protocols (SMTP, HTTP, FTP etc.) are generating a high number of security
violations?
5. Does any policy or content blade require fine tuning? If so, which content blade or policy,
and why?
6. DLP is a good security control when it comes to Data Leakage Prevention, but there are still
a few areas which DLP doesn't address. Do you still think the organization is secure when it
comes to data leakage?
7. Data Loss Prevention has addressed many security gaps in terms of internal threat; according
to you, what are the critical gaps that are addressed, and how?
8. Are the employees aware of Data Loss Prevention and the best practices for using network,
storage and endpoint devices?
9. What is the remediation process followed in addressing security incidents? If it is an
intentional data leakage, which teams or departments are involved?
10. DLP closely deals with internal threats; does the organization have proper security
controls for external threats?
11. What experience do you have with Data Loss Prevention?
12. What are the major security issues that are addressed by DLP technology compared to
previous technologies?
13. How did the organization manage to prevent internal threats before implementing DLP
technology?
14. What made the organization deploy DLP to fill the security gap? How exactly does DLP act in
filling a particular gap? What is your opinion on this case?
15. How was the performance of DLP tested in minimizing data loss from internal threats? And
what procedures are followed in testing the DLP technology for this?
16. Was any special training given to the organization's employees when DLP was deployed?
If so, what kind of training? And how did the training help the employees better understand
DLP?
17. What are the methods for testing DLP products? And how do the methods meet the
organization's expectations?
18. Are there any cases that are not solved by DLP? If so, what is the case? And what was the
reason that DLP was not able to solve the security issue?
19. What are the major differences noticed while performing security operations using DLP in
comparison with previous technologies? And how do these differences contribute to the
organization's expectations?
20. How does a DLP policy help in identifying sensitive content in data in storage? And
how does this policy differ from other existing technologies?
21. How was the awareness training program on the importance of DLP technology delivered to the
organization's employees?
8.2 Abbreviations
DLP – Data Loss Prevention
ADP – Automatic Data Processing
HTTP – Hypertext Transfer Protocol
FTP – File Transfer Protocol
CIO – Chief Information Officer
CSO – Chief Security Officer
CISO – Chief Information Security Officer
IP – Intellectual Property
PCI – Payment Card Industry
PII – Personally Identifiable Information
SMTP – Simple Mail Transfer Protocol
ICAP – Internet Content Adaptation Protocol
WAN – Wide Area Network
NAS – Network Attached Storage
SAN – Storage Area Network
GUI – Graphical User Interface
ACL – Access Control List
XSS – Cross Site Scripting
SQL – Structured Query Language
IDS – Intrusion Detection System
IPS – Intrusion Prevention System
SIEM – Security Information and Event Management
TCO – Total Cost of Ownership
PDA – Personal Digital Assistant
DAS – Direct-Attached Storage
iSCSI – Internet Small Computer System Interface
RBAC – Role-Based Access Control
DRM – Digital Rights Management
ROI – Return on Investment
PHI – Personal Health Information
S&O – Strategy and Operations
EPS – Earnings Per Share
FTE – Full Time Employee
VMM – Virtual Machine Manager
DSS – Data Security Standard (as in PCI DSS)