

International Journal of Science, Engineering and Technology Research (IJSETR) Volume 3, Issue 6, June 2014

ISSN: 2278 – 7798 All Rights Reserved © 2014 IJSETR

Security Measures for Data Warehouse

Arvind Jaiswal

Abstract— A data warehouse is an archive that stores enterprise data gathered from multiple sources, making it an environment suitable for optimal information processing. For any enterprise, a data warehouse is crucial because the comprehensive information it stores can be used to analyze long term data and trends over a period of time. Since the data is accessible to different audiences through various channels, it requires a high degree of control and security. This paper highlights effective security measures that can be taken in data warehouse design, presents an example scenario dealing with a secured solution for a health care system, and identifies key areas for providing security in a data warehouse.

Most data warehouse developers treat security as a trivial issue while designing the database. This could lead to devastating results in an enterprise where crucial data is managed and accessed from the data warehouse. Consider a corporate IT firm designing a data warehouse consisting of data from various business units and branches. The firm has made it easy for all its employees to access data and diverse information that could be important in making decisions. However, if the necessary measures are not taken to build security into the system, there is every possibility that important project details or personal details such as salaries, bank statements and performance evaluation results could be easily accessed by hackers. Thus it is extremely important to have internal control and security mechanisms that protect the confidentiality and integrity of data without compromising the availability of information.

Index Terms— Data warehouse, security, integrity, enterprise, confidentiality, long term data.

I. INTRODUCTION

Security requirements for data warehouses arose from the need to assure the confidentiality, integrity and availability of data in a data warehouse environment. The DW (Data Warehouse) essentially holds the business intelligence that enables the enterprise to make strategic decisions. It not only provides an integrated view of the enterprise but also presents the organization's information consistently. Secure DW applications must prevent unauthorized users from accessing or modifying data, while keeping the data available to the right set of users at the right time. The system must also keep a record of the activities performed by its users.
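The two requirements named above, blocking unauthorized access and keeping a record of user activity, can be illustrated with a minimal Python sketch. The class `AuditedWarehouse`, its fields and the sample users and tables are hypothetical names introduced only for illustration; the paper does not prescribe an implementation.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class AuditedWarehouse:
    """Toy in-memory warehouse that records every read attempt."""
    authorized: dict   # user -> set of table names the user may read
    tables: dict       # table name -> list of rows
    audit_log: list = field(default_factory=list)

    def read(self, user, table):
        allowed = table in self.authorized.get(user, set())
        # Record the attempt whether or not it succeeds.
        self.audit_log.append({
            "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "user": user,
            "table": table,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} may not read {table}")
        return list(self.tables[table])


dw = AuditedWarehouse(
    authorized={"analyst": {"sales"}},
    tables={"sales": [("2014-06", 120)], "payroll": [("alice", 5000)]},
)
rows = dw.read("analyst", "sales")      # permitted and logged
try:
    dw.read("analyst", "payroll")       # denied, but still logged
except PermissionError:
    pass
```

Logging denied attempts as well as successful ones is a design choice made here so that the record can later be reviewed for suspicious activity.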

II. IMPLEMENTING SECURITY AT VARIOUS STAGES OF DATA WAREHOUSE CYCLE

The designer of the system must focus not just on designing a technically elegant DW with security measures applied wherever necessary, but must also ensure that the designed DW does not get mired in cost or time overruns. The following three step check helps DW designers ensure that the data warehouse is not vulnerable. The designer must consider the security requirements all the way from requirements gathering to post-deployment maintenance. The effective mechanisms to ensure security are therefore classified under three important sections, namely requirements gathering, design and deployment.

Arvind Jaiswal, Asst. Prof., Faculty of Computer Application, Acropolis Institute of Technology, Indore (M.P.)

Fig 1: Security measures at various stages of DW cycle

Step 1 - Categorize the data based on sensitivity and identify areas of data susceptibility

The data managed in the data warehouse must be categorized based on the intended audience and sensitivity to disclosure. Suitable security measures must be taken while providing access rights to stakeholders, as the data could be susceptible to modification or destruction. The classification based on sensitivity to disclosure generally uses the following three levels.

1. Least sensitive

The data in this category is not classified and is available to all end users of the data warehouse irrespective of their levels. This data usually covers common enterprise practices, declarations, governing laws, etc. Stringent security criteria are therefore not required for it.

2. Moderately sensitive

The data in this category is moderately sensitive and thus public access to it is not provided. The required personnel access this data on a need basis, to carry out functions which could not be accomplished without it. This may include investment details, financial statements, personnel information etc. Privacy laws govern such information and must be considered while providing access.

3. Highly sensitive

This data is highly sensitive and is accessible only to high level data warehouse users. Information in this category could be critical, such as trade secrets, recruitment strategies, quotation details etc. Special access privileges must be provided for users accessing the data, and stringent security must be enforced through legitimate privileges.

Data warehouse security is largely dependent on factors associated with the data warehouse environment. Constraints imposed by the environment could lead to critical failures which have to be taken care of.

The data warehouse must be capable of handling concurrent access to data of different sensitivity levels. Enterprises using a single data warehouse server for both confidential and top secret information must ensure that no leak of information takes place.

If the enterprise is using operating system access control along with built-in mechanisms, there is a high probability that the problem is exacerbated.

Availability or accessibility must not be provided at the cost of compromising the integrity or security of data.

Care must be taken about natural factors, utility factors and human threats which might intrude into the critical data warehouse.
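The three sensitivity levels of Step 1 can be sketched as an ordered classification with a simple access check. The dataset names, user names and the `can_access` helper below are hypothetical, and treating the levels as a strict ordering is a simplifying assumption: the paper describes access to moderately sensitive data as need-based, which a real system would model per user and per dataset.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    LEAST = 1      # available to all end users
    MODERATE = 2   # need-based access, no public access
    HIGH = 3       # high level users with special privileges only


# Hypothetical catalogue mapping datasets to the levels from Step 1.
DATA_CLASSIFICATION = {
    "enterprise_practices": Sensitivity.LEAST,
    "financial_statements": Sensitivity.MODERATE,
    "trade_secrets": Sensitivity.HIGH,
}

# Hypothetical clearances granted to users.
USER_CLEARANCE = {
    "any_employee": Sensitivity.LEAST,
    "finance_officer": Sensitivity.MODERATE,
    "senior_executive": Sensitivity.HIGH,
}


def can_access(user, dataset):
    """True when the user's clearance is at least the dataset's level."""
    # Unknown users get no clearance; unclassified datasets are
    # treated as highly sensitive until someone classifies them.
    clearance = USER_CLEARANCE.get(user, 0)
    required = DATA_CLASSIFICATION.get(dataset, Sensitivity.HIGH)
    return clearance >= required
```

Defaulting unknown datasets to the highest level means that a forgotten classification fails closed rather than open.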

Step 2 - Formulate effective measures to impart security during design phase itself

Vulnerabilities due to the environment, as discussed in the previous section, can be addressed with cost effective mechanisms that ensure the integrity of the data warehouse. The following are some general effective measures that the designer might consider during the design of the DW.

Creation of disaster recovery systems that will be enabled in the event of any failure.

Inclusion of control mechanisms that prevent access to update, delete or merge historical data.

Encryption mechanisms which ensure that data is accessed only in an authorized way, guarding against data fabrication of any kind.

Usage of basic DBMS mechanisms to partition sensitive data into separate tables.
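Two of the design measures above, preventing changes to historical data and partitioning sensitive data into separate tables, can be sketched with Python's standard `sqlite3` module. SQLite is used only because it needs no setup; the table names, trigger names and the `try_statement` helper are illustrative assumptions, and a production DW would more likely rely on its own DBMS privileges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sensitive columns live in their own table, apart from general staff data.
    CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staff_salary (staff_id INTEGER PRIMARY KEY, salary REAL);

    -- Historical facts: rows may be inserted but never changed or removed.
    CREATE TABLE sales_history (sale_id INTEGER PRIMARY KEY, amount REAL);

    CREATE TRIGGER sales_history_no_update
    BEFORE UPDATE ON sales_history
    BEGIN
        SELECT RAISE(ABORT, 'historical data is read-only');
    END;

    CREATE TRIGGER sales_history_no_delete
    BEFORE DELETE ON sales_history
    BEGIN
        SELECT RAISE(ABORT, 'historical data is read-only');
    END;
""")

conn.execute("INSERT INTO sales_history (amount) VALUES (99.5)")


def try_statement(sql):
    """Run a statement and report whether the database rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.DatabaseError as exc:
        return f"rejected: {exc}"


update_result = try_statement("UPDATE sales_history SET amount = 0")
delete_result = try_statement("DELETE FROM sales_history")
remaining = conn.execute("SELECT amount FROM sales_history").fetchall()
```

With the salary figures held in `staff_salary`, read access to that table can be granted separately from access to `staff`.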

Step 3 - Have a surveillant eye on the data in a data warehouse

A vigilant mechanism to detect security flaws is also required after the data warehouse has been designed; maintaining the system without critical failures is a persistent responsibility. This section aims at verifying the integrity of data in the data warehouse. This is a crucial stage where security needs surface, and it is also the most challenging phase. Generally, data monitoring techniques are employed to obtain comprehensive information about the tables in the data warehouse, row and column details, the users using the data, the frequency of usage, etc. The following security areas must be given importance.

The enterprise must employ continuous monitoring of the data structures and set up rule validations that detect changes made to the data by third party sources.

Key personnel involved in the monitoring or maintenance of the data warehouse must be informed of violations, or when a threshold is exceeded, through alerts and notifications so that they can take immediate action.

Regular analysis must be carried out using tools, which could even be overnight job runs that do not intrude on the daily work.

If the business users are accessing data through a web browser, then charts, graphs and score cards can be used to reveal the true nature and quality of data.
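The monitoring measures above mention alerts when a threshold is exceeded. A minimal sketch of such a check, suitable for an overnight job, might count accesses per user and flag the outliers. The event format, the threshold value and the function names are assumptions made for illustration.

```python
from collections import Counter


def find_violations(access_events, threshold):
    """Return users whose number of accesses exceeds the threshold.

    access_events is an iterable of (user, table) pairs, for example
    parsed from a hypothetical access log during an overnight job.
    """
    counts = Counter(user for user, _table in access_events)
    return {user: n for user, n in counts.items() if n > threshold}


def notify(violations):
    """Build alert messages; a real system might e-mail or page staff."""
    return [
        f"ALERT: {user} made {n} accesses"
        for user, n in sorted(violations.items())
    ]


events = [("alice", "patients")] * 3 + [("bob", "billing")] * 12
alerts = notify(find_violations(events, threshold=10))
```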

III. SECURITY CONCERNS AND ADOPTIVE MEASURES IN A DATA WAREHOUSE ENVIRONMENT

The following sections describe the grey areas in a database environment where security measures can be employed. They can be broadly classified into four areas as explained below.

Data Warehouse System

This section deals with security techniques for a data warehouse system that help to identify vulnerabilities and allow users to access the required data without worrying about security.

An important step in managing the security of a DW is to identify a security manager who must develop and document a security plan. The security manager should be involved in the architecture design and should verify the actual setup and use of the DW. Every system change needs to be examined from a security perspective. Many organizations require a mandatory sign off by the security manager as a part of the deployment process.

Hardware and Operating System

The most direct way to access the valuable information in the DW / BI system is to gain physical access to the computers on which the system is running. All test and production servers must be kept in a data center with restricted access.


The second most direct way to access the DW / BI system is by way of the operating system. The following security measures can be followed.

Restrict login access. Only system administrators need to log in to the server running the DW / BI components. Others can access services across the network.

Restrict network access.

Ensure data directories are secure, including database files.

Keep up to date with security patches for the operating system.
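As one possible reading of "ensure data directories are secure", the sketch below walks a data directory and reports entries that grant any permission to all users. It assumes a POSIX-style permission model; the function names are hypothetical, and what counts as secure depends on the organization's own policy.

```python
import os
import stat


def world_accessible(path):
    """True if 'other' users have any read, write or execute permission."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRWXO)


def insecure_entries(data_dir):
    """List directories and files under data_dir that are open to all users."""
    found = []
    for root, _dirs, files in os.walk(data_dir):
        # Check each visited directory itself, then the files inside it.
        for name in [root] + [os.path.join(root, f) for f in files]:
            if world_accessible(name):
                found.append(name)
    return sorted(found)
```

A check like this could run on a schedule against the directories holding the database files, with any reported path treated as a finding to review.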

Development Environment

The development environment and servers should be managed professionally, though usually not to the same standards as the test and production systems. The data on development servers is often sensitive as it is drawn from the source systems. Providing strict access control to the development servers is always necessary.

Internet Network

Most intranets inside organizations use the same TCP/IP protocol as the internet. Most networks have a complex set of devices and functions that help manage the flow of packets around the organization. Unauthorized access to internal information assets through the Internet must be restricted. The various devices and functions operating in the network of the organization include routers, firewalls and the directory servers. Care must be taken to secure data as it flows through all intervals.

CASE STUDY: Secured data warehousing design for a health care system

Having seen the basic effective measures a designer must consider during DW design, the following scenario aims at illustrating their importance. This simple scenario looks at the design of a DW for a health care system with various wings. These departments interact and exist cohesively, and thus the designed DW must be able to operate without inconsistencies in the system. The essence of a data warehouse is accessibility of data, which requires the data to be subject oriented, integrated, nonvolatile, consistent and accurate.

In a health care system, the scenario is fairly different from other institutions in the sense that there are no fixed repetitive transactions and each encounter is unique. Data warehouses have traditionally lent themselves to transactions and numbers. The institutions that have been successful with data warehousing, like banks, retailers, manufacturers, etc., deal with files, transactions and numbers. Healthcare on the contrary has textual descriptions of the different medical encounters, or simple verbiage. The department wide information includes medical records of patients, salary information of the doctors and nursing staff, financial data, infrastructure details involving the equipment etc. For each patient admitted to the hospital, the following is recorded: who was admitted (Patient details), what was the primary Diagnosis, which bed the patient was given (Placement) and which Insurance will cover the expenses. Health care professionals like doctors, nurses and therapists need to access patient, diagnosis and placement details. The administration wing on the other hand is interested in overall figures like the number of patients admitted. The details of billing, insurance cover etc. must be available to the accounts wing. Thus, to formulate the necessary security measures, the system must consider “who needs which data?” and “who should be allowed to see what?”

Let us consider that our health care system is managing an enterprise data warehouse that will be used by many of its sub-divisions and subsidiaries. The health care system has to adhere to security guidelines which ensure that the employees of each health care sub-division are only able to view the data that is relevant to their own division, while also allowing employees of the overall health care system to view the overall picture.
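The question "who should be allowed to see what?" can be sketched as a mapping from roles to the subject areas named in the scenario. The role names, the record layout and the `visible_fields` helper are hypothetical choices for illustration, not a scheme given in the paper.

```python
# Subject areas come from the scenario; the role names are illustrative.
ROLE_PERMISSIONS = {
    "clinician": {"patient", "diagnosis", "placement"},
    "accounts": {"billing", "insurance"},
    "administration": {"admission_totals"},
}


def visible_fields(role, record):
    """Return only the parts of a record that the role is allowed to see."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    return {area: value for area, value in record.items() if area in allowed}


admission = {
    "patient": "P-001",
    "diagnosis": "fracture",
    "placement": "Ward 3, bed 12",
    "insurance": "Plan A",
    "billing": 1500.0,
}

clinician_view = visible_fields("clinician", admission)
accounts_view = visible_fields("accounts", admission)
```

An unknown role maps to an empty permission set, so it sees nothing rather than everything.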

Fig 2: Data model of health care system

Therefore, categorizing data according to user requirements is vital, and we can create a data mart for every department. Data marts are smaller, specific purpose oriented data warehouses. The difference between a data warehouse and a data mart is that a data mart is a store house of data pertaining to a smaller unit or division, while a data warehouse consists of enterprise wide data. Data marts are used at a business division/department level. Thus, for the given scenario, every department of the health care system can be modeled as a data mart, thereby giving faster access to department wide data with little or no training. The benefits of using data marts are that they are single entities and thus help focus on specific tasks. They use dimensional data modeling, which optimizes data for reports, and hence provide users with faster access to common data. Users with little knowledge or no training at all can browse a data mart and obtain information as needed. They are also inexpensive and not complex to design.

However, using only data marts in the design of the health care system has certain security disadvantages, along with other limitations on data size, functionality and consolidation. The lack of coordination between sub-divisions could make room for security threats since they are physically separate. This could also bring in environmental threats because of the heterogeneous nature of the system. Combined analysis of these sub-divisions could be prone to errors. Thus a heterogeneous set of data marts alone would not be suitable to manage the health care system, and there is a need for a central control that provides better consistency and management by communicating with the department data marts. This brings us to develop a system as depicted in the diagram. Here, source transaction systems generate, capture and store the data. The data warehouse is a central repository that accumulates historical data. This is connected to the various data marts of the health care system pertaining to different departments. The data is then loaded to the OLAP server, and the business intelligence systems are used to transform the data, which could be used to generate reports, carry out analysis etc.

Fig 3: Health care data warehouse design

This central warehouse brings the following advantages to the health care system design.

Consistent security model: If we have dozens of data marts using different database servers, it may be difficult to implement consistent security policies identically across all of those data marts. With a single data warehouse, we can implement the security policies in one place and they are applied consistently across all of the data. Therefore, instead of using dozens of heterogeneous data marts, a consolidated data warehouse is much simpler to secure and less expensive to manage.

Central management: With a central data warehouse, an organization can devote more resources towards providing sound security. It is more expensive for an enterprise to implement the same level of security across multiple data marts than on a centrally managed data warehouse.

Fewer points of susceptibility: When data is spread among dozens of data marts, a malicious employee can choose the weakest system to attack. With a consolidated system, there is only a single system that needs to be secured.

Easy maintenance: Many security breaches are caused by the simple fact that patches have not been applied to all systems before someone tried to hack one of them. A single data warehouse allows rapid installation of security patches and is much simpler to administer.
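The idea of one policy defined in one place can be sketched with a single central table queried through two access paths: a division-restricted view of the rows and a system-wide aggregate. The example uses Python's standard `sqlite3` module; the table layout, division names and function names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admissions (division TEXT, patient_id TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?, ?)",
    [
        ("cardiology", "P-001", 1200.0),
        ("cardiology", "P-002", 800.0),
        ("orthopaedics", "P-003", 500.0),
    ],
)


def rows_for_division(division):
    """Division staff see only their own rows from the central table."""
    cur = conn.execute(
        "SELECT patient_id, cost FROM admissions "
        "WHERE division = ? ORDER BY patient_id",
        (division,),
    )
    return cur.fetchall()


def overall_picture():
    """System-wide users see aggregates across all divisions."""
    cur = conn.execute(
        "SELECT division, COUNT(*), SUM(cost) FROM admissions "
        "GROUP BY division ORDER BY division"
    )
    return cur.fetchall()
```

Because both access paths read the same table, a change to the filtering rule is made once rather than repeated in each departmental store.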

The following additional security measures can be employed:

Access control mechanisms can be employed to restrict data access based on the category of users.

System use policy statements can be developed which users might sign before gaining access.

A detailed security policy can be published, including the list of sensitive elements.

Encryption mechanisms can be used wherever necessary.

Considering the above explained advantages of a central model, it is advisable to design a health care system DW as a central data warehouse which is consistent, consolidated and easier to maintain. This is effective in preventing security violations and hazardous events.

IV. CONCLUSION

From a security perspective, it is always prudent to align the data warehouse design so that security measures are implemented right from the planning phase through to deployment. Choosing the right design for the enterprise plays a crucial role in foreseeing threats to information that is crucial for analyzing trends. The task of defining and implementing security spans the lifecycle, as highlighted in Section II. It is important to get management's view, but also to talk to analysts and other potential users of the system about the kind of information they need to carry out their tasks effectively. Before laying out strict security policies, it is important to note that a DW / BI (Data Warehouse/Business Intelligence) system is valuable only if people can access it. The more relevant information is available, the more valuable the system is. Careful management of data requires protecting the confidential data and publishing the rest, ensuring that only authorized users access the DW, and providing a way to limit the view of data to appropriate users.


Mr. Arvind Jaiswal is working as Assistant Professor in the Faculty of Computer Application department of Acropolis Institute of Technology and Research, Indore (M.P.). He has 11 years of teaching experience. He completed his M.Tech (CSE) at Amity University, NOIDA (UP). He has published 11 research papers in reputed international journals and in international and national conferences. His area of expertise is Data Mining and Data Warehousing.