research article
Post on 28-Oct-2014
Advanced Databases
Presentation on
“A Suggested Model based on the open standards in Data Warehouse”
Presented by: Shamama Tul Umber Parwaiz (0772122)
Contents
• Abstract
• Introduction
• Literature Review
• Research Methodology
• Conclusion
• Future Work
• Acknowledgement
• References
Abstract
Current data warehouse technology is not based on open standards. Several proprietary standards exist, but unified, agreed-upon standards for data warehousing are still lacking, which gives rise to issues such as security, interoperability, and integration. In this research we present a model that describes the core layers of a data warehouse that should be based on open standards, and we discuss some of those layers in detail.
Introduction
In recent years data warehousing has become a very useful technology for integrating operational data sources in a way that gives decision-making capabilities to the top-level management of an organization. When designing and developing a data warehouse, IT experts have to choose an appropriate approach, but unfortunately the available approaches are not based on open standards.
Data warehouse technology faces many challenges related to design, security, performance, data cleaning, storage, integration, extraction, transformation, loading, data refreshing, schemas, roll-up, drill-down, and interoperability. Having presented a framework that gives the meta-requirements for data warehouse design, we now move towards a design model for data warehouses; this model will help in designing good data warehouses.
This research focuses primarily on introducing a data warehouse design model that is based on open standards and meets the meta-requirements presented in our previous work. Developing data warehouses based on open standards will lead to harmony and compatibility among industry products.
A data warehouse is not just a collection of data from several operational data sources; rather, we can consider the data warehouse as a defined process containing three major steps:
• Extracting data from the distributed operational sources; most of the time it is extracted from legacy systems.
• Transforming and aggregating the data consistently into the warehouse.
• Accessing the data in an efficient and flexible manner.
The main contribution of the data warehouse is its power to convert data into information that can be used in strategic decision making within organizations [4].
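The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names, the sample rows, and the functions `extract`, `transform`, and `query` are hypothetical and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical operational sources; in practice these would be legacy systems.
SOURCES = {
    "billing": [{"city": "Lahore", "amount": "120.50"},
                {"city": "Karachi", "amount": "80"}],
    "pos": [{"city": "lahore ", "amount": "30"}],
}

def extract(sources):
    """Step 1: pull raw rows from every operational source."""
    for name, rows in sources.items():
        for row in rows:
            yield {"source": name, **row}

def transform(rows):
    """Step 2: normalise values consistently and aggregate them by city."""
    totals = defaultdict(float)
    for row in rows:
        city = row["city"].strip().title()   # same spelling for every source
        totals[city] += float(row["amount"])  # same numeric type for every source
    return dict(totals)

def query(warehouse, city):
    """Step 3: access the aggregated data."""
    return warehouse.get(city, 0.0)

warehouse = transform(extract(SOURCES))
```

Calling `query(warehouse, "Lahore")` then returns the total aggregated across both hypothetical sources, regardless of how each source spelled the city.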
Literature Review
Lack of standards
• Industry and researchers have not yet agreed on a unified standard. Moreover, no standards for modeling data warehouse security exist as yet, and data warehouse design is not mining-aware: it generally fulfills the OLAP requirements but does not address the data mining requirements [1].
Integration problem
• Integration issues related to data warehouses have also gained vital importance. Some organizations are settling for data marts, which are departmental subsets focused on selected subjects; e.g., a marketing data mart may include customer, product, and sales information. These data marts enable faster roll-out, since they do not require enterprise-wide processing, but they lead to complex integration problems in the long run [2].
Security issues in data warehouse
• Security issues in data warehousing have also gained vital importance. Data from different systems with different security policies is integrated, and the users of the operational systems are not the same as the users of the data warehouse. Access control schemes for operational database objects (e.g., tables) cannot be mapped easily to data warehouse items such as dimensions and hierarchies; therefore the need for a proper OLAP security design arises [3].
Methodology
The ETL Filter
• We propose the ETL Filter in our design model. Its function is to admit into the data warehouse only data with agreed-upon data types. To do this, the ETL Filter has a repository of standards that is populated with those data types.
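One possible reading of the ETL Filter is sketched below. The presentation does not specify how the repository of standards is stored or which types it holds, so the column names, the type mapping, and the functions `conforms` and `etl_filter` are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical repository of standards: column name -> agreed data type.
STANDARDS_REPOSITORY = {
    "product_id": int,
    "city": str,
    "sale_date": date,
    "amount": float,
}

def conforms(record, repository=STANDARDS_REPOSITORY):
    """Return True when the record has exactly the agreed fields and types."""
    if set(record) != set(repository):
        return False
    # type(...) is compared directly so that, e.g., bool is not accepted as int.
    return all(type(value) is repository[name] for name, value in record.items())

def etl_filter(records, repository=STANDARDS_REPOSITORY):
    """Split records into those admitted to the warehouse and those rejected."""
    accepted, rejected = [], []
    for record in records:
        (accepted if conforms(record, repository) else rejected).append(record)
    return accepted, rejected
```

A record whose `amount` arrives as the string `"9.5"` would land in the rejected list, while the same record with a float `amount` would be admitted.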
[Figure: The ETL Filter]
Platform Independent APIs
• Our research leads us to conclude that there is a strong need to develop APIs for accessing data from a warehouse. These APIs should be platform independent so that any programming language can be used to connect to the data warehouse. They also prevent changes to applications when the underlying data warehouse changes, and vice versa.
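The decoupling argument can be illustrated with a small sketch. It shows only one half of the idea, namely that an application written against an abstract interface does not change when the backend is swapped; true language independence would additionally need a wire-level contract, which is not shown. The names `WarehouseAPI`, `InMemoryWarehouse`, and `sales_report` are hypothetical.

```python
from abc import ABC, abstractmethod

class WarehouseAPI(ABC):
    """Hypothetical contract that every warehouse backend would implement."""

    @abstractmethod
    def total(self, measure, by):
        """Return {dimension value: summed measure}."""

class InMemoryWarehouse(WarehouseAPI):
    """Stand-in backend; a real one would talk to an actual warehouse product."""

    def __init__(self, rows):
        self._rows = rows

    def total(self, measure, by):
        result = {}
        for row in self._rows:
            result[row[by]] = result.get(row[by], 0) + row[measure]
        return result

def sales_report(api):
    """Application code: depends only on WarehouseAPI, not on a backend."""
    return api.total("sales", by="city")

rows = [
    {"city": "Lahore", "sales": 10},
    {"city": "Karachi", "sales": 7},
    {"city": "Lahore", "sales": 5},
]
```

Replacing `InMemoryWarehouse` with another class that implements `total` would leave `sales_report` untouched, which is the property the slide asks for.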
[Figure: Platform Independent APIs]
Dimensional Security Management
• Our approach to security is relatively simple: we introduce a layer known as the “Dimensional Security Management Layer”, which manages dimension-level security. A data warehouse may contain several dimensions, e.g., sales, cities, profit, products, etc. Users are allowed to query only the dimensions for which they have been permitted by the dimensional security management layer.
[Figure: Dimensional Security Management]
• In the initial stages, data warehouses were used and queried by executive management and business analysts only. Nowadays the range of users with data warehouse access is increasing; the supposition that only a limited number of users will access the data warehouse is no longer appropriate, and the need for proper security and access control mechanisms is becoming more and more important. Data warehouses have become open systems; OLAP analysis in particular requires this open nature [3].
[Figures from [6]]
• Table 1 is proposed to store the security information in a very simple way. The table contains two parts: the first contains the header information and the second contains the security information.
• The header information section contains “Attribute and Value” pairs, with attributes such as the OLAP server name, version, etc.
• The security information section of the table lists the users in the rows and the dimensions in the columns. The intersections of rows and columns, i.e., the ‘cells’, contain the access rights of the specific user over the particular dimension.
• If a user wants information regarding the sale of a particular product in different cities over a specified time period, he must have access rights for the three dimensions, i.e., product, city, and time. If he lacks the access rights to any of them, he will not be able to view the specified report.
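The two-part table described above could be represented as follows. This is a sketch under assumptions: the class `SecurityTable`, the helper `missing_dimensions`, the user names, and the header values are hypothetical, and the code `"R"` is taken to mean a read right.

```python
from dataclasses import dataclass, field

@dataclass
class SecurityTable:
    """Two-part table: header attribute/value pairs plus a user x dimension grid."""
    header: dict
    rights: dict = field(default_factory=dict)  # user -> {dimension: access rights}

    def cell(self, user, dimension):
        """Return the access rights stored at the (user, dimension) cell."""
        return self.rights.get(user, {}).get(dimension, "")

def missing_dimensions(table, user, dimensions):
    """List the requested dimensions on which the user has no read right."""
    return [d for d in dimensions if "R" not in table.cell(user, d)]

security_table = SecurityTable(
    header={"OLAP Server name": "demo-olap", "version": "1.0"},  # made-up values
    rights={
        "alice": {"product": "R", "city": "R", "time": "R"},
        "bob": {"product": "R", "city": "R"},
    },
)
report = ["product", "city", "time"]  # dimensions needed by the sales report
```

With this data, `missing_dimensions(security_table, "bob", report)` names the one dimension that blocks the report for that user, while the same call for `"alice"` returns an empty list.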
• The same table can be used to define roles, which simplify access control management; a role is defined once and can be assigned to multiple users, and at the same time one user may possess multiple roles.
• We now describe an algorithm that determines the access of a particular user over a dimension set. The algorithm takes the user and the dimension list as input and returns, as true or false, whether the user has the access rights to perform an operation on the given dimension set.
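The role mechanism mentioned above might look like the sketch below. The presentation does not say how rights from several roles combine, so taking their union is an assumption, as are the role names, user names, and the function `effective_rights`.

```python
# Hypothetical role definitions: role -> {dimension: access rights}.
ROLE_RIGHTS = {
    "sales_analyst": {"product": "R", "city": "R"},
    "planner": {"time": "R"},
}

# Hypothetical assignments: a role is defined once and reused across users,
# and one user may hold several roles.
USER_ROLES = {
    "alice": ["sales_analyst", "planner"],
    "bob": ["sales_analyst"],
}

def effective_rights(user):
    """Merge the rights of all roles held by the user (union, by assumption)."""
    merged = {}
    for role in USER_ROLES.get(user, []):
        for dimension, rights in ROLE_RIGHTS[role].items():
            combined = set(merged.get(dimension, "")) | set(rights)
            merged[dimension] = "".join(sorted(combined))
    return merged
```

The merged dictionary has the same shape as one user row of the security table, so a role-based setup could feed the same access check.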
Flow of Algorithm
• Start. Input: user ‘u’ and dimensions (d1, d2, …, dn).
• i = index of the user in the table.
• j = index of the next dimension ‘d’ in the table.
• j <= n? If no (every dimension has been checked), return ‘Success’ and end.
• If yes, test table[i][j] <> R. If yes, return ‘Failure’ and end; if no, continue with the next dimension.
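The flow above can be written as a short function. This is a sketch, not the authors' implementation: the sample users, dimensions, and table contents are hypothetical, `"R"` is assumed to denote the right being checked, and returning False for an unknown user or dimension is an added assumption that the flowchart does not cover.

```python
# Hypothetical security table: rows are users, columns are dimensions.
USERS = ["alice", "bob"]
DIMENSIONS = ["product", "city", "time"]
TABLE = [
    ["R", "R", "R"],  # alice
    ["R", "R", ""],   # bob has no right on "time"
]

def has_access(user, dimensions):
    """Return True ('Success') only if the user holds R on every dimension."""
    if user not in USERS:            # assumption: unknown users are denied
        return False
    i = USERS.index(user)            # i = index of user in table
    for d in dimensions:             # loop while dimensions remain
        if d not in DIMENSIONS:      # assumption: unknown dimensions are denied
            return False
        j = DIMENSIONS.index(d)      # j = index of next dimension in table
        if TABLE[i][j] != "R":       # table[i][j] <> R
            return False             # Return 'Failure'
    return True                      # Return 'Success'
```

The loop stops at the first dimension that lacks the right, which matches the early ‘Failure’ exit in the flow.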
Meta-data Definition Language (MDL)
• Our study shows that metadata is stored in different ways in different data warehouses. Here we introduce the concept of MDL, i.e., Meta-data Definition Language. If every data warehouse follows this language, the integration problems can be resolved. For existing data warehouse solutions we introduce an MDL translator layer that can work with the existing system without requiring many changes.
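Because the presentation does not define the syntax of MDL, the following is only a toy sketch of what a translator layer could do: map two made-up vendor metadata layouts onto one common form. Every name here (`VENDOR_A`, `VENDOR_B`, `TYPE_MAP`, `translate_a`, `translate_b`) and the common form itself are hypothetical.

```python
# Two made-up vendor-specific ways of describing the same table.
VENDOR_A = {"tbl": "sales", "cols": [("amount", "NUMBER"), ("city", "VARCHAR2")]}
VENDOR_B = {"name": "sales", "fields": {"amount": "decimal", "city": "text"}}

# Hypothetical mapping from vendor type names to common type names.
TYPE_MAP = {
    "NUMBER": "numeric", "decimal": "numeric",
    "VARCHAR2": "string", "text": "string",
}

def translate_a(meta):
    """Translate vendor A metadata into the assumed common form."""
    return {
        "table": meta["tbl"],
        "columns": {name: TYPE_MAP[kind] for name, kind in meta["cols"]},
    }

def translate_b(meta):
    """Translate vendor B metadata into the assumed common form."""
    return {
        "table": meta["name"],
        "columns": {name: TYPE_MAP[kind] for name, kind in meta["fields"].items()},
    }
```

Once both descriptions are in the common form they compare equal, which is the kind of integration benefit the slide attributes to a shared metadata language.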
Conclusion
We have described the substantial technical challenges in developing and deploying data warehouses. While many commercial products and services exist, standards are lacking; there are still several interesting areas for research in developing open standards for data warehouses.
From the literature review we conclude that some efforts have already been made regarding platform-independent design and the respective implementations, but far less work has been done on defining open standards for data warehouses than on defining open standards for web services.
Future Work
We have presented a big picture for designing data warehouses based on open standards that do not exist today; however, each of these standards still needs to be explored and materialized.
Acknowledgment
The authors of this paper give special thanks to their most respected instructor and supervisor for this work, Mr. Aslam Parvez, for his contributions and guidance throughout the work.
References
[1] Stefano Rizzi, “Research in Data Warehouse Modeling and Design: Dead or Alive?”, DOLAP’06, November 10, 2006, Arlington, Virginia, USA.
[2] Surajit Chaudhuri and Umeshwar Dayal, “An Overview of Data Warehousing and OLAP Technology”, ACM Digital Library, research.microsoft.com.
[3] Torsten Priebe, “Towards OLAP Security Design – Survey and Research Issues”, DOLAP 2000, McLean, VA, USA, ACM ISBN 1-58113-323-5.
[4] Fabio Rilston, Jaelson Freire, “DWARF: An Approach for Requirements Definition and Management of Data Warehouse Systems”, 11th IEEE International Requirements Engineering Conference, 1090-705X/03, 2003.
[5] Emilio Soler, Juan Trujillo, “A framework for developing secure data warehouses based on MDA and QVT”, 2nd International Conference on Availability, Reliability and Security (ARES 07), 0-7695-2775-2/07, 2007 IEEE.
[6] Roger Fang, Sama Tuladhar, “Teaching Data Warehousing & Data Mining in a Graduate Program of Information Technology”, Mid-South Conference, JCSC 21, 5 (May 2006).
Thanks …