Data Warehouse Final Report

Upload: li-bred

Posted on 05-Apr-2018


  • 7/31/2019 Data Warehouse Final Report


    Data Warehouse: Unlocking the Mystery


    Data warehousing is the design and implementation of processes, tools, and facilities to manage and deliver complete, timely, accurate, and understandable information for decision making.

    It includes all the activities that make it possible for an organization to create, manage, and maintain a data warehouse or data mart.


    The data warehouse is that portion of an overall Architected Data Environment that serves as the single integrated source of data for processing information.


    Brief History

    The concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse".

    In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments.

    Its origins, however, can be traced back further, to the early 1980s, when relational database management systems emerged as commercial products.

    The foundation of the relational model with its simplicity, together with the query capabilities provided by the SQL language, supported the growing interest in what was then called end-user computing or decision support.

    One of the prime concerns underlying the creation of these decision support systems was the performance impact of end-user computing on the operational data processing systems. This concern prompted the requirement to separate end-user computing systems from transactional processing systems.


    The role and purpose of data warehouses in the data processing industry have evolved considerably since those early days and are still evolving rapidly.

    Comparisons between today's data warehouses and the decision support databases of those early days should therefore be made with great care.


    Data warehouses should no longer be identified with database systems that support end-user queries and reporting functions.

    They should no longer be conceived as snapshots of operational data.

    Data warehouse databases should be considered as new sources of information, conceived for use by the whole organization or for smaller communities of users and data analysts within the organization.


    Characteristics of a Data Warehouse

    Subject-Oriented: Information is presented according to specific subjects or areas of interest, not simply as computer files. Data is manipulated to provide information about a particular subject.

    Integrated: A single source of information for understanding multiple areas of interest.

    Non-Volatile: Stable information that doesn't change each time an operational process is executed. Information is consistent regardless of when the warehouse is accessed.

    Time-Variant: Contains a history of the subject as well as current information. Historical information is an important component of a data warehouse.

    Accessible: The primary purpose of a data warehouse is to provide readily accessible information to end users.

    Process-Oriented: It is important to view data warehousing as a process for delivering information. The maintenance of a data warehouse is ongoing and iterative in nature.
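    The Non-Volatile and Time-Variant properties can be illustrated with a minimal sketch, assuming Python's built-in `sqlite3` module and a hypothetical `customer_history` table. New information is appended as a new dated row instead of overwriting the old one, so the warehouse can still answer "what was true on date X?". The table and function names are illustrative only, not part of any particular product.

```python
import sqlite3

# Hypothetical warehouse table: rows are only ever appended (non-volatile),
# and each row records the date it took effect (time-variant).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customer_history (
           customer_id INTEGER,
           region      TEXT,
           valid_from  TEXT  -- ISO-8601 date on which this version took effect
       )"""
)

def load_snapshot(customer_id, region, valid_from):
    """Append a new version of the customer instead of updating the old row."""
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        (customer_id, region, valid_from),
    )

def region_as_of(customer_id, as_of):
    """Return the region recorded for a customer on a given date, or None."""
    row = conn.execute(
        """SELECT region FROM customer_history
           WHERE customer_id = ? AND valid_from <= ?
           ORDER BY valid_from DESC LIMIT 1""",
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None

load_snapshot(1, "North", "2017-01-01")
load_snapshot(1, "South", "2018-03-15")  # customer moved; the 2017 row is kept

print(region_as_of(1, "2017-06-30"))  # North
print(region_as_of(1, "2018-06-30"))  # South
```

    Because rows are never updated in place, a report for mid-2017 returns the same answer whenever it is run, which is the consistency the Non-Volatile property describes.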


    Based on analogies with real-life warehouses, data warehouses were intended as large-scale collection, storage, and staging areas for corporate data. Data could be retrieved from one central point, or it could be distributed to "retail stores", or data marts, tailored for ready access by users.

    Data Mart: A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single analytic application used by a distinct set of workers.

    Data Warehouse Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built. There is no right or wrong architecture; rather, multiple architectures exist to support various environments and situations.

    The worthiness of an architecture can be judged by how well the conceptualization aids in the building, maintenance, and usage of the data warehouse.
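    As a rough illustration of a data mart, the sketch below (Python with `sqlite3`; the `wh_sales` and `mart_retail_monthly` tables are hypothetical) derives a small, pre-aggregated table for one analytic application, monthly retail revenue per product, from a detailed warehouse table. In practice a mart would typically be refreshed by a scheduled load process; the one-off statement here only shows the shape of the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical detailed warehouse table covering every department.
conn.execute(
    "CREATE TABLE wh_sales (sale_date TEXT, department TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO wh_sales VALUES (?, ?, ?, ?)",
    [
        ("2018-01-05", "Retail", "Widget", 10.0),
        ("2018-01-20", "Retail", "Widget", 15.0),
        ("2018-01-22", "Retail", "Gadget", 7.5),
        ("2018-02-02", "Wholesale", "Widget", 100.0),
    ],
)

# The mart: a small table shaped for one application (monthly retail revenue
# per product), so its users never need to scan the detailed wh_sales table.
conn.execute(
    """CREATE TABLE mart_retail_monthly AS
       SELECT substr(sale_date, 1, 7) AS month,
              product,
              SUM(amount)             AS revenue
       FROM wh_sales
       WHERE department = 'Retail'
       GROUP BY month, product"""
)
# Index the column this application filters on most.
conn.execute("CREATE INDEX idx_mart_month ON mart_retail_monthly(month)")

for row in conn.execute("SELECT * FROM mart_retail_monthly ORDER BY month, product"):
    print(row)
```

    The mart deliberately drops detail (individual sales, other departments) in exchange for fast, simple access by one group of users.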


    Data Warehouse Architecture

    Data architecture describes how data is processed, stored, and utilized in a given system. It provides criteria for data processing operations that make it possible to design data flows and also control the flow of data in the system.

    The data architecture breaks a subject down to the atomic level and then builds it back up to the desired form. The data architect breaks the subject down by going through three traditional architectural processes:

    Conceptual: represents all business entities.

    Logical: represents the logic of how entities are related.

    Physical: the realization of the data mechanisms for a specific type of functionality.
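    To make the three levels concrete, here is a minimal sketch assuming two hypothetical entities, Customer and Order. The conceptual level is simply naming the entities; the logical level (Python dataclasses) records that each order belongs to exactly one customer; the physical level realizes the same model as SQLite DDL with keys, a foreign-key constraint, and an index. The entity names and the choice of SQLite are assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level: the business entities are "Customer" and "Order".
# Logical level: each Order belongs to exactly one Customer (one-to-many).
@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int  # logical link to Customer
    total: float

# Physical level: the same model realized for one engine (SQLite here),
# with primary keys, a foreign-key constraint, and an index chosen for access.
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL
);
CREATE INDEX idx_orders_customer ON orders(customer_id);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(PHYSICAL_DDL)

c = Customer(1, "Acme")
o = Order(10, c.customer_id, 99.5)
conn.execute("INSERT INTO customer VALUES (?, ?)", (c.customer_id, c.name))
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (o.order_id, o.customer_id, o.total))

# The physical constraint enforces the logical rule: no order without a customer.
try:
    conn.execute("INSERT INTO orders VALUES (11, 999, 5.0)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```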


    Conceptualization of the Data Warehouse with Interconnected Layers

    Operational database layer: The source data for the data warehouse. An organization's Enterprise Resource Planning systems fall into this layer.

    Data access layer: The interface between the operational and informational access layers. Tools to extract, transform, and load data into the warehouse fall into this layer.

    Metadata layer: The data directory. This is usually more detailed than an operational system data directory. There are dictionaries for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.

    Informational access layer: The data accessed for reporting and analysis, and the tools for reporting and analyzing data. Business intelligence tools fall into this layer, and the differences between the Inmon and Kimball design methodologies concern this layer.
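    The flow through these layers can be sketched in a few lines, assuming two hypothetical operational sources (`crm_rows` and `erp_rows`) that encode the same facts differently. The data access layer harmonizes gender codes and date formats before loading, the metadata layer is reduced to a single `data_dictionary` table, and the final query stands in for the informational access layer. All names here are illustrative assumptions, not a reference design.

```python
import sqlite3
from datetime import datetime

# Operational database layer: two hypothetical sources with inconsistent formats.
crm_rows = [{"id": "7", "gender": "F", "joined": "2018-04-05"}]
erp_rows = [{"cust_no": 8, "sex": "male", "join_date": "05/04/2018"}]  # dd/mm/yyyy

# One agreed code set for the warehouse; source variants map onto it.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}

def transform_crm(row):
    """CRM already uses ISO dates; only keys and codes need mapping."""
    return (int(row["id"]), GENDER_CODES[row["gender"].lower()], row["joined"])

def transform_erp(row):
    """ERP uses dd/mm/yyyy dates, converted here to ISO-8601."""
    joined = datetime.strptime(row["join_date"], "%d/%m/%Y").date().isoformat()
    return (int(row["cust_no"]), GENDER_CODES[row["sex"].lower()], joined)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, gender TEXT, joined TEXT)")

# Metadata layer: a minimal data dictionary describing each warehouse column.
conn.execute(
    "CREATE TABLE data_dictionary (table_name TEXT, column_name TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO data_dictionary VALUES ('dim_customer', ?, ?)",
    [
        ("customer_id", "Customer key from CRM.id or ERP.cust_no"),
        ("gender", "'F' or 'M', harmonized from source codes"),
        ("joined", "ISO-8601 date the customer joined"),
    ],
)

# Data access layer: extract from both sources, transform, load.
rows = [transform_crm(r) for r in crm_rows] + [transform_erp(r) for r in erp_rows]
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)

# Informational access layer: a consumer sees one consistent model.
print(conn.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
```

    Resolving the inconsistencies before loading, rather than in every report, is what lets downstream tools treat the warehouse as a single integrated source.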


    Benefits of Data Warehousing

    A data warehouse provides a common data model for all data of interest, regardless of the data's source.

    Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.

    Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.

    Because they are separate from operational systems, data warehouses allow data to be retrieved without slowing down operational systems.

    Data warehouses can work in conjunction with, and hence enhance the value of, operational business applications, notably customer relationship management (CRM) systems.

    Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
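    A trend report of the kind mentioned above (the items with the most sales in a particular area within the last two years) reduces to a grouped query over a sales fact table. The sketch below assumes a hypothetical `fact_sales` table storing ISO-8601 date strings and uses SQLite's `date(?, '-2 years')` modifier to compute the start of the window; the report date is passed in explicitly so results are reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: one row per sale, dates stored as ISO-8601 text.
conn.execute(
    "CREATE TABLE fact_sales (sale_date TEXT, area TEXT, item TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [
        ("2015-06-01", "East", "Kettle", 500),  # outside the window for a 2018-04-05 report
        ("2017-01-10", "East", "Toaster", 40),
        ("2017-09-03", "East", "Kettle", 25),
        ("2018-02-14", "East", "Toaster", 30),
        ("2018-03-01", "West", "Kettle", 90),   # different area
    ],
)

def top_items(area, report_date, limit=3):
    """Best-selling items in one area over the two years up to report_date."""
    return conn.execute(
        """SELECT item, SUM(quantity) AS total
           FROM fact_sales
           WHERE area = ?
             AND sale_date >  date(?, '-2 years')
             AND sale_date <= ?
           GROUP BY item
           ORDER BY total DESC
           LIMIT ?""",
        (area, report_date, report_date, limit),
    ).fetchall()

print(top_items("East", "2018-04-05"))  # [('Toaster', 70), ('Kettle', 25)]
```

    Passing the report date as a parameter, instead of reading the current date, means the same report can be regenerated later with identical results.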


    Disadvantages of Data Warehousing

    There are also disadvantages to using a data warehouse. Some of them are:

    Data warehouses are not the optimal environment for unstructured data.

    Because data must be extracted, transformed, and loaded into the warehouse, there is an element of latency in data warehouse data.

    Over their life, data warehouses can have high costs. The data warehouse is usually not static, and maintenance costs are high.

    Data warehouses can become outdated relatively quickly, and there is a cost to delivering suboptimal information to the organization.

    There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed, or functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems, and vice versa.


    The Future of Data Warehousing

    Data warehousing, like any technology niche, has a history of innovations that did not receive market acceptance.

    A 2009 Gartner Group paper predicted the following developments in the business intelligence/data warehousing market.


    Because of a lack of information, processes, and tools, through 2012 more than 35 percent of the top 5,000 global companies will regularly fail to make insightful decisions about significant changes in their business and markets.

    By 2012, business units will control at least 40 percent of the total budget for business intelligence.

    By 2012, one-third of analytic applications applied to business processes will be delivered through coarse-grained application mashups.


    Businesses of all sizes and in different industries, as well as government agencies, are finding that they can realize significant benefits by implementing a data warehouse. It is generally accepted that data warehousing provides an excellent approach for transforming the vast amounts of data that exist in these organizations into useful and reliable information for answering their questions and supporting the decision-making process. A data warehouse provides the base for the powerful data analysis techniques available today, such as data mining and multidimensional analysis, as well as the more traditional query and reporting. Using these techniques together with data warehousing can give easier access to the information you need for more informed decision making.


    A Solution, Not a Product

    Often we think that a data warehouse is a product, or group of products, that we can buy to help get answers to our questions and improve our decision-making capability. But it is not so simple. A data warehouse can help us get answers for better decision making, but it is only one part of a more global set of processes. For example, where did the data in the data warehouse come from? How did it get into the data warehouse? How is it maintained? How is the data structured in the data warehouse? What is actually in the data warehouse? These are all questions that must be answered before a data warehouse can be built. We prefer to discuss the more global environment, and we refer to it as data warehousing.


    Why Data Warehousing?

    The concept of data warehousing has evolved out of the need for easy access to a structured store of quality data that can be used for decision making. It is widely accepted that information is a very powerful asset that can provide significant benefits to any organization and a competitive advantage in the business world.

    Organizations have vast amounts of data but have found it increasingly difficult to access and make use of it. This is because the data is in many different formats, exists on many different platforms, and resides in many different file and database structures developed by different vendors. Thus organizations have had to write and maintain perhaps hundreds of programs that are used to extract, prepare, and consolidate data for use by many different applications for analysis and reporting. Also, decision makers often want to dig deeper into the data once initial findings are made. This would typically require modification of the extract programs or development of new ones. This process is costly, inefficient, and very time-consuming. Data warehousing offers a better approach.


    Data warehousing implements the process to access heterogeneous data sources; clean, filter, and transform the data; and store the data in a structure that is easy to access, understand, and use. The data is then used for query, reporting, and data analysis. As such, the access, use, technology, and performance requirements are completely different from those in a transaction-oriented operational environment. The volume of data in data warehousing can be very high, particularly when considering the requirements