cisb594 – business intelligence data warehousing part ii

25
CISB594 – Business CISB594 – Business Intelligence Intelligence Data Warehousing Data Warehousing Part II Part II

Upload: jeffery-ball

Post on 13-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data WarehousingData WarehousingPart IIPart II

CISB594 – Business IntelligenceCISB594 – Business Intelligence

ReferenceReference• Materials used in this presentation are extracted mainly from

the following texts, unless stated otherwise.

CISB594 – Business IntelligenceCISB594 – Business Intelligence

ObjectivesObjectives

At the end of this lecture, you should be able to:

• Describe the processes used in developing and managing data warehouses

• Explain data integration and the extraction, transformation, and load (ETL) processes

• Describe real-time (active) data warehousing• Understand data warehouse administration and

security issues

CISB594 – Business IntelligenceCISB594 – Business Intelligence

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Integration: The Extraction, Data Integration: The Extraction, Transformation, and Load (ETL) ProcessTransformation, and Load (ETL) Process

Data integration is an umbrella term that covers three processes which combine to move data from multiple sources into a data warehouse:

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Extraction, transformation, and load (ETL) technologies• Fundamental, a DW could not exist without ETL• The ETL process consists of – extraction (reading data from one or more databases), – transformation (converting the extracted data from its

previous form into the form in which it needs to be so that it can be placed into a data warehouse)

– load (putting the data into the data warehouse)• The ETL process also contributes to the quality of the data in a

DW, its purpose is to load the data warehouse with integrated and cleansed data

Data Integration : The Extraction, Data Integration : The Extraction, Transformation, and Load (ETL) ProcessTransformation, and Load (ETL) Process

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Transformation Tools – Data Transformation Tools – To purchase or to build?To purchase or to build?

• The process can either be done through purchasing data transformation tools or through developing the tools using programming languages.

• Programmers can set up ETL processes using almost any programming language, but building such processes from scratch can become complex. Increasingly, companies are buying ETL tools to help in the creation of ETL process (Wikipedia)

• Examples of ETL tools– IBM InfoSphere DataStage– Microsoft SQL Server Integration Services (SSIS)– Oracle Data Integrator (ODI)

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Transformation Tools – Data Transformation Tools – To purchase or to build?To purchase or to build?

• Issues affect whether an organization will purchase data transformation tools or build the transformation process itself – Data transformation tools are expensive– Data transformation tools may have a long learning curve– It is difficult to measure how the IT organization is doing until

it has learned to use the data transformation tools • Not an easy decision ! However many believe purchasing the

tools should be able to simplify the maintenance of the data warehouse

CISB594 – Business IntelligenceCISB594 – Business Intelligence

• Important criteria to consider in selecting an ETL tool– Ability to read from and write to an unlimited number of data

source architectures– Automatic capturing and delivery of metadata– A history of conforming to open standards– An easy-to-use interface for the developer and the functional

user

Data Transformation Tools – Data Transformation Tools – To purchase or to build?To purchase or to build?

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Warehouse DevelopmentData Warehouse Development

• Choosing the vendors – Six guidelines to consider a vendor

Financial strengthQualified consultantsMarket shareIndustry experienceEstablished partnerships

• - These indicate that a vendor is likely to be in business for the long term, to have the support capabilities its customers need, and to provide products that interoperate with other products the potential user has or may obtain.

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Warehouse DevelopmentData Warehouse Development • A data warehousing project is a major undertaking• MORE COMPLICATED AS IT COMPRISES AND INFLUENCES

MANY DEPARTMENTS, INPUT OUTPUT INTERFACES AND CAN BE PART OF BUSINESS STRATEGY

• Data warehouse development approaches– Inmon Model: EDW approach – Kimball Model: Data mart approach

• Which model is best?– There is no one-size-fits-all strategy to data warehousing – It depends on the need and the capacity of the organization– For many organizations, data mart approach is a convenient

first step in implementing DW

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Describe the major similarities and differences between the Inmon and Kimball data warehouse development approaches.•Similarities: Both methods can produce an enterprise data warehouse and subset data marts. •Differences: Inmon’s approach starts with an enterprise data warehouse, creating data marts as subsets of that EDW if appropriate. The focus is on proven, traditional methods and technologies. Kimball’s starts with data marts, consolidating them into an EDW later if appropriate. It focuses in creating a useful end-user capability quickly.

Data Warehouse DevelopmentData Warehouse Development

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Effort Data Mart (Kimball Model) EDW (Inmon Model)

Scope One subject area Several subject areas

Development Time Months Years

Development Cost $10K - $100 K ++ $1000000 ++

Development Difficulty Low to medium High

Sources Only some operational and external systems

Many operational and external systems

Size Megabytes to several gigabytes

Gigabytes to petabytes

Hardware Workstations and departmental servers

Enterprise servers and mainframe

OS Windows and Linux Unix, S/390

Number of simultaneous users

10s 100s to 1000s

Data Warehouse DevelopmentData Warehouse Development

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Warehouse DevelopmentData Warehouse Development • Some best practices for implementing a data warehouse (Weir,

2002):– Project must fit with corporate strategy and business

objectives– There must be complete buy-in to the project by executives,

managers, and users– It is important to manage user expectations about the

completed project– The data warehouse must be built incrementally– Build in adaptability

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Warehouse DevelopmentData Warehouse Development

• Some best practices for implementing a data warehouse (Weir, 2002):– The project must be managed by both IT and business

professionals– Develop a business/supplier relationship– Only load data that have been cleansed and are of a quality

understood by the organization– Do not overlook training requirements– Be politically aware

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Warehouse DevelopmentData Warehouse Development • Failure factors in data warehouse projects:– Cultural issues being ignored– Inappropriate architecture– Unclear business objectives– Missing information– Unrealistic expectations– Low data quality

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Warehouse DevelopmentData Warehouse Development • Issues to consider to build a successful data

warehouse:– Starting with the wrong sponsorship chain– Setting expectations that you cannot meet and

frustrating executives at the moment of truth– Engaging in politically naive behavior– Loading the warehouse with information just

because it is available

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Warehouse DevelopmentData Warehouse Development • Issues to consider to build a successful data

warehouse:– Believing that data warehousing database design is

the same as transactional database design– Choosing a data warehouse manager who is

technology oriented rather than user oriented– Focusing on traditional internal record-oriented

data and ignoring the value of external data and of text, images, and, perhaps, sound and video

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Warehouse DevelopmentData Warehouse Development • Implementation factors that can be categorized

into three criteria– Organizational issues– Project issues– Technical issues

• User participation in the development of data and access modeling is a critical success factor in data warehouse development

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Warehouse DevelopmentData Warehouse Development • Other factor to consider - Massive data

warehouses and scalability – The main issues pertaining to scalability:• The amount of data in the warehouse• How quickly the warehouse is expected to grow• The number of concurrent users• The complexity of user queries

– Good scalability means that queries and other data-access functions will grow linearly with the size of the warehouse

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Real-Time Data Warehousing Real-Time Data Warehousing • Traditionally, a data warehouse are not business critical, data

are commonly updated on a weekly basis – not allowing for responding to transactions in near real time

• Today, organizations are facing the need for real-time data warehousing, as decision support has become operational.

• The emergence of real-time data warehousing (RDW) or active data warehousing (ADW)

• The process of loading and providing data via a data warehouse as they become available

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Real-Time Data Warehousing Real-Time Data Warehousing • The need for real-time data– A business often cannot afford to wait a whole day for its

operational data to load into the data warehouse for analysis

– Provides incremental real-time data showing every state change and almost analogous patterns over time

– Real-time data collection can reduce or eliminate the nightly batch processes

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Data Warehouse Data Warehouse Administration and Security IssuesAdministration and Security Issues

• Due to its huge size and complicated nature, DW requires strong monitoring and administrating

• Needs more than a DBA• Needs data warehouse administrator (DWA)

A person responsible for the administration and management of a data warehouse

CISB594 – Business IntelligenceCISB594 – Business Intelligence

What skills should a DWA possess? Why?• Familiarity with high-performance hardware, software and

networking technologies, since a data warehouse is based on those

• Solid business insight, to understand the purpose of the DW and its business justification

• Familiarity with business decision, making processes to understand how the DW will be used

• Excellent communication skills, to communicate with the rest of the organization

Data Warehouse Data Warehouse Administration and Security IssuesAdministration and Security Issues

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Effective security in a data warehouse should focus onfour main areas:1. Establishing effective corporate and security policies and

procedures. An effective security policy should start at the top and be communicated to everyone in the organization.

2. Implementing logical security procedures and techniques to restrict access. This includes user authentication, access controls, and encryption.

3. Limiting physical access to the data center environment.4. Establishing an effective internal control review process for

security and privacy

Data Warehouse Data Warehouse Administration and Security IssuesAdministration and Security Issues

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Now ask if ..Now ask if ..

You are able to:• Describe the processes used in developing and managing data

warehouses• Explain data integration and the extraction, transformation,

and load (ETL) processes• Describe real-time (active) data warehousing• Understand data warehouse administration and security

issues

CISB594 – Business IntelligenceCISB594 – Business Intelligence