A Narrow Escape: How We Avoided the Data Warehouse “Death March”

Chris Lawson

Northern California Oracle User’s Group, November 15, 2001

home.mindspring.com/~karo3


What is a “Death March”?

Edward Yourdon, 1997, Death March: The Complete Software Developer’s Guide to Surviving “Mission Impossible Projects”

Death March projects resemble “a forced march imposed upon relatively innocent victims, the outcome of which is usually a high casualty rate.”

Some “Encouraging” Words

• DW projects have a terrible reputation, often producing angry customers and a frustrated project team.

•Many seek to “jump ship.”

Yourdon: “A large percentage of the projects associated with start-up companies are death march projects. A large percentage of these projects will fail.”

Death March, Death March!

Further “Words of Encouragement”

• Approximately 50% of large DW projects “fail to meet the desired levels of success,” often failing on the first or even second attempt.

CIO magazine, “The Middle Ground,” 1/15/99.

By the time the data sources have been identified and transformed, the in-house customers have often lost interest.

Traditional Approach: Will it Work?

• For large information systems, development has traditionally been an extremely structured process.

• Many days are spent on up-front analysis, requirements analysis, design reviews, etc.

• The strategy for these types of projects is to invest much time early, when mistakes are cheaper to fix.

Traditional Software Project

• For large government-funded projects, this is reasonable because requirements are fixed.

• Customer may even mandate standards, such as “SEI” (Software Engineering Institute).

• These standards require strict requirement analysis and rigorous design reviews.

• Emphasis is on repeatable processes and continuous improvement.

Change in “Ground Rules”

• The strategy described above often does not work well with large, private-sector projects, such as Data Warehouses.

• The “ground rules” defining a successful project are different.

• For example, in contrast to government projects, Data Warehouse requirements are negotiable, elusive, and will probably change.

Change in Ground Rules (cont’d)

“Just when you are about to deliver, expect the rules to change – then change again.”

Steve Tracy, Intelligent Enterprise, “Close the Loop,” March 27, 2001.

In other words, the entire nature of the project development cycle is fluid.

Our Escape Route: Prototyping

• Early phases were very limited in scope.

• We deliberately followed a simpler development process.

• The Prototype approach allows early demonstration of a working model with some features.

The data warehouse designed for DigitalThink used a prototype strategy with very restricted scope.

Key to Prototyping Benefits

• With scope limited to only six key reports and one data source, our early successes provided confidence for later stages.

• Prototyping provides the opportunity to quickly uncover “show stoppers.”

• Roadblocks will be found quickly.

Another Benefit: Credibility

• Prototyping allows the team to gain support among potential users.

• The working prototype enabled the team to perform several “demos” of key reports.

• These demos provided “buy-in” from interested parties.

• Demos also proved that the project was “on track”--we gained credibility.

Design Overview

• Like most Data Warehouses, DigitalThink used a process called ETL—Extraction, Transformation, and Loading.

• Nightly runs perform data extraction, transformation, and loading from the production Sybase system into Oracle 8i.

• Only the changed production data is inserted.

• Data is transformed into the appropriate ‘star’ schema format using the E.piphany ETL tool.
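The nightly “changed rows only” load described above might be sketched as follows. This is a minimal illustration, not DigitalThink’s actual load code; the table and column names, and the control-table mechanism, are all invented for the example:

```sql
-- Hypothetical sketch: insert only rows changed since the previous
-- nightly run, using a last-run timestamp kept in a control table.
-- (Names invented; the real E.piphany-driven load is not shown here.)
INSERT INTO sales_fact (sale_id, store_id, product_id, time_id, amount)
SELECT s.sale_id, s.store_id, s.product_id, s.time_id, s.amount
FROM   staging_sales s
WHERE  s.last_modified > (SELECT last_run_date FROM etl_control);

-- Record this run so the next night picks up where we left off.
UPDATE etl_control SET last_run_date = SYSDATE;
```

A timestamp-driven filter like this is one common way to honor the “only changed production data is inserted” rule on Oracle 8i.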

[Architecture diagram]

Production: Sybase → Load Program → Staging: Oracle (server: Sun E4500 cluster) → ETL Program → Reporting: Oracle (Reporting server: Sun E-3500 cluster)

Reporting layer: Actuate e.Reporting Server, Actuate ReportCast

Web server: Apache 1.3.12, Sun Ultra 2 (2)

Reporting System

• The application server layer was actually a Report Server in our case.

• Reports are produced by the Actuate reporting system.

• Actuate reads the pre-defined report definition to define a database query.

• It then executes the query on the Oracle 8i database and adds formatting and graphics.

• It then forwards the report to the Apache web server for delivery to the client’s web browser.

Example of Report

Database Surprises

• Surprisingly(?) some of the newest Oracle features were a mixed blessing.

• Star Transformation did improve performance of some reports, but degraded others.

I wasn’t surprised

The feature was turned off when a bug caused different results from the same query!

More on Star Schemas later!

Report “Factories”

• Some queries require extensive processing.

• This presents risk that numerous reports might be requested simultaneously, degrading the system.

• This risk was mitigated via “factories,” which restrict the number of simultaneous queries.

• Reports are assigned to particular factories based on expected run time.

• The factory for the typical report was configured to allow execution of two reports per CPU.

A Big Challenge: Lumpy Data

• Certain customers accounted for a large proportion of the data. This presented a performance-tuning dilemma.

• Time for reports ranged from a few seconds for most to a few minutes for some customers.

• Histograms were no help, because a ‘big’ customer looks like a ‘regular’ account (at first).

How can performance be optimized given this huge variance?

Oracle Trick: “Query Rewrite”

• A recent database feature uses “materialized views” (snapshots) with “Query Rewrite.”

• Each night, aggregate data is stored in a materialized view for the big accounts.

• Then, whenever these customers submit a report, the optimizer rewrites the database query to use the special aggregate data.

• Another help was “Pre-running reports” for big customers. The Report Server then uses the pre-built report instead of new query.
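The nightly aggregate for the big accounts might be sketched like this. All object and column names are invented for illustration; the deck does not show the real aggregate definitions:

```sql
-- Hypothetical sketch: pre-aggregate sales for the big accounts.
-- With QUERY REWRITE enabled, the optimizer can transparently
-- redirect matching report queries to this summary instead of
-- scanning the huge Fact table.
CREATE MATERIALIZED VIEW big_cust_sales_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE
  ENABLE QUERY REWRITE
AS
SELECT customer_id,
       time_id,
       SUM(amount) AS total_amount,
       COUNT(*)    AS n_sales
FROM   sales_fact
GROUP  BY customer_id, time_id;

-- Rewrite must also be enabled for the session (or instance):
ALTER SESSION SET query_rewrite_enabled = TRUE;
```

A nightly COMPLETE refresh matches the deck’s “each night, aggregate data is stored” description; fast (incremental) refresh is also possible but needs materialized view logs.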

Star Schema Queries

Let’s look at how Oracle uses Star Schemas

Dimension tables:

FACTORY       10 rows
PRODUCT      500 rows
STORE        100 rows
TIME_PERIOD  365 rows

SALES Fact table: 100 million rows

Star Schema Performance

• Star Schemas are very useful, but good performance can be elusive when joining.

• There is always the “normal” way of joining--Hash Join, Nested Loop, Sort/Merge.

• But--the Oracle optimizer has some “tricks” to use when joining tables in Star Schemas.

• These two tricks are called “Star Join” and “Star Transformation.”

• These two methods are NOT the same, and use drastically different ideas.

Intro to SQL Joins with Stars

• The issue: what if the selection criteria are spread among the Dimension tables?

• That is, the query as a whole only returns a few rows, but the “Where clause” on each Dimension table is not very restrictive.

• This presents a special type of problem, and the join methods can have drastically different performance, as we will see.

Failure to understand this issue will cause bad things to happen.

Example: Typical Query

Select Sales.Sale#
From Sales, Time_Period, Factory, Product, Store
Where Time_Period.Quarter = ‘Q1-2000’
and Factory.Manuf_Name = ‘Coleman’
and Product.Description = ‘Camping Gear’
and Store.City = ‘Elk Grove’;

The query as a whole only returns a few rows, but any one criterion alone would return many rows.

The Dilemma

• We only want the final result set, not all the rows we have to “wade through” to get there.

• How can we “jump” to the final result set?

• It seems like no matter how we join the tables, we must process many rows. The earliest joins must “pay the price.”

• A solution seems impossible! Or is it?

Oracle’s First Answer: STAR Join

•Optimizer recognizes the Star setup, & bypasses the usual join methods.

•Prior to joining to the Fact table, the optimizer jointly considers the dimension constraints.

•The optimizer manually builds a list of all possible combinations (Cartesian product) of the Dimension rows that meet the selection criteria.

•This small set of rows is used to access the Fact table, via a (B*tree) composite index.

Star Join Example

Step 1: Find Cartesian product of Dimension rows that meet the search criteria.

Quarter = ‘Q1-2000’        90 rows
Manuf = ‘Coleman’         x  1 row
Product = ‘Sleeping Bag’  x  1 row
Store = ‘Boston’          x  1 row

total: 90 rows

Step 2: Using this set of 90 rows, access Sales table via 4-col index.
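The “4-col index” in Step 2 is an ordinary composite B*tree index on the Fact table’s foreign keys. A sketch, with column and index names invented for illustration:

```sql
-- Hypothetical sketch: composite B*tree index covering all four
-- Dimension foreign keys on the Fact table. A Star Join probes this
-- index once for each row of the Cartesian product of dimension keys
-- (90 probes in the example above).
CREATE INDEX sales_star_ix
  ON sales (time_id, factory_id, product_id, store_id);
```

Because B*tree access requires the leading columns of the index, several such composite indexes (with different column orders) may be needed in practice, as the next slide notes.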

STAR Join: Complications

•Requires huge composite index on Fact table.

•Multiple huge indexes will be necessary so that “leading columns” rule will always be met.

• The Cartesian product can be huge. Example:

Quarter = ‘Q1-2000’        90 rows
Manuf like ‘C%’           x 10 rows
Product like ‘Sleeping%’  x 10 rows
Store_Area = ‘East’       x 20 rows

= 180,000 combinations

Thus, Star Join would require 180,000 index lookups

Oracle’s Second Answer: STAR Transformation

The “Trick”-- Use inherent speed of combining bitmaps

Instead of figuring all possible combinations, the combining is done at the bitmap level. This requires a bitmap index for each foreign key in the Fact table pointing to a Dimension table, plus Star_Transformation_Enabled = True.

Example of STAR Transformation

Consider again the criteria: Quarter = ‘Q1-2000’ AND Manuf like ‘C%’ AND Product like ‘Sleeping%’ AND Store_Area = ‘East’

Step 1: Scan each Dimension table to find rows that match the search criteria. Get PKs:

QUARTER  90 rows {‘1/1/2000’, ‘1/2/2000’, etc.}
MANUF    10 rows {‘Coleman’, ‘Clark’, etc.}
PRODUCT  10 rows {‘Sleep Bag’, ‘Sleepware’, etc.}
STORE    20 rows {‘Boston’, ‘New York’, etc.}

STAR Transformation

Step 2: Access matching bitmaps on Fact table. Combine bitmaps (‘OR’) for each Dimension:

QUARTER Merge 90 bitmaps => 1 bitmap

MANUF Merge 10 bitmaps => 1 bitmap

PRODUCT Merge 10 bitmaps => 1 bitmap

STORE Merge 20 bitmaps => 1 bitmap

STAR Transformation

Step 3: Combine (‘AND’) bitmaps yielding the “final” bitmap that encompasses all selection criteria.

Step 4: Use final bitmap to read the Fact table.

Step 5: [If necessary] Return to Dimension tables to get columns not yet read.
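The prerequisites for the steps above can be sketched as follows; the index and column names are invented for illustration:

```sql
-- Hypothetical sketch: one bitmap index per Fact-table foreign key,
-- pointing back to its Dimension table. These are the bitmaps that
-- Steps 2 and 3 merge ('OR') and combine ('AND').
CREATE BITMAP INDEX sales_time_bix    ON sales (time_id);
CREATE BITMAP INDEX sales_factory_bix ON sales (factory_id);
CREATE BITMAP INDEX sales_product_bix ON sales (product_id);
CREATE BITMAP INDEX sales_store_bix   ON sales (store_id);

-- Enable the transformation for the session:
ALTER SESSION SET star_transformation_enabled = TRUE;
```

Note that, unlike the Star Join, no huge composite index is needed: four single-column bitmap indexes replace it, and bitmap operations avoid materializing the Cartesian product.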

High Availability

• The Reporting system was required to be available 24x7, because of world-wide customers.

• This required a method of keeping the database available while new data is loaded.

• The E.piphany tool provided a handy solution via a duplicate set of database tables.

High Availability

• Each table set contains all the data needed to run a report. Thus, while the users are querying set “A,” new data is copied into “B.”

• At the conclusion of the nightly data update, new reports are directed (via a view) to the most recent set of data--either “A” or “B.”

• Of course, this feature requires more storage, but the high availability benefit was deemed a good tradeoff.
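The A/B switch described above can be sketched with a simple view repoint. All names here are invented; the deck does not show the E.piphany tool’s actual mechanism:

```sql
-- Hypothetical sketch: reports always query SALES_CURRENT.
-- While users read set "A", the nightly load fills set "B"...
CREATE OR REPLACE VIEW sales_current AS SELECT * FROM sales_a;

-- ...and at the end of the load, one DDL statement flips readers
-- over to the freshly loaded set, with no downtime:
CREATE OR REPLACE VIEW sales_current AS SELECT * FROM sales_b;
```

The cost is roughly double the storage for the duplicated table set, which is the tradeoff the slide mentions.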

High Availability (HA)

• The database and reporting servers were configured for HA (High Availability) using Veritas Cluster Server (VCS).

• With VCS, each server is actually part of a two-node cluster.

• Upon failure of one node, the entire application and database automatically switch to the alternate node.

• See the Veritas web site for further documentation on VCS.

Teamwork

• The critical position of project architect was filled very early in the project.

• The architect was one of the few team members with extensive data warehouse design experience.

• He provided the “road map” and designed the database “Star” schema.

Our architect

Teamwork (continued)

• External customers were invited to preview prototype reports, and provide feedback.

Our DBA loves sql tuning

• The DBA worked with the designers to optimize the response time, and identify database bottlenecks.

• This frequently meant identifying poorly performing SQL code and suggesting corrections, or adding new indexes.

Summary: Our Secret Formula

• The project’s success was not primarily due to “gurus” or clever development techniques.

• Rather, the prototyping strategy provided a positive impetus and laid good groundwork for further development.

• Trust and cooperation among team members was also a key contributor to success.

• Recipe for success: competent personnel plus small early successes plus credibility.

For Further Information:

Chris_Lawson@Yahoo.com

home.mindspring.com/~karo3