
Productivity Benefits of the Instant Data Warehouse

27/7/2004 -- As more and more large organisations use the Instant Data Warehouse we are starting to see even greater levels of productivity in constructing ETL processes. 

In short, we are getting better at using the Instant Data Warehouse and we are building 'tools around the tools' to extend the reach and functionality of the Instant Data Warehouse.

In recent times I have often been asked the question:

'Just how fast is it to develop ETL processes using the Instant Data Warehouse?'

People considering using the Instant Data Warehouse are asking for 'solid numbers' of actual work performed using the Instant Data Warehouse.

Anyone who has been working in the Data Warehousing space for any length of time knows that the question of 'How long does it take to develop ETL processing?' is equivalent to the question of 'How long is a piece of string?'

It is a tough question to answer in any way that provides meaningful guidance to a prospective customer.

However, taking on the challenge, this press release is my 'best effort' at answering the question:

Just how fast is it to develop ETL processes using the Instant Data Warehouse? 

Today, I am pleased to announce some results of recent experiences in two very large clients.

Client #1 - Telecommunications 

The Client

The client is huge by any standards: a telecommunications company providing the full range of telecommunications services to a population of around 20 million people. The high level numbers are 18M accounts with 60M CDRs per day.

The System

The system to be developed was an Operational Data Store based on the Sybase IWS Telecommunications Model.  The data to be placed into the Operational Data Store was Customer Care and Billing information including CDRs from switches and the billing system.  The client selected DataStage XE with Parallel Extender as the ETL tool to be used. 

The Approach

With such a suite of complicated issues to deal with at this client we decided to use the Instant Data Warehouse to prototype the customised IWS model and ETL prior to building the real ETL in DataStage.  We believed if we could build the ODS as a prototype, validate the model and validate the ETL, then we would significantly reduce the risks of building the ODS and ETL…

Press Release

From the desk of

Peter Nolan


This was the first time we had decided to use the Instant Data Warehouse purely as a prototyping tool rather than as the ETL tool that would be used in production.

We built the staging area using DataStage as per normal, and the prototype database using the Instant Data Warehouse.

The approach we used was to develop all mappings using spreadsheets and, once we felt we had these 'pretty right', to convert them into executable processes in the Instant Data Warehouse.
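To give a feel for what 'converting a mapping spreadsheet into an executable process' can look like, here is a minimal sketch in Python. It is an illustration only and does not show how the Instant Data Warehouse works internally. The column headings (target_table, target_column, source_table, source_column) and all table and field names are hypothetical, and the sketch assumes each target table is loaded from a single staging table with no transformations.

```python
import csv
import io
from collections import defaultdict
from typing import List

# Hypothetical mapping spreadsheet, exported as CSV (all names are made up).
MAPPING_CSV = """target_table,target_column,source_table,source_column
dim_customer,customer_id,stg_accounts,acct_no
dim_customer,customer_name,stg_accounts,acct_name
fact_usage,call_seconds,stg_cdr,duration
"""


def mappings_to_sql(mapping_csv: str) -> List[str]:
    """Build one INSERT ... SELECT statement per (target, source) table pair."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(mapping_csv)):
        key = (row["target_table"], row["source_table"])
        grouped[key].append((row["target_column"], row["source_column"]))

    statements = []
    for (target, source), columns in grouped.items():
        target_cols = ", ".join(t for t, _ in columns)
        source_cols = ", ".join(s for _, s in columns)
        statements.append(
            f"INSERT INTO {target} ({target_cols}) "
            f"SELECT {source_cols} FROM {source};"
        )
    return statements


if __name__ == "__main__":
    for statement in mappings_to_sql(MAPPING_CSV):
        print(statement)
```

The point of the sketch is that once the mappings are held as data, changing a mapping means editing a row in the spreadsheet rather than rewriting a job.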

The Result with Instant Data Warehouse

It may be hard to believe. We completed the construction of the prototype ETL in 2 work weeks.  If you count that we worked some longer days, you might want to call it 3 work weeks.

In either case, the amount of time spent writing the ETL processes in the Instant Data Warehouse was trivial compared to the overall project.

The top line numbers for the initial prototype were:

26 dimension tables

20 fact tables

These two weeks included the time and effort to solve most of our data integration issues.

 

Estimated Benefit

Having built the prototype and validated it we re-wrote all the working ETL using DataStage.

The re-write to DataStage took 16 work weeks. I want to point out that we actually had DataStage templates as the basis for the DataStage development, so these 16 work weeks included the 'flying start' of already having all processes well defined and template jobs available.

I would also note that 16 work weeks for implementing the ETL for a major portion of Sybase IWS in DataStage was also significantly faster than the previous IWS projects I had managed.

The likely benefit of using the Instant Data Warehouse to perform initial development and validation is difficult to gauge. 

My estimates of the benefits are:

Minimum of 4 work weeks cut out of development time, but more likely 8 weeks. (Including the time spent developing the prototype.)

Significant improvement in job quality, since the developers had proven ETL to work against and example outputs to compare their results against.

Significant improvement in our understanding of the data at a much earlier stage of the project.

All participants could see what was being developed far earlier than would have otherwise been possible.

 


Client #2 - Telecommunications 

The Client

A global telco. In the country we were working in, the client provides a full range of mobile services to a population of around 20 million people.

The System

The system to be developed was a 'Financial Analysis and Reporting Data Mart' based on Oracle Financials.  One of the purposes of the project was to get the data out of Oracle Financials to make it possible to perform much more sophisticated financial analysis. The target model was to be a customised version of the Sybase IWS Core Model.

Since the data mart was for financial data and not CDR data the volumes were far more reasonable than Client #1.   

The Approach

With the success of using the Instant Data Warehouse at Client #1, we decided to make more use of the Instant Data Warehouse at Client #2.

The client wanted to take a look at Informatica and DataStage in a little more detail, but did not want to delay the project to do so. So we proposed building the data mart before the ETL tool decision was made.  A first in my experience. The decision finally went with DataStage.

Since we had no ETL tool at the time we also decided to build the staging area using the Instant Data Warehouse as well as the Data Mart itself.

Once we built the staging area we could perform better data analysis because we had the data on a local machine and we could manipulate it more easily.

We performed the regular mapping using spreadsheets and once satisfied we customised the IWS model.  During the mapping exercise we analysed all existing financial reports and defined all the data to provide all the existing financial reporting as the 'minimum' level of data to be migrated to the data mart. 

As we analysed the source data we also included many more fields which we believed would be of value in the future for financial analysis.

The net result of the staging area and mapping exercise was approximately 100 tables in the staging area and 2000+ fields identified as 'potentially useful' to place into the data mart. I think I can safely say that the 4 weeks I spent looking at that spreadsheet cost me about 2 years' worth of 'eyeball wear and tear'!!!

The analysis of reports and data, and the mapping exercise took around 8 work weeks.  We wanted the mapping spreadsheet to be as good as it could be before we started the Instant Data Warehouse mapping because of the very large number of tables and fields involved.


The Result with Instant Data Warehouse

You will find the numbers for this client even harder to believe than Client #1.

We completed the construction of the ETL for the staging area in a couple of days. This included updating the Data Transfer Utility to be able to set fields to default values. 

We completed the first cut of all the ETL processing in 2 weeks.  We then spent another week running the ETL over and over again finding data problems and developing our understanding of the data and how it really fitted together, not how we thought it fitted together from our data analysis.

To put this in perspective, and this is very important:

We were able to write the ETL for the mappings defined in 25% of the time it took to develop those mappings. 
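The arithmetic behind that figure can be spelled out as a short sketch. It assumes the 25% compares the 2-week first cut of the ETL with the roughly 8 work weeks of analysis and mapping quoted in this release. It leaves out the couple of days spent on the staging area ETL and the further week of re-running the ETL. The function name is made up for illustration.

```python
# Figures quoted in this release for Client #2, in work weeks.
MAPPING_WEEKS = 8        # analysis of reports and data, plus the mapping exercise
ETL_FIRST_CUT_WEEKS = 2  # first cut of all the ETL processing


def effort_ratio(etl_weeks: float, mapping_weeks: float) -> float:
    """Return the ETL effort as a fraction of the mapping effort."""
    return etl_weeks / mapping_weeks


ratio = effort_ratio(ETL_FIRST_CUT_WEEKS, MAPPING_WEEKS)
print(f"ETL effort is {ratio:.0%} of mapping effort")
```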

One of the most important factors with the migration of something as important as financial reporting from the accounting system to a data mart is to get the reports to balance before cutting over to the data mart.

Usually, it is not possible to test whether the reports will balance until the end of the project. That is, when production volumes of data are available through the production ETL. Having done many such projects, I completely understood that the problem with this approach is that if the reports do not balance then the 'surprise' comes at the end of the project, and significant rework may be required.

However, in my 13+ years of data warehousing experience I have never had another option, until now…

As a project team, we discussed the possibility of migrating the Instant Data Warehouse to the production environment and building the full production database prior to even starting the production ETL coding. This would reduce the single largest risk of the project. Again, another first for me.

We decided to go ahead and try this as an experiment.  At the time the Instant Data Warehouse ran on Windows 2000 and Solaris in 32 bit mode. Client #2 uses AIX 64 bit. We really didn't know if the Instant Data Warehouse would run on AIX 64 bit, and no members of the project team or the customer had any experience writing 64 bit applications on AIX.

Amazingly, the Instant Data Warehouse moved to AIX 64 bit without the need to change one single line of code!!!

This project is 'work in progress' and we will be building the entire production database such that we can build the Business Objects front end and balance it to existing reports before we write any Production ETL code.  This means we will see our first production reports at least one month earlier than otherwise possible.
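For readers unfamiliar with what 'balancing' a data mart to existing reports involves, the following is a minimal sketch in Python. It is not taken from the project: the account codes, the amounts and the function name are all hypothetical, and it assumes balancing means comparing a total per account from an existing report with the same total computed from the data mart.

```python
from decimal import Decimal


def find_imbalances(report_totals, mart_totals, tolerance=Decimal("0.00")):
    """Return accounts whose data mart total differs from the existing report.

    The result maps each out-of-balance account to (mart total - report total).
    An account missing from one side is treated as having a total of zero.
    """
    imbalances = {}
    for account in sorted(set(report_totals) | set(mart_totals)):
        expected = report_totals.get(account, Decimal("0"))
        actual = mart_totals.get(account, Decimal("0"))
        difference = actual - expected
        if abs(difference) > tolerance:
            imbalances[account] = difference
    return imbalances


# Hypothetical totals: one from an existing financial report, one from the mart.
report_totals = {
    "4000-Revenue": Decimal("125000.00"),
    "5000-Costs": Decimal("83000.50"),
}
mart_totals = {
    "4000-Revenue": Decimal("125000.00"),
    "5000-Costs": Decimal("82990.50"),
}

if __name__ == "__main__":
    for account, difference in find_imbalances(report_totals, mart_totals).items():
        print(f"{account} is out of balance by {difference}")
```

Running a check of this kind against the full production database is what allows any 'surprise' to surface before the production ETL is written, rather than at the end of the project.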

The top line numbers for this data mart are:

50+ dimension tables

36+ fact tables


Estimated Benefit

It is very hard to estimate the benefits of:

Being able to run data integration processes while it is still trivial to change the target data model and ETL processes from staging to target.

Being able to delay purchase of ETL tools in order to more closely study the alternatives.

Being able to build the full production system before writing the production ETL processing to do so.

Being able to balance the production reports a month earlier than otherwise possible.

These are things that data warehouse developers dream of being able to do.

As a Data Warehousing Project Manager and Data Warehouse Architect, I know they are very, very valuable things to be able to do.

We have not yet built the production ETL for this project. However, we are estimating that, with the extremely high level of confidence we have in the ETL developed in the Instant Data Warehouse, we should be able to get the ETL coding effort down to around 8 work weeks, maybe even less. Watch this space.

In this case we expect the re-work rate on the ETL programming to be almost zero.

The Bottom Line

The bottom line of these two examples is this:

The Instant Data Warehouse is, by far, more productive than the leading 'ETL Productivity Tools'. 

The Instant Data Warehouse has now 'proven' itself to be the most productive ETL tool, period.  

More importantly, from my own personal point of view, the Instant Data Warehouse now significantly contributes to the goal that I set myself some 10 years ago:

Make dimensional data warehouses available to any company that wants one.  

The Instant Data Warehouse slashes the #1 cost of a data warehousing project, ETL development and maintenance.

The Instant Data Warehouse is so productive that even large customers who plan to use one of the leading ETL tools like Informatica or DataStage can build a prototype using the Instant Data Warehouse and reduce the overall project time significantly for having done so.   

I hope these two examples of recent productivity benefits of the Instant Data Warehouse go some way to answering that question that I am so often asked nowadays.

Just how fast is it to develop ETL processes using the Instant Data Warehouse? 

Best Regards

Peter Nolan
