
    Data Warehouse Best Practices

    White Paper

    April 9, 2009

    Copyright 2009 Intrasphere. Confidential.

    Table of Contents

    Introduction

    Planning
        Readiness Assessment
            Corporate Sponsorship
            Appropriate Budget, Time and Resource Commitments
            User Demand
            IT Readiness
        Project Planning
            Realistic Timelines and Scope
            Phased Approach
            Communication Channels and Issue Tracking
            Data Governance Committee

    Analysis
        Data Analysis
            Data Profiling
        System Analysis
            Capacity Planning
            Tool Assessment

    Design
        Data Modeling
            Dimensional versus Relational
            Data Mapping
            Data Marts
        System Design
            Modular Design
            A/B Data Environments

    Development
        Development Environment
            Multiple Dev Databases
            Architecture Team
            Golden Copy
            Full Sets of Source Data

    Test
        Test Environment
            Concurrent Testing
            Regression Testing
            Automated Tools

    Deployment
        Production Build
            Deployment Checklist
            Initial System Burn-in

    Summary

    About Intrasphere

    For More Information


    Introduction

    Data warehouses are large, expensive systems. They typically take years to fully implement, require the
    efforts of large teams, and often fail to deliver on the initial promise due to technical, procedural and other
    reasons. However, successful data warehouses that result in true business value and competitive
    advantage can be built if the correct approach and best practices are followed.

    This white paper serves as a practical best practices guide for Data Warehouse initiatives. It draws
    upon the authors' years of experience designing and building Data Warehouses in the Pharmaceutical and
    Life Sciences industry.

    This document is organized around the end-to-end process that includes the following key phases:

    Planning

    Analysis

    Design

    Development

    Test

    Deployment


    Planning

    Data warehouses require extensive planning in order to achieve success. The following are some best
    practice activities that help ensure the appropriate level of planning is done before the system is built.

    Readiness Assessment

    In the planning phase, it is important to honestly assess the readiness of the organization for the

    implementation of a data warehouse. A data warehouse readiness assessment is important to identify

    areas of potential failure. There are several published methods for executing this assessment, such as

    those found in Ralph Kimball's The Data Warehouse Lifecycle Toolkit. The important factors to consider

    usually cover the following basic areas:

    Corporate Sponsorship

    Appropriate Budget, Time and Resource Commitments

    User Demand

    IT Readiness

    Corporate Sponsorship

    Corporate Sponsorship is a key factor to consider in assessing the readiness of the organization.

    Successful data warehouses have strong senior sponsorship from the leadership of the organization.

    Ideally, the senior sponsor will be a respected visionary with the clout to influence budgets and convince

    others of the importance of the data warehouse. Strong sponsorship will help:

    Get corporate buy-in, generating acceptance of the data warehouse within the organization

    Allocate resources and budget, ensuring the data warehouse has the funding and support it

    needs to be built

    Assist with removing any roadblocks that may come up during the building and deployment of the

    data warehouse

    Bridge departments to garner cooperation across departmental lines

    Create and share a vision or mission statement that will convince the company as a whole of the

    importance of the data warehouse

    If the senior sponsor is not fully committed to the expense, time, and effort of the data warehouse, it may

    be difficult to get others to fully support the effort, especially if timelines or budgets run over. While it is

    possible to build a data warehouse without strong sponsorship, it is much more difficult and risky. This


    step in the readiness assessment serves to identify or recruit the person(s) that will act as the sponsor(s),

    and to gauge their commitment to championing the data warehouse effort.

    Appropriate Budget, Time and Resource Commitments

    Data warehouses require large investments in time, resources and money. They are usually implemented
    via multi-year projects, and often have large teams both building and supporting them. A key to the

    success of building a data warehouse is to make sure that the budget, time and resource needs are met.

    It is critical to set realistic expectations and to gain commitments before starting. It is not atypical for a

    data warehousing project to cost 50% to 100% more than originally estimated, or to take twice as long to

    complete. In addition to initial commitments on budgets, time and resources, it is wise to set aside

    contingency amounts.

    Many times, it is easier to break up the effort into several phases, and procure the budget and resource

    commitments based on smaller efforts. Care should be taken to ensure that the areas of the data
    warehouse that will produce the highest ROI receive the highest priority. This will help to prove the value

    of the data warehouse and to acquire future funds and resources. This step in the assessment is to
    understand how difficult it will be to raise the funds and get resources committed to the effort of
    building the data warehouse.

    User Demand

    A data warehouse needs to meet the demands of its users, or it will not be adopted by the user community;
    the proposed value of the data warehouse will then not be realized and the project will be deemed a failure.

    It is very important to make sure that the user community is open to changing its business operations to

    include using the data warehouse. The key here is to get the user community eager to be involved and
    excited about the potential of the data warehouse. Not only will the data warehouse be built to better answer

    the types of questions the users want to ask, but it will be better accepted by the user community when

    deployed. This step in the assessment is to interview a few users and ensure that there is a need, and

    that if you build it, they will come.

    IT Readiness

    A data warehouse will usually be built and supported by a company's Information Technology

    department. It is critical to evaluate the technical abilities of the IT department in hosting the data

    warehouse. Several factors should be investigated to assess whether or not the IT department is ready to

    support the effort:

    Ability to acquire, deploy and host the necessary hardware

    Ability to acquire the software licenses necessary

    Necessary resources and budgets to host the system


    Technical skills and experience with the hardware and software platforms chosen to implement

    the data warehouse (database, ETL, business intelligence tools, etc.)

    Technical skills to restart the system in case of failure

    Technical skills to back up and restore the system as needed

    Technical skills to rebuild the system in case of disaster

    If the IT department is already supporting data warehouses, it is a good idea to plan to use similar

    platforms if possible. This assessment step is critical in identifying any gaps in the support the system will

    need once built. These gaps, if any, should be addressed in the initial data warehouse project.

    Project Planning

    There are several activities conducted during the project planning phase of the data warehouse project
    that can help ensure success. While most are typical in any project, the size and complexity of a data

    warehouse project can make them especially critical. Creating a realistic project timeline, ensuring clear

    communication channels, setting up rigorous scope and change controls, issue tracking and escalation,

    and frequent status checks are all important in any project, but critical to the success of a data

    warehouse. In addition, a data governance board should be established to quickly resolve any data

    issues found, or escalate them to proper resolution as fast as possible.

    Realistic Timelines and Scope

    It is highly recommended that the project timeline be driven by a bottom-up approach. The scope for the

    initial release (in a multi-phased approach) should be clearly identified. Each task needed to accomplish

    the scoped result should be assigned an estimated effort, named resources (if possible), and identified
    constraints. It is very risky to time-box a data warehouse project to meet a specific deadline. Usually,

    attempting delivery with a top-down timeline will result in a very limited project scope.

    Phased Approach

    A phased approach is highly recommended. Data warehouses are easily chunked into work efforts

    based on either sets of data source systems or sets of data marts addressing specific related business

    function needs. A well designed data warehouse needs to be able to add new source systems and new

    business intelligence tools throughout its lifetime. This property lends itself to phased development and

    deployment. Phased efforts usually take longer and can cost more in the long run than all-or-nothing
    efforts, but are inherently much less risky, and users can gain access to and use the system much earlier
    (which means the return on the investment is realized earlier as well.)

    Another major benefit of a phased approach is that it is more flexible to users' needs. Users typically

    develop new ideas and requirements once they start accessing a data warehouse. A phased approach

    can respond quickly and increase the value of the system to these users.


    Finally, a phased approach allows teams to include fixes for bugs or data issues that were not
    caught during testing. This will increase the quality of the data warehouse, and increase its value to

    the users.

    Communication Channels and Issue Tracking

    A clear organization chart of the project and clear, articulated communication channels should be

    established to ensure that issues are raised and dealt with in a timely manner. Data warehouses typically

    uncover many unexpected problems, and problem resolution is key to ensuring the project runs smoothly

    and is not excessively delayed by unresolved issues. Issues should be tracked, and the status of the

    issues should be reviewed frequently to ensure progress is being made. Utilizing an issue tracking tool

    can assist in making sure issues are visible to the entire team, but should not be the sole channel for

    communicating issues. When issues are raised, they should be acknowledged before being assumed to

    be assigned.

    A weekly status meeting is effective for tracking major issues. However, due to the number of teams and

    the complexity of the data warehousing development and testing effort, it is recommended that each team
    have frequent team meetings in addition to project status meetings.

    Test team members should also be paired with development team members to report bugs directly. This

    will usually allow for better communication between tester and developer, and will help speed the overall

    resolution of issues found during testing.

    Data Governance Committee

    Data warehouse projects uncover many hidden data issues in operational databases. Due to the high
    number of data issues found during initial development of a data warehouse, it is highly recommended
    that a Data Governance Committee be assembled. The role of the committee will be to:

    Resolve or seek resolution on data issues

    Publish a set of master data by identifying official sources for data lists (such as product lists)

    Identify methods for fixing data errors

    Describe the definitions of data elements across multiple source systems

    Enforce data standards

    The Data Governance Committee should be staffed by knowledgeable data stewards from the business

    users, source systems and the data warehouse. Its members may also perform some tasks in the data

    warehouse development effort.


    Analysis

    The analysis phase of a data warehousing project typically takes longer than that of other system development

    projects. This is because of the effort required to profile and analyze the data from multiple source

    systems. There are several activities that are best practices during this phase. They include:

    Data profiling

    Capacity planning

    Tool assessment

    Data Analysis

    Data Profiling

    Data profiling is a critical component to building a data warehouse. It consists of investigating the raw

    data in the source systems, looking for data patterns, limits, string lengths, distinct values, constraint

    violations, etc. It is the first pass at identifying potential data issues, and is also the analysis step that will

    generate requirements for the designs of the data models and the data mappings. This step is typically

    done by a data analyst using SQL editors or data profiling tools.
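    As an illustration, the queries below sketch the kinds of checks a data analyst might run with a SQL editor
    against a hypothetical source table (src_customer and its columns are assumptions, not from any
    particular system):

        -- Row count and distinct values for a candidate key
        SELECT COUNT(*) AS total_rows,
               COUNT(DISTINCT customer_id) AS distinct_ids
        FROM src_customer;

        -- String lengths and null rate for a text column
        SELECT MIN(LENGTH(customer_name)) AS min_len,
               MAX(LENGTH(customer_name)) AS max_len,
               SUM(CASE WHEN customer_name IS NULL THEN 1 ELSE 0 END) AS null_count
        FROM src_customer;

        -- Frequency distribution to spot unexpected or invalid codes
        SELECT status_code, COUNT(*) AS occurrences
        FROM src_customer
        GROUP BY status_code
        ORDER BY occurrences DESC;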

    Most of the time this step is done in parallel with the data modeling and data mapping efforts, usually by the
    same team. This approach allows the team to focus on specific areas: modeling, mapping, and assessing the

    data quality all at once. Even though the data modeling and data mapping activities reside in the Design

    Phase, the reality is that these three activities lend themselves to iterative execution very well, and are

    usually done together. In fact, it is not uncommon for these iterative activities to continue into the

    development phase, since data issues and anomalies are sometimes not encountered until then.

    System Analysis

    Capacity Planning

    A capacity plan is a critical component for any data warehouse. It is the guide to growing the system over

    time. Capacity plans for data warehouses consider the following:

    Initial data storage size requirements of the data warehouse

    Incremental data growth due to ETL migrations

    Number of users (usually identified as named users, active users and concurrent users)

    Estimated processing requirements based on concurrent user queries and other processes

    ETL schedule requirements


    User access time (up-time) requirements

    Archiving and Partitioning

    The inputs into the initial capacity plan should be gathered during the analysis phase, since data profiling

    can give a clear understanding of data size and growth expectations. These will be used to ensure

    appropriate data storage capacity is available, and that the ETL servers can process the required data

    within the ETL batch window.

    A capacity plan should be a living document, periodically measuring the initial growth and user-pattern
    estimates against actual values. This will ensure that performance and functionality are not limited by
    data warehouse growth.
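    As one way to keep the plan honest, actual storage consumption can be sampled periodically and
    compared against the plan's estimates. The query below is a minimal sketch assuming an Oracle platform
    and hypothetical schema names:

        -- Measure actual space used per warehouse schema so growth can be
        -- tracked against the capacity plan's estimates
        SELECT owner,
               ROUND(SUM(bytes) / 1024 / 1024 / 1024, 2) AS size_gb
        FROM dba_segments
        WHERE owner IN ('DW_STAGE', 'DW_MART')   -- hypothetical schemas
        GROUP BY owner
        ORDER BY size_gb DESC;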

    Tool Assessment

    Data warehouses require a number of specialized software tools. The simplest data warehouses will
    have database software, ETL software, and some business intelligence or reporting software. During the

    analysis phase, it is critical to identify the tools to be used to implement the data warehouse, especially if

    they will require the assigned development resources to be trained in their use. It is essential to choose

    the tools that will not only fit the current vision, but the future expectations of the data warehouse. Scaling,

    support, and product upgrades should be considered. Time should be set aside for vendor demos and
    bake-offs, and vendors should be given time to build demos based on specific requirements, as this can
    help identify the fit of the tools.

    The foundation software components of a data warehouse are typically very expensive, and require a

    good deal of technical experience and competence to ensure a successful implementation. This causes

    most companies to standardize on specific tool vendors. If that is the case, a tool assessment is still
    recommended to ensure that gaps in functionality are identified so that alternate solutions can be

    designed.


    Design

    The design phase of a data warehouse is typically longer than that of other system projects. This is due to
    several factors, including the iterative nature of data modeling and data mapping, the design of complex
    technical infrastructure, and the interactions with several source systems. This section identifies
    some key best practice considerations for the design phase of a data warehouse.

    Data Modeling

    The most critical task for any data warehouse is the data model. Deciding the appropriate model is key to

    the success and performance of the data warehouse, as well as to the types and diversity of queries

    supported. There are also several best practices based on or related to the data model. Data mapping

    describes the movement of data from the source systems into the target data models. Data marts are

    views or sub-components of the data warehouse built to support a specific business process or group of

    related processes for a specific business functional group. A/B data switches are a mechanism used to

    maintain separation between ETL and end users, allowing both access to the data concurrently.

    Dimensional versus Relational

    There are two approaches to data modeling for a data warehouse, each with strengths in various

    scenarios. They are the Dimensional and the Relational models. Dimensional models (also called star

    schemas) are especially effective for data that is aggregated at different levels, and for building cube

    databases for drilling on various data attributes. Dimensional models retrieve large amounts of data more

    efficiently and faster than Relational models. Relational models are effective for pinpointing individual

    records quickly. Most data warehouses use dimensional models for the data marts (where users interact

    with the data.) The back office staging areas usually contain both dimensional and relational tables, each

    depending on the needs of the ETL and other back office processing.
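    As a minimal sketch of a dimensional model, the star schema below shows one fact table surrounded by
    dimension tables (all table and column names here are hypothetical):

        -- Dimension tables hold descriptive attributes for drilling and grouping
        CREATE TABLE dim_product (
            product_key    INTEGER PRIMARY KEY,  -- surrogate key assigned in the warehouse
            product_code   VARCHAR(20),          -- natural key carried from the source
            product_name   VARCHAR(100)
        );

        CREATE TABLE dim_date (
            date_key       INTEGER PRIMARY KEY,  -- e.g. 20090409
            calendar_date  DATE,
            fiscal_quarter VARCHAR(6)
        );

        -- The fact table holds the measures, keyed by its dimensions
        CREATE TABLE fact_sales (
            product_key    INTEGER REFERENCES dim_product (product_key),
            date_key       INTEGER REFERENCES dim_date (date_key),
            units_sold     INTEGER,
            sales_amount   DECIMAL(12, 2)
        );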

    The data modeling effort in a data warehouse takes roughly four times as long as that in an operational or
    transactional system. It is an iterative effort, and becomes increasingly (some say exponentially) more
    complicated with each additional source system conformed. It is critical to have an experienced data
    warehouse data modeler work with the project team and data governance board to accomplish this critical task.

    Data Mapping

    Data mapping is an essential part in designing the ETL of the data warehouse. It identifies each source

    data element, any transformations or processing routines applied, and the target data element into which

    the data is loaded. Two main tips for data mapping are to use a single spreadsheet worksheet for each

    table, and map target to source. When mapping target to source, it is easier to ensure that all required

    data fields in the target data model are addressed. Mapping source to target can include many

    unnecessary data elements (in the source systems) that will confuse and distract.

    The data mapping is also used by the development team for implementing the ETL. All transformations,

    data quality checks, and cross references must be included in the data mapping document. It is also not


    uncommon for the final revisions of the data mapping to be made during or shortly after the ETL

    development, to reflect any modifications made during this phase.
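    When the ETL is implemented, each target table's mapping worksheet translates naturally into a load
    statement. The sketch below reuses the hypothetical dim_product table and adds a hypothetical source
    table src_item; the transformation comments are exactly the kind of detail the mapping document must
    capture (an Oracle-style sequence is assumed for the surrogate key):

        -- Target-to-source mapping for dim_product (illustrative only)
        INSERT INTO dim_product (product_key, product_code, product_name)
        SELECT product_seq.NEXTVAL,   -- surrogate key generated in the warehouse
               TRIM(s.item_cd),       -- transformation: strip padding from the source code
               UPPER(s.item_desc)     -- transformation: standardize case
        FROM src_item s;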

    Data Marts

    Data marts are the views or tables of data that the users interact with. Data marts are tailored to answer
    specific sets of user queries in order to ensure the best possible performance. This means that data

    marts are typically limited to a single business process or a group of related business processes for a

    single business functional group. This allows some variations on data modeling approaches and physical

    implementation approaches for each data mart with performance tuning in mind. As mentioned above,

    data marts can be either views or tables, or even files, depending on query performance requirements.

    Often, data marts are materialized views that are refreshed with each ETL batch execution. However,

    some data marts are frozen points in time (for example, quarterly data) in order to ensure best possible

    performance. The bottom line is that a data mart is an individually tailored set of data to best serve the

    users accessing it for the types of questions they need answered. A good data warehouse will have many

    data marts specific to identified needs, rather than just piling all the data together to let the users do what
    they want.
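    For example, a data mart can be built as a materialized view that is refreshed at the end of each ETL
    batch. This is a minimal sketch assuming Oracle syntax and the hypothetical star schema above:

        -- A data mart serving one business function, refreshed on demand by the ETL
        CREATE MATERIALIZED VIEW mart_quarterly_sales
        REFRESH COMPLETE ON DEMAND
        AS
        SELECT d.fiscal_quarter,
               p.product_name,
               SUM(f.sales_amount) AS total_sales
        FROM fact_sales f
        JOIN dim_date d    ON d.date_key = f.date_key
        JOIN dim_product p ON p.product_key = f.product_key
        GROUP BY d.fiscal_quarter, p.product_name;

        -- Called by the ETL at the end of each batch (from SQL*Plus or a script):
        -- EXEC DBMS_MVIEW.REFRESH('MART_QUARTERLY_SALES');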

    System Design

    Modular Design

    One critical design best practice in data warehousing is to ensure that the design of all the system

    components is as modular as possible. Data warehouses are complicated systems that interact with

    many other transactional and operational systems. Over the lifespan of a typical data warehouse, source

    systems will be retired and replaced by new sources. In addition, new data marts will be required to ask

    and answer new business questions. New business intelligence tools will be leveraged to analyze and

    model the results. This means that several times during the lifespan of a data warehouse, certain parts of
    the warehouse will be redesigned, replaced, or retired. In order to minimize the impact to the system

    during these changes, a modular design is critical.

    The design should balance the desire for reuse with the expectation of replacement. For example,

    designs should abstract the ETL staging area to allow for a data source to be replaced and minimally

    impact the reports leveraging the data. There are several data modeling techniques that can help
    minimize the impact of change, such as slowly changing dimensions (a sketch follows the list below.)
    Other heuristics include:

    ETL should be done in several legs (acquire from source, process and transform data, integrate

    with other data, load into data marts)

    Data marts should be used to group specific related business functional areas

    Data models should support changes with minimum impact

    One business function per code module
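    The sketch below illustrates one such technique, a Type 2 slowly changing dimension, which preserves
    history by closing the current row and inserting a new one when a tracked attribute changes. It assumes
    the hypothetical dim_product and src_item tables used earlier, extended with effective_date, end_date
    and is_current columns:

        -- Step 1: expire the current dimension row when the tracked attribute changed
        UPDATE dim_product d
           SET end_date = CURRENT_DATE,
               is_current = 'N'
         WHERE is_current = 'Y'
           AND EXISTS (SELECT 1
                         FROM src_item s
                        WHERE TRIM(s.item_cd) = d.product_code
                          AND UPPER(s.item_desc) <> d.product_name);

        -- Step 2: insert a fresh current row for each product expired in step 1
        INSERT INTO dim_product (product_key, product_code, product_name,
                                 effective_date, end_date, is_current)
        SELECT product_seq.NEXTVAL, TRIM(s.item_cd), UPPER(s.item_desc),
               CURRENT_DATE, DATE '9999-12-31', 'Y'
          FROM src_item s
         WHERE EXISTS (SELECT 1
                         FROM dim_product d
                        WHERE d.product_code = TRIM(s.item_cd)
                          AND d.end_date = CURRENT_DATE
                          AND d.is_current = 'N');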


    A/B Data Environments

    There are a few strategies for minimizing the impact of the ETL on end users. Typically, the ETL

    processes run during off hours, and users are either discouraged from or even denied access to the data

    during these times. An A/B switch is a set of identical tables, one available for user access while the other

    is being updated by the ETL. This can be an especially effective strategy for ETL processes that run more
    than once a day. The concept can be implemented in a number of ways, but the idea is fairly simple. The

    ETL updates one set of tables while the users access the other. Then, when the ETL is finished, they

    switch (usually by dropping and recreating synonyms.) This strategy also has the benefit of allowing the

    users to continue to have access to the data in the event of an ETL failure.
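    A minimal version of the switch, assuming Oracle-style synonyms and hypothetical object names, looks
    like this:

        -- Users always query through the synonym; the ETL loads the offline copy.
        -- Suppose users currently read SALES_A while the ETL reloads SALES_B;
        -- when the load completes, repoint the synonym:
        CREATE OR REPLACE SYNONYM sales FOR dw.sales_b;

        -- On the next cycle the ETL reloads SALES_A, then the synonym switches back:
        -- CREATE OR REPLACE SYNONYM sales FOR dw.sales_a;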


    Development

    The development phase of a data warehouse typically involves multiple sub-teams building various

    components of the system. There may be one or more ETL teams, a database team, and one or more BI

    system development teams (for the end user reports and query tools.) Careful planning is required to
    prevent these resources from stepping on each other's toes during component development. Typically,

    data warehouses will have multiple development environments to allow each team to focus on its

    components, without negatively impacting other teams. This adds complexity, however, when the

    underlying architecture components change, since these changes must be tracked and published to the

    many development environments. These complications can be minimized by maintaining clear

    communication channels and architecture oversight, usually in the form of the architecture team.

    Development Environment

    Multiple Dev Databases

    As mentioned above, having several development environments will prevent teams from impacting each

    other. An example would be the ETL team's component tests changing the data while the report

    development team is writing SQL queries. The goal of this best practice activity is to give each team a

    development environment that will minimize the impact of such situations. Depending on the design

    structure of the data models, this can be a single database schema within a development instance, or

    may be a copy of the entire database itself in separate database instances. The design of the

    development environments will depend on balancing available hardware, software, and support resources

    with the needs of the development teams.
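    On a shared development instance, the simplest form of this is one schema per team. The statements
    below are a sketch assuming Oracle syntax and hypothetical account names:

        -- One schema per team within a shared development instance
        CREATE USER etl_dev IDENTIFIED BY etl_dev_pwd;        -- hypothetical credentials
        CREATE USER report_dev IDENTIFIED BY report_dev_pwd;
        GRANT CREATE SESSION, CREATE TABLE, CREATE VIEW TO etl_dev;
        GRANT CREATE SESSION, CREATE TABLE, CREATE VIEW TO report_dev;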

    Architecture Team

    An architecture team is required to mitigate the potential dangers of variations in the architecture and data

    models of multiple development environments. The role of the architecture team is to:

    Control changes to the underlying architecture of the data warehouse

    Brainstorm to resolve architectural issues

    Represent each development team with regard to impacts

    Communicate across development teams

    Plan inter-team activities

    Changes to the logical and physical database structures are especially important, since they will impact

    all teams. The architecture team should consist of members of each development team to review and

    assess the impact of proposed modifications with regard to the team they are representing. They also

    represent the communication channel back to the team.


    In addition to controlling changes, the architecture team should be a conduit for each development team

    to communicate to the other teams. Sharing of issues and resolutions, enhancement ideas, status, and

    coordination of cross team activities (such as the periodic refresh of the dev environments) should be part

    of the discussions of the architecture team.

    Golden Copy

    One of the key activities in minimizing the potential impact of multiple development environments is the

    periodic refresh of those environments with the latest version of the official development environment

    footprint. This official version is often called the golden copy. It is also the version of the development

    environment that will be promoted to the test environments when the time comes to do so. The golden

    copy should be maintained by one or more members of the architecture team. Whenever changes to the

    underlying architecture are made, and these changes pass component (unit) testing, they should be

    integrated into the golden copy. Then, the golden copy should be distributed to the various development

    environments to allow team members to test for impacts to their components and to adjust their code if

    necessary. Each team will then submit tested versions of its components for inclusion into the golden

    copy, from which they can be distributed to the other teams in the next periodic refresh. In addition, the
    golden copy should maintain a full set of source data.

    Full Sets of Source Data

    Acquiring full, up-to-date copies of the data sources is an important key to the success of building and

    testing data warehouse components. It will allow realistic component testing, is a key for performance

    testing, and is also vital for preventing data issues that cause errors in the ETL and Business Intelligence

    queries. In addition, it avoids the need to create test data, which can be extremely time-consuming. It may

    be necessary to update the copies with newer copies if a source system is altered (upgraded) or if the

    overall data changes significantly.

    Any manufactured data created by the development or test teams should not be stored in the golden copy

    of the source data, unless all teams agree.


    Test

    The testing phase of a data warehouse usually takes significantly longer than the testing of an operational

    system. This is due in large part to the need to run multiple ETL cycles in order to fully test a data

    warehouse release. If the initial data load will be implemented through the ETL processes, testing the first
    full run of the ETL can take several days. Similar to development, having a few test environments can

    accelerate the process by allowing some tests to be run concurrently. In addition, the utilization of

    automated testing tools can significantly decrease the time it takes to implement regression tests.

    Test Environment

    Concurrent Testing

    Similar to the best practice of having several development environments, several test environments will

    allow for concurrent testing. The following list is an example of the types of concurrent tests that can be

    run if there are multiple test environments (database instances and schemas):

    Initial ETL Load/Performance Test (tests the performance of the first run of ETL)

    Incremental ETL Tests (both performance and functional tests)

    Business Intelligence Data Mart Tests (tests the business intelligence components)

    User Acceptance Tests

    Typically, the initial load of an ETL process can take from several hours to several days, depending on

    the amount of data processed. For example, an ETL run loading 10 years of data will handle roughly 520
    times the data a weekly incremental ETL run will process (10 years is about 520 weeks). Performance
    tuning changes require that these tests be

    run a few times. This alone is why Performance and Load Testing is usually done in a separate

    environment.

    The incremental ETL testing will require several cycles to ensure data conditions are thoroughly tested

    (e.g. testing inserts, updates, and deletes to source data.) Since these tests are focused on testing

    changes to the data, combining these tests with Business Intelligence report and query tests can extend

    the timeline of both. Typically, a specific ETL process is run to create the data mart environments for the

    Business Intelligence testing processes/environment; these data marts are then left untouched by the

    ETL until the next cycle.
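    A common functional check in these cycles is a reconciliation query run after each incremental ETL
    cycle, comparing source and target. This is a minimal sketch using the hypothetical tables from earlier
    (FROM dual is Oracle syntax):

        -- After an incremental run, current dimension rows should reconcile
        -- with the rows present in the source
        SELECT (SELECT COUNT(*) FROM src_item) AS source_rows,
               (SELECT COUNT(*) FROM dim_product
                 WHERE is_current = 'Y') AS current_dim_rows
        FROM dual;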

    While multiple environments can speed up the testing cycles, care must be taken to ensure that all

    environments are rebuilt with the latest golden copy for each cycle. The golden copy will include the fixes

    to all bugs found in the previous cycle, and each environment will need regression testing to ensure the

    fixes did not break something already tested.


    Finally, it should be noted that the challenges, complexities, and costs of maintaining several test
    environments have encouraged some projects to forgo the time savings and adopt a single-threaded,
    serial approach to testing.

    Regression Testing

    Regression testing is a key component of data warehouse development. In the first release, regression

    testing is used to ensure code stability by testing bug fixes for negative impacts to components that have

    already passed testing. While this is important for the first release, it is critical for the ongoing lifecycle of

    the data warehouse. Subsequent releases will include new source systems, new business intelligence

    tools, and/or new data marts with new ETL-driven data processing. While regression testing is not unique

    to data warehousing, developing a good regression test plan is essential to ensuring that multiple future

    releases are deployed smoothly with little impact to the users. A best practice with regard to regression
    testing is the use of automated testing tools.
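    One simple automated regression check is to snapshot a report query's results into a baseline table
    before a release and diff the two afterwards. The sketch below assumes the hypothetical data mart from
    earlier and a baseline copy of it (MINUS is Oracle's set-difference operator):

        -- Both queries should return no rows for a report that the release
        -- is not expected to affect
        SELECT * FROM mart_quarterly_sales
        MINUS
        SELECT * FROM baseline_quarterly_sales;

        SELECT * FROM baseline_quarterly_sales
        MINUS
        SELECT * FROM mart_quarterly_sales;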

    Automated Tools

    There are several factors in choosing and using automated testing tools for a data warehouse. It should

    be noted that in most scenarios, the initial implementation of automated testing is time consuming. The

    activities involved in setting up the tools, creating the test scripts, and debugging test cycles, as well as

    training testers, can add significant time to the test planning phase of the project. However, over the life of

    a data warehouse, this initial investment will pay off. There are some key features that these tools should

    possess:

    Test script repository

    Test script version control

    Ability to organize and combine test scripts

    Thorough result reporting

    Ability to continue processing after errors are encountered

    Ability to initiate ETL processes

    Ability to query the database

    Ability to interact with Business Intelligence tools

    Intuitive user interface

    Issue tracking


    It may be necessary to use several automated test tools. For example, performance and load testing may

    require a different tool than Business Intelligence report testing. Using a suite of vendor related tools is

    common, and may have the benefit of leveraging a single repository for test scripts and issue tracking.


    Deployment

    Deploying a data warehouse also usually takes longer than deploying an operational system. This is
    especially true for the first release. The initial data load of the system can take several days. In addition,
    the system is usually given a few days to cycle through the ETL to ensure all system integration points are
    correct and no issues arise. The deployment package should include the golden copy and checklists of
    steps to deploy the system, set up scheduled batch jobs, modify permissions and create user accounts on
    both source and target databases, and implement other necessary preparatory actions.

    Production Build

    Deployment Checklist

    The deployment checklist is an essential tool to ensure a smooth deployment. The checklist should be

    developed by cataloguing all activities required to build the test environments. During the final test cycle,

    the test environment should be built from scratch using the deployment checklist in order to test the

    checklist. The checklist is usually a spreadsheet or series of spreadsheets that contain the following

    information:

    Activity name

    Assigned resource

    Step-by-step detailed activity instructions

    Golden copy filenames or component names

    Hardware and software component identifiers (such as server ids and schema names)

    Required parameters (including login ids, etc.)

    Communication directions (notification of completion, errors, etc.)

    Execution or scheduling of initial processes (such as kicking off the ETL)

    Including passwords in the deployment checklist can be a security risk. It is recommended that any
    passwords needed to execute a deployment checklist step be supplied outside of the checklist in a
    secure manner. Another alternative is to utilize temporary passwords during the deployment, with a final
    checklist step to change the temporary passwords.

    The checklist should be owned by the architecture team. While it is recommended that one person

    coordinate the deployment effort through monitoring the checklist activities, it is vital to ensure that more

    than one person be fully knowledgeable on the details of the checklist activities.


    Initial System Burn-in

    A system burn-in period is recommended after the initial deployment of the data warehouse. During this

    burn-in period, the ETL is run for a few days to ensure there are no unexpected problems, such as a

    scheduled backup interfering with the ETL. A few users should be granted access to ensure there are no

    problems with the user-facing business intelligence applications and report data. This is a highly
    recommended best practice activity, because during the first release, something will go wrong. A week of
    burn-in is typical.


    Summary

    Data warehouse projects are similar to many other system development projects, and many of the

    standard project methodology best practices apply. However, data warehouses are typically much larger

    and much more complex than other types of systems, and this complexity brings challenges and risks.
    The underlying theme of many of the best practices listed in this document is to reduce the complexity

    and size of the activities into smaller, more manageable chunks. Using a modular approach to component

    design, a phased approach to scoping, and an iterative approach to analysis and testing can help

    accomplish this goal. A methodical, piece-by-piece approach to building a data warehouse is much more

    manageable than an all-or-nothing approach.


    About Intrasphere

    Intrasphere Technologies, Inc. (www.intrasphere.com) is a consulting firm focused on the Life Sciences

    industry. We provide comprehensive, business-focused services that help companies achieve

    meaningful results. Our professionals leverage strategic acumen, deep industry knowledge and proven
    project execution abilities to deliver superior service that builds true business value.

    Our strategy, business process and technology services are developed to specifically address the areas
    that are most important to our clients, including Drug Safety, Business Intelligence, Enterprise Content
    Management, Compliance and IT Management, to name a few.

    We understand the unique nature of the Life Sciences working environment and clients' need to reduce

    costs, drive business processes and speed-to-market, while satisfying regulatory mandates.

    Some of the world's leading global companies, including Pfizer Inc. (NYSE: PFE), Johnson & Johnson
    (NYSE: JNJ), Novartis (NYSE: NVS), Eli Lilly (NYSE: LLY), Vertex Pharmaceuticals (Nasdaq: VRTX) and
    HarperCollins Publishers (NWS), among others, look to Intrasphere as their trusted solutions partner.

    Founded in 1996, Intrasphere is headquartered in New York City with operations in Europe and Asia.

    Intrasphere has been recognized nationally for performance by industry-leading organizations such as
    Deloitte & Touche, Crain's New York Business and Inc. Magazine.


    For More Information

    Jim Brown

    Intrasphere Technologies

    (212) [email protected]

    Locations

    North America:

    Corporate Headquarters

    New York City

    Intrasphere Technologies, Inc.

    100 Broadway, 10th Floor

    New York, NY 10005

    ph: +1 (212) 937-8200

    fax: +1 (212) 937-8298

    Europe:

    United Kingdom

    4th Floor

    Brook House

    229-243 Shepherd's Bush Road

    Hammersmith, London W6 7NL

    ph: +44 (0) 208 834 3700

    fax: +44 (0) 208 834 3701

    Asia:

    India

    Block 2-A, DLF Corporate Park

    DLF City, Phase III

    Gurgaon, Haryana 122002

    ph: +91 (0124) 4168200

    fax: +91 (0124) 4168201