
DWCMM: The Data Warehouse

Capability Maturity Model

Thesis for the Degree of Master of Science

Author:

Catalina Sacu

Student Number: 3305260

Thesis Number: INF/SCR-09-86

Institute of Information and Computer Sciences

Utrecht University, The Netherlands

Supervisors:

Utrecht University: dr. Marco Spruit (first supervisor), dr. ir. Johan Versendaal (second supervisor)

Inergy: Frank Habers


Abstract

Data Warehouses (DWs) and Business Intelligence (BI) have been part of a very dynamic and popular field of research in recent years, as they help organizations make better decisions and increase their profitability. Unfortunately, many DW/BI solutions fail to bring the desired results, and it is therefore important to have an overview of the critical success factors. However, this is usually very difficult, as a DW/BI project is a very complex endeavour. This research offers a solution to this problem by creating a Data Warehouse Capability Maturity Model (DWCMM) focused on the technical and organizational aspects involved in developing a DW environment. Based on an extensive literature study, the DWCMM consists of a maturity matrix and a maturity assessment questionnaire that analyze the main categories and sub-categories to be addressed when implementing a DW/BI solution. The model and its associated questionnaire can be used to help organizations assess their current DW solution and provide them with guidelines for future improvements. To validate and enrich the theory created, the DWCMM was evaluated empirically through five expert interviews and four case studies. Based on the evaluation results, some minor changes were made to improve the model. The main conclusion of this research is that the DWCMM can be successfully applied in practice and that organizations can use it as a starting point for improving their DW/BI solution.


Acknowledgements

Utrecht, August 2010

I would like to use this opportunity to thank some people who made a significant contribution to this research.

First, I would like to express my gratitude to my supervisors at Utrecht University, dr. Marco Spruit and dr. ir. Johan Versendaal, for their professional guidance and constructive feedback during the project.

Second, I would like to thank my external supervisor Frank Habers for offering me the opportunity to perform my research during an internship at Inergy. He has been extremely helpful in providing prompt access to all the resources I needed for the research and in offering me guidance and advice every time I needed it. He has also helped me arrange several expert interviews for validating my model. I would also like to thank other colleagues from Inergy, especially Rick Tijsen, for their support and enthusiasm during my research there.

Furthermore, I would like to express my gratitude to all the experts and respondents who made room for me in their busy schedules and agreed to review my model and fill in the assessment questionnaire, respectively.

Last, but not least, I would like to thank my parents, my boyfriend and my friends for their constant love, support and welcome distractions when needed.

Catalina Sacu


Table of Contents

1 Introduction
1.1 Problem Definition & Research Motivation
1.2 Research Questions
1.3 Research Approach
2 What are Data Warehousing and Business Intelligence?
2.1 The Intelligent Organization
2.2 Data-Information-Knowledge
2.3 The Origins of DW/BI
2.4 DW/BI Definition
2.5 DW/BI Business Value
2.6 Inmon vs. Kimball
2.7 Maturity Modelling
2.7.1 Data Warehouse Capability Maturity Model (DWCMM) Forerunners
2.8 Summary
3 A Data Warehouse Capability Maturity Model
3.1 From Nolan's Stages of Growth to the Data Warehouse Capability Maturity Model
3.2 DWCMM
4 DW Technical Solution Maturity
4.1 General Architecture and Infrastructure
4.1.1 What is Architecture?
4.1.2 Conceptual Architecture and Its Layers
4.1.3 Infrastructure
4.1.4 Metadata
4.1.5 Security
4.1.6 Business Rules for DW
4.1.7 DW Performance Tuning
4.1.8 DW Update Frequency
4.2 Data Modelling
4.2.1 Data Modelling Definition and Characteristics
4.2.2 Data Models Classifications (Data Models Levels and Techniques)
4.2.3 Dimensional Modelling
4.2.4 Data Modelling Tool
4.2.5 Data Modelling Standards
4.2.6 Data Modelling Metadata Management
4.3 Extract – Transform – Load (ETL)
4.3.1 What is ETL?
4.3.2 Extract
4.3.3 Transform
4.3.4 Load
4.3.5 Manage
4.3.6 ETL Tools
4.3.7 ETL Metadata Management
4.3.8 ETL Standards
4.4 BI Applications
4.4.1 What are BI Applications?
4.4.2 Types of BI Applications
4.4.3 BI Applications Delivery Method
4.4.4 BI Applications Tools
4.4.5 BI Applications Metadata Management
4.4.6 BI Applications Standards
4.5 Summary
5 DW Organization and Processes
5.1 DW Development Processes
5.1.1 DW Development Phases
5.1.2 The DW/BI Sponsor
5.1.3 The DW Project Team and Roles
5.1.4 DW Quality Management
5.1.5 Knowledge Management
5.2 DW Service Processes
5.2.1 From Maintenance and Monitoring to Providing a Service
5.2.2 IT Service Frameworks
5.2.3 DW Service Components
5.3 Summary
6 Evaluation of the DWCMM
6.1 Expert Validation
6.1.1 Expert Review Results and Changes
6.2 Multiple Case Studies
6.2.1 Case Study Approach
6.2.2 Case Overview
6.2.3 Case Studies Results and Conclusions
6.3 Summary
7 Conclusions and Further Research
7.1 Conclusions
7.2 Limitations and Further Research
8 References
Appendix A: DW Detailed Maturity Matrix
Appendix B: The DW Maturity Assessment Questionnaire (Final Version)
Appendix C: DW Maturity Assessment Questionnaire (Redefined Version)
Appendix D: Expert Interview Protocol
Appendix E: Case Study Interview Protocol
Appendix F: Case Study Feedback Template
Appendix G: Paper


List of Figures

Figure 1: IS Research Framework (adapted from (Hevner et al., 2004)).
Figure 2: Information Gap (adapted from (Tijsen et al., 2009)).
Figure 3: The BI Cycle (adapted from (Thomas, 2001)).
Figure 4: The Data-Information-Knowledge-Wisdom Hierarchy (adapted from (Hey, 2004)).
Figure 5: Data Warehouse Capability Maturity Model (DWCMM).
Figure 6: DWCMM Condensed Maturity Matrix.
Figure 7: A Typical DW Architecture (adapted from (Chaudhuri & Dayal, 1997)).
Figure 8: DW Design Process Levels (adapted from (Husemann et al., 2000)).
Figure 9: Star Schema vs. Cube (adapted from (Chaudhuri & Dayal, 1997)).
Figure 10: Case Study Method (adapted from (Yin, 2009)).
Figure 11: Alignment Between Organization A's Maturity Scores.
Figure 12: Alignment Between Organization B's Maturity Scores.
Figure 13: Alignment Between Organization C's Maturity Scores.
Figure 14: Alignment Between Organization D's Maturity Scores.
Figure 15: Benchmarking for Organization A.


List of Tables

Table 1: Differences between operational databases and DWs (adapted from (Breitner, 1997)).
Table 2: Comparison of Essential Features of Inmon's and Kimball's Data Warehouse Models (Breslin, 2004).
Table 3: Overview of Maturity Models.
Table 4: DW General Questions.
Table 5: DW Architecture Maturity Assessment Questions.
Table 6: Infrastructure Maturity Assessment Questions.
Table 7: Business Metadata vs. Technical Metadata (adapted from (Moss & Atre, 2003)).
Table 8: Metadata Management Maturity Assessment Question.
Table 9: Security Maturity Assessment Question.
Table 10: Business Rules Maturity Assessment Questions.
Table 11: Performance Tuning Maturity Assessment Question.
Table 12: Update Frequency Maturity Assessment Question.
Table 13: Data Model Synchronization and Levels Maturity Assessment Questions.
Table 14: Dimensional Modelling Maturity Assessment Questions.
Table 15: Data Modelling Tool Maturity Assessment Questions.
Table 16: Data Modelling Standards Maturity Assessment Questions.
Table 17: Data Modelling Metadata Management Maturity Assessment Questions.
Table 18: Data Quality Maturity Assessment Questions.
Table 19: ETL Complexity Maturity Assessment Question.
Table 20: ETL Management and Monitoring Maturity Assessment Question.
Table 21: ETL Tools Maturity Assessment Question.
Table 22: ETL Metadata Management Maturity Assessment Question.
Table 23: ETL Standards Maturity Assessment Questions.
Table 24: BI Applications Maturity Assessment Question.
Table 25: BI Applications Delivery Method Maturity Assessment Question.
Table 26: BI Tools Maturity Assessment Question.
Table 27: BI Applications Metadata Management Maturity Assessment Question.
Table 28: BI Applications Standards Maturity Assessment Questions.
Table 29: DW Development Processes General Maturity Assessment Question.
Table 30: Project Management Maturity Assessment Question.
Table 31: Requirements Definition Maturity Assessment Question.
Table 32: Testing and Acceptance Maturity Assessment Question.
Table 33: Development/Testing/Acceptance/Production Maturity Assessment Questions.
Table 34: DW/BI Sponsorship Maturity Assessment Question.
Table 35: DW Project Team and Roles Maturity Assessment Question.
Table 36: DW Quality Management Maturity Assessment Question.
Table 37: Knowledge Management Maturity Assessment Question.
Table 38: Overview of IT Service Frameworks.
Table 39: ITIL's Core Components (adapted from (Cater-Steel, 2006)).
Table 40: IT Service CMM's Key Process Areas (adapted from (Paulk et al., 1995)).
Table 41: Maintenance and Monitoring Maturity Assessment Question.
Table 42: Service Quality Management Maturity Assessment Question.
Table 43: Service Level Management Maturity Assessment Question.
Table 44: Incident Management Maturity Assessment Question.
Table 45: Change Management Maturity Assessment Question.
Table 46: Incident Management Maturity Assessment Question.
Table 47: Availability Management Maturity Assessment Question.
Table 48: Release Management Maturity Assessment Question.
Table 49: Expert Overview.
Table 50: Rephrased or Changed Questions and Answers.
Table 51: Case and Respondent Overview.
Table 52: Technologies Usage Overview.
Table 53: Organizations' Maturity Scores.
Table 54: Maturity Scores Analysis.


1 Introduction

In today's economy, organizations are part of a very dynamic environment of continuously changing conditions and relationships. At the same time, the external environment is an important source of information (Aldrich & Mindlin, 1978) that organizations have to gather and process very rapidly in order to maintain their competitive advantage (Choo, 1995). Moreover, as (Kaye, 1996) notes, "organizations must collect, process, use, and communicate information, both external and internal, in order to plan, operate and take decisions." The ongoing pursuit of profits, increasing competition and demanding customers all require organizations to make the best decisions as fast as possible (Vitt et al., 2002). Hence, in order to survive, companies have to adapt to this new information environment by shortening the time between acquiring information and obtaining the right results. One of the solutions that can narrow this time gap and improve the decision-making process is the implementation of Data Warehouses and Business Intelligence (BI) applications.

1.1 Problem Definition & Research Motivation

In today's highly globalized market, the most fundamental aspect of an organization is the critical decision-making capacity of its management, which influences the successful running of business operations. Hence, it is very important for organizations to manage both transaction- and non-transaction-oriented information in order to make timely decisions and react to changing business circumstances (AbuSaleem, 2005). Moreover, in the last couple of years, enterprises have shifted their business focus towards customer orientation to remain competitive. Accordingly, maintaining relationships with clients and managing their data have emerged as top issues for global companies. Also, many researchers have reported that the amount of data in a given organization doubles every five years (AbuSaleem, 2005). In order to process this large amount of data and make the best decisions as fast as possible, the information must be reliable, accurate, real-time and easy to access. To obtain such information, all enterprise-related data should be integrated and appropriately analyzed from a multi-dimensional point of view. The solution for this is a data warehouse (DW).

Over the years, DWs have become one of the foundations of the information systems used to support decision-making initiatives. The new era of enterprise-wide systems integration and the growing need for BI both accelerate the development of DW solutions (AbuAli & Abu-Addose, 2010). Most large companies have already established DW systems as a component of their information systems landscape. According to (Gartner, 2007), BI and DWs are at the forefront of the use of IT to support management decision-making. DWs can be thought of as the large-scale data infrastructure for decision support, while BI can be viewed as the data analysis and presentation layer that sits between the DW and the executive decision-makers (Arnott & Pervan, 2005). In this way, DW/BI solutions can transform raw data into information and then into knowledge.

However, a DW is not only a software package. The adoption of DW technology requires massive capital expenditure and a considerable amount of implementation time. DW projects are hence very expensive, time-consuming and risky undertakings compared with other information technology initiatives, as cited by prior researchers (Wixom & Watson, 2001; Hwang et al., 2004; Mukherjee & D'Souza, 2003; Solomon, 2005). The typical project costs over $1 million in the first year alone (AbuAli & Abu-Addose, 2010), and it is estimated that one-half to two-thirds of all initial DW efforts fail (Hayen et al., 2007). Moreover, (Gartner, 2007) estimates that more than fifty percent of DW projects have limited acceptance or fail. Therefore, it is crucial to have a thorough understanding of the critical success factors and variables that determine the efficient implementation of a DW solution.

These factors can refer to the development of the DW/BI solution or to the usage and adoption of BI. In this master thesis, we will focus on the former, as we consider that it represents the foundation for a solid DW solution that can achieve a high rate of usage and adoption. First, it is critical to properly design and implement the databases that lie at the heart of the DW. The right architecture and design can ensure performance today and scalability tomorrow. Second, all components of the data warehouse solution (e.g. data repository, infrastructure, user interface) must be designed to work together in a flexible, easy-to-use way. A third task is to develop a consistent data model and establish what source data will be extracted and how. In addition to these factors, the DW needs to be created and developed quickly and efficiently so that the organization can gain the business benefits as soon as possible (AbuAli & Abu-Addose, 2010). As can be seen, a DW project can unquestionably be complex and challenging. This is why it is important to gain insight into the technical and organizational variables that determine the successful development of a DW solution and to assess these variables. The main goal of this master thesis is therefore to do so by creating a Data Warehouse Capability Maturity Model (DWCMM) and answering the following main research question:

How can the maturity of a company’s data warehouse technical aspects be assessed and acted upon?

1.2 Research Questions

As stated before, the main goal of this research is to develop a DWCMM that will help organizations assess their current DW solution from both a technical and an organizational/process point of view. In order to do this and address the main research question, several sub-questions have been formulated.

First, we would like to give an overview of the field of BI and DW in order to better understand the context of BI/DW, by answering the first sub-question: What are BI and DWs? Then, we will elaborate on the second important element of our model, the maturity part. We will identify the main characteristics of maturity models and the maturity models most representative for our research by answering the following sub-question: What do maturity models represent and which are the most representative ones for our research? Once we have a general overview of BI/DW and maturity modelling, we can continue with presenting the stages of the DWCMM and the main characteristics of each stage. We will in this way answer the next two sub-questions: What are the most important variables and characteristics to be considered when building a data warehouse? and How can we design a capability maturity model for a data warehouse assessment? Having created and presented the model, we can then apply it as an assessment method at different organizations and see whether it is a viable source of information and which changes need to be made. This will answer the last sub-question: To what extent does the data warehouse capability maturity model result in a successful assessment and guideline for the analyzed organizations?


To summarize, in order to deliver a valid DWCMM, our research aims to answer the following research questions:

Main question:

How can the maturity of a company's data warehouse technical aspects be assessed and acted upon?

Sub-questions:

1) What are business intelligence and data warehouses?
2) What do maturity models represent and which are the most representative ones for our research?
3) What are the most important variables and characteristics to be considered when building a data warehouse?
4) How can we design a capability maturity model for a data warehouse assessment?
5) To what extent does the data warehouse capability maturity model result in a successful assessment and guideline for the analyzed organizations?

1.3 Research Approach

Information systems (IS) are implemented within an organization for the purpose of improving the efficiency and effectiveness of that organization. Hence, the main goal of research in this field is to create "knowledge that enables the application of information technology for managerial and organizational purposes" (Hevner et al., 2004). According to (Hevner et al., 2004), two main paradigms characterize research in the IS discipline: behavioural science and design science. Behavioural science aims to develop and verify theories that explain or predict human or organizational behaviour. The design science paradigm, on the other hand, seeks to extend the boundaries of human and organizational capabilities by creating new and innovative artifacts. As discussed above, the main goal of our research is to develop a DWCMM that depicts the maturity stages of a DW project and can be used to assist organizations in identifying their current maturity stage and evolving to a higher level. For this purpose, a design research approach is used, as its main philosophy is to generate scientific knowledge by building and validating a previously designed artifact (Hevner et al., 2004). In this research, the artifact is the DWCMM, which is developed according to the seven design science guidelines stated by (Hevner et al., 2004) and to the five steps in developing design research artifacts as described by (Vaishnavi & Kuechler, 2008):

- awareness of problem – it can come from multiple research sources. In our case, awareness of the problem area was raised in discussions with DW/BI practitioners and in a literature study on data warehousing and maturity modelling.
- suggestion – it is essentially a creative step wherein new functionality is envisioned based on a novel configuration of either existing or new elements. Before the actual development of the DWCMM, we carried out a thorough literature study, proposed ideas and received suggestions from experts regarding the components of the model and the relationships between them. We also designed an outline framework of the model.
- development – it involves the actual implementation of the model using various techniques depending on the artifact to be constructed. This stage is highly related to the previous one. In our research, it involves the actual creation and presentation of the DWCMM with all its analyzed categories and maturity stages.
- evaluation – it consists of evaluating the constructed artifact according to criteria that are always implicit and frequently made explicit in the awareness step. According to (Hevner et al., 2004), case studies have proved to be an appropriate evaluation method in design research. Therefore, the validation phase in our case consisted of five expert interviews and four case studies. We received a lot of feedback and suggestions from the expert interviews for improving the model. Then, once we had redefined the model, we continued its validation within four organizations following the case study approach of (Yin, 2009).
- conclusion – this phase is the finale of a specific research effort, in which results are summarized, conclusions are drawn and suggestions for further research are discussed.

The way in which our research fits within the IS Research Framework designed by (Hevner et al., 2004) is depicted in the figure below.

Figure 1: IS Research Framework (adapted from (Hevner et al., 2004)).

As we adopted a design science approach for our study, the following structure was chosen for this thesis document. We will first provide some background information on the main concepts of the study in chapter 2. We will then present an overview of the design artifacts of this research in chapter 3. Chapters 4 and 5 offer a detailed analysis of the main components of the model we developed. In chapter 6, results are presented for the two evaluation activities of the model: expert interviews and case studies. Finally, in chapter 7, conclusions are drawn and limitations of this research are discussed along with some points on further research.


2 What are Data Warehousing and Business Intelligence?

In this section, the key background concepts of the study – data warehousing, business intelligence and maturity modelling – will be summarized, and related work will be explored.

2.1 The Intelligent Organization

The ubiquitous complexity, speed of change and uncertainty of the present economic environment confront organizations with enormous challenges (Schwaninger, 2001). For a long time, however, organizations worked in closed settings and saw themselves as fortresses with walls and boundaries that limited their activities and influence (Choo, 1995). Nowadays, this static representation of organizations has become a relic. Today's organizations are complex, open systems that cannot function in isolation from the surrounding dynamic environment. As already discussed in the introduction, the external environment is an important source of information (Aldrich & Mindlin, 1978) that organizations have to gather and process very rapidly in order to maintain their competitive advantage (Choo, 1995). However, information is nowadays being generated at an ever-increasing rate, which makes it very difficult for companies to manage. Decision makers often face an information overload problem, which makes it very hard for them to identify the right information for decision purposes in the available time (O'Reilly, 1980). This causes a so-called "information gap" between the need for fast decision making on one hand, and the longer time needed to acquire the right information on the other (Tijsen et al., 2009). Decision makers therefore need to utilize information management systems and analysis to support their business decisions (Turban et al., 2007). This is where BI/DW can help. As depicted in figure 2, BI helps narrow the information gap by shortening the time required to obtain relevant data and by efficient utilization of the available time to apply information (Tijsen et al., 2009).

Figure 2: Information Gap (adapted from (Tijsen et al., 2009)).

In this way, organizations are not only information consumers, but also creators of information and knowledge (Choo, 1995). This can help them understand and adapt very fast to the changes in their business environment and maintain their competitive advantage. According to (Porter, 1985), an effective competitive strategy requires a deep understanding of the relationship between the firm and its environment. This understanding can be obtained by applying DW/BI as a competitive differentiator, which means using DW/BI not only to get to know your own organization and customers, but also your competitors, thereby gaining competitive intelligence. As J. D. Rockefeller once said, "Next to knowing all about your own business, the best thing to know about is the other fellow's business."

As can be seen in figure 3, there is a whole BI cycle: it starts with planning based on corporate needs; continues with ethically collecting reliable information from valid sources; and then analyzes the data to form intelligence in conjunction with strategic planning and market research. Finally, in order for the intelligence to have value, it must be disseminated in a form that is clear and understandable (Thomas, 2001). BI is a rigorous process in which sources of information, including published information as well as human sources, play a vital role. The BI process existed long before the development of computers and knowledge database software, but those tools have allowed BI to have much greater value in the decision-making process and in the way organizations sustain their competitive advantage.

Figure 3: The BI Cycle (adapted from (Thomas, 2001)).

In this way, organizations can become "intelligent" and stay ahead of change, which according to (Drucker, 1999) is the only way of coping with change effectively. The main characteristics that distinguish intelligent organizations are (Schwaninger, 2001):

- to adapt to change as a function of external stimuli
- to influence and shape their environment
- to find a new milieu, if necessary, or to reconfigure themselves virtuously with their environment
- to make a positive net contribution to the viability and development of the larger environments into which they are embedded.

2.2 Data-Information-Knowledge

As the terms data, information and knowledge have been used in the previous paragraphs and will be used further in this thesis, we would like to give a short overview of each of them and of the differences between them. In everyday writing, the distinction between data and information is not clearly made, and they are often used interchangeably; the same applies to information and knowledge. However, many scientists claim that data, information and knowledge are part of a sequential order (Zins, 2007). Data are the raw material for information, and information is the raw material for knowledge. A well-known representation of the relationships between the three concepts is the "DIKW (Data, Information, Knowledge, Wisdom) Hierarchy". One version of the hierarchy depicts it as a linear chain (Hey, 2004), as can be seen in figure 4. Not all versions of the DIKW model reference all four components (earlier versions do not include data, later versions omit or downplay wisdom), but the main idea is the same. We will only elaborate on the first three concepts here, as they are the most used and acknowledged ones.

Figure 4: The Data-Information-Knowledge-Wisdom Hierarchy (adapted from (Hey, 2004)).

The distinctions and relationships between data, information and knowledge are elaborated in the remainder of this section.

Data

Data has been given a variety of definitions, largely depending on the context of use. For example, Information Science defines data as unprocessed information, while other domains treat data as a representation of objective facts (Hey, 2004). According to (Ackoff, 1989), data is raw: it simply exists and has no significance beyond its existence. Data is acquired from the external world through our senses in the form of signals and signs. Much neural processing has to take place between the reception of a stimulus and its sensing as data by an agent (Kuhn, 1974). In an organizational context, data is usually described as structured records of transactions which are stored in a technology system for different departments such as finance, accounting, sales, etc. (Davenport & Prusak, 2000). Data says nothing about its own importance or irrelevance; nor does it provide judgement, interpretation or a sustainable basis for action. However, it is important to organizations because it is the essential raw material for the creation of information.

Information

Information is data that has been processed, interpreted and given meaning (useful or not) by way of relational connection. It provides answers to "who", "what", "where" and "when" questions. In computer parlance, a relational database makes information from the data stored within it (Ackoff, 1989). According to (Boisot & Canals, 2004), information constitutes those significant regularities residing in the data that agents attempt to extract from it. In order to interpret data into information, a system needs knowledge. The meaning of terms may be different for different people, and it is our knowledge about particular domains – and the world in general – that enables us to get meaning out of these data strings. Hence, for data to become information, interpretation and elaboration processes are required (Aamodt & Nygård, 1995). Computers can assist in transforming data into information, but cannot replace humans. Today's managers believe that having more information technology will not necessarily improve the state of information. To sum up, we could say that "information is data that makes a difference" (Davenport & Prusak, 2000).

Knowledge

Knowledge is broader, deeper and richer than data and information. It is the appropriate collection of information, placed in a certain context with the intent of being useful. Knowledge is a deterministic process and provides answers to "how" questions (Ackoff, 1989). According to (Davenport & Prusak, 2000), knowledge is "a fluid mix of framed experience, values, contextual information and expert insight that provides a framework for evaluating and incorporating new experiences and information". As can be seen, knowledge is not neat or simple to obtain. It can be considered both process and stock, and its creation takes place within and between people. Knowledge allows us to act more effectively than information and data, as it gives us the opportunity to predict future outcomes. We could say that knowledge in a practical sense is "value added information" (Jashapara, 2004) which helps us make better and faster decisions.
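To make the hierarchy concrete, the following minimal Python sketch illustrates the three concepts in a data warehousing spirit; the sales records, field names and the decision rule are invented purely for this example and do not come from the DIKW literature.

```python
# Illustrative only: raw records (data) -> aggregated summary (information)
# -> a context-aware decision rule (knowledge). All figures are invented.
from collections import defaultdict

# Data: raw, unprocessed transaction records, as an operational system stores them.
sales = [
    {"month": "Jan", "region": "North", "amount": 120.0},
    {"month": "Jan", "region": "South", "amount": 80.0},
    {"month": "Feb", "region": "North", "amount": 95.0},
    {"month": "Feb", "region": "South", "amount": 110.0},
]

# Information: data given meaning through relational connection,
# answering "what" happened "when" (revenue per month).
revenue_per_month = defaultdict(float)
for record in sales:
    revenue_per_month[record["month"]] += record["amount"]

# Knowledge: information placed in context to answer a "how" question,
# e.g. how to react to a month-over-month revenue change.
months = list(revenue_per_month)
for prev, curr in zip(months, months[1:]):
    if revenue_per_month[curr] < revenue_per_month[prev]:
        print(f"Revenue fell from {prev} to {curr}: investigate the causes.")
    else:
        print(f"Revenue grew from {prev} to {curr}: sustain current actions.")
```

The point of the sketch is the progression rather than the code itself: the same raw rows acquire meaning only once they are aggregated, and they support action only once a decision context is attached.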

2.3 The Origins of DW/BI

While the term BI is relatively new (in use since the early 1990s), computer-based BI systems go back, in one form or another, more than forty years (Gray & Negash, 2003). Approaches to BI have thus evolved over decades of technological innovation and management experience with IT.

The history of BI systems begins in the mid-1960s, when researchers began systematically studying the use of computerized quantitative models to assist in decision making and planning (Power, 2003). (Ferguson & Jones, 1969) reported the first experimental study using a computer-aided decision system by investigating a production scheduling application. At the same time, organizations were beginning to computerize many of the operational aspects of their business. Information systems were developed to perform operational applications such as order processing, billing, inventory control and payroll (Arnott & Pervan, 2005). Once the importance of the data from the operational processes was acknowledged for the decision-making process, the first Management Information System (MIS) was developed. Another turning point in this field was the work of Morton who, together with Gorry, defined the concept of "Decision Support Systems" (DSS) (Gorry & Morton, 1971). They constructed a framework for improving MIS and conceived DSS as systems that support any managerial activity in decisions that are semistructured or unstructured (Arnott & Pervan, 2005). The aim of DSS was to create an environment in which the human decision maker and the IT-based system worked together in an interactive way to solve problems, the human dealing with the complex unstructured parts of the problem, and the information system providing assistance by automating the structured elements of the decision situation.

The oldest form of DSS was the personal DSS: small-scale systems that were normally developed for one manager, or a small number of independent managers, for one decision task. They effectively replaced MIS as the management support approach of choice, and for around a decade they were the only form of DSS in practice. Starting in the 1980s, many activities associated with building and studying DSS occurred in universities and organizations, which expanded the scope of DSS applications. This led to a broad historical progression and the development of DSS into the following main categories (Arnott & Pervan, 2005):

- Group DSS (GSS) – a group DSS consists of a set of technology and language components and procedures, communication and information processing that support a group of people engaged in a decision-related process (Huber, 1984; Kraemer & King, 1988).
- Negotiation DSS – a negotiation DSS also operates in a group context, but as the name suggests, it involves the application of computer technologies to facilitate negotiations (Rangaswamy & Shell, 1997). As GSS were developed, the need to provide electronic support for groups involved in negotiation problems and processes evolved as a focused sub-branch of GSS with different conceptual foundations to support those needs.
- Executive Information Systems (EIS) – an EIS is a data-oriented DSS that provides reporting about the nature of an organization to management (Fitzgerald, 1992). Despite the 'executive' title, they are used by all levels of management. EIS were enabled by technology improvements in the mid-to-late 1980s, and by the mid-1990s EIS had become mainstream and an integral component of the IT portfolio of any reasonably sized organization.

As the 1990s unfolded, Data Warehousing (DW) and Business Intelligence (BI) emerged and replaced the EIS. We will focus on these two terms in the next section in order to get a better overview of the field that inspired our research.

2.4 DW/BI Definition

The term BI was first introduced by (Luhn, 1958) in his article "A Business Intelligence System". In his view, BI was "the ability to apprehend the interrelationships of presented facts in such a way as to guide actions towards a desired goal". However, the term BI was coined and popularized in the early 1990s by Howard Dresner (a Gartner Group analyst), who described BI as a set of concepts and methods to improve business decision making by using fact-based support systems (Power, 2003). In recent years, a lot of attention has been paid to BI, and many definitions can therefore be found in the literature. Some of the most representative ones are presented below.

According to (Golfarelli et al., 2004), BI can be defined as the process of turning data into information and then into knowledge. A similar view on BI is that of (Eckerson, 2007), who believes that BI represents "the tools, technologies and processes required to turn data into information and information into knowledge and plans that optimize business actions." Furthermore, (Gray & Negash, 2003) consider that BI systems "combine data gathering, data storage and knowledge management with analytical tools to present complex and competitive information to planners and decision makers". We can see from these definitions that BI helps the decision-making process by efficiently and effectively transforming data into knowledge through the use of different analytical tools.

Moreover, the concept of the DW dates back to the late 1980s, when IBM researchers (Devlin & Murphy, 1988) published their article "An Architecture for a Business and Information System" and introduced the term "business data warehouse". However, DW technology and development became popular in the 1990s after (Inmon, 1992) published his seminal book "Building the Data Warehouse". Furthermore, the bull market of the 1990s led to a plethora of mergers and acquisitions and an increasing globalization of the world economy. Large organizations were therefore faced with significant challenges in maintaining an integrated view of their business. This was the environment that drove the increased development and usage of DWs (Arnott & Pervan, 2005).

Similar to BI, many definitions can be found for the DW, but all of them start from the ways in which Inmon and Kimball (the creators of the two main schools of thought and practice within data warehousing) defined it. (Inmon, 1992) defines the DW as a "subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making". (Kimball, 1996) offers a much simpler definition, which provides less insight and depth than Inmon's, but is no less accurate: a DW is "a copy of transaction data specifically structured for query and analysis" (Kimball, 1996). DWs are therefore targeted at decision support, as they collect information about one or more business processes of the whole organization. The DW can be seen as a repository that stores data gathered from many operational databases, and from which the information and knowledge needed to effectively manage the organization emerge.

Typically, the DW is maintained separately from the organization's operational databases, as it supports online analytical processing (OLAP) through a variety of front-end tools such as query tools, report writers, data mining and analysis tools. OLAP functional and performance requirements are quite different from those of the online transaction processing (OLTP) applications traditionally supported by the operational databases. The main differences between operational databases and DWs can be seen in the table below.

Characteristics | Operational Databases | DWs
Source of data | Operational data; OLTP systems are the original source of data | Consolidated data; data comes from various OLTP databases
Number of sources | Few | Many
Size of sources | Gigabyte | Gigabyte-Terabyte
Data content | Current values | Archived, derived, summarized
Purpose of data | Control and run fundamental business tasks | Help planning, problem solving, prediction and decision support
Complexity of transactions | Simple | Complex
Kind of transactions | Static, predefined | Dynamic, flexible
Actuality | Current-valued | Current-valued & historical
Number of users/Frequency | High | Medium/Low

Table 1: Differences between operational databases and DWs (adapted from (Breitner, 1997)).
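To make the contrast in Table 1 concrete, the sketch below runs an OLTP-style point lookup and an OLAP-style aggregation side by side, using Python's built-in sqlite3 module on an in-memory database; the tiny schema and rows are invented for the example, and a real DW would of course hold consolidated, historical data from many source systems.

```python
# Illustrative only: contrasting an OLTP-style lookup with an OLAP-style
# aggregation. The schema and rows are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT,
        region   TEXT,
        year     INTEGER,
        amount   REAL
    );
    INSERT INTO orders VALUES
        (1, 'Acme', 'North', 2009, 120.0),
        (2, 'Acme', 'North', 2010, 95.0),
        (3, 'Beta', 'South', 2010, 110.0);
""")

# OLTP-style transaction: a simple, predefined lookup of a current value,
# as issued by an operational application.
row = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()
print("OLTP lookup:", row)

# OLAP-style query: a flexible aggregation over many (historical) rows,
# summarizing the data for planning and decision support.
for region, year, total in conn.execute(
    "SELECT region, year, SUM(amount) FROM orders GROUP BY region, year"
):
    print(f"OLAP summary: region={region}, year={year}, total={total}")
```

The first query touches one row through a primary key, which is what operational databases are tuned for; the second scans and summarizes many rows, which is the access pattern a DW is structured to serve.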

Now that we have defined both the BI and DW terms, we can see that there is some overlap, but also some differences, between them. In the literature, there has been a debate regarding these two concepts. Some authors believe that BI is the overarching term, with the DW being the central data store foundation, whereas others refer to data warehousing as the overall concept, with the DW databases and BI layers as subset deliverables (Kimball et al., 2008). As (Kimball et al., 2008) and (Inmon, 2005), two of the most notable figures in this field, both consider the DW to be the foundation of BI, we will proceed with this approach in the remainder of this thesis.


2.5 DW/BI Business Value

Since the development of a DW/BI environment is usually a very expensive endeavour, an organization considering such an initiative needs a BI strategy and a business justification to show the balance between the costs involved and the benefits gained. A DW/BI initiative provides numerous benefits: not only tangible benefits such as increased sales volume and profits, but also intangible benefits such as an enhanced organizational reputation. Many of these benefits, especially the intangible ones, are difficult to quantify in terms of monetary value. The real benefit of a DW/BI solution occurs when the created knowledge is actionable. This means that an organization cannot just provide the information factory; it must also have methods for extracting value from that knowledge. This is not a technical issue but an organizational one. To have identified actionable knowledge is one thing; to take the proper action requires a nimble organization with individuals empowered to take that action. Hence, before embarking on building a DW/BI environment, every included DW/BI activity should be accompanied by some strategy to gain business value (Loshin, 2003).

Moreover, although the general benefits of DW/BI initiatives are widely documented, they cannot justify the DW/BI project unless these benefits can be associated with the organization's specific business problems and strategic business goals (Moss & Atre, 2003). Justification for a DW/BI initiative must always be business-driven and not technology-driven. It is very important for such an initiative to have support from top-level management in order to be successful. Therefore, the DW/BI initiative as a whole, and the proposed BI application specifically, should support the strategic business goals. Each proposed BI application must reduce measurable business problems (i.e.: problems affecting the profitability or efficiency of an organization) in order to justify building the application.

Furthermore, the business representative should be primarily responsible for determining the business

value of the proposed DW/BI application. The information technology (IT) department can become a

solution partner with the business representative and can help explore the business problems and define

the potential benefits and costs of the DW/BI solution. IT can also help clarify and coordinate the

different needs of the varied groups of business users in order to develop a solution that will have a higher

rate of adoption.

With the business representative leading the business case assessment effort, IT staff can assist with the four business justification components (Moss & Atre, 2003):
- Business drivers – Identify the business drivers, strategic business goals, and DW/BI application objectives. Ensure that the DW/BI solution objectives support the strategic business goals.
- Business analysis issues – Define the business analysis issues and the information needed to meet the strategic business goals by stating the high-level information requirements for the business.
- Cost-benefit analysis – Estimate the benefits and costs of building and maintaining a successful BI decision-support environment. Determine the return on investment (ROI) by assigning monetary value to the tangible benefits and highlighting the positive impact the intangible benefits will have on the organization.
- Risk assessment – Assess the risks in terms of technology, complexity, integration, organization, project team, user adoption and financial investment.


As can be seen from this paragraph, it is very important to have synchronization between the business goals of an organization and the technical DW/BI solution. The business part has to be the driver for building the technical application. However, due to time constraints and the fact that there are previous DW/BI maturity models that also focus on the business side of the problem (Watson et al., 2001; Eckerson, 2004; Hostmann, 2007), in this thesis we will focus on the technical aspects and the organizational processes and roles involved in developing a DW/BI solution. Once the business goals and strategy are clearly defined, it all comes down to being able to develop and maintain a solid technical solution.

2.6 Inmon vs. Kimball

As mentioned in the paragraphs above, there are two different fundamental approaches to data warehousing: enterprise-level data warehouses (Inmon, 1992) and division or department-level data marts (Kimball, 1996). Understanding the basics of the architecture and methodology of both models provides a good foundational knowledge of data warehousing. Based on this and an organization's specific needs, architects can then choose between Inmon's, Kimball's or a hybrid architectural model.

Inmon sees the DW as part of a much larger information environment, which he calls the Corporate Information Factory (CIF). To ensure that the DW fits well in this larger environment, he advocates the construction of both an atomic DW and departmental databases. Inmon's approach stresses top-down development using adaptations of proven database methods and tools. He proposes a three-level data model (Breslin, 2004). The first level is represented by entity relationship diagrams (ERDs); the second level establishes the data item set (DIS) for each department; and the third level is the physical model, created "by merely extending the mid-level data model to include keys and physical characteristics" (Inmon et al., 2005). As can be seen, Inmon's approach is evolutionary rather than revolutionary. His tools and methods can actively be used only by IT professionals, whereas end users have a more passive role in the DW development, mostly receiving the results generated by the IT professionals.

On the other hand, Kimball proposes a bottom-up approach by first building one data mart per business process and then creating the organization's DW as the sum of all data marts. The interoperability between the various data marts is ensured by the data bus, which requires that all data marts are modeled within consistent standards called conformed dimensions. Kimball proposes a unique four-step dimensional design process that consists of: selecting the business process; declaring the grain (i.e.: the level of detail) of the DW; choosing the dimensions; and identifying the facts. Fact tables contain metric data, while dimension tables provide the context of the facts. Dimensional modelling has a series of advantages such as understandability, query performance and extensibility to accommodate new data (Kimball et al., 2008). Dimensional modelling tools can be used by end users with some special training, which ensures the active involvement of end users in the development of the DW (Breslin, 2004).
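To make the four-step process more concrete, the following minimal Python sketch models a hypothetical retail sales process; all table and field names are illustrative assumptions on our part, not taken from this thesis or from Kimball's texts.

```python
from dataclasses import dataclass

# Kimball's four design steps, applied to an assumed retail sales example:
# 1) Select the business process: retail sales.
# 2) Declare the grain: one fact row per product per receipt line.
# 3) Choose the dimensions: date and product (store, promotion, etc. omitted).
# 4) Identify the facts: quantity sold and sales amount.

@dataclass
class DateDim:
    """Conformed dimension: defined identically across all data marts."""
    date_key: int
    calendar_date: str
    month: str

@dataclass
class ProductDim:
    product_key: int
    name: str
    category: str

@dataclass
class SalesFact:
    """Fact table row: foreign keys to dimensions plus numeric measures."""
    date_key: int        # references DateDim.date_key
    product_key: int     # references ProductDim.product_key
    quantity: int        # fact (measure)
    sales_amount: float  # fact (measure)

# Example row at the declared grain:
row = SalesFact(date_key=20100815, product_key=42, quantity=3, sales_amount=29.85)
```

Because the date dimension is conformed, any other data mart (e.g. inventory) reusing DateDim can be combined with sales along the data bus.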

Inmon's and Kimball's models are similar in some ways, such as the treatment of the time attribute or the extract-transform-load (ETL) process, but they are also very different regarding other aspects such as the development methodologies and architecture, data modelling and philosophy. A summary of these differences is depicted in table 2, adapted from (Breslin, 2004).


| Inmon | Kimball

Methodology and Architecture
Overall approach | Top-down | Bottom-up
Architectural structure | Enterprise-wide (atomic) data warehouse is the foundation for data marts | Data marts model a single business process; enterprise consistency is achieved through the data bus and conformed dimensions
Complexity of the method | Quite complex | Fairly simple
Comparison with established development methodologies | Derived from the spiral methodology | Four-step process, inspired by relational database methods
Consideration of the physical design | Fairly thorough | Fairly light

Data Modelling
Data orientation | Subject- or data-driven | Process-oriented
Tools | Traditional (ERDs, DISs) | Dimensional modelling
End-user acceptability | Low | High

Philosophy
Primary audience | IT professionals | End users
Place in the organization | Integral part of the Corporate Information Factory (CIF) | Transformer and retainer of operational data
Objective | Deliver a sound technical solution based on proven database methods and technologies | Deliver a solution that makes it easy for end users to directly query the data and get good response times

Table 2: Comparison of Essential Features of Inmon's and Kimball's Data Warehouse Models (Breslin, 2004).

The model that we are developing in this thesis can be applied to both Inmon's and Kimball's conceptual views on DW development, but some specific aspects of the data modelling assessment are limited to dimensional modelling. The reasons for this include both time constraints and the fact that most DWs developed in practice make use of this technique, especially for data marts and for models presented to users. For more information on the two data modelling techniques, see 4.2.2 and 4.2.3.

2.7 Maturity Modelling

As the main goal of our research is to develop a Data Warehouse Capability Maturity Model, we will now give an overview of the subject of maturity modelling and take a look at the maturity models that served as a source of inspiration for our endeavour.

In today's highly competitive environment, it is very important for organizations to be aware of their current situation and to know the steps they need to take for continuous improvement. This requires positioning the company with regard to its IT capabilities and the quality of its products and processes. This positioning usually involves a comparison with the company's goals, external requirements (e.g.: customer demands, laws or guidelines), or benchmarks. However, an objective assessment of a company's position often proves to be a difficult task. Maturity models can be helpful in this situation. They essentially describe the development of an entity over time, where the entity can be anything of interest: a human being, an organizational function, an organization, etc. (Klimko, 2001).

Maturity models can be used as an evaluative and comparative basis for organizational improvement (de Bruin et al., 2005), and to derive an informed approach for increasing the capability of a specific area within an organization (Hakes, 1996). They usually have a number of sequentially ordered levels, where the bottom stage stands for an initial state that can be, for example, characterized by an organization


having few capabilities in the domain under consideration. In contrast, the highest stage represents a concept of total maturity. Advancing on the evolution path between the two extremes involves a continuous progression of the organization's capabilities or process performance. The maturity model serves as an assessment of the position on this evolution path, as it offers a set of criteria and characteristics that need to be fulfilled in order to reach a particular maturity level (Becker et al., 2009).

During a maturity appraisal, which can be done using predetermined procedures such as questionnaires, a snapshot of the organization with regard to the given criteria is made (i.e.: a descriptive model). Based on the results of this as-is analysis, recommendations for improvement measures can be derived and prioritized in order to reach higher maturity levels (i.e.: a prescriptive model). Then, once the model has been applied in a wide range of organizations, similar practices across organizations can be compared in order to benchmark maturity within and across industries (i.e.: a comparative model).

2.7.1 Data Warehouse Capability Maturity Model (DWCMM) Forerunners

Studies have shown that more than one hundred and fifty maturity models have been developed (de Bruin et al., 2005), but only some of them have managed to gain global acceptance. Also, there are several information technology and/or information system maturity models dealing with different aspects of maturity: technological, organizational and process maturity. The most important maturity models that served as a source of inspiration for our research can be seen in table 3 and are briefly presented in the following paragraphs.

Authors | Model | Focus
Nolan (1973) | Stages of Growth | IT growth inside an organization
Software Engineering Institute (SEI) (1993) | Capability Maturity Model (CMM) | Software development processes
Watson, Ariyachandra & Matyska (2001) | Data Warehousing Stages of Growth | Data warehousing
Chamoni & Gluchowski (2004) | Business Intelligence Maturity Model | Business intelligence
The Data Warehousing Institute (TDWI) (2004) | Business Intelligence Maturity Model | Business intelligence
Gartner – Hostmann (2007) | Business Intelligence and Performance Management Maturity Model | Business intelligence and performance management

Table 3: Overview of Maturity Models.

Nolan's Stages of Growth

First, one of the most widely used concepts in organizational and IS research is the "stages of growth". The fundamental belief is that many things change over time in sequential, predictable ways. The stages of growth are commonly depicted graphically using an S-shaped curve, where the turnings of the curve mark important transitions. The number of stages varies with the phenomena under investigation, but most models have between three and six stages (Watson et al., 2001). One of the most famous "stages of growth" maturity models is Richard Nolan's, published in (Nolan, 1973). The model has been widely recognized and used by practitioners and researchers alike. It is based on companies' spending on electronic data processing, but it can be expanded to the general approach to IT in an organization. Nolan's initial model describes four distinct stages: initiation, expansion, formalization and maturity. In 1979, Nolan transformed the original four-stage model into a six-stage model by adding two new stages:


integration and data administration, which were placed between the stages of formalization and maturity. For a more detailed and also critical analysis of the Nolan curve, see (Galliers & Sutherland, 1991).

Capability Maturity Model (CMM)

The second classical maturity model is the Capability Maturity Model (CMM), developed at the end of the eighties by Watts Humphrey and his team at the Software Engineering Institute (SEI) at Carnegie Mellon University. The CMM is a framework that describes the key elements of an effective software process and presents an evolutionary improvement path from an ad-hoc, immature process to a mature, disciplined one. It covers practices for planning, engineering and managing software development and maintenance. The components of the CMM include (Paulk et al., 1995):
- five maturity levels – initial, repeatable, defined, managed and optimizing;
- process capabilities – describe the range of expected results that can be achieved by following a software process;
- key process areas – components of the maturity levels that identify a cluster of related activities that, when performed collectively, achieve a set of goals considered important for establishing process capability at that maturity level;
- goals – summarize the key practices of a key process area;
- common features – indicate whether the implementation and institutionalization of a key process area is effective, repeatable, and lasting;
- key practices – each key process area is described in terms of key practices that, when implemented, help to satisfy the goals of that key process area.

Furthermore, a number of maturity models have been developed for assessing the maturity of BI and DW solutions.

Data Warehousing Stages of Growth

The "Data Warehousing Stages of Growth" model was adapted from Nolan's growth curve. It includes three stages that describe the evolution of DWs:
- initiation – the initial version of the warehouse;
- growth – the expansion of the warehouse;
- maturity – the warehouse becomes fully integrated into the company's operations;
and nine variables that describe the different stages: data, architecture, stability of the production environment, warehouse staff, users, impact on users' skills and jobs, applications, costs and benefits, and organizational impact (Watson et al., 2001). However, this model has its limitations: as a generalization, it does not describe every company's experiences perfectly. Also, the model is several years old and new developments have since occurred that point to additional stages.

BI Maturity Model (biMM)

Another interesting model is the "Business Intelligence Maturity Model" developed by (Chamoni & Gluchowski, 2004); however, both the model and the paper are in German, which makes it rather difficult for non-German speakers to understand their content. It comprises five levels of evolutionary BI


development, analyzed from three perspectives: business content, technology and organizational impact. Different aspects of these perspectives are recorded and evaluated for each of the five stages. The model has been applied in different organizations in order to do BI benchmarking in specific industrial sectors and offer general strategic recommendations.

TDWI's BI Maturity Model

Another famous BI maturity model is the one developed by The Data Warehousing Institute (TDWI) (Eckerson, 2004). It is a six-stage model that shows the trajectory most organizations follow when evolving their BI infrastructure. The maturity stages are: prenatal, infant, child, teenager, adult and sage. They are defined by a number of characteristics including scope, analytic structure, executive perceptions, types of analytics, stewardship, funding, technology platform, change management and administration. In 2009, TDWI published a poster with a more complex BI maturity model, which can be considered a generalization of multiple BI projects and implementations indicating certain patterns of behaviour based on five different aspects: BI adoption, organization control and processes, usage, insight and return on investment (ROI). In order to give more value to the model, TDWI also created an assessment questionnaire with questions on funding, value, architecture, data, development and delivery that can be filled in by different organizations so that some BI benchmarking can be done.

Gartner's BI and Performance Management (PM) Maturity Model

The last model that we would like to present is the Gartner Group's BI and Performance Management (PM) Maturity Model (Hostmann, 2007). This model helps an organization understand its current position with regard to BI and what it needs to do to move to the next level. Gartner bases its maturity curve on the real-world phenomenon that organizational change is usually incremental over time and proposes five maturity stages: unaware, tactical, focused, strategic and pervasive. An important discovery in their analysis is that one characteristic was more likely than any other to indicate whether an organization is capable of operating at the higher levels of BI/PM maturity: its implementation of a BI Competency Center (BICC), or its lack thereof. A BICC is a group of business, IT, and information analysts who work together to define BI strategies and requirements for the entire organization.

2.8 Summary

This chapter has presented information on the key background concepts for our thesis – data warehousing, business intelligence and maturity modelling. We first talked about the "intelligent organization" and how DW/BI solutions can help companies improve their performance. Then, we gave a short overview of DW/BI evolution and defined the two concepts. Emphasis was then put on the fact that a DW/BI initiative must always be business-driven and not technology-driven in order to be successful. We continued by presenting the two main conceptual approaches to data warehousing (i.e.: Inmon vs. Kimball). Finally, we provided some information on maturity modelling and the main maturity models that served as a foundation for the artifact we designed. We will continue with an overview of the model we developed in chapter 3.


3 DWCMM: The Data Warehouse Capability Maturity Model

This section describes in detail the deliverables proposed as a solution to the research problem.

3.1 From Nolan's Stages of Growth to the Data Warehouse Capability Maturity Model

As presented in the previous paragraphs, many maturity models have been developed across different fields, and several have been proposed for the field of DW/BI. Each of them has a different way of assessing maturity, but there are some common elements across all the models.

First of all, Nolan's "stages of growth" was a breakthrough in organizational and IS research (Nolan, 1973). It shows the growth and evolution of information technology (IT) in a business or similar organization from stage 1, called "initiation", to the last stage, called "maturity". The second maturity model, which was actually the starting point for this thesis, is the CMM (Paulk et al., 1995). It has become a recognized standard for rating software development organizations, describing the key elements of an effective software process and an evolutionary improvement path from an ad-hoc, immature process to a mature, disciplined one. Since its development, the CMM has become a universal model for assessing software process maturity. Therefore, we decided to use it as the main foundation for our model. However, the CMM has often been criticized for its complexity and difficulty of implementation. That is why we simplified it, keeping the five maturity levels (i.e.: initial, repeatable, defined, managed and optimizing), the process capabilities and the key process areas, which in our model translate to the chosen benchmark variables/categories for doing the DW maturity assessment.

As DW/BI is widely applied in practice, several maturity models have been developed especially for this field, as already presented. One of the most recent and famous models is the one developed by TDWI (Eckerson, 2004). Another interesting model is Gartner's BI and PM maturity model (Hostmann, 2007). They both show the trajectory that most organizations follow when evolving their BI or PM infrastructure. However, even if both models are interesting, they are not grounded in scientific literature and they focus more on the business side of BI implementation than on the technical aspects of a DW project. Furthermore, even if the other two models, the DW stages of growth (Watson et al., 2001) and the BI maturity model (Chamoni & Gluchowski, 2004), have more scientific roots, they have their deficiencies. As mentioned before, the latter is in German, whereas the former is several years old and new developments have since occurred that point to additional stages. Although both models cover more variables involved in DW/BI development, they do not go deep into analyzing the technical aspects of a DW solution.

Therefore, it can be seen that even though DW/BI solutions are often implemented in practice and many maturity models have been created, none actually focuses on the technical aspects of the DW/BI solution and the organizational processes that sustain them. Hence, this is the research gap we would like to fill by developing a Data Warehouse Capability Maturity Model (DWCMM) that focuses on the DW technical solution and the DW organization and processes. A short overview of the model is given in the next paragraph, and more details on each component will be given in the upcoming chapters.


3.2 DWCMM

Using the CMM as the main foundation, together with the other maturity models described above and a thorough and extensive literature study, we developed the DWCMM, which is depicted in figure 5.

Figure 5: Data Warehouse Capability Maturity Model (DWCMM).

When analyzing the maturity of a DW solution, we are actually taking a snapshot of an organization at the current moment in time. Therefore, in order to do a valuable assessment, it is important to include in the maturity analysis the most representative dimensions involved in the development of a DW solution. Several authors describe the main phases usually involved in a DW project lifecycle as (Kimball et al., 2008; Moss & Atre, 2003; Ponniah, 2001): project planning and management, requirements definition, design, development, testing and acceptance, deployment, and growth and maintenance. All of these phases and processes refer to the implementation and maintenance of the actual DW technical solution, which includes: general architecture and infrastructure, data modelling, ETL, and BI applications. These categories can be analyzed from many points of view, which will be reflected in our model and the maturity assessment we developed. Therefore, as the DWCMM is restricted to assessing the technical aspects, without taking into consideration the DW/BI usage and adoption or the DW/BI business value, it considers two main benchmark variables/categories for analysis, each of them having several sub-categories:

DW Technical Solution
- General Architecture and Infrastructure
- Data Modelling
- Extract-Transform-Load (ETL)
- BI Applications

DW Organization & Processes
- Development Processes
- Service Processes.


In order to be able to do the assessment for each of the chosen categories and sub-categories, we also developed a DW maturity assessment questionnaire. It is important to emphasize that the questionnaire does a high-level assessment of an organization's DW solution and is limited strictly to the DW technical aspects. Emphasis should also be put on the fact that the model assesses "what" and "if" certain characteristics and processes are implemented, and not "how" they are implemented. It is a practical solution, as it takes less than an hour to fill in the questions that will be scored, and it is addressed to someone from the DW team who has knowledge and experience in all the categories included in the DWCMM (e.g.: DW technical architect, BI project manager, BI manager, BI consultant, etc.). However, although it may be tempting to use the scores from the assessment questionnaire as a definitive statement of the organization's DW maturity, this should be avoided. The maturity score is just a rough gauge that merely scratches the surface of most DW projects. That is why the maturity assessment we developed should serve as a starting point. To truly assess the technical maturity and discover the areas of strength and weakness, organizations should perform a more thorough analysis for each benchmark category.

The DW maturity assessment questionnaire has 60 questions divided into the following three categories:

DW General Questions (9 questions) – comprises several questions about the DW/BI solution that are not scored. Their purpose is to offer a better image of the drivers for implementing the DW environment, the budget allocated for data warehousing and BI, the DW business value, end-user adoption, etc. This is useful in creating a complete picture of the current DW solution and its maturity. Also, once the questionnaire has been filled in by more organizations, this data will serve as input for statistical analysis and comparisons between organizations from the same industry or across industries. The questions from this category can be seen in the table below.

1) Could you elaborate on the main drivers for implementing a BI/DW solution in your organization?
2) How long has your organization been using BI/DW?
3) Could you elaborate on the success of the BI/DW solution in your organization, in terms of:
   a) Returns vs. Costs
   b) Time (Intended vs. Actual)
   c) Quality
   d) End-user adoption.
4) Which answer best describes how executives perceive the purpose of your organization's BI/DW environment?
   a) Operational cost center – An IT system needed to run the business
   b) Tactical resource – Tools to assist decision making
   c) Mission-critical resource – A system that is critical to running business operations
   d) Strategic resource – Key to achieving performance objectives and goals
   e) Competitive differentiator – Key to gaining or keeping customers and/or market share.
5) What percentage of the annual IT budget for your organization does the BI/DW budget represent?
6) What percentage of the IT department is dedicated to BI (i.e.: how many people out of the total number of IT employees)?
7) Who is the budget owner of the BI/DW solution in your organization (i.e.: who is responsible for paying the invoice)?
8) Which technologies do you use for developing the BI/DW solution in your organization (i.e.: for data modelling, ETL, BI applications, database)?
9) What data modelling technique do you use for your BI/DW solution (e.g.: dimensional modelling, normalized modelling, data vault, etc.)?

Table 4: DW General Questions.

DW Technical Solution (32 questions) – comprises several scored questions for each of the following sub-categories:
- General Architecture and Infrastructure (9 questions)
- Data Modelling (9 questions)
- ETL (7 questions)
- BI Applications (7 questions).
More details on this part will be given in chapter 4.

DW Organization & Processes (19 questions) – comprises several scored questions for each of the following sub-categories:
- Development Processes (11 questions)
- Service Processes (8 questions).
More details on this part will be given in chapter 5.

The whole DW maturity assessment questionnaire is shown in appendix B. Each question from the questionnaire has five possible answers, which are scored from 1 to 5, 1 being characteristic of the lowest maturity stage and 5 of the highest one. When an organization takes the survey, it receives:
- a maturity score for each sub-category, computed as the average value of the weightings (i.e.: sum of the weightings / number of questions);
- an overall score for each of the two main categories, computed as the average value of the scores obtained for each sub-category;
- an overall maturity score, computed by applying the same principle to the two main category scores.
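To illustrate the arithmetic, here is a minimal Python sketch of this scoring scheme; the answer values are hypothetical and the function and variable names are our own, not part of the questionnaire. The question counts follow the categories listed above.

```python
# Hypothetical answers: each sub-category maps to the 1-5 scores of its questions.
answers = {
    "Architecture":          [3, 4, 2, 3, 3, 4, 2, 3, 3],            # 9 questions
    "Data Modelling":        [2, 3, 3, 2, 4, 3, 2, 3, 3],            # 9 questions
    "ETL":                   [3, 3, 4, 2, 3, 3, 2],                  # 7 questions
    "BI Applications":       [4, 3, 3, 4, 2, 3, 3],                  # 7 questions
    "Development Processes": [2, 2, 3, 2, 3, 2, 3, 2, 2, 3, 2],      # 11 questions
    "Service Processes":     [2, 3, 2, 2, 3, 2, 2, 3],               # 8 questions
}
categories = {
    "DW Technical Solution": ["Architecture", "Data Modelling", "ETL", "BI Applications"],
    "DW Organization & Processes": ["Development Processes", "Service Processes"],
}

def mean(values):
    return sum(values) / len(values)

# Sub-category score: average of the question weightings.
sub_scores = {sub: mean(scores) for sub, scores in answers.items()}

# Main category score: average of its sub-category scores.
cat_scores = {cat: mean([sub_scores[s] for s in subs]) for cat, subs in categories.items()}

# Overall maturity score: average of the two main category scores.
overall = mean(list(cat_scores.values()))
print(sub_scores, cat_scores, round(overall, 2))
```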

We believe that the maturity scores for the sub-categories give a good overview of the current DW solution implemented by the organization. This is the reason why, after computing the maturity scores for each sub-category, a radar graph, like the one depicted in figure 5, is drawn to show the alignment between these scores. In this way, the organization has a clearer image of their current DW project and knows which sub-category is the strongest and which one lags behind.
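As an illustration, such a radar graph could be drawn, for example, with matplotlib; the scores below are hypothetical and the thesis does not prescribe a plotting tool, so this is only a sketch.

```python
import math
import matplotlib.pyplot as plt

# Sub-category maturity scores from a hypothetical assessment.
labels = ["Architecture", "Data Modelling", "ETL", "BI Applications",
          "Development Processes", "Service Processes"]
scores = [3.0, 2.8, 2.9, 3.1, 2.4, 2.4]

# Close the polygon by repeating the first point.
angles = [n * 2 * math.pi / len(labels) for n in range(len(labels))] + [0]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.set_ylim(0, 5)  # the maturity scale runs from 1 to 5
plt.show()
```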

An important point here is that the answer options are usually mingled in order to get a more unbiased result. Some questions have their answers given in hierarchical order because, in order to get to a higher maturity level, the organization should already have implemented the requirements found in the previous stages. As will be seen in the validation chapters, the model was tested in several organizations with all the answers mingled. However, this created confusion for some of the questions (especially the ones from the service processes part), and therefore we decided to keep some answers in hierarchical order, assuming that every respondent wants to get a fair result and will not offer biased answers.

After reviewing the maturity scores and the answers given by a specific organization, some general feedback and advice for future improvements is provided. Each organization that takes the assessment receives a document with a short explanation of the scoring method, a table with their maturity scores and the radar graph, followed by some general feedback that consists of:
- a general overview of the maturity scores;
- an analysis of the positive aspects already implemented in the DW solution;


- several steps that the organization should take in order to improve their current DW application.

A template of this document can be seen in appendix F. Moreover, as our model measures the maturity of a DW solution, we also created two maturity matrices – a condensed maturity matrix and a detailed one – each of them having five maturity stages, as inspired by the CMM:
- Initial (1)
- Repeatable (2)
- Defined (3)
- Managed (4)
- Optimized (5),
where the initial stage describes an incipient DW development and the optimized level shows a very mature solution, obtained by an organization with a lot of experience in the field, where everything is standardized and monitored. An organization is situated exactly on one of these stages if its score matches the number of the stage (i.e.: 1, 2, 3, 4 or 5), and somewhere in between otherwise. However, this mapping is only an approximation.

The condensed DW maturity matrix gives a short overview of the most important characteristics of each sub-category for each maturity level. This offers a better image of the main goal of the DWCMM and of what the detailed maturity matrix entails. The condensed maturity matrix can be seen in figure 6.

Benchmark variables and stages: Initial (1), Repeatable (2), Defined (3), Managed (4), Optimized (5).

DW Technical Solution

Architecture:
- Initial (1) – Desktop data marts
- Repeatable (2) – Independent data marts
- Defined (3) – Independent data warehouses
- Managed (4) – Central DW with/without data marts
- Optimized (5) – DW/BI service that federates a central DW and other sources via a standard interface

Data Modelling:
- Initial (1) – No data model synchronization or standards
- Repeatable (2) – Manually synchronized data models
- Defined (3) – Manually or automatically synchronized data models
- Managed (4) – Automatic synchronization of most data models
- Optimized (5) – Enterprise-wide standards and automatic synchronization of all the data models

ETL:
- Initial (1) – Simple ETL with no standards that just extracts and loads data into the DW
- Repeatable (2) – Basic ETL with simple transformations
- Defined (3) – Advanced ETL (e.g.: slowly changing dimensions manager, data quality system, reusability, etc.)
- Managed (4) – More advanced ETL (e.g.: hierarchy manager, special dimensions manager, etc.)
- Optimized (5) – Optimized ETL for real-time DW with all the standards defined

BI Applications:
- Initial (1) – Static and parameter-driven reports
- Repeatable (2) – Ad-hoc reporting; OLAP
- Defined (3) – Dashboards & scorecards
- Managed (4) – Predictive analytics; data & text mining
- Optimized (5) – Closed-loop & real-time BI applications

DW Organization & Processes

Development Processes:
- Initial (1) – Ad-hoc, non-standardized development processes or defined phases
- Repeatable (2) – Some development processes policies and procedures established, with some phases separated
- Defined (3) – Standardized development processes with all the phases separated and all the roles formalized
- Managed (4) – Quantitative development processes management
- Optimized (5) – Continuous development processes improvement

Service Processes:
- Initial (1) – Ad-hoc, non-standardized service processes
- Repeatable (2) – Some service processes policies and procedures established
- Defined (3) – Standardized service processes with all the roles formalized
- Managed (4) – Quantitative service processes management
- Optimized (5) – Continuous service processes improvement

Figure 6: DWCMM Condensed Maturity Matrix.

However, as already mentioned, more important is the detailed DW maturity matrix, which can be seen in appendix A and of which we give a short overview in this paragraph. First, the characteristics for each maturity stage are usually obtained by mapping the corresponding answers of each question from the maturity assessment questionnaire (except for several characteristics, such as project management and testing and acceptance, whose answers are formulated in a different way). In this way, an organization is able to see its maturity stage by category (e.g.: architecture) and by main category characteristics (e.g.: metadata, standards, infrastructure, etc.). The matrix has two dimensions:

dimensions:

columns – show each benchmark sub-category (i.e.: Architecture, Data Modelling, ETL, BI

Applications; Development Processes, Service Processes) with their maturity stages from Initial

(1) to Optimized (5).

rows – show the main analyzed characteristics (e.g.: for Architecture – conceptual architecture,

business rules, metadata, security, data sources, performance, infrastructure, update frequency)

for each sub-category divided by maturity stage.

The matrix can be interpreted in two ways:
1) Take each stage and see what the specific characteristics of each sub-category are for that particular stage.
2) Take each sub-category and see what its specific characteristics are for each stage or for a particular stage.

As the developed questionnaire does an assessment for each benchmark sub-category, a specific organization will most likely follow the second interpretation. They would probably like to know what steps to take in order to improve each sub-category, and hence the overall maturity score, which will lead to a higher maturity stage. It is also very unlikely that an organization will have, at the same moment in time, all the characteristics for all the sub-categories on the same maturity stage. However, the first interpretation does not have to be followed so strictly. After all, this is only a model and the mapping between theory and reality is not perfect. Therefore, if a company gets a maturity score of 3, this does not mean that all the characteristics for all the sub-categories are on stage three. Depending also on the


standard deviation and the answers themselves, we can find out more about the actual situation. This is why we believe that the second interpretation is more useful, and we will exemplify it here for general architecture and infrastructure.

The main characteristics for general architecture and infrastructure evaluated in our model are: conceptual

architecture, business rules, metadata management, security, data sources, infrastructure, performance,

and update frequency.

The maturity stages for conceptual architecture have the following structure:
- Initial (1) – desktop data marts (e.g.: Excel sheets)
- Repeatable (2) – multiple independent data marts
- Defined (3) – multiple independent data warehouses
- Managed (4) – a single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
- Optimized (5) – a DW/BI service that federates a central DW and other data sources via a standard interface.

Therefore, if an organization scores 3 for this specific characteristic, we would advise it to reconsider its architecture and perhaps go one step further and implement a single, central DW. In this way, it could reach maturity stage four for this specific characteristic, which would be the first step towards a higher overall maturity score. The same interpretation can be applied when analyzing any characteristic of architecture or of any other benchmark category.
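As an illustration of this prescriptive reading, a minimal Python sketch could map a characteristic's current score to the next stage's target. The stage descriptions follow the list above; the next_step helper is our own hypothetical name, not part of the model.

```python
# Stage descriptions for the conceptual architecture characteristic (from the model).
stages = {
    1: "desktop data marts (e.g.: Excel sheets)",
    2: "multiple independent data marts",
    3: "multiple independent data warehouses",
    4: "a single, central DW with multiple or conformed data marts",
    5: "a DW/BI service federating a central DW and other sources",
}

def next_step(score):
    """Suggest the next target for a characteristic scored 1-4 (5 is the top)."""
    return stages[min(int(score) + 1, 5)]

print(next_step(3))  # -> the stage-4 characteristic to aim for
```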

At the same time, one could say that in order to be on maturity stage 3, an organization should (more or less) have the following characteristics implemented for architecture:
- Conceptual architecture – multiple independent data warehouses;
- Business rules – some business rules defined or implemented;
- Metadata management – central metadata repository separated by tools;
- Security – independent authorization for each tool, etc.

Now that we have offered an overview of the DWCMM, we will present the DW maturity assessment questionnaire and matrix more thoroughly in the next chapters. For this, we will elaborate on each category and sub-category of the DWCMM and present the characteristics and questions we chose in order to assess the maturity of each benchmark variable. In chapter 4 we will focus on the DW Technical Solution maturity, and we will continue in chapter 5 with the DW Organization & Processes part.


4 DW Technical Solution Maturity

With the main elements of the DWCMM identified, it is now time to elaborate on each part of the maturity assessment questionnaire and present our arguments regarding the choice of questions for each sub-category of the DWCMM. We will start with the components of the DW technical solution – general architecture and infrastructure, data modelling, ETL, BI applications – in this chapter and continue with the DW organization and processes in the next one.

4.1 General Architecture and Infrastructure

We already talked about what a DW is and the most common approaches to developing one in the previous chapters. In this section, we analyze the most important elements that need to be considered when assessing the maturity of the DW general architecture and infrastructure (this benchmark variable was initially called "architecture"; see 6.1.1 for more details on the name change). Based on this, we also define the most representative questions for architecture included in the maturity assessment questionnaire.

4.1.1 What is Architecture?

Architecture as a general term refers to a blueprint that allows communication, planning, maintenance, learning, and reuse (Sen & Sinha, 2005). According to (Kimball et al., 2008), the architecture of a DW consists of three major pieces: the data architecture, which organizes the data and defines the quality and management standards for data and metadata; the application architecture, the software framework that controls the movement of data from source to user; and the technical architecture, the underlying computing infrastructure that enables the data and application architectures.

The whole architecture is divided in two parts (Kimball et al., 2008):
- the back room, where the data modelling and the ETL process take place, and
- the front room, which refers to the BI applications and services.

Besides these three main components (i.e.: data modelling, ETL, BI applications), architecture also includes underlying elements such as infrastructure, metadata and security that support the flow of data from the source systems to the end users (Kimball et al., 2008; Chaudhuri & Dayal, 1997). At the same time, architecture refers to the major data storage components – source systems, data staging area, data warehouse database, operational data store, data marts – and the way they are assembled together (Ponniah, 2001). This is connected to the conceptual approach of designing and building the DW (e.g.: conformed data marts – Kimball, or enterprise-wide DW – Inmon, etc.). Therefore, in this thesis we consider architecture as a separate category for assessing maturity, in which we include questions regarding: the conceptual architecture and its layers, infrastructure, metadata management, security management, update frequency, business rules, and performance optimization. We will elaborate on each of these elements and, at the same time, present the related questions that we included in our maturity questionnaire.


4.1.2 Conceptual Architecture and Its Layers

In this section we present a typical DW architecture, which usually contains several data storage layers such as source systems, a data staging area, the data warehouse database, an operational data store, and data marts. It is not mandatory for all these elements to be part of the architecture. A typical DW architecture can be seen in figure 7.

Figure 7: A Typical DW Architecture (adapted from (Chaudhuri & Dayal, 1997)).

Source Systems

The first component of a DW is represented by the source systems, without which there would be no data. They provide the input into the solution and require detailed analysis at the beginning of the project. In most cases, data must come from multiple systems built with multiple data stores hosted on multiple platforms. The source systems usually include: Excel files, text files, XML files, relational databases, enterprise resource planning (ERP) and customer relationship management (CRM) systems, etc. For a broader view on these types of data sources, see (Kimball et al., 2008). Lately, organizations have begun implementing capabilities to include various types of unstructured data sources (e.g.: text documents, e-mail files, images or videos) and Web data sources in their DWs. However, this implies new technologies such as content intelligence (i.e.: search, classification and discovery techniques), which are not yet very mature (Blumberg & Atre, 2003). Therefore, one could say that an organization that is able to extract data from these kinds of sources is a more mature one.

Data Staging Area

A data staging area is a temporary location in the back room of a DW where data from the source systems is copied. Occasionally, the implementation of the DW encounters environmental problems as it pulls data from many operational source systems. Therefore, a separate staging area is needed to prepare data for the DW, but it is not universally built. The copy of the data can be a one-to-one mapping of the source systems' content, but in a more convenient environment. The data staging area is not accessible to the end users and it does not support query or presentation services. It acts as a surrogate for the source systems and it offers several benefits (Walker, 2006):


- It is a good place to perform data quality profiling.
- It can be used as a point close to the source to perform data quality cleaning.
- It serves as a workbench for ETL, etc.

Data Marts

Data marts can be considered subsets of the data volume of the whole organization, specific to a group of users or a department. Therefore, they are limited to specific subject areas. For example, a data mart for the marketing department would have subjects limited to customers, products, sales, etc. (Chaudhuri & Dayal, 1997). The data in a data mart are usually aggregated to a certain level, which can sometimes provide rapid responses to end-user requests. Data marts require less cost and effort to develop and provide access to functional or private information for specific organizational units. They are suited for businesses demanding a fast time to market, quick impact on the bottom line, and minimal infrastructure changes (Murtaza, 1998). However, even if from a short-term perspective a data mart seems a better investment than a DW, from a long-term perspective the former is never a substitute for the latter. The main reason is that many organizations misunderstand the concept of data marts and develop independent solutions that propagate freely throughout the organization and become a problem when attempting to integrate them (Kimball et al., 2008). Therefore, when developed, data marts should be conformed and integrated, or derived from an enterprise-wide DW.

Data Warehouse Database

As already presented, there are two main conceptual DW architectures: a central DW with multiple data marts (Inmon) or conformed data marts (Kimball). Of course, there are also hybrid approaches that combine the enterprise-wide DW and the conformed data marts techniques. Here, we just refer to the DW database, the separate repository that does the actual storage of data. The DW is no special technology in itself, as it is a relational or multidimensional data structure that is optimized for analysis and querying. As the data structure and operations are different from the ones in the transactional systems, it is important to have the DW environment separated from the operational ones (Chaudhuri & Dayal, 1997).

Operational Data Store

The Operational Data Store (ODS) is a database that provides a consolidated view of volatile transactional data from multiple operational systems. According to Bill Inmon, the originator of the concept, an ODS is "a subject-oriented, integrated, volatile, current-valued, detailed-only collection of data in support of an organization's need for up-to-the-second, operational, integrated, collective information" (Inmon, 1992). As can be seen, an ODS differs from a DW in that the ODS's contents are updated in the course of business, whereas a data warehouse contains static data. Therefore, this architecture is suitable for real-time or near real-time reporting and analysis that can be done without impacting the performance of the production systems. Unfortunately, operational data is not designed for decision support applications, and complex queries may result in long response times and a heavy impact on the transactional systems.

Maturity Assessment Question(s)

All this being said, we can now show the maturity assessment questions related to these elements:


1) What is the predominant architecture of your DW?
a) Level 1 – Desktop data marts (e.g.: Excel sheets)
b) Level 2 – Multiple independent data marts
c) Level 3 – Multiple independent data warehouses
d) Level 4 – A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
e) Level 5 – A DW/BI service that federates a central enterprise DW and other data sources via a standard interface.

2) What types of data sources does your DW extract data from at the highest level?
a) Level 1 – CSV files
b) Level 2 – Operational databases
c) Level 3 – ERP and CRM systems; XML files
d) Level 4 – Unstructured data sources (e.g.: text documents, e-mail files)
e) Level 5 – Various types of unstructured data sources (e.g.: images, videos) and Web data sources.

Table 5: DW Architecture Maturity Assessment Questions.

As can be seen, we focused our attention on the conceptual architecture and on the types of data sources that the DW supports at the highest level, as we considered them to be the most important high-level elements that characterize the maturity of the conceptual architecture. The hierarchical order in which the answers are organized was deduced from the information given above on these elements and from our literature study.

4.1.3 Infrastructure

Infrastructure is a very important component of a DW, as it provides the underlying foundation that enables the DW architecture to be implemented. It is sometimes called the technical architecture and it includes several elements such as: hardware platforms and components (i.e.: disks, memory, CPUs, DW/ETL/BI application servers), operating systems (e.g.: UNIX), database platforms (e.g.: relational engines or multidimensional/OLAP engines), connectivity and networking. Several factors influence the implemented infrastructure: the business requirements, the technical and systems issues, the specific skills and experience of the DW team, policy and other organizational issues, expected growth rates, etc. (Kimball et al., 2008).

An important aspect here is the parallel processing hardware architecture used: symmetric multiprocessing (SMP), massively parallel processing (MPP) and non-uniform memory access (NUMA). These architectures differ in the way the processors work with the disk, memory and each other. It is important to gain sufficient insight into each option's features, benefits and limitations in order to select the proper server hardware; therefore, one cannot say that one is more mature than another. For more information on parallel processing hardware architectures, see (Kimball et al., 2008; Ponniah, 2001).

As DWs contain large volumes of data with a different structure than the operational databases, a specialized DW infrastructure can be critical for performance and better results. The most important aspect is to have different servers for the OLTP and DW systems. However, many organizations ignore this and use the same servers for both systems, which leads to low performance. Once this separation is done, higher performance can be achieved by having separate servers for the DW, ETL and BI applications. Lately, a new hardware solution has been developed for increasing the performance of the DW system: a


specialized DW appliance. It consists of a small amount of proprietary hardware with an integrated set of servers, storage, operating system(s), DBMS and software specifically pre-installed and pre-optimized for data warehousing. Though such appliances are expensive relative to regular hardware, the custom hardware they contain allows them to claim a 10-50 times improvement over existing database solutions (Madden, 2006). Another reason for buying such an appliance is simplicity. The appliance is delivered complete ("no assembly required") and installs rapidly. Finally, any problems that arise may require complex analysis, but only a single call to the appliance vendor is needed for a solution (Feinberg & Beyer, 2010).

Maturity Assessment Question(s)

From the information presented above, we decided that a representative question to assess the maturity of

the infrastructure refers to the specialization of infrastructure for a DW solution:

3) To what degree is your infrastructure specialized for a DW?
a) Very low – Desktop platform
b) Low – Shared OLTP systems and DW environment
c) Moderate – Separate OLTP systems and DW environment
d) High – Separate servers for OLTP systems, DW, ETL and BI applications
e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata).

Table 6: Infrastructure Maturity Assessment Questions.

4.1.4 Metadata

Metadata is usually defined as "data about data" (Shankaranarayanan & Even, 2004). However, this definition does not give a clear image of what metadata actually is. Metadata can be seen as all the information that defines and describes the structures, operations and contents of the DW system in order to support the administration and effective exploitation of the DW. The DW/BI industry often refers to two main categories of metadata (Moss & Atre, 2003):
- Business metadata – provides business users with a roadmap for accessing the business data in the DW/BI decision-support environment. It describes the contents of the DW in more user-accessible terms. It shows what data the user can find, where it comes from, what it means and what its relationship is to the other data in the DW.
- Technical metadata – supports the technicians and "power users" by providing them with technical information about the objects and processes that make up the DW/BI system.
Some differences between business metadata and technical metadata are highlighted in table 7.

Business Metadata | Technical Metadata
Provided by business people | Provided by technicians or tools
Documented in business terms on data models and in data dictionaries | Documented in technical terms in databases, files, programs, and tools
Used by business people | Used by technicians, "power users", databases, programs, and tools (e.g.: ETL, OLAP)
Names fully spelled out in business language | Abbreviated names with special characters, used in databases, files, and programs

Table 7: Business Metadata vs. Technical Metadata (adapted from (Moss & Atre, 2003)).


(Kimball et al., 2008) propose a third category of metadata:
- Process metadata – describes the results of various operations in the DW and especially applies to the ETL and query processes. For example, in the ETL process, each task logs key data about its execution, such as start and end time, CPU seconds used, rows processed, etc. Similar process metadata is generated when users query the DW. This data is very important for the performance monitoring and improvement process.
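As a minimal sketch of how such process metadata could be captured, consider the following Python wrapper around an ETL task; the field names and the run_task helper are illustrative assumptions, not a prescribed design.

```python
import time

def run_task(name, task):
    """Run one ETL step and record process metadata about its execution."""
    start = time.time()
    rows = task()  # assume each ETL step returns the number of rows it processed
    return {
        "task": name,
        "started_at": start,
        "ended_at": time.time(),
        "rows_processed": rows,
    }

# Hypothetical usage: wrap a load step and keep the log entry for monitoring.
process_metadata_log = [run_task("load_customer_dim", lambda: 12500)]
print(process_metadata_log)
```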

Metadata can be considered the DNA of the DW, as it defines its elements and how they work together. It drives the DW and provides flexibility by buffering the various components of the system from each other (Ponniah, 2001).

A very important aspect related to metadata is integration. Metadata is usually stored and maintained in

repositories. These are structured storage and retrieval systems, typically built on top of a conventional

DBMS. A repository is not simply a storage component, but also embodies functionalities necessary to

handle the stored metadata. However, the reality is that most tools create and manage their own metadata

repository and therefore, there will be several metadata repositories scattered around the DW system.

These repositories often use different storage types and thus, they may have overlapping content. It‘s this

combination of multiple repositories that causes problems and hence, the best solution is a single

integrated metadata repository (Kimball et al., 2008). However, implementing an integrated metadata repository can be very challenging, but if successfully implemented, it would be valuable in several ways: it could help identify the impact of making a change to the DW system; it could serve as a source for auditing and documentation; it would ensure metadata quality and synchronization, etc. As an organization usually supports tools from multiple vendors, it is rather difficult to create an integrated metadata repository due to lack of standardization. But, despite all these challenges, a metadata repository is a mandatory component

of every DW environment and metadata should be gathered for all the components of the DW (i.e.: data

modelling, ETL, BI applications, etc.) (Ponniah, 2001; Moss & Atre, 2003).

Another important aspect related to metadata is accessibility (Moss & Atre, 2003). In order to reach its goal, metadata for BI applications should always be available and easily accessible to end users, for a better understanding and usage of the DW solution. Of course, the best solution would be a complete integration

of metadata with the BI applications (i.e.: metadata can be accessed through one button push on the

attributes, metrics, etc.). However, this is also the hardest one to implement. In case the organization has a

metadata repository implemented, another efficient way of accessing metadata is through a metadata

management tool. But, there are still many organizations that do not pay much attention to business metadata and its accessibility; in those organizations, metadata is very often not available at all, or only available by sending documents to users on request.

Maturity Assessment Question(s)

As metadata is an underlying element in a DW and it has specific characteristics for each of the major

components – data modelling, ETL, BI applications – we will have one maturity question regarding

metadata in each of the mentioned categories. For architecture, we decided that the metadata maturity

question should refer to the general metadata management:

4) To what degree is your metadata management implemented?


a) Very low – No metadata management

b) Low – Non-integrated metadata by solution

c) Moderate – Central metadata repository separated by tools

d) High – Central up-to-date metadata repository

e) Very high – Web-accessed central metadata repository with integrated, standardized, up-to-date metadata.

Table 8: Metadata Management Maturity Assessment Question.

4.1.5 Security

A DW is a veritable gold mine of information as all of the organization's critical information is readily available in a format easy to retrieve and use. The DW system must publish data to those who need to see it, while simultaneously protecting the data. On the one hand, the DW team is judged by how easily the business user can access the data, and on the other hand, the team is blamed if sensitive data gets into the wrong hands or if data is lost. Therefore, security is very important for the success of the DW even if some

organizations seem to ignore this fact. User access security is usually implemented through several

methods (Kimball et al., 2008; Moss & Atre, 2003; Ponniah, 2001):

Authentication – the process of identifying a person, usually based on a logon ID and password.

This process is meant to ensure that the person is who he or she claims to be. There are several

levels of authentication depending on how sensitive the data is. The first level consists of a

simple, static password, followed by a system-enforced password pattern and periodically

required changes. An organization with a DW solution should at least have this security method

implemented.

Role-based security – databases usually offer role-based security. A role is just a grouping of

users with some common requirements for accessing the database. Once the roles are created,

users can be set up in the appropriate roles and access privileges may be granted at the level of a

role. A privilege is an authorization to perform a particular operation; without explicitly granted

privileges, a user cannot access any information in the database. While privileges let you restrict

the types of operations a user can perform, managing these privileges may be complex. To

address the complexity of privilege management, database roles encapsulate one or more

privileges that can be granted to and revoked from users.

Tool-based security – tool-based security is usually not as flexible as role-based security at

database level. Nevertheless, tool-based security can form some part of the security solution.

However, if the DW team is planning to use the DBMS itself for security protection, then tool-

based security may be considered redundant.

Authorization – the process of determining what specific content a user is allowed to access. Once

users are authenticated, the authorization process defines the access policy. Authorization is a

more complex problem in the DW system than authentication because limiting access can have

significant maintenance and computational overhead.

Regardless of the chosen security strategy, a very important and hard-to-achieve goal is to establish a security policy for the DW compliant with the organizational security policy and to implement and integrate this security at a companywide level (Ponniah, 2001).
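As a concrete illustration of role-based security at the database level, the sketch below prints the general SQL pattern a DBA might apply; the role, table and user names are hypothetical and the exact syntax varies per DBMS:

    # Hypothetical role-based security setup, expressed as the SQL a DBA might run.
    # Table, role and user names are illustrative; syntax varies per DBMS.
    statements = [
        "CREATE ROLE dw_report_reader",                      # group users with common needs
        "GRANT SELECT ON sales_fact TO dw_report_reader",    # privilege granted to the role
        "GRANT SELECT ON customer_dim TO dw_report_reader",
        "GRANT dw_report_reader TO alice",                   # users inherit the role's privileges
        "REVOKE dw_report_reader FROM bob",                  # revoking the role removes them all
    ]
    for stmt in statements:
        print(stmt + ";")                                    # a real script would execute these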


Maturity Assessment Question(s)

From the most important aspects related to security presented above – related to the way security is

implemented for the DW – we came up with the following maturity question for this DW component:

5) To what degree is security implemented in your DW architecture?

a) Very low – No security implemented

b) Low – Authentication security

c) Moderate – Independent tool-based security

d) High – Role-based security at database level

e) Very high – Integrated companywide authorization security

Table 9: Security Maturity Assessment Question.

4.1.6 Business Rules for DW

Business rules are abstractions of the policies and practices of a business organization. They reflect the

decisions needed to accomplish business policy and objectives of an organization (Kaula, 2009). Business

rules are used to capture and implement precise business logic in processes, procedures, and systems

(manual or automated). Therefore, business rules are an important aspect when implementing a DW.

Examples of business rules used in a DW concern different attributes, ranges, domains, operational records, etc. Business rules can serve different purposes in the development of a DW (Ponniah, 2001):

They are very important for data quality and integrity. In order to have the right data in the DW, it

is important that the values of each data item adhere to prescribed business rules. For example, in

an auction system, the sale price cannot be less than the reserve price. Many data quality problems are caused by violations of such business rules. Another example would be an employee record in which the total number of days (i.e.: days worked in a year plus vacation days, holidays and sick days) is greater than 365 or 366 (see the sketch after this list).

They are a source for business metadata.

They should be taken into consideration when requirements are defined.

They should be used for data modelling and applied for the extraction and transformation of data.
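The following minimal Python sketch illustrates how the two example rules above could be checked against incoming records; the record layouts are hypothetical:

    def check_auction(record):
        # Rule: the sale price cannot be less than the reserve price.
        return record["sale_price"] >= record["reserve_price"]

    def check_employee_days(record, year_length=366):
        # Rule: worked + vacation + holiday + sick days cannot exceed the year.
        total = (record["days_worked"] + record["vacation_days"]
                 + record["holidays"] + record["sick_days"])
        return total <= year_length

    print(check_auction({"sale_price": 90, "reserve_price": 100}))    # False: violation
    print(check_employee_days({"days_worked": 230, "vacation_days": 25,
                               "holidays": 10, "sick_days": 5}))      # True: within limit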

Maturity Assessment Question(s)

To sum up, an enterprise that properly documents and actually follows its business rules will have a better

DW and will also manage change better than one that ignores its rules. As it is hard to assess at a high level which business rules are defined and implemented, we decided to include two more general assessment questions, shown below.

6) To what degree have you defined and documented definitions and business rules for the necessary

transformations, key terms and metrics?

a) Very low – No business rules defined

b) Low – Few business rules defined and documented

c) Moderate – Some business rules defined and documented

d) High – Most of the business rules defined and documented

e) Very high – All business rules defined and documented.

7) To what degree have you implemented definitions and business rules for the necessary transformations, key


terms and metrics?

a) Very low – No business rules implemented

b) Low – Few business rules implemented

c) Moderate – Some business rules implemented

d) High – Most of the business rules implemented

e) Very high – All business rules implemented.

Table 10: Business Rules Maturity Assessment Questions.

4.1.7 DW Performance Tuning

DWs usually contain large volumes of data. At the same time, they are query-centric systems and hence,

the need to process queries faster dominates. That is the reason why various methods are needed to

improve performance (Ponniah, 2001):

Software performance improvement – the most often used techniques are (Chauduri & Dayal, 1997):

- index management – indexes are database objects associated with database tables and created to

speed up access to data within the tables. Indexing techniques have already been in existence for

decades for transactional systems, but in order for them to handle the large volumes of data and complex queries common in DWs, new or modified techniques have to be implemented for indexing the DWs (Vanichayobon & Gruenwald, 2004). The most used indexing techniques for data warehousing are the B-tree index, the bitmap index and the projection index.

- data partitioning – typically, the DW holds some very large database tables. Loading these tables

can take excessive time; building indexes for large tables can also create problems sometimes.

Therefore, another solution for performance tuning is data partitioning, which means the deliberate splitting of a table and its index into manageable parts.

- parallel processing – major performance improvement can be achieved if the processing is split

into components that are executed in parallel. The simultaneous concurrent executions will

produce the results faster. Parallel processing techniques work in conjunction with data

partitioning schemes. They are usually features of the used DBMS and some physical options are

also critical for effective parallel processing.

- view materialization – many queries over DWs require summary data, and therefore use

aggregates. Hence, besides the detailed data, the DW needs to contain summary data.

Materializing summary data on different parameters can accelerate many common queries by significantly speeding up query processing (see the sketch after this list).

Hardware performance improvement – scale the DW server to match the query requirements,

tune the DW computing platform (i.e.: a set of hardware components and the whole network).

Specialized DW appliances or DW Cloud Computing – an overview of the former was given in section 4.1.3. Cloud Computing is the latest trend in data warehousing/BI and it is not very mature yet.

Some of the advantages of Cloud Computing are: performance - better query and data load

performance; simplicity – rapid time to value and simple tools for agile provisioning and

simplified management; elasticity – scale on demand; low acquisition and maintenance costs –

price based on utilization.
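To make the software tuning techniques above concrete, the sketch below prints the kind of statements a DBA might issue; all object names are hypothetical and the exact syntax (bitmap indexes, partitioning, materialized views) is vendor specific, shown here in a generic, Oracle-like form:

    # Generic, Oracle-like sketches of the software tuning techniques above.
    # All object names are hypothetical and the exact syntax is vendor specific.
    tuning_statements = {
        "index management":
            "CREATE BITMAP INDEX ix_sales_region ON sales_fact (region_key)",
        "data partitioning":
            "ALTER TABLE sales_fact PARTITION BY RANGE (date_key)",
        "view materialization":
            "CREATE MATERIALIZED VIEW mv_monthly_sales AS "
            "SELECT month_key, SUM(amount) FROM sales_fact GROUP BY month_key",
    }
    for technique, statement in tuning_statements.items():
        print("-- " + technique)
        print(statement + ";")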


Maturity Assessment Question(s)

An organization that has a DW in place usually starts its performance tuning with the first category (i.e.: software tuning), and if this does not pay off, it continues with the second option (i.e.: hardware tuning). However, organizations with a lot of experience in data warehousing understand that the best solution to improve performance is to buy a specialized DW appliance or to resort to the latest trend, DW cloud computing. The maturity question for performance tuning is therefore depicted in the table below.

8) To what degree do you use methods to increase the performance of your DW?

a) Very low – No methods to increase performance

b) Low – Software performance tuning (e.g.: index management, parallelizing and partitioning system, views

materialization)

c) Moderate – Hardware performance tuning (e.g.: DW server)

d) High – Software and hardware tuning

e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata) or cloud computing.

Table 11: Performance Tuning Maturity Assessment Question.

4.1.8 DW Update Frequency

The classical DW solutions were built for strategic and tactical BI that would help executives or line-of-business managers develop and assess progress in achieving long-term enterprise goals. Such BI uses historical data that is one day to a few months or even years old. However, this tradition has been

changing lately. With ever-increasing competition and rapidly changing customer needs and technologies,

enterprise decision makers are no longer satisfied with scheduled analytics reports, pre-configured KPIs

or fixed dashboards. They demand ad hoc queries to be answered quickly, they demand actionable

information from analytic applications using real-time business performance data, and they demand these

insights be accessible to the right people exactly when and where they need them (Azvine, 2005).

Therefore, real time processing is an increasingly common requirement in data warehousing, as more and

more business users expect the DW to be continuously updated throughout the day and grow impatient

with stale data. However, building a real time DW/BI system requires gathering a very precise

understanding of the true business requirements for real time data and identifying an appropriate ETL

architecture that incorporates a variety of technologies integrated with a solid platform.

Maturity Assessment Question(s)

As a conclusion, one could say that an organization that does real-time data warehousing is a very mature

one as it probably has optimized processes and ETL. Real-time data warehousing is, however, a very complex activity and it is hard to judge from a high-level point of view whether it is done successfully and with high data quality. We tackle this problem by including here a maturity question regarding the update frequency of the DW and another question in the ETL part that assesses its complexity and performance.

9) Which answer best describes the update frequency for your DW?

a) Level 1 – Monthly update or less often

b) Level 2 – Weekly update

c) Level 3 – Daily update


d) Level 4 – Inter-daily update

e) Level 5 – Real-time update.

Table 12: Update Frequency Maturity Assessment Question.

4.2 Data Modelling

4.2.1 Data Modelling Definition and Characteristics

A data model is "a set of concepts that can be used to describe the structure of and operations on a database" (Navathe, 1992). By structure of a database, (Navathe, 1992) refers to the data types, relationships and constraints that define the "template" of that database. Hence, data modelling is the process of creating a data model.

Furthermore, data modelling is very important for creating a successful information system as it defines

not only data elements, but also their structures and relationships between them. Data modelling

techniques are used to model data in a standard, consistent, predictable manner in order to manage it as a

resource. Some authors, like (Simsion & Witt, 2005), consider the data model to be "the single most important component of an information system's design" for several reasons:

leverage – a small change to the data model may have a major impact on the system as a whole. Problems with data organization arise not only from failing to meet initial business requirements, but also from changes to the business after the database has been built, which are expensive to accommodate.

conciseness – a data model is a very powerful tool for expressing information systems requirements and capabilities, whose value lies partly in conciseness.

data quality – a data model plays a key role in achieving good data quality by establishing a common understanding of what is to be held in each table and column.

4.2.2 Data Models Classifications (Data Models Levels and Techniques)

Over time, many data models have been developed; they can be classified mainly along two dimensions (Navathe, 1992):

a) the first dimension deals with the steps of the overall database design activity to which the model

applies. The classic database design process consists of mapping requirements of data and

applications successively through the following steps (Navathe, 1992): conceptual design, logical

design and physical design. (Golfarelli & Rizzi, 1998) and (Husemann et al., 2000) propose a

DW design approach similar to the traditional database design. Hence, we will consider the three

sequential phases/levels in figure 8 to serve as a reference for a complete DW design process

model: conceptual design, logical design, physical design.


Figure 8: DW Design Process Levels (adapted from (Husemann et al., 2000)).

Conceptual Design

Conceptual design translates user requirements into an abstract representation understandable to the user

that is independent of implementation issues, but is formal and complete, so that it can be transformed

into the next logical schema without ambiguities (Tryfona et al., 1999). The conceptual data model is

usually represented as a diagram with supporting documentation (Simsion & Witt, 2005) (e.g.: high level

model diagram as described by (Kimball et al., 2008) for dimensional modelling).

Logical Design

Logical design models data using constructs that are easy for users to follow and that avoid physical details of implementation, but that typically depend on the kind of DBMS used in the implementation (e.g.: relational data model, dimensional data model, etc.) (Navathe, 1992). It is the level most often implemented, and it makes the connection between the conceptual design and the physical one. Logical design is still easily understood by users, and it does not deal with the physical implementation details yet. It only deals with defining the types of information that are needed.

Physical Design

Physical design incorporates any necessary changes to achieve adequate performance and consists of a

variety of choices for storage of data in terms of clustering, partitioning, indexing, directory structure,

access mechanisms, etc. (Navathe, 1992; Simsion & Witt, 2005). Some guidelines on developing

concepts for describing physical implementations along the lines of a data model can be found in (Batory,

1988).

b) the second dimension deals with the flexibility (i.e.: the ease with which a model can deal with

complex application situations) and expressiveness of the data model (i.e.: the ease with which a

model can bring out the different abstractions and relationships in an involved application) and it

includes mainly the following types of models: record-based data models, semantic data models

and object-based models. For an overview on these models, see (Navathe, 1992).


In this section we will briefly describe two of the most often used data modelling techniques in data

warehousing: entity-relationship data models and relational data models. We will have a separate paragraph

for dimensional modelling because, as mentioned before, we will focus on this data modelling technique

in our research and we will have several questions in the data modelling maturity assessment

questionnaire dedicated to dimensional modelling.

Entity-Relationship (ER) Data Models

The entity-relationship (ER) model, proposed by (Chen, 1975), is one of the most famous semantic data models and has been a precursor of many subsequent variations. It is used mainly for conceptual

design and the basic constructs in the ER model are (Chen, 1975):

entities – An entity is recognized as being capable of an independent existence which can be

uniquely identified. It is an abstraction from the complexities of a certain domain and it can be a

physical object, an event or a concept. Entities can be viewed as nouns.

relationships – A relationship captures how two or more entities are related to one another.

Relationships can be thought of as verbs, linking two or more nouns.

attributes – An attribute expresses the information about an entity or a relationship which is

obtained by observation or measurement.

Moreover, entities, relationships and attributes are classified into sets, and this is what ER models usually show. However, the distinction between entities and relationships or between entities and attributes can sometimes be fuzzy, and it should be clarified for each particular environment. In conclusion, the ER model is fairly simple to use, has been formalized and has a reasonably unique interpretation with an easy diagrammatic notation. It has remained a favourite means for conceptual design and an easy way of communication in the early stages of database design. It is also used for conceptual DW design (in both Inmon's and Kimball's views), but especially for enterprise-wide DWs when applying Inmon's view on developing DWs.

Relational Data Models

The relational data model is a record-based data model proposed by (Codd, 1970). It became a landmark development in this area because it provided a mathematical basis to the discipline of data modelling. The fundamental assumption is that all data are represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains given as sets (i.e.: S1, S2, …, Sn, not necessarily distinct). A relation on n sets can also be defined as a set of n-tuples, each of which has its first element from S1, its second element from S2, and so on (Codd, 1970).

These relations are organized in the form of tables which consist of tuples (rows) of information defined

over a set of attributes (columns). The attributes, in turn, are defined over a set of atomic domains of

values. The data from the model are operated upon by means of a relational algebra, which includes

operations of selection, projection, join as well as set operations of union, intersection, Cartesian product,

etc. Moreover, there are two types of constraints that apply for this model:

the entity integrity constraint – guarantees the uniqueness of a table's key;


the referential integrity constraint – guarantees that whenever a column in one table derives

values from a key of another table, those values must be consistent.

Due to its simplicity of modelling, the relational data model gained wide popularity among business application developers. It is usually used to capture the microscopic relationships among data elements

and eliminate data redundancies. It is extremely beneficial for transaction processing because it makes

transaction loading and updating simple and fast. However, it is also used for DW design as the logical

model when following Inmon's view on developing DWs.

Maturity Assessment Question(s)

As we are not going to judge which data modelling technique is better for data warehousing, we

considered that two significant characteristics that could determine the maturity of this category for a DW

are: the synchronization (i.e.: establishing consistency among data from a source to a target data storage and vice versa, and the continuous harmonization of the data over time) between all the data models found in the DW (i.e.: ETL source and target models, DW and data marts models, BI models); and the differentiation between data model levels (i.e.: physical, logical and conceptual). Companies usually ignore the conceptual level as, at first, they do not see any benefits from it. However, in time, some of them realize that it is very important for solid and consistent data modelling and start designing it.

1) Which answer best describes the degree of synchronization between the following data models that your

organization maintains and the mapping between them: ETL source and target models; DW and data marts

models; BI semantic or query object models?

a) Automatic synchronization of all of the data models

b) Manual synchronization of some of the data models

c) No synchronization between data models

d) Manual or automatic synchronization depending on the data models

e) Automatic synchronization of most of the data models.

2) To what degree do you differentiate between data model levels: physical, logical and conceptual?

a) No differentiation between data models levels

b) All data models have conceptual, logical and physical levels designed

c) Logical and physical levels designed for some data models

d) Conceptual level also designed for some data models

e) Logical and physical levels designed for all the data models.

Table 13: Data Model Synchronization and Levels Maturity Assessment Questions.

4.2.3 Dimensional Modelling

ER diagrams and relational modelling are popularly used for database design in OLTP environments, but

also in DWs. However, the database designs recommended by ER diagrams are considered by some

authors to be inappropriate for decision support systems where efficiency in querying and in loading data

is very important (Chauduri & Dayal, 1997). Relational (i.e.: normalized) modelling has some

characteristics that are appropriate for OLTP systems, but not for DWs:

its structure is not easy for end-users to understand and use. In OLTP systems this is not a

problem because, usually, end-users interact with the database through a layer of software.


data redundancy is minimized. This maximizes efficiency of updates, but tends to penalize

retrievals. Data redundancy is not a problem in DWs because data is not updated on-line.

Dimensional modelling came as a solution to these problems. It was proposed by (Kimball, 1996) and has

been adopted as the predominant approach to designing DWs and data marts in practice (Moody &

Kortink, 2000). Dimensional modelling is a logical design technique for structuring data so that it is

intuitive to business users and delivers fast query performance (Kimball, 1996). The main advantages of

dimensional modelling are (Kimball et al., 2008; Ponniah, 2001): understandability, query performance

and flexibility.

Dimensional modelling divides the world into:

measurements – Measurements are captured by the organization's business processes and their

supporting operational source systems. They are usually numeric values and are called facts.

context – Facts are surrounded by largely textual context that is true at the moment the fact is

recorded. This context is intuitively divided into independent logical parts called dimensions.

Each of the organization's business processes can be represented by a dimensional model that consists of

a fact table containing the numeric measurements surrounded by several dimension tables containing the

textual context. This star-like structure is often called a star join (Kimball et al., 2008). Dimensional

models can be stored in:

a relational database platform (i.e.: a ROLAP server) – they are typically referred to as star

schemas.

multidimensional online analytical structures (i.e.: MOLAP servers) – they are typically called

cubes.

An example of a star schema and a cube can be seen in the figure below.

Figure 9: Star Schema vs. Cube (adapted from (Chauduri & Dayal, 1997)).
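To complement the figure, the following minimal sketch creates a star join in an in-memory database: one fact table whose multipart key is made up of foreign keys into two dimension tables. All table and column names are hypothetical:

    import sqlite3

    # Minimal star join sketch: one fact table with foreign keys into two
    # dimension tables. Names are hypothetical; types are indicative only.
    star_schema = """
    CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, full_date DATE, month TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE sales_fact (
        date_key     INTEGER REFERENCES date_dim (date_key),
        product_key  INTEGER REFERENCES product_dim (product_key),
        quantity     INTEGER,          -- numeric measurement (fact)
        amount       NUMERIC
    );
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(star_schema)    # runs in SQLite as a smoke test
    print("star schema created")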

However, star schemas do not explicitly provide support for attribute hierarchies and sometimes,

snowflake schemas are used. They provide a refinement of star schemas where the dimensional hierarchy


is explicitly represented by normalizing the dimension tables. This leads to advantages in maintaining the

dimension tables. However, the denormalized structure of the dimensional tables in star schemas may be

more appropriate for browsing the dimensions. There are also other structures used for dimensional

modelling (e.g.: fact constellations), but the ones we presented are the most often implemented. For more

information on dimensional modelling, see (Kimball, 1996) and (Kimball et al., 2008).

Fact Tables

Fact tables store the performance measurements generated by the organization's business activities or events. The value of a fact is usually not known in advance because it is variable and the fact's valuation occurs at the time of the measurement event. Two aspects are important when analyzing fact tables

(Kimball et al., 2008):

Fact table keys – they are characterized by a multipart key made up of foreign keys coming from

the intersection of the dimension tables involved in the business process. This shows that a fact

table always expresses a many-to-many relationship.

Fact table granularity – it refers to the level of detail of the data stored in a fact table. High

granularity refers to data that is at or near the transaction level, data referred to as atomic level

data. Low granularity refers to data that is summarized or aggregated, usually from the atomic

level data.

Dimension Tables

In contrast to the rigid qualities of fact tables consisting of only keys and numeric measurements,

dimension tables are filled with a lot of descriptive fields. In many ways, the power of the DW is

proportional to the quality and depth of the dimension attributes as robust dimensions translate into robust

querying and analysis capabilities. The most important aspects when analyzing dimension tables are

(Kimball et al., 2008):

Dimension table keys – whereas fact tables have a multipart key, dimension rows are uniquely

identified by a single key field. It is recommended that surrogate keys should be used and not the

keys that were used in the source systems. These surrogate keys are meaningless and they merely

serve as join fields between the fact and dimension tables. For practical reasons, they are usually

represented as simple integers assigned in sequence.

Conformed dimensions – they are dimensions that adhere to the same structure and are shared

across the enterprise‘s DW environment, joining to multiple fact tables representing various

business processes. Conformed dimensions are either identical or strict mathematical subsets of

the most granular, detailed dimension. Dimension tables are not conformed if the attributes are

labeled differently or contain different values.

Hierarchies – A hierarchy is a set of parent-child relationships between attributes within a

dimension. These hierarchy attributes, called levels, roll up from child to parent; for example, Customer totals can roll up to Sub-region totals, which can further roll up to Region totals. Another example: daily sales roll up to weekly sales, which roll up to monthly, quarterly and yearly sales.

Slowly changing dimensions - the dimensional model needs to track time-variant dimension attributes as required by the business requirements. There are mainly three techniques for


handling slowly changing dimensions (SCDs): type 1 – overwrite of one or more attributes in an

existing dimension row; type 2 – copy the previous version of the dimension row and create a

new row with a new surrogate key; type 3 – add and populate a new column of the dimension

table with the previous values and populate the original column with the new values. Of course, these techniques will sometimes be combined in a hybrid approach for better management (a minimal type 2 sketch follows this list).

Special dimensions – These are dimensions that are only sometimes needed, but they require knowledge and experience to be built successfully: mini dimensions (i.e.: dimensions created by

the possible combination of the frequently analyzed or frequently changed attributes of the

rapidly changing large dimensions); large dimensions (i.e.: dimensions with a very large number

of rows or with a large number of attributes), junk dimensions (i.e.: structures that provide a

convenient place to store junk attributes such as transactional codes, flags and/or text attributes

that are unrelated to any particular dimension), etc.
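The type 2 technique can be sketched as follows: instead of overwriting a changed attribute, the current dimension row is closed and a new row with a new surrogate key is added. The column names and the use of effective/expiry dates are illustrative assumptions:

    from datetime import date

    # Minimal type 2 SCD sketch: close the current row, add a new version
    # with a new surrogate key. Column names are hypothetical.
    customer_dim = [
        {"surrogate_key": 1, "customer_id": "C42", "city": "Utrecht",
         "effective": date(2009, 1, 1), "expiry": None},
    ]

    def apply_scd_type2(dim, customer_id, changes, as_of):
        current = next(r for r in dim
                       if r["customer_id"] == customer_id and r["expiry"] is None)
        current["expiry"] = as_of                         # close the old version
        new_row = {**current, **changes,
                   "surrogate_key": max(r["surrogate_key"] for r in dim) + 1,
                   "effective": as_of, "expiry": None}    # open the new version
        dim.append(new_row)

    apply_scd_type2(customer_dim, "C42", {"city": "Amsterdam"}, date(2010, 6, 1))
    for row in customer_dim:
        print(row)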

Maturity Assessment Question(s)

The maturity assessment part for dimensional modelling includes three questions on the most important characteristics of fact and dimension tables. The best approach for designing fact tables is to have a very high percentage of data at a low level of granularity in order to be able to do analysis at whatever level of aggregation is required. Regarding the dimension tables, the implementation of slowly changing dimensions and special dimensions implies advanced knowledge and experience, and is therefore specific to organizations at a higher maturity stage.

3) What percentage of all your fact tables have their granularity at the lowest level possible?

a) Very few fact tables have their granularity at the lowest level possible

b) Few fact tables have their granularity at the lowest level possible

c) Some fact tables have their granularity at the lowest level possible

d) Most fact tables have their granularity at the lowest level possible

e) All fact tables have their granularity at the lowest level possible.

4) To what degree do you design conformed dimensions in your data models?

a) No conformed dimensions

b) Conformed dimensions for few business processes

c) Enterprise-wide standardized conformed dimensions for most business processes; also making use of a high

level design technique such as an enterprise bus matrix

d) Conformed dimensions for some business processes

e) Enterprise-wide standardized conformed dimensions for all business processes.

5) Which answer best describes the current state of your dimension tables modelling?

a) Few dimensions designed; no hierarchies or surrogate keys designed

b) Some dimensions designed with surrogate keys and basic hierarchies (if needed)

c) Most dimensions designed with surrogate keys and basic/complex hierarchies (if needed)

d) Slowly changing dimensions techniques (i.e.: type 2, 3 and more) also designed

e) Besides regular dimensions and slowly changing dimensions techniques, special dimensions are also

designed (e.g.: mini, monster, junk dimensions).

Table 14: Dimensional Modelling Maturity Assessment Questions.


4.2.4 Data Modelling Tool

Data models can be created by simply drawing the models in spreadsheets and documents. However, the better solution is to use a data modelling tool. The main advantages of using a data

modelling tool are (Kimball et al., 2008):

It makes the connection and transition between all the data models levels easier.

It integrates the DW model with other corporate data models.

It helps assure consistency in naming and definition.

It creates good documentation in a variety of useful formats.

It makes metadata management for data modelling easier.

However, the most important benefits of using a data modelling tool refer to making the design itself and

metadata management easier and more efficient.

Maturity Assessment Question(s)

As the usage of a data modelling tool can be a differentiator for an organization developing a DW

solution, we included in our assessment a maturity question derived from the information provided above:

6) Which answer best describes the usage of a data modelling tool in your organization?

a) Level 1 – No data modelling tool

b) Level 2 – Scattered data modelling tools used only for design

c) Level 3 – Scattered data modelling tools used also for maintenance

d) Level 4 – Standardized data modelling tool used only for design

e) Level 5 – Standardized data modelling tool used for design and maintaining metadata.

Table 15: Data Modelling Tool Maturity Assessment Questions.

4.2.5 Data Modelling Standards

DW Standards Overview

Standards in a DW environment are necessary and cover a wide range of objects, processes, and

procedures. Standards range from how to name the fields in the database to how to conduct interviews

with the user departments for requirements definition. Standards not only need to be defined and documented; it is also very important to actually implement them and apply them consistently. The definition of standards would also benefit if a person or a group in the DW team were designated to revise the standards and keep them up-to-date. By consistently applying standards, it will be much easier for the

business users and developers to navigate the complex DW system. Standards also provide a consistent

means for communication. Effective communication must take place among the members of the project

and the users. Standards ensure consistency across the various areas leaving less room for ambiguity.

Therefore, one could say that the importance of standards cannot be overemphasized (Ponniah, 2001).

This is why many companies invest a lot of time and money to prescribe standards for their information

systems and implicitly, for their DW.


As can be seen, standards can be defined and implemented for every part of the DW architecture and

processes and this is why we will include questions regarding the definition and implementation of

standards for the maturity assessment of each of the major components – data modelling, ETL, BI

applications.

Data Modelling Standards

With regard to data modelling, standards are many and diverse. They can be applied to all the data model levels (i.e.: conceptual, logical and physical), and most often standards like naming conventions for the objects and attributes in the data models take on special significance. Other standards here refer to the way one data model is derived from another, the way metadata is documented or how data quality is taken care of in this phase.
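As a small illustration of enforcing such standards, the sketch below checks table names against a hypothetical naming convention (lowercase snake_case with a _dim or _fact suffix); an organization's actual conventions would differ:

    import re

    # Minimal naming convention check; the rule itself is a hypothetical standard.
    NAMING_RULE = re.compile(r"^[a-z][a-z0-9_]*_(dim|fact)$")

    def check_names(table_names):
        return {name: bool(NAMING_RULE.match(name)) for name in table_names}

    print(check_names(["customer_dim", "sales_fact", "SalesFact", "tmp_table"]))
    # {'customer_dim': True, 'sales_fact': True, 'SalesFact': False, 'tmp_table': False}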

Maturity Assessment Question(s)

All the maturity assessment questions related to standards will address general aspects such as the

definition and documentation of standards and their actual implementation. The same principle applies for

data modelling. There is an important distinction between having some standards defined and written

down somewhere and actually following those standards.

7) To what degree have you defined and documented standards (e.g.: naming conventions, metadata, etc.) for your

data models?

a) Very low – No standards defined for data models

b) Low – Solution-dependent standards defined for some of the data models

c) Moderate – Enterprise-wide standards defined for some of the data models

d) High – Enterprise-wide standards defined for most of the data models

e) Very high – Enterprise-wide standards defined for all the data models.

8) To what degree have you implemented standards (e.g.: naming conventions, metadata, etc.) for your data

models?

a) Very low – No standards implemented for data models

b) Low – Solution-dependent standards implemented for some of the data models

c) Moderate – Enterprise-wide standards implemented for some of the data models

d) High – Enterprise-wide standards implemented for most of the data models

e) Very high – Enterprise-wide standards implemented for all the data models.

Table 16: Data Modelling Standards Maturity Assessment Questions.

4.2.6 Data Modelling Metadata Management

Data models usually need a lot of metadata (business and technical) to be documented in order to create

consistency and understandability for both developers and users. A common subset of business metadata

components as they apply to data includes (Moss & Atre, 2003): data names, definitions, relationships,

identifiers, types, lengths, policies, ownership, etc. The standardization of the metadata documentation is

also critical for integration among data models. Hence, the maturity question depicted below.

Maturity Assessment Question(s)

9) To what degree have you documented the metadata (e.g.: definitions, business rules, main values, data quality,


etc.) in your data models?

a) Very low – No documentation for any data models

b) Low – Non standardized documentation for some of the data models

c) Moderate – Standardized documentation for some of the data models

d) High – Standardized documentation for most of the data models

e) Very high – Standardized documentation for all the data models.

Table 17: Data Modelling Metadata Management Maturity Assessment Questions.

4.3 Extract – Transform – Load (ETL)

4.3.1 What is ETL?

The Extract-Transform-Load (ETL) process is part of the DW back room component. As the name shows,

the ETL process involves the following activities:

extracting data from outside sources;

transforming data to fit the target's requirements;

loading data into the target database.

According to (Kimball et al., 2008), there is also a fourth component of the ETL system, called managing the ETL environment. This component is very important because, in order for the ETL processes to run consistently to completion and be available when needed, they have to be managed and maintained. These activities are also part of the DW maintenance and monitoring processes, but there are some important technical components that need to be implemented, which is why we will also elaborate on it in this paragraph. Moreover, (Kimball et al., 2008) propose 34 subsystems that form the ETL architecture and divide them among the main ETL activities (i.e.: extract, transform, load and manage).

However, even if the name seems to be understood by everyone, it is hard to explain why the ETL system is so complex and resource demanding (Kimball et al., 2008). Easily, 60 to 80 percent of the time and effort

of developing a DW project is devoted to the ETL system (Nagabhushana, 2006). Building an ETL

system is very challenging because many outside constraints put pressure on the ETL design: business

requirements, source data systems, budget, processing windows and available staff skills. Hence,

designing ETL processes is extremely complex, often prone to failure, and time consuming (Simitsis et

al., 2005). However, since it is extensively recognized that the design and maintenance of the ETL

processes are a key factor in the success of a DW project (March & Hevner, 2007; Solomon, 2005),

organizations put a lot of effort into implementing a powerful ETL system. In order to formulate the

chosen maturity questions for this category, we would like to first give a short overview of each ETL component.

4.3.2 Extract

The extraction system is the first component of the ETL architecture. It addresses the issues of understanding the source data, extracting the data and transferring it to the DW environment, where the ETL system can operate on it independently of the operational systems (Kimball et al., 2008). Depending on the DW


architecture, the extracted data may go directly into the DW or into a data staging area. Extraction essentially boils down to two questions (Loshin, 2003):

What data should be extracted?

How should that data be extracted?

The answer to the first question essentially depends on which results clients expect to see in their BI applications. However, the answer is not that simple, as it also depends on what source data is available and on the data model that the architects had previously developed. The answer to the second question may depend on

the scale of the project, the number and disparity of the data sources, and how far into the implementation

the developers are. Extraction can be as simple as a collection of simple SQL queries or as complex as to

require ad hoc, specially designed programs written in a proprietary programming language (Loshin,

2003). The other alternative is to use tools to help automate the process and obtain better results.

Depending on the organization and the data warehouse project, data can be extracted from various source

systems.

Moreover, according to (Kimball et al., 2008), there are three subsystems that support the extraction

process:

Data profiling system – it does the technical analysis of data to describe its content, consistency

and structure. It focuses on the instance analysis of individual attributes, providing information

such as data type, length, value range, uniqueness, occurrence of null values, typical string

pattern, etc. (Rahm & Hai Do, 2000). The profiling step protects the ETL team from dealing with dirty data and provides them with guidance to set expectations regarding realistic development schedules, limitations in the source data and the need to invest in better source data capture practices (a small profiling sketch follows this list).

Change data capture (CDC) system – it offers the capability to transfer only the source data that has changed since the last load. This is not important for the first, historic load, but it will prove very useful from that point forward. Implementing the CDC system is not an easy task. For more information on how to capture source data changes, see (Kimball et al., 2008).

Extract system – This is a fundamental component of the ETL architecture and it refers to the data

extraction itself, whether it is done by writing scripts or by using a tool. Sometimes, data has to be extracted from only one system, but most of the time each source might be in a different system. There are two primary methods for getting data from a source system: as a file or as a stream, in which case the extract is constructed as a single process from source to target. Two other important aspects that need to be taken into consideration in the extraction phase are: data compression – important when large amounts of data have to be transferred through a public network; and data encryption – important for security reasons.
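A minimal column profiling sketch along the lines described above could look as follows; it reports the inferred type, null count, number of distinct values and value range for an attribute, using hypothetical sample records:

    # Minimal column profiling sketch: per-attribute instance analysis.
    def profile(records, column):
        values = [r.get(column) for r in records]
        non_null = [v for v in values if v is not None]
        return {
            "column": column,
            "inferred_type": type(non_null[0]).__name__ if non_null else "unknown",
            "null_count": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }

    sample = [{"age": 34}, {"age": 51}, {"age": None}, {"age": 34}]   # hypothetical data
    print(profile(sample, "age"))
    # {'column': 'age', 'inferred_type': 'int', 'null_count': 1, 'distinct': 2, 'min': 34, 'max': 51}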

4.3.3 Transform

The transformation step is where the ETL system adds value to the data through the changes it makes.

Usually, this phase includes cleaning and transforming the data according to the business rules and

standards that have been established for the DW.


Data Cleaning

Data cleaning, also called data cleansing or scrubbing, is part of the complex and important data quality

processes. It deals with detecting and removing errors and inconsistencies from data in order to improve

their quality (Rahm & Hai Do, 2000). As DWs are used for decision making, the correctness of their data

is very important to avoid wrong results. "Dirty data" (e.g.: duplicates, missing data) will produce incorrect statistics, proving the concept of "garbage in, garbage out". Hence, due to the wide range of

possible data inconsistencies and large data volume, data cleaning is considered to be one of the biggest

problems in data warehousing. However, many organizations do not cleanse their data and believe that

this is the responsibility of the source systems. Qualitative or accurate data means that data are (Kimball

& Caserta, 2004):

correct – the values and descriptions in data describe their associated objects truthfully and

faithfully;

unambiguous – the values and descriptions in data can be taken to have only one meaning;

consistent – the values and descriptions in data use one constant notational convention to

convey their meaning;

complete – the individual values and descriptions in data are defined (not null) for each instance;

and the aggregate number of records is complete.

Even if most often data cleansing is done manually or by low-level programs that are difficult to write and

maintain, data quality tools are available to enhance the quality of the data at several stages in the process

of developing a data warehouse. Cleansing tools can be useful in automating many of the activities that

are involved in cleansing the data: parsing, standardization, correction, matching and transformation.

A part of the data quality process is represented by quality screens or tests that act as diagnostic filters in

the data flow pipelines (Kimball et al., 2008). What is important here is the action taken when an error is

thrown: 1) halting the process; 2) sending the offending record(s) to a suspense file for later processing; 3) merely tagging the data and passing it through to the next step in the pipeline. The last choice is of course the best one, as it offers the possibility of taking care of data quality without aborting the job (see the sketch below).

Two other deliverables that can be of help in the data cleaning activities and are usually hard to

implement are (Kimball & Caserta, 2004):

the error-event schema – captures all error events that are vital inputs to data quality

improvement.

the audit dimension assembler – attaches metadata to each fact table as a dimension. This

metadata is available to BI applications for visibility into data quality.
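The sketch below illustrates a set of quality screens that tag failing records and capture error events, the third option above; the screen names and record layout are hypothetical:

    # Minimal quality screen sketch: each screen is a predicate; failing records
    # are tagged and passed through while the error events are captured.
    screens = {
        "non_negative_amount": lambda r: r["amount"] >= 0,
        "customer_present":    lambda r: r.get("customer_id") is not None,
    }

    def run_screens(records, error_events):
        for record in records:
            failed = [name for name, test in screens.items() if not test(record)]
            record["quality_tags"] = failed          # tag, do not halt the job
            for name in failed:
                error_events.append({"screen": name, "record": record})
            yield record

    events = []
    cleaned = list(run_screens([{"amount": -5, "customer_id": "C1"},
                                {"amount": 10, "customer_id": None}], events))
    print(len(cleaned), "records passed through;", len(events), "error events captured")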

Maturity Assessment Question(s)

Data quality is very important for data warehousing because if users do not trust the data, they will not use the DW environment, which will then be considered a failure. At the same time, it is also one of the biggest DW challenges (Ponniah, 2001), as high data quality is very hard to achieve. Of course, when

taking a first look at the DW, it is difficult to assess the actual data quality. This is why we included a

question that checks whether a specific organization addresses data quality by identifying and solving


data quality issues. The usage of data quality tools is of course a strong point and an organization that

uses them will definitely get better results.

1) Which answer best describes the data quality system implemented for your ETL?

a) Daily automation: yes / no; Specific data quality tools: yes / no; Identifying data quality issues: yes;

Solving data quality issues: no

b) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: yes; Solving data

quality issues: no

c) Daily automation: yes / no; Specific data quality tools: yes / no; Identifying data quality issues: yes;

Solving data quality issues: yes

d) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: no; Solving data

quality issues: no

e) Daily automation: yes; Specific data quality tools: yes; Identifying data quality issues: yes; Solving data

quality issues: yes.

Table 18: Data Quality Maturity Assessment Questions.

Data Transformation

Besides data cleaning, the transformation system literally transforms the data in accordance with the

business rules and standards that have been established for the DW. Typical transformations that are implemented in a DW are listed below (Nagabhushana, 2006); a few of them are sketched after the list:

format changes - change data from different sources to a standard set of formats for the DW;

de-duplication – compare records from multiple sources to identify duplicates and merge them

into a unified one;

splitting-up fields/integrating fields – split-up a data item from the source systems into one or

more fields in the DW/integrate two or more fields from the operational systems into a DW field;

derived values – compute derived values using agreed formulas (e.g.: averages, totals, etc.);

aggregation – create aggregate records based on the atomic DW data;

other transformations such as filtering, sorting, joining data from multiple sources, transposing or pivoting, etc.
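A few of these typical transformations are sketched below in Python; the field names and formats are hypothetical:

    from datetime import datetime

    # Minimal sketches of a few typical transformations from the list above.
    def format_change(record):                       # standardize a date format
        record["order_date"] = datetime.strptime(record["order_date"],
                                                 "%d-%m-%Y").date().isoformat()
        return record

    def split_name(record):                          # splitting up a field
        record["first_name"], record["last_name"] = record.pop("full_name").split(" ", 1)
        return record

    def derive_total(record):                        # derived value via an agreed formula
        record["total"] = record["quantity"] * record["unit_price"]
        return record

    row = {"order_date": "31-12-2009", "full_name": "Jan de Vries",
           "quantity": 3, "unit_price": 9.99}
    for step in (format_change, split_name, derive_total):
        row = step(row)
    print(row)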

4.3.4 Load

The DW load system takes the load images created by the extraction and transformation subsystems and

loads these images directly into the DW. A good load system should be able to perform the following activities (Kimball et al., 2008; Nagabhushana, 2006), the first of which is sketched after the list:

generate surrogate keys – create standard keys for the DW separate from the source system keys;

manage slowly changing dimensions (SCDs);

handle late arriving data – apply special modifications to the standard processing procedures to

deal with late-arriving fact and dimension data;

drop indexes on the DW when new records are inserted;

load dimension records;

load fact records;

compute aggregate records using base fact and dimension records;


rebuild or regenerate indexes once all loads are complete;

log all referential integrity violations during the load process.
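Surrogate key generation can be sketched as follows: natural keys from the source systems are mapped to meaningless sequential integers that are independent of those systems. The class and column names are hypothetical:

    import itertools

    # Minimal surrogate key generation sketch for a dimension load.
    class SurrogateKeyGenerator:
        def __init__(self):
            self._counter = itertools.count(start=1)
            self._mapping = {}                       # natural key -> surrogate key

        def key_for(self, natural_key):
            if natural_key not in self._mapping:
                self._mapping[natural_key] = next(self._counter)
            return self._mapping[natural_key]

    keys = SurrogateKeyGenerator()
    for source_row in [{"customer_id": "C42"}, {"customer_id": "C7"},
                       {"customer_id": "C42"}]:
        print(source_row["customer_id"], "->", keys.key_for(source_row["customer_id"]))
    # C42 -> 1, C7 -> 2, C42 -> 1 (the same natural key keeps the same surrogate key)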

Maturity Assessment Question(s)

The maturity assessment question for this category aims to give an overview of the general complexity and performance of the ETL. Once again, we are not trying to judge how certain activities are done, but only

if they exist. As mentioned before, the latest trend in this field is real-time data warehousing which puts a

lot of pressure on ETL. Hence, the highest level of maturity for ETL involves real-time capabilities.

2) Which answer best describes the complexity of your ETL?

a) Simple ETL that just extracts and loads data into the data warehouse

b) Basic ETL with simple transformations such as: format changes, sorting, filtering, joining, deriving new

calculated values, aggregation, etc and surrogate key generator

c) Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data capture system,

de-duplication and matching system, data quality system

d) More advanced ETL capabilities: error event table creation, audit dimension creation, late arriving data

handler, hierarchy manager, special dimensions manager

e) Optimized ETL for a real time DW (real-time ETL capabilities).

Table 19: ETL Complexity Maturity Assessment Question.

4.3.5 Manage

In order for the DW project to be a success, the ETL processes need to be reliable, available and manageable. This is the reason why (Kimball et al., 2008) consider the management subsystem to be the fourth component of the ETL system. They propose 13 subsystems to be included in this ETL component.

Some of them can also be found in (Nagabhushana, 2006; Chauduri & Dayal, 1997), but they are not

grouped into a separate subsystem of the ETL process. The most important capabilities for a successful management of the ETL system are listed below (a scheduling-with-restart sketch follows the list):

an ETL job scheduler;

a backup system;

a recovery and restart system – it can be manual or automatic;

a workflow monitor – ensures that the ETL processes are operating efficiently and gathers

statistics regarding ETL execution or infrastructure performance.

a version control and migration system – helps archiving and recovering all the logic and

metadata of the ETL process and then migrate this information to another environment (for

example, from development to test and on to production).

a data lineage and dependency system – identifies the source of a data element and all

intermediate locations and transformations for that data element.

a security system – security is an important consideration for the ETL system and the

recommended method is role-based security on all data and metadata in the ETL system;

a metadata repository management.
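As a rough illustration of the scheduler and restart/recovery capabilities listed above, the Python sketch below runs ETL jobs in dependency order and, on restart, resumes from the last successful checkpoint. The job names and the checkpoint file are hypothetical examples, not part of the model.

```python
# Minimal sketch of an ETL job scheduler with checkpoint-based restart.
# Job names and the checkpoint file are hypothetical examples.
import json
import os

CHECKPOINT_FILE = "etl_checkpoint.json"


def extract():   print("extracting source data")
def transform(): print("transforming data")
def load():      print("loading the data warehouse")


JOBS = [("extract", extract), ("transform", transform), ("load", load)]


def run_batch():
    done = set()
    if os.path.exists(CHECKPOINT_FILE):            # recover earlier progress
        with open(CHECKPOINT_FILE) as f:
            done = set(json.load(f))
    for name, job in JOBS:                         # jobs run in dependency order
        if name in done:
            continue                               # restart skips finished jobs
        job()
        done.add(name)
        with open(CHECKPOINT_FILE, "w") as f:      # checkpoint after each job
            json.dump(sorted(done), f)
    os.remove(CHECKPOINT_FILE)                     # batch complete, reset


run_batch()
```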

Maturity Assessment Question(s)

In order to assess the maturity of the management and monitoring of ETL, we separated the necessary activities into two categories: simple monitoring, which is usually done first; and advanced monitoring, which is usually implemented by an organization that already has some experience in this field. A critical aspect of ETL that can really make the difference is the restart and recovery system. An organization usually evolves from not having a restart and recovery system at all to a completely automatic restart and recovery system. However, the latter is very complex and error-prone and, therefore, very hard to achieve.

3) Which answer best describes the management and monitoring of your ETL?
(Definitions:
Simple monitoring (i.e.: ETL workflow monitor – statistics regarding ETL execution such as pending, running, completed and suspended jobs; MB processed per second; summaries of errors, etc.);
Advanced monitoring (i.e.: ETL workflow monitor – statistics on infrastructure performance like CPU usage, memory allocation, database performance, server utilization during ETL; job scheduler – time or event based ETL execution, event notification; data lineage and analyzer system))
a) Restart and recovery system: no; Simple monitoring: no; Advanced monitoring: no; Real-time monitoring: no
b) Restart and recovery system: no; Simple monitoring: yes; Advanced monitoring: no; Real-time monitoring: no
c) Manual restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes/no; Real-time monitoring: no
d) Manual and automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes/no; Real-time monitoring: no
e) Completely automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: yes.

Table 20: ETL Management and Monitoring Maturity Assessment Question.

4.3.6 ETL Tools

There is a constant debate whether an organization should deploy custom-coded ETL solutions or should

buy an ETL tool suite (Kimball & Caserta, 2004). Using hand-coded ETL proves helpful sometimes

because it offers: object-oriented techniques that can make all the transformations consistent for error

reporting, validation and metadata updates; metadata can be more directly managed; in-house

programmers might be available; unlimited flexibility. However, even if programmers can set up ETL

processes using hand-coded ETL, building such processes from scratch can become complex.

That is the reason why companies are buying more and more often ETL tools for this purpose. There are

some advantages for buying an ETL tool such as: simpler, faster, cheaper development; users without

professional programming skills can use them effectively; integrated metadata repository; automated

generated metadata at every step of the ETL process; in-line encryption and compression capabilities;

good performance for very large data sets; possibility of augmenting the ETL tool with selected

processing modules hand coded in an underlying programming language.

Maturity Assessment Question(s)

As already mentioned, ETL can be built using a programming language or an ETL tool, the latter generally being the better solution. A company that uses hand-coded ETL usually does not have a very complex ETL process, which indicates a low level of maturity regarding ETL capabilities. However, in both cases, some standard scripts are sometimes needed, which can increase the performance of ETL. From the expert interviews we had and from the exploratory case study we did, we came up with another possibility of generating ETL: complete ETL generated from metadata. This is rarely applied in practice nowadays, but it is the desired solution for the future.

4) Which answer best describes the usage of an ETL tool in your organization?
a) Level 1 – Only hand-coded ETL
b) Level 2 – Hand-coded ETL and some standard scripts
c) Level 3 – ETL tool(s) for all the ETL design and generation
d) Level 4 – Standardized ETL tool and some standard scripts
e) Level 5 – Complete ETL generated from metadata.

Table 21: ETL Tools Maturity Assessment Question.
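To illustrate what Level 5 (ETL generated from metadata) could look like in its simplest form, the Python sketch below turns a declarative metadata mapping into an executable SQL statement. The mapping, table and column names are hypothetical examples chosen for the illustration.

```python
# Minimal sketch of ETL generated from metadata: a declarative mapping
# is rendered into SQL. The mapping below is a hypothetical example.
MAPPING = {
    "target": "dw.fact_sales",
    "source": "src.orders",
    "columns": {            # target column -> source expression
        "order_sk": "order_id",
        "amount":   "quantity * unit_price",
        "order_dt": "order_date",
    },
}


def generate_insert(mapping):
    """Render an INSERT ... SELECT statement from the metadata mapping."""
    targets = ", ".join(mapping["columns"])
    sources = ", ".join(mapping["columns"].values())
    return (f"INSERT INTO {mapping['target']} ({targets})\n"
            f"SELECT {sources}\nFROM {mapping['source']};")


print(generate_insert(MAPPING))
```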

4.3.7 ETL Metadata Management

ETL is responsible for the creation and use of much of the metadata describing the DW environment. Therefore, it is important to capture and manage all possible types of metadata for ETL: business, technical and process metadata. Nevertheless, not many organizations manage to do this and thus, we decided to include the following maturity question regarding ETL metadata in our assessment.

Maturity Assessment Question(s)

5) To what degree is metadata management implemented for your ETL?
a) Very low – No metadata management
b) Low – Business and technical metadata for some ETL
c) Moderate – Business and technical metadata for all ETL
d) High – Process metadata is also managed for some ETL
e) Very high – All types of metadata are managed for all ETL.

Table 22: ETL Metadata Management Maturity Assessment Question.
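A minimal illustration of the three metadata types for a single ETL job is sketched below; the field names and values are assumptions chosen for the example, not prescribed by the model.

```python
# Sketch of business, technical and process metadata for one ETL job.
# Field names and values are illustrative assumptions.
etl_job_metadata = {
    "business": {                      # what the data means to the business
        "subject_area": "Sales",
        "definition": "Daily revenue per customer",
        "owner": "Sales department",
    },
    "technical": {                     # how the job is implemented
        "source_table": "src.orders",
        "target_table": "dw.fact_sales",
        "transformation": "quantity * unit_price",
    },
    "process": {                       # what happened when the job ran
        "last_run": "2010-08-01T02:00:00",
        "rows_loaded": 125000,
        "status": "succeeded",
    },
}
```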

4.3.8 ETL Standards

A general overview of the standards used in data warehousing was given in 4.2.5. Standards specific to ETL are related to: naming conventions, set-up standards, the recovery and restart system, etc. The maturity questions and stages below are straightforward.

Maturity Assessment Question(s)

6) To what degree have you defined and documented standards (e.g.: naming conventions, set-up standards, recovery process, etc.) for your ETL?
a) Very low – No standards defined
b) Low – Few standards defined for ETL

c) Moderate – Some standards defined for ETL
d) High – Most standards defined for ETL
e) Very high – All the standards defined for ETL.

7) To what degree have you implemented standards (e.g.: naming conventions, set-up standards, recovery process, etc.) for your ETL?
a) Very low – No standards implemented
b) Low – Few standards implemented for ETL
c) Moderate – Some standards implemented for ETL
d) High – Most standards implemented for ETL
e) Very high – All the standards implemented for ETL.

Table 23: ETL Standards Maturity Assessment Questions.

4.4 BI Applications

4.4.1 What are BI Applications?

BI applications are part of the front-room component of the DW architecture (Kimball et al., 2008) and are sometimes referred to as "front-end" tools (Chaudhuri & Dayal, 1997). They are what the end users see and, hence, are very important if a DW is to be considered successful. According to (March & Hevner, 2007), a crucial point for achieving DW implementation success is the selection and implementation of appropriate end-user analysis tools, because the business benefits of BI are only gained when the system is adopted by its intended end users. This is why BI applications must meet several design requirements (Kimball et al., 2008):

- be correct – BI applications must provide accurate results;
- perform well – queries should have a satisfactory response time;
- be easy to use – BI applications should be customized for each category of users;
- have a nice interface – BI applications should be clear and have an attractive design;
- be a long-term investment – BI applications must be properly documented, maintained, enhanced and extended.

4.4.2 Types of BI Applications

Over time, BI applications have evolved from (simple) predefined reporting to (advanced) data-mining tools to fulfill users' analytical needs (Breitner, 1997). According to (Azvine et al., 2006), traditional BI applications fall into the following categories, sorted by ascending complexity:

- report what has happened – standard reporting and query applications (i.e.: static/preformatted reports; interactive/parameter-driven reports);
- analyze and understand why it has happened – ad-hoc reporting and online analytical processing (OLAP); visualization applications (i.e.: dashboards, scorecards);
- predict what will happen – predictive analytics (i.e.: data and text mining).

However, in the last couple of years, due to the development of real-time data warehousing, a new category of BI applications has emerged: operational BI and closed-loop applications (Kimball et al., 2008). As the complexity of the BI applications contributes to the maturity of a DW environment, we will include a maturity question regarding this aspect and, therefore, give a short overview of each type of BI application in the remainder of this paragraph.

Standard Reporting and Query Applications

This category of BI applications is usually considered to be the entry-level BI tooling, providing end users with a core set of information about what is happening in a particular area of the business (Kimball et al., 2008). Standard reports are the reports the majority of non-technical business users look at every day. They represent an easy-to-use means to get the needed information with a very short learning curve. As presented above, two types of standard reporting can be distinguished, based on the level of data interactivity:

- static/preformatted reporting – the most basic form of reporting, which can be seen as a repeatable, pre-calculated and non-interactive request for information. It is characterized by rigid evaluations of business facts presented in a standard format on a routine basis to a targeted audience (usually casual users) (Eckerson, 2009).
- interactive/parameter-driven reporting – this kind of reporting offers the possibility of creating reports with dynamic content. End users now have some flexibility, as they can choose from a predefined set of parameters to filter report content to their individual preferences and needs (Turban et al., 2007). Once users get the view of the data they want, they can save the view as a report and schedule it to run on a regular basis. This allows report designers to create reports that can serve multiple categories of end users.

Analytic Applications

Analytic applications are more complex than standard reports. Although the latter offer the possibility of creating reports of all shapes and detail levels, in many cases additional information is required (Varga & Vukovic, 2008). This places higher requirements on the DW architecture and also on the end users' IT and analytical skills. Analytic applications offer the possibility of ad-hoc (or online) data access and complex analysis through a user-friendly, interface-based system. In this way, users can formulate their own queries directly against the data without needing in-depth knowledge of SQL or other database query languages. Probably the best-known analytic technique is Online Analytical Processing (OLAP), a term coined by E.F. Codd in 1993.

OLAP interfaces provide a fairly simple, yet extremely flexible navigation and presentation environment that enables end users to gain insight into data through fast, dynamic, consistent, interactive access to a wide variety of possible views of information. This is possible because the data is multidimensional, being structured as a cube designed with dimensions and facts. OLAP users can navigate through the data cube using several operations, such as (Breitner, 1997): roll-up (increasing the level of aggregation) and drill-down (decreasing the level of aggregation or increasing detail) along one or more dimension hierarchies, slice and dice (selection and projection to a certain layer or sub-cube) or pivot (re-orienting the multidimensional view of data). Although the data cube is a simple structure, the large number of alternatives, including many numeric facts and dimensions and many hierarchies or abstraction levels, combine to form an immense universe of queries that can be explored via an OLAP interface (Tremblay et al., 2007). Through OLAP, users can generate fast reports regardless of database size and complexity, and they can define new ad-hoc calculations in any desired way without having advanced knowledge of SQL.
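As a small illustration of these cube operations, the Python sketch below uses pandas to roll up, drill down, slice and pivot a toy sales cube; the data and column names are invented for the example.

```python
# Toy illustration of OLAP-style operations (roll-up, drill-down, slice, pivot)
# using pandas. The data and column names are invented for the example.
import pandas as pd

sales = pd.DataFrame({
    "year":    [2009, 2009, 2010, 2010],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "region":  ["North", "South", "North", "South"],
    "amount":  [100, 150, 120, 180],
})

# Roll-up: increase aggregation from (year, quarter) to year.
by_year = sales.groupby("year")["amount"].sum()

# Drill-down: decrease aggregation by adding the region dimension.
by_year_region = sales.groupby(["year", "region"])["amount"].sum()

# Slice: select one layer of the cube (year == 2010).
slice_2010 = sales[sales["year"] == 2010]

# Pivot: re-orient the view, quarters as rows and regions as columns.
pivoted = sales.pivot_table(index="quarter", columns="region",
                            values="amount", aggfunc="sum")

print(by_year, by_year_region, slice_2010, pivoted, sep="\n\n")
```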

Visualization Applications

Due to the flood of data available from information systems, standard reporting and analytic applications are often not enough for business analysts and decision-makers to make sense of the knowledge they contain. This is the reason why, especially when dealing with large amounts of data, visualization techniques can be very useful to facilitate data analysis. Information visualization is defined by (Chung et al., 2005) as "a process of constructing a visual presentation of abstract quantitative data. The characteristics of visual perception enable humans to recognize patterns, trends and anomalies inherent in the data with little effort in a visual display." The main visualization applications used in BI are dashboards and scorecards. According to (Eckerson, 2006), there are three types of performance dashboards:

- operational dashboards – used to track core operational processes;
- tactical dashboards – used by managers and analysts to track and analyze departmental activities, processes and projects;
- strategic dashboards – used by executives and staff to chart their progress toward achieving strategic objectives.

(Eckerson, 2006) states that dashboards belong to the first two categories, whereas scorecards are used at the strategic level.

Furthermore, it can be said that dashboards present various key performance indicators (KPIs) (i.e.: key measures crucial to business strategy that must link to the organization's performance) in one screen view with intuitive displays of information (e.g.: tables, graphs, charts, dials, gauges, etc.), similar to an automobile control panel. Dashboards support status reporting and alert generation across multiple data sources at a high level, but also allow drilling down to more specific data (Kimball et al., 2008).

As said above, scorecards are essentially dashboards developed at a strategic level. They help executives monitor their progress toward achieving strategic objectives. A scorecard can track an organization's performance by measuring business activity at a summarized level and comparing these values to predefined targets. In this way, executives can determine what actions should be taken in order to improve performance. There are several types of scorecards, but the most implemented one is the balanced scorecard defined by (Kaplan & Norton, 1992).
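A minimal sketch of the scorecard idea, comparing summarized KPI values against predefined targets, is given below; the KPIs, values and traffic-light thresholds are invented for the example.

```python
# Sketch of scorecard logic: compare KPI actuals to predefined targets.
# The KPIs, values and thresholds are invented for the example.
kpis = [
    # (name, actual, target)
    ("Revenue growth (%)",     6.5,  8.0),
    ("Customer satisfaction",  4.2,  4.0),
    ("On-time delivery (%)",  92.0, 95.0),
]


def status(actual, target, tolerance=0.9):
    """Traffic-light status relative to the target."""
    if actual >= target:
        return "green"
    if actual >= tolerance * target:
        return "amber"                # close to target
    return "red"                      # well below target


for name, actual, target in kpis:
    print(f"{name:25} actual={actual:6.1f} target={target:6.1f} "
          f"-> {status(actual, target)}")
```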

Data and Text Mining Applications (Predictive Analytics)

Data and text mining applications are sophisticated BI applications that involve advanced methods for data analysis. Data mining is a process that requires a lot of data, which needs to be in a reliable state before it can be mined. A newer technique is text mining, which refers to the process of deriving high-quality information from text. Data and text mining can also be found under the name of knowledge discovery or the newer term, predictive analytics. Data mining is defined by (Holsheimer & Siebes, 1994) as "the search for relationships and global patterns that exist in large databases, but are 'hidden' among the vast amount of data"; these relationships can then offer valuable knowledge about the database and the objects in the database. However, other researchers such as (Fayyad et al., 1996) consider that knowledge discovery actually refers to the overall process of discovering useful knowledge from data, whereas data mining refers to a particular step in this process that consists of "applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data" (Fayyad et al., 1996).

Data mining relies on known techniques from fields like machine learning, pattern recognition, and statistics. It also uses a variety of methods, such as (Fayyad et al., 1996): classification, regression, clustering, summarization, dependency modelling, and change and deviation detection.
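To make one of these methods concrete, the sketch below clusters a handful of customer records with scikit-learn's k-means; the data is invented and the choice of two clusters is arbitrary.

```python
# Tiny clustering example (one of the data mining methods listed above)
# using scikit-learn's k-means. The data and k=2 are invented for the example.
import numpy as np
from sklearn.cluster import KMeans

# Each row: (annual purchases in kEUR, number of orders)
customers = np.array([[10, 2], [12, 3], [11, 2],      # low-volume group
                      [95, 40], [90, 38], [99, 42]])  # high-volume group

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print("cluster labels:", model.labels_)
print("cluster centers:", model.cluster_centers_)
```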

Operational BI and Closed-loop Applications

This category of applications is part of the real-time data warehousing requirement. It includes the use of applications that are more sophisticated than typical operational reports, but leverage the rich historical context across multiple business processes available in the DW to guide operational decision making. These applications also frequently include transactional interfaces back to the source systems. The goal of operational BI applications is to reduce the analysis latency – the time it takes to inform the person in charge of data analysis that new data has to be analyzed, the time needed to choose appropriate analysis models, and the time to process the data and present the results (Seufert & Schiefer, 2005). Sometimes, these applications may be produced by accessing live operational data. In other cases, when a certain degree of data latency can be tolerated, the reports are produced using the information collected by the (near) real-time DW. Hence, in order to get accurate operational results, the activities and processes involved in a DW project have to be optimized.

Maturity Assessment Question(s)

As can be seen from the short overview of BI applications, the types of BI applications supported by the DW environment are an important indicator of its maturity. For example, an organization that develops predictive analytics certainly has experience in developing less complex applications such as ad-hoc reports or visualization applications. The highest level of maturity refers to the development of closed-loop and operational (real-time) BI applications, as this is the latest trend in this field and not many organizations have the necessary skills and experience to develop them. As user requirements can change very often and the time to deliver the updated BI applications is rather short, a characteristic that can act as a differentiator is the usage of standardized objects (e.g.: KPIs, metrics, attributes, templates, etc.). This being said, the maturity questions and stages can be seen below.

1) Which types of BI applications best describe the highest level purpose of your DW environment?
a) Level 1 – Static and parameter-driven reports and query applications
b) Level 2 – Ad-hoc reporting; online analytical processing (OLAP)
c) Level 3 – Visualization techniques: dashboards and scorecards
d) Level 4 – Predictive analytics: data and text mining; alerts
e) Level 5 – Closed-loop BI applications; real-time BI applications.

2) To what degree are standardized objects (e.g.: KPIs, metrics, attributes, templates) used in your BI applications?
a) Very low – Objects defined for every BI application
b) Low – Some reusable objects for similar BI applications
c) Moderate – Some standard objects and templates for similar BI applications
d) High – Most similar BI applications use standard objects and templates
e) Very high – All similar BI applications use standard objects and templates.

Table 24: BI Applications Maturity Assessment Questions.

4.4.3 BI Applications Delivery Method

As end users are interested only in the results they get from the BI applications, the ease of accessing and delivering these results is critical for the success of the DW solution. The main BI applications delivery methods are:

- Physically (e.g.: on paper) or electronically (e.g.: by e-mail) delivered reports. Even if this method is easy to implement, it is the least mature and efficient way of delivering BI applications. Reports can be delivered manually or automatically.
- Direct tool-based interface. This is a more evolved delivery method, as it offers a better interface for users who want to access their reports. It involves developing a set of reports and providing them to the users directly through the standard data access tool interface (Kimball et al., 2008). However, there might be some integration or accessibility problems if an organization uses multiple BI tools.
- A BI portal. Lately, the Web has become a popular environment for BI applications. The result is a new delivery method, the BI portal, which is also the most evolved and the most difficult to implement and maintain. A BI portal gives users a well-organized, useful, easily understood place to find the tools and information they need (Kimball et al., 2008; Ponniah, 2001). Besides the structured BI applications, the BI portal should also offer functions such as an information center and help, a discussion forum, alerting, a metadata browser, etc. A successful BI portal also needs to be highly interactive and always up-to-date.

Maturity Assessment Question(s)

From the information presented on BI applications delivery methods, the maturity question we created for assessing this characteristic is straightforward.

3) Which BI applications delivery method best describes the highest level purpose of your DW?
a) Level 1 – Reports are delivered manually, physically (e.g.: on paper) or electronically (e.g.: by e-mail)
b) Level 2 – Reports are delivered automatically by e-mail
c) Level 3 – Direct tool-based interface
d) Level 4 – A BI portal with basic functions: subscriptions, discussion forum, alerting
e) Level 5 – Highly interactive, business process oriented, up-to-date portal (no differentiation between operational and BI portals).

Table 25: BI Applications Delivery Method Maturity Assessment Question.

4.4.4 BI Applications Tools

As we saw for data modelling and ETL, the usage of tool(s) can really make the difference between organizations. This is the reason why we decided to also include a question regarding this aspect for BI applications. After the expert interviews, we decided that a very low maturity level is represented by the usage of a different BI tool for each data mart, whereas the highest maturity stage is reached when there is one standardized tool for mainstream BI applications (i.e.: reporting and visualization applications, which are most often developed) and one for specific BI applications (i.e.: data mining and financial analysis, which are harder to implement and usually specific to each department).

Maturity Assessment Question(s)

4) Which answer best describes your current BI tool usage?
a) Level 1 – BI tool related to the data mart
b) Level 2 – More than two tools for mainstream BI (i.e.: reporting and visualization applications)
c) Level 3 – One tool recommended for mainstream BI, but each department can use its own tool
d) Level 4 – One standardized tool for mainstream BI, but each department can use its own tool for specific BI applications (i.e.: data mining, financial analysis, etc.)
e) Level 5 – One standardized tool for mainstream BI and one standardized tool for specific BI applications.

Table 26: BI Tools Maturity Assessment Question.

4.4.5 BI Applications Metadata Management

As BI applications are what the end user sees, an important aspect is the accessibility of metadata. An overview of how this can be achieved was offered in 4.1.4. An organization can evolve from showing no metadata to users to completely integrating metadata with the BI applications (e.g.: metadata can be accessed with one button push on the attributes).

Maturity Assessment Question(s)

5) Which answer best describes the metadata accessibility to users?
a) Very low – No metadata available
b) Low – Some incomplete metadata documents that users ask for periodically
c) Moderate – Complete, up-to-date metadata documents sent to users periodically or available on the intranet
d) High – Metadata is always available through a metadata management tool, separate from the BI tool
e) Very high – Complete integration of metadata with the BI applications (e.g.: metadata can be accessed with one button push on the attributes, etc.).

Table 27: BI Applications Metadata Management Maturity Assessment Question.

4.4.6 BI Applications Standards

A general overview of the standards used in data warehousing was given in 4.2.5. Standards specific to BI applications include: naming conventions, generic transformations, the logical structure of attributes and measures, etc. Once again, we will not assess which standards are defined or implemented, but whether this is done. The maturity questions and stages below are straightforward.

Maturity Assessment Question(s)

6) To what degree have you defined and documented standards (e.g.: naming conventions, generic transformations, logical structure of attributes and measures) for your BI applications?
a) No standards defined
b) Few standards defined for BI applications
c) Some standards defined for BI applications
d) Most standards defined for BI applications
e) All the standards defined for BI applications.

7) To what degree have you implemented standards (e.g.: naming conventions, generic transformations, logical structure of attributes and measures) for your BI applications?
a) No standards implemented
b) Few standards implemented for BI applications
c) Some standards implemented for BI applications
d) Most standards implemented for BI applications
e) All the standards implemented for BI applications.

Table 28: BI Applications Standards Maturity Assessment Questions.

4.5 Summary

In this chapter we took a closer look at the DW technical solution category and its main sub-categories: general architecture and infrastructure, data modelling, ETL and BI applications. For each of them, we identified the most important characteristics that might influence the maturity of the DW solution and introduced the corresponding maturity assessment questions. We will continue with the second category in our model – the DW organization and processes – in the next chapter.

5 DW Organization and Processes

When assessing the maturity of a DW technical solution, the processes and roles involved in the project also need to be analyzed. A good technical solution cannot be developed without the processes surrounding it, as there is a strong interconnection between the two parts. An organization with standardized processes and formalized development roles is more likely to develop a good DW solution. At the same time, an organization cannot improve its processes without having some experience with previous DW projects. Therefore, in this chapter we take a closer look at the second part of the DW maturity assessment questionnaire, the one regarding the organizational roles and processes necessary to develop and maintain a DW solution.

5.1 DW Development Processes

A DW solution can be considered a software engineering project with some specific characteristics. Therefore, as with any software engineering project, it will go through several development stages (Moss & Atre, 2003). Several models or paradigms of software development have been defined in the literature and applied in practice. Some of the best-known ones are: the waterfall model, spiral development, iterative and incremental development, and agile development. For an overview of these models, see (Sommerville, 2007).

Since DW/BI is an enterprise-wide, evolving environment that is continually improved and enhanced based on feedback from the business community, the best approach for its development is iterative and incremental development (Kimball et al., 2008; Ponniah, 2001). Due to its complexity, the approach for a DW project has to include iterative tasks going through cycles of refinement. (Kimball et al., 2008) also suggest that agile techniques fit best with the development of BI applications. Designing and developing the analytic reports and analyses involves unpredictable, rapidly changing requirements. The BI team members need to work in close proximity to the business, so that they can be readily available and responsive in order to release incremental BI functionality in a matter of weeks. However, one size seldom fits all and, therefore, it is important for organizations to be able to choose the right methodology for each DW layer.

Maturity Assessment Question(s)

As it is hard to judge which software development paradigm is better and more mature, the first maturity question on development processes is a more general one: it refers to how the DW development processes map to the CMM levels, i.e. whether they are performed ad-hoc or are standardized. And if they are standardized, it is important to know whether they are measured against defined goals and continuously improved.

1) Which answer best describes the DW development processes in your organization?
a) Level 1 – Ad-hoc development processes; no clearly defined development phases (i.e.: planning, requirements definition, design, construction, deployment, maintenance)
b) Level 2 – Repeatable development processes based on experience with similar projects; some development phases clearly separated
c) Level 3 – Standard documented development processes; iterative and incremental development processes with all the development phases clearly separated
d) Level 4 – Development processes continuously measured against well-defined and consistent goals
e) Level 5 – Continuous development process improvement by identifying weaknesses and strengthening the process proactively, with the goal of preventing the occurrence of defects.

Table 29: DW Development Processes General Maturity Assessment Question.

5.1.1 DW Development Phases

Regardless of the chosen DW development model, a lifecycle approach is needed in order to accomplish all the major objectives in the system development process (Ponniah, 2001). A DW system consists of numerous tasks, technologies, and team member roles. It is not enough to have the perfect data model or best-of-breed technology. The many facets of a DW project need to be coordinated, and the lifecycle approach can do that by breaking down the project complexity and enforcing orderliness and a systematic approach to building the DW (Kimball et al., 2008). However, a one-size-fits-all lifecycle approach will not work for a DW project. The lifecycle approach has to be adapted to the special needs of the organization's DW project. But, regardless of the situational factors, the main high-level phases and tasks required for an effective DW implementation are (Kimball et al., 2008; Moss & Atre, 2003):

- Project planning and management
- Requirements definition
- Design
- Development
- Testing and acceptance
- Deployment/production
- Growth, maintenance and monitoring.

As the DW environment is continuously changing and improving, the first six phases are usually considered to be project-based, whereas maintenance and monitoring should be done on an ongoing basis. However, many authors report that, even today, software organizations do not have any defined processes for their software maintenance activities (April et al., 2004). (van Bon, 2000) confirms the lack of process management in software maintenance and that it is a mostly neglected area. Traditionally, maintenance has been depicted as the final activity of the software development process (Schneidewind, 1987). (Bennett, 2000) takes a historical view of this problem, tracing it back to the beginning of the software industry, when there was no difference between software development and software maintenance. But, starting with the 1980s, software maintenance began to be treated as a sequence of activities and not as the final stage of a software development project. Several standards were developed especially for software maintenance and, nowadays, many organizations make this distinction between the development phases and the maintenance and monitoring processes. This is also the reason why we make this distinction, especially since, in a DW project, maintenance and monitoring activities take a lot of time and effort. Therefore, we will elaborate on the first six phases in this section and continue with the maintenance and monitoring activities in the DW service processes part.

5.1.1.1 Project Planning and Management

One of the reasons why so many DW projects fail is improper project planning and inadequate project management (Ponniah, 2001). DW project planning is not a one-time activity. Since project plans are usually based on estimates, they must be adjusted constantly. A solid and, at the same time, flexible DW project plan can be the foundation for a successful DW initiative. Project planning usually consists of several important activities (Lewis, 2001): create a work breakdown structure listing activities, tasks, and subtasks; estimate time, cost and resource requirements; determine the critical path based on the task and resource dependencies; and create the detailed project plan. As a DW project is very complex and many risks can affect its development, a very important step here is project risk management. It involves three main activities: identify possible risks and threats; quantify threats and risks by assigning a risk priority number; and develop contingency plans to deal with risks that cannot be ignored.

However, just planning the project is not enough for a successful DW implementation. The project also needs to be managed during its development. First, the DW project officially begins with the project kickoff meeting, which gets the entire project team on the same page in terms of where the project stands and where it plans to go (Kimball et al., 2008). Once the project has started, the project status must be regularly monitored (Lewis, 2001). The DW project lifecycle requires the integration of numerous resources and tasks that must be brought together at the right time to achieve success. Monitoring project status is key to achieving this coordination. Another important problem in project management is the management of scope changes. This is usually done by adopting issue tracking or change management methodologies.

Of course, throughout the project, a lot of changes might happen and this is why it is a good idea to maintain the project plan by updating and evaluating it periodically. Moreover, consolidated project documentation will help ease the burden of keeping pace with the unending nature of the DW project. Documenting project assumptions and decision points is also helpful in the event that the deliverables do not meet expectations. However, many organizations ignore the importance of documentation and, if time pressures mount, it will be the first item to be eliminated. Finally, in order to learn from previous mistakes, projects and project management should always be reviewed and evaluated. This offers lessons learned that help avoid the same mistakes in the future (Lewis, 2001).

Maturity Assessment Question(s)

As explained in this section, project planning and management is crucial for DW project success. This is why we created a maturity question regarding this part of the development processes. We included in the answers the most important aspects: project planning and scheduling; project risk management; project tracking and control; documentation; and evaluation and assessment. Therefore, an organization which has none of these activities implemented is on the first level of maturity, whereas one that takes care of all of them is on the highest level of maturity regarding project planning and management.

2) Which answer best describes your DW project management?
a) Level 1 – Project planning and scheduling: no; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
b) Level 2 – Project planning and scheduling: yes; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
c) Level 3 – Project planning and scheduling: yes; project risk management: no; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
d) Level 4 – Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
e) Level 5 – Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: yes.

Table 30: Project Management Maturity Assessment Question.

5.1.1.2 Requirements Definition

In a DW, the users' business requirements represent the most powerful driving force (Ponniah, 2001), as they impact virtually every aspect of the project. Also, as end users alone are able to correctly define the business goals of the DW system, they should be enabled to specify information requirements by themselves (Hansen, 1997).

The DW environment is an information delivery system where the users themselves access the DW repository and create their own outputs. It is therefore extremely important that the DW contain the right elements of information in the most optimal formats in order for the users to get the results they want. Every task that is performed in every phase of the development of the DW is determined by the requirements. Every decision made during the design phase is totally influenced by the requirements. Because requirements form the primary driving force for every phase of the development process, special attention needs to be paid to the requirements definition phase in order to make sure that it contains adequate detail to support each phase.

Requirements are usually gathered from the user community using two basic interactive techniques (Kimball et al., 2008; Moss & Atre, 2003; Ponniah, 2001):

- interviews – they are conducted with individuals or small groups (i.e.: two or three persons at a time) and represent a good approach when details are intricate;
- facilitated sessions – they are larger group sessions of ten to twenty people led by a facilitator and are more appropriate after getting a baseline understanding of the requirements.

Useful information can also be extracted from the review of existing documentation from the user and IT departments.

Another important aspect, which is often neglected in the requirements definition phase, is formal documentation (Kimball et al., 2008; Ponniah, 2001), which is essential for several reasons. First, the requirements definition document is the foundation for the next phases and it becomes the encyclopedia of reference material as resources are added to the DW team. If project team members have to leave the project for any reason at all, the project will not suffer from people walking away with the knowledge they have gathered. Second, documentation helps the team to crystallize and better understand the interview content. Finally, formal documentation will also validate the findings when reviewed with the users.

Maturity Assessment Question(s)

As shown in this paragraph, the requirements definition phase is very important for the DW environment and special attention should be paid to it. A solid requirements definition follows a standard methodology and produces a formal requirements document. Also, even if not usually done, causal analysis meetings to identify common bottleneck causes in this step, and the subsequent elimination of these causes, could be very beneficial for the DW development process.

3) Which answer best describes the requirements definition phase for your DW project?
a) Level 1 – Ad-hoc requirements definition; no methodology used
b) Level 2 – Methodologies differ from project to project; interviews with business users for collecting the requirements
c) Level 3 – Standard methodology for all the projects; interviews and group sessions with both business and IT users for collecting the requirements
d) Level 4 – Level 3) + qualitative assessment and measurement of the phase; requirements document also published
e) Level 5 – Level 4) + causal analysis meetings to identify common bottleneck causes and subsequent elimination of these causes.

Table 31: Requirements Definition Maturity Assessment Question.

5.1.1.3 Design/ Development/ Testing and Acceptance/ Deployment

Once the business requirements are gathered and defined, the DW team can continue with designing the data model and the physical database, and designing and developing the ETL and the BI applications. Then, the developed DW with all its components needs to be tested, accepted by both the technical and business sides, and finally, the system can be deployed or put into production. We will give a short overview of each of these phases in this paragraph and then present the maturity questions regarding this part of the DW development processes.

Design

The design phase refers to designing the data models on all three levels (i.e.: conceptual, logical, physical), the ETL and the BI applications. Most of the aspects related to the design phase were already mentioned in the DW technical solution part, where we elaborated on each technical component. However, several things regarding the processes can be added here.

First, the data modelling process itself starts during the business requirements activity, when the preliminary requirements definition document is created. Based on this, the design team will first develop a high-level conceptual model, and then continue with the logical and physical data models. What is important in this process is to remember that data modelling is an iterative process and to have a preparation period beforehand, which includes activities such as: identifying the roles and participants required, reviewing the business requirements document, setting up the modelling environment, developing standards and obtaining appropriate facilities and supplies. The design of ETL and BI applications also involves several activities in order to be successful: create a plan and documentation, do some resource planning, and develop default strategies and standards (Kimball et al., 2008).

Development

The development phase includes the building of the physical databases and the actual implementation of the ETL and BI applications (Kimball et al., 2008). The physical databases are built when the data definition language (DDL) is run against the database management system (DBMS). ETL programs must be developed for the two sets of load processes: the one-time historic load and the incremental load. If a DBMS load utility is used to populate the BI target databases, then only the extract and transformation programs need to be written, including the programs that create the final load files. If an ETL tool is used, the instructions (i.e.: technical metadata) for the ETL tool must be created. BI applications development involves using the front-end tool building environment and writing the programs and scripts for the reports, queries, front-end interface, and online help function (Moss & Atre, 2003).
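As a small illustration of building a physical database by running DDL against a DBMS, the sketch below uses Python's built-in sqlite3 module; the table layout is a simplified, hypothetical star schema fragment, not a schema prescribed by the model.

```python
# Sketch of building physical DW tables by running DDL against a DBMS
# (here SQLite via Python's standard library). The schema is hypothetical.
import sqlite3

DDL = """
CREATE TABLE dim_customer (
    customer_sk INTEGER PRIMARY KEY,   -- surrogate key
    customer_id INTEGER NOT NULL,      -- natural/source key
    city        TEXT
);
CREATE TABLE fact_sales (
    customer_sk INTEGER REFERENCES dim_customer(customer_sk),
    order_date  TEXT,
    amount      REAL
);
"""

conn = sqlite3.connect(":memory:")     # an in-memory database for the example
conn.executescript(DDL)                # run the DDL against the DBMS
print([r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")])
```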

Testing and Acceptance

The DW system is a complex software project that needs to be tested extensively before being put into production. However, even though testing is critical for DW success, many organizations underestimate its importance and the time needed for these tasks. The most important activities during this step are (Golfarelli & Rizzi, 2009; Kimball et al., 2008; Moss & Atre, 2003):

- unit testing – all ETL modules and BI applications must be unit tested to prove that they compile without errors, but also to verify that they perform their functions correctly, trap all potential errors, and produce the right results. It is also recommended that a role other than the developer performs this unit testing.
- integration and regression testing – once all the individual ETL modules and BI applications have been unit tested, the entire system needs to be tested. This is done with integration testing on the first release and with regression testing on subsequent releases. In this way, the completely integrated system can be verified as to whether it meets its requirements or not. Regression testing focuses on finding defects after a major change has occurred and uncovers all the test results that deviate from the correct answers.
- performance testing – a performance test indicates whether the system performs well, both for loads and for queries and reports.
- acceptance testing – acceptance tests are done by the users of the DW in order to verify that the system meets the mutually agreed-upon requirements. The acceptance tests include the validation of the ETL process but, more importantly for the end users, they should determine the overall usability of the BI applications and whether the returned results are the desired ones. In order for these tests to be effective, user training is usually done beforehand.

Besides performing these activities, it is also important to formalize and follow a standard procedure for the testing and acceptance phase. In this way, it is much easier to keep track of the tests and their results and, at the same time, to evaluate the testing and acceptance phase (Kimball et al., 2008). A minimal sketch of unit testing an ETL transformation is shown below.
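The following sketch unit-tests a small ETL transformation with Python's built-in unittest module; the transformation, its test data and its expected behaviour are hypothetical examples.

```python
# Sketch of unit testing an ETL transformation (Python's unittest module).
# The transformation and test data are hypothetical examples.
import unittest


def derive_amount(row):
    """ETL transformation under test: derive the sales amount."""
    return row["quantity"] * row["unit_price"]


class DeriveAmountTest(unittest.TestCase):
    def test_computes_amount(self):
        self.assertEqual(derive_amount({"quantity": 3, "unit_price": 2.5}), 7.5)

    def test_zero_quantity_gives_zero(self):
        self.assertEqual(derive_amount({"quantity": 0, "unit_price": 9.9}), 0)

    def test_missing_column_is_trapped(self):
        with self.assertRaises(KeyError):   # errors must be trapped, not ignored
            derive_amount({"quantity": 1})


if __name__ == "__main__":
    unittest.main()
```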

Deployment (Production)

The last step in finishing the DW implementation is to deploy it by transferring the DW from testing to production. The first deployment is the easiest one. After this, things get a little more complicated, as any modifications to the system should be accomplished with minimal disruption to the business user community. For more details on DW deployment techniques, see (Kimball et al., 2008).

Maturity Assessment Question(s)

As many aspects of the design phase were already analyzed in the DW technical part, and it is difficult to do a high-level assessment of the development and deployment phases, we decided to assess the testing and acceptance phase, which is a critical one for DW success. The question lists the main activities involved in this phase and lets the organization select the ones it has implemented. The question will be scored through normalization, as further explained in the expert evaluation chapter.

4) Which of the following activities are included in the testing and acceptance phase for your DW project?
a) Unit testing by another person
b) System integration testing
c) Regression testing
d) User training
e) Acceptance testing
f) Standard procedure and documentation for testing and acceptance
g) External assessments and reviews of testing and acceptance.

Table 32: Testing and Acceptance Maturity Assessment Question.

Development/ Testing/ Acceptance/ Production Environments

To support all the phases presented in this paragraph, organizations usually set up different environments for different purposes (Moss & Atre, 2003):

- The development environment, where the programs and scripts are written and tested by the developers.
- The testing environment, where the DW system with all its components is tested.
- The acceptance environment, where the users do acceptance tests.
- The production environment, where the DW actually runs after being rolled out.

The implemented environments can influence the quality and performance of the DW. While smaller organizations may have only two environments (i.e.: development and production), others usually have at least three different environments. Another important aspect is the way the migration between environments is done: manually or automatically, the latter of course being the optimal one.
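A very small sketch of automatic migration between environments is given below: it promotes a release package through the environment chain in order, stopping at the first failure. The environment names and the deploy step are hypothetical placeholders for whatever an organization actually uses.

```python
# Sketch of automatic promotion of a DW release through the environments.
# Environment names and the deploy step are hypothetical examples.
ENVIRONMENTS = ["development", "test", "acceptance", "production"]


def deploy(release, env):
    """Stand-in for the real deployment (e.g. copying ETL jobs and reports)."""
    print(f"deploying {release} to {env}")
    return True                        # assume the smoke test passed


def promote(release):
    """Move a release through the chain; stop at the first failure."""
    for env in ENVIRONMENTS:
        if not deploy(release, env):
            print(f"promotion stopped at {env}")
            return False
    return True


promote("dw-release-1.4")
```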

Maturity Assessment Question(s)

The maturity question chosen for this aspect is straightforward and self-explanatory after reviewing the arguments mentioned above. As standards are crucial for data warehousing, we also included an assessment of the standards for developing, testing and deploying DW functionalities.

5) To what degree is there a separation between the development/test/acceptance/deployment environments in your organization?
a) Very low – no separation between environments
b) Low – two separate environments (i.e.: usually development and production) with manual transfer between them
c) Moderate – some separation between environments (i.e.: at least three environments) with manual transfer between them
d) High – some separation between environments (i.e.: at least two environments) with automatic transfer between them
e) Very high – all the environments are distinct, with automatic transfer between them.

6) To what degree has your organization defined and documented standards for developing, testing and deploying DW functionalities (i.e.: ETL and BI applications)?
a) Very low – no standards defined
b) Low – few standards defined
c) Moderate – some standards defined
d) High – a lot of the standards defined
e) Very high – a comprehensive set of standards defined.

7) To what degree has your organization implemented standards for developing, testing and deploying DW functionalities (i.e.: ETL and BI applications)?
a) Very low – no standards implemented
b) Low – few standards implemented
c) Moderate – some standards implemented
d) High – a lot of the standards implemented
e) Very high – a comprehensive set of standards implemented.

Table 33: Development/ Testing/ Acceptance/ Production Maturity Assessment Questions.

5.1.2 The DW/BI Sponsor

As already mentioned, strong support and sponsorship from senior business management is critical for a successful DW initiative. However, many organizations seem to overlook this aspect and ignore its importance. No other venture unifies the information view of the entire corporation as the corporation's DW does. The entire organization is involved and positioned for strategic advantage. Therefore, it is important to have sponsorship from the highest levels of management to keep focus and satisfy conflicting requirements (Ponniah, 2001). The DW sponsor needs to be more than an IT project manager or IT director. Effective business sponsors share several characteristics (Kimball et al., 2008). First, they have a vision for the potential impact of a DW/BI solution and can visualize how improved access to information will result in incremental value to the business. Second, strong business sponsors are influential leaders within the organization and they are demanding but, at the same time, realistic and supportive. It is important that they have a basic understanding of DW/BI concepts, including the iterative development cycle, to avoid unrealistic expectations. Effective sponsors are able to deal with short-term problems and project setbacks and they are willing to compromise.

Maturity Assessment Question(s)

All this being said, some conclusions can be drawn:

- The DW project sponsor needs to come from the business side.
- It is better to have multiple strong sponsors within the organization.
- The best sponsorship involves business-driven, cross-departmental sponsorship, including top-level management. In that case, the DW/BI initiative is integrated in the company's strategy and processes, with continuous support and budget.

Therefore, the maturity question derived from these conclusions is:

8) Which answer best describes the sponsor for your DW project?
a) Level 1 – No project sponsor
b) Level 2 – Chief information officer (CIO) or an IT director
c) Level 3 – Single sponsor from a business unit or department
d) Level 4 – Multiple individual sponsors from multiple business units or departments
e) Level 5 – Multiple levels of business-driven, cross-departmental sponsorship, including top-level management sponsorship (BI/DW is integrated in the company process with a continuous budget).

Table 34: DW/BI Sponsorship Maturity Assessment Question.

5.1.3 The DW Project Team and Roles

As in any type of project, the success of a DW project also depends on the project team. A DW project is similar to other software projects in that it is human-intensive. It takes several trained and especially skilled persons to form the project team. Two of the factors that can break a project are complexity overload and responsibility ambiguity. But the bad influence of these factors can be overcome by putting the right person in the right job (Ponniah, 2001). Therefore, organizing the project team for a DW project has to do with matching diverse roles and responsibilities with the proper skills and levels of experience. A DW project requires a number of different roles and skills from both the business and IT communities during its lifecycle. The main roles refer to (Kimball et al., 2008; Ponniah, 2001):

- Sponsorship and management (e.g.: business sponsor, project manager, etc.)
- Development roles (e.g.: business analyst, data steward, data quality analyst, data modeler, metadata manager, ETL architect, ETL developer, BI architect, BI developer, technical architect, security manager, DW tester, etc.)
- Monitoring and maintenance roles (e.g.: help desk, operations manager, etc.)

However, there is seldom a one-to-one relationship between roles and individuals. It does not really matter whether a person fills multiple roles on the DW project. What really matters is to have these roles and responsibilities formalized and actually implemented. It is also important to do periodic evaluation and assessment of the performance of roles in order to check for training requirements and solve skill-role mismatches (Humphries et al., 1999; Nagabhushana, 2006).

Maturity Assessment Question(s)

As it is difficult to say whether a team with more roles is more mature than one with fewer roles, we assess here whether role definition and implementation have been done. Besides this, a company at a higher level of maturity would also do periodic assessment and evaluation of roles.

9) Which answer best describes the role division for the DW development process?
a) Level 1 – No formal roles defined
b) Level 2 – Defined roles, but not technically implemented
c) Level 3 – Formalized and implemented roles and responsibilities
d) Level 4 – Level 3) + periodic peer reviews (i.e.: review of each other's work)
e) Level 5 – Level 4) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles and match the needed roles with responsibilities and tasks).

Table 35: DW Project Team and Roles Maturity Assessment Question.


5.1.4 DW Quality Management

The purpose of DW Quality Management is to provide management with appropriate visibility into the development process being used by the DW project and into the products being built. Organizations usually

start by doing DW development quality assurance. This involves reviewing and auditing the data

warehousing products and activities to verify that they comply with the applicable procedures and

standards and providing the project and other appropriate managers with the results of these reviews and

audits. In time, organizations learn how to manage this and implement DW quality management. This

involves defining quality goals for the DW products and processes, establishing plans to achieve these

goals, and monitoring and adjusting the plans, products, activities, and quality goals to satisfy the needs

and desires of the customer and end user (Paulk et al., 1995).

Maturity Assessment Question(s)

The maturity assessment question and the characteristics specific to each stage can be seen in the table below.

10) Which answer best describes the DW quality management?

a) Level 1 – No quality assurance activities

b) Level 2 – Ad-hoc quality assurance activities

c) Level 3 – Standardized and documented quality assurance activities done for all the development phases

d) Level 4 – Level 3) + measurable and prioritized goals for managing the DW quality (e.g.: functionality,

reliability, maintainability, usability)

e) Level 5 – Level 4) + causal analysis meetings to identify common defect causes and subsequent elimination

of these causes; service quality management certification.

Table 36: DW Quality Management Maturity Assessment Question.

5.1.5 Knowledge Management

Knowledge management (KM) is an emerging discipline that promises to capitalize on an organization's intellectual capital. KM implementation and use have increased rapidly since the 1990s as more and more companies have understood the importance of the knowledge each individual possesses and can systematically share within an organization (Rus & Lindvall, 2002). KM is "the practice of adding actionable value to information by capturing tacit knowledge and converting it to explicit knowledge; by filtering, storing, retrieving and disseminating explicit knowledge; and by creating and testing new knowledge" (Nemati et al., 2002). Explicit knowledge, also known as codified knowledge, is expressed knowledge. It corresponds to the information and skills that employees can easily communicate and document, such as processes, templates and data. Tacit knowledge is personal knowledge that employees gain through experience; it can be hard to express and is largely influenced by their beliefs, perspectives and values (Nonaka, 1991).

DW development is a quickly changing, knowledge-intensive process involving people working in

different phases and activities. Therefore, knowledge in data warehousing is diverse and an improved use

of this knowledge is the basic motivation for KM in this field. KM is equally important for both the DW


development processes and service processes. The general knowledge evolution cycle, which defines the phases of organizational knowledge, can also be applied to the specific field of DW (Agresti, 2000):

originate / create knowledge – members of the DW team develop knowledge through learning,

problem solving, innovation, creativity, and importation from outside sources.

capture / acquire knowledge – members acquire and capture information about knowledge in

explicit forms.

transform / organize knowledge – knowledge is organized, transformed or included in written

material and knowledge bases.

deploy / access knowledge – knowledge is distributed through education, training and mentoring

programmes, automated knowledge-based systems or expert networks.

apply knowledge – the organization‘s ultimate goal is applying the knowledge – this is the most

important part of the life cycle. KM aims to make knowledge available whenever it is needed.

In order to implement these phases systematically and successfully, it is very important for organizations

to have a centralized KM strategy in place and not do everything ad-hoc (Rus & Lindvall, 2002).

Maturity Assessment Question(s)

(Klimko, 2001) proposed a KM maturity model based on CMM. By summarizing the characteristics

provided by him for each maturity stage and also the information from the knowledge evolution cycle, we

came up with the following maturity assessment question. The same maturity assessment is also done for

the implementation of KM for Service Processes.

11) Which answer best describes the knowledge management in your organization for the DW development

processes?

a) Level 1 – Ad-hoc knowledge gathering and sharing

b) Level 2 – Organized knowledge sharing through written documentation and technology (e.g.: knowledge

databases, intranets, wikis, etc.)

c) Level 3 – Knowledge management is standardized; knowledge creation and sharing through brainstorming,

training and mentoring programs, and also through the use of technology

d) Level 4 – Central business unit knowledge management; quantitative knowledge management control and

periodic knowledge gap analysis

e) Level 5 – Continuously improving inter-organizational knowledge management.

Table 37: Knowledge Management Maturity Assessment Question.

5.2 DW Service Processes

As already mentioned in the previous paragraph, in the last two decades, software maintenance began to

be treated as a sequence of activities and not as the final stage of a software development project (April et

al., 2004). Several standards and models have been developed especially for software maintenance and

nowadays, more and more organizations make this distinction between the development phases and the

maintenance and monitoring processes. These processes are very important after a DW has been deployed

in order to keep the system up and running and to manage all the necessary changes. Software

maintenance is defined as (IEEE, 1990): "The process of modifying a software system or component after delivery to correct faults, improve performance or other attributes, or adapt to a changed environment".


5.2.1 From Maintenance and Monitoring to Providing a Service

In the last couple of years, IT organizations have made a transition from being pure technology providers to being service providers. This requires taking a different perspective on IT management, called IT Service Management (ITSM). ITSM puts the services delivered by IT at the center of IT management and it is commonly defined as (Young, 2004): "a set of processes that cooperate to ensure the quality of live IT services, according to the levels of service agreed to by the customer."

This service-oriented perspective on IT organizations can best be applied to the software maintenance field, as maintenance is an ongoing activity as opposed to software development, which is more project-based. Therefore, software maintenance can be seen as providing a service, whereas software development is concerned with the delivery of products (Niessink & van Vliet, 2000). Consequently, customers will judge the quality of software maintenance differently from that of software development. In particular, service quality is assessed on two dimensions: the technical quality – what the result of the service is – and the functional quality – how the service is delivered. This means that in order to provide high-quality software maintenance, different and additional processes are needed than those provided by a high-quality software development organization (Niessink & van Vliet, 2000).

In order to have a clearer image of what a "service" means, we can take a look at the service marketing literature, where a wide range of definitions exists of what a service entails. Usually, a service is defined as an essentially intangible set of benefits or activities that are sold by one party to another (Grönroos, 1990). The main differences between products and services are (Zeithaml, 1996; van Bon, 2007): intangibility, heterogeneity, simultaneous production and consumption, and perishability. However, the difference between products and services is not clear-cut and they can sometimes be intertwined.

If we turn to the software engineering domain, we see that a major difference between software

development and software maintenance is the fact that software development results in a product, whereas

software maintenance results in a service being delivered to the customer. All types of maintenance are

concerned with activities aimed at keeping the system usable and valuable for the organization. Hence,

software maintenance has more service-like aspects than software development, because the value of

software maintenance is in the activities that result in benefits for the customers, such as corrected faults

and new features. This is in contrast with software development, where the development activities do not

provide benefits for the customer, but instead it is the resulting software system that provides the benefits

(Niessink & van Vliet, 2000). As DW development can also be considered a software engineering project, the same concepts apply here as well. Moreover, as noted above, the difference between products and services is not clear-cut; consequently, the same goes for software development and software maintenance.

5.2.2 IT Service Frameworks

Over the years, various IT service frameworks have been proposed: Information Technology

Infrastructure Library (ITIL), BS 15000, HP ITSM Reference Model, Microsoft Operations Framework

(MOF), and IBM's Systems Management Solution Lifecycle (SMSL). However, in the ITSM landscape, ITIL

acts as the de-facto standard for the definition of best practices and processes that pertain to the

disciplines of service support and service delivery (Salle, 2004). BS 15000 extends ITIL, but at the same

time it is tightly integrated with ITIL. The other frameworks extend and refine ITIL, sometimes with


guidelines specific to the referenced technologies. Therefore, we will consider the service components

from ITIL as a starting point for our analysis of the DW Service Processes part. Moreover, two maturity

models related to IT maintenance and service also served as a foundation for developing this part of our

DW maturity model: the Software Maintenance Maturity Model and the IT Service CMM. Inspired by

other maturity models, they include several maturity stages and key process areas. An overview is

depicted in table 38. A more detailed description of the three models is provided further in this paragraph.

Authors | Model | Main Idea
Central Computer and Telecommunications Agency (CCTA) (1989) | Information Technology Infrastructure Library (ITIL) | Service delivery processes and service support processes and functions
Niessink, Clerc & van Vliet (2002) | IT Service CMM | Key practices intended to cover the activities needed to reach a certain level of service maturity while preserving a structure similar to CMM
April, Hayes, Abran & Dumke (2004) | Software Maintenance Maturity Model (SMmm) | Unique activities of software maintenance while preserving a structure similar to that of the CMMi

Table 38: Overview of IT Service Frameworks.

ITIL

ITIL was established in 1989 by the United Kingdom's former Central Computer and Telecommunications Agency (CCTA) to improve its IT organization. ITIL consists of an inter-related set of best practices for lowering the cost while improving the quality of the IT services delivered to users. It is organized around seven key areas: service support, service delivery, business perspective, application management, infrastructure management, security management, and planning to implement service management. However, the core of ITIL comprises five service delivery processes, five service support processes and one service support function (the service desk). Service support processes apply to the operational level of the organization (i.e.: all aspects associated with the daily activities of IT service and maintaining the related processes), whereas the service delivery processes are tactical in nature (i.e.: the processes required for planning and delivery of quality services over the long term, with a goal of continual improvement of those services). An overview of ITIL's core components can be viewed in the table below. For more information about ITIL, see (Colin, 2004).

Service Support | Service Delivery
Service Desk | Service Level Management
Incident Management | Financial Management
Problem Management | Capacity Management
Change Management | IT Service Continuity Management
Release Management | Availability Management
Configuration Management |

Table 39: ITIL's Core Components (adapted from (Cater-Steel, 2006)).

Software Maintenance Maturity Model (SMmm)

The SMmm was designed as a customer-focused reference model for either:


Auditing the software maintenance capability of a software maintenance service supplier or

outsourcer, or

Improving internal software maintenance organizations.

The model has been developed from a customer perspective, as experienced in a competitive, commercial

environment. A higher capability in the SMmm context means better services delivered for customer

organizations and increased service performance for the maintenance organizations.

The SMmm is based on the Capability Maturity Model Integration (CMMi), version 1.1 [sei02], and the Camélia model [Cam94]. The model must be viewed as a complement to the CMMi, especially for the processes that are common to developers and maintainers. The architecture of the SMmm differs slightly from that of the CMMi. The most significant differences are the inclusion of:

A roadmap category to further define the key process areas (KPAs);

Detailed references to papers and examples on how to implement the practices.

The SMmm includes four process domains (i.e.: software maintenance process management, software maintenance request management, software evolution engineering, support to software evolution engineering), several KPAs, roadmaps and practices. While some KPAs are unique to maintenance, others were derived from the CMMi and other models, and have been modified slightly to map more closely to daily maintenance characteristics. For more details on the SMmm, see (April et al., 2004).

IT Service CMM

The IT Service CMM is based on the CMM, but adapted to service processes. The model consists

of five maturity levels which contain KPAs. For an organization to reside on a certain maturity level, it

needs to implement all of the KPAs for that level and those of lower levels. An overview of the KPAs

assigned to each maturity level can be seen in table 40.

Level | Key Process Area
Initial | Ad hoc processes
Repeatable | Service Commitment Management; Service Tracking and Oversight; Subcontract Management; Service Delivery Planning; Event Management; Configuration Management; Service Quality Assurance
Defined | Organization Service Definition; Organization Process Definition; Organization Process Focus; Integrated Service Management; Service Delivery; Resource Management; Training Programme; Intergroup Coordination; Problem Management
Managed | Quantitative Process Management; Service Quality Management
Optimizing | Process Change Management; Technology Change Management; Problem Prevention

Table 40: IT Service CMM's Key Process Areas (adapted from (Paulk et al., 1995)).

The objective of the IT Service CMM is twofold:

to enable IT service providers to assess their capabilities with respect to the delivery of IT

services, and

to provide IT service providers with directions and steps for further improvement of their service

capability.

There are a number of characteristics of the IT Service CMM that are important for understanding its nature. The main focus of the model is the complete service organization. The scope of the model encompasses all service delivery activities (i.e.: those activities which are key to improving the service delivery capability of service organizations). The model is strictly ordered (i.e.: the key process areas are assigned to different maturity levels in such a way that lower level processes provide a foundation for the higher level processes). Finally, the model is a minimal model in different senses (i.e.: it only prescribes the key processes and activities that are needed to reach a certain maturity level and it does not show how to implement them, what organization structure to use, etc.). For a broader view of the IT Service CMM, see (Niessink & van Vliet, 1999).

5.2.3 DW Service Components

Now that we have given a short overview on the most important frameworks related to IT service, we can

present the main elements we chose for our DW service processes maturity assessment. Once the DW

project has been deployed, ongoing maintenance and monitoring work is required to keep the DW system

operating in good shape. As the scope of the maintenance and monitoring activities in the DW extends over many features and functions, it is important to have a plan and do these activities in a formalized

manner. The results of this phase offer the data needed to plan for growth and to improve performance.

The most important activities involved in the DW maintenance and monitoring are (Kimball et al., 2008;

Ponniah, 2001):

collection of statistics regarding the utilization of the hardware and software resources (e.g.:

memory management, physical disk storage space utilization, processor usage, report usage,

number of completed queries by time slots during the day, time each user stays online with the

data warehouse, total number of distinct users per day, etc.)

user support

BI applications maintenance and monitoring

security administration

performance monitoring and tuning

data reconciliation and data growth management

ETL monitoring and management

resource monitoring and management

infrastructure management


backup and recovery management, etc.

Maturity Assessment Question(s)

As can be seen, DW software maintenance and monitoring involves a lot of activities, but it is critical to include at least the most important ones. This is the reason why we developed a high level maturity question regarding DW software maintenance and monitoring processes. The question is a multiple choice one where the answers are the main activities included in this part of the DW solution. It will be scored through normalization, similar to the question for the testing and acceptance phase (a scoring sketch is given after the table below). However, this question was not included in the questionnaire for the first case study and thus, it will not be taken into consideration when doing the scoring for the case studies.

1) Which of the following activities are included in the maintenance and monitoring phase for your DW project?

a) Collection of statistics regarding the utilization of the hardware and software resources (e.g.: memory

management, physical disk storage space utilization, processor usage, BI applications usage, number of

completed queries by time slots during the day, time each user stays online with the data warehouse, total

number of distinct users per day, etc.)

b) BI applications maintenance and monitoring

c) User support

d) ETL monitoring and management

e) Data reconciliation and data growth management

f) Security administration

g) Resource monitoring and management

h) Infrastructure management

i) Backup and recovery management

j) Performance monitoring and tuning.

Table 41: Maintenance and Monitoring Maturity Assessment Question.
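To make the normalization concrete, the sketch below shows one way such a checklist answer could be mapped onto the 1-5 scale used by the level-based questions. The linear mapping and the function are illustrative assumptions; the thesis does not prescribe a particular formula.

```python
# Hypothetical illustration: normalize a multiple-choice checklist answer
# (question 1, Table 41) onto the 1-5 scale used by the level-based questions.
# The linear mapping is an assumption made for this example.

ACTIVITIES = list("abcdefghij")  # the ten answer options from Table 41

def normalized_score(selected, options=ACTIVITIES):
    """Map the fraction of selected activities onto the 1-5 maturity scale."""
    fraction = len(set(selected) & set(options)) / len(options)
    return 1 + 4 * fraction  # no activities -> 1.0, all ten -> 5.0

# Example: an organization performing six of the ten activities
print(round(normalized_score(["a", "b", "c", "d", "f", "j"]), 2))  # 3.4
```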

As already presented in the previous paragraphs, DW maintenance and monitoring are more and more

often considered to be service processes as they are offered on an ongoing basis to the customers. From

the presented IT service frameworks, it can be seen that some elements appear in more than one model or

some elements from one model can be mapped to elements from another one. If we also take into

consideration the changing nature of a DW and all the aspects that DW maintenance and monitoring

processes entail, we decided to consider the following components when assessing the maturity of DW

service processes: service quality management, service level management, incident management, change

management, technical resource management, availability management, release management and

knowledge management. Each of these elements and the corresponding maturity assessment question(s) will be further elaborated on in this paragraph.

5.2.3.1 Service Quality Management

The purpose of Service Quality Management is to provide management with the appropriate visibility into

the processes being used and the services being delivered. This process entails service quality assurance

activities which involve the reviewing and auditing of working procedures, DW service delivery activities

and work products to see that they comply with applicable standards and procedures. Management and

relevant groups are provided with the results of the reviews and audits. An organization with experience


in service processes also develops a quantitative understanding of the quality of the services delivered in

order to achieve specific quality goals (Niessink & van Vliet, 1999). If these goals are not reached, causal

analysis meetings should be held to identify the defect causes and subsequently eliminate them. In order

to get better results, many organizations with a high maturity in DW service delivery also try to obtain

external service quality certification (e.g.: ISO certification, etc.).

Maturity Assessment Question(s)

Therefore, an organization on the first maturity stage will not have any service quality management

activities; whereas one on the highest maturity level will have not only a standard procedure, but also

quantitative service quality evaluation and causal analysis meetings. Their purpose is to identify common

defect causes and try to eliminate them in the future; in this way continuous service quality management

improvement is achieved.

2) Which answer best describes the DW service quality management in your organization?

a) Level 1 – No service quality management activities

b) Level 2 – Ad-hoc service quality management

c) Level 3 – Proactive service quality management including a standard procedure

d) Level 4 – Level 3) + service quality measurements periodically compared to the established goals to

determine the deviations and their causes

e) Level 5 – Level 4) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification.

Table 42: Service Quality Management Maturity Assessment Question.

5.2.3.2 Service Level Management

Service Level Management ensures continual identification, monitoring and reviewing of the optimally

agreed levels of IT services as required by the business. It negotiates service level agreements (SLAs)

with the suppliers and customers and ensures that they are met (Cater-Steel, 2006). It is responsible for

ensuring that all DW service management processes, operational level agreements and underpinning

contracts are appropriate for the agreed service level targets. This is done in close cooperation between

the DW service providers and the customers. Some examples of SLA performance criteria for a DW are:

50 concurrent queries processed with an average query time of no more than five minutes.

Less than four hours of planned downtime per week.

Less than six hours of unplanned downtime per month.

Data refreshed weekly.
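As an illustration of how such criteria could be monitored in practice, the sketch below compares measured values against agreed maxima. The thresholds mirror the examples above, but the field names and the check itself are hypothetical assumptions.

```python
# Hypothetical illustration: checking measured DW service values against the
# example SLA criteria above. Field names and thresholds are assumptions.

SLA = {
    "avg_query_minutes_at_50_concurrent": 5.0,   # max average query time
    "planned_downtime_hours_per_week": 4.0,      # max planned downtime
    "unplanned_downtime_hours_per_month": 6.0,   # max unplanned downtime
    "data_refresh_days": 7.0,                    # data refreshed at least weekly
}

def sla_violations(measured, sla=SLA):
    """Return the criteria whose measured value exceeds the agreed maximum."""
    return [name for name, limit in sla.items() if measured.get(name, 0) > limit]

measured = {
    "avg_query_minutes_at_50_concurrent": 4.2,
    "planned_downtime_hours_per_week": 3.5,
    "unplanned_downtime_hours_per_month": 7.5,  # breach
    "data_refresh_days": 7.0,
}
print(sla_violations(measured))  # ['unplanned_downtime_hours_per_month']
```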

The high level activities for Service Level Management are: documenting customer service needs; implementing SLAs; reviewing SLAs with the customer/supplier on a periodic or event-driven basis; and continuously monitoring and evaluating the actual service delivery with the customer/supplier (SLAs with penalties).


Maturity Assessment Question(s)

From the high level activities, one could say that usually an organization evolves from documenting all

customer/supplier service needs in an ad-hoc manner to using a standard procedure and continuously

monitoring, evaluating and improving the actual service delivery.

3) Which answer best describes the DW service level management in your organization?

a) Level 1 – Customer/supplier service needs documented in an ad-hoc manner; no service catalogue

compiled

b) Level 2 – Some customer/supplier service needs documented and formalized based on previous experience

c) Level 3 – All the customer/supplier service needs documented and formalized according to a standard

procedure into service level agreements (SLAs)

d) Level 4 – SLAs reviewed with the customer/supplier on both a periodic and event-driven basis

e) Level 5 – Actual service delivery continuously monitored and evaluated with the customer/supplier on both

a periodic and event-driven basis for continuous improvement (SLAs including penalties).

Table 43: Service Level Management Maturity Assessment Question.

5.2.3.3 Incident Management

ITIL defines an incident as a deviation from the (expected) standard operation of a system or a service. The objective of Incident Management is to provide continuity by restoring the service in the quickest way possible by whatever means necessary (Salle, 2004). Also, a problem is considered in ITIL as a condition

that has been defined, identified from one large incident or many incidents exhibiting common symptoms

for which the cause is unknown (Salle, 2004). As a DW is a very complex system, many incidents and

problems can occur, and therefore, this process is very important. Incidents and problems can arise on the

side of the users or that of the system. Given the frequency of changes in a DW, many complex problems

are likely to occur very often. The objective of Incident and Problem Management is to provide continuity

by restoring the service as quickly as possible and to prevent and minimize the impact of incidents. This

is why it is critical to have a solid Incident and Problem Management that also needs to be in a close

relationship with Change Management. The high level activities for Incident Management are: detection,

recording, classification, investigation, diagnosis, resolution and recovery.
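As a small illustration, the sketch below models these high-level activities as an ordered lifecycle for a single incident record; the class and its stage handling are assumptions made for the example, not part of ITIL or the DWCMM.

```python
# Hypothetical illustration: the high-level incident management activities
# modelled as an ordered lifecycle, advanced one stage at a time.
# Stage names follow the list above; the class itself is an assumption.

STAGES = ["detection", "recording", "classification", "investigation",
          "diagnosis", "resolution", "recovery"]

class Incident:
    def __init__(self, description):
        self.description = description
        self.stage = STAGES[0]  # every incident starts at detection

    def advance(self):
        """Move the incident to the next lifecycle stage, in order."""
        i = STAGES.index(self.stage)
        if i + 1 < len(STAGES):
            self.stage = STAGES[i + 1]
        return self.stage

ticket = Incident("Nightly ETL load failed")
while ticket.stage != "recovery":
    print(ticket.advance())
```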

Maturity Assessment Question(s)

4) Which answer best describes the DW incident management in your organization?

a) Level 1 – Incident management is done ad-hoc with no specialized ticket handling system or service desk

to assess and classify them prior to referring them to a specialist

b) Level 2 – A ticket handling system is used for incident management and some procedures are followed, but

nothing is standardized or documented

c) Level 3 – A service desk is the recognized point of contact for all the customer queries; incidents

assessment and classification is done following a standard procedure

d) Level 4 – Level 3) + standard reports concerning the incident status including measurements and goals

(e.g.: response time) are regularly produced for all the involved teams and customers; an incident

management database is established as a repository for the event records

e) Level 5 – Level 4) + trend analysis in incident occurrence and also in customer satisfaction and value

perception of the services provided to them.

Table 44: Incident Management Maturity Assessment Question.


5.2.3.4 Change Management

Change Management is described as a regular task for immediate and efficient handling of changes that

might occur in a DW environment. The main input to the Change Management process is a request for

change (RFC) (Salle, 2004). This can be done by an outcome of a process relating to Incident and

Problem Management or by extending the service through Service Level Management. The objective of

Change Management is to ensure that standardized methods and techniques are used for efficient and

immediate handling of all the changes to the DW system while minimizing change related incidents. The

changes that can frequently occur in a DW environment concern:

Changes in the contents of the DW.

Changes in the functionality of BI applications.

Changes in a source system with direct implications for ETL, etc.

The high level Change Management activities are: acceptance and classification, assessment and planning, authorization of changes, control and coordination, and evaluation.

Maturity Assessment Question(s)

As in the case of Incident Management, at first an organization takes care of change requests in an ad-hoc

manner. Then, an electronic change management system is usually introduced for storing and solving the

requests for change, and some policies and procedures for change management begin to be established. Once a standard procedure for approving, verifying, prioritizing and scheduling changes is put in place, organizations start moving towards the high end of the maturity scale, and some of them manage to reach the last maturity stage of continuous improvement of Change Management.

5) Which answer best describes the DW change management in your organization?

a) Level 1 – Change requests are made and solved in an ad-hoc manner

b) Level 2 – A change management system is used for storing and solving the requests for change; some

policies and procedures for change management established, but nothing is standardized

c) Level 3 – A standard procedure is used for approving, verifying, prioritizing and scheduling changes

d) Level 4 – Standard reports concerning the change status including measurements and goals (e.g.: response

time) are regularly produced for all the involved teams and customers; standards established for

documenting changes

e) Level 5 – Trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and

value perception of the services provided to them.

Table 45: Change Management Maturity Assessment Question.

5.2.3.5 Technical Resource Management

The purpose of Resource Management is to maintain control of the hardware and software resources needed to deliver the agreed DW service level targets (Niessink & van Vliet, 1999). Before

commitments are made to customers, resources are checked. If not enough resources are available, either

the commitments are adapted or extra resources are installed. It also involves monitoring the ETL and BI

applications in order to see if the current resources are enough for the desired DW performance.


Maturity Assessment Question(s)

Similar to the other service processes, DW Technical Resource Management also evolves from ad-hoc

activities to resource trend analysis and monitoring to determine the most common bottlenecks and make

sure that there is sufficient capacity to support planned services. The intermediate phases can be seen in the answers to the maturity assessment question below.

6) Which answer best describes the DW technical resource management in your organization?

a) Level 1 – Ad-hoc resource management activities (only when there is a problem)

b) Level 2 – Resource management is done following some procedures, but nothing is standardized or

documented

c) Level 3 – Resource management is done constantly following a standardized documented procedure

d) Level 4 – Level 3) + standard reports concerning performance and resource management including

measurements and goals are done on a regular basis

e) Level 5 – Level 4) + resource management trend analysis and monitoring to make sure that there is

sufficient capacity to support planned services.

Table 46: Technical Resource Management Maturity Assessment Question.

5.2.3.6 Availability Management

Availability Management allows organizations to ensure that all DW infrastructure, processes, tools and roles are in line with the SLAs by using appropriate means and techniques. It should also manage risks that could seriously impact DW services by reducing the risks to an acceptable level and planning for the recovery of DW services. Availability Management also tries to proactively manage continual improvement efforts by measuring and tracking metrics for availability, reliability, maintainability, serviceability, and security (Colin, 2004). In order to achieve better results, Availability Management also needs to work in close collaboration with Resource Management.
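As a small illustration of the kind of metric tracked here, availability is commonly computed as the fraction of the agreed service time during which the service was actually usable; the sketch below uses this common formula, which is an assumption for illustration rather than one prescribed by the thesis.

```python
# Hypothetical illustration: a common availability metric, computed as
# (agreed service time - downtime) / agreed service time.

def availability_pct(agreed_hours, downtime_hours):
    """Percentage of the agreed service time the DW was actually available."""
    return 100.0 * (agreed_hours - downtime_hours) / agreed_hours

# Example: 24x7 service over a 30-day month with 7.5 hours of downtime
print(round(availability_pct(30 * 24, 7.5), 2))  # 98.96
```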

Maturity Assessment Question(s)

The maturity assessment question for Availability Management follows the same structure as the one for

Resource Management as these activities are in close collaboration, but there is a very important

characteristic for the former which can really make a difference: risk assessment. An organization that

follows a standardized procedure for availability management and that also pays serious attention to risk assessment has a very high chance of delivering the agreed service level targets. The maturity question for this aspect can be seen below.

7) Which answer best describes the availability management in your organization?

a) Level 1 – Ad-hoc availability management

b) Level 2 – Availability management is done following some procedures, but nothing is standardized or

documented

c) Level 3 – Availability management documented and done using a standardized procedure (all elements are

monitored)

d) Level 4 – Level 3) + risk assessment to determine the critical elements and possible problems

e) Level 5 – Level 4) + availability management trend analysis and planning to make sure that all the elements

are available for the agreed service level targets.

Table 47: Availability Management Maturity Assessment Question.


5.2.3.7 Release Management

As a DW is continuously changing and evolving over time, organizations need to embrace the release

concept. Any incomplete functionality or necessary change will be bundled in future releases. Therefore,

the objective of Release Management is to ensure that only authorized and correct versions of DW are

made available for operation (Salle, 2004). It can be seen as a collection of hardware, software,

documentation, processes or other components required to implement approved changes to a DW (Cater-

Steel, 2006). In order to have a successful Release Management, it is very important to have a solid

release planning; and to document and follow a standardized procedure for this process. A solid Release

Management also implies standardized release naming and numbering conventions; assigned roles and

responsibilities; and a release database with master copies of previous DW versions.
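As a small illustration of a naming and numbering convention, the sketch below implements a simple major.minor.patch scheme for DW releases; the scheme and the bump rules are assumptions for the example, not a convention prescribed by the thesis.

```python
# Hypothetical illustration: a simple major.minor.patch release-numbering
# convention for DW releases; the bump rules are assumptions.

def next_release(version, kind):
    """Bump a 'major.minor.patch' version string for a new DW release."""
    major, minor, patch = (int(p) for p in version.split("."))
    if kind == "major":        # e.g. a new data mart or an architecture change
        return f"{major + 1}.0.0"
    if kind == "minor":        # e.g. new reports or new ETL flows
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # fixes bundled in a patch release

print(next_release("2.3.1", "minor"))  # 2.4.0
```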

Maturity Assessment Question(s)

Therefore, the maturity assessment question for this component of the DW service processes is

straightforward and can be seen below.

8) Which answer best describes the release management in your organization?

a) Level 1 – Ad-hoc changes solving and implementation; no release naming and numbering conventions

b) Level 2 – Release management is done following some procedures, but nothing is standardized or

documented; release naming and numbering conventions

c) Level 3 – Release management is documented and done following a standardized procedure; assigned

release management roles and responsibilities

d) Level 4 – Level 3) + standard reports concerning release management including measurements and goals

are done on a regular basis; master copies of all software in a release secured in a release database

e) Level 5 – Level 4) + release management trend analysis, statistics and planning.

Table 48: Release Management Maturity Assessment Question.

5.3 Summary

This chapter has offered a detailed image of the DW organization and processes benchmark variable and

its main sub-categories: DW development processes and DW service processes. Just like in the previous

chapter, we identified the main characteristics for each sub-category for each maturity stage and we

presented the underlying maturity assessment questions.

Now that the DWCMM with its main components and maturity assessment questionnaire have been

presented, we will continue by presenting the results of the evaluation phase of our research process

in the next chapter.


6 Evaluation of the DWCMM

This section presents the results of two activities aimed at evaluating the model presented in the previous

chapters. Chapter 6.1 is a report of the review of the model by five DW/BI experts from practice, and

emphasizes the validity of the model. Chapter 6.2 describes an assessment of the case study results obtained by testing the model in four organizations.

6.1 Expert Validation

To evaluate the utility and further revise the DWCMM, expert validation was applied. An "expert" is defined by (Hoffman et al., 1995) as a person "highly regarded by peers, whose judgements are uncommonly accurate and reliable, whose performance shows consummate skill and economy of effort, and who can deal effectively with rare or tough cases. Also, an expert is one who has special skills or knowledge derived from extensive experience with subdomains." Therefore, eliciting knowledge from

experts is very important and useful and can be done using several methods, one of them being structured

and unstructured interviews (Hoffman et al., 1995). More information on interview techniques is given in

6.2.1.

Moreover, five experts in data warehousing and BI were interviewed and asked to give their opinions

about the content of the model we have developed. The interviews were structured, but consisted of open questions, in order to capture the knowledge of the respondents. This enabled the experts to state their opinions and ideas for improvement freely. The expert panel consists of five experts from practice, each of them having at least 10 years of experience in the DW/BI field. All of them are DW/BI consultants at different organizations in The Netherlands (local or multinational). An overview of the experts and their affiliations (figures are taken from 2009 annual reports) is depicted in table 49. The expert interview protocol and questions can be seen in appendix D.

Respondent ID | 1 | 2 | 3 | 4 | 5
Job Position | CI/BI consultant | Principal consultant / Thought leader BI/CRM | BI consultant | Principal consultant BI | BI consultant
Respondent Affiliation:
Industry | DW/BI Consulting | IT Services | BI Consulting | IT Services | DW Consulting
Market | B2B | B2B | B2B | B2B | B2B
Employees | ≈ 45 | ≈ 49000 | ≈ 35 | ≈ 38000 | ≈ 1

Table 49: Expert Overview.

6.1.1 Expert Review Results and Changes

DWCMM

First, the experts were asked to propose some categories that they would find important when assessing

the maturity of a DW solution. Among the proposed categories we can mention: data structure, data


architecture, metadata, masterdata, hardware, infrastructure, report architecture, security, management

and maintenance, traceability within the DW. One expert said that other important aspects to be analyzed

refer to whether the organization is doing ETL or ELT, and whether they are using real time data

warehousing. Another critical point for success was considered to be the alignment between business and

IT. As can be seen, some of the categories proposed by the experts can be found or are included in the

categories from our model. The others (i.e.: data architecture, data governance, masterdata, traceability) can be considered for further research in the future.

Furthermore, all reviewers gave positive feedback on their first impression of the DWCMM, said it made sense and that it could be applied for assessing an organization's current DW solution. One of the experts noticed that the main sub-categories from the DW technical solution part were not on the same level because "architecture" is usually a superset that includes data modelling, ETL and BI applications. Some experts said that in general the model seemed to be complete, but that, of course, small changes could be made or new categories/sub-categories could be added.

Three of the five reviewers stated that "infrastructure" should also be added as a sub-category of the DW technical solution or should replace "architecture". However, as already explained in the previous chapters, in the literature infrastructure usually refers to the hardware and software supporting the architecture. Also, architecture usually refers to the logical architecture (i.e.: the data storage layers), application architecture (i.e.: ETL, BI applications), data architecture and technical architecture (i.e.: infrastructure). Our sub-category refers more to the logical architecture and some other elements such as metadata, security, infrastructure, etc. Therefore, we agree that the name "architecture" could be a little confusing and we decided to change the name to "General Architecture and Infrastructure" for the final model.

One expert suggested that "data modelling" should be changed to "data management", which is a broader category including data modelling, data quality and data governance. However, due to time constraints and the fact that our current model does assess data modelling and, to some extent, data quality, we leave the data governance part to future research.

The last comments regarding the structure of the DWCMM were related to ETL. One of the experts suggested that this sub-category should be called "data logistics" as it could involve either ETL or ELT. But, as we believe that ETL is the more common name and easier to understand for the respondents who would take the assessment, we decided to leave it unchanged. Another expert proposed that a new sub-category called "ETL Workflow" should be added. It would include the way fault tolerance is addressed, the ETL technical administration and, generally, how ETL processes are managed. We consider that this sub-category would not be on the same level as the other ones, and some of its elements are already addressed in the ETL sub-category; therefore, we decided not to include it in our model for the time being.

The DWCMM Condensed Maturity Matrix

All reviewers said they got a good first impression of the DWCMM Condensed Maturity Matrix and that

it gives a good overview of the main goal of the model. Some experts pointed out that the characterization of "architecture" for the highest maturity stage was not on the same level as the previous ones. After also doing the test case studies, we decided to change the fifth stage of maturity to "DW/BI service that federates a central enterprise DW and other data sources via a standard interface". More comments on this change are given in the case study results paragraph.


Moreover, several suggestions were made regarding the ETL characterization for each stage. One expert suggested that more information should be given for each stage of ETL. Another one proposed that the characterization of ETL for the last level of maturity should be changed, as "real-time ETL" is not on the same level as the ETL characterizations for the previous stages (i.e.: basic ETL, advanced ETL, etc.). The redefined matrix after the expert interviews can be seen in figure 6.

DW Maturity Assessment Questionnaire

As with the previous two deliverables, all reviewers gave positive feedback for their first impression of

the DW maturity assessment questionnaire. Some of the experts pointed out that, even if the chosen characteristics and questions are representative of the problem we want to address, they might not be enough to do an in-depth assessment of a specific DW environment. Therefore, most of the experts suggested that, when testing the model in practice, it would be very important to clearly state that the main goal of the model is to do a high-level DW maturity assessment and that the focus of the questionnaire is on the technical aspects of a DW solution.

Furthermore, each expert had his own view on data warehousing and BI, and hence, it was difficult to sum up all their comments and integrate them into useful changes for our maturity questionnaire. Finally, we decided to split their feedback into two categories: proposed changes that, due to time constraints and scope limitations, were not implemented in the final version of the model, but should be considered for future research; and implemented improvement suggestions that involved some rephrasing or complete changing of questions and answers. We will give a short overview of the former later in this paragraph. The actually implemented changes can be seen in the redefined questionnaire in appendix C. The questions and answers that were changed are written in red so that the differences from the first version of the questionnaire can be seen more easily. Also, the main questions and answers that were redefined can be seen in the table below.

Category | Rephrased Questions | Questions Whose Answers Were Rephrased or Changed
Architecture | 1 | –
Data Modelling | 2, 4 – split in two questions | 1, 8
ETL | 2, 5 – split in two questions | 1, 2, 4
BI Applications | 3 – split in two questions | –
Development Processes | 3 – split in two questions | 2, 5, 8
Service Processes | – | 3, 4, 5, 6, 7, 8

Table 50: Rephrased or Changed Questions and Answers.

Moreover, here are the main changes that were proposed by the experts, but were not implemented. All

the experts suggested a version for the DW architecture found on the highest maturity level. We

combined all the opinions and came up with the answer as shown later in this chapter. Three of the five experts suggested that more attention should be paid to data quality, as it is a very important issue nowadays, and that more questions should be added for tackling this problem. We have a question regarding data quality in the ETL section, but apparently, data quality should also be taken into consideration in the data modelling part. Therefore, due to time constraints and to the high level nature of our assessment, we leave this topic open for future research. One expert suggested that we should dive a


little bit into cloud computing for data warehousing. We find this topic very interesting and important for

the future of data warehousing and BI. However, due to time constraints, we could not find an efficient

way to include this in our current model and we will leave it to future research.

One expert said that we should analyze in more detail how the actual monitoring and management of ETL is done, not just ask whether it is done. But, as already mentioned, our assessment is a high level one and it tries to capture what is done and not how it is done. Hence, we decided not to implement this suggested change. Other proposals that we found interesting for assessing the maturity of a DW, but very hard to include in a model and questionnaire like ours, refer to: judging how mature the organization is in adapting to new situations; addressing the right DW development methodology (e.g.: waterfall, iterative and incremental development, agile development, etc.) for the right category; or having a good strategy for tool management (e.g.: aspects related to pricing, licensing, etc.) and understanding that there are multiple tools, but an integrated view is needed to be successful.

The last important suggestion was that a question about problem management should be added to the DW service processes category. We must admit that we also thought about this when developing the model, but we finally decided not to include it in our questionnaire. A problem is usually defined in ITIL as "a condition that has been defined, identified from one large incident or many incidents exhibiting common symptoms for which the cause is unknown". Therefore, we believe that problem management is positively correlated with incident and change management, and it can be included in these processes. That is why, for the time being, we will leave this question out.

Besides the questions regarding the DWCMM, we also asked the experts whether weighting coefficients

should be considered for computing the maturity scores. Two of them said that no general weights should

be used as this would make the scoring rather difficult and these weighting coefficients should be

situational, depending on each organization. One expert suggested that it would be interesting to have

weights for both individual questions and sub-categories/categories. One expert believed that weighting

factors should be used for the main sub-categories/categories. The last expert was not really sure about this aspect, saying that it would be interesting to have weights for each question, but that the scoring would become very complicated. Therefore, due to the lack of unanimity regarding the addition of weighting coefficients to the questionnaire, and for the clarity of scoring, we decided to leave weights out for the time being (a sketch of how situational weights could work is given below).
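For completeness, the sketch below illustrates how situational weights could be applied to sub-category scores if an organization did want them; the sub-category names, scores and weights are made-up examples, not values prescribed by the model.

```python
# Hypothetical illustration: weighted vs. unweighted aggregation of
# sub-category maturity scores. Scores and weights are made-up examples.

scores = {"Architecture": 3.0, "Data Modelling": 4.0, "ETL": 2.5,
          "BI Applications": 3.5}

# Situational weights an organization might choose (they must sum to 1).
weights = {"Architecture": 0.4, "Data Modelling": 0.2, "ETL": 0.2,
           "BI Applications": 0.2}

unweighted = sum(scores.values()) / len(scores)
weighted = sum(scores[k] * weights[k] for k in scores)

print(round(unweighted, 2), round(weighted, 2))  # 3.25 3.2
```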

6.2 Multiple Case Studies

Depending on the nature of a research topic and the goal of a researcher, different research methods are

appropriate to be used (Benbasat et al., 1987; Yin, 2009). One of the most commonly used ways to

classify research methods is the distinction between qualitative and quantitative research methods. The

research method applied here is case study research, a qualitative one. It is the most widely used

qualitative research method in information systems (IS) research and is well suited to understanding the

interactions between information technology (IT)-related innovations and organizational contexts (Darke

et al., 1998). (Runeson & Host, 2009) suggest that case study research is also a suitable research

methodology for software engineering since it studies contemporary phenomena in its natural context.

Therefore, as research in data warehousing can be considered to be at the border between IS and software

engineering, case study research is also appropriate here. According to (Yin, 2009), "the essence of a case


study is that it tries to illuminate a decision or set of decisions: why they were taken, how they were

implemented, and with what result." Hence, the case study method allows investigators to retain the

holistic and meaningful characteristics of real-life events, such as organizational and managerial

processes, for example. (Benbasat et al., 1987) consider that a case study examines a phenomenon in its

natural setting, employing multiple methods of data collection to gather information from one or a few

entities (i.e.: people, groups or organizations). As our research is developing a DWCMM, it is part of the

IS/software engineering field and it is suited for both technical and organizational issues. Therefore, case study research seems an appropriate choice that will help us capture knowledge from practitioners, and test and validate the created models and theories. In order to enrich and validate our model in practice, four organizations were chosen to take our DW maturity assessment and the results will be further presented in this chapter.

6.2.1 Case Study Approach

Case study research can be used to achieve various research aims: to provide descriptions of phenomena,

develop theory and test theory (Darke et al., 1998). But, regardless of its final goal, preliminary theory development as part of the case study design phase is essential (Yin, 2009). In our research, we will use it to test theory, which in this case is the DWCMM we developed. The use of case study research to

test theory requires the specification of theoretical propositions derived from an existing theory. The

results of case study data collection and analysis are used to compare the case study findings with the

expected outcomes predicted by the propositions (Cavaye, 1996). The theory is either validated or found

to be inadequate in some way, and may then be further refined on the basis of the case study findings.

Case study research may adopt single or multiple case designs. A single case study is appropriate where it

represents a critical case (i.e.: it meets all the necessary conditions for testing a theory), where it is an

extreme or unique case, or it is a revelatory case (Yin, 2009). Multiple case designs allow cross-case

analysis and comparison, and the investigation of a particular phenomenon in diverse settings. Multiple

case studies may also be selected to predict similar results (i.e.: literal replication) or to produce

contrasting results for predictable reasons (i.e.: theoretical replication) (Yin, 2009). As according to

(Benbasat et al., 1987) and (Yin, 2009), multiple case studies are preferred over single case study designs to get better results and analytic conclusions, we decided to conduct multiple case study research following the (Yin, 2009) case study approach as depicted in figure 10.


Figure 10: Case Study Method (adapted from (Yin, 2009)).

Therefore, the main steps in case study research are (Runeson & Host, 2009; Yin, 2009):

Case study design – research objectives are defined and the case study is planned. This is also

where theoretical development is done, as described in chapters 3-5.

Preparation for data collection – procedures and protocols for data collection are defined. This is

also where cases are found and selected to evaluate and test the theory. The main criterion used in

the search for suitable organizations was that all approached organizations had a professional DW/BI system in place whose maturity could be assessed by applying the DWCMM.

Furthermore, an important criterion for the selection of respondent per case was that the

interviewed respondents had an overall view on the technical and organizational aspects for the

DW/BI solution implemented in their organization. As (Yin, 2009) suggests that at least three

case studies should be used, four test organizations have been found that agreed on cooperating in

our research and taking the maturity assessment (an overview is provided in paragraph 6.2.2).

While selecting the case studies, the case study and data collection protocols were also defined.

The protocol contains the instrument, but also the procedures and general rules to be followed. It

is essential when doing a multiple-case study to increase the reliability of the case study research

and guide the investigator in carrying out the data collection.

Collecting evidence – execution with data collection on the studied case. Typically, multiple data

collection methods are employed in case study research to increase the validity of the results.

Also, multiple sources of evidence are used such as (Yin, 2009): documentation – various written

material; archival records – organizational charts, service, personnel and financial records;

interviews – open, structured or semi-structured interviews; direct observation – observe and

absorb details, actions or subtleties of the environment; physical artifacts – devices, tools,

instruments. In this research, the data collected in the cases consists of four interviews and

documentation. Each interview lasts around 1.5 hours and consists mainly of the maturity assessment questionnaire itself, complemented by a few open questions. This allows the researcher to guide and control the interview, while leaving room for discussion in order to capture the suggestions and questions that the respondents might have. The purpose was not only for organizations to take the maturity assessment, but also to use the knowledge of the respondents to improve the questionnaire. For analysis purposes, interviews have been

digitally recorded, transcribed and validated by the respondents. For purposes of consistency, the

interview protocol is enclosed in appendix E. Mainly, the interview consisted of three parts:

General questions about the organization and the respondent's role in the DW/BI project.

The DW maturity assessment questionnaire (i.e.: DW General Questions; DW Technical

Solution; DW Organization and Processes).

Final questions and suggestions.

Analysis of collected data – data analysis can be quantitative or qualitative. In this research, we perform a qualitative data analysis to derive conclusions from the interviews and improve our

model. The remainder of this chapter discusses the overall findings of the case studies, including

a short description per case and analysis of the results. Despite the fact that all individual cases

are interesting, we will focus on the overall results.

Reporting – the report communicates the findings of the study, but it is also the main source of

information for judging the quality of the study. Therefore, the master thesis document itself will

serve as the case study report.

6.2.2 Case Overview

The case studies have been conducted at four organizations of different sizes, operating in several types of

industries and offering a wide variety of products and services. An overview of the case study

organizations (figures are taken from 2009 annual reports) and respondents is depicted in table 51. As the

technologies used for developing each component of the DW can help us shape a better image on the DW

solution and its maturity, an overview of these technologies is also offered in table 52. For each case, a

short description is provided in the following subparagraphs. A short analysis of the maturity scores each organization obtained after taking the assessment is also given further in this chapter. However, due to

confidentiality reasons, the individual answers for each question and the feedback given to each

organization are not published in the official version.

| Organization | A | B | C | D |
|---|---|---|---|---|
| Industry | Retail | Insurance | Retail | Maintenance & Servicing |
| Market | B2C | B2B & B2C | B2C | B2B |
| Revenue | 19.94 billion € | 4.87 billion € | 780 million € | NA |
| Employees | ≈ 138000 | ≈ 4500 | ≈ 3660 | ≈ 3500 |
| Respondent Function | BI consultant | DW/BI technical architect | BI manager | BI consultant & DW lead architect |

Table 51: Case and Respondent Overview.

| Developing Category | Organization A | Organization B | Organization C | Organization D |
|---|---|---|---|---|
| Data Modelling | NA | Power Designer | SAP | Visio, Word, PowerPoint |
| Extract/Transform/Load (ETL) | IBM InfoSphere DataStage | IBM InfoSphere DataStage | SAP | Oracle Warehouse Builder |
| BI Applications | Microstrategy & Business Objects | Cognos & SAS; in-house Business Objects | QlikView | Oracle BI Enterprise Edition |
| Database | IBM DB2 | Oracle DB | IBM DB2 | Oracle DB |

Table 52: Technologies Usage Overview.

6.2.2.1 Organization A

Organization A is an international food retailer headquartered in Western Europe. It has leading positions

in food retailing in key markets. These positions are built through strong regional companies going to

market in a variety of food store formats. The operating companies benefit from the Group's global

strength and best practices. Their strategy remains organized around the three pillars of profitable top-line

growth, the pursuit of excellence in execution and corporate citizenship. Organization A considers that in

a high-volume industry characterized by low margins such as food retail, excellent execution offers a true

competitive advantage. This is the reason why sophisticated tools, state-of-the-art systems and

streamlined processes are implemented to serve as the foundation for profitable growth and good returns.

Connecting and converging tools, systems, processes and people help the operating companies to address

both current and future challenges with cost-effective and integrated solutions.

DW General Information

The main drivers for developing a DW at organization A were to improve managerial decisions and

increase profit. For a supermarket it is very important to store data at a high granularity in the DW. In this

way operations can be closely monitored and different types of BI applications can be developed. The

main activities supported by the DW/BI environment are reporting and dashboarding on the main KPIs (profit margins, store usage, store losses, etc.), as well as some data mining to determine which products are most often sold together, which enables better product placement and promotion decisions.

Organization A has been using DW/BI for almost 10 years and executives perceive the DW/BI

environment as a tactical resource (i.e.: a tool to assist decision making) and their goal is for the DW/BI

to become a competitive differentiator (i.e.: key to gaining or keeping customers and/or market share) in

the future. In general, the returns obtained from the DW are higher than its costs, and data quality is high, which can also be seen in the relatively high end-user adoption. As the DW environment is considered an important success factor, the budget is owned by the business side and a relatively high percentage of the annual IT budget is allocated to the DW/BI department.

6.2.2.2 Organization B

Organization B, situated in Western Europe, is a major player in the insurance market. Established in the eighteenth century, it has a centuries-long tradition and experience in this field. It offers both private

individuals and companies a wide array of life, non-life, medical and disability insurances, and also

mortgage, savings and investment products. The distribution is made via several channels: brokers,

consultants working on commission, banks, independent intermediaries and direct contact.

DW General Information

The main driver for developing a DW at organization B was that business analysts and controllers needed

integrated data in order to make their own reports, analyses, etc. Another requirement came from the

consumer intelligence department that wanted to have a broader view on the whole company portfolio

and on each customer portfolio.

Organization B has been using DW/BI for almost 10 years and the DW solution is perceived as an

operational cost center (i.e.: an IT system needed to run the business) and in certain situations, as a

tactical resource. Data quality is considered to be a business responsibility and therefore the ―garbage in,

garbage out‖ principle is applied. However, a lot of attention is paid to the software and development

processes quality in the technical department. The DW solution is not a big success in organization B as

the end-user adoption is not high due to distrust in the data and the DW solution itself. DW/BI has

decentralized budget owners as business lines have their own budgets and each one of them decides what

to spend on BI, but generally around 5% of the IT budget is allocated for this department.

6.2.2.3 Organization C

Organization C is a supermarket chain with local activities in a western European country. They are a

unique and ambitious organization due to their cooperative nature: an intense cooperation between the

organization and its members. They are a flat organization with short communication lines, a complete

and modern logistics system and a low cost structure. This organization offers a wide diversity of food

related products and their goal is optimal service to customers. They try to optimize their business outcomes by pooling purchasing volumes and ongoing cost control.

DW General Information

The main driver for developing a DW at organization C was that management could make better

decisions based on the right data. As organization C is a supermarket, some of the drivers are the same as

the ones for organization A. A DW/BI solution is very important for food retailers, as they have a large and diverse product range and many transactions take place every day. At organization C, the DW is considered

the only viable solution for reporting and data analysis.

Organization C has been using DW/BI for almost 15 years and it has evolved a lot since the first solution

was implemented. Executives perceive the DW/BI environment as a tactical resource and that is why data

quality is considered very important. It could be said that the returns are higher than the costs as, due to

high data quality and "one version of the truth", end-user adoption is good and executives are able to

make better and faster decisions. The budget owner of the DW is the Chief Financial Officer (CFO) and

usually, 10% of the annual IT budget is allocated to the DW/BI environment for continuous improvement.

6.2.2.4 Organization D

Organization D is one of the leading providers of rolling stock maintenance in Western Europe. They

provide rolling stock availability and reliability for numerous passenger and freight carriers from across

Europe. In addition to short-term maintenance, organization D also offers routine servicing. This covers

minor repairs as well as the cleaning of interiors and exteriors, including the removal of graffiti. Customer

and performance focus play an essential part in the business partnership between organization D and its

customers. That is why they are closely involved in all stages of a customer‘s project in order to avoid

unnecessary work being carried out. Another important aspect for organization D is innovation. They

invest in high-technology workshops and state-of-the-art equipment. Only by keeping up with the latest

technology can they offer specialized services to customers at the forefront of the rapidly changing rail

transport market.

DW General Information

The main driver for developing a DW at organization D was the need for high data quality and

consistency in order for the business (especially middle and higher management) to make the right

decisions. In the beginning, the focus was on the operational side, but now the main focus is on the financial one; the main goal for this year is to integrate the two solutions into a single DW.

Organization D has been using DW for 3.5 years and it started out as a tactical resource. Nowadays,

executives perceive it as a mission-critical resource and the goal for the near future is for the DW to

become a strategic resource. Therefore, the DW solution and the way it is perceived in the organization

have developed a lot since it was first implemented. In general, there is a positive net result when

comparing the returns and costs of the DW. The main benefits include high data quality and end-user

adoption. From this point of view, the DW/BI solution has achieved its goal. The budget owner of the

DW is the Chief Financial Officer (CFO) and usually, less than 5% of the annual IT budget is allocated to

the DW/BI environment.

6.2.2.5 Case Study Analysis

In this section, a short analysis of the results obtained by all the organizations after filling in the assessment questionnaire is given. The maturity scores regarding the implemented DW solution obtained by the

organizations can be seen in the table below.

| Benchmark Category | Organization A | Organization B | Organization C | Organization D |
|---|---|---|---|---|
| Architecture | 2.67 | 2.56 | 3.89 | 3.55 |
| Data Modelling | 2.17 | 3.44 | 3.00 | 4.11 |
| ETL | 3.14 | 3.29 | 3.71 | 2.86 |
| BI Applications | 2.71 | 2.71 | 3.43 | 3.57 |
| Development Processes | 2.90 | 3.19 | 3.66 | 3.02 |
| Service Processes | 2.63 | 3.00 | 2.87 | 3.12 |

Table 53: Organizations' Maturity Scores.

As in the picture depicting our model, a better way to see the alignment between the maturity scores for the six categories is to plot them in a radar graph. The radar graphs for all the organizations can be seen in the figures below.

Figure 11: Alignment Between Organization A's Maturity Scores.

Figure 12: Alignment Between Organization B's Maturity Scores.

Figure 13: Alignment Between Organization C's Maturity Scores.

[Figures 11-14: each radar chart plots the organization's scores for the six benchmark categories on a 0-5 scale against the ideal situation (score 5).]
Figure 14: Alignment Between Organization D's Maturity Scores.
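For readers who want to reproduce such a graph, a minimal sketch is given below (not from the thesis; it assumes matplotlib is available, uses Organization A's scores from table 53, and fixes the ideal situation at the maximum score of 5 for every category):

```python
# A minimal sketch of the radar graphs in figures 11-14 (assumptions:
# matplotlib is available; scores are taken from table 53; the "ideal
# situation" is the maximum score of 5 in every category).
import numpy as np
import matplotlib.pyplot as plt

categories = ["Architecture", "Data Modelling", "ETL", "BI Applications",
              "Development Processes", "Service Processes"]
org_a = [2.67, 2.17, 3.14, 2.71, 2.90, 2.63]  # Organization A, table 53
ideal = [5] * len(categories)

# One spoke per category; repeat the first point to close the polygon.
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for label, values in [("Organization A", org_a), ("Ideal Situation", ideal)]:
    ax.plot(angles, values + values[:1], label=label)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_ylim(0, 5)
ax.legend(loc="lower right")
plt.show()
```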

Some more information regarding the maturity scores for all four case studies can be seen in the table

below.

| Maturity Score | A | B | C | D |
|---|---|---|---|---|
| Total Maturity Score for DW Technical Solution | 2.67 | 3.00 | 3.51 | 3.52 |
| Total Maturity Score for DW Organization & Processes | 2.77 | 3.10 | 3.26 | 3.07 |
| Overall Maturity Score | 2.72 | 3.05 | 3.38 | 3.29 |
| Highest Score | ETL - 3.14 | Data Modelling - 3.44 | Architecture - 3.89 | Data Modelling - 4.11 |
| Lowest Score | Data Modelling - 2.17 | Architecture - 2.56 | Service Processes - 2.87 | ETL - 2.86 |

Table 54: Maturity Scores Analysis.
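The totals in table 54 appear to be unweighted averages: each category total is the mean of its sub-category scores from table 53, and the overall score is the mean of the two category totals. A minimal sketch for Organization A is given below (the averaging rule is our reading of the scoring method, inferred from the published figures, which it reproduces up to rounding):

```python
# A minimal sketch of the score aggregation (assumption: unweighted
# averages, inferred from tables 53 and 54; it reproduces the published
# figures up to rounding).
scores = {  # Organization A's sub-category scores, from table 53
    "Architecture": 2.67, "Data Modelling": 2.17, "ETL": 3.14,
    "BI Applications": 2.71,
    "Development Processes": 2.90, "Service Processes": 2.63,
}
technical = ["Architecture", "Data Modelling", "ETL", "BI Applications"]
processes = ["Development Processes", "Service Processes"]

def mean(keys):
    return sum(scores[k] for k in keys) / len(keys)

technical_total = mean(technical)                  # 2.6725  -> reported as 2.67
processes_total = mean(processes)                  # 2.765   -> reported as 2.77
overall = (technical_total + processes_total) / 2  # 2.71875 -> reported as 2.72
print(technical_total, processes_total, overall)
```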

As can be seen from table 53, the maturity scores for each sub-category usually lie between 2 and 4, with one exception: organization D scored 4.11 for Data Modelling. Thus, the overall maturity scores and the total score per category also ranged between 2 and 4, which shows that most organizations are probably somewhere between the second and fourth stage of maturity. The highest maturity score was obtained by organization C, and the lowest one by organization A. Apparently, an overall score close to 4 or 5 is quite difficult to achieve. This is normal in maturity assessments, as in practice nobody is that close to the ideal situation. It will be interesting to see the range of scores once the questionnaire has been filled in by a large number of organizations.

From table 54 it can be seen that the categories with the highest and lowest scores vary per organization. For example, organization A scored lowest for Data Modelling, whereas Data Modelling was the most mature variable for organization D. Interesting conclusions can also be drawn by comparing the scores for organizations A and C, as they are part of the same industry. The former is an international food retailer and has more experience in this industry, whereas the latter is a local one with less experience. However, organization A obtained a rather low DW maturity score. Thus, experience in the industry does not necessarily mean maturity in data warehousing. Of course, more factors can influence this

difference in scores: size, the way data warehousing/BI is embedded in the organizational culture, the percentage of the IT budget allocated to BI, etc.

As presented in the previous chapters, the goal of our model is not only to give a maturity score to a specific organization, but also to provide it with feedback and the necessary steps for reaching a higher maturity stage. For example, the overall maturity score for organization A is 2.72, which leaves a lot of room for improvement. Moreover, as the lowest score is for Data Modelling, a good starting point

would be this category. Due to confidentiality reasons, more details regarding the maturity scores and

feedback cannot be offered here. The template used for giving feedback to the case studies can be seen in

appendix F.

6.2.2.6 Benchmarking

As already mentioned in the previous chapters, the DWCMM can serve as a benchmarking tool for

organizations. The DW maturity assessment questionnaire provides a quick way for organizations to

assess their DW maturity and, at the same time, compare themselves in an objective way against others in

the same industry or across industries. Of course, better benchmarking results will be achieved once more organizations have taken the maturity assessment. However, in order to give a better idea of what the graph looks like when benchmarking, we give an example here using the data from the case studies we performed. A bar chart comparing organization A's scores with the best practice and with the average maturity score is shown below.

Figure 15: Benchmarking for Organization A.
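A minimal sketch of such a benchmarking chart follows (not from the thesis; it assumes matplotlib, uses the table 53 scores as the benchmark population, and takes the per-category maximum as the best practice and the per-category mean as the average score):

```python
# A minimal sketch of the benchmarking bar chart in figure 15
# (assumptions: matplotlib is available; the four case study
# organizations from table 53 form the benchmark population).
import numpy as np
import matplotlib.pyplot as plt

categories = ["Architecture", "Data Modelling", "ETL", "BI Applications",
              "Development Processes", "Service Processes"]
all_scores = np.array([  # rows: organizations A-D; columns: categories
    [2.67, 2.17, 3.14, 2.71, 2.90, 2.63],
    [2.56, 3.44, 3.29, 2.71, 3.19, 3.00],
    [3.89, 3.00, 3.71, 3.43, 3.66, 2.87],
    [3.55, 4.11, 2.86, 3.57, 3.02, 3.12],
])
org_a = all_scores[0]
best_practice = all_scores.max(axis=0)  # highest observed score per category
average = all_scores.mean(axis=0)       # mean score per category

y = np.arange(len(categories))
h = 0.25
fig, ax = plt.subplots()
ax.barh(y - h, org_a, height=h, label="Organization A")
ax.barh(y, best_practice, height=h, label="Best Practice")
ax.barh(y + h, average, height=h, label="Average Score")
ax.set_yticks(y)
ax.set_yticklabels(categories)
ax.invert_yaxis()
ax.set_xlim(0, 5)
ax.legend()
plt.show()
```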

6.2.3 Case Studies Results and Conclusions

From the results obtained from the case studies, it can be said that the DWCMM can be successfully applied in practice. However, this part of the validation process had multiple goals:

First, we wanted to see if organizations can understand the questions and whether the answers match their specific situation.

Second, we wanted to see if the scoring method works and if we can offer specific feedback for

each organization to achieve a higher maturity level.

Last, but not least, we wanted to receive feedback from them regarding the questions and their

answers.

Therefore, based on the suggestions of each interviewee, we made the following changes and drew some conclusions. An overview of these changes and conclusions is given further in this paragraph. The

final version of the questionnaire is shown in appendix B.

The main changes were made after the first case study. They proved to be successful, as the same problems were not encountered again in the following case studies. The first interviewee suggested that in order to

judge the maturity of DW/BI in an organization, it is also critical to see how strongly it is embedded in

the organizational culture and how important it is considered for the organization. As this is very hard to

assess, a first step was to add the following question to the DW General Questions: What percentage of

the IT department is taking care of BI? (i.e.: how many people from the total number of IT employees?).

Moreover, the answers to questions 3 and 4 regarding ETL underwent some minor changes, as it was hard for the respondent to choose the most appropriate answer for his organization. A little confusion was also created by the answers to questions 1 (i.e.: Which types of BI applications best describe the highest level purpose of your DW environment?) and 6 (i.e.: Which BI applications delivery method best describes the highest level purpose of your DW environment?) from the BI applications part. This is the reason why we decided to arrange the answers in a hierarchical order, so that it would be clear that even if more answers match the company's current situation, only the one with the highest complexity will be scored.

Several questions from the DW Organization and Processes part also underwent minor changes. For example, answer d) from question 2 regarding the DW development processes changed from "some separation between environments (i.e.: at least three environments) with automatic transfer between them" to "some separation between environments (i.e.: at least two environments) with automatic transfer between them". An important change was made to the last question from DW development processes

regarding the testing and acceptance phase. As it proved to be difficult for the interviewee to match an

answer to the current organizational situation, we decided to change the layout of the question to a

multiple choice one. Therefore, we consider that there are seven main elements that determine the

maturity of the testing and acceptance phase: unit testing by another person, system integration testing,

regression testing, user training, acceptance testing, standard procedure and documentation, and external assessments and reviews. Respondents can now choose all the elements characteristic of their organization and we give a normalized score, between 1 and 5, in order to match the overall scoring

method.
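As the exact normalization formula is not spelled out here, the sketch below shows one natural choice (our assumption, not a confirmed detail of the questionnaire): a linear mapping from the number of selected elements onto the 1-5 scale.

```python
# A minimal sketch of the normalized 1-5 score for the multiple-choice
# testing and acceptance question (assumption: a linear mapping from the
# number of selected elements; the thesis does not specify the formula).
ELEMENTS = [
    "unit testing by another person", "system integration testing",
    "regression testing", "user training", "acceptance testing",
    "standard procedure and documentation",
    "external assessments and reviews",
]

def normalized_score(selected):
    """Map 0..7 selected elements linearly onto the 1-5 maturity scale."""
    return 1 + 4 * len(selected) / len(ELEMENTS)

# Example: an organization doing unit, integration and acceptance testing.
print(normalized_score(["unit testing by another person",
                        "system integration testing",
                        "acceptance testing"]))  # 1 + 4*3/7 = 2.71...
```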

The last change we made after the first case study was to arrange the answers to the DW service processes questions in a hierarchical order, as organizations usually need to fulfill the requirements of a lower level in order to get to a higher one. Apparently, the mixed answers created some confusion for the respondent. The risk of the hierarchical order was that respondents might give a biased response, but when tested in the other three case studies this did not seem to happen, as we obtained diverse scores depending on the question and the organization. Of course, more feedback regarding this aspect will be received after testing the questionnaire in more organizations.

As already mentioned, most of the changes were made after the first case study. However, after receiving the feedback from all four respondents, we decided that further changes were needed to improve the DW maturity assessment questionnaire.

First, the answer we proposed for the highest level of maturity for the first question regarding the predominant architecture of the DW, "a virtual integrated DW", will be changed to "a DW/BI service that federates a central enterprise DW and other data sources via a standard interface". To further accelerate

development and adapt quickly to changing business needs, mature organizations can redistribute some

development tasks to the business units and departments. However, a central DW is needed as a

repository for information shared across business units. Distributed groups are just allowed to build their

own applications within a framework of established standards, often maintained by a center of excellence.

In order to be successful, DW/BI solutions have to first be centralized and later federated (Eckerson,

2004). Another way to accelerate the development of BI-enabled solutions is for organizations to use

service oriented architecture (SOA). By wrapping BI functionality and query object models with Web

services interfaces, developers can make DW/BI capabilities available to any application regardless of the

platform it runs on or the programming language it uses. As the previous answer was not very clear to the interviewees, we believe that the new one will convey the intended meaning to future respondents.
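As an illustration of this SOA idea, the sketch below wraps a DW query in a web service so that any application can consume it over HTTP; the endpoint, table and column names are invented for the example, and Flask and SQLite merely stand in for a real service platform and DW:

```python
# A minimal illustration (assumption: the endpoint, table and column
# names are invented) of wrapping DW query functionality in a web
# service so any application can consume DW/BI capabilities regardless
# of its platform or programming language.
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/kpi/profit-margin")
def profit_margin():
    """Expose a DW aggregate as a service; callers need no DW client."""
    store = request.args.get("store", "all")
    con = sqlite3.connect("warehouse.db")  # stand-in for the real DW
    cur = con.execute(
        "SELECT AVG(margin) FROM sales_fact"
        + ("" if store == "all" else " WHERE store_id = ?"),
        () if store == "all" else (store,),
    )
    (value,) = cur.fetchone()
    con.close()
    return jsonify({"store": store, "profit_margin": value})

if __name__ == "__main__":
    app.run()
```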

Furthermore, question 2 from Data Modelling is rather complex, as it involves the synchronization between a wide range of data models, and in the future it might be better to split it into several questions. However, so far we have not agreed on what a better question and set of answers would look like. Minor changes were made to the answers on the second stage of maturity for questions 4, 5 and 6 regarding the standards and metadata for data modelling. For questions 4 and 5, we decided to change the characteristic for the second level of maturity from "solution dependent standards implemented for some of the data models" to "solution dependent standards" to make the distinction between this maturity stage and the next one even stronger, as the next stage of maturity already involves enterprise-wide (or team-wide) standards for some data models. A similar argument holds for changing the characteristic on the second maturity stage for question 6 (regarding metadata management for data models) from "non standardized documentation for some of the data models" to "non standardized documentation".

Another question whose answers created some confusion was the one related to metadata management for ETL. From the literature study we have done, the conclusion was drawn that organizations usually manage the business and technical metadata for some or all ETL, and the ones with broad experience in this field usually also manage the process metadata. However, one of the respondents answered that they manage process and technical metadata for all ETL and business metadata only for some ETL. Therefore, we consider that the answers to the question "To what degree is your metadata management implemented for your ETL?" may be something like the ones proposed here: a) no metadata management; b) only one type of metadata managed (i.e.: business, technical or process); c) two types of metadata managed (i.e.: any combination of business, technical and process); d) all three types of metadata (i.e.: business, technical and process) managed for some ETL; e) all three types of metadata managed for all ETL. However, these new characteristics need to be further tested in practice to be validated.
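If these answers are adopted, the scoring can stay consistent with the rest of the questionnaire by mapping them one-to-one onto the five maturity levels, as in this small sketch (the mapping itself is our assumption):

```python
# A small sketch (assumption: answers a-e map one-to-one onto maturity
# levels 1-5, consistent with the overall scoring method).
ETL_METADATA_LEVELS = {
    "a": 1,  # no metadata management
    "b": 2,  # only one type of metadata managed
    "c": 3,  # two types of metadata managed
    "d": 4,  # all three types managed for some ETL
    "e": 5,  # all three types managed for all ETL
}
print(ETL_METADATA_LEVELS["d"])  # 4
```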

Moreover, another change to be considered is for question 7 from the DW development processes regarding DW project management. There are usually five main elements that determine the maturity of DW project management: project planning and scheduling, project tracking and control, project risk

management, standard procedure and documentation, and evaluation and assessment. Therefore, we

believe that a better layout and scoring for this question would be one similar to the one proposed for the

testing and acceptance phase.

While doing the case studies, we came up with a general question regarding the DW service processes

that includes the main activities from this phase. The concept of this question is the same as the one

proposed for the testing and acceptance phase from the DW development processes. We had the

opportunity to test and score it for the last three case studies, but we did not include its score in the final result in order to be able to compare all four case studies on the same level. The question seems to

work in practice, although, as with the other questions with similar layout (i.e.: the ones for testing and

acceptance, project management, etc.), it is quite difficult to judge which characteristics should be on

which maturity stage.

A last remark is related to the questions on the defined, documented and implemented standards. As one

of the experts suggested, we divided the questions related to standards into two separate ones: the first

regarding the definition and documentation of standards, and the second one regarding the actual

implementation and following of standards. After testing the model, we saw that some organizations consider these two aspects synonymous and sometimes fill in the same answers. However, as we believe that this distinction should be made and we cannot generalize how other organizations would see this problem, we will leave the questions separate for the time being.

To sum up, the DW maturity assessment questionnaire can be successfully applied in practice. We

generally received positive feedback regarding the questions and their answers from the case study

interviewees. In this way, we could test whether the questions and their answers are representative for

assessing the current DW solution for a specific organization and if they can be mapped to any

organization depending on the situational factors. We also had the chance to apply the scoring method and give appropriate feedback for each case study. Finally, we combined all the feedback received from the case studies and made some small but valuable changes to some questions and answers, which improved our DWCMM as a whole.

6.3 Summary

This chapter presented the results of the two main activities done for evaluating our model: five expert

interviews and four case studies. We started by giving an overview of the experts and their affiliations,

and then we showed the main changes that resulted after the five expert interviews. We continued with

presenting the four case studies and the underlying respondents. Finally, we analyzed the maturity scores obtained by the cases and illustrated the changes made to the questionnaire following the case studies.

In the next chapter, we conclude our research and present some discussion points and possible future

work.

7 Conclusions and Further Research

In this section, the main conclusions of this study are presented. Subsequently, some critical analysis of

the results is done and finally, recommendations for future research are made.

7.1 Conclusions

This research was triggered by the estimates made by (Gartner, 2007) and other researchers that more than fifty percent of DW projects have limited acceptance or fail. Therefore, we developed a Data

Warehouse Capability Maturity Model (DWCMM) that would help organizations assess their current DW

solution and provide guidelines for future improvements. The main elements that usually influence the

success of a DW environment are: technical components, end user adoption and usage, and business

value. However, we limited our research to the technical components due to time constraints and the fact that a solid technical solution is usually the foundation for the other two elements to be successful. In this

way we attempted to answer the main research question for our study:

How can the maturity of a company's data warehouse technical aspects be assessed and acted upon?

The main conclusion from our study is that, even if our maturity model could help organizations improve

their DW solutions, there is no "silver bullet" for the successful development of DW/BI solutions. The

DWCMM provides a quick way for organizations to assess their DW/BI maturity and compare

themselves in an objective way against others in the same industry or across industries. It received

positive feedback from the five experts that reviewed and validated it and it also resonated well with the

audiences from our four case studies. However, it is critical to emphasize the fact that the model only

does a high-level assessment. In order to truly assess the maturity of their DW/BI solutions and discover

the strong and weak variables, organizations should use our assessment as a starting point for a more

thorough analysis. Our research also showed that the model can be applied to a wide diversity of

organizations from different industries, but the results and guidelines for future improvement depend on situational factors specific to each organization. According to the experts who validated our model,

some important situational factors are: whether data warehousing and BI can act as a differentiator in

their specific industry, the size of the organization, their budget (especially the one for DW/BI), the

organizational culture regarding DW/BI, etc.

Moreover, our main research question is split into several more specific sub-questions:

1) What are data warehouses and business intelligence?

2) What do maturity models represent and which are the most representative ones for our research?

The answers to these two questions are the foundation for our research and for the main "artifact" we have delivered, the DWCMM. In this way, we presented some theoretical background on the key concepts of the study (data warehousing, business intelligence, maturity modelling) and discovered the research gap that our model could fill: the lack of a maturity model that would help organizations take a snapshot of their current DW/BI technical solution and provide them with systematic steps for achieving a higher maturity stage, and thus a DW/BI environment that delivers better results.

3) What are the most important variables and characteristics to be considered when building a data

warehouse?

This question is addressed by the DWCMM and its components as presented in chapter 3. As already

mentioned, the model we developed is limited to the technical aspects and it considers two main

benchmark variables/categories for analysis, each of them having several sub-categories (i.e.: DW

Technical Solution – Architecture, Data Modelling, ETL, BI Applications; and DW Organization and

Processes – Development Processes, Service Processes).

4) How can we design a capability maturity model for a data warehouse assessment?

The answer to this question is offered by the DW Maturity Assessment Questionnaire and the underlying Maturity Scoring and Maturity Matrices. The questionnaire includes several questions for each benchmark category and sub-category. In the end, a maturity score is given for each sub-category and category, and the end result is an overall maturity score. Based on this, an informative maturity stage can be pointed out for a specific organization and some general feedback regarding the maturity scores and future steps for improvement can be outlined.

5) To which extent does the data warehouse capability maturity model result in a successful assessment

and guideline for the analyzed organizations?

Once we developed the model, an evaluation phase was necessary to test its validity and depending on the

results, make the necessary adjustments. The DWCMM with all its components was initially reviewed by

five notable experts in this field, and then, tested in four organizations to see whether it can achieve its

goal or not. The model received positive feedback in general from the experts, and several minor changes

were made, as can be seen in appendix C. Furthermore, the experts pointed out that even if the model succeeds in emphasizing the most important aspects involved in the development of a DW/BI project, it

might not be complete. Other benchmark categories and sub-categories should be added in the future.

Also, the DWCMM can serve as a high level technical assessment, but more questions and thorough

analysis are needed to dig deeper into the strengths and weaknesses of the DW/BI environment. The four

case studies offered us the possibility to test the model in practice. Generally speaking, the model seemed

to deliver the desired results. The respondents identified the categories and sub-categories from the model

and the questions and answers were usually well understood. Based on their comments, we made several adjustments, so that the assessment would be clearer and better understood by future

respondents. Moreover, the scoring method seemed to work well, and we were also able to offer feedback

to our respondents. Of course, we believe that more valuable feedback could be given in the future by

someone with more experience in this field. An observation at this point is that we cannot track what the

organizations are going to do with the results from this assessment and whether they are actually going to

take action to improve their DW/BI solution.

7.2 Limitations and Further Research

For every scientific research project, it is important to elaborate on its objectivity and limitations. First of

all, a limitation of this study is that it is based on design science research, which answers research questions in the form of design artifacts. As this is a qualitative research method, a risk to objectivity might arise. Hence, a certain influence of the experiences, opinions and feelings of the researcher on the analysis

is possible. In our study, the main deliverables were developed through a thorough literature study, but also in collaboration with a Dutch organization, as described in the acknowledgements. Therefore, a slight lack of impartiality might have slipped into the initial structure of the model. However, this weakness was minimized by validating the model with several experts in the field.

Another limitation is the fact that our model was evaluated by conducting case study research. The

DWCMM was tested in four organizations where the position of the respondents in the organization and

their viewpoints might have biased the validation. For future reference, it would probably be advisable for at least two respondents from one organization to take the assessment, as the results would then likely be more objective. Also, because the model was tested in only four cases, it is

not possible to generalize the findings to any given similar situation. Therefore, for further research, it

would be interesting to validate the model using quantitative research methods. An example would be to

have the assessment questionnaire filled in by a large number of organizations in order to enable statistical analysis of the data, more valuable benchmarking and improvements to the whole

structure of the model. Another interesting approach would be to interview more experts from different

organizations in order to come up with a different structure for the model, new benchmark categories and

sub-categories and, of course, new maturity questions and answers. Moreover, as suggested by the experts, new elements that could be analyzed further in the future are data quality (currently one of the most important reasons for DW/BI failure) and data governance. These two elements could both be part of a bigger category, called data management.

An important aspect to mention here is the fact that our research is limited to the technical aspects of a

DW/BI project. Therefore, a point for future research would be to extend the model to the analysis of

DW/BI end user adoption and business value. New benchmark categories and maturity assessment

questions could be added regarding these two problems. Another future extension that would increase the

value of the model could include questions and analysis for other types of data modelling (e.g.:

normalized modelling, data vault, etc.) because, as stated earlier in this thesis, we limited our maturity

assessment only to dimensional modelling. Last but not least, as already mentioned, our model is a high-level one. In the future, several questions could be added for a more detailed analysis of the current DW/BI environment, and more valuable feedback could be offered to organizations.

To sum up, this study can be seen as a contribution to understanding the main categories and elements

that determine the maturity of a DW/BI project. The developed model serves as an assessment of the current DW/BI solution of a specific organization and offers guidelines for future improvements. As

shown, the model received positive feedback when validated, but there is always room for improvement.

And, due to the current economic situation, data warehousing and BI could really make a difference.

According to (Gartner, 2009), in the near future, organizations will have high expectations from their BI

and performance management initiatives to help transform and significantly improve their business. As

can be seen, data warehousing, BI and performance management are becoming more and more valuable to organizations, and many developments can and will take place in this field, and consequently in our model.

8 References

Aamodt, A., & Nygård, M. (1995). Different Roles and Mutual Dependencies of Data, Information and Knowledge.

Data and Knowledge Engineering, 16 , 191-222.

AbuAli, A., & Abu-Addose, H. (2010). Data Warehouse Critical Success Factors. European Journal of Scientific

Research, 42 , (2), 326-335.

AbuSaleem, M. (2005). The Critical Success Factors of Data Warehousing. Retrieved June 24, 2010, from Master's

Degree Programme in Advanced Financial Information Systems: http://www.pafis.shh.fi/graduates/majabu03.pdf

Ackoff, R. (1989). From Data to Wisdom. Journal of Applied Systems Analysis, 16, 3-9.

Agresti, W. (2000). Knowledge Management. In M. Zelkowitz, Advances in Computers (pp. 171-283). London:

Academic Press.

Aldrich, H., & Mindlin, S. (1978). Uncertainty and Dependence: Two Perspectives on Environment. In L. Karpik,

Organization and Environment: Theories, Issues and Reality (pp. 149-170). London: Sage Publications Inc.

April, A., Hayes, J., Abran, A., & Dumke, R. (2004). Software Maintenance Maturity Model: the Software

Maintenance Process Model. Journal of Software Maintenance and Evolution: Research and Practice, 17, (3), 197-223.

Arnott, D., & Pervan, G. (2005). A Critical Analysis of Decision Support Systems Research. Journal of Information

Technology, 20 , (2), 67-87.

Azvine, B. C. (2005). Towards Real-Time Business Intelligence. BT Technology Journal, 23 , (3), 214-225.

Batory, D. (1988). Concepts for a Database System Synthesizer. Proceedings of International Conference on

Principles of Database Systems. Paris.

Becker, J., Knackstedt, R., & Poppelbus, J. (2009). Developing Maturity Models for IT Management: A Procedure

Model and its Application. Business & Information Systems Engineering, 1 , (3), 213-222.

Benbasat, I., Goldstein, D., & Mead, M. (1987). The Case Research Strategy in Studies of Information Systems. MIS

Quarterly, 11 , (3), 369-386.

Bennett, K. (2000). Software Maintenance: a Tutorial. In M. Dorfman, & R. Thayer, Software Engineering (pp. 289-

303). Los Alamitos: IEEE Computer Society Press.

Blumberg, R., & Atre, S. (2003). The Problem with Unstructured Data. Retrieved July 23, 2010, from Information

Management: http://www.information-management.com/issues/20030201/6287-1.html

Boehm, B. (1988). A Spiral Model of Software Development and Enhancement. IEEE Computer, 21, (5), 61-72.

Boisot, M., & Canals, A. (2004). Data, Information and Knowledge: Have We Got It Right? Journal of Evolutionary

Economics, 14 , (1), 43-67.

Breitner, C. (1997). Data Warehousing and OLAP: Delivering Just-in-Time Information for Decision Support.

Proceeding of the 6th International Workshop for Oeconometrics. Karlsruhe, Germany.

Breslin, M. (2004). Data Warehousing - Battle of the Giants: Comparing the Basics of the Kimball and Inmon

Models. Business Intelligence Journal, 9 , (1), 6-20.

Bruckner, R., List, B., & Schiefer, J. (2002). Striving Towards Near Real-Time Data Integration for Data

Warehouses. In Lecture Notes in Computer Science (pp. 173-182). Berlin: Springer.

Cater-Steel, A. (2006). Transforming IT Service Management - the ITIL Impact. Proceedings of the 17th

Australasian Conference on Information Systems. Adelaide, Australia.

Cavaye, A. (1996). Case Study Research: A Multifaceted Research Approach for Information Systems. Information

Systems Journal, 6 , 227-242.

Chamoni, P., & Gluchowski, P. (2004). Integrationstrends bei Business-Intelligence-Systemen, Empirische

Untersuchung auf Basis des Business Intelligence Maturity Model. Wirtschaftsinformatik, 46 , (2), 119-128.

Chauduri, S., & Dayal, U. (1997). An Overview of Data Warehousing and OLAP Technology. ACM Sigmod

Record, 26 , (1), 65-74.

Chen, P. (1975). The Entity-Relationship Model — Toward a Unified View of Data. Proceedings of the

International Conference on Very Large Data Bases, (pp. 9-36). Framingham, Massachusetts, USA.

Choo, C. (1995). Information Management for the Intelligent Organization. Medford, NJ: Information Today, Inc.

Chung, W., Chen, H., & Nunamaker Jr., J. (2005). A Visual Framework for Knowledge Discovery on the Web: An

Empirical Study of Business Intelligence Exploration. Journal of Management, 21 , (4), 57-84.

Codd, E. (1970). A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, 13, 377-387.

Colin, R. (2004). An Introductory Overview of ITIL. Reading, United Kingdom: itSMF Publications.

Darke, P., Shanks, G., & Broadbent, M. (1998). Successfully Completing Case Study Research: Combining Rigour,

Relevance and Pragmatism. Information Systems Journal, 8 , (4), 273-289.

Davenport, T., & Prusak, L. (2000). Working Knowledge: How Organizations Manage What They Know. Harvard:

Harvard Business Press.

Dayal, U., Castellanos, M., Simitsis, A., & Wilkinson, K. (2009). Data Integration Flows for Business Intelligence.

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database

Technology (pp. 1-11). Saint Petersburg, Russia: ACM.

de Bruin, T., Freezey, R., Kulkarniz, U., & Rosemann, M. (2005). Understanding the Main Phases of Developing a

Maturity Assessment Model. Proceedings of the 16th Australasian Conference on Information Systems. Sydney,

Australia.

Devlin, B., & Murphy, P. (1988). An Architecture for a Business and Information System. IBM Systems Journal, 27, (1).

Drucker, P. (1999). Management Challenges for the 21st Century. Oxford: Butterworh-Heinemann.

Eckerson, W. (2009). Delivering Insights with Next Generation Analytics. Retrieved April 23, 2010, from The Data

Warehousing Institute: http://tdwi.org/research/2009/07/beyond-reporting-delivering-insights-with-nextgeneration-analytics.aspx?tc=page0

Eckerson, W. (2004). Gauge Your Data Warehousing Maturity. Retrieved July 3, 2010, from The Data Warehousing

Institute: http://tdwi.org/Articles/2004/10/19/Gauge-Your-Data-Warehousing-Maturity.aspx?Page=2

Eckerson, W. (2006). Performance Dashboards. New Jersey: John Wiley & Sons, Inc.

Eckerson, W. (2007). Predictive Analytics: Extending the Values of Your Data Warehousing Investment. Retrieved

June 30, 2010, from SAS: http://www.sas.com/feature/analytics/102892_0107.pdf

Fayyad, U., Gregory, P., & Padhraic, S. (1996). From Data Mining to Knowledge Discovery in Databases. The AI

Magazine, 17 , (3), 37-54.

Feinberg, D., & Beyer, M. (2010). Magic Quadrant for Data Warehouse Database Management Systems. Retrieved

July 21, 2010, from Business Intelligence: http://www.businessintelligence.info/docs/estudios/Gartner-Magic-Quadrant-for-Datawarehouse-Systems-2010.pdf

Ferguson, R., & Jones, C. (1969). A Computer Aided Decision System. Management Science, 15 , (10), B550-B561.

Fitzgerald, G. (1992). Executive Information Systems and Their Development in the U.K.: A Research Study.

International Information Systems, 1 , (2),1-35.

Galliers, R., & Sutherland, A. (1991). Information Systems Management and Strategy Formulation: the "Stages of

Growth". Information Systems Journal, 1 , (2), 89-114.

Gartner. (2007, February 1). Creating Enterprise Leverage: The 2007 CIO Agenda . Retrieved June 24, 2010, from

Gartner: http://www.gartner.com/DisplayDocument?id=500835

Gartner. (2009). Gartner Reveals Five Business Intelligence Predictions for 2009 and Beyond. Retrieved August 6,

2010, from Gartner: http://www.gartner.com/it/page.jsp?id=856714

Golfarelli, M., & Rizzi, S. (2009). A Comprehensive Approach to Data Warehouse Testing. Proceeding of the ACM

twelfth international workshop on Data warehousing and OLAP, (pp. 17-24). Hong Kong.

Golfarelli, M., & Rizzi, S. (1998). A Methodological Framework for DW Design. Proceedings ACM First

International Workshop on Data Warehousing and OLAP (DOLAP). Washington, D.C., USA.

Golfarelli, M., & Rizzi, S. (1999). Designing the Data Warehouse: Key Steps and Crucial Issues. Journal of

Computer Science and Information Management, 2 , (3).

Golfarelli, M., Rizzi, S., & Cella, I. (2004). Beyond Data Warehousing - What's Next in Business Intelligence?

Proceedings of the 7th ACM International Workshop on Data Warehousing and OLAP, (pp. 1-6). Washington, D.C.,

USA.

Gorry, A., & Morton, S. (1971). A Framework for Information Systems. Sloan Management Review, 13 , 56-79.

Gray, P., & Negash, S. (2003). Business Intelligence. Proceedings of the 9th Americas Conference on Information

Systems, (pp. 3190-3199). Tampa, Florida, USA.

Grönroos, C. (1990). Service Management and Marketing - Managing the Moments of Truth in Service Competition.

Lexington: Lexington Books.

Hakes, C. (1996). The Corporate Self Assessment Handbook, 3rd edition. London: Chapman & Hall.

Hansen, W. (1997). Vorgehensmodell zur Entwicklung einer Data Warehouse Lösung. In H. Mucksch, & W.

Behme, Das Data Warehouse Konzept (pp. 311-328). Wiesbaden: Gabler.

Hayen, R., Rutashobya, C., & Vetter, D. (2007). An Investigation of the Factors Affecting Data Warehousing Success. Issues in Information Systems, 8, (2), 547-553.

Hevner, A., March, S., Park, J., & Ram, S. (2004). Design Science in Information Systems Research. Management

Information Systems Quarterly, 28 , (1), 75-106.

Hey, J. (2004). The Data, Information, Knowledge, Wisdom Chain: the Metaphorical Link. Retrieved June 27, 2010,

from http://best.berkeley.edu/~jhey03/files/reports/IS290_Finalpaper_HEY.pdf

Hoffman, R., Shadbolt, N., Burton, A., & Klein, G. (1995). Eliciting Knowledge from Experts: A Methodological

Analysis. Organizational Behaviour and Human Decision Processes, 62 , (2), 129-158.

Holsheimer, M., & Siebes, A. (1994). Data Mining: the Search for Knowledge in Databases (9406). Amsterdam:

Centrum voor Wiskunde en Informatica.

Hostmann, B. (2007). BI Competency Centers: Bringing Intelligence to the Business. Retrieved July 3, 2010, from

Business Performance Management: http://bpmmag.net/mag/bi_competency_centers_intelligence_1107/index2.html

Huber, G. (1984). Issues in the Design of Group Decision Support Systems. MIS Quarterly, 8 , (3), 195-204.

Humphries, M., Hawkins, M., & Dy, M. (1999). Data Warehousing: Architecture and Implementation. New Jersey:

Prentice Hall PTR.

Husemann, B., Lechtenborger, J., & Vossen, G. (2000). Conceptual Data Warehouse Design. Proceedings of the

International Workshop on Design and Management of Data Warehouses. Stockholm, Sweden.

Hwang, H., Ku, C., Yen, D., & Cheng, C. (2005). Critical Factors Influencing the Adoption of Data Warehouse

Technology: A Study of the Banking Industry in Taiwan. Decision Support Systems, 37 , 1-21.

IEEE. (1990). Standard Glossary of Software Engineering Terminology (IEEE STD 610.12). New York: Institute of

Electrical and Electronics Engineers, Inc.

Inmon, W. (1992). Building the Data Warehouse. Indianapolis: John Wiley and Sons, Inc.

Inmon, W. (2005). Building the Data Warehouse, 4th edition. Indianapolis: Wiley Publishing, Inc.

Jashapara, A. (2004). Knowledge Management: An Integrated Approach. Harlow: Finance Times Prentice Hall.

Kaplan, R., & Norton, D. (1992). The Balanced Scorecard - Measure that Drive Performance. Harvard Business

Review, 70 , (1), 71-79.

Kaula, R. (2009). Business Rules for Data Warehouse. International Journal of Information Technology, 5 , 58-66.

Kaye, D. (1996). An Information Model of Organization. Managing Information, 3 , (6),19-21.

Kimball, R. (1996). The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data

Warehouses. New York: John Wiley & Sons, Inc.

Kimball, R., & Caserta, J. (2004). The Data Warehouse ETL Toolkit. Indianapolis: Wiley Publishing, Inc.

Kimball, R., Ross, M., Thornthwaite, W., Mundy, J., & Becker, B. (2008). The Data Warehouse Lifecycle Toolkit,

2nd Edition. Indianapolis: John Wiley.

Klimko, G. (2001). Knowledge Management and Maturity Models: Building Common Understanding . Proceedings

of the 2nd European Conference on Knowledge Management, (pp. 269-278). Bled, Slovenia.

Kraemer, K., & King, J. (1988). Computer-Based Systems for Cooperative Work and Group Decision Making. ACM

Computing Surveys , (2), 115-146.

Kuhn, T. (1974). Second Thoughts on Paradigms. In F. Suppe, The Structure of Scientific Theories. Urbana: The

University of Illinois Press.

Lewis, J. (2001). Project Planning, Scheduling and Control, 3rd Edition. New York: McGraw-Hill.

Loshin, D. (2003). Business Intelligence: the Savvy Manager's Guide. San Francisco: Morgan Kaufmann Publishers.

Luhn, H. (1958). A Business Intelligence System. IBM Journal of Research and Development, 2 , (4), 314-319.

Madden, S. (2006). Rethinking Database Appliances. Retrieved July 21, 2010, from Information Management:

http://www.information-management.com/specialreports/20061024/1066827-1.html?pg=1

March, S., & Hevner, A. (2007). Integrated Decision Support Systems: A Data Warehousing Perspective. Decision

Support Systems, 43 , (3), 1031-1043.

Moody, D., & Kortink, M. (2000). From Enterprise Models to Dimensional Models: A Methodology for Data

Warehouse and Data Mart Design. Proceedings of the International Workshop on Design and Management of Data

Warehouses, (pp. 1-12). Stockholm.

Moss, L., & Atre, S. (2003). Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support

Applications. Boston: Addison Wesley.

Mukherjee, D., & D'Souza, D. (2003). Think Phased Implementation for Successful Data. Information Systems

Management, 20 , (2), 82-90.

Munoz, L., Mazon, J., Pardillo, J., & Trujillo, J. (2008). Modelling ETL Processes of Data Warehouses with UML

Activity Diagrams. Proceedings of the OTM Workshops (pp. 44-53). Monterrey, Mexico: Springer.

Murtaza, A. (1998). A Framework for Developing Enterprise Data Warehouses. Information Systems Management,

15 , (4), 21-26.

Nagabhushana, S. (2006). Data Warehousing, OLAP and Data Mining. New Delhi: New Age International Limited.

Navathe, S. B. (1992). Evolution of Data Modelling for Databases. Communications of the ACM, 35 , (9), 112-123.

Negash, S., & Gray, P. (2003). Business Intelligence. Proceedings of the 9th Americas Conference on Information

Systems, (pp. 3190-3199). Tampa, Florida, USA.

Niessink, F., & van Vliet, H. (2000). Software Maintenance from a Service Perspective. Journal of Software

Maintenance: Research and Practice, 12 , (2), 103-120.

Niessink, F., & van Vliet, H. (1999). The IT Service Capability Maturity Model (IR-463). Amsterdam: Division of

Mathematics and Computing Science, Vrije Universiteit.

Nolan, R. (1973). Managing the Computer Resource: A Stage Hypothesis. Communications of the ACM, 16 , (7),

399-405.

Nonaka, I. (1991). The Knowledge-Creating Company. Harvard Business Review, 79 , (6), 96-104.

O'Reilly, C. (1980). Individuals and Information Overload in Organizations: Is More Necessarily Better? Academy

of Management Journal, 23 , (4), 684-696.

Paulk, M., Weber, C., Curtis, B., & Chrissis, M. (1995). The Capability Maturity Model: Guidelines for Improving

the Software Process. Boston: MA: Addison-Wesley.

Ponniah, P. (2001). Data Warehousing Fundamentals. New York: John Wiley & Sons, Inc.

Porter, M. (1985). Competitive Advantage. New York: The New Press.

Power, D. (2003). A Brief History of Decision Support Systems. Retrieved June 30, 2010, from Decision Support

Systems Resources: http://dssresources.com/history/dsshistoryv28.html

Prakash, N., & Gosain, A. (2008). An Approach to Engineering the Requirements of Data Warehouses.

Requirements Engineering, 13 , (1), 49-72.

Rahm, E., & Hai Do, H. (2000). Data Cleaning: Problems and Current Approaches. Bulletin of the Technical

Committee on Data Engineering, 23 , (4), 3-13.

Rangaswamy, A., & Shell, G. (1997). Using Computers to Realize Joint Gains in Negotiations: Toward an

‗Electronic Bargaining Table‘. Management Science, 43 , (8), 1147-1163.

Royce, W. (1970). Managing the Development of Large Software Systems. Proceedings of the Western Electronic

Show and Convention (WesCon). Los Angeles.

Runeson, P., & Host, M. (2009). Guidelines for Conducting and Reporting Case Study Research in Software

Engineering. Empirical Software Engineering, 14 , (2), 131-164.

Rus, I., & Lindvall, M. (2002). Knowledge Management in Software Engineering. IEEE Software, 19 , (3), 26-38.

Salle, M. (2004). IT Service Management and IT Governance: Review, Comparative. Retrieved July 16, 2010, from

HP Technical Reports: http://www.hpl.hp.com/techreports/2004/HPL-2004-98.pdf

Schneidewind, N. (1987). The State of Maintenance. IEEE Transactions on Software Engineering, 13 , (3), 303-310.

Schwaninger, M. (2001). Intelligent Organizations: An Integrative Framework. SystemsResearch and Behavioral

Science, 18 , 137-158.

Sen, A., & Sinha, A. (2005). A Comparison of Data Warehousing Methodologies. Communications of the ACM, 48 ,

(3), 79-84.

Seufert, A., & Schiefer, J. (2005). Enhanced Business Intelligence - Supporting Business Processes with Real-Time

Business Analytics. Proceedings of the 16th International Workshop on Database and Expert Systems Applications,

(pp. 919-925). Copenhagen, Denmark.

Page 112: DW - Capability Maturity Model

DWCMM: The Data Warehouse Capability Maturity Model Catalina Sacu

- 103 -

Shankaranarayanan, G., & Even, A. (2004). Managing Metadata in Data Warehouses: Pitfalls and Possibilities.

Communications of the Association for Information Systems, 14 , 247-274.

Simitsis, A. (2004). Modelling and Optimization of ETL Processes in Data Warehouse Environments. Athens:

National Technical University of Athens.

Simitsis, A., Vassiliadis, P., & Sellis, T. (2005). Optimizing ETL Processes in Data Warehouses. Proceedings of the

21st International Conference on Data Engineering (pp. 564-575). Tokyo, Japan: IEEE Computer Science.

Simitsis, A., Vassiliadis, P., & Sellis, T. (2005). State-Space Optimization of ETL Workflows. IEEE Transactions

on Knowledge and Data Engineering, 17 , (10), 1404-1419.

Simsion, G. C., & Witt, G. C. (2005). Data Modelling Essential, 3rd Edition. San Francisco: Morgan Kaufmann

Publishers.

Solomon, M. (2005). Ensuring a Successful Data Warehouse Initiative. Information Systems Management, 22 , (1),

26-36.

Sommerville, I. (2007). Software Engineering, 8th Edition. Harlow: Addison-Wesley.

Thomas, J. (2001). Business Intelligence - Why? eAI Journal , 47-49.

Tijsen, R., Spruit, M., van Raaij, B., & van de Ridder, M. (2009). BI-FIT: The Fit between Business Intelligence,

End-Users, Tasks and Technologies. Utrecht: Utrecht University.

Tremblay, M., Fuller, R., Berndt, D., Studnicki, & J. (2007). Doing More with More Information: Changing

Healthcare Planning. Decision Support Systems, 43 , 1305-1320.

Tryfona, N., Busborg, F., & Christiansen, J. (1999). Data Warehousing and OLAP. Proceedings of the 2nd ACM

international workshop on Data warehousing and OLAP, (pp. 3-8). Kansas City, Missouri, United States .

Turban, E., Aronson, J., Liang, T., & Sharda, R. (2007). Business Intelligence and Decision Support Systems. New

Jersey: Pearson Education International.

Vaishnavi, V., & Kuechler, W. (2008). Design Science Research Methods and Patters: Innovating Information and

Communication Technology. Boca Raton, Florida: Auerbach Publications Taylor & Francis Group.

van Bon, J. (2007). IT Service Management: An Introduction. Zaltbommel: Van Haren Publishing.

van Bon, J. (2000). World Class IT Service Management Guide . The Hague: ten Hagen & Stam Publishers.

Vanichayobon, S., & Gruenwald, L. (2004). Indexing Techniques for Data Warehouses’ Queries. Retrieved July 3,

2010, from Univerisyt of Oklahoma Database: http://www.cs.ou.edu/~database/documents/vg99.pdf

Varga, M., & Vukovic, M. (2008). Feasability of Investment in Business Analytics. Journal of Information and

Organizational Sciences, 31 , (2), 50-62.

Vitt, E., & Luckevich, M. &. (2002). Business Intelligence: Making Better Decisions Faster. Redmond: Microsoft

Press.

Walker, D. (2006). Overview Architecture for Enterprise Data Warehouses. Retrieved July 23, 2010, from Data

Management & Warehousing : http://www.datamgmt.com/

Page 113: DW - Capability Maturity Model

DWCMM: The Data Warehouse Capability Maturity Model Catalina Sacu

- 104 -

Watson, H., Ariyachandra, T., & Matyska, R. (2001). Data Warehousing Stages of Growth. Information Systems

Management, 18 , (3), 42-50.

Winter, R., & Stauch, B. (2003). A Method for Demand-driven Information Requirements Analysis in Data

Warehousing Projects. Proceedings of the 36th Hawaii International Conference on System Sciences. Big Island:

IEEE Computer Society.

Wixom, B., & Watson, H. (2001). An Empirical Investigation of the Factors Affecting Data Warehousing Success.

MIS Quarterly, 25 , (1).

Yin, R. (2009). Case Study Research Design and Methods. Thousand Oaks, California: SAGE Inc.

Young, C. (2004). An Introduction to IT Service Management. Research Note, COM-10-8287 .

Zeithaml, V., & Bitner, M. (1996). Service Marketing. New York: McGraw-Hill.

Zins, C. (2007). Conceptual Approaches for Defining Data, Information, and Knowledge. Journal of the American

Society for Information Science and Technology, 58 , (4), 479-493.


Appendix A: DW Detailed Maturity Matrix

DW Technical Solution

Architecture

Overall architecture:
- Initial (1): Desktop data marts (e.g.: Excel sheets)
- Repeatable (2): Multiple independent data marts
- Defined (3): Multiple independent data warehouses
- Managed (4): A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
- Optimized (5): A DW/BI service that federates a central enterprise DW and other data sources via a standard interface

Business rules:
- Initial (1): No business rules defined or implemented
- Repeatable (2): Few business rules defined or implemented
- Defined (3): Some business rules defined or implemented
- Managed (4): Most business rules defined or implemented
- Optimized (5): All business rules defined or implemented

Metadata management:
- Initial (1): No metadata management
- Repeatable (2): Non-integrated metadata by solution
- Defined (3): Central metadata repository separated by tools
- Managed (4): Central up-to-date metadata repository
- Optimized (5): Web-accessed central metadata repository with integrated, standardized, up-to-date metadata

Security:
- Initial (1): No security implemented
- Repeatable (2): Authentication security
- Defined (3): Independent authorization for each tool
- Managed (4): Role-level security at database level
- Optimized (5): Integrated companywide authorization security

Data sources:
- Initial (1): CSV files
- Repeatable (2): Operational databases
- Defined (3): ERP and CRM systems; XML files
- Managed (4): Unstructured data sources (e.g.: text or documents)
- Optimized (5): Various types of unstructured data sources (e.g.: images, videos) and Web data sources

Performance methods:
- Initial (1): No methods to increase performance
- Repeatable (2): Software performance tuning (e.g.: index management, parallelizing and partitioning system, views materialization)
- Defined (3): Hardware performance tuning (e.g.: DW server)
- Managed (4): Software and hardware tuning
- Optimized (5): Specialized DW appliances

Infrastructure:
- Initial (1): Desktop platform
- Repeatable (2): Shared OLTP systems and DW environment
- Defined (3): Separate OLTP systems and DW environment
- Managed (4): Separate servers for OLTP systems, DW, ETL and BI applications
- Optimized (5): Specialized DW appliances

Update frequency:
- Initial (1): Monthly update or less often
- Repeatable (2): Weekly update
- Defined (3): Daily update
- Managed (4): Inter-daily update
- Optimized (5): Real-time update
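To make the business rules criterion concrete: at the higher maturity levels a defined rule is not only documented but also executable. The minimal Python sketch below is illustrative only; the rule identifiers, terms and checks are hypothetical examples of my own, not part of the DWCMM itself.

    # Minimal sketch of documented business rules turned into executable code.
    # BR-07 and BR-12 are hypothetical rule names; any real DW defines its own.

    def net_revenue(gross: float, discounts: float, returns: float) -> float:
        """BR-07 (hypothetical): net revenue = gross minus discounts and returns."""
        return gross - discounts - returns

    def validate_order(row: dict) -> list:
        """BR-12 (hypothetical): basic consistency checks on an order record."""
        issues = []
        if row["discounts"] > row["gross"]:
            issues.append("discounts exceed gross revenue")
        if row["order_date"] > row["ship_date"]:  # ISO dates compare lexicographically
            issues.append("order shipped before it was placed")
        return issues

    order = {"gross": 100.0, "discounts": 10.0, "returns": 5.0,
             "order_date": "2010-03-01", "ship_date": "2010-03-04"}
    print(net_revenue(order["gross"], order["discounts"], order["returns"]))  # 85.0
    print(validate_order(order))  # []

An organization at the "all business rules defined or implemented" level would keep such definitions in one agreed place, so that every ETL flow and report derives the same numbers.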

Data Modelling

Data modelling tool:
- Initial (1): No data modelling tool
- Repeatable (2): Data modelling tools used only for design
- Defined (3): Data modelling tools used also for maintenance
- Managed (4): Standardized data modelling tool used for design
- Optimized (5): Standardized data modelling tool used for design and maintaining metadata

Data model synchronization:
- Initial (1): No synchronization between data models
- Repeatable (2): Manual synchronization of some of the data models
- Defined (3): Manual or automatic synchronization depending on the data models
- Managed (4): Automatic synchronization of most of the data models
- Optimized (5): Automatic synchronization of all the data models

Data model levels:
- Initial (1): No differentiation between data model levels
- Repeatable (2): Logical and physical levels designed for some data models
- Defined (3): Logical and physical levels designed for all the data models
- Managed (4): Conceptual level also designed for some data models
- Optimized (5): All data models have conceptual, logical and physical levels designed

Standards:
- Initial (1): No standards defined or implemented for data models
- Repeatable (2): Solution-dependent standards defined or implemented for some of the data models
- Defined (3): Enterprise-wide standards defined or implemented for some of the data models
- Managed (4): Enterprise-wide standards defined or implemented for most of the data models
- Optimized (5): Enterprise-wide standards defined or implemented for all the data models

Documentation:
- Initial (1): No documentation for any data models
- Repeatable (2): Non-standardized documentation for some of the data models
- Defined (3): Standardized documentation for some of the data models
- Managed (4): Standardized documentation for most of the data models
- Optimized (5): Standardized documentation for all the data models

Fact table granularity:
- Initial (1): Very few fact tables have their granularity at the lowest level possible
- Repeatable (2): Few fact tables have their granularity at the lowest level possible
- Defined (3): Some fact tables have their granularity at the lowest level possible
- Managed (4): Most fact tables have their granularity at the lowest level possible
- Optimized (5): All fact tables have their granularity at the lowest level possible

Conformed dimensions:
- Initial (1): No conformed dimensions
- Repeatable (2): Conformed dimensions for few business processes
- Defined (3): Conformed dimensions for some business processes
- Managed (4): Enterprise-wide standardized conformed dimensions for most business processes; also making use of a high-level design technique such as an enterprise bus matrix
- Optimized (5): Enterprise-wide standardized conformed dimensions for all business processes

Dimension design:
- Initial (1): Few dimensions designed; no hierarchies or surrogate keys designed
- Repeatable (2): Some dimensions designed with surrogate keys and basic hierarchies
- Defined (3): Most dimensions designed with surrogate keys and complex hierarchies
- Managed (4): Slowly changing dimension techniques (i.e.: type 2, 3 and more) also designed
- Optimized (5): Besides regular dimensions and slowly changing dimensions, special dimensions are also designed (e.g.: mini, monster, junk dimensions)
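For readers unfamiliar with the slowly changing dimension techniques named in the last row, the Python sketch below shows a type 2 change applied to a hypothetical customer dimension (table and column names are illustrative, not prescribed by the DWCMM): the current row is expired and a new row with a fresh surrogate key is inserted, so history is preserved.

    from datetime import date

    # Minimal type 2 slowly changing dimension sketch (hypothetical example).
    dim_customer = [
        {"customer_sk": 1, "customer_id": "C001", "city": "Utrecht",
         "valid_from": date(2009, 1, 1), "valid_to": None, "is_current": True},
    ]

    def apply_scd2(dim, customer_id, new_city, change_date, next_sk):
        """Expire the current row and insert a new one with a new surrogate key."""
        for row in dim:
            if row["customer_id"] == customer_id and row["is_current"]:
                row["valid_to"] = change_date
                row["is_current"] = False
        dim.append({"customer_sk": next_sk, "customer_id": customer_id,
                    "city": new_city, "valid_from": change_date,
                    "valid_to": None, "is_current": True})

    apply_scd2(dim_customer, "C001", "Amsterdam", date(2010, 6, 15), next_sk=2)
    # dim_customer now holds both the historical Utrecht row and the current
    # Amsterdam row; keeping both versions is what distinguishes type 2 from type 1.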

ETL

ETL tool:
- Initial (1): Only hand-coded ETL
- Repeatable (2): Hand-coded ETL and some standard scripts
- Defined (3): ETL tool(s) for all the ETL design and generation
- Managed (4): Standardized ETL tool and some standard scripts for better performance
- Optimized (5): Complete ETL generated from metadata

ETL complexity:
- Initial (1): Simple ETL that just extracts and loads data into the data warehouse
- Repeatable (2): Basic ETL with simple transformations such as format changes, sorting, filtering, joining, deriving new calculated values, aggregation, etc., and a surrogate key generator
- Defined (3): Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data capture system, de-duplication and matching system, data quality system
- Managed (4): More advanced ETL capabilities: error event table creation, audit dimension creation, late arriving data handler, hierarchy manager, special dimensions manager
- Optimized (5): Optimized ETL for a real-time DW (real-time ETL capabilities)

Data quality system:
- Initial (1): Daily automation: no; specific data quality tools: no; identifying data quality issues: no; solving data quality issues: no
- Repeatable (2): Daily automation: no; specific data quality tools: no; identifying data quality issues: yes; solving data quality issues: no
- Defined (3): Daily automation: yes/no; specific data quality tools: yes/no; identifying data quality issues: yes; solving data quality issues: no
- Managed (4): Daily automation: yes/no; specific data quality tools: yes/no; identifying data quality issues: yes; solving data quality issues: yes
- Optimized (5): Daily automation: yes; specific data quality tools: yes; identifying data quality issues: yes; solving data quality issues: yes

Management and monitoring:
- Initial (1): Restart and recovery system: no; simple monitoring: no; advanced monitoring: no; real-time monitoring: no
- Repeatable (2): Restart and recovery system: no; simple monitoring: yes; advanced monitoring: no; real-time monitoring: no
- Defined (3): Manual restart and recovery system: yes; simple monitoring: yes; advanced monitoring: yes; real-time monitoring: no
- Managed (4): Manual and automatic restart and recovery system: yes; simple monitoring: yes; advanced monitoring: yes; real-time monitoring: no
- Optimized (5): Completely automatic restart and recovery system: yes; simple monitoring: yes; advanced monitoring: yes; real-time monitoring: yes

Standards:
- Initial (1): No standards
- Repeatable (2): Few standards defined or implemented for ETL
- Defined (3): Some standards defined or implemented for ETL
- Managed (4): Most standards defined or implemented for ETL
- Optimized (5): All the standards defined or implemented for ETL

Metadata management:
- Initial (1): No metadata management
- Repeatable (2): Business and technical metadata for some ETL
- Defined (3): Business and technical metadata for all ETL
- Managed (4): Process metadata is also managed for some ETL
- Optimized (5): All types of metadata are managed for all ETL
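The highest ETL tool level above, "complete ETL generated from metadata", can be pictured with the following toy Python sketch. The mapping specification layout is a hypothetical illustration of my own; real ETL generators work from far richer metadata. The point is that the transformation logic is derived from a declarative source-to-target mapping rather than hand-coded per table.

    # Toy metadata-driven ETL: the mapping spec, not hand-written code per table,
    # determines how a source row becomes a target row. Spec layout is hypothetical.

    MAPPING = {
        "target_table": "fact_sales",
        "columns": {
            "sales_amount": {"source": "amount", "transform": float},
            "quantity":     {"source": "qty",    "transform": int},
            "currency":     {"source": "curr",   "transform": str.upper},
        },
    }

    def run_mapping(spec, source_rows):
        """Generate target rows by applying the column mappings in the spec."""
        target_rows = []
        for src in source_rows:
            row = {}
            for target_col, rule in spec["columns"].items():
                row[target_col] = rule["transform"](src[rule["source"]])
            target_rows.append(row)
        return spec["target_table"], target_rows

    table, rows = run_mapping(MAPPING, [{"amount": "19.95", "qty": "3", "curr": "eur"}])
    print(table, rows)
    # fact_sales [{'sales_amount': 19.95, 'quantity': 3, 'currency': 'EUR'}]

Because the mapping itself is metadata, it can be stored in the central metadata repository and kept synchronized with the data models, which is why this level aligns with mature metadata management.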

BI Applications

Types of BI applications:
- Initial (1): Static and parameter-driven reports and query applications
- Repeatable (2): Ad-hoc reporting; online analytical processing (OLAP)
- Defined (3): Visualization techniques: dashboards and scorecards
- Managed (4): Predictive analytics: data and text mining; alerts
- Optimized (5): Closed-loop BI applications; real-time BI applications

BI tool usage:
- Initial (1): BI tool related to the data mart
- Repeatable (2): More than two tools for mainstream BI (i.e.: reporting and visualization applications)
- Defined (3): One tool recommended for mainstream BI, but each department can use their own tool
- Managed (4): One tool for mainstream BI, but each department can use their own tool for specific BI applications (e.g.: data mining, financial analysis, etc.)
- Optimized (5): One tool for mainstream BI and one tool for specific BI applications

Standards:
- Initial (1): No standards
- Repeatable (2): Few standards defined or implemented for BI applications
- Defined (3): Some standards defined or implemented for BI applications
- Managed (4): Most standards defined or implemented for BI applications
- Optimized (5): All the standards defined or implemented for BI applications

Standardized objects:
- Initial (1): Objects defined for every BI application
- Repeatable (2): Some reusable objects for similar BI applications
- Defined (3): Some standard objects and templates for similar BI applications
- Managed (4): Most similar BI applications use standard objects and templates
- Optimized (5): All similar BI applications use standard objects and templates

Delivery method:
- Initial (1): Reports are delivered manually on paper or by email
- Repeatable (2): Reports are delivered automatically by email
- Defined (3): Direct tool-based interface
- Managed (4): A BI portal with basic functions: subscriptions, discussion forum, alerting
- Optimized (5): Highly interactive, business-process-oriented, up-to-date portal (no differentiation between operational and BI portals)

Metadata accessibility:
- Initial (1): No metadata available
- Repeatable (2): Some incomplete metadata documents that users ask for periodically
- Defined (3): Complete up-to-date metadata documents sent to users periodically or available on the intranet
- Managed (4): Metadata is always available through a metadata management tool, different from the BI tool
- Optimized (5): Complete integration of metadata with the BI applications (metadata can be accessed through one button push on the attributes, etc.)
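To make the "standard objects and templates" rows concrete, here is a minimal Python sketch of a reusable KPI definition that several BI applications could share instead of redefining the metric locally. The class, KPI name, fields and thresholds are hypothetical examples, not part of the DWCMM.

    from dataclasses import dataclass

    # Hypothetical reusable KPI object: defined once, reused by every report
    # that shows the metric, instead of each report re-implementing it.

    @dataclass(frozen=True)
    class KpiDefinition:
        name: str
        description: str
        target: float
        unit: str

        def status(self, actual: float) -> str:
            """Uniform traffic-light rule shared by all BI applications."""
            if actual >= self.target:
                return "green"
            return "amber" if actual >= 0.9 * self.target else "red"

    ON_TIME_DELIVERY = KpiDefinition(
        name="on_time_delivery",
        description="Share of orders delivered on or before the promised date",
        target=0.95, unit="ratio")

    print(ON_TIME_DELIVERY.status(0.97))  # green
    print(ON_TIME_DELIVERY.status(0.80))  # red

At the higher maturity levels such definitions would live in a shared library or semantic layer, so every dashboard and report evaluates the KPI identically.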

DW Organization & Processes

Development Processes

Development processes:
- Initial (1): Ad-hoc development processes; no clearly defined development phases (i.e.: planning, requirements definition, design, construction, deployment, maintenance)
- Repeatable (2): Repeatable development processes based on experience with similar projects; some development phases clearly separated
- Defined (3): Standard documented development processes; iterative and incremental development processes with all the development phases clearly separated
- Managed (4): Development processes continuously measured against well-defined and consistent goals
- Optimized (5): Continuous development process improvement by identifying weaknesses and strengthening the process proactively, with the goal of preventing the occurrence of defects

Environments:
- Initial (1): No separation between environments
- Repeatable (2): Two separate environments (i.e.: usually development and production) with manual transfer between them
- Defined (3): Some separation between environments (i.e.: at least three environments) with manual transfer between them
- Managed (4): Some separation between environments (i.e.: at least two environments) with automatic transfer between them
- Optimized (5): All the environments are distinct with automatic transfer between them

Standards:
- Initial (1): No standards defined or implemented
- Repeatable (2): Few standards defined or implemented
- Defined (3): Some standards defined or implemented
- Managed (4): A lot of the standards defined or implemented
- Optimized (5): A comprehensive set of standards defined or implemented

Quality management:
- Initial (1): No quality assurance activities
- Repeatable (2): Ad-hoc quality assurance activities
- Defined (3): Standardized and documented quality assurance activities done for all the development phases
- Managed (4): Level 3) + measurable and prioritized goals for managing the DW quality (e.g.: functionality, reliability, maintainability, usability)
- Optimized (5): Level 4) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification

Sponsorship:
- Initial (1): No project sponsor
- Repeatable (2): Chief information officer (CIO) or an IT director
- Defined (3): Single sponsor from a business unit or department
- Managed (4): Multiple individual sponsors from multiple business units or departments
- Optimized (5): Multiple levels of business-driven, cross-departmental sponsorship including top-level management sponsorship (BI/DW is integrated in the company process with continuous budget)

Project management:
- Initial (1): No project management activities
- Repeatable (2): Project planning and scheduling
- Defined (3): Some of the main project management activities (project planning and scheduling; project risk management; project tracking and control)
- Managed (4): Some project management activities; standard and efficient procedure and documentation
- Optimized (5): Project planning and scheduling; project risk management; project tracking and control; standard and efficient procedure and documentation; evaluation and assessment

Roles:
- Initial (1): No formal roles defined
- Repeatable (2): Defined roles, but not technically implemented
- Defined (3): Formalized and implemented roles and responsibilities
- Managed (4): Level 3) + periodic peer reviews (i.e.: review of each other's work)
- Optimized (5): Level 4) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles and match the needed roles with responsibilities and tasks)

Knowledge management:
- Initial (1): Ad-hoc knowledge gathering and sharing
- Repeatable (2): Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.), and also through training and mentoring programs
- Defined (3): Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs, and also through the use of technology
- Managed (4): Central business unit knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
- Optimized (5): Continuously improving inter-organizational knowledge sharing

Requirements definition:
- Initial (1): Ad-hoc requirements definition; no methodology used
- Repeatable (2): Methodologies differ from project to project; interviews with business users for collecting the requirements
- Defined (3): Standard methodology for all the projects; interviews and group sessions with both business and IT users for collecting the requirements
- Managed (4): Level 3) + qualitative assessment and measurement of the phase; requirements document also published
- Optimized (5): Level 4) + causal analysis meetings to identify common bottleneck causes and subsequent elimination of these causes

Testing and acceptance:
- Initial (1): Only unit testing is done; no standards or documentation
- Repeatable (2): Other types of testing are beginning to be done (some of the following: unit testing by another person; system integration testing; regression testing; acceptance testing)
- Defined (3): Diverse types of testing; some standards
- Managed (4): Diverse types of testing; standard procedure and documentation
- Optimized (5): All the main types of testing (unit testing by another person; system integration testing; regression testing; acceptance testing); user training; standard procedure and documentation; external assessments and reviews
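The "automatic transfer between environments" criterion in the environments row can be pictured with the short Python sketch below. The directory layout, file names and stage names are hypothetical; a real pipeline would also run tests and keep an audit trail. The point is that a release package moves along the development-test-acceptance-production chain in order, by a repeatable script rather than manual copying.

    import shutil
    from pathlib import Path

    # Toy promotion script for separated DTAP environments (paths hypothetical).
    STAGES = ["development", "test", "acceptance", "production"]

    def promote(package: str, from_stage: str, root: Path) -> Path:
        """Copy a release package to the next environment in the DTAP chain."""
        idx = STAGES.index(from_stage)
        if idx == len(STAGES) - 1:
            raise ValueError("already in production")
        src = root / from_stage / package
        dst = root / STAGES[idx + 1] / package
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # automatic, repeatable transfer, never a skipped stage
        return dst

    # Example: promote("dw_release_1.4.zip", "test", Path("/opt/releases"))
    # would place the package in /opt/releases/acceptance/.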

Service Processes

Service quality management:
- Initial (1): No service quality management activities
- Repeatable (2): Ad-hoc service quality management
- Defined (3): Proactive service quality management including a standard procedure
- Managed (4): Level 3) + service quality measurements periodically compared to the established goals to determine the deviations and their causes
- Optimized (5): Level 4) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification

Knowledge management:
- Initial (1): Ad-hoc knowledge gathering and sharing
- Repeatable (2): Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.)
- Defined (3): Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs
- Managed (4): Central business unit knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
- Optimized (5): Continuously improving inter-organizational knowledge management

Service level management:
- Initial (1): Customer and supplier service needs documented in an ad-hoc manner; no service catalogue compiled
- Repeatable (2): Some customer and supplier service needs documented and formalized based on previous experience
- Defined (3): All the customer and supplier service needs documented and formalized according to a standard procedure into service level agreements (SLAs)
- Managed (4): SLAs reviewed with the customer and supplier on both a periodic and event-driven basis
- Optimized (5): Actual service delivery continuously monitored and evaluated with the customer on both a periodic and event-driven basis for continuous improvement (SLAs including penalties)

Incident management:
- Initial (1): Incident management is done ad-hoc, with no specialized ticket handling system or service desk to assess and classify incidents prior to referring them to a specialist
- Repeatable (2): A ticket handling system is used for incident management and some procedures are followed, but nothing is standardized or documented
- Defined (3): A service desk is the recognized point of contact for all the customer queries; incident assessment and classification is done following a standard procedure
- Managed (4): Level 3) + standard reports concerning the incident status, including measurements and goals (e.g.: response time), are regularly produced for all the involved teams and customers; an incident management database is established as a repository for the event records
- Optimized (5): Level 4) + trend analysis in incident occurrence and also in customer satisfaction and value perception of the services provided to them

Change management:
- Initial (1): Change requests are made and solved in an ad-hoc manner
- Repeatable (2): A ticket handling system is used for storing and solving the requests for change and some procedures are followed, but nothing is standardized or documented
- Defined (3): A standard procedure is used for approving, verifying, prioritizing and scheduling changes
- Managed (4): Level 3) + standard reports concerning the change status, including measurements and goals (e.g.: response time), are regularly produced for all the involved teams and customers; standards established for documenting changes
- Optimized (5): Level 4) + trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and value perception of the services provided to them

Resource management:
- Initial (1): Ad-hoc resource management activities (only when there is a problem)
- Repeatable (2): Resource management is done following some procedures, but nothing is standardized or documented
- Defined (3): Resource management is done constantly, following a standardized documented procedure
- Managed (4): Level 3) + standard reports concerning performance and resource management, including measurements and goals, are produced on a regular basis
- Optimized (5): Level 4) + resource management trend analysis and monitoring to determine the most common bottlenecks and make sure that there is sufficient capacity to support planned services

Availability management:
- Initial (1): Ad-hoc availability management
- Repeatable (2): Availability management is done following some procedures, but nothing is standardized or documented
- Defined (3): Availability management documented and done using a standardized procedure (all elements are monitored)
- Managed (4): Level 3) + risk assessment to determine the critical elements and possible problems
- Optimized (5): Level 4) + availability management trend analysis and planning to determine the most common bottlenecks and make sure that all the elements are available for the agreed service level targets

Release management:
- Initial (1): Ad-hoc change solving and implementation; no release naming and numbering conventions
- Repeatable (2): Release management is done following some procedures, but nothing is standardized or documented; release naming and numbering conventions in place
- Defined (3): Release management is documented and done following a standardized procedure; assigned release management roles and responsibilities
- Managed (4): Level 3) + standard reports concerning release management, including measurements and goals, are produced on a regular basis; master copies of all software in a release secured in a release database
- Optimized (5): Level 4) + release management trend analysis, statistics and planning
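As an illustration of the "standard reports ... including measurements and goals (e.g.: response time)" criterion in the incident management row, the Python fragment below computes the measurement such a report would contain. The incident records and the one-hour goal are hypothetical sample data, not values prescribed by the model.

    from datetime import datetime

    # Hypothetical incident records, as a ticket handling system might export them.
    incidents = [
        {"opened": datetime(2010, 7, 1, 9, 0),  "responded": datetime(2010, 7, 1, 9, 40)},
        {"opened": datetime(2010, 7, 2, 14, 0), "responded": datetime(2010, 7, 2, 16, 0)},
    ]

    GOAL_MINUTES = 60  # assumed service level goal for first response

    def response_time_report(records, goal_minutes):
        """Average response time and the share of incidents meeting the goal."""
        times = [(r["responded"] - r["opened"]).total_seconds() / 60 for r in records]
        within = sum(1 for t in times if t <= goal_minutes)
        return {"average_minutes": sum(times) / len(times),
                "within_goal": within / len(records)}

    print(response_time_report(incidents, GOAL_MINUTES))
    # {'average_minutes': 80.0, 'within_goal': 0.5}

Comparing such measurements to the established goals, and analyzing their trend over time, is exactly what separates the Managed and Optimized levels from the Defined level in this row.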


Appendix B: The DW Maturity Assessment Questionnaire (Final Version)

Data Warehouse (DW) Maturity Assessment Questionnaire

Filling in the questionnaire will take approximately 50 minutes; at the end, a maturity score for each benchmark category/sub-category and an overall maturity score will be provided. The questions in the first part of the questionnaire (i.e.: DW General Questions) are not scored; their answers serve as input for shaping a better image of the DW solution maturity. The questions in the second and third parts of the questionnaire (i.e.: DW technical solution; and DW organization and processes) are scored from 1 to 5 and are multiple-choice questions with only one possible answer (except questions 3.1 – 11 and 3.2 – 1, where more answers may be circled).
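As a concrete illustration of the scoring just described, the Python sketch below turns the 1-5 answers into maturity scores. It is a minimal reading of my own, assuming an unweighted average per sub-category and overall; the actual scoring sheet behind the questionnaire may weight questions differently.

    # Minimal scoring sketch for the scored parts of the questionnaire.
    # Equal weighting of all questions is an assumption, not a DWCMM rule.

    answers = {
        "2.1 Architecture":   [3, 2, 2, 3, 4, 3, 2, 3, 3],  # sample answers, 1-5
        "2.2 Data Modelling": [2, 2, 3, 2, 2, 3, 4, 3, 3],
    }

    def maturity_scores(per_category):
        """Average score per benchmark sub-category plus an overall score."""
        scores = {cat: sum(vals) / len(vals) for cat, vals in per_category.items()}
        all_vals = [v for vals in per_category.values() for v in vals]
        scores["overall"] = sum(all_vals) / len(all_vals)
        return {cat: round(score, 1) for cat, score in scores.items()}

    print(maturity_scores(answers))
    # {'2.1 Architecture': 2.8, '2.2 Data Modelling': 2.7, 'overall': 2.7}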

1 DW General Questions

1) Could you elaborate on the main drivers for implementing a BI/DW solution in your organization?

2) How long has your organization been using BI/DW?

3) Could you elaborate on the success of the BI/DW solution in your organization, in terms of:
a) Returns vs. Costs
b) Time (Intended vs. Actual)
c) Quality
d) End-user adoption.

4) Which answer best describes how executives perceive the purpose of your organization's BI/DW environment?
a) Operational cost center – An IT system needed to run the business
b) Tactical resource – Tools to assist decision making
c) Mission-critical resource – A system that is critical to running business operations
d) Strategic resource – Key to achieving performance objectives and goals
e) Competitive differentiator – Key to gaining or keeping customers and/or market share.

5) What percentage of the annual IT budget for your organization does the BI/DW budget represent?

6) What percentage of the IT department is taking care of BI (i.e.: how many people from the total number of IT employees)?

7) Who is the budget owner of the BI/DW solution in your organization (i.e.: who is responsible for paying the invoice)?

8) Which technologies do you use for developing the BI/DW solution in your organization?

Developing Category (fill in the technology used for each):
- Data Modelling:
- Extract/Transform/Load (ETL):
- BI Applications:
- Database:

9) What data modelling technique do you use for your BI/DW solution (e.g.: dimensional modelling, normalized modelling, data vault, etc.)?

2 DW Technical Solution

2.1 General Architecture and Infrastructure

1) What is the predominant architecture of your DW?
a) Multiple independent data marts
b) A virtual integrated DW or real-time DW
c) Multiple independent data warehouses
d) A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
e) Desktop data marts (e.g.: Excel sheets)

2) To what degree have you defined and documented definitions and business rules for the necessary transformations, key terms and metrics?
a) No business rules defined
b) Most of the business rules defined and documented
c) Few business rules defined and documented
d) All business rules defined and documented
e) Some business rules defined and documented

3) To what degree have you implemented definitions and business rules for the necessary transformations, key terms and metrics?
a) No business rules implemented
b) Most of the business rules implemented
c) Few business rules implemented
d) All business rules implemented
e) Some business rules implemented

4) To what degree is your metadata management implemented?
a) Web-accessed central metadata repository with integrated, standardized, up-to-date metadata
b) Non-integrated metadata by solution
c) Central up-to-date metadata repository
d) No metadata management
e) Central metadata repository separated by tools

5) To what degree is security implemented in your DW architecture?
a) No security implemented
b) Integrated companywide security
c) Independent authorization for each tool
d) Authentication security
e) Role-level security at database level

6) What types of data sources does your DW support at the highest level?
a) CSV files
b) Operational databases
c) ERP and CRM systems; XML files
d) Unstructured data sources (e.g.: text or documents)
e) Various types of unstructured data sources (e.g.: images, videos) and Web data sources

7) To what degree do you use methods to increase the performance of your DW?
a) Specialized DW appliances (e.g.: Netezza, Teradata) or cloud computing
b) No methods to increase performance
c) Software performance tuning (e.g.: index management, parallelizing and partitioning system, views materialization)
d) Hardware performance tuning (e.g.: DW server)
e) Software and hardware tuning

8) To what degree is your infrastructure specialized for a DW?
a) Desktop platform
b) Specialized DW appliances (e.g.: Netezza, Teradata)
c) Separate OLTP systems and DW environment
d) Separate servers for OLTP systems, DW, ETL and BI applications
e) Shared OLTP systems and DW environment

9) Which answer best describes the update frequency for your DW?
a) Daily update
b) Monthly update or less often
c) Real-time update
d) Inter-daily update
e) Weekly update

2.2 Data Modelling

1) Which answer best describes the usage of a data modelling tool in your organization?
a) No data modelling tool
b) Scattered data modelling tools used only for design
c) Standardized data modelling tool used for design and maintaining metadata
d) Standardized data modelling tool used only for design
e) Scattered data modelling tools used also for maintenance

2) Which answer best describes the degree of synchronization between the following data models that your organization maintains and the mapping between them: ETL source and target models; DW and data marts models; BI semantic or query object models?
a) Automatic synchronization of all of the data models
b) Manual synchronization of some of the data models
c) No synchronization between data models
d) Manual or automatic synchronization depending on the data models
e) Automatic synchronization of most of the data models

3) To what degree do you differentiate between data model levels: physical, logical and conceptual?
a) No differentiation between data model levels
b) All data models have conceptual, logical and physical levels designed
c) Logical and physical levels designed for some data models
d) Conceptual level also designed for some data models
e) Logical and physical levels designed for all the data models

4) To what degree have you defined and documented standards (e.g.: naming conventions, metadata, etc.) for your data models?
a) No standards defined for data models
b) Enterprise-wide standards defined for some of the data models
c) Enterprise-wide standards defined for most of the data models
d) Solution-dependent standards defined for some of the data models
e) Enterprise-wide standards defined for all the data models

5) To what degree have you implemented standards (e.g.: naming conventions, metadata, etc.) for your data models?
a) No standards implemented for data models
b) Enterprise-wide standards implemented for some of the data models
c) Enterprise-wide standards implemented for most of the data models
d) Solution-dependent standards implemented for some of the data models
e) Enterprise-wide standards implemented for all the data models

6) To what degree have you documented the metadata (e.g.: definitions, business rules, main values, data quality, etc.) in your data models?
a) No documentation for any data models
b) Standardized documentation for some of the data models
c) Standardized documentation for all the data models
d) Non-standardized documentation for some of the data models
e) Standardized documentation for most of the data models

If you use dimensional modelling, please answer the following three questions:

7) What percentage of all your fact tables has their granularity at the lowest level possible?
a) Very few fact tables have their granularity at the lowest level possible
b) Few fact tables have their granularity at the lowest level possible
c) Some fact tables have their granularity at the lowest level possible
d) Most fact tables have their granularity at the lowest level possible
e) All fact tables have their granularity at the lowest level possible

8) To what degree do you design conformed dimensions in your data models?
a) No conformed dimensions
b) Conformed dimensions for few business processes
c) Enterprise-wide standardized conformed dimensions for most business processes; also making use of a high-level design technique such as an enterprise bus matrix
d) Conformed dimensions for some business processes
e) Enterprise-wide standardized conformed dimensions for all business processes

9) Which answer best describes the current state of your dimension tables modelling?
a) Few dimensions designed; no hierarchies or surrogate keys designed
b) Some dimensions designed with surrogate keys and basic hierarchies (if needed)
c) Most dimensions designed with surrogate keys and basic/complex hierarchies (if needed)
d) Slowly changing dimensions techniques (i.e.: type 2, 3 and more) also designed
e) Besides regular dimensions, special dimensions are also designed (e.g.: mini, monster, junk dimensions)

2.3 ETL

1) Which answer best describes the usage of an ETL tool in your organization?
a) Only hand-coded ETL
b) Complete ETL generated from metadata
c) Hand-coded ETL and some standard scripts
d) Standardized ETL tool and some standard scripts
e) ETL tool(s) for all the ETL design and generation

2) Which answer best describes the complexity of your ETL?
a) Simple ETL that just extracts and loads data into the data warehouse
b) Basic ETL with simple transformations such as format changes, sorting, filtering, joining, deriving new calculated values, aggregation, etc., and a surrogate key generator
c) Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data capture system, de-duplication and matching system, data quality system
d) More advanced ETL capabilities: error event table creation, audit dimension creation, late arriving data handler, hierarchy manager, special dimensions manager
e) Optimized ETL for a real-time DW (real-time ETL capabilities)

3) Which answer best describes the data quality system implemented for your ETL?
a) Daily automation: yes / no; Specific data quality tools: yes / no; Identifying data quality issues: yes; Solving data quality issues: no
b) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: yes; Solving data quality issues: no
c) Daily automation: yes / no; Specific data quality tools: yes / no; Identifying data quality issues: yes; Solving data quality issues: yes
d) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: no; Solving data quality issues: no
e) Daily automation: yes; Specific data quality tools: yes; Identifying data quality issues: yes; Solving data quality issues: yes

4) Which answer best describes the management and monitoring of your ETL?
(Definitions:
Simple monitoring (i.e.: ETL workflow monitor – statistics regarding ETL execution such as pending, running, completed and suspended jobs; MB processed per second; summaries of errors, etc.);
Advanced monitoring (i.e.: ETL workflow monitor – statistics on infrastructure performance like CPU usage, memory allocation, database performance, server utilization during ETL; job scheduler – time or event based ETL execution, events notification; data lineage and analyzer system))
a) Restart and recovery system: no; Simple monitoring: no; Advanced monitoring: no; Real-time monitoring: no
b) Restart and recovery system: no; Simple monitoring: yes; Advanced monitoring: no; Real-time monitoring: no
c) Manual restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes / no; Real-time monitoring: no
d) Manual and automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes / no; Real-time monitoring: no
e) Completely automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: yes
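To ground the monitoring definitions in question 4, this small Python sketch shows the kind of execution statistics a simple ETL workflow monitor reports: job status counts and a summary of errors. The job names, statuses and fields are hypothetical sample data, not part of the questionnaire.

    from collections import Counter

    # Hypothetical ETL job log, as a simple workflow monitor might collect it.
    job_log = [
        {"job": "load_dim_customer", "status": "completed", "error": None},
        {"job": "load_fact_sales",   "status": "suspended", "error": "lock timeout"},
        {"job": "load_dim_product",  "status": "running",   "error": None},
    ]

    def simple_monitor(log):
        """Status counts and an error summary: 'simple monitoring' in the sense above."""
        status_counts = Counter(entry["status"] for entry in log)
        errors = [(e["job"], e["error"]) for e in log if e["error"]]
        return {"status_counts": dict(status_counts), "errors": errors}

    print(simple_monitor(job_log))
    # {'status_counts': {'completed': 1, 'suspended': 1, 'running': 1},
    #  'errors': [('load_fact_sales', 'lock timeout')]}

Advanced monitoring, in the sense of the definition above, would extend such a log with infrastructure statistics (CPU usage, memory, server utilization) and scheduler and data lineage information.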

5) To what degree have you defined and documented standards (e.g.: naming conventions, set-up standards, recovery process, etc.) for your ETL?
a) No standards defined
b) Few standards defined for ETL
c) Some standards defined for ETL
d) Most standards defined for ETL
e) All the standards defined for ETL

6) To what degree have you implemented standards (e.g.: naming conventions, set-up standards, recovery process, etc.) for your ETL?
a) No standards implemented
b) Few standards implemented for ETL
c) Some standards implemented for ETL
d) Most standards implemented for ETL
e) All the standards implemented for ETL

7) To what degree is your metadata management implemented for your ETL?
a) No metadata management
b) Business and technical metadata for some ETL
c) All types of metadata (i.e.: business, technical, process) are managed for all ETL
d) Process metadata is also managed for some ETL
e) Business and technical metadata for all ETL

2.4 BI Applications

1) Which types of BI applications best describe the highest level purpose of your DW environment?
a) Static and parameter-driven reports and query applications
b) Ad-hoc reporting; online analytical processing (OLAP)
c) Visualization techniques: dashboards and scorecards
d) Predictive analytics: data and text mining; alerts
e) Closed-loop BI applications; real-time BI applications

2) Which answer best describes your current BI tool usage?
(Definitions:
mainstream BI applications (i.e.: reporting and visualization applications);
specific BI applications (i.e.: data mining, financial analysis, etc.))
a) One standardized tool for mainstream BI and one standardized tool for specific BI applications
b) BI tool related to the data mart
c) One tool recommended for mainstream BI, but each department can use their own tool
d) More than two tools for mainstream BI
e) One standardized tool for mainstream BI, but each department can use their own tool for specific BI applications

3) To what degree have you defined and documented standards (e.g.: naming conventions, generic transformations, logical structure of attributes and measures) for your BI applications?
a) No standards defined
b) Few standards defined for BI applications
c) Some standards defined for BI applications
d) Most standards defined for BI applications
e) All the standards defined for BI applications

4) To what degree have you implemented standards (e.g.: naming conventions, generic transformations, logical structure of attributes and measures) for your BI applications?
a) No standards implemented
b) Few standards implemented for BI applications
c) Some standards implemented for BI applications
d) Most standards implemented for BI applications
e) All the standards implemented for BI applications

5) To what degree are standardized objects (e.g.: KPIs, metrics, attributes, templates) implemented in your BI applications?
a) Objects defined for every BI application
b) All similar BI applications use standard objects and templates
c) Some reusable objects for similar BI applications
d) Most similar BI applications use standard objects and templates
e) Some standard objects and templates for similar BI applications

6) Which BI applications delivery method best describes the highest level purpose of your DW?
a) Reports are delivered manually on paper or by email
b) Reports are delivered automatically by email
c) Direct tool-based interface
d) A BI portal with basic functions: subscriptions, discussion forum, alerting
e) Highly interactive, business-process-oriented, up-to-date portal (no differentiation between operational and BI portals)

7) Which answer best describes the metadata accessibility to users?
a) No metadata available
b) Some incomplete metadata documents that users ask for periodically
c) Complete integration of metadata with the BI applications (metadata can be accessed through one button push on the attributes, etc.)
d) Complete up-to-date metadata documents sent to users periodically or available on the intranet
e) Metadata is always available through a metadata management tool, different from the BI tool

3 DW Organization and Processes

3.1 Development Processes

1) Which answer best describes the DW development processes in your organization?
a) Ad-hoc development processes; no clearly defined development phases (i.e.: planning, requirements definition, design, construction, deployment, maintenance)
b) Repeatable development processes based on experience with similar projects; some development phases clearly separated
c) Standard documented development processes; iterative and incremental development processes with all the development phases clearly separated
d) Development processes continuously measured against well-defined and consistent goals
e) Continuous development process improvement by identifying weaknesses and strengthening the process proactively, with the goal of preventing the occurrence of defects

2) To what degree is there a separation between the development/test/acceptance/deployment environments in your organization?
a) No separation between environments
b) Two separate environments (i.e.: usually development and production) with manual transfer between them
c) All the environments are distinct with automatic transfer between them
d) Some separation between environments (i.e.: at least two environments) with automatic transfer between them
e) Some separation between environments (i.e.: at least three environments) with manual transfer between them

3) To what degree has your organization defined and documented standards for developing, testing and deploying DW functionalities (i.e.: ETL and BI applications)?
a) No standards defined
b) Few standards defined
c) Some standards defined
d) A lot of the standards defined
e) A comprehensive set of standards defined

4) To what degree has your organization implemented standards for developing, testing and deploying DW functionalities (i.e.: ETL and BI applications)?
a) No standards implemented
b) Few standards implemented
c) Some standards implemented
d) A lot of the standards implemented
e) A comprehensive set of standards implemented

5) Which answer best describes the DW quality management?
a) No quality assurance activities
b) Ad-hoc quality assurance activities
c) Standardized and documented quality assurance activities done for all the development phases
d) c) + measurable and prioritized goals for managing the DW quality (e.g.: functionality, reliability, maintainability, usability)
e) d) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification

6) Which answer best describes the sponsor for your DW project?
a) Multiple levels of business-driven, cross-departmental sponsorship including top-level management sponsorship (BI/DW is integrated in the company process with continuous budget)
b) No project sponsor
c) Single sponsor from a business unit or department
d) Chief information officer (CIO) or an IT director
e) Multiple individual sponsors from multiple business units or departments

7) Which answer best describes your DW project management?
(Definitions:
project planning and scheduling (i.e.: work breakdown structure; time, costs and resources estimates; planning and scheduling);
project tracking and control (i.e.: milestone tracking, change control))
a) Project planning and scheduling: no; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
b) Project planning and scheduling: yes; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
c) Project planning and scheduling: yes; project risk management: no; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
d) Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
e) Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: yes

8) Which answer best describes the role division for the DW development process?
a) No formal roles defined
b) Defined roles, but not technically implemented
c) Formalized and implemented roles and responsibilities
d) c) + periodic peer reviews (i.e.: review of each other's work)
e) d) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles and match the needed roles with responsibilities and tasks)

9) Which answer best describes the knowledge management in your organization for the DW development processes?
a) Ad-hoc knowledge gathering and sharing
b) Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.)
c) Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs, and also through the use of technology
d) Central business unit knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
e) Continuously improving inter-organizational knowledge management

10) Which answer best describes the requirements definition phase for your DW project?
a) Ad-hoc requirements definition; no methodology used
b) Methodologies differ from project to project; interviews with business or IT users for collecting the requirements
c) Standard methodology for all the projects; interviews and group sessions with both business and IT users for collecting the requirements
d) c) + qualitative assessment and measurement of the phase; requirements document also published
e) d) + causal analysis meetings to identify common bottleneck causes and subsequent elimination of these causes

11) Which of the following activities are included in the testing and acceptance phase for your DW project?
a) Unit testing by another person
b) System integration testing
c) Regression testing
d) User training
e) Acceptance testing
f) Standard procedure and documentation for testing and acceptance
g) External assessments and reviews of testing and acceptance

3.2 Service Processes (Maintenance and Monitoring Processes)

1) Which of the following activities are included in the maintenance and monitoring phase for your DW project?
a) Collection of statistics regarding the utilization of the hardware and software resources (e.g.: memory management, physical disk storage space utilization, processor usage, BI applications usage, number of completed queries by time slots during the day, time each user stays online with the data warehouse, total number of distinct users per day, etc.)
b) BI applications maintenance and monitoring
c) User support
d) ETL monitoring and management
e) Data reconciliation and data growth management
f) Security administration
g) Resource monitoring and management
h) Infrastructure management
i) Backup and recovery management
j) Performance monitoring and tuning

2) Which answer best describes the DW service quality management in your organization?
a) No service quality management activities
b) Ad-hoc service quality management
c) Proactive service quality management including a standard procedure
d) c) + service quality measurements periodically compared to the established goals to determine the deviations and their causes
e) d) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification

3) Which answer best describes the knowledge management in your organization for the DW service processes?
a) Ad-hoc knowledge gathering and sharing
b) Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.)
c) Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs, and also through the use of technology
d) Central business unit knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
e) Continuously improving inter-organizational knowledge management

4) Which answer best describes the DW service level management in your organization?
a) Customer and supplier service needs documented in an ad-hoc manner; no service catalogue compiled
b) Some customer and supplier service needs documented and formalized based on previous experience
c) All the customer and supplier service needs documented and formalized according to a standard procedure into service level agreements (SLAs)
d) SLAs reviewed with the customer and supplier on both a periodic and event-driven basis
e) Actual service delivery continuously monitored and evaluated with the customer on both a periodic and event-driven basis for continuous improvement (SLAs including penalties)

5) Which answer best describes the DW incident management in your organization?
a) Incident management is done ad-hoc, with no specialized ticket handling system or service desk to assess and classify incidents prior to referring them to a specialist
b) A ticket handling system is used for incident management and some procedures are followed, but nothing is standardized or documented
c) A service desk is the recognized point of contact for all the customer queries; incident assessment and classification is done following a standard procedure
d) c) + standard reports concerning the incident status, including measurements and goals (e.g.: response time), are regularly produced for all the involved teams and customers; an incident management database is established as a repository for the event records
e) d) + trend analysis in incident occurrence and also in customer satisfaction and value perception of the services provided to them

6) Which answer best describes the DW change management in your organization?
a) Change requests are made and solved in an ad-hoc manner
b) A ticket handling system is used for storing and solving the requests for change and some procedures are followed, but nothing is standardized or documented
c) A standard procedure is used for approving, verifying, prioritizing and scheduling changes
d) c) + standard reports concerning the change status, including measurements and goals (e.g.: response time), are regularly produced for all the involved teams and customers; standards established for documenting changes
e) d) + trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and value perception of the services provided to them

7) Which answer best describes the DW technical resource management in your organization?
a) Ad-hoc resource management activities (only when there is a problem)
b) Resource management is done following some procedures, but nothing is standardized or documented
c) Resource management is done constantly, following a standardized documented procedure
d) c) + standard reports concerning performance and resource management, including measurements and goals, are produced on a regular basis
e) d) + resource management trend analysis and monitoring to determine the most common bottlenecks and make sure that there is sufficient capacity to support planned services

8) Which answer best describes the availability management in your organization?
a) Ad-hoc availability management
b) Availability management is done following some procedures, but nothing is standardized or documented
c) Availability management documented and done using a standardized procedure (all elements are monitored)
d) c) + risk assessment to determine the critical elements and possible problems
e) d) + availability management trend analysis and planning to determine the most common bottlenecks and make sure that all the elements are available for the agreed service level targets

9) Which answer best describes the release management in your organization?
a) Ad-hoc change solving and implementation; no release naming and numbering conventions
b) Release management is done following some procedures, but nothing is standardized or documented; release naming and numbering conventions in place
c) Release management is documented and done following a standardized procedure; assigned release management roles and responsibilities
d) c) + standard reports concerning release management, including measurements and goals, are produced on a regular basis; master copies of all software in a release secured in a release database
e) d) + release management trend analysis, statistics and planning


Appendix C: DW Maturity Assessment Questionnaire (Redefined Version)

Data Warehouse (DW) Maturity Assessment Questionnaire

Filling in the questionnaire will take 50 minutes and, at the end, a maturity score for each benchmark category/sub-category and an overall maturity score will be provided. The questions in the first part of the questionnaire (i.e.: DW General Questions) are not scored; their answers serve as input for shaping a better image of the DW solution maturity. The questions in the second and third parts of the questionnaire (i.e.: DW technical solution; and DW organization and processes) are scored from 1 to 5 and are multiple choice questions with only one possible answer (except questions 3.1 – 11 and 3.2 – 1, where multiple answers may be circled).
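For illustration purposes only, a filled-in questionnaire could be represented programmatically as sketched below (a hypothetical Python sketch; the part and section names follow the questionnaire, while all answers and weightings shown are invented):

```python
# A hypothetical representation of one filled-in questionnaire.
# Part 1 answers are free text and unscored; parts 2 and 3 store the
# 1-5 weighting of the circled answer for each question.
response = {
    "1 DW General Questions": {
        "1": "Better and faster management reporting",  # invented answer
        "2": "4 years",
    },
    "2 DW Technical Solution": {
        "2.1 Architecture": {"1": 4, "2": 3, "3": 2, "4": 3},
        "2.2 Data Modelling": {"1": 3, "2": 2, "3": 3},
    },
    "3 DW Organization and Processes": {
        "3.1 Development Processes": {"1": 3, "2": 2},
        "3.2 Service Processes": {"1": 2, "2": 3},
    },
}
```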

1 DW General Questions

1) Could you elaborate on the main drivers for implementing a BI/DW solution in your organization?

2) How long has your organization been using BI/DW?

3) Could you elaborate on the success of the BI/DW solution in your organization, in terms of:

a) Returns vs. Costs

b) Time (Intended vs. Actual)

c) Quality

d) End-user adoption.

4) Which answer best describes how executives perceive the purpose of your organization's BI/DW environment?

a) Operational cost center – An IT system needed to run the business

b) Tactical resource - Tools to assist decision making

c) Mission-critical resource - A system that is critical to running business operations

d) Strategic resource – Key to achieving performance objectives and goals

e) Competitive differentiator – Key to gaining or keeping customers and/or market share.

5) What percentage of the annual IT budget for your organization does the BI/DW budget represent?

6) Who is the budget owner of the BI/DW solution in your organization (i.e.: who is responsible for paying the

invoice)?

7) Which technologies do you use for developing the BI/DW solution in your organization?

Developing Category Technology

Data Modelling

Extract/Transform/Load (ETL)

BI Applications

Database

8) What data modelling technique do you use for your BI/DW solution?


2 DW Technical Solution

2.1 Architecture/ General Architecture and Infrastructure

1) What is the predominant architecture of your DW?

a) Level 1 – Desktop data marts (e.g.: Excel sheets)

b) Level 2 – Multiple independent data marts

c) Level 3 – Multiple independent data warehouses

d) Level 4 – A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)

e) Level 5 – A virtual integrated DW or real-time DW

2) To what degree have you defined, documented and implemented definitions and business rules for the necessary

transformations, key terms and metrics?

a) Very low – No business rules defined

b) Low – Few business rules defined and implemented

c) Moderate – Some business rules defined and implemented

d) High – Most of the business rules defined and implemented

e) Very high – All business rules defined and implemented

3) To what degree is your metadata management implemented?

a) Very low – No metadata management

b) Low – Non-integrated metadata by solution

c) Moderate – Central metadata repository separated by tools

d) High – Central up-to-date metadata repository

e) Very high – Web-accessed central metadata repository with integrated, standardized, up-to-date metadata

4) To what degree is security implemented in your DW architecture?

a) Very low – No security implemented

b) Low – Authentication security

c) Moderate – Independent authorization for each tool / Target audience authorization

d) High – Role-level security at database level

e) Very high – Integrated companywide authorization security

5) What types of data sources does your DW support at the highest level?

a) Level 1 – CSVs files

b) Level 2 – Operational databases

c) Level 3 – ERP and CRM systems; XML files

d) Level 4 – Unstructured data sources (e.g.: text or documents)

e) Level 5 – Various types of unstructured data sources (e.g.: images, videos) and Web data sources

6) To what degree do you use methods to increase the performance of your DW?

a) Very low – No methods to increase performance

b) Low – Software performance tuning (e.g.: index management, parallelizing and partitioning system, views

materialization)

c) Moderate – Hardware performance tuning (e.g.: DW server)

d) High – Software and hardware tuning

e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata) / cloud computing

7) To what degree is your infrastructure specialized for a DW?


a) Very low – Desktop platform

b) Low – Shared OLTP systems and DW environment

c) Moderate – Separate OLTP systems and DW environment

d) High – Separate servers for OLTP systems, DW, ETL and BI applications

e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata)

8) Which answer best describes the update frequency for your DW?

a) Level 1 – Monthly update or less often

b) Level 2 – Weekly update

c) Level 3 – Daily update

d) Level 4 – Inter-daily update

e) Level 5 – Real-time update

2.2 Data Modelling

1) Which answer best describes the usage of a data modelling tool in your organization?

a) Level 1 – No data modelling tool

b) Level 2 – Scattered data modelling tools used only for design

c) Level 3 – Scattered data modelling tools used also for maintenance

d) Level 4 – Standardized data modelling tool used only for design

e) Level 5 – Standardized data modelling tool used for design and maintaining metadata

2) Which answer best describes the degree of synchronization between the following data models that your

organization maintains and the mapping between them: ETL source and target models; DW and data marts

models; BI semantic or query object models?

a) Level 1 – No synchronization between data models

b) Level 2 – Manual synchronization of some of the data models

c) Level 3 – Manual or automatic synchronization depending on the data models

d) Level 4 – Automatic synchronization of most of the data models

e) Level 5 – Automatic synchronization of all of the data models

3) To what degree do you differentiate between data models levels: physical, logical and conceptual?

a) Very low – No differentiation between data models levels

b) Low – Logical and physical levels designed for some data models

c) Moderate – Logical and physical levels designed for all the data models

d) High – Conceptual level also designed for some data models

e) Very high – All data models have conceptual, logical and physical levels designed

4) To what degree have you defined and implemented standards (e.g.: naming conventions, metadata, etc.) for

your data models?

a) Very low – No standards defined for data models

b) Low – Solution-dependent standards defined for some of the data models

c) Moderate – Solution-dependent standards defined for most of the data models / Enterprise-wide standards

defined for some of the data models

d) High – Enterprise-wide standards defined for most of the data models

e) Very high – Enterprise-wide standards defined for all the data models


5) To what degree have you documented the metadata (e.g.: definitions, business rules, main values, data quality,

etc.) in your data models?

a) Very low – No documentation for any data models

b) Low – Non standardized documentation for some of the data models

c) Moderate – Standardized documentation for some of the data models

d) High – Standardized documentation for most of the data models

e) Very high – Standardized documentation for all the data models

6) What percentage of all your fact tables has their granularity at the lowest level possible?

a) Very low – Very few fact tables have their granularity at the lowest level possible

b) Low – Few fact tables have their granularity at the lowest level possible

c) Moderate – Some fact tables have their granularity at the lowest level possible

d) High – Most fact tables have their granularity at the lowest level possible

e) Very high – All fact tables have their granularity at the lowest level possible

7) To what degree do you design conformed dimensions in your data models?

a) Very low – No conformed dimensions

b) Low – Conformed dimensions for few business processes

c) Moderate – Conformed dimensions for some business processes

d) High – Enterprise-wide standardized conformed dimensions for most business processes; also making use

of a high level design technique such as an enterprise bus matrix

e) Very high – Enterprise-wide standardized conformed dimensions for all business processes

8) Which answer best describes the current state of your dimension tables modelling?

a) Level 1 – Few dimensions designed; no hierarchies or surrogate keys designed

b) Level 2 – Some dimensions designed with surrogate keys and basic hierarchies (if needed)

c) Level 3 – Most dimensions designed with surrogate keys and basic/complex hierarchies (if needed)

d) Level 4 – Slowly changing dimensions techniques (i.e.: type 2, 3 and more) also designed

e) Level 5 – Besides regular dimensions and slowly changing dimensions technique, special dimensions are

also designed (e.g.: mini, monster, junk dimensions)

2.3 ETL

1) Which answer best describes the usage of an ETL tool in your organization?

a) Level 1 – Only hand-coded ETL

b) Level 2 – Hand-coded ETL and some standard scripts

c) Level 3 – ETL tool(s) for all the ETL design and generation

d) Level 4 – Standardized ETL tool and some standard scripts for better performance

e) Level 5 – Complete ETL generated from metadata

2) Which answer best describes the complexity of your ETL?

a) Level 1 – Simple ETL that just extracts and loads data into the data warehouse

b) Level 2 – Basic ETL with simple transformations such as: format changes, sorting, filtering, joining,

deriving new calculated values, aggregation, etc., and a surrogate key generator

c) Level 3 – Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data

capture system, de-duplication and matching system, data quality system

d) Level 4 – More advanced ETL capabilities: error event table creation, audit dimension creation, late

arriving data handler, hierarchy manager, special dimensions manager


e) Level 5 – Optimized ETL for a real-time/agile DW (real-time ETL capabilities)

3) Which answer best describes the data quality system implemented for your ETL?

a) Very low – Daily automation: no; Specific data quality tools: no; Identifying data quality issues: no;

Solving data quality issues: no

b) Low – Daily automation: no; Specific data quality tools: no; Identifying data quality issues: yes; Solving

data quality issues: no

c) Moderate – Daily automation: yes/no; Specific data quality tools: yes/no; Identifying data quality issues:

yes; Solving data quality issues: no

d) High – Daily automation: yes/no; Specific data quality tools: yes/no; Identifying data quality issues: yes;

Solving data quality issues: yes

e) Very high – Daily automation: yes; Specific data quality tools: yes; Identifying data quality issues: yes;

Solving data quality issues: yes

4) Which answer best describes the management and monitoring of your ETL?

a) Level 1 – Restart and recovery system: no; Simple monitoring (i.e.: ETL workflow monitor – statistics regarding ETL execution such as pending, running, completed and suspended jobs; MB processed per second; summaries of errors, etc.): no; Advanced monitoring (ETL workflow monitor – statistics on infrastructure performance like CPU usage, memory allocation, database performance, server utilization during ETL; job scheduler – time or event based ETL execution, events notification; data lineage and analyzer system): no; Real-time monitoring: no

b) Level 2 – Restart and recovery system: no; Simple monitoring: yes; Advanced monitoring: no; Real-time

monitoring: no

c) Level 3 – Restart and recovery system: no / Manual restart and recovery system: yes; Simple monitoring:

yes; Advanced monitoring: yes; Real-time monitoring: no

d) Level 4 – Restart and recovery system: yes / Manual and automatic restart and recovery system: yes;

Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: no

e) Level 5 – Restart and recovery system: yes / Completely automatic restart and recovery system: yes;

Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: yes (Manual or automatic

restart and recovery system as needed)

5) To what degree have you defined and implemented standards (e.g.: naming conventions, set-up standards,

recovery process, etc.) for your ETL?

a) Very low – No standards defined

b) Low – Few standards defined for ETL

c) Moderate – Some standards defined for ETL

d) High – Most standards defined for ETL

e) Very high – All the standards defined for ETL

6) To what degree is your metadata management implemented for your ETL?

a) Very low – No metadata management

b) Low – Business and technical metadata for some ETL

c) Moderate – Business and technical metadata for all ETL

d) High – Process metadata is also managed for some ETL

e) Very high – All types of metadata are managed for all ETL

2.4 BI Applications


1) Which types of BI applications best describe the highest level purpose of your DW environment?

a) Level 1 – Static and parameter-driven reports and query applications

b) Level 2 – Ad-hoc reporting; online analytical processing (OLAP)

c) Level 3 – Visualization techniques: dashboards and scorecards

d) Level 4 – Predictive analytics: data and text mining; alerts

e) Level 5 – Closed-loop BI applications; real-time BI applications

2) Which answer best describes your current BI tool usage?

a) Level 1 – BI tool related to the data mart

b) Level 2 – More than two tools for mainstream BI (i.e.: reporting and visualization applications)

c) Level 3 – One tool recommended for mainstream BI, but each department can use its own tool

d) Level 4 – One standardized tool for mainstream BI, but each department can use its own tool for specific

BI applications (i.e.: data mining, financial analysis, etc.)

e) Level 5 – One standardized tool for mainstream BI and one standardized tool for specific BI applications

3) To what degree have you defined and implemented standards (e.g.: naming conventions, generic

transformations, logical structure of attributes and measures) for your BI applications?

a) Very low – No standards defined

b) Low – Few standards defined for BI applications

c) Moderate – Some standards defined for BI applications

d) High – Most standards defined for BI applications

e) Very high – All the standards defined for BI applications

4) To what degree are standardized objects (e.g.: KPIs, metrics, attributes, templates) implemented in your BI

applications? / To what degree are generic components used for your BI applications?

a) Very low – Objects defined for every BI application

b) Low – Some reusable objects for similar BI applications

c) Moderate – Some standard objects and templates for similar BI applications

d) High – Most similar BI applications use standard objects and templates

e) Very high – All similar BI applications use standard objects and templates

5) Which BI applications delivery method best describes the highest level purpose of your DW?

a) Level 1 – Reports are delivered manually on paper or by email

b) Level 2 – Reports are delivered automatically by email

c) Level 3 – Direct tool-based interface

d) Level 4 – A BI portal with basic functions: subscriptions, discussions forum, alerting

e) Level 5 – Highly interactive, business process oriented, up-to-date portal (no differentiation between

operational and BI portals)

6) Which answer best describes the metadata accessibility to users?

a) Very low – No metadata available

b) Low – Some incomplete metadata documents that users ask for periodically

c) Moderate – Complete up-to-date metadata documents sent to users periodically or available on the intranet

d) High – Metadata is always available through a metadata management tool, different from the BI tool

e) Very high – Complete integration of metadata with the BI applications (metadata can be accessed through

one button push on the attributes, etc.)

3 DW Organization and Processes


3.1 Development Processes

1) Which answer best describes the DW development processes in your organization?

a) Level 1 – ad-hoc development processes; no clearly defined development phases (i.e.: planning,

requirements definition, design, construction, deployment, maintenance)

b) Level 2 – repeatable development processes based on experience with similar projects; some development

phases clearly separated

c) Level 3 – standard documented development processes; iterative and incremental development processes

with all the development phases clearly separated

d) Level 4 – development processes continuously measured against well-defined and consistent goals

e) Level 5 – continuous development process improvement by identifying weaknesses and strengthening the

process proactively, with the goal of preventing the occurrence of defects

2) To what degree is there a separation between the development/test/acceptance/deployment environments in

your organization?

a) Very low – no separation between environments

b) Low – two separate environments (i.e.: usually development and production) with manual transfer between

them

c) Moderate – some separation between environments (i.e.: at least three environments) with manual transfer

between them

d) High – some separation between environments (i.e.: at least two environments) with automatic transfer

between them

e) Very high – all the environments are distinct with automatic transfer between them

3) To what degree has your organization defined, documented and implemented standards for developing, testing

and deploying DW functionalities (i.e.: ETL and BI applications)?

a) Very low – no standards defined

b) Low – few standards defined

c) Moderate – some standards defined

d) High – most of the standards defined

e) Very high – a comprehensive set of standards defined

4) Which answer best describes the DW quality management?

a) Level 1 – no quality assurance activities

b) Level 2 – ad-hoc quality assurance activities

c) Level 3 – standardized and documented quality assurance activities done for all the development phases

d) Level 4 – level 3) + measurable and prioritized goals for managing the DW quality (e.g.: functionality,

reliability, maintainability, usability)

e) Level 5 – level 4) + causal analysis meetings to identify common defect causes and subsequent elimination

of these causes; service quality management certification

5) Which answer best describes the sponsor for your DW project?

a) Level 1 – no project sponsor

b) Level 2 – chief information officer (CIO) or an IT director

c) Level 3 – single sponsor from a business unit or department

d) Level 4 – multiple individual sponsors from multiple business units or departments

e) Level 5 – multiple levels of business-driven, cross-departmental sponsorship including top level

management sponsorship (BI/DW is integrated in the company process with continuous budget)


6) Which answer best describes your DW project management?

a) Level 1 – project planning and scheduling (i.e.: work breakdown structure, time, costs and resources

estimates, planning and scheduling): no; project risk management: no; project tracking and control (i.e.:

milestone tracking, change control): no; standard and efficient procedure and documentation, evaluation

and assessment: no

b) Level 2 – project planning and scheduling: yes; project risk management: no; project tracking and control:

no; standard and efficient procedure and documentation, evaluation and assessment: no

c) Level 3 – project planning and scheduling: yes; project risk management: no; project tracking and control:

yes; standard and efficient procedure and documentation, evaluation and assessment: no

d) Level 4 – project planning and scheduling: yes; project risk management: yes; project tracking and control:

yes; standard and efficient procedure and documentation, evaluation and assessment: no

e) Level 5 – project planning and scheduling: yes; project risk management: yes; project tracking and control:

yes; standard and efficient procedure and documentation, evaluation and assessment: yes

7) Which answer best describes the role division for the DW development process?

a) Level 1 – no formal roles defined

b) Level 2 – defined roles, but not technically implemented

c) Level 3 – formalized and implemented roles and responsibilities

d) Level 4 – level 3) + periodic peer reviews (i.e.: review of each other's work)

e) Level 5 – level 4) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles

and match the needed roles with responsibilities and tasks)

8) Which answer best describes the knowledge management in your organization for the DW development

processes?

a) Level 1 – ad-hoc knowledge gathering and sharing

b) Level 2 – organized knowledge sharing through written documentation and technology (e.g.: knowledge

databases, intranets, wikis, etc.), and also through training and mentoring programs

c) Level 3 – knowledge management is standardized; knowledge creation and sharing through brainstorming,

training and mentoring programs

d) Level 4 – central business unit knowledge management; quantitative knowledge management control and

periodic knowledge gap analysis

e) Level 5 – continuously improving inter-organizational knowledge management

9) Which answer best describes the requirements definition phase for your DW project?

a) Level 1 – ad-hoc requirements definition; no methodology used

b) Level 2 – methodologies differ from project to project; interviews with business users for collecting the

requirements

c) Level 3 – standard methodology for all the projects; interviews and group sessions with both business and

IT users for collecting the requirements

d) Level 4 – level 3) + qualitative assessment and measurement of the phase; requirements document also

published

e) Level 5 – level 4) + causal analysis meetings to identify common bottlenecks causes and subsequent

elimination of these causes

10) Which answer best describes the testing and acceptance phase for your DW project?

a) Level 1 – unit testing by another person: yes; system integration testing: no; user training: no; acceptance

testing: no; standard procedure and documentation for testing and acceptance: no; external assessments and

reviews of testing and acceptance: no;


b) Level 2 - unit testing by another person: yes; system integration testing: no; user training: yes; acceptance

testing: yes; standard procedure and documentation for testing and acceptance: no; external assessments

and reviews of testing and acceptance: no;

c) Level 3 - unit testing by another person: yes; system integration testing: yes; user training: yes; acceptance

testing: yes; standard procedure and documentation for testing and acceptance: no; external assessments

and reviews of testing and acceptance: no;

d) Level 4 - unit testing by another person: yes; system integration testing: yes; user training: yes; acceptance

testing: yes; standard procedure and documentation for testing and acceptance: yes; external assessments

and reviews of testing and acceptance: no;

e) Level 5 - unit testing by another person: yes; system integration testing: yes; user training: yes; acceptance

testing: yes; standard procedure and documentation for testing and acceptance: yes; external assessments

and reviews of testing and acceptance: yes.

3.2 Service Processes (Maintenance and Monitoring Processes)

1) Which answer best describes the DW service quality management in your organization?

a) Level 1 – no service quality management activities

b) Level 2 – ad-hoc service quality management

c) Level 3 – proactive service quality management including a standard procedure

d) Level 4 – level 3) + service quality measurements periodically compared to the established goals to

determine the deviations and their causes

e) Level 5 – levels 4) + causal analysis meetings to identify common defect causes and subsequent

elimination of these causes; service quality management certification

2) Which answer best describes the knowledge management in your organization for the DW service processes?

a) Level 1 – ad-hoc knowledge gathering and sharing

b) Level 2 – organized knowledge sharing through written documentation and technology (e.g.: knowledge

databases, intranets, wikis, etc.), and also through training and mentoring programs

c) Level 3 – knowledge management is standardized; knowledge creation and sharing through brainstorming,

training and mentoring programs

d) Level 4 – central business unit knowledge management; quantitative knowledge management control and

periodic knowledge gap analysis

e) Level 5 – continuously improving inter-organizational knowledge management

3) Which answer best describes the DW service level management in your organization?

a) Level 1 – customer service needs documented in an ad-hoc manner; no service catalogue compiled

b) Level 2 – some customer service needs documented and formalized based on previous experience

c) Level 3 – all the customer service needs documented and formalized according to a standard procedure into

service level agreements (SLAs)

d) Level 4 – SLAs reviewed with the customer on both a periodic and event-driven basis

e) Level 5 – actual service delivery continuously monitored and evaluated with the customer on both a

periodic and event-driven basis for continuous improvement (SLAs including penalties)

4) Which answer best describes the DW incident management in your organization?

a) Level 1 – incident management is done ad-hoc, with no specialized ticket handling system or service desk to

assess and classify incidents prior to referring them to a specialist

b) Level 2 – a ticket handling system is used for incident management; some policies and procedures for

incident management are established, but nothing is standardized


c) Level 3 – a service desk is the recognized point of contact for all customer queries; incident

assessment and classification are done following a standard procedure

d) Level 4 – standard reports concerning the incident status including measurements and goals (e.g.: response

time) are regularly produced for all the involved teams and customers; an incident management database is

established as a repository for the event records

e) Level 5 – trend analysis in incident occurrence and also in customer satisfaction and value perception of the

services provided to them

5) Which answer best describes the DW change management in your organization?

a) Level 1 – change requests are made and solved in an ad-hoc manner

b) Level 2 – a change management system is used for storing and solving the requests for change; some

policies and procedures for change management established, but nothing is standardized

c) Level 3 – a standard procedure is used for approving, verifying, prioritizing and scheduling changes

d) Level 4 – standard reports concerning the change status including measurements and goals (e.g.: response

time) are regularly produced for all the involved teams and customers; standards established for

documenting changes

e) Level 5 – trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and

value perception of the services provided to them

6) Which answer best describes the DW technical resource management in your organization?

a) Level 1 – ad-hoc resource management activities (only when there is a problem)

b) Level 2 – resource management is done following some procedures, but nothing is standardized or

documented

c) Level 3 – resource management is done constantly following a standardized documented procedure

d) Level 4 – standard reports concerning performance and resource management including measurements and

goals are done on a regular basis

e) Level 5 – resource management trend analysis and monitoring to make sure that there is sufficient capacity

to support planned services

7) Which answer best describes the availability management in your organization?

a) Level 1 – ad-hoc availability management

b) Level 2 – availability management is done following some procedures, but nothing is standardized or

documented

c) Level 3 – availability management documented and done using a standardized procedure (all elements are

monitored)

d) Level 4 – risk assessment to determine the critical elements and possible problems

e) Level 5 – availability management trend analysis and planning to make sure that all the elements are

available for the agreed service level targets

8) Which answer best describes the release management in your organization?

a) Level 1 – ad-hoc changes solving and implementation; no release naming and numbering conventions

b) Level 2 – release management is done following some procedures, but nothing is standardized or

documented; release naming and numbering conventions

c) Level 3 – release management is documented and done following a standardized procedure; assigned

release management roles and responsibilities

d) Level 4 – standard reports concerning release management including measurements and goals are done on a

regular basis; master copies of all software in a release secured in a release database

e) Level 5 – release management trend analysis, statistics and planning.


Appendix D: Expert Interview Protocol

Interviewee : Date :

Organization: Start Time:

Place : End Time :

Interviewer Instructions

Ask for recording permission, for processing purposes. Recordings will be deleted after processing. Check

if recorder works correctly!

Start with the following introduction and continue with the questions:

General information:

As briefly explained in our e-mail contact, my name is Catalina Sacu and I am following the two-year Master of Business Informatics programme at Utrecht University. I am currently writing my thesis under the supervision of dr. M.R. Spruit and dr. J.M. Versendaal, aiming to develop a Data Warehouse Capability Maturity Model. The main goal of my thesis is to create a model that helps organizations assess their current data warehouse solution from both a technical and an organizational/process point of view.

Research:

In today's economy, organizations have a lot of information to gather and process in order to make the best decisions as fast as possible. One of the solutions that can improve the decision-making process is the use of Business Intelligence (BI)/Data Warehouse (DW) solutions. They combine tools, technologies and processes in order to turn data into information and information into knowledge that can optimize business actions. However, even though organizations spend a lot of money on developing these solutions, more than 50 percent of BI/DW projects fail to deliver the promised results (Gartner Group, 2007). This was the trigger for my research, which aims at creating a DW Capability Maturity Model. In this way, we will be able to assess and score the different variables that influence the quality of a DW and determine the current situation of an organization's DW solution. Then, we will be able to offer some guidelines on future DW improvements that will lead to better organizational performance.

Goal:

As said before, the main goal of my research is to develop a Data Warehouse Capability Maturity Model. This

interview is part of my research and its main objective is to get some expert validation for the model I have

developed from theory and the case study done at Inergy. The interview will contain questions regarding the

following aspects:

Your organization and role

The Data Warehouse Capability Maturity Model.

Data collected during the interview will only be used for my thesis and will be processed anonymously. At the end

of my research, you will have the chance to see the results and the final model. The interview will last for about two

hours. Before we start, are there any questions? OK, let's start! Start recorder!


Questions:

Organization and Role

1. Could you give a short introduction to your organization (including products, markets, customers)?

2. Could you explain your role in the organization (including your experience in BI)? On a scale from 1 to 5,

how would you rate your knowledge of BI (Business vs. Technical)?

The Data Warehouse Capability Maturity Model

1. In my model, I consider several benchmark variables/categories that have to be taken into consideration

and assessed when analyzing the maturity of an organization's DW. Which categories would you

recommend?

Show and explain the DW Capability Maturity Model (with all its components).

2. Do you think the chosen categories are representative and if not, what changes would you make?

Let's take a look at each category and the questions I chose in order to do the assessment.

3. Do you think the chosen questions are representative and if not, what changes would you make?

Let's take a closer look at two categories you prefer.

4. Do you think the chosen answers are representative and if not, what changes would you make?

5. In my model, I consider each question to have five possible answers weighted from 1 to 5. Each answer is

also specific to one of the five possible maturity stages. In this way, after getting all the answers, we can

sum up all the weightings for each category and divide them by the number of questions per category (e.g.:

a score for architecture, one for data modelling, etc.). In the end, an overall score can be obtained by

summing up the scores for all the categories and dividing them by six (the number of categories). What is

your opinion on the scoring method? Should we add weightings for each category (e.g.: architecture – 0.2;

data modelling – 0.3; etc.)? What other changes would you make?
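To make the weighted alternative raised in this question concrete, it could be computed as sketched below (a hypothetical Python sketch; the weights and scores are invented, and the model as developed uses the plain average instead):

```python
# Hypothetical per-category weights (they must sum to 1) and maturity scores
# illustrating the weighted alternative to dividing the summed scores by six.
weights = {"Architecture": 0.2, "Data Modelling": 0.3, "ETL": 0.2,
           "BI Applications": 0.1, "Development Processes": 0.1,
           "Service Processes": 0.1}
scores = {"Architecture": 3.5, "Data Modelling": 2.8, "ETL": 3.1,
          "BI Applications": 2.4, "Development Processes": 3.0,
          "Service Processes": 2.2}

# Weighted average of the category scores.
overall = sum(weights[c] * scores[c] for c in weights)
print(round(overall, 2))
```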

Final Questions:

1. What are the current trends in DW in your opinion?

2. What are the situational factors (if any) in your opinion that can influence the development of a DW and

hence, the applicability of the DW Capability Maturity Model?

Thank you for your time and cooperation! Are there any additional comments or questions? Turn off

recorder!


Appendix E: Case Study Interview Protocol

Interviewee : Date :

Organization: Start Time:

Place : End Time :

Interviewer Instructions

Ask for recording permission, for processing purposes. Recordings will be deleted after processing. Check if

recorder works correctly!

Start with the following introduction and continue with the questions:

General information:

As briefly explained in our e-mail contact, my name is Catalina Sacu and I am following the two-year Master of Business Informatics programme at Utrecht University. I am currently writing my thesis under the supervision of dr. M.R. Spruit and dr. J.M. Versendaal, aiming to develop a Data Warehouse Capability Maturity Model. The main goal of my thesis is to create a model that helps organizations assess their current data warehouse solution from both a technical and an organizational/process point of view.

Research:

In today's economy, organizations have a lot of information to gather and process in order to make the best decisions as fast as possible. One of the solutions that can improve the decision-making process is the use of Business Intelligence (BI)/Data Warehouse (DW) solutions. They combine tools, technologies and processes in order to turn data into information and information into knowledge that can optimize business actions. However, even though organizations spend a lot of money on developing these solutions, more than 50 percent of BI/DW projects fail to deliver the promised results (Gartner Group, 2007). This was the trigger for my research, which aims at creating a DW Capability Maturity Model. In this way, we will be able to assess and score the different variables that influence the quality of a DW and determine the current situation of an organization's DW solution. Then, we will be able to offer some guidelines on future DW improvements that will lead to better organizational performance.

Goal:

As said before, the main goal of my research is to develop a Data Warehouse Capability Maturity Model. This

interview is part of my research and its main objective is to test the model in an organization to see if it works in

practice and get some feedback for future improvements of the model. The interview will contain questions

regarding the following aspects:

Your organization and role

The Data Warehouse Maturity Assessment.

Data collected during the interview will only be used for my thesis and will be processed anonymously. At the end

of my research, you will have the chance to see the results and the final model. The interview will last for about 1.5

hours. Before we start, are there any questions? OK, let's start! Start recorder!


Questions:

Organization and Role

1. Could you give a short introduction to your organization (including products, markets, customers)?

2. Could you explain your role in the organization (including your experience in BI/DW) and the BI/DW

project? On a scale from 1 to 5, how would you rate your knowledge of BI (Business vs. Technical)?

The Data Warehouse Maturity Assessment Questionnaire

Show and explain the Data Warehouse Capability Maturity Model (with all its components).

Please see the attached questionnaire and fill in the answers.

Thank you for your time and cooperation! Are there any additional comments or questions? Turn off

recorder!


Appendix F: Case Study Feedback Template

1. Maturity Scores

Short overview on the maturity assessment questionnaire.

Tables with maturity scores and radar graph.

2. Feedback

Strengths regarding the current DW solution

Feedback regarding the DW technical solution

Feedback regarding the DW organization & processes.


Appendix G: Paper

The paper will be submitted to the Journal of Database Management.

DWCMM: The Data Warehouse Capability Maturity Model

Catalina Sacu 1, Marco Spruit 1, Frank Habers 2

1 Institute of Information and Computing Sciences, Utrecht University, 3508 TC, Utrecht, The Netherlands.

2 Inergy, 3447 GW Woerden, The Netherlands.

Abstract: Data Warehouses and Business Intelligence have been part of a very dynamic and

popular field of research in recent years as they help organizations in making better decisions and

increasing their profitability. This paper aims at creating a Data Warehouse Capability Maturity

Model (DWCMM) focused on the technical and organizational aspects involved in developing a

data warehouse environment. This model and its associated maturity assessment questionnaire can

be used to help organizations assess their current DW solution and provide them with guidelines

for future improvements. The DWCMM was evaluated empirically through multiple expert

interviews and case studies to enrich and validate the theory we have developed.

Keywords: Data Warehousing, Business Intelligence, Maturity Modelling.

Introduction and Problem Definition

In today's economy, organizations are part of a very dynamic environment due to continuously changing

conditions and relationships. As Kaye (1996) notes, "organizations must collect, process, use, and

communicate information, both external and internal, in order to plan, operate and take decisions" (p. 20).

The ongoing quest for profits, increasing competition and demanding customers all require

organizations to make the best decisions as fast as possible (Vitt et al., 2002). One of the solutions that can

narrow the time between acquiring information and obtaining the right

results to improve the decision-making process is the implementation of Data Warehouse and Business

Intelligence (BI) applications.

Over the years, data warehouses (DWs) and BI solutions have become a fundamental part of the

information systems used to support decision-making initiatives. Most large companies have

already established DW systems as a component of the information systems landscape. According to

Gartner (2007), BI and DWs are at the forefront of the use of IT to support management decision-

making. DWs can be thought of as the large-scale data infrastructure for decision support. BI can be

viewed as the data analysis and presentation layer that sits between the DW and the executive decision-

makers (Arnott & Pervan, 2005). In this way, the DW/BI solutions can transform raw data into

information and then into knowledge.


However, a DW is not only a software package. The adoption of DW technology requires massive capital

expenditure and a considerable amount of implementation time. DW projects are hence very expensive, time-

consuming and risky undertakings compared with other information technology initiatives, as cited by

prior researchers (Wixom & Watson, 2001; Hwang et al., 2004; Solomon, 2005). Moreover, it is often

believed that one-half to two-thirds of all initial DW efforts fail (Hayen et al., 2007). Gartner (2007)

estimates that more than fifty percent of DW projects have limited acceptance or fail. Therefore, it is

crucial to have a thorough understanding of the critical success factors and variables that determine the

efficient implementation of a DW solution.

These factors can refer to the development of the DW/BI solution or to the usage and adoption of BI. In

this research, we will focus on the former as we consider that it represents the foundation for a solid DW

solution that can have a high rate of usage and adoption. First, it is critical to properly design and

implement the databases that lie at the heart of the DW. The right architecture and design can ensure

performance today and scalability tomorrow. Second, all components of the DW solution (e.g.: data

repository, infrastructure, user interface) must be designed to work together in a flexible, easy-to-use way.

A third task is to develop a consistent data model and establish what and how source data will be

extracted. In addition to these factors, the DW needs to be created and developed quickly and efficiently

so that the organization can gain the business benefits as soon as possible (AbuAli & Abu-Addose, 2010).

As can be seen, a DW project can unquestionably be complex and challenging, and there is usually not a

single successful solution that can be applied to all organizations. Therefore, it is very important for

organizations to be aware of their current situation and know the steps they need to take for continuous

improvement. However, an objective assessment often proves to be a difficult task.

Maturity models can be helpful in this situation. They essentially describe the development of an entity

over time, where the entity can be anything of interest: a human being, an organizational function, an

organization, etc. (Klimko, 2001). Maturity models have a number of sequentially ordered levels, where

the bottom stage stands for an initial state that can be, for example, characterized by an organization

having few capabilities in the domain under consideration. In contrast, the highest stage represents a

conception of total maturity. Advancing on the evolution path between the two extremes involves a

continuous progression regarding the organization's capabilities or process performance. The maturity

model serves as an assessment of the position on the evolution path, as it offers a set of criteria and

characteristics that need to be fulfilled in order to reach a particular maturity level (Becker et al., 2009).

With the help of maturity modelling, we will gain some insight into the technical and organizational

variables that determine the successful development of a DW solution and analyze these variables.

Therefore, in order to make an assessment of the most important aspects that influence a DW project, this

paper develops a Data Warehouse Capability Maturity Model (DWCMM) which provides an answer to

the following research question:

How can the maturity of a company’s data warehouse technical aspects be assessed and acted upon?

Research Methodology

The main goal of this research is to develop a DWCMM that depicts the maturity stages of a DW project.

For this purpose, a design research approach is used as its main philosophy is to generate scientific

knowledge by building and validating a previously designed artifact (Hevner et al., 2004). In this


research, the artifact is the DWCMM, which is developed according to the five steps in developing design

research artifacts as described by Vaishnavi and Kuechler (2008): problem awareness, suggestion,

development, evaluation and conclusion. Awareness of the problem was raised through discussions with

DW/BI practitioners and a literature study on data warehousing and maturity modelling. A detailed problem

description was provided in the previous section. Based on this, it has become clear that DW projects often

fail or do not bring the expected results and that organizations sometimes need guidelines for

improvement. As a solution to this problem, we developed the DWCMM which can be used to assist

organizations in doing a maturity assessment for the DW technical aspects and in providing guidelines for

future improvements. First, an overview on the model and its main components will be presented. Then,

results of the evaluation phase are presented. The DWCMM has been evaluated by carrying out five

expert interviews and a multiple-case study within four organizations, following Yin's (2009) case study

approach. Finally, the last section contains conclusions regarding our model and an agenda for future

research.

DWCMM: The Data Warehouse Capability Maturity Model

In the literature, many maturity models have been developed (de Bruin et al., 2005), but only some of them

managed to gain global acceptance. There are also several information technology and/or information

system maturity models dealing with different aspects of maturity: technological, organizational and

process maturity. Some of them are specific to the data warehousing/BI field. The most important

maturity models that served as a source of inspiration for our research can be seen in table 1.

Authors | Model | Focus

Nolan (1973) | Stages of Growth | IT Growth Inside an Organization

Software Engineering Institute (SEI) (1993) | Capability Maturity Model (CMM) | Software Development Processes

Watson, Ariyachandra & Matyska (2001) | Data Warehousing Stages of Growth | Data Warehousing

Chamoni & Gluchowski (2004) | Business Intelligence Maturity Model | Business Intelligence

The Data Warehousing Institute (TDWI) (2004) | Business Intelligence Maturity Model | Business Intelligence

Gartner – Hostmann (2007) | Business Intelligence and Performance Management Maturity Model | Business Intelligence and Performance Management

Table 1: Overview of Maturity Models.

Each of these models has a different way of assessing maturity, but there are some elements common to all of them. All the models have interesting elements, but also weak points that could be improved. Moreover, the models developed for the field of data warehousing/BI cover many of the variables involved in such a project, but they do not analyze the technical aspects in depth.

The maturity model which served as the main foundation for this research is the CMM (Paulk et al.,

1995). It has become a recognized standard for rating software development organizations. The CMM is a

framework that describes the key elements of an effective software process and presents an evolutionary

improvement path from an ad-hoc, immature process to a mature, disciplined one. Since its development,

CMM has become a universal model for assessing software process maturity. However, the CMM has

often been criticized for its complexity and difficulty of implementation. That is why we simplified it by

keeping the five maturity levels (i.e.: initial, repeatable, defined, managed and optimizing), the process


capabilities and the key process areas, which in our model would translate to the chosen benchmark

variables/categories for doing the DW maturity assessment.

Therefore, it can be seen that even though DW/BI solutions are often implemented in practice and many

maturity models have been created, none actually focuses on the technical aspects of the DW/BI

solution and the organizational processes that sustain them. Hence, this is the research gap we would like

to fill in by developing a Data Warehouse Capability Maturity Model (DWCMM) that focuses on the

DW technical solution and DW organization and processes. The DWCMM is depicted in figure 1. A

short overview of the model and its components will be provided in the next paragraphs.

When analyzing the maturity of a DW solution, we are actually taking a snapshot of an organization at the

current moment in time. Therefore, in order to do a valuable assessment, it is important to include in the

maturity analysis the most representative dimensions involved in the development of a DW solution.

Several authors note that the main phases usually involved in a DW project lifecycle are (Kimball et

al., 2008; Moss & Atre, 2003; Ponniah, 2001): project planning and management, requirements

definition, design, development, testing and acceptance, deployment, growth and maintenance. All of

these phases and processes refer to the implementation and maintenance of the actual DW technical

solution which includes: the general architecture and infrastructure, data modelling, ETL, BI applications.

These categories can be analyzed from many points of view which will be depicted in our model and the

maturity assessment we developed. Therefore, the DWCMM is restricted to assessing

the technical aspects, without taking into consideration DW/BI usage and adoption or DW/BI

business value. It will consider two main benchmark variables/categories for analysis, each of them

having several sub-categories. Firstly, the DW Technical Solution consists of the following four

components: General Architecture and Infrastructure, Data Modelling, Extract-Transform-Load (ETL)

and BI Applications. Secondly, the DW Organization & Processes dimension comprises the following

two aspects: Development Processes and Service Processes.

Figure 1: Data Warehouse Capability Maturity Model (DWCMM).


As can be seen from figure 1, the DWCMM does a maturity assessment which will provide a maturity

score for each benchmark sub-category. In order to create a complete image on the current DW solution

for an organization, the DWCMM has several components:

A DW maturity assessment questionnaire:

The whole DW maturity assessment questionnaire has been published in (Sacu et al., 2010). Emphasis

should be put on two aspects regarding the DW maturity assessment questionnaire. Firstly, it does a high-

level assessment of an organization's DW solution and is limited strictly to the DW technical aspects.

Secondly, the model will assess “what” and “if” certain characteristics and processes are implemented

and not “how” they are implemented. The DW maturity assessment questionnaire has 60 questions

divided into the following three categories:

DW General Questions (9 questions) – comprises several questions about the DW/BI

solution which are not scored. Their purpose is to offer a better image of the drivers for

implementing the DW environment, the budget allocated for data warehousing and BI, the DW

business value, end-user adoption, etc. This will be useful in creating a complete picture of the

current DW solution and its maturity. Also, once the questionnaire is filled in by more

organizations, this data will serve as input for statistical analysis and comparisons between

organizations from the same industry or across industries.

DW Technical Solution (32 questions) – comprises several scored questions for each of the

following sub-categories:

General Architecture and Infrastructure (9 questions)

Data Modelling (9 questions)

ETL (7 questions)

BI Applications (7 questions). More details on this part will be given in the next sections.

DW Organization & Processes (19 questions) – comprises several scored questions for each

of the following sub-categories:

Development Processes (11 questions)

Service Processes (8 questions). More details on this part will be given in the next

sections.

Each question in the questionnaire has five possible answers, scored from 1 to 5, with 1 characteristic of the lowest maturity stage and 5 of the highest. When an organization takes the survey, it first receives a maturity score for each sub-category, computed as the average value of the weightings (i.e.: sum of the weightings / number of questions); then, an overall score for each of the two main categories is given by computing the average value of the scores obtained for each sub-category; finally, an overall maturity score is obtained by applying the same principle to the two main category scores.
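To make the computation concrete, the scoring procedure can be expressed as a short sketch (a minimal Python sketch; the sub-category names and question counts follow the questionnaire described above, while the answer weightings are hypothetical):

```python
# A minimal sketch of the DWCMM scoring procedure. Each list holds the
# hypothetical 1-5 weightings of the answers given per sub-category.
answers = {
    "DW Technical Solution": {
        "General Architecture and Infrastructure": [3, 4, 2, 3, 4, 3, 2, 3, 4],
        "Data Modelling": [2, 3, 3, 2, 4, 3, 3, 2, 3],
        "ETL": [3, 2, 3, 4, 3, 2, 3],
        "BI Applications": [2, 2, 3, 3, 2, 3, 2],
    },
    "DW Organization & Processes": {
        "Development Processes": [3, 2, 3, 3, 2, 3, 2, 3, 2, 3, 2],
        "Service Processes": [2, 2, 3, 2, 2, 3, 2, 2],
    },
}

def average(values):
    return sum(values) / len(values)

# Sub-category score: sum of the weightings / number of questions.
subcategory_scores = {sub: average(w)
                      for subs in answers.values() for sub, w in subs.items()}

# Main category score: average of its sub-category scores.
category_scores = {cat: average([average(w) for w in subs.values()])
                   for cat, subs in answers.items()}

# Overall maturity score: average of the two main category scores.
overall_score = average(list(category_scores.values()))

for name, score in {**subcategory_scores, **category_scores,
                    "Overall": overall_score}.items():
    print(f"{name}: {score:.2f}")
```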

We believe that the maturity scores for the sub-categories can give a good overview of the current DW solution implemented by the organization. This is the reason why, after computing the maturity scores for each sub-category, a radar graph like the one depicted in figure 1 will be drawn to show the alignment between these scores. In this way, the organization will have a clearer image of its current DW project and will know which sub-category is the strongest and which one lags behind.
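As an illustration, such a radar graph can be drawn with a few lines of matplotlib (a sketch; the six sub-category scores shown are hypothetical):

```python
# A sketch of the sub-category radar graph, using hypothetical scores.
import numpy as np
import matplotlib.pyplot as plt

labels = ["Architecture", "Data Modelling", "ETL", "BI Applications",
          "Development Processes", "Service Processes"]
scores = [3.4, 2.9, 3.1, 2.5, 3.0, 2.2]  # hypothetical sub-category scores

# One axis per sub-category; repeat the first point to close the polygon.
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
angles += angles[:1]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, values, linewidth=1.5)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels, fontsize=8)
ax.set_ylim(0, 5)  # maturity scores range from 1 to 5
plt.show()
```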


Moreover, after reviewing the maturity scores and the answers given by a specific organization, some general feedback and advice for future improvements is provided. Each organization that takes the assessment receives a document with a short explanation of the scoring method, a table with its maturity scores and the radar graph, followed by general feedback consisting of: a general overview of the maturity scores; an analysis of the positive aspects already implemented in the DW solution; and several steps that the organization should take in order to improve its current DW application.

A condensed DW maturity matrix:

As our model measures the maturity of a DW solution, we also created two maturity matrices – a condensed maturity matrix and a detailed one – each of them having five maturity stages inspired by the CMM: Initial (1); Repeatable (2); Defined (3); Managed (4); Optimized (5). The Initial stage describes an incipient DW development, whereas the Optimized level shows a very mature solution, obtained by an organization with a lot of experience in the field, where everything is standardized and monitored. An organization will usually be situated at different maturity stages for the different sub-categories, which together determine the overall maturity level.

The condensed DW maturity matrix gives a short overview of the most important characteristics of each sub-category at each maturity level. This offers a better picture of the main goal of the DWCMM and of what the detailed maturity matrix entails. The condensed maturity matrix can be seen in figure 2.

Benchmark Variable | Initial (1) | Repeatable (2) | Defined (3) | Managed (4) | Optimized (5)

DW Technical Solution:

Architecture | Desktop data marts | Independent data marts | Independent data warehouses | Central DW with/without data marts | DW/BI service that federates a central DW and other sources via a standard interface

Data Modelling | No data model synchronization or standards | Manually synchronized data models | Manually or automatically synchronized data models | Automatic synchronization of most data models | Enterprise-wide standards and automatic synchronization of all the data models

ETL | Simple ETL with no standards that just extracts and loads data into the DW | Basic ETL with simple transformations | Advanced ETL (e.g. slowly changing dimensions manager, data quality system, reusability, etc.) | More advanced ETL (e.g. hierarchy manager, special dimensions manager, etc.) | Optimized ETL for real-time DW with all the standards defined

BI Applications | Static and parameter-driven reports | Ad-hoc reporting; OLAP | Dashboards & scorecards | Predictive analytics; data & text mining | Closed-loop & real-time BI applications

DW Organization & Processes:

Development Processes | Ad-hoc, non-standardized development processes with no defined phases | Some development process policies and procedures established, with some phases separated | Standardized development processes with all the phases separated and all the roles formalized | Quantitative development process management | Continuous development process improvement

Service Processes | Ad-hoc, non-standardized service processes | Some service process policies and procedures established | Standardized service processes with all the roles formalized | Quantitative service process management | Continuous service process improvement

Figure 2: DWCMM Condensed Maturity Matrix.

A detailed DW maturity matrix:

We will give a short overview of the detailed DW maturity matrix in this paragraph. The characteristics for each maturity stage are usually obtained by mapping the corresponding answers of each question from the maturity assessment questionnaire (except for several characteristics, such as project management and testing and acceptance, whose answers are formulated in a different way). In this way, an organization is able to see its maturity stage by category (e.g. General Architecture and Infrastructure) and by main category characteristics (e.g. metadata, standards, infrastructure, etc.). The matrix has two dimensions:

columns – show each benchmark sub-category (i.e. General Architecture and Infrastructure, Data Modelling, ETL, BI Applications, Development Processes, Service Processes) with their maturity stages from Initial (1) to Optimized (5);

rows – show the main analyzed characteristics (e.g. for General Architecture and Infrastructure: conceptual architecture, business rules, metadata, security, data sources, performance, infrastructure, update frequency) for each sub-category, divided by maturity stage.

Moreover, the matrix can be read in two ways. First, one can take each stage and see what the specific characteristics of each sub-category are at that particular stage. Second, one can take each sub-category and see what its specific characteristics are at each stage or at one particular stage.

As the developed questionnaire performs an assessment for each benchmark sub-category, a specific organization will most likely follow the second reading. It will probably want to know what steps to take to improve each sub-category and hence the overall maturity score, which leads to a higher maturity stage. It is also very unlikely that an organization will have all the characteristics of all the sub-categories at the same maturity stage at the same moment in time. Therefore, if a company obtains a maturity score of 3, this does not mean that all the characteristics of all the sub-categories are at stage three. Depending also on the standard deviation and the answers themselves, we can find out more about the actual situation.

Now that the main components of the DWCMM have been identified, we will continue by taking a closer look at the main categories and sub-categories of the model and their analyzed characteristics. These are reflected in the maturity assessment questionnaire and the detailed maturity matrix. We will start with the DW technical solution and continue with the DW organization and processes.


DW Technical Solution Maturity

As mentioned earlier, the main components that need to be analyzed when assessing the DW technical solution are: general architecture and infrastructure, data modelling, ETL and BI applications.

General Architecture and Infrastructure

DW architecture includes: three main components (i.e. data modelling, ETL, BI applications), several data storage components (e.g. source systems, data staging area, DW database, operational data store, data marts) and the way they are assembled together (Ponniah, 2001), as well as underlying elements such as infrastructure, metadata and security that support the flow of data from the source systems to the end-users (Kimball et al., 2008; Chauduri & Dayal, 1997). This is connected to the conceptual approach of designing and building the DW (e.g. conformed data marts – Kimball – or an enterprise-wide DW – Inmon). Therefore, in this research we consider architecture and infrastructure a separate sub-category for assessing maturity, whose main characteristics are analyzed below.

Conceptual architecture and its layers (question 1) – encompasses the conceptual approach of designing

and building the DW with all its data storage layers.

DW data sources (question 6) – the types of data sources that the DW extracts data from (e.g. Excel files, text files, relational databases, ERP & CRM systems, unstructured data: text documents, e-mails, images, videos, Web data sources).

Infrastructure (question 8) – it provides the underlying foundation that enables the DW architecture to be

implemented (Ponniah, 2001), and it includes elements such as: hardware platforms and components,

operating systems, database platforms, connectivity and networking (Kimball et al., 2008).

Metadata management (question 4) – metadata can be seen as all the information that defines and

describes the structures, operations and contents of the DW system in order to support the administration

and effective exploitation of the DW. The main elements that influence its maturity are: the types of

implemented metadata (i.e.: business, technical or process) and the integration of metadata repositories

(Moss & Atre, 2003; Kimball et al., 2008).

Security management (question 5) – user access security is usually implemented through several methods,

presented here in hierarchical order of difficulty of implementation (Kimball et al., 2008; Moss & Atre,

2003; Ponniah, 2001): authentication, tool-based security, role-based security, authorization.

Business rules (questions 2 & 3) – they are abstractions of the policies and practices of a business

organization (Kaula, 2009), and are used to capture and implement precise business logic in processes,

procedures, and systems (manual or automated).

Performance optimization (question 7) – encompasses the various methods needed to improve DW performance (Ponniah, 2001): software performance improvement (e.g. index management, data partitioning, parallel processing, view materialization); hardware performance improvement; and specialized DW appliances or cloud computing, which are characteristic of a very high maturity stage.


Update frequency (question 9) – it is one of the characteristics that differentiate classical DW solutions

built for strategic and tactical BI from the newer DWs that process data in real time.

Data Modelling

Data modelling is the process of creating a data model. A data model is "a set of concepts that can be used to describe the structure of and operations on a database" (Navathe, 1992, pp. 112-113). Data modelling is very important for creating a successful information system as it defines not only data elements, but also their structures and the relationships between them. The most important characteristics which should be taken into consideration when assessing the maturity of data modelling are described below.

Synchronization between all the data models found in the DW (question 2) – establishing consistency

among data from a source to a target data storage and vice versa and the continuous harmonization of the

data over time.

Design levels (question 3) – encompasses all the data model design levels: conceptual design, logical

design and physical design.

Tool (question 1) – data models can be created by just drawing the models in different spreadsheets and

documents. However, the more mature solution is to use a data modelling tool that can make the design

itself and metadata management easier and more efficient.

Standards (questions 4 & 5) – standards in a DW environment are necessary and cover a wide range of

objects, processes, and procedures. All the maturity assessments related to standards will address general

aspects such as the definition and documentation of standards and their actual implementation. Most

often, standards related to data modelling refer to naming conventions for the objects and attributes in the

data models.

Metadata management (question 6) – encompasses the common subset of business and technical

metadata components as they apply to data (Moss & Atre, 2003): data names, definitions, relationships,

identifiers, types, lengths, policies, ownership, etc.

Dimensional modelling (questions 7, 8 & 9) – there are several data modelling techniques that can be

applied for data warehousing: relational (or normalized), dimensional, data vault, etc. In this research we

focused on dimensional modelling. For more information on dimensional modelling, see (Kimball, 1996).
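As a minimal illustration of what a dimensional model looks like (our own toy example, not part of the questionnaire), the sketch below builds a small star schema in SQLite: a central fact table holding additive measures, joined to surrogate-keyed dimension tables.

```python
# A toy star schema: one fact table surrounded by dimension tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key    INTEGER PRIMARY KEY,  -- surrogate key
        full_date   TEXT, month INTEGER, year INTEGER
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,  -- surrogate key
        product_name TEXT, category TEXT
    );
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        units_sold  INTEGER,              -- additive measures
        revenue     REAL
    );
""")
conn.execute("INSERT INTO dim_date VALUES (20100801, '2010-08-01', 8, 2010)")
conn.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
conn.execute("INSERT INTO fact_sales VALUES (20100801, 1, 10, 99.50)")

# A typical dimensional query: slice the facts by dimension attributes.
for row in conn.execute("""
        SELECT d.year, p.category, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_date d ON f.date_key = d.date_key
        JOIN dim_product p ON f.product_key = p.product_key
        GROUP BY d.year, p.category"""):
    print(row)
```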

Extract-Transform-Load (ETL)

As the name shows, the Extract-Transform-Load (ETL) process mainly involves the following activities: extracting data from outside sources; transforming data to fit the target's requirements; and loading data into the target database. The ETL system is very complex and resource-demanding (Kimball et al., 2008); hence, 60 to 80 percent of the time and effort of developing a DW project is devoted to the ETL system (Nagabhushana, 2006). The main characteristics that we included in our ETL maturity assessment are described below.

Complexity (question 2) – this refers to the maturity and performance of each ETL component (i.e.:

extract, transform, load). For example, the extraction phase should include a data profiling system, a


change data capture system and the extract system itself. The transformation step usually includes

cleaning and transforming data according to the business rules and standards that have been established

for the DW. The DW load system takes the load images created by the extraction and transformation

subsystems and loads these images directly into the DW.
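To make the three components concrete, here is a deliberately simple, hand-coded ETL flow in Python. The source file and target table are hypothetical; a mature ETL system would add the data profiling, change data capture and data quality subsystems discussed above.

```python
# A deliberately simple, hand-coded ETL sketch (hypothetical source and target).
import csv
import sqlite3

def extract(path):
    """Extract: read raw records from an outside source (here, a CSV file)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(records):
    """Transform: clean the data and apply simple business rules."""
    for r in records:
        r["customer"] = r["customer"].strip().title()  # standardize names
        r["amount"] = round(float(r["amount"]), 2)     # normalize the measure
    return [r for r in records if r["amount"] > 0]     # drop invalid rows

def load(records, conn):
    """Load: write the prepared load images into the target DW table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [(r["customer"], r["amount"]) for r in records])
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract("sales.csv")), conn)  # "sales.csv" is a placeholder
```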

Data quality system (question 3) – data quality is critical for the success of a DW. Therefore, we decided

to include a question that would depict its main characteristics for each maturity stage regarding: daily

automation, specific data quality tools, identifying data quality issues and actually solving them.

Management and monitoring (question 4) – encompasses all the necessary capabilities for the ETL

processes to run consistently to completion and be available when needed (e.g.: an ETL job scheduler; a

backup system; a recovery and restart system – it can be manual or automatic; a workflow monitor, etc.)

Tool (question 1) – there is a constant debate whether an organization should deploy custom-coded ETL solutions or buy an ETL tool suite (Kimball & Caserta, 2004). A company that uses hand-coded ETL usually does not have a very complex ETL process, which indicates a low level of maturity regarding ETL capabilities.

Metadata management (question 7) – ETL is responsible for the creation and use of much of the metadata

describing the DW environment. Therefore, it is important to capture and manage all possible types of

metadata for ETL: business, technical and process metadata.

Standards (questions 5 & 6) – includes ETL specific standards that are related to: naming conventions,

set-up standards, recovery and restart system, etc.

BI Applications

BI applications, sometimes referred to as "front-end" tools (Chauduri & Dayal, 1997), are what the end-users see and hence are very important for a DW to be considered successful. According to March & Hevner (2007), a crucial point for achieving DW implementation success is the selection and implementation of appropriate end-user analysis tools, because the business benefits of BI are only gained when the system is adopted by its intended end-users. The main aspects that determine the maturity of BI applications are analyzed below.

Types of BI applications (question 1) – encompasses the main types of BI whose complexity contributes to the maturity of a DW environment. According to Azvine et al. (2006), traditional BI applications fall into the following categories, sorted by ascending complexity: report what has happened – standard reporting and query applications; analyze and understand why it has happened – ad-hoc reporting and online analytical processing (OLAP); visualization applications (i.e. dashboards, scorecards); and predict what will happen – predictive analytics (i.e. data and text mining). In the last couple of years, due to the development of real-time data warehousing, a new category of BI applications has emerged, called operational BI and closed-loop applications (Kimball et al., 2008).

Delivery method (question 6) – includes the main BI application delivery methods. As end users are interested only in the results they get from the BI applications, the ease of accessing and delivering these results is critical for the success of the DW solution.


Tool (question 2) – defines the usage of BI application tools, which can really make a difference for the DW solution.

Metadata management (question 7) – encompasses the main metadata accessibility methods. As BI applications are what the end user sees, this is an important aspect for DW success (Moss & Atre, 2003).

Standards (questions 3 & 4) – includes standards specific to BI applications, such as: naming conventions, generic transformations, logical structure of attributes and measures, etc.

DW Organization and Processes Maturity

When assessing the maturity of a DW technical solution, the processes and roles involved in the project

also need to be analyzed. A good technical solution cannot be developed without the processes

surrounding it as there is a strong interconnection between the two parts. The necessary processes for a

DW project are: development processes and service processes.

DW Development Processes

A DW solution can be considered a software engineering project with some specific characteristics. Therefore, like any software engineering project, it goes through several development stages (Moss & Atre, 2003). Since DW/BI is an enterprise-wide evolving environment that is continually improved and

enhanced based on feedback from the business community, the best approach for its development is

iterative and incremental development, with agile techniques for the development of BI applications

(Kimball et al., 2008; Ponniah, 2001). The high level phases and tasks required for an effective DW

implementation are (Kimball et al., 2008; Moss & Atre, 2003): project planning and management;

requirements definition; design; development; testing and acceptance; deployment/production. The main

characteristics which might influence the maturity of DW development processes can be seen below.

CMM levels (question 1) – as it is hard to judge which software development paradigm is better and more

mature, the first maturity question on development processes is a more general one and it refers to how

the DW development processes map to the CMM levels.

Project planning and management (question 7) – encompasses the main elements that determine the

maturity of this characteristic (Lewis, 2001): project planning and scheduling; project risk management;

project tracking and control; standard procedure and documentation; and evaluation and assessment.

DW/BI sponsor (question 6) – defines the extent of organizational support and sponsorship for the DW

environment. Strong support and sponsorship from senior business management is critical for a successful

DW initiative (Ponniah, 2001).

DW project team and roles (question 8) – encompasses how DW project roles and responsibilities are

formalized and implemented to solve skill-role mismatches (Humphries et al., 1999; Nagabhushana,

2006).

Requirements definition (question 10) – encompasses how requirements definition is done. In a DW, users' business requirements represent the most powerful driving force (Ponniah, 2001) as they impact virtually every aspect of the project.


Testing and acceptance (question 11) – this is a critical phase for DW success as it includes several

important activities which are not always implemented. The degree of implementation influences the

success of a DW project and hence, its maturity.

Development/ testing/ acceptance/ production environments (question 2) – encompasses the way

organizations set up different environments for different purposes to support all the development phases

(Moss & Atre, 2003).

DW quality management (question 5) – its purpose is to provide management with appropriate visibility

into the development process being used by the DW project and the products being built (Paulk et al.,

1995).

Knowledge management (question 9) – encompasses all the knowledge management activities and the

way they are implemented.

Standards (questions 3 & 4) – makes an analysis of the standards used for successfully developing,

testing and deploying DW functionalities.

DW Service Processes

In the last two decades, software maintenance began to be treated as a sequence of activities and not as

the final stage of a software development project (April et al., 2004). These processes are very important

after a DW has been deployed in order to keep the system up and running and to manage all the necessary

changes. Lately, IT organizations have made a transition from being pure technology providers to being service providers. This service-oriented perspective on IT organizations applies particularly well to the software maintenance field, as maintenance is an ongoing activity, in contrast to software development, which is more project-based (Niessink & van Vliet, 2000). Over the years, various IT service frameworks have been

proposed, but one that acts as the de-facto standard for the definition of best practices and processes for

service support and service delivery is the Information Technology Infrastructure Library (ITIL) (Salle,

2004). Therefore, we will consider the service components from ITIL as a starting point for our analysis

of the DW service processes part. Moreover, two maturity models related to IT maintenance and service

also served as a foundation for this part of our DW maturity model: the Software Maintenance Maturity

Model (April et al., 2004) and the IT Service CMM (Niessink et al., 2002). Taking into consideration

these models and the changing nature of a DW, we considered the following components when assessing

the maturity of DW service processes.

Service quality management (question 2) – this is similar to the DW quality management, but applied to

the service processes.

Knowledge management (question 3) – this is also similar to the DW development processes, but in the

context of service processes.

Service level management (question 4) – it negotiates service level agreements (SLAs) with the suppliers

and customers and ensures that they are met by continual monitoring and reviewing (Cater-Steel, 2006).

Incident management (question 5) – its main objective is to provide continuity by restoring the service in

the quickest way possible by whatever means necessary (Salle, 2004).


Change management (question 6) – it is described as a regular task for immediate and efficient handling

of changes that might occur in a DW environment.

Technical resource management (question 7) – the purpose of resource management is to maintain control of the hardware and software resources needed to deliver the agreed DW service level targets (Niessink & van Vliet, 1999).

Availability management (question 8) – manages risks and ensures that all DW infrastructure, processes,

tools and roles are according to the SLAs by using appropriate means and techniques (Colin, 2004).

Release management (question 9) – as a DW is continuously changing and evolving over time, the

objective of release management is to ensure that only authorized and correct versions of DW are made

available for operation (Salle, 2004).

Evaluation of the DWCMM

In order to validate the DWCMM, two methods were chosen – expert validation and multiple case studies

– on which we will elaborate in this section.

Expert Validation

To evaluate the utility of the DWCMM and further revise it, expert validation was applied. An "expert" is defined by Hoffman et al. (1995) as a person "highly regarded by peers, whose judgements are uncommonly accurate and reliable and who can deal effectively with rare or tough cases. Also, an expert is one who has special skills or knowledge derived from extensive experience with subdomains" (p. 132). Therefore, eliciting knowledge from experts is very important and useful, and can be done using several methods, one of them being structured and unstructured interviews (Hoffman et al., 1995).

Accordingly, five experts in data warehousing and BI were interviewed and asked to give their opinions about the content of the model we developed. The interviews were structured but consisted of open questions, in order to capture the knowledge of the respondents and enable the experts to state their opinions and ideas for improvement freely. The expert panel consists of five experts from practice, each of them having at least 10 years of experience in the DW/BI field. An overview of the experts and their affiliations is given in table 2. All of them are DW/BI consultants at different organizations in the Netherlands (local or multinational).

Respondent ID | 1 | 2 | 3 | 4 | 5
Job Position | CI/BI consultant | Principal consultant / Thought leader BI/CRM | BI consultant | Principal consultant BI | BI consultant
Industry | DW/BI Consulting | IT Services | BI Consulting | IT Services | DW Consulting
Market | B2B | B2B | B2B | B2B | B2B
Employees | ≈ 45 | ≈ 49,000 | ≈ 35 | ≈ 38,000 | ≈ 1

Table 2: Expert Overview.


The experts were asked to give their opinions on the DWCMM structure, the DWCMM condensed maturity matrix and the DW maturity assessment questionnaire. All reviewers gave positive first impressions of all three deliverables, said they made sense, and agreed the model could be applied for assessing an organization's current DW solution. Valuable insights and criticism were provided that resulted in several (mostly minor) improvements. For example, the category "Architecture" was renamed "General Architecture and Infrastructure" as the former created some confusion among the interviewees. Some adjustments were made to the ETL characterization for each stage of the DWCMM condensed maturity matrix. However, most feedback concerned the maturity assessment questionnaire. This resulted in two categories of changes: proposed changes that, due to time constraints and scope limitations, were not implemented in the final version of the model but should be considered for future research; and implemented improvement suggestions that involved some rephrasing of questions and rephrasing or changing of answers.

Multiple Case Studies

Depending on the nature of a research topic and the goal of a researcher, different (qualitative and quantitative) research methods are appropriate (Benbasat et al., 1987; Yin, 2009). One of the most widely used qualitative research methods in information systems (IS) research is case study research. It can be used to achieve various research aims: to provide descriptions of phenomena, to develop theory and to test theory (Darke et al., 1998). In our research, we use it to test theory, which in this case is the DWCMM we developed. The theory is usually either validated or found to be inadequate in some way, and may then be further refined on the basis of the case study findings. Case study research may adopt single or multiple case designs.

As, according to Benbasat et al. (1987) and Yin (2009), multiple case studies are preferred over single ones to obtain better results and analytic conclusions, we decided to conduct a multiple case study following Yin's (2009) case study approach. In this way, we can achieve a twofold goal: test the model in practice to see whether the chosen benchmark variables/categories and the maturity assessment questions and answers match the organizations' specific solutions; and receive feedback and knowledge from respondents regarding the DWCMM in order to make future improvements. Despite the fact that all individual cases are interesting, this section focuses on the overall results.

Case Overview

The case studies were conducted at four organizations of different sizes, operating in several types of industries and offering a wide variety of products and services. An overview of the case study organizations (figures are taken from 2009 annual reports) and respondents is given in table 3. The main criterion used in the search for suitable organizations was that all approached organizations had a professional DW/BI system in place whose maturity could be assessed by applying the DWCMM. Furthermore, an important criterion for the selection of the respondent per case was that the interviewed respondent had an overall view of the technical and organizational aspects of the DW/BI solution implemented in their organization. A short analysis of the maturity scores each organization obtained after taking the assessment is also given below.


Organization | A | B | C | D
Industry | Retail | Insurance | Retail | Maintenance & Servicing
Market | B2C | B2B & B2C | B2C | B2B
Revenue | 19.94 billion € | 4.87 billion € | 780 million € | NA
Employees | ≈ 138,000 | ≈ 4,500 | ≈ 3,660 | ≈ 3,500
Respondent Function | BI consultant | DW/BI technical architect | BI manager | BI consultant & DW lead architect

Table 3: Case and Respondent Overview.

Case Study Analysis

In this section, a short analysis of the results obtained by the four organizations after filling in the assessment questionnaire is given. The maturity scores for the implemented DW solutions can be seen in the table below.

Benchmark Category | Organization A | Organization B | Organization C | Organization D
Architecture | 2.67 | 2.56 | 3.89 | 3.55
Data Modelling | 2.17 | 3.44 | 3.00 | 4.11
ETL | 3.14 | 3.29 | 3.71 | 2.86
BI Applications | 2.71 | 2.71 | 3.43 | 3.57
Development Processes | 2.90 | 3.19 | 3.66 | 3.02
Service Processes | 2.63 | 3.00 | 2.87 | 3.12

Table 4: Organizations' Maturity Scores.

As shown in the figure depicting our model, a better way to see the alignment between the maturity scores for the six categories is to draw a radar graph. As an example, we show the radar graph for organization A here.

Figure 3: Alignment Between Organization A's Maturity Scores.

Some more information regarding the maturity scores for all four case studies is provided in the table below.



Maturity Score | Organization A | Organization B | Organization C | Organization D
Total Maturity Score for DW Technical Solution | 2.67 | 3.00 | 3.51 | 3.52
Total Maturity Score for DW Organization & Processes | 2.77 | 3.10 | 3.26 | 3.07
Overall Maturity Score | 2.72 | 3.05 | 3.38 | 3.29
Highest Score | ETL - 3.14 | Data Modelling - 3.44 | Architecture - 3.89 | Data Modelling - 4.11
Lowest Score | Data Modelling - 2.17 | Architecture - 2.56 | Service Processes - 2.87 | ETL - 2.86

Table 5: Maturity Scores Analysis.
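To illustrate the scoring method with these figures: organization A's total for the DW Technical Solution is (2.67 + 2.17 + 3.14 + 2.71) / 4 ≈ 2.67, its total for DW Organization & Processes is (2.90 + 2.63) / 2 ≈ 2.77, and its overall maturity score is (2.67 + 2.77) / 2 = 2.72, which matches tables 4 and 5.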

As can be seen from table 4, the maturity scores for each sub-category usually lie between 2 and 4, with one exception: organization D scored 4.11 for Data Modelling. Thus, the overall maturity scores and the total scores per category ranged between 2 and 4, which shows that most organizations are probably somewhere between the second and fourth stage of maturity. The highest maturity score was obtained by organization C, and the lowest by organization A. Apparently, an overall score close to 4 or 5 is quite difficult to achieve. This is normal in maturity assessments, as in practice nobody is very close to the ideal situation. It will be interesting to see the range of scores after the questionnaire has been filled in by a large number of organizations.

From table 5 it can be seen that the categories with the highest and lowest scores differ per organization. For example, organization A scored lowest for Data Modelling, whereas Data Modelling was the most mature variable for organization D. Interesting conclusions can also be drawn by comparing the scores of organizations A and C, as they are part of the same industry. The former is an international food retailer with more experience in this industry, whereas the latter is a local one with less experience. Nevertheless, organization A obtained a rather low DW maturity score. Thus, experience in the industry does not necessarily imply maturity in data warehousing. Of course, more factors can influence this difference in scores: size, the way data warehousing/BI is embedded in the organizational culture, the percentage of the IT budget allocated to BI, etc.

However, the goal of our model is not only to give a maturity score to a specific organization, but also to provide it with feedback and the necessary steps for reaching a higher maturity stage. For example, the overall maturity score for organization A is 2.72, which leaves a lot of room for improvement. Moreover, as its lowest score is for Data Modelling, this category would be a good starting point. Due to confidentiality reasons, more details regarding the maturity scores and feedback cannot be given here.

Benchmarking

As already mentioned in the previous sections, the DWCMM can serve as a benchmarking tool for organizations. The DW maturity assessment questionnaire provides a quick way for organizations to assess their DW maturity and, at the same time, compare themselves objectively against others in the same industry or across industries. Of course, better benchmarking results will be achieved once more organizations have taken the maturity assessment. However, to give a better impression of what such a comparison looks like, we provide an example here for organization A, using the data from the case studies we performed. The bar chart is shown below.

Figure 4: Benchmarking for Organization A.
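A bar chart of this kind can be produced, for example, with the following Python sketch. Organization A's scores and the cross-case averages are derived from table 4, while the best-practice values are simply taken to be the maximum score of 5 for illustration.

```python
# A sketch of the benchmarking bar chart; per-category averages over the
# four cases are computed from table 4, best practice is assumed to be 5.
import numpy as np
import matplotlib.pyplot as plt

categories = ["Architecture", "Data Modelling", "ETL", "BI Applications",
              "Development Processes", "Service Processes"]
org_a = [2.67, 2.17, 3.14, 2.71, 2.90, 2.63]
avg   = [3.17, 3.18, 3.25, 3.11, 3.19, 2.91]  # mean of organizations A-D
best  = [5.0] * len(categories)               # assumed best practice

y = np.arange(len(categories))
plt.barh(y + 0.25, best, height=0.25, label="Best Practice")
plt.barh(y, avg, height=0.25, label="Average Score")
plt.barh(y - 0.25, org_a, height=0.25, label="Organization A")
plt.yticks(y, categories)
plt.xlim(0, 5)
plt.gca().invert_yaxis()  # list categories top-down, as in figure 4
plt.legend()
plt.show()
```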

To sum up, the DW maturity assessment questionnaire can be successfully applied in practice. We generally received positive feedback on the questionnaire from the case study interviewees. In this way, we could test whether the questions and their answers are representative for assessing the current DW solution of a specific organization and whether they can be mapped to any organization depending on the situational factors. Respondents usually had no problems recognizing the proposed benchmark categories and understanding the questions and answers from the survey. We also had the chance to apply the scoring method and give appropriate feedback for each case study. Finally, we combined all the feedback received from the case studies and made some minor but valuable improvements to several questions and answers, in order to make them more representative of the analyzed characteristics and a better fit for the maturity stages.

Conclusions and Further Research

This research was triggered by estimates made by Gartner (2007) and other researchers that more than fifty percent of DW projects have limited acceptance or fail. Therefore, we developed a Data Warehouse Capability Maturity Model (DWCMM) to help organizations assess the technical aspects of their current DW solution and provide guidelines for future improvements. In this way, we attempted to answer the main research question of our study:

How can the maturity of a company's data warehouse technical aspects be assessed and acted upon?

The main conclusion from our study is that, even if our maturity model can help organizations improve their DW solutions, there is no "silver bullet" for the successful development of DW/BI solutions. The DWCMM provides a quick way for organizations to assess their DW/BI maturity and compare themselves objectively against others in the same industry or across industries. It received positive feedback from the five experts who reviewed and validated it, and it also resonated well with the audiences from our four case studies. Several (mostly minor) improvements were made after the validation process.



However, our model is not without limitations. First of all, it is critical to emphasize that the model only performs a high-level assessment. In order to truly assess the maturity of their DW/BI solutions and discover the strong and weak variables, organizations should use our assessment as a starting point for a more thorough analysis. In the future, several questions could be added to our model for a more detailed analysis of the current DW/BI environment, allowing more valuable feedback to be offered to organizations.

Second, a limitation of this study is that it is based on design science research, which answers research questions in the form of design artifacts. As this is a qualitative research method, a risk to objectivity might arise. Another limitation is related to the validation process for our model: due to time constraints and the difficulty of finding suitable experts, it was reviewed by only five of them. Therefore, more experts should be interviewed in the future to enrich the structure and content of the model. Also, because the model was tested in only four cases, it is not possible to generalize the findings to any given similar situation. For further research, it would be interesting to validate the model using quantitative research methods. In this way, we would be able to perform statistical analysis on the data, more valuable benchmarking and improvements to the whole structure of the model. Another future extension that would increase the value of the model could include questions and analysis for other types of data modelling (e.g. normalized modelling, data vault, etc.) because, as stated earlier in this paper, we limited our maturity assessment to dimensional modelling. Last but not least, more work is also needed to extend our model to the analysis of DW/BI end-user adoption and business value. New benchmark categories and maturity assessment questions could be added regarding these two aspects.

References

AbuAli, A., & Abu-Addose, H. (2010). Data Warehouse Critical Success Factors. European Journal of Scientific Research, 42(2), 326-335.

Aldrich, H., & Mindlin, S. (1978). Uncertainty and Dependence: Two Perspectives on Environment. In L. Karpik, Organization and Environment: Theories, Issues and Reality (pp. 149-170). London: Sage Publications Inc.

Arnott, D., & Pervan, G. (2005). A Critical Analysis of Decision Support Systems Research. Journal of Information Technology, 20(2), 67-87.

Blumberg, R., & Atre, S. (2003). The Problem with Unstructured Data. Retrieved July 23, 2010, from Information Management: http://www.information-management.com/issues/20030201/6287-1.html

Cater-Steel, A. (2006). Transforming IT Service Management - the ITIL Impact. Proceedings of the 17th Australasian Conference on Information Systems. Adelaide, Australia.

Cavaye, A. (1996). Case Study Research: A Multifaceted Research Approach for Information Systems. Information Systems Journal, 6, 227-242.

Chamoni, P., & Gluchowski, P. (2004). Integrationstrends bei Business-Intelligence-Systemen: Empirische Untersuchung auf Basis des Business Intelligence Maturity Model. Wirtschaftsinformatik, 46(2), 119-128.

Chauduri, S., & Dayal, U. (1997). An Overview of Data Warehousing and OLAP Technology. ACM Sigmod Record, 26(1), 65-74.

Choo, C. (1995). Information Management for the Intelligent Organization. Medford, NJ: Information Today, Inc.

Colin, R. (2004). An Introductory Overview of ITIL. Reading, United Kingdom: itSMF Publications.

de Bruin, T., Freeze, R., Kulkarni, U., & Rosemann, M. (2005). Understanding the Main Phases of Developing a Maturity Assessment Model. Proceedings of the 16th Australasian Conference on Information Systems. Sydney, Australia.

Eckerson, W. (2004). Gauge Your Data Warehousing Maturity. Retrieved July 3, 2010, from The Data Warehousing Institute: http://tdwi.org/Articles/2004/10/19/Gauge-Your-Data-Warehousing-Maturity.aspx?Page=2

Feinberg, D., & Beyer, M. (2010). Magic Quadrant for Data Warehouse Database Management Systems. Retrieved July 21, 2010, from Business Intelligence: http://www.businessintelligence.info/docs/estudios/Gartner-Magic-Quadrant-for-Datawarehouse-Systems-2010.pdf

Gartner. (2007, February 1). Creating Enterprise Leverage: The 2007 CIO Agenda. Retrieved June 24, 2010, from Gartner: http://www.gartner.com/DisplayDocument?id=500835

Gray, P., & Negash, S. (2003). Business Intelligence. Proceedings of the 9th Americas Conference on Information Systems (pp. 3190-3199). Tampa, Florida, USA.

Hakes, C. (1996). The Corporate Self Assessment Handbook (3rd ed.). London: Chapman & Hall.

Hevner, A., March, S., Park, J., & Ram, S. (2004). Design Science in Information Systems Research. Management Information Systems Quarterly, 28(1), 75-106.

Hoffman, R., Shadbolt, N., Burton, A., & Klein, G. (1995). Eliciting Knowledge from Experts: A Methodological Analysis. Organizational Behaviour and Human Decision Processes, 62(2), 129-158.

Hwang, H., Ku, C., Yen, D., & Cheng, C. (2005). Critical Factors Influencing the Adoption of Data Warehouse Technology: A Study of the Banking Industry in Taiwan. Decision Support Systems, 37, 1-21.

Inmon, W. (1992). Building the Data Warehouse. Indianapolis: John Wiley and Sons, Inc.

Kaula, R. (2009). Business Rules for Data Warehouse. International Journal of Information Technology, 5, 58-66.

Kaye, D. (1996). An Information Model of Organization. Managing Information, 3(6), 19-21.

Kimball, R., Ross, M., Thornthwaite, W., Mundy, J., & Becker, B. (2008). The Data Warehouse Lifecycle Toolkit (2nd ed.). Indianapolis: Wiley Publishing, Inc.

Klimko, G. (2001). Knowledge Management and Maturity Models: Building Common Understanding. Proceedings of the 2nd European Conference on Knowledge Management (pp. 269-278). Bled, Slovenia.

Lewis, J. (2001). Project Planning, Scheduling and Control (3rd ed.). New York: McGraw-Hill.

Madden, S. (2006). Rethinking Database Appliances. Retrieved July 21, 2010, from Information Management: http://www.information-management.com/specialreports/20061024/1066827-1.html?pg=1

March, S., & Hevner, A. (2007). Integrated Decision Support Systems: A Data Warehousing Perspective. Decision Support Systems, 43(3), 1031-1043.

Moss, L., & Atre, S. (2003). Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications. Boston: Addison Wesley.

Nagabhushana, S. (2006). Data Warehousing: OLAP and Data Mining. New Delhi: New Age International Limited.

Navathe, S. B. (1992). Evolution of Data Modelling for Databases. Communications of the ACM, 35(9), 112-123.

Nolan, R. (1973). Managing the Computer Resource: A Stage Hypothesis. Communications of the ACM, 16(7), 399-405.

Ponniah, P. (2001). Data Warehousing Fundamentals. New York: John Wiley & Sons, Inc.

Sacu, C., Spruit, M., & Habers, F. (2010). Data Warehouse (DW) Maturity Assessment Questionnaire. Utrecht: Utrecht University.

Salle, M. (2004). IT Service Management and IT Governance: Review, Comparative. Retrieved July 16, 2010, from HP Technical Reports: http://www.hpl.hp.com/techreports/2004/HPL-2004-98.pdf

Sen, A., & Sinha, A. (2005). A Comparison of Data Warehousing Methodologies. Communications of the ACM, 48(3), 79-84.

Vaishnavi, V., & Kuechler, W. (2008). Design Science Research Methods and Patterns: Innovating Information and Communication Technology. Boca Raton, Florida: Auerbach Publications Taylor & Francis Group.

Yin, R. (2009). Case Study Research: Design and Methods. Thousand Oaks, California: SAGE Inc.