
TDWI Modern Data Warehousing Maturity Model Guide

Interpreting Your Assessment Score

By Fern Halper

2018


Research Sponsor



tdwi.org 1

TDWI RESEARCH

© 2018 by TDWI, a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. Email requests or feedback to [email protected].

Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies. Inclusion of a vendor, product, or service in TDWI research does not constitute an endorsement by TDWI or its management. Sponsorship of a publication should not be construed as an endorsement of the sponsor organization or validation of its claims.

This report is based on independent research and represents TDWI’s findings; reader experience may differ. The information contained in this report was obtained from sources believed to be reliable at the time of publication. Features and specifications can and do change frequently; readers are encouraged to visit vendor websites for updated information. TDWI shall not be liable for any omissions or errors in the information in this report.

Table of Contents

Foreword from the Author

Value of a Maturity Model

Model Dimensions

Data Warehouse Modernization Overview

Support for Diverse Data

Support for Sophisticated Analytics

Role of the Public Cloud

Stages of Maturity

Stage One: Traditional Warehouse

Stage Two: Modernizing Begins

Inflection Point

Stage Three: Modern

Stage Four: Advanced/Visionary

Evaluating Assessment Scores

Scoring

Interpretation

Summary


About the Author

FERN HALPER, Ph.D., is VP and senior director of TDWI Research for advanced analytics. She is well known in the analytics community, having been published hundreds of times on data mining and information technology over the past 20 years. Halper is also coauthor of several Dummies books on cloud computing and big data. She focuses on advanced analytics, including predictive analytics, text and social media analysis, machine learning, AI, cognitive computing, and big data analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead data analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her by email ([email protected]), on Twitter (twitter.com/fhalper), and on LinkedIn (linkedin.com/in/fbhalper).

About TDWI Research

TDWI Research provides research and advice for BI professionals worldwide. TDWI Research focuses exclusively on analytics and data management issues and teams up with industry practitioners to deliver both broad and deep understanding of the business and technical issues surrounding the deployment of business intelligence and data management solutions. TDWI Research offers reports, commentary, and inquiry services via a worldwide membership program and provides custom research, benchmarking, and strategic planning services to user and vendor organizations.

Sponsor

Google sponsored the research for this TDWI Guide and its accompanying Interactive Assessment Tool.


Foreword from the Author

TDWI research indicates that organizations are often evolving, extending, and modernizing their data warehouse environments. This is being driven by a number of factors, including the need to support business goals, the need for greater capacity for “new” and real-time data types, the need to support modern analytics practices that involve “new” techniques such as machine learning, and the need for real-time data to support decision making.

Although some organizations are satisfied with the traditional warehouse that deals primarily in structured data and reporting, the vast majority of organizations we survey recognize the importance of modernizing their warehouse environment. This modernization can take many forms, from server upgrades to adding new platforms such as appliances, data lakes, or streaming platforms. The cloud also plays an important part in the modernization effort. Increasingly, organizations are moving their data to the cloud to take advantage of its flexibility and ability to deal with data at scale.

In fact, the notion of a warehouse is evolving along with the technology. The old requirements for a warehouse were largely about data content: dimensional models, time series, OLAP, and provisioning data for reporting. The new requirements include support for real-time or streaming data, support for multistructured data and advanced analytics, and a services architecture. For example, newer models such as serverless computing are gaining traction. In this cloud-computing model, the cloud provider dynamically manages the allocation of machine resources, which frees up the organization to focus on more important matters, such as data analysis.

Of course, there are many ways that organizations can modernize their warehouse to meet the evolving needs of the business. Some important capabilities to measure maturity in the modernization effort include support for data diversity, infrastructure agility, analytics, collaboration, and security and governance.

We are excited to offer a maturity model for modern data warehousing because modern warehouse environments are an important market trend. TDWI is well known for its maturity models and assessment tools. In early 2014, we created a big data maturity model to help organizations understand how their big data and big data analytics deployments compared with those of their peers and how they could advance with analytics. The next year we created an analytics maturity model. We followed that up with two readiness assessments: one for IoT readiness and the other for Hadoop readiness. More recently, we published a self-service maturity model and an advanced analytics maturity model.

I trust you will find this guide and the related maturity model useful in your modernization efforts.

Fern Halper, TDWI VP Research, Sr. Director Research for Advanced Analytics


Value of a Maturity Model

TDWI Research indicates that organizations see modernizing their warehouse as an opportunity that can lead to improvements in decision making, analytics, real-time data usage, and business operations. A maturity model can help guide business and IT professionals on their data warehouse modernization journey. It provides a framework for an enterprise to understand where it is, where it has been, and where it still needs to go to support capabilities and requirements for its modernization efforts. The model can also guide an organization at the beginning of its journey by helping it understand best practices used by companies that are further along in their deployments.

A great feature of TDWI maturity models is the interactive benchmark assessment. At the end of the survey, you will be able to quantify how mature/modern your data warehouse is in an objective way, understand your progress, and identify what it will take to get to the next level of maturity. This guide will help you understand the phases of maturity in modern data warehousing and interpret your benchmarking scores.

Model Dimensions

The Modern Data Warehouse Maturity Model assessment asks approximately 50 questions across five categories that form the dimensions of the TDWI Modern Data Warehouse Maturity Model (see Figure 1).

• DATA DIVERSITY. A data warehouse should be able to manage and support large amounts of multistructured data. Does the warehouse support a variety of data types? Can the organization make decisions based on current data in the warehouse? Is that data fresh? Is the data accessible in real time? Is the schema flexible for new data sources? Can the warehouse scale up and down easily to support varying amounts of disparate data? Can users get a holistic view of data across sources, both internal and external to the company, including data that is not in the data warehouse?

• INFRASTRUCTURE AGILITY. Warehouse infrastructure needs to be agile to support the various needs of the organization. How integrated (e.g., supports multiple IT components) is the data warehouse architecture in support of business use cases? Does the warehouse provide the freedom to query data from anywhere? Can the warehouse ingest data in real time? Can data be processed in the warehouse and analyzed? Does the warehouse separate compute from storage such that customers can scale each independently? Can compute be brought to the data rather than moving the data?

TDWI defines a data warehouse as a data architecture populated with data, and the data is managed by a data warehouse platform that is usually based on a relational database management system and its hardware. The systems architecture of data warehouses has become a prominent source of innovation and modernization because it includes new platform types and creative user practices. Note that the actual warehouse is largely data and should not be confused with the data platforms and their enterprise servers, which are key components of a data warehouse infrastructure.1

1 Philip Russom, 2016, TDWI Best Practices Report: Data Warehouse Modernization in the Age of Big Data Analytics, online at tdwi.org/bpreports.


• ANALYTICS SUPPORT. Analytics is the key use case for the data warehouse. What is the scope of analytics supported by the warehouse? Does it support more advanced analytics such as machine learning or predictive analytics? Does the warehouse support geospatial data types and functions? Is performance acceptable? Can organizations analyze “new” forms of data in the warehouse? Does the warehouse support production analytics? Is it easy to explore data in the warehouse?

• SHARING AND COLLABORATION. Organizational data strategies that allow for sharing and collaboration are also a sign of data warehouse maturity. Does the data strategy align with the business strategy? Can the organization share its data both internally and externally? Does operational overhead stymie the data warehouse? Does the data warehouse allow users to work collaboratively by sharing data and workloads/queries?

• SECURITY AND GOVERNANCE. Governance is critical to data warehouse maturity. How easy to understand is the company’s data governance strategy in support of the data warehouse? Are policies and processes in place, e.g., for data access? Are data quality processes deployed and measured? Are there preferred standards in place? Is tooling in place to support governance? How does the organization secure the data in the warehouse and across warehouse platforms?
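Asking whether data quality processes are "deployed and measured" presupposes some measurement. As a minimal sketch, assuming rows arrive as Python dictionaries (a hypothetical stand-in for a warehouse table), the helper below reports the null fraction, distinct-value count, and observed value types per column. The `profile` function is illustrative only; real profiling tools check much more, such as value ranges, patterns, and referential integrity.

```python
from typing import Any, Dict, List


def profile(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: basic per-column data quality metrics.

    A key missing from a row is counted as null, like a NULL column value.
    Values must be hashable so that distinct values can be counted.
    """
    columns = sorted({col for row in rows for col in row})
    report: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "null_fraction": 1 - len(present) / len(values),
            "distinct": len(set(present)),
            "types": sorted({type(v).__name__ for v in present}),
        }
    return report


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3},
]
for column, stats in profile(rows).items():
    print(column, stats)
```

Tracking these numbers over successive loads is one simple way to turn "data quality is measured" from a yes/no answer into a trend.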

Figure 1. TDWI Modern Data Warehouse Maturity Model framework.


The Modern Data Warehouse Maturity Model consists of four stages plus an inflection point where the data warehouse becomes more modern (see Figure 2).

Figure 2. The four stages of maturity for a modern data warehouse.

Data Warehouse Modernization Overview

Data warehouse platforms are constantly evolving. Almost 10 years ago, TDWI wrote about what we called “generational changes” that adapt the resulting data warehouse to changing business and technology requirements.2 Back then, a majority of respondents to our surveys were using a centralized enterprise data warehouse. The reality is that many of the traditional, older warehouse environments cannot meet the requirements (e.g., iterative analytics with good performance) for sophisticated analysis of real-time data at scale. As organizations move from batch to real-time, from reporting to advanced analytics, they are extending their platforms to become more responsive to business needs.

Today, many organizations are implementing what TDWI terms a multiplatform modern data architecture to cope with changes in data and analytics. These multiplatform environments include both on-premises and cloud deployments. Our research shows that the cloud is a major growth area for data warehouse modernization; organizations are looking to move all or part of their warehouses to the cloud as well as other platforms, which form the extended warehouse environment. This evolving environment supports diverse data as well as advanced and real-time analytics. These two big capabilities are described below.

Support for Diverse Data

Although many organizations are still dealing with structured data in their data warehouse, TDWI research indicates that companies have increasing interest in disparate kinds of data. This includes text data, images, streaming data, geospatial data, and machine-generated data, to name a few. This data comes from both internal and external sources and is becoming more critical for driving business value. For instance, many organizations are looking to analyze sensor data for predictive maintenance. They are collecting data about customers, including social media data, to better understand and improve the customer experience. The current warehouse environment may not be able to support this variety of data.

Likewise, organizations want to make decisions using a holistic view of data from both internal and external sources. That means users want a unified view of potentially distributed data. They want the data to be current, which means that the warehouse environment needs to refresh potentially large amounts of data in real time or near real time. This might involve continuous ingestion, which is quite fast and frequent compared to batch loads.
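A unified view of distributed data can be approximated by federating one query across separate stores instead of first copying everything into one place. The sketch below is only an analogy: it uses Python's built-in `sqlite3` module, with an attached second in-memory database standing in for an external source. The table names are hypothetical, and production federation or virtualization layers work across different engines and networks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                 # stands in for the warehouse
conn.execute("ATTACH DATABASE ':memory:' AS ext")  # stands in for a separate external source

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO main.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

conn.execute("CREATE TABLE ext.web_events (customer_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO ext.web_events VALUES (?, ?)",
    [(1, "/pricing"), (1, "/docs"), (2, "/pricing")],
)

# One query spans both databases; the events are never copied into main.
activity = conn.execute(
    """
    SELECT c.name, COUNT(e.page) AS page_views
    FROM main.customers AS c
    LEFT JOIN ext.web_events AS e ON e.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(activity)  # [('Acme', 2), ('Globex', 1)]
```

The design trade-off is the usual one for federation: no duplicate copy to keep in sync, at the cost of query performance that depends on every participating source.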


2 Philip Russom, 2009, TDWI Best Practices Report: Next Generation Data Warehouse Platforms, online at tdwi.org/bpreports.


Support for Sophisticated Analytics

One of the top drivers for data warehouse modernization is the need to support analytics. This takes a number of forms.

UTILIZING MODERN ANALYTICS. Although the traditional warehouse is well suited for batch-oriented reports and dashboards, organizations are looking for their modern warehouse environment to support more sophisticated analytics. For example, machine learning algorithms, which can learn to identify patterns from large amounts of data, are becoming quite popular. Machine learning is typically an iterative process that involves tuning and model refinement. Organizations are using machine learning to predict conversions from clickstreams or to cluster orders to determine high-level categories. Organizations are also using deep learning to train systems to recognize images; for instance, deep learning is used to classify photos for online auto sales or to identify other products. Enterprises are also interested in IoT analytics, which typically involves the cloud as part of the streaming architecture. All of these use cases may be a challenge for the traditional, on-premises data warehouse.

INFRASTRUCTURE TO SUPPORT QUERIES. Likewise, organizations are interested in querying large amounts of data using SQL. SQL has been part of the fabric of organizational analytics for years, and companies will continue to want to use it against big data. Traditional warehouse environments may have performance issues when running SQL against data at scale and may require frequent performance tuning.
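One way to keep SQL at scale efficient is to let the engine aggregate where the data lives rather than pulling raw rows out to a client. In this sketch, assuming a toy `sqlite3` table named `sales` (hypothetical), both functions return identical totals, but the row counts they report show how much data would cross the wire: every row for client-side aggregation versus one row per group when the aggregation is pushed into the database.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east" if i % 2 else "west", float(i)) for i in range(1, 10_001)],
)


def totals_by_moving_data(conn: sqlite3.Connection) -> Tuple[Dict[str, float], int]:
    """Pull every row to the client, then aggregate in Python."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals), len(rows)


def totals_in_database(conn: sqlite3.Connection) -> Tuple[Dict[str, float], int]:
    """Push the aggregation to the engine; only one row per region comes back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


client_totals, rows_moved = totals_by_moving_data(conn)
db_totals, rows_returned = totals_in_database(conn)
print(client_totals == db_totals, rows_moved, rows_returned)  # True 10000 2
```

Here everything runs in one process, so the row counts are only a proxy for network transfer; the point is the shape of the work, not the timing.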

REAL-TIME ANALYTICS. User organizations continue to push their applications for BI and analytics closer to real-time operation. This is because fresh information can support fast-paced, time-sensitive business processes such as operational BI, fraud detection, facility monitoring, and recommendations in e-commerce. This requires capturing and updating data as it is created.
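Real-time use cases such as fraud detection often start with simple rules evaluated as each event arrives. The class below is a hypothetical velocity rule, not any particular product's logic: it keeps a sliding time window of recent transaction timestamps per card and flags a card that exceeds a threshold within that window. It assumes events for a given card arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class VelocityCheck:
    """Hypothetical rule: flag a card with more than max_txns transactions
    in the last window_s seconds (window inclusive of its start)."""

    def __init__(self, window_s: float, max_txns: int) -> None:
        self.window_s = window_s
        self.max_txns = max_txns
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, card_id: str, ts: float) -> bool:
        """Record one transaction and return True if the card should be flagged."""
        q = self._recent[card_id]
        q.append(ts)
        # Evict timestamps that have fallen out of the window.
        while q and ts - q[0] > self.window_s:
            q.popleft()
        return len(q) > self.max_txns


check = VelocityCheck(window_s=60.0, max_txns=3)
events = [("A", 0.0), ("A", 10.0), ("A", 20.0), ("A", 30.0), ("A", 95.0), ("B", 30.0)]
for card, ts in events:
    print(card, ts, check.observe(card, ts))
```

Because each card keeps only the timestamps inside its window, memory stays bounded by the event rate, which is what lets this kind of check run as data is created rather than after a batch load.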

NEW WAREHOUSE PRACTICES. There are a number of practices that are playing an important role in this evolving data environment to support warehouse performance and analytics. These include:

• CHANGES IN ETL. One important trend we see at TDWI is that ETL is changing. Instead of heavily transforming the data before putting it into the warehouse, organizations are more often inserting data in whatever form is convenient for the customer directly into the modern warehouse. Then they either apply the structure they need at query time or put the data through a transformation after ingestion. This latter model is known as extract, load, transform (ELT). One distinct advantage is that the transformations are now captured in SQL, which is a more understandable and maintainable language than the ETL pipelines of the past.

• BRINGING COMPUTE TO THE DATA. An important trend in data warehousing is to process data in the warehouse, where the data is stored, rather than moving it. This makes sense; it can be quite time consuming and costly to move terabytes of data from one place to another for analysis. It also poses security issues. The trend is to bring compute to the data for both processing and analysis.

• SEPARATING COMPUTE FROM STORAGE. Although many DBMSs separate compute from storage, some of the newer platforms, such as Hadoop, are not designed that way. Here, each node includes both storage and compute. However, when compute and storage are coupled, growing one means growing the other. This can be costly, especially for analysis. It makes more sense to separate the two: data remains, for instance, in an object store, and compute is used only when needed.

• AUTOMATION. Removing manual, repetitive processes (such as compression, defragmentation, and backup) can be cost-effective and increase productivity for routine operations. These include automation to replace manually intensive data warehouse operations as well as utilizing more advanced analytics for data profiling, preparation, and transformation in processing data that moves to the warehouse.
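The ELT pattern described under CHANGES IN ETL can be illustrated with Python's built-in `sqlite3` module standing in for the warehouse. This is a minimal sketch with hypothetical table names: raw records are loaded as text with no upfront cleanup, and the transformation is then written as SQL that runs inside the database after ingestion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw records as-is, all as text, with no upfront cleanup.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "$12.50", " us"), ("2", "$7.25", "US "), ("3", "$30.00", "de")],
)

# Transform: apply structure after ingestion, expressed in SQL inside the database.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER)              AS order_id,
           CAST(REPLACE(amount, '$', '') AS REAL) AS amount,
           UPPER(TRIM(country))                   AS country
    FROM raw_orders
    """
)

totals = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('DE', 30.0), ('US', 19.75)]
```

Because the raw table is kept, the transformation can be revised and rerun later without re-extracting from the source, which is one practical reason the ELT ordering appeals to warehouse teams.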

Role of the Public Cloud

The public cloud is playing an important role in data warehouse modernization. The cloud enables organizations to spin up systems faster and respond to immediate business demand for data and analytics sooner. Organizations have many options when it comes to the cloud. Some organizations plan to move their entire warehouse to the cloud. Others will move their data lakes to the cloud. Still others will identify a core data set that has value and send that to the cloud on a regular basis. Cloud vendors also provide data warehouse options. Newer options such as serverless computing are, in many ways, redefining the data warehouse. In this model, the provider dynamically manages the allocation of machine resources. It scales up and down automatically in response to current load. The organization is charged only for the compute time used.
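A simplified cost model makes "charged only for the compute time used" concrete. All rates and hours below are hypothetical, and real pricing varies by vendor (some services bill by data scanned rather than by compute time), so treat this as arithmetic under stated assumptions rather than a pricing comparison.

```python
HOURS_IN_MONTH = 30 * 24  # 720 hours in a hypothetical 30-day month


def provisioned_cost(hourly_rate: float, hours: float = HOURS_IN_MONTH) -> float:
    """Always-on cluster: billed for every hour, busy or idle."""
    return hourly_rate * hours


def serverless_cost(rate_per_busy_hour: float, busy_hours: float) -> float:
    """Serverless, as simplified here: billed only for hours of compute actually used."""
    return rate_per_busy_hour * busy_hours


def break_even_busy_hours(hourly_rate: float, rate_per_busy_hour: float) -> float:
    """Busy hours per month above which the always-on cluster becomes cheaper."""
    return provisioned_cost(hourly_rate) / rate_per_busy_hour


# Hypothetical workload: 2 busy hours per day for 30 days.
print(provisioned_cost(4.0))            # 2880.0 (always-on at 4.0 units/hour)
print(serverless_cost(6.0, 2 * 30))     # 360.0 (60 busy hours at 6.0 units/hour)
print(break_even_busy_hours(4.0, 6.0))  # 480.0
```

Even with a higher per-hour rate, pay-per-use wins for bursty workloads in this model; the break-even figure shows where a steadily busy warehouse would tip the other way.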

The cloud and serverless models provide a number of advantages when it comes to modernizing the data warehouse:

• ELASTICITY AND SCALABILITY. Users like the ability of the cloud to scale on demand. Companies can obtain storage and compute resources when needed. This is especially appealing to organizations looking to onboard new data sources and analyze data in new ways. The cloud can automatically allocate compute resources when analytics workloads ramp up and then reallocate them when the work is finished. Data can remain in a cloud data store. The cloud also reduces the amount of time needed for capacity planning, especially compared to adding server nodes on premises or having to re-tune and restructure a relational database management system.

• SPEED AND AGILITY. The speed of cloud implementations also provides agility: cloud compute is available immediately. With some of the new serverless models, provisioning is automatic, which means that the organization buying cloud services does not even need to provision servers.

• TOTAL COST OF OWNERSHIP. Start-up costs can be lower when using the cloud versus on-premises data platforms. Maintenance costs can also be lower. Although there may be hidden costs associated with the cloud, many organizations feel that the faster time to implement, the ease of use, and the agility are overriding factors.
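The elastic allocation described under ELASTICITY AND SCALABILITY can be pictured as a scaling policy that sizes compute to the pending workload and releases it when the work is finished. The `desired_workers` function is a hypothetical, deliberately simple rule; real cloud autoscalers add cooldown periods, warm-up time, and other signals.

```python
import math


def desired_workers(pending_tasks: int, tasks_per_worker: int,
                    min_workers: int = 0, max_workers: int = 100) -> int:
    """Hypothetical policy: enough workers to cover the backlog, clamped to limits.

    With min_workers=0, compute scales to zero when nothing is pending.
    """
    needed = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))


# A workload that ramps up, spikes past the cap, and then finishes.
for backlog in [0, 25, 400, 5000, 0]:
    print(backlog, desired_workers(backlog, tasks_per_worker=10))
```

The `max_workers` cap stands in for a budget or quota limit; without it, a runaway backlog would translate directly into runaway cost.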

At TDWI, we’ve seen different approaches to the cloud, as mentioned above. We’ve seen some organizations migrate most of their data warehouse to the cloud. We’ve seen organizations move their entire warehouse to the cloud. We’ve seen other organizations migrate pieces of their warehouse to the cloud in a hybrid model. For example, an organization might utilize the cloud to offload its newer data for exploratory analysis and more sophisticated analysis by its data scientists. Alternatively, it might offload data to the cloud to lighten the load on overburdened on-premises systems. The point is that as organizations modernize their warehouse environments, they are able to mature in terms of the kinds of data they can collect and analyze to drive value for their business.



Stages of Maturity

There are four stages of maturity in the TDWI Modern Data Warehouse Maturity Model: Traditional, Modernizing/Evolving, Modern, and Advanced/Visionary. The characteristics of each stage are described below.

Stage One: Traditional Warehouse

In this stage, the primary warehouse is typically the traditional enterprise data warehouse. It is usually on premises and consists mostly of well-understood structured data. Data from transactional systems is loaded into the platform in batch mode, which can feed data marts and cubes for operational reporting.

• DATA DIVERSITY. In this stage, the warehouse stores primarily structured data in a centralized on-premises warehouse for business requirements around reporting and dashboards. The warehouse might also support performance management activities. The data in the warehouse is typically well understood and has been cleansed for known production requirements. The traditional warehouse does not usually support semistructured, multistructured, or streaming data. The warehouse is managed by IT and typically holds data volumes of up to about 10 terabytes.

• INFRASTRUCTURE AGILITY. Although the warehouse does a good job of supporting operational dashboards and reports, it is limited in the way it can support the needs of a real-time business. For example, data is loaded in batch, maybe once a day or at longer intervals. Data is not refreshed often, perhaps every month or two. Users may be able to interact with dashboards and perhaps visual analytics, but they are often limited in how they can access data from a warehouse platform for more advanced analytics. Additionally, the platform does not scale to support disparate data types. It can take a long time to update the warehouse to support user requirements in terms of new data or new analytics. Performance tuning and capacity planning are common activities associated with the traditional warehouse. More fundamentally, sometimes the infrastructure cannot be scaled at all; adding 1TB of memory can require buying an entire new rack of hardware that adds time and cost.

• ANALYTICS SUPPORT. Although the warehouse team may support self-service visual analytics from the warehouse environment in addition to batch reports and dashboards, it is hard for the warehouse to support analytics beyond this. That means that if someone wants to build analytics models using machine learning or predictive analytics, they would have to ask for a data dump from the warehouse to do so. Because the warehouse doesn’t house unstructured or other kinds of “new” data, it would be difficult to use the warehouse environment to support text or streaming analytics. Without real-time refresh capabilities, it is also difficult to utilize analytics to make real-time decisions.

• SHARING AND COLLABORATION. In this stage, access to the data warehouse is often set by strict policies. As stated above, it may be difficult to get data for analysis. This may be due to company policies along with a lack of tooling to make it easy to access data from the warehouse. Additionally, organizations at this stage might collaborate on visual storytelling but not on analytics model building. The tools and techniques available to them make this difficult. In the traditional warehouse stage, organizations understand that they will need to do more. They start to realize that the warehouse will have to support multiple personas, including business analysts, data scientists, and data engineers, to name a few.

• SECURITY AND GOVERNANCE. During this stage of maturity, data governance may be in place, but it is typically IT driven and quality focused. This is not necessarily a bad thing; it is just that organizations will need to make some changes as they modernize their environments.

Although the warehouse supports the current needs of the organization, there may be discussion starting about extending or enhancing the platform to support the future requirements of the organization around disparate data. If the discussion isn’t happening, it should be.

Stage Two: Modernizing/Evolving Begins

When modernizing begins, the organization is facing the reality of new data and analytics requirements, often driven by the business. The company may be looking to collect data from new sources. There may be high volumes of this data. Warehouse scale and performance are not where they need to be for storing, querying, and extracting data. Additionally, the organization may have hired some data scientists to start doing more sophisticated analysis on a particular project, or those data scientists may want to iterate on models on the fly. In other words, by virtue of the demand for insights from analytics, the warehouse is becoming overburdened. This means that the organization will be looking to deploy new features or platforms such as in-database analytics or the cloud.

• DATA DIVERSITY. At this stage, the data warehouse environment begins to expand as the organization starts to invest in new platforms to manage diverse data. Data volumes also increase (often beyond 10 TB). Here, too, organizations are starting to make the move to an integrated holistic view of the data. That might involve federation or virtualization across data sources. Typically, architects will get involved to evolve the design of the data warehouse to include diverse data types.

• INFRASTRUCTURE AGILITY. Here, the data warehouse infrastructure might expand to include other platforms or services such as a cloud data warehouse, columnar store, or data lake to manage and analyze disparate data and move towards real time. Some platforms might be cloud-based. Whether on premises or in the cloud, the data warehouse is supporting more complex data pipelines including ingestion, processing, and analytics. Given the nature of new, big, potentially streaming data, massively parallel processing (MPP) warehouses are more common. In fact, during this stage, the organization might use proofs of concept to experiment with new platforms that provide agility, flexibility, and scalability.


• ANALYTICS SUPPORT. In this stage, the data warehouse is evolving to support a growing analytics portfolio including visual analytics, self-service, and predictive analytics. The key is that the warehouse is supporting analytics that it couldn’t support well before. As organizations collect data at scale and in real time, they are also looking for the warehouse to support iterative SQL against large amounts of data. The warehouse might bring the compute to the data for better performance. Different personas are starting to use the warehouse environment for analytics—be it an environment in the cloud or a data lake on premises.

• SHARING AND COLLABORATION. At this stage, components of the data warehouse or data marts are typically housed in IT. If IT had not already been collaborating with the business on the kinds of questions it needs to address, it is doing so now. The data warehouse environment is starting to be used to share data and collaborate. However, operations overhead is often involved. The business is starting to work collaboratively with IT on an expansion strategy to give users access to data. Cross-functional teams consisting of IT, business stakeholders, analysts and data scientists, data engineers, and operations also begin to emerge.

• SECURITY AND GOVERNANCE. Although data governance, in terms of policies and processes, might already be in place, it will begin to evolve to address modernization, including new tools and processes. This might include employing data lineage, metadata, and data catalogs to provide access to, and governance of, new data types. Policies for this new data are also being developed.

Inflection Point

It should be clear by now that a sign of data warehouse modernization/maturity is that the warehouse can support growing volumes and disparate types of data, with good query and analytics performance, in a way that provides flexibility and agility to the business. It can support real-time updates and data at scale. Beyond this point, organizations can scale as their data storage and analysis requirements dictate rather than as their infrastructure dictates. Additionally, beyond this point, the warehouse can support more advanced analytics, including techniques such as machine learning and natural language processing.

Once organizations move past the inflection point, they start to build maturity across the many areas that data warehouse modernization requires. These modernization efforts become part of what TDWI calls a success cycle. As organizations see success with their data and analytics program, they start to do more. As they do more and gain experience, they tend to see further positive results, and this success builds on itself to produce measurable impact. This virtuous cycle may help explain why TDWI sees that organizations that collect and analyze disparate data using advanced analytics are more likely to measure a top- or bottom-line impact from their analytics efforts than those that do not.

For instance, as organizations move past the inflection point to become more modern with their data warehouse environments, they typically scale their traditional data volumes, begin to look beyond traditional data sources toward more exotic ones, or both. The data warehouse supports complex and innovative analysis for real-time decision making. It can support ingestion of massive amounts of fresh data.

To make this move, an organization must take these steps:

• SHARING AND COLLABORATION. As organizations mature their warehouse, there is a good chance they will need to realign their data strategy with business goals. That means that business and IT will need to work together to support modern data and analytics. This requires collaboration, coordination, and communication between teams, and the importance of this cannot be overstated. IT may own the data strategy, but the business is typically responsible for aligning projects with organizational objectives. The two groups will need to work together to make this happen.

Collaboration will need to happen at other levels in the organization, as well. For instance, data scientists will most likely need to collaborate with subject matter experts to make sure that any analytics models created meet business needs. DevOps will need to collaborate with data scientists and others building models to make sure that models are put into production.

Organizations will also need to consider training for analytics as they realign their data and analytics strategies. This includes both training for more advanced analytics and training staff to use the analytics developed. Although this isn’t part of data warehouse modernization per se, it is an important adjunct.

• INFRASTRUCTURE AGILITY. If the organization hasn’t already done so, as it begins to mature its warehouse environment, it typically considers the public cloud as part of that architecture. As stated above, the cloud can provide elasticity, scalability, and flexibility for organizations looking to build their data and analytics efforts. Some cloud providers handle tasks such as upgrades, patches, and even capacity planning issues (with serverless models). That can help organizations cut down on operations time and cost. New cloud models, such as serverless computing, can support real-time processing with their ability to scale up resources while meeting performance requirements. Of course, not every job needs real-time processing, but real time is becoming more important to more organizations.

The cloud is often part of a multiplatform environment. As enterprises move to multiplatform solutions, it is important for them to choose solutions that are compatible with each other. Compatibility means, for instance, that the data warehouse supports new analytics technologies. It also means having a multiplatform reference architecture in place that determines how diverse platforms integrate and interoperate, which data goes where, and how data flows between platforms. This may involve a number of vendor platforms, even if you are using a cloud data warehouse.

• ANALYTICS SUPPORT. Any sophisticated warehouse environment will need to support multiple analytics techniques, including classes of analytics such as predictive, text, geospatial, stream, and graph. Different analysts may use different tools, so support for multiple analytics toolsets will be important. These tools might be deployed on premises or in the cloud.

• SECURITY AND GOVERNANCE. As organizations modernize their warehouse environment, they will also need to consider the policies and processes they put in place to secure and govern diverse data. Governance needs to include all stakeholders, including cloud providers if they are part of the architecture. It will need to address how to handle diverse data in a potentially distributed environment. The organization will also need to look at tooling to support metadata and data lineage. Security will be a concern both for data at rest and for data that moves from one platform to another.

Most organizations with modern data warehouse deployments have moved in phases as part of a gradual evolution. For instance, some organizations start newer projects in the cloud while they plan their broader transition to the cloud. Some organizations want to keep sensitive data on premises and govern it there.

Stage Three: Modern

By this stage, the organization has modernized its warehouse to meet the growing needs of the business. This includes support for real-time data and advanced analytics in the warehouse environment. The warehouse is moving in the same direction as the business to support traditional and advanced analytics that provide business value.

• DATA DIVERSITY. In this stage, the warehouse supports internal and external data, collected both in batch and real time. This includes multistructured data such as text, images, and streaming data. In this phase, the company is also actively consolidating silos or employing data federation and virtualization to unify views of data from across all components of the data infrastructure.

• INFRASTRUCTURE AGILITY. During this stage, the warehouse can support massive amounts of disparate data, often in a massively parallel processing (MPP) environment. This includes high-performance streaming ingestion to load data. Most data can be loaded in real time, so it stays fresh. Scaling the warehouse to meet varying data requirements is common, as is automated ETL code for loading and transforming data in real time. Data is accessible from anywhere. The architecture is often services-based, which gives the organization flexibility in terms of data services for ingesting and processing data as well as analytics services (such as machine learning services) for gaining insight. This architecture may be hybrid, so organizations can protect what they already have that is working well while also using new platforms both on premises and in the cloud.

• ANALYTICS SUPPORT. In this stage, the data warehouse supports multiple kinds of analytics. This includes SQL that facilitates joining public or commercial databases with other data, at petabyte scale if needed. SQL performance is very fast (e.g., seconds) across terabytes of data. The warehouse also supports organizational analytics, including dashboards, visualizations, and more sophisticated analytics (such as machine learning, predictive analytics, NLP, and streaming analytics), against large amounts of disparate data. That might mean the warehouse supports multiple vendor toolsets or an analytics platform that serves various personas.

• SHARING AND COLLABORATION. In the mature stage, the data warehouse supports a culture of collaboration. Individuals and teams share data, queries, reports, and models within their organization. Data warehouse and end-user adoption is strong because interfaces are easy to use. Here, the data warehouse supports the business strategy, and there is an incentive to work together to deploy advanced analytics, such as models, into production.

• SECURITY AND GOVERNANCE. As more people become involved in analysis and as mature organizations deal with larger volumes of diverse data, the cloud, and other new technologies (such as stream mining), practices need to evolve. For instance, as more people analyze data, there is a greater need for consistent vocabularies and flexible access to data. Tools that support data lineage and data cataloging (an inventory of data assets) are common in this stage. Here, the organization is able to govern data on premises and in the cloud. The policies, processes, and standards are all well established. Data security is also in place. The organization makes a point of protecting sensitive data both at rest and in motion.

Stage Four: Advanced/Visionary

Only a few companies are visionary in terms of data warehouse maturity. At this stage, the data warehouse is very responsive to organizational needs. Here, organizations are executing analytics and advanced analytics programs smoothly using an infrastructure that can support data of any shape and size. The cloud is often part of this data warehouse infrastructure or supports the whole data warehouse. In the visionary stage, there is excitement and energy in analytics, and a healthy and agile analytics culture enables users in multiple positions to benefit from data and analytics.

• DATA DIVERSITY. Companies with advanced data warehouse environments are able to manage multistructured data at scale. The warehouse provides users with a holistic, real-time view of the data, even across multiple sources. This data is accessible by all of the personas in the company from data scientists to business users, developers, and data engineers. The company has also amped up its tooling. For instance, it often makes use of automated tools that use advanced analytics such as machine learning to support intelligent data processing across both the data and analytics life cycle.

• INFRASTRUCTURE AGILITY. Managing complexity is key to data warehouse maturity. The visionary company has deployed a coherent infrastructure that includes the ability to integrate new sources of data for analytics, whether they are internal or external to the company. This requires little in the way of capacity planning. More mature organizations also typically use the cloud, for many reasons, and usually in a hybrid fashion, although some have moved all their data and analytics to the cloud in a serverless model.

• ANALYTICS SUPPORT. At this stage, the data warehouse supports the analytics efforts of people across the organization. This includes support for high-performance SQL as well as more sophisticated analytics such as machine learning, deep learning, and stream analytics, to name a few.

Additionally, at this stage, there is a strong focus on using analytics to drive innovation, insights, systems, and applications. Often these analytics run against the data in the warehouse in a services model, where the developer or data scientist can select analytics services from the cloud.

• SHARING AND COLLABORATION. Here, data and analytics are shared internally and externally with customers and partners to support new business models. This is considered a natural part of doing business. The data warehouse supports fast time to market for applications and analytics. That means that IT is collaborating with the business to execute on business strategies around analytics.

• SECURITY AND GOVERNANCE. Security is well established with oversight from a well-managed data access strategy. Tools and techniques are in place for governance across the environment, including automated tooling.

Evaluating Assessment Scores

As stated previously, the benchmark survey has roughly 50 questions across the five categories that form the dimensions of the TDWI Modern Data Warehouse Maturity Model (see Figure 1).

These dimensions should now seem familiar because they are the same categories we have been referencing throughout this guide. These factors and others are used to explore relationships in the data to help determine best practices for data warehouse modernization.

Of course, organizations can be at different stages of maturity in each of these five categories, and most are.

Scoring

Scoring is performed on a scale of 1–100. Each of the five dimensions has a maximum score of 20 points. Questions may be weighted differently depending on their relative importance. Because organizations can be at different levels of maturity in the five dimensions, we score each section separately as well as provide an overall score. There are also questions that aren't scored but are instead used for best-practices guidance.

The output of the assessment is a score in each dimension and the total score, as well as a gap recommendation that provides advice and best practices for getting to the next stage of maturity.
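The arithmetic behind the overall score can be sketched as follows. This is a minimal Python illustration, not TDWI's scoring code: it assumes the five dimension scores have already been computed (the per-question weights are not published in this guide) and simply checks that each falls between 0 and 20 before summing them into a total out of 100. The names `DIMENSIONS`, `MAX_PER_DIMENSION`, and `total_score` are hypothetical.

```python
# Hypothetical sketch of the overall-score arithmetic; not TDWI's actual code.
# Assumption: each dimension score is already computed from weighted
# questions, and each dimension is worth at most 20 points.

DIMENSIONS = (
    "Data diversity",
    "Infrastructure agility",
    "Analytics support",
    "Sharing and collaboration",
    "Security and governance",
)
MAX_PER_DIMENSION = 20  # 5 dimensions x 20 points = 100 possible points


def total_score(scores: dict[str, float]) -> float:
    """Check that all five dimension scores are present and in range, then sum them."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} score {scores[name]} is outside 0-{MAX_PER_DIMENSION}")
    # Only the five known dimensions are summed; any extra keys are ignored.
    return sum(scores[name] for name in DIMENSIONS)


if __name__ == "__main__":
    sample = {
        "Data diversity": 10,
        "Infrastructure agility": 7,
        "Analytics support": 11,
        "Sharing and collaboration": 4,
        "Security and governance": 7,
    }
    print(total_score(sample))  # prints 39
```

For the sample scores used in the interpretation example in this guide (10, 7, 11, 4, and 7), the sketch returns a total of 39 out of a possible 100.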

Interpretation

Once you complete the survey, a report-based interface will show how your responses compare to those of your peers. The breakdown of scores for each dimension is as follows:

SCORE PER DIMENSION    STAGE
4–7                    1: Traditional
8–12                   2: Modernizing/Evolving
13–17                  3: Modern
18–20                  4: Advanced/Visionary

For instance, if you receive a score of 11 in the data diversity dimension of the assessment, you are in the Modernizing/Evolving stage for that dimension. You should expect to see different scores for each dimension. Modern data warehousing programs don’t necessarily evolve at the same rate across all the dimensions. For example, your company might be more advanced in supporting different types of analytics than it is in maintaining strong data governance.

When you complete the assessment, you might see scores similar to this:

DIMENSION                    SCORE    STAGE
Data diversity               10       Modernizing/Evolving
Infrastructure agility       7        Traditional
Analytics support            11       Modernizing/Evolving
Sharing and collaboration    4        Traditional
Security and governance      7        Traditional
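The lookup from a dimension score to a stage can be sketched in Python using the score ranges from the interpretation table earlier in this section. This is an illustrative sketch, not part of the assessment itself: it assumes integer dimension scores (a fractional score such as 12.5 would fall between the published ranges), and because the table starts at 4, it treats lower scores as out of range rather than guessing a stage. `STAGE_RANGES`, `stage_for`, and `stages_by_dimension` are hypothetical names.

```python
# Hypothetical sketch: map a per-dimension score (out of 20) to a maturity
# stage, using the score ranges from the guide's interpretation table.
# Assumption: dimension scores are integers.

STAGE_RANGES = (
    (4, 7, "1: Traditional"),
    (8, 12, "2: Modernizing/Evolving"),
    (13, 17, "3: Modern"),
    (18, 20, "4: Advanced/Visionary"),
)


def stage_for(score: int) -> str:
    """Return the stage label for an integer dimension score from 4 to 20."""
    for low, high, stage in STAGE_RANGES:
        if low <= score <= high:
            return stage
    # The guide's table does not cover scores below 4 or above 20.
    raise ValueError(f"score {score} is outside the 4-20 range in the table")


def stages_by_dimension(scores: dict[str, int]) -> dict[str, str]:
    """Map each dimension name to the stage implied by its score."""
    return {name: stage_for(score) for name, score in scores.items()}


if __name__ == "__main__":
    example = {
        "Data diversity": 10,
        "Infrastructure agility": 7,
        "Analytics support": 11,
        "Sharing and collaboration": 4,
        "Security and governance": 7,
    }
    for name, stage in stages_by_dimension(example).items():
        print(f"{name:<27} {example[name]:>2}  {stage}")
```

Run against the sample scores, the sketch reproduces the stages shown in the example (with the stage number prefixed): Traditional for infrastructure agility, sharing and collaboration, and security and governance, and Modernizing/Evolving for data diversity and analytics support. Each dimension is looked up independently, which reflects the guide's point that dimensions rarely mature at the same rate.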

Summary

The TDWI Modern Data Warehouse Maturity Model Assessment provides a quick way for organizations to assess the maturity of their warehouses and compare themselves objectively against others with warehouse initiatives. The assessment is based on the TDWI Modern Data Warehouse Maturity Model, which consists of four maturity stages with an inflection point between stages two and three.

The assessment serves as a relatively high-level measurement of your data warehouse maturity. It consists of roughly 50 questions across five categories, which only scratches the surface of the complexities involved in building out your warehouse ecosystem. To gauge precisely where you are, you may also choose to work with an independent source to validate your progress.

cloud.google.com

Google BigQuery is a fast, highly scalable, cost-effective, and fully managed cloud data warehouse for analytics, with built-in machine learning. BigQuery is serverless and is designed to make data analysts productive at an unmatched price-performance ratio. Because there is no infrastructure to manage, organizations can focus on analyzing data to find meaningful insights using familiar SQL and greatly simplify their data operation needs. BigQuery enables data analysts to seamlessly analyze terabytes to hundreds of petabytes of data at blazing fast speed by creating a logical data warehouse over managed, columnar storage as well as data from object storage and spreadsheets. BigQuery ML enables data scientists and data analysts to build and operationalize ML models on planet-scale structured or semistructured data directly inside BigQuery using simple SQL, in a fraction of the time. BigQuery makes it easy to securely share insights within the organization and beyond using data sets, queries, spreadsheets, and popular BI tools. It allows organizations to capture and analyze data in real time using its powerful streaming ingestion capability so that the insights are always current. BigQuery lets you quickly set up your data warehouse and start to query your data immediately.

Visit cloud.google.com/bigquery to learn more.

Research Sponsor

555 S Renton Village Place, Ste. 700

Renton, WA 98057-3295

T 425.277.9126

F 425.687.2842

E [email protected] tdwi.org

TDWI Research provides research and advice for data professionals worldwide. TDWI Research focuses exclusively on data management and analytics issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of data management and analytics solutions. TDWI Research offers in-depth research reports, commentary, inquiry services, and topical conferences as well as strategic planning services to user and vendor organizations.