©COPYRIGHT 2016 451 RESEARCH. ALL RIGHTS RESERVED.

The State of Enterprise Data Quality: 2016
Perception, Reality and the Future of DQM
CARL LEHMANN, KRISHNA ROY, BOB WINTER
JANUARY 2016

BLACK & WHITE PAPER

A REPORT ON RESEARCH COMMISSIONED BY BLAZENT

NEW YORK: 20 West 37th Street, New York, NY 10018, +1 212 505 3030

SAN FRANCISCO: 140 Geary Street, San Francisco, CA 94108, +1 415 989 1555

LONDON: Paxton House, 30 Artillery Lane, London, E1 7LS, UK, +44 (0) 207 426 1050

BOSTON: One Liberty Square, Boston, MA 02109, +1 617 598 7200

ABOUT 451 RESEARCH

451 Research is a preeminent information technology research and advisory company. With a core focus on technology innovation and market disruption, we provide essential insight for leaders of the digital economy. More than 100 analysts and consultants deliver that insight via syndicated research, advisory services and live events to over 1,000 client organizations in North America, Europe and around the world. Founded in 2000 and headquartered in New York, 451 Research is a division of The 451 Group.

© 2016 451 Research, LLC and/or its Affiliates. All Rights Reserved. Reproduction and distribution of this publication, in whole or in part, in any form without prior written permission is forbidden. The terms of use regarding distribution, both internally and externally, shall be governed by the terms laid out in your Service Agreement with 451 Research and/or its Affiliates. The information contained herein has been obtained from sources believed to be reliable. 451 Research disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although 451 Research may discuss legal issues related to the information technology business, 451 Research does not provide legal advice or services and their research should not be construed or used as such.

451 Research shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The reader assumes sole responsibility for the selection of these materials to achieve its intended results. The opinions expressed herein are subject to change without notice.

Key Findings

Data quality is often cited as a critical determining factor in terms of the effectiveness of an enterprise to deliver business value. This report summarizes the findings of a survey of 200 IT decision-makers and influencers to determine the real and perceived risks of data quality and integrity in enterprises, and it identifies future initiatives that will impact the growth, use of, and quality of data under management.

Fewer than half of the study respondents (40%) were very confident in their organization’s data quality management (DQM) practices or the quality of data within their company. Only 50% of respondents believed the DQM practices put in place by their organizations – and the quality of the data used overall – were either slightly better than satisfactory, or at least good enough in general. Throughout this study, respondents exhibited doubts about the effectiveness of their DQM initiatives.

Of the respondents, 95% acknowledged that they expected the number of data sources and the volumes of data in their organization to increase in the coming year. Almost 70% of respondents expect data volumes to grow by up to 70%, while nearly 30% of respondents anticipate data volumes to increase by anywhere from 75% to nearly 300%.

Organizations employ multiple means to manage data quality. Some of those means are surprisingly rudimentary and manual in nature. For example, 44.5% of respondents cited the finding of data errors by using reports and then taking subsequent (after the fact) corrective action as their means for DQM, while 37.5% employed a manual data cleansing process. Also surprising was the fact that 8.5% of respondents avoided DQM completely, favoring a ‘hope for the best’ approach.

A disconnect exists between responsibility and accountability for data quality. While the IT department is mainly held responsible, the originators of the data – either employees or cross-function teams performing data entry – don’t share in this responsibility. IT departments have, therefore, become burdened with the task of employing multiple technologies to compensate for the fact that responsibility for data quality is generally not assigned to those directly involved with its capture.

While IT shoulders the responsibility, our research also shows that different groups within the enterprise – managerial teams – are ultimately held accountable for the quality of data. When those held ‘accountable’ and those held ‘responsible’ are poorly aligned, data quality can suffer. DQM in many organizations is fractured and poorly aligned, making a consistent approach to managing data difficult to maintain.

Over half of respondents (57.5%) were ‘somewhat confident,’ ‘unaware,’ or ‘less than confident’ in terms of knowing whether all the data sources required for their purposes had been aggregated prior to cleansing. Less than half (42.5%) were ‘very confident’ of this. Many respondents also reported that dependency management of any kind for analytics is not automated and involves manual effort. These findings raise the question of whether the respondents are using enough of the correct data for their projects. Missing or erroneous data sets can have a dramatic impact on the quality of analysis, so an understanding of data dependencies for certain workloads is vital.

While the respondents generally believe they are working with satisfactory or ‘good enough’ data quality, they acknowledge that when data quality is poor, it can dramatically impact the value of its use in projects and analysis – to wit, 65% of respondents believe that 10% to 49% of business value can be lost due to poor data quality, while 29% of respondents said 50% or more of business value can be lost. Only 6% of respondents asserted that little to no business value is lost as a result of poor data quality. These findings demonstrate that the value of high-quality data is recognized as impactful – even if the processes, technologies and responsibilities are not currently in place to attain it.

Organizations reveal an appetite for machine learning: 41.5% of respondents want a program of this nature within 12 months, and 14.5% want one in the next 24 months. Another 22% of respondents said they already had a machine learning program, which suggests that a once cutting-edge technology is now moving toward mainstream adoption. Predictive analytics and recommendations emerged as the top-ranking machine learning scenarios (67% each); such programs will generate more data and therefore a greater need for DQM. Respondents also reported that their organization’s machine learning program would be used for a wide variety of projects, with asset management and data discovery emerging as the top use cases.

Table of Contents

Key Findings
Current Data Environment
Data Quality: Perceptions vs. Reality
The Business Value of Quality Data
Data Use Cases
Data Quality Management: The Future
Conclusions
Recommendations
Appendix
Methodology and Respondents
About the Authors

Current Data Environment

Our research first ascertained the study respondents’ current DQM status. We found that 37% of interviewees were managing and/or integrating 51 to 100 data sources in an organization, and 25.5% of respondents had 101 to 200 data sources under management and/or integration – illustrating that most organizations had complex data environments comprising many data sources. When it came to future data volumes, Figure 1 shows that nearly all of our respondents (98.4%) acknowledged that they expected the volume of data in their organization to increase in the coming year. Of those surveyed, 70.9% expected data volume to grow by less than 75%, while 27% anticipated data volume to increase from 75% to nearly 300%, indicating that the amount of data flowing through most enterprises is set to increase significantly in the coming year.

Figure 1: Expected Data Volume Increase in Coming Year

Data volume increase: share of respondents
<10%: 2.1%
10-24%: 25.4%
25-49%: 23.3%
50-74%: 20.1%
75-99%: 11.1%
100-199%: 11.1%
200-299%: 4.8%
300-499%: (no value in source text)
500+%: (no value in source text)
Don’t Know: 1.6%
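
The subtotals quoted in the text (70.9%, 27% and 98.4%) can be reproduced by summing the Figure 1 bands. The Python sketch below is illustrative only. It assumes that the eight chart values map, in order, onto the seven lowest growth bands plus ‘Don’t Know’; that mapping is inferred from the quoted subtotals rather than confirmed by the chart itself. The names FIGURE_1 and share are hypothetical.

```python
# Figure 1 shares (percent of respondents), keyed by expected growth band.
# Assumption: the eight extracted values map, in order, onto the seven lowest
# bands plus "Don't Know"; the 300-499% and 500+% bands carry no value here.
FIGURE_1 = {
    "<10%": 2.1,
    "10-24%": 25.4,
    "25-49%": 23.3,
    "50-74%": 20.1,
    "75-99%": 11.1,
    "100-199%": 11.1,
    "200-299%": 4.8,
    "Don't Know": 1.6,
}


def share(bands):
    """Sum the respondent share across the given bands, rounded to 0.1."""
    return round(sum(FIGURE_1[band] for band in bands), 1)


# Respondents expecting growth below 75%.
below_75 = share(["<10%", "10-24%", "25-49%", "50-74%"])

# Respondents expecting growth from 75% to nearly 300%.
from_75_to_299 = share(["75-99%", "100-199%", "200-299%"])

# Everyone except "Don't Know" is treated as expecting some increase.
expect_increase = round(100.0 - FIGURE_1["Don't Know"], 1)

print(below_75, from_75_to_299, expect_increase)
```

Under that mapping the three printed values match the percentages cited in the paragraph preceding Figure 1.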

Our survey also revealed that organizations employ multiple means to manage data quality (see Figure 2). Moreover, some of those means are rudimentary and manual in nature. For example, 44.5% of respondents cited the finding of errors using reports and subsequent corrective action as their modus operandi for DQM, while 37.5% of those surveyed employed a manual data cleansing process. Also surprising was the fact that 8.5% of respondents avoided DQM completely, favoring a ‘hope for the best’ approach.

Figure 2: Means for Managing Data Quality

Don’t know: 1.5%
We don’t manage quality, we hope for the best: 8.5%
We outsource to a 3rd party: 37.0%
We manually cleanse our data: 37.5%
We rely on applications to validate data: 41.0%
We find errors using reports and then act: 44.5%
We use a DQM cloud service: 53.0%
We use DQM software on-premises: 60.5%
We have a formal MDM program technology: 62.0%

In terms of responsibility for data quality, we found that it falls firmly at the feet of the IT department (see Figure 3) – although, interestingly, top management (i.e., business-unit heads and managers, board of directors) were also found to bear the brunt. This could be attributed to the need for data accuracy in compliance, enterprise policies and government regulations, for which top management is ultimately responsible. We also found that 92% of interviewees audited data for security or governance, risk or compliance (GRC) projects, and that 80% of those surveyed have one to four full-time-equivalent (FTE) staff dedicated to data auditing. Furthermore, 35.1% of the responding organizations expected a significant increase in data auditing resources in the coming year.

Figure 3: Responsibility for Data Quality

When it came to the source of data quality issues, human error, unsurprisingly, ranked as the number one culprit (see Figure 4). IT-related practices such as migration efforts, systems changes and systems errors were also frequently cited, which we would expect. However, it is surprising that 38% of respondents cited their customers as the cause of data quality issues. Customer data entry usually involves an interaction with a Web service, Web-based application or mobile app, which typically has baked-in data validation to catch dirty data at the point of entry. What’s more, we believe errors from external data sources are likely to increase as more organizations accelerate the sharing of data and services via API integration with their supplier partners and customers.

Figure 4: Causes for Poor Data Quality

We don’t have data quality problems: 3.0%
External data: 37.5%
Data entry by customers: 38.0%
Systems errors: 43.0%
Changes to source systems: 43.5%
Mixed entries by multiple users: 44.0%
Data migration or conversion projects: 47.0%
Data entry by employees: 57.5%

Front-line workers and cross-functional teams noted in Figure 3 were generally not held responsible for data quality, yet data entry by employees and mixed entries by multiple users, as noted in Figure 4, were found to be the most common sources of poor data quality. These findings demonstrate the disconnect between responsibility and accountability when it comes to data quality. While the IT department is held responsible, the originators of data entry (e.g., employees and cross-function teams) are not. This places a burden on the IT department to engage with as many technologies as it can (Figure 5) to compensate for the fact that data quality responsibility, for the most part, is not assigned to those directly engaged in its capture. Our findings also showed that top management and executives are likely to be held accountable for the implications associated with data quality (poor or otherwise) – even though IT has the responsibility.

Most study respondents had made investments in DQM technology and resources. However, 24% of those surveyed are currently evaluating or plan to evaluate tools within the next 12 months. When it came to the type of tools employed, or under evaluation, for DQM purposes, a broad and diverse mix of offerings was selected (see Figure 5). Big data, master data management (MDM) and data cleansing tools were the most common. Moreover, some organizations employed specialized tools for specific purposes, such as geo-coding, while others selected a more general-purpose offering such as an MDM, ETL (extract, transform, load) or profiling tool. Either way, it is important to note that the breadth of tooling in use by an organization can create complexity in DQM execution. Furthermore, personnel responsible for data quality may not know the conditions under which certain tools should be used – or whether they should be used at all. Our research of other markets, such as DevOps and enterprise integration, finds similar complexity, which calls for consolidation of vendors to pave the way for simplification and improvement. We believe a similar trend will emerge in DQM.

Figure 5: DQM Tools in Use

Enrichment: 17.5%
Householding: 18.0%
Transformation: 22.7%
Standardization, Normalization: 22.7%
Geo-coding: 23.7%
Matching on name and address data: 24.2%
Verification: 24.2%
Internationalization/Localization: 24.7%
Matching and consolidation: 25.3%
Profiling: 26.3%
Metadata management: 28.9%
Data stewardship (management): 32.5%
ETL (Extract, Transform, Load): 34.5%
Data exception handling: 41.2%
Monitoring: 43.8%
Data cleansing tools (data validation): 52.1%
Master Data Management (MDM): 53.6%
Big Data: 56.7%

Over 25% of organizations reported a ‘high’ return on investment (ROI) from DQM. Nearly 60% of respondents reported a ‘moderate’ ROI, and less than 15% reported ‘breakeven or less,’ which suggests that selected tools are doing what is required of them, but not entirely. Also of note is the fact that 80% of surveyed organizations believe data quality is of high importance and warrants investment, while just 14% of interviewees seem to view data quality as less of a priority.

Data Quality: Perceptions vs. Reality

When it came to understanding the importance of data quality, we found that 81.5% of survey respondents believe their organization thinks that the quality of its data is better than it really is (see Figure 6). Only 9.5% of respondents believed the reverse (i.e., data quality is better than it’s perceived to be).

Figure 6: Data Quality: Perception vs. Reality

Rating scale: 1 (Poor), 4 (Needs Improvement), 7 (Satisfactory), 10 (Excellent)
Series: What They Think; What is Real
Bar values as extracted, in source order: 0.5%, 0.5%, 0.5%, 5.5%, 5.0%, 10.5%, 23.5%, 16.0%, 26.5%, 11.5%, 0.5%, 3.5%, 0.5%, 4.5%, 25.5%, 26.0%, 30.0%, 9.5%

Poor data is making its way through organizations even though respondents report having achieved relatively acceptable ROI from DQM investments (as noted earlier). In other words, poor data is still an issue – even when a DQM practice is in place.

DATA QUALITY ATTRIBUTES

When queried about their organization’s effectiveness at managing a series of data quality attributes, respondents revealed that in general they believed their organization was relatively effective – although there was some doubt. When asked to rank data quality attributes that need to be addressed in the coming year, integrity, accuracy, consistency and validity topped the list (see Figure 7).

Figure 7: Data Attributes That Need to Be Addressed

Duplication: 31.0%
Accessibility: 40.5%
Completeness: 42.5%
Timeliness: 46.5%
Validity: 53.5%
Consistency: 58.0%
Accuracy: 68.0%
Integrity: 70.5%

Roughly one-third of respondents had some doubt about whether the data they were using was the correct data for their purposes. Two-thirds had a higher level of confidence – but still felt there was room for improvement.

DATA AGGREGATION AND CLEANSING PRACTICES

Over half (57.5%) of respondents were ‘somewhat confident,’ ‘unaware,’ or ‘less than confident’ in terms of knowing whether all the data sources required for their purposes had been aggregated prior to cleansing. Less than half (42.5%) were ‘very confident.’ Many respondents also reported that dependency management of any kind for analytics is not automated and involves manual effort. These findings raise the question of whether businesses are using enough of the correct data for their projects, which is a critical issue. Missing and erroneous data sets can have a dramatic impact on the quality of analysis, so an understanding of data dependencies for certain workloads (especially analytics) is vital. It is also worth noting that less than 40% of those interviewed were ‘very satisfied,’ and over 50% were ‘somewhat satisfied or less’ with their organization’s current means to manage data quality. In our opinion, satisfaction rates should be much higher to avoid poor analytic practices and substandard conclusions. The anticipated growth of data volumes will only exacerbate DQM issues, particularly if data quality is already less than satisfactory.
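
One way to reduce the manual effort described above is to check, before cleansing starts, that every data source a workload depends on has actually been aggregated. The Python sketch below is a minimal illustration under stated assumptions: the function missing_sources and the source names are hypothetical and are not drawn from the survey or from any respondent’s practice.

```python
def missing_sources(required, aggregated):
    """Return the required source names absent from the aggregated set, sorted."""
    return sorted(set(required) - set(aggregated))


# Hypothetical dependency list for one analytics workload.
required = ["crm", "billing", "asset_inventory"]

# Hypothetical list of sources that have been aggregated so far.
aggregated = ["crm", "asset_inventory"]

gaps = missing_sources(required, aggregated)
if gaps:
    # Cleansing an incomplete aggregate would hide the gap from later analysis.
    print("Hold cleansing; missing sources:", ", ".join(gaps))
else:
    print("All required sources aggregated; cleansing can proceed.")
```

Keeping the dependency list explicit per workload is the design choice that matters here; the set difference itself is trivial once that list exists.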

The Business Value of Quality Data

The value of business execution and outcomes can, in many ways, be directly associated with the quality of the data used for making decisions and controlling operations. Several benefits of high-quality data were cited by the study respondents, some of which – noted in Figure 8 – either directly create business value (e.g., increased revenue, lower costs); improve productivity (e.g., less time reconciling data); or improve quality (e.g., fewer errors).

What is noticeable is that ‘faster decisions’ and ‘a single version of the truth’ rank relatively low on the list of benefits associated with high-quality data. These findings are surprising because faster decision-making and achieving one accurate set of data are often cited in the market at large as major drivers behind DQM initiatives, particularly when data auditing and GRC are common practices (as noted in the previous section on ‘Current Data Environment’). Data audits and GRC demand consistency, accuracy and speed to maintain regulatory compliance, among other requirements. This discrepancy in terms of perceived business value may be attributed to the fact that there are fewer persons charged with auditing and GRC (who thus require faster decisions and a single version of the truth) in an organization, and a greater number of persons charged with extracting other business value from data (e.g., increased revenue, reduced cost).

Figure 8: Business Value of Data

Other, please specify: 1.0%
Single version of the truth: 24.0%
Faster decisions: 25.0%
Better supplier performance: 35.5%
Better, more informed decisions: 38.5%
More accurate orders: 40.5%
Fewer errors: 43.0%
Increased customer satisfaction: 45.0%
Greater confidence in analytical systems: 46.0%
Less time spent reconciling data: 46.5%
Reduced costs: 48.5%
Increased revenues: 50.5%

One of the most alarming findings in the study is that 94% of respondents believe that business value is lost as a result of poor data quality – 65% of respondents believe that 10-49% of business value can be lost due to poor data quality, while 29% of respondents said 50% or more of business value can be lost. Only 6% of respondents asserted that no business value is lost as a result of poor data quality (see Figure 9). Thus, poor data can considerably diminish productivity and the quality of results.

Figure 9: Business Value Lost Due to Poor Data Quality

All Value Lost: 90 - 100%: 0.5%
80 - 89%: 0.0%
70 - 79%: 6.5%
60 - 69%: 10.5%
50 - 59%: 11.5%
40 - 49%: 9.5%
30 - 39%: 15.0%
20 - 29%: 15.0%
10 - 19%: 25.5%
No Value Lost: 0 - 9%: 6.0%
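
The headline shares quoted above (65%, 29%, 6% and 94%) follow from grouping the Figure 9 bands. The Python sketch below shows that grouping. It assumes the ten chart values pair with the ten band labels in the order they appear in the chart; the names FIGURE_9 and share_between are hypothetical.

```python
# Figure 9 bands as (low %, high %, share of respondents %).
# Assumption: values and band labels pair up in the order shown in the chart.
FIGURE_9 = [
    (0, 9, 6.0),
    (10, 19, 25.5),
    (20, 29, 15.0),
    (30, 39, 15.0),
    (40, 49, 9.5),
    (50, 59, 11.5),
    (60, 69, 10.5),
    (70, 79, 6.5),
    (80, 89, 0.0),
    (90, 100, 0.5),
]


def share_between(low, high):
    """Sum the shares of all bands lying fully inside [low, high], rounded to 0.1."""
    total = sum(s for band_low, band_high, s in FIGURE_9
                if band_low >= low and band_high <= high)
    return round(total, 1)


little_or_none = share_between(0, 9)       # "no value lost" band
ten_to_49 = share_between(10, 49)          # 10-49% of value lost
fifty_or_more = share_between(50, 100)     # 50% or more of value lost
some_loss = round(ten_to_49 + fifty_or_more, 1)

print(little_or_none, ten_to_49, fifty_or_more, some_loss)
```

Note that the 6% group corresponds to the chart’s 0-9% band, so it covers respondents who see little loss as well as those who see none.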

Respondents also cited a range of problems that arise from poor data quality (see Figure 10), most of which are typical and expected. What was unexpected is the relatively low ranking for ‘compliance problems,’ which we suspect may also be attributed to fewer persons heading up data auditing and GRC, and a greater number of persons using – or charged with getting business value from – data, as noted earlier.

Figure 10: Problems Caused by Poor Data Quality

Other, please specify: 1.0%
Compliance problems: 16.5%
Poor order accuracy: 34.0%
Loss of credibility in a system: 34.5%
Customer dissatisfaction: 35.0%
Duplications: 38.0%
Bad decision making: 38.5%
Delay in deploying a new system: 39.5%
Lost revenue: 41.5%
Extra costs: 44.0%
Extra time to reconcile data: 44.5%

Data Use Cases

Analysis topped the list of usage scenarios for data – 51.5% of study respondents reported that 20% to 59% of their organization’s data is used for analytics of some kind, and 34% reported that between 60% and 100% of the data is analyzed. We also found that 83.3% of respondents said their organization’s use of data for analysis of various types was currently acceptable for their purposes. However, 16.7% noted that the current percentage of data used for analytics was unacceptable – and they would prefer to have it increased. Of these respondents, 42.4% wanted between 80% and 100% of the data used for analytics, and 30.3% believed 60% to 79% of enterprise data should be used (see Figure 11). Not surprisingly, the personnel charged with analytics tasks would welcome a greater percentage of organizational data being used for analytics.

Figure 11: Percentage of Data Used for Analytics – Acceptable vs. Preferred Ranges

Categories: 80-100%, 60-79%, 40-59%, 20-39%, 10-19%, 5-9%, <5%, Don’t Know, We don’t analyze data
Series: Currently Acceptable; Preferred Among Others
Bar values as extracted, in source order: 9%, 25%, 24%, 28%, 12%, 2%, 5%, 5%, 42%, 30%, 18%, 9%

When it came to the most beneficial uses of effectively managing and analyzing data, big data projects, followed by data security and risk management, topped the list. Our findings also revealed that 81% of organizations used data analytics to uncover new revenue opportunities.

Data Quality Management: The Future

Last, we explored organizations’ plans for improving data quality going forward, the tools they would require, and the projects under consideration in the realms of big data management, Internet of Things (IoT) and machine learning. When asked about the status of their organization’s plans for managing and improving data quality over time, only 24% of the organizations had ‘already implemented [a data quality plan] and it’s working.’ Meanwhile, 37.5% of respondents were either developing a plan, or had no plans in place, for managing and improving data quality, and 6.5% were dissatisfied with the plan they had already implemented (see Figure 12).

Figure 12: Status of Organization’s Plan for Managing and Improving Data Quality

No plan: 4.5%
Developing a plan: 33.0%
Currently implementing a plan: 31.5%
Already implemented and it’s working: 24.0%
Had a plan, implemented – need a new plan: 6.5%
We gave up, let IT vendors and SaaS providers worry about it: 0.5%
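
The 37.5% figure quoted above combines two Figure 12 categories. The short Python sketch below shows the arithmetic and confirms that the six categories account for all respondents. It assumes the six chart values pair with the six labels in chart order; FIGURE_12 is a hypothetical name and the labels are abbreviated.

```python
# Figure 12 shares (percent of respondents), with abbreviated labels.
# Assumption: values and labels pair up in the order shown in the chart.
FIGURE_12 = {
    "No plan": 4.5,
    "Developing a plan": 33.0,
    "Currently implementing a plan": 31.5,
    "Already implemented and working": 24.0,
    "Implemented, need a new plan": 6.5,
    "Gave up": 0.5,
}

# Respondents with no plan yet, or with one still being developed.
no_plan_or_developing = round(
    FIGURE_12["No plan"] + FIGURE_12["Developing a plan"], 1
)

# The categories are mutually exclusive, so the shares should total 100%.
total = round(sum(FIGURE_12.values()), 1)

print(no_plan_or_developing, total)
```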

When it came to the DQM tools that respondents required the most, big data tools emerged as the number one requirement, which we conclude reflects the paucity of such offerings, both within organizations and in the marketplace in general (see Figure 13). Data cleansing, MDM and monitoring tools factored high on the list of future needs, followed closely by data stewardship and exception-handling tools. All reflect a continuing need to better manage data and improve its quality. We believe organizations will seek tools that not only address quality control but also enable users to access, manage and repurpose data for a variety of business needs and use cases. Vendors offering integrated capabilities that enable data quality control, data management and data integration will be increasingly in demand.

Figure 13: DQM Tools/Services Needed Most

Matching on name and address data: 16.5%
Householding: 18.0%
Metadata management: 18.0%
Transformation: 19.0%
Profiling: 20.0%
Standardization, normalization: 20.5%
Verification: 22.0%
Matching and consolidation: 22.5%
Geo-coding: 23.5%
Enrichment: 24.0%
Internationalization/Localization: 25.0%
ETL (Extract, Transform, Load): 27.5%
Data exception handling: 34.5%
Data stewardship (management): 36.0%
Monitoring: 39.0%
Master Data Management: 40.0%
Data cleansing tools (data validation): 40.5%
Big Data: 46.0%

Interestingly, price was not one of the leading considerations when making purchasing decisions (see Figure 14). It ranked lower than performance, functionality to handle complex problems, integration with existing tools, and ease of use. This suggests that organizations are willing to pay up for the functionality they need for effective DQM.

Figure 14: Decision Criteria Ranking for DQM Tools/Services

Criteria, in the order listed in the chart (score axis: 0 to 1,200): Services offered; Price; Ease of Use; Integration with Existing Tools; Functionality to Handle Complex Problems; Performance

When it came to planned big data management programs (IoT projects are considered separately below), 60.5% of respondents said they already have a big data management project under way, while 33% of respondents reported there were plans to initiate one in the coming year, and 22% of those surveyed said they had no big data management project plans in the next 12 months.

In terms of IoT projects (see Figure 15), just under one-quarter of respondents had no plans for an IoT program in the coming year, while 33% of respondents had an IoT project in place but expect the data volumes in the coming year to remain the same – indicating that IoT projects don’t always involve escalating data volumes, which is a common perception.

Figure 15: Plans for IoT Projects

We have no such plans: 22.5%
We have an IoT program but the data volumes in the coming year will remain the same: 33.0%
We will initiate an IoT program for the first time: 19.5%
We will expand an existing IoT program: 25.0%

Organizations also showed a great appetite for machine learning, with 41.5% of respondents wanting a program of this nature within 12 months, and 14.5% wanting a machine learning program in the next 24 months. In addition, 22% of respondents said they already had a machine learning program, which suggests that a once cutting-edge technology is now moving toward mainstream adoption, particularly for predictive analytics and recommendations, which emerged as the top-ranked types of machine learning programs (see Figure 16).

Figure 16: Types of Machine Learning Programs (In Place or Planned)

Predictive Analytics: 67.3%
Recommender Systems: 66.7%
Cluster Analysis and Segmentation: 58.9%
Outlier Detection: 48.8%
Similarity Search: 30.4%
Other, please specify: 0.6%

Respondents also reported that their organization’s machine learning program would be used for a wide variety of projects (see Figure 17), with asset management as the top-ranked use case – implying that organizations seek an automated and intelligent way to track IT and other enterprise assets. Data discovery is also highly ranked, implying that organizations want to better understand the totality of their data and potentially make use of so-called ‘dark data’ as part of their big data management and analytics programs.


Figure 17: Machine Learning Use Cases (In Place or Planned)

[Bar chart, axis 0% to 60%. Use cases in chart order: Other, please specify; Drug Discovery and Development Analysis; Social Graph Analysis; Threat Detection; Patient Care Quality and Program Analysis; Power Generation Management; Supply Chain Analytics; Events/Activity Behavior Segmentation; Market Basket Analysis; Customer Churn Management; Event/Behavior-based Targeting; Abnormal Trading Analysis/Detection; Energy Network Management/Optimization; Clickstream Segmentation and Analysis; Product Recommendations; Call Detail Record (CDR) Analysis; Customer Segmentation; Fraud Detection/Prevention; Market and Consumer Segmentation; Campaign Management and Optimization; Network Performance Optimization; Ad Targeting/Selection; Forecasting and Optimization; High Speed Arbitrage Trading; Cross Channel Analytics; Pricing Optimization; Credit Risk Scoring; Campaign and Sales Program Optimization; Cybersecurity; Risk Management; Decision Making; Data Discovery; Asset Management]

[Bar values in chart order: 12.5%, 12.5%, 12.5%, 14.3%, 16.7%, 17.3%, 17.9%, 18.5%, 19.0%, 19.6%, 20.2%, 20.2%, 20.8%, 20.8%, 21.4%, 23.2%, 24.4%, 24.4%, 25.0%, 25.0%, 25.6%, 25.6%, 25.6%, 26.2%, 26.8%, 28.6%, 30.4%, 35.1%, 37.5%, 38.1%, 44.6%, 47.0%]

Conclusions

When reflecting on the key findings from this study on data quality, we take away three somewhat concerning conclusions.

First, respondents convey a laissez-faire attitude toward the quality of data and the DQM practices in their organizations. It seems they believe that the current DQM efforts of their companies are generally satisfactory. Indeed, the respondents noted that improvements can be made, but the overall quality of data derived from DQM efforts seemed acceptable. For example, only 40% of the respondents were ‘very confident’ in their organization’s data quality and DQM practices, and just 50% believed their organization’s data quality and DQM practices were either slightly better than satisfactory, or at least good enough in general. Surprisingly, 8.5% of respondents reported that their organization does not engage in DQM at all, and acknowledged a ‘hope for the best’ approach. Throughout this study, we got the overall sense that respondents exhibited doubts about the effectiveness of their DQM initiatives.

The second observation seems to run contrary to the first. When asked about the effect of poor data quality on business value derived from its use, 94% of respondents believed that 10% or more of business value can be lost due to poor data quality. More specifically, 65% of respondents believed that 10% to 49% of business value can be lost due to poor data quality, and 29% of respondents said 50% or more of business value can be lost. Only 6% of the respondents said that little to no business value is lost as a result of data quality issues. These findings do not justify a laissez-faire attitude toward data quality and DQM practices. Indeed, diminished business value can also be attributed to another surprising finding: respondents expressed doubts that they had all the data they needed. For example, slightly over 57% were ‘somewhat confident,’ ‘unaware,’ or ‘less than confident’ in terms of knowing whether all the sources of data needed for their purposes have been aggregated prior to cleansing. Only 42.5% were ‘very confident’ of this. These findings demonstrate that the value of high-quality data, and enough of the proper data for specific purposes, is recognized as impactful.

Perhaps the contradiction can be explained by the third main conclusion. There is a disconnect between the people held accountable for data quality and those who are responsible for its capture and use. While the IT department is mainly held accountable, the originators of data (e.g., employees, cross-functional teams, others) are not responsible for data quality upon capture or entry. IT departments are burdened with the task of employing multiple cleansing technologies to compensate. Some of those means are rudimentary and manual in nature, and apparently invisible to the originators or curators of data. For example, 44.5% of respondents cited finding data errors by using reports and then taking subsequent (after the fact) corrective action as their means for DQM, while 37.5% employed a manual data cleansing process. The gap between those held accountable for data quality and those responsible for its capture and use is opaque and problematic. It leads to a lack of empathy between the two constituencies and thus, we suspect, largely accounts for the laissez-faire attitude of the respondents.

Going forward, this gap is likely to expand. Of the respondents, 95% expect the number of data sources and the volumes of data in their organization to increase in the coming year. Almost 70% expect data volumes to grow by up to 70%, while almost 30% anticipate data volume to increase from 75% to nearly 300%. Moreover, organizations reveal an appetite for machine learning, described to respondents as ‘the use of algorithms that can learn from and make predictions on data without being explicitly programmed (and thus require high-quality data).’ A significant 41.5% of respondents seek a machine learning program within 12 months, and 14.5% of respondents are seeking a machine learning project in the next 24 months. Interestingly, 22% of respondents said they already had a machine learning program in place, suggesting that the technology is now moving toward mainstream adoption. Predictive analytics and recommendations emerged as the top-ranking types of machine learning technology sought by organizations (67% each). Meanwhile, asset management (47%) and data discovery (44.6%) emerged as the top-ranking applied machine learning use cases, to help automate asset tracking and control, and to make use of ‘dark data’ as part of big data analytics programs, respectively.

We believe the issues identified in this study will persist in organizations and are likely to be exacerbated by the anticipated growth of data and by plans for future projects that drive data creation and therefore the need for quality management.

Recommendations

To overcome the obstacles to DQM practices and resulting data quality challenges noted in this report, we recommend the following:

• The rules and policies often defined for MDM initiatives must be expanded and introduced to all means of data capture and entry across an organization.

• All persons and systems that capture or use data in any way should be held accountable and responsible for data quality. This means they have to know how quality is defined by understanding which data attributes are a priority and how to affect them.

• The gap between those held accountable for data quality and those responsible for data capture needs to be closed, and the relationship needs to become more transparent. This requires awareness by all parties of what it takes to maintain a DQM initiative and what needs to occur to improve the overall quality of data. There needs to be more empathy among all parties involved.

• DQM tools, techniques and services need to be rationalized and standardized to enable a combination of data cleansing, data integration.
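As an illustration of the first recommendation, the following is a minimal Python sketch of applying shared quality rules at the point of data capture rather than correcting errors after the fact. It is not drawn from the survey or from any particular DQM product; the field names (asset_id, owner_email, site_code), the rules themselves, and the function validate_at_capture are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """One data quality rule applied to a single field of a captured record."""
    field: str
    description: str
    check: Callable[[str], bool]


# Hypothetical rule set: these fields and checks are assumptions for illustration,
# standing in for rules an organization might already define for MDM.
RULES: List[Rule] = [
    Rule("asset_id", "must be non-empty", lambda v: bool(v.strip())),
    Rule("owner_email", "must contain exactly one '@'", lambda v: v.count("@") == 1),
    Rule("site_code", "must be 3 uppercase letters",
         lambda v: len(v) == 3 and v.isalpha() and v.isupper()),
]


def validate_at_capture(record: Dict[str, str]) -> List[str]:
    """Return the list of rule violations; an empty list means the record passes."""
    problems = []
    for rule in RULES:
        # A missing field is treated as an empty value and checked like any other.
        value = record.get(rule.field, "")
        if not rule.check(value):
            problems.append(f"{rule.field}: {rule.description}")
    return problems


if __name__ == "__main__":
    entry = {"asset_id": "A-100", "owner_email": "nobody", "site_code": "NYC"}
    for problem in validate_at_capture(entry):
        print("Rejected at entry:", problem)
```

In a sketch like this, the person or system entering the record sees the violation immediately, which is one way the accountability for quality could be shared with the originators of data.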


APPENDIX

Methodology and Respondents

451 Research interviewed 200 North American IT executives from companies with 500 or more employees operating in diverse industries. A breakdown of the respondents’ titles, roles, and industries is provided below (Figures 18, 19 and 20).

Figure 18: Survey Respondents by Position

Senior Director or Manager: 29%
C-level: 24%
Manager: 18%
Senior Manager: 17%
Individual contributor: 7%
VP, SVP or EVP: 6%

Figure 19: Survey Respondents by Functional Group or Department

IT Operations: 37%
Business Operations: 21%
IT Service Mgmt or IT Infrastructure Mgmt: 20%
Data Management: 6%
Business Intelligence: 6%
Enterprise IT Architecture: 5%
Data Security Management: 4%
Audit and Compliance Management: 2%
Data Risk Management: 2%


Figure 20: Survey Respondents by Industry

[Bar chart, axis 0% to 25%. Industries in chart order: Legal Services; Wholesale Trade; Transportation; Government; eCommerce; Energy/Utilities; Insurance; Construction; Telecommunications; Communications, Media & Services; Education; Other; Financial Services; Healthcare; Retail/Hospitality; Manufacturing; Technology]

[Bar values in chart order: 1%, 1%, 2%, 3%, 3%, 3%, 3%, 4%, 5%, 5%, 6%, 10%, 11%, 11%, 16%, 20%]

PURCHASE AUTHORITY

The majority of survey respondents (68%) were responsible for approving the development and use of data quality tools and services within their organization, with the balance described as key purchase influencers.



About the Authors

CARL LEHMANN

Research Manager, Enterprise Architecture, Integration & Business Process Management

Carl leads 451 Research’s coverage of integration and process management technologies in hybrid cloud architecture, as well as how hybrid IT affects business strategy and operations. The markets covered in his research include enterprise architecture management (EAM) tools, hybrid cloud integration technology (including iPaaS and API management) and business process management (BPM) software.

Prior to joining 451 Research, Carl was Principal Analyst at BPMethods, where he advised clients on business strategy and process management. While there, his book, Strategy and Business Process Management: Techniques for Improving Execution, Adaptability, and Consistency, was published by Taylor and Francis Group in 2012.

Carl was also a Senior VP of Strategy and Product Management for a B2B integration firm where he developed e-commerce SaaS and IaaS offerings used by over 4,000 companies. Prior to that, he served 10 years as VP of Research for IT advisory firms Gartner and META Group, advising Fortune 500 clients.

KRISHNA ROY

Senior Analyst, Data Platforms and Analytics

As a Senior Analyst for the Data Platform and Analytics team, Krishna is responsible for the coverage of self-service analytics, predictive analytics and performance management. Prior to joining 451 Research, Krishna held a number of positions as a journalist in London and the US, including several years writing for Computergram International. She was also an Assistant Editor at the monthly magazine IBM System User, which focused on IBM software and hardware. In addition, Krishna spent three years covering M&A activity throughout Silicon Valley and was a founder of M&A Impact, a newsletter highlighting M&A activity in the software industry.

BOB WINTER

Managing Director, Advisory Services

As Managing Director for 451 Advisors, Bob provides consulting services to IT vendors, enterprise customers and investors. A thirty-year veteran of the technology business with equal time on the vendor and consulting sides of the desk, Bob drives real-world recommendations that are actionable and executable.

Prior to joining 451, Bob was Senior Vice-President of Corporate Development for PKWare, creator of the zip file format. He also spent six years running global market research for storage giant EMC. Previously, he was Managing Director for Reality Research and Consulting, a division of United Business Media focused on go-to-market and product development consulting for global technology vendors.