Data-Ed Webinar: Data Quality Success Stories

Dr. Peter Aiken, Founder, [email protected]; Karen Akens, Data Consultant, [email protected]. Data Quality Success Stories, Dataversity Webinar, 7-12-2016.

Upload: dataversity

Post on 09-Jan-2017


TRANSCRIPT

  • Data Quality Success Stories

    Dataversity Webinar, 7-12-2016

    Dr. Peter Aiken, Founder, [email protected]

    Karen Akens, Data Consultant, [email protected]

  • Copyright 2016 by Data Blueprint


    Peter Intro Slide

  • Karen Akens, CDMP

    Data management and solution development experience for numerous government and commercial clients

    Connector between Business & IT based on practical experience in both arenas

    Focus on Data Quality, Data Governance & Stewardship, and Business Intelligence

    Speaker at EDW, DGIQ, various DAMA chapters

    Board member of DAMA-Central Virginia.

  • High Quality Data is Critical

      • Information transparency
      • Analytics
      • Business Intelligence
      • Increasing efficiencies
      • Decreasing costs
      • Driving holistic decision-making across the organization

  • Getting Started with Data Quality

    Our approach begins with discovering:

      • The data that is most impactful to your business needs
      • Your organizational capabilities to manage data as an asset (foundational practices)
      • The state of your technical environment (technical practices)

    and then laying out the path forward in a roadmap:

      • That is achievable and matches your organization's abilities to deliver
      • That builds momentum with specific, short-term win projects
      • That outlines a long-term vision and implementation milestones

  • Client's Data Landscape

    Challenge

      • Growth through acquisition
      • No data accountability
      • Fractured Technology Landscape
      • Need to Align with Global Education Strategy

    Business Impact

      • No Comprehensive BI Capability
      • Lack of Unified Product and Portfolio Management
      • Poor Data Quality & Unreliable Reporting
      • Increasing Costs Due to Poor Data Mgmt.

    Opportunity

      • Centralized Data Governance Program
      • Formalize Data Stewardship
      • Become Proactive vs. Reactive
      • Increase Transparency and Decrease Cost

  • Case Study - Supplier Master

    Business Value Achievements:

      1) Consolidated the number of suppliers, getting better terms and conditions
      2) Reduced suppliers with immediate payment terms, increasing cash flow
      3) Removed duplicate suppliers, increasing the ability to track spending and reducing the risk of duplicate payments
      4) Increased the number of email addresses on file (order email/remittance email), giving faster communication with vendors and reducing the cost of sending remittances by post
      5) Improved the quality of contact information, reducing the risk of missed payments and damaged supplier relationships

    [Chart: vertical axis 0 to 1,600; dates 20-Oct-14 and 2-Dec-14]

    Data Governance Board 12/14/2014

  • Challenges from a Lack of Data Quality

    It's a ticking time bomb waiting to explode

      • No Account Creation Controls
      • No Standard Product Hierarchy
      • No Universal Product Model
      • Visible Inactive Records
      • Labor Intensive Manual Data Clean-up
      • Inconsistent Use of Business Terms
      • Duplicate Accounts
      • Missing Remittance Info
      • Inaccurate Reports
      • Missing Data
      • Who owns the data?
      • Who fixes?

  • Selling the Message

      • Share 60-second Elevator Speech
      • Use Current Inconsistencies that Impact Reporting
      • Obtain Senior Level Sponsorship
      • Quantify Value of Data
      • Perform Data Quality Pilot
      • Demonstrate Stewardship Success Story

    And ask for help before you think you need it.

  • If you want to avoid situations like this

      • One US system had 11,500 active cost centers, increasing the risk of mis-posting and the mapping effort
      • One S. African system was missing electronic remittance info in 88% of cases; payments were sent by post, increasing cost & lag time
      • No Standard Product Hierarchy: can't determine product profitability
      • 90% of suppliers in one US system were on immediate payment terms, impacting cash flow
      • Lost revenue of $2 million annually from not utilizing rights previously granted
      • No processes for deactivating vendors; 222,000 obsolete vendors were removed from one of two US systems, and many systems still contain ROT (redundant, obsolete, or trivial data)
      • Supply Chain spends 320 hours every year end tracking missing 1099 data to avoid tax penalties
      • Data issues are not fixed at the source, so it is a never-ending battle: financial resources spend 35% of their time reconciling data

    Data that Matters

  • you need to have this

      • Enterprise Data Strategy
      • Data Governance & Stewardship Framework which articulates roles of data owners and data stewards
      • Senior level sponsorship & organizational culture that treats data as a strategic asset
      • Data Governance Board with a mandate to drive data quality enterprise wide
      • Master Data Management solution
      • Data quality principles that are embedded in process & system design across the enterprise
      • Standard Business Glossary with an authoring and publishing process

    but not all at once!

  • Where to Start When Developing a Data Quality Framework

    No Accountability or Responsibility for Data

      • Many resources create, review or manage data
      • No formal data stewardship roles and responsibilities
      • Difficult to determine who is accountable & responsible for data

    Establish Data Ownership & Increase Data Accountability

      • Define clear data ownership & stewardship roles, accountability & responsibility of data
      • Define a vetting & onboarding process ensuring resource capacity
      • Establish decision rights
      • Maintain a master list of all Data Stewards and their related data domains

    Inconsistent Master Data

      • Fire drill to fix data issues in isolation
      • Little standardization across Lines of Business and Geographies
      • Difficult to report on a global level at needed level of detail
      • No formal master data change control process

    Consistent Master Data Management

      • Develop master data standards
      • Establish change control
      • Define consistent data models
      • Ongoing governance and stewardship of master data

    Inconsistent Data Definitions, Poor Data Quality

      • Business term definitions differ by group
      • Data monitored in silos
      • Fragmented use of a variety of tools
      • Focus on find and fix instead of root cause analysis
      • No standard reporting/tracking metrics

    Term Authoring & Publishing, Increase Data Quality

      • Establish and implement a process to define business accredited terms & publish for consumption enterprise wide
      • Stewards define business rules used to structure & profile data
      • Develop and implement DQ standards
      • Ongoing scorecarding & DQ metric reporting

  • Every work stream has a part to play if the organization is to move from a reactive to a proactive approach to improving data quality

    Principles and their implications:

    1. Capture data right, first time

      • Wherever possible, all data is captured once, at source, and validated on input

    2. Engineer-in positive impacts on data quality

      • Wherever possible, data quality improvement is automated, proactive and on-going
      • Systems, processes and products are inherently designed to improve data quality, e.g.:
          The possibility of errors when data is entered or changed is engineered out
          Processes are designed to enter and maintain accurate data
          Data entry is quick and intuitive for users

    3. Integrate data quality into business processes

      • Data quality standards and rules are defined and integrated into day-to-day operations, e.g. instances of non-compliance are fixed at root cause
      • There is clear accountability throughout the organization for promoting & sustaining good quality data

  • Repeatable Process

      • Discovery - Identify potential data quality issues.
      • Profile Data - Review sample data and existing data creation and usage process to provide context for business rule discussion with Data Owners and Business Data Stewards.
      • Develop Business Rules - Work with Data Owners and Business Data Stewards to review documented business rules and capture undocumented rules.
      • Define Metrics - Define metrics and acceptable thresholds against which to measure levels of quality.
      • Evaluate Data with Metrics - Execute business rules against production data and evaluate results. Utilize acceptable thresholds set by the Data Governance Board to evaluate the data.
      • Findings Review - Review the findings with the Data Owners and Business Data Stewards.
      • Remediate Anomalies - Implement and execute remediation process to fix problems with production data.
      • Monitor Health - Define and implement a continuous monitoring/remediation plan to prevent and/or fix data quality problems in the future.

  • Discovery

    [Process diagram: Discovery, Profile Data, Develop Business Rules, Define Metrics, Evaluate Data with Metrics, Findings Review, Remediate Anomalies, Monitor Health]

  • Identifying Business Need & Resources

    The discovery process is not solely the responsibility of business, IT, or Data Governance/Data Quality organizations. It requires collaboration.

    Business need or problem definitions can be influenced by a variety of sources, such as:

      • Migrating to One ERP and One CRM
      • Master Data Management processes
      • Suspected data quality deficiencies impacting BV & regulatory requirements
      • Data Governance Board initiatives
      • Needs of data-centric business strategies and opportunities
      • Directives from executive sponsorship team

  • Identifying Business Need & Resources: Identify Key Resources

    [Diagram: Business, Data Quality Center of Excellence, and IT, with the roles Data Owner, Business Data Steward, Data Quality Analyst, and IT Data Steward]

  • Identifying Business Need & Resources: Refine Problem & Develop Initial Business Case

    Refinement of Problem Statement

    Data quality team refines original problem statement to ensure that the defined project objectives are achievable and in alignment with enterprise strategy.

    Initial Development of Business Case

    Begin a list of potential business impacts related to degraded quality of data within the project scope:

      • Human capital expense for manual correction
      • Revenue lost due to inaccurate information
      • Regulatory fines from compliance violations
      • Damage to corporate reputation

  • Identifying and Requesting Data

    What to Include?

    The Data Quality team should work to define the specific data elements, and the source systems that contain them, which will be included in the analysis.

    Focus on Data that Answers Questions

    Confirm that the data available in the defined data sources is capable of answering the questions posed by the project problem statement.

  • Identifying and Requesting Data: Two Options

    Build a Direct Database Connection

      • Allows for a query against live data that can be reused in a repeatable process.
      • Preferred for access to current data.
      • Provides greater flexibility of data import options.
      • Requires effort from IT team members and may have an associated cost.

    Extract Data into Flat Files

      • Useful when a direct connection is not available.
      • Requires a knowledgeable analyst for identifying the correct format and uploading.
      • Each data load requires a new data extraction effort.

    Consider a staging area for data preparation.
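
    The two access options can be sketched side by side. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the source system, and the `vendor` table, its columns, and both helper functions are hypothetical names that do not come from the webinar.

```python
import csv
import io
import sqlite3


def rows_from_database(conn, query):
    """Option 1: run a reusable query against a live connection."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def rows_from_flat_file(handle):
    """Option 2: read a delimited extract; each refresh needs a new extract."""
    return [dict(row) for row in csv.DictReader(handle)]


# Stand-in source system with a hypothetical 'vendor' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor (vendor_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO vendor VALUES (?, ?)",
    [("V1", "Acme"), ("V2", "Globex")],
)

db_rows = rows_from_database(
    conn, "SELECT vendor_id, name FROM vendor ORDER BY vendor_id"
)

# Stand-in flat-file extract of the same data.
extract = io.StringIO("vendor_id,name\nV1,Acme\nV2,Globex\n")
file_rows = rows_from_flat_file(extract)
```

    Both routes yield the same row structure, so later profiling steps do not need to know which option supplied the data.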

  • Initial Data Profiling and Discovery

    An initial profile should be run against the data without any business rules to confirm a successful data import.

    This profile serves two purposes:

      • It is a sense check, giving the analyst an overview of the data to ensure the data was loaded properly.
      • It provides an overview against which initial observations can be made.

  • Initial Data Profile Output: Data Review at a Glance

      • Uniqueness: percentages, counts, key fields
      • Nulls: percentages, counts, key fields
      • Min/Max
      • Unexpected values: values outside domain
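
    A rule-free initial profile of one column might look like the following minimal sketch. The function name, the sample vendor rows, and the allowed payment-term values are illustrative assumptions, not part of the presentation; a profiling tool would normally produce these figures.

```python
def profile_column(rows, column, allowed=None):
    """Summarize uniqueness, nulls, min/max, and out-of-domain values."""
    values = [row.get(column) for row in rows]
    total = len(values)
    # Treat None and empty strings as null for profiling purposes.
    present = [v for v in values if v not in (None, "")]
    nulls = total - len(present)
    distinct = len(set(present))
    result = {
        "count": total,
        "null_count": nulls,
        "null_pct": round(100 * nulls / total, 2) if total else 0.0,
        "distinct_count": distinct,
        "unique_pct": round(100 * distinct / len(present), 2) if present else 0.0,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }
    if allowed is not None:
        # Values outside the expected domain, listed once each.
        result["outside_domain"] = sorted({v for v in present if v not in allowed})
    return result


# Hypothetical sample data.
vendors = [
    {"vendor_id": "V1", "payment_terms": "NET30"},
    {"vendor_id": "V2", "payment_terms": "IMMEDIATE"},
    {"vendor_id": "V2", "payment_terms": ""},
    {"vendor_id": "V4", "payment_terms": "NET45X"},
]

id_profile = profile_column(vendors, "vendor_id")
terms_profile = profile_column(
    vendors, "payment_terms", allowed={"NET30", "NET60", "IMMEDIATE"}
)
```

    Here the repeated `V2` shows up as a uniqueness below 100% on the key field, and `NET45X` is reported as a value outside the domain.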

  • Initial Data Profiling and Discovery: Report Findings to Data Owners and Stewards

    Conduct a data profiling debrief session with:

      • Data owners
      • Business data stewards
      • IT data stewards

    Communicate to the data owners:

      • Purpose of the data profiling exercise
      • Scope of the data included in the profile
      • Expectations of them to assist in the development and application of business rules to future profiles

  • Initial Data Profiling and Discovery: Collect and Report Information from Profile

    It may be advisable to extract information from reporting tool results into another format which can be shared with all members of the data quality team:

      • Excel
      • PDF

    Peculiarities of the data profile should be highlighted for review with the data owners.

    Any inferences about potential business rules, as well as questions about patterns in the data, should be noted.

  • Next Steps

    Initial profiling is just the beginning of the Data Quality Process.

    The real benefit is in developing business rules that can be applied to data in order to continue the repeatable process and develop actionable insights.

  • Defining Business Rules and Metrics: Sourcing Business Rules

    Possible sources of business rules

      • Master Data Standards documents
      • Subject matter expert interviews: Data Stewards, Owners, and Consumers
      • Desktop procedures documents
      • Process and system documentation

    What to look for

      • Allowable values
      • Required fields
      • Links between fields
      • Fields that link between data domains
      • Potential duplicate records
      • Insights into patterns that might be found in the data profile

  • Defining Business Rules and Metrics: Example Business Rules

    Business Rule: Tax Identifier is required for all non-employee vendors.
    Related Business Action: A W-9 is required before entering a new vendor into the vendor management system.
    Data Quality Check: Rule is violated if Vendor Type is not Employee and Tax ID is Null.

    Business Rule: Tax Identifiers should be entered in the valid format for the type of identification number.
    Related Business Action: Consistent formatting of tax identification numbers allows for higher confidence in searching and validation.
    Data Quality Check: Rule is violated if Tax ID is not in a valid format for the type, i.e. SSNs should be 999-99-9999; FEINs should be 99-9999999.

    Business Rule: Entities (companies, employees, products, etc.) should be unique and duplicates should not be entered.
    Related Business Action: Entity names entered into the system should be entered in a consistent format to assist with presentation and elimination of duplicates.
    Data Quality Check: Rule is violated if entity names are duplicated.

    Business Rule: E-mail addresses must be entered in valid formats.
    Related Business Action: Complete e-mail addresses should be entered into the system in order to ensure valid contact information.
    Data Quality Check: Rule is violated if the email address field is not in a valid format (e.g. [email protected]).
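
    The four example rules above can be expressed as executable checks. The sketch below is one possible reading, not the presenters' implementation: the field names (`vendor_type`, `tax_id_type`, `tax_id`, `email`, `name`), the violation labels, the simplified e-mail pattern, and the sample records are all assumptions made for illustration.

```python
import re
from collections import Counter

SSN = re.compile(r"\d{3}-\d{2}-\d{4}")   # 999-99-9999
FEIN = re.compile(r"\d{2}-\d{7}")        # 99-9999999
# Deliberately simple e-mail shape check: something@something.something
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_vendor(vendor):
    """Return the list of rule violations for one vendor record."""
    violations = []
    tax_id = vendor.get("tax_id") or ""
    # Rule 1: Tax ID is required for all non-employee vendors.
    if vendor["vendor_type"] != "Employee" and not tax_id:
        violations.append("tax_id_required")
    # Rule 2: a Tax ID that is present must match the format for its type.
    if tax_id:
        pattern = SSN if vendor.get("tax_id_type") == "SSN" else FEIN
        if not pattern.fullmatch(tax_id):
            violations.append("tax_id_format")
    # Rule 4: e-mail must be in a valid format (a blank value also fails).
    if not EMAIL.fullmatch(vendor.get("email") or ""):
        violations.append("email_format")
    return violations


def duplicate_names(vendors):
    """Rule 3: entity names that occur more than once after normalization."""
    counts = Counter(v["name"].strip().lower() for v in vendors)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical sample records.
vendors = [
    {"name": "Acme Supply", "vendor_type": "Company", "tax_id_type": "FEIN",
     "tax_id": "12-3456789", "email": "ap@acme.example"},
    {"name": "ACME SUPPLY ", "vendor_type": "Company", "tax_id_type": "FEIN",
     "tax_id": "", "email": "ap@acme"},
    {"name": "J. Doe", "vendor_type": "Employee", "tax_id_type": "SSN",
     "tax_id": "123456789", "email": "jdoe@example.org"},
]

results = [check_vendor(v) for v in vendors]
duplicates = duplicate_names(vendors)
```

    Normalizing names before counting (trimming and lower-casing) is a design choice here; stricter or looser matching would change which records count as duplicates.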

  • Defining Business Rules and Metrics: What Makes Good Metrics?

      • Meaningful to the Business: the score should relate to improved business performance
      • Measurable: must be able to be quantified within a discrete range
      • Controllable: some action can be taken to change the data and improve the score
      • Reportable: should provide enough information to the data steward to take action
      • Traceable: must be able to be tracked over time to show improvement efforts

  • Defining Business Rules and Metrics: Examples of Metrics for Various Dimensions

      • Accuracy: Does each value fall within an allowed set of values? Does each value conform to the defined level of precision?
      • Completeness: Is data present in required fields?
      • Consistency: Is the data used the same way across the enterprise?
      • Currency: Is the data up to date?
      • Integrity: Are identifying data elements unique?
      • Conformity: Are data elements stored as assigned data types, e.g. is text stored in a telephone number field?
      • Duplication: Do duplicate records exist?
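
    As one illustration of a measurable metric, the completeness dimension can be scored on a 0 to 100 scale and compared with an acceptable threshold. The function names, the sample supplier rows, and the 95% threshold are assumptions for this sketch; in the process described here, thresholds are set by the Data Governance Board.

```python
def completeness(rows, required_fields):
    """Percent of required field values that are populated (0 to 100)."""
    total = len(rows) * len(required_fields)
    if total == 0:
        return 100.0
    filled = sum(
        1
        for row in rows
        for field in required_fields
        if row.get(field) not in (None, "")
    )
    return round(100 * filled / total, 2)


def evaluate(score, threshold):
    """Compare a metric score with its acceptable threshold."""
    return "pass" if score >= threshold else "fail"


# Hypothetical sample data: two of four remittance e-mails are missing.
suppliers = [
    {"name": "Acme", "remit_email": "ap@acme.example"},
    {"name": "Globex", "remit_email": ""},
    {"name": "Initech", "remit_email": None},
    {"name": "Umbrella", "remit_email": "pay@umbrella.example"},
]

score = completeness(suppliers, ["name", "remit_email"])
status = evaluate(score, threshold=95.0)  # threshold value is an assumption
```

    Recording the score on each run, rather than only the pass/fail status, is what makes the metric traceable over time.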

  • Evaluating Data & Reporting Findings: Re-profile Data with Business Rules and Report Findings

    Definition, refinement, and application of business rules should be repeated iteratively and reviewed until the data owners are satisfied with the accuracy and completeness of the business rule implementation.

    Present all findings to the data owners and stewards for review.

    The goal of this step is to finalize the data quality assessment definition such that an ongoing monitoring process can be modeled from the activity.

  • Remediating Anomalies: Corrective Actions

    Two Routes

      • Find-and-Fix
      • Process Change

    Best Practice

    Leverage the continuous monitoring of data quality reports to confirm that the data cleansing procedures are effective.

  • Business Value from Data Quality

    The costs of poor data quality include:

      • Human capital expense for manual correction
      • Revenue lost due to inaccurate information
      • Regulatory fines from compliance violations
      • Damage to corporate reputation

  • Business Value Calculations

    Business Rule: Customer Address Invalid
    # Errors Identified: 84,367
    Potential Cost Avoidance: $92,952.42

    Calculation Description: Manual effort to research and correct an invalid Customer Address

      Average salary for worker engaged in correcting address:    $25,000.00
      Average salary including benefits:                           $34,375.00
      Salary per hour:                                             $16.53
      Salary per minute:                                           $0.28
      # minutes to correct an invalid address:                     4
      Cost of manual effort to research and correct one address:   $1.10
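
    The figures on this slide can be reproduced with a few lines of arithmetic. Two inputs are not stated on the slide and are inferred here as assumptions: a benefits multiplier of 1.375 (34,375 / 25,000) and 2,080 working hours per year (52 weeks of 40 hours). The total only matches when intermediate values are left unrounded.

```python
errors_identified = 84_367
base_salary = 25_000.00
benefits_multiplier = 1.375   # assumption: inferred from 34,375 / 25,000
hours_per_year = 2_080        # assumption: 52 weeks x 40 hours
minutes_per_fix = 4

loaded_salary = base_salary * benefits_multiplier
per_hour = loaded_salary / hours_per_year
per_minute = per_hour / 60
cost_per_address = per_minute * minutes_per_fix

# Multiply using the unrounded unit cost, then round only for display.
potential_cost_avoidance = errors_identified * cost_per_address

print(f"Salary including benefits: ${loaded_salary:,.2f}")
print(f"Salary per hour:           ${per_hour:,.2f}")
print(f"Salary per minute:         ${per_minute:,.2f}")
print(f"Cost per address:          ${cost_per_address:,.2f}")
print(f"Potential cost avoidance:  ${potential_cost_avoidance:,.2f}")
```

    Using the rounded $1.10 per address instead would give $92,803.70, so the small difference from the slide's total is a rounding effect rather than an error.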

  • Remediating Anomalies: Five Whys for Root Cause (Danette McGilvray)

    State the issue (e.g. duplicate vendor records are causing issues with payments), then ask "Why?" five times.

      1. Why are there duplicate records?
         New master records are created instead of using existing ones.
      2. Why do they create new duplicate records?
         The reps don't want to search for existing records.
      3. Why don't they want to search for existing records?
         Search takes too long.
      4. Why is the search time too long?
         Reps have not been trained in proper search techniques, and system performance is poor.
      5. Why is long search time a problem?
         Reps are measured by how quickly they can create a new master record, and they don't see the implications of duplicate data downstream.

  • Monitoring: At the Enterprise Level

    Data Quality Issues by Domain as of 1-31-2015

    [Charts: one per domain, each showing Open, Deferred, Remediated, and Total Issues]

      • Customer: 156 total issues, 97 open; the other two counts (deferred and remediated) are 48 and 11
      • Product: 225 total issues, 66 open; the other two counts (deferred and remediated) are 140 and 19
      • Supplier: 145 total issues, 43 open; the other two counts (deferred and remediated) are 90 and 12

    Critical Data Quality Issues

    [Bar chart: # Open Issues by domain, axis 0 to 115]

      • Customer: 26 critical issues, 26.80% of open
      • Product: 31 critical issues, 46.97% of open
      • Supplier: 5 critical issues, 11.63% of open

    Total Data Quality Issues open more than 30 days: 92

    Total Data Quality Issues open more than 60 days: 31

    Total Data Quality Issues open more than 90 days: 17
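
    The "% of Open" figures are the critical issue count divided by the open issue count for each domain. A short sketch, assuming open counts of 97, 66, and 43 (the values from the charts that are consistent with the stated percentages):

```python
# Open counts are an assumption: the chart values consistent with the
# percentages shown on the slide.
open_issues = {"Customer": 97, "Product": 66, "Supplier": 43}
critical_issues = {"Customer": 26, "Product": 31, "Supplier": 5}

critical_share = {
    domain: round(100 * critical_issues[domain] / open_issues[domain], 2)
    for domain in open_issues
}

# Domains ordered by share of open issues that are critical, highest first.
priority_order = sorted(critical_share, key=critical_share.get, reverse=True)

for domain in priority_order:
    print(f"{domain}: {critical_share[domain]:.2f}% of open issues are critical")
```

    Ranking by share rather than by raw count puts Product first even though Customer has more open issues.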

  • Monitoring: Monitoring by Data Stewards

      • Establish a process to consume artifacts from data profiling.
      • Take corrective measures to improve the data quality.
      • Verify through monitoring that improvements were implemented by either data cleansing, controls at the root cause, or a combination of both.
      • The data stewards should understand how to interpret the metrics, including what is being measured and why.
      • Monitoring can be costly, so it should focus primarily on those processes that are essential to the business.

  • Data Governance & Stewardship Maturity Model

      • Define (Business Glossary & Roles): Identify & catalog data assets, map to owners & stewards. Stewards are identifying and defining critical data, and publishing business accredited terms for consumption.
      • Control (Data Standards): Define authorities, control changes. Data Standards enforced by Stewards & Owners. Harmonize definitions across functions, Lines of Business, Geographies.
      • Measure (DQ Dashboards): Measuring data quality (DQ). Monitor ongoing stewardship operations & data use. Data Standards implemented for new system.
      • Expand (Data Sprints): Repeatable data management processes in place. Expand scope & breadth of stewardship program. Increase volume & efficiency of data it supports.
      • Optimize (Continuous Improvement): Iteratively enhance data quality & stewardship performance. Continuously prioritize & act upon enhancement opportunities from monitoring & expansion activities.

  • Risks and Mitigations

    Risk: Awareness

      • Parts of organization unaware of DG/Stewardship and do their own thing; inconsistent with DG standard
      • Business units may be unaware of benefits and added value

    Risk: Adoption

      • Business units refuse to adopt standards put forth
      • System constraints make it difficult to implement new standards
      • Business units do not engage the Global Data Services team on projects

    Risk: Funding

      • Funding model that aligns with governance and organizational structure (i.e. building data connections to sources with DQ tool)
      • Cost of building and establishing Global Data Services

    Risk: Training

      • Stewardship skills are hard to maintain
      • Build and sustain capability across a large world-wide organization

    Risk: Time to Build

      • Data Governance and Stewardship is a long-term program, not a one-time project

    Mitigation: Awareness

      • Strong communication plan that is meshed into overall corporate communications
      • Corporate governance and strong sponsorship of DG/Stewardship

    Mitigation: Adoption

      • Accountability and approval process by Data Owners and DG Enterprise Steering Committee
      • Document exceptions and work-arounds
      • Corporate governance and Architecture Review Board to align projects with DG/Stewardship

    Mitigation: Funding

      • DG & Stewardship funding established
      • Cost allocation aligned with DG & Stewardship model
      • Project specific costs

    Mitigation: Training

      • Partner with Data Architecture, Global Change & Process Excellence unit to provide a training curriculum
      • Define staffing models and career paths that outline training and align with DG/Stewardship

    Mitigation: Time to Build

      • Leverage parallel opportunities to accelerate build and implementation (Master Data, Global KPI reporting, One ERP road map, One CRM)
      • Pilot projects to quickly show tangible benefits

  • QUESTIONS??
