1er Simposio Latinoamericano Data Quality Fundamentals Miguel Angel Granados Troncoso

Post on 15-Jan-2016


Page 1

1er Simposio Latinoamericano

Data Quality Fundamentals

Miguel Angel Granados Troncoso

Page 2

Agenda

• Scenarios
• Definitions, Processes and Standards
• Data Quality Services (DQS)
• DQS Solutions

Page 3
Page 4

[Figure: twelve numbered capabilities arranged under three pillars, MISSION CRITICAL CONFIDENCE, BREAKTHROUGH INSIGHT and CLOUD ON YOUR TERMS. Capability labels: Required 9s & Protection; Blazing-Fast Performance; Organizational Compliance; Peace of Mind; Rapid Data Exploration; Managed Self-Service BI; Credible, Consistent Data; Scalable Analytics & DW; Scale on Demand; Fast Time to Solution; Optimized Productivity; Extend Any Data, Anywhere.]

Page 5

Credible, Consistent Data (#7)

Companies with accurate data perform better¹

Performer group        | % of master data complete & accurate | Hrs spent per employee each week searching for info
Top 20% Performers     | 91%                                  | 1.2 hrs
Middle 50% Performers  | 68%                                  | 2.8 hrs
Bottom 30% Performers  | Under 50%                            | 6 hrs

Delivered with: Single BI Semantic Model, Data Quality Services, Master Data Services

¹ Source: “Turning Pain into Productivity with Master Data Management,” Aberdeen Group, Feb 2011

Page 6

Why is Data Quality Important?

Data quality problems cost U.S. businesses more than $600 billion a year.

Data Warehousing Institute (TDWI)

Costs associated with bad data include:
• Excess inventory
• Higher supply chain costs
• Higher direct marketing costs
• Billing
• And more…

Page 7

Common Data Quality Issues

Data Quality Issue | Description | Sample Data Problem
Format | Do values follow consistent formatting standards? | Telephone number formats: xxxxxxxxxx, (xxx) xxx-xxxx, 1.xxx.xxx.xxxx, etc.
Standard | Are data elements consistently defined and understood? | ‘Gender code’ = M, F, U versus ‘Gender code’ = 0, 1, 2
Consistent | Do values represent the same meaning? | How is revenue presented? Dollars, Euro, both?
Complete | Is all necessary data present? | 20% of customers’ last names are blank, 50% of zip codes are 99999
Accurate | Does the data accurately represent reality or a verifiable source? | A supplier is listed as ‘Active’ but went out of business six years ago
Valid | Do data values fall within acceptable ranges? | Salary values should be between 60,000-120,000
Duplicates | Data appears several times | Both John Ryan and Jack Ryan appear in the system. Are they the same person?
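Several of the issues in this table can be expressed as simple per-record checks. The following is a minimal sketch in Python, not part of the original slides: the record layout, the field names and the choice of `(xxx) xxx-xxxx` as the agreed phone format are all assumptions made for illustration. The salary range and the `99999` zip placeholder come from the sample problems listed above.

```python
import re

# Hypothetical sample records; field names are illustrative only.
records = [
    {"last_name": "Ryan", "phone": "(555) 123-4567", "zip": "90210", "salary": 85000},
    {"last_name": "", "phone": "5551234567", "zip": "99999", "salary": 45000},
]

# Assumed house standard for the Format check: (xxx) xxx-xxxx
PHONE_PATTERN = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def check_record(record):
    """Return the list of data quality issues found in one record."""
    issues = []
    if not PHONE_PATTERN.fullmatch(record["phone"]):
        issues.append("format: phone")
    if not record["last_name"].strip():
        issues.append("complete: last_name")
    if record["zip"] == "99999":  # placeholder value is treated as missing
        issues.append("complete: zip")
    if not 60_000 <= record["salary"] <= 120_000:
        issues.append("valid: salary")
    return issues


for record in records:
    print(check_record(record))
```

Checks like Accurate and Duplicates need an external source or a comparison across records, so they do not fit this per-record shape.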

Page 8

Agenda

Scenarios
• Definitions, Processes and Standards
• Data Quality Services (DQS)
• DQS Solutions

Page 9

Data Governance

[Figure: nested layers running from Strategic to Tactical: IT Governance, Data Governance, Data Management, Data Quality, Data Correctness.]

Page 10

Data Management

Content
• Subject details
• Attribute identification
• Subject names
• Definitions
• Value representation
• Standard formats

Relationship
• Identity part (similar attributes)
• Group (Rules/Logic)
• Hierarchy (Parent/Child)
• Relationship Rules/Scenarios

Access
• Access and Sharing Policies (internal/external)
• Data provider
• Metadata (use, lineage, etc.)
• Regulations/Security
• External data sources

Change Management
• Data Quality and Acceptability
• Measurement and monitoring
• Error detection and correction
• Centralized change tracking
• Jurisdiction over data

[Figure labels: Data Standardization, Data Management, Master Data Management]

Page 11

Data Quality

• Data quality consists of verifying whether the data is suitable for its intended use in operations, decision making and planning.

Domain Management

Knowledge Discovery

Discovery Value

Management

Page 12

Quality Control Efforts

• Know the context of the data
• Profile the required data
• Create and maintain quality standards
• Track data quality

Page 13

Requirements for Data Quality Solution

Profiling
Analysis of the data source; providing insight into the quality of the data, to identify data quality issues.

Cleansing
Amend, remove or enrich data that is incorrect or incomplete. This includes correction, standardization and enrichment.

Matching
Identifying, linking and removing duplications within or across sets of data.

Monitoring
Tracking and monitoring the state of data quality activities and quality of data.
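Of the four requirements, profiling is the easiest to illustrate in a few lines. The sketch below is an assumption-laden example rather than anything taken from the slides or from DQS: `profile_column` is a hypothetical helper that reports completeness, the number of distinct values and the most frequent value for one column.

```python
from collections import Counter


def profile_column(values, placeholders=("", None)):
    """Summarize one column: completeness, distinct count, most common value."""
    present = [v for v in values if v not in placeholders]
    counts = Counter(present)
    return {
        "total": len(values),
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical 'Gender code' column mixing two coding standards and blanks.
gender_codes = ["M", "F", "U", "0", "M", "", None, "M"]
print(profile_column(gender_codes))
```

A profile like this surfaces candidates for cleansing (blanks, mixed code sets) before any correction rules are written.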

Page 14

How to Manage Data Quality?

Data quality management entails the establishment and deployment of:
– Roles
– Responsibilities
– Policies
– Procedures
– Technology

Page 15

Data Quality Standards

ISO 8000
• Data Quality Principles
• Characteristics that define data quality
• Processes that ensure data quality

ISO 22745
• Defines open technical dictionaries
• Applying dictionaries to master data

International Association for Information and Data Quality: http://www.iaidq.org/

Page 16

Agenda

Scenarios
Definitions, Processes and Standards
• Data Quality Services (DQS)
• DQS Solutions

Page 17

What is Data Quality Services?

Data Quality Services (DQS) is a Knowledge-Driven data quality solution, enabling IT Pros and data stewards to easily improve the quality of their data

Page 18

DQS Solution Concepts

Knowledge-Driven
Based on a Data Quality Knowledge Base (DQKB) that is reusable for a variety of data quality improvements

Knowledge Discovery
Acquire additional knowledge through data samples and user feedback

Semantics
Data is mapped into Data Domains, which capture its Semantics

Open and Extendible
Support use of user-generated knowledge and IP by 3rd party reference data providers

Easy to Use
Compelling user experience designed for increased productivity

Page 19

Data Quality Knowledge Base (DQKB)

• Repository of knowledge about data:
– Domains define values and rules for each field
– Matching policies define rules for identifying duplicate records

[Figure: contents of a knowledge base. Labels: Domains, Composite Domains, Matching Policy, Values, Value Relations, Term-based Relations, Domain Rules, Composite Domain Rules, Matching Rules, Reference Data Services.]

Page 20

DQS Knowledge Sources

Windows Azure Marketplace™ DataMarket
Cleanse and enrich data with Reference Data Services from DataMarket

DQS Data Store
Website that contains DQS knowledge available for downloading

3rd Party Reference Data Providers
Open integration with external 3rd party reference data providers

Organization Data
Create domains from your own data sources

Out of the Box Knowledge
A set of data domains that come out of the box with DQS

Page 21

What is a Domain?

[Figure: a Domain made up of Values, Reference Data, and Rules and Relationships.]

• Domains are specific to a data field

• Domains contain the rules for the data

• Domains can be individual or composite
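The three statements above can be pictured as a small data structure. This is a conceptual sketch only: the `Domain` and `CompositeDomain` classes, their fields and the sample values are hypothetical and do not reflect how DQS stores domains internally.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Domain:
    """Hypothetical stand-in for a domain: knowledge about one data field."""
    name: str
    valid_values: Set[str] = field(default_factory=set)
    synonyms: Dict[str, str] = field(default_factory=dict)  # variant -> leading value
    rules: List[Callable[[str], bool]] = field(default_factory=list)

    def normalize(self, value: str) -> str:
        """Trim the value and replace a known variant with its leading value."""
        value = value.strip()
        return self.synonyms.get(value, value)

    def is_valid(self, value: str) -> bool:
        """A value is valid if it is a known value (when any are listed) and passes every rule."""
        value = self.normalize(value)
        if self.valid_values and value not in self.valid_values:
            return False
        return all(rule(value) for rule in self.rules)


@dataclass
class CompositeDomain:
    """Several single domains validated together, e.g. a full name."""
    name: str
    parts: List[Domain]

    def is_valid(self, values: List[str]) -> bool:
        return len(values) == len(self.parts) and all(
            domain.is_valid(v) for domain, v in zip(self.parts, values)
        )


country = Domain("Country", valid_values={"Mexico", "United States"},
                 synonyms={"MX": "Mexico", "USA": "United States"})
first_name = Domain("First Name", rules=[lambda v: v.isalpha()])
family_name = Domain("Family Name", rules=[lambda v: len(v) > 0])
full_name = CompositeDomain("Name", [first_name, family_name])

print(country.normalize("MX"))
print(country.is_valid("Mexiko"))
print(full_name.is_valid(["Miguel", "Granados"]))
```

Keeping values, synonyms and rules on the domain rather than in each cleansing job is what lets the same knowledge be reused across data sources.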

Page 22

What is a Reference Data Service?

• The Azure Marketplace hosts specialist data cleansing providers
– Set up an account
– Subscribe to a reference service
– Map your domain to the reference service

[Figure: a KB containing a Name domain (Family Name, First Name) and an Address domain.]

Page 23

DQS Architecture Overview

[Figure: DQS architecture diagram. Labels recovered from the slide, grouped by tier:
DQS Clients: DQS Client (Knowledge Discovery and Management, Interactive DQ Projects, Administration); SSIS DQS Cleansing Component; Other DQS Clients; Future Clients: Excel, SharePoint, MDS…
DQS Server: DQS Engine (Knowledge Discovery, Data Profiling, Exploration, Matching, Cleansing, Reference Data); Common Knowledge Store (Published KBs); DQ Projects Store (DQ Active Projects); Reference Data API (Browse, Get, Update…)
DQS Cloud Services: Reference Data Services; DataMarket - Categorized Reference Data; DQS Store - KB, Domains; Reference Data API (Browse, Set, Validate…); 3rd Party Reference Data]

© 2010 Microsoft Corporation. Microsoft Materials - Confidential. All rights reserved.

Page 24

Agenda

Scenarios
Definitions, Processes and Standards
Data Quality Services (DQS)
• DQS Solutions

Page 25

DQS process

[Figure: Build / Use cycle. Labels: Build, Use, Knowledge Management, DQ Projects, Knowledge Base, Enterprise Data, Reference Data, Cloud Services, Integrated Profiling, Progress, Notifications, Status.]

Page 26

Interactive Cleansing – DQS Project

• Analyzes the quality of source data
• Automatically corrects and enriches the data
• Manual approval/rejection of suggestions provided by the cleansing algorithm/reference data services
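The split between automatic correction and suggestions awaiting manual approval can be sketched with two confidence thresholds. The example below is an illustration, not the DQS cleansing algorithm: it substitutes Python's `difflib.SequenceMatcher` for the similarity measure, and the threshold values, the `KNOWN_VALUES` list and the meaning attached to each status label (Correct, Corrected, Suggested, New, Invalid, as shown on the batch cleansing slide) are assumptions.

```python
from difflib import SequenceMatcher

KNOWN_VALUES = ["Mexico", "United States", "Canada"]  # hypothetical domain values
AUTO_CORRECT = 0.85  # assumed threshold: apply the fix automatically
SUGGEST = 0.60       # assumed threshold: propose the fix for manual review


def cleanse(value):
    """Return (status, output_value) for one source value."""
    if not value.strip():
        return "Invalid", value  # assumed rule: blank values fail the domain
    if value in KNOWN_VALUES:
        return "Correct", value
    best, score = None, 0.0
    for candidate in KNOWN_VALUES:
        ratio = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if ratio > score:
            best, score = candidate, ratio
    if score >= AUTO_CORRECT:
        return "Corrected", best
    if score >= SUGGEST:
        return "Suggested", best  # a data steward approves or rejects this
    return "New", value           # unknown value, kept as-is for review


for raw in ["Mexico", "Mexco", "Mexiko", "Atlantis", ""]:
    print(repr(raw), cleanse(raw))
```

Lowering `AUTO_CORRECT` trades manual review effort for a higher risk of wrong automatic fixes, which is why the two thresholds are kept separate.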

Page 27

Batch Cleansing - Using SSIS

[Figure: batch cleansing data flow. Labels: DQS server, Knowledge Base, Values/Rules, Reference Data Definition, Matching Policy, Reference Data Services. Record statuses: New, Correct, Corrected, Suggested, Invalid.]

Page 28

Matching – DQS Project

Why Match?
• Identify duplicates within the data source
• Create a consolidated view of data

DQS Matching
• Build a matching policy
• Matching training
• Create a matching project
• Choose survivors
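The matching steps above (policy, scoring, clustering, survivorship) can be sketched end to end in a few functions. This is a simplified illustration under stated assumptions, not the DQS matching engine: the weighted `POLICY`, the `MIN_SCORE` threshold, the greedy clustering and the "most complete record wins" survivorship rule are all hypothetical, and `difflib.SequenceMatcher` stands in for the similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical matching policy: field -> weight (weights sum to 1.0).
POLICY = {"name": 0.6, "city": 0.4}
MIN_SCORE = 0.8  # assumed minimum matching score


def similarity(a, b):
    """Case-insensitive string similarity in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b):
    """Weighted sum of per-field similarities, as defined by POLICY."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in POLICY.items())


def find_clusters(records):
    """Greedy clustering: a record joins the first cluster whose pivot it matches."""
    clusters = []
    for record in records:
        for cluster in clusters:
            if match_score(cluster[0], record) >= MIN_SCORE:
                cluster.append(record)
                break
        else:
            clusters.append([record])
    return clusters


def choose_survivor(cluster):
    """Assumed survivorship rule: keep the record with the most filled-in fields."""
    return max(cluster, key=lambda r: sum(1 for v in r.values() if v))


records = [
    {"name": "John Ryan", "city": "Mexico City", "phone": ""},
    {"name": "Jon Ryan", "city": "Mexico City", "phone": "555-0100"},
    {"name": "Maria Lopez", "city": "Monterrey", "phone": "555-0199"},
]

clusters = find_clusters(records)
for cluster in clusters:
    print(len(cluster), choose_survivor(cluster)["name"])
```

In practice the weights and threshold would be tuned on sample data, which is the role the "matching training" step plays in the list above.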

Page 29
Page 30

Agenda

Scenarios
Definitions, Processes and Standards
Data Quality Services (DQS)
DQS Solutions

Page 31

Q&A

Miguel Ángel Granados Troncoso
@[email protected]

Personal Blog: http://www.granadostroncoso.com.mx

PASS Mexico City Chapter: http://mexico.sqlpass.org @PASSMXDF

SolidQ Journal: http://www.solidq.com/sqj/Pages/Home.aspx

Microsoft: http://www.microsoft.com/sqlserver/en/us/solutions-technologies/SQL-Server-2012-business-intelligence.aspx