sql s erver d ata q uality s ervices marc jellinek principal consultant – neudesic...
TRANSCRIPT
ABOUT ME
Experience
• Principal Consultant - Neudesic
• Assistant Director (SQL Team) – Application Engineering at Ernst & Young
• IT Manager at MLB Network
• Sr. Technology Specialist at Microsoft
Technologies
• Microsoft SQL Server 6.0, 6.5, 7.0, 2000, 2005, 2008, 2008 R2 and 2012
• Relational Engine, Analysis Services, Integration Services and Reporting Services
Marc Jellinek – [email protected]
SESSION OBJECTIVES
• Introduction to SQL Server Data Quality Services (DQS)
• Understanding the problem
• Demo
• Where do we go from here?
SETTING THE STAGE
• Building from the SQL Server Series: Master Data Services in SQL Server 2012, presented by Patrick Gallucci
• http://www.neudesic.com/media/webcasts/20120501/20120501.wmv (start at 6:26)
• PASS MDM/DQS Virtual Chapter http://masterdata.sqlpass.org
• Based on demos from “SQL Server 2012 Developers Update”
Data Quality Issue Sample Data ProblemStandard Are data elements consistently
defined and understood?Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system
Complete Is all necessary data present? 20% of customers’ last name is blank,
50% of zip-codes are 99999
Accurate Does the data accurately represent reality or a verifiable source?
A Supplier is listed as ‘Active’ but went out of business six years ago
Valid Do data values fall within acceptable ranges?
Salary values should be between 60,000-120,000
Unique Data appears several times Both John Ryan and Jack Ryan appear in the system – are they the same person?
THE DATA QUALITY PROBLEM SPACE
WHAT’S THE PROBLEM
• My name is Marc Jellinek
• Marc <> “Mark”, “Marck” or “March”
• Jellinek <> “Jelinek”, “Jellineck”, “Jelineck”, “Jelliner”, “Jeliner” or “Jellyneck”
Rr
WHAT’S THE PROBLEM
Jellinek Jelinek Jellineck Jelineck Jelliner Jeliner Jellyneck
Marc
Mark
Marck
March
THE NIGHTMARE SCENARIO
The Customer Dimension
– Jellinek, Marc– Jellinek, Mark– Jellinek, Marck– Jellinek, March– Jelinek, Marc– Jelinek, Mark– Jelinek, Marck– Jelinek, March– Jellineck, Marc– Jellineck, Mark– Jellineck, Marck– Jellineck, March– Jelineck, Marc– Jelineck, Mark
– Jelineck, Marck– Jelineck, March– Jelliner, Marc– Jelliner, Mark– Jelliner, Marck– Jelliner, March– Jelliner, Marc– Jelliner, Mark– Jelliner, Marck– Jelliner, March– Jellyneck, Marc– Jellyneck, Mark– Jellyneck, Marck– Jellyneck, March
ANALYTIC IMPACT
• Average Revenue per customer
• Average Profit per customer
• Number of customers
• Customers per Geography
• Customers by Income
• Customers by Gender
• Customers by Educational Level
• Customers by Product Bought
OBLIGATORY TRUISMS
• The accuracy of your reporting is determined by the accuracy of your data (Garbage In, Garbage Out)
• Decisions made based on data will only be as good as the data on which you are basing your decisions.
• You can’t manage what you can’t measure. Inaccurate measurements lead to interesting management challenges.
THE DATA QUALITY SOLUTION SPACE
Amend, remove or enrich data that is incorrect or incomplete. This includes correction, enrichment and standardization.
Identifying, linking or merging related entries within or across sets of data.
Cleansing Matching
Profiling MonitoringAnalysis of the data source to provide insight into the quality of the data and help to identify data quality issues.
Tracking and monitoring the state of Quality activities and Quality of Data.
SQL SERVER 2012 DATA QUALITY SERVICES
High quality data is critical to effective business intelligence and to business activities
DQS is an on-premise Data Quality product in SQL Server 2012, extendible with knowledge from multiple parties thru Azure DataMarket
Richer DQ knowledge and capabilities in the cloud will make it even easier to provide high quality data
Data Quality Services (DQS) is a Knowledge-Driven data quality solution enabling IT Pros and data stewards to easily improve the quality of their data
Included with SQL Server 2012 Enterprise and BI Editions
KEY DATA QUALITY SERVICES CONCEPTS
Knowledge-Driven
Semantics
Knowledge Discovery
Based on a Data Quality Knowledge Base (DQKB)
Data Domains capture the semantics of your data
Acquires additional knowledge the more you use it
Open and Extendible
Easy to use
Add user-generated knowledge & 3rd party reference data providers
User experience designed for increased productivity
DQS ARCHITECTURE
MatchingReference
Data
DQ Clients
DQS UI
DQ Server
DQ Projects Store Common Knowledge Store Knowledge Base Store
DQ Engine
3rd Party / Internal
MS DQ Domains Store
Reference Data Services
Reference Data SetsSSIS DQ
Component
DQ Active ProjectsMS Data Domains
Local Data
Domains
Published KBs
Knowledge Discovery
Data Profiling & Exploration
Cleansing
Knowledge Discovery and Management
Interactive DQ Projects
Data Exploration
Azure Market Place
Categorized Reference Data
Categorized Reference Data Services
Reference Data API(Browse, Get, Update…)
RD Services API(Browse, Set,
Validate…)
MDS Excel Add in
DATA QUALITY SERVICES PROCESSES
Build
Use
DQ Projects
KnowledgeManagement
Match & De-dupe
Correct
& standardize
Manage KnowledgeEnterprise
Data
ReferenceData
Cloud Services
IntegratedProfiling
NotificationsProgressStatus
KnowledgeBase
Discover / Explore Data
BASIC DEFINITIONS
• Knowledge Base
– Stores all the knowledge related to a specific type of data source
– Container for domains
• Domain
– Semantic representation of a type of data in a data field or column
– Trusted values, invalid values and erroneous data
– Synonym associations, term relationships, validation and business rules, matching policies
• Matching Rule
– Set of rules and conditions that determine a match or duplicate
DATA QUALITY SERVICES COMPONENTS
• Data Quality Server
• Data Quality Client
• DQS Cleansing Component for SQL Server Integration
• Data Quality Processes in Master Data Management
DATA QUALITY SERVICES COMPONENTS
• Data Quality Server
– SQL Server Databases
• DQS_MAIN
– DQS Stored Procedures, the DQS Engine and published Knowledge Bases
• DQS_PROJECTS
– Data required for knowledge base management and DQS project activities
• DQS_STAGING_AREA
– Intermediate staging area where source data is copied and processed
DATA QUALITY SERVICES COMPONENTS
• Data Quality Client
– Standalone application
• Designed for both data stewards and DQS Administrators
• Perform knowledge management, data quality projects and administration in one user interface
• Allows for domain management, matching policy creation, data cleansing, matching, profiling, monitoring and server administration.
• Can be installed on a remote computer
DATA QUALITY SERVICES COMPONENTS
• DQS Cleansing Component in SQL Server Integration Services
– Performs data cleansing as a part of an SSIS package
– Alternative to running a cleansing project within the Data Quality Services Client application
• Data Quality Processes within Master Data Services
– Perform de-duplication on source data and master data within the Microsoft SQL Server Data Services Add-in for Microsoft Excel.
DEMO
RESOURCES
•DBI207: Using Knowledge to Cleanse Data with Data Quality Serviceshttp://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI207
•Data Quality Services Bloghttp://blogs.msdn.com/b/dqs
•Books Online for SQL Server - Data Quality Serviceshttp://technet.microsoft.com/en-us/library/ff877925(SQL.110).aspx
•Install Data Quality Serviceshttp://technet.microsoft.com/en-us/library/gg492277.aspx
•Used Master Data Services Configuration Manager, set up IIShttp://msdn.microsoft.com/library/ee633744%28SQL.110%29.aspx
•Troubleshoot Installation and Configuration Issues (Master Data Services in SQL Server 2012http://go.microsoft.com/fwlink/?LinkId=226284
•SQL Server 2012 Developer Training Kit Web Installer http://www.microsoft.com/en-us/download/details.aspx?id=27721
•SQL Server 2012 Update for Developers Training Workshophttp://social.technet.microsoft.com/wiki/contents/articles/6981.sql-server-2012-update-for-developers-training-workshop.aspx
•SQL Server 2012 Update for Developers Training Kithttp://social.technet.microsoft.com/wiki/contents/articles/6982.sql-server-2012-developer-training-kit-bom-en-us.aspx
•SQL Server 2012 Update for Developers Training Kit Contenthttp://social.technet.microsoft.com/wiki/contents/articles/6976.sql-server-2012-developer-training-kit-content-en-us.aspx
• MSDN – Data Quality Serviceshttp://msdn.microsoft.com/en-us/library/ff877925(v=sql.110).aspx
• MSDN Discussions – Data Quality Serviceshttp://social.msdn.microsoft.com/Forums/en-US/sqldataqualityservices/threads
• Technet – Data Quality Serviceshttp://technet.microsoft.com/en-us/library/ff877925.aspx
• PASS MDM/DQS Virtual Chapterhttp://masterdata.sqlpass.org
Driving Innovation with SAP On-Premise and Beyond
Monish NagisettyDaniel Sepp
7 June 201210:00-11:00 PDT
http://www.clicktoattend.com/?id=159718
Delivering a Semantic Model for Ad-Hoc Reporting
Steve Muise 21 June 201210:00-11:00 PDT
http://www.clicktoattend.com/?id=159917
Exploring the New Hadoop Implementation in Azure
Jacob Saunders 10 July 201210:00-11:00 PDT
http://www.clicktoattend.com/?id=160452
Amerishore – an Innovative, Socially Conscious Approach to Offshoring
Tracy Derr 17 July 201210:00-11:00 PDT
http://www.clicktoattend.com/?id=160650
Data Integration Improvements within SQL Server 2012
Kola Bolarin 24 July 201210:00-11:00 PDT
http://www.clicktoattend.com/?id=160322
EDI: The Reports of My Death Have Been Greatly Exaggerated
Abhilash Shanmug 9 August 201210:00-11:00 PDT
http://www.clicktoattend.com/?id=159720
http://www.neudesic.com/About/Events/Pages/RecentWebcasts.aspx
THANK YOU