powerpoint presentationdownload.microsoft.com/documents/hk/technet/techdays2013/day … ·...

Post on 01-Jun-2020


Page 1: PowerPoint Presentationdownload.microsoft.com/documents/hk/technet/techdays2013/Day … · PowerPoint Presentation Author: Reza Rad Subject: Microsoft Tech Days Hong Kong 2013 Keywords:
• Data Quality Services 101
• Knowledge Base Driven Data Quality
• Matching
• Integration: DQS with MDS and SSIS


DQ Issues and DQ Dimensions

Before:

Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | 60th street | 45 | (missing) | New York | New York | 08/12/64
Jane Doe | Male | Jonathan ln | 36 | 10023 | Poughkeepsy | NY | 21-dec-1954

After:

Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | E 60th St | 45W | 10022 | New York | NY | 08/12/64
Jane Doe | Female | Jonathan Lane | 36 | 10023 | Poughkeepsie | NY | 12/21/54

Before:

Name | Address | Postal Code | City | State
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York
Maggie Smith | 545 S Valley View Dr | (missing) | Anytown | New York
John Smith | 545 Valley Drive St. | 34253 | NY | NY

After:

Name | Address | Zip Code | City | State | Cluster
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York | 1
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York | 1
Maggie Smith | 545 S Valley View Dr | (missing) | Anytown | New York | 1
John Smith | 545 Valley Drive St. | 34253 | NY | NY | 2

DQ dimensions: Completeness, Accuracy, Conformity, Consistency, Uniqueness
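
The conformity fixes in the first before/after pair (state names and date formats) can be illustrated with a small Python sketch. This is not how DQS works internally; the lookup table, the list of input formats, and the function names are assumptions made for illustration, and month-name parsing assumes an English locale.

```python
from datetime import datetime

# Hypothetical lookup table; a real knowledge base would hold many more values.
STATE_ABBREVIATIONS = {"new york": "NY"}

# Input formats seen in the "before" rows; the target format is MM/DD/YY.
DOB_FORMATS = ("%m/%d/%y", "%d-%b-%Y")


def standardize_state(value: str) -> str:
    """Map a full state name to its two-letter code; leave unknown values unchanged."""
    cleaned = value.strip()
    return STATE_ABBREVIATIONS.get(cleaned.lower(), cleaned)


def standardize_dob(value: str) -> str:
    """Try each known input format and emit MM/DD/YY; raise if none fits."""
    for fmt in DOB_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%m/%d/%y")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


if __name__ == "__main__":
    print(standardize_state("New York"), standardize_dob("21-dec-1954"))
```

Accuracy fixes such as correcting Jane Doe's gender or filling in a missing zip code need domain knowledge or reference data and are outside what simple format rules like these can do.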


[Diagram: Knowledge Base with Build, Use and Connect; labels: Knowledge Management, DQ Projects]


[Diagram: Knowledge Base with Build, Use and Connect; labels: Knowledge Management, DQ Projects, Integrated Profiling]

Cleansing: Amend, remove or enrich data that is incorrect or incomplete. This includes correction, enrichment and standardization.

Matching: Identifying, linking or merging related entries within or across sets of data.

Profiling: Analysis of the data source to provide insight into the quality of the data and help to identify data quality issues.

Monitoring: Tracking and monitoring the state of quality activities and the quality of data.
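
As a rough illustration of profiling, the sketch below computes two simple per-column statistics over a list of records. The definitions used here (completeness as the share of non-empty values, uniqueness as the share of distinct values among the non-empty ones) and the function name `profile` are assumptions for this example, not the formulas DQS uses.

```python
def profile(rows, columns):
    """Return {column: {"completeness": x, "uniqueness": y}} for a list of dict rows."""
    report = {}
    total = len(rows)
    for col in columns:
        # Treat missing keys, None and whitespace-only strings as empty.
        values = [(row.get(col) or "").strip() for row in rows]
        filled = [v for v in values if v]
        report[col] = {
            "completeness": len(filled) / total if total else 0.0,
            "uniqueness": len(set(filled)) / len(filled) if filled else 0.0,
        }
    return report


if __name__ == "__main__":
    sample = [
        {"Name": "John Doe", "Gender": "Male", "Zip code": "", "State": "New York"},
        {"Name": "Jane Doe", "Gender": "Male", "Zip code": "10023", "State": "NY"},
    ]
    for column, stats in profile(sample, ["Name", "Gender", "Zip code", "State"]).items():
        print(column, stats)
```

Statistics like these are a starting point only: they flag the missing zip code, but not the wrong gender value.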


[Architecture diagram; labels as extracted]

DQ Clients: DQS UI (Knowledge Discovery and Management, Interactive DQ Projects, Data Exploration); SSIS DQ Component; MDS Excel Add-in; future clients: Excel, Dynamics

DQ Server: DQ Engine (Knowledge Discovery, Data Profiling & Exploration, Cleansing, Matching, Reference Data)

Stores: DQ Projects Store (DQ Active Projects); Common Knowledge Store (MS Data Domains, Local Data Domains); Knowledge Base Store (Published KBs)

Reference data: Reference Data Services; Reference Data Sets; 3rd Party / Internal; MS DQ Domains Store; Azure Marketplace (Categorized Reference Data, Categorized Reference Data Services)

APIs: Reference Data API (Browse, Get, Update…); RD Services API (Browse, Set, Validate…)


15010 NE 36th Street


Parsing

• RDS – Reference Data
• In order
• Knowledge Base


When to Use

• When you don’t have enough knowledge in your knowledge base
• Sample: Melissa Data

Advantage

• Handing over the dirty job

Disadvantage

• Paying a subscription fee
• Large volumes of data may cause performance issues in the cloud


15010 NE 36th Street, Redmond, WA, USA

USA, 15010 NE 36th Street, Redmond, WA

15010 NE 36th Street, Redmond, WA


Data Issues

There are different ways to represent the same person or address in a database:

Data is ‘fuzzy’ in nature (spelling mistakes, abbreviations etc.).


A matching policy is prepared in the knowledge base.

A matching policy consists of matching rules that assess how well one record matches another.

Specify in the rule whether records’ values have to be an exact match, similar, or a prerequisite.

Train your policy by running and tuning each rule separately.


Similarity: select Similar if field values can be similar; select Exact if field values must be identical.

Weight: determines the contribution of each domain in the rule to the overall matching score for two records.

Prerequisite: validates whether field values return a 100% match; otherwise the records are not considered a match.

Minimum matching score: the threshold at or above which two records are considered to be a match.
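
The four rule parameters above can be combined into a toy scoring function. The sketch below is an assumption-laden illustration, not the DQS matching algorithm: it uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, compares values case-insensitively, expects the weights of the non-prerequisite domains to sum to 1, and uses a hypothetical default threshold of 0.8.

```python
from difflib import SequenceMatcher


def _norm(value):
    """Normalize a field value for comparison (an assumption of this sketch)."""
    return (value or "").strip().lower()


def matching_score(rec_a, rec_b, rule):
    """Score two records under a rule given as (domain, kind, weight) tuples.

    kind is "prerequisite", "exact" or "similar"; the weight of a
    prerequisite domain is ignored.
    """
    score = 0.0
    for domain, kind, weight in rule:
        a, b = _norm(rec_a.get(domain)), _norm(rec_b.get(domain))
        if kind == "prerequisite":
            if a != b:
                return 0.0  # a failed prerequisite rules out a match
        elif kind == "exact":
            score += weight * (1.0 if a == b else 0.0)
        else:  # "similar"
            score += weight * SequenceMatcher(None, a, b).ratio()
    return score


def is_match(rec_a, rec_b, rule, minimum_score=0.8):
    """Two records match when their score is at or above the threshold."""
    return matching_score(rec_a, rec_b, rule) >= minimum_score


if __name__ == "__main__":
    rule = [("City", "prerequisite", 0.0), ("Name", "exact", 0.4), ("Address", "similar", 0.6)]
    a = {"Name": "John Smith", "Address": "545 S Valley View Drive", "City": "Anytown"}
    b = {"Name": "John Smith", "Address": "545 S Valley View Dr", "City": "Anytown"}
    print(matching_score(a, b, rule), is_match(a, b, rule))
```

Tuning a rule then amounts to adjusting the weights and the minimum score until the pairs it accepts and rejects look right on sample data.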


Uniqueness | Usage | Description | Domains
Low | Define as Prerequisite; define with lower weights | Provides discriminatory information | Gender, City, State
High | Define as Similar or Exact; define with higher weights | Provides highly identifiable information and is highly discriminatory | Names (First, Last, Company), Address Line 1

Completeness | Usage | Description
Low | Do not use or define with low weight | High level of missing values
High | Include for matching if the column provides highly identifiable information | Low level of missing values


In overlapping clusters, a record may appear more than once in various clustered results. This structure may be harder to read since the same record exists in multiple clusters.

In non-overlapping clusters, the system unifies clusters containing the same record. This structure is easier to read, as the same record is never repeated.

Overlapping Clusters: (A~B), (B~C)

Non-Overlapping Cluster: (A~B~C)
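
One way to picture the unification step is to treat each matched pair as an edge and merge everything that is connected. The sketch below does this with a small union-find structure; the function name and the choice of union-find are assumptions for illustration, since the slides do not say how DQS unifies clusters.

```python
def non_overlapping_clusters(pairs):
    """Merge matched pairs such as (A~B), (B~C) into disjoint clusters."""
    parent = {}

    def find(x):
        # Register unseen records as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # unify the two clusters

    clusters = {}
    for record in list(parent):
        clusters.setdefault(find(record), set()).add(record)
    # Sort for a stable, readable result.
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    print(non_overlapping_clusters([("A", "B"), ("B", "C"), ("D", "E")]))
```

With the pairs (A~B) and (B~C) as input, the shared record B causes both pairs to collapse into the single cluster (A~B~C).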


DQS Component Overview

[Diagram. Design: Reference Data Definition, Values/Rules. Run: SSIS Package (Source + Mapping, DQS Cleansing Component, Destination). Also shown: Activity Monitoring, Interactive Cleansing Project.]


http://social.technet.microsoft.com/wiki/contents/articles/14065.tsql-script-to-delete-dqs-projects-leftover-from-ssis-dqs-cleansing-component.aspx


Thank you
