Luncheon Webinar Series, January 13th, 2014

Free is Better – Presented by Tony Curcio and Beate Porst

Sponsored By:

Free is Better

Questions and suggestions regarding presentation topics? - send to

[email protected]

Downloading the presentation

• http://www.dsxchange.net/freeisbetter.html

• The replay will be available within one day; details will follow by email

Pricing and configuration - send to [email protected] Subject line : Pricing

For those who stay through the entire presentation, we have an extra giveaway!

Bonus Offer – Free premium membership for your DataStage management! Submit your management’s email address and we will offer them access on your behalf.

• Email [email protected] subject line “Managers special”.

• Join us all on LinkedIn: http://tinyurl.com/DSXmembers

Free is Better

Free data quality and governance components in DataStage 9.1.2

Session Abstract

FOR FREEEEEE

?!?!?!?!

DataStage 9.1.2 has been updated to include new features from our data quality portfolio that will help you jumpstart your quality-led Data Governance objectives.

Join this session to learn about the new "Exception Stage" and how it can be used to surface data issues to your data stewards through the Data Quality Console, a thin-client UI that is purpose-built for your data governance team ... and, best of all, ALL included with your DataStage license. We'll also explore how QualityStage and Information Analyzer then expand these capabilities to round out the data quality lifecycle.

Agenda

Data Quality Console: Integrated Stewardship Web UI for Information Server

Exception Stage: Allows DataStage and QualityStage data issues to be registered into the Data Quality Console

Operational Quality: Monitors the health of the data integration environment

Included with DataStage

Data Quality Console - Introduction

• Web-based, unified environment to monitor & assess data quality

• Proactively increase awareness of data quality throughout the enterprise

• Increase business confidence when using enterprise data for critical decision making

• Expedite error correction by helping data governance teams focus on critical data issues

InfoSphere Data Quality Console offers increased data quality awareness, insight, and management!

Data Quality Console - Overview

Unified environment to proactively increase Data Quality awareness across multiple domains – integration, validation, cleansing, etc…

InfoSphere Data Quality Console

Information Analyzer

• data validation and monitoring

DataStage

• data integration & transformation

QualityStage

• data cleansing, standardization and matching

Roles Supporting Data Governance Community

Business (Data) Steward

• Uses the Data Quality Console dashboard to understand quality trends

• Can browse and search common data quality event information

Reviewer

• Owner of data quality events

• Investigates exceptions at the record level to resolve the cause of the problem

• Can change the status of a data quality event

• Can change the owner of a data quality event

Review Manager

• All Reviewer capabilities

• Plus the following:

• Assigns owners

• Changes priorities

• Deletes data quality events

Data Quality Console - Terminology

Exception Descriptor

• Contains basic information about an exception such as:

• Date / Time of exception

• Priority, Status, Owner of that exception

• Originating System/Project

• Viewable/searchable by Business Steward, Review Manager, and Reviewer

Exception Details

• Contains any detail information passed from the exception-producing system

• Validation exception as defined in an Information Analyzer Rule

• Source-to-target validation errors as defined in Information Analyzer Exception Manager

• Accessible only by Review Manager and Reviewer roles
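
The split between summary-level descriptors and record-level details can be pictured with a minimal Python sketch. The role names come from the slides; the class, field names, and the view function are hypothetical illustrations and are not the console's actual data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Role names are taken from the slides; the sets themselves are an assumption
# about how the visibility rules could be encoded.
DESCRIPTOR_ROLES = {"Business Steward", "Reviewer", "Review Manager"}
DETAIL_ROLES = {"Reviewer", "Review Manager"}


@dataclass
class ExceptionDescriptor:
    """Summary-level information about a set of exceptions (hypothetical)."""
    name: str
    created: datetime
    priority: str
    status: str
    owner: str
    origin: str  # originating system or project
    details: list = field(default_factory=list)  # record-level exception rows


def view(descriptor, role):
    """Return only the fields the given role is allowed to see."""
    if role not in DESCRIPTOR_ROLES:
        raise PermissionError(f"{role} cannot view exception descriptors")
    visible = {
        "name": descriptor.name,
        "created": descriptor.created,
        "priority": descriptor.priority,
        "status": descriptor.status,
        "owner": descriptor.owner,
        "origin": descriptor.origin,
    }
    # Record-level details are restricted to the reviewer roles.
    if role in DETAIL_ROLES:
        visible["details"] = list(descriptor.details)
    return visible
```

In this sketch a Business Steward sees the summary fields only, while a Reviewer or Review Manager also receives the record-level rows.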

Data Quality Dashboard – Home Page

Ed logs into the console and switches to the Exceptions Tab to browse the exception summaries.

Exception Descriptor Search and Filtering

To locate the exception summaries that are assigned to him, Ed opens the Owner facet and deselects the other owners.

Exception Descriptor Search and Filtering

Now, Ed is able to see all of the exceptions he owns. To focus on the most recent exceptions, Ed opens the Last Modified facet.

Exception Descriptor Search and Filtering

Now Ed is able to see only the exception summaries he owns that were modified in the last day. He can view each exception summary in more detail by clicking on the exception summary name.

Exception Descriptor Search and Filtering

Here Ed can view additional details about the exception summary and review exactly what caused the exceptions.

Exception Drill Down

He can also switch to the Exceptions tab to view the record-level exception instances. When finished, he returns to the exception summary list.

Exception Drill Down

To pass this information to his team, Ed decides to export the list of exception summaries.
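
Ed's walkthrough (filter by owner, narrow to the last day, export) can be sketched in Python as follows. The record layout, function names, and the CSV export format are assumptions made for illustration; the slides do not state which format the console exports.

```python
import csv
import io
from datetime import datetime, timedelta


def filter_summaries(summaries, owner, now, max_age=timedelta(days=1)):
    """Keep summaries owned by `owner` that were modified within `max_age` of `now`."""
    return [
        s for s in summaries
        if s["owner"] == owner and now - s["last_modified"] <= max_age
    ]


def export_csv(summaries):
    """Serialize the filtered summaries to CSV text (assumed export format)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "owner", "last_modified"])
    writer.writeheader()
    for s in summaries:
        writer.writerow({**s, "last_modified": s["last_modified"].isoformat()})
    return buf.getvalue()


# Hypothetical sample data standing in for the console's exception summaries.
now = datetime(2014, 1, 13, 12, 0)
summaries = [
    {"name": "Missing gender", "owner": "Ed", "last_modified": now - timedelta(hours=3)},
    {"name": "Null age", "owner": "Ed", "last_modified": now - timedelta(days=4)},
    {"name": "Bad key", "owner": "Ann", "last_modified": now - timedelta(hours=1)},
]
mine = filter_summaries(summaries, "Ed", now)
report = export_csv(mine)
```

Each facet in the console corresponds here to one condition in the filter, so adding a facet (for example priority) would mean adding one more condition.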

Data Quality Console – Summary

Browser-based user interface for simple consumability

Non-chargeable component

Works with Information Analyzer, Information Analyzer Exception Manager, DataStage and QualityStage

Primary information stored in the Information Server Repository with “drill-down details” available from product module sources.

Agenda

Data Quality Console: Integrated Stewardship Web UI for Information Server

Exception Stage: Allows DataStage and QualityStage data issues to be registered into the Data Quality Console

Operational Quality: Monitors the health of the data integration environment

Included with DataStage

Exception Stage - Introduction

• Collect exception data from any data integration or quality process and monitor over time

• Promote consistency in the way data stewards and business analysts can investigate data issues

• Insert good data quality controls and governance practices into each project

• Support clerical review for one-source and two-source variants to support business analyst review and tuning of match algorithms

• Data steward dashboarding provides the same charting, searching, reporting, and monitoring as data rules, to facilitate integrated data remediation processes

Exception Stage - Types

Exception stage

• Use to specify and gather information about exceptions that are generated by all stages except the One-source Match and Two-source Match stages. The Exception stage receives input that you identify as exception records.

One-source Match Exception stage

• Use to specify and gather information about exceptions that are generated by One-source Match stages. The One-source Match Exception stage receives master record data and clerical record data from the One-source Match stage.

Two-source Match Exception stage

• Use to specify and gather information about exceptions that are generated by Two-source Match stages. The Two-source Match Exception stage receives clerical review records from the Two-source Match stage.

Sample data scenarios that create ETL exceptions

Database related

Primary or foreign key constraint violation

Could not delete/update (key does not exist)

Database trigger failure

Reference lookup not found

Transformation Related

Unexpected null value

Data type not appropriate

Data value not in expected range

Constraint test not successful

Sequential File

Short read

Delimiter not found

Data format/type not appropriate

Do you have a common process for how these are handled and reported to the data steward or governance organization?
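
As a rough illustration of what a common process could look like for the transformation-related scenarios above, the following Python sketch checks each record and routes failures to an exception output with the reasons attached. The column names, the age range, and both functions are hypothetical; in DataStage this routing would be done with job links feeding an Exception stage, not with Python.

```python
def check_row(row, age_range=(0, 120)):
    """Return a list of exception reasons for one record (empty if the row is clean)."""
    reasons = []
    # Unexpected null value
    for column in ("first_name", "age"):
        if row.get(column) is None:
            reasons.append(f"unexpected null: {column}")
    age = row.get("age")
    if age is not None:
        if not isinstance(age, int):
            # Data type not appropriate
            reasons.append("data type not appropriate: age")
        elif not age_range[0] <= age <= age_range[1]:
            # Data value not in expected range
            reasons.append("value not in expected range: age")
    return reasons


def split_rows(rows):
    """Route rows to a clean output or an exception output with reasons attached."""
    clean, exceptions = [], []
    for row in rows:
        reasons = check_row(row)
        if reasons:
            exceptions.append({**row, "reasons": reasons})
        else:
            clean.append(row)
    return clean, exceptions
```

Keeping the reason text next to each rejected record is what lets a steward later search and group exceptions by cause instead of reading job logs.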

Exception Descriptor Editor

Four tabs

Categories

Implemented Data Resources

Additional Details

Output Columns

Exception Descriptor Editor - Categories

All exception stages have categories

At least one is required

Able to use more than one

Categories appear in the Data Quality Console, where they can be used as search criteria

Exception Descriptor Editor – Implemented Data Resources

Allows you to associate these exceptions with a specific data source (optional)

Exception Descriptor Editor – Additional Details

Allows you to associate these exceptions with specific stages on the canvas (optional)

Exception Descriptor Editor – Additional Details (continued)

The ‘Other Properties’ flexibility means you can add:

A description of the rules used to generate the exception

A Data Governance Policy or Rule name for Business Glossary Anywhere to search

…and much more!

Exception Descriptor Editor – Output Columns

Will show in Data Quality Console as ‘Exception Details’
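
The four tabs can be thought of as four groups of settings on one descriptor. The Python sketch below is a hypothetical stand-in, not the editor's real configuration format; it only encodes the rules stated on the slides (at least one category, optional data resource, free-form other properties, output columns) and shows categories being used as a search criterion.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DescriptorConfig:
    """Hypothetical stand-in for the four tabs of the Exception Descriptor Editor."""
    name: str
    categories: list                       # Categories tab: at least one required
    data_resource: Optional[str] = None    # Implemented Data Resources tab (optional)
    other_properties: dict = field(default_factory=dict)  # Additional Details tab
    output_columns: list = field(default_factory=list)    # Output Columns tab

    def __post_init__(self):
        if not self.categories:
            raise ValueError("at least one category is required")


def find_by_category(configs, category):
    """Return the names of descriptors tagged with the given category."""
    return [c.name for c in configs if category in c.categories]


# Sample descriptors with made-up names and categories.
configs = [
    DescriptorConfig("Customer load rejects", ["Completeness", "Customer"],
                     output_columns=["first_name", "age", "gender"]),
    DescriptorConfig("Order key violations", ["Referential integrity"]),
]
matches = find_by_category(configs, "Customer")
```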

How It All Shows

Drill-down to detailed exception records

In this example, the records are missing a first name, age, or gender.

Agenda

Data Quality Console: Integrated Stewardship Web UI for Information Server

Exception Stage: Allows DataStage and QualityStage data issues to be registered into the Data Quality Console

Operational Quality: Monitors the health of the data integration environment

Requires Information Analyzer

Scenarios for Operational Quality

Consider the following examples:

An organization has established a strict Service Level Agreement between business and IT that requires that the "Marketing Data Mart" is fully loaded before 6 AM each morning.

A data integration center of excellence has established a policy that in order to optimize their resource utilization, their ETL Architect will review any data integration process that takes more than one hour.

An organization that is growing through acquisition may want early warning indicators of whether the volume of data being loaded into the warehouse is growing faster than the planned 20% increase each quarter.
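
Each of the three scenarios reduces to a simple check on job run data. The Python sketch below uses the thresholds from the examples (a 6 AM deadline, a one-hour limit, 20% planned growth); the function names and inputs are hypothetical, and real rules would read these values from operational metadata, not from literals.

```python
from datetime import datetime, timedelta


def met_deadline(end_time, deadline):
    """SLA check: the load finished at or before the agreed deadline."""
    return end_time <= deadline


def needs_architect_review(start_time, end_time, limit=timedelta(hours=1)):
    """Policy check: flag any process that ran longer than the limit."""
    return end_time - start_time > limit


def growth_exceeds_plan(previous_rows, current_rows, planned_growth=0.20):
    """Early warning: quarter-over-quarter volume growth above the plan."""
    if previous_rows <= 0:
        raise ValueError("previous_rows must be positive")
    return (current_rows - previous_rows) / previous_rows > planned_growth


# Made-up run data for one nightly load of the Marketing Data Mart.
start = datetime(2014, 1, 13, 4, 15)
end = datetime(2014, 1, 13, 5, 40)
deadline = datetime(2014, 1, 13, 6, 0)

on_time = met_deadline(end, deadline)             # finished at 5:40, before 6:00
too_long = needs_architect_review(start, end)     # ran 1 h 25 min
growing_fast = growth_exceeds_plan(1_000_000, 1_250_000)  # 25% growth
```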

Operational Quality - Definition

How do you ensure that the day to day management of any solution is meeting IT and business expectations?

Operational Quality is…

the conformance of information integration jobs or processes to established and required Service Level Agreements (SLAs) for the Availability of information, the Production of information, the minimization of Latency in information delivery, and the expected Utilization of system resources to produce the information delivered.

Operational Quality - Categories

AVAILABILITY

• Processes or jobs that deliver new or updated information to the Line of Business (LOB) system must execute on schedule and execute cleanly (i.e. minimal processing or job errors vs. average).

• The focus is evaluation of the job execution status and log.

PRODUCTION

• Processes or jobs that deliver new or updated information to the LOB system should produce consistent levels of data based on the data consumed.

• The focus is evaluation of the data volume produced by the job.

LATENCY

• Processes or jobs that deliver new or updated information to the LOB system must execute within the target time window.

• The focus is evaluation of the timing of the job execution.

UTILIZATION

• Processes or jobs that deliver new or updated information to the LOB system must execute within prescribed resource utilization parameters.

• The focus is evaluation of the system resources used by a given job.
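
The four categories can be read as a scorecard applied to each job run. The following Python sketch is one possible reading under stated assumptions: the field names, the comparison against a baseline run, and the 10% tolerance are all hypothetical and are not taken from the Operational Quality Accelerator.

```python
def evaluate_run(run, baseline, tolerance=0.10):
    """Return pass/fail per operational quality category for one job run."""
    # Production compares the output/input ratio with the baseline ratio.
    expected_ratio = baseline["rows_out"] / baseline["rows_in"]
    actual_ratio = run["rows_out"] / run["rows_in"]
    return {
        # Availability: ran cleanly, with no more errors than the baseline
        "availability": run["status"] == "OK" and run["errors"] <= baseline["errors"],
        # Production: data volume produced is consistent with data consumed
        "production": abs(actual_ratio - expected_ratio) <= tolerance * expected_ratio,
        # Latency: finished within the target time window
        "latency": run["elapsed_s"] <= run["window_s"],
        # Utilization: stayed within the prescribed resource budget
        "utilization": run["cpu_s"] <= run["cpu_limit_s"],
    }


# Made-up baseline and run values for illustration.
baseline = {"rows_in": 1000, "rows_out": 950, "errors": 2}
run = {
    "status": "OK", "errors": 1,
    "rows_in": 1000, "rows_out": 500,
    "elapsed_s": 1800, "window_s": 3600,
    "cpu_s": 400, "cpu_limit_s": 600,
}
scorecard = evaluate_run(run, baseline)
```

Here the run passes on availability, latency, and utilization but fails on production, because it produced roughly half the output the baseline would predict for the same input.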

Sample Operational Quality Rules

Operational Quality Accelerator

Pre-built set of rules to measure Operational Quality now available on developerWorks.

The accelerator and its documentation are intended to:

Be run as-is with minimal configuration against the DataStage and QualityStage Operations Console Database (DSODB)

Reduce the effort in identifying operational quality issues within the categories noted above

Serve as models, templates, and examples for your own additional operational rule design

http://tinyurl.com/n5owmse

Tremendous business value in an integrated platform

Agile Integration: Wherever your integration resides, integrate it quickly and flexibly

Business Driven Governance: Make decisions with confidence using trusted data at the point of impact

Sustainable Quality: Ensure information accuracy and quickly adapt to strategic business changes

Thank you
