when qualitystage is a better etl tool than datastage
Post on 02-Oct-2015
When QualityStage is a better ETL tool than DataStage
Vincent McBurney Feb 21, 2008 | Comments (13)
QualityStage remains the undiscovered gem in the Information Server suite. I would
go so far as to say it's the best stand alone single purchase of the entire suite.
Better than DataStage. And it costs the same as DataStage.
In fact there is no other single data quality tool that can match it for scope and
performance. It runs on a massively scalable parallel architecture, it has an intuitive
GUI design across both ETL and data quality functions, and it has a huge variety of
sources and targets including native parallel connectivity to Oracle, Teradata, SQL
Server and DB2. You might find a combination of products from Trillium, Informatica,
Ab Initio or Oracle that could match it, but then you would be dealing with more than
one product and separate metadata repositories.
This week IBM released the QualityStage module for SERP 8.0, for Canada Post
address certification and cheaper mailout rates, and a new free 900+ page RedBook:
IBM WebSphere QualityStage Methodologies, Standardization, and Matching.
QualityStage 6 and 7 were a bit dodgy: they were all wizard based and not truly
client-server. More like client-MS Access-flat file-server. The best way to use
QualityStage was to get DataStage and then shoehorn QualityStage into it using the
plugin. The Access and file repositories became a problem in a multi-developer
environment, with lost and conflicting changes and mismatched metadata. It was a
bit like decorating a cake with paintball guns.
Why QualityStage 8 is Light Years ahead of QualityStage 7
QualityStage 8.0 changed all that. Much like Die Hard 4.0 it's a return to form for
the product with the key ingredient being the Designer.
Appearance: QualityStage 8.0 is built into the
DataStage Designer which is now awkwardly
known as the DataStage and QualityStage
Designer. This gives you the GUI data flow
style interface and it lets you get to all settings
via the GUI instead of having to dip into text
files. True client-server with job locking,
notification and release for multiple developers.
New data quality bling bling such as frequency
graphs and pass testing.
ETL: With this release IBM ripped all the ETL
steps out of QualityStage and replaced them with DataStage stages. QualityStage
8 has a subset of DataStage stages plus the data quality stages. It has all the
source and target stages, it has the most popular parallel stages of Transformer,
Lookup and Join. These are much more efficient and powerful and easier to use
than the old QualityStage functions. You get all the bonuses of DataStage: the
source connector stages, the parallel framework, the common repository.
QualityStage 8 is great standalone but most customers will probably be using it with
Information Analyzer, Metadata Workbench and DataStage with the shared
repository.
Shopping for an ETL and data quality tool
What should you buy when shopping for a data integration tool?
- When you buy DataStage you get every ETL stage and no Quality stages.
- When you buy QualityStage you get every Quality stage and most (but not
all) ETL stages.
- When you buy both products you get all stages.
So there is a LOT of overlap between the two products, which is great for developers
as DataStage developers know about 50% of QualityStage. When you buy both
you would expect a big discount due to the product overlap.
When choosing between DataStage and QualityStage it's all about what
QualityStage leaves out.
When to Choose DataStage
For starters there is the Slowly Changing Dimension stage that makes DataStage a
better bet for Data Warehouses and dimensional models. In a Data Warehouse
you might not want deep data quality processing - you might decide that really shit
hot data quality work belongs in the source systems and not in the DW. The Slowly
Changing Dimension stage will save you a lot of development time.
Another feature of DataStage over QualityStage is the extensibility into custom
stages and wrappers. If you are doing a lot of complex transformations and
formulas you might prefer the ability to write special stages in DataStage. If you
want to write a generic validation stage or a dynamic job that gets all its metadata
from schema files you want DataStage.
QualityStage breathes text, not air. Most of its special quality stages are about text
fields - standardising them, matching them together. If you are mainly dealing with
codes and numbers you might not need any of the quality stages.
When to Choose QualityStage
Most projects I've been on haven't needed custom stages or custom wrappers, so most
projects I've been on would have been happy with QualityStage. This gives you all
the quality stages up your sleeve for no extra cost. These days I would choose
QualityStage by default and try to find a reason to switch to DataStage - or buy
both!
Data Migration is a big one for QualityStage as it lets you merge and clean your data
for your brand spanking new application.
Master Data Management has an essential requirement of QualityStage - in fact it's
so important that IBM put it onto the InfoSphere MDM Server. This server has all of
IBM's acquired MDM products on it - customer center and product center, plus future
MDM centers. It comes with QualityStage - not DataStage. IBM is the only vendor
bundling a data quality tool with an MDM tool and they can do it because
QualityStage has so much in it. It's a fully fledged ETL tool with the merge, survivorship,
standardisation and de-duplication functions required when you populate your MDM data.
Plus it has SOA capabilities that go hand in hand with MDM SOA.
The new Whopping Huge QualityStage Redbook
QualityStage documentation can be a bit sparse in terms of examples and use
cases. If only we had a 900 page guide that includes real examples, screen shots
and product guidance. Oh, here's one: IBM WebSphere QualityStage
Methodologies, Standardization, and Matching.
This IBM Redbook publication documents the procedures for
implementing IBM WebSphere QualityStage and related technologies
using a typical merger/acquisition financial services business scenario.
It is aimed at IT architects, Information Management specialists, and
Information Integration specialists responsible for developing
This is a massive PDF - over 900 pages and 12MB. It's a book and tutorial and
research paper all in one. The authors are a team brought together from across
the globe: Nagraj Alur (IBM project leader), Alok Kumar Jha (IBM software lab
Bangalore), Barry Rosen (IBM Director in the Center for Excellence in Data
Integration) and Torben Skov (IBM Denmark). You can apply for your own IBM
residency to help write a RedBook at the residency information page.
There is a bit of Lion the Witch and the Wardrobe in the QualityStage tool. It looks
like DataStage but when you click on a data quality stage you are taken into a
fantasy world of data quality with seemingly endless depth. Even though on the
palette it looks like there are only 10 stages, some of them are full of custom canvases,
graphs, wizards, test forms etc.
Below are just some of the functions covered by the RedBook with some quotations
and screenshots borrowed.
Address Cleansing
The RedBook provides more detail on all the different types of address cleansing
with examples and input and output data for each. The first two cost a bundle of
extra money and the last two are free with QualityStage:
- WAVES (Worldwide Address Verification and Enhancement System): corrects
typographical and spelling errors, uses probabilistic matching of an address to a
country specific reference file, covers 233 countries to the city level and 71
countries to the street level resulting in more accurate mailing.
- CASS (USA), SERP (Canada Post Software Evaluation and Recognition
Program), DPID (Australia Post Delivery Point Identifier) will validate and format
data according to the standards of the postal body in each country resulting in
mail rate discounts.
- MNS (Multinational Standardization Stage) looks at the text strings to work out
how to standardize the country based on things like zip codes and country codes
and separates the street and area information.
- Country Rule Set is a specialised set of rules for just one country that can give
you more control than MNS - especially for customisation and overrides.
CASS, SERP and DPID offer the best certification on the market as they dig into a
reference database from the official postal authority and deliver mail discounts where
a match is found. WAVES is the next best as it also uses reference files to validate
addresses and it can be purchased for individual countries, regions or worldwide. If
you are not doing a lot of mailouts the standard out of the box Country Rule Set may
be enough.
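To give a feel for what a standardization rule does with an address string, here is a toy sketch in Python. It is not QualityStage rule set syntax and it is not how MNS or a Country Rule Set works internally. The `STREET_TYPES` table and the `standardize_address` function are made up for illustration, and the sketch assumes simple "number, name, type" street addresses.

```python
import re

# Hypothetical abbreviation table. A real rule set is far larger and
# country specific.
STREET_TYPES = {
    "STREET": "ST", "ST": "ST",
    "ROAD": "RD", "RD": "RD",
    "AVENUE": "AVE", "AVE": "AVE",
}


def standardize_address(raw):
    """Split a free-text address line into number, street name and street type."""
    # Replace punctuation with spaces, upper-case, and split into tokens.
    tokens = re.sub(r"[^\w\s]", " ", raw).upper().split()
    result = {"house_number": "", "street_name": "", "street_type": ""}
    # A leading all-digit token is treated as the house number.
    if tokens and tokens[0].isdigit():
        result["house_number"] = tokens.pop(0)
    # A trailing token found in the table is treated as the street type.
    if tokens and tokens[-1] in STREET_TYPES:
        result["street_type"] = STREET_TYPES[tokens.pop()]
    # Whatever is left is the street name.
    result["street_name"] = " ".join(tokens)
    return result


print(standardize_address("12 main street."))
# {'house_number': '12', 'street_name': 'MAIN', 'street_type': 'ST'}
```

Two different spellings of the same address, such as "12 main street." and "12 Main St", come out as identical fields, which is what makes later matching possible.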
As the RedBook shows, the standardization stages are seamlessly integrated with
DataStage:
Matching
This is one of the deepest parts of the QualityStage tool as you delve into different
ways to match or de-duplicate records. The tool lets you organise matches as a
number of different passes as there are so many different ways to try and identify
matches. The RedBook takes you through examples:
Total statistics tab
This tab provides you with statistical data in a graphical format for all
the passes that you run.
The cumulative statistics are of value only if you test multiple passes
consecutively, in the order they appear in the match specification. The
Total Statistics page displays the following information:
- Cumulative statistics for the current runs of all passes in the match
specification.
- Individual statistics for the current run of each pass.
- Charts that compare the statistics for the current run of all passes.
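The idea of ordered passes with per-pass and cumulative statistics can be sketched in a few lines of Python. This is an illustration only, not the QualityStage match engine: it uses exact equality instead of probabilistic scoring, it assumes that records matched in an earlier pass are withheld from later passes, and `run_passes`, the sample records and the two pass definitions are all hypothetical.

```python
from itertools import combinations


def run_passes(records, passes):
    """Run match passes in order and collect statistics for each one.

    records: dict of record id -> dict of fields
    passes:  list of (name, blocking_key_function, comparison_function)
    Assumption: a record matched in one pass is not offered to later passes.
    """
    unmatched = set(records)
    pairs = []
    stats = []
    for name, block_key, is_match in passes:
        # Group the remaining records into blocks so that only records
        # sharing a blocking key are compared with each other.
        blocks = {}
        for rid in sorted(unmatched):
            blocks.setdefault(block_key(records[rid]), []).append(rid)
        found = []
        for ids in blocks.values():
            for a, b in combinations(ids, 2):
                if is_match(records[a], records[b]):
                    found.append((a, b))
        # Remove matched records before the next pass runs.
        for a, b in found:
            unmatched.discard(a)
            unmatched.discard(b)
        pairs.extend(found)
        stats.append({"pass": name, "pairs": len(found),
                      "cumulative_pairs": len(pairs)})
    return pairs, stats


records = {
    1: {"name": "JOHN SMITH", "postcode": "3000", "phone": "555-0101"},
    2: {"name": "JOHN SMITH", "postcode": "3000", "phone": ""},
    3: {"name": "J SMITH", "postcode": "3121", "phone": "555-0101"},
    4: {"name": "MARY JONES", "postcode": "3121", "phone": "555-0199"},
    5: {"name": "M JONES", "postcode": "3000", "phone": "555-0199"},
}

passes = [
    # Pass 1: block on postcode, match on identical name.
    ("postcode+name", lambda r: r["postcode"],
     lambda a, b: a["name"] == b["name"]),
    # Pass 2: block on phone, match on identical non-empty phone.
    ("phone", lambda r: r["phone"],
     lambda a, b: a["phone"] != "" and a["phone"] == b["phone"]),
]

pairs, stats = run_passes(records, passes)
print(pairs)  # [(1, 2), (4, 5)]
for row in stats:
    print(row)
```

Because the cumulative count builds on whatever the earlier passes already caught, swapping the order of the passes changes the numbers. Here record 3 stays unmatched in both passes because record 1, its only phone match, was already taken in pass 1.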
Survivorship
Once you've done a match or a de-duplication you need to merge the records - this
is known as survivorship. Only the best parts of each record should survive at the
elimination council when all the votes get read.
During the Survive stage, IBM WebSphere QualityStage takes the
following actions:
- Replaces existing data with better data from other records based
on user specified rules
- Supplies missing values in one record with values from other records
on the same entity
- Populates missing values in one record with values from
corresponding records which have been identified as a group in the
matching stage
- Enriches existing data with external data
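As a rough sketch of the first three actions, here is how column-by-column survivorship could look in Python. This is not the Survive stage rule syntax. The `survive`, `longest` and `most_recent` helpers, the column names and the sample group are hypothetical, and the sketch assumes the match step has already grouped the records and that dates are ISO formatted strings.

```python
def survive(group, rules):
    """Build one surviving record from a group of matched records.

    group: list of dicts that the match step placed in the same group
    rules: dict of column name -> function(list of records) -> winning value
    """
    return {column: rule(group) for column, rule in rules.items()}


def longest(column):
    # Rule: keep the longest value, on the assumption it is the most complete.
    return lambda recs: max((r[column] for r in recs), key=len)


def most_recent(column, date_column="updated"):
    # Rule: keep the non-empty value from the most recently updated record.
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    def rule(recs):
        candidates = [r for r in recs if r[column]]
        if not candidates:
            return ""
        return max(candidates, key=lambda r: r[date_column])[column]
    return rule


group = [
    {"name": "J SMITH", "phone": "", "updated": "2008-01-10"},
    {"name": "JOHN SMITH", "phone": "555-0101", "updated": "2007-06-01"},
    {"name": "JOHN A SMITH", "phone": "555-0177", "updated": "2007-11-20"},
]

rules = {"name": longest("name"), "phone": most_recent("phone")}
survivor = survive(group, rules)
print(survivor)  # {'name': 'JOHN A SMITH', 'phone': '555-0177'}
```

The newest record has no phone number, so the phone rule falls back to the most recent record that does have one. That is the "supplies missing values" behaviour in miniature.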
As usual with a QualityStage feature you can choose the standard functions or dig
deep for advanced functions:
When you configure the Survive stage, you choose simple rules that
are provided in the New Rules window or you select the Complex
Survive Expression to create your own custom rules. You use some
or all of the columns from the source file, add a rule to each column,
and apply the data.
The RedBook has examples for simple rules:
And complex rules:
The Wrap
You cannot show everything QualityStage can do in one blog post or even one 900
page RedBook but it will be a lot of help to new QualityStage developers to see real
examples with screenshots. There is a lot of fun developers can have with
QualityStage - it's got a lot more depth to each stage than a standard ETL function.
Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
Vincent McBurney is an IBM Information Champion for Information Integration.
Robert Rich Feb 22, 2008
Vincent,
Great post.
We totally agree with you which is why we're investing in solutions that plug into and
leverage the platform.
Robert
dialntsdf05 Jun 12, 2008
Thanks Vincent for this interesting post. I'm currently working as a PM in BI and I'm looking
for an installation doc on DataStage (ETL, QualityStage and ProfileStage). Do you have
anything on the subject, please?
Friendly
Dial
Vincent McBurney Jun 16, 2008
The good news is that IBM have published a lot of information about installing the
Information Server. Start at the Information Server Home and you'll see HTML
documentation on installing, migrating and using the Information Server.
sudha Aug 26, 2008
Excellent article
Ritu Sethi Jan 15, 2009
very Informative Article
kevindewhurst Jun 12, 2009
I know this blog post is a little old but wanted to add that DataQualityFirst has now
enhanced the capabilities of QualityStage and provided an accelerator that most
feel should be used on every QualityStage implementation! www.dataqualityfirst.com
Think QStage was powerful before? Try it with PartyQualityInsight!
Sujata Bhattacharya Jun 30, 2009
Hi Vincent
This is an excellent posting on the strength and capabilities of Quality Stage. I love
the product and have been working with the product for 10 plus years.
-Sujata
friendkak friend Jul 31, 2009
Hi, what are the best practices that can be implemented using QS? Please post a
few. Thanks, friend.kak@gmail.com
USER_1847760 Jan 6, 2010
It is a good post by Vincent which gives a good idea about QualityStage.
Malini Lakhani Sep 17, 2010
What are the pros and cons of using Routines vs Rule sets for data validations in
QS?
balu balu Nov 17, 2011
Dear Vincent,
Very helpful article for starters of QS. Could you please let me know the
place/path where we can find the sample files for QualityStage?