A Practical Guide to Selecting Data Quality Software - helpIT systems



Post on 06-Jun-2020



You Are Here

If you’re here, you are somewhere on the road to dealing with your organization’s data quality challenges and are trying to determine the best strategy to help you get there. If you’re feeling lost, then you’re in luck - The Data Quality Planning Guide is designed to help you easily understand your current challenges, establish a plan and carry out an effective evaluation process so you can ultimately find the data quality strategy or tool that best meets your needs. Complete with worksheets and checklists, your personalized Planning Guide includes Sections for:

• Section 1: Assessing Your Data Quality Needs
• Section 2: Defining Your Project Scope
• Section 3: Conducting an Effective Evaluation

Like a true roadmap, feel free to print it, write on it, dog ear it, fold it, scan it, copy it, put it in a binder, add to it, share it and use it as your guide to get you from point A to data quality success.

helpIT systems: CLEANER DATA. BETTER DECISIONS.

A Practical Guide to Selecting Data Quality Software


Table of Contents

SECTION 1: ASSESSING YOUR DATA QUALITY NEEDS

a. Profiling your current data
b. Identifying basic system requirements
c. Understanding your data quality needs

SECTION 2: DEFINING YOUR PROJECT SCOPE

a. Evaluating product functions
b. Understanding processing modes
c. Selecting desired product features
d. Establishing project parameters

SECTION 3: CONDUCTING AN EFFECTIVE EVALUATION

a. Creating a vendor shortlist
b. Developing sample data
c. Evaluating specific vendors and tools
d. Interpreting the results


Brought to you by helpIT systems | www.helpIT.com

SECTION 1: Assessing Your Data Quality Needs

A) PROFILING YOUR CURRENT DATA

Having a clear view of your current data quality challenges, and of the processes and system structure you will have to work within, is a critical first step to developing a data quality strategy that will work for your organization. A wide range of issues can reside within the data, many of which may not be immediately apparent but could be the root cause of other problems. Use the worksheet below to ask important questions and gather the right data to inform the next step of the process - Defining Your Project Scope.

DATA PROFILE

Current Data Sources (CRM, Accounts, Legacy Systems, Lists, etc.)

____________________________________________________________________________________

____________________________________________________________________________________

Current Points of Entry (CRM, Website, POS, Call Center, Batch feeds, etc.)

____________________________________________________________________________________

____________________________________________________________________________________

Average number of records processed _________________ Frequency of processing _____________

Standard Data Elements

☐ Name
☐ Address
☐ Phone Number
☐ Email Address
☐ Date of Birth
☐ Social Security Number
☐ Customer ID #
☐ Login/Password
☐ Product or Part Numbers
☐ Price
☐ Transaction Data
☐ Order Reference #
☐ Shipping/Billing Addresses
☐ __________________
☐ __________________
☐ __________________
☐ __________________
☐ __________________
☐ __________________

Common Data Errors

☐ Name Misspellings
☐ Incorrect Addresses
☐ Duplicate Records
☐ Missing Data
☐ Incorrect Data
☐ Inconsistent Data
☐ Unlinked Transactions
☐ Incomplete Transactions
☐ Garbage Data
☐ Incorrect Formatting
☐ Nicknames/Aliases
☐ __________________
☐ __________________
☐ __________________
☐ __________________
☐ __________________
☐ __________________
☐ __________________


B) BASIC SYSTEM REQUIREMENTS

There will be some critical technical and practical information that may seem tedious but will be worth your time to collect and organize. Some components will only be relevant to the technical integrator working to get the tools installed, but others may be dealbreakers for certain applications. Recruit your technical department to provide the following:

SYSTEM PROFILE

CRM/ERP Systems ________________________________________________________________

Data Warehouse Platform (e.g. SQL Server, Oracle, etc) __________________________________

Data Feed Types (e.g. Excel, CSV, XML, etc.) _____________________________________________

Extract File Types (e.g. Excel, CSV, XML, etc.) ____________________________________________

USER PROFILE

The user base for the data quality tools you select will impact the needs and features of the application and will also influence the decision to purchase either desktop software (suitable for non-technical users) or an integrated version (operating at the database level and more appropriate for the administrator or other technical representative). Consider the following when identifying which users will be responsible for day-to-day interaction with the data quality solution:

☐ Marketing Department end-user
☐ Mail-house staff
☐ Admin level staff with limited technology training
☐ Database administrator
☐ Other: __________________________________________


C) UNDERSTANDING YOUR DATA QUALITY NEEDS

Once you have established a clear view of the tangible quality issues within your current database(s), it will be important to spend time considering the business needs of the organization and how cleaner data will enable you to make better business decisions. Is the impetus behind the project to decrease the marketing spend or improve targeting? Are there service issues related to poor data quality? Is the organization undertaking data migration or warehousing initiatives that require cleansing and integration of disparate data sources? As you seek to document the goals for your evaluation, consider these suggestions for developing an accurate picture of what your organization needs from a data quality solution:

• Look beyond the pain. In most cases, a specific concern will be driving the urgency of the initiative, but it will be well worth the effort to explore beyond the immediate pain points to other areas where data is essential. Plan to involve a cross-section of departments including IT, marketing, finance, customer service and operations to understand the global impact that poor data quality could be having on your organization.

• Look back, down and forward. Consider the data quality challenges you’ve had in the past, the ones you face today and the ones that have yet to come. Is a merger on the horizon? Is the company migrating to a new platform? Do you anticipate significant staffing changes? Looking ahead in this way will ensure that the investment you make will have a reasonable shelf-life.

• Look at the data you don’t have. As you review the quality of the data you have, also consider what’s missing and what information would be valuable to customer service reps or the marketing department. It may exist in another data silo somewhere that just needs to be made accessible, or it could require new data to be collected.

• Be the customer. Call the Customer Service Department and put them through their paces. Sign up for marketing materials online. Place an order on the website. Take good notes on the places where poor data impacts your experience and then look at the data workflow through fresh eyes.

• Draw out the workflow. Even in small organizations, there is tremendous value in mapping out the path your data takes through your business: where it is entered, used, changed, stored and lost. Doing this will uncover business rules that are likely impacting the data, departments with complementary needs and places in the workflow where improvements can be made (and problems avoided).

• Think big and small. Management and C-Level executives tend to think big. Data analysts and technical staff tend to think granularly, and departmental users usually fall somewhere in the middle. Ultimately, the best solution can only be identified if you consider the global, technical and strategic business needs.

The challenges with identifying, evaluating and implementing an effective data quality solution are fairly predictable, but problems almost always begin with incorrect assumptions about, and an incomplete understanding of, the overall needs of the organization. In some cases, the right data quality vendor can help you move through this process, but ultimately, failure to broaden the scope in this way can result in the purchase of a solution that does not meet all the requirements of the business.


BUSINESS NEEDS WORKSHEET

Technical Data Objectives

☐ Cleanse and standardize data as part of an existing data warehousing initiative
☐ Support enterprise data governance, MDM or other global BI initiatives
☐ Data enrichment & profiling
☐ Data integration and migration
☐ Eliminate unnecessary IT resource strain
☐ _____________________________________________________________________
☐ _____________________________________________________________________

Data Quality Objectives

☐ Basic single or two-file deduplication of files
☐ Matching of multiple records
☐ Address validation
☐ Front-end data capture
☐ Batch cleansing of records
☐ Automation of Data Quality Processes
☐ Establish a single, 360-degree customer view
☐ ______________________________
☐ ______________________________

Strategic Data Objectives

☐ Send more targeted communications based on customer mail preferences
☐ Reduce wasted advertising spend of inaccurate mailing lists
☐ Improve sales and checkout process (Web, Store, Call Center)
☐ Improve customer service with better access to global customer data
☐ Generate more accurate view of campaign ROI
☐ Develop a global demographic picture
☐ Automation and enforcement of approved business rules
☐ Remain in compliance with industry data requirements
☐ Reduce delivery complications and associated overhead
☐ Make informed operational and merchandising decisions
☐ Maintain a positive brand perception
☐ _____________________________________________________________________
☐ _____________________________________________________________________
☐ _____________________________________________________________________


SECTION 2: Defining Your Project Scope

A) EVALUATING PRODUCT FUNCTIONS

Data Quality product suites tend to span a broad range of functions, in varying combinations. While one company may do everything on a modular scale, some may only provide one or two functions. Yet others will work with partners that can carry out complementary tasks. Without a complete understanding of these “big buckets” of features and how they apply to your business needs, it’s easy to get confused or be subject to a biased opinion on what will work for you. Below is a brief description of the main functions offered by standard data quality packages, in order of where they typically occur in a process flow:

• Standardization
Many general ‘cleansing’ functions actually fall under the category of data standardization, including fixing misspellings, inconsistencies, transpositions and the like. Standardization also applies when moving data across columns, or adding state names, zip codes or titles in places where they are missing.

• Address Validation (Verification)
Matching contact data to standard Postal Address Files (PAF) or USPS and NCOA data to validate and update addresses is known as Address Validation (or Verification). Here again, the datasets will vary by country but the same process is employed and driven by the organization’s address matching engine.

• Data Enrichment
Another broad function includes expanding and enhancing your existing contact data with additional datasets. The variety of datasets is extensive and varies by region but could include names data, date of birth, length of residency, phone and fax numbers, SIC codes, geocoding data and more.

• Matching/Deduplication
One of the most basic functions of data cleansing software, standard deduplication involves matching records within a file or between multiple files for merging and purging duplicate records, identifying your best customers or a multiplicity of other reasons. There is a wide range of match strategies employed in deduplication, with as wide a variety of results. The critical thing to remember is that a simple count of duplicates, suppressions or records matched is essentially meaningless – it is the number of true and false matches that is significant.

• Record-Linking (Single Customer View)
Beyond basic data cleansing is a sophisticated matching process that allows you to ‘link’ specific records to one another, specifically for the purpose of creating a single master record (or golden record). This master record would include all the relevant data for a specific contact including mail preferences, transactions and customer service history. This process is sometimes considered the holy grail of data cleansing because it generates the elusive Single Customer View (or 360 Degree View).

The functional categories above represent all of the main data quality tasks an organization would need to perform. These tasks can be carried out by varying methods and in varying environments, and any vendor will provide a wide range of features to handle each of them. If you look back at the business objectives developed in Section 1-C, you will find that they align with one or more of these tasks.
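To illustrate why a raw count of matches says little on its own, here is a minimal Python sketch that compares the duplicate pairs a tool reports against a small hand-verified set. The function name, the record IDs and the two "tools" are hypothetical, and "precision" and "recall" are standard terms introduced here for convenience, not terms used in this Guide.

```python
def score_matches(reported_pairs, true_pairs):
    """Compare the pairs a tool reports as duplicates with a hand-verified set."""
    # Treat (a, b) and (b, a) as the same pair.
    reported = {frozenset(pair) for pair in reported_pairs}
    truth = {frozenset(pair) for pair in true_pairs}

    true_matches = reported & truth      # reported and genuinely duplicates
    false_matches = reported - truth     # reported but not duplicates
    missed_matches = truth - reported    # duplicates the tool did not find

    precision = len(true_matches) / len(reported) if reported else 0.0
    recall = len(true_matches) / len(truth) if truth else 0.0
    return {
        "reported": len(reported),
        "true_matches": len(true_matches),
        "false_matches": len(false_matches),
        "missed_matches": len(missed_matches),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical record IDs: both tools report three duplicate pairs.
verified = [(1, 2), (3, 4), (5, 6)]
tool_a = [(1, 2), (4, 3), (5, 6)]
tool_b = [(1, 2), (7, 8), (9, 10)]

result_a = score_matches(tool_a, verified)
result_b = score_matches(tool_b, verified)
```

Both tools report the same number of matches, yet in this made-up example the first finds every verified duplicate while the second finds one and adds two false matches. That is the distinction a simple count hides.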


B) UNDERSTANDING PROCESSING MODES

Another consideration beyond the main functions of data cleansing software is how those functions are carried out, as not every vendor will be able to handle all the applications. The main processing modes that you should consider are:

Batch (Existing Data)
Often this will be referred to as “batch data cleansing”, although this term can also be used for some of the other scenarios listed below. Here we’re talking about batch cleansing of data already in your database, to identify duplicates and incorrect or insufficient data and make appropriate corrections. This is a curative measure.

Batch (Data Load)
Batch processing is also used to match across files e.g. to match a new data feed against your existing database or data warehouse so that you can add the new records without creating new duplicates. Another example is to remove existing customers from a marketing list so that you can contact the non-customers on the list. Often, this process will be automated. Whether automated or not, this is a preventative measure.

Real time (Interactive)
Once you’ve got a clean database, it is far more effective to keep your Data Quality standards up by utilising appropriate tools at point of capture, rather than let new bad data enter the database. Here, we mean tools that work interactively, warning the person entering the data if the address is invalid or if the record they are trying to add is already on the database. Examples of real time data cleansing are address verification for a web inquiry form and duplicate prevention in a CRM system. This is a preventative measure.

Real time (Firewall)
In this mode, new records are captured but the person entering the data is not prompted to correct any problems – instead, the record is validated in real time but any errors are either corrected in the background, or are logged for manual attention off-line by someone else. An example of this is a web inquiry from a visitor to your web site which is checked against your existing database in the background, so that it can be flagged as a new or existing customer. This is a preventative measure.
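As a rough illustration of the Batch (Data Load) mode described above, the Python sketch below splits a new feed into records that are safe to add and records already known. It assumes records are dictionaries with hypothetical `name` and `postcode` fields and uses a deliberately crude exact-key comparison; commercial tools use far more sophisticated matching, so treat this as a sketch of the workflow rather than of any vendor's method.

```python
def match_key(record):
    """Build a crude comparison key: letters and digits only, lower-cased."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    postcode = "".join(ch for ch in record["postcode"].lower() if ch.isalnum())
    return (name, postcode)


def split_feed(existing_records, new_feed):
    """Separate a new feed into records to add and records already known."""
    existing_keys = {match_key(r) for r in existing_records}
    to_add, already_known = [], []
    for record in new_feed:
        key = match_key(record)
        if key in existing_keys:
            already_known.append(record)
        else:
            to_add.append(record)
            # Remember the key so duplicates inside the feed are caught too.
            existing_keys.add(key)
    return to_add, already_known


# Hypothetical data for illustration only.
database = [{"name": "Ann Smith", "postcode": "KT22 8DN"}]
feed = [
    {"name": "ann smith", "postcode": "kt228dn"},
    {"name": "Bob Jones", "postcode": "10001"},
    {"name": "Bob  Jones", "postcode": "10001"},
]

to_add, already_known = split_feed(database, feed)
```

The same split serves both examples in the text: `to_add` is what you would load without creating new duplicates, and removing `already_known` from a marketing list leaves the non-customers.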

With this background, the objective now is to identify what your ideal solution looks like based on the business objectives and the data quality functions you will need to achieve them. Remember to think ahead to your anticipated needs, both granularly and globally. Consider larger data projects, such as a planned data integration, that may affect what you need from the tools you invest in.

PROCESSING NEEDS:


C) SELECTING DESIRED PRODUCT FEATURES

Once you have made some of the broader decisions about your immediate business needs, the key functions you require and the methods in which you anticipate managing your data cleansing processes, your evaluation will turn to the granular features of the data quality tools you choose to evaluate. When it comes to features, we suggest putting them into two categories (or columns) - ‘Needs’ and ‘Wants’. This is a critical step because ‘Needs’ are not negotiable and will be a great way to quickly identify which applications you should put on your short list for evaluation, while ‘Wants’ are valuable for tipping the scale when two applications come close in value. In addition, ‘Wants’ also give you bargaining power in cases where features are modular.

Because there is often so much overlap in the broader data quality conversation and variation in terminology, we find it useful to discuss software features within the main functional headings previously established:

• Standardization
• Address Validation
• Data Enrichment
• Matching/Deduplication
• Record-Linking

Then the four processing modes:

• Batch (Existing Data)
• Batch (Data Load)
• Real time (Interactive)
• Real time (Firewall)

Before diving into the actual features list broken up accordingly, here are some other items to consider when developing your list of Required Features:

• Some companies use different terminology for the same feature. Make sure you fully understand those ‘proprietary’ phrases or processes so that when it comes time to evaluate features, you can do so fairly.

• Some data quality tools are modular and will offer features or sets of features in individual components with different price points and installations. Take note of which features are/are not included in the modules you are considering.

• Consider the applications or processes you use internally that may replicate part or all of a specific feature and how you will integrate the two, or where a new and improved application or process would be the best direction to go in.


FEATURES WORKSHEET

Standardisation Features
Need  Want  Feature
☐     ☐     Correct poorly structured and non-standard records
☐     ☐     Identify foreign records
☐     ☐     Flag inappropriate data in name and address
☐     ☐     Flag garbage or incomplete data
☐     ☐     Intelligent casing
☐     ☐     Salutation generation from names

Address Verification Capabilities
Need  Want  Feature
☐     ☐     Integrated verification of addresses against Postal Address Files/U
☐     ☐     Control over updates to postcode/address
☐     ☐     Update record with mail format address
☐     ☐     Split address completely into component parts

Data Enrichment Capabilities
Need  Want  Feature
☐     ☐     Append geocoding data
☐     ☐     Append consumer data
☐     ☐     Append business data

Matching and Deduplication Features
Need  Want  Feature
☐     ☐     Fuzzy matching
☐     ☐     Grading of matches
☐     ☐     Tuning of matching rules
☐     ☐     Ability to automate matching
☐     ☐     Manual review of matches
☐     ☐     Multiple levels of match in one pass
☐     ☐     Matching on non-standard data
☐     ☐     Matching allows for missing and inconsistent data
☐     ☐     Effective matching out-of-the-box
☐     ☐     Customisable matching reports
☐     ☐     Matching files in different formats

Record-Linking Features
Need  Want  Feature
☐     ☐     Grouping/linking of matches
☐     ☐     Master record identification
☐     ☐     Retain information from duplicate records
☐     ☐     Reassign orphaned records
☐     ☐     Real-time view across databases for inquiry and data capture


PROCESSING MODES WORKSHEET

Batch (Existing Data)
Need  Want  Feature
☐     ☐     Integrated into your database to clean up existing data
☐     ☐     Timely and efficient single file matching
☐     ☐     Timely and efficient address verification

Batch (Data Load)
Need  Want  Feature
☐     ☐     Load new batches of data
☐     ☐     Easy to load data in different formats
☐     ☐     Rapid matching of small batches of new data against a large master file
☐     ☐     Automatic scheduled operation of solution
☐     ☐     Production of standard management and exception reports

Real-Time (Interactive)
Need  Want  Feature
☐     ☐     Integrated into your database at point of capture
☐     ☐     Real-time feedback on data errors
☐     ☐     Rapid address entry using Postcode
☐     ☐     Intelligent inquiry to find exact matches

Real-Time (Firewall)
Need  Want  Feature
☐     ☐     Run on individual records entering the database

ADDITIONAL NOTES:


D) ESTABLISHING PROJECT PARAMETERS

While you are knee deep in functions, features, vendor searches and the like, don’t ignore the need for some practical planning so that when you are ready to start your evaluation, there are some strategies and guidelines in place to keep both your vendors and your organization on track. Of course, it will be important to be flexible as you go through the evaluation process, especially when it comes to moving parts like budget and timeframe, but having a plan and some goal parameters in place will be priceless and may mean the difference between getting the project off the ground or letting inertia win out.

Anticipated Budget

So how do you even begin to guesstimate what it should cost you to get the right solution in place? Two things: potential savings and average range. First, do the best you can to ballpark the potential cost savings of improving your data. In some cases, the vendor can help you with this process based on a data analysis. Typically, as many as 10% of the records in a database are duplicates. Assume you have a relatively modest duplicate rate of 5% and start there. Without getting scientific, try calculating wasted advertising spend, the resources needed to handle customer shipping complaints or how much MORE money you’d make if you had more control over your marketing. Second, just take a look at the high and the low end of vendors on the shortlist you will develop in Section 3. Rather than randomly calling a data quality organization and asking for a price, continue with your project, develop that shortlist and then create your price range based on the functions and features you need.

Timeframe

At the early stages, this will be more of an awareness than an actual goal, and it will be one of the areas, along with budget, that will evolve over the course of your evaluation. Be realistic about what you can expect here and seek input from vendors and your internal team to make sure you are not cutting yourself short. If you have internal business initiatives that will drive your goal date, such as an anticipated data migration project or large marketing initiative, you can work backwards from that date, but do make sure to budget time for all the key steps including:

• Internal planning
• Searching for vendors
• Initial review
• Demoing the shortlist
• Internal decision-making
• Negotiation
• Implementation and Training
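Returning to the budget ballpark described under Anticipated Budget, the wasted-advertising part of the estimate can be sketched in a few lines of Python. Only the 5% duplicate rate comes from this Guide; the function name, database size, cost per mail piece and number of mailings are hypothetical placeholders to be replaced with your own figures.

```python
def wasted_mailing_spend(record_count, duplicate_rate, cost_per_piece, mailings_per_year):
    """Rough yearly cost of mailing to duplicate records."""
    duplicate_records = record_count * duplicate_rate
    return duplicate_records * cost_per_piece * mailings_per_year


# Hypothetical inputs: 200,000 records, the Guide's modest 5% duplicate rate,
# 0.75 per mail piece (in your own currency) and 4 mailings per year.
estimate = wasted_mailing_spend(
    record_count=200_000,
    duplicate_rate=0.05,
    cost_per_piece=0.75,
    mailings_per_year=4,
)
```

With these made-up inputs the estimate comes to about 30,000 per year. It covers mailing waste only, so costs such as handling shipping complaints would need to be added separately.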

Review and Approval Team

This is a broader discussion in some cases as it overlaps with the developing of a Data Governance team, but the main objective is to make sure you are aware of the necessary influencers, decision-makers and budget approvers that will need to be part of this process. Knowing this early on is important and it is sometimes helpful to communicate this to your vendors so that they can work with you through the approvals process. This may mean requesting presentations to all influencers on the team, making demo software available to all the potential users, and asking the vendor to help you with documentation to help make the case for a C-Level executive.


Evaluation Strategy

With this phrase, we do not mean the Evaluation itself, but instead the process you will use to evaluate the applications selected. There are several options that you can take within this process, and knowing your strategy in advance will help you communicate expectations and guidelines to your vendors and, yet again, inform your internal staff and approvals team so that the process is orderly, streamlined and stays on track. Some considerations for this strategy include:

• To RFP or Not to RFP: One option, preferably decided at the outset, would be to distribute a Request for Proposals (RFP/RFQ) to a shortlist of vendors to help with your evaluation. This is common for state or government bids but can also be used as a valuable tool in the commercial sector. Aside from taking up a significant chunk of time, submitting a formal bid obligates you to perform a completely fair, balanced and unbiased evaluation that follows a set of rules and guidelines set out in the bid. This may mean that referrals, the unexpected and sheer gut instinct cannot play a part, which ultimately may mean you do not get to choose your preferred vendor.

• Demo Data or Real Data: Knowing this ahead of time as part of your strategy is critical because this will likely be the first question asked of you when making contact with a vendor. While we will always suggest that you evaluate a solution on your own data, in some cases this may not be 100% necessary or possible right away. You may be in the midst of a data migration project or could have such basic needs, such as strict address validation, that preparing your own data is not necessary. In either event, you should plan for this step in advance and prepare your sample data accordingly to do a thorough and efficient test of the software.

• Who is Driving the Ship? Business or Technology? This is the big question the Data Quality industry as a whole has been asking lately and it is relevant here because it will determine the shape of your evaluation. If you are from a business department but, after identifying your requirements, decide that the organization is likely to take an integrated approach, it may be best to hand off the lead role to a technology representative (or vice versa). Here again, the key is to ask the questions before starting the evaluation because knowing your strategy at the outset is half the battle.

Appropriate Documentation & Files

Lastly, there are some critical documents that you should plan to gather before and during this process, some of which this Guide will help you to plan for. A brief list includes:

• Request for Proposal (if appropriate) using the functional and feature requirements outlined here
• Required Features List (with columns outlined for your individual shortlist vendors)
• Demo Data
• Review/approval forms for the members of your team
• Budget Spreadsheet


SECTION 3: Conducting an Effective Evaluation

A) CREATING YOUR SHORTLIST

This sounds like an easy task but in reality, the current information quality industry is saturated with White Papers, Webinars, YouTube Channels and the like - all with different messages, focus areas, product features and terminology. Making sense of it can be a challenge to even the most DQ-savvy buyer, but if you’ve been following the steps up until this point, you should be able to easily employ some of the following best practices to narrow down a reasonable short list that is optimal for evaluation.

• FindingtheVendors. Some of this may be obvious but there are a few tricks to digging up the key vendors within the indus-try. Google is certainly your first good bet but remember to use varying search terms because different vendors use different terminology interchangeably. While you’re surfing, don’t just look for vendor sites but user groups, blogs and analyst pages as well, because these may reveal vendors that are not coming up in the searches.

• FunctionFirst. Once you have a name in hand, start your initial review by going back to your Functional Requirements and choosing vendors that can fill those needs. Don’t worry at this point about finding a vendor that does everything under one roof - that can be a deciding factor later on. For now, concentrate on choos-ing those that provide the majority of the Functional Requirements you are looking for.

• FeaturesSecond. Once you have your big list of vendors that are in your functional ballpark, start narrowing down your list based on the specific features within each category. Now is the time to remember your Needs vs Wants and abandon anyone who truly cannot service the basic necessities.

• Cross-Reference the Buzz. Industry hype is not the best way to choose the perfect vendor, but it is useful for eliminating companies from the competition based on awful press or truly negative customer reviews. Keep in mind that sometimes the very best product for the job may not be the one with the brightest lights. This is the place where you simply want to rule out companies based on clear signs that they cannot provide service.

• Add Yourself to the Shortlist. We recommend this step not because it is a good option, but because you are likely to consider it anyway. At some point in the process, someone will suggest internally that you already have the resources, or an initial price point will scare you into asking, ‘do we really need this anyway?’ We suggest looking at this step proactively, as though you are one of the vendors on your shortlist. In this way, you can truly evaluate your potential to carry out data quality initiatives internally.


B) DEVELOPING YOUR SAMPLE DATA

The first word of advice: use real data.

Many software trials come preinstalled with sample or demo data designed primarily to showcase the features of the software. While this sample data can give you examples of generic match results, it will not be a clear reflection of your own match results. This is why it is best to evaluate the software on your own data whenever possible. Using the guidelines below, we suggest identifying a real dataset that is representative of the challenges you typically see within your actual database. That dataset will tell you whether the software can find your more challenging matches, and how well it can do that.

DON’T... create a “fake” dataset from scratch. This is not advisable because it could include unnatural scenarios that present unreal challenges to the software and have no relevance to its fitness for purpose on your real data.

For fuzzy matching features, consider whether the data that you test with includes these situations:

• phonetic matches (e.g. Naughton and Norton)
• reading errors (e.g. Horton and Norton)
• typing errors (e.g. Notron, Noron, Nortopn and Norton)
• one record has title and initial and the other has first name with no title (e.g. Mr J Smith and John Smith)
• one record has missing name elements (e.g. John Smith and Mr J R Smith)
• names are reversed (e.g. John Smith and Smith, John)
• one record has missing address elements (e.g. one record has the village or house name and the other address just has the street number or town)
• one record has the full postal code and the other a partial postal code or no postal code
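One way to check whether your sample already contains near-miss surnames like those listed above is a quick similarity scan. The sketch below is only an aid for building the test set: it uses Python's standard `difflib` as a stand-in measure, not the matching method of any vendor, and the function names and the 0.7 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 character similarity ratio, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def near_miss_pairs(surnames, low=0.7):
    """List pairs of distinct surnames that are similar but not identical.

    `low` is an illustrative threshold; tune it for your own data.
    """
    unique = sorted(set(s.strip() for s in surnames))
    pairs = []
    for a, b in combinations(unique, 2):
        score = similarity(a, b)
        if low <= score < 1.0:
            pairs.append((a, b, round(score, 2)))
    return pairs


if __name__ == "__main__":
    for a, b, score in near_miss_pairs(["Norton", "Notron", "Horton", "Smith"]):
        print(a, b, score)
```

A character-based measure like this tends to surface typing and reading errors (Notron, Horton) but scores a phonetic pair such as Naughton and Norton noticeably lower, which is one reason to confirm that each type of variation is present in your sample rather than relying on a single check.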

When matching company name data, consider including the following challenges:

• acronyms, e.g. IBM, I B M, I.B.M., International Business Machines
• one record has missing name elements, e.g.
    1. The Crescent Hotel, Crescent Hotel
    2. Breeze Ltd, Breeze
    3. Deloitte & Touche, Deloitte, Deloittes.
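To see why these company name cases are hard, it can help to try a crude normalization yourself before the evaluation. The following is a minimal sketch under stated assumptions: the `NOISE_WORDS` list and the function names are hypothetical, and real matching software will use far richer rules.

```python
import re

# Hypothetical list of words to ignore; extend it for your own data.
NOISE_WORDS = {"the", "ltd", "limited", "inc", "co"}


def normalize_company(name: str) -> str:
    """Reduce a company name to a crude comparison key."""
    words = re.findall(r"[a-z0-9]+", name.lower().replace("&", " and "))
    merged = []  # each entry: [text, built_from_single_letters]
    for word in words:
        if word in NOISE_WORDS:
            continue
        single = len(word) == 1
        # Join runs of single letters so "I B M" and "I.B.M." become "ibm".
        if single and merged and merged[-1][1]:
            merged[-1][0] += word
        else:
            merged.append([word, single])
    return " ".join(text for text, _ in merged)


def acronym(name: str) -> str:
    """Initial letters of the normalized name."""
    return "".join(word[0] for word in normalize_company(name).split())


def likely_same(a: str, b: str) -> bool:
    """True if the keys are equal or one key is the other's acronym."""
    key_a, key_b = normalize_company(a), normalize_company(b)
    return key_a == key_b or key_a == acronym(b) or key_b == acronym(a)
```

This sketch pairs up the IBM variants, The Crescent Hotel with Crescent Hotel, and Breeze Ltd with Breeze, but it does not link Deloitte & Touche, Deloitte and Deloittes. Cases like that last one are exactly what your test data should put in front of the tools you evaluate.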


You should also ensure that you have groups of records in which the fields that match exactly vary from pair to pair within the group. The example tables later in this section illustrate this.

If you don’t have these scenarios all represented, you can doctor your real data to create them, as long as you start with real records that are as close as possible to the test cases and make one or at the most two changes to each record. In the real world, matching records will have something in common – not every field will be slightly different.

With regard to size, it’s better to work with a reasonable sample of your data than a whole database or file, otherwise the mass of information runs the risk of obscuring important details and test runs take longer than they need to. We recommend that you take two selections from your data – one for a specific postal code or geographic area, and one (if possible) an alphabetical range by last name. Join these selections together and then eliminate all the exact matches – if you can’t do this easily, one of the solutions that you’re evaluating can probably do it for you.

Ultimately, you should have a reasonable size sample without so many obvious matches, which should contain a reasonable number of fuzzier matches (e.g. matches where the first character of the postal code or last name is different between two records that otherwise match, matches with phonetic variations of last name, etc.)
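The sampling approach described above (two selections joined together, with exact matches eliminated) can be sketched in a few lines. This is an illustration only: it assumes your records are available as dictionaries with the hypothetical field names `last_name` and `postal_code`, and that an "exact match" means every field is identical.

```python
def build_sample(records, postal_prefix, name_start, name_end):
    """Combine a geographic selection and an alphabetical selection,
    keeping one copy of each exactly identical record.

    `records` is an iterable of dicts with string values; the keys
    'last_name' and 'postal_code' are assumed field names.
    """
    prefix = postal_prefix.upper()
    lo, hi = name_start.upper(), name_end.upper()

    by_area = [r for r in records if r["postal_code"].upper().startswith(prefix)]
    by_name = [r for r in records if lo <= r["last_name"][:1].upper() <= hi]

    seen = set()
    sample = []
    for rec in by_area + by_name:
        key = tuple(sorted(rec.items()))  # field-for-field identity
        if key not in seen:
            seen.add(key)
            sample.append(rec)
    return sample
```

For example, `build_sample(records, "KT22", "A", "C")` would keep everything in the KT22 area plus every last name starting with A to C, leaving the fuzzier near-duplicates in place for the software under evaluation to find.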

URN   Name          Email               Telephone
101   John Smith    [email protected]
144   John Smith    [email protected]   211-456-8352
298   John Smith                        211-456-8352
144   John Smith    [email protected]   211-456-8352

URN   Name          Email               Telephone
101   Juan Marcos   [email protected]   646-498-3055
144   Juan Marcos   [email protected]   211-456-8352
298   Juan Marcos   [email protected]   646-498-3055
144   Juan Marcos   [email protected]   211-456-8352

There are two clusters here, one containing three records with the same email address and another one containing three records with the same phone number.

In both of these examples, clusters based on email address and the clusters based on phone number should all be grouped into one set by the matching software.
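The grouping behaviour described for these examples, where records linked by email and records linked by phone end up in one set, is a transitive grouping. A minimal sketch of that idea, using a union-find structure, is shown below. The field names `urn`, `email` and `phone` are assumed for illustration, and the sketch only links on exact values, which is far simpler than what matching software does.

```python
from collections import defaultdict


def group_records(records, keys=("email", "phone")):
    """Group records that share any non-empty key value, transitively."""
    parent = {r["urn"]: r["urn"] for r in records}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, value) -> URN of first record with it
    for rec in records:
        for key in keys:
            value = rec.get(key)
            if not value:
                continue
            if (key, value) in first_seen:
                union(first_seen[(key, value)], rec["urn"])
            else:
                first_seen[(key, value)] = rec["urn"]

    groups = defaultdict(list)
    for rec in records:
        groups[find(rec["urn"])].append(rec["urn"])
    return sorted(sorted(g) for g in groups.values())
```

With three records shaped like the first table (one with only an email, one with email and phone, one with only the phone), URN 144 acts as the bridge and all three URNs come back as a single group. The email value you pass in is a placeholder of your own choosing.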


C) EVALUATING SPECIFIC VENDORS AND TOOLS

If you made it past all the due diligence it takes to get to this point, you are in a great position to conduct an effective evaluation of your data quality vendor shortlist. It means you understand your current data challenges, you have documented your basic system, made decisions on the functions and features you require, identified a relevant shortlist of vendors and have established all the project parameters and strategy you need to guide you through the process. It has all been preparation for this stage. So you are probably asking yourself: now what?

When it comes to actually performing the evaluation, you can either download a free trial and evaluate the software yourself or engage the vendor to walk you through the process. While it may seem tempting to conduct an initial review yourself, it is not advisable, because the best data quality software has a plethora of features and options designed to help you deliver the best possible matches. The only way to truly identify these options and learn how to fine-tune them to meet your individual data quality objectives is to engage a knowledgeable salesperson and have them walk you through the software. During this process, you will also likely be introduced to members of the technical support or integration teams, which will give you further exposure to the way the company works and the level of support it can provide with the matching process. So the bottom line is to engage a company representative early and often during your evaluation to properly determine the software’s true matching capabilities.

Vendor    Tool(s)    Rep    Contact Info


D) INTERPRETING THE RESULTS

When it comes to evaluating the results, remember that a simple count of duplicates, suppressions or records matched to your Postal Address File (PAF) or USPS data is meaningless – it is the number of true and false matches that is significant, so it is important to be able to view all the matches found. When deduping, suppressing or matching across files, a good way of comparing results from two systems is as follows:

1. Remove all the matches from the file to be cleaned using system A.
2. Perform the same level of matching using system B and see what matches system B finds in the supposedly “clean” file.
3. Review each match (or a reasonable proportion) found by system B but not found by system A and count the number of true matches, the number of false matches and the number that cannot be classed objectively as definitely true or definitely false.
4. Repeat this process the other way round, i.e. clean the raw file using system B first and then see what matches system A finds in the “clean” file.
5. Count the number of true, false and debatable matches in this file.
6. Compare the counts in the two “clean” files.
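The bookkeeping for the comparison steps above can be kept in a spreadsheet, or sketched in code as below. This is a simplification under stated assumptions: it supposes each system can export its matches as pairs of record IDs, and it treats "matches B finds in A's clean file" as the pairs B reported that A did not. The function names and the verdict labels are hypothetical, and the verdicts themselves still come from a person reviewing each pair.

```python
from collections import Counter


def pair(a, b):
    """Order-independent match pair of two record IDs."""
    return frozenset((a, b))


def only_found_by(matches_x, matches_y):
    """Pairs that system X reported and system Y did not."""
    return set(matches_x) - set(matches_y)


def tally(pairs, verdicts):
    """Count pairs by reviewer verdict: 'true', 'false' or 'debatable'.

    Pairs with no recorded verdict are counted as 'unreviewed'.
    """
    return Counter(verdicts.get(p, "unreviewed") for p in pairs)


if __name__ == "__main__":
    system_a = {pair(1, 2), pair(3, 4)}
    system_b = {pair(1, 2), pair(5, 6), pair(7, 8)}
    verdicts = {pair(5, 6): "true", pair(7, 8): "false", pair(3, 4): "debatable"}

    print("B only:", dict(tally(only_found_by(system_b, system_a), verdicts)))
    print("A only:", dict(tally(only_found_by(system_a, system_b), verdicts)))
```

Comparing the two tallies side by side gives the counts called for in the final step, and keeping an explicit "debatable" bucket avoids forcing a yes or no on pairs that cannot be judged objectively.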

It may be that your business requirement places more emphasis on a high match rate and that a certain level of false matches is acceptable. Alternatively, keeping the false match count to a minimum or even eliminating false matches entirely may be the overriding objective. Of course, if one system wins whichever criterion you use, the choice is easy. If not, and one system finds more true matches but also more false matches than the other, you should be able to experiment with the matching options to try to reduce the number of false matches, and then repeat the process outlined above. It is likely that you will need to involve the vendor’s support team to tune the matching, which also gives you the opportunity to see just how effective the support is.

When matching to a PAF for address verification, you can adopt a similar approach, but checking the results is more time-consuming, as you need some independent way of looking up the addresses that have been matched by one system but not the other – the postal authority usually provides an online lookup facility, but sometimes the number of daily lookups is limited.

One final trick concerns evaluation using the demo data supplied with each system – you would expect the system to work well on its own demo data files, but you could also try matching the demo data file from system A in system B and vice versa. These tests are much easier to conduct when you have reduced your shortlist to two solutions.

ADDITIONAL NOTES:


US HEADQUARTERS

helpIT systems inc.
51 Bedford Road
Suite 9
Katonah, New York 10536

Tel: (866) 332.7132
Fax: (914) 232.1429
Email: [email protected]: (866) 628.2448
Email: [email protected]

UK HEADQUARTERS

helpIT systems ltd.
15-17 The Crescent
LEATHERHEAD
KT22 8DY

Tel: +44 (0) 1372 360070
Fax: +44 (0) 1372 360081
Email: [email protected]: +44 (0) 1372 225904
Email: [email protected]

Registered in England - Company No. 02007292 - VAT No. 564228340


Get Cleaner Data. Make Better Decisions.

Today, more than ever, good business decisions depend on accurate data. Bad data means customer service suffers, opportunities are missed and marketing spend is wasted. Clean and accurate data gives you the advantage of knowing your customers so you can service them well, market to them appropriately and drive greater sales.

Unfortunately, most data quality initiatives are limited to simply checking the boxes. That is, they make shallow improvements to the data but never actually offer any genuine business value.

Welcome to helpIT systems.

Armed with unparalleled intelligent match technology, a deeply sophisticated knowledgebase and streamlined address validation driving both front-end and batch cleansing solutions, helpIT systems goes beyond just checking the boxes. For more than 20 years we’ve been helping customers trust their data so they can use it to strengthen their business. Isn’t that what data quality is all about?

Don’t just check the boxes. Demand more. Expect more.