
Creating and Working with TS Quality Projects

Version 10.5 October 2006

Opening this package indicates your acceptance of the terms and conditions of the Harte-Hanks license agreement. The customer acknowledges and agrees that (a) the System and all related documentation are confidential trade secrets of Harte-Hanks or Harte-Hanks licensors and (b) title to and intellectual property rights in the System and related documentation (including without limitation all copyright, trademark, trade secret and patent rights) are and shall remain the confidential proprietary property and information of Harte-Hanks and Harte-Hanks licensors.

The customer shall use the System only in accordance with this Agreement. The customer shall not disclose, copy, or reproduce any portion of the System or documentation in any form to any third person without the prior written consent of Harte-Hanks, nor allow third parties to do the same. The customer shall keep the System and all confidential information in the strictest confidence.


Trillium Software System® is a registered trademark of Harte-Hanks. UNIX is a registered trademark of UNIX System Labs, Inc. AIX, AS/400, CICS, OS/390, RS-6000, and NUMA-Q are registered trademarks of International Business Machines Corporation. HP-UX is a registered trademark of Hewlett-Packard Company. Windows NT, Windows 98, Windows 2000, Windows XP are registered trademarks of Microsoft Corporation. Solaris and Java are registered trademarks of Sun Microsystems. Unisys is a registered trademark of Unisys Corporation. ZIP Code, ZIP +4 and CASS are registered trademarks of the U.S. Postal Service. PAF is a registered trademark of the Royal Mail. InstallShield is a registered trademark and service mark of InstallShield Corporation. All other brand names and products are trademarks or registered trademarks of their respective companies.

Copyright © Trillium Software® a division of Harte-Hanks, Inc. 2006 All rights reserved.

CHAPTER 1 Introduction
  Sample Project

CHAPTER 2 Working with a Project
  Types of Projects
  Using the Control Center
  Start the Control Center
  Set Up the Control Center
  Creating a Project
  Understanding the Control Center Features
  Project Panel
  Project Viewer
  Step Viewer
  Using the Data Flow Architect
  Graphics View
  List View
  Using a Project Step
  The Data Dictionary Language (DDL)
  Methods of Creating a DDL
  Creating a DDL Using the DDL Editor
  Creating a DDL in a Text Editor
  Type Keyword

CHAPTER 3 Investigating Your Data
  View Data Using the Data Browser
  View DDLs Using the DDL Editor
  Analyze Data Using TS Discovery
  Identify the Problems with Data

CHAPTER 4 Using the Global Steps
  Using the Global Data Router
  Input and Output Settings
  Process Settings
  Run the Global Data Router and View Results

CHAPTER 5 Cleansing Your Data
  Using the Transformer
  Input and Output Settings
  Using Multiple Input Files to Create an Output DDL
  Process Settings
  Conditionals
  Syntax
  Operators in Conditional Statements
  Operators for Asian Characters
  Build a Conditional Statement
  Select or Bypass Records
  Additional Settings
  Run the Transformer and View Results

CHAPTER 6 Standardizing Your Data
  Using the Customer Data Parser
  Understanding Parsing Logic Flow
  How the Customer Data Parser Identifies Business Names
  CDP Parsing Process
  Customer Data Parser for China, Japan, Korea, and Taiwan
  PREPOS
  Input and Output Settings
  Process Settings
  Additional Settings
  Run the Customer Data Parser and View Results
  Analyze Results
  Statistics File
  Using the Business Data Parser
  BDP Parsing Process
  Additional Settings
  Run the Business Data Parser and View Results

CHAPTER 7 Tuning the Parsing Rules
  Understanding the Parser Definitions Tables
  Standard and User Definitions Tables
  Syntax of Definitions
  Synonym
  Special Entries
  Conventions in Parsing Customization
  How to Customize the Parser Definition Tables for Japan
  Clue Table
  Name Tables
  jp_bnp_name.txt
  jp_bnp_name_h.txt
  jp_pnp_name.txt
  Using the Parser Customization Editor
  View a Standard Definitions Table
  View and Correct City Problems
  View and Correct Pattern Problems
  Save the Entries
  Re-Run Customer Data Parser
  View Errors in Parsing Customization

CHAPTER 8 Analyzing Single Data
  Using the TS Quality Analyzer
  Start the TS Quality Analyzer
  Data Entry and Cleansing
  Advanced Details
  Matching
  Organize Database

CHAPTER 9 Enriching Your Data
  Sorting for the Postal Matcher
  Input and Output Settings
  Process Settings
  Additional Settings
  Run the Sorting Utility and Check Results
  Using the Postal Matchers
  Input and Output Settings
  Process Settings
  Additional Settings
  Match Levels
  Dual Address Information
  Browsing the Postal Directory
  City Level Directory
  Street Level Directory
  Street Details

CHAPTER 10 Linking Your Data
  Using the Window Key Generator
  Input and Output Settings
  Process Settings
  Run the Window Key Generator and View Results
  Sorting the Record by the Window Key
  Input and Output Settings
  Process Settings
  Run the Sorting Utility and Check Results
  Using Relationship Linker
  Linking Examples
  Window Linking
  Run the Relationship Linker and View Results
  Reference Linking
  Run the Relationship Linker and View Results

CHAPTER 11 Tuning the Linking Rules
  Using the Relationship Linker Results Analyzer
  View the Linking Results
  Edit Fields to Display
  Save Fields to Display
  View Records in a Range
  Using the Relationship Linker Rule Editor
  View the Linking Rules
  Customize the Field and Pattern Lists
  Re-Run the Relationship Linker and View Results
  Using the Data Comparison Calculator

CHAPTER 12 Selecting the Best Record
  Using the Create Common Utility
  Input and Output Settings
  Process Settings
  Additional Settings
  Run the Create Common and View Results
  Create Common Decision Routines
  Decision Routine Selections for a Single Field

CHAPTER 13 Manipulating Your Data
  Using the Data Reconstructor
  Rules File
  Input and Output Settings
  Settings for the Data Reconstructor
  Setting the Rules File
  Setting the Use Rule
  Additional Settings
  Run Data Reconstruction and View Results
  Bringing the Data Together
  Add a Global Transformer step
  Input and Output Settings
  Process Settings
  Run Transformer and View Results

CHAPTER 14 Packaging Projects
  Batch Script
  Create a Script
  Edit a Script
  Run a Script
  Create Multiple Batch Files
  Exporting/Importing Projects
  Export Projects
  Import Projects
  Import Projects from Windows to UNIX
  Real-Time Processing
  The Director
  Moving From Batch to Real-Time
  Linking Single Record Using the TS Quality Analyzer

CHAPTER 15 Working from the Command Line
  Executing TS Quality Modules
  Syntax
  Program Names

CHAPTER 16 Working with the TS Quality Utilities
  File Display Utility
  Input and Output Settings
  Outer Key and Inner Key
  Title and Delimiters
  Field Settings
  File Update Utility
  Match Keys and Fields
  Input and Output Settings
  Match Key Settings
  Transaction Output Settings
  Frequency Count Utility
  Input and Output Settings
  Count Settings
  Merge Split Utility
  Input and Output Settings
  Using Multiple Input Files to Create an Output DDL
  Merge Files
  Split a File
  Merge and Split Files
  Resolve Utility
  Input and Output Settings
  Link Field
  Set Selection Utility
  Input and Output Settings
  Select Records
  Sort Utility

CHAPTER 17 Customizing the Control Center
  Changing the Control Center Display Settings

APPENDIX A The Data Dictionary Language and DDL Types
  The Data Dictionary Language
  Data Dictionary Language (DDL) Types
  Encoding (Code Page)
  Trillium Types
  Date Format
  CLASS Keyword

APPENDIX B Parser Review Code
  Parser Results
  Parser Completion Codes (CDP/BDP)
  Customer Data Parser Review Code/Review Groups
  Review Group Hierarchy
  Business Data Parser Review Code
  Customer Data Parser Review Codes/Review Groups for Asia-Pacific Countries


CHAPTER 1 Introduction

This book is intended for users who wish to learn how to use TS Quality. It provides step-by-step instructions to set up a project and process data. The book assumes that the users have installed TS Quality Server, TS Quality Client, TS Quality Country Template Projects and Postal Tables according to Installing TS Quality, and read the introductory book, Getting Started with TS Quality.

This book covers the basic functions of TS Quality, but users should also consult companion materials, such as TS Quality Reference Guide and TS Quality Online Help to utilize the full capabilities of TS Quality.

See Getting Started with TS Quality for the complete list of TS Quality documentation and materials.


Sample Project

In this book, a global sample project (TMT project) is used to illustrate various TS Quality functions. The TMT (TrilMedTech) project contains customer data from the United States, United Kingdom, Canada and Germany. The record data consists of typical business database fields:

• Customer business name
• Contact name
• Phone number
• Address information
• Product information
• Various dates
• Account representative
• Account status
• Customer identification numbers

The goal of this sample project is to create a consolidated customer view and to eliminate the poor data quality and redundancy in the sample data. Through this project, you will complete several tasks:

• analyze the data and identify issues
• cleanse and standardize data elements
• enrich address information and identify duplicate records
• link duplicate records
• package processes and create a batch file

At the conclusion of this initial batch process, the output file will contain one contact name per business location.
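To make this end goal concrete, the following minimal Python sketch shows what "one contact name per business location" means for a handful of records. It is an illustration only and does not reflect how TS Quality links records: the field names, the sample values, and the naive normalization (uppercasing and collapsing whitespace) are all assumptions made for this example.

```python
# Illustrative only: keep one contact per business location.
# Field names and the normalization rule are hypothetical, not TS Quality behavior.

def location_key(record):
    """Build a crude comparison key from business name and address."""
    parts = (record["business"], record["address"])
    return tuple(" ".join(p.upper().split()) for p in parts)

def consolidate(records):
    """Return the first record seen for each distinct location key."""
    best = {}
    for rec in records:
        best.setdefault(location_key(rec), rec)
    return list(best.values())

records = [
    {"business": "Acme Medical", "address": "10 Main St", "contact": "J. Smith"},
    {"business": "ACME  medical", "address": "10 main st", "contact": "John Smith"},
    {"business": "Acme Medical", "address": "22 Oak Ave", "contact": "A. Jones"},
]

consolidated = consolidate(records)
print(len(consolidated))
```

The first two records differ only in case and spacing, so they collapse to a single location, while the third has a different address and is kept. Real data needs the parsing, postal matching and linking rules described in the later chapters.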



CHAPTER 2 Working with a Project


In order to process your data, we strongly recommend that you first create a project. A project includes a set of steps (core modules) for centralized access and allows you to manage data processing tasks easily. Projects are created in the Control Center, the graphical user interface. Within a project, you can run processes, view data, create and edit DDLs, modify settings, analyze output and tune the overall process. Projects within the Control Center are mainly used to create and test batch process flows for later use in a production environment.

This chapter focuses on these topics:

• Project types
• Starting and setting up the Control Center
• Creating and working with projects

For an overview of the TS Quality Control Center and projects, refer to Getting Started with TS Quality.



Types of Projects

A project is a combination of one or more modules and tasks that process a particular set of data in a job flow. Each module in a project is called a step. A project includes all required data files, DDL files, settings files, output, statistics files, user-defined tables and batch scripts for modules. Within a project, you can run the entire job flow, from the Transformer to the Relationship Linker, or only part of the flow.
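The idea of running the entire job flow or only part of it can be pictured with a short Python sketch. This is a conceptual model, not the Control Center's implementation: the `run_flow` function and the way steps are represented are assumptions, and the step names are simply modules mentioned in this book, listed in an illustrative order.

```python
# Conceptual model of a project as an ordered list of steps.
# run_flow and the step representation are hypothetical.

STEPS = [
    "Transformer",
    "Customer Data Parser",
    "Postal Matcher",
    "Window Key Generator",
    "Relationship Linker",
]

def run_flow(steps, first=None, last=None):
    """Return the steps that would run, from `first` to `last` inclusive."""
    start = steps.index(first) if first else 0
    stop = steps.index(last) + 1 if last else len(steps)
    if start >= stop:
        raise ValueError("first step must not come after last step")
    return steps[start:stop]

# Entire job flow, from the Transformer to the Relationship Linker.
full_run = run_flow(STEPS)

# Only part of the flow.
partial_run = run_flow(STEPS, first="Customer Data Parser", last="Postal Matcher")
print(partial_run)
```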

There are two types of projects:

• Standard Project - a basic project which includes predefined modules
• Custom Project - a complex project for advanced users

The Create New Project Wizard will guide you through creating a project. You will be prompted to select a type at the beginning of the Wizard. Both standard and custom projects may later be modified by adding and deleting steps, or can be customized by adding user-defined components.



Using the Control Center

Start the Control Center

To start the Control Center

1. Double-click the TS Quality v10.5 Server icon on the desktop or select Start, Programs, Trillium Software System, TS Quality, v10.5, Start Server.

2. Double-click the TS Quality v10.5 Control Center icon on the desktop or select Start, Programs, Trillium Software System, TS Quality, v10.5, Control Center. This starts the TS Quality Client and the Control Center.

3. The Start Up screen appears.

Figure 2.1 Start Up Screen

Make sure that TS Quality Server has been configured correctly. Refer to Getting Started with TS Quality for server configuration.



The Control Center’s main window is behind the Start Up screen. The main window contains tool bars and a tools palette to give you quick access to the most commonly used tools and applications.

Refer to Getting Started with TS Quality for an overview of Control Center Tools on the Tools Palette.

Figure 2.2 Control Center Main Window

Set Up the Control Center

When you start the Control Center for the first time, you should set up General Preferences for the basic Control Center settings.

General Preferences include several options:

• Tools Palette
• Work Area
• Tool Bar
• Main Menu
• Startup options
• Default project directories
• Project input staging area
• Location of the Online Help directory and the Web browser path
• Editors and statistics viewer programs
• TS Discovery launch directory
• Text and color used within the Control Center

To set up General Preferences

1. Select Setup from the main menu.

2. Select Preferences. There are two tabs, General and Display.

Figure 2.3 General Preference



3. The General tab allows you to decide which applications or functions to launch upon starting the Control Center. Select or specify options based on the table below:

Option
    Description

On Startup
    Determines how the Control Center handles projects upon startup.
    • Open the last project - The project that you were working on in your previous session automatically opens upon startup.
    • Default - No projects are launched upon startup.

Other Startup Options
    Select one or more of these check boxes to determine which applications or features will be displayed upon startup.
    • Show Session Viewer - The Session Viewer opens upon startup.
    • Show Toolbar - The Toolbar is displayed upon startup.
    • Show Tool Palette - The Tool Palette is displayed upon startup.
    • Show Startup Page - The Start Up screen is displayed upon startup.
    • Automatically Backup Projects (.prj only) - When checked, a backup file of your .PRJ file is automatically created. Checking this option does NOT back up your entire project. It simply creates a copy of your main .PRJ file.

Default Project Directory
    Enter the directory where project and step files will be stored.
    Default: C:\TrilliumSoftware\tsq10r5s\mynewdir

Input Staging Directory
    Enter the directory where input data files for the project or step will be stored.
    Default: C:\TrilliumSoftware\tsq10r5p

Help Directory
    Enter the directory where Help files are stored.
    Default: C:\TrilliumSoftware\tsq10r5c\doc

My Editor
    Enter the path and executable file of your text editor to display and edit text files within the Control Center.

My Statistics Viewer
    Enter the path and executable file of the application used to display statistics files within the Control Center.

My browser
    Enter the path and executable file of your Internet browser, used to display on-line documentation under the Help Menu.
    Example: C:\Program Files\Internet Explorer\IEXPLORE.EXE
    In order to access the online manuals, you must specify a default web browser.

Discovery Launch Directory
    Enter the directory path used to launch TS Discovery.

4. Click OK.

See “Changing the Control Center Display Settings” on page 17-3 for display settings.

To get Help

Once you have specified a web browser in Control Center Preferences, you may view the online help manuals.

1. Select Setup, Preferences.

2. On the General tab, set My browser to:
   C:\Program Files\Internet Explorer\IEXPLORE.EXE

3. Select OK to close the Preferences window.

4. From the main menu, select Help. The TS Quality option opens the home page of the TS Quality documentation set.

5. The TS Quality Control Center Help opens the documentation for the Control Center.

6. TS Quality on the Web will automatically connect you to the trilliumsoftware.com website if you are connected to a network. Once on the website, you can access technical support, software upgrades and downloads, educational offerings and more.

7. Program-specific help is also available on the Advanced tab of each program step.

If you are a new user, be sure to register on the www.trilliumsoftware.com website for a wealth of technical user information and support.



Creating a Project

The Control Center allows you to create a standard or a custom project. The standard project option is recommended for new users and may later be modified to meet your specific data cleansing needs. The custom project option is used to create a more complex project and is recommended for more experienced users. The Project Wizard will guide you through the project creation process.

In order to create a TS Quality project you will need certain information:

The name and location of your input data file(s). The input data file(s) should be either:
• a fixed field file or
• a delimited file

The name and location of your input Data Dictionary Language (DDL) file(s). The input DDL file(s) should be in either:
• XML format (.ddx) or
• text format (.ddt)

See “The Data Dictionary Language (DDL)” on page 2-33 for detailed information on DDL files.

To create a project, follow this process:

• Select a project type
• Specify project settings
• Specify input data and input DDL files
• Set up name and address format
• Review the project summary



To select a project type

1. From the main menu select File, New Project. The Create New Project Wizard appears.

2. On the Choose Project Type window, select either the Create a standard project option or the Create a custom project option.

3. Select Next.

4. In the Choose Project Option window, select one of the following options:

Option
    Description

Standardize
    Identifies, verifies and normalizes data.

Standardize and Enrich
    Identifies, verifies and normalizes data. Improves data using the Postal Matchers.

Standardize, Enrich and Link
    Identifies, verifies and normalizes data. Improves data using the Postal Matchers. Groups data by identifying relationships and by applying specific linking rules.

Other Custom Process
    Include separate components comprising the options above.

To specify project settings

1. Select Next. On the Specify Project Settings window, configure the following settings:

Settings
    Description

Project Name
    Name of the project.

Project Directory Path
    Project location on the server.
    You can create a project anywhere but it must be located on the server where the TS Quality Server application has been installed.

Single or Multiple Country Project
    Specify whether the project contains data from one or multiple countries.

Input File’s Country of Origin
    Select a country for your input data.
    If you selected Multiple-Country (Global) Project in the option above, this option is not available.

2. Select Next.

Figure 2.4 Project Settings

Multi-country project

3. If you select Multiple-Country (Global) Project, follow these steps. If not, go to step 4.

The Select Global Project Countries window will indicate what country template projects are installed on the server. Select all countries you are using, and Add them to the box on the right. Use the CTRL key to make multiple selections.

Specify if you are using a single input file or multiple country input files. If you are using multiple country input files, you must select define input files now or define input files later. If you define input files now, provide the input file name, format, and DDL in the Specify Multiple Inputs window. Click Next.

Figure 2.5 Select Global Project

To specify input data and input DDL

1. On the Specify Input Data and Format window, use the File Chooser to select the input data file name.

2. Specify whether the input file format is Fixed or Delimited. If the file is delimited, select the delimiter from the drop-down list and define whether the input file has a header or not.

If you don’t have a DDL file for your delimited input, it will be created automatically using the header as field names.

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.



3. Use File Chooser to enter the Data Dictionary Language (DDL) file name. Select Next.

Figure 2.6 Specify Input Data and Format

4. If you are creating a custom project, the Select Project Components window now appears. Select the desired project components. The order of your selections will determine the sequence of steps in your project. Select Next. (If you are creating a standard project, skip this step and go to the next step.)
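Step 2 above notes that when a delimited input has no DDL, one is created using the header as field names. As a rough illustration of how a header row can supply field names, the Python sketch below reads the first line of delimited text and lists the names it finds. It does not produce a DDL and is not the Wizard's actual logic: the `DELIMITERS` mapping, the `header_fields` function, and the sample data are assumptions for this example.

```python
import csv
import io

# Delimiter names as listed in the Wizard, mapped to characters (assumed mapping).
DELIMITERS = {"Tab": "\t", "Space": " ", "Semicolon": ";", "Comma": ",", "Pipe": "|"}

def header_fields(text, delimiter_name):
    """Return the field names found on the header line of delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[delimiter_name])
    header = next(reader)
    return [name.strip() for name in header]

# Hypothetical pipe-delimited input with a header row.
sample = "cust_id|business_name|contact_name\n1001|Acme Medical|J. Smith\n"
fields = header_fields(sample, "Pipe")
print(fields)
```

Checking the header this way before running the Wizard can help confirm that the chosen delimiter splits the file into the fields you expect.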

If you are using a delimited file as input in the Wizard, the subsequent input files and all output files in the project become fixed field files.

To set up name and address format

1. At the end of the previous step, the Set Up Name and Address Format window appears. Here, you can drag and drop name and address field names onto the Name and Address Palette as in a typical mailing label format. The Dictionary Field Names box shows all of the field names found on the input DDL file. Select the field and drag it to the Name and Address Palette. The actual record data is displayed in the Preview Name Address area.

After dragging selected fields to the palette, you can make multiple fields single-line by editing them in the palette.

Figure 2.7 Set Up Name and Address Format

2. Review your records in the specified format, using the View Records buttons. Click Apply to accept the data format.

The Apply button must be selected for the Control Center to accept your desired name and address format.

To review the project summary

1. Select Next. The Summary window indicates the options that you have selected for this project.

2. If you need to change these options, click Back to return to the appropriate window. Click Finish to accept these settings and create the new project.

3. The status bar at the bottom of the Control Center will indicate that it is copying the appropriate country templates and building the project components.



4. When the process is complete, the Data Flow Architect area of the Control Center will be populated with the new project.

Figure 2.8 Data Flow Architect - New Project



Understanding the Control Center Features

The Control Center consists of three layers:

• Project Panel
• Project Viewer
• Step Viewer

Project Panel

The Project Panel is displayed when the Control Center is opened. Existing projects appear as a suitcase icon labeled with the user’s hostname and the project name.

To explore the Project Panel

1. Click the close icon to close an open project and view the Project Panel.

Figure 2.9 Project Panel

2. Right-click the project icon. From this contextual menu you can Open, Delete or view the project’s Properties. Select Properties. There are two tabs, General and Contents.

The General tab displays basic information about the project such as Name, Type, Owner, Version, Creation Date, Last Modified, Last Executed, and Location.


The Contents tab displays content-related information about the project such as Country List, Module List and Comments.

Figure 2.10 Project Properties

Project Viewer

The Project Viewer displays all modules or steps within a project.

To explore the Project Viewer

1. Double-click the project icon. The Project Viewer opens.

2. The Project Viewer contains three views:

Figure 2.11 Project Viewer

Project Components View lists the project steps: first by country, and then by steps within that country.

Graphics View displays steps in order of processing, using a graphical flowchart format.

List View lists steps in order of processing.

See “Using the Data Flow Architect” on page 2-20 for more information about these views.


Step Viewer

In the Step Viewer you can set up the module, specify input and output files, modify program tasks and conditions, customize rules, run the module, and view and analyze output files, statistics and logs.

To open the Step Viewer

1. Double-click either the module icon in the Graphics View, the module in the List View, or the module in the Project Components View.

Figure 2.12 Step Viewer



Using the Data Flow Architect

Once your project has been created, the Data Flow Architect (DFA) presents your project in the Graphics View. The DFA lets you review and modify the data quality process. Step modules are displayed in a flowchart model, with connection arrows used to identify the flow of data. You can create step connections and job flows to run in batches. These flow charts can be customized and printed for easy illustration of the data quality process.

Figure 2.13 Data Flow Architect - Graphic View


Graphics View

In the Graphics View, you can perform various step-specific tasks:

run, rename, and move steps

delete and connect steps

copy steps

change settings files

Figure 2.14 Menu from a Step

To run steps

1. To run a single step, right-click it and select Run Selected from the pop-up menu.

2. To run multiple steps, use CTRL+click to select several steps. Once the steps are selected, right-click and select Run Selected.

3. To run steps that are connected, right-click on the desired starting point and select Select All Downstream, All Dependencies or Whole Flow. Once you make the appropriate selection, right-click and select Run Selected.
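The three selection options can be pictured as traversals over the step connections. The following is a minimal sketch, not the Control Center's implementation: the step names, the `connections` list, and the reading of each option (downstream follows outgoing connections, dependencies follows incoming connections, whole flow follows both) are assumptions made for illustration.

```python
from collections import defaultdict, deque


def reachable(start, edges):
    """Return every step reachable from `start` by following `edges` (start included)."""
    seen = {start}
    queue = deque([start])
    while queue:
        step = queue.popleft()
        for nxt in edges[step]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def select_steps(start, connections, mode):
    """Select steps relative to `start`; `connections` is a list of (from_step, to_step)."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    undirected = defaultdict(list)
    for src, dst in connections:
        outgoing[src].append(dst)
        incoming[dst].append(src)
        undirected[src].append(dst)
        undirected[dst].append(src)
    if mode == "downstream":      # assumed meaning of Select All Downstream
        return reachable(start, outgoing)
    if mode == "dependencies":    # assumed meaning of All Dependencies
        return reachable(start, incoming)
    if mode == "whole_flow":      # assumed meaning of Whole Flow
        return reachable(start, undirected)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical job flow: router -> parser -> (geocoder, matcher)
flow = [("router", "parser"), ("parser", "geocoder"), ("parser", "matcher")]
print(sorted(select_steps("parser", flow, "downstream")))
```

Once a set of steps is selected this way, Run Selected would apply to every step in the set.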


To rename steps

1. Right-click a step and select Rename from the pop-up menu.

2. Enter a unique step name and click OK.

To move steps

1. To move a single step, click and hold the step and drag it to a new location.

2. To move the entire job flow, click on the first step, hold down the CTRL key, and click all the other steps in the job flow. You may now drag the complete flow to a new location. Or, right-click a step and select the Select All Downstream option, then drag it to a new location.

To connect steps

1. To connect two steps, right-click the first step and select Start Connection, then click the second step. Alternatively, click the connection area on the first step and then click the second step to connect it.

To remove a connection

1. To remove a connection, right-click the step and select Remove Incoming Connection or Remove Outgoing Connection.

To move a Connection Area

1. Position your cursor over the connection area on the step until it changes to a cross hair. Right-click and select Move to Bottom, Move to Top, Move to Left or Move to Right.

Figure 2.15 Step Connection Area


To copy a step module

1. To copy a step module, right-click the module and select Copy Selected.

2. In the list view, select the module to copy from the list and click the Copy Selected Step button in the toolbar menu above.

To change a settings file

1. To change a settings file, right-click a step module and choose Change Settings File.

2. Select the settings file you want to use to replace content in the step’s current settings file.

You must select a settings file of the same type. For example, if the step is a Transformer step, you must select a transfrmr.stx file.

3. A confirmation dialog will appear. Click Yes to copy the contents of the selected settings file.


Data Flow Architect Settings

In addition to the step-specific tasks, you can make changes to the Data Flow Architect itself:

Lock steps

Add comment

Select all steps

Add new steps

Print the Data Flow Architect

Set preferences

For Preferences settings, see “Set Up the Control Center” on page 2-5. Also see “Changing the Control Center Display Settings” on page 17-3.

Figure 2.16 Menu from the DFA

To lock steps

1. Right-click anywhere inside the DFA (except on a specific step) and select Lock. This will lock all steps into place. Remember to unlock the DFA if you wish to add, delete or move a step.



To add comment

1. Right-click anywhere inside the DFA (except on a specific step) and select Add Comment.

2. Enter a comment in the Edit Comment window and click OK. The comment is inserted in the DFA window. You can also drag this comment to another location.

3. To edit comments, right-click on the comment and select Edit, Resize, Hide, or Delete. You can also select Show All Comments, Show Comment Borders, and Delete All Comments from the DFA menu.

To select all steps

1. Right-click anywhere inside the DFA (except on a specific step) and select Select All Steps. This will select all steps in the DFA.

To add new steps

1. Right-click anywhere inside the DFA (except on a specific step) and select Add New Step from Palette. This will open the Step Palette on the left side of the DFA.

2. Select a step from the Step Palette: Drag and drop it on the DFA. Choose a country, and provide a name for this step. Then click OK.

To print Data Flow Architect

1. Right-click anywhere inside the DFA (except on a specific step) and select Print Data Flow Architect. The Page Setup window opens.

2. Specify the page settings and click OK. You have several print options:

Display landscape paper boundary

Display portrait paper boundary

Display architect title imprint


List View

In the List View, you can view steps in the order in which they will be processed. A step may be opened by double-clicking it. From the List View you can perform several tasks:

Open, rename, add, delete, and reorder steps

Generate a batch script to run selected steps

For information on batch scripts, see “Batch Script” on page 14-3.

To open the List View

1. Select the List View tab to view the project steps.

2. Click a step in the List View and the tool bar options become available.

Figure 2.17 List View Tab

To open the step

1. In the List View, highlight a step.

2. Click the open-step icon on the tool bar.

To rename the step

1. In the List View, highlight a step.

2. Click the rename-step icon on the tool bar.

3. In the Provide a Unique Step Name box, enter the new name for the step. Click OK.

To add steps

1. In the List View, highlight a step.

2. Click the add-step icon on the tool bar. The Step Palette appears on the left. Drag and drop the desired step into the List View.

3. In the Choose Country Name box, select a country from the drop-down list. Click OK.

4. The new step is added after the step you highlighted.

To delete steps

1. In the List View, highlight one or more steps.

2. Click the delete-step icon on the tool bar.

To move steps

1. In the List View, highlight one or more steps.

2. Use the up and down arrow keys to move the steps into the desired order for processing.


Using a Project Step

A Project contains a series of steps. The configuration of a step window is the same for all steps. Therefore, the following procedures apply to all modules.

To open a step

1. Double-click the step in the Project Steps By Country list, or double-click the step icon in the Data Flow Architect pane. The Step Window appears. The Step Window contains three tabs:

Input Settings

Output Settings

Results

The input, output and other settings are explained in detail for each step in subsequent chapters. This section provides information on the general procedures for a project step.

Input Settings tab

Use the Input Settings tab to specify the Input File Name and Input DDL Name.

To specify input files

1. Type a file name in the Input File Name and Input DDL Name text boxes. You can use the File Chooser button to select the files.

2. Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.

To replace the input files

1. Type a file name in the Input File Name and Input DDL Name text boxes. You can use the File Chooser button to select the input files.



2. Click Replace. The file names in the Input Data File Name and Input DDL Name column are replaced with the files you just specified.

To delete the input files

1. Highlight the row in the Input Data File Name and Input DDL Name column that contains the file names you want to delete.

2. Click Delete.

The Data Browser can be invoked to browse the input file. The Dictionary Editor can be invoked to view or edit the DDL. The Comment icon allows the user to add comments and notes related to the step.

Figure 2.18 Step Window - Input Settings


Output Settings tab

The Output Settings tab lets you specify the Output File Name, the Output DDL Name, the Statistics File Name, and the Process Log Name.

To specify output files

1. Type a file name in the Output File Name and Output DDL Name text boxes. You can use the File Chooser button to select the files.

2. Type a file name in the Statistics File Name and Process Log Name text boxes. You can use the File Chooser button to select the files.

Figure 2.19 Step Window - Output Settings

Advanced Settings

Most step configurations are made in the Advanced Settings Window. Advanced Settings options allow the user to customize settings for each step. The appearance of the Advanced Settings window varies depending on the step.

To open Advanced Settings


1. Click the Advanced... button from the step.

Figure 2.20 Step Window - Advanced Settings

Results tab

The Results tab displays output information related to the step’s execution.

Statistics - The Statistics tab shows statistics from the run, which may be viewed using the My Statistics Viewer icon or the Spreadsheet Viewer icon. (You can specify the editor to use as the My Statistics Viewer when setting up Preferences for the Control Center.) The Spreadsheet Viewer displays the statistics in an MS Excel format.

Process Log - The Process Log tab displays processing statistics from the step run.

Error Log - The Error Log tab displays any errors encountered during the step run. Process and Error Logs may be viewed using the Text Viewer.


Figure 2.21 Step Window - Results

If the Process Log exceeds the capacity of the window, you can click the Text Viewer icon to display the entire file in a separate window.

Save and Run a Step

After you finish configuring your settings, you can save your settings without running the step, or run the step.

To save a step without running

1. Click Save to save your settings.

To run a step

1. Click Run at the bottom of the step, or right-click the step icon and select Run Selected. Clicking the Run button saves your settings by default and then runs the program.

Select the Save button on a step to save any changes made to the settings if you are not going to Run the step. Changes are automatically saved when a step is Run.



The Data Dictionary Language (DDL)

The Data Dictionary Language (DDL) is a collection of English statements used to define file and record layouts. DDLs are used throughout the TS Quality system. A file that contains DDL components is called a DDL file. DDL files are either in XML format or in text format.

XML Format
File extension is .ddx (Example: input.ddx)

Text Format
File extension is .ddt (Example: input.ddt)
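As a small illustration of the two extensions, a script that handles DDL files outside the Control Center could decide the format from the file name. This is only a sketch: the function name `ddl_format` is hypothetical, and treating the extension as case-insensitive is an assumption.

```python
from pathlib import Path

# Extensions as listed above: .ddx for XML format, .ddt for text format.
DDL_FORMATS = {".ddx": "xml", ".ddt": "text"}


def ddl_format(filename):
    """Return 'xml' or 'text' for a DDL file name, based on its extension."""
    suffix = Path(filename).suffix.lower()  # assumption: extension case does not matter
    if suffix not in DDL_FORMATS:
        raise ValueError(f"not a DDL file extension: {suffix!r}")
    return DDL_FORMATS[suffix]


print(ddl_format("input.ddx"))
print(ddl_format("input.ddt"))
```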

See Chapter 2 in Getting Started with TS Quality for the location of default DDL files in the directory structure.

Methods of Creating a DDL

You can create DDLs by the following methods:

Data Dictionary Editor (DDL Editor)
You can use the Data Dictionary Editor in the Control Center to create a DDL or modify an existing DDL. The default format for the DDL Editor is XML. You can convert XML files to text files or text files to XML files in the DDL Editor.

Any Text Editor
You can use any text editor to create DDLs in text format. Special text formatting, such as underline or bold, should not be applied because the software will be unable to read it.

Delimited Files Considerations

The input and output files for TS Quality can be fixed-field files or delimited files. Internally, the delimited file's records will be put into a fixed format for processing according to the DDL.



For delimited files, every field in the DDL should reflect the maximum field length.

For example, if you have a field on input called ADDR_LINE_1 and the value is "10 Main St", then a field length of 10 bytes for that field will be sufficient, but a field length of 8 bytes will truncate to "10 Main ". If you have that field on output and the line was changed to "10 Main Street" by processing, then a field length of 10 will truncate the output to "10 Main St". Make sure that you have enough field length for each field on the DDL for delimited files.
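The truncation described above can be reproduced in a few lines. This is a sketch only: it assumes a single-byte encoding (so characters and bytes coincide) and space padding, and the helper names `fit` and `max_field_lengths` are hypothetical, not part of TS Quality.

```python
import csv
import io


def fit(value, length):
    """Truncate or space-pad a delimited value to a fixed field length."""
    return value[:length].ljust(length)


def max_field_lengths(text, delimiter=","):
    """Scan delimited text and return the longest value seen in each column."""
    widths = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        for i, value in enumerate(row):
            if i == len(widths):
                widths.append(0)
            widths[i] = max(widths[i], len(value))
    return widths


print(repr(fit("10 Main St", 10)))      # fits exactly
print(repr(fit("10 Main St", 8)))       # input truncated
print(repr(fit("10 Main Street", 10)))  # output truncated after processing
print(max_field_lengths("a,10 Main St\nbb,10 Main Street\n"))
```

Sizing each DDL field from the longest expected value, including values lengthened by processing, avoids both kinds of truncation.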

Keywords in a DDL

A DDL uses the keywords shown in Table 2.1. Required keywords are listed in bold.

Table 2.1 DDL Keywords

Keyword: Description

Record Name: The name of a record in the DDL. 1 to 32 characters long. If it contains embedded spaces, the name must be enclosed in double quotes.

Record Length: The total record length in bytes. The total length of the record must be equal to the sum of the lengths of all fields.

Field Name: Name of the field. If it contains embedded spaces, the name must be enclosed in double quotes. At least one field statement per file is required. Maximum 32 bytes. Field names should only contain letters, numbers, and underscores.

Type: Data type for the field. You can specify the appropriate character encoding or other type of value. See “Type Keyword” on page 2-42.

Redefine: Redefine the field to a specific byte position in the record. See “The REDEFINE Function” on page 2-40.

Start Position: The relative byte position of a field within the record. DDLs are zero-based. Therefore, the first field of a record generally begins in column zero.

Length: The length of a field in bytes. The number must be a positive integer greater than zero. If the entity is a field, the length must be less than the Record Length. Two fields cannot occupy the same space, unless one field is a redefinition of that field. If the entity is a subfield, the length must be less than the parent field. The sum of all field lengths must equal the length of the record.

Default: The default value for the field. The value must agree in type with the Type. Numbers may be positive or negative. Values:
SPACES – fill the field length with spaces.
-1 – for a numeric with a negative value.
0 – for a numeric.
'0' – for a character field.
"0" – for a string field.

Comment: The comment for the field.

Attributes: This allows data in the field to be passed through a TS Quality step without any data interpretation or translation. Any field type can be used because there will be no data translation. Value:
NOVALIDATION – data in the field will remain “as is”.

CLASS: Converts any 2-digit year into a 4-digit year. If used, it must immediately follow a Field statement. Values:
DATE BACKWARD
DATE FORWARD
DATE WINDOW {nnn}
If used, CLASS is required to be on the input DDL. See “CLASS Keyword” on page A-10 to learn more about the Class specifications.

Creating a DDL Using the DDL Editor

You can create a DDL from a fixed-field data file or a delimited data file. Complete the following steps to create a new DDL.

To create a DDL from a fixed field data file

1. Open the DDL Editor from the Control Center. Select New from the File menu. A new empty DDL opens.

2. Select DDL Builder from the Tools menu. In the Select Data File section, enter the file name and record length, and select the encoding for your data file.

Figure 2.22 DDL Builder

3. In the Record section, highlight a portion of the record you want to make a field in the DDL. The Start and End Position automatically appear in the windows.

4. Specify the Field Name and select the Field Type.

5. Click Add to DDL. The new field will be added to the DDL table.

6. Repeat this process until all fields are defined in the DDL.

7. Save the DDL, using a .ddx extension to the dictionary file name.

To create a DDL from a delimited data file

1. Open the DDL Editor from the Control Center.

2. Select New from the File menu. A new empty DDL opens.

3. Select Tools, Create DDL from Delimited File. Select the delimited filename and delimiter.

4. Specify the output DDL filename. The first part of the delimited file will be displayed in the Sample Data Preview window.

Figure 2.23 Generate Dictionary from Delimited File

5. Click Create. The new DDL will be automatically created. Save the DDL using a .ddx extension to the dictionary file name.

For delimited files, every field in the DDL will reflect the maximum field length.

If you are using the Project Wizard to create a project and you don’t have a DDL file for delimited input, it will be created automatically using the header as field names.

Creating a DDL in a Text Editor

When creating a DDL in a text editor, make sure to include keywords and follow the set grammar that must be used in creating DDLs.

Syntax

Use the following syntax:

Keyword [is, are, in] Parameter

Keywords are case-insensitive. For example, the following keywords all mean the same thing: "Field", "FIELD", and "field".

Brackets: The actual brackets [ ] are not physically entered on a DDL file. All punctuation and noise words such as "is", "are", and "in" can be used. They are highly recommended to make subsequent reading more understandable.

Parameters are case sensitive. All name and string value parameters are case sensitive. String values are enclosed within double quotes (for example, "Hello World").

Tab characters are not allowed in a DDL.

Always define until the last carriage return.

Comments can be enclosed between the string pairs "/*" and "*/", or can be indicated by the prefix string "//".

Example

/* This is a comment that extends over two lines
delimited by the slash and asterisk pairs */
//This is a comment to the end of this line
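To make the grammar concrete, the following sketch reads text-format DDL statements the way the rules above describe: comments are dropped, the keyword is matched case-insensitively, one optional noise word is skipped, and the parameter keeps its case. It is an illustrative reader, not the TS Quality parser; the function names are hypothetical, and it deliberately ignores quoted strings that contain comment markers as well as the special `//REDEFINE` marker.

```python
import re

NOISE_WORDS = {"is", "are", "in"}


def strip_comments(text):
    """Remove /* ... */ block comments and // line comments."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", text)


def parse_statements(text):
    """Return (keyword, parameter) pairs, one per non-empty line."""
    statements = []
    for line in strip_comments(text).splitlines():
        words = line.split()
        if not words:
            continue
        keyword, rest = words[0].lower(), words[1:]  # keywords are case-insensitive
        if rest and rest[0].lower() in NOISE_WORDS:  # noise word is optional
            rest = rest[1:]
        statements.append((keyword, " ".join(rest)))  # parameter case is preserved
    return statements


sample = """/* layout
   for a demo record */
FIELD is input_line_1 // first label line
Type ASCII
starts in column 0
Length is 50
"""
for statement in parse_statements(sample):
    print(statement)
```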

DDL Components in Text Format

A text DDL consists of two main sections: Record information and Body information.

Record Information

Type is FIXED
Length is 200

Body Information

Field is input_line_1
Type is ASCII
Starts in column 0
Length is 50

Field is input_line_2
Type is NOTRANS
Starts in column 50
Length is 50
Default is ‘0’

Field is input_line_3
Class is DATE FORWARD
Type is ASCII
Starts in column 100
Length is 50

Field is input_line_4
Type is ASCII
Starts in column 150
Length is 50
Attributes are NOVALIDATION

The REDEFINE Function

By redefining fields with the REDEFINE keyword, you can use part of the field or the same field with a different name in the output. Redefining fields requires listing two fields: the field to be redefined, followed by a field listing that is the redefinition.

The ‘Starts in’ position may be maintained manually. However, the automatic renumbering of ‘Starts in’ position is facilitated through the ‘//REDEFINE’ statement. When the ‘Recalculate Positions’ function in the DDL Editor encounters the string ‘//REDEFINE’ ahead of a pair of field definitions, it will not increment the ‘Starts in’ number for the second field definition.

Type is FIXED
Length is 200

//REDEFINE
Field is ORIGINAL_RECORD
Type is ASCII
Starts in COLUMN 0
Length is 200

Field is input_line_1
Type is ASCII
Starts in COLUMN 0
Length is 100

Field is input_line_2
Type is ASCII
Starts in COLUMN 100
Length is 100

If you are using a delimited file for input, you cannot use the Redefine function on the output DDL.
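The renumbering behavior can be sketched as follows. This is an assumption-laden illustration, not the DDL Editor's code: fields are plain dictionaries, a hypothetical `redefine` flag stands in for a preceding `//REDEFINE` line, and a flagged field does not advance the running position, so the next field starts at the same column.

```python
def recalculate_positions(fields):
    """Assign zero-based start positions; return (placed_fields, total_length).

    A field flagged with 'redefine' keeps the current offset for the field
    that follows it, mirroring the //REDEFINE example above.
    """
    offset = 0
    placed = []
    for field in fields:
        placed.append({**field, "start": offset})
        if not field.get("redefine", False):
            offset += field["length"]
    return placed, offset


record_length = 200
fields = [
    {"name": "ORIGINAL_RECORD", "length": 200, "redefine": True},
    {"name": "input_line_1", "length": 100},
    {"name": "input_line_2", "length": 100},
]
placed, total = recalculate_positions(fields)
for field in placed:
    print(field["name"], field["start"], field["length"])

# The lengths of the non-redefined fields should add up to the record length.
print(total == record_length)
```

The final comparison reflects the rule in Table 2.1 that the field lengths must add up to the record length, with the redefined field counted once rather than twice.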


Type Keyword

The Type is required for every field entity. There are two Type categories: encoding (code page) and date format. The following list shows the main values used for the Type keyword.

Encoding (Code Page)
Encoding is a mapping of binary values to code positions that represent characters of data. It is also called a code page. The main character encodings used in TS Quality include ASCII, Latin1 and Latin2.

See Appendix A for the complete list of encodings.

Date format
Date format is a type of data which may contain only valid dates.

See Appendix A for the complete list of date formats.

Class keyword
The Class keyword specifies the format to be used for the date field. By using the Class keyword, you can convert any 2-digit year into a 4-digit year.

See Appendix A for the complete list of Class keywords.
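For readers unfamiliar with 2-digit year expansion, the sketch below shows one common technique, a fixed pivot window. It is not the definition of DATE BACKWARD, DATE FORWARD, or DATE WINDOW, whose exact rules are given in Appendix A; the function name, the pivot value of 50, and the base century are all hypothetical choices made for illustration.

```python
def expand_year(two_digit_year, pivot=50, century=1900):
    """Expand a 2-digit year with a fixed pivot window.

    Years below `pivot` are placed in the century after `century`;
    years at or above `pivot` are placed in `century` itself.
    """
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a value between 0 and 99")
    if two_digit_year < pivot:
        return century + 100 + two_digit_year
    return century + two_digit_year


print(expand_year(6))
print(expand_year(99))
```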



CHAPTER 3 Investigating Your Data


After you create a project, you must investigate your data before working with any processes. Investigation helps you determine how well your data conforms to rules that govern acceptable limits and requirements for data elements, and helps you understand what data quality processes need to be put in place. Investigate your data with the Data Browser, DDL Editor, and TS Discovery.

This chapter focuses on four tasks:

View data using the Data Browser

View DDL using the DDL Editor

Analyze data using TS Discovery

Identify problems with the data



View Data Using the Data Browser

The Data Browser lets you view a data file to verify its format as described by the data dictionary language (DDL) file. You can verify the format on either a record-by-record or on a field-by-field basis.
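Conceptually, viewing a fixed-field record against its DDL amounts to slicing the record at each field's zero-based start position and length. The sketch below illustrates that idea only; it is not how the Data Browser is implemented, and the field layout and sample record are made up for the example.

```python
def split_record(record, fields):
    """Slice one fixed-field record into named values.

    `fields` is a list of (name, start, length) tuples with zero-based starts.
    """
    return {name: record[start:start + length] for name, start, length in fields}


# Hypothetical 100-byte layout with two 50-byte fields.
layout = [("input_line_1", 0, 50), ("input_line_2", 50, 50)]
record = "John Smith".ljust(50) + "10 Main St".ljust(50)

for name, value in split_record(record, layout).items():
    print(name, repr(value.rstrip()))
```

If a value appears shifted or cut off when sliced this way, the start position or length in the DDL does not match the data file.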

To open the Data Browser and view the input data

1. Double-click the project’s suitcase icon in the Data Flow Architect and open the project. Existing projects are shown as a suitcase icon with the user’s hostname and project name.

Figure 3.1 Project in the Data Flow Architect

2. Double-click on the first step (for example, inputTransformer), and open the step.

Figure 3.2 inputTransformer Step

For detailed information on the Data Browser, see the Online Help.


3. On the Input Settings tab, select the first entry in the entry listing options. The input file name and corresponding DDL file name will already be populated. These files were specified during the Creating Project Wizard process (See “Creating a Project” on page 2-9).

Figure 3.3 Input Settings Tab

4. Select the Data Browser icon next to the input file name. The Data Browser opens the input file with its corresponding DDL.

You can also open the Data Browser from the Tools Palette by double-clicking the Data Browser icon. In this case, you must select the input file and DDL to view. Opening the Data Browser from within a step automatically opens the tool, the file, and its corresponding DDL.

5. The Field Selection window opens. This window shows all the fields that exist in the input DDL.

6. Select the fields you want to display in the upper pane and click Add. To select all the fields, click Add All.

7. After the fields appear in the Selected Fields list box, you have several options:

• Clear all the fields by clicking Clear.

• Change the position of a field or delete it by selecting the field, and then clicking the Arrow buttons to move it up or down, or the Delete button to delete it.

• Save the selected fields (called the ‘view’) in a file by clicking the Save button.

See the next procedure “To save the view” for more details on saving the fields.

You can sort the fields. Click on the Field Name, Start Position, Length or Type column headers.

Figure 3.4 Field Selection Window

8. Click Display.

The order of the fields determines the order in which they will be displayed when you browse the records.



9. Browse the data and verify that the field names reflect the data contained within them.

Figure 3.5 Input Data

To save the view

You can save or store a view of data in the Data Browser. If you frequently look at the same fields in a file, saving a view can save time.

1. In the Field Selection window, select the fields you want to view using the CTRL Key. For example, select Phone, Country, Start_date and Product_type.

2. Select Add to add the selected fields.

3. Click Save to save the selected fields.

You can display the data by Record Numbers or Byte Offsets. Select either option in the Options menu.



4. The Save window opens. Name this view and save it in the desired directory.

Figure 3.6 Save View File

To view a stored view

1. To view a stored view, click Load in the Field Selection window. The Customized View window will show all stored views.

The view file will have the extension of .cuv.



2. Click the view name and select OK. The fields will be loaded in the Selected Fields window. Select Display to view the stored fields.

Figure 3.7 Customized View Window

3. Select File, Exit and close the Data Browser.



View DDLs Using the DDL Editor

The Data Dictionary Editor (DDL Editor) lets you view existing data dictionary language (DDL) files.

To open the DDL Editor and view a DDL

1. On the Input Settings tab, click Dictionary Editor next to the input DDL name. The DDL Editor will open the input DDL file. The DDL is displayed in a table.

Figure 3.8 Data Dictionary Editor

For detailed information on the DDL Editor, see the Online Help.


2. The upper frame shows the Record Name, Record Length and Update ORIGINAL_RECORD Length option:

Window/Option: Description

Record Name: The record’s name.

Record Length: Total length of the record represented by this DDL in bytes.

Update ORIGINAL_RECORD Length: The ORIGINAL_RECORD length update option. See the Online Help for details.

3. The lower spreadsheet shows the details of the selected DDL. Refer to the following table and verify each item in the DDL:

Column: Description

Field Name: DDL fields listed row by row, in the order that they appear in the DDL. The standard field names are displayed in blue. Other unique field names are displayed in black.

Type: Field type (encoding). See “Encoding (Code Page)” on page A-3 for details of encoding.

Redef (Redefine): Indicates whether the field is redefined to a specific byte position in the record.
Y = field is redefined
blank = field is not redefined

Start Pos.: The zero-based byte position where the field begins in the record.

Length: The length of the field in bytes.

Default: Default value for the field.

Comment: The comments for a field.

Attribute: Indicates whether the data is to be passed through a step without any validation.
NOVALIDATION = data in the field remains “as is” even if it is in a different field type.

Class: Converts a 2-digit year into a 4-digit year.

You can edit all items in the column from this table. See the Online Help for details.

4. Select File, Exit and close the DDL Editor.

5. Click the close icon in the upper right-hand corner to close the step.



Analyze Data Using TS Discovery

TS Discovery is a data profiling tool used to discover and analyze data quality. If you want to analyze data in more detail to reveal data anomalies, broken data rules, misaligned data relationships, and other characteristics, we recommend using TS Discovery before running other TS Quality processes.

One (1) license for TS Discovery is included with the TS Quality Client. You can launch TS Discovery by clicking the TS Discovery icon from the Control Center Toolbar.

Instructions for TS Discovery are not included in this book. Refer to TS Discovery manuals for more information.

Figure 3.9 TS Discovery



Identify the Problems with Data

By browsing the input data and input DDL files, you can identify many issues with data such as misspelling, inconsistent format, incorrect entries and duplicate records. Depending on the problems with data, you must decide what cleansing and standardization processes are necessary.

For example, the following issues have been identified in the data in the sample TMT project:

The input file contains data from multiple countries (US, CA, DE, GB)

The Phone number field has variations in the phone formats

The country names in the Country field are inconsistent

The current data format in the Start_date field is different from the date format in the Last_contact_date field

There are different values for the same products in the Product_type field

There seem to be misspelled addresses in the data

There are duplicate records across the data
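Issues such as inconsistent phone formats can also be spotted by reducing each value to a format pattern and counting the patterns. The sketch below shows that general profiling technique; it is not part of TS Quality or TS Discovery, and the function names and sample phone numbers are invented for illustration.

```python
import re
from collections import Counter


def format_pattern(value):
    """Replace every digit with 9 and every letter with A, keeping punctuation."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def profile(values):
    """Count how often each format pattern occurs in a column of values."""
    return Counter(format_pattern(v) for v in values)


# Hypothetical values from a Phone field.
phones = ["978-436-8900", "617-555-0100", "(978) 436-8900"]
for pattern, count in profile(phones).most_common():
    print(pattern, count)
```

More than one pattern for the same field is a sign that the field needs standardization before matching.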

Those issues will be corrected in the subsequent chapters of this guide. First, the global data is separated into four (4) input data files. Next, data cleansing and standardization are performed at each country level. After the addresses are validated and corrected, the records are linked to identify the duplicate data. At the end of the process, the best records with the most recent information will be output and a batch script for the entire process will be created for production use.



CHAPTER 4 Using the Global Steps


After you have investigated the data and identified the issues, you can begin to process the data. First, use the Global Data Router to separate the multi-country input file into country-specific files. One advantage to running the Router step before cleansing and standardizing your data is that it enables data to be standardized at the country level. This ensures that further processing is done at a country-specific level.

In this chapter, you will perform these tasks:

Specify the input and output files

Identify the rules files used to determine the country of origin

Identify the Global Geography table, which contains state, city, locality, post code and word/pattern structures

Define the settings for the Global Data Router. These include the ability to:

• Use a Country Code field to identify country of origin

• Review the country list and determine the countries which are available to the Global Data Router

• Modify the default list of fields to scan for the country of origin

Run the Global Data Router and view results.



Using the Global Data Router

The Global Data Router scans an input file that contains record data from more than one country, identifies the country-specific data, and then creates one output file per country that contains only the data specific to the country you selected.

The Global Data Router uses Rules Files that contain country-related word definitions and tables. These rules specify how many output files to generate and which countries are identified. The Router supports input data from most countries.

Input and Output Settings

Since the Global Data Router step is usually the first step in the project, it uses the Input File Name and Input DDL Name specified in the Project Wizard as the default inputs.

To specify input and output files

1. Open the Global Data Router step and click the Input Settings tab.

2. Enter file names in the Input File Name and Input DDL Name text boxes.

3. Click the Output Settings tab.

4. Enter file names in the Output File Name and Output DDL Name text boxes.

Separate Output

If you want a separate output file for each country, select Generate a separate output file per country. When this option is selected, an underscore (_) and an asterisk (*) will be added automatically to the filename you specified in the Output File Name text box. After processing, each output filename will include a country suffix in lower case. For example, the US data will be named <filename>_us, and the Canadian data will be named <filename>_ca.
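The naming convention for separate output files can be illustrated with a short Python sketch. This is not the Global Data Router's implementation: it assumes the country of each record is already known (through a hypothetical `country` key) and only shows how a lowercase country suffix is appended to the output file name.

```python
from collections import defaultdict


def country_file_name(output_name: str, country_code: str) -> str:
    """Build '<filename>_<cc>' with the country suffix in lower case."""
    return f"{output_name}_{country_code.lower()}"


def group_by_country(records, output_name):
    """Group records under the output file name each would be written to.

    `records` is a list of dicts with a hypothetical 'country' key. The real
    Router derives the country from its rules files, which is not modeled here.
    """
    groups = defaultdict(list)
    for record in records:
        groups[country_file_name(output_name, record["country"])].append(record)
    return dict(groups)


sample = [
    {"name": "A. Smith", "country": "US"},
    {"name": "B. Tremblay", "country": "CA"},
    {"name": "C. Jones", "country": "US"},
]
files = group_by_country(sample, "customers")
```

Here `files` holds one entry per country-specific file name, which mirrors the one-file-per-country layout described above.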

Tip: You can either edit the file names manually or click the File Chooser icon to browse to the appropriate file and select it.

To view the contents of your data file, click the Data Browser icon.

Use the Dictionary Editor to view the contents of the DDL file.



Single Output

If you are generating a single output file for all countries, deselect Generate a separate output file per country. In this case, all data, separated by country, will be written to the single output file you specified.

If you provided Name and Address data in the Project Wizard, the output DDL will contain a series of redefines for the Name and Address data. Redefines are used to map the customer-defined field name to the TS Quality Name and Address reserved DDL field name. The input fields are mapped to the reserved TS Quality name and address field names INPUT_LINE_01 through INPUT_LINE_10. If a name or address line contains multiple fields, the input fields are mapped to INPUT_LINE_02a, INPUT_LINE_02b, etc.

5. Enter file names in the Statistics File Name and Process Log Name text boxes.

To specify the input/output file qualifiers

A File Qualifier is a unique name given to a data file. Each input and output data file must have a unique file qualifier.

1. Click Advanced and navigate to Input, Settings.

2. Select Input Data File Qualifier (default is INPUT).

3. Click Advanced and navigate to Output, Settings.

4. Select Output Data File Qualifier (default is OUTPUT).

You may also specify the following settings:

To specify the NOMATCH file

The ‘NOMATCH’ file contains records for which the Global Data Router was not able to determine the country of origin.

1. Click Advanced and navigate to Process, Settings.

2. Locate Nomatch File and specify the file.

To specify the starting record

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Start at Record. This specifies the record in the input data file at which the Global Data Router will begin processing (default is 1).

A red flag indicates a REQUIRED field for this operation.

Tip: You can either edit the file names manually or click the File Chooser icon to browse for and select the file.

To specify the maximum number of records to process

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.

To process every nth record only

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.
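The three record-selection settings (Start at Record, Process a Maximum of, and Process Nth Sample) can be pictured with the following Python sketch. The guide does not state how the settings combine, so the order used here is an assumption: skip to the starting record, keep every Nth record from that point, then stop at the maximum. The function and parameter names are hypothetical.

```python
from itertools import islice


def select_records(records, start_at=1, maximum=None, nth=1):
    """Return the records a run would process under the three input settings.

    Assumed order (not confirmed by the guide): skip to `start_at` (1-based),
    keep every `nth` record from there, then stop after `maximum` records.
    """
    if start_at < 1 or nth < 1:
        raise ValueError("start_at and nth must be 1 or greater")
    sampled = islice(records, start_at - 1, None, nth)
    # maximum=None means "no limit", matching the default of all records
    return list(islice(sampled, maximum))


rows = list(range(1, 21))  # stand-in for 20 input records
every_fifth = select_records(rows, nth=5)
first_three_from_five = select_records(rows, start_at=5, maximum=3)
```

Sampling in this way is useful for a quick trial run on a large file before processing every record.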

To use a delimited file

If you are using a delimited file for input and/or output, you must specify delimited settings.

1. Click Advanced and navigate to Input, Settings.

2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down list.

3. For output, click Advanced and navigate to Output, Settings.

4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down list.

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed within quotation marks.

See “Encoding (Code Page)” on page A-3 for more information on encoding.

Process Settings

Once you have specified input and output files, you are ready to specify the settings to process your data. Do this in the Advanced Settings window.



Rules Files

The Global Data Router uses two rules files to determine country of origin. Rules files contain entries that define the resource tables used by the Global Data Router program, as well as country-specific data.

Global Rules File—Defines rules that apply to all countries. It also contains translation tables, street types, city definitions, and other rules that require lengthy entries.

Country Rules File—Defines rules that apply to specific countries.

See “Global Data Router” in the TS Quality Reference Guide for details of these rules files.

To specify the Rules Files

1. Open the Global Data Router step.

2. Click Advanced and navigate to Process, Settings.

3. Locate the Global Rules File and Country Rules File and specify the files.

Default Global Rules File: \TrilliumSoftware\tsq10r5s\tables\general_resources\rtrules1.win

Default Country Rules File: \TrilliumSoftware\tsq10r5s\tables\general_resources\rtrules2.win

You can edit the Rules Files. You may also use the Customer Rules File, which allows you to add your own user-defined rules. See “Global Data Router” in the TS Quality Reference Guide for details.

Global Geography Table

In addition to the Rules Files, the Global Data Router uses a Global Geography Table that contains state, city, locality, post code and word/pattern structures.




This table is read-only and may not be changed.

To specify the Global Geography Table

1. Click Advanced and navigate to Process, Settings.

2. Locate the Global Geography Table and specify the file.

Default Global Geography Table: TrilliumSoftware\tsq10r5s\tables\general_resources\GLOBRTR.tbl

For China, Japan, Korea, and Taiwan, you must specify the APGLBRTR.tbl geography file using the Global Geog APAC File Name settings. If you want to include other countries such as the US, you must also specify the regular GLOBRTR.tbl geography file.

Country Settings

If the data has a country code field, you must specify the field name for the country code. This ensures that the Global Data Router uses the data in this field to identify and score country of origin.

To specify a Country Code Field

1. Click Advanced and navigate to Process, Settings.

2. Locate the Country Code Field. Select the appropriate field name from the drop-down list.


Figure 4.1 Country Code Field

To review the country list

Make sure the Country List identifies the valid country choices for your data.

1. Navigate to the Country List, Country settings. The Country Names are automatically entered based on your selection in the Project Create Wizard.

2. Review the list and confirm that the Country List identifies the valid country choices for your data.

Figure 4.2 Country List


Fields Settings

You must tell the Global Data Router which fields contain country of origin codes. When there is no valid country code or the country code is suspect, the Field Settings determine which fields the Global Data Router will inspect.

To specify fields to scan for country of origin data

Navigate to Fields, Field. Select the field name that contains information for country of origin. If you have a valid country code field, you can select that field. This means that the program will only scan that field for country of origin data.

Figure 4.3 Field Settings

DDL Settings

If you choose, you can specify separate output DDLs for each country. If this is not specified, the output DDL specified in the Output Settings will be used.

To specify a separate DDL for each country

1. Click Advanced and navigate to DDL, Settings.

2. Select the DDL file for each country from the drop-down list.



Additional Settings

You can specify the following additional settings:

To enable debug function

1. Click Advanced and navigate to Process, Settings.

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default path and file name, or enter a new file name. Debugging information will be written to this file.

To count number of records processed

1. Click Advanced and navigate to Process, Settings.

2. Enter a value in the Sample Count text box. This value determines how frequently TS Quality will report while processing data. The number that you enter is the number of records that TS Quality will process before printing a progress report to the screen. For example, if you enter 50, TS Quality will print a message after processing 50, 100, 150 records, and so on.

This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is 1.
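The effect of the Sample Count value can be worked through with a small Python sketch. It only computes the record counts at which a progress message would appear. The function name is hypothetical, and nothing here reflects how TS Quality itself is implemented.

```python
def progress_points(total_records: int, sample_count: int):
    """Return the record counts at which a progress message would be printed."""
    if sample_count < 1:
        raise ValueError("sample_count must be 1 or greater")
    return list(range(sample_count, total_records + 1, sample_count))


# A file of 170 records with Sample Count set to 50
points = progress_points(170, 50)
```

With the default value of 1, a message would be reported for every record, so a larger value is usually more practical for big files.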

To specify settings file encoding

1. In Settings File Encoding, select the correct encoding from the drop-down list.


See “Global Data Router” in the TS Quality Reference Guide for the complete settings information.



Run the Global Data Router and View Results

To run the Global Data Router and view results

1. Click OK to close Advanced Settings.

2. Click Run to run the Global Data Router.

You can also right-click on a step and select Run Selected.

3. Select OK.

4. On the Results tab, select the Statistics sub-tab. The Statistics sub-tab will show the number of records included in each country-specific file. The ‘NOMATCH’ file contains any records where the Global Data Router was unable to determine country of origin.

Figure 4.4 Global Data Router Statistics

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.



CHAPTER 5 Cleansing Your Data


After you separate the input data into country-specific data, you can start the cleansing process. This chapter explains how to cleanse the data using the Transformer.

In this chapter, you will perform these tasks:


Specify the input and output files

Use character translation to convert particular hexadecimal values

Use field scanning to change field values

Use table recoding to recode the values in a field using a literal or mask shape

Use conditionals to control the field scan and table recode settings

Run the Transformer and review the results



Using the Transformer

The Transformer converts input data from one or more files and formats to a single output, based on fields specified by one or more Data Dictionary Language (DDL) files. The Transformer lets you convert and merge records from up to ten input files into a single, standard format.

The Transformer performs several functions:

Scan data records for defined shapes (masks) and literal values, and then move, recode, or delete the data

Apply sophisticated conditional logic to perform an unlimited number of data transformations

Modify field lengths

Recode character fields, based on a user-defined external table

Identify and separate records that reject the conversion process so that they can be more closely examined

Input and Output Settings

The Transformer uses the output from the Global Data Router step as input. If the Transformer step is the first step in your project, it will use the Input File Name and Input DDL Name specified in the Project Wizard as the default inputs.


To specify input and output files

1. Open the Transformer step and select the Input Settings tab.

2. Specify a file name in the Input File Name and Input DDL Name text boxes.


- OR -


Click Replace. The default file names in the Input Data File Name and Input DDL Name column are replaced with the files you just specified.

The Transformer can use up to ten input files simultaneously.

4. Navigate to the Output Settings tab.

5. Specify a file name in the Output File Name and Output DDL Name text boxes.

6. Specify a file name in the Statistics File Name and Process Log Name text boxes.

If you provided the Name and Address data in the Project Wizard, the output DDL will contain a series of redefines for the Name and Address data. Redefines map the customer-defined field name to the TS Quality Name and Address reserved DDL field name. The input fields are mapped to reserved TS Quality name and address field names INPUT_LINE_01 through INPUT_LINE_10. If a name or address line consists of multiple fields, the input fields are mapped to INPUT_LINE_02a, INPUT_LINE_02b, etc.

To specify the input/output file qualifiers

A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.

1. Click Advanced and navigate to Input, Settings.

2. Specify Input Data File Qualifier (default is INPUT).

3. Click Advanced and navigate to Output, Settings.

4. Specify Output Data File Qualifier (default is OUTPUT).

You can also specify the following settings:






To specify multiple input files

If you have multiple input files, make sure that the settings are applied to the input file you intend.

1. Click Advanced and navigate to Input, Settings.

2. Select the appropriate input file from the Input Files text box at the top.

Figure 5.1 Transformer Multiple Input Files

3. Specify your settings for the desired input file.

To specify an exceptions file

1. Click Advanced and navigate to Input, Settings.

2. In the Exceptions File text box, accept the default file or specify the path and name of the file that contains exceptions records. Exceptions records contain data such as incorrect records or field types.










To specify the origin of record

1. Click Advanced and navigate to Input, Settings.

2. In File Source, enter text to specify the origin of the data file.

3. Select File Source Encoding from the drop-down list.

4. Navigate to Output, Settings. In Source Field, select the DDL field to receive the origin of record you specified in File Source.


To specify the starting record

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Start at Record. This specifies the record in the input data file at which to begin processing (default is 1).





To process every nth record only

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.

The File Source and Source Field work together. If you specify one of these values, you must specify the other value. If you delete one of these values, you must delete the other value.



Using Multiple Input Files to Create an Output DDL

You can specify up to ten (10) input files and their associated DDLs and use them to create a common output file for later processing by modules downstream in your workflow. After you specify the input files, you must map input fields from the associated DDLs to a common output DDL file.

To add multiple input files and map

1. Double-click a Transformer step to open the Transformer Input Settings window.

2. In the Input Data File field, type or browse to the input file you wish to use.

3. In the Input DDL File field, type or browse to the input DDL file associated with the input data file you specified in Step 2.

4. Click Add.

5. Repeat Steps 2-3 until you’ve added all DDL files you want to use to create the common output format.

6. Click the Define Output DDL button (bottom left).

7. The Define Output DDL dialog appears.


Figure 5.2 Define Output DDL dialog

8. Use the Input DDL drop-down menu to select the DDL file you want to use to map fields to an output DDL file. The input DDL fields appear in the left pane and the final output DDL fields appear in the right pane.

9. Use the buttons in the center panel to refine the output DDL list of fields. You can choose from these options:

Add - adds the selected input DDL field to the output DDL list.

Delete - deletes a selected output DDL field from the list.

Move Up - moves the selected field in the output DDL list up one row.

Move Down - moves the selected field in the output DDL list down one row.

Redefine - redefines an input field as a portion of an output field. Use this option to map multiple input fields to the same redefined output field.

Consolidate - consolidates an input field with an existing output field. Use this option when two or more fields have different names but contain the same data, such as zipcode, ZIP5, and postal_code.

For Redefine and Consolidate, make sure that the lengths of the input fields do not exceed the overall length of the redefined or consolidated output DDL field.

10. When you are ready, click Save to save the output DDL field mapping. When the Transformer step runs, it will create an output DDL file that uses this mapping.

Process Settings

Once you have specified the input and output files, you can configure the settings to process your data. The settings for processing are managed in the Advanced Settings window.

Character Translation

The Transformer lets you convert the original hexadecimal value to another hexadecimal value.

To convert the hex value

1. Click Advanced and navigate to Input, Character Translation.


2. Specify a value for Input Field Name. This is the field to which the hex translation is applied.

3. Specify a value for From Hex Value. This is the original hex value which will be translated to another hex value.

4. Specify a value for To Hex Value. This is the hex value to which the original value is translated.
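The idea behind character translation can be sketched in Python as a byte-for-byte replacement. This is only an illustration of the concept. The function name and the two-digit hex string format are assumptions made for the sketch, not the Transformer's settings syntax.

```python
def translate_hex(field_value: bytes, from_hex: str, to_hex: str) -> bytes:
    """Replace every occurrence of one byte value with another.

    `from_hex` and `to_hex` are two-digit hexadecimal strings such as "09".
    This format is an assumption made for the sketch.
    """
    source = bytes.fromhex(from_hex)
    target = bytes.fromhex(to_hex)
    if len(source) != 1 or len(target) != 1:
        raise ValueError("each hex value must describe exactly one byte")
    return field_value.replace(source, target)


# Turn a tab (hex 09) embedded in an address field into a space (hex 20)
cleaned = translate_hex(b"MAIN\tST", "09", "20")
```

A typical use is replacing control characters, such as tabs, that would otherwise interfere with later parsing.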

Field Scanning

The Field Scanning function converts the values in a field. You can scan for values and then Change, Copy, Cut, or Flag them.

Scan and Change

To scan a field and change its value

1. Click Advanced and navigate to Output, Field Scanning. Select the Change tab.

2. Refer to the following table and specify values for Change in the Field Scanning window:

Setting | Description
Scan Field | Field in the DDL file that specifies the location in which to perform the scan.
Field Justification | Specifies how data contained in the field is aligned. Left/Right Adjust - remove all spaces around the value, pad the field with spaces, and change multiple spaces within the value to a single space. Left/Right Trim - remove all spaces around the value and pad the field with spaces. Left/Right Pack - remove all spaces, pack left/right, and pad the field with spaces. No Justification (default) - no action is taken. Note for Asian Character Data: There is no distinction between full-width spaces and half-width spaces in the Field Justification operation. Full-width spaces within the text are converted to half-width spaces.
Scan Format | Indicates the format of the value for which to scan: either a Literal value (the actual value) or a Mask value (the shape of the value)
Scan Value | User-defined value for which to scan in a specified scan field
Change Value | User-defined value that replaces the scan value
Change Occurrences | Numeric value that indicates how many times to scan for a value in a particular word or field
Scan Position | The physical location in the field at which to begin scanning for the value: the exact Beginning of the field, anywhere in the field (Default), or the exact End of the field
Scan Level | Indicates whether to scan for a value at either the Field level or at the Word level
Scan Direction | Indicates the direction of the scan: Right-to-Left or Left-to-Right
Between Substring | String of user-defined characters between which to scan
And Substring | Ending substring between which to scan
Retain Between Characters | Whether to retain the scanned-for value between characters (check box)
Scan Value Encoding | Specifies the code page used by the scan value
Change Value Encoding | Specifies the code page used by the change value
Between Substring Encoding | Code page used by a string of characters between which to scan

Example

In this example, the phone number currently has dashes and spaces. To match more accurately, you should remove the dashes and spaces from the phone number. To change the phone number format, scan the Phone field for the Literal value “-” (a dash) using the following criteria:

Setting | Description
Scan Field | Phone
Field Justification | Left Pack
Scan Format | Literal Value
Scan Value | - (a dash)
Scan Position | Default
Scan Level | Field
Change Value | “” (two double quotation marks)
Change Occurrences | A (for All)

These settings will cause the Transformer to scan the Phone field for the literal value “-” at the Field level. If the value is found, the Transformer will left-pack the value and change it to nothing.

Two double quotation marks with nothing between them as the Change Value will change the value to nothing.

Phone Field (before) | Phone Field (after)
207-555-4423 | 2075554423
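The phone number example can be approximated in Python as follows. This is a sketch of the idea only: the function names are hypothetical, the field width is passed in explicitly, and the order used (change first, then left-pack) is an assumption, since the guide does not spell out the sequence.

```python
def scan_and_change(value: str, scan_value: str, change_value: str = "") -> str:
    """Replace every occurrence of a literal scan value (Change Occurrences = All)."""
    return value.replace(scan_value, change_value)


def left_pack(value: str, width: int) -> str:
    """Remove all spaces, then pad on the right with spaces to the field width."""
    return value.replace(" ", "").ljust(width)


# A 12-character Phone field that contains a dash and a space
phone = left_pack(scan_and_change("207-555 4423", "-"), 12)
```

After both operations the digits sit at the left of the field and the remainder of the fixed-width field is filled with spaces.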



Scan and Copy/Cut

To scan a field and copy or cut its value

1. Click Advanced and navigate to Output, Field Scanning. Select the Copy or Cut tab.

2. Refer to the following table and specify values for Copy or Cut in the Field Scanning window:

Field | Description
Scan Field | Field in the DDL file that specifies the location in which to scan
Target Field | Specifies the field in which to store the scan result
Field Justification | Specifies how data contained in the field is aligned. Left/Right Adjust - remove all spaces around the value, pad the field with spaces, and change multiple spaces within the value to a single space. Left/Right Trim - remove all spaces around the value and pad the field with spaces. Left/Right Pack - remove all spaces, pack left/right, and pad the field with spaces. No Justification (default) - no action is taken.
Scan Format | Indicates the format of the value for which to scan: a Literal value or a Mask value
Scan Value | User-defined value for which to scan in a specified field
Retain Scan Value | When checked, retains the scanned-for value in the target field
Scan Level | Indicates whether to scan at either the Field level or the Word level
Scan Position | Indicates the physical location in the field at which to begin scanning: the exact Beginning of the field, anywhere in the field (Default), or the exact End of the field
Scan Direction | Indicates the direction of the scan: Right-to-Left or Left-to-Right
Scan Capture | Indicates the data to capture, based on the position of the scanned-for value in the word or field
Word Delimiter | Specifies the delimiter used to separate words within a field
Between Substring | String of user-defined characters between which to scan
Retain Between Substring | When checked, retains the scanned-for value between substrings
Scan Value Encoding | Specifies the code page used by the scan value
Word Delimiter Encoding | Specifies the code page used by the word delimiter

Scan and Flag

To scan a field and flag its value

1. Click Advanced and navigate to Output, Field Scanning. Select the Flag tab.

2. Refer to the following table and specify values for Flag in the Field Scanning window:

Setting | Description
Scan Field | Field in the DDL file that specifies the location in which to scan
Target Field | Field that stores the result of the scan
Field Justification | Specifies how data contained in the field is aligned. Left/Right Adjust - remove all spaces to the left/right of the value, pad the field with spaces, and change multiple spaces within the value to a single space. Left/Right Trim - remove all spaces to the left/right of the value and pad the field with spaces. Left/Right Pack - remove all spaces, pack left/right, and pad the field with spaces. No Justification (default) - no action is taken.
Scan Format | Indicates the format of the value for which to scan: either a Literal value (the actual value) or a Mask value (the shape of the value)
Scan Value | User-defined value for which to scan in a specified field
Retain Scan Value | When checked, retains the scanned-for value in the target field
Scan Level | Indicates whether to scan for a value at the Field level or the Word level
Scan Position | Indicates the physical location in the field at which to begin scanning for the scan value: the exact Beginning of the field, anywhere in the field (Default), or the exact End of the field
Scan Direction | Indicates the direction of the scan, either Right-to-Left or Left-to-Right
Word Delimiter | Specifies the delimiters used to separate words within a field
Flag Value | Specifies the user-defined value for a flag
Between Substring | String of user-defined characters between which to scan
Retain Between Substring | When checked, retains the scanned-for value between substrings
Scan Value Encoding | Specifies the code page used by the scan value
Word Delimiter Encoding | Specifies the code page used by the word delimiter
Flag Value Encoding | Indicates the code page used for a flag value

A red flag indicates a REQUIRED field for this operation.

Example

To flag the Doctor_flag field, scan the Title field for the Literal value DR using the following criteria:

Setting | Description
Scan Field | Title
Target Field | Doctor_flag
Field Justification | No Justification
Scan Format | Literal Value
Scan Value | DR
Retain Scan Value | Checked
Scan Position | Default
Scan Level | Field
Flag Value | Y

Literal values are always case sensitive.

These options direct the Transformer to scan the Title field for the literal value DR at the Field level. If the value is found, the Transformer will retain the scan value (DR) in the source field, and place the flag value Y in the Doctor_flag field.

Title Field | Doctor_flag Field
DR | Y
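The Doctor_flag example can be mimicked with a few lines of Python. This is a simplified sketch, not the Transformer's logic: the function name is hypothetical, records are modeled as dictionaries, and scanning anywhere in the field is approximated with a plain substring test.

```python
def scan_and_flag(record: dict, scan_field: str, target_field: str,
                  scan_value: str, flag_value: str) -> dict:
    """Return a copy of the record with the flag set when the literal is found.

    The comparison is case sensitive, in line with the note that literal
    values are always case sensitive. A substring test stands in for a
    Field-level scan with the default (anywhere) scan position, so it would
    also match longer values that merely contain the literal.
    """
    flagged = dict(record)
    if scan_value in record.get(scan_field, ""):
        flagged[target_field] = flag_value
    return flagged


doctor = scan_and_flag({"Title": "DR", "Doctor_flag": ""},
                       "Title", "Doctor_flag", "DR", "Y")
other = scan_and_flag({"Title": "Dr", "Doctor_flag": ""},
                      "Title", "Doctor_flag", "DR", "Y")
```

The source field is left untouched in both cases, which corresponds to Retain Scan Value being checked.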



Table Recoding

The Transformer’s Table Recoding function converts the values in a field using an external user-defined recode table. You can recode literal or mask values.

Mask

Masks are character representations of a data value which define each character in the data value as follows:

Code | Represents
A | Represents any letter (a-z, A-Z)
N | Represents a numeral (0-9)
explicit | Any data value element that is not a number or a letter is shown exactly as it appears in the data value, including spaces.

Example

Value | Pattern shown in TS Quality
Jane Smith | aaaa aaaaa
5.00E+02 | n.nna+nn
$400.00 | $nnn.nn
05/31/2005 | nn/nn/nnnn

To perform table recoding

1. Create a user-defined recode table. You can create a recode table in any text editor.

2. Table recoding uses a comma-delimited file with one column for the original value and a second column for the recoded value.

You can use any filename or suffix that you want, as long as the file itself is comma-delimited.


Example

In this example, the Start_date field has a variety of data shapes and formats, such as 1/1/2005 and 1/01/2005. Create a recode table as shown to change the mask shapes for the Start_date field, so that every Start_date has the format MM-DD-YYYY.

Figure 5.3 Sample Recode Table (columns: Original Mask and Recode Mask; N = Numeric)

3. The table requires a DDL that assigns field names to the two columns. Create a DDL file that corresponds to the recode table. For example, a DDL file for the table above would look like this:

Figure 5.4 Sample DDL for the Recode Table



4. After you create a recode table and associated DDL, click Advanced in the Transformer step and navigate to Output, Table Recoding.

5. Enter a Table Qualifier. A Table Qualifier is a unique name given to a table file. Each table file must have its own unique qualifier.

6. Enter names for the Table File and Table DDL File.

7. Specify Table File Delimiter.

8. Specify Lookup Table Fields. These fields are a list of DDL field names in the recode table where the original values are described.

9. Specify Lookup Output Fields. These fields are a list of DDL field names in the output file where the program looks for the original values.

10. Specify the Lookup Output Fields Format: Literal or Mask.

11. Specify Recode Table Fields. These fields are a list of DDL field names in the recode table where the recoded values are described.

12. Specify the Recode Table Fields Format: Literal or Mask.

13. Specify Recode Output Fields. These fields are a list of DDL field names from the output DDL which are used to store the recode value from the recode table.

Below are the sample settings for the Start_date field.

Setting | Value
Table Qualifier | TBL1
Table File | datamask.csv
Table DDL File | datamask.ddx
File Delimiter | Comma
Lookup Table Fields | originalmask
Lookup Output Fields | Start_date
Lookup Output Fields Format | Mask
Recode Table Fields | recodemask
Recode Table Fields Format | Mask Value
Recode Output Fields | Start_date

These settings tell the Transformer to scan the Start_date field by mask and to recode the value according to the recode table (datamask.csv). After running the Transformer, every Start_date will have the format MM-DD-YYYY.

You can specify up to five (5) fields for Lookup Table Fields, Lookup Output Fields and Recode Output Fields. When you specify multiple fields, separate them with commas.

The .ddx and .csv suffixes are not required for the files to work; however, we recommend that you use them to avoid confusion.

14. You can also specify the following setting:

Setting | Description
Lookup Fields Case-Sensitive | Enables or disables the case-sensitive table lookup. By default, the lookup is case-insensitive. For example, “Rick” will match either “RICK” or “riCK”.
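Two ideas from this section, the mask notation and the literal table lookup, can be sketched in Python. The sketch uses lowercase `a` and `n` because that is how the patterns appear in the mask examples earlier in this section. The function names are hypothetical, leaving unmatched values unchanged is an assumption, and the way the Transformer rewrites a value from one mask shape to another is not modeled.

```python
import csv
import io


def mask_of(value: str) -> str:
    """Letters (a-z, A-Z) become 'a', digits (0-9) become 'n', the rest is kept."""
    return "".join(
        "a" if ch.isascii() and ch.isalpha()
        else "n" if ch.isascii() and ch.isdigit()
        else ch
        for ch in value
    )


def load_recode_table(text: str, case_sensitive: bool = False) -> dict:
    """Read 'original,recoded' rows from comma-delimited text into a lookup."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows in this sketch
        original, recoded = row
        table[original if case_sensitive else original.upper()] = recoded
    return table


def recode(value: str, table: dict, case_sensitive: bool = False) -> str:
    """Look up a literal value; unmatched values are returned unchanged (assumed)."""
    key = value if case_sensitive else value.upper()
    return table.get(key, value)


names = load_recode_table("Rick,Richard\n")
date_shape = mask_of("05/31/2005")
```

Computing the mask of each value first is a convenient way to see which distinct shapes a field contains before deciding which rows a recode table needs.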



Conditionals

Conditionals control the flow of TS Quality processes by performing specific operations on data records, or by running functions. In the Transformer, the Conditionals function controls all other functions including character translation, field scanning and table recoding. The conditionals settings are specified in the Advanced Settings, Conditionals window. This section explains the conditionals syntax and sample usage, and then teaches you how to build a conditional statement.

If you are using translation, recode, and/or scan functions in the Transformer, you must specify Conditionals. See “Build a Conditional Statement” on page 5-35 for instructions.

In addition to the Transformer, you can use conditional statements for the following TS Quality modules:

Customer Data Parser

Business Data Parser

File Display Utility

File Update Utility

Set and Selection Utility

Syntax

An IF/ELSE statement is used to describe the condition. The following syntax must be used to build the conditional statement:

IF [condition]
    RUN [function1]
    SET [function2]
ELSE
    RUN [function3]
    SET [function4]
ENDIF

The IF keyword allows you to conduct conditional tests on values in the field. When the conditions are True, the RUN and/or SET keywords following IF are executed. When the conditions are False, the RUN and/or SET keywords following the ELSE keyword are executed. A conditional statement always closes with ENDIF. Refer to the following table for a list of keywords used in conditional statements.

Table 5.1 Keywords of Conditional Statements

Keyword Description

IF Begins a statement. When conditions are True, the action statements following the IF keyword are executed. Required.

RUN Precedes action commands.

SET Precedes assignment commands.

ELSE When conditions are False, the action statements following the ELSE keyword are executed.

ELSEIF When IF conditions are False, the ELSEIF condition is evaluated.

ENDIF Closes out a conditional statement. Required.

IF Statement

The IF statement sets the condition. The IF statement is defined by:

DDL field names

Operators (arithmetic/comparison/logical)

Field value(s)

Literal field values such as “Boston” must be enclosed in double quotation marks. Field names and numeric values do not need quotation marks. If numeric values such as “123” are enclosed in quotation marks, they are read as literal values instead of numeric values.
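The quoting rule for IF operands (quoted text is a literal, unquoted digits are numeric, anything else is a field name) can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's parser: `classify_operand` is a hypothetical helper, and it recognizes only straight double quotes and simple decimal numbers.

```python
import re

NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")


def classify_operand(text):
    """Classify one operand of a condition as literal, number, or field name."""
    text = text.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        # Quoted values are always literals, even when they contain digits.
        return ("literal", text[1:-1])
    if NUMBER.fullmatch(text):
        return ("number", float(text))
    # Anything else is assumed to be a DDL field name.
    return ("field", text)


for operand in ['"Boston"', "123", '"123"', "age"]:
    print(operand, "->", classify_operand(operand))
```

The third case shows the point made in the text: "123" inside quotation marks is treated as a literal rather than as a number.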


Example

IF (age > 18 AND state IN (“NY”, “MA”) ) OR first_name LIKE “*ob”

IF statements can be nested as long as a corresponding ENDIF statement closes out each IF statement, as in the following nested IF sample:

IF [condition1]
    IF [condition2]
        SET [function1]
    ELSE
        RUN [function2]
    ENDIF
    SET [function3]
ELSE
    RUN [function4]
ENDIF

RUN/SET Statements

The RUN/SET statement contains the function to perform.

RUN

The RUN statement is defined by:

Function names as defined in the Transformer’s settings file

Entry ID (list of entries) to be executed (comma-delimited values or ranges of values)

Example

IF (age > 18)
    RUN FIELD_SCANNING(2,3)
    RUN CHARACTER_TRANSLATION(3-5)
ENDIF

In the first RUN statement of this example, the numbers in parentheses (2,3) apply to ENTRY_ID 2 and ENTRY_ID 3 under FIELD_SCANNING. In the second RUN statement, the numbers in parentheses (3-5) apply to ENTRY_ID 3, 4, and 5 under CHARACTER_TRANSLATION.
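How an entry list such as (2,3) or (3-5) maps to individual ENTRY_ID values can be shown with a short Python sketch. The helper `expand_entry_ids` is hypothetical and only mirrors the notation described above (comma-delimited values, ranges, and the ALL operator); it is not part of TS Quality.

```python
def expand_entry_ids(spec):
    """Expand the text between the parentheses of a RUN statement."""
    spec = spec.strip()
    if spec.upper() == "ALL":
        return "ALL"  # every defined entry
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as 3-5 includes both end points.
            start, end = (int(piece) for piece in part.split("-", 1))
            ids.extend(range(start, end + 1))
        else:
            ids.append(int(part))
    return ids


print(expand_entry_ids("2,3"))
print(expand_entry_ids("3-5"))
```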

SET

The SET statement takes as arguments:

DDL field names

The equal sign assignment operator (=)

Value or field data arithmetic

Example

IF (age > 18)
    SET age = processing_date - birth_date
ENDIF

ELSE Statement

The ELSE statement runs certain statements if a specified condition is False. In other words, you can use an IF/ELSE statement to define two blocks of executable statements: one block to run if the condition is True, the other block to run if the condition is False.

Example

IF (age > 18)
    RUN FIELD_SCANNING(2, 3)
    SET age = processing_date - birth_date
ELSE
    SET record_notes = "Invalid"
ENDIF

In this example, if (age > 18) evaluates as True, RUN FIELD_SCANNING(2, 3) and SET age = processing_date - birth_date are executed. If (age > 18) evaluates as False, the statement SET record_notes = "Invalid" is executed.


ELSEIF Statement

A variation on the IF/ELSE statement allows you to choose from several alternatives. Adding ELSEIF clauses expands the functionality of the statement so you can control program flow based on different possibilities.

Example

IF (age > 21)
    RUN FIELD_SCANNING(2,3)
ELSEIF (age > 18)
    RUN CHARACTER_TRANSLATION(3-5)
ELSE
    RUN FIELD_SCANNING(1)
ENDIF

In this example, if (age > 21) evaluates as True, FIELD_SCANNING (2, 3) is executed. If (age > 21) evaluates as False, the ELSEIF condition (age > 18) is evaluated. If (age > 18) evaluates as True, CHARACTER_TRANSLATION (3-5) is executed. If both conditions, (age > 21) and (age > 18), evaluate as False, the statement RUN FIELD_SCANNING (1) is executed.

You can add as many ELSEIF statements as you need to provide alternative choices. However, note that extensive use of ELSEIF clauses often becomes cumbersome.
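For readers more familiar with general-purpose languages, the control flow of the ELSEIF example corresponds to an ordinary if/elif/else chain. The Python sketch below only models which branch is taken; `choose_action` is a hypothetical function and the returned strings merely name the RUN statement that would execute.

```python
def choose_action(age):
    """Return the RUN statement selected by the IF/ELSEIF/ELSE example."""
    if age > 21:
        return "FIELD_SCANNING(2,3)"
    elif age > 18:
        # Reached only when the first condition was False.
        return "CHARACTER_TRANSLATION(3-5)"
    else:
        return "FIELD_SCANNING(1)"


for age in (30, 20, 18):
    print(age, "->", choose_action(age))
```

As in the conditional statement, the branches are tested in order and only the first one whose condition is True runs.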

Operators in Conditional Statements

The following operators can be used in conditional statements:

Table 5.2 Operators in Conditional Statements

Operator Description

ALL Performs every defined function entry. Example: CHARACTER_TRANSLATION (ALL)

ALWAYS Always returns True; always performs the specified operation. Example: IF (ALWAYS)

AND Connects two conditions (both must be True). Example: IF (age>18 AND gender = “M”)

OR Connects two conditions (at least one must be True). Example: IF (age>18 OR year_of_birth > 1987)

UCASE Converts a literal value or field data to uppercase to evaluate the IF statement. Example: IF (UCASE(last_name)=“SMITH”) SET last_name=UCASE(last_name)

This example tests whether the field contains “SMITH” in any combination of uppercase and lowercase letters and, if True, converts the string in the field to uppercase.

LCASE Converts a literal value or field data to lowercase to evaluate the IF statement. Example: IF (LCASE(last_name)=“smith”) SET last_name=LCASE(last_name)

This example tests whether the field contains “smith” in any combination of uppercase and lowercase letters and, if True, converts the string in the field to lowercase.

= Is equal to

!=, <> Is NOT equal to

> Is greater than

< Is less than

>= Is greater than or equal to

<= Is less than or equal to


LIKE Matches the field value against a literal that contains a wildcard asterisk (*). Place the asterisk before the literal (for example, “*LE”) so that the wildcard stands for the beginning of the string and values ending in the literal match, or place it after the literal (for example, “LE*”) so that the wildcard stands for the end of the string and values beginning with the literal match. You cannot place an asterisk in the middle of a literal (for example, “L*E”). Example: IF first_name LIKE “*OB”

IN Means “field value is in”. Example: IF house_number IN “1,2,3,4”

BETWEEN Means “field value is between”. Example: IF house_number BETWEEN “12,34”

+ Sum of

– Difference of

|| String concatenation

/ Divided by

* Multiplied by
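The wildcard behavior of the LIKE operator in the table above can be approximated with the following Python sketch. It assumes the usual wildcard reading (the asterisk stands for any run of characters) and a case-sensitive comparison; both are assumptions, and `like` is a hypothetical helper rather than the product's matcher.

```python
def like(value, pattern):
    """Match `value` against a literal with one leading or trailing asterisk."""
    if pattern.count("*") > 1 or "*" in pattern[1:-1]:
        # The guide allows the asterisk only at the start or end of the literal.
        raise ValueError("asterisk must be at the start or end of the literal")
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return value == pattern


print(like("Bob", "*ob"))        # wildcard covers the beginning of the value
print(like("LEXINGTON", "LE*"))  # wildcard covers the end of the value
```

A pattern such as "L*E" raises an error in this sketch, mirroring the restriction that the asterisk cannot appear in the middle of a literal.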

Operators for Asian Characters

In addition to the operators in the previous section, TS Quality supports a wide range of operators specific to Asian character data. The following table shows the list of operators that you can use in conditional logic statements for Asian data.

Table 5.3 Operators for Asian Characters

Name Description

JTOKATAKANA

(Japan)

Transforms Hiragana characters to full-width Katakana.

Example:

はーとはんくす ⇒ ハートハンクス

JTOHIRAGANA

(Japan)

Transforms full-width Katakana characters to Hiragana.

Example:

ハートハンクス ⇒ はーとはんくす

CJKTOHALF

(China, Japan, Korea, Taiwan)

Transforms full-width characters to their half-width form. This operator automatically processes Katakana accent marks (dakuten and handakuten) appropriately.

Example:

Ｈａｒｔｅ－ｈａｎｋｓ ⇒ Harte-hanks



CJKTOFULL


Transforms half-width characters to their full-width form. This operator automatically processes Katakana accent marks (dakuten and handakuten) appropriately.

Example:

Harte-hanks ⇒ Ｈａｒｔｅ－ｈａｎｋｓ

JKANATOROMAN

(Japan)

Transforms Hiragana and full-width Katakana characters to Hebon-style (Hepburn) Romaji.

Example:

じょうぞうしょ ⇒ jouzousho

JROMANTOKANA

(Japan)

Transforms Romaji (Kunrei or Hebon) characters to full-width Katakana.

Example:

haatohankusu ⇒ ハートハンクス

KTOROMAN

(Korea)

Transforms Korean Hangul characters to their Romanized forms.

Example:

대치동 ⇒ daech’idong



HIRAGANASTOL

(Japan)

Transforms small-size yo-on and soku-on characters to their large equivalents.

Zenkaku あいうえおつやゆよわ

Hankaku ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮヵ

Example:

マッチャー ⇒ マツチヤー

CTOTRADCHINESE

(China, Taiwan)

Transforms all Simplified Chinese characters to their Traditional Chinese equivalent.

Example:

广东 ⇒ 廣東

CTOSIMPCHINESE

(China, Taiwan)

Transforms all Traditional Chinese characters to their Simplified Chinese equivalent.

Example:

臺灣 ⇒ 台湾

CJKTOARABICNUM


Transforms Chinese number symbols to their Arabic decimal equivalents.

Example:

百五十 ⇒ 150

Apply this operator only to fields in which Chinese number symbols represent numbers. Otherwise, unintended conversions such as the following may occur: 千葉県 ⇒ １０００葉県
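The caution about 千葉県 follows from how a character-based numeral conversion works. The Python sketch below is a deliberately naive illustration, not the CJKTOARABICNUM implementation: it handles only the digits 〇 to 九 with the units 十, 百, and 千, and it writes half-width digits, whereas the example above shows full-width digits. The names `run_to_int` and `naive_to_arabic` are hypothetical.

```python
import re

DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}
NUMERAL_RUN = re.compile("[" + "".join(DIGITS) + "".join(UNITS) + "]+")


def run_to_int(run):
    """Convert one run of unit-based numerals, for example 百五十 to 150."""
    total, current = 0, 0
    for ch in run:
        if ch in DIGITS:
            current = DIGITS[ch]
        else:
            # A unit with no preceding digit counts once (百 alone is 100).
            total += (current or 1) * UNITS[ch]
            current = 0
    return total + current


def naive_to_arabic(text):
    """Replace every numeral run in the text, whether or not it is a number."""
    return NUMERAL_RUN.sub(lambda m: str(run_to_int(m.group())), text)


print(naive_to_arabic("百五十"))
print(naive_to_arabic("千葉県"))  # a place name, yet 千 is still converted
```

Because the conversion cannot tell that 千 in 千葉県 is part of a place name, the field-level restriction described above is the practical safeguard.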



Full-width (Zenkaku) and half-width (Hankaku) Japanese Characters

The following list shows Japanese full-width and half-width characters that can be converted using these operators.

Blank character

Romaji ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

Number 0123456789

Symbol ~ ! @ # $ % ^ & * （） _ + ` - = { } | [ ] \ : ì ; < > ? , . / ﾞﾟ ‘ “

Katakana アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワンャュョァィゥェォッ
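Two of the conversions listed above can be reproduced for simple cases with Unicode code-point arithmetic. The Python sketch below is an approximation under stated assumptions: it converts only full-width Romaji, numbers, symbols, and the blank character to half-width, and only Hiragana to full-width Katakana. It does not handle half-width Katakana or the dakuten and handakuten marks that the CJKTOHALF and CJKTOFULL operators process. The function names are hypothetical.

```python
def to_halfwidth_ascii(text):
    """Map full-width ASCII variants (U+FF01 to U+FF5E) to their ASCII forms."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                # full-width blank character
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:    # full-width letters, digits, symbols
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def hiragana_to_katakana(text):
    """Shift Hiragana (U+3041 to U+3096) into the Katakana block."""
    return "".join(
        chr(ord(ch) + 0x60) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in text
    )


print(to_halfwidth_ascii("Ｈａｒｔｅ－ｈａｎｋｓ"))
print(hiragana_to_katakana("はーとはんくす"))
```

The long vowel mark ー is shared by both scripts, so the second function leaves it unchanged.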

Table of Katakana/Hiragana and Their Hebon/Kunrei Romaji Equivalents

Hiragana Katakana Hebon Kunrei    Hiragana Katakana Hebon Kunrei

あ ア a a    が ガ ga ga
い イ i i    ぎ ギ gi gi
う ウ u u    ぐ グ gu gu
え エ e e    げ ゲ ge ge
お オ o o    ご ゴ go go
か カ ka ka    ざ ザ za za
き キ ki ki    じ ジ ji zi
く ク ku ku    ず ズ zu zu
け ケ ke ke    ぜ ゼ ze ze
こ コ ko ko    ぞ ゾ zo zo
さ サ sa sa    だ ダ da da
し シ shi si    ぢ ヂ di di
す ス su su    づ ヅ du du
せ セ se se    で デ de de
そ ソ so so    ど ド do do
た タ ta ta    ば バ ba ba
ち チ chi ti    び ビ bi bi
つ ツ tsu tu    ぶ ブ bu bu
て テ te te    べ ベ be be
と ト to to    ぼ ボ bo bo
な ナ na na    ぱ パ pa pa
に ニ ni ni    ぴ ピ pi pi
ぬ ヌ nu nu    ぷ プ pu pu
ね ネ ne ne    ぺ ペ pe pe
の ノ no no    ぽ ポ po po
は ハ ha ha
ひ ヒ hi hi    しゃ シャ sha sya
ふ フ fu fu    しゅ シュ shu syu
へ ヘ he he    しょ ショ sho syo
ほ ホ ho ho    ちゃ チャ cha tya
ま マ ma ma    ちゅ チュ chu tyu
み ミ mi mi    ちょ チョ cho tyo
む ム mu mu    じゃ ジャ ja zya
め メ me me    じゅ ジュ ju zyu
も モ mo mo    じょ ジョ jo zyo
や ヤ ya ya
ゆ ユ yu yu
よ ヨ yo yo
ら ラ ra ra
り リ ri ri
る ル ru ru
れ レ re re
ろ ロ ro ro
わ ワ wa wa
ん ン n n
を ヲ wo wo

How to Use Operators for Asian Characters

Asian Pacific (APAC) operators:

JTOKATAKANA, JTOHIRAGANA, CJKTOHALF, CJKTOFULL, JKANATOROMAN, JROMANTOKANA, KTOROMAN, HIRAGANASTOL, CTOTRADCHINESE, CTOSIMPCHINESE, CJKTOARABICNUM.

The following are some simple examples of conditional statements for APAC operators.

Syntax 1

This syntax is used to convert a literal value or field data in the DDL field.

IF [condition]
    SET [DDL field name] = [Operator](DDL field name)
ENDIF

Example 1

In this example, all full-width characters in the INPUT_LINE_01 field are converted to their half-width form.

IF (ALWAYS)
    SET INPUT_LINE_01 = CJKTOHALF(INPUT_LINE_01)
ENDIF

Syntax 2

This syntax is used to convert a literal value or field data in DDL field 2 and compare it against the value in DDL field 1 to evaluate the IF statement.

IF [DDL field name 1] = [Operator] (DDL field name 2)
    RUN [function]
ENDIF

Example 2

In this example, the program converts the Traditional Chinese characters in the CUSTOMER_NAME field to Simplified Chinese, and compares the result against the value in the INPUT_LINE_01 field. If the converted value is equal to the value in INPUT_LINE_01, it runs FIELD_SCANNING. If the value is not equal, it runs the TABLE_RECODING function.

IF INPUT_LINE_01 = CTOSIMPCHINESE (CUSTOMER_NAME)
    RUN FIELD_SCANNING(ALL)
ELSE
    RUN TABLE_RECODING(ALL)
ENDIF


Build a Conditional Statement

Conditional statements are built in the Conditionals Logic Builder in Advanced Settings. You can specify conditional settings for your input data or output data.

To build a Conditional Statement

1. Click Advanced and navigate to Output, Conditionals.

2. Click Edit Condition to open the Logic Builder window.

Notice that the default setting “IF (ALWAYS), RUN FIELD_SCANNING (ALL)” has been specified. This means that the Field Scanning function will always run for all records.

Figure 5.5 Conditionals Logic Builder


3. Click the button on the upper right and select your Qualifiers For Input Data Files from the pop-up list.

4. Select condition encoding from the Condition Encoding drop-down list.

5. In the middle pane, place the cursor after “RUN FIELD_SCANNING (ALL)”.

6. In the Key Words box in the lower pane, double-click RUN. The keyword RUN is inserted into the expression at the cursor location.

7. In the Function box in the lower pane, double-click TABLE_RECODING. The function TABLE_RECODING is inserted into the expression at the cursor location.

8. In the middle pane, place the cursor after TABLE_RECODING and type in the opening parenthesis.

9. In the Operators box in the lower pane, double-click ALL and close the parentheses. Your expression should now look like this:

10. Click Apply and Close. In this example, the field scanning and table recoding will always run for all records.


Select or Bypass Records

While the Conditionals function is applied to perform specific operations on a record, the Select/Bypass Records function can be used to either Select or Bypass input or output records under certain conditions. The Select/Bypass function uses the Logic Builder located in the Advanced Settings window.

To build a Select/Bypass Condition

1. Click Advanced and navigate to Input or Output, Settings.

2. Select the Select Record Conditions or Bypass Record Conditions tab.

3. Click Edit Condition. The Logic Builder window displays.

Figure 5.6 Select and Bypass Condition Logic Builder

4. In the upper pane, select Condition Encoding from the drop-down list.

You can use Select/Bypass Conditions for most of the TS Quality modules.



5. In the list of DDL Fields in the right pane, double-click a DDL field name. This is the field to which you will apply the select/bypass conditions.

6. In the Operators box in the lower right pane, double-click an operator.

7. When you have finished, click Apply and Close.

In the example above, LINE_01<18 indicates that only records in which the value in the DDL field LINE_01 is less than 18 will be selected for further processing.

You can use any of the operators for the conditional statements to create a select/bypass definition. See “Operators in Conditional Statements” on page 5-26 for the conditional operators.
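The difference between a select condition and a bypass condition can be pictured as two complementary filters over the same records. The Python sketch below is only a model: it assumes that a bypass condition skips exactly the records that a select condition with the same expression would keep, and the record values and function names are hypothetical.

```python
def select_records(records, condition):
    """Keep only the records for which the condition is True."""
    return [record for record in records if condition(record)]


def bypass_records(records, condition):
    """Skip the records for which the condition is True (assumed behavior)."""
    return [record for record in records if not condition(record)]


def line_01_under_18(record):
    # Models the condition LINE_01<18 from the example.
    return record["LINE_01"] < 18


# Hypothetical input records.
records = [{"LINE_01": 12}, {"LINE_01": 18}, {"LINE_01": 40}]

print(select_records(records, line_01_under_18))
print(bypass_records(records, line_01_under_18))
```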

Additional Settings

You can also specify the following additional settings:

To include record sequence on output

1. Click Advanced and navigate to Output, Settings.

2. In File Sequence Field, select the DDL field which will receive the record sequence number.


1. Click Advanced and navigate to Additional....

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default or specify the path and file name of the debug file.

4. Optionally, you can enable the File Trace Key function. If the File Trace Key is specified (field name), the debug file uses the value of that field when reporting.

For example, if a record has a Field Scan performed on it, then a line is added to the debug file describing the recode. The value of the specified field is used to identify the record in the report. If this function is not used, each record that is read gets a record number assigned based on the order in which it was read.

5. Click Advanced and navigate to Input, Settings. Specify a DDL field name for File Trace Key.

See “Transformer” in the TS Quality Reference Guide for the complete settings information.

To specify a mask file

1. Click Advanced and navigate to Process, Settings.

2. In the Mask File text box, specify the path and file name for the mask file.

To count the number of records processed

1. Click Advanced and navigate to Additional....

2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.



1. Click Advanced and navigate to Additional....

2. In Settings File Encoding, select the appropriate encoding from the drop-down list.


Run the Transformer and View Results

To run the Transformer and view results

1. Click OK to close the Advanced settings.

2. Click Run to run the Transformer.

You can also right-click a step and select Run Selected.

3. Select OK.

4. On the Results tab, view the Statistics sub-tab. Notice the records affected by the Field Scan and Table Recode.

5. On the Output Settings tab, use the Data Browser to view the Phone and Start_date fields.

6. Run the inputTransformer step. On the Results>Statistics tab, review the output statistics using My Statistics Viewer and the Spreadsheet Viewer. Both viewers allow the user to print statistics for other applications.

7. View the fields on the Output Settings tab using the Data Browser to be sure the scan and recode occurred.




CHAPTER 6 Standardizing Your Data


In this chapter, you will standardize the name and address elements using the Customer Data Parser, then standardize the non-name and address elements using the Business Data Parser.

This chapter explains the parsing logic used to standardize data elements. You will perform these tasks:

Specify input and output files

Define the settings for the Customer Data Parser and Business Data Parser

Use name generation to determine how many additional records are generated

Set line definitions for input data

Run the Customer Data Parser and Business Data Parser and view results

For Asia-Pacific countries (China, Japan, Korea, and Taiwan), the Customer Data Parser identifies and standardizes the name elements only. Parsing and standardization of address elements for those countries' data is performed by country-specific Postal Matchers.



Using the Customer Data Parser

The Customer Data Parser (CDP) identifies freeform name and address data. The CDP identifies elements of data from the input data file using the data in the fields INPUT_LINE_01 through INPUT_LINE_10.

Only the data in the fields INPUT_LINE_01 through INPUT_LINE_10 will be parsed.

The CDP uses country-specific tables in order to verify and identify data according to each country’s postal rules and idioms. Once the data is identified, output is generated.

The following two field data types are output:

original data

recoded or standardized data

The parsing process is highly table-driven. This allows users to customize name and address identification for specific business requirements.

The CDP identifies and standardizes name and address data elements. To parse non-name and address data elements, such as product name, use the Business Data Parser (BDP).

If the CDP cannot identify a piece of data in the record, an exception is written to the exception file. This file can then be used to add customized entries to the Parser Definitions Table.

See Chapter 7, “Tuning the Parsing Rules” for instructions on analyzing exceptions and customizing the Parser Definitions Tables.



Understanding Parsing Logic Flow

The CDP assigns all possible attributes to the input name and address data. Based on the attributes, the CDP identifies line types and assigns final attributes based on known patterns. The CDP’s output includes original data as well as recoded or standardized values. The Customer Data Parser follows this process:

1. Assign all possible attribute(s) to the word/phrase (tokens) in the Input Name and Address Area, such as Title-Prefix, Given-Name1, Surname, etc.

2. Identify line types according to attribute weights and counts.

3. Search for known patterns and assign final word/phrase attributes.

4. Generate output.

Example

Assume that you have the following name and address data in an input file:

INPUT_LINE_01 Lexington Drug
INPUT_LINE_02 Ben K Pike MD
INPUT_LINE_03 10 Lois Lane
INPUT_LINE_04 Lexington 02420

In the above example, INPUT_LINE_01 will be defined as a BUSINESS NAME line because the word ‘Drug’ has a BUSINESS definition in the word/pattern table, and because ‘Lexington’ is the same as the city name.

INPUT_LINE_02 will be defined as a PERSONAL NAME line based on its relation to the other lines in the input area and because of the name definitions found on the line. (A detailed explanation of how this particular line was processed follows this section.)

INPUT_LINE_03 will be defined as a STREET line based on its relation to other lines in the input area and because of the HSNO and STR-TYPE attributes found on that line.

INPUT_LINE_04 will be defined as a GEOGRAPHIC line because of the POST CODE mask found in the line, and because the combination of POST CODE and CITY is found in the parsing city table. In this example, the CDP will add the state abbreviation of MA to the output record.

How the Customer Data Parser Identifies Business Names

The Customer Data Parser uses the following criteria to determine if a name is a business name: a line...

Contains at least one word of attribute BUSINESS

Does not contain a word of an attribute of personal nature (for example, GIVEN-NAME1 or SURNAME)

Begins with the same value as the city and is not further qualified

Contains a word that uses an apostrophe followed by the letter “s” (possessive form)

Contains an unidentified word that consists of all consonants and is at least four characters long

Does not pass Name Pattern Validation (it will have a reject name form, but will be stored in the PREPOS business name field)

Contains more than one comma on the name line
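Three of the criteria above depend only on the text of the line and can be sketched without the parser's tables. The Python functions below are illustrative assumptions, not the CDP's logic: they treat A, E, I, O, and U as the only vowels, they cannot tell whether a word is "unidentified", and they do not model how the CDP weighs the criteria against each other.

```python
import re

VOWELS = set("AEIOU")  # assumption: Y is treated as a consonant


def has_possessive(line):
    """True if a word ends with an apostrophe followed by the letter s."""
    return re.search(r"\b\w+'s\b", line, flags=re.IGNORECASE) is not None


def has_all_consonant_word(line, min_len=4):
    """True if some word has at least min_len letters and no vowels."""
    for word in re.findall(r"[A-Za-z]+", line):
        if len(word) >= min_len and not (set(word.upper()) & VOWELS):
            return True
    return False


def has_multiple_commas(line):
    """True if the line contains more than one comma."""
    return line.count(",") > 1


print(has_possessive("Joe's Pizza"))
print(has_all_consonant_word("BRNT Supply"))
print(has_multiple_commas("Smith, Jones, and Lee"))
```

The remaining criteria (BUSINESS attributes, personal-name attributes, city comparison, and Name Pattern Validation) rely on the parser's definition tables and are not modeled here.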

CDP Parsing Process

This section details the specific processing for INPUT_LINE_02.

Pattern processing provides the final attribute assignment for a line, enabling compound business and personal names to be displayed on the same line.

Step 1 Assign all possible attributes

First, the CDP assigned all possible attributes for each component of data in INPUT_LINE_02.

          BEN             K         PIKE     MD
Name      GVN-NM1         1ALPHA    ALPHA    TITLE-SUFFIX
          RELATIONSHIP
Street    ALPHA           1ALPHA    TYPE     ALPHA
Geog      COUNTRY         1ALPHA    ALPHA    STATE

Step 2 Line type and specific attribute assignment

The CDP identified this line as a Name line because it had more name definitions than street or geography definitions. ‘BEN’ is no longer considered a RELATIONSHIP attribute since it is not located at the END of the name line.

BEN K PIKE MD

Step 3 Pattern lookup and assign final word/phrase attributes

Once the CDP identified the line types and the attributes on those lines, a pattern was created. The CDP then looks the pattern up in the Parser Definitions Table. If the pattern is found, the recode value is returned, as in this example. If the pattern is not found, the CDP will not be able to recode the unknown attributes and it will send the ‘bad name pattern’ to the parsing exception file for review.

Entry from Parser Definitions Table (using allowable abbreviations):

‘GVN-NM1 1ALPHA ALPHA TITLE-SUFFIX’ PATTERN NAME RECODE=’GVN-NM1(1) GVN-NM2(1) SURNAME(1) TITLE-SUFFIX(1)’

BEN        K          PIKE       MD
GVN-NM1    GVN-NM2    SURNAME    TITLE-SUFFIX

The numbers after the attributes in the recode line are referred to as Name Numbers, indicating that the CDP identified one person on this record.

Step 4 Generate Output

The CDP can identify name and address elements for many countries, using country-specific definitions tables. The CDP identifies up to ten lines (100 bytes each) of input Name/Address data. It can also identify up to ten names per input record.

Original Input Data    Standardized Output Data

BUS-NAME: Lexington Drug    BUS-NAME: LEXINGTON DRUG

GVN-NAME1: Ben    GVN-NM1: BENJAMIN

GVN-NAME2: K    GVN-NM2: K

SURNAME: Pike    SURNAME: PIKE

TITLE-SUFFIX: MD    TITLE-SUFFIX: MD

HSNO: 10    HSNO: 10

STREET-NAME: Lois    STREET-NAME: LOIS

STREET-TYPE: Lane    STREET-TYPE: LN

CITY: Lexington    CITY-NAME: LEXINGTON

STATE:    STATE: MA

POST CODE: 02420    POST CODE: 02420
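The pattern lookup in Step 3 behaves like a dictionary lookup keyed by the attribute pattern. The Python sketch below models that idea with the single entry quoted from the Parser Definitions Table; the dictionary, the `recode` function, and the handling of a missing pattern are hypothetical simplifications, not the CDP's actual table format or code.

```python
# One entry, taken from the example; a real table holds many patterns.
PARSER_DEFINITIONS = {
    "GVN-NM1 1ALPHA ALPHA TITLE-SUFFIX":
        "GVN-NM1(1) GVN-NM2(1) SURNAME(1) TITLE-SUFFIX(1)",
}


def recode(tokens, attributes, definitions=PARSER_DEFINITIONS):
    """Return (token, final attribute) pairs, or None for an unknown pattern."""
    pattern = " ".join(attributes)
    recode_value = definitions.get(pattern)
    if recode_value is None:
        # In the CDP the bad pattern would go to the parsing exception file.
        return None
    # Drop the Name Number, for example "(1)", to get the attribute name.
    final = [item.split("(")[0] for item in recode_value.split()]
    return list(zip(tokens, final))


tokens = ["BEN", "K", "PIKE", "MD"]
attributes = ["GVN-NM1", "1ALPHA", "ALPHA", "TITLE-SUFFIX"]
print(recode(tokens, attributes))
```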



Customer Data Parser for China, Japan, Korea, and Taiwan

The Customer Data Parser (CDP) identifies personal and business names for China, Japan, Korea, and Taiwan as follows.

Step 1 - Token Identification

The first step in parsing Asian words is to isolate words and phrases into tokens. Tokens may contain one or more characters (and/or symbols) that are identifiable as a word or word/phrase element. If commas or spaces are present, these are used to determine where one token ends and another begins.

Step 2 - Parsing Definition Table Lookup

The second step is to scan each token against one or more parsing definition tables (also known as a lookup or word/phrase and pattern table). This process verifies which tokens are a personal or business name. It also identifies the surname character(s) and uncovers new tokens based on the look-up results.

How CDP Identifies Tokens for China, Korea, and Taiwan

The first step in parsing is to isolate all words and phrases by breaking up the input field(s) into recognizable tokens. During the initial scan, the Parser uses commas or space characters in the input field to determine where one token ends and the next begins.

Example - China

Input data: 吴卓霖，广东省广州市南华路 22 号，135800

Initial token results: (1 name token) 吴卓霖



Example - Korea

Input data: 홍길동 , 서울시 강남구대치동 973-2 3 층 , 135-280

Initial token results: (1 name token) 홍길동

Example - Taiwan

Input data: 鄭淑珍，台北市四維路 2號 3 樓，106

Initial token results: (1 name token) 鄭淑珍
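The initial scan described above, in which commas and spaces mark token boundaries, can be approximated with a regular-expression split. This Python sketch is an assumption-laden illustration: it treats the ASCII comma, the full-width comma, and any Unicode whitespace as separators, and `initial_tokens` is a hypothetical name. The examples above show only the name token; this sketch also returns the address pieces as tokens.

```python
import re

# ASCII comma, full-width comma (U+FF0C), and whitespace act as separators.
SEPARATORS = re.compile(r"[,\uFF0C\s]+")


def initial_tokens(text):
    """Split an input field into tokens at commas and spaces."""
    return [token for token in SEPARATORS.split(text) if token]


tokens = initial_tokens("吴卓霖，广东省广州市南华路 22 号，135800")
print(tokens[0])   # the name token
print(tokens[1:])  # remaining tokens from the address portion
```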

How CDP Identifies Chinese and Korean Names

The Customer Data Parser (CDP) uses parsing definition tables (also known as word/phrase and pattern tables) to identify each name element.

After initial tokens are created, the Parser scans each token against the appropriate parsing definition tables. During this process, all word elements that can be further identified as part of a name, for example, a surname and given name, are created as separate tokens.
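Splitting a name token into surname and given name by table lookup can be sketched as a longest-prefix match. The Python code below is a simplified model: the `SURNAMES` set is a hypothetical three-entry excerpt standing in for the parsing definition tables, and the limit of two characters for a surname is an assumption.

```python
# Hypothetical excerpt of a surname table.
SURNAMES = {"吴", "鄭", "홍"}


def split_surname(token, surnames=SURNAMES, max_len=2):
    """Return [surname, given name] if a known surname starts the token."""
    # Try the longest candidate first so multi-character surnames win.
    for length in range(min(max_len, len(token) - 1), 0, -1):
        if token[:length] in surnames:
            return [token[:length], token[length:]]
    return [token]  # no surname found: the token stays whole


for name in ("吴卓霖", "홍길동", "鄭淑珍"):
    print(name, "->", " | ".join(split_surname(name)))
```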

Example - China

Token results: 2 tokens

Previous Results    New Results    Reasoning

吴卓霖    吴 | 卓霖    Based on surname lookup

Example - Korea

Token results: 2 tokens

Previous Results    New Results    Reasoning

홍길동    홍 | 길동    Based on surname lookup

Example - Taiwan

Token results: 2 tokens

Previous Results    New Results    Reasoning

鄭淑珍    鄭 | 淑珍    Based on surname lookup

How CDP Identifies Names for Japan

The basic functionality of the Parser consists of the following three parsing methods.

Personal name parsing (PNP)

Business name parsing (BNP)

Personal/Business parsing (BNP_CLUE)

Personal Name Parsing (PNP)

PNP separates personal names. The Parser separates the input field into a last name, a first name, a title and an honorific. It is assumed that the input data contains only the name of one person, but you can create multiple output records when you encounter multiple personal names.

Input data: 山田花子様


Token results: 3 tokens

Last Name    First Name    Honorific

山田    花子    様

Business Name Parsing (BNP)

BNP separates business names. The Parser separates the input field into a business name, a business type and a branch name. You can create a consistent business name by registering the business name pattern to the principal business name table. See “How to Customize the Parser Definition Tables for Japan” in Chapter 7.

Input data: ( 株 )AB 総務部

Principal business name table: B,AB, エービー ,

Business Name    Principal Name    Business Type    Branch

AB    エービー    ( 株 )    総務部

BNP_CLUE (Personal/Business Name Parsing)

BNP_CLUE determines personal/business category and separates the input record accordingly.

Input data 1: 山田花子様

Input data 2: ( 株 )AB 総務部

Last Name    First Name    Honorific    Business Name    Principal Name    Business Type    Branch

山田         花子          様

                                        AB               エービー          ( 株 )           総務部


Zenkaku and Hankaku Parse

The Parser for Japan can take both zenkaku (full-width) and hankaku (half-width) fields as input. The zenkaku and hankaku input fields are specified by the Pr Inp Field Name (zenkaku) and Pr H Inp Field Name (hankaku) settings in Advanced Input Settings. You must have zenkaku data in the zenkaku field and hankaku data in the hankaku field. Then you can specify whether to parse only the zenkaku field or hankaku field, or both fields using the Field Type Parsing Mode settings in Advanced Process Settings.

Example:

Zenkaku field    Hankaku field

INPUT_LINE_01    FURIGANA_NAME

山田太郎    ﾔﾏﾀﾞﾀﾛｳ

ハートハンクス    ﾊｰﾄﾊﾝｸｽ

Zenkaku/Hankaku Mixed: The Parser cannot process the fields where Zenkaku and Hankaku data are mixed (except for spaces).

NULL Mixed: The data with null value in the input field cannot be processed correctly.
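A pre-check for the Zenkaku/Hankaku Mixed restriction can be sketched with Python's standard `unicodedata` module, which reports the East Asian width of each character. This is one possible screening approach under stated assumptions, not something the Parser provides: characters reported as full-width or wide are counted as zenkaku, everything else as hankaku, and whitespace is ignored. The function names are hypothetical.

```python
import unicodedata


def width_class(ch):
    """Classify a character as zenkaku (full-width) or hankaku (half-width)."""
    width = unicodedata.east_asian_width(ch)
    return "zenkaku" if width in ("F", "W") else "hankaku"


def is_mixed(field):
    """True if the field mixes zenkaku and hankaku data (spaces excepted)."""
    classes = {width_class(ch) for ch in field if not ch.isspace()}
    return len(classes) > 1


print(is_mixed("山田太郎"))      # zenkaku only
print(is_mixed("ﾔﾏﾀﾞﾀﾛｳ"))       # hankaku only
print(is_mixed("山田 ﾀﾛｳ"))      # both kinds in one field
```

Running such a check before parsing would flag fields that need to be normalized first, for example with the CJKTOFULL or CJKTOHALF operators in the Transformer.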

PREPOS

The CDP then passes a comprehensive data block called the PREPOS (Parser Repository). The PREPOS contains fixed-fielded character data including error codes, identification indicators, name information, street information and geographic information. The Output DDL determines which of these fields are returned to the Output file.

See Appendix B of TS Quality Reference Guide for a complete list of PREPOS fields and descriptions.



Example (PREPOS Fields)

Figure 6.1 Sample PREPOS Fields



Input and Output Settings

The CDP uses the output from the Transformer step as its input.


1. Open the Customer Data Parser step and select the Input Settings tab.


3. Click the Dictionary Editor icon and view the input DDL.

4. The CDP only scans the fields defined as INPUT_LINE_01 through INPUT_LINE_10. These mappings were provided from the Project Wizard when you specified the Name and Address data. These are reserved field names and they represent the Input data to the CDP.

5. Close the DDL Editor.

6. Select the Output Settings tab.

7. Specify a file name in the Output File Name and Output DDL Name text boxes.

8. Enter a file name in the Statistics File Name and Process Log Name text boxes.



To check the Repository DDL File

1. Click Advanced and navigate to Additional....






2. In Repository DDL File, make sure the correct repository DDL file is specified. This DDL contains the layout of the PREPOS fields.

Country-specific repository DDLs are provided with the program. We recommend that you not change these default PREPOS DDL files. See Appendix B of TS Quality Reference Guide for a complete list of PREPOS fields and descriptions.
















3. For output, click Advanced and navigate to Output, Settings.

4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down list.


To specify an exceptions file

1. Click Advanced and navigate to Process, Settings.

2. In the Exceptions File text box, enter or specify the path and name of the file that contains exception records. Exception records contain data such as incorrect records or field types.

To specify a mask file

1. Click Advanced and navigate to Process, Settings.

2. In the Mask File text box, enter or select the path and file name for the mask file.

You can specify records to either Select or Bypass under certain conditions in both input and output files. See “Select or Bypass Records” on page 5-37 for instructions on how to specify select and bypass definitions.

Process Settings

Once you have specified input and output files, you can specify the settings used to process your data. The settings for processing are managed in the Advanced Settings window.

The navigation pane of the Advanced Settings window contains two tabs:

Parser

Prcustom




The Parser tab contains settings for the Customer Data Parser. The Prcustom tab is used to define settings for the Parser Customization process. The Parser Customization process is explained in the next chapter.

The settings for China, Japan, Korea, and Taiwan differ slightly from other countries. Refer to the online Help or the TS Quality Reference Guide for those countries' settings.

Parser Tables

The Customer Data Parser uses two table files to parse the name and address elements of the input data.

Word Pattern Definition File—Defines word patterns for a given country. It contains standard definitions for words and phrases (tokens), and the patterns associated with each line type.

City Directory File—Defines state names, city names, and postal codes for a given country.

To specify the Parser tables

1. Click Advanced and navigate to Process, Settings.

2. Specify the Word Pattern Definition File and City Directory File.

Default Word Pattern Definition File (US): \TrilliumSoftware\tsq10r5s\<project>\tables\USCDPDEF.len

Default City Directory File (US): \TrilliumSoftware\tsq10r5s\<project>\tables\USCITY.len

These files are read-only. The parser tables and city directories carry a two-letter prefix that indicates the country (US = United States, CA = Canada, GB = United Kingdom, DE = Germany, and so forth).
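The prefix convention can be illustrated with a short sketch. The helper name default_parser_tables and its install_root and project parameters are hypothetical. Only the US file names USCDPDEF.len and USCITY.len come from this guide, and the sketch assumes that other countries follow the same naming pattern with their own two-letter prefix.

```python
from pathlib import PureWindowsPath


def default_parser_tables(install_root, project, country="US"):
    """Build CDP table paths for a two-letter country prefix.

    Assumption: every country uses <prefix>CDPDEF.len and <prefix>CITY.len,
    as the US defaults in this guide do.
    """
    code = country.upper()
    if len(code) != 2 or not code.isalpha():
        raise ValueError("country prefix must be two letters, e.g. US, CA, GB, DE")
    tables = PureWindowsPath(install_root) / project / "tables"
    return {
        "word_pattern_definition": str(tables / f"{code}CDPDEF.len"),
        "city_directory": str(tables / f"{code}CITY.len"),
    }


if __name__ == "__main__":
    # "demo" is a placeholder project name.
    for name, path in default_parser_tables(r"\TrilliumSoftware\tsq10r5s", "demo").items():
        print(name, path)
```

Check the resulting paths against the files actually installed for your country before relying on them.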



Business Attribute

You must specify whether to enable or disable the business assignment function.

To specify the business attribute

1. Click Advanced and navigate to Process, Settings.

2. Refer to the table below and select one of these options for Assigned Business Attribute:

Setting Description

Automatic Business
For any word assigned a BUSINESS attribute, the entire line becomes BUSINESS.

Business via Pattern
Business, possible business, business-descriptive, and business-redefine attributes are all turned off. Business names are generated only from patterns.

No Business Assignment
Turns off the setting of token meanings of business attributes and possible business attributes.

Preprocess House Number

The parser normally preprocesses house numbers before processing street patterns. You must choose whether to preprocess the house number.

To specify whether house numbers are preprocessed

1. Click Advanced and navigate to Process, Settings.

2. Refer to the table below and select one of these options for Preprocess House Number:

Setting Description

No Preprocessing
Disable preprocessing.

Minimum Preprocessing
A fractional number like “1 1/2” becomes “11/2”. Note that "1 1/2" becomes a HSNO token (the fraction portion must be 3 characters in length and include the “/”).

Maximum Preprocessing
Option 1: A number like “1 1/2” becomes “11/2”. Note that "1 1/2" becomes a HSNO token (the fraction portion must be 3 characters in length and include the “/”).
Option 2: A number like “2420-36” becomes “2420 36” (this option does not work for New York, New Jersey and Hawaii).

Line Definitions

In this example, each input record has two names: the first input line is a business name and the second line is a personal name (contact name). This is a very common data structure. You can pre-define these two line types to the CDP, which allows the CDP to work more efficiently.

To set line definitions

1. On the Advanced settings, navigate to Input, Settings.

2. Select the Line Definitions tab. “No Pre-definition” is set by default. This setting allows the CDP to determine the line type.

3. For each Input Address Line, choose one of the following options:

Setting Description

Name Line
Input Address Line that contains personal names

Business Name Line
Input Address Line that contains business names

Street Line
Input Address Line that contains street components

Geography Line
Input Address Line that contains geography components (neighborhood, city, state, county, and postal code)

No Pre-definition
None: the Parser determines the line type (default)

Prohibit Name Line definition
The Parser will not determine the line type



4. On Input Address Line 1, select Business Name Line from the drop-down list. This pre-defines the line as a business name line.

5. Select Name Line from the drop-down list for Input Address Line 2. This pre-defines the line as a name line.

Figure 6.2 CDP Line Definitions

Generate Name Sections

By default, the CDP is set to generate a record for each name found. If you do not want to generate additional output records and would like to identify all names found on the input records, you can create an additional name section so that all names are stored in the same record. In this case, the output DDL must be modified to store the information from the second name identified by the CDP: the consumer name. In this example, you will create two name sections.

To create two name sections in the output DDL

1. In the DDL Editor, select Tools, Parser Output DDL Generator.

2. In the Country box, select the appropriate country from the drop-down list.

3. In the Number of Name Segments box, select the number of name sections you want to generate.



4. Specify the ORIGINAL_RECORD DDL and the Output DDL file.

Figure 6.3 Parser Output DDL Generator

5. Select Create.

6. Select Yes to redefine the section.

7. Select Yes to see the update.

8. Select File, Exit to close the DDL Editor.

Name Generation

After the Parser processes the input data, it generates name and address records. This process is called name generation. In many cases, one record in the input data contains multiple business or personal names. You must specify how many records to generate when more than one business or personal name is found in the input data.

To define name generation settings

1. Select Advanced and navigate to Output, Name Generation.

2. The right pane of the Customer Data Parser Output Name Generation window contains two tabbed dialog boxes: Field Settings and Entry Settings.



3. Click the Field Settings tab. Refer to the following table and specify the values for these settings.

Setting Description

Generate Business Records for Additional Names
Numeric value between 0-9 that specifies how many business records to generate when more than one business name is present on the input (whether on the same record or on generated name records).

Generate Personal Records for Additional Names
Numeric value between 0-9 that specifies how many personal records to generate when more than one personal name is present on the input (whether on the same record or on generated name records).

Max Original Lines to Generate Names For
Numeric value that indicates the maximum number of original lines for which to generate names. The default is to process all records.

See “Customer Data Parser” in TS Quality Reference Guide for the complete settings information.

4. For example, the settings below instruct the CDP not to generate additional records for personal or business names.

Figure 6.4 CDP Field Settings

Additional Settings

You can also specify the following settings:

To join name lines

You can join the second name line (INPUT_LINE_02) to the first name line (INPUT_LINE_01) for re-parsing purposes. Both lines must have a valid pattern identified for this to work.

1. Click Advanced and navigate to Input, Join Lines.



2. Specify the From Line Index and To Line Index. From Line Index is the number (1 to 10) of the name line to be joined. To Line Index is the number (1 to 10) of the name line to which the line specified in From Line Index will be joined.

3. Specify the From Line Begin Value and To Line End Value. From Line Begin Value is the character string that is to be at the beginning of the joined line. To Line End Value is the character string that is to be at the end of the joined line.

4. Select either Literal or Mask for From Line Begin Value Format and To Line End Value Format. They are the format for the value specified in From Line Begin Value and To Line End Value.

To split address lines

You can split the address line before parsing. The CDP works more efficiently if the two address lines are split into two physical lines, rather than storing two addresses on one line.

1. Click Advanced and navigate to Input, Split Lines.

2. Select either First Occurrence or Last Occurrence for Split Occurrence. First Occurrence splits on the first occurrence of the matching From Line End Value and To Line Begin Value. Last Occurrence splits on the last occurrence of the matching From Line End Value and To Line Begin Value.

3. Specify the From Line Index and To Line Index. From Line Index is the number (1 to 10) of the address line from which to split. To Line Index is the number (1 to 10) of the line where the new line will be inserted.

4. Specify the From Line End Value and To Line Begin Value. From Line End Value is the character string that is to be at the end of the split line. To Line Begin Value is the character string that is to be at the beginning of the split line.

5. Select either Literal or Mask for From Line Begin Value Format and To Line End Value Format. They are the format for the value specified in From Line End Value and To Line Begin Value.

If an address already has ten lines and a line split is performed, the last line is dropped.
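The split behavior described above can be sketched in a few lines. This is an illustration only, not the CDP's implementation. The function name split_address_line is hypothetical, only Literal values are handled (no Mask format), and the sketch assumes that a split point is a place where the From Line End Value is followed by whitespace and then the To Line Begin Value.

```python
import re

MAX_LINES = 10  # the guide states that line indexes run from 1 to 10


def split_address_line(lines, from_index, to_index, end_value, begin_value,
                       occurrence="first"):
    """Split lines[from_index] where end_value is followed by begin_value.

    Indexes are 1-based, as in the Split Lines settings.
    """
    text = lines[from_index - 1]
    # Assumption: a split point is end_value, whitespace, then begin_value.
    pattern = re.compile(re.escape(end_value) + r"\s+(?=" + re.escape(begin_value) + r")")
    matches = list(pattern.finditer(text))
    if not matches:
        return list(lines)  # nothing to split
    match = matches[0] if occurrence == "first" else matches[-1]
    head = text[:match.start() + len(end_value)]  # ends with end_value
    tail = text[match.end():]                     # begins with begin_value
    result = list(lines)
    result[from_index - 1] = head
    result.insert(to_index - 1, tail)
    # An eleventh line does not fit, so the last line is dropped.
    return result[:MAX_LINES]


if __name__ == "__main__":
    address = ["100 MAIN ST PO BOX 12", "BOSTON MA 02101"]
    print(split_address_line(address, 1, 2, "ST", "PO BOX"))
```

Because the sketch matches plain substrings, an end value such as ST would also match the tail of a longer word such as WEST. A real configuration would need more specific values or a mask.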



name, or specify the name of the file to which debugging information will be sent.


1. Click Advanced and navigate to Process, Settings. 2. In the Sample Count text box, specify the number that




1. Click Advanced and navigate to Additional....

2. In Settings File Encoding, select the correct encoding from the drop-down list.




Run the Customer Data Parser and View Results

To run the Customer Data Parser and view results

1. Click OK to close Advanced settings.

2. Click Run to run the CDP.



The Statistics file indicates the number of records read into and out of the CDP and displays name, street and geographic review information.

5. Review the Statistics file. Verify that no additional names were generated by the CDP. Be sure to use My Statistics Viewer and the Spreadsheet Viewer to review the CDP output statistics.

Analyze Results

After the CDP runs, the Parser generates Completion Codes and Review Codes that identify specific conditions that occurred for each parsed record. You can review those codes to analyze the Parser results.

The completion codes are written to the CDP Repository Output Record (PREPOS) in the following field:

pr_completion_code

The review codes are written to the CDP Repository Output Record (PREPOS) in three character pairs in the following fields:

pr_name_review_codes

pr_street_review_codes

pr_geog_review_codes




pr_misc_review_codes

pr_global_review_codes

When a record receives a review code, Review Groups are also written to the following field:

pr_rev_group

For multiple review codes, the review group is determined by a default hierarchy table.

See Appendix B for the complete list of Completion Codes, Review Codes and Review Group for the Customer Data Parser.

Statistics File

The Parsing Statistics Report is generated by the CDP and summarizes the number and percentage of records distributed over each review group. A brief description of each review group also appears on the statistics report.

Figure 6.5 Sample CDP Statistics

To change the review group order, use Review Group Order (Process, Settings) to specify the review group hierarchy.

Review Group  # of Records  %      Description
0             945           94.5%  No Targeted Conditions Found
1             0             0.0%   Unidentified Item
2             22            2.2%   Mixed Name Forms
3             0             0.0%   Hold Mail
4             0             0.0%   Foreign Address
5             0             0.0%   No Names Identified
6             0             0.0%   No Street Identified
7             2             0.2%   No Geography Identified
8             4             0.4%   Unknown Name Pattern
9             8             0.8%   Derived Genders Conflict
10            11            1.1%   More Than One Middle Name
11            1             0.1%   Unknown Street Pattern
12            0             0.0%   Invalid Directional
13            0             0.0%   Unusual or Long Address
14            3             0.3%   No City or County Identified
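The arithmetic behind a report like this is a count and a percentage per review group. The following sketch shows that calculation for a list of pr_rev_group values that have already been read from the CDP output. The function name review_group_summary and the use of plain integers for the group values are assumptions. The CDP produces the actual report itself.

```python
from collections import Counter


def review_group_summary(rev_groups):
    """Return (group, record count, percentage) rows, ordered by group number."""
    total = len(rev_groups)
    if total == 0:
        return []
    counts = Counter(rev_groups)
    return [
        (group, counts[group], round(100.0 * counts[group] / total, 1))
        for group in sorted(counts)
    ]


if __name__ == "__main__":
    # Hypothetical pr_rev_group values for four parsed records.
    for group, count, percent in review_group_summary([0, 0, 0, 2]):
        print(group, count, f"{percent}%")
```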



Using the Business Data Parser

The Business Data Parser (BDP) uses pattern-recognition technology to identify, verify, and standardize non-name and address components of free-form text. The parsing process is driven by business rules that you can customize to meet your specific business requirements.

Use the Business Data Parser to perform several tasks:

Identify words and phrases in free-form text

Produce standardized and identified output in useful formats

Use customized user-defined attributes

Gain flexibility through an externally edited set of tables for business rules

Identify words and phrases by their values or their masks

Correct misspellings and enable word or phrase recodes using external tables

Categorize any unique words and phrases using user-defined conditional text

Identify data for review by numerous methods

Produce standard output, so that applications may easily choose needed data elements

Display results in a log file to use for tuning business rules

Collect run statistics to quickly identify development areas

Produce a log that identifies problems to help refinement of the external word, phrase, and pattern tables

BDP Parsing Process

For parsing of names and addresses, use the Customer Data Parser.

The Business Data Parser parses data and identifies patterns in the following steps:

Step 1 Assign all possible attributes

The BDP identifies each word and phrase and compares them to the business rule table supplied by the Parsing Customization process. When the BDP finds a word or phrase in the table, it assigns the associated specific attribute for that table entry. For example:

            1995   Toyota   Camry
Attribute   YEAR   MAKE     MODEL

If a word or phrase isn’t specified in the table, the BDP assigns it an intrinsic attribute, such as ALPHA or NUMERIC.

Step 2 Pattern lookup and assign final word/phrase attributes

The BDP looks up the entire combination of words, called a pattern, in the pattern list.

If a match to the pattern list exists, then the BDP assigns a final attribute to all words, based on the pattern.

If no match exists, then the BDP writes the pattern details to the log file for further review and tuning.

Step 3 Line type and specific attribute assignment

Each line is then assigned a line type. The default line type of M (Miscellaneous) is assigned to a line unless both of the following conditions are true:

The line matches a pattern in the word and pattern table, and

A line attribute for that pattern is defined. For example:

            1995   Toyota   Camry
M           YEAR   MAKE     MODEL



You can assign up to fifty user-defined attributes, named USER-NN, where NN = a numeric value between 1-50, inclusive.
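Steps 1 through 3 can be sketched as a small lookup pipeline. This is a simplified model under stated assumptions, not the BDP's algorithm. WORD_TABLE and PATTERN_LIST are hypothetical stand-ins for the business rule table and pattern list, the OTHER attribute is a placeholder for tokens that are neither purely alphabetic nor purely numeric, and only single words are handled (no phrases or masks).

```python
# Hypothetical word table: token -> specific attribute (Step 1).
WORD_TABLE = {"TOYOTA": "MAKE", "CAMRY": "MODEL"}

# Hypothetical pattern list: preliminary attributes ->
# (final attributes, line attribute or None) (Steps 2 and 3).
PATTERN_LIST = {
    ("NUMERIC", "MAKE", "MODEL"): (("YEAR", "MAKE", "MODEL"), None),
}


def intrinsic_attribute(token):
    """Fallback attribute for a token that is not in the word table."""
    if token.isdigit():
        return "NUMERIC"
    if token.isalpha():
        return "ALPHA"
    return "OTHER"  # placeholder, not a documented BDP attribute


def parse_line(line, log):
    """Assign attributes and a line type to one line of free-form text."""
    tokens = line.upper().split()
    # Step 1: table lookup, falling back to an intrinsic attribute.
    preliminary = tuple(WORD_TABLE.get(t) or intrinsic_attribute(t) for t in tokens)
    # Step 2: look up the whole pattern; log it when unknown.
    match = PATTERN_LIST.get(preliminary)
    if match is None:
        log.append((line, preliminary))
        return {"attributes": preliminary, "line_type": "M", "matched": False}
    final_attributes, line_attribute = match
    # Step 3: default line type M unless the pattern defines a line attribute.
    return {"attributes": final_attributes,
            "line_type": line_attribute or "M",
            "matched": True}


if __name__ == "__main__":
    log = []
    print(parse_line("1995 Toyota Camry", log))
    print(parse_line("Blue 1995", log))
    print(log)
```

The unmatched line ends up in the log list, which mirrors how the BDP writes unknown patterns to the log file for later tuning.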

Step 4 Generate Output

The BDP produces a comprehensive data block called the BPREPOS (Business Data Parser Repository). The BPREPOS consists of fixed-fielded character data including error codes and identification indicators. The Output DDL determines which of these fields are returned to the Output file, and can be customized by the user.

See Appendix B of TS Quality Reference Guide for a complete list of BPREPOS Fields and descriptions.

Example (BPREPOS Fields)

Figure 6.6 Sample BPREPOS Fields



Input and Output Settings


1. Open the Business Data Parser step and select the Input Settings tab.


3. Navigate to the Output Settings tab.

4. Specify a file name in the Output File Name and Output DDL Name text boxes.

5. Enter a file name in the Statistics File Name and Process




To specify the parser field

You must specify the input DDL field that contains the data to be parsed.

1. Click Advanced and navigate to Input, Settings.

2. Select Parse Field from the drop-down list.

To check Repository DDL File

1. Click Advanced and navigate to Additional....

2. In Repository DDL File, make sure the repository DDL file is specified. This DDL contains the layout of the BPREPOS fields.






The country-specific BPREPOS DDL is provided with the program.




















You can specify records to either Select or Bypass under certain conditions in both input and output files. See “Select or Bypass Records” on page 5-37 for instructions on how to specify select or bypass definitions.

Process Settings

Once you have specified input and output files, you can specify settings used to process your data. The settings for processing are managed in the Advanced Settings window.

The navigation pane of the Advanced Settings window contains two tabs:

Parser

Prcustom

Settings for the Business Data Parser are shown on the Parser tab. The Prcustom tab contains settings for the Parser Customization process. The Parser Customization process is explained in the next chapter.

Parser Tables

The Business Data Parser uses the Word Pattern Definition table file to parse the non-name and address elements of the data. The Word Pattern Definition table for the Business Data Parser is created from the Parsing Customization process.

For instructions on the Parsing Customization process, see Chapter 7, “Tuning the Parsing Rules” and Appendix B.

Word Pattern Definition File—Defines word patterns for a given country. It contains definitions for words and phrases (tokens), and the patterns associated with each line type.

These tables use a two-letter prefix to indicate the country: US = United States, CA = Canada, GB = United Kingdom, DE = Germany, and so forth.

To specify the Parser table

1. Click Advanced and navigate to Process, Settings.

2. Specify the Word Pattern Definition File.

Default Word Pattern Definition File (US): \TrilliumSoftware\tsq10r5s\<project>\tables\USBDPRUL.win

Example

For example, you can create a Word Pattern Definitions table for automobile classification. At least one definition and one pattern entry must be present in the Word Pattern Definitions table.

Entry from Word Pattern Definitions Table

Figure 6.7 Sample BDP Word Pattern Definition Table

'ACURA' INSERT MISC DEF ATT=MAKE
'ALFA' INSERT MISC DEF ATT=MAKE,RECODE='ALFA ROMEO'
'ALFA ROMEO' INSERT MISC DEF ATT=MAKE
'AMC' INSERT MISC DEF ATT=MAKE
'AUDI' INSERT MISC DEF ATT=MAKE
'BERTONE' INSERT MISC DEF ATT=MAKE
'BMW' INSERT MISC DEF ATT=MAKE
'BUICK' INSERT MISC DEF ATT=MAKE
'CADDY' INSERT SYNONYM='CADILLAC'
'CADI' INSERT SYNONYM='CADILLAC'
'CADILLAC' INSERT MISC DEF ATT=MAKE,RECODE='CADILLAC'
'CADY' INSERT SYNONYM='CADILLAC'
'CHEVROLET' INSERT MISC DEF ATT=MAKE
'CHEVY' INSERT MISC DEF ATT=MAKE,RECODE='CHEVROLET'




To retain original values

You can specify whether you want to retain the original input data in the Parser output field. The field must be defined as ORIGINAL by the output DDL. If this setting is not checked, the Parser formats the data as uppercase and removes erroneous punctuation.

1. Click Advanced and navigate to Process, Settings.

2. Select Retain Original Value.

To include Unknowns in Standard Original Field

This setting controls whether unknown or undefined tokens are populated into label lines. When checked, label lines are populated with the complete input lines, including unknown or undefined words/tokens. These tokens are standardized and appear in the same left-to-right order as in the input line.

1. Click Advanced and navigate to Process, Settings.

2. Select Include Unknowns in Std Original Field.

To populate unknown patterns

When checked, this setting ensures that the Parser populates user fields with known attributes, even in the event of a pattern failure.

1. Click Advanced and navigate to Process, Settings.

2. Select Populate Unknown Patterns.


1. Click Advanced and navigate to Process, Settings.

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default path and file name, or enter a file name where debugging information will be written.

See “Business Data Parser” in TS Quality Reference Guide for the complete settings information.






1. Click Advanced and navigate to Additional....

2. In Settings File Encoding, select the correct encoding from the drop-down list.




Run the Business Data Parser and View Results

To run the Business Data Parser and view results

1. Click OK to close the Advanced settings.

2. Click Run to run the BDP.



The Statistics file indicates the number of records read into and out of the BDP. It also displays the number of records that contain blank data, other lines, and unknown lines.

5. Review the Statistics file. Be sure to use My Statistics Viewer and the Spreadsheet Viewer to review the output statistics of the BDP.

After you run the BDP, the Parser generates Completion Codes and Review Codes to identify specific conditions which occurred for each record being parsed. You can review those codes to analyze the parser results.

The completion codes are written to the BDP Repository Output Record (BPREPOS) in the following field:

bp_completion_code

The review codes are written to the BDP Repository Output Record (BPREPOS) in three character pairs in the following fields:

bp_misc_review_codes

See Appendix B for the complete list of Completion Codes and Review Codes for the Business Data Parser.

There are no Review Groups for the Business Data Parser.




CHAPTER 7 Tuning the Parsing Rules


If the Customer Data Parser cannot recognize a name or address component on a record, such as a city name or surname, an exception is reported. When that occurs, you must change the parsing rules using Parsing Customization. To use Parsing Customization, you must first understand how the parser definition tables work.

This chapter explains the parser definition tables. You will also perform these tasks:

View parser exceptions

Identify and create an entry for a misspelled city name

Identify and create an entry for a bad name pattern

Review the new entries in the Customized Definitions file

Run Parsing Customization and re-run the Customer Data Parser

Check errors in the Parsing Customization process

This chapter focuses on the Parsing Customization process for the Customer Data Parser. See Online Help to tune the parsing rules for the Business Data Parser.



Understanding the Parser Definitions Tables

Standard and User Definitions Tables

The Parser Definitions tables contain both definitions and word/phrase pattern information. These files are used by the Parser to identify the components of the input data.

Standard Definitions Table

Standard Definitions tables include all standard definitions for titles, first names, business names, street components (type and direction) as well as patterns and masks for other name and address components. They are supplied with the program.

Default Standard Definitions Table: \TrilliumSoftware\tsq10r5s\tables\parser_rules\xxCDPRUL.win (xx = two-letter country code)

Standard Definitions tables are identified by a two letter prefix to indicate the country. (Example: US = United States, CA = Canada, GB = Great Britain and DE = Germany)

Customized Definitions Table

Customized Definitions tables contain user-created definitions.

Default Customized Definitions Table: \TrilliumSoftware\tsq10r5s\<project>\tables\
xxUSERCDP.win (CDP, xx = two-letter country code)
xxBDPRUL.win (BDP, xx = two-letter country code)

For the Business Data Parser, the default standard definitions table is empty. You must create a table for the Business Data Parser to run.



Syntax of Definitions

Entries in Standard and User Definitions tables require a special syntax. This section describes the syntax for definition entries.

Syntax

TOKEN [OPERATION] LINE-TYPE [POSITION] KEYWORD=ATTRIBUTE, [ATTRIBUTE MODIFIER]

An entry is composed of Token, Operations, Line-type, Position, Attributes and Attribute Modifiers. Brackets [] indicate the enclosed item is optional. The brackets [] are NOT typed on an actual entry line.

Example

‘MARY’ INS NAME BEG ATT=GVN-NM1,GEN=F

‘MARY’   INS         NAME        BEG        ATT=GVN-NM1         GEN=F
Token    Operation   Line Type   Position   Keyword=Attribute   Attribute Modifier

Token entries can be no more than 100 characters in length.

Token

A token is any word or phrase in the data, or a mask of any word or phrase. Tokens are informally called “the left side of the equation” in a definition table entry. In this example, the token is ‘MARY’.

‘MARY’ INS NAME BEG ATT=GVN-NM1,GEN=F

Tokens cannot wrap to a second line. This also affects word and phrase definitions, masks, and pattern entries.

The Parser identifies four different types of token structures:

Token

Sub-token

Phrase

Mask

The table below describes the token structures and provides examples.

Table 7.1 Parser Token Structures

Token

The smallest entity that has a meaning by itself. A token may or may not contain one or more sub-tokens.

Example: 'PIZZA' NAME ATT=BUS

Sub-token

String entity that has a meaning within a token (e.g. strasse). A sub-token may appear at the beginning or end of the token.

If your data contained BERGENSTRASSE:

Example: ‘STRASSE’ STREET END-TKN ATT=STR-TYPE

Where:
STREET is the line type
END-TKN is the location of the sub-token within the word (also indicates this is a sub-token)
ATT=STR-TYPE is the attribute assignment for table lookup

Beginning-Token (BEG-TKN)
Used for the sub-token position. This keyword indicates that the sub-token position lies at the beginning of a token.
Example: ‘STRASSE’ STREET BEG-TKN ATT=STR-TYPE

Ending-Token (END-TKN)
Used for the sub-token position. This keyword indicates that the sub-token position lies at the end of a token.
Example: ‘STRASSE’ STREET END-TKN ATT=STR-TYPE

BEG-TKN and END-TKN are only allowed on street lines. See line types in the following section for more information.

Phrase

One or more tokens grouped together that have a meaning.
Example: 'HOLD MAIL' STREET ATT=HOLD



Mask

A mask is a description of a word or phrase, using alpha, numeric or special characters to represent letters, numbers, and special characters. Masks define characters of data elements using:

“n” to represent a number (0-9)
“a-z” to represent alphabetic letters (lowercase only)
Every character that is not a letter or number is represented by the character itself: / (forward slash), @ (at symbol), and so forth.

For example, a mask can define any series of five numerals as a ZIP code, instead of entering each of the 100,000 possible combinations in the table. This mask entry looks like:

‘nnnnn’ MASK GEOG DEF ATT=POSTCODE

Masks may include special characters if they are part of the word representation. For example, a mask for a nine-digit ZIP code is:

‘nnnnn-nnnn’ MASK GEOG DEF ATT=POSTCODE

Appendix D in the TS Quality Reference Guide lists the valid token tags for Asia-Pacific countries.

Operations

The Parser identifies three types of operations:

Insert

Modify

Delete

In the example entry ‘MARY’ INS NAME BEG ATT=GVN-NM1,GEN=F, the operation is INS (INSERT).



Underlined letters indicate allowable abbreviations.

Table 7.2 Parser Operations

INSERT

This operation inserts an entry in a table.

‘MARY’ INSERT NAME BEG ATT=GVN-NM1,GEN=F

If omitted, INSERT is assumed by default.

MODIFY

This operation replaces an existing entry in a table. The original entry is deleted and the modified entry is inserted.

Example: ‘MARY’ MODIFY NAME BEG ATT=GVN-NM1,GEN=F

Modify is used to change definitions in the standard definition table by creating the entry in the user definitions file. The Parsing Customization process will combine entries from the two tables into one output (to be used by the Parser).

DELETE

This operation deletes an entry from a table.

Deleting Definitions:
Example: ‘MARY’ DELETE

Deleting Synonyms: With the SYNONYM keyword, you must enter the actual synonym:
Example: ‘BV’ DELETE SYNONYM=BOULEVARD

Deleting Patterns: You must enter the actual pattern followed by DELETE PATTERN.
Example: GVN-NM1 1ALPHA ALPHA DELETE PATTERN
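The way the three operations combine a standard table with user entries can be sketched as follows. This is a simplified model, not the Parsing Customization process itself. The function apply_user_entries and the dictionary layout are hypothetical, entries are keyed by token only (line type, synonyms and patterns are ignored), and the handling of an INSERT for a token that already exists is an assumption.

```python
def apply_user_entries(standard, user_entries):
    """Combine a standard table with user entries into one output table.

    standard: dict mapping token -> definition text.
    user_entries: iterable of (token, operation, definition); operation may be
    None, in which case INSERT is assumed, as the guide describes.
    """
    merged = dict(standard)
    for token, operation, definition in user_entries:
        op = (operation or "INSERT").upper()
        if op in ("INSERT", "INS"):
            # Assumption: an INSERT does not overwrite an existing entry.
            merged.setdefault(token, definition)
        elif op == "MODIFY":
            # The original entry is deleted and the modified entry inserted.
            merged.pop(token, None)
            merged[token] = definition
        elif op == "DELETE":
            merged.pop(token, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return merged


if __name__ == "__main__":
    standard = {"MARY": "NAME BEG ATT=GVN-NM1,GEN=F", "ROAD": "STREET END ATT=STR-TYPE,REC=RD"}
    user = [
        ("MARY", "MODIFY", "NAME BEG ATT=GVN-NM1,GEN=N"),
        ("ROAD", "DELETE", None),
        ("PIZZA", None, "NAME ATT=BUS"),
    ]
    print(apply_user_entries(standard, user))
```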



Line Types

Each definition entry requires a line type assignment. The Parser identifies four types of lines:

Name

Street

Geography

Miscellaneous

In the example entry ‘MARY’ INS NAME BEG ATT=GVN-NM1,GEN=F, the line type is NAME.



Note that attributes do not cross line types. For instance, an attribute of GVN-NM1 cannot be used with a line type of STREET.

Line Type Description

NAME Name of a person or business. Names are usually the first one or two lines in an address record.

‘BOOKSTORE’ NAME DEF ATT=BUS,CAT=S5942

STREET All descriptions of streets and numeric addressing, including box numbers, rural routes, and apartment numbers. A street line is usually in the middle of a record, and may be one or more lines.

‘LANE’ STREET END ATT=STR-TYPE,REC=LN

GEOGRAPHY The city, state, postal code, and country in the address. Geography line(s) are usually at the end of an address record.

‘MASSACHUSETTS’ GEOG END ATT=STATE,REC=MA

MISCELLANEOUS Information that does not fit into the other line types, such as account name or a comment.

‘HOLD MAIL’ INSERT MISC DEF ATT=HOLD



Positions

A token may be defined in relation to its position within the name or address line. There are three types of positions:

Beginning

Ending

Default

In the example entry ‘MARY’ INS NAME BEG ATT=GVN-NM1,GEN=F, the position is BEG (BEGINNING).



BEGINNING

This includes the first word in a line, any word that follows a title, or any words that appear before a first name, including the first name. For example, consider the line:

MR JOSEPH SMITH

Every word except ‘Smith’ is considered to be at the beginning of the line.

DEFAULT (optional)

When the physical location of the word in the line is irrelevant, “Default” is used.

A default word may appear anywhere on the line, including the beginning or end. If this keyword is omitted from the entry, Default is assumed.

ENDING

The last word and any further non-alphabetic characters are the ending of a line. For example, consider the line

BRIARWOOD ESTATES APT 3

Both “APT” and the apartment number “3” are considered to be at the end of the line.



Attributes

Attributes (ATT=) are line-specific definitions and assign a specific meaning to a word or mask shape. The following table lists available attributes organized by line type.

Attribute Description

Name Line Attributes
Attributes used in NAME lines of patterns.

Street Line Attributes
Attributes used in STREET lines of patterns.

Geography Line Attributes
Attributes used in GEOGRAPHY lines of patterns.

Miscellaneous Line Attributes
Attributes used in MISCELLANEOUS lines of patterns.

Note that attributes do not cross line types. For instance, an attribute of GVN-NM1 cannot be used with a line type of STREET.

For the complete list of Attributes, see Appendix D in the TS Quality Reference Guide.

User-Defined Attributes

If a particular word or phrase does not meet any of the pre-defined attributes, you may assign it a user-defined attribute. For example:

‘n-nn-nan’ MASK NAME DEF ATT=USER1

Once a user-defined attribute is assigned in the User Definitions table, the corresponding field name must be included in the CDP output DDL. For instance, if a USER1 attribute is assigned a value in the User Definitions table, the field name PR_USER_FIELD_01 must be added to the CDP output DDL.

Attribute Modifiers

Attributes can be further described by various Attribute Modifiers. The following section lists all definition modifiers that can be used after the attribute assignment. All modifiers must be separated from the attribute by a comma. Valid attribute modifiers are Gender, Category, Function and Recode.



Gender The Gender (GEN=) keyword assigns a gender to a name component. It applies only to definitions for name lines and is required if the attribute used is GVN-NM1, 2, 3, or 4.

Valid gender codes:

M = Male

F = Female

N = Neuter (gender unknown)

‘MARY’ NAME BEG ATT=GVN-NM1,GEN=F

Category The Category (CAT=) keyword is a user-defined, free-form means of categorizing data elements. Categories should be limited to six characters (based on assigning multiple categories throughout a record) with a maximum of 50 bytes per record for all categories.

A category can be any value that may prove useful as a group during parsing of name and address components. For example, assigning SIC codes to company names allows the distribution of customer business verticals to be analyzed after the parsing process is complete.

‘BOY SCOUTS’ NAME DEF ATT=BUS,CAT=’S8641’

Function The Function (FUNC=) keyword is used when special functions should be performed on the entry. This keyword specifies a certain subroutine, and the functions of that subroutine act on the entry.

There are Special Functions used with the FUNCTION keyword. See Appendix D in the TS Quality Reference Guide.

‘BOY SCOUTS’ NAME DEF ATT=BUS,FUNC=’BES01’
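The modifier rules in this section (comma-separated KEY=value pairs, GEN= required for given-name attributes, CAT= limited to six characters) can be sketched as a small check. This is only an illustration, not the Parser's own logic: the function names are hypothetical, the spellings GVN-NM2 to GVN-NM4 are assumed from "GVN-NM1, 2, 3, or 4", and straight quotes are used in place of the typeset quotes shown in the guide.

```python
# Assumed attribute spellings, inferred from "GVN-NM1, 2, 3, or 4".
GENDER_REQUIRED = {"GVN-NM1", "GVN-NM2", "GVN-NM3", "GVN-NM4"}
VALID_GENDERS = {"M", "F", "N"}


def parse_modifiers(text):
    """Split 'ATT=...,GEN=...' into a dict, stripping optional quotes.

    Splitting on every comma is a simplification: a quoted value that
    itself contains a comma is not handled by this sketch.
    """
    result = {}
    for part in text.split(","):
        key, _, value = part.strip().partition("=")
        result[key.upper()] = value.strip().strip("'")
    return result


def check_modifiers(mods):
    """Return a list of problems found in one parsed attribute assignment."""
    problems = []
    gender = mods.get("GEN")
    if mods.get("ATT") in GENDER_REQUIRED and gender is None:
        problems.append("GEN= is required for given-name attributes")
    if gender is not None and gender not in VALID_GENDERS:
        problems.append("GEN= must be M, F, or N")
    category = mods.get("CAT")
    if category is not None and len(category) > 6:
        problems.append("CAT= should be limited to six characters")
    return problems
```

For example, `check_modifiers(parse_modifiers("ATT=GVN-NM1,GEN=F"))` reports no problems, while the same attribute without GEN= is flagged.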



Recode The Recode (REC=) keyword is used to recode the value. The value assigned after REC= is the value the Parser will assign to the recode output field when the defined word is encountered on input.

‘ROAD’ STREET END ATT=STR-TYPE,REC=RD

In the above example, the Parser recodes the word ‘ROAD’ to ‘RD’.

“ROAD” would be the value stored in the original data field on Parser output. This is the pr_street_type1_original field in the Parser repository.

“RD” would be the value stored in the recoded data field on Parser output. This is the pr_street_type1_recoded field in the Parser repository.

Recode for Masks

Masks may be used to introduce and/or exclude literals and special characters in their recodes. For example, a mask for a telephone number is entered in this manner:

‘nnn nnn-nnnn’ MASK MISC DEF ATT=IGN,REC=’(nnn)nnn-nnnn’

This entry would recode the entry ‘978 663-9955’ to ‘(978) 663-9955’.

Synonym

A synonym is a shortcut for defining a token entry with the same value as a prior entry. For example:

‘PBOX’ SYNONYM=‘PO BOX’

This entry identifies ‘PBOX’ as a synonym of ‘PO BOX’.

Synonyms are used to correct common spelling errors. The two fields in the Parser affected by synonym entries in the definitions table are called the “original” and “recoded” output fields.
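Returning to the mask recode described under Recode for Masks, the following is a minimal sketch of how an inbound mask and a recode mask could be applied. It assumes that 'n' stands for one digit and 'a' for one letter (inferred from the masks shown in this guide) and that every other mask character is a literal; the function name is hypothetical and this is not the Parser's implementation. With the recode mask exactly as printed, '(nnn)nnn-nnnn', the result has no space after the closing parenthesis; a mask of '(nnn) nnn-nnnn' would be needed to produce one.

```python
def apply_mask_recode(value, in_mask, out_mask):
    """Recode value from the in_mask layout to the out_mask layout.

    Returns None when the value does not fit the inbound mask.
    """
    if len(value) != len(in_mask):
        return None
    captured = []
    for ch, m in zip(value, in_mask):
        if m == "n":
            if not ch.isdigit():
                return None
            captured.append(ch)
        elif m == "a":
            if not ch.isalpha():
                return None
            captured.append(ch)
        elif ch != m:
            return None  # a literal in the mask must match exactly
    # Fill the outbound placeholders, in order, with the captured characters.
    # A placeholder with no captured character left is emitted unchanged.
    chars = iter(captured)
    return "".join(next(chars, m) if m in ("n", "a") else m for m in out_mask)
```

Calling `apply_mask_recode("978 663-9955", "nnn nnn-nnnn", "(nnn)nnn-nnnn")` yields `(978)663-9955`.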



It is important to understand the behavior of synonym entries in conjunction with the recode entry of the resulting definition in Parser output. See the example below.

Example

The definitions table contains the following entry:

'CENTRE COMMERCIAL' STREET DEF ATTRIBUTE=TYPE, REC=’CCAL’

The Parser knows this entry is a ‘TYPE’, with a recode value of “CCAL”. The Parser puts “CENTRE COMMERCIAL” in the original output field pr_street_type1_original and puts the recoded value “CCAL” in the recoded output field pr_street_type1_recoded.

Now a synonym entry is added to use the original definition entry:

'CENTRE COMMERC' SYNONYM=’CENTRE COMMERCIAL’

The Parser knows that this is a synonym for “CENTRE COMMERCIAL.” It places “CENTRE COMMERCIAL” (NOT “CENTRE COMMERC”) in the original data output field, and places the recoded value of “CCAL” in the recoded data output field. This ensures that you have the correct spelling in the entry.

Manage this behavior through the Retain Original Data settings (click Advanced, Process, Settings in the Customer Data Parser step). If this setting contains a value of 1, the original data output field would contain the original value (not the synonym value as shown above). See page 6-32 for information on Retain Original Data settings.
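The interaction between synonyms, recodes, and the Retain Original Data setting can be sketched as follows. The dictionaries and the `resolve` function are hypothetical stand-ins for the definitions table, and only the setting value 1 described above is modeled; any other value is treated as "do not retain", which is an assumption.

```python
# Hypothetical in-memory view of the two entries from the example.
DEFINITIONS = {"CENTRE COMMERCIAL": {"attribute": "TYPE", "recode": "CCAL"}}
SYNONYMS = {"CENTRE COMMERC": "CENTRE COMMERCIAL"}


def resolve(token, retain_original=0):
    """Return (original_field, recoded_field) for one street-type token."""
    target = SYNONYMS.get(token, token)  # follow the synonym, if any
    definition = DEFINITIONS.get(target)
    if definition is None:
        return None  # no definition entry for this token
    # With Retain Original Data = 1 the input spelling is kept; otherwise
    # the spelling of the entry the synonym points to is written out.
    original = token if retain_original == 1 else target
    return original, definition["recode"]
```

Here `resolve("CENTRE COMMERC")` gives `("CENTRE COMMERCIAL", "CCAL")`, and `resolve("CENTRE COMMERC", retain_original=1)` gives `("CENTRE COMMERC", "CCAL")`.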



Special Entries

In addition to the basic syntax described in the previous sections, the Parser uses some special entries. This section explains special entries including:

US city name changes

Non-US city name changes

Multiple definitions for one entry

Patterns

US City Name Changes

City name change entries are entered with an underscore (_) as the last character of the entry. This notifies the Parser that this is a city-change, and tells the program to look up the recoded entry in the City Directory Table.

This directory is used for city verification and correction, and is based on a primary geography, secondary geography lookup (such as state or city).

Example

‘MABEVERLEY_’ GEOG DEF ATT=CITY-CHG, REC=’MABEVERLY’

‘CASAN FRAN_’ GEOG DEF ATT=CITY-CHG, REC=’CASAN FRANCISCO’

Non-US City Name Changes

Countries other than the US use another level of city-name changes. It allows for additional city verification and correction based on a complex City Directory Table. An underscore is required as the last character of the entry.

Level Description

Post Town In this example, the program looks up Cheltenham as a valid post town in Gloucestershire county. Note that the recode contains only the corrected spelling of the post town.

‘GLOUCESTERSHIRE CHELTENHAN_’ GEOG DEF ATT=CITY-CHG, REC=’CHELTENHAM’

Locality In this example, the program looks up Gotherington as a valid locality in the post town of Cheltenham. Note that the recode contains only the corrected spelling of the locality.

‘CHELTENHAM GOTHERINGTEN_’ GEOG DEF ATT=CITY-CHG, REC=’GOTHERINGTON’

Multiple definitions for one entry

Occasionally, an entry may contain multiple meanings. This is often the case when a word has a meaning for more than one line type. The first definition is entered in the standard way previously described. Subsequent definitions must be INDENTED under the initial operational value.

‘CENTER’ NAME DEF ATT=BUS
         STREET END ATT=SEC-TYPE,REC=CTR
         GEOG DEF REC=CENTER

Note that for Geography definitions, tokens are allowed without an attribute.

Patterns

A pattern consists of attributes and/or intrinsic attributes, which include any alpha, numeric, or special character representation of a data element.

Changes can be made to an existing pattern by adding another tag to the first line, using the MODIFY operation. See “MODIFY” on page 7-7 for details.

Token identification is converted into meaningful information through pattern processing. Patterns are created in the same text file as the Definition entries. The Parser understands the difference between a definition and a pattern and processes each appropriately. Because of this, it is not necessary to create the various entries in any particular order. For organizational purposes, however, it makes sense to organize the entries by type.

Pattern Structure

The pattern structure uses one or two lines, using the following structure.

'ALPHA ALPHA' MODIFY PATTERN NAME REC=’GVN-NM1(1) SRNM(1)’

FIRST LINE:

Inbound combination of tokens

This is the combination of attributes the Parser program will attempt to find in the table. If an exact match for the attribute combination is found, the program changes the attribute values on output to match the values defined in the RECODE portion of the pattern (see following information on RECODE). In this example, two words containing letters only are present, such as two names. The actual data entry could be ‘John Smith’ and the required association to the pattern would be:

John  Smith
ALPHA ALPHA

Here, both words are identified as ALPHA attributes. See the section Intrinsic Attributes for more information. This portion of the entry must be enclosed in single quotes: ‘ALPHA ALPHA’

Keyword indicating this is a pattern

‘ALPHA ALPHA’ PATTERN NAME

Keyword indicating to which line type this pattern entry applies

Valid line type keywords: NAME, STREET, MISC

‘ALPHA ALPHA’ PATTERN NAME

SECOND LINE: (Optional: both sets of elements can be on one line.)

The recode keyword followed by an ‘=’ symbol

The attribute values that follow this keyword redefine the tokens from their inbound values.

REC=’GVN-NM1(1) SRNM(1)’

The outbound pattern recode values

This is the combination of attribute values the Parser will use on output for the data provided on this line. The values that follow the recode keyword must be enclosed in single quotes. Name lines require the name number following each attribute name. Please see the section on Constructing Name Patterns for details.

REC=’GVN-NM1(1) SRNM(1)’

Intrinsic Attributes

An intrinsic attribute is one that represents an individual entity that did not have a definition entry in the table. This table lists the main intrinsic attributes used for patterns.

INTRINSIC ATTRIBUTE   ABBR.   DESCRIPTION
ALPHA                 ––      Letters only
HYPHEN                ––      A hyphen (–)

For the complete list of Intrinsic Attributes, see Appendix D in the TS Quality Reference Guide.

Only the inbound portion of the pattern entry may contain intrinsic attributes. All outbound portions (recode line) must contain only non-intrinsic attribute values.
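As a rough illustration of how an inbound attribute combination such as 'ALPHA ALPHA' arises, the sketch below classifies tokens using only the three intrinsic attributes this chapter names (ALPHA, HYPHEN, NUMERIC). The function names, the `defined` lookup, and the "UNKNOWN" placeholder are hypothetical; the full list of intrinsic attributes is in the Reference Guide and is not modeled here.

```python
def intrinsic_attribute(token):
    """Classify a token that has no definition entry (three attributes only)."""
    if token == "-":
        return "HYPHEN"
    if token.isalpha():
        return "ALPHA"
    if token.isdigit():
        return "NUMERIC"
    return None  # other intrinsic attributes are not covered by this sketch


def inbound_pattern(line, defined):
    """Build the inbound attribute string for one line of data.

    `defined` maps known words to their attributes (a stand-in for the
    definitions table); any other token falls back to an intrinsic
    attribute, or to the placeholder "UNKNOWN".
    """
    attrs = []
    for token in line.upper().split():
        attrs.append(defined.get(token) or intrinsic_attribute(token) or "UNKNOWN")
    return " ".join(attrs)
```

With an empty definitions lookup, `inbound_pattern("John Smith", {})` produces `ALPHA ALPHA`, which is the inbound side of the example pattern.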



INTRINSIC ATTRIBUTE   ABBR.   DESCRIPTION (continued)
NUMERIC               ––      Numerals only

Controlling meanings when a sub-token is present

Assume your data contains BERGENSTRASSE 12. A definition entry might exist in this format:

‘STRASSE’ STREET ENDING-TOKEN ATT=STR-TYPE-S

The following pattern is required in order to separate the subtoken from the word:

‘ALPHA STR-TYPE-S NUMERIC’ PATTERN STREET REC=’STR-NM STR-TYPE HSNO’

Or, the following pattern is required in order to keep the subtoken attached:

‘ALPHA NUMERIC’ PATTERN STREET REC=’STR-NM HSNO’

Assigning a Line Type Through a Pattern

Line Type “A”

Apartment or house name lines can be set to line type ‘A’ by the Parser to represent an apartment line. This allows separate storage of street components in the Parser output, such as street name, house number, and apartment or house name information. Do this simply by adding “ATT=APT” on any street pattern definition, as in:

'ALPHA COMPLEX-TYPE ALPHA-1NUMERIC' PATTERN STREET ATT=APT REC='COMPLEX-NAME COMPLEX-TYPE APT-NUM'

If the above pattern had been entered as just a street pattern (without using the APT attribute) then the following would have occurred:

Original data:
HAWTHORNE COTTAGE B1F
10 MAIN STREET

Original street only pattern:
(Z) HAWTHORNE COTTAGE B1F
(S) 10 MAIN STREET

New pattern:
(A) HAWTHORNE COTTAGE B1F
(S) 10 MAIN STREET

Where the ‘Z’ line sets all data to IGNORE attributes and no individual storage of the tokens occurs, the ‘A’ line identifies the tokens properly and parses them into the appropriate Parser output fields:

pr_dwelling1_number

pr_complex1_name_recoded

pr_complex1_type_recoded

Constructing Patterns for Name Lines

Unlike street patterns that simply convert an inbound attribute combination to another version on output, name patterns perform an additional function. Name lines often contain multiple individual names on the same line. In some cases, only one last name may have been given along with three first names, and it is implied that the last name should be associated with all three first names.

One of the powerful features of the parsing engine uses parsing customization pattern structures to understand these relationships.

Assume you have this record:

JOHN SMITH & MARY & ROBERT

There are three individuals given but only one last name. In order to ensure that each first name receives a last name on output, a pattern can be constructed to perform this association:

‘GVN-NM1 ALPHA CTR GVN-NM1 CTR GVN-NM1’ PATTERN NAME
    REC=’GVN-NM1(1) LAST(123) CTR(2) GVN-NM1(2) CTR(3) GVN-NM1(3)’

The numbers in the parentheses following each attribute value in the recode line indicate the physical name to which that particular token value is associated. For the last name attribute, the values in parentheses indicate that this token is associated with all three individuals.
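The name-number association in the recode line can be illustrated with a short sketch. It assumes that each digit inside the parentheses is one name number (so LAST(123) applies to names 1, 2, and 3) and that CTR marks a connector such as '&' that belongs to no person; both are readings of the example, not documented parser behavior. The function name is hypothetical.

```python
import re


def associate_names(tokens, recode):
    """Group tokens by person using the name numbers in a recode string.

    Each recode item looks like ATTRIBUTE(digits); every digit is treated
    as one name number.
    """
    items = re.findall(r"([A-Z0-9-]+)\((\d+)\)", recode)
    if len(items) != len(tokens):
        raise ValueError("recode must describe every inbound token")
    people = {}
    for token, (attribute, numbers) in zip(tokens, items):
        if attribute == "CTR":
            continue  # assumed: connectors are not part of any name
        for digit in numbers:
            people.setdefault(int(digit), []).append((attribute, token))
    return people
```

Applied to `"JOHN SMITH & MARY & ROBERT".split()` with the recode string above, each of the three given names ends up grouped with the last name SMITH.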



Conventions in Parsing Customization

This section lists elements the user needs to be aware of in order to ensure that the Parsing Customization process functions properly.

Comment Lines

Comment lines are specified in entries in two different ways:

using an asterisk (*) in column 1

‘AARON’ NAME BEG ATT=GVN-NM1,GEN=M
* Gender is required with a GVN-NM1 attribute.

using a double forward slash (//) on the same line as the entry.

‘AARON’ NAME BEG ATT=GVN-NM1,GEN=M // Gender is required with a GVN-NM1 attribute.

Everything following the ‘//’ will be ignored.

There must be a space after the double forward slash for the comment to be valid.

Comments may only contain alpha-numeric characters.

Line Lengths

Table entries longer than one line may span multiple lines. Each additional line within each entry must be indented. Each new entry must begin in column 1.

The maximum line length for entries is 189 characters, including the newline character. The entry definition length may not exceed 100 characters. A component of an entry may not span more than one line.

Quotation Marks

Entries enclosed by single quotation marks are processed as one entity. If you wish to include a single quotation mark within a SYMBOL or VALUE, use double quotation marks.

Double quotation marks (“) specified within a SYMBOL or within a VALUE are converted to single quotation marks (‘) by the system. For example:

‘O”BRIEN’ NAME END ATT=SRNM

If a recode string contains more than one word, the entire string must be entered in single quotes.

‘AS TRUSTEE FOR’ SYNONYM=’TRUSTEE FOR’

‘MEBAR HARBER_’ GEOG DEF ATT=CITY-CHG, REC=’MEBAR HARBOR’
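The conventions in this section (comment lines, indented continuation lines, the 189-character limit, and double-quote conversion) lend themselves to a quick pre-check of a definitions file before it is handed to the Parser. The sketch below is an assumption-laden illustration, not a Trillium tool: all function names are hypothetical, straight quotes stand in for the typeset quotes in the guide, and a '// ' sequence inside a quoted value is not handled.

```python
MAX_LINE = 189  # includes the newline character, per the Line Lengths rule


def strip_inline_comment(line):
    """Drop a '// comment' tail; the slashes must be followed by a space."""
    index = line.find("// ")
    return line if index == -1 else line[:index].rstrip()


def read_entries(text):
    """Return (entries, problems) for the text of a definitions file.

    Lines starting with '*' are comments, a line starting in column 1
    opens a new entry, and an indented line continues the current one.
    """
    entries, problems = [], []
    for number, raw in enumerate(text.splitlines(), start=1):
        if len(raw) + 1 > MAX_LINE:  # +1 accounts for the newline
            problems.append(f"line {number}: longer than {MAX_LINE} characters")
        if not raw.strip() or raw.startswith("*"):
            continue
        line = strip_inline_comment(raw)
        if raw[0] in " \t":
            if entries:
                entries[-1].append(line.strip())
            else:
                problems.append(f"line {number}: continuation without an entry")
        else:
            entries.append([line])
    return entries, problems


def unquote(symbol):
    """Remove the enclosing single quotes and turn '"' into "'"."""
    return symbol.strip("'").replace('"', "'")
```

Each entry comes back as a list whose first item is the column-1 line and whose remaining items are its indented continuation lines, which mirrors how multiple definitions for one entry are written.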



How to Customize the Parser Definition Tables for Japan

For Japan, special Parser Definition tables are used by the Parser in addition to the built-in personal and business name dictionaries. There are two types of tables (Clue Tables and Name Tables), and they are stored in the ..\tables\aptables\ directory. If the Customer Data Parser cannot recognize the name component on a record, you can add an entry to those tables.

Clue Table The Clue table (jp_clue.txt) is used to store keywords that the Parser uses to separate input text into tokens and to determine business/personal classification. You can customize this table. The following types of keywords are included in the Clue table.

Table 7.3 Tokens for jp_clue.txt

Token Type   Item                   Description
Business Keyword
T            Business Type          Words to describe business type. Ex. 株式会社, （有）.
N            Business Name          Parse as business name if this token is found at the beginning of the string (excluding business type).
E            Business Name Suffix   Words such as 病院, 学校.
D            Branch Name            It can be a branch name by itself. Ex. 人事部, 経理部.
B            Branch Name Suffix     Usually this token is merged into the previous token and constitutes a branch name. Ex. 支店, 営業所.
C            Business Keyword       Words that can be part of business name or branch name. Ex. データ, 建設.
H            Honorific              Words for honorific. Ex. 様, 殿
P            Title (position)       Words for title. Ex. 代表取締役, 公認会計士
R            Region                 Ex. 東京, 長野

Format

The table consists of the following 4 items. The delimiter for each item is a comma. If the format is not correct, that line will be ignored and the subsequent lines will not be recognized properly.

Table 7.4 Format for jp_clue.txt

Position   Item            (Not set) NULL
1          Token type      Not allowed
2          Zenkaku field   Allowed
3          Hankaku field   Allowed
4          User comment    Allowed



Example:

D,人事部,ｼﾞﾝｼﾞﾌﾞ,user comment

T,(株),(ｶﾌﾞ)

If the user comment is null, the comma between the third item and the fourth item can be omitted.

Input: （株）アグレックス人事部

After token separation:

（株）          Business type (T)
アグレックス    Unknown word
人事部          Branch name (D)

Output:

（株）          Business type field
アグレックス    Business name field
人事部          Branch name field

In this case, “（株）” in the input text “（株）アグレックス人事部” matches one of the business type keywords (T type), and “人事部” matches one of the branch name keywords (D type), so the token type for each word is determined.

In the final output, the unknown word “アグレックス” was recognized as the business name and each word was written out to the proper output field.

If an unregistered keyword is found, you can add that word to this table.

Duplicate words: when you register a new keyword, try not to register the same word under different types.

Character Code: use CP932 for registration.

Words with Spaces: The only keyword that can include spaces is N type. If you register a keyword that includes spaces, delete all spaces before and after the entry and change all spaces within the entry to one hankaku space.

Ex. N,ＨａｒｔＨａｎｋｓ,Hart Hanks
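The four-item format of jp_clue.txt and the registration notes above can be sketched as a small line parser. This is a hypothetical helper for checking entries before adding them to the table, not part of the product; the function names and the returned dictionary layout are assumptions. Python's built-in "cp932" codec is used for the character code the guide asks for.

```python
TOKEN_TYPES = set("TNEDBCHPR")  # token types listed in Table 7.3


def parse_clue_line(line):
    """Parse one jp_clue.txt line into a dict, or return None if malformed.

    Items: token type, zenkaku field, hankaku field, user comment.
    Only the token type is required; the comment (and its comma) may be
    omitted.
    """
    parts = [p.strip() for p in line.rstrip("\n").split(",", 3)]
    if len(parts) < 3 or parts[0] not in TOKEN_TYPES:
        return None
    parts += [""] * (4 - len(parts))
    token_type, zenkaku, hankaku, comment = parts
    if token_type == "N":
        # Only N-type keywords may contain spaces: collapse each inner run
        # of whitespace to a single hankaku (ASCII) space.
        zenkaku = " ".join(zenkaku.split())
        hankaku = " ".join(hankaku.split())
    return {"type": token_type, "zenkaku": zenkaku,
            "hankaku": hankaku, "comment": comment}


def encode_line(line):
    """Encode a line as CP932 bytes, ready to be written to the table file."""
    return line.encode("cp932")
```

A line that fails the check returns `None` here; in the real table, a malformed line is ignored and can also disturb the lines that follow it, so checking before saving is the safer order.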

Name Tables

Name tables contain additional personal and business names that are not included in the personal and business name dictionaries. They also include principal business names. You can customize these tables.

Table 7.5 List of Name Tables

File                Description
jp_bnp_name.txt     Contains business principal name patterns and business type standard patterns (for zenkaku field)
jp_bnp_name_h.txt   Contains business principal name patterns and business type standard patterns (for hankaku field)
jp_pnp_name.txt     Contains last and first names that are not in the name dictionary (initial status is blank)



jp_bnp_name.txt

This table is used to register principal business names and principal business types for the zenkaku field. It is not used to separate business name and business type.

Format

The table consists of the following 4 items. The delimiter for each item is a comma.

Table 7.6 Token for jp_bnp_name.txt

B   Business name   This is used to obtain the principal business name for the business name field, and write out the principal business name in the output field.
T   Business type   This is used to obtain the principal business type for the business type field, and write out the principal business type in the output field.

Table 7.7 Format for jp_bnp_name.txt

Position   Item             Not set (NULL)
2          Business name    Not allowed
3          Principal name   Not allowed




Example:

B,JR東日本,東日本旅客鉄道

T,（株）,株式会社,

If the user comment is null, the comma between the third item and fourth item can be omitted.

Input: （株）JR東日本人事部

Output:

（株）            Business type field
株式会社          Principal business type field
JR東日本          Business name field
東日本旅客鉄道    Principal business name field
人事部            Branch name field

By standardizing the business data using this table, you can achieve more accurate matching.

jp_bnp_name_h.txt

This table is used to register principal business names and principal business types for the hankaku field. It is not used to separate business name and business type. The usage and function of this table are the same as for jp_bnp_name.txt, except that the field for this table is Kana.
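The standardization that jp_bnp_name.txt and jp_bnp_name_h.txt provide amounts to a lookup from a registered name to its principal form. A minimal sketch, assuming the line layout from the example (token type, name, principal name, optional comment); the function names are hypothetical and malformed lines are simply skipped:

```python
def load_principal_names(lines):
    """Build {(token_type, name): principal_name} from table lines."""
    table = {}
    for line in lines:
        parts = [p.strip() for p in line.split(",", 3)]
        if len(parts) < 3 or parts[0] not in ("B", "T"):
            continue  # skip malformed lines in this sketch
        token_type, name, principal_name = parts[:3]
        if name and principal_name:  # neither item may be left unset
            table[(token_type, name)] = principal_name
    return table


def lookup_principal(table, token_type, value):
    """Return the registered principal form, or the value unchanged."""
    return table.get((token_type, value), value)
```

With the two example lines loaded, the business name JR東日本 maps to 東日本旅客鉄道 and the business type （株） maps to 株式会社, while an unregistered name is returned as entered.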



jp_pnp_name.txt

This table is used to register additional personal names. If you find last names or first names that are not in the personal name dictionary, you can add them to this table.

Format

This table consists of the following 5 items. The delimiter for each item is a comma.

Table 7.8 Token for jp_pnp_name.txt

L   Last name    Register additional last names. The reading of a Kanji name can be registered.
F   First name   Register additional first names. The reading of a Kanji name can be registered.

Table 7.9 Format for jp_pnp_name.txt

Position   Item                                                Not set (NULL)
2          Last name or First name (zenkaku)                   Allowed
3          Last name in Kana or First name in Kana (hankaku)   Allowed. When parsing a zenkaku first/last name, this field is used as the reading of the name.
4          Not used                                            Allowed

Example:

F,潤兵,ｼﾞﾕﾝﾍﾟｲ,,user comment

Words with Spaces: for integration purposes, small characters must be converted to large characters when adding hankaku kana last and first names.
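The small-to-large conversion for hankaku kana readings can be done with a translation table. The sketch below maps the nine small halfwidth katakana (U+FF67 to U+FF6F) to their full-size counterparts and builds a five-item line in the layout of the example; the function names are hypothetical, and whether any other characters need conversion is not stated in this guide.

```python
# Small hankaku kana (U+FF67..U+FF6F: a i u e o ya yu yo tsu) and the
# corresponding full-size hankaku kana.
SMALL = "\uff67\uff68\uff69\uff6a\uff6b\uff6c\uff6d\uff6e\uff6f"
LARGE = "\uff71\uff72\uff73\uff74\uff75\uff94\uff95\uff96\uff82"
SMALL_TO_LARGE = str.maketrans(SMALL, LARGE)


def enlarge_small_kana(reading):
    """Replace small hankaku kana in a reading with large ones."""
    return reading.translate(SMALL_TO_LARGE)


def pnp_line(token_type, name, reading, comment=""):
    """Build a jp_pnp_name.txt line: type, name, reading, unused, comment."""
    if token_type not in ("L", "F"):
        raise ValueError("token type must be L (last name) or F (first name)")
    return ",".join([token_type, name, enlarge_small_kana(reading), "", comment])
```

A reading typed with a small yu is therefore stored with a large yu, which matches the form of the reading shown in the example line.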



Using the Parser Customization Editor

Parsing Customization is the process of creating entries for words and phrases in the Customized Definitions Table. Those entries are created using the Parser Customization Editor. After the new entries are created and saved, you must re-run the Customer Data Parser to apply the new parsing rules.

View a Standard Definitions Table

Before making entries to the Customized Definitions Table, take a look at the Standard Definitions Table to see how it is constructed. Standard Definitions Tables vary from country to country.

To view a standard definitions table

1. Open the Customer Data Parser step and click Customization Editor. The Parsing Customization Editor opens.

Figure 7.1 Opening Customization Editor

2. Select File, Open Standard Definitions.

For detailed information on the Parser Customization Editor, see the Online Help.

3. Locate the Standard Definitions Table in the Open dialogue box: for example, c:\TrilliumSoftware\tsq10r5s\tables\parser_rules\USCDPRUL.win.

4. Click Open. The Standard Definitions table for the selected country appears.

Figure 7.2 Standard Definitions Table (US)

5. From the Main Menu select Search, Find Entry to review the entries in this file.

6. Select File, Exit to leave the Customization Editor.



View and Correct City Problems

City problems are reported to the exceptions file any time a US city/state combination cannot be verified. The usual cause is a misspelled city name.

To view and correct city problems

1. Open the Customer Data Parser step and click Customization Editor. The Parsing Customization Editor opens. When the Customization Editor opens, the country-specific Customized Definitions file and the country specific Word/Pattern Problems file will also open.

2. The left window of the Customization Editor contains a Navigation area which allows the user to move from Customized Definitions to specific Word/Pattern Problems. Click Customized Definitions in the Navigation area. The screen will show the current customized definitions file, which is empty by default.

3. Click below the line of asterisks. This will position your cursor to enter customized definitions.

Be sure to position the cursor below the line of asterisks before applying an entry.


Figure 7.3 Parsing Customization Editor

4. Click US City Problems in the Navigation area. The screen will display city problems found in the US data.

Figure 7.4 US City Problems

The Frequency column lists the number of times this city problem occurred, followed by the percentage of total occurrences this entry represents. Zip, State and City data is listed as it appears on the input record. In this example, the cities ‘FAIRBANK’ and ‘BAR HARBER’ are misspelled. The right side window displays the record number(s) for the selection.

5. Right-click ‘FAIRBANK’ to start the new entry process. The cursor appears in the Input Correct City Name box.

Figure 7.5 New Entry Box

6. Enter the correct spelling of this city as ‘FAIRBANKS’. Click OK. The two-letter state abbreviation followed by the corrected city name will appear after the ‘RECODE =’ in the New Entry box.

Figure 7.6 Input Correct City Name



7. Click Apply. The entry will be added to the Customized Definitions file wherever the cursor is positioned in the Customized Definitions file.

8. In the Navigation area, click Customized Definitions to view the new entry. The entry would look like this:

Figure 7.7 US City Entry in the Customized Definitions File

9. In the Navigation area, click US City Problems. Repeat the correction steps for the city ‘BAR HARBER’.

10. Click Apply. In the Navigation area, click on Customized Definitions to view the new entry.

Figure 7.8 Multiple US City Entries in the Customized Definitions File

If you accidentally hit the Apply button or if an entry is incorrect, you can modify or delete entries directly in the Customized Definitions file.
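Because entries can also be typed directly into the Customized Definitions file, it helps to see the shape of the entry the editor generates for a city correction. The sketch below builds one from a state abbreviation, the misspelled city, and the corrected city, following the city-change examples earlier in this chapter. The function name is hypothetical, and straight quotes are used in place of the typeset quotes.

```python
def city_change_entry(state, misspelled_city, correct_city):
    """Build a US city-change entry in the customized definitions syntax.

    The symbol is the state abbreviation plus the misspelled city, ending
    in an underscore; the recode is the state plus the corrected city.
    """
    state = state.strip().upper()
    if len(state) != 2 or not state.isalpha():
        raise ValueError("state must be a two-letter abbreviation")
    symbol = f"{state}{misspelled_city.strip().upper()}_"
    recode = f"{state}{correct_city.strip().upper()}"
    return f"'{symbol}' GEOG DEF ATT=CITY-CHG, REC='{recode}'"
```

For the Bar Harbor correction, `city_change_entry("ME", "Bar Harber", "Bar Harbor")` returns `'MEBAR HARBER_' GEOG DEF ATT=CITY-CHG, REC='MEBAR HARBOR'`.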



View and Correct Pattern Problems

Bad name patterns occur when data that the CDP cannot identify appears on a name line. Any pattern of data that cannot be completely identified is written to the exceptions file for review.

To view and correct pattern problems

1. In the Navigation area, click Bad Name Patterns.

2. Click on the first Bad Name Pattern. The data appearing in the lower portion of the screen corresponds to the bad pattern selected. If the Frequency for the pattern is 2, then the data for the two corresponding records is displayed in the Pattern Examples window.

Figure 7.9 Bad Name Patterns

3. To correct the Bad Name Pattern you must change any unknown attributes to a known attribute. Unknown attributes are displayed in red.

4. To change the unknown attribute, right-click on the attribute name (ALPHA). A pop-up list of possible attributes will appear.

5. Double-click on the desired attribute (for example, SURNAME) in the list. The ALPHA attribute will be replaced with SURNAME and will appear in italics and blue.

Figure 7.10 Corrected Name Pattern

6. Click the Confirm button to verify the entry before it is placed into the Customized Definitions file.

If there are elements of data that you do not wish to maintain, assign an IGNORE attribute to the piece of data.

7. Click Apply to add this pattern to the Customized Definitions.

Figure 7.11 Name Pattern New Entry

8. In the Navigation area, click on Customized Definitions to view the new entries.

Figure 7.12 Complete New Entries in the Customized Definitions File

If there are multiple names on one name line, the number in parentheses will determine which data element goes with which person.



Save the Entries

After the corrections have been completed, save the entries in the Customized Definitions file. These entries will be merged with the Standard Definitions entries before the Parser step is run.

To save the entries

1. Select File, Save from the Main Menu.

Re-Run Customer Data Parser

After the new entries are created and saved, you must re-run the Customer Data Parser to apply the new parsing rules.

To apply new parsing rules

1. Run the Customer Data Parser step. When asked Would you like to run parsing customization prior to running the step?, select Yes.

2. The Customized Definitions will be merged with the Standard Definitions and the Parser will run using the complete set of parsing rules.

3. When the Parser step has run, click on the Customization Editor button. Navigate to US City Problems and then to Bad Name Patterns. Notice that the exceptions are no longer displayed. The entries in the Customized Definitions file have instructed the Customer Data Parser on how to handle these situations.

4. Close the Customization Editor.

View Errors in Parsing Customization

When the Customer Data Parser has run, any errors in the Parsing Customization process will be identified with the following message:

Figure 7.13 Parsing Customization Error Message

If you get this message, view and correct the errors using the following steps:

To view errors in Parsing Customization

1. Open Customization Editor and select File, Open Error log.

2. The log displays the error message and indicates the line number, as well as the entry where the error occurred. A sample error log is shown below:

Figure 7.14 Parsing Customization Sample Error Log



3. To correct errors, edit the entry in the Customized Definitions file. The error in Figure 7.14 indicates that the entry was duplicated in the Customized Definitions file. One entry should be deleted from this file.

4. Save the Customized Definitions file and re-run the Customer Data Parser.



CHAPTER 8 Analyzing Single Data


Sometimes users need to test and analyze the results of cleansing, standardization and linking on a single data record. TS Quality Analyzer allows the user to parse, geocode, and match name and address data interactively. It is a useful way to test and view modifications you make to the parsing rules.


Start the TS Quality Analyzer

Input name and address data

View the cleansed results

Show details for name/address parsing and standardization

Show details for address validation

Match data against your database

Review results of matching

The TS Quality Analyzer is not available for Asia-Pacific countries.



Using the TS Quality Analyzer

The TS Quality Analyzer processes a single data record for a specific country. The Analyzer processes each country’s data using the appropriate parsing and geocoding tables. If you have changed the Customer Data Parser’s parsing rules, using the TS Quality Analyzer is a particularly effective way to test the new rules.

Use the TS Quality Analyzer for several functions:

Review details for name/address parsing and standardization

Review details for address validation

View Customer Data Parser and Postal Matcher details for name/address record data

View Customer Data Parser Review Group descriptions

View Postal Matcher Return Code descriptions

Add the record to a database file for interactive reference matching

Match a transaction record to records in database file

See “Linking Single Record Using the TS Quality Analyzer” on page 14-14 for reference matching processing.

Start the TS Quality Analyzer

To start the TS Quality Analyzer

1. From the Tools palette, select TS Quality Analyzer.

2. In the Select a Country window, select the country you wish to work with. Click OK.

3. The TS Quality Analyzer application opens.

Figure 8.1 TS Quality Analyzer

There are two tabs in the TS Quality Analyzer:

Standardization - The Standardization tab is used for cleansing, parsing and postal matching processes. Customer Data Parser and Postal Matcher will be automatically run for the selected record and standardization results will be displayed.

Matching - The Matching tab is used for the matching process. Relationship Linker will be automatically run for the selected record and matching results will be displayed.

You must first run the Standardization and then run the Matching. The Matching process takes as input the cleansed data generated by the Standardization process.

Data Entry and Cleansing

To enter a new record and cleanse the data

1. Select the Standardization tab.


2. Select File from the main menu to select the input and output mode for the record. For input, select from Input Mode, Input Fields or Free Form Input. For output, select Output Mode, Output Fields or Free Form Input.

3. If you select Input Fields mode, enter the record line by line. If you select Free Form mode, you can enter the record in free text format.

4. Enter the new record data in the Input frame.

Figure 8.2 Input Mode (Free Form Input mode and Input Fields mode)

5. Click Cleanse to parse and geocode the data. You can also click the Cleanse button in the tool bar.

To clear the Input frame, click Clear or select Input from the Reset menu.

6. The cleansed data will appear in the Cleansed window on the Standardization tab.

Figure 8.3 Cleansed Data

7. Look at Customer Parser Message and Postal Matcher Message under the Cleansed window. These messages indicate whether the data entered is valid or not. If the data is valid, you will see the following messages:

If the data is invalid, you will see messages like this:

8. Click Show Details to see the parsing, standardization, and validation details. The results of the Customer Data Parser are shown in the lower left window and the results of the Postal Matcher are shown in the lower right window.

Figure 8.4 Parsing, Standardization, and Validation Details

Advanced Details

In addition to the parsing, standardization, and validation details, you can review advanced details of the Customer Data Parser and Postal Matcher results.

To review advanced details of data

1. Click Advanced Detail from the main menu. Refer to the table below and select the desired information:

Select...                                          To...
Customer Data Parser and Postal Matcher Details    Review the PREPOS information returned from the Customer Data Parser and Postal Matcher
Customer Data Parser Review Group Descriptions     Look up the description of the Customer Data Parser Review Group returned
Postal Matcher Return Code Descriptions            Look up the description of the Postal Matcher Return Code
DPV Return Code Descriptions                       Look up the description of the DPV Return Code



Matching

Once the Cleansing step has run, you can match the record against records in your database.

To match data against database

1. Select the Matching tab. Notice the window key for the cleansed record is shown.

Figure 8.5 Matching Tab

2. Click the Plus sign (+) to show the Master Database. The records in the database are shown in the lower window.

Figure 8.6 Records in Master Database


3. Click either Match Individual or Match Household to set the level of matching.

4. Click Match.

You can also click the Match button in the tool bar.

5. The match results are displayed in Window Key Matched Records from the Master Database and Relationship Linker Matched Records on the right side of the window.

Window Key Matched Records from the Master Database shows all records in the database with the same window key as the input record.

Figure 8.7 Window Key Matched Records

Relationship Linker Matched Records shows all matched records from the Window Key Matched Records.

Figure 8.8 Relationship Linker Matched Records

To view and edit linking rules

If you want to see and edit the field and/or pattern list files for this matching, launch the Relationship Linker Rule Editor within the TS Quality Analyzer.

1. Select Match Rules from the Tuning menu.

2. Select either Consumer or Business, and then select either Level1 or Level2.

3. The Relationship Linker Rule Editor opens with the field and/or pattern files for this matching process.

4. Review, edit and save the field and/or pattern files.

5. To match the data again using the updated field and/or pattern files, go back to the Standardization tab and cleanse the data again.

6. Go to the Matching tab and re-run matching by clicking Match.

Organize Database

To add data to database

At this point, if you decide to keep the input record in the database, you can add the record.

1. Click Add to DB. The cleansed and matched input record is added to your database.

To remove data from database

1. In the master database, highlight the data you want to remove.

2. Click to remove that data.

To reset database

1. Select Master Database from the Reset menu.

2. At the confirmation message, select Yes.



CHAPTER 9 Enriching Your Data


Once the name and address data is parsed, the address data must be verified and enriched by the Postal Matchers. With the Postal Matchers, data is matched to directories and appropriate geographic fields are populated with postal geocoding data. The Postal Matchers help you locate customers, verify address data, and improve that data. All Postal Matchers rely on output from the parsing process to provide addresses for linking purposes.


Sort the output file from the Customer Data Parser

Specify input, output, and the postal tables for the Postal Matcher

Run the Postal Matcher and view results

Identify the match level code for a record

View the record and analyze the match to the Postal Directory

Browse the postal directories for each country

We strongly recommend that the output file from the CDP be sorted by geographic fields so that the records will be in geographic order to permit the Postal Matchers to work most efficiently.

Sorting for the Postal Matcher

The Postal Matchers use output from the parsing process as inputs. To obtain optimum performance, the input files to the Postal Matchers must first be sorted in geographic order, using the Sort Utility. The output file will have the extension .srt to indicate that the data have been sorted.

Input and Output Settings

The Sort Utility uses the output from the Customer Data Parser step as inputs.




1. Open the Sorting Utility step and click the Input Settings tab.



OR


4. Select the Output Settings tab.

5. Enter the Output File Name and Output DDL Name file names. The Output File Name must have the extension .srt to indicate this is a sorted file.


To specify the output file qualifier

A File Qualifier is a unique name given to a data file. For the Sort Utility, the output data file must have its own unique file qualifier.

1. Click Advanced and navigate to Output, Settings.

2. Select Output Data File Qualifier (default is OUTPUT).

You may also specify the following settings:


1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Start at Record. This value determines the record in the input data file at which the Sort Utility will begin processing (default is 1).








value specifies the maximum number of records to process. By default, all records will be processed.
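The effect of a starting record and a maximum record count can be illustrated outside the product. The following is a minimal Python sketch, not TS Quality code: the function name `select_records` and its parameters are hypothetical, and it assumes the starting position is 1-based, as the default of 1 suggests.

```python
from itertools import islice


def select_records(records, start_at=1, max_records=None):
    """Return an iterator over records from the 1-based position start_at.

    max_records=None mirrors the documented default of processing all records.
    """
    if start_at < 1:
        raise ValueError("start_at is 1-based and must be at least 1")
    first = start_at - 1  # convert to a 0-based offset
    stop = None if max_records is None else first + max_records
    return islice(records, first, stop)


sample = ["rec1", "rec2", "rec3", "rec4", "rec5"]
# Begin at the second record and process at most three records.
print(list(select_records(sample, start_at=2, max_records=3)))
```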


1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process Nth Sample. This value








You can specify records to either Select or Bypass under certain conditions in both input and output files. See “Select or Bypass Records” on page 5-37 for instructions on how to specify select/bypass definitions.




Process Settings

Once you have identified the input and output files, you are ready to specify the settings used to process your data. The settings for processing are managed in the Advanced Settings window.

Sort Fields

To specify sort fields

1. Click Advanced and navigate to Process, Settings.

2. Click the Entry Settings tab.

3. Select the input DDL fields from the drop-down list in the Key box. These are the fields used in the sort process.

Sort fields are pre-determined according to the country-specific step. You can change the default fields by selecting different sort fields.

4. Select the sort order from the drop-down list in the Order box. Values are either Ascending Order or Descending Order.

Figure 9.1 Sort Entry Settings
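To make the idea of several sort keys, each with its own order, concrete, here is a minimal Python sketch. It is an illustration only and does not describe how the Sort Utility is implemented; the field names `postal_code` and `street_name` and the `ASC`/`DESC` labels are hypothetical stand-ins for the DDL fields and the Ascending Order/Descending Order values.

```python
def sort_records(records, sort_keys):
    """Sort dict records by several keys.

    sort_keys is a list of (field_name, "ASC" or "DESC") in priority order.
    """
    result = list(records)
    # Apply the lowest-priority key first. Python's sort is stable, so the
    # order produced by earlier passes survives as the tie-breaker.
    for field, order in reversed(sort_keys):
        result.sort(key=lambda rec: rec[field], reverse=(order == "DESC"))
    return result


records = [
    {"postal_code": "02476", "street_name": "BRATTLE"},
    {"postal_code": "01879", "street_name": "MAIN"},
    {"postal_code": "02476", "street_name": "BRANTWOOD"},
]
by_geography = sort_records(
    records, [("postal_code", "ASC"), ("street_name", "ASC")]
)
for rec in by_geography:
    print(rec["postal_code"], rec["street_name"])
```

Sorting in several stable passes keeps the sketch short; a single pass with a tuple key would work equally well when every key uses the same order.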

To specify collating sequence

You can specify the collating sequence for the sort order. This is optional.

1. Click Advanced and navigate to Process, Settings.

2. Click the Entry Settings tab.

3. In the Collating Sequence box, specify the collating sequence. Values are ASCII, EBCDIC, FOLDED_ASCII, FOLDED_EBCDIC, or MULTI_NATIONAL. If omitted, the default collating sequence defined by the operating system is used.

For detailed information on the collating sequence, see the Sort Utility’s Online Help.
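The choice of collating sequence changes the order in which digits, uppercase letters, and lowercase letters sort. The Python sketch below illustrates this with three of the listed values. It rests on assumptions that are not confirmed by this guide: EBCDIC is approximated with Python's `cp037` code page, and FOLDED_ASCII is assumed to mean a case-insensitive ASCII comparison. The function `collation_key` is hypothetical.

```python
def collation_key(value, sequence):
    """Return a byte string whose ordering mimics the named collating sequence."""
    if sequence == "ASCII":
        return value.encode("ascii")
    if sequence == "EBCDIC":
        # Assumption: code page 037 stands in for the EBCDIC sequence.
        return value.encode("cp037")
    if sequence == "FOLDED_ASCII":
        # Assumption: "folded" means that case differences are ignored.
        return value.upper().encode("ascii")
    raise ValueError(f"sequence not covered by this sketch: {sequence}")


values = ["apple", "Zebra", "10"]
for sequence in ("ASCII", "EBCDIC", "FOLDED_ASCII"):
    ordered = sorted(values, key=lambda v: collation_key(v, sequence))
    print(sequence, ordered)
```

In ASCII, digits sort before uppercase letters, which sort before lowercase letters; in code page 037 the order is reversed, which is why the same file can come out in a different order on a mainframe.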

Additional Settings

You can specify the following additional settings.

To retain the order of same-key records

If you want output data to retain the order of same-key records, use Stable Sort.

1. Click Advanced and navigate to Process, Settings.

2. Click the Main Settings tab.

3. Select Stable Sort.

To specify how equal-keyed records are handled

You can specify how records are handled when there are duplicate keys.

1. Click Advanced and navigate to Process, Settings.

2. Click the Main Settings tab.

3. In the Duplicates box, select an option from the drop-down list. Values are:

KEEP_ALL - Keeps all the records.

KEEP_ONE - Keeps one record. It does not guarantee that a particular record within the duplicate set will be retained.

KEEP_NONE - Keeps none of the records.

JUST_DUPS - Keeps just the duplicates.

See “Sort” in the TS Quality Reference Guide for the complete settings information.
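One way to picture the four duplicate-handling options is the Python sketch below. It reflects one reading of the short descriptions above and is not taken from the product: KEEP_NONE is assumed to drop every record whose key occurs more than once, and JUST_DUPS is assumed to keep only those records. KEEP_ONE keeps the first record of each key here purely for convenience; the guide states that no particular record is guaranteed. The function name `apply_duplicates_option` is hypothetical.

```python
from collections import Counter


def apply_duplicates_option(records, key, option):
    """Filter records according to an assumed reading of the Duplicates option."""
    counts = Counter(key(rec) for rec in records)
    if option == "KEEP_ALL":
        return list(records)
    if option == "KEEP_ONE":
        seen = set()
        kept = []
        for rec in records:
            k = key(rec)
            if k not in seen:  # first record per key; an arbitrary choice
                seen.add(k)
                kept.append(rec)
        return kept
    if option == "KEEP_NONE":
        # Assumption: records with a duplicated key are all dropped.
        return [rec for rec in records if counts[key(rec)] == 1]
    if option == "JUST_DUPS":
        # Assumption: only records with a duplicated key are kept.
        return [rec for rec in records if counts[key(rec)] > 1]
    raise ValueError(f"unknown option: {option}")


records = [("A", 1), ("B", 2), ("A", 3)]
for option in ("KEEP_ALL", "KEEP_ONE", "KEEP_NONE", "JUST_DUPS"):
    print(option, apply_duplicates_option(records, lambda rec: rec[0], option))
```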


1. Click Advanced and navigate to Process, Settings.

2. Click the Main Settings tab.

3. Select Enable Debug Output.

4. In the Debug File text box, accept the default path and file name, or enter a new file name to receive debugging information.


1. Click Advanced and navigate to Process, Settings.

2. Click the Main Settings tab.

3. In the Sample Count text box, enter the number that




1. Click Advanced and navigate to Process, Settings.

2. Click the Main Settings tab.

3. In Settings File Encoding, select the appropriate encoding from the drop-down list.



Run the Sorting Utility and Check Results

To run the Sort Utility and view results

1. Click OK to close the Advanced Settings.

2. Click Run to run the Sorting Utility.

3. Click OK.

4. In the Results window, you will see the Statistics sub-tab. The Sort Key summary is shown on this sub-tab.

TS Quality offers a number of utilities to perform specific tasks. See Chapter 16, “Utilities”, for a review of these tools.




Using the Postal Matchers

Postal Matchers match your data to the country-specific TS Quality Postal Directories and return address details and database matches.

Postal Matchers perform these functions:

Verify and assign postal codes to name and address data

Assign delivery point identifier (DPID)

Standardize and correct address components

Provide linked data in a presentation form that meets the country addressing standards

The TS Quality Postal Directories are included in the package. Country-specific directories were installed during the TS Quality installation process. You can browse the postal directories using the Postal Directory Browser. See “Browsing the Postal Directory” on page 9-20.

Input and Output Settings

The Postal Matcher uses the output from the Sort Utility step as input to this step.


1. Open the Postal Matcher step and select the Input Settings tab. Specify the Input File Name and Input DDL Name.

2. If you are using the Census tables and/or DPV tables, select the Include Census Tables or Include DPV Tables box.

3. Select the Output Settings tab. Specify the Output File Name and Output DDL Name.





1. Click Advanced and navigate to Input, Settings.

2. Select Input Data File Qualifier (default is INPUT).

3. Click Advanced and navigate to Output, Settings.

4. Select Output Data File Qualifier (default is OUTPUT).



1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Start at Record. This identifies the record in the input data file at which the Postal Matcher will begin processing (default is 1).















You can specify records to either Select or Bypass under certain conditions in both input and output files. See “Select or Bypass Records” on page 5-37 for instructions on select and bypass definitions.

Process Settings

Once you have identified input and output files, you are ready to specify settings to process your data. The settings for processing are managed in the Advanced Settings window.

Postal Directories

The country-specific postal directories are included in TS Quality and were installed when you installed the software. These directories must be accessible to all projects.

See Installing TS Quality for a complete list of postal directories for all countries and the locations of those tables.

To specify postal directories

1. Click Advanced in the Postal Matcher step and navigate to Process, Settings.



The Process Settings window varies from country to country. See the TS Quality Reference Guide for a complete list of settings for each country.

2. If you are using the US Postal Matcher, the settings are displayed in Figure 9.2:

Figure 9.2 Postal Matcher Settings (US)

3. Refer to the table below to define each setting.

Setting | Description
Postal Base Data File | The file that contains street details information: for example, USBASE.tbl.
Postal Level1 Data File | The file that contains level1 street name information: for example, USINDEX1.tbl.
Postal Level2 Data File | The file that contains level2 city information: for example, USINDEX2.tbl.
Postal Form File | The file that contains the postal certification report. Required for USPS form.




4. If you have checked the Include Census Tables and/or Include DPV Tables box in the Input Settings tab, the Census Settings and/or DPV Settings window will be enabled under Process. In this case, you must select your census/DPV tables in each window.

Additional Settings

You can also specify the following additional settings.

1. Click Advanced and navigate to Process, Settings.

2. Select Enable Debug Output.

3. In the Debug File text box, choose the default path and file name, or enter a different file name to receive debugging information.





Setting | Description
Postal Form Database Date | Format of date to display on the report: for example, 'MMM YYYY'.
Postal Form List | Name of list to be matched against US tables: for example, 'DATA FILE'.
Postal Form Customer | Client name to display on the report: for example, 'CUSTOMER NAME'.
Postal Form Job Number | The job number to print on the form: for example, 99999.

See “Postal Matchers” in the TS Quality Reference Guide for the complete settings information.



1. Click the Advanced button and navigate to Process, Settings.

2. In Settings File Encoding, select the correct encoding from the drop-down list.

See “Encoding (Code Page)” on page A-3 for more information.

Run the Postal Matcher and View Results

To run the Postal Matcher and view results

1. Click OK to close Advanced Settings.

2. Click Run to run the Postal Matcher.

3. Select OK.

4. On the Results tab, the Statistics subtab appears. Record Matches, Processing, Changes and Failures are shown on this tab, as seen in Figure 9.3.

Figure 9.3 Postal Matcher Statistics

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.



After running the Postal Matchers, the Match Level Codes are generated to identify specific conditions which occur for each record being processed. You should review those codes to analyze the postal matcher results.

Match Levels

The Match Level Codes indicate the accuracy of the match between the country geography data and the appropriate postal table. The match level codes are written to the output record in the xx_gout_match_level field.

In actual use, the “xx” in the description above is replaced with a two-letter country code (for example, US = United States, CA = Canada, GB = Great Britain, and DE = Germany). Thus, xx_gout_match_level becomes US_gout_match_level for United States data.

Figure 9.4 Match Level Codes


There are several Match Level Codes. Some common codes include:

A ‘0’ in the US_GOUT_MATCH_LEVEL field indicates that the input data successfully matched to the Directory

A ‘Y’ in the US_GOUT_STREET_NAME_CHANGE field indicates that the street name was changed, for example:

A misspelled street name was corrected

An abbreviated street name was expanded to the full street name

See the TS Quality Reference Guide for a complete list of Match Level Codes for the Postal Matchers.
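When reviewing Postal Matcher output programmatically, the country prefix and the two codes described above can be combined as in the following Python sketch. It is only an illustration under stated assumptions: records are represented as plain dictionaries keyed by field name, which is not how TS Quality stores them, and the helper names `gout_field` and `summarize` are hypothetical. The value "?" is a placeholder for any code other than '0', not a real match level.

```python
from collections import Counter


def gout_field(country_code, suffix):
    """Build a country-specific field name such as US_gout_match_level."""
    return f"{country_code.upper()}_gout_{suffix}"


def summarize(records, country_code="US"):
    """Count records by the two conditions described in this section."""
    level_field = gout_field(country_code, "match_level")
    change_field = gout_field(country_code, "street_name_change")
    summary = Counter()
    for record in records:
        if record.get(level_field) == "0":
            summary["matched"] += 1  # '0' means a successful directory match
        else:
            summary["other_level"] += 1  # look the code up in the Reference Guide
        if record.get(change_field) == "Y":
            summary["street_name_changed"] += 1
    return summary


records = [
    {"US_gout_match_level": "0", "US_gout_street_name_change": "Y"},
    {"US_gout_match_level": "0", "US_gout_street_name_change": ""},
    {"US_gout_match_level": "?", "US_gout_street_name_change": ""},
]
print(dict(summarize(records)))
```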



Dual Address Information

Dual Address On the Same Line

In accordance with CASS requirements, if there are two addresses on the same line, referred to as a dual address, the US Postal Matcher may require both addresses for look up. Therefore, the Customer Data Parser (CDP) needs to pass both addresses to the US Postal Matcher.

Dual address information is passed to the US Postal Matcher from the CDP using the us_gin and us_gout areas. The following rules describe how a dual address is handled:

1. Of the two addresses in a dual address, if one address is general delivery, then us_gin_street_name will contain the other address, and a ‘G’ is set in the first position of us_gout_secondary_type.

2. If one of the addresses is a post office box (PO box), then us_gin_street_name will contain the other address, and a ‘P’ is set in the first position of us_gout_secondary_type. The PO box number is also stored, starting at the second position of us_gout_secondary_type.

3. If the dual address contains both a general delivery address and a PO box number, then ‘PO BOX’ is stored in us_gin_street_name, and a ‘G’ is set in the first position of us_gout_secondary_type.

4. If the dual address contains both a street name and a rural route, then the street name is stored in us_gin_street_name, and an ‘R’ is stored in the first position of us_gout_secondary_type. In addition, the route number is stored starting at the second position of us_gout_secondary_type, and the box number is stored starting at the second position of us_gout_secondary_number.

Currently, the Customer Data Parser handles the following dual address cases:

street name / general delivery


general delivery / street name

street name / PO box

PO box / street name

general delivery / PO box

PO box / general delivery

rural route / general delivery

general delivery / rural route

rural route / PO box

PO box / rural route

street name / rural route

rural route / street name
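The four rules and the twelve cases above reduce to a small decision procedure. The Python sketch below restates them for illustration; it is not the Customer Data Parser's implementation. The component labels (`street`, `po_box`, `general_delivery`, `rural_route`) and the function name `dual_address_flag` are hypothetical, and only the content of us_gin_street_name and the one-character flag are modelled, not the box or route numbers.

```python
def dual_address_flag(first, second):
    """Return (content of us_gin_street_name, dual address flag) for two components.

    The order of the two components does not matter, mirroring the paired
    cases in the list above (for example street name / PO box and
    PO box / street name).
    """
    kinds = {first, second}
    if len(kinds) != 2:
        raise ValueError("a dual address needs two different components")
    # Rule 3: general delivery together with a PO box.
    if kinds == {"general_delivery", "po_box"}:
        return ("PO BOX", "G")
    # Rule 1: one address is general delivery; keep the other address.
    if "general_delivery" in kinds:
        (other,) = kinds - {"general_delivery"}
        return (other, "G")
    # Rule 2: one address is a PO box; keep the other address.
    if "po_box" in kinds:
        (other,) = kinds - {"po_box"}
        return (other, "P")
    # Rule 4: street name together with a rural route.
    if kinds == {"street", "rural_route"}:
        return ("street", "R")
    raise ValueError(f"combination not covered by the rules: {first} / {second}")


print(dual_address_flag("street", "po_box"))
print(dual_address_flag("po_box", "general_delivery"))
print(dual_address_flag("rural_route", "street"))
```

Rule 3 is checked first because a general delivery / PO box pair would otherwise also satisfy rules 1 and 2.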

Dual Address Information Handling

The following table shows where dual address information is stored for the above cases:

Table 9.1 Dual Address Information Handling

Case | Dual Address | us_gin_street_name | Dual Addr Flag (us_gout_secondary_type) | us_gout_secondary_type[1] | us_gout_secondary_number[1]
1 | street name / general delivery | street name | G | |
2 | general delivery / street name | street name | G | |
3 | street name / PO box | street name | P | PO box number |
4 | PO box / street name | street name | P | PO box number |
5 | general delivery / PO box | PO box | G | |
6 | PO box / general delivery | PO box | G | |
7 | rural route / general delivery | rural route | G | |
8 | general delivery / rural route | rural route | G | |
9 | rural route / PO box | rural route | P | PO box number |
10 | PO box / rural route | rural route | P | PO box number |
11 | street name / rural route | street name | R | route number | PO box number
12 | rural route / street name | street name | R | route number | PO box number

The maximum length of the PO box number in cases 3, 4, 9 and 10 is 9. It extends into us_gout_secondary_number. The maximum length of the PO box number in cases 11 and 12 is 6.

Dual Address On Different Lines

When a dual address occurs on different lines, the address that is closest to the geography line is passed to the US Postal Matcher.

Changes to the PREPOS

If an address contains both a PO box and a rural route with a PO box number, there is no room to store the second PO box number. Therefore, the literal ‘PO BOX’ is stored in pr_dwelling3_name_recoded and the PO box number is stored in pr_dwelling3_number.


Browsing the Postal Directory

You can browse the postal directories using the Postal Directory Browser. The Postal Directory Browser contains separate interactive browsers to view the postal directories for all countries included in the package. There are three levels for browsing: City Level, Street Level, and Street Detail.

The Postal Directory Browser is not available for Asia-Pacific (APAC) countries.

City Level Directory

To browse a city level directory

1. Select Postal Directory Browser on the Tools Palette. The Configuration Dialog box appears.

Figure 9.5 Postal Directory Browser Configuration Dialogue

2. From the drop-down menu, select the country whose postal directory you want to browse.

3. Select the directory containing your pdb_settings directory.

For detailed information on the Postal Directory Browser, see the Online Help.



4. Click OK. The City Level window for the selected country opens. This window lists cities, zip codes, and finance codes.

Figure 9.6 City Level Directory (US)

5. To search for a particular city, use one of the search boxes in the upper part of the window. For the US, the search boxes are CITY, STATE, ZIPCODE, FINANCE CODE and US Census Search.

6. As you enter data in the search box, the program searches for your entry. You need only enter information in one of the search windows in order for the program to determine the others.

7. To clear the search boxes, click Clear.

Street Level Directory

To browse a street level directory

1. Once you have selected a city, double-click the entry or click Run to bring up the Street Level window. For the US, the street level window contains all the street names for the selected city.

Figure 9.7 Street Level Directory (US)

2. To search for a certain street, use the search box. As you enter information in the search box, the program will search for the appropriate entry.

3. To clear the search box, click Clear.

Street Details

To browse the street details

1. Once you have selected a Street Name, double-click the entry or click Run to bring up the Street Level Details window.

2. The Street Level Details window displays street details under the fields for the selected street.



3. These fields vary from country to country. For example, the US fields would look like this:

Figure 9.8 Street Name (US)

4. The Postal Directory Browser displays the Street Detail. Scroll to view all data presented by the Postal Directory Browser.

Figure 9.9 Street Details (US)

For detailed information on the country fields, see Postal Directory Browser’s Online Help.


CHAPTER 10 Linking Your Data


This chapter explains how to link your data. Linking is the process of identifying records with a matching relationship (consumer/business) in a file or duplicates in several files. Linking compares records to determine the level of similarity between them.

The result of the comparisons is categorized as either a passed, suspect, or failed match, based on the similarity of data elements in the records, as well as the assigned score of their exceptions.

Data linking involves three steps:

Create window keys using the Window Key Generator

Sort records by the window key using the Sort Utility

Match records using the Relationship Linker



Using the Window Key Generator

The Window Key Generator lets you create window keys that are used to match records in the Relationship Linker. The Relationship Linker tries to match records in the same window key set so that it does not need to compare every record in the database to every other record.

A window key is constructed from elements of input fields, such as the first character of a business name and the first five characters from a postal code field. To generate a window key, you must first create a Window Key Rule that defines which part of each element to include in the key. You can use one or more keys to filter selected records for comparison.

Example

Input Records:

CENTER HOSPITAL
25 BRATTLE LN
ARLINGTON MA 02476

CHEMIST ASSOCIATES
12 BRANTWOOD RD
ARLINGTON MA 02476

Window Keys are generated from one of the window key rules provided by the Window Key Generator. For example, Key_List_10 is set to generate the window key as follows:

Key_List_10 rule:

Use the first three characters of the postal code.

Append to this the first character of the business name.

Append to this the first character and subsequent consonants of the street name.

Append to this a ‘1’ if this is a personal name and a ‘2’ if this is a business name.

Window key that is generated:

024CBR2
024CBR2

The same window key is generated for both records, bringing them into the same match window for comparison purposes. Subsequent matching rules will indicate that these records are not matches.

Input and Output Settings

The Window Key Generator uses the Postal Matcher output as input.

1. Open the Window Key Generator step and click the Input Settings tab.

2. Enter the Input File Name and Input DDL Name.

3. Click the Output Settings tab and enter the Output File Name and Output DDL Name.

4. Enter a file name in the Statistics File Name and Process

You can also specify these additional settings:












You can specify records to either Select or Bypass under certain conditions in both input and output files. See “Select or Bypass Records” on page 5-37 for instructions.

Process Settings

Once you have specified input and output files, you can define the settings to process your data. The settings for processing are specified in the Advanced Settings window.

Create Window Key Rules

The Window Key is generated from a window key rule selected from the Key_List. Before you can apply key rules, you must first construct them.

To define window key rules

1. Click Advanced. Navigate to Window Key Rules.

2. Select a key file from the list of Key_List_01-30.

3. In Primary Field Name, select the primary field name you want to use in building the window key from the drop-down list.

4. In Number Characters Primary Field, specify the number of characters to use from the Primary Field Name.

5. In Primary Field Winkey Code, select conditions you want to apply to the primary field from the drop-down list.

Figure 10.1 Window Key Rules

In this example, the Key_List_10 rule is used to generate the window key as follows:

Use the first three characters of the postal code

Append the first character of the business name

Append the first character and subsequent consonants of the street name

Append a ‘1’ if this is a personal name, and a ‘2’ if this is a business name
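The Key_List_10 rule can be sketched in a few lines of Python to show how the pieces of a window key are assembled. This is an illustration, not the Window Key Generator itself. The function names are hypothetical, and the limit of two characters taken from the street name is an assumption inferred from the key 024CBR2 shown for the two Arlington sample records earlier in this chapter; in the product that count comes from the Number Characters Primary Field setting.

```python
VOWELS = set("AEIOU")


def first_char_and_consonants(text, length):
    """Keep the first letter, then only consonants, truncated to length."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    if not letters:
        return ""
    kept = letters[0] + "".join(ch for ch in letters[1:] if ch not in VOWELS)
    return kept[:length]


def key_list_10(postal_code, business_name, street_name, is_business,
                street_chars=2):
    """Assemble a window key following the four steps of the rule above."""
    return (
        postal_code[:3]                      # first three characters of postal code
        + business_name[:1].upper()          # first character of the business name
        + first_char_and_consonants(street_name, street_chars)
        + ("2" if is_business else "1")      # 1 = personal name, 2 = business name
    )


print(key_list_10("02476", "CENTER HOSPITAL", "BRATTLE", True))
print(key_list_10("02476", "CHEMIST ASSOCIATES", "BRANTWOOD", True))
```

Both sample records yield the same key, which is what places them in the same match window even though later matching rules reject them as a match.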

You can also specify a secondary window key. The secondary window key will be used if the conditions in Field Value Invoke Secondary Field are met.

You can create up to 30 window keys. The maximum window key length is 50 bytes.




6. Review the list of the fields, number of characters and the window key codes used in the generation of the window key.

Specify the Window Key Field

The Window Key Field determines where the generated window key will be placed on the output record.

To set window key fields

1. Navigate to Keys, Keys Settings, Source Key. Under Source Key, click on a cell and select the name of Key_List from the drop-down list that will appear (Key_List_01 - 30).

2. In Window Key Field Name, select the field name from the drop-down list. The generated window key will be placed into that field on the output record. In this example, the generated window key from KEY_LIST_10 will be placed into the field named WINDOW_KEY_01:

Figure 10.2 Window Key Field

Additional Settings

You can also specify these additional settings:



name, or enter the name of a file to receive debugging information.


1. Click Advanced and navigate to Process, Settings.



2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.



1. Click Advanced and navigate to Process, Settings.

2. In Settings File Encoding, select the encoding from the drop-down list.


To specify mask file

1. Click Advanced and navigate to Process, Settings.

2. In the Mask File text box, enter the path and file name for the mask file.

See “Window Key Generator” in the TS Quality Reference Guide for complete settings information.



Run the Window Key Generator and View Results

To run the Window Key Generator and view results

1. Click OK to close the Advanced Settings.

2. Click Run to run the Window Key Generator step.

3. Select OK.

4. On the Results tab, the Statistics sub-tab appears.

5. Navigate to the Output Settings tab and click the Data Browser icon to view the WINDOW_KEY_01 field.

Figure 10.3 Window Key Generated

6. Notice that all of the generated window keys end with a ‘2’. This means all of the records have been designated as business records.



Sorting the Record by the Window Key

After creating the window keys, but before running the Relationship Linker, the input records must be sorted by the window key. The Sort Utility is used to sort a file into the desired order. In this example, the output file from the Window Key Generator will be sorted by the WINDOW_KEY_01 field. The output file will have the extension .srt to indicate that the file has been sorted.

Input and Output Settings

The Sort Utility uses the output from the Window Key Generator step as input.


1. Open the Sorting Utility step.

2. Select the Input Settings tab.

3. Enter file names in the Input File Name and Input DDL Name text boxes.

4. Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.

OR


5. Navigate to the Output Settings tab.

6. Enter the Output File Name and Output DDL Name. The Output File Name should have the extension .srt to indicate that it is a sorted file.







To specify the output file qualifier

The File Qualifier is a unique name given to a data file. For the Sort Utility, the output data file must have a unique file qualifier (with .srt suffix).

1. Click Advanced and navigate to Output, Settings.

2. Specify Output Data File Qualifier. The default is OUTPUT.

See “Input and Output Settings” on page 9-2 for the optional input and output settings for the Sort Utility.

Process Settings

Once you have identified input and output files, you are ready to define the settings to process your data. The settings for processing are managed in the Advanced Settings window.

Specify Sort Fields

To specify sort fields and sort order

1. Click Advanced and navigate to Process, Settings.

2. Click Entry Settings.

3. Select the input DDL fields from the drop-down list in the Key box.

4. Select the sort order from the drop-down list in the Order box. Values are either Ascending Order or Descending Order.

Figure 10.4 Sort Field for Window Key

See “Additional Settings” on page 9-6 for the additional settings for the Sort Utility.



Run the Sorting Utility and Check Results

To run the Sort Utility and view results

1. Click OK to close the Advanced Settings.

2. Click the Run button to run the Sorting Utility.

3. Select OK.

4. On the Results tab, the Statistics sub-tab appears. The Sort Key Summary is shown on this tab.

Be sure the file to be used in the Relationship Linking step is sorted by the appropriate window key.




Using Relationship Linker

The Relationship Linker step identifies the relationships between records in a file at the business and consumer level. It can also identify whether duplicates exist in several files.

The Relationship Linker uses Comparison Routines to determine the level of similarity between records. The result of the comparisons is categorized as either Pass, Suspect, or Fail, based on the similarity of data elements.

There are two types of linking functions:

Window Linking—compares records to other records in the same file

Reference Linking—compares records in the input file to an existing reference file

For each linking, there are two levels of matching:

Consumer
Consumer Level 1 - Household level matching
Consumer Level 2 - Individual level matching

Business
Business Level 1 - Company level matching
Business Level 2 - Contact level matching

Comparison Routines are used to compare a variety of types of data including business names, personal names, and geographic components. For example, the ABSOLUTE routine compares two fields and looks for an exact match.
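The idea of an exact-match comparison feeding a three-way result can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the real Relationship Linker assigns scoring values per routine (see the Reference Guide), whereas this sketch simply counts how many fields match exactly. The function names, the field names, and the rule "all fields equal is Pass, none equal is Fail, anything else is Suspect" are hypothetical.

```python
def absolute_compare(value_a, value_b):
    """Exact-match comparison in the spirit of the ABSOLUTE routine."""
    return value_a == value_b


def categorize(record_a, record_b, fields):
    """Hypothetical categorization: not the product's scoring scheme."""
    matches = sum(absolute_compare(record_a[f], record_b[f]) for f in fields)
    if matches == len(fields):
        return "Pass"
    if matches == 0:
        return "Fail"
    return "Suspect"


fields = ["last_name", "house_number", "street_name"]
a = {"last_name": "NICOLI", "house_number": "25", "street_name": "LINNELL"}
b = {"last_name": "NICOLE", "house_number": "91", "street_name": "LINNELL"}
print(categorize(a, a, fields))
print(categorize(a, b, fields))
```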

The next chapter will explain how to change and tune the comparison routines. See TS Quality Reference Guide, Appendix C for a detailed description of Relationship Linker routines and their associated scoring values.


Linking Examples

This section contains detailed examples for each stage of matching, beginning with input data.

Example 1: Sample Input Data

Assume that you have the following input data:

----------------------------------------------------------------
Val’s Lube & Repair 105 Main St Tyngsboro Ma 01879
Val’s Lubrication Main St Tyngsboro Ma 01879
John C Nicoli 25 Linnell Cir Billerica Ma 01862
J C Nicoli 25 Linnell Cir Billerica Ma 01862
John Nicole 91 Linnell Cir Billerica Ma 01862
Chris J Nicoli 25 Linnell Cir Billerica Ma 01862
Val’s Lube Co 105 Main St Tyngsboro Ma 01879
C J Nicoli 25 Linnell Cir Billerica Ma 01862
Vasco Laboratories 13 Main St Tyngsboro Ma 01879
----------------------------------------------------------------

Example 2: Data With Appended Window Key

Create a window key (the last field) using Key_List_10. (See “Create Window Key Rules” on page 10-6 for the rules of Key_List_10.)

----------------------------------------------------------------------------
Val’s Lube & Repair 105 Main St Tyngsboro Ma 01879 018VA MAI2
Val’s Lubrication Main St Tyngsboro Ma 01879 018VA MAI2
John C Nicoli 25 Linnell Cir Billerica Ma 01862 018NICLIN1
J C Nicoli 25 Linnell Cir Billerica Ma 01862 018NICLIN1
John Nicole 91 Linnell Cir Billerica Ma 01862 018NICLIN1
Chris J Nicoli 25 Linnell Cir Billerica Ma 01862 018NICLIN1
Val’s Lube Co 105 Main St Tyngsboro Ma 01879 018VA MAI2
C J Nicoli 25 Linnell Cir Billerica Ma 01862 018NICLIN1
Vasco Laboratories 13 Main St Tyngsboro Ma 01879 018VA MAI2
----------------------------------------------------------------------------



Example 3: Data Sorted By Window Key
The input records must be sorted by the window key. The Relationship Linker matches records within the same window key set.

----------------------------------------------------------------------
John C Nicoli        25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
John Nicole          91 Linnell Cir  Billerica  Ma  01862  018NICLIN1
Chris J Nicoli       25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
J C Nicoli           25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
C J Nicoli           25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
Val’s Lube Co        105 Main St     Tyngsboro  Ma  01879  018VA MAI2
Vasco Laboratories   13 Main St      Tyngsboro  Ma  01879  018VA MAI2
Val’s Lubrication    Main St         Tyngsboro  Ma  01879  018VA MAI2
Val’s Lube & Repair  105 Main St     Tyngsboro  Ma  01879  018VA MAI2
----------------------------------------------------------------------

Example 4: Data Grouped by Matched Level 1 (Households)
After running the Relationship Linker, matched households would look like this:

----------------------------------------------------------------------
John C Nicoli        25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
Chris J Nicoli       25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
J C Nicoli           25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
C J Nicoli           25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
----------------------------------------------------------------------
John Nicole          91 Linnell Cir  Billerica  Ma  01862  018NICLIN1
----------------------------------------------------------------------
Val’s Lube Co        105 Main St     Tyngsboro  Ma  01879  018VA MAI2
Val’s Lubrication    Main St         Tyngsboro  Ma  01879  018VA MAI2
Val’s Lube & Repair  105 Main St     Tyngsboro  Ma  01879  018VA MAI2
----------------------------------------------------------------------
Vasco Laboratories   13 Main St      Tyngsboro  Ma  01879  018VA MAI2
----------------------------------------------------------------------
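The sort-and-group idea behind Example 3 can be sketched in a few lines of Python. This is only an illustration of sorting by window key and collecting window key sets; it is not how the Sort Utility or the Relationship Linker is implemented, and the function name window_key_sets is hypothetical. The records are a subset of the sample data, reduced to a name and a window key.

```python
from itertools import groupby
from operator import itemgetter

# (name, window key) pairs taken from the sample data above.
records = [
    ("Val's Lube & Repair", "018VA MAI2"),
    ("John C Nicoli", "018NICLIN1"),
    ("John Nicole", "018NICLIN1"),
    ("Vasco Laboratories", "018VA MAI2"),
]


def window_key_sets(rows):
    """Sort rows by window key and return {window key: [names]}."""
    ordered = sorted(rows, key=itemgetter(1))  # stable sort on the key
    return {key: [name for name, _ in group]
            for key, group in groupby(ordered, key=itemgetter(1))}


sets = window_key_sets(records)
for key, names in sets.items():
    print(key, names)
```

Only records that end up in the same set are candidates for comparison with each other, which is why the sort must happen before linking.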


Example 5: Data Grouped by Matched Level 2 (Individuals) in Matched Level 1 (Households)

After running the Relationship Linker, matched individuals in matched households would look like this:

-----------------------------------------------------------------------
*John C Nicoli        25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
 J C Nicoli           25 Linnell Cir  Billerica  Ma  01862  018NICLIN1

*Chris J Nicoli       25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
 C J Nicoli           25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
-----------------------------------------------------------------------
*John Nicole          91 Linnell Cir  Billerica  Ma  01862  018NICLIN1
-----------------------------------------------------------------------
*Val’s Lube Co        105 Main St     Tyngsboro  Ma  01879  018VA MAI2
 Val’s Lubrication    Main St         Tyngsboro  Ma  01879  018VA MAI2
 Val’s Lube & Repair  105 Main St     Tyngsboro  Ma  01879  018VA MAI2
-----------------------------------------------------------------------
*Vasco Laboratories   13 Main St      Tyngsboro  Ma  01879  018VA MAI2
-----------------------------------------------------------------------

* Indicates best record or 'survivor' of the match. See “Using the Create Common Utility” on page 12-3 to learn more about the best record and survivor record.

Example 6: Data Grouped by Suspect Level 1 (Households)
After running the Relationship Linker, suspect households would look like this:

-----------------------------------------------------------------------
 John C Nicoli        25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
 Chris J Nicoli       25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
 J C Nicoli           25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
 C J Nicoli           25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
 John Nicole          91 Linnell Cir  Billerica  Ma  01862  018NICLIN1
-----------------------------------------------------------------------
 Val’s Lube Co        105 Main St     Tyngsboro  Ma  01879  018VA MAI2
 Val’s Lubrication    Main St         Tyngsboro  Ma  01879  018VA MAI2
 Val’s Lube & Repair  105 Main St     Tyngsboro  Ma  01879  018VA MAI2
-----------------------------------------------------------------------
 Vasco Laboratories   13 Main St      Tyngsboro  Ma  01879  018VA MAI2
-----------------------------------------------------------------------



Example 7: Data Grouped by Suspect Level 2 (Individuals) within Suspect Level 1 (Households)

After running the Relationship Linker, suspect individuals in suspect households would look like this:

-----------------------------------------------------------------------
*John C Nicoli        25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
 J C Nicoli           25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
 John Nicole          91 Linnell Cir  Billerica  Ma  01862  018NICLIN1

*Chris J Nicoli       25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
 C J Nicoli           25 Linnell Cir  Billerica  Ma  01862  018NICLIN1
-----------------------------------------------------------------------
*Val’s Lube Co        105 Main St     Tyngsboro  Ma  01879  018VA MAI2
 Val’s Lubrication    Main St         Tyngsboro  Ma  01879  018VA MAI2
 Val’s Lube & Repair  105 Main St     Tyngsboro  Ma  01879  018VA MAI2
-----------------------------------------------------------------------
*Vasco Laboratories   13 Main St      Tyngsboro  Ma  01879  018VA MAI2
-----------------------------------------------------------------------


Window Linking
Window Linking compares records to other records in the same file. The records in a group are matched against each other, one window key set at a time.

Input and Output Settings
The Relationship Linker uses the output from the Sort Utility2 step as input.


1. Open the Relationship Linker step and select the Input Settings tab.



- OR -


4. Navigate to the Output Settings tab.

5. Enter file names in the Output File Name and Output DDL Name text boxes.

6. Optionally, specify a Linking File. A linking file indicates which matched records are linked together with common data. If you want to produce a linking file, identify the Linking Data File and Linking DDL File.









You can also specify the following settings.











1. Click Advanced and navigate to Input, Settings.



2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down list.





Basic Settings
You must specify the match method and the name form field.

To select match method

1. Click Advanced and navigate to Process, Settings.

2. In Match Method, select Window Matching from the drop-down list.

To specify name form field

1. Click Advanced and navigate to Process, Settings.

2. In Name Form Field, select the name form field from the drop-down list. The Name Form Field contains the Consumer/Business flag. This field is created by the Transformer or Customer Data Parser, and is used by the Relationship Linker to distinguish between consumer and business records.




The consumer/business flag must be the same for all records within a matching window. The flag values are:

Flag   Description
1      Consumer
2      Business

Field and Pattern Files
The Relationship Linker uses Field Files and Pattern Files in the linking process. The default files for your country are included in the TS Quality package.

Field Files - contain the fields to compare in the linking process.

Default Field Files:
\TrilliumSoftware\tsq10r5s\<project>\settings\
xxbus1fld.stx (business level 1)
xxbus2fld.stx (business level 2)
xxcon1fld.stx (consumer level 1)
xxcon2fld.stx (consumer level 2)
(xx = 2-digit country code)

Pattern Files - contain patterns or ‘report cards’ that determine the level of similarity between the records in the linking process. Each pattern is assigned a number and designated as a pass, suspect, or fail.

Default Pattern Files:
\TrilliumSoftware\tsq10r5s\<project>\settings\
xxbus1pat.stx (business level 1)
xxbus2pat.stx (business level 2)
xxcon1pat.stx (consumer level 1)
xxcon2pat.stx (consumer level 2)
(xx = 2-digit country code)
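The naming convention for the default field and pattern files is regular enough to be generated. The following sketch simply assembles the eight default file names from a country code; the function name default_settings_files is hypothetical, and "us" is an assumed example value for the xx placeholder.

```python
def default_settings_files(country_code: str) -> dict:
    """Build the default field and pattern file names for a country code."""
    files = {}
    for record_type in ("bus", "con"):      # business, consumer
        for level in (1, 2):
            stem = f"{country_code}{record_type}{level}"
            # Each entry holds (field file, pattern file).
            files[(record_type, level)] = (f"{stem}fld.stx", f"{stem}pat.stx")
    return files


# "us" is an assumed example country code, not taken from the guide.
names = default_settings_files("us")
print(names[("bus", 1)])
print(names[("con", 2)])
```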


To specify field and pattern settings

1. Click Advanced and navigate to Process, Field Pattern Settings.

2. Accept the default country specific settings files or select the customized settings file from the drop-down list.

See “Using the Relationship Linker Rule Editor” on page 11-12 to learn more about customizing the field and pattern files.

Window Key Field
The Relationship Linker tries to match records in the same window key set. Therefore, you must specify the window key field.

To specify window key field

1. Navigate to Process, Transaction Window Settings.

2. In Window Key Field, select the window key field you are using for matching. In this example, Window Key Field is set to WINDOW_KEY_01.

Figure 10.5 Window Key Field

Window Size
You can control how many records are added to the match window. If there are more records for one window key than the value specified, additional windows are created for the remaining records. For example, if a window key set contains 1000 records and you set the value to 500, additional match windows are created for the remaining 500 records.
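The effect of the maximum window size can be pictured as splitting one window key set into chunks. The sketch below assumes that windows are filled consecutively in sort order, which the guide does not state explicitly; the function name split_into_windows is hypothetical.

```python
def split_into_windows(records, max_window_size):
    """Split one window key set into consecutive windows of limited size."""
    if max_window_size < 1:
        raise ValueError("max_window_size must be at least 1")
    return [records[i:i + max_window_size]
            for i in range(0, len(records), max_window_size)]


# 1000 records sharing one window key, maximum window size 500.
windows = split_into_windows(list(range(1000)), 500)
print(len(windows), [len(w) for w in windows])
```

Under this assumption, records that fall into different windows are not compared with each other, so the window size is worth choosing with the largest expected key set in mind.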



To specify maximum window size

1. Click Advanced and navigate to Process, Transaction Window Settings.

2. In Maximum Window Size, specify a numeric value.

For additional window linking settings, see “Additional Settings” on page 10-28.

Run the Relationship Linker and View Results

To run the Relationship Linker and view results

1. Click OK to close the Advanced Settings and then click Run to run the Relationship Linker.

You can right-click on a step and select Run Selected.

2. Select OK.

3. On the Results tab, the Statistics sub-tab appears. Review the statistics for the Relationship Linker on the sub-tab.

4. Click Results Analyzer to view the record detail for the linking process. The output data set is sorted by matched individual number within matched household number within the window key.

The Results Analyzer allows the user to view the actual data and match results. We will explain this tool in detail in the next chapter.
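The output sort order described in step 4 (individual number within household number within window key) corresponds to a three-part sort key. The sketch below uses hypothetical field names and made-up match numbers purely to show the ordering.

```python
# Hypothetical output rows; the field names and numbers are illustrative only.
rows = [
    {"window_key": "018VA MAI2", "household": 3, "individual": 1, "name": "Val's Lube Co"},
    {"window_key": "018NICLIN1", "household": 2, "individual": 1, "name": "John Nicole"},
    {"window_key": "018NICLIN1", "household": 1, "individual": 2, "name": "Chris J Nicoli"},
    {"window_key": "018NICLIN1", "household": 1, "individual": 1, "name": "John C Nicoli"},
]


def output_order(row):
    """Sort key: window key, then household number, then individual number."""
    return (row["window_key"], row["household"], row["individual"])


ordered = sorted(rows, key=output_order)
for row in ordered:
    print(row["window_key"], row["household"], row["individual"], row["name"])
```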



Reference Linking
Reference Linking compares records in your input file to an existing reference file. It is mainly used to update new records within the existing master file in the database.

For example, suppose you’ve received a new set of records after running the initial linking. In this case, you would take the new records as your input file and the initial matched records as your reference file. You can compare the input file with the reference file and verify the existence of new records in the reference file, and update the file if necessary.

If a match is found, a matching key number is copied from the reference record to the input record. If no match is found, a new key number is generated and appended to the input record. The number of output records in reference linking is the same as the number of input records. You can use the matching key numbers to update the reference file.
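The key-handling behavior described above can be sketched as a simple loop. This is a conceptual illustration only: the function reference_link, the record layout, the name-equality match test and the key formats are all assumptions, and the real Relationship Linker decides matches with its field and pattern files.

```python
from itertools import count


def reference_link(input_records, reference_records, is_match, new_key):
    """Return one output record per input record, each carrying a key."""
    output = []
    for rec in input_records:
        # Take the first reference record that matches, if any.
        match = next((ref for ref in reference_records if is_match(rec, ref)), None)
        key = match["key"] if match is not None else new_key()
        output.append({**rec, "key": key})
    return output


# Hypothetical data: one existing reference record, two incoming records.
reference = [{"name": "John C Nicoli", "key": "H0001"}]
incoming = [{"name": "John C Nicoli"}, {"name": "Vasco Laboratories"}]
counter = count(1)

linked = reference_link(
    incoming,
    reference,
    is_match=lambda a, b: a["name"] == b["name"],   # stand-in for real rules
    new_key=lambda: f"NEW{next(counter)}",           # stand-in key generator
)
for rec in linked:
    print(rec["name"], rec["key"])
```

Note that the output list always has the same length as the input list, whether or not a match is found.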

See “Relationship Linker” in the TS Quality Reference Guide for detailed information on Reference Linking.



1. Open the Relationship Linker step and select the Input Settings tab.



- OR -




4. Click the Reference Match checkbox to enable reference matching. When this checkbox is checked, the Reference File and Reference DDL options become enabled.

5. Specify the Reference File and Reference DDL.

6. Navigate to the Output Settings tab.

7. Enter file names in the Output File Name and Output DDL Name text boxes.

8. You may also specify a Linking File. A linking file indicates which matched records are linked together with common data. If you want to produce a linking file, specify the Linking Data File and Linking DDL File.


To specify a second output file
A second output file for reference linking contains all records from the reference file that had a matching record in the input file.

1. Click Advanced and navigate to Reference, Output Settings.

2. Specify Reference Output Data File and Reference Output DDL File.

To specify the input/output file qualifiers
A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.

1. Click Advanced and navigate to Reference, Input Settings.

2. Specify Reference Input Data File Qualifier.

3. Click Advanced and navigate to Reference, Output Settings.

4. Specify Reference Output Data File Qualifier.

See the Window Linking section for the optional input and output settings.





Basic Settings
The steps for setting the Match Method, Name Form Field, Field Pattern, and Window Key are the same as those for window linking. See “Basic Settings” on page 10-20 for details.

Specify Matching Numbers
If a match is determined, a matching key number is copied from the reference record to the input record. You must specify the fields where those matching numbers are stored.

Reference Level 1 Number - Identifies the field in the reference file where existing level 1 numbers are stored. For records matched at level 1, this number in the reference file is copied to the input file.

Reference Level 2 Number - Identifies the field in the reference file where existing level 2 numbers are stored. For records matched at level 2, this number in the reference file is copied to the input file.

Reference Record ID - Identifies the field where record IDs are stored. This value must be unique across the reference file and the input file.

You must add these fields to the DDL file prior to attempting reference linking.

To specify matching numbers

1. Click Advanced and navigate to Process, Reference Matching. Enter the Reference Level1 Number field.

2. Enter the Reference Level2 Number field.

3. Enter the Reference Record ID field.

To specify numbers when there is no match
If an input record does not match any record in the reference file at level 1, it is assigned a number built from the Number Generation Start and Number Generation Cycle values.



1. Click Advanced and navigate to Process, Reference Matching.

2. In Number Generation Start, enter a starting number for unmatched new records, such as 0.

The first number generated will be this value plus 1.

3. In Number Generation Cycle, enter a text or numeric string, such as NM, to be added to the beginning of the Number Generation Start value. If you do not specify a value, the default is used. The default is YYDDD, where YY is the last two digits of the year and DDD is the day of the year, counted from January 1.
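To make the two settings concrete, the sketch below generates numbers from a start value and a cycle prefix, and shows how a YYDDD default could be derived from a date. It assumes plain concatenation of the prefix and an unpadded counter, and a zero-padded three-digit day of year; the guide does not specify the exact width or padding, and both function names are hypothetical.

```python
from datetime import date
from itertools import count


def default_cycle(today: date) -> str:
    """Assumed default cycle YYDDD: two-digit year plus day of year."""
    return today.strftime("%y%j")


def number_generator(start: int, cycle: str):
    """Yield cycle + number, beginning at start + 1 (assumed format)."""
    for n in count(start + 1):
        yield f"{cycle}{n}"


gen = number_generator(0, "NM")
print(next(gen), next(gen))
print(default_cycle(date(2006, 10, 1)))
```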

You can also specify the following additional settings.

To match all reference records
You can control whether to identify all matches when an input record matches more than one record in the reference file.


2. To enable all matches, check the box next to Reference File Match All. If this box is not checked, the Relationship Linker does not attempt to match any additional records in the reference file after matching one record.
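The difference between the two behaviors can be sketched as a search that either stops at the first match or keeps going. The function find_reference_matches, its match_all parameter and the sample records are hypothetical stand-ins for the Reference File Match All setting.

```python
def find_reference_matches(record, reference_records, is_match, match_all=False):
    """Return matching reference records; stop after one unless match_all."""
    matches = []
    for ref in reference_records:
        if is_match(record, ref):
            matches.append(ref)
            if not match_all:
                break  # setting unchecked: no further matching attempted
    return matches


refs = [
    {"id": 1, "name": "Nicoli"},
    {"id": 2, "name": "Nicoli"},
    {"id": 3, "name": "Nicole"},
]
same_name = lambda rec, ref: rec["name"] == ref["name"]

first_only = find_reference_matches({"name": "Nicoli"}, refs, same_name)
every_match = find_reference_matches({"name": "Nicoli"}, refs, same_name, match_all=True)
print([r["id"] for r in first_only], [r["id"] for r in every_match])
```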

To specify maximum window size
You can control how many records are added to the match window. If there are more records for one window key than the value specified, additional windows are created for the remaining records. For example, if you have 1000 records and set this value to 500, additional match windows are created for the remaining records.


2. In Maximum Window Size, specify a numeric value.


Display Match/Suspect Pattern IDs
If you want to display matched or suspect household/individual pattern IDs in the output, you can specify the fields in which to store those IDs.

To specify fields for match/suspect pattern IDs


2. In Reference Level1 Pass (Suspect) Pattern Field, specify a DDL field where Level 1 pattern IDs are written out for output.

3. In Reference Level2 Pass (Suspect) Pattern Field, specify a DDL field where Level 2 pattern IDs are written out for output.

Additional Settings
For both Window Linking and Reference Linking, you can configure these additional settings:



name, or enter the name of the file which will receive debugging information.




See “Relationship Linker” in the TS Quality Reference Guide for the complete settings information.






drop-down list.


To specify mask file

1. Click Advanced and navigate to Process, Settings.

2. In the Mask File text box, enter the path and file name for the mask file.

Run the Relationship Linker and View Results

To run the Relationship Linker and view results

1. Click OK to close the Advanced Settings.

2. Click the Run button to run the Relationship Linker.

You can right-click on a step and select Run Selected.

3. Select OK.

4. On the Results tab, the Statistics sub-tab appears. Review the statistics for the Relationship Linker on the sub-tab.

5. Click the Results Analyzer button to view the record detail for the linking process.

The Results Analyzer tool allows the user to view the actual data and match results. We will explain this tool in detail in the next chapter.



CHAPTER 11 Tuning the Linking Rules


The output of the Relationship Linking process is displayed in the Relationship Linker Results Analyzer. This tool allows you to view and analyze linked results. After viewing these results, you can determine if there is a need to customize the rules of the link process to meet your business requirements.


Use the Results Analyzer to view and analyze the results of the Relationship Linker process

Use the Rule Editor to analyze the linking rules and add a field to compare in the link process

Customize the field and pattern lists by adding fields and patterns to the process

Re-run the Relationship Linker using the new linking rules and view results

Use the Data Comparison Calculator to test the comparison routine and appropriate score



Using the Relationship Linker Results Analyzer

The Relationship Linker Results Analyzer displays linked records in a spreadsheet format. Once you have run the Relationship Linker, you can display the match results using this tool. You can browse the matched results and examine the data to see how records were initially matched. You can then decide if it is necessary to change the business rules to meet your requirements.

View the Linking Results

To start the Results Analyzer

1. Open the Relationship Linker step, and click the Results Analyzer button.

Figure 11.1 Launch Results Analyzer

2. The Relationship Linker Results Analyzer opens.



In the Results Analyzer, each column is titled with either a field from one of the match comparison list files or a key field from the DDL file. The individual record data is displayed in the horizontal row. View linked data by clicking the appropriate tab.

Figure 11.2 Relationship Linker Results View

Linked records are grouped together by color, alternating between blue and white. If a record is by itself and not above or below another record of the same color, then that record did not match any other record.

If all of the records are business records, as in the example above, no Consumer_Lev1 or Consumer_Lev2 records are displayed.

Records are displayed based on a specific key, depending on the currently selected tab. For example, if you are on the Business_Lev1 tab, looking at Business_Lev1 matches, then matches will be displayed based on the lev1_matched field.

When viewing the relationship linking results in the Results Analyzer, you can see Matched records as well as Suspect records:

Matched – Displays data with exact matches between records. All records met the requirements for pass patterns (patterns that begin with P).

Suspect – Lists data with the most likely matches between records. All records met the requirements for suspect patterns (patterns that begin with S).

To view matched and suspect records

1. Matched records at the Consumer/Business Level_1 and Level_2 are displayed by default.

2. To view Suspect records at the Consumer/Business Level_1, click the Suspect radio button. To execute this view, click the red exclamation point.

Figure 11.3 Switch Matched and Suspect Records

3. When you view suspect matches, the field for the matched level is highlighted in red and italicized, in addition to the field that contains the match key (highlighted in bold and in red). This shows how the matches at a suspect level compare with those at a matched level.

The field with the match key is in bold and highlighted in red to show that this key is being used to show matches.

Figure 11.4 Suspect View with Matched Level

4. To return to the Matched record view at the Consumer/Business Level_1, click the Matched radio button. To execute this view, click the red exclamation point.

5. If you want to review the Suspect records for Level_2, select the Suspect radio button next to Level_2. To execute this view, click the red exclamation point, and then select the Business_Lev2 tab.

Figure 11.5 Business Level View

Edit Fields to Display
In the Results Analyzer view, you can select and delete fields to display.

To select and delete fields to display

1. Select Tools, Browse More Fields.

2. The left window shows all Available Fields. Any field can be highlighted and dragged into the Selected Fields window. A field can also be highlighted and moved by clicking Add. If you want to move all fields, click Add All.

3. Click Show. Every field that is shown in the window will appear as a column in the main viewer.

4. To delete fields from the display, select those fields in the Selected Fields window.

If you select Show Standard Fields in the Format menu, it displays only the standard DDL fields.



5. Click Delete, then Show, to update the display.

Figure 11.6 Select Fields to Display

6. To search for a field, enter the field name in the Search text box. Click Show.

Save Fields to Display
You can also save a view of fields in this window. If you frequently look at the same fields in a file, saving a view can save time.

To save a view of fields

1. In the Browse More Fields window, select fields to display.

2. Click the Save button.

3. In the Save window, name the view, and then identify the desired location for the file.

Figure 11.7 Save Fields to Display

4. To view a stored view, select the name of the view from the drop-down menu in Select a Selected Customized View. The fields will be loaded in the Selected Fields window. Click Show to view the stored fields.

You can use Back and Forward on the Tools menu to display the previous or next view.

View Records in a Range
If your output file is very large, it is a good idea to search smaller subsets of your file. This will make the program run more quickly. You can select a range of records within the file to view.

To view records in a range

1. Enter a starting record number in the Browse Records From text box and an ending number in the To text box.



2. Click Go. Only records in the specified range will be displayed.

Figure 11.8 View Records in a Range

You can also use the Previous Block and Next Block buttons to browse the data. The program browses in “blocks”, based on the entered range. For example, if you entered a range of Record 6–10, then click Next Block, the program displays Record 11–15. If you click Previous Block, the program displays Record 1–5. There are also Previous Block and Next Block buttons displayed at the top and bottom of the vertical scroll bar on the right.
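The block arithmetic in this example can be written out explicitly. The sketch below reproduces the 6–10 example; the function names are hypothetical, and clamping the previous block at record 1 is an assumption about what happens at the start of the file.

```python
def next_block(first: int, last: int) -> tuple:
    """Shift the current range forward by its own size."""
    size = last - first + 1
    return first + size, last + size


def previous_block(first: int, last: int) -> tuple:
    """Shift the current range back by its own size, not below record 1."""
    size = last - first + 1
    new_first = max(1, first - size)  # assumption: clamp at the first record
    return new_first, new_first + size - 1


print(next_block(6, 10))
print(previous_block(6, 10))
```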

To view records by group size or pattern ID
You can view records in a group, either by group size or pattern ID.

1. Enter values in the Minimum Number of Members and/or Pattern Number text boxes.

Figure 11.9 Record Group Size or Pattern ID

2. Only matched groups that correspond to that value will be displayed. For example, if you enter 2 for the Minimum Number of Members and 100 for the Pattern Number, only matches in groups of two or more with the pattern number 100 will be displayed.

If you notice breaks in the record number sequence, it is because each record is either a Consumer or Business level record.

Figure 11.10 View Records by Pattern Number
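The combined filter from the example above (at least two members and pattern number 100) can be expressed as follows. The group structure, with one pattern ID and a member list per group, is a simplifying assumption for illustration, and filter_groups is a hypothetical name.

```python
def filter_groups(groups, min_members=1, pattern_number=None):
    """Keep groups with enough members and, optionally, a given pattern ID."""
    return [g for g in groups
            if len(g["members"]) >= min_members
            and (pattern_number is None or g["pattern_id"] == pattern_number)]


# Hypothetical matched groups.
groups = [
    {"pattern_id": 100, "members": ["rec 1", "rec 2"]},
    {"pattern_id": 100, "members": ["rec 3"]},
    {"pattern_id": 128, "members": ["rec 4", "rec 5", "rec 6"]},
]

shown = filter_groups(groups, min_members=2, pattern_number=100)
print(shown)
```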

For more detailed information about the Results Analyzer, see the Online Help.




Using the Relationship Linker Rule Editor

Once you have reviewed the data, you may want to add or change a field and a pattern in the match rules to meet your business requirements. For example, any records at the Contact level (Business_Lev2) that have the same Last_name and Account_number field should be positively matched together. You can use the Relationship Linker Rule Editor to change the match rules to achieve that goal.

The field and pattern list files used in the Relationship Linker process are displayed in the Relationship Linker Rule Editor.

View the Linking Rules

To start the Rule Editor

1. From the Relationship Linker step, click on the Rules Editor button on the bottom left.

Figure 11.11 Launch Relationship Linker Rule Editor

2. The Relationship Linker Rules Editor opens.



To view the Linking Rules

1. When you open existing field and pattern files, the Field List Editor (upper pane) and Grade Pattern Editor (lower pane) open automatically.

2. Select Tile Horizontally or Tile Vertically from the Window menu to view both field and pattern lists. You can view the Consumer or Business, Level 1 or Level 2 by clicking on the appropriate tab.

Figure 11.12 Relationship Linker Rule Editor

Click on the tab and view different levels of field and patterns.

Click on a column heading and drag it to the desired location to rearrange the columns.



The following table contains a list of columns in the Field List Editor window and Grade Pattern Editor window.

Column               Description

Field List Editor

Description          Describes all fields in the field settings file.
                     Double-click the cell to edit it.

Score A - E          Specify up to 5 grade thresholds. For example, the first
                     score is the threshold for grade A and the second score
                     is the threshold for grade B. A through D must be
                     positive; E can be positive or negative.

Comparison Routine   The Linker calls this routine to perform the field
                     comparison. Double-click the cell, select the desired
                     routine from the list, and click OK.

Propagation Routine  The Linker calls this routine to perform the comparison
                     propagation for this field. Double-click the cell and
                     select a routine from the drop-down list.

Field Name 1 - 3     Specify up to three fields for linking. Double-click the
                     cell to open the field name list and double-click the
                     desired field name.

Routine Modifier     Specify a value passed to a comparison routine. Each
                     routine uses a different number of modifiers; some use
                     none. Double-click the cell to open the list and
                     double-click a modifier.

Grade Pattern Editor

Category             Lists the pattern category: P (Pass), F (Fail), or
                     S (Suspect). Click inside the cell and select the
                     category from the drop-down menu.

Pattern ID           The pattern ID is a number ranging from 0 to 999. No
                     duplicates are allowed.


Field Name Columns   The remaining column headings take their names from the
                     Description column in the Field List Editor window. The
                     valid grades are A, B, C, D, and E. The hyphen (-)
                     represents a wildcard character. Click inside the cell
                     and select the grade from the drop-down menu.

Customize the Field and Pattern Lists
The Field List contains the fields which are compared in the link process. The Pattern List contains patterns used to determine the degree of similarity between records.

You can customize the linking process by adding fields and/or patterns to the process. For example, you can add a field and a pattern to the link rules so that any records at the Contact level (Business_Lev2) with the same Last_name and Account_number field will be positively matched together.

To add a field to the field list

1. In the Field List Editor, click last_name in the Description column.

2. Select Edit, Insert After Selected Row to add a row for a new field.

If you insert a new row in the Field List Editor, a new column is automatically inserted into the Grade Pattern Editor. Conversely, if you delete a row in the Field List Editor, the corresponding column in the Grade Pattern Editor is also deleted.

3. Double-click the Description column and add a description of account_number for this row.

4. Double-click the Score A column and add 100. This means that the Account_number values of two records are compared and must match at 100%.



5. Double-click in the Comparison Routine column and select partial1. partial1 is the routine used to compare the actual field data in the Account_number fields.

6. Double-click the Field Name 1 column and select Account_number as the field for comparison.

Figure 11.13 Account_number Field Added

To add a pattern to the pattern list

1. In the Grade Pattern Editor click Pattern ID 128 and select Edit, Insert Before Selected Row. The new pattern row is added.

2. In the Category column select a P for a positive match pattern (Pass). In the Pattern ID column give the pattern the number 400 as this is very different from the other patterns in the list.

The Pattern IDs 128 and 400 have no special meaning. They are used here as examples only.



3. Select an ‘A’ for the grade for the last_name field and for the grade for the account_number. The grade ‘A’ means Score A (100) for those fields.

Figure 11.14 Pattern ID 400 Added

4. Select File, Save to save the file. When asked Do you want to continue? select Yes. When asked Do you want to delete subsequent duplicate patterns? select No.

5. Close the Relationship Linker Rule Editor.

6. Close the Results Analyzer.

7. See “Checking Errors in the Field and Pattern Lists” on page 11-18 to verify any errors in the changes you have made.
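To show how score thresholds, grades A to E and grade patterns might fit together, here is a minimal sketch built around the last_name and account_number example and Pattern ID 400. Several points are assumptions rather than documented behavior: a score earns a grade when it is at least that grade's threshold, a pattern grade must equal the record's grade unless it is the hyphen wildcard, patterns are checked in list order, and a record pair that matches no pattern is treated as Fail. All function names are hypothetical.

```python
def grade(score, thresholds):
    """Return the first grade whose threshold the score reaches, else None."""
    for letter, threshold in zip("ABCDE", thresholds):
        if score >= threshold:
            return letter
    return None


def pattern_matches(pattern, grades):
    """A '-' in the pattern matches any grade; other positions must be equal."""
    return all(p == "-" or p == g for p, g in zip(pattern, grades))


def categorize(grades, patterns):
    """Return (category, pattern ID) of the first matching pattern."""
    for category, pattern_id, pattern in patterns:
        if pattern_matches(pattern, grades):
            return category, pattern_id
    return "F", None  # assumption: no matching pattern means Fail


# One pass pattern: grade A on last_name and grade A on account_number.
patterns = [("P", 400, "AA")]

# Hypothetical comparison scores for (last_name, account_number).
grades = [grade(100, [100]), grade(100, [100])]
print(grades, categorize(grades, patterns))
```

With only Score A set to 100 for account_number, any score below 100 earns no grade in this sketch, so the pair would not satisfy Pattern ID 400.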



Checking Errors in the Field and Pattern Lists
If you have made changes to the field and/or pattern file, make sure to run the Error Report. The program displays a message if it discovers a problem in the file, such as missing routines or duplicate pattern IDs. For example, the following grade pattern file has a duplicated pattern ID:

Figure 11.15 Duplicate Pattern ID

To check errors in the field and pattern lists

1. After the changes have been made, select Error Report from the Tools menu. If an error is found, you will receive an error message.

Figure 11.16 Error Message for Single Error

Pattern ID 102 is duplicated



2. The message prompts you to continue. If you click Yes, the error checking continues. You may see another error such as the one below.

Figure 11.17 Error Message for Additional Errors

3. This message tells you that some grade patterns are duplicates. Click Yes to remove all duplicates. Click No to leave the duplicates in the file.

4. Once you have deleted these duplicate patterns, a message appears confirming the deletion.

5. Select Save from the File menu to save the file.
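The duplicate pattern ID check can be pictured with a short sketch. This is only an illustration in Python, not the Error Report's actual implementation: the function name and the sample list of pattern IDs are hypothetical, and the real grade pattern file format is not shown here.

```python
from collections import Counter


def duplicate_pattern_ids(pattern_ids):
    """Return the pattern IDs that occur more than once, in ascending order."""
    counts = Counter(pattern_ids)
    return sorted(pid for pid, seen in counts.items() if seen > 1)


# Hypothetical pattern list in which Pattern ID 102 was entered twice.
pattern_ids = [100, 101, 102, 102, 128, 400]

for pid in duplicate_pattern_ids(pattern_ids):
    print(f"Pattern ID {pid} is duplicated")
```

A list without repeated IDs produces no messages, which corresponds to an Error Report that finds no duplicate pattern IDs.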

For more detailed information about the Relationship Linker Rule Editor, see the Online Help.

Re-Run the Relationship Linker and View Results

Once a change is made to the field or pattern list, the Relationship Linker process must be re-run. At this time, the Relationship Linker will use the new linking rules you defined.

To run the Relationship Linker with the new rules

1. Open the Relationship Linker step and click Run.

You can right click on a step and select Run Selected. This is an alternate way to run the step.


2. Click Results Analyzer to view the new results.

3. Click the Business_Lev2 tab to view the results of the new contact matching.

4. In the lower left corner, type 400 in the Pattern Number box and click OK. This will show only records that were matched using Pattern ID 400. Review the records. Notice that this new field and pattern were able to link records that use ‘nicknames’ in the first name field.

Figure 11.18 New Matching Results

Pattern Number box



Using the Data Comparison Calculator

The Data Comparison Calculator can help you determine the correct comparison routine and appropriate score for fields that you add to the match process. For example, you can test the difference between the ABSOLUTE and PARTIAL1 comparison routines and decide which routine you want to use.

The steps for testing the routines are the same for most of the comparison routines; the exceptions are SUBSTRING, DATE, ARRAY1, ARRAY2 and MXDNAME. This section shows the general steps for using these routines.

See the TS Quality Reference Guide, Appendix C for a detailed description of Relationship Linker routines and their associated scoring values.

To perform a comparison test

1. From the Relationship Linker Results Analyzer select Tools, Invoke Data Comparison Calculator.

2. The Data Comparison Calculator will open.

3. Enter a value for the first field in the Record 1, Field 1 text box, and then enter a value for the second field in the Record 2, Field 1 text box.

4. Highlight a routine in the Comparison Routines list. If the routine uses modifiers, they will appear in the Routine Modifiers box. Select a modifier from the list or highlight (none) (default).

5. Click Compare. The score appears in the Score box.

Example

In this example, two values in the Account_number field are compared using the ABSOLUTE and PARTIAL1 routines. ABSOLUTE compares two fields and looks for an exact match. Score 100 is an exact match, including blank vs. blank. PARTIAL1 compares two fields and looks for an exact match, but applies different scores for blanks. Score 100 is an exact match excluding blank vs. blank, 75 is blank field vs. non-blank field, and 65 is blank field vs. blank field.

Check the Match Case box if you are performing a case-sensitive comparison.

To run the ABSOLUTE and PARTIAL1 comparison routines

1. Type a sample Account_number into the Record 1 and Record 2 Field 1 boxes. Select the PARTIAL1 Comparison Routine from the Comparison Routines list.

Figure 11.19 Data Comparison Calculator

2. Click Compare. The score is 100. Change the Comparison Routine to ABSOLUTE and click Compare. The score is again 100.

3. Clear the Record 1 and Record 2 Field 1 boxes and click Compare. With ABSOLUTE, the score is again 100. Change the Comparison Routine to PARTIAL1 and click Compare. The score of a blank field compared to a blank field using PARTIAL1 is 65. This is an important distinction: we do not want two records with blank Account_number fields to be positively matched to each other.
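The scores from this exercise can be reproduced with a small sketch. The Python below is only an illustration of the scoring described in this section, not the product's implementation. The function names are hypothetical, and the score of 0 for values that do not match is an assumption, because the text above does not state it (see the TS Quality Reference Guide, Appendix C for the actual scoring values).

```python
def is_blank(value):
    """A field counts as blank when it is empty or contains only spaces."""
    return value.strip() == ""


def _normalize(value, match_case):
    # With Match Case unchecked, the comparison ignores letter case.
    return value if match_case else value.upper()


def absolute(field1, field2, match_case=False):
    """ABSOLUTE: 100 for an exact match, including blank vs. blank."""
    if is_blank(field1) and is_blank(field2):
        return 100
    same = _normalize(field1, match_case) == _normalize(field2, match_case)
    return 100 if same else 0  # 0 for a non-match is an assumption


def partial1(field1, field2, match_case=False):
    """PARTIAL1: exact match scores 100, but blanks get their own scores."""
    if is_blank(field1) and is_blank(field2):
        return 65  # blank field vs. blank field
    if is_blank(field1) or is_blank(field2):
        return 75  # blank field vs. non-blank field
    same = _normalize(field1, match_case) == _normalize(field2, match_case)
    return 100 if same else 0  # 0 for a non-match is an assumption


# Sample Account_number values (hypothetical).
print(partial1("A1001", "A1001"), absolute("A1001", "A1001"))
print(absolute("", ""), partial1("", ""))
```

The second line of output shows the distinction from step 3: two blank fields score 100 under ABSOLUTE but only 65 under PARTIAL1.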

For more detailed information on the Data Comparison Calculator, see the Online Help.



CHAPTER 12 Selecting the Best Record


The Create Common Utility lets you select the “best” record of a matched set of records (called the survivor) and copy data from that record to a field in the other records of the matched set. This selection process is defined by decision routines. You can commonize data in the current field or in a new field, using data that originates in another field.


Understand commonization and survivorship

Determine match key level settings

Identify common fields

Assign a survivor record

Run Create Common and view its results

Use the Data Browser to view the actual record data



Using the Create Common Utility

The Create Common Utility allows you to set options that copy data across a linked record set. This module has two major functions:

Commonization—Copy data in one field to other fields in records linked by a match key. You can commonize data in an existing field or in a new field. You can also commonize data sourced from another field.

Survivorship—Select a user-defined “survivor” record among a group of records, using survivor selection rules. This function flags a single record at any level, indicating the “best” record of the linked set.

The input data file must be sorted by match keys (such as LEV1_MATCHED) before it is processed by this module. If you run this module right after the Relationship Linker step, the input file is automatically sorted by the match keys. If you run this module separately, be sure to sort the input file by match keys.

Example

Assume that the best record is determined according to the most recent date in the Last_contact_date field. In this example, you want to copy the account representative information with the most recent contact date to the set of linked records, and then identify one account representative per business.

Commonize the account representative from the record that has the most recent Last_contact_date field.

Once the data is copied, place an indicator of ‘1’ into the Survivor_flag field for the record that has the most recent Last_contact_date.

This indicator will be used later to select the best records from the file.

You can use up to ten levels of output data from the Relationship Linker.



Input and Output Settings

The Create Common Utility uses the output from the Relationship Linker step as input to this step.


1. Open the Create Common step, and select the Input Settings tab.


3. Select the Output Settings tab.

4. Specify the Output File Name and Output DDL Name.

5. Specify a file name in the Statistics File Name and Process Log Name text boxes.




To specify maximum array records

You can specify the number of records held in memory for the Match Key Level 1 setting (the Match Key Level settings are described later in this chapter). The default is 10000.

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Maximum Array Records.








specifies the maximum number of records to process. By default, all records will be processed.














Process Settings

Once you have specified input and output files, you can specify the settings to process your data. The settings for processing are managed in the Advanced Settings window.

Match Key Level Settings

Match Key Level settings specify the field that holds the match key used to group records for evaluation. For example, business records that were matched together usually have the same LEV1_MATCHED number. Only records in the same group will be compared and evaluated.

To specify match key level setting

1. Click Advanced and navigate to Output, Level Settings.

2. In Key Field, select the match key from the drop-down list of DDL fields.

Figure 12.1 Match Key Level Settings

Common Fields

The Common Fields designate the decision routines used to copy data from one field into other fields in the records linked by a common key.

To specify common field

1. Navigate to Output, Common Fields.

A red flag indicates a REQUIRED field for this operation.

2. Specify values for the following settings:

Setting                      Description

Level ID                     Numeric value that specifies the level of commonization, for up to 10 levels of data hierarchy. For example: Level 1=business, Level 2=location, Level 3=contact, and so on.

Test Field                   Field that contains information necessary to commonize data across records. Works in conjunction with the decision routines.

Decision Routine Encoding    Type of encoding used by the Decision Routine.

Decision Routine             Defines what and how data is processed.

From Field                   Field that contains data which is modified or moved to the Target Field.

Target Field                 Field used to store data from the source field based on the decision routine.

Example

This example uses a decision routine called HIGHCHAR_NBNZ.

The HIGHCHAR_NBNZ routine commonizes the highest value (non-blank, non-zero) that occurs in the Last_contact_date field of all records at a record level of 1.

It copies the value in the Acct_rep field of the record with the most recent Last_contact_date (HIGHCHAR_NBNZ) and puts this value into the Common_rep field.

Figure 12.2 Common Fields Settings

Record 1 contains the highest value in the Last_contact_date field for match key 00000013, so the data in Acct_rep in Record 1 is commonized into the Common_rep field of Records 1 through 3. Record 4 plays the same role for match key 00000014:

          Input                                           Output
          LEV1_MATCHED   Last_contact_date   Acct_rep     Common_rep

Record 1  00000013       2005-03-17          JLS          JLS
Record 2  00000013       2003-01-07          BPL          JLS
Record 3  00000013       2004-02-08          JCN          JLS
Record 4  00000014       2005-01-18          KJP          KJP
Record 5  00000014       2003-11-09          MMR          KJP

Survivor Record

You can designate a survivor record from a group of records linked by a match key. Any record flagged as the survivor is assigned a flag number. The Assign Survivor function defines the test field, decision routine and target field for survivor identification.

To assign survivor

1. Navigate to Output, Assign Survivor.

2. Specify values for the following settings:

Setting                      Description

Level ID                     Numeric value that specifies the level of commonization, for up to 10 levels of data hierarchy. For example: Level 1=business, Level 2=location, Level 3=contact, and so on.

Test Field                   Field that contains information necessary to commonize data across records. Works in conjunction with the decision routines.

Decision Routine             Specifies which decision routine to use for the survivorship function.

Decision Routine Encoding    Type of encoding used by the Decision Routine.

Target Field                 Field used to store data from the source field when the create common rule is satisfied.

Assigned Value               Numeric value that is assigned to the survivor record.

Example

This example uses a decision routine called HIGHCHAR_NBNZ. Assume that the best record is the one with the most recent date (HIGHCHAR_NBNZ) in the Last_contact_date field. This record needs a survivor flag of ‘1’ in the Survivor_flag field to identify it as the best record for the LEV1_MATCHED grouping.

Figure 12.3 Survivor Settings

The HIGHCHAR_NBNZ routine looks for the highest character value (non-blank, non-zero) that occurs in the Last_contact_date field of all records at a record level of 1.

In this case, Records 1 and 4 contain the most recent contact date in their respective groups, so the program takes those records as the survivors. As a result, their Survivor_flag field is flagged with a ‘1’.

          Input                                           Output
          LEV1_MATCHED   Last_contact_date   Acct_rep     Common_rep   Survivor_flag

Record 1  00000013       2005-03-17          JLS          JLS          1
Record 2  00000013       2003-01-07          BPL          JLS
Record 3  00000013       2004-02-08          JCN          JLS
Record 4  00000014       2005-01-18          KJP          KJP          1
Record 5  00000014       2003-11-09          MMR          KJP
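The commonization and survivorship results in the table above can be reproduced with a short sketch. The Python below is an illustration only, not the Create Common Utility itself. The function names are hypothetical, and the behavior for a group with no eligible record (blank outputs) is an assumption. As described earlier, the input must already be sorted by the match key.

```python
from itertools import groupby

RECORDS = [
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2005-03-17", "Acct_rep": "JLS"},
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2003-01-07", "Acct_rep": "BPL"},
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2004-02-08", "Acct_rep": "JCN"},
    {"LEV1_MATCHED": "00000014", "Last_contact_date": "2005-01-18", "Acct_rep": "KJP"},
    {"LEV1_MATCHED": "00000014", "Last_contact_date": "2003-11-09", "Acct_rep": "MMR"},
]


def is_blank_or_zero(value):
    """True for a field that is blank or consists only of zeros."""
    return value.strip().strip("0") == ""


def create_common(records, key="LEV1_MATCHED", test="Last_contact_date",
                  source="Acct_rep", target="Common_rep", flag="Survivor_flag"):
    """Sketch of HIGHCHAR_NBNZ used for both commonization and survivorship."""
    output = []
    # groupby only groups adjacent rows, so the input must be sorted by the key.
    for _, rows in groupby(records, key=lambda r: r[key]):
        group = [dict(r) for r in rows]
        eligible = [r for r in group if not is_blank_or_zero(r[test])]
        # Highest character value among the non-blank, non-zero test fields.
        best = max(eligible, key=lambda r: r[test]) if eligible else None
        for r in group:
            r[target] = best[source] if best else ""  # blank is an assumption
            r[flag] = "1" if r is best else ""
        output.extend(group)
    return output


for row in create_common(RECORDS):
    print(row["LEV1_MATCHED"], row["Acct_rep"], row["Common_rep"], row["Survivor_flag"])
```

Because the dates are stored as YYYY-MM-DD text, the highest character value is also the most recent date, which is why a character routine works for this field.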





1. Click Advanced and navigate to Additional....

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default path and file name, or specify a different file to receive debugging information.

To count the number of records processed

1. Click Advanced and navigate to Additional....

2. In the Sample Count text box, specify the number that




1. Click Advanced and navigate to Additional....

2. In Settings File Encoding, select the encoding from the drop-down list.


See “Create Common” in the TS Quality Reference Guide for complete settings information.



Run the Create Common and View Results

To run the Create Common and view results

1. Click OK to close the Advanced Settings.

2. Click Run to run the Create Common Utility.


3. Select OK.

4. On the Results tab, the Statistics sub-tab appears.

5. Navigate to the Output Settings tab and click the Data Browser button next to the Output File Name.

6. In the Field Selection window, select the fields you used for the Create Common process, such as LEV1_MATCHED, Acct_rep, Last_contact_date, Survivor_flag, and Common_rep.

7. Click Display to see the records.

8. Notice that for one Business household, the Active record with the most recent Last_contact_date has a ‘1’ in the Survivor_flag field. All records in a Business Household have the Acct_rep from the record with the most recent Last_contact_date copied into the Common_rep field.

Figure 12.4 Create Common Results Displayed




Create Common Decision Routines

Decision Routines are the program rules and instructions used in the Create Common Utility. They control two functions:

How data is searched for and how commonization will function within the program

How records will be set up for survivorship

Routines marked “For commonization only” can’t be used to determine a surviving record.

Table 12-1: Create Common Decision Routines

Decision Routine Description

LOWEST Lowest numeric value for selected data field

LOWEST_NB Lowest non-blank numeric value for selected data field

LOWEST_NZ Lowest non-zero numeric value for selected data field

LOWEST_NBNZ Lowest non-blank/non-zero numeric value for selected data field

HIGHEST Highest numeric value for selected data field

HIGHEST_NB Highest non-blank numeric value for selected data field

HIGHEST_NZ Highest non-zero numeric value for selected data field

HIGHEST_NBNZ Highest non-blank/non-zero numeric value for selected data field

LOWCHAR Lowest character value for selected data field

LOWCHAR_NB Lowest non-blank character value for selected data field

LOWCHAR_NZ Lowest non-zero character value for selected data field

LOWCHAR_NBNZ Lowest non-blank/non-zero character value for selected data field

HIGHCHAR Highest character value for selected data field

HIGHCHAR_NB Highest non-blank character value for selected data field

HIGHCHAR_NZ Highest non-zero character value for selected data field

HIGHCHAR_NBNZ Highest non-blank/non-zero character value for selected data field

LEAST Least occurring value for selected field

LEAST_NB Least occurring non-blank value for selected field

LEAST_NZ Least occurring non-zero value for selected field

LEAST_NBNZ Least occurring non-blank/non-zero value for selected field

LITERAL The specified value of a selected data field. The value is given in parentheses (For Commonization Only). This example searches for the literal value 978-436-8900:
LITERAL (978-436-8900)
The literal value must be the same length as the test field. If spaces are required in the literal string, the entire LITERAL decision routine must be enclosed in quotes. In the line below, the literal value 978-436-8900 is preceded by four blanks, so the entire routine must be enclosed in quotes.
"LITERAL (    978-436-8900)"

LONGEST Compares the length of the test field data on one record against the length of the data in the same field on another record. System commonizes the longer of the two fields.
Field1 = Smith
Field2 = Smit
In this case, the contents of the test field, “Smith” (the longer of the two), is commonized.

MOST Most occurring value for selected data field

MOST_NB Most occurring non-blank value for selected data field

MOST_NZ Most occurring non-zero value for selected data field

MOST_NBNZ Most occurring non-blank/non-zero value for selected data field

SHORTEST Compares the length of the test field data on one record against the length of the data in the same field on another record. System commonizes the shorter of the two fields.
Test field = Smith
Test field = Smit
In this case, “Smit” (the shorter of the two) is commonized.

SURVIVOR Survivor Value Found in List (For Commonization Only)

Decision Routine Selections for a Single Field

In the examples below, we will consider 10 records, and how the content of those records applies to several decision routines.

Record #     Field Contents

Record 1     123
Record 2     123
Record 3     456
Record 4     ___
Record 5     ___
Record 6     ___
Record 7     000
Record 8     000
Record 9     000
Record 10    000

The following table shows sample decision routine results:

Routine        Searches for the                              To commonize field (Records)

HIGHEST        Highest numeric value                         456 (Record 3)
LOWEST         Lowest numeric value                          ___ (Records 4, 5 and 6)
LOWEST_NB      Lowest, non-blank numeric value               000 (Records 7-10)
LOWEST_NZ      Lowest, non-zero numeric value                ___ (Records 4, 5 and 6)
LOWEST_NBNZ    Lowest, non-blank, non-zero numeric value     123 (Records 1 and 2)
LEAST          Least occurring value                         456 (Record 3)
MOST           Most occurring value                          000 (Records 7-10)
MOST_NZ        Most occurring non-zero value                 ___ (Records 4, 5 and 6)
MOST_NBNZ      Most occurring non-blank, non-zero value      123 (Records 1 and 2)
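The selections in this table can be checked with a short sketch. The Python below is only an illustration of the rules shown in the table and covers only the HIGHEST, LOWEST, LEAST and MOST families. The function names are hypothetical, and ranking a blank field below every number (including zero) is an assumption inferred from the LOWEST row rather than a documented rule.

```python
from collections import Counter

BLANK = "   "  # a blank field, shown as ___ in the table above

RECORDS = ["123", "123", "456", BLANK, BLANK, BLANK, "000", "000", "000", "000"]


def is_blank(value):
    return value.strip() == ""


def is_zero(value):
    stripped = value.strip()
    return stripped != "" and set(stripped) == {"0"}


def eligible(values, suffix):
    """Apply the _NB, _NZ and _NBNZ suffixes by filtering out blanks and zeros."""
    skip_blank = suffix in ("NB", "NBNZ")
    skip_zero = suffix in ("NZ", "NBNZ")
    return [v for v in values
            if not (skip_blank and is_blank(v)) and not (skip_zero and is_zero(v))]


def numeric_key(value):
    # Assumption: a blank field ranks below every number, including zero.
    return (0, 0) if is_blank(value) else (1, int(value))


def select(routine, values):
    """Return the field contents a decision routine would pick."""
    base, _, suffix = routine.partition("_")
    pool = eligible(values, suffix)
    if base == "HIGHEST":
        return max(pool, key=numeric_key)
    if base == "LOWEST":
        return min(pool, key=numeric_key)
    counts = Counter(pool)
    if base == "MOST":
        return max(counts, key=counts.get)
    if base == "LEAST":
        return min(counts, key=counts.get)
    raise ValueError(f"routine not covered by this sketch: {routine}")


for name in ("HIGHEST", "LOWEST", "LOWEST_NB", "LOWEST_NZ", "LOWEST_NBNZ",
             "LEAST", "MOST", "MOST_NZ", "MOST_NBNZ"):
    print(name, repr(select(name, RECORDS)))
```

Ties between equally frequent values do not occur in this data set, so the sketch does not try to model how the product would break them.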



CHAPTER 13 Manipulating Your Data


In some cases you may want to manipulate and reconstruct data elements at certain stages of data processing. Use the Data Reconstructor to manage various data manipulation tasks. The Data Reconstructor is particularly useful when global data needs to be standardized into an identical format at the end of a project.

This chapter explains how to use the Data Reconstructor. You will perform these tasks:

Specify input, output, and DDL files for the Data Reconstruction step

Define specific Data Reconstruction rules for each country

Set the Use Rule

Run Data Reconstruction and view results

Use the Data Browser to view the reconstructed data

Generate a single file of all your global data



Using the Data Reconstructor

The Data Reconstructor is a flexible, rule-based data reconstruction program. It features a rich scripting language with conditional IF/ELSE capabilities and text manipulation. This scripting feature enables you to apply rule-based logic at any point in a job stream or real-time process.

The Data Reconstructor reconstructs addresses from a combination of data, elements, and postal matcher output fields. Reconstruction rules can be used to create an input file for a database or to create delivery address fields with specific size constraints.

Rules File

The Rules file is a plain text file that contains data reconstruction rules, which are constructed with a special scripting language. Country-specific rules files are included in the installation package. These rules use nested IF/ELSE logic that includes selection and conditional data reconstruction features.

A rules file can contain a single rule or many rules; however, only one rule can be executed at a time.

Default Rules Files:

C:\TrilliumSoftware\tsq10r5s\<project name> settings\xxdrrules.sto

‘xx’ is a two-letter country code such as ca, de, gb, or us.


A sample usdrrules.sto file might look like this:

    rule label_line
    #---------------------------------------#
    # Output Alignment Section
    #---------------------------------------#
    if(out.NEWADDRL4(1:5) = " ") then
        move out.NEWADDRL5, NEWADDRL4;
        move " " , NEWADDRL5;
    endif;
    endrule

In this sample, rule is the Rule keyword, label_line is the rule name, and endrule is the Endrule keyword.

Rule Script Language

The Data Reconstructor provides a rich script language to use when writing data reconstruction rules. You can combine existing data elements and literal values to create new data elements, based on markers you find within the record (such as Parser and Postal Matcher type fields and flag fields). You can use conditional logic to accommodate special factors when reconstructing your data. Rules can be either simple or complex, depending on your business, country, and language requirements.

Fields

Fields are used in the script language to reference input or output data fields (defined in the DDL files) and literal values. When used to refer to a data field, the field-name must exactly match the spelling and case of the name in the corresponding DDL file.

Syntax

    [in. | out. | IN. | OUT.] field name [[n:n] | (n:n) | [n:*] | (n:*)]
    OR
    "literal value" | 'literal value' | BLANKS | ZEROS | NULLS

Literal Values

Literal values are string constants that consist of any combination of characters enclosed within either double-quotation marks (") or single-quotation marks (').

A literal string must begin and end with the same type of quotation mark.

    'TS Quality'
    or
    "TS Quality"

If you need to include an actual quote character in the string, you can either enter it twice in a row or quote the entire string with the other quote character.

    "Mary said "you can quote me!""
    or
    'This is what Mary's friend said'

Although there is no practical limitation to the length of a literal value, this version of the Data Reconstructor limits the total combined size of all literals to 100 KBytes.

Data Reconstructor Rules

Reserved Words

The following words are reserved words; they have special meaning and cannot be used except for their intended purpose:

alphabetic           alphanumeric        append:0spaces   copy_all
is                   append:pack         append:2spaces   left_justify
left_justify:full    NULLS               numeric          perform
proper_case          proper_case:a       else             proper_case:A
proper_case:anyline  proper_case:g       lower_case       proper_case:G
proper_case:geography proper_case:n      then             proper_case:N
proper_case:name     proper_case:S       endif            proper_case:s
proper_case:street   right_justify:full  LT               AND
ends_with            OR                  title_case       and
EQ                   or                  endrule          append
GE                   OUT                 move             append_pack
GT                   Out                 upper_case       BLANKS
if                   pack                NE               CONTAINS
IN                   right_justify       ZEROS            contains
in                   rule                copy             LE
STARTS_WITH          ENDS_WITH

Precedence

Precedence controls which operators are executed first in an expression. Operators are grouped into the following levels (from highest to lowest):

Operator type           Keyword or symbol

Relational operators    GT, LT, GE, LE, <, >, <=, >=
Equality operators      EQ, NE, =, !=, <>, ==
String operators        contains, starts_with, ends_with, CONTAINS, STARTS_WITH, ENDS_WITH, IS
Logical AND operator    AND, and, &&
Logical OR operator     OR, or, ||

Example

In the following expression, relational operations are performed first (==, >= and <), followed by the logical AND operation, and finally the logical OR operation:

    if(state == "CA" or zip_code >= "10000" AND zip_code < "20000")
        //statement(s);
    endif;

Associativity

Associativity controls how operators at the same precedence level are grouped. All operations have left-to-right associativity.

Examples

The following expressions perform the same action:

Example 1

    if(prov == "NL" OR prov == "NS" OR prov == "PE" OR prov = "NB")
        move "Atlantic", out.region;

Example 2

    if(((prov == "NL" OR prov == "NS") OR prov == "PE") OR prov = "NB")
        move "Atlantic", out.region;

Comments

The Data Reconstructor recognizes three styles of comments:

C Style        Begin with /*, end with */ and include all characters in between. Comments can span multiple lines.
                   /* #... Example of C style comments. */
               Only C style comments can be embedded in the middle of a line.

C++ Style      Begin with // and extend to the end of the line. If multi-line comments are required, the comment portion of each line must begin with //.
                   //
                   //... This is an example of C++ style comments.
                   //

Shell Style    Similar to C++ style comments except that # is used instead of //. Comments begin with # and extend to the end of the line.
                   #
                   #... This is an example of shell style comments.
                   #
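The precedence and associativity rules described earlier in this section can be checked with a small sketch. Python happens to order comparisons, and, and or the same way for these expressions, so the unbracketed forms below behave like the rule script examples. The function names are hypothetical, and the values are compared as text, which is an assumption about how the fields are stored.

```python
def unbracketed(state, zip_code):
    # Relational tests first, then AND, then OR.
    return state == "CA" or zip_code >= "10000" and zip_code < "20000"


def same_grouping(state, zip_code):
    # The grouping that the precedence rules imply, written out in full.
    return (state == "CA") or ((zip_code >= "10000") and (zip_code < "20000"))


def or_first(state, zip_code):
    # Brackets change the result by forcing the OR to be evaluated first.
    return ((state == "CA") or (zip_code >= "10000")) and (zip_code < "20000")


def atlantic_flat(prov):
    return prov == "NL" or prov == "NS" or prov == "PE" or prov == "NB"


def atlantic_grouped(prov):
    # Left-to-right associativity groups the OR operations like this.
    return ((prov == "NL" or prov == "NS") or prov == "PE") or prov == "NB"


print(unbracketed("CA", "30000"), same_grouping("CA", "30000"), or_first("CA", "30000"))
print(all(atlantic_flat(p) == atlantic_grouped(p) for p in ("NL", "NS", "PE", "NB", "ON")))
```

For a California record with ZIP code 30000, the first two forms are true and the bracketed third form is false, which shows why brackets matter when the default order is not what you want.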



Input or Output Dictionary?

By default, the source_field in an action statement and the first field in an IF condition are assumed to be input fields (as defined in the input dictionary). Also, the destination_field in an action statement and the second field in an IF condition are assumed to be output fields (as defined in the output dictionary).

It is possible to override these assumptions in the following ways:

Prefix an input field with in. or IN.

Prefix an output field with out. or OUT.

Example

    move out.newline2, OUT.newline1;

You may declare your fields explicitly as input or output by always including the IN. or OUT. prefix (this will also improve your script’s readability):

    if(in.gout_fail_level != "0") then
        move in.line1, out.line1;
        move in.line2, out.line2;
        move in.line3, out.line3;
        move in.line4, out.line4;
    endif;

Selecting a Portion of a Field

The language has a built-in substring capability that allows you to select a portion of an input or output field by specifying a position and length after the field as [n:n].

The first n is the beginning position of the substring. The first character in a field is considered to be in position 1.

The second n is the length of the substring. “Length” can be specified as * to indicate the remainder of the field.

For example, each of these statements does the same thing:

    move "CANADA", OUT.newline4;
    move "CANADA", OUT.newline4[1:*];
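The [n:n] notation can be pictured with a short sketch. The Python below only illustrates the position and length rules described above (position 1 is the first character, and * means the remainder of the field). The function name and the sample field contents are hypothetical.

```python
def substring(field, start, length="*"):
    """Select part of a field using a 1-based start and a length or "*"."""
    if start < 1:
        raise ValueError("the first character of a field is position 1")
    begin = start - 1  # convert the 1-based position to a 0-based index
    if length == "*":
        return field[begin:]
    return field[begin:begin + length]


newline4 = "CANADA"
print(substring(newline4, 1))        # [1:*] selects the whole field
print(substring("BOSTON MA", 2, 3))  # [2:3] selects three characters from position 2
```

Note that the second number is a length, not an end position, so a field selected with [2:10] covers positions 2 through 11.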


Substring notation can only be used with DDL fields and cannot be used with literal values. For example, each of these statements will generate an error message:

    move BLANKS[1:10] , OUT.newline1; // will generate an error
    move "CANADA"[2:*], OUT.newline2; // will generate an error

Binary Data Strings

A binary string constant can be either octal or hexadecimal.

Hexadecimal: the first quote character must be preceded immediately by an upper or lower case ‘x’ and each character is represented by its equivalent two-digit hexadecimal value (range 00 - FF). A special case is made for x"CR" (carriage return), which is considered equivalent to x"0D", and x"LF" (line feed), which is considered equivalent to x"0A". For example: X'5368656C646F6E' or x"CRLF".

Octal: the first quote character must be preceded immediately by an upper or lower case ‘o’ and each character is represented by its equivalent three-digit octal value (range 000 - 377). For example: O"110141162164154151156147" or o'015012'.

Concatenating Literal Values

Literal values can be joined together using a plus sign (+) as an operator. This can be useful when you need to create a very long literal string or to make your scripts easier to understand.

Example

    move "----------------------------------" +
         "-------------------------------------" +
         "--------------------------------", dashed_line_120ch;
    move 'Network Pathways Inc., ' +
         'Suite 100-401, ' +
         '1600 Bedford Hwy, ' +
         'Bedford, NS, ' +
         'Canada B4A 1E8' , return_address;

BLANKS, ZEROS and NULLS

The BLANKS, ZEROS and NULLS keywords can be used to set a field entirely to blanks, zeros or binary zeros. They can also test to see if a field contains only blanks, zeros or binary zeros.


Whenever these keywords are used, a literal value is created dynamically with exactly the right number of blanks, zeros or NULLS to match the size of the other fields used in the expression.

If, for some reason, all fields in an expression are BLANKS, ZEROS or NULLS keywords, the length of the resulting literal values will be one.

Examples

In this example, all fields used within the IF-conditions are one byte long:

    if(BLANKS == BLANKS) then
        // always true
    endif;

    if(BLANKS == ZEROS) then
        // always false
    endif;

In this example, the length of the BLANKS literal will be ten bytes to match the 10-byte substring selected from the 30-byte city field using the city[2:10] notation.

    if(city[2:10] == BLANKS) then
        // characters 2 through 11 of city are blank
    endif;

‘IF’ Statements

These statements allow you to add conditional logic in your scripts to choose between two or more options. For instance, you can choose to build an output address from Postal Matcher fields or from original input data based on Parser and Postal Matcher flags.

Syntax

    IF [condition [and/or/AND/OR] condition] THEN
        [action statement]
    [else
        action statement;]
    ENDIF;

IF statements consist of three parts:

the condition(s) to be evaluated

the action_statement(s) to execute when the condition(s) are TRUE

the action_statement(s) to execute when the condition(s) are FALSE.

When conditions evaluate as TRUE, the action_statement(s) following the conditions are executed; otherwise, the action_statement(s) that follow the else keyword are executed. Conditions must be enclosed in round brackets. The then keyword is optional and can be omitted, or included to improve readability.

When two fields of unequal lengths are compared, the comparison is made as if the shorter of the two fields was padded with blanks to match the length of the larger field.

Example

If the field urban_city_name was 20 bytes long, the following two conditions would be the same:

    if(urban_city_name == "BOSTON")
    if(urban_city_name == "BOSTON              ")

Conditions

The program conditions include four relational conditions, two equality conditions and six string conditions, as shown in Table 13.1:

Table 13.1 Data Reconstructor Rules Conditions

Relational Conditions Description

field1 GT field2field1 > field2

Greater Than – True if field1 is greater than field2

field1 GE field2field1 >= field2

Greater Than Or Equal ToTrue if field1 is greater than field2 or field1 is equal to field2

field1 LT field2field1 < field2

Less ThanTrue if field1 is less than field2

field1 LE field2field1 <= field2

Less Than Or Equal ToTrue if field1 is less than field2 or field1 is equal to field2


13-12 Associativity

Equality Conditions Description

field1 EQ field2field1 == field2field1 = field2

Equal ToTrue if field1 is equal to field2

field1 NE field2field1 != field2field1 <> field2

Not Equal ToTrue if field1 is not equal to field2

String Conditions Description

field1 is numeric String is Numeric – True if field1 contains only numerics.

Leading and trailing blanks are trimmed from the field before making the comparison. The IS_DIGIT_FNAME table indicates the alphabetic characters.

field1 is alphabetic String is Alphabetic – True if field1 contains only alphas.

Leading and trailing blanks are trimmed from the field before the comparison is made. The IS_ALPHA_FNAME table indicates the alphabetic characters.

field1 is alphanumeric String is Alphanumeric – True if field1 contains only alphas or numerics.

Leading and trailing blanks are trimmed from the field before the comparison. IS_ALPHA_FNAME and IS_DIGIT_FNAME specify alphabetic and numeric characters.

field1 CONTAINS field2field1 contains field2field1 ~= field2

String Contains – True if field2 is found anywhere within field1. Leading and trailing blanks are trimmed from both fields before comparisons.

field1 STARTS_WITH field2field1 starts_with field2field1 ~< field2

String Starts With – True if field1 starts with field2. Leading and trailing blanks are trimmed from both fields before comparisons.

field1 ENDS_WITH field2field1 ends_with field2field1 ~> field2

String Ends With – True if field1 ends with field2. Leading and trailing blanks are trimmed from both fields before comparisons.

Table 13.1 Data Reconstructor Rules Conditions
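The trimming behavior of the string conditions in the table above can be easier to see in a small model. The following Python sketch is only an illustration and is not the Data Reconstructor's implementation. The function names are hypothetical, the character sets are assumed ASCII defaults standing in for the IS_ALPHA_FNAME and IS_DIGIT_FNAME tables, and the handling of case and of all-blank fields is an assumption that the guide does not specify.

```python
import string

# Assumed default character sets. The product takes these from the
# IS_ALPHA_FNAME and IS_DIGIT_FNAME tables or the operating system settings.
ALPHAS = set(string.ascii_letters)
DIGITS = set(string.digits)


def _trim(field):
    """Remove leading and trailing blanks, as the conditions do."""
    return field.strip(" ")


def is_numeric(field):
    """Model of 'field1 is numeric' (an all-blank field is assumed False)."""
    value = _trim(field)
    return bool(value) and all(ch in DIGITS for ch in value)


def is_alphabetic(field):
    """Model of 'field1 is alphabetic' (an all-blank field is assumed False)."""
    value = _trim(field)
    return bool(value) and all(ch in ALPHAS for ch in value)


def contains(field1, field2):
    """Model of 'field1 CONTAINS field2' (assumed case-sensitive)."""
    return _trim(field2) in _trim(field1)


def starts_with(field1, field2):
    """Model of 'field1 STARTS_WITH field2' (assumed case-sensitive)."""
    return _trim(field1).startswith(_trim(field2))


def ends_with(field1, field2):
    """Model of 'field1 ENDS_WITH field2' (assumed case-sensitive)."""
    return _trim(field1).endswith(_trim(field2))
```

Because both operands are trimmed first, a blank-padded fixed-width field such as "FLEMING   " still satisfies an ends_with "ING" test in this model.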



Example

This is an example using all twelve conditions.

if(zip_code GT "10000" AND
   zip_code LT "50000" AND
   pr_rev_group GE "008" AND
   pr_rev_group LE "010" AND
   pr_gout_fail_level == "0" AND
   state != "NY" AND
   first_name starts_with "PH" AND
   last_name ends_with "ING" AND
   company_name contains "TAXI" AND
   in.birth_date is numeric AND
   postal_code[1:1] is alphabetic AND
   company_name is alphanumeric ) then
    move "1", flag;
else
    move "0", flag;
endif;

Logical Operators

IF conditions can be combined using logical AND and OR operators to create compound conditions.

The order of evaluation of compound conditions is described in the section “Precedence and Associativity.” (See “Rules File” on page 13-3.) The usual order can be altered using brackets to group the conditions to be evaluated first. See the following example.

Table 13-2: Data Reconstructor Logical Operators

Logical Operators Description

condition1 AND condition2, condition1 and condition2, condition1 && condition2

Logical AND. TRUE only if both condition1 and condition2 are TRUE.

condition1 OR condition2, condition1 or condition2, condition1 || condition2

Logical OR. TRUE if either condition1 or condition2 is TRUE.

Example

if((pr_rev_group == "000" OR pr_rev_group == "009") AND pr_gout_fail_level == "0") then
/* Construct a new address from postal matcher output fields */

Nested ‘IF’ Statements

You can create nested IF statements in which one IF statement is embedded within another.

IF [condition1]
    IF [condition2]
        [Action statement1]
    ELSE
        [Action statement2]
    ENDIF
ELSE
    [Action statement4]
ENDIF

Example

The sample shows one rule definition called LABEL1, which will either populate output fields nwaddrl3 or nwaddrl4, depending on whether the field gb_out_dpndthorough_name is blank or not, as long as the record had a match level of 0. Both nwaddrl3 and nwaddrl4 fields are populated if there was data in the dependent thoroughfare name field.

rule LABEL1
if(gb_out_match_level = "0") then
    if(gb_out_dpndthorough_name <> BLANKS) then
        move gb_out_house_number, nwaddrl3;
        append gb_out_dpndthorough_name, nwaddrl3;
        append gb_out_dpndthorough_desc, nwaddrl3;
        append gb_out_thorough_name , nwaddrl4;
        append gb_out_through_desc , nwaddrl4;
    else
        move gb_out_house_number , nwaddrl3;
        append gb_out_thorough_name , nwaddrl3;
        append gb_out_through_desc , nwaddrl3;
    endif;
endrule
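To make the branching in LABEL1 concrete, the following Python sketch models the same nested decision on a record held as a dictionary. It is an illustration only: the function names are hypothetical, append is simplified to "join with one blank, or move if the destination is empty", and fixed field widths and truncation are ignored. The field names are taken from the rule as printed.

```python
def append(dest, source):
    """Simplified model of append: one blank separator, or a move if dest is empty."""
    dest = dest.rstrip(" ")
    source = source.strip(" ")
    return source if not dest else dest + " " + source


def label1(rec):
    """Model of rule LABEL1; returns the two output lines as a dictionary."""
    out = {"nwaddrl3": "", "nwaddrl4": ""}
    if rec["gb_out_match_level"] == "0":
        if rec["gb_out_dpndthorough_name"].strip(" "):
            # Dependent thoroughfare present: it goes on line 3, street on line 4.
            out["nwaddrl3"] = rec["gb_out_house_number"].strip(" ")
            out["nwaddrl3"] = append(out["nwaddrl3"], rec["gb_out_dpndthorough_name"])
            out["nwaddrl3"] = append(out["nwaddrl3"], rec["gb_out_dpndthorough_desc"])
            out["nwaddrl4"] = append(out["nwaddrl4"], rec["gb_out_thorough_name"])
            out["nwaddrl4"] = append(out["nwaddrl4"], rec["gb_out_through_desc"])
        else:
            # No dependent thoroughfare: house number and street share line 3.
            out["nwaddrl3"] = rec["gb_out_house_number"].strip(" ")
            out["nwaddrl3"] = append(out["nwaddrl3"], rec["gb_out_thorough_name"])
            out["nwaddrl3"] = append(out["nwaddrl3"], rec["gb_out_through_desc"])
    return out


# Invented sample record, for illustration only.
sample = {
    "gb_out_match_level": "0",
    "gb_out_house_number": "12",
    "gb_out_dpndthorough_name": "CHURCH",
    "gb_out_dpndthorough_desc": "MEWS",
    "gb_out_thorough_name": "HIGH",
    "gb_out_through_desc": "STREET",
}
```

In the real rule, a record whose match level is not 0 leaves the output fields untouched; the model simply returns them empty.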

Action Statements

Syntax

verb [:modifier] [source field] [,] [destination field] ;
- or -
perform rule_name;

Some action statements may include a modifier that changes their operation slightly. The modifier must immediately follow the verb and be delimited from it with a single colon.

For example, the append:2spaces statement works like the append statement, with the exception that two spaces are used for a delimiter instead of one. The comma separating the source-field from the destination-field is optional.

Specific action statements take no, one, or two arguments, as described in the following sections.

Action Statements – Require No Arguments

There is one action statement that requires no arguments:

Statements Description

copy_all

Copies all corresponding input fields to output fields. Fields are considered to correspond if they have the same name in both input and output DDL files. Any output fields that do not correspond to input fields are reset to blanks.

When DDL names are the same, copy_all moves the entire input record to the output record as an optimization, instead of performing field-by-field moves.

The copy_all statement resets to blanks any output field that has no corresponding input field. For this reason, it should always be used at the beginning of your script.

Action Statements – Require One Argument

The script language has six action statements that require a single argument. In each of these statements, the lone argument is used to specify a destination field and cannot be a literal value.


pack Removes all blank characters from the destination field.

upper_case Converts all of the characters in the destination field to upper case.

lower_case Converts all of the characters in the destination field to lower case.

title_case Converts all characters in the destination field to a mix of upper and lower case. The first alphabetic character (and any alphabetic character that follows a non-alphabetic one) is converted to upper case; the remaining characters are converted to lower case.

A special exception is made for apostrophe-s, which is converted to lower case. For example, "MARY-JANE'S BAKERY" would be changed to "Mary-Jane's Bakery".

right_justify Right-justifies the contents of the destination field. Removes any trailing blanks from the contents of the field.

left_justify Left justifies the contents of the destination field. Removes any leading blanks.

right_justify:full Right justifies the contents of the destination field and converts each occurrence of multiple blanks to a single blank. For example, for a 20-character field containing this value:

"EXPIRY 20001127 "

right_justify:full produces:

" EXPIRY 20001127"

left_justify:full Left justifies the contents of the destination field and converts each occurrence of multiple blanks to a single blank. For example, for a 20-character field containing this value:

" THE PIT STOP "

left_justify:full produces:

"THE PIT STOP "

proper_case

Converts all characters in the destination field to a mix of upper and lower case using an external UPLOW table. When no corresponding entries are found in the UPLOW table, the destination field is still converted to mixed upper/lower case using title_case logic.

proper_case:a, proper_case:A, proper_case:anyline

Indicates that the proper_case statement is not for any specific line type. Only the ("A") line-type entries in the UPLOW table will be searched. This is the default operation when no modifier is specified.

proper_case:n, proper_case:N, proper_case:name

proper_case for a field containing name information. Searches the ("N") line-type entries in the UPLOW table, followed by the ("A") line-type entries if a match was not found in the "N" entries.

proper_case:s, proper_case:S, proper_case:street

proper_case for a field containing street information. Searches the ("S") line-type entries in the UPLOW table, followed by the ("A") line-type entries if a match was not found in the "S" entries.

proper_case:g, proper_case:G, proper_case:geography

proper_case for a field containing geography information. Searches the ("G") line-type entries in the UPLOW table, followed by the ("A") line-type entries if a match was not found in the "G" entries.

perform Causes all statements in another rule to be executed.

Example: perform fix_name_line

This will execute all statements in the previously defined “fix_name_line” rule in the same rule-file. You can only perform a rule that has already been defined within the rule file.
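Two of the one-argument statements in the table above, title_case and the :full justify modifiers, are described precisely enough to model. The Python sketch below is an illustration under stated assumptions and not the product's code: the function names are hypothetical, alphabetic characters are decided by Python's str.isalpha rather than by the alphabetic characters table, and the sample field values are invented because the blank runs in the printed examples cannot be recovered.

```python
import re


def title_case(text):
    """Upper-case each alphabetic that starts the field or follows a
    non-alphabetic character, lower-case the rest, then apply the
    apostrophe-s exception."""
    out = []
    prev_alpha = False
    for ch in text:
        if ch.isalpha():
            out.append(ch.lower() if prev_alpha else ch.upper())
            prev_alpha = True
        else:
            out.append(ch)
            prev_alpha = False
    result = "".join(out)
    # Exception: a lone S after an apostrophe (straight or curly) stays lower case.
    return re.sub(r"(?<=[A-Za-z])(['\u2019])S\b", lambda m: m.group(1) + "s", result)


def left_justify_full(field):
    """Collapse runs of blanks to one blank and pad on the right to the field width."""
    return " ".join(field.split()).ljust(len(field))


def right_justify_full(field):
    """Collapse runs of blanks to one blank and pad on the left to the field width."""
    return " ".join(field.split()).rjust(len(field))
```

For a 20-character field, left_justify_full("  THE   PIT STOP".ljust(20)) keeps the width at 20 while leaving a single blank between words.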




Action Statements – Require Two Arguments

The first argument specifies a source-field or a literal value and the second argument specifies the destination-field.


copy Copies the contents of one field to another, adjusting the data type (if necessary) to match the description of the output field in the DDL. If the first argument is a literal, a move operation is performed instead of a copy.

move Moves one text field to another. Unlike copy, no conversion from one data type to another is attempted. If source-field is longer than destination-field, it is truncated during the move. If source-field is shorter than destination-field, the destination-field is padded with blanks after the move.

append Appends the contents of one field to the end of the contents of another field, after first adding a single blank character as a separator.

If the destination-field is currently empty (all blanks) then a move operation is performed instead of append. This makes it possible to perform a series of append operations on the same destination-field without creating unwanted blanks at the beginning of the field.

If there is not enough room at the end of the destination-field, the source-field will be truncated to fit. There must be at least 2 blanks at the end of the destination-field before an append operation will be attempted.

append_pack, append:pack, append:0spaces

Works like the append statement, but without the blank separator. Appends the contents of one field directly to the end of the contents of another field.

There must be at least 1 blank at the end of the destination-field before this operation will be attempted.

append:2spaces

Appends the contents of one field to the end of the contents of another field after first adding two blank characters as a separator. May be required in some countries (e.g. Canada) to separate the postal-code from the remainder of the line.

If there is not enough room at the end of the destination-field, the source-field will be truncated to fit. There must be at least 3 blanks at the end of the destination-field before an append:2spaces operation will be attempted.

Over-lapping Fields

Move, append and append_pack/append:pack operations with source and destination-fields that overlap in memory are fully supported. These operations are completed as if a temporary copy of the source-field had been made before the operation started.

Example

move "TRILLIUM", out.temp;
move out.temp[2:4], out.temp[1:4];
// following this move the out.temp
// field will contain "RILLIUM"

String Variables

String variables must be declared with a STRING keyword before they are used. String variables can be declared in two places:

At the beginning of the rules file, before any rules are defined

At the beginning of a specific rule, before the first action statement

String variables have specific properties:

Names begin with a dollar sign: $NAME

Names are case-sensitive ($NAME vs. $name vs. $Name)

They have a default length of 256 characters, unless a different length is specified at the time they’re first declared

Example

STRING $LAST_NAME[30];     // 30ch long
STRING $last_name;         // 256ch long
STRING $BigBuffer[10000];  // 10,000ch long

String variables may be used any place in a rule that a DDL field name can be used.

rule sample1
STRING $name[50];
move in.first_name, $name;
append in.last_name, $name;
move $name, out.full_name;
endrule;

Input and Output Settings

In this example, the Data Reconstructor uses the output from the Create Common Utility step as input. By using the same output DDL (global DDL) for all country-specific data, you can standardize the global data into the same format.

The global DDL contains these fields:

Original name and address fields

Country-specific Postal Matcher match level codes

Reconstructed name and address fields based on new standardized, enriched and linked data fields (NEWADDRESSL1 - 10)

Original user-defined fields

1. Open the Data Reconstructor step and select the Input Settings tab.

3. Select the Output Settings tab.

4. Specify the Output File Name and Output DDL Name.

Enter a file name in the Statistics File Name and Process Log Name text boxes.






specifies the maximum number of records to process. By default, all records will be processed.

















Settings for the Data Reconstructor

Once you have specified input and output files, you can specify the settings used to process your data. The settings for processing are managed in the Advanced Settings window.

Setting the Rules File

To specify the rules file

1. Click Advanced and navigate to Process, Settings.

2. Specify the Rules File.

Example: C:\TrilliumSoftware\tsq10r5s\tmt\settings\usdrrules.sto

Setting the Use Rule

Each rules file can contain a number of rules available for use. Each rule begins with the rule keyword and ends with the endrule keyword. You must specify which rule you are using.


Example

A sample usdrrules.sto file might look like this:

rule label_line
#---------------------------------------#
# Output Alignment Section
#---------------------------------------#
if(out.NEWADDRL4(1:5) = " ") then
    move out.NEWADDRL5, NEWADDRL4;
    move " " , NEWADDRL5;
endif;
endrule

To specify the Rule

1. Click Advanced and navigate to Process, Settings.

2. Specify the rule name in the Use Rule field. For the example above, enter “label_line”. If you are using multiple rules in the Rules File, place a comma after each rule.

Figure 13.1 Rule Settings (callouts: Rule Keyword, Endrule Keyword, Rule Name, Use Rule, Rule File)




To use alphabetic characters table

You can include a table which identifies characters that are alphabetic characters. This setting may be required for the special characters found in many languages. When this table is not specified, your operating system’s default settings will be used.

This table is used by the “is alphabetic”, “is alphanumeric”, proper_case and title_case rules.

1. Click Advanced and navigate to Process, Settings.

2. In Alpha Defines Table, enter the path and file name of your alphabetic character table.

To use numeric digits table

You can include a table which identifies characters that are numbers. This setting may be required for the special characters found in many languages. When it is not specified, your operating system’s default settings will be used.

This table is used by the “is numeric” and “is alphanumeric” rules.

1. Click Advanced and navigate to Process, Settings.

2. In Numeric Defines Table, enter the path and file name of your numeric digit table.

To use lowercase/uppercase translation table

You can include a table used to translate characters to all lower or upper case. This setting may be required for the special characters found in many languages, or to convert from one code page to another. When this table is not specified, your operating system’s default settings will be used.

This table is used by the proper_case, title_case, and lower_case/upper_case rules.

1. Click Advanced and navigate to Process, Settings.

2. In Lowercase Translation File or Uppercase Translation File, enter the path and file name of your table.

See “Data Reconstructor” in the TS Quality Reference Guide for complete settings information.



name, or specify the file to which debugging information will be written.







drop-down list.




Run Data Reconstruction and View Results

To run Data Reconstructor and view results

1. Click OK to close the Advanced Settings.

2. Click Run to run the Data Reconstruction step.

3. Select OK at the Message box.

4. On the Results tab, the Statistics sub-tab appears.

5. Navigate to the Output Settings tab and click the Data Browser button next to the Output File Name.

6. In the Field Selection window, select the fields you used for the Data Reconstruction process, such as NEWADDRL1 - NEWADDRL5.

7. Click Display to review the data and ensure that it has been reconstructed properly.




Bringing the Data Together

The final step in the project uses the Transformer to merge multiple files into a single output file of global data. On input, select only the ‘survivor’ records for inclusion in the single output data file. On output, you will have a file of ‘survivor’ records with the accurate account representative assignment and the format required for each country.

Add a Global Transformer step

To merge data from multiple countries together, you must add a Global Transformer step to the project.

To add a Transformer step to the project

1. From the Main menu, click Edit, Add new project step from palette. The palette displays all the available steps.

You can also select the List View tab and click Add New Step from Palette. Right-click anywhere in the Data Flow Architect area and select Add New Step from Palette.

2. Under the Standardization category, select Transformer. Drag and drop this step onto the Data Flow Architect area.

3. In the Choose Country Name box, select Global.

4. In the Provide a Unique Step Name box, enter a step name: for example, ‘Transformer at End’.

5. Drag and drop this ‘Transformer at End’ step to the end of the country flows and attach the Data Reconstruction steps.

6. To connect the steps, click the right connection area on the Data Reconstruction step, then click anywhere on the ‘Transformer at End’ step.

Alternatively, you can right-click the steps and select Start Connection or End Connection from the pop-up menu.

If the Data Flow Architect area is locked, unlock it by right-clicking in the Data Flow Architect and selecting Lock.



Figure 13.2 Connecting Steps in Data Flow Architect

See “Using the Data Flow Architect” on page 2-20 for detailed instructions on how to connect steps.





1. Open the Transformer at End step and click Input Settings.

2. Select the Input Data and Input DDL files for the first country (for example, Canada). Click Add.

Figure 13.3 Input File Settings

3. Add the other countries’ input data files and DDLs.

Figure 13.4 Input File Settings for All Countries

4. Select the Output Settings tab.

5. Select the Output File Name and Output DDL Name.

Figure 13.5 Output File Settings

6. Enter a file name in the Statistics File Name and Process







Select and Bypass Records

The final step should use only survivor records as input; that is, records with a 1 in the Survivor_flag field. This will create a final output file with one contact per business. This record will also have the commonized account representative data. To achieve this result, use the Select and Bypass Condition function on the input files.

To build a Select/Bypass Condition

1. Click Advanced and navigate to Input, Settings, Select Record Conditions.

2. Click the first condition row and select Edit Condition. The Logic Builder window appears.

3. Double-click Survivor_flag in the DDL Fields list.

4. Double-click EQUAL TO in the Operators list. Enter 1. This operation will select only records that have a 1 in the Survivor_flag field.

5. Click Apply and Close. Remain on the Advanced Settings window.

Figure 13.6 Select Bypass Logic Builder

See “Select or Bypass Records” on page 5-37 for more information.

6. Select the next input file name under Input Files and repeat the steps.

7. Repeat for all remaining input files.

Process Settings

Once you have specified input and output files, you can specify settings used to process your data. The settings for processing are managed in the Advanced Settings window.

Source Identification

Since you are merging multiple records into one file, an input source identifier should be applied. This indicator designates the origin of the record.

To specify source identification

1. Click Additional Settings.

2. Enter File Source to indicate the input origin of the record. For example, use CA for the records from the Canadian file. Remain on the Advanced Settings window.

Figure 13.7 Source Identification

The File Source field is set to four characters in the standard DDL.

3. Select the next input file name under Input Files and repeat the steps.

4. Repeat for all remaining input files.
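The combined effect of the select condition and the source identifier can be pictured with a small model. The Python sketch below is only an illustration of the logic, not of how the Transformer is implemented: records are plain dictionaries, the function name and the File_Source key are hypothetical, and the four-character padding follows the note about the standard DDL.

```python
def merge_survivors(inputs):
    """Merge several input files, keeping only survivor records.

    inputs maps a source identifier (for example "CA") to that file's
    records. Each kept record is tagged with its four-character source.
    """
    merged = []
    for file_source, records in inputs.items():
        tag = file_source[:4].ljust(4)  # File Source is four characters wide
        for rec in records:
            # Select condition: Survivor_flag EQUAL TO 1
            if rec.get("Survivor_flag", "").strip() == "1":
                merged.append({**rec, "File_Source": tag})
    return merged


# Invented sample data, for illustration only.
inputs = {
    "CA": [
        {"name": "ACME LTD", "Survivor_flag": "1"},
        {"name": "ACME LIMITED", "Survivor_flag": "0"},
    ],
    "US": [
        {"name": "GLOBEX INC", "Survivor_flag": "1"},
    ],
}
```

Applying the same condition to every input file, as the steps require, is what the outer loop represents; a file without the condition would contribute its duplicate records to the merged output.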

Run Transformer and View Results

To run Transformer and view results

1. Click OK to close the Advanced Settings.

2. Click Run to run the ‘Transformer at End’ step.

3. Select OK.

4. Summary results appear on the Results tab of this step.

Figure 13.8 Summary Results

5. Notice that only selected records from each input file are included.

6. Navigate to Output Settings and click the Data Browser button next to the Output File Name.

7. In the Field Selection window, click Add All and Display.

8. Review the records and fields selected by this process. The final records would look like this:

Figure 13.9 Final Record Output



CHAPTER 14 Packaging Projects


Most users create a script from the steps in the project. This chapter describes how to create and run a script and provides a summary of Real-Time Processing. Real-Time processing can ensure that new data entering the database is transformed, cleansed, enriched and linked at the point of entry. The TS Quality Analyzer tool can demonstrate a Real-Time processing environment.

This chapter focuses on several tasks:

Use the List View to order and select steps for inclusion to a batch script

Generate a script to run all selected steps

Save, view and run the script

Export and import projects

Understand the Director architecture and the role of the cleansing/matching servers

Move from a batch environment to a real-time environment

Understand the role of the business rules

Use the TS Quality Analyzer to sample real-time cleansing and matching



Batch Script

You can combine project steps into a batch script that can be run on UNIX or Windows platforms. The Windows interface lets you set up a project that exists on UNIX (by using mapped network drives), but that can be run using the local Windows client.

Create a Script

Before generating a script to run your project, you must select the steps to be included.

To select steps

1. In the List View in the Control Center’s main window, use CTRL+click and select the desired steps.

If you want to select a block of contiguous steps, click on the first step desired and then SHIFT+click on the last step desired.

If you want to select all the steps in the List View, click on Select All Steps on the tool bar.

To generate a batch script

1. Click Generate Script to Run on the tool bar. The Batch Process window will appear.

Figure 14.1 Batch Process Window

All batch files have the extension of .bat.


2. On the Save As tab, specify the name and location of the script.

3. Click Save to save your steps for a batch process.

Edit a Script

To edit a batch script

1. The Modify tab allows you to view and edit the batch file. Select the Modify tab. The steps in the batch file are listed on the left and the file contents appear on the right.

If you click a step on the left, the right pane will automatically page down to the step’s section in the file.

Any remarks are preceded with a *rem*.

Figure 14.2 Batch Script - Modify Tab



2. To edit the script, click Edit. This will open the script in the WordPad editor.

3. Make your changes and save the file.

4. Click Refresh to update the file.

Run a Script

To run a batch script

1. Select the Run tab. From this tab, choose one of the following methods to run the batch file:

Run as an attached process - If you run the batch file as an attached process, the Control Center must remain open. The Notify when job is done running check box is active only when this option is selected.

Run as a detached process - If you run the batch file as a detached process, the process will run independently of the Control Center.

Figure 14.3 Batch Script - Run Tab

2. Select the appropriate radio button, and select ‘Notify when job is done running’ if necessary.

3. Click Run to start the process. If you run the batch as an attached process, you will see the following Message box when the process is complete.

Figure 14.4 Batch Script - Complete Message

4. Click OK. Review the Console Results.

If there are no messages in the Console Results, the batch job ran successfully.

Create Multiple Batch Files

After running the batch process, the Batch Process window remains open. If you want to run another batch process, you must close the window from the first batch process that was run. A new batch file requires a fresh batch window.



Exporting/Importing Projects

The Export and Import functions take a Control Center project and relocate it to a new release directory, or to another machine, without loss of project steps. This feature lets you move project contents to different platforms and drive locations.

Whether an upgrade is taking place or a project is being moved from one licensed server to another, Import/Export procedures make it easy to move your data quality process from one location to another. This feature also allows projects from previous versions of TS Quality to be migrated successfully into the current environment.

Projects created with TS Quality are fully exportable and can be imported without loss of user-defined steps. These projects can be exported into a TS Quality directory.

In order to export and import a project to another physical machine, the project directory and its accompanying subdirectories must be physically copied to a media device and transferred to the other machine.

Export Projects

The first step of the export/import procedure is to export the project.

To export a project

1. In the Control Center main window, double-click the project that you want to export.

2. Select File, Export Project...

Figure 14.5 Export Project

3. In the Export Project window, accept the default exported project file name, or provide the correct file name. The default file name adds a .zip extension to the current file name.

4. Click Start Export. The export process starts.

5. Select OK.

Import Projects

The second step of the export/import procedure is to import the project into the current environment.

To import a project

1. In the Control Center main window, select File, Import Project...

Figure 14.6 Import Project

2. Enter file names for the Project to Import (the .zip file), New Project Directory and New Project Name. Use the navigation buttons to browse for the file and directory.

3. Under Original Path, you should see the old path for the project, postal table, census table and software. If you want to change this information, specify the new location under the New Path area and click Substitute.

4. Click Start Import. The import process starts.

5. Select OK. The old project will be re-created in the new location.

Note that the imported project is set up exactly as it appeared before it was exported:

All steps will be included and every step module will be in the same place in the Data Flow Architect window as it was before the project was exported.

Settings files for each step retain their information. The file contents themselves are unchanged, except in cases where the platform of the import project was different than the platform it was created on when it was exported. In this case, the slashes in paths will change to the direction that is correct for the new platform. If relative paths were used within the settings files, this feature will save some effort in changing path locations within the files.

Import Projects from Windows to UNIX

If the original platform was Windows and the new platform is UNIX, the slash direction will be reversed. The paths that include drive letters will need to be corrected manually to specify the UNIX location.

Using relative paths (for example, ..\ddl\input.ddl) will save work if you know in advance that your project will be moved from Windows to UNIX. It is the user’s responsibility to modify all path definitions in every file and each run command correctly once the import is complete.
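The difference between relative paths and drive-letter paths can be checked before a project is exported. The Python sketch below is a hypothetical helper, not part of TS Quality: it reverses the slash direction of a Windows path using the standard pathlib module and reports whether the path contains a drive (or network share) that would still need manual correction on UNIX.

```python
from pathlib import PureWindowsPath


def to_unix_path(windows_path):
    """Return (converted_path, needs_manual_fix) for a Windows path.

    Relative paths convert cleanly. Paths that begin with a drive letter
    or share are converted too, but are flagged because the drive has no
    UNIX equivalent and must be replaced by hand.
    """
    path = PureWindowsPath(windows_path)
    return path.as_posix(), bool(path.drive)
```

For example, to_unix_path(r"..\ddl\input.ddl") yields a usable relative path, whereas a settings path starting with C:\ is flagged for review.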

We recommend testing the export process by moving or removing some data files from the original project before exporting the project or moving it onto a portable media device.



Real-Time Processing

Now that the batch process is in place, you can leverage the business rules which you have developed in batch and use them in a real-time environment. By implementing a real-time solution, your data can be transformed, cleansed, enriched and linked at point of entry. This section provides an overview of the TS Quality real-time processing with the Director. The Director for TS Quality is an optional application. For information on the Director for TS Quality, please contact your Trillium Software® sales representative.

The Director

The Director acts as a registry for cleansing and matching servers that are made available to the calling environment.

Figure 14.7 Director Architecture Overview

Cleansing Server

TS Quality uses a single API approach to simplify the task of moving data through the various country-specific modules. The simple API eliminates the need for the programmer to know the internal workings of each TS Quality function. This interface uses a single configuration file to enable simple construction of complex transactional data quality processing. The business rules developed in the batch application are used in this configuration file.

Matching Server

The Matching Server supports reference matching, allowing you to compare an incoming record to the database of existing records. Match results are returned to the calling application.

Figure 14.8 Director Architecture - Startup Process

Real-Time Transaction Processing

Once the Director and Cleansing/Matching servers are started, a transaction record can be processed by the calling application. Initially, the client makes a request for the cleansing services. The Director provides a handle to the cleansing server so that the calling application can communicate directly with the cleansing server. The application sends the data to the cleansing server and the cleansed data is returned to the calling application. The handle is then released back to the Director. The process is repeated for matching.

Figure 14.9 Director Architecture - Record Processing

Through the reference match function in the Relationship Linker, duplicate records can be identified so that they do not enter the database.
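The request, use and release sequence for a handle can be pictured with an in-memory stand-in. The Python sketch below does not use or describe the real Director API; every class, method and service name in it is hypothetical, and the "servers" are ordinary functions. It only shows the order of operations: obtain a handle from the registry, talk to the server directly, then give the handle back, first for cleansing and then for matching.

```python
from contextlib import contextmanager


class FakeDirector:
    """Hypothetical registry of servers, standing in for the Director."""

    def __init__(self, servers):
        self._servers = servers   # service name -> callable "server"
        self.handles_out = 0      # handles currently held by callers

    @contextmanager
    def handle(self, service):
        """Hand out a server handle and take it back when the caller is done."""
        self.handles_out += 1
        try:
            yield self._servers[service]
        finally:
            self.handles_out -= 1  # handle released back to the registry


def process_transaction(director, record):
    """Cleanse a record, then match it, releasing each handle after use."""
    with director.handle("cleansing") as cleanse:
        cleansed = cleanse(record)
    with director.handle("matching") as match:
        matched = match(cleansed)
    return cleansed, matched


# Toy servers, for illustration only.
director = FakeDirector({
    "cleansing": lambda rec: rec.strip().upper(),
    "matching": lambda rec: rec == "ACME LTD",
})
```

Releasing the handle in a finally clause mirrors the point made in the text that the handle always goes back to the Director once the exchange with the server is complete.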

Moving From Batch to Real-Time

Business rules designed for the batch process can also be used as resources for real-time processing. Batch process DDLs and settings files can be reused in real time. An XML configuration file, specific to the Director Architecture, is used to control the real-time process modules. Arguments in this file make individual modules available to the calling application.

Linking Single Record Using the TS Quality Analyzer

You can use the TS Quality Analyzer to watch real-time processing. The TS Quality Analyzer application is written in C# and acts as the transaction broker for the TS Quality real-time interface.

Process Method

When a record is entered into the data entry window, it is collected and sent to the real-time cleansing engine when the user clicks the Cleanse button. The cleansed results are displayed on the screen.

Next, the cleansed transaction record can be compared to records in the master database with the Relationship Linker’s reference match function. Candidate records are retrieved from the database for the given window key. The transaction record and the retrieved records are then sent to the reference match function for comparison.

The calling application must retrieve the records from the database using the window key from the transaction record. The retrieval of records is not a function of TS Quality.

If a match is found, the matched record is displayed. If no match is found, a message will be displayed on the screen.

See “Matching” on page 8-8 for instructions on how to run the Matching function using the TS Quality Analyzer.



CHAPTER 15 Working from the Command Line


This chapter describes commands that run the TS Quality modules on UNIX and 32-bit PC platforms. Use these commands for two purposes:

run modules

create log files



Executing TS Quality Modules

All TS Quality modules can be run from the command line.

Before you run a module, make sure that your environment variables are properly set. See Getting Started with TS Quality for instructions.

Syntax

The syntax for command line execution is:

<program_name> <settings_file_name> <log_file_name>

where:

program_name          The module’s executable. See Table 15.1 for a complete list of program names.
settings_file_name    The path and name of the module’s settings file.
log_file_name         The path and name of the module’s log file. A log file displays any processing errors in the program. This argument is optional.

Example

This is a sample command to execute the Transformer.

tranfrmr ..\settings\ustranfrmr.stx ..\data\error.log
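When modules are launched from a script rather than typed by hand, the same three-part syntax can be assembled programmatically. The following is a minimal Python sketch, assuming the module executable is on the PATH and the environment variables are already set. The helper names `build_command` and `run_module` are illustrative and are not part of TS Quality.

```python
import subprocess
from typing import List, Optional


def build_command(program_name: str, settings_file: str,
                  log_file: Optional[str] = None) -> List[str]:
    """Build the argument list: program name, settings file, optional log file."""
    command = [program_name, settings_file]
    if log_file is not None:  # the log file argument is optional
        command.append(log_file)
    return command


def run_module(program_name: str, settings_file: str,
               log_file: Optional[str] = None) -> int:
    """Run one module and return its exit code (assumes the executable is on PATH)."""
    completed = subprocess.run(build_command(program_name, settings_file, log_file))
    return completed.returncode


# Same arguments as the sample Transformer command above.
print(build_command("tranfrmr", r"..\settings\ustranfrmr.stx", r"..\data\error.log"))
```

Only `build_command` is exercised here. `run_module` would start the real executable, so call it only on a machine where TS Quality is installed.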



Program Names

Table 15.1 contains a complete list of program names. Use the appropriate program name in your command line:

Table 15.1 Command Line Execution Program Names

Module                     Program Name
Transformer                tranfrmr
Customer Data Parser       cusparse
                           apparse (China, Japan, Korea, and Taiwan)
Parsing Customization      prcustom
                           apcustom (China, Japan, Korea, and Taiwan)
Business Data Parser       busparse
Postal Matchers            xxpmatch (xx=country code)
                           Examples: aupmatch (AU), capmatch (CA), depmatch (DE), hkpmatch (HK), gbpmatch (GB), uspmatch (US), tqpmatch (TQ), appmatch (China, Japan, Korea, and Taiwan)
Global Data Router         globrtr
Window Key Generator       winkey
Relationship Linker        rellink
Create Common              common
Data Reconstructor         datarec
File Display Utility       tsqdisp
File Update Utility        fileupdt
Frequency Count Utility    tsqfreq
Merge Split Utility        mrgsplit
Resolve Utility            resolve
Set Selection Utility      tsqsetsl
Sort Utility               tsqsort





CHAPTER 16 Working with the TS Quality Utilities


TS Quality offers a number of utilities to perform specific tasks. All utilities can be executed from the TS Quality Control Center, in a batch process, or from the command line. This chapter explains how to use these utilities:

File Display utility

File Update utility

Frequency Count utility

Merge Split utility

Resolve utility

Set Selection utility

Sort utility

Each utility’s basic settings, such as input and output settings, are the same as the TS Quality core modules. This chapter focuses on the process settings information (the Advanced Settings window) for each utility.

Refer to the TS Quality Reference Guide for complete settings information on each utility.



File Display Utility

The File Display Utility lets you create a customized display copy of a file without altering the contents of the original file. For example, use the File Display Utility to review your output data after you run the Relationship Linker to determine if your linking results are accurate, or if you need to tune your business rules to improve the results.

You can also use the File Display Utility to create a new file that organizes the original file’s data to meet specific display requirements.

Input and Output Settings

The File Display Utility can use the input file and input DDL from any other step. This utility is most often used to display and verify results from the Relationship Linker. In this case, use the Input File Name and Input DDL Name from the output of the Relationship Linker step.

The File Display Utility does not use a DDL for output.

Outer Key and Inner Key

The data that will be displayed is grouped by Outer Key and Inner Key.

Outer Key - Creates a group of records that have the same value in the field specified in the Outer Key Field.

Inner Key - Creates a group of records within the outer key that have the same value in a field specified in the Inner Key Field.

For example, the data can be grouped by LEV1_MATCHED (Outer Key) and LEV2_MATCHED_IN_LEV1_MATCHED (Inner Key) as shown in Figure 16.1:



Figure 16.1 Outer Key Group and Inner Key Group
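The two-level grouping can be sketched in Python as nested grouping over records that are already sorted by the outer key and then the inner key. The sample records, the `NAME` field, and the `group_by_keys` helper are assumptions made for illustration. The asterisk and hyphen lines stand in for delimiter lines and do not reproduce the utility's actual report layout.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample records, already sorted by outer key, then inner key.
records = [
    {"LEV1_MATCHED": "000001", "LEV2_MATCHED_IN_LEV1_MATCHED": "01", "NAME": "John Nicoli"},
    {"LEV1_MATCHED": "000001", "LEV2_MATCHED_IN_LEV1_MATCHED": "01", "NAME": "J Nicoli"},
    {"LEV1_MATCHED": "000001", "LEV2_MATCHED_IN_LEV1_MATCHED": "02", "NAME": "Mary Nicoli"},
    {"LEV1_MATCHED": "000002", "LEV2_MATCHED_IN_LEV1_MATCHED": "01", "NAME": "Kevin McCarthy"},
]


def group_by_keys(rows, outer_field, inner_field):
    """Return {outer value: {inner value: [rows]}} for rows sorted by both keys."""
    grouped = {}
    for outer_value, outer_rows in groupby(rows, key=itemgetter(outer_field)):
        grouped[outer_value] = {
            inner_value: list(inner_rows)
            for inner_value, inner_rows in groupby(outer_rows, key=itemgetter(inner_field))
        }
    return grouped


groups = group_by_keys(records, "LEV1_MATCHED", "LEV2_MATCHED_IN_LEV1_MATCHED")
for outer_value, inner_groups in groups.items():
    print("*" * 40)  # stand-in for an outer key delimiter line
    print("Outer key:", outer_value)
    for inner_value, rows in inner_groups.items():
        print("-" * 40)  # stand-in for an inner key delimiter line
        for row in rows:
            print(inner_value, row["NAME"])
```

Because `groupby` only groups adjacent rows, the sketch relies on the input being sorted by both keys.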

To specify outer key and inner key fields

1. From the File Display Utility step, click Advanced and navigate to Input, Settings.

2. Select the Outer Key Field and Inner Key Field from the drop-down menu.

In addition, you can specify the following settings for the outer key and inner key:

Setting                     Description
Inner Key Field Encoding    Encoding type used by Inner Key Field. If not specified, the platform’s native encoding is used by default.
Inner Key Compare Bytes     Number of Inner Key Field bytes to use. By default, this is the field length of Inner Key Field.
Outer Key Field Encoding    Encoding type used by Outer Key Field. If not specified, the platform’s native encoding is used by default.
Outer Key Compare Bytes     Number of Outer Key Field bytes to use. By default, this is the field length of Outer Key Field.

Title and Delimiters

You must specify the title and delimiters for the title section, the outer key lines, and the inner key lines.

Figure 16.2 Title, Outer Key, Inner Key Delimiters

In this example, the lines in the report’s title section are separated by a series of forward slashes (/), while the outer lines are separated by an asterisk (*) and the inner lines are separated by a hyphen (-).

To specify the title

1. Click Advanced and navigate to Output, Settings.

2. In Title 1, enter the first title line of the report. This line must be enclosed within quotation marks (“ ”) if there is a space between characters. For example: ‘Matching Report’

3. If necessary, specify a second title line in Title 2 and a third title line in Title 3. Those lines must also be enclosed by quotation marks (“ ”) if there is a space between characters.

4. In Title 1, 2, 3 Encoding, specify the encoding for each line if necessary. If not specified, the platform’s native encoding is used by default.

To specify the delimiters

1. Click Advanced and navigate to Output, Settings.

2. In Title Line Delimiter, specify a line delimiter for the title line.

3. In Outer Key Delimiter, specify a line delimiter for the outer key. A Tab, Space, Semicolon, Comma, Pipe, or any other character may be used. Characters other than those listed must be enclosed by quotation marks.

4. In Inner Key Delimiter, specify a line delimiter for the inner key. A Tab, Space, Semicolon, Comma, Pipe, or any other character may be used. Characters other than those listed must be enclosed by quotation marks.

5. In the Encoding settings for each line, specify the encoding if necessary. If not specified, the platform’s native encoding is used by default.

A red flag indicates a REQUIRED setting.



6. In addition, you can specify the following settings for the outer key and inner key:

To display the record in a format that is easy to read, specify at least Title1, Title Line Delimiter, Outer Key Delimiter and Inner Key Delimiter.

Setting                   Description
Carriage Return           Specifies how the end of a line is marked in the report, which affects how the report displays on different platforms. When checked, a line feed is used to indicate the end of a line (UNIX platforms). If unchecked, a carriage return/line feed is used to indicate the end of a line (all other platforms).
Maximum Lines Per Page    Numeric value that specifies the maximum number of lines to display on a page. The default is 66.
Compress Blank Lines      When checked, compresses blank lines on the report.
Inner Break Spacing       Number of separator lines to print for the break of the inner set. The default is 1.
Outer Break Spacing       Number of separator lines to print for the break of the outer set. The default is 1.
Outer Key Minimum         Numeric value that specifies the minimum number of records in the set to display in the report.



Field Settings

The fields that will be displayed in the report should also be identified.

Figure 16.3 Fields Displayed

This sample report includes LINE_01 to LINE_04, and all fields are displayed on one line.

To specify fields to be displayed

1. Click Advanced and navigate to Output, Create Report.


2. Refer to the table below and configure these additional settings:

Setting                  Description
Report Value Type        Defines the type of Report Value entry. Values are: DDL field name, Literal value, Insert spaces only. If the space option is used, Report Value is not required. However, you must specify the number of spaces with Report Value Length.
Report Value             Specifies either a literal value or a field name to display in the report.
Report Value Encoding    Specifies the encoding used by Report Value. If not specified, the platform’s native encoding is the default.
Report Value Length      Specifies a limit for the length of the value displayed in the report. This is useful when using a DDL field and the field length is very long but might not be completely filled.
Report Line Number       Specifies the line number for the value in Report Value. The same Report Line Number value can be associated with more than one Report Value, but they must be grouped together and remain in numerical order.

See TS Quality Reference Guide and Online Help for complete settings information.



File Update Utility

The File Update Utility updates a file, called the master file, with the data contained in another file, called the transaction file. You can update records in the master file based on a specific key set when the keys match between records. The File Update Utility separates records based on match or not-match key conditions and creates separate files for these records for further review or processing.

Match Keys and Fields

The update will be applied to the records in the master file based on a Match Key specified by the user. If the match key’s values are equal in the master and transaction files, the record will be considered “matched”, and field values in the matched records in the master file will be updated by the values in the transaction file. Fields to be updated are determined by the output DDLs for the output files. If the key values in the master file and transaction file are not equal, the record will be considered “unmatched” and the unmatched records will be written out.

Match keys are specified with the Match Key setting. Field names are used for match keys. The values in the key fields must be equal in both the master and transaction files to perform the update. Up to five match keys can be specified. All input files (master and transaction files) must be sorted in ascending order by the match key.

Neither field names nor field lengths for match keys need to be equal for the master and transaction files. The program will search for a match based on the values in the key fields specified.

Fields to be updated must be specified in the output DDLs. When there are common fields between the master and transaction files, the values in the common fields in the master file will be overwritten (updated) by the values in the same fields in the transaction file.

Example

In these sample master and transaction files, the following fields are used (the Match Key is the Record_Key field):

Master File                Tran File
Record#                    Record#
Record_Key (Match Key)     Record_Key (Match Key)
Name                       Street
                           City
                           State

If the output DDL has all fields from the master and transaction files, the match master file includes the following fields. Therefore, the value in the common field, Record#, will be overwritten by the transaction file:

Match Master File
Record#       The value of this field will be overwritten (updated) by the tran file.
Record_Key
Name
Street        The values of these fields (Street, City, State) are inserted from the tran file.
City
State

If you prefer not to overwrite or update the common field in the master file with information from the transaction file, redefine the field name either for the master or for the transaction file.



Example

Input

In this example, the master file contains the customers’ names and the transaction file contains the customers’ addresses.

Match key: Record_Key field.

Master Input File (master.dat)

Rec #  Record_Key  Name
1      0001        John Nicoli
2      0001        J Nicoli
3      0002        Mary Rogers
4      0003        Kevin McCarthy

Tran Input File (tran.dat)

Rec #  Record_Key  Street             City       State
100    0001        25 Linnell Circle  Billerica  MA
200    0001        1 Elm St.          Nashua     NH
400    0002        12 Oak St.         Waltham    MA
500    0004        3 Royal Court      Boston     MA

Output DDL

The following fields are used in the DDL for ALL master output files (match_master.ddx, match_dup_master.ddx, unmatch_master.ddx):

Rec #
Record_Key
Name
Street
City
State

Output

1. The program searches for records in master.dat that have the same key values as tran.dat. In this case, the records with “0001” and “0002” in the Record_Key field are matched records.

2. Rec #1 in master.dat and Rec #100 in tran.dat are matched records because they matched first. Similarly, Rec #3 and Rec #300 are matched records. Therefore, in the match master file, the values in the “Rec #” field will be updated by tran.dat, and all address-related fields and their values will be added from the tran.dat file.

Match Master File (match_master.dat)

Rec #  Record_Key  Name         Street             City       State
100    0001        John Nicoli  25 Linnell Circle  Billerica  MA
300    0002        Mary Rogers  25 Linnell Circle  Billerica  MA

3. Rec #2 in master.dat is a duplicate matched record because it appeared after the first matched record (Rec #1). Therefore, Rec #2 is written out to the match master duplicate file. For the match master duplicate file, the user must select whether or not to update the record with the Update Output Rec setting. See the following cases:

Duplicate records are additional records with the same key that appear after the first match occurs.

Match Master Duplicate File (match_dup_master.dat)

Case 1: Update the matched duplicate records

UPDATE_OUTPUT_REC ON

100    0001        J Nicoli     25 Linnell Circle  Billerica  MA

Case 2: Do not update the matched duplicate records

UPDATE_OUTPUT_REC OFF

2      0001        J Nicoli

4. Rec #4 in master.dat is an unmatched record because it does not share the same key value with any records in tran.dat. Therefore, Rec #4 is written out to the unmatch master file.

Unmatch Master File (unmatch_master.dat)

4      0003        Kevin McCarthy

5. The matched records in tran.dat are written out to the match_tran.dat file to show which transaction records matched a master record. If there are any matched duplicate, unmatched, or unmatched duplicate records in the transaction file, they are also written out to separate files.

Match Tran File (match_tran.dat)






Match Tran Duplicate File (match_dup_tran.dat)

200    0001        1 Elm St.          Nashua     NH
400    0002        12 Oak St.         Waltham    MA

Input and Output Settings

For input, you must specify the Master File Name and DDL, and the Transaction File Name and DDL. For output, you can specify various types of Match and Unmatch files.

Master and transaction files must be sorted by match keys.

Match Key Settings

The update will be applied to the records in the master file based on the Match Key specified by the user. If the match key’s values are equal between the master and transaction files, the record will be considered “matched”, and field values in the matched records in the master file will be updated with the values in the transaction file.

To specify match keys

1. From the File Update Utility step, click Advanced and navigate to Input, Master Settings.

2. Specify Match Keys by selecting field names from the drop-down list.

3. Navigate to Input, Transaction Settings.

4. Specify the same Match Keys as you specified for Master Settings.

If you specify multiple match keys, separate the keys with commas. For example: Last_Name,First_Name,SS_number.

A red flag indicates a REQUIRED setting.



To enable Table Update

When it is not practical to sort the master file by the match keys due to its size, enable the Table Update function to update the master file without sorting. The program will only update the first matched record in the master file with the contents of the first matched record in any transaction file. Any following matches for that match key will be ignored.

1. Navigate to Input, Master Settings.

2. Select Table Update.

If Table Update was turned on in the Master Settings, duplicate records will not be written out because the program will not search for duplicate records.

For the match master duplicate file, the user must select whether or not to update the record by using the Update Output Rec setting in the Master Match Dup Settings.
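The match, duplicate, and unmatched classification used by the File Update Utility can be sketched in Python as follows. This is a simplified model and not the utility's implementation: it handles only the master-side outputs, uses a single match key, and the `file_update` function and its `update_duplicates` parameter (standing in for the Update Output Rec setting) are illustrative names. The sample data is a subset of the example records in this section.

```python
def file_update(master, tran, key, update_duplicates=True):
    """Classify master records against transaction records by match key."""
    first_tran = {}
    for record in tran:
        first_tran.setdefault(record[key], record)  # keep the first tran record per key

    outputs = {"match_master": [], "match_dup_master": [], "unmatch_master": []}
    seen = set()
    for record in master:
        source = first_tran.get(record[key])
        if source is None:
            outputs["unmatch_master"].append(dict(record))
        elif record[key] not in seen:
            seen.add(record[key])
            # Common fields are overwritten; tran-only fields are added.
            outputs["match_master"].append({**record, **source})
        else:
            updated = {**record, **source} if update_duplicates else dict(record)
            outputs["match_dup_master"].append(updated)
    return outputs


master = [
    {"Rec#": "1", "Record_Key": "0001", "Name": "John Nicoli"},
    {"Rec#": "2", "Record_Key": "0001", "Name": "J Nicoli"},
    {"Rec#": "4", "Record_Key": "0003", "Name": "Kevin McCarthy"},
]
tran = [
    {"Rec#": "100", "Record_Key": "0001", "Street": "25 Linnell Circle", "City": "Billerica", "State": "MA"},
    {"Rec#": "200", "Record_Key": "0001", "Street": "1 Elm St.", "City": "Nashua", "State": "NH"},
    {"Rec#": "500", "Record_Key": "0004", "Street": "3 Royal Court", "City": "Boston", "State": "MA"},
]

result = file_update(master, tran, "Record_Key")
for name, rows in result.items():
    print(name, rows)
```

With `update_duplicates=False`, the duplicate master record is written out unchanged, which corresponds to Case 2 in the example.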

Transaction Output Settings

If you specify transaction output files such as Tran Match File and Tran Unmatch File, you need to set the Transaction File Qualifier in Output Advanced Settings.

To specify transaction settings

1. Click Advanced and navigate to Output, Match Tran Settings or any of the output transaction file’s settings for the files you specified on the Output Setting tab.

2. Select For Tran. This is the value specified in the File Qualifier in Transaction Settings.




Frequency Count Utility

The Frequency Count Utility analyzes data records to determine the frequency of input fields by counting the occurrences of literal data strings, mask shapes and blanks. The resulting frequency counts are displayed in the output file.

Example

As shown in this example, the data can be counted by FIRST_NAME, LAST_NAME and STREET_ADDR. When multiple fields are specified like this, the frequency counts will be made on the combined value of the fields, not on the individual fields.

Figure 16.4 Frequency Count

COUNT  FIRST_NAME  LAST_NAME  STREET_ADDR
65     John
35     John        Nicoli
35     John        Nicoli     13 Yellow Way
30     John        Smith
30     John        Smith      23 Purple Circle
35     Bernard
34     Bernard     LeCuyer
34     Bernard     LeCuyer    19 Blue St
1      Bernard     LCuyer
1      Bernard     LCuyer     19 Blue St
35     Clara
35     Clara       Currier
35     Clara       Currier    18 Red Road
35     Iulia
35     Iulia       Andrei
35     Iulia       Andrei     14 Orange Parkway
35     Jack
35     Jack        Sweeney
35     Jack        Sweeney    17 Black Street



Input and Output Settings

The Frequency Count Utility uses the Input File Name and Input DDL Name from any other TS Quality step.

Count Settings

To specify fields to count

1. From the Frequency Count Utility step, click Advanced and navigate to Input, Freq Settings.

2. Click Entry Settings. Select Field Name from the drop-down list. This is the field which will be counted.

3. Select either Literal or Mask for Count Type.

4. Click Field Settings. Select Sort Type (Descending Order or Ascending Order) and Sort Option (Count or Value). The Sort Option specifies whether to sort the results by Count or by field Value.

5. Optionally, you can check Show All Combinations to display an expanded output view. You can also specify how many records of a given frequency are displayed in the Number of Top Occurrences text box. For example, a value of 100 will show the top 100 most frequently occurring records.

When Show All Combinations is checked, the program counts the number of occurrences of a specified data combination. For example, if a field contains Tom Smith and another record contains Tom, the report shows two occurrences of Tom and one occurrence of Tom Smith.
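The counting behavior can be sketched in Python with a counter keyed on the combined field values. The interpretation of Show All Combinations used here, counting each leading combination of the specified fields, is an assumption inferred from Figure 16.4. The `frequency_counts` function and the sample rows are illustrative only.

```python
from collections import Counter


def frequency_counts(rows, fields, show_all_combinations=False):
    """Count occurrences of the combined value of `fields` in each row."""
    counts = Counter()
    for row in rows:
        values = tuple(row[field] for field in fields)
        if show_all_combinations:
            # Also count each leading combination: (first,), (first, second), ...
            for width in range(1, len(values) + 1):
                counts[values[:width]] += 1
        else:
            counts[values] += 1
    return counts


# Hypothetical sample rows.
rows = [
    {"FIRST_NAME": "John", "LAST_NAME": "Nicoli"},
    {"FIRST_NAME": "John", "LAST_NAME": "Nicoli"},
    {"FIRST_NAME": "John", "LAST_NAME": "Smith"},
]

combined = frequency_counts(rows, ["FIRST_NAME", "LAST_NAME"])
expanded = frequency_counts(rows, ["FIRST_NAME", "LAST_NAME"], show_all_combinations=True)

# Sort by count in descending order, similar to Sort Option = Count.
for values, count in expanded.most_common():
    print(count, *values)
```

Without the expanded view, only the full combinations are counted, so `John` never appears on its own.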





Merge Split Utility

The Merge Split Utility lets you manipulate files with merge keys and split rules. You can create merge keys to determine how files will be merged, and create rules to split files into multiple smaller files or to produce multiple output files from a single input file.

Input and Output Settings

The Merge Split Utility uses the Input File Name and Input DDL Name from any other TS Quality step.

Using Multiple Input Files to Create an Output DDL

You can specify up to a maximum of ten (10) input files and their associated DDLs and use these to create a common output file for later processing by modules downstream in your workflow. This process requires that after you specify the input files, you map input fields from the associated DDLs to a common output DDL file.

To add multiple input files and map

1. Double-click a Merge Split Utility step to open the Merge Split Utility window.

2. In the Input Data File field, type or browse to the input file you wish to use.

3. In the Input DDL File field, type or browse to the input DDL file associated with the input data file you specified in Step 2.

4. Click Add.

5. Repeat Steps 2-3 until you’ve added all DDL files you want to use to create the common output format.

6. Click the Define Output DDL button (bottom left).

7. The Define Output DDL dialog appears.

Figure 16.5 Define Output DDL dialog

8. Use the Input DDL drop-down menu to select the DDL file you want to use to map fields to an output DDL file. The input DDL fields appear in the left pane and the final output DDL fields appear in the right pane.

9. Use the buttons in the center panel to refine the output DDL list of fields. You can choose from these options:

Add - adds the selected input DDL field to the output DDL list.

Delete - deletes a selected output DDL field from the list.

Move Up - moves the selected field in the output DDL list up one row.

Move Down - moves the selected field in the output DDL list down one row.

Redefine - redefines an input field as a portion of an output field. Use this option to map multiple input fields to the same redefined output field.

Consolidate - consolidates an input field with an existing output field. Use this option when two or more fields have different names but contain the same data, such as zipcode, ZIP5, and postal_code.

For Redefine and Consolidate, make sure that the lengths of the input fields do not exceed the overall length of the redefined or consolidated output DDL field.

10. When you are ready, click Save to save the output DDL field mapping. When the Merge Split Utility step runs, it will create an output DDL file that uses this mapping.

Merge Files

For a merge operation, all input files that will be merged and the output file MUST be the same shape. In other words, they must use the same DDL.

Input files must be sorted by match keys.
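A merge of inputs that are already sorted by the match key can be sketched in Python with `heapq.merge` from the standard library. The record values are taken from the example in this section. The `merge_sorted` helper is illustrative and says nothing about how the utility itself is implemented.

```python
import heapq
from operator import itemgetter


def merge_sorted(inputs, match_key):
    """Merge inputs that are each sorted by match_key into one sorted list."""
    return list(heapq.merge(*inputs, key=itemgetter(match_key)))


input_1 = [
    {"Customer_ID#": "0000001", "Name": "John Nicoli"},
    {"Customer_ID#": "0000002", "Name": "Mary Nicoli"},
]
input_2 = [
    {"Customer_ID#": "9000001", "Name": "Alice Rogers"},
    {"Customer_ID#": "9000002", "Name": "Kevin McCarthy"},
]

merged = merge_sorted([input_1, input_2], "Name")
for record in merged:
    print(record["Customer_ID#"], record["Name"])
```

The output holds every record from both inputs, ordered by the key value. This is why the inputs must share the same layout and be sorted by the match key beforehand.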

Example

In this example, Input 1 will be merged into Input 2 using the Name field as Match Key.

Input 1

Customer_ID#  Name
0000001       John Nicoli
0000002       Mary Nicoli

Input 2

Customer_ID#  Name
9000001       Alice Rogers
9000002       Kevin McCarthy

The following DDL fields are used for ALL input files and output files:

Customer_ID#
Name

On output, the program copies Match Key values from Input 1 and Input 2 along with other components of data. The record order will be determined by the order of key values. As a result, the total number of records is the sum of the number of records from both Input 1 and Input 2.

Output File

Customer_ID#  Name
9000001       Alice Rogers
0000001       John Nicoli
9000002       Kevin McCarthy
0000002       Mary Nicoli

To merge files

1. From the Merge Split Utility step, click Advanced and navigate to Process, Settings.

2. On the Field Settings tab, select Merge for Process Type from the drop-down list.

3. On the Field Settings tab, select Match Key from the drop-down list.

You can specify up to five fields for the Match Key, separated by commas.

A red flag indicates a REQUIRED setting.



Split a File

Splitting a file is useful when your system has a file-size limit or you want to separate a file into manageable pieces. The pieces can later be re-assembled using the Merge operation. You can split the input file by the number of records or bytes per segment.

To split a file

1. Click Advanced and navigate to Process, Settings.

2. On the Field Settings tab, select Split for Process Type from the drop-down list.

3. On the Field Settings tab, select Partition Method from the drop-down list.

You can specify up to five fields for the Match Key, separated by commas. If the Partition method is set to Ranges, this setting can contain only one field.

Partition Method       Description
Round Robin Number     Split the file by number of records. If this is selected, the Round Robin Number must be specified.
Round Robin Keys       Split the file by the key (field). If this is selected, Match Keys must be specified. The input file should be sorted by the Match Key field.
Ranges                 Split the file by a range of values. If this is selected, Range Start and Range End values (Entry Settings tab) must be specified.
Ranges Stable          Split the file by the field name and field length. The field name and field length are specified by Match Key.
Records Per Segment    Split the file by segment. If this is selected, Records Per Segment must be specified.
Bytes Per Segment      Split the file by segment. If this is selected, Bytes Per Segment must be specified.
Segment Per File       Split the file by the defined number of segments. The number of segments is specified by Number of Output Files.



4. On the Field Settings tab, specify Number of Output Files. This is the number of output files to create. The entry in the output file name is used as a base file name; extensions will be generated up to the value specified here.
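The Round Robin Number and Round Robin Keys partition methods can be sketched in Python as follows. The sketch assumes that the Round Robin Number is the count of consecutive records written to one output before moving to the next, and that Round Robin Keys assigns whole key groups to the outputs in turn. The function names and sample records are illustrative.

```python
from itertools import groupby
from operator import itemgetter


def split_round_robin_number(records, number_of_outputs, round_robin_number=1):
    """Deal records to the outputs in turn, round_robin_number records at a time."""
    outputs = [[] for _ in range(number_of_outputs)]
    for index, record in enumerate(records):
        outputs[(index // round_robin_number) % number_of_outputs].append(record)
    return outputs


def split_round_robin_keys(records, number_of_outputs, match_key):
    """Deal whole key groups to the outputs in turn (records sorted by match_key)."""
    outputs = [[] for _ in range(number_of_outputs)]
    groups = groupby(records, key=itemgetter(match_key))
    for group_index, (_, group) in enumerate(groups):
        outputs[group_index % number_of_outputs].extend(group)
    return outputs


records = [
    {"Name": "B McCarthy", "Lev2_matched": "000001"},
    {"Name": "Bob McCarthy", "Lev2_matched": "000001"},
    {"Name": "Catherine Rogers", "Lev2_matched": "000002"},
    {"Name": "Cathy Rogers", "Lev2_matched": "000002"},
]

by_number = split_round_robin_number(records, number_of_outputs=3, round_robin_number=1)
by_keys = split_round_robin_keys(records, number_of_outputs=2, match_key="Lev2_matched")
for number, output in enumerate(by_number, start=1):
    print("Output", number, [r["Name"] for r in output])
```

With three outputs and a Round Robin Number of 1, the fourth record wraps around to the first output.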

Examples

Round Robin Keys

In this example, the input file will be split to Output 1 and Output 2 using the Round Robin Keys method, and the Lev2_matched field as the Match Key.

Input File

Name              Lev2_matched
B McCarthy        000001
Bob McCarthy      000001
Catherine Rogers  000002
Cathy Rogers      000002

On output, the program splits the input file. The first Lev2_matched group will be written to Output1, and the second Lev2_matched group will be written to Output2.

Output 1

Name              Lev2_matched
B McCarthy        000001
Bob McCarthy      000001

Output 2

Name              Lev2_matched
Catherine Rogers  000002
Cathy Rogers      000002

Round Robin Number

In this example, the input file will be split to Output 1, Output 2 and Output 3 using the Round Robin Number method, and the Round Robin Number is set to ‘1’.

Input File

Name              Lev2_matched
B McCarthy        000001
Bob McCarthy      000001
Catherine Rogers  000002
Cathy Rogers      000002

On output, record #1 is written to output1, record #2 is written to output2, and record #3 is written to output3. Since there are only three output files specified, record #4 goes back to output1, and the cycles continue in this manner:

Output 1

Name              Lev2_matched
B McCarthy        000001
Cathy Rogers      000002

Output 2

Name              Lev2_matched
Bob McCarthy      000001

Output 3

Name              Lev2_matched
Catherine Rogers  000002

Merge and Split Files

To merge and split files

1. Click Advanced and navigate to Process, Settings. On the Field Settings tab, select Both for Process Type from the drop-down list.




2. On the Field Settings tab, select Partition Method from the drop-down list.

For the Both Process Type, first Merge, then Split will be performed.

3. On the Field Settings tab, select Match Key from the drop-down list.

You can specify up to five fields, separated by commas. If the Partition method is set to Ranges, this setting can contain only one field.

4. On the Field Settings tab, specify Number of Output Files. This is the number of output files to create. If this value is greater than 1, the entry in the output file name is used as a base file name, and extensions will be generated up to the value specified here.

Partition Method       Description
Round Robin Number     Split the file by number of records. If this is selected, a Round Robin Number must be specified.
Round Robin Keys       Split the file by the key (field). The input file must be sorted by this field. If this is selected, Round Robin Keys must be specified.
Ranges                 Split the file by a range of values. If this is selected, Range Start and Range End values (Entry Settings tab) must be specified.
Ranges Stable          Split the file by field name and field length. The field name and field length are specified by Match Key.
Records Per Segment    Split the file by segment. If this is selected, Records Per Segment must be specified.
Bytes Per Segment      Split the file by segment. If this is selected, Bytes Per Segment must be specified.
Segment Per File       Split the file by the defined number of segments. The number of segments is specified by Number of Output Files.




Resolve Utility

The Resolve Utility resolves transitivity affecting links between transactions. When window linking is performed with the Relationship Linker, you can create a link file indicating which records are linked together with common data.

Transitivity occurs when two records are linked together indirectly through a third record in a multi-linking process. For example, record A may have linked record B in the first run and record B may link record C in a subsequent run using a different window key. When this happens, record A has linked record C through transitivity. Using the MALINK file from the Relationship Linker, the Resolve utility creates a relationship of the records that can then be used to represent the entire matched record set.

Please contact Trillium Software® Customer Support for more information on multi-linking.

Example

If the MALINK record layout is:

Recid(20) + Recid(20) + Match Type(1) + Pattern(3)

00000001 00000002 P 405

[Figure: the MALINK file from the match on SS# and the MALINK file from the match on Name are combined by Merge Files and then passed to Resolve.]



Multiple matches are run on Social Security Number and name. Record A matches to B, and on the other match record B matches to record C. Each run produces a MALINK file with the matches in it: A -> B & B -> C. The MALINK files are then combined.

Recid  Recid  Type  Pat
0002   0007   P     329
0002   0015   P     210
0007   0009   P     230
0015   0022   P     230

The Resolve Utility processes this file and produces the following output:

Recid  Recid
0007   0002
0015   0002
0009   0002
0022   0002

Transitivity would also show that, if record A matched to record B and record B matched to record C, then record A must also match to record C.
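Resolving transitive links of this kind is commonly done with a union-find (disjoint set) structure. The Python sketch below illustrates that technique on the combined link pairs from the example above. It makes no claim about how the Resolve Utility is implemented, and choosing the smallest record ID as the representative of each set is an assumption made to mirror the sample output.

```python
def resolve_links(links):
    """Map every linked record ID to one representative ID per connected set."""
    parent = {}

    def find(record_id):
        parent.setdefault(record_id, record_id)
        while parent[record_id] != record_id:
            parent[record_id] = parent[parent[record_id]]  # path halving
            record_id = parent[record_id]
        return record_id

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the smaller ID as the representative (an assumption).
            low, high = sorted((root_left, root_right))
            parent[high] = low

    # Report only the IDs that were recoded to a different representative.
    return {rid: find(rid) for rid in list(parent) if find(rid) != rid}


links = [("0002", "0007"), ("0002", "0015"), ("0007", "0009"), ("0015", "0022")]
resolved = resolve_links(links)
for record_id, representative in resolved.items():
    print(record_id, representative)
```

Each pair in the result plays the role of one output line: the first value is a record ID to recode and the second is the ID that represents its whole match set.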

The Resolve Utility’s output is then used to update the keys on the first linking’s output. This is typically done with the File Update Utility: any occurrence of the Recid in column 1 is updated to the value of the Recid in column 2. The File Update Utility’s output is then sorted on the updated key to group all recoded records together with their resolved match set.

Input and Output Settings

For input, the Resolve Utility takes merged MALINK files from the 2nd-Nth relationship linkers in a multi-link process. It resolves transitivity issues of matches from the runs which followed the first linking.

The Resolve Utility does not use a DDL for output.



Link Field

To link files

1. From the Resolve Utility step, click Advanced and navigate to Input, Settings.

2. In From Link Field, select the DDL field from the drop-down list. This DDL field contains the starting key of the link (generally located in the left column of the Relationship Linker’s link output file).

3. In To Link Field, select the DDL field from the drop-down list. This DDL field contains the ending key of the link (generally located in the right column of the link output file from the Relationship Linker).

In addition, you can specify the following settings:

Setting                  Description
Process Group Records    Number of records to process in a set. When the program reaches this limit, it writes output to the file in a resolved form. Buffers are created and processing continues.
Process Group Memory     Maximum memory to use in a set. This overrides the Process Group Records setting if both are used.




Set Selection Utility

The Set Selection Utility selects data from a file and then skips or selects that data on output. Selection of data occurs based on Match Keys, Select Record Conditions and Bypass Record Conditions. Field names are used for match keys.

This utility is useful when you select records based on relationship keys (created during the linking process) where you want a set of records to be evaluated for defined criteria.

Example

For example, if the Match Key is the Household_Number field, the program first selects records that have the Household_Number field in the input file.

Assume that, in the Select Record Conditions or Bypass Record Conditions, the condition is set to Household_Number=00001. In this case, the program selects records if the values in the Household_Number field equal 00001.

After running the program, you can verify the results of the select operation by viewing the output file in the Data Browser.

Input and Output Settings

The Set Selection Utility is usually used on the results of the Relationship Linker. In this case, use the Input File Name and Input DDL Name from the output of the Relationship Linker step.

Input files must be sorted by match keys.


Select Records

The selection is applied to records in the input file based on the Match Key field specified by the user. Records with the same match key values are selected and written to the output file.

To select records

1. From the Set Selection Utility step, click Advanced and navigate to Input, Settings.

2. In Match Key, select the DDL field from the drop-down list. The program selects records which have this field.

Figure 16.6 Select Match Key

To set a limit on number of records or key sets

You can specify the maximum number of records to be selected and the minimum number of records per Match Key set (a group of records with the same value) to be selected. These settings can be set separately for input and output.

1. Click Advanced and navigate to Input, Settings for input and Output, Settings for output.

2. Refer to the table below and specify the appropriate values:

Setting Description

Maximum Total Records

Numeric value greater than or equal to 1. It specifies the maximum total number of records of a specific key set to process.

Minimum Records Per Set

Numeric value. Any key set with a record count that is less than or equal to this value will be discarded without processing.

Maximum Records Per Set

Numeric value. Any key set with a record count that is equal to or exceeds this value will be discarded without processing.

Maximum Set

Numeric value that limits the total number of key sets to process.

To set a condition to select records

Next, you can specify a specific value for the match key that you want to select. Use Select and Bypass Conditions to do this.

1. Click Advanced and navigate to Input, Settings, Select Record Conditions.

2. Create the desired condition to select records. See “Select or Bypass Records” on page 5-37 for building a condition.

Figure 16.7 Select Conditions
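The selection behavior described in this section can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the utility's real logic: the function `select_sets`, its parameter names, and the dictionary record format are all hypothetical, and whether the real utility applies the select condition per record or per key set is not stated in this guide (the sketch applies it per record).

```python
from itertools import groupby
from operator import itemgetter


def select_sets(records, match_key, min_per_set=None, max_per_set=None,
                condition=None):
    """Yield records from key sets that pass the size limits and condition.

    `records` must already be sorted by `match_key`, which is why the
    utility requires sorted input: equal keys are then adjacent.
    """
    for _, group in groupby(records, key=itemgetter(match_key)):
        key_set = list(group)
        # Thresholds follow the wording of the settings table: a set at or
        # below the minimum, or at or above the maximum, is discarded.
        if min_per_set is not None and len(key_set) <= min_per_set:
            continue
        if max_per_set is not None and len(key_set) >= max_per_set:
            continue
        for record in key_set:
            if condition is None or condition(record):
                yield record


records = [
    {"Household_Number": "00001", "name": "A"},
    {"Household_Number": "00001", "name": "B"},
    {"Household_Number": "00002", "name": "C"},
]
# Discard key sets with one record or fewer.
multi_member = list(select_sets(records, "Household_Number", min_per_set=1))
# Select only household 00001, as in the example earlier in this section.
household_1 = list(select_sets(
    records, "Household_Number",
    condition=lambda r: r["Household_Number"] == "00001"))
```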




Sort Utility

The Sort Utility reads records from input data files and sorts them to produce a single output file. The single output file is created in a common shape with a single associated Data Dictionary Language (DDL) file.

The sort functions support up to 99 sort keys. During the sort step, you can select fields from input records to be written to the output. This process is controlled through the input and output DDL field-mapping function.

See Chapter 9 and Chapter 10 for details of the Sort Utility.




CHAPTER 17 Customizing the Control Center

Customizing the Control Center


TS Quality allows you to customize your work area by changing several configuration options:

fonts used within the Control Center

size and style of text

color used within the Control Center



Changing the Control Center Display Settings

You can change the way that items in the Control Center are displayed by modifying the following settings:

fonts for Menu, Text Viewer, Step Labels, Step Comments and Project Labels

color for background and arrows

Any changes made here become the default until more changes are made.

To open the Preference menu

1. Select Setup, Preferences. There are two tabs, General and Display.

The General tab allows you to decide which applications or functions launch when the Control Center opens.

See “Set Up the Control Center” on page 2-5 for descriptions for the General tab.

The Display tab lets you select a new default font, style, and size for the text appearing in the Menu, Text Viewer, Step Labels, Step Comments, and Project labels. You can also choose background and foreground colors for these items.

You can also right-click anywhere inside the Data Flow Architect (just not on a specific step module) and select Preference.

Figure 17.1 Preference Display Tab

To change the Font Settings

1. Select the Display tab. In the Category box, click the item that you want to change.

2. In Font Selection, select the new font, style, and size from the pull-down menus on the right. As you make your changes, the text in the Sample box reflects the changes.

3. Click OK. The Preferences window closes and the new font settings are applied to the selected item.

To change the Color Settings

1. Select the Display tab. In the Category box, click the item that you want to change. The Color section becomes active.



2. Click either the Foreground or Background button, depending on which color you want to change. These buttons invoke separate identical windows.

Swatches

The Swatches tab lets you choose a color from a palette of preset colors:

a. Click a color in the palette. The Preview section below displays the selected color scheme. As you select a color from the palette, your choice will be recorded in the Recent: grid on the right. The far-left box in the first row of the grid will be filled with the selected color. Each time you select a color, the rest of the boxes will be filled in.

b. Click OK. The Foreground/Background window closes.

HSB

The HSB tab lets you define the color by Hue (the color’s tint), Saturation (the hue’s purity), and Brightness (the color’s brightness):

a. Select the color component that you want to change by selecting either the Hue, Saturation, or Brightness radio button. The Color box on the left changes based on your selection.

b. Move the slider up or down to shift through the color spectrum. The Color box on the left adjusts accordingly.

c. When you want to see how a color will appear, click on the section of the Color box where the color appears. The Preview section displays the selected color scheme.

d. Click OK. The Foreground/Background window closes.

RGB

The RGB tab allows you to define a color as a combination of the Red, Green, and Blue primary colors.

a. Drag the sliders for the respective colors to the left or right. The Preview section updates as the color changes.



b. When you are satisfied with your selection, click OK. The Foreground/Background window closes.

3. Once you are satisfied with the changes, click OK in the Preferences window. The new color settings will be applied.

To change the background color

1. Right-click anywhere inside the Data Flow Architect (not on a specific step module) and select Preference, Background Color.... The Background Color Chooser window opens.

2. Refer to the steps above for changing the color settings.

3. Click OK. The new color settings will be applied.

To change the arrow color

1. Right-click anywhere inside the Data Flow Architect (not on a specific step module) and select Preference, Arrow Color.... The Arrow Color Chooser window opens.

2. Refer to the steps above for changing the color settings.

3. Click OK. The new color settings will be applied.



APPENDIX A The Data Dictionary Language and DDL Types

The Data Dictionary Language and DDL Types


The Data Dictionary Language

The Data Dictionary Language (DDL) is a method for defining data file and record layouts. The Data Dictionary Language file, commonly referred to as DDL or DDL file, is a collection of keywords that contains the definitions of the input and output files that are used by TS Quality. DDL input and output files must be defined for each module in TS Quality.

See Chapter 2, “Working with a Project” for structure and components of DDLs and how to create them.



Data Dictionary Language (DDL) Types

The Type must be specified for every DDL field entry. There are four Type categories: Encoding (code page), Trillium Types, Date Format, and Class Keyword.

Encoding (Code Page)

Encoding is a mapping of binary values to code positions that represent characters of data. It is also called a code page. The following table lists the main character encodings used in TS Quality.

Note that some of the encodings below may not be available, depending on the chosen module or GUI Tool. Contact Customer Support for more information.

Table A-1: DDL Encoding

Type Description

NOTRANS NOTRANS means No Translation. The operations will be done in the default encoding for the host computer.

NOTE: Users need to be careful that the data will not be translated into their native encoding. For example, if a data file from Greece is run on a computer in the US and both the settings files and all of the fields in the DDL are set to NOTRANS, you will likely get a different result than if the same project was run in Greece.

ASCII American Standard Code for Information Interchange. A 7-bit encoding for representing English characters.

BIG5 Traditional Chinese

CCSID937 Traditional Chinese

CP037 EBCDIC, IBM037

CP1250 Latin 2, Eastern European



CP1251 Cyrillic (Slavic)

CP1252 Latin1 (ANSI)

CP1253 Greek

CP1254 Turkish

CP1255 Hebrew

CP1256 Arabic

CP1257 Baltic

CP1258 Vietnamese

CP932 Microsoft Extended Shift-JIS Japanese

CP936 Simplified Chinese, GBK

CP949 Korean

CP950 Traditional Chinese

EUCCN Simplified Chinese, Unix, GB2312, EUC-SC

EUCJP Japanese, Unix, EUC-JP, EUC-J, JEUC, J-EUC, EUCJ

EUCKR Korean, Unix, EUC-KR, KS_C_5601-1992

EUCTW Traditional Chinese, Unix, CNS-11643, CNS-11643-1992

GB12345 Traditional Chinese

HZGB2312 Simplified Chinese, HZ-GB-2312

IBM-83-4040, IBM-83-4242 Japanese corporate kanji code

ISO2022JP Japanese, ISO-2022-JP

ISO-8859-7 Latin/Greek

ISO 8859-9 Latin-1 modification for Turkish (Latin-5)


JEF-83-A1A1, JEF-83-4040, JEF-78-A1A1, JEF-78-4040 Japanese corporate kanji code. Fujitsu.

JOHAB Korean

KEIS-83-A1A1, KEIS-83-4040, KEIS-78-A1A1, KEIS-78-4040 Japanese corporate kanji code. Hitachi.

LATIN1 ISO 8859-1

LATIN2 ISO 8859-2

LATIN4 Baltic

LATIN7 Baltic

LATIN9 ISO 8859-15, Latin1 + Euro symbol and accented characters

UCS2 The encoding of Unicode as 16-bit values. This is the default transformation format of Unicode. UCS2 is the same as UTF16.

UTF7 The encoding of Unicode as 7-bit values that can be transmitted safely via E-mail (MIME messages).

UTF8 The encoding of Unicode as 8-bit values. In this encoding, all ASCII characters are represented by themselves, and all bytes of multi-byte characters have the eighth bit turned on. UTF8 is the default encoding for XML.

UNICODE20:BIG-ENDIAN Unicode with the most significant byte first. Other name: big-endian

UNICODE20:LITTLE-ENDIAN Unicode with the least significant byte first. Other name: little-endian

Trillium Types

A Trillium Type is the data type of a DDL field. The following table lists the Types used in TS Quality. Many of these Types can be used together (example: PACKED DECIMAL).

Table A-2: Trillium Types

Type Description

ASCII NUMERIC Numeric characters in ASCII.

BITFIELD BITFIELD type is an array of bits embedded within a byte or an array of bytes. They are treated as right-justified unsigned integers. The length is specified by the LENGTH statement. The starting bit position is specified by the POSITION statement. Numbering schemes for identifying the position of bits are: little-endian - the smallest position number is at the far right of the entity, big-endian - the smallest position number is at the far left of the entity. One is used as the starting counting position.

BOOLEAN BOOLEAN may also be qualified as INTEGER. They are treated as right justified binary integers; however, fields with the value of zero are considered to be equal to FALSE, while fields with a non-zero value are considered to be TRUE.

BINARY Binary data type.

INTEGER INTEGER types may be signed or unsigned. They are treated as right justified binary integers of the length specified in the LENGTH statement. Integers may also be qualified as BOOLEAN.

PACKED PACKED types can be signed or unsigned. They are treated as right-justified packed decimal digits of the length, in bytes, specified in the LENGTH statement. Since packed decimals are stored two digits to a byte, the total number of digits is twice the length for UNSIGNED PACKED and twice the length minus 1 for signed PACKED. For signed values the right-most nibble holds the sign value.



ZONED DECIMAL The ZONED DECIMAL type is treated as EBCDIC NUMERIC characters with the least significant byte divided into a numeric digit and a sign. The sign occupies the least significant nibble of the byte and follows the conventions for PACKED decimal signs.
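To make the PACKED layout in Table A-2 concrete, here is a minimal decoding sketch. It is not taken from TS Quality: the function name `unpack_decimal` is hypothetical, and the sign-nibble values (0xB and 0xD for negative, anything else positive) are an assumption based on the common mainframe packed-decimal convention, since this guide does not list them.

```python
def unpack_decimal(data: bytes, signed: bool = True) -> int:
    """Decode a packed-decimal field (two digits per byte) into an integer."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)     # high nibble
        nibbles.append(byte & 0x0F)   # low nibble

    sign = 1
    if signed:
        # For signed values the right-most nibble holds the sign, leaving
        # twice the length minus 1 digits.
        sign_nibble = nibbles.pop()
        # Assumed convention: 0xB and 0xD mean negative.
        if sign_nibble in (0x0B, 0x0D):
            sign = -1

    value = 0
    for digit in nibbles:
        if digit > 9:
            raise ValueError(f"invalid packed digit: {digit:#x}")
        value = value * 10 + digit
    return sign * value


positive = unpack_decimal(bytes([0x12, 0x34, 0x5C]))
negative = unpack_decimal(bytes([0x12, 0x34, 0x5D]))
unsigned = unpack_decimal(bytes([0x12, 0x34]), signed=False)
```

A 3-byte signed field therefore carries 5 digits, and a 2-byte unsigned field carries 4, which matches the digit counts given in the PACKED description.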

Date Format

Date format is a type of data that may contain only valid dates. The following table contains a list of valid date formats.

Table A-3: DDL Date Format

Type Data Format

ASCII AMERICAN MM(/)DD(/)YYYY. 8 or 10 bytes.

ASCII EUROPEAN DD(/)MM(/)YYYY. 8 or 10 bytes.

ASCII JULIAN (YY)YY(/-)DDD. 5, 7, or 8 bytes.

ASCII LONG JULIAN YYYY(/-)DDD. 7 or 8 bytes.

ASCII YEAR FIRST YYYY(/)MM(/)DD. 8 or 10 bytes.

EBCDIC AMERICAN MM(/)DD(/)YYYY. 8 or 10 bytes.

EBCDIC EUROPEAN DD(/)MM(/)YYYY. 8 or 10 bytes.

EBCDIC JULIAN (YY)YY(/-)DDD. 5, 7, or 8 bytes.

EBCDIC LONG JULIAN YYYY(/-)DDD. 7 or 8 bytes.

EBCDIC YEAR FIRST YYYY(/)MM(/)DD. 8 or 10 bytes.

PACKED AMERICAN 0MMDDYYYY. 5 bytes.

PACKED EUROPEAN 0DDMMYYYY. 5 bytes.

PACKED JULIAN (YY)YYDDD. 3 or 4 bytes.

PACKED LONG JULIAN YYYYDDD. 4 bytes.

PACKED YEAR FIRST 0YYYYMMDD. 5 bytes.

UNSIGNED PACKED AMERICAN MMDDYYYY. 4 bytes.

UNSIGNED PACKED EUROPEAN DDMMYYYY. 4 bytes.

UNSIGNED PACKED JULIAN 0(YY)YYDDD. 3 or 4 bytes.

UNSIGNED PACKED LONG JULIAN 0YYYYDDD. 4 bytes.



UNSIGNED PACKED YEAR FIRST YYYYMMDD. 4 bytes.

SJIS IMPERIAL DATE Japanese date format with imperial calendar. CP932 or Shift-JIS encoding only.

Example:

昭和 30 年 1月 1日

You must use valid month/day combinations. If the month/day is invalid, the output data is blanked out.

SJIS JAPANESE DATE Japanese date format with Gregorian calendar. CP932 or Shift-JIS encoding only.

Example:

1997 年 1 月 1 日


ASCII ROMAJI IMPERIAL DATE Japanese date format with shortened imperial year. ASCII encoding only.

Example:

S35-1-1


CLASS Keyword

The CLASS keyword specifies the format to be used for the date field. By using the CLASS keyword, you can convert any 2-digit year into a 4-digit year.

The following table describes all specifications for the CLASS keyword.

Table A-4: DDL CLASS Keyword

Statement Description

DATE FORWARD Converts any 2-digit year into a 4-digit year when the data value is equal to, or greater than, the current year.

Top of date window = current year + 99
Bottom of date window = current year

Example

If the current year is 2005:
Top of date window = 2104 (2005 + 99 = 2104)
Bottom of date window = 2005

DATE BACKWARD Converts any 2-digit year into a 4-digit year when the data value is equal to, or less than, the current year.

Top of date window = current year
Bottom of date window = current year - 99

Example

If the current year is 2005:
Top of date window = 2005
Bottom of date window = 1906 (2005 - 99 = 1906)


DATE WINDOW {nnn}

Converts a 2-digit year into a 4-digit year, according to a user-specified date window. You can specify a 1- to 4-digit number in {nnn}. The window is determined as follows:

If {nnn} > 100 and {nnn} > current year, then {nnn} is the top of the date window.
Top of date window = {nnn}
Bottom of date window = top of the date window - 99

Example: If the current year is 1999 and CLASS IS DATE WINDOW 2030:
Top of date window = 2030 (2030 > 100 and > the current year)
Bottom of date window = 1931 (2030 - 99 = 1931)

If {nnn} > 100 and {nnn} < current year, then {nnn} is the bottom of the date window.
Top of date window = bottom of the date window + 99
Bottom of date window = {nnn}

Example: If the current year is 1999 and CLASS IS DATE WINDOW 1967:
Top of date window = 2066 (1967 + 99 = 2066)
Bottom of date window = 1967 (1967 > 100 but < the current year)

If {nnn} > 0 and {nnn} < 100, then the top of the date window is the current year + nnn.
Top of date window = current year + nnn
Bottom of date window = current year + nnn - 99

Example: If the current year is 1999 and CLASS IS DATE WINDOW 30:
Top of date window = 2029 (30 > 0 but < 100, 1999 + 30)
Bottom of date window = 1930 (1999 + 30 - 99)

If {nnn} < 0, then the bottom of the date window is the current year minus the absolute value of nnn.
Top of date window = current year - |nnn| + 99
Bottom of date window = current year - |nnn|

Example: If the current year is 1999 and CLASS IS DATE WINDOW -30:
Top of date window = 2068 (1999 - 30 + 99)
Bottom of date window = 1969 (-30 < 0, 1999 - 30)
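The four DATE WINDOW cases can be checked with a short Python sketch. This is an illustration of the arithmetic in the table only: the function names `date_window` and `expand_year` are hypothetical, and the way a 2-digit year is placed inside the window (the single year in the 100-year window with matching last two digits) is an assumption about how the window is applied. Values the table does not cover, such as 0 or 100, raise an error here rather than guessing.

```python
def date_window(current_year, nnn):
    """Return (bottom, top) of the window for CLASS IS DATE WINDOW nnn."""
    if nnn > 100 and nnn > current_year:
        top = nnn
        return top - 99, top
    if nnn > 100 and nnn < current_year:
        bottom = nnn
        return bottom, bottom + 99
    if 0 < nnn < 100:
        top = current_year + nnn
        return top - 99, top
    if nnn < 0:
        bottom = current_year + nnn  # nnn is negative, so this subtracts
        return bottom, bottom + 99
    raise ValueError("value not covered by the documented cases")


def expand_year(two_digit_year, bottom):
    """Map a 2-digit year to the one year in [bottom, bottom + 99] that
    ends in those two digits (assumed interpretation of the window)."""
    return bottom + (two_digit_year - bottom) % 100


# The four examples from the table, with the current year set to 1999.
window_2030 = date_window(1999, 2030)
window_1967 = date_window(1999, 1967)
window_30 = date_window(1999, 30)
window_minus_30 = date_window(1999, -30)
```

Under the same interpretation, DATE FORWARD corresponds to a window of (current year, current year + 99) and DATE BACKWARD to (current year - 99, current year).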


APPENDIX B Parser Review Code

Parser Review Code


Parser Results

The Parser generates Completion Codes and Review Codes to identify specific conditions that occur for each record being parsed. You can review these codes to analyze the Parser results.

Parser Completion Codes (CDP/BDP)

Table B-1 shows the return codes that appear in the pr_completion_code or bp_completion_code field, in output from the Customer Data Parser or Business Data Parser Repository.

Several errors (2, 3, 4, 5, 8, D) are caused by inaccuracies in the file path.

Table B-1: Parser (CDP/BDP) Completion Codes

Value Description

0 No error

1 Insufficient Storage. When using DDL field sub-segments (line_01a, line_01b, etc.) and the sum of the data in these fields exceeds the redefine field length, the data is truncated and a value of ‘1’ is returned. Processing continues normally for all other lines.

2 Table Error – Pattern, Word and/or City tables not found

3 Log File Error

4 Detail File Error

5 Pattern-Word-City Tab Error – Pattern, Word and/or City tables not readable.

6 Too Many Tokens

7 Line Definition Error

8 Display File Error

9 Invalid Parser Handle

A Invalid Parm File Entry

B Invalid Interface Call Type. Must be one of: O=Open, P=Process, C=Close

C Invalid Service Call Type. Must be either: D=Send to Display File, E=Supply Error Text

D Statistics File Error

F Parser not successfully initialized. Settings file may not be correctly defined. Check path and file name.

Customer Data Parser Review Code/Review Groups

Review codes are produced for many different data conditions. These codes can be evaluated in a post-parsing process to trigger specific record handling or review. For instance, if a business wanted to review every record that had received a review code of 26 (Unknown street pattern), a subsequent step following the parsing process could redirect all records with this condition by selecting the records with this code.

The code values are represented on the record by position in the DDL field that corresponds to the line type that contained the condition. The field names that must be used in the CDP output DDL to contain these codes are:

pr_name_review_codes

pr_street_review_codes

pr_geog_review_codes

pr_misc_review_codes

pr_global_review_codes

For each of these fields, a flag value of '1' is placed in the position in the field that corresponds to the value of the condition. So in our earlier example, where a review code of 26 was reported, you would find a '1' in the field pr_street_review_codes at position 26.

Table B-2 lists review codes, review groups, and descriptions for the Customer Data Parser.
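Because review codes are stored as positional flags, a post-parsing step has to translate flag positions back into code numbers. The sketch below shows one way to do that. The function name `review_codes` is hypothetical, and two details are assumptions: positions are counted from 1 (so code 26 is the 26th character), and any character other than '1' means the code was not raised.

```python
def review_codes(flags):
    """Return the review codes whose position in the field holds a '1'."""
    # Positions are assumed to start at 1, so code 26 maps to flags[25].
    return [position
            for position, flag in enumerate(flags, start=1)
            if flag == "1"]


# A hypothetical pr_street_review_codes value with only position 26 set.
street_flags = " " * 25 + "1" + " " * 10
street_codes = review_codes(street_flags)
```

A record could then be routed for review by testing, for example, whether 26 appears in the decoded list.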

Table B-2: CDP Review Codes and Review Groups

Review Code Review Group Description

Review Codes Can Belong To Multiple Review Code Fields:

000 000 No review code found

Name Codes

1 008 Unknown name pattern

2 009 Standardized first name too long

3 009 Display first name too long

4 009 Total number of export names gt max

5 009 Standardized middle name too long

6 009 Display middle name too long

7 009 Too many middle names

8 009 Standardized last name too long

9 009 Display last name too long

10 009 Standardized title too long

11 009 Display title too long

12 009 Too many titles

13 009 Standardized connector too long

14 009 Display connector too long



15 009 Standardized relation too long

16 009 Display relation too long

17 009 Standardized business too long

18 009 Display business too long

19 009 Derived genders conflict

20 009 Standardized generation too long

21 009 Display generation too long

22 010 More than one middle name

Street Codes

26 011 Unknown street pattern

27 011 Standardized street type too long

28 011 Display street type too long

29 011 Too many street types

30 012 Standardized direction too long

31 012 Display direction too long

32 012 Too many directions

33 013 Standardized street title too long

34 013 Display street title too long

35 013 Standardized complex name too long

36 013 Display complex name too long

37 013 Standardized house number too long

38 013 Display house number too long


39 013 Unusual house number

40 013 Display dwelling too long

41 013 Standardized dwelling too long

42 013 Too many dwellings

43 013 Unusual dwelling value

44 013 Too many dwelling values

45 013 Display box too long

46 013 Standardized box too long

47 013 Unusual box value

48 013 Display route too long

49 013 Standardized route too long

50 013 Standardized route number too long

51 013 Display route number too long

52 013 Unusual route value

53 013 Standardized complex type too long

54 013 Display complex type too long

55 013 Standardized dwelling number too long

56 013 Standardized box number too long

57 013 Display box number too long

58 013 Display dwelling number too long

59 020 Duplicate street line types





Geography Codes

61 014 No city name found in records

62 014 No state found in records

63 014 Standardized city too long

64 014 Display city too long

66 015 Standardized state/province/county too long

67 015 Display state/province/county too long

70 015 Standardized country too long

71 015 Display country too long

72 015 Standardized neighborhood too long

73 015 Display neighborhood too long

74 015 Standardized post code too long

75 015 Display post code too long

76 015 Unusual post code value

77 016 Corrected city name too long

78 000 City name change used for city

79 017 Conflicting geographic types

80 018 Domestic city name present but could not be verified

Global Review Codes

83 001 Unidentified token

84 019 Unidentified line

85 001 Invalid token definitions


86 001 Label or label element too long

87 001 Miscellaneous data for line too long

88 001 Too many categories

89 001 Too many names for export

90 002 Mixed name forms present

91 003 Hold mail element present

92 004 Foreign address element found

93 005 No names identified

94 006 No street identified

95 007 No geography identified

96 - 99 Currently unassigned

Review Group Hierarchy

Table B-3 displays the default review group hierarchy for the Customer Data Parser. The review group code is placed in the PREPOS field: pr_rev_group.

The Review Group Order setting (Process, Settings) can be used to modify the group hierarchy.

Table B-3: CDP Review Group Hierarchy

Review Group Text of Parser Report Description

001 Unidentified token

005 No names identified No name found on the record. For example: 12 main street Boston MA 01123

006 No street identified No street information found on the record. For example: John Smith Boston MA 01123

007 No geography identified No geography information found on the record. For example: John Smith 12 main street

014 No city or county identified Record did not contain city or county, or could not be identified

019 Unidentified Line Line type could not be determined, and is set to ?

008 Unknown name pattern Pattern for name format does not exist in table. For example: John Smith B A C D 12 main street Boston MA 01123

011 Unknown street pattern Pattern for street format does not exist in table

013 Unusual or long address When the length of the street name exceeds 25 bytes as defined in prepos.ddl

012 Invalid directional Direction is inconsistent

017 Conflicting geography types The country default is US and the valid city state is followed by foreign type postal code. For example: Mr John Smith 12 main street Boston MA A1C 3R4

015 Geography too long The length of the geography exceeds 30 bytes as defined in prepos.ddl

018 Unable to verify city name City name cannot be identified

016 Corrected city name too long Table entry for a city change recode exceeds 25 bytes as defined in prepos.ddl

020 Multiple street line types More than one street line is found on the record

010 More than one middle name Two or more middle names were found on the name line. For example: John Adam Wilson Smith 12 main street Boston MA 01123

009 Derived genders conflict The title and first name gender value are different. For example: Miss John Smith 12 main street Boston MA 01123

004 Foreign address Parser found a geography element outside the country that the Parser is running. For example: John Smith 12 main street Boston France 01123

003 Hold mail One of the lines on the record is of type H (such as Return Mail). For example: John Smith Return Mail 12 main street Boston MA 01123

002 Mixed name forms A business and personal name were both found on the record. For example: John Smith ABC corp 12 main street Boston MA 01123

000 No review code found No identifiable error on record. For example: John Smith 12 main street Boston MA 01123

Business Data Parser Review Code

Table B-4 lists the review codes and descriptions for the Business Data Parser.

Table B-4: BDP Review Code

Review Code Description

Review Codes can belong to multiple Review Code Fields:

082 Unidentified pattern

087 Miscellaneous line too long

086 Label Line too long

088 Too many categories found

083 Unknown token

090 No data found

000 No targeted conditions found


Customer Data Parser Review Codes/Review Groups for Asia-Pacific Countries

Review Codes

The Customer Data Parser for China, Korea, and Taiwan generates review codes for each record to highlight specific conditions describing how the record was processed. Review codes are written to the pr_review_code field. The following table lists the individual review codes.

Review Code Description

000 No review codes written.

009 Unknown token remaining after processing.

010 Unknown tokens remaining after matching front and back parts of string.

041 Business branch value too long.

042 Business type value too long.

043 Business name value too long.

044 No business name.

045 Business name and type too long for field.

051 Surname value not found in lookup table.

052 Surname value too long.

053 First (given) name value too long.

054 No surname.

055 No given name.

056 Honorific not found in table.

057 Honorific too long.

058 Unidentified token.


921 Recoded business name according to a word/pattern table entry to correct a mistyped or misused word.

Review Groups

Review groups are groups of review codes that illustrate the types of conditions present in the data, whereas review codes describe actual specific conditions. Thus, review groups provide a way for users to quickly understand the general types of conditions occurring in the data. Review groups are written to the pr_review_group field.

Review Group Description

002 Missing name (no business name and missing either first or last name).

003 Business name does not contain a business keyword from the word/pattern file.

004 There are unknown tokens in the business or personal name.

005 There is a contact name in the business name.


Exception Status

The Customer Data Parser for Japan generates an exception status for each record to highlight specific conditions. Exception statuses are written to the pr_status/pr_h_status fields. The following table shows the values for the exception status.

Value Description Mode

00 No specific condition occurred. ALL

20 No input string found (including comments deleted). ALL

22 Unknown token found in the final mask. ALL

30 Multiple business types found. BNPCLUE

31 Only business type found. BNPCLUE

32 The record consists of branch name or branch name suffix only. BNPCLUE

33 *The entire parsed string is alphabetic with word delimiters, and no business clue found. BNPCLUE

34 The entire parsed string is in katakana with word delimiters, and no business clue found. BNPCLUE

35 The entire parsed string is in hiragana with word delimiters, and no business clue found. BNPCLUE

36 Multiple words (space delimited) in kanji and no business clue found. BNPCLUE

40 Single person mode is ON and more than two words found. PNP

* Word separation: spaces between characters, and characters set for PNP_DELIMITER.


Index

A
ABSOLUTE
    Data Comparison Calculator 11-21
ALPHA
    Intrinsic Attributes
        Parsing Customization 7-17
Asian Characters
    operators 5-33
Associativity
    Data Reconstructor 13-6
Asterisks (*)
    Parsing Customization 7-21
Attribute
    DDL Editor 3-11
Attribute Modifiers
    Category 7-11
    Function 7-11
    Gender 7-11
    Parsing Customization 7-10
    Recode 7-12
Attributes
    Parsing Customization 7-10

B
Batch Script
    create a script 14-3
    Edit a script 14-4
    Run a script 14-5
batch script 14-3
Binary Data Strings
    Data Reconstructor 13-9
blanks
    Frequency Count Utility 16-17
BNP 6-11
BNP_CLUE 6-11
Build a Conditional Statement 5-35
Business Attribute 6-18
Business Data Parser
    BPREPOS 6-29
    include unknowns in standard original field 6-34
    populate unknown patterns 6-34
    Repository DDL File 6-30
    retain original values 6-34
Business Data Parser (BDP) 6-27
Business Data Parser Review Code B-11

C
Category
    Attribute Modifiers 7-11
Character Translation 5-9
City Directory File 6-17
City Name Changes
    Locality 7-15
    Post Town 7-15
CJKTOARABICNUM operator 5-30
CJKTOFULL operator 5-29
CJKTOHALF operator 5-28
Class
    DDL A-10
    DDL Editor 3-11
CLASS Keyword A-10
Class keyword
    DDL 2-42
collating sequence
    Sort Utility 9-5
Collating sequence
    ASCII 9-6
    EBCDIC 9-6
    FOLDED_ASCII 9-6
    FOLDED_EBCDIC 9-6
    MULTI_NATIONAL 9-6
Command line execution
    Program names 15-4
    syntax 15-3
Comment
    DDL Editor 3-11
Comment Lines
    Parsing Customization 7-21
Comments
    Data Reconstructor 13-7
Common Fields
    Create Common Utility 12-6
Commonization
    Create Common Utility 12-3
Comparison Routine
    Data Comparison Calculator 11-21
Comparison Routines
    Relationship Linker 10-13
Completion Codes
    Business Data Parser 6-36
    Customer Data Parser 6-25
COMPOSE or COMP 5-30
Conditionals 5-21
    Logic Builder 5-35
    Operators 5-26
    Syntax 5-21
        IF/ELSE Statement 5-21
Control Center
    Data Flow Architect 2-20
    Graphics View 2-21
    List View 2-26
    Project Panel 2-16
    Project Viewer 2-17
    Step Viewer 2-19
Conventions
    Parsing Customization 7-21
Country Settings 4-7
Create Common
    Decision Routines 12-12
Create Common Utility 12-3
    Common Fields 12-6
    Commonization 12-3
    Decision Routines 12-7, 12-8
    Match Key Level settings 12-6
    Survivor record 12-8
    Survivorship 12-3
Create New Project Wizard 2-3
Customer Data Parser
    Join name lines 6-22
CTOSIMPCHINESE operator 5-30
CTOTRADCHINESE operator 5-30
Customer Data Parser
    Business Attribute 6-18
    City Directory File 6-17
    INPUT_LINE_01 6-3
    Line Definitions 6-19
    Name Generation 6-21
    Parsing Logic Flow 6-4
    PREPOS 6-12
    Preprocess House Number 6-18
    Repository DDL File 6-14
    Review Group Hierarchy B-8
    Split address lines 6-23
    Word Pattern Definition File 6-17
Customer Data Parser (CDP) 6-3
Customer Data Parser Review Code B-3
Customer Data Parser Review Groups B-3
Customer Data Parser
    Exceptions File 6-16
Customized Definitions Table
    Parsing Customization 7-3
Customizing the Control Center 17-3
    Color settings 17-4
    Display tab 17-3
    Font settings 17-4
    General tab 17-3

D
Data Browser 3-3
    Field Selection 3-4
    Save view 3-6
Data Comparison Calculator 11-21
    ABSOLUTE 11-21
    Comparison Modifiers 11-21
    Comparison Routines 11-21
    comparison test 11-21
    PARTIAL1 11-21
    Score 11-21
Data Dictionary Editor 3-9
Data Dictionary Language (DDL) 2-33
Data Flow Architect
    Control Center 2-20
Data Reconstructor 13-3
    Associativity 13-6
    Binary Data Strings 13-9
    Comments 13-7
    Conditions 13-11
    Fields 13-4
    literal values 13-4
    operators 13-6
    Precedence 13-6
    Reserved Words 13-5
    rules file 13-3
    rule script language 13-4
    Rules File
        Action Statements 13-15
        Logical Operators 13-13
        String Variables 13-19
    Use Rule 13-22
Data Reconstructor Rules 13-5
Date format
    DDL 2-42
DDL
    Attributes 2-35
    CLASS 2-36
    Class keyword 2-42
    Comment 2-35
    Date format 2-42
    DDL Builder 2-37
    Default 2-35
    Encoding 2-42
    Field Name 2-34
    Keywords 2-34
    Length 2-35
    methods of creating 2-33
    Record Length 2-34
    Record Name 2-34
    Redefine 2-34, 2-40
    Start Position 2-34
    syntax 2-39
    text format 2-33
    Type 2-34
    Type keyword 2-42
    XML format 2-33
DDL Editor
    Attribute 3-11
    Class 3-11
    Comment 3-11
    Default 3-11
    Field Name 3-10
    Length 3-10
    Record Length 3-10
    Record Name 3-10
    Redef 3-10
    Start Position 3-10
    Type 3-10
    Update ORIGINAL_RECORD Length 3-10
DDL Types A-3
Decision Routines
    Create Common 12-12
    Create Common Utility 12-7, 12-8
Default
    DDL Editor 3-11
Delete
    Parsing Customization 7-7
Delimited file
    creating a DDL 2-38
Delimited Files
    DDL considerations 2-33
Delimiters
    File Display Utility 16-5
Director
    Cleansing Server 14-12
    Matching Server 14-12
    Real-Time Processing 14-11
Dual Address Information 9-17

E
Encoding
    DDL 2-42
Error Report
    Field and Pattern Lists 11-18
Exporting projects 14-7

F
Field and Pattern Lists
    Error Report 11-18
Field Files
    Relationship Linker 10-21
Field List Editor
    Comparison Routine 11-14
    Description 11-14
    Field Name 11-14
    Propagation Routine 11-14
    Routine Modifier 11-14
    Score 11-14
Field Name
    DDL Editor 3-10
Field Scanning 5-10
Field Selection
    Data Browser 3-4
Field Settings
    File Display Utility 16-8
Fields
    Data Reconstructor 13-4
File Display Utility 16-3
    Delimiters 16-5
    Field settings 16-8
    Inner Key 16-3
    Outer Key 16-3
File Qualifier 5-4
File Update Utility 16-10
    master file 16-10
    match key 16-10, 16-15
    transaction file 16-10
Frequency Count Utility 16-17
Full-width (Zenkaku) and half-width (Hankaku) Japanese Characters 5-31
Function
    Attribute Modifiers 7-11

G
Gender
    Attribute Modifiers 7-11
Global Data Router 4-3
    Country Rules file 4-6
    Country Settings 4-7
    DDL Settings 4-9
    Fields Settings 4-9
    Global Geography Table 4-6
    Global Rules file 4-6
    NOMATCH file 4-4
    Rules Files 4-6
    Separate Output 4-3
    Single Output 4-4
Global Geography Table 4-6
Grade Pattern Editor
    Category 11-14
    Field Name Columns 11-15
    Pattern ID 11-14
Graphics View
    Control Center
        Data Flow Architect 2-21

H
Help
    Control Center 2-8
HIRAGANASTOL operator 5-30
How to Use Operators for Asian Characters 5-33
HYPHEN

I
IF Statements
    Data Reconstructor 13-10
IF/ELSE
    Data Reconstructor 13-3
Import projects
    Windows to Unix 14-10
Importing projects 14-7, 14-9
Inner Key
    File Display Utility 16-3
Insert
    Parsing Customization 7-7
Intrinsic Attributes
    Parsing Customization 7-17

J
JKANATOROMAN operator 5-29
JROMANTOKANA operator 5-29

K
KTOROMAN operator 5-29

L
Length
    DDL Editor 3-10
Line Definitions 6-19
Line Lengths
    Parsing Customization 7-21
Line Type
    Geography 7-8
    Miscellaneous 7-8
    Name 7-8
    Street 7-8
Line Types
    Parsing Customization 7-8
Linking File
    Relationship Linker 10-18, 10-25
List View
    Control Center
        Data Flow Architect 2-26
literal data string
    Frequency Count Utility 16-17
Literal Values
    Data Reconstructor 13-4

M
MALINK
    Resolve Utility 16-27
Mask 5-17
    Transformer 5-17
mask shapes
    Frequency Count Utility 16-17
Masks
    Parsing Customization 7-6
    Recodes
        Parsing Customization 7-12
master file
    File Update Utility 16-10
Match Key
    File Update Utility 16-10, 16-15
    Set Selection Utility 16-31
Match Key Level Settings
    Create Common Utility 12-6
Match Level Codes
    Postal Matchers 9-15
Match Master Duplicate File
    File Update Utility 16-13
Match Master File
    File Update Utility 16-13
Match Tran Duplicate File
    File Update Utility 16-15
Match Tran File
    File Update Utility 16-14
Matching
    TS Quality Analyzer 8-4
Merge Files
    Merge Split Utility 16-21
merge keys
    Merge Split Utility 16-19
Merge Split Utility 16-19
    merge keys 16-19
    split rules 16-19
Modify
    Parsing Customization 7-7
multi-linking
    Resolve Utility 16-27
Multiple Definitions
    Parsing Customization 7-15

N
Name and Address Format
    project 2-13
NUMERIC

O
Operations
    Parsing Customization 7-6
Operators
    Data Reconstructor 13-6
Operators for Asian Characters 5-28
Outer Key

P
Parser Customization Editor
    Parsing Customization 7-31
Parser Tables 6-17
Parsing Customization
    Attribute Modifiers 7-10
    Attributes 7-10
    City Name Changes
        for non-US cities 7-14
        for US Cities 7-14
    Comment lines 7-21
    Conventions 7-21
    Customized Definitions Table 7-3
    Delete 7-7
    Insert 7-7
    Line Lengths 7-21
    Line Types 7-8
    Masks 7-6
    Modify 7-7
    Multiple Definitions 7-15
    Patterns 7-15
    Phrase 7-5
    Quotation Marks 7-21
    Special Entries 7-14
    Standard Definitions Table 7-3
    Sub-tokens 7-5
    Synonym 7-12
    Syntax of Definitions 7-4
    Tokens 7-4
    User-defined Attributes 7-10
PARTIAL1
    Data Comparison Calculator 11-21
Partition Method
    Merge Split Utility 16-23
Pattern Files
    Relationship Linker 10-21
pattern problems
    Parsing Customization 7-37
        Bad Name Patterns 7-37
Patterns
    Parsing Customization 7-15
Phrase
    Parsing Customization 7-5
PNP 6-10
Portion of a Field
    Data Reconstructor 13-8
Positions
    Beginning 7-9
    Default 7-9
    Ending 7-9
    Parsing Customization 7-9
Postal Directory Browser 9-20
    City Level 9-20
    Street Details 9-22
    Street Level 9-21
Postal Matchers 9-9
    Census Tables 9-9
    DPV Tables 9-9
    Match Level Codes 9-15
    Postal Base Data File 9-12
    Postal Directories 9-9
    Postal Form Customer 9-13
    Postal Form Database Date 9-13
    Postal Form File 9-12
    Postal Form Job Number 9-13
    Postal Form List 9-13
    Postal Level1 Data File 9-12
    Postal Level2 Data File 9-12
Prcustom 6-16
    Business Data Parser 6-32
Precedence
    Data Reconstructor 13-6
Preferences
    Customizing the Control Center 17-3
    General 2-6, 2-7
    Help 2-8
Preprocess House Number 6-18
Program Names
    Command line execution 15-4
project
    Control Center
        project step 2-28
    creating 2-9
    input data and input DDL 2-12
    multi-country 2-11
    Name and Address Format 2-13
    Properties 2-16
    settings 2-10
    summary 2-14
    type 2-3, 2-10
        custom project 2-3
        standard project 2-3
Project Panel
    Control Center 2-16
Project Step
    Control Center 2-28
Project Viewer
    Control Center 2-17
Projects
    exporting 14-7
    importing 14-7

Q
Quotation Marks
    Parsing Customization 7-21

R
Real-Time Processing 14-11
    Director 14-11
Recode
    Attribute Modifiers 7-12
Record Length
    DDL Editor 3-10
Record Name
    DDL Editor 3-10
Redef
    DDL Editor 3-10
Redefine
    DDL 2-40
reference file
    Relationship Linker 10-24
Reference Level1 Number
    Relationship Linker 10-26
Reference Level2 Number
    Relationship Linker 10-26
Reference Linking
    Relationship Linker 10-13, 10-24
Reference Record ID
    Relationship Linker 10-26
Relationship Linker 10-3
    Business level 10-13
    Comparison Routines 10-13
    Consumer level 10-13
    Field Files 10-21
    Pattern Files 10-21
    Reference File 10-24
    Reference Level1 Number 10-26
    Reference Level2 Number 10-26
    Reference Linking 10-13, 10-24
    Reference Record ID 10-26
    Window Key 10-3
    Window Linking 10-13, 10-18
    Window Size 10-22
Relationship Linker Results Analyzer 11-2, 11-3
    fields to display 11-7
    matched records 11-5
    records to display 11-9
    suspect records 11-5
Relationship Linker Rule Editor 11-12
    Field List Editor 11-13, 11-14
    Grade Pattern Editor 11-13, 11-14
Reserved Words
    Data Reconstructor 13-5
Resolve Utility 16-27
    MALINK 16-27
    multi-linking 16-27
    transitivity 16-27
Review Codes
    Business Data Parser 6-36
    Customer Data Parser 6-25
Review Groups
    Customer Data Parser 6-26
ROMAJITOHIRAGANA or RTH 5-29
Round Robin Keys
    Merge Split Utility 16-24
Round Robin Number
    Merge Split Utility 16-24
Routine Modifier
    Comparison Routine 11-21
Rule Script Language
    Data Reconstructor 13-4
Rules File
    Data Reconstructor 13-22
rules file
    Data Reconstructor 13-3
Rules Files 4-6

S
Save view
    Data Browser 3-6
Score
    Data Comparison Calculator 11-21
Select and Bypass Records
    Data Reconstructor 13-30
Select or Bypass Records 5-37
Select/Bypass Records
    Logic Builder 5-37
Set Selection Utility 16-30
Sort Fields
    Sort Utility 9-5
Sort Utility 16-33
    .srt 9-2
    Collating sequence 9-5
    for Postal Matchers 9-2
    JUST_DUPS 9-7
    KEEP_ALL 9-6
    KEEP_NONE 9-6
    KEEP_ONE 9-6
    Sort Fields 9-5
Source Identification
    Transformer 13-31
Special Entries
    Parsing Customization 7-14
Split a File
    Merge Split Utility 16-23
split rules
    Merge Split Utility 16-19
Standard Definitions Table
    Parsing Customization 7-3
Standardization
    TS Quality Analyzer 8-4
Start Pos
    DDL Editor 3-10
Step Viewer
    Control Center 2-19
Sub-tokens
    Parsing Customization 7-5
Survivor record
    Create Common Utility 12-8
Survivorship
    Create Common Utility 12-3
Synonym
    Parsing Customization 7-12
Syntax
    Command line execution 15-3
Syntax of Definitions
    Parsing Customization 7-4

T
Table Recoding 5-17
Title
Tokens
    Parsing Customization 7-4
transaction file
    File Update Utility 16-10
Transformer 5-2
    Character Translation 5-9
    Field Scanning 5-10
    File Trace Key 5-38
    hex conversion 5-9
    Source Identification 13-31
    Table Recoding 5-17
transitivity
    Resolve Utility 16-27
Trillium Types A-6
TS Discovery 3-12
TS Quality Analyzer 8-3
    Cleansing 8-4
    Master Database 8-8
    Matching 8-4, 8-8
    Standardization 8-4
Type
    DDL Editor 3-10

U
Underscores
    in city name changes 7-14
Unmatch Master File
    File Update Utility 16-14
Update ORIGINAL_RECORD Length
    DDL Editor 3-10
US City Problems
    Parsing Customization 7-34
User Rule
    Data Reconstructor 13-22
User-Defined Attributes
    Parsing Customization 7-10
Using Multiple Input Files to Create an Output DDL 5-7

V
View input data
    Data Browser 3-3

W
Window Key
    Sort 10-10
    Window Key Field 10-7
    Window Key Generator 10-3
        Window Key Rule 10-3
        Window Keys 10-3
Window Key Rule
    Window Key Generator 10-3
Window Key Rules
    definition 10-6
Window Keys
    Window Key Generator 10-3
Window Linking
    Relationship Linker 10-13, 10-18
Window Size
    Relationship Linker 10-22
Word Pattern Definition File 6-17
    Business Data Parser 6-32

