Student Guide
SAP Data Services XI 3.0 –
Data Integrator
SAP Data Services – Data Integrator XI 3.0
CONTENTS
About this Course
Course introduction...................................................................................................xiii
Course description.....................................................................................................xiv
Course audience.........................................................................................................xiv
Prerequisites................................................................................................................xiv
Additional education.................................................................................................xiv
Level, delivery, and duration....................................................................................xv
Course success factors.................................................................................................xv
Course setup.................................................................................................................xv
Course materials..........................................................................................................xv
Learning process .........................................................................................................xv
Lesson 1
Describing Data Services
Lesson introduction.......................................................................................................1
Describing the purpose of Data Services....................................................................2
Describing Data Services benefits .......................................................................2
Understanding data integration processes.........................................................2
Understanding the Data Services packages.......................................................3
Describing Data Services architecture........................................................................5
Defining Data Services components ...................................................................5
Describing the Designer .......................................................................................6
Describing the repository .....................................................................................7
Describing the Job Server......................................................................................8
Describing the engines........................................................................................12
Describing the Access Server..............................................................................12
Describing the adapters.......................................................................................12
Describing the real-time services ......................................................................12
Describing the Address Server...........................................................................13
Describing the Global Parsing Options, dictionaries, and directories.........13
Describing the Management Console ..............................................................13
Defining other Data Services tools.....................................................................14
Defining Data Services objects...................................................................................16
Understanding Data Services objects ...............................................................16
Defining relationships between objects .............................................................17
Defining projects and jobs ..................................................................................18
Using work flows.................................................................................................18
Describing the object hierarchy .........................................................................19
Using the Data Services Designer interface.............................................................21
Describing the Designer window .....................................................................21
Using the Designer toolbar ................................................................................22
Using the Local Object Library ..........................................................................23
Using the project area .........................................................................................24
Using the tool palette ..........................................................................................26
Using the workspace ...........................................................................................27
Quiz: Describing Data Services .................................................................................28
Lesson summary..........................................................................................................29
Lesson 2
Defining Source and Target Metadata
Lesson introduction.....................................................................................................31
Using datastores...........................................................................................................32
Explaining datastores .........................................................................................32
Using adapters .....................................................................................................33
Creating a database datastore ...........................................................................33
Changing a datastore definition ........................................................................34
Importing metadata from data sources ............................................................35
Importing metadata by browsing .....................................................................36
Activity: Creating source and target datastores..............................................37
Using datastore and system configurations.............................................................41
Creating multiple configurations in a datastore .............................................41
Creating a system configuration .......................................................................44
Defining file formats for flat files..............................................................................46
Explaining file formats .......................................................................................46
Creating file formats ...........................................................................................46
Handling errors in file formats ..........................................................................50
Activity: Creating a file format for a flat file....................................................51
Defining file formats for Excel files...........................................................................53
Using Excel as a native data source ..................................................................53
Activity: Creating a file format for an Excel file..............................................55
Defining file formats for XML files...........................................................................57
Importing data from XML documents..............................................................57
Importing metadata from a DTD file................................................................57
Importing metadata from an XML schema......................................................60
Explaining nested data........................................................................................62
Unnesting data......................................................................................................64
Quiz: Defining source and target metadata.............................................................66
Lesson summary..........................................................................................................67
Lesson 3
Creating Batch Jobs
Lesson introduction.....................................................................................................69
Working with objects..................................................................................................70
Creating a project ................................................................................................70
Creating a job .......................................................................................................72
Adding, connecting, and deleting objects in the workspace ........................73
Creating a work flow ..........................................................................................73
Defining the order of execution in work flows ...............................................74
Creating a data flow....................................................................................................76
Using data flows ..................................................................................................76
Using data flows as steps in work flows ..........................................................76
Changing data flow properties .........................................................................77
Explaining source and target objects ................................................................78
Adding source and target objects .....................................................................79
Using the Query transform........................................................................................81
Describing the transform editor ........................................................................81
Explaining the Query transform .......................................................................82
Using target tables.......................................................................................................86
Setting target table options ................................................................................86
Using template tables .........................................................................................89
Executing the job..........................................................................................................93
Explaining job execution ....................................................................................93
Setting execution properties ..............................................................................93
Executing the job .................................................................................................94
Activity: Creating a basic data flow...................................................................96
Quiz: Creating batch jobs............................................................................................99
Lesson summary........................................................................................................100
Lesson 4
Troubleshooting Batch Jobs
Lesson introduction...................................................................................................101
Using descriptions and annotations........................................................................102
Using descriptions with objects.......................................................................102
Using annotations to describe objects ............................................................103
Validating and tracing jobs......................................................................................104
Validating jobs ...................................................................................................104
Tracing jobs ........................................................................................................105
Using log files ....................................................................................................108
Examining trace logs .........................................................................................108
Examining monitor logs ...................................................................................109
Examining error logs ........................................................................................109
Using the Monitor tab .......................................................................................110
Using the Log tab ..............................................................................................110
Determining the success of the job .................................................................111
Activity: Setting traces and adding annotations............................................112
Using View Data and the Interactive Debugger...................................................113
Using View Data with sources and targets ...................................................113
Using the Interactive Debugger ......................................................................115
Setting filters and breakpoints for a debug session ......................................117
Activity: Using the Interactive Debugger.......................................................119
Setting up auditing....................................................................................................121
Setting up auditing.............................................................................................121
Defining audit points.........................................................................................121
Defining audit labels..........................................................................................122
Defining audit rules...........................................................................................122
Defining audit actions.......................................................................................123
Choosing audit points.......................................................................................126
Activity: Using auditing in a data flow...........................................................127
Quiz: Troubleshooting batch jobs ...........................................................................128
Lesson summary........................................................................................................129
Lesson 5
Using Functions, Scripts, and Variables
Lesson introduction...................................................................................................131
Defining built-in functions.......................................................................................132
Defining functions .............................................................................................132
Listing the types of operations for functions ................................................132
Defining other types of functions ...................................................................134
Using functions in expressions................................................................................136
Defining functions in expressions ...................................................................136
Activity: Using the search_replace function...................................................139
Using the lookup function........................................................................................141
Using lookup tables ..........................................................................................141
Activity: Using the lookup_ext() function......................................................144
Using the decode function........................................................................................146
Explaining the decode function ......................................................................146
Activity: Using the decode function ...............................................................148
Using scripts, variables, and parameters................................................................150
Defining scripts ..................................................................................................150
Defining variables .............................................................................................150
Defining parameters .........................................................................................151
Combining scripts, variables, and parameters ..............................................151
Defining global versus local variables ...........................................................151
Setting global variables using job properties ................................................156
Defining substitution parameters....................................................................156
Using Data Services scripting language.................................................................159
Using basic syntax .............................................................................................159
Using syntax for column and table references in expressions ....................159
Using operators .................................................................................................160
Reviewing script examples ..............................................................................161
Using strings and variables ..............................................................................161
Using quotation marks .....................................................................................161
Using escape characters ....................................................................................162
Handling nulls, empty strings, and trailing blanks .....................................162
Scripting a custom function......................................................................................166
Creating a custom function ..............................................................................166
Importing a stored procedure as a function ..................................................169
Activity: Creating a custom function..............................................................170
Quiz: Using functions, scripts, and variables........................................................173
Lesson summary........................................................................................................174
Lesson 6
Using Platform Transforms
Lesson introduction...................................................................................................175
Describing platform transforms..............................................................................176
Explaining transforms ......................................................................................176
Describing platform transforms ......................................................................177
Using the Map Operation transform.......................................................................178
Describing map operations...............................................................................178
Explaining the Map Operation transform .....................................................179
Activity: Using the Map Operation transform...............................................180
Using the Validation transform...............................................................................181
Explaining the Validation transform ..............................................................181
Activity: Using the Validation transform.......................................................186
Using the Merge transform......................................................................................190
Explaining the Merge transform .....................................................................190
Activity: Using the Merge transform..............................................................191
Using the Case transform.........................................................................................194
Explaining the Case transform ........................................................................194
Activity: Using the Case transform.................................................................197
Using the SQL transform..........................................................................................199
Explaining the SQL transform .........................................................................199
Activity: Using the SQL transform..................................................................201
Quiz: Using platform transforms............................................................................203
Lesson summary........................................................................................................204
Lesson 7
Setting up Error Handling
Lesson introduction...................................................................................................205
Using recovery mechanisms....................................................................................206
Avoiding data recovery situations..................................................................206
Describing levels of data recovery strategies ................................................207
Configuring work flows and data flows ........................................................207
Using recovery mode ........................................................................................208
Recovering from partially-loaded data ..........................................................209
Recovering missing values or rows ................................................................209
Defining alternative work flows .....................................................................210
Using try/catch blocks and automatic recovery ..........................................212
Activity: Creating an alternative work flow ..................................................217
Quiz: Setting up error handling ..............................................................................220
Lesson summary........................................................................................................221
Lesson 8
Capturing Changes in Data
Lesson introduction...................................................................................................223
Updating data over time...........................................................................................224
Explaining Slowly Changing Dimensions (SCD) .........................................224
Updating changes to data ................................................................................226
Explaining history preservation and surrogate keys ...................................227
Comparing source-based and target-based CDC .........................................228
Using source-based CDC..........................................................................................229
Using source tables to identify changed data................................................229
Using CDC with timestamps............................................................................229
Managing overlaps.............................................................................................233
Activity: Using source-based CDC..................................................................234
Using target-based CDC...........................................................................................237
Using target tables to identify changed data .................................................237
Identifying history preserving transforms ....................................................238
Explaining the Table Comparison transform.................................................238
Explaining the History Preserving transform ...............................................241
Explaining the Key Generation transform .....................................................244
Activity: Using target-based CDC ..................................................................245
Quiz: Capturing changes in data ............................................................................247
Lesson summary........................................................................................................248
Lesson 9
Using Data Integrator Transforms
Lesson introduction...................................................................................................249
Describing Data Integrator transforms...................................................................250
Defining Data Integrator transforms ..............................................................250
Using the Pivot transform........................................................................................251
Explaining the Pivot transform .......................................................................251
Activity: Using the Pivot transform.................................................................254
Using the Hierarchy Flattening transform.............................................................255
Explaining the Hierarchy Flattening transform.............................................255
Activity: Using the Hierarchy Flattening transform.....................................257
Describing performance optimization....................................................................262
Describing push-down operations .................................................................262
Viewing SQL generated by a data flow .........................................................264
Caching data ......................................................................................................264
Slicing processes.................................................................................................265
Using the Data Transfer transform.........................................................................266
Explaining the Data Transfer transform.........................................................266
Activity: Using the Data Transfer transform..................................................267
Using the XML Pipeline transform.........................................................................269
Explaining the XML Pipeline transform.........................................................269
Activity: Using the XML Pipeline transform..................................................270
Quiz: Using Data Integrator transforms.................................................................273
Lesson summary........................................................................................................274
Answer Key
Quiz: Describing Data Services ...............................................................................277
Quiz: Defining source and target metadata...........................................................278
Quiz: Creating batch jobs..........................................................................................279
Quiz: Troubleshooting batch jobs ...........................................................................280
Quiz: Using functions, scripts, and variables........................................................281
Quiz: Using platform transforms............................................................................282
Quiz: Setting up error handling ..............................................................................283
Quiz: Capturing changes in data ............................................................................284
Quiz: Using Data Integrator transforms.................................................................285
Lesson 1
Describing Data Services
Lesson introduction
Data Services is a graphical interface for creating and staging jobs for data integration and data
quality purposes.
After completing this lesson, you will be able to:
• Describe the purpose of Data Services
• Describe Data Services architecture
• Define Data Services objects
• Use the Data Services Designer interface
Describing the purpose of Data Services
Introduction
BusinessObjects Data Services provides a graphical interface that allows you to easily create
jobs that extract data from heterogeneous sources, transform that data to meet the business
requirements of your organization, and load the data into a single location.
Note: Although Data Services can be used for both real-time and batch jobs, this course covers
batch jobs only.
After completing this unit, you will be able to:
• List the benefits of Data Services
• Describe data integration processes
• Describe the functionality available in Data Services packages
Describing Data Services benefits
The BusinessObjects Data Services platform enables you to perform enterprise-level data
integration and data quality functions. With Data Services, your enterprise can:
• Create a single infrastructure for data movement to enable faster and lower cost
implementation.
• Manage data as a corporate asset independent of any single system.
• Integrate data across many systems and re-use that data for many purposes.
• Improve performance.
• Reduce burden on enterprise systems.
• Prepackage data solutions for fast deployment and quick return on investment (ROI).
• Cleanse customer and operational data anywhere across the enterprise.
• Enhance customer and operational data by appending additional information to increase
the value of the data.
• Match and consolidate data at multiple levels within a single pass for individuals, households,
or corporations.
Understanding data integration processes
Data Services combines both batch and real-time data movement and management with
intelligent caching to provide a single data integration platform for information management
from any information source and for any information use. This unique combination allows you
to:
• Stage data in an operational datastore, data warehouse, or data mart.
• Update staged data in batch or real-time modes.
• Create a single environment for developing, testing, and deploying the entire data integration
platform.
• Manage a single metadata repository to capture the relationships between different extraction
and access methods and provide integrated lineage and impact analysis.
Data Services performs three key functions that can be combined to create a scalable,
high-performance data platform. It:
• Loads Enterprise Resource Planning (ERP) or enterprise application data into an operational
datastore (ODS) or analytical data warehouse, and updates in batch or real-time modes.
• Creates routing requests to a data warehouse or ERP system using complex rules.
• Applies transactions against ERP systems.
Data mapping and transformation can be defined using the Data Services Designer graphical
user interface. Data Services automatically generates the appropriate interface calls to access
the data in the source system.
For most ERP applications, Data Services generates SQL optimized for the specific target
database (Oracle, DB2, SQL Server, Informix, and so on). Automatically generated, optimized
code reduces the cost of maintaining data warehouses and enables you to build data solutions
quickly, meeting user requirements faster than other methods (for example, custom-coding,
direct-connect calls, or PL/SQL).
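As an illustration of dialect-specific code generation, the following sketch produces a row-limited SELECT for several target databases. The function and the dialect rules are simplified assumptions for teaching purposes, not Data Services' actual generated SQL:

```python
# Sketch: generate a row-limited SELECT for different target databases.
# The dialect handling below is a simplified, hypothetical illustration;
# Data Services' real code generation is far more extensive.

def generate_select(table, columns, limit, dialect):
    cols = ", ".join(columns)
    if dialect == "oracle":
        # Oracle (pre-12c) limits rows with ROWNUM
        return f"SELECT {cols} FROM {table} WHERE ROWNUM <= {limit}"
    if dialect == "db2":
        # DB2 uses the FETCH FIRST clause
        return f"SELECT {cols} FROM {table} FETCH FIRST {limit} ROWS ONLY"
    if dialect == "sqlserver":
        # SQL Server uses TOP
        return f"SELECT TOP {limit} {cols} FROM {table}"
    # Generic fallback for databases that support LIMIT
    return f"SELECT {cols} FROM {table} LIMIT {limit}"

print(generate_select("customers", ["id", "name"], 10, "oracle"))
```

The same logical mapping yields different physical SQL per target, which is the point of optimized, automatically generated code.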
Data Services can apply data changes in a variety of data formats, including any custom format
using a Data Services adapter. Enterprise users can apply data changes against multiple
back-office systems singularly or sequentially. By generating calls native to the system in
question, Data Services makes it unnecessary to develop and maintain customized code to
manage the process.
You can also design access intelligence into each transaction by adding flow logic that checks
values in a data warehouse or in the transaction itself before posting it to the target ERP system.
Understanding the Data Services packages

Data Services provides a wide range of functionality, depending on the package and options
selected:
• Data Integrator packages provide platform transforms for core functionality, and Data
Integrator transforms to enhance data integration projects.
• Data Quality packages provide platform transforms for core functionality, and Data Quality
transforms to parse, standardize, cleanse, enhance, match, and consolidate data.
• Data Services packages provide all of the functionality of both the Data Integrator and Data
Quality packages.
When your Data Services projects are based on enterprise applications such as SAP, PeopleSoft,
Oracle, JD Edwards, Salesforce.com, and Siebel, BusinessObjects Rapid Marts provide specialized
versions of Data Services functionality. Rapid Marts combine domain knowledge with data
integration best practices to deliver prebuilt data models, transformation logic, and data
extraction. Rapid Marts are packaged, powerful, and flexible data integration solutions that
help organizations:
• Jumpstart business intelligence deployments and accelerate time to value
• Deliver best-practice data warehousing solutions
4 BusinessObjects Data Integrator XI 3.0: Core Concepts-Learner's Guide
• Develop custom solutions to meet your unique requirements
Describing Data Services—Learner’s Guide 5
Describing Data Services architecture
Introduction

Data Services relies on several unique components to accomplish the data integration and data
quality activities required to manage your corporate data.
After completing this unit, you will be able to:
• Describe standard Data Services components
• Describe Data Services management tools
Defining Data Services components

Data Services includes the following standard components:
• Designer
• Repository
• Job Server
• Engines
• Access Server
• Adapters
• Real-time Services
• Address Server
• Global Parsing Options, Dictionaries, and Directories
• Management Console
This diagram illustrates the relationships between these components:
Describing the Designer

Data Services Designer is a Windows client application used to create, test, and manually
execute jobs that transform data and populate a data warehouse. Using Designer, you create
data management applications that consist of data mappings, transformations, and control
logic.
You can create objects that represent data sources, and then drag, drop, and configure them in
flow diagrams.
Designer allows you to manage metadata stored in a local repository. From the Designer, you
can also trigger the Job Server to run your jobs for initial application testing.
To log in to Designer
1. From the Start menu, click Programs ➤ BusinessObjects XI 3.0 ➤ BusinessObjects Data
Services ➤ Data Services Designer to launch Designer.
The path may be different, depending on how the product was installed.
2. In the BusinessObjects Data Services Repository Login dialog box, enter the connection
information for the local repository.
3. Click OK.
4. To verify the Job Server is running in Designer, hover the cursor over the Job Server icon in
the bottom right corner of the screen.
The details for the Job Server display in the status bar in the lower left portion of the screen.
Describing the repository

The Data Services repository is a set of tables that holds user-created and predefined system
objects, source and target metadata, and transformation rules. It is set up on an open
client/server platform to facilitate sharing metadata with other enterprise tools. Each repository
is stored on an existing Relational Database Management System (RDBMS).
There are three types of repositories:
• A local repository (known in Designer as the Local Object Library) is used by an application
designer to store definitions of source and target metadata and Data Services objects.
• A central repository (known in Designer as the Central Object Library) is an optional
component that can be used to support multi-user development. The Central Object Library
provides a shared library that allows developers to check objects in and out for development.
• A profiler repository is used to store information that is used to determine the quality of
data.
Each repository is associated with one or more Data Services Job Servers.

To create a local repository
1. From the Start menu, click Programs ➤ BusinessObjects XI 3.0 ➤ BusinessObjects Data
Services ➤ Data Services Repository Manager to launch the Repository Manager.
The path may be different, depending on how the product was installed.
2. In the BusinessObjects Data Services Repository Manager dialog box, enter the connection
information for the local repository.
3. Click Create.
You may need to confirm that you want to overwrite the existing repository, if it already
exists.
If you select the Show Details check box, you can see the SQL that is applied to create the
repository.
System messages confirm that the local repository is created.
4. To see the version of the repository, click Get Version.
The version displays in the pane at the bottom of the dialog box. Note that the version
number refers only to the last major point release number.
5. Click Close.
Describing the Job Server

Each repository is associated with at least one Data Services Job Server, which retrieves the job
from its associated repository and starts the data movement engine. The data movement engine
integrates data from multiple heterogeneous sources, performs complex data transformations,
and manages extractions and transactions from ERP systems and other sources. The Job Server
can move data in batch or real-time mode and uses distributed query optimization,
multithreading, in-memory caching, in-memory data transformations, and parallel processing
to deliver high data throughput and scalability.
While designing a job, you can run it from the Designer. In your production environment, the
Job Server runs jobs triggered by a scheduler or by a real-time service managed by the Data
Services Access Server. In production environments, you can balance job loads by creating a
Job Server Group (multiple Job Servers), which executes jobs according to overall system load.
Data Services provides distributed processing capabilities through the Server Groups. A Server
Group is a collection of Job Servers that each reside on different Data Services server computers.
Each Data Services server can contribute one, and only one, Job Server to a specific Server
Group. Each Job Server collects resource utilization information for its computer. This
information is utilized by Data Services to determine where a job, data flow or sub-data flow
(depending on the distribution level specified) should be executed.
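The load-balancing idea behind Server Groups can be sketched as follows. The server names and load values are hypothetical; Data Services bases its decision on the real resource-utilization information each Job Server collects for its computer:

```python
# Sketch: choose the least-loaded Job Server in a Server Group.
# The load metric (0.0-1.0) and the server names are invented for
# illustration; they are not Data Services data structures.

def pick_job_server(servers):
    """servers: dict mapping Job Server name -> current load (0.0-1.0)."""
    return min(servers, key=servers.get)

group = {"jobserver_a": 0.72, "jobserver_b": 0.31, "jobserver_c": 0.55}
print(pick_job_server(group))  # the server with the lowest load
```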
To verify the connection between repository and Job Server
1. From the Start menu, click Programs ➤ BusinessObjects XI 3.0 ➤ BusinessObjects Data
Services ➤ Data Services Server Manager to launch the Server Manager.
The path may be different, depending on how the product was installed.
2. In the BusinessObjects Data Services Server Manager dialog box, click Edit Job Server
Config.
3. In the Job Server Configuration Editor dialog box, select the Job Server.
4. Click Resync with Repository.
5. In the Job Server Properties dialog box, select the repository.
6. Click Resync.
A system message displays indicating that the Job Server will be resynchronized with the
selected repository.
7. Click OK to acknowledge the warning message.
8. In the Password field, enter the password for the repository.
9. Click Apply.
10. Click OK to close the Job Server Properties dialog box.
11. Click OK to close the Job Server Configuration Editor dialog box.
12. In the BusinessObjects Data Services Server Manager dialog box, click Restart to restart
the Job Server.
A system message displays indicating that the Job Server will be restarted.
13. Click OK.
Describing the engines

When Data Services jobs are executed, the Job Server starts Data Services engine processes to
perform data extraction, transformation, and movement. Data Services engine processes use
parallel processing and in-memory data transformations to deliver high data throughput and
scalability.
Describing the Access Server

The Access Server is a real-time, request-reply message broker that collects incoming XML
message requests, routes them to a real-time service, and delivers a message reply within a
user-specified time frame. The Access Server queues messages and sends them to the next
available real-time service across any number of computing resources. This approach provides
automatic scalability because the Access Server can initiate additional real-time services on
additional computing resources if traffic for a given real-time service is high.
You can configure multiple Access Servers.
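The Access Server's queue-and-dispatch behavior can be sketched in a few lines. This is a minimal illustration, assuming simple round-robin routing and omitting XML handling, timeouts, and scaling logic:

```python
# Sketch: a minimal request-reply broker in the spirit of the Access
# Server. Messages are queued and handed to the next available service.
# The Broker class and round-robin routing are illustrative assumptions.
from collections import deque

class Broker:
    def __init__(self, services):
        self.queue = deque()
        self.services = services  # callables standing in for real-time services
        self.next_service = 0

    def submit(self, message):
        # Incoming requests are queued until a service can take them
        self.queue.append(message)

    def dispatch(self):
        replies = []
        while self.queue:
            msg = self.queue.popleft()
            service = self.services[self.next_service % len(self.services)]
            self.next_service += 1
            replies.append(service(msg))  # request-reply: each message gets an answer
        return replies

broker = Broker([lambda m: m.upper(), lambda m: m[::-1]])
broker.submit("order1")
broker.submit("order2")
print(broker.dispatch())
```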
Describing the adapters

Adapters are additional Java-based programs that can be installed on the Job Server to provide
connectivity to other systems such as Salesforce.com or the Java Messaging Queue. There is
also a Software Development Kit (SDK) to allow customers to create adapters for custom
applications.
Describing the real-time services

The Data Services real-time client communicates with the Access Server when processing
real-time jobs. Real-time services are configured in the Data Services Management Console.
Describing the Address Server

The Address Server is used specifically for processing European addresses using the Data
Quality Global Address Cleanse transform. It provides access to detailed address line information
for most European countries.
Describing the Global Parsing Options, dictionaries, and directories

The Data Quality Global Parsing Options, dictionaries, and directories provide referential data
for the Data Cleanse and Address Cleanse transforms to use when parsing, standardizing, and
cleansing name and address data.
Global Parsing Options are packages that enhance the ability of Data Cleanse to accurately
process various forms of global data by including language-specific reference data and parsing
rules. Directories provide information on addresses from postal authorities; dictionary files
are used to identify, parse, and standardize data such as names, titles, and firm data. Dictionaries
also contain acronym, match standard, gender, capitalization, and address information.
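Dictionary-driven parsing can be illustrated with a small sketch. The dictionary entries and parsing rules below are invented examples for teaching purposes, not actual Data Quality reference data:

```python
# Sketch: dictionary-driven name standardization, loosely analogous to
# how Data Cleanse uses dictionary files. TITLE_DICT and GENDER_DICT are
# tiny invented stand-ins for real reference data.

TITLE_DICT = {"dr": "Dr.", "mr": "Mr.", "ms": "Ms."}
GENDER_DICT = {"mary": "F", "john": "M"}

def standardize_name(raw):
    parts = raw.lower().replace(".", "").split()
    title = TITLE_DICT.get(parts[0])          # identify a known title, if any
    rest = parts[1:] if title else parts
    first = rest[0].capitalize()              # standardize capitalization
    last = " ".join(p.capitalize() for p in rest[1:])
    return {"title": title, "first": first, "last": last,
            "gender": GENDER_DICT.get(rest[0])}

print(standardize_name("dr mary smith"))
```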
Describing the Management Console

The Data Services Management Console provides access to the following features:
• Administrator
• Auto Documentation
• Data Validation
• Impact and Lineage Analysis
• Operational Dashboard
• Data Quality Reports
Administrator

Administer Data Services resources, including:
• Scheduling, monitoring, and executing batch jobs
• Configuring, starting, and stopping real-time services
• Configuring Job Server, Access Server, and repository usage
• Configuring and managing adapters
• Managing users
• Publishing batch jobs and real-time services via web services
• Reporting on metadata
Auto Documentation

View, analyze, and print graphical representations of all objects as depicted in Data Services
Designer, including their relationships, properties, and more.
Data Validation

Evaluate the reliability of your target data based on the validation rules you create in your Data
Services batch jobs in order to quickly review, assess, and identify potential inconsistencies or
errors in source data.
Impact and Lineage Analysis

Analyze end-to-end impact and lineage for Data Services tables and columns, and Business
Objects Enterprise objects such as universes, business views, and reports.
Operational Dashboard

View dashboards of status and performance execution statistics of Data Services jobs for one
or more repositories over a given time period.
Data Quality Reports

Use data quality reports to view and export Crystal reports for batch and real-time jobs that
include statistics-generating transforms. Report types include job summaries, transform-specific
reports, and transform group reports.
To generate reports for Match, US Regulatory Address Cleanse, and Global Address Cleanse
transforms, you must enable the Generate report data option in the Transform Editor.
Defining other Data Services tools

There are also several tools to assist you in managing your Data Services installation.
Describing the Repository Manager

The Data Services Repository Manager allows you to create, upgrade, and check the versions
of local, central, and profiler repositories.
Describing the Server Manager

The Data Services Server Manager allows you to add, delete, or edit the properties of Job Servers.
It is automatically installed on each computer on which you install a Job Server.
Use the Server Manager to define links between Job Servers and repositories. You can link
multiple Job Servers on different machines to a single repository (for load balancing) or each
Job Server to multiple repositories (with one default) to support individual repositories (for
example, separating test and production environments).
Describing the License Manager

The License Manager displays the Data Services components for which you currently have a
license.
Describing the Metadata Integrator
The Metadata Integrator allows Data Services to seamlessly share metadata with Business
Objects Intelligence products. Run the Metadata Integrator to collect metadata into the Data
Services repository for Business Views and Universes used by Crystal Reports, Desktop
Intelligence documents, and Web Intelligence documents.
Defining Data Services objects
Introduction

Data Services provides you with a variety of objects to use when you are building your data
integration and data quality applications.
After completing this unit, you will be able to:
• Define the objects available in Data Services
• Explain relationships between objects
Understanding Data Services objects

In Data Services, all entities you add, define, modify, or work with are objects. Some of the
most frequently-used objects are:
• Projects
• Jobs
• Work flows
• Data flows
• Transforms
• Scripts
This diagram shows some common objects.
All objects have options, properties, and classes. Each can be modified to change the behavior
of the object.
Options

Options control the object. For example, to set up a connection to a database, the database name
is an option for the connection.
Properties

Properties describe the object. For example, the name and creation date describe what the object
is used for and when it became active. Attributes are properties used to locate and organize
objects.
Classes

Classes define how an object can be used. Every object is either re-usable or single-use.
Single-use objects

Single-use objects appear only as components of other objects. They operate only in the context
in which they were created.
Note: You cannot copy single-use objects.
Re-usable objects

A re-usable object has a single definition and all calls to the object refer to that definition. If
you change the definition of the object in one place, and then save the object, the change is
reflected to all other calls to the object.
Most objects created in Data Services are available for re-use. After you define and save a
re-usable object, Data Services stores the definition in the repository. You can then re-use the
definition as often as necessary by creating calls to it.
For example, a data flow within a project is a re-usable object. Multiple jobs, such as a weekly
load job and a daily load job, can call the same data flow. If this data flow is changed, both jobs
call the new version of the data flow.
You can edit re-usable objects at any time independent of the current open project. For example,
if you open a new project, you can open a data flow and edit it. However, the changes you
make to the data flow are not stored until you save them.
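The single-definition behavior can be sketched as follows. The names and functions here are illustrative only, not the Data Services API; the point is that callers store a reference to the definition and resolve it at run time:

```python
# Sketch: one shared definition, many calls. Both jobs reference the same
# data-flow definition by name, so editing it once changes what every
# caller runs. The repository dict and function names are hypothetical.

repository = {}

def define(name, steps):
    repository[name] = steps   # single definition, stored once

def run_job(flow_names):
    # Each call resolves the name against the current definition
    return [repository[name] for name in flow_names]

define("DF_LoadCustomers", ["extract", "validate", "load"])
weekly_job = ["DF_LoadCustomers"]   # weekly load job calls the data flow
daily_job = ["DF_LoadCustomers"]    # daily load job calls the same data flow

define("DF_LoadCustomers", ["extract", "validate", "dedupe", "load"])  # edit once
print(run_job(weekly_job))  # both jobs now call the new version
print(run_job(daily_job))
```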
Defining relationships between objects

Jobs are composed of work flows and/or data flows:
• A work flow is the incorporation of several data flows into a sequence.
• A data flow is the process by which source data is transformed into target data.
A work flow orders data flows and the operations that support them. It also defines the
interdependencies between data flows.
For example, if one target table depends on values from other tables, you can use the work
flow to specify the order in which you want Data Services to populate the tables. You can also
use work flows to define strategies for handling errors that occur during project execution, or
to define conditions for running sections of a project.
This diagram illustrates a typical work flow.
A data flow defines the basic task that Data Services accomplishes, which involves moving
data from one or more sources to one or more target tables or files. You define data flows by
identifying the sources from which to extract data, the transformations the data should undergo,
and targets.
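A data flow's source-to-target movement can be sketched like this. The transform functions are invented examples of the kind of logic a Query transform might express:

```python
# Sketch: a data flow as source -> transforms -> target. The source rows
# and transform functions are hypothetical illustrations.

source_rows = [
    {"id": 1, "name": " alice ", "amount": "100"},
    {"id": 2, "name": "BOB",     "amount": "250"},
]

def trim_and_case(row):
    # Standardize the name column
    return {**row, "name": row["name"].strip().title()}

def to_int(row):
    # Convert the amount column to an integer
    return {**row, "amount": int(row["amount"])}

def run_data_flow(rows, transforms):
    target = []
    for row in rows:
        for t in transforms:   # data moves through the transforms in order
            row = t(row)
        target.append(row)
    return target

target_table = run_data_flow(source_rows, [trim_and_case, to_int])
print(target_table)
```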
Defining projects and jobs

A project is the highest-level object in Designer. Projects provide a way to organize the other
objects you create in Designer. Only one project can be open and visible in the project area at
a time.
A job is the smallest unit of work that you can schedule independently for execution. A project
is a single-use object that allows you to group jobs. For example, you can use a project to group
jobs that have schedules that depend on one another or that you want to monitor together.
Projects have the following characteristics:
• Projects are listed in the Local Object Library.
• Only one project can be open at a time.
• Projects cannot be shared among multiple users.
The objects in a project appear hierarchically in the project area. If a plus sign (+) appears next
to an object, you can expand it to view the lower-level objects contained in the object. Data
Services displays the contents as both names and icons in the project area hierarchy and in the
workspace.
Note: Jobs must be associated with a project before they can be executed in the project area of
Designer.
Using work flows

Jobs with data flows can be developed without using work flows. However, you should consider
nesting data flows inside of work flows by default. This practice can provide various benefits.
Always using work flows makes jobs more adaptable to additional development and/or
specification changes. For instance, if a job initially consists of four data flows that are to run
sequentially, they could be set up without work flows. But what if specification changes require
that they be merged into another job instead? The developer would have to replicate their
sequence correctly in the other job. If these had been initially added to a work flow, the developer
could then have simply copied that work flow into the correct position within the new job.
There would be no need to learn, copy, and verify the previous sequence. The change can be
made more quickly with greater accuracy.
Even if there is one data flow per work flow, there are benefits to adaptability. Initially, it may
have been decided that recovery units are not important; the expectation being that if the job
fails, the whole process could simply be rerun. However, as data volumes tend to increase, it
may be determined that a full reprocessing is too time consuming. The job may then be changed
to incorporate work flows to benefit from recovery units to bypass reprocessing of successful
steps. However, these changes can be complex and can consume more time than allotted for
in a project plan. It also opens up the possibility that units of recovery are not properly defined.
Setting these up during initial development when the nature of the processing is being most
fully analyzed is preferred.
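The recovery-unit idea can be sketched as follows. The checkpoint set stands in for the recovery state Data Services keeps between runs, and the step names are hypothetical:

```python
# Sketch: recovery units that bypass reprocessing of successful steps on
# a rerun. In a real system the "completed" state would be persisted;
# here it is a module-level set for illustration.

completed = set()

def run_with_recovery(steps):
    executed = []
    for name, step in steps:
        if name in completed:
            continue              # skip steps that already succeeded
        step()
        completed.add(name)       # checkpoint the successful step
        executed.append(name)
    return executed

steps = [("WF_Extract", lambda: None),
         ("WF_Transform", lambda: None),
         ("WF_Load", lambda: None)]

first_run = run_with_recovery(steps)   # all three steps execute
rerun = run_with_recovery(steps)       # nothing reruns: all checkpointed
print(first_run, rerun)
```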
Describing the object hierarchy

In the repository, objects are grouped hierarchically from a project, to jobs, to optional work
flows, to data flows. In jobs, work flows define a sequence of processing steps, and data flows
move data from source tables to target tables.
This illustration shows the hierarchical relationships for the key object types within Data
Services:
This course focuses on creating batch jobs using database datastores and file formats.
Using the Data Services Designer interface
Introduction

The Data Services Designer interface allows you to plan and organize your data integration
and data quality jobs in a visual way. Most of the components of Data Services can be
programmed through this interface.
After completing this unit, you will be able to:
• Explain how Designer is used
• Describe key areas in the Designer window
Describing the Designer window

The Data Services Designer interface consists of a single application window and several
embedded supporting windows. The application window contains the menu bar, toolbar, Local
Object Library, project area, tool palette, and workspace.
Tip: You can access the Data Services Technical Manuals for reference or help through the Designer
interface Help menu. These manuals are also accessible by going through Start ➤ Programs ➤
BusinessObjects XI 3.0 ➤ BusinessObjects Data Services ➤ Data Services Documentation ➤
Technical Manuals.
Using the Designer toolbar

In addition to many of the standard Windows toolbar buttons, Data Services provides the
following unique toolbar buttons:
Button Tool Description
Close All Windows Closes all open windows in the workspace.
Local Object Library Opens and closes the Local Object Library window.
Central Object Library Opens and closes the Central Object Library window.
Variables Opens and closes the Variables and Parameters window.
Project Area Opens and closes the project area.
Output Opens and closes the Output window.
View Enabled
Descriptions
Enables the system-level setting for viewing object
descriptions in the workspace.
Validate Current View
Validates the object definition open in the active tab of
the workspace. Other objects included in the definition
are also validated.
Validate All Objects in
View
Validates all object definitions open in the workspace.
Objects included in the definition are also validated.
Audit Opens the Audit window. You can collect audit statistics
on the data that flows out of any Data Services object.
View Where Used
Opens the Output window, which lists parent objects
(such as jobs) of the object currently open in the
workspace (such as a data flow).
Back Moves back in the list of active workspace windows.
Forward Moves forward in the list of active workspace windows.
Data Services Management Console
Opens and closes the Data Services Management Console,
which provides access to Administrator, Auto Documentation,
Data Validation, Impact and Lineage Analysis, Operational
Dashboard, and Data Quality Reports.
Assess and Monitor Opens Data Insight, which allows you to assess and
monitor the quality of your data.
Contents Opens the Data Services Technical Manuals.
Using the Local Object Library

The Local Object Library gives you access to the object types listed in the table below. The table
shows the tab on which the object type appears in the Local Object Library and describes the
Data Services context in which you can use each type of object.
Tab Description
Projects are sets of jobs available at a given time.
Jobs are executable work flows. There are two job types: batch jobs and real-time
jobs.
Work flows order data flows and the operations that support data flows, defining
the interdependencies between them.
Data flows describe how to process a task.
Transforms operate on data, producing output data sets from the sources you specify.
The Local Object Library lists platform, Data Integrator, and Data Quality
transforms.
Datastores represent connections to databases and applications used in your project.
Under each datastore is a list of the tables, documents, and functions imported into
Data Services.
Formats describe the structure of a flat file, Excel file, XML file, or XML message.
Custom functions are functions written in the Data Services Scripting Language.
You can import objects to and export objects from your Local Object Library as a file. Importing
objects from a file overwrites existing objects with the same names in the destination Local
Object Library.
Whole repositories can be exported in either .atl or .xml format. Using the .xml file format can
make repository content easier for you to read. It also allows you to export Data Services to
other products.
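Because an .xml export is plain text, it can be inspected with standard XML tools. The element and attribute names in this sketch are hypothetical; check an actual export for the real schema:

```python
# Sketch: listing objects from an XML repository export. The <Repository>,
# <DataFlow>, and <Job> element names and the "name" attribute are invented
# placeholders, not the actual Data Services export schema.
import xml.etree.ElementTree as ET

export = """<Repository>
  <DataFlow name="DF_LoadCustomers"/>
  <Job name="Job_Weekly"/>
</Repository>"""

root = ET.fromstring(export)
objects = [(child.tag, child.get("name")) for child in root]
print(objects)
```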
To import a repository from a file
1. On any tab of the Local Object Library, right-click the white space and select Repository
➤ Import from File from the menu.
The Open Import File dialog box displays.
2. Browse to the destination for the file.
3. Click Open.
A warning message displays to let you know that it takes a long time to create new versions
of existing objects.
4. Click OK.
You must restart Data Services after the import process completes.

To export a repository to a file
1. On any tab of the Local Object Library, right-click the white space and select Repository
➤ Export To File.
The Write Repository Export File dialog box displays.
2. Browse to the destination for the export file.
3. In the File name field, enter the name of the export file.
4. In the Save as type list, select the file type for your export file.
5. Click Save.
The repository is exported to the file.
Using the project area

The project area provides a hierarchical view of the objects used in each project. Tabs on the
bottom of the project area support different tasks. Tabs include:
Tab Description
Create, view, and manage projects.
This provides a hierarchical view of all objects used in each project.
View the status of currently executing jobs.
Selecting a specific job execution displays its status, including which steps are
complete and which steps are executing. These tasks can also be done using the
Data Services Management Console.
View the history of complete jobs.
Logs can also be viewed with the Data Services Management Console.
To change the docked position of the project area
1. Right-click the border of the project area.
2. From the menu, select Floating.
3. Click and drag the project area to dock and undock at any edge within Designer.
When you drag the project area away from a window edge, it stays undocked. When you
position the project area where one of the directional arrows highlights a portion of the
window, this signifies a placement option. The project area does not dock inside the
workspace area.
4. To switch between the last docked and undocked locations, double-click the gray border.

To change the undocked position of the project area
1. Right-click the border of the project area.
2. From the menu, select Floating to remove the check mark and clear the docking option.
3. Click and drag the project area to any location on your screen.

To lock and unlock the project area
1. Click the pin icon on the border to unlock the project area.
The project area hides.
2. Move the mouse over the docked pane.
The project area re-appears.
3. Click the pin icon to lock the pane in place again.

To hide/show the project area
1. Right-click the border of the project area.
2. From the menu, select Hide.
The project area disappears from the Designer window.
3. To show the project area, click Project Area in the toolbar.
Using the tool palette

The tool palette is a separate window that appears by default on the right edge of the Designer
workspace. You can move the tool palette anywhere on your screen or dock it on any edge of
the Designer window.
The icons in the tool palette allow you to create new objects in the workspace. The icons are
disabled when they are invalid entries to the diagram open in the workspace.
To show the name of each icon, hold the cursor over the icon until the tool tip for the icon
appears.
When you create an object from the tool palette, you are creating a new definition of an object.
If a new object is re-usable, it is automatically available in the Local Object Library after you
create it.
For example, if you select the data flow icon from the tool palette and define a new data flow
called DF1, you can later drag that existing data flow from the Local Object Library and add it
to another data flow called DF2.
The tool palette contains these objects:
Icon Tool Description Available in
Pointer
Returns the tool pointer to a selection
pointer for selecting and moving objects
in a diagram.
All objects
Work flow Creates a new work flow. Jobs and work flows
Data flow Creates a new data flow. Jobs and work flows
R/3 data flow Creates a new data flow with the SAP
licensed extension only.
SAP licensed extension
Query
transform
Creates a query to define column
mappings and row selections.
Data flows
Template table Creates a new table for a target. Data flows
Template XML Creates a new XML file for a target. Data flows
Data transport Creates a data transport flow for the SAP
licensed extension.
SAP licensed
extension
Script Creates a new script object. Jobs and work flows
Conditional Creates a new conditional object. Jobs and work flows
While Loop
Repeats a sequence of steps in a work flow
as long as a condition is true.
Work flows
Try
Creates a new try object that tries an
alternate work flow if an error occurs in a
job.
Jobs and work flows
Catch Creates a new catch object that catches
errors in a job.
Jobs and work flows
Annotation Creates an annotation used to describe
objects.
Jobs, work flows, and
data flows
Using the workspace
When you open a job or any object within a job hierarchy, the workspace becomes active with
your selection. The workspace provides a place to manipulate objects and graphically assemble
data movement processes.
These processes are represented by icons that you drag and drop into a workspace to create a
diagram. This diagram is a visual representation of an entire data movement application or
some part of a data movement application.
You specify the flow of data by connecting objects in the workspace from left to right in the
order you want the data to be moved.
28 BusinessObjects Data Integrator XI 3.0: Core Concepts—Learner’s Guide
Quiz: Describing Data Services
1. List two benefits of using Data Services.
2. Which of these objects is single-use?
a. Job
b. Project
c. Data flow
d. Work flow
3. Place these objects in order by their hierarchy: data flows, jobs, projects, and work flows.
4. Which tool do you use to associate a job server with a repository?
5. Which tool allows you to create a repository?
6. What is the purpose of the Access Server?
Lesson summary
After completing this lesson, you are now able to:
• Describe the purpose of Data Services
• Describe Data Services architecture
• Define Data Services objects
• Use the Data Services Designer interface
Defining Source and Target Metadata—Learner’s Guide 31
Lesson 2
Defining Source and Target Metadata
Lesson introduction
To define data movement requirements in Data Services, you must import source and target
metadata.
After completing this lesson, you will be able to:
• Use datastores
• Use datastore and system configurations
• Define file formats for flat files
• Define file formats for Excel files
• Define file formats for XML files
Using datastores
Introduction
Datastores represent connections between Data Services and databases or applications.
After completing this unit, you will be able to:
• Explain datastores
• Create a database datastore
• Change a datastore definition
• Import metadata
Explaining datastores
A datastore provides a connection or multiple connections to data sources such as a database.
Through the datastore connection, Data Services can import the metadata that describes the
data from the data source.
Data Services uses these datastores to read data from source tables or load data to target tables.
Each source or target must be defined individually and the datastore options available depend
on which Relational Database Management System (RDBMS) or application is used for the
datastore. Database datastores can be created for the following sources:
• IBM DB2, Microsoft SQL Server, Oracle, Sybase, and Teradata databases (using native
connections)
• Other databases (through ODBC)
• A simple memory storage mechanism using a memory datastore
• IMS, VSAM, and various additional legacy systems using BusinessObjects Data Services
Mainframe Interfaces such as Attunity and IBM Connectors
The specific information that a datastore contains depends on the connection. When your
database or application changes, you must make corresponding changes in the datastore
information in Data Services. Data Services does not automatically detect structural changes
to the datastore.
There are three kinds of datastores:
• Database datastores: provide a simple way to import metadata directly from an RDBMS.
• Application datastores: let users easily import metadata from most Enterprise Resource
Planning (ERP) systems.
• Adapter datastores: can provide access to an application’s data and metadata or just metadata.
For example, if the data source is SQL-compatible, the adapter might be designed to access
metadata, while Data Services extracts data from or loads data directly to the application.
Using adapters
Adapters provide access to a third-party application's data and metadata. Depending on the
adapter implementation, adapters can provide:
• Application metadata browsing
• Application metadata importing into the Data Services repository
For batch and real-time data movement between Data Services and applications, Business
Objects offers an Adapter Software Development Kit (SDK) to develop your own custom
adapters. You can also buy Data Services prepackaged adapters to access application data and
metadata in any application.
For more information on these adapters, see Chapter 5 in the Data Services Designer Guide.
You can use the Data Mart Accelerator for Crystal Reports adapter to import metadata from
BusinessObjects Enterprise. See the documentation folder under Adapters located in your Data Services installation for more information on the Data Mart Accelerator for Crystal Reports.
Creating a database datastore
You need to create at least one datastore for each database or file system with which you are
exchanging data. To create a datastore, you must have appropriate access privileges to the
database or file system that the datastore describes. If you do not have access, ask your database
administrator to create an account for you.
To create a database datastore
1. On the Datastores tab of the Local Object Library, right-click the white space and select New from the menu.
The Create New Datastore dialog box displays.
2. In the Datastore name field, enter the name of the new datastore.
The name can contain any alphanumeric characters or underscores (_). It cannot contain
spaces.
3. In the Datastore Type drop-down list, ensure that the default value of Database is selected.
4. In the Database type drop-down list, select the RDBMS for the data source.
5. Enter the other connection details, as required.
The values you select for the datastore type and database type determine the options available
when you create a database datastore. The entries that you must make to create a datastore
depend on the selections you make for these two options. Note that if you are using MySQL,
any ODBC connection provides access to all of the available MySQL schemas.
6. Leave the Enable automatic data transfer check box selected.
7. Click OK.
Changing a datastore definition
Like all Data Services objects, datastores are defined by both options and properties:
• Options control the operation of objects. These include the database server name, database
name, user name, and password for the specific database.
The Edit Datastore dialog box allows you to edit all connection properties except datastore
name and datastore type for adapter and application datastores. For database datastores,
you can edit all connection properties except datastore name, datastore type, database type,
and database version.
• Properties document the object. For example, the name of the datastore and the date on
which it is created are datastore properties. Properties are descriptive of the object and do
not affect its operation.
The Properties dialog box contains these tabs:
• General: contains the name and description of the datastore, if available. The datastore name appears on the object in the Local Object Library and in calls to the object. You cannot change the name of a datastore after creation.
• Attributes: includes the date you created the datastore. This value cannot be changed.
• Class Attributes: includes overall datastore information such as description and date created.
To change datastore options
1. On the Datastores tab of the Local Object Library, right-click the datastore name and select
Edit from the menu.
The Edit Datastore dialog box displays the connection information.
2. Change the database server name, database name, username, and password options, as
required.
3. Click OK.
The changes take effect immediately.
To change datastore properties
1. On the Datastores tab of the Local Object Library, right-click the datastore name and select
Properties from the menu.
The Properties dialog box lists the datastore’s description, attributes, and class attributes.
2. Change the datastore properties, as required.
3. Click OK.
Importing metadata from data sources
Data Services determines and stores a specific set of metadata information for tables. You can
import metadata by name, searching, and browsing. After importing metadata, you can edit
column names, descriptions, and datatypes. The edits are propagated to all objects that call
these objects.
Metadata Description
Table name The name of the table as it appears in the database.
Table description The description of the table.
Column name The name of the table column.
Column description The description of the column.
Column datatype
The datatype for each column.
If a column is defined as an unsupported datatype (see datatypes listed
below) Data Services converts the datatype to one that is supported. In
some cases, if Data Services cannot convert the datatype, it ignores the
column entirely.
The following datatypes are supported: BLOB, CLOB, date, datetime,
decimal, double, int, interval, long, numeric, real, time, timestamp, and
varchar.
Primary key column The column that comprises the primary key for the table. After a table has been added to a data flow diagram, this column is indicated in the column list by a key icon next to the column name.
Table attribute Information Data Services records about the table such as the date created
and date modified if these values are available.
Owner name Name of the table owner.
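The datatype handling described in the table above (convert unsupported types where possible, otherwise ignore the column) can be sketched as a simple lookup. This is an illustrative Python sketch only: the SUPPORTED set comes from the list above, while the CONVERSIONS mapping and the import_columns helper are hypothetical stand-ins, not actual Data Services behavior.

```python
# Illustrative sketch of the datatype rule described above.
# The real conversion logic is internal to Data Services.
SUPPORTED = {"blob", "clob", "date", "datetime", "decimal", "double",
             "int", "interval", "long", "numeric", "real", "time",
             "timestamp", "varchar"}

# Example conversions for a few unsupported source types (assumed, not official).
CONVERSIONS = {"smallint": "int", "float": "double", "char": "varchar"}

def import_columns(columns):
    """Keep supported types, convert known ones, and ignore the rest."""
    imported = {}
    for name, dtype in columns.items():
        dtype = dtype.lower()
        if dtype in SUPPORTED:
            imported[name] = dtype
        elif dtype in CONVERSIONS:
            imported[name] = CONVERSIONS[dtype]
        # otherwise: the column is ignored entirely
    return imported

print(import_columns({"ID": "smallint", "NAME": "varchar", "GEO": "geometry"}))
# GEO is dropped because it cannot be converted; smallint becomes int
```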
You can also import stored procedures from DB2, MS SQL Server, Oracle, and Sybase databases
and stored functions and packages from Oracle. You can use these functions and procedures
in the extraction specifications you give Data Services.
Information that is imported for functions includes:
• Function parameters
• Return type
• Name
• Owner
Imported functions and procedures appear in the Function branch of each datastore tree on
the Datastores tab of the Local Object Library.
You can configure imported functions and procedures through the Function Wizard and the
Smart Editor in a category identified by the datastore name.
Importing metadata by browsing
The easiest way to import metadata is by browsing. Note that functions cannot be imported
using this method.
For more information on importing by searching and importing by name, see “Ways of importing
metadata”, Chapter 5 in the Data Services Designer Guide.
To import metadata by browsing
1. On the Datastores tab of the Local Object Library, right-click the datastore and select Open
from the menu.
The items available to import appear in the workspace.
2. Navigate to and select the tables for which you want to import metadata.
You can hold down the Ctrl or Shift keys and click to select multiple tables.
3. Right-click the selected items and select Import from the menu.
The workspace contains columns that indicate whether the table has already been imported into Data Services (Imported) and if the table schema has changed since it was imported (Changed). To verify whether the repository contains the most recent metadata for an object, right-click the object and select Reconcile.
4. In the Local Object Library, expand the datastore to display the list of imported objects,
organized into Functions, Tables, and Template Tables.
5. To view data for an imported table, right-click the table and select View Data from the
menu.
Activity: Creating source and target datastores
You have been hired as a Data Services designer for Alpha Acquisitions. Alpha has recently
acquired Beta Businesses, an organization that develops and sells software products and related
services.
In an effort to consolidate and organize the data, and simplify the reporting process for the
growing company, the Omega data warehouse is being constructed to merge the data for both
organizations, and a separate data mart is being developed for reporting on Human Resources
data. You also have access to a database for staging purposes called Delta. To start the
development process, you must create datastores and import the metadata for all of these data
sources.
Objective
• Create datastores and import metadata for the Alpha Acquisitions, Beta Businesses, Delta,
HR Data Mart, and Omega databases.
Instructions
1. In your Local Object Library, create a new source datastore for the Alpha Acquisitions data
with the following options:
Field Value
Datastore name Alpha
Datastore type Database
Database type MySQL
Database version MySQL 5.0
Data source alpha
User name alpha
Password alpha
2. Import the metadata for the following source tables:
• alpha.category
• alpha.city
• alpha.country
• alpha.customer
• alpha.department
• alpha.employee
• alpha.hr_comp_update
• alpha.last_run
• alpha.order_details
• alpha.orders
• alpha.product
• alpha.region
3. View the data for the category table and confirm that there are four records.
4. Create a second source datastore for the Beta Businesses data with the following options:
Field Value
Datastore name Beta
Datastore type Database
Database type MySQL
Database version MySQL 5.0
Data source beta
User name beta
Password beta
5. Import the metadata for the following source tables:
• beta.addrcodes
• beta.categories
• beta.city
• beta.country
• beta.customers
• beta.employees
• beta.orderdetails
• beta.orders
• beta.products
• beta.region
• beta.shippers
• beta.suppliers
• beta.usa_customers
6. View the data for the usa_customers table and confirm that Jane Hartley from Planview Inc.
is the first customer record.
7. Create a datastore for the Delta staging database with the following options:
Field Value
Datastore name Delta
Datastore type Database
Database type MySQL
Database version MySQL 5.0
Data source delta
User name delta
Password delta
You do not need to import any metadata.
8. Create a target datastore for the HR data mart with the following options:
Field Value
Datastore name HR_datamart
Datastore type Database
Database type MySQL
Database version MySQL 5.0
Data source hr_datamart
User name hruser
Password hruser
9. Import the metadata for the following target tables:
• hr_datamart.emp_dept
• hr_datamart.employee
• hr_datamart.hr_comp_update
• hr_datamart.recovery_status
10. Create a target datastore for the Omega data warehouse with the following options:
Field Value
Datastore name Omega
Datastore type Database
Database type MySQL
Database version MySQL 5.0
Data source omega
User name omega
Password omega
11. Import the metadata for the following target tables:
• omega.emp_dim
• omega.product_dim
• omega.product_target
• omega.time_dim
Using datastore and system configurations
Introduction
Data Services supports multiple datastore configurations, which allow you to change your
datastores depending on the environment in which you are working.
After completing this unit, you will be able to:
• Create multiple configurations in a datastore
• Create a system configuration
Creating multiple configurations in a datastore
A configuration is a property of a datastore that refers to a set of configurable options (such as
database connection name, database type, user name, password, and locale) and their values.
When you create a datastore, you can specify one datastore configuration at a time and specify
one as the default. Data Services uses the default configuration to import metadata and execute
jobs. You can create additional datastore configurations using the Advanced option in the
datastore editor. You can combine multiple configurations into a system configuration that is
selectable when executing or scheduling a job. Multiple configurations and system configurations
make portability of your job much easier (for example, different connections for development,
test, and production environments).
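The relationship between datastore configurations and system configurations can be pictured with plain data structures. This Python sketch is purely illustrative: the dictionaries, the SC_ names, and the resolve helper are assumptions for demonstration, not Data Services internals.

```python
# Illustrative sketch: each datastore holds several named configurations,
# and a system configuration picks one configuration per datastore.
datastore_configs = {
    "Alpha": {
        "Dev":  {"database_type": "MySQL", "data_source": "alpha_dev"},
        "Prod": {"database_type": "MySQL", "data_source": "alpha_prod"},
    },
}

# SC_ prefix mirrors the naming convention suggested later in this unit.
system_configs = {
    "SC_Development": {"Alpha": "Dev"},
    "SC_Production": {"Alpha": "Prod"},
}

def resolve(system_config_name):
    """Return the concrete connection settings each datastore would use."""
    choice = system_configs[system_config_name]
    return {ds: datastore_configs[ds][cfg] for ds, cfg in choice.items()}

print(resolve("SC_Production")["Alpha"]["data_source"])  # alpha_prod
```

Selecting a system configuration when running a job then amounts to resolving one concrete connection per datastore, which is what makes jobs portable across environments.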
When you add a new configuration, Data Services modifies the language of data flows that
contain table targets and SQL transforms in the datastore based on what you defined in the
new configuration.
To create multiple datastore configurations in an existing datastore
1. On the Datastores tab of the Local Object Library, right-click a datastore and select Edit
from the menu.
The Edit Datastore dialog box displays.
2. Click Advanced >>.
A grid of additional datastore properties and the multiple configuration controls displays.
3. Click Edit next to the Configurations count at the bottom of the dialog box.
The Configurations for Datastore dialog box displays. The default configuration displays.
Each subsequent configuration displays as an additional column.
4. Double-click the header for the default configuration to change the name, and then click
outside of the header to commit the change.
5. Click Create New Configuration in the toolbar.
The Create New Configuration dialog box displays.
6. In the Name field, enter the name for your new configuration.
Do not include spaces when assigning names for your datastore configurations.
7. Select the database type and version.
8. Click OK.
A second configuration is added to the Configurations for Datastore window.
9. Adjust the other properties of the new configuration to correspond with the existing
configuration, as required.
If a property does not apply to a configuration, the cell does not accept input. Cells that
correspond to a group header also do not accept input, and are marked with hatched gray
lines.
10. If required, click Create New Alias to create an alias for the configuration, enter a value for
the alias at the bottom of the page, and click OK to return to the Edit Datastore dialog box.
11. Click OK to complete the datastore configuration.
12. Click OK to close the Edit Datastore dialog box.
Creating a system configuration
System configurations define a set of datastore configurations that you want to use together
when running a job. In many organizations, a Data Services designer defines the required
datastore and system configurations, and a system administrator determines which system
configuration to use when scheduling or starting a job in the Administrator.
When designing jobs, determine and create datastore configurations and system configurations
depending on your business environment and rules. Create datastore configurations for the
datastores in your repository before you create the system configurations for them.
Data Services maintains system configurations separately. You cannot check in or check out
system configurations. However, you can export system configurations to a separate flat file
which you can later import. By maintaining system configurations in a separate file, you avoid
modifying your datastore each time you import or export a job, or each time you check in and
check out the datastore.
You cannot define a system configuration if your repository does not contain at least one
datastore with multiple configurations.
To create a system configuration
1. From the Tools menu, select System Configurations.
The System Configuration Editor dialog box displays columns for each datastore.
2. In the Configuration name column, enter the system configuration name.
Use the SC_ prefix in the system configuration name so that you can easily identify this file
as a system configuration, particularly when exporting.
3. In the drop-down list for each datastore column, select the appropriate datastore
configuration that you want to use when you run a job using this system configuration.
4. Click OK.
Defining file formats for flat files
Introduction
File formats are connections to flat files in the same way that datastores are connections to databases.
After completing this unit, you will be able to:
• Explain file formats
• Create a file format for a flat file
Explaining file formats
A file format is a generic description that can be used to describe one file or multiple data files
if they share the same format. It is a set of properties describing the structure of a flat file (ASCII).
File formats are used to connect to source or target data when the data is stored in a flat file.
The Local Object Library stores file format templates that you use to define specific file formats
as sources and targets in data flows.
File format objects can describe files in:
• Delimited format — delimiter characters such as commas or tabs separate each field.
• Fixed width format — the fixed column width is specified by the user.
• SAP R/3 format — this is used with the predefined Transport_Format or with a custom
SAP R/3 format.
Creating file formats Use the file format editor to set properties for file format templates and source and target file
formats. The file format editor has three work areas:
• Property Value: Edit file format property values. Expand and collapse the property groups
by clicking the leading plus or minus.
• Column Attributes: Edit and define columns or fields in the file. Field-specific formats override the default format set in the Property Values work area.
• Data Preview: View how the settings affect sample data.
The properties and appearance of the work areas vary with the format of the file.
Date formats
In the Property Values work area, you can override default date formats for files at the field level. The following date format codes can be used:
Code Description
DD 2-digit day of the month
MM 2-digit month
MONTH Full name of the month
MON 3-character name of the month
YY 2-digit year
YYYY 4-digit year
HH24 2-digit hour of the day (0-23)
MI 2-digit minute (0-59)
SS 2-digit second (0-59)
FF Up to 9-digit sub-seconds
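To see how these codes behave, here is a hedged Python sketch that translates a Data Services-style format string into strptime directives and parses a value. The code-to-directive mapping is an assumed rough correspondence for illustration only; Data Services performs its own parsing internally.

```python
from datetime import datetime

# Assumed rough correspondence between the format codes above and
# Python strptime directives, for illustration only.
DS_TO_STRPTIME = {
    "DD": "%d", "MM": "%m", "MONTH": "%B", "MON": "%b",
    "YY": "%y", "YYYY": "%Y", "HH24": "%H", "MI": "%M", "SS": "%S",
}

def parse_with_ds_format(value, ds_format):
    """Translate a Data Services-style format string and parse the value."""
    fmt = ds_format
    # Naive replacement: longer codes first, so YYYY is not clobbered by YY
    # and MONTH is not clobbered by MON or MM.
    for code in sorted(DS_TO_STRPTIME, key=len, reverse=True):
        fmt = fmt.replace(code, DS_TO_STRPTIME[code])
    return datetime.strptime(value, fmt)

print(parse_with_ds_format("21-dec-2006", "DD-MON-YYYY").date())  # 2006-12-21
```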
To create a new file format
1. On the Formats tab of the Local Object Library, right-click Flat Files and select New from
the menu to open the File Format Editor.
To make sure your file format definition works properly, it is important to finish inputting
the values for the file properties before moving on to the Column Attributes work area.
2. In the Type field, specify the file type:
• Delimited: select this file type if the file uses a character sequence to separate columns.
• Fixed width: select this file type if the file uses specified widths for each column.
If a fixed-width file format uses a multi-byte code page, then no data is displayed in the
Data Preview section of the file format editor for its files.
3. In the Name field, enter a name that describes this file format template.
Once the name has been created, it cannot be changed. If an error is made, the file format
must be deleted and a new format created.
4. Specify the location information of the data file including Location, Root directory, and File
name.
The Group File Read can read multiple flat files with identical formats through a single file
format. By substituting a wild card character or list of file names for the single file name,
multiple files can be read.
5. Click Yes to overwrite the existing schema.
This happens automatically when you open a file.
6. Complete the other properties to describe files that this template represents. Overwrite the existing schema as required.
7. For source files, specify the structure of each column in the Column Attributes work area
as follows:
Column Description
Field Name Enter the name of the column.
Data Type Select the appropriate datatype from the drop-down list.
Field Size For columns with a datatype of varchar, specify the length of
the field.
Precision For columns with a datatype of decimal or numeric, specify
the precision of the field.
Scale For columns with a datatype of decimal or numeric, specify
the scale of the field.
Format
For columns with any datatype but varchar, select a format
for the field, if desired. This information overrides the default
format set in the Property Values work area for that datatype.
You do not need to specify columns for files used as targets. If you do specify columns and
they do not match the output schema from the preceding transform, Data Services writes
to the target file using the transform’s output schema.
For a decimal or real datatype, if you only specify a source column format and the column
names and datatypes in the target schema do not match those in the source schema, Data
Services cannot use the source column format specified. Instead, it defaults to the format
used by the code page on the computer where the Job Server is installed.
8. Click Save & Close to save the file format and close the file format editor.
9. In the Local Object Library, right-click the file format and select View Data from the menu
to see the data.
To create a file format from an existing file format
1. On the Formats tab of the Local Object Library, right-click an existing file format and select
Replicate.
The File Format Editor opens, displaying the schema of the copied file format.
2. In the Name field, enter a unique name for the replicated file format.
Data Services does not allow you to save the replicated file with the same name as the
original (or any other existing File Format object). After it is saved, you cannot modify the
name again.
3. Edit the other properties as desired.
4. Click Save & Close to save the file format and close the file format editor.
To read multiple flat files with identical formats through a single file format
1. On the Formats tab of the Local Object Library, right-click an existing file format and select
Edit from the menu.
The format must be based on one single file that shares the same schema as the other files.
2. In the location field of the format wizard, enter one of the following:
• Root directory (optional to avoid retyping)
• List of file names, separated by commas
• File name containing a wild character (*)
When you use the wild card (*) to refer to several files, Data Services reads one file, closes it, and then proceeds to read the next one. For example, if you specify the file name revenue*.txt, Data Services reads all flat files whose names start with revenue.
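The revenue*.txt example above can be mimicked with Python's glob module to show which file names such a pattern matches. The file names and the temporary directory here are invented for illustration.

```python
import glob
import os
import tempfile

# Sketch of the wildcard matching described above: create a few sample
# files, then see which ones a revenue*.txt pattern would pick up.
with tempfile.TemporaryDirectory() as root:
    for name in ["revenue_jan.txt", "revenue_feb.txt", "expenses.txt"]:
        open(os.path.join(root, name), "w").close()

    matched = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(root, "revenue*.txt")))

print(matched)  # ['revenue_feb.txt', 'revenue_jan.txt']
```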
Handling errors in file formats
One of the features available in the File Format Editor is error handling. When you enable
error handling for a file format, Data Services:
• Checks for the two types of flat-file source errors:
○ Datatype conversion errors. For example, a field might be defined in the File Format
Editor as having a datatype of integer but the data encountered is actually varchar.
○ Row-format errors. For example, in the case of a fixed-width file, Data Services identifies
a row that does not match the expected width value.
• Stops processing the source file after reaching a specified number of invalid rows.
• Logs errors to the Data Services error log. You can limit the number of log entries allowed
without stopping the job.
You can choose to write rows with errors to an error file, which is a semicolon-delimited text
file that you create on the same machine as the Job Server.
Entries in an error file have this syntax:
source file path and name; row number in source file; Data Services error; column number where the error occurred; all columns from the invalid row
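Because the error file is plain semicolon-delimited text, it is straightforward to post-process. This minimal Python sketch parses one entry with the layout described above; the parse_error_line helper and the sample line are invented for illustration.

```python
# Minimal sketch of reading one error-file entry with the layout above:
# path; row number; error; column number; all columns from the invalid row.
def parse_error_line(line):
    path, row_num, error, col_num, *row_data = line.rstrip("\n").split(";")
    return {
        "source_file": path,
        "row_number": int(row_num),
        "error": error,
        "column_number": int(col_num),
        "row_data": row_data,  # variable length: the whole invalid row
    }

# Sample line invented for illustration; real error text comes from Data Services.
sample = "d:/data/orders.txt;23;data conversion error;2;11196;ABC;21-dec-2006"
entry = parse_error_line(sample)
print(entry["row_number"], entry["error"])  # 23 data conversion error
```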
To enable flat file error handling in the File Format Editor
1. On the Formats tab of the Local Object Library, right-click the file format and select Edit
from the menu.
2. Under the Error handling section, in the Capture data conversion errors drop-down list,
select Yes.
3. In the Capture row format errors drop-down list, select Yes.
4. In the Write error rows to file drop-down list, select Yes.
You can also specify the maximum warnings to log and the maximum errors before a job
is stopped.
5. In the Error file root directory field, click the folder icon to browse to the directory in which
you have stored the error handling text file you created.
6. In the Error file name field, enter the name for the text file you created to capture the flat
file error logs in that directory.
7. Click Save & Close.
Activity: Creating a file format for a flat file
In addition to the main databases for source information, records for some of the orders for
Alpha Acquisitions are stored in flat files.
Objective
• Create a file format for the orders flat files so you can use them as source objects.
Instructions
1. In the Local Object Library, create a new delimited file format called Orders_Format for the orders_12_21_06.txt flat file in the Activity_Source folder.
The path depends on where the folder has been copied from the Learner Resource CD.
2. Adjust the format so that it reflects the source file.
Consider the following:
• The column delimiter is a semicolon (;).
• The row delimiter is {Windows new line}.
• The date format is dd-mon-yyyy.
• The row header should be skipped.
3. In the Column Attributes pane, adjust the datatypes for the columns based on their content.
Column Datatype
ORDERID int
EMPLOYEEID varchar(15)
ORDERDATE date
CUSTOMERID int
COMPANYNAME varchar(50)
CITY varchar(50)
COUNTRY varchar(50)
4. Save your changes and view the data to confirm that order 11196 was placed on December
21, 2006.
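The settings from this activity (semicolon delimiter, skipped header row, dd-mon-yyyy dates, and the column datatypes above) can be mirrored in plain Python to sanity-check a sample of the file. The sample row below is invented except for order 11196 and its date, which the activity itself states.

```python
import csv
import io
from datetime import datetime

# Sketch of the Orders_Format settings applied by hand: semicolon
# delimiter, one header row to skip, and dd-mon-yyyy dates.
# Sample data invented for illustration, apart from order 11196's date.
sample = io.StringIO(
    "ORDERID;EMPLOYEEID;ORDERDATE;CUSTOMERID;COMPANYNAME;CITY;COUNTRY\n"
    "11196;EMP42;21-dec-2006;7;Example Co.;Boston;USA\n"
)

reader = csv.reader(sample, delimiter=";")
next(reader)  # skip the row header
orders = [
    {
        "ORDERID": int(row[0]),
        "EMPLOYEEID": row[1],
        "ORDERDATE": datetime.strptime(row[2], "%d-%b-%Y").date(),
        "CUSTOMERID": int(row[3]),
        "COMPANYNAME": row[4],
        "CITY": row[5],
        "COUNTRY": row[6],
    }
    for row in reader
]
print(orders[0]["ORDERID"], orders[0]["ORDERDATE"])  # 11196 2006-12-21
```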
Defining file formats for Excel files
Introduction
You can create file formats for Excel files in the same way that you would for flat files.
After completing this unit, you will be able to:
• Create a file format for an Excel file
Using Excel as a native data source
It is possible to connect to Excel workbooks natively as a source, with no ODBC connection setup and configuration needed. You can select specific data in the workbook using custom ranges or auto-detect, and you can specify variables for file and sheet names for more flexibility.
As with file formats and datastores, these Excel formats show up as sources in impact and
lineage analysis reports.
To import and configure an Excel source
1. On the Formats tab of the Local Object Library, right-click Excel Workbooks and select New from the menu.
The Import Excel Workbook dialog box displays.
2. In the Format name field, enter a name for the format.
The name may contain underscores but not spaces.
3. On the Format tab, click the drop-down button beside the Directory field and select <Select
folder...>.
4. Navigate to and select a new directory, and then click OK.
5. Click the drop-down button beside the File name field and select <Select file...>.
6. Navigate to and select an Excel file, and then click OK.
7. Do one of the following:
• To reference a named range for the Excel file, select the Named range radio button and
enter a value in the field provided.
• To reference an entire worksheet, select the Worksheet radio button and then select the
All fields radio button.
• To reference a custom range, select the Worksheet radio button and the Custom range
radio button, click the ellipses (...) button, select the cells, and close the Excel file by
clicking X in the top right corner of the worksheet.
8. If required, select the Extend range checkbox.
The Extend range checkbox provides a means to extend the spreadsheet in the event that
additional rows of data are added at a later time. If this checkbox is checked, at execution
time, Data Services searches row by row until a null value row is reached. All rows above
the null value row are included.
9. If applicable, select the Use first row values as column names option.
If this option is selected, field names are based on the first row of the imported Excel sheet.
10. Click Import schema.
The schema is displayed at the top of the dialog box.
11. Specify the structure of each column as follows:
Column Description
Field Name Enter the name of the column.
Data Type Select the appropriate datatype from the drop-down list.
Field Size For columns with a datatype of varchar, specify the length of the field.
Precision For columns with a datatype of decimal or numeric, specify the
precision of the field.
Scale For columns with a datatype of decimal or numeric, specify the scale
of the field.
Description If desired, enter a description of the column.
12. If required, on the Data Access tab, enter any changes that are required.
The Data Access tab provides options to retrieve the file via FTP or execute a custom
application (such as unzipping a file) before reading the file.
13. Click OK.
The newly imported file format appears in the Local Object Library with the other Excel
workbooks. The sheet is now available to be selected for use as a native data source.
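The Extend range behavior described in step 8 (scan row by row until a null value row, keep only the rows above it) can be sketched as a minimal Python analogy. This is not Data Services code; the sheet contents are hypothetical:

```python
# Sketch of the Extend range scan: read rows downward until the first row
# whose cells are all empty, and keep only the rows above it.
def extend_range(rows):
    kept = []
    for row in rows:
        if all(cell in (None, "") for cell in row):  # null value row: stop
            break
        kept.append(row)
    return kept

sheet = [
    ["EmployeeID", "Emp_Salary"],
    ["2Lis5", 50000],
    [None, None],        # first all-null row ends the range
    ["ignored", 123],    # rows below the null row are excluded
]
print(extend_range(sheet))  # keeps only the first two rows
```

The point of the scan is that rows added later are picked up automatically at execution time, as long as they appear above the first empty row.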
Activity: Creating a file format for an Excel file
Compensation information for Alpha Acquisitions is stored in an Excel spreadsheet. To use this information in data flows, you must create a file format.
Objective
• Create a file format to enable you to use the compensation spreadsheet as a source object
Instructions
1. In the Local Object Library, create a new file format for an Excel Workbook called Comp_HR.
2. Navigate to the Comp_HR.xls file in the Activity_Source folder. The path depends on where
the folder has been copied from the Learner Resource CD.
3. Select a custom range for the Comp_HR worksheet and select all cells that contain data.
4. Specify that you want to be able to extend the range.
5. Use the first row for the column names.
6. Import the schema and adjust the datatypes for the columns as follows:
Column Datatype
EmployeeID varchar(10)
Emp_Salary int
Emp_Bonus int
Emp_VacationDays int
date_updated datetime
7. Save your changes and view the data to confirm that employee 2Lis5 has 16 vacation days accrued.
Defining file formats for XML files
Introduction
Data Services allows you to import and export metadata for XML documents that you can use
as sources or targets in jobs.
After completing this unit, you will be able to:
• Import data from XML documents
• Explain nested data
Importing data from XML documents
XML documents are hierarchical, and the set of properties describing their structure is stored in separate format files. These format files describe the data contained in the XML document and the relationships among the data elements: the schema. The format of an XML file or message (.xml) can be specified using either a document type definition (.dtd) or an XML Schema (.xsd).
Data flows can read and write data to messages or files based on a specified DTD format or
XML Schema. You can use the same DTD format or XML Schema to describe multiple XML
sources or targets.
Data Services uses Nested Relational Data Modeling (NRDM) to structure imported metadata
from format documents, such as .xsd or .dtd files, into an internal schema to use for hierarchical
documents.
Importing metadata from a DTD file
For example, for an XML document that contains the information to place a sales order, such as the order header, customer, and line items, the corresponding DTD includes the order structure and the relationship between the data elements.
You can import metadata from either an existing XML file (with a reference to a DTD) or a
DTD file. If you import the metadata from an XML file, Data Services automatically retrieves
the DTD for that XML file.
When importing a DTD format, Data Services reads the defined elements and attributes, and
ignores other parts, such as text and comments, from the file definition. This allows you to
modify imported XML data and edit the datatype as needed.
To import a DTD format
1. On the Formats tab of the Local Object Library, right-click DTDs, and select New.
The Import DTD Format dialog box appears.
2. In the DTD definition name field, enter the name you want to give the imported DTD
format.
3. Beside the File name field, click Browse, locate the file path that specifies the DTD you want
to import, and open the DTD.
4. In the File type area, select a file type.
The default file type is DTD. Use the XML option if the DTD file is embedded within the XML data.
5. In the Root element name field, select the name of the primary node of the XML that the
DTD format is defining.
Data Services only imports elements of the format that belong to this node or any sub-nodes.
This option is not available when you select the XML file option type.
6. In the Circular level field, specify the number of levels the DTD has, if applicable.
If the DTD format contains recursive elements, for example, element A contains B and
element B contains A, this value must match the number of recursive levels in the DTD
format’s content. Otherwise, the job that uses this DTD format will fail.
7. In the Default varchar size field, set the varchar size to import strings into Data Services.
The default varchar size is 1024.
8. Click OK.
After you import the DTD format, you can view the DTD format’s column properties, and
edit the nested table and column attributes in the DTD - XML Format editor. For more
information on DTD attributes, see Chapter 2 in the Data Services Reference Guide.
To edit column attributes of nested schemas
1. On the Formats tab of the Local Object Library, expand DTDs and double-click the DTD
name to open it in the workspace.
2. In the workspace, right-click a nested column or column and select Properties.
3. In the Column Properties window, click the Attributes tab.
4. To change an attribute, click the attribute name and enter the appropriate value in the Value
field.
5. Click OK.
Importing metadata from an XML schema
For an XML document that contains, for example, information to place a sales order, such as
order header, customer, and line items, the corresponding XML schema includes the order
structure and the relationship between the data as shown:
When importing an XML Schema, Data Services reads the defined elements and attributes,
and imports:
• Document structure
• Table and column names
• Datatype of each column
• Nested table and column attributes
Note: While XML Schemas make a distinction between elements and attributes, Data Services
imports and converts them all to nested table and column attributes. For more information on Data
Services attributes, see Chapter 2 in the Data Services Reference Guide.
To import an XML schema
1. On the Formats tab of the Local Object Library, right-click XML Schemas, and select New.
The Import XML Schema Format editor appears.
2. In the Format name field, enter the name you want to give the format.
3. In the File name/URL field, enter the file name or URL of the source file, or click
Browse, locate the file path that specifies the XML Schema you want to import, and open
the file.
4. In the Root element name drop-down list, select the name of the primary node you want
to import.
Data Services only imports elements of the XML Schema that belong to this node or any
subnodes. If the root element name is not unique within the XML Schema, select a namespace
to identify the imported XML Schema.
5. In the Circular level field, specify the number of levels the XML Schema has, if applicable.
If the XML Schema contains recursive elements, for example, element A contains B and
element B contains A, this value must match the number of recursive levels in the XML
Schema’s content. Otherwise, the job that uses this XML Schema will fail.
6. In the Default varchar size field, set the varchar size to import strings into Data Services.
The default varchar size is 1024.
7. Click OK.
After you import an XML Schema, you can view the XML schema’s column properties, and
edit the nested table and column attributes in the workspace.
Explaining nested data
Sales orders are often presented using nested data. For example, the line items in a sales order
are related to a single header and are represented using a nested schema. Each row of the sales
order data set contains a nested line item schema as shown:
Using the nested data method can be more concise (no repeated information), and can scale to
present a deeper level of hierarchical complexity.
To expand on the example above, columns inside a nested schema can also contain columns.
There is a unique instance of each nested schema for each row at each level of the relationship
as shown:
Generalizing further with nested data, each row at each level can have any number of columns
containing nested schemas.
Data Services maps nested data to a separate schema implicitly related to a single row and
column of the parent schema. This mechanism is called Nested Relational Data Modeling
(NRDM). NRDM provides a way to view and manipulate hierarchical relationships within
data flow sources, targets, and transforms.
In Data Services, you can see the structure of nested data in the input and output schemas of
sources, targets, and transforms in data flows.
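One way to picture an NRDM row is as a record whose columns can themselves hold a nested table. This Python sketch is an analogy only, not Data Services code, and the column and field names are illustrative:

```python
# One sales-order row carrying a nested line-item schema: the header values
# appear once, and only the nested schema repeats per line item.
order_row = {
    "OrderNo": 9999,
    "CustName": "Alpha Acquisitions",
    "LineItems": [                 # nested schema: one small table per row
        {"Item": 1001, "Qty": 2},
        {"Item": 1002, "Qty": 5},
    ],
}
print(len(order_row["LineItems"]))  # 2
```

Because the header is stored once rather than repeated on every line item, the nested form avoids the duplication a flat join would introduce.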
Unnesting data
Loading a data set that contains nested schemas into a relational target requires that the nested
rows be unnested.
For example, a sales order may use a nested schema to define the relationship between the
order header and the order line items. To load the data into relational schemas, the multi-level
must be unnested.
Unnesting a schema produces a cross-product of the top-level schema (parent) and the nested
schema (child).
You can also load different columns from different nesting levels into different schemas. For
example, a sales order can be flattened so that the order number is maintained separately with
each line-item and the header and line-item information are loaded into separate schemas.
Data Services allows you to unnest any number of nested schemas at any depth. No matter
how many levels are involved, the result of unnesting schemas is a cross product of the parent
and child schemas.
When more than one level of unnesting occurs, the inner-most child is unnested first, then the
result—the cross product of the parent and the inner-most child—is then unnested from its
parent, and so on to the top-level schema.
Keep in mind that unnesting all schemas to create a cross product of all data might not produce
the results you intend. For example, if an order includes multiple customer values such as
ship-to and bill-to addresses, flattening a sales order by unnesting customer and line-item
schemas produces rows of data that might not be useful for processing the order.
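The cross product that unnesting produces can be sketched as follows. This is an analogy only, not Data Services code, and the field names are illustrative:

```python
# Unnesting: each parent row is joined with each of its nested child rows,
# producing the cross product of the parent and child schemas.
def unnest(parent_rows, nested_key):
    flat = []
    for parent in parent_rows:
        # header columns: everything except the nested schema itself
        header = {k: v for k, v in parent.items() if k != nested_key}
        for child in parent[nested_key]:
            flat.append({**header, **child})  # header repeated per child row
    return flat

orders = [{"OrderNo": 9999, "LineItems": [{"Item": 1001}, {"Item": 1002}]}]
for row in unnest(orders, "LineItems"):
    print(row)
# Each output row repeats the order header alongside one line item.
```

With two nested schemas (for example, customer and line items), applying this cross product to both would multiply the rows together, which illustrates why unnesting everything at once may not produce useful results.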
Quiz: Defining source and target metadata
1. What is the difference between a datastore and a database?
2. What are the two methods in which metadata can be manipulated in Data Services objects? What does each of these do?
3. Which of the following is NOT a datastore type?
a. Database
b. Application
c. Adapter
d. File Format
4. What is the difference between a repository and a datastore?
Lesson summary
After completing this lesson, you are now able to:
• Use datastores
• Use datastore and system configurations
• Define file formats for flat files
• Define file formats for Excel files
• Define file formats for XML files
Creating Batch Jobs—Learner’s Guide 69
Lesson 3
Creating Batch Jobs
Lesson introduction
Once metadata has been imported for your datastores, you can create data flows to define data
movement requirements.
After completing this lesson, you will be able to:
• Work with objects
• Create a data flow
• Use the Query transform
• Use target tables
• Execute the job
Working with objects
Introduction
Data flows define how information is moved from source to target. These data flows are
organized into executable jobs, which are grouped into projects.
After completing this unit, you will be able to:
• Create a project
• Create a job
• Add, connect, and delete objects in the workspace
• Create a work flow
Creating a project
A project is a single-use object that allows you to group jobs. It is the highest level of organization
offered by Data Services. Opening a project makes one group of objects easily accessible in the
user interface. Only one project can be open at a time.
A project is used solely for organizational purposes. For example, you can use a project to
group jobs that have schedules that depend on one another or that you want to monitor together.
The objects in a project appear hierarchically in the project area in Designer. If a plus sign (+)
appears next to an object, you can expand it to view the lower-level objects.
The objects in the project area also display in the workspace, where you can drill down into additional levels.
To create a new project
1. From the Project menu, select New ➤ Project.
You can also right-click the white space on the Projects tab of the Local Object Library and
select New from the menu.
The Project - New dialog box displays.
2. Enter a unique name in the Project name field.
The name can include alphanumeric characters and underscores (_). It cannot contain blank
spaces.
3. Click Create.
The new project appears in the project area. As you add jobs and other lower-level objects
to the project, they also appear in the project area.
To open an existing project
1. From the Project menu, select Open.
The Project - Open dialog box displays.
2. Select the name of an existing project from the list.
3. Click Open.
If another project is already open, Data Services closes that project and opens the new one
in the project area.
To save a project
1. From the Project menu, select Save All.
The Save all changes dialog box lists the jobs, work flows, and data flows that you edited
since the last save.
2. Deselect any listed object to avoid saving it.
3. Click OK.
You are also prompted to save all changes made in a job when you execute the job or exit
the Designer.
Creating a job
A job is the only executable object in Data Services. When you are developing your data flows,
you can manually execute and test jobs directly in Data Services. In production, you can schedule
batch jobs and set up real-time jobs as services that execute a process when Data Services
receives a message request.
A job is made up of steps that are executed together. Each step is represented by an object icon
that you place in the workspace to create a job diagram. A job diagram is made up of two or
more objects connected together. You can include any of the following objects in a job definition:
• Work flows
• Scripts
• Conditionals
• While loops
• Try/catch blocks
• Data flows
○ Source objects
○ Target objects
○ Transforms
If a job becomes complex, you can organize its content into individual work flows, and then
create a single job that calls those work flows.
Tip: It is recommended that you follow consistent naming conventions to facilitate object
identification across all systems in your enterprise.
To create a job in the project area
1. In the project area, right-click the project name and select New Batch Job from the menu.
A new batch job is created in the project area.
2. Edit the name of the job.
The name can include alphanumeric characters and underscores (_). It cannot contain blank
spaces.
Data Services opens a new workspace for you to define the job.
3. Click the cursor outside of the job name or press Enter to commit the changes.
You can also create a job and related objects from the Local Object Library. When you create
a job in the Local Object Library, you must associate the job and all related objects to a project
before you can execute the job.
Adding, connecting, and deleting objects in the workspace
After creating a job, you can add objects to the job workspace area using either the Local Object
Library or the tool palette.
To add objects from the Local Object Library to the workspace
1. In the Local Object Library, click the tab for the type of object you want to add.
2. Click and drag the selected object onto the workspace.
To add objects from the tool palette to the workspace
• In the tool palette, click the desired object, move the cursor to the workspace, and then click
the workspace to add the object.
Creating a work flow
A work flow is an optional object that defines the decision-making process for executing other
objects.
For example, elements in a work flow can determine the path of execution based on a value
set by a previous job or can indicate an alternative path if something goes wrong in the primary
path. Ultimately, the purpose of a work flow is to prepare for executing data flows and to set
the state of the system after the data flows are complete.
Note: In essence, jobs are just work flows that can be executed. Almost all of the features documented
for work flows also apply to jobs.
Work flows can contain data flows, conditionals, while loops, try/catch blocks, and scripts.
They can also call other work flows, and you can nest calls to any depth. A work flow can even
call itself.
To create a work flow
1. Open the job or work flow to which you want to add the work flow.
2. Select the Work Flow icon in the tool palette.
3. Click the workspace where you want to place the work flow.
4. Enter a unique name for the work flow.
5. Click the cursor outside of the work flow name or press Enter to commit the changes.
To connect objects in the workspace area
• Click and drag from the triangle or square of an object to the triangle or square of the next
object in the flow to connect the objects.
To disconnect objects in the workspace area
• Select the connecting line between the objects and press Delete.
Defining the order of execution in work flows
The connections you make between the icons in the workspace determine the order in which
work flows execute, unless the jobs containing those work flows execute in parallel. Steps in a
work flow execute in a sequence from left to right. You must connect the objects in a work flow
when there is a dependency between the steps.
To execute more complex work flows in parallel, you can define each sequence as a separate
work flow, and then call each of the work flows from another work flow, as in this example:
First, you must define Work Flow A:
Next, define Work Flow B:
Finally, create Work Flow C to call Work Flows A and B:
You can specify a job to execute a particular work flow or data flow once only. If you specify
that it should be executed only once, Data Services only executes the first occurrence of the
work flow or data flow, and skips subsequent occurrences in the job. You might use this feature
when developing complex jobs with multiple paths, such as jobs with try/catch blocks or
conditionals, and you want to ensure that Data Services only executes a particular work flow
or data flow one time.
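The pattern of Work Flow C calling the unconnected Work Flows A and B so that they run in parallel can be pictured with this Python analogy. The engine handles this internally in Data Services; the function names are illustrative only:

```python
# Analogy: unconnected work flows called from a parent may run in parallel;
# connected work flows run in sequence from left to right.
from concurrent.futures import ThreadPoolExecutor

def work_flow_a():
    return "A done"

def work_flow_b():
    return "B done"

def work_flow_c():
    # A and B have no connection between them, so they can execute
    # concurrently; C waits for both before finishing.
    with ThreadPoolExecutor() as pool:
        a = pool.submit(work_flow_a)
        b = pool.submit(work_flow_b)
        return [a.result(), b.result()]

print(work_flow_c())  # ['A done', 'B done']
```

Connecting A to B in the workspace would correspond to calling them sequentially instead, since a connection expresses a dependency between the steps.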
Creating a data flow
Introduction
Data flows contain the source, transform, and target objects that represent the key activities in
data integration and data quality processes.
After completing this unit, you will be able to:
• Create a data flow
• Explain source and target objects
• Add source and target objects to a data flow
Using data flows
Data flows determine how information is extracted from sources, transformed, and loaded into
targets. The lines connecting objects in a data flow represent the flow of data through data
integration and data quality processes.
Each icon you place in the data flow diagram becomes a step in the data flow. The objects that
you can use as steps in a data flow are:
• Source and target objects
• Transforms
The connections you make between the icons determine the order in which Data Services
completes the steps.
Using data flows as steps in work flows
Each step in a data flow, up to the target definition, produces an intermediate result. For example, the result of a SQL statement containing a WHERE clause flows to the next step in the data flow. The intermediate result consists of a set of rows from the previous operation and
the schema in which the rows are arranged. This result is called a data set. This data set may,
in turn, be further filtered and directed into yet another data set.
Data flows are closed operations, even when they are steps in a work flow. Any data set created
within a data flow is not available to other steps in the work flow.
A work flow does not operate on data sets and cannot provide more data to a data flow;
however, a work flow can:
• Call data flows to perform data movement operations.
• Define the conditions appropriate to run data flows.
• Pass parameters to and from data flows.
To create a new data flow
1. Open the job or work flow in which you want to add the data flow.
2. Select the Data Flow icon in the tool palette.
3. Click the workspace where you want to add the data flow.
4. Enter a unique name for your data flow.
Data flow names can include alphanumeric characters and underscores (_). They cannot
contain blank spaces.
5. Click the cursor outside of the data flow or press Enter to commit the changes.
6. Double-click the data flow to open the data flow workspace.
Changing data flow properties
You can specify the following advanced data properties for a data flow:
Execute only once
When you specify that a data flow should only execute once, a batch job will never re-execute that data flow after the data flow completes successfully, even if the data flow is contained in a work flow that is a recovery unit that re-executes. You should not select this option if the parent work flow is a recovery unit.
Use database links
Database links are communication paths between one database server and another. Database links allow local users to access data on a remote database, which can be on the local or a remote computer of the same or different database type. For more information see “Database link support for push-down operations across datastores” in the Data Services Performance Optimization Guide.
Degree of parallelism
Degree of parallelism (DOP) is a property of a data flow that defines how many times each transform within a data flow replicates to process a parallel subset of data. For more information see “Degree of parallelism” in the Data Services Performance Optimization Guide.
Cache type
You can cache data to improve performance of operations such as joins, groups, sorts, filtering, lookups, and table comparisons. Select one of the following values:
• In Memory: Choose this value if your data flow processes a small amount of data that can fit in the available memory.
• Pageable: Choose this value if you want to return only a subset of data at a time to limit the resources required. This is the default.
For more information, see “Tuning Caches” in the Data Services Performance Optimization Guide.
To change data flow properties
1. Right-click the data flow and select Properties from the menu.
The Properties window opens for the data flow.
2. Change the properties of the data flow as required.
3. Click OK.
For more information about how Data Integrator processes data flows with multiple properties, see “Data Flow” in the Data Services Reference Guide.
Explaining source and target objects
A data flow directly reads data from source objects and loads data to target objects.
Object Description Type
Table: A file formatted with columns and rows as used in relational databases. Source and target.
Template table: A template table that has been created and saved in another data flow (used in development). Source and target.
File: A delimited or fixed-width flat file. Source and target.
Document: A file with an application-specific format (not readable by an SQL or XML parser). Source and target.
XML file: A file formatted with XML tags. Source and target.
XML message: A source in real-time jobs. Source only.
XML template file: An XML file whose format is based on the preceding transform output (used in development, primarily for debugging data flows). Target only.
Transform: A pre-built set of operations that can create new data, such as the Date Generation transform. Source only.
Adding source and target objects
Before you can add source and target objects to a data flow, you must first create the datastore and import the table metadata for any databases, or create the file format for flat files.
To add a source or target object to a data flow
1. In the workspace, open the data flow in which you want to place the object.
2. Do one of the following:
• To add a database table, in the Datastores tab of the Local Object Library, select the table.
• To add a flat file, in the Formats tab of the Local Object Library, select the file format.
3. Click and drag the object to the workspace.
A pop-up menu appears for the source or target object.
4. Select Make Source or Make Target from the menu, depending on whether the object is a
source or target object.
5. Add and connect objects in the data flow as appropriate.
Using the Query transform
Introduction
The Query transform is the most commonly used transform and is included in most data flows. It enables you to select data from a source and filter or reformat it as it moves to the target.
After completing this unit, you will be able to:
• Describe the transform editor
• Use the Query transform
Describing the transform editor
The transform editor is a graphical interface for defining the properties of transforms. The
workspace can contain these areas:
• Input schema area
• Output schema area
• Parameters area
The input schema area displays the schema of the input data set. For source objects and some
transforms, this area is not available.
The output schema area displays the schema of the output data set, including any functions.
For template tables, the output schema can be defined based on your preferences.
For any data that needs to move from source to target, a relationship must be defined between
the input and output schemas. To create this relationship, you must map each input column
to the corresponding output column.
Below the input and output schema areas is the parameters area. The options available in this area differ based on which transform or object you are modifying. The I icon indicates tabs containing user-defined entries.
Explaining the Query transform
The Query transform is used so frequently that it is included in the tool palette with other standard objects. It retrieves a data set that satisfies conditions that you specify, similar to a
SQL SELECT statement.
The Query transform can perform the following operations:
• Filter the data extracted from sources.
• Join data from multiple sources.
• Map columns from input to output schemas.
• Perform transformations and functions on the data.
• Perform data nesting and unnesting.
• Add new columns, nested schemas, and function results to the output schema.
• Assign primary keys to output columns.
For example, you could use the Query transform to select a subset of the data in a table to show
only those records from a specific region.
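The region example can be pictured as the SQL-SELECT-like behavior the Query transform provides: filter the source rows (the Where tab) and map a subset of input columns to the output schema (the Mapping tab). This Python sketch is an analogy only; the column names and region values are illustrative:

```python
# Rough analogy: SELECT CustomerID, Name FROM source WHERE Region = 'EMEA'
source = [
    {"CustomerID": 1, "Name": "Ada", "Region": "EMEA"},
    {"CustomerID": 2, "Name": "Bo",  "Region": "APAC"},
]

output = [
    {"CustomerID": r["CustomerID"], "Name": r["Name"]}  # column mappings
    for r in source
    if r["Region"] == "EMEA"                            # Where condition
]
print(output)  # [{'CustomerID': 1, 'Name': 'Ada'}]
```

The dropped Region column mirrors the way only mapped columns appear in the Query transform's output schema.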
The next section gives a brief description of the function, data input requirements, options, and data output results for the Query transform. For more information on the Query transform, see “Transforms,” Chapter 5 in the Data Services Reference Guide.
Input/Output
The data input is a data set from one or more sources with rows flagged with a NORMAL
operation code.
The NORMAL operation code creates a new row in the target. All the rows in a data set are
flagged as NORMAL when they are extracted by a source table or file. If a row is flagged as
NORMAL when loaded into a target table or file, it is inserted as a new row in the target.
The data output is a data set based on the conditions you specify and using the schema specified
in the output schema area.
Note: When working with nested data from an XML file, you can use the Query transform to
unnest the data using the right-click menu for the output schema, which provides options for
unnesting.
Options

The input schema area displays all schemas input to the Query transform as a hierarchical tree.
Each input schema can contain multiple columns.
The output schema area displays the schema output from the Query transform as a hierarchical
tree. The output schema can contain multiple columns and functions.
Icons preceding columns combine the following indicators (the icon graphics are not reproduced here):
• Primary key: the column is a primary key.
• Simple mapping: the mapping is either a single column or an expression with no input column.
• Complex mapping: the mapping involves a transformation or a merge between two source columns.
• Incorrect mapping: the column mapping is invalid. Data Integrator does not perform a complete validation during design, so not all incorrect mappings will necessarily be flagged.
The parameters area of the Query transform includes the following tabs:
Tab Description
Mapping Specify how the selected output column is derived.
Select Select only distinct rows (discarding any duplicate rows).
From Specify the input schemas used in the current output schema.
Outer Join Specify an inner table and an outer table for joins that you want
treated as outer joins.
Where Set conditions that determine which rows are output.
Group By Specify a list of columns for which you want to combine output.
For each unique set of values in the group by list, Data Services
combines or aggregates the values in the remaining columns.
Order By Specify the columns you want used to sort the output data set.
Search/Replace Search for and replace a specific word or item in the input schema
or the output schema.
Advanced Create separate sub data flows to process any of the following
resource-intensive query clauses: DISTINCT, GROUP BY, JOIN, and
ORDER BY. For more information, see "Distributed Data Flow
execution" in the Data Services Designer Guide.
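The Group By behavior (one output row per unique set of group-by values, with the remaining columns aggregated) can be sketched as follows; the column names and sample rows are invented:

```python
from collections import defaultdict

# Invented sample rows: aggregate SALES per REGION, as a Group By on REGION would.
rows = [
    {"REGION": "WEST", "SALES": 100},
    {"REGION": "EAST", "SALES": 40},
    {"REGION": "WEST", "SALES": 60},
]

totals = defaultdict(int)
for r in rows:
    totals[r["REGION"]] += r["SALES"]   # combine values in the remaining column

# One output row per unique group-by value.
grouped = [{"REGION": k, "SALES": v} for k, v in sorted(totals.items())]
print(grouped)
```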
To map input columns to output columns

• In the transform editor, do any of the following:
• Drag and drop a single column from the input schema area into the output schema area.
• Drag a single input column over the corresponding output column, release the cursor,
and select Remap Column from the menu.
• Select multiple input columns (using Ctrl+click or Shift+click) and drag onto Query
output schema for automatic mapping.
• Select the output column and manually enter the mapping on the Mapping tab in the
parameters area. You can either type the column name in the parameters area or click
and drag the column from the input schema pane.
Using target tables
Introduction

The target object for your data flow can be either a physical table or file, or a template table.
After completing this unit, you will be able to:
• Set target table options
• Use template tables
Setting target table options

When your target object is a physical table in a database, the target table editor opens in the
workspace with different tabs where you can set database type properties, table loading options,
and tuning techniques for loading a job.
Note: Most of the tabs in the target table editor focus on migration or performance-tuning techniques,
which are outside the scope of this course.
You can set the following table loading options in the Options tab of the target table editor:
• Rows per commit: Specifies the transaction size in number of rows.
• Column comparison: Specifies how the input columns are mapped to output columns. There are two options: Compare_by_position disregards the column names and maps source columns to target columns by position; Compare_by_name maps source columns to target columns by name. Validation errors occur if the datatypes of the columns do not match.
• Delete data from table before loading: Sends a TRUNCATE statement to clear the contents of the table before loading during batch jobs. Defaults to not selected.
• Number of loaders: Specifies the number of loaders (to a maximum of five) and the number of rows per commit that each loader receives during parallel loading. For example, if you choose a Rows per commit of 1000 and set the number of loaders to three, the first 1000 rows are sent to the first loader, the second 1000 rows to the second loader, the third 1000 rows to the third loader, and the next 1000 rows back to the first loader.
• Use overflow file: Writes rows that cannot be loaded to the overflow file for recovery purposes. Options are enabled for the file name and file format. The overflow format can include the data rejected and the operation being performed (write_data) or the SQL command used to produce the rejected operation (write_sql).
• Ignore columns with value: Specifies a value that might appear in a source column that you do not want updated in the target table. When this value appears in the source column, the corresponding target column is not updated during auto correct loading. You can enter spaces.
• Ignore columns with null: Ensures that NULL source columns are not updated in the target table during auto correct loading.
• Use input keys: Enables Data Integrator to use the primary keys from the source table. By default, Data Integrator uses the primary key of the target table.
• Update key columns: Updates key column values when it loads data to the target.
• Auto correct load: Ensures that the same row is not duplicated in a target table, which is particularly useful for data recovery operations. When Auto correct load is selected, Data Integrator reads a row from the source and checks if a row exists in the target table with the same values in the primary key. If a matching row does not exist, it inserts the new row regardless of other options. If a matching row exists, it updates the row depending on the values of Ignore columns with value and Ignore columns with null.
• Include in transaction: Indicates that this target is included in the transaction processed by a batch or real-time job. This option allows you to commit data to multiple tables as part of the same transaction. If loading fails for any one of the tables, no data is committed to any of the tables. Transactional loading can require rows to be buffered to ensure the correct load order; if the data being buffered is larger than the virtual memory available, Data Integrator reports a memory error. The tables must be from the same datastore. If you choose to enable transactional loading, these options are not available: Rows per commit, Use overflow file and the overflow file specification, Number of loaders, Enable partitioning, and Delete data from table before loading. Data Integrator also does not parameterize SQL or push operations to the database if transactional loading is enabled.
• Transaction order: Indicates where this table falls in the loading order of the tables being loaded. By default, there is no ordering; all loaders have a transaction order of zero. If you specify orders among the tables, the loading operations are applied according to the order. Tables with the same transaction order are loaded together. Tables with a transaction order of zero are loaded at the discretion of the data flow process.
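The Auto correct load rules above amount to a conditional upsert. A simplified sketch, with the target table modeled as a dict keyed by primary key and the two Ignore options reduced to function arguments (all names here are illustrative, not Data Services APIs):

```python
def auto_correct_load(target, row, key, ignore_value=None):
    """Upsert one source row into target (a dict keyed by primary key),
    mimicking the Auto correct load rules described above (simplified)."""
    pk = row[key]
    if pk not in target:
        target[pk] = dict(row)          # no match: insert regardless of other options
        return
    for col, val in row.items():
        if val is None:                 # Ignore columns with null
            continue
        if ignore_value is not None and val == ignore_value:
            continue                    # Ignore columns with value
        target[pk][col] = val           # otherwise update the matching row

target = {1: {"id": 1, "city": "Boston", "phone": "555-0100"}}
auto_correct_load(target, {"id": 1, "city": None, "phone": "555-0199"}, key="id")
print(target[1])  # phone updated, NULL city left untouched
```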
See the Data Services Performance Optimization Guide and "Description of objects" in the Data
Services Reference Guide for more information.

To access the target table editor
1. In a data flow, double-click the target table.
The target table editor opens in the workspace.
2. Change the values as required.
Changes are automatically committed.
3. Click Back to return to the data flow.
Using template tables

During the initial design of an application, you might find it convenient to use template tables
to represent database tables. Template tables are particularly useful in early application
development when you are designing and testing a project.
With template tables, you do not have to initially create a new table in your RDBMS and import
the metadata into Data Services. Instead, Data Services automatically creates the table in the
database with the schema defined by the data flow when you execute a job.
After creating a template table as a target in one data flow, you can use it as a source in other
data flows. Although a template table can be used as a source table in multiple data flows, it
can be used only as a target in one data flow.
You can modify the schema of the template table in the data flow where the table is used as a
target. Any changes are automatically applied to any other instances of the template table.
After a template table is created in the database, you can convert the template table in the
repository to a regular table. You must convert template tables so that you can use the new
table in expressions, functions, and transform options. After a template table is converted, you
can no longer alter the schema.
To create a template table

1. Open a data flow in the workspace.
2. In the tool palette, click the Template Table icon and click the workspace to add a new
template table to the data flow.
The Create Template dialog box displays.
3. In the Table name field, enter the name for the template table.
4. In the In datastore drop-down list, select the datastore for the template table.
5. Click OK.
You also can create a new template table in the Local Object Library Datastore tab by
expanding a datastore and right-clicking Templates.
To convert a template table into a regular table from the Local Object Library
1. On the Datastores tab of the Local Object Library, expand the branch for the datastore to
view the template table.
2. Right-click a template table you want to convert and select Import Table from the menu.
Data Services converts the template table in the repository into a regular table by importing
it from the database.
3. To update the icon in all data flows, from the View menu, select Refresh.
On the Datastores tab of the Local Object Library, the table is listed under Tables rather than
Template Tables.

To convert a template table into a regular table from a data flow
1. Open the data flow containing the template table.
2. Right-click the template table you want to convert and select Import Table from the menu.
(Screenshot: the right-click menu for a template table in a data flow, showing the Import Table command.)
Executing the job
Introduction

Once you have created a data flow, you can execute the job in Data Services to see how the
data moves from source to target.
After completing this unit, you will be able to:
• Understand job execution
• Execute the job
Explaining job execution

After you create your project, jobs, and associated data flows, you can then execute the job.
You can run jobs two ways:
• Immediate jobs
Data Services initiates both batch and real-time jobs and runs them immediately from within
the Designer. For these jobs, both the Designer and designated Job Server (where the job
executes, usually on the same machine) must be running. You will likely run immediate
jobs only during the development cycle.
• Scheduled jobs
Batch jobs can be scheduled using the Data Services Management Console or a third-party
scheduler. The Job Server must be running.
If a job has syntax errors, it does not execute.
Setting execution properties

When you execute a job, the following options are available in the Execution Properties window:
Option Description
Print all trace messages Records all trace messages in the log.
Disable data validation statistics collection: Does not collect data validation statistics for this
specific job execution.
Enable auditing Collects audit statistics for this specific job execution.
Enable recovery
Enables the automatic recovery feature. When enabled, Data
Services saves the results from completed steps and allows
you to resume failed jobs.
Recover from last failed execution: Resumes a failed job. Data Services retrieves the results
from any steps that were previously executed successfully and re-executes any other steps.
This option is a run-time property. It is not available when a job has not yet been executed
or when recovery mode was disabled during the previous run.
Collect statistics for optimization Collects statistics that the Data Services optimizer will use
to choose an optimal cache type (in-memory or pageable).
Collect statistics for monitoring Displays cache statistics in the Performance Monitor in
Administrator.
Use collected statistics Optimizes Data Services to use the cache statistics collected
on a previous execution of the job.
System configuration: Specifies the system configuration to use when executing this job. A
system configuration defines a set of datastore configurations, which define the datastore
connections. If a system configuration is not specified, Data Services uses the default datastore
configuration for each datastore. This option is a run-time property that is only available if
there are system configurations defined in the repository.
Job Server or Server Group: Specifies the Job Server or server group to execute this job.
Distribution level
Allows a job to be distributed to multiple Job Servers for
processing. The options are:
• Job - The entire job will execute on one server.
• Data flow - Each data flow within the job will execute
on a separate server.
• Sub-data flow - Each sub-data flow (which can be a separate
transform or function) within a data flow will execute
on a separate Job Server.
Executing the job

Immediate or on-demand tasks are initiated from the Designer. Both the Designer and Job
Server must be running for the job to execute.
To execute a job as an immediate task

1. In the project area, right-click the job name and select Execute from the menu.
Data Services prompts you to save any objects that have not been saved.
2. Click OK.
The Execution Properties dialog box displays.
3. Select the required job execution parameters.
4. Click OK.
Activity: Creating a basic data flow

After analyzing the source data, you have determined that the structure of the customer data
for Beta Businesses is the appropriate structure for the customer data in the Omega data
warehouse. You must therefore change the structure of the Alpha Acquisitions customer data
to match, in preparation for merging customer data from both datastores at a later date.
Objective

• Use the Query transform to change the schema of the Alpha Acquisitions Customer table
and move the data into the Delta staging database.
Instructions

1. Create a new project called Omega.
2. In the Omega project, create a new batch job called Alpha_Customers_Job with a new data
flow called Alpha_Customers_DF.
3. In the workspace for Alpha_Customers_DF, add the customer table from the Alpha datastore
as the source object.
4. Create a new template table called alpha_customers in the Delta datastore as the target
object.
5. Add the Query transform to the workspace between the source and target.
6. Connect the objects from source to transform to target.
7. In the transform editor for the Query transform, create the following output columns:
Name Data type Content type
CustomerID int
Firm varchar(50) Firm
ContactName varchar(50) Name
Title varchar(30) Title
Address1 varchar(50) Address
City varchar(50) Locality
Region varchar(25) Region
PostalCode varchar(25) Postcode
Country varchar(50) Country
Phone varchar(25) Phone
Fax varchar(25) Phone
8. Map the columns as follows:
Schema In Schema Out
CUSTOMERID CustomerID
COMPANYNAME Firm
CONTACTNAME ContactName
CONTACTTITLE Title
ADDRESS Address1
CITY City
REGIONID Region
POSTALCODE PostalCode
COUNTRYID Country
PHONE Phone
FAX Fax

9. Set the CustomerID column as the Primary Key.
10. Execute Alpha_Customers_Job with the default execution properties and save all objects
you have created.
11. Return to the data flow workspace and view data for the target table to confirm that 25
records were loaded.
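The renames in steps 7 and 8 amount to applying a Schema In to Schema Out dictionary. A sketch for checking your mapping outside the Designer (only a subset of the columns is shown, and the sample row is invented):

```python
# Schema In -> Schema Out mapping from step 8 (subset shown for brevity).
RENAME = {
    "CUSTOMERID": "CustomerID",
    "COMPANYNAME": "Firm",
    "CONTACTNAME": "ContactName",
    "POSTALCODE": "PostalCode",
}

def remap(row):
    """Rename input columns to their output-schema names."""
    return {RENAME[k]: v for k, v in row.items() if k in RENAME}

src = {"CUSTOMERID": 7, "COMPANYNAME": "Alpha", "CONTACTNAME": "A. Chan", "POSTALCODE": "02134"}
print(remap(src))
```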
A solution file called SOLUTION_Basic.atl is included on your resource CD. To check the
solution, import the file and open it to view the data flow design and mapping logic. Do not
execute the solution job, as this may overwrite the results in your target table.
Quiz: Creating batch jobs

1. Does a job have to be part of a project to be executed in the Designer?
2. How do you add a new template table?
3. Name the objects contained within a project.
4. What factors might you consider when determining whether to run work flows or data
flows serially or in parallel?
Lesson summary

After completing this lesson, you are now able to:
• Work with objects
• Create a data flow
• Use the Query transform
• Use target tables
• Execute the job
Troubleshooting Batch Jobs—Learner’s Guide 101
Lesson 4
Troubleshooting Batch Jobs
Lesson introduction

To document decisions and troubleshoot any issues that arise when executing your jobs, you
can validate and add annotations to jobs, work flows, and data flows, set trace options, and
debug your jobs. You can also set up audit rules to ensure the correct data is loaded to the
target.
After completing this lesson, you will be able to:
• Use descriptions and annotations
• Validate and trace jobs
• Use View Data and the Interactive Debugger
• Use auditing in data flows
Using descriptions and annotations
Introduction

Descriptions and annotations are a convenient way to add comments to objects and workspace
diagrams.
After completing this unit, you will be able to:
• Use descriptions with objects
• Use annotations to describe flows
Using descriptions with objects

A description is associated with a particular object. When you import or export a repository
object, you also import or export its description.
Designer determines when to show object descriptions based on a system-level setting and an
object-level setting. Both settings must be activated to view the description for a particular
object.
Note: The system-level setting is unique to your setup.
There are three requirements for displaying descriptions:
• A description has been entered into the properties of the object.
• The description is enabled on the properties of that object.
• The global View Enabled Object Descriptions option is enabled.

To show object descriptions at the system level
• From the View menu, select Enabled Descriptions.
This is a global setting.
To add a description to an object
1. In the project area or the workspace, right-click an object and select Properties from the
menu.
The Properties dialog box displays.
2. In the Description text box, enter your comments.
3. Click OK.
If you are modifying the description of a re-usable object, Data Services provides a warning
message that all instances of the re-usable object will be affected by the change.
4. Click Yes.
The description for the object displays in the Local Object Library.
To display a description in the workspace

• In the workspace, right-click the object and select Enable Object
Description from the menu.
The description displays in the workspace under the object.
Using annotations to describe objects

An annotation is an object in the workspace that describes a flow, part of a flow, or a diagram.
An annotation is associated with the object where it appears. When you import or export a job,
work flow, or data flow that includes annotations, you also import or export associated
annotations.
To add an annotation to the workspace

1. In the workspace, from the tool palette, click the Annotation icon and then click the
workspace.
An annotation appears on the diagram.
2. Double-click the annotation.
3. Add text to the annotation.
4. Click the cursor outside of the annotation to commit the changes.
You can resize and move the annotation by clicking and dragging.
You cannot hide annotations that you have added to the workspace. However, you can
move them out of the way or delete them.
Validating and tracing jobs
Introduction

It is a good idea to validate your jobs when you are ready for job execution to ensure there are
no errors. You can also select and set specific trace properties, which allow you to use the
various log files to help you read job execution status or troubleshoot job errors.
After completing this unit, you will be able to:
• Validate jobs
• Trace jobs
• Use log files
• Determine the success of a job
Validating jobs

As a best practice, you want to validate your work as you build objects so that you are not
confronted with too many warnings and errors at one time. You can validate your objects as
you create a job or you can automatically validate all your jobs before executing them.
To validate jobs automatically before job execution

1. From the Tools menu, select Options.
The Options dialog box displays.
2. In the Category pane, expand the Designer branch and click General.
3. Select the Perform complete validation before job execution option.
4. Click OK.
To validate objects on demand
1. From the Validation menu, select Validate ➤ Current View or All Objects in View.
The Output dialog box displays.
2. To navigate to the object where an error occurred, right-click the validation error message
and select Go To Error from the menu.
Tracing jobs

Use trace properties to select the information that Data Services monitors and writes to the
trace log file during a job. Data Services writes trace messages to the trace log associated with
the current Job Server and writes error messages to the error log associated with the current
Job Server.
The following trace options are available.
Trace Description
Row Writes a message when a transform imports or exports a row.
Session Writes a message when the job description is read from the
repository, when the job is optimized, and when the job runs.
Work flow
Writes a message when the work flow description is read from
the repository, when the work flow is optimized, when the work
flow runs, and when the work flow ends.
Data flow Writes a message when the data flow starts and when the data
flow successfully finishes or terminates due to error.
Transform Writes a message when a transform starts and completes or
terminates.
Custom Transform Writes a message when a custom transform starts and completes
successfully.
Custom Function Writes a message of all user invocations of the AE_LogMessage
function from custom C code.
SQL Functions Writes data retrieved before SQL functions:
• Every row retrieved by the named query before the SQL is
submitted in the key_generation function.
• Every row retrieved by the named query before the SQL is
submitted in the lookup function (but only if
PRE_LOAD_CACHE is not specified).
• When mail is sent using the mail_to function.
SQL Transforms Writes a message (using the Table Comparison transform) about
whether a row exists in the target table that corresponds to an
input row from the source table.
SQL Readers Writes the SQL query block that a script, query transform, or
SQL function submits to the system. Also writes the SQL results.
SQL Loaders Writes a message when the bulk loader starts, submits a warning
message, or completes successfully or unsuccessfully.
Memory Source Writes a message for every row retrieved from the memory
table.
Memory Target Writes a message for every row inserted into the memory table.
Optimized Data Flow For Business Objects consulting and technical support use.
Tables Writes a message when a table is created or dropped.
Scripts and Script Functions Writes a message when a script is called, a function is called by
a script, and a script successfully completes.
Trace Parallel Execution Writes messages describing how data in a data flow is parallel
processed.
Access Server Communication Writes messages exchanged between the Access Server and a
service provider.
Stored Procedure Writes a message when a stored procedure starts and finishes,
and includes key values.
Audit Data Writes a message that collects a statistic at an audit point and
determines if an audit rule passes or fails.

To set trace options
1. From the project area, right-click the job name and do one of the following:
• To set trace options for a single instance of the job, select Execute from the menu.
• To set trace options for every execution of the job, select Properties from the menu.
Save all files.
Depending on which option you selected, the Execution Properties dialog box or the Properties dialog box displays.
2. Click the Trace tab.
3. Under the name column, click a trace object name.
The Value drop-down list is enabled when you click a trace object name.
4. From the Value drop-down list, select Yes to turn the trace on.
5. Click OK.
Using log files

As a job executes, Data Services produces three log files. You can view these from the project
area. The log files are, by default, also set to display automatically in the workspace when you
execute a job.
You can click the Trace, Monitor, and Error icons to view the following log files, which are
created during job execution.
Examining trace logs

Use the trace logs to determine where an execution failed, whether the execution steps occur
in the order you expect, and which parts of the execution are the most time consuming.
Examining monitor logs

Use monitor logs to quantify the activities of the components of the job. The monitor log lists
the time spent in a given component of a job and the number of data rows that streamed
through the component.
Examining error logs

Use the error logs to determine how an execution failed. If the execution completed without
error, the error log is blank.
Using the Monitor tab

The Monitor tab lists the trace logs of all current or most recent executions of a job.
The traffic-light icons in the Monitor tab indicate the following:
• Green light indicates that the job is running.
You can right-click and select Kill Job to stop a job that is still running.
• Red light indicates that the job has stopped.
You can right-click and select Properties to add a description for a specific trace log. This
description is saved with the log which can be accessed later from the Log tab.
• Red cross indicates that the job encountered an error.
Using the Log tab

You can also select the Log tab to view a job’s log history.
You may find these job log indicators (the indicator icons are not reproduced here):
• The job executed successfully on this explicitly selected Job Server.
• The job encountered an error on this explicitly selected Job Server.
• The job executed successfully by a server group; the Job Server listed executed the job.
• The job encountered an error while being executed by a server group; the Job Server listed executed the job.
To view log files from the project area

1. In the project area, click the Log tab.
2. Select the job for which you want to view the logs.
3. In the workspace, in the Filter drop-down list, select the type of log you want to view.
4. In the list of logs, double-click the log to view details.
5. To copy log content from an open log, select one or more lines and use the key commands
[Ctrl+C].
Determining the success of the job

The best measure of the success of a job is the state of the target data. Always examine your
data to make sure the data movement operation produced the results you expect. Be sure that:
• Data was not converted to incompatible types or truncated.
• Data was not duplicated in the target.
• Data was not lost between updates of the target.
• Generated keys have been properly incremented.
• Updated values were handled properly.
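Several of these checks can be expressed as simple assertions against the loaded rows. A minimal sketch with invented sample data (a real check would query the target table):

```python
def check_target(rows, key):
    """Sanity checks on loaded target rows, per the list above (simplified)."""
    keys = [r[key] for r in rows]
    # Data was not duplicated in the target.
    assert len(keys) == len(set(keys)), "data duplicated in the target"
    # No keys were lost or nulled out between updates.
    assert all(v is not None for v in keys), "lost or null keys"
    # Generated keys have been properly incremented (strictly increasing here).
    assert keys == sorted(keys), "generated keys not incremented in order"
    return True

target_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
print(check_target(target_rows, "id"))  # True
```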
If a job fails to execute:
1. Check the Job Server icon in the status bar.
2. Verify that the Job Service is running.
3. Check that the port number in Designer matches the number specified in Server Manager.
4. Use the Server Manager resync button to reset the port number in the Local Object Library.
Activity: Setting traces and adding annotations

You will be sharing your jobs with other developers during the project, so you want to make
sure that you identify the purpose of the job you just created. You also want to ensure that the
job is handling the movement of each row appropriately.
Objectives

• Add an annotation to a job so that other designers who reference this information will be
able to identify its purpose.
• Execute the job in trace mode to determine when a transform imports and exports from
source to target.
Instructions

1. Open the workspace for Alpha_Customers_Job.
2. Add an annotation to the workspace beside the data flow with an explanation of the purpose
of the job.
3. Save all objects you have created.
4. Execute Alpha_Customers_Job and enable the Trace rows option on the Trace tab of the
Execution Properties dialog box.
Note that tracing rows produces an entry in the log for each row, indicating how the data
flow handles it.
Troubleshooting Batch Jobs—Learner’s Guide 113
Using View Data and the Interactive Debugger
Introduction
You can debug jobs in Data Services using the View Data and Interactive Debugger features.
With View Data, you can view samples of source and target data for your jobs. Using the
Interactive Debugger, you can examine what happens to the data after each transform or object
in the flow.
After completing this unit, you will be able to:
• Use View Data with sources and targets
• Use the Interactive Debugger
• Set filters and breakpoints for a debug session
Using View Data with sources and targets
With the View Data feature, you can check the status of data at any point after you import the
metadata for a data source, and before or after you process your data flows. You can check the
data when you design and test jobs to ensure that your design returns the results you expect.
View Data allows you to see source data before you execute a job. Using data details you can:
• Create higher quality job designs.
• Scan and analyze imported table and file data from the Local Object Library.
• See the data for those same objects within existing jobs.
• Refer back to the source data after you execute the job.
View Data also allows you to check your target data before executing your job, then look at
the changed data after the job executes. In a data flow, you can use one or more View Data
panels to compare data between transforms and within source and target objects.
View Data displays your data in the rows and columns of a data grid. The path for the selected
object displays at the top of the pane. The number of rows displayed is determined by a
combination of several conditions:
• Sample size: the number of rows sampled in memory. The default sample size is 1000 rows
for imported sources, targets, and transforms.
• Filtering: the filtering options that are selected.
• Sorting: the sort options that are selected.
If your original data set is smaller, or if you use filters, the number of returned rows could be
less than the default.
Keep in mind that you can have only two View Data windows open at any time. If you already
have two windows open and try to open a third, you are prompted to select which to close.
To use View Data in source and target tables
• On the Datastore tab of the Local Object Library, right-click a table and select View Data
from the menu.
The View Data dialog box displays.
To open a View Data pane in a data flow workspace
1. In the data flow workspace, click the magnifying glass button on a data flow object.
A large View Data pane appears beneath the current workspace area.
2. To compare data, click the magnifying glass button for another object.
A second pane appears below the workspace area, and the first pane area shrinks to
accommodate it.
When both panes are filled and you click another View Data button, a small menu appears
containing window placement icons. The black area in each icon indicates the pane you
want to replace with a new set of data. When you select a menu option, the data from the
latest selected object replaces the data in the corresponding pane.
Using the Interactive Debugger
Designer includes an Interactive Debugger that allows you to troubleshoot your jobs by placing
filters and breakpoints on lines in a data flow diagram. This enables you to examine and modify
data row by row during a debug mode job execution.
The Interactive Debugger can also be used without filters and breakpoints. Running the job in
debug mode and then navigating to the data flow while remaining in debug mode enables you
to drill into each step of the data flow and view the data.
When you execute a job in debug mode, Designer displays several additional windows that
make up the Interactive Debugger: Call stack, Trace, Variables, and View Data panes.
The left View Data pane shows the data in the source table, and the right pane shows the rows
that have been passed to the query up to the breakpoint.
To start the Interactive Debugger
1. In the project area, right-click the job and select Start debug from the menu.
The Debug Properties dialog box displays.
2. Set properties for the execution.
You can specify many of the same properties as you can when executing a job without
debugging. In addition, you can specify the number of rows to sample in the Data sample
rate field.
3. Click OK.
The debug mode begins.
While in debug mode, all other Designer features are set to read-only. A Debug icon is visible
in the task bar while the debug is in progress.
4. If you have set breakpoints, in the Interactive Debugger toolbar, click Get next row to move
to the next breakpoint.
5. To exit the debug mode, from the Debug menu, select Stop Debug.
Setting filters and breakpoints for a debug session
You can set filters and breakpoints on lines in a data flow diagram before you start a debugging
session that allow you to examine and modify data row-by-row during a debug mode job
execution.
A debug filter functions the same as a simple Query transform with a WHERE clause. You can
use a filter if you want to reduce a data set in a debug job execution. The debug filter does not
support complex expressions.
A breakpoint is the location where a debug job execution pauses and returns control to you.
A breakpoint can be based on a condition, or it can be set to break after a specific number of
rows.
You can place a filter or breakpoint on the line between a source and a transform or two
transforms. If you set a filter and a breakpoint on the same line, Data Services applies the filter
first, which means that the breakpoint applies to the filtered rows only.
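That ordering can be illustrated with a small Python simulation (not the debugger's actual implementation): the breakpoint's row count advances only for rows that pass the filter.

```python
def debug_rows(rows, filter_cond, break_after):
    """Simulate a debug line with a filter and a break-after-N breakpoint."""
    passed = 0
    for row in rows:
        if not filter_cond(row):   # the filter drops the row first...
            continue
        passed += 1
        if passed == break_after:  # ...so the breakpoint sees filtered rows only
            return row, passed
    return None, passed

rows = [{"CountryID": 1}, {"CountryID": 2}, {"CountryID": 1}]
row, n = debug_rows(rows, lambda r: r["CountryID"] == 1, break_after=2)
print(row, n)  # -> {'CountryID': 1} 2
```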
To set filters and breakpoints
1. In the data flow workspace, right-click the line that connects two objects and select Set
Filter/Breakpoint from the menu.
2. In the Breakpoint window, in the Filter or Breakpoint section, select the Set check box.
3. In the Column drop-down list, select the column to which the filter or breakpoint applies.
4. In the Operator drop-down list, select the operator for the expression.
5. In the Value field, enter the value to complete the expression.
Conditions for filters and breakpoints do not use delimiters for strings.
6. If you are using multiple conditions, repeat step 3 to step 5 for all conditions and select the
appropriate operator from the Concatenate all conditions using drop-down list.
7. Click OK.
Activity: Using the Interactive Debugger
To ensure that your job is processing the data correctly, you want to run the job in debug mode.
To minimize the data you have to review in the Interactive Debugger, you will set the debug
process to show only records from the USA (represented by a CountryID value of 1). Once you
have confirmed that the structure appears correct, you will run another debug session with all
records, breaking after every row.
Objectives
• View the data in debug mode with a filter to limit records to those with a CountryID of 1
(USA).
• View the data in debug mode with a breakpoint to stop the debug process after each row.
Instructions
1. In the workspace for Alpha_Customers_DF, add a filter between the source and the Query
transform to filter the records so that only customers from the USA are included in the
debug session.
2. Execute Alpha_Customers_DF in debug mode.
3. Return to the data flow workspace and view data for the target table.
Note that only five rows were returned.
4. Remove the filter and add a breakpoint to break the debug session after every row.
5. Execute Alpha_Customers_DF in debug mode again.
6. Discard the first row, and then step through the rest of the records.
7. Exit the debugger, return to the data flow workspace, and view data for the target table.
Note that only 24 of the 25 rows were returned.
Setting up auditing
Introduction
You can collect audit statistics on the data that flows out of any Data Services object, such as a
source, transform, or target. If a transform has multiple outputs (such as the Validation or
Case transforms), you can audit each output independently.
After completing this unit, you will be able to:
• Define audit points and rules
• Explain guidelines for choosing audit points
Setting up auditing
When you audit data flows, you:
1. Define audit points to collect run-time statistics about the data that flows out of objects.
These audit statistics are stored in the Data Services repository.
2. Define rules with these audit statistics to ensure that the data extracted from sources,
processed by transforms, and loaded into targets is what you expect.
3. Generate a run-time notification that includes the audit rule that failed and the values of
the audit statistics at the time of failure.
4. Display the audit statistics after the job execution to help identify the object in the data flow
that might have produced incorrect data.
Defining audit points
An audit point represents the object in a data flow where you collect statistics. You can audit
a source, a transform, or a target in a data flow.
When you define audit points on objects in a data flow, you specify an audit function. An audit
function represents the audit statistic that Data Services collects for a table, output schema, or
column. You can choose from these audit functions:
• Count (table or output schema): collects two statistics, a good count for rows that were
successfully processed and an error count for rows that generated some type of error if you
enabled error handling. The datatype for this function is integer.
• Sum (column): the sum of the numeric values in the column. This function includes only
the good rows, and applies only to columns with a datatype of integer, decimal, double,
or real.
• Average (column): the average of the numeric values in the column. This function includes
only the good rows, and applies only to columns with a datatype of integer, decimal,
double, or real.
• Checksum (column): detects errors in the values in the column by using a checksum value.
This function applies only to columns with a datatype of varchar.
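The audit statistics above can be sketched in Python. This is only an illustration of what gets collected, not how Data Services computes it; in particular, the product's checksum algorithm is internal, so zlib.crc32 stands in for it here.

```python
import zlib

def audit_stats(good_rows, error_rows, column):
    """Illustrative audit statistics for one audit point (not the real engine)."""
    values = [row[column] for row in good_rows]
    return {
        "Count": len(good_rows),                # good rows that were processed
        "CountError": len(error_rows),          # rows that generated errors
        "Sum": sum(values),                     # good rows only, numeric columns
        "Average": sum(values) / len(values),   # good rows only, numeric columns
        "Checksum": zlib.crc32(repr(values).encode()),  # stand-in checksum
    }

good = [{"AMOUNT": 10}, {"AMOUNT": 30}]
stats = audit_stats(good, error_rows=[], column="AMOUNT")
print(stats["Count"], stats["Sum"], stats["Average"])  # -> 2 40 20.0
```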
Defining audit labels
An audit label represents the unique name in the data flow that Data Services generates for
the audit statistics collected for each audit function that you define. You use these labels to
define audit rules for the data flow.
If the audit point is on a table or output schema, these two labels are generated for the Count
audit function:
$Count_objectname
$CountError_objectname
If the audit point is on a column, the audit label is generated with this format:
$auditfunction_objectname
Note: An audit label can become invalid if you delete or rename an object that had an audit point
defined on it. Invalid labels are listed as a separate node on the Labels tab. To resolve the issue, you
must re-create the labels and then delete the invalid items.
Defining audit rules
Use auditing rules if you want to compare audit statistics for one object against another object.
For example, you can use an audit rule if you want to verify that the count of rows from the
source table is equal to the rows in the target table.
An audit rule is a Boolean expression which consists of a left-hand-side (LHS), a Boolean
operator, and a right-hand-side (RHS):
• The LHS can be a single audit label, multiple audit labels that form an expression with one
or more mathematical operators, or a function with audit labels as parameters.
• The RHS can be a single audit label, multiple audit labels that form an expression with one
or more mathematical operators, a function with audit labels as parameters, or a constant.
These are examples of audit rules:
$Count_CUSTOMER = $Count_CUSTDW
$Sum_ORDER_US + $Sum_ORDER_EUROPE = $Sum_ORDER_DW
round($Avg_ORDER_TOTAL) >= 10000
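A rule of this LHS/operator/RHS shape can be sketched as a tiny evaluator in Python (the label values below are hypothetical; Data Services evaluates rules internally against the repository statistics):

```python
# Hypothetical audit statistics, keyed by audit label.
labels = {"$Count_CUSTOMER": 25, "$Count_CUSTDW": 25, "$Avg_ORDER_TOTAL": 12500.0}

def check_rule(lhs, op, rhs):
    """Resolve audit labels to values, then apply the Boolean operator."""
    left = labels.get(lhs, lhs)    # an unresolved operand is treated as a constant
    right = labels.get(rhs, rhs)
    if op == "=":
        return left == right
    if op == ">=":
        return left >= right
    raise ValueError("unsupported operator: " + op)

print(check_rule("$Count_CUSTOMER", "=", "$Count_CUSTDW"))  # -> True
print(check_rule("$Avg_ORDER_TOTAL", ">=", 10000))          # -> True
```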
Defining audit actions
You can choose any combination of the actions listed for notification of an audit failure:
• Email to list: Data Services sends a notification of which audit rule failed to the email
addresses that you list in this option. Use a comma to separate the list of email addresses.
You can specify a variable for the email list.
This option uses the smtp_to function to send email. Therefore, you must define the server
and sender for the Simple Mail Transfer Protocol (SMTP) in the Data Services Server Manager.
• Script: Data Services executes the custom script that you create in this option.
• Raise exception: When an audit rule fails, the Error Log shows the rule that failed. The job
stops at the first audit rule that fails. This is an example of a message in the Error Log:
Audit rule failed <($Checksum_ODS_CUSTOMER = $Count_CUST_DIM)> for <Data Flow
Demo_DF>.
This action is the default. If you clear this action and an audit rule fails, the job completes
successfully and the audit does not write messages to the job log.
If you choose all three actions, Data Services executes them in the order presented.
You can see the audit status in one of these places:
Where you can view the audit information depends on the action on failure:
• Raise an exception: job Error Log, Metadata Reports
• Email to list: email message, Metadata Reports
• Script: wherever the custom script sends the audit messages, Metadata Reports
To define audit points and rules in a data flow
1. On the Data Flow tab of the Local Object Library, right-click a data flow and select Audit
from the menu.
The Audit dialog box displays a list of the objects you can audit, along with any audit
functions and labels defined for those objects.
2. On the Label tab, right-click the object you want to audit and select Properties from the
menu.
The Schema Properties dialog box displays.
3. In the Audit tab of the Schema Properties dialog box, in the Audit function drop-down
list, select the audit function you want to use against this data object type.
The audit functions displayed in the drop-down menu depend on the data object type that
you have selected.
Default values are assigned to the audit labels; you can change them if required.
4. Click OK.
5. Repeat step 2 to step 4 for all audit points.
6. On the Rule tab, under Auditing Rules, click Add.
The expression editor activates and the Custom options become available for use. The
expression editor contains three drop-down lists where you specify the audit labels for the
objects you want to audit and choose the Boolean expression to use between these labels.
7. In the left-hand-side drop-down list in the expression editor, select the audit label for the
object you want to audit.
8. In the operator drop-down list in the expression editor, select a Boolean operator.
9. In the right-hand-side drop-down list in the expression editor, select the audit label for the
second object you want to audit.
If you want to compare audit statistics for one or more objects against statistics for multiple
other objects or a constant, select the Custom radio button, and click the ellipsis button
beside Functions. This opens up the full-size smart editor where you can drag different
functions and labels to use for auditing.
10. Repeat step 7 to step 9 for all audit rules.
11. Under Action on Failure, select the action you want.
12. Click Close.
To trace audit data
1. In the project area, right-click the job and select Execute from the menu.
2. In the Execution Properties window, click the Trace tab.
3. Select Trace Audit Data.
4. In the Value drop-down list, select Yes.
5. Click OK.
The job executes and the job log displays the Audit messages based on the audit function
that is used for the audit object.
Choosing audit points
When you choose audit points, consider the following:
• The Data Services optimizer cannot push down operations after the audit point. Therefore,
if the performance of a query that is pushed to the database server is more important than
gathering audit statistics from the source, define the first audit point on the query or later
in the data flow.
For example, suppose your data flow has a source, a Query transform, and a target, and the
Query has a WHERE clause that is pushed to the database server that significantly reduces
the amount of data that returns to Data Services. Define the first audit point on the Query,
rather than on the source, to obtain audit statistics on the results.
• If a pushdown_sql function is after an audit point, Data Services cannot execute it.
• The auditing feature is disabled when you run a job with the debugger.
• If you use the CHECKSUM audit function in a job that normally executes in parallel, Data
Services disables the Degrees of Parallelism (DOP) for the whole data flow. The order of
rows is important for the result of CHECKSUM, and DOP processes the rows in a different
order than in the source. For more information on DOP, see “Using Parallel Execution” and
“Maximizing the number of push-down operations” in the Data Services Performance
Optimization Guide.
Activity: Using auditing in a data flow
Using the audit logs, you must ensure that all records from the Customer table in the Alpha
database are being moved to the Delta staging database.
Objectives
• Add audit points to the source and target tables.
• Create an audit rule to ensure that the count of both tables is the same.
• Execute the job with auditing enabled.
Instructions
1. In the Local Object Library, set up auditing for Alpha_Customers_DF by adding an audit
point to count the total number of records in the source table.
2. Add another audit point to count the total number of records in the target table.
3. Construct an audit rule that states that, if the count from both tables is not the same, the
audit must raise an exception in the log.
4. Execute Alpha_Customers_Job. Ensure that the Enable auditing option is selected on the
Parameters tab of the Execution Properties dialog box, and that the Trace Audit Data option
is enabled on the Trace tab.
Note that the audit rule passes validation.
A solution file called SOLUTION_Audit.atl is included on your resource CD. To check the
solution, import the file and open it to view the data flow design and mapping logic. Do not
execute the solution job, as this may override the results in your target table.
Quiz: Troubleshooting batch jobs
1. List some reasons why a job might fail to execute.
2. Explain the View Data feature.
3. What must you define in order to audit a data flow?
4. True or false? The auditing feature is disabled when you run a job with the debugger.
Lesson summary
After completing this lesson, you are now able to:
• Use descriptions and annotations
• Validate and trace jobs
• Use View Data and the Interactive Debugger
• Use auditing in data flows
Lesson 5
Using Functions, Scripts, and Variables
Lesson introduction
Data Services gives you the ability to perform complex operations using functions and to extend
the flexibility and re-usability of objects by writing scripts, custom functions, and expressions
using Data Services scripting language and variables.
After completing this lesson, you will be able to:
• Define built-in functions
• Use functions in expressions
• Use the lookup function
• Use the decode function
• Use variables and parameters
• Use Data Services scripting language
• Script a custom function
Defining built-in functions
Introduction
Data Services supports built-in and custom functions.
After completing this unit, you will be able to:
• Define functions
• List the types of operations available for functions
• Describe other types of functions
Defining functions
Functions take input values and produce a return value, operating on the individual values
passed to them. Input values can be parameters passed into a data flow, values from a
column of data, or variables defined inside a script.
You can use functions in expressions that include scripts and conditional statements.
Note: Data Services does not support functions that include tables as input or output parameters,
except functions imported from SAP R/3.
Listing the types of operations for functions
Functions are grouped into these categories:
• Aggregate functions perform calculations on numeric values: avg, count, count_distinct,
max, min, sum.
• Conversion functions convert values to specific datatypes: cast, interval_to_char,
julian_to_date, load_to_xml, long_to_varchar, num_to_interval, to_char, to_date,
to_decimal, to_decimal_ext, varchar_to_long.
• Custom functions perform operations defined by the user.
• Database functions perform operations specific to databases: key_generation, sql,
total_rows.
• Date functions perform calculations and conversions on date values: add_months,
concat_date_time, date_diff, date_part, day_in_month, day_in_week, day_in_year,
fiscal_day, isweekend, julian, last_date, month, quarter, sysdate, systime,
week_in_month, week_in_year, year.
• Environment functions perform operations specific to your Data Services environment:
get_env, get_error_filename, get_monitor_filename, get_trace_filename, is_set_env,
set_env.
• Lookup functions look up data in other tables: lookup, lookup_ext, lookup_seq.
• Math functions perform complex mathematical operations on numeric values: abs, ceil,
floor, ln, log, mod, power, rand, rand_ext, round, sqrt, trunc.
• Miscellaneous functions perform various operations: base64_decode, base64_encode,
current_configuration, current_system_configuration, dataflow_name,
datastore_field_value, db_database_name, db_owner, db_type, db_version, decode,
file_exists, gen_row_num, gen_row_num_by_group, get_domain_description,
get_file_attribute, greatest, host_name, ifthenelse, is_group_changed, isempty,
job_name, least, nvl, previous_row_value, pushdown_sql, raise_exception,
raise_exception_ext, repository_name, sleep, system_user_name, table_attribute,
truncate_table, wait_for_file, workflow_name.
• String functions perform operations on alphanumeric strings of data: ascii, chr,
double_metaphone, index, init_cap, length, literal, lower, lpad, lpad_ext, ltrim,
ltrim_blanks, ltrim_blanks_ext, match_pattern, match_regex, print, replace_substr,
replace_substr_ext, rpad, rpad_ext, rtrim, rtrim_blanks, rtrim_blanks_ext,
search_replace, soundex, substr, upper, word, word_ext.
• System functions perform system operations: exec, mail_to, smtp_to.
• Validation functions validate specific types of values: is_valid_date, is_valid_datetime,
is_valid_decimal, is_valid_double, is_valid_int, is_valid_real, is_valid_time.
Defining other types of functions
In addition to built-in functions, you can also use these functions:
• Database and application functions:
These functions are specific to your RDBMS. You can import the metadata for database and
application functions and use them in Data Services applications. At run time, Data Services
passes the appropriate information to the database or application from which the function
was imported.
The metadata for a function includes the input, output, and their datatypes. If there are
restrictions on data passed to the function, such as requiring uppercase values or limiting
data to a specific range, you must enforce these restrictions in the input. You can either test
the data before extraction or include logic in the data flow that calls the function.
You can import stored procedures from DB2, Microsoft SQL Server, Oracle, and Sybase
databases, stored packages from Oracle, and stored functions from SQL Server. For more
information on importing functions, see “Custom
Datastores”, in Chapter 5, in the Data Services Reference Guide.
• Custom functions:
These are functions that you define. You can create your own functions by writing script
functions in Data Services scripting language.
Using functions in expressions
Introduction
Functions can be used in expressions to map return values as new columns, which allows
columns that are not in the initial input data set to be specified in the output data set.
After completing this unit, you will be able to:
• Use functions in expressions
Defining functions in expressions
Functions are typically used to add columns based on some other value (lookup function) or
generated key fields. You can use functions in:
• Transforms: The Query, Case, and SQL transforms support functions.
• Scripts: These are single-use objects used to call functions and assign values to variables in
a work flow.
• Conditionals: These are single-use objects used to implement branch logic in a work flow.
• Other custom functions: These are functions that you create as required.
Before you use a function, you need to know if the function’s operation makes sense in the
expression you are creating. For example, the max function cannot be used in a script or
conditional where there is no collection of values on which to operate.
You can add existing functions in an expression by using the Smart Editor or the Function
wizard. The Smart Editor offers you many options, including variables, datatypes, keyboard
shortcuts, and so on. The Function wizard allows you to define parameters for an existing
function and is recommended for defining complex functions.
To use the Smart Editor
1. Open the object in which you want to use an expression.
2. Click the ellipsis (...) button.
The Smart Editor displays.
3. Click the Functions tab and expand a function category.
4. Click and drag the specific function onto the workspace.
5. Enter the input parameters based on the syntax of your formula.
6. Click OK.
To use the Function wizard
1. Open the object in which you want to use an expression.
2. Click Functions.
The Select Function dialog box opens.
3. In the Function list, select a category.
4. In the Function name list, select a specific function.
The functions shown depend on the object you are using. Clicking each function separately
also displays a description of the function below the list boxes.
5. Click Next.
The Define Input Parameter(s) dialog box displays. The options available depend on the
selected function.
6. Click the drop-down arrow next to the input parameters.
The Input Parameter dialog box displays.
7. Double-click to select the source object and column for the function.
8. Repeat steps 6 and 7 for all other input parameters.
9. Click Finish.
Activity: Using the search_replace function
When evaluating the customer data for Alpha Acquisitions, you discover a data entry error
where the contact title of Account Manager has been entered as Accounting Manager. You
want to clean up this data before it is moved to the data warehouse.
Objective
• Use the search_replace function in an expression to change the contact title from Accounting
Manager to Account Manager.
Instructions
1. In the Alpha_Customers_DF workspace, open the transform editor for the Query transform.
2. On the Mapping tab, delete the existing expression for the Title column.
3. Using the Function wizard, create a new expression for the Title column using the
search_replace function (under String functions) to replace the full string of "Accounting
Manager" with "Account Manager".
Note: Be aware that the search_replace function can react unpredictably if you use the
external table option.
4. Execute Alpha_Customers_Job with the default execution properties and save all objects
you have created.
5. Return to the data flow workspace and view data for the target table.
Note that the titles for the affected contacts have been changed.
A solution file called SOLUTION_SearchReplace.atl is included on your resource CD. To check
the solution, import the file and open it to view the data flow design and mapping logic. Do
not execute the solution job, as this may override the results in your target table.
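Outside the product, the effect of this full-string search_replace can be sketched in Python (an analogy, not the function's actual syntax): only exact matches of the search string are replaced.

```python
def fix_title(title):
    # Full-string match, as in the activity: substrings and other titles
    # are left untouched.
    return "Account Manager" if title == "Accounting Manager" else title

titles = ["Accounting Manager", "Account Manager", "Owner"]
print([fix_title(t) for t in titles])
# -> ['Account Manager', 'Account Manager', 'Owner']
```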
Using the lookup function
Introduction
Lookup functions allow you to look up values in other tables to populate columns.
After completing this unit, you will be able to:
• Use the lookup function to look up values in another table
Using lookup tables
Lookup functions allow you to use values from the source table to look up values in other
tables to generate the data that populates the target table.
Lookups enable you to store re-usable values in memory to speed up the process. Lookups are
useful for values that rarely change.
The lookup, lookup_seq, and lookup_ext functions all provide a specialized type of join, similar
to an SQL outer join. While an SQL outer join may return multiple matches for a single record
in the outer table, lookup functions always return exactly the same number of records that are
in the source table.
While all lookup functions return one row for each row in the source, they differ in how they
choose which of several matching rows to return:
• Lookup does not provide additional options for the lookup expression.
• Lookup_ext allows you to specify an Order by column and Return policy (Min, Max) to
return the record with the highest/lowest value in a given field (for example, a surrogate
key).
142 BusinessObjects Data Integrator XI 3.0: Core Concepts—Learner’s Guide
SAP Data Services – Data Integrator XI 3.0
• Lookup_seq searches in matching records to return a field from the record where the sequence
column (for example, effective_date) is closest to but not greater than a specified sequence
value (for example, a transaction date).
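The "closest to but not greater than" rule used by lookup_seq can be sketched in Python (illustrative only; the price-history data and names below are invented):

```python
# Hypothetical price history for one product: (effective_date, price).
history = [
    ("2007-01-01", 10.0),
    ("2007-06-01", 12.0),
    ("2008-01-01", 15.0),
]

def lookup_seq(rows, seq_value):
    """Return the value from the row whose sequence column is closest to,
    but not greater than, seq_value (ISO dates compare correctly as strings)."""
    candidates = [(date, value) for date, value in rows if date <= seq_value]
    return max(candidates)[1] if candidates else None

# A transaction dated 2007-09-15 picks up the price effective 2007-06-01.
print(lookup_seq(history, "2007-09-15"))  # 12.0
```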
lookup_ext
The lookup_ext function is recommended for lookup operations because of its enhanced options.
You can use this function to retrieve a value in a table or file based on the values in a different
source table or file. This function also extends functionality by allowing you to:
• Return multiple columns from a single lookup.
• Choose from more operators to specify a lookup condition.
• Specify a return policy for your lookup.
• Perform multiple (including recursive) lookups.
• Call lookup_ext in scripts and custom functions. This also lets you re-use the lookups
packaged inside scripts.
• Define custom SQL using the SQL_override parameter to populate the lookup cache,
narrowing large quantities of data to only the sections relevant for your lookup(s).
• Use lookup_ext to dynamically execute SQL.
• Call lookup_ext, using the Function wizard, in the query output mapping to return multiple
columns in a Query transform.
• Design jobs to use lookup_ext without having to hard code the name of the translation file
at design time.
• Use lookup_ext with memory datastore tables.
Tip: Use this function to the right of the Query transform instead of to the right of a column
mapping. This allows you to select multiple output columns and go back to edit the function
in the Function wizard instead of manually editing the function’s complex syntax.
Feature Details
Syntax
lookup_ext ([translate_table, cache_spec, return_policy],
[return_column_list], [default_value_list], [condition_list],
[orderby_column_list], [output_variable_list], [sql_override])
Return value Returns any type of value; the return type is that of the first
lookup column in return_column_list.
Where
The function arguments are defined as follows:
• translate_table represents the table, file, or memory datastore that
contains the value you are looking up (result_column_list).
• cache_spec represents the caching method the lookup_ext operation
uses.
• return_policy specifies whether the return columns should be
obtained from the smallest or the largest row based on values in
the order by columns.
Using Functions, Scripts, and Variables—Learner’s Guide 143
SAP Data Services – Data Integrator XI 3.0
Feature Details
• return_column_list is a comma-separated list containing the names
of output columns in the translate_table.
• default_value_list is a comma-separated list containing the default
expressions for the output columns. When no rows match the lookup
condition, the default values are returned for the output column.
• condition_list is a list of triplets that specify lookup conditions. Each
triplet contains a compare_column, a compare operator
(<, <=, >, >=, =, IS, IS NOT), and a compare expression.
• orderby_column_list is a comma-separated list of column names
from the translate_table.
• output_variable_list is a comma-separated list of output variables.
• sql_override is available in the Function wizard. It must contain a
valid, single-quoted SQL SELECT statement or a $variable of type
varchar to populate the lookup cache when the cache specification
is PRE_LOAD_CACHE.
Example
lookup(ds.owner.emp, empname, 'no body', 'NO_CACHE', empno, 1);
lookup_ext([ds.owner.emp, 'NO_CACHE', 'MAX'], [empname], ['no body'],
[empno, '=', 1]);
These expressions both retrieve the name of an employee whose empno
is equal to 1.
To create a lookup expression
1. Open the Query transform.
The Query transform should have at least one main source table and one lookup table, and
it must be connected to a single target object.
2. Select the output schema column for which the lookup function is being performed.
3. In the Mapping tab, click Functions.
The Select Function window opens.
4. In the Function list, select Lookup Functions.
5. In the Function name list, select lookup_ext.
6. Click Next.
The Lookup_ext - Select Parameters dialog box displays.
7. In the Translate table drop-down list, select the lookup table.
8. Change the caching specification, if required.
9. Under Condition, in the Table column drop-down list, select the key in the lookup table
that corresponds to the source table.
10. In the Op. drop-down list, select an operator.
11. Enter the other logical join from the source table in the Expression column.
You can click and drag the column from the Available parameters pane to the Expression
column. For a direct lookup, click and drag the key from the Input Schema (source table)
that corresponds to the lookup table.
12. Under Output parameters, in the Table column drop-down list, select the column with the
value that will be returned by the lookup function.
13. Specify default values and order by parameters, if required.
14. Click Finish.
Activity: Using the lookup_ext() function
In the Alpha Acquisitions database, the country for a customer is stored in a separate table and
referenced with a foreign key. To speed up access to information in the data warehouse, this
lookup should be eliminated.
Objective
• Use the lookup_ext function to swap the ID for the country in the Customers table for Alpha
Acquisitions with the actual value from the Countries table.
Instructions
1. In the Alpha_Customers_DF workspace, open the transform editor for the Query transform.
2. On the Mapping tab, delete the current expression for the Country column.
3. Use the Functions wizard to create a new lookup expression using the lookup_ext function
with the following parameters:
Field/Option            Value
Translate table         Alpha.alpha.country
Condition
  Table column          COUNTRYID
  Op.                   =
  Expression            customer.COUNTRYID
Output parameters
  Table column          COUNTRYNAME
The following code is generated:
lookup_ext([Alpha.alpha.country,'PRE_LOAD_CACHE','MAX'],
[COUNTRYNAME],[NULL],[COUNTRYID,'=',customer.COUNTRYID]) SET
("run_as_separate_process"='no')
4. Execute Alpha_Customers_Job with the default execution properties and save all objects
you have created.
5. Return to the data flow workspace and view data for the target table after the lookup
expression is added.
A solution file called SOLUTION_LookupFunction.atl is included on your resource CD. To check
the solution, import the file and open it to view the data flow design and mapping logic. Do
not execute the solution job, as this may overwrite the results in your target table.
Using the decode function
Introduction
You can use the decode function as an alternative to nested if/then/else conditions.
After completing this unit, you will be able to:
• Use the decode function
Explaining the decode function
You can use the decode function to return an expression based on the first condition in the
specified list of conditions and expressions that evaluates to TRUE. It provides an alternate
way to write nested ifthenelse functions.
Use this function to apply multiple conditions when you map columns or select columns in a
query. For example, you can use this function to put customers into different groupings.
The syntax of the decode function uses the following format:
decode(condition_and_expression_list, default_expression)
The elements of the syntax break down as follows:
Element Description
Return value
expression or default_expression: returns the value associated with the first
condition that evaluates to TRUE. The data type of the return value is the
data type of the first expression in the condition_and_expression_list.
Note: If the data type of any subsequent expression or the
default_expression is not convertible to the data type of the first
expression, Data Integrator produces an error at validation. If the data
types are convertible but do not match, a warning appears at validation.
Where
condition_and_expression_list
Represents a comma-separated list of one or
more pairs that specify a variable number of
conditions. Each pair contains one condition
and one expression separated by a comma.
You must specify at least one condition and
expression pair:
• The condition evaluates to TRUE or FALSE.
Element Description
• The expression is the value that the function
returns if the condition evaluates to TRUE.
default_expression
Represents an expression that the function
returns if none of the conditions in
condition_and_expression_list evaluate to
TRUE.
Note: You must specify a default_expression.
The decode function provides an easier way to write nested ifthenelse functions. In nested
ifthenelse functions, you must write nested conditions and ensure that the parentheses are in
the correct places, as in this example:
ifthenelse((EMPNO = 1),'111',
ifthenelse((EMPNO = 2),'222',
ifthenelse((EMPNO = 3),'333',
ifthenelse((EMPNO = 4),'444',
'NO_ID'))))
In the decode function, you list the conditions as in this example:
decode((EMPNO = 1),'111',
(EMPNO = 2),'222',
(EMPNO = 3),'333',
(EMPNO = 4),'444',
'NO_ID')
Therefore, decode is less prone to error than nested ifthenelse functions.
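The first-TRUE-condition-wins behavior can be mimicked in Python (a simplification for illustration: real decode stops evaluating once a condition matches, while this sketch receives already-evaluated conditions):

```python
# A Python sketch of decode's evaluation order: conditions are tested
# in sequence and the expression paired with the first TRUE one wins.
def decode(*args):
    """decode(cond1, expr1, cond2, expr2, ..., default_expr)"""
    pairs, default = args[:-1], args[-1]
    for cond, expr in zip(pairs[::2], pairs[1::2]):
        if cond:
            return expr
    return default

empno = 3
print(decode(empno == 1, '111',
             empno == 2, '222',
             empno == 3, '333',
             empno == 4, '444',
             'NO_ID'))  # 333
```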
To improve performance, Data Services pushes this function to the database server when
possible. Thus, the database server, rather than Data Integrator, evaluates the decode function.
To configure the decode function
1. Open the Query transform.
2. Select the output schema column for which the decode function is being performed.
3. In the Mapping tab, click Functions.
The Select Function window opens.
4. In the Function list, select Miscellaneous Functions.
5. In the Function name list, select decode.
6. Click Next.
The Define Input Parameter(s) dialog box displays.
7. In the Conditional expression field, select or enter the IF clause in the case logic.
8. In the Case expression field, select or enter the THEN clause.
9. In the Default expression field, select or enter the ELSE clause.
10. Click Finish.
11. If required, add any additional THEN clauses in the mapping expression.
Activity: Using the decode function
You need to calculate the total value of all orders, including their discounts, for reporting
purposes.
Objective
• Use the sum and decode functions to calculate the total value of orders in the Order_Details
table.
Instructions
1. In the Omega project, create a new batch job called Alpha_Order_Sum_Job with a data flow
called Alpha_Order_Sum_DF.
2. In the Alpha_Order_Sum_DF workspace, add the Order_Details and Product tables from
the Alpha datastore as the source objects.
3. Add a new template table to the Delta datastore called order_sum as the target object.
4. Add a Query transform and connect all objects.
5. In the transform editor for the Query transform, on the WHERE tab, propose a join between
the two source tables.
6. Map the ORDERID column from the input schema to the output schema.
7. Create a new output column called TOTAL_VALUE with a data type of decimal(10,2).
8. On the Mapping tab of the new output column, use the Function wizard or the Smart Editor
to construct an expression to calculate the total value of the orders using the decode and
sum functions.
The discount and order total can be multiplied to determine the total after discount. The
decode function allows you to avoid multiplying an order with zero discount by zero.
Consider the following:
• The expression must specify that if the value in the DISCOUNT column is not zero
(Conditional expression), then the total value of the order is calculated by multiplying the
QUANTITY from the order_details table by the COST from the product table, and then
multiplying that value by the DISCOUNT (Case expression).
• Otherwise, the total value of the order is calculated by simply multiplying the QUANTITY
from the order_details table by the COST from the product table (Default expression).
• Once these values are calculated for each order, a sum must be calculated for the entire
collection of orders.
Tip: You can use the Function wizard to construct the decode portion of the mapping, and
then use the Smart Editor or the main window in the Mapping tab to wrap the sum function
around the expression.
The expression should be:
sum(decode(order_details.DISCOUNT <> 0, (order_details.QUANTITY * product.COST)
* order_details.DISCOUNT, order_details.QUANTITY * product.COST))
9. On the GROUP BY tab, add the order_details.ORDERID column.
10. Execute Alpha_Order_Sum_Job with the default execution properties and save all objects you
have created.
11. Return to the data flow workspace and view data for the target table after the decode
expression is added to confirm that order 11146 has a total value of $204,000.
A solution file called SOLUTION_DecodeFunction.atl is included on your resource CD. To check
the solution, import the file and open it to view the data flow design and mapping logic. Do
not execute the solution job, as this may overwrite the results in your target table.
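The decode-plus-sum logic from this activity can be checked against a couple of hypothetical rows (the values below are invented, not the activity's actual data):

```python
# Hypothetical sample rows standing in for order_details joined to product.
rows = [
    {"QUANTITY": 10, "COST": 100.0, "DISCOUNT": 0.0},
    {"QUANTITY": 5,  "COST": 200.0, "DISCOUNT": 0.9},
]

def line_value(r):
    # decode(DISCOUNT <> 0, QUANTITY * COST * DISCOUNT, QUANTITY * COST)
    if r["DISCOUNT"] != 0:
        return r["QUANTITY"] * r["COST"] * r["DISCOUNT"]
    return r["QUANTITY"] * r["COST"]

# sum(...) wraps the per-row decode expression, as in the mapping.
total = sum(line_value(r) for r in rows)
print(total)  # 1900.0
```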
Using scripts, variables, and parameters
Introduction
With the Data Services scripting language, you can assign values to variables, call functions,
and use standard string and mathematical operators to transform data and manage work flow.
After completing this unit, you will be able to:
• Describe the purpose of scripts, variables, and parameters
• Explain the differences between global and local variables
• Set global variable values using properties
• Describe the purpose of substitution parameters
Defining scripts
To apply decision-making and branch logic to work flows, you will use a combination of scripts,
variables, and parameters to calculate and pass information between the objects in your jobs.
A script is a single-use object that is used to call functions and assign values in a work flow.
Typically, a script is executed before data flows for initialization steps and used in conjunction
with conditionals to determine execution paths. A script may also be used after work flows or data flows to record execution information such as time, or a change in the number of rows in
a data set.
Use a script when you want to calculate values that will be passed on to other parts of the work
flow. Use scripts to assign values to variables and execute functions.
A script can contain these statements:
• Function calls
• If statements
• While statements
• Assignment statements
• Operators
Defining variables
A variable is a common component in scripts that acts as a placeholder to represent values that
have the potential to change each time a job is executed. To make them easy to identify in an
expression, variable names start with a dollar sign ($). They can be of any datatype supported
by Data Services.
You can use variables in expressions in scripts or transforms to facilitate decision making or
data manipulation (using arithmetic or character substitution). A variable can be used in a
LOOP or IF statement to check a variable's value to decide which step to perform.
Note that variables can be used to enable the same expression to be used for multiple output
files. Variables can be used as file names for:
• Flat file sources and targets
• XML file sources and targets
• XML message targets (executed in the Designer in test mode)
• Document file sources and targets (in an SAP R/3 environment)
• Document message sources and targets (SAP R/3 environment)
In addition to scripts, you can also use variables in a catch or a conditional. A catch is part of
a serial sequence called a try/catch block. The try/catch block allows you to specify alternative
work flows if errors occur while Data Services is executing a job. A conditional is a single-use
object available in work flows that allows you to branch the execution logic based on the results
of an expression. The conditional takes the form of an if/then/else statement.
Defining parameters
A parameter is another type of placeholder that calls a variable. This call allows the value from
the variable in a job or work flow to be passed to the parameter in a dependent work flow or
data flow. Parameters are most commonly used in WHERE clauses.
Combining scripts, variables, and parameters
To illustrate how scripts, variables, and parameters are used together, consider an example
where you start with a job, work flow, and data flow. You want the data flow to update only
those records that have been created since the last time the job executed.
To accomplish this, you would start by creating a variable for the update time at the work flow
level, and a parameter at the data flow level that calls the variable.
Next, you would create a script within the work flow that executes before the data flow runs.
The script contains an expression that determines the most recent update time for the source
table.
The script then assigns that update time value to the variable, which identifies what that value
is used for and allows it to be re-used in other expressions.
Finally, in the data flow, you create an expression that uses the parameter to call the variable
and find out the update time. This allows the data flow to compare the update time to the
creation date of the records and identify which rows to extract from the source.
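The update-time pattern described above can be sketched in Python (all names here are hypothetical; in Data Services the wiring is done through the Variables and Parameters dialog rather than through function arguments):

```python
from datetime import datetime

# Hypothetical source rows with creation timestamps.
source_rows = [
    {"id": 1, "created": datetime(2008, 1, 10)},
    {"id": 2, "created": datetime(2008, 3, 5)},
]

# Script step: determine the most recent update time (hard-coded here,
# where a real job would query the target) and assign it to the variable.
G_last_update = datetime(2008, 2, 1)

# Data flow step: the parameter carries the variable's value into the
# WHERE-style filter, so only newer records are extracted.
def extract_new_rows(rows, p_last_update):
    return [r for r in rows if r["created"] > p_last_update]

new_rows = extract_new_rows(source_rows, G_last_update)
print([r["id"] for r in new_rows])  # [2]
```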
Defining global versus local variables
There are two types of variables: local and global.
Local variables are restricted to the job or work flow in which they are created. You must use
parameters to pass local variables to the work flows and data flows in the object.
Global variables are also restricted to the job in which they are created. However, they do not
require parameters to be passed to work flows and data flows in that job. Instead, you can
reference the global variable directly in expressions in any object in that job.
Global variables can simplify your work. You can set values for global variables in script objects
or using external job, execution, or schedule properties. For example, during production, you
can change values for default global variables at run time from a job's schedule without having
to open a job in the Designer.
Whether you use global variables or local variables and parameters depends on how and where
you need to use the variables. If you need to use the variable at multiple levels of a specific job,
it is recommended that you create a global variable.
However, there are implications to using global variables in work flows and data flows that
are re-used in other jobs. A local variable is included as part of the definition of the work flow
or data flow, and so it is portable between jobs. Because a global variable is part of the definition
of the job to which the work flow or data flow belongs, it is not included when the object is
re-used.
The following table summarizes the type of variables and parameters you can create for each
type of object.
Object      Type             Used by
Job         Global variable  Any object in the job.
Job         Local variable   A script or conditional in the job.
Work flow   Local variable   This work flow, or passed down to other work
                             flows or data flows using a parameter.
Work flow   Parameter        Parent objects, to pass local variables. Work
                             flows may also return variables or parameters
                             to parent objects.
Data flow   Parameter        A WHERE clause, column mapping, or function
                             in the data flow. Data flows cannot return
                             output values.
To ensure consistency across projects and minimize troubleshooting errors, it is a best practice
to use a consistent naming convention for your variables and parameters. Keep in mind that
names can include any alpha or numeric character or underscores (_), but cannot contain blank
spaces. To differentiate between the types of objects, start all names with a dollar sign ($), and
use the following prefixes:
Type Naming convention
Global variable $G_
Local variable $L_
Parameter $P_
To define a global variable, local variable, or parameter
1. Select the object in the project area.
For a global variable, the object must be a job. For a local variable, it can be a job or a work
flow. For a parameter, it can be a work flow or a data flow.
2. From the Tools menu, select Variables.
The Variables and Parameters dialog box displays.
3. On the Definitions tab, right-click the type of variable or parameter and select Insert from
the menu.
4. Right-click the new variable or parameter and select Properties from the menu.
The Properties dialog box displays. The properties differ depending on the type of variable
or parameter.
5. In the Name field, enter a unique name for the variable or parameter.
6. In the Data type drop-down list, select the datatype for the variable or parameter.
7. For parameters, in the Parameter type drop-down list, select whether the parameter is for
input, output, or both.
For most applications, parameters are used for input.
8. Click OK.
You can create a relationship between a local variable and the parameter by specifying the
name of the local variable as the value in the properties for the parameter on the Calls
tab.
To define the relationship between a local variable and a parameter
1. Select the dependent object in the project area.
2. From the Tools menu, select Variables to open the Variables and Parameters dialog box.
3. Click the Calls tab.
Any parameters that exist in dependent objects display on the Calls tab.
4. Right-click the parameter and select Properties from the menu.
The Parameter Value dialog box displays.
5. In the Value field, enter the name of the local variable you want the parameter to call or a
constant value.
If you enter a variable, it must be of the same datatype as the parameter.
6. Click OK.
Setting global variables using job properties
In addition to setting a variable inside a job using a script, you can also set and maintain global
variable values outside a job using properties. Values set outside a job are processed the same
way as those set in a script. However, if you set a value for the same variable both inside and
outside a job, the value from the script overrides the value from the property.
Values for global variables can be set as a job property or as an execution or schedule property.
All values defined as job properties are shown in the Properties window. By setting values
outside a job, you can rely on the Properties window for viewing values that have been set for global variables and easily edit values when testing or scheduling a job.
To set a global variable value as a job property
1. Right-click a job in the Local Object Library or project area and select Properties from the
menu.
The Properties dialog box displays.
2. Click the Global Variable tab.
All global variables for the job are listed.
3. In the Value column for the global variable, enter a constant value or an expression, as
required.
4. Click OK.
You can also view and edit these default values in the Execution Properties dialog of the
Designer. This allows you to override job property values at run time.
Data Services saves values in the repository as job properties.
Defining substitution parameters
Substitution parameters provide a way to define parameters that have a constant value in one
environment but may need to change in certain situations. When a change is needed, you make
it in one location and it affects all jobs. You can override the parameter for particular
job executions.
The typical use case is for file locations (directory files or source/target/error files) that are
constant in one environment, but will change when a job is migrated to another environment
(like migrating a job from test to production).
As with variables and parameters, the name can include any alpha or numeric character or
underscores (_), but cannot contain blank spaces. Follow the same naming convention and
always begin the name for a substitution parameter with double dollar signs ($$) and an S_
prefix to differentiate from out-of-the-box substitution parameters.
Note: When exporting a job (to a file or a repository), the substitution parameter configurations
(values) are not exported with them. You need to export substitution parameters via a separate
command to a text file and use this text file to import into another repository.
To create a substitution parameter configuration
1. From the Tools menu, select Substitution Parameter Configurations.
The Substitution Parameter Editor dialog box displays all pre-defined substitution
parameters:
2. Double-click the header for the default configuration to change the name, and then click
outside of the header to commit the change.
3. Do any of the following:
• To add a new configuration, click Create New Substitution Parameter Configuration
to add a new column, enter a name for the new configuration in the header, and click
outside of the header to commit the change. Enter the values of the substitution parameters
as required for the new configuration.
• To add a new substitution parameter, in the Substitution Parameter column of the last
line, enter the name and value for the substitution parameter.
4. Click OK.
To add a substitution parameter configuration to a system configuration
1. From the Tools menu, select System Configurations.
The System Configuration Editor dialog box displays any existing system configurations:
2. For an existing system configuration, in the Substitution Parameter drop-down list, select
the substitution parameter configuration.
3. Click OK.
Using Data Services scripting language
Introduction
With Data Services scripting language, you can assign values to variables, call functions, and
use standard string and mathematical operators. The syntax can be used in both expressions
(such as WHERE clauses) and scripts.
After completing this unit, you will be able to:
• Explain language syntax
• Use strings and variables in Data Services scripting language
Using basic syntax
Expressions are a combination of constants, operators, functions, and variables that evaluate
to a value of a given datatype. Expressions can be used inside script statements or added to
data flow objects.
Data Services scripting language follows these basic syntax rules when you are creating an
expression:
• Each statement ends with a semicolon (;).
• Variable names start with a dollar sign ($).
• String values are enclosed in single quotation marks (').
• Comments start with a pound sign (#).
• Function calls always specify parameters, even if they do not use parameters.
• Square brackets substitute the value of the expression. For example:
Print('The value of the start date is:[sysdate()+5]');
• Curly brackets quote the value of the expression in single quotation marks. For example:
$StartDate = sql('demo_target', 'SELECT ExtractHigh FROM Job_Execution_Status
WHERE JobName = {$JobName}');
Using syntax for column and table references in expressions
Because expressions can be used inside data flow objects, they often contain column names.
The Data Services scripting language recognizes column and table names without special
syntax. For example, you can indicate the start_date column as the input to a function in the
Mapping tab of a query as:
to_char(start_date, 'dd.mm.yyyy')
The column start_date must be in the input schema of the query.
If there is more than one column with the same name in the input schema of a query, indicate
which column is included in an expression by qualifying the column name with the table name.
For example, indicate the column start_date in the table status as:
status.start_date
Column and table names as part of SQL strings may require special syntax based on the RDBMS
that the SQL is evaluated by. For example, select all rows from the LAST_NAME column of
the CUSTOMER table as:
sql('oracle_ds','select CUSTOMER.LAST_NAME from CUSTOMER')
Using operators
The operators you can use in expressions are listed in the following table in order of precedence.
Note that when operations are pushed to a RDBMS to perform, the precedence is determined
by the rules of the RDBMS.
Operator Description
+ Addition
- Subtraction
* Multiplication
/ Division
= Comparison, equals
< Comparison, is less than
<= Comparison, is less than or equal to
> Comparison, is greater than
>= Comparison, is greater than or equal to
!= Comparison, is not equal to
|| Concatenate
AND Logical AND
OR Logical OR
Operator Description
NOT Logical NOT
IS NULL Comparison, is a NULL value
IS NOT NULL Comparison, is not a NULL value
Reviewing script examples
Example 1
$language = 'E';
$start_date = '1994.01.01';
$end_date = '1998.01.31';
Example 2
$start_time_str = sql('tutorial_ds', 'select to_char(start_time,\'YYYY-MM-DD
HH24:MI:SS\')');
$end_time_str = sql('tutorial_ds', 'select to_char(max(last_update),\'YYYY-MM-DD
HH24:MI:SS\')');
$start_time = to_date($start_time_str, 'YYYY-MM-DD HH24:MI:SS');
$end_time = to_date($end_time_str, 'YYYY-MM-DD HH24:MI:SS');
Example 3
$end_time_str = sql('tutorial_ds', 'select to_char(end_time,\'YYYY-MM-DD
HH24:MI:SS\')');
if (($end_time_str IS NULL) or ($end_time_str = '')) $recovery_needed = 1;
else $recovery_needed = 0;
Using strings and variables
Special care must be given to the handling of strings. Quotation marks, escape characters, and
trailing blanks can all have an adverse effect on your script if used incorrectly.
Using quotation marks
The type of quotation marks to use in strings depends on whether you are using identifiers or
constants. An identifier is the name of the object (for example, table, column, data flow, or
function). A constant is a fixed value used in computation. There are two types of constants:
• String constants (for example, 'Hello' or '2007.01.23')
• Numeric constants (for example, 2.14)
Identifiers need quotation marks if they contain special (non-alphanumeric) characters. For
example, you need double quotation marks for the following identifier because it contains blanks:
"compute large numbers"
Use single quotes for string constants.
Using escape characters
If a constant contains a single quote (') or backslash (\) or another special character used by
the Data Services scripting language, then those characters must be preceded by an escape
character to be evaluated properly in a string. Data Services uses the backslash (\) as the escape
character.
Character Example
Single quote (') 'World\'s Books'
Backslash (\) 'C:\\temp'
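As an illustration of this escaping rule, the same substitution can be mimicked with a small Python helper (a hypothetical sketch for the classroom, not part of Data Services; note that backslashes must be escaped before quotes so the quote escapes are not doubled):

```python
def escape_ds_constant(text: str) -> str:
    """Escape backslashes and single quotes the way the Data Services
    scripting language expects inside a single-quoted string constant."""
    # Escape backslashes first, then single quotes.
    return text.replace("\\", "\\\\").replace("'", "\\'")

print(escape_ds_constant("World's Books"))  # World\'s Books
print(escape_ds_constant("C:\\temp"))       # C:\\temp
```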
Handling nulls, empty strings, and trailing blanks
To conform to the ANSI VARCHAR standard when dealing with NULLS, empty strings, and
trailing blanks, Data Services:
• Treats an empty string as a zero length varchar value, instead of as a NULL value.
• Returns a value of FALSE when you use the operators Equal (=) and Not Equal (<>) to
compare to a NULL value.
• Provides IS NULL and IS NOT NULL operators to test for NULL values.
• Treats trailing blanks as regular characters when reading from all sources, instead of trimming
them.
• Ignores trailing blanks in comparisons in transforms (Query and Table Comparison) and
functions (decode, ifthenelse, lookup, lookup_ext, lookup_seq).
NULL values
To represent NULL values in expressions, type the word NULL. For example, you can check
whether a column (COLX) is null or not with the following expressions:
COLX IS NULL
COLX IS NOT NULL
Data Services does not check for NULL values in data columns. Use the function nvl to remove
NULL values. For more information on the NVL function, see “Functions and Procedures”,
Chapter 6 in the Data Services Reference Guide.
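The behavior of nvl can be pictured with a short Python analogy, modeling NULL as None (a sketch for illustration only, not the Data Services implementation):

```python
def nvl(value, substitute):
    """Analogue of the Data Services nvl function: return the substitute
    when the incoming value is NULL (modeled here as Python's None)."""
    return substitute if value is None else value

print(nvl(None, 0))      # 0
print(nvl("Smith", ""))  # Smith
```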
NULL values and empty strings
Data Services uses the following two rules with empty strings:
• When you assign an empty string to a variable, Data Services treats the value of the variable
as a zero-length string.
An error results if you assign an empty string to a variable that is not a varchar. To assign
a NULL value to a variable of any type, use the NULL constant.
• As a constant (''), Data Services treats the empty string as a varchar value of zero length.
Use the NULL constant for the NULL value.
Data Services uses the following three rules with NULLS and empty strings in conditionals:
Rule 1
The Equals (=) and Is Not Equal to (<>) comparison operators against a NULL value always
evaluate to FALSE. This FALSE result includes comparing a variable that has a value of NULL
against a NULL constant.
The following table shows the comparison results for the variable assignments $var1 = NULL
and $var2 = NULL:
Condition Translates to Returns
If (NULL = NULL) NULL is equal to NULL FALSE
If (NULL != NULL) NULL is not equal to NULL FALSE
If (NULL = '') NULL is equal to empty string FALSE
If (NULL != '') NULL is not equal to empty string FALSE
If ('bbb' = NULL) bbb is equal to NULL FALSE
If ('bbb' != NULL) bbb is not equal to NULL FALSE
If ('bbb' = '') bbb is equal to empty string FALSE
If ('bbb' != '') bbb is not equal to empty string TRUE
If ($var1 = NULL) NULL is equal to NULL FALSE
If ($var1 != NULL) NULL is not equal to NULL FALSE
If ($var1 = '') NULL is equal to empty string FALSE
If ($var1 != '') NULL is not equal to empty string FALSE
If ($var1 = $var2) NULL is equal to NULL FALSE
If ($var1 != $var2) NULL is not equal to NULL FALSE
The following table shows the comparison results for the variable assignments $var1 = ''
and $var2 = '':
Condition Translates to Returns
If ($var1 = NULL) Empty string is equal to NULL FALSE
If ($var1 != NULL) Empty string is not equal to NULL FALSE
If ($var1 = '') Empty string is equal to empty string TRUE
If ($var1 != '') Empty string is not equal to empty string FALSE
If ($var1 = $var2) Empty string is equal to empty string TRUE
If ($var1 != $var2) Empty string is not equal to empty string FALSE
Rule 2
Use the IS NULL and IS NOT NULL operators to test for the presence of NULL values. For example,
assuming a variable assignment $var1 = NULL;
Condition Translates to Returns
If ('bbb' IS NULL) bbb is NULL FALSE
If ('bbb' IS NOT NULL) bbb is not NULL TRUE
If ('' IS NULL) Empty string is NULL FALSE
If ('' IS NOT NULL) Empty string is not NULL TRUE
If ($var1 IS NULL) NULL is NULL TRUE
If ($var1 IS NOT NULL) NULL is not NULL FALSE
Rule 3
When comparing two variables, always test for NULL. In this scenario, you are not testing a
variable with a value of NULL against a NULL constant (as in the first rule). Either test each
variable and branch accordingly or test in the conditional as shown in the second row of the
following table.
Condition Recommendation
If ($var1 = $var2) Do not compare without explicitly testing for
NULLS. This logic is not recommended because any relational comparison to a NULL
value returns FALSE.
If ( (($var1 IS NULL) AND ($var2 IS
NULL)) OR ($var1 = $var2))
Executes the TRUE branch if both $var1 and
$var2 are NULL, or if neither is NULL but they
are equal to each other.
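The recommended pattern can be sketched in Python, again modeling NULL as None (an illustrative analogy, not Data Services code):

```python
def null_safe_equal(a, b):
    """Sketch of Rule 3: treat two NULLs (None) as matching; otherwise any
    relational comparison involving NULL yields FALSE, so fall through to
    a plain equality test only when neither side is NULL."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False  # relational comparison to NULL returns FALSE
    return a == b

print(null_safe_equal(None, None))  # True
print(null_safe_equal(None, ""))    # False
print(null_safe_equal("", ""))      # True
```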
Scripting a custom function
Introduction
If the built-in functions that are provided by Data Services do not meet your requirements, you
can create your own custom functions using the Data Services scripting language.
After completing this unit, you will be able to:
• Create a custom function
• Import a stored procedure to use as a custom function
Creating a custom function
You can create your own functions by writing script functions in the Data Services scripting
language using the Smart Editor. Saved custom functions appear in the Function wizard and
the Smart Editor under the Custom Functions category, and are also displayed on the Custom
Functions tab of the Local Object Library. You can edit and delete custom functions from the
Local Object Library.
Consider these guidelines when you create your own functions:
• Functions can call other functions.
• Functions cannot call themselves.
• Functions cannot participate in a cycle of recursive calls. For example, function A cannot
call function B if function B calls function A.
• Functions return a value.
• Functions can have parameters for input, output, or both. However, data flows cannot pass
parameters of type output or input/output.
Before creating a custom function, you must know the input, output, and return values and
their datatypes. The return value is predefined to be Return.
To create a custom function
1. On the Custom Functions tab of the Local Object Library, right-click the white space and
select New from the menu.
The Custom Function dialog box displays.
2. In the Function name field, enter a unique name for the new function.
3. In the Description field, enter a description.
4. Click Next.
The Smart Editor enables you to define the return type, parameter list, and any variables to
be used in the function.
5. On the Variables tab, expand the Parameters branch.
6. Right-click Return and select Properties from the menu.
The Return value Properties dialog box displays.
7. In the Data type drop-down list, select the datatype you want to return for the custom
function.
By default, the return datatype is set to integer.
8. Click OK.
9. To define a new variable or parameter for your custom function, in the Variables tab,
right-click the appropriate branch and select Insert from the menu.
10. In the Name field, enter a unique name for the variable or parameter.
11. In the Data type drop-down list, select the datatype for the variable or parameter.
12. For a parameter, in the Parameter type drop-down list, select whether the parameter is for
input, output, or both.
Data Services data flows cannot pass variable parameters of type output and input/output.
13. Click OK.
14. Repeat step 9 to step 13 for each variable or parameter required in your function.
When adding subsequent variables or parameters, the right-click menu will include options
to Insert Above or Insert Below. Use these menu commands to create, delete, or edit variables
or parameters.
15. In the main area of the Smart Editor, enter the expression for your function.
Your expression must include the Return parameter.
16. Click Validate to check the syntax of your function.
If your function contains syntax errors, Data Services displays a list of those errors in an
embedded pane below the editor. To see where the error occurs in the text, double-click an
error. The Smart Editor redraws to show the location of the error.
17. Click OK.
To edit a custom function
1. On the Custom Functions tab of the Local Object Library, right-click the custom function
and select Edit from the menu.
2. In the Smart Editor, change the expression as required.
3. Click OK.
To delete a custom function
1. On the Custom Functions tab of the Local Object Library, right-click the custom function
and select Delete from the menu.
2. Click OK to confirm the deletion.
Importing a stored procedure as a function
If you are using Microsoft SQL Server, you can use stored procedures to insert, update, and
delete data in your tables. To use stored procedures in Data Services, you must import them
as custom functions.
To import a stored procedure
1. On the Datastores tab of the Local Object Library, expand the datastore that contains the
stored procedure.
2. Right-click Functions and select Import By Name from the menu.
The Import By Name dialog box displays.
3. In the Type drop-down list, select Function.
4. In the Name field, enter the name of the stored procedure.
5. Click OK.
Activity: Creating a custom function
The Marketing department would like to send special offers to customers who have placed a
specified number of orders. This requires creating a custom function that can be called in a
real-time job as a customer's order is entered into the system.
Objectives
• Create a custom function to accept the input parameters of the Customer ID and the number
of orders required to receive a special offer, check the Orders table, and return a value of
1 or 0.
• Create a batch job using the custom function to create an initial list of customers who have
placed five or more orders, and are therefore eligible to receive the special offer.
Instructions
1. In the Local Object Library, create a new custom function called CF_MarketingOffer.
2. In the Smart Editor for the function, create a parameter called $P_CustomerID with a datatype
of varchar(10) and a type of input.
3. Create a second parameter called $P_Orders with a datatype of int and a type of input.
4. Define the custom function as a conditional clause that specifies that, if the number of rows
in the Orders table for the Customer ID is greater than or equal to the $P_Orders value, then
the function should return a 1; otherwise, it should return 0.
The syntax should be as follows:
if ((sql('alpha', 'select count(*) from orders where customerid =
[$P_CustomerID]')) >= $P_Orders)
Return 1;
else Return 0;
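To make the conditional concrete, the same logic can be sketched in Python with an in-memory sqlite3 database standing in for the alpha datastore (the table contents, connection, and function name here are illustrative assumptions, not the course environment):

```python
import sqlite3

def cf_marketing_offer(conn, customer_id, orders_needed):
    """Analogue of the CF_MarketingOffer logic: return 1 when the customer
    has placed at least orders_needed orders, otherwise 0."""
    row = conn.execute(
        "select count(*) from orders where customerid = ?", (customer_id,)
    ).fetchone()
    return 1 if row[0] >= orders_needed else 0

# Hypothetical sample data: C001 has six orders, C002 has one.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (orderid integer, customerid text)")
conn.executemany("insert into orders values (?, ?)",
                 [(i, "C001") for i in range(6)] + [(10, "C002")])
print(cf_marketing_offer(conn, "C001", 5))  # 1
print(cf_marketing_offer(conn, "C002", 5))  # 0
```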
5. In the Omega project, create a new batch job called Alpha_Marketing_Offer_Job with a data
flow called Alpha_Marketing_Offer_DF.
6. Create a new global variable for the job called $G_Num_to_Qual with a datatype of int.
7. In the job workspace, to the left of the data flow, create a new script called CheckOrders and
create an expression in the script to define the global variable as five orders to qualify.
The expression should be:
$G_Num_to_Qual = 5;
8. Connect the script to the data flow.
9. In the data flow workspace, add the Customer table from the Alpha datastore as the source
object.
10. Add a template table to the Delta datastore called offer_mailing_list as the target object.
11. Add two Query transforms and connect all objects.
12. In the transform editor for the first Query transform, map the following columns:
Schema In Schema Out
CONTACTNAME CONTACTNAME
ADDRESS ADDRESS
CITY CITY
POSTALCODE POSTALCODE
13. Create a new output column called OFFER_STATUS with a datatype of int.
14. On the Mapping table, map the new output column to the custom function using the Function
wizard. Specify the CUSTOMERID column for $P_CustomerID and the global variable for
$P_Orders.
The expression should be as follows:
CF_MarketingOffer(customer.CUSTOMERID, $G_Num_to_Qual)
15. In the transform editor for the second Query transform, map the following columns:
Schema In Schema Out
CONTACTNAME CONTACTNAME
ADDRESS ADDRESS
CITY CITY
POSTALCODE POSTALCODE
16. On the WHERE tab, create an expression to select only those records where the
OFFER_STATUS value is 1.
The expression should be:
Query.OFFER_STATUS = 1
17. Execute Alpha_Marketing_Offer_Job with the default execution properties and save all
objects you have created.
18. Return to the data flow workspace and view data for the target table.
You should have one output record for contact Lev M. Melton in Quebec.
A solution file called SOLUTION_CustomFunction.atl is included in your resource CD. To check
the solution, import the file and open it to view the data flow design and mapping logic. Do
not execute the solution job, as this may override the results in your target table.
Quiz: Using functions, scripts, and variables
1. Describe the differences between a function and a transform.
2. Why are functions used in expressions?
3. What does a lookup function do? How do the different variations of the lookup function
differ?
4. What value would the Lookup_ext function return if multiple matching records were found
on the translate table?
5. Explain the differences between a variable and a parameter.
6. When would you use a global variable instead of a local variable?
7. What is the recommended naming convention for variables in Data Services?
8. Which object would you use to define a value that is constant in one environment, but may
change when a job is migrated to another environment?
a. Global variable
b. Local variable
c. Parameter
d. Substitution parameter
Lesson summary
After completing this lesson, you are now able to:
• Define built-in functions
• Use functions in expressions
• Use the lookup function
• Use the decode function
• Use variables and parameters
• Use Data Services scripting language
• Script a custom function
Lesson 6
Using Platform Transforms
Lesson introduction
A transform enables you to control how data sets change in a data flow.
After completing this lesson, you will be able to:
• Describe platform transforms
• Use the Map Operation transform
• Use the Validation transform
• Use the Merge transform
• Use the Case transform
• Use the SQL transform
Describing platform transforms
Introduction
Transforms are optional objects in a data flow that allow you to transform your data as it moves
from source to target.
After completing this unit, you will be able to:
• Explain transforms
• Describe the platform transforms available in Data Services
• Add a transform to a data flow
• Describe the Transform Editor window
Explaining transforms
Transforms are objects in data flows that operate on input data sets by changing them or by
generating one or more new data sets. The Query transform is the most commonly used
transform.
Transforms are added as components to your data flow in the same way as source and target
objects. Each transform provides different options that you can specify based on the transform's
function. You can choose to edit the input data, output data, and parameters in a transform.
Some transforms, such as the Date Generation and SQL transforms, can be used as source
objects, in which case they do not have input options.
Transforms are often used in combination to create the output data set. For example, the Table
Comparison, History Preserve, and Key Generation transforms are used for slowly changing
dimensions.
Transforms are similar to functions in that they can produce the same or similar values during
processing. However, transforms and functions operate on a different scale:
• Functions operate on single values, such as values in specific columns in a data set.
• Transforms operate on data sets by creating, updating, and deleting rows of data.
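The difference in scale can be pictured with a toy Python sketch (illustrative only, not Data Services code): a function touches one value at a time, while a transform produces a whole new data set.

```python
def initcap(value):
    """Function: operates on a single value."""
    return value.capitalize()

def transform(rows):
    """Transform: operates on the whole data set, row by row."""
    return [dict(r, NAME=initcap(r["NAME"])) for r in rows]

print(transform([{"NAME": "alpha"}, {"NAME": "omega"}]))
# [{'NAME': 'Alpha'}, {'NAME': 'Omega'}]
```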
Describing platform transforms
The following platform transforms are available on the Transforms tab of the Local Object
Library:
Icon Transform Description
Case Divides the data from an input data set into multiple output
data sets based on IF-THEN-ELSE branch logic.
Map Operation Allows conversions between operation codes.
Merge Unifies rows from two or more input data sets into a single
output data set.
Query Retrieves a data set that satisfies conditions that you specify.
A query transform is similar to a SQL SELECT statement.
Row Generation Generates a column filled with integers starting at zero and
incrementing by one to the end value you specify.
SQL Performs the indicated SQL query operation.
Validation
Allows you to specify validation criteria for an input data set.
Data that fails validation can be filtered out or replaced. You
can have one validation rule per column.
Using the Map Operation transform
Introduction
The Map Operation transform enables you to change the operation code for records.
After completing this unit, you will be able to:
• Describe map operations
• Use the Map Operation transform
Describing map operations
Data Services maintains operation codes that describe the status of each row in each data set
described by the inputs to and outputs from objects in data flows. The operation codes indicate
how each row in the data set would be applied to a target table if the data set were loaded into
a target. The operation codes are as follows:
Operation Code Description
NORMAL Creates a new row in the target. All rows in a data set are flagged as
NORMAL when they are extracted from a source table or file. If a row is
flagged as NORMAL when loaded into a target table or file, it is inserted
as a new row in the target. Most transforms operate only on rows flagged
as NORMAL.
INSERT Creates a new row in the target. Only History Preserving and Key
Generation transforms can accept data sets with rows flagged as INSERT
as input.
DELETE Is ignored by the target. Rows flagged as DELETE are not loaded. Only
the History Preserving transform, with the Preserve delete row(s) as
update row(s) option selected, can accept data sets with rows flagged as
DELETE.
UPDATE Overwrites an existing row in the target table. Only History Preserving
and Key Generation transforms can accept data sets with rows flagged as
UPDATE as input.
Explaining the Map Operation transform
The Map Operation transform allows you to change operation codes on data sets to produce
the desired output. For example, if a row in the input data set has been updated in some previous
operation in the data flow, you can use this transform to map the UPDATE operation to an
INSERT. The result could be to convert UPDATE rows to INSERT rows to preserve the existing
row in the target.
Data Services can push Map Operation transforms to the source database.
The next section gives a brief description of the function, data input requirements, options, and
data output results for the Map Operation transform. For more information on the Map
Operation transform, see "Transforms", Chapter 5 in the Data Services Reference Guide.
Inputs/Outputs
Input for the Map Operation transform is a data set with rows flagged with any operation
codes. It can contain hierarchical data.
Use caution when using columns of datatype real in this transform, because comparison results
are unpredictable for this datatype.
Output for the Map Operation transform is a data set with rows flagged as specified by the
mapping operations.
Options
The Map Operation transform enables you to set the Output row type option to indicate the
new operations desired for the input data set. Choose from the following operation codes:
INSERT, UPDATE, DELETE, NORMAL, or DISCARD.
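The effect of these settings can be sketched in Python, representing each row as an (operation code, data) pair (a classroom analogy, not the Data Services engine):

```python
def map_operation(rows, mapping):
    """Sketch of the Map Operation transform: re-flag each row's operation
    code per the mapping; rows mapped to DISCARD are dropped entirely."""
    out = []
    for op, data in rows:
        new_op = mapping.get(op, op)  # unmapped codes pass through unchanged
        if new_op != "DISCARD":
            out.append((new_op, data))
    return out

rows = [("NORMAL", {"id": 1}), ("UPDATE", {"id": 2}), ("DELETE", {"id": 3})]
print(map_operation(rows, {"UPDATE": "INSERT", "DELETE": "DISCARD"}))
# [('NORMAL', {'id': 1}), ('INSERT', {'id': 2})]
```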
Activity: Using the Map Operation transform
End users of employee reports have requested that employee records in the data mart contain
only current employees.
Objective
• Use the Map Operation transform to remove any employee records that have a value in the
discharge_date column.
Instructions
1. In the Omega project, create a new batch job called Alpha_Employees_Current_Job with a
data flow called Alpha_Employees_Current_DF.
2. In the data flow workspace, add the Employee table from the Alpha datastore as the source
object.
3. Add the Employee table from the HR_datamart datastore as the target object.
4. Add the Query transform to the workspace and connect all objects.
5. In the transform editor for the Query transform, map all columns from the input schema to
the same column in the output schema.
6. On the WHERE tab, create an expression to select only those rows where discharge_date is
not empty.
The expression should be:
employee.discharge_date is not null
7. In the data flow workspace, disconnect the Query transform from the target table.
8. Add a Map Operation transform between the Query transform and the target table and
connect it to both.
9. In the transform editor for the Map Operation transform, change the settings so that rows
with an input operation code of NORMAL have an output operation code of DELETE.
10. Execute Alpha_Employees_Current_Job with the default execution properties and save all
objects you have created.
11. Return to the data flow workspace and view data for both the source and target tables.
Note that two rows were filtered from the target table.
A solution file called SOLUTION_MapOperation.atl is included in your resource CD. To check
the solution, import the file and open it to view the data flow design and mapping logic. Do
not execute the solution job, as this may override the results in your target table.
Using the Validation transform
Introduction
The Validation transform enables you to create validation rules and move data into target
objects based on whether they pass or fail validation.
After completing this unit, you will be able to:
• Use the Validation transform
Explaining the Validation transform
Use the Validation transform in your data flows when you want to ensure that the data at any
stage in the data flow meets your criteria.
For example, you can set the transform to ensure that all values:
• Are within a specific range
• Have the same format
• Do not contain NULL values
The Validation transform allows you to define a re-usable business rule to validate each record
and column. The Validation transform qualifies a data set based on rules for input schema
columns. It filters out or replaces data that fails your criteria. The available outputs are pass
and fail. You can have one validation rule per column.
For example, if you want to load only sales records for October 2007, you would set up a
validation rule that states: Sales Date is between 10/1/2007 and 10/31/2007. Data Services looks
at this date field in each record to validate whether the data meets this requirement. If it does
not, you can choose to pass the record into a Fail table, correct it in the Pass table, or do both.
Your validation rule consists of a condition and an action on failure:
• Use the condition to describe what you want for your valid data.
For example, specify the condition IS NOT NULL if you do not want any NULLS in data
passed to the specified target.
• Use the Action on Failure area to describe what happens to invalid or failed data.
Continuing with the example above, for any NULL values, you may want to select the Send
to Fail option to send all NULL values to a specified FAILED target table.
You can also create a custom Validation function and select it when you create a validation
rule. For more information on creating custom Validation functions, see "Validation
Transform", Chapter 12 in the Data Services Reference Guide.
The next section gives a brief description of the function, data input requirements, options, and
data output results for the Validation transform. For more information on the Validation
transform, see "Transforms", Chapter 5 in the Data Services Reference Guide.
Input/Output
Only one source is allowed as a data input for the Validation transform.
The Validation transform outputs up to two different data sets based on whether the records
pass or fail the validation condition you specify. You can load pass and fail data into multiple
targets.
The Pass output schema is identical to the input schema. Data Services adds the following two
columns to the Fail output schemas:
• The DI_ERRORACTION column indicates where failed data was sent in this way:
○ The letter B is used for sent to both Pass and Fail outputs.
○ The letter F is used for sent only to the Fail output.
If you choose to send failed data to the Pass output, Data Services does not track the results.
You may want to substitute a value for failed data that you send to the Pass output because
Data Services does not add columns to the Pass output.
• The DI_ERRORCOLUMNS column displays all error messages for columns with failed
rules. The names of input columns associated with each message are separated by colons.
For example, “<ValidationTransformName> failed rule(s): c1:c2”.
If a row has conditions set for multiple columns and the Pass, Fail, and Both actions are
specified for the row, then the precedence order is Fail, Both, Pass. For example, if one
column’s action is Send to Fail and the column fails, then the whole row is sent only to the
Fail output. Other actions for other validation columns in the row are ignored.
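The pass/fail split and the added error columns can be sketched in Python, with rows as dictionaries (an illustrative analogy; the column names mirror the description above, but the condition and data are hypothetical):

```python
def validate(rows, column, condition, action="F"):
    """Sketch of a single-column validation rule: rows satisfying the
    condition go to the Pass output; failed rows gain DI_ERRORACTION
    ('F' = Fail output only, 'B' = both outputs) and DI_ERRORCOLUMNS."""
    pass_out, fail_out = [], []
    for row in rows:
        if condition(row[column]):
            pass_out.append(row)
        else:
            fail_out.append(dict(row, DI_ERRORACTION=action,
                                 DI_ERRORCOLUMNS="Validation failed rule(s): " + column))
            if action == "B":
                pass_out.append(row)  # Pass output gets no extra columns
    return pass_out, fail_out

rows = [{"fax": "555-0100"}, {"fax": None}]
good, bad = validate(rows, "fax", lambda v: v is not None)
print(len(good), len(bad))  # 1 1
```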
Options
When you use the Validation transform, you select a column in the input schema and create a
validation rule in the Validation transform editor. The Validation transform offers several
options for creating this validation rule:
Option Description
Enable Validation Turn the validation rule on and off for the
column.
Do not validate when NULL Send all NULL values to the Pass output
automatically. Data Services will not apply the
validation rule on this column when an
incoming value for it is NULL.
Condition Define the condition for the validation rule:
• Operator: select an operator for a Boolean
expression (for example, =, <, >) and enter
the associated value.
• In: specify a list of possible values for a
column.
• Between/and: specify a range of values for
a column.
• Match pattern: enter a pattern of upper and
lowercase alphanumeric characters to
ensure the format of the column is correct.
• Custom validation function: select a
function from a list for validation purposes.
Data Services supports Validation functions
that take one parameter and return an
integer datatype. If a return value is not a
zero, then Data Services processes it as
TRUE.
• Exists in table: specify that a column’s value
must exist in a column in another table. This
option also uses the LOOKUP_EXT
function. You can define the NOT NULL
constraint for the column in the LOOKUP
table to ensure the Exists in table condition
executes properly.
• Custom condition: create more complex
expressions using the function and smart
editors.
Data Services converts substitute values in the
condition to a corresponding column datatype:
integer, decimal, varchar, date, datetime,
timestamp, or time. The Validation transform
requires that you enter some values in specific
formats:
• date (YYYY.MM.DD)
• datetime (YYYY.MM.DD HH24:MI:SS)
• time (HH24:MI:SS)
• timestamp (YYYY.MM.DD HH24:MI:SS.FF)
If, for example, you specify a date as
12-01-2004, Data Services produces an error
because you must enter this date as 2004.12.01.
Action on Fail Define where a record is loaded if it fails the
validation rule:
• Send to Fail
• Send to Pass
• Send to Both
If you choose Send to Pass or Send to Both,
you can choose to substitute a value or
expression for the failed values that are sent
to the Pass output.
To create a validation rule
1. Open the data flow workspace.
2. Add your source object to the workspace.
3. On the Transforms tab of the Local Object Library, click and drag the Validation transform
to the workspace to the right of your source object.
4. Add your target objects to the workspace.
You will require one target object for records that pass validation, and an optional target
object for records that fail validation, depending on the options you select.
5. Connect the source object to the transform.
6. Double-click the Validation transform to open the transform editor.
7. In the input schema area, click to select an input schema column.
8. In the parameters area, select the Enable Validation option.
9. In the Condition area, select a condition type and enter any associated value required.
All conditions must be Boolean expressions.
10. On the Properties tab, enter a name and description for the validation rule.
11. On the Action On Failure tab, select an action.
12. If desired, select the For pass, substitute with option and enter a substitute value or
expression for the failed value that is sent to the Pass output.
This option is only available if you select Send to Pass or Send to Both.
13. Click Back to return to the data flow workspace.
14. Click and drag from the transform to the target object.
15. Release the mouse and select the appropriate label for that object from the pop-up menu.
16. Repeat step 14 and step 15 for all target objects.
Activity: Using the Validation transform
Order data is stored in multiple formats with different structures and different information.
You will use the Validation transform to validate order data from flat file sources and the alpha
orders table before merging it.
Objectives
• Join the data in the Orders flat files with that in the Order_Shippers flat files.
• Use the Validation transform to create a new column on the target table, named
order_assigned_to, so that orders taken by employees who are no longer with the company
are assigned to a default current employee.
• Create a column to hold the employee ID of the employee who originally made the sale.
• Replace null values in the shipper fax column with a value of 'No Fax' and send those rows
to a separate table for follow up.
Instructions
1. Create a file format called Order_Shippers_Format for the flat file
Order_Shippers_04_20_07.txt. Use the structure of the text file to determine the appropriate
settings.
2. In the Column Attributes pane, adjust the datatypes for the columns based on their content:
Column Datatype
ORDERID int
SHIPPERNAME varchar(50)
SHIPPERADDRESS varchar(50)
SHIPPERCITY varchar(50)
SHIPPERCOUNTRY int
SHIPPERPHONE varchar(20)
SHIPPERFAX varchar(20)
SHIPPERREGION int
SHIPPERPOSTALCODE varchar(15)
3. In the Omega project, create a new batch job called Alpha_Orders_Validated_Job and two
data flows, one named Alpha_Orders_Files_DF, and the second named Alpha_Orders_DB_DF.
4. Add the file formats Orders_Format and Order_Shippers_Format as source objects to the
Alpha_Orders_Files_DF data flow workspace.
5. Edit the source objects so that the Orders_Format source is using all three related orders
flat files and the Order_Shippers_Format source is using all three order shippers files.
Tip: You can use a wildcard to replace the dates in the file names.
6. Add a Query transform to the workspace and connect it to the two source objects.
7. In the transform editor for the Query transform, create a WHERE clause to join the data on
the OrderID values.
The expression should be as follows:
Order_Shippers_Format.ORDERID = Orders_Format.ORDERID
8. Add the following mappings in the Query transform:
Schema Out Mapping
ORDERID Orders_Format.ORDERID
CUSTOMERID Orders_Format.CUSTOMERID
ORDERDATE Orders_Format.ORDERDATE
SHIPPERNAME Order_Shippers_Format.SHIPPERNAME
SHIPPERADDRESS Order_Shippers_Format.SHIPPERADDRESS
SHIPPERCITY Order_Shippers_Format.SHIPPERCITY
SHIPPERCOUNTRY Order_Shippers_Format.SHIPPERCOUNTRY
SHIPPERPHONE Order_Shippers_Format.SHIPPERPHONE
SHIPPERFAX Order_Shippers_Format.SHIPPERFAX
SHIPPERREGION Order_Shippers_Format.SHIPPERREGION
SHIPPERPOSTALCODE Order_Shippers_Format.SHIPPERPOSTALCODE
9. Insert a new output column above ORDERDATE called ORDER_TAKEN_BY with a datatype
of varchar(15) and map it to Orders_Format.EMPLOYEEID.
10. Insert a new output column above ORDERDATE called ORDER_ASSIGNED_TO with a datatype
of varchar(15) and map it to Orders_Format.EMPLOYEEID.
11. Add a Validation transform to the right of the Query transform and connect the transforms.
12. In the transform editor for the Validation transform, enable validation for the
ORDER_ASSIGNED_TO column to verify the value in the column exists in the EMPLOYEEID
column of the Employee table in the HR_datamart datastore.
The expression should be as follows:
HR_datamart.hr_datamart.employee.EMPLOYEEID
13. Set the action on failure for the Order_Assigned_To column to send to both pass and fail.
For pass, substitute '3Cla5' to assign it to the default employee.
14. Enable validation for the SHIPPERFAX column to send NULL values to both pass and fail,
substituting 'No Fax' for pass.
15. Add two target tables in the Delta datastore as targets, one called Orders_Files_Work and
one called Orders_Files_No_Fax.
16. Connect the pass output from the Validation transform to Orders_Files_Work and the fail
output to Orders_Files_No_Fax.
17. In the Alpha_Orders_DB_DF workspace, add the Orders table from the Alpha datastore as
the source object.
18. Add a Query transform to the workspace and connect it to the source.
19. In the transform editor for the Query transform, define the following mappings:
Column Mapping
ORDERID Orders.ORDERID
CUSTOMERID Orders.CUSTOMERID
ORDERDATE Orders.ORDERDATE
SHIPPERNAME Orders.SHIPPERNAME
SHIPPERADDRESS Orders.SHIPPERADDRESS
SHIPPERCITY Orders.SHIPPERCITYID
SHIPPERCOUNTRYID Orders.SHIPPERCOUNTRY
SHIPPERPHONE Orders.SHIPPERPHONE
SHIPPERFAX Orders.SHIPPERFAX
SHIPPERREGION Orders.SHIPPERREGION
SHIPPERPOSTALCODE Orders.SHIPPERPOSTALCODE
20. Insert a new output column above ORDERDATE called ORDER_TAKEN_BY with a data type
of varchar(10) and map it to Orders.EMPLOYEEID.
21. Insert a new output column above ORDERDATE called ORDER_ASSIGNED_TO with a data
type of varchar(10) and map it to Orders.EMPLOYEEID.
22. Add a Validation transform to the right of the Query transform and connect the transforms.
23. Enable validation for Order_Assigned_To to verify the column value exists in the
EMPLOYEEID column of the Employee table in the HR_datamart datastore.
24. Set the action on failure for the Order_Assigned_To column to send to both pass and fail.
For pass, substitute '3Cla5' to assign it to the default employee.
25. Enable validation for the ShipperFax column to send NULL values to both pass and fail,
substituting 'No Fax' for pass.
26. Add two target tables in the Delta datastore as targets, one named Orders_DB_Work and
one named Orders_DB_No_Fax.
27. Connect the pass output from the Validation transform to Orders_DB_Work and the fail
output to Orders_DB_No_Fax.
28. Execute Alpha_Orders_Validated_Job with the default execution properties and save all
objects you have created.
29. View the data in the target tables to view the differences between passing and failing records.
A solution file called SOLUTION_Validation.atl is included on your resource CD. To check the
solution, import the file and open it to view the data flow design and mapping logic. Do not
execute the solution job, as this may overwrite the results in your target table.
Using the Merge transform
Introduction
The Merge transform allows you to combine multiple sources with the same schema into a
single target.
After completing this unit, you will be able to:
• Use the Merge transform
Explaining the Merge transform
The Merge transform combines incoming data sets with the same schema structure to produce
a single output data set with the same schema as the input data sets.
For example, you could use the Merge transform to combine two sets of address data:
The next section gives a brief description of the function, data input requirements, options, and
data output results for the Merge transform. For more information on the Merge transform see
“Transforms” Chapter 5 in the Data Services Reference Guide.
Input/Output
The Merge transform performs a union of the sources. All sources must have the same schema
as shown in the diagram below, including:
• Number of columns
• Column names
• Column datatypes
If the input data set contains hierarchical data, the names and datatypes must match at every
level of the hierarchy.
The output data has the same schema as the source data. The output data set contains a row
for every row in the source data sets. The transform does not strip out duplicate rows. If columns
in the input set contain nested schemas, the nested data is passed through without change.
Tip: If you want to merge tables that do not have the same schema, you can add the Query
transform to one of the tables before the Merge transform to redefine the schema to match the
other table.
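As a rough illustration of these semantics (a Python sketch, not Data Services code; the sample rows are invented), the Merge transform behaves like a UNION ALL that insists on identical schemas:

```python
# The Merge transform concatenates rows from all inputs: duplicates are NOT
# removed, and every input must share one schema (names, count, datatypes).

def merge(*inputs):
    schemas = {tuple(row.keys()) for rows in inputs for row in rows}
    if len(schemas) > 1:                 # column names/count must match
        raise ValueError("all inputs must have the same schema")
    return [row for rows in inputs for row in rows]

us = [{"NAME": "Ann", "COUNTRY": "US"}]
ca = [{"NAME": "Bob", "COUNTRY": "CA"}, {"NAME": "Ann", "COUNTRY": "US"}]
merged = merge(us, ca)   # 3 rows: the duplicate 'Ann' row is kept
```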
Options
The Merge transform does not offer any options.
Activity: Using the Merge transform
The Orders data has now been validated, but the output is from two different sources: flat files
and database tables. The next step in the process is to modify the structure of those data sets
so that they match, and then merge them into a single data set.
Objectives
• Use the Query transforms to modify any column names and data types and to perform
lookups for any columns that reference other tables.
• Use the Merge transform to merge the validated orders data.
Instructions
1. In the Omega project, create a new batch job called Alpha_Orders_Merged_Job with a data
flow called Alpha_Orders_Merged_DF.
2. In the data flow workspace, add the orders_file_work and orders_db_work tables from the
Delta datastore as the source objects.
3. Add two Query transforms to the data flow, connecting each source object to its own Query
transform.
4. In the transform editor for the Query transform connected to the orders_files_work table,
map all columns from input to output.
5. Change the datatype for the following columns as specified:
Column Type
ORDER_TAKEN_BY varchar(15)
ORDER_ASSIGNED_TO varchar(15)
ORDERDATE datetime
SHIPPERADDRESS varchar(100)
SHIPPERCOUNTRY varchar(50)
SHIPPERREGION varchar(50)
SHIPPERPOSTALCODE varchar(50)
6. For the SHIPPERCOUNTRY column, change the mapping to perform a lookup of
CountryName from the Country table in the Alpha datastore.
The expression should be as follows:
lookup_ext([Alpha.alpha.country,'PRE_LOAD_CACHE','MAX'],
[COUNTRYNAME],[NULL],[COUNTRYID,'=',orders_file_work.SHIPPERCOUNTRY]) SET
("run_as_separate_process"='no')
7. For the SHIPPERREGION column, change the mapping to perform a lookup of RegionName
from the Region table in the Alpha datastore.
The expression should be as follows:
lookup_ext([Alpha.alpha.region,'PRE_LOAD_CACHE','MAX'],
[REGIONNAME],[NULL],[REGIONID,'=',orders_file_work.SHIPPERREGION]) SET
("run_as_separate_process"='no')
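The lookup_ext calls above, with PRE_LOAD_CACHE, effectively load the lookup table into memory once and probe it per input row, returning NULL when there is no match. A rough Python equivalent (the country rows are invented sample data):

```python
# Rough equivalent of lookup_ext([...country,'PRE_LOAD_CACHE','MAX'],
# [COUNTRYNAME],[NULL],[COUNTRYID,'=',<input column>]): cache the lookup
# table once, then return the matched name or NULL (None) per row.
# The country rows below are invented sample data.

country = [{"COUNTRYID": 1, "COUNTRYNAME": "Germany"},
           {"COUNTRYID": 2, "COUNTRYNAME": "France"}]

cache = {r["COUNTRYID"]: r["COUNTRYNAME"] for r in country}  # PRE_LOAD_CACHE

def lookup_country(shipper_country_id):
    return cache.get(shipper_country_id)   # NULL default when no match

names = [lookup_country(i) for i in (2, 1, 99)]
# → ['France', 'Germany', None]
```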
8. In the transform editor for the Query transform connected to the orders_db_work table,
map all columns from input to output.
9. Change the datatype for the following columns as specified:
Column Type
SHIPPERCOUNTRY varchar(50)
SHIPPERREGION varchar(50) 10. For the SHIPPERCITY column, change the mapping to perform a lookup of CityName from
the City table in the Alpha datastore.
The expression should be as follows:
lookup_ext([Alpha.alpha.city,'PRE_LOAD_CACHE','MAX'],
[CITYNAME],[NULL],[CITYID,'=',orders_db_work.SHIPPERCITYID]) SET
("run_as_separate_process"='no')
11. For the SHIPPERCOUNTRY column, change the mapping to perform a lookup of
CountryName from the Country table in the Alpha datastore.
The expression should be as follows:
lookup_ext([Alpha.alpha.country,'PRE_LOAD_CACHE','MAX'],
[COUNTRYNAME],[NULL],[COUNTRYID,'=',orders_db_work.SHIPPERCOUNTRYID]) SET
("run_as_separate_process"='no')
12. For the SHIPPERREGIONID column, change the mapping to perform a lookup of
RegionName from the Region table in the Alpha datastore.
The expression should be as follows:
lookup_ext([Alpha.alpha.region,'PRE_LOAD_CACHE','MAX'],
[REGIONNAME],[NULL],[REGIONID,'=',orders_db_work.SHIPPERREGIONID]) SET
("run_as_separate_process"='no')
13. Add a Merge transform to the data flow and connect both Query transforms to the Merge
transform.
14. Add a template table called Orders_Merged in the Delta datastore as the target table and
connect it to the Merge transform.
15. Execute Alpha_Orders_Merged_Job with the default execution properties and save all objects
you have created.
16. View the data in the target table.
Note that the SHIPPERCITY, SHIPPERCOUNTRY, and SHIPPERREGION columns for the
363 records in the template table now consistently contain names rather than ID values.
A solution file called SOLUTION_Merge.atl is included on your resource CD. To check the
solution, import the file and open it to view the data flow design and mapping logic. Do not
execute the solution job, as this may overwrite the results in your target table.
Using the Case transform
Introduction
The Case transform supports separating data from a source into multiple targets based on
branch logic.
After completing this unit, you will be able to:
• Use the Case transform
Explaining the Case transform
You use the Case transform to simplify branch logic in data flows by consolidating case or
decision-making logic into one transform. The transform allows you to split a data set into
smaller sets based on logical branches.
For example, you can use the Case transform to read a table that contains sales revenue facts
for different regions and separate the regions into their own tables for more efficient data access:
The next section gives a brief description of the function, data input requirements, options, and
data output results for the Case transform. For more information on the Case transform, see
“Transforms” Chapter 5 in the Data Services Reference Guide.
Input/Output
Only one data flow source is allowed as a data input for the Case transform. Depending on the
data, only one of multiple branches is executed per row. The input and output schemas are
identical when using the Case transform.
The connections between the Case transform and objects used for a particular case must be
labeled. Each output label in the Case transform must be used at least once.
You connect the output of the Case transform with another object in the workspace. Each label
represents a case expression (WHERE clause).
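The routing behavior can be modeled in a few lines of Python (a simplified sketch, not Data Services code; the labels and conditions are illustrative):

```python
# Simplified model of a Case transform: each row goes to the output(s) whose
# expression is true. With "Row can be TRUE for one case only", the first
# matching label wins; unmatched rows go to an optional default label.

def case_split(rows, cases, true_for_one_only=True, default_label=None):
    targets = {label: [] for label, _ in cases}
    if default_label:
        targets[default_label] = []
    for row in rows:
        matched = False
        for label, cond in cases:
            if cond(row):
                targets[label].append(row)
                matched = True
                if true_for_one_only:
                    break
        if not matched and default_label:
            targets[default_label].append(row)
    return targets

rows = [{"REGIONID": 1}, {"REGIONID": 2}, {"REGIONID": 1}]
out = case_split(rows, [("East", lambda r: r["REGIONID"] == 1),
                        ("West", lambda r: r["REGIONID"] == 2)])
# out["East"] receives two rows, out["West"] one
```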
Options
The Case transform offers several options:
Option Description
Label
Define the name of the connection that
describes where data will go if the
corresponding Case condition is true.
Expression Define the Case expression for the
corresponding label.
Produce default option with label
Specify that the transform must use the
expression in this label when all other Case
expressions evaluate to false.
Row can be TRUE for one case only
Specify that the transform passes each row to
the first case whose expression returns true.
To create a case statement
1. Open the data flow workspace.
2. Add your source object to the workspace.
3. On the Transforms tab of the Local Object Library, click and drag the Case transform to the
workspace to the right of your source object.
4. Add your target objects to the workspace.
You will require one target object for each possible condition in the case statement.
5. Connect the source object to the transform.
6. Double-click the Case transform to open the transform editor.
7. In the parameters area of the transform editor, click Add to add a new expression.
8. In the Label field, enter a label for the expression.
9. Click and drag an input schema column to the Expression pane at the bottom of the window.
10. Enter the rest of the expression to define the condition.
For example, to specify that you want all Customers with a RegionID of 1, create the following
statement: Customer.RegionID = 1
11. Repeat step 7 to step 10 for all expressions.
12. To direct records that do not meet any defined conditions to a separate target object, select
the Produce default option with label option and enter the label name in the associated
field.
13. To direct records that meet multiple conditions to only one target, select the Row can be
TRUE for one case only option.
In this case, records are placed in the target associated with the first condition that evaluates
as true.
14. Click Back to return to the data flow workspace.
15. Connect the transform to the target object.
16. Release the mouse and select the appropriate label for that object from the pop-up menu.
17. Repeat step 15 and step 16 for all target objects.
Activity: Using the Case transform
Once the orders have been validated and merged, the resulting data set must be split out by
quarter for reporting purposes.
Objective
• Use the Case transform to create separate tables for orders occurring in fiscal quarters 3 and
4 for the year 2007 and quarter 1 of 2008.
Instructions
1. In the Omega project, create a new batch job called Alpha_Orders_By_Quarter_Job with a
data flow named Alpha_Orders_By_Quarter_DF.
2. In the data flow workspace, add the Orders_Merged table from the Delta datastore as the
source object.
3. Add a Query transform to the data flow and connect it to the source table.
4. In the transform editor for the Query transform, map all columns from input to output.
5. Add the following two output columns:
Column Type Mapping
ORDERQUARTER int quarter(orders_merged.ORDERDATE)
ORDERYEAR varchar(4) to_char(orders_merged.ORDERDATE, 'YYYY')
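The quarter() and to_char(..., 'YYYY') mappings can be mirrored in Python to sanity-check which Case label a given order date should reach (the sample date is an assumption):

```python
from datetime import date

# Mirror of quarter(ORDERDATE) and to_char(ORDERDATE, 'YYYY') as used in
# the Query transform above, for checking Case label routing (sample date).

def order_quarter(d: date) -> int:
    return (d.month - 1) // 3 + 1

def order_year(d: date) -> str:
    return f"{d.year:04d}"

d = date(2007, 2, 14)
print(order_quarter(d), order_year(d))   # a Q1 2007 order → label Q12007
```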
6. Add a Case transform to the data flow and connect it to the Query transform.
7. In the transform editor for the Case transform, create the following labels and associated
expressions:
Label Expression
Q42006 Query.ORDERYEAR = '2006' and
Query.ORDERQUARTER = 4
Q12007 Query.ORDERYEAR = '2007' and
Query.ORDERQUARTER = 1
Q22007 Query.ORDERYEAR = '2007' and
Query.ORDERQUARTER = 2
Q32007 Query.ORDERYEAR = '2007' and
Query.ORDERQUARTER = 3
Q42007 Query.ORDERYEAR = '2007' and
Query.ORDERQUARTER = 4
8. Choose the settings to not produce a default output set for the Case transform and to specify
that rows can be true for one case only.
9. Add five template tables in the Delta datastore called Orders_Q4_2006, Orders_Q1_2007,
Orders_Q2_2007, Orders_Q3_2007, and Orders_Q4_2007.
10. Connect the output from the Case transform to the target tables selecting the corresponding
labels.
11. Execute Alpha_Orders_By_Quarter_Job with the default execution properties and save all
objects you have created.
12. View the data in the target tables and confirm that there are 103 orders that were placed in
Q1 of 2007.
A solution file called SOLUTION_Case.atl is included on your resource CD. To check the
solution, import the file and open it to view the data flow design and mapping logic. Do not
execute the solution job, as this may overwrite the results in your target table.
Using the SQL transform
Introduction
The SQL transform allows you to submit SQL commands to generate data to be moved into
target objects.
After completing this unit, you will be able to:
• Use the SQL transform
Explaining the SQL transform
Use this transform to perform standard SQL operations when other built-in transforms cannot
perform them.
The SQL transform can be used to extract data using general SELECT statements as well as
stored procedures and views.
You can use the SQL transform as a replacement for the Merge transform when you are dealing
with database tables only. The SQL transform performs more efficiently because the merge is
pushed down to the database. However, you cannot use this functionality if your source objects
include file formats.
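The push-down described above amounts to letting the database execute a single UNION ALL itself. A small sqlite3 sketch of the idea (table names and rows are invented for illustration):

```python
import sqlite3

# When all sources are database tables, the merge can be expressed as one
# UNION ALL executed by the database, which is what makes the SQL transform
# more efficient than the Merge transform here. Sample tables/rows invented.

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders_a (ORDERID INT, CUSTOMERID INT)")
con.execute("CREATE TABLE orders_b (ORDERID INT, CUSTOMERID INT)")
con.executemany("INSERT INTO orders_a VALUES (?, ?)", [(1, 10), (2, 20)])
con.executemany("INSERT INTO orders_b VALUES (?, ?)", [(3, 30)])

rows = con.execute(
    "SELECT ORDERID, CUSTOMERID FROM orders_a "
    "UNION ALL "
    "SELECT ORDERID, CUSTOMERID FROM orders_b"
).fetchall()
# 3 rows total; the database performs the merge, not the engine
```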
The next section gives a brief description of the function, data input requirements, options, and
data output results for the SQL transform. For more information on the SQL transform see
“Transforms” Chapter 5 in the Data Services Reference Guide.
Inputs/Outputs
There is no input data set for the SQL transform.
There are two ways of defining the output schema for a SQL transform if the SQL submitted
is expected to return a result set:
• Automatic — After you type the SQL statement, click Update schema to execute a select
statement against the database that obtains column information returned by the select
statement and populates the output schema.
• Manual — Output columns must be defined in the output portion of the SQL transform if
the SQL operation is returning a data set. The number of columns defined in the output of
the SQL transform must equal the number of columns returned by the SQL query, but the
column names and data types of the output columns do not need to match the column names
or data types in the SQL query.
Options
The SQL transform has the following options:
Option Description
Datastore Specify the datastore for the tables referred to in the SQL statement.
Database type Specify the type of database for the datastore where there are
multiple datastore configurations.
Join rank
Indicate the weight of the output data set if the data set is used in
a join. The highest ranked source is accessed first to construct the
join.
Array fetch size Indicate the number of rows retrieved in a single request to a source
database. The default value is 1000.
Cache
Hold the output from this transform in memory for use in
subsequent transforms. Use this only if the data set is small enough
to fit in memory.
SQL text Enter the text of the SQL query.
To create a SQL statement
1. Open the data flow workspace.
2. On the Transforms tab of the Local Object Library, click and drag the SQL transform to the
workspace.
3. Add your target object to the workspace.
4. Connect the transform to the target object.
5. Double-click the SQL transform to open the transform editor.
6. In the parameters area, select the source datastore from the Datastore drop-down list.
7. If there is more than one datastore configuration, select the appropriate configuration from
the Database type drop-down list.
8. Change the other available options, if required.
9. In the SQL text area, enter the SQL statement.
For example, to copy the entire contents of a table into the target object, you would use the
following statement: Select * from Customers.
10. Click Update Schema to update the output schema with the appropriate values.
If required, you can change the names and datatypes of these columns. You can also create
the output columns manually.
11. Click Back to return to the data flow workspace.
12. Click and drag from the transform to the target object.
Activity: Using the SQL transform
The contents of the Employee and Department tables must be merged, which can be done using
the SQL transform as a shortcut.
Objective
• Use the SQL transform to select employee and department data.
Instructions
1. In the Omega project, create a new batch job called Alpha_Employees_Dept_Job with a data
flow called Alpha_Employees_Dept_DF.
2. In the data flow workspace, add the SQL transform as the source object.
3. Add the Emp_Dept table from the HR_datamart datastore as the target object, and connect
the transform to it.
4. In the transform editor for the SQL transform, specify the appropriate datastore name and
database type for the Alpha datastore.
5. Create a SQL statement to select the last name and first name for the employee from the
Employee table and the department in which the employee belongs by looking up the value
in the Department table based on the Department ID.
The expression should be as follows:
select employee.EMPLOYEEID, employee.LASTNAME, employee.FIRSTNAME,
department.DEPARTMENTNAME from Alpha.employee, Alpha.department where
employee.DEPARTMENTID = department.DEPARTMENTID
6. Update the output schema based on your SQL statement.
7. Set the EMPLOYEEID column as the primary key.
8. Execute Alpha_Employees_Dept_Job with the default execution properties and save all objects you
have created.
9. Return to the data flow workspace and view data for the target table.
You should have 40 rows in your target table: 8 employees in the Employee table have
department IDs that are not defined in the Department table, so the join excludes those rows.
A solution file called SOLUTION_SQL.atl is included on your resource CD. To check the
solution, import the file and open it to view the data flow design and mapping logic. Do not
execute the solution job, as this may overwrite the results in your target table.
Quiz: Using platform transforms
1. What would you use to change a row type from NORMAL to INSERT?
2. What is the Case transform used for?
3. Name the transform that you would use to combine incoming data sets to produce a single
output data set with the same schema as the input data sets.
4. A validation rule consists of a condition and an action on failure. When can you use the
action on failure options in the validation rule?
5. When would you use the Merge transform versus the SQL transform to merge records?
Lesson summary
After completing this lesson, you are now able to:
• Describe platform transforms
• Use the Map Operation transform
• Use the Validation transform
• Use the Merge transform
• Use the Case transform
• Use the SQL transform
Lesson 7
Setting up Error Handling
Lesson introduction
For sophisticated error handling, you can use recoverable work flows and try/catch blocks to
recover data.
After completing this lesson, you will be able to:
• Set up recoverable work flows
Using recovery mechanisms
Introduction
If a Data Services job does not complete properly, you must resolve the problems that prevented
the successful execution of the job.
After completing this unit, you will be able to:
• Explain how to avoid data recovery situations
• Explain the levels of data recovery strategies
• Recover a failed job using automatic recovery
• Recover missing values and rows
• Define alternative work flows
Avoiding data recovery situations
The best solution to data recovery situations is obviously not to get into them in the first place.
Some of those situations are unavoidable, such as server failures. Others, however, can easily
be sidestepped by constructing your jobs so that they take into account the issues that frequently
cause them to fail.
One example is when an external file is required to run a job. In this situation, you could use
the wait_for_file function or a while loop and the file_exists function to check that the file exists
in a specified location before executing the job.
While loops
The while loop is a single-use object that you can use in a work flow. The while loop repeats a
sequence of steps as long as a condition is true.
Typically, the steps done during the while loop result in a change in the condition so that the
condition is eventually no longer satisfied and the work flow exits from the while loop. If the
condition does not change, the while loop does not end.
For example, you might want a work flow to wait until the system writes a particular file. You
can use a while loop to check for the existence of the file using the file_exists function. As long
as the file does not exist, you can have the work flow go into sleep mode for a particular length
of time before checking again.
Because the system might never write the file, you must add another check to the loop, such
as a counter, to ensure that the while loop eventually exits. In other words, change the while
loop to check for the existence of the file and the value of the counter. As long as the file does
not exist and the counter is less than a particular value, repeat the while loop. In each iteration
of the loop, put the work flow in sleep mode and then increment the counter.
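The loop described above can be sketched in Python (the function name, path, and limits are illustrative; Data Services provides its own file_exists and sleep functions):

```python
import os
import time

# Sketch of the pattern above: wait for a file to appear, but cap the number
# of checks with a counter so the loop always exits (path/limits illustrative).

def wait_for_file(path, interval_sec=1.0, max_checks=10):
    checks = 0
    while not os.path.exists(path) and checks < max_checks:
        time.sleep(interval_sec)   # "sleep mode" between checks
        checks += 1                # counter guarantees the loop eventually exits
    return os.path.exists(path)
```

The return value tells the caller whether to proceed with the job or handle the missing file as an error.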
Describing levels of data recovery strategies
When a job fails to complete successfully during execution, some data flows may not have
completed. When this happens, some tables may have been loaded, partially loaded, or altered.
You need to design your data movement jobs so that you can recover your data by rerunning
the job and retrieving all the data without introducing duplicate or missing data.
There are different levels of data recovery and recovery strategies. You can:
• Recover your entire database: Use your standard RDBMS services to restore a crashed
database. This option is outside the scope of this course.
• Recover a partially-loaded job: Use automatic recovery.
• Recover from partially-loaded tables: Use the Table Comparison transform, do a full
replacement of the target, use the auto-correct load feature, or include a preload SQL command
to avoid loading duplicate rows when recovering from partially-loaded tables.
• Recover missing values or rows: Use the Validation transform or the Query transform with
WHERE clauses to identify missing values, and use overflow files to manage rows that
could not be inserted.
• Define alternative work flows: Use conditionals, try/catch blocks, and scripts to ensure all
exceptions are managed in a work flow.
Depending on the relationships between data flows in your application, you may use a
combination of these techniques to recover from exceptions.
Note: It is important to note that some recovery mechanisms are for use in production systems and
are not supported in development environments.
Configuring work flows and data flows
In some cases, steps in a work flow depend on each other and must be executed together. When
there is a dependency like this, you should designate the work flow as a recovery unit. This
requires the entire work flow to complete successfully. If the work flow does not complete
successfully, Data Services executes the entire work flow during recovery, including the steps
that executed successfully in prior work flow runs.
Conversely, you may need to specify that a work flow or data flow should only execute once.
When this setting is enabled, the job never re-executes that object. It is not recommended to
mark a work flow or data flow as “Execute only once” if the parent work flow is a recovery
unit.
To specify a work flow as a recovery unit
1. In the project area or on the Work Flows tab of the Local Object Library, right-click the work
flow and select Properties from the menu.
The Properties dialog box displays.
2. On the General tab, select the Recover as a unit check box.
3. Click OK.
To specify that an object executes only once
1. In the project area or on the appropriate tab of the Local Object Library, right-click the work
flow or data flow and select Properties from the menu.
The Properties dialog box displays.
2. On the General tab, select the Execute only once check box.
3. Click OK.
Using recovery mode
If a job with automated recovery enabled fails during execution, you can execute the job again
in recovery mode. During recovery mode, Data Services retrieves the results for
successfully-completed steps and reruns uncompleted or failed steps under the same conditions
as the original job.
In recovery mode, Data Services executes the steps or recovery units that did not complete
successfully in a previous execution. This includes steps that failed and steps that generated
an exception but completed successfully, such as those in a try/catch block. As in normal job
execution, Data Services executes the steps in parallel if they are not connected in the work
flow diagrams and in serial if they are connected.
For example, suppose a daily update job running overnight successfully loads dimension tables
in a warehouse. However, while the job is running, the database log overflows and stops the
job from loading fact tables. The next day, you truncate the log file and run the job again in
recovery mode. The recovery job does not reload the dimension tables, because the original
job, even though it failed, loaded the dimension tables successfully.
To ensure that the fact tables are loaded with the data that corresponds properly to the data
already loaded in the dimension tables, ensure the following:
• Your recovery job must use the same extraction criteria that your original job used when
loading the dimension tables.
If your recovery job uses new extraction criteria, such as basing data extraction on the current
system date, the data in the fact tables will not correspond to the data previously extracted
into the dimension tables.
If your recovery job uses new values, the job execution may follow a completely different
path through conditional steps or try/catch blocks.
• Your recovery job must follow the exact execution path that the original job followed. Data
Services records any external inputs to the original job so that your recovery job can use
these stored values and follow the same execution path.
To enable automatic recovery in a job

1. In the project area, right-click the job and select Execute from the menu.
The Execution Properties dialog box displays.
2. On the Parameters tab, select the Enable recovery check box.
If this check box is not selected, Data Services does not record the results from the steps
during the job and cannot recover the job if it fails.
3. Click OK.

To recover from last execution
1. In the project area, right-click the job that failed and select Execute from the menu.
The Execution Properties dialog box displays.
2. On the Parameters tab, select the Recover from last execution check box.
This option is not available when a job has not yet been executed, the previous job run
succeeded, or recovery mode was disabled during the previous run.
3. Click OK.
Recovering from partially-loaded data

Executing a failed job again may result in duplication of rows that were loaded successfully
during the first job run.
Within your recoverable work flow, you can use several methods to ensure that you do not
insert duplicate rows:
• Include the Table Comparison transform (available in Data Integrator packages only) in
your data flow when you have tables with more rows and fewer fields, such as fact tables.
• Change the target table options to completely replace the target table during each execution.
This technique can be optimal when the changes to the target table are numerous compared
to the size of the table.
• Change the target table options to use the auto-correct load feature when you have tables
with fewer rows and more fields, such as dimension tables. The auto-correct load checks
the target table for existing rows before adding new rows to the table. Using the auto-correct
load option, however, can slow jobs executed in non-recovery mode. Consider this technique
when the target table is large and the changes to the table are relatively few.
• Include a SQL command to execute before the table loads. Preload SQL commands can
remove partial database updates that occur during incomplete execution of a step in a job.
Typically, the preload SQL command deletes rows based on a variable that is set before the
partial insertion step began.
For more information on preloading SQL commands, see “Using preload SQL to allow
re-executable Data Flows”, Chapter 18 in the Data Services Designer Guide.
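The preload-SQL idea can be sketched as follows: before re-running a load, delete any rows written by the failed attempt, identified by a batch or load id that was set before the insertion step began. The table and column names here are invented for illustration, and the sketch only builds the statement text.

```python
# Hypothetical sketch of a preload SQL command that removes a partial
# load. In Data Services this statement would run before the table loads;
# here we just construct it.

def preload_sql(table, load_id_column, load_id):
    """Build a preload statement that deletes rows from the failed run."""
    return f"DELETE FROM {table} WHERE {load_id_column} = {load_id}"

stmt = preload_sql("sales_fact", "load_id", 42)
```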
Recovering missing values or rows
Missing values that are introduced into the target data during data integration and data quality
processes can be managed using the Validation or Query transforms.
Missing rows are rows that cannot be inserted into the target table. For example, rows may be
missing in instances where a primary key constraint is violated. Overflow files help you process
this type of data problem.
When you specify an overflow file and Data Services cannot load a row into a table, Data
Services writes the row to the overflow file instead. The trace log indicates the data flow in
which the load failed and the location of the file.
You can use the overflow information to identify invalid data in your source or problems
introduced in the data movement. Every new run will overwrite the existing overflow file.
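The overflow-file behavior can be modeled with a minimal sketch. The in-memory dictionary stands in for a target table with a primary key constraint, and the overflow list stands in for the file; the real feature writes failed rows to a file on disk.

```python
# Conceptual model of overflow handling: rows that cannot be loaded are
# diverted instead of failing the whole data flow.

def load_with_overflow(rows, target, key):
    overflow = []
    for row in rows:
        if row[key] in target:      # e.g. primary key violation
            overflow.append(row)    # divert the row instead of aborting
        else:
            target[row[key]] = row
    return overflow

target = {1: {"id": 1, "name": "existing"}}
rows = [{"id": 1, "name": "dup"}, {"id": 2, "name": "new"}]
overflow = load_with_overflow(rows, target, "id")
```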
To use an overflow file in a job

1. Open the target table editor for the target table in your data flow.
2. On the Options tab, under Error handling, select the Use overflow file check box.
3. In the File name field, enter or browse to the full path and file name for the file.
When you specify an overflow file, give a full path name to ensure that Data Services creates
a unique file when more than one file is created in the same job.
4. In the File format drop-down list, select what you want Data Services to write to the file
about the rows that failed to load:
• If you select Write data, you can use Data Services to specify the format of the
error-causing records in the overflow file.
• If you select Write sql, you can use the commands to load the target manually when the
target is accessible.
Defining alternative work flows

You can set up your jobs to use alternative work flows that cover all possible exceptions and
have recovery mechanisms built in. This technique allows you to automate the process of
recovering your results.
Alternative work flows consist of several components:
1. A script to determine if recovery is required.
This script reads the value in a status table and populates a global variable with the same
value. The initial value in the table is set to indicate that recovery is not required.
2. A conditional that calls the appropriate work flow based on whether recovery is required.
The conditional contains an If/Then/Else statement to specify that work flows that do not
require recovery are processed one way, and those that do require recovery are processed
another way.
3. A work flow with a try/catch block to execute a data flow without recovery.
The data flow where recovery is not required is set up without the auto correct load option
set. This ensures that, wherever possible, the data flow is executed in a less resource-intensive
mode.
4. A script in the catch object to update the status table.
The script specifies that recovery is required if any exceptions are generated.

5. A work flow to execute a data flow with recovery and a script to update the status table.
The data flow is set up for more resource-intensive processing that will resolve the exceptions.
The script updates the status table to indicate that recovery is not required.
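The five components above can be sketched as control logic. This is an illustrative model only: an in-memory flag stands in for the recovery status table, and the load functions stand in for the two versions of the data flow.

```python
# Sketch of the alternative work flow: read the flag, try the cheap load
# first; on exception, flag recovery so the next run takes the
# auto-correct path and then resets the flag.

status_table = {"recovery_flag": 0}

def run_job(normal_load, recovery_load):
    recovery_needed = status_table["recovery_flag"]   # GetStatus script
    if recovery_needed == 0:
        try:
            normal_load()                             # no auto-correct
        except Exception:
            status_table["recovery_flag"] = 1         # catch script
            raise
    else:
        recovery_load()                               # auto-correct path
        status_table["recovery_flag"] = 0             # Pass script

def failing_load():
    raise RuntimeError("primary key violation")

def auto_correct_load():
    pass  # resolves duplicates, so it succeeds

try:
    run_job(failing_load, auto_correct_load)
except RuntimeError:
    pass
flag_after_failure = status_table["recovery_flag"]
run_job(failing_load, auto_correct_load)
flag_after_recovery = status_table["recovery_flag"]
```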
Conditionals

Conditionals are single-use objects used to implement conditional logic in a work flow. When
you define a conditional, you must specify a condition and two logical branches:
• If: A Boolean expression that evaluates to TRUE or FALSE. You can use functions, variables,
and standard operators to construct the expression.
• Then: Work flow element to execute if the IF expression evaluates to TRUE.
• Else: Work flow element to execute if the IF expression evaluates to FALSE.
Both the Then and Else branches of the conditional can contain any object that you can have in
a work flow, including other work flows, data flows, nested conditionals, try/catch blocks,
scripts, and so on.
Try/Catch Blocks

A try/catch block allows you to specify alternative work flows if errors occur during job
execution. Try/catch blocks catch classes of errors, apply solutions that you provide, and
continue execution.
For each catch in the try/catch block, you will specify:
• One exception or group of exceptions handled by the catch. To handle more than one
exception or group of exceptions, add more catches to the try/catch block.
• The work flow to execute if the indicated exception occurs. Use an existing work flow or
define a work flow in the catch editor.
If an exception is thrown during the execution of a try/catch block, and if no catch is looking
for that exception, then the exception is handled by normal error logic.
Using try/catch blocks and automatic recovery

Data Services does not save the result of a try/catch block for re-use during recovery. If an
exception is thrown inside a try/catch block, during recovery Data Services executes the step
that threw the exception and subsequent steps.
Because the execution path through the try/catch block might be different in the recovered
job, using variables set in the try/catch block could alter the results during automatic recovery.
For example, suppose you create a job that defines the value of variable $I within a try/catch
block. If an exception occurs, you set an alternate value for $I. Subsequent steps are based on
the new value of $I.
During the first job execution, the first work flow contains an error that generates an exception,
which is caught. However, the job fails in the subsequent work flow.
You fix the error and run the job in recovery mode. During the recovery execution, the first
work flow no longer generates the exception. Thus the value of variable $I is different, and the
job selects a different subsequent work flow, producing different results.
To ensure proper results with automatic recovery when a job contains a try/catch block, do
not use values set inside the try/catch block or reference output variables from a try/catch
block in any subsequent steps.
To create an alternative work flow

1. Create a job.
2. Add a global variable to your job called $G_recovery_needed with a datatype of int.
The purpose of this global variable is to store a flag that indicates whether or not recovery
is needed. This flag is based on the value in a recovery status table, which contains a flag of
1 or 0, depending on whether recovery is needed.
3. In the job workspace, add a work flow using the tool palette.
4. In the work flow workspace, add a script called GetStatus using the tool palette.
5. In the script workspace, construct an expression to update the value of the
$G_recovery_needed global variable to the same value as is in the recovery status table.
The script content depends on the RDBMS on which the status table resides. The following
is an example of the expression:
$G_recovery_needed = sql('DEMO_Target', 'select recovery_flag from
recovery_status');
6. Return to the work flow workspace.
7. Add a conditional to the workspace using the tool palette and connect it to the script.
8. Open the conditional.
The transform editor for the conditional allows you to specify the IF expression and
Then/Else branches.
9. In the IF field, enter the expression that evaluates whether recovery is required.
The following is an example of the expression:
$G_recovery_needed = 0
This means the objects in the Then pane will run if recovery is not required. If recovery is
needed, the objects in the Else pane will run.
10. Add a try object to the Then pane of the transform editor using the tool palette.
11. In the Local Object Library, click and drag a work flow or data flow to the Then pane after
the try object.
12. Add a catch object to the Then pane after the work flow or data flow using the tool palette.
13. Connect the objects in the Then pane.
14. Open the workspace for the catch object.
All exception types are listed in the Available exceptions pane.
15. To change which exceptions act as triggers, expand the tree in the Available exceptions
pane, select the appropriate exceptions, and click Set to move them to the Trigger on these
exceptions pane.
By default, Data Services catches all exceptions.
16. Add a script called Fail to the lower pane using the tool palette.
This object will be executed if there are any exceptions. If desired, you can add a data flow
here instead of a script.
17. In the script workspace, construct an expression to update the flag in the recovery status table
to 1, indicating that recovery is needed.
The script content depends on the RDBMS on which the status table resides. The following
is an example of the expression:
sql('DEMO_Target','update recovery_status set recovery_flag = 1');
18. Return to the conditional workspace.
19. Connect the objects in the Then pane.
20. In the Local Object Library, click and drag the work flow or data flow that represents the
recovery process to the Else pane.
This combination means that if recovery is not needed, then the first object will be executed;
if recovery is required, the second object will be executed.
21. Add a script called Pass to the lower pane using the tool palette.
22. In the script workspace, construct an expression to update the flag in the recovery status
table to 0, indicating that recovery is not needed.
The script content depends on the RDBMS on which the status table resides. The following
is an example of the expression:
sql('DEMO_Target','update recovery_status set recovery_flag = 0');
23. Return to the conditional workspace.
24. Connect the objects in the Else pane.
25. Validate and save all objects.
26. Execute the job.
The first time this job is executed, the job succeeds because the recovery_flag value in the
status table is set to 0 and the target table is empty, so there is no primary key constraint.
27. Execute the job again.
The second time this job is executed, the job fails because the target table already contains
records, so there is a primary key exception.
28. Check the contents of the status table.
The recovery_flag field now contains a value of 1.
29. Execute the job again.
The third time this job is executed, the version of the data flow with the Auto correct load
option selected runs because the recovery_flag value in the status table is set to 1. The job
succeeds because the auto correct load feature checks for existing values before trying to
insert new rows.
30. Check the contents of the status table again.
The recovery_flag field contains a value of 0.
Activity: Creating an alternative work flow

With the influx of new employees resulting from Alpha's acquisition of new companies, the
Employee Department information needs to be updated regularly. Because this information is
used for payroll, it is critical that no records are lost if a job is interrupted, so you need to set
up the job in such a way that exceptions will always be managed. This involves setting up a
conditional that will try to run a less resource-intensive update of the table first; if that generates
an exception, the conditional then tries a version of the same data flow that is configured to
auto correct the load.
Objective

• Set up a try/catch block with a conditional to catch exceptions.
Instructions

1. Delete all of the data from the Emp_Dept table in the HR_datamart datastore.
a. From the Start menu, click Programs ➤ MySQL ➤ MySQL Server 5.0 ➤ MySQL
Command Line Client .
b. Enter a password of root and press Enter.
c. At the mysql prompt, enter delete from hr_datamart.emp_dept; and press Enter.
The system confirms that 40 rows were deleted.
2. In the Local Object Library, replicate Alpha_Employees_Dept_DF and rename the new
version Alpha_Employees_Dept_AC_DF.
3. In the target table editor for the Emp_Dept table in Alpha_Employees_Dept_DF, ensure
that the Delete data from table before loading and Auto correct load options are not
selected.
4. In the target table editor for the Emp_Dept table in Alpha_Employees_Dept_AC_DF, ensure
that the Delete data from table before loading option is not selected.
5. Select the Auto correct load option.
6. In the Omega project, create a new batch job called Alpha_Employees_Dept_Recovery_Job.
7. Add a global variable called $G_Recovery_Needed with a datatype of int to your job.
8. Add a work flow to your job called Alpha_Employees_Dept_Recovery_WF.
9. In the work flow workspace, add a script called GetStatus and construct an expression to
update the value of the $G_Recovery_Needed global variable to the same value as in the
recovery_flag column in the recovery_status table in the HR datamart.
The expression should be:
$G_Recovery_Needed = sql('hr_datamart', 'select recovery_flag from
recovery_status');
10. In the work flow workspace, add a conditional called Alpha_Employees_Dept_Con and
connect it to the script.
11. In the editor for the conditional, enter an IF expression that states that recovery is not
required.
The expression should be:
$G_Recovery_Needed = 0
12. In the Then pane, create a new try object called Alpha_Employees_Dept_Try.
13. Add Alpha_Employees_Dept_DF and connect it to the try object.
14. Create a new catch object called Alpha_Employees_Dept_Catch, and connect it to
Alpha_Employees_Dept_DF.
15. In the editor for the catch object, add a script called Recovery_Fail to the lower pane and
construct an expression to update the flag in the recovery status table to 1, indicating that
recovery is needed.
The expression should be:
sql('hr_datamart','update recovery_status set recovery_flag = 1');
16. In the conditional workspace, add Alpha_Employees_Dept_AC_DF to the Else pane.
17. Add a script called Recovery_Pass to the Else pane next to Alpha_Employees_Dept_AC_DF
and connect the objects.
18. In the script, construct an expression to update the flag in the recovery status table to 0,
indicating that recovery is not needed.
The expression should be:
sql('hr_datamart','update recovery_status set recovery_flag = 0');
19. Execute Alpha_Employees_Dept_Recovery_Job for the first time with the default execution
properties and save all objects you have created.
In the log, note that Alpha_Employees_Dept_DF executed.
20. Execute Alpha_Employees_Dept_Recovery_Job again.
In the log, note that the job fails.
21. Execute Alpha_Employees_Dept_Recovery_Job for a third time.
In the log, note that, this time, Alpha_Employees_Dept_AC_DF executed.
A solution file called SOLUTION_Recovery.atl is included in your resource CD. To check the
solution, import the file and open it to view the data flow design and mapping logic. Do not
execute the solution job, as this may override the results in your target table.
Quiz: Setting up error handling

1. List the different strategies you can use to avoid duplicate rows of data when re-loading a
job.
2. True or false? You can only run a job in recovery mode after the initial run of the job has
been set to run with automatic recovery enabled.
3. What are the two scripts in a manual recovery work flow used for?
4. Which of the following types of exception can you NOT catch using a try/catch block?
a. Database access errors
b. Syntax errors
c. System exception errors
d. Execution errors
e. File access errors
Lesson summary
After completing this lesson, you are now able to:
• Set up recoverable work flows
Lesson 8
Capturing Changes in Data
Lesson introduction

The design of your data warehouse must take into account how you are going to handle changes
in your target system when the respective data in your source system changes. Data Integrator
transforms provide you with a mechanism to do this.
After completing this lesson, you will be able to:
• Update data over time
• Use source-based CDC
• Use target-based CDC
Updating data over time
Introduction

Data Integrator transforms provide support for updating changing data in your data warehouse.
After completing this unit, you will be able to:
• Describe the options for updating changes to data
• Explain the purpose of Changed Data Capture (CDC)
• Explain the role of surrogate keys in managing changes to data
• Define the differences between source-based and target-based CDC
Explaining Slowly Changing Dimensions (SCD)

SCDs are dimensions that have data that changes over time. The following methods of handling
SCDs are available:
• Type 1 (No history preservation): a natural consequence of normalization.
• Type 2 (Unlimited history preservation and new rows):
○ New rows generated for significant changes.
○ Requires use of a unique key. The key relates to facts/time.
○ Optional Effective_Date field.
• Type 3 (Limited history preservation):
○ Two states of data are preserved: current and old.
○ New fields are generated to store history data.
○ Requires an Effective_Date field.
Because SCD Type 2 resolves most of the issues related to slowly changing dimensions, it is
explored last.
SCD Type 1

For an SCD Type 1 change, you find and update the appropriate attributes on a specific
dimensional record. For example, to update a record in the SALES_PERSON_DIMENSION
table to show a change to an individual’s SALES_PERSON_NAME field, you simply update
one record in the SALES_PERSON_DIMENSION table. This action would update or correct
that record for all fact records across time. In a dimensional model, facts have no meaning until
you link them with their dimensions. If you change a dimensional attribute without
appropriately accounting for the time dimension, the change becomes global across all fact
records.
This is the data before the change:
SALES_PERSON_KEY SALES_PERSON_ID NAME SALES_TEAM
15 000120 Doe, John B Northwest
This is the same table after the salesperson’s name has been changed:
SALES_PERSON_KEY SALES_PERSON_ID NAME SALES_TEAM
15 000120 Smith, John B Northwest
However, suppose a salesperson transfers to a new sales team. Updating the salesperson’s
dimensional record would update all previous facts so that the salesperson would appear to
have always belonged to the new sales team. This may cause issues in terms of reporting sales
numbers for both teams. If you want to preserve an accurate history of who was on which sales
team, Type 1 is not appropriate.
SCD Type 3

To implement a Type 3 change, you change the dimension structure so that it renames the
existing attribute and adds two attributes, one to record the new value and one to record the
date of the change.
A Type 3 implementation has three disadvantages:
• You can preserve only one change per attribute, such as old and new or first and last.
• Each Type 3 change requires a minimum of one additional field per attribute and another
additional field if you want to record the date of the change.
• Although the dimension’s structure contains all the data needed, the SQL code required to
extract the information can be complex. Extracting a specific value is not difficult, but if you
want to obtain a value for a specific point in time or multiple attributes with separate old
and new values, the SQL statements become long and have multiple conditions.
In summary, SCD Type 3 can store a change in data, but can neither accommodate multiple
changes, nor adequately serve the need for summary reporting.
This is the data before the change:
SALES_PERSON_KEY SALES_PERSON_ID NAME SALES_TEAM
15 000120 Doe, John B Northwest
This is the same table after the new dimensions have been added and the salesperson’s sales
team has been changed:
SALES_PERSON_ID NAME OLD_TEAM NEW_TEAM EFF_TO_DATE
000120 Doe, John B Northwest Northeast Oct_31_2004

SCD Type 2

With a Type 2 change, you do not need to make structural changes to the
SALES_PERSON_DIMENSION table. Instead, you add a record.
This is the data before the change:
SALES_PERSON_KEY SALES_PERSON_ID NAME SALES_TEAM
15 000120 Doe, John B Northwest
After you implement the Type 2 change, two records appear, as in the following table:
SALES_PERSON_KEY SALES_PERSON_ID NAME SALES_TEAM
15 000120 Doe, John B Northwest
133 000120 Doe, John B Southeast
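The Type 2 change above can be sketched as an insert rather than an update: the old row stays as-is and a new row with a new surrogate key records the new team. Column names follow the tables above; the function is illustrative only.

```python
# Illustrative SCD Type 2 change: append a new dimension row with a new
# surrogate key; the existing row (and its history) is left untouched.

def scd2_change(dimension, person_id, new_team, next_key):
    """Insert a new row for the changed attribute."""
    current = [r for r in dimension
               if r["SALES_PERSON_ID"] == person_id][-1]
    dimension.append({
        "SALES_PERSON_KEY": next_key,      # new surrogate key
        "SALES_PERSON_ID": person_id,      # same natural key
        "NAME": current["NAME"],
        "SALES_TEAM": new_team,
    })

dim = [{"SALES_PERSON_KEY": 15, "SALES_PERSON_ID": "000120",
        "NAME": "Doe, John B", "SALES_TEAM": "Northwest"}]
scd2_change(dim, "000120", "Southeast", next_key=133)
```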
Updating changes to data

When you have a large amount of data to update regularly and a small amount of system down
time for scheduled maintenance on a data warehouse, you must choose the most appropriate
method for updating your data over time, also known as “delta load”. You can choose to do a
full refresh of your data or you can choose to extract only new or modified data and update
the target system:
• Full refresh: Full refresh is easy to implement and easy to manage. This method ensures
that no data is overlooked or left out due to technical or programming errors. For an
environment with a manageable amount of source data, full refresh is an easy method you
can use to perform a delta load to a target system.
• Capturing only changes: After an initial load is complete, you can choose to extract only
new or modified data and update the target system. Identifying and loading only changed
data is called Changed Data Capture (CDC). CDC is recommended for large tables. If the
tables that you are working with are small, you may want to consider reloading the entire
table instead. The benefits of using CDC instead of doing a full refresh are that it:
○ Improves performance, because the job takes less time to process with less data to extract,
transform, and load.
○ Allows change history to be tracked by the target system so that data can be correctly analyzed
over time. For example, if a sales person is assigned a new sales region, simply updating
the customer record to reflect the new region negatively affects any analysis by region
over time because the purchases made by that customer before the move are attributed
to the new region.
Explaining history preservation and surrogate keys

History preservation allows the data warehouse or data mart to maintain the history of data
in dimension tables so you can analyze it over time.
For example, if a customer moves from one sales region to another, simply updating the
customer record to reflect the new region would give you misleading results in an analysis by
region over time, because all purchases made by the customer before the move would incorrectly
be attributed to the new region.
The solution to this involves introducing a new record for the same customer that reflects the
new sales region so that you can preserve the previous record. In this way, accurate reporting
is available for both sales regions. To support this, Data Services is set up to treat all changes
to records as INSERT rows by default.
However, you also need to manage the primary key constraint issues in your target tables that
arise when you have more than one record in your dimension tables for a single entity, such
as a customer or an employee.
For example, with your sales records, the Sales Rep ID is usually the primary key and is used
to link that record to all of the rep's sales orders. If you try to add a new record with the same
primary key, it will throw an exception. On the other hand, if you assign a new Sales Rep ID
to the new record for that rep, you will compromise your ability to report accurately on the
rep's total sales.
To address this issue, you will create a surrogate key, which is a new column in the target table
that becomes the new primary key for the records. At the same time, you will change the
properties of the former primary key so that it is simply a data column.
When a new record is inserted for the same rep, a unique surrogate key is assigned allowing
you to continue to use the Sales Rep ID to maintain the link to the rep’s orders.
You can create surrogate keys either by using the gen_row_num or key_generation functions
in the Query transform to create a new output column that automatically increments whenever
a new record is inserted, or by using the Key Generation transform, which serves the same
purpose.
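The surrogate-key idea can be sketched as follows: the natural key (Sales Rep ID) becomes an ordinary column, while a generated key serves as the primary key, so two rows for the same rep no longer collide. This only mimics conceptually what key_generation or the Key Generation transform does; the list stands in for the target table.

```python
# Sketch of surrogate key generation: assign the next key value and
# insert, allowing repeated natural keys.

def insert_with_surrogate(target, row):
    """Assign the next surrogate key and append the row; duplicates of
    the natural key are fine because it is no longer the primary key."""
    next_key = max((r["SALES_PERSON_KEY"] for r in target), default=0) + 1
    target.append({"SALES_PERSON_KEY": next_key, **row})
    return next_key

target = []
k1 = insert_with_surrogate(target, {"SALES_PERSON_ID": "000120",
                                    "SALES_TEAM": "Northwest"})
k2 = insert_with_surrogate(target, {"SALES_PERSON_ID": "000120",
                                    "SALES_TEAM": "Southeast"})
```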
Comparing source-based and target-based CDC

Setting up a full CDC solution within Data Services may not be required. Many databases now
have CDC support built into them, such as Oracle, SQL Server, and DB2. Alternatively, you
could combine surrogate keys with the Map Operation transform to change all UPDATE row
types to INSERT row types to capture changes.
However, if you do want to set up a full CDC solution, there are two general incremental CDC
methods to choose from: source-based and target-based CDC.
Source-based CDC evaluates the source tables to determine what has changed and only extracts
changed rows to load into the target tables.
Target-based CDC extracts all the data from the source, compares the source and target rows
using table comparison, and then loads only the changed rows into the target.
Source-based CDC is almost always preferable to target-based CDC for performance reasons.
However, some source systems do not provide enough information to make use of the
source-based CDC techniques. You will usually use a combination of the two techniques.
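Target-based CDC can be sketched as a row comparison. This is a conceptual model of what the Table Comparison transform does, not its actual implementation: every source row is read, compared to the target, and only new or changed rows are kept.

```python
# Conceptual target-based CDC: compare each source row to the target and
# classify it as an INSERT, an UPDATE, or unchanged (discarded).

def changed_rows(source, target, key="id"):
    target_by_key = {r[key]: r for r in target}
    inserts, updates = [], []
    for row in source:
        existing = target_by_key.get(row[key])
        if existing is None:
            inserts.append(row)       # not in target: INSERT
        elif existing != row:
            updates.append(row)       # differs from target: UPDATE
        # identical rows are not reloaded
    return inserts, updates

source = [{"id": 1, "region": "West"}, {"id": 2, "region": "East"}]
target = [{"id": 1, "region": "North"}]
inserts, updates = changed_rows(source, target)
```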
Using source-based CDC
Introduction

Source-based CDC is the preferred method because it improves performance by extracting the
fewest rows.
After completing this unit, you will be able to:
• Define the methods of performing source-based CDC
• Explain how to use timestamps in source-based CDC
• Manage issues related to using timestamps for source-based CDC
Using source tables to identify changed data

Source-based CDC, sometimes also referred to as incremental extraction, extracts only the
changed rows from the source. To use source-based CDC, your source data must have some
indication of the change. There are two methods:
• Timestamps: You can use the timestamps in your source data to determine what rows have
been added or changed since the last time data was extracted from the source. To support
this type of source-based CDC, your database tables must have at least an update timestamp;
it is preferable to have a create timestamp as well.
• Change logs: You can also use the information captured by the RDBMS in the log files for
the audit trail to determine what data has been changed.
Log-based data is more complex and is outside the scope of this course. For more information
on using logs for CDC, see “Techniques for Capturing Data”, in the Data Services Designer Guide.
Using CDC with timestamps

Timestamp-based CDC is an ideal solution to track changes if:
• There are date and time fields in the tables being updated.
• You are updating a large table that has a small percentage of changes between extracts and
an index on the date and time fields.
• You are not concerned about capturing intermediate results of each transaction between
extracts (for example, if a customer changes regions twice in the same day).
It is not recommended that you use timestamp-based CDC if:
• You have a large table in which a large percentage of rows changes between extracts and there is
no index on the timestamps.
• You need to capture physical row deletes.
• You need to capture multiple events occurring on the same row between extracts.
Some systems have timestamps with dates and times, some with just the dates, and some with
monotonically-generated increasing numbers. You can treat dates and generated numbers in
the same manner. It is important to note that for timestamps based on real time, time zones
230 BusinessObjects Data Integrator XI 3.0: Core Concepts—Learner’s Guide
can become important. If you keep track of timestamps using the nomenclature of the source
system (that is, using the source time or source-generated number), you can treat both temporal
(specific time) and logical (time relative to another time or event) timestamps in the same way.
The basic technique for using timestamps is to add a column to your source and target tables
that tracks the timestamps of rows loaded in a job. When the job executes, this column is updated
along with the rest of the data. The next job then reads the latest timestamp from the target
table and selects only the rows in the source table for which the timestamp is later.
This example illustrates the technique. Assume that the last load occurred at 2:00 PM on January
1, 2008. At that time, the source table had only one row (key=1) with a timestamp earlier than
the previous load. Data Services loads this row into the target table with the original timestamp
of 1:10 PM on January 1, 2008. After 2:00 PM, more rows are added to the source table:
At 3:00 PM on January 1, 2008, the job runs again. The job:
1. Reads the Last_Update field from the target table (01/01/2008 01:10 PM).
2. Selects rows from the source table that have timestamps that are later than the value of
Last_Update. The SQL command to select these rows is:
SELECT * FROM Source WHERE Last_Update > '01/01/2008 01:10 pm'
This operation returns the second and third rows (key=2 and key=3).
3. Loads these new rows into the target table.
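The steps above can be sketched in Python. This is only an illustrative model of the selection logic (the row and column names are made up for the example, not part of Data Services):

```python
from datetime import datetime

def select_changed_rows(source_rows, last_update):
    """Keep only source rows whose timestamp is later than the last load."""
    return [row for row in source_rows if row["last_update"] > last_update]

source = [
    {"key": 1, "last_update": datetime(2008, 1, 1, 13, 10)},
    {"key": 2, "last_update": datetime(2008, 1, 1, 14, 12)},
    {"key": 3, "last_update": datetime(2008, 1, 1, 14, 35)},
]

# Latest timestamp already loaded into the target (read back before each run).
last_loaded = datetime(2008, 1, 1, 13, 10)
delta = select_changed_rows(source, last_loaded)
# Only the rows with key=2 and key=3 are selected, matching the SELECT above.
```

In the real job, the same filter is expressed as the WHERE clause shown above, with the last-loaded timestamp read from the target table into a global variable.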
For timestamped CDC, you must create a work flow that contains the following:
• A script that reads the target table and sets the value of a global variable to the latest
timestamp.
• A data flow that uses the global variable in a WHERE clause to filter the data.
The data flow contains a source table, a query, and a target table. The query extracts only those
rows that have timestamps later than the last update.
To set up a timestamp-based CDC delta job
1. In the Variables and Parameters dialog box, add a global variable called $G_Last_Update
with a datatype of datetime to your job.
The purpose of this global variable is to store a string conversion of the timestamp for the
last time the job executed.
2. In the job workspace, add a script called GetTimestamp using the tool palette.
3. In the script workspace, construct an expression to do the following:
• Select the last time the job was executed from the last update column in the table.
• Assign the actual timestamp value to the $G_Last_Update global variable.
The script content depends on the RDBMS on which the status table resides. The following
is an example of the expression:
$G_Last_Update = sql('DEMO_Target','select max(last_update) from employee_dim');
4. Return to the job workspace.
5. Add a data flow to the right of the script using the tool palette.
6. In the data flow workspace, add the source, Query transform, and target objects and connect
them.
The target table for CDC cannot be a template table.
7. In the Query transform, add the columns from the input schema to the output schema as
required.
8. If required, in the output schema, right-click the primary key (if it is not already set to the
surrogate key) and clear the Primary Key option in the menu.
9. Right-click the surrogate key column and select the Primary Key option in the menu.
10. On the Mapping tab for the surrogate key column, construct an expression to use the
key_generation function to generate new keys based on that column in the target table,
incrementing by 1.
The script content depends on the RDBMS on which the status table resides. The following
is an example of the expression:
key_generation('DEMO_Target.demo_target.employee_dim', 'Emp_Surr_Key', 1)
11. On the WHERE tab, construct an expression to select only those records with a timestamp
that is later than the $G_Last_Update global variable.
The following is an example of the expression:
employee_dim.last_update > $G_Last_Update
12. Connect the GetTimestamp script to the data flow.
13. Validate and save all objects.
14. Execute the job.
Managing overlaps
Unless source data is rigorously isolated during the extraction process (which typically is not
practical), there is a window of time when changes can be lost between two extraction runs.
This overlap period affects source-based CDC because this kind of data capture relies on a
static timestamp to determine changed data.
For example, suppose a table has 10,000 rows. If a change is made to one of the rows after it
was loaded but before the job ends, the second update can be lost.
There are three techniques for handling this situation:
• Overlap avoidance
• Overlap reconciliation
• Presampling
For more information see “Source-based and target-based CDC” in “Techniques for Capturing
Changed Data” in the Data Services Designer Guide.
Overlap avoidance
In some cases, it is possible to set up a system where there is no possibility of an overlap. You
can avoid overlaps if there is a processing interval where no updates are occurring on the target
system.
For example, if you can guarantee the data extraction from the source system does not last
more than one hour, you can run a job at 1:00 AM every night that selects only the data updated
the previous day until midnight. While this regular job does not give you up-to-the-minute
updates, it guarantees that you never have an overlap and greatly simplifies timestamp
management.
Overlap reconciliation
Overlap reconciliation requires a special extraction process that re-applies changes that could
have occurred during the overlap period. This extraction can be executed separately from the
regular extraction. For example, if the highest timestamp loaded from the previous job was
01/01/2008 10:30 PM and the overlap period is one hour, overlap reconciliation re-applies the
data updated between 9:30 PM and 10:30 PM on January 1, 2008.
The overlap period is usually equal to the maximum possible extraction time. If it can take up
to N hours to extract the data from the source system, an overlap period of N (or N plus a small
increment) hours is recommended. For example, if it takes at most two hours to run the job,
an overlap period of at least two hours is recommended.
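The reconciliation window can be computed mechanically. A minimal sketch (the function name is illustrative, not a Data Services API):

```python
from datetime import datetime, timedelta

def reconciliation_window(last_loaded, overlap_hours):
    """Window of changes to re-apply: from (last load - overlap) up to the last load."""
    return last_loaded - timedelta(hours=overlap_hours), last_loaded

# Highest timestamp loaded by the previous job, with a one-hour overlap period.
start, end = reconciliation_window(datetime(2008, 1, 1, 22, 30), overlap_hours=1)
# The window runs from 9:30 PM to 10:30 PM on January 1, 2008, as in the example.
```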
Presampling
Presampling is an extension of the basic timestamp processing technique. The main difference
is that the status table contains both a start and an end timestamp, instead of the last update
timestamp. The start timestamp for presampling is the same as the end timestamp of the
previous job. The end timestamp for presampling is established at the beginning of the job. It
is the most recent timestamp from the source table, commonly set as the system date.
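The presampling window logic can be sketched as follows. This is a simplified model, not Data Services code; the names are illustrative:

```python
from datetime import datetime

def presample_window(previous_end, now):
    """This run's start is the previous run's end; its end is fixed at job start."""
    return previous_end, now

def select_rows(rows, start, end):
    # Select rows stamped inside the half-open window (start, end].
    return [r for r in rows if start < r["last_update"] <= end]

rows = [
    {"key": 1, "last_update": datetime(2008, 1, 1, 21, 0)},   # already loaded
    {"key": 2, "last_update": datetime(2008, 1, 1, 23, 15)},  # inside the window
]
start, end = presample_window(datetime(2008, 1, 1, 22, 30),
                              datetime(2008, 1, 2, 1, 0))
delta = select_rows(rows, start, end)
# Only key=2 falls inside the window; rows stamped after 'end' wait for the next run.
```

Fixing the end timestamp at job start is what prevents the overlap: rows that arrive during the extraction fall into the next run's window instead of being lost.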
Activity: Using source-based CDC
You need to set up a job to update employee records in the Omega data warehouse whenever
they change. The employee records include timestamps to indicate when they were last updated,
so you can use source-based CDC.
Objective
• Use timestamps to enable changed data capture for employee records.
Instructions
1. In the Omega project, create a new batch job called Alpha_Employees_Dim_Job.
2. Add a global variable called $G_LastUpdate with a datatype of datetime to your job.
3. In the job workspace, add a script called GetTimestamp and construct an expression to do
the following:
• Select the last time the job was executed from the last update column in the employee
dimension table.
• If the last update column is NULL, assign a value of January 1, 1901 to the $G_LastUpdate
global variable. When the job executes for the first time for the initial load, this ensures
that all records are processed.
• If the last update column is not NULL, assign the actual timestamp value to the
$G_LastUpdate global variable.
The expression should be:
$G_LastUpdate = sql('omega','select max(LAST_UPDATE) from omega.emp_dim');
if ($G_LastUpdate is null) $G_LastUpdate = to_date('1901.01.01','YYYY.MM.DD');
else print('Last update was ' || $G_LastUpdate);
4. In the job workspace, add a data flow called Alpha_Employees_Dim_DF and connect it to the
script.
5. Add the Employee table from the Alpha datastore as the source object and the Emp_Dim
table from the Omega datastore as the target object.
6. Add the Query transform and connect the objects.
7. In the transform editor for the Query transform, map the columns as follows:
Schema In Schema Out
EMPLOYEEID EMPLOYEEID
LASTNAME LASTNAME
FIRSTNAME FIRSTNAME
BIRTHDATE BIRTHDATE
HIREDATE HIREDATE
ADDRESS ADDRESS
PHONE PHONE
EMAIL EMAIL
REPORTSTO REPORTSTO
LastUpdate LAST_UPDATE
discharge_date DISCHARGE_DATE
8. Create a mapping expression for the SURR_KEY column that generates new keys based on
the Emp_Dim target table, incrementing by 1.
The expression should be:
key_generation('Omega.omega.emp_dim', 'SURR_KEY', 1)
9. Create a mapping expression for the CITY column to look up the city name from the City
table in the Alpha datastore based on the city ID.
The expression should be:
lookup_ext([Alpha.alpha.city,'PRE_LOAD_CACHE','MAX'],
[CITYNAME],[NULL],[CITYID,'=',employee.CITYID]) SET
("run_as_separate_process"='no')
10. Create a mapping expression for the REGION column to look up the region name from the
Region table in the Alpha datastore based on the region ID.
The expression should be:
lookup_ext([Alpha.alpha.region,'PRE_LOAD_CACHE','MAX'],
[REGIONNAME],[NULL],[REGIONID,'=',employee.REGIONID]) SET
("run_as_separate_process"='no')
11. Create a mapping expression for the COUNTRY column to look up the country name from
the Country table in the Alpha datastore based on the country ID.
The expression should be:
lookup_ext([Alpha.alpha.country,'PRE_LOAD_CACHE','MAX'],
[COUNTRYNAME],[NULL],[COUNTRYID,'=',employee.COUNTRYID]) SET
("run_as_separate_process"='no')
12. Create a mapping expression for the DEPARTMENT column to look up the department
name from the Department table in the Alpha datastore based on the department ID.
The expression should be:
lookup_ext([Alpha.alpha.department,'PRE_LOAD_CACHE','MAX'],
[DEPARTMENTNAME],[NULL],[DEPARTMENTID,'=',employee.DEPARTMENTID]) SET
("run_as_separate_process"='no')
13. On the WHERE tab, construct an expression to select only those records with a timestamp
that is later than the $G_LastUpdate global variable.
The expression should be:
employee.LastUpdate > $G_LastUpdate
14. Execute Alpha_Employees_Dim_Job with the default execution properties and save all
objects you have created.
According to the log, the last update for the table was on 2007.11.07.
15. Return to the data flow workspace and view data for the target table. Sort the records by
the LAST_UPDATE column.
A solution file called SOLUTION_SourceCDC.atl is included in your resource CD. To check the
solution, import the file and open it to view the data flow design and mapping logic. Do not
execute the solution job, as this may override the results in your target table.
Using target-based CDC
Introduction
Target-based CDC compares the source to the target to determine which records have changed.
After completing this unit, you will be able to:
• Define the Data Integrator transforms involved in target-based CDC
Using target tables to identify changed data
Source-based CDC evaluates the source tables to determine what has changed and only extracts
changed rows to load into the target tables. Target-based CDC, by contrast, extracts all the data
from the source, compares the source and target rows, and then loads only the changed rows
into the target with new surrogate keys.
Source-based changed-data capture is almost always preferable to target-based capture for
performance reasons; however, some source systems do not provide enough information to
make use of the source-based CDC techniques. Target-based CDC allows you to use the
technique when source-based change information is limited.
You can preserve history by creating a data flow that contains the following:
• A source table contains the rows to be evaluated.
• A Query transform maps columns from the source.
• A Table Comparison transform compares the data in the source table with the data in the
target table to determine what has changed. It generates a list of INSERT and UPDATE rows
based on those changes. This circumvents the default behavior in Data Services of treating
all changes as INSERT rows.
• A History Preserving transform converts certain UPDATE rows to INSERT rows based on
the columns in which values have changed. This produces a second row in the target instead
of overwriting the first row.
• A Key Generation transform generates new keys for the updated rows that are now flagged
as INSERT.
• A target table receives the rows. The target table cannot be a template table.
Identifying history preserving transforms
Data Services supports history preservation with three Data Integrator transforms:

Transform: Description

History Preserving: Converts rows flagged as UPDATE to UPDATE plus INSERT, so that the
original values are preserved in the target. You specify the column in which to look for
updated data.

Key Generation: Generates new keys for source data, starting from a value based on existing
keys in the table you specify.

Table Comparison: Compares two data sets and produces the difference between them as a
data set with rows flagged as INSERT and UPDATE.
Explaining the Table Comparison transform
The Table Comparison transform allows you to detect and forward changes that have occurred
since the last time a target was updated. This transform compares two data sets and produces
the difference between them as a data set with rows flagged as INSERT or UPDATE.
For example, the transform compares the input and comparison tables and determines that
row 10 has a new address, row 40 has a name change, and row 50 is a new record. The output
includes all three records, flagged as appropriate:
The next section gives a brief description of the function, data input requirements, options, and
data output results for the Table Comparison transform. For more information on the Table
Comparison transform see “Transforms” Chapter 5 in the Data Services Reference Guide.
Input/output
The transform compares two data sets, one from the input to the transform (input data set),
and one from a database table specified in the transform (the comparison table). The transform
selects rows from the comparison table based on the primary key values from the input data
set. The transform compares columns that exist in the schemas for both inputs.
The input data set must be flagged as NORMAL.
The output data set contains only the rows that make up the difference between the tables. The
schema of the output data set is the same as the schema of the comparison table. No DELETE
operations are produced.
If a column has a date datatype in one table and a datetime datatype in the other, the transform
compares only the date section of the data. The columns can also be time and datetime datatypes,
in which case Data Integrator only compares the time section of the data.
For each row in the input data set, there are three possible outcomes from the transform:
• An INSERT row is added: The primary key value from the input data set does not match
a value in the comparison table. The transform produces an INSERT row with the values
from the input data set row.
If there are columns in the comparison table that are not present in the input data set, the
transform adds these columns to the output schema and fills them with NULL values.
• An UPDATE row is added: The primary key value from the input data set matches a value
in the comparison table, and values in the non-key compare columns differ in the
corresponding rows from the input data set and the comparison table.
The transform produces an UPDATE row with the values from the input data set row.
If there are columns in the comparison table that are not present in the input data set, the
transform adds these columns to the output schema and fills them with values from the
comparison table.
• The row is ignored: The primary key value from the input data set matches a value in the
comparison table, but the comparison does not indicate any changes to the row values.
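The three outcomes can be modeled in a few lines of Python. This is a conceptual sketch of the comparison logic only (the function and column names are illustrative):

```python
def table_comparison(input_rows, comparison, key, compare_cols):
    """Flag each input row as INSERT or UPDATE; unchanged rows are dropped.
    `comparison` maps primary-key value -> the comparison table's row."""
    out = []
    for row in input_rows:
        existing = comparison.get(row[key])
        if existing is None:
            out.append(("INSERT", row))                     # new primary key
        elif any(row[c] != existing[c] for c in compare_cols):
            out.append(("UPDATE", row))                     # key matches, data differs
        # identical rows produce no output (the row is ignored)
    return out

comparison = {10: {"id": 10, "name": "Ann", "addr": "Old St"},
              40: {"id": 40, "name": "Bob", "addr": "Main St"}}
inp = [{"id": 10, "name": "Ann", "addr": "New St"},   # changed address -> UPDATE
       {"id": 40, "name": "Bob", "addr": "Main St"},  # unchanged -> ignored
       {"id": 50, "name": "Cal", "addr": "Elm St"}]   # new key -> INSERT
flags = [op for op, _ in table_comparison(inp, comparison, "id", ["name", "addr"])]
```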
Options
The Table Comparison transform offers several options:

Option: Description

Table name: Specifies the fully qualified name of the comparison table. This table must already
be imported into the repository. Table name is represented as datastore.owner.table, where
datastore is the name of the datastore Data Services uses to access the comparison table and
owner depends on the database type associated with the table.

Generated key column: Specifies a column in the comparison table. When there is more than
one row in the comparison table with a given primary key value, this transform compares the
row with the largest generated key value of these rows and ignores the other rows. This is
optional.

Input contains duplicate keys: Provides support for input rows with duplicate primary key
values.

Detect deleted row(s) from comparison table: Flags the transform to identify rows that have
been deleted from the source.
Comparison method: Allows you to select the method for accessing the comparison table. You
can select from Row-by-row select, Cached comparison table, and Sorted input.

Input primary key column(s): Specifies the columns in the input data set that uniquely identify
each row. These columns must be present in the comparison table with the same column names
and datatypes.

Compare columns: Improves performance by comparing only the subset of columns you drag
into this box from the input schema. If no columns are listed, all columns in the input data set
that are also in the comparison table are used as compare columns. This is optional.
Explaining the History Preserving transform
The History Preserving transform ignores everything but rows flagged as UPDATE. For these
rows, it compares the values of specified columns and, if the values have changed, flags the
row as INSERT. This produces a second row in the target instead of overwriting the first row.
For example, a target table that contains employee information is updated periodically from a
source table. In this case, the Table Comparison transform has flagged the name change for
row 40 as an update. However, the History Preserving transform is set up to preserve history
on the LastName column, so the output changes the operation code for that record from
UPDATE to INSERT.
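The conversion rule can be sketched as follows. This is a simplified model of the behavior described above, not the transform's implementation; names are illustrative:

```python
def history_preserving(rows, compare_cols, before):
    """Convert UPDATE rows whose history-preserved columns changed into INSERT
    rows, so the old row survives in the target. `before` maps primary key ->
    the target's current version of the row."""
    out = []
    for op, row in rows:
        if op == "UPDATE" and any(row[c] != before[row["id"]][c] for c in compare_cols):
            op = "INSERT"   # preserve history: add a second row instead of overwriting
        out.append((op, row))
    return out

# Row 40's name changed, and LastName is a history-preserved compare column.
before = {40: {"id": 40, "LastName": "Smith"}}
rows = [("UPDATE", {"id": 40, "LastName": "Jones"})]
out = history_preserving(rows, ["LastName"], before)
# The UPDATE becomes an INSERT: the old "Smith" row is kept in the target.
```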
The next section gives a brief description of the function, data input requirements, options, and
data output results for the History Preserving transform. For more information on the History
Preserving transform see “Transforms” Chapter 5 in the Data Services Reference Guide.
Input/output
The input data set is the result of a comparison between two versions of the same data in which
rows with changed data from the newer version are flagged as UPDATE rows and new data
from the newer version are flagged as INSERT rows.
The output data set contains rows flagged as INSERT or UPDATE.
Options
The History Preserving transform offers these options:

Option: Description

Valid from: Specifies a date or datetime column from the source schema. Specify a Valid from
date column if the target uses an effective date to track changes in data.

Valid to: Specifies a date value in the format YYYY.MM.DD. The Valid to date cannot be the
same as the Valid from date.
Column: Specifies a column from the source schema that identifies the current valid row from
a set of rows with the same primary key. The flag column indicates whether a row is the most
current data in the target for a given primary key.

Set value: Defines an expression that outputs a value with the same datatype as the flag column.
This value is used to update the current flag column in the new row added to the target to
preserve the history of an existing row.

Reset value: Defines an expression that outputs a value with the same datatype as the flag
column. This value is used to update the current flag column in an existing row in the target
that included changes in one or more of the compare columns.
Preserve delete row(s) as update row(s): Converts DELETE rows to UPDATE rows in the
target. If you previously set effective date values (Valid from and Valid to), sets the Valid to
value to the execution date. This option is used to maintain slowly changing dimensions by
feeding a complete data set first through the Table Comparison transform with its Detect
deleted row(s) from comparison table option selected.

Compare columns: Lists the column or columns in the input data set that are to be compared
for changes.
• If the values in the specified compare columns in each version match, the transform flags
the row as UPDATE. The row from the before version is updated. The date and flag
information is also updated.
• If the values in each version do not match, the row from the latest version is flagged as
INSERT when output from the transform. This adds a new row to the warehouse with the
values from the new row.
Updates to non-history-preserving columns update all versions of the row if the update is
performed on the natural key (for example, Customer), but only update the latest version if
the update is on the generated key (for example, GKey).
Explaining the Key Generation transform
The Key Generation transform generates new keys before inserting the data set into the target
in the same way as the key_generation function does. When it is necessary to generate artificial
keys in a table, this transform looks up the maximum existing key value from a table and uses
it as the starting value to generate new keys. The transform expects the generated key column
to be part of the input schema.
For example, suppose the History Preserving transform produces rows to add to a warehouse,
and these rows have the same primary key as rows that already exist in the warehouse. In this
case, you can add a generated key to the warehouse table to distinguish these two rows that
have the same primary key.
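The key-assignment behavior can be sketched in Python. This is a conceptual model under the assumption that only INSERT rows receive generated keys; the names are illustrative:

```python
def key_generation(target_max_key, rows, key_col="SURR_KEY", increment=1):
    """Fill the generated key column of INSERT rows, starting just above the
    largest key already present in the key source table."""
    next_key = target_max_key
    for op, row in rows:
        if op == "INSERT":
            next_key += increment
            row[key_col] = next_key
    return rows

rows = [("INSERT", {"id": 50, "SURR_KEY": None}),
        ("UPDATE", {"id": 40, "SURR_KEY": 7}),
        ("INSERT", {"id": 60, "SURR_KEY": None})]
out = key_generation(100, rows)
# The two INSERT rows receive keys 101 and 102; the UPDATE row keeps its key.
```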
The next section gives a brief description of the function, data input requirements, options, and
data output results for the Key Generation transform. For more information on the Key
Generation transform see “Transforms” Chapter 5 in the Data Services Reference Guide.
Input/output
The input data set is the result of a comparison between two versions of the same data in which
changed data from the newer version are flagged as UPDATE rows and new data from the
newer version are flagged as INSERT rows.
The output data set is a duplicate of the input data set, with the addition of key values in the
generated key column for input rows flagged as INSERT.
Options
The Key Generation transform offers these options:

Option: Description

Table name: Specifies the fully qualified name of the source table from which the maximum
existing key is determined (key source table). This table must already be imported into the
repository. Table name is represented as datastore.owner.table, where datastore is the name
of the datastore Data Services uses to access the key source table and owner depends on the
database type associated with the table.

Generated key column: Specifies the column in the key source table containing the existing
key values. A column with the same name must exist in the input data set; the new keys are
inserted in this column.

Increment value: Indicates the interval between generated key values.
Activity: Using target-based CDC
You need to set up a job to update product records in the Omega data warehouse whenever
they change. The product records do not include timestamps to indicate when they were last
updated, so you must use target-based CDC to extract all records from the source and compare
them to the target.
Objective
• Use target-based CDC to preserve history for the Product dimension.
Instructions
1. In the Omega project, create a new batch job called Alpha_Product_Dim_Job with a data
flow called Alpha_Product_Dim_DF.
2. Add the Product table from the Alpha datastore as the source object and the Prod_Dim table
from the Omega datastore as the target object.
3. Add the Query, Table Comparison, History Preserving, and Key Generation transforms.
4. Connect the source table to the Query transform and the Query transform to the target table
to set up the schema prior to configuring the rest of the transforms.
5. In the transform editor for the Query transform, map the columns as follows:
Schema In Schema Out
PRODUCTID PRODUCTID
PRODUCTNAME PRODUCTNAME
CATEGORYID CATEGORYID
COST COST
6. Until the key can be generated, specify a mapping expression for the SURR_KEY column
to populate it with NULL.
7. Specify a mapping expression for the EFFECTIVE_DATE column to indicate the current
date as sysdate( ).
8. Delete the link from the Query transform to the target table.
9. Connect the transforms in the following order: Query, Table Comparison, History Preserving,
and Key Generation.
10. Connect the Key Generation transform to the target table.
11. In the transform editor for the Table Comparison transform, use the Prod_Dim table in the
Omega datastore as the comparison table and set Surr_Key as the generated key column.
12. Set the input primary key column to PRODUCTID, and compare the PRODUCTNAME,
CATEGORYID, and COST columns.
13. Do not configure the History Preserving transform.
14. In the transform editor for the Key Generation transform, set up key generation based on
the Surr_Key column of the Prod_Dim table in the Omega datastore, incrementing by 1.
15. In the workspace, before executing the job, display the data in both the source and target
tables.
Note that the OmegaSoft product has been added in the source, but has not yet been updated
in the target.
16. Execute Alpha_Product_Dim_Job with the default execution properties and save all objects
you have created.
17. Return to the data flow workspace and view data for the target table.
Note that the new records were added for product IDs 2, 3, 6, 8, and 13, and that OmegaSoft
has been added to the target.
A solution file called SOLUTION_TargetCDC.atl is included in your resource CD. To check the
solution, import the file and open it to view the data flow design and mapping logic. Do not
execute the solution job, as this may override the results in your target table.
Quiz: Capturing changes in data
1. What are the two most important reasons for using CDC?
2. Which method of CDC is preferred for the performance gain of extracting the fewest rows?
3. What is the difference between an initial load and a delta load?
4. What transforms do you typically use for target-based CDC?
Lesson summary
After completing this lesson, you are now able to:
• Update data over time
• Use source-based CDC
• Use target-based CDC
Using Data Integrator Transforms—Learner’s Guide 249
Lesson 9
Using Data Integrator Transforms
Lesson introduction
Data Integrator transforms are used to enhance your data integration projects beyond the core
functionality of the platform transforms.
After completing this lesson, you will be able to:
• Describe the Data Integrator transforms
• Use the Pivot transform
• Use the Hierarchy Flattening transform
• Describe performance optimization
• Use the Data Transfer transform
• Use the XML Pipeline transform
Describing Data Integrator transforms
Introduction
Data Integrator transforms perform key operations on data sets to manipulate their structure
as they are passed from source to target.
After completing this unit, you will be able to:
• Describe Data Integrator transforms available in Data Services
Defining Data Integrator transforms
The following transforms are available in the Data Integrator branch of the Transforms tab in
the Local Object Library:
Transform: Description

Data Transfer: Allows a data flow to split its processing into two sub-data flows and push
down resource-consuming operations to the database server.

Date Generation: Generates a column filled with date values based on the start and end dates
and increment you specify.

Effective Date: Generates an additional effective-to column based on the primary key's
effective date.

Hierarchy Flattening: Flattens hierarchical data into relational tables so that it can participate
in a star schema. Hierarchy flattening can be both vertical and horizontal.

Map CDC Operation: Sorts input data, maps output data, and resolves before and after versions
for UPDATE rows. While commonly used to support Oracle or mainframe changed data
capture, this transform supports any data stream if its input requirements are met.

Pivot: Rotates the values in specified columns to rows.

Reverse Pivot: Rotates the values in specified rows to columns.

XML Pipeline: Processes large XML inputs in small batches.
Using the Pivot transform
Introduction
The Pivot and Reverse Pivot transforms let you convert columns to rows and rows back into
columns.
After completing this unit, you will be able to:
• Use the Pivot transform
Explaining the Pivot transform
The Pivot transform creates a new row for each value in a column that you identify as a pivot
column.
It allows you to change how the relationship between rows is displayed. For each value in each
pivot column, Data Services produces a row in the output data set. You can create pivot sets
to specify more than one pivot column.
For example, you could produce a list of discounts by quantity for certain payment terms so
that each type of discount is listed as a separate record, rather than each being displayed in a
unique column.
The Reverse Pivot transform reverses the process, converting rows into columns.
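Conceptually, the Pivot transform's row expansion can be sketched in a few lines of Python. This is only an illustration of the logic, not product code; the column names (EmployeeID, Emp_Salary, Comp, Comp_Type) are borrowed from the activity later in this unit:

```python
# Illustrative sketch of the Pivot transform's row expansion.
# Column names are hypothetical examples, not a product API.

def pivot(rows, non_pivot_cols, pivot_cols, data_col="Comp",
          header_col="Comp_Type", seq_col="Pivot_Seq"):
    """For each input row, emit one output row per pivot column."""
    out = []
    for row in rows:
        for seq, col in enumerate(pivot_cols, start=1):
            new_row = {c: row[c] for c in non_pivot_cols}  # unchanged columns
            new_row[seq_col] = seq          # pivot sequence number
            new_row[header_col] = col       # which column the value came from
            new_row[data_col] = row[col]    # the pivoted value itself
            out.append(new_row)
    return out

source = [{"EmployeeID": 1, "Emp_Salary": 50000,
           "Emp_Bonus": 5000, "Emp_VacationDays": 20}]
target = pivot(source, ["EmployeeID"],
               ["Emp_Salary", "Emp_Bonus", "Emp_VacationDays"])
# One source row becomes three target rows, one per compensation type.
```

Each output row pairs the pivoted value with the name of the column it came from, which is exactly the data field / header column pairing described below.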
The next section gives a brief description of the function, data input requirements, options, and
data output results for the Pivot transform. For more information on the Pivot transform see
“Transforms” Chapter 5 in the Data Services Reference Guide.
252 BusinessObjects Data Integrator XI 3.0: Core Concepts—Learner’s Guide
Inputs/Outputs Data inputs include a data set with rows flagged as NORMAL.
Data outputs include a data set with rows flagged as NORMAL. This target includes the
non-pivoted columns, a column for the sequence number, the data field column, and the pivot
header column.
Options The Pivot transform offers several options:
Option Description
Pivot sequence column Assign a name to the sequence number column. For each row created from a pivot column, Data Services increments and stores a sequence number.
Non-pivot columns Select the columns in the source that are to appear in the target without modification.
Pivot set Identify a number for the pivot set. For each pivot set, you define a group of pivot columns, a pivot data field, and a pivot header name.
Data field column Specify the column that contains the pivoted data. This column contains all of the pivot columns' values.
Header column Specify the name of the column that contains the pivoted column names. This column lists the names of the columns where the corresponding data originated.
Pivot columns Select the columns to be rotated into rows. Describe these columns in the Header column. Describe the data in these columns in the Data field column.
To pivot a table
1. Open the data flow workspace.
2. Add your source object to the workspace.
3. On the Transforms tab of the Local Object Library, click and drag the Pivot or Reverse Pivot
transform to the workspace to the right of your source object.
4. Add your target object to the workspace.
5. Connect the source object to the transform.
6. Connect the transform to the target object.
7. Double-click the Pivot transform to open the transform editor.
8. Click and drag any columns that will not be changed by the transform from the input schema
area to the Non-Pivot Columns area.
9. Click and drag any columns that will be pivoted from the input schema area to the Pivot
Columns area.
If required, you can create more than one pivot set by clicking Add.
10. If desired, change the values in the Pivot sequence column, Data field column, and Header
column fields.
These are the new columns that will be added to the target object by the transform.
11. Click Back to return to the data flow workspace.
Activity: Using the Pivot transform Currently, employee compensation information is loaded into a table with a separate column
each for salary, bonus, and vacation days. For reporting purposes, you need each of these items to be a separate record in the HR datamart.
Objective • Use the Pivot transform to create a separate row for each entry in a new employee
compensation table.
Instructions 1. In the Omega project, create a new batch job called Alpha_HR_Comp_Job with a data flow
called Alpha_HR_Comp_DF.
2. Add the HR_Comp_Update table from the Alpha datastore to the workspace as the source
object.
3. Add the Pivot transform and connect it to the source object.
4. Add the Query transform and connect it to the Pivot transform.
5. Create a new template table called Employee_Comp in the Delta datastore as the target object.
6. In the transform editor for the Pivot transform, specify that the EmployeeID and
date_updated fields are non-pivot columns.
7. Specify that the Emp_Salary, Emp_Bonus, and Emp_VacationDays fields are pivot columns.
8. Specify that the data field column is called Comp, and the header column is called Comp_Type.
9. In the transform editor for the Query transform, map all fields from input schema to output
schema.
10. On the WHERE tab, filter out NULL values for the Comp column.
The expression should be as follows:
Pivot.Comp is not null
11. Execute Alpha_HR_Comp_Job with the default execution properties and save all objects
you have created.
12. Return to the data flow workspace and view data for the target table.
A solution file called SOLUTION_Pivot.atl is included on your resource CD. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may overwrite the results in your target table.
Using the Hierarchy Flattening transform
Introduction The Hierarchy Flattening transform enables you to break down hierarchical table structures
into a single table to speed up data access.
After completing this unit, you will be able to:
• Use the Hierarchy Flattening transform
Explaining the Hierarchy Flattening transform The Hierarchy Flattening transform constructs a complete hierarchy from parent/child
relationships, and then produces a description of the hierarchy in horizontally- or
vertically-flattened format.
For horizontally-flattened hierarchies, each row of the output describes a single node in the
hierarchy and the path to that node from the root.
For vertically-flattened hierarchies, each row of the output describes a single relationship between ancestor and descendant and the number of nodes the relationship includes. There is a row in the output for each node and all of the descendants of that node. Each node is considered its own descendant and, therefore, is listed one time as both ancestor and descendant.
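Vertical flattening amounts to computing every (ancestor, descendant) pair together with its depth. A minimal Python sketch of that logic, assuming an acyclic hierarchy (this illustrates the idea only, not the product's implementation):

```python
# Sketch of vertical hierarchy flattening: from parent-child rows, produce one
# row per (ancestor, descendant) pair with its depth. Assumes the hierarchy
# has no cycles. Each node appears once as its own ancestor/descendant (depth 0).

def flatten_vertical(edges):
    """edges: list of (parent, child) pairs.
    Returns (ancestor, descendant, depth) rows."""
    children = {}
    nodes = set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        nodes.update((parent, child))

    rows = []
    for root in nodes:                      # every node acts as an ancestor
        stack = [(root, 0)]
        while stack:
            node, depth = stack.pop()
            rows.append((root, node, depth))
            for c in children.get(node, []):
                stack.append((c, depth + 1))
    return rows

# A small CEO -> VP -> Manager reporting chain
rows = flatten_vertical([("CEO", "VP"), ("VP", "Manager")])
```

Note that the depth-0 rows are the self-pairs; a filter such as the activity's `DEPTH > 0` WHERE clause removes them so only real reporting relationships remain.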
The next section gives a brief description of the function, data input requirements, options, and
data output results for the Hierarchy Flattening transform. For more information on the
Hierarchy Flattening transform see “Transforms” Chapter 5 in the Data Services Reference Guide.
Inputs/Outputs Data input includes rows describing individual parent-child relationships. Each row must
contain two columns that function as the keys of the parent and child in the relationship. The
input can also include columns containing attributes describing the parent and/or child.
The input data set cannot include rows with operations other than NORMAL, but can contain
hierarchical data.
For a listing of the target columns, consult the Data Services Reference Guide.
Options The Hierarchy Flattening transform offers several options:
Option Description
Parent column Identifies the column of the source data that contains the
parent identifier in each parent-child relationship.
Child column Identifies the column in the source data that contains the
child identifier in each parent-child relationship.
Flattening type Indicates how the hierarchical relationships are described
in the output.
Option Description
Use maximum length paths
Indicates whether longest or shortest paths are used to describe relationships between descendants and ancestors when the descendant has more than one parent.
Maximum depth Indicates the maximum depth of the hierarchy.
Parent attribute list Identifies a column or columns that are associated with
the parent column.
Child attribute list Identifies a column or columns that are associated with
the child column.
Run as a separate process
Creates a separate sub-data flow process for the
Hierarchy Flattening transform when Data Services
executes the data flow.
Activity: Using the Hierarchy Flattening transform The Employee table in the Alpha datastore contains employee data in a recursive hierarchy.
To determine all reports, direct or indirect, to a given executive or manager would require
complex SQL statements to traverse the hierarchy.
Objective • Flatten the hierarchy to allow more efficient reporting on data.
Instructions 1. In the Omega project, create a new batch job called Alpha_Employees_Report_Job with a
data flow called Alpha_Employees_Report_DF.
2. In the data flow workspace, add the Employee table from the Alpha datastore as the source
object.
3. Create a template table called Manager_Emps in the HR_datamart datastore as the target
object.
4. Add a Hierarchy Flattening transform to the right of the source table and connect the source
table to the transform.
5. In the transform editor for the Hierarchy Flattening transform, select the following options:
Option Value
Flattening Type Vertical
Parent Column REPORTSTO
Option Value
Child Column EMPLOYEEID
Child Attribute List
LASTNAME
FIRSTNAME
BIRTHDATE
HIREDATE
ADDRESS
CITYID
REGIONID
COUNTRYID
PHONE
DEPARTMENTID
LastUpdate
discharge_date
6. Add a Query transform to the right of the Hierarchy Flattening transform and connect the transforms.
7. In the transform editor of the Query transform, create the following output columns:
Column Datatype
MANAGERID varchar(10)
MANAGER_NAME varchar(50)
EMPLOYEEID varchar(10)
EMPLOYEE_NAME varchar(102)
DEPARTMENT varchar(50)
HIREDATE datetime
LASTUPDATE datetime
PHONE varchar(20)
Column Datatype
EMAIL varchar(50)
ADDRESS varchar(200)
CITY varchar(50)
REGION varchar(50)
COUNTRY varchar(50)
DISCHARGE_DATE datetime
DEPTH int
ROOT_FLAG int
LEAF_FLAG int
8. Map the output columns as follows:
Schema In Schema Out
ANCESTOR MANAGERID
DESCENDENT EMPLOYEEID
DEPTH DEPTH
ROOT_FLAG ROOT_FLAG
LEAF_FLAG LEAF_FLAG
C_ADDRESS ADDRESS
C_discharge_date DISCHARGE_DATE
C_EMAIL EMAIL
C_HIREDATE HIREDATE
C_LastUpdate LASTUPDATE
Schema In Schema Out
C_PHONE PHONE
9. Create a mapping expression for the MANAGER_NAME column to look up the manager's
last name from the Employee table in the Alpha datastore based on the employee ID in the
ANCESTOR column of the Hierarchy Flattening transform.
The expression should be:
lookup_ext([Alpha.alpha.employee, 'PRE_LOAD_CACHE', 'MAX'], [LASTNAME], [NULL],
[EMPLOYEEID, '=', Hierarchy_Flattening.ANCESTOR]) SET
("run_as_separate_process"='no')
10. Create a mapping expression for the EMPLOYEE_NAME column to concatenate the
employee's last name and first name, separated by a comma.
The expression should be:
Hierarchy_Flattening.C_LASTNAME || ', ' || Hierarchy_Flattening.C_FIRSTNAME
11. Create a mapping expression for the DEPARTMENT column to look up the name of the
employee's department from the Department table in the Alpha datastore based on the
C_DEPARTMENTID column of the Hierarchy Flattening transform.
The expression should be:
lookup_ext([Alpha.alpha.department, 'PRE_LOAD_CACHE', 'MAX'], [DEPARTMENTNAME],
[NULL], [DEPARTMENTID, '=', Hierarchy_Flattening.C_DEPARTMENTID]) SET
("run_as_separate_process"='no')
12. Create a mapping expression for the CITY column to look up the name of the employee's
city from the City table in the Alpha datastore based on the C_CITYID column of the
Hierarchy Flattening transform.
The expression should be:
lookup_ext([Alpha.alpha.city, 'PRE_LOAD_CACHE', 'MAX'], [CITYNAME], [NULL],
[CITYID, '=', Hierarchy_Flattening.C_CITYID]) SET
("run_as_separate_process"='no')
13. Create a mapping expression for the REGION column to look up the name of the employee's
region from the Region table in the Alpha datastore based on the C_REGIONID column of
the Hierarchy Flattening transform.
The expression should be:
lookup_ext([Alpha.alpha.region, 'PRE_LOAD_CACHE', 'MAX'], [REGIONNAME], [NULL],
[REGIONID, '=',Hierarchy_Flattening.C_REGIONID]) SET
("run_as_separate_process"='no')
14. Create a mapping expression for the COUNTRY column to look up the name of the
employee's country from the Country table in the Alpha datastore based on the
C_COUNTRYID column of the Hierarchy Flattening transform.
The expression should be:
lookup_ext([Alpha.alpha.country, 'PRE_LOAD_CACHE', 'MAX'], [COUNTRYNAME],
[NULL], [COUNTRYID, '=', Hierarchy_Flattening.C_COUNTRYID]) SET
("run_as_separate_process"='no')
15. Add a WHERE clause to the Query transform to return only rows where the depth is greater
than zero.
The expression should be as follows:
Hierarchy_Flattening.DEPTH > 0
16. Execute Alpha_Employees_Report_Job with the default execution properties and save all
objects you have created.
17. Return to the data flow workspace and view data for the target table.
Note that 179 rows were written to the target table.
A solution file called SOLUTION_HierarchyFlattening.atl is included on your resource CD. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may overwrite the results in your target table.
Describing performance optimization
Introduction You can improve the performance of your jobs by pushing down operations to the source or
target database to reduce the number of rows and operations that the engine must retrieve and
process.
After completing this unit, you will be able to:
• List operations that Data Services pushes down to the database
• View SQL code generated by a data flow
• Explore data caching options
• Explain process slicing
Describing push-down operations Data Services examines the database and its environment when determining which operations
to push down to the database:
• Full push-down operations
The Data Services optimizer always tries to do a full push-down operation. Full push-down operations are operations that can be pushed down to the databases so that the data streams directly from the source database to the target database. For example, Data Services sends SQL INSERT INTO... SELECT statements to the target database and it sends SELECT statements to retrieve data from the source.
Data Services can only do full push-down operations to the source and target databases when the following conditions are met:
○ All of the operations between the source table and target table can be pushed down
○ The source and target tables are from the same datastore or they are in datastores that
have a database link defined between them.
• Partial push-down operations
When a full push-down operation is not possible, Data Services tries to push down the
SELECT statement to the source database. Operations within the SELECT statement that
can be pushed to the database include:
Operation Description
Aggregations
Aggregate functions, typically used with a
Group by statement, always produce a data
set smaller than or the same size as the
original data set.
Operation Description
Distinct rows Data Services will only output unique rows
when you use distinct rows.
Filtering Filtering can produce a data set smaller than
or equal to the original data set.
Joins Joins typically produce a data set smaller
than or similar in size to the original tables.
Ordering
Ordering does not affect data set size. Data Services can efficiently sort data sets that fit in memory. Since Data Services does not perform paging (writing out intermediate results to disk), it is recommended that you use a dedicated disk-sorting program such as SyncSort or the DBMS itself to order very large data sets.
Projections
A projection normally produces a smaller data set because it only returns columns referenced by a data flow.
Functions
Most Data Services functions that have equivalents in the underlying database are appropriately translated.
Operations that cannot be pushed down
Data Services cannot push some transform operations to the database. For example:
• Expressions that include Data Services functions that do not have database correspondents.
• Load operations that contain triggers.
• Transforms other than Query.
• Joins between sources that are on different database servers that do not have database links
defined between them.
Similarly, not all operations can be combined into single requests. For example, when a stored
procedure contains a COMMIT statement or does not return a value, you cannot combine the
stored procedure SQL with the SQL for other operations in a query. You can only push
operations supported by the RDBMS down to that RDBMS.
Note: You cannot push built-in functions or transforms to the source database. For best
performance, do not intersperse built-in transforms among operations that can be pushed down
to the database. Database-specific functions can only be used in situations where they will be
pushed down to the database for execution.
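The benefit of a partial push-down can be illustrated outside the product with a small sqlite3 example: the same filter, applied in the SELECT statement sent to the database rather than in the engine after fetching, produces the same result while moving far fewer rows. The table and values are invented for the illustration:

```python
import sqlite3

# Contrast engine-side filtering with a filter pushed down into the SELECT.
# The orders table and its contents are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 10) for i in range(1000)])

# No push-down: fetch every row, then filter in the engine.
fetched = conn.execute("SELECT id, amount FROM orders").fetchall()
engine_side = [r for r in fetched if r[1] > 9900]

# Push-down: the database applies the WHERE clause, so only the
# qualifying rows ever leave the database.
pushed = conn.execute(
    "SELECT id, amount FROM orders WHERE amount > 9900").fetchall()
```

Both paths yield identical results, but without the push-down all 1000 rows had to be retrieved and processed by the engine first.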
Viewing SQL generated by a data flow Before running a job, you can view the SQL generated by the data flow and adjust your design
to maximize the SQL that is pushed down to improve performance. Alter your design to
improve the data flow when necessary.
Keep in mind that Data Services only shows the SQL generated for table sources. Data Services
does not show the SQL generated for SQL sources that are not table sources, such as the lookup
function, the Key Generation transform, the key_generation function, the Table Comparison
transform, and target tables.
To view SQL
1. In the Data Flows tab of the Local Object Library, right-click the data flow and select Display
Optimized SQL from the menu.
The Optimized SQL dialog box displays.
2. In the left pane, select the datastore for the data flow.
The optimized SQL for the datastore displays in the right pane.
Caching data You can improve the performance of data transformations that occur in memory by caching
as much data as possible. By caching data, you limit the number of times the system must
access the database. Cached data must fit into available memory.
Pageable caching Data Services allows administrators to select a pageable cache location to save content over the
2 GB RAM limit. The pageable cache location is set up in Server Manager and the option to use
pageable cache is selected on the Dataflow Properties dialog box.
Persistent caching Persistent cache datastores can be created through the Create New Datastore dialog box by
selecting Persistent Cache as the database type. The newly-created persistent cache datastore
will appear in the list of datastores, and can be used as a source in jobs.
For more information about advanced caching features, see the Data Services Performance
Optimization Guide.
Slicing processes You can also optimize your jobs through process slicing, which involves splitting data flows
into sub-data flows.
Sub-data flows work on smaller data sets and/or fewer transforms so there is less virtual
memory to consume per process. This way, you can leverage more physical memory per data
flow as each sub-data flow can access 2 GB of memory.
This functionality is available through the Advanced tab for the Query transform. You can run
each memory-intensive operation as a separate process.
For more information on process slicing, see the Data Services Performance Optimization Guide.
Using the Data Transfer transform
Introduction The Data Transfer transform allows a data flow to split its processing into two sub-data flows
and push down resource-consuming operations to the database server.
After completing this unit, you will be able to:
• Use the Data Transfer transform
Explaining the Data Transfer transform The Data Transfer transform moves data from a source or the output from another transform
into a transfer object and subsequently reads data from the transfer object. You can use the
Data Transfer transform to push down resource-intensive database operations that occur
anywhere within the data flow. The transfer type can be a relational database table, persistent
cache table, file, or pipeline.
Use the Data Transfer transform to:
• Push down operations to the database server when the transfer type is a database table. You
can push down resource-consuming operations such as joins, GROUP BY, and sorts.
• Define points in your data flow where you want to split processing into multiple sub-data
flows that each process part of the data. Data Services does not need to process the entire
input data in memory at one time. Instead, the Data Transfer transform splits the processing
among multiple sub-data flows that each use a portion of memory.
The next section gives a brief description of the function, data input requirements, options, and
data output results for the Data Transfer transform. For more information on the Data Transfer
transform see “Transforms” Chapter 5 in the Data Services Reference Guide.
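The idea behind the table transfer type can be sketched with sqlite3: once the in-flight rows are written to a transfer table, the join becomes plain SQL that the database server executes. All names here are invented, and this is a conceptual sketch, not how Data Services is implemented internally:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employeeid INTEGER, lastname TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(1, "Smith"), (2, "Jones")])

# Rows produced earlier in the (hypothetical) data flow, held by the engine.
comp_rows = [(1, "Salary", 50000), (2, "Bonus", 5000)]

# Data Transfer idea: write the in-flight rows to a transfer table...
conn.execute("CREATE TABLE pushdown_data "
             "(employeeid INTEGER, comp_type TEXT, comp INTEGER)")
conn.executemany("INSERT INTO pushdown_data VALUES (?, ?, ?)", comp_rows)

# ...so the join is pushed down and runs inside the database server,
# not in the engine.
joined = conn.execute("""
    SELECT e.lastname, p.comp_type, p.comp
    FROM employee e JOIN pushdown_data p ON e.employeeid = p.employeeid
    ORDER BY e.employeeid
""").fetchall()
```

This mirrors the activity later in this unit, where staging the Employee_Comp rows through a transfer table makes the join appear in the optimized SQL.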
Inputs/Outputs When the input data set for the Data Transfer transform is a table or file transfer type, the rows
must be flagged with the NORMAL operation code. When the input data set is a pipeline transfer
type, the rows can be flagged as any operation code.
The input data set must not contain hierarchical (nested) data.
Output data sets have the same schema and same operation code as the input data sets. In the
push down scenario, the output rows are in the sort or GROUP BY order.
The sub-data flow names use the following format, where n is the number of the data flow: dataflowname_n
The execution of the output depends on the temporary transfer type:
For Table or File temporary transfer types, Data Services automatically splits the data flow into
sub-data flows and executes them serially.
For Pipeline transfer types, Data Services splits the data flow into sub-data flows if you specify
the Run as a separate process option in another operation in the data flow. Data Services
executes these sub-data flows that use pipeline in parallel.
Activity: Using the Data Transfer transform The Data Transfer transform can be used to push data down to a database table so that it can
be processed by the database server rather than the Data Services Job Server. In this activity,
you will join data from two database schemas. When the Data Transfer transform is not used,
the join will occur on the Data Services Job Server. When the Data Transfer transform is added
to the data flow the join can be seen in the SQL Query by displaying the optimized SQL for the
data flow.
Objective • Use the Data Transfer transform to optimize performance.
Instructions 1. In the Omega project, create a new batch job called No_Data_Transfer_Job with a data flow
called No_Data_Transfer_DF.
2. In the Delta datastore, import the Employee_Comp table and add it to the
No_Data_Transfer_DF workspace as a source table.
3. Add the Employee table from the Alpha datastore as a source table.
4. Add a Query transform to the data flow workspace and attach both source tables to the
transform.
5. In the transform editor for the Query transform, add the LastName and BirthDate columns
from the Employee table and the Comp_Type and Comp columns from the Employee_Comp
table to the output schema.
6. Add a WHERE clause to join the tables on the EmployeeID columns.
7. Create a template table called Employee_Temp in the Delta datastore as the target object
and connect it to the Query transform.
8. Save the job.
9. In the Local Object Library, use the right-click menu for the No_Data_Transfer_DF data
flow to display the optimized SQL.
Note that the WHERE clause does not appear in either SQL statement.
10. In the Local Object Library, replicate the No_Data_Transfer_DF data flow and rename the
copy Data_Transfer_DF.
11. In the Local Object Library, replicate the No_Data_Transfer_Job job and rename the copy
Data_Transfer_Job.
12. Add the Data_Transfer_Job job to the Omega project.
13. Delete the No_Data_Transfer_DF data flow from the Data_Transfer_Job and add the
Data_Transfer_DF data flow to the job by dragging it from the Local Object Library to the
job's workspace.
14. Delete the connection between the Employee_Comp table and the Query transform.
15. Add a Data Transfer transform between the Employee_Comp table and the Query transform
and connect the three objects.
16. In the transform editor for the Data Transfer transform, select the following options:
Option Value
Transfer Type Table
Table Name alpha.pushdown_data
Note: You must manually enter the name of the table. The table is created when the job
runs and is dropped automatically at the end.
17. In the transform editor for the Query transform, update the WHERE clause to join the
Data_Transfer.employeeid and employee.employeeid fields. Verify the Comp_Type and
Comp columns are mapped to the Data Transfer transform.
18. Save the job.
19. In the Local Object Library, use the right-click menu for the Data_Transfer_DF data flow to
display the optimized SQL.
Note that the WHERE clause appears in the SQL statements.
A solution file called SOLUTION_DataTransfer.atl is included on your resource CD. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may overwrite the results in your target table.
Using the XML Pipeline transform
Introduction The XML Pipeline transform is used to process large XML files more efficiently by separating
them into small batches.
After completing this unit, you will be able to:
• Use the XML Pipeline transform
Explaining the XML Pipeline transform The XML Pipeline transform is used to process large XML files, one instance of a specified
repeatable structure at a time.
With this transform, Data Services does not need to read the entire XML input into memory
and build an internal data structure before performing the transformation.
This means that an NRDM (nested relational data model) structure is not required to represent the entire XML data input.
Instead, this transform uses a portion of memory to process each instance of a repeatable
structure, then continually releases and re-uses the memory to continuously flow XML data
through the transform.
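The same streaming idea can be illustrated with Python's xml.etree.ElementTree.iterparse, which likewise handles one instance of a repeatable element at a time and releases it, instead of building the whole document in memory. The document and its `<item>` element are invented examples:

```python
# Sketch of the streaming idea behind the XML Pipeline transform: parse one
# instance of a repeatable element at a time rather than loading the whole
# document. The XML content here is an invented example.
import io
import xml.etree.ElementTree as ET

xml_doc = io.BytesIO(b"""<PurchaseOrders>
  <item><name>widget</name></item>
  <item><name>gadget</name></item>
</PurchaseOrders>""")

names = []
for event, elem in ET.iterparse(xml_doc, events=("end",)):
    if elem.tag == "item":
        names.append(elem.findtext("name"))
        elem.clear()  # release the finished instance so memory stays bounded
```

Clearing each finished element is the analogue of the transform continually releasing and re-using memory as instances flow through.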
During execution, Data Services pushes operations of the streaming transform to the XML
source. Therefore, you cannot use a breakpoint between your XML source and an XML Pipeline
transform.
Note:
You can use the XML Pipeline transform to load into a relational or nested schema target. This
course focuses on loading XML data into a relational target.
For more information on constructing nested schemas for your target, refer to the Data Services
Designer Guide.
Inputs/Outputs You can use an XML file or XML message. You can also connect more than one XML Pipeline
transform to an XML source.
When connected to an XML source, the transform editor shows the input and output schema
structures as a root schema containing repeating and non-repeating sub-schemas represented
by these icons:
Icon Schema structure
Root schema and repeating sub-schema
Icon Schema structure
Non-repeating sub-schema
Keep in mind these rules when using the XML Pipeline transform:
• You cannot drag and drop the root level schema.
• You can drag and drop the same child object repeated times to the output schema, but only
if you give each instance of that object a unique name. Rename the mapped instance before
attempting to drag and drop the same object to the output again.
• When you drag and drop a column or sub-schema to the output schema, you cannot then
map the parent schema for that column or sub-schema. Similarly, when you drag and drop
a parent schema, you cannot then map an individual column or sub-schema from under
that parent.
• You cannot map items from two sibling repeating sub-schemas because the XML Pipeline
transform does not support Cartesian products (combining every row from one table with
every row in another table) of two repeatable schemas.
To take advantage of the XML Pipeline transform’s performance, always select a repeatable
column to be mapped. For example, if you map a repeatable schema column, the XML source
produces one row after parsing one item.
Avoid selecting non-repeatable columns that occur structurally after the repeatable schema
because the XML source must then assemble the entire structure of items in memory before
processing. Selecting non-repeatable columns that occur structurally after the repeatable schema
increases memory consumption to process the output into your target.
To map both the repeatable schema and a non-repeatable column that occurs after the repeatable
one, use two XML Pipeline transforms, and use the Query transform to combine the outputs
of the two XML Pipeline transforms and map the columns into one single target.
Options The XML Pipeline is streamlined to support massive throughput of XML data; therefore, it
does not contain additional options other than input and output schemas, and the Mapping
tab.
Activity: Using the XML Pipeline transform Purchase order information is stored in XML files that have repeatable purchase orders and
items, and a non-repeated Total Purchase Orders column. You must combine the customer
name, order date, order items, and the totals into a single relational target table, with one row
per customer per item.
Objectives • Use the XML Pipeline transform to extract XML data.
• Combine the rows required from both XML sources into a single target table joined using
a Query transform
Instructions 1. On the Formats tab of the Local Object Library, create a new file format for an XML schema
called PurchaseOrders_Format, based on the purchaseOrders.xsd file in the Activity_Source
folder. Use a root element of PurchaseOrders.
2. In the Omega project, create a new job called Alpha_Purchase_Orders_Job, with a data flow
called Alpha_Purchase_Orders_DF.
3. In the data flow workspace for Alpha_Purchase_Orders_DF, add the PurchaseOrders_Format file
format as the XML file source object.
4. In the format editor for the file format, point the file format to the pos.xml file in the
Activity_Source folder.
5. Add two instances of the XML Pipeline transform to the data flow workspace and connect
the source object to each.
6. In the transform editor for the first XML Pipeline transform, map the following columns:
Schema In Schema Out
customerName customerName
orderDate orderDate
7. Map the entire item repeatable schema from the input schema to the output schema.
8. In the transform editor for the second XML Pipeline transform, map the following columns:
Schema In Schema Out
customerName customerName
orderDate orderDate
totalPOs totalPOs
9. Add a Query transform to the data flow workspace and connect both XML Pipeline transforms to it.
10. In the transform editor for the Query transform, map both columns and the repeatable
schema from the first XML Pipeline transform from the input schema to the output schema.
Also map the totalPOs column from the second XML Pipeline transform.
11. Unnest the item repeatable schema.
12. Create a WHERE clause to join the inputs from the two XML Pipeline transforms on the
customerName column.
The expression should be as follows:
XML_Pipeline.customerName = XML_Pipeline_1.customerName
13. Add a new template table called Item_POs to the Delta datastore and connect the Query
transform to it.
14. Execute Alpha_Purchase_Orders_Job with the default execution properties and save all
objects you have created.
15. Return to the data flow workspace and view data for the target table.
A solution file called SOLUTION_XMLPipeline.atl is included on your resource CD. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may overwrite the results in your target table.
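For readers who want to see the data movement outside the Designer, here is a minimal Python sketch of what the completed data flow does: unnest the repeatable item schema and join the two pipeline outputs on customerName. The dictionaries below are hypothetical stand-ins for the parsed pos.xml content, not the actual file.

```python
# Sketch of the activity's logic: unnest the repeatable item schema and
# join the per-order rows with the totalPOs value on customerName.
# All input data here is invented for illustration.

orders = [  # output of the first XML Pipeline transform (items still nested)
    {"customerName": "Acme", "orderDate": "2008-01-15",
     "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
]
totals = [  # output of the second XML Pipeline transform
    {"customerName": "Acme", "orderDate": "2008-01-15", "totalPOs": 3},
]

# Unnest: one output row per customer per item (step 11).
unnested = [
    {"customerName": o["customerName"], "orderDate": o["orderDate"], **item}
    for o in orders for item in o["items"]
]

# Join on customerName (step 12's WHERE clause).
by_customer = {t["customerName"]: t["totalPOs"] for t in totals}
result = [dict(row, totalPOs=by_customer[row["customerName"]]) for row in unnested]

for row in result:
    print(row)
```

With the sample input, this yields two rows for Acme, one per item, each carrying the totalPOs value.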
Quiz: Using Data Integrator transforms
1. What is the Pivot transform used for?
2. What is the purpose of the Hierarchy Flattening transform?
3. What is the difference between horizontally and vertically flattened hierarchies?
4. List three things you can do to improve job performance.
5. Name three options that can be pushed down to the database.
Lesson summary
After completing this lesson, you are now able to:
• Describe the Data Integrator transforms
• Use the Pivot transform
• Use the Hierarchy Flattening transform
• Describe performance optimization
• Use the Data Transfer transform
• Use the XML Pipeline transform
Answer Key
This section contains the answers to the reviews and/or activities for the applicable lessons.
Quiz: Describing Data Services Page 28
1. List two benefits of using Data Services.
Answer:
○ Create a single infrastructure for data movement to enable faster and lower cost
implementation.
○ Manage data as a corporate asset independent of any single system.
○ Integrate data across many systems and re-use that data for many purposes.
○ Improve performance.
○ Reduce burden on enterprise systems.
○ Prepackage data solutions for fast deployment and quick return on investment (ROI).
○ Cleanse customer and operational data anywhere across the enterprise.
○ Enhance customer and operational data by appending additional information to increase
the value of the data.
○ Match and consolidate data at multiple levels within a single pass for individuals,
households, or corporations.
2. Which of these objects is single-use?
Answer:
b. Project
3. Place these objects in order by their hierarchy: data flows, jobs, projects, and work flows.
Answer: Projects, jobs, work flows, data flows.
4. Which tool do you use to associate a job server with a repository?
Answer: The Data Services Server Manager.
5. Which tool allows you to create a repository?
Answer: The Data Services Repository Manager.
6. What is the purpose of the Access Server?
Answer: The Access Server is a real-time, request-reply message broker that collects incoming
XML message requests, routes them to a real-time service, and delivers a message reply
within a user-specified time frame.
Quiz: Defining source and target metadata Page 66
1. What is the difference between a datastore and a database?
Answer: A datastore is a connection to a database.
2. What are the two methods in which metadata can be manipulated in Data Services objects?
What does each of these do?
Answer:
You can use an object’s options and properties settings to manipulate Data Services objects.
Options control the operation of objects. For example, the name of the database to connect
to is a datastore option.
Properties document the object. For example, the name of the datastore and the date on
which it was created are datastore properties. Properties are merely descriptive of the object
and do not affect its operation.
3. Which of the following is NOT a datastore type?
Answer:
d. File Format
4. What is the difference between a repository and a datastore?
Answer: A repository is a set of tables that hold system objects, source and target metadata,
and transformation rules. A datastore is an actual connection to a database that holds data.
Quiz: Creating batch jobs Page 99
1. Does a job have to be part of a project to be executed in the Designer?
Answer: Yes. Jobs can be created separately in the Local Object Library, but they must be
associated with a project in order to be executed.
2. How do you add a new template table?
Answer: Click and drag the Template Table icon from the tool palette or from the Datastores
tab of the Local Object Library to the workspace.
3. Name the objects contained within a project.
Answer: Examples of objects are: jobs, work flows, and data flows.
4. What factors might you consider when determining whether to run work flows or data
flows serially or in parallel?
Answer:
Consider the following:
○ Whether or not the flows are independent of each other
○ Whether or not the server can handle the processing requirements of flows running at
the same time (in parallel)
Quiz: Troubleshooting batch jobs Page 128
1. List some reasons why a job might fail to execute.
Answer: Incorrect syntax, Job Server not running, port numbers for Designer and Job Server
not matching.
2. Explain the View Data feature.
Answer: View Data allows you to look at the data for a source or target file.
3. What must you define in order to audit a data flow?
Answer: You must define audit points and audit rules when you want to audit a data flow.
4. True or false? The auditing feature is disabled when you run a job with the debugger.
Answer: True.
Quiz: Using functions, scripts, and variables Page 173
1. Describe the differences between a function and a transform.
Answer: Functions operate on single values, such as values in specific columns in a data
set. Transforms operate on data sets, creating, updating, and deleting rows of data.
2. Why are functions used in expressions?
Answer: Functions can be used in expressions to map return values as new output columns.
Adding output columns allows columns that are not in an input data set to be specified in
an output data set.
3. What does a lookup function do? How do the different variations of the lookup function
differ?
Answer: All lookup functions return one row for each row in the source. They differ in how
they choose which of several matching rows to return.
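A rough Python analogue of this behavior follows. The table, column names, and data are invented for illustration, and only the Min/Max policies mentioned in the next answer are modeled; Data Services' own lookup functions offer more options.

```python
# Rough analogue of a lookup: for every source row, return exactly one
# value from the translate table. When several rows match, a return
# policy (min or max here, mirroring Lookup_ext's Return Policy) picks one.
def lookup(source_rows, translate, key, return_col, policy=max):
    out = []
    for row in source_rows:
        matches = [t[return_col] for t in translate if t[key] == row[key]]
        out.append(policy(matches) if matches else None)
    return out

source = [{"cust_id": 1}, {"cust_id": 2}]
translate = [
    {"cust_id": 1, "credit_limit": 500},
    {"cust_id": 1, "credit_limit": 900},  # duplicate match for cust_id 1
    {"cust_id": 2, "credit_limit": 300},
]
print(lookup(source, translate, "cust_id", "credit_limit", policy=max))  # [900, 300]
print(lookup(source, translate, "cust_id", "credit_limit", policy=min))  # [500, 300]
```

Note that one value comes back per source row regardless of how many translate rows match; only the policy changes which value is chosen.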
4. What value would the Lookup_ext function return if multiple matching records were found
on the translate table?
Answer: It depends on the Return Policy setting (Min or Max).
5. Explain the differences between a variable and a parameter.
Answer: A parameter is an expression that passes a piece of information to a work flow,
data flow, or custom function when it is called in a job. A variable is a symbolic placeholder
for values.
6. When would you use a global variable instead of a local variable?
Answer:
○ When the variable will need to be used multiple times within a job.
○ When you want to reduce the development time required for passing values between
job components.
○ When you need to create a dependency between job level global variable name and job
components.
7. What is the recommended naming convention for variables in Data Services?
Answer: Variable names must be preceded by a dollar sign ($). Local variables start with
$L_, while global variables can be denoted by $G_.
8. Which object would you use to define a value that is constant in one environment, but may
change when a job is migrated to another environment?
Answer:
d. Substitution parameter
Quiz: Using platform transforms Page 203
1. What would you use to change a row type from NORMAL to INSERT?
Answer: The Map Operation transform.
2. What is the Case transform used for?
Answer: The Case transform simplifies branch logic in data flows by consolidating case or decision-making logic into one transform, with multiple paths defined in an expression table.
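As an illustration only, the routing behavior can be sketched in Python. The labels, predicates, and rows below are invented; in Data Services the expressions live in the transform's expression table.

```python
# Sketch of Case-transform routing: each label has an expression, and
# each input row is sent down the first path whose expression is true
# (comparable to the "row can be true for one case only" behavior).
def case_route(rows, paths):
    routed = {label: [] for label in paths}
    for row in rows:
        for label, predicate in paths.items():
            if predicate(row):
                routed[label].append(row)
                break
    return routed

rows = [{"region": "NA"}, {"region": "EU"}, {"region": "NA"}]
paths = {
    "north_america": lambda r: r["region"] == "NA",
    "default": lambda r: True,  # catch-all path, checked last
}
routed = case_route(rows, paths)
print(len(routed["north_america"]), len(routed["default"]))  # 2 1
```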
3. Name the transform that you would use to combine incoming data sets to produce a single
output data set with the same schema as the input data sets.
Answer: The Merge transform.
4. A validation rule consists of a condition and an action on failure. When can you use the
action on failure options in the validation rule?
Answer:
You can use the action on failure option only if:
○ The column value failed the validation rule.
○ The Send to Pass or Send to Both option is selected.
5. When would you use the Merge transform versus the SQL transform to merge records?
Answer: The SQL transform performs better than the Merge transform, so it should be used
whenever possible. However, the SQL transform cannot join records from file formats, so
you would need to use the Merge transform for those source objects.
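A minimal sketch of the Merge transform's semantics, which match SQL's UNION ALL: same-schema inputs are concatenated and duplicates are retained. The sample rows are invented.

```python
# Merge-transform semantics: concatenate same-schema inputs (UNION ALL),
# keeping duplicate rows. Sample rows are invented for illustration.
file_a = [{"id": 1, "city": "Paris"}, {"id": 2, "city": "Rome"}]
file_b = [{"id": 2, "city": "Rome"}, {"id": 3, "city": "Oslo"}]

merged = file_a + file_b  # 4 rows; the duplicate Rome row is kept
print(len(merged))  # 4
```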
Quiz: Setting up error handling Page 220
1. List the different strategies you can use to avoid duplicate rows of data when re-loading a
job.
Answer:
○ Using the auto-correct load option in the target table.
○ Including the Table Comparison transform in the data flow.
○ Designing the data flow to completely replace the target table during each execution.
○ Including a preload SQL statement to execute before the table loads.
2. True or false? You can only run a job in recovery mode after the initial run of the job has
been set to run with automatic recovery enabled.
Answer: True.
3. What are the two scripts in a manual recovery work flow used for?
Answer: The first script determines if recovery is required, usually by reading the status in
a status table. The second script updates the status table to indicate successful job execution.
4. Which of the following types of exception can you NOT catch using a try/catch block?
Answer:
b. Syntax errors
Quiz: Capturing changes in data Page 247
1. What are the two most important reasons for using CDC?
Answer: Improving performance and preserving history.
2. Which method of CDC is preferred for the performance gain of extracting the fewest rows?
Answer: Source-based CDC.
3. What is the difference between an initial load and a delta load?
Answer:
An initial load is the first population of a database using data acquisition modules for
extraction, transformation, and load. The first time you execute a batch job, Designer performs
an initial load to create the data tables and populate them.
A delta load incrementally loads data that has been changed or added since the last load
iteration. When you execute your job, the delta load may run several times, loading data
from the specified number of rows each time until all new data has been written to the target
database.
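The distinction can be sketched with an in-memory SQLite table. The table name, columns, and timestamps below are invented; a real delta load would typically track the last load time in a status table.

```python
# Sketch of an initial load vs. a timestamp-based delta load, using an
# in-memory SQLite target. All names and data are invented.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, updated TEXT)")

source = [(1, "2008-01-01"), (2, "2008-01-02")]

# Initial load: first population of the target table.
con.executemany("INSERT INTO target VALUES (?, ?)", source)
last_load = "2008-01-02"

# Delta load: only rows changed or added since the last load iteration.
source += [(3, "2008-02-01")]
delta = [r for r in source if r[1] > last_load]
con.executemany("INSERT INTO target VALUES (?, ?)", delta)

print(con.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 3
```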
4. What transforms do you typically use for target-based CDC?
Answer: Table Comparison, History Preserving, and Key Generation.
Quiz: Using Data Integrator transforms Page 273
1. What is the Pivot transform used for?
Answer: Use the Pivot transform when you want to group data from multiple columns into
one column while at the same time maintaining information linked to the columns.
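As a rough illustration, the column-to-row rotation the Pivot transform performs can be sketched in Python; the column names and values below are invented.

```python
# Sketch of the Pivot transform's effect: several quarter columns are
# rotated into (header, value) rows, keeping the non-pivot key column.
row = {"region": "West", "q1": 100, "q2": 120, "q3": 90}
pivot_columns = ["q1", "q2", "q3"]

pivoted = [
    {"region": row["region"], "quarter": col, "amount": row[col]}
    for col in pivot_columns
]
for r in pivoted:
    print(r)  # one output row per pivoted column
```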
2. What is the purpose of the Hierarchy Flattening transform?
Answer: The Hierarchy Flattening transform enables you to break down hierarchical table
structures into a single table to speed data access.
3. What is the difference between horizontally and vertically flattened hierarchies?
Answer:
With horizontally flattened hierarchies, each row of the output describes a single node in
the hierarchy and the path to that node from the root.
With vertically flattened hierarchies, each row of the output describes a single relationship
between an ancestor and a descendant and the number of nodes the relationship includes. There
is a row in the output for each node and all of the descendants of that node. Each node is
considered its own descendant and, therefore, is listed one time as both ancestor and
descendant.
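A small Python sketch of vertical flattening under these rules; the parent/child pairs are invented sample data.

```python
# Sketch of vertical hierarchy flattening: one output row per
# (ancestor, descendant, depth) triple, where every node also counts
# as its own descendant at depth 0.
children = {"root": ["a", "b"], "a": ["a1"], "b": [], "a1": []}

def descendants(node):
    """All (descendant, depth) pairs under node, including node itself."""
    pairs = [(node, 0)]
    for child in children[node]:
        pairs += [(d, depth + 1) for d, depth in descendants(child)]
    return pairs

rows = [(anc, desc, depth)
        for anc in children
        for desc, depth in descendants(anc)]

print(len(rows))  # 8: root has 4 descendants, a has 2, b and a1 have 1 each
```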
4. List three things you can do to improve job performance.
Answer:
Choose from the following:
○ Utilize the push-down operations.
○ View SQL generated by a data flow and adjust your design to maximize the SQL that is
pushed down to improve performance.
○ Use data caching.
○ Use process slicing.
5. Name three options that can be pushed down to the database.
Answer:
Choose from the following:
○ Aggregations (typically performed with a GROUP BY)
○ Distinct rows
○ Filtering
○ Joins
○ Ordering
○ Projections
○ Functions that have equivalents in the underlying database
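To see why push-down helps, compare a pushed-down aggregation with fetching detail rows, using an in-memory SQLite database as a stand-in (table name and data are invented).

```python
# Why push-down matters: letting the database do the GROUP BY returns
# one aggregated row per key instead of streaming every detail row to
# the engine for local aggregation.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("NA", 10), ("NA", 20), ("EU", 5)])

# Pushed down: the aggregation runs in the database; 2 rows come back.
pushed = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(pushed)  # [('EU', 5), ('NA', 30)]

# Not pushed down: all 3 detail rows are fetched, then aggregated locally.
detail = con.execute("SELECT region, amount FROM sales").fetchall()
print(len(detail))  # 3
```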