xfm stage


Upload: kantamnenisrinivas1

Post on 10-Mar-2015


Transformer

The Transformer stage is a processing stage. It appears under the processing category in the tool palette. Transformer stages allow you to create transformations to apply to your data. These transformations can be simple or complex and can be applied to individual columns in your data. Transformations are specified using a powerful set of functions such as date and time, logical, mathematical, null handling, number, raw, string, vector, type conversion, type casting, and utility functions. For complete details of these functions, refer to IBM WebSphere DataStage and QualityStage Parallel Job Developer Guide, SC18-9891-00.

A Transformer stage can have a single input and any number of outputs. It can also have a reject link, which takes any rows that have not been written to any of the output links because of a write failure or expression evaluation failure.

You might want to pass some data straight through the Transformer stage unaltered, but it is likely that you will want to transform data from some input columns before outputting it from the Transformer stage.

You can specify such an operation by entering a transform expression. The source of an output link column is defined in that column’s Derivation cell within the Transformer Editor. You can use the Expression Editor to enter expressions in this cell. You can also simply drag an input column to an output column’s Derivation cell to pass the data straight through the Transformer stage.

In addition to specifying derivation details for individual output columns, you can also specify constraints that operate on entire output links. A constraint is an expression that specifies criteria that data must meet before it can be passed to the output link. You can also specify a constraint otherwise link, which is an output link that carries all the data not output on other links, that is, rows that have not met the criteria.

Each output link is processed in turn. If the constraint expression evaluates to TRUE for an input row, the data row is output on that link. Conversely, if a constraint expression evaluates to FALSE for an input row, the data row is not output on that link.
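The interplay of derivations, constraints, and the otherwise link may be easier to follow in code. The following is a minimal Python sketch, not DataStage syntax: the `OutputLink` class, the `run_transformer` function, and the column names `id` and `amount` are all hypothetical and only model the behavior described above.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class OutputLink:
    """Hypothetical stand-in for a Transformer output link (not a DataStage API)."""

    def __init__(
        self,
        name: str,
        derivations: Dict[str, Callable[[Row], object]],
        constraint: Optional[Callable[[Row], bool]] = None,
        otherwise: bool = False,
    ) -> None:
        self.name = name
        self.derivations = derivations  # output column -> expression on the input row
        self.constraint = constraint    # None means "no constraint": every row passes
        self.otherwise = otherwise      # True marks the constraint otherwise link
        self.rows: List[Row] = []

    def write(self, row: Row) -> None:
        # Evaluate each output column's derivation against the input row.
        self.rows.append({col: derive(row) for col, derive in self.derivations.items()})


def run_transformer(rows: List[Row], links: List[OutputLink]) -> None:
    for row in rows:
        written = False
        # Each ordinary link is processed in turn; constraints are independent.
        for link in links:
            if link.otherwise:
                continue
            if link.constraint is None or link.constraint(row):
                link.write(row)
                written = True
        # The otherwise link receives rows that no other link accepted.
        if not written:
            for link in links:
                if link.otherwise:
                    link.write(row)


# Dragging an input column to a Derivation cell corresponds to a plain pass-through.
passthrough = {"id": lambda r: r["id"], "amount": lambda r: r["amount"]}

high = OutputLink("high", passthrough, constraint=lambda r: r["amount"] >= 100)
even = OutputLink("even", passthrough, constraint=lambda r: r["id"] % 2 == 0)
rest = OutputLink("rest", passthrough, otherwise=True)

run_transformer(
    [
        {"id": 1, "amount": 50},
        {"id": 2, "amount": 150},
        {"id": 3, "amount": 200},
        {"id": 4, "amount": 10},
    ],
    [high, even, rest],
)

for link in (high, even, rest):
    print(link.name, [r["id"] for r in link.rows])
```

In this sketch the row with `id` 2 satisfies both constraints and is written to both links, while the row with `id` 1 satisfies neither and reaches only the otherwise link.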

Constraint expressions on different links are independent. If you have more than one output link, an input row may result in a data row being output from some, none, or all of the output links.

You can also specify another output link, which takes rows that have not been written to any other links because of write failure or expression evaluation failure. This is specified outside the stage by adding a link and converting it to a reject link. This link is not shown in the Transformer metadata grid, and derives its metadata from the input link. Its column values are those in the input row that failed to be written.

Figure 2-111 and Figure 2-112 show an example of the configuration of a Transformer stage in a job (“J03_IL_LoadProductDim” on page 202 in the retail industry scenario described in “Retail industry scenario” on page 138), as follows:

1. Figure 2-111 shows the job that initially loads the Product dimension table. This is described in “J03_IL_LoadProductDim” on page 202 and is not repeated here. Instead, we only focus on the configuration of the Transformer stage in this job.
2. Figure 2-112 shows the Trim function being used to remove trailing blanks in all the input columns before being written to the output link Odbc_ProductDim. However, for the SKU column, a constraint is defined that the raw length of the value in this field must exceed 5 bytes before it can be passed to the output link.

Note: If you have enabled Runtime Column Propagation for an output link, you do not have to specify metadata for that link. IBM InfoSphere DataStage is flexible about metadata. It can cope with the situation where metadata is not fully defined. You can define part of your schema and specify that, if your job encounters extra columns that are not defined in the metadata when it actually runs, it will adopt these extra columns and propagate them through the rest of the job. This is known as runtime column propagation (RCP).

This can be enabled for a project via the IBM InfoSphere DataStage and QualityStage Admin, and set for individual links via the Output Page Columns tab for most stages, or in the Output page General tab for Transformer stages. You should always ensure that runtime column propagation is turned on if you want to use schema files to define column metadata.
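The effect of runtime column propagation can be illustrated with a small Python model. This is only a sketch of the idea, not how DataStage implements it: the function `apply_link_schema`, the `rcp_enabled` flag, and the column names are hypothetical, and the model simplifies by assuming that input and output columns share names.

```python
from typing import Callable, Dict

Row = Dict[str, object]


def apply_link_schema(
    row: Row,
    derivations: Dict[str, Callable[[Row], object]],
    rcp_enabled: bool,
) -> Row:
    """Build an output row from the columns defined on the link.

    With rcp_enabled, input columns that the link metadata does not define
    are adopted as-is and carried through (assumption: names are unchanged).
    """
    out: Row = {col: derive(row) for col, derive in derivations.items()}
    if rcp_enabled:
        for col, value in row.items():
            if col not in out:
                out[col] = value
    return out


# Only "sku" is defined in the link metadata; "price" and "color" are extra
# columns that appear at run time.
incoming = {"sku": "A1  ", "price": 9.5, "color": "red"}
defined = {"sku": lambda r: r["sku"].rstrip(" ")}

with_rcp = apply_link_schema(incoming, defined, rcp_enabled=True)
without_rcp = apply_link_schema(incoming, defined, rcp_enabled=False)

print(with_rcp)
print(without_rcp)
```

With propagation turned off, only the defined `sku` column reaches the output. With it turned on, the undefined `price` and `color` columns are passed along unchanged.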

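The Transformer configuration described for Figure 2-112 (trim trailing blanks from every column, and pass a row to Odbc_ProductDim only when the raw length of SKU exceeds 5 bytes) can be approximated in Python as follows. This is a hedged sketch, not the job's actual expressions. It makes several assumptions: the constraint is evaluated on the SKU value as it arrives, before trimming; raw length is taken as the UTF-8 byte count; `rstrip` stands in for the Trim function; and the `ProductName` column is made up for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]


def trim_trailing(value: str) -> str:
    # Stand-in for the Trim derivation in the figure: drops trailing blanks only.
    return value.rstrip(" ")


def sku_constraint(row: Row) -> bool:
    # Assumption: "raw length" is the byte length of the incoming, untrimmed value.
    return len(row["SKU"].encode("utf-8")) > 5


def load_product_dim(rows: List[Row]) -> Dict[str, List[Row]]:
    odbc_product_dim: List[Row] = []  # rows written to the output link
    skipped: List[Row] = []           # rows that failed the link constraint
    for row in rows:
        if sku_constraint(row):
            odbc_product_dim.append({col: trim_trailing(v) for col, v in row.items()})
        else:
            skipped.append(row)
    return {"Odbc_ProductDim": odbc_product_dim, "skipped": skipped}


result = load_product_dim(
    [
        {"SKU": "ABC123  ", "ProductName": "Widget   "},  # 8 bytes: passes
        {"SKU": "AB1", "ProductName": "Gadget "},         # 3 bytes: held back
        {"SKU": "ABCDE", "ProductName": "Gizmo"},         # exactly 5 bytes: held back
    ]
)

print(result["Odbc_ProductDim"])
print(len(result["skipped"]))
```

The third row shows the boundary case: a value of exactly 5 bytes does not exceed the limit, so it is not written to the output link.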