ISUG Melbourne Operators
-
8/13/2019 ISUG Melbourne Operators
1/17
2012 Data Migrators Pty. Ltd.
DataStage Operators: Understand Them, Then Roll Your Own!
-
Audience
Who has DataStage experience?
Who understands the difference between Parallel and Server jobs?
Who knows what an Operator is?
Who knows what OSH is?
-
Introduction
What happens when you click Run?
What's an operator?
What's OSH?
Creating your own operators
Summary
-
Introduction
We'll cover Parallel jobs
Operators are a Parallel concept
Rich functionality and connectivity
Linear scalability
How many people have built production Server jobs? Skills issue!?
-
How does a job run?
Compilation converts the graphical job into a shell script
The script runs in the Orchestrate Shell (OSH)
Each job becomes one OSH script
Each stage becomes one or more executable operators
Operators are (sort of) equivalent to commands in a Unix shell script, but they have multiple inputs and outputs, and hence a slightly different syntax from Unix shell scripts
-
Stage To Operator Mapping
Stage: Operator(s)
Sequential File: import, export
External Source: import
External Target: export
Transformer: transform
Aggregator: group
Join: innerjoin, leftouterjoin, rightouterjoin, fullouterjoin
Merge: merge
Lookup: lookup, oralookup, db2lookup, sybaselookup, sqlsrvrlookup
Funnel: sortfunnel, sequence
Sort: psort, tsort
Remove Duplicates: remdup
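When reading generated OSH it can help to go the other way, from an operator name back to the stage that probably produced it. A minimal Python sketch, covering only the rows in the table above (the product has many more stages and operators); `STAGE_OPERATORS` and `stages_for_operator` are hypothetical names:

```python
# Stage-to-operator mapping transcribed from the table above.
# Not an exhaustive list of DataStage stages or operators.
STAGE_OPERATORS: dict[str, list[str]] = {
    "Sequential File": ["import", "export"],
    "External Source": ["import"],
    "External Target": ["export"],
    "Transformer": ["transform"],
    "Aggregator": ["group"],
    "Join": ["innerjoin", "leftouterjoin", "rightouterjoin", "fullouterjoin"],
    "Merge": ["merge"],
    "Lookup": ["lookup", "oralookup", "db2lookup", "sybaselookup", "sqlsrvrlookup"],
    "Funnel": ["sortfunnel", "sequence"],
    "Sort": ["psort", "tsort"],
    "Remove Duplicates": ["remdup"],
}


def stages_for_operator(operator: str) -> list[str]:
    """Return every stage type in the table that can emit the given OSH operator."""
    return [stage for stage, ops in STAGE_OPERATORS.items() if operator in ops]
```

For example, `stages_for_operator("import")` gives both Sequential File and External Source. Note that an operator DataStage inserts on its own at runtime, such as a `tsort` ahead of a Join, will look like a Sort stage you never drew.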
-
#### STAGE: MyInput
## Operator
import
## Operator options
-schema record
  {final_delim=end, delim='|'}
(
  inRecord:int32;
  inAddr:string[];
  telNum:nullable string[];
)
-file 'C:\\Users\\isuser\\Documents\\MyData.csv'
-rejects continue
-reportProgress yes
-firstLineColumnNames
## General options
[ident('MyInput'); jobmon_ident('MyInput')]
## Outputs
0> [] 'Input:AddressIn.v';
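OSH Operator Syntax
Operator type: import
Operator parameters: -schema, -file, -rejects, -reportProgress, -firstLineColumnNames
Identification: [ident('MyInput'); jobmon_ident('MyInput')]
Interface(s): 0> [] 'Input:AddressIn.v'
End: the closing semicolon (;)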
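The five parts of an operator block can be assembled mechanically. A minimal Python sketch, assuming a hypothetical helper `render_operator` that numbers ports from 0 in list order and writes identification before the interfaces, as in the example; it omits the `##` comment lines and does not validate the options it is given:

```python
def render_operator(
    operator: str,
    options: list[str],
    ident: str,
    inputs: list[str],
    outputs: list[str],
) -> str:
    """Render one OSH operator block: type, parameters, identification,
    interface(s), then the terminating semicolon."""
    lines = [operator]                                             # operator type
    lines += options                                               # operator parameters
    lines.append(f"[ident('{ident}'); jobmon_ident('{ident}')]")  # identification
    lines += [f"{n}< [] '{ds}'" for n, ds in enumerate(inputs)]    # input interfaces
    lines += [f"{n}> [] '{ds}'" for n, ds in enumerate(outputs)]   # output interfaces
    return "\n".join(lines) + ";"                                  # end
```

Calling `render_operator("import", ["-rejects continue"], "MyInput", [], ["Input:AddressIn.v"])` yields a block whose last line is `0> [] 'Input:AddressIn.v';`, matching the shape of the generated example.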
-
An OSH Script
InputOperator {parameters}
0> [] FirstLink.v;
ProcessOperator {parameters}
0< [] FirstLink.v
0> [] SecondLink.v;
OutputOperator {parameters}
0< [] SecondLink.v;
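A script is a sequence of operator blocks, each ending in `;`, wired together so that a virtual dataset (`.v`) written on one operator's output port is read on the next operator's input port. A Python sketch assuming a purely linear chain, a hypothetical helper `chain_osh`, and hypothetical link names `Link1.v`, `Link2.v`, and so on:

```python
def chain_osh(operators: list[tuple[str, str]]) -> str:
    """Join (operator, parameters) pairs into one OSH script, wiring each
    operator's output 0 to the next operator's input 0 via a virtual dataset."""
    links = [f"Link{n}.v" for n in range(1, len(operators))]
    blocks = []
    for i, (name, params) in enumerate(operators):
        lines = [f"{name} {params}"]
        if i > 0:
            lines.append(f"0< [] {links[i - 1]}")  # read the link from the previous operator
        if i < len(operators) - 1:
            lines.append(f"0> [] {links[i]}")      # write the link for the next operator
        blocks.append("\n".join(lines) + ";")
    return "\n".join(blocks)
```

Passing the three operators from the slide reproduces the script above, with `Link1.v` and `Link2.v` in place of `FirstLink.v` and `SecondLink.v`. Operators with several inputs or outputs (Join, Lookup, Funnel) need a graph rather than a list.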
-
You're building OSH
You can create and execute your own OSH scripts
No DataStage Designer necessary!
[demo]
Programmatically generate DataStage jobs
Generate hundreds of jobs from Ruby, C, Python, etc.
Generate a bespoke job in response to a Web page submission, etc.
No compilation necessary (although Transformers are a bit special: they generate C++ code that must be compiled)
Start with a DataStage job in Designer
Use the generated OSH as a template
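The template approach can be sketched with Python's standard `string.Template`: take OSH that Designer generated, replace the parts that vary with `$placeholders`, and stamp out one script per input. The template below is hypothetical and abbreviated (real generated OSH carries full option lists and comments), and `generate_scripts`, the paths, and the `jobNNN` naming are assumptions for illustration:

```python
from string import Template

# Hypothetical template, abbreviated: in practice, paste the OSH that Designer
# generated for a working job and replace only the varying parts with $names.
OSH_TEMPLATE = Template("""\
import
-schema record {final_delim=end, delim='|'} (inRecord:int32; inAddr:string[];)
-file '$input_file'
-rejects continue
[ident('Read_$job_id')]
0> [] 'Addresses.v'
;
export
-schema record {final_delim=end, delim='|'} (inRecord:int32; inAddr:string[];)
-file '$output_file'
[ident('Write_$job_id')]
0< [] 'Addresses.v'
;
""")


def generate_scripts(input_files: list[str], out_dir: str) -> dict[str, str]:
    """Return {script name: OSH text}, one script per input file."""
    scripts: dict[str, str] = {}
    for n, path in enumerate(input_files, start=1):
        job_id = f"job{n:03d}"
        # substitute() raises KeyError if any $placeholder is left unfilled.
        scripts[f"{job_id}.osh"] = OSH_TEMPLATE.substitute(
            job_id=job_id,
            input_file=path,
            output_file=f"{out_dir}/{job_id}_out.txt",
        )
    return scripts
```

Using `substitute` rather than `safe_substitute` is deliberate: a script with an unfilled placeholder fails at generation time instead of at run time.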
-
Visualise OSH
Writing stand-alone OSH, or diagnosing generated OSH, can be very cumbersome.
Use a tool to visualise your OSH: http://gosh.datamigrators.com
[demo]
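Even without a tool, the links in a script can be recovered by matching each `N> []` output with the `N< []` inputs that read the same `.v` dataset. A rough Python sketch with a hypothetical helper `link_edges`; it is not how gosh works, and it assumes a simplified script like the one on the An OSH Script slide, because it splits operators on `;` and would therefore break on real schemas that contain `;` inside a record definition:

```python
import re
from collections import defaultdict

OPERATOR = re.compile(r"[A-Za-z_]\w*")
PORT = re.compile(r"(\d+)([<>])\s*\[\]\s*'?([\w:.]+\.v)'?")


def link_edges(osh: str) -> list[tuple[str, str, str]]:
    """Return (producer, dataset, consumer) edges for a simplified OSH script."""
    blocks = []
    for raw in osh.split(";"):
        # Drop '#' comment lines such as the '## Operator' headers in generated OSH.
        body = "\n".join(
            line for line in raw.splitlines() if not line.lstrip().startswith("#")
        ).strip()
        if body:
            blocks.append(body)

    producers: dict[str, str] = {}
    consumers: dict[str, list[str]] = defaultdict(list)
    for i, block in enumerate(blocks):
        match = OPERATOR.match(block)
        if match is None:
            continue
        node = f"{match.group(0)}[{i}]"  # index disambiguates repeated operator types
        for _port, direction, dataset in PORT.findall(block):
            if direction == ">":
                producers[dataset] = node
            else:
                consumers[dataset].append(node)

    return [
        (producer, dataset, consumer)
        for dataset, producer in producers.items()
        for consumer in consumers.get(dataset, [])
    ]
```

Each edge maps directly onto a Graphviz DOT line such as `"InputOperator[0]" -> "ProcessOperator[1]" [label="FirstLink.v"];`, which is enough for a quick picture of a small script.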
-
OSH At Runtime
A node configuration file tells DataStage:
How to execute multiple parallel instances of your job
How to map operators to O/S processes
DataStage may combine operators
Good for performance, bad for debugging
Can disable this with $APT_DISABLE_COMBINATION
DataStage may add additional operators to your job
E.g. Sort or Partition operators to ensure correct operation of Join operators
Can disable these with $APT_NO_SORT_INSERTION (sorts) and $APT_NO_PART_INSERTION (partitioners)
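A hedged Python sketch of these runtime knobs: generating a simple N-node configuration file with every node on one host, and building an environment for a debugging run in which the job runs as written. The host name, paths, and the value `"True"` are assumptions to check against your installation; `node_config` and `debug_env` are hypothetical helpers, and `$APT_CONFIG_FILE` and `$APT_NO_PART_INSERTION` are standard variables that do not appear on the slide:

```python
import os


def node_config(nodes: int, fastname: str, disk: str, scratch: str) -> str:
    """Build a simple N-node configuration file with every node on one host."""
    entries = []
    for n in range(1, nodes + 1):
        entries.append(
            f'\tnode "node{n}"\n'
            "\t{\n"
            f'\t\tfastname "{fastname}"\n'
            '\t\tpools ""\n'
            f'\t\tresource disk "{disk}" {{pools ""}}\n'
            f'\t\tresource scratchdisk "{scratch}" {{pools ""}}\n'
            "\t}"
        )
    return "{\n" + "\n".join(entries) + "\n}\n"


def debug_env(config_path: str) -> dict[str, str]:
    """Copy the current environment and switch off the runtime rewrites
    described above, so the running job matches the OSH you wrote."""
    env = dict(os.environ)
    env["APT_CONFIG_FILE"] = config_path       # which node configuration to use
    env["APT_DISABLE_COMBINATION"] = "True"    # do not combine operators into shared processes
    env["APT_NO_SORT_INSERTION"] = "True"      # no automatically inserted sorts
    env["APT_NO_PART_INSERTION"] = "True"      # no automatically inserted partitioners
    return env
```

Writing `node_config(4, ...)` to a file and passing `debug_env(that_path)` to whatever launches the job (for example `subprocess.run(..., env=...)`) gives a four-way parallel run with no combination or inserted operators. Switch these settings back off afterwards, since combination and insertion exist for performance and correctness.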
-
The Orchestra
-
4 Ways to Integrate Custom Functionality
Transformer functions
Built using any language that can compile into a shared library
Called once per data item
Integrated into the transform operator
Wrapped Stages
Pipe rows through operating system commands
Slowest performance!
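The command behind a Wrapped stage is just a program that reads rows on stdin and writes rows on stdout. A minimal Python sketch of such a filter, assuming the stage is set up to exchange pipe-delimited text lines and that the second column holds an address; `clean_rows` and `main` are hypothetical names. Every row being serialised to text and parsed back is one reason this route tends to be the slowest:

```python
import sys
from typing import Iterable, Iterator

DELIM = "|"  # assumed delimiter, matching delim='|' in the import schema earlier


def clean_rows(lines: Iterable[str]) -> Iterator[str]:
    """Collapse whitespace in, and upper-case, the second column of each row."""
    for line in lines:
        fields = line.rstrip("\n").split(DELIM)
        if len(fields) > 1:
            fields[1] = " ".join(fields[1].split()).upper()
        yield DELIM.join(fields) + "\n"


def main() -> None:
    # The wrapped stage writes rows to this process's stdin and reads the
    # results back from its stdout, one text line per row.
    sys.stdout.writelines(clean_rows(sys.stdin))
```

Saved as a script ending in `if __name__ == "__main__": main()`, the filter can be tested from a shell with a `printf ... |` pipe before it is wrapped.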
-
4 Ways to Integrate Custom Functionality
Build Stages
A GUI custom operator constructor
Built in C/C++ with helper macros, e.g. readRecord(), writeRecord(), doTransfer()
Some restrictions, e.g. a minimum of 1 input and 1 output
Custom Stages
Built in C/C++
Fewer restrictions than a Build Stage, e.g.:
Can create data sources and data targets
Can create combinable operators
Documented in the Custom Operator Reference
Both of these:
Create a native OSH operator
Offer high performance
Have a custom icon
Provide a native DataStage interface
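The per-record control flow of a Build stage can be modelled in a few lines. This is a Python model only, not the C++ Build stage API: the comments show which step each line stands in for (readRecord(), doTransfer(), writeRecord()), and it assumes the Pre-Loop, Per-Record and Post-Loop code sections of the Build stage editor. `run_build_stage` and `Record` are hypothetical names:

```python
from typing import Callable

Record = dict[str, object]


def run_build_stage(
    rows: list[Record],
    per_record: Callable[[Record, Record], None],
    pre_loop: Callable[[], None] = lambda: None,
    post_loop: Callable[[], None] = lambda: None,
) -> list[Record]:
    """Python model of a Build stage looping over one input and one output port."""
    output: list[Record] = []
    pre_loop()                   # Pre-Loop code: runs once before any record
    for record in rows:          # stands in for readRecord() on input 0
        out = dict(record)       # stands in for doTransfer(): copy input columns across
        per_record(record, out)  # Per-Record code written by the stage author
        output.append(out)       # stands in for writeRecord() on output 0
    post_loop()                  # Post-Loop code: runs once after the last record
    return output
```

The one-input, one-output shape of this loop is the restriction the slide mentions. A Custom Stage written against the operator API is not tied to it.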
-
Example
Experian QAS Batch
Postal address cleaning solution
A bespoke database
A C/C++ API which provides:
Start(), Open(), Clean(), Close(), Shutdown()
That's it!
We integrated QAS Batch so it runs as an operator
Scales QAS Batch performance linearly
QAS Batch is now grid-enabled
[demo]
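The shape of that integration can be sketched in Python: pay the expensive Start()/Open() once per partition, call Clean() once per record, and always Close()/Shutdown(). `FakeCleaner` is a hypothetical stand-in whose method names mirror the slide; it is not the QAS API, and threads here only stand in for the separate processes DataStage would start for each partition:

```python
from concurrent.futures import ThreadPoolExecutor


class FakeCleaner:
    """Hypothetical stand-in for a library exposing Start/Open/Clean/Close/Shutdown."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def start(self) -> None:
        self.calls.append("start")

    def open(self) -> None:
        self.calls.append("open")

    def clean(self, address: str) -> str:
        # Placeholder "cleaning": collapse whitespace and upper-case.
        return " ".join(address.split()).upper()

    def close(self) -> None:
        self.calls.append("close")

    def shutdown(self) -> None:
        self.calls.append("shutdown")


def clean_partition(records: list[str], cleaner: FakeCleaner) -> list[str]:
    """Start/Open once, Clean every record, then always Close/Shutdown."""
    cleaner.start()
    try:
        cleaner.open()
        try:
            return [cleaner.clean(r) for r in records]
        finally:
            cleaner.close()
    finally:
        cleaner.shutdown()


def run_partitioned(records: list[str], partitions: int) -> list[list[str]]:
    """Round-robin the records across partitions, one cleaner per partition."""
    parts = [records[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        return list(pool.map(lambda p: clean_partition(p, FakeCleaner()), parts))
```

Because each partition owns its own session and no state is shared, adding nodes to the configuration file adds sessions, which is what lets throughput grow with the node count.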
-
Summary
Don't fear the OSH!
It represents your real DataStage job
It tells you what's really happening under the hood
Understanding it can help with performance diagnosis
OSH scripts can be auto-generated
Build an operator
They're fast
They're reusable
They can be used to integrate virtually anything, seamlessly
If you can do it in C/C++, then you can build an operator for it
They open new possibilities
-
Fin