isug melbourne operators

Upload: rajeshdatastage

Post on 04-Jun-2018

  • 8/13/2019 ISUG Melbourne Operators

    1/17

    2012 Data Migrators Pty. Ltd.

    DataStage Operators: Understand Them, Then Roll Your Own!

    Audience

    Who has DataStage experience?

    Who understands the difference between Parallel and Server jobs?

    Who knows what an Operator is?

    Who knows what OSH is?

    Introduction

    What happens when you click Run?

    What's an operator?

    What's OSH?

    Creating your own operators

    Summary

    Introduction

    We'll cover Parallel jobs; operators are a Parallel concept

    Parallel jobs offer rich functionality and connectivity, and linear scalability

    How many people have built production Server jobs? Skills issue!?

    How Does a Job Run?

    Compilation converts the graphical job into a shell script

    The script runs in the Orchestrate Shell (OSH)

    Each job becomes one OSH script

    Each stage becomes one or more executable operators

    Operators are (sort of) equivalent to commands in a Unix shell script, but they have multiple inputs and outputs, and hence a slightly different syntax from Unix shell scripts

    Stage To Operator Mapping

    Stage               Operator(s)
    Sequential File     import, export
    External Source     import
    External Target     export
    Transformer         transform
    Aggregator          group
    Join                innerjoin, leftouterjoin, rightouterjoin, fullouterjoin
    Merge               merge
    Lookup              lookup, oralookup, db2lookup, sybaselookup, sqlsrvrlookup
    Funnel              sortfunnel, sequence
    Sort                psort, tsort
    Remove Duplicates   remdup
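When generating OSH from code, the mapping above can be treated as a lookup table. A minimal Python sketch, assuming a generator that must choose an operator per stage: the `STAGE_OPERATORS` and `JOIN_TYPES` tables and the `operators_for` / `join_operator` helpers are hypothetical names, and the join-type keys are illustrative labels rather than DataStage property values.

```python
# Hypothetical lookup transcribed from the Stage To Operator Mapping table.
STAGE_OPERATORS: dict[str, list[str]] = {
    "Sequential File": ["import", "export"],
    "External Source": ["import"],
    "External Target": ["export"],
    "Transformer": ["transform"],
    "Aggregator": ["group"],
    "Join": ["innerjoin", "leftouterjoin", "rightouterjoin", "fullouterjoin"],
    "Merge": ["merge"],
    "Lookup": ["lookup", "oralookup", "db2lookup", "sybaselookup", "sqlsrvrlookup"],
    "Funnel": ["sortfunnel", "sequence"],
    "Sort": ["psort", "tsort"],
    "Remove Duplicates": ["remdup"],
}

# Hypothetical join-type labels; each picks exactly one of the Join operators.
JOIN_TYPES: dict[str, str] = {
    "inner": "innerjoin",
    "left outer": "leftouterjoin",
    "right outer": "rightouterjoin",
    "full outer": "fullouterjoin",
}


def operators_for(stage: str) -> list[str]:
    """Return the candidate OSH operators for a stage type."""
    try:
        return STAGE_OPERATORS[stage]
    except KeyError:
        raise ValueError(f"no operator mapping for stage {stage!r}") from None


def join_operator(join_type: str) -> str:
    """Pick the single join operator for a join type (case-insensitive)."""
    return JOIN_TYPES[join_type.lower()]


print(operators_for("Sort"))        # ['psort', 'tsort']
print(join_operator("Left Outer"))  # leftouterjoin
```

One stage can map to several operators, so a real generator would also need the stage's properties (join type, sort mode, database type) to choose between them.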

```
#### STAGE: MyInput
## Operator
import
## Operator options
-schema record
  {final_delim=end, delim='|'}
(
  inRecord:int32;
  inAddr:string[];
  telNum:nullable string[];
)
-file 'C:\\Users\\isuser\\Documents\\MyData.csv'
-rejects continue
-reportProgress yes
-firstLineColumnNames

## General options
[ident('MyInput'); jobmon_ident('MyInput')]
## Outputs
0> [] 'Input:AddressIn.v'
;
```

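    OSH Operator Syntax

    Operator type: import

    Operator parameters: -schema, -file, -rejects, -reportProgress, -firstLineColumnNames

    Identification: [ident('MyInput'); jobmon_ident('MyInput')]

    Interface(s): 0> [] 'Input:AddressIn.v'

    End: ;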
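These five parts are regular enough to emit from code. As a hedged sketch (the `OshOperator` class is hypothetical, not part of DataStage, and the multi-line `-schema` option is left out to keep it short), an operator block modelled on the `import` example could be rendered like this:

```python
from dataclasses import dataclass, field


@dataclass
class OshOperator:
    """The five parts of an OSH operator block (hypothetical helper)."""
    op_type: str                                      # operator type, e.g. "import"
    options: list[str] = field(default_factory=list)  # operator parameters
    ident: str = ""                                   # identification
    inputs: list[str] = field(default_factory=list)   # input virtual datasets
    outputs: list[str] = field(default_factory=list)  # output virtual datasets

    def render(self) -> str:
        lines = [f"#### STAGE: {self.ident}", "## Operator", self.op_type]
        if self.options:
            lines.append("## Operator options")
            lines.extend(self.options)
        lines.append("## General options")
        lines.append(f"[ident('{self.ident}'); jobmon_ident('{self.ident}')]")
        if self.inputs:
            lines.append("## Inputs")
            # Input ports are numbered from 0: 0<, 1<, ...
            lines.extend(f"{i}< [] '{ds}'" for i, ds in enumerate(self.inputs))
        if self.outputs:
            lines.append("## Outputs")
            # Output ports are numbered from 0: 0>, 1>, ...
            lines.extend(f"{i}> [] '{ds}'" for i, ds in enumerate(self.outputs))
        lines.append(";")  # end of the operator block
        return "\n".join(lines)


op = OshOperator(
    op_type="import",
    options=["-file 'MyData.csv'", "-rejects continue"],
    ident="MyInput",
    outputs=["Input:AddressIn.v"],
)
print(op.render())
```

Lines starting with `#` are comments, so the `##` section labels are only there to mirror the generated layout.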

    An OSH Script

```
InputOperator {parameters}
0> [] FirstLink.v;

ProcessOperator {parameters}
0< [] FirstLink.v
0> [] SecondLink.v;

OutputOperator {parameters}
0< [] SecondLink.v;
```
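Chaining operators amounts to naming each virtual dataset (`*.v`) once as an output (`0>`) and once as an input (`0<`). A minimal Python sketch that reproduces the three-operator script, assuming a straight pipeline with one input and one output port per operator; `linear_osh_script` is a hypothetical helper, and the operator names are the slide's placeholders:

```python
def linear_osh_script(operators: list[tuple[str, str]], links: list[str]) -> str:
    """Chain operators so each one reads the previous operator's output link.

    operators: (operator_name, parameters) pairs in data-flow order.
    links: virtual dataset names, exactly one between each adjacent pair.
    """
    if len(links) != len(operators) - 1:
        raise ValueError("need exactly one link between each pair of operators")
    blocks = []
    for i, (name, params) in enumerate(operators):
        lines = [f"{name} {params}"]
        if i > 0:
            lines.append(f"0< [] {links[i - 1]}")  # input port 0: previous link
        if i < len(links):
            lines.append(f"0> [] {links[i]}")      # output port 0: next link
        blocks.append("\n".join(lines) + ";")      # ';' ends each operator
    return "\n\n".join(blocks)


print(linear_osh_script(
    [("InputOperator", "{parameters}"),
     ("ProcessOperator", "{parameters}"),
     ("OutputOperator", "{parameters}")],
    ["FirstLink.v", "SecondLink.v"],
))
```

Operators with several inputs (such as a join) would need numbered ports (`0<`, `1<`), which this linear sketch deliberately does not handle.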

    You're Building OSH

    You can create and execute your own OSH scripts. No DataStage Designer necessary!

    [demo]

    Programmatically generate DataStage jobs, e.g. generate hundreds of jobs from Ruby, C, Python, etc., or generate a bespoke job in response to a Web page submission

    No compilation necessary (although transformers are a bit special)

    Start with a DataStage job in Designer, and use the generated OSH as a template
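A sketch of the template approach, assuming Designer-generated OSH has been pasted into a string and the parts that vary have been marked. Here the `import` operator from the earlier example is parameterised by stage name and input file; the `@name` placeholder syntax, `OshTemplate`, `JOB_TEMPLATE` and `render_jobs` are hypothetical choices, and the downstream operators a real job needs are omitted.

```python
from string import Template


class OshTemplate(Template):
    """Template with '@name' placeholders: OSH text already uses '$'
    (environment variables) and '{...}' (schema properties)."""
    delimiter = "@"


# Abbreviated from the generated import operator; downstream operators omitted.
JOB_TEMPLATE = OshTemplate("""\
#### STAGE: @stage
## Operator
import
## Operator options
-schema record
  {final_delim=end, delim='|'}
(
  inRecord:int32;
  inAddr:string[];
  telNum:nullable string[];
)
-file '@file'
-rejects continue
-reportProgress yes
-firstLineColumnNames
## General options
[ident('@stage'); jobmon_ident('@stage')]
## Outputs
0> [] '@stage:AddressIn.v'
;
""")


def render_jobs(data_files: list[str]) -> dict[str, str]:
    """Return {script file name: OSH text}, one script per input file."""
    jobs = {}
    for n, data_file in enumerate(data_files, start=1):
        # substitute() raises KeyError if a placeholder is left unfilled.
        jobs[f"job_{n:03d}.osh"] = JOB_TEMPLATE.substitute(
            stage=f"MyInput{n}", file=data_file)
    return jobs
```

Each value could then be written to its own file and executed with the `osh` command; the exact invocation depends on the installation, so it is not shown here.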

    Visualise OSH

    Writing stand-alone OSH, or diagnosing generated OSH, can be very cumbersome.

    Use a tool to visualise your OSH: http://gosh.datamigrators.com [demo]
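The core of such a visualiser is recovering which operator writes, and which reads, each virtual dataset. The Python below is a rough sketch, not a description of how the tool above works (its internals aren't covered here). It assumes interface lines look like the examples in this deck (`0> [] 'Input:AddressIn.v'`, `0< [] FirstLink.v;`) and that an operator block ends at a bare `;` or at an interface line ending in `;`; it emits Graphviz DOT. `osh_edges` and `to_dot` are hypothetical names.

```python
import re

# Interface line: port number, direction, optional [] adapter, dataset name
# (optionally quoted), and an optional trailing ';' that ends the operator.
INTERFACE = re.compile(r"^(\d+)([<>])\s*(?:\[[^\]]*\]\s*)?'?([^'\s;]+)'?\s*(;?)$")
# Standalone ident('...'); \b stops it matching inside jobmon_ident('...').
IDENT = re.compile(r"\bident\('([^']+)'\)")


def osh_edges(script: str) -> list[tuple[str, str, str]]:
    """Return (producer, consumer, dataset) edges found in an OSH script."""
    producers: dict[str, str] = {}
    consumers: dict[str, list[str]] = {}
    name, inputs, outputs = None, [], []
    for raw in script.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                          # skip blank and comment lines
        if name is None:
            name = line.split()[0]            # first token: operator type
        ident = IDENT.search(line)
        if ident:
            name = ident.group(1)             # prefer the stage identifier
        iface = INTERFACE.match(line)
        if iface:
            _port, direction, dataset, _end = iface.groups()
            (inputs if direction == "<" else outputs).append(dataset)
        if line == ";" or (iface and iface.group(4) == ";"):
            for ds in outputs:                # operator finished: record links
                producers[ds] = name
            for ds in inputs:
                consumers.setdefault(ds, []).append(name)
            name, inputs, outputs = None, [], []
    return [(producers.get(ds, "?"), consumer, ds)
            for ds, names in consumers.items() for consumer in names]


def to_dot(edges: list[tuple[str, str, str]]) -> str:
    """Render edges as a Graphviz digraph, labelled with the dataset name."""
    body = [f'  "{src}" -> "{dst}" [label="{ds}"];' for src, dst, ds in edges]
    return "\n".join(["digraph osh {", *body, "}"])


SCRIPT = """\
InputOperator {parameters}
0> [] FirstLink.v;

ProcessOperator {parameters}
0< [] FirstLink.v
0> [] SecondLink.v;

OutputOperator {parameters}
0< [] SecondLink.v;
"""
print(to_dot(osh_edges(SCRIPT)))
```

Semicolons inside schemas (`inRecord:int32;`) and inside `[ident(...); ...]` are ignored because only bare `;` lines and interface lines can end a block.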

    OSH At Runtime

    A node configuration file tells DataStage how to execute multiple parallel instances of your job, and how to map operators to O/S processes

    DataStage may combine operators: good for performance, bad for debugging. You can disable this with $APT_DISABLE_COMBINATION

    DataStage may add additional operators to your job, e.g. Sort or Partition operators to ensure correct operation of Join operators. You can disable sort insertion with $APT_NO_SORT_INSERTION (partitioner insertion has its own switch, $APT_NO_PART_INSERTION)
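A minimal sketch of preparing that runtime environment from Python. It assumes the configuration file is located through `$APT_CONFIG_FILE` and uses the `node` / `fastname` / `pools` / `resource` layout shown; the helper names, host name and paths are hypothetical, and setting a switch to `"1"` is an assumption (check how your version interprets these variables).

```python
import os


def node_config(hostname: str, nodes: int, disk: str, scratch: str) -> str:
    """Text of a simple configuration file: `nodes` logical nodes on one
    host, all sharing one resource disk and one scratch disk."""
    if nodes < 1:
        raise ValueError("need at least one node")
    lines = ["{"]
    for n in range(1, nodes + 1):
        lines += [
            f'  node "node{n}"',
            "  {",
            f'    fastname "{hostname}"',
            '    pools ""',
            f'    resource disk "{disk}" {{pools ""}}',
            f'    resource scratchdisk "{scratch}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


def job_environment(config_file: str, combine: bool = True,
                    insert_sorts: bool = True) -> dict[str, str]:
    """Copy of the current environment with the runtime switches applied."""
    env = dict(os.environ)
    env["APT_CONFIG_FILE"] = config_file
    # Assumption: a set variable turns the behaviour off; unset leaves defaults.
    if combine:
        env.pop("APT_DISABLE_COMBINATION", None)
    else:
        env["APT_DISABLE_COMBINATION"] = "1"
    if insert_sorts:
        env.pop("APT_NO_SORT_INSERTION", None)
    else:
        env["APT_NO_SORT_INSERTION"] = "1"
    return env
```

The returned dictionary would be passed as the environment of whatever launches the job. Changing the node count only means editing the configuration file, not the OSH script.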

    The Orchestra

    4 Ways to Integrate Custom Functionality

    Transformer functions: built using any language that can compile into a shared library; called once per data item; integrated into the transform operator

    Wrapped Stages: pipe rows through operating system commands

    Slowest performance!
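To make the Wrapped stage idea concrete: the wrapped command is simply a program that reads rows on stdin and writes rows on stdout. A hypothetical Python filter of that kind, assuming pipe-delimited text rows in the `inRecord|inAddr|telNum` layout from the import example (the layout, `clean_rows` and `main` are illustrative, not a DataStage requirement):

```python
import sys
from typing import Iterable, Iterator


def clean_rows(lines: Iterable[str]) -> Iterator[str]:
    """Normalise the address field of pipe-delimited rows.

    Assumed layout: inRecord|inAddr|telNum. Rows with fewer than two
    fields are passed through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) >= 2:
            # Collapse internal whitespace and upper-case the address.
            fields[1] = " ".join(fields[1].split()).upper()
        yield "|".join(fields) + "\n"


def main() -> None:
    """Entry point for the wrapped command: stream stdin to stdout."""
    sys.stdout.writelines(clean_rows(sys.stdin))
```

Saved as a script that calls `main()`, this is the kind of command a Wrapped stage could run. Every row crosses a process boundary as text in both directions, which is consistent with the slide's warning about performance.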

    4 Ways to Integrate Custom Functionality

    Build Stages: a GUI custom operator constructor, built in C/C++ with helper macros, e.g. readRecord(), writeRecord(), doTransfer(). Some restrictions apply, e.g. a minimum of 1 input and 1 output

    Custom Stages: built in C/C++, with fewer restrictions than a Build Stage, e.g. they can create data sources and data targets, and can create combinable operators. Documented in the Custom Operator Reference

    Both of these: create a native OSH operator, offer high performance, and provide a custom icon and a DataStage native interface
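Real Build stages are written in C/C++ around the macros named above; the Python below is only a toy model of the per-record loop they drive, for one input and one output (the minimum the slide mentions). The method names `read_record`, `do_transfer` and `write_record` are hypothetical stand-ins for readRecord(), doTransfer() and writeRecord(), not their actual API, and the order in which the loop calls them is this sketch's own choice.

```python
from typing import Callable, Iterable, Optional


class BuildStageModel:
    """Toy model of a one-input, one-output Build stage (hypothetical)."""

    def __init__(self, rows: Iterable[dict],
                 per_record: Callable[["BuildStageModel"], None]) -> None:
        self._rows = iter(rows)
        self._per_record = per_record     # user logic run for every record
        self.in_rec: Optional[dict] = None
        self.out_rec: dict = {}
        self.output: list[dict] = []

    def read_record(self) -> bool:
        """Stand-in for readRecord(): fetch the next input row, if any."""
        self.in_rec = next(self._rows, None)
        return self.in_rec is not None

    def do_transfer(self) -> None:
        """Stand-in for doTransfer(): copy input columns to the output row."""
        self.out_rec = dict(self.in_rec or {})

    def write_record(self) -> None:
        """Stand-in for writeRecord(): emit the current output row."""
        self.output.append(self.out_rec)
        self.out_rec = {}

    def run(self) -> list[dict]:
        while self.read_record():
            self.do_transfer()
            self._per_record(self)        # may add or change output columns
            self.write_record()
        return self.output


def add_length(stage: BuildStageModel) -> None:
    stage.out_rec["addrLen"] = len(stage.out_rec["inAddr"])


rows = [{"inRecord": 1, "inAddr": "1 Main St"},
        {"inRecord": 2, "inAddr": "22 High St"}]
print(BuildStageModel(rows, add_length).run())
```

The user only writes `add_length`; the read, transfer and write plumbing is what the Build stage generates around it.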

    Example

    Experian QAS Batch: a postal address cleaning solution, consisting of a bespoke database and a C/C++ API which provides:

    Start(), Open(), Clean(), Close(), Shutdown()

    That's it!

    We integrated QAS Batch so it runs as an operator: this scales QAS Batch performance linearly, and QAS Batch is now grid-enabled

    [demo]
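The shape of that integration can be sketched without the real library. In the Python below, `FakeAddressCleaner` is a hypothetical stand-in with the five calls listed above (it is not the Experian QAS API), and its "cleaning" is just whitespace and case normalisation. Rows are split round-robin into partitions and each partition runs its own Start/Open/Clean/Close/Shutdown cycle; the threads only stand in for DataStage's separate per-partition processes.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeAddressCleaner:
    """Hypothetical stand-in exposing Start/Open/Clean/Close/Shutdown."""

    def __init__(self) -> None:
        self.started = False
        self.opened = False

    def start(self) -> None:
        self.started = True

    def open(self) -> None:
        assert self.started, "start() must come first"
        self.opened = True

    def clean(self, address: str) -> str:
        assert self.opened, "open() must come first"
        return " ".join(address.split()).upper()   # placeholder cleaning

    def close(self) -> None:
        self.opened = False

    def shutdown(self) -> None:
        self.started = False


def run_partition(rows: list[str]) -> list[str]:
    """What one parallel instance of the operator does with its rows."""
    cleaner = FakeAddressCleaner()
    cleaner.start()
    cleaner.open()
    try:
        return [cleaner.clean(r) for r in rows]    # Clean() once per record
    finally:
        cleaner.close()
        cleaner.shutdown()


def run_operator(rows: list[str], partitions: int) -> list[str]:
    """Round-robin partition the rows, clean each partition concurrently,
    then interleave the results back into the original order."""
    if partitions < 1:
        raise ValueError("need at least one partition")
    parts = [rows[p::partitions] for p in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        cleaned = list(pool.map(run_partition, parts))
    out = [""] * len(rows)
    for p, part in enumerate(cleaned):
        out[p::partitions] = part
    return out
```

Because each partition holds an independent session, adding partitions (or grid nodes) adds cleaning capacity; that independence is what the linear-scaling claim relies on.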

    Summary

    Don't fear the OSH! It represents your real DataStage job and tells you what's really happening under the hood; understanding it can help performance diagnosis

    OSH scripts can be auto-generated

    Build an operator: they're fast and they're reusable

    They can be used to integrate virtually anything, seamlessly

    If you can do it in C/C++, then you can build an operator for it. They open new possibilities

    Fin