ds stages

Upload: subbarao-gaddam

Post on 06-Apr-2018

  • 8/3/2019 ds stages

    1/6

    1. What is the exact difference between the BASIC Transformer and the normal Transformer?

    A. The Transformer stage is inherent PX functionality,
    whereas the BASIC Transformer uses a Server interface to
    call a Server Transformer stage. This carries a severe performance
    impact as well as partitioning limitations, but it does
    give a PX job some access to existing Server functionality.

    2. There are two types of transformer: (i) the Basic
    transformer and (ii) the Active transformer. The Basic transformer
    can be used on SMP systems but not on MPP or cluster systems.
    BASIC is the language supported by the DataStage server engine
    and is available in Server jobs, whereas DataStage PX jobs use
    the Active transformer.

    3. Transformer stages are always active stages. The
    BASIC Transformer stage is part of the Server product, but
    the PX engine allows this stage to be called. The opposite,
    using a PX stage in a Server job, is not possible.

    2. Does the Sequential File stage accept Excel (.xls) and XML files, and how?

    Yes, it accepts them. Use a fixed line pattern.

    3. What is the main difference between the Change Capture and Change Apply stages?

    The Change Capture stage compares two data sets (before and after) and
    makes a record of the differences.

    The Change Apply stage combines the changes from the Change
    Capture stage with the original before data set to
    reproduce the after data set.
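
The relationship between the two stages can be sketched in plain Python. This is only an illustration of the idea, not how DataStage implements it: the function names `change_capture` and `change_apply`, the dictionary-based data sets, and the numeric change codes are all assumptions made for the example.

```python
# Illustrative sketch only: models the before/after comparison in plain Python.
# The change-code values below are an assumption for this example.
COPY, INSERT, DELETE, EDIT = 0, 1, 2, 3


def change_capture(before, after):
    """Compare two keyed data sets and return a list of change records."""
    changes = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes.append((INSERT, key, after[key]))
        elif key not in after:
            changes.append((DELETE, key, before[key]))
        elif before[key] != after[key]:
            changes.append((EDIT, key, after[key]))
        # Unchanged rows (COPY) are dropped here to keep the change set small.
    return changes


def change_apply(before, changes):
    """Apply change records to the before data set to rebuild the after set."""
    result = dict(before)
    for code, key, row in changes:
        if code == DELETE:
            result.pop(key, None)
        elif code in (INSERT, EDIT):
            result[key] = row
    return result


before = {1: "Ann", 2: "Bob", 3: "Cy"}
after = {1: "Ann", 2: "Robert", 4: "Dee"}

changes = change_capture(before, after)
rebuilt = change_apply(before, changes)
print(changes)
print(rebuilt == after)
```

The point of the sketch is the round trip: applying the captured changes to the before data set yields the after data set again.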

    4. What is the difference between a server shared container and a parallel shared container?

    1. Server shared containers contain server stage types;
    parallel shared containers contain parallel stage types.

    2. With a parallel shared container, the logic
    can be reused across many jobs.

    Introduction

    DataStage Enterprise Edition is a package of three products: DataStage Server

    Edition, the parallel extender with parallel ETL jobs and the MetaStage product

    described on the Metadata Workbench entry. The flagship tool of Enterprise Edition

    is parallel ETL jobs.

    History

    During the 1990s, data integration vendors such as Ascential and Informatica
    were competing to deliver tools that provided a wide range of data connectivity and
    transformation functions in a mostly code-free environment. Towards the late 1990s,
    data stores were becoming large, and data warehouses and business intelligence were
    demanding larger volumes of data loads. The physical architecture of these loads
    was hitting a limit on the volume that a single server could handle and was moving
    towards clusters or grids of servers.

    The data integration vendors needed to be able to integrate data across a massively
    scalable architecture to keep up with the increased data volumes.

    Ascential started to roll out a parallel capability in the DataStage Server Edition

    product called multiple instance jobs. This allowed some additional manual

    programming to partition and process data in parallel. In November 2001 they

    switched to a buy approach and purchased Torrent Systems for $46 million.

    Torrent had the capability to run tools on a massively parallel processing (MPP)

    platform.

    Versions

    This section lists each major release of DataStage Enterprise Edition and the
    enhancements for DataStage parallel jobs. For a list of enhancements to the client
    tools, see the versions on the DataStage Server Edition page, as it is the edition that
    has been delivered with every release going back to DataStage 1.

    All releases of DataStage 7 can import and upgrade DataStage 6 export files.

    DataStage 8 can only import and upgrade DataStage 7.5.1 or 7.5.2 jobs.

    DataStage 6

    Released in September 2002, ten months after the acquisition of Torrent, it was the

    first version of DataStage to feature the Parallel Extender (PX), the parallel platform

    that allows processes to run in parallel across a multiple processor environment.

    New parallel job type with a new set of parallel stages, some with the same
    name as server job stages but with different properties and options.

    Server job shared container for parallel jobs.

    CPU based licensing instead of server based licensing.

    Support for SAS 6.12 and 8.2.

    This release was followed by the client-only 6.0.1 release that fixed a number
    of problems.

    DataStage 7

    Released in September 2003, it uses much the same architecture as the previous
    version, with improvements to usability. This was the first release to have no
    server job improvements but many parallel job improvements.

    XML Pack 2.0 provides improved XML metadata support for parallel jobs.

    National Language Support (NLS) for parallel jobs but not for all parallel

    stages.

    Parallel shared and local stages.

    Enhanced transformer with improved reject row handling, string handling,

    timestamp conversion and compile performance.

    Modify, Switch and Filter stages added.

    Multiple-instance parallel jobs.

    Non blocking funnel stage.

    DataStage 7.5

    Unknown release date.

    Parallel complex flat file stage.

    A parallel job message handler for demoting or removing warning messages

    from the job log.

    Lookup stage changes from a property screen to a drag and drop mapping screen.

    Multi node import of sequential files.

    Additional options for sequential file and file set stages such as Read First

    Rows, Row Number Column and First Line is Column Names.

    View data support for custom stages.

    New Parallel Advanced Job Developers Guide.

    DataStage 7.5.1

    Released in March 2005.

    New SQL Builder for building SQL query statements from a database plugin

    stage.

    Command line job search function added.

    DataStage parallel jobs for Unix System Services (USS) on the mainframe.

    Remote job deployment to deliver and run jobs across a cluster or grid.

    Vector support in the parallel transformer stage.

    Sybase and ODBC stages added to parallel jobs.

    Complex Flat File stage improvements: multiple output links, automatically

    generated fillers, MVS dataset support.

    Thread based job monitoring for parallel jobs.

    DataStage 7.5X2

    Released in December 2004, this was the first release of parallel jobs that could run
    on Windows. While the Server runs on all the same Unix and Linux platforms as
    7.5.1, it adds the platform of Windows 2003 Standard or Enterprise on the
    Intel x86 Processor Family.

    There were no changes to parallel jobs in this release apart from the capability to

    compile and run them on Windows.

    DataStage 8

    Released in October 2006 for Windows and April 2007 for Unix, this is the first
    version to run on the IBM Information Server. There are a number of parallel job
    improvements in this release:

    Lookup stage now supports two new lookup types: range lookup and caseless

    lookup.

    New Slowly Changing Dimension stage.

    New QualityStage stages for parallel jobs.
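
The two lookup types named in the DataStage 8 list above, range lookup and caseless lookup, can be illustrated with a short Python sketch. It shows only the general idea of each lookup type; the reference data, the names `BANDS`, `COUNTRIES`, `range_lookup` and `caseless_lookup`, and the matching rules are hypothetical and are not taken from DataStage.

```python
import bisect

# Hypothetical reference data: (low, high, label) ranges sorted by low bound.
BANDS = [(0, 99, "small"), (100, 999, "medium"), (1000, 9999, "large")]
LOWS = [low for low, _, _ in BANDS]


def range_lookup(value):
    """Return the label whose [low, high] range contains value, else None."""
    i = bisect.bisect_right(LOWS, value) - 1
    if i >= 0 and BANDS[i][0] <= value <= BANDS[i][1]:
        return BANDS[i][2]
    return None


# Hypothetical reference table keyed by a case-folded string.
COUNTRIES = {
    name.casefold(): code for name, code in [("France", "FR"), ("Japan", "JP")]
}


def caseless_lookup(name):
    """Look up name while ignoring letter case."""
    return COUNTRIES.get(name.casefold())
```

A range lookup matches a source value against a low/high pair in the reference data rather than a single key, while a caseless lookup matches string keys without regard to letter case.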

    What is the difference between a Filter and a Swit...

    ________________________________________

    A Filter stage is used to filter the incoming data. For example, if you want the
    details of customer 20 and give customer 20 as the constraint in the Filter stage,
    it outputs only the customer 20 records. You can also add a reject link, in which
    case the rest of the records go to the reject link.

    In the Switch stage, by contrast, we define cases, such as case1 and case2:

    case1=10;

    case2=20;

    The Switch stage checks each record against the cases, so it outputs the
    customer 10 and customer 20 records.
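
The contrast between the two stages can be sketched in Python. This is a simplified model of the behaviour described in the answer, not DataStage code: `filter_stage`, `switch_stage`, the `customer` column and the handling of unmatched rows are assumptions made for illustration.

```python
def filter_stage(rows, predicate):
    """Split rows into (output, reject) according to one constraint."""
    output = [r for r in rows if predicate(r)]
    reject = [r for r in rows if not predicate(r)]
    return output, reject


def switch_stage(rows, selector, cases):
    """Route each row to the output whose case value matches the selector column."""
    outputs = {name: [] for name in cases}
    unmatched = []
    for row in rows:
        for name, value in cases.items():
            if row[selector] == value:
                outputs[name].append(row)
                break
        else:
            # No case matched this row (assumed here to be set aside).
            unmatched.append(row)
    return outputs, unmatched


rows = [{"customer": 10}, {"customer": 20}, {"customer": 30}]

# Filter: one constraint, matching rows on the output, the rest on the reject link.
kept, rejected = filter_stage(rows, lambda r: r["customer"] == 20)

# Switch: one output per case value, as in case1=10 and case2=20.
routed, leftover = switch_stage(rows, "customer", {"case1": 10, "case2": 20})
```

In this model the Filter stage evaluates a condition per row, while the Switch stage compares one selector column against a list of case values and sends each row to the matching output.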