module 07 balanced optimizer

© 2010 IBM Corporation Information Management Advanced DataStage Workshop Module 07 – Balanced Optimization

  • 2010 IBM Corporation

    Information Management

    Advanced DataStage Workshop, Module 07: Balanced Optimization

  • Module Objectives

    After completing this topic, you should be able to:

      Describe what DataStage Balanced Optimization is

      Understand the optimization options available to users

      Understand which stages the Balanced Optimizer will consider for pushing to source and/or target

  • InfoSphere DataStage Balanced Optimization

    Provides the same job design as traditional DataStage jobs so there is no recoding required

    Leverages investments in DBMS hardware by executing data integration tasks within the DBMS

    Optimizes job run time by allowing the developer to control where the entire job or various parts of the job will execute

    Transform and aggregate any volume of information in batch or real time through visually designed logic

    [Diagram labels: Architects, Developers]

    Optimizing run time through intelligent use of DBMS hardware

  • Balanced Optimization: Leveraging Best-of-Breed Systems

    Optimization is not constrained to a single implementation style such as ETL or ELT

    InfoSphere DataStage Balanced Optimization fully harnesses available capacity and computing power in the DBMS as well as InfoSphere DataStage

    Delivering unlimited scalability and performance through parallel execution everywhere, all the time

  • How Balanced Optimizer Works

    Standard DataStage Design
      Use the same DataStage stage/link design paradigm
      Compile it, run it, verify that it works correctly (as normal)
      Allows the process to capture rich metadata that supports impact and dependency analysis

    Optimization Process
      Optimizer rewrites the job graph into a new optimized job
      Defaults push as much I/O and processing as possible into database targets, then into sources
      Run the optimized job to assess performance and resource usage

    Re-optimize as required
      Supports multiple versions of the optimized job existing concurrently
      Test various versions to validate which best balances system resources and performance characteristics
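To make the idea of pushing processing into a database source concrete, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in DBMS. It is not DataStage code and not what the optimizer generates; the `orders` table, its columns, and both function names are assumptions made up for illustration. The first function extracts every row and filters and aggregates on the client side (the unoptimized shape); the second expresses the same logic in SQL so that only the aggregated rows leave the database.

```python
import sqlite3
from collections import defaultdict

def make_db():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, status TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("east", "shipped", 10), ("east", "open", 5),
         ("west", "shipped", 7), ("east", "shipped", 3)],
    )
    return conn

def totals_in_engine(conn):
    """Unoptimized shape: extract all rows, then filter and aggregate client-side."""
    rows = conn.execute("SELECT region, status, amount FROM orders").fetchall()
    totals = defaultdict(int)
    for region, status, amount in rows:
        if status == "shipped":
            totals[region] += amount
    # Second value: number of rows that had to leave the database.
    return dict(totals), len(rows)

def totals_in_database(conn):
    """Pushed-down shape: the filter and aggregation run inside the database."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders "
        "WHERE status = 'shipped' GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)
```

Both functions return the same totals, but in this toy data the pushed-down version moves two rows out of the database instead of four, which is the kind of difference worth measuring when comparing an original and an optimized job.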

  • Balanced Optimization: Intelligent Pattern Recognition

    Intelligence based on a known and prioritized list of processing patterns

    Examines the job design looking for these known patterns

    Determines which patterns can be pushed to source or target database based on the specific DBMS and user options selected

    Optimizes out stages which the DBMS will address as part of its optimizer

    Iterative approach: after a known pattern is optimized, the job is reanalyzed until no more known patterns can be optimized
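The iterative approach described above can be sketched as a small rewrite loop. This is only an illustration of the general technique (apply the highest-priority matching pattern, then reanalyze), not the product's actual algorithm. The job is modeled as a plain list of stage names, and both pattern functions and their names are hypothetical.

```python
def push_transformer_to_source(stages):
    """Hypothetical pattern: fold a Transformer directly after a source into it."""
    for i in range(len(stages) - 1):
        if stages[i].startswith("Source") and stages[i + 1] == "Transformer":
            return stages[:i] + [stages[i] + "+Transformer"] + stages[i + 2:]
    return None  # pattern does not occur in this job

def drop_sort_before_target(stages):
    """Hypothetical pattern: remove a Sort directly before a database target."""
    for i in range(len(stages) - 1):
        if stages[i] == "Sort" and stages[i + 1].startswith("Target"):
            return stages[:i] + stages[i + 1:]
    return None  # pattern does not occur in this job

# Patterns are tried in priority order (the order here is an assumption).
PATTERNS = [push_transformer_to_source, drop_sort_before_target]

def optimize(stages):
    """Apply the first matching pattern, then reanalyze, until none match."""
    applied = []
    while True:
        for pattern in PATTERNS:
            rewritten = pattern(stages)
            if rewritten is not None:
                stages = rewritten
                applied.append(pattern.__name__)
                break  # restart the scan on the rewritten job
        else:
            # No pattern matched during a full pass: the job is fully optimized.
            return stages, applied
```

For the job `["Source", "Transformer", "Transformer", "Sort", "Target"]`, the loop folds both Transformers into the source one at a time, then drops the Sort, and stops once a full pass finds nothing left to rewrite.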

  • Balanced Optimization User Interface

    [Screenshot callouts: Repository; Optimization Options; Additional Properties; Original and Optimized Job; Detailed Trace / Logging]

  • Balanced Optimizer: User Driven Options

    User can influence the optimization process

    Options that are presented are the ones relevant to the job design

    Options are preset to maximize performance

    User can override to tune the job as they see fit.

  • Balanced Optimization Options

    Use Bulk Loading
      Leverage high-performance bulk loads into staging, with post-processing

    Push all processing into the database
      If all sources and targets reside in the same database and the transformation logic supports it, push all processing into the target

    Staging database name
      Name of an alternative database where bulk staging is to be used

    Push processing to database targets
      Push Transformations, Joins, Lookups, Sorts, and Aggregation into database targets where possible

    Push processing to database sources
      Push Transformations, Sorts, and Aggregation into database sources
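As a rough illustration of what "bulk loads into staging with post-processing" means, the sketch below uses Python's standard-library sqlite3 module as a stand-in for a real DBMS. It is not DataStage output: the table names `stage_sales` and `sales_summary` and the function name are hypothetical, and `executemany` merely stands in for a native bulk loader.

```python
import sqlite3

def load_via_staging(conn, rows):
    """Load raw rows into a staging table, then transform into the target in SQL."""
    conn.execute("CREATE TABLE IF NOT EXISTS stage_sales (sku TEXT, qty INTEGER)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_summary "
        "(sku TEXT PRIMARY KEY, total_qty INTEGER)"
    )
    # Step 1: "bulk load" the raw rows into an emptied staging table
    # (executemany stands in for a native loader here).
    conn.execute("DELETE FROM stage_sales")
    conn.executemany("INSERT INTO stage_sales VALUES (?, ?)", rows)
    # Step 2: post-processing runs inside the database, replacing any existing
    # summary row for the same sku; no rows travel back to the client.
    conn.execute(
        "INSERT OR REPLACE INTO sales_summary (sku, total_qty) "
        "SELECT sku, SUM(qty) FROM stage_sales GROUP BY sku"
    )
    conn.commit()
    return conn.execute(
        "SELECT sku, total_qty FROM sales_summary ORDER BY sku"
    ).fetchall()
```

The design point is that the client only ships raw rows once; the aggregation that a job stage would otherwise perform happens as a set-based SQL statement next to the target table.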

  • Balanced Optimization Stage Optimization Overview

    [Table: optimization capabilities by direction. Columns: Push to Source; Push to Target. Rows: Transformation; Sorting; Aggregation; Join, Lookup; Funnel; Drop unnecessary processing (e.g., sorting); Use bulk staging operations (load); Push everything into the (target) database. The cell markings did not survive extraction.]

    Footnotes: 1 = as supported by database; 2 = involving data already in the target

  • Using Balanced Optimization

    [Diagram: design the job in DataStage Designer to produce the original DataStage job; compile & run it and verify the job results; optimize the job with Balanced Optimization to produce the rewritten optimized job; compile & run the optimized job; then either choose different options and re-optimize, or manually review/edit the optimized job]

  • Balanced Optimization: Performance Considerations

    Minimize I/O and data copying/movement
      Source data reductions
      Move the processing to the data
      Keep data in the database(s); avoid target extractions

    Maximize optimization within sources or targets
      Indices, native optimizations, database-specific features

    Maximize parallelism
      I/O from/to databases
      In the DataStage parallel engine
      Inside the database(s)

  • Example

    [Screenshot: the original DataStage job shown within the Balanced Optimizer dialog]

  • Why Leverage Both Engines

    Balance operations that scale well on the database (such as operations working on indexes) against the scalability of the DataStage Parallel Engine

    Processing requirements that have no direct SQL equivalents (see the sample list that follows)

    Leveraging Data Quality components alongside other data integration tasks

    Connectivity to other enterprise data sources outside the DBMS (FTP, mainframe files, ERP sources, etc.)

    Sample Unique Functions

    Transformer
      Stage and loop variable derivations with circular references
      Most macros and system variables
      A few functions and operators (see User Guide for list)
      Custom transform functions

    Lookup stage
      lookup-fail
      condition-not-met

    Sorting
      Nulls last
      Unique sorts
      EBCDIC sorting

    XML
      Mix hierarchical and relational processing
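To show why one item on this list, EBCDIC sorting, behaves differently from an ordinary sort, here is a small Python sketch. It only demonstrates the collation difference and makes no claim about how DataStage implements it. The choice of code page 037 (Python's `cp037` codec) is an assumption, since the slide does not name a code page, and the helper name is hypothetical.

```python
def ebcdic_sorted(values):
    """Sort strings by their byte values in EBCDIC code page 037 (assumed)."""
    return sorted(values, key=lambda s: s.encode("cp037"))

values = ["a1", "B2", "3c"]

# Default Python order follows code points: digits, then upper, then lower case.
ascii_order = sorted(values)

# EBCDIC places lower case before upper case, and digits last.
ebcdic_order = ebcdic_sorted(values)
```

Here `ascii_order` is `["3c", "B2", "a1"]` while `ebcdic_order` is `["a1", "B2", "3c"]`, so a job that depends on EBCDIC ordering cannot simply be swapped for a sort that uses a different collation.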

  • When Balanced Optimization is Most Attractive

    Significant amount of homogeneous DBMS integration requirements

    Existing DBMS infrastructure can support the processing capacity needed for data integration tasks

    Desire to direct future hardware investment toward the DBMS so it can serve both purposes (database and data integration)

  • Module Summary

    After completing this topic, you should be able to:

      Describe what DataStage Balanced Optimization is

      Understand the optimization options available to users

      Understand which stages the Balanced Optimizer will consider for pushing to source and/or target