IBM InfoSphere DataStage Introduction Online Training

For more details please contact us: US : +1 718 819 9361 INDIA : +91 8099776681 Email Us : [email protected] Welcome to IBM Data Stage 9.1

Upload: kernel-training

Post on 12-Jan-2017

Category:

Education


2 http://kerneltraining.com/ibm-data-stage/

DATA WAREHOUSE

A data warehouse is a copy of transaction data specifically structured for querying and reporting. An expanded definition of data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.

This definition of the data warehouse focuses on data storage. A data warehouse can be normalized or denormalized. It can be a relational database, multidimensional database, flat file, hierarchical database, object database, etc. Data warehouse data often gets changed, and data warehouses often focus on a specific activity or entity.

Reasons for Dirty Data

Dummy Values
Absence of Data
Multipurpose Fields
Cryptic Data
Contradicting Data
Inappropriate Use of Address Lines
Violation of Business Rules
Reused Primary Keys, Non-Unique Identifiers
Data Integration Problems
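A few of the causes listed above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the field names, the sample rows, and the set of dummy markers are all assumptions made for illustration. It flags dummy values, missing data, and non-unique identifiers.

```python
from collections import Counter

# Assumed placeholder markers; real source systems will use their own.
DUMMY_VALUES = {"999-99-9999", "N/A", "UNKNOWN", "0000-00-00"}


def find_dirty_rows(rows, key="customer_id"):
    """Map each problem name to the indexes of the rows that show it."""
    problems = {"dummy_value": [], "missing_data": [], "non_unique_key": []}
    key_counts = Counter(row.get(key) for row in rows)
    for index, row in enumerate(rows):
        values = list(row.values())
        if any(value in DUMMY_VALUES for value in values):
            problems["dummy_value"].append(index)
        if any(value is None or value == "" for value in values):
            problems["missing_data"].append(index)
        if key_counts[row.get(key)] > 1:
            problems["non_unique_key"].append(index)
    return problems


# Hypothetical sample data.
rows = [
    {"customer_id": 1, "ssn": "123-45-6789", "city": "Austin"},
    {"customer_id": 2, "ssn": "999-99-9999", "city": "Boston"},
    {"customer_id": 2, "ssn": "222-33-4444", "city": ""},
]
report = find_dirty_rows(rows)
print(report)
```

Causes such as cryptic data or violated business rules usually need domain knowledge and cannot be detected with generic checks like these.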

Data Cleansing

Source systems contain dirty data that must be cleansed.

ETL software contains rudimentary data cleansing capabilities.

Specialized data cleansing software is often used. It is important for performing name and address correction and householding functions.

Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric).
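To make "address correction" and "householding" concrete, here is a small Python sketch. It is an illustration only and does not reflect how any of the vendor products named above work: the abbreviation table, the function names, and the sample records are assumptions.

```python
import re
from collections import defaultdict


def normalize_address(address):
    """Lowercase, drop punctuation, and expand a few abbreviations."""
    # Assumed, deliberately tiny abbreviation table.
    abbreviations = {"st": "street", "ave": "avenue", "rd": "road"}
    cleaned = re.sub(r"[^\w\s]", "", address.lower())
    words = [abbreviations.get(word, word) for word in cleaned.split()]
    return " ".join(words)


def household(records):
    """Group customer names that share the same normalized address."""
    groups = defaultdict(list)
    for name, address in records:
        groups[normalize_address(address)].append(name)
    return dict(groups)


# Hypothetical customer records: (name, address).
records = [
    ("Ann Lee", "12 Oak St."),
    ("Bob Lee", "12 oak street"),
    ("Cy Ray", "7 Elm Ave"),
]
households = household(records)
print(households)
```

Real householding also has to cope with misspellings, apartment numbers, and people who have moved, which is why specialized software is typically used.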

IBM ETL Overview

Data Stage

In its simplest form, Data Stage performs data transformation and movement from source systems to target systems in batch and in real time. The data sources may include indexed files, sequential files, relational databases, archives, external data sources, enterprise applications and message queues.
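The idea of moving data from a source to a target in batch can be sketched in a few lines of Python. This is not Data Stage code: the in-memory CSV stands in for a sequential file, SQLite stands in for the target database, and the table and column names are assumptions.

```python
import csv
import io
import sqlite3

# Stand-in for a sequential file on the source system.
SOURCE_FILE = io.StringIO("id,name,amount\n1,alice,10.50\n2,bob,4.25\n")


def run_batch(source, connection):
    """Extract rows from the file, transform them, and load the target table."""
    connection.execute(
        "CREATE TABLE sales (id INTEGER, name TEXT, amount_cents INTEGER)"
    )
    reader = csv.DictReader(source)
    transformed = [
        (int(row["id"]), row["name"].title(), round(float(row["amount"]) * 100))
        for row in reader
    ]
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)
    connection.commit()
    return len(transformed)


connection = sqlite3.connect(":memory:")  # stand-in for the target system
loaded = run_batch(SOURCE_FILE, connection)
result = connection.execute(
    "SELECT id, name, amount_cents FROM sales ORDER BY id"
).fetchall()
print(loaded, result)
```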

Data Stage

The Data Stage client components are:

Data Stage Administrator
Data Stage Designer
Data Stage Director

Data Stage Administrator

Use Data Stage Administrator to:

Specify general server defaults
Add and delete projects
Set project properties
Access Data Stage Repository by command interface

Data Stage Designer

Use Data Stage Designer to:

Specify how the data is extracted
Specify data transformations
Decode (denormalize) data going into the data mart using referenced lookups
Aggregate data
Split data into multiple outputs on the basis of defined constraints
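The last three tasks in the list (decoding through lookups, aggregating, and splitting on constraints) are sketched below in plain Python. In Data Stage these are built graphically in the Designer; the code is only an analogy, and the lookup table, column names, and constraint names are assumptions.

```python
from collections import defaultdict

# Hypothetical reference table used for the lookup.
REGION_LOOKUP = {"N": "North", "S": "South"}


def decode(rows):
    """Replace region codes with descriptive names via a reference lookup."""
    return [
        {**row, "region": REGION_LOOKUP.get(row["region"], "Unknown")}
        for row in rows
    ]


def aggregate(rows):
    """Sum the amount per region."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def split(rows, constraints):
    """Send each row to every output whose constraint it satisfies."""
    outputs = {name: [] for name in constraints}
    for row in rows:
        for name, constraint in constraints.items():
            if constraint(row):
                outputs[name].append(row)
    return outputs


# Hypothetical input rows.
rows = [
    {"order": 1, "region": "N", "amount": 120},
    {"order": 2, "region": "S", "amount": 40},
    {"order": 3, "region": "N", "amount": 15},
]
decoded = decode(rows)
totals = aggregate(decoded)
outputs = split(decoded, {
    "large": lambda row: row["amount"] >= 100,
    "small": lambda row: row["amount"] < 100,
})
print(totals)
print([row["order"] for row in outputs["large"]])
```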

Data Stage Director

Use Data Stage Director to run, schedule, and monitor your Data Stage jobs. You can also gather statistics as the job runs and look at logs for debugging purposes.

The Data Stage Director window is divided into two panes:

The Job Category pane lists all of the jobs in the repository.
The right pane shows one of three views: Status view, Schedule view, or Log view.

Frequently Seen Status Codes

Code  Status
1     Finished
2     Finished (see log)
9     Has been reset
11    Validated OK
12    Validated (see log)
21    Has been reset
99    Compiled
0     Running
3     Aborted
8     Failed validation
13    Failed validation
96    Aborted
97    Stopped
98    Not Compiled
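When job status codes are processed in a script, a lookup table like the one below can turn them into readable text. This Python sketch simply restates the codes from the slide; the function name and the grouping of codes into a "needs attention" set are assumptions, not part of any Data Stage API.

```python
# Status codes and labels exactly as listed on the slide.
JOB_STATUS = {
    1: "Finished",
    2: "Finished (see log)",
    9: "Has been reset",
    11: "Validated OK",
    12: "Validated (see log)",
    21: "Has been reset",
    99: "Compiled",
    0: "Running",
    3: "Aborted",
    8: "Failed validation",
    13: "Failed validation",
    96: "Aborted",
    97: "Stopped",
    98: "Not Compiled",
}

# Assumed grouping: statuses that usually call for a look at the job log.
NEEDS_ATTENTION = {3, 8, 13, 96, 97, 98}


def describe_status(code):
    """Return the label for a status code, or a fallback for unknown codes."""
    return JOB_STATUS.get(code, f"Unknown status ({code})")


for code in (1, 3, 42):
    flag = " [check log]" if code in NEEDS_ATTENTION else ""
    print(f"{code}: {describe_status(code)}{flag}")
```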

Data Stage: Getting Started

Set up a project – Before you can create any Data Stage jobs, you must set up your project by entering information about your data.

Create a job – When a Data Stage project is installed, it is empty and you must create the jobs you need in Data Stage Designer.

Define Table Definitions

Develop the job – Jobs are designed and developed using the Designer. Each data source, the data warehouse, and each processing step is represented by a stage in the job design. The stages are linked together to show the flow of data.
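The notion of a job as stages linked together can be pictured with chained Python generators, where the output of one stage feeds the next. This is a conceptual sketch only; the stage names and sample rows are hypothetical, and real jobs are drawn in the Designer rather than coded this way.

```python
def source_stage():
    """Hypothetical data source stage: yields raw rows."""
    yield from [{"id": 1, "name": " ann "}, {"id": 2, "name": "BOB"}]


def processing_stage(rows):
    """Hypothetical processing step: tidy up the name column."""
    for row in rows:
        yield {**row, "name": row["name"].strip().title()}


def warehouse_stage(rows, target):
    """Hypothetical target stage: append rows to an in-memory 'warehouse'."""
    for row in rows:
        target.append(row)
    return len(target)


warehouse = []
# The nesting plays the role of the links: each stage consumes the previous one.
row_count = warehouse_stage(processing_stage(source_stage()), warehouse)
print(row_count, warehouse)
```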

Data Stage Designer: Developing a Job

Data Stage Designer: Input Stage

Data Stage Designer: Transformer Stage

The Transformer stage performs any data conversion required before the data is output to another stage in the job design.

After you are done, compile and run the job.
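As a rough analogy for the Transformer stage, the Python sketch below maps each output column to an expression over the input row, much as a conversion is defined per output column. It is not Data Stage syntax; the column names, date format, and helper names are assumptions.

```python
from datetime import datetime

# Hypothetical derivations: output column name -> expression over the input row.
DERIVATIONS = {
    "customer_id": lambda row: int(row["cust_id"]),
    "full_name": lambda row: f'{row["first"].strip()} {row["last"].strip()}'.upper(),
    "order_date": lambda row: datetime.strptime(row["date"], "%d/%m/%Y").date().isoformat(),
}


def transform(rows, derivations=DERIVATIONS):
    """Apply every derivation to every input row and build the output rows."""
    return [
        {column: derive(row) for column, derive in derivations.items()}
        for row in rows
    ]


# Hypothetical input row with string-typed, untrimmed source fields.
source_rows = [
    {"cust_id": "007", "first": " Ann", "last": "Lee ", "date": "12/01/2017"},
]
output_rows = transform(source_rows)
print(output_rows)
```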

Call us: +91 8099776681 Email: [email protected] http://kerneltraining.com/ibm-data-stage/

Questions?
