
The Complete datastage solutions by Madhava


What is the difference between the Sequential File and File Set stages?

The Sequential File stage is used to read data sequentially. It can be configured to execute in both parallel and sequential mode, but we can't perform lookups with a sequential file. The File Set stage is used to import an exported list of files, and it executes only in parallel mode. Its main significance is the 2 GB limit on the size of a single file: we need to distribute the data in files across the nodes to prevent overruns.

What is the main function of the staging area in a DWH?

Preparing the extracted data to be loaded into the DWH: examining each extracted file, reviewing the business rules, performing the various data transformation functions, resolving inconsistencies, sorting and merging data, etc.

If 3 tables have different columns, say the first table has 4 columns, the second 3 and the third 2, how do we capture the data using a Funnel stage in parallel jobs?


We can use a Column Generator for each of the inputs in order to make the metadata the same for all 3 inputs, because the prerequisite for the Funnel stage is that the metadata must be identical. Then we can funnel based on the key column.

In a sequential file 2 columns are available, and I want to load only one column into the target. How do I do it?

DataStage: use the Modify stage.

Unix: the cut command.
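For instance, a minimal sketch assuming a comma-delimited source file named input.txt in which only the first column should reach the target (the file name and delimiter are illustrative):

# keep only the first comma-separated column
cut -d',' -f1 input.txt > output.txt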

How can we run the same job 2 times in 1 day?

We can do this in different ways:

1. We can write a user-defined shell script.

2. We can do it with job sequences, by selecting the Start Loop and End Loop activities.

3. We can schedule it with the DataStage Director.

I think that using an invocation ID we can run the job more than once in one day.


In detail: there is an option under Job Properties > Allow Multiple Instance. Enable it, and at run time the job will ask for an invocation ID; give a different ID for each of the runs you want in a particular day.
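As a sketch, a multi-instance job can also be started from the command line with dsjob by appending the invocation ID to the job name; the project, job and ID names below are illustrative:

# first run of the day
dsjob -run -jobstatus MyProject LoadCustomers.RUN1
# second run of the day: same job, different invocation ID
dsjob -run -jobstatus MyProject LoadCustomers.RUN2

Each run then shows up separately in the Director log as LoadCustomers.RUN1 and LoadCustomers.RUN2.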

How do we convert columns into rows?

Using the Pivot stage we can convert columns into rows.

If we take 2 tables (like emp and dept) and use a Join stage, how do we improve the performance?

Whenever you join 2 tables on key columns, set the Modulus partitioning technique if the key column is numeric, and the Hash technique if it is non-numeric. Also, compared to a Lookup, a Join gives better performance.

How would I run job1, then job3, then job2 in a sequence of job1, job2, job3?

You can run job1 and then job3 by two methods: by using a conditional trigger or by a Nested Condition activity.

I have a source file which contains duplicate data. My requirement is that unique data should pass to one file and duplicate data should pass to another file. How?


All you have to do is link your source to a Sort Stage. Sort the data and generate a Key Change column. Key Change column = ‘1’ represents that the record is unique while Key Change Column = ‘0’ represents the duplicates.

Put a Filter stage and split the data into two different outputs based on the generated Key Change column.

Is it possible to load two tables' data into one sequential file? If possible, how?

Yes, you can join two tables and load the data into a sequential file: take the two tables, use a Join stage to join the data on the key, and then load the result into the sequential file.

How can we create a surrogate key in the Transformer stage? I want it in parallel mode.

In the Transformer properties there is a tab called “Surrogate Key”, where we can define it. We can select the source from:

a. Flat file: a state file which holds the SK values and generates SK values.

b. DB sequence: a database sequence object to use to generate the SK values.

I have one table and I want to divide it into two different tables, like even rows and odd rows. How can I do it?

Oracle stage > Surrogate Key Generator > Transformer.

The Transformer uses the mod function in its output link constraints:

mod(empkey,2) = 0 -> even rows

mod(empkey,2) <> 0 -> odd rows

where empkey is generated by the Surrogate Key Generator.

On how many columns can we perform aggregation in the Aggregator stage?

We can perform aggregation on any number of columns, e.g. group by deptno and deptname.

How many nodes are supported by one CPU in parallel jobs?

Here a node is nothing but a processor. Parallel jobs support uniprocessor, SMP, MPP and clustered systems, so the number of nodes supported depends on the hardware architecture.

At source level I have 40 columns, and I want only 20 columns at the target. What are the various ways to get this?

Use a Modify stage and drop the columns that are not needed, or use a Transformer and map only the columns that are needed (make sure that RCP is not enabled).

When we have a remove-duplicates option in the Sort stage, why do we have a Remove Duplicates stage in PX?

The Remove Duplicates stage has more options than the Sort stage for removing duplicates. If you have a small amount of data you can go with the Sort stage to remove duplicates; if you have a large amount of data, go for the Remove Duplicates stage.

The source has 1000 records and the job runs on three nodes; how many records are executed on each node?

We can't say exactly how many records are executed on each node; it all depends on the partitioning method used. If we use a round-robin type, the records will be distributed equally among the nodes. If a Hash method is used, where similar records are assigned to the same node, then it depends on the data.

What are the types of nodes?

A node refers to a logical processing unit. In the configuration file:

1-node file: for sequential execution and lighter reports.

Medium N-node file: supports a mix of pipeline and data partitioning.

Big N-node file: aims at fully data-partitioned parallelism.

What is a time dimension, and how do we populate it?

Every DWH has a time dimension; you can load the time dimension through a PL/SQL script.
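As an illustration, a minimal sketch of such a PL/SQL load run from the shell, where the table name time_dim, its columns, the date range and the connection string are all assumptions:

# populate one year of dates into a hypothetical time_dim table
sqlplus -s scott/tiger@orcl <<'EOF'
BEGIN
  FOR i IN 0 .. 364 LOOP
    INSERT INTO time_dim (date_key, cal_date, cal_year, cal_month, cal_day)
    VALUES (TO_NUMBER(TO_CHAR(DATE '2010-01-01' + i, 'YYYYMMDD')),
            DATE '2010-01-01' + i,
            EXTRACT(YEAR FROM (DATE '2010-01-01' + i)),
            EXTRACT(MONTH FROM (DATE '2010-01-01' + i)),
            EXTRACT(DAY FROM (DATE '2010-01-01' + i)));
  END LOOP;
  COMMIT;
END;
/
EOF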

I have an Oracle Enterprise stage, an Oracle bulk loader stage and an ODBC stage, and I want to load data into the target. Which of the above stages do I need to use, and why?

It all depends on the source data. If the data is coming from Oracle, go for the Oracle Enterprise stage if it is an incremental load, but if it is an initial load go for the Oracle bulk loader stage. Oracle Enterprise is faster compared to ODBC: Oracle Enterprise runs in parallel whereas ODBC runs sequentially.

How can we perform a second extraction of the client database without accepting the data which was already loaded in the first extraction?

Use the Change Capture stage and make sure the option to drop copy records is set to true.

How do we display all the duplicate records from an Oracle table?

select empno from emp group by empno having count(empno)>1;

I am running a job with 1000 records. If the job gets aborted after loading 400 records into the target, I want the next run to load the records into the target starting with record 401. How will we do it? This scenario is not for a sequence job, it is only within the job, e.g. Seq File > Transformer > Data Set.

In the Data Set stage, set the update policy to Use existing (discard records) or Use existing (discard schema and records).

I have a sequential file that contains 10 million records to load into a target database, and the load takes a lot of time. How do we do performance tuning in that situation?

In the Sequential File stage's output, under the available properties to add, set Number Of Readers Per Node to 2 or more.

What is the use of an invocation ID?

It is an ID that we specify for our identification when using a job in multiple instances: one unique ID for each instance of the job. When we enable a job to run in multiple instances, the DS engine throws up a window where we need to specify the invocation ID for that instance. When we check the log in the Director, we can identify the job by its name followed by its invocation ID.

Where do we use the Column Generator stage in a real-time scenario?

The Column Generator stage adds columns to incoming data and generates mock data for these columns for each data row processed. The new data set is then output.

Say we have two files, X and Y, and you are doing a funnel. The X file has only 2 columns and the Y file has 3 columns. While doing a funnel the metadata should be the same, so to achieve this, add one column to the X file and then use the Column Generator stage, which populates some mock data in the third column.

How do we identify whether a Lookup is an inner join or a left outer join?

Double-click on the Lookup stage and check the lookup failure condition: if it is Drop, then it is an inner join; if it is Continue, then it is a left outer join.

How do we cleanse data?

Data cleansing is the process of identifying and correcting data inaccuracies and inconsistencies.

Example of data inaccuracy (standardizing city values):

before: hyd, Hydrabad

after: hydrabad, hydrabad

Example of data inconsistency (standardizing decimal precision):

before: 10.78, 10.23465

after: 10.78, 10.23

What are the environment settings for DataStage while working on parallel jobs?

Mainly we need 3 environment variables to run a DataStage PX job:

1. APT_CONFIG_FILE

2. APT_RECORD_COUNT

3. APT_DUMP_SCORE
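As an illustration, these can also be exported in the shell before invoking a job from the command line; the configuration file path below is an assumption, so substitute your installation's actual path:

# point the parallel engine at a configuration file and dump the job score to the log
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/default.apt
export APT_DUMP_SCORE=1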

How do we create a user-defined environment variable (parameter)?

You can create parameters in 2 ways: 1. job-level parameters, 2. project-level parameters.

Job level: when you want to create job-level parameters, go to Job Properties and define them there.

Project level: when you want to create project-level parameters, go to the DataStage Administrator, click on Environment Variables and define the parameters you need; then open Designer, go to Job Properties, call those parameters (which you defined in the Administrator) and use them.

30 jobs are running in Unix and I want to find out which is my job. How do I do this? Give me the command.

ps -ef | grep <job name>
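For example, a sketch assuming the job is named LoadCustomers (an illustrative name):

# list all processes, keep those mentioning the job name, and drop the grep itself
ps -ef | grep LoadCustomers | grep -v grep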

How do we move a project from development to UAT?

By using the DataStage Manager we can move the project from Dev to UAT. Through the DataStage Manager, export the project from the DEV server to your local machine in .dsx format (project.dsx), then import the same .dsx (project.dsx) into the UAT server using the DataStage Manager.

How do we do error handling in DataStage?

Error handling can be done by using the reject file link.

The errors coming through the job need to be captured in a sequential file, and that file then needs to be fetched by a job which loads these exceptions or errors into the database.
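As a sketch, a downstream check on such a reject file could be as simple as the following, where the reject file path is illustrative:

# stop the batch if the reject file contains any records
if [ -s /data/rejects/customers_rej.dat ]; then
  echo "rejects found, aborting" >&2
  exit 1
fi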

Can you please tell me how we can call shell scripts/Unix commands in a job sequence?

There are two scenarios where you might want to call a script.

Scenario 1 (dependency exists between script and job): a job has to be executed first, then the script has to run, and only upon completion of the script execution may the second job be invoked. In this case, develop a sequencer job where the first Job Activity invokes the first job; then, using an Execute Command activity, call the script you want to invoke by typing "sh <script name>" in the command property of the activity; then, with another Job Activity, call the second job.

Scenario 2 (script and job are independent): in this case, right in your parallel job, say job1, under Job Properties you can find the after-job subroutine, where you need to select ExecSH and pass the name of the script you would like to execute. By doing this, once the job1 execution completes, the script gets invoked. The job succeeding job1, say job2, doesn't wait for the execution of the script.
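In either scenario the script itself is an ordinary shell script; as a small sketch (the script name, paths and archive naming are illustrative):

#!/bin/sh
# cleanup.sh: archive the day's load file once the job has finished
mv /data/incoming/customers.dat /data/archive/customers.$(date +%Y%m%d).dat

In Scenario 1 the Execute Command activity would run "sh cleanup.sh"; in Scenario 2 you would select ExecSH and pass cleanup.sh as its input value.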

Posted by Madhava
