Data Warehousing Frequently Asked Questions


Informatica

1. While importing a relational source definition from a database, what metadata does the source import include?

Source name, Database location, Column names, Datatypes, Key constraints.

2. In how many ways can you update a relational source definition, and what are they?

Two ways:
1. Edit the definition
2. Reimport the definition

3. Where should you place a flat file to import the flat file definition into the Designer?

Place it in a local folder.

4. To support mainframe source data, which files are used as source definitions?

COBOL files

5. Which transformation do you need when using COBOL sources as source definitions?

The Normalizer transformation, which is used to normalize the data, since COBOL sources often contain denormalized data.

6. How can you create or import a flat file definition into the Warehouse Designer?

You cannot create or import a flat file definition into the Warehouse Designer directly. Instead, you must analyze the file in the Source Analyzer and then drag it into the Warehouse Designer. When you drag the flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file. When the Informatica Server runs the session, it creates and loads the flat file.


7. What is a mapplet?

A mapplet is a set of transformations that you build in the Mapplet Designer and can reuse in multiple mappings.

8. What is a transformation?

It is an object that generates, modifies, or passes data.

9. What are the Designer tools for creating transformations?

1. Mapping Designer
2. Transformation Developer
3. Mapplet Designer

10. What are active and passive transformations?

An active transformation can change the number of rows that pass through it. A passive transformation does not change the number of rows that pass through it.

11. What are connected and unconnected transformations?

An unconnected transformation is not connected to other transformations in the mapping. A connected transformation is connected to other transformations in the mapping.

12. In how many ways can you create ports?

1. Drag the port from another transformation.
2. Click the Add button on the Ports tab.

13. What are reusable transformations?

Reusable transformations can be used in multiple mappings. You create a reusable transformation and add an instance of it to a mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes. Since each instance of a reusable transformation is a pointer to that transformation, you can change the transformation in the Transformation Developer and its instances automatically reflect those changes. This feature can save you a great deal of work.

Data Warehousing and Informatica Frequently Asked QuestionsClassification : Confidential

2

Page 3: Data Warehousing Frequently Asked Questons

14. What are the methods for creating reusable transformations?

Two methods:
1. Design it in the Transformation Developer.
2. Promote a standard transformation from the Mapping Designer. After you add a transformation to a mapping, you can promote it to the status of a reusable transformation.

Once you promote a standard transformation to reusable status, you cannot demote it back to a standard transformation. If you change the properties of a reusable transformation in a mapping, you can revert it to the original reusable transformation properties by clicking the Revert button.

15. What are mapping parameters and mapping variables?

A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session. When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session. Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica Server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.
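
For illustration, a minimal sketch of how a mapping parameter might be declared and used; the parameter name $$LastRunDate and the column name are hypothetical, not from this document:

    -- Declared on the mapping's Parameters and Variables tab:
    --   Name: $$LastRunDate   Type: parameter   Datatype: string
    -- Used in a filter condition so only newer rows pass:
    UPDATED_TS > TO_DATE('$$LastRunDate', 'YYYY-MM-DD')

At run time the Informatica Server expands $$LastRunDate with the value supplied in the parameter file (see question 85) before evaluating the condition.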

16. What are the unsupported repository objects for a mapplet?

1. Joiner transformations
2. Normalizer transformations
3. Non-reusable Sequence Generator transformations
4. Pre- or post-session stored procedures
5. Target definitions
6. PowerMart 3.5-style LOOKUP functions
7. XML source definitions
8. IBM MQ source definitions

17. Can you use the mapping parameters or variables created in one mapping in another mapping?

No. You can use mapping parameters or variables only in the transformations of the same mapping or mapplet in which you created them.


18. Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?

Yes, because a reusable transformation is not contained within any mapplet or mapping.

19. How can you improve session performance in an Aggregator transformation?

Use sorted input.

20. What is the aggregate cache in an Aggregator transformation?

The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica Server creates index and data caches in memory to process the transformation. If the Informatica Server requires more space, it stores overflow values in cache files.

21. What are the differences between the Joiner transformation and the Source Qualifier transformation?

You can join heterogeneous data sources with a Joiner transformation, which you cannot do with a Source Qualifier transformation. You need matching keys to join two relational sources in a Source Qualifier transformation, whereas you do not need matching keys to join two sources in a Joiner. Two relational sources must come from the same data source in a Source Qualifier; in a Joiner transformation you can also join relational sources that come from different data sources.

22. Under which conditions can we not use a Joiner transformation (limitations of the Joiner transformation)?

1. Both pipelines begin with the same original data source.
2. Both input pipelines originate from the same Source Qualifier transformation.
3. Both input pipelines originate from the same Normalizer transformation.
4. Both input pipelines originate from the same Joiner transformation.
5. Either input pipeline contains an Update Strategy transformation.
6. Either input pipeline contains a connected or unconnected Sequence Generator transformation.


23. What are the settings that you use to configure the Joiner transformation?

1. Master and detail source
2. Type of join
3. Condition of the join

24. What are the join types in the Joiner transformation?

Normal (default), Master Outer, Detail Outer, and Full Outer.

25. What are the Joiner caches?

When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the master source and builds index and data caches based on the master rows. After building the caches, the Joiner transformation reads records from the detail source and performs the joins.

26. What is the Lookup transformation?

Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym. The Informatica Server queries the lookup table based on the lookup ports in the transformation. It compares the Lookup transformation port values to the lookup table column values based on the lookup condition.

27. Why use the Lookup transformation?

To perform the following tasks:

Get a related value: for example, if your source table includes an employee ID, but you want to include the employee name in your target table to make summary data easier to read.

Perform a calculation: many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).

Update slowly changing dimension tables: you can use a Lookup transformation to determine whether records already exist in the target.

28. What are the types of lookups?

Connected and unconnected


29. What are the differences between a connected and an unconnected lookup?

Connected Lookup:
- Receives input values directly from the pipeline.
- You can use a dynamic or static cache.
- The cache includes all lookup columns used in the mapping.
- Supports user-defined default values.

Unconnected Lookup:
- Receives input values from the result of a :LKP expression in another transformation.
- You can use a static cache only.
- The cache includes all lookup output ports in the lookup condition and the lookup/return port.
- Does not support user-defined default values.
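
As a small illustration of how an unconnected lookup is invoked (the lookup and port names here are hypothetical), in an Expression transformation you call it with the :LKP reference qualifier and capture the single return value:

    -- expression for an output port EMP_NAME in an Expression transformation
    :LKP.lkp_EMPLOYEE(EMPLOYEE_ID)

A connected Lookup, by contrast, is wired directly into the pipeline and exposes its output ports to downstream transformations.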

30. What is meant by lookup caches?

The Informatica Server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica Server stores condition values in the index cache and output values in the data cache.

31. What are the types of lookup caches?

Persistent cache: you can save the lookup cache files and reuse them the next time the Informatica Server processes a Lookup transformation configured to use the cache.

Recache from database: if the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.

Static cache: you can configure a static, or read-only, cache for any lookup table. By default the Informatica Server creates a static cache. It caches the lookup table and lookup values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica Server does not update the cache while it processes the Lookup transformation.

Dynamic cache: if you want to cache the target table and insert new rows into the cache and the target, you can configure the Lookup transformation to use a dynamic cache. The Informatica Server dynamically inserts data into the target table.

Shared cache: you can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.


32. What is the difference between a static cache and a dynamic cache?

Static cache:
- You cannot insert or update the cache.
- The Informatica Server returns a value from the lookup table or cache when the condition is true. When the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations.

Dynamic cache:
- You can insert rows into the cache as you pass rows to the target.
- The Informatica Server inserts rows into the cache when the condition is false. This indicates that the row is not in the cache or target table, and you can pass these rows on to the target table.
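
In versions that support a dynamic cache, the Lookup transformation exposes a NewLookupRow port indicating what it did with each row; a common pattern (sketched here, with the routing logic and port names as assumptions rather than anything prescribed by this document) is to feed that flag into an Update Strategy expression:

    -- NewLookupRow: 0 = no change, 1 = row inserted into cache, 2 = row updated in cache
    IIF(NewLookupRow = 1, DD_INSERT,
        IIF(NewLookupRow = 2, DD_UPDATE, DD_REJECT))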

33. Which transformation should we use to normalize COBOL and relational sources?

The Normalizer transformation. When you drag a COBOL source into the Mapping Designer workspace, the Normalizer transformation automatically appears, creating input and output ports for every column in the source.

34. How does the Informatica Server sort string values in a Rank transformation?

When the Informatica Server runs in the ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica Server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.


35. What are the Rank caches?

During the session, the Informatica Server compares an input row with rows in the data cache. If the input row out-ranks a stored row, the Informatica Server replaces the stored row with the input row. The Informatica Server stores group information in an index cache and row data in a data cache.

36. What is the RANKINDEX in a Rank transformation?

The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses the Rank Index port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.

37. What is the Router transformation?

A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. However, a Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. A Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group. If you need to test the same input data against multiple conditions, use a Router transformation in a mapping instead of creating multiple Filter transformations to perform the same task.

38. What are the types of groups in a Router transformation?

1. Input group
2. Output groups

The Designer copies property information from the input ports of the input group to create a set of output ports for each output group.

Two types of output groups:
1. User-defined groups
2. Default group

You cannot modify or delete the default group.

39. Why do we use the Stored Procedure transformation?

For populating and maintaining databases.

40. What are the types of data that pass between the Informatica Server and a stored procedure?

Three types of data:
1. Input/output parameters
2. Return values
3. Status code

41. What is the status code?

The status code provides error handling for the Informatica Server during the session. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. This value is not seen by the user; it is used only by the Informatica Server to determine whether to continue running the session or to stop.

42. What is the Source Qualifier transformation?

When you add a relational or flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the records that the Informatica Server reads when it runs a session.

43. What are the tasks that the Source Qualifier performs?

1. Join data originating from the same source database.
2. Filter records when the Informatica Server reads source data.
3. Specify an outer join rather than the default inner join.
4. Specify sorted records.
5. Select only distinct values from the source.
6. Create a custom query to issue a special SELECT statement for the Informatica Server to read source data.
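
As a rough illustration of several of these options together, a Source Qualifier SQL override might look like the following (the table and column names are invented for the example; the query must return columns in the same order as the Source Qualifier ports):

    SELECT DISTINCT c.CUSTOMER_ID, c.CUSTOMER_NAME, o.ORDER_ID, o.ORDER_AMOUNT
    FROM   CUSTOMERS c, ORDERS o          -- join of two tables from the same database
    WHERE  c.CUSTOMER_ID = o.CUSTOMER_ID  -- user-defined join condition
    AND    o.ORDER_DATE >= '2002-01-01'   -- source filter
    ORDER BY c.CUSTOMER_ID                -- sorted output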

44. What is the target load order?

You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica Server loads data into the targets.

45. What is the default join that the Source Qualifier provides?

An inner equi-join.

46. What are the basic requirements to join two sources in a Source Qualifier?

The two sources should have a primary key / foreign key relationship, and the join columns should have matching datatypes.

47. What is the Update Strategy transformation?

This transformation is used to maintain history data, or just the most recent changes, in the target table.

48. Describe the two levels at which the update strategy is set.

Within a session: when you configure a session, you can instruct the Informatica Server either to treat all records in the same way (for example, treat all records as inserts) or to use the instructions coded into the session mapping to flag records for different database operations.

Within a mapping: within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.

49. What is the default source option for the Update Strategy transformation?

Data driven.

50. What is Data driven?

The Informatica Server follows the instructions coded into the Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose the Data driven option, the Informatica Server ignores all Update Strategy transformations in the mapping.
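
To make the flagging concrete, a minimal sketch of a typical Update Strategy expression (the lookup port lkp_CUSTOMER_KEY is a hypothetical name) that inserts rows not yet in the target and updates the rest:

    -- insert rows that are not in the target, otherwise update them
    IIF(ISNULL(lkp_CUSTOMER_KEY), DD_INSERT, DD_UPDATE)

DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT are the constants the expression can return; the session must run in Data driven mode for these flags to take effect.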

51. What are the options in the target session properties relating to the Update Strategy transformation?

Insert
Delete
Update (as update, as insert, or else insert)
Truncate table

52. What are the types of mapping wizards provided in Informatica?

The Designer provides two mapping wizards to help create mappings quickly and easily. Both wizards are designed to create mappings for loading and maintaining star schemas, a series of dimensions related to a central fact table.

Getting Started Wizard: Creates mappings to load static fact and dimension tables, as well as slowly growing dimension tables. Slowly Changing Dimensions Wizard: Creates mappings to load slowly changing dimension tables based on the amount of historical dimension data you want to keep and the method you choose to handle historical dimension data.

53. What are the types of mappings in the Getting Started Wizard?

Simple Pass Through mapping: loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all existing data from the table before loading new data.

Slowly Growing Target mapping: loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load new data when existing data does not require updates.

54. What are the mappings that we use for slowly changing dimension tables?

Type 1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 Dimension mapping, all rows contain current dimension data.


Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.

Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension in the table. Use the Type2 Dimension/Version Data mapping to update a slowly changing dimension table when you want to keep a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.

Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions to the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing dimension, the Informatica Server saves existing data in different columns of the same row and replaces the existing data with the updates

55. What are the different types of Type 2 dimension mappings?

Type 2 Dimension/Version Data mapping: in this mapping an updated dimension in the source is inserted into the target along with a new version number, and a newly added dimension in the source is inserted into the target with a new primary key.

Type 2 Dimension/Flag Current mapping: this mapping is also used for slowly changing dimensions. In addition, it creates a flag value for each changed or new dimension. The flag indicates whether the dimension is new or newly updated: current dimensions are saved with a flag value of 1, and superseded dimensions are saved with a flag value of 0.

Type 2 Dimension/Effective Date Range mapping: this is another flavour of the Type 2 mapping used for slowly changing dimensions. This mapping also inserts both new and changed dimensions into the target, and changes are tracked by the effective date range kept for each version of each dimension.

56. How can you recognise whether or not a newly added row in the source has been inserted into the target?

In the Type 2 mappings we have three options for recognising newly added rows:
1. Version number
2. Flag value
3. Effective date range

57. What are the two types of processes with which Informatica runs a session?

Load Manager process: starts the session, creates the DTM process, and sends post-session email when the session completes.

DTM process: creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.

58. Can you generate reports in Informatica?

Yes. By using the Metadata Reporter we can generate reports in Informatica.

59. What is the Metadata Reporter?

It is a web-based application that enables you to run reports against repository metadata. With the Metadata Reporter, you can access information about your repository without knowledge of SQL, the transformation language, or the underlying tables in the repository.

60. Define mapping and session.

Mapping: a set of source and target definitions linked by transformation objects that define the rules for data transformation.

Session: a set of instructions that describe how and when to move data from sources to targets.

61. Which tool do you use to create and manage sessions and batches, and to monitor and stop the Informatica Server?

The Informatica Workflow Manager.

62. Why do we use session partitioning in Informatica?

Partitioning improves session performance by reducing the time needed to read the source and load the data into the target.

63. To achieve session partitioning, what are the necessary tasks you have to do?

1. Configure the session to partition source data.
2. Install the Informatica Server on a machine with multiple CPUs.

64. How does the Informatica Server increase session performance through partitioning the source?

For relational sources, the Informatica Server creates multiple connections, one for each partition of a single source, and extracts a separate range of data through each connection. The Informatica Server reads multiple partitions of a single source concurrently. Similarly, for loading, the Informatica Server creates multiple connections to the target and loads partitions of data concurrently.

For XML and file sources, the Informatica Server reads multiple files concurrently. For loading the data, the Informatica Server creates a separate file for each partition of a source file. You can choose to merge the target files.

65. Why do you use repository connectivity?

Each time you edit or schedule a session, the Informatica Server communicates directly with the repository to check whether the session and users are valid. All the metadata of sessions and mappings is stored in the repository.

66. What are the tasks that the Load Manager process performs?

Manages session and batch scheduling: when you start the Informatica Server, the Load Manager launches and queries the repository for a list of sessions configured to run on the Informatica Server. When you configure a session, the Load Manager maintains the list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications prior to starting the DTM process.

Locks and reads the session: when the Informatica Server starts a session, the Load Manager locks the session in the repository. Locking prevents you from starting the same session again while it is running.

Reads the parameter file: if the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.

Verifies permissions and privileges: when the session starts, the Load Manager checks whether or not the user has the privileges to run the session.

Creates log files: the Load Manager creates the log file, which contains the status of the session.


67. What is the DTM process?

After the Load Manager performs validations for the session, it creates the DTM process. The DTM process creates and manages the threads that carry out the session tasks. It creates the master thread, and the master thread creates and manages all the other threads.

68. What are the different threads in the DTM process?

Master thread: creates and manages all other threads.

Mapping thread: one mapping thread is created for each session; it fetches session and mapping information.

Pre- and post-session threads: created to perform pre- and post-session operations.

Reader thread: one thread is created for each partition of a source; it reads data from the source.

Writer thread: created to load data to the target.

Transformation thread: created to transform data.

69. What are the data movement modes in Informatica?

The data movement mode determines how the Informatica Server handles character data. You choose the data movement mode in the Informatica Server configuration settings. Two data movement modes are available in Informatica:

1. ASCII mode
2. Unicode mode

70. What are the output files that the Informatica Server creates during a session run?

Informatica Server log: the Informatica Server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.

Session log file: the Informatica Server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for the reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.

Session detail file: this file contains load statistics for each target in the mapping. Session details include information such as the table name and the number of rows written or rejected. You can view this file by double-clicking the session in the Monitor window.

Performance detail file: this file contains session performance details that help you determine where performance can be improved. To generate this file, select the performance detail option in the session property sheet.

Reject file: This file contains the rows of data that the writer does not write to targets.

Control file: the Informatica Server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.

Post-session email: post-session email allows you to automatically communicate information about a session run to the designated recipients. You can create two different messages: one if the session completed successfully, the other if the session failed.

Indicator file: if you use a flat file as a target, you can configure the Informatica Server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.

Output file: if the session writes to a target file, the Informatica Server creates the target file based on the file properties entered in the session property sheet.

Cache files: when the Informatica Server creates a memory cache, it also creates cache files. The Informatica Server creates index and data cache files for the following transformations:
1. Aggregator transformation
2. Joiner transformation
3. Rank transformation
4. Lookup transformation

71. In which circumstances does the Informatica Server create reject files?

1. When it encounters a DD_REJECT in an Update Strategy transformation.
2. When a row violates a database constraint.
3. When a field in a row is truncated or overflows.

72. What is polling?

It displays updated information about the session in the Monitor window. The Monitor window displays the status of each session when you poll the Informatica Server.

73. Can you copy a session to a different folder or repository?

Yes. By using the Copy Session Wizard you can copy a session into a different folder or repository. However, the target folder or repository must contain the mapping for that session. If the target folder or repository does not have the mapping of the session being copied, you must copy that mapping first, before you copy the session.

74. What is a batch, and what are the types of batches?

A grouping of sessions is known as a batch. Batches are of two types:
1. Sequential: runs sessions one after the other.
2. Concurrent: runs sessions at the same time.

If you have sessions with source-target dependencies, you have to use a sequential batch to start the sessions one after another. If you have several independent sessions, you can use a concurrent batch, which runs all the sessions at the same time.

75. Can you copy batches?

No.

76. How many sessions can you create in a batch?

Any number of sessions.

77. When does the Informatica Server mark a batch as failed?

If one of the sessions is configured to "run if previous completes" and that previous session fails.

78. What command is used to run a batch?

pmcmd is used to start a batch.


79. What are the different options used to configure sequential batches?

Two options:
1. Run the session only if the previous session completes successfully.
2. Always run the session.

80. In a sequential batch, can you run a session if the previous session fails?

Yes, by setting the option "Always runs the session".

81. Can you start batches within a batch?

You cannot. If you want to start a batch that resides within a batch, create a new independent batch and copy the necessary sessions into the new batch.

82. Can you start a session inside a batch individually?

You can start the required session individually only in the case of a sequential batch; in the case of a concurrent batch you cannot do this.

83. How can you stop a batch?

By using the Server Manager or pmcmd.

84. What are session parameters?

Session parameters are like mapping parameters; they represent values you might want to change between sessions, such as database connections or source files.

The Workflow Manager also allows you to create user-defined session parameters. The following are the user-defined session parameters:
1. Database connections.
2. Source file names: use this parameter when you want to change the name or location of a session source file between session runs.
3. Target file names: use this parameter when you want to change the name or location of a session target file between session runs.
4. Reject file names: use this parameter when you want to change the name or location of session reject files between session runs.

85. What is a parameter file?

A parameter file defines the values for parameters and variables used in a session. A parameter file is a plain text file created with a text editor such as WordPad or Notepad. You can define the following values in a parameter file:
1. Mapping parameters
2. Mapping variables
3. Session parameters
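
A minimal sketch of what such a file might contain (the folder, session, connection, and parameter names are invented for the example, and the exact section-header syntax varies between Informatica versions):

    [MyFolder.s_load_customers]
    $DBConnection_Source=Oracle_Staging
    $InputFile_Customers=/data/incoming/customers.dat
    $$LastRunDate=2002-01-01

Single-dollar names ($...) are session parameters, while double-dollar names ($$...) are mapping parameters or variables declared in the mapping.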

86. How can you access a remote source from your session?

Relational source: to access a relational source situated in a remote place, you need to configure a database connection to the data source.

File source: to access a remote source file, you must configure an FTP connection to the host machine before you create the session.

Heterogeneous: when your mapping contains more than one source type, the Workflow Manager creates a heterogeneous session that displays source options for all types.

87. What is the difference between partitioning of relational targets and partitioning of file targets?

If you partition a session with a relational target, the Informatica Server creates multiple connections to the target database to write target data concurrently. If you partition a session with a file target, the Informatica Server creates one target file for each partition. You can configure the session properties to merge these target files.

88. What are the transformations that restrict the partitioning of sessions?

Advanced External Procedure transformation and External Procedure transformation: these transformations contain a check box on the Properties tab to allow partitioning.

Aggregator transformation: if you use sorted ports, you cannot partition the associated source.


Joiner Transformation: You cannot partition the master source for a joiner transformation.

Normalizer Transformation

XML targets.

89. How do you do performance tuning in Informatica?

The goal of performance tuning is to optimize session performance so that sessions run during the available load window for the Informatica Server. You can increase session performance in the following ways.

Network connections: the performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect session performance, so avoid unnecessary network hops.

Flat files: if your flat files are stored on a machine other than the Informatica Server, move those files to the machine on which the Informatica Server runs.

Relational data sources: minimize the connections between sources, targets, and the Informatica Server to improve session performance. Moving the target database onto the server machine may improve session performance.

Staging areas: if you use staging areas, you force the Informatica Server to perform multiple data passes. Removing staging areas may improve session performance.

You can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.

Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store a character.

If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.

We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Workflow Manager and choose Server Configure Database Connections.

If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
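
As an illustrative sketch of the kind of SQL you might issue before and after the load (Oracle-style syntax; the table, index, and constraint names are invented):

    -- before the load
    ALTER TABLE SALES_FACT DISABLE CONSTRAINT FK_SALES_CUSTOMER;
    DROP INDEX IDX_SALES_DATE;

    -- after the load
    CREATE INDEX IDX_SALES_DATE ON SALES_FACT (SALE_DATE);
    ALTER TABLE SALES_FACT ENABLE CONSTRAINT FK_SALES_CUSTOMER;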

Running parallel sessions by using concurrent batches also reduces the time taken to load the data, so concurrent batches may also increase session performance.

Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.

In some cases if a session contains an aggregator transformation, you can use incremental aggregation to improve session performance.

Avoid transformation errors to improve session performance.

If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.

If your session contains a Filter transformation, place the Filter transformation as close to the sources as possible, or use a filter condition in the Source Qualifier.

Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports option.

90. What is the difference between a mapplet and a reusable transformation?

A mapplet consists of a set of transformations that is reusable. A reusable transformation is a single transformation that can be reused.

If you create variables or parameters in a mapplet, they cannot be used in another mapping or mapplet, unlike the variables created in a reusable transformation, which can be used in any other mapping or mapplet.


We cannot include source definitions in reusable transformations, but we can add sources to a mapplet.

The whole transformation logic is hidden in the case of a mapplet, but it is transparent in the case of a reusable transformation.

We cannot use COBOL source qualifiers, Joiner, or Normalizer transformations in a mapplet. Nevertheless, we can make them reusable transformations.

91. Define the Informatica repository.

The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and Client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.

The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version.

Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica Server and Client tools use.

92. What are the types of metadata stored in the repository?

The following are the types of metadata that the repository stores:

1. Database connections
2. Global objects
3. Mappings
4. Mapplets
5. Multidimensional metadata
6. Reusable transformations
7. Sessions and batches
8. Shortcuts
9. Source definitions
10. Target definitions
11. Transformations


93. What is the PowerCenter repository?

The PowerCenter repository allows you to share metadata across repositories to create a data mart domain. In a data mart domain, we can create a single global repository to store metadata used across an enterprise, and a number of local repositories that share the global metadata as needed.

94. How can you work with a remote database in Informatica? Do you work directly using remote connections?

To work with a remote data source, you need to connect to it with a remote connection. However, it is not preferable to work with that remote source directly through remote connections. Instead, bring that source onto the local machine where the Informatica Server resides. If you work directly with a remote source, session performance decreases, because less data can be passed across the network in a given time.

95. What are the new features in Informatica 5.0?

You can debug your mapping in the Mapping Designer.
You can view the workspace over the entire screen.
The Designer displays a new icon for invalid mappings in the Navigator window.
You can use a dynamic lookup cache in a Lookup transformation.
You can create mapping parameters or mapping variables in a mapping or mapplet to make mappings more flexible.
You can export objects to and import objects from the repository; when you export a repository object, the Designer or Workflow Manager creates an XML file that describes the repository metadata.
The Designer allows you to use the Router transformation to test data for multiple conditions; the Router transformation lets you route groups of data to a transformation or target.
You can use XML data as a source or target.

Server enhancements:

You can use the command line program pmcmd to specify a parameter file when you run sessions or batches. This allows you to change the values of session parameters, and of mapping parameters and variables, at run time.

If you run the Informatica Server on a symmetric multi-processing system, you can use multiple CPUs to process a session concurrently. You configure partitions in the session properties based on source qualifiers. The Informatica Server reads, transforms, and writes partitions of data in parallel for a single session. This is available for PowerCenter only.

The Informatica Server creates two processes, the Load Manager process and the DTM process, to run sessions.

You can copy sessions across folders and repositories using the Copy Session Wizard in the Informatica Server Manager.

With the new email variables, you can configure post-session email to include information such as the mapping used during the session.

96. What is incremental aggregation?

When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture those changes, you can configure the session to process only the changes. This allows the Informatica Server to update the target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session.

97. What are the scheduling options for running a session?

You can schedule a session to run at a given time or interval, or you can run the session manually.

The different scheduling options are:

Run only on demand: the Informatica Server runs the session only when the user starts the session explicitly.
Run once: the Informatica Server runs the session only once, at a specified date and time.
Run every: the Informatica Server runs the session at regular intervals, as configured.
Customized repeat: the Informatica Server runs the session at the dates and times specified in the Repeat dialog box.

98. What is the tracing level, and what are the types of tracing levels?


The tracing level represents the amount of information that the Informatica Server writes to the log file. The types of tracing levels are:
1. Terse
2. Normal
3. Verbose initialization
4. Verbose data

99. What is the difference between the Stored Procedure transformation and the External Procedure transformation?

With the Stored Procedure transformation, the procedure is compiled and executed in a relational data source, and you need a database connection to import the stored procedure into your mapping. With the External Procedure transformation, the procedure or function is executed outside the data source; you need to build it as a DLL to access it in your mapping, and no database connection is needed.

100. Explain session recovery.

If you stop a session, or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method used to complete the session depends on the properties of the mapping, the session, and the Informatica Server configuration. Use one of the following methods to complete the session:
1. Run the session again if the Informatica Server has not issued a commit.
2. Truncate the target tables and run the session again if the session is not recoverable.
3. Consider performing recovery if the Informatica Server has issued at least one commit.

101. If a session fails after loading 10,000 records into the target, how can you load the records starting from the 10,001st record the next time you run the session?

As explained above, the Informatica Server has three methods to complete a failed session. Use "perform recovery" to load the records from the point where the session failed.

102. Explain "perform recovery".

When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The Informatica Server then reads all sources again and starts processing from the next row ID. For example, if the Informatica Server commits 10,000 rows before the session fails, when you run recovery the Informatica Server bypasses the rows up to 10,000 and starts loading with row 10,001.

By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in the Informatica Server setup before you run a session, so the Informatica Server can create and write entries in the OPB_SRVR_RECOVERY table.

103. How do you recover a standalone session?

A standalone session is a session that is not nested in a batch. If a standalone session fails, you can run recovery using a menu command or pmcmd. These options are not available for batched sessions.

To recover a session using the menu:
1. In the Server Manager, highlight the session you want to recover.
2. Select Server Requests > Stop from the menu.
3. With the failed session highlighted, select Server Requests > Start Session in Recovery Mode from the menu.

To recover a session using pmcmd:
1. From the command line, stop the session.
2. From the command line, start recovery.

104. How can you recover a session in a sequential batch?

If you configure a session in a sequential batch to stop on failure, you can run recovery starting with the failed session. The Informatica Server completes the session and then runs the rest of the batch. Use the Perform Recovery session property.

To recover sessions in sequential batches configured to stop on failure:
1. In the Workflow Manager, open the session property sheet.
2. On the Log Files tab, select Perform Recovery, and click OK.
3. Run the session.
4. After the batch completes, open the session property sheet.
5. Clear Perform Recovery, and click OK.


If you do not clear Perform Recovery, the next time you run the session the Informatica Server attempts to recover the previous session. If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in the batch complete, recover the failed session as a standalone session.

105. How do you recover sessions in concurrent batches?

If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if a session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as a standalone session.

To recover a session in a concurrent batch:

1. Copy the failed session using Operations > Copy Session.
2. Drag the copied session outside the batch so that it becomes a standalone session.
3. Follow the steps to recover a standalone session.
4. Delete the standalone copy.

106. How can you complete unrecoverable sessions?

Under certain circumstances, when a session does not complete, you need to truncate the target tables and run the session from the beginning. Run the session from the beginning when the Informatica Server cannot run recovery or when running recovery might result in inconsistent data.

107. What are the circumstances that lead to an unrecoverable session?

1. The Source Qualifier transformation does not use sorted ports.
2. You change the partition information after the initial session fails.
3. Perform Recovery is disabled in the Informatica Server configuration.
4. The sources or targets change after the initial session fails.
5. The mapping uses a Sequence Generator or Normalizer transformation.
6. A concurrent batch contains multiple failed sessions.

108. If I make modifications to my table in the back end, are they reflected in the Informatica Warehouse Designer, Mapping Designer, or Source Analyzer?

No. Informatica is not directly aware of the back-end database; it displays only the information stored in the repository. If you want back-end changes reflected on the Informatica screens, you have to import the definitions again from the back end through a valid connection and replace the existing definitions with the imported ones.

109. After dragging the ports of three sources (SQL Server, Oracle, Informix) into a single Source Qualifier, can you map these three ports directly to a target?

No. Unless and until you join those three ports in the Source Qualifier, you cannot map them directly.

110. How do you do error handling in Informatica?

Error handling is very primitive:
1. Log files can be generated which contain the error details and error codes.
2. The error code can be checked against the troubleshooting guide and corrective action taken.

The amount of detail in the log file can be increased by choosing an appropriate tracing level in the session properties. You can also configure a session to stop after 1, 2, or n errors.

111. How do you implement configuration management in Informatica?

There are several methods. Some of them are:
1. Take a backup of the repository as a binary file and treat it as a configurable item.
2. Use the folder versioning utility in Informatica.

112. A mapping contains a source table S_Time (Start_Year, End_Year) and a target table Time_Dim (Date, Day, Month, Year, Quarter).

A Stored Procedure transformation calls a procedure with two input parameters, I_Start_Year and I_End_Year, and output parameters O_Date, Day, Month, Year, and Quarter. If this session is run, how many rows will be available in the target, and why?

Only one row: the last date of End_Year. Each subsequent row overwrites the previous one.


113. What is the difference between a connected lookup and an unconnected lookup?

Connected Lookup:
- Is part of the mapping data flow.
- Gives multiple output values for a condition.
- Supports default values.

Unconnected Lookup:
- Is not part of the mapping data flow; it is called from other transformations, e.g. an Expression transformation.
- Has a return port that returns one value, generally a flag.
- Does not support default values.

114. What is the difference between the lookup index cache and the lookup data cache?

The lookup cache consists of an index cache and a data cache:
1. Index cache: contains the columns used in the lookup condition.
2. Data cache: contains the output columns other than the condition columns.

115. Discuss two approaches for updating a target table in Informatica and how they differ.

Update Strategy transformation: we can write our own flagging logic; it is flexible.

Normal insert/update/delete (with the appropriate variation of the update option): this is configured in the session properties. Any change in the row causes an update. It is inflexible.

116. How do you handle performance issues in Informatica, and where can you monitor the performance?

There are several aspects to performance handling. Some of them are:

Source tuning
Target tuning
Repository tuning
Session performance tuning
Incremental change identification on the source side
Software, hardware (use multiple servers), and network tuning
Bulk loading
Using the appropriate transformations


To monitor this:

Set the performance detail criteria
Enable performance monitoring
Monitor the session at run time and/or check the performance detail file

117. What is a suggested method for validating fields and marking them with errors?

One successful method is to create an Expression transformation that contains variable ports, one variable per port that is to be checked. Set an error "flag" for each field, then at the bottom of the expression trap each of the error fields. From this port you can choose to set flags based on each individual error that occurred, or feed them out as a combination of concatenated field names, to be inserted into the database as an error row in an error-tracking table.
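
A minimal sketch of this pattern, with hypothetical port and column names, in Informatica expression syntax:

    -- variable port v_ERR_NAME:
    IIF(ISNULL(CUST_NAME), 'CUST_NAME missing; ', '')
    -- variable port v_ERR_PHONE:
    IIF(LENGTH(PHONE) <> 10, 'PHONE invalid; ', '')
    -- output port o_ERROR_TEXT, concatenating the trapped errors:
    v_ERR_NAME || v_ERR_PHONE

Rows where o_ERROR_TEXT is not empty can then be routed to an error-tracking table with a Router or Filter transformation.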

118. Where is the cache (lookup, index) created, and how can you see it?

The cache is created on the server machine. A default amount of memory is allocated for it; once that memory is exceeded, the cache files can be seen in the cache directory on the server, but not before that.

119. When do you use a SQL override in a Lookup transformation?

Use a SQL override when:
- You need to look up data from more than one table.
- You want to use a WHERE condition to reduce the number of records in the cache.
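
A rough sketch of such an override (the table and column names are invented; the selected columns must correspond to the lookup ports):

    SELECT e.EMPLOYEE_ID, e.EMPLOYEE_NAME, d.DEPT_NAME
    FROM   EMPLOYEES e, DEPARTMENTS d
    WHERE  e.DEPT_ID = d.DEPT_ID      -- lookup built over two joined tables
    AND    e.STATUS = 'ACTIVE'        -- WHERE clause shrinks the cache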

120. Explain how "constraint based load ordering" works?

Constraint based load ordering in PowerMart / PowerCenter works like this:

It controls the order in which the target tables are committed to a relational database. It is of no use when sending information to a flat file. To construct the proper constraint order: links between the TARGET tables in Informatica need to be constructed. Simply turning on "constraint based load ordering" has no effect on the operation itself. Informatica does NOT read constraints from the database when this switch is turned on. Again, to take advantage of this switch, you must construct primary / foreign key relationships in the TARGET TABLES in

Data Warehousing and Informatica Frequently Asked QuestionsClassification : Confidential

30

Page 31: Data Warehousing Frequently Asked Questons

the designer of Informatica. Creating primary / foreign key relationships is difficult - are only allowed to link a single port (field) to a single table as a primary / foreign key.

121. What is the difference between PowerMart and PowerCenter?

PowerCenter: has all the functionality; supports a distributed metadata repository; provides a global repository and can register multiple Informatica servers; you can share metadata across repositories; can connect to varied sources such as PeopleSoft and SAP; has bridges that can transport metadata from other tools (such as Erwin); costs around 200K US$.

PowerMart: a subset of PowerCenter; one repository and can register only one Informatica server; cannot connect to varied sources like PeopleSoft and SAP; costs around 50K US$.

122.What is the difference between Oracle Sequence and Informatica Sequence and which is better?

An Oracle sequence can be used in a PL/SQL stored procedure, which in turn can be called through a Stored Procedure transformation in Informatica. An Informatica sequence is generated by the Sequence Generator transformation. It depends upon the user's needs, but the Oracle sequence provides greater control.
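
As a rough sketch of the Oracle side (the object names are illustrative, not prescribed by this document), the sequence and a wrapper function callable from a Stored Procedure transformation could look like:

CREATE SEQUENCE emp_key_seq START WITH 1 INCREMENT BY 1;

CREATE OR REPLACE FUNCTION next_emp_key RETURN NUMBER IS
  v_key NUMBER;
BEGIN
  SELECT emp_key_seq.NEXTVAL INTO v_key FROM dual;  -- Oracle generates the value
  RETURN v_key;
END;
/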

123. How do you execute a set of SQL commands before running a session and after completion of the session in Informatica? Explain.

The SQL commands can be put into stored procedures. Two unconnected Stored Procedure transformations are created, pointing to the respective procedures: one pre-session, the other post-session. When the session is run, these two procedures are executed before and after the session.
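
A minimal sketch of such a pre-session procedure, assuming a hypothetical staging table stg_sales, might be:

CREATE OR REPLACE PROCEDURE pre_session_prep IS
BEGIN
  -- clear the staging table before the session loads it
  EXECUTE IMMEDIATE 'TRUNCATE TABLE stg_sales';
END;
/

A similar post-session procedure could rebuild indexes or gather statistics once the load completes.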

124. How can you utilize COM components in Informatica?

By writing C++, VB, or VC++ code and calling it through an External Procedure transformation.


125. What is an indicator file and how can it be used?

An indicator file is used for event-based scheduling when you do not know when the source data will be available. A shell command, script, or batch file creates the indicator file and places it in a directory local to the Informatica Server. The server waits for the indicator file to appear before running the session.

126. What is a persistent cache? When should it be used?

When the lookup cache is saved by the Lookup transformation, it is called a persistent cache. The first time the session runs, the cache is saved to disk and is then reused in subsequent runs of the session. It is used when the lookup table is static, i.e. does not change frequently.

127. What is incremental aggregation and how should it be used?

If the source changes only incrementally and you can capture those changes, you can configure the session to process only the changes. This allows the Informatica Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session. Therefore, only use incremental aggregation if:

Your mapping includes an aggregate function.

The source changes only incrementally.

You can capture incremental changes; you might do this by filtering the source data by timestamp (see the SQL sketch below).

Before implementing incremental aggregation, consider the following issues:

Whether it is appropriate for the session

What to do before enabling incremental aggregation

When to reinitialize the aggregate caches
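
A sketch of the timestamp-based change capture mentioned above, assuming a hypothetical orders source with a last_update_ts column and an illustrative mapping parameter $$LAST_RUN_TS:

SELECT order_id, customer_id, order_amt, last_update_ts
FROM   orders
WHERE  last_update_ts > TO_DATE('$$LAST_RUN_TS', 'YYYY-MM-DD HH24:MI:SS')

Only the rows changed since the previous run reach the Aggregator, so the aggregate cache is updated incrementally.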

128. Discuss a strategy (mapping) for loading a fact table.

The sources containing the measures (M1, M2) are S1 and S2.


The dimensions are D1, D2, D3, D4; the fact table is F1.

Join the sources S1 and S2 through a Source Qualifier. Create a lookup on each dimension table and use the production primary keys of the dimension in the lookup condition, then fetch the surrogate key from the lookup. Map the measures M1 and M2 and the surrogate keys to the fact table.

[Diagram: Sources S1, S2 -> Source Qualifier -> Lookups on dimensions -> Fact table F1]
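
As a set-based illustration of what the lookups accomplish (table and column names invented for the example, and only two of the four dimensions shown), the equivalent SQL would be roughly:

INSERT INTO fact_f1 (d1_key, d2_key, m1, m2)
SELECT d1.surrogate_key,
       d2.surrogate_key,
       s.m1,
       s.m2
FROM   stg_s1_s2 s                                  -- joined output of S1 and S2
JOIN   dim_d1 d1 ON d1.prod_key = s.d1_prod_key     -- lookup on D1 by production key
JOIN   dim_d2 d2 ON d2.prod_key = s.d2_prod_key;    -- lookup on D2 by production key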

129. The Informatica Server and client are on different machines. When you run a session from the Server Manager specifying the source and target databases, it displays an error. You are confident that everything is correct; why is it displaying the error?

The connect strings for the source and target databases are not configured on the machine hosting the server, even though they may be configured on the client machine.


Designer Tool

This tool is used by developers to build ETL programs (referred to as mapping programs, or mappings). It has the following five components:

1. Source Analyzer:

To define source data objects for mappings. These sources can be RDBMS, semi-RDBMS, files, ERPs, XML files, COBOL files, etc.

2. Warehouse Designer:

To create or include target data objects. The target can be an RDBMS (most preferable), an ERP, or a file.

3. Mapping Designer:

To relate source data objects with target data objects using predefined or user-defined transformations.

4. Transformation Developer:

To create user defined Transformations based on pre-defined transformations.


5. Mapplet Designer:

To create reusable mapping logic (mapplets) based on virtual source and target objects together with transformations. Note: the first three components are a must for any mapping program.

Predefined transformations:

Transformations are objects with properties and methods that perform a specific task within the mapping program. Most transformations correspond to parts of a SELECT statement; that is, they perform the same tasks done by the different clauses of a SELECT statement.

1. Source Qualifier(Active):

It is a wrapper on the source data objects, and data flows from the source through it; data objects are not allowed without this transformation. A Source Qualifier is effectively a complete SELECT statement. We can have one or more Source Qualifiers for a single source object; we may require multiple Source Qualifiers when different data objects need different sets of data.

A Source Qualifier can also combine multiple source objects (a join query); such Source Qualifiers are referred to as common Source Qualifiers.
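
For example, a common Source Qualifier over the emp and dept tables would generate a single join query along these lines (a sketch, not Informatica's exact generated SQL):

SELECT e.empno, e.ename, e.sal, d.dname
FROM   emp e, dept d
WHERE  e.deptno = d.deptno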

2. XML Source Qualifier:

Same as source Qualifier but only for XML files.

3. ERP Source Qualifier:

Same as source Qualifier but only for ERP sources.

4. MQ Series Source Qualifier:

Same as a Source Qualifier, but only for the MQ Series messaging product from IBM. It is used for data integration across different databases.

5. Joiner(Active):


It is used to join two Source Qualifiers (source objects) based on an equi-join, including outer joins. This is required when the source data objects are heterogeneous, since in such cases a common Source Qualifier cannot be used.

6. Lookup(Active):

Similar to Joiner, but it can also be used for non-equi-join conditions; no outer join is allowed. It can combine data objects from the source, the target, or an external database, while Joiner can combine data objects only from the source.

7. Expression (Passive):

It is used to define row-level (record-based) formulas or expressions, e.g. net pay (sal + comm + da - tax), SUBSTR(Name, 1, 3), TO_CHAR(Join_date, 'yyyy'), etc. It corresponds to the SELECT clause of a SELECT statement. For example: SELECT sal + comm FROM emp;

8. Filter(Active) :

To define conditions to restrict records. WHERE clause of SELECT statement. For example: SELECT * FROM emp WHERE sal > 30000;

9. Aggregator(Active):

To group records, with or without summary-function output; it corresponds to the GROUP BY clause of a SELECT statement used with group functions. For example: SELECT job, SUM(sal), AVG(sal) FROM emp GROUP BY job; or SELECT job FROM emp GROUP BY job;


10. Rank(Active):

Similar to Filter, but it filters records from the top or bottom of the sorted records. For example: the top three customers based on total sales amount, or the last three employees based on salary. In SQL this can be done in the following ways: a simple sub-query using the ROWNUM virtual column; a correlated subquery; or the START WITH & CONNECT BY clause.
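
For instance, the "top three employees by salary" case could be written with the ROWNUM virtual column as follows (a plain SQL sketch of what the Rank transformation does inside the mapping):

SELECT *
FROM  (SELECT ename, sal
       FROM   emp
       ORDER  BY sal DESC)
WHERE ROWNUM <= 3;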

11. Router(Active):

It is used to provide multiple outputs from a single source of data, and each output can have its own filter. Basically it is a combination of the Source Qualifier and Filter transformations; a Router is like multiple views on a single table in a database.

12.Update strategy(Passive):

It is used to flag records for insert, delete, update, or reject before they reach the target data objects; the default flag is insert. Based on the flags attached to the records, the target decides the data manipulation within the data objects. This is very useful for incremental data loading.

13. Stored Procedure(Passive/Active):

It is used to call or execute back-end programs like: Procedures, Functions and package members.

14. Sequence Generator(Passive):

It is used to generate unique serial numbers. Same as Sequence object of Oracle database.

15. Normalizer(Passive):

It is used to convert non-tabular data into tabular format, primarily for COBOL data structures.

16. Mapplet input(virtual Source Table)(Passive):


To define virtual source data objects, which are nothing but parameters or ports. It is the same idea as defining IN or OUT parameters for procedures or functions. When mapplets are used in mappings, the mapplet input ports are attached to the actual source object columns.

17. Mapplet Output(Virtual Target Table)(Passive):

To define virtual target data objects, which are nothing but parameters or ports. When mapplets are used in mappings, the mapplet output ports are attached to the actual target object columns. Note: a reusable program should never be based on actual data objects.

Ports

Ports are channels through which values are passed from one transformation to another. A port does not hold a value; it is just like a pipe used to pass data either directly or after some transformation (formula). There are four basic types of ports:

1. Input port: to receive values from the source.
2. Output port: to pass values to the target.
3. Variable port: to define a formula for internal use by the transformation. Such ports are not visible or exposed to other transformations.
4. Input/output port: to receive as well as pass values.

Note: output and variable ports require a formula or expression.

Joins

Normal Join:

All matching records between the Master and Detail tables.

Master Outer Join:

All matching records plus Master records without Detail records.

Detail Outer Join:

All matching records plus Detail records without Master records.

Full Outer Join:


All matching records, plus Master records without Detail records, plus Detail records without Master records.

How to decide the location of stored procedures (back-end programs):

Do not keep under Repository database because :

a) We would need to grant privileges on the source objects as well as the target objects to the repository database.

b) The mapping program objects become distributed, some under the target database and some under the repository database.

If repository objects are deleted or manipulated by mistake, the whole project might become corrupt. Do not keep them under the source database because:

The client will never allow any kind of manipulation of the source data objects, as mistakes can affect the whole transaction database. So the location for the stored programs is the target database; alternatively, keep all stored procedures in a separate external database if the number of external objects is very high, since combining them with the target database may affect performance. Note: the use of external programs should be avoided and adopted only when necessary, since they require more maintenance, backups, restores, etc.

18. Lookup Transformation:

It is used to join two data objects, like the Joiner transformation, but with the following differences:

o Joiner takes both data objects from the Source Analyzer or a transformation, while Lookup takes one data object from the Source Analyzer or a transformation and the other from the Source Analyzer (source), the Warehouse Designer (target), or an external database.

o Joiner is only for equi-joins, but Lookup can also be used for non-equi joins.

o Joiner supports outer joins, but Lookup does not.


Steps to create dimension tables

Tools -> Warehouse Designer
Targets -> Create/Edit Dimension
Select the 'Dimension' folder. Use the 'Add Dimension' button. Enter a name for the dimension. Select the database.
Note: dimensions are logical objects that hold related dimension tables (levels). In a star schema model a dimension will have only one dimension table (Level 1). We do not require a hierarchy for a star-schema-based model, but Informatica makes it compulsory to have a hierarchy even in the case of a single level, and the level must be dragged into the hierarchy. This becomes very useful later when a dimension has to be changed to a starflake- or snowflake-based model.
Note: dimensions live within the repository and are never part of the target database; only the dimension tables (levels) are in the target database.

Select the 'Level' folder. Use the 'Add Level' button. Enter a name for the level. Select the 'Level Properties' button. Create the columns except for the primary key column, since the PK column is created by the Designer itself and is used as an FK in the fact table. Naming convention: GK_<LevelName> in the dimension table, FK_<LevelName> in the fact table. Select the 'Hierarchies' folder. Use the 'Add Hierarchy' button. Enter a name for the hierarchy. Drag the level into the hierarchy.


Change the table name if required. Decide the parent level (in the case of starflake and snowflake schemas). Decide the position of the level within the hierarchy (in the case of starflake and snowflake). Use the 'Close' button to complete.

Steps to create a Fact Table

Targets -> Create Cube
Enter a name for the cube. Select the type of cube (mostly normalized). Select the database for the cube. Select the dimensions and hierarchies for the cube; it is not necessary for all dimensions and hierarchies to be selected. Select the measures from the source database if they exist, else create them. Enter a name for the fact table. Decide the dimension tables. Decide the measures. Use the 'Finish' button.

Note: the Designer does not enable the Targets -> Generate/Execute SQL option, and until this option is enabled we cannot create the tables in the database. This is a bug; we need to select all the tables in the 'Navigator' window and drop them again into the workspace of the Warehouse Designer window, then use Targets -> Generate/Execute SQL to create the tables in the database.
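
The SQL produced by Generate/Execute SQL is ordinary DDL; for a single-level dimension it is roughly of this form (table and column names are illustrative, following the GK_ convention above):

CREATE TABLE dim_customer (
  GK_CUSTOMER   NUMBER        NOT NULL,   -- surrogate key created by the Designer
  CUSTOMER_ID   NUMBER,                   -- production key from the source
  CUSTOMER_NAME VARCHAR2(60),
  CONSTRAINT pk_dim_customer PRIMARY KEY (GK_CUSTOMER)
);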


Transformations

1. Source Qualifier  2. Expression  3. Aggregator  4. Filter  5. Joiner  6. Lookup  7. Stored Procedure  8. Sequence Generator  9. Update Strategy  10. Rank  11. Router  12. Normalizer

1. Source Qualifier:

You can use the Source Qualifier to perform the following tasks: join data originating from the same source database; filter records when the Informatica Server reads the source data; specify an outer join rather than the default inner join; specify sorted ports; select only distinct values from the source; and create a custom query to issue a special SELECT statement for the Informatica Server to read the source data.

SQL Query:

Defines a custom query that replaces the default query the Informatica Server uses to read from the sources represented in this Source Qualifier. A custom query overrides entries for a custom join or a source filter.


User Defined Join:

Specifies the condition used to join data from multiple sources represented in the same Source Qualifier transformation.

Source Filter:

Specifies the Filter condition the Informatica Server applies when querying records.

Number of Sorted Ports:

Indicates the number of columns used when sorting records queried from relational sources. If you select this option, the Informatica Server adds an ORDER BY clause to the default query; when it reads the source records, the ORDER BY includes the number of ports specified, starting from the top of the Source Qualifier. When selected, the database sort order must match the session sort order.
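
For example, with two sorted ports at the top of a Source Qualifier on emp, the default query would be extended roughly as follows (a sketch of the generated SQL, not Informatica's exact output):

SELECT empno, ename, sal, deptno
FROM   emp
ORDER  BY empno, ename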

Tracing Level:

Sets the amount of detail included in the session log when you run a session containing this transformation.

SELECT DISTINCT:

Specifies whether you want to select only unique records; the Informatica Server adds a SELECT DISTINCT to the query. Do not alter the data types in the Source Qualifier: if the data types in the source definition and the Source Qualifier do not match, you cannot save the mapping.

2. Expression:

The Expression transformation allows you to perform calculations on a row-by-row basis.

3. Aggregator:

To perform calculations involving multiple rows, such as sums or averages, use the Aggregator transformation. The Aggregator transformation allows you to perform aggregate calculations such as averages and sums.


There are partitioning restrictions that apply to the Aggregator transformation. By default, the Informatica Server treats NULL values as NULL in aggregate functions.

4.Filter:

The Source Qualifier transformation provides an alternate way to filter records. Rather than filtering records within the mapping, the Source Qualifier filters records while reading from the source. The main difference is that the Source Qualifier limits the record set extracted from the source, while the Filter transformation limits the record set sent to the target. Since a Source Qualifier reduces the number of records used throughout the mapping, it provides better performance. However, the Source Qualifier only lets you filter records from relational sources, while the Filter transformation filters records from any type of source. Also note that since the Source Qualifier filter runs in the database, you must make sure its condition uses only standard SQL. The Filter transformation, by contrast, can define a condition using any statement or transformation function that returns either a true or a false value.

5. Joiner:

Use the Joiner transformation to join two sources with at least one matching port. The Joiner transformation uses a condition that matches one or more pairs of ports between the two sources. The combination of sources can be varied: 1. Two relational tables existing in separate databases. 2. Two flat files in potentially different file systems. 3. Two different ODBC sources. 4. Two instances of the same XML source. 5. A relational table and a flat file source. 6. A relational table and an XML source.

The Joiner transformation accepts input from other transformations; however, there are some limitations on the data flows you can connect to it.

You cannot use Joiner transformation in the following situations:

1. Both pipelines begin with the same original data source.


2. Both input pipelines originate from the same Source Qualifier transformation. 3. Both input pipelines originate from the same Normalizer. 4. Both input pipelines originate from the same Joiner transformation. 5. Either input pipeline contains an Update Strategy transformation. 6. Either pipeline contains a connected or unconnected Sequence Generator transformation.

Specify one of the sources as the master source and the other as the detail source. This is specified on the properties tab of the transformation by clicking the M column. When you add the ports of a transformation, the ports from the first source are automatically set as the detail source; adding the ports from the second transformation automatically sets them as the master source. The master/detail relationship determines how the join treats data from those sources.

Note: to decide which source is master and which is detail, look at the tables: the table from which most of the columns are required by the next transformation, or on which the next transformation's values depend, should be taken as the master, and the other as the detail.

Generally there are four types of Joins:

Normal: selects only the matching records from both tables. Master Outer: selects the matching records and also the records of the Detail. Detail Outer: selects the matching records and also the records of the Master. Full Outer: selects all the records of both tables.

A normal or Master Outer join performs faster than a Full Outer or Detail Outer.

6. Look Up:

You can use the Lookup transformation to perform many tasks, including: Get a related value: if your source table includes Employee_ID but you want to include Employee_Name in your target table to make your summary data easier to read. Perform a calculation: many normalized tables include values used in calculations, such as gross sales per invoice or sales tax, but not the calculated value, such as net sales. Update slowly changing dimensions: you can use a Lookup transformation to determine whether records already exist in the target.


7. Stored Procedure:

A stored procedure is a precompiled collection of Transact-SQL statements and optional flow-control statements, similar to an executable script. Stored procedures are stored and run within the database. Unlike standard SQL, however, stored procedures allow user-defined variables, conditional statements, and other powerful programming features. You might use stored procedures to: 1. Drop and recreate indexes. 2. Check the status of the target database before moving records into it. 3. Determine if enough space exists in a database. 4. Perform a specialized calculation. Stored procedures allow greater flexibility than SQL statements, and they also allow the error handling and logging necessary for mission-critical tasks. One of the most useful features of stored procedures is the ability to send data to the stored procedure and receive data back from it. There are three types of data that pass between the stored procedure and the Informatica Server:

1. Input/output parameters. 2. Return values. 3. Status codes. The status code provides error handling for the Informatica Server during a session: the stored procedure issues a status code that notifies whether or not it completed successfully. This value is not visible to the user; it is used only by the Informatica Server to determine whether to continue running the session or to stop. You configure options in the Workflow Manager to continue or stop the session in the event of a stored procedure error.
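
A minimal sketch of a back-end function exposing input/output parameters and a return value, assuming a hypothetical free-space check (the view and threshold are illustrative only):

CREATE OR REPLACE FUNCTION check_free_space (p_min_mb  IN  NUMBER,
                                             p_free_mb OUT NUMBER)
RETURN NUMBER IS
BEGIN
  SELECT ROUND(SUM(bytes) / 1024 / 1024)
    INTO p_free_mb                       -- output parameter
    FROM user_free_space;
  RETURN CASE WHEN p_free_mb >= p_min_mb THEN 1 ELSE 0 END;  -- return value
END;
/

The status code itself is not written by the procedure; the Informatica Server derives it from whether the call succeeded.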

8. Sequence Generator:

The Sequence Generator transformation generates numeric values. You might use the Sequence Generator to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers. It contains two output ports that you can connect to one or more transformations. The Informatica Server generates a value each time a row enters a connected transformation, even if that value is not used.


When NEXTVAL is connected to the input port of another transformation, the Informatica Server generates a sequence of numbers. When CURRVAL is connected to a transformation, the Informatica Server generates the NEXTVAL value plus the Increment By value (one by default). Some common uses for the Sequence Generator transformation are: 1. Creating keys. 2. Replacing missing values. You can connect NEXTVAL to multiple transformations and generate unique values for each row in each transformation. For example, you might connect NEXTVAL to two target tables in a mapping to generate unique primary key values; the Informatica Server creates a column of unique primary key values for each target table. The Sequence Generator is unique among transformations because you cannot add or delete its default ports (NEXTVAL and CURRVAL).

9. Update Strategy:

1. Insert: populate the target table for the first time, or maintain a historical data warehouse. In the latter case, you must set this strategy for the entire data warehouse, not just a select group of target tables. 2. Delete: clear target tables. 3. Update: update target tables. You might choose this setting whether your data warehouse contains historical data or a snapshot; later, when you configure how to update individual target tables, you can determine whether to insert updated records as new records or use the updated information to modify existing records in the target. 4. Data driven: exert fine control over how to flag records for insert, delete, update, or reject. Choose this setting if records destined for the same table need to be flagged on occasion for one operation (for example, update) or for a different operation (for example, reject); in addition, this setting provides the only way you can flag records for reject. For the greatest control over your update strategy, add Update Strategy transformations to the mapping. The most important feature of this transformation is its update strategy expression, used to flag individual records for insert, delete, update, or reject. The constants for each database operation and their numeric equivalents are as follows:

Insert: DD_INSERT (0)


Update: DD_UPDATE (1)
Delete: DD_DELETE (2)
Reject: DD_REJECT (3)
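
The update strategy expression itself is written in Informatica's expression language (flagging each row with one of the constants above). As a rough database-side analogue only, and not the Informatica mechanism, the same insert-versus-update decision can be expressed in SQL with a MERGE (table and column names invented for the example):

MERGE INTO target_customers t
USING stg_customers s
ON (t.customer_id = s.customer_id)
WHEN MATCHED THEN
  UPDATE SET t.customer_name = s.customer_name       -- existing row: update
WHEN NOT MATCHED THEN
  INSERT (customer_id, customer_name)
  VALUES (s.customer_id, s.customer_name);            -- new row: insert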

Session Wizard


1. Insert: treat all records as inserts. If inserting a record violates a primary or foreign key constraint in the database, the Informatica Server rejects the record. 2. Delete: treat all records as deletes. For each record, if the Informatica Server finds the record in the target based on the primary key, it deletes the record. Note that the primary key constraint must exist in the target definition in the repository. 3. Update: treat all rows as updates. For each record, the Informatica Server looks for a matching primary key value in the target table; if it exists, the Informatica Server updates the record. Again, the primary key constraint must exist in the target definition in the repository. 4. Data driven: the Informatica Server follows the instructions coded into the Update Strategy transformations within the session mapping to determine how to flag records for insert, delete, update, or reject. If a mapping for a session contains an Update Strategy transformation, this field is marked Data Driven by default. If you do not choose the Data Driven setting, the Informatica Server ignores all Update Strategy transformations in the mapping.

What are shortcuts and what are their advantages?

Shortcuts allow you to use metadata across folders without making copies, ensuring uniform metadata. A shortcut inherits all properties of the object to which it points. Once you create a shortcut, you can configure the shortcut's name and description.

When the object the shortcut references changes, the shortcut inherits those changes. By using a shortcut instead of a copy, you ensure that each use of the shortcut exactly matches the original object. For example, if you have a shortcut to a target definition and you add a column to the definition, the shortcut automatically inherits the additional column.

Shortcuts allow you to reuse an object without creating multiple copies of it in the repository. For example, if you use a source definition in mappings in ten different folders, instead of creating ten copies of the same source definition, one in each folder, you can create ten shortcuts to the original definition. You can create shortcuts to objects in shared folders; if you try to create a shortcut to a non-shareable folder, the Designer creates a copy of the object instead. You can create shortcuts to the following repository objects: 1. Source definitions. 2. Reusable transformations. 3. Mapplets. 4. Target definitions. 5. Business components. You can create two types of shortcuts: 1. Local shortcut: a shortcut created in the same repository as the original object.


2. Global shortcut: a shortcut created in a local repository that references an object in the global repository. Advantages: one of the primary advantages of using shortcuts is ease of maintenance. If you need to change all instances of an object, you can edit the original repository object, and all shortcuts accessing the object automatically inherit the changes. Shortcuts have the following advantages over copied repository objects: you can maintain a common repository object in a single location - if you need to edit the object, all shortcuts immediately inherit the changes you make; you can restrict repository users to a set of predefined metadata by asking users to incorporate the shortcuts into their work instead of developing repository objects independently; you can develop complex mappings, mapplets, or reusable transformations and then reuse them easily in other folders; and you can save space in your repository by keeping a single repository object instead of creating copies of the object in multiple folders.
