datastage8.5issuswithsolutions.txt

1. InfoSphere DataStage jobs randomly aborting on Windows systems

Problem (Abstract)
Jobs abort from time to time with file errors.

Symptom
    Program "XXX": Line 65, Error initializing AK file "Z".
    Program "DSP.Open": Line 102, "$XXX" is not in the CATALOG space.
    [EACCES] Permission denied
    Program "DSP.Open": Line 102, Incorrect VOC entry for $XXX.
    Program "DSP.Open": Line 102, Unable to load subroutine.

Cause
Antivirus software is locking Engine or Project files, which prevents normal execution.

Environment
All Windows systems

Diagnosing the problem
The issue occurs randomly, with error messages relating to "permission denied".

Resolving the problem
Disable the antivirus software, or exclude the DataStage files and folders from its "on access" scan.

2. SyncProject -ISHost R101:9080 -ISUser admin -ISPassword pword -project -report /tmp/myprojrep.txt

3. Problem (Abstract)
DataStage client logins fail with error:
    39125: the directory you are connecting to either is not a uv account or does not exist.

Resolving the problem
This error usually indicates that one of the 6 files required to make a UniVerse account is missing from the directory of the DataStage project that has the problem, or that the user does not have permission to access these files. Those 6 files/directories are:
    VOC
    D_VOC
    VOCLIB
    D_VOCLIB
    &SAVEDLISTS&
    D_&SAVEDLISTS&
The actions needed to resolve this problem depend on which files are missing or damaged in the project having the problem:
  1. Log in with an authorized operating system user ID on the DataStage server machine, e.g. dsadm.
  2. Change directory to the failing project directory, e.g.
     cd /opt/IBM/InformationServer/Server/Projects/myproject
  3. If any of the following 4 files are missing, they can be copied from the InformationServer/Server/Template directory to the project directory that is missing the file:
       D_VOC
       D_VOCLIB
       D_&SAVEDLISTS&
       VOCLIB
  4. If the &SAVEDLISTS& directory is missing, copy the &SAVEDLISTS& directory and any contents (usually one small file) from the Template directory to the project directory.
  5. If the VOC file is missing, it must be recovered by one of the following 2 methods:
     1. Recover the bad file from backup. Restore only the VOC file from backup, not the entire project directory, unless the backup is very recent. Many of the files in the project directory must be kept in sync with the xmeta repository, so restoring the project directory without restoring xmeta to the same point in time may leave job definitions out of sync with the xmeta contents. However, restoring xmeta affects the state of all projects, not just the one with the problem.
     2. Contact IBM Information Server support to obtain the SyncProject tool, which can restore some project contents based on data stored in the xmeta repository for Information Server 8.0.1 and 8.1. SyncProject is not available for DataStage v7.5.x.

If the files indicated above are not missing, check that the user is included in the DataStage group; the default group is dstage.

Another possible cause for this error is that a prior attempt was made to delete the project (either via the DataStage Administrator client or manually), but the process did not fully complete. As a result, references to the project name may still exist in the UV.ACCOUNT file or in the xmeta repository. Any attempt to access a partially deleted project can result in the same error 39125. In that situation, you can use the following documentation to complete the removal of the project:

4. To resolve this problem, check the following:
  1. Confirm that the hosts file on all tiers contains the hostname of each tier, defining both the short hostname and the fully qualified hostname (domain appended). For example:
       175.45.56.87 hostname1 hostname1.ibm.com
       175.45.56.92 hostname2 hostname2.ibm.com
     On Unix systems, the hosts file is /etc/hosts. On Windows, the file is \windows\system32\drivers\etc\hosts.
  2. Ensure that the entries for the above hosts exist on each system and that the TCP/IP address is the same on each system.
     Ping the hostname (both short name and fully qualified) from each server machine to confirm.
  3. Confirm that the hosts file also has a "localhost" entry that points to 127.0.0.1.
  4. Use the netstat -na command to check whether any other process is using ports 2825 or 2809 on the system. This could cause a conflict.
  5. Verify that the DataStage client and server are the same version and release. For example, if a user with a DataStage 8.0.1 client tries to connect to a DataStage 8.1 server, the login will fail with various communication errors.

5. On Windows platform, if the UVTEMP variable is undefined, it defaults to the UVTEMP directory under DSEngine, for example: /opt/IBM/InformationServer/Server/DSEngine/UVTEMP

6. The DataStage configuration file is a master control file (a text file that sits on the server side) for jobs. It describes the parallel system resources and architecture. The configuration file provides the hardware configuration for supporting architectures such as SMP (single machine with multiple CPUs, shared memory and disk), Grid, Cluster, or MPP (multiple CPUs, multiple nodes, and dedicated memory per node). DataStage understands the architecture of the system through this file.

This is one of the biggest strengths of DataStage. If you change your processing configuration, or change servers or platform, you do not have to redesign your jobs, because every job depends on this configuration file for execution. DataStage jobs determine which node to run a process on, where to store temporary data, and where to store dataset data based on the entries provided in the configuration file. A default configuration file is available whenever the server is installed.

Configuration files have the extension ".apt". The main benefit of the configuration file is that it separates software and hardware configuration from job design: hardware and software resources can change without changing a job design. DataStage jobs can point to different configuration files by using job parameters, which means that a job can use different hardware architectures without being recompiled.

The configuration file lists the processing nodes and specifies the disk space provided for each one. These processing nodes are logical, so having more than one CPU does not mean that the nodes in your configuration file correspond to those CPUs. It is possible to have more than one logical node on a single physical node. However, choose the number of logical nodes on a single physical node carefully. Increasing the number of nodes increases the degree of parallelism, but it does not necessarily mean better performance, because it also results in more processes. If your underlying system does not have the capacity to handle this load, you will end up with a very inefficient configuration.

1. APT_CONFIG_FILE is the environment variable that DataStage uses to determine which configuration file to use (a project can have many configuration files). In fact, this is what is generally used in production. If this environment variable is not defined, how does DataStage determine which file to use?
   1. If the APT_CONFIG_FILE environment variable is not defined, DataStage looks for the default configuration file (config.apt) in the following locations, in order:
      1. The current working directory.
      2. INSTALL_DIR/etc, where INSTALL_DIR ($APT_ORCHHOME) is the top-level directory of the DataStage installation.
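The lookup order described above can be sketched in Python as follows. This is only an illustration of the order given in these notes, not DataStage code: the function name resolve_config_file is hypothetical, and the environment and working directory are passed in as arguments so the logic can be tried without a DataStage installation.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_file(env: Mapping[str, str], cwd: Path) -> Optional[Path]:
    """Return the configuration file chosen by the lookup order in the notes.

    Hypothetical helper: it mirrors the documented order, nothing more.
    """
    # An explicit APT_CONFIG_FILE setting takes precedence.
    explicit = env.get("APT_CONFIG_FILE")
    if explicit:
        return Path(explicit)

    # Otherwise look for config.apt in the current working directory.
    candidate = cwd / "config.apt"
    if candidate.is_file():
        return candidate

    # Finally look in $APT_ORCHHOME/etc (INSTALL_DIR/etc).
    orch_home = env.get("APT_ORCHHOME")
    if orch_home:
        candidate = Path(orch_home) / "etc" / "config.apt"
        if candidate.is_file():
            return candidate

    # Nothing found in any of the documented locations.
    return None


if __name__ == "__main__":
    print(resolve_config_file(os.environ, Path.cwd()))
```

Run as a script on the engine host, it prints the path that this lookup order would select for the current shell environment, or None if no file is found.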

2. Define Node in configuration file

A node is a logical processing unit. Each node in a configuration file is distinguished by a virtual name and defines the number and speed of CPUs, memory availability, page and swap space, network connectivity details, and so on.

3. What are the different options a logical node can have in the configuration file?
   1. fastname: The fastname is the physical node name that stages use to open connections for high-volume data transfers. The value of this option is often the network name. Typically, you can get this name with the Unix command uname -n.
   2. pools: Names of the pools to which the node is assigned. Based on the characteristics of the processing nodes, you can group nodes into sets of pools.
      1. A pool can be associated with many nodes, and a node can be part of many pools.
      2. A node belongs to the default pool unless you explicitly specify a pools list for it and omit the default pool name ("") from the list.
      3. A parallel job, or a specific stage in a parallel job, can be constrained to run on a pool (a set of processing nodes).
         1. If both the job and a stage within the job are constrained to run on specific processing nodes, the stage runs on the nodes that are common to the stage and the job.
   3. resource: The syntax is
         resource resource_type location [{pools disk_pool_name}] | resource resource_type value
      resource_type can be one of the following:
         canonicalhostname: takes the quoted Ethernet name of a node in a cluster that is not connected to the conductor node by the high-speed network.
         disk: a directory to which persistent data is read and written.
         scratchdisk: the quoted absolute path name of a directory on a file system where intermediate data is temporarily stored. It is local to the processing node.
         RDBMS-specific resources (e.g. DB2, INFORMIX, ORACLE, etc.)
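To make the node options above concrete, here is a small Python sketch that renders a two-node configuration in the usual config.apt layout (a brace-delimited list of node blocks). The Node class, the render_config function, the host name etlhost, and all directory paths are made up for illustration; check the generated text against the parallel engine documentation for your release before using anything like it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One logical node; all example values are hypothetical."""
    name: str
    fastname: str
    pools: List[str] = field(default_factory=lambda: [""])  # "" is the default pool
    disks: List[str] = field(default_factory=list)
    scratchdisks: List[str] = field(default_factory=list)


def render_config(nodes: List[Node]) -> str:
    """Render the nodes as configuration file text."""
    lines = ["{"]
    for node in nodes:
        pools = " ".join(f'"{p}"' for p in node.pools)
        lines.append(f'  node "{node.name}"')
        lines.append("  {")
        lines.append(f'    fastname "{node.fastname}"')
        lines.append(f"    pools {pools}")
        # Each resource line here is placed in the default disk pool.
        for path in node.disks:
            lines.append(f'    resource disk "{path}" {{pools ""}}')
        for path in node.scratchdisks:
            lines.append(f'    resource scratchdisk "{path}" {{pools ""}}')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two logical nodes on the same physical host (an SMP-style layout).
    print(render_config([
        Node("node1", "etlhost", disks=["/data/ds/node1"],
             scratchdisks=["/scratch/ds/node1"]),
        Node("node2", "etlhost", pools=["", "sort"],
             disks=["/data/ds/node2"], scratchdisks=["/scratch/ds/node2"]),
    ]))
```

Both nodes share one fastname, which is the case described earlier of several logical nodes on a single physical node. In this sketch node2 is additionally placed in a pool named "sort".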

4. How does DataStage decide on which processing node a stage should run?

   1. If a job or stage is not constrained to run on specific nodes, the parallel engine executes a parallel stage on all nodes defined in the default node pool. (Default behavior)
   2. If the job or stage is constrained, the constrained processing nodes are used when executing the parallel stage.
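The selection rules above, together with the earlier rule that a stage runs on the nodes common to the job and stage constraints, can be modelled with a few lines of Python. This is a simplified model for reasoning about pool constraints, not the parallel engine's actual algorithm; the function names and the example pool names "sort" and "db" are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set

DEFAULT_POOL = ""  # the default pool is named by the empty string


def nodes_in_pools(config: Dict[str, Set[str]], pools: Iterable[str]) -> List[str]:
    """Return the nodes that belong to at least one of the given pools."""
    wanted = set(pools)
    return [name for name, member_of in config.items() if member_of & wanted]


def select_nodes(config: Dict[str, Set[str]],
                 job_pools: Optional[List[str]] = None,
                 stage_pools: Optional[List[str]] = None) -> List[str]:
    """Pick the nodes a stage runs on under the rules in these notes."""
    # No constraint at all: every node in the default pool.
    if not job_pools and not stage_pools:
        return nodes_in_pools(config, [DEFAULT_POOL])
    # Both constrained: only the nodes common to job and stage.
    if job_pools and stage_pools:
        stage_nodes = set(nodes_in_pools(config, stage_pools))
        return [n for n in nodes_in_pools(config, job_pools) if n in stage_nodes]
    # Exactly one constraint: the nodes of that constraint.
    return nodes_in_pools(config, job_pools or stage_pools)


if __name__ == "__main__":
    # Maps node name to the set of pools it is assigned to.
    config = {"node1": {""}, "node2": {"", "sort"}, "node3": {"sort", "db"}}
    print(select_nodes(config))                        # unconstrained
    print(select_nodes(config, stage_pools=["sort"]))  # stage constrained
    print(select_nodes(config, job_pools=["sort"], stage_pools=["db"]))
```

In this model node3 is never used by an unconstrained stage, because its pools list omits the default pool name.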

7. http://www-01.ibm.com/support/docview.wss?uid=swg21584967

8. http://pic.dhe.ibm.com/infocenter/iisinfsv/v9r1/index.jsp?topic=%2Fcom.ibm.swg.im.iis.ds.parjob.tut.doc%2Fmodule3%2Flesson3.2designingamorecomplexjob.html

9. Clear the &PH& directory in the project directory. Each DataStage project directory contains a &PH& directory, which holds information about active stages that is used for diagnostic purposes. Files are added to the &PH& directory every time a job runs, so it needs to be cleaned out periodically.
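The periodic &PH& cleanup mentioned in item 9 could be scripted along the following lines. This is a cautious sketch, not an IBM-supplied procedure: the function name clear_ph, the 7-day age threshold, and the dry-run default are all assumptions, and it is safest to run such a cleanup only when no jobs are running in the project.

```python
import time
from pathlib import Path
from typing import List


def clear_ph(project_dir: Path, max_age_days: float = 7.0,
             dry_run: bool = True) -> List[Path]:
    """Remove files in <project_dir>/&PH& older than max_age_days.

    Returns the files selected. With dry_run=True nothing is deleted.
    The age threshold is an assumption, not a documented value.
    """
    ph_dir = project_dir / "&PH&"
    selected: List[Path] = []
    if not ph_dir.is_dir():
        return selected

    cutoff = time.time() - max_age_days * 86400
    for entry in sorted(ph_dir.iterdir()):
        # Only plain files are considered; subdirectories are left alone.
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            if not dry_run:
                entry.unlink()
            selected.append(entry)
    return selected


if __name__ == "__main__":
    # Example project path taken from the notes above; adjust as needed.
    project = Path("/opt/IBM/InformationServer/Server/Projects/myproject")
    for path in clear_ph(project, dry_run=True):
        print("would remove", path)
```

Starting with dry_run=True lets you review the list of files before anything is deleted.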
