
Admin Alert: When System Job Tables Attack, Part I

Vigilance pays, negligence punishes. Want proof? Try ignoring i5/OS message CPI1468 (System job tables nearing capacity) the next time it shows up. A system job table overflow can prevent your system from accepting new jobs, delete your spooled files, fill up your DASD, or stop an IPL. And those are its good points. This week and next, I'll look at system job table overflows and how they affect your system.

What Are System Job Tables and Why Should I Care?

System job tables are internal system objects used by the i5/OS operating system to track every job on a partition. By default, there can be up to 10 job tables on a partition, and each table can track up to 16352 jobs, for a maximum of 163520 jobs (10 x 16352). The maximum number of system jobs for a partition is designated in the Maximum number of jobs (QMAXJOB) system value, and that number can be changed to any valid number between 32000 and 485000 jobs. QMAXJOB's shipped value is 163520, matching the default table capacity.

Tracked jobs include active jobs, completed jobs, and jobs that are waiting to be run from job queues. A new system job entry is created every time work (in the form of a job) is submitted to the system. Completed jobs are tracked and remain on the system for as long as there is spooled file output present for the job.

If too many jobs remain active in the system, the job tables can become too big and approach their maximum size. As your system approaches its maximum number of jobs, a number of problems can occur, including the following reported issues.

- Slow backups
- Your system may stop accepting new jobs
- High DASD usage, because of the high number of spooled files on the system
- Performance problems with certain i5/OS commands and APIs
- IPL problems if the system can't start up any new jobs

It's not a pretty picture, which is why it's wise to occasionally check your system job table usage. i5/OS reminds you when system job table usage is approaching a critical point by issuing the following CPI1468 message.

CPI1468 - System job tables nearing capacity.

According to IBM, the job table checking code was changed in OS/400 V4R5M0 to send a CPI1468 message to the QSYSOPR message queue, QHST history log, and the QSYSMSG critical message queue (if it exists) every time the tenth system job table is extended to track more jobs. If you ignore the messages and the system job tables fill up, you may see some of the problems listed above.
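If QSYSMSG exists on your system, one way to make sure a CPI1468 never slips by is a small never-ending watch job that receives messages from that queue. Here's a minimal CL sketch of the idea; the message text and the NOTIFYUSR profile are illustrative assumptions, not part of IBM's mechanism, and a production version would also want to allocate QSYSMSG to itself.

             PGM
             DCL        VAR(&MSGID) TYPE(*CHAR) LEN(7)

 /* Wait forever on QSYSMSG. RMV(*NO) marks each message as old
    without deleting it, so each RCVMSG returns only new messages. */
 LOOP:       RCVMSG     MSGQ(QSYS/QSYSMSG) MSGTYPE(*ANY) +
                          WAIT(*MAX) RMV(*NO) MSGID(&MSGID)

 /* Escalate the job table warning to someone who will act on it */
             IF         COND(&MSGID *EQ 'CPI1468') +
                          THEN(SNDMSG MSG('Job tables nearing capacity') +
                          TOUSR(NOTIFYUSR))
             GOTO       CMDLBL(LOOP)
             ENDPGM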

The Simple Solution that Doesn't Exist

Since excessive job table entries can cause so many problems, you may want to set the system warning message threshold lower, similar to what you might do with other i5/OS thresholds. A lower threshold would provide you with more time to remove excessive jobs and to clean up your system job tables. But IBM won't let you do that. The operating system has no mechanism for changing the threshold value at which the CPI1468 message is sent.


Since you can't lower the warning threshold value, that leaves you with three options for detecting and dealing with system job table problems.

1. Monitoring the system to determine when you are approaching high job table usage.
2. Detecting and deleting excessive jobs that are cluttering your system.
3. Maintaining the job table entries to remove excessive unused entries, to check for job table damage, and to compress the job tables.

This week, I'll cover how to monitor the system for high job table usage. Next week, I'll look at how to detect and delete excessive system jobs and how to maintain your job table entries.

Monitoring System Job Table Usage

To monitor job table usage, use the green screen Display Job Table command (DSPJOBTBL), as follows:

DSPJOBTBL OUTPUT(*)

DSPJOBTBL provides a screen display similar to the following:

Display Job Tables

 Permanent job structures:              Temporary job structures:
   Initial . . . . :     30               Initial . . . . :     20
   Additional  . . :     10               Additional  . . :     10
   Available . . . :  72480               Available . . . :    583
   Total . . . . . : 126625
   Maximum . . . . : 163520

                  ---------------------Entries----------------------
   Table       Size      Total   Available    In-use   Other
     1      16752384     16352          0      16352       0
     2      16749312     16352         99      16253       0
     3      16749312     16352       1718      14634       0
     4      16749312     16352      11136       5216       0
     5      16749312     16352      15891        461       0
     6      16749312     16352      16312         40       0
     7      16749312     16352      15833        519       0

There are three sections to this display: the Permanent job structures, the Temporary job structures, and the Entries.

The Permanent job structures area summarizes the number of permanent job structure entries that exist in the system. A permanent job structure is assigned to each new job that enters the system. These entries are available for reuse, but an entry cannot be recycled until either: a) its current job ends without producing any spooled file output; or b) all the spooled file output for the job is printed or deleted.

The three fields to pay attention to in this section are the Available, Total, and Maximum permanent job structures. The Maximum figure is the maximum number of jobs that are currently allowed on the system, which is the value contained in the QMAXJOB system value discussed above. The Total figure reports the total number of entries contained in all system job tables. Total includes permanent job entries for currently active jobs as well as reusable entries for jobs that have already ended without leaving any spooled file output on the system. Available entries are the number of permanent entries that are available for reuse by new jobs entering the system.

To determine if your system job tables need work, check the permanent job structure entry numbers against each other. If there are a small number of Available entries in relation to the Total entries, the system may experience performance degradation because it will have to extend the job tables as new jobs enter the system. Curiously, if there are a large number of Available entries in relation to the Total, system performance will also suffer when performing functions that examine jobs. Too many available permanent entries can cause degraded performance during IPL steps that process table functions. Finally, if the Total number of jobs is approaching the Maximum system jobs, the CPI1468 message will soon appear and you may start experiencing some of the problems listed above.
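If you'd rather not eyeball DSPJOBTBL by hand, the jobs-in-system count is also available programmatically. The following CL sketch calls the Retrieve System Status (QWCRSSTS) API; per the SSTS0200 format documentation, jobs in system is the 4-byte binary value at decimal offset 36, but verify that offset on your release. The 150000 threshold and the message text are illustrative assumptions; pick a number that makes sense against your QMAXJOB.

             PGM
             DCL        VAR(&RCVR)    TYPE(*CHAR) LEN(148)
             DCL        VAR(&RCVRLEN) TYPE(*CHAR) LEN(4) +
                          VALUE(X'00000094')    /* 148 bytes */
             DCL        VAR(&FORMAT)  TYPE(*CHAR) LEN(8) +
                          VALUE('SSTS0200')
             DCL        VAR(&RESET)   TYPE(*CHAR) LEN(10) VALUE('*NO')
             DCL        VAR(&ERRCOD)  TYPE(*CHAR) LEN(8) +
                          VALUE(X'0000000000000000')
             DCL        VAR(&JOBS)    TYPE(*DEC) LEN(10 0)

 /* Retrieve basic system status into the receiver variable */
             CALL       PGM(QWCRSSTS) PARM(&RCVR &RCVRLEN &FORMAT +
                          &RESET &ERRCOD)

 /* Jobs in system: 4-byte binary at offset 36 (position 37) */
             CHGVAR     VAR(&JOBS) VALUE(%BIN(&RCVR 37 4))

 /* Illustrative threshold, safely below QMAXJOB */
             IF         COND(&JOBS *GT 150000) +
                          THEN(SNDMSG MSG('Job table usage is high') +
                          TOUSR(*SYSOPR))
             ENDPGM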

For job table overflow purposes, you can pretty much ignore the Temporary job structures section of the DSPJOBTBL display. New jobs entering the system are also assigned a reusable temporary job structure entry. This entry is returned to the Temporary job structure pool when the job ends. The Temporary job structures area also lists out the current number of available temporary entries.

The Entries area indicates how heavily used each of the partition's 10 possible job tables is. It shows you the total size of each job table, how many entries each job table can hold, and the number of available and in-use entries. You can use the Entries information to back up what you see in the Permanent job structures area. If there are a small number of available entries in the permanent job structure area, that should be reflected in the job table detail in the Entries area. Similarly, if the permanent job structure area shows a large number of available entries, you can see which tables have the most available entries in the Entries area.

The other nice feature about the Entries area is that it can show how the in-use job entries are distributed. Press F11 and the entries screen will shift from a summary of the total, available, and in-use entries for each job table to a listing of how all the in-use job entries are allocated for each table. Here's an example of what that screen looks like.

                              Display Job Tables

 Permanent job structures:              Temporary job structures:
   Initial . . . . :     30               Initial . . . . :     20
   Additional  . . :     10               Additional  . . :     10
   Available . . . :  72474               Available . . . :    592
   Total . . . . . : 126625
   Maximum . . . . : 163520

               ------------------In-use Entries------------------
                           Job     Output    Job Log
   Table     Active      Queue      Queue    Pending
     1          127          0      16197         28
     2           43          0      16285          4
     3          300          0      14665         14
     4            1          0       4775         22
     5            0          0        461          0
     6            0          0         40          0
     7            0          0        519          0

With F11, the Entries area changes its display to show you how many of its in-use jobs are active (running), in a job queue (waiting to run), ended but sitting on an output queue with spooled files waiting to print, or sitting in a Job Log pending state. This can provide direction as to whether it will be worth it to compress the job tables to create more space or whether you need to remove old jobs from the system.

Admin Alert: When System Job Tables Attack, Part II

What Are System Job Tables, Again?

System job tables are internal system objects that i5/OS uses to track partition jobs. There can be up to 10 job tables on a partition, and the maximum number of system jobs for a partition is designated in the Maximum Number of Jobs (QMAXJOB) system value.

As the number of partition jobs approaches the system's QMAXJOB value, your system job tables start to fill up and a number of issues can occur. These issues include slow backups, failure to accept new jobs, high DASD usage due to a large number of spooled files, performance issues, and IPL problems. The closer the number of system jobs gets to the QMAXJOB value, the more likely you are to experience these issues.

You can check your current system job table status by running the Display Job Tables (DSPJOBTBL) command, which generates a screen display that looks like this.

Display Job Tables

 Permanent job structures:              Temporary job structures:
   Initial . . . . :     30               Initial . . . . :     20
   Additional  . . :     10               Additional  . . :     10
   Available . . . :  72480               Available . . . :    583
   Total . . . . . : 126625
   Maximum . . . . : 163520

                  ---------------------Entries----------------------
   Table       Size      Total   Available    In-use   Other
     1      16752384     16352          0      16352       0
     2      16749312     16352         99      16253       0
     3      16749312     16352       1718      14634       0
     4      16749312     16352      11136       5216       0
     5      16749312     16352      15891        461       0
     6      16749312     16352      16312         40       0
     7      16749312     16352      15833        519       0

The specifics on reading this display are contained in last week's column. Generally, you can detect the following situations when examining the Permanent Job Entries on the DSPJOBTBL screen.

1. If the total number of permanent job structure entries starts approaching the maximum system jobs, your system may start experiencing the problems listed above. i5/OS will issue a CPI1468 message: System job tables nearing capacity. In this situation, you will want to delete old jobs from the system to decrease the total number of permanent job entries.

2. Your system contains a small number of available entries in relation to the system's total entries. Your system may experience performance degradation because it will have to continually extend the job tables as new work enters the system. Deleting old jobs will also help in this situation.

3. Your system contains a large number of available entries in relation to the total entries. This can cause problems when running functions that examine jobs. It can also degrade performance during IPL steps that process table entries. You will want to compress your system job tables in this situation.

I'll cover how to handle the first two situations this week and talk about the third situation next week.

What Causes Job Table Usage To Run High?

The number one reason that system job table problems occur is that organizations maintain too many spooled files on a system. It may be that system applications overproduce spooled file output, that spooled files aren't efficiently pruned from the system by printing or deletion, or that your organization likes to hang on to its system output. Whatever the reason, too many jobs with unprocessed spooled files can cause performance problems, and getting control of spooled file output is critical to restoring your system.

The easiest way to check for excessive spooled files is to browse through your output queues by using the Work with Output Queues (WRKOUTQ) command, like this:

WRKOUTQ OUTQ(*ALL)

This will display all the output queues on your system and how many spooled files are in each output queue. From here, it's fairly easy to identify which output queues contain too many spooled files. Here's a sample WRKOUTQ display that I recently ran on one of my machines that had excessive job table entries.

Work with All Output Queues

Type options, press Enter.
  2=Change   3=Hold   4=Delete   5=Work with   6=Release
  8=Description   9=Work with Writers   14=Clear

Opt   Queue        Library      Files   Writer   Status
      QDKT         QGPL             0            RLS
      QKROUTQ      QGPL             0            RLS
      QPFROUTQ     QGPL             0            RLS
      QPRINT       QGPL        321969            RLS
      QPRINTH      QGPL             0            RLS
      QPRINTM      QGPL            28            RLS
      QPRINTS      QGPL             0            RLS
      QPRINT1      QGPL             0            RLS
      QPRINT2      QGPL           387            RLS
      QUERY        QGPL           830            RLS

A screen like this makes it easy to identify where excessive job table entries are coming from. The hard part is deciding which spooled files to delete. Some users can be notoriously fussy when it comes to removing their spooled file output. It's helpful if your shop has a policy that it will only keep spooled files for a certain time period, perhaps 30 days. With a policy in place, you can set up routines to automatically delete spooled files that are older than a certain date. To help with that task, I published a generic routine for selectively deleting spooled files according to any criteria you wish (see Selectively Deleting OS/400 Spooled Files below). After spooled files are deleted, the jobs associated with the spooled files will be removed from the system and your available permanent job structure entries will go up. If you want to use the brute force method, you can also free up entries by simply clearing overcrowded output queues when you find them.
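If you do go the brute-force route, the Clear Output Queue (CLROUTQ) command wipes out every spooled file in a queue in one pass. Using the overloaded QPRINT queue from the display above, that would be:

CLROUTQ OUTQ(QGPL/QPRINT)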


It's also worthwhile to check the number of entries in the QEZJOBLOG output queue. On busy systems, tens of thousands of job logs can be kept on the system, taking up disk storage and filling up the system job tables. In the i5/OS CLEANUP options, IBM provides a setting that lets you designate how many days to keep job logs and other system output. Job logs older than the designated number of days are automatically deleted. Take the following steps to check and reduce a partition's days-to-keep value for job logs and other system output.

1. Call the Cleanup Tasks menu by executing the following Go to Menu (GO) command:

GO MENU(CLEANUP)

2. Take option 1, Change cleanup options, from the menu that appears. You can change the number of days to keep values on this screen.

By reducing this value to a reasonable number, you are setting up an automatic routine to prune your system job tables by deleting excessive job logs and the jobs they are associated with.

Keeping Your Spooled Files and Eating Them, Too

There's a second way to take care of excessive permanent structure entries caused by jobs that contain unprinted and undeleted spooled files. In i5/OS V5R2 and above, IBM offers a system value called Spooled file action (QSPLFACN). You can set QSPLFACN to one of two values: *KEEP (its default) or *DETACH. When set to *KEEP, spooled files remain associated with the job that produced them, and the system table entries for jobs containing spooled files remain on the system with a status of completed.

Two things happen when you change QSPLFACN to *DETACH. First, when the job ends, all its associated spooled files are detached from the job. After the spooled files are detached, the job itself is removed and its permanent job structure entry is recycled back to the system. *DETACH processing takes effect immediately for any job that becomes active after the change occurs. It does not affect jobs that ended before the change occurred.
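In practice, the system-wide change is a single command. As a gentler alternative (my suggestion, not from the original article), SPLFACN also exists as a job-level attribute on SBMJOB, CHGJOB, and job descriptions in supported releases, so you can detach spooled files for selected work only; MYLIB/NIGHTLY below is a made-up program name.

 /* Change the partition-wide default */
             CHGSYSVAL  SYSVAL(QSPLFACN) VALUE(*DETACH)

 /* Or detach spooled files for one submitted job only */
             SBMJOB     CMD(CALL PGM(MYLIB/NIGHTLY)) JOB(NIGHTLY) +
                          SPLFACN(*DETACH)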

Detaching spooled files from their originating jobs can be a great boon in keeping your system job tables under control, because it keeps your job tables relatively small. However, it can also cause problems in locating spooled file output through i5/OS job commands. After changing QSPLFACN to *DETACH, you will no longer be able to find completed jobs and their associated spooled files by using any of the following job commands:

Work with Job (WRKJOB)

Work with Submitted Jobs (WRKSBMJOB)

Work with User Jobs (WRKUSRJOB)
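Spooled-file-oriented commands should still locate detached output, however. For example, to list all the spooled files a given user produced (JSMITH is an illustrative profile name):

WRKSPLF SELECT(JSMITH)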

The bottom line is that setting QSPLFACN to *DETACH is a great move if you will never need to locate spooled files by examining the jobs that created them (and this includes looking for job logs). If your spooled files are more device-oriented than job-oriented (such as barcode labels or packing lists) and you don't care about finding the jobs that created them, QSPLFACN can help you skirt most of the issues with system job table overflow.

A Correction From My Last Column (Already)


In last week's article, I stated that i5/OS has no mechanism for changing the threshold value at which the CPI1468 message is sent. I was wrong. Reader C. Barbie wrote in to tell me that IBM is offering two PTFs to add this capability to i5/OS V5R4 and V6.1 systems:

V5R4 – PTF SI29585
V6.1 – PTF SI30171

Once these PTFs are applied, you can create a two-digit decimal data area called QMAXJOBPCT that contains a new, lower threshold value to use for sending out the CPI1468 error message. If you want the system to warn you when system job table usage reaches 80 percent, for example, you can create QMAXJOBPCT by using the following Create Data Area command (CRTDTAARA).

CRTDTAARA DTAARA(QSYS/QMAXJOBPCT) TYPE(*DEC) LEN(2 0) VALUE(80)

To activate the new threshold, you would do one of two things: either IPL the system or change the QMAXJOB system value. The new QMAXJOBPCT data area value takes effect immediately after one of these two actions is completed.
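For example, changing QMAXJOB to any valid value picks up the new threshold without an IPL; the 200000 below is purely illustrative, so substitute a limit appropriate for your partition.

CHGSYSVAL SYSVAL(QMAXJOB) VALUE(200000)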

Selectively Deleting OS/400 Spooled Files

While i5/OS and OS/400 provide options for automatically deleting system job logs and other system output after a set number of days, there is no option for automatically deleting spooled file output stored in non-IBM output queues. To remedy this failing, I looked at the options and came up with a quick CL program for automatically putting spooled file attribute information into a work file and then using that file to selectively delete spooled files in an output queue that meet certain criteria. Here's how it works.

The key to selectively deleting spooled files is to generate a list of the spooled files included in that output queue and then to copy that information into an OS/400 physical file for automated deletion. To generate my work physical file, I first ran the following Work with Output Queue command (WRKOUTQ).

WRKOUTQ OUTQ(LIBRARY_NAME/OUTPUT_QUEUE_NAME) OUTPUT(*PRINT)

This command produces a printout with a spooled file name of QPRTSPLQ. This printout contains all the file attributes of all the spooled files residing in the designated output queue. These attributes include the spooled file name, user name, job name, and job number that produced the file, as well as the spooled file number, user data, file status, and the date and time that the file was produced. Once I had that printout, I then used a Copy Spooled File command (CPYSPLF), such as the one shown below, to port that information into an OS/400 file.

CPYSPLF FILE(QPRTSPLQ) TOFILE(LIBRARY_NAME/WRKOUTQPRT) SPLNBR(*LAST)

Page 8: Admin Alert

Although there is no OS/400 file that provides a physical file layout for the WRKOUTQPRT file that I am porting this information to, I created my own layout that codifies the information for each spooled file contained in the QPRTSPLQ report. Here is the layout for the WRKOUTQPRT file used in this article.

     A          R QPRTSPLQ
     A            FILL0          1A         TEXT('FILLER')
     A            SPLFIL        10A         TEXT('FILE NAME')
     A            FILL1          1A         TEXT('FILLER')
     A            USER          10A         TEXT('USER NAME')
     A            FILL2          1A         TEXT('FILLER')
     A            USRDTA        10A         TEXT('USER DATA')
     A            FILL3          2A         TEXT('FILLER')
     A            STATUS         5A         TEXT('STATUS')
     A            FILL4          1A         TEXT('FILLER')
     A            PAGES          5A         TEXT('# OF PAGES')
     A            FILL5          1A         TEXT('FILLER')
     A            COPIES         5A         TEXT('# OF COPIES')
     A            FILL6          2A         TEXT('FILLER')
     A            FRMTYP        10A         TEXT('FORM TYPE')
     A            FILL7          1A         TEXT('FILLER')
     A            PTY            2A         TEXT('PRIORITY')
     A            FILL8          5A         TEXT('FILLER')
     A            FILNUM         6A         TEXT('FILE NUMBER')
     A            FILL9          5A         TEXT('FILLER')
     A            JOB           10A         TEXT('JOB NAME')
     A            FILL10         1A         TEXT('FILLER')
     A            JOBNUM         6A         TEXT('JOB NUMBER')
     A            FILL11         1A         TEXT('FILLER')
     A            JOBDTE         8A         TEXT('JOB DATE')
     A            FILL12        22A         TEXT('FILLER')

Once I worked through the mechanics of creating a spooled file information file and compiled the DDS for the WRKOUTQPRT file, I could then create a CL program that creates and reads through that file and automatically deletes any spooled files that meet my criteria. Here's the code I used to create the program.

             PGM        PARM(&OUTQNAME &OUTQLIB)
             DCL        VAR(&OUTQNAME) TYPE(*CHAR) LEN(10)
             DCL        VAR(&OUTQLIB) TYPE(*CHAR) LEN(10)
             DCL        VAR(&COUNTER) TYPE(*DEC) LEN(15 5) VALUE(1)
             DCL        VAR(&WORK) TYPE(*CHAR) LEN(1) VALUE('0')
             DCL        VAR(&WORK1) TYPE(*CHAR) LEN(1) VALUE('0')
             DCL        VAR(&ZERO) TYPE(*CHAR) LEN(1) VALUE('0')
             DCL        VAR(&COPIES1) TYPE(*CHAR) LEN(5) VALUE(' ')
             DCL        VAR(&FILNUM1) TYPE(*CHAR) LEN(6) VALUE(' ')
             DCL        VAR(&DELETE) TYPE(*CHAR) LEN(1)
             DCLF       FILE(WRKOUTQPRT)

 /* Make sure a private copy of the work file exists in QTEMP */
             CHKOBJ     OBJ(QTEMP/WRKOUTQPRT) OBJTYPE(*FILE)
             MONMSG     MSGID(CPF9801) EXEC(CRTDUPOBJ +
                          OBJ(WRKOUTQPRT) FROMLIB(QGPL) +
                          OBJTYPE(*FILE) TOLIB(QTEMP))
             OVRDBF     FILE(WRKOUTQPRT) TOFILE(QTEMP/WRKOUTQPRT) +
                          MBR(*FIRST)

 /* Capture the output queue listing into the work file */
             WRKOUTQ    OUTQ(&OUTQLIB/&OUTQNAME) OUTPUT(*PRINT)
             CPYSPLF    FILE(QPRTSPLQ) TOFILE(QTEMP/WRKOUTQPRT) +
                          SPLNBR(*LAST)

 /* Read one record per spooled file until end of file */
 GETIT:      RCVF
             MONMSG     MSGID(CPF0864) EXEC(GOTO CMDLBL(ENDPGM))
             CHGVAR     VAR(&COPIES1) VALUE(' ')
             CHGVAR     VAR(&FILNUM1) VALUE(' ')
             CHGVAR     VAR(&COUNTER) VALUE(0)

 /* Zero-fill the copies and file number fields from the left */
 COUNT:      IF         COND(&COUNTER *LT 6) THEN(DO)
             CHGVAR     VAR(&COUNTER) VALUE(&COUNTER + 1)
             IF         COND(&COUNTER *LT 6) THEN(DO)
             CHGVAR     VAR(&WORK) VALUE(%SST(&COPIES &COUNTER 1))
             IF         COND(&WORK *EQ ' ') THEN(CHGVAR +
                          VAR(&COPIES1) VALUE(&COPIES1 *TCAT &ZERO))
             ELSE       CMD(CHGVAR VAR(&COPIES1) VALUE(&COPIES1 +
                          *TCAT &WORK))
             ENDDO
             CHGVAR     VAR(&WORK1) VALUE(%SST(&FILNUM &COUNTER 1))
             IF         COND(&WORK1 *EQ ' ') THEN(CHGVAR +
                          VAR(&FILNUM1) VALUE(&FILNUM1 *TCAT &ZERO))
             ELSE       CMD(CHGVAR VAR(&FILNUM1) VALUE(&FILNUM1 +
                          *TCAT &WORK1))
             GOTO       CMDLBL(COUNT)
             ENDDO

 /* Skip header and blank lines: only numeric copies values pass */
             IF         COND(&COPIES1 *LE '00000') THEN(GOTO +
                          CMDLBL(GETIT))
             IF         COND(&COPIES1 *GE '99999') THEN(GOTO +
                          CMDLBL(GETIT))
             CHGVAR     VAR(&DELETE) VALUE(' ')

 /* INSERT LOGIC TO SET THE DELETE FLAG ON HERE. WHEN A SPOOLED
    FILE MEETS THE DELETION CRITERIA, SET THE &DELETE VARIABLE
    TO 'Y' */

             IF         COND(&DELETE *EQ 'Y') THEN(DO)
             DLTSPLF    FILE(&SPLFIL) JOB(&JOBNUM/&USER/&JOB) +
                          SPLNBR(&FILNUM1)
             ENDDO
             GOTO       CMDLBL(GETIT)

 ENDPGM:     DLTOVR     FILE(WRKOUTQPRT)
             DLTF       FILE(QTEMP/WRKOUTQPRT)
             ENDPGM


Here's how the code works.

The program is called and submitted to batch with two parameters specifying the name and library of the output queue from which you want to selectively delete spooled files (the &OUTQNAME and &OUTQLIB variables).
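For instance, assuming the program was compiled as DLTOLDSPLF in MYLIB (both names are illustrative), a run against the QPRINT queue in QGPL might be submitted like this:

SBMJOB CMD(CALL PGM(MYLIB/DLTOLDSPLF) PARM('QPRINT' 'QGPL')) JOB(DLTSPLF)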

I declare my variables and I also use the Declare File command (DCLF) to declare that the program is going to use the WRKOUTQPRT file shown above as its input file.

Using the Check Object command (CHKOBJ), I check to see if a copy of the WRKOUTQPRT file already resides in the job's temporary library, QTEMP. If the file is not there, I use the Create Duplicate Object command (CRTDUPOBJ) to create a copy of WRKOUTQPRT in QTEMP. Then I use the Override with Database File command (OVRDBF) to redirect any calls to the WRKOUTQPRT file to the version of the file that is now residing in my temporary library. I am using the QTEMP version of the file so that I can simultaneously run the program in two job queues without creating a conflict as two jobs try to lock the WRKOUTQPRT file at the same time.

I next populate the WRKOUTQPRT file with spooled file information from my designated output queue by using the WRKOUTQ and CPYSPLF commands as I explained above.

At the GETIT: label, the program enters the first of two loops. The GETIT: loop uses the Receive File command (RCVF) to read one record at a time from the WRKOUTQPRT file. It then performs all the processing to determine if the spooled file represented by that record should be deleted.

Inside the GETIT: loop, there is a second loop that preps two file fields for further processing. Starting at the loop defined by the COUNT: label, the program populates two work variables called &COPIES1 and &FILNUM1 with variations of the same values that are contained in the number of copies field (&COPIES) and the spooled file number fields (&FILNUM) from the WRKOUTQPRT record. The reason we have to make copies of these fields is that, because they originated from a WRKOUTQ printout, these numeric fields had their leading zeroes suppressed in the original printout. In order to again use these fields as numeric values in the program, we first have to zero-fill both fields from the left. And that's what the loop designated by the COUNT: label in the program does.

At the end of the COUNT: loop, I then use two IF commands to determine whether the record we are processing is a data record containing information about a spooled file in the output queue, or whether it is a header or a blank line from the report it was copied from. I do this by checking to see if the information in the zero-filled &COPIES1 field (number of copies to be printed of the spooled file) is numeric. If it is numeric (a value between '00000' and '99999'), I let the record through for deletion consideration. If it's not numeric, I send the program back to the top of the GETIT: loop to retrieve the next record for processing.


The next thing you'll notice about the code is that I have added a comment designating where you would put the logic to set the delete flag on (designated by the &DELETE variable being set to 'Y'). For your purposes, you would enter whatever code or program call would perform your date arithmetic to determine whether &DELETE should be set to 'Y'. Date arithmetic in CL is its own adventure, and that code could easily take up a good part, if not all, of this article. Date arithmetic has also been covered in IT Jungle before in an excellent article, complete with code, called Date Handling in CL Procedures, written by Ted Holt. In Ted's article, you can get plenty of ideas for how to handle the date manipulation in your program. Also, Bruce Guetzkow has just done another excellent article on date handling issues in this very edition of Four Hundred Guru.
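As one illustration of what could go inside that comment block, the sketch below flags anything older than a cutoff date. It assumes the report's &JOBDTE field arrives as MM/DD/YY with separators and that &CUTOFF is a CYMD-format value you compute elsewhere (the declarations would go at the top of the program with the others); adjust the substring positions to match what your system date format actually produces.

             DCL        VAR(&MDY)    TYPE(*CHAR) LEN(6)
             DCL        VAR(&CYMD)   TYPE(*CHAR) LEN(7)
             DCL        VAR(&CUTOFF) TYPE(*CHAR) LEN(7) /* e.g. '1060701' */

 /* Strip the separators from MM/DD/YY, then convert to CYMD
    so dates compare correctly across the century boundary */
             CHGVAR     VAR(&MDY) VALUE(%SST(&JOBDTE 1 2) *CAT +
                          %SST(&JOBDTE 4 2) *CAT %SST(&JOBDTE 7 2))
             CVTDAT     DATE(&MDY) TOVAR(&CYMD) FROMFMT(*MDY) +
                          TOFMT(*CYMD) TOSEP(*NONE)
             IF         COND(&CYMD *LT &CUTOFF) THEN(CHGVAR +
                          VAR(&DELETE) VALUE('Y'))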

If you wanted to, you could also set the deletion criteria according to a different standard than spooled files that are over 30 days old. My program can easily be modified to delete any spooled files in an output queue that have a status of 'SAV', that have a certain form type, or that were produced by a specific user or a specific job. So you have some flexibility in how you want to delete the spooled files.

After determining if the record represents a legitimate spooled file and whether it should be deleted, the Delete Spooled File command (DLTSPLF) will delete it as long as the &DELETE variable is set to 'Y'.

After the program finishes processing all the records in the WRKOUTQPRT file, it goes to the ENDPGM: label, where it performs end-of-program housekeeping. This includes deleting the WRKOUTQPRT file override and deleting the QTEMP/WRKOUTQPRT file. And then the program ends.

How do I print a Rack Configuration for my System i / AS/400?

This document gives the instructions for obtaining a Rack Configuration listing from non-LPAR eServer iSeries systems and LPAR listings from LPAR systems.  This document is divided into sections for the different cases.

Thanks to the people contributing to the contents of this document:

Rudy Ramos / Avnet   Gilbert Tellez / Avnet

Vess Natchev / IBM   Michael Bohlen / IBM

For non-LPAR systems:

To obtain a Rack Config and email it:

1. Sign on to the iSeries with an appropriate userid.
2. On a command line, run the STRSST command.
3. Select option 1, Start a service tool.
4. Select option 7, Hardware service manager.
5. Press F6 to Print Configuration.
6. Take the defaults on Print Format Options (use 132 columns). Do not print it.
7. Email the Rack Config listing as described in the section below titled "To email listings from an iSeries Output Queue".

For eServer iSeries (non-i5) LPAR systems:

To obtain LPAR Resources Listing on an iSeries:

1. In the primary partition, sign on to the iSeries with an appropriate userid.
2. On a command line, run the STRSST command.
3. Select option 5, Work with partitions.
4. Select option 1, Display partition information.
5. Select option 5, Display system I/O resources.
6. At the Level of detail to display field, type *ALL to set the level of detail to ALL.
7. Press F6 to print the system resources (use 132 columns). Do not print it.
8. Email the LPAR listing as described in the section below titled "To email listings from an iSeries Output Queue".

For eServer i5 LPAR systems:

For i5 LPAR systems, the procedure is more complex.  Some information is obtained from each i5/OS partition and some information is obtained on the HMC.  Please provide all of the information.

A. For each i5/OS partition obtain a Rack Config listing:

1. Sign on to each i5/OS partition with an appropriate userid.
2. On a command line, run the STRSST command.
3. Select option 1, Start a service tool.
4. Select option 7, Hardware service manager.
5. Press F6 to Print Configuration.
6. Take the defaults on Print Format Options (use 132 columns). Do not print it.
7. Email the Rack Config listing from each partition as described in the section below titled "To email listings from an iSeries Output Queue".
8. Repeat for each i5/OS partition.

You will need to identify each file for each partition.  You may want to rename the files to match the partition names.

B. From the HMC obtain the I/O partition information:

The detailed instructions for step B are described in the section titled “Customer Rack Retrieval process thru the HMC”.  The process is outlined here and assumes you already have PuTTY installed on your PC:

1. All partitions must be active in order to obtain complete information.
2. Using PuTTY from a PC attached to the LAN with the HMC, sign on to the HMC with the 'hscroot' or another authorized profile.
3. Enter the following command into the HMC (substitute your managed system name; if the name has embedded blanks, then enclose it in "quotes"):

lshwres -m <managed system> -r io --rsubtype slot -F unit_phys_loc,bus_id,phys_loc,lpar_name,description,vpd_type,lpar_id --header

4. To copy the results out of PuTTY, click on the top left and select copy all to clipboard.
5. Paste the results into a text document (using Word, WordPad, Notepad, etc.).
6. Be sure to save the document as a "plain text" document with the .txt file extension.
7. Along with the Rack Config listings from step A, email this text document to your IBM Business Partner or IBM representative.

To email listings from an iSeries Output Queue:

These steps need to be done on a PC attached to the iSeries.

1. Obtain the listing on the iSeries as instructed.  Do not print the listing, leave it in the output queue.

2. Open Windows Explorer and locate a folder to place the listing in.
3. Start iSeries Navigator.
4. If necessary, move the iSeries Navigator or Windows Explorer windows so both can be seen at the same time.
5. On the iSeries Navigator window, click on the plus (+) sign beside the system name.
6. Then click on the (+) sign beside Basic Operations.
7. Single click on Printer Output.
8. In the right panel, find the listing.
9. Drag the listing (holding down the left mouse button) from the output queue to the folder in Windows Explorer where the listing is to be placed and drop it in that folder.
10. Email the listing text file that was placed in the Windows Explorer folder to your IBM Business Partner or IBM representative. Be sure to leave the document as a "plain text" document with the .txt extension.

Customer Rack Retrieval process thru the HMC

Here’s the quick step-by-step guide of how to retrieve the system rack config on eServer i5.

This is a beginning-to-end walkthrough, from downloading the necessary SSH client to rack examples.

1. From a LAN-attached PC, use the SSH protocol to get a remote command-line session; the HMC does not have Telnet enabled. You will also need a second application to paste the text into (Windows Notepad is recommended), since no such application is available on the HMC itself.

To download a no-cost SSH client, obtain PuTTY.exe from here: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

This is the Windows version; if you are running Linux, download the Linux version. It is a single executable file.

2. Double-click on PuTTY to start it, and then enter the hostname or IP address of your HMC, and click “Open.”


If this is the first time you are connecting with SSH to a particular HMC, a security warning will appear.  If you are sure you are connecting to the correct system, click OK.  Sign onto the HMC with ‘hscroot’ or another authorized profile.

3. All partitions must be active in order to obtain complete information.

4. This general command displays many parameters for each physical slot in the system.  For example, if you want to display all unit locations and serial numbers, system buses, slots, partition names, I/O device descriptions, vpd types (feature codes) and LPAR IDs in that order, you would be working with parameters unit_phys_loc, bus_id, phys_loc, lpar_name, description, vpd_type and lpar_id:

lshwres -m <managed system> -r io --rsubtype slot -F unit_phys_loc,bus_id,phys_loc,lpar_name,description,vpd_type,lpar_id --header

Substitute your managed system name, for example iTCLSQ1. If the system name has embedded blanks, enclose the name in quotes, for example "520 10-5536D". The system name is case sensitive.

(If a sample of the PuTTY command you are copying from does not include vpd_type, lpar_id, or --header, please add these to your command.)
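Putting it together with the sample system name from above, the full command (entered on one line) would be:

lshwres -m iTCLSQ1 -r io --rsubtype slot -F unit_phys_loc,bus_id,phys_loc,lpar_name,description,vpd_type,lpar_id --header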

5. You can copy out of and paste into PuTTY:


A) To copy out of PuTTY, click on the top left and select copy all to clipboard. In another window (such as a Word, WordPad or Notepad document), use that program’s paste function (such as Ctrl-V) to paste the text.

Be sure to save the document you pasted the PuTTY output into as a “plain text document” with the .txt file extension.

This is very useful in taking the “rack config” out of PuTTY and copying it into a text document.

B) To paste into PuTTY, highlight the text you want to paste in another program, and use that program's copy function, such as Ctrl-C. Then right-click in the PuTTY window to paste the text. This is very handy for taking example commands and pasting them into the PuTTY window.

6. Getting help.  The HMC is shipped with electronic manual pages for its commands.  For example:  to learn how to use the lshwres command, type: man lshwres (and press Enter):

You can use the arrow keys, the Space bar, or the Page Up and Page Down keys to navigate the manual page. To exit back to the command line, type "q".

Admin Alert: The i5 Battery Checking Process

Why Regular Battery Changes Are Important

Lithium ion battery packs provide power for disk caching on your i5 disk arrays. When one of these batteries dies, it may not cause a disk drive failure, but it will disable caching for the disk array it controls, and that will have a significant impact on your hard drive performance. According to IBM, the lithium ion batteries are hot swappable, which means that they can be replaced while your system is up and running. However, depending on your machine type, one of the batteries may control a disk array inside the system unit itself, and the system must be taken down to replace that battery.

Determining When Your Cache Batteries Need Changing


Like the nickel-based batteries that came with older iSeries and AS/400 machines, it may not be apparent when and how these batteries should be changed. Fortunately, i5/OS provides two different mechanisms that warn you when the cache batteries are approaching the end of their life cycle and need changing.

The first warning is automatically issued when the batteries reach their designated warning condition, which is about 90 days before IBM generally estimates they will fail. The i5/OS operating system calculates when a battery is reaching the end of its life cycle, and it will issue the following two messages to the System Operator Message Queue (QSYSOPR) when it reaches that state.

CPPEA13 - *Attention* Contact your hardware service provider
CPP8988 - A critical system hardware problem has occurred. Critical Message Handler has been run.

If you see these messages in QSYSOPR, you can enter option '5=Display' in front of the CPPEA13 message and then select F14=Work with problem to enter the Work with Problems screen for this message. You should also note that you can enter this function from the green screen by entering the Work with Problem command (WRKPRB) from a command line. Once inside WRKPRB, enter option 5=Display details in front of the new Problem ID that was created with this warning. This will bring you to the Display Problem Details screen. On this screen, note the System Reference Code value, which you will need when you call IBM support to replace the battery.

In order to ensure that this is a cache battery problem, select F5=Display possible causes on the Display Problem Details screen. The Select Possible Cause Information screen that appears will provide an option to view a Problem analysis list (which you can reach by selecting option 1). If this incident is a battery problem, the screen will display the Cache battery pack message as a possible cause. If you see this cause, call IBM immediately to order new batteries and to schedule battery replacement.

Besides visually spotting the CPPEA13 message in QSYSOPR, you can also set up an email or paging alert to contact you if the system issues the CPPEA13 message, provided your system is running a system monitoring and notification package, such as Bytware's MessengerPlus.

If you don't see the CPPEA13 message in QSYSOPR, you can also specifically monitor for the situation if you're running i5/OS V5R3 or above. In V5R3, IBM added a new option to the System Service Tools function (SST) that allows you to display and work with any system resources that contain cache battery packs. You start System Service Tools by running the following Start System Service Tools command (STRSST).

STRSST

After you sign in to SST, you can check the status of all cache batteries on your machine by selecting option 1 (Start a Service Tool) followed by option 7 (Hardware Service Manager), and option 9 (Work with resources containing cache battery packs). The Work with resources containing battery packs screen displays all the resources that contain a battery pack. If you take an option 5 (Display battery information) for any of the battery packs, you will see a screen that looks something like this.

Battery Information


 Resource name . . . . . . . . . . . :   DC01
 Serial number . . . . . . . . . . . :   xx-xxxxxxx
 Type-model  . . . . . . . . . . . . :   2780-001
 Frame ID  . . . . . . . . . . . . . :   3C01
 Card position . . . . . . . . . . . :   C02
 Battery type  . . . . . . . . . . . :   Lithium Ion (LiIon)
 Battery state . . . . . . . . . . . :   Warning condition
 Power-on time (days)  . . . . . . . :   806
 Adjusted power-on time (days) . . . :   945
 Estimated time to warning (days)  . :   0
 Estimated time to error (days)  . . :   75

The critical items to note on this screen are the Estimated time to warning (days) and the Estimated time to error (days) values. When the Estimated time to warning (days) field falls to zero, the system issues the CPPEA13 error message mentioned above. As the Estimated time to error (days) field approaches zero, the chance of your cache battery pack failing grows, and you should replace the batteries as soon as possible to prevent disk drive degradation. So even if the system hasn't yet issued a CPPEA13 error, you can check these values on each of your partitions to determine how close a cache battery pack is to exceeding its regular life cycle.

Battery Checking Guidelines

If you haven't received an error message and your i5 machine is more than a year old, it is worth your while to check your cache pack batteries once every few months to make sure that you are aware of any upcoming battery replacements. Here are my guidelines for checking your batteries.

1. Check all the batteries on all of your partitions. If you find one battery that's ready to be replaced, chances are good that other batteries in that same partition or other partitions will also need to be replaced. It's worthwhile to change as many batteries as possible at the same time, so you don't have to deal with multiple visits by IBM service. So be sure to check all your partitions to make sure that you catch any old batteries that are approaching end of life anywhere within your i5 box.

2. Call IBM as soon as possible if you detect a failing battery. Note that both the message warning value and the time to error value are marked as estimates. This means that these numbers are not exact predictive values; battery failure could be much closer than you think, and you should get IBM in as soon as possible to help survey the situation and schedule a cache battery pack change. Again, if one of the lithium ion batteries fails, it will not crash your machine. However, it will affect disk drive performance and cause a slowdown in returning data to your applications.

Once you detect a failing battery and call, IBM should send out a service tech (if you're on maintenance) to survey the situation and order the batteries. Once you receive the batteries, call IBM back to schedule your install. Again, don't wait too long to change the batteries, as they could theoretically fail any time after the warning message period expires. Remember, the warning periods represent estimated life cycles for your cache battery packs. In actual usage, the battery packs may last longer or shorter than IBM's estimate.

Note that this advice is valid for i5 machines running i5/OS V5R3 or above. For older iSeries and AS/400 machines running V5R2 and below (including machines running OS/400 V4R5 and earlier), check the instructions I've previously documented in an article called Checking Your iSeries Batteries.

Admin Alert: Checking Your iSeries Batteries


Most people don't pay attention to their cache batteries until they spot an OS/400 error message stating that their cache battery is about to die. If you're under maintenance, you can call IBM to arrange for a replacement battery, as well as a visit from a technician to install the new battery and to reset the error. Because of the potential for system problems, you should call IBM as soon as possible after getting a battery warning error. But these errors are generally timed so that you have about 90 days to replace the battery before it fails. So don't panic, but don't ignore the warning, either, or you may find yourself in trouble if the battery suddenly fails before its 90 days are up.

IBM will send you a replacement battery (which is about the size of a battery you might see in a portable phone), and will give you instructions for calling for a replacement appointment once the battery arrives. To replace the battery, you must take down the partition where the I/O adapter with the failing cache battery resides, so that the technician can pull the I/O adapter card and put in the new battery.

But the batteries don't always fail at the same time, especially if you've added or replaced I/O adapters on your system. So while you're planning to take down a partition or two (especially if the failed battery resides in a primary partition, which will disable the whole system), you may want to inventory the other batteries on your system and ask IBM to change any that are close to issuing a failure warning. This way, you only have to take your system down once to replace all of your older batteries.

To find the status of batteries, open a green-screen 5250 session and go into each partition's "system service tools" menu, by typing in the Start System Service Tools (STRSST) command. Beginning with OS/400 V5R1, IBM requires you to type in a user ID and password before entering SST. While this sounds easy, it's also incredibly easy to disable or forget your SST password. (If you need help understanding how to set or reset an SST password, see "Bringing V5R1 DST Passwords Under Control.")

Once you enter the SST menu, perform the following commands to display the status of your batteries.

1. Type in option 1 from the "system services" menu, "start a service tool."

2. Select option 4 from the "start a service tool" menu, "display/alter/dump."

3. Select option 1 from the "display/alter/dump" output device menu, "display/alter storage."

4. Select option 2, "licensed internal code (LIC) data," from the "select data" menu.

5. Select option 14, "advanced analysis," from the "select LIC data" menu.

6. On the "select advanced analysis command" screen, place a 1 in front of the BATTERYINFO command, and press Enter.


7. On the option line for the BATTERYINFO command, type -INFO -ALL, and press Enter.

Performing this procedure displays the status of all batteries assigned to your partition. This BATTERYINFO results screen shows the frame and card position of each battery, the battery type, and the estimated time (in days) before OS/400 issues an oncoming failure warning on that battery, as well as the estimated time (in days) before the battery actually could fail. And if you have multiple partitions with multiple I/O adapter cards on your system, you should run this procedure on every partition to get a complete inventory of batteries needing maintenance.

My personal guideline is to ask IBM to replace any battery that is within a year of issuing a failure warning. Since iSeries boxes are renowned for running for months or even longer without a shutdown, this should be a reasonable timeframe. After you get the complete information on all batteries on the system that need to be replaced within a year, call IBM to order the batteries and schedule the service call.

There are several other options you can run once you're inside BATTERYINFO. You can find these options by running the BATTERYINFO macro with the "help" option. But be careful when running this command, because it contains one option that will force an error on an active battery cache pack.

Also be aware that, if you're running OS/400 V5R2, there is a PTF that you must apply in order to display battery pack status information or to force a battery pack error. The PTF number is MF32343, which is applied to licensed program 5722999.
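To verify whether that PTF is already installed on a V5R2 box, you can display it directly:

DSPPTF LICPGM(5722999) SELECT(MF32343)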

By following these simple instructions, you can easily inventory your battery pack to monitor the health of your I/O adapter cards and to plan for orderly battery replacements.