Kendall Kinnear, System Analyst, Standard Motor Products, Inc. (214) 843-0841 [email protected]
Watches on IBM i
Why did I explore Watches?
Looking for an easy way to monitor QHST messages
- Remote journal outages occurring
- Messages only in QHST
- No monitoring software
Started investigating options
- Purchase monitoring software ($$$$)
- Process QHST files with a program
- Find another way: Watches
I’d never heard of these; it turns out they are pretty cool
What are Watches?
Officially System Event Watches
Implemented in V5R4
Types of events to monitor for
- Messages anywhere in the system
  - Any job log
  - History log
  - Message queue (*MSGQ)
- Licensed Internal Code (LIC) Log
- Product Activity Log (V6.1 or higher)
User written program called when event occurs
Watch related commands
- Start Watch (STRWCH)
- End Watch (ENDWCH)
- Work with Watches (WRKWCH)
Implemented via Start Watch (STRWCH)
- Session ID (SSNID): unique across system
- Watch for message (WCHMSG): *NONE required if watching PAL or LIC
- Watched message queue (WCHMSGQ): message queue to watch
  - *SYSOPR
  - *JOBLOG
  - *HSTLOG
  - Specific message queue
- Watched job (WCHJOB): required if message queue is *JOBLOG
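The parameter notes above can be sketched as a single command. This is only an illustration: the session ID MYWATCH, the program MYLIB/MYWCHPGM, the message CPF0001, and the job MYUSER/MYJOB are all placeholder names, not anything from the presentation.

```cl
/* Hypothetical example: watch the job logs of all jobs named   */
/* MYUSER/MYJOB for message CPF0001. Because WCHMSGQ is         */
/* *JOBLOG, WCHJOB must also be specified.                      */
STRWCH     SSNID(MYWATCH) WCHPGM(MYLIB/MYWCHPGM) +
             WCHMSG((CPF0001)) WCHMSGQ((*JOBLOG)) +
             WCHJOB((*ALL/MYUSER/MYJOB))
```

When the watched message arrives in a matching job log, the system calls MYLIB/MYWCHPGM with the four watch program parameters.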
Watch Event Program
User written program can
- Execute any command, APIs, etc.
- Execute any function accessible to a *USER state/*USER domain program
- Adopt owner authority if necessary
Four parameters required
- Watch event (*MSGID, *LICLOG, *PAL)
- Session identifier
- Error detection (output)
- Event specific data
Watch Program Parameters
Watch Event
- Why the program was called
- Values:
  - *MSGID: a match on a message ID and comparison data
  - *LICLOG: a match on a LIC log and comparison data
  - *PAL: a match on a PAL and comparison data
  - *STRWCH: watch session is starting
  - *ENDWCH: watch session is ending
Session ID
- The session ID specified or generated by STRWCH
Error Detected
- Return code to indicate completion status
- Blank = successful
- *ERROR = error occurred in watch program, watch is canceled
Event Specific Data
- Data structure
- Details about event that triggered the watch
- Parse to retrieve information
When a Watch Event Occurs
- QSCWCHPS job is used to call the watch program
- If there is a problem with the watch program itself, check the job log for the QSCWCHPS job
- WRKJOB QSCWCHPS will show all active jobs and jobs in OUTQ status
To isolate the job, try:
- Use CHGJOB LOG(4 00 *SECLVL) LOGCLPGM(*YES) to make sure that messages and CL commands are logged
- Use DMPCLPGM to dump the CL program, including variables and error messages
- Include DLYJOB in the program, then issue WRKACTJOB SBS(QUSRWRK) and look for the QSCWCHPS job in DLYW status
- Send a message that requires a reply to put the job in MSGW, then use WRKACTJOB SBS(QUSRWRK) and look for the QSCWCHPS job in MSGW status
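As a rough sketch, the first three debugging aids might be combined in a throwaway watch program like the one below. The 60-second delay and the placement of each command are assumptions for illustration only; the parameter declarations mirror the ones used in the example program later in this deck.

```cl
/* Hypothetical skeleton: debugging aids inside a watch program */
             PGM        PARM(&WCHOPTN &SSNID &ERRCOD &EVTDATA)
             DCL        VAR(&WCHOPTN) TYPE(*CHAR) LEN(10)
             DCL        VAR(&SSNID)   TYPE(*CHAR) LEN(10)
             DCL        VAR(&ERRCOD)  TYPE(*CHAR) LEN(10)
             DCL        VAR(&EVTDATA) TYPE(*CHAR) LEN(2048)

             /* Log all messages and CL commands in the job log  */
             CHGJOB     LOG(4 00 *SECLVL) LOGCLPGM(*YES)

             /* Hold the job in DLYW so it can be found with     */
             /* WRKACTJOB SBS(QUSRWRK); 60 seconds is assumed    */
             DLYJOB     DLY(60)

             /* ... watch logic would go here ...                */

             /* Dump CL variables and messages to a spooled file */
             DMPCLPGM

             CHGVAR     VAR(&ERRCOD) VALUE(' ')  /* Blank = successful */
             ENDPGM
```

Remove the DLYJOB and DMPCLPGM once the problem is found, since every watch event would otherwise wait out the delay and produce a dump.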
Watch out for…
- Single watch program used by more than one watch:
  - If you access shared storage, it could create contention
  - If you access system objects, make sure any locks are released to avoid issues
- Too many watches can cause performance issues
- Requires *SERVICE special authority
- Server QSCWCHPS must be running in subsystem QUSRWRK
  - STRPJ SBS(QUSRWRK) PGM(QSYS/QSCWCHPS) if not active
Interesting Links
Knowledge Center: http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_71/cl/strwch.htm?lang=en
STRWCH - Watch Exit Programs Explained with CL Example: http://www-01.ibm.com/support/docview.wss?uid=nas8N1011571
Other articles
- http://www.redbooks.ibm.com/abstracts/tips0839.html?Open#contents
- http://support.rjssoftware.com/content/using-event-watches-iseries
So why did I need an event watch?
Remote journals receive a TCP RESET command at random times
- Not issued by partner IBM i
- Something in network spoofs the IP address of partner
Results in CPF70D5 in QHST for failed journal
CPC6984 issued when journal restarted
Issue Robot/ALERT for each event
- Let Network Support know RESET received
- Let me know if journal doesn’t restart
Simple solution until cause of TCP RESET is found
Watch program example
Input parameters
MMXRJMON:   PGM        PARM(&WCHOPTN &SSNID &ERRCOD &EVTDATA)

            DCL        VAR(&WCHOPTN) TYPE(*CHAR) LEN(10)   /* Reason program called */
            DCL        VAR(&SSNID)   TYPE(*CHAR) LEN(10)   /* Session ID            */
            DCL        VAR(&ERRCOD)  TYPE(*CHAR) LEN(10)   /* Return error code     */
            DCL        VAR(&EVTDATA) TYPE(*CHAR) LEN(2048) /* Event data            */
Exploded parameter variables
            /* Event data work area for use by *DEFINED variables */
            DCL        VAR(&EVTDTADFND) TYPE(*CHAR) LEN(2048)
            DCL        VAR(&MSGID) TYPE(*CHAR) STG(*DEFINED) LEN(7) +
                         DEFVAR(&EVTDTADFND 5)      /* Watched message       */
            DCL        VAR(&OFSRPLDTA) TYPE(*INT) STG(*DEFINED) LEN(4) +
                         DEFVAR(&EVTDTADFND 441)    /* Offset to repl data   */
            DCL        VAR(&LENRPLDTA) TYPE(*INT) STG(*DEFINED) LEN(4) +
                         DEFVAR(&EVTDTADFND 445)    /* Length of repl data   */
            DCL        VAR(&PMSGRPLDTA) TYPE(*PTR)  /* Pointer to repl data  */
            DCL        VAR(&BMSGRPLDTA) TYPE(*CHAR) STG(*BASED) LEN(512) +
                         BASPTR(&PMSGRPLDTA)        /* Based repl data       */
            DCL        VAR(&MSGRPLDTA) TYPE(*CHAR) LEN(512) /* Msg replacement data */
Work variables
            /* Work variables */
            DCL        VAR(&JRNNAM)  TYPE(*CHAR) LEN(10)  /* Journal name   */
            DCL        VAR(&FSTLVL)  TYPE(*CHAR) LEN(132) /* 1st level text */
            DCL        VAR(&SCDLVL)  TYPE(*CHAR) LEN(132) /* 2nd level text */
            DCL        VAR(&MSGDTA)  TYPE(*CHAR) LEN(132) /* Message data   */
            DCL        VAR(&SYSNAME) TYPE(*CHAR) LEN(10)  /* System name    */
Main logic
            /* Set up generic monitor to handle unexpected errors and alert support team */
            MONMSG     MSGID(CPF0000 LVE0000) EXEC(GOTO CMDLBL(GENERR))

            /* Get system name for message handling control */
            RTVNETA    SYSNAME(&SYSNAME)

            /* Copy parameter to work data */
            CHGVAR     VAR(&EVTDTADFND) VALUE(&EVTDATA)

            /* Message replacement data */
            CHGVAR     VAR(&PMSGRPLDTA) +
                         VALUE(%ADDR(&EVTDTADFND))  /* Set base pointer          */
            CHGVAR     VAR(%OFS(&PMSGRPLDTA)) +
                         VALUE(%OFS(&PMSGRPLDTA) + &OFSRPLDTA) /* Point to start of data */
            IF         COND(&LENRPLDTA *GT 0) THEN(DO)      /* If rpl data     B01 */
              CHGVAR     VAR(&MSGRPLDTA) +
                           VALUE(%SST(&BMSGRPLDTA 1 &LENRPLDTA)) /* Store rpl data 01 */
            ENDDO                                            /* End if rpl data E01 */
            /* Extract journal name from message replacement data */
            CHGVAR     VAR(&JRNNAM) VALUE(%SST(&MSGRPLDTA 1 10))

            /* If CPF70D5 Remote Journal Failure */
            IF         COND(&MSGID = 'CPF70D5') THEN(DO)    /* If CPF70D5      B01 */
              CHGVAR     VAR(&FSTLVL) VALUE('Remote journal' *BCAT +
                           &JRNNAM *TCAT ' failed and should +
                           auto-restart, check journaling for +
                           issues.')                /* Build 1st level text 01 */
              CHGVAR     VAR(&SCDLVL) VALUE('Message CPF70D5 +
                           received for remote journal failure. +
                           Sign-on to system and verify +
                           remote journals.')       /* Build 2nd level text 01 */
              CALLSUBR   SUBR(MIMIXLOG)             /* Send messages        01 */
              GOTO       CMDLBL(END)                /* Exit program         01 */
            ENDDO                                   /* End msg CPF70D5     E01 */
            /* If CPC6984 Remote Journal Started */
            IF         COND(&MSGID = 'CPC6984') THEN(DO)    /* If CPC6984      B01 */
              CHGVAR     VAR(&FSTLVL) VALUE('Remote journal' *BCAT +
                           &JRNNAM *TCAT ' started.') /* Build 1st level text 01 */
              CHGVAR     VAR(&SCDLVL) VALUE('Message CPC6984 +
                           received for remote journal +
                           start.')                 /* Build 2nd level text 01 */
              CALLSUBR   SUBR(MIMIXLOG)             /* Send messages        01 */
              GOTO       CMDLBL(END)                /* Exit program         01 */
            ENDDO                                   /* End msg CPC6984     E01 */

            /* Exit program */
            GOTO       CMDLBL(END)                  /* Exit program */
Error handling and send messages
            /* General Error Handler */
GENERR:     CHGVAR     VAR(&FSTLVL) VALUE('Unexpected error +
                         occurred in MONRJCMN. +
                         Monitor ended')            /* Build message  */
            CHGVAR     VAR(&ERRCOD) VALUE('*ERROR') /* Indicate error */
            CALLSUBR   SUBR(MIMIXLOG)               /* Send messages  */

            /* MIMIX Log Messages Subroutine */
            SUBR       SUBR(MIMIXLOG)               /* Start subroutine */
              CHGVAR     VAR(&MSGDTA) VALUE(&FSTLVL *CAT &SCDLVL) /* Build msg data */
              MIMIX/ADDMSGLOGE MSGID(LVI0005) MSGDTA(&MSGDTA) +
                           SEV(40) PRD(*MIMIX)      /* Log message      */
              CALLSUBR   SUBR(ROBOTMSG)             /* Page with ROBOT  */
            ENDSUBR
            /* ROBOT Paging Subroutine */
            SUBR       SUBR(ROBOTMSG)
              IF         COND(&SYSNAME = XXXXXXXXXX) THEN(DO) /* If on SMPLEW01 */
                RBTALRLIB/RBASNDMSG MSG(&FSTLVL) TOPG(MIMIXINF) +
                             RSP(*NO)               /* Send alert       */
                MONMSG     MSGID(CPF0000 LVE0000)   /* Ignore errors    */
              ENDDO                                 /* End if LEW01     */
              ELSE       CMD(DO)                    /* If not SMPLEW01  */
                MIMIX/RUNCMD CMD(RBTALRLIB/RBASNDMSG MSG(&FSTLVL) +
                             TOPG(MIMIXINF) RSP(*NO)) PROTOCOL(*TCP) +
                             HOST(XXXXXXXXXX)       /* Page via LEW01   */
                MONMSG     MSGID(CPF0000 LVE0000)   /* Ignore errors    */
              ENDDO
            ENDSUBR
END: ENDPGM
Active Watch
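For reference, a watch like this one might be started, checked, and ended roughly as follows. This is a sketch under assumptions: the session ID RJMON and the library MYLIB are made up, and it assumes the example program was compiled under the name MMXRJMON (the label on its PGM statement).

```cl
/* Hypothetical: watch the history log for the remote journal   */
/* failure (CPF70D5) and restart (CPC6984) messages             */
STRWCH     SSNID(RJMON) WCHPGM(MYLIB/MMXRJMON) +
             WCHMSG((CPF70D5) (CPC6984)) WCHMSGQ((*HSTLOG))

/* List the watches on the system to confirm the session is active */
WRKWCH

/* End the watch session when it is no longer needed */
ENDWCH     SSNID(RJMON)
```

Both message IDs go in one session so that a single call to the watch program handles either event, which is why the program tests &MSGID.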
Questions?