operations & availability


Upload: sathyan-mahalingam

Post on 16-Sep-2015


  • Copyright IBM Corporation 2004. Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

    Welcome to:

    3.1 Operations and Availability

  • Unit Objectives

    After completing this unit, you should be able to:
    - Understand how to achieve high availability
    - Consider the role that backup, failure recovery, and applying updates play in daily operations
    - List what needs to be backed up and when
    - Discuss additional monitoring and operational technology aids that can help with daily management

  • Topic 1

    1. High Availability
    2. Daily Operations, Backup, and Recovery
    3. External Tools for Monitoring and Operations

  • Failure Recovery - Software and Hardware

    A plan to recover from failure needs to be in place before the failure happens.

    Software Failure
    - Recovery may be automatic
    - Re-create components if necessary
    - Restore backups if necessary

    Hardware Failure, Single System
    - Recovery is less likely to be automatic
    - For a high availability requirement, use:
      - WMQ Clustering, and/or
      - Platform-level failover using HACMP, MS Cluster Server, Veritas, and so forth, followed by WMQ and Message Broker recovery using the SupportPac failover procedure
    - Combine clustering and platform failover to build a highly available system

    Hardware Failure, Catastrophic
    - Recovery is almost never automatic and usually takes place at a different physical site
    - Create components
    - Restore backups for all components
    - Determine the need for reprocessing of work and initiate it
    - Success depends on good, synchronized backups

  • High Availability

    Availability
    - Percent Availability = Up-Time / (Up-Time + Down-Time) * 100
    - Down-Time = Scheduled Down-Time + Unscheduled Down-Time

    Factors to ensure high availability

    Hardware and Software Configurations
    - Disk mirroring
    - Server redundancy
    - Veritas
    - Uninterruptible power supplies (UPS)

    Application Design
    - Applications need to support non-disruptive release upgrades
    - Avoid single points of failure

    Data Center Organization
    - Strict change control
    - Comprehensive testing
    - Operations support
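The availability formula on this slide can be worked through with a small calculation. The following is a minimal sketch; the function name and the sample hours are illustrative assumptions, not figures from the course material.

```python
def percent_availability(up_time: float,
                         scheduled_down: float,
                         unscheduled_down: float) -> float:
    """Percent Availability = Up-Time / (Up-Time + Down-Time) * 100,
    where Down-Time = Scheduled Down-Time + Unscheduled Down-Time.
    All arguments must use the same time unit (hours here)."""
    down_time = scheduled_down + unscheduled_down
    total = up_time + down_time
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return up_time / total * 100


if __name__ == "__main__":
    # Hypothetical year of operation: 8760 hours in total.
    hours_in_year = 365 * 24
    scheduled = 20.0     # assumed planned maintenance windows
    unscheduled = 6.28   # assumed outages
    up = hours_in_year - scheduled - unscheduled
    print(f"{percent_availability(up, scheduled, unscheduled):.2f}%")
```

Note that scheduled down-time counts against availability in this formula, which is why non-disruptive release upgrades appear among the application design factors.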

  • Issues Contributing to High Availability

    - Reliable hardware
    - Shared queues
    - Health monitoring
    - Failover clustering
    - Online backup
    - Dual networks
    - Reliable operating system
    - Online reconfiguration
    - Application design
    - WebSphere MQ Clustering
    - Fast reboot
    - RAID disks
    - Crash recovery
    - Speed of recovery
    - Fast startup
    - IP takeover
    - Documented procedures
    - Practicing procedures

  • Highly Available WebSphere MQ/WBIMB Configurations

    Achieved by a combination of two distinct technologies:
    - Hardware clustering (for example, HACMP): provides high availability of a single server within a WMQI hub
    - Software clustering: provides load balancing and high service availability across the whole hub, by allowing individual servers within the hub to become unavailable while the other servers continue to operate and service requests to the hub

    Use both of these technologies in messaging hubs to achieve high throughput and availability.

  • Messaging and Availability

    [Table: compares messaging configurations by availability. Configurations: Restart/Failover Clustering (HA, RAID); Distributed Clustering (Fastnet); No clustering support; Shared Q; Tandem. Rows: access to existing messages; access for new messages. Cell values: none, automatic, or continuous.]

  • Software Clustering

    Logical hub network technology
    - Multiple physical WebSphere MQ queue managers
    - Multiple brokers in WBIMB
    - Spreads the workload
    - Improves performance and availability

    Messaging Hubs
    - Potential for bottlenecks: resolved by scaling
    - Potential single point of failure: resolved by availability/recoverability
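The idea of spreading workload across several queue managers, while tolerating the loss of one, can be illustrated with a toy routing function. This is a simplified sketch and not the WebSphere MQ cluster workload algorithm. The function name is hypothetical, and QM-C is an assumed third queue manager.

```python
from itertools import cycle
from typing import Iterable, List, Sequence, Set, Tuple


def route_messages(messages: Iterable[str],
                   queue_managers: Sequence[str],
                   unavailable: Set[str]) -> List[Tuple[str, str]]:
    """Assign each message to the next available queue manager in
    round-robin order, skipping any queue manager marked unavailable."""
    available = [qm for qm in queue_managers if qm not in unavailable]
    if not available:
        raise RuntimeError("no queue manager in the hub is available")
    targets = cycle(available)
    return [(msg, next(targets)) for msg in messages]


if __name__ == "__main__":
    hub = ["QM-A", "QM-B", "QM-C"]
    # QM-B is down; the rest of the hub keeps servicing requests.
    for msg, qm in route_messages(["m1", "m2", "m3", "m4"], hub, {"QM-B"}):
        print(msg, "->", qm)
```

New work keeps flowing while one server is out, but messages already queued on the failed server stay there until it is recovered, which is why the course pairs software clustering with platform failover.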

  • Clustered Servers Running a Queue Manager

  • Cold Standby

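  • Active/Active

    [Diagram: two systems, each labeled with /var/mqm and /usr/lpp/mqm, running queue managers QM-A and QM-B. Each queue manager is shown with its own IP address (ipaddr) and its own log and data directories: /var/mqm/log/QM-A and /var/mqm/qmgrs/QM-A for QM-A, /var/mqm/log/QM-B and /var/mqm/qmgrs/QM-B for QM-B.]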

  • Availability - Two Servers

  • z/OS Shared Queues

  • Failover on Various Platforms

    Platform          Failover Facility
    z/OS              ARM
    AIX               HACMP
    HP-UX             ServiceGuard
    OpenVMS
    Solaris           Sun Cluster
    Tru64             TruCluster
    WinNT, Win2000    MSCS

  • Topic 2

    1. High Availability
    2. Daily Operations, Backup, and Recovery
    3. External Tools for Monitoring and Operations

  • Operations

    Daily or Regularly Scheduled Operations Tasks
    - Backups
      - Database: Configuration Manager and Broker repositories
      - WMQ log
      - Software configuration and individual workspace files
    - Monitoring
      - Runtime systems and application monitoring for problem detection
      - Business process monitoring for business analysis

    Exceptional Operations Tasks
    - Code maintenance
    - Problem determination: identification, repair, and implementation
    - Failure recovery

  • What Could Go Wrong?

    All WBIMB components must be protected against the following failures:
    - CPU failure
    - Disk failure
    - System loss
    - Application failure
    - WMQ object corruption
    - WMQ object deletion
    - Database corruption
    - Database deletion
    - Loss/reset of environment variable settings

    These situations can affect any WBIMB component running in development, test, or production.

  • Key Areas to Consider in Backing Up WBIMB

    WBIMB does not hold any data itself.
    - Messages exist on WMQ queues
    - Configuration data is held in databases
    - Source code is kept in the developer workspace or an external SCM
      - Message flows
      - Message sets
      - User-defined nodes and parsers (plug-ins)

    Aspects of WBIMB Backups
    1. WMQ queues used by Message Broker
    2. WBIMB code and configuration files
    3. WBIMB product and application databases
    4. Developer artifacts (source code)

  • Backup - 1. WMQ Queues

    Brokers are WMQ applications processing production data.
    - Tight backup procedures are required
    - Backup procedures for base MQSeries apply to broker queue managers and their queues

    The queue manager used by the Configuration Manager does not handle production data.
    - Backup procedures need not be as tight
    - Deploys and command messages can be redriven
    - Ensure that the queue manager can be rebuilt

    Select the appropriate log file type for disaster recovery: circular versus linear logs.

    Archive log files, back up the WMQ configuration file, and synchronize with the database backup.
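One way to check that the WMQ and database backups in a set were taken close enough together to be restored as a pair is to compare their completion times. The sketch below is hypothetical: the function name, the backup labels, and the 15-minute tolerance are assumptions for illustration, not something the course material prescribes.

```python
from datetime import datetime, timedelta
from typing import Dict


def backups_synchronized(completed_at: Dict[str, datetime],
                         tolerance: timedelta) -> bool:
    """Return True if every backup in the set completed within
    `tolerance` of every other backup in the set."""
    if not completed_at:
        raise ValueError("no backups to compare")
    times = list(completed_at.values())
    return max(times) - min(times) <= tolerance


if __name__ == "__main__":
    # Hypothetical completion times for one nightly backup set.
    backup_set = {
        "wmq_logs": datetime(2004, 3, 1, 2, 0),
        "wmq_config_file": datetime(2004, 3, 1, 2, 5),
        "broker_database": datetime(2004, 3, 1, 2, 10),
    }
    print(backups_synchronized(backup_set, timedelta(minutes=15)))
```

A timestamp comparison only flags backup sets that were obviously taken apart. It does not prove the queue and database contents are consistent with each other.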

  • Backup - 2. WBIMB Product Code and Configuration

    The complete WBIMB directory structure
    - Developer Toolkits
    - Configuration Manager workstation
    - Runtime broker machines

    Code and plug-in LIL files in usr/opt/mqsi
    - Back up on installation and on plug-in update

    Configuration in the broker file system /var/mqsi
    - Back up on broker creation and change, and when a user database is added (odbc.ini)
    - Consider development, test, and production systems

    Registry entries for ConfigMgr and Broker
    - Configuration information such as the databases to access, DB/service IDs, and passwords
    - HKEY_LOCAL_MACHINE\SOFTWARE\IBM\WebSphereMQIntegrator

    Code for user-written plug-in nodes and parsers
    - Runtime plug-in source code
    - Runtime executable code
    - Node definition files used at configuration time (in the Toolkit)

  • Backup - 3. WBIMB Databases

    After each production deployment, back up:
    - Configuration Manager database: contains critical domain configuration information and ACLs (domain and Pub/Sub)
    - Broker databases: contain deployed object information

    For broker database tables updated outside of the deploy process (retained publications, subscriber list):
    - Configure the database manager to store changes in its log files
    - If DB2 is used, the DBM should employ archival logging

    Application databases
    - Contain user data and are accessed by in-flight WBIMB message flows
    - Backup should conform to the practices in place for other databases in the enterprise

  • Backup - 4. Developer Artifacts

    Message flows, ESQL, mappings, message sets, test data, and plug-in nodes
    - All code artifacts are stored in file systems

    Programmer workspace
    - Projects can be distributed in the file system

    Software Configuration Manager repository

    Toolkit provides Local History
    - Customize days to keep files, entries per file, and maximum file size

  • Recovery Scenarios

    Recovery plans should contain documented procedures for recovery from various failure scenarios. For WBIMB, consider the following list:
    - Execution group or single message flow fails
    - Broker fails
    - WMQ queue manager fails
    - UserNameServer fails
    - Configuration Manager fails
    - Configuration Manager and queue manager fail

    Include for each recovery scenario:
    - Details of the individual components that are needed
    - Where each component is restored from
    - Steps to be performed to restore each component
    - The order of the work items to be performed
    - The personnel involved

    Test the recovery procedure:
    - Record details of the time taken to restore full service
    - Highlight critical stages
    - Capture details regarding the complexity of stages
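The contents of a documented recovery scenario, as listed on this slide, can be captured in a small data structure so that each plan records the same fields. This is a minimal sketch; the class names, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RestoreStep:
    order: int              # position in the sequence of work items
    component: str          # component that is needed
    restored_from: str      # where the component is restored from
    action: str             # what to do to restore it
    critical: bool = False  # highlight as a critical stage


@dataclass
class RecoveryPlan:
    scenario: str
    personnel: List[str]
    steps: List[RestoreStep] = field(default_factory=list)
    test_minutes: List[float] = field(default_factory=list)

    def ordered_steps(self) -> List[RestoreStep]:
        """Steps in the order they are to be performed."""
        return sorted(self.steps, key=lambda s: s.order)

    def critical_stages(self) -> List[str]:
        """Components whose restore step is flagged as critical."""
        return [s.component for s in self.ordered_steps() if s.critical]

    def record_test(self, minutes: float) -> None:
        """Record the time one test run took to restore full service."""
        self.test_minutes.append(minutes)

    def worst_restore_time(self) -> Optional[float]:
        """Longest recorded restore time, or None if never tested."""
        return max(self.test_minutes) if self.test_minutes else None


if __name__ == "__main__":
    plan = RecoveryPlan(
        scenario="Broker fails",
        personnel=["WMQ administrator", "Database administrator"],
        steps=[
            RestoreStep(2, "Broker database", "database backup",
                        "restore and roll forward logs", critical=True),
            RestoreStep(1, "Broker queue manager", "WMQ backup",
                        "re-create and restore"),
        ],
    )
    plan.record_test(45.0)
    plan.record_test(60.0)
    print([s.component for s in plan.ordered_steps()])
    print(plan.critical_stages(), plan.worst_restore_time())
```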

  • Monitoring

    Brokers should be monitored for performance, errors, and so forth.

    Several places to watch:
    - Message flow input queues: number of messages and backout count on the first message
    - Message flow output queues
    - Failure queues
    - Dead letter queue
    - Backout queue
    - System log (NT Event Log and UNIX system logs)

    Monitoring can (and should) be done with automated tools.
    - More on available tools in the next topic

    You may want to enable automated response.
    - Can use the same XML messaging that is used by the Configuration Manager to monitor the Broker
    - Subscribe to $SYS topics
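The queue checks listed above (depth, and backout count on the first message) lend themselves to simple threshold rules. The sketch below works on already-collected snapshots and does not call any WebSphere MQ API; the class, the function, the queue name FLOW.IN, and the thresholds are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class QueueSnapshot:
    name: str
    depth: int                    # current number of messages
    max_depth: int                # alert when depth exceeds this
    first_msg_backout: int = 0    # backout count of the first message
    max_backout: int = 0          # alert when backout count exceeds this


def find_alerts(snapshots: Iterable[QueueSnapshot]) -> List[str]:
    """Return one alert string per threshold that a queue exceeds."""
    alerts = []
    for snap in snapshots:
        if snap.depth > snap.max_depth:
            alerts.append(
                f"{snap.name}: depth {snap.depth} exceeds {snap.max_depth}")
        if snap.first_msg_backout > snap.max_backout:
            alerts.append(
                f"{snap.name}: backout count {snap.first_msg_backout} "
                f"exceeds {snap.max_backout}")
    return alerts


if __name__ == "__main__":
    observed = [
        # Input queue backing up, with a message that keeps being backed out.
        QueueSnapshot("FLOW.IN", depth=1200, max_depth=1000,
                      first_msg_backout=3, max_backout=0),
        # Any message on the dead letter queue is worth a look.
        QueueSnapshot("SYSTEM.DEAD.LETTER.QUEUE", depth=1, max_depth=0),
    ]
    for alert in find_alerts(observed):
        print(alert)
```

A non-zero backout count on the first message of an input queue usually means the flow is repeatedly failing on that message, so it is often a more useful signal than depth alone.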

  • Problem Determination

    A plan should be in place before you really need it.

    Who will troubleshoot problems?
    - May be multiple groups
      - Database, WMQ, system, and network administrators
      - Application experts
      - In addition to the person or group supporting the Message Broker

    How will problems be found?
    - Broker syslog entries
    - Message Broker explicit failure handling techniques
      - TryCatch/Throw/Trace nodes, Exception Lists in Failure/Catch paths
    - UserTrace and Debugger for test/development
    - Monitoring products for Message Broker, DB2, and WMQ
    - May also need to analyze system monitor information

    Who will fix problems?
    - Need a plan to get the right groups involved after determining the failure point
    - Fixes should be applied by the appropriate group or person
    - Appropriate regression and promotion processes should be followed

  • Code Maintenance

    Infrastructure Code Fixes (CSDs) apply to all components.

    Online Software Updates for Message Brokers Toolkit
    - Documentation and interim fixes
    - Get all from Help->Software Updates->New Updates
    - Or download from ftp.software.ibm.com/software/mqseries/fixes/wbimbv50/ and apply selectively via the Install/Update Perspective
    - Save the current configuration; you can go back to saved configurations

    Integration Code Fixes
    - Follow your shop practices
    - Regression testing for function
    - Stress testing for performance
    - Catalog the previous safe release of the code and keep it available
    - Make sure you can regress to a previous safe level

  • Topic 3

    1. High Availability
    2. Daily Operations, Backup, and Recovery
    3. External Tools for Monitoring and Operations

  • External Monitoring Tools

    The tools provided with the Message Broker and WMQ offer basic control and monitoring. More sophisticated needs call for something more.

    External tools have one or more of the following capabilities:
    - Automated monitoring of the application (by looking at processes, using the WMQI internal query queue, or watching queue depth)
    - Automated response to problems, with options varying from automated recovery to operator notification
    - Performance monitoring
    - Report production
    - Integration with a base MQ monitoring product for an overall solution
    - May require a specific plug-in node supplied by the provider

    The Broker facilities include Statistics and Accounting services.

    This topic also looks at one tool that monitors business processes rather than the operational components of Broker domains.

  • IBM Tivoli

    Tivoli Manager for WebSphere MQ/WebSphere Business Integration Message Broker
    - Integrates with Tivoli Framework
    - Allows monitoring of WebSphere MQ and Message Broker using Tivoli Distributed Monitoring
    - Allows control of Message Broker components
    - Events are generated through Tivoli Enterprise Console
      - Rules can be created to handle various events, including paging, sending e-mail, and automated responses
    - Instrumentation is provided on the Broker CD for monitoring by Tivoli Business Systems Manager
    - Requires Tivoli Framework
      - Cannot be used stand-alone

  • IBM WebSphere BI Monitor

    WebSphere Business Integration Monitor
    - Can be used for all WebSphere Business Integration technology
    - Is used to monitor business processes rather than software components
    - Can be used to look at immediate or historical information
    - Looks at business processes from a macro level
      - Whole multi-component business processes
      - Shows progress of automation in a high-level step view
    - Not capable of monitoring and control at the operational level
    - Requires WebSphere Business Integration Modeler

  • CandleNet Command Center

    CandleNet Command Center (CCC)
    - Allows monitoring of WMQI brokers and events
    - Gathers performance statistics
    - Other reporting tools (subscriptions)
    - Can perform automated actions
    - Also has a WebSphere MQ base counterpart
    - Refer to: http://www.candle.com

  • BMC PATROL

    PATROL for WebSphere MQ Integrator
    - Allows monitoring of WMQI brokers and events
    - Keeps historical data of problems
    - Can perform automated actions
    - Integrates with PATROL for MQ - Operator
    - Refer to: http://www.bmc.com

  • MQSoftware QPasa!

    MQSoftware's QPasa!
    - Allows monitoring of WMQI brokers and events
    - Monitors throughput
    - Historical data tracking
    - Can personalize the GUI to meet the user's needs
    - Can perform automated actions
    - Can also monitor WebSphere MQ
    - Refer to: http://www.mqsoftware.com

  • Unit Summary

    - High availability is achieved through a mix of hardware, software, and procedures
    - Many daily tasks need to be considered and planned for to make a WebSphere BI Message Broker installation run smoothly
      - Define roles and responsibilities
      - Create a disaster recovery plan
      - Design backup schedules for configuration and production data
      - Coordinate with:
        - Change control
        - Problem determination procedures
      - Test disaster recovery scenarios - often
    - Many external products are available to help with daily monitoring and response