siosprotectionsuiteforlinux technicaldocumentation...

509
SIOS Protection Suite for Linux Technical Documentation v9.2 October 2017

Upload: others

Post on 17-Aug-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

SIOS Protection Suite for LinuxTechnical Documentationv9.2

October 2017

Page 2: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

This document and the information herein is the property of SIOS Technology Corp. (previously known asSteelEye® Technology, Inc.) and all unauthorized use and reproduction is prohibited. SIOS TechnologyCorp. makes no warranties with respect to the contents of this document and reserves the right to revise thispublication andmake changes to the products described herein without prior notification. It is the policy ofSIOS Technology Corp. to improve products as new technology, components and software becomeavailable. SIOS Technology Corp., therefore, reserves the right to change specifications without prior notice.

LifeKeeper, SteelEye and SteelEye DataKeeper are registered trademarks of SIOS Technology Corp.

Other brand and product names used herein are for identification purposes only andmay be trademarks oftheir respective companies.

Tomaintain the quality of our publications, we welcome your comments on the accuracy, clarity,organization, and value of this document.

Address correspondence to:[email protected]

Copyright © 2017By SIOS Technology Corp.SanMateo, CA U.S.A.All rights reserved

Page 3: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Table of Contents

Introduction 1

About SIOS Protection Suite for Linux 1

SPS for Linux Integrated Components 1

Documentation and Training 1

Documentation 1

Training 2

Technical Support 2

SIOS LifeKeeper for Linux 4

Introduction 4

Protected Resources 4

LifeKeeper Core 5

LifeKeeper Core Software 5

File System, Generic Application, IP and RAW I/O Recovery Kit Software 6

LifeKeeper GUI Software 7

LifeKeeper Man Pages 7

Configuration Concepts 7

Common Hardware Components 7

Components Common to All LifeKeeper Configurations 9

System Grouping Arrangements 9

Active - Active Grouping 10

Active - Standby Grouping 11

Intelligent Versus Automatic Switchback 12

LoggingWith syslog 13

Resource Hierarchies 13

Resource Types 13

Table of Contentsi

Page 4: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resource States 14

Hierarchy Relationships 15

Shared Equivalencies 16

Resource Hierarchy Information 16

Resource Hierarchy Example 17

Detailed Status Display 17

Resource Hierarchy Information 19

Communication Status Information 20

LifeKeeper Flags 21

Shutdown Strategy 22

Short Status Display 22

Resource Hierarchy Information 22

Communication Status Information 23

Fault Detection and Recovery Scenarios 23

IP Local Recovery 23

Local Recovery Scenario 23

Command Line Operations 24

Resource Error Recovery Scenario 25

Server Failure Recovery Scenario 27

Installation and Configuration 29

SPS for Linux Installation 29

SPS for Linux Configuration 29

SPS Configuration Steps 29

Set Up TTY Connections 30

LifeKeeper Event Forwarding via SNMP 31

Overview of LifeKeeper Event Forwarding via SNMP 31

LifeKeeper Events Table 31

Configuring LifeKeeper Event Forwarding 33

Prerequisites 33

Configuration Tasks 33

Table of Contentsii

Page 5: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Verifying the Configuration 34

Disabling SNMP Event Forwarding 34

SNMP Troubleshooting 34

LifeKeeper Event Email Notification 35

Overview of LifeKeeper Event Email Notification 35

LifeKeeper Events Generating Email 35

Configuring LifeKeeper Event Email Notification 37

Prerequisites 37

Configuration Tasks 37

Verifying the Configuration 37

Disabling Event Email Notification 37

Email Notification Troubleshooting 38

Optional Configuration Tasks 38

Confirm Failover and Block Resource Failover Settings 38

Set Confirm Failover On 39

When to Select [Confirm Failover]Setting 42

Block Resource Failover On 42

Conditions/Considerations 44

Configuration examples 44

Block All Automatic Failovers 44

Block Failovers in One Direction 46

Setting Server Shutdown Strategy 47

Tuning the LifeKeeper Heartbeat 48

Overview of the Tunable Heartbeat 48

Example 48

Configuring the Heartbeat 48

Configuration Considerations 49

Using Custom Certificates with the SPS API 49

How Certificates Are Used 50

Using Your OwnCertificates 50

Table of Contentsiii

Page 6: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Linux Configuration 50

Data Replication Configuration 55

Network Configuration 55

Application Configuration 56

Storage and Adapter Configuration 56

HP Multipath I/O Configurations 68

Hitachi HDLMMultipath I/O Configurations 70

DeviceMapper Multipath I/O Configurations 100

LifeKeeper I-O Fencing Introduction 102

SCSI Reservations 103

Storage Fence Using SCSI Reservations 103

AlternativeMethods for I/O Fencing 104

Disabling Reservations 104

Non-Shared Storage 105

Configuring I/O FencingWithout Reservations 105

I/O Fencing Chart 105

Quorum/Witness 107

Quorum/Witness Server Support Package for LifeKeeper 107

Feature Summary 107

Package Requirements 107

Package Installation and Configuration 108

Configurable Components 108

Available QuorumModes 109

AvailableWitness Modes 110

Available Actions WhenQuorum is Lost 111

Additional Configuration for Shared-Witness Topologies 111

Adding aWitness Node to a Two-Node Cluster 112

Expected Behaviors (Assuming Default Modes) 113

Scenario 1 113

Scenario 2 113

Table of Contentsiv

Page 7: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Scenario 3 114

Scenario 4 114

STONITH 115

Using IPMI with STONITH 115

Package Requirements 115

STONITH in VMware vSphere Environments 115

Package Requirements 115

STONITH Server 115

Monitored Virtual Machine 115

Installation and Configuration 115

<vm_id> 117

Expected Behaviors 118

Watchdog 118

Components 118

Configuration 119

Uninstall 120

Resource Policy Management 120

Overview 120

SIOS Protection Suite 121

Custom andMaintenance-Mode Behavior via Policies 121

Standard Policies 121

Meta Policies 122

Important Considerations for Resource-Level Policies 122

The lkpolicy Tool 123

Example lkpolicy Usage 123

AuthenticatingWith Local and Remote Servers 123

Listing Policies 124

Showing Current Policies 124

Setting Policies 124

Removing Policies 124

Table of Contentsv

Page 8: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Configuring Credentials 124

Adding or Changing Credentials 125

Listing Stored Credentials 125

Removing Credentials for a Server 125

Additional Information 125

LifeKeeper Administration Overview 127

Overview 127

Error Detection and Notification 127

N-Way Recovery 128

Administrator Tasks 128

Editing Server Properties 128

Creating a Communication Path 128

Deleting a Communication Path 130

Server Properties - Failover 130

Creating Resource Hierarchies 131

LifeKeeper Application Resource Hierarchies 132

Recovery Kit Options 132

Creating a File System Resource Hierarchy 132

Creating a Generic Application Resource Hierarchy 134

Creating a Raw Device Resource Hierarchy 135

Quick Service Protection (QSP) Recovery Kit 136

Introduction 136

Requirements 136

Create QSP Resource Hierarchy 137

Extending QSP Resource Hierarchy 138

QSP Resource Configuration 138

How to change theMonitoring function 139

How to change Timeout Value 140

Editing Resource Properties 142

Editing Resource Priorities 142

Table of Contentsvi

Page 9: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Using the Up and Down Buttons 143

Editing the Priority Values 144

Applying Your Changes 144

Extending Resource Hierarchies 144

Extending a File System Resource Hierarchy 145

Extending a Generic Application Resource Hierarchy 145

Extending a Raw Device Resource Hierarchy 146

Unextending a Hierarchy 146

Creating a Resource Dependency 147

Deleting a Resource Dependency 148

Deleting a Hierarchy from All Servers 148

LifeKeeper User Guide 151

Using LifeKeeper for Linux 152

GUI 152

GUI Overview - General 152

GUI Server 152

GUI Client 152

Exiting GUI Clients 153

The LifeKeeper GUI Software Package 153

Menus 153

SIOS LifeKeeper for Linux Menus 153

Resource Context Menu 153

Server Context Menu 154

File Menu 155

Edit Menu - Resource 156

Edit Menu - Server 156

View Menu 157

Help Menu 158

Toolbars 158

SIOS LifeKeeper for Linux Toolbars 158

Table of Contentsvii

Page 10: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

GUI Toolbar 158

Resource Context Toolbar 160

Server Context Toolbar 161

Preparing to Run the GUI 162

LifeKeeper GUI - Overview 162

GUI Server 162

GUI Client 162

Starting GUI clients 163

Starting the LifeKeeper GUI Applet 163

Starting the application client 163

Exiting GUI Clients 163

Configuring the LifeKeeper GUI 163

Configuring the LifeKeeper Server for GUI Administration 163

Running the GUI 164

GUI Configuration 164

GUI Limitations 165

Starting and Stopping the GUI Server 165

To Start the LifeKeeper GUI Server 165

Troubleshooting 166

To Stop the LifeKeeper GUI Server 166

LifeKeeper GUI Server Processes 166

Java Security Policy 167

Location of Policy Files 167

Policy File Creation andManagement 167

Granting Permissions in Policy Files 168

Sample Policy File 168

Java Plug-In 169

Downloading the Java Plug-in 169

Running the GUI on a Remote System 169

Configuring the GUI on a Remote System 170

Table of Contentsviii

Page 11: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Running the GUI on a Remote System 170

Applet Troubleshooting 171

Running the GUI on a LifeKeeper Server 172

Browser Security Parameters for GUI Applet 172

Firefox 172

Internet Explorer 172

Status Table 173

Properties Panel 173

Output Panel 174

Message Bar 174

Exiting the GUI 174

Common Tasks 174

Starting LifeKeeper 174

Starting LifeKeeper Server Processes 175

Enabling Automatic LifeKeeper Restart 175

Stopping LifeKeeper 175

Disabling Automatic LifeKeeper Restart 176

Viewing LifeKeeper Processes 176

Viewing LifeKeeper GUI Server Processes 177

Viewing LifeKeeper Controlling Processes 178

Connecting Servers to a Cluster 179

Disconnecting From a Cluster 179

Viewing Connected Servers 180

Viewing the Status of a Server 180

Viewing Server Properties 181

Viewing Server Log Files 181

Viewing Resource Tags and IDs 182

Viewing the Status of Resources 182

Server Resource Status 182

Global Resource Status 183

Table of Contentsix

Page 12: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Viewing Resource Properties 184

Resource Labels 184

Resource Labels 184

Viewing Message History 185

Reading theMessage History 185

Expanding and Collapsing a Resource Hierarchy Tree 186

Cluster Connect Dialog 187

Cluster Disconnect Dialog 187

Resource Properties Dialog 188

General Tab 188

Relations Tab 189

Equivalencies Tab 189

Server Properties Dialog 189

General Tab 190

CommPaths Tab 192

Resources Tab 193

Operator Tasks 194

Bringing a Resource In Service 194

Taking a Resource Out of Service 195

Advanced Tasks 195

LCD 195

LifeKeeper Configuration Database 195

Related Topics 196

LCDI Commands 196

Scenario Situation 196

Hierarchy Definition 197

LCD Configuration Data 199

Dependency Information 199

Resource Status Information 199

Inter-Server Equivalency Information 199

Table of Contentsx

Page 13: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LCD Directory Structure 200

LCD Resource Types 200

LifeKeeper Flags 200

Resources Subdirectories 201

Resource Actions 202

Structure of LCD Directory in /opt/LifeKeeper 202

LCM 203

Communication Status Information 204

LifeKeeper Alarming and Recovery 204

Alarm Classes 204

Alarm Processing 205

Alarm Directory Layout 205

LifeKeeper API for Monitoring 205

Introduction 205

Summary 205

Information to be supplied with this API 206

Communication format 207

Data format 207

How to use 210

Activate the API 210

Port number 211

Usage examples 211

Security 211

Basic Authentication 211

SSL/TLS set up 212

Modification to support SSL/TLS + Basic authentication 213

IP address access limitation 214

Error 214

Maintenance Tasks 215

Changing LifeKeeper Configuration Values 215

Table of Contentsxi

Page 14: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

File System Health Monitoring 217

Condition Definitions 217

Full or Almost Full File System 217

Unmounted or Improperly Mounted File System 218

Maintaining a LifeKeeper Protected System 218

Maintaining a Resource Hierarchy 219

Recovering After a Failover 219

Removing LifeKeeper 219

Removing via GnoRPM 220

Removing via Command Line 220

Removing Distribution Enabling Packages 221

Running LifeKeeper With a Firewall 221

LifeKeeper Communication Paths 221

LifeKeeper GUI Connections 221

LifeKeeper IP Address Resources 221

LifeKeeper Data Replication 222

Other Inter-node Communications 222

Disabling a Firewall 222

Running the LifeKeeper GUI Through a Firewall 222

Transferring Resource Hierarchies 223

Technical Notes 223

LifeKeeper Features 224

Tuning 225

LifeKeeper Operations 225

Server Configuration 227

LifeKeeper Version 8.2.0 and Later GUI Requirement 227

Confirm Failover and Block Resource Failover Settings 227

Confirm Failover On: 227

Set Block Resource Failover On: 228

Conditions/Considerations: 228

Table of Contentsxii

Page 15: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

NFS Client Options 228

NFS Client Mounting Considerations 228

UDP or TCP? 228

Sync Option in /etc/exports 228

RedHat EL6 (and Fedora 14) Clients with Red Hat EL6 NFS Server 229

RedHat EL5 NFS Clients with a Red Hat EL6 NFS Server 229

Cluster Example 229

ExpandedMulticluster Example 229

Troubleshooting 230

Common Causes of an SPS Initiated Failover 230

Server Level Causes 231

Server Failure 231

Communication Failures/Network Failures 231

Split-Brain 232

Resource Level Causes 232

Application Failure 233

File System 233

IP Address Failure 233

Reservation Conflict 234

SCSI Device 234

Known Issues and Restrictions 234

Installation 234

LifeKeeper Core 237

Internet/IP Licensing 243

GUI 243

Data Replication 246

IPv6 249

Apache 252

Oracle Recovery Kit 252

MySQL 253

Table of Contentsxiii

Page 16: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

NFS Server Recovery Kit 254

SAP Recovery Kit 256

LVM Recovery Kit 256

Multipath Recovery Kits (DMMP / HDLM / PPATH / NECSPS) 257

DMMP Recovery Kit 258

DB2Recovery Kit 259

MD Recovery Kit 259

SAP DB / MaxDB Recovery Kit 260

Sybase ASE Recovery Kit 261

WebSphereMQRecovery Kit 264

Quick Service Protection (QSP) Recovery Kit 264

GUI Troubleshooting 264

Network-Related Troubleshooting (GUI) 265

Long Connection Delays onWindows Platforms 265

From Sun FAQ: 265

Running from aModem: 265

Primary Network Interface Down: 266

NoRoute To Host Exception: 266

UnknownHost Exception: 266

FromWindows: 267

From Linux: 268

Unable to Connect to X Window Server: 268

Communication Paths Going Up and Down 269

Suggested Action 269

Incomplete Resource Created 269

Incomplete Resource Priority Modification 269

Restoring Your Hierarchy to a Consistent State 270

No Shared Storage Found When Configuring a Hierarchy 271

Recovering from a LifeKeeper Server Failure 272

Suggested Action: 272

Table of Contentsxiv

Page 17: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Recovering from a Non-Killable Process 272

Recovering From A Panic During A Manual Recovery 273

Recovering Out-of-Service Hierarchies 273

Resource Tag Name Restrictions 273

Tag Name Length 273

Valid "Special" Characters 273

Invalid Characters 273

Serial (TTY) Console WARNING 273

Taking the System to init state S WARNING 274

Thread is Hung Messages on Shared Storage 274

Explanation 274

Suggested Action: 274

SIOS DataKeeper for Linux 276

Introduction 276

Mirroring with SIOS DataKeeper for Linux 276

DataKeeper Features 276

Synchronous vs. Asynchronous Mirroring 277

Synchronous Mirroring 277

Asynchronous Mirroring 277

How SIOS DataKeeper Works 278

Synchronization (and Resynchronization) 279

StandardMirror Configuration 279

N+1Configuration 279

Multiple Target Configuration 280

SIOS DataKeeper Resource Hierarchy 281

Failover Scenarios 282

Scenario 1 282

Scenario 2 283

Scenario 3 283

Scenario 4 283

Table of Contentsxv

Page 18: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Chapter A: Installation and Configuration 285

Before Configuring Your DataKeeper Resources 285

Hardware and Software Requirements 285

Hardware Requirements 285

Software Requirements 285

General Configuration 286

Network Configuration 286

Changing the Data Replication Path 286

Determine Network Bandwidth Requirements 287

Measuring Rate of Change on a Linux System (Physical or Virtual) 287

Measuring Basic Rate of Change 288

Measuring Detailed Rate of Change 288

Analyze Collected Detailed Rate of Change Data 288

Graph Detailed Rate of Change Data 294

SIOS DataKeeper for Linux Resource Types 297

Replicate New File System 298

Replicate Existing File System 298

DataKeeper Resource 298

Resource Configuration Tasks 299

Overview 299

Creating a DataKeeper Resource Hierarchy 299

Extending Your Hierarchy 301

Extending a DataKeeper Resource 302

Unextending Your Hierarchy 304

Deleting a Resource Hierarchy 304

Taking a DataKeeper Resource Out of Service 305

Bringing a DataKeeper Resource In Service 305

Testing Your Resource Hierarchy 305

Performing aManual Switchover from the LifeKeeper GUI 305

Administration 307

Table of Contentsxvi

Page 19: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Administering SIOS DataKeeper for Linux 307

Viewing Mirror Status 307

GUI Mirror Administration 308

Force Mirror Online 309

Pause and Resume 310

PauseMirror 310

ResumeMirror 310

Set Compression Level 310

Command Line Mirror Administration 310

Mirror Actions 311

Examples: 311

Mirror Resize 311

Recommended Steps for Mirror Resize: 312

Bitmap Administration 312

Monitoring Mirror Status via Command Line 313

Example: 313

Server Failure 314

Resynchronization 314

Avoiding Full Resynchronizations 315

Method 1 315

Procedure 315

Method 2 316

Procedure 316

Using LVM with DataKeeper 318

Clustering with Fusion-io 319

Fusion-io Best Practices for Maximizing DataKeeper Performance 319

Network 319

TCP/IP Tuning 320

Configuration Recommendations 320

Using external snapshot functions for disks and devices protected by DataKeeper 321

Table of Contentsxvii

Page 20: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Multi-Site Cluster 323

SIOS Protection Suite for Linux Multi-Site Cluster 323

SIOS Protection Suite for Linux Multi-Site Cluster 323

Multi-Site Cluster Configuration Considerations 324

Local site storage destinations for bitmaps 324

Resource Configurations to be Avoided 325

Multi-Site Cluster Restrictions 325

Creating a SIOS Protection Suite for Linux Multi-Site Cluster Resource Hierarchy 325

Replicate New File System 326

Replicate Existing File System 328

DataKeeper Resource 330

Extending Your Hierarchy 332

Extending a DataKeeper Resource 334

Extending a Hierarchy to a Disaster Recovery System 335

Configuring the Restore and Recovery Setting 337

Migrating to a Multi-Site Cluster Environment 338

Requirements 338

Before You Start 338

Performing the Migration 339

Successful Migration 348

Troubleshooting 350

Recovery Kits 354

Combined Message Catalog 355

Index 479

Table of Contentsxviii

Page 21: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Introduction

About SIOS Protection Suite for LinuxSIOS Protection Suite (SPS) for Linux integrates high availability clustering with innovative data replicationfunctionality in a single, enterprise-class solution.

SPS for Linux Integrated ComponentsSIOS LifeKeeper provides a complete fault-resilient software solution to provide high availability for yourservers' file systems, applications,and processes. LifeKeeper does not require any customized, fault-toleranthardware. LifeKeeper simply requires two or more systems to be grouped in a network, and site-specificconfiguration data is then created to provide automatic fault detection and recovery.

In the case of a failure, LifeKeeper migrates protected resources from the failed server to a designated back-up server. Users experience a brief interruption during the actual switchover; however, LifeKeeper restoresoperations on the back-up server without operator intervention.

SIOS DataKeeper provides an integrated datamirroring capability for LifeKeeper environments. This featureenables LifeKeeper resources to operate in shared and non-shared storage environments.

Documentation and Training

DocumentationA complete reference providing instructions for installing, configuring, administering and troubleshooting SIOSProtection Suite for Linux.The following sections cover every aspect of SPS for Linux:

Section Description

Introduction  Provides an introduction to the SIOS Protection Suite for Linux product, including soft-ware packaging and configuration concepts.

SPS for LinuxInstallationGuide

Provides useful information for planning and setting up your SPS environment, installingand licensing SPS and configuring the LifeKeeper graphical user interface (GUI).

Configuration Contains detailed information and instructions for configuring the LifeKeeper software oneach server in your cluster.

Administration Discusses server-level tasks such as editing server properties and creating resourcesand resource-level tasks such as editing, extending or deleting resources.

SIOS Protection Suite for LinuxPage 1

Page 22: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Training

Section Description

User's GuideContains detailed information on the LifeKeeper GUI, including themany tasks that canbe performed within the LifeKeeper GUI. Also includes a Technical Notes section alongwith many more Advanced Topics.

DataKeeper Contains planning and installation instructions as well as administration, configurationand user information for SIOS DataKeeper for Linux.

Troubleshooting Describes known issues and restrictions and suggests solutions to problems that may beencountered during installation, configuration and/or use of SIOS LifeKeeper for Linux.

Recovery KitsContains planning and installation instructions as well as administration, configurationand user information for the Optional Recovery Kits that allow LifeKeeper to manage andcontrol specific applications.

Error CodeSearch

Provides a listing of all messages that may be encountered while using SIOS ProtectionSuite for Linux and, where appropriate, provides additional explanation of the cause of theerrors and necessary action to resolve the error condition. This full listingmay besearched for any error code received.

TrainingSPS training is available through SIOS Technology Corp. or through your reseller. Contact your salesrepresentative for more information.

Technical SupportAs a SIOS Technology Corp. customer with a valid Support contract, you are entitled to access the SIOSTechnology Corp. Support Self-Service Portal.

The SIOS Technology Corp. Support Self-Service Portal offers you the following capabilities:

l Search ourSolution Knowledge Base to find solutions to problems and answers to questions

l Always on 24/7 service with the SIOS Technology Corp. Support team to:

l Log a Case to report new incidents.

l View Cases to see all of your open and closed incidents.

l Review Top Solutions providing information on themost popular problem resolutionsbeing viewed by our customers.

Contact SIOS Technology Corp. Support at [email protected] to set up and activate your Self-ServicePortal account. 

You can also contact SIOS Technology Corp. Support at:

1-877-457-5113 (Toll Free)

SIOS Protection Suite for LinuxPage 2

Page 23: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Technical Support

1-803-808-4270 (International)

Email: [email protected]

SIOS Protection Suite for LinuxPage 3

Page 24: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

SIOS LifeKeeper for Linux

IntroductionSIOS LifeKeeper for Linux provides high availability clustering for up to 32 nodes with many supported storageconfigurations, including shared storage (Fiber Channel SAN, iSCSI), network attached storage (NAS), host-based replication and integration with array-based SAN replication including HP Continuous Access.

Protected ResourcesThe LifeKeeper family of products includes software that allows you to provide failover protection for a rangeof system resources. The following figure demonstrates LifeKeeper's flexibility and identifies the resourcetypes you can specify for automatic recovery:

l File systems. LifeKeeper allows for the definition and failover of file systems, such as ext3, ext4,NFS, vxfs or xfs.

l Communications resources. LifeKeeper provides communications Recovery Kits forcommunications resources, such as TCP/IP.

l Infrastructure resources. LifeKeeper provides optional Recovery Kits for Linux infrastructureservices, such as NFS, Samba, LVM, WebSphereMQ, and software RAID (md).

l Web Server resources. LifeKeeper provides an optional Recovery Kit for ApacheWeb Serverresources.

l Databases and other applications. LifeKeeper provides optional Recovery Kits for major RDBMSproducts such as Oracle, MySQL and PostgreSQL, Sybase, SAP DB/MaxDB, and SAP HANA DB1.0/SAP HANA DB 2.0, and for enterprise applications such as SAP.

l Cloud resources. LifeKeeper provides communications Recovery Kits for communicationsresources, such as EC2 EIP, Route53, andOpenSwan.

 LifeKeeper supports N-Way Recovery for a range of resource types.

SIOS Protection Suite for LinuxPage 4

Page 25: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Core

LifeKeeper CoreLifeKeeper Core is composed of four major components:

l LifeKeeper Core Software

l File System, Generic Application, Raw I/O and IP Recovery Kit Software

l LifeKeeper GUI Software

l LifeKeeper Man Pages

LifeKeeper Core SoftwareThe LifeKeeper Core Software consists of the following components:

l LifeKeeper Configuration Database (LCD) - The LCD stores information about the LifeKeeper-protected resources. This includes information on resource instances, dependencies, sharedequivalencies, recovery direction, and LifeKeeper operational flags. The data is cached in sharedmemory and stored in files so that the data can be remembered over system boots.

l LCD Interface (LCDI) - The LCDI queries the configuration database (LCD) to satisfy requests for dataor modifications to data stored in the LCD. The LCDI may also be used by the Application RecoveryKit to obtain resource state or description information.

l LifeKeeper Communications Manager (LCM) - The LCM is used to determine the status of servers inthe cluster and for LifeKeeper inter-process communication (local and remote). Loss of LCMcommunication across all communication paths on a server in the cluster indicates the server has

SIOS Protection Suite for LinuxPage 5

Page 26: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

File System, Generic Application, IP and RAW I/O Recovery Kit Software

failed.

l LifeKeeper Alarm Interface - The LifeKeeper Alarm Interface provides the infrastructure for triggering anevent. The sendevent program is called by application daemons when a failure is detected in aLifeKeeper-protected resource. The sendevent program communicates with the LCD to determine ifrecovery scripts are available.

l LifeKeeper Recovery Action and Control Interface (LRACI) - The LRACI determines the appropriaterecovery script to execute for a resource and invokes the appropriate restore / remove scripts for theresource.

File System, Generic Application, IP and RAW I/O Recovery Kit SoftwareThe LifeKeeper Core provides protection of specific resources on a server. These resources are:

l File Systems - LifeKeeper allows for the definition and failover of file systems on shared storagedevices. A file system can be created on a disk that is accessible by two servers via a shared SCSIbus. A LifeKeeper file system resource is created on the first server and then extended to the secondserver. File System Health Monitoring detects disk full and improperly mounted (or unmounted) filesystem conditions. Depending on the condition detected, the Recovery Kit may log a warningmessage, attempt a local recovery, or failover the file system resource to the backup server.

Specific help topics related to the File System Recovery Kit include Creating and Extending a FileSystem Resource Hierarchy and File System Health Monitoring.

l Generic Applications - TheGeneric Application Recovery Kit allows protection of a generic or user-defined application that has no predefined Recovery Kit to define the resource type. This kit allows auser to definemonitoring and recovery scripts that are customized for a specific application.

Specific help topics related to the Generic Application Recovery Kit include Creating and Extending aGeneric Application Resource Hierarchy.

l IP Addresses - The IP Recovery Kit provides amechanism to recover a "switchable" IP address froma failed primary server to one or more backup servers in a LifeKeeper environment. A switchable IPaddress is a virtual IP address that can switch between servers and is separate from the IP addressassociated with the network interface card of each server. Applications under LifeKeeper protectionare associated with the switchable IP address, so if there is a failure on the primary server, theswitchable IP address becomes associated with the backup server. The resource under LifeKeeperprotection is the switchable IP address. 

Refer to the IP Recovery Kit Technical Documentation included with the Recovery Kit for specificproduct, configuration and administration information.

l RAW I/O - The RAW I/O Recovery Kit provides support for raw I/O devices for applications that preferto bypass kernel buffering. The RAW I/O Recovery Kit allows for the definition and failover of rawdevices bound to shared storage devices. The raw devicemust be configured on the primary nodeprior to resource creation. Once the raw resource hierarchy is created, it can be extended to additionalservers.

SIOS Protection Suite for LinuxPage 6

Page 27: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper GUI Software

l Quick Service Protection (QSP) - QSP Recovery Kit provides amechanism to simply protect OSservices. Resources can be created for services that can be started/stopped with OS servicecommands. Generic Applications can provide the same protection, but QSP doesn’t require codedevelopment. Also, you can create dependencies to start/stop services with applications protected byother resources.

However, QuickCheck of QSP only performs a simple check (using service command’s “status”) anddoes not ensure the provision of the services and running of the processes. If complicated start/stopprocessing or robust check is required, please consider the use of Generic Applications.

For other topics regarding QSP, please see “Creating/extending QSP resources.”

LifeKeeper GUI SoftwareThe LifeKeeper GUI is a client / server application developed using Java technology that provides a graphicaladministration interface to LifeKeeper and its configuration data. The LifeKeeper GUI client is implemented asboth a stand-alone Java application and as a Java applet invoked from aweb browser.

LifeKeeper Man PagesThe LifeKeeper Core referencemanual pages for the LifeKeeper product.

Configuration ConceptsLifeKeeper works on the basis of resource hierarchies you define for groups of two or more servers. Thefollowing topics introduce the LifeKeeper failover configuration concepts:

Common Hardware ComponentsAll LifeKeeper configurations share these common components:

1. Server groups. The basis for the fault resilience provided by LifeKeeper is the grouping of two or moreservers into a cluster. The servers can be any supported platform running a supported distribution ofLinux. LifeKeeper gives you the flexibility to configure servers in multiple overlapping groups, but, forany given recoverable resource, the critical factor is the linking of a group of servers with defined rolesor priorities for that resource. The priority of a server for a given resource is used to determine whichserver will recover that resource should there be a failure on the server where it is currently running.The highest possible priority value is one (1). The server with the highest priority value (normally 1) for agiven resource is typically referred to as the primary server for that resource; any other servers aredefined as backup servers for that resource.

2. Communications paths. The LifeKeeper heartbeat, a periodic message between servers in aLifeKeeper cluster, is a key fault detection facility. All servers within the cluster require redundantheartbeat communications paths (or, comm paths) to avoid system panics due to simplecommunications failures. Two separate LAN-based (TCP) comm paths using dual independentsubnets are recommended (at least one of these should be configured as a private network); however,

SIOS Protection Suite for LinuxPage 7

Page 28: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CommonHardware Components

using a combination of TCP and TTY comm paths is supported. A TCP comm path can also be usedfor other system communications.

Note: A TTY comm path is used by LifeKeeper only for detecting whether  other servers in the clusterare alive. The LifeKeeper GUI uses TCP/IP for communicating status information about protectedresources; if there are two TCP comm paths configured, LifeKeeper uses the comm path on the publicnetwork for communicating resource status. Therefore if the network used by the LifeKeeper GUI isdown, the GUI will show hierarchies on other servers in an UNKNOWN state, even if the TTY (or otherTCP) comm path is operational.

3. Shared data resources. In shared storage configurations, servers in the LifeKeeper cluster shareaccess to the same set of disks. In the case of a failure of the primary server, LifeKeeper automaticallymanages the unlocking of the disks from the failed server and the locking of the disks to the nextavailable back-up server.

4. Shared communication. LifeKeeper can automatically manage switching of communicationsresources, such as TCP/IP addresses, allowing users to connect to the application regardless ofwhere the application is currently active.

SIOS Protection Suite for LinuxPage 8

Page 29: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Components Common to All LifeKeeper Configurations

Components Common to All LifeKeeper Configurations

System Grouping ArrangementsA resource hierarchy is defined on a cluster of LifeKeeper servers. For a given hierarchy, each server isassigned a priority, with one (1) being the highest possible priority. The primary, or highest priority, server isthe computer you want to use for the normal operation of those resources. The server having the secondhighest priority is the backup server to which you want LifeKeeper to switch those resources should theprimary server fail.

In an active/active group, all servers are active processors, but they also serve as the backup server forresource hierarchies on other servers. In an active/standby group, the primary server is processing and anyone of the backup servers can be configured to stand by in case of a failure on the primary server. The standbysystems can be smaller, lower-performance systems, but they must have the processing capability to assureresource availability should the primary server fail.

SIOS Protection Suite for LinuxPage 9

Page 30: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Active - Active Grouping

Your physical connections and access to the shared resources determine your grouping options. To begrouped, servers must have communications and heartbeat paths installed and operational, and all serversmust have access to the disk resources through a shared SCSI or Fibre Channel interface. For example, inthe following diagram, there is only one grouping option for the resourceAppA on Server 1. Server 2 is the onlyother server in the configuration that has shared access to theAppA database.

The resourceAppB on Server 3, however, could be configured for a group including any one of the other threeservers, because the shared SCSI bus in this example provides all four servers in the configuration access totheAppB database.

Active - Active GroupingIn an active/active pair configuration, all servers are active processors; they also serve as the backup serverfor resource hierarchies on other servers.

For example, the configuration example below shows two active/active pairs of servers. Server 1 isprocessingAppA, but also serves as the backup server forAppX running on Server 2. The reverse is also true.Server 2 is processingAppX, but also serves as the backup server forAppA running on Server 1. Servers 3and 4 have the same type of active/active relationships.

Although the configurations on Servers 1 and 2 and the configurations on Servers 3 and 4 are similar, there is acritical difference. For theAppA andAppX applications, Servers 1 and 2 are the only servers available forgrouping. They are the only servers that have access to the shared resources.

AppB andAppC, however, have several grouping options because all four servers have access to theAppBandAppC shared resources. AppB andAppC could also be configured to failover to Server1 and/or Server2 asa third or even fourth backup system.

SIOS Protection Suite for LinuxPage 10

Page 31: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Active - Standby Grouping

Note: Because LifeKeeper applies locks at the disk level, only one of the four systems connected to theAppBandAppC disk resources can have access to them at any time. Therefore, when Server 3 is activelyprocessingAppB, those disk resources are no longer available to Servers 1, 2, and 4, even though they havephysical connections.

Active - Standby GroupingIn an active/standby pair configuration, the primary server is processing, and the back-up servers are standingby in case of a failure on the primary server. The standby systems can be smaller, lower-performancesystems, but they must have the processing capability to assure resource availability should the primaryserver fail.

SIOS Protection Suite for LinuxPage 11

Page 32: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Intelligent Versus Automatic Switchback

A standby server can provide backup for more than one active server. For example in the figure above, Server2 is the standby server in three active/standby resource pairs. The LifeKeeper resource definitions specify thefollowing active/standby paired relationships:

l AppA onServer1 fails over toServer2.

l AppB onServer3 fails over toServer2.

l AppC onServer4 fails over toServer2.

Be aware of these three critical configuration concepts when you are considering configurations with multipleactive/standby groups:

l Disk ownership. Different active applications cannot use disk partitions on the same shared disk orLUN from different servers. LifeKeeper applies locks at the disk or LUN level. When the SCSI locksare applied, only one system on the shared SCSI bus can access partitions on the disk or LUN. Thisrequires that applications accessing different partitions on the same disk be active on the same server.In the example, Server 3 has ownership of theAppB disk resources and Server 4 owns theAppCresources.

l Processing capacity. Although it is unlikely that Servers 1, 3 and 4 would fail at the same time, youmust take care when designating a standby server to support multiple resource relationships so thatthe standby server can handle all critical processing shouldmultiple faults occur.

l LifeKeeper administration. In the example, Server 2 provides backup for three other servers. Ingeneral it is not desirable to administer the LifeKeeper database on the different logical groupssimultaneously. You should first create the resources between the spare and one active system, thenbetween the spare and another active system, and so on.

Intelligent Versus Automatic SwitchbackBy default, the switchback setting of a resource is intelligent. This means that once the failover occurs for thatresource from Server A toServer B, the resource remains onServer B until another failure or until anadministrator intelligently switches the resource to another server. Thus, the resource continues to run onServer B even afterServer A returns to service. Server A now serves as a backup for the resource.

In some situations, it may be desirable for a resource to switch back automatically to the original failed serverwhen that server recovers. LifeKeeper offers an automatic switchback option as an alternative to the defaultintelligent switchback behavior described above. This option can be configured for individual resourcehierarchies on individual servers. If automatic switchback is selected for a resource hierarchy on a givenserver and that server fails, the resource hierarchy is failed over to a backup system; when the failed serverrecovers, the hierarchy is automatically switched back to the original server.

Notes:

l For automatic switchback, switch back will take place automatically after the primary server comesback online and LifeKeeper communications path is re-established.

l LifeKeeper never performs an automatic switchback from a higher priority server to a lower priorityserver.

SIOS Protection Suite for LinuxPage 12

Page 33: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LoggingWith syslog

Logging With syslogBeginning with LifeKeeper 8.0, logging is done through the standard syslog facility. LifeKeeper supportsthree syslog implementations: standard syslog, rsyslog, and syslog-ng. During package installation,syslog will be configured to use the "local6" facility for all LifeKeeper logmessages. The syslogconfiguration file (for example, /etc/syslog-ng/syslog-ng.conf) will bemodified to include LifeKeeper-specificrouting sending all LifeKeeper logmessages to /var/log/lifekeeper.log. (The original configuration file will bebacked up with the same name ending in "~".)

Important: DONOT edit LifeKeeper unique setups manually, or, upgrading or uninstallingmaynot function correctly.

The facility can be changed after installation by using the lklogconfig tool located in /opt/LifeKeeper/bin.For example, changing the facility to local5, run the following command instead.

lkstop -flklogconfig --action=update --facility=local5lkstart

See the lklogconfig(8) manpage on a system with LifeKeeper installed for more details on this tool.

If a generic resource script puts amessage into /opt/LifeKeeper/out/log directly, LifeKeeper will send a logmessage with the error severity level as the default into /var/log/lifekeeper.log. The severity level can bechanged to information level by adding the following parameter into /etc/default/LifeKeeper, LOGMGR_LOGLEVEL=LK_INFO.

Note:When LifeKeeper is removed from a server, the LifeKeeper-specific syslog configuration will beremoved.

Resource HierarchiesThe LifeKeeper GUI enables you to create a resource hierarchy on one server, then extend that hierarchy toone or more backup servers. LifeKeeper then automatically builds the designated hierarchies on all serversspecified.  LifeKeeper maintains hierarchy information in a database on each server. If you use the commandline interface, youmust explicitly define the hierarchy on each server.

After you create the resource hierarchy, LifeKeeper manages the stopping and starting of the resources withinthe hierarchy. The related topics below provide background for hierarchy definition tasks.

Resource TypesA resource can be either a hardware or software entity, categorized by resource type. LifeKeeper supplies filesystem and SCSI resource types, and the recovery kits provide communications, RDBMS and otherapplication resource types.

For example, a hierarchy for a protected file system includes instances for resources of the following types:

SIOS Protection Suite for LinuxPage 13

Page 34: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resource States

l filesys. Linux file system resource objects identified by their mount point.

l device. SCSI disk partitions and virtual disks, identified by their device file names, for example sdc1.

l disk. SCSI disks or RAID system logical units, identified by SCSI device name, for example sd.

Resource StatesState Meaning

In-Service,Protected(ISP)

Resource is operational. LifeKeeper local recovery operates normally. LifeKeeper inter-serverrecovery and failure recovery is operational.

In-Service,Unprotected(ISU)

Resource is operational. However, no local recovery or failure recovery will occur becausethe LifeKeeper protection is not operational.

Note:When the file system protected by the file system resource(filesys) has reached atleast 90% (the default threshold) of its capacity, the resource status will be changed to ISU toalert the user. In this case, themonitoring process is continued. For themonitoring of filesystem resources and its capacity, the ISU state is used differently from other resourcetypes. Once the file system capacity drops below the threshold, the resource state will returnto ISP.

Out-of-Service,Failed(OSF)

Resource has gone out-of-service because of a failure in the resource. Recovery has notbeen completed or has failed. LifeKeeper alarming is not operational for this resource.

SIOS Protection Suite for LinuxPage 14

Page 35: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hierarchy Relationships

State MeaningOut-of-Service,Unimpaired(OSU)

Resource is out-of-service but available to take over a resource from another server.

Illegal(Undefined)State(ILLSTATE)

This state appears in situations where no state has been set for a resource instance. Undernormal circumstances, this invalid state does not last long: a transition into one of the otherstates is expected. This state will occur if switchover occurs before all LifeKeeperinformation tables have been updated (for example, when LifeKeeper is first started up).

Hierarchy RelationshipsLifeKeeper allows you to create relationships between resource instances.  The primary relationship is adependency, for example one resource instance depends on another resource instance for its operation . Thecombination of resource instances and dependencies is the resource hierarchy.

For example, since /usr1 depends on its operation upon the disk subsystem, you can create an orderedhierarchy relationship between /usr1 and those instances representing the disk subsystem.

The dependency relationships specified by the resource hierarchy tell LifeKeeper the appropriate order forbringing resource instances in service and out-of-service. In the example resource hierarchy, LifeKeepercannot bring the /usr1 resource into service until it successfully brings into service first the disk and deviceinstances.

SIOS Protection Suite for LinuxPage 15

Page 36: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Shared Equivalencies

Shared EquivalenciesWhen you create and extend a LifeKeeper resource hierarchy, the hierarchy exists on both the primary and thesecondary servers. Most resource instances can be active on only one server at a time. For such resources,LifeKeeper defines a second kind of relationship called a shared equivalency that ensures that when theresource is in-service on one server, it is out-of-service on the other servers on which it is defined.

In the example below, the shared equivalency between the disk partition resource instances on each server isrepresented. Each resource instance will have a similar equivalency in this example.

Resource Hierarchy InformationThe resource status of each resource is displayed in the Detailed Status Display and the Short StatusDisplay. The LifeKeeper tag names of root resources are displayed beginning in the left-most position of theTAG column, with tag names of resources within the hierarchy indented appropriately to indicate dependencyrelationships between resources.

The following sample is from the resource hierarchy section of a short status display (the device and disk ID'sare shortened to fit in the display area):

LOCAL TAG ID STATE PRIO PRIMARY

svr1 app3910-on-svr1 app4238 ISP 1 svr2

svr1 filesys4083 /jrl1 ISP 1 svr2

svr1        device2126 000...300-1 ISP 1 svr2

svr1             disk2083 000...300 ISP 1 svr2

SIOS Protection Suite for LinuxPage 16

Page 37: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resource Hierarchy Example

See the topic Resource Hierarchy Example for an illustration of a hierarchy. For more information, see theResource Hierarchy Information section of the topics Detailed Status Display and Short Status Display.

Resource Hierarchy Example

Detailed Status DisplayThis topic describes the categories of information provided in the detailed status display as shown in thefollowing example of output from the lcdstatus command. For information on how to display this information,see the LCD(1M)man page. At the command line, you can enter eitherman lcdstatus orman LCD. Forstatus information available in the LifeKeeper GUI, see Viewing the Status of a Server or Viewing the Statusof Resources.

Example of detailed status display: 

Resource Hierarchy Information

Resource hierarchies for machine "wileecoyote":

ROOT of RESOURCE HIERARCHY

apache-home.fred: id=apache-home.fred app=webserver type=apache state=ISP

initialize=(AUTORES_ISP) automatic restore to IN-SERVICE by LifeKeeper

info=/home/fred /usr/sbin/httpd

reason=restore action has succeeded

depends on resources: ipeth0-172.17.104.25,ipeth0-172.17.106.10,ipeth0-172.17.106.105

SIOS Protection Suite for LinuxPage 17

Page 38: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Detailed Status Display

Local priority = 1

SHARED equivalency with "apache-home.fred" on "roadrunner", priority = 10

FAILOVER ALLOWED

ipeth0-172.17.104.25: id=IP-172.17.104.25 app=comm type=ip state=ISP

initialize=(AUTORES_ISP) automatic restore to IN-SERVICE by LifeKeeper

info=wileecoyote eth0 172.17.104.25 fffffc00

reason=restore action has succeeded

these resources are dependent: apache-home.fred

Local priority = 1

SHARED equivalency with "ipeth0-172.17.104.25" on "roadrunner", priority = 10

FAILOVER ALLOWED

ipeth0-172.17.106.10: id=IP-172.17.106.10 app=comm type=ip state=ISP

initialize=(AUTORES_ISP) automatic restore to IN-SERVICE by LifeKeeper

info=wileecoyote eth0 172.17.106.10 fffffc00

reason=restore action has succeeded

these resources are dependent: apache-home.fred

Local priority = 1

SHARED equivalency with "ipeth0-172.17.106.10" on "roadrunner", priority = 10

FAILOVER ALLOWED

ipeth0-172.17.106.105: id=IP-172.17.106.105 app=comm type=ip state=ISP

initialize=(AUTORES_ISP) automatic restore to IN-SERVICE by LifeKeeper

info=wileecoyote eth0 172.17.106.105 fffffc00

reason=restore action has succeeded

These resources are dependent: apache-home.fred

Local priority = 1

SHARED equivalency with "ipeth0-172.17.106.105" on "roadrunner", priority = 10

FAILOVER ALLOWED

Communication Status Information

The following LifeKeeper servers are known:

machine=wileecoyote state=ALIVE

SIOS Protection Suite for LinuxPage 18

Page 39: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resource Hierarchy Information

machine=roadrunner state=DEAD (eventslcm detected failure at Wed Jun 7 15:45:14 EDT2000)

The following LifeKeeper network connections exist:

to machine=roadrunner type=TCP addresses=192.168.1.1/192.168.105.19

state="DEAD" priority=2 #comm_downs=0

LifeKeeper FlagsThe following LifeKeeper flags are on:

shutdown_switchover

Shutdown Strategy

The shutdown strategy is set to: switchover.

Resource Hierarchy InformationLifeKeeper displays the resource status beginning with the root resource.  The display includes informationabout all resource dependencies.

Elements common tomultiple resources appear only once under the first root resource. The first line for eachresource description displays the resource tag name followed by a colon (:), for example: device13557:.These are the information elements that may be used to describe the resources in the hierarchy:

l id. Unique resource identifier string used by LifeKeeper.

l app. Identifies the type of application, for example the sample resource is awebserver application.

l type. Indicates the resource class type, for example the sample resource is anApache application.

l state. Current state of the resource:

l ISP—In-service locally and protected.

l ISU—In-service, unprotected.

l OSF—Out-of-service, failed.

l OSU—Out-of-service, unimpaired.

l initialize. Specifies the way the resource is to be initialized, for example LifeKeeper restores theapplication resource, but the host adapter initializes without LifeKeeper.

l info. Contains object-specific information used by the object's remove and restore scripts.

l reason. If present, describes the reason the resource is in its current state. For example, anapplicationmight be in the OSU state because it is in-service (ISP or ISU) on another server. Sharedresources can be active on only one of the grouped servers at a time.

SIOS Protection Suite for LinuxPage 19

Page 40: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Communication Status Information

l depends on resources.  If present, lists the tag names of the resources on which this resourcedepends.

l these resources are dependent.  If present, indicates the tag names of all parent resources that aredirectly dependent on this object.

l Local priority. Indicates the failover priority value of the targeted server, for this resource.

l SHARED equivalency. Indicates the resource tag and server name of any remote resources withwhich this resource has a defined equivalency, along with the failover priority value of the remoteserver, for that resource.

l FAILOVER ALLOWED. If present, indicates that LifeKeeper is operational on the remote serveridentified in the equivalency on the line above, and the application is protected against failure.FAILOVER INHIBITED means that the application is not protected due to either the shutting down ofLifeKeeper or the stopping of the remote server.

Communication Status InformationThis section of the status display lists the servers known to LifeKeeper and their current state, followed byinformation about each communications path.

These are the communications information elements you can see on the status display:

l State. Status of communications path. These are the possible communications state values:

l ALIVE. Functioning normally

l DEAD. No longer functioning normally

l priority. The assigned priority value for the communications path.This item is displayed only for TCPpaths.

l #comm_downs. The number of times the port has failed and caused a failover. The path failurecauses a failover only if no other communications paths aremarked "ALIVE" at the time of the failure.

In addition, the status display can provide any of the following statistics maintained only for TTYcommunications paths:

l wrpid. Each TTY communications path has unique reader and writer processes. Thewrpid fieldcontains the process ID for the writer process. The writer process sleeps until one of two conditionsoccurs:

l Heartbeat timer expires, causing the writer process to send amessage.

l Local process requests the writer process to transmit a LifeKeeper maintenancemessage tothe other server. The writer process transmits themessage, using its associated TTY port, tothe reader process on that port on the other system.

l rdpid. Each TTY communications path has unique reader and writer processes. The rdpid fieldcontains the process ID for the reader process. The reader process sleeps until one of two conditionsoccurs:

SIOS Protection Suite for LinuxPage 20

Page 41: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Flags

l Heartbeat timer expires and the reader process must determine whether the predefinedheartbeat intervals have expired. If so, the reader process marks the communications path inthe DEAD state, which initiates a failover event if there are no other communications pathsmarked ALIVE.

l Remote system writer process transmits a LifeKeeper maintenancemessage, causing thereader process to perform the protocol necessary to receive themessage.

l #NAKs. Number of times the writer process received a negative acknowledgment (NAK). ANAK messagemeans that the reader process on the other system did not accept amessage packetsent by the writer process, and the writer process had to re-transmit themessage packet. The #NAKsstatistic can accumulate over a long period of time due to line noise. If, however, you see the numbersincreasing rapidly, you should perform diagnostic procedures on the communications subsystem.

l #chksumerr. Number of mismatches in the check summessage between the servers. This statisticcan accumulate over a long period of time due to line noise. If, however, you see the numbersincreasing rapidly, you should perform diagnostic procedures on the communications subsystem.

l #incmpltmes. Number of times the incomingmessage packet did not match the expected size. Ahigh number of mismatches may indicate that you should perform diagnostic procedures on thehardware port associated with the communications path.

l #noreply. Number of times the writer process timed out while waiting for an acknowledgment and hadto re-transmit themessage. Lack of acknowledgment may indicate an overloaded server or it cansignal a server failure.

l #pacresent. Number of times the reader process received the same packet. This can happen whenthe writer process on the sending server times out and resends the samemessage.

l #pacoutseq. Number of times the reader received packets out of sequence. High numbers in thisfield can indicate lost message packets andmay indicate that you should perform diagnosticprocedures on the communications subsystem.

l #maxretrys. Metric that increments for a particular message when themaximum retransmission countis exceeded (forNAK and noreply messages). If you see a high number in the #maxretrys field, youshould perform diagnostic procedures on the communications subsystem.

LifeKeeper FlagsNear the end of the detailed status display, LifeKeeper provides a list of the flags set for the system. Acommon type is a Lock LCD flag used to ensure that other processes wait until the process lock completes itsaction. The following is the standard LCD lock format:

!action!processID!time!machine:id.

These are examples of general LCD lock flags:

l !action!02833!701236710!server1:filesys. The creation of a file system hierarchy produces a flag inthis format in the status display. The filesys designation can be a different resource type for otherapplication resource hierarchies, or app for generic or user-defined applications.

l Other typical flags include !nofailover!machine, !notarmode!machine, and shutdown_switchover. The

SIOS Protection Suite for LinuxPage 21

Page 42: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Shutdown Strategy

!nofailover!machine and !notarmode!machineflags are internal, transient flags created and deleted byLifeKeeper, which control aspects of server failover. The shutdown_switchover flag indicates that theshutdown strategy for this server has been set to switchover such that a shutdown of the server willcause a switchover to occur. See the LCDI-flag(1M) for more detailed information on the possibleflags.

Shutdown StrategyThe last item on the detailed status display identifies the LifeKeeper shutdown strategy selected for thissystem. See Setting Server Shutdown Strategy for more information.

Short Status DisplayThis topic describes the categories of information provided in the short status display as shown in thefollowing example of output from the lcdstatus -e command. For information on how to display thisinformation, see the LCD(1M)man page. At the command line, you can enter eitherman lcdstatus ormanLCD. For status information available in the LifeKeeper GUI, see Viewing the Status of a Server or Viewingthe Status of Resources. 

Example of Short Status Display:

Resource Hierarchy Information

BACKUP TAG ID STATE PRIO PRIMARY

svr1 appfs3910-on-svr1 appfs4238 ISP 1 svr2

svr1 filesys4083 /jrl1 ISP 1 svr2

svr1        device2126 000...300-1 ISP 1 svr2

svr1             disk2083 000...300 ISP 1 svr2

Communication Status Information

MACHINE NETWORK ADDRESSES/DEVICE STATE  PRIO

svr1 TCP 100.10.1.20/100.11.1.21 ALIVE 1

svr1 TTY /dev/ttyS0 ALIVE --

Resource Hierarchy InformationLifeKeeper displays the resource status of each resource. The LifeKeeper tag names of root resources aredisplayed beginning in the left-most position of the TAG column, with tag names of resources within thehierarchy indented appropriately to indicate dependency relationships between resources.

SIOS Protection Suite for LinuxPage 22

Page 43: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Communication Status Information

TheBACKUP column indicates the next system in the failover priority order, after the system for which thestatus display pertains. If the target system is the lowest priority system for a given resource, theBACKUPcolumn for that resource contains dashes (for example, ------).

l TAG column. Contains the root tag for the resource.

l ID column. Contains each resource’s identifier string.

l STATE column. Contains the current state of each resource, as described in Resource States.

l PRIO column. Contains the failover priority value of the local server, for each resource.

l PRIMARY column. Contains the name of the server with the highest priority, for each resource.

Communication Status InformationThis section of the display lists each communications path that has been defined on the target system.  Foreach path, the following information is provided.

l MACHINE. Remote server name for the communications path.

l NETWORK. The type of communications path (TCP or TTY)

l ADDRESSES/DEVICE. The pair of IP addresses or device name for the communications path

l STATE. The state of the communications path (ALIVE or DEAD)

l PRIO. For TCP paths, the assigned priority of the path. For TTY paths, this columnwill containdashes(----), since TTY paths do not have an assigned priority.

Fault Detection and Recovery ScenariosTo demonstrate how the various LifeKeeper components work together to provide fault detection andrecovery, see the following topics that illustrate and describe three types of recovery scenarios:

IP Local RecoverySIOS recommends the use of bonded interfaces via the standard Linux NIC bondingmechanism in anyLifeKeeper release where a backup interface is required. Beginning with LifeKeeper Release 7.4.0, bondedinterfacing is the only supportedmethod. For releases prior to 7.4.0, the backup interface feature in the IP kit,described below, can be used.

The IP local recovery feature allows LifeKeeper to move a protected IP address from the interface on which itis currently configured to another interface in the same server when a failure has been detected by the IPRecovery Kit. Local recovery provides you an optional backupmechanism so that when a particular interfacefails on a server, the protected IP address can bemade to function on the backup interface, therefore avoidingan entire application/resource hierarchy failing over to a backup server. 

Local Recovery Scenario

SIOS Protection Suite for LinuxPage 23

Page 44: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Command Line Operations

IP local recovery allows you to specify a single backup network interface for each LifeKeeper-protected IPaddress on a server. In order for the backup interface to work properly, it must be attached to the samephysical network as the primary interface. The system administrator is expected to insure that a validinterface is being chosen. Note that it is completely reasonable and valid to specify a backup interface on oneserver but not on another within the cluster (i.e. the chosen backup interface on one server has no impact onthe choice of a backup on any other server).

When a failure of an IP address is detected by the IP Recovery Kit, the resulting failure triggers the executionof the IP local recovery script. LifeKeeper first attempts to bring the IP address back in service on the currentnetwork interface. If that fails, LifeKeeper checks the resource instance to determine if there is a backupinterface available. If so, it will then attempt to move the IP address to the backup interface. If all localrecovery attempts fail, LifeKeeper will perform a failover of the IP address and all dependent resources to abackup server.

The backup interface name can be identified in the Information field of the IP resource instance. TheInformation field values are space-separated and are, in order, the primary server name, the network interfacename, the IP address, the netmask and the backup interface name. Here is an example:

ServerA eth0 172.17.106.10 fffffc00 eth1

If no backup interface is configured, the 5th field value will be set to none.

When the protected IP address is moved to the backup interface, the 2nd and 5th field values are swapped sothat the original backup interface becomes the primary and vice versa. The result is that during LifeKeeperstartups, switchovers and failovers, LifeKeeper always attempts to bring the IP address in service on theinterface on which it was last configured.

Command Line OperationsIn LifeKeeper for Linux v3.01 or later, themechanism for adding or removing a backup interface from anexisting IP resource instance is provided as a command line utility. This capability is provided by the lkipbuutility. The command and syntax are:

lkipbu [-d machine] -{a|r} -t tag -f interface

The add operation (specified via the -a option) will fail if a backup interface has already been defined for thisinstance or if an invalid interface name is provided. The remove operation (specified via the -r option) will fail ifthe specified interface is not the current backup interface for this instance.

A command linemechanism is also provided for manually moving an IP address to its backup interface. Thiscapability is specified via the -m option using the following syntax:

lkipbu [-d machine] -m -t tag

This operation will fail if there is no backup interface configured for this instance. If the specified resourceinstance is currently in service, themove will be implemented by using the ipaction remove operation toun-configure the IP address on the current interface, and ipaction restore to configure it on the backupinterface. Following themove, the execute_broadcast_ping function will be used to verify theoperation of the address on the new interface, and if successful, the interface values will be swapped in the IPresource instance INFO field. If the specified IP resource instance is out-of-service when this command isexecuted, the primary and backup interface values will simply be swapped in the INFO field.

SIOS Protection Suite for LinuxPage 24

Page 45: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resource Error Recovery Scenario

The lkipbu utility also provides an option for retrieving the currently defined primary and backup interfacesfor a specified IP resource instance along with the state of the resource on the primary interface (up or down).This capability is specified via the -s option using the following syntax:

lkipbu [-d machine] -s -t tag

The output will be similar to the following:

IP address: 172.17.106.10

Netmask: 255.255.252.0

Primary interface: eth0 (up)

Backup interface: eth1

Refer to the lkipbu(8) man page for further detail.

Resource Error Recovery ScenarioLifeKeeper provides a real-time daemonmonitor, lkcheck, to check the status and health of LifeKeeper-protected resources. For each in-service resource, lkcheck periodically calls the quickCheck script for thatresource type. The quickCheck script performs a quick health check of the resource, and if the resource isdetermined to be in a failed state, the quickCheck script calls the event notificationmechanism, sendevent.

The following figure illustrates the recovery process tasks when lkcheck initiates the process:

1. lkcheck runs. By default, the lkcheck process runs once every twominutes. When lkcheck runs, itinvokes the appropriate quickCheck script for each in-service resource on the system.

2. quickCheck script checks resource. The nature of the checks performed by the quickCheck script isunique to each resource type. Typically, the script simply verifies that the resource is available toperform its intended task by imitating a client of the resource and verifying that it receives the expected

SIOS Protection Suite for LinuxPage 25

Page 46: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resource Error Recovery Scenario

response.

3. quickCheck script invokes sendevent. If the quickCheck script determines that the resource is in afailed state, it initiates an event of the appropriate class and type by calling sendevent.

4. Recovery instruction search. The system event notificationmechanism, sendevent, first attempts todetermine if the LCD has a resource and/or recovery for the event type or component. Tomake thisdetermination, the is_recoverable process scans the resource hierarchy in LCD for a resource instancethat corresponds to the event (in this example, the filesys name).

The action in the next step depends upon whether the scan finds resource-level recovery instructions:

l Not found. If resource recovery instructions are not found, is_recoverable returns to sendevent andsendevent continues with basic event notification.

l Found. If the scan finds the resource, is_recoverable forks the recover process into the background.The is_recoverable process returns and sendevent continues with basic event notification, passing anadvisory flag "-A" to the basic alarming event response scripts, indicating that LifeKeeper is performingrecovery.

5. Recover process initiated. Assuming that recovery continues, is_recoverable initiates the recoverprocess which first attempts local recovery.

6. Local recovery attempt. If the instance was found, the recover process attempts local recovery byaccessing the resource hierarchy in LCD to search the hierarchy tree for a resource that knows how torespond to the event. For each resource type, it looks for a recovery subdirectory containing asubdirectory named for the event class, which in turn contains a recovery script for the event type.

The recover process runs the recovery script associated with the resource that is farthest above thefailing resource in the resource hierarchy. If the recovery script succeeds, recovery halts. If the scriptfails, recover runs the script associated with the next resource, continuing until a recovery scriptsucceeds or until recover attempts the recovery script associated with the failed instance.

If local recovery succeeds, the recovery process halts.

7. Inter-server recovery begins. If local recovery fails, the event then escalates to inter-server recovery.

8. Recovery continues. Since local recovery fails, the recover process marks the failed instance to theOut-of-Service-FAILED (OSF) state andmarks all resources that depend upon the failed resource totheOut-of-Service-UNIMPAIRED (OSU) state. The recover process then determines whether thefailing resource or a resource that depends upon the failing resource has any shared equivalencies witha resource on any other systems,and selects the one to the highest priority alive server. Only oneequivalent resource can be active at a time.

If no equivalency exists, the recover process halts.

If a shared equivalency is found and selected, LifeKeeper initiates inter-server recovery. The recoverprocess sends amessage through the LCM to the LCD process on the selected backup systemcontaining the shared equivalent resource. This means that LifeKeeper would attempt inter-serverrecovery.

9. lcdrecover process coordinates transfer. The LCD process on the backup server forks the processlcdrecover to coordinate the transfer of the equivalent resource.

SIOS Protection Suite for LinuxPage 26

Page 47: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Server Failure Recovery Scenario

10. Activation on backup server. The lcdrecover process finds the equivalent resource and determineswhether it depends upon any resources that are not in-service. lcdrecover runs the restore script (partof the resource recovery action scripts) for each required resource, placing the resources in-service.

The act of restoring a resource on a backup server may result in the need for more shared resources tobe transferred from the primary system. Messages pass to and from the primary system, indicatingresources that need to be removed from service on the primary server and then brought into service onthe selected backup server to provide full functionality of the critical applications. This activitycontinues, until no new shared resources are needed and all necessary resource instances on thebackup are restored.

Server Failure Recovery ScenarioThe LifeKeeper Communications Manager (LCM) has two functions:

l Messaging. The LCM serves as a conduit through which LifeKeeper sends messages during recovery,configuration, or when running an audit.

l Failure detection. The LCM also plays a role in detecting whether or not a server has failed.

LifeKeeper has a built-in heartbeat signal that periodically notifies each server in the configuration that itspaired server is operating. If a server fails to receive the heartbeat message through one of thecommunications paths, LifeKeeper marks that path DEAD.

The following figure illustrates the recovery tasks when the LCM heartbeat mechanism detects a serverfailure.

The following steps describe the recovery scenario, illustrated above, if LifeKeeper marks all communicationsconnections to a server DEAD.

1. LCM activates eventslcm. When LifeKeeper marks all communications paths dead, the LCM initiatesthe eventslcm process.

SIOS Protection Suite for LinuxPage 27

Page 48: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Server Failure Recovery Scenario

Only one activity stops the eventslcm process:

l Communication path alive. If one of the communications paths begins sending the heartbeat signalagain, the LCM stops the eventslcm process.

It is critical that you configure two or more physically independent, redundant communication pathsbetween each pair of servers to prevent failovers and possible system panics due to communicationfailures.

2. Message to sendevent. eventslcm sends the system failure alarm by calling sendevent with the eventtypemachfail.

3. sendevent initiates failover recovery. The sendevent program determines that LifeKeeper can handlethe system failure event and executes the LifeKeeper failover recovery process lcdmachfail.

4. lcdmachfail checks. The lcdmachfail process first checks to ensure that the non-responding serverwas not shut down. Failovers are inhibited if the other system was shut down gracefully before systemfailure. Then lcdmachfail determines all resources that have a shared equivalency with the failedsystem. This is the commit point for the recovery.

5. lcdmachfail restores resources. lcdmachfail determines all resources on the backup server that haveshared equivalencies with the failed primary server. It also determines whether the backup server isthe highest priority alive server for which a given resource is configured. All backup servers performthis check, so that only one server will attempt to recover a given hierarchy. For each equivalentresource that passes this check, lcdmachfail invokes the associated restore program. Then,lcdmachfail also restores each resource dependent on a restored resource, until it brings the entirehierarchy into service on the backup server.

SIOS Protection Suite for LinuxPage 28

Page 49: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Installation and Configuration

SPS for Linux InstallationFor complete installation instructions on installing the SPS for Linux software, see the SPS for LinuxInstallation Guide. Refer to the SPS for Linux Release Notes for additional information.

SPS for Linux ConfigurationOnce the SPS environment has been installed, the SPS software can be configured on each server in thecluster. Follow the steps in theSPS Configuration Steps topic below which contains links to topics withadditional details.

SPS Configuration StepsIf you have installed your SPS environment as described in the SPS Installation Guide, you should be ready tostart and configure the SPS software on each server in your cluster.

Follow the steps below which contain links to topics with additional details. Perform these tasks on eachserver in the cluster.

1. Start LifeKeeper by typing the following command as root:

/etc/init.d/lifekeeper start

This command starts all LifeKeeper daemon processes on the server being administered if they are notcurrently running.

For additional information on starting and stopping LifeKeeper, see Starting LifeKeeper and StoppingLifeKeeper.

2. Set Up TTY Communications Connections. If you plan to use a TTY communications (comm) path fora LifeKeeper heartbeat, you need to set up the physical connection for that heartbeat.

3. Configure the GUI. There aremultiple tasks involved with configuring the GUI. Start with theLifeKeeper GUI - Overview topic within Preparing to Run theGUI. Then for detailed instructions, followthe browse sequence throughout Preparing to Run theGUI.

Note: The first time you run the LifeKeeper GUI, you will see aQuickStart button which opens awindow with instructions and links to help you step through the configuration of your LifeKeeperresources. Subsequently, you can access this QuickStart Configuration Assistant under the Helpmenu.

4. Create Communication Paths. Before you can activate LifeKeeper protection, youmust create thecommunications path (heartbeat) definitions within LifeKeeper.

SIOS Protection Suite for LinuxPage 29

Page 50: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Set Up TTY Connections

5. Perform any of the following optional configuration tasks:

l Set the Server Shutdown Strategy

l Configure themanual failover confirmation option

l Tune the LifeKeeper heartbeat

l Configure SNMP Event Forwarding via SNMP

l Configure Event Email Notification

l If you plan to use STONITH devices in your cluster, create the scripts to control theSTONITH devices and place them in the appropriate LifeKeeper events directory.

6. SPS is now ready to protect your applications. The next step depends on whether you will be using oneof the optional SPS Recovery Kits:

l If you are using an SPS Recovery Kit, refer to the Documentation associated with the kitfor instructions on creating and extending your resource hierarchies.

l If you are using an application that does not have an associated Recovery Kit, then youhave two options:

l If it is a simple application, you should carefully plan how to create an interfacebetween your application and LifeKeeper. Youmay decide to protect it using theGeneric Application Recovery Kit  included with the LifeKeeper core.

Set Up TTY ConnectionsIf you plan to use a TTY communications (comm) path for a LifeKeeper heartbeat, you need to set up thephysical connection for that heartbeat. Remember that multiple communication paths are required to avoidfalse failover due to a simple communications failure. Two ormore LAN-based (TCP) comm paths shouldalso be used.

Connect the TTY cable to the serial ports of each server to be used for the serial heartbeat.

1. Test the serial path using the following command:

/opt/LifeKeeper/bin/portio -r -p port -b baud

where:

l baud is the baud rate selected for the path (normally 9600)

l port is the serial port being tested on Server 1, for example /dev/ttyS0.

Server 1 is now waiting for input from Server 2.

2. Run command portio on Server 2. On the second system in the pair, type the following command:

echo Helloworld | /opt/LifeKeeper/bin/portio -p port -b baud

where:

SIOS Protection Suite for LinuxPage 30

Page 51: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Event Forwarding via SNMP

l baud is the same baud rate selected for Server 1.

l port is the serial port being tested on Server 2, for example /dev/ttyS0.

3. View the console. If the communications path is operational, the software writes "Helloworld" on theconsole on Server 1. If you do not see that information, perform diagnostic and correction operationsbefore continuing with LifeKeeper configuration.

LifeKeeper Event Forwarding via SNMP

Overview of LifeKeeper Event Forwarding via SNMPThe Simple Network Management Protocol (SNMP) defines a device-independent framework for managingnetworks. Devices on the network are described by MIB (Management Information Base) variables that aresupplied by the vendor of the device. An SNMP agent runs on each node of the network, and interacts with aNetwork Manager node. The Network Manager can query the agent to get or set the values of its MIBvariables, there by monitoring or controlling the agent's node. The agent can also asynchronously generatemessages called traps to notify themanager of exceptional events. There are a number of applicationsavailable for monitoring andmanaging networks using the Simple Network Management Protocol (SNMP).

LifeKeeper has an event notificationmechanism for registering applications that wish to be notified of specificevents or alarms (see the sendevent(5) man page). LifeKeeper can be easily enabled to send SNMP trapnotification of key LifeKeeper events to a third party network management console wishing tomonitorLifeKeeper activity.

The remotemanagement console receiving SNMP traps must first be configured through the administrationsoftware of that system; LifeKeeper provides no external SNMP configuration. The remotemanagementserver is typically located outside of the LifeKeeper cluster (i.e., it is not a LifeKeeper node).  

LifeKeeper Events TableThe following table contains the list of LifeKeeper events and associated trap numbers. The entire Object ID(OID) consists of a prefix followed by a specific trap number in the following format:

prefix.0.specific trap number

The prefix is .1.3.6.1.4.1.7359, which expands to iso.org.dod.internet.private.enterprises.7359 in theMIBtree. (7359 is SteelEye’s [SIOS Technology] enterprise number, followed by 1 for LifeKeeper.) For example,the LifeKeeper Startup Complete event generates the OID: .1.3.6.1.4.1.7359.1.0.100.

LifeKeeper Event/Description Trap# Object ID

LifeKeeper Startup Complete

Sent from a node when LifeKeeper is started on that node100 .1.3.6.1.4.1.7359.1.0.100

SIOS Protection Suite for LinuxPage 31

Page 52: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Events Table

LifeKeeper Shutdown Initiated

Sent from a node beginning LifeKeeper shutdown101 .1.3.6.1.4.1.7359.1.0.101

LifeKeeper Shutdown Complete

Sent from a node completing LifeKeeper shutdown102 .1.3.6.1.4.1.7359.1.0.102

LifeKeeper Manual Switchover Initiated on Server

Sent from the node from which amanual switchover was requested110 .1.3.6.1.4.1.7359.1.0.110

LifeKeeper Manual Switchover Complete – recovered list

Sent from the node where themanual switchover was completed111 .1.3.6.1.4.1.7359.1.0.111

LifeKeeper Manual Switchover Complete – failed list

Sent from each node within the cluster where themanual switchoverfailed

112 .1.3.6.1.4.1.7359.1.0.112

LifeKeeper Node Failure Detected for Server

Sent from each node within the cluster when a node in that cluster fails120 .1.3.6.1.4.1.7359.1.0.120

LifeKeeper Node Recovery Complete for Server – recovered list

Sent from each node within the cluster that has recovered resources fromthe failed node

121 .1.3.6.1.4.1.7359.1.0.121

LifeKeeper Node Recovery Complete for Server – failed list

Sent from each node within the cluster that has failed to recoverresources from the failed node

122 .1.3.6.1.4.1.7359.1.0.122

LifeKeeper Resource Recovery Initiated

Sent from a node recovering a resource; a 131 or 132 trap always followsto indicate whether the recovery was completed or failed.

130 .1.3.6.1.4.1.7359.1.0.130

LifeKeeper Resource Recovery Failed

Sent from the node in trap 130 when the resource being recovered fails tocome into service

131* .1.3.6.1.4.1.7359.1.0.131

LifeKeeper Resource Recovery Complete

Sent from the node in trap 130 when the recovery of the resource iscompleted

132 .1.3.6.1.4.1.7359.1.0.132

LifeKeeper Communications Path Up

A communications path to a node has become operational140 .1.3.6.1.4.1.7359.1.0.140

LifeKeeper Communications Path Down

A communications path to a node has gone down141 .1.3.6.1.4.1.7359.1.0.141

SIOS Protection Suite for LinuxPage 32

Page 53: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Configuring LifeKeeper Event Forwarding

The following variables are used to "carry" additional informationin the trap PDU:

Trapmessage all .1.3.6.1.4.1.7359.1.1

Resource Tag 130 .1.3.6.1.4.1.7359.1.2

Resource Tag 131 .1.3.6.1.4.1.7359.1.2

Resource Tag 132 .1.3.6.1.4.1.7359.1.2

List of recovered resources 111 .1.3.6.1.4.1.7359.1.3

List of recovered resources 121 .1.3.6.1.4.1.7359.1.3

List of failed resources 112 .1.3.6.1.4.1.7359.1.4

List of failed resources 122 .1.3.6.1.4.1.7359.1.4

* This trapmay appear multiple times if recovery fails onmultiple backup servers.

Configuring LifeKeeper Event Forwarding

PrerequisitesThe SNMP event forwarding feature is included as part of the LifeKeeper Core functionality and does notrequire additional LifeKeeper packages to be installed. Since LifeKeeper uses the snmptrap utility to generatethe traps, the snmptrap command is required to be installed on the node that will generate SNMP notification.The snmptrap utility is provided by the following packages:RHEL 5 or later and supported operating systems - net-snmp-utilsSLES 11 or later - net-snmp

In older versions of the snmp implementation (prior to 4.1) where the defCommunity directive is notsupported, the traps will be sent using the "public" community string.

It is not necessary to have an SNMP agent snmpd running on the LifeKeeper node.

The configuration of a trap handler on the network management console and its response to trapmessages isbeyond the scope of this LifeKeeper feature. See the documentation associated with your systemmanagement tool for related instructions.

Configuration TasksThe following tasks must be performed to set up LifeKeeper SNMP Event Forwarding. All but the last taskmust be repeated on each node in the LifeKeeper cluster that will be generating SNMP trapmessages.

1. Ensure that the snmptrap utility is available as noted above.

2. Specify the network management node to which the SNMP traps will be sent. This can be done eitherby command line or by editing the /etc/default/LifeKeeper file. Youmust specify the IPaddress rather than domain name to avoid DNS issues.

SIOS Protection Suite for LinuxPage 33

Page 54: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Verifying the Configuration

l By command line, use the lk_configsnmp (see the lk_configsnmp(1M) man page for details).This utility will only accept IP addresses.

l Or, edit the defaults file /etc/default/LifeKeeper to add the IP address. Find the entry LK_TRAP_MGR= and insert one or more IP addresses (separated by commas) to the right of "=" (no whitespace before or after "=" or commas).

3. If you are using an older version of the snmp implementation that does not support the defCommunitydirective, skip this step. Traps will be sent using the "public" community string. Otherwise, do thefollowing:

Specify a default community in /usr/share/snmp/snmp.conf. If this file does not exist,create it using sufficiently secure permissions. Add the directive "defCommunity" with avalue. This specifies the SNMP version 2c community string to use when sending traps. Forexample, add a line like this:

defCommunity myCommunityString

Refer to the snmp.confman page (delivered with the snmp package) for more informationabout this configuration file.

4. Perform whatever configuration steps are needed on the remotemanagement console to detect andrespond to the incoming trap OIDs from LifeKeeper events. If themanagement node is a Linux server,theminimum that you would need to do to begin verification of this feature would be to start thesnmptrapd daemonwith the -f -Lo option (print themessages to stdout).

Verifying the ConfigurationTo verify that the configuration is working, initiate a LifeKeeper action (for example, start or stop LifeKeeper, orbring a resource in-servicemanually using the LifeKeeper GUI). Verify that the trapmessage was received atthemanagement console. If a trap is not received, inspect the appropriate log files on themanagementsystem, and follow the normal troubleshooting practices provided with themanagement software. TheLifeKeeper log can be inspected to determine if there was a problem sending the trapmessage. See SNMPTroubleshooting for more information.

Disabling SNMP Event ForwardingTo disable the generation of SNMP traps by LifeKeeper, simply remove the assignment of an IP address fromthe LK_TRAP_MGR environment variable in the file /etc/default/LifeKeeper. This can beaccomplished using the lk_configsnmp utility from the command line with the "disable"option (see thelk_configsnmp(1M) page for an example). Or, edit /etc/default/LifeKeeper and change theentry for LK_TRAP_MGR to LK_TRAP_MGR= (or remove the line entirely). This must be done on each nodethat should be disabled from sending trapmessages.

SNMP TroubleshootingFollowing are some possible problems and solutions related to SNMP Event Forwarding. For specific errormessages, see the LifeKeeper Message Catalog.

Problem:NoSNMP trapmessages are sent from LifeKeeper.

SIOS Protection Suite for LinuxPage 34

Page 55: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Event Email Notification

Solution:Verify that the snmptrap utility is installed on the system (it is usually located in /usr/bin). If it is notinstalled, install the appropriate snmp package (see Prerequisites ). If it is installed in someother location, editthe PATH variable in the file /etc/default/LifeKeeper and add the appropriate path.

Problem: No SNMP error messages are logged and SNMP trapmessages do not appear to be sent from aLifeKeeper server.

Solution:Check to see if LK_TRAP_MGR is set to the IP address of the network management server that willreceive the traps. From the command line, use the lk_configsnmp utility with the "query" option to verifythe setting (See the lk_configsnmp(1M) man page for an example.) Or, search for the entry for LK_TRAP_MGR in the file /etc/default/LifeKeeper.  This variable must be set on each LifeKeeper nodethat will generate SNMP trapmessages.

LifeKeeper Event Email Notification

Overview of LifeKeeper Event Email NotificationLifeKeeper Event Email Notification is amechanism by which one or more users may receive email noticeswhen certain events occur in a LifeKeeper cluster. LifeKeeper has an event notificationmechanism forregistering applications that wish to be notified of specific events or alarms (see the sendevent(5) manpage). LifeKeeper can be easily enabled to send email notification of key LifeKeeper events to a selected setof users wishing tomonitor LifeKeeper activity. Additionally, a log of each email notice issued is available via/var/log/lifekeeper.log or by using the Viewing Server Log Files facility in the LifeKeeper GUI. Themessages may be found in the NOTIFY log. 

By default, LifeKeeper Event Email Notification is disabled. Enabling this feature requires setting the LK_NOTIFY_ALIAS environment variable defined in /etc/default/LifeKeeper. The LK_NOTIFY_ALIASenvironment variable can be set to a single email address or alias, or it can contain multiple addresses oraliases separated by commas. To set LK_NOTIFY_ALIAS either run lk_confignotify alias (Seethe lk_confignotifyalias(1M) man page for an example) from the command line and supply theaddress or list of addresses that should receive email when an event occurs or edit the defaults file/etc/default/LifeKeeper to add the email address or address list. Search for the entry LK_NOTIFY_ALIAS= and insert the address or address list separated by commas. Repeat this action on all nodes in thecluster that need to send email for the selected LifeKeeper events.

To disable Email Notification, either run lk_confignotifyalias (See the lk_confignotifyalias(1M) man page for an example) with the --disable argument or edit the defaults file/etc/default/LifeKeeper and remove the setting of LK_NOTIFY_ALIAS (change the line to LK_NOTIFY_ALIAS=).

LifeKeeper Events Generating EmailThe following LifeKeeper events will generate email notices when LK_NOTIFY_ALIAS is set.

LifeKeeper Event Event Description

SIOS Protection Suite for LinuxPage 35

Page 56: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Events Generating Email

LifeKeeper StartupComplete Sent from a node when LifeKeeper is started on that node.

LifeKeeper ShutdownInitiated Sent from a node beginning LifeKeeper shutdown.

LifeKeeper ShutdownComplete Sent from a node completing LifeKeeper shutdown.

LifeKeeper ManualSwitchover Initiated onServer

Sent from the node from which amanual switchover was requested.

LifeKeeper ManualSwitchover Complete -recovered list

Sent from the node where themanual switchover was completed listing theresource successfully recovered.

LifeKeeper ManualSwitchover Complete -failed list

Sent from the node where themanual switchover was completed listing theresource that failed to successfully switchover.

LifeKeeper Node FailureDetected Sent from each node within the cluster when a node in that cluster fails.

LifeKeeper NodeRecovery Complete forServer - recovered list

Sent from each node within the cluster that has recovered resources from thefailed node listing the resource successfully recovered.

LifeKeeper NodeRecovery Complete forServer - failed list

Sent from each node within the cluster that has failed to recover resources fromthe failed node listing the resource that failed to successfully recover.

LifeKeeper ResourceRecovery Initiated

Sent from a node recovering a resource; a "Resource Recovery Complete" or"Resource Recovery Failed" message always follows to indicate whether therecovery was completed or failed.

LifeKeeper ResourceRecovery Complete

Sent from the node that issued a "Resource Recovery Initiated" message whenthe recovery of the resource is completed listing the resource successfullyrecovered.

LifeKeeper ResourceRecovery Failed

Sent from the node that issued a "Resource Recovery Initiated" message if theresource fails to come into service listing the resource that failed tosuccessfully recover.

LifeKeeperCommunications Path Up A communications path to a node has become operational.

LifeKeeperCommunications PathDown

A communications path to a node has gone down.

SIOS Protection Suite for LinuxPage 36

Page 57: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Configuring LifeKeeper Event Email Notification

Configuring LifeKeeper Event Email Notification

PrerequisitesThe Event Email Notification feature is included as part of the LifeKeeper core functionality and does notrequire additional LifeKeeper packages to be installed. It does require that email software be installed andconfigured on each LifeKeeper node that will generate email notification of LifeKeeper events. LifeKeeperuses themail utility, usually installed by themailx package to send notifications.

The configuration of email is beyond the scope of this LifeKeeper feature. By default, LifeKeeper Event EmailNotification is disabled.

Configuration TasksThe following tasks must be performed to set up LifeKeeper Event Email Notification.

1. Ensure that themail utility is available as noted above.

2. Identify the user or users that will receive email notices of LifeKeeper events and set LK_NOTIFY_ALIAS in the LifeKeeper defaults file /etc/default/LifeKeeper. This can be done either fromthe command line or by editing the file /etc/default/LifeKeeper and specifying the emailaddress or alias or the list of email addresses or aliases that should receive notification.

l From the command line, use the lk_confignotifyalias utility (see the lk_confignotifyalias(1M) man page for details). This utility will only accept emailaddresses or aliases separated by commas. 

l Or, edit the defaults file /etc/default/LifeKeeper to add the email address or alias.Search for the entry LK_NOTIFY_ALIAS= and insert the email address or alias (single or listseparated by commas) to the right of the = (no white space around the =).

Verifying the ConfigurationTo verify that the configuration is working, initiate a LifeKeeper action (for example, start or stop LifeKeeper orbring a resource in-servicemanually using the LifeKeeper GUI). Verify that an email message was receivedby the users specified in LK_NOTIFY_ALIAS in the file /etc/default/LifeKeeper and amessage waslogged in the LifeKeeper log file. If an email message has not been received, follow your normal debuggingprocedures for email failures. The LifeKeeper log can be inspected to determine if there was a problemsending the email message. See Email Notification Troubleshooting for more information.

Disabling Event Email NotificationTo disable the generation of email notices by LifeKeeper, simply remove the assignment of an email addressor alias from the LK_NOTIFY_ALIAS environment variable in the file /etc/default/LifeKeeper. Thiscan be accomplished using the lk_confignotifyalias utility from the command line with the "--disable"

SIOS Protection Suite for LinuxPage 37

Page 58: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Email Notification Troubleshooting

option (see the lk_confignotifyalias(1M) page for an example). Or, edit/etc/default/LifeKeeper and change the entry for LK_NOTIFY_ALIAS to LK_NOTIFY_ALIAS=.This must be done on each node that should be disabled from sending email messages.

Email Notification TroubleshootingFollowing are some possible problems and solutions related to email notification of LifeKeeper events. Forspecific error messages, see the LifeKeeper Message Catalog.

Problem:No email messages are received from LifeKeeper.

Solution:Verify that themail utility is installed on the system (it is usually located in /bin/mail). If it is notinstalled, install themailx package. If it is installed in some other location, edit the PATH variable in the file/etc/default/LifeKeeper and add the path to themail utility.

Problem:No email messages are received from LifeKeeper.

Solution:Check the email configuration and ensure email messages have not be queued for deliveryindicating a possible email configuration problem. Also ensure that the email address or addresses specifiedin LK_NOTIFY_ALIAS are valid and are separated by a comma.

Problem: The log file has a "mail returned" error message.

Solution: There was some problem invoking or sendingmail for a LifeKeeper event, such as a "node failure",as themail command return the error X. Verify themail configuration and that LK_NOTIFY_ALIAS contains avalid email address or list of addresses separated by a comma and ensure that email can be sent to thoseaddresses by sending email from the command line using the email recipient format defined in LK_NOTIFY_ALIAS.

Problem:Nomessages, success or failure, are logged and the user or users designated to receive emailhave not received any mail when a LifeKeeper Event has occurred, such as a node failure.

Solution:Check to see if LK_NOTIFY_ALIAS is, in fact, set to an email address or list of addressesseparated by commas. From the command line, use the lk_confignotifyalias utility with the "--query"option to verify the setting (See the lk_confignotifyalias(1M) man page for an example.) Or, searchfor the entry LK_NOTIFY_ALIAS in the file /etc/default/LifeKeeper. This variable must be set oneach LifeKeeper node that will generate email notificationmessages. Also, see the Overview of LifeKeeperEvent Email Notification to see if the LifeKeeper event generates an email message (not all events generateemail messages).

Optional Configuration Tasks

Confirm Failover and Block Resource Failover SettingsAs a normal operation of LifeKeeper, the switching to a backup node when a node failure or a resource failure

SIOS Protection Suite for LinuxPage 38

Page 59: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Set Confirm Failover On

occurs is executed automatically. However, depending on the environment, requiringmanual confirmation bya system administrator may be desirable, instead of an automatic failover recovery initiated by LifeKeeper. Inthese cases, theConfirm Failover orBlock Resource Failover settings are available. By using thesefunctions, automatic failover can be blocked, and a time to wait for failover can be set when a resource failureor a node failure occurs.

Set Confirm Failover orBlock Resource Failover in your SPS environment after carefully reading thedescriptions, examples, and considerations below. These settings are available from thePropertiesPanel ofGUI and via the command line of LifeKeeper.

Set Confirm Failover OnWhen a failover occurs because a node in the lifeKeeper cluster failed (Note: a node failure is identified by afailure of all LifeKeeper communication paths to that system), the time to wait before LifeKeeper switchesresources to a backup node can be set with theConfirm Failover setting (see the discussion on theCONFIRMSOTO variable later in this document). Also, a user can decide whether to automatically switch tothe backup node or not after the time interval to wait expires (see the discussion of the CONFIRMSODEFvariable later in this document).

Note: Set Confirm Failover On actions are only available when a node failure occurs. It is not available forresource failures where one or more communications paths are still active.

To enable theConfirm Failover setting via the GUI, use the General tab of the Server Properties. Anexample of the General tab for Server Properties is shown below. The part circled in red on the screenaddresses theConfirm Failover setting.

SIOS Protection Suite for LinuxPage 39

Page 60: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Set Confirm Failover On

Note: This setting is only available for users having an administrator authority for SPS.

In this example, the setting is seen from the host named lktestA. The part circled in red on the screen is usedfor this settingConfirm Failover. The node names for the HA cluster are displayed vertically. In thisexample, the standby node for lktestA is lktestB.

The screen shows the configuration status for server lktestA, with the checkbox for lktestB set. In this case,the confirm failover flag is created on lktestB. When a failover from lktestA to lktestB is executed, theconfirmation process for executing a failover occurs on lktestB. This process includes checking the defaultaction to take based on the CONFIRMSODEF variable setting (see the discussion later in this document) andhow long to wait before taking that action based on the CONFIRMSOTO variable setting.

TheConfirm Failover flag creation status can be checked via the command line. When the checkbox forlktestB is set on the host named lktestA, theConfirm Failover flag is created on lktestB. (NOTE: In thisexample, the flag is not created on lktestA, only on lktestB.) An example of the command line output is below.

[root@lktestB~]# /opt/LifeKeeper/bin/flg_list

confirmso!lktestA

SIOS Protection Suite for LinuxPage 40

Page 61: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Set Confirm Failover On

The "confirmso!lktestA" output is the result of the flg_list command, and displays that theConfirm Failoverflag is set on node lktestB to confirm IktestA failures.

When failover occurs with the confirmso flag, the followingmessages are recorded in the LifeKeeper log file.

INFO:lcd.recover:::004113:

chk_man_interv: Flag confirmso!hostname is set, issuing confirmso event and waiting for switchoverinstruction.

NOTIFY:event.confirmso:::010464:

LifeKeeper: FAILOVER RECOVERY OFMACHINE lktestA requires manual confirmation! Execute'/opt/LifeKeeper/bin/lk_confirmso -y -s lktestA ' to allow this failover, or execute '/opt/LifeKeeper/bin/lk_confirmso -n -s lktestA' to prevent it. If no instruction is provided, LifeKeeper will timeout in 600 seconds andthe failover will be allowed to proceed.

Execute one of the following commands to confirm the failover;

To proceed with the failover:

#/opt/LifeKeeper/bin/lk_confirmso -y -s hostname

To block the failover:

# /opt/LifeKeeper/bin/lk_confirmso -n -s hostname

The host name that is specified when executing the command is the host name listed inConfirm Failoverflag which for this example would be lktestA. Execute the command by referring to the example commandsprovided in the Log output.

In the case where the set time to wait is exceeded, the default failover action is executed (allow failover orblock failover). The default failover action is determined by the CONFIRMSODEF variable (see discussionlater in this document).

The followingmessage is output to the LifeKeeper log when the timeout expires.

lcdrecover[xxxx]: INFO:lcd.recover:::004408:chk_man_interv: Timed out waiting for instruction, using defaultCONFIRMSODEF value 0.

The LifeKeeper operation when the time to wait is exceeded is controlled by the setting of the variable"CONFIRMSODEF” which is set in the /etc/default/LifeKeeper file with a value of “1” or “0”. A value of "0" isset by default, and this indicates that the failover will proceed when the time to wait is exceeded. If the valueis set to a "1", the failover is blocked when the time to wait is exceeded.

The time to wait for confirmation of a failover can be changed by adjusting the value of the CONFIRMSOTOvariable in the /etc/default/LifeKeeper file. The value of the variable specifies the number of seconds to waitfor amanual confirmation from the user before proceeding or blocking the failover as determined by the valueof the “CONFIRMSODEF” variable (see above).

Restarting LifeKeeper or rebooting the OS is not required for changes to these variables to take effect. If thevalue of CONFIRMSOTO is set to 0 seconds, then the operation based on the CONFIRMSODEF setting iswill occur immediately.

SIOS Protection Suite for LinuxPage 41

Page 62: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

When to Select [Confirm Failover]Setting

When to Select [Confirm Failover]SettingThis setting is used for Disaster Recovery orWAN configurations in the environment which thecommunication paths are not redundant.

l In a regular site (non-multi-site cluster and non-XenServer), open theProperties page from one serverand then select the server that you want theConfirm Failover flag to be set on.

l For aMulti-siteWAN configuration: Enablemanual failover confirmation by setting theConfirm Fail-over flag.

l For aMulti-site LAN configuration:  Do not set theConfirm Failover flag to enablemanual failoverconfirmation.

l In amulti-site cluster environment – from the non-disaster system, select the DR system and checkthe set confirm failover flag. You will need to open theProperties panel and select this setting for eachnon-disaster server in the cluster.

l In a XenServer environment, all servers in the list (not just the DR site) need to be checked.

Block Resource Failover OnTheBlock Resource Failover On setting blocks all resource transfers due to a resource failure from thegiven system.

Note: TheBlock Resource Failover On setting has no effect on the failover processing when a node failureoccurs. This setting only blocks a failover attempt when a local resource recovery fails and attempts totransfer the resource to another node in the cluster.

By default, the recovery of resource failures in a local system (local recovery) is performed when a resourcefailure is detected. When the local recovery has failed or is not enabled, a failover is initiated to the nexthighest priority standby node defined for the resource. The Block Resource Failover On setting will preventthis failover attempt.

To enable theBlock Resource Failover On setting by the GUI, use the General tab of the Server Properties.An example of the General tab for Server Properties is below. The part circled in red on the screen addressesthe settingBlock Resource Failover On.

SIOS Protection Suite for LinuxPage 42

Page 63: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Block Resource Failover On

Note: This setting is only available for users having an administrator authority for SPS.

In this example, the setting is seen from the host named lktestA. The part circled in red on the screen is usedfor settingBlock Resource Failover. The node names for the HA cluster are displayed vertically here. In thisexample, the standby node for lktestA is lktestB.

In this case, the Block Resource Failover flag is created on lktestB. The "block_failover" flag can be verifiedon the command line by executing the flg_list command. An example of the output is below.

[root@lktestB~]# /opt/LifeKeeper/bin/flg_list

block_failover

When the block_failover flag is set, the failover to other node(IktestA) is blocked when a resource failureoccurs on IkestB.

The block_failover flag prevents resource failovers from occurring on the node where the flag is set.The following logmessage is output to the LifeKeeper log when the failover is blocked by this setting.

SIOS Protection Suite for LinuxPage 43

Page 64: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Conditions/Considerations

ERROR:lcd.recover:::004787:Failover is blocked by current settings. MANUAL INTERVENTION ISREQUIRED

Conditions/Considerationsl In amulti-site configuration, do not selectBlock Failover for any server in the configuration.

l Important considerations in amulti-site cluster configuration: Do not check the Set Block ResourceFailover On box.

Configuration examplesThe configuration examples are described below.

Block All Automatic Failovers

In this example, the failover is blocked when a node failure or a resource failure is detected on either lktestA orlktestB. Use the Confirm failover and Block Resource failover settings for this. The configuration example isbelow.

1. Select lktestA and view Server Properties.On theGeneral tab, check the "Set Confirm Failover On"box for lktestB and the "Set Block Resource Failover on" box for both lktestA and lktestB. The settingstatus in the GUI is below.

Set Confirm Failover On Set Block Resource Failover OnlktestA (Not checked) ✔

lktestB ✔ ✔

The configuration of Server Properties for lktestA as displayed in the GUI once set.

SIOS Protection Suite for LinuxPage 44

Page 65: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Block All Automatic Failovers

*When viewing the Server Properties in the GUI the node name can be found near the top of theproperties panel display.

2. Select lktestB and view Server Properties.

OnGeneral tab, check "Set Confirm Failover On" box for lktestA. The "Set Block ResourceFailover On" property will already be set based on the actions taken in step 1.

Set Confirm Failover On Set Block Resource Failover OnlktestA (Not checked) ✔

lktestB ✔ ✔

The configuration of Server Properties for lktestB as displayed in the GUI once set.

*When viewing the Server Properties in the GUI the node name can be found near the top of theproperties panel display.

After completing these steps, confirm that the "confirmso!hostname" and " block_failover" flags are setfor each node using the flg_list command. For the confirmso flag, verify that the host name for whichfailover confirmation is to be performed is listed as part of the contents of the flag name (to blockfailover from lktestB on lktestA, lktestA should list lktestB in the contents of the confirmso flag nameon lktestA, see the table below).

Confirm Failover flag Block Resource Failover flaglktestA confirmso! lktestB block_failover

lktestB confirmso! lktestA block_failover

3. Set the values for "CONFIRMSOTO" and "CONFIRMSODEF" in /etc/default/LifeKeeper on eachnode. (Restarting LifeKeeper or rebooting the OS is not required.)

CONFIRMSODEF = 1

CONFIRMSOTO = 0

When setting the time to wait value, it is specified in seconds via CONFIRMSOTO. For the defaultaction to be taken on failover, the CONFIRMSODEF settingmust be either 0 (failover is executed) or 1(failover is blocked).

With the above settings any node failure will be immediately blocked without any operator intervention.

SIOS Protection Suite for LinuxPage 45

Page 66: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Block Failovers in One Direction

Block Failovers in One Direction

In this example, the failover to lktestB is blocked when a node failure or a resource failure is detected onlktestA. On the contrary, the failover to lktestA is allowed when a node failure or a resource failure is detectedon lktestB.

1. Select lktestA and view Server Properties.

2. OnGeneral tab, check "Set Confirm Failover On" box for lktestB and the "Set Block ResourceFailover On" for lktestA.

Set Confirm Failover On Set Block Resource Failover OnlktestA (Not checked) ✔

lktestB ✔ (Not checked)

The configuration of Server Properties for lktestB as displayed in the GUI once set.

3. Select lktestB and view Properties.

In theGeneral tab, the "Set Confirm Failover On" box for lktestA should already be set (from theaction taken on lktestA).

Set Confirm Failover On Set Block Resource Failover OnlktestA (Not checked) (Not checked)

lktestB (Not checked) ✔

The configuration of Server Properties for lktestB as displayed in the GUI once set.

*In the GUI, the local host name is listed first.

For this configuration, verify that the "confirmso!lktestA" flag is set on lktestB (no confirmso flagshould be set on lktestrA) and the block failover flag is set on lktestA.

SIOS Protection Suite for LinuxPage 46

Page 67: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Setting Server Shutdown Strategy

Confirm Failover flag Block Resource Failover flaglktestA N/A block_failover

lktestB confirmso! lktestA N/A

4. Set the values for "CONFIRMSODEF" and "CONFIRMSOTO" in /etc/default/LifeKeeper on lktestB.

CONFIRMSODEF = 0

CONFIRMSOTO = 0

For this configuration resource andmachine failovers from lktestA to lktestB are allowed. Resource andmachine failovers from lktestB to lktestA are prevented.

Setting Server Shutdown StrategyThe Shutdown Strategy is a LifeKeeper configuration option that governs whether or not resources areswitched over to a backup server when a server is shut down. The options are:

Do Not Switch Over Resources(default)

LifeKeeper will not bring resources in service on a backup server duringan orderly shutdown.

Switch Over Resources LifeKeeper will bring resources in service on a backup server during anorderly shutdown.

The Shutdown Strategy is set by default to "Do Not Switch Over Resources." You should decide whichstrategy you want to use on each server in the cluster, and if you wish, change the Shutdown Strategyto"Switch Over Resources".

For each server in the cluster:

1. On the Edit Menu, point toServer and then click Properties.

2. Select the server to bemodified.

3. On theGeneral Tab of theServer Properties dialog, select theShutdown Strategy.

Note: The LifeKeeper process must be running during an orderly shutdown for the Shutdown Strategy to havean effect.

Note:SomeAmazon EC2 configurations have issues when the Shutdown Strategy is set to "Do notSwitchover Resources". For the detailed information,see Troubleshooting > Known issues and restrictions.

SIOS Protection Suite for LinuxPage 47

Page 68: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Tuning the LifeKeeper Heartbeat

Tuning the LifeKeeper Heartbeat

Overview of the Tunable HeartbeatThe LifeKeeper heartbeat is the signal sent between LifeKeeper servers over the communications path(s) toensure each server is "alive". There are two aspects of the heartbeat that determine how quickly LifeKeeperdetects a failure:

l Interval: the time interval between heartbeats signal sent (unit is second). Failing to receive the LCMsignal, which includes heartbeat signal, from another server within the interval time is determined as amissed heartbeat.

l Number of Heartbeats:the consecutive number of heartbeats by which the communications path isdetermined as dead, triggering a failover.

The heartbeat values are specified by two tunables in the LifeKeeper defaults file /etc/default/LifeKeeper.These tunables can be changed if you wish LifeKeeper to detect a server failure sooner than it would using thedefault values:

l LCMHBEATTIME (interval)

l LCMNUMHBEATS (number of heartbeats)

The following table summarizes the defaults andminimum values for the tunables over both TCP and TTYheartbeats. The interval for a TTY communications path cannot be set below 2 seconds because of the slowernature of themedium.

Tunable Default Value Minimum Value

LCMHBEATTIME 5 1 (TCP) 2 (TTY)

LCMNUMHBEATS 3 2 (TCP or TTY)

Important Note: The values for both tunables MUST be the SAME on all servers in the cluster.

ExampleConsider a LifeKeeper cluster in which both intervals are set to the default values. LifeKeeper sends aheartbeat between servers every 5 seconds. If a communications problem causes the heartbeat to skip twobeats, but it resumes on third heartbeat, LifeKeeper takes no action. However, if the communications pathremains dead for 3 beats, LifeKeeper will label that communications path as dead, but will initiate a failoveronly if the redundant communications path is also dead.

Configuring the HeartbeatYoumust manually edit file /etc/default/LifeKeeper to add the tunable and its associated value. Normally, thedefaults file contains no entry for these tunables; you simply append the following lines with the desired valueas follows:

SIOS Protection Suite for LinuxPage 48

Page 69: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Configuration Considerations

LCMHBEATTIME=x

LCMNUMHBEATS=y

If you assign the value to a number below theminimum value, LifeKeeper will ignore that value and use theminimum value instead.

Configuration Considerationsl If you wish to set the interval at less than 5 seconds, then you should ensure that the communicationspath is configured on a private network, since values lower than 5 seconds create a high risk of falsefailovers due to network interruptions.

l Testing has shown that setting the number of heartbeats to less than 2 creates a high risk of falsefailovers. This is why the value has been restricted to 2 or higher.

l The values for both the interval and number of heartbeats MUST be the SAME on all servers in thecluster in order to avoid a false failovers. Because of this, LifeKeeper must be shutdown on bothservers before editing these values. If you wish to edit the heartbeat tunables after LifeKeeper is inoperation with protected applications, youmay use the command /etc/init.d/lifekeeperstop-daemons, which stops LifeKeeper but does not bring down the protected applications.

l LifeKeeper does not impose an upper limit for the LCMHBEATTIME and LCMNUMHBEATS values.But setting these values at a very high number can effectively disable LifeKeeper's ability to detect afailure. For instance, setting both values to 25 would instruct LifeKeeper to wait 625 seconds (over 10minutes) to detect a server failure, whichmay be enough time for the server to re-boot and re-join thecluster.

Note: If you are using both TTY and TCP communications paths, the value for each tunable applies to bothcommunications paths. The only exception is if the interval value is below 2, which is theminimum for a TTYcommunications path.

For example, suppose you specify the lowest values allowed by LifeKeeper in order to detect failure asquickly as possible:

LCMHBEATTIME=1

LCMNUMHBEATS=2

LifeKeeper will use a 1 second interval for the TCP communications path, and a 2 second interval for TTY. Inthe case of a server failure, LifeKeeper will detect the TCP failure first because its interval is shorter (2heartbeats that are 1 second apart), but then will do nothing until it detects the TTY failure, which will be after 2heartbeats that are 2 seconds apart.

Using Custom Certificates with the SPS APIBeginning with Release 7.5, the SIOS Protection Suite (SPS) API uses SSL/TLS to communicate betweendifferent systems. Currently, this API is only partially used and is reserved for internal use only but may beopened up to customer and third party usage in a future release. By default, the product is installed withdefault certificates that provide some assurance of identity between nodes. This document explains how toreplace these default certificates with certificates created by your ownCertificate Authority (CA).

SIOS Protection Suite for LinuxPage 49

Page 70: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

How Certificates Are Used

Note: Normal SPS communication does not use these certificates.

How Certificates Are UsedIn cases where SSL/TLS is used for communications between SPS servers to protect the data beingtransferred, a certificate is provided by systems to identify themselves. The systems also use a CAcertificate to verify the certificate that is presented to them over the SSL connection.

Three certificates are involved:

l /opt/LifeKeeper/etc/certs/LK4LinuxValidNode.pem (server certificate)

l /opt/LifeKeeper/etc/certs/LK4LinuxValidClient.pem (client certificate)

l /opt/LifeKeeper/etc/certs/LKCA.pem (certificate authority)

The first two certificates must be signed by the CA certificate to satisfy the verification performed by theservers. Note that the common name of the certificates is not verified, only that the certificates are signed bythe CA.

Using Your Own CertificatesIn some installations, it may be necessary to replace the default certificates with certificates that are createdby an organization's internal or commercial CA. If this is necessary, replace the three certificates listed abovewith new certificates using the same certificate file names. These certificates are of the PEM type. TheLK4LinuxValidNode.pem and LK4LinuxValidClient.pem each contain both their respective key and certificate.The LK4LinuxValidNode.pem certificate is a server type certificate. LK4LinuxValidClient.pem is a client typecertificate.

If the default certificates are replaced, SPS will need to be restarted to reflect the changes. If the certificatesaremisconfigured, SIOS-lighttpd daemonwill not start successfully and errors will be received in theLifeKeeper log file. If problems arise, refer to this log file to see the full command that should be run.

Linux Configuration

Operating SystemThe default operating systemmust be installed to ensure that all required packagesare installed. Theminimal operating system install does not contain all of therequired packages, and therefore, cannot be used with LifeKeeper.

SIOS Protection Suite for LinuxPage 50

Page 71: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Linux Configuration

Kernel updates

In order to provide the highest level of availability for a LifeKeeper cluster, the kernelversion used on a system is very important.  The table below lists each supporteddistribution and version with the kernel that has passed LifeKeeper certificationtesting.

Note:Beginning with SPS 8.1, when performing a kernel upgrade on RedHatEnterprise Linux and supported Red Hat Enterprise Linux derivatives (CentOS andOEL), it is no longer a requirement that the setup script (./setup) from the installationimage be rerun. Modules should be automatically available to the upgraded kernelwithout any intervention as long as the kernel was installed from a proper Red Hatpackage (rpm file)." There are no SPS kernel module requirements for SUSE LinuxEnterprise Server.

Distribution/Version SupportedVersion Supported Kernels

RedHat Enterprise Linux andRedHat Enterprise LinuxAdvanced Platform forAMD64/EM64T

55.15.25.35.45.55.65.75.85.95.105.11

2.6.18-8.el5 2.6.18-8.1.1.el5 (default kernel)2.6.18-53.el52.6.18-92.el5 2.6.18-128.el52.6.18-164.el52.6.18-194.el52.6.18-238.el52.6.18-274.el52.6.18-308.el52.6.18-348.el52.6.18-371.el52.6.18-398.el5

Red Hat Enterprise Linux forAMD64/EM64T

(*6.0 is Not Recommended)

6.0*6.16.26.36.46.56.66.76.86.9

2.6.32-71.el62.6.32-131.17.1.el62.6.32-220.el62.6.32-279.el62.6.32-358.el62.6.32-431.el62.6.32-504.el62.6.32-573.el62.6.32-642.el62.6.32-696.el6

Red Hat Enterprise Linux forAMD64/EM64T

77.17.27.37.4

3.10.0-123.el73.10.0-229.el73.10.0-327.el73.10.0-514.el73.10.0-693.el7

SIOS Protection Suite for LinuxPage 51

Page 72: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Linux Configuration

Distribution/Version SupportedVersion Supported Kernels

SUSE Linux Enterprise Server11 for x86_64

SP1SP2SP3SP4

2.6.27.19-52.6.32.12-0.73.0.42-0.7.33.0.76-0.11.13.0.101-0.63.1

SUSE Linux Enterprise Server12 for x86_64

SP1SP2

* SLES12.0 isnot supported.

3.12.49-114.4.21-69

Oracle Enterprise Linux for x86_64

55.15.25.35.45.55.65.75.85.95.105.11

2.6.18-8.el5 2.6.18-53.el52.6.18-92.el52.6.18-128.el52.6.18-164.el52.6.18-194.el52.6.18-238.el52.6.18-274.el52.6.18-308.el52.6.18-348.el52.6.18-371.el52.6.18-398.el5

Oracle Linux

6.36.46.56.66.76.86.9

UEK R3UEK R4

2.6.32-279.el62.6.32-358.el62.6.32-431.el62.6.32-504.el62.6.32-573.el62.6.32-642.el62.6.32-696.el63.8.13-16.2.1.el6uek4.1.12-37.3.1.el6uek

Oracle Linux

77.17.27.3

UEK R3UEK R4

3.10.0-123.el73.10.0-229.el773.10.0-327.el73.10.0-514.el73.8.13-16.2.1.el6uek4.1.12-37.3.1.el6uek

SIOS Protection Suite for LinuxPage 52

Page 73: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Linux Configuration

Distribution/Version SupportedVersion Supported Kernels

The Community ENTerpriseOperating System (CentOS) forx86_64

55.15.25.35.45.55.65.75.85.95.105.11

2.6.18-8.el52.6.18-53.el52.6.18-92.1.10.el52.6.18-128.el52.6.18-164.2.1.el52.6.18-194.el52.6.18-238.el52.6.18-274.3.1.el52.6.18-308.el52.6.18-348.el52.6.18-371.el52.6.18-398.el5

The Community ENTerpriseOperating System (CentOS) forx86_64

(*6.0 - DataKeeper Configuration isNotSupported)

6.0*6.16.26.36.46.56.66.76.86.9

2.6.32-71.el62.6.32-131.el62.6.32-220.el62.6.32-279.2.1.el62.6.32-358.el62.6.32-431.el62.6.32-504.el62.6.32-573.el62.6.32-642.el62.6.32-696.el6

The Community ENTerpriseOperating System (CentOS) forx86_64

77.17.27.3

3.10.0-123.el73.10.0-229.el73.10.0-327.el73.10.0-514.el7

Note: This list of supported distributions and kernels is for LifeKeeper only. Youshould also determine and adhere to the supported distributions and kernels for yourserver and storage hardware as specified by themanufacturer.

Note: If both UEK (Unbreakable Enterprise Kernel) and RHCK (Red HatCompatible Kernel) installed in Oracle Linux, the OS is required to be started inadvance in RHCK when installing LifeKeeper. Activate it with UEK after theinstallation completed, then the resource configuration and operation can beavailable.

SIOS Protection Suite for LinuxPage 53

Page 74: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Linux Configuration

LUN support

The Linux SCSI driver has several parameters that control which devices will beprobed for Logical Units (LUNs):

l List of devices that do not support LUNs – this list of devices are known toNOT support LUNs, so the SCSI driver will not allow the probing of thesedevices for LUNs.

l List of devices that do support LUNs – this list of devices is known tosupport LUNs well, so always probe for LUNs.

l Probe all LUNs on each SCSI device – if a device is not found on either list,whether to probe or not.  This parameter is configured by make config in theSCSI module section.

While most distributions (including SUSE) have the Probe all LUNs setting enabledby default, Red Hat has the setting disabled by default.  External RAID controllersthat are typically used in LifeKeeper configurations to protect data are frequentlyconfigured with multiple LUNs (Logical Units).  To enable LUN support, this fieldmust be selected and the kernel remade.

To enable Probe all LUNs without rebuilding the kernel or modules, set the variablemax_scsi_luns to 255 (which will cause the scan for up to 255 LUNs).  To set themax_scsi_luns on a kernel where the scsi driver is amodule (e.g. Red Hat), add thefollowing entry to /etc/modules.conf, rebuild the initial ramdisk and reboot loadingthat ramdisk:

options scsi_mod max_scsi_luns=255

To set themax_scsi_luns on a kernel where the scsi driver is compiled into thekernel (e.g. SUSE), add the following entry to /etc/lilo.conf:

append="max_scsi_luns=255"

Note: For some devices, scanning for 255 LUNs can have an adverse effect onboot performance (in particular devices with the BLIST_SPARSELUN defined). The Dell PV650F is an array where this has been experienced.  To avoid thisperformance problem, set themax_scsi_luns to themaximum number of LUNs youhave configured on your arrays such as 16 or 32.  For example,

append="max_scsi_luns=16"

Testingenvironment ofchannel bonding,network teaming

In LifeKeeper, we performed tests in the environment using the channel bonding ornetwork teaming with the following settings:

l Bonding policy in channel bonding

+ balance-rr

+ active-backup

l Runner in network teaming

+ round-robin

+ active-backup

SIOS Protection Suite for LinuxPage 54

Page 75: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Data Replication Configuration

Data Replication ConfigurationItem Description

SIOS DataKeeperFeature/ Dis-tribution Matrix

SIOS DataKeeper supports Linux kernel versions 2.6 and higher. SeveralDataKeeper features have additional minimum kernel requirements.

l Bitmapmerge* (kernel 2.6.27+)To operate on RHEL 5.x, RHEL 5.4 and later versions are supported(Appliesto RHEL 5.4 or later.  Bitmapmerging code was backported into the Red HatEL5 Update 4 kernel by Red Hat.).

SIOS DataKeeperdocumentation

The documentation for SIOS DataKeeper is located within the SIOS ProtectionSuite Technical Documentation on the SIOS Technology Corp. Website.

Network ConfigurationItem Description

IP RecoveryKit impacton routingtable

LifeKeeper-protected IP addresses are implemented on Linux as logical interfaces. When alogical interface is configured on Linux, a route to the subnet associated with the logicalinterface is automatically added to the routing table, even if a route to that subnet alreadyexists (for example, through the physical interface). This additional route to the subnet couldpossibly result in multiple routing-table entries to the same subnet.

If an application is inspecting and attempting to verify the address from which incomingconnections aremade, themultiple routing-table entries could cause problems for suchapplications on other systems (non-LifeKeeper installed) to which the LifeKeeper systemmay be connecting. Themultiple routing table entries canmake it appear that theconnection was made from the IP address associated with the logical interface rather thanthe physical interface.

IP subnetmask

For IP configurations under LifeKeeper protection, if the LifeKeeper-protected IP address isintended to be on the same subnet as the IP address of the physical interface on which it isaliased, the subnet mask of the two addresses must be the same. Incorrect settings of thesubnet mask may result in connection delays and failures between the LifeKeeper GUIclient and server.

EEpro100driverinitialization

The Intel e100 driver should be installed to resolve initialization problems with the eepro100driver on systems with Intel Ethernet Interfaces. With the eepro100 driver, the followingerrors may occur when the interface is started at boot time and repeat continuously until theinterface is shut down.

eth0: card reports no Rx buffers

eth0: card reports no resources

SIOS Protection Suite for LinuxPage 55

Page 76: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Application Configuration

Application ConfigurationItem Description

Databaseinitializationfiles

The initialization files for databases need to be either on a shared device and symbolicallylinked to specified locations in the local file system or kept on separate systems andmanually updated on both systems when changes need to be implemented.

LocalizedOraclemountpoints

Localized Oracle environments are different depending on whether you connectas internal or as sysdba. A database on a localizedmount point must be created with“connect / as sysdba” if it is to be put under LifeKeeper protection.

Apacheupdates

Upgrading an SPS protected Apache application as part of upgrading the Linux operatingsystem requires that the default server instance be disabled on start up.

If your configuration file (httpd.conf) is in the default directory (/etc/httpd/conf), the Red Hatupgrade will overwrite the config file. Therefore, you shouldmake a copy of the file beforeupgrading and restore the file after upgrading.

Also, see the Specific Configuration Considerations for ApacheWeb Server section inthe ApacheWeb Server Recovery Kit Administration Guide.

Storage and Adapter ConfigurationItem Description

Multipath I/O and Redundant Controllers

There are several multipath I/O solutions eitheralready available or currently being developed for theLinux environment.  SIOS Technology Corp. isactively working with a number of server vendors,storage vendors, adapter vendors and drivermaintainers to enable LifeKeeper to work with theirmultipath I/O solutions.  LifeKeeper’s use of SCSIreservations to protect data integrity presents somespecial requirements that frequently are not met bythe initial implementation of these solutions.

Refer to the technical notes below for supported diskarrays to determine if a given array is supported withmultiple paths and with a particular multipathsolution.  Unless an array is specifically listed asbeing supported by LifeKeeper with multiple pathsand with a particular multipath solution, it must beassumed that it is not.

SIOS Protection Suite for LinuxPage 56

Page 77: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Storage and Adapter Configuration

Item Description

Heavy I/O inMultipath Configurations

Inmultipath configurations, performing heavy I/Owhile paths are beingmanipulated can cause asystem to temporarily appear to be unresponsive. When themultipath softwaremoves the access of aLUN from one path to another, it must alsomove anyoutstanding I/Os to the new path.  The rerouting ofthe I/Os can cause a delay in the response times forthese I/Os.  If additional I/Os continue to be issuedduring this time, they will be queued in the systemand can cause a system to run out of memoryavailable to any process.  Under very heavy I/Oloads, these delays and low memory conditions cancause the system to be unresponsive such thatLifeKeeper may detect a server as down and initiatea failover.

There aremany factors that will affect the frequencyat which this issuemay be seen. 

l The speed of the processor will affect howfast I/Os can be queued.  A faster processormay cause the failure to be seenmorefrequently. 

l The amount of systemmemory will affecthow many I/Os can be queued before thesystem becomes unresponsive.  A systemwith morememory may cause the failure tobe seen less frequently.

l The number of LUNs in use will affect theamount of I/O that can be queued.

l Characteristics of the I/O activity will affectthe volume of I/O queued.  In test caseswhere the problem has been seen, the testwas writing an unlimited amount of data to thedisk. Most applications will both read andwrite data.  As the reads are blocked waitingon the failover, writes will also be throttled,decreasing the I/O rate such that the failuremay be seen less frequently.

For example, during testing of the IBM DS4000multipath configuration with RDAC, when the I/Othroughput to the DS4000 was greater than 190MBper second and path failures were simulated,LifeKeeper would (falsely) detect a failed server

SIOS Protection Suite for LinuxPage 57

Page 78: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Storage and Adapter Configuration

Item Descriptionapproximately one time out of twelve.  The serversused in this test were IBM x345 servers with dualXeon 2.8GHz processors and 2GB of memoryconnected to a DS4400 with 8 volumes (LUNs) perserver in use.  To avoid the failovers, the LifeKeeperparameter LCMNUMHBEATS(in /etc/default/LifeKeeper) was increasedto 16.  The change to this parameter results inLifeKeeper waiting approximately 80 seconds beforedetermining that an unresponsive system is dead,rather than the default wait time of approximately 15seconds.

Special Considerations for Switchovers with LargeStorage Configurations

With some large storage configurations (for example,multiple logical volume groups with 10 or more LUNsin each volume group), LifeKeeper may not be able tocomplete a sendevent within the default timeout of300 seconds when a failure is detected.  This resultsin the switchover to the backup system failing.  Allresources are not brought in-service and an errormessage is logged in the LifeKeeper log.

The recommendation with large storageconfigurations is to change SCSIERROR from“event” to “halt” in the/etc/default/LifeKeeper file.  This will causeLifeKeeper to perform a “halt” on a SCSI error. LifeKeeper will then perform a successful failover tothe backup system.

SIOS Protection Suite for LinuxPage 58

Page 79: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Storage and Adapter Configuration

Item Description

HP 3PAR StoreServ 7200 FC

TheHP 3PAR StoreServ 7200 was tested by aSIOS Technology Corp. partner with the followingconfigurations:

HP 3PAR StoreServ 7200 (Firmware (HP 3PAR OS)version 3.1.2) using QLogic QMH2572 8Gb FC HBAfor HP BladeSystem c-Class (Firmware version5.06.02 (90d5)), driver version 8.03.07.05.06.2-k(RHEL bundled) with DMMP (device-mapper-1.02.66-6, device-mapper-multipath-0.4.9-46.el6).

The test was performed with SPS for Linux v8.1.1using RHEL 6.2 (x86_64).

Note: 3PAR StoreServ 7200 returns a reservationconflict with the default path checker. To avoid thisconflict, set the following parameter in"/etc/default/LifeKeeper":

DMMP_REGISTRATION_TYPE=hba

HP 3PAR StoreServ 7400 FC

TheHP 3PAR StoreServ 7400 was tested by aSIOS Technology Corp. partner with the followingconfigurations:

HP 3PAR StoreServ 7400 (Firmware (HP 3PAR OS)version 3.1.2) with HP DL380pGen8 with EmulexLightPulse Fibre Channel SCSI HBA (driver version8.3.5.45.4p) with DMMP (device-mapper-1.02.66-6,device-mapper-multipath-0.4.9-46.el6).

The test was performed with LifeKeeper for Linuxv8.1.1 using RHEL 6.2 (x86_64).

Note: 3PAR StoreServ 7400 returns a reservationconflict with the default path checker. To avoid thisconflict, set the following parameter in"/etc/default/LifeKeeper":

DMMP_REGISTRATION_TYPE=hba

And user friendly devicemapping are not supported.Set the following parameter in "multipath.conf"

"user_friendly_names no"

SIOS Protection Suite for LinuxPage 59

Page 80: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Storage and Adapter Configuration

Item Description

HP 3PAR StoreServ 7400 iSCSI (multipath con-figuration using the DMMP Recovery Kit)

The HP 3PAR StoreServ 7400 was tested by aSIOS Technology Corp. partner with the followingconfigurations:

HP 3PAR StoreServ 7400 (Firmware (HP 3PAR OS)version 3.1.3) using HP Ethernet 10Gb 2-port560SFP+ Adapter (Networkdriver ixgbe-3.22.0.2)with iSCSI (iscsi-initiator-utils-6.2.0.873-10.el6.x86_64), DMMP (device-mapper-1.02.79-8.el6,device-mapper-multipath-0.4.9-72.el6).

Note: 3PAR StoreServ 7400 iSCSI returns areservation conflict. To avoid this conflict, set thefollowing parameter in "/etc/default/LifeKeeper":

DMMP_REGISTER_IGNORE=TRUE

HP 3PAR StoreServ 7400 iSCSI(using Quor-um/Witness Kit)

The HP 3PAR StoreServ 7400 was tested by aSIOS Technology Corp. partner with the followingconfigurations:

iSCSI (iscsi-initiator-utils-6.2.0.872-21.el6.x86_64),DMMP (device-mapper-multipath-0.4.9-41, device-mapper-1.02.62-3), nx_nic v4.0.588.

DMMP with the DMMP Recovery Kit on RHEL 6.1 --must be used with the combination ofQuorum/Witness Server Kit and STONITH. Todisable SCSI reservation, setRESERVATIONS=none in"/etc/default/LifeKeeper". Server musthave interface based on IPMI 2.0.

SIOS Protection Suite for LinuxPage 60

Page 81: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Storage and Adapter Configuration

Item Description

HP 3PAR StoreServ 10800 FC

TheHP 3PAR StoreServ 10800 FC was tested by aSIOS Technology Corp. partner with the followingconfigurations:

Firmware (HP 3PAR OS) version 3.1.2 with HPDL380pGen8 with Emulex LightPulse Fibre ChannelHBA (driver version 8.3.5.45.4p) with DMMP(device-mapper-1.02.66-6, device-mapper-multipath-0.4.9-46 el6). The test was performed with SPS forLinux v8.1.2 using RHEL 6.2 (x86_64).

Note: 3PAR StoreServ 10800 FC returns areservation conflict with the default path checker. Toavoid this conflict, set the following parameter in"/etc/default/LifeKeeper":

DMMP_REGISTRATION_TYPE=hba

And user friendly devicemapping are not supported.Set the following parameter in "multipath.conf"

"user_friendly_names no"

HP MSA1040/2040fc

The HP MSA 2040 Storage FC was tested by aSIOS Technology Corp. partner with the followingconfigurations:

HP MSA 2040 Storage FC (Firmware GL101R002)using HP SN1000Q 16Gb 2P FC HBA QW972A(Firmware version 6.07.02, driver version8.04.00.12.06.0-k2 (RHEL bundled)) with DMMP(device-mapper-1.02.74-10, device-mapper-multipath-0.4.9-56).

The test was performed with LifeKeeper for Linuxv8.1.2 using RHEL 6.3 (X86_64).

SIOS Protection Suite for LinuxPage 61

Page 82: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Storage and Adapter Configuration

Item Description

HP P9500/XP

Certified by Hewlett-Packard Company using SIOSLifeKeeper for Linux v7.2 or later.  Model tested wasthe HP P9500/XP and has been qualified to workwith LifeKeeper on the following:

l RedHat Enterprise for 32-bit, x64 (64-bit;Opteron and Intel EMT64)

            RHEL 5.3, RHEL 5.4, RHEL 5.5

l SuSE Enterprise Server for 32-bit, x64 (64-bit;Opteron and Intel EMT64)

            SLES 10 SP3, SLES 11, SLES 11 SP1

l Native or Inbox Clustering Solutions RHCSand SLE HA

HP StoreVirtual 4330 iSCSI (multipath configurationusing the DMMP Recovery Kit)

The HP StoreVirtual 4330 was tested by a SIOSTechnology Corp. partner with the followingconfigurations:

HP StoreVirtual 4330 (Firmware HP LeftHandOS10.5) using HP Ethernet 1Gb 4-port 331FLR(Networkdriver tg3-3.125g) with iSCSI (iscsi-initiator-utils-6.2.0.872-41.el6.x86_64), DMMP(device-mapper-1.02.74-10.el6,device-mapper-multipath-0.4.9-56.el6).

StoreVirtual (LeftHand) series OS (SAN/iQ) version11.00 iSCSI (multipath configuration using theDMMP Recovery Kit)

OS (SAN/iQ) version 11.00 is supported in HPStoreVirtual (LeftHand) storage. All StoreVirtualseries are supported, including StoreVirtual VSA asthe virtual storage appliance. This storage wastested with the following configurations:

StoreVirtual VSA(11.0.00.1263.0) + RHEL 6.4(x86_64)+ DMMP(device-mapper-1.02.77-9.el6.x86_64,device-mapper-multipath-0.4.9-64.el6.x86_64)

HP StoreVirtual 4730 iSCSI (multipath configurationusing the DMMP Recovery Kit)

The HP StoreVirtual 4730 was tested by a SIOSTechnology Corp. partner with the followingconfigurations:

HP StoreVirtual 4730 (Firmware HP LeftHandOS11.5) using HP FlexFabric 10Gb 2-port 536FLBAdapter (Networkdriver bnx2x-1.710.40) with iSCSI(iscsi-initiator-utils-6.2.0.873-10.el6.x86_64), DMMP(device-mapper-1.02.79-8.el6,device-mapper-multipath-0.4.9-72.el6).

SIOS Protection Suite for LinuxPage 62

Page 83: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Storage and Adapter Configuration

Item Description

HP StoreVirtual LeftHandOS version 11.5 iSCSI(multipath configuration using the DMMP RecoveryKit)

LeftHandOS version 11.5 is supported in HPStoreVirtual (LeftHand) storage. All StoreVirtualseries are supported, including StoreVirtual VSA asthe virtual storage appliance. This storage wastested with the following configurations:

StoreVirtual 4730(11.5.00.0673.0) + RHEL 6.5(x86_64)+ DMMP(device-mapper-1.02.79-8.el6.x86_64,device-mapper-multipath-0.4.9-72.el6.x86_64)

HP StoreVirtual 4330 iSCSI (using Quorum/WitnessKit)

The HP StoreVirtual 4330 was tested by a SIOSTechnology Corp. partner with the followingconfigurations:

HP StoreVirtual 4330 (Firmware HP LeftHandOS10.5) using iSCSI (iscsi-initiator-utils-6.2.0.872-41.el6.x86_64), bonding(version: 3.6.0),tg3(version:3.125g)To disable SCSI reservation, setRESERVATIONS=none in"/etc/default/LifeKeeper".

IBM San VolumeController (SVC)

Certified by partner testing in a single pathconfiguration.  Certified by SIOS Technology Corp.in multipath configurations using the DeviceMapperMultipath Recovery Kit.

IBM Storwize V7000 iSCSI

The IBM Storwize V7000 (Firmware Version 6.3.0.1)has been certified by partner testing using iSCSI(iscsi-initiator-utils-6.2.0.872-34.el6.x86_64) withDMMP (device-mapper-1.02.66-6.el6, device-mapper-multipath-0.4.9-46.el6). The test wasperformed with LifeKeeper for Linux v7.5 usingRHEL 6.2.

Restriction: IBM Storwize V7000must be used incombination with the Quorum/Witness Server Kitand STONITH. Disable SCSI reservation by settingthe following in /etc/default/LifeKeeper:

RESERVATIONS=none

IBM Storwize V7000 FC

The IBM Storwize V7000 FC has been certified bypartner testing in multipath configurations on RedHat Enterprise Linux Server Release 6.2 (Tikanga),HBA: QLE2562 DMMP: 0.4.9-46.

SIOS Protection Suite for LinuxPage 63

Page 84: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Storage and Adapter Configuration

Item Description

IBM Storwize V3700 FC

The IBM Storwize V3700 FC has been certified bypartner testing in multipath configurations on RedHat Enterprise Linux Server Release 6.5 (Santiago),HBA: QLE2560 DMMP: 0.4.9-72.

IBM XIV Storage System

Certified by partner testing in only multipath con-figuration on RedHat Enterprise Linux Release 5.6,HBA: NEC N8190-127 Single CH 4Gbps (EmulexLPe1150 equivalent), XIV Host Attachement Kit: Ver-sion 1.7.0.

Note: If you have to create over 32 LUNs on IBMXIV Storage System with LifeKeeper, please contactyour IBM sales representative for details.

Dell EqualLogicPS4000/4100/4110/6000/6010/6100/6110/6500/6510

The Dell EqualLogic was tested by a SIOSTechnology Corp. partner with the followingconfigurations: Dell EqualLogicPS4000/4100/4110/6000/6010/6100/6110/6500/6510 using DMMP with the DMMP Recovery Kit withRHEL 5.3 with iscsi-initiator-utils-6.2.0.868-0.18.el5.With a large number of luns (over 20), change theREMOTETIMEOUT setting in/etc/default/LifeKeeper to REMOTETIMEOUT=600.

Fujitsu

ETERNUS DX60 / DX80 / DX90 iSCSI

ETERNUS DX60 S2 / DX80 S2 / DX90 S2 iSCSI

ETERNUS DX410 / DX440 iSCSI

ETERNUS DX410 S2 / DX440 S2 iSCSI

ETERNUS DX8100 / DX8400 / DX8700 iSCSI

ETERNUS DX8100 S2/DX8700 S2 iSCSI

ETERNUS DX100 S3/DX200 S3 iSCSI

ETERNUS DX500 S3/DX600 S3 iSCSI

ETERNUS DX200F iSCSI

ETERNUS DX60 S3 iSCSI

When using LifeKeeper DMMP ARK for multipathconfiguration it is necessary to set the followingparameters to /etc/multipath.conf.

prio alua

path_grouping_policy group_by_prio

failback immediate

no_path_retry 10

Path_checker tur

SIOS Protection Suite for LinuxPage 64

Page 85: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Storage and Adapter Configuration

Item DescriptionFujitsu

ETERNUS DX60 / DX80 / DX90

FC, single path andmultipathconfigurations

ETERNUS DX60 S2 / DX80 S2 / DX90 S2

FC, single path andmultipathconfigurations

ETERNUS DX410 / DX440

FC, single path andmultipathconfigurations

ETERNUS DX410 S2 / DX440 S2

FC, single path andmultipathconfigurations

ETERNUS DX8100 / DX8400 / DX8700

FC, single path andmultipathconfigurations

ETERNUS DX8100 S2/DX8700 S2

FC, single path andmultipathconfigurations

ETERNUS DX100 S3/DX200 S3/DX500S3/DX600S3

FC, single path andmultipathconfigurations

ETERNUS DX200F

FC, single path andmultipathconfigurations

ETERNUS DX60 S3

FC, single path andmultipathconfigurations

ETERNUS DX8700 S3 / DX8900 S3

FC, single path andmultipathconfigurations

ETERNUS DX8700 S3 / DX8900 S3

When using LifeKeeper DMMP ARK for multipathconfiguration it is necessary to set the followingparameters to /etc/multipath.conf.

prio alua

path_grouping_policy group_by_prio

failback immediate

no_path_retry 10

Path_checker tur

When using ETERNUS Multipath Driver formultipath configuration, it is no need to setparameters to any configure file.

SIOS Protection Suite for LinuxPage 65

Page 86: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Storage and Adapter Configuration

Item DescriptioniSCSI, single path andmultipathconfigurations

NEC iStorageM10e iSCSI (Multipath configurationusing the SPS Recovery Kit)

The NEC iStorageM10e iSCSI was tested by aSIOS Technology Corp. partner with the followingconfigurations:

NEC iStorageM10e iSCSI + 1GbE NIC + iSCSI(iscsi-initiator-utils-6.2.0.873-10.el6.x86_64),SPS(sps-utils-5.3.0-0.el6,sps-driver-E-5.3.0-2.6.32.431.el6)

SIOS Protection Suite for LinuxPage 66

Page 87: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Storage and Adapter Configuration

Item Description

NEC iStorage Storage Path Savior Multipath I/O

Protecting Applications and File Systems ThatUse Multipath Devices: In order for SPS toconfigure and protect applications or file systemsthat use SPS devices, the SPS recovery kit must beinstalled.

Once the SPS kit is installed, simply creating anapplication hierarchy that uses one or more of themultipath device nodes will automatically incorporatethe new resource types provided by the SPS kit.

Multipath Device Nodes: To use the SPS kit, anyfile systems and raw devices must bemounted orconfigured on themultipath device nodes (/dev/dd*)rather than on the native /dev/sd* device nodes.

Use of SCSI-3 Persistent Reservations: The SPSkit uses SCSI-3 persistent reservations with a "WriteExclusive" reservation type. This means thatdevices reserved by one node in the cluster willremain read-accessible to other nodes in the cluster,but those other nodes will be unable to write to thedevice. Note that this does not mean that you canexpect to be able tomount file systems on thoseother nodes for ongoing read-only access.

LifeKeeper uses the sg_persist utility to issue andmonitor persistent reservations. If necessary,LifeKeeper will install the sg_persist(8) utility.

Tested Environment: The SPS kit has been testedand certified with the NEC iStorage disk array usingEmulex HBAs and Emulex lpfc driver. This kit isexpected to work equally well with other NECiStorage D, S andM supported by SPS.

[Tested Emulex HBA]

iStorage D-10===========================LP952LP9802LP1050LP1150===========================

iStorageM100===========================LPe1150LPe11002

SIOS Protection Suite for LinuxPage 67

Page 88: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

HP Multipath I/O Configurations

Item DescriptionLPe1250LPe12002LPe1105LPe1205===========================

Multipath Software Requirements: The SPS kithas been tested with SPS for Linux 3.3.001. Thereare no known dependencies on the version of theSPS package installed.

Installation Requirements:SPS softwaremust beinstalled prior to installing the SPS recovery kit.

Adding or Repairing SPS Paths:WhenLifeKeeper brings an SPS resource into service, itestablishes a persistent reservation registered toeach path that was active at that time. If new pathsare added after the initial reservation, or if failedpaths are repaired and SPS automaticallyreactivates them, those paths will not be registeredas a part of the reservation until the next LifeKeeperquickCheck execution for the SPS resource. If SPSallows any writes to that path prior to that point intime, reservation conflicts that occur will be loggedto the systemmessage file. The SPS driver will retrythese IOs on the registered path resulting in noobservable failures to the application. OncequickCheck registers the path, subsequent writeswill be successful.

Pure Storage FA-400 Series FC (Multipath con-figuration using the DMMP Recovery Kit)

By partner testing in multipath configuration of FCconnection using the DMMP Recovery Kit.

QLogic DriversFor other supported fibre channel arrays with QLogicadapters, use the qla2200 or qla2300 driver, version6.03.00 or later.

Emulex Drivers For the supported Emulex fibre channel HBAs, youmust use the lpfc driver v8.0.16 or later. 

Adaptec 29xx DriversFor supported SCSI arrays with Adaptec 29xxadapters, use the aic7xxx driver, version 6.2.0 orlater, provided with the OS distribution.

HP Multipath I/O Configurations

SIOS Protection Suite for LinuxPage 68

Page 89: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

HP Multipath I/O Configurations

Item Description

MultipathClusterInstallationUsing SecurePath

For a fresh installation of amultiple path cluster that uses Secure Path, perform thesesteps:

1. Install the OS of choice on each server.

2. Install the clustering hardware:  FCA2214 adapters, storage, switches and cables. 

3. Install the HP Platform Kit.

4. Install the HP Secure Path software.  This will require a reboot of the system. Verify that Secure Path has properly configured both paths to the storage.  SeeSecure Path documentation for further details.

5. Install LifeKeeper. 

Secure PathPersistentDevice Nodes

Secure Path supports “persistent” device nodes that are in the form of/dev/spdev/spXX where XX is the device name.  These nodes are symbolic links to thespecific SCSI device nodes /dev/sdXX.  LifeKeeper v4.3.0 or later will recognize thesedevices as if they were the “normal” SCSI device nodes /dev/sdXX.  LifeKeeper maintainsits own device name persistence, both across reboots and across cluster nodes, bydirectly detecting if a device is /dev/sda1 or /dev/sdq1, and then directly using the correctdevice node. 

Note:  Support for symbolic links to SCSI device nodes was added in LifeKeeper v4.3.0.

Active/PassiveControllers andControllerSwitchovers

TheMSA1000 implements multipathing by having one controller active with the othercontroller in standby mode. When there is a problem with either the active controller or thepath to the active controller, the standby controller is activated to take over operations. When a controller is activated, it takes some time for the controller to become ready. Depending on the number of LUNs configured on the array, this can take 30 to 90seconds.  During this time, IOs to the storage will be blocked until they can be rerouted tothe newly activated controller.

Single Path onBoot Up DoesNot CauseNotification

If a server can access only a single path to the storage when the system is loaded, therewill be no notification of this problem.  This can happen if a system is rebooted wherethere is a physical path failure as noted above, but transient path failures have also beenobserved.  It is advised that any time a system is loaded, the administrator should checkthat all paths to the storage are properly configured, and if not, take actions to either repairany hardware problems or reload the system to resolve a transient problem.

SIOS Protection Suite for LinuxPage 69

Page 90: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

Hitachi HDLM Multipath I/O Configurations

ProtectingApplications andFile Systems ThatUseMultipathDevices

In order for LifeKeeper to configure and protect applications or file systems that useHDLM devices, the HDLM Recovery Kit must be installed.

Once the HDLM Kit is installed, simply creating an application hierarchy that uses oneor more of themultipath device nodes will automatically incorporate the new resourcetypes provided by the HDLM Kit.

Multipath DeviceNodes

To use the HDLM Kit, any file systems and raw devices must bemounted orconfigured on themultipath device nodes (/dev/sddlm*) rather than on the native/dev/sd* device nodes.

Use of SCSI-3PersistentReservations

The HDLM Kit uses SCSI-3 persistent reservations with a "Write Exclusive"reservation type. This means that devices reserved by one node in the cluster willremain read-accessible to other nodes in the cluster, but those other nodes will beunable to write to the device. Note that this does not mean that you can expect to beable tomount file systems on those other nodes for ongoing read-only access.

LifeKeeper uses the sg_persist utility to issue andmonitor persistent reservations. Ifnecessary, LifeKeeper will install the sg_persist(8) utility.

HardwareRequirements

The HDLM Kit has been tested and certified with the Hitachi SANRISE AMS1000 diskarray using QLogic qla2432 HBAs and 8.02.00-k5-rhel5.2-04 driver and Silkworm3800FC switch. This kit is expected to work equally well with other Hitachi disk arrays. TheHDLM Kit has also been certified with the SANRISE AMS series, SANRISE USP andHitachi VSP. The HBA and the HBA driver must be supported by HDLM.

BR1200 is certified by Hitachi Data Systems. Both single path andmultipathconfiguration require the RDAC driver. Only the BR1200 configuration using theRDAC driver is supported, and the BR1200 configuration using HDLM (HDLM ARK) isnot supported.

SIOS Protection Suite for LinuxPage 70

Page 91: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

Multipath SoftwareRequirements

The HDLM kit has been tested with HDLM for Linux as follows:

05-80, 05-81, 05-90, 05-91, 05-92, 05-93, 05-94, 6.0.0, 6.0.1, 6.1.0, 6.1.1, 6.1.2, 6.2.0,6.2.1, 6.3.0, 6.4.0, 6.4.1, 6.5.0, 6.5.1, 6.5.2, 6.6.0, 6.6.2, 7.2.0, 7.2.1, 7.3.0, 7.3.1,7.4.0, 7.4.1, 7.5.0, 7.6.0, 7.6.1, 8.0.0, 8.0.1, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.2.0,8.2.1, 8.4.0, 8.5.0, 8.5.1, 8.5.2

There are no known dependencies on the version of the HDLM package installed.

Note: The product name changed to “Hitachi Dynamic Link Manager Software(HDLM)” for HDLM 6.0.0 and later. Versions older than 6.0.0 (05-9x) are named“Hitachi HiCommandDynamic Link Manager (HDLM)”.

Note: HDLM version 6.2.1 or later is not supported by HDLM Recovery Kit v6.4.0-2. Ifyou need to use this version of HDLM, you can use HDLM Recovery Kit v7.2.0-1 orlater with LK Core v7.3 or later.

Note: If using LVM with HDLM, the version supported by HDLM is necessary. Also, afilter setting should be added to /etc/lvm/lvm.conf to ensure that the systemdoes not detect the /dev/sd* corresponding to the /dev/sddlm*. If you needmoreinformation, please see "LVM Configuration" in the HDLMmanual.

SIOS Protection Suite for LinuxPage 71

Page 92: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

Linux DistributionRequirements

Linux Distribution Requirements 

HDLM is supported in the following distributions:

RHEL 4 (AS/ES) (x86 or x86_64) Update 1, 2, 3, 4, Update 4 Security Fix (*2), 4.5,4.5Security Fix(*4),4.6,4.6 Security Fix(*8),4.7,4.7 Security Fix(*9), 4.8,4.8 Security Fix(*12) (x86/x86_64)(*1)

RHEL 5, 5.1, 5.1 Security Fix(*5), 5.2, 5.2 Security Fix(*6), 5.3, 5.3 Security Fix(*10),5.4 , 5.4 Security Fix(*11), 5.5, 5.5 Security Fix(*13), 5.6, 5.6 Security Fix(*14),5.7 (x86/x86_64)(*1), 5.8 (x86/x86_64)(*1)

RHEL 6, 6.1, 6.2 , 6.3, 6.4, 6.5 (x86/x86_64)(*1)(*15)

RHEL 7, 7.1, 7.2, 7.3(x86_64)(*1)

(*1) AMD Opteron(Single Core,Dual Core) or Intel EM64T architecture CPU withx86_64 kernel.

(*2) The following kernels are supportedx86:2.6.9-42.0.3.EL,2.6.9-42.0.3.ELsmp,2.6.9-42.0.3.ELhugememx86_64:2.6.9-42.0.3.EL,2.6.9-42.0.3.ELsmp,2.6.9-42.0.3.Ellargesmp

(*3) Hitachi does not support RHEL4U2 environment

(*4) The following kernels are supportedx86:2.6.9-55.0.12.EL,2.6.9-55.0.12.ELsmp,2.6.9-55.0.12.Elhugememx86_64:2.6.9-55.0.12.EL,2.6.9-55.0.12.ELsmp,2.6.9-55.0.12.Ellargesmp

(*5) The following kernels are supportedx86:2.6.18-53.1.13.el5,2.6.18-53.1.13.el5PAE,2.6.18-53.1.21.el5,2.6.18-53.1.21.el5PAEx86_64:2.6.18-53.1.13.el5,2.6.18-53.1.21.el5

(*6) The following kernels are supportedx86:2.6.18-92.1.6.el5,2.6.18-92.1.6.el5PAE,2.6.18-92.1.13.el5,2.6.18-92.1.13.el5PAE,2.6.18-92.1.22.el5,2.6.18-92.1.22.el5PAEx86_64:2.6.18-92.1.6.el5,2.6.18-92.1.13.el5,2.6.18-92.1.22.el5

(*7) The following kernels are supportedx86:2.6.9-34.0.2.EL,2.6.9-34.0.2.ELsmp,2.6.9-34.0.2.ELhugememx86_64:2.6.9-34.0.2.EL,2.6.9-34.0.2.ELsmp,2.6.9-34.0.2.Ellargesmp

(*8) The following kernels are supportedx86:2.6.9-67.0.7.EL,2.6.9-67.0.7.ELsmp,2.6.9-67.0.7.ELhugemem,2.6.9-67.0.22.EL,2.6.9-67.0.22.ELsmp,2.6.9-67.0.22.ELhugememx86_64:2.6.9-67.0.7.EL,2.6.9-67.0.7.ELsmp,2.6.9-67.0.7.ELlargesmp2.6.9-67.0.22.EL,2.6.9-67.0.22.ELsmp,2.6.9-67.0.22.Ellargesmp

(*9) The following kernels are supportedx86:2.6.9-78.0.1.EL,2.6.9-78.0.1.ELsmp,2.6.9-78.0.1.ELhugemem,2.6.9-78.0.5.EL,2.6.9-78.0.5.ELsmp,2.6.9-78.0.5.ELhugemem,2.6.9-78.0.8.EL,2.6.9-78.0.8.ELsmp,2.6.9-78.0.8.ELhugemem, 2.6.9-78.0.17.EL,2.6.9-78.0.17.ELsmp,2.6.9-78.0.17.ELhugemem,2.6.9-78.0.22.EL,2.6.9-78.0.22.ELsmp,2.6.9-

SIOS Protection Suite for LinuxPage 72

Page 93: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

78.0.22.Elhugememx86_64:2.6.9-78.0.1.EL,2.6.9-78.0.1.ELsmp,2.6.9-78.0.1.ELlargesmp, 2.6.9-78.0.5.EL,2.6.9-78.0.5.ELsmp,2.6.9-78.0.5.ELlargesmp,2.6.9-78.0.8.EL,2.6.9-78.0.8.ELsmp,2.6.9-78.0.8.ELlargesmp,2.6.9-78.0.17.EL,2.6.9-78.0.17.ELsmp,2.6.9-78.0.17.ELlargesmp,2.6.9-78.0.22.EL,2.6.9-78.0.22.ELsmp,2.6.9-78.0.22.ELlargesmp

(*10) The following kernels are supportedx86:2.6.18-128.1.10.el5,2.6.18-128.1.10.el5PAE,2.6.18-128.1.14.el5, 2.6.18-128.1.14.el5PAE,2.6.18-128.7.1.el5,2.6.18-128.7.1.el5PAEx86_64:2.6.18-128.1.10.el5,2.6.18-128.1.14.el5

(*11) The following kernels are supportedx86:2.6.18-164.9.1.el5,2.6.18-164.9.1.el5PAE,2.6.18-164.11.1.el5,2.6.18-164.11.1.el5PAEx86_64:2.6.18-164.9.1.el5,2.6.18-164.11.1.el5

(*12) The following kernels are supportedx86:2.6.9-89.0.20.EL,2.6.9-89.0.20.ELsmp,2.6.9-89.0.20.Elhugememx86_64:2.6.9-89.0.20.EL,2.6.9-89.0.20.ELsmp,2.6.9-89.0.20.Ellargesmp

(*13) The following kernels are supportedx86:2.6.18-194.11.1.el5, 2.6.18-194.11.1.el5PAE, 2.6.18-194.11.3.el5, 2.6.18-194.11.3.el5PAE, 2.6.18-194.17.1.el5, 2.6.18-194.17.1.el5PAE, 2.6.18-194.32.1.el5,2.6.18-194.32.1.el5PAEx86_64:2.6.18-194.11.1.el5, 2.6.18-194.11.3.el5, 2.6.18-194.17.1.el5, 2.6.18-194.32.1.el5

(*14) The following kernels are supportedx86:2.6.18-238.1.1.el5,2.6.18-238.1.1.el5PAE,2.6.18-238.9.1.el5,2.6.18-238.9.1.el5PAE,2.6.18-238.19.1.el5,2.6.18-238.19.1.el5PAEx86_64:2.6.18-238.1.1.el5,2.6.18-238.9.1.el5,2.6.18-238.19.1.el5

(*15) The following kernels are supportedx86:2.6.32-71.el6.i686, 2.6.32-131.0.15.el6.i686, 2.6.32-220.el6.i686,2.6.32-279.el6.i686x86_64:2.6.32-71.el6.x86_64, 2.6.32-131.0.15.el6.x86_64, 2.6.32-220.el6.x86_64,2.6.32-279.el6.x86_64

(*16) The following kernels are supportedx86:2.6.18-274.12.1.el5,2.6.18-274.12.1.el5PAE,2.6.18-274.18.1.el5,2.6.18-274.18.1.el5PAEx86_64:2.6.18-274.12.1.el5,2.6.18-274.18.1.el5

(*17) The following kernels are supportedx86:2.6.18-308.8.2.el5,2.6.18-308.8.2.el5PAEx86_64:2.6.18-308.8.2.el5

(*18) The following kernels are supportedx86:2.6.32-220.4.2.el6.i686, 2.6.32-220.17.1.el6.i686, 2.6.32-220.23.1.el6.i686,2.6.32-220.31.1.el6.i686, 2.6.32-220.45.1.el6.i686x86_64:2.6.32-220.4.2.el6.x86_64, 2.6.32-220.17.1.el6.x86_64, 2.6.32-

SIOS Protection Suite for LinuxPage 73

Page 94: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

220.23.1.el6.x86_64, 2.6.32-220.31.1.el6.x86_64, 2.6.32-220.45.1.el6.x86_64,2.6.32-220.48.1.el6.x86_64,2.6.32-220.64.1.el6.x86_64,2.6.32-220.65.1.el6.x86_64, 2.6.32-220.72.2.el6.x86_64

(*19) The following kernels are supportedx86:2.6.32-279.19.1.el6.i686x86_64:2.6.32-279.19.1.el6.x86_64

(*20) The following kernels are supportedx86:2.6.32-358.6.2.el6.i686, 2.6.32-358.11.1.el6.i686, 2.6.32-358.14.1.el6.i686,2.6.32-358.23.2.el6.i686x86_64:2.6.32-358.6.2.el6.x86_64, 2.6.32-358.11.1.el6. x86_64, 2.6.32-358.14.1.el6.x86_64, 2.6.32-358.23.2.el6. x86_64

(*21) The following kernels are supportedx86:2.6.32-431.1.2.el6.i686, 2.6.32-431.3.1.el6.i686, 2.6.32-431.5.1.el6.i686, 2.6.32-431.17.1.el6.i686, 2.6.32-431.20.3.el6.i686, 2.6.32-431.23.3.el6.i686, 2.6.32-431.29.2.el6.i686, 2.6.32-431.72.1.el6.i686,2.6.32-431.77.1.el6.i686x86_64:2.6.32-431.1.2.el6.x86_64, 2.6.32-431.3.1.el6. x86_64, 2.6.32-431.5.1.el6.x86_64, 2.6.32-431.17.1.el6. x86_64, 2.6.32-431.20.3.el6. x86_64, 2.6.32-431.23.3.el6. x86_64, 2.6.32-431.29.2.el6. x86_64, 2.6.32-431.72.1.el6.x86_64,2.6.32-431.77.1.el6.x86_64

(*22) The following kernels are supportedx86:2.6.32-504.3.3.el6.i686x86_64:2.6.32-504.3.3.el6.x86_64

(*23) The following kernels are supportedx86:2.6.18-348.1.1.el5, 2.6.18-348.1.1.el5PAE, 2.6.18-348.6.1.el5, 2.6.18-348.6.1.el5PAE, 2.6.18-348.18.1.el5, 2.6.18-348.18.1.el5PAEx86_64:2.6.18-348.1.1.el5, 2.6.18-348.6.1.el5, 2.6.18-348.18.1.el5

(*24) The following kernels are supportedx86_64:3.10.0-123.13.2.el7.x86_64

(*25) The following kernels are supportedx86_64:3.10.0-229.4.2.el7.x86_64,3.10.0-229.20.1.el7.x86_64

(*26) The following kernels are supportedx86_64:3.10.0-327.4.4.el7.x86_64, 3.10.0-327.4.5.el7.x86_64,3.10.0-327.22.2.el7.x86_64, 3.10.0-327.36.1.el7.x86_64, 3.10.0-327.36.3.el7.x86_64,3.10.0-327.46.1.el7.x86_64, 3.10.0-327.49.2.el7.x86_64,3.10.0-327.55.2.el7.x86_64,3.10.0-327.55.3.el7.x86_64

(*27) The following kernels are supportedx86: 2.6.32-573.8.1.el6.i686, 2.6.32-573.12.1.el6.i686, 2.6.32-573.18.1.el6.i686x86_64: 2.6.32-573.8.1.el6.x86_64,2.6.32-573.12.1.el6.x86_64, 2.6.32-573.18.1.el6.x86_64

(*28) The following kernels are supportedx86:2.6.32-642.1.1.el6.i686, 2.6.32-642.6.2.el6.i686, 2.6.32-642.13.1.el6.i686x86_64:2.6.32-642.1.1.el6.x86_64, 2.6.32-642.6.1.el6.x86_64,2.6.32-

SIOS Protection Suite for LinuxPage 74

Page 95: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

642.6.2.el6.x86_64, 2.6.32-642.13.1.el6.x86_64

(*29) The following kernels are supportedx86:2.6.18-416.el5,2.6.18-416.el5PAE, 2.6.18-419.el5,2.6.18-419.el5PAEx86_64:2.6.18-416.el5, 2.6.18-419.el5

(*30)The following kernels are supportedx86_64:3.10.0-514.6.1.el7.x86_64,3.10.0-514.10.2.el7.x86_64,3.10.0-514.16.1.el7.x86_64

InstallationRequirements

HDLM softwaremust be installed prior to installing the HDLM recovery kit.  Also,customers wanting to transfer their environment from SCSI devices to HDLM devicesmust run the Installation setup script after configuring the HDLM environment. Otherwise, sg3_utils will not be installed.

Adding orRepairing HDLMPaths

When LifeKeeper brings an HDLM resource into service, it establishes a persistentreservation registered to each path that was active at that time. If new paths are addedafter the initial reservation, or if failed paths are repaired and HDLM automaticallyreactivates them, those paths will not be registered as a part of the reservation untilthe next LifeKeeper quickCheck execution for the HDLM resource. If HDLM allowsany writes to that path prior to that point in time, reservation conflicts that occur will belogged to the systemmessage file. The HDLM driver will retry these IOs on theregistered path resulting in no observable failures to the application. Once quickCheckregisters the path, subsequent writes will be successful. The status will be changed to“Offline(E)” if quickCheck detects a reservation conflict. If the status is “Offline(E)”,customers will need tomanually change the status to “Online” using the online HDLMcommand.

Additional settingsfor RHEL7.x

If Red Hat Enterprise Linux 7.0 or later are used with HDLM Recovery Kit, youmustadd "HDLM_DLMMGR=.dlmmgr_exe" in /etc/default/LifeKeeper. HDLM_DLMMGR=.dlmmgr_exe

OS version / Architecture

RHEL4

U1-U4

U3 Security Fix(*7)

U4 Security Fix(*2)

4.5 4.5 Security Fix(*4)

4.6 4.6 Security Fix(*8)

4.7 4.7 Security Fix(*9)

4.8 4.8SecurityFix(*12)

x86/x86_64

SIOS Protection Suite for LinuxPage 75

Page 96: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

HDLM

05-80 05-81 05-90

X

05-91 05-92 X X

05-93 X(*3) X X

05-94 X(*3) X X X X X

6.0.0 X(*3) X X X X X X X X X

6.0.1 X(*3) X X X X X X X X X

6.1.0 X(*3) X X X X X X X X X

6.1.1 X(*3) X X X X X X X X X X

6.1.2 X(*3) X X X X X X X X X X

6.2.0 X(*3) X X X X X X X X X X

6.2.1 X(*3) X X X X X X X X X X

6.3.0 X(*3) X X X X X X X X X X

6.4.0 X(*3) X X X X X X X X X X

6.4.1 X(*3) X X X X X X X X X X

6.5.0 X(*3) X X X X X X X X X X

6.5.1 X(*3) X X X X X X X X X X

6.5.2 X(*3) X X X X X X X X X X

6.6.0 X(*3) X X X X X X X X X X

6.6.2 X(*3) X X X X X X X X X X

SIOS Protection Suite for LinuxPage 76

Page 97: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

7.2.0 X(*3) X X X X X X X X X X

7.2.1 X(*3) X X X X X X X X X X

7.3.0 orlater

X(*3) X X X X X X X X X X

SIOS Protection Suite for LinuxPage 77

Page 98: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

LifeKeeper v6.0 X X X

SIOS Protection Suite for LinuxPage 78

Page 99: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

v6.0(v6.0.1-2 orlater)

v6.1

X X X(v6.1.0-5 orlater)

v6.2

X X X X X X X(v6.2.0-5 orlater)

v6.2

X X X X X X X(v6.2.2-1orlater)

v6.3

X X X X X X X(v6.3.2-1orlater)

v6.4

X X X X X X X X X(v6.4.0-10 orlater)

v7.0

X X X X X X X X X X X(v7.0.0-5 orlater)

V 7.1

X X X X X X X X X X X(v7.1.0-8 orlater)

V7.2

X X X X X X X X X X X(v7.2.0-10 orlater)

SIOS Protection Suite for LinuxPage 79

Page 100: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

V 7.3

X X X X X X X X X X X(v7.3.0-21 orlater)

V 7.4

X X X X X X X X X X X(v7.4.0-63 orlater)

V 7.5

RHEL4 is not supported in v7.5 or later version of LK.(v7.5.0-3640 orlater)

HDLMARK

6.0.1-2 X X X X X X X

6.1.0-4 X X X X X X X

6.2.2-3 X X X X X X X

6.2.3-1 X X X X X X X X X X X

6.4.0-2 X X X X X X X X X X X

7.0.0-1 X X X X X X X X X X X

7.2.0-1 X X X X X X X X X X X

X = supported blank = not supported

RHEL5

NoUpd-ates

5-.-1

5.1Sec-urity Fix(*5)

5-.-2

5.2Sec-urityFix(*6)

5-.-3

5.3Sec-urityFix(*10)

5-.-4

5.4Sec-urityFix(*11)

5-.-5

5.5Sec-urityFix(*13)

5-.-6

5.6Sec-urityFix(*14)

5.7Sec-urityFix(*16)

5.8Sec-urityFix(*17)

5.9Sec-urityFix(*23)

5.-1-0

5.11Sec-urityFix(*29)

x86/x86_64

SIOS Protection Suite for LinuxPage 80

Page 101: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

HDL-M

05-80 05-81 05-90

05-91 05-92

05-93 X

05-94 X X

6.0.-0 X X X X X

6.0.-1 X X X X X

6.1.-0 X X X X X

6.1.-1 X X X X X

6.1.-2 X X X X X X X X X X X X X X X X X X

6.2.-0 X X X X X X X X X X X X X X X X X X

6.2.-1 X X X X X X X X X X X X X X X X X X

6.3.-0 X X X X X X X X X X X X X X X X X X

6.4.-0 X X X X X X X X X X X X X X X X X X

6.4.-1 X X X X X X X X X X X X X X X X X X

6.5.-0 X X X X X X X X X X X X X X X X X X

6.5.-1 X X X X X X X X X X X X X X X X X X

6.5.-2 X X X X X X X X X X X X X X X X X X

SIOS Protection Suite for LinuxPage 81

Page 102: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

6.6.-0 X X X X X X X X X X X X X X X X X X

6.6.-2 X X X X X X X X X X X X X X X X X X

7.2.-0 X X X X X X X X X X X X X X X X X X

7.2.-1 X X X X X X X X X X X X X X X X X X

7.3.-0 X X X X X X X X X X X X X X X X X X

7.3.-1 X X X X X X X X X X X X X X X X X X

7.4.-0 X X X X X X X X X X X X X X X X X X

7.4.-1 X X X X X X X X X X X X X X X X X X

7.5.-0 X X X X X X X X X X X X X X X X X X

7.6.-0 X X X X X X X X X X X X X X X X X X

7.6.-1 X X X X X X X X X X X X X X X X X X

8.0.-0 X X X X X X X X X X X X X X X X X X

8.0.-1 X X X X X X X X X X X X X X X X X X

8.1.-0 X X X X X X X X X X X X X X X X X X

8.1.-1 X X X X X X X X X X X X X X X X X X

8.1.-2 X X X X X X X X X X X X X X X X X X

8.1.-3 X X X X X X X X X X X X X X X X X X

8.1.-4 X X X X X X X X X X X X X X X X X X

8.2.-0 X X X X X X X X X X X X X X X X X X

SIOS Protection Suite for LinuxPage 82

Page 103: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

8.4.-0 X X X X X X X X X X X X X X X X X X

8.5.-0 X X X X X X X X X X X X X X X X X X

8.5.-1 X X X X X X X X X X X X X X X X X X

8.5.-2 X X X X X X X X X X X X X X X X X X

SIOS Protection Suite for LinuxPage 83

Page 104: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

LifeKe-eper

v6.0 

(v6.-0.1-2 orlate-r)

v6.1 

(v6.-1.0-5 orlate-r)

X X

v6.2 

(v6.-2.0-5 orlate-r)

X X

v6.2 

(v6.-2.2-1 orlate-r)

X X X

v6.3 

(v6.-3.2-1 orlate-r)

X X X X X

v6.4 

(v6.-4.0-10orlate-r)

X X X X X X X

SIOS Protection Suite for LinuxPage 84

Page 105: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

v7.0 

(v7.-0.0-5 orlate-r)

X X X X X X X X X

v7.1 

(v7.-1.0-8 orlate-r)

X X X X X X X X X X X

v7.2 

(v7.-2.0-10orlate-r)

X X X X X X X X X X X

v7.3

(v7.-3.0-21orlate-r)

X X X X X X X X X X X X X

v7.4

(v7.-4.0-63orlate-r)

X X X X X X X X X X X X X X

SIOS Protection Suite for LinuxPage 85

Page 106: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

v7.5

(v7.-5.0-364-0 orlate-r)

X X X X X X X X X X X X X X X X X X

v8.0

(v8.-0.0-510orlate-r)

X X X X X X X X X X X X X X X X X X

v8.1

(v8.-1.1-562-0 orlate-r)

X X X X X X X X X X X X X X X X X X

v8.2

(v8.-2.0-621-3 orlate-r)

X X X X X X X X X X X X X X X X X X

v8.2-.1

(v8.-2.1-635-3 orlate-r)

X X X X X X X X X X X X X X X X X X

SIOS Protection Suite for LinuxPage 86

Page 107: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

v8.3-.0

(v8.-3.0-638-9 orlate-r)

X X X X X X X X X X X X X X X X X X

v8.3-.1

(v8.-3.1-639-7 orlate-r)

X X X X X X X X X X X X X X X X X X

v8.3-.2

(v8.-3.2-640-5 orlate-r)

X X X X X X X X X X X X X X X X X X

v8.4-.0

(v8.-4.0-642-7 orlate-r)

X X X X X X X X X X X X X X X X X X

v8.4-.1

(v8.-4.1-644-9 orlate-r)

X X X X X X X X X X X X X X X X X X

SIOS Protection Suite for LinuxPage 87

Page 108: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

v9.0-.0

(v9.-0.0-648-8 orlate-r)

X X X X X X X X X X X X X X X X X X

v9.0-.1

(v9.-0.1-649-2 orlate-r)

X X X X X X X X X X X X X X X X X X

v9.0-.2

(v9.-0.2-651-3 orlate-r)

X X X X X X X X X X X X X X X X X X

v9.1-.0

(v9.-1.0-653-8 orlate-r)

X X X X X X X X X X X X X X X X X X

v9.1-.1

(v9.-1.1-659-4 orlate-r)

X X X X X X X X X X X X X X X X X X

SIOS Protection Suite for LinuxPage 88

Page 109: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

v9.1-.2

(v9.-1.2-660-9 orlate-r)

X X X X X X X X X X X X X X X X X X

SIOS Protection Suite for LinuxPage 89

Page 110: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

HDL-MARK

6.0.-1-2

6.1.-0-4 X X

6.2.-2-3 X X X

6.2.-3-1 X X X X X

6.4.-0-2 X X X X X X X

7.0.-0-1 X X X X X X X X X X X

7.2.-0-1 X X X X X X X X X X X X X X X X X X

8.1.-1-562-0

X X X X X X X X X X X X X X X X X X

8.2.-0-621-3

X X X X X X X X X X X X X X X X X X

8.2.-1-635-3

X X X X X X X X X X X X X X X X X X

8.3.-0-638-9

X X X X X X X X X X X X X X X X X X

8.3.-1-639-7

X X X X X X X X X X X X X X X X X X

8.3.-2-640-5

X X X X X X X X X X X X X X X X X X

SIOS Protection Suite for LinuxPage 90

Page 111: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

8.4.-0-642-7

X X X X X X X X X X X X X X X X X X

8.4.-1-644-9

X X X X X X X X X X X X X X X X X X

9.0.-0-648-8

X X X X X X X X X X X X X X X X X X

9.0.-1-649-2

X X X X X X X X X X X X X X X X X X

9.0.-2-651-3

X X X X X X X X X X X X X X X X X X

9.1.-0-653-8

X X X X X X X X X X X X X X X X X X

9.1.-1-659-4

X X X X X X X X X X X X X X X X X X

9.1.-2-660-9

X X X X X X X X X X X X X X X X X X

X = supported blank = not supported

OS version / Architecture

RHEL6

6 6.1  6.2SecurityFix(*18)

6.3SecurityFix(*19)

6.4SecurityFix(*20)

6.5SecurityFix(*21)

6.6 Secur-ity Fix(*22)

6.7SecurityFix(*27)

6.8SecurityFix(*28)

x86/x86_64

SIOS Protection Suite for LinuxPage 91

Page 112: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

HDLM

6.5.0

6.5.1

6.5.2 X

6.6.0 X

6.6.2 X

6.6.2-01 X X

7.2.0 X X X

7.2.1 X X X

7.3.0 X X X

7.3.1 X X X

7.4.0 X X X X X X X X X

7.4.1 X X X X X X X X X

7.5.0 X X X X X X X X X

7.6.0 X X X X X X X X X

7.6.1 X X X X X X X X X

8.0.0 X X X X X X X X X

8.0.1 X X X X X X X X X

8.1.0 X X X X X X X X X

8.1.1 X X X X X X X X X

8.1.2 X X X X X X X X X

8.1.3 X X X X X X X X X

8.1.4 X X X X X X X X X

8.2.0 X X X X X X X X X

8.2.1 X X X X X X X X X

8.4.0 X X X X X X X X X

8.5.0 X X X X X X X X X

8.5.1 X X X X X X X X X

8.5.2 X X X X X X X X X

SIOS Protection Suite for LinuxPage 92

Page 113: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

LifeKeeper V 7.0

SIOS Protection Suite for LinuxPage 93

Page 114: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

(v7.0.0-5or later)

V 7.1

(v7.1.0-8or later)

V7.2

(v7.2.0-10 orlater)

V 7.3

X(v7.3.0-21 orlater)

V 7.4

X(v7.4.0-63 orlater)

V 7.5

X X X X(v7.5.0-3640 orlater)

v8.0

X X X X(v8.0.0-510 orlater)

v8.1

X X X X(v8.1.1-5620 orlater)

v8.1.2

X X X X X(v8.1.2-5795 orlater)

v8.2.0

X X X X X(v8.2.0-6213 orlater)

SIOS Protection Suite for LinuxPage 94

Page 115: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

v8.2.1 X X X X X X

SIOS Protection Suite for LinuxPage 95

Page 116: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

(8.2.1-6353 orlater)

v8.3.0

X X X X X X(8.3.0-6389 orlater)

v8.3.1

X X X X X X(8.3.1-6937 orlater)

v8.3.2

X X X X X X X X X(8.3.2-6405 orlater)

v8.4.0

X X X X X X X X X(8.4.0-6427 orlater)

v8.4.1

X X X X X X X X X(8.4.1-6449 orlater)

v9.0.0

X X X X X X X X X(v9.0.0-6488 orlater)

v9.0.1

X X X X X X X X X(v9.0.1-6492 orlater)

v9.0.2

X X X X X X X X X(v9.0.2-6513 orlater)

v9.1.0

X X X X X X X X X(v9.1.0-6538 orlater)

SIOS Protection Suite for LinuxPage 96

Page 117: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

v9.1.1

X X X X X X X X X(v9.1.1-6594 orlater)

v9.1.2

X X X X X X X X X(v9.1.2-6609 orlater)

SIOS Protection Suite for LinuxPage 97

Page 118: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

HDLMARK

7.0.0-1

7.2.0-1 X X X X

8.1.1-5620 X X X X

8.1.2-5795 X X X X X

8.2.0-6213 X X X X X

8.2.1-6353 X X X X X X X X X

8.3.0-6389 X X X X X X X X X

8.3.1-6397 X X X X X X X X X

8.3.2-6405 X X X X X X X X X

8.4.0-6427 X X X X X X X X X

8.4.1-6449 X X X X X X X X X

9.0.0-6488 X X X X X X X X X

9.0.1-6492 X X X X X X X X X

9.0.2-6513 X X X X X X X X X

9.1.0-6538 X X X X X X X X X

9.1.1-6594 X X X X X X X X X

9.1.2-6609 X X X X X X X X X

X = supported blank = not supported

RHEL7

7.0SecurityFix(*24)

7.1SecurityFix(*25)

7.2SecurityFix(*26)

7.3SecurityFix(*30)

SIOS Protection Suite for LinuxPage 98

Page 119: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hitachi HDLMMultipath I/O Configurations

x86/x86_64

HDLM

8.0.1 X X

8.1.0 X X

8.1.1 X X

8.1.2 X X

8.1.3 X X

8.1.4 X X

8.2.0 X X

8.2.1 X X

8.4.0 X X X

8.5.0 X X X

8.5.1 X X X X

8.5.2 X X X X

LifeKeeper

v9.0.0(v9.0.0-6488 orlater)

X X

v9.0.1(v9.0.1-6492 orlater)

X X

v9.0.2(v9.0.2-6513 orlater)

X X X

v9.1.0(v9.1.0-6538 orlater)

X X X

v9.1.1(v9.1.1-6594 orlater)

X X X X

v9.1.2(v9.1.2-6609 orlater)

X X X X

SIOS Protection Suite for LinuxPage 99

Page 120: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

DeviceMapper Multipath I/O Configurations

HDLMARK

9.0.0-6488 X X

9.0.1-6492 X X

9.0.2-6513 X X X

9.1.0-6538 X X X

9.1.1-6594 X X X X

9.1.2-6609 X X X X

X = supported blank = not supported

(Note):If you are running the system with LifeKeeper v9.0.x on RHEL7.x, you need to apply the Bug7205'spatch..

(Note 2):The Raw device configuration is not supported on RHEL7/7.1/7.2/7.3.

Device Mapper Multipath I/O Configurations

ProtectingApplicationsand FileSystems ThatUse DeviceMapperMultipathDevices

In order for LifeKeeper to operate with and protect applications or file systems that useDeviceMapper Multipath devices, the DeviceMapper Multipath (DMMP) Recovery Kitmust be installed.

Once the DMMP Kit is installed, simply creating an application hierarchy that uses one ormore of themultipath device nodes will automatically incorporate the new resource typesprovided by the DMMP Kit.

MultipathDeviceNodes

To use the DMMP Kit, any file systems and raw devices must bemounted or configured onthemultipath device nodes rather than on the native /dev/sd* device nodes.  The supportedmultipath device nodes to address the full disk are /dev/dm-#, /dev/mapper/<uuid>,/dev/mapper/<user_friendly_name> and /dev/mpath/<uuid>.  To address the partitions of adisk, use the device nodes for each partition created in the /dev/mapper directory.

Use of SCSI-3 PersistentReservations

The DeviceMapper Multipath Recovery Kit uses SCSI-3 persistent reservations with a"Write Exclusive" reservation type.  This means that devices reserved by one node in thecluster will remain read-accessible to other nodes in the cluster, but those other nodes willbe unable to write to the device.  Note that this does not mean that you can expect to beable tomount file systems on those other nodes for ongoing read-only access.

LifeKeeper uses the sg_persist utility to issue andmonitor persistent reservations.  Ifnecessary, LifeKeeper will install the sg_persist(8) utility.

SCSI-3 Persistent Reservations must be enabled on a per LUN basis when using EMCSymmetrix (including VMAX) arrays with multipathing software andLifeKeeper. This applies to both DMMP and PowerPath.

SIOS Protection Suite for LinuxPage 100

Page 121: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

DeviceMapper Multipath I/O Configurations

HardwareRequirements

The DeviceMapper Multipath Kit has been tested by SIOS Technology Corp. with theEMC CLARiiON CX300, the HP EVA 8000, HP MSA1500, HP P2000, the IBM SANVolumeController (SVC), the IBM DS8100, the IBM DS6800, the IBM ESS, the DataCoreSANsymphony, and the HDS 9980V. Check with your storage vendor to determine theirsupport for DeviceMapper Multipath.

Enabling support for the use of reservations on the CX300 and the VNX Series requires thatthe hardware handler be notified to honor reservations. Set the following parameter in/etc/multipath.conf for this array:

hardware_handler “3 emc 0 1"

The HP MSA1500 returns a reservation conflict with the default path checker setting (tur). This will cause the standby node tomark all paths as failed.  To avoid this condition, set thefollowing parameter in /etc/multipath.conf for this array:

path_checker readsector0

The HP 3PAR F400 returns a reservation conflict with the default path checker. To avoidthis conflict, set (add) the following parameter in /etc/default/LifeKeeper for thisarray:

DMMP_REGISTRATION_TYPE=hba

For the HDS 9980V the following settings are required:

l Host mode: 00

l System option: 254 (must be enabled; global HDS setting affecting all servers)

l Device emulation: OPEN-V

Refer to the HDS documentation "Suse Linux DeviceMapper Multipath for HDS Storage"or "Red Hat Linux DeviceMapper Multipath for HDS Storage" v1.15 or later for details onconfiguring DMMP for HDS. This documentation also provides a compatible multipath.conffile.

For the EVA storage with firmware version 6 or higher, DMMP Recovery Kit v6.1.2-3 orlater is required.  Earlier versions of the DMMP Recovery Kit are supported with the EVAstorage with firmware versions prior to version 6.

MultipathSoftwareRequirements

For SUSE, multipath-tools-0.4.5-0.14 or later is required.

For Red Hat, device-mapper-multipath-0.4.5-12.0.RHEL4 or later is required.

It is advised to run the latest set of multipath tools available from the vendor.  The featurecontent and the stability of this multipath product are improving at a very fast rate.

LinuxDistributionRequirements

Some storage vendors such as IBM have not certified DMMP with SLES 11 at this time.

SIOS Technology Corp. is currently investigating reported issues with DMMP, SLES 11,and EMCs CLARiiON and Symmetrix arrays.

SIOS Protection Suite for LinuxPage 101

Page 122: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper I-O Fencing Introduction

Transientpath failures

While running IO tests on DeviceMapper Multipath devices, it is not uncommon for actionson the SAN, for example, a server rebooting, to cause paths to temporarily be reported asfailed.  In most cases, this will simply cause one path to fail leaving other paths to send IOsdown resulting in no observable failures other than a small performance impact.  In somecases, multiple paths can be reported as failed leaving no paths working.  This can causean application, such as a file system or database, to see IO errors.  There has beenmuchimprovement in DeviceMapper Multipath and the vendor support to eliminate thesefailures.  However, at times, these can still be seen.  To avoid these situations, considerthese actions:

1. Verify that themultipath configuration is set correctly per the instructions of the diskarray vendor.

2. Check the setting of the “failback” feature.  This feature determines how quickly apath is reactivated after failing and being repaired.  A setting of “immediate”indicates to resume use of a path as soon as it comes back online.  A setting of aninteger indicates the number of seconds after a path comes back online to resumeusing it.  A setting of 10 to 15 generally provides sufficient settle time to avoidthrashing on the SAN.

3. Check the setting of the "no_path_retry" feature. This feature determines whatDeviceMapper Multipath should do if all paths fail. We recommend a setting of 10 to15. This allows some ability to "ride out" temporary events where all paths fail whilestill providing a reasonable recovery time. The LifeKeeper DMMP kit will monitorIOs to the storage and if they are not responded to within four minutes LifeKeeperwill switch the resources to the standby server. NOTE: LifeKeeper does notrecommend setting "no_path_retry" to "queue" since this will result in IOs that arenot easily killed.  The only mechanism found to kill them is on newer versions ofDM, the settings of the device can be changed:

/sbin/dmsetup message -u 'DMid' 0 fail_if_no_path

This will temporarily change the setting for no_path_retry to fail causing anyoutstanding IOs to fail.  However, multipathd can reset no_path_retry to the defaultat times. When the setting is changed to fail_if_no_path to flush failed IOs, it shouldthen be reset to its default prior to accessing the device (manually or viaLifeKeeper).

If "no_path_retry" is set to "queue" and a failure occurs, LifeKeeper will switch theresources over to the standby server.  However, LifeKeeper will not kill these IOs. The recommendedmethod to clear these IOs is through a reboot but can also bedone by an administrator using the dmsetup command above.  If the IOs are notcleared, then data corruption can occur if/when the resources are taken out ofservice on the other server thereby releasing the locks and allowing the "old" IOs tobe issued.

LifeKeeper I-O Fencing IntroductionI/O fencing is the locking away of data from amalfunctioning node preventing uncoordinated access to shared

SIOS Protection Suite for LinuxPage 102

Page 123: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

SCSI Reservations

storage. In an environment wheremultiple servers can access the same data, it is essential that all writes areperformed in a controlledmanner to avoid data corruption. Problems can arise when the failure detectionmechanism breaks down because the symptoms of this breakdown canmimic a failed node. For example, ina two-node cluster, if the connection between the two nodes fails, each node would “think” the other hasfailed, causing both to attempt to take control of the data resulting in data corruption. I/O fencing removes thisdata corruption risk by blocking access to data from specific nodes.

SCSI Reservations

Storage Fence Using SCSI ReservationsWhile LifeKeeper for Linux supports both resource fencing and node fencing, its primary fencingmechanism isstorage fencing through SCSI reservations. This fence, which provides the highest level of data protection forshared storage, allows for maximum flexibility andmaximum security providing very granular locking to theLUN level. The underlying shared resource (LUN) is the primary quorum device in this architecture. Quorumcan be defined as exclusive access to shared storage, meaning this shared storage can only be accessed byone server at a time. The server who has quorum (exclusive access) owns the role of “primary.” Theestablishment of quorum (who gets this exclusive access) is determined by the “quorum device.”

As stated above, with reservations enabled, the quorum device is the shared resource. The shared resourceestablishes quorum by determining who owns the reservation on it. This allows a cluster to continue tooperate down to a single server as long as that single server can access the LUN.

SCSI reservations protect the shared user data so that only the system designated by LifeKeeper canmodifythe data. No other system in the cluster or outside the cluster is allowed tomodify that data. SCSIreservations also allow the application being protected by LifeKeeper to safely access the shared user data

SIOS Protection Suite for LinuxPage 103

Page 124: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

AlternativeMethods for I/O Fencing

when there aremultiple server failures in the cluster. A majority quorum of servers is not required; the onlyrequirement is establishing ownership of the shared data.

Adding quorum/witness capabilities provides for the establishment of quorummembership. Without thismembership, split-brain situations could result in multiple servers, even all servers, killing each other.Watchdog added to configurations with reservations enabled provides amechanism to recover from partiallyhung servers. In cases where a hung server goes undetected by LifeKeeper, watchdog will begin recovery.Also, in the case where a server is hung and not able to detect that the reservation has been stolen, watchdogcan reboot the server to begin its recovery.

Alternative Methods for I/O FencingIn addition to resource fencing using SCSI reservations, LifeKeeper for Linux also supports disablingreservations. Regardless of whether reservations are enabled or disabled, there are two issues to be aware of:

l Access to the storagemust be controlled by LifeKeeper.

l Great caremust be taken to ensure that the storage is not accessed unintentionally such as bymounting file systems manually, fsck manually, etc.

If these two rules are followed and reservations are enabled, LifeKeeper will prevent most errors fromoccurring. With reservations disabled (alone), there is no protection. Therefore, other options must be exploredin order to provide this protection. The following sections discuss these different fencing options andalternatives that help LifeKeeper provide a reliable configuration even without reservations.

Disabling ReservationsWhile reservations provide the highest level of data protection for shared storage, in some cases, the use ofreservations is not available andmust be disabled within LifeKeeper. With reservations disabled, the storageno longer acts as an arbitrator in cases wheremultiple systems attempt to access the storage, intentionally orunintentionally.

Consideration should be given to the use of other methods to fence the storage through cluster membershipwhich is needed to handle system hangs, system busy situations and any situation where a server can appearto not be alive.

The key to a reliable configuration without reservations is to “know” that when a failover occurs, the “other”server has been powered off or power cycled. There are four fencing options that help accomplish this,allowing LifeKeeper to provide a very reliable configuration, even without SCSI reservations. These includethe following: 

l STONITH (Shoot the Other Node in the Head) using a highly reliable interconnect, i.e. serialconnection between server and STONITH device. STONITH is the technique to physically disable orpower-off a server when it is no longer considered part of the cluster. LifeKeeper supports the ability topower off servers during a failover event thereby insuring safe access to the shared data. This optionprovides reliability similar to reservations but is limited to two nodes physically located together.

l Quorum/Witness –Quorum/witness servers are used to confirm membership in the cluster, especiallywhen the cluster servers are at different locations. While this option can handle split-brain, it, alone, isnot recommended due to the fact that it does not handle system hangs.

SIOS Protection Suite for LinuxPage 104

Page 125: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Non-Shared Storage

l Watchdog –Watchdogmonitors the health of a server. If a problem is detected, the server with theproblem is rebooted or powered down. This option can recover from a server hang; however, it does nothandle split-brain; therefore this option alone is also not recommended.

l CONFIRM_SO – This option requires that automatic failover be turned off, so while very reliable(depending upon the knowledge of the administrator), it is not as available.

While none of these alternative fencingmethods alone are likely to be adequate, when used in combination, avery reliable configuration can be obtained.

Non-Shared StorageIf planning to use LifeKeeper in a non-shared storage environment, the risk of data corruption that exists withshared storage is not an issue; therefore, reservations are not necessary. However, partial or full resyncs andmerging of datamay be required. To optimize reliability and availability, the above options should beconsidered with non-shared storage as well.

Note: For further information comparing the reliability and availability of the different options, see the I/OFencing Comparison Chart.

It is important to note that no option will provide complete data protection, but the following combination willprovide almost the same level of protection as reservations.

Configuring I/O Fencing Without ReservationsTo configure a cluster to support node fencing, complete the following steps:

1. Stop LifeKeeper.

2. Disable the use of SCSI reservations within LifeKeeper. This is accomplished by editing theLifeKeeper defaults file, /etc/default/LifeKeeper, on all nodes in the cluster. Add or modify theReservations variable to be “none”, e.g. RESERVATIONS=”none”. (Note that this option should onlybe used when reservations are not available.)

3. Obtain and configure a STONITH device or devices to provide I/O fencing. Note that for thisconfiguration, STONITH devices should be configured to do a system “poweroff” command rather thana “reboot”. Take care to avoid bringing a device hierarchy in service on both nodes simultaneously via amanual operation when LifeKeeper communications have been disrupted for some reason.

4. If desired, obtain and configure a quorum/witness server(s). For complete instructions and informationon configuring and using a witness server, seeQuorum/Witness Server Support Package topic.

Note: The quorum/witness server should reside at a site apart from the other servers in the cluster toprovide the greatest degree of protection in the event of a site failure.

5. If desired, configure watchdog. For more information, see the Watchdog topic.

I/O Fencing Chart

SIOS Protection Suite for LinuxPage 105

Page 126: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

I/O Fencing Chart

Split-Brain Hung ServerReservations On

Alone

Quroum/Witness

Watchdog

Watchdog & Quorum/Witness

STONITH (serial)

Reservations Off

Nothing

STONITH (serial)

CONFIRM_SO*

Quorum/Witness

Watchdog

Non-Shared Storage

Default Features

Quorum/Witness

CONFIRM_SO*

Watchdog

SIOS Protection Suite for LinuxPage 106

Page 127: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Quorum/Witness

STONITH (serial)

Watchdog & STONITH

Most Reliable                                      Least Reliable

* While CONFIRM_SO is highly reliable (depending upon the knowledge of the administrator), it has loweravailability due to the fact that automatic failover is turned off.

Quorum/Witness

Quorum/Witness Server Support Package for LifeKeeper

Feature SummaryTheQuorum/Witness Server Support Package for LifeKeeper (steeleye-lkQWK) combined with the existingfailover process of the LifeKeeper core allows system failover to occur with a greater degree of confidence insituations where total network failure could be common. This effectively means that local site failovers andfailovers to nodes across aWAN can be done while greatly reducing the risk of “split-brain” situations. Thepackage will provide amajority-based quorum check to handle clusters with greater than two nodes. Thisadditional quorum logic will only be enabled if the witness support package is installed.

Using one or more witness servers will allow a node, prior to bringing resources in service after acommunication failure, to get a “second opinion” on the status of the failing node. The witness server is anadditional server that acts as an intermediary to determine which servers are part of the cluster. Whendetermining when to fail over, the witness server allows resources to be brought in service on a backup serveronly in cases where it verifies the primary server has failed and is no longer part of the cluster. This willprevent failovers from happening due to simple communication failures between nodes when those failuresdon’t affect the overall access to, and performance of, the in-service node. During actual operation, for theinitial implementation, all other nodes in the cluster will be consulted, including the witness node(s).

Package RequirementsIn addition to the requirements already discussed, this package requires that standard, licensed LifeKeepercore be installed on the server(s) that will act as the witness server(s). Note: As long as communication pathsare configured correctly, multiple clusters can share a single quorum/witness server (for more information, see“Additional Configuration for Shared-Witness Topologies” below).

SIOS Protection Suite for LinuxPage 107

Page 128: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Package Installation and Configuration

All nodes which will participate in a quorum/witness mode cluster, including witness-only nodes, should beinstalled with the Quorum/Witness Server Support Package for LifeKeeper. If using the tcp_remote quorummode, the hosts configured in QUORUM_HOSTS within /etc/default/LifeKeeper are not required to beinstalled with the Quorum/Witness Server Support Package for LifeKeeper.

Package Installation and ConfigurationTheQuorum/Witness Server Support Package for LifeKeeper will need to be installed on each server in thequorum/witness mode cluster, including any witness-only servers. The only configuration requirement for thewitness node is to create appropriate comm paths.

The general process for adding a witness server(s) will involve the following steps:

l Set up the server(s) for the witness node(s) and ensure network communications are available to theother nodes.

l Install the LifeKeeper core on the witness node(s) and properly license/activate it.

l Install the quorum/witness support package on all nodes in the cluster.

l Create appropriate communication paths between the nodes including the witness node.

Once this is complete, the cluster should behave in quorum/witness mode, and failovers will consult othernodes including the witness node prior to a failover being allowed. The default configuration after installing thepackage will enablemajority-based quorum and witness checks.

Note:Due tomajority-based quorum, it is recommended that the clusters always be configured with an oddnumber of nodes.

See the Configurable Components section below for additional configuration options.

Note:Any node with the witness package installed can participate in witness functionality. The witness-onlynodes will simply have a compatible version of the LifeKeeper core, the witness package installed and will nothost any protected resources.

Configurable ComponentsThe quorum/witness package contains two configurable modes: quorum and witness. By default, installingthe quorum/witness package will enable both quorum and witness modes suitable for most environments thatneed witness features.

The behavior of thesemodes can be customized via the /etc/default/LifeKeeper configuration file,and the quorum and witness modes can be individually adjusted. The package installs default settings into theconfiguration file when it is installed,majority being the default quorummode and remote_verify being thedefault witness mode. An example is shown below:

QUORUM_MODE=majorityWITNESS_MODE=remote_verify

SIOS Protection Suite for LinuxPage 108

Page 129: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Available QuorumModes

Note:Although each cluster node can have an entirely different witness/quorum configuration, it isrecommended that all nodes have the same configuration to avoid unexpected, and difficult to diagnose,situations.

Available Quorum ModesThree quorum checkingmodes are available which can be set via the QUORUM_MODE setting in/etc/default/LifeKeeper:majority (the default), tcp_remote and none/off. Each of these is describedbelow:

majority

Themajority setting, which is the default, will determine quorum based on the number of visible/aliveLifeKeeper nodes at the time of the check. This check is a simplemajority -- if more than half the total nodesare visible, then the node has quorum.

tcp_remote

The tcp_remote quorummode is similar tomajority mode except:

l servers consulted are configured separately from the cluster and its comm paths (these servers doNOT have to have LifeKeeper installed).

l servers are consulted by simply connecting to the TCP/IP service listening on the configured port.

Additional configuration is required for this mode since the TCP timeout allowance (QUORUM_TIMEOUT_SECS) and the hosts to consult (QUORUM_HOSTS) must be added to /etc/default/LifeKeeper. Anexample configuration for tcp_remote is shown below:

QUORUM_MODE=tcp_remote

# What style of quorum verification do we do in comm_up/down# and lcm_avail (maybe other) event handlers.# The possible values are:# - none/off: Do nothing, skip the check, assume all is well.# - majority: Verify that this node and the nodes it can reach# have more than half the cluster nodes.# - tcp_remote: Verify that this node can reach more than half# of the QUORUM_HOSTS via tcp/ip.

QUORUM_HOSTS=myhost:80,router1:443,router2:22

# If QUORUM_MODE eq tcp_remote, this should be a comma delimited# list of host:port values – like myhost:80,router1:443,router2:22.# This doesn't matter if the QUORUM_MODE is something else. QUORUM_

TIMEOUT_SECS=20# The time allowed for tcp/ip witness connections to complete.# Connections that don't complete within this time are treated# as failed/unavailable.# This only applies when the QUORUM_MODE is tcp_remote.

SIOS Protection Suite for LinuxPage 109

Page 130: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

AvailableWitness Modes

WITNESS_MODE=remote_verify

# This can be either off/none or remote_verify. In remote_verify# mode, core event handlers (comm_down) will doublecheck the# death of a system by seeing if other visible nodes# also think it is dead.

QUORUM_LOSS_ACTION=fastboot

# This can be one of osu, fastkill or fastboot.# fastboot will IMMEDIATELY reboot the system if a loss of quorum# is detected.# fastkill will IMMEDIATELY halt/power off the system upon# loss of quorum.# osu will just take any in-service resources out of service.# Note: this action does not sync disks or unmount filesystems.

QUORUM_DEBUG=

# Set to true/on/1 to enable debug messages from the Quorum# modules.

HIDE_GUI_SYS_LIST=true

Note: Due to the inherent flexibility and complexity of this mode, it should be used with caution by someoneexperienced with both LifeKeeper and the particular network/cluster configuration involved.

none/off

In this mode, all quorum checking is disabled. This causes the quorum checks to operate as if the nodealways has quorum regardless of the true state of the cluster.

Available Witness ModesTwowitness modes are available which can be set via theWITNESS_MODE setting in the/etc/default/LifeKeeper: remote_verify and none/off. Each of these is described below:

remote_verify

In this default mode, witness checks are done to verify the status of a node. This is typically done when anode appears to be failing. It enables a node to consult all the other visible nodes in the cluster about their viewof the status of the failingmachine to double-check the communications.

none/off

In this mode, witness checking is disabled. In the case of a communication failure, this causes the logic tobehave exactly as if there was no witness functionality installed.

Note: It would be unnecessary for witness checks to ever be performed by servers acting as dedicatedquorum/witness nodes that do not host resources; therefore, this setting should be set to none/off on theseservers.

SIOS Protection Suite for LinuxPage 110

Page 131: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Available Actions WhenQuorum is Lost

Available Actions When Quorum is LostThe witness package offers three different options for how the system should react if quorum is lost --“fastboot”, “fastkill” and “osu”. These options can be selected via the QUORUM_LOSS_ACTION setting in/etc/default/LifeKeeper. All three options take the system’s resources out of service; however, theyeach allow a different behavior. The default option, when the quorum package is installed, is fastboot. Each ofthese options is described below:

fastboot

If the fastboot option is selected, the system will be immediately rebooted when a loss of quorum is detected(from a communication failure). Although this is an aggressive option, it ensures that the system will bedisconnected from any external resources right away. In many cases, such as with storage-level replication,this immediate release of resources is desired.

Two important notes on this option are:

1. The system performs an immediate hard reboot without first performing any shut-down procedure; notasks are performed (disk syncing, etc.).

2. The system will come back up performing normal startup routines, including negotiating storage andresource access, etc.

fastkill

The fastkill option is very similar to the fastboot option, but instead of a hard reboot, the system willimmediately halt when quorum is lost. As with the fastboot option, no tasks are performed (disk syncing, etc.),and the system will then need to bemanually rebooted and will come back up performing normal startuproutines, including negotiating storage and resource access, etc.

osu

The osu option is the least aggressive option, leaving the system operational but taking resources out ofservice on the system where quorum is lost. In some cluster configurations, this is all that is needed, but itmay not be strong enough or fast enough in others.

Additional Configuration for Shared-Witness TopologiesWhen a quorum witness server will be shared by more than one cluster, it can be configured to simplifyindividual cluster management. In standard operation, the LifeKeeper GUI will try to connect to all clusternodes at once when connected to the first node. It connects to all the systems that can be seen by eachsystem in the cluster. Since the shared witness server is connected to all clusters, this will cause the GUI toconnect to all systems in all clusters visible to the witness node.

To avoid this situation, the HIDE_GUI_SYS_LIST configuration parameter should be set to “true” onany shared witness server. This effectively hides the servers that are visible to the witness server, resultingin the GUI only connecting to servers in the cluster that are associated with the first server connected to.Note: This should be set only on the witness server.

SIOS Protection Suite for LinuxPage 111

Page 132: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Adding aWitness Node to a Two-Node Cluster 

Since the GUI connects only to servers in the cluster that are associated with the first server connected to, ifthat first server is the witness server, and HIDE_GUI_SYS_LIST is set to “true,” the GUI will notautomatically connect to the other servers with established communication paths. As this behavior is nottypical LifeKeeper GUI behavior, it may lead an installer to incorrectly conclude that there is a network or otherconfiguration problem. To use the LifeKeeper GUI on a witness server with this setting, connect manually toone of the other nodes in the cluster, and the remaining nodes in the cluster will be shown in the GUI correctly.

Note: To prevent witness checks from being performed on all systems in all clusters, thewitness_modeshould always be set to none/off on shared, dedicated quorum witness nodes.

Adding a Witness Node to a Two-Node Cluster The following is an example of a two-node cluster utilizing the Quorum/Witness Server Support Package forLifeKeeper by adding a third “witness” node.

SIOS Protection Suite for LinuxPage 112

Page 133: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Expected Behaviors (Assuming Default Modes)

Simple Two-Node Cluster with Witness NodeServer A and Server B should already be set up with LifeKeeper core with resource hierarchies created onServer A and extended to Server B (ServerW will have no resource hierarchies extended to it). Using thefollowing steps, a third node will be added as the witness node.

1. Set up the witness node, making sure network communications are available to the other two nodes.

2. Install LifeKeeper core on the witness node and properly license/activate it.

3. Install the Quorum/Witness Server Support Package on all three nodes.

4. Create comm paths between all three nodes.

5. Set desired quorum checkingmode in /etc/default/LifeKeeper (majority, tcp_remote,none/off) (selectmajority for this example). See Available QuorumModes for an explanation of thesemodes.

6. Set desired witness mode in /etc/default/LifeKeeper (remote_verify, none/off). See AvailableWitness Modes for an explanation of thesemodes.

Expected Behaviors (Assuming Default Modes)

Scenario 1Communications fail between Servers A and B

If the communications fail between Server A and Server B, the following will happen:

l Both Server A and B will begin processing communication failure events, though not necessarily atexactly the same time.

l Both servers will perform the simple quorum check and determine that they still are in themajority (bothA and B can seeW and think they have two of the three known nodes).

l Each will consult the other nodes with whom they can still communicate about the true status of theserver with whom they’ve lost communications. In this scenario, this means that Server A will consultW about B’s status and B will also consult W about A’s status.

l Server A and B will both determine that the other is still alive by having consulted the witness serverand no failover processing will occur. Resources will be left in service where they are.

Scenario 2Communications fail between Servers A and W

Since all nodes can and will act as witness nodes when the witness package is installed, this scenario is thesame as the previous. In this case, Server A andWitness ServerW will determine that the other is still aliveby consulting with Server B.

SIOS Protection Suite for LinuxPage 113

Page 134: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Scenario 3

Scenario 3Communications fail between Server A and all other nodes (A fails)

In this case, Server B will do the following:

l Begin processing a communication failure event from Server A.

l Determine that it can still communicate with theWitness ServerW and thus has quorum.

l Verify via ServerW that Server A really appears to be lost and, thus, begin the usual failover activity.

l Server B will now have the protected resources in service.

With B now acting as Source, communication resumes for Server A

Based on the previous scenario, Server A now resumes communications. Server B will process a comm_upevent, determine that it has quorum (all three of the nodes are visible) and that it has the resources in service.Server A will process a comm_up event, determine that it also has quorum and that the resources are inservice elsewhere. Server A will not bring resources in service at this time.

With B now acting as Source, Server A is powered on with communications to the other nodes

In this case, Server B will respond just like in the previous scenario, but Server A will process an lcm_availevent. Server A will determine that it has quorum and respond normally in this case by not bringing resourcesin service that are currently in service on Server B.

With B now acting as Source, Server A is powered on without communications

In this case, Server A will process an lcm_avail event and Servers B andW will do nothing since they can’tcommunicate with Server A. Server A will determine that it does not have quorum since it can onlycommunicate with one of the three nodes. In the case of not having quorum, Server A will not bring resourcesin service.

Scenario 4Communications fail between Server A and all other nodes (A's network fails but A is still running)

In this case, Server B will do the following:

l Begin processing a communication failure event from Server A.

l Determine that it can still communicate with theWitness ServerW and thus has quorum.

l Verify via serverW that Server A really appears to be lost and, thus, begin the usual failover activity.

l Server B will now have the protected resources in service.

Also, in this case, Server A will do the following:

l Begin processing a communication failure event from Server B.

l Determine that it cannot communicate with Server B nor theWitness ServerW and thus does not have

SIOS Protection Suite for LinuxPage 114

Page 135: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

STONITH

quorum.

l Immediately reboot ("fastboot" is the default behavior causing a hard reboot).

STONITHSTONITH (Shoot TheOther Node in the Head) is a fencing technique for remotely powering down a node in acluster. LifeKeeper can provide STONITH capabilities by using external power switch controls, IPMI-enabledmotherboard controls and hypervisor-provided power capabilities to power off the other nodes in a cluster.

Using IPMI with STONITHIPMI (Intelligent Platform Management Interface) defines a set of common interfaces to a computer systemwhich can be used tomonitor system health andmanage the system. Used with STONITH, it allows thecluster software to instruct the switch via a serial or network connection to power off or reboot a cluster nodethat appears to have died thus ensuring that the unhealthy node cannot access or corrupt any shared data.

Package Requirementsl IPMI tools package (e.g. ipmitool-1.8.11-6.el6.x86_64.rpm)

STONITH in VMware vSphere EnvironmentsvCLI (vSphere Command-Line Interface) is a command-line interface supported by VMware for managingyour virtual infrastructure including the ESXi hosts and virtual machines. You can choose the vCLI commandbest suited for your needs and apply it for your LifeKeeper STONITH usage between VMware virtualmachines.

Package Requirements

STONITH Server

l VMware vSphere SDK Package or VMware vSphere CLI. (vSphere CLI is included inthe same installation package as the vSphere SDK).

Monitored Virtual Machine

l VMware Tools

Installation and ConfigurationAfter installing LifeKeeper and configuring communication paths for each node in the cluster, install andconfigure STONITH.

1. Install the LifeKeeper STONITH script by running the following command:

/opt/LifeKeeper/samples/STONITH/stonith-install

SIOS Protection Suite for LinuxPage 115

Page 136: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Installation and Configuration

2. (*For IPMI usage only) Using BIOS or the ipmitool command, set the following BMC (BaseboardManagement Controller) variables:

l Use Static IP

l IP address

l Sub netmask

l User name

l Password

l Add Administrator privilege level to the user

l Enable network access to the user

Example using ipmitool command

(For detailed information, see the ipmitool man page.)

# ipmitool lan set 1 ipsrc static# ipmitool lan set 1 ipaddr 192.168.0.1# ipmitool lan set 1 netmask 255.0.0.0# ipmitool user set name 1 root# ipmitool user set password 1 secret# ipmitool user priv 1 4# ipmitool user enable 1

3. Edit the configuration file.

Update the configuration file to enable STONITH and add the power off command line. Note:Power offis recommended over reboot to avoid fence loops (i.e. twomachines have lost communication but canstill STONITH each other, taking turns powering each other off and rebooting).

/opt/LifeKeeper/config/stonith.conf

SIOS Protection Suite for LinuxPage 116

Page 137: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

<vm_id>

# LifeKeeper STONITH configuration## Each system in the cluster is listed below. To enable STONITH for a# given system,# remove the '#' on that line and insert the STONITH command line to power off# that system.

# Example1: ipmi command

# node-1 ipmitool -I lanplus -H 10.0.0.1 -U root -P secret power off

# Example2: vCLI-esxcli command

# node-2 esxcli --server=10.0.0.1 --username=root --password=secret vms vm kill --type='hard' --world-id=1234567

# Example3: vCLI-vmware_cmd command

# node-3 vmware-cmd -H 10.0.0.1 -U root -P secret <vm_id> stop hard

minute-maid ipmitool -I lanplus -H 192.168.0.1 -U root -P secret power offkool-aid ipmitool -I lanplus -H 192.168.0.2 -U root -P secret power off

vm1 esxcli --server=10.0.0.1 --username=root --password=secret vms vm kill --type='hard' --world-id=1234567vm2 vmware-cmd -H 10.0.0.1 -U root -P secret <vm_id> stop hard

<vm_id>vSphere CLI commands run on top of vSphere SDK for Perl. <vm_id> is used as an identifier of the VM. Thisvariable should point to the VM's configuration file for the VM being configured.

To find the configuration file path:

1. Type the following command:

vmware-cmd -H <vmware host> -l

2. This will return a list of VMware hosts.

Example output from vmware-cmd -l with three vms listed:

/vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver/lampserver.vmx/vmfs/volumes/4e1e1386-0b862fae-a859-0019b9cb28bc/oracle10/oracle.vmx/vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver02/lampserver02.vmx

Find the VM being configured in the resulting list.

3. Paste the path name into the <vm_id> variable. The example above would then become:

vmware-cmd -H 10.0.0.1 -U root -P secret /vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver/lampserver.vmx stop hard

SIOS Protection Suite for LinuxPage 117

Page 138: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Expected Behaviors

Note: For further information on VMware commands, use vmware-cmdwith no arguments to display a helppage about all options.

Expected BehaviorsWhen LifeKeeper detects a communication failure with a node, that node will be powered off and a failover willoccur. Once the issue is repaired, the node will have to bemanually powered on.

WatchdogWatchdog is amethod of monitoring a server to ensure that if the server is not working properly, correctiveaction (reboot) will be taken so that it does not cause problems. Watchdog can be implemented using specialwatchdog hardware or using a software-only option.

(Note: This configuration has only been tested with Red Hat Enterprise Linux Versions 5 and 6. No otheroperating systems have been tested; therefore, no others are supported at this time.)

Componentsl Watchdog timer software driver or an external hardware component

l Watchdog daemon – rpm available through the Linux distribution

l LifeKeeper core daemon – installed with the LifeKeeper installation

l Health check script – LifeKeeper monitoring script

LifeKeeper Interoperability with Watchdog

SIOS Protection Suite for LinuxPage 118

Page 139: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Configuration

Read the next section carefully. The daemon is designed to recover from errors and will reset the system if notconfigured carefully. Planning and care should be given to how this is installed and configured. This section isnot intended to explain and configure watchdog, but only to explain and configure how LifeKeeperinteroperates in such a configuration.

ConfigurationThe following steps should be carried out by an administrator with root user privileges. The administratorshould already be familiar with some of the risks and issues with watchdog.

The health check script (LifeKeeper monitoring script) is the component that ties the LifeKeeper configurationwith the watchdog configuration (/opt/LifeKeeper/samples/watchdog/LifeKeeper-watchdog).This script provides full monitoring of LifeKeeper and should not require any modifications.

1. If watchdog has been previously configured, enter the following command to stop it. If not, go to Step2.

/etc/rc.d/init.d/watchdog stop

Confirmation should be received that watchdog has stopped

Stopping watchdog: [OK]

2. Edit the watchdog configuration file (/etc/watchdog.conf) supplied during the installation of watch-dog software.

l Modify test-binary:

test-binary = /opt/LifeKeeper/samples/watchdog/LifeKeeper-watchdog

l Modify test-timeout:

test-timeout = 5

l Modify interval:

interval = 7

The interval value should be less than LifeKeeper communication path timeout (15 seconds), so a goodnumber for the interval is generally half of this value.

3. Make sure LifeKeeper has been started. If not, please refer to the Starting LifeKeeper topic.

4. Start watchdog by entering the following command:

/etc/rc.d/init.d/watchdog start

Confirmation should be received that watchdog has started

Starting watchdog: [OK]

5. To start watchdog automatically on future restarts, enter the following command:

chkconfig --levels 35 watchdog on

SIOS Protection Suite for LinuxPage 119

Page 140: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Uninstall

Note:Configuring watchdogmay cause some unexpected reboots from time to time. This is the generalnature of how watchdog works. If processes are not responding correctly, the watchdog feature will assumethat LifeKeeper (or the operating system) is hung, and it will reboot the system (without warning).

UninstallCare should be taken when uninstalling LifeKeeper. The above steps should be done in reverse order as listedbelow.

WARNING: IF UNINSTALLING LIFEKEEPER BY REMOVING THE RPM PACKAGES THATMAKE UPLIFEKEEPER, TURN OFF WATCHDOG FIRST! In Step 2 above, the watchdog config file was modified tocall on the LifeKeeper-watchdog script; therefore, if watchdog is not turned off first, it will call on that scriptthat is no longer there. An error will occur when this script is not found which will trigger a reboot. This willcontinue until watchdog is turned off.

1. Stop watchdog by entering the following command:

/etc/rc.d/init.d/watchdog stop

Confirmation should be received that watchdog has stopped

Stopping watchdog: [OK]

2. Edit the watchdog configuration file (/etc/watchdog.conf) supplied during the installation of watch-dog software.

l Modify test-binary and interval by commenting out those entries (add # at the beginning of each line):

#test-binary =#interval =

(Note: If interval was used previously for other functions, it can be left as-is)

3. Uninstall LifeKeeper. See the Removing LifeKeeper topic.

4. Watchdog can now be started again. If only used by LifeKeeper, watchdog can be permanently dis-abled by entering the following command:

chkconfig --levels 35 watchdog off

Resource Policy Management

OverviewResource Policy Management in SIOS Protection Suite for Linux provides behavior management of resourcelocal recovery and failover. Resource policies aremanaged with the lkpolicy command line tool (CLI).  

SIOS Protection Suite for LinuxPage 120

Page 141: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

SIOS Protection Suite

SIOS Protection SuiteSIOS Protection Suite is designed tomonitor individual applications and groups of related applications,periodically performing local recoveries or notifications when protected applications fail. Related applications,by example, are hierarchies where the primary application depends on lower-level storage or networkresources. When an application or resource failure occurs, the default behavior is:

1. Local Recovery: First, attempt local recovery of the resource or application. An attempt will bemadeto restore the resource or application on the local server without external intervention. If local recoveryis successful, then SIOS Protection Suite will not perform any additional action.

2. Failover: Second, if a local recovery attempt fails to restore the resource or application (or the recoverykitmonitoring the resource has no support for local recovery), then a failoverwill be initiated. Thefailover action attempts to bring the application (and all dependent resources) into service on anotherserver within the cluster.

Please see SIOS Protection Suite Fault Detection and Recovery Scenarios for more detailed informationabout our recovery behavior.

Custom and Maintenance-Mode Behavior via PoliciesSIOS Protection Suite Version 7.5 and later supports the ability to set additional policies that modify thedefault recovery behavior. There are four policies that can be set for individual resources (see the sectionbelow about precautions regarding individual resource policies) or for an entire server. The recommendedapproach is to alter policies at the server level.

The available policies are:

Standard Policiesl Failover This policy setting can be used to turn on/off resource failover. (Note: In order forreservations to be handled correctly, Failover cannot be turned off for individual scsi resources.)

l LocalRecovery - SIOS Protection Suite, by default, will attempt to recover protected resources byrestarting the individual resource or the entire protected application prior to performing a failover. Thispolicy setting can be used to turn on/off local recovery.

l TemporalRecovery - Normally, SIOS Protection Suite will perform local recovery of a failed resource.If local recovery fails, SIOS Protection Suite will perform a resource hierarchy failover to another node.If the local recovery succeeds, failover will not be performed.

Theremay be cases where the local recovery succeeds, but due to some irregularity in the server, thelocal recovery is re-attempted within a short time; resulting in multiple, consecutive local recoveryattempts. This may degrade availability for the affected application.

To prevent this repetitive local recovery/failure cycle, youmay set a temporal recovery policy. Thetemporal recovery policy allows an administrator to limit the number of local recovery attempts(successful or not) within a defined time period.

SIOS Protection Suite for LinuxPage 121

Page 142: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Meta Policies

Example: If a user sets the policy definition to limit the resource to three local recovery attemptsin a 30-minute time period, SIOS Protection Suite will fail over when a third local recoveryattempt occurs within the 30-minute period. 

Defined temporal recovery policies may be turned on or off. When a temporal recovery policy is off,temporal recovery processing will continue to be done and notifications will appear in the log when thepolicy would have fired; however, no actions will be taken.

Note: It is possible to disable failover and/or local recovery with a temporal recovery policy also inplace. This state is illogical as the temporal recovery policy will never be acted upon if failover or localrecovery are disabled.

Meta PoliciesThe "meta" policies are the ones that can affect more than one other policy at the same time. Thesepolicies are usually used as shortcuts for getting certain system behaviors that would otherwise requiresettingmultiple standard policies.

l NotificationOnly - This mode allows administrators to put SIOS Protection Suite in a "monitoring only"state. Both local recovery and failover of a resource (or all resources in the case of a server-wide policy) are affected. The user interface will indicate a Failure state if a failure is detected; butno recovery or failover action will be taken. Note: The administrator will need to correct the problemthat caused the failuremanually and then bring the affected resource(s) back in service to continuenormal SIOS Protection Suite operations.

Important Considerations for Resource-Level PoliciesResource level policies are policies that apply to a specific resource only, as opposed to an entire resourcehierarchy or server.

Example :

app- IP - file system

In the above resource hierarchy, app depends on both IP and file system. A policy can be set to disablelocal recovery or failover of a specific resource. This means that, for example, if the IP resource's localrecovery fails and a policy was set to disable failover of the IP resource, then the IP resource will notfail over or cause a failover of the other resources. However, if the file system resource's localrecovery fails and the file system resource policy does not have failover disabled, then the entirehierarchy will fail over.

Note: It is important to remember that resource level policies apply only to the specific resource forwhich they are set.

This is a simple example. Complex hierarchies can be configured, so caremust be taken when settingresource-level policies.

SIOS Protection Suite for LinuxPage 122

Page 143: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

The lkpolicy Tool

The lkpolicy ToolThe lkpolicy tool is the command-line tool that allows management (querying, setting, removing) of policies onservers running SIOS Protection Suite for Linux. lkpolicy supports setting/modifying policies, removingpolicies and viewing all available policies and their current settings. In addition, defined policies can be set onor off, preserving resource/server settings while affecting recovery behavior.

The general usage is :

lkpolicy [--list-policies | --get-policies | --set-policy | --remove-policy] <name value pair data...>

The <name value pair data...> differ depending on the operation and the policy beingmanipulated,particularly when setting policies. For example: Most on/off type policies only require --on or --offswitch, but the temporal policy requires additional values to describe the threshold values.

Example lkpolicy Usage

Authenticating With Local and Remote Servers

The lkpolicy tool communicates with SIOS Protection Suite servers via an API that the servers expose. ThisAPI requires authentication from clients like the lkpolicy tool. The first time the lkpolicy tool is asked to accessa SIOS Protection Suite server, if the credentials for that server are not known, it will ask the user forcredentials for that server. These credentials are in the form of a username and password and:

1. Clients must have SIOS Protection Suite admin rights. This means the usernamemust be in thelkadmin group according to the operating system's authentication configuration (via pam). It is notnecessary to run as root, but the root user can be used since it is in the appropriate group by default.

2. The credentials will be stored in the credential store so they do not have to be enteredmanually eachtime the tool is used to access this server.

See Configuring Credentials for SIOS Protection Suite for more information on the credential store and itsmanagement with the credstore utility.

An example session with lkpolicy might look like this:

[root@thor49 ~]# lkpolicy -l -d v6test4Please enter your credentials for the system 'v6test4'.Username: rootPassword:Confirm password:FailoverLocalRecoveryTemporalRecoveryNotificationOnly[root@thor49 ~]# lkpolicy -l -d v6test4FailoverLocalRecovery

SIOS Protection Suite for LinuxPage 123

Page 144: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Listing Policies

TemporalRecoveryNotificationOnly[root@thor49 ~]#

Listing Policies

lkpolicy --list-policy-types

Showing Current Policies

lkpolicy --get-policies

lkpolicy --get-policies tag=\*

lkpolicy --get-policies --verbose tag=mysql\* # all resources starting with mysql

lkpolicy --get-policies tag=mytagonly

Setting Policies

lkpolicy --set-policy Failover --off

lkpolicy --set-policy Failover --on tag=myresource

lkpolicy --set-policy Failover --on tag=\*

lkpolicy --set-policy LocalRecovery --off tag=myresource

lkpolicy --set-policy NotificationOnly --on

lkpolicy --set-policy TemporalRecovery --on recoverylimit=5 period=15

lkpolicy --set-policy TemporalRecovery --on --force recoverylimit=5 period=10

Removing Policies

lkpolicy --remove-policy Failover tag=steve

Note: NotificationOnly is a policy alias. Enabling NotificationOnly is the equivalent of disabling  thecorresponding LocalRecovery and Failover policies.

Configuring CredentialsCredentials for communicating with other systems aremanaged via a credential store. This store can bemanaged, as needed, by the /opt/LifeKeeper/bin/credstore utility. This utility allows server accesscredentials to be set, changed and removed - on a per server basis.

SIOS Protection Suite for LinuxPage 124

Page 145: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Adding or Changing Credentials

Adding or Changing CredentialsAdding and changing credentials are handled in the sameway. A typical example of adding or changingcredentials for a server, server.mydomain.com, would look like this: 

/opt/LifeKeeper/bin/credstore -k server.mydomain.commyuser

In this case,myuser is the username used to access server.mydomain.com and the password will be askedfor via a prompt with confirmation (like passwd). 

Note: The key name used to store LifeKeeper server credentials must match exactly the hostname used incommands such as lkpolicy. If the hostname used in the command is an FQDN, then the credential key mustalso be the FQDN. If the hostname is a short name, then the key must also be the short name.

Youmay wish to set up a default key in the credential store. The default credentials will be used forauthentication when no specific server key exists. To add or change the defaultkey, run:

/opt/LifeKeeper/bin/credstore -k default myuser

Listing Stored CredentialsThe currently stored credentials can be listed by the following command:

/opt/LifeKeeper/bin/credstore -l

This will list the keys stored in the credential store and, in this case, the key indicates the server for which thecredentials are used. (This commandwill not actually list the credentials, only the key names, since thecredentials themselves may be sensitive.)

Removing Credentials for a ServerCredentials for a given server can be removed with the following command:

/opt/LifeKeeper/bin/credstore -d -k myserver.mydomain.com

In this case, the credentials for the servermyserver.mydomain.com will be removed from the store.

Additional InformationMore information on the credstore utility can be found by running:

/opt/LifeKeeper/bin/credstore --man

This will show the entire man/help page for the command.

SIOS Protection Suite for LinuxPage 125

Page 146: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents
Page 147: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Administration Overview

OverviewLifeKeeper does not require administration during operation. LifeKeeper works automatically to monitorprotected resources and to perform the specified recovery actions if a fault should occur. You use theLifeKeeper GUI in these cases:

l Resource and hierarchy definition. LifeKeeper provides these interface options:

l LifeKeeper GUI.

l LifeKeeper command line interface.

l Resource monitoring. The LifeKeeper GUI provides access to resource status information and to theLifeKeeper logs.

l Manual intervention. Youmay need to stop servers or specific resources for maintenance or otheradministrative actions. The LifeKeeper GUI provides menu functions that allow you to bring specificresources in and out of service. Once applications have been placed under LifeKeeper protection, theyshould be started and stopped only through these LifeKeeper interfaces. Starting and stoppingLifeKeeper is done through the command line only.

SeeGUI Tasks andMaintenance Tasks for detailed instructions on performing LifeKeeper administration,configuration andmaintenance operations.

Note:All LifeKeeper executable scripts and programs run via the command line require super user authority.

A super user granted permissions by running the "su" or "sudo" command is able to execute LifeKeepercommands. However, SIOS Technology Corp has tested executing LifeKeeper commands via the root useronly.

Error Detection and NotificationThe ability to provide detection and alarming for problems within an application is critical to building the besttotal fault resilient solution. Since every specific application varies on themechanism and format of failures,no one set of generic mechanisms can be supplied. In general, however, many application configurations canrely on the Core system error detection provided within LifeKeeper. Two common fault situations are used todemonstrate the power of LifeKeeper's core facilities in the topics Resource Error Recovery Scenario andServer Failure Recovery Scenario.

LifeKeeper also provides a complete environment for defining errors, alarms, and events that can triggerrecovery procedures. This interfacing usually requires patternmatch definitions for the system error log(/var/log/messages), or custom-built application specific monitor processes.

SIOS Protection Suite for LinuxPage 127

Page 148: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

N-Way Recovery

N-Way RecoveryN-Way recovery allows different resources to fail over to different backup servers in a cluster.

Return to Protected Resources

Administrator Tasks

Editing Server Properties1. To edit the properties of a server, bring up the Server Properties dialog just as you would for viewing

server properties.

2. If you are logged into that server with the appropriate permissions, the following items will be editable.

l Shutdown Strategy

l Failover Confirmation

3. Once you havemade changes, theApply button will be enabled. Clicking this button will apply yourchanges without closing the window.

4. When you are finished, click OK to save any changes and close the window, orCancel to close thewindow without applying changes.

Creating a Communication PathBefore configuring a LifeKeeper communication path between servers, verify the hardware and softwaresetup. For more information, see the SPS for Linux Release Notes

To create a communication path between a pair of servers, youmust define the path individually on bothservers. LifeKeeper allows you to create both TCP (TCP/IP) and TTY communication paths between a pair ofservers. Only one TTY path can be created between a given pair. However, you can createmultiple TCPcommunication paths between a pair of servers by specifying the local and remote addresses that are to bethe end-points of the path. A priority value is used to tell LifeKeeper the order in which TCP paths to a givenremote server should be used.

IMPORTANT:Using a single communication path can potentially compromise the ability of servers in acluster to communicate with one another. If a single comm path is used and the comm path fails, LifeKeeperhierarchies may come in service onmultiple servers simultaneously. This is known as "false failover".Additionally, heavy network traffic on a TCP comm path can result in unexpected behavior, including falsefailovers and LifeKeeper initialization problems.

1. There are four ways to begin.

l Right-click on a server icon, then click Create Comm Pathwhen the server contextmenu appears.

l On the global toolbar, click theCreate Comm Path button.

SIOS Protection Suite for LinuxPage 128

Page 149: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Creating a Communication Path

l On the server context toolbar, if displayed, click theCreate Comm Path button.

l On the Edit menu, select Server, thenCreate Comm Path.

2. A dialog entitledCreate Comm Pathwill appear. For each of the options that follow, click Help for anexplanation of each choice.

3. Select the Local Server from the list box and click Next.

4. Select one or moreRemote Servers in the list box. If a remote server is not listed in the list box (i.e. itis not yet connected to the cluster), youmay enter it usingAdd. Youmust make sure that the networkaddresses for both the local and remote servers are resolvable (for example, with DNS or added to the/etc/hosts file). Click Next.

5. Select either TCP or TTY forDevice Type and click Next.

6. Select one or more Local IP Addresses if theDevice Typewas set for TCP. Select the Local TTYDevice if theDevice Typewas set to TTY. Click Next.

7. Select theRemote IP Address if theDevice Typewas set for TCP. Select theRemote TTY Deviceif theDevice Typewas set to TTY. Click Next.

8. Enter or select thePriority for this comm path if theDevice Typewas set for TCP. Enter or select theBaud Rate for this Comm Path if theDevice Typewas set to TTY. Click Next.

9. Click Create. A message should be displayed indicating the network connection is successfullycreated. Click Next.

10. If you selectedmultiple Local IP Addresses or multiple Remote Servers and theDevice Typewas setfor TCP, then you will be taken back to Step 6 to continue with the next Comm Path. If you selectedmultiple Remote Servers and theDevice Typewas set for TTY, then you will be taken back to Step 5to continue with the next Comm Path.

11. Click Donewhen presented with the concludingmessage.

You can verify the comm path by viewing the Server Properties Dialog or by entering the commandlcdstatus -q. See the LCD(1M) man page for information on usinglcdstatus. You should see anALIVE status.

In addition, check the server icon in the right pane of the GUI. If this is the first comm path that has beencreated, the server icon shows a yellow heartbeat, indicating that one comm path is ALIVE, but there is no

redundant comm path.

The server icon will display a green heartbeat when there are at least two comm paths ALIVE.

IMPORTANT: When using IPv6 addresses to create a comm path, statically assigned addresses should beused instead of auto-configured/stateless addresses as the latter may change over time which will cause thecomm path to fail.

If the comm path does not activate after a few minutes, verify that the paired server's computer name iscorrect. If using TTY comm paths, verify that the cable connection between the two servers is correct and isnot loose. Use the portio(1M) command if necessary to verify the operation of the TTY connection.

SIOS Protection Suite for LinuxPage 129

Page 150: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Deleting a Communication Path

Deleting a Communication Path1. There are four ways to begin.

l Right-click on a server icon, then click Delete Comm Pathwhen the server context menuappears.

l On the global toolbar, click theDelete Comm Path button.

l On the server context toolbar, if displayed, click theDelete Comm Path button.

l On the Edit menu, select Server, thenDelete Comm Path.

2. A dialog entitled Delete Comm Path will appear. For each of the options that follow, click Help for anexplanation of each choice.

3. Select Local Server from the list and click Next. This dialog will only appear if the delete is selectedusing theDelete Comm Path button on the global toolbar or via the Edit menu selectingServer.

4. Select the communications path(s) that you want to delete and click Next.

5. Click Delete Comm Path(s). If the output panel is enabled, the dialog closes, and the results of thecommands to delete the communications path(s) are shown in the output panel. If not, the dialogremains up to show these results, and you click Done to finish when all results have been displayed.A message should be displayed indicating the network connection is successfully removed

6. Click Done to close the dialog and return to the GUI status display.

Server Properties - FailoverIn the event that the primary server has attempted and failed local recovery, or failed completely, most serveradministrators will want LifeKeeper to automatically restore the protected resource(s) to a backup server. Thisis the default LifeKeeper behavior. However, some administrators may not want the protected resource(s) toautomatically go in-service at a recovery site. For example, if LifeKeeper is installed in aWAN environmentwhere the network connection between the servers may not be reliable in a disaster recovery situation.

Automatic failover is enabled by default for all protected resources. To disable automatic failover for protectedresources or to prevent automatic failover to a backup server, use the Failover section located on theGeneral tab of Server Properties to configure as follows:

For each server in the cluster:

1. Bring up theServer Properties dialog just as you would for viewing server properties.

2. Select theGeneral tab. In the Failover section of the Server Properties dialog, check the server todisable system and resource failover capabilities. By default, all failover capabilities of LifeKeeper areenabled.

In theSet Confirm Failover On column, select the server to be disqualified as a backup server for acomplete failure of the local server.

SIOS Protection Suite for LinuxPage 130

Page 151: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Creating Resource Hierarchies

In theSet Block Resource Failover On column, select the server to be disqualified as a backupserver for any failed resource hierarchy on this local server. Resource failovers cannot be disabledwithout first disabling system failover capabilities.

To commit your selections, press theApply button.

Refer to[Confirm Failover] and [Block Resource Failover]Settings for configuration details.

Creating Resource Hierarchies1. There are four ways to begin creating a resource hierarchy.

l Right-click on a server icon to bring up the server context menu, then click onCreateResource Hierarchy.

SIOS Protection Suite for LinuxPage 131

Page 152: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Application Resource Hierarchies

l On the global toolbar, click on theCreate Resource Hierarchy button.

l On the server context toolbar, if displayed, click on theCreate Resource Hierarchy button.

l On theEditmenu, select Server, then click onCreate Resource Hierarchy.

2. A dialog entitled Create ResourceWizard will appear with a list of all recognized recovery kits installedwithin the cluster. Select the Recovery Kit that builds resource hierarchies to protect your applicationand click Next.

3. Select the Switchback Type and click Next.

4. Select the Server and click Next. Note: If you began from the server context menu, the server will bedetermined automatically from the server icon that you clicked on, and this step will be skipped.

5. Continue through the succeeding dialogs, entering whatever data is needed for the type of resourcehierarchy that you are creating.

LifeKeeper Application Resource HierarchiesIf you install LifeKeeper without any recovery kits, the Select Recovery Kit list includes options for FileSystem or Generic Application by default. TheGeneric Application optionmay be used for applications thathave no associated recovery kits.

If you install the Raw I/O or IP Recovery Kits (both of which are Core Recovery Kits that are packagedseparately and included on the LifeKeeper Coremedia), the Select Recovery Kit list will provide additionaloptions for these Recovery Kits.

See the following topics describing these available options:

l Creating a File System Resource Hierarchy

l Creating aGeneric Application Resource Hierarchy

l Creating a Raw Device Resource Hierarchy

The IP Recovery Kit is documented in the IP Recovery Kit Technical Documentation.

Recovery Kit OptionsEach optional recovery kit that you install adds entries to the Select Recovery Kit list; for example, youmayseeOracle, Apache, and NFS Recovery Kits. Refer to the Administration Guide that accompanies eachrecovery kit for directions on creating the required resource hierarchies.

Note: If you wish to create a File System or other application resource hierarchy that is built on a logicalvolume, then youmust first have the Logical VolumeManager (LVM) Recovery Kit installed.

Creating a File System Resource HierarchyUse this option to protect a file system only (for example, if you have shared files that need protection).

SIOS Protection Suite for LinuxPage 132

Page 153: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Creating a File System Resource Hierarchy

1. There are four ways to begin creating a file system resource hierarchy.

l Right-click on a server icon to bring up the server context menu, then click onCreate ResourceHierarchy.

l On the global toolbar, click on theCreate Resource Hierarchy button.

l On the server context toolbar,  if displayed, click on theCreate Resource Hierarchy button.

l On the Edit menu, select Server, then click onCreate Resource Hierarchy.

2. A dialog entitledCreate ResourceWizardwill appear with aRecovery Kit list. Select File SystemResource and click Next.

3. Select theSwitchback Type and click Next.

4. Select theServer and click Next. Note: If you began from the server context menu, the server will bedetermined automatically from the server icon that you clicked on, and this step will be skipped.

5. TheCreate gen/filesys Resource dialog will now appear. Select theMount Point for the file systemresource hierarchy and click Next. The selectedmount point will be checked to see that it is sharedwith another server in the cluster by checking each storage kit to see if it recognizes themounteddevice as shared. If no storage kit recognizes themounted device, then an error dialog will bepresented:

<file system> is not a shared file system

SelectingOK will return to theCreate gen/filsys Resource dialog.

Note:

l In order for amount point to appear in the choice list, themount point must be currentlymounted. If an entry for themount point exists in the /etc/fstab file, LifeKeeper willremove this entry during the creation and extension of the hierarchy. It is advisable tomake a backup of /etc/fstab prior to using the NAS Recovery Kit, especially if youhave complex mount settings. You can direct that entries are re-populated back into/etc/fstab on deletion by setting the /etc/default/LifeKeeper tunableREPLACEFSTAB=true|TRUE.

l Many of these resources (SIOS DataKeeper, LVM, DeviceMapper Multipath, etc.)require LifeKeeper recovery kits on each server in the cluster in order for the file systemresource to be created. If these kits are not properly installed, then the file system willnot appear to be shared in the cluster.

6. LifeKeeper creates a default Root Tag for the file system resource hierarchy. (This is the label used forthis resource in the status display). You can select this root tag or create your own, then click Next.

7. Click Create Instance. A window will display amessage indicating the status of the instance creation.

8. Click Next. A window will display amessage that the file system hierarchy has been createdsuccessfully.

9. At this point, you can click Continue to move on to extending the file system resource hierarchy, or

SIOS Protection Suite for LinuxPage 133

Page 154: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Creating aGeneric Application Resource Hierarchy

you can click Cancel to return to the GUI. If you click Cancel, you will receive a warningmessage thatyour hierarchy exists on only one server, and it is not protected at this point.

Creating a Generic Application Resource HierarchyUse this option to protect a user-defined application that has no associated recovery kit. Templates areprovided for the user supplied scripts referenced below in$LKROOT/lkadm/subsys/gen/app/templates. Copy these templates to another directory beforecustomizing them for the application that you wish to protect and testing them.

Note: For applications depending upon other resources such as a file system, disk partition, or IP address,create each of these resources separately, and use Create Dependency to create the appropriatedependencies.

1. There are four ways to begin creating a generic application resource hierarchy.

l Right-click on a server icon to bring up the server context menu, then click onCreate ResourceHierarchy.

l On the global toolbar, click on the Create Resource Hierarchy button.

l On the server context toolbar, if displayed, click on theCreate Resource Hierarchy button.

l On the Edit menu, select Server, then click onCreate Resource Hierarchy.

2. A dialog entitled Create ResourceWizard will appear with aRecovery Kit list. Select GenericApplication and click Next.

3. Select theSwitchback Type and click Next.

4. Select theServer and click Next. Note: If you began from the server context menu, the server will bedetermined automatically from the server icon that you clicked on, and this step will be skipped.

5. On the next dialog, enter the path to theRestore Script for the application and click Next. This is thecommand that starts the application. A template restore script, restore.template, is provided in thetemplates directory. The restore script must not impact applications that are already started.

6. Enter the path to theRemove Script for the application and click Next. This is the command thatstops the application. A template remove script, remove.template, is provided in the templatesdirectory.

7. Enter the path to the quickCheck Script for the application and click Next. This is the command thatmonitors the application. A template quickCheck script, quickCheck.template, is provided in thetemplates directory.

8. Enter the path to the Local Recovery Script for the application and click Next. This is the commandthat attempts to restore a failed application on the local server. A template recover script,recover.template, is provided in the templates directory.

9. Enter any Application Information and click Next. This is optional information about the applicationthat may be needed by the restore, remove, recover, and quickCheck scripts.

10. Select either Yes or No forBring Resource In Service, and click Next. Selecting Nowill cause the

SIOS Protection Suite for LinuxPage 134

Page 155: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Creating a Raw Device Resource Hierarchy

resource state to be set to OSU following the create; selecting Yes will cause the previously providedrestore script to be executed. For applications depending upon other resources such as a file system,disk partition, or IP address, select No if you have not already created the appropriate dependentresources.

11. Enter theRoot Tag, which is a unique name for the resource instance. (This is the label you will see forthis resource in the status display.)

12. Click Create Instance to start the creation process. A window will display amessage indicating thestatus of the instance creation.

13. Click Next. A window will display amessage that the hierarchy has been created successfully.

14. At this point, you can click Continue to move on to extending the generic application resourcehierarchy, or you can click Cancel to return to the GUI. If you click Cancel, you will receive a warningthat your hierarchy exists on only one server, and it is not protected at this point.

Note: The scripts which are provided when resource hierarchy is created, such as restore、remove、quickCheck, are located in each directory under LKROOT/subsys/gen/resource/app/.

l restore - actions/!restore/<tag name>

l remove - actions/!remove/<tag name>

l quickCheck - actions/!quickCheck/<tag name>

l recover - recovery/!recover/<tag name>

Creating a Raw Device Resource HierarchyUse this option to protect a raw device resource. For example, if you create additional table space on a rawdevice that needs to be added to an existing database hierarchy, you would use this option to create a rawdevice resource.

Note: LifeKeeper locks shared disk partition resources at the disk logical unit (or LUN) level to one system ina cluster at a time.

1. There are four ways to begin creating a raw device resource hierarchy.

l Right-click on a server icon to bring up the server context menu, then click onCreate ResourceHierarchy.

l On the global toolbar, click on theCreate Resource Hierarchy button.

l On the server context toolbar, if displayed, click on theCreate Resource Hierarchy button.

l On the Edit menu, select Server, then click onCreate Resource Hierarchy.

2. A dialog entitled Create ResourceWizard will appear with aRecovery Kit list. Select Raw Device andclick Next.

3. Select theSwitchback Type and click Next.

4. Select theServer and click Next.

SIOS Protection Suite for LinuxPage 135

Page 156: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Quick Service Protection (QSP) Recovery Kit

Note: If you began from the server context menu, the server will be determined automatically from theserver icon that you clicked on, and this step will be skipped.

5. Select theRaw Partition on a shared storage device where this resource will reside, and click Next.

6. Enter theRoot Tag, which is a unique name for the resource instance. (This is the label you will see forthis resource in the status display.)

7. Click Create Instance to start the creation process. A window titled Creating scsi/raw resource willdisplay text indicating what is happening during creation.

8. Click Next. A window will display amessage that the hierarchy has been created successfully.

9. At this point, you can click Continue to move on the extending the raw resource hierarchy, or you canclick Cancel to return to the GUI. If you click Cancel, you will receive amessage warning that yourhierarchy exists on only one server, and it is not protected at this point

Quick Service Protection (QSP) Recovery Kit

IntroductionQSP Recovery Kit provides a simplifiedmethod to protect OS service. With the QSP Recover Kit users caneasily create a LifeKeeper resource instance to protect an OS service provided that service can be started andstopped by the OS service command. The service can also be protected via the Generic ApplicationRecovery Kit, but the use of that kit requires code development whereas the QSP Recovery Kit does not.Also, by creating a dependency relationship, protected services can be started and stopped in conjunctionwith the application that requires the service. The application is protected by another LifeKeeper resourceinstance not via the QSP resource.

However, the QSP Recovery Kit quickCheck only perform simple health (using the “status” action of theservice command). So, QSP doesn't guarantee that the service is provided, or process is functioning in fact. Ifcomplicated starting and/or stopping is necessary, or more robust health checking operations are necessary,the using aGeneric Application is recommended.

RequirementsThe service to be protected by the QSP Recovery Kit needs tomeet the following requirements.

l It must support start and stop action via the OS service command. Also, it must returns 0 when thestart and stop action succeeds.

l To perform heath checking the servicemust support the status action via the OS service command. Ifit does not support the status action then quickCheck health check operations must be disabled. Also,it must return 0 when the status action succeeds.

l The name of the service to be protectedmust not exceed 256 characters in length and contain onlyalphanumeric characters.

The service to be protected by the QSP resourcemust be running (started) before attempting a resourcecreate. Please notice that some services which are already supplied with dedicated Recovery Kit are not

SIOS Protection Suite for LinuxPage 136

Page 157: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Create QSP Resource Hierarchy

target of QSP (hereinafter referred as “the Services not targeted by QSP protection”) and cannot be protectedby QSP Recovery Kit.

Create QSP Resource HierarchyThis option is used to protect OS services via the QSP Recovery Kit.

1. There are 4methods to start the creation of QSP resource instance.

l Right click the server icon to show the server context menu, and select [Create Resource Hier-archy].

l Click the [Create Resource Hierarchy] button at on the Global tool bar.

l Click the [Create Resource Hierarchy] button on the Server context tool bar if displayed.

l From the [Edit] menu select [Server] then  [Create Resource Hierarchy]

2. A dialogue box titled [Create ResourceWizard] is displayed. In the [Recovery Kit] drop down is a list ofavailable resource types to create. Select Quick Service Protection and click [Next].

3. Select [Switchback Type] and click [Next].

4. Select [Server] and click [Next].Note: If the create was started via the server context menu, this step is skipped because the server isknown based on the start context (defaults to name of the server on which the create process started)..

5. The next dialog box contains a drop down of the available services [Service Name] that can be pro-tected. Select service to be protected and click [Next].Note: The list may not show the service if it is not running. In this case, click [Cancel] to discontinuethe process, and start service. Once the service is running restart the create process. Also, the list willnot show the Services not targeted by QSP protection.

6. In the next dialog box the quickCheck action is configured.To enable the quickCheck monitoring func-tion, select [enable]. To disable it, select [disable]. Click [Next] to continue. The quickCheck actioncan be changed at any time.Note: If the selected service does not support the “status” action via the OS service command, thenset the quickCheck action to “disabled” because theQSP Recovery Kit cannot monitor the servicestate.

7. Input the [Resource Tag]. This is a unique name for the resource instance.(This is the label that uniquely identifies the resource instance and is used whenever displayingLifeKeeper protected resource instances in UI.)

8. Click [Create Instance] to start the creation process. The status of the resource instance creation isdisplayed in the status window.

9. Click [Next] to display the resource extension dialog. Click [Next] to begin the extension process orclick [Cancel] to go back to GUI. When [Cancel] is clicked, an alert is displayed that the hierarchyexist on only one server, and protection by LifeKeeper is not available at this time.

SIOS Protection Suite for LinuxPage 137

Page 158: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Extending QSP Resource Hierarchy

Extending QSP Resource HierarchyThis function, as explained in the section Extending Resource Hierarchies, starts automatically after finishingthe Create QSP Resource Hierarchy (URL) process, or, from right clicking on an existing QSP resource andselecting [Extend Resource Hierarchy]. After finishing the pre-extend process, then, complete the followingsteps.

1. Select [Resource Tag] provided by LifeKeeper, or, input unique tag for the resource hierarchy on the tar-get server.

2. Click [Extend] to start extension process. The status of the extension process is displayed in the dia-logue box, and when it is finished it will show amessage indicating the hierarchy is correctly extended.If the hierarchy is to be extended to another server click [Next Server], otherwise click [Finish] to com-plete the extension. If [Next Server] is selected, the extension operation is repeated.

3. When [Finish] is clicked the integrity of the hierarchy is checked. If any problems are detected theextension is reversed. To complete the verification and close the dialog box click [Done].

QSP Resource ConfigurationThe following parameters are unique to eachQSP resource instance are available for modification.

Set Up Items DefaultValue Description

Monitoring quickCheckSpecified whencreating theresource

Set to enable the checking of the status of the service or to dis-able / skip themonitoring function

TimeOut

restore 0 Specify the restore timeout (unit: second). If set to 0 no timeoutoccurs when restoring the resource instance.

remove 0 Specify the remove timeout (unit: second). If set to 0 notimeout occurs when removing the resource instance.

quickCheck 0Specify the quickCheck timeout (unit: second). If set to 0 notime out occurs when performing health checking of theresource instance.

recover 0 Specify the recover timeout (unit: second). If set to 0 notimeout occurs during recovery of the resource instance.

Checking / Changing of the set value is possible from the [QSP Configuration] tab by Display ResourceProperties. andmust be performed on each node in the hierarchy.If the quickCheck function is disabled, quickCheck and recover of timeouts are not displayed and thus cannotbe changed.

SIOS Protection Suite for LinuxPage 138

Page 159: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

How to change theMonitoring function

How to change the Monitoring function1. Display the [QSP Configuration] tab of the resource properties, and click [Change quickCheck].

2. Select [enable] to enable quickCheck, or, [disable] to disable it.

3. Clicking [Change] starts the change process, and display change process message.

4. Finish by clicking [Done].Note: Modification of these values is a per node operation. If the same change is needed on anothernode, then the process must be repeated on that node.

SIOS Protection Suite for LinuxPage 139

Page 160: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

How to change Timeout Value

How to change Timeout Value1. Display the [QSP Configuration] tab of resource properties, and click [Change Timeout].

2. Select the timeout action to be changed (restore, remove, quickCheck or recover), and click [Next]Note: [quickCheck] and [recover] timeouts are not displayed in the select list if themonitoring functionis disabled.

3. Input the timeout value seconds.Note: Input decimal numbers only. Non numerical characters are invalid.

4. Clicking [Change] starts the timeout change process and displays change process messages.

5. Finish by clicking [Done]Note:Modification of these values is a per node operation. If the same change is needed on anothernode, then the process must be repeated on that node.

SIOS Protection Suite for LinuxPage 140

Page 161: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

How to change Timeout Value

SIOS Protection Suite for LinuxPage 141

Page 162: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Editing Resource Properties

Editing Resource Properties1. To edit the properties of a resource, bring up the Resource Properties dialog just as you would for

viewing resource properties.

2. If you are logged into that server with the appropriate permissions, the following items will be editable.

l Switchback

l Resource Configuration (only for resources with specialized configuration settings)

l Resource Priorities

3. Once you havemade changes, theApply button will be enabled. Clicking this button will apply yourchanges without closing the window.

4. When you are finished, click OK to save any changes and close the window, orCancel to close thewindow without applying changes.

Editing Resource PrioritiesYou can edit or reorder the priorities of servers on which a resource hierarchy has been defined. First, bring upthe Resource Properties dialog just as you would for viewing resource properties. The Resource Propertiesdialog displays the priority for a particular resource on a server in the Equivalencies Tab as shown below.

SIOS Protection Suite for LinuxPage 142

Page 163: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Using the Up and Down Buttons

There are two ways tomodify the priorities:

l Reorder the priorities by moving an equivalency with theUp/Down buttons ,or

l Edit the priority values directly.

Using the Up and Down Buttons1. Select an equivalency by clicking on a row in the Equivalencies table. TheUp and/orDown buttons

will become enabled, depending on which equivalency you have selected. TheUp button is enabledunless you have selected the highestpriority server. TheDown button is enabled unless you haveselected thelowest priority server.

2. Click Up orDown to move the equivalency in the priority list.

The numerical priorities columnwill not change, but the equivalency will move up or down in the list.

SIOS Protection Suite for LinuxPage 143

Page 164: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Editing the Priority Values

Editing the Priority Values1. Select a priority by clicking on a priority value in the Priority column of the Equivalencies table. A box

appears around the priority value, and the value is highlighted.

2. Enter the desired priority and press Enter.

l Note:Valid server prioritiesare 1 to 999.

After you have edited the priority, the Equivalencies table will be re-sorted.

Applying Your ChangesOnce you have the desired priority order in the Equivalencies table, click Apply (orOK) to commit yourchanges. TheApply button applies any changes that have beenmade. TheOK button applies any changesthat have beenmade and then closes the window. TheCancel button closes the window without saving anychanges made sinceApplywas last clicked.

Extending Resource HierarchiesThe LifeKeeperExtend Resource Hierarchy option copies an existing hierarchy from one server and createsa similar hierarchy on another LifeKeeper server. Once a hierarchy is extended to other servers, cascadingfailover is available for that resource. The server where the existing hierarchy currently resides is referred toas the template server. The server where the new extended hierarchy will be placed is referred to as the targetserver.

The target server must be capable of supporting the extended hierarchy and it must be able to communicatewith equivalent hierarchies on other remote servers (via active LifeKeeper communications paths). Thismeans that all recovery kits associated with resources in the existing hierarchy must already be installed onthe target server, as well as every other server where the hierarchy currently resides.

1. There are five ways to extend a resource hierarchy through theGUI.

l Create a new resource hierarchy. When the dialog tells you that the hierarchy has been created,click on theContinue button to start extending your new hierarchy via the Pre-ExtendWizard.

l Right-click on a global or server-specific resource icon to bring up the resource context menu,then click on Extend Resource Hierarchy to extend the selected resource via the Pre-ExtendWizard.

l On the global toolbar, click on theExtend Resource Hierarchy button. When the Pre-ExtendWizard dialog appears, select a Template Server and a Tag to Extend, clicking onNext aftereach choice.

l On the resource context toolbar, if displayed, click on theExtend Resource Hierarchy buttonto bring up the Pre-ExtendWizard.

l On the Edit menu, select Resource, then click onExtend Resource Hierarchy. When the Pre-ExtendWizard dialog appears, select a Template Server and a Tag to Extend, clicking onNext after each choice.

SIOS Protection Suite for LinuxPage 144

Page 165: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Extending a File System Resource Hierarchy

2. Either select the default Target Server or enter one from the list of choices, then click Next.

3. Select theSwitchback Type, then click Next.

4. Either select the default or enter your own Template Priority, then click Next.

5. Either select or enter your own Target Priority, then click Next.

6. The dialog will then display the pre-extend checks that occur next. If these tests succeed, LifeKeepergoes on to perform any steps that are needed for the specific type of resource that you are extending.

TheAccept Defaults button which is available for theExtend Resource Hierarchy option is intended for theuser who is familiar with the LifeKeeper Extend Resource Hierarchy defaults, and wants to quickly extenda LifeKeeper resource hierarchy without being prompted for input or confirmation. Users who prefer to extenda LifeKeeper resource hierarchy using the interactive, step-by-step interface of the GUI dialogs should use theNext button.

Note:ALL roots in amulti-root hierarchy must be extended together, that is, they may not be extended assingle root hierarchies.

Note: For command line instructions, see Extending the SAP Resource from the Command Line in the SAPDocumentation.

Extending a File System Resource HierarchyThis operation can be started automatically after you have finished creating a file system resource hierarchy,or from an existing file system resource, as described in the section on extending resource hierarchies.  Afteryou have done that, you then complete the steps below, which are specific to file system resources.

1. TheExtend gen/filesys Resource Hierarchy dialog box appears. Select theMount Point for the filesystem hierarchy, then click Next.

2. Select theRoot Tag that LifeKeeper offers, or enter your own tag for the resource hierarchy on thetarget server, then click Next.

3. The dialog displays the status of the extend operation, which should finish with amessage saying thatthe hierarchy has been successfully extended. Click Next Server if you want to extend the sameresource hierarchy to a different server. This will repeat the extend operation. Or click Finish tocomplete this operation.

4. The dialog then displays verification information as the extended hierarchy is validated. When this isfinished, theDone button will be enabled. Click Done to finish.

Extending a Generic Application Resource HierarchyThis operation can be started automatically after you have finished creating a generic application resourcehierarchy, or from an existing generic application resource, as described in the section on extending resourcehierarchies. After you have done that, you then complete the steps below, which are specific to genericapplication resources.

1. Select theRoot Tag that LifeKeeper offers, or enter your own tag for the resource hierarchy on thetarget server, then click Next.

SIOS Protection Suite for LinuxPage 145

Page 166: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Extending a Raw Device Resource Hierarchy

2. Enter any Application Information next (optional), then click Next.

3. The dialog displays the status of the extend operation, which should finish with amessage saying thatthe hierarchy has been successfully extended. Click Next Server if you want to extend the sameresource hierarchy to a different server. This will repeat the extend operation. Or click Finish tocomplete this operation.

4. The dialog then displays verification information as the extended hierarchy is validated. When this isfinished, theDone button will be enabled. Click Done to finish.

Extending a Raw Device Resource HierarchyThis operation can be started automatically after you have finished creating a raw device resourcehierarchy, or from an existing raw device resource, as described in the section on extending resourcehierarchies. After you have done that, you then complete the steps below, which are specific to raw deviceresources.

1. Select theRoot Tag that LifeKeeper offers, or enter your own tag for the resource hierarchy on thetarget server, then click Next.

2. The dialog displays the status of the extend operation, which should finish with amessage saying thatthe hierarchy has been successfully extended. Click Next Server if you want to extend the sameresource hierarchy to a different server. This will repeat the extend operation. Or click Finish tocomplete this operation.

3. The dialog then displays verification information as the extended hierarchy is validated. When this isfinished, theDone button will be enabled. Click Done to finish.

Unextending a HierarchyThe LifeKeeperUnextend Resource Hierarchy option removes a complete hierarchy, including all of itsresources, from a single server. This is different than theDelete Resource Hierarchy selection whichremoves a hierarchy from all servers.

When usingUnextend Resource Hierarchy, the server from which the existing hierarchy is to be removed isreferred to as the target server.

TheUnextend Resource Hierarchy selection can be used from any LifeKeeper server that has activeLifeKeeper communications paths to the target server.

1. There are five possible ways to begin.

l Right-click on the icon for the resource hierarchy/server combination that you want tounextended. When the resource context menu appears, click Unextend Resource Hierarchy.

l Right-click on the icon for the global resource hierarchy that you want to unextended. When theresource context menu appears, click Unextend Resource Hierarchy. When the dialog comesup, select the server in the Target Server list from which you want to unextended the resourcehierarchy, and click Next.

l On the global toolbar, click theUnextend Resource Hierarchy button. When the dialog comes

SIOS Protection Suite for LinuxPage 146

Page 167: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Creating a Resource Dependency

up, select the server in the Target Server list from which you want to unextended the resourcehierarchy, and click Next. On the next dialog, select the resource hierarchy that you want tounextended from theHierarchy to Unextend list, and click Next again.

l On the resource context toolbar, if displayed, click theUnextend Resource Hierarchy button.

l On the Edit menu, point toResource and then click Unextend Resource Hierarchy. When thedialog comes up, select the server in the Target Server list from which you want to unextendedthe resource hierarchy, and click Next. On the next dialog, select the resource hierarchy thatyou want to unextended from theHierarchy to Unextend list, and click Next again.

2. The dialog will display amessage verifying the server and resource hierarchy that you have specifiedto be unextended. Click Unextend to perform the action.

3. If the output panel is enabled, the dialog closes, and the results of the commands to unextended theresource hierarchy are shown in the output panel. If not, the dialog remains up to show these results,and you click Done to finish when all results have been displayed.

Creating a Resource DependencyWhile most Recovery Kits create their dependencies during the original resource hierarchy creation task,under certain circumstances, youmay want to create new or additional resource dependencies or deleteexisting ones. An examplemight be that you wish to change an existing IP dependency to another IP address.Instead of deleting the entire resource hierarchy and creating a new one, you can delete the existing IPdependency and create a new dependency with a different IP address.

1. There are four possible ways to begin.

l Right-click on the icon for the parent server-specific resource under the server, or the parentglobal resource, to which you want to add a parent-child dependency. When the resourcecontext menu appears, click Create Dependency.

Note: If you right-clicked on a server-specific resource in the right pane, the value of theServerwill be that server. If you right-clicked on a global resource in the left pane, the value of theServerwill be the server where the resource has the highest priority.

l On the global toolbar, click theCreate Dependency button. When the dialog comes up,select the server in theServer list from which you want to begin creating the resourcedependency, and click Next. On the next dialog, select the parent resource from theParentResource Tag list, and click Next again.

l On the resource context toolbar, if displayed, click theCreate Dependency button.

l On the Edit menu, point toResource and then click Create Dependency. When the dialogcomes up, select the server in theServer list from which you want to begin creating theresource dependency, and click Next. On the next dialog, select the parent resource from theParent Resource Tag list, and click Next again.

2. Select aChild Resource Tag from the drop down box of existing and valid resources on the server.The dialog will display all the resources available on the server with the following exceptions:

SIOS Protection Suite for LinuxPage 147

Page 168: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Deleting a Resource Dependency

l The parent resource, its ancestors, and its children.

l A resource that has not been extended to the same servers as the parent resource.

l A resource that does not have the same relative priority as the parent resource.

l Any resource that is not in-service on the same server as the parent, if the parent resource is in-service.

Click Next to proceed to the next dialog.

3. The dialog will then confirm that you have selected the appropriate parent and child resource tags foryour dependency creation. Click Create Dependency to create the dependency on all servers in thecluster to which the parent has been extended.

4. If the output panel is enabled, the dialog closes, and the results of the commands to create thedependency are shown in the output panel. If not, the dialog remains up to show these results, and youclick Done to finish when all results have been displayed.

Deleting a Resource Dependency1. There are four possible ways to begin.

l Right-click on the icon for the parent server-specific resource under the server, or the parentglobal resource, from which you want to delete a parent-child dependency. When the resourcecontext menu appears, click Delete Dependency.

l On the global toolbar, click theDelete Dependency button. When the dialog comes up, selectthe server in theServer list from which you want to begin deleting the resource dependency,and click Next. On the next dialog, select the parent resource from theParent Resource Taglist, and click Next again.

l On the resource context toolbar, if displayed, click theDelete Dependency button.

l On the Edit menu, point toResource and then click Delete Dependency. When the dialogcomes up, select the server in theServer list from which you want to begin deleting theresource dependency, and click Next. On the next dialog, select the parent resource from theParent Resource Tag list, and click Next again.

2. Select theChild Resource Tag from the drop down box. This should be the tag name of the child inthe dependency that you want to delete. Click Next to proceed to the next dialog box.

3. The dialog then confirms that you have selected the appropriate parent and child resource tags for yourdependency deletion. Click Delete Dependency to delete the dependency on all servers in the cluster.

4. If the output panel is enabled, the dialog closes, and the results of the commands to delete thedependency are shown in the output panel. If not, the dialog remains up to show these results, and youclick Done to finish when all results have been displayed.

Deleting a Hierarchy from All Servers1. There are five possible ways to begin.

SIOS Protection Suite for LinuxPage 148

Page 169: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Deleting a Hierarchy from All Servers

l Right-click on the icon for a resource in the hierarchy that you want to delete under the serverwhere you want the deletion to begin. When the resource context menu appears, click DeleteResource Hierarchy.

l Right-click on the icon for a global resource in the hierarchy that you want to delete. When theresource context menu appears, click Delete Resource Hierarchy. When the dialog comesup, select the server in the Target Server list from which you want to begin deleting theresource hierarchy, and click Next.

l On the global toolbar, click theDelete Resource Hierarchy button. When the dialog comes up,select the server in the Target Server list from which you want to begin deleting the resourcehierarchy, and click Next. On the next dialog, select a resource in the hierarchy that you want todelete from theHierarchy to Delete list, and click Next again.

l On the resource context toolbar in the properties panel, if displayed, click theDelete ResourceHierarchy button.

l On the Edit menu, point toResource and then click Delete Resource Hierarchy. When thedialog comes up, select the server in the Target Server list from which you want to begindeleting the resource hierarchy, and click Next. On the next dialog, select a resource in thehierarchy that you want to delete from theHierarchy to Delete list, and click Next again.

2. The dialog will display amessage verifying the hierarchy you have specified for deletion. Click Deleteto perform the action.

3. If the output panel is enabled, the dialog closes, and the results of the commands to delete thehierarchy are shown in the output panel. If not, the dialog remains up to show these results, and youclick Done to finish when all results have been displayed.

SIOS Protection Suite for LinuxPage 149

Page 170: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents
Page 171: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper User GuideThe User Guide is a complete, searchable resource containing detailed information on themany tasks thatcan be performed within the LifeKeeper GUI. Click User Guide to access this documentation.

The tasks that can be performed through theGUI can be grouped into three areas:

Common Tasks - These are basic tasks that can be performed by any user such as connecting to a cluster,viewing server or resource properties, viewing log files and changing GUI settings.

Operator Tasks - These aremore advanced tasks that require Operator permission, such as bringingresources in and out of service.

Administrator Tasks - These are tasks that require Administrator permission. They include server-level taskssuch as editing server properties, creating resources, creating or deleting comm paths and resource-leveltasks such as editing, extending, or deleting resources.

The table below lists the default tasks that are available for each user permission. Additional tasks may beavailable for specific resource types, and these will be described in the associated resource kitdocumentation.

TaskPermission

Guest Operator AdministratorView servers and resources X X X

Connect to and disconnect from servers X X X

View server properties and logs X X X

Modify server properties X

Create resource hierarchies X

Create and delete comm paths X

View resource properties X X X

Modify resource properties X

Take resources into and out of service X X

Extend and unextend resource hierarchies X

Create and delete resource dependencies X

Delete resource hierarchies X

SIOS Protection Suite for LinuxPage 151

Page 172: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Using LifeKeeper for Linux

Using LifeKeeper for LinuxThe following topics provide detailed information on the LifeKeeper graphical user interface (GUI) as well asthemany tasks that can be performed within the LifeKeeper GUI.

GUITheGUI components should have already been installed as part of the LifeKeeper Core installation. 

The LifeKeeper GUI uses Java technology to provide a graphical user interface to LifeKeeper and itsconfiguration data. Since the LifeKeeper GUI is a client/server application, a user will run the graphical userinterface on a client system in order to monitor or administer a server system where LifeKeeper is running. Theclient and the server components may or may not be on the same system. 

GUI Overview - GeneralTheGUI allows users working on any machine to administer, operate or monitor servers and resources in anycluster as long as they have the required groupmemberships on the cluster machines. (For details, seeConfiguring GUI Users.) TheGUI Server and Client components are described below.

GUI ServerTheGUI server communicates with GUI clients using Hypertext Transfer Protocol (HTTP) and RemoteMethod Invocation (RMI). By default, the GUI server is initialized during LifeKeeper startup, but this can alsobe configured -- see Starting/Stopping the GUI Server. 

GUI ClientTheGUI client can be run either as an application on any LifeKeeper server or as a web client on any Java-enabled system.

The client includes the following components:

l The status table on the upper left displays the high level status of connected servers and theirresources.

l The properties panel on the upper right displays detailed information about themost recently selectedstatus table object.

l The output panel on the bottom displays command output.

l Themessage bar at the very bottom of the window displays processing status messages.

l The context (in the properties panel) and global toolbars provide fast access to frequently used tasks.

l The context (popup) and global menus provide access to all tasks.

SIOS Protection Suite for LinuxPage 152

Page 173: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Exiting GUI Clients

Exiting GUI ClientsSelect Exit from the File Menu to disconnect from all servers and close the client.

The LifeKeeper GUI Software PackageThe LifeKeeper GUI is included in the steeleye-lkGUI software package which is bundled with the LifeKeeperCore Package Cluster. The steeleye-lkGUI package:

l Installs the LifeKeeper GUI Client in Java archive format.

l Installs the LifeKeeper GUI Server.

l Installs the LifeKeeper administration web server.

Note: The LifeKeeper administration web server is configured to use Port 81, which should be differentfrom any public web server.

l Installs a Java policy file in /opt/LifeKeeper/htdoc/ which contains theminimum permissions requiredto run the LifeKeeper GUI. The LifeKeeper GUI application uses the java.policy file in this location foraccess control.

l Prepares LifeKeeper for GUI administration.

Before continuing, you should ensure that the LifeKeeper GUI package has been installed on the LifeKeeperserver(s). You can enter the command rpm -qi steeleye-lkGUI to verify that this package is installed.You should see output including the package name steeleye-lkGUI if the GUI package is installed.

Menus

SIOS LifeKeeper for Linux Menus

Resource Context Menu

SIOS Protection Suite for LinuxPage 153

Page 174: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Server Context Menu

The Resource Context Menu appears when you right-click on a global (cluster-wide) resource, as shownabove, or a server-specific resource instance, as shown below, in the status table. The default resourcecontext menu is described here, but this menumight be customized for specific resource types, in which casethemenu will be described in the appropriate resource kit documentation.

The actions are invoked for the resource that you select. If you select a resource instance on a specificserver, the action is invoked for that server while if you select a global (cluster-wide) resource, you will need toselect the server.

In Service. Bring a resource hierarchy into service.

Out of Service. Take a resource hierarchy out of service.

Extend Resource Hierarchy. Copy a resource hierarchy to another server for failover support.

Unextend Resource Hierarchy. Remove an extended resource hierarchy from a single server.

Create Dependency. Create a parent/child relationship between two resources.

Delete Dependency. Remove a parent/child relationship between two resources.

Delete Resource Hierarchy. Remove a resource hierarchy from all servers in the LifeKeeper cluster.

Properties. Display the Resource Properties Dialog. 

Server Context MenuThe Server Context Menu appears when you right-click on a server icon in the status table. This menu is thesame as the Edit Menu's Server submenu except that the actions are always invoked on the server that youinitially selected.

SIOS Protection Suite for LinuxPage 154

Page 175: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

File Menu

Disconnect. Disconnect from a cluster.

Refresh. Refresh GUI.

View Logs. View LifeKeeper logmessages on connected servers.

Create Resource Hierarchy. Create a resource hierarchy.

Create Comm Path. Create a communication path between servers.

Delete Comm Path. Remove communication paths from a server.

Properties. Display the Server Properties Dialog. 

File Menu

Connect. Connect to a LifeKeeper cluster. Connection to each server in the LifeKeeper cluster requires loginauthentication on that server.

Exit. Disconnect from all servers and close the GUI window.

SIOS Protection Suite for LinuxPage 155

Page 176: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Edit Menu - Resource

Edit Menu - Resource

In Service. Bring a resource hierarchy into service.

Out of Service. Take a resource hierarchy out of service.

Extend Resource Hierarchy. Copy a resource hierarchy to another server for failover support.

Unextend Resource Hierarchy. Remove an extended resource hierarchy from a single server.

Create Dependency. Create a parent/child relationship between two resources.

Delete Dependency. Remove a parent/child relationship between two resources.

Delete Resource Hierarchy. Remove a resource hierarchy from all servers in the LifeKeeper cluster.

Properties. Display the Resource Properties Dialog. 

Edit Menu - Server

SIOS Protection Suite for LinuxPage 156

Page 177: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

View Menu

Disconnect. Disconnect from a cluster.

Refresh. Refresh GUI.

View Logs. View LifeKeeper logmessages on connected servers.

Create Resource Hierarchy. Create a resource hierarchy.

Create Comm Path. Create a communication path between servers.

Delete Comm Path. Remove communication paths from a server.

Properties. Display the Server Properties Dialog. 

View Menu

Expand Tree. Expand the entire resource hierarchy tree.

Collapse Tree. Collapse the entire resource hierarchy tree.

Row Height. Modify the row viewing size of the resources in the resource hierarchy tree and table. Selectsmall, medium or large row height depending upon the number of resources displayed.

Column Width. Modify the columnwith viewing size of the resources in the resource hierarchy tree andtable. Select fill available space, large, medium or small depending upon the resource displayed.

Resource Labels. This option group allows you to specify whether resources are viewed in the resourcehierarchy tree by their tag name or ID.

Sort Resources by Label. will sort resources by resource label only.

SIOS Protection Suite for LinuxPage 157

Page 178: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

HelpMenu

Group Resources by Cluster. will sort by server cluster and resource label such that resources belonging inthe same cluster of servers will be grouped together.

Comm Path Redundancy Warning. specifies the representation of comm path status in the server statusgraphic.

l If selected, the display will show a server warning graphic if the comm paths between a set of serversare not configured with a redundant comm path.

l If not selected, the display will ignore a lack of redundant comm paths between a pair of servers but willstill present server warning graphic if there are comm path failures.

Global Toolbar. Display this component if the checkbox is selected.

Message Bar. Display this component if the checkbox is selected.

Properties Panel. Display this component if the checkbox is selected.

Output Panel. Display this component if the checkbox is selected.

History. Display the newest messages that have appeared in theMessage Bar in the LifeKeeper GUIMessage History dialog box (up to 1000 lines).

Help Menu

Technical Documentation. Displays the landing page of the SIOS Technology Corp. TechnicalDocumentation. 

About.... Displays LifeKeeper GUI version information.

Toolbars

SIOS LifeKeeper for Linux Toolbars

GUI ToolbarThis toolbar is a combination of the default server and resource context toolbars which are displayed on theproperties panel except that youmust select a server and possibly a resource when you invoke actions fromthis toolbar.

SIOS Protection Suite for LinuxPage 158

Page 179: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

GUI Toolbar

Connect. Connect to a LifeKeeper cluster.

Disconnect. Disconnect from a LifeKeeper cluster.

Refresh. Refresh GUI.

View Logs. View LifeKeeper logmessages on connected servers.

Create Resource Hierarchy. Create a resource hierarchy.

Delete Resource Hierarchy. Remove a resource hierarchy from all servers in the LifeKeepercluster.

Create Comm Path. Create a communication path between servers.

Delete Comm Path. Remove communication paths from a server.

In Service. Bring a resource hierarchy into service.

SIOS Protection Suite for LinuxPage 159

Page 180: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resource Context Toolbar

Out of Service. Take a resource hierarchy out of service.

Extend Resource Hierarchy. Copy a resource hierarchy to another server for failover support.

Unextend Resource Hierarchy. Remove an extended resource hierarchy from a single server.

Create Dependency. Create a parent/child relationship between two resources.

Delete Dependency. Remove a parent/child relationship between two resources.

Migrate Hierarchy toMulti-Site Cluster. Migrate an existing hierarchy to aMulti-Site ClusterEnvironment.

Resource Context ToolbarThe resource context toolbar is displayed in the properties panel when you select a server-specific resourceinstance in the status table. 

The actions are invoked for the server and the resource that you select. Actions that are not available forselection for a resource will be grayed out.

In Service. Bring a resource hierarchy into service.

SIOS Protection Suite for LinuxPage 160

Page 181: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Server Context Toolbar

Out of Service. Take a resource hierarchy out of service.

Extend Resource Hierarchy. Copy a resource hierarchy to another server for failover support.

Unextend Resource Hierarchy. Remove an extended resource hierarchy from a single server.

Add Dependency. Create a parent/child relationship between two resources.

Remove Dependency. Remove a parent/child relationship between two resources.

Delete Resource Hierarchy. Remove a resource hierarchy from all servers.

Server Context ToolbarThe server context toolbar is displayed in the properties panel when you select a server in the status table.The actions are invoked for the server that you select.

Disconnect. Disconnect from a LifeKeeper cluster.

Refresh. Refresh GUI.

SIOS Protection Suite for LinuxPage 161

Page 182: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Preparing to Run theGUI

View Logs. View LifeKeeper logmessages on connected servers.

Create Resource Hierarchy. Create a resource hierarchy.

Delete Resource Hierarchy. Remove a resource hierarchy from all servers in the LifeKeepercluster.

Create Comm Path. Create a communication path between servers.

Delete Comm Path. Remove communication paths from a server.

Preparing to Run the GUI

LifeKeeper GUI - OverviewThe LifeKeeper GUI uses Java technology to provide a graphical status interface to LifeKeeper and itsconfiguration data. Since the LifeKeeper GUI is a client/server application, a user will run the graphical userinterface on a client system in order to monitor or administer a server system where LifeKeeper is executing.The client and the server may or may not be the same system. The LifeKeeper GUI allows users working onany machine to administer, operate, or monitor servers and resources in any cluster, as long as they have therequired groupmemberships on the cluster machines. [For details, see Configuring GUI Users.] TheLifeKeeper GUI Server and Client components are described below.

GUI ServerThe LifeKeeper GUI server is initialized on each server in a LifeKeeper cluster at system startup. Itcommunicates with the LifeKeeper core software via the Java Native Interface (JNI), and with the LifeKeeperGUI client using RemoteMethod Invocation (RMI).

GUI ClientThe LifeKeeper GUI client is designed to run either as an application on a Linux system, or as an applet whichcan be invoked from aweb browser on either aWindows or Unix system.

The LifeKeeper GUI client includes the following graphical components:

SIOS Protection Suite for LinuxPage 162

Page 183: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Starting GUI clients

l The status table on the upper left displays the high level status of connected servers and theirresources.

l The properties panel on the upper right displays detailed information about themost recently selectedstatus table object.

l The output panel on the bottom displays command output.

l Themessage bar at the very bottom of the window displays processing status messages.

l The server context and resource context toolbars (in the properties panel) and global toolbar providefast access to frequently-used tasks.

l The server context and resource context menus (popup) and global menus (file, edit server, editresource, view, and help) provide access to all tasks.

Right-clicking on a graphic resource, server, or table cell will display a context menu. Most tasks can also beinitiated from these context menus, in which case the resources and servers will be automatically determined.

Starting GUI clients

Starting the LifeKeeper GUI AppletTo run the LifeKeeper GUI applet via the web open your favorite web browser and go to the URL http://<servername>:81 where <server name> is the name of a LifeKeeper server. This will load the LifeKeeper GUIapplet from the LifeKeeper GUI server on that machine.

After it has finished loading, you should see the Cluster Connect dialog, which allows you to connect to anyGUI server.

NOTE: When you run the applet, if your system does not have the required Java Plug-in, you will beautomatically taken to the web site for downloading the plug-in. Youmust also set your browser securityparameters to enable Java.

If you have done this and the client still is not loading, see GUI Troubleshooting.

Starting the application clientUsers with administrator privileges on a LifeKeeper server can run the application client from that server. Tostart the LifeKeeper GUI app run /opt/LifeKeeper/bin/lkGUIapp from a graphical window.

If you have done this and the client still is not loading, see GUI Troubleshooting.

Exiting GUI ClientsSelect Exit from the File menu to disconnect from all servers and close the client.

Configuring the LifeKeeper GUI

Configuring the LifeKeeper Server for GUI Administration

SIOS Protection Suite for LinuxPage 163

Page 184: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Running the GUI

Perform the following steps for each LifeKeeper server. Each step contains references or links for moredetailed instructions.

1. Youmust install the Java Runtime Environment (JRE) or Java Software Development Kit (JDK) oneach server. See the SPS for Linux Release Notes for the required Java version and URL to accessthe required download. Note: Youmay install the JRE from the SPS Installation Image File by runningthe setup script from the installation image file and opting only to install the JRE. (See the SPS forLinux Installation Guide for more information.)

2. Start the LifeKeeper GUI Server on each server (see Starting/Stopping the GUI Server). Note: Oncethe GUI Server has been started following an initial installation, starting and stopping LifeKeeper willstart and stop all LifeKeeper daemon processes including the GUI Server.

3. If you plan to allow users other than root to use the GUI, then you need to Configure GUI Users.

Running the GUIYou can run the LifeKeeper GUI:

l on the LifeKeeper server in the cluster and/or

l on a remote system outside the cluster

See Running the GUI on the LifeKeeper Server for information on configuring and running the GUI on a serverin your LifeKeeper cluster.

See Running the GUI on a Remote System for information on configuring and running the GUI on a remotesystem outside your LifeKeeper cluster.

GUI Configuration

Item Description

GUI client andserver com-munication

The LifeKeeper GUI client and server use Java RemoteMethod Invocation (RMI) tocommunicate.  For RMI to work correctly, the client and server must use resolvable host-names or IP addresses.  If DNS is not implemented (or names are not resolvable usingother name lookupmechanisms), edit the /etc/hosts file on each client and server toinclude the names and addresses of all other LifeKeeper servers.

SIOS Protection Suite for LinuxPage 164

Page 185: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

GUI Limitations

GUI ServerJava platform

The LifeKeeper GUI server requires that the Java Runtime Environment (JRE) - Javavirtual machine, the Java platform core classes and supporting files - be installed. TheJRE for Linux is available on the SPS for Linux Installation Image File (See the SPS forLinux Installation Guide) or it can be downloaded directly fromhttp://www.oracle.com/technetwork/java/javase/downloads/index.html. (Note: Ifdownloading directly from this site, make sure you download Version 1.8 (x64). 32-bitJRE is not supported.)

Note: By default, the LifeKeeper GUI server expects the JRE on each server to beinstalled in the directory /usr/java/jre1.8.0_51. If this is not found, it will look in thedirectory /usr/java/jdk1.8.0_51 for a Java Software Development Kit (JDK). If you wantto use a JRE or JDK in another directory location, youmust edit the PATH in theLifeKeeper default file /etc/default/LifeKeeper to include the directory containing the javainterpreter, java.exe. If LifeKeeper is running when you edit this file, you should stop andrestart the LifeKeeper GUI server to recognize the change. Otherwise, the LifeKeeperGUI will not be able to find the Java command.

Java remoteobject registryserver port

The LifeKeeper GUI server uses port 82 for the Java remote object registry on eachLifeKeeper server. This should allow servers to support RMI calls from clients behindtypical firewalls.

LifeKeeperadministrationweb server

The LifeKeeper GUI server requires an administration web server for client browser com-munication. Currently, the LifeKeeper GUI server is using a private copy of the lighttpdweb server for its administration web server. This web server is installed and configuredby the steeleye-lighttpd package and uses port 81 to avoid a conflict with other web serv-ers.

GUI client net-work access

LifeKeeper GUI clients require network access to all hosts in the LifeKeeper cluster.When running the LifeKeeper GUI client in a browser, you will have to lower the securitylevel to allow network access for applets. Be careful not to visit other sites with securityset to low values (e.g., change the security settings only for intranet or trusted sites).

GUI Limitations

Item Description

GUI inter-operabilityrestriction

The LifeKeeper for Linux client may only be used to administer LifeKeeper on Linux serv-ers. The LifeKeeper for Linux GUI will not interoperate with LifeKeeper forWindows.

Starting and Stopping the GUI Server

To Start the LifeKeeper GUI ServerIf the LifeKeeper GUI Server is not running, type the following command as root:

/opt/LifeKeeper/bin/lkGUIserver start

SIOS Protection Suite for LinuxPage 165

Page 186: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Troubleshooting

This command starts all LifeKeeper GUI Server daemon processes on the server being administered if theyare not currently running. A message similar to the following is displayed.

# Installing GUI Log# LK GUI Server Startup at:# Mon May 8 14:14:46 EDT 2006# LifeKeeper GUI Server Startup completed at:# Mon May 8 14:14:46 EDT 2006

Once the LifeKeeper GUI Server is started, all subsequent starts of LifeKeeper will automatically start theLifeKeeper GUI Server processes.

TroubleshootingThe LifeKeeper GUI uses Ports 81 and 82 on each server for its administration web server and Java remoteobject registry, respectively. If another application is using the same ports, the LifeKeeper GUI will notfunction properly. These values may be changed by editing the following entries in the LifeKeeper default file/etc/default/LifeKeeper.

GUI_WEB_PORT=81 GUI_RMI_PORT=82

Note: These port values are initialized in the GUI server at start time. If you alter them, you will need tostop and restart the steeleye-lighttpd process. These values must be the same across all clusters towhich you connect.

To Stop the LifeKeeper GUI ServerIf the LifeKeeper GUI Server is running, type the following command as root:

/opt/LifeKeeper/bin/lkGUIserver stop

This command halts all LifeKeeper GUI Server daemon processes on the server being administered if they arecurrently running. The followingmessages are displayed.

# LifeKeeper GUI Server Shutdown at:# Fri May 19 15:37:27 EDT 2006# LifeKeeper GUI Server Shutdown Completed at:# Fri May 19 15:37:28 EDT 2006

LifeKeeper GUI Server ProcessesTo verify that the LifeKeeper GUI Server is running, type the following command:

ps -ef | grep runGuiSer

You should see output similar to the following:

root 2805 1 0 08:24 ? 00:00:00 sh/opt/LifeKeeper/bin/runGuiSer

To see a list of the other GUI Server daemon processes currently running, type the following command:

ps -ef | grep S_LK

SIOS Protection Suite for LinuxPage 166

Page 187: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Java Security Policy

You should see output similar to the following:

root 30228 30145 0 11:20 ? 00:00:00 java -Xint -Xss3M-DS_LK=true -Djava.rmi.server.hostname=thor48 ...

Java Security PolicyThe LifeKeeper GUI uses policy-based access control. When theGUI client is loaded, it is assignedpermissions based on the security policy currently in effect. The policy, which specifies permissions that areavailable for code from various signers/locations, is initialized from an externally configurable policy file.

There is, by default, a single system-wide policy file and an optional user policy file. The system policy file,which is meant to grant system-wide code permissions, is loaded first, and then the user policy file is added toit. In addition to these policy files, the LifeKeeper GUI policy file may also be loaded if the LifeKeeper GUI isinvoked as an application.

Location of Policy FilesThe system policy file is by default at:

<JAVA.HOME>/lib/security/java.policy (Linux)

<JAVA.HOME>\lib\security\java.policy (Windows)

Note: JAVA.HOME refers to the value of the system property named "JAVA.HOME", which specifies thedirectory into which the JRE or JDK was installed.

The user policy file starts with `.` and is by default at:

<USER.HOME>\.java.policy

Note: USER.HOME refers to the value of the system property named "user.home", which specifies theuser's home directory. For example, the home directory on aWindows NT workstation for a user named Paulmight be "paul.000".

ForWindows systems, the user.home property value defaults to

C:\WINNT\Profiles\<USER> (on multi-user Windows NT systems)

C:\WINDOWS\Profiles\<USER> (on multi-user Windows 95/98 systems)

C:\WINDOWS (on single-user Windows 95/98 systems)

The LifeKeeper GUI policy file is by default at:

/opt/LifeKeeper/htdoc/java.policy (Linux)

Policy File Creation and ManagementBy default, the LifeKeeper GUI policy file is used when the LifeKeeper GUI is invoked as an application. If youare running the LifeKeeper GUI as an applet, you will need to create a user policy file in your home directory if

SIOS Protection Suite for LinuxPage 167

Page 188: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Granting Permissions in Policy Files

one does not already exist. The user policy file should specify theminimum permissions required to run theLifeKeeper GUI, which are provided in the "Sample Policy File" section later in this topic.

A policy file can be created andmaintained via a simple text editor, or via the graphical Policy Tool utilityincluded with the Java Runtime Environment (JRE) or Java Development Kit (JDK). Using the Policy Toolsaves typing and eliminates the need for you to know the required syntax of policy files. For information aboutusing the Policy Tool, see the Policy Tool documentation athttp://docs.oracle.com/javase/8/docs/technotes/tools/.

The simplest way to create a user policy filewith theminimum permissions required to run the LifeKeeperGUI is to copy the LifeKeeper GUI policy file located in /opt/LifeKeeper/htdoc/java.policy to your homedirectory and rename it .java.policy (note the leading dot before the filenamewhich is required). On aWindowssystem, you can copy the LifeKeeper GUI policy file by opening the file http://<server name>:81/java.policy(where <server name> is the host name of a LifeKeeper server) and saving it as .java.policy in your homedirectory. If you need to determine the correct location for a user policy file, enable the Java Console using theJava Control Panel and start the LifeKeeper GUI as an applet. The home directory path for the user policy filewill be displayed in the Java Console.

Granting Permissions in Policy FilesA permission represents access to a system resource. In order for a resource access to be allowed for anapplet, the corresponding permissionmust be explicitly granted to the code attempting the access. Apermission typically has a name (referred to as a "target name") and, in some cases, a comma-separated listof one or more actions. For example, the following code creates a FilePermission object representing readaccess to the file named abc in the /tmp directory:

perm = new java.io.FilePermission("/tmp/abc","read");

In this, the target name is "/tmp/abc" and the action string is "read".

A policy file specifies what permissions are allowed for code from specified code sources. An example policyfile entry granting code from the /home/sysadmin directory read access to the file /tmp/abc is:

grant codeBase "file:/home/sysadmin/" { permissionjava.io.FilePermission"/tmp/abc", "read"; };

Sample Policy FileThe following sample policy file includes theminimum permissions required to run the LifeKeeper GUI. Thispolicy file is installed in /opt/LifeKeeper/htdoc/java.policy by the LifeKeeper GUI package.

/** Permissions needed by the LifeKeeper GUI. You may want to * restrict this by codebase. However, if you do this, remember* that the recovery kits can have an arbitrary jar component* with an arbitrary codebase, so you'll need to alter the grant* to cover these as well.*/grant {

SIOS Protection Suite for LinuxPage 168

Page 189: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Java Plug-In

/** Need to be able to do this to all machines in the* LifeKeeper cluster. You may restrict the network* specification accordingly.*/permission java.net.SocketPermission"*", "accept,connect,resolve";/** We use URLClassLoaders to get remote properties files and* jar pieces. */permission java.lang.RuntimePermission"createClassLoader";/** The following are needed only for the GUI to run as an* application (the default RMI security manager is more* restrictive than the one a browser installs for its* applets.*/permission java.util.PropertyPermission "*","read";permission java.awt.AWTPermission "*";permission java.io.FilePermission "<<ALL FILES>>","read,execute";

};

Java Plug-InRegardless of the browser you are using (see supported browsers), the first time your browser attempts toload the LifeKeeper GUI, it will either automatically download the Java Plug-In software or redirect you to aweb page to download and install it. From that point forward, the browser will automatically invoke the JavaPlug-in software every time it comes across web pages that support the technology.

Downloading the Java Plug-inJava Plug-in software is included as part of the Java Runtime Environment (JRE) for Solaris, Linux andWindows. Downloading the JRE typically takes a total of three to tenminutes, depending on your networkand system configuration size. The download web page provides more documentation and installationinstructions for the JRE and Java Plug-in software.

Note 1: You should close and restart your browser after installing the plug-in and whenever plug-in propertiesare changed.

Note 2: Only Java Plug-in version 8 update 51 is supported with LifeKeeper.

Running the GUI on a Remote SystemYoumay administer LifeKeeper from a Linux, Unix orWindows system outside the LifeKeeper cluster byrunning the LifeKeeper GUI as a Java applet. Configuring and running the GUI in this environment isdescribed below.

SIOS Protection Suite for LinuxPage 169

Page 190: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Configuring the GUI on a Remote System

Configuring the GUI on a Remote SystemIn order to run the LifeKeeper GUI on a remote Linux, Unix orWindows system, your browser must provide fullJDK 1.7 (x64) applet support. Refer to the SPS for Linux Release Notes for information on the supportedplatforms and browsers for the LifeKeeper GUI.

1. If you are running the LifeKeeper GUI as an applet, you need to create a user policy file in your homedirectory if one does not already exist. The user policy file should specify theminimum permissionsrequired to run the LifeKeeper GUI.

l The simplest way to create a user policy file with theminimum permissions required torun the LifeKeeper GUI is to copy the LifeKeeper GUI policy file located in/opt/LifeKeeper/htdoc/java.policy to your home directory and rename it .java.policy(note there is a leading dot in the file name that is required). On aWindows system, youcan copy the LifeKeeper GUI policy file by opening the file http://<servername>:81/java.policy (where <servername> is the host name of a LifeKeeper server),and saving it as .java.policy in your home directory. If you need to determine the correctlocation for a user policy file, enable the Java Console using the Java Control Panel,and start the LifeKeeper GUI as an applet. The home directory path for the user policy filewill be displayed in the Java Console.

l If you already have a user policy file, you can add the required entries specifiedin/opt/LifeKeeper/ htdoc/java.policy on a LifeKeeper server into the existing file usinga simple text editor. See Java Security Policy for further information.

2. Youmust set your browser security parameters to low. This generally includes enabling of Java andJava applets. Since there are several different browsers and versions, the instructions for settingbrowser security parameters are covered in Setting Browser Security Parameters for the GUI Applet.

Note: It is important to use caution in visiting external sites with low security settings.

3. When you run the GUI for the first time, if you are usingNetscape or Internet Explorer and yoursystem does not have the required Java plug-in, youmay be automatically taken to the appropriateweb site for downloading the plug-in. See the SPS for Linux Release Notes for the required Java Plug-in version and URL to access the download.

Running the GUI on a Remote SystemAfter you have completed the tasks described above, you are ready to run the LifeKeeper GUI as a Javaapplet on a remote system.

1. Open the URL, http://<server name>:81, for the LifeKeeper GUI webpage (where <server name> isthe name of the LifeKeeper server). The web page contains the LifeKeeper splash screen and applet.When the web page is opened, the following actions take place:

l the splash screen is displayed

l the applet is loaded

l the Java Virtual Machine is started

SIOS Protection Suite for LinuxPage 170

Page 191: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Applet Troubleshooting

l some server files are downloaded

l the applet is initialized

Depending upon your network and system configuration, these actions may take up to 20 seconds.Typically, browsers provide someminimal status as the applet is loading and initializing.

If everything loads properly, aStart button should appear in the applet area. If the splash screen doesnot display aStart button or you suspect that the applet failed to load and initialize, refer to AppletTroubleshooting or see Network-Related Troubleshooting.

2. When prompted, click Start. The LifeKeeper GUI appears and the Cluster Connect Dalog isautomatically displayed. Once a Server has been entered and connection to the cluster established,the GUI window displays a visual representation and status of the resources protected by theconnected servers. TheGUI menus and toolbar buttons provide LifeKeeper administration functions.

Note: Some browsers add “Warning: Applet Window” to windows and dialogs created by an applet.This is normal and can be ignored.

Applet TroubleshootingIf you suspect that the applet failed to load and initialize, try the following:

1. Verify that applet failed. Usually amessage is printed somewhere in the browser window specifyingthe state of the applet. InNetscape and Internet Explorer, an iconmay appear instead of the applet inaddition to some text status. Clicking this iconmay bring up a description of the failure.

2. Verify that you have installed the Java Plug-in. If your problem appears to be Java Plug-in related, referto the Java Plug-in topic.

3. Verify that you havemet the browser configuration requirements, especially the security settings.Refer to Setting Browser Security Parameters for the GUI Applet for more information. If you don't findanything obviously wrong with your configuration, continue with the next steps.

4. Open the Java Console.

l For Firefox, Netscape and older versions of Internet Explorer, run the Java Plug-Inapplet from your machine's Control Panel and select the option to show the console,then restart your browser.

l For recent versions of Internet Explorer, select Tools > Java Console. If you do notsee the Java Consolemenu item, select Tools > Manage Add-Ons and enable theconsole, after which youmay need to restart your browser before the console willappear.

l ForMozilla, select Tools > Web Development > Java Console.

5. Reopen the URL, http://<server name>:81 to start the GUI applet. If you'vemodified the Java Plug-In Control Panel, restart your browser.

6. Check the console for any messages. Themessages should help you resolve the problem. If theproblem appears to be network related, refer to Network-Related Troubleshooting.

SIOS Protection Suite for LinuxPage 171

Page 192: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Running the GUI on a LifeKeeper Server

Running the GUI on a LifeKeeper ServerThe simplest way to run the LifeKeeper GUI is as an application on a LifeKeeper server. By doing so you are,in effect, running the GUI client and server on the same system.

1. After configuring the LifeKeeper server for GUI Administration, you can run the GUI as an applicationon the server by entering the following command as root:

/opt/LifeKeeper/bin/lkGUIapp

2. The lkGUIapp script sets the appropriate environment variables and starts the application. As theapplication is loading, an application identity dialog or splash screen for LifeKeeper appears.

3. After the application is loaded, the LifeKeeper GUI appears and the Cluster Connect dialog isautomatically displayed. Enter the Server Name you wish to connect to, followed by the login andpassword.

4. Once a connection to the cluster is established, the GUI window displays a visual representation andstatus of the resources protected by the connected servers. TheGUI menus and toolbar buttonsprovide administration functions.

Browser Security Parameters for GUI AppletWARNING: Be careful of other sites you visit with security set to low values. 

Firefox1. From theEdit menu, select Preferences.

2. In thePreferences dialog box, select Content.

3. Select theEnable Java andEnable Java Script options.

4. Click Close.

Internet ExplorerThemost securemethod for using Internet Explorer is to add the LifeKeeper server to the Trusted Sites zoneas follows:

1. From the Tools menu, click Internet Options.

2. Click theSecurity tab.

3. Select Trusted Sites zone and click Custom Level.

4. UnderReset custom settings, selectMedium/Low, then click Reset.

5. Click Sites.

SIOS Protection Suite for LinuxPage 172

Page 193: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Status Table

6. Enter the server name and port number for the LifeKeeper server(s) to which you wish to connect (forinstance: http://server1:81).

An alternative, but possibly less securemethod, is to do the following:

1. From the Tools menu, click Internet Options.

2. Select either Internet or Local Intranet (depending upon whether your remote system and theLifeKeeper cluster are on the same intranet).

3. Adjust theSecurity Level bar toMedium (for Internet) orMedium-low (for Local Intranet). These arethe default settings for each zone.

4. Click OK.

Status TableThe status table provides a visual representation of the status of connected servers and their resources. Itshows:

l the state of each server in the top row,

l the global (cross-server) state and the parent-child relationships of each resource in the left-mostcolumn, and

l the state of each resource on each server in the remaining cells.

The states of the servers and resources are shown using graphics, text and color. An empty table cell under aserver indicates that a particular resource has not been defined on that server.

If you select a server or a resource instance in the status table, detailed state information and a context-sensitive toolbar for that item are shown in the properties panel. You can also pop up the appropriate servercontext menu or resource context menu for any item by right-clicking on that cell.

The status table is split into two sections. The relative sizes of the left and right sections can bemodified bymoving the divider between them. The status table can also be collapsed to show only the highest level itemsin the hierarchy trees. Collapsing or expanding resource items in the tree causes the hierarchies listed in thetable to also expand and collapse.

Properties PanelThe properties panel displays the properties of the server or resource that is selected in the status table. Theproperties panel has the same functionality as the server properties dialog or the resource properties dialog,plus a context-sensitive toolbar to provide fast access to commonly used commands. The caption at the topof this panel is server_name if a server is selected, or server_name: resource_name if a resource isselected.

The context-sensitive toolbars displayed in the properties panel are the server context toolbar and theresource context toolbar. Server or resource toolbars may also be customized. For more information oncustomized toolbars, see the corresponding application recovery kit documentation.

The buttons at the bottom of the properties panel function as follows:

SIOS Protection Suite for LinuxPage 173

Page 194: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Output Panel

l TheApply button applies any changes that have beenmade to editable properties on the panel. Thisbutton is only enabled if you have changed an editable property.

l TheReset button queries the server for the current values of all properties, clearing any changes thatyoumay havemade. This button is always enabled.

l TheHelp button displays context-sensitive help for the properties panel. This button is alwaysenabled.

You increase or decrease the size of the properties panel by sliding the separator at the left of the panel to theleft or right. If you want to open or close this panel, use theProperties Panel checkbox on the View Menu. 

Output PanelThe output panel collects output from commands issued by the LifeKeeper GUI client. When a commandbegins to run, a time stamped label is added to the output panel, and all of the output from that command isadded under this label. If you are runningmultiple commands at the same time (typically on different servers),the output from each command is sent to the corresponding sectionmaking it easy to see the results of each.

You increase or decrease the size of the output panel by sliding the separator at the top of the panel up ordown. If you want to open or close this panel, use theOutput Panel checkbox on the View Menu. When theoutput panel is closed, the dialog that initiates each commandwill stay up, the output will be displayed on thatdialog until you dismiss it and you will not be able to review the output from any command after you haveclosed that dialog. After the output panel is reopened, the LifeKeeper GUI will return to its default behavior.

Message BarThemessage bar appears beneath the status window. It is used for displayingmessages in a single text line.Message such as "Connecting to Server X" or "Failure to connect to Server X" might be displayed.

To hide themessage bar, clear theMessage Bar checkbox in the View Menu.

To display themessage bar, select theMessage Bar checkbox in the View Menu.

To see a history of messages displayed in themessage bar, see ViewingMessage History. 

Exiting the GUISelect Exit from the File Menu to disconnect from all servers and close the GUI window.

Common TasksThe following are basic tasks that can be performed by any user.

Starting LifeKeeperAll SPS software is installed in the directory /opt/LifeKeeper.

SIOS Protection Suite for LinuxPage 174

Page 195: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Starting LifeKeeper Server Processes

When you have completed all of the verification tasks, you are ready to start LifeKeeper on both servers. Thissection provides information for starting the LifeKeeper server daemon processes. The LifeKeeper GUIapplication is launched using a separate command and is described in Configuring the LifeKeeper GUI.LifeKeeper provides a command line interface that starts and stops the LifeKeeper daemon processes. Thesedaemon processes must be running before you start the LifeKeeper GUI.

Starting LifeKeeper Server ProcessesIf LifeKeeper is not currently running on your system, type the following command as the user root on allservers:

/etc/init.d/lifekeeper start

Following the delay of a few seconds, an informational message is displayed.

Note: If you receive an error message referencing the LifeKeeper Distribution Enabling Packagewhenyou start LifeKeeper, you should install / re-install the LifeKeeper Installation Image File.

See the LCD(1M)man page by enteringman LCD at the command line for details on the/etc/init.d/lifekeeper start command.

Enabling Automatic LifeKeeper RestartWhile the above commandwill start LifeKeeper, it will need to be performed each time the system is re-booted. If you would like LifeKeeper to start automatically when server boots up, type the following command:

chkconfig lifekeeper on

See the chkconfigman page for further information.

Stopping LifeKeeperIf you need to stop LifeKeeper, type the following command as root to stop it:

/etc/init.d/lifekeeper stop-nofailover

This commandwill shut down LifeKeeper on the local system if it is currently running. It will first remove allprotected resources from service on the local system then shut down the LifeKeeper daemons. Protectedresources will not fail over to another system in the cluster. LifeKeeper will automatically restart when thesystem is restarted.

/etc/init.d/lifekeeper stop-daemons

This commandwill skip the section that removes resources from service. The resources will remain runningon the local system but will no longer be protected by LifeKeeper. This command should be used with caution,

SIOS Protection Suite for LinuxPage 175

Page 196: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Disabling Automatic LifeKeeper Restart

because if resources are not gracefully shut down, then items such as SCSI locks will not be removed. If thesystem onwhich this command is executed subsequently fails or is shut down, the system(s) will NOTinitiate failover of the appropriate resources. LifeKeeper will automatically restart when the system isrestarted.

/etc/init.d/lifekeeper stop

This commandwill remove the resources from service but does not set the !nofailover! flag [seeLCDIflag(1M)] on any of the systems that it can communicate with. This means that failover will occur ifthe shutdown_switchover flag is set. If shutdown_switchover is not set, then this command behavesthe same as /etc/init.d/lifekeeper stop-nofailover. LifeKeeper will automatically restart whenthe system is restarted.

/etc/init.d/lifekeeper stop-failover

This commandwill remove resources from service and initiate failover. It will behave the same as/etc/init.d/lifekeeper stop when the shutdown_switchover flag is set. LifeKeeper willautomatically restart when the system is restarted.

Disabling Automatic LifeKeeper RestartIf you do not want LifeKeeper to automatically restart when the system is restarted, type the followingcommand:

chkconfig lifekeeper off

See the chkconfigman page for further information.

Viewing LifeKeeper ProcessesTo see a list of all LifeKeeper core daemon processes currently running, type the following command:

ps -ef | grep LifeKeeper | grep -w bin | grep -v lklogmsg

An example of the output is provided below:

root 11663 11662 0 14:03 pts/0 00:00:00 /bin/bash /etc/redhat-lsb/lsb_start_daemon/opt/LifeKeeper/sbin/runsvdir -P /opt/LifeKeeper/etc/service log: runit juststarted....................................................................................................................................................................

root 11666 11663 0 14:03 pts/0 00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2> &1 ;/opt/LifeKeeper/sbin/runsvdir -P /opt/LifeKeeper/etc/service log: runit juststarte

SIOS Protection Suite for LinuxPage 176

Page 197: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Viewing LifeKeeper GUI Server Processes

d.................................................................................................................................................................

root 11880 11873 0 14:03 ? 00:00:00 /opt/LifeKeeper/bin/lk_logmgr -l/opt/LifeKeeper/out -d/etc/default/LifeKeeper

root 12240 11877 0 14:04 ? 00:00:00 /opt/LifeKeeper/bin/lcm

root 12247 11879 0 14:04 ? 00:00:00 /opt/LifeKeeper/bin/ttymonlcm

root 12250 11876 0 14:04 ? 00:00:00 /opt/LifeKeeper/bin/lcd

root 12307 11874 0 14:04 ? 00:00:00 /opt/LifeKeeper/bin/lkcheck

root 12311 11875 0 14:04 ? 00:00:00 /opt/LifeKeeper/bin/lkscsid

root 12325 11871 0 14:04 ? 00:00:00 /opt/LifeKeeper/bin/lkvmhad

root 12335 12330 0 14:04 ? 00:00:00 /opt/LifeKeeper/bin/perl /opt/LifeKeeper/htdoc/cgi-bin/DoRequest.fcgi

The run state of LifeKeeper can be determined via the following command:

/opt/LifeKeeper/bin/lktest

If LifeKeeper is running it will output something similar to the following:

F S UID PID PPID C CLS PRI NI SZ STIME TIME CMD

4 S root 12240 11877 0 TS 39 -20 6209 14:04 00:00:00 lcm

4 S root 12247 11879 0 TS 39 -20 30643 14:04 00:00:00 ttymonlcm

4 S root 12250 11876 0 TS 29 -10 9575 14:04 00:00:00 lcd

If LifeKeeper is not running, then nothing is output and the command exists with a 1.

Note: There are additional LifeKeeper processes running that start, stop, andmonitor the LifeKeeper coredaemon processed along with those required for the Graphical User Interface (GUI). See Viewing LifeKeeperControlling Processes and Viewing LifeKeeper GUI Server Processes for a list of the processes. Additionally,most LifeKeeper processes have a child lklogmsg to capture and log any unexpected output.

Viewing LifeKeeper GUI Server ProcessesTo verify that the LifeKeeper GUI Server is running, type the following command:

ps -ef | grep runGuiSer

You should see output similar to the following:

root 2805 1 0 08:24 ? 00:00:00 sh /opt/LifeKeeper/bin/runGuiServer

To see a list of the other GUI Server daemon processes currently running, type the following command:

ps -efw | grep S_LK

You should see output similar to the following:

SIOS Protection Suite for LinuxPage 177

Page 198: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Viewing LifeKeeper Controlling Processes

root 819 764 0 Oct16 ? 00:00:00 java -Xint -Xss3M -DS_LK=true -Djava.rmi.server.hostname=wake -Dcom.steeleye.LifeKeeper.rmiPort=82 -Dcom.steeleye.LifeKeeper.LKROOT=/opt/LifeKeeper -DGUI_RMI_REGISTRY=internal -DGUI_WEB_PORT=81 com.steeleye.LifeKeeper.beans.S_LK

To verify that the LifeKeeper GUI Server AdministrationWeb Server is running type the following command:

ps -ef|grep steeleye-light | egrep -v "lklogmsg|runsv"

You should see output similar to the following:

root 12330 11872 0 14:04 ? 00:00:00 /opt/LifeKeeper/sbin/steeleye-lighttpd -D -f/opt/LifeKeeper/etc/lighttpd/lighttpd.conf

Viewing LifeKeeper Controlling ProcessesTo verify that the LifeKeeper controlling processes are running, type the following command:

ps -ef | grep runsv

You should see output similar to the following:

root 11663 11662 0 14:03 pts/0 00:00:00 /bin/bash /etc/redhat-lsb/lsb_start_daemon/opt/LifeKeeper/sbin/runsvdir -P /opt/LifeKeeper/etc/service log: runit juststarted....................................................................................................................................................................

root 11666 11663 0 14:03 pts/0 00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ;/opt/LifeKeeper/sbin/runsvdir -P /opt/LifeKeeper/etc/service log: runit juststarted..................................................................................................................................................................

root 11667 11666 0 14:03 pts/0 00:00:00 /opt/LifeKeeper/sbin/runsvdir -P/opt/LifeKeeper/etc/service log: runit juststarted....................................................................................................................................................................

root 11871 11667 0 14:03 ? 00:00:00 /opt/LifeKeeper/sbin/runsv lkvmhad

root 11872 11667 0 14:03 ? 00:00:00 /opt/LifeKeeper/sbin/runsv steeleye-lighttpd

root 11873 11667 0 14:03 ? 00:00:00 /opt/LifeKeeper/sbin/runsv lk_logmgr

root 11874 11667 0 14:03 ? 00:00:00 /opt/LifeKeeper/sbin/runsv lkcheck

root 11875 11667 0 14:03 ? 00:00:00 /opt/LifeKeeper/sbin/runsv lkscsid

root 11876 11667 0 14:03 ? 00:00:00 /opt/LifeKeeper/sbin/runsv lcd

root 11877 11667 0 14:03 ? 00:00:00 /opt/LifeKeeper/sbin/runsv lcm

root 11878 11667 0 14:03 ? 00:00:00 /opt/LifeKeeper/sbin/runsv lkguiserver

SIOS Protection Suite for LinuxPage 178

Page 199: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Connecting Servers to a Cluster

root 11879 11667 0 14:03 ? 00:00:00 /opt/LifeKeeper/sbin/runsv ttymonlcm

These processes start, stop, andmonitor LifeKeeper core daemon processes andmust be running to startLifeKeeper. These processes are configured by default to start when the system boots and this behaviorshould not be altered.

Connecting Servers to a Cluster1. There are two possible ways to begin.

l On the global toolbar, click theConnect button.

l On the File Menu, click Connect.

2. In theServer Name field of the Cluster Connect dialog, enter the name of a server within the cluster towhich you want to connect.

Note: If using an IPv6 address, this address will need to be enclosed in brackets [ ]. This will allow aconnection to be established through amachine's IPv6 address. Alternatively, a name can be assignedto the address, and that name can then be used to connect.

3. In the Login andPassword fields, enter the login name and password of a user with LifeKeeperauthorization on the specified server.

4. Click OK.

If the GUI successfully connects to the specified server, it will continue to connect to (and add to the statusdisplay) all known servers in the cluster until no new servers are found.

Note: If the initial login name and password fails to authenticate the client on a server in the cluster, the user isprompted to enter another login name and password for that server. If "Cancel" is selected from the Passworddialog, connection to that server is aborted and theGUI continues connecting to the rest of the cluster.

Disconnecting From a ClusterThis task disconnects your GUI client from all servers in the cluster, and it does so through the server youselect.

SIOS Protection Suite for LinuxPage 179

Page 200: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Viewing Connected Servers

1. There are three possible ways to begin.

l On theGlobal Toolbar, click theDisconnect button.

l On the Edit Menu, select Server and then click Disconnect.

l On the Server Context Toolbar, if displayed, click theDisconnect button.

2. In theSelect Server in Cluster list of the Cluster Disconnect Dialog, select the name of a server inthe cluster from which you want to disconnect.

3. Click OK. A Confirmation dialog listing all the servers in the cluster is displayed.

4. Click OK in theConfirmation dialog to confirm that you want to disconnect from all servers in thecluster.

After disconnecting from the cluster, all servers in that cluster are removed from theGUI status display.

Viewing Connected ServersThe state of a server can be determined by looking at the graphic representation of the server in the table'sheader as shown below. See Viewing the Status of a Server for an explanation of the server states indicatedvisually by the server icon.

Viewing the Status of a ServerThe state of a server can be determined by looking at the graphic representation of the server in the table'sheader as shown below.

Server State Visualstate What it Means

ALIVE

Client has valid connection to the server.

Comm paths originating from this server to an ALIVE remote server are ALIVE.

Comm paths whichmay bemarked DEAD and which target a DEAD server areignored because the DEAD server will be reflected in its own graphic.

SIOS Protection Suite for LinuxPage 180

Page 201: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Viewing Server Properties

ALIVE

Client has valid connection to the server.

One or more comm paths from this server to a given remote server aremarked asDEAD.

No redundant comm path exists from this server to a given remote server.

DEAD Reported as DEAD by other servers in the cluster.

UNKNOWN Network connection was lost. Last known LifeKeeper state is ALIVE.

Viewing Server Properties1. There are two possible ways to begin.

l Right-click on the icon for the server for which you want to view the properties. When the ServerContext Menu appears, click Properties. Server properties will also be displayed in theProperties Panel if it is enabled when clicking on the server.

l On the Edit Menu, point toServer and then click Properties. When the dialog comes up, selectthe server for which you want to view the properties from the Server list.

2. If you want to view properties for a different server, select that server from the dialog's Server list.

3. When you are finished, click OK to close the window.

Viewing Server Log Files1. There are four ways to begin.

l Right-click on a server icon to display the Server Context Menu, then click View Log to bring upthe LifeKeeper Log Viewer Dialog.

l On theGlobal Toolbar, click theView Log button, then select the server that you want to viewfrom the Server list in the LifeKeeper Log Viewer Dialog. 

l On the Server Context Toolbar, if displayed, click theView Log button.

l On the Edit Menu, point toServer, click View Log, then select the server that you want to viewfrom the Server list in the LifeKeeper Log Viewer Dialog. 

2. If you started from theGlobal Toolbar or theEdit Menu and you want to view logs for a differentserver, select that server from theServer list in the LifeKeeper Log Viewer Dialog. This feature is notavailable if you selectedView Logs from theServer Context Menu orServer Context Toolbar.

3. When you are finished, click OK to close the Log Viewer dialog.

SIOS Protection Suite for LinuxPage 181

Page 202: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Viewing Resource Tags and IDs

Viewing Resource Tags and IDsA resource's tag and ID can be viewed quickly by positioning the cursor over a resource icon in the statuswindow and clicking the left mouse button once (single-click). The resource tag and ID of the server havingthe lowest priority number are displayed in themessage bar. To display the resource tag and ID for a resourceon a specific server, single-click the appropriate resource instance cell in the table.

Messages displayed in themessage bar look similar to the following:

Resource Tag = ipdnet0-153.98.87.73, Resource ID = IP-153.98.87.73

Under certain circumstances, the GUI may not be able to determine the resource ID, in which case only theresource tag is displayed in themessage bar.

Viewing the Status of ResourcesThe status or state of a resource is displayed in two formats: Global Resource Status (across all servers),and theServer Resource Status (on a single server). The global resource status is shown in theResourceHierarchy Tree in the left pane of the status window. The server resource status is found in the table cellwhere the resource row intersects with the server column.

Server Resource StatusThe following figure shows servers with resource statuses of active, standby and unknown.

l All resources on "wallace" are active

l All resources on "gromit", "pat","mike" and "batman" are standby

l All resources on "bullwinkle" are unknown

SIOS Protection Suite for LinuxPage 182

Page 203: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Global Resource Status

ServerResource

State

VisualState What it Means

Active Resource is operational on this server and protected. (ISP)

Degraded Resource is operational on this server, but not protected by a backupresource. (ISU)

StandBy Server can take over operation of the resource. (OSU)

Failed Problem with resource detected on this server. For example, an attempt tobring the resource in-service failed. (OSF)

Unknown Resource has not been initialized (ILLSTATE), or LifeKeeper is not runningon this server.

Emptypanel Server does not have the resource defined.

Global Resource Status

VisualState Description What it Means / Causes

Normal Resource is active (ISP) and all backups are active.

Warning Resource is active (ISP). One or more backups aremarked as unknown or failed (OSF).

SIOS Protection Suite for LinuxPage 183

Page 204: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Viewing Resource Properties

VisualState Description What it Means / Causes

Failed. Resource is not active on anyservers (OSF).

Resource has been taken out-of-service for normalreasons.

Resource has stopped running by unconventionalmeans.

Recovery has not been completed or has failed.

Unknown. Could not determine statefrom available information.

More than one server is claiming to be active.

Lost connection to server.

All server resource instances are in an unknown state.

Viewing Resource Properties1. There are three possible ways to begin.

l Right-click on the icon for the resource/server combination for which you want to view theproperties. When the Resource Context Menu appears, click Properties. Resource propertieswill also be displayed in the Properties Panel if it is enabled.

l Right-click on the icon for the global resource for which you want to view the properties. Whenthe Resource Context Menu appears, click Properties. When the dialog comes up, select theserver for which you want to view that resource from theServer list.

l On the Edit Menu, point toResource and then click Properties. When the dialog comes up,select the resource for which you want to view properties from theResource list, and the serverfor which you want to view that resource from theServer list.

2. If you want to view properties for a different resource, select that resource from theResource list.

3. If you want to view resource properties for a different server, select that server from theServer list.

4. When you are finished, click OK to close the window.

Resource Labels

Resource LabelsThis option group allows you to specify whether resources are viewed in the resource hierarchy tree by theirtag name or ID.

Note: The resource tag/ID shown in the resource hierarchy tree belongs to the server having the lowestpriority number. If you wish to see the tag/ID for a resource on a specific server, left-click the resourceinstance cell in the table and its tag/ID will be displayed in themessage bar.

By tag name:

SIOS Protection Suite for LinuxPage 184

Page 205: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

ViewingMessage History

By ID:

Viewing Message History1. On the View Menu, click History. The LifeKeeper GUI Message History dialog is displayed.

2. If you want to clear all messages from the history, click Clear.

3. Click OK to close the dialog.

TheMessage History dialog displays themost recent messages from themessage bar. The history list candisplay amaximum of 1000 lines. When themaximum number of lines is exceeded, the new messages will"push out" the oldest messages.

Thesemessages represent only the actions between the client and the server and are displayed inchronological order, themost recent messages appearing at the top of the list.

Reading the Message History<-- indicates that themessage is incoming from a server and typically has a format of:

<--"server name":"action"

<--"server name":"app res": "action"

<--"server name":"res instance":"action"

--> indicates that themessage is outgoing from a client and typically has a format of:

-->"server name":"action"

-->"server name":"app res": "action"

-->"server name":"res instance":"action"

SIOS Protection Suite for LinuxPage 185

Page 206: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Expanding and Collapsing a Resource Hierarchy Tree

TheClear button clears the history but does not close the dialog.

TheOK button closes the dialog without clearing the history.

Expanding and Collapsing a Resource Hierarchy TreeIn this segment of the tree, the resource file_system_2 isexpanded and the resource nfs-/opt/qe_auto/NFS/export1 iscollapsed.

 appears to the left of a resource icon if it is expanded.

 appears if it is collapsed.

To expand a resource hierarchy tree,

l Click the  or

l Double-click the resource icon to the right of a  .

To expand all resource hierarchy trees,

l On theView Menu, click Expand Tree or

l Double-click the Resource Hierarchy Tree button in the column header in the left pane of theStatuswindow.

Note: The resource tag/ID shown in the resource hierarchy tree belongs to the server having the lowestpriority number. If you wish to see the tag/ID for a resource on a specific server, left-click the resourceinstance cell in the table and its tag/ID will be displayed in themessage bar.

To collapse a resource hierarchy tree,

l click the   or

l double-click the resource icon to the right of a  .

To collapse all resource hierarchy trees,

l On theView Menu, click Collapse Tree or

l Double-click theResource Hierarchy Tree button in the column header in the left pane of the Status

SIOS Protection Suite for LinuxPage 186

Page 207: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Cluster Connect Dialog

window

Note: The "9" and "0" keys are defined as hot/accelerator keys to facilitate quickly expanding or collapsing allresource hierarchy trees.

Cluster Connect Dialog

Server Name. The name of the server to which you want to connect.

Login. The login name of a user with LifeKeeper authorization on the server to which you want to connect.

Password. The password that authorizes the specified login on the server to which you want to connect.

Cluster Disconnect Dialog

Select Server in Cluster.

SIOS Protection Suite for LinuxPage 187

Page 208: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resource Properties Dialog

A drop-down list box containing the names of connected servers will appear. From the list, select a serverfrom the cluster from which you want to disconnect. All servers in the cluster to be disconnected are noted inthe confirmation dialog.

Resource Properties DialogThe Resource Properties dialog is available from the Edit menu or from a resource context menu. This dialogdisplays the properties for a particular resource on a server. When accessed from the Edit menu, you canselect the resource and the server. When accessed from a resource context menu, you can select the server.

General Tabl Tag. The name of a resource instance, unique to a system, that identifies the resource to anadministrator.

l ID. A character string associated with a resource instance, unique among all instances of the resourcetype, that identifies some internal characteristics of the resource instance to the application softwareassociated with it.

l Switchback. (editable if user has Administrator permission) The setting that governs the recoverybehavior of the server where the resource was in service when it failed. If the setting is intelligent, theserver acts as a possible backup for the given resource. If the setting is automatic, the server activelyattempts to re-acquire the resource, providing the following conditions aremet:

l The resource hierarchy must have been in service on the server when it left the cluster.

l If it is in service at all, then the resourcemust currently be in service on a server with a lowerpriority.

Note:Checks for automatic switchback aremade only when LifeKeeper starts or when a newserver is added to the cluster; they are not performed during normal cluster operation.

l State. Current state of the resource instance:

l Active - In-service locally and protected.

l Warning - In-service locally, but local recovery will not be attempted.

l Failed - Out-of-service, failed.

l Standby - Out-of-service, unimpaired.

l ILLSTATE - A resource state has not been initialized properly by the resource initializationprocess which is run as part of the LifeKeeper startup sequence. Resources in this state are notunder LifeKeeper protection.

l UNKNOWN - Resource state could not be determined. TheGUI server may not be available.

l Reason. If present, describes the reason the resource is in its current state, that is, the reason for thelast state change. For example the application on galahad is in the OSU state because the sharedprimary resource ordbfsaa-on-tristan on tristan is in ISP or ISU state. Shared resources can be activeon only one of the grouped systems at a time.

SIOS Protection Suite for LinuxPage 188

Page 209: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Relations Tab

l Initialization. The setting that determines resource initialization behavior at boot time, for example,AUTORES_ISP, INIT_ISP, or INIT_OSU.

Relations Tabl Parent. Identifies the tag names of the resources that are directly dependent on this resource.

l Child. Identifies the tag names of all resources on which this resource depends.

l Root. Tag name of the resource in this resource hierarchy that has no parent.

Equivalencies Tabl Server. The name of the server on which the resource has a defined equivalency.

l Priority (editable if the user has Administrator permission). The failover priority value of the targetedserver, for this resource.

l Tag. The tag name of this resource on the equivalent server.

l Type. The type of equivalency (SHARED, COMMON, COMPOSITE).

l Reorder Priorities. (available if the user has Administrator permission) Up/Down buttons let you to re-order the priority of the selected equivalency.

TheOK button applies any changes that have beenmade and then closes the window. The Apply buttonapplies any changes that have beenmade. The Cancel button, closes the window without saving anychanges made since Apply was last clicked.

Server Properties DialogThe Server Properties dialog is available from a server context menu or from the Edit menu. This dialogdisplays the properties for a particular server. The properties for the server will also be displayed in theproperties panel  if it is enabled.

The three tabs of this dialog are described below. TheOK button applies any changes that have beenmadeand then closes the window. The Apply button applies any changes that have beenmade. The Cancel buttoncloses the window without saving any changes made since Apply was last clicked.

SIOS Protection Suite for LinuxPage 189

Page 210: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

General Tab

General Tab

l Name. Name of the selected server.

l State. Current state of the server. These are the possible server state values:

l ALIVE - server is available.

l DEAD - server is unavailable.

l UNKNOWN - state could not be determined. TheGUI server may not be available.

l Permission. The permission level of the user currently logged into that server. These are the possiblepermission values:

l Administrator - the user can perform any LifeKeeper task.

l Operator - the user canmonitor LifeKeeper resource and server status, and can bring resourcesin service and take them out of service.

l Guest - the user canmonitor LifeKeeper resource and server status.

SIOS Protection Suite for LinuxPage 190

Page 211: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

General Tab

l Shutdown Strategy. (editable if the user has Administrator permission) The setting that governswhether or not resources are switched over to a backup server in the cluster when a server isshutdown. The setting "Switchover Resources" indicates that resources will be brought in service on abackup server in the cluster. The setting "Do not Switchover Resources" indicates that resources willnot be brought in service on another server in the cluster.

l Failover Strategy. The setting allows you to require the confirmation of failovers from specificsystems in the LifeKeeper cluster. It is only available to LifeKeeper administrators. Operators andguests will not be able to see it. By default, all failovers proceed automatically with no userintervention. However, once the confirm failover flag is set, failovers from the designated system willrequire confirmation by executing the command: lk_confirmso -y system. The failover may beblocked by executing the command: lk_confirmso -n system. The system will take a pre-programmed default action unless one of these commands is executed within a specified interval. Twoflags in the /etc/default/LifeKeeper file govern this automatic action.

l CONFIRMSODEF

This specifies the default action. If set to "0", the default action is to proceed with failover. If setto "1", the default action is to block failover.

l CONFIRMSOTO

This is set to the time in seconds that LifeKeeper should wait before taking the default action.

SIOS Protection Suite for LinuxPage 191

Page 212: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CommPaths Tab

CommPaths Tab

l Server. The server name of the other server the communication path is connected to in the LifeKeepercluster.

l Priority. The priority determines the order by which communication paths between two servers will beused. Priority 1 is the highest and priority 99 is the lowest.

l State. State of the communications path in the LifeKeeper Configuration Database (LCD). These arethe possible communications path state values:

l ALIVE - functioning normally.

l DEAD - no longer functioning normally.

l UNKNOWN - state could not be determined. TheGUI server may not be available.

l Type. The type of communications path, TCP (TCP/IP) or TTY, between the server in the list and theserver specified in the Server field.

l Address/Device. The IP address or device name that this communications path uses.

l Comm Path Status. Summary communications path status determined by the GUI based on the

SIOS Protection Suite for LinuxPage 192

Page 213: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resources Tab

state of the communications paths in the LifeKeeper Configuration Database (LCD). These are thepossible communications path status values displayed below the detailed text in the lower panel:

l NORMAL - all comm paths functioning normally.

l FAILED - all comm paths to a given server are dead.

l UNKNOWN - comm path status could not be determined. TheGUI server may not beavailable.

l WARNING - one or more comm paths to a given server are dead.

l DEGRADED - one oremore redundant comm paths to a given server are dead.

l NONE DEFINED - no comm paths defined.

Resources Tab

SIOS Protection Suite for LinuxPage 193

Page 214: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Operator Tasks

l Name. The tag name of a resource instance on the selected server.

l Application. The application name of a resource type (gen, scsi, ...)

l Resource Type. The resource type, a class of hardware, software, or system entities providing aservice (for example, app, filesys, nfs, device, disk,...)

l State. The current state of a resource instance:

l ISP - In-service locally and protected.

l ISU - In-service locally, but local recovery will not be attempted.

l OSF - Out-of-service, failed.

l OSU - Out-of-service, unimpaired.

l ILLSTATE - Resource state has not been initialized properly by the resource initializationprocess which is run as part of the LifeKeeper startup sequence. Resources in this state are notunder LifeKeeper protection.

l UNKNOWN - Resource state could not be determined. TheGUI server may not be available.

Operator TasksThe following topics aremore advanced tasks that require Operator permission.

Bringing a Resource In Service1. There are five possible ways to begin.

l Right-click on the icon for the resource/server combination that you want to bring into service.When the Resource Context Menu appears, click In Service.

l Right-click on the icon for the global resource that you want to bring into service. When theResource Context Menu appears, click In Service. When the dialog comes up, select theserver on which to perform the In Service from theServer list and click Next.

l On theGlobal Toolbar, click the In Service button. When the dialog comes up, select the serveron which to perform the In Service from theServer list and click Next. On the next dialog,select one or more resources that you want to bring into service from theResource(s) list andclick Next again.

l On the Resource Context Toolbar, if displayed, click the In Service button.

l On the Edit Menu, point toResource and then click In Service. When the dialog comes up,select the server on which to perform the In Service from theServer list, and click Next. Onthe next dialog, select one or more resources that you want to bring into service from theResource(s) list and click Next again.

2. A dialog appears confirming the server and resource(s) that you have selected to bring into service.This dialog will include a warning if you are bringing a dependent child resource into service withoutbringing its parent resource into service as well. Click In Service to bring the resource(s) into service

SIOS Protection Suite for LinuxPage 194

Page 215: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Taking a Resource Out of Service

along with any dependent child resources.

3. If the Output Panel is enabled, the dialog closes and the results of the commands to bring the resource(s) in service are shown in the output panel. If not, the dialog remains up to show these results andyou click Done to finish when all results have been displayed. Any additional dependent (child)resources that were brought into service are noted in the dialog or output panel. 

4. Errors that occur while bringing a resource in service are logged in the LifeKeeper log of the server onwhich you want to bring the resource into service. 

Taking a Resource Out of Service1. There are four possible ways to begin.

l Right-click on the icon for the global resource or resource/server combination that you want totake out of service. When the Resource Context Menu appears, click Out of Service.

l On the Global Toolbar, click the Out of Service button. When theOut of Service dialog comesup, select one or more resources that you want to take out of service from the Resource(s) list,and click Next.

l On the Resource Context Toolbar, if displayed, click theOut of Service button.

l On the Edit Menu, point toResource and then click Out of Service. When theOut of Servicedialog comes up, select one or more resources that you want to take out of service from theResource(s) list, and click Next.

2. AnOut of Service dialog appears confirming the selected resource(s) to be taken out of service. Thisdialog will include a warning if you are taking a dependent child resource out of service without takingits parent resource out of service as well. Click Out of Service to proceed to the next dialog box.

3. If the Output Panel is enabled, the dialog closes, and the results of the commands to take the resource(s) out of service are shown in the output panel. If not, the dialog remains up to show these results, andyou click Done to finish when all results have been displayed.

4. Errors that occur while taking a resource out of service are logged in the LifeKeeper log of the server onwhich you want to take the resource out of service. 

Advanced Tasks

LCD

LifeKeeper Configuration DatabaseThe LifeKeeper Configuration Database (LCD)maintains the object-oriented resource hierarchy informationand stores recovery direction information for all resource types known to LifeKeeper. The data is cachedwithin system sharedmemory and stored in files so that configuration data is retained over system restarts.The LCD also contains state information and specific details about resource instances required for recovery.

SIOS Protection Suite for LinuxPage 195

Page 216: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Related Topics

See the following related topics for information on the LCD directory structure, types of data stored, resourcetypes available and use of application scripts.

Related Topics

LCDI CommandsNote: How to create resources by defining your own Recovery Kit is described below. If you use theexisting Recovery Kit, please create resources by using GUI.

LifeKeeper provides twomechanisms for defining an application resource hierarchy:

l LifeKeeper GUI

l LifeKeeper Configuration Database Interface (LCDI) commands

The LCDI is a set of interface commands provided by LifeKeeper that you can use to create and customizeresource hierarchy configurations tomeet your application needs. You use the command interface when anapplication depends uponmultiple resources (such as two or more file systems).

For a description of the commands, see the LCDI manual pages. This topic provides a development scenariothat demonstrates the way you can use both the GUI and command functions to create a resource hierarchy.

Scenario SituationThe example application, ProjectPlan, has data stored in SCSI file systems shared by Servers 1 and 2.Server 1 will be the primary hierarchy for the application. The application has two file systems: /project-dataand /schedule. The first step in the hierarchy definition is to determine the dependencies.

The example application has these dependencies:

l Shared file systems. The application depends upon its file systems: /project-data and /schedule.

l SCSI disk subsystem. The file systems in turn depend upon the SCSI disk subsystem, whichincludes the device, disk and host adapter resources.

As a result, the task is to create a hierarchy that looks like the following diagram.

SIOS Protection Suite for LinuxPage 196

Page 217: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hierarchy Definition

Hierarchy DefinitionThese are the tasks required to construct the example application hierarchy:

1. Create file system resources. The LifeKeeper GUI provides menus to create file system resources.See Creating File System Resource Hierarchies. 

At the end of this definition task, the LCD has two filesys resources defined as follows:

ID Tag Server/project-data/project-data

project-data-on-Server1project-data-from-Server1

Server1Server2

/schedule/schedule

schedule-on-Server1schedule-from-Server1

Server1Server2

Note: LifeKeeper does not place any significance on the tag names used; they are simply labels. Thetag names shown are the LifeKeeper defaults.

2. Define resources. The example requires the following definitions:

Application: projectapp

Resource Type: plan

Instance ID: 1yrplan

Tag: the-project-plan

SIOS Protection Suite for LinuxPage 197

Page 218: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Hierarchy Definition

Note: Although you can createmuch of the definition using the LifeKeeper GUI, the rest of thisexample demonstrates the command interface.

3. Create directories. On each system, you create the necessary application recovery directories underthe directory /opt/LifeKeeper/subsys with the command:

mkdir -p /opt/LifeKeeper/subsys/projectapp/Resources/plan/actions 

4. Define application. The following commands create the application named projectapp:

app_create -d Server1 -a projectapp

app_create -d Server2 -a projectapp 

5. Define the resource type. The following commands create the resource type named plan:

typ_create -d Server1 -a projectapp -r plan

typ_create -d Server2 -a projectapp -r plan 

6. Install recovery scripts. Copy your restore and remove scripts to the following directory on eachserver:

/opt/LifeKeeper/subsys/projectapp/Resources/plan/actions

7. Define instance. The following commands define an instance of resource type plan with the id1yrplan:

ins_create -d Server1 -a projectapp -r plan -I\

AUTORES_ISP -t the-project-plan -i 1yrplan 

ins_create -d Server2 -a projectapp -r plan -I\

SEC_ISP -t the-project-plan -i 1yrplan  

The -I AUTORES_ISP instruction for the instance created on Server1 tells LifeKeeper toautomatically bring the resource in service when LifeKeeper is restarted. In this case, theresource’s restore script is run and, if successful, the resource is placed in the ISP state. Thisoperation is not performed if the paired resource is already in service.

The -I SEC_ISP instruction for the instance created on Server2 tells LifeKeeper that thisresource instance should not be brought into service when LifeKeeper is restarted. Instead,Server2 will serve as the backup for the resource on Server1, and the local resource will bebrought in service upon failure of the primary resource or server.

8. Define dependencies. The following commands define the dependencies between the application andthe file systems:

dep_create -d Server1 -p the-project-plan -c project-data-on-System1

dep_create -d Server2 -p the-project-plan -c project-data-from-Server1

dep_create -d Server1 -p the-project-plan -c schedule-on-Server1

dep_create -d Server2 -p the-project-plan -cschedule-from-Server1 

SIOS Protection Suite for LinuxPage 198

Page 219: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LCD Configuration Data

9. Execute lcdsync. Execute the following lcdsync commands to inform LifeKeeper to update its copyof the configuration:

lcdsync -d Server1

lcdsync -d Server2 

10. Set the resources to “In Service”. Access LifeKeeper GUI on the primary server, select [Edit] >[Resource] > [In-Service] and set the resource “In Service”, or execute the following command on theprimary server:

perform_action -t the-project-plan -a restore

11. Create equivalency. Create equivalency of resources registered for each node with the followingcommands to switch resources:

eqv_create -d Server1 -t the-project-plan -p 1 -S Server2 -o the-project-plan -r 10 -e SHAREDeqv_create -d Server2 -t the-project-plan -p 10 -S Server1 -o the-project-plan -r 1 -e SHARED

LCD Configuration DataLCD stores the following related types of data:

l Dependency Information

l Resource Status Information

l Inter-Server Equivalency Information

Dependency InformationFor each defined resource, LifeKeeper maintains a list of dependencies and a list of dependents (resourcesdepending on a resource.) For information, see the LCDI_relationship (1M) and LCDI_instances(1M)manual pages.

Resource Status InformationLCD maintains status information in memory for each resource instance. The resource states recognized byLCD are ISP, ISU, OSF, OSU and ILLSTATE. Resources may change from one state to another when asystem event occurs or when an administrator takes certain actions. When a resource changes states, thestatus change is reflected in the LCD on the local server as well as in the database of the backup servers forthat resource.

Inter-Server Equivalency InformationRelationships may exist between resources on various servers. A shared equivalency is a relationshipbetween two resources on different servers that represents the same physical entity. When two servers havea resource with a shared equivalency relationship, LifeKeeper attempts to ensure in its actions that only one of

SIOS Protection Suite for LinuxPage 199

Page 220: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LCD Directory Structure

the two servers has the resource instance in the in-service, protected [ISP] state at any one time. Bothservers can have the resource instance in an out-of-service state [OSU orOSF], but for data integrityreasons, only one server can have the resource in service at any given time.

Disks on a Small Computer System Interface (SCSI) bus are one example of equivalent resources. With theSCSI locking (or reserve) mechanism, only one server can own the lock for a disk device at any point in time.This lock ownership feature guarantees that two or more servers cannot access the same disk resource at thesame time.

Furthermore, the dependency relationships within a hierarchy guarantee that all resources that depend uponthe disk, such as a file system, are in service on only one server at a time.

LCD Directory StructureMajor subdirectories under /opt/LifeKeeper:

l config. LifeKeeper configuration files, including shared equivalencies.

l bin. LifeKeeper executable programs, such as is_recoverable. See Fault Detection and RecoveryScenarios for descriptions.

l subsys. Resources and types. LifeKeeper provides resource and type definitions for the shared SCSIdisk subsystem in scsi and for the generic applicationmenu functions in gen. When you define anapplication interface, you create directories under subsys.

l events. Alarming events. See LifeKeeper Alarming and Recovery for further information.

The structure of the LCD directory in /opt/LifeKeeper is shown in the topic Structure of LCD Directory in/opt/LifeKeeper.

LCD Resource TypesThe LCD is maintained in both sharedmemory and in the /opt/LifeKeeper directory. As highlighted on thedirectory structure diagram, subsys contains two application resource sets you can use to define yourapplication interface:

l gen - generic application and file system information

l scsi - recovery information specific to the SCSI

These subdirectories are discussed in Resources Subdirectories. 

LifeKeeper FlagsNear the end of the detailed status display, LifeKeeper provides a list of the flags set for the system. Acommon type is a Lock LCD flag used to ensure that other processes wait until the process lock completes itsaction. The following is the standard LCD lock format:

!action!processID!time!machine:id.

SIOS Protection Suite for LinuxPage 200

Page 221: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resources Subdirectories

These are examples of general LCD lock flags:

l !action!02833!701236710!<servername>:filesys. The creation of a filesystem hierarchyproduces a flag in this format in the status display. The filesys designation can be a different resourcetype for other application resource hierarchies or app for generic or user-defined applications.

l Other typical flags include !nofailover!machine and shutdown_switchover. The!nofailover!machine flag is an internal, transient flag created and deleted by LifeKeeper whichcontrols aspects of server failover. The shutdown_switchover flag indicates that the shutdownstrategy for this server has been set to switchover such that a shutdown of the server will cause aswitchover to occur. See LCDI-flag(1M) for more detailed information on the possible flags.

Resources SubdirectoriesThe scsi and gen directories each contain a resources subdirectory. The content of those directories providesa list of the resource types provided by LifeKeeper:

scsi resource types. You find these resource types in the /opt/LifeKeeper/subsys/scsi/resources directory.Note that theremay be additional directories depending upon your configuration.

l device—disk partitions or virtual disk devices

l disk—physical disks or LUNs

l hostadp—host adapters

gen resource types. You find these resource types in the /opt/LifeKeeper/subsys/gen/resources directory:

l filesys—file systems

l app—generic or user-defined applications that may depend upon additional resources

Each resource type directory contains one or more of the following:

l instances. This file reflects the permanent information saved in the LCD about resource instances. Itcontains descriptive information for the resource instances associated with this resource type.

WARNING: Do not modify the instances file (or any LCD file) directly. To create or manipulate resourceinstances, use only the LifeKeeper GUI functions or the LifeKeeper LCDI_instances commands: ins_create,ins_remove, ins_gettag, ins_setas, ins_setinfo, ins_setinit, ins_setstate and ins_list. Refer to the LCDI_instances (1M)manual pages for explanations of these commands.

l recovery. This optional directory contains the programs used to attempt the local recovery of aresource for which a failure has been detected. The recovery directory contains directories thatcorrespond to event classes passed to sendevent. The names of the directories must match the classparameter (-C) passed to the sendevent program. (See LifeKeeper Alarming and Recovery.)

In each subdirectory, the application can place recovery programs that service event types of thecorresponding event class. The name of these programs must match the string passed to sendevent with the -E parameter. This optional directory may not exist for many applications.

l actions. This directory contains the set of recovery action programs that act only on resourceinstances of the specific resource type. If, for your application, any actions apply to all resource types

SIOS Protection Suite for LinuxPage 201

Page 222: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resource Actions

within an application, place them in an actions subdirectory under the application directory rather thanunder the resource type directory.

Recovery direction software is used tomodify or recover a resource instance. Two actions, remove andrestore, must exist in the actions directory for each resource type.

Resource ActionsThe actions directory for a resource type contains the programs (most often shell scripts) that describespecific application functions. Two actions are required for every resource type—restore and remove.

The remove and restore programs should perform symmetrically opposite functions; that is, they undo theeffect of one another. These scripts should never be runmanually. They should only be run by executing theLifeKeeper Recovery Action and Control Interface (LRACI) perform_action shell program described inthe LRACI-perform_action (1M) manual page.

Structure of LCD Directory in /opt/LifeKeeperThe following diagram shows the directory structure of /opt/LifeKeeper.

SIOS Protection Suite for LinuxPage 202

Page 223: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LCM

LCMThe LifeKeeper Communications Manager (LCM) provides reliable communication between processes on oneor more LifeKeeper servers. This process can use redundant communication paths between systems so thatfailure of a single communication path does not cause failure of LifeKeeper or its protected resources. TheLCM supports a variety of communication alternatives including RS-232 (TTY) and TCP/IP connections.

The LCM provides the following:

l LifeKeeper Heartbeat. Periodic communication with other connected LifeKeeper systems todetermine if the other systems are still functioning. LifeKeeper can detect any total system failure thatis not detected by another means by recognizing the absence of the heartbeat signal.

l Administration Services. The administration functions of LifeKeeper use the LCM facilities toperform remote administration. This facility is used for single-point administration, configurationverification and sanity checking of administrative actions.

l Configuration and Status Communication. The LifeKeeper configuration database (LCD) tracksresource status, availability and configuration through the LCM facilities. These facilities allow theLCD tomaintain consistent resource information between the primary and secondary systems.

SIOS Protection Suite for LinuxPage 203

Page 224: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Communication Status Information

l Failover Recovery. If a resource fails on a system, the LCM notifies LifeKeeper to recover theresource on a backup system.

In addition to the LifeKeeper services provided by the LCM, inter-system application communication ispossible through a set of shell commands for reliable communication. These commands include snd_msg,rcv_msg, and can_talk. These commands are described in the LCMI_mailboxes (1M) manualpages. The LCM runs as a real-time process on the system assuring that critical communications such assystem heartbeat will be transmitted.

Communication Status InformationThe communications status information section of the status display lists the servers known to LifeKeeperand their current state followed by information about each communication path.

The following sample is from the communication status section of a short status display:

MACHINE NETWORK ADDRESSES/DEVICE STATE PRIO

tristan TCP 100.10.100.100/100.10.100.200 ALIVE 1

tristan TTY /dev/ttyS0 ALIVE --

Formore information, see the communication status information section of the topics Detailed Status Displayand the Short Status Display. 

LifeKeeper Alarming and RecoveryLifeKeeper error detection and notification is based on the event alarmingmechanism, sendevent. The keyconcept of the sendevent mechanism is that independent applications can register to receive alarms forcritical components. Neither the alarm initiation component nor the receiving application(s) need to bemodified to know the existence of the other applications. Application-specific errors can trigger LifeKeeperrecovery mechanisms via the sendevent facility.

This section discusses topics related to alarming including alarm classes, alarm processing and alarmdirectory layout and then provides a processing scenario that demonstrates the alarming concepts.

Alarm ClassesThe /opt/LifeKeeper/events directory lists a set of alarm classes. These classes correspond to particular sub-components of the system that produces events (for example, filesys). For each alarm class, subdirectoriescontain the set of potential alarms (for example, badmount and diskfull). You can register an application toreceive these alarms by placing shell scripts or programs in the appropriate directories.

LifeKeeper uses a basic alarming notification facility. With this alarming functionality, all applicationsregistered for an event have their handling programs executed asynchronously by sendevent when theappropriate alarm occurs. With LifeKeeper present, the sendevent process first determines if the LifeKeeperresource objects can handle the class and event. If LifeKeeper finds a class/event match, it executes theappropriate recover scenario.

SIOS Protection Suite for LinuxPage 204

Page 225: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Alarm Processing

Defining additional scripts for the sendevent alarming functionality is optional. When you define LifeKeeperresources, LifeKeeper provides the basic alarming functionality described in the processing scenarios later inthis chapter.

Note: Local recovery for a resource instance is the attempt by an application under control of LifeKeeper toreturn interrupted resource services to the end-user on the same system that generated the event. Inter-serverrecovery allows an application tomigrate to a backup system. This type of recovery is tried after localrecovery fails or is not possible.

Alarm ProcessingApplications or processes that detect an event whichmay require LifeKeeper attention can report the event byexecuting the sendevent program, passing the following arguments: respective error class, error name andfailing instance. Refer to the sendevent(5) manual pages for required specifics and optional parameters andsyntax.

Alarm Directory LayoutThe /opt/LifeKeeper/events directory has two types of content:

l LifeKeeper supplied classes. LifeKeeper provides two alarm classes listed under the eventsdirectory: lifekeeper and filesys. An example of an alarm event includes diskfull. The alarm classescorrespond to the strings that are passed with the -C option to the sendevent command and the alarmevents correspond to the strings that are passed with the -E option. The lifekeeper alarm class is usedinternally by LifeKeeper for event reporting within the LifeKeeper subsystem.

l Application-specific classes. The other subdirectories in the events directory are added whenspecific applications require alarm class definitions. Applications register to receive these alarms byplacing shell scripts or binary programs in the directories. These programs are named after theapplication package to which they belong.

LifeKeeper API for Monitoring

IntroductionThe LifeKeeper API for Monitoring can obtain the operational status of LifeKeeper nodes and their protectedresources by making status inquiries to the available nodes in the LifeKeeper cluster.

SummaryThis document describes the LifeKeeper API for Monitoring (hereinafter referred to as the API) and is targetedfor developers whomanage the resource protected by LifeKeeper. By using the API, the information suppliedby the lcdstatus command is obtained through CGI script and the lighttpdmodule. By using this API, userscan determine the current status of the LifeKeeper nodes and resources without logging-in to LifeKeeperservers. The API can supply the following information.

l LifeKeeper node status is the node alive and processing or down

l Communication path status between nodes in the cluster, are communication path(s) up or down

SIOS Protection Suite for LinuxPage 205

Page 226: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Information to be supplied with this API

l Status of protected resourcesTo get the detailed status of any abnormal condition requires logging-in to LifeKeeper GUI or checking theLifeKeeper log as necessary.

Information to be supplied with this APIThe following information is supplied through this API when the user makes an inquiry to an active LifeKeepernode. The information supplied is about the specific LifeKeeper server to which the inquiry was directed evenif the cluster consists of multiple servers.

l Status

l Operating status of each server

l Node name

l Operational status (ALIVE/DEAD)

l Operational status of communication path(s)

l Node name

l Operational status (ALIVE/DEAD)

l Address / device name

l Status of protected resources

SIOS Protection Suite for LinuxPage 206

Page 227: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Communication format

l NodeName

l Tag

l Status (ISP, OSU, OSF, …)

l Dependency setting

l Mirror information for Data Replication resources (available only if status is ISP)

l Tag

l Mirror status (Sync, Paused, ...)

l Replication status (75%, 100%, ...)

l Log

l /var/log/lifekeeper.log *Not supported if log file path is changed

l Up to 1000 lines (when data output format is HTML)

l All (when data output format is plain text)

l /var/log/lifekeeper.err *Not supported if log file path is changed

l Up to 1000 lines (when data output format is HTML)

l All (when data output format is plain text)

Communication formatThe API uses HTTP to obtain the requested information. To obtain information, the user sends a HTTP GETrequest to the CGI scripts via lighttpd on the specific server.

Data formatThe following 3 data formats are available.

l JSON

l To be used by an external tool to analyze the status information returned

l Status checking is possible

l Log output is not available

l HTML

l To be used to visually check via a browser

l Status checking is possible

l Log information is available up to 1000 lines

l plain text

SIOS Protection Suite for LinuxPage 207

Page 228: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Data format

l Used for regular log checking

l For logging purpose only and not for checking the status

l All contents of /var/log/lifekeeper.log and /var/log/lifekeeper.err are available

Available JSON format and HTML format from the status in the following figure.

{"resource": [

{"replication": {},"child": [

{"tag": "datarep-data"

}],"server": {

"status": "ISP","name": "lk01"},

"tag": "/data"},{

"replication": {

SIOS Protection Suite for LinuxPage 208

Page 229: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Data format

"percent": "100%","mirror": "Fully Operational"

},"child": [],"server": {

"status": "ISP","name": "lk01"

},"tag": "datarep-data"

},{

"replication": {},"child": [],"server": {

"status": "ISP","name": "lk01"

},"tag": "ip-10.125.139.118"

}],"compath": [

{"status": "ALIVE","server": [

{"name": "lk01","term": "192.168.139.18"

},{

"name": "lk02","term": "192.168.139.19"

}]

},{

"status": "ALIVE","server": [

{"name": "lk01","term": "172.20.139.18"

},{

"name": "lk02","term": "172.20.139.19"

}]

SIOS Protection Suite for LinuxPage 209

Page 230: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

How to use

}],"server": [

{"status": "ALIVE","name": "lk01"

},{

"status": "ALIVE","name": "lk02"

}]

}

How to use

Activate the APIThe API is disabled by default. To activate, requires modification of /etc/default/LifeKeeper set the LKAPI_MONITORING configuration parameter to true. Setting of the configuration parameter only activates the APIon that node and thereforemust be set on each node on which the API will be used. Setting of thisconfiguration parameter does not require a restart of LifeKeeper.

SIOS Protection Suite for LinuxPage 210

Page 231: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Port number

LKAPI_MONITORING=true

Port numberThe API uses port 779 by default. To change the port number, the user needs to set the following in/etc/default/LifeKeeper.

LKAPI_WEB_PORT=<port number>

Usage examplesTo obtain information a request is made to a server with an active LifeKeeper API configuration. Basicexample using curl.

curl http://<IPADDR>:779/Monitoring.cgiIf no arguments are given, the current status is obtained using the JSON data format. Request for loginformation using HTML data format.

curl http://<IPADDR>:779/Monitoring.cgi?format=html&show=log

The list of available arguments can be found in the table below.

Name Explanation Value Comments

show Specify the targetinformation

status,log, log-err show=status is the default

format Specify dataformat

json, html,plain

format=json is the default. If the format is json an error will be dis-played if show=log or show=log-err is set.

SecurityAll the users requesting information via the API must be authorized to get LifeKeeper status information.For this reason, user security settings can limit the users who can get the status by, configuring SSL, andencrypting the information.

Basic AuthenticationTo obtain the information via the API, Basic Authentication is required. To setup the authentication requiresmodification to the lighttpd configuration file (Modify the part in red colored character.) plus a restart of thelighttpdmodule. See figure 7 for how to configure lightttp.conf.

After modification execute the command ”/opt/LifeKeeper/sbin/sv restart steeleye-lighttpd” and reboot lighttpdto restart lighttpd using the new configuration.

/opt/LifeKeeper/etc/lighttpd/lighttpd.conf

server.modules = (

SIOS Protection Suite for LinuxPage 211

Page 232: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

SSL/TLS set up

:"mod_auth", # uncommenting

/opt/LifeKeeper/lib64/steeleye-lighttpd/include_server_bind.pl

print qq/\$SERVER["socket"] == "$addr:$port" {\n/;print qq/ server.document-root = "\/opt\/LifeKeeper\/api"\n/;print qq/ auth.backend = "htpasswd"\n/;print qq/ auth.backend.htpasswd.userfile ="\/opt\/LifeKeeper\/etc\/lighttpd\/lighttpd.user.htpasswd"\n/;print qq/ auth.require = ( "\/" =>\n/;print qq/ (\n/;print qq/ "method" => "basic",\n/;print qq/ "realm" => "LifeKeeperAPI",\n/;print qq/ "require" => "valid-user"\n/;print qq/ )\n/;print qq/ )\n/;print qq/ }\n/;

Step to create htpasswd file.

htpasswd -c /opt/LifeKeeper/etc/lighttpd/lighttpd.user.htpasswd<USERNAME>

SSL/TLS set upSSL/TLS is available for the communication via this API. The lighttpdmodifications for SSL/TLS is shown inthe example in the following figure. After modification execute the command ”/opt/LifeKeeper/sbin/sv restartsteeleye-lighttpd” and reboot lighttpd to restart lighttpd with the new configuration.

/opt/LifeKeeper/etc/lighttpd/include_ssl_port.pl

configAPI("0.0.0.0", 443);if(socket($sock, AF_INET6, SOCK_STREAM, 0)) {

configAPI("[::]", 443);}

sub configAPI {my $addr = shift;my $port = shift;

SIOS Protection Suite for LinuxPage 212

Page 233: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Modification to support SSL/TLS + Basic authentication

print qq/\$SERVER["socket"] == "$addr:$port" {\n/;print qq/ server.document-root = "\/opt\/LifeKeeper\/api"\n/;print qq/ ssl.engine = "enable"\n/;print qq/ ssl.pemfile =

"\/opt\/LifeKeeper\/etc\/certs\/LK4LinuxValidNode.pem"\n/;print qq/ ssl.use-sslv2 = "disable"\n/;print qq/ ssl.use-sslv3 = "disable"\n/;print qq/ }\n/;

}

Modification to support SSL/TLS + Basic authentication

Using SSL/TLS, modification example to set up Basic authentication is below. After modification, execute thecommand “/opt/LifeKeeper/sbin/sv restart steeleye-lighttpd“ and restart lighttpd to reflect themodified set up.

/opt/LifeKeeper/etc/lighttpd/lighttpd.conf

server.modules = (:"mod_auth", # uncommenting

/opt/LifeKeeper/etc/lighttpd/include_ssl_port.pl

configAPI("0.0.0.0", 443);if(socket($sock, AF_INET6, SOCK_STREAM, 0)) {

configAPI("[::]", 443);}

sub configAPI {my $addr = shift;my $port = shift;

print qq/\$SERVER["socket"] == "$addr:$port" {\n/;print qq/ server.document-root = "\/opt\/LifeKeeper\/api"\n/;print qq/ ssl.engine = "enable"\n/;print qq/ ssl.pemfile =

"\/opt\/LifeKeeper\/etc\/certs\/LK4LinuxValidNode.pem"\n/;print qq/ ssl.use-sslv2 = "disable"\n/;print qq/ ssl.use-sslv3 = "disable"\n/;print qq/ auth.backend = "htpasswd"\n/;

SIOS Protection Suite for LinuxPage 213

Page 234: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

IP address access limitation

print qq/ auth.backend.htpasswd.userfile ="\/opt\/LifeKeeper\/etc\/lighttpd\/lighttpd.user.htpasswd"\n/;

print qq/ auth.require = ( "\/" =>\n/;print qq/ (\n/;print qq/ "method" => "basic",\n/;print qq/ "realm" => "LifeKeeperAPI",\n/;print qq/ "require" => "valid-user"\n/;print qq/ )\n/;print qq/ )\n/;print qq/ }\n/;

}

IP address access limitationThe lightppd configuration can also be setup to limit IP addresses that can be used to access data via the API.The lighttpd configuration to limit access is shown in Figure 9. The example will reject the connections from IPaddress other than 192.168.10.1. After modification execute the command ”/opt/LifeKeeper/sbin/sv restartsteeleye-lighttpd” to restart lighttpd with the new configuration.

/opt/LifeKeeper/etc/lighttpd/conf.d/lkapi_user.conf

$HTTP["remoteip"] != "192.168.10.1" {url.access-deny = ( "" )}

ErrorErrors can occur during the usage of this API when enabled. Should this occur, the summary of the error isoutput. Error example when JSON format is shown below. HTTP status code returned by lighttpd is notdescribed here.

{"error" : {

id : -1,message : "Failed to get LCD status"}

}

Similar message is output in the case the output format is HTML.

SIOS Protection Suite for LinuxPage 214

Page 235: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Maintenance Tasks

Maintenance TasksThe following are tasks for maintaining LifeKeeper.

Changing LifeKeeper Configuration ValuesThere are a number of values in LifeKeeper that may need to be changed after LifeKeeper has been configuredand set up. Examples of values that may bemodified include the uname of LifeKeeper servers, comm path ipaddresses, ip resource addresses and tag names. To change these values, carefully follow the instructionsbelow.

1. Stop LifeKeeper on all servers in the cluster using the command:

/etc/init.d/lifekeeper stop-nofailover

There is no need to delete comm paths or unextend resource hierarchies from any of the servers.

2. If you are changing the uname of a LifeKeeper server, change the server's hostname using the Linuxhostname(1) command.

3. Before continuing, ensure that any new host names are resolvable by all of the servers in the cluster. Ifyou are changing comm path addresses, check that the new addresses are configured and working(the ping and telnet utilities can be used to verify this).

4. If more than one LifeKeeper value is to be changed, old and new values should be specified in a file oneach server in the cluster in the following format:

old_value1=new_value1

....

old_value9=new_value9

5. Verify that the changes to bemade do not have any unexpected side effects by examining the output ofrunning the lk_chg_value command on all servers in the cluster. If there is more than one valueto change, run the command:

$LKROOT/bin/lk_chg_value -Mvf file_name

where file_name is the name of the file created in Step 4.

If there is only one value to change, run the command:

$LKROOT/bin/lk_chg_value -Mvo old_value -n new_value

The -M option specifies that nomodifications should bemade to any LifeKeeper files.

6. Modify LifeKeeper files by running the lk_chg_value commandwithout the -M option on allservers in the cluster. If there is more than one value to change, run the command:

$LKROOT/bin/lk_chg_value -vf file_name

where file_name is the name of the file created in Step 4.

SIOS Protection Suite for LinuxPage 215

Page 236: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Changing LifeKeeper Configuration Values

If there is only one value to change, run the command:

$LKROOT/bin/lk_chg_value -vo old_value -n new_value

7. Restart LifeKeeper using the command:

/etc/init.d/lifekeeper start

If the cluster is being viewed using the LifeKeeper GUI, it may be necessary to close and restart theGUI.

Example:

Server1 andServer2 are the LifeKeeper server unames in a two-node cluster. Server1 has a commpath with address 172.17.100.48. Server2 has an ip resource with address 172.17.100.220 which isextended toServer1. Wewish to change the following values forServer1:

Value Old Newuname Server1 Newserver1

comm path address 172.17.100.48 172.17.105.49

IP resource address 172.17.100.220 172.17.100.221

The following steps should be performed tomake these changes.

1. Stop LifeKeeper on bothServer1 andServer2 using the command:

/etc/init.d/lifekeeper stop-nofailover

2. Change the uname of Server1 toNewserver1 using the command:

hostname Newserver1

3. Create the file, /tmp/subs, with the content below, on bothNewserver1 andServer2:

Server1=Newserver1

172.17.100.48=172.17.105.49

172.17.100.220=172.17.100.221

4. Verify that the changes specified will not have any unexpected side effects by examining the output ofrunning the following command on both servers:

$LKROOT/bin/lk_chg_value -Mvf /tmp/subs

5. Modify the LifeKeeper files by running the lk_chg_value commandwithout the -M option on bothservers:

$LKROOT/bin/lk_chg_value -vf /tmp/subs

6. Restart LifeKeeper on both servers using the command:

/etc/init.d/lifekeeper start

SIOS Protection Suite for LinuxPage 216

Page 237: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

File System Health Monitoring

Notes:

l To see the changes lk_chg_value will make without modifying any LifeKeeper files, use the -Moption. To see the files lk_chg_value is examining, use -v. To not modify tag names, use the -Toption. To not modify resource ids, use the -I option.

File System Health MonitoringThe File System Health Monitoring feature detects conditions that could cause LifeKeeper protectedapplications that depend on the file system to fail. Monitoring occurs on active/in-service resources (i.e. filesystems) only. The two conditions that aremonitored are:

l A full (or almost full) file system, and

l An improperly mounted (or unmounted) file system.

When either of these two conditions is detected, one of several actions might be taken.

l A warningmessage can be logged and email sent to a system administrator.

l Local recovery of the resource can be attempted.

l The resource can be failed over to a backup server.

Condition Definitions

Full or Almost Full File SystemA "disk full" condition can be detected, but cannot be resolved by performing a local recovery or failover -administrator intervention is required. A message will be logged by default. Additional notification functionalityis available. For example, an email can be sent to a system administrator, or another application can beinvoked to send a warningmessage by some other means. To enable this notification functionality, refer to thetopic Configuring LifeKeeper Event Email Notification.

In addition to a "disk full" condition, a "disk almost full" condition can be detected and a warningmessagelogged in the LifeKeeper log.

The "disk full" threshold is:

FILESYSFULLERROR=95

The "disk almost full" threshold is:

FILESYSFULLWARN=90

The default values are 90% and 95% as shown, but are configurable via tunables in the/etc/default/LifeKeeper file. Themeanings of these two thresholds are as follows:

FILESYSFULLWARNING - When a file system reaches this percentage full, a message will bedisplayed in the LifeKeeper log.

SIOS Protection Suite for LinuxPage 217

Page 238: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Unmounted or Improperly Mounted File System

FILESYSFULLERROR - When a file system reaches this percentage full, a message will be displayedin the LifeKeeper log as well as the system log. The file system notify script will also be called.

Unmounted or Improperly Mounted File SystemLifeKeeper checks the /etc/mtab file to determine whether a LifeKeeper protected file system that is in serviceis actually mounted. In addition, themount options are checked against the storedmount options in the filesysresource information field to ensure that they match the original mount options used at the time the hierarchywas created.

If an unmounted or improperly mounted file system is detected, local recovery is invoked and will attempt toremount the file system with the correct mount options.

If the remount fails, failover will be attempted to resolve the condition. The following is a list of commoncauses for remount failure which would lead to a failover:

l corrupted file system (fsck failure)

l failure to createmount point directory

l mount point is busy

l mount failure

l LifeKeeper internal error

Maintaining a LifeKeeper Protected SystemWhen performing shutdown andmaintenance on a LifeKeeper-protected server, youmust put that system’sresource hierarchies in service on the backup server before performingmaintenance. This process stops allactivity for shared disks on the system needingmaintenance. 

Perform these actions in the order specified, whereServer A is the primary system in need of maintenanceandServer B is the backup server:

1. Bring hierarchies in service on Server B. On the backup, Server B, use the LifeKeeper GUI to bringin service any resource hierarchies that are currently in service onServer A. This will unmount any filesystems currently mounted onServer A that reside on the shared disks under LifeKeeper protection.See Bringing a Resource In Service for instructions.

2. Stop LifeKeeper on Server A. Use the LifeKeeper command /etc/init.d/lifekeeper stop-nofailover to stop LifeKeeper. Your resources are now unprotected.

3. Shut down Linux and power down Server A. Shut down the Linux operating system onServer A,then power off the server.

4. Perform maintenance. Perform the necessary maintenance onServer A.

5. Power on Server A and restart Linux. Power onServer A, then reboot the Linux operating system.

6. Start LifeKeeper on Server A. Use the LifeKeeper command /etc/init.d/lifekeeperstart to start LifeKeeper. Your resources are now protected.

SIOS Protection Suite for LinuxPage 218

Page 239: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Maintaining a Resource Hierarchy

7. Bring hierarchies back in-service on Server A, if desired. OnServer A, use the LifeKeeper GUI tobring in service all resource hierarchies that were switched over toServer B.

Maintaining a Resource HierarchyYou can perform maintenance on a resource hierarchy while maintaining LifeKeeper protection of all otherhierarchies on the system. This involves taking the hierarchy in need of maintenance out of service and thenbringing it back in-service after you complete themaintenance tasks.

To perform maintenance on a resource hierarchy:

1. Take the hierarchy out of service. Use the LifeKeeper GUI to take as much of the resource hierarchyout of service as you need to perform themaintenance. See Taking a Resource Out of Service forinstructions.

2. Perform maintenance. Perform the necessary maintenance on the resource hierarchy.

3. Restore the hierarchy. Use the LifeKeeper GUI to bring the resource hierarchy back in service. SeeBringing a Resource In Service for instructions. 

Recovering After a FailoverAfter LifeKeeper performs a failover recovery from a primary server (Server A) to a backup server (Server B),perform the following steps:

1. Review logs. When LifeKeeper onServer B performs a failover recovery from Server A, statusmessages are displayed during the failover.

The exact output depends upon the configuration. Somemessages on failure tomount or unmount areexpected and do not suggest failure of recovery. Thesemessages, as well as any errors that occurwhile bringing the resource in service onServer B, are logged in the LifeKeeper log.

2. Perform maintenance. Determine and fix the cause of the failure onServer A. Server Amay need tobe powered down to perform maintenance.

3. Reboot Server A, if necessary. Oncemaintenance is complete, reboot Server A if necessary.

4. Start LifeKeeper, if necessary. If LifeKeeper is not running onServer A, use the command/etc/init.d/lifekeeper start to start LifeKeeper.

5. Move application back to Server A. At a convenient time, use the LifeKeeper GUI to bring theapplication back into service onServer A. See Bringing a Resource In Service for instructions. Notethat this stepmay be unnecessary if the application onServer A was configured forAutomaticSwitchback.

Removing LifeKeeperYou can uninstall the LifeKeeper packages in a Linux environment using any rpm supported graphicalinterface or through the command line. This section provides detailed instructions on uninstalling LifeKeeperusing the rpm command from the command line. Refer to the rpm(8) man page for complete instructions onusing the rpm command.

SIOS Protection Suite for LinuxPage 219

Page 240: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Removing via GnoRPM

For information on rpm software, you can go to the following web site: http://www.rpm.org/.

Included below are the requirements for removing LifeKeeper software.

l Move applications. Before you remove the software, you should verify that you do not haveapplications requiring LifeKeeper protection on the server. You should never remove LifeKeeper from aserver where an application resource hierarchy is in service. Removing LifeKeeper removes allconfiguration data such as equivalencies, resource hierarchy definitions and log files. See TransferringResource Hierarchies for additional information.

l Start LifeKeeper. LifeKeeper recovery kits may require LifeKeeper to be running when you remove therecovery kit software. If it is not running, the removal process cannot remove the resource instancesfrom other LifeKeeper servers in the cluster which would leave the servers in an inconsistent state.

l Remove all packages. If you remove the LifeKeeper core, you should first remove other packagesthat depend upon LifeKeeper; for example, LifeKeeper recovery kits. It is recommended that beforeremoving a LifeKeeper recovery kit, you first remove the associated application resource hierarchy.

Note: It is recommended that before removing recovery kit software, first remove any associatedhierarchies from that server. Youmay do this using the Unextend Resource configuration task. If youremove a LifeKeeper recovery kit package without unextending the existing hierarchies, any of thecorresponding resource hierarchies currently defined and protected by this recovery kit willautomatically be deleted from your system. The general rule is: You should never remove the recoverykit from a server where the resource hierarchy is in service. This will corrupt your current hierarchies,and you will need to recreate them when you reinstall the recovery kit.

Removing via GnoRPMIn the GnoRPMwindow, for each package to be removed, right-click on the package icon and click Uninstallon the pop-upmenu. (Alternatively, you can select the package icon, then click theUninstall button.)

Removing via Command LineTo remove LifeKeeper from a server, use the rpm -e <packagename> command to remove all theLifeKeeper packages. Refer to the rpm(8) man page for complete instructions on using the rpm command.For example, to remove the LifeKeeper core package, enter the following command:

rpm -e steeleye-lk

For reference, the packages in the LifeKeeper core package cluster are listed below:

steeleye-lksteeleye-lkGUIsteeleye-lkHLPsteeleye-lkIPsteeleye-lkMANsteeleye-lkRAWsteeleye-lkCCISS

SIOS Protection Suite for LinuxPage 220

Page 241: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Removing Distribution Enabling Packages

Removing Distribution Enabling PackagesAfter removing the LifeKeeper packages, the distribution-specific enabling package installed by the setupscript on the SPS Installation Image File should be removed. Depending on your Linux distribution, thatpackage name is steeleye-lk<Linux Distribution>, for example:

steeleye-lkRedHatsteeleye-lkSuSE

Running LifeKeeper With a FirewallLifeKeeper for Linux can work with a firewall in place on the same server if you address the following networkaccess requirements.

Note: If you wish to simply disable your firewall, see Disabling a Firewall below.

LifeKeeper Communication PathsCommunication paths are established between pairs of servers within the LifeKeeper cluster using specific IPaddresses. Although TCP Port 7365 is used by default on the remote side of each connection as it is beingcreated, the TCP port on the initiating side of the connection is arbitrary.  The recommended approach is toconfigure the firewall on each LifeKeeper server to allow both incoming and outgoing traffic for each specificpair of local and remote IP addresses in the communication paths known to that system.

LifeKeeper GUI ConnectionsThe LifeKeeper GUI uses a number of specific TCP ports, including Ports 81 and 82 as the default initialconnection ports. TheGUI also uses RemoteMethod Invocation (RMI), which uses Ports 1024 and above tosend and receive objects. All of these ports must be open in the firewall on each LifeKeeper server to at leastthose external systems on which the GUI client will be run.

LifeKeeper IP Address ResourcesThe firewall should be configured to allow access to any IP address resources in your LifeKeeper hierarchiesfrom those client systems that need to access the application associated with the IP address. Rememberthat the IP address resource canmove from one server to another in the LifeKeeper cluster; therefore, thefirewalls on all of the LifeKeeper servers must be configured properly.

LifeKeeper also uses a broadcast ping test to periodically check the health of an IP address resource. Thistest involves sending a broadcast ping packet from the virtual IP address and waiting for the first responsefrom any other system on the local subnet. To prevent this test from failing, the firewall on each LifeKeeperserver should be configured to allow the following types of network activity.

l Outgoing Internet Control Message Protocol (ICMP) packets from the virtual IP address (so that theactive LifeKeeper server can send broadcast pings)

l Incoming ICMP packets from the virtual IP address (so that other LifeKeeper servers can receive

SIOS Protection Suite for LinuxPage 221

Page 242: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Data Replication

broadcast pings)

l Outgoing ICMP reply packets from any local address (so that other LifeKeeper servers can respond tobroadcast pings)

l Incoming ICMP reply packets to the virtual IP address (so that the active LifeKeeper server canreceive broadcast ping replies)

LifeKeeper Data ReplicationWhen using LifeKeeper Data Replication, the firewall should be configured to allow access to any of the portsused by nbd for replication. The ports used by nbd can be calculated using the following formula:

10001 + <mirror number> + <256 * i>

where i starts at zero and is incremented until the formula calculates a port number that is not in use. In useconstitutes any port found defined in /etc/services, found in the output of netstat -an --inet, or already definedas in use by another LifeKeeper Data Replication resource.

For example: If themirror number for the LifeKeeper Data Replication resource is 0, then the formula wouldinitially calculate the port to use as 10001, but that number is defined in /etc/services on some Linuxdistributions as the SCP Configuration port. In this case, i is incremented by 1 resulting in Port Number10257, which is not in /etc/services on these Linux distributions.

Other Inter-node CommunicationsEach LifeKeeper server communicates using SSL connection on port 778. You can change this port using theconfiguration variable API_SSL_PORT in /etc/default/LifeKeeper.

Disabling a FirewallTo disable the firewall, please follow the procedure described in themanual of your distribution.

Running the LifeKeeper GUI Through a FirewallIn some situations, a LifeKeeper cluster is placed behind a corporate firewall and administrators wish to runthe LifeKeeper GUI from a remote system outside the firewall.

LifeKeeper uses RemoteMethod Invocation (RMI) to communicate between theGUI server and client. TheRMI client must to be able tomake connections in each direction. Because the RMI client uses dynamicports, you can not use preferential ports for the client.

One solution is to use ssh to tunnel through the firewall as follows:

1. Make sure your IT department has opened the secure shell port on the corporate firewall sufficiently toallow you to get behind the firewall. Often themachine IT allows you to get to is not actually amachinein your cluster but an intermediate one from which you can get into the cluster. This machinemust be aUnix or Linux machine.

2. Make sure both the intermediate machine and the LifeKeeper server are running sshd (the secure shell

SIOS Protection Suite for LinuxPage 222

Page 243: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Transferring Resource Hierarchies

daemon) and that X11 port forwarding is enabled (this is usually the line `X11Forwarding yes' in/etc/ssh/sshd_config, but if you are unsure, have your IT do this for you.

3. From your Unix client in X, tunnel to the intermediate machine using:

ssh -X -C <intermediate machine>

The -C means `compress the traffic' and is often useful when coming in over slower internet links.

4. From the intermediate machine, tunnel to the LifeKeeper server using:

ssh -X <LifeKeeper server>

You should not need to compress this time since the intermediate machine should have a reasonablyhigh bandwidth connection to the LifeKeeper server.

5. If all has gone well, when you issue the command:

echo $DISPLAY

it should be set to something like `localhost:10.0'. If it is not set, it is likely that X11 forwarding isdisabled in one of the sshd config files.

6. Verify that you can pop up a simple xterm from the LifeKeeper server by issuing the command:

/usr/X11R6/bin/xterm

7. If the xterm appears, you're ready to run lkGUIapp on the LifeKeeper server using the followingcommand:

/opt/LifeKeeper/bin/lkGUIapp

8. Wait (and wait somemore). Java uses a lot of graphics operations which take time to propagate over aslow link (even with compression), but the GUI console should eventually appear.

Transferring Resource HierarchiesWhen you need to perform routinemaintenance or other tasks on a LifeKeeper Server, you can use theLifeKeeper GUI tomove in-service resources to another server. To transfer in-service resource hierarchiesfrom Server A toServer B, use the GUI to bring the hierarchies into service onServer B. Repeat until all ofServer A’s resources have been placed in-service on their respective backup servers. See Bringing aResource In Service for instructions.

When all of Server A’s resources are active on their backup server(s), you can shut downServer A withoutaffecting application processing. For themaintenance period, however, the resources may not haveLifeKeeper protection depending on the number of servers in the cluster.

Technical NotesWe strongly recommend that you read the following technical notes concerning configuration and operationalissues related to your LifeKeeper environment.

SIOS Protection Suite for LinuxPage 223

Page 244: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Features

LifeKeeper FeaturesItem Description

Licensing

LifeKeeper requires unique runtime license keys for each server.  This applies to bothphysical and virtual servers.  A license key is required for the LifeKeeper core software, aswell as for each separately packaged LifeKeeper recovery kit.  The installation scriptinstalls a License Utilities package that obtains and displays the Host ID of your serverduring the initial install of LifeKeeper. Once your licenses have been installed the utility willreturn the Entitlement ID if it is available or the Host ID if it is not. The Host IDs, along withthe Activation ID(s) provided with your software, are used to obtain license keys from theSIOS Technology Corp. website.

LargeClusterSupport

LifeKeeper supports large cluster configurations, up to 32 servers. There aremany factorsother than LifeKeeper, however, that can affect the number of servers supported in acluster.  This includes items such as the storage interconnect and operating system orstorage software limitations.  Refer to the vendor-specific hardware and softwareconfiguration information to determine themaximum supported cluster size.

International-ization andlocalization

LifeKeeper for Linux v5.2 and later does support wide/multi-byte characters in resource andtag names but does not include native languagemessage support. The LifeKeeper GUI canbe localized by creating locale-specific versions of the Java property files, althoughcurrently only the English version is fully localized. However, many of themessagesdisplayed by the LifeKeeper GUI come from the LifeKeeper core, so localization of the GUIwill provide only a partial solution for users until the core software is fully localized.

See also Language Environment Effects under Restrictions or Known Issues foradditional information.

LifeKeeperMIB File

LifeKeeper can be configured to issue SNMP traps describing the events that are occurringwithin the LifeKeeper cluster.  See the lk_configsnmp(8) man page for more informationabout configuring this capability.  TheMIB file describing the LifeKeeper traps can be foundat /opt/LifeKeeper/include/LifeKeeper-MIB.txt.

Watchdog LifeKeeper supports the watchdog feature.  The feature was tested by SIOS TechnologyCorp. on Red Hat EL 5.5 64-bit, and Red Hat EL 6 + softdog.

STONITH LifeKeeper supports the STONITH feature.  This feature was tested by SIOS TechnologyCorp. on SLES 11 on IBM x3550 x86_64 architecture and RHEL5.5 64-bit. 

XFS FileSystem

The XFS file system does not use the fsck utility to check and fix a file system but insteadrelies onmount to replay the log. If there is a concern that theremay be a consistencyproblem, the system administrator should unmount the file system by taking it out ofservice and run xfs_check(8) and xfs_repair(8) to resolve any issues.

IPv6 SIOS has migrated to the use of the ip command and away from the ifconfig command (formore information, see the IPv6 Known Issue).

SIOS Protection Suite for LinuxPage 224

Page 245: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Tuning

TuningItem Description

IPCSemaphoresand IPCSharedMemory

LifeKeeper requires Inter-Process Communication (IPC) semaphores and IPCsharedmemory.  The default Red Hat values for the following Linux kernel options arelocated in/usr/src/linux/include/linux/sem.h and should be sufficient to support mostLifeKeeper configurations.

Option Required Default Red Hat 6.2SEMOPM        14 32SEMUME        20 32SEMMNU        60 32000SEMMAP        25 32000SEMMNI         25 128

System FileTable

LifeKeeper requires that system resources be available in order to failover successfully to abackup system.  For example, if the system file table is full, LifeKeeper may be unable tostart new processes and perform a recovery.  In kernels with enterprise patches, includingthose supported by LifeKeeper, file-max, themaximum number of open files in the system,is configured by default to 1/10 of the systemmemory size, which should be sufficient tosupport most LifeKeeper configurations.  Configuring the file-max value lower than thedefault could result in unexpected LifeKeeper failures.

The value of file-max may be obtained using the following command:

 cat /proc/sys/fs/file-nr

This will return three numbers.  The first represents the high watermark of file table entries(i.e. themaximum the system has seen so far); the second represents the current number offile table entries, and the third is the file-max value.

To adjust file-max, add (or alter) the “fs,file-max” value in /etc/sysctl.conf (see sysctl.conf(5)for the format) and then run

sysctl –p

to update the system.  The value in /etc/sysctl.conf will persist across reboots.

LifeKeeper OperationsItem Description

KernelDebugger(kdb)

init s

Before using the Kernel Debugger (kdb) or moving to init s on a LifeKeeper protected server,you should first either shut down LifeKeeper on that server or switch any LifeKeeper protectedresources over to the backup server.  Use of kdb with the LifeKeeper SCSI ReservationDaemons (lkscsid and lkccissd) enabled (they are enabled by default) can also lead tounexpected panics.

SIOS Protection Suite for LinuxPage 225

Page 246: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Operations

SystemPanic onLockedSharedDevices

LifeKeeper uses a lock to protect shared data from being accessed by other servers on ashared SCSI Bus.  If LifeKeeper cannot access a device as a result of another server takingthe lock on a device, then a critical error has occurred and quick action should be taken or datacan be corrupted. When this condition is detected, LifeKeeper enables a feature that will causethe system to panic. 

If LifeKeeper is stopped by somemeans other than ‘/etc/init.d/lifekeeper stop-nofailover’ with shared devices still reserved (this could be caused by executing init s),then the LifeKeeper lockingmechanismmay trigger a kernel panic when the other serverrecovers the resource(s).  All resources must be placed out-of-service before stoppingLifeKeeper in this manner.

nolockOption

When using storage applications with locking and following recommendations for the NFSmount options, SPS requires the additional nolock option be set, e.g.rw,nolock,bg,hard,nointr,tcp,nfsvers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0.

Recovering Out-of-ServiceHierarchies

As part of the recovery following the failure of a LifeKeeper server, resource hierarchies thatwere configured on the failed server but which were not in-service anywhere at the time of theserver failure are recovered on the highest priority alive server at the time of the failure. This isthe case nomatter where the out-of-service hierarchy was last in service, including the failedserver, the recovering server, or some other server in the cluster.

Coexistence withLinuxFirewallsandSELinux

The firewall and SELinux are enabled upon installation. After installation is complete, SELinuxshould be disabled and the firewall should bemodified.

LifeKeeper will not install or function if the SELinux mode is "enabled" or "permissive." Todisable SELinux in Red Hat, run the system-config-securitylevel-tui tool from theconsole of the host system. While SELinux for SLES 11 SP1 is available, it toomust bedisabled (http://www.novell.com/linux/releasen.../SUSE-SLES/11/).

AppArmor (for distributions that use this security model) may be enabled.

LifeKeeper will function if a host firewall is enabled. However, unless absolutely necessary, itis recommended that the firewall be disabled and that the LifeKeeper protected resourcesreside behind another shielding firewall.

If LifeKeeper must coexist on firewall enabled hosts, note that LifeKeeper uses specific portsfor communication paths, GUI, IP and Data Replication. When using the Linux firewall feature,the specific ports that LifeKeeper is using need to be opened.

To disable or modify the Red Hat firewall, run the system-config-securitylevel-tuitool from the console of the host system. To disable or modify the SUSE firewall, run yast2and chooseSecurity and Users, then Firewall. Refer to Running LifeKeeper with a Firewallfor details. 

SuidMountOption

The suidmount option is the default whenmounting as root and is not written to the /etc/mtabby themount command. The suidmount option is not needed in LifeKeeper environments.

SIOS Protection Suite for LinuxPage 226

Page 247: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Server Configuration

Server ConfigurationItem Description

BIOS Updates The latest available BIOS should always be installed on all LifeKeeper servers.

LifeKeeper Version 8.2.0 and Later GUI Requirement64-bit versions of any PAM related packages will be required for the LifeKeeper GUI Client to successfullyauthenticate users.

Confirm Failover and Block Resource Failover SettingsMake sure you review and understand the following descriptions, examples and considerations before settingtheConfirm Failover or Block Resource Failover in your LifeKeeper environment. These settings areavailable from the command line or on theProperties panel in the LifeKeeper GUI.

Confirm Failover On:

Definition – Enables manual failover confirmation from System A to System B (whereSystem A is the serverwhose properties are being displayed in the Properties Panel andSystem B is the system to the left of thecheckbox). If this option is set on a system, it will require amanual confirmation by a system administratorbefore allowing LifeKeeper to perform a failover recovery of a system that it detects as failed.

Use the lk_confirmso command to confirm the failover. By default, the administrator has 10minutes to runthis command. This time can be changed by modifying theCONFIRMSOTO setting in/etc/default/LifeKeeper. If the administrator does not run the lk_confirmso commandwithin the timeallowed, the failover will either proceed or be blocked. By default, the failover will proceed. This behavior canbe changed by modifying theCOMFIRMSODEF setting in /etc/default/LifeKeeper.

Example: If you wish to block automatic failovers completely, then you should set the Confirm Failover Onoption in theProperties panel and also set CONFIRMSODEF to 1 (block failover) andCONFIRMSOTO to 0(do not wait to decide on the failover action).

When to select this setting:

This setting is used inmost Disaster Recovery and otherWAN configurations where the configuration doesnot include redundant heartbeat communications paths.

In a regular site (nonmulti-site cluster), open theProperties page from one server and then select the serverthat you want theConfirm Failover flag to be set on.

For aMulti-site WAN configuration: Enable manual failover confirmation

For aMulti-site LAN configuration: Do not enablemanual failover confirmation

SIOS Protection Suite for LinuxPage 227

Page 248: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Set Block Resource Failover On:

In amulti-site cluster environment – from the non-disaster system, select the DR system and check the setconfirm failover flag. You will need to open theProperties panel and select this setting for each non-disasterserver in the cluster.

Set Block Resource Failover On:

Definition - By default, all resource failures will result in a recover event that will attempt to recover the failedresource on the local system. If local recovery fails or is not enabled, then LifeKeeper transfers the resourcehierarchy to the next highest priority system for which the resource is defined. However, if this setting isselected on a designated system(s), all resource transfers due to a resource failure will be blocked from thegiven system.

When the setting is enabled, the followingmessage is logged:

Local recovery failure, failover blocked, MANUAL INTERVENTION REQUIRED

Conditions/Considerations:

In aMulti-site configuration, do not select Block Failover for any server in the configuration.

Remember: This setting will not affect failover behavior if there is a complete system failure. It will only blockfailovers due to resource failures.

NFS Client OptionsWhen setting up a LifeKeeper protected NFS server, how the NFS clients connect to that server canmake asignificant impact on the speed of reconnection on failover. 

NFS Client Mounting Considerations

AnNFS Server provides a network-based storage system to client computers. To utilize this resource, theclient systems must “mount” the file systems that have been NFS exported by the NFS server. There areseveral options that system administrators must consider on how NFS clients are connected to theLifeKeeper protected NFS resources.

UDP or TCP?

The NFS Protocol can utilize either the User Datagram Protocol (UDP) or the Transmission Control Protocol(TCP). NFS has historically used the UDP protocol for client-server communication. One reason for this isthat it is easier for NFS to work in a stateless fashion using the UDP protocol. This “statelessness” is valuablein a high availability clustering environment, as it permits easy reconnection of clients if the protected NFSserver resource is switched between cluster hosts. In general, when working with a LifeKeeper protected NFSresource, the UDP protocol tends to work better than TCP.

Sync Option in /etc/exports

Specifying “sync” as an export option is recommended for LifeKeeper protected NFS resources. The “sync”option tells NFS to commit writes to the disk before sending an acknowledgment back to the NFS client. The

SIOS Protection Suite for LinuxPage 228

Page 249: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

RedHat EL6 (and Fedora 14) Clients with Red Hat EL6 NFS Server

contrasting “async” option is also available, but using this option can lead to data corruption, as the NFSserver will acknowledge NFS writes to the client before committing them to disk. NFS clients can also specify“sync” as an option when they mount the NFS file system.

Red Hat EL6 (and Fedora 14) Clients with Red Hat EL6 NFS Server

Due to what appears to be a bug in the NFS server for Red Hat EL6, NFS clients running Red Hat EL6 (andFedora 14) cannot specify both an NFS version (nfsvers) and UDP in themount command. This samebehavior has been observed on an Ubuntu10.10 client as well. This behavior is not seen with Red Hat EL5clients when using a Red Hat EL6 NFS server, and it is also not seen with any clients using a Red Hat EL5NFS server. The best combination of NFS mount directives to use with Red Hat EL6 (Fedora 14) clients and aRedHat EL 6 NFS server is:

mount <protected-IP>:<export> <mount point>-o nfsvers=2,sync,hard,intr,timeo=1

l This combination produces the fastest re-connection times for the client in case of a switchover or fail-over of the LifeKeeper protected NFS server.

Red Hat EL5 NFS Clients with a Red Hat EL6 NFS Server

The best combination of options when using NFS clients running Red Hat EL5 with a Red Hat EL6 NFSserver for fast reconnection times is:

mount <protected-IP>:<export> <mount point>-o nfsvers=3,sync,hard,intr,timeo=1,udp

Cluster Example

Expanded Multicluster Example

SIOS Protection Suite for LinuxPage 229

Page 250: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

TroubleshootingTheMessage Catalog (located on our Technical Documentation site under “Search for an Error Code”)provides a listing of all error codes, including operational, administrative andGUI, that may be encounteredwhile using SIOS Protection Suite for Linux and, where appropriate, provides additional explanation of thecause of the error code and necessary action to resolve the issue. This full listingmay be searched for anyerror code received, or youmay go directly to one of the following individual Message Catalogs:

l CoreMessage Catalog

l DB2Message Catalog

l DMMP Kit Message Catalog

l Recovery Kit for EC2Message Catalog

l File System Kit Message Catalog

l Gen/App Kit Message Catalog

l GUI Message Catalog

l IP Kit Message Catalog

l Oracle Listener Kit Message Catalog

l Oracle Kit Message Catalog

l SCSI Kit Message Catalog

l DataKeeper Kit Message Catalog

l Quick Service Protection Recovery Kit Message Catalog

In addition to utilizing theMessage Catalog described above, the following topics detail troubleshootingissues, restrictions, etc., that may be encountered:

Common Causes of an SPS Initiated FailoverIn the event of a failure, SPS has twomethods of recovery: local recovery and inter-server recovery. If localrecovery fails, a "failover" is implemented. A failover is defined as automatic switching to a backup serverupon the failure or abnormal termination of the previously active application, server, system, hardwarecomponent or network. Failover and switchover are essentially the same operation, except that failover isautomatic and usually operates without warning, while switchover requires human intervention. Thisautomatic failover can occur for a number of reasons. Below is a list of themost common examples of an SPSinitiated failover.

SIOS Protection Suite for LinuxPage 230

Page 251: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Server Level Causes

Server Level Causes

Server FailureSPS has a built-in heartbeat signal that periodically notifies each server in the configuration that its pairedserver is operating. A failure is detected if a server fails to receive the heartbeat message.

l Primary server loses power or is turned off.

l CPU Usage caused by excessive load -- Under very heavy I/O loads, delays and low memory con-ditions can cause system to become unresponsive such that SPS may detect a server as down and ini-tiate a failover.

l Quorum/Witness - As part of the I/O fencingmechanism of quorum/witness, when a primary serverloses quorum, a "fastboot", "fastkill" or "osu" is performed (based on settings) and a failover is initiated.When determining when to fail over, the witness server allows resources to be brought in service on abackup server only in cases where it verifies the primary server has failed and is no longer part of thecluster. This will prevent failovers from happening due to simple communication failures betweennodes when those failures don’t affect the overall access to, and performance of, the in-service node.

Relevant TopicsSupported Storage List

Server Failure Recovery Scenario

Tuning the LifeKeeper Heartbeat

Quorum/Witness

Communication Failures/Network FailuresSPS sends the heartbeat between servers every five seconds. If a communication problem causes theheartbeat to skip two beats but it resumes on the third heartbeat, SPS takes no action. However, if thecommunication path remains dead for three beats, SPS will label that communication path as dead but willinitiate a failover only if the redundant communication path is also dead.

l Network connection to the primary server is lost.

l Network latency.

l Heavy network traffic on a TCP comm path can result in unexpected behavior, including false failoversand LifeKeeper initialization problems.

l Using STONITH, when SPS detects a communication failure with a node, that node will be poweredoff and a failover will occur.

l Failed NIC.

l Failed network switch.

l Manually pulling/removing network connectivity.

SIOS Protection Suite for LinuxPage 231

Page 252: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Split-Brain

Relevant TopicsCreating a Communication Path

Tuning the LifeKeeper Heartbeat

Network Configuration

Verifying Network Configuration

LifeKeeper Event Forwarding via SNMP

Network-Related Troubleshooting (GUI)

Running LifeKeeperWith a Firewall

STONITH

Split-BrainIf a single comm path is used and the comm path fails, then SPS hierarchies may try to come into service onmultiple systems simultaneously. This is known as a false failover or a “split-brain” scenario. In the “split-brain” scenario, each server believes it is in control of the application and thus may try to access and writedata to the shared storage device. To resolve the split-brain scenario, SPS may cause servers to be poweredoff or rebooted or leave hierarchies out-of-service to assure data integrity on all shared data. Additionally,heavy network traffic on a TCP comm path can result in unexpected behavior, including false failovers and thefailure of LifeKeeper to initialize properly.

The following are scenarios that can cause split-brain:

l Any of the comm failures listed above

l Improper shutdown of LifeKeeper

l Server resource starvation

l Losing all network paths

l DNS or other network glitch

l System lockup/thaw

Resource Level CausesSPS is designed tomonitor individual applications and groups of related applications, periodically performinglocal recoveries or notifications when protected applications fail. Related applications, by example, arehierarchies where the primary application depends on lower-level storage or network resources. SPS monitorsthe status and health of these protected resources. If the resource is determined to be in a failed state, anattempt will bemade to restore the resource or application on the current system (in-service node) withoutexternal intervention. If this local recovery fails, a resource failover will be initiated.

SIOS Protection Suite for LinuxPage 232

Page 253: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Application Failure

Application Failurel An application failure is detected, but the local recovery process fails.

l Remove Failure -During the resource failover process, certain resources need to be removed from ser-vice on the primary server and then brought into service on the selected backup server to provide fullfunctionality of the critical applications. If this remove process fails, a reboot of the primaryserver will be performed resulting in a complete server failover.

Examples of remove failures:

l Unable to unmount file system

l Unable to shut down protected application (oracle, mysql, postgres, etc)

Relevant TopicsFile System Health Monitoring

Resource Error Recovery Scenario

File Systeml Disk Full -- SPS's File System Health Monitoring can detect disk full file system conditions whichmayresult in failover of the file system resource.

l Unmounted or Improperly Mounted File System -- User manually unmounts or changes options on anin-service and LK protected file system.

l Remount Failure -- The following is a list of common causes for remount failure which would lead to afailover:

l corrupted file system (fsck failure)

l failure to createmount point directory

l mount point is busy

l mount failure

l SPS internal error

Relevant TopicsFile System Health Monitoring

IP Address FailureWhen a failure of an IP address is detected by the IP Recovery Kit, the resulting failure triggers the executionof the IP local recovery script. SPS first attempts to bring the IP address back in service on the current

SIOS Protection Suite for LinuxPage 233

Page 254: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Reservation Conflict

network interface. If the local recovery attempt fails, SPS will perform a failover of the IP address and alldependent resources to a backup server. During failover, the remove process will un-configure the IP addresson the current server so that it can be configured on the backup server. Failure of this remove process willcause the system to reboot.

l IP conflict

l IP collision

l DNS resolution failure

l NIC or Switch Failures

Relevant TopicsCreating Switchable IP Address

IP Local Recovery

Reservation Conflictl A reservation to a protected device is lost or stolen

l Unable to regain reservation or control of a protected resource device (caused by manual user inter-vention, HBA or switch failure)

Relevant TopicsSCSI Reservations

Disabling Reservations

SCSI Devicel Protected SCSI device could not be opened. The devicemay be failing or may have been removedfrom the system.

Known Issues and RestrictionsIncluded below are the restrictions or known issues open against LifeKeeper for Linux, broken down byfunctional area.

Installation

DescriptionIn Release 7.4 and forward, relocation of the SIOS product RPM packages is no longer supported.

SIOS Protection Suite for LinuxPage 234

Page 255: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Installation

DescriptionLinux Dependencies

Successful completion of the installation of SIOS Protection Suite for Linux, including the optionalRecovery Kits, requires the installation of a number of prerequisite packages. Note: Unsuccessfulinstallation of these dependent packages will also impact the ability to start SIOS Protection Suite for Linuxalong with the ability to load the SIOS Protection Suite for Linux GUI.

To prevent script failures, these packages should be installed prior to attempting to run the installation setupscript. For a complete list of dependencies, see the Linux Dependencies topic.

The multipathd daemon will log errors in the error log when the nbd driver is loaded as it tries toscan the new devices

Solution:  To avoid these errors in the log, add devnode "^nbd" to the blacklist in /etc/multipath.conf.

Incomplete NFS Setup Logging

When running the Installation setup script from the ISO image (sps.img), the output from the script patchingprocess for NFS is not captured in the LifeKeeper install log (/var/log/LK_install.log). No workaround isavailable.

mksh conflicts with SIOS Protection Suite for Linux setup needing ksh

If the mksh package is installed, the SIOS Protection Suite for Linux setup will fail indicating a packageconflict. The SIOS Protection Suite for Linux requires the ksh package.

Workaround: On RHEL, CentOS or Oracle Linux, remove the mksh package and install the ksh package.After installing the ksh package, re-run the SIOS Protection Suite for Linux setup.

Example:

1. Remove the mksh package

yum remove mksh

2. Install the ksh package

yum install ksh

3. Re-run setup

SIOS Protection Suite for LinuxPage 235

Page 256: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Installation

DescriptionUnexpected termination of daemons

Daemons using IPC terminate unexpectedly after update to Red Hat Enterprise Linux 7.2 and RedHat 7.2derivative systems. A new systemd feature was introduced in Red Hat Enterprise Linux 7.2 related to thecleanup of all allocated inter-process communication (IPC) resources when the last user session finishes.A session can be an administrative cron job or an interactive session. This behavior can cause daemonsrunning under the same user, and using the same resources, to terminate unexpectedly.

To work around this problem, edit the file /etc/systemd/logind.conf and add the following line:

RemoveIPC=no

Then, execute the following command, so that the change is put into effect:

systemctl restart systemd-logind.service

After performing these steps, daemons no longer crash in the described situation. Applications (such asMQ, Oracle, SAP, etc) using sharedmemory and semaphores may be affected by this issue and thereforerequire this change.

SIOS Protection Suite for LinuxPage 236

Page 257: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Core

LifeKeeper Core

SIOS Protection Suite for LinuxPage 237

Page 258: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Core

DescriptionFile system labels should not be used in large configurations

The use of file system labels can cause performance problems during boot-up with large clusters.  Theproblems are generally the result of the requirement that to use labels all devices connected to a systemmust be scanned.  For systems connected to a SAN, especially those with LifeKeeper where accessing adevice is blocked, this scanning can be very slow. 

To avoid this performance problem onRedHat systems, edit /etc/fstab and replace the labels with the pathnames. 

lkscsid will halt system when it should issue a sendevent

When lkscsid detects a disk failure, it should, by default, issue a sendevent to LifeKeeper to recoverfrom the failure.  The sendevent will first try to recover the failure locally and if that fails, will try to recoverthe failure by switching the hierarchy with the disk to another server. On some versions of Linux (RHEL 5and SLES11), lkscsid will not be able to issue the sendevent but instead will immediately halt thesystem. This only affects hierarchies using the SCSI device nodes such as /dev/sda.

DataKeeper Create Resource fails

When using DataKeeper in certain environments (e.g., virtualized environments with IDE disk emulation, orservers with HP CCISS storage), an error may occur when amirror is created:

ERROR 104052: Cannot get the hardware ID of the device "dev/hda3"

This is because LifeKeeper does not recognize the disk in question and cannot get a unique ID to associatewith the device.

Workaround: Add a pattern for our disk(s) to the DEVNAME device_pattern file, e.g.:

# cat /opt/LifeKeeper/subsys/scsi/resources/DEVNAME/device_pattern

/dev/hda*

Specifying hostnames for API access 

The key name used to store LifeKeeper server credentials must match exactly the hostname of the otherLifeKeeper server (as displayed by the hostname command on that server). If the hostname is an FQDN,then the credential key must also be the FQDN. If the hostname is a short name, then the key must also bethe short name.

Workaround: Make sure that the hostname(s) stored by credstorematch the hostname exactly.

SIOS Protection Suite for LinuxPage 238

Page 259: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Core

DescriptionThe use of lkbackups taken from versions of LifeKeeper prior to 8.3.2 requires manuallyupdating /etc/default/LifeKeeper when restored on a later version

In the current version of LifeKeeper/SPS, there have been significant enhancements to the logging andother major core components. These enhancements affect the tunables in the/etc/default/LifeKeeper file. When an lkbackup created with an older version of LifeKeeper/SPS isrestored on a newer version of LifeKeeper/SPS, these tunables will no longer have the right values causinga conflict.

Solution:Prior to restoring from an lkbackup, save /etc/default/LifeKeeper. After restoring fromthe lkbackup, merge in the tunable values as listed below.

When using lkbackups taken from versions previous to 8.0.0 and restoring on 8.0.0 or later:

LKSYSLOGTAG=LifeKeeper

LKSYSLOGSELECTOR=local6

Also, when using lkbackups taken from versions previous to 9.0.1 and restoring on 9.0.1 or later:

PATH=/opt/LifeKeeper/bin:/usr/java/jre1.8.0_51/bin:/usr/java/bin:/usr/java/jdk1.8.0_51/bin:/bin:/usr/bin:/usr/sbin:/sbin

See section on LoggingWith syslog for further information.

Restore of an lkbackup after a resource has been created may leave broken equivalencies

The configuration files for created resources are saved during an lkbackup. If a resource is created for thefirst time after an lkbackup has been taken, that resourcemay not be properly accounted for whenrestoring from this previous backup.

Solution: Restore from lkbackup prior to adding a new resource for the first time. If a new resource hasbeen added after an lkbackup, it should either be deleted prior to performing the restore, or delete aninstance of the resource hierarchy, then re-extend the hierarchy after the restore. Note: It is recommendedthat an lkbackup be run when a resource of a particular type is created for the first time.

Resources removed in the wrong order during failover

In cases where a hierarchy shares a common resource instance with another root hierarchy, resources aresometimes removed in the wrong order during a cascading failover or resource failover.

Solution: Creating a common root will ensure that resource removals in the hierarchy occur from the topdown.

1. Create a gen/app that always succeeds on restore and remove.

2. Make all current roots children of this new gen/app.

Note: Using /bin/true for the restore and remove script would accomplish this.

SIOS Protection Suite for LinuxPage 239

Page 260: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Core

DescriptionLifeKeeper syslog EMERG severity messages do not display to a SLES11 host's console whichhas AppArmor enabled

LifeKeeper is accessing /var/run/utmp which is disallowed by the SLES11 AppArmor syslog-ngconfiguration.

Solution: To allow LifeKeeper syslog EMERG severity messages to appear on a SLES11 console withAppArmor enabled, add the following entry to /etc/apparmor.d/sbin.syslog-ng:

/var/run/utmp kr

If added to sbin.syslog-ng, you can replace the existing AppArmor definition (without rebooting) andupdate with:

apparmor_parser -r /etc/apparmor.d/sbin.syslog-ng

Verify that the AppArmor update was successful by sending an EMERG syslog entry via:

logger -p local6.emerg "This is a syslog/lk/apparmor test."

RHEL 6.0 is NOT Recommended

SIOS strongly discourages the use of RHEL 6.0. If RHEL 6.0 is used, please understand that an OS updatemay be required to fix certain issues including, but not limited to:

l DMMP fails to recover from cable pull with EMC CLARiiON (fixed in RHEL 6.1)

l md recovery process hangs (fixed in the first update kernel of RHEL 6.1)

Note: In DataKeeper configurations, if the operating system is updated, a reinstall/upgrade of SIOSProtection Suite for Linux is required.

Delete of nested file system hierarchy generates "Object does not exist" message

Solution: This message can be disregarded as it does not create any issues.

filesyshier returns the wrong tag on a nested mount create

When a database has nested file system resources, the file system kit will create the file system for boththe parent and the nested child. However, filesyshier returns only the child tag. This causes theapplication to create a dependency on the child but not the parent.

Solution: Whenmultiple file systems are nested within a single mount point, it may be necessary tomanually create the additional dependencies to the parent application tag using dep_create or via the UICreate Dependency.

DataKeeper: Nested file system create will fail with DataKeeper

When creating a DataKeeper mirror for replicating an existing File System, if a file system is nested withinthis structure, youmust unmount it first before creating the File System resource.

Workaround: Manually unmount the nested file systems and remount / create each nestedmount.

SIOS Protection Suite for LinuxPage 240

Page 261: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Core

Descriptionlkstart on SLES 11 SP2 generates insserv message

When lkstart is run on SLES 11 SP2, the following insserv message is generated:

insserv: Service syslog is missed in the runlevel 4 to useservice steeleye-runit

LifeKeeper and steeleye-runit scripts are configured by default to start in run level 4 where dependent initscript syslog is not. If system run level is changed to 4, syslog will be terminated and LifeKeeper will beunable to log.

Solution:Make sure system run level is NOT changed to run level 4.

Changing the mount point of the device protected by Filesystem resource may lead datacorruption

Themount point of the device protected by LifeKeeper via the File System resource (filesys) must not bechanged. Doing somay lead to the device beingmounted onmultiple nodes and if a switchover is done andthis could lead to data corruption.

XFS file system usage may cause quickCheck to fail.

In the case CHECK_FS_QUOTAS setting is enabled for LifeKeeper installed on Red Hat Enterprise Linux7 / Oracle Linux 7 / CentOS 7, quickCheck fails if uquota, gquota option is set to the XFS file systemresource, which is to be protected. Solution: Use usrquota, grpquota instead of uquota, gquota for mountoptions of XFS file system, or, disable CHECK_FS_QUOTAS setting.

Btrfs is not supported

Btrfs cannot be used for LifeKeeper files (/opt/LifeKeeper), bitmap files if they are not in /opt/LifeKeeper,lkbackupfiles, or any other LifeKeeper related files. In addition, LifeKeeper does not support protecting Btrfswithin a resource hierarchy.

SLES12 SP1 or later on AWS

The following restrictions apply with SLES12 SP1 or later on AWS:l Cannot set static routing configurationAutomatic IP address configuration via DHCP does not work if a static routing configuration is set in/etc/sysconfig/network/routes. This causes the network not to start correctly.Solution:Update the routing information in the configuration file by modifying the "ROUTE"parameter in /etc/sysconfig/network/ifroute-ethX

l Hostname is changed even if the "Change Hostname via DHCP" setting is disabled.The LK service does not work properly if the hostname is rewritten. In SLES12 SP1 or later on AWS,the hostname is changed even after the "Change Hostname via DHCP" setting is disabled.Solution:

l Update /etc/cloud/cloud.cfg to comment out the "update_hostname" parameter

l Update /etc/cloud/cloud.cfg to set the preserve_hostname parameter to "true"

l Update /etc/sysconfig/network/dhcp to set the DHCLIENT_SET_HOSTNAME parameter to"no"

SIOS Protection Suite for LinuxPage 241

Page 262: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

LifeKeeper Core

DescriptionShutdown Strategy set to "Switchover Resources" may fail when using Quorum/Witness Kit inWitness mode

Hierarchy switchover during LifeKeeper shutdownmay fail to occur when using the Quorum/Witness Kit inWitness mode.Workaround: Manually switchover resource hierarchies before shutdown.

Edit /etc/service

If the following entry in /etc/service is deleted, LifeKeeper cannot start up.

lcm_server 7365/tcpDon't delete this entry when editing the file.

Any storage unit which returns a string including a space for the SCSI ID cannot be protected byLifeKeeper.

Automatic recovery of network may fail when link status goes down on Red Hat 5.x and Red Hat6.x systems when using bonded interfaces.

The network doesn't recover automatically when using bonded interfaces when the link status is lost andthen restored on RedHat 5.x or Red Hat 6.x. Loss of the link status can occur via a bad network cable, badswitch or hub or a cable reconnection and ifdown -> ifup. To recover from this status, restart the networkmanually by executing the following command by a user with root authorization.

# service network restart

Please note that this problem is already corrected for RHEL7. Also, this problem doesn't occur with SLES.

SIOS Protection Suite for LinuxPage 242

Page 263: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Internet/IP Licensing

Internet/IP Licensing

Description/etc/hosts settings dependency

/etc/hosts settings:When using internet-based licensing (IPv4 address), the configuration of /etc/hosts can negatively impactlicense validation. If LifeKeeper startup fails with:

Error in obtaining LifeKeeper license key:Invalid host.The hostid of this system does not match the hostid specified in the

license file.

and the listed internet hostid is correct, then the configuration of /etc/hosts may be the cause. To correctlymatch /etc/hosts entries, IPv4 entries must be listed before any IPv6 entries. To verify if the /etc/hostsconfiguration is the cause, run the following command:

/opt/LifeKeeper/bin/lmutil lmhostid -internet -n

If the IPv4 address listed does not match the IPv4 address in the installed license file, then /etc/hosts mustbemodified to place IPv4 entries before IPv6 entries to return the correct address.

GUI

DescriptionGUI login prompt may not re-appear when reconnecting via a web browser after exiting the GUI

When you exit or disconnect from theGUI applet and then try to reconnect from the sameweb browsersession, the login prompt may not appear.

Workaround: Close the web browser, re-open the browser and then connect to the server. When using theFirefox browser, close all Firefox windows and re-open.

lkGUIapp on RHEL 5 reports unsupported theme errors

When you start the GUI application client, youmay see the following consolemessage. This messagecomes from the RHEL 5 and FC6 Java platform look and feel and will not adversely affect the behavior ofthe GUI client.

/usr/share/themes/Clearlooks/gtk-2.0/gtkrc:60: Engine "clearlooks"is unsupported, ignoring

SIOS Protection Suite for LinuxPage 243

Page 264: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

GUI

DescriptionGUI does not immediately update IP resource state after network is disconnected and thenreconnected

When the primary network between servers in a cluster is disconnected and then reconnected, the IPresource state on a remote GUI client may take as long as 1minute and 25 seconds to be updated due to aproblem in the RMI/TCP layer.

SIOS Protection Suite for LinuxPage 244

Page 265: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

GUI

DescriptionJava Mixed Signed/Unsigned Code Warning - When loading the LifeKeeper Java GUI client appletfrom a remote system, the following security warning may be displayed: 

Enter “Run” and the following dialog will be displayed: 

SIOS Protection Suite for LinuxPage 245

Page 266: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Data Replication

DescriptionBlock?  Enter “No” and the LifeKeeper GUI will be allowed to operate. 

Recommended Actions:  To reduce the number of security warnings, you have two options: 

1. Check the “Always trust content from this publisher” box and select “Run”.  The next time theLifeKeeper GUI Java client is loaded, the warningmessage will not be displayed.

or

2. Add the following entry to your Java “deployment.properties” file to eliminate the second dialogabout blocking. The security warning will still be displayed when you load the Java client, however,the applet will not be blocked and the Block “Yes” or “No” dialog will not be displayed.  Please notethis setting will apply to all of your Java applets.

deployment.security.mixcode=HIDE_RUN 

To bypass bothmessages, implement 1 and 2.

steeleye-lighttpd process fails to start if Port 778 is in use

If a process is using Port 778 when steeleye-lighttpd starts up, steeleye-lighttpd fails causing a failure toconnect to the GUI.

Solution:Set the following tunable on all nodes in the cluster and then restart LifeKeeper on all the nodes:

Add the following line to /etc/default/LifeKeeper:

API_SSL_PORT=port_number

where port_number is the new port to use.

Data Replication

DescriptionImportant reminder about DataKeeper for Linux asynchronous mode in an LVM over DataKeeprconfiguration

Kernel panics may occur in configurations were LVM resources sit abovemultiple asynchronous mirrors. Inthese configurations data consistency may be an issue if a panic occurs. Therefore the requiredconfigurations are a single DataKeeper mirror or multiple synchronous DataKeeper mirrors.

SIOS Protection Suite for LinuxPage 246

Page 267: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Data Replication

DescriptionIn symmetric active SDR configurations with significant I/O traffic on both servers, the filesystemmounted on the netraid device (mirror) stops responding and eventually the whole system hangs

Due to the single threaded nature of the Linux buffer cache, the buffer cache flushing daemon can hangtrying to flush out a buffer which needs to be committed remotely. While the flushing daemon is hung, allactivities in the Linux system with dirty buffers will stop if the number of dirty buffers goes over the systemaccepted limit (set in/proc/sys/kernel/vm/bdflush).

Usually this is not a serious problem unless something happens to prevent the remote system from clearingremote buffers (e.g. a network failure).  LifeKeeper will detect a network failure and stop replication in thatevent, thus clearing a hang condition.  However, if the remote system is also replicating to the local system(i.e. they are both symmetrically replicating to each other), they can deadlock forever if they both get intothis flushing daemon hang situation.

The deadlock can be released by manually killing the nbd-client daemons on both systems (which will breakthemirrors).  To avoid this potential deadlock entirely, however, symmetric active replication is notrecommended.

Mirror breaks and fills up /var/log/messages with errors

This issue has been seen occasionally (on Red Hat EL 6.x and CentOS 6) during stress tests with inducedfailures, especially in killing the nbd_server process that runs on amirror target system. Upgrading to thelatest kernel for your distributionmay help lower the risk of seeing this particular issue, such as kernel-2.6.32-131.17.1 on Red Hat EL 6.0 or 6.1. Rebooting the source system will clear up this issue. 

With the default kernel that comes with CentOS 6 (2.6.32-71), this issuemay occur muchmore frequently(even when themirror is just under a heavy load.) Unfortunately, CentOS has not yet released a kernel(2.6.32-131.17.1) that will improve this situation. SIOS recommends updating to the 2.6.32-131.17.1 kernelas soon as it becomes available for CentOS 6.

Note:Beginning with SPS 8.1, when performing a kernel upgrade on RedHat Enterprise Linux systems, itis no longer a requirement that the setup script (./setup) from the installation image be rerun. Modulesshould be automatically available to the upgraded kernel without any intervention as long as the kernel wasinstalled from a proper Red Hat package (rpm file).

High CPU usage reported by top for md_raid1 process with large mirror sizes

With the mdX_raid1 process (with X representing themirror number), high CPU usage as reported by topcan be seen on someOS distributions when working with very largemirrors (500GB ormore).

Solution: To reduce the CPU usage percent, modify the chunk size to 1024 via the LifeKeeper tunableLKDR_CHUNK_SIZE then delete and recreate themirror in order to use this new setting.

The use of lkbackup with DataKeeper resources requires a full resync

Although lkbackup will save the instance and mirror_info files, it is best practice to perform a fullresync of DataKeeper mirrors after a restore from lkbackup as the status of source and target cannot beguaranteed while a resource does not exist.

SIOS Protection Suite for LinuxPage 247

Page 268: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Data Replication

DescriptionMirror resyncs may hang in early Red Hat/CentOS 6.x kernels with a "Failed to remove device"message in the LifeKeeper log

Kernel versions prior to version 2.6.32-131.17.1 (RHEL 6.1 kernel version 2.6.32-131.0.15 before update,etc) contain a problem in themd driver used for replication. This problem prevents the release of the nbddevice from themirror resulting in the logging of multiple "Failed to remove device" messages and theaborting of themirror resync. A system reboot may be required to clear the condition. This problem hasbeen observed during initial resyncs after mirror creation and when themirror is under stress.

Solution: Kernel 2.6.32-131.17.1 has been verified to contain the fix for this problem. If you are usingDataKeeper with Red Hat or CentOS 6 kernels before the 2.6.32-131.17.1 version, we recommendupdating to this or the latest available version.

DataKeeper does not support using Network Compression on SLES11 SP4 and SLES12 SP1 orlater

DataKeeper does not support using Network Compression on SLES11 SP4 and SLES12 SP1 or later dueto disk I/O performance problem.

SIOS Protection Suite for LinuxPage 248

Page 269: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

IPv6

IPv6

SIOS Protection Suite for LinuxPage 249

Page 270: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

IPv6

DescriptionSIOS has migrated to the use of the ip command and away from the ifconfig command. Because ofthis change, customers with external scripts are advised tomake a similar change. Instead of issuingthe ifconfig command and parsing the output looking for a specific interface, scripts should insteaduse "ip -o addr show" and parse the output looking for lines that contain the words "inet" and"secondary".

# ip -o addr show1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

1: lo    inet 127.0.0.1/8 scope host lo1: lo    inet6 ::1/128 scope host\       valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_faststate UP qlen 1000

\    link/ether d2:05:de:4f:a2:e6 brd ff:ff:ff:ff:ff:ff2: eth0    inet 172.17.100.77/22 brd 172.17.103.255 scope global eth02: eth0    inet 172.17.100.79/22 scope global secondary eth02: eth0    inet 172.17.100.80/22 scope global secondary eth02: eth0    inet6 2001:5c0:110e:3364::1:2/64 scope global\       valid_lft forever preferred_lft forever

2: eth0    inet6 2001:5c0:110e:3300:d005:deff:fe4f:a2e6/64 scopeglobal dynamic

\       valid_lft 86393sec preferred_lft 14393sec2: eth0    inet6 fe80::d005:deff:fe4f:a2e6/64 scope link\       valid_lft forever preferred_lft forever

So for the above output from the ip command, the following lines contain virtual IP addresses for the eth0interface:

2: eth0    inet 172.17.100.79/22 scope global secondary eth02: eth0    inet 172.17.100.80/22 scope global secondary eth0

SIOS Protection Suite for LinuxPage 250

Page 271: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

IPv6

Description'IPV6_AUTOCONF = No' for /etc/sysconfig/network-scripts/ifcfg-<nicName> is not being honoredon reboot or boot

On boot, a stateless, auto-configured IPv6 address is assigned to the network interface. If a comm path iscreated with a stateless IPv6 address of an interface that has IPV6_AUTOCONF=No set, the address willbe removed if any system resources manage the interface, e.g. ifdown <nicName>;ifup <nicName>.

Comm path using auto-configured IPv6 addresses did not recover and remained dead after rebootingprimary server because IPV6_AUTOCONF was set to No.

Solution: Use Static IPv6 addresses only. The use of auto-configured IPv6 addresses could cause acomm loss after a reboot, change NIC, etc.

While IPv6 auto-configured addresses may be used for comm path creation, it is incumbent upon thesystem administrator to be aware of the following conditions:

l IPv6 auto-configured/stateless addresses are dependent on the network interface (NIC)MACaddress. If a comm path was created and the associated NIC is later replaced, the auto-configuredIPv6 address will be different and LifeKeeper will correctly show the comm path is dead. The commpath will need to be recreated.

l At least with RHEL 5.6, implementing the intended behavior for assuring consistent IPv6 auto-configuration during all phases of host operation requires specific domain knowledge for accuratelyand precisely setting the individual interface config files AS WELL AS the sysctl.conf,net.ipv6.* directives (i.e. explicity setting IPV6_AUTOCONF in the ifcfg-<nic> which is referencedby the 'if/ip' utilities AND setting directives in /etc/sysctl.conf which impact NIC control when thesystem is booting and switching init levels). 

IP: Modify Source Address Setting for IPv6 doesn't set source address

When attempting to set the source address for an IPv6 IP resource, it will report success when nothing waschanged.

Workaround: Currently no workaround is available. This will be addressed in a future release.

IP: Invalid IPv6 addressing allowed in IP resource creation

Entering IPv6 addresses of the format 2001:5c0:110e:3368:000000:000000001:61:14 is accepted when theoctets contain more than four characters.

Workaround: Enter correctly formatted IPv6 addresses.

Can't connect to host via IPv6 addressing

lkGUIapp will fail connecting to a host via IPv6 hex addressing, either via resolvable host name or IPaddress. lkGUIapp requires an IPv4 configured node for connection. IPv6 comm paths are fully supported.

SIOS Protection Suite for LinuxPage 251

Page 272: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Apache

DescriptionIPv6 resource reported as ISP when address assigned to bonded NIC but in 'tentative' state

IPv6 protected resources in LifeKeeper will incorrectly be identified as 'In Service Protected' (ISP) on SLESsystems where the IPv6 resource is on a bonded interface, amode other than 'active-backup' (1) and Linuxkernel 2.6.21 or lower. The IPv6 bonded link will remain in the 'tentative' state with the addressunresolvable.

Workaround: Set the bonded interfacemode to 'active-backup' (1) or operate with an updated kernel whichwill set the link state from 'tentative' to 'valid' for modes other than 'active-backup' (1).

Apache

DescriptionApache Kit does not support IPv6; doesn't indentify IPv6 in httpd.conf

Any IPv6 addresses assigned to the 'Listen' directive entry in the httpd.conf file will cause problems.

Solution: Until there is support for IPv6 in the Apache Recovery Kit, there can be no IPv6 address inthe httpd.conf file after the resource has been created.

Oracle Recovery Kit

DescriptionThe Oracle Recovery Kit does not include support for Connection Manager and Oracle Namesfeatures

The LifeKeeper Oracle Recovery Kit does not include support for the following Oracle Net features ofOracle: Oracle ConnectionManager, a routing process that manages a large number of connections thatneed to access the same service; andOracle Names, the Oracle-specific name service that maintains acentral store of service addresses.

The LifeKeeper Oracle Recovery Kit does protect the Oracle Net Listener process that listens for incomingclient connection requests andmanages traffic to the server.  Refer to the LifeKeeper for Linux OracleRecovery Kit Administration Guide for LifeKeeper configuration specific information regarding the OracleListener.

SIOS Protection Suite for LinuxPage 252

Page 273: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

MySQL

DescriptionThe Oracle Recovery Kit does not support the ASM or grid component features of Oracle 10g

The following information applies to Oracle 10g database instances only.  TheOracle Automatic StorageManager (ASM) feature provided in Oracle 10g is not currently supported with LifeKeeper.  In addition, thegrid components of 10g are not protected by the LifeKeeper Oracle Recovery Kit.  Support for raw devices,file systems, and logical volumes are included in the current LifeKeeper for Linux Oracle Recovery Kit.  Thesupport for the grid components can be added to LifeKeeper protection using the gen/app recovery kit.

The Oracle Recovery Kit does not support NFS Version 4

TheOracle Recovery Kit supports NFS Version 3 for shared database storage. NFS Version 4 is notsupported at this time due to NFSv4 file lockingmechanisms.

Oracle listener stays in service on primary server after failover

Network failures may result in the listener process remaining active on the primary server after anapplication failover to the backup server. Though connections to the correct database are unaffected, youmay still want to kill that listener process.

The Oracle Recovery Kit does not support Oracle Database Standard Edition 2 (SE2) on AWS EC2system

During Oracle Database Standard Edition 2 (SE2) test, an unknown behavior was reported on AWS EC2system. However, we confirmed that other services except EC2 did not have the same issue, meaning thatOracle Recovery Kit supports SE2 on systems except EC2.

MySQL

DescriptionThe "include" directive is not supported

The "include" directive is not supported. All the setup configuration informationmust be described in asingle my.cnf file.

Crash Recovery

RestartingMySQL after an abnormal termination initiates aMySQL crash recovery. While in this recoverystate MySQL client connections are denied. This will prevent LifeKeeper from checking the state of MySQLcausing a possible failover to the standby node.

SIOS Protection Suite for LinuxPage 253

Page 274: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

NFS Server Recovery Kit

NFS Server Recovery Kit

DescriptionTop level NFS resource hierarchy uses the switchback type of the hanfs resource

The switchback type, which dictates whether the NFS resource hierarchy will automatically switch back tothe primary server when it comes back into service after a failure, is defined by the hanfs resource.

Some clients are unable to reacquire nfs file locks

When acting as NFS clients, some Linux kernels do not respond correctly to notifications from anNFSserver that an NFS lock has been dropped and needs to be reacquired.  As a result, when these systemsare the clients of an NFS file share that is protected by LifeKeeper, the NFS locks held by these clients arelost during a failover or switchover.

When using storage applications with locking and following recommendations for the NFS mount options,SPS requires the additional nolock option be set, e.g.rw,nolock,bg,hard,nointr,tcp,nfsvers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0.

NFS v4 changes not compatible with SLES 11 nfs subsystem operation

Themounting of a non-NFS v4 remote export on SLES 11 starts rpc.statd. The start up of rpc.statd on theout of service node in a cluster protecting an NFS v4 root export will fail.

Solution: Do not mix NFS v2/v3 with a cluster protecting an NFS v4 root export.

NFS v4 cannot be configured with IPv6

IPv6 virtual IP gets rolled up into the NFSv4 heirarchy.

Solution: Do not use an IPv6 virtual IP resource when creating an NFSv4 resource.

NFS v4: Unable to re-extend hierarchy after unextend

Extend fails because export point is already exported on the target server.  A re-extend to server A of a NFSv4 hierarchy will fail if a hierarchy is created on server A and extended to server B, brought in service onserver B and then unextended from server A.

Solution: On server A run the command "exportfs -ra" to clean up the extra export information left behind.

SIOS Protection Suite for LinuxPage 254

Page 275: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

NFS Server Recovery Kit

DescriptionFile Lock switchover with NFSv3 fails on some operating systems

Failover file locks with NFSv3 during resources switchover/failover does not work on the operatingsystems listed below. Lock failover with NFSv3 is currently not supported on theseOS versions.

l RHEL7, CentOS7, OL7

l RHEL6, CentOS6, OL6

l SLES12

l SLES11

Solution:Use the lock failover features available with NFSv4.

The Oracle Recovery Kit does not support NFSv4

TheOracle Recovery Kit supports NFSv3 for shared database storage. NFSv4 is not supported at this timedue to NFSv4 file lockingmechanisms.

Stopping and starting NFS subsystem adversely impacts SIOS Protection Suite protected NFSexports.

If the NFS subsystem (/etc/init.d/nfs on RedHat or /etc/init.d/nfsserver on SuSE) isstopped while SIOS Protection Suite for Linux is protecting NFS exports, then all SIOS Protection Suite forLinux protected exported directories will be impacted as the NFS stop action performs an unexport of all thedirectories. The SIOS Protection Suite for Linux NFS quickCheck script will detect the stopped NFSprocesses and the unexported directories and run a local recovery to restart the processes and re-export thedirectories. However, it will take a quickCheck run for each protected export for the SIOS Protection SuiteNFS ARK to recover everything. For example, if five exports are protected by the SIOS Protection Suite forLinux NFS ARK, it will take five quickCheck runs to recover all the exported directories the kit protects.Based on the default quickCheck time of twominutes, it could take up to tenminutes to recover all theexported directories.

Workaround: Do not stop the NFS subsystem while the SIOS Protection Suite NFS ARK is activelyprotecting exported directories on the system. If the NFS subsystemmust be stopped, all NFS resourcesshould be switched to the standby node before stopping the NFS subsystem. Use of the exportfscommand should also be considered. This command line utility provides the ability to export and unexport asingle directory thus bypassing the need to stop the entire NFS subsystem.

SIOS Protection Suite for LinuxPage 255

Page 276: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

SAP Recovery Kit

SAP Recovery Kit

DescriptionFailed delete or unextend of a SAP hierarchy

Deleting or unextending a SAP hierarchy that contains the same IP resource in multiple locations within thehierarchy can sometimes cause a core dump that results in resources not being deleted.

To correct the problem, after the failed unextend or delete operation, manually remove any remainingresources using the LifeKeeper GUI.  Youmay also want to remove the core file from the server.

Handle Warnings gives a syntax error at -e line 1

When changing the default behavior of No in Handle Warnings to Yes, an error is received.

Solution: Leave this option at the default setting of No. Note: It is highly recommended that this setting beleft on the default selection of No as Yellow is a transient state that most often does not indicate a failure.

Choosing same setting causes missing button on Update Wizard

If user attempts to update the Handle Warning without changing the current setting, the next screen,which indicates that they must go back, is missing the Done button.

When changes are made to res_state, monitoring is disabled

If Protection Level is set to BASIC and SAP is taken downmanually (i.e. for maintenance), it will bemarked as FAILED andmonitoring will stop.

Solution: In order for monitoring to resume, LifeKeeper will need to start up the resource instead of startingit upmanually.

ERS in-service fails on remote host if ERS is not parent of Core/CI

Creating an ERS resource without any additional SAP resource dependents will cause initial in-service tofail on switchover. 

Solution: Create ERS as parent of CI/Core instance (SCS or ASCS), then retry in-service.

LVM Recovery Kit

DescriptionImportant reminder about DataKeeper for Linux asynchronous mode in an LVM over DataKeeprconfiguration

Kernel panics may occur in configurations were LVM resources sit abovemultiple asynchronous mirrors. Inthese configurations data consistency may be an issue if a panic occurs. Therefore the requiredconfigurations are a single DataKeeper mirror or multiple synchronous DataKeeper mirrors.

SIOS Protection Suite for LinuxPage 256

Page 277: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Multipath Recovery Kits (DMMP / HDLM / PPATH / NECSPS)

DescriptionUse of lkID incompatible with LVM overwritten on entire disk

When lkID is used to generate unique disk IDs on disks that are configured as LVM physical volumes, thereis a conflict in the locations in which the lkID and LVM information is stored on the disk.  This causes eitherthe lkID or LVM information to be overwritten depending on the order in which lkID and pvcreate are used.

Workaround: When it is necessary to use lkID in conjunction with LVM, partition the disk and use the diskpartition(s) as the LVM physical volume(s) rather than the entire disk.

LVM actions slow on RHEL 6

When running certain LVM commands on RHEL 6, performance is sometimes slower than in previousreleases.  This can be seen in slightly longer restore and remove times for hierarchies with LVM resources.

The configuration of Raw and LVM Recovery Kits together is not supported in RHEL 6environment

When creating a Raw resource, the Raw Recovery Kit is looking for a device file based onmajor # andminor # of Raw device. As the result, /dev/dm-* will be the device; however, this type of /dev/dm-*cannot be handled by the LVM Recovery Kit and a "raw device not found" error will occur in the GUI.

Multipath Recovery Kits (DMMP / HDLM / PPATH / NECSPS)

DescriptionMultipath Recovery Kits (DMMP / HDLM / PPATH / NECSPS): Registration conflict error occurson lkstop when resource OSF

Themultipath recovery kits (DMMP, HDLM, PPATH, NECSPS) can have a system halt occur on theactive (ISP) node when LifeKeeper is stopped on the standby (OSU) node if theMultipath resource state isOSF.

Workarounds:

a) Switch the hierarchy to the standby node before LifeKeeper is stopped

OR

b) Run ins_setstate on the standby node and set theMultipath resource state to OSU before LifeKeeper isstopped

SIOS Protection Suite for LinuxPage 257

Page 278: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

DMMP Recovery Kit

DMMP Recovery Kit

DescriptionDMMP: Write issued on standby server can hang

If a write is issued to a DMMP device that is reserved on another server, then the IO can hang indefinitely(or until the device is no longer reserved on the other server).  If/when the device is released on the otherserver and the write is issued, this can cause data corruption.

The problem is due to the way the path checking is done along with the IO retries in DMMP. When "no_path_retry" is set to 0 (fail), this hang will not occur. When the path_checker for a device fails when thepath is reserved by another server (MSA1000), then this also will not occur.

Workaround: Set "no_path_retry" to 0 (fail).  However, this can cause IO failures due to transient pathfailures.

DMMP: Multiple initiators are not registered properly for SAS arrays that support ATP_C

LifeKeeper does not natively support configurations where there aremultiple SAS initiators connected to aSAS array. In these configurations, LifeKeeper will not register each initiator correctly, so only one initiatorwill be able to issue IOs. Errors will occur if themultipath driver (DMMP for example) tries to issue IOs to anunregistered initiator.

Solution: Set the following tunable in /etc/default/LifeKeeper to allow path IDs to be set based onSAS storage information:

MULTIPATH_SAS=TRUE

LifeKeeper on RHEL 6.0 cannot support reservations connected to an EMC Clariion

Two or more different storage can not be used concurrently in case of the parameter configurationof DMMP recovery kit is required for some storage model.

DMMP RK doesn't function correctly if the disk name ends with "p<number>".

The DMMP RK doesn't function correctly if the disk name ends with "p<number>".

Workaround: Do not create disk names ending in "p<number>".

SIOS Protection Suite for LinuxPage 258

Page 279: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

DB2Recovery Kit

DB2 Recovery Kit

DescriptionDB2 Recovery Kit reports unnecessary error

If DB2 is installed on a shared disk, the followingmessagemay be seen when extending a DB2 resource.

LifeKeeper was unable to add instance "%s" and/or its variables tothe DB2 registry.

This message will not adversely affect the behavior of the DB2 resource extend.

MD Recovery Kit

DescriptionMD Kit does not support mirrors created with “homehost”

The LifeKeeper MD Recovery Kit will not work properly with amirror created with the "homehost" feature. Where "homehost" is configured, LifeKeeper will use a unique ID that is improperly formatted such that in-service operations will fail.  On SLES 11 systems, the “homehost” will be set by default when amirror iscreated.  The version of mdadm that supports “homehost” is expected to be available on other distributionsand versions as well. When creating amirror, specify --homehost="" on the command line to disable thisfeature.  If a mirror already exists that has been created with the “homehost” setting, themirror must berecreated to disable the setting.  If a LifeKeeper hierarchy has already been built for amirror created with“homehost”, the hierarchy must be deleted and recreated after themirror has been built with the “homehost”disabled.

MD Kit does not support MD devices created on LVM devices

The LifeKeeper MD Recovery Kit will not work properly with anMD device created on an LVM device. When theMD device is created, it is given a name that LifeKeeper does not recognize. 

MD Kit configuration file entries in /etc/mdadm.conf not commented out

The LifeKeeper configuration file entries in /etc/mdadm.conf should be commented out after a reboot.These file entries are not commented out.

Local recovery not performed in large configurations

In some cases with large configurations (6 or more hierarchies), if a local recovery is triggered(sendevent), not all of the hierarchies are checked resulting in local recovery attempt failures.

SIOS Protection Suite for LinuxPage 259

Page 280: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

SAP DB / MaxDB Recovery Kit

DescriptionMirrors automatically started during boot

On some systems (for example those running RHEL 6), there is an AUTO entry in the configuration file(/etc/mdadm.conf) that will automatically start mirrors during boot (example:  AUTO +imsm +1.x –all). 

Solution: Since LifeKeeper requires that mirrors not be automatically started, this entry will need to beedited tomake sure that LifeKeeper mirrors will not be automatically started during boot.  The previousexample (AUTO +imsm +1.x –all) is telling the system to automatically start mirrors created using imsmmetadata and 1.x metadataminus all others.  This entry should be changed to "AUTO -all", telling thesystem to automatically start everything “minus” all; therefore, nothing will be automatically started.  

Important:  If system critical resources (such as root) are usingMD, make sure that thosemirrors arestarted by other means while the LifeKeeper protectedmirrors are not.

SAP DB / MaxDB Recovery Kit

DescriptionExtend fails when MaxDB is installed and configured on shared storage.

Workaround: Install a local copy of MaxDB on the backup node(s) in the same directories as the primarysystem.

SIOS Protection Suite for LinuxPage 260

Page 281: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Sybase ASE Recovery Kit

Sybase ASE Recovery Kit

SIOS Protection Suite for LinuxPage 261

Page 282: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Sybase ASE Recovery Kit

DescriptionUser Name/Password Issues:

l If the default user name is password-protected, the create UI does not detect this until afterall validation is complete

When creating the Sybase resource, you are prompted to enter the user name. The help tofront displays amessage that if no user is specified, the default of 'sa' will be used. However,no password validation is done for the default account at this time. When SIOS ProtectionSuite attempts to create the Sybase resource, the resource creation fails because thepassword has not been validated or entered. The password validation occurs on theuser/password dialog, but only when a valid user is actually entered on the user prompt. Evenif using the default user name, it must be specified during the create action.

l Password prompt skipped if no user name specified

User/password dialog skips the password prompt if you do not enter a user name. Whenupdating the user/password via the UI option, if you do not enter the Sybase user name, thedefault of ‘sa’ will be used and no password validation is done for the account. This causesthemonitoring of the database to fail with invalid credential errors. Even if using the defaultuser name, it must be specified during the update action. To fix this failure, perform thefollowing steps:

1. Verify that the required Sybase data files are currently accessible from the intendedserver. In most instances, this will be the backup server due to themonitoring andlocal recovery failure on the primary.

2. Start the Sybase database instance from the command line on this server (see theSybase product documentation for information on starting the databasemanually).

3. From the command line, change directory (cd) to the LKROOT/bin directory (/op-t/LifeKeeper/bin onmost installations).

4. Once in the bin directory, execute the following:

./ins_setstate –t <SYBASE_TAG> -S ISP

where <SYBASE_TAG> is the tag name of the Sybase resource

5. When the command completes, immediately execute theUpdate User/PasswordWizard from the UI and enter a valid user name, even if planning to use the Sybasedefault of ‘sa’. Note: TheUpdate User/Password Wizard can be accessed by right-clicking on the Sybase resource instance and selectingChange User-name/Password.

6. When the hierarchy has been updated on the local server, verify that the resource canbe brought in service on all nodes.

l Protecting backup server fails when Sybase local user name >= eight characters

The Sybase user namemust consist of less than eight characters. If the Sybase local username is greater than eight characters, the process and user identification checks used for

SIOS Protection Suite for LinuxPage 262

Page 283: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Sybase ASE Recovery Kit

Descriptionresource creation andmonitoring will fail. This will also prevent the protection of a validSybase Backup Server instance from being selected for protection. This problem is causedby the operating system translation of user names that are >= eight characters from the nameto the UID in various commands (for example, ps). Youmust use a user name that is lessthan eight characters long.

Resource Create Issue:

l Default Sybase install prompt is based on ASE 16.0 SP02 (/opt/sybase). During the SIOS Pro-tection Suite resource creation, the default prompt for the location of the Sybase installation showsup relative to Sybase Version 16.0 SP02 (/opt/sybase). Youmust manually enter or browse to thecorrect Sybase install location during the resource create prompt.

Extend Issues:

l The Sybase tag prompt on extend is editable but should not be changed. The Sybase tag canbe changed during extend, but this is not recommended. Using different tags on each server canlead to issues with remote administration via the command line.

Properties Page Issues:

l Image appears missing for the Properties pane update user/password. Instead of the properimage, a small square appears on the toolbar. Selecting this square will launch theUser/PasswordUpdate Wizard.

Sybase Monitor server is not supported in 15.7 or later with SIOS Protection Suite. If the SybaseMonitorserver process is configured in Sybase 15.7 or later, youmust use aGeneric Application (gen/app) resourceto protect this server process.

Remove command not recognized by Sybase on SLES 11

If bringing Sybase in service on the backup server, it must first be taken out of service on the primaryserver. In order for Sybase to recognize this command, youmust add a line in "locales/locales.dat"for the SIOS Protection Suite remotely executed Sybase command using "POSIX" as "vendor_locale".

Example:

locale = POSIX, us_english, utf8

Unable to detect that Sybase ARK is running

Symptom: Unable to detect that Sybase ARK is running.Cause: The Sybase ARK uses the default sql interface tool (isql). On some 64 bit systems, the isql tool isinstalled as isql64 and not isql.Solution: The isql64 tool can be copied, in the same path, to isql. Or a link can be created between theisql64 executable and isql.

SIOS Protection Suite for LinuxPage 263

Page 284: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

WebSphereMQRecovery Kit

WebSphere MQ Recovery Kit

DescriptionError when lksupport command is executed:

The following error can be output when lksupport command is executed in the caseMQ queuemanagerprotected by MQRK is set on the disk shared by NFS.

cat: <PATH>/mqm/qmgrs/tkqmgr/qm.ini: Operation not permitted

This happens because the root access to NFS area is prohibited. This error output doesn't cause anyproblem.

Quickcheck fails if queue has long messages if a message is in the test queue of size > 101characters the put/get fails and the queue fills up

Install fails if the only installed MQ is a relocated install (non-standard and not likely)

Package install fails if the MQ package does not have the default name

Compile samples fails if the software is not installed under /opt/mqm

If two listeners are defined for a single instance and one is set to manual and the other isautomatic failures can occur in create and quickCheck

Quick Service Protection (QSP) Recovery Kit

DescriptionUnexpected failover may occur when the operating system shutting down.

Workaround: Stop LifeKeeper by lkstop before stopping the OS. Or perform out-of-service for allresources, then you can confirm there is no ISP resources, then do shutdown theOS.

GUI TroubleshootingIf you are having problems configuring the LifeKeeper GUI from a remote system, see one of the followingtopics:

Java Plug-In Troubleshooting

Applet Troubleshooting

Network-Related Troubleshooting (GUI)

SIOS Protection Suite for LinuxPage 264

Page 285: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Network-Related Troubleshooting (GUI)

Network-Related Troubleshooting (GUI)LifeKeeper uses Java RMI (RemoteMethod Invocation) for communications betweenGUI clients andservers. Some potential problems may be related to RMI, and others are general network configurationproblems.

Long Connection Delays on Windows Platforms

From Sun FAQ:Most likely, your host's networking setup is incorrect. RMI uses the Java API networking classes, inparticular java.net.InetAddress, which will cause TCP/IP host name lookups for both host to address mappingand address to hostname. OnWindows, the lookup functions are performed by the nativeWindows socketlibrary, so the delays are not happening in RMI but in theWindows libraries. If your host is set up to use DNS,then this could be a problem with the DNS server not knowing about the hosts involved in communication, andwhat you are experiencing are DNS lookup timeouts. If this is the case, try specifying all thehostnames/addresses involved in the file \windows\system32\drivers\etc\hosts. The format of a typical hostfile is:

IPAddress Server Name

e.g.:

208.2.84.61 homer.somecompany.com homer

This should reduce the time it takes tomake the first lookup.

In addition, incorrect settings of the Subnet Mask andGateway address may result in connection delays andfailures. Verify with your Network Administrator that these settings are correct.

Running from a Modem:When you connect to a network in which the servers reside via modem (using PPP or SLIP), your computeracquires a temporary IP number for its operation. This temporary numbermay not be the one your hostnamemaps to (if it maps to anything at all), so in this case, youmust tell the servers to communicate with you by IPalone. To do this, obtain your temporary IP number by opening your modem connection window. This numberwill be used to set the hostname property for the GUI client.

To set the hostname for a browser with the Plugin, open the Java Plug-In Control Panel, and set thehostname for the client by adding the following to "Java Run Time Parameters".

-Djava.rmi.server.hostname=<MY_HOST>

To set the hostname for the HotJava browser, append the following to the hotjava command line:

-Djava.rmi.server.hostname=<MY_HOST>

For example:

-Djava.rmi.server.hostname=153.66.140.1

SIOS Protection Suite for LinuxPage 265

Page 286: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Primary Network Interface Down:

Primary Network Interface Down:The LifeKeeper GUI uses RemoteMethod Invocation (RMI) to maintain contact between theGUI client andthe GUI server. In nearly every case, contact is established over the primary network interface to the server.This means that if the server's primary Ethernet interface goes down, contact is lost and the GUI client showsthat server state as Unknown.

The only solution to this problem is to bring the server's primary Ethernet interface up again. Additionally, dueto limitations in RMI, this problem cannot be overcome by using amulti-homed server (server with multiplenetwork interfaces).

No Route To Host Exception:A socket could not be connected to a remote host because the host could not be contacted. Typically, thismeans that some link in the network between the local server and the remote host is down, or that the host isbehind a firewall.

Unknown Host Exception:The LifeKeeper GUI Client and Server use Java RMI (RemoteMethod Invocation) technology tocommunicate. For RMI to work correctly, the client and server must use resolvable hostname or IPaddresses. When unresolvable names, WINS names or unqualified DHCP names are used, this causes Javato throw an UnknownHostException.

This error messagemay also occur under the following conditions:

l Server name does not exist. Check for misspelled server name.

l Misconfigured DHCP servers may set the fully-qualified domain name of RMI servers to be the domainname of the resolver domain instead of the domain in which the RMI server actually resides. In thiscase, RMI clients outside the server's DHCP domain will be unable to contact the server because ofthe its incorrect domain name.

l The server is on a network that is configured to useWindows Internet Naming Service (WINS). Hoststhat are registered underWINS may not be reachable by hosts that rely solely upon DNS.

l The RMI client and server reside on opposite sides of a firewall. If your RMI client lies outside a firewalland the server resides inside of it, the client will not be able tomake any remote calls to the server.

When using the LifeKeeper GUI, the hostname supplied by the client must be resolvable from the server andthe hostname from the server must be resolvable by the client. The LifeKeeper GUI catches this exceptionand alerts the user. If the client cannot resolve the server hostname, this exception is caught andMessage115 is displayed. If the server cannot resolve the Client hostname, this exception is caught andMessage 116is displayed. Both thesemessages include the part of the Java exception which specifies the unqualifiedhostname that was attempted.

Included below are some procedures that may be used to test or verify that hostname resolution is workingcorrectly.

SIOS Protection Suite for LinuxPage 266

Page 287: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

FromWindows:

FromWindows:1. Verify communication with the Linux Server

From aDOS prompt, ping the target using the hostname:

ping <TARGET_NAME>

For Example:

ping homer

A reply listing the target's qualified hostname and IP address should be seen.

2. Verify proper configuration

l Check configuration of DNS or install a DNS server on your network.

l Check the settings forControlPanel->Network->Protocols->TCP/IP. Verify with yourNetwork Administrator that these settings are correct.

Note that the hostname in the DNS tab shouldmatch the name used on the localname server. This should alsomatch the hostname specified in the GUI errormessage.

l Try editing the hosts file to include entries for the local host and the LifeKeeper serversthat it will be connected to.

OnWindows 95/98 systems the hosts file is:

%windir%\HOSTS (for example, C:\WINDOWS\HOSTS).

Note: OnWindows 95/98, if the last entry in the hosts file is not concluded with acarriage-return/line-feed then the hosts file will not be read at all.

OnWindows NT systems the hosts file is:

%windir%\System32\DRIVERS\ETC\HOSTS(for example, C:\WINNT\System32\DRIVERS\ETC\HOSTS).

For example, if my system is calledHOSTCLIENT.MYDOMAIN.COM and usesIP address 153.66.140.1, add the following entry to the hosts file:

153.66.140.1 HOSTCLIENT.MYDOMAIN.COM HOSTCLIENT

3. Try setting the hostname property to be used by the GUI client. To do this from a browser with thePlugin, open the Java Plug-In Control Panel, and set the host name for the client by adding thefollowing to "Java Run Time Parameters":

Djava.rmi.server.hostname=<MY_HOST>

4. Check for Microsoft network-related patches at www.microsoft.com.

SIOS Protection Suite for LinuxPage 267

Page 288: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

From Linux:

From Linux:1. Verify communication with other servers by pinging the target server from Linux using its hostname or

IP address:

ping <TARGET_NAME>

For example:

ping homer

A reply listing the target's qualified hostname should be seen.

2. Verify that localhost is resolvable by each server in the cluster using ping with its hostname or IPaddress. If DNS is not implemented, edit the /etc/hosts file and add an entry for the localhost name.This entry can list either the IP address for the local server, or it can list the default entry (127.0.0.1).

3. Check that DNS is specified before NIS. DNS should be put before NIS in the hosts line of/etc/nsswitch.conf, and /etc/resolv.conf should point to a properly configured DNS server(s).

4. If DNS is not to be implemented or no other method works, edit the /etc/hosts file, and add an entry forthe hostname.

5. Try setting the hostname property to be used by the GUI client. This will need to be changed for eachadministrator.

To do this from a browser with the Plugin, open the Java Plug-In Control Panel and set thehostname for the client by adding the following to "Java Run Time Parameters":

-Djava.rmi.server.hostname=<MY_HOST>

To do this from the HotJava browser, append the following to the hotjava command line:

-Djava.rmi.server.hostname=<MY_HOST>

For Example:

-Djava.rmi.server.hostname=153.66.140.1

-Djava.rmi.server.hostname= homer.somecompany.com

Unable to Connect to X Window Server:When running the LifeKeeper GUI application from a telnet session, you need to ensure that the GUI client isallowed to access the X Window Server on the LifeKeeper server. The LifeKeeper server must also be able toresolve the hostname or network address of the GUI client.

When you telnet into the LifeKeeper server to run the LifeKeeper GUI application, theDISPLAY environmentvariable should contain the client's host name and display number. For example, if you telnet into a servernamedServer1 from a client namedClient1, theDISPLAY environment variable should be set toClient1:0.When you run the LifeKeeper GUI application, it will try to send the output to theDISPLAY name forClient1. If

SIOS Protection Suite for LinuxPage 268

Page 289: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Communication Paths Going Up and Down

Client1 is not allowed access to theXWindow Server, the LifeKeeper GUI application will fail with anexception.

When starting the LifeKeeper GUI as an application, if an error occurs indicating that you cannot connect totheXWindow Server or that you cannot open the client DISPLAY name, try the following:

1. Set the display variable using the host name or IP address. For example:

DISPLAY=Client1.somecompany.com:0

DISPLAY=172.17.5.74:0

2. Use the xhost or xauth command to verify that the client may connect to theXWindow Server onthe LifeKeeper server.

3. Add a DNS entry for the client or add an entry for the client to the local hosts file on the LifeKeeperserver. Verify communication with the client by pinging the client from the LifeKeeper server using itshostname or IP address.

Communication Paths Going Up and DownIf you find the communication paths failing then coming back up repeatedly (the LifeKeeper GUI showing themas Alive, then Dead, then Alive), the heartbeat tunables may not be set to the same values on all servers inthe cluster.

This situation is also possible if the tunable name is misspelled in the LifeKeeper defaults file/etc/default/LifeKeeper on one of the servers.

Suggested Action1. Shut down LifeKeeper on all servers in the cluster.

2. On each server in the cluster, check the values and spelling of the LCMHBEATTIME andLCMNUMHBEATS tunables in /etc/default/LifeKeeper.  Ensure that for each tunable, the valuesare the same on ALL servers in the cluster.

3. Restart LifeKeeper on all servers.

Incomplete Resource CreatedIf the resource setup process is interrupted leaving instances only partially created, youmust perform manualcleanup before attempting to install the hierarchy again. Use the LifeKeeper GUI to delete any partially-created resources. See Deleting a Hierarchy from All Servers for instructions. If the hierarchy list does notcontain these resources, youmay need to use the ins_remove (see LCDI-instances(1M)) and dep_remove(LCDI-relationship(1M)) to clean up the partial hierarchies.

Incomplete Resource Priority ModificationA hierarchy in LifeKeeper is defined as all resources associated by parent/child relationships. For resourcesthat havemultiple parents, it is not always easy to discern from theGUI all of the root resources for a

SIOS Protection Suite for LinuxPage 269

Page 290: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Restoring Your Hierarchy to a Consistent State

hierarchy. In order to maintain consistency in a hierarchy, LifeKeeper requires that priority changes bemadeto all resources in a hierarchy for each server. TheGUI enforces this requirement by displaying all rootresources for the hierarchy selected after the OK or Apply button is pressed. You have the opportunity at thispoint to accept all of these roots or cancel the operation. If you accept the list of roots, the new priority valueswill be applied to all resources in the hierarchy.

You should ensure that no other changes are beingmade to the hierarchy while the Resource Propertiesdialog for that hierarchy is displayed. Before you have edited a priority in the Resource Properties dialog, anychanges beingmade to LifeKeeper are dynamically updated in the dialog. Once you have begunmakingchanges, however, the values seen in the dialog are frozen even if underlying changes are beingmade inLifeKeeper. Only after selecting the Apply or OK button will you be informed that changes weremade that willprevent the priority change operation from succeeding as requested.

In order to minimize the likelihood of unrecoverable errors during a priority change operation involvingmultiplepriority changes, the program will execute amultiple priority change operation as a series of individualchanges on one server at a time. Additionally, it will assign temporary values to priorities if necessary toprevent temporary priority conflicts during the operation. These temporary values are above the allowedmaximum value of 999 andmay be temporarily displayed in the GUI during the priority change. Once theoperation is completed, these temporary priority values will all be replaced with the requested ones. If an erroroccurs and priority values cannot be rolled back, it is possible that some of these temporary priority values willremain. If this happens, follow the suggested procedure outlined below to repair the hierarchy.

Restoring Your Hierarchy to a Consistent StateIf an error occurs during a priority change operation that prevents the operation from completing, the prioritiesmay be left in an inconsistent state. Errors can occur for a variety of reasons, including system andcommunications path failure. If an error occurs after the operation has begun, and before it finishes, and theprogram was not able to roll back to the previous priorities, you will see amessage displayed that tells youthere was an error during the operation and the previous priorities could not be restored. If this should happen,you should take the following actions to attempt to restore your hierarchy to a consistent state:

1. If possible, determine the source of the problem. Check for system or communications path failure.Verify that other simultaneous operations were not occurring during the same time that the priorityadministration program was executing.

2. If possible, correct the source of the problem before proceeding. For example, a failed system orcommunications pathmust be restored before the hierarchy can be repaired.

3. Re-try the operation from the Resource Properties dialog.

4. If making the change is not possible from the Resource Properties dialog, it may be easier to attempt torepair the hierarchy using the command line hry_setpri. This script allows priorities to be changedon one server at a time and does not work through theGUI.

5. After attempting the repair, verify that the LifeKeeper databases are consistent on all servers byexecuting the eqv_list command for all servers where the hierarchy exists and observing thepriority values returned for all resources in the hierarchy.

6. As a last resort, if the hierarchy cannot be repaired, youmay have to delete and re-create the hierarchy.

SIOS Protection Suite for LinuxPage 270

Page 291: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

NoShared Storage FoundWhenConfiguring a Hierarchy

No Shared Storage Found When Configuring a HierarchyWhen you are configuring resource hierarchies there are a number of situations that might cause LifeKeeper toreport a "No shared storage" message:

Possible Cause: Communications paths are not defined between the servers with the shared storage. Whena hierarchy is configured on the shared storage device, LifeKeeper verifies that at least one other server in thecluster can also access the storage.

Suggested Action:Use the LifeKeeper GUI or lcdstatus (1M) to verify that communication paths areconfigured and that they are active.

Possible Cause:Communication paths are not operational between the servers with the shared storage.

Suggested Action:Use the LifeKeeper GUI or lcdstatus (1M) to verify that communication paths areconfigured and that they are active.

Possible Cause: Linux is not able to access the shared storage. This could be due to a driver not beingloaded, the storage not being powered up when the driver was loaded, or the storage device is not configuredproperly. 

Suggested Action:Verify that the device is properly defined in /proc/scsi/scsi.

Possible Cause: The storage was not configured in Linux before LifeKeeper started. During the startup ofLifeKeeper, all SCSI devices are scanned to determine themappings for devices. If a device is configured(powered on, connected or driver loaded) after LifeKeeper is started, then LifeKeeper must be stopped andstarted again to be able to configure and use the device.

Suggested Action:Verify that the device is listed in$LKROOT/subsys/scsi/resources/hostadp/device_info where $LKROOT is by default/opt/LifeKeeper. If the device is not listed in this file, LifeKeeper will not try to use the device.

Possible Cause: The storage is not supported. The Supported Storage List shows specific SCSI devicesthat are supported and have been tested with LifeKeeper. However, note that this list includes known devices;theremay be other devices that SIOS Technology Corp. has not tested whichmeet LifeKeeper requirements.

Suggested Action:Verify that the device is listed in$LKROOT/subsys/scsi/resources/hostadp/device_info where $LKROOT is by default/opt/LifeKeeper. If the device is listed in this file but the ID following the device name begins with "NU-"then LifeKeeper was unable to get a unique ID from the device. Without a unique ID LifeKeeper cannotdetermine if the device is shared.

SIOS Protection Suite for LinuxPage 271

Page 292: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Recovering from a LifeKeeper Server Failure

Possible Cause: The storagemay require a specific LifeKeeper software to be installed before the devicecan be used by LifeKeeper. Examples are the steeleye-lkRAW kit to enable Raw I/O support and thesteeleye-lkDR software to enable data replication.

Suggested Action:Verify that the necessary LifeKeeper packages are installed on each server. See theSPS for Linux Release Notes for software requirements.

Additional Tip:

The test_lk(1M) tool can be used to help debug storage and communication problems.

Recovering from a LifeKeeper Server FailureIf any server in your LifeKeeper cluster experiences a failure that causes re-installation of the operatingsystem (and thus LifeKeeper), you will have to re-extend the resource hierarchies from each server in thecluster. If any server in the cluster has a shared equivalency relationship with the re-installed server, however,LifeKeeper will not allow you to extend the existing resource hierarchy to the re-installed server. LifeKeeperwill also not allow you to unextend the hierarchy from the re-installed server because the hierarchy does notreally exist on the server that was re-installed.

Suggested Action:1. On each server where the resource hierarchies are configured, use the eqv_list command to obtain a

list of all the shared equivalencies (see LCDI-relationship for details).

The example below shows the command and resulting output for the IP resource iptag on server1 andserver2 where server2 is the server that was re-installed and server1 has the hierarchy configured:

eqv_list -f:

server1:iptag:server2:iptag:SHARED:1:10

2. On each server where the resource hierarchies are configured, use eqv_remove tomanually removethe equivalency relationship for each resource in the hierarchy (see LCDI-relationship fordetails).

For example, execute the following command on server1 using the example from step 1 above:

eqv_remove -t iptag -S server2 -e SHARED

3. In clusters with more than two servers, steps 1-2 should be repeated on each server in the clusterwhere equivalency relationships for these resource hierarchies are defined.

4. Finally, extend each resource hierarchy from the server where the resource hierarchy is in-service tothe re-installed server using the GUI.

Recovering from a Non-Killable ProcessIf a process is not killable, LifeKeeper may not be able to unmount a shared disk partition. Therefore, the

SIOS Protection Suite for LinuxPage 272

Page 293: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Recovering From A Panic During A Manual Recovery

resource cannot be brought into service on the other system. The only way to recover from a non-killableprocess is to reboot the system.

Recovering From A Panic During A Manual RecoveryA PANIC duringmanual switchover may cause incomplete recovery. If a PANIC or other major system failureoccurs during amanual switchover, complete automatic recovery to the back-up system cannot be assured.Check the backup system tomake sure all resources required to be in-service are in-service. If they are not in-service, use the LifeKeeper GUI tomanually bring themissing resources into service. See Bringing aResource In-Service for instructions.

Recovering Out-of-Service HierarchiesAs a part of the recovery following the failure of a LifeKeeper server, resource hierarchies that are configuredon the failed server, but are not in-service anywhere at the time of the server failure, are recovered on thehighest priority alive server at the time of the failure. This is the case nomatter where the out-of-servicehierarchy was last in-service, including the failed server, the recovering server, or some other server in thehierarchy.

Resource Tag Name Restrictions

Tag Name LengthAll tags within LifeKeeper may not exceed the 256 character limit.

Valid "Special" Characters- _ . /

However, the first character in a tag should not contain "." or "/".

Invalid Characters+ ; : ! @ # $ * = "space"

Serial (TTY) Console WARNINGIf any part of the serial console data path is unreliable or goes out of service, users who have a serial (RS-232TTY) console can experience severe problems with LifeKeeper service. During operation, LifeKeepergenerates consolemessages. If your configuration has a serial console (instead of the standard VGAconsole), the entire data path from LifeKeeper to the end-user terminal must be operational in order to ensurethe delivery of these consolemessages.

SIOS Protection Suite for LinuxPage 273

Page 294: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Taking the System to init state S WARNING

If there is any break in the data path—such as terminal powered off, modem disconnected, or cable loose—the Linux STREAMS facility queues the consolemessage. If the STREAMS queue becomes full, the Unixkernel suspends LifeKeeper until the STREAMS buffer queue again has room for moremessages. Thisscenario could cause LifeKeeper to HANG.

Note: The use of serial consoles in a LifeKeeper environment is strongly discouraged and the use of the VGAconsole is recommended. If youmust use a serial console, be sure that your serial console is turned on, thecables and optional modems are connected properly, and that messages are being displayed.

Taking the System to init state S WARNINGWhen LifeKeeper is operational, the systemmust not be taken directly to init state S. Due to the operation ofthe Linux init system, such a transition causes all the LifeKeeper processes to be killed immediately andmayprecipitate a fastfail. Instead, you should either stop LifeKeeper manually (using/etc/init.d/lifekeeper stop-nofailover) or take the system first to init state 1 followed by initstate S.

Thread is Hung Messages on Shared StorageIn situations where the device checking threads are not completing fast enough, this can causemessages tobe placed in the LifeKeeper log stating that a thread is hung. This can cause resources to bemoved from oneserver to another and in worse case, cause a server to be killed.

ExplanationThe FAILFASTTIMER (in /etc/default/LifeKeeper) defines the number of seconds that each deviceis checked to assure that it is functioning properly, and that all resources that are owned by a particularsystem are still accessible by that system and owned by it. The FAILFASTTIMER needs to be as small aspossible to guarantee this ownership and to provide the highest data reliability. However if a device is busy, itmay not be able to respond at peak loads in the specified time. When a device takes longer than theFAILFASTTIMER then LifeKeeper considers that device as possibly hung. If a device has not respondedafter 3 loops of the FAILFASTTIMER time period then LifeKeeper attempts to perform recovery as if thedevice has failed. The recovery process is defined by the tunable SCSIERROR. Depending on the setting ofSCSIERROR the action can be a sendevent to perform local recovery and then a switchover if that fails or itcan cause the system to halt.

Suggested Action:In cases where a device infrequently has a hungmessage printed to the error log followed by amessage that itis no longer hung and the number in parenthesis is always 1, there should be no reason for alarm. However, ifthis message is frequently in the log, or the number is 2 or 3, then two actions may be necessary:

l Attempt to decrease the load on the storage. If the storage is taking longer than 3 times theFAILFASTTIMER (3 times 5 or 15 seconds by default) then one should consider the load that is beingplaced on the storage and re-balance the load to avoid these long I/O delays. This will not only allowLifeKeeper to check the devices frequently, but it should also help the performance of the applicationusing that device.

SIOS Protection Suite for LinuxPage 274

Page 295: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Suggested Action:

l If the load can not be reduced, then the FAILFASTTIMER can be increased from the default 5seconds. This value should be as low as possible so slowly increase the value until themessages nolonger occur, or occur infrequently.

Note: When the FAILFASTTIMER value is modified LifeKeeper must be stopped and restarted before thenew value will take affect.

SIOS Protection Suite for LinuxPage 275

Page 296: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

SIOS DataKeeper for Linux

IntroductionSIOS DataKeeper for Linux provides an integrated datamirroring capability for LifeKeeper environments.  Thisfeature enables LifeKeeper resources to operate in shared and non-shared storage environments.

Mirroring with SIOS DataKeeper for Linux

How SIOS DataKeeperWorks

Mirroring with SIOS DataKeeper for LinuxSIOS DataKeeper for Linux offers an alternative for customers who want to build a high availability cluster(using SIOS LifeKeeper) without shared storage or who simply want to replicate business-critical data in real-time between servers. 

SIOS DataKeeper uses either synchronous or asynchronous volume-level mirroring to replicate data from theprimary server (mirror source) to one or more backup servers (mirror targets).  

DataKeeper FeaturesSIOS DataKeeper includes the following features:

l Allows data to be reliably, efficiently and consistently mirrored to remote locations over any TCP/IP-based Local Area Network (LAN) orWide Area Network (WAN).

l Supports synchronous or asynchronous mirroring.

l Transparent to the applications involved because replication is done at the block level below the filesystem.

l Supports multiple simultaneous mirror targets including cascading failover to those targets when usedwith LifeKeeper.

l Built-in network compression allows higher maximum throughput onWide Area Networks.

l Supports all major file systems (see the SPS for Linux Release Notes product description for moreinformation regarding journaling file system support).

l Provides failover protection for mirrored data.

l Integrates into the LifeKeeper Graphical User Interface.

SIOS Protection Suite for LinuxPage 276

Page 297: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Synchronous vs. Asynchronous Mirroring

l Fully supports other LifeKeeper Application Recovery Kits.

l Automatically resynchronizes data between the primary server and backup servers upon systemrecovery.

l Monitors the health of the underlying system components and performs a local recovery in the event offailure.

l Supports STONITH devices for I/O fencing. For details, refer to the STONITH topic. 

Synchronous vs. Asynchronous MirroringUnderstanding the differences between synchronous and asynchronous mirroring will help you choose theappropriate mirroringmethod for your application environment.

Synchronous Mirroring

SIOS DataKeeper provides real-timemirroring by employing a synchronous mirroring technique in which datais written simultaneously on the primary and backup servers. For each write operation, DataKeeper forwardsthe write to the target device(s) and awaits remote confirmation before signaling I/O completion.  Theadvantage of synchronous mirroring is a high level of data protection because it ensures that all copies of thedata are always identical. However, the performancemay suffer due to the wait for remote confirmation,particularly in aWAN environment.

Asynchronous Mirroring

With asynchronous mirroring, each write is made to the source device and then a copy is queued to betransmitted to the target device(s). This means that at any given time, theremay be numerous committedwrite transactions that are waiting to be sent from the source to the target device.  The advantage ofasynchronous mirroring is better performance because writes are acknowledged when they reach the primarydisk, but it can be less reliable because if the primary system fails, any writes that are in the asynchronouswrite queue will not be transmitted to the target. Tomitigate this issue, SIOS DataKeeper makes an entry toan intent log file for every write made to the primary device.If a large amount of data is written, the I/Operformancemay decrease temporarily because that data takes priority in the queue for transmission to theother nodes.

The intent log is a bitmap file indicating which data blocks are out of sync between the primary and targetmirrors. In the event of a server failure, the intent log can be used to avoid a full resynchronization (or resync)of the data.

SIOS Protection Suite for LinuxPage 277

Page 298: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

How SIOS DataKeeperWorks

INFORMATION: When creating async mirrors, DataKeeper uses, by default, write-behind withno value to set the number of outstanding target write operations to the default of 256. If theLKDR_ASYNC_LIMIT value is set in the defaults file to something greater than 2, this value willbe used.

Once the number of outstanding write operations to the target reaches this value, themirror willrevert to synchronous mode until the number drops below the set value.

Assuming a 4K block size, if the value is left at the default (asynchronous limit of 256 writes), youwould have amaximum of 1MB of data in transit.

Note: The intent log can be used in both asynchronous and synchronous mirroringmodes, but the intent logwith asynchronous mirroring is supported only with a 2.6.16 or higher Linux kernel.

How SIOS DataKeeper WorksSIOS DataKeeper creates and protects NetRAID devices.  A NetRAID device is a RAID1 device thatconsists of a local disk or partition and a Network Block Device (NBD) as shown in the diagram below.

A LifeKeeper supported file system can bemounted on a NetRAID device like any other storage device.  Inthis case, the file system is called a replicated file system.  LifeKeeper protects both the NetRAID device andthe replicated file system.

The NetRAID device is created by building the DataKeeper resource hierarchy. Extending the NetRAIDdevice to another server will create the NBD device andmake the network connection between the twoservers.  SIOS DataKeeper starts replicating data as soon as the NBD connection is made.

SIOS Protection Suite for LinuxPage 278

Page 299: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Synchronization (and Resynchronization)

The nbd-client process executes on the primary server and connects to the nbd-server process running on thebackup server.

Synchronization (and Resynchronization)After the DataKeeper resource hierarchy is created and before it is extended, it is in a degradedmode; that is,data will be written to the local disk or partition only. Once the hierarchy is extended to the backup (target)system, SIOS DataKeeper synchronizes the data between the two systems and all subsequent writes arereplicated to the target. If at any time the data gets “out-of-sync” (i.e., a system or network failure occurs)SIOS DataKeeper will automatically resynchronize the data on the source and target systems. If themirrorwas configured to use an intent log (bitmap file), SIOS DataKeeper uses it to determine what data is out-of-sync so that a full resynchronization is not required. If themirror was not configured to use a bitmap file, then afull resync is performed after any interruption of data replication.

Standard Mirror ConfigurationThemost commonmirror configuration involves two servers with amirror established between local disks orpartitions on each server, as shown below. Server1 is considered the primary server containing themirrorsource. Server2 is the backup server containing themirror target.

N+1 ConfigurationA commonly used variation of the standardmirror configuration above is a cluster in which two or moreservers replicate data to a common backup server. In this case, eachmirror sourcemust replicate to aseparate disk or partition on the backup server, as shown below.

SIOS Protection Suite for LinuxPage 279

Page 300: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Multiple Target Configuration

Multiple Target ConfigurationWhen used with an appropriate Linux distribution and kernel version 2.6.7 or higher, SIOS DataKeeper canalso replicate data from a single disk or partition on the primary server to multiple backup systems, as shownbelow.

SIOS Protection Suite for LinuxPage 280

Page 301: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

SIOS DataKeeper Resource Hierarchy

A given source disk or partition can be replicated to amaximum of 7mirror targets, and eachmirror targetmust be on a separate system (i.e. a source disk or partition cannot bemirrored tomore than one disk orpartition on the same target system).

This type of configuration allows the use of LifeKeeper’s cascading failover feature, providingmultiple backupsystems for a protected application and its associated data.

SIOS DataKeeper Resource HierarchyThe following example shows a typical DataKeeper resource hierarchy as it appears in the LifeKeeper GUI:

SIOS Protection Suite for LinuxPage 281

Page 302: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Failover Scenarios

The resource datarep-ext3-sdr is the NetRAID resource, and the parent resource ext3-sdr is the file systemresource. Note that subsequent references to the DataKeeper resource in this documentation refer to bothresources together. Because the file system resource is dependent on the NetRAID resource, performing anaction on the NetRAID resource will also affect the file system resource above it.

Failover ScenariosThe following four examples show what happens during a failover using SIOS DataKeeper. In theseexamples, the LifeKeeper for Linux cluster consists of two servers, Server 1 (primary server) and Server 2(backup server). 

Scenario 1

Server 1 has successfully completed its replication to Server 2 after which Server 1 becomes inoperable.

Result: Failover occurs. Server 2 now takes on the role of primary server and operates in a degradedmode(with no backup) until Server 1 is again operational. SIOS DataKeeper will then initiate a resynchronizationfrom Server 2 to Server 1. This will be a full resynchronization on kernel 2.6.18 and lower. On kernels 2.6.19and later or with Red Hat Enterprise Linux 5.4 kernels 2.6.18-164 or later (or a supported derivative of Red Hat5.4 or later), the resynchronization will be partial, meaning only the changed blocks recorded in the bitmap fileson the source and target will need to be synchronized.

Note: SIOS DataKeeper sets the following flag on the server that is currently acting as themirror source:

$LKROOT/subsys/scsi/resources/netraid/$TAG_last_owner

When Server 1 fails over to Server 2, this flag is set on Server 2.Thus, when Server 1 comes back up; SIOSDataKeeper removes the last owner flag from Server1. It then begins resynchronizing the data from Server 2to Server 1.

SIOS Protection Suite for LinuxPage 282

Page 303: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Scenario 2

Scenario 2

Considering scenario 1, Server 2 (still the primary server) becomes inoperable during the resynchronizationwith Server 1 (now the backup server).

Result: Because the resynchronization process did not complete successfully, there is potential for datacorruption.  As a result, LifeKeeper will not attempt to fail over the DataKeeper resource to Server 1. Onlywhen Server 2 becomes operable will LifeKeeper attempt to bring the DataKeeper resource in-service (ISP) onServer 2.

Scenario 3

Both Server 1 (primary) and Server 2 (target) become inoperable. Server 1 (primary) comes back up first.

Result: Server 1 will not bring the DataKeeper resource in-service. The reason is that if a source server goesdown, and then it cannot communicate with the target after it comes back online, it sets the following flag:

$LKROOT/subsys/scsi/resources/netraid/$TAG_data_corrupt

This is a safeguard to avoid resynchronizing data in the wrong direction. In this case you will need to force themirror online on Server1, which will delete the data_corrupt flag and bring the resource into service on Server1. See ForceMirror Online.

Note: The user must be certain that Server 1 was the last primary before removing the $TAG_data_corruptfile. Otherwise data corruptionmight occur. You can verify this by checking for the presence of the last_ownerflag.

Scenario 4

Both Server 1 (primary) and Server 2 (target) become inoperable. Server 2 (target) comes back up first.

SIOS Protection Suite for LinuxPage 283

Page 304: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Scenario 4

Result: LifeKeeper will not bring the DataKeeper resource ISP on Server 2. When Server 1 comes back up,LifeKeeper will automatically bring the DataKeeper resource ISP on Server 1.

SIOS Protection Suite for LinuxPage 284

Page 305: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Chapter A: Installation and Configuration

Before Configuring Your DataKeeper ResourcesThe following topics contain information for consideration before beginning to create and administer yourDataKeeper resources. They also describe the three types of DataKeeper resources. Please refer to theLifeKeeper Configuration section for instructions on configuring LifeKeeper Core resource hierarchies.

Hardware and Software RequirementsYour LifeKeeper configuration shouldmeet the following requirements prior to the installation of SIOSDataKeeper. 

Hardware Requirementsl Servers - Two ormore LifeKeeper for Linux supported servers.

l IP Network Interface Cards - Each server requires at least one network interface card. Remember,however, that a LifeKeeper cluster requires two communication paths; two separate LAN-basedcommunication paths using dual independent sub-nets are recommended, and at least one of theseshould be configured as a private network. However using a combination of TCP and TTY is alsosupported.

Note: Due to the nature of softwaremirroring, network traffic between servers can be heavy.Therefore, it is recommended that you implement a separate private network for your SIOSDataKeeper devices whichmay require additional network interface cards on each server.

l Disks or Partitions - Disks or partitions on the primary and backup servers that will act as the sourceand target disks or partitions. The target disks or partitions must be at least as large as the source diskor partition.

Note: With the release of SIOS Data Replication 7.1.1, it became possible to replicate an entire disk,one that has not been partitioned (i.e. /dev/sdd). Previous versions of SIOS Data Replication requiredthat a disk be partitioned (even if it was a single large partition; i.e. /dev/sdd1) before it could bereplicated. SIOS Data Replication 7.1.1 removed that restriction.

Software Requirementsl Operating System – SIOS DataKeeper can be used with any major Linux distribution based on the2.6 Linux kernel. See the SPS for Linux Release Notes for a list of supported distributions.Asynchronous mirroring and intent logs are supported only on distributions that use a 2.6.16 or laterLinux kernel. Multiple target support (i.e., support for more than 1mirror target) requires a 2.6.7 or later

SIOS Protection Suite for LinuxPage 285

Page 306: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

General Configuration

Linux kernel.

l LifeKeeper Installation Script - In most cases, you will need to install the following package (see the“Product Requirements” section in the SPS for Linux Release Notes for specific SIOS DataKeeperrequirements):

HADR-generic-2.6

This packagemust be installed on each server in your LifeKeeper cluster prior to theinstallation of SIOS DataKeeper.  The HADR package is located on the SPS InstallationImage File, and the appropriate package is automatically installed by the Installation setupscript.

l LifeKeeper Software -Youmust install the same version of the LifeKeeper Core on each of yourservers. Youmust also install the same version of each recovery kit that you plan to use on eachserver. See the SPS for Linux Release Notes for specific SPS requirements.

l SIOS DataKeeper software - Each server in your SPS cluster requires SIOS DataKeeper software.Please see the SPS for Linux Installation Guide for specific instructions on the installation and removalof SIOS DataKeeper.

General Configurationl The size of the target disks or partitions (on the backup servers) must be equal to or greater than thesize of the source disk or partition (on the primary server).

l Once the DataKeeper resource is created and extended, the synchronization process will deleteexisting data on the target disks or partitions and replace it with data from the source partition.

Network Configurationl The network path that is chosen for data replication between each pair of servers must also already beconfigured as a LifeKeeper communication path between those servers.  To change the network path,see Changing the Data Replication Path.

l Avoid a configuration to add virtual IP address with IP Recovery Kit to the network interface thatDataKeeper users for data replication.Since the communication line is temporarily disconnected while IP Recovery Kit uses the networkinterface, data replicationmay stop at unexpected timing and unnecessary resynchronizationmayoccur.

l This release of SIOS DataKeeper does not support Automatic Switchback for DataKeeperresources. Additionally, the Automatic Switchback restriction is applicable for any other LifeKeeperresource sitting on top of a DataKeeper resource.

l If using Fusion-io, see the Network section of Clustering with Fusion-io for further network con-figuration information.

Changing the Data Replication PathStarting with LK 7.1, IP addresses for mirror endpoints can bemodified using lk_chg_value. For example,

SIOS Protection Suite for LinuxPage 286

Page 307: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Determine Network Bandwidth Requirements

to change amirror endpoint from IP address 192.168.0.1 to 192.168.1.1:

1. /etc/init.d/lifekeeper stop-nofailover (lk_chg_value cannot be run whileLifeKeeper is running)

2. lk_chg_value -o 192.168.0.1 -n 192.168.1.1

3. /etc/init.d/lifekeeper start

Execute these commands on all servers involved in themirror(s) that are using this IP address.

Note: This commandwill alsomodify communication paths that are using the address in question.

Determine Network Bandwidth RequirementsPrior to installing SIOS DataKeeper, you should determine the network bandwidth requirements for replicatingyour current configuration whether you are employing virtual machines or using physical Linux servers. If youare employing virtual machines (VMs), use themethodMeasuring Rate of Change on a Linux System(Physical or Virtual) to measure the rate of change for the virtual machines that you plan to replicate. Thisvalue indicates the amount of network bandwidth that will be required to replicate the virtual machines.

After determining the network bandwidth requirements, ensure that your network is configured to performoptimally. If your network bandwidth requirements are above your current available network capacity, youmay need to consider one or more of the following options:

l Enable compression in SIOS DataKeeper (or in the network hardware, if possible)

l Increase your network capacity

l Reduce the amount of data being replicated

l Create a local, non-replicated storage repository for temporary data and swap files

l Manually schedule replication to take place daily at off-peak hours

Measuring Rate of Change on a Linux System (Physical orVirtual)DataKeeper for Linux can replicate data across any available network. In Multi-Site orWide Area Network(WAN) configurations, special considerationmust be given to the question, "Is there sufficient bandwidth tosuccessfully replicate the partition and keep themirror in themirroring state as the source partition is updatedthroughout the day?"

Keeping themirror in themirroring state is critical because a switchover of the partition is not allowed unlessthemirror is in themirroring state.

SIOS DataKeeper handles short bursts of write activity by adding that data to its async queue. However,make sure that over any extended period of time, the disk write activity for all replicated volumes combinedremains, on average, below the amount of change that DataKeeper and your network can transmit.

SIOS Protection Suite for LinuxPage 287

Page 308: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Measuring Basic Rate of Change

If the network capacity is not sufficient to keep up with the rate of change that occurs on your disks, and theasync queue fills up, themirror will revert to synchronous behavior, which can negatively affect performanceof the source server.

Measuring Basic Rate of ChangeUse the following command to determine file(s) or partition(s) to bemirrored. For example /dev/sda3, andthenmeasure the amount of data written in a day:

MB_START=`awk '/sda3 / { print $10 / 2 / 1024 }' /proc/diskstats`

…wait for a day …

MB_END=`awk '/sda3 / { print $10 / 2 / 1024 }' /proc/diskstats`

The daily rate of change, in MB, is then MB_END – MB_START.

SIOS DataKeeper canmirror daily, approximately:

T1 (1.5Mbps) - 14,000MB/day (14GB)

T3 (45Mbps) - 410,000MB/day (410GB)

Gigabit (1Gbps) - 5,000,000MB/day (5 TB)

Measuring Detailed Rate of ChangeThe best way to collect Rate of Change data is to log disk write activity for some period of time (one day, forinstance) to determine what the peak disk write periods are.

To track disk write activity, create a cron job which will log the timestamp of the system followed by a dump of/proc/diskstats. For example, to collect disk stats every twominutes, add the following link to /etc/crontab:

*/2 * * * * root ( date ; cat /proc/diskstats ) >> /path_to/filename.txt

…wait for a day, week, etc … then disable the cron job and save the resulting data file in a safe location.

Analyze Collected Detailed Rate of Change DataThe roc-calc-diskstats utility analyzes data collected in the previous step. This utility takes a/proc/diskstats output file that contains output, logged over time, and calculates the rate of change of thedisks in the dataset.

roc-calc-diskstats#!/usr/bin/perl

# Copyright (c) 2011, SIOS Technology, Corp.

# Author: Paul Clements

SIOS Protection Suite for LinuxPage 288

Page 309: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Analyze Collected Detailed Rate of Change Data

use strict;

sub msg {

printf STDERR @_;

}

sub dbg {

return if (! $ENV{'ROC_DEBUG'});

msg @_;

}

$0 =~ s@^.*/@@; # basename

sub usage {

msg "Usage: $0 <interval> <start-time> <iostat-data-file> [dev-list]\n";

msg "\n";

msg "This utility takes a /proc/diskstats output file that contains\n";

msg "output, logged over time, and calculates the rate of change of\n";

msg "the disks in the dataset\n";

msg "OUTPUT_CSV=1 set in env. dumps the full stats to a CSV file on STDERR\n";

msg "\n";

msg "Example: $0 1hour \"jun 23 12pm\" steeleye-iostat.txt sdg,sdh\n";

msg "\n";

msg "interval - interval between samples\n";

msg "start time - the time when the sampling starts\n";

msg "iostat-data-file - collect this with a cron job like:\n";

msg "\t0 * * * * (date ; cat /proc/diskstats) >> /root/diskstats.txt\n";

msg "dev-list - list of disks you want ROC for (leave blank for all)\n";

exit 1;

}

usage if (@ARGV < 3);

my $interval = TimeHuman($ARGV[0]);

my $starttime = epoch($ARGV[1]);

my $file = $ARGV[2];

my $blksize = 512; # /proc/diskstats is in sectors

my %devs = map { $_ => 1 } split /,/, $ARGV[3];

my %stat;

my $firsttime;

my $lasttime;

# datestamp divides output

my %days = ( 'Sun' => 1, 'Mon' => 1, 'Tue' => 1, 'Wed' => 1,

'Thu' => 1, 'Fri' => 1, 'Sat' => 1);

SIOS Protection Suite for LinuxPage 289

Page 310: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Analyze Collected Detailed Rate of Change Data

my %fields = ( 'major' => 0,

'minor' => 1,

'dev' => 2,

'reads' => 3,

'reads_merged' => 4,

'sectors_read' => 5,

'ms_time_reading' => 6,

'writes' => 7,

'writes_merged' => 8,

'sectors_written' => 9,

'ms_time_writing' => 10,

'ios_pending' => 11,

'ms_time_total' => 12,

'weighted_ms_time_total' => 13 );

my $devfield = $fields{'dev'};

my $calcfield = $ENV{'ROC_CALC_FIELD'} || $fields{'sectors_written'};

dbg "using field $calcfield\n";

open(FD, "$file") or die "Cannot open $file: $!\n";

foreach (<FD>) {

chomp;

@_ = split;

if (exists($days{$_[0]})) { # skip datestamp divider

if ($firsttime eq '') {

$firsttime = join ' ', @_[0..5];

}

$lasttime = join ' ', @_[0..5];

next;

}

next if ($_[0] !~ /[0-9]/); # ignore

if (!%devs || exists $devs{$_[$devfield]}) {

push @{$stat{$_[$devfield]}}, $_[$calcfield];

}

}

@{$stat{'total'}} = totals(\%stat);

printf "Sample start time: %s\n", scalar(localtime($starttime));

printf "Sample end time: %s\n", scalar(localtime($starttime + ((@{$stat{'total'}} - 1) * $interval)));

SIOS Protection Suite for LinuxPage 290

Page 311: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Analyze Collected Detailed Rate of Change Data

printf "Sample interval: %ss #Samples: %s Sample length: %ss\n", $interval, (@{$stat{'total'}} - 1), (@{$stat{'total'}} - 1) * $interval;

print "(Raw times from file: $firsttime, $lasttime)\n";

print "Rate of change for devices " . (join ', ', sort keys %stat) . "\n";

foreach (sort keys %stat) {

my @vals = @{$stat{$_}};

my ($max, $maxindex, $roc) = roc($_, $blksize, $interval, @vals);

printf "$_ peak:%sB/s (%sb/s) (@ %s) average:%sB/s (%sb/s)\n", HumanSize($max), HumanSize($max * 8), scalar localtime($starttime + ($maxindex *$interval)), HumanSize($roc), HumanSize($roc * 8);

}

# functions

sub roc {

my $dev = shift;

my $blksize = shift;

my $interval = shift;

my ($max, $maxindex, $i, $first, $last, $total);

my $prev = -1;

my $first = $_[0];

if ($ENV{'OUTPUT_CSV'}) { print STDERR "$dev," }

foreach (@_) {

if ($prev != -1) {

if ($_ < $prev) {

dbg "wrap detected at $i ($_ < $prev)\n";

$prev = 0;

}

my $this = ($_ - $prev) * $blksize / $interval;

if ($this > $max) {

$max = $this;

$maxindex = $i;

}

if ($ENV{'OUTPUT_CSV'}) { print STDERR "$this," }

}

$prev = $_; # store current val for next time around

$last = $_;

$i++;

}

if ($ENV{'OUTPUT_CSV'}) { print STDERR "\n" }

SIOS Protection Suite for LinuxPage 291

Page 312: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Analyze Collected Detailed Rate of Change Data

return ($max, $maxindex, ($last - $first) * $blksize / ($interval * ($i -1)));

}

sub totals { # params: stat_hash

my $stat = shift;

my @totalvals;

foreach (keys %$stat) {

next if (!defined($stat{$_}));

my @vals = @{$stat{$_}};

my $i;

foreach (@vals) {

$totalvals[$i++] += $_;

}

}

return @totalvals;

}

# converts to KB, MB, etc. and outputs size in readable form

sub HumanSize { # params: bytes/bits

my $bytes = shift;

my @suffixes = ( '', 'K', 'M', 'G', 'T', 'P' );

my $i = 0;

while ($bytes / 1024.0 >= 1) {

$bytes /= 1024.0;

$i++;

}

return sprintf("%.1f %s", $bytes, $suffixes[$i]);

}

# convert human-readable time interval to number of seconds

sub TimeHuman { # params: human_time

my $time = shift;

my %suffixes = ('s' => 1, 'm' => 60, 'h' => 60 * 60, 'd' => 60 * 60 * 24);

$time =~ /^([0-9]*)(.*?)$/;

$time = $1;

my $suffix = (split //, $2)[0]; # first letter from suffix

if (exists $suffixes{$suffix}) {

$time *= $suffixes{$suffix};

}

return $time;

SIOS Protection Suite for LinuxPage 292

Page 313: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Analyze Collected Detailed Rate of Change Data

}

sub epoch { # params: date

my $date = shift;

my $seconds = `date +'%s' --date "$date" 2>&1`;

if ($? != 0) {

die "Failed to recognize time stamp: $date\n";

}

return $seconds;

}

Usage:

# ./roc-calc-diskstats <interval> <start_time> <diskstats-data-file>[dev-list]

Usage Example (Summary only):

# ./roc-calc-diskstats 2m “Jul 22 16:04:01” /root/diskstats.txtsdb1,sdb2,sdc1 > results.txt

The above example dumps a summary (with per disk peak I/O information) to results.txt

Usage Example (Summary + Graph Data):

# export OUTPUT_CSV=1

# ./roc-calc-diskstats 2m “Jul 22 16:04:01” /root/diskstats.txtsdb1,sdb2,sdc1 2> results.csv > results.txt

The above example dumps graph data to results.csv and the summary (with per disk peak I/Oinformation) to results.txt

Example Results (from results.txt)

Sample start time: Tue Jul 12 23:44:01 2011

Sample end time: Wed Jul 13 23:58:01 2011

Sample interval: 120s #Samples: 727 Sample length: 87240s

(Raw times from file: Tue Jul 12 23:44:01 EST 2011, Wed Jul 13 23:58:01EST 2011)

Rate of change for devices dm-31, dm-32, dm-33, dm-4, dm-5, total

dm-31 peak:0.0 B/s (0.0 b/s) (@ Tue Jul 12 23:44:01 2011) average:0.0 B/s(0.0 b/s)

dm-32 peak:398.7 KB/s (3.1 Mb/s) (@ Wed Jul 13 19:28:01 2011)average:19.5 KB/s (156.2 Kb/s)

dm-33 peak:814.9 KB/s (6.4 Mb/s) (@ Wed Jul 13 23:58:01 2011)average:11.6 KB/s (92.9 Kb/s)

SIOS Protection Suite for LinuxPage 293

Page 314: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Graph Detailed Rate of Change Data

dm-4 peak:185.6 KB/s (1.4 Mb/s) (@ Wed Jul 13 15:18:01 2011) average:25.7KB/s (205.3 Kb/s)

dm-5 peak:2.7 MB/s (21.8 Mb/s) (@ Wed Jul 13 10:18:01 2011) average:293.0KB/s (2.3 Mb/s)

total peak:2.8 MB/s (22.5 Mb/s) (@ Wed Jul 13 10:18:01 2011)average:349.8 KB/s (2.7 Mb/s)

Graph Detailed Rate of Change DataTo help understand your specific bandwidth needs over time, SIOS has created a template spreadsheet calleddiskstats-template.xlsx. This spreadsheet contains sample data which can be overwritten with the datacollected by roc-calc-diskstats.

diskstats-template

1. Open results.csv, and select all rows, including the total column.

2. Open diskstats-template.xlsx, select the diskstats.csv worksheet.

SIOS Protection Suite for LinuxPage 294

Page 315: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Graph Detailed Rate of Change Data

3. In cell 1-A, right-click and select Insert Copied Cells.

4. Adjust the bandwidth value in the cell towards the bottom left of the worksheet to reflect an amount ofbandwidth you have allocated for replication.

Units: Megabits/second (Mb/sec)

Note: The cells to the right will automatically be converted to bytes/sec tomatch the raw datacollected.

5. Make a note of the following row/column numbers:

a. Total (row 6 in screenshot below)

b. Bandwidth (row 9 in screenshot below)

c. Last datapoint (columnR in screenshot below)

6. Select the bandwidth vs ROC worksheet.

7. Right-click on the graph and select Select Data...

a. Adjust Bandwidth Series

i. From theSeries list on the left, select bandwidth

ii. Click Edit

SIOS Protection Suite for LinuxPage 295

Page 316: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Graph Detailed Rate of Change Data

iii. Adjust theSeries Values: field with the following syntax:

“=diskstats.csv!$B$<row>:$<final_column>$<row>"

example: “=diskstats.csv!$B$9:$R:$9"

iv. Click OK

b. Adjust ROC Series

i. From theSeries list on the left, select ROC

ii. Click Edit

SIOS Protection Suite for LinuxPage 296

Page 317: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

SIOS DataKeeper for Linux Resource Types

iii. Adjust theSeries Values: field with the following syntax:

“=diskstats.csv!$B$<row>:$<final_column>$<row>"

example: “=diskstats.csv!$B$6:$R:$6"

iv. Click OK

c. Click OK to exit theWizard

8. The Bandwidth vs ROC graph will update. Please analyze your results to determine if you havesufficient bandwidth to support replication of your data.

SIOS DataKeeper for Linux Resource TypesWhen creating your DataKeeper resource hierarchy, LifeKeeper will prompt you to select a resource type.

SIOS Protection Suite for LinuxPage 297

Page 318: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Replicate New File System

There are several different DataKeeper resource types.  The following information can help you determinewhich type is best for your environment.

Replicate New File SystemChoosing a New Replicated File System will create/extend the NetRAID device, mount the givenmount pointon the NetRAID device and put both the LifeKeeper supported file system and the NetRAID device underLifeKeeper protection.  The local disk or partition will be formatted. CAUTION: All data will be deleted.

Replicate Existing File SystemChoosing Replicate Existing File System will use a currently mounted disk or partition and create a NetRAIDdevice without deleting the data on the disk or partition. SIOS DataKeeper will unmount the local disk orpartition, create the NetRAID device using the local disk or partition andmount themount point on theNetRAID device. It will then put both the NetRAID device and the LifeKeeper supported file system underLifeKeeper protection.

Important: If you are creating SIOS Protection Suite for Linux Multi-Site Cluster hierarchies, your applicationwill be stopped during the create process. You will need to restart your application once you have finishedcreating and extending your hierarchies.

DataKeeper ResourceChoosing a DataKeeper Resource will create/extend the NetRAID device and put it under LifeKeeperprotection without a file system.  Youmight choose this replication type if using a database that can use a rawI/O device. 

In order to allow the user continued data access, SIOS DataKeeper will not attempt to unmount and delete aNetRAID device if it is currently mounted. The user must manually unmount it before attempting amanualswitchover andmount it on the other server after themanual switchover.

Note: After the DataKeeper resource has been created, should you decide to protect amanually mounted filesystem with LifeKeeper, you can do so as follows:

1. Format the NetRAID device with a LifeKeeper supported file system.

2. Mount the NetRAID device.

3. Create and extend a file system hierarchy using the NetRAID device as if it were a shared storage diskor partition. 

LifeKeeper’s file system recovery kit will now be responsible for mounting/unmounting it during failover.

SIOS Protection Suite for LinuxPage 298

Page 319: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resource Configuration Tasks

Resource Configuration TasksYou can perform all SIOS DataKeeper configuration tasks via the LifeKeeper Graphical User Interface (GUI). The LifeKeeper GUI provides a guided interface to configure, administer andmonitor SIOS DataKeeperresources.

OverviewThe following tasks are available for configuring SIOS DataKeeper: 

l Create a Resource Hierarchy - Creates a DataKeeper resource hierarchy.

l Delete a Resource Hierarchy - Deletes a DataKeeper resource hierarchy.

l Extend a Resource Hierarchy - Extends a DataKeeper resource hierarchy from the primary server toa backup server.

l Unextend a Resource Hierarchy - Unextends (removes) a DataKeeper resource hierarchy from asingle server in the LifeKeeper cluster.

l Create Dependency- Creates a child dependency between an existing resource hierarchy and anotherresource instance and propagates the dependency changes to all applicable servers in the cluster.

l Delete Dependency- Deletes a resource dependency and propagates the dependency changes to allapplicable servers in the cluster.

l In Service - Activates a resource hierarchy.

l Out of Service - Deactivatesa resource hierarchy.

l View/Edit Properties - View or edit the properties of a resource hierarchy.

Creating a DataKeeper Resource HierarchyIf you are creating a DataKeeper resource hierarchy in aMulti-Site Cluster environment, refer to theprocedures at the end of this section after you select theHierarchy Type.

Perform the following on your primary server:

1. Select Edit > Server > Create Resource Hierarchy

TheCreate Resource Wizard dialog will appear.

2. Select theData Replication option from the drop down list and click Next to continue.

3. You will be prompted for the following information. When theBack button is active in any of the dialogboxes, you can go back to the previous dialog box. This is helpful should you encounter any errorrequiring you to correct the previously entered information. Youmay click Cancel at any time to cancelthe entire creation process.

SIOS Protection Suite for LinuxPage 299

Page 320: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Creating a DataKeeper Resource Hierarchy

Field Tips

SwitchbackType

Youmust select intelligent switchback. This means that after a failover to thebackup server, an administrator must manually switch the DataKeeper resource backto the primary server.

CAUTION: This release of SIOS DataKeeper does not support AutomaticSwitchback for DataKeeper resources.  Additionally, the Automatic Switchbackrestriction is applicable for any other LifeKeeper resource sitting on top of aDataKeeper resource.

ServerSelect the name of the server where the NetRAID device will be created (typicallythis is your primary server). All servers in your cluster are included in the drop downlist box.

HierarchyType

Choose the data replication type you wish to create by selecting one of the following:

l Replicate New File System

l Replicate Existing File System

l DataKeeper Resource

Bitmap File

Select or edit the name of the bitmap file used for intent logging. If you chooseNone,then an intent log will not be used and every resynchronization will be a full resyncinstead of a partial resync.

Important: The bitmap file should not reside on a btrfs filesystem. Placing datareplication bitmap files on a btrfs filesystem will result in an "invalid argument" errorwhen LifeKeeper tries to configure themirror. The default location for the bitmap file isunder /opt/LifeKeeper. This default location should be changed if /opt/LifeKeeperresides on a btrfs filesystem.

Note: btrfs is currently not supported by SIOS Protection Suite for Linux.

EnableAsynchronousReplication ?

Select Yes to allow this replication resource to support asynchronous replication totarget systems. Select No if you will use synchronous replication to all targets.  Youwill be asked later to choose the actual type of replication, asynchronous orsynchronous, when the replication resource is extended to each target server.  (SeeMirroring with SIOS DataKeeper for a discussion of both replication types.) If youwant the replication to any of these targets to be performed asynchronously, youshould chooseYes here, even if the replication to other targets will be donesynchronously.

The next sequence of dialog boxes depends on whichHierarchy Type you have chosen. While someof the dialog boxes may be the same for each Hierarchy Type, their sequence and the requiredinformationmay be slightly different. The next three topics take you through the remainder of theHierarchy creation process.

l DataKeeper Resource

l Replicate New File System

l Replicate Existing File System

SIOS Protection Suite for LinuxPage 300

Page 321: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Extending Your Hierarchy

Extending Your HierarchyThis operation can be started from theEditmenu or initiated automatically upon completing theCreateResource Hierarchy option, in which case you should refer to Step 2 below.

1. On theEditmenu, select Resource, thenExtend Resource Hierarchy. ThePre-Extend Wizardappears. If you are unfamiliar with the Extend operation, click Next. If you are familiar with theLifeKeeperExtend Resource Hierarchy defaults and want to bypass the prompts forinput/confirmation, click Accept Defaults.

2. ThePre-Extend Wizardwill prompt you to enter the following information.

Note: The first two fields appear only if you initiated the Extend from theEditmenu.

Field Tips

TemplateServer

Select the TemplateServerwhere your DataKeeper resource hierarchy is currently inservice. It is important to remember that the Template Server you select now and theTag to Extend that you select in the next dialog box represent an in-service (activated)resource hierarchy. 

An error message will appear if you select a resource tag that is not in service on thetemplate server you have selected.  The drop down box in this dialog provides thenames of all the servers in your cluster.

Tag toExtend

This is the name of the DataKeeper instance you wish to extend from the templateserver to the target server.  The drop down box will list all the resources that you havecreated on the template server.

TargetServer Enter or select the server you are extending to.

SwitchbackType

Youmust select intelligent switchback. This means that after a failover to the backupserver, an administrator must manually switch the DataKeeper resource back to theprimary server.

CAUTION: This release of SIOS DataKeeper does not support Automatic Switchbackfor DataKeeper resources.  Additionally, the Automatic Switchback restriction isapplicable for any other LifeKeeper resource sitting on top of a SIOS DataKeeperresource.

TemplatePriority

Select or enter a Template Priority. This is the priority for the DataKeeper hierarchy onthe server where it is currently in service. Any unused priority value from 1 to 999 isvalid, where a lower numbermeans a higher priority (1=highest). The extend process willreject any priority for this hierarchy that is already in use by another system. The defaultvalue is recommended.

Note: This selection will appear only for the initial extend of the hierarchy.

SIOS Protection Suite for LinuxPage 301

Page 322: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Extending a DataKeeper Resource

Field Tips

TargetPriority

Select or enter the Target Priority. This is the priority for the new extended DataKeeperhierarchy relative to equivalent hierarchies on other servers. Any unused priority valuefrom 1 to 999 is valid, indicating a server’s priority in the cascading failover sequence forthe resource. A lower numbermeans a higher priority (1=highest). Note that LifeKeeperassigns the number “1” to the server on which the hierarchy is created by default. Thepriorities need not be consecutive, but no two servers can have the same priority for agiven resource.

After receiving themessage that the pre-extend checks were successful, click Next.

Depending upon the hierarchy being extended, LifeKeeper will display a series of information boxesshowing the Resource Tags to be extended, some of which cannot be edited.

3. Click Next to launch the Extend Resource Hierarchy configuration task.

4. The next section lists the steps required to complete the extension of a DataKeeper resource toanother server.

Extending a DataKeeper Resource1. After you have been notified that your pre-extend script has executed successfully, you will be

prompted for the following information:

Field Tips

Mount PointEnter the name of the file systemmount point on the target server. (This dialog will notappear if there is no LifeKeeper-protected filesystem associated with the DataKeeperResource.)

Root TagSelect or enter the Root Tag. This is a unique name for the filesystem resource instanceon the target server. (This dialog will not appear if there is no LifeKeeper-protectedfilesystem associated with the DataKeeper Resource.)

SIOS Protection Suite for LinuxPage 302

Page 323: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Extending a DataKeeper Resource

Field Tips

Target Diskor Partition

Select the disk or partition where the replicated file system will be located on the targetserver.

The list of disks or partitions in the drop down box contains all the available disks orpartitions that are not:

l already mounted

l swap disks or partitions

l LifeKeeper-protected disks or partitions

The drop down list will also filter out special disks or partitions, for example, root (/), boot(/boot), /proc, floppy and cdrom.

Note: The size of the target disk or partitionmust be greater than or equal to that of thesource disk or partition.

DataKeeperResourceTag

Select or enter the DataKeeper Resource Tag name.

Bitmap FileSelect or edit the name of the bitmap file used for intent logging. If you choose none,then an intent log will not be used, and every resynchronization will be a full resyncinstead of a partial resync.

ReplicationPath

Select the pair of local and remote IP addresses to use for replication between the targetserver and the other indicated server in the cluster.  The valid paths and their associatedIP addresses are derived from the set of LifeKeeper communication paths that havebeen defined for this same pair of servers. Due to the nature of DataKeeper, it is stronglyrecommended that you use a private (dedicated) network.

If the DataKeeper Resource has previously been extended to one or more targetservers, the extension to an additional server will loop through each of the pairings of thenew target server with existing servers, prompting for a Replication Path for each pair.

ReplicationType

Choose “synchronous” or “asynchronous” to indicate the type of replication that shouldbe used between the indicated pair of servers.

As for the previous Replication Path field, if the DataKeeper Resource has previouslybeen extended to one or more target servers, the extension to an additional server willloop through each of the pairings of the new target server with existing servers,prompting for a Replication Type for each pair.

2. Click Extend to continue.  An information box will appear verifying that the extension is beingperformed.

3. Click Finish to confirm the successful extension of your DataKeeper resource instance.

4. Click Done to exit theExtend Resources Hierarchymenu selection.

SIOS Protection Suite for LinuxPage 303

Page 324: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Unextending Your Hierarchy

Note: Be sure to test the functionality of the new instance on all servers by performing amanualswitchover. See Testing Your Resource Hierarchy for details. At this point, SIOS DataKeeper hasinitiated the data resynchronization from the source to the target disk or partition.  In the LifeKeeperGUI, the state of the DataKeeper resource on the target server is set to “Resyncing”.  Once theresynchronization is complete, the state will change to “Target” which is the normal Standby condition.

During resynchronization, the DataKeeper resource, and any resource that depends on it, will not beable to failover.  This is to avoid data corruption.

Unextending Your HierarchyTo remove a resource hierarchy from a single server in the LifeKeeper cluster, do the following:

1. On theEdit menu, select Resource thenUnextend Resource Hierarchy.

2. Select the Target Serverwhere you want to unextend the DataKeeper resource. It cannot be theserver where the DataKeeper resource is currently in service (active).

Note: If you selected theUnextend task by right-clicking from the right pane on an individual resourceinstance, this dialog box will not appear.

Click Next.

3. Select theDataKeeper Hierarchy to Unextend and click Next. (This dialog will not appear if youselected theUnextend task by right-clicking on a resource instance in either pane).

4. An information box appears confirming the target server and the DataKeeper resource hierarchy youhave chosen to unextend.  Click Unextend.

5. Another information box appears confirming that the DataKeeper resource was unextendedsuccessfully. Click Done to exit the Unextend Resource Hierarchymenu selection.

Note: At this point, data is not being replicated to the backup server.

Deleting a Resource HierarchyTo delete a DataKeeper resource from all servers in your LifeKeeper configuration, complete the followingsteps.

Note: It is recommended that you take the DataKeeper resource out of service BEFORE deleting it.Otherwise, themd andNetRAID devices will not be removed, and you will have to unmount the file systemmanually. See Taking a DataKeeper Resource Out of Service.

1. On theEdit menu, select Resource, thenDelete Resource Hierarchy.

2. Select the name of the TargetServerwhere you will be deleting your DataKeeper resource hierarchy.

Note: If you selected the Delete Resource task by right-clicking from either the left pane on a globalresource or the right pane on an individual resource instance, this dialog will not appear.

3. Select theHierarchy to Delete. (This dialog will not appear if you selected theDelete Resource taskby right-clicking on a resource instance in the left or right pane.) Click Next.

SIOS Protection Suite for LinuxPage 304

Page 325: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Taking a DataKeeper Resource Out of Service

4. An information box appears confirming your selection of the target server and the hierarchy you haveselected to delete. Click Delete.

5. Another information box appears confirming that the DataKeeper resource was deleted successfully.Click Done to exit.

Note: If the NetRAID device was mounted prior to the resource deletion then it will remainmounted. Otherwise, the NetRAID device will also be deleted.

Taking a DataKeeper Resource Out of ServiceTaking a DataKeeper resource out of service removes LifeKeeper protection for the resource. It breaks themirror, unmounts the file system (if applicable), stops themd device and kills the nbd server and client.

WARNING:  Do not take your DataKeeper resource out of service unless you wish to stopmirroring your dataand remove LifeKeeper protection. Use thePause operation to temporarily stopmirroring.

1. In the right pane of the LifeKeeper GUI, right-click on the DataKeeper resource that is in service.

2. Click Out of Service from the resource popupmenu.

3. A dialog box confirms the selected resource to be taken out of service. Any resource dependenciesassociated with the action are noted in the dialog. Click Next.

4. An information box appears showing the results of the resource being taken out of service. Click Done. 

Bringing a DataKeeper Resource In ServiceBringing a DataKeeper resource in service is similar to creating the resource: LifeKeeper  starts the nbdserver and client, starts themd device which synchronizes the data between the source and target devices,andmounts the file system (if applicable).

1. Right-click on theDataKeeper resource instance from the right pane.

2. Click In Service from the popupmenu. A dialog box appears confirming the server and resource thatyou have selected to bring into service. Click In Service to bring the resource into service.

3. An information box shows the results of the resource being brought into service. Any resourcedependencies associated with the action are noted in the confirmation dialog. Click Done.

Testing Your Resource HierarchyYou can test your DataKeeper resource hierarchy by initiating amanual switchover.  This will simulate afailover of the resource instance from the primary server to the backup server.

Performing a Manual Switchover from the LifeKeeper GUIYou can initiate amanual switchover from the LifeKeeper GUI by selectingEdit, Resource, and InService. For example, an in-service request executed on a backup server causes the DataKeeper resource hierarchy

SIOS Protection Suite for LinuxPage 305

Page 326: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Performing aManual Switchover from the LifeKeeper GUI

to be taken out-of-service on the primary server and placed in-service on the backup server. At this point, theoriginal backup server is now the primary server and original primary server has now become the backupserver.

After the switchover, the state of the DataKeeper resource on the target server is set to “Resyncing” in theLifeKeeper GUI.  Once the resynchronization is complete the state will change to “Target”, which is thenormal Standby condition.

Note: Manual failover is prevented for DataKeeper resources during resynchronization.

If you execute theOut of Service request, the resource hierarchy is taken out of service without bringing it inservice on the other server. The resource can only be brought in service on the same server if it was taken outof service during resynchronization.

SIOS Protection Suite for LinuxPage 306

Page 327: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Administration

Administering SIOS DataKeeper for LinuxThe following topics provide information to help in understanding andmanaging SIOS DataKeeper for Linuxoperations and issues after DataKeeper resources are created.

Viewing Mirror StatusYou can view theReplication Status dialog to see the following information about your mirror:

l Mirror status: Fully Operational, Paused, Resyncing, or Out Of Sync

l Synchronization status: percent complete

l Replication type: synchronous or asynchronous

l Replication direction: from source server to target server

l Bitmap: the state of the bitmap/intent log

l Network Compression Level: the compression level (if enabled)

To view theReplication Status dialog, do the following:

1. Click theViewmenu, and select Properties Panel.

2. Click theDataKeeper resource in the LifeKeeper status display.

or,

1. Right-click theDataKeeper resource in the LifeKeeper status display.

2. From the pop-upmenu, select Properties.

SIOS Protection Suite for LinuxPage 307

Page 328: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

GUI Mirror Administration

GUI Mirror AdministrationA SIOS DataKeeper mirror can be administered through the LifeKeeper GUI in two ways:

1. By enabling the Properties Panel and clicking the toolbar icons (shown in the screenshot).

SIOS Protection Suite for LinuxPage 308

Page 329: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

ForceMirror Online

Click on each icon below for a description

or,

2. By right-clicking the data replication resource and selecting an action from the popupmenu.

Force Mirror Online

Force Mirror Online should be used only in the event that both servers have become inoperable and theprimary server cannot bring the resource in service after rebooting. Selecting Force Mirror Online removesthe data_corrupt flag and brings the DataKeeper resource in service. For more information, see Primary servercannot bring the resource ISP in the Troubleshooting section.

SIOS Protection Suite for LinuxPage 309

Page 330: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Pause and Resume

Note: mirror_settings should be run on the target system(s) (or on all systems, if you want the settingsto take effect regardless of which system becomes themirror source). Themirror must be paused andrestarted before any settings changes will take effect.

Pause and Resume

Pause Mirror

Resume Mirror

Youmay pause amirror to temporarily stop all writes from being replicated to the target disk. For example, youmight pause themirror to take a snapshot of the target disk or to increase I/O performance on the sourcesystem during peak traffic times. 

When themirror is paused, it will bemounted for read (or read/write with kernel 2.6.19 or higher) access at thenormal filesystemmount point on the target system. Any data written to the target while themirror is pausedwill be overwritten when themirror is resumed.

Set Compression Level

The Network Compression Level may be set to a value from 0 to 9. A value of 0 disables compressionentirely. Level 1 is the fastest but least aggressive compression level, while Level 9 is the slowest but best.Network compression is typically effective only onWANs.

Command Line Mirror AdministrationIn addition to performing actions through the LifeKeeper GUI, themirror can also be administered using thecommand line. There are several commands (found in the $LKROOT/bin directory) that can be used toadminister a DataKeeper resource.

SIOS Protection Suite for LinuxPage 310

Page 331: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Mirror Actions

Mirror Actionsmirror_action <tag> <action> [source] [target(s)]

<tag> is the LifeKeeper resource tag of the DataKeeper resource

<action> is one of: pause, resume, force, fullresync

[source] (optional) is the current source system (if source is not specified, it will use thecurrent system the commandwas run from)

[target] (optional) is the target system (or list of systems) that the action should affect (iftarget(s) is not specified, it will use all of the applicable target(s))

Note: When using the force action, source argument is requiredto specify source node and target(s) argument is unnecessary.When using the pause, resume or fullresync action, ifspecifying target(s) argument, source argument is alsorequired.

Examples:

To pause themirror named datarep-ext3:

mirror_action datarep-ext3 pause

To resume replication from adam to both eve and sophocles:

mirror_action datarep-ext3 resume adam eve sophocles

To force themirror online on system eve:

mirror_action datarep-ext3 force eve

To resume replication and force a full resynchronization from adam to sophocles:

mirror_action datarep-ext3 fullresync adam sophocles

Mirror ResizeThe mirror_resize command performs an offlinemirror resize allowing a DataKeeper mirrorto be resized without having to delete and recreate the resource. It should be run on the sourcesystem and themirror must be out of service. The underlying disks should be resized beforeresizing themirror. The size of the underlying disks will be auto-detected and used as the newmirror size.

mirror_resize [-f] -s <size> <tag>

SIOS Protection Suite for LinuxPage 311

Page 332: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Recommended Steps for Mirror Resize:

<tag> is the tag of amirror resource

-f forces the resize without user prompts(not recommended)

-s <size> specifies alternatemirror size (in KB) This parameter is required.

Recommended Steps for Mirror Resize:1. Take themirror and all dependent resources out of service.

2. Perform the disk resize on the underlyingmirror disks (e.g., extend logical volume,expand LUN). Perform this on both source and target. (Remember that the target sizemust be greater than or equal to the source size.)

3. Run mirror_resize on the source system. This will update the internal metadata andbitmap for themirror to reflect the newly expanded disk size.

Example: mirror_resize –s <size in blocks> <tag>

4. Bring only themirror (i.e., datarep) resource in service. A resync of the newly expan-ded disk or partition will occur.

5. Perform the file system resize on themirror device (e.g. resize2fs /dev/mdX where X ismd device number for themirror being resized such as /dev/md0). Note: An fsckmaybe required before being able to resize the file system.

6. Bring the file system and application resources online.

NOTE: mirror_resize is NOT supported in multi-target/multi-site con-figurations.

Bitmap Administrationbitmap -a <num>|-c|-d|-x <size_kb>|-X <bitmap_file>

-a <num> adds the asynchronous write parameter to the bitmap file. It is needed if asynchronous mirror is upgraded to include an asynchronous target. The default value for<num> is 256. To calculate the correct value for this limit, see the Asynchronous MirroringInformation inMirroring with SIOS DataKeeper for Linux.

-c cleans the bitmap file (zeroes all the bits). This can be used to avoid a full resync in case anexact replica of the source disk exists on the target. Use this option with extremecaution.

-d dirties the bitmap file (sets all the bits to ones). This option can be used to force a fullresync, for example after a split-brain situation has occurred.

-m reads the bitmap and produces merge stream.

SIOS Protection Suite for LinuxPage 312

Page 333: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

MonitoringMirror Status via Command Line

-X <bitmap file> examines the bitmap file and displays useful information about thebitmap and themirror.

-x <size_kb> extends bitmap file to be valid with disk of size_kb.(Note: This option is only used internally for mirror resizing.)

In addition, the mdadm commandmay also be used to administer a DataKeeper resource, as the DataKeeperresource is actually anmd device. Refer to the mdadm(8)man page for details. Note: When using mdadm, besure to use the version that is located in $LKROOT/bin, as it is more up-to-date than the version includedwith the operating system.

Monitoring Mirror Status via Command LineNormally, themirror status can be checked using theReplication Status tab in theResource Propertiesdialog of the LifeKeeper GUI. However, youmay alsomonitor the status of your mirror by executing:

$LKROOT/bin/mirror_status <tag>

Example:# mirror_status datarep-ext3-sdr

[-] eve -> adam

 Status: Paused

 Type: Asynchronous

[-] eve -> sophocles

 Status: Resynchronizing

[=> ] 11%

 Resync Speed: 1573K/sec

 Type: Synchronous

Bitmap: 4895 bits (chunks), 4895 dirty (100.0%)

The following command may also be helpful:

cat /proc/mdstat

A sample mdstat file is shown below:

eve:~ # cat /proc/mdstat

Personalities : [raid1]

SIOS Protection Suite for LinuxPage 313

Page 334: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Server Failure

md1 : active raid1 nbd10[1] nbd8[3](F) sdb1[0]

313236 blocks super non-persistent [3/2] [UU_]

bitmap: 3/3 pages [12KB], 64KB chunk, file:/opt/LifeKeeper/bitmap_ext3-sdr

unused devices: <none/></tag>

Server FailureIf both your primary and backup servers become inoperable, your DataKeeper resource will be brought intoservice/activated only when both servers are functional again.  This is to avoid data corruption that couldresult from initiating the resynchronization in the wrong direction.  If you are certain that the only operableserver was the last server on which the resource was “In Service Protected” (ISP), then you can force itonline by right-clicking the DataKeeper resource and then selecting Force Mirror Online. 

ResynchronizationDuring the resynchronization of a DataKeeper resource, the state of this resource instance on the targetserver is “Resyncing”.  However, the resource instance is “Source” (ISP) on the primary server.  TheLifeKeeper GUI reflects this status by representing the DataKeeper resource on the target server with thefollowing icon:

and the DataKeeper resource on the primary server with this icon:

As soon as the resynchronization is complete, the resource state on the target becomes “Target” and the iconchanges to the following:

The following points should be noted about the resynchronization process:

l A SIOS DataKeeper resource and its parent resources cannot fail over to a target that was in thesynchronization process when the primary failed.

l If your DataKeeper resource is taken out of service/deactivated during the synchronization of a target

SIOS Protection Suite for LinuxPage 314

Page 335: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Avoiding Full Resynchronizations

server, that resource can only be brought back into service/activated on the same system or onanother target that is already in sync (if multiple targets exist), and the resynchronization will continue.

l If your primary server becomes inoperable during the synchronization process, any target server that isin the synchronization process will not be able to bring your DataKeeper resource into service.  Onceyour primary server becomes functional again, a resynchronization of themirror will continue.

Avoiding Full ResynchronizationsWhen replicating large amounts of data over aWAN link, it is desirable to avoid full resynchronizations whichcan consume large amounts of network bandwidth and time. With newer kernels, SIOS DataKeeper can avoidalmost all full resyncs by using its bitmap technology. However, the initial full resync, which occurs when themirror is first set up, cannot be avoided when existing data is being replicated. (For brand new data, SIOSDataKeeper does not perform a full resync, so the steps below are not necessary.)

There are a couple of ways to avoid an initial full resync when replicating existing data. Two recommendedmethods are described below.

Method 1The first method consists of taking a raw disk image and shipping it to the target site. This results in minimaldowntime as themirror can be active on the source system while the data is in transit to the target system.

Procedure1. Create themirror (selecting Replicate Existing Filesystem), but do not extend themirror to the target

system.

2. Take themirror out of service.

3. Take an image of the source disk or partition. For this example, the chosen disk or partition is/dev/sda1:

root@source# dd if=/dev/sda1 of=/tmp/sdr_disk.img bs=65536

(The block size argument of 65536 is merely for efficiency).

This will create a file containing the raw disk image of the disk or partition.

Note that instead of a file, a hard drive or other storage device could have been

used.

4. Optional Step – Take a checksum of the source disk or partition:

root@source# md5sum /dev/sda1

5. Optional Step – Compress the disk image file:

root@source# gzip /tmp/sdr_disk.img

SIOS Protection Suite for LinuxPage 315

Page 336: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Method 2

6. Clear the bitmap file (The "_dr" part below depends on tag names):

root@source# /opt/LifeKeeper/bin/bitmap -c /opt/LifeKeeper/bitmap__dr

7. Bring themirror and dependent filesystem and applications (if any), into service. The bitmap file willtrack any changes made while the data is transferred to the target system.

8. Transfer the disk image to the target system using your preferred transfer method.

9. Optional Step – Uncompress the disk image file on the target system:

root@target# gunzip /tmp/sdr_disk.img.gz

10. Optional Step – Verify that the checksum of the image file matches the original checksum taken inStep 4:

root@target# md5sum /tmp/sdr_disk.img

11. Transfer the image to the target disk, for example, /dev/sda2:

root@target# dd if=/tmp/sdr_disk.img of=/dev/sda2 bs=65536

12. Set LKDR_NO_FULL_SYNC=1 in /etc/default/LifeKeeper on both systems:

root@source# echo 'LKDR_NO_FULL_SYNC=1' >/etc/default/LifeKeeper

root@target# echo 'LKDR_NO_FULL_SYNC=1' >/etc/default/LifeKeeper

13. Extend themirror to the target. A partial resync will occur.

Method 2This method can be used if the target system can be easily transported to or will already be at the source sitewhen the systems are configured. This method consists of temporarily modifying network routes tomake theeventual WAN mirror into a LAN mirror so that the initial full resync can be performed over a faster localnetwork. In the following example, assume the source site is on subnet 10.10.10.0/24 and the target site is onsubnet 10.10.20.0/24. By temporarily setting up static routes on the source and target systems, the "WAN"traffic can bemade to go directly from one server to another over a local ethernet connection or loopbackcable.

Procedure1. Install and configure the systems at the source site.

2. Add static routes: 

root@source# route add -net 10.10.20.0/24 dev eth0

root@target# route add -net 10.10.10.0/24 dev eth0

The systems should now be able to talk to each other over the LAN.

SIOS Protection Suite for LinuxPage 316

Page 337: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Procedure

3. Configure the communication paths in LifeKeeper.

4. Create themirror and extend to the target. A full resync will occur.

5. Pause themirror. Changes will be tracked in the bitmap file until themirror is resumed.

6. Delete the static routes:

root@source# route del -net 10.10.20.0/24

root@target# route del -net 10.10.10.0/24

7. Shut down the target system and ship it to its permanent location.

8. Boot the target system and ensure network connectivity with the source.

9. ResumeReplication. A partial resync will occur.

SIOS Protection Suite for LinuxPage 317

Page 338: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Using LVM with DataKeeper

Using LVM with DataKeeperSPS for Linux currently supports both the use of DataKeeper “above” LVM and LVM "above" DataKeeper. In astandard DataKeeper configuration, using DataKeeper above LVM is supported and does not require the SPSLVMRecovery Kit. DataKeeper is the only recovery kit necessary. However, using the LVM aboveDataKeeper configuration, the LVM Recovery Kit is required.

SIOS recommends using DataKeeper above LVM; however, if the LVM above DataKeeper configuration isbeing used, a two-phase hierarchy creation process must be used. The DataKeeper devices (i.e. hierarchies)must be configured using the DataKeeper “Data Replication Resource” option prior to the creation of the LVMvolume groups and logical volumes on the primary server. Once the desired volume groups and logicalvolumes have been created, the remainder of the hierarchy is created according to the configurationinstructions for the recovery kit associated with the application to be protected. The resulting hierarchy willlook something like the one shown in Figure 3 below. Note: For data consistency reasons, in an LVM overDataKeeper configuration, theremust either be only one DataKeeper mirror or multiple synchronousmirrors.

Figure 3: Hierarchy with LVM above DataKeeper

SIOS Protection Suite for LinuxPage 318

Page 339: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Clustering with Fusion-io

Clustering with Fusion-io

Fusion-io Best Practices for Maximizing DataKeeper PerformanceSPS for Linux includes integrated, block level data replication functionality that makes it very easy to set up acluster when there is no shared storage involved. Using Fusion-io, SPS for Linux allows you to form "sharednothing" clusters for failover protection.

When leveraging data replication as part of a cluster configuration, it is critical that you have enoughbandwidth so that data can be replicated across the network just as fast as it is being written to disk. Thefollowing best practices will allow you to get themost out of your “shared nothing” SPS cluster configurationwhen high-speed storage is involved:

Networkl Use a 10 Gbps NIC: Flash-based storage devices from Fusion-io (or other similar products from OCZ,LSI, etc.) are capable of writing data at speeds of HUNDREDS (750+) MB/sec or more. A 1Gbps NICcan only push a theoretical maximum of approximately 125MB/sec, so anyone taking advantage of anioDrive’s potential can easily write datamuch faster than 1Gbps network connection could replicate it.To ensure that you have sufficient bandwidth between servers to facilitate real-time data replication, a10Gbps NIC should always be used to carry replication traffic.

l Enable Jumbo Frames: Assuming that your network cards and switches support it, enabling jumboframes can greatly increase your network’s throughput while at the same time reducing CPU cycles.To enable jumbo frames, perform the following configuration (example on a RedHat/CentOS/OELLinux distribution):

l Run the following command:

ifconfig <interface_name> mtu 9000

SIOS Protection Suite for LinuxPage 319

Page 340: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

TCP/IP Tuning

l To ensure change persists across reboots, add “MTU=9000” to the following file:

/etc/sysconfig/network-scripts/ifcfg-<interface_name>

l To verify end-to-end jumbo frame operation, run the following command:

ping -s 8900 -M do <IP-of-other-server>

l Change the NIC’s transmit queue length:

l Run the following command:

/sbin/ifconfig <interface_name> txqueuelen 10000

l To preserve the setting across reboots, add to /etc/rc.local.

l Change the NIC’s netdev_max_backlog:

l Set the following in /etc/sysctl.conf:

net.core.netdev_max_backlog = 100000

TCP/IP Tuningl TCP/IP tuning that has shown to increase replication performance:

l Edit /etc/sysctl.conf and add the following parameters (Note: These are examples andmay vary according to your environment):

net.core.rmem_default = 16777216

net.core.wmem_default = 16777216

net.core.rmem_max = 16777216

net.core.wmem_max = 16777216

net.ipv4.tcp_rmem = 4096 87380 16777216

net.ipv4.tcp_wmem = 4096 65536 16777216

net.ipv4.tcp_timestamps = 0

net.ipv4.tcp_sack = 0

net.core.optmem_max = 16777216

net.ipv4.tcp_congestion_control=htcp

Configuration Recommendationsl Allocate a small (~100MB) disk partition, located on the Fusion-io drive to place the bitmap file. Createa filesystem on this partition andmount it, for example, at /bitmap:

# mount | grep /bitmap

/dev/fioa1 on /bitmap type ext3 (rw)

SIOS Protection Suite for LinuxPage 320

Page 341: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Using external snapshot functions for disks and devices protected by DataKeeper

l Prior to creating your mirror, adjust the following parameters in /etc/default/LifeKeeper:

l LKDR_CHUNK_SIZE=4096

l Default value is 256

l Create your mirrors and configure the cluster as you normally would.

l The Bitmap file must be set up to be created in the partition, which is created as above.

l Set up for faster resynchronization. Select “Set Resync Speed Limits” from right menu of DataKeeperResource and set up the following figure to the wizard.

l Minimum Resync Speed Limit: 200000

l At the same time, set up Resync speed to be allowed during other I/Os operating. This figuremust beset up under the half of themaximum wright speed throughput of the drive as the empirical rule not todisturb the normal I/O functions during the Resync operation.

l Maximum Resync Speed Limit: 1500000

l Set up themaximum bandwidth to use during Resync. This figuremust be set up with enough high fig-ure to execute Resync with the available maximum speed.

Using external snapshot functions for disks and devicesprotected by DataKeeperFull synchronization is required when using a snapshot process to restore data to a disk or device that isactively protected by DataKeeper.

Snapshot capability referred to in this document incudes:

1. Snapshots provided by cloud environment services such as AWS;

2. Snapshots provided by virtualization software such as vSphere;

3. Snapshots provided by shared storage in physical environment.

The process of restoring snapshots typically takes place without involvement of the operating system. Whenthe snapshot is restored without involving the OS DataKeeper cannot properly synchronize these changes tothe target system. In addition, the changes to the underlying disk or device are not written in the bitmap, soconsistency is not maintained with differential synchronization.

The recommended procedure for using snapshots is as follows:

1. Stop themirror with “pausemirror” command

2. Restore the snapshot

3. Perform a complete re-synchronization with restored data as the source

SIOS Protection Suite for LinuxPage 321

Page 342: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Using external snapshot functions for disks and devices protected by DataKeeper

The source and target data are in an inconsistent state until themirror is fully synchronized. Therefore, it isessential to execute full synchronization after the snapshot has been restored. Please note a full resync cantake a long time depending on themirror size, available bandwidth, and system resources. During the resyncperiod, the DataKeeper resource and any dependent resources cannot be switched over.

Full synchronizationmay not be required when restoring the same snapshot to both the source and the target,but this operation is not supported.

SIOS Protection Suite for LinuxPage 322

Page 343: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Multi-Site Cluster

SIOS Protection Suite for Linux Multi-Site ClusterThe SIOS Protection Suite for Linux Multi-Site Cluster is a separately licensed product that uses a LifeKeepershared storage configuration between two or more servers with the additional ability to replicate the shareddisk(s) to one or more target servers using SIOS DataKeeper for Linux.

SIOS Protection Suite for Linux Multi-Site ClusterThe SIOS Protection Suite for Linux Multi-Site Cluster is a separately licensed product that uses aLifeKeeper shared storage configuration between two or more servers with the additional ability to replicatethe shared disk(s) to one or more target servers using SIOS DataKeeper.

SIOS Protection Suite for Linux Multi-Site Cluster can be built upon aWide Area Network that is configured toprovide failover of IP addresses across multiple network segments that are on different subnets. Thisconfiguration involves either a virtual network (Virtual LAN (VLAN)) or Virtual Private Network (VPN).

Following is an image of the SIOS LifeKeeper GUI after the SIOS Protection Suite for Linux Multi-Site Clusterproduct has been configured. Although the hierarchies appear unbalanced, they are configured properly andwill function correctly. If you are an existing SIOS DataKeeper customer and are familiar with the SIOS

SIOS Protection Suite for LinuxPage 323

Page 344: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Multi-Site Cluster Configuration Considerations

LifeKeeper graphical user interface, the SIOS Protection Suite Multi-Site Cluster resource hierarchy display inthe LifeKeeper GUI will appear differently from previous releases of SIOS DataKeeper.

Multi-Site Cluster Configuration Considerations

Local site storage destinations for bitmapsIn Multi-Site Cluster configurations, a bitmap only file system shared between local nodes must be allocatedfor storing the bitmap files used to tracked changes to the replicated file system being protected. This allowsfor seamless switchovers and failovers between local nodes as all changes to the replicated file system aretracked in the bitmap file on the file system shared between the local nodes. Additional things to considerwhen configuring LUNs:

l The file system should be between only the local nodes and only used for storing bitmaps.

l Add one LUN for bitmap storage for each additional replicated file system in the configuration. Otherdata replication resources and LUN must NOT be shared.

SIOS Protection Suite for LinuxPage 324

Page 345: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Resource Configurations to be Avoided

l The file system size should be largest enough to hold the bitmap. The size required to store the bitmapfile can be calculated based on the size of the replicated file system. For each 64KB of space in thereplicated file system 1 bit is allocated in the bitmap when using default settings. For example, areplicated file system size of 64MB would require a 1000 bit bitmap file.

Resource Configurations to be AvoidedBefore you begin your system configuration, it is important to understand what hierarchy configurations shouldto be avoided in a Linux Multi-Site Cluster environment.

Below are three examples of hierarchy configurations that should be avoided in Linux Multi-Site Clusterenvironments. In all these examples, the Linux Multi-Site Cluster hierarchy shares an underlying device withanother hierarchy. When a failure or a switchover occurs in one of these hierarchies, the related hierarchiesare affected. This could possibly produce unintended consequences such as application failure or mirrorbreakage; which would require a full-resync process. In addition, complications could result when themirrorsource is switched over to the DR site to mirror back to the primary site. Since themirror target system willhave the lower level disk resources in service, LifeKeeper will not prevent any shared resources using thisdisk from also being operational (ISP) on the same node as themirror target.

Example Description

1 Using theMulti-Site Cluster hierarchy’s mirror disk resourcemore than once in the same ordifferent hierarchies.

2Using the sameMulti-Site Cluster file system or disk resource for themirror bitmap inmorethan oneMulti-Site Cluster hierarchy. (Eachmirror’s bitmap file must reside on a unique LUNand can’t be shared.)

3 Using the bitmap file system, device or disk resource with another hierarchy (Multi-Site ornon-Multi-Site).

Multi-Site Cluster RestrictionsNote the following caution when configuring other Multi-Site Clusters.

The SIOS Logical VolumeManager Recovery Kit should not be installed on the Disaster Recovery node whenusing Linux Multi-Site Cluster.

Creating a SIOS Protection Suite for Linux Multi-Site ClusterResource HierarchyPerform the following on your primary server:

1. Select Edit > Server > Create Resource Hierarchy

TheCreate Resource Wizard dialog will appear.

SIOS Protection Suite for LinuxPage 325

Page 346: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Replicate New File System

2. Select the Data Replication option from the drop down list and click Next to continue.

3. You will be prompted for the following information. When theBack button is active in any of the dialogboxes, you can go back to the previous dialog box. This is helpful should you encounter any errorrequiring you to correct the previously entered information. Youmay click Cancel at any time to cancelthe entire creation process.

Field Tips

SwitchbackType

Youmust select intelligent switchback. This means that after a failover tothe backup server, an administrator must manually switch theMulti-SiteCluster resource back to the primary server.

CAUTION: This release of SIOS DataKeeper does not support AutomaticSwitchback for DataKeeper resources.  Additionally, the AutomaticSwitchback restriction is applicable for any other LifeKeeper resource thatbecomes part of theMulti-Site Cluster hierarchy. This includes anythingsitting above the hierarchy or becomes a child within the hierarchy.

ServerSelect the name of the server where the NetRAID device will be created(typically this is your primary server). All servers in your cluster are includedin the drop down list box.

HierarchyType

Choose the data replication type you wish to create by selecting one of thefollowing:

l Replicate New File System

l Replicate Existing File System

l DataKeeper Resource

The next sequence of dialog boxes depends on which Hierarchy Type you have chosen. While some of thedialog boxes may be the same for each Hierarchy Type, their sequence and the required informationmay beslightly different. The following three topics will take you through the remainder of the Hierarchy creationprocess:

l Replicate New File System

l Replicate Existing File System

l DataKeeper Resource

Replicate New File SystemThis option will create a NetRAID device, format it with a LifeKeeper supported file system type, mount thefile system on the NetRAID device and place both themounted file system and the NetRAID device underLifeKeeper protection.  The NetRAID device and the local disk or partition will be formatted causing existingdata to be deleted.  You should select this option if you want to create amirror on a new file system and placeit under LifeKeeper protection.  You will need one free disk or partition for this resource type.

SIOS Protection Suite for LinuxPage 326

Page 347: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Replicate New File System

CAUTION: This option will cause your local disk or partition to be formatted and all existing data will bedeleted.

1. Enter the following information when prompted:

Field Tip

Source Diskor Partition

The list of Source Disks or Partitions in the drop-down list contains all the availabledisks or partitions that are not:

l currently mounted

l swap disks or partitions

l LifeKeeper-protected disks or partitions

The drop-down list will also filter out special disks or partitions, for example,root (/), boot (/boot), /proc, floppy and cdrom.

2. The following screen will display if you select a source disk or partition that is not shared.

3. Select Back to select a different source disk or partition that is shared. Provide the remaininginformation to finish configuring the SIOS Protection Suite for Linux Multi-Site Cluster resource.

SIOS Protection Suite for LinuxPage 327

Page 348: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Replicate Existing File System

Field TipsNew MountPoint

Enter theNew Mount Point of the new file system. This should be themount pointwhere the replicated disk or partition will be located.

New File Sys-tem Type

Select the File System Type. Youmay only choose from the LifeKeeper sup-ported file system types.

DataKeeperResource Tag

Select or enter a unique DataKeeper Resource Tag name for the DataKeeperresource instance.

File SystemResource Tag

Select or enter the File System Resource Tag name for the file system resourceinstance.

Bitmap File

Select the bitmap file entry from the pull down list.

Displayed in the list are all available shared file systems that can be used to holdthe bitmap file. The bitmap file must be placed on a shared device that can switchbetween the local nodes in the cluster.

Important: The bitmap file should not reside on a btrfs filesystem. Placing datareplication bitmap files on a btrfs filesystem will result in an "invalid argument" errorwhen LifeKeeper tries to configure themirror. The default location for the bitmap fileis under /opt/LifeKeeper. This default location should be changed if /opt/LifeKeeperresides on a btrfs filesystem.

4. Click Next to continue to theConfirmation Screen.         

5. A confirmation screen noting the location where the new file system will be created and a warningindicating the pending reformat of the local disk or partition will display. Click Create to beginResource Creation.

6. LifeKeeper will verify that you have provided valid data to create your resource on a new file system.  IfLifeKeeper detects a problem, an ERROR will appear in the information box.  If the validation issuccessful, your resource will be created. Note that the creation of the file systemmay take severalminutes depending upon the disk or partition size.

Click Next to continue.

7. An information box appears announcing the successful creation of your new replicated file systemresource hierarchy.  Youmust Extend the hierarchy to another server in your cluster to begin datareplication and in order to place it under LifeKeeper protection.

Click Next to extend the resource or click Cancel if you wish to extend your resource at another time.

If you click Continue LifeKeeper will launch thePre-extend Wizard. Refer to Step 2 under ExtendingYour Hierarchy for details on how to extend your resource hierarchy to another server.

Replicate Existing File SystemThis option will unmount a currently mounted file system on a local disk or partition, create a NetRAID device,

SIOS Protection Suite for LinuxPage 328

Page 349: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Replicate Existing File System

then re-mount the file system on the NetRAID device.  Both the NetRAID device and themounted file systemare placed under LifeKeeper protection. You should select this option if you want to create amirror on anexisting file system and place it under LifeKeeper protection.

1. Enter the following information when prompted:

Field TipExistingMount Point

This should be themount point for the NetRAID device on the primary server. Thelocal disk or partition should already bemounted at this location.

2. The following screen will display if you select amount point that is not shared.

3. Select Back to select a sharedmount point. A sharedmount point is required in order to configure theSIOS Protection Suite for Linux Multi-Site Cluster resource. Provide the remaining information to finishconfiguring the SIOS Protection Suite for Linux Multi-Site Cluster resource.

Field TipsDataKeeperResource Tag

Select or enter a uniqueDataKeeper Resource Tag name for the DataKeeperresource instance.

File SystemResource Tag Select or enter the File System Resource Tag name.

SIOS Protection Suite for LinuxPage 329

Page 350: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

DataKeeper Resource

Field Tips

Bitmap File

Select the bitmap file entry from the pull down list.

Displayed in the list are all available shared file systems that can be used to holdthe bitmap file. The bitmap file must be placed on a shared device that can switchbetween the local nodes in the cluster.

Important: The bitmap file should not reside on a btrfs filesystem. Placing datareplication bitmap files on a btrfs filesystem will result in an "invalid argument" errorwhen LifeKeeper tries to configure themirror. The default location for the bitmap fileis under /opt/LifeKeeper. This default location should be changed if /opt/LifeKeeperresides on a btrfs file system.

Important:Do not select the shared disk area used for replication if displayed inthe pull-down selection as the storage area for bitmaps. The shared file systemallocated for replication cannot be used as the storage destination of bitmap files.Youmust use a shared file system location that has been allocated for just bitmapfile.

4. Click Next to create your DataKeeper resource on the primary server.

5. LifeKeeper will verify that you have provided valid data to create your DataKeeper resource.  IfLifeKeeper detects a problem, an ERROR will appear in the information box.  If the validation is suc-cessful, your resource will be created.

Click Next.

6. An information box appears announcing that you have successfully created an existing replicated filesystem resource hierarchy.  Youmust Extend the hierarchy to another server in your cluster to beginreplication and to place it under LifeKeeper protection.

Click Next to extend the resource, or click Cancel if you wish to extend your resource at another time.

If you click Continue LifeKeeper will launch thePre-extendWizard. Refer to Step 2 under ExtendingYour Hierarchy for details on how to extend your resource hierarchy to another server.

DataKeeper ResourceThis option will create only the NetRAID device (not a file system) and place the device under LifeKeeperprotection. You should select this option if you only want to create a DataKeeper device on a disk or partitionand place the device under LifeKeeper protection. You will need tomanually make andmount a file system onthis device in order to create a readablemirror. You will need one free disk or partition for this resource type.

SIOS Protection Suite for LinuxPage 330

Page 351: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

DataKeeper Resource

1. Enter the following information when prompted:

Field Tip

Source Diskor Partition

The list of Source Disks or Partitions in the drop-down list contains all the availabledisks or partitions that are not:

l currently mounted

l swap disks or partitions

l LifeKeeper-protected disks or partitions

The drop-down list will also filter out special disks or partitions, for example,root (/), boot (/boot), /proc, floppy and cdrom.

2. The following screen will display if you select a source disk or partition that is not shared.

3. Select Back to select a different source disk or partition that is shared. Provide the remaininginformation to finish configuring the SIOS Protection Suite for Linux Multi-Site Cluster resource.

Field TipsDataKeeperResource Tag

Select or enter a uniqueDataKeeper Resource Tag name for the DataKeeperresource instance.

SIOS Protection Suite for LinuxPage 331

Page 352: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Extending Your Hierarchy

Field Tips

Bitmap File

Select the bitmap file entry from the pull down list.

Displayed in the list are all available shared file systems that can be used to holdthe bitmap file. The bitmap file must be placed on a shared device that can switchbetween the local nodes in the cluster.

Important: The bitmap file should not reside on a btrfs filesystem. Placing datareplication bitmap files on a btrfs filesystem will result in an "invalid argument" errorwhen LifeKeeper tries to configure themirror. The default location for the bitmap fileis under /opt/LifeKeeper. This default location should be changed if /opt/LifeKeeperresides on a btrfs file system.

4. Click Next.

5. An information window appears notifying you that you will have tomanually make the file system andmount the NetRAID device (/dev/mdX) before being able to use it.

Click Create to create your DataKeeper device on the local disk or partition.

6. An information box appears and LifeKeeper will verify that you have provided valid data to create yourDataKeeper resource.  If LifeKeeper detects a problem, an ERROR will appear in the information box. If the validation is successful, your resource will be created.

Click Next to continue.

7. An information box appears announcing the successful creation of your DataKeeper resource device. Youmust Extend the hierarchy to another server in your cluster to begin data replication and in order toplace it on the backup/target server and under LifeKeeper protection.

Click Continue to extend the resource, or click Cancel if you wish to extend your resource at anothertime.

If you click Continue LifeKeeper will launch thePre-extend Wizard. Refer to Step 2 under ExtendingYour Hierarchy for details on how to extend your resource hierarchy to another server.

Extending Your HierarchyThis operation should be started on the Primary Server to the Secondary Server from theEditmenu orinitiated automatically upon completing theCreate Resource Hierarchy option in which case you shouldrefer to Step 2 below. 

1. On theEditmenu, select Resource thenExtend Resource Hierarchy. ThePre-ExtendWizardappears. If you are unfamiliar with theExtend operation, click Next. If you are familiar with theLifeKeeperExtend Resource Hierarchy defaults and want to bypass the prompts forinput/confirmation, click Accept Defaults.

SIOS Protection Suite for LinuxPage 332

Page 353: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Extending Your Hierarchy

2. ThePre-Extend Wizardwill prompt you to enter the following information.

Note: The first two fields appear only if you initiated theExtend from theEditmenu.

Field Tips

TemplateServer

Select the Template Serverwhere your DataKeeper resource hierarchy iscurrently in service. It is important to remember that the Template Server youselect now and the Tag to Extend that you select in the next dialog box representan in-service (activated) resource hierarchy.

An error message will appear if you select a resource tag that is not in service onthe template server you have selected. The drop down box in this dialog providesthe names of all the servers in your cluster.

Tag to ExtendThis is the name of the DataKeeper instance you wish to extend from the templateserver to the target server. The drop down box will list all the resources that youhave created on the template server.

Target Server Enter or select the server you are extending to.

SwitchbackType

Youmust select intelligent switchback. This means that after a failover to thebackup server, an administrator must manually switch theMulti-Site Clusterhierarchy resource back to the primary server.

CAUTION: This release of DataKeeper for Linux does not support AutomaticSwitchback for DataKeeper resources. Additionally, the Automatic Switchbackrestriction is applicable for any other LifeKeeper resource that becomes part of theMulti-Site Cluster hierarchy. This includes anything sitting above the hierarchy orbecomes a child within the hierarchy.

Template Pri-ority

Select or enter a Template Priority. This is the priority for the DataKeeperhierarchy on the server where it is currently in service. Any unused priority valuefrom 1 to 999 is valid, where a lower numbermeans a higher priority (1=highest).The extend process will reject any priority for this hierarchy that is already in use byanother system. The default value is recommended.

Note: This selection will appear only for the initial extend of the hierarchy.

Target Priority

Select or enter the Target Priority. This is the priority for the new extendedDataKeeper hierarchy relative to equivalent hierarchies on other servers. Anyunused priority value from 1 to 999 is valid, indicating a server’s priority in the cas-cading failover sequence for the resource. A lower numbermeans a higher priority(1=highest). Note that LifeKeeper assigns the number “1” to the server on whichthe hierarchy is created by default. The priorities need not be consecutive, but notwo servers can have the same priority for a given resource.

3. After receiving themessage that the pre-extend checks were successful, click Next.

4. Depending upon the hierarchy being extended, LifeKeeper will display a series of information boxesshowing the Resource Tags to be extended, some of which cannot be edited.

SIOS Protection Suite for LinuxPage 333

Page 354: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Extending a DataKeeper Resource

Click Next to launch theExtend Resource Hierarchy configuration task.

The next section lists the steps required to complete the extension of a DataKeeper resource to anotherserver.

Extending a DataKeeper Resource1. After you have been notified that your pre-extend script has executed successfully, you will be

prompted for the following information:

Field Tips

Mount PointEnter the name of the file systemmount point on the target server. (This dialog willnot appear if there is no LifeKeeper-protected filesystem associated with theDataKeeper Resource.)

Root TagSelect or enter theRoot Tag. This is a unique name for the filesystem resourceinstance on the target server. (This dialog will not appear if there is no LifeKeeper-protected filesystem associated with the DataKeeper Resource.)

DataKeeperResource Tag Select or enter theDataKeeper Resource Tag name.

Bitmap File

Select the name of the bitmap file used for intent logging. If you chooseNone, thenan intent log will not be used and every resynchronization will be a full resyncinstead of a partial resync.

Important: The bitmap file should not reside on a btrfs filesystem. Placing datareplication bitmap files on a btrfs filesystem will result in an "invalid argument" errorwhen LifeKeeper tries to configure themirror. The default location for the bitmap fileis under /opt/LifeKeeper. This default location should be changed if /opt/LifeKeeperresides on a btrfs filesystem.

2. Click Next to continue.  An information box will appear verifying that the extension is being performed.

3. Click Finish to confirm the successful extension of your DataKeeper resource instance.

4. Click Done to exit theExtend Resources Hierarchymenu selection.

Note: Be sure to test the functionality of the new instance on all servers by performing amanualswitchover. See Testing Your Resource Hierarchy for details. At this point, DataKeeper has initiatedthe data resynchronization from the source to the target disk or partition.  In the LifeKeeper GUI, thestate of the DataKeeper resource on the target server is set to “Resyncing”.  Once theresynchronization is complete, the state will change to “Target” which is the normal Standbycondition.

During resynchronization, the DataKeeper resource and any resource that depends on it will not beable to fail over. This is to avoid data corruption.

SIOS Protection Suite for LinuxPage 334

Page 355: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Extending a Hierarchy to a Disaster Recovery System

Extending a Hierarchy to a Disaster Recovery SystemThis operation can only occur from an ISP node or as the continuation of the creation process for multiplenodes from theEditmenu or initiated automatically upon completing theCreate Resource Hierarchy option,in which case you should refer to Step 2 below. 

1. On theEditmenu, select Resource thenExtend Resource Hierarchy. ThePre-ExtendWizardappears. If you are unfamiliar with theExtend operation, click Next. If you are familiar with theLifeKeeperExtend Resource Hierarchy defaults and want to bypass the prompts forinput/confirmation, click Accept Defaults.

2. ThePre-Extend Wizardwill prompt you to enter the following information.

Note: The first two fields appear only if you initiated theExtend from theEditmenu.

Field TipsTarget Server Enter or select the server you are extending to.

SwitchbackType

Youmust select intelligent switchback. This means that after a failover to thebackup server, an administrator must manually switch theMulti-Site Clusterhierarchy resource back to the primary server.

CAUTION: This release of DataKeeper for Linux does not support AutomaticSwitchback for DataKeeper resources. Additionally, the Automatic Switchbackrestriction is applicable for any other LifeKeeper resource that becomes part of theMulti-Site Cluster hierarchy. This includes anything sitting above the hierarchy orbecomes a child within the hierarchy.

Target Priority

Select or enter the Target Priority. This is the priority for the new extendedDataKeeper hierarchy relative to equivalent hierarchies on other servers. Anyunused priority value from 1 to 999 is valid, indicating a server’s priority in the cas-cading failover sequence for the resource. A lower numbermeans a higher priority(1=highest). Note that LifeKeeper assigns the number “1” to the server on whichthe hierarchy is created by default. The priorities need not be consecutive, but notwo servers can have the same priority for a given resource.

Template Pri-ority

Select or enter a Template Priority. This is the priority for the DataKeeperhierarchy on the server where it is currently in service. Any unused priority valuefrom 1 to 999 is valid, where a lower numbermeans a higher priority (1=highest).The extend process will reject any priority for this hierarchy that is already in use byanother system. The default value is recommended.

Note: This selection will appear only for the initial extend of the hierarchy.

SIOS Protection Suite for LinuxPage 335

Page 356: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Extending a Hierarchy to a Disaster Recovery System

Field Tips

Target Priority

Select or enter the Target Priority. This is the priority for the new extendedDataKeeper hierarchy relative to equivalent hierarchies on other servers. Anyunused priority value from 1 to 999 is valid, indicating a server’s priority in the cas-cading failover sequence for the resource. A lower numbermeans a higher priority(1=highest). Note that LifeKeeper assigns the number “1” to the server on whichthe hierarchy is created by default. The priorities need not be consecutive, but notwo servers can have the same priority for a given resource.

3. After receiving themessage that the pre-extend checks were successful, click Next.

Note: Depending upon the hierarchy being extended, LifeKeeper will display a series of informationboxes showing the Resource Tags to be extended, some of which cannot be edited.

4. Click Next to launch theExtend Resource Hierarchy configuration task.

The next section lists the steps required to complete the extension of a DataKeeper resource to anotherserver.

1. After you have been notified that your pre-extend script has executed successfully, you will beprompted for the following information:

Field Tips

Mount PointEnter the name of the file systemmount point on the target server. (This dialog willnot appear if there is no LifeKeeper-protected filesystem associated with theDataKeeper Resource.)

Root TagSelect or enter theRoot Tag. This is a unique name for the filesystem resourceinstance on the target server. (This dialog will not appear if there is no LifeKeeper-protected filesystem associated with the DataKeeper Resource.)

Target Disk orPartition

Select the disk or partition where the replicated file system will be located on thetarget server.

The list of disks or partitions in the drop down box contains all the available disks orpartitions that are not:

l already mounted

l swap disks or partitions

l LifeKeeper-protected disks or partitions

The drop down list will also filter out special disks or partitions, for example, root(/), boot (/boot), /proc, floppy and cdrom.

Note: The size of the target disk or partitionmust be greater than or equal to that ofthe source disk or partition.

SIOS Protection Suite for LinuxPage 336

Page 357: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Configuring the Restore and Recovery Setting

Field TipsDataKeeperResource Tag Select or enter theDataKeeper Resource Tag name.

Bitmap File

Select the name of the bitmap file used for intent logging. If you chooseNone, thenan intent log will not be used and every resynchronization will be a full resyncinstead of a partial resync.

Important: The bitmap file should not reside on a btrfs filesystem. Placing datareplication bitmap files on a btrfs filesystem will result in an "invalid argument" errorwhen LifeKeeper tries to configure themirror. The default location for the bitmap fileis under /opt/LifeKeeper. This default location should be changed if /opt/LifeKeeperresides on a btrfs filesystem.

ReplicationPath

Select the pair of local and remote IP addresses to use for replication between thetarget server and the other indicated server in the cluster. The valid paths and theirassociated IP addresses are derived from the set of LifeKeeper communicationpaths that have been defined for this same pair of servers. Due to the nature ofDataKeeper, it is strongly recommended that you use a private (dedicated)network.

If the DataKeeper Resource has previously been extended to one or more targetservers, the extension to an additional server will loop through each of the pairingsof the new target server with existing servers, prompting for a Replication Path foreach pair.

ReplicationType

Choose “synchronous” or “asynchronous” to indicate the type of replication thatshould be used between the indicated pair of servers.

As for the previous Replication Path field, if the DataKeeper Resource haspreviously been extended to one or more target servers, the extension to anadditional server will loop through each of the pairings of the new target server withexisting servers, prompting for aReplication Type for each pair.

2. Click Next to continue.  An information box will appear verifying that the extension is being performed.

3. Click Finish to confirm the successful extension of your DataKeeper resource instance.

4. Click Done to exit the Extend Resources Hierarchy menu selection.

Configuring the Restore and Recovery SettingIn case of use IP resources in multi-site environment for configure different subnets, you will need select the[Disable] of [Modify Restore and Recover] button on the IP resource.

Refer to the [IP Recovery Kit] for more information regarding this option.

Note:Be sure to test the functionality of the new instance on all servers by performing amanual switchover.See Testing Your Resource Hierarchy for details. At this point, SIOS DataKeeper has initiated the data

SIOS Protection Suite for LinuxPage 337

Page 358: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Migrating to aMulti-Site Cluster Environment

resynchronization from the source to the target disk or partition once the extend to the disaster recovery nodeis completed.  In the LifeKeeper GUI, the state of the DataKeeper resource on the target server is set to“Resyncing”.  Once the resynchronization is complete, the state will change to “Target” which is the normalStandby condition.

During resynchronization, the DataKeeper resource and any resource that depends on it will not be able to failover. This is to avoid data corruption.

If you haven’t done so already, make sure you set the confirm failover flags. Refer to the section ConfirmFailover and Block Resource Failover Settings for more information about this procedure.

Migrating to a Multi-Site Cluster EnvironmentThe SIOS Multi-Site Migrate feature is included in the SIOS Protection Suite for Linux Multi-Site Clusterproduct. This additional feature enables an administrator to migrate an existing SIOS Linux LifeKeeperenvironment to aMulti-Site Cluster Environment. Themigration procedure allows selected shared filesystem’s resources to be safely migrated and replicated with minimum hierarchy downtime.

Following are a few important considerations when creating aMulti-Site resource from an existing file system:

l TheMulti-Site migrate procedure will un-mount the file system during the creation process and remountit on a NETRAID device.

l Any applications that depend on this file system will need to be stopped during the create resourceprocedure. This action is handled by theMigrate procedure; no administration action is required.

l Hierarchies containing the following resource types cannot bemigrated using theMulti-Site migrationfeature–NAS (scsi/netstorage), DRBD (scsi/drbd), SDR (scsi/netraid) andMulti-Site Clusterresource (scsi/disrec).

RequirementsPrior to performing amigration, make sure your systems meet the requirements described in the Installationand Configuration section of this document.

This feature is defined for configurations that have two servers that share a storage device. One of the serversis considered the primary node and is located at a primary site. A third server is remote and located at adisaster recovery site.

After you have installed the SIOS Protection Suite for Linux Multi-Site Cluster on the primary node and othershared storage nodes, there is no additional installation or configuration required to take advantage of theMigrate feature.

Before You StartThe following image depicts a file system resource hierarchy prior to performing amigrate.

SIOS Protection Suite for LinuxPage 338

Page 359: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Performing theMigration

Performing the MigrationThere are threemethods for configuring and performing aMulti-Site Migrate. You can:

l Select theMigrate icon from the LifeKeeper GUI toolbar  and then select the resource tomigrate.

l Select the file system resource and right-click themouse to display theMigrate Hierarchy to Multi-Site Cluster menu option.

SIOS Protection Suite for LinuxPage 339

Page 360: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Performing theMigration

l Select the file system resource and select theMigration icon from theProperties Panel toolbar.

SIOS Protection Suite for LinuxPage 340

Page 361: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Performing theMigration

If you initiate theMigrate from the global toolbar icon, the following dialog box will display:

1. Select the server where the hierarchy to migrate exists and is in-service. Click Next.

SIOS Protection Suite for LinuxPage 341

Page 362: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Performing theMigration

2. Select the root hierarchy tag that will bemigrated and click Next. The root tag can be a file system orother application resource. The tag selected (for non-file system resources) must contain a file systemdependent resource.

If you select a File System in the LifeKeeper GUI window and selectMigrate Hierarchy to Multi-SiteCluster from the pop-up window or theMigrate icon in theProperties Panel Migrate icon, thefollowing initialization screen displays.

SIOS Protection Suite for LinuxPage 342

Page 363: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Performing theMigration

3. Press Continuewhen the Continue button is enabled. The following bitmap dialog will display.

SIOS Protection Suite for LinuxPage 343

Page 364: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Performing theMigration

4. Select a bitmap file for the file system you aremigrating. Select Next.

Important: Once you select Next, you will not be able to change theBitmap File Selection for thisfile system resource.

SIOS Protection Suite for LinuxPage 344

Page 365: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Performing theMigration

5. Select the second bitmap file for the second file system beingmigrated within the hierarchy. Afterselecting the first bitmap file in the previous dialog box, any additional file system tags will be displayedso that the user can enter a unique bitmap file for each additional file system tag.

Note: This screen will not appear if there is only one file system beingmigrated. Also, multiple screenssimilar to this will exist if there aremore than two file systems beingmigrated.

6. Select Next, a summary screen similar to the one below will display.

SIOS Protection Suite for LinuxPage 345

Page 366: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Performing theMigration

7. This Summary screen displays all the configuration information you’ve submitted during theMigrateprocedure. Once you selectMigrate, the following screen displays.

SIOS Protection Suite for LinuxPage 346

Page 367: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Performing theMigration

8. TheMigration status will display in this window. Press Finishwhen the Finish button is enabled.

SIOS Protection Suite for LinuxPage 347

Page 368: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Successful Migration

Successful MigrationThe following image is an example of a file system resource hierarchy after theMulti-Site migration iscompleted. At this time, the hierarchy can be extended to the non-shared node (megavolt).

SIOS Protection Suite for LinuxPage 348

Page 369: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Successful Migration

SIOS Protection Suite for LinuxPage 349

Page 370: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

TroubleshootingThis section provides information regarding issues that may be encountered with the use of DataKeeper forLinux. Where appropriate, additional explanation of the cause of an error is provided along with necessaryaction to resolve the error condition.

Messages specific to DataKeeper for Linux can be found in the DataKeeper Message Catalog. Messagesfrom other SPS components are also possible. In these cases, please refer to the CombinedMessageCatalog. Both of these catalogs can be found on our Technical Documentation site under “Search for an ErrorCode” which provides a listing of all error codes, including operational, administrative andGUI, that may beencountered while using SIOS Protection Suite for Linux and, where appropriate, provides additionalexplanation of the cause of the error code and necessary action to resolve the issue. This full listingmay besearched for any error code received, or youmay go directly to one of the individual Message Catalogs for theappropriate SPS component.

The following table lists possible problems and suggestions.

Symptom Suggested Action

NetRAID device notdeleted afterDataKeeper resourcedeletion.

Deleting a DataKeeper resource will not delete the NetRAID device if the NetRAIDdevice is mounted. You canmanually unmount the device and delete  it byexecuting:

 mdadm –S <md_device> (cat /proc/mdstat to determine the <md_device>).

Installation/HADRrpm fails

See the Installation section for complete instructions onmanually installing thesefiles.

Errors during failover Check the status of your device. If resynchronization is in progress you cannotperform a failover.

After primary serverpanics, DataKeeperresource goes ISPon the secondaryserver, but whenprimary serverreboots, theDataKeeper resourcebecomes OSF onboth servers.

Check the “switchback type” selected when creating your DataKeeper resourcehierarchy.  Automatic switchback is not supported for DataKeeper resources in thisrelease.  You can change the Switchback type to “Intelligent” from the resourceproperties window.

SIOS Protection Suite for LinuxPage 350

Page 371: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Troubleshooting

Symptom Suggested ActionPrimary servercannot bring theresource ISP when itreboots after bothservers becameinoperable.

If the primary server becomes operable before the secondary server, you can forcethe DataKeeper resource online by opening the resource properties dialog, clickingtheReplication Status tab, clicking theActions button, and then selecting ForceMirror Online. Click Continue to confirm, then Finish.

Error creating aDataKeeperhierarchy oncurrently mountedNFS file system

You are attempting to create a DataKeeper hierarchy on a file system that iscurrently exported by NFS. You will need to replicate this file system before youexport it. 

DataKeeper GUIwizard does not list anewly createdpartition

The Linux OS may not recognize a newly created partition until the next reboot ofthe system.  View the /proc/partitions file for an entry of your newly createdpartition.  If your new partition does not appear in the file, you will need to rebootyour system.

Resources appeargreen (ISP) on bothprimary and backupservers.

This is a “split-brain” scenario that can be caused by a temporary communicationsfailure. After communications are resumed, both systems assume they are primary.

DataKeeper will not resync the data because it does not know which system wasthe last primary system. Manual intervention is required.

If not using a bitmap:

Youmust determine which server was the last backup, then take the resource out ofservice on that server. DataKeeper will then perform a FULL resync.

If using a bitmap (2.6.18 and earlier kernel):

You should take both resources out of service, starting with the original backupnode first. You should then dirty the bitmap on the primary node by executing:$LKROOT/lkadm/subsys/scsi/netraid/bin/bitmap –d /opt/LifeKeeper/bitmap_filesys

(where /opt/LifeKeeper/bitmap_filesys is the bitmap filename). This will force a fullresync when the resource is brought into service. Next, bring the resource intoservice on the primary node and a full resync will begin.

If using a bitmap (2.6.19 and later kernel or with Red Hat Enterprise Linux 5.4kernels 2.6.18-164 or later or a supported derivative of Red Hat 5.4 or later):

Youmust determine which server was the last backup, then take the resource out ofservice on that server. DataKeeper will then perform a partial resync.

SIOS Protection Suite for LinuxPage 351

Page 372: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Troubleshooting

Symptom Suggested Action

Core - LanguageEnvironment Effects 

Some LifeKeeper scripts parse the output of Linux system utilities and rely oncertain patterns in order to extract information. When some of these commands rununder non-English locales, the expected patterns are altered and LifeKeeper scriptsfail to retrieve the needed information.  For this reason, the language environmentvariable LC_MESSAGES has been set to the POSIX “C” locale (LC_MESSAGES=C) in/etc/default/LifeKeeper.  It is not necessary to install Linux withthe language set to English (any language variant available with your installationmediamay be chosen); the setting of LC_MESSAGESin /etc/default/LifeKeeper will only influence LifeKeeper.  If you change the value ofLC_MESSAGES in /etc/default/LifeKeeper, be aware that it may adversely affectthe way LifeKeeper operates.  The side effects depend on whether or not messagecatalogs are installed for various languages and utilities and if they produce textoutput that LifeKeeper does not expect. 

GUI - GUI loginprompt may not re-appear whenreconnecting via aweb browser afterexiting the GUI

When you exit or disconnect from theGUI applet and then try to reconnect from thesameweb browser session, the login prompt may not appear.

Workaround: Close the web browser, re-open the browser and then connect to theserver. When using the Firefox browser, close all Firefox windows and re-open.

GUI - lkGUIapp onRHEL5 reportsunsupported themeerrors

When you start the GUI application client, youmay see the following consolemessage:

/usr/share/themes/Clearlooks/gtk-2.0/gtkrc:60: Engine "clearlooks" is unsupported,ignoring

This message comes from the RHEL 5 and FC6 Java platform look and feel and willnot adversely affect the behavior of the GUI client.

SIOS Protection Suite for LinuxPage 352

Page 373: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Troubleshooting

Symptom Suggested Action

DataKeeper CreateResource fails

When using DataKeeper in certain environments (e.g., virtualized environmentswith IDE disk emulation, servers with HP CCISS storage, or solid state devices(SSD), an error may occur when amirror is created:

ERROR 104052: Cannot get the hardware ID of the device"dev/hda3"

This is because LifeKeeper does not recognize the disk in question and cannot get aunique ID to associate with the device.

Workaround: Add a pattern for our disk(s) to the DEVNAME device_pattern file,e.g.:

# cat/opt/LifeKeeper/subsys/scsi/resources/DEVNAME/device_pattern

/dev/hda*

/dev/fio*(Fusion IO SDD)

/dev/hio*(PCI SSD)

In Multi-Site Clusterconfigurations, fullsynchronizationsometimes occurswhen stopping andresumingDataKeeper

This could be caused by the incorrect selection for the bitmap file location. Bitmapfiles must be located on a file system shared between the local nodes in theMulti-site Cluster. This shared file systemmust be different than the shared file systemused for replication. Once this has been corrected the hierarchy will need to beremoved and recreated using the correct shared file system for replication and thecorrect shared file system for the bitmap.

SIOS Protection Suite for LinuxPage 353

Page 374: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Recovery Kits

Recovery KitsFor a list of all SIOS Protection Suite Recovery Kits and their Administration Guides, see Documentation forLinux Recovery Kits from the SIOS Technical Documentation (http://docs.us.sios.com).

SIOS Protection Suite for LinuxPage 354

Page 375: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Combined Message CatalogUseControl F to search for a specific error code in each catalog.

Cod-e

Sever-ity Message Cause/Action

00020-0

ERROR pam_start() failed

00020-1

ERROR pam_authenticate failed (user%s, retval %d

00020-2

ERROR pam_end() failed?!?!

00020-3

ERROR Did not find expected group'lkguest'

00020-4

ERROR Did not find expected group 'lko-per'

00020-5

ERROR Did not find expected group 'lkad-min'

00090-2

ERROR Error removing system namefrom loopback address line in/etc/hosts file. Youmust do thismanually before starting the GUIserver.

Cause: System name did not get removed from/etc/hosts file.

Action: Remove system namemanually then restartthe GUI server, then enter the following: run <actionname>

00091-8

ERROR LifeKeeper GUI Server error dur-ing Startup

Cause: TheGUI server terminated due to anabnormal condition.

Action: Check the logs for related errors and try toresolve the reported problem.

00105-2

FATAL Template resource "%s" onserver "%s" does not exist

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

00105-3

ERROR Cannot access canextend script"%s" on server "%s"

Cause: LifeKeeper was unable to run pre-extendchecks because it was unable to find the scriptCANEXTEND on {server}.

Action: Check your LifeKeeper configuration.

00105-4

ERROR Cannot extend resource "%s" toserver "%s"

Cause: LifeKeeper was unable to extend theresource {resource} on {server}.

SIOS Protection Suite for LinuxPage 355

Page 376: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00105-5

ERROR Cannot access extend script"%s" on server "%s"

Cause: LifeKeeper was unable to extend theresource hierarchy because it was unable to the findthe script EXTEND on {server}.

Action: Check your LifeKeeper configuration.

00105-7

ERROR Cannot extend resource "%s" toserver "%s"

Cause: LifeKeeper was unable to extend theresource {resource} on {server}.

00105-9

ERROR Resource with tag "%s" alreadyexists

Cause: The name provided for a resource is alreadyin use.

Action: Either choose a different name for theresource, or use the existing resource.

00106-0

ERROR Resource with either matchingtag "%s" or id "%s" alreadyexists on server "%s" for App"%s" and Type "%s"

Cause: The name or id provided for a resource isalready in use.

Action: Either choose a different name or id for theresource or use the existing resource.

00106-1

ERROR Error creating resource "%s" onserver "%s"

Cause: An unexpected failure occurred while creatinga resource.

Action: Check the logs for related errors and try toresolve the reported problem.

00108-1

WARN IP address \"$ip\" is neither v4nor v6

Cause: The IP address provided is neither an IPv4nor an IPv6 address.

Action: Please check the name or address providedand try again. If a namewas provided, verify thatname resolution returns a valid IP address.

00402-4

ERROR Cause: LCD failed to fetch resource information forresource id {id} during resource recovery.

Action: Verify the input resource id and retry therecovery operation.

00402-8

ERROR %s occurred to resource \"%s\" Cause: Local recovery failed for resource {resource}.

Action: Check the logs for related errors and try toresolve the reported problem.

00405-5

ERROR attempt to remote-removeresource \"%s\" that can't befound

Cause: Remotely removing a resource from servicefailed while attempting to find the resource by tagname {tag}.

Action: Check input tag name and retry the recoveryoperation.

SIOS Protection Suite for LinuxPage 356

Page 377: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00405-6

ERROR attempt to remote-removeresource \"%s\" that is not ashared resource

Cause: Remotely removing a resource from servicefailed given that the tag name {tag} is not a sharedresource.

Action: Check input tag name and retry the recoveryoperation.

00406-0

ERROR attempt to transfer-restoreresource \"%s\" that can't befound

Cause: Remote transfer of an in service resourcefailed given tag name {tag}.

Action: Check input tag name and retry the recoveryoperation.

00406-1

ERROR attempt to transfer-restoreresource \"%s\" that is not ashared resource with machine\"%s\"

Cause: LifeKeeper failed to find a shared resource by{tag} name during remote transfer of a resource inservice from a remote {machine}.

Action: Check input tag name and retry the recoveryoperation.

00408-9

ERROR ERROR: Parallel recovery ini-tialization failed.\n

Cause: Parallel recovery failed to initialize the list ofresources in the hierarchy.

Action: Check the logs for related errors and try toresolve the reported problem.

00409-3

ERROR ERROR: reserve failed. con-tinuing to next resource\n

Cause: Parallel recovery failed to reserve a singleresource from the collective hierarchy.

Action: Check the logs for related errors and try toresolve the reported problem.

00409-6

ERROR ERROR: clone%d is hung,attempting to kill it\n

Cause: A single sub process of a resource recoveryis hung during parallel recovery of a whole resourcehierarchy.

Action: A kill of the hanging sub process will beexecuted automatically.

00409-7

ERROR ERROR: Could not kill clone%d\n

Cause: Failed to kill the hung sub process.

00411-6

ERROR %s Cause: Writing an on-disk version of an in-memorydata object failed attempting to create theintermediate folder. This is a system error.

Action: Check log for the error information in detailand determine why the intermediate folder is notcreated.

SIOS Protection Suite for LinuxPage 357

Page 378: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00411-7

ERROR open(%s Cause: Writing an on-disk version of an in-memorydata object failed while attempting to open atemporary file. This is a system error.

Action: Check log for the error information in detailand determine why the file is not successfullyopened.

00411-8

ERROR write(%s Cause: Writing an on-disk version of an in-memorydata object failed while attempting to write to atemporary file. This is a system error.

Action: Check log for the error information in detailand determine why the file writing is failed.

00411-9

ERROR fsync(%s Cause: Writing an on-disk version of an in-memorydata object failed while attempting to fsync atemporary file. This is a system error.

Action: Check log for the error information in detailand determine why the "fsync" is failed.

00412-0

ERROR close(%s Cause: Writing an on-disk version of an in-memorydata object failed while attempting to close atemporary file. This is a system error.

Action: Check log for the error information in detailand determine why the file closing is failed.

00412-1

ERROR rename(%s, %s Cause: Writing an on-disk version of an in-memorydata object failed while attempting to rename atemporary file {file} to original file {file}. This is asystem error.

Action: Check log for the error information in detailand determine why the file renaming is failed.

00412-2

ERROR open(%s Cause: Writing an on-disk version of an in-memorydata object failed while attempting to open anintermediate directory {directory}. This is a systemerror."

Action: Check log for the error information in detailand determine why the directory open is failed.

SIOS Protection Suite for LinuxPage 358

Page 379: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00412-3

ERROR fsync(%s Cause: Writing an on-disk version of an in-memorydata object failed while attempting to fsync anintermediate directory {directory}. This is a systemerror.

Action: Check log for the error information in detailand determine why the directory "fsync" is failed.

00412-4

ERROR close(%s Cause: Writing an on-disk version of an in-memorydata object failed while attempting to close anintermediate directory {directory}. This is a systemerror.

Action: Check log for the error information in detailand determine why the directory close is failed.

00412-5

ERROR wrote only %d bytes of reques-ted%d\n

Cause: Writing an on-disk version of an in-memorydata object failed as the final size {size} bytes ofwritten data is less than the requested number{number} of bytes.

Action: Check log for the related error information indetail and determine why the data writing is failed.

00412-6

ERROR open(%s Cause: Attempting to open a data file failed during thereading of an on-disk version of the data object intothe buffer. This is a system error.

Action: Check log for the error information in detailand determine why the file open is failed.

00412-7

ERROR open(%s Cause: Attempting to open a temporary data filefailed during the reading of an on-disk version of thedata object into the buffer. This is a system error.

Action: Check log for the error information in detailand determine why the file open is failed.

00412-8

ERROR read(%s Cause: Reading a data file failed during the loading ofan on-disk version of the data object into the buffer.This is a system error.

Action: Check log for the error information in detailand determine why the file reading is failed.

SIOS Protection Suite for LinuxPage 359

Page 380: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00412-9

ERROR read buffer overflow (MAX=%d)\n

Cause: The read buffer limit {max} was reached whileattempting to read an on-disk version of the dataobject into the buffer.

Action: Check the LifeKeeper configuration andrestart the LifeKeeper.

00413-0

ERROR close(%s Cause: Failed to close a data file during reading anon-disk version of the data object into the buffer. Thisis a system error.

Action: Check log for the error information in detailand determine why the file close is failed.

00413-1

ERROR rename(%s, %s Cause: Failed to rename a temporary data file duringreading an on-disk version of the data object into thebuffer. This is a system error.

Action: Check log for the error information in detailand determine why the file renaming is failed.

00413-2

ERROR Can't open%s : %s Cause: Failed to open a directory {directory} witherror {error} during reading an on-disk version of theapplication and resource type information into thebuffer. This is a system error.

Action: Check log for the error information in detailand determine why the open directory is failed.

00413-3

ERROR path argument may not be NULL Cause: The command "lcdrcp" failed during a filecopy because the input source path is missing.

Action: Check the input source path and retry"lcdrcp".

00413-4

ERROR destination path argument maynot be NULL

Cause: The "lcdrcp" command failed during a filecopy because the input destination path is missing.

Action: Check the input destination path and retry"lcdrcp".

00413-5

ERROR destination path can't be zerolength string

Cause: Input destination path was empty during filecopy when using "lcdrcp".

Action: Check the input destination path and retry"lcdrcp".

SIOS Protection Suite for LinuxPage 360

Page 381: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00413-6

ERROR open(%s Cause: Failed to open source file path during file copyusing "lcdrcp". This is a system error.

Action: Check the existence/availability of the inputsource path and retry "lcdrcp". Also check the relatedlog for error information in detail.

00413-7

ERROR fstat(%s Cause: Failed to fetch file attributes using "fstat"during file copy by "lcdrcp". This is a system error.

Action: Check the log for error information in detail.

00413-8

ERROR file \"%s\" is not an ordinary file(mode=0%o

Cause: Detected source file as a none ordinary fileduring file copy using "lcdrcp".

Action: Check the input source file path and retry"lcdrcp".

00415-2

ERROR having \"%s\" depend on \"%s\"would produce a loop

Cause: Adding the requested dependency wouldproduce a loop of dependent relationship.

Action: Correct the requested dependency and retrythe dependency creation.

00416-4

ERROR Priority mismatch betweenresources %s and%s. Dependency creation failed.

Cause: The priorities for {resource1} and {resource2}do not match.

Action: Resource priorities must match. Change oneor both priorities to the same value and retry creatingthe dependency.

00417-6

ERROR %s Cause: The command "doabort" failed to create the{directory} for writing the core file. This is a systemerror.

Action: Check the logs for related errors and try toresolve the reported problem.

00419-0

ERROR %s: ::receive(%d) did notreceivemessage within%dseconds on incoming_mailbox%s

Cause: In function {function}, attempting to receivemessage within timeout {timeout} seconds failed withincomingmailbox {mailbox}.

Action: Check the status of the connections insidethe cluster and retry the process.

00420-5

ERROR destination system \"%s\" isunknown

Cause: Sendingmessage failed due to an unknowndestination system name {system}.

Action: Check the configuration and status of thesystem and check the logs for related errors. Retrythe same process after the system is full initialized.

SIOS Protection Suite for LinuxPage 361

Page 382: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00420-6

ERROR destinationmailbox \"%s\" atsystem \"%s\" is unknown

Cause: Sendingmessage failed due to an unknownmailbox {mailbox} on destination system name{system}. This error may be caused by sending amessage before the LCD is fully initialized.

Action: Check the configuration and status of thesystem and check the logs for related errors. Retrythe same process after the system is full initialized.

00420-8

ERROR destination system \"%s\" isalive but themailbox process isnot listening.

Cause: Sendingmessage failed. The networkconnection to destination system {system} is alivebut the contact to the destinationmailbox is lost.

Action: Check the configuration and status of thesystem and check the logs for related errors. Retrythe same process after the system is full initialized.

00420-9

ERROR destination system \"%s\" isdead.

Cause: Sendingmessage failed due to losingconnection with the destination system {system}.

Action: Check the configuration and status of thesystem and check the logs for related errors. Retrythe same process after the system is full initialized.

00421-1

ERROR can't send to destination \"%s\"error=%d

Cause: Sendingmessage to destination system{system} failed due to internal error {error}.

Action: Check the logs for related errors and try toresolve the reported problem.

00421-7

ERROR destination system \"%s\" is outof service.

Cause: Sendingmessage failed due to losingconnection with the destination system {system}.

Action: Check the configuration and status of thesystem and check the logs for related errors. Retrythe same process after the system is full initialized.

00422-1

ERROR destination system \"%s\" wentout of service.

Cause: Sendingmessage failed due to losingconnection with the destination system {system}.

Action: Check the configuration and status of thesystem and check the logs for related errors. Retrythe same process after the system is full initialized.

00422-8

ERROR Can't get host name from get-addrinfo(

Cause: Creating network object failed due to a failurewhen getting host name using "getaddrinfo()".

Action: Check the configuration and status of thesystem. Do the same process again.

SIOS Protection Suite for LinuxPage 362

Page 383: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00423-4

ERROR IP address pair %s already inuse

Cause: Creating network object failed due to the IPaddress pair {pair} is already used for a TCPcommunication path.

Action: Check the input IP address pair and do thenetwork creation again.

00425-8

WARN Communication to%s by %sFAILED

Cause: Communication to system {system} bycommunication path {path} failed.

Action: Check system configuration and networkconnection.

00426-1

WARN COMMUNICATIONS failoverfrom system \"%s\" will be star-ted.

Cause: A failover from system {system} will bestarted due to all the communications path are down.

Action: Check system configuration and networkconnection status. Confirm the system status whenfailover is done.

00429-2

ERROR resource \"%s\" %s Cause: A resource could not be brought in servicebecause its current state is unknown.

Action: Check the logs for related errors and try toresolve the reported problem.

00429-3

ERROR resource \"%s\" %s Cause: A resource could not be brought into servicebecause its current state disallows it.

Action: Check the logs for related errors and try toresolve the reported problem.

00429-4

ERROR resource \"%s\" requires alicense (for Kit %s/%s) but noneis installed

Cause: The resource's related recovery kit requires alicense.

Action: Install a license for the recovery kit on theserver where the resource was to be brought intoservice.

00429-7

ERROR secondary remote resource\"%s\" onmachine \"%s\" isalready in-service, so resource\"%s\" onmachine \"%s\" can'tbe brought in-service.

Cause: A resource {resource} could not be broughtinto service onmachine {machine} because itssecondary remote resource {resource} is already inservice onmachine {machine}.

Action: Manually change the remote resource out ofservice and do in service on local resource again.

00430-0

ERROR restore of resource \"%s\" hasfailed

Cause: A resource could not be brought into service.

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 363

Page 384: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00431-1

ERROR can't perform \"remove\" actionon resources in state \"%s\"

Cause: A resource could not be put out of service dueto the current state being {state}.

Action: Check the logs for related errors and try toresolve the reported problem.

00431-3

ERROR remove of resource \"%s\" hasfailed

Cause: A resource {resource} could not be put out ofservice.

Action: Check the logs for related errors and try toresolve the reported problem.

00431-8

ERROR %s,priv_globact(%d,%s): script%s FAILED returning%d

Cause: A global action script failed with the specifiederror code.

Action: Check the logs for related errors and try toresolve the reported problem.

00433-2

ERROR action \"%s\" has failed onresource \"%s\"

Cause: A resource action failed.

Action: Check the logs for related errors and try toresolve the reported problem.

00435-1

ERROR a \"%s\" equivalency must haveone remote resource

Cause: Creating of a {eqvtype} equivalency faileddue to the two input tag names exist on the samesystem.

Action: Correct the inputs resource tag names anddo the same process again.

00437-6

FATAL wait period of %u seconds forLCM to become available hasbeen exceeded (lock file \"%s\"not removed

Cause: The LCM daemon did not become availablewithin a reasonable time and the LCD cannot operatewithout the LCM.

Action: Check the logs for related errors and try toresolve the reported problem.

00438-6

ERROR initlcdMalloc;shmget Cause: A sharedmemory segment could not beinitialized.

Action: Check the logs for related errors and try toresolve the reported problem. Review the productdocumentation and ensure that the server meets theminimum requirements and that the operating systemis configured properly.

00444-4

WARN License key (for Kit %s/%s) hasEXPIRED

Cause: Your license has expired.

Action: Contact Support to obtain a new license.

SIOS Protection Suite for LinuxPage 364

Page 385: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00444-5

WARN License key (for Kit %s/%s) willexpire at midnight in%ld days

Cause: Your license is about to expire.

Action: Contact Support to obtain a new license.

00446-6

ERROR system \"%s\" not defined onmachine \"%s\".

Cause: The specified system name is not known.

Action: Verify the system name, and try theoperation again.

00446-7

ERROR system \"%s\" unknown onmachine \"%s\"

Cause: The specified system name is notrecognized.

Action: Verify the system name, and try theoperation again.

00449-4

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00449-5

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00449-6

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00449-7

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00449-8

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00449-9

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 365

Page 386: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00450-0

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00450-1

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00450-2

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00450-3

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00450-4

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00450-5

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00450-6

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00450-7

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 366

Page 387: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00450-8

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00450-9

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00451-0

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00451-1

ERROR COMMAND OUTPUT: %s Cause: An action or event script producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

00451-2

ERROR Cause: An error occured on the remotemachine.

Action: Check the logs on the remotemachine foradditional details.

00456-5

ERROR can't set resource state type toILLSTATE

Cause: An attempt was made to put a resource in anillegal state.

Action: Do not try to put a resource into an illegalstate.

00456-7

ERROR resource \"%s\" can't bechanged to \"%s\" state becauseit is a primary resource of aSHARED equivalency toresource \"%s\" onmachine\"%s\" which is in state \"%s\"

Cause: Change resource {resource} to state {state}failed since its SHARED equivalency primaryresource {resource} onmachine {machine} is in state{state}.

Action: Check the input or manually put the primaryresource out of service and try the same processagain.

00460-7

ERROR no resource instance has tag\"%s\"

Cause: No resource with the provided tag exists.

Action: Provide a valid tag, or check the logs forrelated errors and try to resolve the reported problem.

SIOS Protection Suite for LinuxPage 367

Page 388: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00460-8

ERROR no resource instance has iden-tifier \"%s\"

Cause: No resource exists with the providedidentifier.

Action: Provide a valid identifier, or check the logsfor related errors and try to resolve the reportedproblem.

00461-9

ERROR resource with tag \"%s\" alreadyexists with identifier \"%s\"

Cause: The provided tag name already exists.

Action: Choose a different tag name.

00462-0

ERROR resource with identifier \"%s\"already exists with tag \"%s\"

Cause: The provided identifier already exists.

Action: Choose a different identifier to use for thisresource.

00464-3

ERROR Instance tag name is too long. Itmust be shorter than%d char-acters.

Cause: Tag name is too long.

Action: Provide a tag name that is less than 256characters.

00464-6

ERROR Tag name contains illegal char-acters

Cause: Tag name contains an illegal character.

Action: Specify a tag name that does not include oneof these characters: _-./

00469-1

ERROR can't set both tag and identifierat same time

Cause: Both a tag and an identifier were specified.

Action: Provide only one of tag or identifier.

00474-5

ERROR failed to access lkexterrlog path-h=%s

Cause: The utility "lkexterrlog" can not be accessedfor collecting system information.

Action: Check the installation of package "steeleye-lk" andmake sure the utility "lkexterrlog" isaccessible.

00474-6

ERROR lkexterrlog failed runret=%dcmdline=%s

Cause: The execution of utility "lkexterrlog" failedwhen collecting system information.

Action: Check the logs for related errors and try toresolve the reported problem.

00478-2

ERROR Resource \"%s\" was in state\"%s\" before event occurred -recovery will not be attempted

Cause: The resource is already in service. Recoverywill not be attempted.

00478-3

ERROR Resource \"%s\" was already instate \"%s\" before eventoccurred

Cause: A resource was not in an appropriate state toallow recovery.

Action: Put the resource in the ISP state if recoveryis still needed.

SIOS Protection Suite for LinuxPage 368

Page 389: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00478-6

ERROR %s on failing resource \"%s\" Cause: An error occured why attempting to recover aresource.

Action: Check the logs for related errors and try toresolve the reported problem.

00478-8

EMERG failed to remove resource '%s'.SYSTEMHALTED.

Cause: A error occur that prevented a resource frombeing taken out of service during a recovery. Thesystem has been restarted to ensure the resource isnot active on two systems.

Action: Check the logs for related errors and try toresolve the reported problem.

00479-3

ERROR lcdsendremote transfer resource\"%s\" to \"%s\" onmachine\"%s\" failed (rt=%d

Cause: A failure occured while transfering a resourceand it's dependencies to another system.

Action: Check the logs for related errors and try toresolve the reported problem. Check the log on theother system for related errors.

00479-7

ERROR Restore of SHARED resource\"%s\" has failed

Cause: There was an error while restoring a resource.

Action: Check the logs for related errors and try toresolve the reported problem.

00480-6

ERROR Restore in parallel of resource\"%s\" has failed; will re-try seri-ally

Cause: Parallel recovery failed. Check the logs forrelated errors and try to resolve the reported problem.

Action: No action is required. The system willcontinue to recover serially. If the recovery fails,check for error messages related to the resourceswhich failed to recover to find out what further actionsto take.

00481-9

ERROR read_temporal_recovery_log():failed to fopen file: %s. fopen()%s.

Cause: The opening of the temporal recovery log file{file} in preparation for loading into it into memoryfailed with the error {error}.

Action: Check system log files and correct anyreported errors before retying the operation.

00482-0

ERROR read_temporal_recovery_log():failed tomalloc initial buf for tem-poral_recovery_stamp.

Cause: Loading the temporal recovery log informationinto memory failed when attempting to acquirememory to store the log information.

Action: Check system log files and correct anyreported errors before retying the operation.

SIOS Protection Suite for LinuxPage 369

Page 390: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00482-1

ERROR read_temporal_recovery_log():failed to reallocate buffer for tem-poral_recovery_stamp.

Cause: Loading the temporal recovery log informationinto memory failed when attempting to increase theamount of memory required to store the loginformation.

Action: Check system log files and correct anyreported errors before retrying the operation.

00482-2

ERROR write_temporal_recovery_log():failed to open file: %s.

Cause: The update of the temporal recovery log filewas terminated when the open of the temporary file{temporary name} failed.

Action: Check system log files and correct anyreported errors before retrying the operation.

00482-3

ERROR rename(%s, %s) failed. Cause: The update of the temporal recovery log filewas terminated when the rename of the temporary file{temporary name} to the real log file {real name} failed.

Action: Check system log files and correct anyreported errors before retrying the operation.

00482-9

FATAL err=%s line=%dSemid=%dnumops=%zd perror=%s

Cause: Themodification of semaphore ID{semaphore} failed with error {err} and error messagedescription {perror}.

Action: Check adjacent logmessages for moredetails. Also, check the system log files and correctany reported errors before retrying the operation.

00486-0

ERROR restore ftok failed for resource%s with path%s

Cause: The attempt to generate an IPC key for use insemaphore operations for resource {tag} using path{path} failed. This is a system error.

Action: Check adjacent logmessages for moredetails. Also check system log files and correct anyreported errors before retrying the operation.

00486-1

ERROR semget failed with error%d Cause: The attempt to retrieve the semaphoreidentification associated with the instances files hasfailed. This is a system error.

Action: Check adjacent logmessages for moredetails. Also check system log files and correct anyreported errors before retying the operation.

SIOS Protection Suite for LinuxPage 370

Page 391: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00486-2

ERROR semctl SEMSET failed with error%d

Cause: The attempt to create and initialize asemaphore used during the recovery process hasfailed with the error {error number}. This is a systemerror.

Action: Check adjacent logmessages for moredetails. Also, check system log files and correct anyreported errors before retying the operation.

00486-3

ERROR semop failed with error%d Cause: The attempt to set a semaphore used duringthe recovery process has failed with the error {errornumber}. This is a system error.

Action: Check adjacent logmessages for moredetails. Also, check system log files and correct anyreported errors before retying the operation.

00486-4

ERROR semctl SEMSET failed with error%d

Cause: The attempt to release a semaphore usedduring the recovery process has failed with the error{error number}. This is a system error.

Action: Check adjacent logmessages for moredetails. Also, check system log files and correct anyreported errors before retying the operation.

00486-5

ERROR restore action failed for resource%s

Cause: The attempt to bring resource {tag} In Servicehas failed.

Action: Check adjacent logmessages for moredetails. Correct any reported errors and retry theoperation.

00487-2

ERROR Remote remove of resource\"%s\" onmachine \"%s\" failed(rt=%d

Cause: The request to take resource {tag} Out ofService on {server} for transfer to the local systemhas failed.

Action: Check adjacent logmessages for moredetails on the local system. Also, check the logmessages on {server} for further details on the failureto remove the resource.

00487-5

ERROR remote remove of resource\"%s\" onmachine \"%s\" failed

Cause: The request to take resource {tag} Out ofService on {server} for transfer to the local systemhas failed.

Action: Check adjacent logmessages for moredetails on the local system. Also, check the logmessages on {server} for further details on the failureto remove the resource.

SIOS Protection Suite for LinuxPage 371

Page 392: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00487-6

ERROR remote remove of resource\"%s\" onmachine \"%s\" failed

Cause: The request to take resource {tag} Out ofService on {server} for transfer to the local systemhas failed.

Action: Check adjacent logmessages for moredetails on the local system. Also, check the logmessages on {server} for further details on the failureto remove the resource.

00504-5

ERROR tli_fdget_i::execute unable toestablish a listener port

Cause: A network connection could not be properlyconfigured.

Action: Verify that all network hardware and driversare properly configured. If this message continuesand resources cannot be put into service, contactSupport.

00505-5

ERROR tli_fdget_o::execute - async con-nect failure

Cause: A network connection could not be properlyconfigured.

Action: Verify that all network hardware and driversare properly configured. If this message continuesand resources cannot be put into service, contactSupport.

00506-1

ERROR tli_fdget_o::execute - bindsocket

Cause: A network connection could not be properlyconfigured.

Action: Verify that all network hardware and driversare properly configured. If this message continuesand resources cannot be put into service, contactSupport.

00514-5

ERROR opening the file Cause: A pipe could not be opened or created.

Action: Check adjacent logmessages for moredetails.

00522-5

WARN so_driver::handle_error: send-ing/receiving datamessageerrno%d: %s

Cause: A message failed to be sent or received.

Action: Check adjacent logmessages for moredetails. This may be a temporary error, but if this errorcontinues and servers cannot communicate, verifythe network configuration on the servers.

00601-2

ERROR quickCheck script '%s' failed toexit after %u seconds. Forciblyterminated. Please examine thescript or adjust theLKCHECKINTERVAL para-meter in%s.

Cause: A quickCheck script is probably taking toolong or hanging.

Action: Perform the steps listed in themessage text.

SIOS Protection Suite for LinuxPage 372

Page 393: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00610-2

ERROR COMMAND OUTPUT:$LKROOT/bin/sendevent

Cause: This is output from a "sendevent "(eventgenerator) command.

Action: Check adjacent logmessages for moredetails.

00610-3

ERROR COMMAND OUTPUT:$LKROOT/bin/sendevent

Cause: This is output from a "sendevent" (eventgenerator) command.

Action: Check adjacent logmessages for moredetails.

00610-4

ERROR COMMAND OUTPUT:$LKROOT/bin/sendevent

Cause: This is output from a "sendevent" (eventgenerator) command.

Action: Check adjacent logmessages for moredetails.

00705-8

ERROR %s: %s failed on '%s', res-ult:%d, Sense Key = %d.

Cause: A SCSI device couldn't be reserved or haveits status checked. This may be because the storageis malfunctioning or because the disk has beenreserved by another server.

Action: Check adjacent logmessages for moredetails and verify that resources are being handledproperly.

00705-9

ERROR %s: %s failed on '%s', res-ult:%d.

Cause: A SCSI device couldn't be reserved or haveits status checked. This may be because the storageis malfunctioning or because the disk has beenreserved by another server.

Action: Check adjacent logmessages for moredetails and verify that resources are being handledproperly.

00706-0

EMERG %s: failure on device '%s'.SYSTEMHALTED.

Cause: A SCSI device couldn't be reserved or haveits status checked. This may be because the storageis malfunctioning or because the disk has beenreserved by another server. THE SERVER WILL BEREBOOTED/HALTED.

Action: Verify that the storage is functioning properlyand, if so, that resources were handled properly andhave been put in service on another server.

SIOS Protection Suite for LinuxPage 373

Page 394: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

00707-2

ERROR %s: failed to open SCSI device'%s', initiate recovery. errno-o=0x%x, retry count=%d.

Cause: The protected SCSI device could not beopened. The devicemay be failing or may have beenremoved from the system.

Action: The system will be halted or a failover to thebackup node will be initiated. The default action inthis case is a failover, but this can bemodified withthe SCSIERROR tunable.

00707-3

ERROR %s: failed to open SCSI device'%s', RETRY. errno=%d, retrycount=%d.

Cause: The protected SCSI device could not beopened. The devicemay be failing or may have beenremoved from the system.

Action: This error is not critical. The operation will beretried in 5 seconds. If the problem persists, thesystem will perform a halt or resource failover.

00707-5

ERROR %s: RESERVATIONCONFLICT on SCSI device'%s'. ret=%d, errno=0x%x, retrycount=%d.

Cause: A SCSI device couldn't be reserved due to aconflict with another server. This may be because thestorage is malfunctioning or because the disk hasbeen reserved by another server.

Action: Check adjacent logmessages for moredetails and verify that resources are handled properly.

00707-7

ERROR %s: DEVICE FAILURE onSCSI device '%s', initiate recov-ery. ret=%d, errno=0x%x, retrycount=%d.

Cause: A SCSI device couldn't be reserved or haveits status checked. This may be because the storageis malfunctioning or because the disk has beenreserved by another server.

Action: Check adjacent logmessages for moredetails and verify that resources are handled properly.

00707-8

ERROR %s: DEVICE FAILURE onSCSI device '%s', RETRY. ret-t=%d, errno=0x%x, retry coun-t=%d.

Cause: A SCSI device couldn't be reserved or haveits status checked. This may be because the storageis malfunctioning or because the disk has beenreserved by another server.

Action: Check adjacent logmessages for moredetails and verify that resources are handled properly.

01000-2

WARN flag $flag not present, sendmes-sage again.

Cause: This message indicates an incompleteprocess that will be retried.

Action: Check adjacent logmessages for repeatederrors.

SIOS Protection Suite for LinuxPage 374

Page 395: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01000-3

ERROR COMMAND OUTPUT:$LKBIN/ins_remove

Cause: This message is part of the output from an"ins_remove" command.

Action: Check adjacent logmessages for moredetails. This may not be a true error.

01000-4

EMERG Cannot find the $LKROOT/-config/lkstart-hook file. Youmust install the LifeKeeper Dis-tribution Enabling packagebefore starting $PRODUCT.

Cause: This error may be caused by installingLifeKeeper directly using the rpms, or perhaps byinadvertently removing an rpm from the system.

Action: You should reinstall LifeKeeper using the'setup' script on the installationmedia.

01000-6

WARN flg_list -d $i took more than$pswait seconds to complete...

Cause: A flag list operation on a server took muchlonger than expected. Theremay be a problemcommunicating with the other server.

Action: Check adjacent logmessages for moredetails.

01000-7

ERROR flag $flag not present,switchovers may occur.

Cause: One of the servers in the cluster could not betold to disallow failover operations from the currentserver.

Action: Check adjacent logmessages for moredetails andmonitor the cluster for unexpectedbehavior.

01000-8

WARN flag $flag not present, sendmes-sage again.

Cause: A process is incomplete but will be retried.

Action: Check adjacent logmessages for moredetails and for repeated warnings/errors.

01002-3

FATAL LifeKeeper failed to initializeproperly.

Cause: There was a fatal error while attempting tostart LifeKeeper.

Action: Check adjacent logmessages for moredetails.

01002-5

ERROR `printf 'Unable to get a unique tagname on server "%s" for tem-plate resource "%s"' $MACH$DISK`

Cause: A suitable tag during the create process for astorage resource could not be automaticallygenerated.

Action: Check adjacent logmessages for moredetails. Retry the operation if there are no othererrors.

SIOS Protection Suite for LinuxPage 375

Page 396: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01003-4

FATAL Unable to start lcm. Cause: A core component of the software could notbe started.

Action: Check adjacent logmessages for moredetails and try to resolve the reported problem.

01003-8

WARN Waiting for LifeKeeper core com-ponents to initialize hasexceeded 10 seconds. Continu-ing anyway, check logs for fur-ther details.

Cause: Some parts of the software are taking longerthan expected to start up.

Action: Perform the steps listed in themessage text.

01003-9

WARN Waiting for LifeKeeper core com-ponents to initialize hasexceeded 10 seconds. Continu-ing anyway, check logs for fur-ther details.

Cause: Some parts of the software are taking longerthan expected to start up.

Action: Perform the steps listed in themessage text.

01004-6

ERROR The dependency creation failedon server $SERVER:" `cat$TEMP_FILE`

Cause: A dependency relationship could not becreated on the given server.

Action: Check the logs for related errors and try toresolve the reported problem

01006-3

ERROR $REMSH error Cause: A command to request data to backup fromanother server failed.

Action: Check the logs for related errors and try toresolve the reported problem.

01008-5

ERROR lkswitchback($MACH): Auto-matic switchback of \"$loctag\"failed

Cause: The resource was not switched back over asexpected.

Action: Check the logs for related errors and try toresolve the reported problem.

01010-2

ERROR adminmachine not specified Cause: Invalid parameters were specified for the"getlocks" operation.

Action: Verify the parameters and retry the operation.If this error happens during normal operation, contactSupport.

01010-7

WARN Lock for $m is ignored becausesystem is OOS

Cause: A lock was ignored because the system forwhich the lock was created is not alive.

Action: Check the logs for related errors. This maybe a harmless error.

SIOS Protection Suite for LinuxPage 376

Page 397: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01010-8

ERROR lock acquisition timeout Cause: Acquiring a lock tool longer thanexpected/allowed.

Action: Check the logs for related errors and try toresolve the reported problem.

01010-9

ERROR could not get admin locks." `cat/tmp/ER$$`

Cause: The software failed to acquire a lock that isrequired tomanage resources.

Action: Check the logs for related errors and try toresolve the reported problem.

01011-2

ERROR lcdrcp failed with error no:$LCDRCPRES

Cause: A file could not be copied to another server.

Action: Check the logs for related errors and try toresolve the reported problem.

01011-6

ERROR unable to set !lkstop flag Cause: A flag could not be set to indicate that theserver is being stopped by user request.

Action: Check the logs for related errors and try toresolve the reported problem.

01012-1

ERROR Extended logs aborted due to afailure in opening $destination.($syserrmsg

Cause: The execution of utility "lkexterrlog" failedwhen opening the extended log file.

Action: Check the logs for related errors and try toresolve the reported problem.

01013-2

ERROR Unable to retrieve reservation idfrom "%s". Error: "%s". Attempt-ing to regenerate.

Cause: Unable to open the file/opt/LifeKeeper/config/.reservation_id to retrieve theunique id used for SCSI 3 persistent reservations.

Action: None. An attempt will bemade to regeneratethe ID and update the file.

01013-3

ERROR No reservation ID exists in"%s". A new ID must be gen-erated by running "%s/bin/-genresid -g" on "%s".

Cause: A unique reservation id has not been defined.

Action: Take all resources out of service on this nodeand then run "/opt/LifeKeeper/bin/genresid -g" togenerate a unique reservation id.

01013-4

ERROR LifeKeeper does not appear to berunning. Unable to determine theuniqueness of the reservation IDwithin the cluster.

Cause: LifeKeeper was not running when an attemptto check the uniqueness of the reservation id wasmade.

Action: Start LifeKeeper and rerun the uniquenesscheck via "/opt/LifeKeeper/bin/genresid -v".

SIOS Protection Suite for LinuxPage 377

Page 398: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01013-5

ERROR The current reservation ID of"%s" is not unique within thecluster. A new ID must be gen-erated by running "%s/bin/-genresid -g" on "%s".

Cause: The reservation id defined for the system isnot unique within the cluster and cannot be used.

Action: Take all resources out of service on this nodeand then run "/opt/LifeKeeper/bin/genresid -g" togenerate a unique reservation id.

01013-6

ERROR Unable to store reservation id in"%s". Error: "%s"

Cause: Unable to open the file/opt/LifeKeeper/config.reservation_id to store theunique id used for SCSI 3 persistent reservations.

Action: Correct the error listed as the reason theopen failed and then take all resources out of serviceon this node and then run"/opt/LifeKeeper/bin/genresid -g" to generate a newunique reservation id.

01013-7

ERROR Failed to generate a reservationID that is unique within thecluster.

Cause: The generated reservation id is alreadydefined on another node in the cluster andmust beunique within the cluster.

Action: Take all resources out of service on this nodeand then run "/opt/LifeKeeper/bin/genresid -g" togenerate a new unique reservation id.

01022-2

ERROR scsifree(%s): LKSCSI_Release(%s) unsuccessful

Cause: A SCSI device that appeared to be reservedwas not released as expected.

Action: Check the logs for related errors and try toresolve any reported problems. This error may bebenign if the system is functioning properly.

01023-1

ERROR scsiplock(%s): reserve failed. Cause: A reservation on a SCSI device could not beacquired.

Action: Check the logs for related errors and try toresolve the reported problem.

01025-0

ERROR Failed to exec command '%s' Cause: The "lklogmsg" tool failed to execute a sub-command {command}.

Action: Check the logs for related errors and try toresolve any reported problems. Verify that the sub-command exists and is a valid command or program.If this message happens during normal operation,contact Support.

SIOS Protection Suite for LinuxPage 378

Page 399: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01040-2

EMERG local recovery failure onresource $opts{'N'}, triggerVMware HA...

Cause: When in LifeKeeper Single Server Protectionoperation, a resource could not be recovered andVMware-HA is about to be triggered to handle thefailure (if VMware-HA is enabled).

Action: No action is required. VMware should handlethe failure.

01041-3

ERROR COMMAND OUTPUT: cat /tm-p/err$$

Cause: This is the output from an "snmptrap"command that may have failed.

Action: Check the logs for related errors and try toresolve the reported problem.

01042-0

EMERG local recovery failure onresource $opts{'N'}, triggerreboot...

Cause: When in LifeKeeper Single Server Protectionoperation, a resource could not be recovered and areboot is about to be triggered to handle the failure.

Action: No action is required.

01044-0

ERROR [$SUBJECT event] mailreturned $err

Cause: This indicates a notification email could notbe sent via the "mail" command.

Action: Check the logs for related errors and try toresolve the reported problem.

01044-3

ERROR COMMAND OUTPUT: cat /tm-p/err$$

Cause: This is the output from a "mail" command thatmay have failed.

Action: Check the logs for related errors and try toresolve the reported problem.

01044-5

ERROR COMMAND OUTPUT: cat /tm-p/err$$

Cause: This is the output from a "mail" command thatmay have failed.

Action: Check the logs for related errors and try toresolve the reported problem.

01046-3

ERROR LifeKeeper: name of machine isnot specified, ARGS=$ARGS

Cause: Invalid arguments were specified for the"comm_down" event.

Action: Check your LifeKeeper configuration andretry the operation.

SIOS Protection Suite for LinuxPage 379

Page 400: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01047-1

ERROR COMM_DOWN: Attempt toobtain local comm_down lockflag failed

Cause: During the handling of a communicationfailure with another node, a local lock could not beacquired. This will likely stop a failover fromproceeding.

Action: Check the logs for related errors and try toresolve the reported problem. If failovers are nottaking place properly, contact Support.

01048-2

ERROR LifeKeeper: name of machine isnot specified, ARGS=$ARGS

Cause: Invalid arguments were specified for the"comm_up" event.

Action: Check your LifeKeeper configuration andretry the operation.

01048-4

WARN flg_list -d $MACH check timed-out ($delay seconds).

Cause: "flg_list" command reached its timeout value{delay} seconds.

01048-7

WARN flg_list -d $MACH check timed-out, unintended switchoversmay occur.

Cause: "flg_list" command reached its timeout value.

Action: Switch back the resource tree if unintendedswitchover occurs.

01049-2

WARN $m Cause: One of other servers looked dead to thisserver {server}, but witness servers did not agree.

Action: Ensure other server is dead and switch overthe resourcemanually.

01049-4

ERROR LifeKeeper: COMM_UP tomachine $MACH completedwith errors.

Cause: An unexpected failure occurred during"COMM_UP" event.

Action: Check adjacent logmessages for moredetails.

01050-3

ERROR lcdrecover hung or returnederror, attempting kill of process$FPID

Cause: "lcdrecover" took too long or errored out.

01050-6

ERROR Intelligent Switchback CheckFailed

Cause: Failed 5 times to perform "lcdrecover."

Action: Switch over the resource treemanually.

01050-8

ERROR [$SUBJECT] sent to $LK_NOTIFY_ALIAS

Cause: This message is for information only.

01060-0

ERROR removing hierarchy remnants

SIOS Protection Suite for LinuxPage 380

Page 401: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01062-7

WARN Equivalency Trim: does not havea full complement of equi-valencies. Hierarchy will beunextended from

01062-9

WARN Your hierarchy exists on onlyone server. Your application hasno protection until you extend itto at least one other server.

01071-2

ERROR Unextend hierarchy failed Cause: A resource hierarchy failed to be unextendedfrom a server.

Action: Check the logs for related errors and try toresolve the reported problem.

01074-6

ERROR $ERRMSGTarget machine\"$TARGET_MACH\" does nothave an active LifeKeeper com-munication path tomachine\"$aMach\" in the hierarchy.">&2

Cause: A hierarchy cannot be unextended becausethe target server does not have adequatecommunication with the other servers in the cluster.

Action: Check the logs for related errors and try toresolve the reported problem. Ensure that all servershave communication paths to each other.

01100-0

ERROR appremote: unknown commandtype%d('%c')\n

Cause: Internal error.

Action: Try restarting the product.

01100-1

ERROR depremote: unknown commandtype%d('%c')\n

Cause: Internal error.

Action: Try restarting the product.

01100-2

ERROR eqvremote: unknown commandtype%d('%c')\n

Cause: Internal error.

Action: Try restarting the product.

01100-3

ERROR flgremote: unknown commandtype%d('%c')\n

Cause: Internal error.

Action: Try restarting the product.

01100-4

WARN Illegal creation of resource Cause: This will not occur under normalcircumstances.

01101-1

FATAL %s Cause: LifeKeeper could not get IPC key.

Action: Check adjacent logmessages for moredetails.

01101-2

FATAL semget(%s,%c Cause: LifeKeeper could not get semaphore set id.

Action: Check adjacent logmessages for moredetails.

SIOS Protection Suite for LinuxPage 381

Page 402: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01101-3

FATAL shmget(%s,%c Cause: System could not allocate a sharedmemorysegment.

Action: Check adjacent logmessages for moredetails.

01101-4

FATAL prefix_lkroot("out" Cause: A system error has occurred while accessing/opt/LifeKeeper/out.

Action: Determine why /opt/LifeKeeper/out is notaccessible.

01101-5

ERROR DEMO_UPGRADE_MSG Cause: You are running a demo license.

Action: Contact Support to obtain a new license.

01101-6

ERROR lic_single_node_msg Cause: You have a license for LifeKeeper SingleServer Protection but you do not have LifeKeeperSingle Server Protection installed.

Action: Either install LifeKeeper Single ServerProtection or obtain a license that matches theproduct you are running.

01101-7

ERROR lic_init_fail_msg, "flex_initfailed"

Cause: Licensemanager initialization failed.

Action: Check adjacent logmessages for moredetails.

01101-8

ERROR lic_init_fail_msg, lc_errstring(lm_job

Cause: Licensemanager initialization failed.

Action: Check adjacent logmessages for moredetails.

01101-9

EMERG lic_init_fail_msg, "flex_initfailed"

Cause: Licensemanager initialization failed.

Action: Check adjacent logmessages for moredetails.

01102-0

EMERG lic_init_fail_msg, lc_errstring(lm_job

Cause: Licensemanager initialization failed.

Action: Check adjacent logmessages for moredetails.

01102-1

EMERG lic_error_msg, lc_errstring(lm_job

Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01102-2

EMERG lic_error_msg, lc_errstring(lm_job

Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

SIOS Protection Suite for LinuxPage 382

Page 403: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01102-3

EMERG lic_no_rest_suite, "" Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01102-4

EMERG lic_error_msg, lc_errstring(lm_job

Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01102-5

EMERG lic_no_license, "" Cause: LifeKeeper could not find valid license keys.

Action: Ensure license keys are valid for the serverand retry the operation.

01102-6

EMERG lic_error_msg, lc_errstring(lm_job

Cause: There is an unknown problem with yourlicense.

Action: Contact Support to obtain a new license.

01102-7

EMERG lic_no_license, "" Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01102-8

ERROR lang_error_msg Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01102-9

FATAL can't set reply system Cause: A message failed to be sent.

Action: Check adjacent logmessages for moredetails. This may be a temporary error.

01103-0

FATAL can't set reply mailbox Cause: A message failed to be sent.

Action: Check adjacent logmessages for moredetails. This may be a temporary error.

01103-1

ERROR Failure reading output of '%s' onbehalf of %s

Cause: A system error has occurred while accessingtemporary file /tmp/OUT.{pid}.

Action: Determine why /tmp/OUT.{pid} is notaccessible.

01103-2

ERROR Failure reading output of '%s' Cause: A system error has occurred while accessingtemporary file /tmp/ERR.{pid}.

Action: Determine why /tmp/ERR.{pid} is notaccessible.

01103-3

ERROR event \"%s,%s\" already postedfor resource with id \"%s\"

Cause: This message is for information only.

SIOS Protection Suite for LinuxPage 383

Page 404: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01103-4

ERROR no resource has id of \"%s\" Cause: LifeKeeper could not find the {id} resource.

Action: Verify the parameters and retry the"sendevent" operation.

01104-4

ERROR flagcleanup:fopen(%s Cause: A system error has occurred while reading/opt/LifeKeeper/config/flg.

Action: Determine why /opt/LifeKeeper/config/flg isnot readable.

01104-5

ERROR flagcleanup:fopen(%s Cause: A system error has occurred while writing/opt/LifeKeeper/config/flg.

Action: Determine why /opt/LifeKeeper/config/flg isnot writable.

01104-6

ERROR flagcleanup:fputs(%s Cause: A system error has occurred while writing/opt/LifeKeeper/config/flg.

Action: Determine why /opt/LifeKeeper/config/flg isnot writable.

01104-7

ERROR flagcleanup:rename(%s,%s Cause: A system error has occurred while renaming/opt/LifeKeeper/config/.flg to/opt/LifeKeeper/config/flg.

Action: Determine why /opt/LifeKeeper/config/flgwas not able to be renamed.

01104-8

ERROR flagcleanup:chmod(%s Cause: A system error has occurred while changingpermissions in /opt/LifeKeeper/config/flg.

Action: Determine why LifeKeeper could not changepermissions in /opt/LifeKeeper/config/flg.

01104-9

ERROR License check failed with errorcode%d

Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01105-1

ERROR lcdinit: clearing Disk Reservefile failed

Cause: A system error has occurred while writing/opt/LifeKeeper/subsys/scsi/resources/disk/disk.reserve.

Action: Determine why/opt/LifeKeeper/subsys/scsi/resources/disk/disk.reserve is not writable.

SIOS Protection Suite for LinuxPage 384

Page 405: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01105-2

FATAL malloc() failed Cause: The system could not allocatememory forLifeKeeper.

Action: Increase the process limit for the datasegment.

01105-3

FATAL lcm_is_unavail Cause: A system error has occurred while writing/tmp/LK_IS_UNAVAIL.

Action: Determine why /tmp/LK_IS_UNAVAIL is notwritable.

01105-4

FATAL lk_is_unavail Cause: A system error has occurred while writing/opt/LifeKeeper/config/LK_IS_ON.

Action: Determine why /opt/LifeKeeper/config/LK_IS_ON is not writable.

01105-5

FATAL usr_alarm_config_LK_IS_ON Cause: A system error has occurred while writing/tmp/LCM_IS_UNAVAI.

Action: Determine why /tmp/LCM_IS_UNAVAI isnot writable.

01105-6

ERROR License check failed with errorcode%d

Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01105-7

ERROR lcdremote: unknown commandtype%d('%c')\n

Cause: A message failed to be received.

Action: Check adjacent logmessages for moredetails. This may be a temporary error, but if this errorcontinues and servers cannot communicate, verifythe network configuration on the servers.

01105-9

FATAL Could not write to: %s Cause: A system error has occurred while accessing/opt/LifeKeeper/config/LK_START_TIME.

Action: Determine why /opt/LifeKeeper/config/LK_START_TIME is not accessible.

01106-0

FATAL received NULLmessage Cause: A message failed to be received.

Action: Check adjacent logmessages for moredetails. This may be a temporary error, but if this errorcontinues and servers cannot communicate, verifythe network configuration on the servers.

SIOS Protection Suite for LinuxPage 385

Page 406: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01106-1

ERROR unknown data type%d('%c') onmachine \"%s\"\n

Cause: A message failed to be received.

Action: Check adjacent logmessages for moredetails. This may be a temporary error, but if this errorcontinues and servers cannot communicate, verifythe network configuration on the servers.

01106-2

WARN LifeKeeper shutdown inprogress. Unable to perform fail-over recovery processing for%s\n

Cause: LifeKeeper was unable to fail over the givenresource during shutdown.

Action: Switch over the resource tree to other servermanually.

01106-3

WARN LifeKeeper resource initializationin progress. Unable to performfailover recovery processing for%s\n

Cause: LifeKeeper was unable to fail over the givenresource during start up.

Action: Switch over the resource treemanually afterLifeKeeper starts up.

01106-8

ERROR ERROR on command%s Cause: An error occurred while running the "rlslocks"command.

Action: Check adjacent messages for more details.

01107-0

ERROR ERROR on command%s Cause: An error occurred while running the "getlocks"command.

Action: Check adjacent logmessages for moredetails.

01108-1

FATAL Failed to ask ksh to run: %s Cause: A system error has occurred while invokingksh.

Action: Make sure the pdksh (v8.0 and earlier) or thesteeleye-pdksh (v81 and later) package is installed.

01108-2

ERROR Failed to remove: %s Cause: A system error has occurred while removing/tmp/LCM_IS_UNAVAIL.

Action: Determine why /tmp/LCM_IS_UNAVAIL isnot removable.

01108-3

ERROR Failed to remove: %s Cause: A system error has occurred while trying tounlink /tmp/LK_IS_UNAVAIL.

Action: Determine why /tmp/LK_IS_UNAVAIL is notremovable.

SIOS Protection Suite for LinuxPage 386

Page 407: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01108-4

FATAL Failed to generate an IPC keybased on: %s

Cause: A system error has occurred while accessing/opt/LifeKeeper.

Action: Determine why /opt/LifeKeeper is notaccessible.

01108-5

ERROR semget(%s,%c) failed Cause: A system error has occurred while removinga semaphore.

Action: Try to remove the semaphoremanually.

01108-6

ERROR shmget(%s,%c) failed Cause: A system error has occurred while removinga sharedmemory segment.

Action: Try to remove the sharedmemory segmentmanually.

01108-7

ERROR semctl(IPC_RMID) failed Cause: A system error has occurred while removinga semaphore.

Action: Try to remove the semaphoremanually.

01108-8

ERROR shmctl(IPC_RMID) failed Cause: A system error has occurred while removinga sharedmemory segment.

Action: Try to remove the sharedmemory segmentmanually.

01108-9

FATAL Execution of lcdstatus onremote system <%s> failed\n

Cause: The remote {node} is down, inaccessible viathe network or some other system problem occurredon the remote node.

Action: Bring the remote node back online, or checkadjacent messages for additional information, orcheck the logs on the remote node for additionalinformation.

01109-1

WARN Cause: This will not occur under normalcircumstances.

01109-2

FATAL Cause: There is a problem with your license.

Action: Perform the steps listed in themessage text.

01109-3

FATAL Cause: There is a problem with your license.

Action: Perform the steps listed in themessage text.

01109-4

FATAL Cause: There is a problem with your license.

Action: Perform the steps listed in themessage text.

SIOS Protection Suite for LinuxPage 387

Page 408: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01109-5

FATAL Cause: There is a problem with your license.

Action: Perform the steps listed in themessage text.

01109-6

FATAL Cause: There is a problem with your license.

Action: Perform the steps listed in themessage text.

01109-7

FATAL Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01109-8

FATAL Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01109-9

FATAL Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01110-0

FATAL Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01110-1

FATAL Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01110-2

FATAL Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01110-3

FATAL Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01110-4

FATAL Cause: There is a problem with your license.

Action: Contact Support to obtain a new license.

01110-5

FATAL Cause: There is a problem with your license.

Action: Perform the steps listed in themessage text.

01111-1

ERROR action \"%s\" on resource withtag \"%s\" has failed

Cause: The {action} for resource {tag} has failed.

Action: See adjacent error messages for furtherdetails.

01111-2

ERROR Cause: LifeKeeper could not find the network device.

Action: Check your LifeKeeper configuration.

SIOS Protection Suite for LinuxPage 388

Page 409: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01111-7

ERROR sysremote: system \"%s\" notfound on \"%s\"

Cause: An invalid system namewas provided.

Action: Recheck the system name and rerun thecommand.

01112-9

ERROR Failure during run of '%s' onbehalf of %s

Cause: Command execution failed.

Action: Seemessage details to determine theproblem.

01113-0

ERROR %s Cause: The command {command} producedunexpected output.

Action: Action should be determined by the contentof adjacent error messages.

01113-1

EMERG demo_update_msg Cause: There is a problem with your demo license.

Action: Contact Support to obtain a new license.

01113-2

EMERG demo_tamper_msg Cause: You have a demo license and clocktampering has been detected.

Action: Contact Support to obtain a new license.

01113-3

EMERG demo_tamper_msg Cause: You have a demo license and clocktampering has been detected.

Action: Contact Support to obtain a new license.

01113-4

EMERG demo_expire_msg Cause: The demo license for this product hasexpired.

Action: Contact Support to obtain a new license.

01113-5

EMERG demo_tamper_msg Cause: You have a demo license and clocktampering has been detected.

Action: Contact Support to obtain a new license.

01113-6

EMERG buf Cause: You are running a demo license.

Action: Contact Support to obtain a new license.

01113-8

EMERG buf Cause: You are running a demo license.

Action: Contact Support to obtain a new license.

01114-2

WARN LifeKeeper Recovery Kit %slicense key NOT FOUND

Cause: An Application Recovery Kit license for {kit}was not found.

Action: Contact Support to obtain a new license.

SIOS Protection Suite for LinuxPage 389

Page 410: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

01115-0

ERROR COMMAND OUTPUT: %s Cause: The command "eventslcm" producedunexpected output.

Action: Check the logs for related errors and try toresolve the reported problem.

01115-1

EMERG &localebuf[3] Cause: This version of the LifeKeeper core packageis restricted to being used within the territories of thePeople's Republic of China or Japan.

01115-2

EMERG Localized license failure Cause: There was amis-match between your localeand the locale for which the product license wascreated.

Action: Contact Support to obtain a new licensewhichmatches your locale.

01115-4

EMERG Single Node flag check failed. Cause: You have a license for LifeKeeper SingleServer Protection but you do not have LifeKeeperSingle Server Protection installed.

Action: Either install LifeKeeper Single ServerProtection or obtain a license that matches theproduct you are running.

01115-5

EMERG lic_master_exp_msg, "" Cause: Your license key for this product has expired.

Action: Contact Support to obtain a new license.

01116-2

EMERG lic_restricted_exp_msg, "" Cause: Your license key for this product has expired.

Action: Contact Support to obtain a new license.

01116-3

EMERG Single Node license check failed Cause: You have a license for LifeKeeper SingleServer Protection but you do not have LifeKeeperSingle Server Protection installed.

Action: Either install LifeKeeper Single ServerProtection or obtain a license that matches theproduct you are running.

01116-4

EMERG demo_expire_msg, DEMO_UPGRADE_MSG

Cause: Your license key for this product has expired.

Action: Please contact Support to obtain apermanent license key for your product.

01500-0

ERROR COMMAND OUTPUT: /op-t/LifeKeeper/sbin/steeleye-light-tpd

Cause: An error has ocurred with the "steeleye-lighttpd" process. Specific details of the error areincluded in the actual logmessage.

Action: Correct the configration and "steeleye-lighttpd" will automatically be restarted.

SIOS Protection Suite for LinuxPage 390

Page 411: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10300-1

ERROR LifeKeeper has detected an errorwhile trying to determine thenode number(s) of the DB par-tition server(s) for the instance

Cause: The db2nodes.cfg does not contain anyserver names.

Action: Ensure the db2nodes.cfg is valid.

10300-2

ERROR LifeKeeper was unable to get theversion for the requestedinstance "%s"

Cause: "db2level" command did not return DB2version.

Action: Check your DB2 configuration.

10300-3

ERROR LifeKeeper has detected an errorwhile trying to determine thenode number(s) of the DB par-tition server(s) for the instance

Cause: The DB2 Application Recovery Kit wasunable to find any nodes for the DB2 instance.

Action: Check your DB2 configuration.

10300-4

ERROR Unable to get the information forresource "%s"

Cause: Failed to get resource information.

Action: Check your LifeKeeper configuration.

10300-5

ERROR Unable to get the information forresource "%s"

Cause: Failed to get resource information.

Action: Check your LifeKeeper configuration.

10300-6

ERROR Unable to get the instanceinformation for resource "%s"

Cause: Failed to get the instance information.

Action: Check your LifeKeeper configuration.

10300-7

ERROR Unable to get the instance homedirectory information forresource "%s"

Cause: Failed to get the instance home directorypath.

Action: Check your LifeKeeper configuration.

10300-8

ERROR Unable to get the instance typeinformation for resource "%s"

Cause: The DB2 Application Recovery Kit foundinvalid instance type.

Action: Check your LifeKeeper configuration.

10300-9

ERROR LifeKeeper has encountered anerror while trying to get the data-base configuration parametersfor database \"$DB\"

Cause: There was an unexpected error running "db2get db cfg for $DB" command.

Action: Check the logs for related errors and try toresolve the reported problem.

10301-2

ERROR LifeKeeper was unable to startthe database server for instance"%s"

Cause: The requested startup of the DB2 instancefailed.

Action: Check the logs for related errors and try toresolve the reported problem. Correct any reportederrors before retrying the "restore" operation.

SIOS Protection Suite for LinuxPage 391

Page 412: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10301-3

ERROR LifeKeeper was unable to startthe database server for instance"%s"

Cause: The requested startup of the DB2 instancefailed.

Action: Check the logs for related errors and try toresolve the reported problem. Correct any reportederrors before retrying the "restore" operation.

10301-5

ERROR An entry for the home directory"%s" of instance "%s" does notexist in "/etc/fstab"

Cause: The home directory of instance of MultiplePartition database should exist in "/etc/fstab".

Action: Ensure the home directory exists in"/etc/fstab".

10301-6

ERROR LifeKeeper was unable tomountthe home directory for the DB2instance "%s"

Cause: Failed tomount the home directory ofinstance of Multiple Partition database.

Action: Ensure the home directory is mounted andretry the operation.

10301-7

ERROR Unable to get the instance nodesinformation for resource "%s"

Cause: Failed to get the instance nodes.

Action: Check your LifeKeeper configuration.

10301-8

ERROR LifeKeeper was unable to startdatabase partition server "%s"for instance "%s"

Cause: The requested startup of the DB2 instancefailed.

Action: Check the logs for related errors and try toresolve the reported problem. Correct any reportederrors before retrying the "restore" operation.

10302-0

ERROR LifeKeeper was unable to stopthe database server for instance"%s"

Cause: The requested shutdown of the DB2 instancefailed.

Action: Check the logs for related errors and try toresolve the reported problem. Correct any reportederrors before retrying the "remove" operation.

10302-1

ERROR LifeKeeper was unable to stopthe database server for instance"%s"

Cause: The requested shutdown of the DB2 instancefailed.

Action: Check the logs for related errors and try toresolve the reported problem. Correct any reportederrors before retrying the "remove" operation.

10302-3

ERROR Unable to get the instance nodesinformation for resource "%s"

Cause: Failed to get the instance nodes.

Action: Check your LifeKeeper configuration.

SIOS Protection Suite for LinuxPage 392

Page 413: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10302-4

ERROR LifeKeeper was unable to stopdatabase partition server "%s"for instance "%s"

Cause: The requested shutdown of the DB2 instancefailed.

Action: Check the logs for related errors and try toresolve the reported problem. Correct any reportederrors before retrying the "remove" operation.

10302-6

ERROR Unable to get the instance nodesinformation for resource "%s"

Cause: Failed to get the instance nodes.

Action: Check your LifeKeeper configuration.

10302-7

FATAL The argument for the DB2instance is empty

Cause: Invalid parameters were specified for thecreate operation.

Action: Verify the parameters and retry the operation.

10302-8

FATAL Unable to determine the DB2instance home directory

Cause: The DB2 Application Recovery Kit wasunable to determine the DB2 instance homedirectory.

Action: Ensure the instance owner name is same asthe instance name and retry the operation.

10302-9

FATAL Unable to determine the DB2instance type

Cause: The DB2 Application Recovery Kit wasunable to determine the DB2 instance type.

Action: Check your DB2 configuration.

10303-0

FATAL LifeKeeper has detected an errorwhile trying to determine thenode number(s) of the DB par-tition server(s) for the instance

Cause: The DB2 Application Recovery Kit wasunable to find any nodes for the DB2 instance.

Action: Check your DB2 configuration.

10303-1

ERROR The path "%s" is not on a sharedfilesystem

Cause: The instance home directory should be on ashared filesystem.

Action: Ensure the path is on shared filesystem andretry the create operation.

10303-2

ERROR LifeKeeper was unable to get theDB tablespace containers forinstance "%s" or the log path forone of its databases

Cause: LifeKeeper could not determine the locationof the database table space containers or verify thatthey are located in a path which is on amountedfilesystem.

Action: Check the logs for related errors and try toresolve the reported problem. Correct any reportederrors before retrying the "create" operation.

SIOS Protection Suite for LinuxPage 393

Page 414: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10303-3

ERROR The path "%s" is not on a sharedfilesystem

Cause: The path of database table space containershould be on a shared filesystem.

Action: Ensure database table space container is ona shared filesystem and retry the operation.

10303-4

ERROR A DB2Hierarchy already existsfor instance "%s"

Cause: An attempt was made to protect the DB2instance that is already under LifeKeeper protection.

Action: Youmust select a different DB2 instance forLifeKeeper protection.

10303-5

ERROR The file system resource "%s" isnot in-service

Cause: The file system which the DB2 resourcedepends on should be in service.

Action: Ensure the file system resource is in serviceand retry the "create" operation.

10303-6

ERROR Unable to create the hierarchyfor raw device "%s"

Cause: LifeKeeper was unable to create the resource{raw device} .

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the "create" operation.

10303-7

ERROR A RAW hierarchy does not existfor the tag "%s"

Cause: LifeKeeper was unable to find the rawresource {tag} .

Action: Check your LifeKeeper configuration.

10303-8

ERROR LifeKeeper was unable to createa dependency between the DB2hierarchy "%s" and the Raw hier-archy "%s"

Cause: The requested dependency creation betweenthe parent DB2 resource and the child Raw resourcefailed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the "create" operation.

10303-9

ERROR LifeKeeper could not disable theautomatic startup feature of DB2instance "%s"

Cause: An unexpected error occurred whileattempting to update the DB2 setting.

Action: The DB2AUTOSTART will need to beupdatedmanually to turn off the automatic startup ofthe instance at system boot.

10304-0

ERROR DB2 version "%s" is notinstalled on server "%s"

Cause: LifeKeeper could not find DB2 installedlocation.

Action: Check your DB2 configuration.

SIOS Protection Suite for LinuxPage 394

Page 415: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10304-1

ERROR The instance owner "%s" doesnot exist on target server "%s"

Cause: An attempt to retrieve the DB2 instanceowner from template server during a "canextend" or"extend" operation failed.

Action: Verify the DB2 instance owner exists on thespecified server. If the user does not exist, it shouldbe created with the same uid and gid on all servers inthe cluster.

10304-2

ERROR The instance owner "%s" uidsare different on target server"%s" and template server "%s"

Cause: The user id on the target server {target server}for the DB2 instance owner {user} does not match thevalue of the user {user} on the template server{template server}.

Action: The user ids for the DB2 instance owner{user} must match on all servers in the cluster. Theuser id mismatch should be correctedmanually on allservers before retrying the "canextend" operation.

10304-3

ERROR The instance owner "%s" gidsare different on target server"%s" and template server "%s"

Cause: The group id on the target server {targetserver} for the DB2 instance owner {user} does notmatch the value of the user {user} on the templateserver {template server}.

Action: The group ids for the DB2 instance owner{user} must match on all servers in the cluster. Thegroup idmismatch should be correctedmanually onall servers before retrying the "canextend" operation.

10304-4

ERROR The instance owner "%s" homedirectories are different on targetserver "%s" and template server"%s"

Cause: The home directory location of the user {user}on the target server {target server} does not match theDB2 instance owner's home directory on the templateserver {template server}.

Action: The home directory location of the DB2instance owner {user} must match on all servers inthe cluster. The locationmismatch should becorrectedmanually on all servers before retrying the"canextend" operation.

10304-5

ERROR LifeKeeper was unable to get theDB2 "SVCENAME" parameterfor the DB2 instance

Cause: There was an unexpected error running "db2get dbm cfg" command.

Action: Check your DB2 configuration.

10304-6

ERROR Unable to get the value of theDB2 "SVCENAME" parameterfor the DB2 instance%s.

Cause: The DB2 "SVCENAME" parameter is set tonull.

Action: Check your DB2 configuration.

SIOS Protection Suite for LinuxPage 395

Page 416: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10304-7

ERROR LifeKeeper was unable to get thecontents of the "/etc/services"file on the server "%s"

Cause: "/etc/services" on the template server doesnot contain the service names for the DB2 instance.

Action: The service names in "/etc/services" for theDB2 instancemust match on all servers in thecluster. The service names mismatch should becorrectedmanually on all servers before retrying the"canextend" operation.

10304-8

ERROR LifeKeeper was unable to get thecontents of the "/etc/services"file on the server "%s"

Cause: "/etc/services" on the target server does notcontain the service names for the DB2 instance.

Action: The service names in "/etc/services" for theDB2 instancemust match on all servers in thecluster. The service names mismatch should becorrectedmanually on all servers before retrying the"canextend" operation.

10304-9

ERROR The "/etc/services" entries forthe instance "%s" are differenton target server "%s" and tem-plate server "%s"

Cause: The "/etc/services" entries for the instancearemismatched.

Action: The service names in "/etc/services" for theDB2 instancemust match on all servers in thecluster. The service names mismatch should becorrectedmanually on all servers before retrying the"canextend" operation.

10305-0

ERROR The home directory "%s" forinstance "%s" is not mounted onserver "%s"

Cause: LifeKeeper could not find db2nodes.cfg forMultiple Partition instance.

Action: Ensure the home directory is mounted andretry the operation.

10305-1

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: Failed to get resource information from thetemplate server.

Action: Check your LifeKeeper configuration.

10305-2

ERROR LifeKeeper was unable to addinstance "%s" and/or its vari-ables to the DB2 registry

Cause: There was an unexpected error running"db2iset" command.

Action: Check the logs for related errors and try toresolve the reported problem.

10305-4

ERROR Unable to determine the DB2instance type

Cause: The DB2 Application Recovery Kit wasunable to determine the DB2 instance type.

Action: Check your DB2 configuration.

SIOS Protection Suite for LinuxPage 396

Page 417: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10305-5

ERROR LifeKeeper has detected an errorwhile trying to determine thenode number(s) of the DB par-tition server(s) for the instance

Cause: The DB2 Application Recovery Kit wasunable to find any nodes for the DB2 instance.

Action: Check your DB2 configuration.

10306-0

ERROR Unable to determine the DB2instance home directory

Cause: The DB2 Application Recovery Kit wasunable to determine the DB2 instance homedirectory.

Action: Ensure the instance owner name is same asthe instance name and retry the operation.

10306-1

ERROR Unable to determine the DB2instance type

Cause: The DB2 Application Recovery Kit wasunable to determine the DB2 instance type.

Action: Check your DB2 configuration.

10306-2

ERROR LifeKeeper has detected an errorwhile trying to determine thenode number(s) of the DB par-tition server(s) for the instance

Cause: The DB2 Application Recovery Kit wasunable to find the node for the DB2 instance.

Action: Check your DB2 configuration.

10306-3

ERROR Unable to determine the DB2install path

Cause: The DB2 Application Recovery Kit wasunable to find DB2 for the instance.

Action: Check your DB2 configuration.

10306-5

ERROR Invalid input provided for "%s"utility operation, characters arenot allowed.

Cause: Invalid parameters were specified for the"nodes" command.

Action: Verify the parameters and retry the operation.

10306-6

ERROR Unable to get the information forresource "%s"

Cause: LifeKeeper was unable to find the resource{tag}.

Action: Verify the parameters and retry the operation.

10306-7

ERROR TheDB2 instance "%s" is not aEEE orMultiple Partitioninstance

Cause: The resource {tag} is single partition instance.

Action: Verify the parameters and retry the operation.

10306-9

ERROR Node "%s" is already protectedby this hierarchy

Cause: Invalid parameters were specified for the"nodes" command.

Action: Verify the parameters and retry the operation.

10307-0

ERROR Node number "%s" is the lastremaining node protected byresource "%s". Deleting allnodes is not allowed.

Cause: Invalid parameters were specified for the"nodes" command.

Action: Verify the parameters and retry the operation.

SIOS Protection Suite for LinuxPage 397

Page 418: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10307-1

ERROR LifeKeeper is unable to get theequivalent instance for resource"%s"

Cause: There was an unexpected error running"nodes" command.

Action: Check the logs for related errors and try toresolve the reported problem.

10307-2

ERROR Unable to set NodesInfo forresource "%s" on "%s"

Cause: There was an unexpected error running"nodes" command.

Action: Check the logs for related errors and try toresolve the reported problem.

10307-3

ERROR Unable to set NodesInfo forresource "%s" on "%s"

Cause: There was an unexpected error running"nodes" command.

Action: Check the logs for related errors and try toresolve the reported problem.

10307-6

ERROR Unable to determine the DB2instance type

Cause: The DB2 Application Recovery Kit wasunable to determine the DB2 instance type.

Action: Check your DB2 configuration.

10307-7

ERROR Unable to determine the DB2instance home directory

Cause: The DB2 Application Recovery Kit wasunable to determine the DB2 instance homedirectory.

Action: Ensure the instance owner name is same asthe instance name and retry the operation.

10307-8

ERROR The database server is not run-ning for instance "%s"

Cause: A process check for the DB2 instance did notfind any processes running.

Action: The DB2 instancemust be started.

10307-9

ERROR LifeKeeper has detected an errorwhile trying to determine thenode number(s) of the DB par-tition server(s) for the instance

Cause: The DB2 Application Recovery Kit wasunable to find any nodes for the DB2 instance.

Action: Check your DB2 configuration.

10308-0

ERROR One ormore of the database par-tition servers for instance "%s"is down

Cause: All database partition servers should berunning.

Action: Ensure all database partition servers arerunning and retry the operation.

SIOS Protection Suite for LinuxPage 398

Page 419: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10308-2

ERROR Failed to create flag "%s" Cause: An unexpected error occurred attempting tocreate a flag for controlling DB2 local recoveryprocessing.

Action: Check the adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

10308-3

ERROR Failed to remove flag "%s" Cause: An unexpected error occurred attempting toremove a flag for controlling DB2 local recoveryprocessing.

Action: Check the adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

10308-4

ERROR Unable to determine the DB2instance \"$Instance\" home dir-ectory

Cause: The DB2 Application Recovery Kit wasunable to determine the DB2 instance homedirectory.

Action: Ensure the instance owner name is same asthe instance name and retry the operation.

10400-2

FATAL $msg Cause: This message indicates an internal softwareerror.

Action: The stack trace indicates the source of theerror.

10400-3

FATAL $self->Val('Tag') . " is not anSDR resource"

Cause: A data replication action was attempted on anon data replication resource.

Action: Check the logs for related errors and try toresolve the reported problem.

10401-0

ERROR $self->{'md'}: bitmapmergefailed, $action

Cause: The bitmapmerge operation has failed.

Action: The target server may have themirror and/orprotected filesystemmounted, or the bitmap file maybemissing on the target. Check the target server.

10402-2

ERROR $argv[1]: mdadm failed ($ret Cause: The "mdadm" command has failed to add adevice into themirror.

Action: This is usually a temporary condition.

10402-3

ERROR $_ Cause: Themessage contains the output of the"mdadm" command.

SIOS Protection Suite for LinuxPage 399

Page 420: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10402-5

ERROR failed to spawnmonitor Cause: The system failed to start the 'mdadm -F'monitor process. This should not happen undernormal circumstances.

Action: Reboot the system to ensure that anypotential conflicts are resolved.

10402-6

ERROR cannot create $md Cause: Themirror device could not be created.

Action: Ensure the device is not already in use andthat all other parameters for themirror creation arecorrect.

10402-7

ERROR $_ Cause: This message contains the "mdadm"command output.

10403-5

ERROR Toomany failures. Abortingresync of $md

Cause: The device was busy for an abnormally longperiod of time.

Action: Reboot the system to be sure that the deviceis no longer busy.

10403-6

ERROR Failed to start nbd-server on $tar-get (error $port

Cause: The nbd-server process could not be startedon the target server.

Action: Ensure that the target disk device isavailable and that its Device ID has not changed.

10403-7

ERROR Failed to start compression(error $port

Cause: The system was unable to start the 'balance'tunnel process or there was a network problem.

Action: Ensure that the network is operating properlyand that TCP ports in the range 10000-10512 areopened and unused. Ensure that the software isinstalled properly on all systems.

10403-8

ERROR Failed to start nbd-client on$source (error $ret

Cause: The nbd-client process has failed to start onthe source server.

Action: Look up the reported errno value and try toresolve the problem reported. For example, an errnovalue of 110means "Connection timed out", whichmay indicate a network or firewall problem.

10403-9

ERROR Failed to add $nbd to $md on$source

Cause: This is usually a temporary condition.

Action: If this error persists, reboot the system toresolve any potential conflicts.

SIOS Protection Suite for LinuxPage 400

Page 421: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10404-5

ERROR failed to stop $self->{'md'} Cause: Themirror device could not be stopped.

Action: Ensure that the device is not busy ormounted. Try running "mdadm --stop" manually tostop the device.

10404-8

WARN failed to kill $proc, pid $pid Cause: The process could not be signalled. This mayindicate that the process has already died.

Action: Ensure that the process in question is nolonger running. If it is, then reboot the system to clearup the unkillable process.

10405-0

ERROR Setting $name on $dest failed:$ret. Please try again.

Cause: The system failed to set a 'mirrorinfo' filesetting.

Action: Check the network and systems and retrythemirror setting operation.

10405-2

FATAL Specified existingmount point"%s" is not mounted

Cause: Themount point became unmounted.

Action: Ensure that themount point is mounted andretry the operation.

10405-5

ERROR Failed to set up temporary $typeaccess to data for $self->{'tag'}.Error: $ret

Cause: The filesystem or device was not available onthe target server. Themirrored data will not beavailable on the target server until themirror ispaused and resumed again.

Action: Reboot the target server to resolve anypotential conflicts.

10405-7

ERROR Failed to undo temporary accessfor $self->{'tag'} on $self->{'sys'}. Error: $ret. Please verifythat $fsid is not mounted onserver $self->{'sys'}.

Cause: The filesystem could not be unmounted onthe target server.

Action: Ensure that the filesystem and device are notbusy on the target server. Reboot the target server toresolve any potential conflicts.

10406-2

FATAL Cannot find a device with uniqueID "%s"

Cause: The target disk could not be identified.

Action: Ensure that the appropriate storage recoverykits are installed on the target server. Ensure that theDevice ID of the target disk has not changed.

10406-6

FATAL Cannot get the hardware ID ofdevice "%s"

Cause: A unique ID could not be found for the targetdisk device.

Action: Ensure that the appropriate storage recoverykits are installed on the target server. Ensure that theDevice ID of the target disk has not changed.

SIOS Protection Suite for LinuxPage 401

Page 422: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10406-7

FATAL Asynchronous writes cannot beenabled without a bitmap file

Cause: An attempt was made to create amirror withinvalid parameters.

Action: A bitmap file parameter must be specified orsynchronous writes must be specified.

10406-8

FATAL Failed to extend dependentresource%s to system%s.Error%s

Cause: The hierarchy extend operation failed.

Action: Check the logs for related errors and try toresolve the reported problem.

10407-0

FATAL Unable to extend themirror "%s"to system "%s"

Cause: The hierarchy extend operation failed.

Action: Check the logs for related errors and try toresolve the reported problem.

10407-1

ERROR Failed to restore target deviceresources on $target->{'sys'} :$err

Cause: The in-service operation has failed on thetarget server.

Action: Check the logs for related errors and try toresolve the reported problem.

10407-3

EMERG WARNING: A temporary com-munication failure has occurredbetween systems %s and%s.In order to avoid data corruption,data resynchronization will notoccur. MANUALINTERVENTION ISREQUIRED. In order to initiatedata resynchronization, youshould: 1) Take one of the fol-lowing resources out of service(this resource will become themirror target): %s on%s or%son%s. 2) Take the otherresource out of service (thisresource will become themirrorsource). 3) Run%s/bitmap -d%s on the system that willbecomemirror source. 4) Bringthemirror in service on thesource system. A full resync willoccur.

Cause: A temporary communication failure (split-brain scenario) has occurred between the source andtarget servers.

Action: Perform the steps listed in themessage text.

SIOS Protection Suite for LinuxPage 402

Page 423: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10407-4

FATAL Cannot get the hardware ID ofdevice "%s"

Cause: There is no storage recovery kit thatrecognizes the underlying disk device that you areattempting to use for themirror.

Action: Make sure the appropriate storage recoverykits are installed. If necessary, place your devicename in the/opt/LifeKeeper/subsys/scsi/resources/DEVNAME/device_pattern file.

10408-1

FATAL Cannot make the%s filesystemon "%s" (%d)

Cause: The "mkfs" command failed.

Action: Ensure that the disk device is writable andfree of errors and that the filesystem tools for theselected filesystem are installed.

10408-2

FATAL %s Cause: This message contains the output of the"mkfs" command.

10408-3

FATAL Cannot create filesys hierarchy"%s"

Cause: The resource creation failed.

Action: Check the logs for related errors and try toresolve the reported problem.

10408-6

ERROR The "%s_data_corrupt" flag isset in "%s/sub-sys/scsi/resources/netraid/" onsystem "%s". To avoid data cor-ruption, LifeKeeper will notrestore the resource

Cause: The data corrupt flag file has been set as aprecaution to prevent accidental data corruption. Themirror cannot be restored on this server until the file isremoved.

Action: If you are sure that the data is valid on theserver in question, you can either: 1) remove the fileand restore themirror, or 2) force themirror onlineusing the LifeKeeper GUI or 'mirror_action force'command.

10409-2

ERROR Mirror target resourcemovementto system%s : status %s

Cause: The hierarchy switchover operation hasfailed.

Action: Check the logs for related errors and try toresolve the reported problem.

10409-9

ERROR Unable to unextend themirror forresource "%s" from system"%s"

Cause: The hierarchy unextend operation failed.

Action: Reboot the target server to resolve anypotential conflicts and retry the operation.

SIOS Protection Suite for LinuxPage 403

Page 424: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10410-6

ERROR remote 'bitmap -m' commandfailed on $target->{'sys'}:$ranges

Cause: The bitmapmerge command failed on thetarget server. This may be caused by one of twothings: 1) The bitmap file may bemissing orcorrupted, or 2) themirror (md) devicemay be activeon the target.

Action: Make sure that themirror and protectedfilesystem are not active on the target. If the target'sbitmap file is missing, pause and resume themirror torecreate the bitmap file.

10410-7

ERROR Asynchronous writes cannot beenabled without a bitmap file

Cause: Invalid parameters were specified for themirror create operation.

10410-8

ERROR Local Partition not available Cause: Invalid parameters were specified for themirror create operation.

10410-9

ERROR Cannot get the hardware ID ofdevice "%s"

Cause: A unique ID could not be determined for thedisk device.

Action: Ensure that the appropriate storage recoverykits are installed on the server. Ensure that theDevice ID of the disk has not changed.

10411-1

FATAL Insufficient input parameters for"%s" creation

Cause: Invalid parameters were specified for themirror create operation.

10411-2

FATAL Insufficient input parameters for"%s" creation

Cause: Invalid parameters were specified for themirror create operation.

10411-3

FATAL Insufficient input parameters for"%s" creation

Cause: Invalid parameters were specified for themirror create operation.

10411-4

FATAL Insufficient input parameters for"%s" creation

Cause: Invalid parameters were specified for themirror create operation.

10411-5

FATAL Insufficient input parameters for"%s" creation

Cause: Invalid parameters were specified for themirror create operation.

10411-7

FATAL Insufficient input parameters for"%s" creation

Cause: Invalid parameters were specified for themirror create operation.

10411-8

FATAL Cannot unmount existingMountPoint "%s"

Cause: Themount point is busy.

Action: Make sure the filesystem is not busy. Stopany processes or applications that may be accessingthe filesystem.

10411-9

FATAL Invalid data replication resourcetype requested ("%s"

Cause: An invalid parameter was specified for themirror create operation.

SIOS Protection Suite for LinuxPage 404

Page 425: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10412-4

EMERG WARNING: A temporary com-munication failure has occurredbetween systems %s and%s.In order to avoid data corruption,data resynchronization will notoccur. MANUALINTERVENTION ISREQUIRED. In order to initiatedata resynchronization, youshould take one of the followingresources out of service: %s on%s or%s on%s. The resourcethat is taken out of service willbecome themirror target.

Cause: A temporary communication failure (split-brain scenario) has occurred between the source andtarget servers.

Action: Perform the steps listed in themessage text.

10412-5

ERROR failed to start '$cmd $_[2] $user_args' on '$_[3]'

Cause: The specified command failed.

Action: Check the logs for related errors and try toresolve the reported problem.

10412-6

ERROR $_ Cause: This message contains the output of thecommand that was reported as failing in message104125.

10412-9

ERROR The replication connection formirror "%s" (resource: "%s") isdown (reason: %s)

Cause: The replication connection for themirror isdown.

Action: Check the network.

10413-0

ERROR Mirror resize failed on%s (%s).Youmust successfully completethis operation before using themirror. Please try again.

Cause: Themirror resize operation has failed toupdate themirror metadata on the listed system.

Action: Youmust successfully complete the resizebefore using themirror. Re-runmirror_resize(possibly using -f to force the operation if necessary).

10413-6

ERROR Extend failed. Cause: The hierarchy extend operation failed.

Action: Check the logs for related errors and try toresolve the reported problem.

10414-3

ERROR Mirror resumewas unsuccessful($ret

Cause: Themirror could not be established.

Action: Check the logs for related errors and try toresolve the reported problems.

SIOS Protection Suite for LinuxPage 405

Page 426: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

10414-4

ERROR Unable to stop themirror accessfor $self->{'md'} on system$self->{'sys'}. Error: $ret. Usethe \"mdadm --stop $self->{'md'}\" command tomanually stop themirror.

Cause: Themirror device created on the target nodewhen themirror was paused could not be stopped.

Action: Ensure that the device is not busy ormounted. Try running "mdadm --stop" manually tostop the device.

10415-6

WARN Resynchronization of "%s" is inPENDING state. Current sync_action is: "%s"

Cause: The resynchronization of themd device isdetected in PENDING state.

Action: LifeKeeper will try to fix the issue by forcing aresynchronization. Check the logs for related errors.When successful assure that the PENDING statehas been cleared in /proc/mdstat and theresynchronization is in progress or has beencompleted for the datarep resource.

12200-5

ERROR Unable to "%s" on "%s" Cause: There was an unexpected error running"getlocks".

Action: Check the logs for related errors and try toresolve the reported problem.

12200-7

ERROR Unable to "%s" on "%s" Cause: There was an unexpected error running"rlslocks".

Action: Check the logs for related errors and try toresolve the reported problem.

12200-9

ERROR The path%s is not a valid file. Cause: There is no listener.ora file.

Action: Ensure the file exists and retry the operation.

12201-0

ERROR The listener user does not existon the server%s.

Cause: "Stat" command could not get user id.

Action: Retry the operation.

12201-1

ERROR The listener user does not existon the server%s.

Cause: UID is not in passwd file.

Action: Ensure the UID exists in passwd file andretry the operation.

12201-2

ERROR The listener user does not existon the server%s.

Cause: User name is not in passwd file.

Action: Ensure the user name exists in passwd file;retry the operation.

12202-3

ERROR The%s command failed (%d Cause: This message contains the return code of the"lsnrctl" command.

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 406

Page 427: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12202-4

ERROR $line Cause: Themessage contains the output of the"lsnrctl" command.

Action: Check the logs for related errors and try toresolve the reported problem.

12203-9

ERROR Usage error Cause: Invalid parameters were specified for therestore operation.

Action: Verify the parameters and retry the operation.

12204-0

ERROR Script $cmd has hung on therestore of \"$opt_t\". Forcibly ter-minating.

Cause: The listener restore script reached its timeoutvalue.

Action: Ensure listener.ora is valid and that LSNR_START_TIME (default 35 seconds) in/etc/default/LifeKeeper is set to a value greater thanor equal to the time needed to start the listener.

12204-1

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: LifeKeeper was unable to restore theresource {resource} on {server}.

Action: Check the logs for related errors and try toresolve the reported problem.

12204-5

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: Failed to get resource information.

Action: Check your LifeKeeper configuration.

12204-6

ERROR Usage error Cause: Invalid parameters were specified for therestore operation.

Action: Verify the parameters and retry the operation.

12204-9

ERROR The script $cmd has hung onremove of \"$opt_t\". Forcibly ter-minating.

Cause: The listener remove script reached itstimeout value.

Action: Ensure listener.ora is valid and that LSNR_STOP_TIME (default 35 seconds) in/etc/default/LifeKeeper is set to a value greater thanor equal to the time needed to stop the listener.

12205-1

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

Action: Check your LifeKeeper configuration.

12205-5

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: LifeKeeper was unable to quickCheck theresource {resource} on {server}.

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 407

Page 428: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12205-7

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

Action: Check your LifeKeeper configuration.

12206-4

WARN The%s level is set to%s a%swill not occur.

Cause: Theminimal Listener protection level is StartandMonitor.

Action: Start the listener manually.

12206-6

ERROR Script has hung checking\"$tag\". Forcibly terminating.

Cause: The listener quickCheck script reached itstimeout value.

Action: Ensure listener.ora is valid and that LSNR_STATUS_TIME (default 15 seconds) in/etc/default/LifeKeeper is set to a value greater thanor equal to the time needed to check the listener.

12206-7

ERROR Usage error Cause: Invalid parameters were specified for thequickCheck operation.

Action: Verify the parameters and retry the operation.

12206-9

ERROR Usage error Cause: Invalid parameters were specified for thedelete operation.

Action: Verify the parameters and retry the operation.

12207-2

ERROR %s: resource "%s" not found onlocal server

Cause: Invalid parameters were specified for therecover operation.

Action: Verify the parameters and retry the operation.

12207-4

WARN The local recovery attempt hasfailed but %s level is set to%spreventing a failover to anothernode in the cluster. With%srecovery set all local recoveryfailures will exit successfully toprevent resource failovers.

Cause: The optional listener recovery level is set tolocal recovery only.

Action: Switch over the resource treemanually.

12207-8

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: LifeKeeper was unable to recover theresource {resource} on {server}.

Action: Check the logs for related errors and try toresolve the reported problem.

12208-2

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

Action: Check your LifeKeeper configuration.

SIOS Protection Suite for LinuxPage 408

Page 429: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12208-3

ERROR $cmd has hung checking\"$tag\". Forcibly terminating

Cause: The recover script was stopped by signal.

Action: Ensure listener.ora is valid.

12208-4

ERROR Cannot extend resource "%s" toserver "%s"

Cause: LifeKeeper was unable to extend theresource {resource} on {server}.

Action: Verify the parameters and retry the operation.

12208-5

ERROR Usage: %s %s Cause: Invalid parameters were specified for thecanextend operation.

Action: Verify the parameters and retry the operation.

12208-6

ERROR The values specified for the tar-get and the template servers arethe same. Please specify the cor-rect values for the target and tem-plate servers.

Cause: The values specified for the target and thetemplate servers are the same.

Action: Perform the steps listed in themessage text.

12208-7

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

Action: Check your LifeKeeper configuration.

12208-8

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: Failed to get listener user name fromresource information.

Action: Ensure the resource info field is valid thenretry the operation.

12208-9

ERROR The listener user%s does notexist on the server%s.

Cause: User name is not in passwd file.

Action: Ensure the user name exists in passwd fileand retry the operation.

12209-0

ERROR The id for user%s is not thesame on template server%s andtarget server%s.

Cause: User ID should be same on both servers.

Action: Trim user ID to the same.

12209-1

ERROR The group id for user%s is notthe same on template server%sand target server%s.

Cause: Group ID should be same on both servers.

Action: Trim group ID to the same.

12209-2

ERROR Cannot access canextend script"%s" on server "%s"

Cause: LifeKeeper was unable to run pre-extendchecks because it was unable to find the "canextend"script on {server}.

Action: Check your LifeKeeper configuration.

SIOS Protection Suite for LinuxPage 409

Page 430: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12209-7

ERROR Usage: %s %s Cause: Invalid arguments were specified for the"configActions" operation.

Action: Verify the arguments and retry the operation.

12209-8

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

Action: Check your LifeKeeper configuration.

12209-9

ERROR Unable to update the resource%s to change the%s to%s on%s.

Cause: LifeKeeper failed to put information into theinfo field.

Action: Restart LifeKeeper and retry the operation.

12210-0

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

Action: Check your LifeKeeper configuration.

12210-1

ERROR Unable to update the resource%s to change the%s to%s on%s.

Cause: LifeKeeper failed to put information to infofield on {server}.

Action: Restart LifeKeeper on {server} and retry theoperation.

12210-3

ERROR Usage: %s %s Cause: Invalid parameters were specified for thecreate operation.

Action: Verify the parameters and retry the operation.

12212-4

ERROR END failed hierarchy "%s" ofresource "%s" on server "%s"with return value of %d

Cause: LifeKeeper was unable to create the resource{resource} on {server}.

Action: Check the logs for related errors and try toresolve the reported problem.

12212-6

ERROR Unable to "%s" on "%s Cause: There was an unexpected error running"rlslocks".

Action: Check the logs for related errors and try toresolve the reported problem.

12212-7

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: LifeKeeper was unable to create the resource{resource} on {server}.

Action: Check the logs for related errors and try toresolve the reported problem.

12212-9

ERROR Unable to "%s" on "%s Cause: There was an unexpected error running"getlocks."

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 410

Page 431: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12213-1

ERROR Error creating resource "%s" onserver "%s"

Cause: LifeKeeper was unable to create the resource{resource} on {server}.

Action: Check your LifeKeeper configuration.

12213-3

ERROR Unable to create a file systemresource hierarchy for the filesystem%s.

Cause: There was an unexpected error running"filesyshier."

Action: Check adjacent logmessages for furtherdetails.

12213-5

ERROR Unable to create a dependencybetween parent tag%s and childtag%s.

Cause: There was an unexpected error running "dep_create."

Action: Check the logs for related errors and try toresolve the reported problem.

12214-0

ERROR Resource "%s" is not ISP onserver "%s" Manually bring theresource in service and retry theoperation

Cause: IP resource {tag} which the listener resourcedepends on should be ISP.

Action: Perform the steps listed in themessage text.

12214-1

ERROR Unable to create a dependencybetween parent tag%s and childtag%s.

Cause: There was an unexpected error running "dep_create."

Action: Check the logs for related errors and try toresolve the reported problem.

12214-4

ERROR Usage: %s %s Cause: Invalid parameters were specified for the"create_ins" operation.

Action: Verify the parameters and retry the operation.

12214-5

ERROR An error has occurred in utility%s on server%s. View theLifeKeeper logs for details andretry the operation.

Cause: There was an unexpected error running "app_create."

Action: Check the logs for related errors and try toresolve the reported problem.

12214-6

ERROR An error has occurred in utility%s on server%s. View theLifeKeeper logs for details andretry the operation.

Cause: There was an unexpected error running "typ_create."

Action: Check the logs for related errors and try toresolve the reported problem.

12214-7

ERROR An error has occurred in utility%s on server%s. View theLifeKeeper logs for details andretry the operation.

Cause: There was an unexpected error running"newtag."

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 411

Page 432: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12214-8

ERROR Error creating resource "%s" onserver "%s"

Cause: LifeKeeper was unable to create the resource{resource} on {server}.

Action: Check your LifeKeeper configuration.

12214-9

ERROR An error has occurred in utility%s on server%s. View theLifeKeeper logs for details andretry the operation.

Cause: There was an unexpected error running "ins_setstate."

Action: Check the logs for related errors and try toresolve the reported problem.

12215-0

ERROR Error creating resource "%s" onserver "%s"

Cause: LifeKeeper was unable to create the resource{resource} on {server}.

Action: Check your LifeKeeper configuration.

12215-1

ERROR Usage: %s %s Cause: Invalid arguments were specified for the"depstoextend" operation.

Action: Verify the arguments and retry the operation.

12215-2

ERROR Usage: %s %s Cause: Invalid parameters were specified for the"extend" operation.

Action: Verify the parameters and retry the operation.

12215-3

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

Action: Check your LifeKeeper configuration.

12215-4

ERROR Cannot extend resource "%s" toserver "%s"

Cause: LifeKeeper was unable to extend theresource {resource} on {server}.

12215-5

ERROR Resource with either matchingtag "%s" or id "%s" alreadyexists on server "%s" for App"%s" and Type "%s"

Cause: During the Listener resource extension, aresource instance was found using the same {tag}and/or {id} but with a different resource applicationand type.

Action: Resource IDs must be unique. The resourceinstance with the ID matching the Oracle Listenerresource instancemust be removed.

12215-6

ERROR Cannot access extend script"%s" on server "%s"

Cause: LifeKeeper was unable to extend theresource hierarchy because it was unable to the findthe script EXTEND on {server}.

Action: Check your LifeKeeper configuration.

12215-7

ERROR Usage: %s %s Cause: Invalid arguments were specified for the"getConfigIps" operation.

Action: Verify the arguments and retry the operation.

SIOS Protection Suite for LinuxPage 412

Page 433: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12215-8

ERROR The file%s is not a valid listenerfile. The file does not contain anylistener definitions.

Cause: Failed to find any listener definitions.

Action: Ensure listener definition is in the listener.oraand retry the operation.

12215-9

ERROR Usage: %s %s Cause: Invalid arguments were specified for the"getSidListeners" operation.

Action: Verify the arguments and retry the operation.

12216-0

ERROR The file%s is not a valid listenerfile. The file does not contain anylistener definitions.

Cause: Failed to find any listener definitions.

Action: Ensure listener definition is in the listener.oraand retry the operation.

12216-1

ERROR Usage: %s %s Cause: Invalid arguments were specified for the "lsn-display" operation.

Action: Verify the arguments and retry the operation.

12216-2

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

Action: Check your LifeKeeper configuration.

12216-3

ERROR Usage: %s %s Cause: Invalid parameters were specified for theupdateHelper operation.

Action: Verify the parameters and retry the operation.

12216-4

ERROR END failed hierarchy "%s" ofresource "%s" on server "%s"with return value of %d

Cause: LifeKeeper was unable to update theresource {resource} on {server}.

Action: Check the logs for related errors and try toresolve the reported problem.

12216-6

ERROR Usage: %s %s Cause: Invalid parameters were specified for the"updateHelper" operation.

Action: Verify the parameters and retry the operation.

12217-0

ERROR Unable to create a dependencybetween parent tag%s and childtag%s.

Cause: There was an unexpected error running "dep_create."

Action: Check the logs for related errors and try toresolve the reported problem.

12217-1

ERROR Unable to create a dependencybetween parent tag%s and childtag%s.

Cause: There was an unexpected error running "dep_create."

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 413

Page 434: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12217-2

ERROR Usage: %s %s Cause: Invalid arguments were specified for the"updIPDeps" operation.

Action: Verify the arguments and retry the operation.

12217-3

ERROR END failed hierarchy "%s" ofresource "%s" on server "%s"with return value of %d

Cause: LifeKeeper was unable to update theresource {resource} on {server}.

Action: Check the logs for related errors and try toresolve the reported problem.

12217-5

ERROR Unable to "%s" on "%s Cause: There was an unexpected error running"rlslocks."

Action: Check the logs for related errors and try toresolve the reported problem.

12217-7

ERROR Unable to "%s" on "%s Cause: There was an unexpected error running"getlocks."

Action: Check the logs for related errors and try toresolve the reported problem.

12218-0

ERROR Unable to create a dependencybetween parent tag%s and childtag%s.

Cause: There was an unexpected error running "dep_create."

Action: Check the logs for related errors and try toresolve the reported problem.

12218-1

ERROR Unable to create a dependencybetween parent tag%s and childtag%s.

Cause: There was an unexpected error running "dep_create."

Action: Check the logs for related errors and try toresolve the reported problem.

12218-3

ERROR The path%s is not a valid file. Cause: There is no listener.ora file.

Action: Ensure the file exists and retry the operation.

12218-5

ERROR The file%s is not a valid listenerfile. The file does not contain anylistener definitions.

Cause: LifeKeeper failed to find any valid listenerdefinitions.

Action: Ensure there are valid listener definitions inthe listener.ora and retry the operation.

12218-6

ERROR The value specified for%s can-not be empty. Please specify avalue for this field.

Cause: The config and/or executable {path} field isempty.

Action: Input a non-empty value for {path} and retrythe operation.

SIOS Protection Suite for LinuxPage 414

Page 435: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12218-7

ERROR The path%s is not a valid file ordirectory.

Cause: The defined {path} is invalid.

Action: Ensure the {path} exists and retry theoperation.

12218-8

ERROR The path%s is not a valid file ordirectory.

Cause: There is no {path}.

Action: Ensure the {path} exists and retry theoperation.

12218-9

ERROR The value specified for%s can-not be empty. Please specify avalue for this field.

Cause: The config and/or executable Path field isempty.

Action: Input path for the field.

12219-0

ERROR Usage: %s %s Cause: Invalid arguments were specified for the"valid_rpath" operation.

Action: Verify the arguments and retry the operation.

12219-1

ERROR The values specified for the tar-get and the template servers arethe same.

Cause: Invalid argument of valid_rpath.

Action: Ensure arguments and retry the operation.

12219-2

ERROR Unable to find the configurationfile "oratab" in its default loc-ations, /etc/oratab or%s on"%s"

Cause: There is no oratab file in /etc/oratab or {path}.

Action: Ensure oratab file exists in {path} orORACLE_ORATABLOC in /etc/default/Lifekeeper isset to a valid path.

12219-3

ERROR Unable to find the configurationfile "oratab" in its default loc-ations, /etc/oratab or%s on"%s"

Cause: There is no oratab file in /etc/oratab or {path}.

Action: Ensure oratab file exists in {path} orORACLE_ORATABLOC in /etc/default/Lifekeeper isset to a valid path.

12219-4

ERROR Unable to find the configurationfile "oratab" in its default loc-ations, /etc/oratab or%s on"%s"

Cause: There is no oratab file in /etc/oratab or {path}.

Action: Ensure oratab file exists in {path} orORACLE_ORATABLOC in /etc/default/Lifekeeper isset to a valid path.

12219-5

ERROR Unable to find the configurationfile "oratab" in its default loc-ations, /etc/oratab or%s on"%s"

Cause: There is no oratab file in /etc/oratab or {path}.

Action: Ensure oratab file exists in {path} orORACLE_ORATABLOC in /etc/default/Lifekeeper isset to a valid path.

12219-6

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: LifeKeeper was unable to remove theresource {resource} on {server}.

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 415

Page 436: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12219-7

ERROR Unable to find the configurationfile \"oratab\" in its default loc-ations, /etc/oratab or $listen-er::oraTab on \"$me\"

Cause: There is no oratab file in /etc/oratab or {path}.

Action: Ensure oratab file exists in {path} orORACLE_ORATABLOC in /etc/default/Lifekeeper isset to a valid path.

12250-1

ERROR DB instance "%s" is already pro-tected on "%s".

Cause: An attempt was made to protect an Oracledatabase instance {sid} that is already underLifeKeeper protection on {server}.

Action: Youmust select a different databaseinstance {sid} for LifeKeeper protection.

12250-2

ERROR Failed to create object instancefor Oracle on "%s".

Cause: There was an unexpected error creating aninternal representation of the Oracle instance beingprotected.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12250-3

ERROR Unable to locate the oratab file"%s" on "%s".

Cause: The oratab file was not found at the default oralternate locations on {server}.

Action: Verify the oratab file exists and has properpermissions for the Oracle user. A valid oratab file isrequired to complete the "create" operation.

12250-4

ERROR Unable to determine Oracle userfor "%s" on "%s".

Cause: TheOracle Application Recovery Kit wasunable to determine the ownership of the Oracledatabase installation binaries.

Action: The owner of the Oracle binaries must be avalid non-root user on {server}. Correct thepermissions and ownership of the Oracle databaseinstallation and retry the operation.

12250-5

ERROR TheOracle database "%s" is notrunning or no open connectionsare available on "%s".

Cause: The database instance {sid} was not runningor connections to the database were not available viathe credentials provided.

Action: The database instance {sid} must be startedon {server} and the proper credentials must beprovided for the completion of the "create" operation.

SIOS Protection Suite for LinuxPage 416

Page 437: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12250-6

ERROR Unable to determine Oracledbspaces and logfiles for "%s"on "%s".

Cause: A query to determine the location of requiredtablespaces, logfiles and related database files failed.This may have been caused by an internal databaseerror.

Action: Check the adjacent logmessages for furtherdetails and related errors. Check the Oracle log(alert.log) and related trace logs (*.trc) for additionalinformation and correct the reported problem(s).

12250-7

ERROR Unknown chunk type found for"%s" on "%s".

Cause: The specified tablespace, logfile or otherrequired database file is not one of the LifeKeepersupported file or character device types.

Action: The specified file {database_file} mustreference an existing character device or file. Consultthe Oracle installation documentation to recreate thespecified file {database_file} as a supported file orcharacter device type.

12250-8

ERROR DB Chunk "%s" for "%s" on"%s" does not reside on ashared file system.

Cause: The specified tablespace, logfile or otherrequired database file {database_file} does not resideon a file system that is shared with other systems inthe cluster.

Action: Use the LifeKeeper UI or "lcdstatus (1M)" toverify that communication paths have been properlycreated. Use "rpm" to verify that the necessaryApplication Recovery Kits for storage protection havebeen installed. Verify that the file is, in fact, not onshared storage, and if not, move it to a shared storagedevice.

12251-0

ERROR File system create failed for"%s" on "%s". Reason

Cause: LifeKeeper was unable to create the resource{filesystem} on the specified server {server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the "create" operation.

12251-1

ERROR %s Cause: Themessage contains the output of the"filesyshier" command.

Action: Check the adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

SIOS Protection Suite for LinuxPage 417

Page 438: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12251-3

ERROR Dependency creation betweenOracle database "%s (%s)" andthe dependent resource "%s" on"%s" failed. Reason

Cause: LifeKeeper was unable to create adependency between the database resource {tag} andthe necessary child resource {childtag}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Once any problemshave been corrected, it may be possible to create thedependency between {tag} and {childtag} manually.

12251-4

ERROR Unable to "%s" on "%s" duringresource create.

Cause: TheOracle Application Recovery Kit wasunable to release the administrative lock using the"rlslocks" command.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

12251-6

ERROR Raw device resource createdfailed for "%s" on "%s". Reason

Cause: LifeKeeper was unable to create the resource{raw device} on the specified server {server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the "create" operation.

12251-9

ERROR In-service attempted failed fortag "%s" on "%s".

Cause: The "perform_action" command for {tag} on{server} failed to start the database {sid}. The in-service operation has failed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the "create" operation.

12252-1

ERROR Create of app "%s" on "%s"failed with return code of "%d".

Cause: There was an error running the command"app_create" to create the internal application type.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12252-2

ERROR Create of typ "%s" for app "%s"on "%s" failed with return codeof "%d".

Cause: There was an error running the command"typ_create" to create the internal resource type.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

SIOS Protection Suite for LinuxPage 418

Page 439: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12252-4

ERROR Setting "resstate" for resource"%s" on "%s" failed with returncode of "%d".

Cause: There was an error running the command"ins_setstate" to set the resource state to {state}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12252-5

ERROR The values specified for the tar-get and the template servers arethe same: "%s".

Cause: The value specified for the target andtemplate servers for the "extend" operation were thesame.

Action: Youmust specify the correct parameter forthe {target server} and {template server}. The {targetserver} is the server where the {tag} will be extended.

12252-6

ERROR Unable to locate the oratab file in"/etc" or in "%s" on "%s".

Cause: The oratab file was not found at the default oralternate locations on {server}.

Action: Verify the oratab file exists and has properpermissions for the Oracle user. A valid oratab file isrequired to complete the "extend" operation.

12252-7

ERROR Unable to retrieve the Oracleuser on "%s".

Cause: An attempt to retrieve the Oracle user from{template server} during a "canextend" or "extend"operation failed.

Action: The owner of the Oracle binaries must be avalid user on {target server} and {template server}.Correct the permissions and ownership of the Oracledatabase installation and retry the operation.

12252-8

ERROR TheOracle user and/or groupinformation for user "%s" doesnot exist on the server "%s".

Cause: LifeKeeper is unable to find the Oracle userand/or group information for the Oracle user {user} onthe server {server}.

Action: Verify the Oracle user {user} exists on thespecified {server}. If the user {user} does not exist, itshould be created with the same uid and gid on allservers in the cluster.

12252-9

ERROR The id for user "%s" is not thesame on template server "%s"and target server "%s".

Cause: The user id on the target server {target server}for the Oracle user {user} does not match the value ofthe user {user} on the template server {templateserver}.

Action: The user ids for the Oracle user {user} mustmatch on all servers in the cluster. The user idmismatch should be correctedmanually on allservers before retrying the "extend" operation.

SIOS Protection Suite for LinuxPage 419

Page 440: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12253-0

ERROR The group id for user "%s" is notthe same on template server"%s" and target server "%s".

Cause: The group id on the target server {targetserver} for the Oracle user {user} does not match thevalue of the user {user} on the template server{template server}.

Action: The group ids for the Oracle user {user} mustmatch on all servers in the cluster. The group idmismatch should be correctedmanually on allservers before retrying the "extend" operation.

12253-2

ERROR No file system or raw devicesfound to extend for "%s" on"%s".

Cause: There were no dependent file system or rawdevice resources found for the Oracle resource {tag}on server {template server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

12253-3

WARN A RAMDISK (%s) was detectedin the ORACLE Database con-figuration for "%s" on "%s".LifeKeeper cannot protectRAMDISK. This RAMDISKresource will not be protected byLifeKeeper! ORACLE hierarchycreation will continue.

Cause: The specified tablespace, logfile or otherdatabase file {database_file} was detected as aramdisk. No protection is available for this type ofresource in the current LifeKeeper product.

Action: The ramdisk will not be protected. Youmustmanually ensure that the required database file{database_file} will be available during all Oracledatabase operations.

12253-4

ERROR Failed to initialize objectinstance for Oracle sid "%s" on"%s".

Cause: There was an unexpected error creating aninternal representation of the Oracle instance beingprotected.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12253-7

ERROR Update of instance info field for"%s" on "%s" failed (%s).

Cause: There was an error while running thecommand "ins_setinfo" to update the internalresource information field.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

SIOS Protection Suite for LinuxPage 420

Page 441: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12253-8

ERROR Initial connect with query bufferto database "%s" on "%s" failed,testing output.

Cause: A connection attempt to the Oracle database{sid} to determine the database status has failed.

Action: The connection attempt failed with thespecified credentials. Check the adjacent logmessages for further details and related errors.Check the Oracle log (alert.log) and related trace logs(*.trc) for additional information and correct thereported problem(s).

12254-2

ERROR The "%s [ %s ]" attempt of thedatabase "%s" appears to havefailed on "%s".

Cause: The attemptedOracle action {action} usingmethod {action_method} for the database instance{sid} failed on the server {server}.

Action: Check the adjacent logmessages for furtherdetails and related errors. Check the Oracle log(alert.log) and related trace logs (*.trc) for additionalinformation and correct the reported problem(s).

12254-3

ERROR All attempts to "%s" database"%s" on "%s" failed

Cause: All efforts to perform the action {action} on theOracle database {sid} on server {server} have failed.

Action: Check the adjacent logmessages for furtherdetails and related errors. Check the Oracle log(alert.log) and related trace logs (*.trc) for additionalinformation and correct the reported problem(s).

12254-4

ERROR Update of "%s" sid "%s" on"%s" failed. Reason: "%s" "%s"failed: "%s".

Cause: An unexpected error occurred whileattempting to update the oratab entry for the database{sid}. The error occurred while attempting to open theoratab file.

Action: The oratab file entry for {sid} will need to beupdatedmanually to turn off the automatic start up ofthe database at system boot.

12254-5

ERROR Unable to locate the oratab file in"/etc" or in "%s" on "%s".

Cause: The oratab file was not found at the default oralternate locations on {server}.

Action: Verify the oratab file exists and has properpermissions for the oracle user. A valid oratab file isrequired to complete the "extend" operation.

SIOS Protection Suite for LinuxPage 421

Page 442: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12254-6

ERROR Unable to open file "%s" on "%s"(%s).

Cause: The specified file {file} could not be opened oraccessed on the server {server} due to the error{error}.

Action: Verify the existence and permissions on thespecified file {file}. Check adjacent logmessages forfurther details and related errors. Youmust correctany reported errors before retrying the operation.

12254-7

ERROR (cleanUpPids):Forcefully killinghung pid(s):pid(s)="%s"

Cause: The process {pid} failed to respond to therequest to terminate gracefully. The process {pid} willbe forcefully terminated.

Action: Use the command line to verify that theprocess {pid} has been terminated. Check theadjacent logmessages for further details and relatedmessages.

12254-8

ERROR Unable to locate the DB utility(%s/%s) on this host.

Cause: TheOracle binaries and required databaseutility {utility} located at {path/utility} were not foundon this server {server}.

Action: Verify that the Oracle binaries and requiredsoftware utilities are installed and properly configuredon the server {server}. TheOracle binaries must beinstalled locally on each node or located on sharedstorage available to all nodes in the cluster.

12254-9

ERROR Oracle internal error or non-stand-ard Oracle configuration detec-ted. Oracle User and/or Groupset to "root".

Cause: The detected ownership of the Oracledatabase installation resolves to the root user and/orroot group. Ownership of the Oracle installation byroot is a non-standard configuration.

Action: The owner of the Oracle binaries must be avalid non-root user on {server}. Correct thepermissions and ownership of the Oracle databaseinstallation and retry the operation.

12255-0

ERROR Initial inspection of "%s" failed,verifying failure or success ofreceived output.

Cause: The previous Oracle query {query} orcommand {cmd} failed to return success.

Action: Check the adjacent logmessages for furtherdetails and related errors. Check the Oracle log(alert.log) and related trace logs (*.trc) for additionalinformation and correct the reported problem(s).

SIOS Protection Suite for LinuxPage 422

Page 443: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12255-1

ERROR Logon failed with "%s" for "%s"on "%s". Please check user-name/password and privileges.

Cause: The logon with the credentials {credentials}for the database instance {sid} on server {server}failed. An invalid user {user} or password wasspecified.

Action: Verify that the Oracle database user {user}and password {password} are indeed valid. Inaddition, the Oracle database user {user} must havesufficient privileges for the attempted action.

12255-2

ERROR %s Cause: Themessage contains the output of the"sqlplus" command.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

12255-3

ERROR Unable to open file "%s" on "%s"(%s).

Cause: The specified file {file} could not be opened oraccessed on the server {server} due to the error{error}.

Action: Verify the existence and permissions on thespecified file {file}. Check adjacent logmessages forfurther details and related errors. Youmust correctany reported errors before retrying the operation.

12255-4

ERROR The tag "%s" on "%s" is not anOracle instance or it does notexist.

Cause: The specified tag {tag} on server {server}does not refer to an existing and valid Oracle resourceinstance.

Action: Use the UI or "lcdstatus (1M)" to verify theexistence of the resource tag {tag}. The resource tag{tag} must be anOracle resource instance to use thecommand "ora-display."

12255-5

ERROR Failed to create object instancefor Oracle on "%s".

Cause: There was an unexpected error creating aninternal representation of the Oracle instance beingprotected while attempting to update the authorizeduser, password and database role for the Oracleresource instance.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

SIOS Protection Suite for LinuxPage 423

Page 444: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12255-7

ERROR Update of user and passwordfailed for "%s" on "%s".

Cause: A request to update the user and passwordfor the resource tag {tag} failed. The specifiedcredentials failed the initial validation/connectionattempt on server {server}.

Action: Verify the correct credentials{user/password} were specified for the attemptedoperation. Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12255-9

ERROR Update of user and passwordfailed for "%s" on "%s".

Cause: The update of the user and passwordinformation for the resource tag {tag} on server{server} failed.

Action: Verify the correct credentials{user/password} were specified for the attemptedoperation. Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12256-2

ERROR Unable to find the Oracle execut-able "%s" on "%s".

Cause: The required Oracle executable {exe} was notfound on this server {server}.

Action: Verify that the Oracle binaries and requiredsoftware utilities are installed and properly configuredon the server {server}. TheOracle binaries must beinstalled locally on each node or located on sharedstorage available to all nodes in the cluster.

12256-6

ERROR Unable to find Oracle home for"%s" on "%s".

Cause: TheOracle home directory {Oracle home}does not appear to contain files necessary for theproper operation of the Oracle instance {sid}.

Action: Verify using the command line that theOracle home directory {Oracle home} contains theOracle binaries, a valid spfile{sid}.ora or init{sid}.orafile.

12256-7

ERROR Oracle SID mismatch. Theinstance SID "%s" does notmatch the SID "%s" specifiedfor the command.

Cause: There was an unexpected error creating aninternal representation of the Oracle instance beingprotected. The specified internal ID {id} does notmatch the expected SID {sid}.

Action: Verify the parameters are correct. Checkadjacent logmessages for further details and relatedmessages. Youmust correct any reported errorsbefore retrying the operation.

SIOS Protection Suite for LinuxPage 424

Page 445: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12256-8

ERROR DB Processes are not runningon "%s".

Cause: A process check for the Oracle instance didnot find any processes running on server {server}.

Action: If local recovery is enabled, the Oracleinstance will be restarted locally. Check adjacent logmessages for further details and relatedmessages.

12257-2

ERROR Failed to create flag "%s" on"%s".

Cause: An unexpected error occurred attempting tocreate a flag for controlling Oracle local recoveryprocessing causing a failover to the standby node.

Action: Check the adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12257-4

ERROR all attempts to shutdown thedatabase%s failed on "%s".

Cause: The shutdown of the Oracle database failedduring a local recovery process most likely causedbecause themaximum number of databaseconnections has been reached.

Action: Check the Oracle logs for connection failurescaused by themaximum number of availableconnections being reached, and if found, considerincreasing the value. Additionally, set the tunable LK_ORA_NICE to 1 to prevent connection failures fromcausing a quickCheck failure followed by a localrecovery attempt.

12259-7

ERROR Failed to create object instancefor Oracle on "%s".

Cause: There was an unexpected error creating aninternal representation of the Oracle instance beingprotected during pre-extend checking.

Action: Check the adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the pre-extend.

12259-8

ERROR Failed to create object instancefor Oracle on "%s".

Cause: There was an unexpected error creating aninternal representation of the Oracle instance beingcreated while attempting to determine the validity ofthe Oracle home directory.

Action: Check the adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the "create."

SIOS Protection Suite for LinuxPage 425

Page 446: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12259-9

ERROR Failed to create object instancefor Oracle on "%s".

Cause: There was an unexpected error creating aninternal representation of the Oracle instance beingprotected while attempting to look up theOracle useron the template system.

Action: Check the adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the extend.

12260-0

ERROR Failed to create object instancefor Oracle on "%s".

Cause: There was an unexpected error creating aninternal representation of the Oracle instance beingprotected while attempting to display the resourceproperties.

Action: Check the adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the display of the resourceproperties.

12260-1

ERROR Failed to create object instancefor Oracle on "%s".

Cause: There was an unexpected error creating aninternal representation of the Oracle instance beingprotected while attempting to check for validdatabase authorization.

Action: Check the adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the command.

12260-3

ERROR Failed to create object instancefor Oracle on "%s".

Cause: There was an unexpected error creating aninternal representation of the Oracle instance beingprotected while attempting to perform health checkson theOracle resource instance.

Action: Check that correct arguments were passedto the quickCheck command and also check theadjacent logmessages for further details and relatedmessages. Correct any reported errors before retryingthe restore.

12260-4

ERROR Failed to create object instancefor Oracle on "%s".

Cause: There was an unexpected error creating aninternal representation of the Oracle instance beingprotected while attempting to perform a local recoveryon the Oracle resource instance.

Action: Check that correct arguments were passedto the "recover" command, and also check theadjacent logmessages for further details and relatedmessages. Correct any reported errors before retryingthe recover.

SIOS Protection Suite for LinuxPage 426

Page 447: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12260-6

ERROR TheOracle database "%s" is notrunning or no open connectionsare available on "%s".

Cause: The database instance {sid} was not runningor connections to the database are not available viathe credentials provided.

Action: The database instance {sid} must be startedon {server} and the proper credentials must beprovided for the completion of the selected operation.

12260-7

ERROR TheOracle database "%s" is notrunning or no open connectionsare available on "%s".

Cause: The database instance {sid} was not runningor connections to the database are not available viathe credentials provided.

Action: The database instance {sid} must be startedon {server} and the proper credentials must beprovided for the completion of the selected operation.

12260-8

ERROR Failed to create object instancefor Oracle on "%s".

Cause: The "remove" operation failed to create theresource object instance required to take the Oracleresource Out of Service.

Action: Check that correct arguments were passedto the "remove" command and also check theadjacent logmessages for further details and relatedmessages. Correct any reported errors before retryingthe restore.

12260-9

ERROR Failed to create object instancefor Oracle on "%s".

Cause: The "restore" operation failed to create theresource object instance required to put the Oracleresource In Service.

Action: Check that correct arguments were passedto the "restore" command and also check theadjacent logmessages for further details and relatedmessages. Correct any reported errors before retryingthe "restore."

12261-0

ERROR Unable to "%s" on "%s" duringresource create.

Cause: TheOracle Application Recovery Kit wasunable to create the administrative lock using the"getlocks" command during resource creation.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the create.

SIOS Protection Suite for LinuxPage 427

Page 448: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12261-1

ERROR %s Cause: The requested dependency creation betweenthe parent Oracle resource and the child File Systemresource failed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the create operation.

12261-2

ERROR %s Cause: The requested dependency creation betweenthe parent Oracle resource and the child Rawresource failed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the create operation.

12261-3

ERROR %s Cause: The requested dependency creation betweenthe parent Oracle resource and the child Rawresource failed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the create operation.

12261-4

ERROR %s Cause: The requested dependency creation betweenthe parent Oracle resource and the child Listenerresource failed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the create operation.

12261-6

ERROR %s Cause: The requested start up or shutdown of theOracle database failed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors before retrying the "restore" or "remove"operation.

12261-8

ERROR Dependency creation betweenOracle database "%s (%s)" andthe dependent resource "%s" on"%s" failed. Reason

Cause: LifeKeeper was unable to create adependency between the database resource {tag} andthe necessary child resource {childtag}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Once any problemshave been corrected, it may be possible to create thedependency between {tag} and {childtag} manually.

SIOS Protection Suite for LinuxPage 428

Page 449: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12261-9

ERROR Dependency creation betweenOracle database "%s (%s)" andthe dependent resource "%s" on"%s" failed. Reason

Cause: LifeKeeper was unable to create adependency between the database resource {tag} andthe necessary child resource {childtag}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Once any problemshave been corrected, it may be possible to create thedependency between {tag} and {childtag} manually.

12262-5

ERROR Unable to find the Oracle execut-able "%s" on "%s".

Cause: The quickCheck process was unable to findthe Oracle executable "sqlplus."

Action: Check the Oracle configuration and alsocheck adjacent logmessages for further details andrelatedmessages. Correct any reported problems.

12262-6

ERROR Unable to find the Oracle execut-able "%s" on "%s".

Cause: The remove process was unable to find theOracle executable "sqlplus."

Action: Check the Oracle configuration and alsocheck adjacent logmessages for further details andrelatedmessages. Correct any reported problems.

12262-7

ERROR Unable to find the Oracle execut-able "%s" on "%s".

Cause: The restore process was unable to find theOracle executable "sqlplus."

Action: Check the Oracle configuration and alsocheck adjacent logmessages for further details andrelatedmessages. Correct any reported problems.

12262-8

ERROR Unable to find the Oracle execut-able "%s" on "%s".

Cause: The recover process was unable to find theOracle executable "sqlplus."

Action: Check the Oracle configuration and alsocheck adjacent logmessages for further details andrelatedmessages. Correct any reported problems.

12263-2

ERROR Oracle SID mismatch. Theinstance SID "%s" does notmatch the SID "%s" specifiedfor the command.

Cause: During a remove, the resource instance {sid}passed to the remove process does not matchinternal resource instance information for the {sid}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

SIOS Protection Suite for LinuxPage 429

Page 450: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12263-3

ERROR Oracle SID mismatch. Theinstance SID "%s" does notmatch the SID "%s" specifiedfor the command.

Cause: During a restore, the resource instance {sid}passed to restore does not match internal resourceinstance information for the {sid}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12263-4

ERROR Oracle SID mismatch. Theinstance SID "%s" does notmatch the SID "%s" specifiedfor the command.

Cause: During resource recovery, the resourceinstance {sid} passed to recovery does not matchinternal resource instance information for the {sid}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12263-6

ERROR END failed hierarchy "%s" ofresource "%s" on server "%s"with return value of %d

Cause: The create of the Oracle resource hierarchy{tag} failed on {server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12263-8

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: The create action for the Oracle databaseresource {tag} on server {server} failed. The signal{sig} was received by the create process.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12264-0

ERROR Error creating resource "%s" onserver "%s"

Cause: An unexpected error occurred attempting tocreate the Oracle resource instance {tag} on {server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12264-1

ERROR Error creating resource "%s" onserver "%s"

Cause: An unexpected error occurred attempting tocreate the Oracle resource instance {tag} on {server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

SIOS Protection Suite for LinuxPage 430

Page 451: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12264-2

ERROR Error creating resource "%s" onserver "%s"

Cause: An unexpected error occurred attempting tocreate the Oracle resource instance {tag} on {server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12264-3

ERROR Cannot extend resource "%s" toserver "%s"

Cause: LifeKeeper was unable to extend theresource {resource} on {server}.

Action: Check the adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12264-4

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: An unexpected error occurred attempting toretrieve resource instance information for {tag} on{server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors and retry the extend.

12264-5

ERROR Cannot access canextend script"%s" on server "%s"

Cause: LifeKeeper was unable to run pre-extendchecks because it was unable to find the "canextend"script on {server} for a dependent child resource.

Action: Check your LifeKeeper configuration.

12264-6

ERROR Error getting resource inform-ation for resource "%s" on server"%s"

Cause: An unexpected error occurred attempting toretrieve resource instance information for {tag} on{server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors and retry the extend.

12264-7

ERROR Cannot extend resource "%s" toserver "%s"

Cause: LifeKeeper was unable to extend theresource {resource} on {server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

SIOS Protection Suite for LinuxPage 431

Page 452: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12264-8

ERROR Resource with either matchingtag "%s" or id "%s" alreadyexists on server "%s" for App"%s" and Type "%s"

Cause: During the database resource extension, aresource instance was found using the same {tag}and/or {id} but with a different resource applicationand type.

Action: Resource IDs must be unique. The resourceinstance with the ID matching the Oracle resourceinstancemust be removed.

12264-9

ERROR Error creating resource "%s" onserver "%s"

Cause: An unexpected error occurred attempting tocreate the Oracle resource instance {tag} on {server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12265-0

ERROR Cannot access extend script"%s" on server "%s"

Cause: The request to extend the database resource{resource} to {server} failed because it was unable tothe find the script {extend} on {server} for a dependentchild resource.

Action: Check your LifeKeeper configuration.

12265-1

ERROR Cannot extend resource "%s" toserver "%s"

Cause: The request to extend the database resouce{resource} to {server} failed because of an errorattempting to extend a dependent child resource.

Action: Check the adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12265-4

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: The health check for the database {sid} wasterminated because the quickCheck processreceived a signal. This is most likely caused by thequickCheck process requiringmore time to completethan was allotted.

Action: The health check time for anOracle resourceis controlled by the tunable value ORACLE_QUICKCHECK_TIMEOUT. Set it to a value greaterthan 45 seconds to allow more time for the healthcheck process to complete.

SIOS Protection Suite for LinuxPage 432

Page 453: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12265-5

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: The request to take database {sid} "Out ofService" was terminated because the removeprocess received a signal. This is most likely causedby the remove process requiringmore time tocomplete than was allotted.

Action: The remove time for anOracle resource iscontrolled by the tunable value ORACLE_REMOVE_TIMEOUT. Set the tunable to a value greater than240 seconds to allow more time for the removeprocess to complete.

12265-9

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: The request to place database {sid} "InService" was terminated because the restore processreceived a signal. This is most likely caused by therestore process requiringmore time to complete thanwas allotted.

Action: The restore time for anOracle resource iscontrolled by the tunable value ORACLE_RESTORE_TIMEOUT. Set the tunable to a valuegreater than 240 seconds to allow more time for therestore process to complete.

12266-3

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: The recovery of the failed database wasterminated because the recovery process received asignal. This is most likely caused by the recoveryprocess requiringmore time to complete than wasallotted.

Action: The recovery time for anOracle resource iscontrolled by the tunable values ORACLE_RESTORE_TIMEOUT andORACLE_REMOVE_TIMEOUT. Set one or both of these to a value greaterthan 240 seconds to allow more time for a recovery tocomplete.

12267-0

ERROR Update of "%s" sid "%s" on"%s" failed. Reason: "%s" "%s"failed: "%s".

Cause: An unexpected error occurred whileattempting to update the oratab entry for the database{sid}. The error occurred while attempting to open thetemporary file used in the update process.

Action: The oratab file entry for {sid} will need to beupdatedmanually to turn off the automatic start up ofthe database at system boot.

SIOS Protection Suite for LinuxPage 433

Page 454: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12267-1

ERROR Update of "%s" sid "%s" on"%s" failed. Reason: "%s" "%s"failed: "%s".

Cause: An unexpected error occurred whileattempting to update the oratab entry for the database{sid}. The error occurred while attempting to close thetemporary file used in the update process.

Action: The oratab file entry for {sid} will need to beupdatedmanually to turn off the automatic start up ofthe database at system boot.

12267-2

ERROR Update of "%s" sid "%s" on"%s" failed. Reason: "%s" "%s"failed: "%s".

Cause: An unexpected error occurred whileattempting to update the oratab entry for the database{sid}. The error occurred while attempting to renamethe temporary file back to oratab.

Action: The oratab file entry for {sid} will need to beupdatedmanually to turn off the automatic start up ofthe database at system boot.

12267-3

ERROR Unable to logmessages queuedwhile running as oracle user%son%s. Reason: $!

Cause: An unexpected error {reason} occurred whileattempting to addmessages to the log file. Thesemessages were generated while running as theOracle user.

Action: Review the reason for the failure and takecorrective action.

12267-4

ERROR Unable to open%s Reason: %s. Cause: An unexpected error occurred whileattempting to open a connection to the Oracledatabase and run the database {cmd}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Additionally, check theOracle log (alert.log) and related trace logs (*.trc) foradditional information and correct any reportedproblems.

12300-6

FATAL Unknown version%s of IPaddress

Cause: The IP address does not appear to be validfor either IPv4 or IPv6.

Action: Provide a valid IP address.

12300-8

ERROR No pinglist found for%s. Cause: Problem while opening the pinglist for this IPaddress.

Action: Make sure you have provided a pinglist forthis IP address.

SIOS Protection Suite for LinuxPage 434

Page 455: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12300-9

ERROR List ping test failed for virtual IP%s

Cause: No response was received from any of theaddresses in the ping list.

Action: Check network connectivity of this node andthe systems on which the IPs in the ping list reside.

12301-3

ERROR Link check failed for virtual IP%s on interface%s.

Cause: The requested interface is showing 'NO-CARRIER' indicating that no link is present on thephycial layer connection.

Action: Check the physical connections for theinterface and bring the physical layer link up.

12301-5

ERROR Link check failed for virtual IP%s on interface%s.

Cause: The requested interface is a bondedinterface, and one of the slaves is showing 'NO-CARRIER' indicating that no link is present on thephycial layer connection.

Action: Check the physical connections for the slaveinterface and bring the physical layer link up.

12302-4

ERROR IP address seems to still existsomewhere else.

Cause: The IP address appears to be in useelsewhere on the network.

Action: Either select a different IP address to use orlocate and disable the current use of this IP address.

12302-7

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: The quickCheck of a virtual IP ended due to atimeout.

Action: Correct the condition or provide amoreappropriate value for IP_QUICKCHECK_TIMEOUTin /etc/default/LifeKeeper.

12303-7

ERROR must specify machine name con-taining primary hierarchy

Cause: Not enough arguments were provided tocreIPhier.

Action: Supply all of the needed arguments tocreIPhier.

12303-8

ERROR must specify IP resource name Cause: Not enough arguments were passed tocreIPhier.

Action: Supply all of the needed arguments tocreIPhier.

12303-9

ERROR must specify primary IPResource tag

Cause: The argument specifying the primary IPResource tag was missing from the "creIPhier"command.

Action: Supply all of the needed arguments.

SIOS Protection Suite for LinuxPage 435

Page 456: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12304-2

ERROR An unknown error has occurredin utility validmask onmachine%s.

Cause: There was an unexpected error running the"validmask" utility.

Action: Check adjacent logmessages for additionaldetails.

12304-5

ERROR An unknown error has occurredin utility getlocks.

Cause: There was an unexpected error running the"getlocks" utility.

Action: Check adjacent logmessages for additionaldetails.

12305-3

ERROR Cannot resolve hostname%s Cause: A hostnamewas provided for the IP address,but the system was unable to resolve the name to anIP address.

Action: Check the correctness of the hostname andverify that name resolution (DNS or /etc/hosts) isworking correctly and returns the IP for the hostname.

12305-5

ERROR An unknown error has occurredin utility %s onmachine%s.

Cause: There was a failure while creating the IPresource.

Action: Check the logs for related errors and try toresolve the reported problem.

12305-6

ERROR create ip hierarchy failure: per-form_action failed

Cause: Unexpected error trying to restore the IPaddress during creation.

Action: Check adjacent logmessages for additionaldetails.

12305-9

ERROR Resource already exists onmachine%s

Cause: Attempted to create an IP address thatalready exists.

Action: Reuse the existing resource or manuallyremove the IP address if it exists or use a different IPaddress.

12306-0

ERROR ins_create failed onmachine%s Cause: An unexpected failure occurred while creatingan IP resource.

Action: Check adjacent logmessages for furtherdetails.

12306-4

ERROR An unknown error has occurredin utility %s onmachine%s.

Cause: There was a failure while creating adependency for the IP resource.

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 436

Page 457: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12306-6

ERROR An error occurred during creationof LifeKeeper application=common%s.

Cause: A failure occured while calling "app_create."

Action: Check the logs for related errors and try toresolve the reported problem.

12306-8

ERROR An error occurred during creationof LifeKeeper resource type=ipon%s.

Cause: A failure occured while calling "typ_create."

Action: Check the logs for related errors and try toresolve the reported problem.

12309-1

ERROR the link for interface%s is down Cause: The requested interface is showing 'NO-CARRIER' indicating that no link is present on thephycial layer connection.

Action: Check the physical connections for theinterface and bring the physical layer link up.

12309-3

ERROR the ping list check failed Cause: No response was received from any of theaddresses in the ping list.

Action: Check network connectivity of this node andthe systems on which the IPs in the ping list reside.

12309-5

ERROR broadcast ping failed Cause: No replies were received from a broadcastping.

Action: Verify that at least one host on the subnetwill respond to broadcast pings. Verify that virtual IPis on the correct network interface. Consider using apinglist instead of a broadcast ping.

12309-6

ERROR $msg Cause: The broadcast ping used to determine theviability of the virtual IP failed.

Action: Please ensure that the ping list for thisresource is properly configured in the properties panelor that broadcast ping checking is disabled by addingNOBCASTPING=1 to the /etc/default/LifeKeeperconfiguration file.

12309-7

ERROR exec_list_ping(): broadcast pingfailed.

Cause: The broadcast ping used to determine theviability of the virtual IP failed.

Action: Ensure that the ping list for this resource isproperly configured in the properties panel or thatbroadcast ping checking is disabled by addingNOBCASTPING=1 to the /etc/default/LifeKeeperconfiguration file.

SIOS Protection Suite for LinuxPage 437

Page 458: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12400-4

FATAL resource tag name not specified Cause: Invalid arguments were specified for the"quickCheck" operation.

Action: Ensure that the correct arguments arepassed.

12400-5

FATAL resource id not specified Cause: Invalid arguments were specified for the"quickCheck" operation.

Action: Ensure that the correct arguments arepassed.

12400-7

FATAL Failed to get resource inform-ation

Cause: The filesystem resource's info field does notcontain the correct information.

Action: Put the correct information in the resource'sinfo field or restore the system from a recent"lkbackup" to restore the original info field.

12400-8

ERROR getId failed Cause: The filesystem resource could not find theunderlying disk device.

Action: Check adjacent logmessages for furtherdetails. Verify that the resource hierarchy is valid andthat all required storage kits are installed.

12400-9

ERROR LifeKeeper protected filesystemis in service but quickCheckdetects the following error

Cause: The filesystem kit has found somethingwrong with the resource.

Action: Check themessages immediately followingthis one for more details.

12401-0

ERROR \"$id\" is not mounted Cause: The filesystem resource is no longermounted.

Action: No action is required. Allow local recovery toremount the resource.

12401-1

ERROR \"$id\" is mounted but with theincorrect mount options (currentmount option list: $mntopts,expectedmount option list: $in-foopts

Cause: The filesystem resource is mountedincorrectly.

Action: No action is required. Allow local recovery toremount the resource.

12401-2

ERROR \"$id\" is mounted but on thewrong device (current mountdevice: $tmpdev, expectedmount device: $dev

Cause: The filesystem resource has the wrongdevicemounted.

Action: No action is required. Allow local recovery toremount the resource.

SIOS Protection Suite for LinuxPage 438

Page 459: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12401-5

ERROR LifeKeeper protected filesystem\"$tag\" ($id) is $percent% full($blocksfree free blocks).

Cause: The filesystem is getting full.

Action: Remove or migrate data from the filesystem.

12401-6

WARN LifeKeeper protected filesystem\"$tag\" ($id) is $percent% full($blocksfree free blocks).

Cause: The filesystem is getting full.

Action: Remove or migrate data from the filesystem.

12402-0

FATAL cannot find device informationfor filesystem $id

Cause: The filesystem resource could not find theunderlying disk device.

Action: Check adjacent logmessages for furtherdetails. Verify that the resource hierarchy is valid andthat all required storage kits are installed.

12402-9

ERROR Failed to find child resource. Cause: The filesystem resource could not determineits underlying disk resource.

Action: Ensure that the resource hierarchy is correct.

12403-2

FATAL Script has hung. Exiting. Cause: Processes had files open on amountedfilesystem that needed to be unmounted. Killing thoseprocesses has taken too long.

Action: If this error continues, try to temporarily stopall software that may be using themount point toallow it to be unmounted. If the filesystem still cannotbe unmounted, contact Support.

12404-2

ERROR file system $fsname failedunmount; will try again

Cause: Processes had files open on amountedfilesystem that needed to be unmounted. It can takemultiple attempts to clear those processes.

Action: No action is required. Allow the process tocontinue.

12404-6

ERROR file system $fsname failedunmount

Cause: A filesystem could not be unmounted.

Action: If this error continues, try to temporarily stopall software that may be using themount point toallow it to be unmounted. If the filesystem still cannotbe unmounted, contact Support.

12404-9

ERROR Local recovery of resource hasfailed (err=$err

Cause: A filesystem resource has a problem thatcannot be repaired locally.

Action: No action is required. Allow the resource tobe failed over to another system.

SIOS Protection Suite for LinuxPage 439

Page 460: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12410-3

ERROR $ERRMSGScript was ter-minated for unknown reason

Cause: This message should not occur under normalcircumstances.

Action: Check adjacent logmessages for furtherdetails.

12410-4

ERROR $ERRMSGRequired templatemachine name is null

Cause: Invalid arguments were specified for thecanextend operation.

Action: Ensure that the arguments are correct. If thiserror happens during normal operation, please contactSupport.

12410-5

ERROR $ERRMSGRequired templateresource tag name is null

Cause: Invalid arguments were specified for thecanextend operation.

Action: Ensure that the arguments are correct. If thiserror happens during normal operation, please contactSupport.

12410-6

ERROR $ERRMSGUnable to accesstemplate resource \"$Tem-plateTagName\

Cause: The resource's underlying disk informationcannot be determined.

Action: Ensure the hierarchy is correct on thetemplate system before extending.

12410-7

ERROR $ERRMSGResource \"$Tem-plateTagName\" must have oneand only one device resourcedependency

Cause: The resource has toomany underlyingdevices in the hierarchy.

Action: Ensure the hierarchy is correct on thetemplate system before extending.

12410-8

ERROR $ERRMSGUnable to accesstemplate resource \"$Tem-plateTagName\

Cause: The resource cannot be found on thetemplate system.

Action: Ensure the hierarchy is correct on thetemplate system before extending.

12410-9

ERROR $ERRMSGCan not accesscanextend forscsi/$DeviceResTyperesources onmachine \"$Tar-getSysName\

Cause: The target system is missing some requiredcomponents.

Action: Ensure that the target system has all thecorrect kits installed and licensed.

SIOS Protection Suite for LinuxPage 440

Page 461: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12411-0

ERROR $ERRMSGEither filesystem\"$TemplateLKId\" is not moun-ted on \"$TemplateSysName\"or filesystem is not shareablewith \"$TargetSysName\

Cause: The filesystem isn't in service on thetemplate system or doesn't meet the requirements forextending to the target system.

Action: Make sure the resource is in service on thetemplate system and review the productdocumentation regarding the requirements forextending filesystems.

12411-1

ERROR $ERRMSGFile system type\"${FSType}\" is not supportedby the kernel currently runningon \"${TargetSysName}\

Cause: The filesystem's type cannot bemounted onthe target system due to lack of kernel support.

Action: Ensure that the target system has all itskernel modules configured correctly before extendingthe resource.

12411-2

ERROR must specify machine name con-taining primary hierarchy

Cause: Invalid arguments were specified for thecreFShier operation.

Action: If this error happens during normal operation,please contact Support.

12411-3

ERROR must specify primary ROOT tag Cause: Invalid arguments were specified for thecreFShier operation.

Action: If this error happens during normal operation,please contact Support.

12411-4

ERROR must specify primary mountpoint

Cause: Invalid arguments were specified for thecreFShier operation.

Action: If this error happens during normal operation,please contact Support.

12411-5

ERROR must specify primary switch-back type

Cause: Invalid arguments were specified for thecreFShier operation.

Action: If this error happens during normal operation,please contact Support.

12411-8

ERROR dep_remove failure onmachine\""$PRIMACH"\" for parent\"$PRITAG\" and child\"$DEVTAG.\

Cause: Cleanup after a dependency creation failed.

Action: Check adjacent logmessages for furtherdetails.

12411-9

ERROR ins_remove failure onmachine\""$PRIMACH"\" for\"$PRITAG.\

Cause: Cleanup after an instance creation failed.

Action: Check adjacent logmessages for furtherdetails.

SIOS Protection Suite for LinuxPage 441

Page 462: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12412-1

ERROR ins_remove failure onmachine\""$PRIMACH"\

Cause: Cleanup after a resource creation failed.

Action: Check adjacent logmessages for furtherdetails.

12412-2

ERROR $ERRMSGScript was ter-minated for unknown reason

Cause: This message should not occur under normalcircumstances.

Action: Check adjacent logmessages for furtherdetails.

12412-3

ERROR $ERRMSGRequired templatemachine name is null

Cause: Invalid arguments were specified for thedepstoextend operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12412-4

ERROR $ERRMSGRequired templateresource tag name is null

Cause: Invalid arguments were specified for thedepstoextend operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12412-5

ERROR $ERRMSGUnable to accesstemplate resource \"$Tem-plateTagName\

Cause: The resource was unable to locate itsunderlying disk resource.

Action: Ensure the hierarchy and all dependenciesare correct before extending.

12412-6

ERROR unextmgr failure onmachine\""$PRIMACH"\

Cause: The cleanup, after a failed resource extendoperation, failed.

Action: Manually clean up any remaining resourcesand check adjacent logmessages for further details.

12412-8

ERROR unextmgr failure onmachine\""$PRIMACH"\" for\"$PRITAG.\

Cause: The cleanup, after a failed resource extendoperation, failed.

Action: Manually clean up any remaining resourcesand check adjacent logmessages for further details.

12412-9

ERROR $ERRMSGScript was ter-minated for unknown reason

Cause: This message should not occur under normalcircumstances.

Action: Look for additional logmessages for moredetails.

SIOS Protection Suite for LinuxPage 442

Page 463: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12413-0

ERROR $ERRMSGRequired templatemachine name is null

Cause: Invalid arguments were specified for theextend operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12413-1

ERROR $ERRMSGRequired templateresource tag name is null

Cause: Invalid arguments were specified for theextend operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12413-2

ERROR $ERRMSGRequired targetmount point is null

Cause: Invalid arguments were specified for theextend operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12413-3

ERROR $ERRMSGUnable to accesstemplate resource \"$Tem-plateTagName\

Cause: The tag being extended doesn't exist on thetemplate system.

Action: Ensure that the hierarchy is correct on thetemplate system before extending.

12413-4

ERROR $ERRMSGDetected conflict inexpected tag name \"$Tar-getTagName\" on targetmachine.

Cause: A resource already exists on the targetsystem with the same tag as the resource beingextended.

Action: Recreate one of the conflicting resourceswith a different tag.

12413-5

ERROR $ERRMSGResource \"$Tem-plateTagName\" does not haverequired device resource depend-ency or unable to access thisresource on templatemachine.

Cause: The resource or its underlying disk resourcecannot be found on the template system.

Action: Ensure that the hierarchy is correct on thetemplate system before extending.

12413-6

ERROR $ERRMSGResource \"$Tem-plateTagName\" must have oneand only one device resourcedependency

Cause: The resource has multiple underlying devicesin the hierarchy on the template system.

Action: Ensure the hierarchy is correct beforeextending and that the filesystem resource onlydepends on a single disk resource.

SIOS Protection Suite for LinuxPage 443

Page 464: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12413-7

ERROR $ERRMSGCan not accessextend forscsi/$DeviceResTyperesources onmachine \"$Tar-getSysName\

Cause: The files required to support the given storagetype aren't available on the target system.

Action: Ensure that the required kits are installed onthe target system and licensed.

12413-8

ERROR $ERRMSGUnable to access tar-get device resource\"$DeviceTagName\" onmachine \"$TargetSysName\

Cause: The required underlying disk resource doesn'texist on the target system.

Action: Check adjacent logmessages for furtherdetails and ensure that the target system is properlyconfigured for hosting the resources being extended.

12413-9

ERROR $ERRMSGUnable to accesstemplate \"/etc/mtab\" file

Cause: The target system cannot read the templatesystem's /etc/mtab file.

Action: Check adjacent logmessages for furtherdetails. Ensure that the /etc/mtab file exists on thetemplate system.

12414-0

ERROR $ERRMSGUnable to findmountpoint entry \"$TemplateLKId\" intemplate \"/etc/mtab\" file. Istemplate resource in-service?

Cause: The resource doesn't appear to bemountedon the template system.

Action: Make sure the resource is in service beforeextending.

12414-1

ERROR $ERRMSGUnable to findmountpoint \"$TemplateLKId\" modeon templatemachine

Cause: The details of themount point on the templatesystem cannot be determined.

Action: Ensure that the resource is in service andaccessible on the template system before extending.

12414-2

ERROR $ERRMSGUnable to create oraccess mount point \"$Tar-getLKId\" on target machine

Cause: Themount point could not be created on thetarget system.

Action: Ensure that themount point's parentdirectory exists and is accessible on the targetsystem.

12414-3

ERROR $ERRMSGTwo ormore con-flicting entries found in /etc/fstabon \"$TargetSysName\

Cause: The device or mount point appears to bemountedmore than once on the target system.

Action: Ensure that themount point is not mountedon the target system before extending.

12414-4

ERROR $ERRMSGFailed to createresource instance on \"$Tar-getSysName\

Cause: The resource creation on the target systemfailed.

Action: Check adjacent logmessages for furtherdetails. Make sure to check the logs on the targetserver.

SIOS Protection Suite for LinuxPage 444

Page 465: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12414-5

ERROR $ERRMSGFailed to setresource instance state for\"$TargetTagName\" on \"$Tar-getSysName\

Cause: The source state could not be changed toOSU on the target system.

Action: Check adjacent logmessages for furtherdetails.

12414-6

ERROR must specify machine name con-taining primary hierarchy

Cause: Invalid arguments were specified for thefilesyshier operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12414-7

ERROR must specify primary mountpoint

Cause: Invalid arguments were specified for thefilesyshier operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12414-9

ERROR create file system hierarchy fail-ure

Cause: The process of finding the resource instancefailed.

Action: Check adjacent logmessages for furtherdetails.

12415-0

ERROR create file system hierarchy fail-ure

Cause: The system failed to read the /etc/mtab file.

Action: Check adjacent logmessages for furtherdetails.

12415-1

ERROR create file system hierarchy fail-ure

Cause: Themount point could not be found in the/etc/mtab file.

Action: Check adjacent logmessages for furtherdetails.

12415-2

ERROR create file system hierarchy fail-ure

Cause: The underlying disk resource could not befound.

Action: Check adjacent logmessages for furtherdetails.

12415-3

ERROR create file system hierarchy fail-ure

Cause: Creating the filesystem resource failed.

Action: Check adjacent logmessages for furtherdetails.

SIOS Protection Suite for LinuxPage 445

Page 466: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12415-4

ERROR create file system hierarchy fail-ure

Cause: The info field for the resource could not beupdated.

Action: Check adjacent logmessages for furtherdetails.

12415-5

ERROR create file system hierarchy fail-ure

Cause: The switchback strategy could not be set onthe resource.

Action: Check adjacent logmessages for furtherdetails.

12415-7

ERROR create file system hierarchy fail-ure \(conflicting entries in /etc/f-stab\

Cause: Themount point could not be removed fromthe /etc/fstab file.

Action: Check adjacent logmessages for furtherdetails.

12416-0

ERROR Unknown error in script filesys-ins, err=$err

Cause: This message should not occur under normalcircumstances.

Action: Check adjacent logmessages for furtherdetails.

12416-1

ERROR create filesys instance - existid -failure

Cause: This message should not occur under normalcircumstances.

Action: Check adjacent logmessages for furtherdetails.

12416-3

ERROR create filesys instance - ins_list -failure

Cause: Checking for an existing resource failed.

Action: Check adjacent logmessages for furtherdetails.

12416-4

ERROR create filesys instance - newtag- failure

Cause: The system failed to generate a suggestedtag for the resource.

Action: If this error happens during normal operation,contact Support.

12416-8

ERROR create filesys instance - ins_cre-ate - failure

Cause: The filesystem resource could not becreated.

Action: Check adjacent logmessages for furtherdetails.

12416-9

ERROR filesys instance - ins_setstate -failure

Cause: The new filesystem resource's state couldnot be initialized.

Action: Check adjacent logmessages for furtherdetails.

SIOS Protection Suite for LinuxPage 446

Page 467: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12417-3

ERROR create filesys instance - dep_cre-ate - failure

Cause: The resource's dependency relationship withits underlying disk could not be created.

Action: Check adjacent logmessages for furtherdetails.

12417-4

ERROR machine not specified Cause: Invalid arguments were specified for thermenu_mp operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12417-5

ERROR mount point not specified Cause: Invalid arguments were specified for thermenu_mp operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12417-7

ERROR unexpectedmultiple matchesfound

Cause: One or more systems show a filesystem ormount point usedmore than once.

Action: Verify filesystem devices andmount pointsand ensure that filesystems are only mounted once.Look for additional logmessages for more details.

12417-8

ERROR machine name not specified Cause: Invalid arguments were specified for thermenump operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12418-0

ERROR must specify filesystem type Cause: Invalid arguments were specified for thevalidfstype operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12418-1

ERROR mount point not specified Cause: Invalid arguments were specified for thevalidmp operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

SIOS Protection Suite for LinuxPage 447

Page 468: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12418-2

ERROR Themount point $MP is not anabsolute path

Cause: A mount point was specified that isn't anabsolute path (doesn't start with a '/').

Action: Specify amount point as an absolute pathstarting with a '/'.

12418-3

ERROR $MP is already mounted on$MACH

Cause: The requestedmount point is already in useon the system.

Action: Specify amount point that isn't in use orunmount it before retrying the operation.

12418-4

ERROR Themount point $MP is alreadyprotected by LifeKeeper on$MACH

Cause: The system is already protecting thespecifiedmount point.

Action: Choose a different mount point that isn'talready being protected.

12418-5

ERROR Themount point $MP is not a dir-ectory on $MACH

Cause: Themount point refers to a non-directorysuch as a regular file.

Action: Choose amount point that refers to adirectory.

12418-6

ERROR Themount point directory $MPis not empty on $MACH

Cause: The specifiedmount point refers to adirectory that isn't empty.

Action: Choose amount point that is empty orremove the contents of the specified directory beforeretrying the operation.

12418-7

ERROR server name not specified Cause: Invalid arguments were specified for thevaluepmp operation.

Action: Ensure the script is called correctly. If thiserror happens during normal operation, please contactSupport.

12418-8

ERROR There are nomount points onserver $MACH

Cause: There are no possible mount points forfilesystem resource on the server.

Action: Check adjacent logmessages for furtherdetails.

12419-4

WARN Please correct conflicting \"/etc/f-stab\" entries on server$UNAME for: $FSDEV,$FSNAME

Cause: After deleting a filesystem resource, someentries in /etc/fstab need tomemanually cleaned up.

Action: Manually clean up the /etc/fstab file.

SIOS Protection Suite for LinuxPage 448

Page 469: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12419-5

ERROR getchildinfo found no $OKAPPchild for $PTAG

Cause: The system could not find a child resource inthe hierarchy.

Action: Check adjacent logmessages for furtherdetails and ensure that the hierarchy is correct beforeretrying the operation.

12419-6

ERROR enablequotas - quotacheck mayhave failed for $FS_NAME

Cause: The quota operation failed.

Action: Check adjacent logmessages for furtherdetails in both the lifekeeper log and in/var/log/messages.

12419-8

ERROR enablequotas - quotaon failed toturn on quotas for $FS_NAME,reason

Cause: The quota operation failed.

Action: Check adjacent logmessages for furtherdetails in both the lifekeeper log and/var/log/messages.

12420-0

ERROR The device node $dev was notfound or did not appear in theudev create time limit of $delayseconds

Cause: A device node (/dev/...) was not created byudev. This may indicate an issue with the storage orthe server's connection to the storage.

Action: Check adjacent logmessages for furtherdetails in both the lifekeeper log and in/var/log/messages.

12420-1

WARN Device $device not found. Willretry wait to see if it appears.

Cause: This can happen under normal conditionswhile udev creates device node entries for storage.This message should not happen repeatedly.

Action: Check adjacent logmessages for furtherdetails in both the lifekeeper log and in/var/log/messages.

12420-2

ERROR Command \"$-commandwithargs\" failed.Retrying ....

Cause: The given command failed but may havefailed temporarily. This failuremay happen duringnormal operations but should not keep failing.

Action: Check adjacent logmessages for furtherdetails if this message continues.

12420-4

WARN cannot make file system$FSNAME mount point

Cause: Themount point directory could not becreated.

Action: Ensure that themount point can be created.This may be due to filesystem permissions, mountoptions, etc.

SIOS Protection Suite for LinuxPage 449

Page 470: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12420-7

ERROR \"fsck\"ing file system$FSNAME failed, trying altern-ative superblock

Cause: This message indicates that the typicalfilesystem check failed. This messagemay be ok forext2 filesystems or other filesystems where analternative superblock location is used.

Action: Check adjacent logmessages for furtherdetails.

12420-9

ERROR \"fsck\"ing file system$FSNAME with alternativesuperblock failed

Cause: This indicates that an ext2 filesystem (orother filesystem where an alternative superblocklocation is used) check failed with the alternativesuperblock location.

Action: Check adjacent logmessages for furtherdetails and instructions on how to proceed.

12421-0

WARN POSSIBLE FILESYSTEMCORRUPTION ON $FSNAME($FPNAME

Cause: A filesystem was put in service or failed overwhen it was out of sync with its mirror source.

Action: Check adjacent logmessages for furtherdetails and review the product documentation forinformation on how to bring the filesystem in servicesafely.

12421-1

ERROR Reason for fsck failure ($retval):$ret

Cause: This logmessage is part of a series ofmessages and gives the actual exit code from thefsck process.

Action: Check adjacent logmessages for furtherdetails and instructions on how to proceed.

12421-2

ERROR \"fsck\" of file system$FSNAME failed

Cause: The check of the filesystem failed. This isusually due to the filesystem having corruption.

Action: Check adjacent logmessages for furtherdetails. Review the product documentation forinstructions on how to handle possible filesystemcorruption.

12421-3

WARN POSSIBLE FILESYSTEMCORRUPTION ON $FSNAME($FPNAME

Cause: The system or user tried to bring into servicea filesystem that may be corrupted. This can happenif a filesystem is switched or failed over when it wasout of sync with its mirror source.

Action: Check adjacent logmessages for furtherdetails and review the product documentation forinstructions on how to bring the resource into servicesafely.

SIOS Protection Suite for LinuxPage 450

Page 471: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12421-4

ERROR Reason for fsck failure ($retval) Cause: This message should follow a previous logmessage about a filesystem check failure and givesthe process exit code of the fsck process.

Action: Check adjacent logmessages for furtherdetails.

12421-8

ERROR File system $FSNAME wasfound to be already

Cause: This message is part of a series ofmessages.

Action: Check adjacent logmessages for furtherdetails.

12421-9

ERROR mounted after initial mountattempt failed.

Cause: This message is part of a series ofmessages. This should not happen under normalcircumstances but may not be fatal if the resourcecan be put in service.

Action: Check adjacent logmessages for furtherdetails in both the lifekeeper log and in/var/log/messages.

12422-0

ERROR File system $FSNAME failed tomount.

Cause: The filesystem could not bemounted.

Action: Check adjacent logmessages for furtherdetails.

12422-1

WARN Protected Filesystem $ID is full Cause: The filesystem is full.

Action: Remove unused data from the filesystem ormigrate to a larger filesystem.

12422-2

WARN Dependent Applications may beaffected <>

Cause: This indicates that an operation on a resourceis likely to cause operation on other resources basedon the resource hierarchy.

Action: Make sure it's acceptable for the indicatedresources to be affected before continuing.

12422-3

ERROR Put \"$t\" Out-Of-Service FailedBy Signal

Cause: This message should not occur under normalcircumstances.

Action: Check adjacent logmessages for furtherdetails.

12422-7

ERROR Put \"$i\" Out-Of-Service Failed Cause: The operation failed.

Action: Check adjacent logmessages for furtherdetails.

SIOS Protection Suite for LinuxPage 451

Page 472: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12423-0

ERROR Put \"$t\" In-Service Failed BySignal

Cause: This message should not occur under normalcircumstances.

Action: Check adjacent logmessages for furtherdetails.

12423-1

ERROR Put \"$t\" In-Service Failed Cause: The operation failed.

Action: Check adjacent logmessages for furtherdetails.

12423-4

ERROR Put \"$t\" In-Service Failed Cause: The operation failed.

Action: Check adjacent logmessages for furtherdetails.

12510-2

ERROR `printf 'Template resource "%s"on server "%s" does not exist'$TemplateTagName $Tem-plateSysName`

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

12510-3

ERROR `printf '%s is not shareable withany machine.' $DEV`

Cause: The device does not appear to be shared withany other systems.

Action: Verify that the device is accessible from allservers in the cluster. Ensure that all relevant storagedrivers and software are installed and configuredproperly.

12510-4

ERROR `printf 'Failed to create disk hier-archy for "%s" on "%s"'$PRIMACH $DEV`

Cause: The creation of a resource to protect aphysical disk failed.

Action: Check adjacent logmessages for moredetails and try to resolve the reported problem.

12510-7

ERROR `printf 'Template resource "%s"on server "%s" does not exist'$TemplateTagName $Tem-plateSysName`

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

12511-4

ERROR `printf 'Template resource "%s"on server "%s" does not exist'$TemplateTagName $Tem-plateSysName`

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

12512-0

ERROR `printf 'Template resource "%s"on server "%s" does not exist'$TemplateTagName $Tem-plateSysName`

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

SIOS Protection Suite for LinuxPage 452

Page 473: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12512-3

ERROR `printf 'Cannot access dep-stoextend script "%s" on server"%s"' $depstoextend $Tar-getSysName`

Cause: LifeKeeper was unable to run pre-extendchecks on the resource hierarchy because it wasunable to find the script "DEPSTOEXTEND" on{server}.

Action: Check your LifeKeeper configuration.

12512-6

ERROR `printf 'Template resource "%s"on server "%s" does not exist'$ChildTag $TemplateSysName`

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

12512-9

ERROR `printf 'Template resource "%s"on server "%s" does not exist'$TemplateTagName $Tem-plateSysName`

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

12515-5

ERROR SCSI $DEV failed to lock. Cause: There was a problem locking a SCSI device.

Action: Check adjacent logmessages for moredetails and try to resolve the reported problem.

12516-4

ERROR SCSI $INFO failed to unlock. Cause: There was a problem unlocking a SCSIdevice.

Action: Check adjacent logmessages for moredetails and try to resolve the reported problem.

12518-1

ERROR `printf 'Template resource "%s"on server "%s" does not exist'$TemplateTag $Tem-plateSysName`

Cause: LifeKeeper was unable to find the resource{tag} on {server}.

12610-5

ERROR script not specified - $PTH is adirectory

Cause: The specified script path is a directory.

Action: Correct the path of the script.

12611-0

ERROR script $PTH does not exist Cause: The specified script path does not exist.

Action: Correct the path of the script.

12611-5

ERROR script $PTH is a zero length file Cause: The specified script is an empty file.

Action: Correct the script's file path and check thecontents inside the script.

12611-7

ERROR script $PTH is not executable Cause: The specified script is not executable.

Action: Correct the script's file path, check thecontents inside the script file andmake sure it has theproper execute permissions.

SIOS Protection Suite for LinuxPage 453

Page 474: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12612-5

ERROR required templatemachine nameis null

Cause: The input templatemachine name is null.

Action: Correct the input templatemachine name.

12613-0

ERROR required template resource tagname is null

Cause: The input template resource {tag} is null.

Action: Correct the input template resource tagname.

12613-5

ERROR Unable to generate a new tag Cause: Failed to generate a new tag as the same asthe template tag name on the target node using the"newtag" script during the extension. The tag name isalready existing.

Action: Avoid using duplicate tag name on the samenode and check the log for detail.

12614-0

ERROR Unable to generate a new tag Cause: Failed to generate a new tag as input targettag name on the target node using the "newtag" scriptduring the extension. The tag name is alreadyexisting.

Action: Avoid using duplicate tag name on the samenode and check log for detail.

12615-0

ERROR unable to remote copy template\"$_lscript\" script file

Cause: Failed to remote copy template script file.The causemay be due to the non-existence/availability of template script file ontemplate node or any transaction failure during"lcdrcp" process.

Action: Check the existence/availability of templatescript and the connection to template node. Alsocheck the logs for related errors and try to resolve thereported problem.

12615-5

ERROR failed to create resourceinstance on \"$TargetSysName\

Cause: Failed to create resource instance using "ins_create".

Action: Check the logs for related errors and try toresolve the reported problem.

12616-0

ERROR failed to set resource instancestate for \"$TargetTagName\" on\"$TargetSysName\

Cause: Failed to set resource instance state using"ins_setstate" during GenApp resource extension.

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 454

Page 475: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12617-0

ERROR getlocks failure Cause: Failed to get the administrative lock whencreating a resource hierarchy.

Action: Check the logs for related errors and try toresolve the reported problem.

12617-5

ERROR instance create failed Cause: Failed to create a GenApp instance using"appins".

Action: Check the logs for related errors and try toresolve the reported problem.

12618-0

ERROR unable to set state to OSU Cause: Failed to set resource instance state using"ins_setstate" during GenApp resource creation.

Action: Check the logs for related errors and try toresolve the reported problem.

12619-0

ERROR resource restore has failed Cause: Failed to restore GenApp resource.

Action: Check the logs for related errors and try toresolve the reported problem.

12620-0

ERROR create application hierarchy rls-locks failure

Cause: Failed to release lock after GenApp resourcecreated.

Action: Check the logs for related errors and try toresolve the reported problem.

12621-0

ERROR copy $ltype script $lscript failure Cause: Failed to copy user provided script toappropriate GenApp directory during resourcecreation.

Action: Check the existence/availability of userprovided script and the GenApp directory as well.Also check the logs for related errors and try toresolve the reported problem.

12621-5

ERROR no $ltype script specified Cause: Missing user defined script during GenAppresource creation.

Action: Check the input action script and runresource creation again.

12622-0

ERROR nomachine name specified Cause: Missing specifiedmachine name duringGenApp resource creation. Failed to copy specifieduser script due tomissing the input for machinename.

Action: Check the input for machine name and runresource creation again.

SIOS Protection Suite for LinuxPage 455

Page 476: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12622-5

ERROR no resource tag specified Cause: Missing specified tag name during resourcecreation.

Action: Check the input for source tag name and runresource creation again.

12623-0

ERROR $ERRMSGScript was ter-minated for unknown reason

Cause: Failed to extendGenApp resource due tounknown reason.

Action: Check the logs for related errors and try toresolve the reported problem.

12623-5

ERROR $ERRMSGRequired templatemachine name is null

Cause: Missing the input for templatemachine nameduring GenApp resource extension.

Action: Check the input for templatemachine nameand do the resource extension again.

12624-0

ERROR $ERRMSGRequired templateresource tag name is null

Cause: Missing the input for template resource tagname during GenApp resource extension.

Action: Check the input for template resource tagname and do the resource extension again.

12624-5

ERROR $ERRMSGCan not accessextend for $AppType/$ResTyperesources onmachine \"$Tar-getSysName\

Cause: Failed to locate "extend" script duringGenApp resource extension on target node.

Action: Check the existence/availability of "extend"script and doGenApp resource extension again.

12625-0

ERROR create application failure - ins_list failed

Cause: Failed when calling "ins_list" during GenAppresource creation.

Action: Check the logs for related errors and try toresolve the reported problem.

12625-5

ERROR create application failure - unableto generate a new tag

Cause: Failed to generate a new tag during theGenApp resource creation.

Action: Avoid using duplicate tag name on the samenode. Also check the logs for related errors and try toresolve the reported problem.

12627-0

ERROR create application failure - ins_create failed

Cause: Failed using "ins_create" to create GenAppinstance.

Action: Check the logs for related errors and try toresolve the reported problem.

SIOS Protection Suite for LinuxPage 456

Page 477: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12627-5

ERROR create application failure - copy_actions failed

Cause: Failed using "copy_actions" to copy userspecified template script file.

Action: Check the existence/availability of templatescript. Also check the logs for related errors and try toresolve the reported problem.

12629-0

ERROR Unable to obtain tag for resourcewith id $ID

Cause: Failed to fetch GenApp resource tag name byinput ID during recovery.

Action: Check the correctness of input ID andexistance/availability of GenApp resource in LCD.Also check the logs for related errors and try toresolve the reported problem.

12630-0

ERROR generic application recoverscript for $TAGwas not found orwas not executable

Cause: Failed to locate the user defined script forGenApp resource during recovery.

Action: Check the existence/availability of the userdefined script and do theGenApp recovery processagain.

12631-0

ERROR -t flag not specified Cause: Missing the input for resource tag nameduring GenApp resource restore.

Action: Check the input for resource tag name anddo resource restore again.

12631-5

ERROR -i flag not specified Cause: Missing the input for resource internal idduring GenApp resource restore.

Action: Check the input for resource internal id anddo resource restore again.

12633-5

ERROR restore script \"$LCDAS/$APP_RESTOREDIR/$TAG\" was notfound or is not executable

Cause: Failed to locate the user defined script forGenApp resource during restore.

Action: Check the existence/availability of the userdefined script and do theGenApp restore processagain.

12634-0

ERROR -t flag not specified Cause: Missing the input for resource tag nameduring GenApp resource remove.

Action: Check the input for resource tag name anddo resource remove again.

SIOS Protection Suite for LinuxPage 457

Page 478: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12634-5

ERROR -i flag not specified Cause: Missing the input for resource internal idduring GenApp resource remove.

Action: Check the input for resource internal id anddo resource remove again.

12636-5

ERROR remove script \"$LCDAS/$APP_REMOVEDIR/$TAG\" was notfound or was not executable

Cause: Failed to locate the user defined script forGenApp resource during remove.

Action: Check the existence/availability of the userdefined script and do theGenApp remove processagain.

12637-5

ERROR Script has hung checking\"$tag\". Forcibly terminating.

Cause: The "quickCheck" Script will be forciblyterminated for GenApp resource with tag name {tag}due to a waiting time over the user defined timeout.

Action: Check the GenApp resource performanceand restart quickChecking. Also check the logs forrelated errors and try to resolve the reported problem.

12638-0

ERROR Usage error: no tag specified Cause: Missing the input for resource tag nameduring GenApp resource quickCheck.

Action: Check the input for resource tag name andretry resource quickCheck.

12638-5

ERROR Internal error: ins_list failed on$tag.

Cause: Failed using "ins_list" to fetch the GenAppresource information by input tag name during thequickCheck process.

Action: Correct the input tag name and do thequickCheck process again. Also check the logs forrelated errors and try to resolve the reported problem.

12639-0

FATAL Failed to fork process to execute$userscript: $!

Cause: Failed to fork process to execute user defined"quickCheck" script during the GenApp resource"quickCheck" process.

Action: Check the existency/availability of the userdefined "quickCheck" script and do the "quickCheck"process again.

SIOS Protection Suite for LinuxPage 458

Page 479: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12639-1

ERROR quickCheck has failed for\"$tag\". Starting recovery.

Cause: TheGenApp resource with tag name {tag} isdetermined to be failed by using the user definedhealth monitoring script - "quickCheck" and therecovery process will be initiated.

Action: Check the performance of the GenAppresource when local recovery finished. Also checkthe logs for related errors and try to resolve thereported problem.

12640-0

ERROR -t flag not specified Cause: Missing the input for resource tag nameduring GenApp resource deletion process.

Action: Check the input for resource tag name anddo resource deletion process again.

12640-5

ERROR -i flag not specified Cause: Missing the input for resource internal idduring GenApp resource deletion process.

Action: Check the input for resource internal id anddo resource deletion process again.

12800-5

ERROR END failed%s of "%s" on server"%s" due to a "%s" signal

Cause: The quickCheck of {resource} on {server}failed due to an operating system signal {signal}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12800-8

ERROR Usage: quickCheck -t <tagname> -i <id>

Cause: Incorrect arguments have been supplied tothe dmmp device quickCheck command preventing itfrom running.

Action: Make sure all software components areproperly installed and at the correct version. Rerunthe command and supply the correct argument list: -t<Resource Tag> and -i <Resource ID> that identifiesthe dmmp device resource to be quickChecked.

12801-0

ERROR quickCheck for "%s" failedchecks of underlying paths, ini-tiate recovery. retry count=%s.

Cause: The dmmp kit failed to quickCheck a deviceafter {count} times of retries. A recovery of theprotected dmmp resource will be executed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

SIOS Protection Suite for LinuxPage 459

Page 480: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12802-1

ERROR unable to find device for uuid"%s".

Cause: The device could not be found by unique idduring a restore operation.

Action: Verify that the resource is configuredproperly. Rerun the command and supply the correctdevice id that identifies the dmmp device resource tobe restored.

12802-5

ERROR Device "%s" failed to unlock. Cause: A non working {device} was detected andcould not be unlocked during the restore operation.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12802-6

ERROR Device "%s" failed to lock. Cause: The {device} could not be locked during theresotre.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12803-1

ERROR unable to find device for uuid"%s".

Cause: The device could not be found by unique idduring the remove operation.

Action: Verify that the resource is configuredproperly. Rerun the command and supply the correctdevice id that identifies the dmmp device resource tobe removed.

12803-4

ERROR Device "%s" failed to unlock. Cause: The {device} could not be unlocked during theremove.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12803-6

ERROR unable to load existing inform-ation for device with uuid "%s".

Cause: The device information could not be loadedby unique id.

Action: Make sure the resource is configuredproperly. Rerun the command and supply the correctdevice id that identifies the dmmp device resource.

SIOS Protection Suite for LinuxPage 460

Page 481: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12803-7

ERROR unable to load existing inform-ation for device "%s".

Cause: The device information could not be loadedby name.

Action: Make sure the resource is configuredproperly. Rerun the command and supply the correctdevice name that identifies the dmmp deviceresource.

12803-8

ERROR unable to load existing inform-ation for device, no dev or uuiddefined.

Cause: The device information could not be loadedsince neither a unique device id nor name of thedevice were defined.

Action: Make sure the resource is configuredproperly. Rerun the command and supply the correctdevice id or name that identifies the dmmp deviceresource.

12804-1

ERROR unable to load existing inform-ation for device with uuid "%s".

Cause: The device information could not be loadedby unique id.

Action: Make sure the resource is configuredproperly. Rerun the command and supply the correctdevice id that identifies the dmmp device resource.

12805-7

ERROR All paths are failed on "%s". Cause: LifeKeeper detected all paths listed to theprotected dmmp device are in the failed state.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

12805-8

ERROR could not determine registrationsfor "%s"! All paths failed.

Cause: LifeKeeper could not determine registrationsfor protected dmmp {device}. All paths to the dmmp{device} are in the failed state.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

12805-9

WARN path "%s" no longer configuredfor "%s", remove from path list.

Cause: LifeKeeper detected listed {path} to protected{device} is not valid anymore and will remove it fromthe path list.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

12806-0

WARN registration failed on path "%s"for "%s".

Cause: LifeKeeper failed the registration on {path} forprotected dmmp {device}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

SIOS Protection Suite for LinuxPage 461

Page 482: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12806-2

ERROR all paths failed for "%s". Cause: LifeKeeper failed to verify a valid path toprotected dmmp {device}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

12807-2

ERROR The daemon "%s" does notappear to be running and couldnot be restarted. Path failuresmay not be correctly handledwithout this daemon.

Cause: LifeKeeper failed to verify dmmp daemon isrunning and could not restart it.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

12807-8

ERROR "%s" resource type is notinstalled on "%s".

Cause: The DeviceMapper Multipath Recovery Kitfor dmmp device support is not installed on thesystem.

Action: Install the steeleye-lkDMMP DeviceMapperMultipath Recovery Kit rpm on the system.

12808-3

ERROR This script must be executed on"%s".

Cause: An incorrect system name has been suppliedas an argument to the devicehier script used to createthe dmmp device resource.

Action: Make sure the cluster nodes and comm-paths are properly configured. Supply the correctsystem name to the devicehier script. The namemust match the name of the system onwhich thecommand is run.

12808-4

ERROR The device%s is not active. Cause: LifeKeeper failed to find the specified {device}as a valid device on the system during resourcecreation.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Rerun the commandand supply the correct argument list: -t <ResourceTag> and -i <Resource ID> that identifies the dmmpdevice resource to be created.

12808-6

ERROR Failed to create "%s" hierarchy. Cause: LifeKeeper failed to create resource hierarchyfor {device}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

SIOS Protection Suite for LinuxPage 462

Page 483: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12808-8

ERROR Error creating resource "%s" onserver "%s"

Cause: LifeKeeper failed to create the resource with{tagname} on {server}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12809-0

ERROR Failed to create dependency"%s"-"%s" on system "%s".

Cause: LifeKeeper failed to create dependency{resource tag name} - {resource tag name} on{system} during creation.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12809-1

ERROR Error creating resource "%s" onserver "%s"

Cause: LifeKeeper failed to create {resource} on{system}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12810-1

ERROR "%s" constructor requires a validargument.

Cause: LifeKeeper failed to create an object for thedmmp resouce during construction.

Action: Rerun the command and supply the correctargument list: -t <Resource Tag> and -i <ResourceID> that identifies the dmmp device resource.

12810-2

ERROR Invalid tag "%s". Cause: A resource instance could not be found forthe given tag name.

Action: Make sure the resource is configuredproperly. Rerun the command and supply the correctargument list: -t <Resource Tag> and -i <ResourceID> that identifies the dmmp device resource.

12811-1

ERROR Failed to get registrations for"%s": %s. Verify the storage sup-ports persistent reservations.

Cause: LifeKeeper failed to get the registrations of{device} with themessage, "bad field in Persistentreservation in cdb".

Action: Verify if the storage supports persistentreservations. Check adjacent logmessages forfurther details and relatedmessages. Youmustcorrect any reported errors before retrying theoperation.

SIOS Protection Suite for LinuxPage 463

Page 484: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12811-2

ERROR Failed to get registrations for"%s": %s. Verify the storage sup-ports persistent reservations.

Cause: LifeKeeper failed to get the registrations of{device} with themessage, "illegal request".

Action: Verify if the storage supports persistentreservations. Check adjacent logmessages forfurther details and relatedmessages. Youmustcorrect any reported errors before retrying theoperation.

12813-6

ERROR A previous quickCheck with PID"%s" running for device "%s"has been terminated.

Cause: LifeKeeper detected that a previousquickCheck operation is still running during the dmmpresource restore operation. It has been terminated byLifeKeeper.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

12813-7

ERROR SCSI reservation conflict on%sduring LifeKeeper resource ini-tialization. Manual interventionrequired.

Cause: LifeKeeper detected a SCSI reservationconflict on {device} during dmmp resource restore.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Manual interventionand fix of the reservation conflict on {device} isrequired.

12813-8

ERROR unable to clear registrations on%s.

Cause: LifeKeeper failed to clear all the registrationson {device}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12814-0

WARN registration failed on path%s for%s.

Cause: LifeKeeper failed tomake the registratioin on{path} for {device}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12814-3

ERROR reserve failed (%d) on%s. Cause: LifeKeeper failed tomake reservation for{resource} on {device}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

SIOS Protection Suite for LinuxPage 464

Page 485: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12814-5

ERROR The server ID "%s" returned by"%s" is not valid.

Cause: LifeKeeper failed to generate a valid host{ID}.

Action: The ID used to register a device is made upof 1 to 12 Hex digits that uniquely identifies the serverin the cluster. Check adjacent logmessages forfurther details and relatedmessages. Youmustcorrect any reported errors before retrying theoperation.

12814-6

ERROR device failure on%s. SYSTEMHALTED.

Cause: LifeKeeper detected failure on {device} andwill reboot the server.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

12814-8

ERROR device failure on%s. SYSTEMHALTED DISABLED.

Cause: LifeKeeper detected a failure on {device}. Thereboot was skipped due to LifeKeeper configuration.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Turn on theconfiguration "SCSIHALT" tomake the rebootavailable for any detected device failure.

12814-9

ERROR device failure or SCSI Error on%s. SENDEVENT DISABLED.

Cause: LifeKeeper detected a failure on {device}. Theevent generation was skipped due to LifeKeeperconfiguration.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Turn on theconfiguration "SCSIEVENT" tomake the sendeventavailable for any detected device failure.

12815-0

ERROR %s does not have EXCLUSIVEaccess to%s, halt system.

Cause: LifeKeeper detected a reservation conflict for{device} on {server} and will reboot the server.

Action: Check adjacent logmessages for furtherdetails and relatedmessages.

12815-1

ERROR %s does not have EXCLUSIVEaccess to%s, halt systemDISABLED.

Cause: LifeKeeper detected a reservation conflict for{device} on {server}. The reboot was skipped due toLifeKeeper configuration.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Turn on theconfiguration "RESERVATIONCONFLICT" tomakethe reboot available for any detected reservationconflicts.

SIOS Protection Suite for LinuxPage 465

Page 486: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12815-4

WARN unable to flush buffers on%s. Cause: LifeKeeper failed to flush the buffers for{device} during dmmp resource remove.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12815-7

WARN %s utility not found, limitedhealthcheck for%s.

Cause: LifeKeeper failed to find "dd" utility for thehealth check of {device}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12816-0

ERROR %s failed to read%s. Cause: LifeKeeper failed a disk I/O test for {device}when using {utility}.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12816-3

ERROR Registration ID "%s" for "%s" isnot valid.

Cause: LifeKeeper failed to generate a validregistration {ID} for {device}.

Action: The ID used to register a device is made upof 4 Hex digits derived from the path to the device.Check adjacent logmessages for further details andrelatedmessages. Youmust correct any reportederrors before retrying the operation.

12817-0

ERROR Usage: canextend <Templatesystem name> <Template tagname>

12850-0

ERROR Usage error Cause: Incorrect arguments have been supplied tothe dmmp device restore command preventing it fromrunning.

Action: Rerun the command and supply the correctargument list: -t <Resource Tag> and -i <ResourceID> that identifies the dmmp device resource to berestored.

12850-4

ERROR "%s" resource type is notinstalled on "%s".

Cause: The DeviceMapper Multipath Recovery Kitfor dmmp device support is not installed on thesystem.

Action: Install the steeleye-lkDMMP DeviceMapperMultipath Recovery Kit rpm on the system.

SIOS Protection Suite for LinuxPage 466

Page 487: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12850-6

ERROR Usage error Cause: Incorrect arguments have been supplied tothe dmmp device devShared command preventing itfrom running.

Action: Rerun the command and supply the correctargument list: <Template Resource System Name>and <Template Resource Tag> that identifies thedmmp device resource to be created.

12850-7

FATAL This script must be executed on"%s".

Cause: An incorrect system name has been suppliedas an argument to the devicehier script used to createthe dmmp device resource.

Action: Supply the correct system name to thedevicehier script. The namemust match the name ofthe system onwhich the command is run.

12851-1

ERROR Failed to get the ID for thedevice "%s". Hierarchy createfailed.

Cause: The devicehier script used to create thedmmp device resource was unable to determine theSCSI ID for the supplied device.

Action: Check that the supplied device path existsand that is for a supported SCSI storage array.

12851-2

ERROR Failed to get the disk ID for thedevice "%s". Hierarchy createfailed.

Cause: The devicehier script used to create thedmmp disk resource was unable to determine theSCSI ID for the supplied disk.

Action: Check that the supplied device path existsand that is for a supported SCSI storage array.

12851-3

ERROR Failed to create the underlyingresource for device "%s". Hier-archy create failed.

Cause: The creation of the underlying dmmp diskresource failed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12851-5

ERROR Error creating resource "%s" onserver "%s"

Cause: The creation of the dmmp device resourcefailed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

SIOS Protection Suite for LinuxPage 467

Page 488: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12851-7

ERROR Failed to create dependency"%s"-"%s" on system "%s".

Cause: The parent child dependency creationbetween the dmmp device and dmmp disk resourcesfailed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12851-9

ERROR Error creating resource "%s" onserver "%s"

Cause: The attempt to bring the newly created dmmpdevice resource in service has failed.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Youmust correct anyreported errors before retrying the operation.

12852-1

ERROR Either TEMPLATESYS orTEMPLATETAG argument miss-ing

Cause: Incorrect arguments have been supplied tothe extend command for the dmmp device resource.

Action: Rerun the dmmp device resource extend andsupply the correct template system and tag names.

12854-0

ERROR Usage error Cause: Incorrect arguments have been supplied tothe dmmp device getId command used to retrieve theSCSI ID.

Action: Rerun the command and supply the correctargument list: -i <device path> or -b <device ID>.

12854-1

ERROR Usage error Cause: Incorrect arguments have been supplied tothe command used to delete the dmmp deviceresource.

Action: Rerun the command and supply the correctargument list: -t <dmmp device resource tag>.

12854-3

ERROR device node \"$dev\" does notexist.

Cause: The device node required for restoring thedmmp device resource does not exist. The allocatedwait time in restore for udev device creation has beenexceed.

Action: Rerun the dmmp device resource restoreonce udev has created the device.

12854-4

ERROR Usage error Cause: Incorrect arguments have been supplied tothe remove command used to take the dmmp deviceresource out of service.

Action: Rerun the command and supply the correctargument list: -t <dmmp device resource tag>.

SIOS Protection Suite for LinuxPage 468

Page 489: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12910-0

FATAL Failed to load instance fromLifeKeeper.

Cause: An invalid resource tag or ID was specified.

Action: Check that the tag or ID is valid and re-runthe command.

12910-3

FATAL No resourcematches tag \"$-self->{'tag'}\".

Cause: An invalid resource tag was specified.

Action: Check the tag and re-run the command.

12910-4

FATAL An error occurred settingLifeKeeper resource information

Cause: An internal error has occurred in LifeKeeper.

12910-8

ERROR Missing parameter \"$param\" inec2 resource object

Cause: This is an internal error.

12911-0

ERROR Could not get the Elastic Net-work Interface ID for $dev

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12911-1

ERROR Failed to get Allocation ID ofElastic IP \"$elasticIp\".

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12911-2

ERROR Cannot find curl executable.Please verify that the steeleye-curl package is installed.

Cause: Install the steeleye-curl package, if it is notinstalled.

12911-3

ERROR Failed to get my instance ID. Cause: The EC2 instancemetadata access failed.

Action: Check the Amazon console and retry theoperation.

12911-4

ERROR Failed to get ENI ID. Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12911-5

ERROR Failed to associate Elastic IP,retrying $i/$MAX_RETRY

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12911-6

ERROR Failed to associate Elastic IP \"$-self->{'EIP'}\" on \"$self->{'DEV'}\".

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

SIOS Protection Suite for LinuxPage 469

Page 490: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12911-7

ERROR Failed to disassociate ElasticIP, retrying $i/$MAX_RETRY

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12911-8

WARN $self->{'EIP'} is not associatedwith any instance.

Cause: The Elastic IP is not associated with anyinstance.

Action: LifeKeeper will try to fix the issue by callingthe EC2 API to associate the Elastic IP. Checkadjacent logmessages for more details.

12911-9

WARN $self->{'EIP'} is associated withanother instance.

Cause: The Elastic IP is associated with anotherinstance.

Action: LifeKeeper will try to fix the issue by callingthe EC2 API to associate the Elastic IP. Checkadjacent logmessages for more details.

12912-0

ERROR Failed to recover Elastic IP. Cause: The EC2 API call failed to associate theElastic IP.

Action: Check the network and the Amazon consoleand retry the operation.

12912-1

ERROR Recovery process ended butElastic IP is not associated withthis instance. Please checkAWS console.

Cause: The EC2 API call failed to associate theElastic IP.

Action: Check the network and the Amazon consoleand retry the operation.

12912-2

ERROR Error creating resource \"$tar-get_tag\" with return code of\"$err\".

Cause: LifeKeeper was unable to create the resourceinstance on the server.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12912-3

ERROR Failed to get ENI ID. Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12912-4

WARN $self->{'EIP'} is associated withanother network interface.

Cause: The Elastic IP is associated with the properinstance, but the wrong ENI.

Action: LifeKeeper will try to fix the issue by callingthe EC2 API to associate the Elastic IP. Checkadjacent logmessages for more details.

SIOS Protection Suite for LinuxPage 470

Page 491: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12912-5

ERROR Link check failed for interface\'$dev\'.

Cause: The requested interface is showing 'NO-CARRIER' indicating that no link is present.

Action: Check the network interface and bring thelink up.

12912-6

ERROR Link check failed for interface\'$dev\'. Reason: down slave.

Cause: The requested interface is showing 'NO-CARRIER' indicating that no link is present.

Action: Check the network interface and bring thelink up.

12912-9

WARN The link for network interface \'$-self->{'DEV'}\' is down. Attempt-ing to bring the link up.

Cause: The requested interface is showing 'NO-CARRIER' indicating that no link is present.

Action: LifeKeeper will try to fix the issue by bringingthe link up and associating the Elastic IP with theinterface. Check adjacent logmessages for moredetails.

12913-7

ERROR The link for network interface \'$-self->{'DEV'}\' is still down.

Cause: LifeKeeper could not bring the link up.

Action: Ensure the interface is enabled and up.Check adjacent logmessages for more details.

12913-9

WARN The link for network interface \'$-self->{'DEV'}\' is down.

Cause: The requested interface is showing 'NO-CARRIER' indicating that no link is present.

Action: Check the network interface and bring thelink up.

12914-0

ERROR Could not get ENI ID for $self->{IP}.

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12914-1

ERROR Could not update route table,retrying $i/$MAX_RETRY (err-r=%s)(output=%s

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12914-2

ERROR Failed to update route table Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

SIOS Protection Suite for LinuxPage 471

Page 492: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12914-3

ERROR Youmust have exactly one IPaddress resource as the parentof the RouteTable EC2resource. Please reconfigureyour resource hierarchy.

Cause: The Route Table EC2 resourcemust haveone and only one IP resource as a parent.

Action: Repair the resource hierarchy as necessary.

12914-4

ERROR $func called with invalid timeout:$timeout

Cause: An invalid timeout value was specified in the/etc/default/LifeKeeper file.

Action: Verify all EC2_*_TIMEOUT settings arevalid in /etc/default/LifeKeeper.

12914-5

ERROR $func action timed out after$timeout seconds

Cause: The action did not complete within thetimeout period.

Action: Consider increasing the EC2_*_TIMEOUTvalue for the given action (in /etc/default/LifeKeeper).

12914-6

ERROR failed to run $func with timeout:$@

Cause: This is an internal error.

12914-8

ERROR Amazon ec2-describe-route-tables call failed (err=%s)(out-put=%s

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12915-0

ERROR Elastic IP \"$elasticIp\" is asso-ciated with another instance.

Cause: The Elastic IP is not associated with theproper instance.

Action: LifeKeeper will try to fix the issue by callingthe EC2 API to associate the Elastic IP. Checkadjacent logmessages for more details.

12915-1

ERROR Could not get the Association IDfor Elastic IP \"$elasticIp\".

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12915-2

ERROR Failed to disassociate Elastic IP\"$self->{'EIP'}\" on \"$self->{'DEV'}\".

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12915-3

ERROR Failed to disassociate Elastic IP\"$elasticIp\", (err=%s)(out-put=%s

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

SIOS Protection Suite for LinuxPage 472

Page 493: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12915-4

ERROR Amazon ec2-describe-addresses call failed (err=%s)(output=%s

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12915-5

ERROR Amazon ec2-describe-addresses call failed (err=%s)(output=%s

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12915-6

ERROR curl call failed (err=%s)(out-put=%s

Cause: The EC2 instancemetadata access failed.

Action: Check the Amazon console and retry theoperation.

12915-7

ERROR curl call failed (err=%s)(out-put=%s

Cause: The EC2 instancemetadata access failed.

Action: Check the Amazon console and retry theoperation.

12915-8

ERROR curl call failed (err=%s)(out-put=%s

Cause: The EC2 instancemetadata access failed.

Action: Check the Amazon console and retry theoperation.

12915-9

ERROR Amazon ec2-associate-addresscall failed (err=%s)(output=%s

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12916-0

ERROR Amazon ec2-describe-addresses call failed (err=%s)(output=%s

Cause: The EC2 API call failed, possibly due to anetwork issue.

Action: Check the network and the Amazon consoleand retry the operation.

12916-1

ERROR Error deleting resource \"$other-Tag\" on \"$otherSys\" withreturn code of \"$err\".

Cause: LifeKeeper was unable to delete the resourceinstance on the server.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12940-3

ERROR END failed create of $TAG dueto a $sig signal

Cause: The create process was interrupted by asignal.

12940-9

ERROR The IP resource $IP_RES is not\"ISP\".

Cause: The IP resource is not in service.

Action: Bring the resource in service and retry theoperation.

SIOS Protection Suite for LinuxPage 473

Page 494: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

12941-0

ERROR Could not find IP resource $IP_RES

Cause: Ensure that the IP resource exists and retrythe operation.

12941-2

ERROR EC2 resource $ID is already pro-tected

Cause: A resource with the specified ID alreadyexists.

Action: Make sure to clean up any remnants of an oldresource before re-creating a new resource.

12941-3

ERROR Failed to encrypt AWS_SECRET_KEY

Cause: Ensure that the AWS_ACCESS_KEY andAWS_SECRET_KEY have been entered correctly.

12941-6

ERROR Error creating resource \"$TAG\"with return code of \"$lcderror\".

Cause: LifeKeeper was unable to create the resourceinstance.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12941-8

ERROR Dependency creation between\"$IP_RES\" and \"$TAG\" failedwith return code of \"$lcderror\".

Cause: LifeKeeper was unable to create the resourcedependency.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12942-0

ERROR In-service failed for tag\"$TAG\".

Cause: LifeKeeper could not bring the resourceinstance into service.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

12980-0

ERROR canextend checks failed for \"$-self->{'tag'}\" (err=$ret

Cause: The pre-extend checks failed for the targetserver.

Action: Check adjacent logmessages for furtherdetails and relatedmessages. Correct any reportederrors.

13400-3

ERROR LK_ERROR:$tag:catch a\"$sig\" signal

Cause: The "create" process was interrupted by asignal.

Action: Check adjacent logmessages.

13400-4

ERROR LK_ERROR:$tag:Unable to get-locks on $server during resourcecreate. Error ($rc)

Cause: Failed to get the administrative lock whencreating the resource hierarchy.

Action: Check the logs for related errors and resolvethe reported problem.

SIOS Protection Suite for LinuxPage 474

Page 495: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

13400-5

ERROR LK_ERROR:$tag:The service\"$serviceName\" is not sup-ported on $server. Error ($rc)

Cause: The service does not exist or cannot beprotected.

Action: Input the appropriate service name.

13400-6

ERROR LK_ERROR:$tag:The service\"$serviceName\" is already pro-tected on $server.

Cause: This service is already protected.

Action: Choose a different service. Cannot create aresource to protect this service.

13400-7

ERROR LK_ERROR:$tag:Error creatingresource $tag. Error ($rc)

Cause: LifeKeeper was unable to create the resourceinstance.

Action: Check adjacent logmessages and correct.the cause of the error.

13401-1

ERROR LK_ERROR:$tag:In-serviceattempted failed for tag $tag.

Cause: Failed to start the service protected by theQSP resource.

Action: Check the log related to the service toprotect and. Resolve the problem.

13401-5

ERROR LK_ERROR:$tag:Unable to rls-locks on $server during resourcecreate. Error ($rc)

Cause: Failed to release the lock after QSP resourcecreated.

Action: Check the logs for related errors and resolvethe reported problem.

13410-3

ERROR LK_ERROR:$tag:Templateresource \"$template_tag\" onserver \"$template_sys\" doesnot exist

Cause: The resource cannot be found on thetemplate server.B

Action: Ensure the hierarchy is correct on thetemplate server before extending.

13410-4

ERROR LK_ERROR:$tag:Templateresource \"$template_tag\" onserver \"$template_sys\" is notQSP resource (app=$ins[1], res-s=$ins[2])

Cause: The template resource is not a QSPresource.

Action: Extend the resource as to the same type oneas the template.

13410-5

ERROR LK_ERROR:$tag:The service\"$service\" is not supported on$me. Error ($check)

Cause: Failed to release the lock after the QSPresource was created.

Action: Install the service on target server beforeextending.

13410-6

ERROR LK_ERROR:$tag:The service\"$service\" is already protectedon $me.

Cause: There is already a resource of with the sameID on target server.

Action: Cannot create a duplicated resource of thesame service.

SIOS Protection Suite for LinuxPage 475

Page 496: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

13420-3

ERROR LK_ERROR:$tag:catch a\"$sig\" signal

Cause: The process to extend the resource wasinterrupted by a signal.

Action: Check adjacent logmessages.

13420-4

ERROR LK_ERROR:$tag:Templateresource \"$template_tag\" onserver \"$template_sys\" doesnot exist

Cause: The resource cannot be found on by thetemplate system.

Action: Ensure the hierarchy is correct on thetemplate system server before extending.

13420-8

ERROR LK_ERROR:$tag:Error creatingresource \"$tag\" on server\"$me\"

Cause: The resource cannot be found by thetemplate system.

Action: Ensure the hierarchy is correct on thetemplate system before extending.

13440-1

ERROR An error occurred during creationof LifeKeeper application=common%s.

Cause: The start process of the service does notterminate within the specified time.

Action: Check the protected service and retry the"restore" operation. Also check the logs for relatederrors and try to resolve the reported problem.

13420-8

ERROR LK_ERROR:$tag:timeout $cmdfor \"$tag\". Forcibly terminating.

Cause: LifeKeeper was unable to create the resourceinstance on target server.

Action: Check adjacent logmessages and correctthe cause of the error.

13440-5

FATAL LK_FATAL:$tag:Failed to forkprocess to execute service com-mand: $!

Cause: Failed to fork. This is a system error.

Action: Determine why fork fails.

13440-7

ERROR LK_ERROR:$tag:service com-mand has failed for \"$tag\"

Cause: Failed to execute service command.

Action: Manual execution of "service" command by"start" option causes an error. Correct the cause ofthe error referring to the error message.

13450-1

FATAL LK_ERROR:$tag:timeout $cmdfor \"$tag\". Forcibly ter-minating.!

Cause: The "remove" process of the service doesnot terminate within the specified time.

Action: Check the protected service and retry the"remove" operation. Also check the logs of relatederrors and resolve the reported problem.

13450-5

FATAL LK_FATAL:$tag:Failed to forkprocess to execute service com-mand: $!

Cause: Failed to fork. This is a system error.

Action: Determine why fork fails.

SIOS Protection Suite for LinuxPage 476

Page 497: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

13450-7

FATAL LK_ERROR:$tag:service com-mand has failed for \"$tag\"

Cause: Failed to execute service command.

Action: Manual execution of service command bystop option causes an error. Correct the cause of theerror by referring to error message.

13460-1

ERROR LK_ERROR:$tag:timeout $cmdfor \"$tag\". Forcibly terminating.

Cause: The "remove" process of the service did notterminate within the specified time.

Action: Check the protected service. Also check thelogs for related errors and try to resolve the reportedproblem.

13460-5

FATAL LK_FATAL:$tag:Failed to forkprocess to execute service com-mand: $!

Cause: Failed to fork. This is a system error.

Action: Determine why fork fails.

13460-7

FATAL LK_FATAL:$tag:Failed to forkprocess to execute service com-mand: $!

Cause: Failed to execute service command.

Action: Manual execution of service command by"status" option causes an error. Correct the cause ofthe error by referring to error message.

13450-1

ERROR LK_ERROR:$tag:service com-mand has failed for \"$tag\"

Cause: The "remove" process of the service did notterminate within the specified time.

Action: Check the protected service and retry the"remove" operation. Also check the logs for relatederrors and resolve the reported problem.

13470-1

ERROR LK_ERROR:$tag:timeout $cmdfor \"$tag\". Forcibly terminating.

Cause: The "recover" process of the service doesnot terminate within the specified time.

Action: Check the protected service. Also check thelogs for related errors and resolve the reportedproblem.

13470-6

FATAL LK_FATAL:$tag:Failed to forkprocess to execute service com-mand: $!

Cause: Failed to fork. This is a system error.

Action: Determine why fork fails.

13470-8

ERROR LK_ERROR:$tag:service com-mand has failed for \"$tag\"

Cause: Failed to execute service command.

Action: Manual execution of service command by"start" option causes an error. Correct the cause ofthe error by referring to error message.

13480-3

ERROR LK_ERROR:$tag:tag \"$tag\"does not exist on server \"$me\"

Cause: The specified tag does not exist. This is aninternal error.

SIOS Protection Suite for LinuxPage 477

Page 498: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

CombinedMessage Catalog

Cod-e

Sever-ity Message Cause/Action

13480-4

ERROR LK_ERROR:$tag:app type\"$ins[1]\" is not $app

Cause: The specified tag is not a QSP resource. Thisis an internal error.

13480-5

ERROR LK_ERROR:$tag:res type \"$ins[2]\" is not $res

Cause: The specified tag is not a QSP resource. Thisis an internal error

13482-3

ERROR LK_ERROR:$tag:tag \"$tag\"does not exist on server \"$me\"

Cause: The specified tag is not a QSP resource. Thisis an internal error.

13482-4

ERROR LK_ERROR:$tag:app type\"$ins[1]\" is not $app

Cause: The specified tag is not a QSP resource. Thisis an internal error.

13482-5

ERROR LK_ERROR:$tag:res type \"$ins[2]\" is not $res

Cause: The specified tag is not a QSP resource. Thisis an internal error

13484-3

ERROR LK_ERROR:$tag:tag \"$tag\"does not exist

Cause: The specified tag does not exist. This is aninternal error.

13484-4

ERROR LK_ERROR:$tag:app type\"$ins[1]\" is not $app

Cause: The specified tag is not a QSP resource. Thisis an internal error.

13484-5

ERROR LK_ERROR:$tag:res type \"$ins[2]\" is not $res

Cause: The specified tag is not a QSP resource. Thisis an internal error.

SIOS Protection Suite for LinuxPage 478

Page 499: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Index

Index

A

Active/Active 10

Active/Standby 11

Administration 127

Asynchronous Mirroring 277

Automatic LifeKeeper Restart

Disabling 176

Enabling 175

Automatic Switchback 12

B

Bitmap File 300

Block Resource Failover 42

Browser Security Parameters 172

C

Command Line

Mirror Administration 310

MonitoringMirror Status 313

Communication Paths

Creating 128

Deleting 130

Firewall 221

Heartbeat 7

Compression Level 310

Configuration 29

Application 56

Concepts 7

SIOS Protection Suite for LinuxPage 479

Page 500: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Index

Data Replication 55

General 286

Network 55

Network and LifeKeeper 286

Optional Tasks 38

Steps 29

Storage and Adapter 56

Values 215

Confirm Failover 39

CONFIRM_SO

Disabling Reservations 105

Connecting

Servers to a Cluster 179

Core 5

Credentials 124

Custom Certificates 49

D

Data Replication Path 286

Dialogs

Cluster Connect 187

Cluster Disconnect 187

Resource Properties 188

Server Properties 189

Disconnecting 179

E

Error Detection 127

Event Email Notification 35

Configuration 37

SIOS Protection Suite for LinuxPage 480

Page 501: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Index

Overview 31

Troubleshooting 38

Event Forwarding via SNMP 31

Configuration 33

Overview 31

SNMP Troubleshooting 34

F

Failover Scenarios 282

Fault Detection and Recovery 23

IP Local Recovery 23

Resource Error Recovery Scenario 25

Server Failure Recovery Scenario 27

Fencing

AlternativeMethods 104

I/O Fencing Chart 105

Introduction 102

File System 241

File Systems 6

Firewall

Running LifeKeeper GUI Through Firewall 222

Running LifeKeeper with Firewall 221

Flags 44

Force Mirror Online 309

G

Generic Applications 6

GUI

Configuring 163

Exiting 174

SIOS Protection Suite for LinuxPage 481

Page 502: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Index

Overview 162

Running on LifeKeeper Server 172

Running on Remote System 169

Software Package 153

Starting 165

Stopping 165

H

Hardware 7

Health Monitoring 217

I

In Service 194

Intelligent Switchback 12

INTERFACELIST 240-241

Introduction

How It Works 278

Mirroring 276

IP Addresses 6

J

Java

Plug-in 169

Security Policy 167

L

LCD Interface (LCDI) 196

LifeKeeper Alarm Interface 204

LifeKeeper Communications Manager (LCM) 203

Alarming and Recovery 204

Status Information 204

SIOS Protection Suite for LinuxPage 482

Page 503: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Index

LifeKeeper Configuration Database (LCD) 195

Commands 196

Configuration Data 199

Directory Structure 200

Flags 200

Resource Types 200

Resources Subdirectories 201

Structure of LDC Directory in /opt/LifeKeeper 202

LifeKeeper Recovery Action and Control Interface (LRACI) 6

lkbackup

Broken Equivalencies 239

With SDR 247

lkpolicy Tool 123

M

Manual Failover Confirmation 39

Menus 153

Edit Menu - Resource 156

Edit Menu - Server 156

File 155

Help 158

Resource Context 153

Server Context 154

View 157

Message Bar 174

Mirror Administration

Command Line 310

GUI 308

Mirror Status

Monitoring via Command Line 313

SIOS Protection Suite for LinuxPage 483

Page 504: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Index

Viewing 307

Multi-Site Cluster 323

Before You Start 338

Configuration Considerations 324

File System

Replicate Existing 328

Replicate New 326

Migration

Performing 339

Successful 348

Overview 323

Requirements 338

Resource Hierarchy

Creating 325

Extending 332

Extending to Disaster Recovery System 335

Restore and Recover 337

Restrictions 325

N

N-Way Recovery 128

Nested File System 240

Network Bandwidth

Determine Requirements 287

Measuring Rate of Change 287

O

Out of Service 195

Output Panel 174

SIOS Protection Suite for LinuxPage 484

Page 505: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Index

P

Pause and Resume 310

Properties Panel 173

Protected Resources 4

Q

Quick Service Protection (QSP) 7

Quorum/Witness 107

Actions WhenQuorum is Lost 111

Configurable Components 108

Disabling Reservations 104

Installation and Configuration 108

QuorumModes 109

SharedWitness 111

Witness Modes 110

R

Rate of Change 287

RAW I/O 6

Recovery

After Failover 219

Non-Killable Process 272

Out-of-Service Hierarchies 273

Panic DuringManual Recovery 273

Server Failure 272

Removing LifeKeeper 219

Requirements

DataKeeper 55

Firewall 221

Hardware 285

SIOS Protection Suite for LinuxPage 485

Page 506: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Index

Quorum/Witness Package 107

Software 285

STONITH 115

Reservations

Disabling 104

SCSI 103

Resource Dependency

Creating 147

Deleting 148

Resource Hierarchies 13

Collapsing Tree 186

Creating 131, 299

File System 132

Generic Application 134

Raw Device 135

Deleting 148, 304

Example 17

Expanding Tree 186

Extending 144, 301

File System 145

Generic Application 145

Raw Device 146

Hierarchy Relationships 15

In Service 305

Information 16

Maintaining 219

Out of Service 305

Testing 305

Transferring 223

SIOS Protection Suite for LinuxPage 486

Page 507: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Index

Unextending 146, 304

Resource Policy Management 120

Resource Priorities 142

Resource Properties 142

Resource States 14

Resource Types 13

Resynchronization 314

Avoiding Full 315

S

Server Failure 314

Server Groups 7

Server Properties

Editing 128

Failover 130

Viewing 181

Shared Communication 8

Shared Data Resources 8

Shared Equivalencies 16

Starting LifeKeeper 174

Status Display

Detailed 17

Short 22

Status Table 173

STONITH

Disabling Reservations 104

Stopping LifeKeeper 175

Synchronous Mirroring 277

SIOS Protection Suite for LinuxPage 487

Page 508: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Index

T

Tag Name

Restrictions 273

Valid Characters 273

Technical Notes 223

Technical Support 2

Toolbars 158

GUI 158

Resource Context 160

Server Context 161

Troubleshooting 230, 350

Communication Paths 269

GUI

Troubleshooting 264

Incomplete Resource Created 269

Incomplete Resource Priority Modification 269

Known Issues 234

Restrictions 234

TTY Connections 30

V

View Options 184

Viewing

Connected Servers 180

Message History 185

Resource Properties 184

Resource Tags and IDs 182

Server Log Files 181

Server Properties 181

SIOS Protection Suite for LinuxPage 488

Page 509: SIOSProtectionSuiteforLinux TechnicalDocumentation v9docs.us.sios.com/Linux/9.2/LK4L/TechDocPDF/TechDoc.pdfTableofContents Introduction 1 AboutSIOSProtectionSuiteforLinux 1 SPS forLinuxIntegratedComponents

Index

Status of a Server 180

Status of Resources 182

W

Watchdog

Disabling Reservations 105

SIOS Protection Suite for LinuxPage 489