8/3/2019 SQL Server Clustering Best Practices
http://slidepdf.com/reader/full/sql-server-clustering-best-practices 1/6
SQL Server clustering best practices
SQL Server clustering is a high-availability technology for SQL Server instances. It involves the
sharing of server resources between one or more nodes (or servers), which have one or more
shared disks grouped into logical units called resource groups. A resource group containing at
least one IP address, network name and disk resource is called a virtual server. The cluster
service arbitrates ownership of the resource groups. Only one node can own a resource group
and its associated resources at any given time.
Clustering basics
Each virtual server appears on the network as a complete system. When the virtual server
contains SQL Server resources, clients connected to the virtual server access resources on its
current host node. While the terms "active" and "passive" are often used in this context, they
are not fixed roles, as all nodes in a cluster are interchangeable. Should the current host,
sometimes designated as the primary, fail, the resource group will be transferred to another
node (secondary node) in the cluster. With clusters having more than two nodes or two
instances, it is important to set failover order by choosing the preferred node ownership order
for each instance. The secondary will become the primary and host the virtual server. Active
client connections will be broken during failover, but they can reconnect to the virtual server
now hosted by the new node. Unless the application reconnects automatically, clients will have
to reconnect manually, and uncommitted work in progress will be lost during the failover. Most
commercial applications now handle this reconnection task seamlessly.
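The reconnect-on-failover behavior described above can be sketched as a simple retry loop. This is only an illustration: `fake_connect` stands in for whatever connect call your driver provides, and the retry counts and delays are hypothetical.

```python
import time

def connect_with_retry(connect, retries=5, delay=0.01):
    """Attempt to (re)connect to the virtual server, retrying while
    failover completes. `connect` is any zero-argument callable that
    raises ConnectionError until the new node is hosting the instance."""
    last_err = None
    for attempt in range(retries):
        try:
            return connect()
        except ConnectionError as err:
            last_err = err
            time.sleep(delay * (attempt + 1))  # back off a little more each try
    raise last_err

# Simulated failover: the first two attempts fail, then the new node answers.
attempts = {"n": 0}
def fake_connect():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("virtual server not yet online")
    return "connected"

result = connect_with_retry(fake_connect)
print(result)  # → connected
```

A real application would wrap a loop like this around its driver's connection call and re-run any transaction that was in flight, since that work was lost at failover.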
The goal of clustering is to provide increased availability to clients by having a hot standby
system with an automatic failover mechanism. SQL Server clustering is not a load-sharing or
scale-out technology. During a failure there will be a brief interruption of database service on
any cluster. On large clusters with multiple nodes and instances, clients may experience
degraded performance during a failure event, but they will not lose database availability.
Clustering topologies
There are four types of cluster topologies -- or arrangements of nodes in a cluster:
Single instance
Multi-instance
N+1
N+M
Single instance: In this case, one node in a cluster owns all resource groups at any one time and
the other nodes are offline. Should the primary node owning the resources fail, the resource
groups will be transferred to the secondary node, which comes online. While the secondary
node comes online, it will assume ownership of the resource groups, which typically consist of
the disks containing your database files and transaction logs. SQL Server then starts up on the
virtual server and recovers the databases, rolling committed transactions in the transaction
log forward and rolling uncommitted transactions back.
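The redo/undo step can be illustrated with a toy log. This only mirrors the semantics; real recovery operates on pages and log sequence numbers, not Python lists.

```python
def recover(log):
    """Toy crash-recovery pass over a transaction log.

    Each record is (txid, op). Transactions with a COMMIT record are
    redone (rolled forward); transactions still in flight at failover
    are undone (rolled back)."""
    committed = {txid for txid, op in log if op == "COMMIT"}
    redo = [txid for txid, op in log if op == "WRITE" and txid in committed]
    undo = [txid for txid, op in log if op == "WRITE" and txid not in committed]
    return redo, undo

# Transaction 1 committed before the failure; transaction 2 was in flight.
log = [(1, "WRITE"), (2, "WRITE"), (1, "COMMIT")]
redo, undo = recover(log)
print(redo, undo)  # → [1] [2]
```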
This topology was formerly called active-passive. Single-instance clustering is most frequently
used for mission-critical applications, where the cost of downtime far outweighs the cost of the
wasted hardware resources of the secondary node sitting idle while offline.
Multiple instance: In this situation, one virtual server in a cluster owns some of the resource
groups and another virtual server owns other resource groups. At any one time, the virtual
servers themselves can be hosted by a single node or different nodes and would appear to
clients as named instances of a single server. In that case, they are named instances of a virtual
server, hence the name multiple instance. With multiple-instance clustering, previously called
active-active, the hardware requirements of each individual node are greater as each node may
at any one time be hosting two (or more) virtual servers.
You should consider multiple-instance clusters to be more cost effective than single-instance
clusters as there are no nodes offline or waiting. However, should one node host more than
one virtual server, performance for clients is typically degraded. Your best bet is to use
multiple instances when you require high availability but not high performance.
N+1: This is a modification of multiple-instance clustering topologies where two or more nodes
share the same failover node. The secondary node will need enough hardware capabilities to
support the load of all N servers at any one time should they all fail over simultaneously. You
can achieve cost savings if multiple clusters use the same failover node. However, the cost of an
individual node tends to be small in comparison to other related clustering costs, such as
storage.
Many people consider N+1 to be more cost effective than multiple-instance clustering because
there is only one secondary node offline (or waiting) for several active nodes. However,
depending on the hardware configuration of the failover node, it does not offer the
performance of multiple-instance clustering. Use N+1 in environments where cost constraints
force you to reduce the number of failover nodes and you need high availability but not high
performance.
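The sizing requirement for the N+1 failover node can be stated as a simple check. The units and numbers here are purely illustrative (fractions of one node's capacity), not a real sizing methodology.

```python
def standby_can_absorb(standby_capacity, active_loads):
    """Check whether a single failover node could host every active
    instance at once -- the worst case an N+1 standby must be sized for.
    Loads and capacity are in the same arbitrary units."""
    return sum(active_loads) <= standby_capacity

# Three active nodes each at 40% load: a standby with 1.0 capacity falls short.
print(standby_can_absorb(1.0, [0.4, 0.4, 0.4]))  # → False
print(standby_can_absorb(1.5, [0.4, 0.4, 0.4]))  # → True
```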
N+M: In this topology you have two or more working nodes in a cluster along with two or more
standby nodes. It is typically configured in eight-node clusters with six working nodes for
every two standby nodes, or five working nodes for every three standby nodes.
N+M offers some of the cost benefits of N+1, but it has a lower chance of performance
degradation during a multiple failure event than N+1 since the failover node(s) do not have to
support the entire load of the failed nodes. Use N+M in environments where cost constraints
force you to reduce the number of failover nodes and at the same time provide a high level of
performance.
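To see why N+M degrades less than N+1 during a multiple failure, consider a toy model where the failed nodes' load is spread evenly across the standbys. Real placement follows per-resource-group preferred owners, so this is only an approximation.

```python
def per_standby_load(failed_loads, m_standby):
    """Approximate load each standby absorbs when the failed nodes'
    work is spread evenly across M standby nodes (toy model only)."""
    return sum(failed_loads) / m_standby

# Two nodes fail, each previously at 60% of one node's capacity:
print(per_standby_load([0.6, 0.6], 1))  # N+1: the lone standby must absorb 1.2
print(per_standby_load([0.6, 0.6], 2))  # N+M (M=2): each standby absorbs 0.6
```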
Clustering dependencies
SQL Server clustering has several dependencies:
Network
Hardware
Software
Network dependencies: Clustering requires a private network among all nodes in a cluster.
Clustering services use a private communication channel on each node to keep in sync with
each other. This allows the cluster to communicate and act appropriately even if the public
network is offline. LooksAlive and IsAlive checks -- used by the cluster service to determine
whether a cluster resource is "up" -- connect over the public network to best emulate a client
connection process.
Hardware dependencies: Clustering requires specialized hardware and software. And to share
resources between nodes, you need specialized disk controllers. Clustering hardware must be
certified by Microsoft to meet the requirements of clustering. And, you must have a second set
of network cards to provide the private network between cluster nodes.
Software dependencies: To benefit from clustering services, you need specialized versions of
the operating system (Windows 2000 and 2003 Enterprise or Data Center editions). You will
also need SQL Server 2000 Enterprise Edition, SQL Server 2005 Standard Edition (up to two
nodes) or SQL Server 2005 Enterprise Edition (up to eight nodes).
Clustering best practices
What follows is a list of clustering best practices. I have broken these down according to
dependencies.
Network best practices
There are two different and contradictory settings required for the public network and the
private network in clustering.
Private
Ensure the private network is private. Clustering requires ping response times under 150 ms. If
your private network is saturated or congested with other network traffic, you may find your
clusters failing over unexpectedly. On your private network, consider isolating traffic by
implementing a VLAN (virtual LAN) or a separate subnet, or by using a crossover cable for
single-instance clusters. The
actual traffic generated by cluster communication is small, so high-bandwidth networks are
unnecessary. However, they must still be low latency and reliable. Make sure the following
points are established on the private network:
Use TCP/IP as the only protocol bound to the NIC.
No default gateway is configured.
No DNS servers are configured unless the cluster nodes are DNS servers, in which case
127.0.0.1 should be configured.
No DNS registration or DNS suffix is configured.
No WINS servers are configured.
Static IP addresses are used for all nodes.
NetBIOS over TCP/IP is disabled.
No NIC teaming is used (where two or more network interface cards are aggregated to act as
a single NIC).
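Given the 150-ms response-time budget mentioned above, a qualification test for the private network could be as simple as checking heartbeat round-trip samples against that threshold. The sampling itself is assumed to come from your own ping tooling; the numbers below are made up.

```python
def heartbeat_healthy(rtt_samples_ms, threshold_ms=150.0):
    """Return True if every heartbeat round trip stayed under the
    threshold. In a real cluster the cluster service performs its own
    checks; this only shows the kind of test you could run when
    qualifying a private network."""
    return max(rtt_samples_ms) < threshold_ms

print(heartbeat_healthy([0.4, 0.6, 0.5]))    # quiet VLAN or crossover cable → True
print(heartbeat_healthy([0.5, 180.0, 0.6]))  # congested link → False
```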
Public
For your public network, use at least two WINS or DNS servers on your cluster network segment
or VLAN. While installing your cluster you will have to resolve cluster, DC (domain controller)
and virtual server names, so you must have a name server on your network. A name server close
to the cluster also decreases the time required for a node to fail over.
Use at least two DCs on your network. Clustering requires DCs not only during setup but also
for normal functioning and failover.
If you use NIC teaming for greater bandwidth throughput and reliability, do not configure it
while building the cluster. Add NIC teaming as a last step before final testing. Be prepared to
"undo" NIC teaming as an early step in troubleshooting. Microsoft Customer Support Services
(CSS) will likely direct you to disable teaming as a first diagnostic step, so be ready.
Both
Ensure that your network card settings are identical for every server in your cluster and that
they are not configured to automatically detect network settings.
Software best practices
Ensure applications are cluster aware and will not lose work or fail to meet the SLA during a
cluster failover.
Ensure transactions are as small as possible in your application and in any jobs that may run on
your clustered SQL Servers. Long-running transactions increase the length of time required to
apply the transaction log on the failover node and consequently increase the amount of time
for failover.
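One common way to keep transactions small is to commit in batches. The sketch below only shows the slicing; the per-batch commit is left to your data-access code, and the batch size is an arbitrary example.

```python
def batches(rows, batch_size=1000):
    """Yield slices of work to process in separate small transactions.

    Committing per batch, instead of in one huge transaction, keeps the
    active portion of the transaction log short, so recovery on the
    failover node has less work to redo or undo."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

work = list(range(2500))
sizes = [len(b) for b in batches(work)]
print(sizes)  # → [1000, 1000, 500]
```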
Do not run antivirus software on cluster nodes. If you must run antivirus software, be sure the
quorum disk and database files are excluded from the scans. Even in this configuration, there
have been reports of antivirus drivers interfering with cluster disk resource failover. Test your
setup and make sure it fails over as expected. Select another antivirus product if yours causes
problems.
Make sure there are no password expiration policies in use for any of the cluster-related
accounts. Cluster accounts should:
be the same for all nodes in the cluster;
have domain accounts (but not domain admin accounts) and have local administrative rights
on each node in the cluster. SQL Server 2005 forces you to set up domain-level groups for
these accounts and then grants appropriate rights to the groups.
have the least security privileges to minimize damage that could be done to the node or other
servers on your network should the password be compromised or the account be hijacked by
a buffer overflow.
Ensure all software components are the same version (e.g., SQL Server 2005 Standard Edition),
the same architecture (e.g., 64-bit for all OS and SQL Server components) and at the same
service pack and hot fix level. The exception is that individual SQL Server instances can be at
different releases, editions and hot fix levels.
Ensure all external software dependencies (COM components, file paths, binaries) are either
cluster aware or installed on all nodes in a cluster. MSDTC (Microsoft Distributed Transaction
Coordinator) is the most common external dependency in a cluster. While it is not necessary,
many people install it before installing SQL Server because installing it later is much harder.
When installing a cluster, consider installing a single-node cluster and adding nodes to the
cluster as required. This way, if the cluster setup fails while adding a single node, you are left
with a working cluster (although it could be a single-node cluster).
When applying hot fixes or service packs that require a system reboot, apply them to the primary
(current instance host), fail over to the secondary, reboot the primary, fail back to the primary
and reboot the secondary. Typically hot fixes and service packs are cluster aware and install on
all cluster nodes simultaneously.
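The patching sequence above can be captured as an ordered checklist. The node names are hypothetical, and this is only a planning aid, not a substitute for your change-control procedure.

```python
def rolling_patch_plan(primary, secondary):
    """Ordered steps for applying a reboot-requiring hot fix to a
    two-node cluster, following the procedure described above."""
    return [
        f"apply hot fix ({primary} currently hosts the instance)",
        f"fail over the instance to {secondary}",
        f"reboot {primary}",
        f"fail back the instance to {primary}",
        f"reboot {secondary}",
    ]

plan = rolling_patch_plan("NODE1", "NODE2")
for step in plan:
    print(step)
```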
Hardware
Ensure that your cluster is approved by the vendor and that it is part of the Microsoft Windows
Catalog with a specific endorsement for clustering.
Ensure each node in your cluster has identical hardware and components.
Regularly check vendor Web sites for potential hardware problems, fixes and BIOS patches for
each component in your cluster.

Use the appropriate RAID technology to ensure that your disk array is fault tolerant. Be as
proactive as possible in replacing failed or marginal disks. A disk failure will put a greater load
on the remaining disks in an array and may cause other marginal disks to fail. Depending on
your RAID technology, your RAID array may not be tolerant of more than one disk failure per
array.
Ensure you have properly conditioned or charged batteries on any array controller. This prevents
data loss or corruption in the event of a power failure.
Use uninterruptible power supplies (UPS) and be sure you have redundancy in your power supplies.
Use Hot-Add Memory if it's supported by your SQL Server version, operating system and
hardware. Hot-Add Memory is a hardware technology that allows you to add memory to a
running system; the OS detects and uses the additional memory. Windows Server 2003
Enterprise and Data Center Editions, as well as SQL Server 2005 Enterprise Edition, can take
advantage of Hot-Add Memory. Read about Hot-Add Memory Support in Windows Server 2003.
Use ECC (error-correcting code) memory, which stores parity information used to reconstruct
the original data when errors are detected in data held in memory.
Use fault-tolerant NICs and network devices (switches).
Summary
Clustering is a relatively new technology and has a reputation for being fragile. SQL Server 2000
clustering is far simpler than the earlier versions and has proven to be much more reliable.
Today, clustering on SQL Server 2000 and SQL Server 2005 is a highly reliable technology, but it
still has many dependencies that can prevent it from meeting your high-availability goals. Foremost
among these dependencies is a staff that is trained and knowledgeable. Running a close second
is having operating processes and procedures that are designed to work specifically with a SQL
Server cluster. Ensure that you address all of your clustering dependencies to deliver high
availability with SQL Server clustering.