8/3/2019 SQL Server Clustering Best Practices
http://slidepdf.com/reader/full/sql-server-clustering-best-practices 1/6
SQL Server clustering best practices
SQL Server clustering is a high-availability technology for SQL Server instances. It involves the
sharing of server resources between one or more nodes (or servers), which have one or more
shared disks grouped into logical units called resource groups. A resource group containing at
least one IP address, network name and disk resource is called a virtual server. The cluster
service arbitrates ownership of the resource groups. Only one node can own a resource group
and its associated resources at any given time.
Clustering basics
Each virtual server appears on the network as a complete system. When the virtual server
contains SQL Server resources, clients connected to the virtual server access resources on its
current host node. While the terms "active" and "passive" are often used in this context, they
are not fixed roles, as all nodes in a cluster are interchangeable. Should the current host,
sometimes designated as the primary, fail, the resource group will be transferred to another
node (secondary node) in the cluster. With clusters having more than two nodes or two
instances, it is important to set failover order by choosing the preferred node ownership order
for each instance. The secondary will become the primary and host the virtual server. Active
client connections will be broken during failover, but they can reconnect to the virtual server
now hosted by the new node. Unless the application reconnects automatically, clients will have
to reconnect manually, and uncommitted work in progress will be lost during the failover. Most
commercial applications now handle this reconnection task seamlessly.
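The reconnect-on-failover behavior described above can be sketched as a simple retry loop. This is only an illustration: `fake_connect` stands in for whatever connect call your driver provides, and the retry counts and delays are hypothetical.

```python
import time

def connect_with_retry(connect, retries=5, delay=0.01):
    """Attempt to (re)connect to the virtual server, retrying while
    failover completes. `connect` is any zero-argument callable that
    raises ConnectionError until the new node is hosting the instance."""
    last_err = None
    for attempt in range(retries):
        try:
            return connect()
        except ConnectionError as err:
            last_err = err
            time.sleep(delay * (attempt + 1))  # back off a little more each try
    raise last_err

# Simulated failover: the first two attempts fail, then the new node answers.
attempts = {"n": 0}
def fake_connect():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("virtual server not yet online")
    return "connected"

result = connect_with_retry(fake_connect)
print(result)  # → connected
```

A real application would wrap a loop like this around its driver's connection call and re-run any transaction that was in flight, since that work was lost at failover.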
The goal of clustering is to provide increased availability to clients by having a hot standby
system with an automatic failover mechanism. SQL Server clustering is not a load-sharing or
scale-out technology. During a failure there will be a brief interruption of database service on
any cluster. On large clusters with multiple nodes and instances, clients may experience
degraded performance during a failure event, but they will not lose database availability.
Clustering topologies
There are four types of cluster topologies -- or arrangements of nodes in a cluster:
Single instance
Multi-instance
N+1
N+M
Single instance: In this case, one node in a cluster owns all resource groups at any one time and
the other nodes are offline. Should the primary node owning the resources fail, the resource
groups will be transferred to the secondary node, which comes online. While the secondary
node comes online, it will assume ownership of the resource groups, which typically consist of
the disks containing your database files and transaction logs. SQL Server then starts up on the
virtual server and recovers the databases, rolling committed transactions in the transaction
log forward and rolling uncommitted transactions back.
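The redo/undo step can be illustrated with a toy log. This only mirrors the semantics; real recovery operates on pages and log sequence numbers, not Python lists.

```python
def recover(log):
    """Toy crash-recovery pass over a transaction log.

    Each record is (txid, op). Transactions with a COMMIT record are
    redone (rolled forward); transactions still in flight at failover
    are undone (rolled back)."""
    committed = {txid for txid, op in log if op == "COMMIT"}
    redo = [txid for txid, op in log if op == "WRITE" and txid in committed]
    undo = [txid for txid, op in log if op == "WRITE" and txid not in committed]
    return redo, undo

# Transaction 1 committed before the failure; transaction 2 was in flight.
log = [(1, "WRITE"), (2, "WRITE"), (1, "COMMIT")]
redo, undo = recover(log)
print(redo, undo)  # → [1] [2]
```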
This topology was formerly called active-passive. Single-instance clustering is most frequently
used for mission-critical applications, where the cost of downtime far outweighs the cost of the
wasted hardware resources of the secondary node sitting idle while offline.
Multiple instance: In this situation, one virtual server in a cluster owns some of the resource
groups and another virtual server owns other resource groups. At any one time, the virtual
servers themselves can be hosted by a single node or different nodes and would appear to
clients as named instances of a single server. In that case, they are named instances of a virtual
server, hence the name multiple instance. With multiple-instance clustering, previously called
active-active, the hardware requirements of each individual node are greater as each node may
at any one time be hosting two (or more) virtual servers.
You should consider multiple-instance clusters to be more cost effective than single-instance
clusters as there are no nodes offline or waiting. However, should one node host more than
one virtual server, performance for clients is typically degraded. Your best bet is to use
multiple instances when you require high availability but not high performance.
N+1: This is a modification of multiple-instance clustering topologies where two or more nodes
share the same failover node. The secondary node will need enough hardware capabilities to
support the load of all N servers at any one time should they all fail over simultaneously. You
can achieve cost savings if multiple clusters use the same failover node. However, the cost of an
individual node tends to be small in comparison to other related clustering costs, such as
storage.
Many people consider N+1 to be more cost effective than multiple-instance clustering because
there is only one secondary node offline (or waiting) for several active nodes. However,
depending on the hardware configuration of the failover node, it does not offer the
performance of multiple-instance clustering. Use N+1 in environments where cost constraints
force you to reduce the number of failover nodes and you need high availability but not high
performance.
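The sizing requirement for the N+1 failover node can be stated as a simple check. The units and numbers here are purely illustrative (fractions of one node's capacity), not a real sizing methodology.

```python
def standby_can_absorb(standby_capacity, active_loads):
    """Check whether a single failover node could host every active
    instance at once -- the worst case an N+1 standby must be sized for.
    Loads and capacity are in the same arbitrary units."""
    return sum(active_loads) <= standby_capacity

# Three active nodes each at 40% load: a standby with 1.0 capacity falls short.
print(standby_can_absorb(1.0, [0.4, 0.4, 0.4]))  # → False
print(standby_can_absorb(1.5, [0.4, 0.4, 0.4]))  # → True
```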
N+M: In this topology you have two or more working nodes in a cluster along with two or more
standby nodes. It is typically configured in eight-node clusters with six working nodes for
every two standby nodes, or five working nodes for every three standby nodes.
N+M offers some of the cost benefits of N+1, but it has a lower chance of performance
degradation during a multiple failure event than N+1 since the failover node(s) do not have to
support the entire load of the failed nodes. Use N+M in environments where cost constraints
force you to reduce the number of failover nodes and at the same time provide a high level of
performance.
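To see why N+M degrades less than N+1 during a multiple failure, consider a toy model where the failed nodes' load is spread evenly across the standbys. Real placement follows per-resource-group preferred owners, so this is only an approximation.

```python
def per_standby_load(failed_loads, m_standby):
    """Approximate load each standby absorbs when the failed nodes'
    work is spread evenly across M standby nodes (toy model only)."""
    return sum(failed_loads) / m_standby

# Two nodes fail, each previously at 60% of one node's capacity:
print(per_standby_load([0.6, 0.6], 1))  # N+1: the lone standby must absorb 1.2
print(per_standby_load([0.6, 0.6], 2))  # N+M (M=2): each standby absorbs 0.6
```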
Clustering dependencies
SQL Server clustering has several dependencies:
Network
Hardware
Software
Network dependencies: Clustering requires a private network among all nodes in a cluster.
Clustering services use a private communication channel on each node to keep in sync with
each other. This allows the cluster to communicate and act appropriately even if the public
network is offline. LooksAlive and IsAlive checks -- used by the cluster service to determine
whether a cluster resource is "up" -- connect over the public network to best emulate a client
connection process.
Hardware dependencies: Clustering requires specialized hardware and software. And to share
resources between nodes, you need specialized disk controllers. Clustering hardware must be
certified by Microsoft to meet the requirements of clustering. And, you must have a second set
of network cards to provide the private network between cluster nodes.
Software dependencies: To benefit from clustering services, you need specialized versions of
the operating system (Windows 2000 and 2003 Enterprise or Data Center editions). You will
also need SQL Server 2000 Enterprise Edition, SQL Server 2005 Standard Edition (up to two
nodes) or SQL Server 2005 Enterprise Edition (up to eight nodes).
Clustering best practices
What follows is a list of clustering best practices. I have broken these down according to
dependencies.
Network best practices
There are two different and contradictory settings required for the public network and the
private network in clustering.
Private
Ensure the private network is private. Clustering requires ping response times under 150 ms. If
your private network is saturated or congested with other network traffic, you may find your
clusters failing over unexpectedly. On your private network, consider isolating traffic by
implementing a VLAN (virtual LAN) or a separate subnet, or by using a crossover cable for
single-instance clusters. The
actual traffic generated by cluster communication is small, so high-bandwidth networks are
unnecessary. However, they must still be low latency and reliable. Make sure the following
points are established on the private network:
Use TCP/IP as the only protocol bound to the NIC.
No default gateway is configured.
No DNS servers are configured unless the cluster nodes are DNS servers, in which case
127.0.0.1 should be configured.
No DNS registration or DNS suffix is configured.
No WINS servers are configured.
Static IP addresses are used for all nodes.
NetBIOS over TCP/IP is disabled.
No NIC teaming is used (where two or more network interface cards are aggregated to act as
a single NIC).
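Given the 150-ms response-time budget mentioned above, a qualification test for the private network could be as simple as checking heartbeat round-trip samples against that threshold. The sampling itself is assumed to come from your own ping tooling; the numbers below are made up.

```python
def heartbeat_healthy(rtt_samples_ms, threshold_ms=150.0):
    """Return True if every heartbeat round trip stayed under the
    threshold. In a real cluster the cluster service performs its own
    checks; this only shows the kind of test you could run when
    qualifying a private network."""
    return max(rtt_samples_ms) < threshold_ms

print(heartbeat_healthy([0.4, 0.6, 0.5]))    # quiet VLAN or crossover cable → True
print(heartbeat_healthy([0.5, 180.0, 0.6]))  # congested link → False
```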
Public
For your public network, use at least two WINS or DNS servers on your cluster network segment
or VLAN. While installing your cluster you will have to resolve cluster, DC (domain controller)
and virtual server names, so you must have a name server on your network. A name server close
to the cluster also decreases the time required for a node to fail over.
Use at least two DCs on your network. Clustering requires DCs not only during setup but also
for normal functioning and failover.
If you use NIC teaming for greater bandwidth throughput and reliability, do not configure it
while building the cluster. Add NIC teaming as a last step before final testing. Be prepared to
"undo" NIC teaming as an early step in troubleshooting. Microsoft Customer Support Services
(CSS) will likely direct you to disable teaming as a first diagnostic step, so be ready.
Both
Ensure that your network card settings are identical for every server in your cluster and that
they are not configured to automatically detect network settings.
Software best practices
Ensure applications are cluster aware and will not lose work or fail to meet the SLA during a
cluster failover.
Ensure transactions are as small as possible in your application and in any jobs that may run on
your clustered SQL Servers. Long-running transactions increase the length of time required to
apply the transaction log on the failover node and consequently increase the amount of time
for failover.
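One common way to keep transactions small is to commit in batches. The sketch below only shows the slicing; the per-batch commit is left to your data-access code, and the batch size is an arbitrary example.

```python
def batches(rows, batch_size=1000):
    """Yield slices of work to process in separate small transactions.

    Committing per batch, instead of in one huge transaction, keeps the
    active portion of the transaction log short, so recovery on the
    failover node has less work to redo or undo."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

work = list(range(2500))
sizes = [len(b) for b in batches(work)]
print(sizes)  # → [1000, 1000, 500]
```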
Do not run antivirus software on cluster nodes. If you must run antivirus software, be sure the
quorum disk and database files are excluded from the scans. Even in this configuration, there
have been reports of antivirus drivers interfering with cluster disk resource failover. Test your
setup and make sure it fails over as expected. Select another antivirus product if yours causes
problems.
Make sure there are no password expiration policies in use for any of the cluster-related
accounts. Cluster accounts should:
be the same for all nodes in the cluster;
have domain accounts (but not domain admin accounts) and have local administrative rights
on each node in the cluster. SQL Server 2005 forces you to set up domain-level groups for
these accounts and then grants appropriate rights to the groups.
have the least security privileges to minimize damage that could be done to the node or other
servers on your network should the password be compromised or the account be hijacked by
a buffer overflow.
Ensure all software components are the same version (e.g., SQL Server 2005 Standard Edition),
the same architecture (e.g., 64-bit for all OS and SQL Server components) and at the same
service pack and hot fix level. The exception is that individual SQL Server instances can be at
different releases, editions and hot fix levels.
Ensure all external software dependencies (COM components, file paths, binaries) are either
cluster aware or installed on all nodes in a cluster. MSDTC (Microsoft Distributed Transaction
Coordinator) is the most common external dependency in a cluster. While it is not necessary,
many people install it before installing SQL Server because installing it later is much harder.
When installing a cluster, consider installing a single-node cluster and adding nodes to the
cluster as required. This way, if the cluster setup fails while adding a single node, you are left
with a working cluster (although it could be a single-node cluster).
When applying hot fixes or service packs that require a system reboot, apply them to the primary
(current instance host), fail over to the secondary, reboot the primary, fail back to the primary
and reboot the secondary. Typically hot fixes and service packs are cluster aware and install on
all cluster nodes simultaneously.
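The patching sequence above can be captured as an ordered checklist. The node names are hypothetical, and this is only a planning aid, not a substitute for your change-control procedure.

```python
def rolling_patch_plan(primary, secondary):
    """Ordered steps for applying a reboot-requiring hot fix to a
    two-node cluster, following the procedure described above."""
    return [
        f"apply hot fix ({primary} currently hosts the instance)",
        f"fail over the instance to {secondary}",
        f"reboot {primary}",
        f"fail back the instance to {primary}",
        f"reboot {secondary}",
    ]

plan = rolling_patch_plan("NODE1", "NODE2")
for step in plan:
    print(step)
```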
Hardware
Ensure that your cluster is approved by the vendor and that it is part of the Microsoft Windows
Catalog with a specific endorsement for clustering.
Ensure each node in your cluster has identical hardware and components.
Regularly check vendor Web sites for potential hardware problems, fixes and BIOS patches for
each component in your cluster.

Use the appropriate RAID technology to ensure that your disk array is fault tolerant. Be as
proactive as possible in replacing failed or marginal disks. A disk failure will put a greater load
on the remaining disks in an array and may cause other marginal disks to fail. Depending on
your RAID technology, your RAID array may not be tolerant of more than one disk failure per
array.
Ensure you have properly conditioned or charged batteries on any array controller. This prevents
data loss or corruption in the event of a power failure.
Use uninterruptible power supplies (UPS) and be sure you have redundancy in your power supplies.
Use Hot-Add Memory if it's supported by your SQL Server version, operating system and
hardware. Hot-Add Memory is a hardware technology that allows you to add memory to a
running system; the OS detects and uses the additional memory. Windows Server 2003
Enterprise and Data Center Editions, as well as SQL Server 2005 Enterprise Edition, can take
advantage of Hot-Add Memory. Read about Hot-Add Memory Support in Windows Server 2003.
Use ECC (error-correcting code) memory, which stores parity information used to reconstruct
the original data when errors are detected in data held in memory.
Use fault-tolerant NICs and network devices (switches).
Summary
Clustering is a relatively new technology and has a reputation for being fragile. SQL Server 2000
clustering is far simpler than the earlier versions and has proven to be much more reliable.
Today, clustering on SQL Server 2000 and SQL Server 2005 is a highly reliable technology, but it
still has many dependencies that can prevent it from meeting your high-availability goals. Foremost
among these dependencies is a staff that is trained and knowledgeable. Running a close second
is having operating processes and procedures that are designed to work specifically with a SQL
Server cluster. Ensure that you address all of your clustering dependencies to deliver high
availability with SQL Server clustering.