SYMANTEC SOLUTIONS DEPLOYMENT GUIDES

High Availability and Performance Oracle Configuration with Flexible Shared Storage in a SAN-Free Environment using Intel SSDs

Author: Carlos Carrero, Technical Product Manager
10th December 2013, Version 7


Table of Contents

Introduction
Setup
    Hardware
    Software
    Architecture
Deployment Steps
InfiniBand and RDMA Setup
    Linux Drivers
    Packages installation
    Configuring RDMA over InfiniBand
    Configure IP addresses
    Enable Max Performance on CPUs
Storage Foundation Cluster File System HA 6.1 Installation
    Packages Deployment
    Configure SFCFSHA
    Verify the configuration
    Fencing Configuration
        Password-less ssh with CP Servers
        Configuration
Intel SSD Configuration
    Drives configuration
    Tuning SSD performance
        Node 1 Setup
        Node 2 Setup
Volumes and File Systems Configuration
    Initialize and rename the internal SSD devices
    Make internal SSD devices available to the cluster
    Create a File System for redo logs
    Create a File System for Data
Oracle Configuration and Tuning
    Installing Oracle binaries in each node
    Instance configuration
    Oracle Disk Manager configuration
    Oracle redo log configuration
    Oracle huge pages configuration
Oracle HA and Fast Failover Configuration
    Oracle agent configuration in VCS
    Remove previous HA configuration for the mount points
    Service Groups and Resources for Oracle HA
        Service Group tpcc_data
        Service Group tpcc_instance
    Fast Failover Setting


Introduction

This document is a step-by-step guide to building a high-availability, high-performance environment for Oracle databases without the need for SAN storage. For this purpose, Symantec Cluster File System High Availability (SFCFSHA) 6.1 will be used with the new Flexible Storage Sharing (FSS) feature. FSS allows clustering up to 8 nodes without requiring shared storage, while providing high performance and full protection to both mission-critical data and applications. To provide a high number of transactions per second and accelerate performance, internal Solid State Drives from Intel will be used. FSS within SFCFSHA will be used to mirror the internal data across servers, ensuring two copies of the data (one in each server) at all times.

It is important to note that, while in this documentation FSS is used with internal storage only, FSS does not limit the existing capabilities of using either Direct Attached Storage or SAN for larger capacity. FSS within SFCFSHA can create hybrid models by using both internal and shared storage.

This documentation does not replace the Installation and Administration Guides, which should be consulted for further information. The steps presented here are intended only as a guide for the very specific configuration described below.

Setup

Hardware

2 x Sandy Bridge generation servers. Each of the server nodes contains:

- 3 x Intel SSD DC S3700 (800GB) for data

- 2 x Intel SSD DC S3700 (200GB) for redo logs

- 512 GB Memory

- 40 x CPUs (2.3 GHz)

- 1 x Mellanox 56Gb/s NIC card

Software

- Symantec Storage Foundation Cluster File System 6.1 GA

- Oracle 11gR2 single instance

- RedHat Enterprise Linux 6.3


Architecture

The configuration consists of two servers using only Intel SSDs as internal storage. There are two file systems, one for redo logs and another for data files, both accessible from the two nodes of the cluster. They are based on two clustered volumes that are striped and mirrored across the two servers. Every write is made in parallel to the two servers, while reads are served locally. The two servers are interconnected using one InfiniBand 56Gb/s link. Oracle single instance will be used together with the Fast Failover capability of Symantec Cluster Server, so it can be restarted in a few seconds on the other node in case of failure. Three Coordination Point (CP) Servers are used to handle arbitration in case a split brain occurs.

Figure 1

Deployment Steps

Figure 2 outlines the different steps needed to complete the deployment. The HW setup and RHEL installation will not be covered in this document. For the HW setup, a single cable has been used to connect the two InfiniBand cards. If there are more than two nodes, an InfiniBand switch should be used. The Intel SSDs have been plugged into the internal server slots.

In the RDMA configuration section, the drivers needed for the specific RedHat version used will be listed, along with information on how to configure InfiniBand and RDMA. Once the new IB interfaces appear in the system, SFCFSHA can be installed and configured. The IB link will be used for both heartbeat and I/O shipping across nodes. The public interface will be used as a lower-priority link.

Once RDMA is configured, there is some specific tuning that will be outlined in order to get the best

performance from the Intel SSDs.

The next step will be to configure the volumes and clustered file systems for both redo logs and data, and to make them available across the two server nodes.

The Oracle section will not cover a full Oracle deployment, but will just highlight the specific tuning and settings that have been applied for this setup.


Finally, Oracle will be configured within Symantec Cluster Server and Fast Failover capability will be

enabled.

Figure 2

InfiniBand and RDMA Setup

New to Symantec Cluster File System HA 6.1 is Low Latency Transport (LLT) and GAB support for high-speed interconnects using RDMA technology over InfiniBand or Ethernet (RoCE). LLT maintains two channels (RDMA and non-RDMA) for each link. The RDMA channel is mainly used for data transfer, and the non-RDMA channel, created over UDP, is used for sending and receiving heartbeats.

As described above, one Mellanox InfiniBand 56Gb/s card is going to be used in each server. For a production environment, two cards per server are recommended so that a single point of failure is avoided.

Note that SFCFSHA uses the native Linux drivers to manage the Host Channel Adapter (HCA), so it is important to install the native drivers and not the ones provided by the HW vendor. This will be the first step to be performed.

Linux Drivers

As highlighted, it is important to install and configure the native Linux drivers and not the ones provided by Mellanox. Symantec does not yet support any external Mellanox OFED packages. Each of the servers needs to have these package versions (or higher) installed:

librdmacm-1.0.10-2.el6.x86_64.rpm

librdmacm-devel-1.0.10-2.el6.x86_64.rpm


librdmacm-utils-1.0.10-2.el6.x86_64.rpm

rdma-1.0-9.el6.noarch.rpm

libmthca-1.0.5-7.el6.x86_64.rpm

libmlx4-1.0.1-7.el6.x86_64.rpm

opensm-3.3.5-1.el6.x86_64.rpm

opensm-libs-3.3.5-1.el6.x86_64.rpm

libibumad-1.3.4-1.el6.x86_64.rpm

libibumad-devel-1.3.4-1.el6.x86_64.rpm

ibutils-1.5.4-3.el6.x86_64.rpm

infiniband-diags

perftest-1.2.3-3.el6.x86_64.rpm

libibverbs-1.1.4-2.el6.x86_64.rpm

libibverbs-devel-1.1.4-2.el6.x86_64.rpm

libibverbs-utils-1.1.4-2.el6.x86_64.rpm

The list presented here is specific to the RedHat distribution used during this configuration. For a more generic list, including SUSE packages, please refer to the section “Using LLT over RDMA” of the Installation Guide.
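A quick way to check which of these packages are already present is rpm (a simple verification step, not part of the original guide):

# rpm -q librdmacm librdmacm-devel librdmacm-utils rdma libmthca libmlx4 opensm opensm-libs libibumad libibumad-devel ibutils infiniband-diags perftest libibverbs libibverbs-devel libibverbs-utils

Any package reported as “not installed” can then be added in the next step.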

Packages installation

During this configuration, yum is going to be used to deploy the packages needed:

# yum install librdmacm (it was already installed in our RH distribution)

# yum install librdmacm-devel

# yum install librdmacm-utils

# yum install rdma

# yum install libmthca

# yum install libmlx4

# yum install opensm (this also installs opensm-libs)

# yum install libibumad (it was already installed in our RH distribution)

# yum install libibumad-devel

# yum install ibutils

# yum install infiniband-diags

# yum install perftest

# yum install libibverbs (it was already installed in our RH distribution)

# yum install libibverbs-devel (it was already installed in our RH distribution)

# yum install libibverbs-utils
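If preferred, the same set of packages can be installed with a single yum command (equivalent to the individual steps above; yum skips anything already installed):

# yum install librdmacm librdmacm-devel librdmacm-utils rdma libmthca libmlx4 opensm libibumad libibumad-devel ibutils infiniband-diags perftest libibverbs libibverbs-devel libibverbs-utils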

Configuring RDMA over InfiniBand

The InfiniBand interfaces are not visible by default until InfiniBand drivers are loaded:


# modprobe rdma_cm

# modprobe rdma_ucm

# modprobe mlx4_en

# modprobe mlx4_ib

# modprobe ib_mthca

# modprobe ib_ipoib

# modprobe ib_umad

Drivers loaded:

# lsmod | egrep "ib|rdma|mlx4"

ib_umad 12122 0

ib_ipoib 77230 0

ib_mthca 137429 0

rdma_ucm 13433 0

ib_uverbs 36269 1 rdma_ucm

rdma_cm 35253 1 rdma_ucm

ib_cm 37028 2 ib_ipoib,rdma_cm

iw_cm 8740 1 rdma_cm

ib_sa 22854 4 ib_ipoib,rdma_ucm,rdma_cm,ib_cm

ib_addr 6091 1 rdma_cm

mlx4_ib 55056 0

mlx4_en 70097 0

mlx4_core 177697 2 mlx4_ib,mlx4_en

ipv6 322541 198 ib_ipoib,ib_addr

ib_mad 40544 5 ib_umad,ib_mthca,ib_cm,ib_sa,mlx4_ib

ib_core 74343 11

ib_umad,ib_ipoib,ib_mthca,rdma_ucm,ib_uverbs,rdma_cm,ib_cm,iw_cm,ib_sa,mlx4_ib,ib_

mad

[root@intel-eva2 ~]#

In order to load the drivers at boot time, modify the /etc/rdma/rdma.conf file on the operating system

with the following values:

ONBOOT=yes

RDMA_UCM_LOAD=yes

MTHCA_LOAD=yes

IPOIB_LOAD=yes

SDP_LOAD=yes

MLX4_LOAD=yes

MLX4_EN_LOAD=yes

Enable RDMA service:

# chkconfig --level 235 rdma on
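The rdma service can also be started right away, without waiting for a reboot (assuming the init script shipped with the rdma package):

# /etc/init.d/rdma start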


Start OpenSM:

# /etc/init.d/opensm start

Enable the Linux service to start OpenSM automatically after a restart:

# chkconfig --level 235 opensm on

Apply all the previous steps to the other node(s) in the cluster.

Configure IP addresses

Once the drivers are configured in both nodes, it is time to configure IP addresses for the IB interface we

are planning to use.

Verify the IB configuration at each node. First node:

[root@intel-eva1 ~]# ibstat

CA 'mlx4_0'

CA type: MT4099

Number of ports: 1

Firmware version: 2.10.600

Hardware version: 0

Node GUID: 0x0002c90300193bd0

System image GUID: 0x0002c90300193bd3

Port 1:

State: Initializing

Physical state: LinkUp

Rate: 56

Base lid: 0

LMC: 0

SM lid: 0

Capability mask: 0x02514868

Port GUID: 0x0002c90300193bd1

Link layer: InfiniBand

[root@intel-eva1 ~]#

Second node:

[root@intel-eva2 ~]# ibstat

CA 'mlx4_0'

CA type: MT4099

Number of ports: 1

Firmware version: 2.11.1140

Hardware version: 0

Node GUID: 0x0002c90300365bc0

System image GUID: 0x0002c90300365bc3

Port 1:


State: Initializing

Physical state: LinkUp

Rate: 56

Base lid: 0

LMC: 0

SM lid: 0

Capability mask: 0x02514868

Port GUID: 0x0002c90300365bc1

Link layer: InfiniBand

[root@intel-eva2 ~]#

Now ifconfig shows the ib0 interface in our configuration:

ib0 Link encap:InfiniBand HWaddr

80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00

UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:256

RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

The IP addresses for the internal interconnect that will be used in this setup are:

Node 1 (intel-eva1)

Link0: 192.168.27.1

Node 2 (intel-eva2)

Link0: 192.168.27.2

In order to configure an IP address, modify the file /etc/sysconfig/network-scripts/ifcfg-ib0 in

each of the nodes:

Node 1:

DEVICE="ib0"

BOOTPROTO="static"

#DHCP_HOSTNAME="intel-eva1"

HWADDR="80:00:00:48:FE:80:00:00:00:00:00:00:00:02:C9:03:00:19:3B:D1"

NM_CONTROLLED="no"

ONBOOT="yes"

TYPE="InfiniBand"

UUID="66a74acb-21d7-47b3-9b80-d57af0cab53c"

IPADDR=192.168.27.1

NETMASK=255.255.255.0

NETWORK=192.168.27.0

BROADCAST=192.168.27.255


Node 2:

DEVICE="ib0"

BOOTPROTO="static"

DHCP_HOSTNAME="intel-eva2"

HWADDR="80:00:00:48:FE:80:00:00:00:00:00:00:00:02:C9:03:00:36:5B:C1"

NM_CONTROLLED="no"

ONBOOT="yes"

TYPE="InfiniBand"

UUID="84fdd382-fb56-471d-93c6-ed70bb4224aa"

IPADDR=192.168.27.2

NETMASK=255.255.255.0

NETWORK=192.168.27.0

BROADCAST=192.168.27.255

And restart the network service with

# service network restart

We can see that the IP is now up on the ib0 interface of node 1.

ib0 Link encap:InfiniBand HWaddr

80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00

inet addr:192.168.27.1 Bcast:192.168.27.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:256

RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

Output of ifconfig for node 2:

ib0 Link encap:InfiniBand HWaddr

80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00

inet addr:192.168.27.2 Bcast:192.168.27.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:256

RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

Verify ping is working:

[root@intel-eva1 ~]# ping 192.168.27.2

PING 192.168.27.2 (192.168.27.2) 56(84) bytes of data.


64 bytes from 192.168.27.2: icmp_seq=1 ttl=64 time=0.270 ms

64 bytes from 192.168.27.2: icmp_seq=2 ttl=64 time=0.274 ms

Also check the connection using ibping. On one of the servers run:

[root@intel-eva1]# ibping -S

Note that it will not provide any response. From the other node, run ibping -G using the Port GUID provided by the ibstat command (see previous output):

[root@intel-eva2]# ibping -G 0x0002c90300193bd1

Pong from intel-eva1.(none) (Lid 2): time 0.506 ms

Pong from intel-eva1.(none) (Lid 2): time 0.549 ms

Notice the high latency reported by the ibping command. It should be under 30us, which indicates that the system needs some tuning.
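For a more precise latency measurement, the perftest tools installed earlier can be used (a sketch using the IPs of this setup; run the first command on one node and the second on the other):

[root@intel-eva1]# ib_send_lat
[root@intel-eva2]# ib_send_lat 192.168.27.1

The client side reports minimum, average and maximum latencies, which can be compared before and after the tuning described below.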

Enable Max Performance on CPUs

In order to get the best performance out of the IB or RoCE interconnect, the “Max Performance” profile needs to be enabled on the servers. Depending on your server BIOS, this can be achieved in different ways.

In this specific setup, the steps are:

- Restart the system and enter the BIOS settings.

- Go to BIOS menu > Launch System setup > BIOS settings > System Profile Settings > System

Profile > Max performance

On the Intel servers used in this configuration, this is how it looks after enabling “Turbo Boost Technology”:

The CPUs need to run at their maximum clock rate to get the best performance. Initially that is not the case with the default installation. Below it can be seen that the nominal CPU speed is 2.40GHz but it is running at about 1GHz.


[root@intel-eva1]# cat /proc/cpuinfo | more

processor : 0

vendor_id : GenuineIntel

cpu family : 6

model : 47

model name : Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz

stepping : 2

cpu MHz : 1064.000

cache size : 30720 KB

physical id : 0

In order to fix that, edit the file /boot/grub/grub.conf and add the following to the boot parameters:

intel_idle.max_cstate=0 processor.max_cstate=1

Perform that operation on both servers and reboot them. This is how the grub.conf file looks in this particular setup:

[root@intel-eva1]# cat /boot/grub/grub.conf

# grub.conf generated by anaconda

#

# Note that you do not have to rerun grub after making changes to this file

# NOTICE: You have a /boot partition. This means that

# all kernel and initrd paths are relative to /boot/, eg.

# root (hd0,0)

# kernel /vmlinuz-version ro root=/dev/sda2

# initrd /initrd-[generic-]version.img

#boot=/dev/sda

default=0

timeout=5

splashimage=(hd0,0)/grub/splash.xpm.gz

hiddenmenu

title Red Hat Enterprise Linux (2.6.32-279.el6.x86_64)

root (hd0,0)

kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=UUID=20d1e2a7-b36c-447a-

a35b-6c30f636cbc2 rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8

console=tty0 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto console=tty0

console=ttyS0,19200 rd_NO_LVM rd_NO_DM rhgb quiet intel_idle.max_cstate=0

processor.max_cstate=1

initrd /initramfs-2.6.32-279.el6.x86_64.img

[root@intel-eva1 grub]#
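As an alternative to editing grub.conf by hand, the grubby utility can append the same parameters to every installed kernel (an equivalent approach, not the one used in this setup):

# grubby --update-kernel=ALL --args="intel_idle.max_cstate=0 processor.max_cstate=1"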

Additionally, make sure that each CPU uses “performance” as the scaling governor. This will provide the best performance:

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Power_Management_Guide/cpufreq_setup.html


The servers used in this setup do not have the cpupower package installed, so we changed the setting manually. Verify the available governors:

[root@intel-eva1]# cat

/sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors

ondemand userspace performance

And manually set performance for each cpu:

[root@intel-eva1 cpufreq]# for i in `ls

/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor`

> do

> echo $i

> echo performance > $i

> done

And verify that the settings take effect:

[root@intel-eva1 cpufreq]# cat /proc/cpuinfo | grep "cpu MHz"

cpu MHz : 2395.000

cpu MHz : 2395.000

cpu MHz : 2395.000

cpu MHz : 2395.000

cpu MHz : 2395.000
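Note that the governor value written through sysfs does not persist across reboots. One simple option (an assumption for this setup, not part of the original steps) is to append the same loop to /etc/rc.local on both nodes:

# cat >> /etc/rc.local << 'EOF'
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
    echo performance > $i
done
EOF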

Storage Foundation Cluster File System HA 6.1 Installation

Packages Deployment

In order to present some of the new features of the installer, and to provide a complete reference, the steps taken to deploy SFCFSHA 6.1 for this particular configuration are noted here.

As usual, run the installer script.

[root@intel-eva1 rhel6_x86_64]# ./installer

Storage Foundation and High Availability Solutions 6.1 Install Program

Symantec Product Version Installed on intel-eva1 Licensed

================================================================================

Symantec Licensing Utilities (VRTSvlic) are not installed due to which products

and licenses are not discovered.


Use the menu below to continue.

Task Menu:

P) Perform a Pre-Installation Check I) Install a Product

C) Configure an Installed Product G) Upgrade a Product

O) Perform a Post-Installation Check U) Uninstall a Product

L) License a Product S) Start a Product

D) View Product Descriptions X) Stop a Product

R) View Product Requirements ?) Help

Enter a Task: [P,I,C,G,O,U,L,S,D,X,R,?] I

Select to install SFCFSHA (option 5)

Storage Foundation and High Availability Solutions 6.1 Install Program

1) Symantec Dynamic Multi-Pathing (DMP)

2) Symantec Cluster Server (VCS)

3) Symantec Storage Foundation (SF)

4) Symantec Storage Foundation and High Availability (SFHA)

5) Symantec Storage Foundation Cluster File System HA (SFCFSHA)

6) Symantec Storage Foundation for Oracle RAC (SF Oracle RAC)

7) Symantec ApplicationHA (ApplicationHA)

b) Back to previous menu

Select a product to install: [1-7,b,q] 5

Agree to the EULA and select option 3 to install all rpms. All packages are needed because this option includes the Coordination Point Server packages.

Symantec Storage Foundation Cluster File System HA 6.1 Install Program

1) Install minimal required rpms - 492 MB required

2) Install recommended rpms - 769 MB required

3) Install all rpms - 793 MB required

4) Display rpms to be installed for each option

Select the rpms to be installed on all systems? [1-4,q,?] (2) 3

These are the two nodes that compose our cluster:

Enter the 64 bit RHEL6 system names separated by spaces: [q,?] (intel-eva1 intel-eva2)

The installer will verify that all prerequisites are met. If password-less ssh has not been enabled, the installer offers to set it up automatically; just enter the root password of the other node.


Once all the prerequisites are met, the installer will deploy the packages on both nodes at the same time. Here is an example of the first steps:

Symantec Storage Foundation Cluster File System HA 6.1 Install Program

intel-eva1 intel-eva2

Logs are being written to /var/tmp/installer-201307040646nTM while installer is

in progress

Installing SFCFSHA: 13% _________________________________________

Estimated time remaining in total: (mm:ss) 2:25 4 of 30

Performing SFCFSHA preinstall tasks ............................... Done

Installing VRTSperl rpm ........................................... Done

Installing VRTSvlic rpm ........................................... Done

Installing VRTSspt rpm ............................................ Done

Installing VRTSvxvm rpm |

Once the package installation has finished, the installer will ask for a license key. These servers will be controlled by Veritas Operations Manager (VOM), so we can enable keyless licensing (option 2).

Symantec Storage Foundation Cluster File System HA 6.1 Install Program

intel-eva1 intel-eva2

To comply with the terms of Symantec's End User License Agreement, you have 60

days to either:

* Enter a valid license key matching the functionality in use on the systems

* Enable keyless licensing and manage the systems with a Management Server. For

more details visit http://go.symantec.com/sfhakeyless. The product is fully

functional during these 60 days.

1) Enter a valid license key

2) Enable keyless licensing and complete system licensing later

How would you like to license the systems? [1-2,q] (2) 2

In this setup, neither replication nor the Global Cluster Option will be used.

Would you like to enable replication? [y,n,q] (n) n

Would you like to enable the Global Cluster Option? [y,n,q] (n) n

Configure SFCFSHA

Once the system has been registered, the installer will provide the option to configure the cluster. We can postpone this action if needed and run it later using the installer -config option. In this case we are going to complete the cluster configuration in a single step.

Would you like to configure SFCFSHA on intel-eva1 intel-eva2? [y,n,q] (n) y

This setup will not be using any SAN; therefore, Coordination Point Servers will be used for split-brain protection. Fencing will be configured in a later section, so we answer no at this point.

Do you want to configure I/O Fencing in enabled mode? [y,n,q,?] (y) n


Enter the cluster name.

Enter the unique cluster name: [q,?] intel-eva

And now the private link needs to be chosen. In previous steps, InfiniBand over RDMA has been

configured. SFCFSHA supports the use of RDMA for the private link (option 3) and this will be the option

selected in this setup to reduce latency and increase throughput.

1) Configure heartbeat links using LLT over Ethernet

2) Configure heartbeat links using LLT over UDP

3) Configure heartbeat links using LLT over RDMA

4) Automatically detect configuration for LLT over Ethernet

b) Back to previous menu

How would you like to configure heartbeat links? [1-4,b,q,?] (4) 3

For this deployment InfiniBand will be used, so option 2 will be selected.

1) Converged Ethernet (RoCE)

2) InfiniBand

b) Back to previous menu

Choose the RDMA interconnect type [1-2,b,q,?] (1) 2

The installer will verify that all the needed IB packages are present. If any of those packages are not installed, the installer will indicate the missing packages. Please refer to the previous steps where the InfiniBand and RDMA setup was explained and verify that all the needed packages have been properly installed. Once everything is correct, the installer will detect our ib0 interface.

Checking required OS rpms for LLT over RDMA on intel-eva1 ........................................ Done
Checking required OS rpms for LLT over RDMA on intel-eva2 ........................................ Done
Checking RDMA driver and configuration on intel-eva1 ............................................. Done
Checking RDMA opensm service on intel-eva1 ....................................................... Done
Checking RDMA driver and configuration on intel-eva2 ............................................. Done
Checking RDMA opensm service on intel-eva2 ....................................................... Done

Configuring and starting RDMA drivers on intel-eva1 .............................................. Done

Configuring and starting RDMA drivers on intel-eva2 .............................................. Done

Checking the IP address for the RDMA enabled NICs on intel-eva1 .................................. Done

Checking the IP address for the RDMA enabled NICs on intel-eva2 .................................. Done

More detailed information about the IP address of the RDMA enabled NICs:

System RDMA NIC IP Address

================================================================================

intel-eva1 ib0 192.168.27.1

intel-eva2 ib0 192.168.27.2


Discovering NICs on intel-eva1 ......................................................... Discovered ib0

Enter the NIC for the first private heartbeat link (RDMA) on intel-eva1: [b,q,?] (ib0)

Select ib0 and the IP address previously configured.

Do you want to use address 192.168.27.1 for the first private heartbeat link on

intel-eva1: [y,n,q,b,?] (y) y

Enter the port for the first private heartbeat link (RDMA) on intel-eva1:

[b,q,?] (50000)

Would you like to configure a second private heartbeat link? [y,n,q,b,?] (y) n

The public network will be used as a low-priority interconnect:

Enter the NIC for the low-priority heartbeat link(RDMA or UDP) on intel-eva1: [b,q,?] (eth0)

Input 'y' to go on configuring the RDMA link, input 'n' for the UDP link [y,n,q,b] (y) n

Do you want to use the address 10.182.74.220 for the low-priority heartbeat link on intel-eva1:

[y,n,q,b,?] (y)

Enter the UDP port for the low-priority heartbeat link on intel-eva1: [b,q,?] (50010)

Are you using the same NICs for private heartbeat links on all systems? [y,n,q,b,?] (y)

Do you want to use the address 192.168.27.2 for the first private heartbeat link on intel-eva2:

[y,n,q,b,?] (y)

The RDMA Port for this link: 50000

Do you want to use the address 10.182.74.221 for the low-priority heartbeat link on intel-eva2:

[y,n,q,b,?] (y)

The UDP Port for this link: 50010

Checking media speed for ib0 on intel-eva1 ................................................... 56Gb/sec

Checking media speed for ib0 on intel-eva2 ................................................... 56Gb/sec

Enter a unique cluster ID number between 0-65535: [b,q,?] (28671)

The installer will verify that cluster ID is not in use on that network:

The cluster cannot be configured if the cluster ID 28671 is in use by another

cluster. Installer can perform a check to determine if the cluster ID is

duplicate. The check will take less than a minute to complete.

Would you like to check if the cluster ID is in use by another cluster? [y,n,q]

(y) y

Checking cluster ID ............................................... Done

Duplicated cluster ID detection passed. The cluster ID 28671 can be used for the cluster.


Press [Enter] to continue:

Verify all the information is correct and let the installer configure the cluster.

Cluster information verification:

Cluster Name: intel-eva

Cluster ID Number: 28671

Private Heartbeat NICs for intel-eva1:

link1=ib0 over RDMA

ip 192.168.27.1 netmask 255.255.255.0 port 50000

Low-Priority Heartbeat NIC for intel-eva1:

link-lowpri1=eth0 over UDP

ip 10.182.74.220 netmask 255.255.240.0 port 50010

Private Heartbeat NICs for intel-eva2:

link1=ib0 over RDMA

ip 192.168.27.2 netmask 255.255.255.0 port 50000

Low-Priority Heartbeat NIC for intel-eva2:

link-lowpri1=eth0 over UDP

ip 10.182.74.221 netmask 255.255.240.0 port 50010

Is this information correct? [y,n,q,?] (y)

A Virtual IP may be added to manage the cluster. In this setup it will not be used.

For simplicity in this example, secure mode will not be used. Accept the default cluster credentials.

Would you like to configure the VCS cluster in secure mode? [y,n,q,?] (n)

No SMTP

Do you want to configure SMTP notification? [y,n,q,?] (n) n

Processes will be stopped.

Do you want to stop SFCFSHA processes now? [y,n,q,?] (y)

And the configuration will automatically start.

Logs are being written to /var/tmp/installer-201307040646nTM while installer is

in progress

Starting SFCFSHA: 16% __________________________________________

Estimated time remaining in total: (mm:ss) 2:35 4 of 24

Performing SFCFSHA configuration .................................. Done

Starting vxdmp .................................................... Done

Starting vxio ..................................................... Done

Starting vxspec ................................................... Done

Starting vxconfigd

Take note of the log directory in case any troubleshooting during the install process is needed.


Finally, the installer reports a successful installation.

Symantec Storage Foundation Cluster File System HA Startup completed successfully

Verify the configuration

First verify that the cluster is up and running and that the two nodes are available:

[root@intel-eva1 rhel6_x86_64]# hastatus -sum

-- SYSTEM STATE

-- System State Frozen

A intel-eva1 RUNNING 0

A intel-eva2 RUNNING 0

-- GROUP STATE

-- Group System Probed AutoDisabled State

B cvm intel-eva1 Y N ONLINE

B cvm intel-eva2 Y N ONLINE

[root@intel-eva1 rhel6_x86_64]#

Given that we have used RDMA for our interconnect in this deployment, we are going to verify that it is correct.

[root@intel-eva1 ~]# lltstat -l

LLT link information:

link 0 ib0 on rdma hipri

mtu 8192, sap 0xc350, broadcast 192.168.27.255, addrlen 4

txpkts 4657 txbytes 816981

rxpkts 4001 rxbytes 413411

latehb 0 badcksum 0 errors 0

[root@intel-eva1 ~]#

And verify the link is active.

[root@intel-eva1 ~]# lltstat -rnvv active

LLT node information:

Node State Link Status TxRDMA RxRDMA Address

* 0 intel-eva1 OPEN

ib0 UP UP UP 192.168.27.1

1 intel-eva2 OPEN

ib0 UP UP UP 192.168.27.2

[root@intel-eva1 ~]#
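GAB membership can also be checked to confirm that both nodes have joined (an additional check, not shown in the original output):

[root@intel-eva1 ~]# gabconfig -a

Both nodes should appear in the membership of port a (GAB) and port h (VCS), among the other ports used by CVM and CFS.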


Fencing Configuration

Fencing is needed in order to protect the cluster from a split-brain situation. If the nodes lose all heartbeat communication, a mechanism is needed to decide whether the other server is alive and which node will continue running the application.

FSS does not need SCSI3 keys, given that the storage is local. That avoids the risks of a shared configuration, where several servers can write to the same device. With FSS, there is only one final writer, which is the node where the storage is attached. It is this node that is in charge of protecting the data.

In order to decide which servers should be up and running after a split brain, arbitration via Coordination Point (CP) Servers is needed. For a production environment, 3 CP Servers are required. For this deployment we are going to present an example where 1 CP Server is used. The same CP Servers can be used for other clusters.

In this particular case, cps2 (VIP 10.182.100.137) is the server that will provide arbitration for the

cluster. This guide does not cover how to implement a CP Server, given that it is covered in other guides

and any CP Server can be reused.

Password-less ssh with CP Servers

In order to complete the configuration, the cluster node from where the installer with –config option

will be executed needs to have password-less ssh configured with the CP Servers.

[root@intel-eva1 ~]# cd /root

[root@intel-eva1 ~]# ssh-keygen -t dsa

Generating public/private dsa key pair.

Enter file in which to save the key (/root/.ssh/id_dsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_dsa.

Your public key has been saved in /root/.ssh/id_dsa.pub.

The key fingerprint is:

45:c6:a2:24:b7:fc:e5:9a:1b:ff:c6:d6:a5:ca:be:76 root@intel-eva1

The key's randomart image is:

+--[ DSA 1024]----+

| .o |

| . o .o. |

| = o .. |

| + .. |

| .So |

| . . .|

| .o . . o |

| oo .= E |

| ...**+ |

+-----------------+


[root@intel-eva1 .ssh]# scp id_dsa.pub cps2:/root/id_dsa_eva1.pub

The authenticity of host 'cps2 (10.182.99.207)' can't be established.

RSA key fingerprint is 36:3f:e8:93:bd:3b:e1:85:fa:13:bf:7b:87:26:29:a2.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'cps2,10.182.99.207' (RSA) to the list of known hosts.

root@cps2's password:

id_dsa.pub

100% 605 0.6KB/s 00:00

[root@intel-eva1 .ssh]#

[root@intel-eva1 .ssh]# ssh cps2

root@cps2's password:

[root@cps2 ~]#

[root@cps2 .ssh]# cat /root/id_dsa_eva1.pub >> /root/.ssh/authorized_keys

[root@cps2 .ssh]# exit

logout

Connection to cps2 closed.

And verify the password-less ssh works:

[root@intel-eva1 .ssh]# ssh cps2 uname -a

Linux cps2 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64

x86_64 GNU/Linux

[root@intel-eva1 .ssh]#
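Where available, ssh-copy-id performs the key copy and the authorized_keys append from the previous steps in a single command (an alternative, not what was run here):

[root@intel-eva1 ~]# ssh-copy-id -i /root/.ssh/id_dsa.pub root@cps2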

Configuration

Call the installer with the -fencing option:

[root@intel-eva1 ~]# cd /opt/VRTS/install

[root@intel-eva1 install]# ./installsfcfsha61 -fencing

Enter the name of one system in the VCS cluster for which you would like to configure I/O fencing:

intel-eva1 intel-eva2

Cluster information verification:

Cluster Name: intel-eva-clus

Cluster ID Number: 60717

Systems: intel-eva1 intel-eva2

Would you like to configure I/O fencing on the cluster? [y,n,q] (y)


For FSS with no shared storage, option 1 will be chosen:

Fencing configuration

1) Configure Coordination Point client based fencing

2) Configure disk based fencing

3) Configure fencing in disabled mode

4) Replace/Add/Remove coordination points

5) Refresh keys/registrations on the existing coordination points

6) Set the order of existing coordination points

Select the fencing mechanism to be configured in this Application Cluster: [1-6,q,?] 1

This is our first deployment, so there are no issues restarting VCS. If fencing is being enabled after some resources have already been created, these will have to be restarted.

This I/O fencing configuration option requires a restart of VCS. Installer will stop VCS at a later

stage in this

run. Note that the service groups will be online only on the systems that are in the 'AutoStartList'

after restarting VCS. Do you want to continue? [y,n,q,b,?] y

We are using local storage and therefore we do not have shared storage with SCSI3 PR

Does your storage environment support SCSI3 PR? [y,n,q,b,?] n

Non-SCSI3 fencing will be configured

In this environment, either Non-SCSI3 fencing can be configured or fencing can be configured in

disabled mode

Do you want to configure Non-SCSI3 fencing? [y,n,q,b] (y) y

3 CP Servers are required for a production environment. For this setup, only 1 CP Server will be used

Enter the total number of coordination points. All coordination points should be Coordination Point

servers: [b]

(3) 1

In case your CP Server is connected via several networks, you can add all of them to the configuration, so the cluster nodes will try to reach the CP Servers on different interfaces. For this configuration we have a single network to connect to the CP Server. You will also need to enter the Virtual IP the CP Server is listening on.

Press [Enter] to continue:

You are now going to be asked for the Virtual IP addresses or fully qualified host names of the Coordination Point Servers. Note that the installer assumes these values to be identical as viewed from all the client cluster nodes.

Press [Enter] to continue:


How many IP addresses would you like to use to communicate to Coordination Point Server #1? [b,q,?] (1)

1

Enter the Virtual IP address or fully qualified host name #1 for the HTTPS Coordination Point Server

#1: [b] 10.182.100.137

If the previous steps to configure password-less rsh or ssh were not successful, this message may appear at this point. Make sure that you can ssh without a password from the node where you are running the fencing configuration to the CP Servers.

Enter the Virtual IP address or fully qualified host name #1 for the HTTPS Coordination Point Server

#1: [b] 10.182.100.137

Cannot communicate with system 10.182.100.137. Make sure password-less rsh or ssh is configured or the

CP Server is

up and running.

Enter the Virtual IP address or fully qualified host name #1 for the HTTPS Coordination Point Server

#1: [b]

If password-less ssh was enabled as expected, the configuration can continue. Accept the default port.

Enter the Virtual IP address or fully qualified host name #1 for the HTTPS Coordination Point Server

#1: [b] 10.182.100.137

Enter the port that the coordination point server 10.182.100.137 would be listening on or accept the

default port

suggested: [b] (443)

Review that the configuration is correct:

CPS based fencing configuration: Coordination points verification

Total number of coordination points being used: 1

Coordination Point Server ([VIP or FQHN]:Port):

1. 10.182.100.137 ([10.182.100.137]:443)

Is this information correct? [y,n,q] (y)

CPS based fencing configuration: Client cluster verification

CPS Admin utility : /opt/VRTScps/bin/cpsadm

Cluster ID: 60717

Cluster Name: intel-eva-clus

UUID for the above cluster: {313a17b2-1dd2-11b2-a040-a4616f743004}

Is this information correct? [y,n,q] (y)


Verify that all the registrations have happened:

Updating client cluster information on Coordination Point Server 10.182.100.137

Adding the client cluster to the Coordination Point Server 10.182.100.137 .................. Done

Registering client node intel-eva1 with Coordination Point Server 10.182.100.137 ........... Done

Adding CPClient user for communicating to Coordination Point Server 10.182.100.137 ......... Done

Adding cluster intel-eva-clus to the CPClient user on Coordination Point Server 10.182.100.137 ...

Done

Registering client node intel-eva2 with Coordination Point Server 10.182.100.137 ........... Done

Adding CPClient user for communicating to Coordination Point Server 10.182.100.137 ......... Done

Adding cluster intel-eva-clus to the CPClient user on Coordination Point Server 10.182.100.137 ...

Done

Installer will stop VCS before applying fencing configuration. To make sure VCS shuts down

successfully, unfreeze

any frozen service group and unmount the mounted file systems in the cluster.

Are you ready to stop VCS and apply fencing configuration on all nodes at this time? [y,n,q] (y)

VCS will be restarted and fencing configuration will be applied

Stopping VCS on intel-eva2 ................................................................... Done

Stopping Fencing on intel-eva2 ............................................................... Done

Stopping VCS on intel-eva1 ................................................................... Done

Stopping Fencing on intel-eva1 ............................................................... Done

Updating /etc/vxfenmode file on intel-eva1 ................................................... Done

Updating /etc/vxenviron file on intel-eva1 ................................................... Done

Updating /etc/sysconfig/vxfen file on intel-eva1 ............................................. Done

Updating /etc/llttab file on intel-eva1 ...................................................... Done

Updating /etc/vxfenmode file on intel-eva2 ................................................... Done

Updating /etc/vxenviron file on intel-eva2 ................................................... Done

Updating /etc/sysconfig/vxfen file on intel-eva2 ............................................. Done

Updating /etc/llttab file on intel-eva2 ...................................................... Done

Starting Fencing on intel-eva1 ............................................................... Done

Starting Fencing on intel-eva2 ............................................................... Done

Updating main.cf with fencing ................................................................ Done

Starting VCS on intel-eva1 ................................................................... Done

Starting VCS on intel-eva2 ................................................................... Done

The Coordination Point Agent monitors the registrations on the coordination points.

Do you want to configure Coordination Point Agent on the client cluster? [y,n,q] (y)

It is recommended to configure the Coordination Point Agent on the cluster. The goal is to proactively detect any anomaly with the CP Servers.

Do you want to configure Coordination Point Agent on the client cluster? [y,n,q] (y) y

Enter a non-existing name for the service group for Coordination Point Agent: [b] (vxfen)

Adding Coordination Point Agent via intel-eva1 .............................................. Done

I/O Fencing configuration ................................................................... Done


I/O Fencing configuration completed successfully

And finally, you can verify that fencing has been enabled:

[root@intel-eva1 install]# vxfenadm -d

I/O Fencing Cluster Information:

================================

Fencing Protocol Version: 201

Fencing Mode: Customized

Fencing Mechanism: cps

Cluster Members:

* 0 (intel-eva1)

1 (intel-eva2)

RFSM State Information:

node 0 in state 8 (running)

node 1 in state 8 (running)

[root@intel-eva1 install]#
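The registrations can also be reviewed from the client side with the cpsadm utility (a sketch using the CP Server VIP of this setup; the exact output will vary):

[root@intel-eva1 install]# /opt/VRTScps/bin/cpsadm -s 10.182.100.137 -a list_nodes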

Intel SSD Configuration

Drives configuration

This section explains the steps taken to configure the five Intel SSDs used in each node.

The RAID controller used is

07:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108

[Liberator] (rev 03)

The RAID controller is managed by the MegaCLI utility, which can be downloaded from the LSI support page:

http://www.lsi.com/support/Pages/download-results.aspx?keyword=MegaCli

Install the latest MegaCLI package: MegaCli-8.07.10-1.noarch.rpm

[root@intel-eva2 Linux MegaCLI 8.07.10]# rpm -ivh MegaCli-8.07.10-1.noarch.rpm

Preparing... ########################################### [100%]

1:MegaCli ########################################### [100%]

[root@intel-eva2 Linux MegaCLI 8.07.10]#

If the SSDs are not visible after reboot, enter Control + G during boot to enter the controller

configuration and make the SSDs accessible.

Verify that the Operating System is able to detect them:


[root@intel-eva1]# dmesg | grep SSD

scsi 0:0:0:0: Direct-Access ATA INTEL SSDSC2BA80 0265 PQ: 0 ANSI: 5

scsi 0:0:2:0: Direct-Access ATA INTEL SSDSC2BA80 0265 PQ: 0 ANSI: 5

scsi 0:0:3:0: Direct-Access ATA INTEL SSDSC2BA80 0265 PQ: 0 ANSI: 5

scsi 0:0:4:0: Direct-Access ATA INTEL SSDSC2BA20 0265 PQ: 0 ANSI: 5

scsi 0:0:5:0: Direct-Access ATA INTEL SSDSC2BA20 0265 PQ: 0 ANSI: 5

scsi 0:0:6:0: Direct-Access ATA INTEL SSDSC2BA80 0265 PQ: 0 ANSI: 5

Using the MegaCli64 utility, the slot number, capacity, ID, etc. can be verified. Here are some of the fields for the local drives:

[root@intel-eva1]# /opt/MegaRAID/MegaCli/MegaCli64 -PDlist -a0 | egrep "Device

ID|Slot Number|WWN|PD Type|Raw Size"

Enclosure Device ID: 252

Slot Number: 0

WWN: 5000C5000CB3B3A0

PD Type: SAS

Raw Size: 136.732 GB [0x11177330 Sectors]

Enclosure Device ID: 252

Slot Number: 1

WWN: 50015178f361d7dd

PD Type: SATA

Raw Size: 745.211 GB [0x5d26ceb0 Sectors]

Enclosure Device ID: 252

Slot Number: 2

WWN: 50015178f361d5fd

PD Type: SATA

Raw Size: 745.211 GB [0x5d26ceb0 Sectors]

Enclosure Device ID: 252

Slot Number: 3

WWN: 50015178f361d807

PD Type: SATA

Raw Size: 745.211 GB [0x5d26ceb0 Sectors]

Enclosure Device ID: 252

Slot Number: 4

WWN: 50015178f361d642

PD Type: SATA

Raw Size: 745.211 GB [0x5d26ceb0 Sectors]

Enclosure Device ID: 252

Slot Number: 5

WWN: 50015178f355f328

PD Type: SATA

Raw Size: 186.310 GB [0x1749f1b0 Sectors]

Enclosure Device ID: 252

Slot Number: 6

WWN: 5000C5000CB3BAE8

PD Type: SAS

Raw Size: 136.732 GB [0x11177330 Sectors]


Enclosure Device ID: 252

Slot Number: 7

WWN: 50015178f355f0ce

PD Type: SATA

Raw Size: 186.310 GB [0x1749f1b0 Sectors]

If the devices are not accessible to the OS, MegaCli64 can be used to add them. This is one example:

[root@intel-eva1]# ./MegaCli64 -CfgLdAdd -r0 [252:1] -a0

Adapter 0: Created VD 1

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

Verify the OS can see them:

[root@intel-eva1 ~]# lsscsi | grep INTEL

[0:2:0:0] disk INTEL RS2BL080 2.12 /dev/sda

[0:2:1:0] disk INTEL RS2BL080 2.12 /dev/sdb

[0:2:2:0] disk INTEL RS2BL080 2.12 /dev/sdc

[0:2:3:0] disk INTEL RS2BL080 2.12 /dev/sdd

[0:2:4:0] disk INTEL RS2BL080 2.12 /dev/sde

[0:2:5:0] disk INTEL RS2BL080 2.12 /dev/sdf

[0:2:6:0] disk INTEL RS2BL080 2.12 /dev/sdg

Tuning SSD performance

Here are some I/O optimizations for Linux:

- Use the noop or deadline scheduler (the default is cfq) in /sys/block/sdX/queue/scheduler
- Set rotational=0
- Set read_ahead_kb=0

First, identify the physical name of each of the SSDs plugged into each server.

Node 1 Setup

[root@intel-eva1 ~]# vxdisk -e list

DEVICE TYPE DISK GROUP STATUS

OS_NATIVE_NAME ATTR

disk_1 auto:cdsdisk - - online sdd

-

disk_2 auto:cdsdisk - - online sde

-


disk_3 auto:cdsdisk - - online sdb

-

disk_4 auto:cdsdisk - - online sdc

-

disk_5 auto:cdsdisk - - online sdf

-

disk_6 auto:cdsdisk - - online sdg

-

Verify what the default values are:

[root@intel-eva1 queue]# pwd

/sys/block/sdb/queue

[root@intel-eva1 queue]# cat rotational

1

[root@intel-eva1 queue]# cat read_ahead_kb

128

[root@intel-eva1 queue]# cat scheduler

noop anticipatory deadline [cfq]

[root@intel-eva1 queue]#

And now modify the values for the SSDs:

# echo deadline > /sys/block/sdd/queue/scheduler

# echo deadline > /sys/block/sde/queue/scheduler

# echo deadline > /sys/block/sdb/queue/scheduler

# echo deadline > /sys/block/sdc/queue/scheduler

# echo deadline > /sys/block/sdf/queue/scheduler

# echo deadline > /sys/block/sdg/queue/scheduler

# echo 0 > /sys/block/sdd/queue/rotational

# echo 0 > /sys/block/sde/queue/rotational

# echo 0 > /sys/block/sdb/queue/rotational

# echo 0 > /sys/block/sdc/queue/rotational

# echo 0 > /sys/block/sdf/queue/rotational

# echo 0 > /sys/block/sdg/queue/rotational

# echo 0 > /sys/block/sdd/queue/read_ahead_kb

# echo 0 > /sys/block/sde/queue/read_ahead_kb

# echo 0 > /sys/block/sdb/queue/read_ahead_kb

# echo 0 > /sys/block/sdc/queue/read_ahead_kb

# echo 0 > /sys/block/sdf/queue/read_ahead_kb

# echo 0 > /sys/block/sdg/queue/read_ahead_kb

Node 2 Setup

[root@intel-eva2 init.d]# vxdisk -e list


DEVICE TYPE DISK GROUP STATUS

OS_NATIVE_NAME ATTR

disk_1 auto:cdsdisk - - online sdb

-

disk_2 auto:cdsdisk - - online sdc

-

disk_3 auto:cdsdisk - - online sdd

-

disk_4 auto:cdsdisk - - online sde

-

disk_5 auto:cdsdisk - - online sdf

-

disk_6 auto:cdsdisk - - online sdg

-

Modify the values for the SSDs:

# echo deadline > /sys/block/sdd/queue/scheduler

# echo deadline > /sys/block/sde/queue/scheduler

# echo deadline > /sys/block/sdb/queue/scheduler

# echo deadline > /sys/block/sdc/queue/scheduler

# echo deadline > /sys/block/sdf/queue/scheduler

# echo deadline > /sys/block/sdg/queue/scheduler

# echo 0 > /sys/block/sdd/queue/rotational

# echo 0 > /sys/block/sde/queue/rotational

# echo 0 > /sys/block/sdb/queue/rotational

# echo 0 > /sys/block/sdc/queue/rotational

# echo 0 > /sys/block/sdf/queue/rotational

# echo 0 > /sys/block/sdg/queue/rotational

# echo 0 > /sys/block/sdd/queue/read_ahead_kb

# echo 0 > /sys/block/sde/queue/read_ahead_kb

# echo 0 > /sys/block/sdb/queue/read_ahead_kb

# echo 0 > /sys/block/sdc/queue/read_ahead_kb

# echo 0 > /sys/block/sdf/queue/read_ahead_kb

# echo 0 > /sys/block/sdg/queue/read_ahead_kb

Volumes and File Systems Configuration

Two file systems are going to be created. One will be used for Oracle data under /tpccdata and the other for Oracle redo logs under /tpcclog.

The /tpccdata file system will be striped across the 3 x 745GB SSDs and mirrored to the other node. The /tpcclog file system will be striped across the 2 x 186GB SSDs and also mirrored for redundancy.

Given that the disks are named from disk_1 to disk_6, and that there is a mix of sizes, the first step will be to identify each of them. For clarity, we are going to rename them to make their utilization easier.

Initialize and rename the internal SSD devices

Initialize the SSDs in each of the nodes. Note that in our configuration we have an extra 800GB SSD device, but it is not going to be used in our setup. This is the example for the first node.

[root@intel-eva1 ~]# vxdisk list

DEVICE TYPE DISK GROUP STATUS

disk_1 auto:none - - online invalid

disk_2 auto:none - - online invalid

disk_3 auto:none - - online invalid

disk_4 auto:none - - online invalid

disk_5 auto:none - - online invalid

disk_6 auto:none - - online invalid

Use vxdisksetup command to write the label.

[root@intel-eva1 ~]# vxdisksetup -i disk_1

[root@intel-eva1 ~]# vxdisksetup -i disk_2

[root@intel-eva1 ~]# vxdisksetup -i disk_3

[root@intel-eva1 ~]# vxdisksetup -i disk_4

[root@intel-eva1 ~]# vxdisksetup -i disk_5

[root@intel-eva1 ~]# vxdisksetup -i disk_6
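The six commands above can also be run as a short loop; this is an equivalent sketch, assuming the disks are still named disk_1 through disk_6:

# Initialize all six devices in one pass
for i in 1 2 3 4 5 6; do vxdisksetup -i disk_$i; done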

Verify disks have been initialized.

[root@intel-eva1 ~]# vxdisk list

DEVICE TYPE DISK GROUP STATUS

disk_1 auto:cdsdisk - - online

disk_2 auto:cdsdisk - - online

disk_3 auto:cdsdisk - - online

disk_4 auto:cdsdisk - - online

disk_5 auto:cdsdisk - - online

disk_6 auto:cdsdisk - - online

Repeat the same steps in the second node.

Now we are going to identify the capacity for each device.

[root@intel-eva1 ~]# for i in 1 2 3 4 5 6; do echo "disk_$i"; vxdisk list disk_$i | grep public ; done

disk_1

public: slice=3 offset=65792 len=1560343712 disk_offset=0

disk_2

public: slice=3 offset=65792 len=1560343712 disk_offset=0

disk_3

public: slice=3 offset=65792 len=1560430032 disk_offset=0

disk_4

public: slice=3 offset=65792 len=1560343712 disk_offset=0

disk_5

public: slice=3 offset=65792 len=388593808 disk_offset=0

disk_6

public: slice=3 offset=65792 len=388593808 disk_offset=0

[root@intel-eva1 ~]#

The len value is expressed in 512-byte sector units, so 388593808 sectors is about 185GB and 1560343712 sectors is about 744GB.
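As a quick sanity check, the conversion from 512-byte sectors to gigabytes can be done from the shell:

# 512-byte sectors x 512 bytes, expressed in GB (integer division)
# echo $((388593808 * 512 / 1024 / 1024 / 1024))
185
# echo $((1560343712 * 512 / 1024 / 1024 / 1024))
744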

We are going to rename those disks according to the following table:

Original Name    Size     New name

disk_1           744GB    Intel_SSD_744_1
disk_2           744GB    Intel_SSD_744_2
disk_3           744GB    Intel_SSD_744_3
disk_4           744GB    -
disk_5           185GB    Intel_SSD_185_1
disk_6           185GB    Intel_SSD_185_2

Use vxdmpadm command to rename those devices:

[root@intel-eva1 ~]# vxdmpadm setattr dmpnode disk_1 name=Intel_SSD_744_1

[root@intel-eva1 ~]# vxdmpadm setattr dmpnode disk_2 name=Intel_SSD_744_2

[root@intel-eva1 ~]# vxdmpadm setattr dmpnode disk_3 name=Intel_SSD_744_3

[root@intel-eva1 ~]# vxdmpadm setattr dmpnode disk_5 name=Intel_SSD_185_1

[root@intel-eva1 ~]# vxdmpadm setattr dmpnode disk_6 name=Intel_SSD_185_2

And verify the changes:

[root@intel-eva1 ~]# vxdisk list

DEVICE TYPE DISK GROUP STATUS

Intel_SSD_185_1 auto:cdsdisk - - online

Intel_SSD_185_2 auto:cdsdisk - - online

Intel_SSD_744_1 auto:cdsdisk - - online

Intel_SSD_744_2 auto:cdsdisk - - online

Intel_SSD_744_3 auto:cdsdisk - - online

disk_4 auto:cdsdisk - - online

Run the same steps on the second node, taking note of the sizes and renaming accordingly. These are the specific steps for our configuration:

[root@intel-eva2]# for i in 1 2 3 4 5 6; do echo "disk_$i"; vxdisk list disk_$i | grep public ; done

disk_1

public: slice=3 offset=65792 len=388593808 disk_offset=0

disk_2

public: slice=3 offset=65792 len=388593808 disk_offset=0

disk_3

public: slice=3 offset=65792 len=1560430032 disk_offset=0

disk_4

public: slice=3 offset=65792 len=1560430032 disk_offset=0

disk_5

public: slice=3 offset=65792 len=1560430032 disk_offset=0

disk_6

public: slice=3 offset=65792 len=1560430032 disk_offset=0

We are going to rename those disks according to the following table:

Original Name    Size     New name

disk_1           185GB    Intel_SSD_185_1
disk_2           185GB    Intel_SSD_185_2
disk_3           744GB    Intel_SSD_744_1
disk_4           744GB    -
disk_5           744GB    Intel_SSD_744_2
disk_6           744GB    Intel_SSD_744_3

[root@intel-eva2]# vxdmpadm setattr dmpnode disk_1 name=Intel_SSD_185_1

[root@intel-eva2]# vxdmpadm setattr dmpnode disk_2 name=Intel_SSD_185_2

[root@intel-eva2]# vxdmpadm setattr dmpnode disk_3 name=Intel_SSD_744_1

[root@intel-eva2]# vxdmpadm setattr dmpnode disk_5 name=Intel_SSD_744_2

[root@intel-eva2]# vxdmpadm setattr dmpnode disk_6 name=Intel_SSD_744_3

And the final naming configuration:

[root@intel-eva2]# vxdisk list

DEVICE TYPE DISK GROUP STATUS

Intel_SSD_185_1 auto:cdsdisk - - online

Intel_SSD_185_2 auto:cdsdisk - - online

Intel_SSD_744_1 auto:cdsdisk - - online

Intel_SSD_744_2 auto:cdsdisk - - online

Intel_SSD_744_3 auto:cdsdisk - - online

disk_4 auto:cdsdisk - - online

Make internal SSD devices available to the cluster

Each of the nodes within our SFCFSHA setup has to export its five SSDs so they are visible to the other cluster node. Use the vxdisk command as follows. First on node 1:

[root@intel-eva1 ~]# vxdisk export Intel_SSD_185_1 Intel_SSD_185_2 Intel_SSD_744_1 Intel_SSD_744_2 Intel_SSD_744_3

And then export the SSDs from the second node.

[root@intel-eva2 ~]# vxdisk export Intel_SSD_185_1 Intel_SSD_185_2 Intel_SSD_744_1 Intel_SSD_744_2 Intel_SSD_744_3

Now run vxdisk list on each of the nodes and verify that the local SSDs appear as “online exported”, while the remote ones appear as “online remote”.

On the first node:

[root@intel-eva1 ~]# vxdisk list

DEVICE TYPE DISK GROUP STATUS

Intel_SSD_185_1 auto:cdsdisk - - online exported

Intel_SSD_185_2 auto:cdsdisk - - online exported

Intel_SSD_744_1 auto:cdsdisk - - online exported

Intel_SSD_744_2 auto:cdsdisk - - online exported

Intel_SSD_744_3 auto:cdsdisk - - online exported

disk_4 auto:cdsdisk - - online

intel-eva2_Intel_SSD_185_1 auto:cdsdisk - - online remote

intel-eva2_Intel_SSD_185_2 auto:cdsdisk - - online remote

intel-eva2_Intel_SSD_744_1 auto:cdsdisk - - online remote

intel-eva2_Intel_SSD_744_2 auto:cdsdisk - - online remote

intel-eva2_Intel_SSD_744_3 auto:cdsdisk - - online remote

sda auto:none - - online invalid

On the second node:

[root@intel-eva2 ~]# vxdisk list

DEVICE TYPE DISK GROUP STATUS

Intel_SSD_185_1 auto:cdsdisk - - online exported

Intel_SSD_185_2 auto:cdsdisk - - online exported

Intel_SSD_744_1 auto:cdsdisk - - online exported

Intel_SSD_744_2 auto:cdsdisk - - online exported

Intel_SSD_744_3 auto:cdsdisk - - online exported

disk_4 auto:cdsdisk - - online

intel-eva1_Intel_SSD_185_1 auto:cdsdisk - - online remote

intel-eva1_Intel_SSD_185_2 auto:cdsdisk - - online remote

intel-eva1_Intel_SSD_744_1 auto:cdsdisk - - online remote

intel-eva1_Intel_SSD_744_2 auto:cdsdisk - - online remote

intel-eva1_Intel_SSD_744_3 auto:cdsdisk - - online remote

sda auto:none - - online invalid

Create a File System for redo logs

From one of the nodes, we are going to create a disk group of FSS type. The volumes in this disk group will have a plex on each node, so the data will be protected across the cluster nodes. Each write will be done in parallel to the two nodes, while reads will be served from the local node.

Create the tpcclog01 disk group using the two internal plus the two remote 185GB disks:

[root@intel-eva1 ~]# vxdg -o fss -s init tpcclog01 Intel_SSD_185_1 Intel_SSD_185_2 intel-eva2_Intel_SSD_185_1 intel-eva2_Intel_SSD_185_2

Create tpcclog volume:

[root@intel-eva1 ~]# vxassist -g tpcclog01 make tpcclog maxsize ncolumns=2 layout=mirror-stripe

Because the disk group has been created as FSS type, the vxassist command will automatically create a plex on each node and include a DCO volume for Fast Mirror Resync (FMR).

[root@intel-eva1 ~]# vxprint -g tpcclog01

TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0

dg tpcclog01 tpcclog01 - - - - - -

dm Intel_SSD_185_1 Intel_SSD_185_1 - 388593808 - - - -

dm Intel_SSD_185_2 Intel_SSD_185_2 - 388593808 - - - -

dm intel-eva2_Intel_SSD_185_1 intel-eva2_Intel_SSD_185_1 - 388593808 - REMOTE - -

dm intel-eva2_Intel_SSD_185_2 intel-eva2_Intel_SSD_185_2 - 388593808 - REMOTE - -

v tpcclog fsgen ENABLED 776925184 - ACTIVE - -

pl tpcclog-01 tpcclog ENABLED 776925184 - ACTIVE - -

sd Intel_SSD_185_1-01 tpcclog-01 ENABLED 388462592 0 - - -

sd Intel_SSD_185_2-01 tpcclog-01 ENABLED 388462592 0 - - -

pl tpcclog-02 tpcclog ENABLED 776925184 - ACTIVE - -

sd intel-eva2_Intel_SSD_185_1-01 tpcclog-02 ENABLED 388462592 0 - - -

sd intel-eva2_Intel_SSD_185_2-01 tpcclog-02 ENABLED 388462592 0 - - -

dc tpcclog_dco tpcclog - - - - - -

v tpcclog_dcl gen ENABLED 94720 - ACTIVE - -

pl tpcclog_dcl-01 tpcclog_dcl ENABLED 94720 - ACTIVE - -

sd Intel_SSD_185_1-02 tpcclog_dcl-01 ENABLED 94720 0 - - -

pl tpcclog_dcl-02 tpcclog_dcl ENABLED 94720 - ACTIVE - -

sd intel-eva2_Intel_SSD_185_1-02 tpcclog_dcl-02 ENABLED 94720 0 - - -

It is important to note that a mirror-stripe layout supports FMR, while a stripe-mirror is a layered volume layout that currently does not support FMR.

Create a file system with bsize=4096 for the redo logs:

[root@intel-eva1 ~]# mkfs -t vxfs -o bsize=4096 /dev/vx/rdsk/tpcclog01/tpcclog

version 10 layout

776925184 sectors, 97115648 blocks of size 4096, log size 16384 blocks

rcq size 2048 blocks

largefiles supported

maxlink supported

Add the mount point to the cluster configuration:

[root@intel-eva1 ~]# cfsmntadm add tpcclog01 tpcclog /tpcclog all=crw

Mount Point is being added...

/tpcclog added to the cluster-configuration

Mount it:

[root@intel-eva1 ~]# cfsmount /tpcclog

Mounting...

[/dev/vx/dsk/tpcclog01/tpcclog] mounted successfully at /tpcclog on intel-eva1

[/dev/vx/dsk/tpcclog01/tpcclog] mounted successfully at /tpcclog on intel-eva2

[root@intel-eva1 ~]#

And set ownership for the oracle user:

[root@intel-eva1 ]# chown oracle:oinstall /tpcclog

Create a File System for Data

Create the tpccdata01 disk group using the three internal plus the three remote 744GB SSDs:

[root@intel-eva1 ~]# vxdg -o fss -s init tpccdata01 Intel_SSD_744_1 Intel_SSD_744_2 Intel_SSD_744_3 intel-eva2_Intel_SSD_744_1 intel-eva2_Intel_SSD_744_2 intel-eva2_Intel_SSD_744_3

Create a 2TB volume striped across the three local SSDs and mirrored across the remote SSDs located in

the other node.

[root@intel-eva1 ~]# vxassist -g tpccdata01 make tpccdata 2T ncolumns=3 layout=mirror-stripe

Verify the configuration.

[root@intel-eva1 ~]# vxprint -g tpccdata01

TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0

dg tpccdata01 tpccdata01 - - - - - -

dm Intel_SSD_744_1 Intel_SSD_744_1 - 1560343712 - - - -

dm Intel_SSD_744_2 Intel_SSD_744_2 - 1560343712 - - - -

dm Intel_SSD_744_3 Intel_SSD_744_3 - 1560430032 - - - -

dm intel-eva2_Intel_SSD_744_1 intel-eva2_Intel_SSD_744_1 - 1560430032 - REMOTE - -

dm intel-eva2_Intel_SSD_744_2 intel-eva2_Intel_SSD_744_2 - 1560430032 - REMOTE - -

dm intel-eva2_Intel_SSD_744_3 intel-eva2_Intel_SSD_744_3 - 1560430032 - REMOTE - -

v tpccdata fsgen ENABLED 4294967296 - ACTIVE - -

pl tpccdata-01 tpccdata ENABLED 4294967424 - ACTIVE - -

sd Intel_SSD_744_1-01 tpccdata-01 ENABLED 1431655808 0 - - -

sd Intel_SSD_744_2-01 tpccdata-01 ENABLED 1431655808 0 - - -

sd Intel_SSD_744_3-01 tpccdata-01 ENABLED 1431655808 0 - - -

pl tpccdata-02 tpccdata ENABLED 4294967424 - ACTIVE - -

sd intel-eva2_Intel_SSD_744_1-01 tpccdata-02 ENABLED 1431655808 0 - - -

sd intel-eva2_Intel_SSD_744_2-01 tpccdata-02 ENABLED 1431655808 0 - - -

sd intel-eva2_Intel_SSD_744_3-01 tpccdata-02 ENABLED 1431655808 0 - - -

dc tpccdata_dco tpccdata - - - - - -

v tpccdata_dcl gen ENABLED 217216 - ACTIVE - -

pl tpccdata_dcl-01 tpccdata_dcl ENABLED 217216 - ACTIVE - -

sd Intel_SSD_744_1-02 tpccdata_dcl-01 ENABLED 217216 0 - - -

pl tpccdata_dcl-02 tpccdata_dcl ENABLED 217216 - ACTIVE - -

sd intel-eva2_Intel_SSD_744_1-02 tpccdata_dcl-02 ENABLED 217216 0 - - -

[root@intel-eva1 ~]#

And create a file system using bsize=8192.

[root@intel-eva1 ~]# mkfs -t vxfs -o bsize=8192 /dev/vx/rdsk/tpccdata01/tpccdata

version 10 layout

4294967296 sectors, 268435456 blocks of size 8192, log size 32768 blocks

rcq size 8192 blocks

largefiles supported

maxlink supported

Add the mount point to the cluster:

[root@intel-eva1 ~]# cfsmntadm add tpccdata01 tpccdata /tpccdata all=crw

Mount Point is being added...

/tpccdata added to the cluster-configuration

[root@intel-eva1 ~]#

And mount it:

[root@intel-eva1 ~]# cfsmount /tpccdata

Mounting...

[/dev/vx/dsk/tpccdata01/tpccdata] mounted successfully at /tpccdata on intel-eva1

[/dev/vx/dsk/tpccdata01/tpccdata] mounted successfully at /tpccdata on intel-eva2

Finally set ownership of the folder for the oracle user:

[root@intel-eva1 ]# chown oracle:oinstall /tpccdata

Oracle Configuration and Tuning

This section highlights some key points that need to be followed during Oracle configuration. It is not a complete guide to install and configure Oracle. It also covers some tuning to get the best performance from the Intel SSD devices.

Installing Oracle binaries in each node

Oracle single instance will be used in this configuration. This provides a simpler and cheaper configuration. With the Fast Failover capabilities of VCS and the utilization of Intel SSDs as the storage, we will be creating a highly resilient configuration with very fast recovery capabilities.

If none of the Enterprise Edition features are going to be used, and given the number of sockets we have in this setup (4), the Standard Edition can be used:

Oracle binaries will be installed under the /oracle folder in each node of the cluster.

Perform the same steps on the second node of the cluster to install the binaries locally.

Instance configuration

The Oracle Database Configuration Assistant (dbca) will be used to configure the first instance. The steps needed to create the instance on the file systems that have been created are detailed here.

The first step is to configure a listener using the netca tool. A listener named LISTENER will be created:

[oracle@intel-eva1]$ netca

Oracle Net Services Configuration:

Configuring Listener:LISTENER

Listener configuration complete.

Oracle Net Listener Startup:

Running Listener Control:

/oracle/product/11.2.0/dbhome_1/bin/lsnrctl start LISTENER

Listener Control complete.

Listener started successfully.

Oracle Net Services configuration successful. The exit code is 0

The database instance will be identified as tpcc:

At step 6 of the installer, make sure to select File System as the storage type, and enter /tpccdata as the folder to store the database files:

At step 10, click on each of the Redo Log Groups and modify the directory name to /tpcclog for groups 1, 2 and 3:

Finally, the folder $ORACLE_BASE/admin/SID needs to be copied to node 2.

Oracle Disk Manager configuration

Oracle Disk Manager (ODM) provides support for Oracle’s file management and I/O calls for database storage on VxFS file systems. ODM provides true kernel asynchronous I/O for files, reduces system call overhead, improves file system layout by preallocating contiguous files on a VxFS file system, and provides performance on file systems that is equivalent to raw devices.

ODM is transparent for users once it is enabled.

First make sure the Oracle database is down.

Then we need to create a link from the Oracle directory to the VRTSodm one. Make sure this file exists:

$ ls -l /opt/VRTSodm/lib64/libodm.so

-rwxr-xr-x. 1 bin bin 71358 Sep 30 03:42 libodm.so

Then go to the $ORACLE_HOME/lib folder (/oracle/product/11.2.0/dbhome_1/lib in our case) and check the libodm library:

[oracle@intel-eva1]$ cd $ORACLE_HOME/lib

[oracle@intel-eva1]$ ls -l libodm*

-rw-r--r--. 1 oracle oinstall 7442 Aug 14 2009 libodm11.a

lrwxrwxrwx. 1 oracle oinstall 12 Oct 24 22:38 libodm11.so -> libodmd11.so

-rw-r--r--. 1 oracle oinstall 12331 Aug 14 2009 libodmd11.so

Move the current link:

[oracle@intel-eva1]$ mv libodm11.so libodm11.so.preVxFS

And create a new link to the VRTSodm library:

[oracle@intel-eva1]$ ln -s /opt/VRTSodm/lib64/libodm.so libodm11.so

Check the link is correct:

[oracle@intel-eva1 ]$ ls -l libodm*

-rw-r--r--. 1 oracle oinstall 7442 Aug 14 2009 libodm11.a

lrwxrwxrwx. 1 oracle oinstall 28 Oct 29 00:49 libodm11.so -> /opt/VRTSodm/lib64/libodm.so

lrwxrwxrwx. 1 oracle oinstall 12 Oct 24 22:38 libodm11.so.preVxFS -> libodmd11.so

-rw-r--r--. 1 oracle oinstall 12331 Aug 14 2009 libodmd11.so

Repeat the same steps in the other node.

Start Oracle again.

Taking a look at the log file /oracle/diag/rdbms/tpcc/tpcc/trace/alert_tpcc.log we can verify that Oracle is using the VRTSodm library:

db_name = "tpcc"

open_cursors = 300

diagnostic_dest = "/oracle"

Oracle instance running with ODM: Veritas 6.1.0.000 ODM Library, Version 2.0

Tue Oct 29 00:55:07 2013

PMON started with pid=2, OS id=30440

Oracle redo log configuration

To obtain better performance for the redo logs, it is advisable to change the default 512-byte block size to 4096 bytes. In this cluster, five 40GB redo log files were added with the following procedure:

SQL> ALTER DATABASE ADD LOGFILE '/tpcclog/log_4k_1' SIZE 41000M BLOCKSIZE 4096;

Repeat the same for log_4k_2 through log_4k_5.
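The remaining four log files can be added in the same way. This is a minimal sketch, run as the oracle user and assuming the Oracle environment (ORACLE_HOME, ORACLE_SID=tpcc) is already set; it generates the statements and pipes them to sqlplus:

# Sketch only: add log_4k_2 to log_4k_5 with a 4096-byte block size
for i in 2 3 4 5; do
    echo "ALTER DATABASE ADD LOGFILE '/tpcclog/log_4k_${i}' SIZE 41000M BLOCKSIZE 4096;"
done | sqlplus -S "/ as sysdba"

Once all the new log files exist, switch the current log file so Oracle starts using them: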

SQL> alter system switch logfile;

Then review alert_tpcc.log until you see the new log is being used, and force a checkpoint:

SQL> alter system checkpoint local;

Remove the old redo log files. First identify the group numbers by running:

SQL> select * from v$logfile;

And remove them:

SQL> alter database drop logfile group 1;

Oracle huge pages configuration

The following tuning has been performed in order to enable huge pages for the SGA.

Modify the file /etc/sysctl.conf on both servers with:

vm.nr_hugepages=200000

Increase the memlock soft and hard limits for the oracle user. On both servers edit the file /etc/security/limits.conf with:

oracle soft memlock 410000000

oracle hard memlock 410000000
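To apply and verify these settings without waiting for a reboot, the following sketch can be used. With the default 2MB huge pages, 200000 pages reserve roughly 400GB, which matches the memlock limit above; if memory is too fragmented to allocate the full pool immediately, a reboot may be needed.

# Reload kernel parameters and confirm the huge page pool
# sysctl -p
# grep HugePages /proc/meminfo

HugePages_Total in /proc/meminfo should report the requested number of pages, and Oracle will pick them up for the SGA at the next instance startup as long as the memlock limit allows it.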

Oracle HA and Fast Failover Configuration

Oracle agent configuration in VCS

In order to configure the Oracle agent for HA, its type definition file needs to be imported. This can be done manually, using the Cluster Manager console, or using Veritas Operations Manager (VOM). Here we are going to present the example using VOM, as it provides a global view for storage and HA and will simplify solution maintenance.

On the Availability tab, right click on the cluster name and select “Import Type Definition”

In the Import Type Definition window type /etc/VRTSagents/ha/conf/Oracle/OracleTypes.cf:

One Oracle service group will be created to hold all the resources needed to run the tpcc instance. That includes the IP address, NIC, Oracle database and Oracle listener. Another service group, holding the volumes and mount points and running on the two nodes at the same time, will also be created. Finally, there will be dependencies among all the service groups in the cluster. These steps are explained below.

Remove previous HA configuration for the mount points

In a previous step, CFS mount points for /tpcclog and /tpccdata were created. This was done using the CLI in order to have the storage available to create the first instance. Those mount points are now going to be added to the new service group that is going to be created, so first we remove them from the current configuration.

[root@intel-eva1 ~]# cfsmntadm delete /tpcclog

[root@intel-eva1 ~]# cfsmntadm delete /tpccdata

Please note that these steps only remove the mount points from the HA configuration. The mount points and file systems are still available on the system. They will be configured again under the new service group together with the other resources.

Service Groups and Resources for Oracle HA

The next step is to create two service groups where the resources needed for Oracle will be added. The first one will be a parallel service group containing the mount points that were previously removed. The second will hold the Oracle resources.

Service Group tpcc_data

Right click on Service Group and select Create Service Group. This one will be parallel and will be named

tpcc_data:

Add the two available systems to the service group and click Finish. This is a parallel service group, as the mount points will be available on both systems at the same time. They will be the base framework for a Fast Failover configuration, making the data continuously available to the two nodes of the cluster.

Now both /tpccdata and /tpcclog will be added into the tpcc_data service group. Right click on the tpcc_data service group and select Add/Modify Resources:

On the following window, add these four resources to the tpcc_data service group:

Then click on the dotted icon and add the properties for each of the resources:

For the Clustered Volume ones:

Name            Type        CVMDiskGroup    CVMVolume    CVMActivation

tpccdata_vol    CVMVolDg    tpccdata01      tpccdata     sw
tpcclog_vol     CVMVolDg    tpcclog01       tpcclog      sw

And for the CFS mount points:

Name            Type        Block Device                       Mount Point    MountOpt

tpccdata_mnt    CFSMount    /dev/vx/dsk/tpccdata01/tpccdata    /tpccdata      crw
tpcclog_mnt     CFSMount    /dev/vx/dsk/tpcclog01/tpcclog      /tpcclog       crw
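For reference, a roughly equivalent CLI sketch is shown below; the resource, type and attribute names are taken from the tables above, and VOM performs the same operations under the covers. The tpcclog resources would be added in the same way.

# Sketch only: adding the tpccdata resources from the command line
# haconf -makerw
# hares -add tpccdata_vol CVMVolDg tpcc_data
# hares -modify tpccdata_vol CVMDiskGroup tpccdata01
# hares -modify tpccdata_vol CVMVolume tpccdata
# hares -modify tpccdata_vol CVMActivation sw
# hares -add tpccdata_mnt CFSMount tpcc_data
# hares -modify tpccdata_mnt BlockDevice /dev/vx/dsk/tpccdata01/tpccdata
# hares -modify tpccdata_mnt MountPoint /tpccdata
# hares -modify tpccdata_mnt MountOpt crw
# haconf -dump -makero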

Then click on Enabled and clear the Critical check box for now, to avoid any errors until you have verified that the entire configuration is correct. Then click on Finish:

There is a dependency between those resources: the mount points are the parents of the volumes. Click Next and add these dependencies:

This will be the final dependency view:

To make sure that the service group is automatically brought online upon system restart, right click on

the service group and select Properties. Click on the Attributes tab and then right click on the

AutoStartList to edit it.

Add the two nodes to the list:

Service Group tpcc_instance

The next step is to configure a specific service group for the tpcc Oracle instance. Follow the same instructions used to create the tpcc_data service group, but this time select a Failover service group, given that the tpcc instance will run on only one node at a time:

Create the following resources for this service group:

And set the properties for each of them:

Listener:

- Home: /oracle/product/11.2.0/dbhome_1

- Owner: oracle

NIC:

- Device: eth0

Oracle:

- Home: /oracle/product/11.2.0/dbhome_1

- Owner: oracle

- Sid: tpcc

IP:

- Address: 10.182.100.138

- Device: eth0

- Netmask: 255.255.248.0

And create the following dependencies between those resources:

As in the previous service group, modify the AutoStartList to include both nodes of the cluster.

This will be the final hierarchy:

So far, two service groups have been created. One is parallel, takes care of the volumes and the mount points for the data, and will be active on the two nodes at the same time; the other one has the resources needed to bring the database up and monitor it. The second service group needs to have the first one available, so there is a service group dependency between the two. To create this dependency, right click on the tpcc_instance service group, select Edit and Link:

Select tpcc_instance as the parent group, tpcc_data as the child, and Online Local Firm:

The tpcc_data service group also depends on the cvm service group, so create that dependency as well:

This will be the final service group dependency view for the cluster:

Once the configuration has been tested and it has been verified that the Oracle instance can fail over between nodes, set the resources as critical so that the cluster will take the proper actions in case of failure.

Fast Failover Setting

SFCFSHA is able to provide a Fast Failover framework for Oracle based on two technologies. First, the usage of Cluster Volume Manager and Cluster File System provides a single name space that is concurrently accessible from all the nodes of the cluster. This means that there is no delay caused by having to make the data accessible to a node, as the file systems are already mounted everywhere. This is the goal of the tpcc_data service group.

Second is the utilization of the Asynchronous Monitoring Framework (AMF) for the VCS agents. This framework allows detection of any application failure in real time. Previous releases needed manual configuration, but that is no longer the case for SFCFSHA 6.1. For reference, we are going to show where in the configuration this property is specified.

The file /etc/sysconfig/amf determines whether AMF will start and stop:

cat /etc/sysconfig/amf

AMF_START=1

AMF_STOP=1

The following agents used in this configuration already support IMF:

- CVMCluster

- CVMVxconfigd

- CVMVolDg

- CFSMount

- CFSfsckd

- Coordination Point Agent

- Oracle

- Netlsnr

This command can be used to check that the AMF module is loaded:

[root@intel-eva1 ~]# /etc/init.d/amf status

AMF: Module loaded and configured

To verify IMF properties, right click on each agent and select Properties. Look for the IMF attribute. This

is the example for Oracle:

Mode 3 means it is enabled for both Online and Offline operations. In addition to asynchronous monitoring, a regular monitor is triggered every 5 minutes (MonitorFreq). The Netlsnr agent has the same properties by default.
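The same information can also be cross-checked from the command line; for example, this sketch queries the type-level IMF attribute for both agents:

# Display the IMF attribute for the Oracle and Netlsnr agent types
# hatype -display Oracle -attribute IMF
# hatype -display Netlsnr -attribute IMF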