
© Copyright IBM Corporation, 2012

EPIC/IBM BEST PRACTICES – v2.1

Anita Govindjee – IBM Jean-Luc Degrenand – IBM Christopher Strauss – IBM

February 13, 2012


INTRODUCTION

A General Description of the Epic Product

Tiered database architecture utilizing Caché Enterprise Cache Protocol (ECP) technology

THE EPIC HARDWARE PLATFORM SIZING PROCESS

A DESCRIPTION OF THE INTERSYSTEMS CACHÉ DATABASE ENGINE

GENERAL GUIDELINES FOR STORAGE HARDWARE

General Concepts

The Use of RAID

How Data is Processed Through the Storage System

A Typical Layout of the Epic Production Caché Data Volumes

FlashCopy

EasyTier

CONFIGURATION GUIDELINES FOR THE DS8000 SERIES ENTERPRISE STORAGE SYSTEM

SVC AND THE EPIC STORAGE CONFIGURATION

CONFIGURATION GUIDELINES FOR THE STORWIZE V7000 MID-RANGE STORAGE SYSTEM

V7000 Configuration Screenshots

CONFIGURATION GUIDELINES FOR THE DS5000 SERIES MID-RANGE STORAGE SYSTEM

CONFIGURATION GUIDELINES FOR THE XIV STORAGE SYSTEM

CONFIGURATION GUIDELINES FOR THE N-SERIES STORAGE SYSTEM

CONFIGURING THE POWER SYSTEMS AIX SERVER

POWER7

Mounting With Concurrent I/O

Creation of Volume Groups, Logical Volumes, and File Systems for use by Caché

Additional System Settings

ADDITIONAL RECOMMENDATIONS

What Information Should Be Collected When A Problem Occurs


INTRODUCTION

Epic is a Healthcare Information System (HIS) provider which delivers a comprehensive electronic medical record system covering all aspects of the healthcare profession. The Epic solution includes a variety of applications covering such areas as medical billing, emergency room, radiology, outpatient, inpatient and ambulatory care.

The Epic product relies almost exclusively on a database management system called Caché, produced by InterSystems Corporation.

Epic has two main database technologies. The online transactional production (OLTP) database runs Caché as its database engine. The analytical database runs MS-SQL or Oracle. The analytical database has the highest bandwidth, but the Caché OLTP database is by far the most critical to end-user performance and consequently is where most of the attention needs to be focused. This Best Practices guide is therefore centered on the Caché OLTP database.

A General Description of the Epic Product

There are two fundamental architecture models which Epic uses:

(1) Single Symmetric Multiprocessing (SMP)

(2) Enterprise Cache Protocol (ECP)

The majority of customers are using the SMP architecture. Each architecture has a

production database server that is clustered in an active-passive configuration to a

failover server. The Epic production database server runs a post-relational database

developed by InterSystems Corporation called Caché. The Caché language is a modern

implementation of M (formerly MUMPS), which is a language originally created for

healthcare applications.


Functional Layers of the Epic Architecture (top to bottom):

Epic Applications

Epic Chronicles

Epic Core Utilities

InterSystems Caché

Filesystems

Operating System (OS)

Hardware


Single symmetric multiprocessing (SMP) database server

The single database server architecture provides the greatest ease of administration. The SMP model today scales well up to the 16 to 24 processor range. Beyond this point, the ECP model is required.

Tiered database architecture utilizing Caché Enterprise Cache Protocol (ECP) technology

The tiered architecture retains a central database server with a single data storage repository. Unlike the SMP architecture, most processing needs are offloaded to application servers. The application servers contain no permanent data. This architecture offers increased scaling over the SMP architecture.


Production database server (Epicenter OLTP) — Caché and the Chronicles data repository (CDR) live here, including clinical, financial and operational data. The UNIX server hardware is clustered (see failover server) and is SAN-attached. The production database will be replicated to the disaster recovery (DR) site via the Caché Shadow service or array-based replication.

Failover server — Used only when production has problems; it then takes over the functionality of the production database server. The switch from production to failover happens in minutes. The UNIX server has the same configuration as the production database server and is connected to the same SAN volumes. The cluster software is provided by the OS vendor and determines when production should fail over. Epic scripts are added to the cluster scripts to automatically move the application from the production to the failover hardware.

Application server (app server) — Caché service is running on these UNIX systems.

User processing load is distributed via content switches across the application servers,

rather than directly accessing the production database server. All permanent data lives on

the database server, but temporary data is created for local activities on the app servers.

App servers can be added or removed from the network for maintenance when necessary.


Scaling performance is accomplished by adding additional app servers. App servers

cache block information brought from the database server so network traffic is not

incurred for each request for data. App servers also run ECP (Enterprise Cache Protocol),

which allows the app server to access the production database server directly over

redundant, dedicated GigE networks. If an app server fails, the client (or clients) must

reconnect and restart any unsaved activities.

(Reporting) Shadow Server — Near-real-time database of production or a delayed

mirror of what is in production based on Caché journaling process. Replicated data is

used for off-loading production reporting needs, such as Clarity. Shadow servers can also

be used for disaster recovery purposes rather than host-based or array-based replication.

The shadow server is SAN-attached.

Clarity server (OLAP) – Oracle or SQL RDBMS storing data extracted daily from the Reporting Shadow database server via an Extract, Transform, Load (ETL) process. The Clarity server is SAN-attached.

BusinessObjects — Windows servers host Crystal and run the reports that connect to the Clarity database. The results of the reports are typically distributed to a file server.

HA BLOB / file server cluster — Used more by clinicals to store images, scans, voice

files and dictation files. (Can be stored on same cluster, but some customers wish to

separate them.) The HA file server cluster is SAN attached.

Web server — Connects to either application servers or production database server.

Used for the Web applications: MyChart, EpicCare Link, EpicWeb, etc. Depending on

the service functionality, it is linked to either production app servers or the production

database server via TCP/IP.

Print format server (EPS) — Converts RTF (rich-text format) to PCL/PS and controls

routing, batching, and archiving of printouts.

Print relay server — Can be run on the same server as the print format server. Used for downtime reporting. Information from here is sent to the DR PC, where users can access downtime reports.

Full Client Workstation — x86-based PC that runs the client software (Hyperspace) and communicates with a production application server using TCP/IP. When you set up the client on the workstation, there is an EpicComm configuration where you define the environments (production, training, test and so forth) to which the workstation can connect.

1. If you choose not to use Citrix XenApp to present Epic's Hyperspace client, or if you require third-party devices that aren't fully supported through XenApp, you will need some number of full client workstations. See the Citrix XenApp Farm section for further details on the tradeoffs between full client and thin client implementations of Epic.

2. For each Epic software version, we publish workstation specifications which you can use to determine whether or not your existing workstations are adequate to run Hyperspace.

3. If you require new workstations, we publish Workstation Purchasing Guidelines which are reviewed regularly and are expected to exceed Epic's minimum requirements for the next several years. The current workstation purchasing guidelines document appears as an appendix at the end of this document.

4. The number of workstations required will be determined in the early stages of your Epic implementation. This depends on your facility layout, the number of staff working in a given area, and the workflows performed in that area throughout the day. As a guideline, most organizations choose to have enough workstations so that one is readily available to every user at the busiest time of the day, in the user's preferred work area.

5. Epic Monitor is optional functionality that you may choose to deploy in patient rooms in intensive care settings. Epic Monitor requires Windows Presentation Foundation. Consequently, Citrix XenApp is not a viable option for it at this time. Epic Monitor has the following display requirements:

a. 24" touch screen monitor or larger

b. Native resolution of at least 1680 x 1050

c. Resistive touch technology, which allows the use of gloves

d. For usability reasons, a stable wall mount is an absolute necessity

6. We require a round-trip network latency of 40 ms or less between full client workstations and the Epic database server.
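Where a quick spot check of this latency requirement is wanted during site readiness testing, a simple round-trip measurement from a representative workstation is one option; the host name below is a placeholder for the production database (or application) server address.

ping -n 10 epicdb.example.org     (from a Windows workstation; the average round-trip time should be 40 ms or less)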

Thin Client Terminal – Consists of Citrix XenApp x86 servers.

1. If you choose to use Microsoft Remote Desktop Services or Citrix XenApp to present Epic's Hyperspace client, you will need some number of thin client terminals. Low-end workstations or dedicated thin client devices work well for this purpose. You may have existing hardware that meets this need and consequently may not have to purchase new devices.

2. It is important for you to conduct thorough testing of any thin client device that you are considering for production use. Epic is happy to assist you in your evaluation of such devices.

3. We recommend against the use of thin client devices with Windows Embedded CE. Our customers have consistently had poor experience with Windows Embedded CE.

4. The number of devices required will be determined in the early stages of your Epic implementation. This depends on your facility layout, the number of staff working in a given area, and the workflows performed in that area throughout the day. As a guideline, most organizations choose to have enough devices so that one is readily available to every user at the busiest time of the day, in the user's preferred work area.

5. Epic's testing has demonstrated that good performance can be achieved with up to 150 ms of round-trip latency between a thin client terminal and a XenApp server running Hyperspace. If latency exceeds 150 ms, ICA protocol optimizations for high latency conditions should be employed, but may or may not yield acceptable performance.

DR PC: Houses the downtime reports.

CL/EMFI server (community lead / enterprise master file infrastructure) — Community

lead manages and maintains collaboratively built data shared across all instances in the


community. Enterprise server is used as a mechanism to move static master files between

deployments in a community (a group of affiliated deployments). A common build/vocabulary can be distributed across the organization for ease of maintenance and consistent enterprise reporting. Essentially, EMFI is an internal interface broker. The server is not critical for real-time operations, but is needed to make community configuration changes. The EMFI server is SAN-attached and will use array-based replication to the

DR facility.


THE EPIC HARDWARE PLATFORM SIZING PROCESS

IBM does not provide initial sizing of Epic implementations to customers. The primary

reason is that IBM does not have the same sort of information which Epic collects from

their customers prior to the implementation. Before an Epic deployment, Epic dedicates a

considerable set of resources in order to analyze and understand the customer’s

requirements. To accomplish this, a great deal of knowledge about the customer’s

hospital, clinic, or health care environment as well as knowledge about the Epic software

is needed.

Epic will periodically visit an IBM benchmark center for purposes of running multiple

benchmarks using the latest available IBM storage and server equipment. Testing is also performed on-site at Epic in Verona, Wisconsin with the latest IBM server and storage

systems. During these benchmarks, Epic will test multiple types of simulated customer

loads in a variety of simulated environments. The data which is collected will be used to

generate a generic set of sizing parameters along various performance measurement axes.

These parameters are used to calculate an appropriate sizing for a specific customer,

based on their specific user loads and anticipated use of Epic products.

Epic will provide their customer with a sizing document, which outlines reasonably specific recommendations regarding the size of server and storage for that particular customer. The IBM account team can then work with the Epic customer to further refine the exact set of hardware that is required to implement a production version of Epic. Alterations in configuration may be required based on the existing customer IT implementation, equipment location, cost considerations, etc.

However, we strongly recommend not reducing the Epic recommended basic set of

resources such as CPUs, number of disk spindles, storage cache sizes etc. Experience has

shown that the Epic estimates are reasonably conservative. In addition, we have found

that the growth of future customer hardware resource requirements has always exceeded

original estimates.

It should be noted that Epic provides hardware sizing information with the assumption

that Epic will be the only application using the proposed configuration. Epic maintains

very strict requirements regarding response times from the database system with regards

to retrieval of data. The data is essentially 100% random access. This requires fast disk

access with little opportunity to take advantage of I/O read-aheads or the optimal use of

storage cache. Our recommendation is that both the server and storage components not be

shared with other applications. The difficulty with a shared system materializes when

trying to diagnose a performance issue in a production system. In addition to unplanned loads within the Epic environment, having to identify unplanned load excursions presented by non-Epic applications further complicates the process of diagnosis.


Initially, the IBM account representative must request a copy of the sizing guide provided

to the customer by Epic. This document provides everything needed to define a working hardware configuration. The IBM account representative or business partner can communicate with the Epic IBM Alliance team via this email address: [email protected], and can send the copy of the sizing guide through it.

A DESCRIPTION OF THE INTERSYSTEMS CACHÉ DATABASE ENGINE

In order to better understand the Epic environment, we need to examine the underlying database system which is used by Epic. The main reason is that Caché interfaces directly with the IBM hardware; virtually the entire Epic environment relies on Caché in order to accomplish its work. The Caché database engine is based on a B-tree data storage structure. Internal to Caché, user data is managed as 8 KB blocks. Caché maintains a "Global Buffer" cache in the computer's real memory. All transactions (read and write) between the user and the database are read into or written from the Global Buffer. The Global Buffer acts as a storage cache, thus reducing OS-level I/O requests. It also acts as a global data locking communication system which provides mutually exclusive access to data being referenced or changed by multiple users.

Data being referenced by one or more users will initially be read from the storage device

(disk), into the Global Buffer. The data objects are now accessible for repeated operations

including updates to the contents. Access and updates happen rapidly as the data is kept

in RAM.

As data blocks in the Global Buffer are updated they are considered “dirty”. Caché will

“flush” the “dirty” blocks at a regularly scheduled interval of about eighty seconds, or

when the percentage of dirty blocks over total global buffer exceeds the internal

threshold, whichever comes sooner. Caché has dedicated write daemon processes that

perform the actual update operations. We typically reference the “flush” of the dirty

blocks as a “write daemon cycle”.

Caché uses a two-phase update technique for database updates. During the write daemon cycle, updates are first written to the CACHE.WIJ (write image journal) file. The updates to the actual database

file only happen after the WIJ updates have completed successfully. After all database

updates are committed, Caché will go back and mark the WIJ clean. The first WIJ writes

are sequential writes of 256 KB blocks.

While the Global Buffer is not being flushed, the I/O requests issued by Caché are strictly read I/O in nature. The exception is the Caché journal file, which is written out to disk in a sequential manner. Writing to the journal file is a continuous process, but

does not require a large number of resources. Therefore the random read operations can

occur with little or no interference.


This is not the case during the “flush” or “write burst” which is initiated every eighty

seconds. While Caché continues to issue 100% read requests, the DB engine also

generates a large quantity of write requests in a very short amount of time. Epic has strict

read latency guidelines to avoid degrading user performance. Write latencies also become

increasingly important for high-end scalability. For large implementations this can lead to

a clear conflict of requiring optimal read performance while at the same time demanding

optimal write performance during intense write bursts.

The reality is that no storage system can complete both 100% reads and 100% writes

simultaneously. The performance metric which Epic uses to determine adequate user response time is the time required for a read request to complete. The acceptable threshold is 15 ms or less; that is, the interval between the time a request is generated and the time the I/O request returns control to the user with the requested data available.

Caché also keeps a time-sequenced log of database changes, known as Caché journal

files. The Caché journal daemon writes out journal updates in a sequential manner every two seconds, or when a journal buffer becomes full, whichever happens sooner. The amount of journal updates is insignificant compared to the amount of database updates during each write daemon cycle. Therefore we normally consider the I/O operations to be mostly random reads when the write daemons are not active.

Overall, the I/O access pattern of the Epic/Caché system is expected to consist of continuous random read operations punctuated by write bursts at roughly 80-second intervals. In order to meet Epic's response time requirements, the read service time measured at the application level needs to be 15 ms or less in an SMP configuration and 12 ms or less in an ECP configuration.

Without adequate storage resources and diligent configuration of these resources, the read

response time will degrade during the “write burst” period. Read response times which

exceed 15ms will be perceived by the end user as an unacceptable delay in overall

performance. Depending on how under-configured or incorrectly tuned a storage system

is, response times as slow as 300+ms have been observed.
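Where a quick sanity check of disk read service times is wanted on AIX, the extended iostat report is one way to look at them. This is a sketch only: the hdisk names and interval are placeholders, and the values reported are device-level, so they will typically run somewhat lower than the application-level read times that Epic measures.

# Extended per-disk statistics every 5 seconds, 3 reports, for the hdisks
# backing the production volume group (illustrative names only); compare
# the read "avg serv" values against the 15 ms (SMP) / 12 ms (ECP) targets.
iostat -D hdisk13 hdisk14 hdisk15 hdisk16 5 3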

The information provided in the next sections will outline steps needed to mitigate slow

read response times. This document will not provide information regarding the makeup

of the IBM storage system family. However, we will provide references which will

furnish complete background information about IBM storage systems in general, or

provide details about specific hardware which is addressed in this document.

GENERAL GUIDELINES FOR STORAGE HARDWARE


General Concepts

Much of today’s storage technology was designed to solve two general problems: (1)

safe, redundant and recoverable storage of large amounts of data and (2) rapid retrieval of

the stored data. An assumption regarding the access of the data is that reading and

updating of the information would occur at relatively constant rates. For example, data

would be read from the storage system about 70% of the time, and written to the storage

system about 30% of the time on a fixed basis. This ratio can vary widely depending on

the end-user application.

In the case of Caché, the application issues exclusively read I/O most of the time. At the end of each 80-second interval, in addition to the requests for data, a large set of write requests to storage is introduced. This "burst" can consist of several hundred megabytes of 8K data blocks which must be written twice: once to the Caché WIJ and then to the actual random-access database.

Most storage subsystems are not really optimized for this type of “burst” I/O behavior.

Moreover, it was assumed that the ratio of reads to writes would remain relatively

constant across a fixed period of time. Most of the best practices have assumed these

constant ratio read/write conditions. In the case of Caché, the storage system must first

operate in a “read-only” mode, followed next by a simultaneous, “read-only” and “write-

fast” mode. This cycle is repeated every eighty seconds.

If the cache associated with the storage system is not large enough and becomes rapidly

filled with data to be written to disk, the storage system’s algorithm will direct the storage

controller to “de-stage” the write cache. This operation supersedes the priority of any

read requests, which are being processed.

The two storage resources which can most often limit the read performance are (1)

storage cache and (2) the ratio of number of disk spindles to total user data. The greater

the number of spindles that are available during a write operation, the faster the writes to

physical disk can be completed. Associated with the number of spindles is the cache size

available to the storage system. Data which is destined to be written to physical disk must

wait in the write cache until the physical disk resources become available.

The most significant limiting factor across all of the storage system components is the

physical disk. Reading or writing to the disk is limited by the rotation speed of the platter

and the time required to start and stop the read/write head movement. No matter what

other factors are considered to maximize throughput, the wait time for an I/O request to

be serviced by the disk will ultimately determine overall response time.

Most first-time Epic users will want to fill the existing disk arrays to their maximum

available capacity. This is especially true since under RAID 10, half of the spindles are

already used simply to mirror the user data. Thus, from the end-user's perspective, only

half of the spindles are available. Completely filling the useable RAID 10 formatted disk


space translates into more data that must be accessed by the single read/write head on an

individual spindle.

Newer technology such as Solid State Drives (SSDs), which are not limited by the rotation speed of a platter, can handle the load even with RAID 5. SSDs also allow us to leverage new technologies such as EasyTier or advanced caching on XIV.

Another problem with completely filling the disks is that, at the logical volume storage

level, certain JFS metadata information must be retained on the same disk volumes. The

metadata includes journal logs which allow the filesystem to recover from a logical

volume failure. To protect the JFS metadata, it is advisable not to exceed 97% of the disk

capacity for this reason alone.

Rapid database growth is the norm and to be expected in a health care setting. Epic

therefore sizes for three years of growth when providing the hardware sizing guide.

Unexpected addition of new patients or patient data can rapidly consume large amounts

of reserve capacity.

For all of these reasons, it is recommended that physical disk utilization not exceed 60-70% over the three-year period.
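As a rough illustration using the figures above (the numbers are hypothetical): a 4+4 RAID 10 rank built from 146 GB drives provides roughly 4 x 146 GB ≈ 584 GB of usable, mirrored capacity. Planning to about 65% utilization means placing roughly 380 GB of data on that rank, leaving roughly 200 GB of headroom for three years of growth and for the JFS2 metadata mentioned earlier.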

The Use of RAID

The striping of logical data across multiple spindles is an obvious way to evenly

distribute the load of all available disks. Consider ten simultaneous requests for data. If

all of the data was located on only one platter, then nine of the ten requests would remain

queued while the first request was being serviced. Each read or write requires the disk to

rotate and the read write head to move to a new position. All of this is completed

sequentially. Now consider the same data spread or “striped” evenly across ten disk

platters. The ten read requests can be serviced in parallel. Now the only limitation is the

latency required to move the data from the storage cache to the requesting server. This

time can be measured in units of hundredths of milliseconds. Depending on the type of

disk drive, a physical read operation can consume between 3 and 5 milliseconds.

The Caché I/O requests to the production data files are about 99% random in nature. This

means that for almost every I/O request, the read/write heads will perform a “seek”

operation. If the data were sequential, the read operation would require little or no “seek”

operations. As one block of data is written, the read/write head will most likely be

already positioned to write the next block.

One method of striping is the use of RAID (Redundant Array of Independent Disks).

RAID not only provides data striping for faster disk access but provides loss of data

protection as well. The two most widely used types of RAID are 5 and 1+0 or 10. Based

on testing done with Epic, we have determined that RAID 10 provides better

performance compared with RAID 5. There are documented reasons why RAID 10 is


superior to RAID 5 particularly when multiple random writes of small blocks are

required. The following reference provides additional details about RAID types and their

respective performance:

http://www.redbooks.ibm.com/abstracts/sg247146.html?Open

RAID10 provides data redundancy by way of mirroring each disk. If one disk fails, a

duplicate copy of that disk will provide the same data. When the failed disk is replaced,

the system will rebuild the new platter with a copy of the data located on the mirrored

drive.

Besides striping at the storage level using RAID, striping is also done at the SVC level

and at the Logical Volume Level as well. These striping methods will be covered in later

sections.

How Data is processed through the Storage System

There are multiple logical and physical ‘stages’ within a storage system that data must

pass through. These stages include:

(1) The physical disk drives, where data is actually written or read. The drives are

arranged in groups of 16 disk units within a physical tray. We are currently

recommending the use of 146GB drives or smaller. However, future disk density and

access speed technology may allow for larger capacity drives.

(2) The RAID array; RAID 10 is assumed in this example. Each array consists of 8 disks from an array site. The RAID 10 array will consist of either 4 + 4 disks, or 3 + 3 disks plus 2 spares. We especially recommend using 4 + 4 ranks for production on DS8000 storage.

(3) The strip size used for the RAID 0 portion of RAID 1+0 (RAID 10) is 128 KB.

(4) The stripe size is 1 MB (128 KB strip x 8 disks).

(5) There are N sets of 8+8 disks which make up a RAID 10 rank. The 8+8 array can be

split between the Epic “prod” and WIJ volumes into 6+6 and 2+2 respectively. The 6+6

consists of disks from one RAID 10 array and the 2+2 consists of disks from another

RAID 10 array

The extent size is typically set at 1GB. Multiple extents are used to create a LUN.

(6) Depending on the size of the production database multiple LUNs should be created.

The minimum number is two and the maximum recommendation is 32 LUNs.


(7) The storage cache size on the DS8K is a minimum of 32GB per controller. This value

may change depending on the model of the storage system and the total size of the Epic

database being used. For the DS8000 series storage systems, 1/32 of the total storage

cache is dedicated for write I/Os. Therefore, a sufficient total amount of cache must be available to ensure that enough write cache exists to handle the data coming from the Caché write burst.
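As a hypothetical illustration of that 1/32 ratio: with the 32 GB minimum of cache per controller (64 GB in total), roughly 2 GB would be dedicated to write I/Os, and that write cache must be able to absorb the several hundred megabytes of dirty 8K blocks produced by each write daemon cycle, plus the corresponding WIJ writes.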

These values are the “typical” recommended configuration for a standard Epic

installation. The values, however, may vary based on recommendations made by Epic or

depending on the total size requirements of the database.

Based on empirical evidence these values seem to provide the best overall performance.

A Typical Layout of the Epic Production Caché data volumes

Although there are any number of ways to configure the LUNs for use by the Caché DB

product, the following configuration seems to provide acceptable results:

The Caché production database file systems/logical volumes, prd01-prd08, should be

spread across as many ranks as possible. These ranks should be made up of spindles from

multiple and diverse RAID10 arrays. Selection of the arrays should be evenly distributed

across both storage system controllers as well as fiber channel adapters. The WIJ should

also be allocated from disks belonging to the same arrays as the disks used for

production. This keeps the WIJ volume “spread” across multiple spindles as much as

possible.

The WIJ and the production volumes are accessed at separate times. There will be no

simultaneous contention from the WIJ and the production data for the ranks at any time

during the “write burst” process.

The Transaction Journal should be created from separate ranks from the database

volumes to give extra protection against disk array failure. In the event that the

production database arrays experience a catastrophic failure, the journal files will be used

to recover any lost transactions.

Here is a sample schematic representation of the disk layout for a DS8K system:


Below is an example of the commands to set up the volume groups, volumes and file systems for a single Epic instance:

mkvg -f -S -s 16 -y epicvg1 hdisk13 hdisk14 hdisk15 hdisk16

mklv -a e -b n -y prlv11 -e x -w n -x 35192 -t jfs2 epicvg1 35082

mklv -a e -b n -y prlv12 -e x -w n -x 35192 -t jfs2 epicvg1 35082

mklv -a e -b n -y prlv13 -e x -w n -x 35192 -t jfs2 epicvg1 35082

mklv -a e -b n -y prlv14 -e x -w n -x 35192 -t jfs2 epicvg1 35082

mklv -a e -b n -y prlv15 -e x -w n -x 35192 -t jfs2 epicvg1 35082

mklv -a e -b n -y prlv16 -e x -w n -x 35192 -t jfs2 epicvg1 35082

mklv -a e -b n -y prlv17 -e x -w n -x 35192 -t jfs2 epicvg1 35082

mklv -a e -b n -y prlv18 -e x -w n -x 35192 -t jfs2 epicvg1 35082

mklv -a e -b n -y wijlv1 -e x -w n -x 800 -t jfs2 epicvg1 795

crfs -v jfs2 -d prlv11 -m /epic/prd11 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio

mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv11 /epic/prd11

crfs -v jfs2 -d prlv12 -m /epic/prd12 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio

mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv12 /epic/prd12

crfs -v jfs2 -d prlv13 -m /epic/prd13 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio

mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv13 /epic/prd13

crfs -v jfs2 -d prlv14 -m /epic/prd14 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio

mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv14 /epic/prd14

crfs -v jfs2 -d prlv15 -m /epic/prd15 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio

mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv15 /epic/prd15

crfs -v jfs2 -d prlv16 -m /epic/prd16 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio

mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv16 /epic/prd16

crfs -v jfs2 -d prlv17 -m /epic/prd17 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio

mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv17 /epic/prd17

crfs -v jfs2 -d prlv18 -m /epic/prd18 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio

mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/prlv18 /epic/prd18

mkdir /epic/prd1

crfs -v jfs2 -d wijlv1 -m /epic/prd1 -A yes -p rw -a logname=INLINE -a options=rw,rbrw,cio

mount -v jfs2 -o rw,rbrw,cio -o log=INLINE /dev/wijlv1 /epic/prd1

[Schematic referenced above: DS8100 disk layout showing RAID 10 and RAID 5 ranks holding the database file disks, journal file disk, FlashCopy disk, and hot spare disks, distributed across DA pairs 0, 2 and 3.]

And here is an example of the resulting filesystem layout from running the commands

above:

Filesystem 1024-blocks Free %Used Iused %Iused Mounted on

/dev/prlv11 1293778944 86438760 94% 8 1% /epic/prd11

/dev/prlv12 1293778944 86438640 94% 8 1% /epic/prd12

/dev/prlv13 1293778944 86438512 94% 8 1% /epic/prd13

/dev/prlv14 1293778944 86438568 94% 8 1% /epic/prd14

/dev/prlv15 1293778944 86438568 94% 8 1% /epic/prd15

/dev/prlv16 1293778944 86438680 94% 8 1% /epic/prd16

/dev/prlv17 1293778944 86438936 94% 8 1% /epic/prd17

/dev/prlv18 1293778944 86438776 94% 8 1% /epic/prd18

/dev/wijlv1 26050560 21903352 16% 6 1% /epic/prd1

/dev/prlv11 /epic/prd11 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE

/dev/prlv12 /epic/prd12 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE

/dev/prlv13 /epic/prd13 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE

/dev/prlv14 /epic/prd14 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE

/dev/prlv15 /epic/prd15 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE

/dev/prlv16 /epic/prd16 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE

/dev/prlv17 /epic/prd17 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE

/dev/prlv18 /epic/prd18 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE

/dev/wijlv1 /epic/prd1 jfs2 Nov 10 15:30 rw,rbw,rbr,cio,log=INLINE

For more information please refer to Epic’s File System Layout Recommendations

document.

FlashCopy

Except for XIV, FlashCopy is used for creating point-in-time copies of the production database. The Caché DB writes are momentarily suspended while the FlashCopy command completes. We recommend using incremental FlashCopy. The target drives for the FlashCopy can be different than the source drives; for example, 15K vs. 10K RPM or RAID 5 vs. RAID 10 differences are acceptable. However, SATA (or nearline) drives are not recommended.

EasyTier

EasyTier can be used within an Epic production environment, but the customer must continue to use at least the number of 15K RPM spindles recommended by the Epic Hardware Configuration Guide.

CONFIGURATION GUIDELINES FOR THE DS8000 SERIES ENTERPRISE STORAGE SYSTEM

The following section provides specific details regarding the configuration of the DS8000

Storage System. The description of the storage layout in Section IV was intended to be a

“generic” starting point.


When configuring an IBM DS8000 series storage unit, it is important to have production

data LUNs from multiple ranks that are then assigned to different controllers. This is

needed to work around the 25% NVS cache per rank limit. The ranks should be divided

as evenly as possible between the available DA (Device Adapter) pairs.

With the 15000 rpm disks configured in RAID 10, the DS8000 will present 2 types of

arrays: 4+4 or 3+3+2S (two spares), depending on the number of disks per Device Adapter. The 4+4

arrays seem to have slightly better performance. We recommend using only the 4+4

arrays for the Epic production volumes. The 3+3 arrays can be used for shadow as well as

other non-production activities.

We recommend using one extent pool per rank for the DS8000 to simplify the

management. The striping of the LUNs will be done at the AIX (LVM) level.

When creating the Extent Pools, they need to be spread evenly across both "servers" (internal controllers) of the DS8000.

As a general rule, we always recommend using 4+4 arrays if possible on the DS8000

storage for the Epic production instance.

If a Multiple-Array Extent Pool is required, it is preferable to create the extent pool with

as many 4+4 arrays as possible. A minimum of two extent pools is required. These two

extent pools should be associated with the two controllers.

Volume groups should be created from one LUN per Extent Pool in the DS8000, in order

to spread every AIX Logical Volume across every AIX physical volume in the Volume

Group.

When the DS8000 is shared with applications other than Epic, the Epic production database arrays should be placed on their own Device Adapter and should not share the Device Adapter with other applications, if possible.

FlashCopy is mandatory for the nightly backup. It is strongly recommended to use

Incremental FlashCopy. If you need to have more than one Incremental FlashCopy to

create the daily “support” database (for example), it is possible to do an incremental

FlashCopy from the EPIC Reporting Shadow database. Please contact

[email protected] for more information.

The FlashCopy repository does not require the same types of spindles or geometry as the

source disks. For example, RAID 5, 10K RPM drives could be used for the FlashCopy

repository instead of higher-performance drives.
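As an illustrative sketch only (the storage image ID and volume IDs below are placeholders, and the exact dscli syntax should be verified against the DS8000 CLI reference for your code level), an incremental, persistent FlashCopy relationship is typically established once and then refreshed for each nightly backup:

dscli> mkflash -dev IBM.2107-75XY123 -record -persist 0100:0200

dscli> resyncflash -dev IBM.2107-75XY123 -record -persist 0100:0200

The first command establishes the relationship with change recording enabled; the second performs the incremental resynchronization used for subsequent backups.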

For optimal performance of the Epic production environments, it is best to have at least 4

fibre channel ports on the DS8000 connected to a minimum of four HBAs on the server

per production instance.


Since mid-November 2011, an "Epic optimization package" has been available which significantly improves the performance of the DS8000 by reducing the peak read I/Os. This applies to the following code levels: DS8100/DS8300 R4.3 or higher, DS8700 R5.1

or higher, DS8800 R6.1 or higher. Please contact your IBM representative for the

process to obtain the Epic optimization package.

SVC AND THE EPIC STORAGE CONFIGURATION

SVC has been found to have measurable benefits when included in a hardware

configuration which supports the Epic software environment. Tests have shown that the

SVC will not impact the overall Epic storage performance while providing all the

functional benefits of an SVC. This includes the Storwize V7000 in Gateway mode.

Following are some important guidelines to consider for a hardware configuration which

includes SVC. (See Figure I)

1. We recommend using an even number of vdisks for Epic OLTP production data

and balancing the vdisks between the controllers to balance the load between the

controllers.

2. When using SVC FlashCopy to create backup, it is essential to configure the

FlashCopy to be incremental and to tune the background copy rate from the

default 50% to a value that will allow the background copy to finish in time

without impacting production performance during the backup window.

a. At the time of the FlashCopy resynchronization, FlashCopy will use the

write cache as well. Therefore, it is important to take it into account for the

overall cache sizing.

b. The incremental FlashCopy target should follow the same rules as the

production database to use the maximum available cache. Specifically it

should be part of at least 4 mdiskgroups.

c. During our testing, a background copy rate in the range of 50-70 (the default being 50), combined with incremental FlashCopy, gave the best balance between FlashCopy copy time and production disk I/O response time. The best value that we have found seems to be 65 (see the CLI sketch after this list).

3. As mentioned previously, when the SVC is shared with storage systems that are

non-Epic related, we recommend dedicating one IOgroup (pair of nodes) of the

cluster to the EPIC Production MDisks, and assign the rest of the load to the

remainder of the cluster (the SVC supports up to 4 IOgroups).

4. Because of the Write Cache Partitioning feature (which prevents cache

starvation), the SVC will not allocate more than 25% of the Write cache per

Storage Pool (mdisk group) if there are more than 5 mdisk groups in the system. To get access to the full Write cache of the IOgroup, we recommend creating at least


four StoragePools (Mdisk Groups). We recommend production OLTP data to be

spread over at least 4 mdiskgroups to allow access to the full write cache.

5. When the SVC is used in conjunction with DS4/5000 series storage, the

DS4/5000 write cache should be entirely disabled. Tests have determined that

having the storage cache and the SVC write cache enabled results in poor

response times both for reading and writing I/O rates.

FIGURE I – SVC and Storage Configuration

DS4000 Series / DS5000:

With SVC (or V7000 in Gateway mode): disable the storage write cache for the production volumes.

Without SVC: storage write cache set to 5% lower and 5% upper.

6. Please note that the SVC has been tested with the Epic production DB on all IBM-supported platforms. The cache should now be on for both the SVC and the

DS8000-series.

7. It is important to tune the queue_depth setting for each hdisk according to the SVC performance guide, especially when using relatively large VDisks at the SVC level.
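As an illustrative sketch only (the VDisk and mapping names are placeholders, and the exact command syntax should be verified against the SVC/Storwize CLI reference for your code level), an incremental FlashCopy mapping using the background copy rate discussed above might look like this:

svctask mkfcmap -source epic_prd_vdisk -target epic_bkp_vdisk -name epic_bkp_map -copyrate 65 -incremental

svctask chfcmap -copyrate 50 epic_bkp_map

The first command creates an incremental mapping with a background copy rate of 65; the second shows how the copy rate can be adjusted later without recreating the mapping.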

CONFIGURATION GUIDELINES FOR THE STORWIZE V7000 MID-RANGE STORAGE SYSTEM

The Storwize V7000 is IBM’s mid-range storage system that uses a similar technology

base as the IBM SVC, thus all the SVC mdiskgroup considerations apply to the V7000.

The IBM Storwize V7000 offers internal solid-state drives (SSDs), 15K RPM and 10K

RPM Small Form Factor (SFF) drives. IBM offers a 300 GB capacity for their 15 K RPM

small form factor drives.

1. Depending on the Epic requirements we may need to use more than one canister

controller for the Epic load. If the Epic OLTP production workload requires more

than 32 spindles, we recommend you dedicate a V7000 controller (two canister controllers) to the OLTP production workload. If the production workload requires fewer than 32 spindles, it may be acceptable to share the V7000 controller with

other non-aggressive application workloads. Please check Epic’s V7000

Configuration document for more information on Epic’s requirements.

2. Please note that the previous SVC section applies fully to the V7000; refer to that section when configuring the V7000.


3. We recommend setting the FlashCopy grain size to 64KB when using SSDs for

the production OLTP data.

4. Storwize V7000 with 15K and 10K RPM SAS drives

a. Epic recommends 15K RPM drives for production storage and has live

production experience with 15K RPM drives. 15K RPM (but not 10K

RPM) drives provide the level of performance required for Epic

production. However, for non-production use on the Storwize V7000, 10K RPM drives are acceptable.

5. The Storwize V7000 offers an easy configuration tool in the GUI (wizard). The array for the Epic production database should be configured as RAID 10.

6. The spare disks do not have to be created with the array but need to be added once

all the arrays have been created. By this method, you can control the location and

the number of spare disks.

7. Just like the SVC, it is recommended to use at least 4 storage pools (mdisk groups), if the number of disks permits, to have access to 100% of the write cache. Epic provides a cache requirement for the production database, so if the write cache is sufficiently large, additional storage pools may not be needed.

8. We have noticed that it is easier to manage groups of 4+4 RAID 10 arrays. This is

not mandatory.

9. Storwize V7000 with SSDs (Solid State Drives)

a. Although SSD performance is significantly better than that of spinning drives,

testing has shown that the write cycle length can be the limiting factor for

performance on SSDs. As a general rule of thumb it is possible to replace

six spinning drives with a single SSD if capacity permits. Additional

SSDs are not expected to significantly change the write service times.

b. RAID 5 is recommended for SSDs rather than RAID 10 due to the cost-

performance benefit of SSDs over spinning drives.

V7000 Configuration Screenshots

The IBM Storwize V7000 GUI provides an array configuration wizard with logic that ensures new RAID arrays are created using appropriate candidate disks that will provide the best performance and spare coverage.


Figure 1 – Storwize V7000 System Status


The following example shows the 6 mdisks that comprise a 48 disk, RAID10 SAS array:

Figure 2 - Sample mdisk config


Next, a view of the storage pool, composed of the above 6 mdisks:

Figure 3 - Sample Storage Pool config


Finally, here is a view of the 8 logical volumes (LUNs) that have been mapped to our AIX server. These LUNs were added to a common Epic volume group (VG) and divided into 9 filesystems against which our database and WIJ simulations were executed.

Figure 4 - Sample LUN config

New RAID arrays on the IBM Storwize V7000 were created using the interactive

interface. The Storwize V7000 includes the capability to build optimal arrays through

wizard driven array definition panels. The array configuration panels will select the

drives most suited to your storage requirement based on the settings you choose. While

there is no need to manually configure the storage to guarantee a balanced RAID array,

there is still the option to create arrays from the graphical interface or using the command

line interface.

CONFIGURATION GUIDELINES FOR THE DS5000 SERIES MID-RANGE STORAGE SYSTEM

(Note: These configuration guidelines apply to the DS4000 series also.)

The DS5000 Series Storage System is sufficiently different from the DS8000 such that

some additional consideration must be made in order to obtain the best possible

performance from this mid-range system. The following section provides specific details

regarding the configuration of the DS5000 Storage System. As was mentioned in section


V., the description of the storage layout in Section IV was intended to be a “generic”

starting point.

The write cache flush should be set at 5% maximum and 5% minimum.

When using IBM SVC to manage IBM DS5000 series storage units, the optimal results

were achieved by disabling write cache (not the read cache) at the DS5K unit level for

the LUNs that will be used to hold the database files only, and by using read/write cache at the SVC level for the VDisks that were constructed from those DS5K LUNs.

CONFIGURATION GUIDELINES FOR THE XIV STORAGE SYSTEM

XIV storage may be used for non-production purposes within the Epic

environment. This includes activities such as testing, training and shadow servers which

are not being used for production purposes. XIV storage provides a low cost large

capacity data retention facility. However, it is not intended to provide the low latency

response characteristics required within the Epic production environment.

The latest XIV technology, XIV Gen3, is also available for Epic non-production

purposes. XIV Gen3 with SSD technology is planned to be available for Epic production

instances in the future. Preliminary IBM-internal testing of XIV Gen3 with SSDs

demonstrates acceptable performance for the Epic production environment.

CONFIGURATION GUIDELINES FOR THE N-SERIES STORAGE SYSTEM

Please refer to Epic's NetApp best practices for the IBM N-Series storage system. The IBM N-Series system is based on NetApp storage technology.

CONFIGURING THE POWER SYSTEMS AIX SERVER

In order for the Power Systems AIX server to provide optimal performance when running

the Epic software, a specific set of changes to the default set of system tunables is

required. These system parameters have been tested with the Epic software under many

differing scenarios, which Epic Systems feels would be encountered in typical situations.

POWER7

If the Epic Hardware Configuration Guide specifies 14 cores or fewer, this section does not apply.

The inter-CEC L3 cache-to-cache communication on POWER7 limits the high volume of

lock management that the InterSystems Caché database employs. An Epic instance that


uses more than 16 cores and crosses a CEC boundary will encounter performance issues

due to the L3 cache-to-cache communication latency.

If using a 795 Power Systems server for the Epic production instance, please refer to

IBM’s 795 Cross-Book Guidelines whitepaper.

The 750 server consists of a single 32-core CEC (or book). The POWER7 L3 cache latency is therefore not an issue, and Epic can scale up to 28 cores.
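Where it is useful to confirm how an LPAR's processors and memory are placed relative to CEC/book boundaries, the AIX affinity report is one way to check (a sketch only; the output format varies by AIX level):

# Show the scheduler resource affinity domains (processor and memory
# placement) for this LPAR, which helps verify that the Epic instance
# stays within a single CEC/book.
lssrad -av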

Mounting With Concurrent I/O

The primary change to a default AIX system is invoking the use of concurrent I/O, or

CIO. By default AIX uses the JFS2 filesystem. CIO bypasses the caching features which

are enabled within JFS2. The principal reason for disabling the JFS2 cache is that the

Caché DB application is already caching needed data blocks. Caché determines what data

needs to be written to permanent storage and what data should remain in the Caché global

buffers. Having the JFS2 cache also making this determination will typically cause

unnecessary extra work to be performed by the system. In addition the JFS2 cache

requires real memory which could otherwise be used by the Caché global buffer.

CIO is invoked via the -o cio mount option. This option should be used on the database-only file systems, typically /epic/prd01 through /epic/prd08. These filesystems host the

CACHE.DAT files which are exclusively random access in nature. The Caché Write

Image Journal should be mounted with the default JFS2 mount options.

Creation of Volume Groups, Logical Volumes, and File Systems for use by Caché

Following are the steps necessary to create and mount the volumes which will host the

Epic data volumes. It is assumed that the storage LUNs which correspond to the volumes

have already been created either via the storage system or the SVC if available.

Step 0. Make a top-level root directory for the Epic/Caché file systems.

EXAMPLE:

mkdir /epic

mkdir /epic/prd01

Step 1. Create the Volume groups

EXAMPLE:

mkvg -S -y epicprvg -s 16 hdisk1 hdisk2 hdisk3 .....


Step 2. Create the Logical Volumes

EXAMPLE:

mklv -a e -b n -e x -t jfs2 -y prdlv01 epicprvg 10G hdisk1 hdisk2 hdisk3 .....

mklv -a e -b n -e x -t jfs2 -y prdlv02 epicprvg 10G hdisk2 hdisk3 hdisk4 ..... hdisk1

Step 3: Create the File Systems

EXAMPLE:

crfs -v jfs2 -d prdlv01 -m /epic/prd01 -A yes -a logname=INLINE -a options=cio

(crfs -v jfs2 -d prdlv -m /epic/prd -A yes -a logname=INLINE -a options=rw)

Step 4: Mount the File Systems

EXAMPLE:

mount /epic/prd

mount /epic/prd01

Step 5: Check that the appropriate entries and options are added to /etc/filesystems.
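For reference, a hypothetical /etc/filesystems stanza for one of the CIO-mounted database filesystems might look like the following (the logical volume and mount point names are the illustrative ones used above):

/epic/prd01:
        dev             = /dev/prdlv01
        vfs             = jfs2
        log             = INLINE
        mount           = true
        options         = rw,cio
        account         = false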

These steps should be repeated for the eight production volumes, the WIJ and the Journal

files. The WIJ should share the same LUNs as the production volumes. The Journal file

should utilize a separate set of LUNs under a separate volume group.
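As a sketch only (the volume group, hdisk, logical volume and mount point names below are hypothetical), a separate journal volume group could be created along the same lines:

mkvg -S -s 16 -y epicjrnvg hdisk20 hdisk21

mklv -a e -b n -e x -t jfs2 -y jrnlv01 epicjrnvg 10G hdisk20 hdisk21

crfs -v jfs2 -d jrnlv01 -m /epic/jrn01 -A yes -a logname=INLINE -a options=rw

mount /epic/jrn01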

When the volumes are mounted, the results from the mount and df commands should resemble the following:

# df /epic/prd0*

Filesystem 1024-blocks Free %Used Iused %Iused Mounted on

/dev/prdlv01 78577664 1995268 98% 12 1% /epic/prd01

/dev/prdlv02 78577664 1995268 98% 12 1% /epic/prd02

/dev/prdlv03 78577664 1995260 98% 12 1% /epic/prd03

/dev/prdlv04 78577664 1995264 98% 12 1% /epic/prd04

/dev/prdlv05 78577664 1995292 98% 12 1% /epic/prd05

/dev/prdlv06 78577664 1995280 98% 12 1% /epic/prd06

/dev/prdlv07 78577664 1995376 98% 12 1% /epic/prd07

/dev/prdlv08 78577664 1985984 98% 12 1% /epic/prd08


Additional System Settings

Following are the recommended changes to a subset of the AIX system tunables. A brief

description of the reason for the change is included.

# vmo

vmo -p -o lru_file_repage=0 -- Determines which type of pages are replaced during a

paging operation, based on file repage and computational

re-page values.

vmo -p -o maxclient%=90 -- Specifies that the number of client pages cannot exceed

90% of real memory

vmo -p -o maxperm%=90 -- Specifies that the number of file pages should not exceed

90% of real memory

vmo -p -o vmm_mpsize_support=0 -- Use 4K memory pages only

# ioo

Required ioo parameters

ioo -p -o lvm_bufcnt=64 -- Specifies the total number of Logical Volume Manager buffers.

ioo -p -o sync_release_ilock=1 -- Allows inodes to be unlocked after an I/O operation update.

ioo -p -o numfsbufs=4096 -- Sets the number of available file system buffers.

ioo -p -o pv_min_pbuf=4096 -- Specifies the minimum number of physical I/O buffers per physical volume.

These j2_xxx settings improve the performance for JFS2 filesystems (optional)

ioo -p -o j2_dynamicBufferPreallocation=256 -- Specifies the number of 16k slabs to

preallocate when the filesystem is

running low on bufstructs.

ioo -p -o j2_maxPageReadAhead=2 -- Specifies the maximum number of pages to be

read ahead when processing a sequentially

accessed file on Enhanced JFS.

ioo -p -o j2_maxRandomWrite=512 -- Specifies a threshold for random writes to

accumulate in RAM before subsequent pages are

flushed to disk by the Enhanced JFS's write-

behind algorithm. The random write-behind

threshold is on a per-file basis.


ioo -p -o j2_minPageReadAhead=1 -- Specifies the minimum number of pages to be read ahead when processing a sequentially accessed file on Enhanced JFS.

ioo -p -o j2_nBufferPerPagerDevice=2048 -- Specifies the minimum number of file system bufstructs for Enhanced JFS.

ioo -p -o j2_nPagesPerWriteBehindCluster=2 -- Specifies the number of pages per cluster processed by the Enhanced JFS write-behind algorithm.

ioo -p -o j2_nRandomCluster=1 -- Specifies the distance apart (in clusters) that writes have to exceed in order for them to be considered random by the Enhanced JFS random write-behind algorithm.

# Additional required parameters

lvmo -v epicrdvg -o pv_pbuf_count=4096 -- Increases the number of PV buffers for the production volume group.

chdev -l hdisk5 -P -a queue_depth=64 -- Sets the hdisk queue depth to 64 (the default is 20).

chdev -l sys0 -a maxuproc=32767 -- Sets the maximum number of processes per user to 32767.
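After the settings are applied, they can be verified with standard AIX commands; a minimal sketch, assuming epicrdvg and hdisk5 as in the examples above:

EXAMPLE:

vmo -a | grep -E "lru_file_repage|maxclient%|maxperm%|vmm_mpsize_support"

ioo -a | grep -E "lvm_bufcnt|sync_release_ilock|numfsbufs|pv_min_pbuf|j2_"

lvmo -a -v epicrdvg

lsattr -El hdisk5 -a queue_depth

lsattr -El sys0 -a maxuproc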

ADDITIONAL RECOMMENDATIONS

1. Boot from SAN

Boot from SAN is not recommended when running the Epic environment:

Both Caché and PowerHA depend on the O/S running correctly. If a SAN failure occurs

such that the O/S can no longer communicate with the rootvg volume, even for a brief

interval, the condition of the O/S is suspect. The system may appear to be operating

correctly. However, if any O/S specific data was lost during transfer between RAM and

disk, the O/S is no longer viable. Since all software running on the system depends

entirely on the O/S, end user products or supporting middleware may no longer function

correctly.

Epic recommends that customers do not boot from SAN so that Epic can log into the system following a failure to troubleshoot. However, PowerHA 7.1 recommends that the customer boot from SAN, partly because of the Live Partition Mobility feature. The

decision to boot from SAN should be discussed with your Epic representative.

2. PowerHA (formerly known as HACMP)


There are multiple resources for information regarding the best method of configuring a

PowerHA (i.e. HACMP) failover cluster. Epic will provide their customers with

PowerHA callable scripts which contain the necessary instructions to cleanly shut down

and start up the Epic and Caché environment.

Most IT system administrators view PowerHA as being capable of recovering from any

and all events that could occur to an Epic environment. As much as we would like to

imagine such a safety mechanism, it doesn’t exist.

What PowerHA will do:

Recover from any type of real hardware failure. This includes the servers, switches, disk

systems and any other type of device which could experience a physical failure due to

power loss, electronic component failure or a catastrophic event.

What PowerHA will not do:

Recover from user errors, either intentional or accidental. Since PowerHA depends on the

operating system, it is assumed that if the operating system started running the Epic

environment without a problem, it should continue to support the environment without a

problem. There are two conditions under which the O/S could fail: (a) a hardware failure, or (b) a change made to the O/S environment by a user. In case (a), PowerHA will recognize the hardware failure and initiate a failover. PowerHA, however, cannot protect against case (b).

PowerHA requires diligent administration and monitoring. PowerHA cannot be installed

and left alone to run by itself. Taking this approach will certainly result in PowerHA eventually failing to operate correctly.

All of the available PowerHA documentation makes two major recommendations:

1. Whenever a change is made to the cluster that is being managed by PowerHA, no matter how trivial it might seem, PowerHA must always be re-tested to ensure that nothing was modified in such a way that PowerHA can no longer function properly.

2. Regardless of whether the system was modified or not, a manual PowerHA failover should be conducted at regular intervals (for example, every three months).

Item 2 provides two benefits: it confirms that a PowerHA failover will work when an unexpected failure occurs, and, because the failover is planned, any problems it exposes can be identified and resolved quickly.
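As a hedged illustration of what such a check might involve (assuming a standard two-node PowerHA/HACMP cluster with the clstrmgrES subsystem; paths and output vary by PowerHA version), the cluster state can be confirmed before and after a planned failover test with:

EXAMPLE:

lssrc -ls clstrmgrES | grep "Current state"

/usr/es/sbin/cluster/utilities/clRGinfo

The first command should report a stable state (for example, ST_STABLE), and clRGinfo shows which node currently owns each resource group.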

PowerHA depends greatly on the environment that it is assigned to manage. Due to its

flexibility, there are many ways to mis-configure a PowerHA environment. There is only


one way to be certain that PowerHA has been configured to run successfully: Test, test

and re-test.

3. PowerHA and SPOF (Single Point of Failure)

In order for PowerHA to work, it must not be limited by Single Points of Failure or

SPOFs. For example, in order for PowerHA to maintain inter-nodal communication

within the HA cluster there must exist more than a single communication path. This

requires the availability of completely redundant switches, cables and adapters from one

end to the other. Having 8 communication adapters on each node does no good if the two

nodes are connected via a single data path (Ethernet cable). Having multiple redundant

zones on a switch won’t help if the switch loses power.

Therefore, building in redundancy is a must. This means that half of the equipment may sit idle until a failure occurs, which, unfortunately, is a cost of maintaining a High Availability environment.

4. PowerHA and ECVG

Customers who are using Epic are required to provide a fail-over system which will take

over in the event of a primary OLTP system failure. This is of obvious necessity in a

health care related environment. IBM offers this facility on POWER based systems

through the use of PowerHA.

Should the active compute system which is running Epic encounter a failure, PowerHA

will recognize the loss of the active system. The fail-over process causes the resources,

(primarily the attached storage system), being used by the primary system to be acquired

by the take-over system. The backup system will then attempt to start the same Epic

environment. Although the takeover is not instantaneous, it does, however, provide an automated method of recovering from a catastrophic hardware failure.

In more recent versions of PowerHA, IBM has introduced the use of Enhanced

Concurrent Volume Groups (ECVG). The advantage of ECVG is primarily that the Epic

database volumes are already varied on to both PowerHA nodes (active and standby

nodes). In the event of a failure, the time required for the take-over node to acquire the

Epic volumes is greatly reduced. Therefore IBM has encouraged their PowerHA

customers to take advantage of ECVG mounted volumes which are associated with a

PowerHA cluster.

In the unlikely event that PowerHA itself fails, ECVG can potentially cause a 'split brain' event. When the nodes in the cluster can no longer communicate, or, especially, if the takeover node believes that the primary node has failed, it is possible for both nodes to become active. In that case, the Epic software could start running on the takeover node while the primary node is still in play. Recent versions of PowerHA (versions 6.1 and 7.1) have significantly reduced the possibility of a 'split brain' event occurring. In PowerHA version 6.1, ECVG can safely be used in the Epic environment.


In PowerHA version 7.1, ECVG is mandatory in any case. Therefore, ECVG can safely be used in an Epic environment.

When logical volumes are mounted concurrently, they can be accessed from more than one compute node simultaneously. Therefore, when a volume group is mounted concurrently, data on its volumes can be updated by both nodes.
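As a minimal sketch (assuming epicprvg is the production volume group and its LUNs are visible to both cluster nodes; PowerHA C-SPOC is normally used to keep both nodes synchronized), an enhanced concurrent capable volume group can be created or converted with:

EXAMPLE:

mkvg -C -y epicprvg hdisk1 hdisk2 hdisk3     (create the volume group as enhanced concurrent capable)

chvg -C epicprvg                             (convert an existing volume group to enhanced concurrent capable)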

5. Micro Partitioning

Micro Partitioning or SPLPAR is currently not supported within an Epic production

environment. DLPAR, however, is supported.

There are several reasons why Epic does not support the use of SPLPAR.

(a) Epic expects no more than a 15ms response latency from the Caché based DB server.

If both CPU and memory resources were to be shared between Epic and other

applications, there is always a possibility that a non-Epic application could starve Epic of resources during a critical time.

(b) When Epic provides the sizing information, the assumption is that the Epic products

are the only ones actively running on the system. Therefore, at a minimum, the Epic

partition would need to be fully configured with the Epic required resources. Epic

provides discounts to their customers if the customer has followed Epic’s

recommendation regarding configuration. It is assumed that those resources are available

at all times. Thus, in effect, the Epic LPAR would really be regarded as a fixed resource

LPAR, or DLPAR.

Epic sizes the DB server so that the customer is not running above 70% CPU utilization

under normal load. We don’t know how quickly a shared partition can obtain resources

from another shared partition, before those resources can actually begin to provide some

relief during a sudden and unplanned increase in resource demand originating from the

Epic partition. In any case, the Epic partition would require top priority for any "spare capacity" over all other partitions, thereby, once again, making it an effectively independent DLPAR.

(c) Epic provides their customers a “guarantee of performance”. This is available to the

customer on condition that the customer has followed the Epic recommended guidelines.

Should a performance related problem occur, Epic will want to be able to reproduce the

problem. If performance was degraded due to shared resources being unavailable, it

would be more difficult for Epic (or IBM) to identify whether the cause was something that happened within the Epic partition or an external load-driven event.

(d) At this time, we have not adequately tested the interaction between SPLPAR and PowerHA. As an example, what would happen, or what would we expect to have happen, were the system to experience a physical CPU failure? What should PowerHA do if Epic happened to be using one tenth or more of that physical CPU at the time?

Normally, loss of a resource would trigger a fail-over. However, this CPU is now a

“virtual” resource.

Epic, however, has no objection to the use of SPLPAR in a non-production environment,

so long as performance is not being evaluated within that environment.
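As a hedged illustration (field names and values are examples only and vary by AIX level), whether a given LPAR is dedicated or shared, and whether it is capped, can be checked from within the partition:

EXAMPLE:

# lparstat -i | grep -E "^Type|^Mode|Entitled Capacity"
Type                       : Dedicated-SMT
Mode                       : Capped
Entitled Capacity          : 16.00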

6. VIRTUAL I/O

Virtual I/O (VIO) may be used in the Epic environment. Although Virtual I/O may

provide better use of existing hardware resources, the performance impacts must be

considered in the production environment. The adapters included in a Virtual I/O environment must continuously provide the same level of performance as in a non-VIO environment.

NPIV virtualizes a physical fibre-channel adapter, thereby allowing the assignment of

multiple WWNs (World Wide Name IDs). Again, the total load of multiple LPARs being

supported by a physical adapter must be considered.

Epic prefers the use of physical adapters over VIO servers for the production OLTP

system. If VIO servers are desired for enterprise virtualization/consolidation practices,

the following considerations apply when using VIO with the production OLTP LPAR

and its failover LPAR.

a. Please follow IBM’s best practices to set up sufficient redundancy at the

VIO layer to avoid single points of failure.

b. Please follow IBM’s recommendation to properly size the VIO servers for

the overall activities on the server frame.

i. When using Oracle on an IBM Power Systems server as the Clarity

RDBMS: The Clarity RDBMS Oracle server should be on separate VIO servers

from the production OLTP LPAR and its failover LPAR.

ii. You should employ redundant VIO servers. Each VIO server must have

sufficient CPU and memory resources to support the full load expected. If they

are in a shared processor pool, the VIO servers should have the highest weight

within the pool to avoid being starved by activities from other application LPARs.

iii. Each VIO server must have a total of at least 4 ports from at least 2

physical HBAs. The total IO bandwidth provided by the HBAs must

accommodate the total IOPS projection from all LPARs, with sufficient

redundancy. The IOPS projections from the main Epic components can be found

in the previous IO projection and requirements section.

iv. The total network bandwidth provided by the Ethernet adapters must

accommodate the network traffic expected from all LPARs, with sufficient

redundancy. 10 Gbit interfaces are generally more appropriate for large scale

systems. If using 1 Gbit interfaces, multiple interfaces may have to be aggregated

to provide adequate bandwidth and acceptable latency. The Ethernet network

must provide a sufficient amount of bandwidth for all of the Epic functional


requirements (e.g., Shadow, Backup, etc.). You may still find it beneficial to use separate NICs for traffic that may have unbounded bandwidth usage patterns.

a. There are two technologies available to provide IO access via VIO: virtual

SCSI and NPIV. Please discuss which technology best suits your needs with

your IBM support.

i. Be aware that queue_depth needs to be properly tuned at both the VIO server layer and the production LPAR layer when using virtual SCSI (see the sketch after this list).

ii. Epic has conducted performance tests with NPIV and found the results

acceptable.
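A hedged sketch of the virtual SCSI queue_depth tuning mentioned in item (i) above (device names are illustrative assumptions; here hdisk12 on the VIO server is assumed to back hdisk5 on the production LPAR):

EXAMPLE:

On the VIO server (padmin shell): chdev -dev hdisk12 -attr queue_depth=64 -perm

On the production LPAR: chdev -l hdisk5 -P -a queue_depth=64

Both changes are recorded in the device database and take effect when the device is reconfigured or the system is rebooted.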

There could be different VIO considerations for SAN boot. If you desire to use SAN

boot, please follow IBM’s best practices for SAN boot.

7. Live Partition Mobility

Live Partition Mobility (LPM) provides the ability to move a running Epic instance from one Power Systems frame to another. During a migration, an impact on

performance may be observed depending on the size of the Epic environment being

migrated. The database activity may be momentarily suspended. This may result in end

user clients being disconnected temporarily. The alternative for migrating an Epic

production instance from one Power Systems frame to another is to initiate a manual

PowerHA failover. Using PowerHA would result in an outage of at least 5 to 15 minutes, versus a brief end-user client disconnect of less than a minute when using Live Partition Mobility.

Live Partition Mobility requires VIO Servers on both the source and target Power

Systems frames. Use of NPIV is strongly recommended to support Live Partition

Mobility.

An LPM migration must be done only during low-use hours -- whenever there is minimal

use of the Epic production database.

What Information Should Be Collected When A Problem Occurs

The Epic environment is complex given that there are many “moving parts”. A

performance issue can be caused by any part of either the server system or the storage

system. Because each stage of the computational process depends on all the others, it can often be difficult to identify the true culprit causing a problem. For example, although obtaining data from storage may appear slow, it may in fact be the case that the server is running out of I/O buffers or disk queue slots with which to handle the incoming data from the storage system. Therefore each stage of the process must be analyzed and diagnosed. The primary task is to determine whether a stage in the process is waiting for something (starving) or whether the stage is overloaded.

As another example, the disk I/O throughput may seem reasonable for the given configuration, yet users report substandard response times. Upon further investigation, it is determined


that the Logical Volume Manager (LVM) has run out of resources on the server. This

may not be immediately evident since we don’t see large amounts of CPU being

consumed. However, lack of certain JFS buffers could result in a bottleneck.

Following is a partial list of information which should be collected when reporting a

problem, either to IBM support or to anyone involved in technical support of Epic.

(1) Have they filed a PMR with IBM? If so, provide the PMR number.

(2) Has Epic Systems been made aware of the problem? Who is the primary Epic contact

that they are dealing with?

(3) Type of Power Systems server, model, number of CPUs, total memory, DLPARs, SPLPARs, etc.

(4) Type of storage: number of spindles, storage configuration (e.g., RAID 5, RAID 10, stripe size, number of ranks, LUNs, etc.).

(5) Are they using SVC?

(6) Is the storage or SVC being shared with other non-Epic applications?

(7) What has been changed prior to them experiencing the performance problem? For

example, increased users, change in storage config, additional workloads etc.

(8) Did the performance degrade suddenly, or was it a slow degradation over time?

(9) Is there a particular hour of day or night that the performance degrades? Is it constant?

(10) Can the customer provide results from the Epic RanRead facility?

(11) Does the performance degradation occur during a FlashCopy or other back-end copy procedure?

Also, if one is available, provide a topology diagram showing the OLTP, Shadow, and failover servers, the storage switches, and the associated interconnects to each component that supports the entire Epic environment.
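As a hedged starting point for gathering the server-side details above (the commands are standard AIX tools; the output filenames and the 5-second/12-sample intervals are illustrative assumptions), the following can be run on the production LPAR and attached to the PMR:

EXAMPLE:

prtconf > prtconf.out              (server model, CPU count, total memory)

lsdev -Cc disk > disks.out         (disk devices presented to the LPAR)

lsvg -o > active_vgs.out           (active volume groups)

lsattr -El hdisk5 -a queue_depth   (queue depth; repeat for each database hdisk)

errpt -a > errpt.out               (AIX error log)

vmstat 5 12 > vmstat.out           (CPU and memory activity during the problem window)

iostat -D 5 12 > iostat.out        (per-disk service times and queue statistics)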