
Oracle Databases on EMC Symmetrix Storage Systems

Version 1.3

• Generating Restartable Oracle Copies Using Symmetrix Storage

• Oracle Remote Replication and Disaster Restart Using Symmetrix Storage

• Oracle Data Layout and Performance Using Symmetrix Storage

Yaron Dar


Copyright © 2008, 2009, 2010, 2011 EMC Corporation. All rights reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date regulatory document for your product line, go to the Technical Documentation and Advisories section on EMC Powerlink.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All other trademarks used herein are the property of their respective owners.

H2603.3


Contents

Preface

Chapter 1  Oracle on Open Systems
    Introduction .......... 26
    Oracle overview .......... 27
        Oracle system elements .......... 27
        Oracle data elements .......... 29
    Storage management .......... 33
    Cloning Oracle objects or environments .......... 34
    Backup and recovery .......... 35
    Oracle Real Application Clusters .......... 36
    Optimizing Oracle layouts on EMC Symmetrix .......... 38
    EMC and Oracle integration .......... 39
        Install base .......... 39
        Joint engineering .......... 39
        Joint Services Center .......... 40

Chapter 2  EMC Foundation Products
    Introduction .......... 42
    Symmetrix hardware and EMC Enginuity features .......... 45
        Symmetrix VMAX platform .......... 46
        EMC Enginuity operating environment .......... 47
    EMC Solutions Enabler base management .......... 49
    EMC Change Tracker .......... 52
    EMC Symmetrix Remote Data Facility .......... 53
        SRDF benefits .......... 54
        SRDF modes of operation .......... 54
        SRDF device groups and composite groups .......... 55

        SRDF consistency groups .......... 55
        SRDF terminology .......... 59
        SRDF control operations .......... 61
        Failover and failback operations .......... 65
        EMC SRDF/Cluster Enabler solutions .......... 67
    EMC TimeFinder .......... 68
        TimeFinder/Mirror establish operations .......... 69
        TimeFinder split operations .......... 70
        TimeFinder restore operations .......... 71
        TimeFinder consistent split .......... 72
        Enginuity Consistency Assist .......... 72
        TimeFinder/Mirror reverse split .......... 75
        TimeFinder/Clone operations .......... 75
        TimeFinder/Snap operations .......... 78
    EMC Storage Resource Management .......... 81
    EMC Storage Viewer .......... 86
    EMC PowerPath .......... 88
        PowerPath/VE .......... 90
    EMC Replication Manager .......... 97
    EMC Open Replicator .......... 99
    EMC Virtual Provisioning .......... 100
        Thin device .......... 100
        Data device .......... 100
        New Symmetrix VMAX Virtual Provisioning features .......... 101
    EMC Virtual LUN migration .......... 103
    EMC Fully Automated Storage Tiering (FAST) .......... 106

Chapter 3  Creating Oracle Database Clones
    Overview .......... 109
    Comparing recoverable and restartable copies of databases .......... 110
        Recoverable disk copies .......... 110
        Restartable disk copies .......... 110
    Copying the database with Oracle shutdown .......... 111
        Creating Oracle copies using TimeFinder/Mirror .......... 111
        Creating Oracle copies using TimeFinder/Clone .......... 113
        Creating Oracle copies using TimeFinder/Snap .......... 115
    Copying a running database using EMC consistency technology .......... 118
        Creating Oracle copies using TimeFinder/Mirror .......... 118
        Creating Oracle copies using TimeFinder/Clone .......... 120
        Creating Oracle copies using TimeFinder/Snap .......... 122
    Copying the database with Oracle in hot backup mode .......... 125

        Putting the tablespaces or database into hot backup mode .......... 125
        Taking the tablespaces or database out of hot backup mode .......... 126
        Creating Oracle copies using TimeFinder/Mirror .......... 126
        Creating Oracle copies using TimeFinder/Clone .......... 128
        Creating Oracle copies using TimeFinder/Snap .......... 130
    Replicating Oracle using Replication Manager .......... 133
    Transitioning disk copies to Oracle database clones .......... 135
        Host considerations .......... 135
        Enabling a cold database copy .......... 140
        Enabling a restartable database copy .......... 141
        Enabling a hot backup database copy .......... 142
    Oracle transportable tablespaces .......... 143
        Benefits and uses of transportable tablespaces .......... 143
        Implementation of transportable tablespaces with EMC TimeFinder and SRDF .......... 144
        Transportable tablespace example .......... 144
    Cross-platform transportable tablespaces .......... 150
        Overview .......... 150
        Implementing cross-platform transportable tablespaces .......... 151
    Choosing a database cloning methodology .......... 154

Chapter 4  Backing Up Oracle Environments
    Introduction .......... 156
    Comparing recoverable and restartable copies of databases .......... 157
        Recoverable disk copies .......... 157
        Restartable disk copies .......... 157
    Database organization to facilitate recovery .......... 159
    Oracle backup overview .......... 161
        Online (hot) versus offline (cold) backups .......... 163
        Point-in-time and roll-forward recovery backups .......... 164
        Comparing partial and entire database backups .......... 165
        Comparing incremental and full database backups .......... 165
    Using EMC replication in the Oracle backup process .......... 166
    Copying the database with Oracle shutdown .......... 168
        Creating cold Oracle backup copies using TimeFinder/Mirror .......... 168
        Creating cold Oracle backup copies using TimeFinder/Clone .......... 170
        Creating cold Oracle backup copies using TimeFinder/Snap .......... 172

    Copying a running database using EMC consistency technology .......... 175
        Creating restartable Oracle backup copies using TimeFinder/Mirror .......... 176
        Creating restartable Oracle backup copies using TimeFinder/Clone .......... 177
        Creating restartable Oracle backup copies using TimeFinder/Snap .......... 179
    Copying the database with Oracle in hot backup mode .......... 182
        Putting the tablespaces or database into hot backup mode .......... 182
        Taking the tablespaces or database out of hot backup mode .......... 183
        Creating hot Oracle backup copies using TimeFinder/Mirror .......... 183
        Creating hot Oracle backup copies using TimeFinder/Clone .......... 185
        Creating hot Oracle backup copies using TimeFinder/Snap .......... 187
    Backing up the database copy .......... 190
    Backups using EMC Replication Manager for Oracle backups .......... 191
    Backups using Oracle Recovery Manager (RMAN) .......... 193
    Backups using TimeFinder and Oracle RMAN .......... 195

Chapter 5  Restoring and Recovering Oracle Databases
    Introduction .......... 198
    Oracle recovery types .......... 199
        Crash recovery .......... 199
        Media recovery .......... 200
        Complete recovery .......... 201
        Incomplete recovery .......... 201
        Restartable database recovery .......... 202
    Oracle recovery overview .......... 203
    Restoring a backup image using TimeFinder .......... 205
        Restore using TimeFinder/Mirror .......... 205
        Restore using TimeFinder/Clone .......... 208
        Restore using TimeFinder/Snap .......... 211
    Restoring a backup image using Replication Manager .......... 215
    Oracle database recovery procedures .......... 217
        Oracle restartable database recovery procedures .......... 217
        Oracle complete recovery .......... 218

        Oracle incomplete recovery .......... 220
    Database recovery using Oracle RMAN .......... 223
    Oracle Flashback .......... 224
        Flashback configuration .......... 224
        Flashback Query .......... 225
        Flashback Version Query .......... 226
        Flashback Transaction Query .......... 226
        Flashback Table .......... 226
        Flashback Drop .......... 226
        Flashback Database .......... 227

Chapter 6  Understanding Oracle Disaster Restart and Disaster Recovery
    Introduction .......... 230
    Definitions .......... 231
        Dependent-write consistency .......... 231
        Database restart .......... 231
        Database recovery .......... 232
        Roll-forward recovery .......... 232
    Design considerations for disaster restart and disaster recovery .......... 233
        Recovery Point Objective .......... 233
        Recovery Time Objective .......... 234
        Operational complexity .......... 234
        Source server activity .......... 235
        Production impact .......... 235
        Target server activity .......... 235
        Number of copies of data .......... 236
        Distance for solution .......... 236
        Bandwidth requirements .......... 236
        Federated consistency .......... 237
        Testing the solution .......... 237
        Cost .......... 238
    Tape-based solutions .......... 239
        Tape-based disaster recovery .......... 239
        Tape-based disaster restart .......... 239
    Remote replication challenges .......... 241
        Propagation delay .......... 241
        Bandwidth requirements .......... 242
        Network infrastructure .......... 242
        Method of instantiation .......... 243
        Method of reinstantiation .......... 243

        Change rate at the source site .......... 243
        Locality of reference .......... 244
        Expected data loss .......... 244
        Failback operations .......... 245
    Array-based remote replication .......... 246
    Planning for array-based replication .......... 247
    SRDF/S single Symmetrix array to single Symmetrix array .......... 250
        How to restart in the event of a disaster .......... 252
    SRDF/S and consistency groups .......... 253
        Rolling disaster .......... 253
        Protection against a rolling disaster .......... 255
        SRDF/S with multiple source Symmetrix arrays .......... 257
    SRDF/A .......... 260
        SRDF/A using a single source Symmetrix array .......... 261
        SRDF/A multiple source Symmetrix arrays .......... 262
        How to restart in the event of a disaster .......... 264
    SRDF/AR single hop .......... 266
    SRDF/AR multihop .......... 269
        How to restart in the event of a disaster .......... 271
    Database log-shipping solutions .......... 272
        Overview of log shipping .......... 272
        Log-shipping considerations .......... 272
        Log shipping and remote standby database .......... 275
        Log shipping and standby database with SRDF .......... 276
        Oracle Data Guard .......... 277
    Running database solutions .......... 286
        Overview .......... 286
        Advanced Replication .......... 286
        Oracle Streams .......... 287

Chapter 7  Oracle Database Layouts on EMC Symmetrix DMX
    Introduction .......... 290
    The performance stack .......... 291
        Importance of I/O avoidance .......... 292
        Storage-system layer considerations .......... 293
    Traditional Oracle layout recommendations .......... 294
        Oracle's optimal flexible architecture .......... 294
        Oracle layouts and replication considerations .......... 295
        Automated Storage Management .......... 296
    Symmetrix DMX performance guidelines .......... 297
        Front-end connectivity .......... 297
        Symmetrix cache .......... 299

        Back-end considerations .......... 308
        Additional layout considerations .......... 309
        Configuration recommendations .......... 310
    RAID considerations .......... 311
        Types of RAID .......... 311
        RAID recommendations .......... 315
        Symmetrix metavolumes .......... 316
    Host- versus array-based striping .......... 318
        Host-based striping .......... 318
        Symmetrix-based striping (metavolumes) .......... 319
        Striping recommendations .......... 320
    Data placement considerations .......... 322
        Disk performance considerations .......... 322
        Hypervolume contention .......... 324
        Maximizing data spread across the back end .......... 325
        Minimizing disk head movement .......... 327
    Other layout considerations .......... 328
        Database layout considerations with SRDF/S .......... 328
        Database cloning, TimeFinder, and sharing spindles .......... 328
        Database clones using TimeFinder/Snap .......... 329
    Oracle database-specific configuration settings .......... 331
    The database layout process .......... 333
        Database layout process .......... 333

Chapter 8  Data Protection
    EMC Double Checksum overview .......... 340
        Traditional methods of preventing data corruption .......... 340
        Data corruption between host and conventional storage .......... 341
        Benefits of checking within Symmetrix arrays .......... 341
    Implementing EMC Double Checksum for Oracle .......... 342
        Other checksum operations .......... 342
        Enabling checksum options .......... 343
        Verifying checksum is enabled .......... 344
        Validating for checksum operations .......... 344
        Disabling checksum .......... 345
    Implementing Generic SafeWrite for generic applications .......... 346
        Torn pages: Using Generic SafeWrite to protect applications .......... 346
        Why generic? .......... 347
        Where to enable Generic SafeWrite .......... 347
        Configuring Generic SafeWrite .......... 348
        How to disable Generic SafeWrite .......... 350

        Listing Generic SafeWrite devices .......... 351
        Performance considerations .......... 351
    Syntax and examples .......... 353

Chapter 9  Storage Tiering—Virtual LUN and FAST
    Overview .......... 356
    Evolution of storage tiering .......... 359
        Manual storage tiering .......... 359
        Fully Automated Storage Tiering (FAST) .......... 359
        Fully Automated Storage Tiering for Virtual Pools (FAST VP) .......... 359
        Example of storage tiering evolution .......... 359
    Symmetrix Virtual Provisioning .......... 361
        Introduction .......... 361
        Virtual Provisioning and Oracle databases .......... 363
        Planning thin devices for Oracle databases .......... 368
    Enhanced Virtual LUN migrations for Oracle databases .......... 372
        Manual tiering mechanics .......... 372
        Symmetrix Enhanced Virtual LUN technology .......... 372
        LUN-based migrations and ASM .......... 373
        Configuration for Virtual LUN migration .......... 376
        Symmetrix Virtual LUN VP mobility technology .......... 380
    Fully Automated Storage Tiering for Virtual Pools .......... 381
        FAST VP and Virtual Provisioning .......... 381
        FAST VP elements .......... 382
        FAST VP time window considerations .......... 383
        FAST VP move time window considerations .......... 384
        FAST VP architecture .......... 384
        FAST VP and Oracle databases .......... 386
        Examples of FAST VP for Oracle databases .......... 390
        Test Case 1: FAST VP optimization of a single Oracle database OLTP workload .......... 391
        Test Case 2: Oracle databases sharing the ASM disk group and FAST policy .......... 396
        Test Case 3: Oracle databases on separate ASM disk groups and FAST policies .......... 399
    Fully Automated Storage Tiering .......... 404
        Introduction .......... 404
        FAST configuration .......... 405
        FAST device movement .......... 406
        FAST and ASM .......... 407

        Example of FAST for Oracle databases .......... 407
    Conclusion .......... 419

Appendix A  Symmetrix VMAX with Enginuity
    Introduction to Symmetrix VMAX series with Enginuity .......... 422
        New Symmetrix VMAX ease of use, scalability and virtualization features .......... 422
        Oracle mission-critical applications require protection strategy .......... 423
        Enterprise protection and compliance using SRDF .......... 423
        Oracle database clones and snapshots with TimeFinder .......... 424
        Oracle database recovery using storage consistent replications .......... 424
        Best practices for local and remote Oracle database replications .......... 424
        Symmetrix VMAX Auto-provisioning Groups .......... 425
        Symmetrix VMAX Enhanced Virtual LUN migration technology .......... 427
        Symmetrix VMAX TimeFinder product family .......... 431
        Symmetrix VMAX SRDF product family .......... 434
        ASM rebalancing and consistency technology .......... 442
    Leveraging TimeFinder and SRDF for business continuity solutions .......... 444
        Use Case 1: Offloading database backups from production .......... 447
        Use Case 2: Parallel database recovery .......... 450
        Use Case 3: Local restartable replicas of production .......... 452
        Use Case 4: Remote mirroring for disaster protection (synchronous and asynchronous) .......... 453
        Use Case 5: Remote restartable database replicas for repurposing .......... 454
        Use Case 6: Remote database valid backup replicas .......... 456
        Use Case 7: Parallel database recovery from remote backup replicas .......... 457
        Use Case 8: Fast database recovery from a restartable replica .......... 459
    Conclusion .......... 462
    Test storage and database configuration .......... 463
        General test environment .......... 463

Appendix B  Sample SYMCLI Group Creation Commands
    Sample SYMCLI group creation commands .......... 468

Appendix C  Related Host Operation
    Overview .......... 474
        BIN file configuration .......... 474
        SAN considerations .......... 475
        Final configuration considerations for enabling LUN presentation to hosts .......... 476
    Presenting database copies to a different host .......... 477
        AIX considerations .......... 477
        HP-UX considerations .......... 480
        Linux considerations .......... 483
        Solaris considerations .......... 485
        Windows considerations .......... 487
        Windows Dynamic Disks .......... 490
    Presenting database copies to the same host .......... 491
        AIX considerations .......... 491
        HP-UX considerations .......... 492
        Linux considerations .......... 495
        Solaris considerations .......... 496
        Windows considerations .......... 497

Appendix D Sample Database Cloning Scripts
Sample script to replicate a database ........................................... 500

Appendix E Solutions Enabler Command Line Interface (CLI) for FAST VP Operations and Monitoring
Overview .......................................................................................... 510

Enabling FAST .......................................................................... 510
Gathering detailed information about a Symmetrix thin pool ......... 510
Checking distribution of thin device tracks across FAST VP tiers ......... 511
Checking the storage tiers allocation .................................... 512


Figures

1 Oracle Systems Architecture ......................................................................... 27
2 Physical data elements in an Oracle configuration .................................... 30
3 Relationship between data blocks, extents, and segments ....................... 32
4 Oracle two-node RAC configuration ............................................................ 37
5 Symmetrix VMAX logical diagram ............................................................... 47
6 Basic synchronous SRDF configuration ....................................................... 54
7 SRDF consistency group ................................................................................. 57
8 SRDF establish and restore control operations ........................................... 63
9 SRDF failover and failback control operations ........................................... 65
10 Geographically distributed four-node EMC SRDF/CE clusters ............. 67
11 EMC Symmetrix configured with standard volumes and BCVs ............ 69
12 ECA consistent split across multiple database-associated hosts ............ 73
13 ECA consistent split on a local Symmetrix system ................................... 74
14 Creating a copy session using the symclone command ........................... 77
15 TimeFinder/Snap copy of a standard device to a VDEV ......................... 80
16 SRM commands ............................................................................................. 82
17 EMC Storage Viewer ..................................................................................... 87
18 PowerPath/VE vStorage API for multipathing plug-in ........................... 91
19 Output of rpowermt display command on a Symmetrix VMAX device ......... 94
20 Device ownership in vCenter Server ........................................................... 95
21 Virtual Provisioning components .............................................................. 101
22 Virtual LUN eligibility tables ..................................................................... 103
23 Copying a cold (shutdown) Oracle database with TimeFinder/Mirror ......... 112
24 Copying a cold Oracle database with TimeFinder/Clone ...................... 114
25 Copying a cold Oracle database with TimeFinder/Snap ....................... 116
26 Copying a running Oracle database with TimeFinder/Mirror ............. 119
27 Copying a running Oracle database with TimeFinder/Clone .............. 121
28 Copying a running Oracle database with TimeFinder/Snap ................ 123


29 Copying an Oracle database in hot backup mode with TimeFinder/Mirror ......... 127
30 Copying an Oracle database in hot backup mode with TimeFinder/Clone ......... 129
31 Copying an Oracle database in hot backup mode with TimeFinder/Snap ......... 131
32 Using Replication Manager to make a TimeFinder copy of Oracle ...... 133
33 Database organization to facilitate recovery ............................................ 159
34 Copying a cold Oracle database with TimeFinder/Mirror .................... 169
35 Copying a cold Oracle database with TimeFinder/Clone ..................... 171
36 Copying a cold Oracle database with TimeFinder/Snap ....................... 173
37 Copying a running Oracle database with TimeFinder/Mirror ............. 176
38 Copying a running Oracle database using TimeFinder/Clone ............. 178
39 Copying a running Oracle database with TimeFinder/Snap ................ 180
40 Copying an Oracle database in hot backup mode with TimeFinder/Mirror ......... 184
41 Copying an Oracle database in hot backup mode with TimeFinder/Clone ......... 186
42 Copying an Oracle database in hot backup mode with TimeFinder/Snap ......... 188
43 Using RM to make a TimeFinder copy of Oracle .................................... 191
44 Restoring a TimeFinder copy, all components ........................................ 206
45 Restoring a TimeFinder copy, data components only ............................ 206
46 Restoring a TimeFinder/Clone copy, all components ........................... 209
47 Restoring a TimeFinder/Clone copy, data components only ............... 209
48 Restoring a TimeFinder/Snap copy, all components ............................. 212
49 Restoring a TimeFinder/Snap copy, data components only ................. 212
50 Restoring Oracle using EMC Replication Manager ................................ 215
51 Database components for Oracle ............................................................... 248
52 Synchronous replication internals ............................................................. 250
53 Rolling disaster with multiple production Symmetrix arrays .............. 254
54 Rolling disaster with SRDF consistency group protection .................... 256
55 SRDF/S with multiple source Symmetrix arrays and ConGroup protection ......... 258
56 SRDF/A replication internals .................................................................... 260
57 SRDF/AR single-hop replication internals .............................................. 266
58 SRDF/AR multihop replication internals ................................................ 270
59 Log shipping and remote standby database ............................................ 275
60 Sample Oracle10g Data Guard configuration .......................................... 280
61 "No data loss" standby database ............................................................... 284
62 The performance stack ................................................................................ 292
63 Relationship between host block size and IOPS/throughput ............... 298


64 Performance Manager graph of write-pending limit for a single hypervolume ......... 305
65 Performance Manager graph of write-pending limit for a four-member metavolume ......... 306
66 Write workload for a single hyper and a striped metavolume ............. 307
67 3+1 RAID 5 layout detail ............................................................................. 312
68 Anatomy of a RAID 5 random write ......................................................... 313
69 Optimizing performance with RAID 5 sequential writes ...................... 314
70 Disk performance factors ............................................................................ 324
71 Synchronous replication internals ............................................................. 351
72 Storage tiering evolution ............................................................................. 360
73 Thin devices and thin pools containing data devices ............................. 363
74 Thin device configuration ........................................................................... 365
75 Migration of ASM members from FC to EFDs using Enhanced Virtual LUN technology ......... 376
76 Virtual LUN migration to configured space ............................................ 377
77 Virtual LUN migration to unconfigured space ....................................... 379
78 FAST managed objects ................................................................................ 382
79 FAST policy association .............................................................................. 383
80 FAST VP components .................................................................................. 385
81 “Heat” map of ASM member devices showing sub-LUN skewing ..... 387
82 Gold FAST VP policy storage group association .................................... 393
83 Storage tier allocation changes during the FAST VP test for FINDB ... 394
84 Database transaction changes with FAST VP .......................................... 396
85 Storage tier changes during FAST VP enabled run on two databases ......... 398
86 FAST VP enabled test with different FAST policies ............................... 403
87 Initial FAST policies for DB3 ...................................................................... 410
88 Initial FAST policy for DB3 ........................................................................ 412
89 Initial performance analysis on FAST ....................................................... 412
90 FAST configuration wizard: Setting FAST parameters .......................... 413
91 FAST configuration wizard: Creating performance and move time window ......... 413
92 FAST configuration wizard: Creating FAST policy ................................ 414
93 FAST configuration wizard: Creating a FAST storage group ............... 415
94 DB3 FAST policy .......................................................................................... 416
95 FAST swap/move detail ............................................................................. 417
96 Disk utilization map after migration ........................................................ 418
97 Oracle RAC and Auto-provisioning Groups ........................................... 426
98 Migration example using Virtual LUN technology ................................ 429
99 SRDF/Synchronous replication ................................................................. 435
100 SRDF/Asynchronous replication ............................................................ 437
101 SRDF Adaptive Copy mode ..................................................................... 438


102 Concurrent SRDF ....................................................................................... 439
103 Cascaded SRDF .......................................................................................... 440
104 SRDF/Extended Distance Protection ..................................................... 440
105 SRDF/Star ................................................................................................... 441
106 Test configuration ...................................................................................... 464
107 Windows Disk Management console ..................................................... 488


Tables

1 Oracle background processes ........................................................................ 28
2 SYMCLI base commands ............................................................................... 49
3 TimeFinder device type summary ................................................................ 79
4 Data object SRM commands .......................................................................... 83
5 Data object mapping commands ................................................................... 83
6 File system SRM commands to examine file system mapping ................. 84
7 File system SRM command to examine logical volume mapping ........... 85
8 SRM statistics command ................................................................................. 85
9 Comparison of database cloning technologies .......................................... 154
10 Database cloning requirements and solutions ......................................... 154
11 Background processes for managing a Data Guard environment ........ 280
12 Initialization parameters ............................................................................. 331
13 Background processes for managing a Data Guard environment ........ 353
14 FAST VP Oracle test environment ............................................................. 390
15 Initial tier allocation for test cases with shared ASM disk group ......... 391
16 FINDB initial tier allocation ....................................................................... 393
17 Initial AWR report for FINDB .................................................................... 393
18 Oracle database tier allocations-initial and FAST VP enabled .............. 395
19 FAST VP enabled database response time from the AWR report ......... 395
20 FINDB and HRDB initial storage tier allocation ..................................... 397
21 Initial AWR report for FINDB .................................................................... 397
22 FAST VP enabled database transaction rate changes ............................. 399
23 Initial tier allocation for a test case with independent ASM disk groups ......... 399
24 Initial AWR report for CRMDB and SUPCHDB ..................................... 401
25 FAST VP enabled AWR report for CRMDB and SUPCHDB ................. 402
26 Storage tier allocation changes during the FAST VP-enabled run ....... 403
27 Test configuration ........................................................................................ 408
28 Storage and ASM configuration for each test database .......................... 409
29 Database storage placement (initial) and workload profile .................. 409


30 Initial Oracle AWR report inspection (db file sequential read) ............. 410
31 Initial FAST performance analysis results ................................................ 416
32 Results after FAST migration of DB3 to Flash ......................................... 417
33 ASM diskgroups, and Symmetrix device and composite groups ......... 444
34 Test hardware ............................................................................................... 464


Preface

As part of an effort to improve and enhance the performance and capabilities of its product lines, EMC periodically releases revisions of its hardware and software. Therefore, some functions described in this document may not be supported by all versions of the software or hardware currently in use. For the most up-to-date information on product features, refer to your product release notes.

This document describes how the EMC Symmetrix array manages Oracle databases on UNIX and Windows. Additionally, this document provides a general description of the Oracle RDBMS and EMC products and utilities that can be used for Oracle administration. EMC Symmetrix storage arrays and EMC software products and utilities are used to clone Oracle environments and to enhance database and storage management backup and recovery procedures.

Other topics include:

◆ Database and storage management administration

◆ CPU resource consumption

◆ The time required to clone or recover Oracle systems

Audience

This TechBook is intended for systems administrators, Oracle database administrators, and storage management personnel responsible for managing Oracle databases on open-systems platforms. The information in this document is based on Oracle10g. In this document, open-systems platforms are UNIX operating systems (including AIX, HP-UX, Linux, and Solaris), as well as Microsoft Windows platforms.


Readers of this document are expected to be familiar with the following topics:

◆ Symmetrix operation
◆ Oracle concepts and operation

Related documentation

The following is a list of related documents that provide more detailed information on topics described in this TechBook. Many of these documents are on the EMC Powerlink site (http://powerlink.EMC.com). For Oracle information, consult the Oracle websites including the main site (http://www.oracle.com), the Oracle Technology Network (OTN), and Oracle Metalink.

EMC-related documents include:

◆ Solutions Enabler Release Notes (by release)

◆ Solutions Enabler Support Matrix (by release)

◆ Solutions Enabler Symmetrix Device Masking CLI Product Guide (by release)

◆ Solutions Enabler Symmetrix Base Management CLI Product Guide (by release)

◆ Solutions Enabler Symmetrix CLI Command Reference (by release)

◆ Solutions Enabler Symmetrix Configuration Change CLI Product Guide (by release)

◆ Solutions Enabler Symmetrix SRM CLI Product Guide (by release)

◆ Solutions Enabler Installation Guide (by release)

◆ Solutions Enabler Symmetrix TimeFinder Family CLI Product Guide (by release)

◆ Solutions Enabler Symmetrix SRDF Family CLI Product Guide (by release)

◆ Symmetrix Remote Data Facility (SRDF) Product Guide

◆ Enginuity—The EMC Symmetrix Storage Operating Environment - A Detailed Review (white paper)

◆ Replication Manager Product Guide

◆ Replication Manager Support Matrix

Oracle-related documents include:

◆ Oracle Data Guard Concepts and Administration

◆ Oracle Database Administrator's Guide

◆ Oracle Database Backup and Recovery Basics


◆ Oracle Database Backup and Recovery Advanced Users Guide

◆ Oracle Database Performance Tuning Guide

◆ Oracle Database Reference

Organization

This TechBook contains the following chapters and several appendices:

Chapter 1, “Oracle on Open Systems,” provides a high-level overview of Oracle.

Chapter 2, “EMC Foundation Products,” describes EMC products used to support the management of Oracle environments.

Chapter 3, “Creating Oracle Database Clones,” describes procedures to clone Oracle instances. It also discusses procedures to clone Oracle objects within and across Oracle instances using Oracle Transportable Tablespaces and EMC TimeFinder.

Chapter 4, “Backing Up Oracle Environments,” describes how to back up Oracle environments and objects with Oracle Recovery Manager and EMC products including TimeFinder and SRDF.

Chapter 5, “Restoring and Recovering Oracle Databases,” describes how to recover Oracle environments and objects, based upon the type of backups that were previously performed.

Chapter 6, “Understanding Oracle Disaster Restart & Disaster Recovery,” describes the difference between using traditional recovery techniques versus EMC restart solutions.

Chapter 7, “Oracle Database Layouts on EMC Symmetrix DMX,” describes data layout recommendations and best practices for the Oracle RDBMS on EMC Symmetrix DMX.

Chapter 8, “Data Protection,” describes data protection methods using EMC Double Checksum to minimize the impact of I/O errors on database consistency during I/O transfers between hosts and Symmetrix storage devices.

Chapter 9, “Storage Tiering—Virtual LUN and FAST,” describes storage tiers available on Symmetrix and methodologies for nondisruptive migration of Oracle data using Symmetrix technologies across available storage tiers.

The appendixes provide sample code, which supplement procedures described in the document, and additional detail on the Symmetrix VMAX Series with Enginuity with Oracle.


The references section lists documents that contain more information on these topics. Examples provided in this document cover methods for performing various Oracle functions using Symmetrix arrays with EMC software. These examples were developed for laboratory testing and may need tailoring to suit other operational environments. Any procedures outlined in this document should be thoroughly tested prior to production implementation.

Conventions used in this document

EMC uses the following conventions for special notices.

Note: A note presents information that is important, but not hazard-related.

IMPORTANT

An important notice contains information essential to operation of the software or hardware.

Typographical conventions

EMC uses the following type style conventions in this document:

Normal
Used in running (nonprocedural) text for:
• Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
• Names of resources, attributes, pools, Boolean expressions, buttons, DQL statements, keywords, clauses, environment variables, functions, utilities
• URLs, pathnames, filenames, directory names, computer names, links, groups, service keys, file systems, notifications

Bold
Used in running (nonprocedural) text for:
• Names of commands, daemons, options, programs, processes, services, applications, utilities, kernels, notifications, system calls, man pages
Used in procedures for:
• Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
• What the user specifically selects, clicks, presses, or types

Italic
Used in all text (including procedures) for:
• Full titles of publications referenced in text
• Emphasis (for example, a new term)
• Variables


The authors of this TechBook

This TechBook was written by Yaron Dar, an employee of EMC based in Hopkinton, Massachusetts. Yaron has over ten years of service with EMC and more than thirteen years of experience with Oracle databases.

Other primary contributors to this TechBook are David Waddill, Udgith Mankad, and the EMC Database and Application Team, also based in Hopkinton.

We'd like to hear from you!

Your feedback on our TechBooks is important to us! We want our books to be as helpful and relevant as possible, so please feel free to send us your comments, opinions and thoughts on this or any other TechBook:

[email protected]

Courier
Used for:
• System output, such as an error message or script
• URLs, complete paths, filenames, prompts, and syntax when shown outside of running text

Courier bold
Used for:
• Specific user input (such as commands)

Courier italic
Used in procedures for:
• Variables on command line
• User input variables

< >  Angle brackets enclose parameter or variable values supplied by the user
[ ]  Square brackets enclose optional values
|    Vertical bar indicates alternate selections; the bar means "or"
{ }  Braces indicate content that you must specify (that is, x or y or z)
...  Ellipses indicate nonessential information omitted from the example


1  Oracle on Open Systems

This chapter presents these topics:

◆ Introduction ........................................................................................ 26
◆ Oracle overview ................................................................................. 27
◆ Storage management ......................................................................... 33
◆ Cloning Oracle objects or environments ........................................ 34
◆ Backup and recovery ......................................................................... 35
◆ Oracle Real Application Clusters..................................................... 36
◆ Optimizing Oracle layouts on EMC Symmetrix............................ 38
◆ EMC and Oracle integration............................................................. 39


Introduction

The Oracle RDBMS on open systems first became available in 1979 and has steadily grown to become the market-share leader in enterprise database solutions. With a wide variety of features and functionality, Oracle provides a stable platform for handling concurrent, read-consistent access to a customer's application data.

Oracle Database 10g and 11g, the latest releases of the Oracle RDBMS, have introduced a variety of new and enhanced features over previous versions of the database. Among these are:

◆ Increased self-management through features such as Automatic Undo Management, Oracle managed files, and mean time to recovery enhancements.

◆ Improved toolsets and utilities such as Recovery Manager (RMAN), Oracle Data Guard, and Oracle Enterprise Manager (OEM).

◆ Introduction of Automatic Storage Management (ASM).

◆ Enhancements to Oracle Real Application Clusters.

◆ Introduction of Database Resource Manager.

◆ Enhancements to Oracle Flashback capabilities.

◆ Introduction of Oracle VM server virtualization.

Oracle's architectural robustness, scalability, and availability functions have positioned it as a cornerstone in many customers' enterprise system infrastructures. A large number of EMC® customers use Oracle in open-systems environments to support large, mission-critical business applications.


Oracle overview

The Oracle RDBMS can be configured in multiple ways. The requirement for 24x7 operations, replication and disaster recovery, and the capacity of the host(s) that will contain the Oracle instance(s) will, in part, determine how the Oracle environment must be architected.

Oracle system elements

An Oracle database consists of three basic components: memory structures, processes, and files. An Oracle instance is defined as the System Global Area and the associated background processes. Figure 1 shows a simplified example of the Oracle components.

Figure 1 Oracle Systems Architecture

The System Global Area (SGA) contains the basic memory structures that an Oracle database instance requires to function. The SGA contains memory structures such as the Buffer Cache (shared area for users to read or write Oracle data blocks), Redo Log Buffer (circular buffer for the Oracle logs), Shared Pool (including user SQL and PL/SQL code, data dictionary, and more), Large Pool, and others.
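The SGA components described above can be inspected from a SQL*Plus session; the following illustrative query (assuming a running instance and SELECT privilege on the dynamic performance views) lists the major SGA areas and their current sizes:

```sql
-- Illustrative only: list the major SGA components and their sizes in bytes
SELECT name, bytes
FROM   v$sgainfo
ORDER  BY bytes DESC;
```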

[Figure 1 depicts the SGA (Shared Pool, Redo Log Buffers, DB Block Buffers, and Data Dictionary), the PGA, the background processes (DBWn, LGWR, CKPT, ARCn, SMON, PMON, and the Snnn server processes), and the on-disk structures: data files, active and inactive redo logs, and archive logs.]


In addition to the SGA, the Oracle instance has another memory structure called the Program Global Area, or PGA. A PGA is allocated for each server process accessing the database, as well as for background processes. The PGA contains the session information, cursors, bind variable values, and an area for memory-intensive operations such as sorts and joins. This is particularly important for data warehouses, where parallel query execution commonly requires substantial PGA rather than SGA memory.
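In Oracle 10g and later, the sizes of the SGA and PGA are typically managed through a small number of initialization parameters; a minimal, illustrative init.ora fragment follows (the values are examples only, not sizing recommendations):

```
# Illustrative init.ora fragment (example values only)
sga_target           = 2G   # total SGA size, automatically distributed among
                            # the buffer cache, shared pool, and other areas
pga_aggregate_target = 1G   # target aggregate PGA across all server processes
                            # (sorts, joins, hash areas)
```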

The background processes are started when the instance is initiated; they enable Oracle to perform tasks such as reading and writing between the data files and the SGA, managing I/O to the redo logs, performing archiving between the redo and archive logs, and connecting users to the database. Table 1 describes some of the Oracle background processes shown in Figure 1 on page 27.

Table 1 Oracle background processes

DBWn (Database Writer)
Writes data from the buffer cache to the data files on disk. Up to 20 database writer processes can be started per Oracle instance. The number of writers can be set manually with the DB_WRITER_PROCESSES init.ora parameter; if it is not specified, Oracle determines the number of writers automatically.

LGWR (Log Writer)
Manages the redo log buffer and writes data from the buffer to the redo logs on disk. Log writer writes to the logs whenever one of these four events occurs:
• A user commits a transaction
• Every three seconds
• When the redo buffer is one-third full
• When DB writer needs to write dirty blocks whose redo records are still in the redo buffer

ARCn (Database Archiver)
Copies the redo logs to one or more archive log directories when a log switch occurs. The ARCn process runs only if the database is in ARCHIVELOG mode and automatic archiving is enabled. Up to 10 archive processes can be started per Oracle instance, controlled by the init.ora parameter LOG_ARCHIVE_MAX_PROCESSES.

CKPT (Checkpoint)
When the Oracle system performs a checkpoint, DBWn destages data to disk and the CKPT process updates the data file headers accordingly.

SMON (System Monitor)
Performs recovery at instance startup. It coalesces free extents in the data files and cleans up temporary segments of failed user processes.

PMON (Process Monitor)
Cleans up after a user process fails. The process frees up resources, including database locks and the blocks in the buffer cache of the failed process.

Snnn (Server processes)
Connect user processes to the database instance. Server processes can be either dedicated or shared, depending on user requirements and the amount of host memory available.

Additional database processes may be started depending on the system configuration. Some of these processes include RECO, QMNn, Jnnn, and MMON.

Finally, the database files are the physical structures that store data on disk. These files can be created within a file system, as raw partitions, or in Oracle Automatic Storage Management (ASM). Oracle uses database files to maintain the logical structures within the database and to store data. These logical structures include tablespaces, segments, extents, and data blocks. Database files commonly include data, control, temp, redo log, and archive log files.

Oracle data elements

Oracle maintains a set of database elements critical to the operation of the Oracle subsystem. These database elements consist of both physical and logical data elements.

Physical data elements

The required physical data elements include datafile(s) for the Oracle SYSTEM tablespace, control files, redo logs, and other miscellaneous database files (the parameter file, alert and trace logs, backup files, and so on). Other physical elements such as the archive logs and

additional tablespaces for data are also typically configured. A minimal configuration is shown in Figure 2, followed by a description of each data structure.

Figure 2 Physical data elements in an Oracle configuration

The Oracle SYSTEM and SYSAUX tablespaces consist of the data dictionary, PL/SQL program units, and other database objects such as users, tablespaces, tables, indexes, performance information, and so on. These tablespaces are the only ones required, although in practice, other tablespaces containing user data are typically created.

Every database has one or more physical data files. A data file is associated with exactly one tablespace. During normal database operation, data is read from the data files and stored in the database buffer cache. Modified or new data is not written to the data files immediately; instead, the DB Writer background process periodically writes it from the buffer cache to the data files.

The Oracle control files consist of one or more configuration files (the control file is typically multiplexed onto separate physical spindles) that contain the name of the database, the name and location of all database datafiles and redo logs, redo and archive log history information, checkpoint information, and other information needed at system startup and while the database is running.



Oracle redo logs record data and undo changes. All changes to the database are written to the redo logs, unless logging is explicitly disabled for eligible database objects, such as user tables. Two or more redo logs are configured, and the logs are normally multiplexed to prevent data loss in the event that database recovery is required.

Archive logs are offloaded copies of the redo logs and are normally required for recovering an Oracle database. Archive logs can be multiplexed, both locally and remotely.
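Both forms of protection are configured with ordinary SQL; a sketch follows, with all file paths, group numbers, and sizes hypothetical.

```shell
# Sketch: multiplex a redo log group across two locations and
# define two local archive destinations. Paths are example values.
sqlplus / as sysdba <<'EOF'
ALTER DATABASE ADD LOGFILE GROUP 3
  ('/u01/oradata/PROD/redo03a.log',
   '/u02/oradata/PROD/redo03b.log') SIZE 100M;
ALTER SYSTEM SET LOG_ARCHIVE_DEST_1='LOCATION=/u03/arch' SCOPE=BOTH;
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='LOCATION=/u04/arch' SCOPE=BOTH;
EOF
```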

Oracle binaries are the executables and libraries used to initiate the Oracle instance. Along with the binaries, Oracle uses many other files to manage and monitor the database. These files include the initialization parameter file (init<sid>.ora), server parameter file (SPFILE), alert log, and trace files.

Logical data elements

Datafiles are the primary physical data element. Oracle tablespaces are the logical element configured on top of the datafiles and serve as containers for the customer's information. Each tablespace is built on one or more of the datafiles.

Tablespaces are the containers for the underlying Oracle logical data elements. These logical elements include data blocks, extents, and segments. Data blocks are the smallest logical elements configurable at the database level. Data blocks are grouped into extents that are then allocated to segments. Types of segments include data, index, temporary and undo.
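This hierarchy can be examined in the data dictionary; for instance, the extents allocated to a single segment can be listed as follows (the owner and segment names are placeholders):

```shell
# Sketch: show the extents, block counts, and sizes of one segment.
# SCOTT.EMP is an example; substitute a real owner and segment name.
sqlplus / as sysdba <<'EOF'
SELECT segment_name, extent_id, blocks, bytes
FROM   dba_extents
WHERE  owner = 'SCOTT' AND segment_name = 'EMP'
ORDER  BY extent_id;
EOF
```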


Figure 3 shows the relationship between the data blocks, extents, and segments.

Figure 3 Relationship between data blocks, extents, and segments

(The figure depicts 16 KB data blocks grouped into two 960 KB extents, which together form a 1920 KB segment.)


Storage management

Standard Oracle backup/restore, disaster recovery, and cloning methods can be difficult to manage and time-consuming. EMC Symmetrix® provides many alternatives and solutions that make these operations easy to manage, fast, and highly scalable. In addition, EMC has developed many best practices that improve Oracle performance and high availability when using Symmetrix storage arrays.


Cloning Oracle objects or environments

EMC technology enables creation of an instant point-in-time copy of an Oracle database system. The cloned copy is an identical environment to its source, and can be used for other processing purposes such as backup, recovery, offline reporting, and testing.

Transportable tablespaces are an alternative to cloning an entire Oracle database. Through Oracle's transportable tablespaces, it is possible to clone an individual tablespace or all user tablespaces and present them to a different Oracle database environment. Clone creation is facilitated through the use of EMC products such as TimeFinder®, SRDF®, Open Replicator, and others.
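The transportable tablespace flow can be sketched as follows. The tablespace name (users_ts), directory object, and file names are hypothetical, and the datafile copy itself could be performed with a product such as TimeFinder.

```shell
# Sketch: prepare and export a transportable tablespace set.
sqlplus / as sysdba <<'EOF'
-- Verify the tablespace set is self-contained, then make it read-only
EXEC DBMS_TTS.TRANSPORT_SET_CHECK('USERS_TS', TRUE);
ALTER TABLESPACE users_ts READ ONLY;
EOF

# Export only the tablespace metadata with Data Pump; the datafiles
# themselves are copied separately to the target environment.
expdp system DIRECTORY=dp_dir DUMPFILE=users_ts.dmp \
      TRANSPORT_TABLESPACES=users_ts
```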

In addition, Oracle may also clone or replicate individual database objects, such as tables, in a variety of ways. Methods include trigger-based mechanisms such as snapshots, Oracle's Advanced Replication, Streams, message queues, and Oracle Data Guard.


Backup and recovery

Backup and recovery operations using Oracle utilities typically require intervention by experienced personnel and can be both labor- and resource-intensive in large Oracle environments. Recovery of large Oracle instances, such as in SAP or PeopleSoft environments, is especially complex because the entire system is essentially one large referential set. All data in the set, including the database and associated application files, must be recovered together and to the same recovery point.

Dynamic manipulation of objects and application-maintained referential integrity further complicates recovery efforts. Traditional Oracle recovery techniques require multiple passes of the data, which can greatly impact recovery times. Such techniques are generally unworkable in large Oracle environments due to the time required to recover all objects. EMC hardware and software are used to make the process faster and more effective.

In addition to traditional backup and recovery operations, Oracle provides the Recovery Manager (RMAN) utility. RMAN provides a wide range of backup and recovery procedures through either a command line interface on a client host or a GUI in Enterprise Manager. RMAN performs backup or recovery operations by integrating with sessions running on the target database host. It then issues remote procedure calls (RPCs) to specialized packages stored in the target database, which carry out the backup or recovery of the database. RMAN also may be configured with a repository for historical backup information that supplements the records the utility writes into the database control file. EMC has worked with Oracle engineering to closely integrate RMAN with products such as TimeFinder to offload backup operations from production and reduce recovery time.
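For example, a minimal RMAN session from the command line might look like the following sketch; it assumes a locally configured target database in ARCHIVELOG mode with a default backup destination.

```shell
# Sketch: back up the full database plus archive logs with RMAN,
# then list what was written. Requires a live target database.
rman target / <<'EOF'
BACKUP DATABASE PLUS ARCHIVELOG;
LIST BACKUP SUMMARY;
EOF
```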


Oracle Real Application Clusters

Typically, Oracle is configured with a single instance that attaches to a single database. However, Oracle can be configured with multiple host instances connecting to a single database. This configuration, originally called Oracle Parallel Server (OPS) in versions prior to Oracle9i, is now known as Oracle Real Application Clusters (RAC). Oracle RAC implementations are configured to enhance performance, scalability, and availability over a stand-alone Oracle database.

An Oracle RAC environment consists of multiple Oracle instances that share access to a single Oracle database. Each instance contains its own memory structures and processes, as well as its own set of redo logs and undo segments. Each instance shares access to the datafiles making up the database. Since all hosts must have access to all database files, concurrent access to the data files through a cluster manager is required. This also permits one host to assume control of all datafiles in the event of an instance failure requiring recovery.

Performance and scalability are enhanced in an Oracle environment because host-based resource limitations such as CPU and memory constraints are overcome by permitting two or more host instances to attach to the same database. For example, in a homogeneous host environment, near-linear scaling of host resources is achieved by employing Oracle RAC. Additionally, because multiple hosts are configured with access to the database, availability is increased. In the event of a failure to one host or database instance, user connections are failed over to the surviving cluster members ensuring continuous operations.

Figure 4 on page 37 shows a typical Oracle RAC configuration with two member nodes. Each member of the group has its own SGA, redo logs, and undo space. Though not shown here, each member also has its own set of initialization and parameter files. Concurrent access to each data file is managed through the cluster management software. Locking and inter-instance management are communicated through a network interconnect between the RAC nodes.


Figure 4 Oracle two-node RAC configuration

(The figure shows two RAC nodes, each with its own SGA and Oracle binaries, joined by a high-bandwidth, low-latency interconnect. Both nodes attach to shared storage holding the SYSTEM, data, and index tablespaces, with separate redo logs (REDO1, REDO2) and undo spaces (UNDO1, UNDO2) for each node.)


Optimizing Oracle layouts on EMC Symmetrix

A primary concern for DBAs and system administrators when configuring an Oracle database on an EMC Symmetrix VMAX™ or DMX™ is the appropriate data layout on the storage. Maximizing performance, availability, and recoverability of the database requires a thorough understanding of the I/O characteristics, uptime requirements, and backup and cloning needs. Careful consideration and planning of the back-end configuration, including RAID, physical spindles, the number of front-end directors and HBAs, and the layout of the database on the back end of the Symmetrix array, is necessary. These considerations ensure the database implementation successfully meets all business requirements.


EMC and Oracle integration

The EMC/Oracle partnership was established in 1995 and continues to the present. Through joint engineering efforts, certification testing, collaborative solution offerings, and the Joint Services Center, EMC and Oracle maintain strong ties to ensure successful product integration for customers' mission-critical database systems.

Install base

With more than 55,000 mutual customers, EMC and Oracle are recognized as the leaders in automated networked storage and enterprise software, respectively. The EMC Symmetrix VMAX and DMX offer the highest levels of performance, scalability, and availability, along with industry-leading software for successfully managing and maintaining complex Oracle database environments. In addition, EMC IT has one of the largest deployments of Oracle Applications in the world, with over 35,000 named users and over 3,500 concurrent users at peak periods. Oracle IT also uses both CLARiiON® and Symmetrix extensively.

Joint engineering

Engineers from EMC and Oracle continue to work together to develop integrated solutions, document best practices, and ensure interoperability for customers deploying Oracle databases in EMC Symmetrix VMAX and DMX storage environments. Key EMC technologies such as TimeFinder and SRDF have been certified through Oracle's Storage Certification Program (OSCP). Although Oracle has phased out the OSCP as the technology matured, engineering efforts continue between the two companies to ensure successful integration between each company's products. With each major technology change or new product line, EMC briefs Oracle engineering on the changes, and together they review best practices. EMC publishes many of these technology and deployment best practices as joint papers bearing the Oracle logo, reflecting the strong communication and relationship between the companies.


Joint Services Center

EMC and Oracle maintain a Joint Services Center to handle specific customer questions and issues relating to the database in EMC Symmetrix VMAX and DMX environments. When level 1 technical support from either company requires assistance with joint EMC/Oracle-related issues, calls are automatically escalated to this center. Based in Hopkinton, Mass., the center provides answers to EMC- and Oracle-related questions from leading support specialists trained in both database and storage platforms.


EMC Foundation Products

This chapter introduces the EMC foundation products discussed in this document that work in combined Symmetrix and Oracle environments:

◆ Introduction
◆ Symmetrix hardware and EMC Enginuity features
◆ EMC Solutions Enabler base management
◆ EMC Change Tracker
◆ EMC Symmetrix Remote Data Facility
◆ EMC TimeFinder
◆ EMC Storage Resource Management
◆ EMC Storage Viewer
◆ EMC PowerPath
◆ EMC Replication Manager
◆ EMC Open Replicator
◆ EMC Virtual Provisioning
◆ EMC Virtual LUN migration
◆ EMC Fully Automated Storage Tiering (FAST)


Introduction

EMC provides many hardware and software products that support Oracle environments on Symmetrix systems. This chapter provides a technical overview of the EMC products referenced in this document. The following products, which are highlighted and discussed, were used and/or tested with VMware Infrastructure deployed on EMC Symmetrix.

EMC offers an extensive product line of high-end storage solutions targeted to meet the requirements of mission-critical databases and applications. The Symmetrix product line includes the DMX Direct Matrix Architecture™ series and the VMAX Virtual Matrix™ series. EMC Symmetrix is a fully redundant, high-availability storage processor, providing nondisruptive component replacements and code upgrades. The Symmetrix system features high levels of performance, data integrity, reliability, and availability.

EMC Enginuity™ Operating Environment — Enginuity enables interoperation between the latest Symmetrix platforms and previous generations of Symmetrix systems and enables them to connect to a large number of server types, operating systems and storage software products, and a broad selection of network connectivity elements and other devices, ranging from HBAs and drivers to switches and tape systems.

EMC Solutions Enabler — Solutions Enabler is a package that contains the SYMAPI runtime libraries and the SYMCLI command line interface. SYMAPI provides the interface to the EMC Enginuity operating environment. SYMCLI is a set of commands that can be invoked from the command line or within scripts. These commands can be used to monitor device configuration and status, and to perform control operations on devices and data objects within a storage complex.

EMC Symmetrix Remote Data Facility (SRDF) — SRDF is a business continuity software solution that replicates and maintains a mirror image of data at the storage block level in a remote Symmetrix system. The SRDF component extends the basic SYMCLI command set of Solutions Enabler to include commands that specifically manage SRDF.


EMC SRDF consistency groups — An SRDF consistency group is a collection of related Symmetrix devices that are configured to act in unison to maintain data integrity. The devices in consistency groups can be spread across multiple Symmetrix systems.

EMC TimeFinder — TimeFinder is a family of products that enable LUN-based replication within a single Symmetrix system. Data is copied from Symmetrix devices using array-based resources without using host CPU or I/O. The source Symmetrix devices remain online for regular I/O operations while the copies are created. The TimeFinder family has three separate and distinct software products, TimeFinder/Mirror, TimeFinder/Clone, and TimeFinder/Snap:

• TimeFinder/Mirror enables users to configure special devices, called business continuance volumes (BCVs), to create a mirror image of Symmetrix standard devices. Using BCVs, TimeFinder creates a point-in-time copy of data that can be repurposed. The TimeFinder/Mirror component extends the basic SYMCLI command set of Solutions Enabler to include commands that specifically manage Symmetrix BCVs and standard devices.

• TimeFinder/Clone enables users to make copies of data simultaneously on multiple target devices from a single source device. The data is available to a target’s host immediately upon activation, even if the copy process has not completed. Data may be copied from a single source device to as many as 16 target devices. A source device can be either a Symmetrix standard device or a TimeFinder BCV device.

• TimeFinder/Snap enables users to configure special devices in the Symmetrix array called virtual devices (VDEVs) and save area devices (SAVDEVs). These devices can be used to make pointer-based, space-saving copies of data simultaneously on multiple target devices from a single source device. The data is available to a target’s host immediately upon activation. Data may be copied from a single source device to as many as 128 VDEVs. A source device can be either a Symmetrix standard device or a TimeFinder BCV device. A target device is a VDEV. A SAVDEV is a special device without a host address that is used to hold the changing contents of the source or target device.


EMC Change Tracker — EMC Symmetrix Change Tracker software measures changes to data on a Symmetrix volume or group of volumes. Change Tracker software is often used as a planning tool in the analysis and design of configurations that use the EMC TimeFinder or SRDF components to store data at remote sites.

Solutions Enabler Storage Resource Management (SRM) component — The SRM component extends the basic SYMCLI command set of Solutions Enabler to include commands that allow users to systematically find and examine attributes of various objects on the host, within a specified relational database, or in the EMC enterprise storage. The SRM commands provide mapping support for relational databases, file systems, logical volumes and volume groups, as well as performance statistics.

EMC PowerPath® — PowerPath is host-based software that provides I/O path management. PowerPath operates with several storage systems, on several enterprise operating systems and provides failover and load balancing transparent to the host application and database.


Symmetrix hardware and EMC Enginuity features

Symmetrix hardware architecture and the EMC Enginuity operating environment are the foundation for the Symmetrix storage platform. This environment consists of the following components:

◆ Symmetrix hardware

◆ Enginuity-based operating functions

◆ Solutions Enabler

◆ Symmetrix application program interface (API) for mainframe

◆ Symmetrix-based applications

◆ Host-based Symmetrix applications

◆ Independent software vendor (ISV) applications

All Symmetrix systems provide advanced data replication capabilities, full mainframe and open systems support, and flexible connectivity options, including Fibre Channel, FICON, ESCON, Gigabit Ethernet, and iSCSI.

Interoperability between Symmetrix storage systems enables customers to migrate storage solutions from one generation to the next, protecting their investment even as their storage demands expand.

Symmetrix enhanced cache director technology allows configurations of up to 512 GB of cache. The cache can be logically divided into 32 independent regions providing up to 32 concurrent 500 MB/s transaction throughput.

The Symmetrix on-board data integrity features include:

◆ Continuous cache and on-disk data integrity checking and error detection/correction

◆ Fault isolation

◆ Nondisruptive hardware and software upgrades

◆ Automatic diagnostics and phone-home capabilities

At the software level, advanced integrity features ensure information is always protected and available. By choosing a mix of RAID 1 (mirroring), RAID 1/0, high-performance RAID 5 (3+1 and 7+1), and RAID 6 protection, users have the flexibility to choose the protection level most appropriate to the value and performance requirements of their information. The Symmetrix DMX and VMAX are EMC's latest generations of high-end storage solutions.

From the perspective of the host operating system, a Symmetrix system appears to be multiple physical devices connected through one or more I/O controllers. The host operating system addresses each of these devices using a physical device name. Each physical device includes attributes such as vendor ID, product ID, revision level, and serial ID. The host physical device maps to a Symmetrix device. In turn, the Symmetrix device is a virtual representation of a portion of a physical disk called a hypervolume.

Symmetrix VMAX platform

The EMC Symmetrix VMAX Series with Enginuity is a new entry in the Symmetrix product line. Built on the strategy of simple, intelligent, modular storage, it incorporates a new scalable fabric interconnect design that allows the storage array to seamlessly grow from an entry-level configuration into the world's largest storage system. The Symmetrix VMAX provides improved performance and scalability for demanding enterprise storage environments while maintaining support for EMC's broad portfolio of platform software offerings.

The Enginuity operating environment for Symmetrix version 5874 is a new, feature-rich Enginuity release supporting Symmetrix VMAX storage arrays. With the release of Enginuity 5874, Symmetrix VMAX systems deliver new software capabilities that improve capacity utilization, ease of use, business continuity and security.

The Symmetrix VMAX also maintains customer expectations for high-end storage in terms of availability. High-end availability is more than just redundancy; it means nondisruptive operations and upgrades, and being “always online.” Symmetrix VMAX provides:

◆ Nondisruptive expansion of capacity and performance at a lower price point

◆ Sophisticated migration for multiple storage tiers within the array

◆ The power to maintain service levels and functionality as consolidation grows

◆ Simplified control for provisioning in complex environments


Many of the new features provided by the new EMC Symmetrix VMAX platform can reduce operational costs for customers deploying VMware Infrastructure environments, as well as enhance functionality to enable greater benefits. This document details those features that provide significant benefits to customers deploying VMware Infrastructure environments.

Figure 5 on page 47 illustrates the architecture and interconnection of the major components in the Symmetrix VMAX storage system.

Figure 5 Symmetrix VMAX logical diagram

EMC Enginuity operating environment

EMC Enginuity is the operating environment for all Symmetrix storage systems. Enginuity manages and ensures the optimal flow and integrity of data through the different hardware components. It also manages Symmetrix operations associated with monitoring and optimizing internal data flow. This ensures the fastest response to the user's requests for information, along with protecting and replicating data. Enginuity provides the following services:

◆ Manages system resources to intelligently optimize performance across a wide range of I/O requirements.



◆ Ensures system availability through advanced fault monitoring, detection, and correction capabilities and provides concurrent maintenance and serviceability features.

◆ Offers the foundation for specific software features available through EMC disaster recovery, business continuity, and storage management software.

◆ Provides functional services for both Symmetrix-based functionality and for a large suite of EMC storage application software.

◆ Defines priority of each task, including basic system maintenance, I/O processing, and application processing.

◆ Provides uniform access through APIs for internal calls, and provides an external interface to allow integration with other software providers and ISVs.


EMC Solutions Enabler base management

The EMC Solutions Enabler kit contains all the base management software that provides a host with SYMAPI-shared libraries and the basic Symmetrix command line interface (SYMCLI). Other optional subcomponents in the Solutions Enabler (SYMCLI) series enable users to extend the functionality of Symmetrix systems. Three principal subcomponents are:

◆ Solutions Enabler SYMCLI SRDF, SRDF/CG, and SRDF/A

◆ Solutions Enabler SYMCLI TimeFinder/Mirror, TimeFinder/CG, TimeFinder/Snap, TimeFinder/Clone

◆ Solutions Enabler SYMCLI Storage Resource Management (SRM)

These components are discussed later in this chapter.

SYMCLI resides on a host system to monitor and perform control operations on Symmetrix storage arrays. SYMCLI commands are invoked from the host operating system command line or via scripts. SYMCLI commands invoke low-level channel commands to specialized devices on the Symmetrix called gatekeepers. Gatekeepers are very small devices carved from disks in the Symmetrix that act as SCSI targets for the SYMCLI commands.

SYMCLI is used in single command line entries or in scripts to monitor and perform control operations on devices and data objects toward the management of the storage complex. It also monitors device configuration and status of devices that make up the storage environment. To reduce the number of inquiries from the host to the Symmetrix systems, configuration and status information is maintained in a host database file.
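A sketch of the basic monitoring commands follows; it assumes Solutions Enabler is installed with an accessible gatekeeper device, and the Symmetrix ID 1234 is an example value.

```shell
# Sketch: build the host SYMAPI database and inspect the arrays.
symcfg discover        # scan I/O paths and refresh the host database file
symcfg list            # summarize the Symmetrix arrays visible to this host
symdev list -sid 1234  # list devices on the array whose serial ends in 1234
```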

Table 2 lists the SYMCLI base commands discussed in this document.

Table 2 SYMCLI base commands

symdg: Performs operations on a device group (dg)
  create: Creates an empty device group
  delete: Deletes a device group
  rename: Renames a device group
  release: Releases a device external lock associated with all devices in a device group
  list: Displays a list of all device groups known to this host
  show: Shows detailed information about a device group and any gatekeeper or BCV devices associated with the device group

symcg: Performs operations on a composite group (cg)
  create: Creates an empty composite group
  add: Adds a device to a composite group
  remove: Removes a device from a composite group
  delete: Deletes a composite group
  rename: Renames a composite group
  release: Releases a device external lock associated with all devices in a composite group
  hold: Holds devices in a composite group
  unhold: Unholds devices in a composite group
  list: Displays a list of all composite groups known to this host
  show: Shows detailed information about a composite group and any gatekeeper or BCV devices associated with the group

symld: Performs operations on a device in a device group
  add: Adds devices to a device group and assigns each device a logical name
  list: Lists all devices in a device group and any associated BCV devices
  remove: Removes a device from a device group
  rename: Renames a device in the device group
  show: Shows detailed information about a device in the device group

symbcv: Performs support operations on BCV pairs
  list: Lists BCV devices
  associate: Associates BCV devices with a device group (required to perform operations on the BCV device)
  disassociate: Disassociates BCV devices from a device group
  associate –rdf: Associates remotely attached BCV devices with an SRDF device group
  disassociate –rdf: Disassociates remotely attached BCV devices from an SRDF device group
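Taken together, these commands support a workflow along the following lines. This is a sketch: the group name (proddg), Symmetrix ID (1234), and device numbers (00A1, 00B1) are hypothetical, and exact argument order can vary by Solutions Enabler release.

```shell
# Sketch: build a device group and associate a BCV device with it.
symdg create proddg -type REGULAR              # create an empty device group
symld -g proddg -sid 1234 add dev 00A1         # add a standard device
symbcv -g proddg -sid 1234 associate dev 00B1  # associate a BCV device
symdg show proddg                              # confirm the group contents
```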


EMC Change Tracker

The EMC Symmetrix Change Tracker software is also part of the base Solutions Enabler SYMCLI management offering. Change Tracker commands are used to measure changes to data on a Symmetrix volume or group of volumes. Change Tracker functionality is often used as a planning tool in the analysis and design of configurations that use the EMC SRDF and TimeFinder components to create copies of production data.

The Change Tracker command (symchg) is used to monitor the amount of changes to a group of hypervolumes. The command timestamps and marks specific volumes for tracking and maintains a bitmap to record which tracks have changed on those volumes. The bitmap can be interrogated to gain an understanding of how the data on the volume changes over time and to assess the locality of reference of applications.
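A sketch of how such an analysis might be run follows; the group name and the one-hour interval are illustrative, and option spelling should be verified against the local Solutions Enabler release.

```shell
# Sketch: measure how much data in a device group changes in an hour.
symchg -g proddg mark   # timestamp the volumes and start tracking changes
sleep 3600              # let the production workload run for an hour
symchg -g proddg view   # report the tracks changed since the mark
```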

EMC Symmetrix Remote Data Facility

The Symmetrix Remote Data Facility (SRDF) component of EMC Solutions Enabler extends the basic SYMCLI command set to enable users to manage SRDF. SRDF is a business continuity solution that provides a host-independent, mirrored data storage solution for duplicating production site data to one or more physically separated target Symmetrix systems. In basic terms, SRDF is a configuration of multiple Symmetrix systems whose purpose is to maintain multiple copies of logical volume data in more than one location.

SRDF replicates production or primary (source) site data to a secondary (target) site transparently to users, applications, databases, and host processors. The local SRDF device, known as the source (R1) device, is configured in a partner relationship with a remote target (R2) device, forming an SRDF pair. While the R2 device is mirrored with the R1 device, the R2 device is write-disabled to the remote host. After the R2 device synchronizes with its R1 device, the R2 device can be split from the R1 device at any time, making the R2 device fully accessible again to its host. After the split, the target (R2) device contains valid data and is available for performing business continuity tasks through its original device address.

SRDF requires configuration of specific source Symmetrix volumes (R1) to be mirrored to target Symmetrix volumes (R2). If the primary site is no longer able to continue processing when SRDF is operating in synchronous mode, data at the secondary site is current up to the last committed transaction. When primary systems are down, SRDF enables fast failover to the secondary copy of the data so that critical information becomes available in minutes. Business operations and related applications may resume full functionality with minimal interruption.

Figure 6 on page 54 illustrates a basic SRDF configuration where connectivity between the two Symmetrix systems is provided using ESCON, Fibre Channel, or Gigabit Ethernet. The connection between the R1 and R2 volumes is through a logical grouping of devices called a remote adapter (RA) group. The RA group is independent of the device and composite groups defined and discussed in “SRDF device groups and composite groups” on page 55.

Figure 6 Basic synchronous SRDF configuration

SRDF benefits

SRDF offers the following features and benefits:

◆ High data availability

◆ High performance

◆ Flexible configurations

◆ Host and application software transparency

◆ Automatic recovery from a component or link failure

◆ Significantly reduced recovery time after a disaster

◆ Increased integrity of recovery procedures

◆ Reduced backup and recovery costs

◆ Reduced disaster recovery complexity, planning, testing, etc.

◆ Supports Business Continuity across and between multiple databases on multiple servers and Symmetrix systems.

SRDF modes of operation

SRDF currently supports the following modes of operation:

◆ Synchronous mode (SRDF/S) provides real-time mirroring of data between the source Symmetrix system(s) and the target Symmetrix system(s). Data is written simultaneously to the cache of both systems in real time before the application I/O is completed, thus ensuring the highest possible data availability.


Data must be successfully stored in both the local and remote Symmetrix systems before an acknowledgment is sent to the local host. This mode is used mainly for metropolitan area network distances less than 200 km.

◆ Asynchronous mode (SRDF/A) maintains a dependent-write consistent copy of data at all times across any distance with no host application impact. Applications needing to replicate data across long distances historically have had limited options. SRDF/A delivers high-performance, extended-distance replication and reduced telecommunication costs while leveraging existing management capabilities with no host performance impact.

◆ Adaptive copy mode transfers data from source devices to target devices regardless of order or consistency, and without host performance impact. This is especially useful when transferring large amounts of data during data center migrations, consolidations, and in data mobility environments. Adaptive copy mode is the data movement mechanism of the Symmetrix Automated Replication (SRDF/AR) solution.

SRDF device groups and composite groups

Applications running on Symmetrix systems normally involve a number of Symmetrix devices. Therefore, any Symmetrix operation must ensure all related devices are operated upon as a logical group. Defining device or composite groups achieves this.

A device group or a composite group is a user-defined group of devices that SYMCLI commands can execute upon. Device groups are limited to a single Symmetrix system and RA group (a.k.a. SRDF group). A composite group, on the other hand, can span multiple Symmetrix systems and RA groups. The device or composite group type may contain R1 or R2 devices and may contain various device lists for standard, BCV, virtual, and remote BCV devices. The symdg/symld and symcg commands are used to create and manage device and composite groups.
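As an illustrative sketch, building a device group and a composite group with these commands might look like the following. The group names and device number are hypothetical, and the `run` wrapper only prints each SYMCLI command so the sequence can be previewed on any host:

```shell
# Hypothetical sketch: MyDevGrp, MyCompGrp, and device 0123 are invented.
# `run` only prints each SYMCLI command; on a host with Solutions Enabler
# installed, change its body to execute "$@" instead.
run() { echo "+ $*"; }

run symdg create MyDevGrp -type RDF1     # device group of R1 devices (one array, one RA group)
run symld -g MyDevGrp add dev 0123       # add a device and assign it a logical name
run symcg create MyCompGrp -type RDF1    # composite group (can span arrays and RA groups)
```

The same pattern extends to associating BCV devices with the group via symbcv, as listed in Table 2.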

SRDF consistency groups

An SRDF consistency group is a collection of devices defined by a composite group that has been enabled for consistency protection. Its purpose is to protect data integrity for applications that span multiple RA groups and/or multiple Symmetrix systems. The protected applications may comprise multiple heterogeneous data resource managers across multiple host operating systems.

An SRDF consistency group uses PowerPath or Enginuity Consistency Assist (SRDF-ECA) to provide synchronous disaster restart with zero data loss. Disaster restart solutions that use consistency groups provide remote restart with short recovery time objectives. Zero data loss implies that all completed transactions at the beginning of a disaster will be available at the target.

When the amount of data for an application becomes very large, the time and resources required for host-based software to protect, back up, or run decision-support queries on these databases become critical factors. The time required to quiesce or shut down the application for offline backup is no longer acceptable. SRDF consistency groups allow users to remotely mirror the largest data environments and automatically split off dependent-write consistent, restartable copies of applications in seconds without interruption to online service.

A consistency group is a composite group of SRDF devices (R1 or R2) that act in unison to maintain the integrity of applications distributed across multiple Symmetrix systems or multiple RA groups within a single Symmetrix. If a source (R1) device in the consistency group cannot propagate data to its corresponding target (R2) device, EMC software suspends data propagation from all R1 devices in the consistency group, halting all data flow to the R2 targets. This suspension, called tripping the consistency group, ensures that the R2 copy of the database remains dependent-write consistent up to the point in time that the consistency group tripped.

Tripping a consistency group can occur either automatically or manually. Scenarios in which an automatic trip would occur include:

◆ One or more R1 devices cannot propagate changes to their corresponding R2 devices

◆ The R2 device fails

◆ The SRDF directors on the R1 side or R2 side fail

In an automatic trip, the Symmetrix system completes the write to the R1 device, but indicates that the write did not propagate to the R2 device. EMC software intercepts the I/O and instructs the Symmetrix to suspend all R1 source devices in the consistency group from propagating any further writes to the R2 side. Once the suspension is complete, writes to all of the R1 devices in the consistency group continue normally, but they are not propagated to the target side until normal SRDF mirroring resumes.

An explicit trip occurs when the command symrdf -cg suspend or split is invoked. Suspending or splitting the consistency group creates an on-demand, restartable copy of the database at the R2 target site. BCV devices that are synchronized with the R2 devices are then split after the consistency group is tripped, creating a second dependent-write consistent copy of the data. During the explicit trip, SYMCLI issues the command to create the dependent-write consistent copy, but may require assistance from PowerPath or SRDF-ECA if I/O is received on one or more R1 devices, or if the SYMCLI commands issued are abnormally terminated before the explicit trip.
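The explicit-trip sequence can be sketched as follows. MyConGrp is a hypothetical composite group name, and the `run` wrapper only prints the commands rather than executing them:

```shell
# Sketch only: MyConGrp is hypothetical; `run` previews each command.
run() { echo "+ $*"; }

run symcg create MyConGrp -type RDF1 -rdf_consistency   # composite group for consistency
run symrdf -cg MyConGrp enable                          # enable consistency protection
run symrdf -cg MyConGrp suspend -noprompt               # explicit trip: R2s left consistent
# ... split BCVs on the R2 side here for a second consistent copy ...
run symrdf -cg MyConGrp resume -noprompt                # resume normal SRDF mirroring
```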

An EMC consistency group maintains consistency within applications spread across multiple Symmetrix systems in an SRDF configuration, by monitoring data propagation from the source (R1) devices in a consistency group to their corresponding target (R2) devices as depicted in Figure 7. Consistency groups provide data integrity protection during a rolling disaster.

Figure 7 SRDF consistency group

(Figure 7 shows two hosts issuing DBMS writes through consistency group host components with RDF-ECA and the Symmetrix Control Facility. A ConGroup is defined over volumes X, Y, and Z, where X = DBMS data, Y = application data, and Z = logs; when replication of one volume fails, the R1/R2 relationship is suspended for the whole group, preserving a restartable DBMS copy on the R2 side. Numbered callouts 1 through 7 correspond to the sequence described below.)

Consistency group protection is defined containing volumes X, Y, and Z on the source Symmetrix. This consistency group definition must contain all of the devices that need to maintain dependent-write consistency and reside on all participating hosts involved in issuing I/O to these devices. A mix of CKD (mainframe) and FBA (UNIX/Windows) devices can be logically grouped together. In some cases, the entire processing environment may be defined in a consistency group to ensure dependent-write consistency.

The rolling disaster described previously begins, preventing the replication of changes from volume Z to the remote site.

Since the predecessor log write to volume Z cannot be propagated to the remote Symmetrix system, a consistency group trip occurs.

A ConGroup trip holds the write that could not be replicated along with all of the writes to the logically grouped devices. The writes are held by PowerPath on UNIX/Windows hosts and by IOS on mainframe hosts (or by SRDF-ECA for both UNIX/Windows and mainframe hosts) long enough to issue two I/Os to all of the Symmetrix systems involved in the consistency group. The first I/O changes the state of the devices to a suspend-pending state.

The second I/O performs the suspend actions on the R1/R2 relationships for the logically grouped devices which immediately disables all replication to the remote site. This allows other devices outside of the group to continue replicating, provided the communication links are available. After the relationship is suspended, the completion of the predecessor write is acknowledged back to the issuing host. Furthermore, all writes that were held during the consistency group trip operation are released.

After the second I/O per Symmetrix completes, the I/O is released, allowing the predecessor log write to complete to the host. The dependent data write is issued by the DBMS and arrives at X but is not replicated to the R2(X).

When a complete failure occurs from this rolling disaster, the dependent-write consistency at the remote site is preserved. If a complete disaster does not occur and the failed links are activated again, consistency group replication can be resumed. EMC recommends creating a copy of the dependent-write consistent image while the resume takes place. Once the SRDF process reaches synchronization, the dependent-write consistent copy is again available at the remote site.

SRDF terminology

This section describes various terms related to SRDF operations.

Suspend and resume operations

Practical uses of suspend and resume operations usually involve unplanned situations in which an immediate suspension of I/O between the R1 and R2 devices over the SRDF links is desired. In this way, data propagation problems can be stopped. When suspend is used with consistency groups, immediate backups can be performed off the R2s without affecting I/O from the local host application. I/O can then be resumed between the R1 and R2 and return to normal operation.

Establish and split operations

The establish and split operations are normally used in planned situations in which use of the R2 copy of the data is desired without interfering with normal write operations to the R1 device. Splitting a point-in-time copy of data allows access to the data on the R2 device for various business continuity tasks. The ease of splitting SRDF pairs to provide exact database copies makes it convenient to perform scheduled backup operations, reporting operations, or new application testing from the target Symmetrix data while normal processing continues on the source Symmetrix system.

The R2 copy can also be used to test disaster recovery plans without manually intensive recovery drills, complex procedures, and application service interruptions. Upgrades to new versions can be tested or changes to actual code can be made without affecting the online production server. For example, modified server code can be run on the R2 copy of the database until the upgraded code runs with no errors before upgrading the production server.

In cases where an absolute real-time copy of the production data is not essential, users may choose to split the SRDF pair periodically and use the R2 copy for queries and report generation. The SRDF pair can be re-established periodically to provide incremental updating of data on the R2 device. The ability to refresh the R2 device periodically provides the latest information for data processing and reporting.

Failover and failback operations

Practical uses of failover and failback operations usually involve the need to switch business operations from the production site to a remote site (failover) or the opposite (failback). Once failover occurs, normal operations continue using the remote (R2) copy of synchronized application data. Scheduled maintenance at the production site is one example of where failover to the R2 site might be needed.

Testing of disaster recovery plans is the primary reason to temporarily fail over to a remote site. Traditional disaster recovery routines involve customized software and complex procedures. Offsite media must be either electronically transmitted or physically shipped to the recovery site. Time-consuming restores and the application of logs usually follow. SRDF failover/failback operations significantly reduce the recovery time by incrementally updating only the specific tracks that have changed; this accomplishes in minutes what might take hours for a complete load from dumped database volumes.

Update operation

The update operation allows users to resynchronize the R1s after a failover while continuing to run application and database services on the R2s. This function helps reduce the amount of time that a failback to the R1 side takes. The update operation is a subset of the failover/failback functionality. Practical uses of the R1 update operation usually involve situations in which the R1 becomes almost synchronized with the R2 data before a failback, while the R2 side is still online to its host. The -until option, when used with update, specifies the target number of invalid tracks that are allowed to be out of sync before resynchronization to the R1 completes.
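An update-then-failback flow might be sketched as follows. The device group name and the track threshold are illustrative, and the `run` wrapper only prints the commands:

```shell
# Sketch: `run` previews each command; 1000 is an arbitrary track threshold.
run() { echo "+ $*"; }

run symrdf -g MyDevGrp update -until 1000 -noprompt  # resync R1s while R2s stay online
run symrdf -g MyDevGrp failback -noprompt            # then switch processing back to the R1s
```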

Concurrent SRDF

Concurrent SRDF means having two target R2 devices configured as concurrent mirrors of one source R1 device. Using a Concurrent SRDF pair allows the creation of two copies of the same data at two remote locations. When the two R2 devices are split from their source R1 device, each target site copy of the application can be accessed independently.

R1/R2 swap

Swapping R1/R2 devices of an SRDF pair causes the source R1 device to become a target R2 device and vice versa. Swapping SRDF devices allows the R2 site to take over operations while retaining a remote mirror on the original source site. Swapping is especially useful after failing over an application from the R1 site to the R2 site. SRDF swapping is available with Enginuity version 5567 or later.

Data Mobility

Data mobility is an SRDF configuration that restricts SRDF devices to operating only in adaptive copy mode. This is a lower-cost licensing option that is typically used for data migrations. It allows data to be transferred in adaptive copy mode from source to target, and is not designed as a solution for DR requirements unless used in combination with TimeFinder.

Dynamic SRDF

Dynamic SRDF allows the creation of SRDF pairs from non-SRDF devices while the Symmetrix system is in operation. Historically, source and target SRDF device pairing has been static and changes required assistance from EMC personnel. This feature provides greater flexibility in deciding where to copy protected data.

Dynamic RA groups can be created in an SRDF switched fabric environment. An RA group represents a logical connection between two Symmetrix systems. Historically, RA groups were limited to those static RA groups defined at configuration time. However, RA groups can now be created, modified, and deleted while the Symmetrix system is in operation. This provides greater flexibility in forming SRDF-pair-associated links.
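Creating a dynamic RA group and dynamic SRDF pairs might be sketched as below. All Symmetrix IDs, RDF group numbers, director names, and the pairs file are invented, exact option spellings may vary by Solutions Enabler version, and the `run` wrapper only prints the commands:

```shell
# Sketch only: Symmetrix IDs, RDF group numbers, directors, and pairs.txt
# are invented; `run` previews each command rather than executing it.
run() { echo "+ $*"; }

# Create a dynamic RA (SRDF) group linking two Symmetrix systems
run symrdf addgrp -label NewRA -sid 000190100123 -rdfg 2 -dir 1C \
    -remote_sid 000190100456 -remote_rdfg 2 -remote_dir 1C

# Create SRDF pairs from the device pairs listed in pairs.txt and start copying
run symrdf createpair -file pairs.txt -sid 000190100123 -rdfg 2 -type RDF1 -establish
```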

SRDF control operations

This section describes typical control operations that can be performed by the Solutions Enabler symrdf command.

Solutions Enabler SYMCLI SRDF commands perform the following basic control operations on SRDF devices:

◆ Establish synchronizes an SRDF pair by initiating a data copy from the source (R1) side to the target (R2) side. This operation can be a full or incremental establish. Changes on the R2 volumes are discarded by this process.

◆ Restore resynchronizes a data copy from the target (R2) side to the source (R1) side. This operation can be a full or incremental restore. Changes on the R1 volumes are discarded by this process.

◆ Split stops mirroring for the SRDF pair(s) in a device group and write-enables the R2 devices.

◆ Swap exchanges the source (R1) and target (R2) designations on the source and target volumes.

◆ Failover switches data processing from the source (R1) side to the target (R2) side. The source side volumes (R1), if still available, are write-disabled.

◆ Failback switches data processing from the target (R2) side to the source (R1) side. The target side volumes (R2), if still available, are write-disabled.

Establishing an SRDF pair

Establishing an SRDF pair initiates remote mirroring: the copying of data from the source (R1) device to the target (R2) device. SRDF pairs come into existence in two different ways:

◆ At configuration time through the pairing of SRDF devices. This is a static pairing configuration discussed earlier.

◆ Anytime during a dynamic pairing configuration in which SRDF pairs are created on demand.

A full establish (symrdf establish -full) is typically performed after an SRDF pair is initially configured and connected via the SRDF links. After the first full establish, users can perform an incremental establish, where the R1 device copies to the R2 device only the new data that was updated while the relationship was split or suspended.

To initiate an establish operation on all SRDF pairs in a device or composite group, all pairs must be in the split or suspended state. The symrdf query command is used to check the state of SRDF pairs in a device or composite group.
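The check-then-establish sequence can be sketched as follows. MyDevGrp is an illustrative group name, and the `run` wrapper only prints the commands:

```shell
# Sketch: `run` previews each command.
run() { echo "+ $*"; }

run symrdf -g MyDevGrp query                      # pairs must be split or suspended
run symrdf -g MyDevGrp establish -full -noprompt  # first time: full copy R1 -> R2
run symrdf -g MyDevGrp establish -noprompt        # thereafter: incremental establish
```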

When the establish operation is initiated, the system write-disables the R2 device to its host and merges the track tables. The merge creates a bitmap of the tracks that need to be copied to the target volumes, discarding the changes on the target volumes. When the establish operation is complete, the SRDF pairs are in the synchronized state: the R1 device and R2 device contain identical data, and continue to do so until interrupted by administrative command or unplanned disruption. Figure 8 depicts SRDF establish and restore operations:

Figure 8 SRDF establish and restore control operations

The establish operation may be initiated by any host connected to either Symmetrix system, provided that an appropriate device group has been built on that host. The following command initiates an incremental establish operation for all SRDF pairs in the device group named MyDevGrp:

symrdf -g MyDevGrp establish -noprompt

Splitting an SRDF pair

When read/write access to a target (R2) device is necessary, the SRDF pair can be split. When the split completes, the target host can access the R2 device for write operations. The R2 device contains valid data and is available for business continuity tasks or restoring data to the R1 device if there is a loss of data on that device.

While an SRDF pair is in the split state, local I/O to the R1 device can still occur. These updates are not propagated to the R2 device immediately. Changes on each Symmetrix system are tracked through bitmaps and are reconciled when normal SRDF mirroring operations are resumed. To initiate a split, an SRDF pair must already be in one of the following states:

◆ Synchronized
◆ Suspended
◆ R1 updated
◆ SyncInProg (if the -symforce option is specified for the split, resulting in a set of R2 devices that are not dependent-write consistent and are not usable)

The split operation may be initiated from either host. The following command initiates a split operation on all SRDF pairs in the device group named MyDevGrp:


symrdf -g MyDevGrp split -noprompt

The symrdf split command provides exactly the same functionality as the symrdf suspend and symrdf rw_enable R2 commands together. Furthermore, the split and suspend operations have exactly the same consistency characteristics as SRDF consistency groups. Therefore, when SRDF pairs are in a single device group, users can split the SRDF pairs in the device group as shown previously and have restartable copies on the R2 devices. If the application data spans multiple Symmetrix systems or multiple RA groups, include SRDF pairs in a consistency group to achieve the same results.
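A minimal split-for-backup cycle, assuming the data fits in a single device group (so no consistency group is needed) and a hypothetical group name, might look like this; the `run` wrapper only prints the commands:

```shell
# Sketch: `run` previews each command.
run() { echo "+ $*"; }

run symrdf -g MyDevGrp verify -synchronized   # confirm the R2s are current
run symrdf -g MyDevGrp split -noprompt        # restartable, writable copy on the R2s
# ... back up, report, or test against the R2 devices here ...
run symrdf -g MyDevGrp establish -noprompt    # resume mirroring; R2 changes discarded
```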

Restoring an SRDF pair

When the target (R2) data must be copied back to the source (R1) device, the SRDF restore command is used (see Figure 8 on page 63). After an SRDF pair is split, the R2 device contains valid data and is available for business continuance tasks (such as running a new application) or restoring data to the R1 device. Moreover, if the results of running a new application on the R2 device need to be preserved, moving the changed data and new application to the R1 device is another option.

Users can perform a full or incremental restore. A full restore operation copies the entire contents of the R2 device to the R1 device. An incremental restore operation is much faster because it copies only new data that was updated on the R2 device while the SRDF pair was split. Any tracks on the R1 device that changed while the SRDF pair was split are replaced with corresponding tracks on the R2 device. To initiate a restore, an SRDF pair must already be in the split state. The restore operation can be initiated from either host. The following command initiates an incremental restore operation on all SRDF pairs in the device group named MyDevGrp (add the –full option for a full restore).

symrdf -g MyDevGrp restore -noprompt
symrdf -g MyDevGrp restore -noprompt -full

The restore operation is complete when the R1 and R2 devices contain identical data. The SRDF pair is then in a synchronized state and may be reestablished by initiating the following command:

symrdf -g MyDevGrp establish

Failover and failback operations

Having a synchronized SRDF pair allows users to switch data processing operations from the source site to the target site if operations at the source site are disrupted or if downtime must be scheduled for maintenance. This switchover from source to target is enabled through the use of the failover command. When the situation at the source site is back to normal, a failback operation is used to reestablish I/O communications links between source and target, resynchronize the data between the sites, and resume normal operations on the R1 devices as shown in Figure 9, which illustrates the failover and failback operations.

Figure 9 SRDF failover and failback control operations

The failover and failback operations relocate the processing from the source site to the target site or vice versa. This may or may not imply movement of data.

Failover

Scheduled maintenance or storage system problems can disrupt access to production data at the source site. In this case, a failover operation can be initiated from either host to make the R2 device read/write-enabled to its host. Before issuing the failover, all application services on the R1 volumes must be stopped, because the failover operation makes the R1 volumes read-only. The following command initiates a failover on all SRDF pairs in the device group named MyDevGrp:

symrdf -g MyDevGrp failover -noprompt


To initiate a failover, the SRDF pair must already be in one of the following states:

◆ Synchronized
◆ Suspended
◆ R1 updated
◆ Partitioned (when invoking this operation at the target site)

The failover operation:

◆ Suspends data traffic on the SRDF links
◆ Write-disables the R1 devices
◆ Write-enables the R2 volumes

Failback

To resume normal operations on the R1 side, a failback (R1 device takeover) operation is initiated. This means read/write operations on the R2 device must be stopped, and read/write operations on the R1 device must be started. When the failback command is initiated, the R2 becomes read-only to its host, while the R1 becomes read/write-enabled to its host. The following command performs a failback operation on all SRDF pairs in the device group named MyDevGrp:

symrdf –g MyDevGrp failback -noprompt

The SRDF pair must already be in one of the following states for the failback operation to succeed:

◆ Failed over
◆ Suspended and write-disabled at the source
◆ Suspended and not ready at the source
◆ R1 Updated
◆ R1 UpdInProg

The failback operation:

◆ Write-enables the R1 devices.
◆ Performs a track table merge to discard changes on the R1s.
◆ Transfers the changes on the R2s.
◆ Resumes traffic on the SRDF links.
◆ Write-disables the R2 volumes.
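Put together, a planned site switch and return might be sketched as follows. The group name is hypothetical, application shutdown steps are omitted, and the `run` wrapper only prints the commands:

```shell
# Sketch: `run` previews each command; stop R1-side applications first.
run() { echo "+ $*"; }

run symrdf -g MyDevGrp failover -noprompt   # R2s become read/write, R1s write-disabled
# ... production runs against the R2 side ...
run symrdf -g MyDevGrp update -noprompt     # optional: pre-copy R2 changes back to the R1s
run symrdf -g MyDevGrp failback -noprompt   # resume normal processing on the R1 side
```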

EMC SRDF/Cluster Enabler solutions

EMC SRDF/Cluster Enabler (SRDF/CE) for MSCS is an integrated solution that combines SRDF and clustering protection over distance. EMC SRDF/CE provides disaster-tolerant capabilities that enable a cluster to span geographically separated Symmetrix systems. It operates as a software extension (MMC snap-in) to the Microsoft Cluster Service (MSCS).

SRDF/CE achieves this capability by exploiting SRDF disaster restart capabilities. SRDF allows the MSCS cluster to have two identical sets of application data in two different locations. When cluster services are failed over or failed back, SRDF/CE is invoked automatically to perform the SRDF functions necessary to enable the requested operation.

Figure 10 illustrates the hardware configuration of two four-node, geographically distributed EMC SRDF/CE clusters using bidirectional SRDF.

Figure 10 Geographically distributed four-node EMC SRDF/CE clusters


EMC TimeFinder

The SYMCLI TimeFinder component extends the basic SYMCLI command set to include TimeFinder or business continuity commands that allow control operations on device pairs within a local replication environment. This section specifically describes the functionality of:

◆ TimeFinder/Mirror — General monitor and control operations for business continuance volumes (BCV)

◆ TimeFinder/CG — Consistency groups

◆ TimeFinder/Clone — Clone copy sessions

◆ TimeFinder/Snap — Snap copy sessions

Commands such as symmir and symbcv perform a wide spectrum of monitor and control operations on standard/BCV device pairs within a TimeFinder/Mirror environment. The TimeFinder/Clone command, symclone, creates a point-in-time copy of a source device on nonstandard device pairs (such as standard/standard, BCV/BCV). The TimeFinder/Snap command, symsnap, creates virtual device copy sessions between a source device and multiple virtual target devices. These virtual devices only store pointers to changed data blocks from the source device, rather than a full copy of the data. Each product requires a specific license for monitoring and control operations.

Configuring and controlling remote BCV pairs requires EMC SRDF business continuity software discussed previously. The combination of TimeFinder with SRDF provides for multiple local and remote copies of production data.

Figure 11 illustrates application usage for a TimeFinder/Mirror configuration in a Symmetrix system.

Figure 11 EMC Symmetrix configured with standard volumes and BCVs

TimeFinder/Mirror establish operations

A BCV device can be fully or incrementally established. After configuration and initialization of a Symmetrix system, BCV devices contain no data. BCV devices, like standard devices, can have unique host addresses and can be online and ready to the host(s) to which they are connected. A full establish operation must be used the first time the standard devices are paired with the BCV devices. An incremental establish of a BCV device can be performed to resynchronize any data that has changed on the standard since the last establish operation.

Note: When BCVs are established, they are inaccessible to any host.

Symmetrix systems allow up to four mirrors for each hypervolume. The mirror positions are commonly designated M1, M2, M3, and M4. An unprotected BCV can be the second, third, or fourth mirror position of the standard device. A host, however, logically views the Symmetrix M1/M2 mirrored devices as a single device.

To assign a BCV as a mirror of a standard Symmetrix device, the symmir establish command is used. One method of establishing a BCV pair is to allow the standard/BCV device-pairing algorithm to arbitrarily create BCV pairs from multiple devices within a device group:

symmir -g MyDevGrp establish -full -noprompt

With this method, TimeFinder/Mirror first checks for any attach assignments (specifying a preferred BCV match from among multiple BCVs in a device group). TimeFinder/Mirror then checks if there are any pairing relationships among the devices. If either of these previous conditions exists, TimeFinder/Mirror uses these assignments.
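The establish sequence described above can be sketched as a small script. This is a dry-run sketch, not a definitive procedure: the device group name MyDevGrp is illustrative, nothing executes unless RUN=1 is set, and the verify options should be checked against your SYMCLI release.

```shell
# Dry-run sketch of TimeFinder/Mirror establish operations.
# MyDevGrp is a hypothetical device group; set RUN=1 to execute for real.
RUN=${RUN:-0}
run() { echo "$*"; if [ "$RUN" = "1" ]; then "$@"; fi; }

# First pairing of standards and BCVs requires a full establish.
run symmir -g MyDevGrp establish -full -noprompt

# Subsequent establishes are incremental: only tracks changed since the
# last split are copied back to the BCVs.
run symmir -g MyDevGrp establish -noprompt

# Confirm the pairs have reached the Synchronized state before splitting.
run symmir -g MyDevGrp verify -synched
```

With RUN left at 0 the script only prints the commands it would run, which makes it safe to review before pointing it at a real device group.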

TimeFinder split operations

Splitting a BCV pair is a TimeFinder/Mirror action that detaches the BCV from its standard device and makes the BCV ready for host access. When splitting a BCV, the system must perform housekeeping tasks that may require a few milliseconds on a busy Symmetrix system. These tasks involve a series of steps that result in separation of the BCV from its paired standard:

◆ I/O is suspended briefly to the standard device.

◆ Write pending tracks for the standard device that have not yet been written out to the BCV are duplicated in cache to be written to the BCV.

◆ The BCV is split from the standard device.

◆ The BCV device status is changed to ready.

Regular split

A regular split is the type of split that has existed for TimeFinder/Mirror since its inception. With a regular split (before Enginuity version 5568), I/O activity from the production hosts to a standard volume was not accepted until it was split from its BCV pair. Therefore, applications attempting to access the standard or the BCV would experience a short wait during a regular split. Once the split was complete, no further overhead was incurred.

Beginning with Enginuity version 5568, any split operation is an instant split. A regular split is still valid for earlier versions and for current applications that perform regular split operations. However, current applications that perform regular splits with Enginuity version 5568 actually perform an instant split.

With Enginuity versions 5x66 and 5x67, an instant split can be performed by specifying the -instant option on the command line. Since version 5568 this option is no longer required to trigger an instant split, because instant split mode has become the default behavior. It is still beneficial to supply the -instant flag with later Enginuity versions, however: without it, SYMCLI waits for the background split to complete before returning.


Instant split

An instant split shortens the wait period during a split by dividing the process into a foreground split and a background split. During an instant split, the system executes the foreground split almost instantaneously and returns a successful status to the host. This instantaneous execution allows minimal I/O disruptions to the production volumes. Furthermore, the BCVs are accessible to the hosts as soon as the foreground process is complete. The background split continues to split the BCV pair until it is complete. When the -instant option is included or defaulted, SYMCLI returns immediately after the foreground split, allowing other operations while the BCV pair is splitting in the background.

The following operation performs an instant split on all BCV pairs in MyDevGrp, and allows SYMCLI to return to the server process while the background split is in progress:

symmir -g MyDevGrp split -instant -noprompt

The following symmir query command example checks the progress of a split on the composite group named MyConGrp. The -bg option is provided to query the status of the background split:

symmir -cg MyConGrp query -bg
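Rather than issuing ad hoc queries, a script can poll until the background split finishes. The sketch below assumes that symmir verify accepts a -split state check (as in recent Solutions Enabler releases; confirm against your version); the group name is illustrative and nothing executes unless RUN=1.

```shell
# Dry-run sketch: wait for the background phase of an instant split.
RUN=${RUN:-0}
wait_bg_split() {
  grp=$1
  if [ "$RUN" = "1" ]; then
    # verify -split succeeds once all pairs in the group are fully split
    until symmir -g "$grp" verify -split; do sleep 10; done
  else
    echo "would poll: symmir -g $grp verify -split"
  fi
}
wait_bg_split MyDevGrp
```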

TimeFinder restore operations

A BCV device can be used to fully or incrementally restore data on the standard volume. Like the full establish operation, a full restore operation copies the entire contents of the BCV devices to the standard devices. The devices upon which the restore operates may be defined in a device group, composite group, or device file. For example:

symmir -g MyDevGrp -full restore -noprompt
symmir -cg MyConGrp -full restore -noprompt
symmir -f MyFile -full -sid 109 restore -noprompt

The incremental restore process accomplishes the same thing as the full restore process, with a major time-saving exception: only data that was updated on the BCV device while the BCV pair was split is copied to the standard device. The data on the corresponding BCV tracks also overwrites any tracks that changed on the standard device while the pair was split, which maximizes the efficiency of the resynchronization process. This process is useful when, for example, after testing or validating an updated version of a database or a new application on the BCV device, a user wants to migrate and utilize a copy of the tested data or application on the production standard device.
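An incremental restore cycle can be sketched as follows. As before this is a dry-run sketch: MyDevGrp is illustrative, the -restored verify state should be confirmed against your SYMCLI release, and the commands only print unless RUN=1.

```shell
# Dry-run sketch of an incremental restore from BCVs to standards.
RUN=${RUN:-0}
run() { echo "$*"; if [ "$RUN" = "1" ]; then "$@"; fi; }

# Incremental restore: copy only the tracks that differ back to the standards.
run symmir -g MyDevGrp restore -noprompt

# Wait until the pairs report the Restored state.
run symmir -g MyDevGrp verify -restored

# Split again so production continues on the standards alone.
run symmir -g MyDevGrp split -noprompt
```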

Note: An incremental restore of a BCV volume to a standard volume is only possible when the two volumes have an existing TimeFinder relationship.

TimeFinder consistent split

TimeFinder consistent split allows you to split off a dependent-write consistent, restartable image of an application without interrupting online services. Consistent split helps to avoid the inconsistencies and restart problems that can occur when splitting an application-related BCV without first quiescing or halting the application. Consistent split is implemented using the Enginuity Consistency Assist (ECA) feature. This functionality requires a TimeFinder/CG license.

Enginuity Consistency Assist

The Enginuity Consistency Assist (ECA) feature of the Symmetrix operating environment can be used to perform consistent split operations across multiple heterogeneous environments. This functionality requires a TimeFinder/CG license and uses the -consistent option of the symmir command.

To use ECA to consistently split BCV devices from the standards, either a control host with no database or a database host with a dedicated channel to gatekeeper devices must be available. The dedicated channel cannot be used for servicing other devices or for freezing I/O. For example, to split a device group, execute:

symmir -g MyDevGrp split -consistent -noprompt

Figure 12 illustrates an ECA split across three database hosts that access devices on a Symmetrix system.


Figure 12 ECA consistent split across multiple database-associated hosts

Device groups or composite groups must be created on the controlling host for the target application to be consistently split. Device groups can be created to include all of the required devices for maintaining business continuity. For example, if a device group is defined that includes all of the devices being accessed by Hosts A, B, and C (see Figure 12), then all of the BCV pairs related to those hosts can be consistently split with a single command.

However, if a device group is defined that includes only the devices accessed by Host A, then the BCV pairs related to Host A can be split without affecting the other hosts. The solid vertical line in Figure 12 represents the ECA holding of I/Os during an instant split process, creating a dependent-write consistent image in the BCVs.
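Building such a composite group can be sketched with symcg and symbcv. This is a dry-run sketch under stated assumptions: the device numbers, Symmetrix ID, and the exact symcg/symbcv argument forms are placeholders to illustrate the flow and should be verified against your SYMCLI documentation; nothing executes unless RUN=1.

```shell
# Dry-run sketch: build a composite group spanning the devices used by
# several hosts so one consistent split covers all of them.
# Device numbers and -sid values are placeholders.
RUN=${RUN:-0}
run() { echo "$*"; if [ "$RUN" = "1" ]; then "$@"; fi; }

run symcg create MyConGrp                            # new composite group
run symcg -cg MyConGrp -sid 109 add dev 00A0         # standards from Host A
run symcg -cg MyConGrp -sid 109 add dev 00A1         # standards from Host B
run symbcv -cg MyConGrp -sid 109 associate dev 00B0  # matching BCVs
run symbcv -cg MyConGrp -sid 109 associate dev 00B1
run symmir -cg MyConGrp establish -full -noprompt    # pair across the group
```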

Figure 13 illustrates the use of local consistent split with a database management system (DBMS).


Figure 13 ECA consistent split on a local Symmetrix system

When a split command is issued with ECA from the production host, a consistent database image is created using the following sequence of events shown in Figure 13:

1. The device group, device file, or composite group identifies the standard devices that hold the database.

2. SYMAPI communicates to Symmetrix Enginuity to validate that all identified BCV pairs can be split.

3. SYMAPI communicates to Symmetrix Enginuity to open the ECA window (the time within Symmetrix Enginuity where the writes are deferred), the instant split is issued, and the writes are released by closing the window.

4. ECA suspends writes to the standard devices that hold the database. The DBMS cannot write to the devices and subsequently waits for these devices to become available before resuming any further write activity. Read activity to the device is not affected unless attempting to read from a device with a write queued against it.

5. SYMAPI sends an instant split request to all BCV pairs in the specified device group and waits for the Symmetrix to acknowledge that the foreground split has occurred. SYMAPI then communicates with Symmetrix Enginuity to resume the write or close the ECA window.

6. The application resumes writing to the production devices.

The BCV devices now contain a restartable copy of the production data that is consistent up until the time of the instant split. The production application is unaware that the split or suspend/resume operation occurred. When the application on the secondary host is started using the BCVs, there is no record of a successful shutdown. Therefore, the secondary application instance views the BCV copy as a crashed instance and proceeds to perform the normal crash recovery sequence to restart.

When performing a consistent split, it is a good practice to issue host-based commands that commit any data that has not been written to disk before the split to reduce the amount of time on restart. For example on UNIX systems, the sync command can be run. From a database perspective, a checkpoint or equivalent should be executed.
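The flush-then-split sequence can be sketched as below. The sqlplus checkpoint line is an assumption for an Oracle instance with OS authentication; all names are illustrative, and the commands only print unless RUN=1.

```shell
# Dry-run sketch: flush buffers, then take an ECA consistent split.
RUN=${RUN:-0}
run() { echo "$*"; if [ "$RUN" = "1" ]; then "$@"; fi; }

# Commit file system buffers to disk (UNIX).
run sync

# Force a database checkpoint so less dirty data remains at split time.
if [ "$RUN" = "1" ]; then
  echo "alter system checkpoint;" | sqlplus -s "/ as sysdba"
else
  echo "would run: alter system checkpoint; (via sqlplus)"
fi

# ECA briefly holds writes across the group while the split completes.
run symmir -g MyDevGrp split -consistent -noprompt
```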

TimeFinder/Mirror reverse split

BCVs can be mirrored to guard against data loss through physical drive failures. A reverse split is applicable to a BCV that is configured to have two local mirrors. It is generally used to recover from an unsuccessful restore operation. When data is restored from the BCV to the standard device, any writes that occur while the standard is being restored alter the original copy of data on the BCV's primary mirror. If the original copy of BCV data is needed again at a later time, it can be restored to the BCV's primary mirror from the BCV's secondary mirror using a reverse split. For example, whenever logical corruption is reintroduced to a database during a recovery process (following a BCV restore), both the standard device and the primary BCV mirror are left with corrupted data. In this case, a reverse split can restore the original BCV data from the BCV's secondary mirror to its primary mirror.

This is particularly useful when performing a restore and immediately restarting processing on the standard devices when the process may have to be restarted many times.

Note: Reverse split is not available when protected restore is used to return the data from the BCVs to the standards.

TimeFinder/Clone operations

Symmetrix TimeFinder/Clone operations using SYMCLI can create up to 16 copies of a source device onto target devices. Unlike TimeFinder/Mirror, TimeFinder/Clone does not require the traditional standard-to-BCV device pairing. Instead, TimeFinder/Clone allows any combination of source and target devices; for example, a BCV can be used as the source device while another BCV is used as the target device. Additionally, TimeFinder/Clone does not use the traditional mirror positions the way that TimeFinder/Mirror does. Because of this, TimeFinder/Clone is a useful option when more than three copies of a source device are desired.

Normally, one of the three copies is used to protect the data against hardware failure.

The source and target devices must be the same emulation type (FBA or CKD), and the target device must be equal in size to the source device. Clone copies of striped or concatenated metavolumes can also be created, provided the source and target metavolumes are identical in configuration. Once activated, the target device can be accessed instantly by the target's host, even before the data is fully copied to the target device.

TimeFinder/Clone copies are appropriate in situations where multiple copies of production data are needed for testing, backups, or report generation. Clone copies can also be used to reduce disk contention and improve data access speed by assigning users to copies of the data rather than to the one production copy. A single source device may maintain as many as 16 relationships, which can be a combination of BCVs, clones, and snaps.

Clone copy sessions

TimeFinder/Clone functionality is controlled via copy sessions, which pair the source and target devices. Sessions are maintained on the Symmetrix system and can be queried to verify the current state of the device pairs. A copy session must first be created to define and set up the TimeFinder/Clone devices. The session is then activated, enabling the target device to be accessed by its host. When the information is no longer needed, the session can be terminated. TimeFinder/Clone operations are controlled from the host by using the symclone command to create, activate, and terminate the copy sessions.

Figure 14 illustrates a copy session where the controlling host creates a TimeFinder/Clone copy of standard device DEV001 on target device DEV005, using the symclone command.


Figure 14 Creating a copy session using the symclone command

The symclone command is used to enable cloning operations. The cloning operation happens in two phases: creation and activation. The creation phase builds bitmaps of the source and target that are later used during the activation or copy phase. The creation of a symclone pairing does not start copying the source volume to the target volume unless the -precopy keyword is used.

For example, to create clone sessions on all the standards and BCVs in the device group MyDevGrp, use the following command:

symclone -g MyDevGrp create -noprompt

The activation of a clone enables the copying of the data. The data may start copying immediately if the -copy keyword is used. If the -copy keyword is not used, tracks are only copied when they are accessed from the target volume or when they are changed on the source volume.

Activation of the clone session established in the previous create command can be accomplished using the following command.

symclone -g MyDevGrp activate -noprompt
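The create and activate steps fit into a full session lifecycle, which can be sketched as a dry-run script. The group name is illustrative, the query/terminate steps follow standard symclone usage, and nothing executes unless RUN=1.

```shell
# Dry-run sketch of a TimeFinder/Clone session lifecycle.
RUN=${RUN:-0}
run() { echo "$*"; if [ "$RUN" = "1" ]; then "$@"; fi; }

run symclone -g MyDevGrp create -precopy -noprompt  # bitmaps + early copying
run symclone -g MyDevGrp activate -noprompt         # fix the point-in-time image
run symclone -g MyDevGrp query                      # watch copy progress
run symclone -g MyDevGrp terminate -noprompt        # drop the session when done
```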

New Symmetrix VMAX TimeFinder/Clone features

Solutions Enabler 7.1 and Enginuity 5874 SR1 introduce the ability to clone from thick to thin devices using TimeFinder/Clone. Thick-to-thin TimeFinder/Clone allows application data to be moved from standard Symmetrix volumes to virtually provisioned storage within the same array. For some workloads, virtually provisioned volumes offer advantages in allocation utilization, ease of use, and performance through automatic wide striping. Thick-to-thin TimeFinder/Clone provides an easy way to move workloads that benefit from Virtual Provisioning into that storage paradigm. Migration from thin devices back to fully provisioned devices is also possible. The source and target of the migration may be of different protection types and disk technologies, offering versatility with protection schemes and disk tier options. Thick-to-thin TimeFinder/Clone does not disrupt hosts or internal array replication sessions during the copy process.

TimeFinder/Snap operations

Symmetrix arrays provide another technique to create copies of application data. The functionality, called TimeFinder/Snap, allows users to make pointer-based, space-saving copies of data simultaneously on multiple target devices from a single source device. The data is available for access instantly. TimeFinder/Snap allows data to be copied from a single source device to as many as 128 target devices. A source device can be either a Symmetrix standard device or a BCV device controlled by TimeFinder/Mirror, with the exception being a BCV working in clone emulation mode. The target device is a Symmetrix virtual device (VDEV) that consumes negligible physical storage through the use of pointers to track changed data.

The VDEV is a host-addressable Symmetrix device with special attributes created when the Symmetrix system is configured. However, unlike a BCV, which contains a full volume of data, a VDEV is a logical-image device that offers a space-saving way to create instant, point-in-time copies of volumes. Any update to a source device after its activation with a virtual device causes the pre-update image of the changed tracks to be copied to a save device. The virtual device’s indirect pointer is then updated to point to the original track data on the save device, preserving a point-in-time image of the volume. TimeFinder/Snap uses this copy-on-first-write technique to conserve disk space, since only changes to tracks on the source cause any incremental storage to be consumed.

The symsnap create and symsnap activate commands are used to create and activate source/target snap pairs.


Table 3 summarizes some of the differences between devices used in TimeFinder/Snap operations.

Snap copy sessions

TimeFinder/Snap functionality is managed via copy sessions, which pair the source and target devices. Sessions are maintained on the Symmetrix system and can be queried to verify the current state of the devices. A copy session must first be created, a process which defines the snap devices in the operation. On subsequent activation, the target virtual devices become accessible to their hosts. Unless the data is changed by the host accessing the virtual device, the virtual device always presents a frozen point-in-time copy of the source device from the point of activation. When the information is no longer needed, the session can be terminated.

TimeFinder/Snap operations are controlled from the host by using the symsnap command to create, activate, terminate, and restore the TimeFinder/Snap copy sessions. The TimeFinder/Snap operations described in this section explain how to manage the devices participating in a copy session through SYMCLI.

Figure 15 on page 80 illustrates a virtual copy session where the controlling host creates a copy of standard device DEV001 on target device VDEV005.

Table 3 TimeFinder device type summary

  Virtual device  A logical-image device that saves disk space through the use of pointers to track data, and that is immediately accessible after activation. Snapping data to a virtual device uses a copy-on-first-write technique.

  Save device     A device that is not host-accessible and is accessed only through the virtual devices that point to it. Save devices provide a pool of physical space to store snap copy data to which virtual devices point.

  BCV             A full volume mirror that has valid data after fully synchronizing with its source device. It is accessible only when split from the source device that it is mirroring.


Figure 15 TimeFinder/Snap copy of a standard device to a VDEV

The symsnap command is used to enable TimeFinder/Snap operations. The snap operation happens in two phases: creation and activation. The creation phase builds bitmaps of the source and target that are later used to manage the changes on the source and target. The creation of a snap pairing does not copy the data from the source volume to the target volume. To create snap sessions on all the standards and BCVs in the device group MyDevGrp, use the following command.

symsnap -g MyDevGrp create -noprompt

The activation of a snap enables the protection of the source data tracks. When protected tracks are changed on the source volume, they are first copied into the save pool and the VDEV pointers are updated to point to the changed tracks in the save pool. When tracks are changed on the VDEV, the data is written directly to the save pool and the VDEV pointers are updated in the same way.

Activation of the snap session created in the previous create command can be accomplished using the following command.

symsnap -g MyDevGrp activate -noprompt
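As with clones, the snap commands form a session lifecycle that can be sketched as a dry-run script. The group name is illustrative, the query/terminate steps follow standard symsnap usage, and nothing executes unless RUN=1.

```shell
# Dry-run sketch of a TimeFinder/Snap session lifecycle.
RUN=${RUN:-0}
run() { echo "$*"; if [ "$RUN" = "1" ]; then "$@"; fi; }

run symsnap -g MyDevGrp create -noprompt     # define source/VDEV sessions
run symsnap -g MyDevGrp activate -noprompt   # freeze the point-in-time image
run symsnap -g MyDevGrp query                # inspect session state
run symsnap -g MyDevGrp terminate -noprompt  # release VDEV pointers and save space
```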


EMC Storage Resource Management

The Storage Resource Management (SRM) component of EMC Solutions Enabler extends the basic SYMCLI command set to include SRM commands that allow users to discover and examine attributes of various objects on a host or in the EMC storage enterprise.

Note: The acronym for EMC Storage Resource Management (SRM) can be easily confused with the acronym for VMware Site Recovery Manager. To avoid any confusion, this document always refers to VMware Site Recovery Manager as VMware SRM.

SYMCLI commands support SRM in the following areas:

◆ Data objects and files

◆ Relational databases

◆ File systems

◆ Logical volumes and volume groups

◆ Performance statistics

SRM allows users to examine the mapping of storage devices and the characteristics of data files and objects. These commands allow the examination of relationships between extents and data files or data objects, and how they are mapped on storage devices. Frequently, SRM commands are used with TimeFinder and SRDF to create point-in-time copies for backup and restart.

Figure 16 on page 82 outlines the process of how SRM commands are used with TimeFinder in a database environment.


Figure 16 SRM commands

EMC Solutions Enabler with a valid license for TimeFinder and SRM is installed on the host. In addition, the host must have PowerPath or use ECA, and must run a supported DBMS. As discussed in “TimeFinder split operations” on page 70, when splitting a BCV, the system must perform housekeeping tasks that may require a few seconds on a busy Symmetrix system. These tasks involve a series of steps (shown in Figure 16 on page 82) that result in the separation of the BCV from its paired standard:

1. Using the SRM base mapping commands, first query the Symmetrix system to display the logical-to-physical mapping information about any physical device, logical volume, file, directory, and/or file system.

2. Using the database mapping command, query the Symmetrix to display physical and logical database information.

3. Next, use the database mapping command to translate:

• The devices of a specified database into a device group or a consistency group, or

• The devices of a specified table space into a device group or a consistency group.

4. The BCV is split from the standard device.
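The four steps above can be sketched end to end for an Oracle database. This is a dry-run sketch under stated assumptions: the symrdb argument form (-type/-db) and the names (PROD, OraDg, the datafile path) are placeholders to illustrate the flow and should be checked against the SRM documentation; nothing executes unless RUN=1.

```shell
# Dry-run sketch of the four SRM steps for an Oracle database.
RUN=${RUN:-0}
run() { echo "$*"; if [ "$RUN" = "1" ]; then "$@"; fi; }

# 1. Map a file down to its devices and extents.
run symrslv file /u01/oradata/PROD/system01.dbf

# 2. Display logical database information.
run symrdb list -type oracle

# 3. Translate the database's devices into a device group.
run symrdb -type oracle -db PROD rdb2dg OraDg

# 4. Split the BCVs for that group.
run symmir -g OraDg split -consistent -noprompt
```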


Table 4 lists the SYMCLI commands used to examine the mapping of data objects.

Table 4 Data object SRM commands

  symrslv pd    Displays logical-to-physical mapping information about any physical device.
  symrslv lv    Displays logical-to-physical mapping information about a logical volume.
  symrslv file  Displays logical-to-physical mapping information about a file.
  symrslv dir   Displays logical-to-physical mapping information about a directory.
  symrslv fs    Displays logical-to-physical mapping information about a file system.

SRM commands allow users to examine the host database mapping and the characteristics of a database. The commands provide listings and attributes that describe various databases, their structures, files, table spaces, and user schemas. Typically, the database commands work with Oracle, Informix, SQL Server, Sybase, Microsoft Exchange, SharePoint Portal Server, and DB2 LUW database applications.

Table 5 lists the SYMCLI commands used to examine the mapping of database objects.

Table 5 Data object mapping commands

  symrdb list    Lists various physical and logical database objects: current relational database instances available; table spaces, tables, files, or schemas of a database; files, segments, or tables of a database table space or schema.
  symrdb show    Shows information about a database object: a table space, table, file, or schema of a database; a file, segment, or table of a specified table space or schema.
  symrdb rdb2dg  Translates the devices of a specified database into a device group.
  symrdb rdb2cg  Translates the devices of a specified database into a composite group or a consistency group.
  symrdb tbs2cg  Translates the devices of a specified table space into a composite group. Only data database files are translated.
  symrdb tbs2dg  Translates the devices of a specified table space into a device group. Only data database files are translated.

The SYMCLI file system SRM command allows users to investigate the file systems that are in use on the operating system. The command provides listings and attributes that describe file systems, directories, and files, and their mapping to physical devices and extents.

Table 6 lists the SYMCLI command that can be used to examine the file system mapping.

Table 6 File system SRM commands to examine file system mapping

  symhostfs list  Displays a list of file systems, files, or directories.
  symhostfs show  Displays more detailed information about a file system or file system object.

SYMCLI logical volume SRM commands allow users to map logical volumes to display a detailed view of the underlying storage devices. Logical volume architecture defined by a Logical Volume Manager (LVM) is a means for advanced applications to improve performance by the strategic placement of data.

Table 7 lists the SYMCLI commands that can be used to examine the logical volume mapping.

Table 7 Logical volume SRM commands to examine logical volume mapping

  symvg deport   Deports a specified volume group so it can be imported later.
  symvg import   Imports a specified volume group.
  symvg list     Displays a list of volume groups defined on the host system by the logical volume manager.
  symvg rescan   Rescans all the volume groups.
  symvg show     Displays more detailed information about a volume group.
  symvg vg2cg    Translates volume groups to composite groups.
  symvg vg2dg    Translates volume groups to device groups.
  symlv list     Displays a list of logical volumes on a specified volume group.
  symlv show     Displays detailed information (including extent data) about a logical volume.

SRM performance statistics commands allow users to retrieve statistics about a host’s CPU, disk, and memory. Table 8 lists the statistics commands.

Table 8 SRM statistics commands

  symhost show   Displays host configuration information.
  symhost stats  Displays performance statistics.
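The volume group translation commands combine naturally with TimeFinder control. The dry-run sketch below uses a hypothetical volume group vg01 and device group MyDevGrp; the vg2dg argument order is an assumption to illustrate the flow, and nothing executes unless RUN=1.

```shell
# Dry-run sketch: translate an LVM volume group into a device group.
RUN=${RUN:-0}
run() { echo "$*"; if [ "$RUN" = "1" ]; then "$@"; fi; }

run symvg list                 # volume groups known to the host LVM
run symvg show vg01            # devices behind the hypothetical group vg01
run symvg vg2dg vg01 MyDevGrp  # create a device group from its devices
run symlv list vg01            # logical volumes within the group
```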


EMC Storage Viewer

EMC Storage Viewer (SV) for vSphere Client extends the vSphere Client to facilitate discovery and identification of EMC Symmetrix storage devices that are allocated to VMware ESX/ESXi hosts and virtual machines. The Storage Viewer for vSphere Client presents the underlying storage details to the virtual datacenter administrator, merging the data of several different storage mapping tools into a few seamless vSphere Client views.

The Storage Viewer for vSphere Client enables you to resolve the underlying storage of Virtual Machine File System (VMFS) datastores and virtual disks, as well as raw device mappings (RDM). In addition, you are presented with lists of storage arrays and devices that are accessible to the ESX and ESXi hosts in the virtual datacenter. Previously, these details were only made available to you using separate storage management applications.

Once installed and configured, Storage Viewer provides four different views:

◆ The global EMC Storage view. This view configures the global settings for the Storage Viewer, including the Solutions Enabler client/server settings, log settings, and version information. Additionally, an arrays tab lists all of the storage arrays currently being managed by Solutions Enabler, and allows for the discovery of new arrays and the deletion of previously discovered arrays.

◆ The EMC Storage tab for hosts. This tab appears when an ESX/ESXi host is selected. It provides insight into the storage that is configured and allocated for a given ESX/ESXi host.

◆ The SRDF SRA tab for hosts. This view also appears when an ESX/ESXi host is selected on a vSphere Client running on VMware Site Recovery Manager Server. It allows you to configure device pair definitions for the EMC SRDF Storage Replication Adapter (SRA), to use when testing VMware Site Recovery Manager recovery plans, or when creating gold copies before VMware Site Recovery Manager recovery plans are executed.

◆ The EMC Storage tab for virtual machines. This view appears when a virtual machine is selected. It provides insight into the storage that is allocated to a given virtual machine, including both virtual disks and raw device mappings (RDM).


A typical view of the Storage Viewer for vSphere Client can be seen in Figure 17.

Figure 17 EMC Storage Viewer


EMC PowerPath

EMC PowerPath is host-based software that works with networked storage systems to intelligently manage I/O paths. PowerPath manages multiple paths to a storage array. Supporting multiple paths enables recovery from path failure because PowerPath automatically detects path failures and redirects I/O to other available paths. PowerPath also uses sophisticated algorithms to provide dynamic load balancing for several kinds of path management policies that the user can set. With the help of PowerPath, systems administrators are able to ensure that applications on the host have highly available access to storage and perform optimally at all times.

A key feature of path management in PowerPath is dynamic, multipath load balancing. Without PowerPath, an administrator must statically load balance paths to logical devices to improve performance. For example, based on current usage, the administrator might configure three heavily used logical devices on one path, seven moderately used logical devices on a second path, and 20 lightly used logical devices on a third path. As I/O patterns change, these statically configured paths may become unbalanced, causing performance to suffer. The administrator must then reconfigure the paths, and continue to reconfigure them as I/O traffic between the host and the storage system shifts in response to usage changes.

Designed to use all paths concurrently, PowerPath distributes I/O requests to a logical device across all available paths, rather than requiring a single path to bear the entire I/O burden. PowerPath can distribute the I/O for all logical devices over all paths shared by those logical devices, so that all paths are equally burdened. PowerPath load balances I/O on a host-by-host basis, and maintains statistics on all I/O for all paths. For each I/O request, PowerPath intelligently chooses the least-burdened available path, depending on the load-balancing and failover policy in effect. In addition to improving I/O performance, dynamic load balancing reduces management time and downtime because administrators no longer need to manage paths across logical devices. With PowerPath, configurations of paths and policies for an individual device can be changed dynamically, taking effect immediately, without any disruption to the applications.
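As an illustration of the least-burdened-path idea described above, the following Python sketch models only the decision logic; the names and structures are invented for illustration and this is not PowerPath's implementation.

```python
# Illustrative model (not PowerPath internals): dispatch each I/O to the
# least-burdened available path, skipping failed paths automatically.

class Path:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.outstanding = 0  # I/Os currently in flight on this path

def choose_path(paths):
    """Pick the available path with the fewest outstanding I/Os."""
    candidates = [p for p in paths if p.alive]
    if not candidates:
        raise IOError("no available paths to device")
    return min(candidates, key=lambda p: p.outstanding)

def submit_io(paths):
    """Dispatch one I/O; failed paths are simply never chosen."""
    p = choose_path(paths)
    p.outstanding += 1
    return p.name
```

With three paths, one of them failed, successive I/Os alternate across the two surviving paths rather than piling onto a single statically assigned one.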


PowerPath provides the following features and benefits:

◆ Multiple paths, for higher availability and performance — PowerPath supports multiple paths between a logical device and a host bus adapter (HBA, a device through which a host can issue I/O requests). Having multiple paths enables the host to access a logical device even if a specific path is unavailable. Also, multiple paths can share the I/O workload to a given logical device.

◆ Dynamic multipath load balancing — Through continuous I/O balancing, PowerPath improves a host’s ability to manage heavy I/O loads. PowerPath dynamically tunes paths for performance as workloads change, eliminating the need for repeated static reconfigurations.

◆ Proactive I/O path testing and automatic path recovery — PowerPath periodically tests failed paths to determine if they are available. A path is restored automatically when available, and PowerPath resumes sending I/O to it. PowerPath also periodically tests available but unused paths, to ensure they are operational.

◆ Automatic path failover — PowerPath automatically redirects data from a failed I/O path to an alternate path. This eliminates application downtime; failovers are transparent and non-disruptive to applications.

◆ Enhanced high availability cluster support — PowerPath is particularly beneficial in cluster environments, as it can prevent interruptions to operations and costly downtime. PowerPath’s path failover capability avoids node failover, maintaining uninterrupted application support on the active node in the event of a path disconnect (as long as another path is available).

◆ Consistent split — PowerPath allows users to perform TimeFinder consistent splits by suspending device writes at the host level for a fraction of a second while the foreground split occurs. PowerPath software provides suspend-and-resume capability that avoids inconsistencies and restart problems that can occur if a database-related BCV is split without first quiescing the database.

◆ Consistency Groups — Consistency groups are a composite group of Symmetrix devices specially configured to act in unison to maintain the integrity of a database distributed across multiple SRDF arrays controlled by an open systems host computer.


PowerPath/VE

EMC PowerPath/VE delivers PowerPath Multipathing features to optimize VMware vSphere virtual environments. With PowerPath/VE, you can standardize path management across heterogeneous physical and virtual environments, and automate optimal server, storage, and path utilization in a dynamic virtual environment. With hyper-consolidation, a virtual environment may have hundreds or even thousands of independent virtual machines running, with varying levels of I/O intensity. I/O-intensive applications can disrupt I/O from other applications; before PowerPath/VE was available, load balancing on an ESX host had to be configured manually to correct for this. Manually balancing load so that every virtual machine receives its required response time is time-consuming and logistically difficult.

PowerPath/VE works with VMware ESX and ESXi as a multipathing plug-in (MPP) that provides enhanced path management capabilities to ESX and ESXi hosts. PowerPath/VE is supported with vSphere (ESX4) only. Previous versions of ESX do not have the vStorage Pluggable Storage Architecture (PSA), which is required by PowerPath/VE.

PowerPath/VE installs as a kernel module on the vSphere host. PowerPath/VE will plug in to the vSphere I/O stack framework to bring the advanced multipathing capabilities of PowerPath - dynamic load balancing and automatic failover - to the VMware vSphere platform (Figure 18 on page 91).


Figure 18 PowerPath/VE vStorage API for multipathing plug-in

At the heart of PowerPath/VE path management is server-resident software inserted between the SCSI device-driver layer and the rest of the operating system. This driver software creates a single "pseudo device" for a given array volume (LUN), regardless of how many physical paths it appears on. The pseudo device, or logical volume, represents all physical paths to a given device. It is then used for creating virtual disks, and for raw device mapping (RDM), which is then used for application and database access.


PowerPath/VE's value fundamentally comes from its architecture and position in the I/O stack. PowerPath/VE sits above the HBA, allowing heterogeneous support of operating systems and storage arrays. Because it integrates with the I/O drivers, all I/Os run through PowerPath, making it a single point of I/O control and management. Since PowerPath/VE resides in the ESX kernel, it sits below the Guest OS, application, database, and file system levels. PowerPath/VE's unique position in the I/O stack makes it an infrastructure manageability and control point, bringing more value going up the stack.

PowerPath/VE features

PowerPath/VE provides the following features:

◆ Dynamic load balancing - PowerPath is designed to use all paths at all times. PowerPath distributes I/O requests to a logical device across all available paths, rather than requiring a single path to bear the entire I/O burden.

◆ Auto-restore of paths - Periodic auto-restore reassigns logical devices when restoring paths from a failed state. Once restored, the paths automatically rebalance the I/O across all active channels.

◆ Device prioritization - Setting a high priority for a single or several devices improves their I/O performance at the expense of the remaining devices, while otherwise maintaining the best possible load balancing across all paths. This is especially useful when there are multiple virtual machines on a host with varying application performance and availability requirements.

◆ Automated performance optimization - PowerPath/VE automatically identifies the type of storage array and sets the highest performing optimization mode by default. For Symmetrix, the mode is SymmOpt (Symmetrix Optimized).

◆ Dynamic path failover and path recovery - If a path fails, PowerPath/VE redistributes I/O traffic from that path to functioning paths. PowerPath/VE stops sending I/O to the failed path and checks for an active alternate path. If an active path is available, PowerPath/VE redirects I/O along that path. PowerPath/VE can compensate for multiple faults in the I/O channel (for example, HBAs, fiber-optic cables, Fibre Channel switch, storage array port).


◆ Monitor/report I/O statistics - While PowerPath/VE load balances I/O, it maintains statistics for all I/O for all paths. The administrator can view these statistics using rpowermt.

◆ Automatic path testing - PowerPath/VE periodically tests both live and dead paths. By testing live paths that may be idle, a failed path may be identified before an application attempts to pass I/O down it. By marking the path as failed before the application becomes aware of it, timeout and retry delays are reduced. By testing paths identified as failed, PowerPath/VE will automatically restore them to service when they pass the test. The I/O load will be automatically balanced across all active available paths.

PowerPath/VE management

PowerPath/VE uses a command set, called rpowermt, to monitor, manage, and configure PowerPath/VE for vSphere. The syntax, arguments, and options are very similar to the traditional powermt commands used on all the other PowerPath Multipathing supported operating system platforms. One significant difference is that rpowermt is a remote management tool.

Not all vSphere installations have a service console interface. In order to manage an ESXi host, customers have the option to use vCenter Server or vCLI (also referred to as VMware Remote Tools) on a remote server. PowerPath/VE for vSphere uses the rpowermt command line utility for both ESX and ESXi. PowerPath/VE for vSphere cannot be managed on the ESX host itself. There is neither a local nor remote GUI for PowerPath on ESX.

Administrators must designate a Guest OS or a physical machine to manage one or multiple ESX hosts. rpowermt is supported on Windows 2003 (32-bit) and Red Hat 5 Update 2 (64-bit).

When the vSphere host server is connected to the Symmetrix system, the PowerPath/VE kernel module running on the vSphere host will associate all paths to each device presented from the array with a pseudo device name (as discussed earlier). An example of this is shown in Figure 19 on page 94, which shows the output of rpowermt display host=x.x.x.x dev=emcpower0. Note in the output that the device has four paths and displays the optimization mode (SymmOpt = Symmetrix optimization).


Figure 19 Output of rpowermt display command on a Symmetrix VMAX device

As more VMAX Engines or Symmetrix DMX directors become available, the connectivity can be scaled as needed. PowerPath/VE supports up to 32 paths to a device. These methodologies for connectivity ensure all front-end directors and processors are utilized, providing maximum potential performance and load balancing for vSphere hosts connected to the Symmetrix VMAX/DMX storage arrays in combination with PowerPath/VE.

PowerPath/VE in vCenter Server

PowerPath/VE for vSphere is managed, monitored, and configured using rpowermt, as discussed in the previous section. This CLI-based management is common across all PowerPath platforms, and presently there is very little integration with VMware management tools. However, LUN ownership is presented in the GUI.

As seen in Figure 20 on page 95, under the ESX Configuration tab and within the Storage Devices list, the owner of the device is shown.


Figure 20 Device ownership in vCenter Server

Figure 20 shows a number of different devices owned by PowerPath. A set of claim rules is added to the vSphere PSA, which enables PowerPath/VE to manage supported storage arrays. As part of the initial installation process and claiming of devices by PowerPath/VE, the system must be rebooted. Nondisruptive installation is discussed in the following section.

Nondisruptive installation of PowerPath/VE using VMotion

Installing PowerPath/VE on a vSphere host requires a reboot. Just as with other PowerPath platforms, either the host must be rebooted or the I/O to applications running on the host must be stopped. In the case of vSphere, the migration capability built into the hypervisor allows members of the cluster to have PowerPath/VE installed without disrupting active virtual machines.

VMware VMotion technology leverages the complete virtualization of servers, storage, and networking to move an entire running virtual machine instantaneously from one server to another. VMware VMotion uses the VMware cluster file system to control access to a virtual machine's storage. During a VMotion operation, the active memory and precise execution state of a virtual machine is rapidly transmitted over a high-speed network from one physical server to another, and access to the virtual machine's disk storage is instantly switched to the new physical host. To eliminate any downtime, it is therefore advised to use VMotion to move all running virtual machines off the ESX host server before the installation of PowerPath/VE. If the ESX host server is in a fully automated High Availability (HA) cluster, put the ESX host into maintenance mode, which will immediately begin migrating all of the virtual machines off the ESX host to other servers in the cluster.

As always, it is necessary to perform a number of different checks before evacuating virtual machines from an ESX host to make sure that the virtual machines can actually be migrated. These checks include making sure that:

◆ VMotion is properly configured and functioning.

◆ The datastores containing the virtual machines are shared over the cluster.

◆ No virtual machines are using physical media from their ESX host system (that is, CD-ROMs, USB drives).

◆ The remaining ESX hosts in the cluster will be able to handle the additional load of the temporarily migrated virtual machines.

Performing these checks will help to ensure the successful (and error-free) migration of the virtual machines. Additionally, this due diligence will greatly reduce the risk of degraded virtual machine performance resulting from overloaded ESX host systems. For more information on configuring and using VMotion, refer to VMware documentation.
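The four checks above can be encoded as a simple pre-flight function. The host and cluster structures below are hypothetical stand-ins for illustration, not a VMware API.

```python
# Hypothetical pre-flight checklist (names and structures are invented, not a
# VMware API): verify a host can be evacuated before installing PowerPath/VE.

def can_evacuate(host, cluster):
    """Return (ok, reasons) for migrating all VMs off `host`."""
    reasons = []
    if not cluster["vmotion_enabled"]:
        reasons.append("VMotion is not configured/functioning")
    for vm in host["vms"]:
        if vm["datastore"] not in cluster["shared_datastores"]:
            reasons.append(f"{vm['name']}: datastore not shared across cluster")
        if vm["uses_local_media"]:  # CD-ROM, USB drive, etc.
            reasons.append(f"{vm['name']}: attached to local physical media")
    moved_load = sum(vm["load"] for vm in host["vms"])
    if moved_load > cluster["spare_capacity"]:
        reasons.append("remaining hosts cannot absorb the migrated load")
    return (not reasons, reasons)
```

Running such a check before entering maintenance mode surfaces every blocking condition at once instead of failing mid-evacuation.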

This process should be repeated on all ESX hosts in the cluster until all PowerPath installations are complete.


EMC Replication Manager

EMC Replication Manager is an EMC software application that dramatically simplifies the management and use of disk-based replication to improve the availability of users' mission-critical data and to enable rapid recovery of that data in case of corruption.

Note: Not all functionality offered by EMC Replication Manager is supported in a VMware Infrastructure environment. The EMC Replication Manager Support Matrix, available on Powerlink® (EMC's password-protected customer- and partner-only website), provides further details on supported configurations.

Replication Manager helps users manage replicas as if they were tape cartridges in a tape library unit. Replicas may be scheduled or created on demand, with predefined expiration periods and automatic mounting to alternate hosts for backups or scripted processing. Individual users with different levels of access ensure system and replica integrity. In addition to these features, Replication Manager is fully integrated with many critical applications such as DB2 LUW, Oracle, and Microsoft Exchange.

Replication Manager makes it easy to create point-in-time, disk-based replicas of applications, file systems, or logical volumes residing on existing storage arrays. It can create replicas of information stored in the following environments:

◆ Oracle databases

◆ DB2 LUW databases

◆ Microsoft SQL Server databases

◆ Microsoft Exchange databases

◆ UNIX file systems

◆ Windows file systems

◆ VMware file systems

The software utilizes a Java-based client-server architecture. Replication Manager can:

◆ Create point-in-time replicas of production data in seconds.

◆ Facilitate quick, frequent, and non-destructive backups from replicas.


◆ Mount replicas to alternate hosts to facilitate offline processing (for example, decision-support services, integrity checking, and offline reporting).

◆ Restore deleted or damaged information quickly and easily from a disk replica.

◆ Set the retention period for replicas so that storage is made available automatically.

Replication Manager has a generic storage technology interface that allows it to connect and invoke replication methodologies available on:

◆ EMC Symmetrix arrays

◆ EMC CLARiiON arrays

◆ HP StorageWorks arrays

Replication Manager uses Symmetrix API (SYMAPI) Solutions Enabler software and interfaces to the storage array’s native software to manipulate the supported disk arrays. Replication Manager automatically controls the complexities associated with creating, mounting, restoring, and expiring replicas of data. Replication Manager performs all of these tasks and offers a logical view of the production data and corresponding replicas. Replicas are managed and controlled with the easy-to-use Replication Manager console.


EMC Open Replicator

EMC Open Replicator enables distribution and/or consolidation of remote point-in-time copies between EMC Symmetrix DMX and qualified storage systems such as the EMC CLARiiON storage arrays. By leveraging the high-end Symmetrix DMX storage architecture, Open Replicator offers unmatched deployment flexibility and massive scalability.

Open Replicator can be used to provide solutions to business processes that require high-speed data mobility, remote vaulting and data migration. Specifically, Open Replicator enables customers to:

◆ Rapidly copy data between Symmetrix, CLARiiON and third-party storage arrays.

◆ Perform online migrations from qualified storage to Symmetrix DMX arrays with minimal disruption to host applications.

◆ Push a point-in-time copy of applications from Symmetrix DMX arrays to a target volume on qualified storage arrays with incremental updates.

◆ Copy from source volumes on qualified remote arrays to Symmetrix DMX volumes.

Open Replicator is tightly integrated with the EMC TimeFinder and SRDF family of products, providing enterprises with highly flexible and lower-cost options for remote protection and migration. Open Replicator is ideal for applications and environments where economics and infrastructure flexibility outweigh recovery point objective (RPO) and recovery time objective (RTO) requirements. Open Replicator enables businesses to:

◆ Provide a cost-effective and flexible solution to protect lower-tier applications.

◆ Reduce TCO by pushing or pulling data from Symmetrix DMX systems to other qualified storage arrays in conventional SAN/WAN environments.

◆ Create remote point-in-time copies of production applications for many ancillary business operations such as data vaulting.

◆ Obtain cost-effective application restore capabilities with minimal RPO/RTO impact.

◆ Comply with industry policies and government regulations.


EMC Virtual Provisioning

Virtual Provisioning (commonly known as thin provisioning) was released with the 5773 Enginuity operating environment. Virtual Provisioning allows storage to be allocated and accessed on demand from a pool of storage servicing one or many applications. This approach has multiple benefits:

◆ Enables LUNs to be "grown into" over time with no impact to the host or application as space is added to the thin pool

◆ Only delivers space from the thin pool when it is written to, that is, on-demand. Overallocated application components only use space that is written to — not requested.

◆ Provides for thin-pool wide striping and for the most part relieves the storage administrator of the burden of physical device/LUN configuration

Virtual Provisioning introduces two new devices to the Symmetrix. The first device is a thin device and the second device is a data device. These are described in the following two sections.

Thin device

A thin device is a host-accessible device that has no storage directly associated with it. Thin devices have a pre-configured size and appear to the host to have that exact capacity. Storage is allocated in chunks when a block is written to for the first time. Zeros are returned to the host for reads of chunks that have not yet been allocated.

Data device

Data devices are specifically configured devices within the Symmetrix that serve as containers for the written-to blocks of thin devices. Any number of data devices may comprise a data device pool. Blocks are allocated to the thin devices from the pool on a round-robin basis; the allocation block size is 768 KB.
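A conceptual model of this on-demand allocation follows. Only the 768 KB extent size and the round-robin placement come from the text; the class and its methods are invented for illustration.

```python
# Conceptual sketch (not Enginuity code): a thin device allocates a 768 KB
# extent from the pool only on first write, and returns zeros for reads of
# never-written chunks. Assumes a single write does not cross an extent.

EXTENT = 768 * 1024  # allocation unit in bytes

class ThinDevice:
    def __init__(self, size, pool):
        self.size = size  # capacity presented to the host
        self.pool = pool  # data devices backing this thin device
        self.map = {}     # extent index -> (data device, extent contents)
        self.rr = 0       # round-robin cursor over the pool

    def write(self, offset, data):
        idx = offset // EXTENT
        if idx not in self.map:                        # first write: allocate
            dev = self.pool[self.rr % len(self.pool)]  # round-robin pick
            self.rr += 1
            self.map[idx] = (dev, bytearray(EXTENT))
        _, buf = self.map[idx]
        start = offset % EXTENT
        buf[start:start + len(data)] = data

    def read(self, offset, length):
        idx = offset // EXTENT
        if idx not in self.map:
            return bytes(length)  # unallocated: host sees zeros
        _, buf = self.map[idx]
        start = offset % EXTENT
        return bytes(buf[start:start + length])
```

An overallocated thin device therefore consumes pool capacity only for extents that have actually been written.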

Figure 21 on page 101 depicts the components of a Virtual Provisioning configuration:


Figure 21 Virtual Provisioning components

New Symmetrix VMAX Virtual Provisioning features

Solutions Enabler 7.1 and Enginuity 5874 SR1 introduce two new features to Symmetrix Virtual Provisioning: thin pool write rebalancing and zero space reclamation. Thin pool write rebalancing provides the ability to automatically rebalance allocated extents across the data devices of a pool when new data devices are added. Zero space reclamation allows users to reclaim space from tracks of data devices that contain all zeros.

Thin pool write rebalance

Thin pool write rebalancing for Virtual Provisioning pools extends the functionality of the Virtual Provisioning feature by implementing a method to normalize the used capacity levels of data devices within a virtual data pool after new data drives are added or existing data drives are drained. This feature introduces a background optimization task that scans the used capacity levels of the data devices within a virtual pool and moves multiple track groups from the most utilized pool data devices to the least utilized pool data devices. The process can be scheduled to run only when changes to virtual pool composition make it necessary, and user controls exist to specify what utilization delta will trigger track group movement.
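The normalization idea can be sketched as follows. The track-group model and trigger handling are simplifications for illustration, not Enginuity's algorithm.

```python
# Simplified model of write rebalancing: move track groups from the most
# utilized data devices to the least utilized until the utilization spread
# falls below the user-specified trigger delta.

def rebalance(used, capacity, delta, group=1, max_moves=1000):
    """used/capacity: dicts of data device -> track groups used / total.
    Move `group` track groups at a time while max-min utilization > delta."""
    moves = []
    for _ in range(max_moves):  # bound the background task
        util = {d: used[d] / capacity[d] for d in used}
        hi = max(util, key=util.get)
        lo = min(util, key=util.get)
        if util[hi] - util[lo] <= delta or used[hi] < group:
            break
        used[hi] -= group
        used[lo] += group
        moves.append((hi, lo))
    return moves
```

For example, adding an empty data device to a pool of two nearly full ones drives a series of movements until all three sit at roughly equal utilization.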

Zero space reclamation

Zero space reclamation, or Virtual Provisioning space reclamation, provides the ability to free (also referred to as "de-allocate") storage extents found to contain all zeros. This feature is an extension of the existing Virtual Provisioning space de-allocation mechanism. Previous versions of Enginuity and Solutions Enabler allowed for reclaiming allocated (reserved but unused) thin device space from a thin pool. Administrators now have the ability to reclaim both allocated/unwritten extents as well as extents filled with host-written zeros within a thin pool. The space reclamation process is nondisruptive and can be executed with the targeted thin device ready and read/write to operating systems and applications.

Starting the space reclamation process spawns a back-end disk director (DA) task that will examine the allocated thin device extents on specified thin devices. A thin device extent is 768 KB (or 12 tracks) in size and is the default unit of storage at which allocations occur. For each allocated extent, all 12 tracks will be brought into Symmetrix cache and examined to see if they contain all zero data. If the entire extent contains all zero data, the extent will be de-allocated and added back into the pool, making it available for a new extent allocation operation. An extent that contains any non-zero data is not reclaimed.
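The scan-and-de-allocate pass might be modeled like this; the data structures are invented for illustration, not the DA task itself.

```python
# Sketch of the reclamation pass described above: examine each allocated
# 768 KB extent (12 tracks); if every byte is zero, de-allocate it and
# return the extent to the pool. Extents with any non-zero data are kept.

def reclaim_zero_space(allocations, pool_free):
    """allocations: dict of extent id -> bytes-like extent contents.
    Returns the number of extents reclaimed; mutates both arguments."""
    reclaimed = 0
    for ext_id in list(allocations):
        if not any(allocations[ext_id]):  # all tracks contain zeros
            del allocations[ext_id]       # de-allocate from the thin device
            pool_free.append(ext_id)      # available for new allocations
            reclaimed += 1
    return reclaimed
```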


EMC Virtual LUN migration

This feature offers system administrators the ability to transparently migrate host-visible LUNs between differing tiers of storage available in the Symmetrix VMAX. The storage tiers can represent differing hardware capability as well as differing tiers of protection. The LUNs can be migrated to either unallocated space (also referred to as unconfigured space) or to configured space, which is defined as existing Symmetrix LUNs that are not currently assigned to a server (existing, not-ready volumes) within the same subsystem. The data on the original source LUNs is cleared using instant VTOC once the migration has been deemed successful. The migration does not require swap or DRV space, and is nondisruptive to the attached hosts or other internal Symmetrix applications such as TimeFinder and SRDF. Figure 22 shows the valid combinations of drive types and protection types that are available for migration.

Figure 22 Virtual LUN eligibility tables

The device migration is completely transparent to the host on which an application is running since the operation is executed against the Symmetrix device; thus the target and LUN number are not changed and applications are uninterrupted. Furthermore, in SRDF environments, the migration does not require customers to re-establish their disaster recovery protection after the migration.

The Virtual LUN feature leverages the newly designed virtual RAID architecture in Enginuity 5874, which abstracts device protection from its logical representation to a server. This powerful approach allows a device to have multiple simultaneous protection types such as BCVs, SRDF, Concurrent SRDF, and spares. It also enables seamless transition from one protection type to another while servers and their associated applications and Symmetrix software are accessing the device.

The Virtual LUN feature offers customers the ability to effectively utilize SATA storage - a much cheaper, yet reliable, form of high capacity storage. It also facilitates fluid movement of data across the various storage tiers present within the subsystem - the realization of true "tiered storage in the box." Thus, Symmetrix VMAX becomes the first enterprise storage subsystem to offer a comprehensive "tiered storage in the box," ILM capability that complements the customer's tiering initiatives. Customers can now achieve varied cost/performance profiles by moving lower priority application data to less expensive storage, or conversely, moving higher priority or critical application data to higher performing storage as their needs dictate.

Specific use cases for customer applications enable the moving of data volumes transparently from tier to tier based on changing performance (moving to faster or slower disks) or availability requirements (changing RAID protection on the array). This migration can be performed transparently without interrupting those applications or host systems utilizing the array volumes and with only a minimal impact to performance during the migration.

The following sample commands show how to move two LUNs of a host environment from RAID 6 drives on Fibre Channel 15k rpm drives to Enterprise Flash drives. The new symmigrate command, which comes in EMC Solutions Enabler 7.0, is used to perform the migrate operation. The source Symmetrix hypervolume numbers are 200 and 201, and the target Symmetrix hypervolumes on the Enterprise Flash drives are A00 and A01.

1. A file (migrate.ctl) is created that contains the two LUNs to be migrated. The file has the following content:

200 A00
201 A01

2. The following command is executed to perform the migration:

symmigrate -sid 1261 -name ds_mig -f migrate.ctl establish

The ds_mig name associated with this migration can be used to interrogate the progress of the migration.


3. To inquire on the progress use the following command:

symmigrate -sid 1261 -name ds_mig query

The two host accessible LUNs are migrated without having to impact application or server availability.


EMC Fully Automated Storage Tiering (FAST)

With the release of Enginuity 5874, EMC now offers the first generation of Fully Automated Storage Tiering technology. EMC Symmetrix VMAX Fully Automated Storage Tiering (FAST) for standard provisioned environments automates the identification of data volumes for the purposes of allocating or re-allocating application data across different performance tiers within an array. FAST proactively monitors workloads at the volume (LUN) level in order to identify "busy" volumes that would benefit from being moved to higher-performing drives. FAST will also identify less "busy" volumes that could be relocated to higher-capacity drives, without existing performance being affected. This promotion/demotion activity is based on policies that associate a storage group with multiple drive technologies, or RAID protection schemes, based on the performance requirements of the application contained within the storage group. Data movement executed during this activity is performed nondisruptively, without affecting business continuity and data availability.

The primary benefits of FAST include:

◆ Automating the process of identifying volumes that can benefit from Enterprise Flash Drives and/or that can be kept on higher-capacity, less-expensive drives without impacting performance

◆ Improving application performance at the same cost, or providing the same application performance at lower cost. Cost is defined as space, energy, acquisition, management and operational expense.

◆ Optimizing and prioritizing business applications, which allows customers to dynamically allocate resources within a single array

◆ Delivering greater flexibility in meeting different price/performance ratios throughout the lifecycle of the information stored

Management and operation of FAST are provided by SMC, as well as by the Solutions Enabler Command Line Interface (SYMCLI). Detailed performance trending, forecasting, alerts, and resource utilization are provided through Symmetrix Performance Analyzer (SPA). EMC Ionix™ ControlCenter® provides advanced reporting and analysis capabilities that can be used for chargeback and capacity planning.


Chapter 3 Creating Oracle Database Clones

This chapter presents these topics:

◆ Overview
◆ Comparing recoverable and restartable copies of databases
◆ Copying the database with Oracle shutdown
◆ Copying a running database using EMC consistency technology
◆ Copying the database with Oracle in hot backup mode
◆ Replicating Oracle using Replication Manager
◆ Transitioning disk copies to Oracle database clones
◆ Oracle transportable tablespaces
◆ Cross-platform transportable tablespaces
◆ Choosing a database cloning methodology


This chapter describes the Oracle database cloning process using various EMC products. Determining which replication products to use depends on the customer's requirements and database environment. Products such as TimeFinder and Replication Manager provide an easy method of copying Oracle databases within a single Symmetrix array.

A database cloning process typically includes some or all of the following steps, depending on the copying mechanism selected and the desired usage of the database clone:

◆ Preparing the array for replication

◆ Conditioning the source database

◆ Making a copy of the database volumes

◆ Resetting the source database

◆ Presenting the target database copy to a server

◆ Conditioning the target database copy


Overview

There are many choices when cloning databases with EMC array-based replication software. Each software product has differing characteristics that affect the final deployment. A thorough understanding of the options available leads to an optimal replication choice.

An Oracle database can be in one of three data states when it is being copied:

◆ Shutdown

◆ Processing normally

◆ Conditioned using hot-backup mode

Depending on the data state of the database at the time it is copied, the database copy may be restartable or recoverable. This section begins with a discussion of recoverable and restartable database clones. It then describes various approaches to data replication using EMC software products and how each replication technique is used in combination with the different database data states to facilitate the database cloning process. Following that, database clone usage considerations are discussed, along with descriptions of the procedures used to deploy database clones across various operating system platforms.


Comparing recoverable and restartable copies of databases

The Symmetrix-based replication technologies described in this section can create two types of database copies: recoverable or restartable. A significant amount of confusion exists between these two types of database copies; a clear understanding of the differences between the two is critical to ensure the appropriate application of each method when a cloned Oracle environment is required.

Recoverable disk copies

A recoverable database copy is one in which logs can be applied to the database data state and the database is rolled forward to a point in time after the database copy is created. A recoverable Oracle database copy is intuitively easy for DBAs to understand since maintaining recoverable copies, in the form of backups, is an important DBA function. In the event of a failure of the production database, the ability to recover the database not only to the point in time when the last backup was taken, but also to roll forward subsequent transactions up to the point of failure, is a key feature of the Oracle database.

Restartable disk copies

If a copy of a running Oracle system is created using EMC consistency technology without putting the database in hot backup mode, the copy is a DBMS restartable copy. This means that when the DBMS is started on the restartable copy, it performs crash recovery. First, all transactions recorded as committed and written to the redo log, but which may not have had corresponding data pages written to the data files are rolled forward using the redo logs. Second, after the application of log information completes, Oracle rolls back any changes that were written to the database (dirty pages flushed to disk for example), but were never actually committed by a transaction. The state attained is often referred to as a transactionally consistent point in time. It is essentially the same process that the RDBMS would undergo if the server suffered an unanticipated interruption such as a power failure.

Roll-forward recovery using archive logs to a point in time after the disk copy is created is unsupported on an Oracle restartable database copy.


Copying the database with Oracle shutdown

Ideally, a copy of an Oracle database should be taken while the database is shut down. Taking a copy after the database has been shut down normally ensures a clean copy for backups to tape or for fast startup of the cloned database. In addition, a cold copy of a database is in a known transactional data state, which for some application requirements is exceedingly important. Copies of running databases are in unknown transactional data states.

While a normal shutdown is desirable, it is not always feasible with an active Oracle database. In many cases, applications and databases must be forced to completely shut down. Rarely, the shutdown abort command may be required to successfully shut down the database. For any abnormal shutdowns, it is recommended that the database be restarted allowing recovery and cleanup of the database, and then be shut down normally. This ensures a clean, consistent copy of the database is available for the copy procedure.

One primary method of creating copies of an Oracle database is through the use of the EMC local replication product, TimeFinder. TimeFinder is also used by Replication Manager to make database copies. Replication Manager facilitates the automation and management of database clones.

TimeFinder comes in three forms: TimeFinder/Mirror, TimeFinder/Clone, and TimeFinder/Snap. These were discussed in general terms in Chapter 2, “EMC Foundation Products.” Here, they are used in a database context.

Creating Oracle copies using TimeFinder/Mirror

TimeFinder/Mirror is an EMC software product that allows an additional hardware mirror to be attached to a source volume. The additional mirror is a specially designated volume in the Symmetrix configuration called a business continuance volume (BCV). The BCV is synchronized to the source volume through a process called an establish. While the BCV is established, it is not ready to all hosts. At an appropriate time, the BCV can be split from the source volume to create a complete point-in-time copy of the source data that can be used for multiple different purposes including backup, decision support, regression testing, and such.


Groups of BCVs are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Mirror operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.

Figure 23 shows how to use TimeFinder/Mirror to make a database copy of a cold Oracle database.

Figure 23 Copying a cold (shutdown) Oracle database with TimeFinder/Mirror

1. Establish the BCVs to the standard devices. This operation occurs in the background and should be executed in advance of when the BCV copy is needed.

symmir -g device_group establish -full -noprompt

Note that the first iteration of the establish needs to be a full synchronization. Subsequent iterations by default are incremental if the -full keyword is omitted. Once the command is issued, the array begins the synchronization process using only Symmetrix resources. Since this operation occurs independently from the host, the process must be interrogated to see when it completes. The command to interrogate the synchronization process is:

symmir -g device_group verify

This command will return a 0 return code when the synchronization operation is complete. Alternatively, synchronization can be verified using the following:



symmir -g device_group query

After the volumes are synchronized, the split command can be issued at any time.

2. Once BCV synchronization is complete, bring down the database to make a copy of a cold database. Execute the following Oracle commands:

sqlplus "/ as sysdba"
SQL> shutdown immediate;

3. When the database is deactivated, split the BCV mirrors using the following command:

symmir -g device_group split -noprompt

The split command takes a few seconds to process. The database copy on the BCVs is now ready for further processing.

4. The source database can now be activated and made available to users once again.

SQL> startup;
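Taken together, steps 1 through 4 can be scripted. The following dry-run sketch is an illustration, not part of the original procedure: the run wrapper echoes each command instead of executing it, and device_group is an assumed SYMCLI group name. On a host with Solutions Enabler and Oracle installed, set DRYRUN=0 and substitute the real group name.

```shell
#!/bin/sh
# Dry-run sketch of the four cold-copy steps with TimeFinder/Mirror.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

cold_copy_mirror() {
    # 1. Establish the BCVs ahead of time (first iteration is full).
    run symmir -g device_group establish -full -noprompt

    # 2. Wait for synchronization; verify returns 0 when complete.
    until run symmir -g device_group verify; do sleep 30; done

    # 3. Shut down the database for a clean, cold copy.
    run sqlplus "/ as sysdba" <<'EOF'
shutdown immediate;
EOF

    # 4. Split the BCVs, then restart the database.
    run symmir -g device_group split -noprompt
    run sqlplus "/ as sysdba" <<'EOF'
startup;
EOF
}

cold_copy_mirror
```

Running it as-is prints the command sequence in order, which is a convenient way to review the procedure before executing it for real.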

Creating Oracle copies using TimeFinder/Clone

TimeFinder/Clone is an EMC software product that copies data internally in the Symmetrix array. A TimeFinder/Clone session is created between a source data volume and a target volume. The target volume must be equal to or greater in size than the source volume. The source and target for TimeFinder/Clone sessions can be any hypervolumes in the Symmetrix configuration.

TimeFinder/Clone devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Clone operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.


Figure 24 shows how to use TimeFinder/Clone to make a copy of a cold Oracle database onto BCV devices.

Figure 24 Copying a cold Oracle database with TimeFinder/Clone

1. Create the TimeFinder/Clone pairs. The following command creates the TimeFinder/Clone pairings and protection bitmaps. No data is copied or moved at this time:

symclone -g device_group create -noprompt

Unlike TimeFinder/Mirror, the TimeFinder/Clone relationship is created and activated when it is needed. No prior synchronization of data is necessary. After the TimeFinder/Clone session is created, it can be activated consistently.

2. Once the create command is complete, shut down the database to make a cold disk copy of the database. Execute the following Oracle commands:

sqlplus "/ as sysdba"
SQL> shutdown immediate;

With the database down, activate the TimeFinder/Clone:

symclone -g device_group activate -noprompt

After an activate command, the database copy provided by TimeFinder/Clone is immediately available for further processing even though the copying of data may not have completed.



3. Activate the source database to make it available to users once again:

SQL> startup;

Databases copied using TimeFinder/Clone are subject to Copy on First Write (COFW) and Copy on Access (COA) penalties. The COFW penalty means that if a track is written to the source volume and has not yet been copied to the target volume, it must first be copied to the target volume before the write from the host is acknowledged. COA means that if a track on a TimeFinder/Clone volume is accessed before it has been copied, it must first be copied from the source volume to the target volume. This causes additional disk read activity on the source volumes and can be a source of disk contention on busy systems.
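The clone variant of the cold-copy procedure is shorter, because there is no synchronization wait between create and activate. As above, the following is a hypothetical dry-run sketch: device_group is an assumed group name and the run wrapper echoes commands by default.

```shell
#!/bin/sh
# Dry-run sketch of a cold copy with TimeFinder/Clone; note the absence
# of any synchronization wait between create and activate.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

cold_copy_clone() {
    # 1. Create the clone pairings and protection bitmaps (no data moves).
    run symclone -g device_group create -noprompt

    # 2. Shut down the database; the clone can be activated immediately.
    run sqlplus "/ as sysdba" <<'EOF'
shutdown immediate;
EOF
    run symclone -g device_group activate -noprompt

    # 3. Restart the database; the copy is usable while tracks still
    #    copy in the background (subject to COFW/COA).
    run sqlplus "/ as sysdba" <<'EOF'
startup;
EOF
}

cold_copy_clone
```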

Creating Oracle copies using TimeFinder/Snap

TimeFinder/Snap enables users to create complete copies of their data while consuming only a fraction of the disk space required by the original copy.

TimeFinder/Snap is an EMC software product that maintains space-saving, pointer-based copies of disk volumes using VDEVs and SAVDEVs. The VDEVs contain pointers either to the source data (when it is unchanged) or to the SAVDEVs (when the data has changed).

TimeFinder/Snap devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Snap operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.


Figure 25 shows how to use TimeFinder/Snap to make a copy of a cold Oracle database.

Figure 25 Copying a cold Oracle database with TimeFinder/Snap

1. Create the TimeFinder/Snap pairs. The following command creates the TimeFinder/Snap pairings and protection bitmaps. No data is copied or moved at this time:

symsnap -g device_group create -noprompt

2. Once the create operation has completed, shut down the database to make a cold TimeFinder/Snap of the DBMS. Execute the following Oracle commands:

sqlplus "/ as sysdba"
SQL> shutdown immediate;

3. With the database down, the TimeFinder/Snap copy can now be activated:

symsnap -g device_group activate -noprompt

After activating the snap, the pointer-based database copy on the VDEVs is available for further processing.

4. The source database can be started again. Use the following Oracle command:

SQL> startup;
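The same cold-copy ordering applies to TimeFinder/Snap. The dry-run sketch below is hypothetical (device_group is an assumed group name; run echoes commands unless DRYRUN=0):

```shell
#!/bin/sh
# Dry-run sketch of a cold TimeFinder/Snap copy. The ordering mirrors
# the clone case; the activated copy lives on space-saving VDEVs, and
# only tracks changed after activation consume save-area space.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

cold_copy_snap() {
    # 1. Create the snap pairings and protection bitmaps (no data moves).
    run symsnap -g device_group create -noprompt
    # 2. Shut down the database for a cold, consistent image.
    run sqlplus "/ as sysdba" <<'EOF'
shutdown immediate;
EOF
    # 3. Activate the pointer-based copy on the VDEVs.
    run symsnap -g device_group activate -noprompt
    # 4. Restart the source database.
    run sqlplus "/ as sysdba" <<'EOF'
startup;
EOF
}

cold_copy_snap
```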



Databases copied using TimeFinder/Snap are subject to a COFW penalty while the snap is activated. The COFW penalty means that if a track is written to the source volume and it has not been copied to the snap-save area, it must first be copied to the save area before the write from the host is acknowledged.


Copying a running database using EMC consistency technology

The replication of a running database system involves a database copying technique that is employed while the database is servicing applications and users. The technique uses EMC consistency technology combined with an appropriate data copy process such as TimeFinder/Mirror or TimeFinder/Clone. TimeFinder/CG allows the running database copy to be created in an instant through use of the -consistent keyword on the split or activate commands. The image created in this way is in a dependent-write consistent data state and is used as a restartable copy of the database.

Database management systems enforce a principle of dependent-write I/O; that is, no dependent write is issued until the predecessor write on which it depends has completed. This type of programming discipline is used to coordinate database and log updates within a database management system and allows those systems to be restartable in the event of a power failure. Dependent-write consistent data states are created when database management systems are exposed to power failures. Using EMC consistency technology options during the database cloning process also creates a database copy that has a dependent-write consistent data state. Chapter 2, “EMC Foundation Products,” provides more information on EMC consistency technology.

Oracle can be copied while it is running and processing transactions. The following sections describe how to copy a running Oracle database using TimeFinder technology.

Creating Oracle copies using TimeFinder/Mirror

TimeFinder/Mirror is an EMC software product that allows an additional hardware mirror to be attached to a source volume. The additional mirror is a specially designated volume in the Symmetrix configuration called a BCV. The BCV is synchronized to the source volume through a process called an establish. While the BCV is established, it is not ready to all hosts. At an appropriate time, the BCV can be split from the source volume to create a complete point-in-time copy of the source data that can be used for multiple different purposes including backup, decision support, regression testing, and such.


Groups of BCVs are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Mirror operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.

Figure 26 shows how to use TimeFinder/Mirror and EMC consistency technology to make a database copy of a running Oracle database.

Figure 26 Copying a running Oracle database with TimeFinder/Mirror

1. Establish the BCVs to the standard devices. This operation occurs in the background and should be executed in advance of when the BCV copy is needed.

symmir -g device_group establish -full -noprompt

Note that the first iteration of the establish needs to be a full synchronization. Subsequent iterations are incremental and do not need the -full keyword. Once the command is issued, the array begins the synchronization process using only Symmetrix resources. Since this operation occurs independently from the host, the process must be interrogated to see when it completes. The command to interrogate the synchronization process is:

symmir -g device_group verify

This command will return a 0 return code when the synchronization operation is complete.



Alternatively, verify synchronization using the following:

symmir -g device_group query

2. When the volumes are synchronized, issue the split command:

symmir -g device_group split -consistent -noprompt

The -consistent keyword tells the Symmetrix array to use ECA (Enginuity Consistency Assist) to momentarily suspend writes to the disks while the split is being processed. The effect of this is to create a point-in-time copy of the database on the BCVs. It is similar to the image created when there is a power outage that causes the server to crash. This image is a restartable copy. The database copy on the BCVs is then available for further processing.

Since there was no specific coordination between the database state and the execution of the consistent split, the copy is taken independent of the database activity. In this way, EMC consistency technology can be used to make point-in-time copies of multiple systems atomically, resulting in a consistent point-in-time with respect to all applications and databases included in the consistent split.
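The two steps above can be combined into a consistent split that needs no coordination with Oracle at all. The following dry-run sketch is hypothetical (device_group is an assumed SYMCLI group name; the run wrapper echoes commands unless DRYRUN=0):

```shell
#!/bin/sh
# Dry-run sketch of a consistent split of a running Oracle database.
# No shutdown and no hot backup mode is involved; ECA briefly holds
# writes while the split completes, yielding a restartable copy.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

consistent_split_mirror() {
    # Establish ahead of time (incremental after the first full run)
    # and wait until the BCVs are synchronized.
    run symmir -g device_group establish -noprompt
    until run symmir -g device_group verify; do sleep 30; done

    # Split every device in the group at one dependent-write
    # consistent instant; the database keeps processing throughout.
    run symmir -g device_group split -consistent -noprompt
}

consistent_split_mirror
```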

Creating Oracle copies using TimeFinder/Clone

TimeFinder/Clone is an EMC software product that copies data internally in the Symmetrix array. A TimeFinder/Clone session is created between a source data volume and a target volume. The target volume must be equal to or greater in size than the source volume. The source and target for TimeFinder/Clone sessions can be any hypervolumes in the Symmetrix configuration.

TimeFinder/Clone devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Clone operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.


Figure 27 shows how to use TimeFinder/Clone to make a copy of a running Oracle database onto BCV devices.

Figure 27 Copying a running Oracle database with TimeFinder/Clone

1. Create the TimeFinder/Clone pairs. The following command creates the TimeFinder/Clone pairings and protection bitmaps. No data is copied or moved at this time:

symclone -g device_group create -noprompt

Unlike TimeFinder/Mirror, the TimeFinder/Clone relationship is created and activated when it is needed. No prior copying of data is necessary.

2. After the TimeFinder/Clone relationship is created, activate it consistently:

symclone -g device_group activate -consistent -noprompt

The -consistent keyword tells the Symmetrix to use ECA to momentarily suspend writes to the source disks while the TimeFinder/Clone is being activated. The effect of this is to create a point-in-time copy of the database on the target volumes. It is a copy similar in state to that created when there is a power outage resulting in a server crash. This copy is a restartable copy. After the activate command, the database copy on the TimeFinder/Clone devices is available for further processing.



Since there was no specific coordination between the database state and the execution of the consistent split, the copy is taken independent of the database activity. In this way, EMC consistency technology can be used to make point-in-time copies of multiple systems atomically, resulting in a consistent point-in-time with respect to all applications and databases included in the consistent split.

Databases copied using TimeFinder/Clone are subject to COFW and COA penalties. The COFW penalty means that the first time a track is written to the source volume and it has not been copied to the target volume, it must first be copied to the target volume before the write from the host is acknowledged. Subsequent writes to tracks that have already been copied do not suffer from the penalty. COA means that if a track on a target volume is accessed before it has been copied, it must first be copied from the source volume to the target volume. This causes additional disk read activity to the source volumes and could be a source of disk contention on busy systems.

Creating Oracle copies using TimeFinder/Snap

TimeFinder/Snap enables users to create complete copies of their data while consuming only a fraction of the disk space required by the original copy.

TimeFinder/Snap is an EMC software product that maintains space-saving, pointer-based copies of disk volumes using Virtual Devices (VDEVs) and save devices (SAVDEVs). The VDEVs contain pointers either to the source data (when it is unchanged) or to the SAVDEVs (when the data has changed).

TimeFinder/Snap devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Snap operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.


Figure 28 shows how to use TimeFinder/Snap to make a copy of a running Oracle database.

Figure 28 Copying a running Oracle database with TimeFinder/Snap

1. Create the TimeFinder/Snap pairs. The following command creates the TimeFinder/Snap pairings and protection bitmaps. No data is copied or moved at this time:

symsnap -g device_group create -noprompt

After the TimeFinder/Snap is created, all pointers from the VDEVs are directed at the source volumes. No data has been copied at this point. The snap can be activated consistently using the consistent activate command.

2. Once the create operation has completed, execute the activate command with the -consistent option to perform the consistent snap:

symsnap -g device_group activate -consistent -noprompt

The -consistent keyword tells the Symmetrix array to use ECA to momentarily suspend writes to the disks while the activate command is being processed. The effect of this is to create a point-in-time copy of the database on the VDEVs. It is similar to the state created when there is a power outage that causes the server to crash. This image is a restartable copy. The database copy on the VDEVs is available for further processing.



Since there was no specific coordination between the database state and the execution of the consistent split, the copy is taken independent of the database activity. In this way, EMC consistency technology can be used to make point-in-time copies of multiple systems atomically, resulting in a consistent point in time with respect to all applications and databases included in the consistent split.

Databases copied using TimeFinder/Snap are subject to a COFW penalty while the snap is activated. The COFW penalty means that if a track is written to the source volume and it has not been copied to the snap-save area, it must first be copied to the snap-save area before the write from the host is acknowledged.


Copying the database with Oracle in hot backup mode

For many years, Oracle has supported hot backup mode, which provides the capability to use split-mirroring technology while the database is online and create a recoverable database on the copied devices. During this process, the database is fully available for reads and writes. However, instead of writing change vectors (such as the rowid and the before and after images of the data) to the online redo log, entire blocks of data are written. These data blocks are then used to overwrite any potential inconsistencies in the data files. While this enables the database to recover itself and create a consistent point-in-time image after recovery, it also degrades performance while the database is in hot backup mode.

An important consideration when using hot backup mode to create a copy of the database is the need to split the archive logs separately from the database. This is because Oracle must recover itself to the point after all of the tablespaces are taken out of hot backup mode. If the hypervolumes containing the archive logs are split at the same time as the data volumes, the marker indicating the tablespaces are out of hot backup mode will not be found in the last archive log. As such, the archive logs must be split after the database is taken out of hot backup mode, so the archive log devices (and generally the redo logs as well) must be separate from the other data files.

The following sections describe the steps needed to put tablespaces or the entire database into hot backup mode and take it out again. Appendix D, “Sample Database Cloning Scripts,” provides a sample script showing how hot backup mode is used to create a recoverable Oracle database image.

Putting the tablespaces or database into hot backup mode

To create a consistent image of Oracle while in hot backup mode, each of the tablespaces in the database must be put into hot backup mode before copying can be performed. The following command connects to the database instance and issues the commands to put the tablespaces (in this case, SYSTEM, DATA, and INDEXES) into hot backup mode:

sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter tablespace DATA begin backup;
SQL> alter tablespace INDEXES begin backup;


SQL> alter tablespace SYSTEM begin backup;

Alternatively, with Oracle10g, the entire database can be put into hot backup mode with:

sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;

When these commands are issued, data blocks for the tablespaces are flushed to disk and the datafile headers are updated with the last SCN. Further updates of the SCN to the datafile headers are not performed. When these files are copied, the nonupdated SCN in the datafile headers signifies to the database that recovery is required.

Taking the tablespaces or database out of hot backup mode

To take the tablespaces out of hot backup mode, connect to the database and issue the following commands:

sqlplus "/ as sysdba"
SQL> alter tablespace DATA end backup;
SQL> alter tablespace INDEXES end backup;
SQL> alter tablespace SYSTEM end backup;
SQL> alter system archive log current;

When these commands complete, the database is returned to its normal operating state.

Alternatively, with Oracle10g, the entire database can be taken out of hot backup mode with:

sqlplus "/ as sysdba"
SQL> alter database end backup;
SQL> alter system archive log current;

The alter system archive log current command ensures that the marker indicating the tablespaces have been taken out of hot backup mode is captured in an archive log.
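The hot backup commands and the device splits combine into one sequence whose ordering matters: the data devices are split while the database is in backup mode, and the log/archive devices are split only after end backup and a final log switch. The following dry-run sketch is hypothetical (data_group and log_group are assumed SYMCLI group names whose BCVs are assumed already established and synchronized; the run wrapper echoes commands unless DRYRUN=0):

```shell
#!/bin/sh
# Dry-run sketch of a recoverable copy using hot backup mode.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

hot_backup_copy() {
    # Put the whole database into hot backup mode (Oracle10g syntax).
    run sqlplus "/ as sysdba" <<'EOF'
alter system archive log current;
alter database begin backup;
EOF

    # Split the data-file devices while in backup mode.
    run symmir -g data_group split -noprompt

    # End backup mode and force the end-backup markers into an archive log.
    run sqlplus "/ as sysdba" <<'EOF'
alter database end backup;
alter system archive log current;
EOF

    # Only now split the (separate) archive/redo log devices.
    run symmir -g log_group split -noprompt
}

hot_backup_copy
```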

Creating Oracle copies using TimeFinder/Mirror

TimeFinder/Mirror is an EMC software product that allows an additional hardware mirror to be attached to a source volume. The additional mirror is a specially designated volume in the Symmetrix configuration, called a business continuance volume (BCV). The BCV is synchronized to the source volume through a process called an establish. While the BCV is established, it is presented as Not Ready to all hosts. At an appropriate time, the BCV can be split from the source volume to create a complete point-in-time copy of the source data that can be used for multiple purposes, including backup, decision support, and regression testing.

Groups of BCVs are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Mirror operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B provides examples of these commands.

Figure 29 shows how to use TimeFinder/Mirror to make a copy of an Oracle database in hot backup mode.

Figure 29 Copying an Oracle database in hot backup mode with TimeFinder/Mirror

1. Establish the BCVs to the standard devices. This operation occurs in the background and should be executed in advance of when the BCV copy is needed.

symmir -g data_group establish -full -noprompt
symmir -g log_group establish -full -noprompt

Note that the first iteration of the establish needs to be a "full" synchronization. Subsequent iterations are incremental and do not need the -full keyword. Once the command is issued, the array begins the synchronization process using only Symmetrix resources. Since this is asynchronous to the host, the process must be interrogated to see when it is finished. The command to interrogate the synchronization process is:

symmir -g data_group verify
symmir -g log_group verify

This command returns 0 when the synchronization operation is complete.
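
Scripts typically poll the verify command until it succeeds. The following is a minimal sketch, not a SYMCLI feature: the function name wait_for_sync and the POLL_INTERVAL variable are assumptions, and the command to poll is passed in as arguments so the loop itself stays generic.

```shell
# Hypothetical polling helper: re-run a verify-style command until it
# returns 0 (synchronized), sleeping between attempts.
# Real usage would be e.g.:  wait_for_sync symmir -g data_group verify
wait_for_sync() {
    while ! "$@"; do
        # POLL_INTERVAL is an assumed knob, not a SYMCLI setting
        sleep "${POLL_INTERVAL:-30}"
    done
}
```

With the device groups above, `wait_for_sync symmir -g data_group verify` blocks until the data group reports synchronized.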

2. When the volumes are synchronized, put the database in hot backup mode. Connect to the database and issue the following commands:

sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;

3. Execute a split of the standard and BCV relationship:

symmir -g data_group split -noprompt

The -consistent keyword is not used here as consistency is being provided by the database. The Data BCV(s) now contain an inconsistent copy of the database that can be made consistent through recovery procedures using the archive logs. This is a recoverable database. Usage of recoverable copies of databases is described in “Recoverable disk copies” on page 110.

4. After the replicating process completes, take the database (or tablespaces) out of hot backup mode on the source database:

SQL> alter database end backup;
SQL> alter system archive log current;

5. After tablespaces are taken out of hot backup mode and a log switch is performed, split the Log BCV devices from their source volumes:

symmir -g log_group split -noprompt
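
The five steps above can be collected into one shell function. This is a sketch only, under these assumptions: the device groups are named data_group and log_group as in the text, symmir and sqlplus are on the path, the initial -full establish has already been done once, and error handling is omitted.

```shell
# Send SQL statements from stdin to the local instance as SYSDBA
run_sql() { sqlplus -s "/ as sysdba"; }

clone_hot_backup() {
    # 1. Establish (incremental; the first-ever run needs -full)
    symmir -g data_group establish -noprompt
    symmir -g log_group establish -noprompt
    # Poll until both groups report synchronized (verify returns 0)
    until symmir -g data_group verify && symmir -g log_group verify; do
        sleep 30
    done
    # 2. Put the database in hot backup mode
    printf '%s\n' 'alter system archive log current;' \
                  'alter database begin backup;' | run_sql
    # 3. Split the data BCVs while in hot backup mode
    symmir -g data_group split -noprompt
    # 4. End hot backup mode and archive the end-of-backup marker
    printf '%s\n' 'alter database end backup;' \
                  'alter system archive log current;' | run_sql
    # 5. Split the log BCVs so the archived marker is on the copy
    symmir -g log_group split -noprompt
}
```

The function names and structure are illustrative; the symmir and sqlplus command lines are the ones shown in the steps above.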

Creating Oracle copies using TimeFinder/Clone

TimeFinder/Clone is an EMC software product that copies data internally in the Symmetrix array. A TimeFinder/Clone session is created between a source data volume and a target volume. The target volume needs to be equal to or greater in size than the source volume. The source and target for TimeFinder/Clone sessions can be any hypervolumes in the Symmetrix configuration.

TimeFinder/Clone devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Clone operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B provides examples of these commands.

Figure 30 shows how to use TimeFinder/Clone to make a copy of an Oracle database in hot backup mode onto BCV devices.

Figure 30 Copying an Oracle database in hot backup mode with TimeFinder/Clone

1. Create the TimeFinder/Clone pairs. The following command creates the TimeFinder/Clone pairings and protection bitmaps. No data is copied or moved at this time:

symclone -g data_group create -noprompt
symclone -g log_group create -noprompt

Unlike TimeFinder/Mirror, the TimeFinder/Clone relationship is created and activated when it is needed. No prior copying of data is necessary.

2. Place the Oracle database in hot backup mode:

sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;


3. Execute an "activate" of the TimeFinder/Clone:

symclone -g data_group activate -noprompt

The -consistent keyword is not used here as consistency is being provided by the database. The data-clone devices now contain an inconsistent copy of the database that can be made consistent through recovery procedures using the archive logs. This is a recoverable database. “Enabling a cold database copy” on page 140 describes use of recoverable copies of databases.

4. After the replicating process completes, take the database (or tablespaces) out of hot backup mode on the source database:

SQL> alter database end backup;
SQL> alter system archive log current;

5. After the tablespaces are taken out of hot backup mode and a log switch is performed, activate the log clone devices:

symclone -g log_group activate -noprompt

Databases copied using TimeFinder/Clone are subject to copy-on-first-write (COFW) and copy-on-access (COA) penalties. The COFW penalty means that the first time a track is written on the source volume, if it has not yet been copied to the target volume, it must first be copied before the write from the host is acknowledged. Subsequent writes to tracks already copied do not incur the penalty. COA means that if a track on a target volume is accessed before it has been copied, it must first be copied from the source volume to the target volume. This causes additional disk read activity on the source volumes and could be a source of disk contention on busy systems.
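
The COFW accounting can be illustrated with a toy model. This is purely illustrative, not SYMCLI behavior: given a sequence of track numbers being written, only the first write to each track pays the copy.

```shell
# Toy illustration of copy-on-first-write: the first write to a track
# triggers a copy to the target before the host write is acknowledged;
# later writes to the same track see no penalty.
simulate_cofw() {
    copied=""
    for track in "$@"; do
        case " $copied " in
            *" $track "*) echo "track $track: write (no penalty)" ;;
            *) copied="$copied $track"
               echo "track $track: copy to target, then write (COFW)" ;;
        esac
    done
}
```

For example, `simulate_cofw 7 9 7` reports the COFW copy on the first write to each track only.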

Creating Oracle copies using TimeFinder/Snap

TimeFinder/Snap enables users to create complete copies of their data while consuming only a fraction of the disk space required by the original copy.

TimeFinder/Snap is an EMC software product that maintains space-saving pointer-based copies of disk volumes using VDEVs and SAVDEVs. The VDEVs contain pointers either to the source data (when it is unchanged) or to the SAVDEVs (when the data has changed).


TimeFinder/Snap devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Snap operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, "Sample SYMCLI Group Creation Commands," provides examples of these commands.

Figure 31 shows how to use TimeFinder/Snap to make a copy of an Oracle database in hot backup mode.

Figure 31 Copying an Oracle database in hot backup mode with TimeFinder/Snap

1. Create the TimeFinder/Snap pairs. The following command creates the TimeFinder/Snap pairings and protection bitmaps. No data is copied or moved at this time:

symsnap -g data_group create -noprompt
symsnap -g log_group create -noprompt

Unlike TimeFinder/Mirror, the snap relationship is created and activated when it is needed. No prior copying of data is necessary. The create operation establishes the relationship between the standard devices and the VDEVs and it also creates the protection metadata.

2. After the snaps are created, place the Oracle database in hot backup mode:

sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;

(Figure 31 depicts the controlling host writing to the STD device, device pointers from the VDEV referencing the original data on the STD, data copied to the save area on copy-on-write, and the target host accessing the copy through the VDEV.)


3. Execute an "activate" of the TimeFinder/Snap for the data devices:

symsnap -g data_group activate -noprompt

The -consistent keyword is not used here because consistency is being provided by the database. The VDEVs (and possibly SAVDEVs) contain a pointer-based copy of the database while it is in hot backup mode. This is a recoverable database copy. “Enabling a cold database copy” on page 140 describes use of recoverable copies of Oracle databases.

4. Once the snap activate process completes, take the database (or tablespaces) out of hot backup mode on the source database:

SQL> alter database end backup;
SQL> alter system archive log current;

5. After the database is taken out of hot backup mode and a log switch is performed, activate the Log snap devices:

symsnap -g log_group activate -noprompt

Databases copied using TimeFinder/Snap are subject to a COFW penalty while the snap is activated. The COFW penalty means that if a track is written to the source volume and it has not been copied to the snap save area, it must first be copied to the save area before the write from the host is acknowledged.


Replicating Oracle using Replication Manager

EMC Replication Manager is used to manage and control the TimeFinder copies of an Oracle database. The RM product has both a GUI and a command line interface, and provides the capability to:

◆ Autodiscover the standard volumes holding the database.

◆ Identify the pathname for all database files.

◆ Identify the location of the archive log directories.

◆ Identify the location of the database binaries, dump files, and such.

Using this information, RM can set up TimeFinder Groups with BCVs or VDEVs, schedule TimeFinder operations and manage the creation of database copies, expiring older versions as needed.

Figure 32 demonstrates the steps performed by Replication Manager using TimeFinder/Mirror to create a database copy to use for multiple purposes.

Figure 32 Using Replication Manager to make a TimeFinder copy of Oracle

Replication Manager does the following:

1. Logs in to the database and discovers the locations of all the datafiles and logs on the Symmetrix devices. Because this discovery is dynamic, the procedure does not need to change when extra volumes are added to the database.



2. Establishes the standards to the BCVs in the Symmetrix array. Replication Manager polls the progress of the establish process until the BCVs are synchronized, and then moves on to the next step.

3. Performs a log switch to flush changes to disk, minimizing recovery required of the copied database.

4. Puts the Oracle database in hot backup mode, discussed in “Putting the tablespaces or database into hot backup mode” on page 125.

5. Issues a TimeFinder split, to detach the Data BCVs from the standard devices.

6. Takes the Oracle database out of hot backup mode, as discussed in “Taking the tablespaces or database out of hot backup mode” on page 126.

7. Performs another log switch to flush the end of hot backup mode marker from the online redo logs to an archive log.

8. Creates a backup copy of the control file.

9. Copies the backup control file and additional catalog information to the Replication Manager host.

10. Copies the database archive logs to the Replication Manager host for use in the restore process.


Transitioning disk copies to Oracle database clones

The method employed to enable a database copy for use depends on how the copy was created. A database copy created while the source database was in hot backup mode requires Oracle recovery before the database can be opened for normal processing. This requires that the database be started in mount mode and recovered through the recover database command to the point where the database was taken out of hot backup mode (or beyond, if desired).

If a copy of a running database was created using EMC consistency technology without using hot backup mode, it can only be restarted. Oracle does not support rolling such a copy forward by applying logs to a point in time after the copy was created.

A database copy created with EMC consistency technology should also be restarted on another server, one different from the server that sees the source database. This is because the source and target databases have the same datafile paths and the same database ID, and therefore cannot coexist on the same server. Oracle provides a mechanism to change the database ID through a utility called nid. Additionally, the paths to the datafiles can be changed.

The following sections describe how to restart a database copy created from a cold database, with the database running using EMC consistency technology, and also a database copy made while in hot backup mode. Details of how to deal with host-related issues when processing the database copy are discussed first.

Host considerations

One of the primary considerations when starting a copy of an Oracle database is whether to present it back to the same host or mount the database on another host. While it is significantly simpler to restart a database on a secondary host, it is still possible to restart a copy of the database on the same host with only a few extra steps. These extra steps, mounting the set of copied volumes back to the same host, changing the mount points, and relocating the datafiles, are described next.


Mounting a set of copied volumes to the same host

Before the database can be presented back to the same host, the hypervolumes must be presented to it. Additionally, operating system and logical volume specific commands must be run to make the volumes and file systems (if applicable) available. Appendix C, "Related Host Operation," provides detailed procedures by operating system.

Relocating a database copy

Relocating the copy of an Oracle database is a requirement if mounting the database back to the same server that sees the source database, or if database datafile locations changed for any reason. This is accomplished by writing a backup control file to trace and editing the file written to the $ORACLE_HOME/rdbms/log directory (by default). The edited script is then used to re-create the control file for the database copy. The new control file contains a listing of the new paths for the datafiles and redo logs. With the addition of an initialization parameter file, the new database can be discovered and started for use.

The following steps are required to generate a file to use to mount a database copy on the same or new host with new datafile locations.

1. Generate the file containing the script to re-create the control file.

sqlplus "/ as sysdba"
SQL> alter database backup controlfile to trace;

2. Find and edit the script. The trace file is written to background_dump_dest (by default the $ORACLE_HOME/rdbms/log directory on a UNIX system) and is named in the form SID_ora_nnnnn.trc, where nnnnn is a number. Copy this to a new file (create_control.sql, for example) before editing it. The following is an example of a backup control file written to trace:

Dump file /oracle/oracle9i/rdbms/log/test_ora_20748.trc
Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production
ORACLE_HOME = /oracle/oracle9i
System name:    SunOS
Node name:      l82bk050
Release:        5.8
Version:        Generic_108528-29
Machine:        sun4u
Instance name: test


Redo thread mounted by this instance: 1
Oracle process number: 10
Unix process pid: 20748, image: oracle@l82bk050 (TNS V1-V3)

*** SESSION ID:(9.6785) 2005-12-11 17:04:18.454
*** 2005-12-11 17:04:18.453
# The following are current System-scope REDO Log Archival
# related parameters and can be included in the database
# initialization file.
#
# LOG_ARCHIVE_DEST=''
# LOG_ARCHIVE_DUPLEX_DEST=''
#
# LOG_ARCHIVE_FORMAT=T%TS%S.ARC
# REMOTE_ARCHIVE_ENABLE=TRUE
# LOG_ARCHIVE_START=TRUE
# LOG_ARCHIVE_MAX_PROCESSES=2
# STANDBY_FILE_MANAGEMENT=MANUAL
# STANDBY_ARCHIVE_DEST=?/dbs/arch
# FAL_CLIENT=''
# FAL_SERVER=''
#
# LOG_ARCHIVE_DEST_1='LOCATION=/oracle/archive'
# LOG_ARCHIVE_DEST_1='MANDATORY NOREOPEN NODELAY'
# LOG_ARCHIVE_DEST_1='ARCH NOAFFIRM SYNC'
# LOG_ARCHIVE_DEST_1='NOREGISTER NOALTERNATE NODEPENDENCY'
# LOG_ARCHIVE_DEST_1='NOMAX_FAILURE NOQUOTA_SIZE NOQUOTA_USED'
# LOG_ARCHIVE_DEST_STATE_1=ENABLE
#
# Below are two sets of SQL statements, each of which creates
# a new control file and uses it to open the database. The
# first set opens the database with the NORESETLOGS option and
# should be used only if the current versions of all online
# logs are available. The second set opens the database with
# the RESETLOGS option and should be used if online logs are
# unavailable.
# The appropriate set of statements can be copied from the
# trace into a script file, edited as necessary, and executed
# when there is a need to re-create the control file.
#
# Set #1. NORESETLOGS case
#
# The following commands will create a new control file and
# use it to open the database. Data used by the recovery
# manager will be lost. Additional logs may be required for
# media recovery of offline datafiles. Use this only if the
# current version of all online logs are available.
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "TEST" NORESETLOGS NOARCHIVELOG
-- SET STANDBY TO MAXIMIZE PERFORMANCE
    MAXLOGFILES 16
    MAXLOGMEMBERS 2


    MAXDATAFILES 30
    MAXINSTANCES 2
    MAXLOGHISTORY 224
LOGFILE
  GROUP 1 (
    '/oracle/oradata/test/oraredo1a.dbf',
    '/oracle/oradata/test/oraredo2a.dbf'
  ) SIZE 10M,
  GROUP 2 (
    '/oracle/oradata/test/oraredo1b.dbf',
    '/oracle/oradata/test/oraredo2b.dbf'
  ) SIZE 10M,
  GROUP 3 (
    '/oracle/oradata/test/oraredo1c.dbf',
    '/oracle/oradata/test/oraredo2c.dbf'
  ) SIZE 10M
-- STANDBY LOGFILE
DATAFILE
  '/oracle/oradata/test/orasys.dbf',
  '/oracle/oradata/test/oraundo.dbf',
  '/oracle/oradata/test/orausers.dbf'
CHARACTER SET US7ASCII;
# Recovery is required if any of the datafiles are restored
# backups, or if the last shutdown was not normal or
# immediate.
RECOVER DATABASE
# Database can now be opened normally.
ALTER DATABASE OPEN;
# Commands to add tempfiles to temporary tablespaces.
# Online tempfiles have complete space information.
# Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP_TS ADD TEMPFILE '/oracle/oradata/test/oratest.dbf'
  SIZE 524288000 REUSE AUTOEXTEND OFF;
# End of tempfile additions.
#
# Set #2. RESETLOGS case
#
# The following commands will create a new control file and
# use it to open the database. The contents of online logs
# will be lost and all backups will be invalidated. Use this
# only if online logs are damaged.
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "TEST" RESETLOGS NOARCHIVELOG
-- SET STANDBY TO MAXIMIZE PERFORMANCE
    MAXLOGFILES 16
    MAXLOGMEMBERS 2
    MAXDATAFILES 30
    MAXINSTANCES 2
    MAXLOGHISTORY 224


LOGFILE
  GROUP 1 (
    '/oracle/oradata/test/oraredo1a.dbf',
    '/oracle/oradata/test/oraredo2a.dbf'
  ) SIZE 10M,
  GROUP 2 (
    '/oracle/oradata/test/oraredo1b.dbf',
    '/oracle/oradata/test/oraredo2b.dbf'
  ) SIZE 10M,
  GROUP 3 (
    '/oracle/oradata/test/oraredo1c.dbf',
    '/oracle/oradata/test/oraredo2c.dbf'
  ) SIZE 10M
-- STANDBY LOGFILE
DATAFILE
  '/oracle/oradata/test/orasys.dbf',
  '/oracle/oradata/test/oraundo.dbf',
  '/oracle/oradata/test/orausers.dbf'
CHARACTER SET US7ASCII;
# Recovery is required if any of the datafiles are restored
# backups, or if the last shutdown was not normal or
# immediate.
RECOVER DATABASE USING BACKUP CONTROLFILE
# Database can now be opened zeroing the online logs.
ALTER DATABASE OPEN RESETLOGS;
# Commands to add tempfiles to temporary tablespaces.
# Online tempfiles have complete space information.
# Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP_TS ADD TEMPFILE '/oracle/oradata/test/oratest.dbf'
  SIZE 524288000 REUSE AUTOEXTEND OFF;
# End of tempfile additions.

After deciding whether to open the database with RESETLOGS and editing the file appropriately, the datafile locations can be changed. When run, the instance searches the new locations for the Oracle datafiles.

sqlplus "/ as sysdba"
SQL> @create_control

This creates the new control file and opens the database, relocating the datafiles to the newly specified locations.
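
The find-and-copy portion of step 2 can be sketched in shell. This is a hypothetical helper, not part of any Oracle or EMC tool: the function name is invented, and the default directory and output filename follow the description above.

```shell
# Hypothetical helper: copy the newest trace file containing a
# CREATE CONTROLFILE statement out of the dump directory for editing.
extract_controlfile_trace() {
    dump_dir=${1:-$ORACLE_HOME/rdbms/log}   # background_dump_dest default
    out=${2:-create_control.sql}            # name suggested in the text
    newest=""
    # Find candidate trace files and keep the most recently modified one
    for f in $(grep -l 'CREATE CONTROLFILE' "$dump_dir"/*.trc 2>/dev/null); do
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] || { echo "no matching trace file in $dump_dir" >&2; return 1; }
    cp "$newest" "$out"
    echo "$newest"
}
```

The copy is then edited by hand as described above before being run as create_control.sql.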

Changing the SID, DBNAME, and DBID

A normal part of presenting a database clone to the same or a new host is changing identifiers associated with the database. These identifiers include:

◆ SID - System ID is used to distinguish Oracle instances on a host.


◆ DBNAME - Database Name defined in the initialization parameter at database creation and is written to the control file. It specifies the service name of the database and should be the same as that defined in the tnsnames.ora file.

◆ DBID - Database ID, which is the internal database unique identifier.

Changing the SID is a simple procedure. It is accomplished by shutting down the database, changing the initialization parameter ORACLE_SID, and restarting the database. The new SID will be used to name the processes that are initiated as part of the Oracle startup procedures. An additional step needed is to create a unique init<SID>.ora parameter file in the ORACLE_HOME/dbs directory.
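
The SID change can be sketched as follows. This is a hedged illustration only: the function name is invented, it assumes the instance has already been shut down, and it simply copies the old parameter file as a starting point for the new init<SID>.ora (which would then be edited as needed).

```shell
# Hypothetical sketch of the SID change described above:
# export the new ORACLE_SID and create a unique init<SID>.ora
# under $ORACLE_HOME/dbs.
change_sid() {
    old_sid=$1
    new_sid=$2
    ORACLE_SID=$new_sid
    export ORACLE_SID
    # Copy the old parameter file as the starting point; edit afterward
    cp "$ORACLE_HOME/dbs/init${old_sid}.ora" \
       "$ORACLE_HOME/dbs/init${new_sid}.ora"
}
```

After, for example, `change_sid PROD CLONE`, restarting the instance picks up initCLONE.ora and names its background processes after the new SID.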

Changing the DBNAME and DBID is more complicated since they are written in the control file and into the database itself. Oracle provides the DBNEWID utility (invoked as nid) for changing the DBNAME and DBID. In addition, the DBNAME can be changed by re-creating the control file using the procedures outlined in "Relocating a database copy" on page 136. Whether changing the DBNAME, the DBID, or both, these steps are performed after any required recovery procedures are completed by the database.

Enabling a cold database copy

A cold database copy is a database image taken while the copied database is shut down. Such a copy is transactionally consistent when it is restarted; no crash recovery is required, so the copy requires minimal time to restart and open.

The following steps describe the process for restarting a cold database copy. It assumes that either the database is being started on another host, or that the processes listed in “Host considerations” on page 135 have completed.

1. Use the following SYMCLI command to verify that the appropriate hypers are available to the host:

syminq


2. After the appropriate devices are available to the host, make the operating system aware of the devices. In addition, import the volume or disk groups and mount any file systems. This is operating-system dependent and is discussed in Appendix C, “Related Host Operation.”

3. Since the database was shut down when the copy was made, no special processing is required to restart the database. Start the database as follows:

sqlplus "/ as sysdba"
SQL> startup;

Alternatively, if additional archive logs, redo logs, and a valid control file from the copied database are available to roll the database forward to a specified point in time, use the following instead:

sqlplus "/ as sysdba"
SQL> startup mount;
SQL> recover database;

Enabling a restartable database copy

If a restartable database copy is started on the same host, the Oracle SID and DBID must be changed. If it is started on a different server, the SID and DBID can be left the same or changed. In most cases, it is appropriate to change the SID and DBID so that connections through Oracle Net are uniquely identified. The following steps show the process for enabling a restartable database copy.

1. Use the following SYMCLI command to verify that the appropriate hypers are available to the host:

syminq

2. After the appropriate devices are available to the host, make the operating system aware of the devices. In addition, import the volume or disk groups and mount any file systems. This is operating-system dependent and is discussed in Appendix C, "Related Host Operation."

3. Since the copy was created with EMC consistency technology while the database was running, Oracle performs standard crash recovery automatically when the instance is started; no additional processing is required. Start the database as follows:

sqlplus "/ as sysdba"
SQL> startup;

Enabling a hot backup database copy

A database copy made with the database in hot backup mode by definition requires recovery before the database can be opened for user transactions. The following steps are used to recover and open a database copy made with the database in hot backup mode.

1. Use the following SYMCLI command to verify that the appropriate hypers are available to the host:

syminq

2. After the appropriate devices are available to the host, make the operating system aware of the devices. In addition, import the volume or disk groups and mount any file systems. This is operating-system dependent and is discussed in Appendix C, “Related Host Operation.”

3. Since the copy was made with the database in hot backup mode, media recovery is required before the database can be opened. Start and recover the database as follows:

sqlplus "/ as sysdba"
SQL> startup mount;
SQL> recover database;

After the required logs are applied to make the database consistent to the point where it was taken out of hot backup mode, the copy can be opened for user transactions.


Oracle transportable tablespaces

A number of customers require Oracle objects, specifically tablespaces, to be cloned across Oracle instances. Reasons for cloning Oracle objects include building development environments, providing system integration and test environments, building read-only decision support environments, or satisfying a number of other business-related requirements for Oracle data. Oracle provides a mechanism for moving tablespace objects from one database to another through a method called transportable tablespaces. In conjunction with EMC replication technologies such as TimeFinder, Open Replicator, or SRDF, pieces of an Oracle database can be cloned and attached to an alternate database for a variety of business applications.

The transportable tablespace feature was introduced in Oracle8i to facilitate the bulk movement of data between two Oracle databases running the same operating system and version of Oracle. Additionally, new functionality built into Oracle10g allows tablespaces to be transported between different operating system platforms (such as a Sun tablespace to a Windows environment). This enhancement allows greater flexibility when migrating from one operating system to another or when creating test/development systems on lower-cost operating environments.

Benefits and uses of transportable tablespaces

Transportable tablespaces are moved by placing the tablespaces to be transported into read-only mode at the Oracle level, and then copying the associated operating system files by an external means (such as cp, dd, ftp, and so on) to the host where the target database is located. Previous methods of transferring the data, such as export and import, required significant time and effort to migrate the data to the new instance. Transportable tablespaces provide a simple mechanism for tablespace copies to be incorporated into a second database environment.

There are myriad uses for Oracle transportable tablespaces. For example, customers may need to access data from their OLTP database to populate their data warehouse system. Transportable tablespaces provide a mechanism to migrate the required data directly into the data warehouse environment. Another example is migrating periodic data (for example, monthly sales data) from high-end mirrored storage to cheaper RAID 5 volumes as the data ages and access requirements change. Transportable tablespaces allow data to be moved quickly and easily between the RAID types.

Implementation of transportable tablespaces with EMC TimeFinder and SRDF

When implementing Oracle transportable tablespaces on a database running on a Symmetrix array, replication software such as TimeFinder or SRDF may be used to create a clone of the tablespace(s) to be transported to the target environment. Creating copies of the data in this manner has the advantage that no host cycles are used in the cloning process. Additionally, following the initial full-copy synchronization, incremental replication, in which only changed tracks are copied, can be used when TimeFinder or SRDF is the replication method. This significantly reduces the time required to copy the datafiles to the target environment. Finally, this process is easily scripted or managed through EMC management products like Replication Manager. The next section provides an example of transportable tablespaces with the EMC TimeFinder software.

Transportable tablespace example

Before implementing transportable tablespaces, a few requirements must be addressed. For example, the source and target databases must use the same character set and national character set. Also, a transportable tablespace cannot be mounted on a database that already contains a tablespace with the same name. Any users that own objects in the tablespace must either exist in the target database or be created there; objects can be transferred to other users if required. Additionally, starting with Oracle9i, multiple block sizes can exist in the database. If the block size used for the tablespace is not the default size for the target database, a buffer cache of the size used by the transportable tablespace must be allocated in the target.

The major limitation with transportable tablespaces, however, is that the tablespace set (the group of tablespaces to be migrated) must be self-contained. Self-contained means that indexes or referential integrity constraints on any tablespace object, for example the primary key index on a table, must be included as a part of the transportable tablespace set. Thus, in a typical customer environment where table data and indexes are separated into their own tablespaces, either both tablespaces must be a part of the transportable tablespace set, or the indexes must be dropped before the transportable tablespace set is created.

Oracle provides a procedure to verify that the transportable tablespace set is self-contained. This procedure, called TRANSPORT_SET_CHECK, is a part of the DBMS_TTS package. This package is created as a part of the dbms_plugts script, which is automatically run as a part of catproc. The role EXECUTE_CATALOG_ROLE must be assigned to any user that executes this procedure.

The following is an example of the steps needed to verify that a set of tablespaces can transport successfully. This example uses two tablespaces, DATA1 and INDEX1, and verifies that they can be successfully transported to a target database.

1. Determine the character set in use by the source and target databases:

SELECT *
FROM v$nls_parameters
WHERE parameter = 'NLS_CHARACTERSET';

The output from this SQL command on both the source and target databases should be identical. For example:

PARAMETER            VALUE
-------------------  -------------------
NLS_CHARACTERSET     WE8ISO8859P1

2. Verify that tablespaces with the same names do not already exist in the target database:

SELECT tablespace_name
FROM dba_tablespaces;

TABLESPACE_NAME
-------------------------------
SYSTEM
SYSAUX
TEMP1
UNDO1
USERS1

3. Determine the users that own objects in the tablespaces to be transported, and verify that they exist in the target database (or create them). Note that these objects can be transferred to another user, if required.

SELECT DISTINCT owner
FROM dba_segments
WHERE tablespace_name IN ('DATA1', 'INDEX1');

OWNER
---------------
DEV1
USER1

These owners (schemas) need to be verified on the target side:

SELECT username
FROM dba_users;

USERNAME
---------------
SYS
SYSTEM
DEV1

In this case, the DEV1 user exists but the USER1 user does not. The USER1 user must be created with the command:

CREATE USER user1
IDENTIFIED BY user1;

4. Verify the default block sizes in each database. If they differ, an additional block buffer cache of the appropriate size must be allocated in the target database. Support for multiple database block sizes was introduced in Oracle9i.

SELECT name, value
FROM v$parameter
WHERE name = 'db_block_size';

NAME           VALUE
-------------- ------
db_block_size  8192

5. Verify that the tablespaces in the set (DATA1 and INDEX1) are self-contained:

EXECUTE dbms_tts.transport_set_check('DATA1,INDEX1',TRUE);

SELECT *
FROM transport_set_violations;

VIOLATIONS
-----------------------------------------------------
CONSTRAINT FK_SALES_ORDER_DEPT between table DEV1.SALES
in tablespace DATA1 and table DEV2.ORDER_DEPT in
tablespace DATA2
PARTITIONED TABLE DEV1.SALES is partially contained in
the transportable set

In this example, the foreign key constraint on the DEV1.SALES table would need to be dropped. Additionally, the partitioned table DEV1.SALES would need to be addressed.

After verifying that the tablespace set is self-contained and any violations have been addressed, the tablespaces need to be put into read-only mode (or taken completely offline) so that copies of the files can be made and presented to the target host. Additionally, metadata concerning the tablespaces must be extracted so that the tablespaces can be successfully "plugged into" the new environment. Note that the process of extracting the metadata from the source database and importing it into the target is Oracle version dependent. The following are the steps to implement a transportable tablespace:

1. Put the two tablespaces into read-only mode:

ALTER TABLESPACE data1 READ ONLY;
ALTER TABLESPACE index1 READ ONLY;

2. Extract tablespace metadata. There are two ways to do this. The first method, available in all Oracle versions, uses the Oracle export utility EXP. The second, a new feature in Oracle10g, uses the Oracle Data Pump utility.

EXP transport_tablespace = y
tablespaces = data1, index1
triggers = y
constraints = y
grants = y
file = d:\oracle\exp\meta1.dmp

Alternatively, the Oracle Data Pump syntax for the metadata extract is as follows:

EXPDP system/manager DUMPFILE = meta1.dmp DIRECTORY = d:\oracle\exp TRANSPORT_TABLESPACES = data1,index1

3. After successfully extracting the metadata, copy the datafile(s) associated with the tablespaces. First, the datafiles must be identified and copied to their new location (either on the same host or a different one). A variety of methods for copying the datafiles are available, including cp, copy, cpio, tar, or the DBMS_FILE_TRANSFER package. Additionally, EMC software such as TimeFinder or SRDF can be used to clone the volumes as described in the following section. In this example, TimeFinder is used to clone the data.

SELECT tablespace_name, file_name
FROM dba_data_files
WHERE tablespace_name IN ('DATA1', 'INDEX1');

TABLESPACE_NAME FILE_NAME
--------------- --------------------------------
DATA1           d:\oracle\oradata\db1\data1.dbf
INDEX1          d:\oracle\oradata\db1\index1.dbf

In this case, both required datafiles are on the d:\ drive. This volume will be identified and replicated using TimeFinder. Note that careful database layout planning is critical when TimeFinder is used for replication. First, create a device group for the standard device used by the d:\ drive and a BCV that will be used for the new e:\ drive. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of creating device groups.

4. After creating the device group, establish the BCV to the standard device:

symmir -g device_group establish -full -noprompt
symmir -g device_group verify -i 30

5. After the BCV is fully synchronized with the standard device, the devices can be split, since the tablespaces on the device are in read-only mode.

symmir -g device_group split -noprompt

6. A full, track-by-track copy of the d:\ drive is now available on the BCVs. Once split, the BCVs become available to the host to which they are presented. The BCVs should be mounted on the host (which could be the same host) that contains the database into which the transported tablespaces will be mounted. Appendix C, “Related Host Operation,” provides the steps to present devices to each of the operating system types. To verify that the volumes are presented to the host, enter:

syminq

7. Once the tablespace datafiles are in place, import the metadata information into the target database:

IMP transport_tablespace = y
datafiles = (e:\oracle\oradata\db1\data1.dbf, e:\oracle\oradata\db1\index1.dbf)
file = d:\oracle\exp\meta1.dmp
tablespaces = (data1,index1)
tts_owners = (dev1,user1)

Alternatively, with Data Pump in Oracle10g:

IMPDP system/manager DUMPFILE = meta1.dmp DIRECTORY = d:\oracle\exp\ TRANSPORT_DATAFILES = e:\oracle\oradata\db1\data1.dbf,e:\oracle\oradata\db1\index1.dbf

8. Put the tablespaces on the target host into read/write mode:

ALTER TABLESPACE data1 READ WRITE;
ALTER TABLESPACE index1 READ WRITE;


Cross-platform transportable tablespaces

Oracle introduced a new concept with transportable tablespaces in Oracle10g. Cross-platform transportable tablespaces are an enhancement to previous functionality that allows a single tablespace, or a set of tablespaces, to be migrated from one operating system to another. Previously, use of transportable tablespaces required that both the operating system and the version of Oracle be the same on the source and target. If the target database is Oracle10g, these limiting requirements no longer apply. Cross-platform transportable tablespaces provide customers with a significantly improved method for migrating Oracle databases from one operating system to another.

Overview

Cross-platform transportable tablespaces enable data from an Oracle database running on one operating system to be cloned and presented to another database running on a different platform. Differences between Oracle datafiles created on different operating systems are a function of the byte ordering, or "endianness," of the files. The endian format of the datafiles is classified as either "big endian" or "little endian" (in "big endian," the first byte is the most significant, while in "little endian," the first byte is the least significant). If two operating systems both use "big endian" byte ordering, the files can be transferred between operating systems and used successfully in an Oracle database (through a feature such as transportable tablespaces). For source and target operating systems with different byte ordering, a process to convert the datafiles from one "endianness" to the other is required.
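As an aside (not part of the Oracle or EMC procedure), the byte-ordering difference described above can be demonstrated with a short Python sketch using the standard struct module; the sample value is arbitrary:

```python
import struct

value = 0x01020304  # an arbitrary 32-bit integer

big = struct.pack(">I", value)     # big endian: most significant byte first
little = struct.pack("<I", value)  # little endian: least significant byte first

print(big.hex())     # 01020304
print(little.hex())  # 04030201

# Converting between the two formats amounts to reversing the byte
# order of each field, which is conceptually what the RMAN CONVERT
# operation performs on every field of every block in a datafile.
assert bytes(reversed(big)) == little
```

This is why a datafile written on a big endian platform such as Solaris SPARC cannot be read directly by a little endian platform such as Linux x86 without conversion.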

Oracle uses an RMAN option to convert a data file from "big endian" to "little endian" and vice versa. First, the "endianness" of the source and target operating systems must be identified. If different, then the datafiles are read and converted by RMAN. Upon completion, the "endianness" of the datafiles is converted to the format needed in the new environment. The process of converting the cloned datafiles occurs either on the source database host before copying to the new environment or once it is received on the target host. Other than this conversion process, the steps for cross-platform transportable tablespaces are the same as for normal transportable tablespaces.


Implementing cross-platform transportable tablespaces

The following example shows the process needed to implement cross-platform transportable tablespaces for a tablespace migrating from a Solaris host (big endian format) to a Linux host (little endian format). In this example, the tablespace OR_UFS is migrated from a Solaris to a Linux Oracle database.

1. Verify that the tablespace, or set of tablespaces, is self-contained. This means that objects in the tablespace set must not have associated objects (such as indexes, materialized views, or partitioned tables) outside of the specified tablespace set. Oracle provides the procedure TRANSPORT_SET_CHECK as a part of the DBMS_TTS package. For example:

EXECUTE DBMS_TTS.TRANSPORT_SET_CHECK('OR_UFS', TRUE);

Any violations of the tablespace being self-contained are written to the TRANSPORT_SET_VIOLATIONS view, and can be queried using:

SELECT *
FROM TRANSPORT_SET_VIOLATIONS;

If no rows are selected, the tablespace is self-contained.

2. Determine the source and target database endian formats and whether conversion is required. The first query lists the endian formats for all supported operating systems. The second shows the specific format for the database platform in use.

SELECT platform_id, platform_name, endian_format
FROM v$transportable_platform
ORDER BY 1;

PLATFORM_ID PLATFORM_NAME                     ENDIAN_FORMAT
----------- --------------------------------- -------------
          1 Solaris[tm] OE (32-bit)           Big
          2 Solaris[tm] OE (64-bit)           Big
          3 HP-UX (64-bit)                    Big
          4 HP-UX IA (64-bit)                 Big
          5 HP Tru64 UNIX                     Little
          6 AIX-Based Systems (64-bit)        Big
          7 Microsoft Windows IA (32-bit)     Little
          8 Microsoft Windows IA (64-bit)     Little
          9 IBM zSeries Based Linux           Big
         10 Linux IA (32-bit)                 Little
         11 Linux IA (64-bit)                 Little
         12 Microsoft Windows 64-bit for AMD  Little
         13 Linux 64-bit for AMD              Little
         15 HP Open VMS                       Little
         16 Apple Mac OS                      Big

SELECT a.platform_name, a.endian_format
FROM v$transportable_platform a, v$database b
WHERE a.platform_name = b.platform_name;

On the Solaris host, the output from this SQL command is:

PLATFORM_NAME                 ENDIAN_FORMAT
----------------------------- --------------
Solaris[tm] OE (32-bit)       Big

On the Linux host, this command returns:

PLATFORM_NAME                 ENDIAN_FORMAT
----------------------------- --------------
Linux IA (32-bit)             Little

3. Either shut down the database or put the tablespace(s) into read-only mode so that a clean copy of the datafiles that make up the tablespace set can be made. A tablespace (in this example, OR_UFS) is put into read-only mode with the following:

SQL> ALTER TABLESPACE or_ufs READ ONLY;

4. Metadata of the tablespace set must be created and copied to the target environment. Use either the Oracle export utility or the new Data Pump facility to create this file. The following shows the commands to create the tablespace metadata information using Oracle Data Pump:

expdp system/manager dumpfile=or_ufs.dmp directory=dpump_dir transport_tablespaces=or_ufs transport_full_check=Y

5. After putting the tablespace in read-only mode, the datafiles can be copied and presented to the target host. There are many ways to manage this replication process including host-based (cp, rcp, ftp, and so on) and storage-based methods (TimeFinder, SRDF, Open Replicator). These new target volumes are then presented to the target host.

6. The endianness of the data may be converted either on the storage or the target host. In this example, the conversion process is performed after migrating the data to the target. The Oracle RMAN utility is used to convert the data file. The following shows an example of the RMAN conversion process:

RMAN> CONVERT DATAFILE "/ufs/or_ufs.dbf"
2> TO PLATFORM="Linux IA (32-bit)"
3> FROM PLATFORM="Solaris[tm] OE (32-bit)"
4> DB_FILE_NAME_CONVERT="/ufs","/ufs2"
5> PARALLELISM=2;


The datafile is converted to little endian format and written, with the same filename, to the new directory location /ufs2 (from the /ufs directory).

7. After converting the file, the tablespace may now be "plugged" into the target database. The Data Pump utility is used to facilitate the process.

impdp system/manager dumpfile=or_ufs.dmp directory=dpump_dir transport_datafiles=/ufs2/or_ufs.dbf


Choosing a database cloning methodology

The replication technologies described in the prior sections each have pros and cons regarding their applicability to a given business problem. Table 9 compares the different methods and their attributes.

Table 10 provides examples of some of the choices you might make for database cloning based on the information in Table 9.

Table 9 Comparison of database cloning technologies

                               TimeFinder/Snap   TimeFinder/Clone    TimeFinder/Mirror   Replication Manager
Maximum number of copies       15                Incremental: 16     Incremental: 16     Incremental: 16
                                                 Non-inc: Unlimited  Non-inc: Unlimited  Non-inc: Unlimited
No. simultaneous copies        15                16                  2                   2
Production impact              COFW              COFW & COA          None                None
Scripting                      Required          Required            Required            Automated
Database clone needed
a long time                    Not recommended   Recommended         Recommended         Recommended
High write usage to DB clone   Not recommended   Recommended         Recommended         Recommended

Table 10 Database cloning requirements and solutions

System requirements                                               Replication choices
----------------------------------------------------------------  -------------------
The application on the source volumes is very                     TimeFinder/Mirror
performance-sensitive and the slightest degradation will cause    Replication Manager
responsiveness of the system to miss SLAs.

Space and economy are a real concern. Multiple copies are         TimeFinder/Snap
needed and retained only a short period of time, with             Replication Manager
performance not critical.

More than two simultaneous copies need to be made. The copies     TimeFinder/Clone
will live for up to a month's time.

Multiple copies are being made, some with production mount.       Replication Manager
The copies are reused in a cycle expiring the oldest one first.


Chapter 4 Backing Up Oracle Environments

This chapter presents these topics:

◆ Introduction
◆ Comparing recoverable and restartable copies of databases
◆ Database organization to facilitate recovery
◆ Oracle backup overview
◆ Using EMC replication in the Oracle backup process
◆ Copying the database with Oracle shutdown
◆ Copying a running database using EMC consistency technology
◆ Copying the database with Oracle in hot backup mode
◆ Backing up the database copy
◆ Backups using EMC Replication Manager for Oracle backups
◆ Backups using Oracle Recovery Manager (RMAN)
◆ Backups using TimeFinder and Oracle RMAN


Introduction

As a part of normal day-to-day operations, the DBA creates backup procedures that run one or more times a day to protect the database against errors. Errors can originate from many sources (such as software, hardware, user, and so on) and it is the responsibility of the DBA to provide error recovery strategies that can recover the database to a point of consistency and also minimize the loss of transactional data. Ideally, this backup process should be simple, efficient, and fast.

Today, the DBA is challenged to design a backup (and recovery) strategy to meet the ever-increasing demands for availability that can also manage extremely large databases efficiently while minimizing the burden on servers, backup systems, and operations staff.

This section describes how the DBA can leverage EMC technology in a backup strategy to:

◆ Reduce production impact of performing backups.

◆ Create consistent point-in-time backup images.

◆ Create restartable or recoverable database backup images.

◆ Enhance Oracle's RMAN backup utility.

Before covering these capabilities, it is necessary to review some terminology and also to look at best practices for Oracle database layouts that can facilitate and enhance the backup and restore process.


Comparing recoverable and restartable copies of databases

The Symmetrix-based replication technologies for backup discussed in this section can create two types of database copies: recoverable or restartable. A significant amount of confusion exists between these two types of database images; a clear understanding of the differences between the two is critical to ensure that the recovery goals for a database can be met.

Recoverable disk copies

A recoverable database copy is a disk copy of the database in which transaction logs can be applied to datafiles to roll forward the database content to a point in time after the copy is created. A recoverable Oracle database copy is intuitively easy for DBAs to understand since maintaining recoverable copies, in the form of backups, is an important DBA function. In the event of a failure of the production database, the ability to recover the database not only to the point in time when the last backup was taken, but also to roll forward subsequent transactions up to the point of failure, is a key feature of the Oracle database.

Creating recoverable images of Oracle databases with EMC replication technology requires that the database be shut down when it is copied or, if a running database is to be replicated, that the database be in hot backup mode. “Putting the tablespaces or database into hot backup mode” on page 182 describes this mode in detail.

Restartable disk copies

If a copy of a running Oracle database is created using EMC consistency technology without putting the database in hot backup mode, the copy is a DBMS restartable copy. This means that when the restartable database copy is first brought up, it performs crash recovery. First, all transactions recorded as committed and written to the redo log, but which may not have had corresponding data pages written to the datafiles, are rolled forward using the redo logs. Second, after the application of log information completes, Oracle rolls back any changes that were written to the database (dirty block buffers flushed to disk, for example) but that were never actually committed by a transaction. The state attained as a result of these two actions is often referred to as a transactionally consistent point-in-time database state.

Roll-forward recovery using archive logs to a point in time after the disk copy is created is not supported on an Oracle restartable database copy.
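The two-pass crash recovery described above can be sketched with a deliberately simplified model (this is a conceptual illustration only; the redo records, transaction names, and datafile contents are invented, and real Oracle redo/undo processing is far more involved):

```python
# Simplified model of crash recovery on a restartable copy:
# pass 1 rolls forward every change recorded in the redo log,
# pass 2 rolls back changes belonging to uncommitted transactions.

# Each redo record: (transaction_id, key, new_value, old_value)
redo_log = [
    ("T1", "a", 10, 0),
    ("T2", "b", 20, 0),
    ("T1", "c", 30, 0),
]
committed = {"T1"}  # transactions with a commit record in the redo log

# Datafiles at the instant of the consistent split: one of T2's dirty
# buffers was flushed (b), while one of T1's committed changes was not (c).
datafiles = {"a": 10, "b": 20, "c": 0}

# Pass 1 (roll forward): apply every logged change to the datafiles.
for txn, key, new, old in redo_log:
    datafiles[key] = new

# Pass 2 (roll back): undo changes of transactions that never committed.
for txn, key, new, old in reversed(redo_log):
    if txn not in committed:
        datafiles[key] = old

print(datafiles)  # {'a': 10, 'b': 0, 'c': 30}
```

After both passes, only T1's committed work survives, which is the transactionally consistent point-in-time state the text describes.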


Database organization to facilitate recovery

It is advantageous to organize the database on disk to facilitate recovery. Since array replication techniques copy volumes at the physical disk level (as seen by the host), all datafiles for a database should be created on a set of disks dedicated to the database and should not be shared with other applications and databases. For UNIX systems using a logical volume manager (LVM), ensure that the data files reside in a volume group dedicated to the database. Sharing with other applications can cause unnecessary work for the array and waste space on the target volumes.

In addition to isolating the database to be copied onto its own dedicated volumes, the database should also be divided into two parts, the data structures and the recovery structures. The recovery structures consist of the redo logs, the archive logs, and the control files. The database data volumes hold the data files such as the SYSTEM, SYSAUX, and other database tablespaces. Figure 33 depicts a TimeFinder setup where the data structures and recovery structures have been separated onto their own volumes.

Figure 33 Database organization to facilitate recovery

[Figure: A Symmetrix array with standard devices mirrored to BCVs. The recovery structures (archive logs, redo logs, control files) reside on volumes separate from the data structures (SYSTEM, SYSAUX, DATA, INDEX, UNDO, TEMP).]


The strategy of separating data structures and recovery structures allows just the data structures to be restored from a database disk copy or from a tape dump of the database copy. Once the data structures are restored, the database can be rolled forward to a designated point in time. If the data and recovery structures are not separated, the resulting database restore will return the data state to that of the restored database image, and no roll-forward processing will be possible as the more current logs will be overwritten by the restore process.


Oracle backup overview

Oracle has two primary backup methods: user-managed and Recovery Manager (RMAN). A traditional user-managed backup involves putting the database into an allowed backup state, and then copying datafiles to tape or disk using operating-system commands. User-managed backups require careful consideration by the DBA to ensure that the appropriate files (archive logs, control files, datafiles, and so on) are successfully backed up and accessible in the event a recovery process is required. Recovery typically involves using the appropriate server platform means to perform database file restore from backup file images, and then explicitly performing log recovery against the restored database file images.

An alternative to user-managed backup is RMAN. RMAN provides a utility for easily managing the backup process and for facilitating restore and recovery procedures should they be needed. RMAN does this by providing management functions to locate datafile members and scripting procedures to automate the backup process. It also maintains an internal list of backup filenames and their locations to automate the recovery process. RMAN (when not used with proxy copy) additionally checks Oracle block integrity during the backup, logging errors if a corrupt block is found, and allows finer recovery granularity. In short, RMAN provides an automated, efficient utility that simplifies the Oracle backup and recovery process.

Note: Although it is common to see RMAN back up the production database directly, it is recommended to offload RMAN backup to a replica of the database.

The following sections describe considerations for the various user-managed backups. These include:

◆ Online versus offline backups — Occasionally, backups are performed by shutting down the primary database, copying the data files to tape or disk, and then restarting the database. However, this requires significant downtime. Given the 24x7x365 uptime requirements generally needed by IT today, hot backups of Oracle databases are widely used.

◆ Point-in-time versus roll-forward recovery backups — Historically, backup procedures involved creating a recoverable database so that further transactions found in the archive logs could be used to recover a database to the point in time of a failure. However, this recovery process can be operationally complex in federated environments.

◆ Partial (tablespace or datafile) versus entire database backups — Oracle provides the ability to back up a single data file or tablespace in addition to backing up the entire database. This option is useful, for example, when a tablespace will not be accepting new transactions and is converted to read-only. After doing a final backup, there is no reason to continue backups to this tablespace since the data is unchanging.

◆ Incremental versus full database backups — RMAN provides the means of only backing up changed data blocks, rather than backing up a full copy of the datafiles.

The backup process consists of three primary steps:

◆ Preparing the database for backup (shutting down the database, putting the database in hot backup mode, or not conditioning the database at all)

◆ Initiating the backup process to disk or tape from the operating-system or array level

◆ Verifying the backup has completed successfully, that the backup media (tape or disk) is protected and available for recovery (if needed), and that the backup can be used for recovery purposes

Most backup procedures require that the database be conditioned in some way before the data files that compose the database are backed up. This includes either shutting down the database or putting the database in hot backup mode. However, restartable backup images captured without requiring any special operational preparation to "condition" the database are growing in popularity due to federated database and application environments. Although these types of backups do not currently allow roll-forward operations to the point of a failure as in the case of a recoverable image, they are important particularly when used as offsite backups for DR.

User-managed backups require either host- or array-based replication mechanisms for copying the data. Traditionally, backups are written to tape; although there is increasing interest in writing backup images to disk for performance and availability reasons. The use of EMC local and remote replication technologies including Replication Manager simplify and enhance performance of the backup process by allowing most of the heavy work of creating the actual operational backup to be offloaded from the production database service.


Perhaps the most critical, but often overlooked, component of the backup process is verification of the database once the backup process has completed. Important and often difficult management tasks in the recovery process include:

◆ Ensuring that database image is complete and available in the event it is needed for a recovery and has not lost any transactions.

◆ Integrating the database recovery process with application information, outside data files, or other databases or applications.

Verifying the database after a backup depends on the customer's specific applications, requirements, and environment.

The paramount consideration for any backup and recovery process is the need for tested and well-documented backup and recovery procedures. Planning for, documenting, and testing the required backup procedures for a particular database environment is an essential part of maintaining a workable recovery strategy. In many cases, tested and documented procedures for both backup and recovery are not available in customers' IT environments. Without these tried and documented procedures, unforeseen issues can arise (such as the loss of key employees) or catastrophic failures can occur, causing significant deleterious effects to critical business systems.

Online (hot) versus offline (cold) backups

The ability to create database images made consistent during recovery while both read and write transactions are processing is a key differentiating feature of Oracle. Backup images made while the database is hot are critical in environments with stringent uptime requirements. By comparison, shutting down the database provides customers with a safe and consistent method of creating backup database images at the expense of database service outages. Choosing which of these user-managed backup methods to implement depends on the customer's needs and database environment.

Hot backups allow for greater availability, but also create more complex recovery strategies as all logs containing active transactions must be present at recovery time to be successful. Cold backups make for simple restore and recovery, but reduce the availability of the system. Prolonged database service shutdowns to accommodate the creation of extremely large database backups are frequently unacceptable to the business. In these cases, online backups must be performed.


Making a hot copy of the database is now the standard, but this method has its own challenges. How can a consistent copy of the database and supporting files be made when they are changing throughout the duration of the backup? What exactly is the content of the tape backup at completion? The reality is that the tape data is a "fuzzy image" of the disk data, and considerable expertise is required to restore the database back to a database point of consistency.

Online backups are made when the database is running in log archival mode. While there are performance considerations for running in archive log mode, the overhead associated with it is generally small compared with the enhanced capabilities and increased data protection afforded by running in it. Except in cases such as large data warehouses where backups are unnecessary, or in other relatively obscure cases, archive log mode is generally considered a best practice for all Oracle database environments.

Point-in-time and roll-forward recovery backups

Until recently, conditioning the database, either by shutting it down or by putting its tablespaces into hot backup mode, was the only way to make Oracle database images used for recovery. However, the requirement to recover databases and applications in federated environments has driven the need for new methods of backing up databases and applications. Enabled through EMC consistency technologies, point-in-time backup images, rather than fully recoverable copies, have become increasingly important in customers' complex federated environments. Consistent split technology in both local and remote EMC replication solutions provides a means of creating dependent-write consistent database images that are made transactionally consistent through the database's own recovery mechanisms.

One backup method gaining increasing usage is combining EMC consistency technologies with Oracle hot backup mode when creating backup images. Using the two technologies together provides enhanced flexibility during the recovery process since both restartable and recoverable databases are supported when this process is used.

Note: Currently, using the EMC consistency technology to create a recoverable database image without conditioning Oracle is not supported.

Comparing partial and entire database backups

Backing up all the datafiles that compose the database is the typical method of performing database backups. In some cases, however, backing up only pieces of the database, for example the datafiles that make up a single tablespace, makes sense. One example of this is the read-only tablespace. In some environments, a read-only tablespace is created after all updates to it are complete. Monthly orders or sales transactions after month-end processing are examples of where a tablespace might be converted from read/write to read-only. Once a full backup of the tablespace is available, there is no need to continue backups of that particular tablespace. In this case, taking a tablespace backup and saving it once can save on subsequent database backup time, complexity, and costs. Although in most cases full backups are customary, partial backups in certain situations are more practical and effective.

Comparing incremental and full database backups

Making full database images for backup purposes is the standard method of backing up databases. Creating incremental database backups is unavailable to users without the use of RMAN. Additionally, incremental backups add complexity and create recovery challenges when compared to full backups. On the other hand, incremental backups use less space for the backed-up data, and in the latest releases of the database (Oracle10g Release 2), by keeping a bitmap of changed blocks (block change tracking), incremental backups eliminate the need to fully scan each database datafile.
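To make the idea concrete, the following is a minimal Python sketch (not RMAN itself; block names, contents, and the bitmap are invented for illustration) of how a changed-block bitmap lets an incremental backup avoid scanning every block:

```python
def full_backup(datafile):
    # A full backup reads and copies every block.
    return dict(datafile)

def incremental_backup(datafile, changed):
    # With change tracking, only blocks flagged in the bitmap are read.
    return {blk: datafile[blk] for blk in datafile if changed[blk]}

datafile = {0: "AAA", 1: "BBB", 2: "CCC", 3: "DDD"}
base = full_backup(datafile)                   # full image: 4 blocks read

# The database then updates two blocks; change tracking flags them.
datafile[1] = "BBB2"
datafile[3] = "DDD2"
changed = {0: False, 1: True, 2: False, 3: True}

incr = incremental_backup(datafile, changed)   # only 2 blocks read
restored = {**base, **incr}                    # apply incremental on full
assert restored == datafile
```

The recovery step shows the trade-off mentioned above: the incremental is smaller, but restoring requires the full image plus the incremental, which is where the added complexity comes from.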

Using EMC replication in the Oracle backup process

EMC disk-based replication can be used to make a copy of an Oracle database, and this copy then serves as the source for the backup. A database backup process using disk replication technology typically includes some or all of the following steps, depending on the copying mechanism selected and the desired usage of the database backup:

◆ Preparing the array for replication
◆ Conditioning the source database
◆ Making a copy of the database volumes
◆ Resetting the source database
◆ Presenting the target database copy to a backup server
◆ Backing up the target database copy

In all cases but one, operating-system capabilities are used to back up the copies of the database directories and containers. In other words, the Oracle backup utility RMAN is not used except in the case described in “Backups using TimeFinder and Oracle RMAN” on page 195.

The first step in the backup process depends on what method is going to be used to copy the database volumes and whether it is required to "condition" the Oracle database in any way. Conditioning could involve shutting down the database, or putting the database in hot backup mode. How the backup is processed and subsequently restored depends on what condition the database was in when the database copy was made. The database can be in one of three data states when it is being copied:

◆ Shutdown
◆ Processing normally
◆ Conditioned using hot backup mode

Depending on the data state of the database at the time it is copied, the database copy may be restartable or recoverable. While a restartable database copy can be used in a valid backup/restore strategy, it cannot guarantee against data loss. Most DBAs will want to make a recoverable copy of the database so that logs taken after the backup can be used to roll the database forward to a point in time after the backup was taken. It is important to understand that database copies created with the EMC Symmetrix storage array can be used in a recoverable, roll-forward fashion only if the database was conditioned properly (hot backup mode or shut down) when the copy was created. In addition, the way the restore is executed
depends on the state of the database at the time the copy was made. Chapter 5, “Restoring and Recovering Oracle Databases,” covers the restore of the database.

The following sections describe how to make a copy of the database using three different EMC technologies with the database in the three different states described in the prior paragraph.

The primary method of creating copies of an Oracle database is through the use of the EMC local replication product TimeFinder. TimeFinder is also used by Replication Manager to make database copies. Replication Manager facilitates the automation and management of database copies.

The TimeFinder family consists of two base products and several component options. TimeFinder/Mirror, TimeFinder/Clone and TimeFinder/Snap were discussed in general terms in Chapter 2, “EMC Foundation Products.” In this chapter, they are used in a database backup context.

Copying the database with Oracle shutdown

Ideally, a copy of an Oracle database should be taken while the database is shut down. Taking a copy after the database has been shut down normally ensures a clean copy for backups. In addition, a cold copy of a database is in a known transactional data state, which for some application requirements is exceedingly important. Copies taken of running databases are in unknown transactional data states.

While a normal shutdown is desirable, it is not always feasible with an active Oracle database. In some cases, applications and databases cannot be shut down cleanly, and the shutdown abort command, which terminates all database engine processes abruptly, may be required to bring the database down. Whenever an abnormal database shutdown occurs, it is recommended that the database be restarted, allowing the Oracle database engine to properly recover and clean up the database, and then be shut down normally. This ensures that a clean, consistent copy of the database is available for the backup procedure.

Creating cold Oracle backup copies using TimeFinder/Mirror

TimeFinder/Mirror is an EMC software product that allows an additional storage hardware mirror to be attached to a source volume. The additional mirror is a specially designated volume in the Symmetrix configuration called a BCV. The BCV is synchronized to the source volume through a process called an establish. While the BCV is established, it is not accessible from any hosts. At an appropriate time, the BCV can be split from the source volume to create a complete point-in-time copy of the source data that can be used for backup.

Groups of standard devices and BCVs are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Mirror operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.

Figure 34 shows how to use TimeFinder/Mirror to make a database copy of a cold Oracle database.

Figure 34 Copying a cold Oracle database with TimeFinder/Mirror

1. Establish the BCVs to the standard devices. This operation occurs in the background and should be executed in advance of when the BCV copy is needed.

symmir -g device_group establish -full -noprompt

Note that the first iteration of the establish needs to be a full synchronization. Subsequent iterations by default are incremental if the -full keyword is omitted. When the command is issued, the array begins the synchronization process using only Symmetrix resources. Since this operation occurs independently from the host, the process must be interrogated to see when it completes. The command to interrogate the synchronization process is:

symmir -g device_group verify

This command will return a 0 return code when the synchronization operation is complete. Alternatively, synchronization is verified using the following:

symmir -g device_group query

After all of the volumes in the device group appear as synchronized, the split command can be issued at any time.
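In a script, the verify step becomes a polling loop. The sketch below illustrates the control flow in Python; run_verify is a hypothetical stand-in for invoking symmir -g device_group verify and returning its exit status (here it reports "in progress" twice before succeeding):

```python
import itertools

# Stub: yields exit status 1 (not synchronized) twice, then 0 forever.
# Real code would run: subprocess.call(["symmir", "-g", group, "verify"])
_status = itertools.chain([1, 1], itertools.repeat(0))

def run_verify():
    return next(_status)

polls = 0
while run_verify() != 0:
    polls += 1        # in practice, sleep between polls (e.g., 30 seconds)
print("synchronized after", polls, "retries")
```

Because the establish runs entirely inside the array, the host-side script has nothing to wait on except this exit status, which is why the loop (rather than a blocking call) is the natural pattern.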

2. Once BCV synchronization is complete, bring down the database to make a copy of a cold database:

sqlplus "/ as sysdba"
SQL> shutdown immediate;

3. When the database is shut down, split the BCV mirrors using the following command:

symmir -g device_group split -noprompt

The split command takes a few seconds to process. The database copy on the BCVs is now ready for further processing.

4. The source database can now be activated and made available to users once again.

SQL> startup;

Creating cold Oracle backup copies using TimeFinder/Clone

TimeFinder/Clone is an EMC software product that copies data internally in the Symmetrix array. A TimeFinder/Clone session is created between a source data volume and a target volume. The target volume must be equal to or greater in size than the source volume. The source and target for TimeFinder/Clone sessions can be any hypervolumes in the Symmetrix configuration.

TimeFinder/Clone devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Clone operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.

Figure 35 shows how to use TimeFinder/Clone to make a copy of a cold Oracle database onto the BCV devices.

Figure 35 Copying a cold Oracle database with TimeFinder/Clone

1. Create the TimeFinder/Clone pairs. The following command creates the TimeFinder/Clone pairings and protection bitmaps. No data is copied or moved at this time:

symclone -g device_group create -noprompt

Unlike TimeFinder/Mirror, the TimeFinder/Clone relationship is created and activated when it is needed. No prior synchronization of data is necessary. After the TimeFinder/Clone session is created it can be activated consistently.

2. Once the create command has completed, shut down the database to make a cold disk copy of the database:

sqlplus "/ as sysdba"
SQL> shutdown immediate;

3. With the database down, activate the TimeFinder/Clone:

symclone -g device_group activate -noprompt

After an activate command, the database copy provided by TimeFinder/Clone is immediately available for further processing even though the copying of data may not have completed.

4. Activate the source database to make it available to users again:

SQL> startup;

Databases copied using TimeFinder/Clone are subject to COFW and COA penalties. The COFW penalty means that if a track is written to the source volume and has not been copied to the target volume, it must first be copied to the target volume before the write from the host is acknowledged. COA means that if a track on a TimeFinder/Clone volume is accessed before it was copied, it must first be copied from the source volume to the target volume. This causes additional disk read activity on the source volumes and could be a source of disk contention on busy systems.
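The two penalties can be illustrated with a toy Python model (track contents and counters are invented; this is not how Enginuity is implemented):

```python
source = {0: "s0", 1: "s1", 2: "s2"}   # source volume tracks
target = {}                            # clone tracks copied so far
extra_copies = 0                       # copies forced by COFW/COA

def host_write(track, data):
    # COFW: an uncopied track must be preserved on the target before
    # the new host write is acknowledged.
    global extra_copies
    if track not in target:
        target[track] = source[track]
        extra_copies += 1
    source[track] = data

def target_read(track):
    # COA: an uncopied track is pulled from the source on first access.
    global extra_copies
    if track not in target:
        target[track] = source[track]
        extra_copies += 1
    return target[track]

host_write(0, "new0")               # triggers COFW
assert target_read(0) == "s0"       # point-in-time image preserved
assert target_read(2) == "s2"       # triggers COA
```

Both paths add I/O against the source volumes, which is why the text warns about contention on busy systems.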

Creating cold Oracle backup copies using TimeFinder/Snap

TimeFinder/Snap enables users to create complete copies of their data while consuming only a fraction of the disk space required by the original copy.

TimeFinder/Snap is an EMC software product that maintains space-saving, pointer-based copies of disk volumes using VDEVs and SAVDEVs. The VDEVs contain pointers either to the source data (when it is unchanged) or to the SAVDEVs (when the data has been changed). VDEVs are not a full copy of the source data and rely on the source devices. If a source device becomes unavailable, the virtual device becomes unavailable as well. In addition, if the SAVDEV area fills up, the TimeFinder/Snap session is invalidated.
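A toy Python model of the pointer mechanism (capacities and contents are invented) shows both behaviors described above: the VDEV serving a point-in-time view, and the session being invalidated when the save area fills:

```python
SAVE_CAPACITY = 1                  # deliberately tiny save area
source = {0: "s0", 1: "s1"}
save_area = {}
session_valid = True

def write_source(track, data):
    # First change to a track moves the old data to the save area;
    # if the save area is full, the snap session is invalidated.
    global session_valid
    if session_valid and track not in save_area:
        if len(save_area) >= SAVE_CAPACITY:
            session_valid = False
        else:
            save_area[track] = source[track]
    source[track] = data

def read_vdev(track):
    if not session_valid:
        raise RuntimeError("snap session invalidated")
    # Pointer chase: saved copy if the track changed, else the source.
    return save_area.get(track, source[track])

write_source(0, "new0")
assert read_vdev(0) == "s0"        # snap still shows the old data
write_source(1, "new1")            # this change overflows the save area
assert not session_valid
```

This is why sizing the save area for the expected write activity during the life of the snap is an operational requirement.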

TimeFinder/Snap devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Snap operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.

Figure 36 shows how to use TimeFinder/Snap to make a copy of a cold Oracle database.

Figure 36 Copying a cold Oracle database with TimeFinder/Snap

1. Create the TimeFinder/Snap pairs. The following command creates the TimeFinder/Snap pairings and protection bitmaps. No data is copied or moved at this time:

symsnap -g device_group create -noprompt

2. Once the create operation has completed, shut down the database in order to make a cold TimeFinder/Snap of the DBMS. Execute the following Oracle commands:

sqlplus "/ as sysdba"
SQL> shutdown immediate;

3. With the database down, the TimeFinder/Snap copy can be activated:

symsnap -g device_group activate -noprompt

After the activate, the pointer-based database copy on the VDEVs is available for further processing.

4. The source database can be activated again. Use the following Oracle command:

SQL> startup;

Databases copied using TimeFinder/Snap are subject to a COFW penalty while the snap is activated. The COFW penalty means that if a track is written to the source or target volumes and has not been copied to the snap save area, it must first be copied to the save area before the write from the host is acknowledged.

Copying a running database using EMC consistency technology

Running database systems can be copied while the databases are servicing applications and users. These database copying techniques use EMC consistency technology combined with an appropriate data copy process such as TimeFinder/Mirror or TimeFinder/Clone. TimeFinder/CG allows the running database copy to be created in an instant through use of the -consistent keyword on the split or activate commands. The image created in this way is in a dependent-write consistent data state and can be used as a restartable copy of the database.

Database management systems enforce a principle of dependent-write I/O. That is, no dependent write is issued until the predecessor write it depends on has completed. This programming discipline is used to coordinate database and log updates within a database management system and allows those systems to be restartable in the event of a power failure. Dependent-write consistent data states are created when database management systems are exposed to power failures. Using EMC consistency technology options during the database replication process also creates a database copy that has a dependent-write consistent data state. Chapter 2, “EMC Foundation Products,” provides more information on EMC consistency technology.
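The principle can be checked with a short Python sketch (the write stream is invented): because the data-page write is never issued before its covering log write completes, an image frozen after any prefix of the stream never contains a data page without its log record, which is exactly what makes a consistent split restartable:

```python
writes = []                         # order in which writes reach the array

def dbms_update(txn):
    writes.append(("log", txn))     # predecessor write completes first
    writes.append(("data", txn))    # dependent write issued only after it

for txn in range(3):
    dbms_update(txn)

# A consistent split freezes the image at an arbitrary instant,
# i.e., after some prefix of the completed writes.
for cut in range(len(writes) + 1):
    frozen = writes[:cut]
    logged = {t for kind, t in frozen if kind == "log"}
    data = {t for kind, t in frozen if kind == "data"}
    assert data <= logged           # never a data page without its log record
```

A power failure and a consistent split both freeze the write stream at an arbitrary cut point, which is why the resulting images are equivalent for restart purposes.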

Oracle databases can be copied while running and processing transactions. The following sections describe how to copy a running Oracle database using TimeFinder technology.

Creating restartable Oracle backup copies using TimeFinder/Mirror

Figure 37 shows how to use TimeFinder/Mirror to make a copy of a running Oracle database.

Figure 37 Copying a running Oracle database with TimeFinder/Mirror

1. Establish the BCVs to the standard devices. This operation occurs in the background and should be executed in advance of when the BCV copy is needed.

symmir -g device_group establish -full -noprompt

Note that the first iteration of the establish must be a full synchronization. Subsequent iterations are incremental and do not need the -full keyword. Once the command is issued, the array begins the synchronization process using only Symmetrix resources. Since this operation occurs independently from the host, the process must be interrogated to see when it completes. The command to interrogate the synchronization process is:

symmir -g device_group verify

This command will return a 0 return code when the synchronization operation is complete. Alternatively, synchronization can be verified using the following:

symmir -g device_group query

2. Once the volumes are synchronized, issue the split command:

symmir -g device_group split -consistent -noprompt

The -consistent keyword tells the Symmetrix array to use ECA (Enginuity Consistency Assist) to momentarily suspend writes to the disks while the split is being processed. The effect of this is to create a point-in-time copy of the database on the BCVs. It is similar to the image created when a power outage causes the server to crash. This image is a restartable copy. The database copy on the BCVs is then available for further processing.

Since there was no specific coordination between the database state and the execution of the consistent split, the copy is taken independent of the database activity. Using this technique, EMC consistency technology is used to make point-in-time backups of multiple systems atomically, resulting in a consistent point in time with respect to all applications and databases included in the consistent split.

Creating restartable Oracle backup copies using TimeFinder/Clone

TimeFinder/Clone is an EMC software product that copies data internally in the Symmetrix array. A TimeFinder/Clone session is created between a source data volume and a target volume. The target volume needs to be equal to or greater in size than the source volume. The source and target for TimeFinder/Clone sessions can be any hypervolumes in the Symmetrix configuration.

TimeFinder/Clone devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Clone operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.

Figure 38 shows how to use TimeFinder/Clone to make a copy of a running Oracle database onto BCV devices.

Figure 38 Copying a running Oracle database using TimeFinder/Clone

1. Create the TimeFinder/Clone pairs. The following command creates the TimeFinder/Clone pairings and protection bitmaps. No data is copied or moved at this time:

symclone -g device_group create -noprompt

Unlike TimeFinder/Mirror, the TimeFinder/Clone relationship is created and activated when it is needed. No prior copying of data is necessary.

2. After the TimeFinder/Clone relationship is created, it can be activated consistently:

symclone -g device_group activate -consistent -noprompt

The -consistent keyword tells the Symmetrix array to use ECA to momentarily suspend writes to the source disks while the TimeFinder/Clone is being activated. The effect of this is to create a point-in-time copy of the database on the target volumes. It is a copy similar in state to that created when there is a power outage resulting in a server crash. This copy is a restartable copy. After the activate command, the database copy on the TimeFinder/Clone devices is available for further processing.

Since there was no specific coordination between the database and the execution of the consistent split, the copy is taken independent of the database activity. In this way, EMC consistency technology is used to make point-in-time copies of multiple systems atomically, resulting in a consistent point in time with respect to all applications and databases included in the consistent split.

Databases copied using TimeFinder/Clone are subject to COFW and COA penalties. The COFW penalty means that if a track is written to the source volume and has not been copied to the target volume, it must first be copied to the target volume before the write from the host is acknowledged. COA means that if a track on a target volume is accessed before it is copied, it has to be copied from the source volume to the target volume first. This causes additional disk read activity to the source volumes and could be a source of disk contention on busy systems.

Creating restartable Oracle backup copies using TimeFinder/Snap

TimeFinder/Snap enables users to create complete copies of their data while consuming only a fraction of the disk space required by the original copy.

TimeFinder/Snap is an EMC software product that maintains space-saving, pointer-based copies of disk volumes using VDEVs and SAVDEVs. The VDEVs contain pointers either to the source data (when it is unchanged) or to the SAVDEVs (when the data has changed).

TimeFinder/Snap devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Snap operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.

Figure 39 on page 180 shows how to use TimeFinder/Snap to make a copy of a running Oracle database.

Figure 39 Copying a running Oracle database with TimeFinder/Snap

1. Create the TimeFinder/Snap pairs. The following command creates the TimeFinder/Snap pairings and protection bitmaps. No data is copied or moved at this time:

symsnap -g device_group create -noprompt

After the TimeFinder/Snap devices are created, all the pointers from the VDEVs point at the source volumes. No data has been copied at this point. The snap can be activated consistently using the consistent activate command.

2. Once the create operation has completed, execute the activate command with the -consistent option to perform the consistent snap:

symsnap -g device_group activate -consistent -noprompt

The -consistent keyword tells the Symmetrix array to use ECA (Enginuity Consistency Assist) to momentarily suspend writes to the disks while the activate command is being processed. The effect of this is to create a point-in-time copy of the database on the VDEVs. It is similar to the state created when there is a power outage that causes the server to crash. This image is a restartable copy. The database copy on the VDEVs is available for further processing.

Since there was no specific coordination between the database and the execution of the consistent split, the copy is taken independent of the database activity. In this way, EMC consistency technology is used to make point-in-time copies of multiple systems atomically, resulting in a consistent point in time with respect to all applications and databases included in the consistent split.

Databases copied using TimeFinder/Snap are subject to a COFW penalty while the snap is activated. The COFW penalty means that if a track is written to the source volume and has not been copied to the snap-save area, it must first be copied to the snap-save area before the write from the host is acknowledged.

Copying the database with Oracle in hot backup mode

For many years, Oracle has supported hot backup mode, which provides the capability to use split-mirror technology while the database is online and to create a recoverable database on the copied devices. During this process, the database is fully available for reads and writes. However, instead of writing only change vectors (such as the rowid and the before and after images of the data) to the online redo log, entire blocks of data are written. These data blocks are then used to overwrite any potential inconsistencies in the data files. While this enables the database to recover itself and create a consistent point-in-time image after recovery, it also degrades performance while the database is in hot backup mode.

An important consideration when using hot backup mode to create a copy of the database is the need to split the archive logs separately from the database. This is because Oracle must recover itself to the point after all of the tablespaces are taken out of hot backup mode. If the hypervolumes containing the archive logs are split at the same time as the data volumes, the marker indicating the tablespaces are out of hot backup mode will not be found in the last archive log. As such, the archive logs must be split after the database is taken out of hot backup mode, so the archive log devices (and generally the redo logs as well) must be separate from the other data files.
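The ordering requirement can be sketched in Python (log contents are invented): the end-of-backup marker reaches an archived log only after the current redo log is switched, so an archive-log split taken at the same instant as the data split misses it:

```python
archived = ["redo seq 1"]           # contents of the archive-log volumes
current_redo = ["redo seq 2"]       # online redo log, not yet archived

def end_hot_backup():
    current_redo.append("END-BACKUP marker")

def archive_log_current():
    # Switch the log and archive it (alter system archive log current).
    archived.append(" ".join(current_redo))
    current_redo[:] = ["redo seq 3"]

def split(volumes):
    return list(volumes)            # point-in-time copy of the volumes

end_hot_backup()
too_early = split(archived)         # split together with the data volumes
assert not any("END-BACKUP" in e for e in too_early)

archive_log_current()
late_split = split(archived)        # split taken after the log switch
assert any("END-BACKUP" in e for e in late_split)
```

This is the reason the archive-log (and typically redo-log) hypervolumes must live in a separate device group that can be split on its own schedule.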

The following sections describe the steps needed to put tablespaces or the entire database into hot backup mode and take it out again. Appendix D, “Sample Database Cloning Scripts,” provides a sample script showing how hot backup mode can be used to create a recoverable Oracle database image.

Putting the tablespaces or database into hot backup mode

To create a consistent image of Oracle while it is in hot backup mode, each of the tablespaces in the database must be put into hot backup mode before copying is performed. The following commands connect to the database instance and put the tablespaces into hot backup mode:

sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter tablespace DATA begin backup;
SQL> alter tablespace INDEXES begin backup;
SQL> alter tablespace SYSTEM begin backup;

Alternatively, with Oracle10g, the entire database can be put into hot backup mode with:

sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;

When these commands are issued, data blocks for the tablespaces are flushed to disk and the datafile headers are updated with the last checkpoint SCN. Further updates of the checkpoint SCN to the data file headers are not performed while in this mode. When these files are copied, the nonupdated SCN in the datafile headers signifies to the database that recovery is required.
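A tiny Python illustration of the frozen header (SCN values are invented) shows how a file copied in this mode signals that recovery is required:

```python
class Datafile:
    def __init__(self):
        self.header_scn = 100       # last checkpoint SCN in the header
        self.hot_backup = False

    def checkpoint(self, scn):
        # While in hot backup mode, header SCN updates are suppressed.
        if not self.hot_backup:
            self.header_scn = scn

df = Datafile()
df.hot_backup = True
df.checkpoint(250)                  # database activity continues...
copied_header_scn = df.header_scn   # ...but a copied header keeps the old SCN

current_scn = 250
assert copied_header_scn < current_scn   # mismatch signals recovery needed
```

The gap between the copied header SCN and the current SCN is what the database closes by applying redo during recovery.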

Taking the tablespaces or database out of hot backup mode

To take the tablespaces out of hot backup mode, connect to the database and issue the following commands:

sqlplus "/ as sysdba"
SQL> alter tablespace DATA end backup;
SQL> alter tablespace INDEXES end backup;
SQL> alter tablespace SYSTEM end backup;
SQL> alter system archive log current;

When these commands complete, the database is returned to its normal operating state.

Alternatively, with Oracle10g, the entire database can be taken out of hot backup mode with:

sqlplus "/ as sysdba"
SQL> alter database end backup;
SQL> alter system archive log current;

The alter system archive log current command is used to ensure that the marker indicating that the tablespaces are out of hot backup mode is found in an archive log. Switching and archiving the current log guarantees that this record resides in a fully written archive log.

Creating hot Oracle backup copies using TimeFinder/Mirror

TimeFinder/Mirror is an EMC software product that allows an additional hardware mirror to be attached to a source volume. The additional mirror is a specially designated volume in the Symmetrix configuration called a BCV. The BCV is synchronized to the source
volume through a process called an establish. While the BCV is established it is not ready to all hosts. At an appropriate time, the BCV can be split from the source volume to create a complete point-in-time copy of the source data that can be used for multiple different purposes including backup, decision support, regression testing, etc.

Groups of BCVs are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Mirror operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.

Figure 40 shows how to use TimeFinder/Mirror to make a copy of an Oracle database in hot backup mode.

Figure 40 Copying an Oracle database in hot backup mode with TimeFinder/Mirror

1. Establish the BCVs to the standard devices. This operation occurs in the background and should be executed in advance of when the BCV copy is needed.

symmir -g data_group establish -full -noprompt
symmir -g log_group establish -full -noprompt

Note that the first iteration of the establish needs to be a "full" synchronization; subsequent iterations are incremental and do not need the -full keyword. Once the command is issued, the array begins the synchronization process using only Symmetrix resources. Since this is asynchronous to the host, the process must be interrogated to see when it is finished. The command to interrogate the synchronization process is:

symmir -g data_group verify
symmir -g log_group verify

The verify command returns 0 when the synchronization operation is complete.
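Because verify is a point-in-time check, scripts typically poll it until it succeeds. The following helper is a sketch: the 30-second interval is an arbitrary choice, and the symmir invocations shown in the comments use the illustrative group names from this section.

```shell
#!/bin/sh
# Sketch: poll a SYMCLI verify command until it reports synchronized
# (that is, until it returns 0).
# Usage: wait_for_sync symmir -g data_group verify
wait_for_sync() {
    until "$@"; do
        sleep 30    # not yet synchronized; check again shortly
    done
}

# wait_for_sync symmir -g data_group verify
# wait_for_sync symmir -g log_group verify
```

The helper takes the whole verify command as arguments, so the same function works for data and log groups alike.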

2. When the volumes are synchronized, put the database in hot backup mode. Connect to the database and issue the following commands:

sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;

3. Execute a split of the standard and BCV relationship:

symmir -g data_group split -noprompt

The -consistent keyword is not used here as consistency is provided by the database. The Data BCV(s) now contain an inconsistent copy of the database that can be made consistent through recovery procedures using the archive logs. This is a recoverable database. Usage of recoverable copies of databases is described in “Recoverable disk copies” on page 110.

4. After the replicating process completes, take the database (or tablespaces) out of hot backup mode on the source database:

SQL> alter database end backup;
SQL> alter system archive log current;

5. After the tablespaces are taken out of hot backup mode and a log switch is performed, split the Log BCV devices from their source volumes:

symmir -g log_group split -noprompt
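The five numbered steps can be combined into a wrapper script. The following is a sketch, not a production procedure: the group names (data_group, log_group) and the sqlplus connection syntax are illustrative assumptions, and with the default RUN=echo guard the script only prints the commands it would issue (clear RUN to execute them).

```shell
#!/bin/sh
# Sketch of the five TimeFinder/Mirror hot backup steps in one script.
# data_group/log_group are illustrative SYMCLI group names.
RUN=${RUN:-echo}    # dry-run guard: by default, print commands only

hot_backup_copy() {
    # 1. Establish (the first iteration requires -full; later ones are
    #    incremental) and wait for synchronization to complete.
    $RUN symmir -g data_group establish -noprompt
    $RUN symmir -g log_group establish -noprompt
    until $RUN symmir -g data_group verify &&
          $RUN symmir -g log_group verify
    do sleep 30; done

    # 2. Put the database in hot backup mode.
    $RUN sqlplus -s "/ as sysdba" <<'EOF'
alter system archive log current;
alter database begin backup;
EOF

    # 3. Split the data BCVs while the database is in hot backup mode.
    $RUN symmir -g data_group split -noprompt

    # 4. Take the database out of hot backup mode and archive the log.
    $RUN sqlplus -s "/ as sysdba" <<'EOF'
alter database end backup;
alter system archive log current;
EOF

    # 5. Split the log BCVs after the log switch.
    $RUN symmir -g log_group split -noprompt
}

# hot_backup_copy
```

Keeping the log split after the final log switch ensures the end-of-backup marker is on the split log devices.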

Creating hot Oracle backup copies using TimeFinder/Clone

TimeFinder/Clone is an EMC software product that copies data internally in the Symmetrix array. A TimeFinder/Clone session is created between a source data volume and a target volume. The target volume needs to be equal to or greater in size than the source volume. The source and target for TimeFinder/Clone sessions can be any hypervolumes in the Symmetrix configuration.

TimeFinder/Clone devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Clone operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.

Figure 41 shows how to use TimeFinder/Clone to make a copy of an Oracle database in hot backup mode onto BCV devices.

Figure 41 Copying an Oracle database in hot backup mode with TimeFinder/Clone

1. Create the TimeFinder/Clone pairs. The following command creates the TimeFinder/Clone pairings and protection bitmaps. No data is copied or moved at this time:

symclone -g data_group create -noprompt
symclone -g log_group create -noprompt

Unlike TimeFinder/Mirror, the TimeFinder/Clone relationship is created and activated when it is needed. No prior copying of data is necessary.

2. Place the Oracle database in hot backup mode:

sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;


3. Execute an "activate" of the TimeFinder/Clone:

symclone -g data_group activate -noprompt

The -consistent keyword is not used here as consistency is provided by the database. The Data clone devices now contain an inconsistent copy of the database that can be made consistent through recovery procedures using the archive logs. This is a recoverable database. “Enabling a cold database copy” on page 140 describes use of recoverable copies of databases.

4. After the replicating process completes, take the database (or tablespaces) out of hot backup mode on the source database:

SQL> alter database end backup;
SQL> alter system archive log current;

5. After the tablespaces are taken out of hot backup mode and a log switch is performed, activate the Log clone devices:

symclone -g log_group activate -noprompt

Databases copied using TimeFinder/Clone are subject to copy on first write (COFW) and copy on access (COA) penalties. The COFW penalty means that the first time a track is written on the source volume before it has been copied to the target volume, it must first be copied to the target volume before the write from the host is acknowledged. Subsequent writes to tracks already copied do not suffer from the penalty. COA means that if a track on a target volume is accessed before it is copied, it must first be copied from the source volume to the target volume. This causes additional disk read activity on the source volumes and can be a source of disk contention on busy systems.

Creating hot Oracle backup copies using TimeFinder/Snap

TimeFinder/Snap enables users to create complete copies of their data while consuming only a fraction of the disk space required by the original copy.

TimeFinder/Snap is an EMC software product that maintains space-saving pointer-based copies of disk volumes using VDEVs and SAVDEVs. The VDEVs contain pointers either to the source data (when it is unchanged) or to the SAVDEVs (when the data has changed).


TimeFinder/Snap devices are managed together using SYMCLI device or composite groups. Solutions Enabler commands are executed to create SYMCLI groups for TimeFinder/Snap operations. If the database spans more than one Symmetrix array, a composite group is used. Appendix B, “Sample SYMCLI Group Creation Commands,” provides examples of these commands.

The following diagram depicts the necessary steps to make a copy of an Oracle database in hot backup mode using TimeFinder/Snap:

Figure 42 Copying an Oracle database in hot backup mode with TimeFinder/Snap

1. Create the TimeFinder/Snap pairs. The following command creates the TimeFinder/Snap pairings and protection bitmaps. No data is copied or moved at this time:

symsnap -g data_group create -noprompt
symsnap -g log_group create -noprompt

Unlike TimeFinder/Mirror, the snap relationship is created and activated when it is needed. No prior copying of data is necessary. The create operation establishes the relationship between the standard devices and the VDEVs and it also creates the protection metadata.

2. After the snaps are created, place the Oracle database in hot backup mode:

sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;

(Figure 42 shows the controlling host driving the snap operations against the STD and VDEV devices, device pointers from the VDEV to the original data on the STD, and changed data copied to the save area [SAVDEV] due to copy on write, while a target host accesses the VDEV copy.)

3. Execute an "activate" of the TimeFinder/Snap for the data devices:

symsnap -g data_group activate -noprompt

The -consistent keyword is not used here because consistency is provided by the database. The VDEVs (and possibly SAVDEVs) contain a pointer-based copy of the database while it is in hot backup mode. This is a recoverable database copy. “Enabling a cold database copy” on page 140 describes use of recoverable copies of Oracle databases.

4. Once the snap activate process completes, take the database (or tablespaces) out of hot backup mode on the source database:

SQL> alter database end backup;
SQL> alter system archive log current;

5. After the database is taken out of hot backup mode and a log switch is performed, activate the log snap devices:

symsnap -g log_group activate -noprompt

Databases copied using TimeFinder/Snap are subject to a COFW penalty while the snap is activated. The COFW penalty means that if a track is written to the source volume and has not been copied to the snap-save area, it must first be copied to the save area before the write from the host is acknowledged.


Backing up the database copy

The most common method of backing up an array-based copy of the database is to present the volumes that contain the database copy to a backup server, import the volume group, mount file systems, etc. When the backup server has access to the volumes in this way, it can simply execute a file system or device backup of the database volumes. Note that this backup is done at the OS level and uses OS utilities or a tape management system utility to create the backup on tape.

Note: Regardless of the tool used to create the backup copy and regardless of the state of the database at the time the copy was created, the backup process is the same, except as noted in the next section.

Appendix C, “Related Host Operation,” describes issues around accessing, importing, and mounting copies of database volumes.


Backups using EMC Replication Manager for Oracle backups

EMC Replication Manager (RM) is used to manage and control the TimeFinder copies of an Oracle database used for backups. It also may be integrated with backup products such as NetWorker. The RM product provides both a GUI and a command-line interface, and the capability to:

◆ Autodiscover the standard volumes holding the database

◆ Identify the pathname for all database files

◆ Identify the location of the archive log directories

◆ Identify the location of the database binaries, dump files, etc.

◆ Provide callouts to integrate backup utilities with the database copies

Using this information, RM can set up TimeFinder Groups with BCVs or VDEVs, schedule TimeFinder operations and manage the creation of database copies, expiring older versions as needed.

Figure 43 demonstrates the steps performed by RM using TimeFinder/Mirror to create a database copy to use for multiple other purposes:

Figure 43 Using RM to make a TimeFinder copy of Oracle

RM does the following:

1. Logs in to the database and discovers the locations of all the datafiles and logs on the Symmetrix devices. Because this discovery is dynamic, the procedure does not have to change when additional volumes are added to the database.


2. Establishes the standards to the BCVs in the Symmetrix array. RM polls the progress of the establish process until the BCVs are synchronized, and then moves on to the next step.

3. Performs a log switch to flush changes to disk, minimizing recovery required of the copied database.

4. Puts the Oracle database in hot backup mode, discussed in “Putting the tablespaces or database into hot backup mode” on page 182.

5. Issues a TimeFinder split, to detach the data BCVs from the standard devices.

6. Takes the Oracle database out of hot backup mode, as discussed in “Taking the tablespaces or database out of hot backup mode” on page 183.

7. Performs another log switch to flush the end of hot backup mode marker from the online redo logs to an archive log.

8. Creates a copy of a backup control file.

9. Copies the backup control file and additional catalog information to the RM host.

10. Copies the database archive logs to the replication manager host for use in the restore process.


Backups using Oracle Recovery Manager (RMAN)

Oracle Recovery Manager, or RMAN, is an Oracle backup and recovery utility first available in Oracle 8. RMAN provides a command-line client and integrates with the Oracle Enterprise Manager (OEM) GUI. The utility works with procedures and sessions running on the Oracle servers to manage the backup and recovery processes. In addition, RMAN can maintain a backup repository that contains information about all backups and recoveries in the environment.

RMAN provides a number of key benefits over traditional user managed database backups. They include:

◆ Automated backup and recovery — The RMAN utility provides functionality that can automate backup and recovery processes to tape or disk.

◆ Backup catalogs — RMAN monitors all database backup activities and stores information concerning backed up datafiles and archive logs in a centrally managed repository.

◆ Incremental backup support — RMAN enables the ability to create incremental backup images that shorten the backup window and reduce the amount of tape or disk space needed for backups.

◆ Block level corruption detection and recovery — During the backup process, RMAN reads each of the data blocks and determines whether the block is consistent. If data corruption issues exist, RMAN can provide block media recovery to correct any corruption issues.

◆ Hot backup mode not required for hot backups — RMAN can create an online database backup image that can be made consistent without putting the database into hot backup mode (assuming the backup is taken from the primary database rather than from a database copy).

◆ Integration with OEM — Oracle Enterprise Manager provides a GUI interface that simplifies the management process of Oracle databases. RMAN integrates with OEM to provide a single location for managing all aspects of multiple database environments.


◆ Backup types — RMAN provides the ability to back up databases to either tape or disk. It also integrates with specialized media managers that assist in simplifying the backups of Oracle with backup solutions including NetWorker and VERITAS NetBackup.
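The backup types above are driven from the RMAN command-line client. As a minimal illustration, the sketch below issues a standard whole-database backup that includes the archived logs; the target connection syntax is an assumption, and with the default RUN=echo guard the command is only printed (clear RUN to execute).

```shell
#!/bin/sh
# Sketch: a minimal RMAN whole-database backup invocation.
# "backup database plus archivelog" is standard RMAN syntax; the
# "target /" OS-authenticated connection is an illustrative assumption.
RUN=${RUN:-echo}    # dry-run guard: by default, print the command only

rman_full_backup() {
    $RUN rman target / <<'EOF'
backup database plus archivelog;
EOF
}

# rman_full_backup
```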

RMAN provides a broad range of backup options and features. Describing these features in detail is beyond the scope of this document. For additional detailed information on RMAN, consult the Oracle documentation Oracle Database Backup and Recovery Basics and Oracle Database Backup and Recovery Advanced User's Guide.


Backups using TimeFinder and Oracle RMAN

It is possible to back up an Oracle database image made with TimeFinder using the RMAN backup utility. This option retains all of the benefits of RMAN while offloading from the production database the processing that Oracle's backup utility performs on the database blocks. In addition, using TimeFinder in conjunction with RMAN enables recovery procedures to be tested and validated before affecting production data, since a second copy of the data is available.

Chapter 5
Restoring and Recovering Oracle Databases

This chapter presents these topics:

◆ Introduction ...................................................................................... 198
◆ Oracle recovery types ...................................................................... 199
◆ Oracle recovery overview ............................................................... 203
◆ Restoring a backup image using TimeFinder .............................. 205
◆ Restoring a backup image using Replication Manager .............. 215
◆ Oracle database recovery procedures ........................................... 217
◆ Database recovery using Oracle RMAN ....................................... 223
◆ Oracle Flashback .............................................................................. 224


Introduction

Recovery of a production database is an event that all DBAs hope is never required. Nevertheless, DBAs must be prepared for unforeseen events such as media failures or user errors requiring database recovery operations. The keys to a successful database recovery include the following:

◆ Identifying database recovery time objectives

◆ Planning the appropriate recovery strategy based upon the backup type (full, incremental)

◆ Documenting the recovery procedures

◆ Validating the recovery process

Oracle recovery depends on the backup methodology used. With the appropriate backup procedures in place, an Oracle database can be recovered to any point in time between the end of the backup and the point of failure using a combination of backed-up data files and Oracle recovery structures, including the control files, the archive logs, and the redo logs. Recovery typically involves copying the previously backed up files into their appropriate locations and, if necessary, performing recovery operations to ensure that the database is recovered to the appropriate point in time and is consistent.

The following sections examine both traditional (user-managed) and RMAN Oracle database recoveries. This chapter assumes that EMC technology is used in the backup process as described in Chapter 4, “Backing Up Oracle Environments.” Thus, this chapter directly matches the sections of that chapter.


Oracle recovery types

There are several reasons for Oracle to perform recovery. Examples include recovery of the database after a power failure on the host, recovery after user error that deletes a required part of the database, recovery after application errors, and recovery on account of hardware (disk, host, HBA, and such) or software failures or corruptions.

The following sections discuss the various types of recovery operations available in Oracle, under what circumstances they are employed, and high-level details of each recovery operation. These operations are then further discussed in subsequent sections of this chapter.

Crash recovery

A critical component of all ACID-compliant (Atomicity, Consistency, Isolation, Durability) databases is the ability to perform crash recovery to a consistent database state after a failure. A host power failure is a primary cause of a database going down inadvertently and requiring crash recovery. Other situations where crash recovery procedures are needed include databases shut down with the "abort" option and database images created using a consistent split mechanism.

Crash recovery is an example of using the database restart process, where database logs are implicitly applied during normal initialization. Crash recovery is a database-driven recovery mechanism; it is not initiated by a DBA. Whenever the database is started, Oracle verifies that the database is in a consistent state. It does this by reading information out of the control file and verifying that the database was previously shut down cleanly. It also determines the latest checkpoint system change number (SCN) in the control file and verifies that each datafile is current by comparing the SCN in each datafile header. In the event that a crash occurred and recovery is required, the database automatically determines which log information needs to be applied. The latest redo logs are read, and change information from them is applied to the database files, rolling forward any transactions that were committed but not applied to the database files. Then, any transaction information written to the datafiles but not committed is rolled back using the undo data.


In addition to enabling Oracle databases to recover after an outage, crash recovery is also essential to restartable databases that use the EMC consistency technologies. These consistency technologies in conjunction with TimeFinder, SRDF, or Open Replicator, enable dependent-write consistent database images to be created. When these copies are restarted, the database uses crash recovery mechanisms to transform the dependent-write consistent images into transactionally consistent database images.

Media recovery

Media recovery is another type of Oracle recovery mechanism. Unlike crash recovery, however, media recovery is always user-invoked, and either user-managed or RMAN recovery may be used. Media recovery rolls forward changes to datafiles that have been restored from disk or tape after their loss or corruption. Unlike crash recovery, which uses only the online redo log files, media recovery uses both the online redo logs and the archived log files during the recovery process.

The granularity of a media recovery depends on the requirements of the DBA. It can be performed for an entire database, for a single tablespace, or even for a single datafile. The process involves restoring a copy of a valid backed up image of the required data structure (database, tablespace, datafile) and using Oracle standard recovery methods to roll forward the database to the point in time of the failure by applying change information found in the archived and online redo log files. Oracle uses SCNs to determine the last changes applied to the data files involved. It then uses information in the control files that specifies which SCNs are contained in each of the archive logs to determine where to start the recovery process. Changes are then applied to appropriate datafiles to roll them forward to the point of the last transaction in the logs.

Media recovery is the predominant Oracle recovery mechanism. Media recovery is also used as a part of replicating Oracle databases for business continuity or disaster recovery purposes. Further details of the media recovery process are in the following sections.


Complete recovery

Complete recovery is the primary method of recovering an Oracle database. It is the process of recovering a database to the latest point in time (just before the database failure) without the loss of committed transactions. The complete recovery process involves restoring a part or all of the database data files from a backup image on tape or disk, and then reading and applying all transactions subsequent to the completion of the database backup from the archived and online log files. After restarting the database, crash recovery is performed to make the database transactionally consistent for continued user transactional processing.

The processes needed to perform complete recovery of the database are detailed in the following sections.

Incomplete recovery

Oracle sometimes refers to incomplete database recovery as a point-in-time recovery. Incomplete recovery is similar to complete recovery in the process used to bring the database back to a transactionally consistent state. However, instead of rolling the database forward to the last available transaction, roll-forward procedures are halted at a user-defined prior point. This is typically done to recover a database prior to a point of user error such as the deletion of a table, undesired deletion or modification of customer data, or rollback of an unfinished batch update. In addition, incomplete recovery is also performed when recovery is required, but there are missing or corrupted archive logs. Incomplete recovery always incurs some data loss.
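A user-managed incomplete recovery, after the datafiles have been restored, stops roll-forward at a chosen point. The sketch below shows a standard SQL*Plus until-time sequence; the timestamp is purely illustrative, the connection syntax is assumed, and with the default RUN=echo guard the commands are only printed (clear RUN to execute).

```shell
#!/bin/sh
# Sketch: user-managed incomplete (point-in-time) recovery.
# The timestamp is illustrative. RESETLOGS is required after an
# incomplete recovery so that a new log sequence is started.
RUN=${RUN:-echo}    # dry-run guard: by default, print commands only

incomplete_recover() {
    $RUN sqlplus -s "/ as sysdba" <<'EOF'
startup mount;
recover database until time '2010-06-01:18:00:00';
alter database open resetlogs;
EOF
}

# incomplete_recover
```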

Typically, incomplete recovery operations are performed on the entire database since Oracle needs all database files to be consistent with one another. However, an option called Tablespace Point-in-Time Recovery (usually abbreviated TSPITR), which allows a single tablespace to be recovered to a prior point in time, is also available. This recovery method, in Oracle10g, uses the transportable tablespace feature described in Section 3.8. The Oracle documentation Oracle Database Backup and Recovery Advanced User's Guide provides additional information on TSPITR.


The need to perform incomplete recovery to fix user errors has become less important with the introduction of the Oracle Flashback capabilities. Flashback functionality is described in “Oracle Flashback” on page 224.

Restartable database recovery

In addition to the more commonly used complete and incomplete recovery methods, another database recovery scheme gaining increasing relevance is restartable database recovery. Restartable database recovery differs from incomplete recovery in that EMC consistency technology, rather than Oracle's recovery procedures, is used to make the restartable image.

Using this method, the entire database image written to tape or disk is restored to the point of the backup and the database service is then restarted. At restart, Oracle performs crash recovery, rolling forward any changes that did not make it to the data files and rolling back changes that had not committed. The database is then opened for user activities.

While a restartable database recovery invariably involves the loss of data, since the backup was made at some previous time, there are benefits to restoring restartable database images. Primary among them is the ability to coordinate recovery points across multiple or federated database environments. Managing the recovery points of multiple databases, related application data, or infrastructure messaging queues can be difficult, if not impossible, using traditional backup techniques. Restart procedures in conjunction with EMC consistency technology allow customers to create a point-in-time image of all applications and databases. This reduces or eliminates operational complexity and enables reduced recovery times for complex enterprise environments.


Oracle recovery overview

Oracle has two primary recovery methods: user-managed and RMAN. A traditional user-managed recovery involves restoring data files from tape or disk using operating system commands and performing Oracle recovery operations to create a consistent database image to a specified point in time between the last available backup and the point of failure. User-managed recoveries require careful consideration by the DBA to ensure that the appropriate files (archive logs, control files, datafiles, and such) are available and restored to their correct locations, and that the recovery process is performed successfully.

An alternative to user-managed recovery is RMAN. After using RMAN for the backup process, the utility also may be used to recover the database. RMAN maintains an internal list of backup filenames and their locations in order to automate the recovery process. RMAN is an automated, efficient utility that simplifies the Oracle recovery process.

The following sections describe considerations for the various user-managed recovery processes. These include:

◆ Online versus offline recovery - In most cases, recovery is performed by shutting down the primary database, restoring one or more data files from tape or disk, and recovering the datafile(s). This requires downtime for the database. An alternative to this is recovering a single tablespace, which in some cases can be done online.

◆ Point-in-time versus roll-forward recovery - Historically, recovery procedures involved restoring a copy of the database and using Oracle recovery mechanisms in conjunction with the archive logs (and the last redo logs) to recover the database to the point of a failure. In federated environments, however, restoring a database image to a known state and using Oracle crash recovery procedures to make the database consistent is a growing alternative to traditional roll-forward recovery.

◆ Partial (tablespace or datafile) versus full database recovery - Oracle provides the ability to recover a single data file or tablespace in addition to recovering the entire database. This option is useful for example if a single datafile becomes corrupted or data from a single tablespace is lost.


◆ Incomplete versus complete database recovery - In general, customers want to recover a database fully to the point of a failure. In some cases, however, due to lost or corrupted archive logs from tape, incomplete recovery may be necessary or even desired.

The recovery process consists of three primary steps:

◆ Restoring a database backup (that is, the backed up datafiles or raw devices, from either tape or disk)

◆ Recovering the database using Oracle database recovery methods

◆ Verifying the state of the recovered database including consistency of the database, recovery point, and coordination with other databases or applications

Most recovery procedures require both a restore of a database image and user-initiated recovery of that image to a consistent state. However, this is not always the case. In some circumstances, simply restoring an image of the database and restarting the instance is all that is needed to bring the database to a consistent, defined state and continue operations. Planning for, documenting, and testing the required recovery procedures for a particular database environment are an essential part of maintaining a workable recovery strategy.

Perhaps the most critical, but often overlooked, component of the recovery process is verification of the database once the restore and recovery steps are complete. Important and often difficult management tasks in the recovery process include:

◆ Ensuring that the database is consistent and has not lost any transactions.

◆ Integrating the database recovery process with application information, datafiles stored outside of the database, or other databases or applications.

Verifying the database after restore and recovery depends upon the customer's specific applications, requirements, and environment. As such, it is not discussed further in this document.


Restoring a backup image using TimeFinder

The first step in any recovery process is to restore a backup image from either tape or disk media. In this section, the copy is on disk media and was created by TimeFinder. It is assumed that the disks contain the database image needed for the restore. The exact restore process depends on how the copy on disk was created. If the database image needed for the restore is on tape, the procedures are different and beyond the scope of this document. Ideally, a copy of an Oracle database shut down during the backup process is available for restore, as it provides the greatest flexibility for recovery processing. However, backup images taken while the database was in hot backup mode, or while it was not conditioned in any way, use similar restore procedures.

Restore operations that use only Symmetrix array resources may be performed if EMC TimeFinder was used to create a backup image of the database. If a database copy was made while the database was shut down, this copy can be restored (with TimeFinder, either incrementally or fully) and used as a point-in-time image, or as part of an incomplete or complete recovery of the database. Alternatively, if the backup image was made while the database was in a hot backup state, TimeFinder may also be used to restore an inconsistent image of the database that can be successfully recovered using Oracle incomplete or complete recovery techniques to a user defined point of consistency.

TimeFinder comes in three different forms: TimeFinder/Mirror, TimeFinder/Clone and TimeFinder/Snap. These were discussed in general terms in Chapter 2, “EMC Foundation Products.” Here, they are used in a database recovery context. The following sections describe the restore process when each variant of TimeFinder is used.

Restore using TimeFinder/Mirror

If TimeFinder/Mirror was used to create a backup database image, the TimeFinder restore process can be used to copy the backup image back to the production volumes. Two cases for the restore process exist. In the first case, when a point-in-time restartable database image restore is desired, all of the data files that make up the database including the archive logs, redo logs, and control files are restored from the BCV devices.


This first case is depicted in Figure 44, where both the volumes containing the datafiles and the database recovery structures (archive logs, redo logs, and control files) are restored.

Prior to any disk-based restore using EMC technology, the database must be shut down, and file systems unmounted. The operating system should have nothing in its memory that reflects the content of the database file structures.

Figure 44 Restoring a TimeFinder copy, all components

In most circumstances, only the datafiles (or even a subset of the datafiles) are restored. In these instances, user-initiated complete or incomplete database recoveries are planned. Figure 45 depicts the case where only the datafiles (or a subset of the datafiles) are restored.

Figure 45 Restoring a TimeFinder copy, data components only


In the example that follows, the data_group device group holds all Symmetrix volumes containing Oracle tablespaces. The log_group group has volumes containing the Oracle recovery structures (the archive logs, redo logs, and control files). The following steps describe the process needed to restore the database image from the BCVs:

1. Verify the state of the BCVs. All volumes in the Symmetrix device group should be in a split state. The following commands identify the state of the BCVs for each of the device groups:

symmir -g data_group query
symmir -g log_group query

2. Shut down the database on the production volumes. From a storage perspective, the restore process does not require the database to be shut down. However, because data blocks are changing at the storage layer, Oracle is not aware of changes occurring during the restore process. As such, data in the SGA may not be consistent with data on disk. This inconsistency requires a brief outage of the database while the restore process is initiated.

sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> startup restrict;
SQL> shutdown

3. After the primary database has shut down, unmount the file system (if used) to ensure that nothing remains in cache. This action is operating-system dependent.

4. Once the primary database has shut down successfully and the file system is unmounted, initiate the BCV restore process. In this example, both the data_group and log_group device groups are restored, indicating a point-in-time recovery. If an incomplete or complete recovery is required, only the data_group device group would be restored. Execute the following TimeFinder/Mirror SYMCLI commands:

symmir -g data_group restore -nop
symmir -g log_group restore -nop

symmir -g data_group query
symmir -g log_group query


5. Once the BCV restore process has been initiated, the production database copy is ready for recovery operations. It is possible to start the recovery process even though the data is still being restored from the BCV to the production devices. Any tracks needed, but not restored, will be pulled directly from the BCV device. It is recommended however, that the restore operation completes and the BCVs are split from the standard devices before the source database is started and recovery (if required) is initiated.

Note: It is important to understand that if the database is restarted before the restore process completes, any changes to the source database volumes will also be written to the BCVs. This means that the copy on the BCV will no longer be a consistent database image. It is always recommended that the restore process completes and the BCVs are split from the source volumes before processing or recovery is initiated on the source devices.

6. After the restore process completes, split the BCVs from the standard devices with the following commands:

symmir -g data_group split -nop
symmir -g log_group split -nop
symmir -g data_group query
symmir -g log_group query
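The restore and split steps above can be collected into a single script. This is an illustrative sketch only: the data_group and log_group names come from the example, while the DRY_RUN wrapper and the use of symmir verify to wait for the Restored state are assumptions, not part of the documented procedure. On a real host, DRY_RUN=0 would be set only after the database is shut down and file systems are unmounted.

```shell
# Sketch: TimeFinder/Mirror point-in-time restore of data and log groups.
# DRY_RUN=1 (the default) prints each SYMCLI command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "$@"               # preview mode: show the command only
    else
        "$@" || exit 1          # stop at the first SYMCLI failure
    fi
}

# Restore both device groups (datafiles plus recovery structures).
run symmir -g data_group restore -noprompt
run symmir -g log_group restore -noprompt

# Wait for the restore to complete before touching the source devices.
run symmir -g data_group verify -restored -i 30
run symmir -g log_group verify -restored -i 30

# Split the BCVs so that new source writes no longer reach them.
run symmir -g data_group split -noprompt
run symmir -g log_group split -noprompt
```

The dry-run wrapper lets the full command sequence be reviewed before any devices are touched.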

Restore using TimeFinder/Clone

TimeFinder/Clone allows a target clone image to be restored back to the source device or to an unrelated target device. Prior to Solutions Enabler 6.0, data could be restored from a clone target back to its source device only by performing a reverse clone operation. The clone relationship was terminated between the source and the target; the target was then used as the source for creating and activating a new clone relationship. However, with SYMCLI 6.0 running with Enginuity 5671 code, an operation similar to a TimeFinder/Mirror restore can be performed without terminating the original clone session. A prerequisite of this is that the clone operation must have been created with the -differential option.
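Because a later restore is only possible when the clone session was created with the -differential option, that create/activate sequence is worth sketching. The device group name is taken from the surrounding example; the DRY_RUN wrapper, which merely prints the SYMCLI commands, is an illustrative assumption for safe preview.

```shell
# Sketch: create a differential TimeFinder/Clone session so that an
# in-place restore back to the source is possible later.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" -eq 1 ]; then echo "$@"; else "$@"; fi; }

run symclone -g data_group create -differential -noprompt
run symclone -g data_group activate -noprompt

# Later, the same (still active) differential session supports:
run symclone -g data_group restore -noprompt
```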

If TimeFinder/Clone is used to create a database backup image, the TimeFinder/Clone restore process can be used to copy the backup image back to the production volumes. Figure 46 on page 209 depicts the necessary steps to restore a database copy of an Oracle database using TimeFinder/Clone. In this example, both the volumes containing data files and database recovery structures (archive logs, redo logs, and control files) are restored in anticipation that a point-in-time recovery (rather than a complete or incomplete recovery) will be performed. Alternatively, if complete recovery is required, only the volumes containing the database datafiles may be restored, as is shown in Figure 47.

Figure 46 Restoring a TimeFinder/Clone copy, all components

Figure 47 Restoring a TimeFinder/Clone copy, data components only


In the example that follows, the data_group device group holds all Symmetrix volumes containing Oracle tablespaces. The log_group group has volumes containing the Oracle recovery structures (the archive logs, redo logs, and control files). Follow these steps to restore the database image from the clone devices:

1. Verify the state of the clone devices. Volumes in the Symmetrix device group should be in an active state, although the relationship between the source and target volumes may have terminated. The following commands identify the state of the clones for each of the device groups (the -multi flag is used to show all relationships available):

symclone -g data_group query -multi
symclone -g log_group query -multi

2. Shut down the database on the production volumes. From a storage perspective, the restore process does not require the database to be shut down. However, because data blocks are changing at the storage layer, Oracle is not aware of changes taking place during the restore process. As such, data in the SGA may not be consistent with data on disk. This inconsistency requires a brief outage of the database while the restore process is initiated.

sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> startup restrict;
SQL> shutdown

3. After the primary database has shut down, unmount the file system (if used) to ensure that nothing remains in server cache. This action is operating-system dependent.

4. Initiate the clone restore process. In this example, both the data_group and log_group device groups are restored, indicating a point-in-time recovery. If an incomplete or complete recovery is required, only the data_group device group would be restored. Execute the following TimeFinder/Clone SYMCLI commands:

symclone -g data_group restore -nop
symclone -g log_group restore -nop
symclone -g data_group query -multi
symclone -g log_group query -multi


5. After the clone restore process is initiated, the production database copy is ready for recovery operations. It is possible to start the recovery process even though the data is still being restored from the clone to the production devices. Any tracks needed, but not yet restored, are pulled directly from the clone device.

6. After the restore process completes, terminate the clone/standard relationships as follows:

symclone -g data_group terminate -nop
symclone -g log_group terminate -nop
symclone -g data_group query -multi
symclone -g log_group query -multi

Restore using TimeFinder/Snap

TimeFinder/Snap allows a target virtual database image to be restored back to the source device or to an unrelated target device. Prior to Solutions Enabler 5.4, when data was restored from a snap back to its source device, any other snap sessions created were terminated. Beginning with SYMCLI 5.4, restore operations using TimeFinder/Snap can maintain the relationship between the source device and any other snaps. Additional snap sessions persist through a restore operation to the source device.

If TimeFinder/Snap was used to create a backup database image, the TimeFinder restore process can be used to copy the backup image to the production volumes. Figure 48 depicts the necessary steps to restore a database copy of an Oracle database using TimeFinder/Snap. In this example, both the volumes containing data files and database recovery structures (archive logs, redo logs, and control files) are restored in anticipation that a point-in-time recovery (rather than a complete or incomplete recovery) will be performed.


Alternatively, if complete recovery is required, only the volumes containing the database data files may be restored, as shown in Figure 49.

Figure 48 Restoring a TimeFinder/Snap copy, all components

Figure 49 Restoring a TimeFinder/Snap copy, data components only

1. Verify the state of the snap devices. Volumes in the Symmetrix device group should be in an active state, although the relationship between the source volumes and virtual devices may also have been terminated. The following commands identify the state of the snap sessions for each of the device groups (the -multi flag is used to show all relationships available):

symsnap -g data_group query -multi
symsnap -g log_group query -multi

2. Shut down the database on the production volumes. From a storage perspective, the restore process does not require the database to be shut down. However, because data blocks are changing at the storage layer, Oracle is unaware of changes occurring during the restore process. As such, data in the SGA may not be consistent with data on disk. This inconsistency requires a brief outage of the database while the restore process is initiated.

sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> startup restrict;
SQL> shutdown

3. After the primary database shuts down, unmount the file system (if used) to ensure that nothing remains in cache. This action is operating-system dependent.

4. Once the file systems are unmounted, initiate the snap restore process. In this example, both the data_group and log_group device groups are restored, indicating a point-in-time recovery. If an incomplete or complete recovery is required, only the data_group device group would be restored. Execute the following TimeFinder/Snap SYMCLI commands:

symsnap -g data_group restore -nop
symsnap -g log_group restore -nop

symsnap -g data_group query -restore
symsnap -g data_group query -multi
symsnap -g log_group query -restore
symsnap -g log_group query -multi

5. After the snap restore process is initiated, the production database copy is ready for recovery operations. It is possible to start the recovery process even though the data is still being restored to the production devices. Any tracks needed, but not yet restored, are pulled directly from the save devices.


6. When the snap restore process is initiated, both the snap device and the source are set to a Not Ready status (that is, they are offline to host activity). Once the restore operation commences, the source device is set to a Ready state. Upon completion of the restore process, terminate the restore operations as follows:

symsnap -g data_group terminate -restored -noprompt
symsnap -g log_group terminate -restored -noprompt
symsnap -g data_group query
symsnap -g log_group query

Note: Terminating the restore session does not terminate the underlying snap session.

7. If the snap device is needed for further processing once the restore process completes, the virtual devices must be set to a Ready state again. This is accomplished through the command:

symld -g data_group ready VDEV001 -vdev

and so on for each VDEV in the device group.
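The per-device ready operation can be scripted as a simple loop. The VDEV names below are illustrative placeholders for the logical device names that actually exist in the group, and the DRY_RUN wrapper that merely prints the commands is an assumption for safe preview.

```shell
# Sketch: set every virtual device in the group back to Ready after
# a snap restore. Replace the names with the group's actual VDEVs.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" -eq 1 ]; then echo "$@"; else "$@"; fi; }

for vdev in VDEV001 VDEV002 VDEV003; do
    run symld -g data_group ready "$vdev" -vdev
done
```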

Alternatively, after the restore process completes, the snap/standard relationships can be terminated with the following commands:

symsnap -g data_group terminate -noprompt
symsnap -g log_group terminate -noprompt
symsnap -g data_group query -multi
symsnap -g log_group query -multi


Restoring a backup image using Replication Manager

Replication Manager provides automated restore procedures through a graphical user interface that simplifies the restore process. If Replication Manager is used to create a replica of the database, a restore process can simply be initiated through the Replication Manager interface.

Figure 50 demonstrates the steps performed by Replication Manager using TimeFinder/Mirror to restore a database copy so that recovery can be performed. Note that Replication Manager has the ability to leave the database in a restored state so the DBA can initiate recovery procedures or it can start the recovery process automatically. Replication Manager has several options for restoring and recovering an Oracle database.

Figure 50 Restoring Oracle using EMC Replication Manager

1. Shut down the primary Oracle database so that data can be restored.

2. Replication Manager initiates a restore operation to copy the backed up version of the database back over the primary Oracle database's datafiles.

3. A copy of the backup control file is copied from the Replication Manager host to the Oracle host for recovery procedures.

4. Copies of the required archive logs are copied from the Replication Manager host to the Oracle host for recovery procedures.


5. At this point, the database is ready for recovery. Depending on how it is configured, recovery operations may be manually or automatically initiated.

Note: For more information on Replication Manager capabilities with Oracle, consult the latest Replication Manager Product Guide or Replication Manager Administrator's Guide.


Oracle database recovery procedures

After the database is restored from either tape or disk, the next step in the process is to perform recovery procedures to bring the database into a consistent state and open the database for user transactions. The type of recovery implemented depends on the backup process originally used. The recovery procedures required for each of the restored database images discussed in “Restoring a backup image using TimeFinder” on page 205 and “Restoring a backup image using Replication Manager” on page 215 are described next.

Oracle restartable database recovery procedures

An Oracle point-in-time recovery here refers to the process needed to recover a backup database image taken using one of the EMC consistency technologies. Recovery examples for cold, hot backup mode, quiesced, and consistently split database backup images are described next. For each case, the recovery process is managed by the database itself through Oracle crash recovery procedures, rather than media recovery operations.

Restartable database recovery from a cold backup image

Creating a point-in-time recovered database image usually requires Oracle crash recovery procedures. However, after taking a cold backup image, no database recovery is required to bring the datafiles into a consistent state. Ensure the restored database files are available. The database can then be restarted using the commands:

sqlplus "/ as sysdba"
SQL> startup;

Restartable database recovery from a hot backup image

For customers with the ability to perform consistent TimeFinder splits, utilizing EMC consistent splits with hot backup mode provides an effective way to create both restartable and recoverable database images. Using both together maximizes the flexibility of options should either restart or recovery be required.

A point-in-time recovery of a restored image taken with the source in hot backup mode in general requires user-initiated recovery. To recover the hot backup image, follow these steps:

1. Recover the database to the point when the tablespaces were taken out of hot backup mode:


sqlplus "/ as sysdba"
SQL> startup mount;
SQL> recover database;

The recover database command initiates recovery procedures within Oracle. The control files and each of the datafiles are examined. The database then lists the appropriate archive logs to apply. Each archive log is then applied in turn to the point when the tablespaces were taken out of hot backup mode.

2. Once recovery is complete, open the database to users:

SQL> alter database open;

Note: It is also possible to simply restart the database as shown in the next section.

Restartable database recovery from a consistent split backup image

Creating a point-in-time restartable database image only requires Oracle crash recovery procedures to open it to users. Ensure the restored database files are available. The database can then be restarted using the commands:

sqlplus "/ as sysdba"
SQL> startup;

Oracle complete recovery

A complete recovery requires that all committed transactions are applied to the restored database image. Oracle media recovery procedures are initiated to determine whether any datafiles in the backup image need recovery. Oracle also determines which archive logs to apply to roll the database forward to its state before the failure occurred.

In most cases, the source database must be shut down for recovery procedures to be initiated. In some cases however, for example when only a single datafile or tablespace needs to be recovered, the process can be completed with the database open (only the tablespace needs to be taken offline).

The following subsections describe the recovery procedures for Oracle complete recovery. The type of recovery implemented depends on the backup process originally used. The recovery procedures required for each of the restored database images discussed in “Restoring a backup image using TimeFinder” on page 205 and “Restoring a backup image using Replication Manager” on page 215 are described next.

Oracle complete recovery from a cold backup image

Complete recovery using a cold backup image of the database is easily managed as the database is already in a consistent transactional state. SCN information in the control file defines the recovery point for the database. Each of the datafiles in the database also contains the latest SCN checkpoint. This information is compared and the set of archive logs needed to roll forward the database is determined.
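The SCN comparison described above can also be inspected manually before initiating recovery. This is an illustrative step using standard Oracle dynamic performance views, not part of the documented procedure; the database must be mounted first:

```sql
SQL> startup mount;
SQL> select checkpoint_change# from v$database;
SQL> select file#, checkpoint_change# from v$datafile_header;
```

Datafiles whose header SCN is lower than the control file checkpoint SCN are the ones Oracle rolls forward with archived redo.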

To recover the database from a cold backup image, follow these steps:

1. Recover the database to the point of the latest transactions.

sqlplus "/ as sysdba"
SQL> startup mount;
SQL> recover database;

The recover database command initiates recovery procedures within Oracle. The control files and each of the datafiles are examined. The database then lists the appropriate archive logs to apply. Each archive log is then applied in turn to roll the database forward to the point of the latest transactions. Additionally, the latest redo log information can be applied to the database by specifying the latest logs.

2. Once recovery is complete, open the database to users:

SQL> alter database open;

Oracle complete recovery from a hot backup image

In general, a point-in-time recovery of a restored image taken with the source in hot backup mode requires user-initiated recovery. The following demonstrates the process of recovering the hot backup image:

1. The database must be fully recovered to the point of database failure. This requires that all archive logs and the latest redo logs are available.

sqlplus "/ as sysdba"
SQL> startup mount;
SQL> recover database;


The recover database command initiates recovery procedures within Oracle. The control files and each of the datafiles are examined. The database then lists the appropriate archive logs to apply. Information in each archive log is then applied to the database in turn. This is a manual process, although it can be automated if all the logs are in an appropriate directory (such as the flash recovery area) by specifying auto when Oracle requests a specific log.

2. Once recovery is complete, open the database to users:

SQL> alter database open;
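The log-by-log prompting described in step 1 can also be suppressed entirely with the SQL*Plus AUTORECOVERY setting, which applies each suggested archive log automatically. A sketch, assuming the archive logs are available in the configured archive destination:

```sql
sqlplus "/ as sysdba"
SQL> startup mount;
SQL> set autorecovery on;
SQL> recover database;
SQL> alter database open;
```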

Oracle complete recovery from a consistent split backup image

Currently, Oracle does not support complete recovery for a database image taken using the -consistent split option of the TimeFinder products without putting the database in hot backup mode. Consistent split images are only supported to create restartable database images, rather than recoverable ones. Consistent split images with the database in hot backup mode can be used for both restart and recovery. Using EMC consistency technology in conjunction with hot backup mode is recommended because of the flexibility in recovery options offered.

Oracle incomplete recovery

Incomplete recovery procedures are nearly identical to the complete recovery steps defined in the last section. The primary difference however, is that instead of rolling the database to the last available transactions in the redo logs, data is only rolled forward to an earlier point specified by the DBA. Additionally, the database must be opened using the open resetlogs option.

Incomplete recovery is used for a number of reasons. User errors or logical corruptions are the primary reasons that incomplete recoveries are performed (although a new alternative is the Oracle Flashback technology). Another reason for performing incomplete recovery is due to missing archive logs during a complete recovery procedure.

Oracle incomplete recovery from a cold backup image

An incomplete recovery of a restored cold backup image requires user-initiated recovery. To recover the cold backup image, follow these steps:


1. Recover the database to the point in time specified by the DBA:

sqlplus "/ as sysdba"
SQL> startup mount;
SQL> recover database until cancel;

The recover database command initiates recovery procedures within Oracle. The control files and each of the datafiles are examined. The database then lists the appropriate archive logs to apply. Each archive log is then applied in turn to a point specified by the DBA.

Alternatively, the recover database command has additional options. For example, the database can be recovered to a specific SCN or to a particular timestamp using the following:

SQL> recover database until change SCN;

or

SQL> recover database until time timestamp;
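As a concrete illustration of the two forms (the SCN and timestamp values below are hypothetical; note that recover ... until time expects the fixed 'YYYY-MM-DD:HH24:MI:SS' format):

```sql
SQL> recover database until change 1234567;
SQL> recover database until time '2011-06-30:17:45:00';
```

Only one of the two forms is issued in a given recovery.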

2. Once recovery is complete, open the database using the open resetlogs option. This is necessary because the database was not recovered fully to the point of failure.

SQL> alter database open resetlogs;

After opening the database with the resetlogs option, you should immediately perform a full database backup.

Oracle incomplete recovery from a hot backup image

In general, a point-in-time recovery of a restored image taken with the source in hot backup mode requires user-initiated recovery. The following demonstrates the process of recovering the hot backup image:

1. Recover the database to a point beyond where the tablespaces were taken out of hot backup mode.

sqlplus "/ as sysdba"
SQL> startup mount;
SQL> recover database until cancel;

The recover database command initiates recovery procedures within Oracle. The control files and each of the datafiles are examined. The database then lists the appropriate archive logs to apply. Each archive log is then applied in turn until the DBA cancels recovery at the chosen point, which must be after all of the tablespaces were taken out of hot backup mode.


Alternatively, the recover database command has additional options. For example, the database can be recovered to a specific SCN or to a particular timestamp using the following:

SQL> recover database until change SCN;

or

SQL> recover database until time timestamp;

2. Once recovery is complete, open the database using the open resetlogs option:

SQL> alter database open resetlogs;

After opening the database with the resetlogs option, immediately perform a full database backup.

Oracle incomplete recovery from a consistent split backup image

Currently, Oracle does not support complete recovery for a database image taken using the -consistent split option of the TimeFinder products without putting the database in hot backup mode. Consistent split images are only supported to create restartable database images, rather than recoverable ones. Consistent split images with the database in hot backup mode can be used for both restart and recovery. Using EMC consistency technology in conjunction with hot backup mode is recommended because of the flexibility in recovery options offered.


Database recovery using Oracle RMAN

As stated in Chapter 4, “Backing Up Oracle Environments,” Oracle Recovery Manager provides DBAs with many options when performing recovery operations of an Oracle database backed up with the utility. The details of how RMAN may be used to restore and recover an Oracle database are beyond the scope of this document. The Oracle documentation Oracle Database Backup and Recovery Basics and Oracle Database Backup and Recovery Advanced User's Guide provide additional detailed information on RMAN.


Oracle Flashback

Oracle Flashback is a technology that helps DBAs recover from user errors to the database. Initial Flashback functionality was provided in Oracle9i and was greatly enhanced in Oracle10g. Flashback retains undo data in the form of flashback logs, which the database writes periodically so that the various types of Flashback can work.

Each type of Flashback relies on undo data being written to the flash recovery area. The flash recovery area is a file system Oracle uses to retain the flashback logs, archive logs, backups, and other recovery-related files.

Some of the ways Flashback helps DBAs recover from user errors are:

◆ Flashback Query
◆ Flashback Version Query
◆ Flashback Transaction Query
◆ Flashback Table
◆ Flashback Drop
◆ Flashback Database

Each of these recovery methods is described in the following sections.

Note: Flashback is a recovery mechanism for logical or user errors. It is not a utility to be used in place of traditional backup and recovery techniques and is not designed to solve physical or media errors.

Flashback configuration

Flashback is enabled in a database by creating a flash recovery area in which the Flashback logs are retained, and by enabling Flashback logging. Flashback allows the database to be flashed back to any point in time. However, the Flashback logs represent discrete database points in time; as such, ARCHIVELOG mode must also be enabled for the database. Archive log information is used in conjunction with the flashback logs to re-create any desired database point-in-time state.

The default flash recovery area is defined by the Oracle initialization parameter DB_RECOVERY_FILE_DEST. It is important to set this parameter to the location of a directory that can hold the flashback logs. The required size of this file system depends on how far back a user may want to flash back the database, and whether


other objects, such as archive logs and database backups, will be written to this directory. The maximum size of this directory is specified by the DB_RECOVERY_FILE_DEST_SIZE parameter (no default). In some cases, Oracle recommends sizing the flash recovery area at up to three times the actual database size.
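A minimal configuration sketch follows. The location, size, and retention values are hypothetical and should be sized for the specific environment; note that DB_RECOVERY_FILE_DEST_SIZE must be set before DB_RECOVERY_FILE_DEST can be set:

SQL> alter system set db_recovery_file_dest_size = 100G scope=both;
SQL> alter system set db_recovery_file_dest = '/u01/flash_recovery_area' scope=both;
SQL> alter system set db_flashback_retention_target = 1440 scope=both;

These commands take effect immediately and persist across restarts when a server parameter file (spfile) is in use.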

To enable Flashback log generation in a database, enter the following:

SQL> startup mount;
SQL> alter database flashback on;

To turn off Flashback, enter the following:

SQL> startup mount;
SQL> alter database flashback off;

To identify the state of Flashback, use the following query:

SELECT name, current_scn, flashback_on
FROM v$database;

In addition to establishing the flash recovery area and enabling Flashback log generation, set the initialization parameter DB_FLASHBACK_RETENTION_TARGET (default of 1440 minutes, or one day) to define the targeted amount of Flashback log retention. This parameter determines how far back a database can be flashed back. In addition, the oldest currently available SCN and time in the Flashback logs can be determined through the query:

SELECT oldest_flashback_scn, oldest_flashback_time
FROM v$flashback_database_log;

Additional information concerning the flashback logs may also be found in the v$flashback_database_log view.

Flashback Query

Flashback Query displays the results of a query as they would have appeared at a previous point in time. For example, if a user erroneously deleted a selection of rows from a table, Flashback Query allows that user to run queries against the table as it existed before the deletion.

The following is an example of the Flashback Query functionality:

SELECT first_name, last_name
FROM emp
AS OF TIMESTAMP TO_TIMESTAMP('2005-11-25 11:00:00', 'YYYY-MM-DD HH:MI:SS')
WHERE salary = '100000';

This can also be used to restore deleted rows. For example:

INSERT INTO emp
  (SELECT first_name, last_name
   FROM emp
   AS OF TIMESTAMP TO_TIMESTAMP('2005-11-25 11:00:00', 'YYYY-MM-DD HH:MI:SS')
   WHERE last_name = 'PENDLE');

Flashback Version Query

Flashback Version Query displays the versions of rows in a table during a specified time interval. This functionality is helpful in auditing changes to particular rows in a database, as well as for seeing previous values of rows during a set time interval.
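The following sketch (reusing the hypothetical emp table and timestamps from the earlier examples) uses the VERSIONS BETWEEN clause to return each version of the qualifying rows, along with pseudocolumns identifying when and how each version was created:

SELECT versions_startscn, versions_operation, first_name, last_name
FROM emp
VERSIONS BETWEEN TIMESTAMP
  TO_TIMESTAMP('2005-11-25 10:00:00', 'YYYY-MM-DD HH:MI:SS')
  AND TO_TIMESTAMP('2005-11-25 11:00:00', 'YYYY-MM-DD HH:MI:SS')
WHERE last_name = 'PENDLE';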

Flashback Transaction Query

Flashback Transaction Query presents the changes made by a transaction, or by sets of transactions, in the database during a specified time period.
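As an illustrative sketch, Flashback Transaction Query is exposed through the FLASHBACK_TRANSACTION_QUERY view; the table owner and table name below are hypothetical. The UNDO_SQL column supplies the SQL statement needed to reverse each change:

SELECT xid, operation, start_scn, undo_sql
FROM flashback_transaction_query
WHERE table_owner = 'SCOTT'
AND table_name = 'EMP';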

Flashback Table

Flashback Table returns a table to the state it was in at a specified time. It is particularly useful because the change can be made while the database is up and running. The following is an example of the Flashback Table functionality:

FLASHBACK TABLE emp
TO TIMESTAMP TO_TIMESTAMP('2005-11-26 10:30:00', 'YYYY-MM-DD HH:MI:SS');

An SCN can also be used:

FLASHBACK TABLE emp
TO SCN 54395;

Flashback Drop

If a table is dropped inadvertently using a DROP TABLE command, Flashback Drop can reverse the process, restoring access to the dropped table. As long as space is available, the DROP TABLE command does not delete data in the tablespace data files. Instead,


the table data is retained (in Oracle's "recycle bin") and the table is renamed with a system-generated name. If the table is needed again, Oracle can bring it back by renaming it to its original name.

The following shows an example of a table being dropped and then brought back using the FLASHBACK TABLE command.

1. Determine the tables owned by the currently connected user:

SQL> SELECT * FROM tab;

TNAME                          TABTYPE    CLUSTERID
------------------------------ ---------- ----------
TEST                           TABLE

2. Drop the table:

SQL> DROP TABLE test;
SQL> SELECT * FROM tab;

no rows selected

3. Ensure the table is placed in the recycle bin:

SQL> show recyclebin;

ORIGINAL NAME RECYCLEBIN NAME       OBJECT TYPE DROP TIME
------------- --------------------- ----------- --------------
TEST          BIN$wdadid/3akdah3a69 TABLE       2005-11-26:10:

4. Recover the table:

FLASHBACK TABLE test
TO BEFORE DROP;

5. Verify the table is back:

SQL> SELECT * FROM tab;

TNAME                          TABTYPE    CLUSTERID
------------------------------ ---------- ----------
TEST                           TABLE

Flashback Database

Flashback Database logically recovers the entire database to a previous point in time. A database can be rolled back in time to a point before a user error, such as a batch update or a set of transactions, logically corrupted the database. The database can be rolled back to a particular SCN, redo log sequence number, or timestamp. The following is the syntax of the FLASHBACK DATABASE command:


FLASHBACK [DEVICE TYPE = device_type] DATABASE
    TO [BEFORE] SCN = scn
  | TO [BEFORE] SEQUENCE = sequence# [THREAD = thread_id]
  | TO [BEFORE] TIME = 'date_string'

The following is an example of Flashback Database used to recover database table data inadvertently dropped. The SCN just before the bad transactions is identified, and the database is flashed back to it.

1. Identify the available Flashback logs. The following query lists the available Flashback logs and the first SCN associated with each one:

SELECT log#, first_change#, first_time
FROM v$flashback_database_logfile;

Verify that the Flashback logs contain the particular SCN desired.

2. Shut down the database and restart it in mount mode for the full database flashback.

SQL> shutdown immediate;

SQL> startup mount;

3. Flash back the database.

SQL> flashback database to scn = 23104;

4. Open the database for use. Because the database has been taken back in time, open it with the resetlogs option:

SQL> alter database open resetlogs;

After opening the database with the resetlogs option, immediately perform a full database backup.

Chapter 6: Understanding Oracle Disaster Restart & Disaster Recovery

This chapter presents these topics:

◆ Introduction
◆ Definitions
◆ Design considerations for disaster restart and disaster recovery
◆ Tape-based solutions
◆ Remote replication challenges
◆ Array-based remote replication
◆ Planning for array-based replication
◆ SRDF/S single Symmetrix array to single Symmetrix array
◆ SRDF/S and consistency groups
◆ SRDF/A
◆ SRDF/AR single hop
◆ SRDF/AR multihop
◆ Database log-shipping solutions
◆ Running database solutions

Introduction

A critical part of managing a database is planning for unexpected loss of data. The loss can occur from a disaster such as a fire or flood, or it can come from hardware or software failures. It can even come through human error or malicious intent. In each instance, the database must be restored to some usable point before application services can resume.

The effectiveness of any plan for restart or recovery involves answering the following questions:

◆ How much downtime is acceptable to the business?

◆ How much data loss is acceptable to the business?

◆ How complex is the solution?

◆ Does the solution accommodate the data architecture?

◆ How much does the solution cost?

◆ What disasters does the solution protect against?

◆ Is there protection against logical corruption?

◆ Is there protection against physical corruption?

◆ Is the database restartable or recoverable?

◆ Can the solution be tested?

◆ If failover happens, will failback work?

All restart and recovery plans include a replication component. In its simplest form, the replication process may be as easy as making a tape copy of the database and application. In a more sophisticated form, it could be real-time replication of all changed data to some remote location. Remote replication of data has its own challenges, centered on:

◆ Distance
◆ Propagation delay (latency)
◆ Network infrastructure
◆ Data loss

This section provides an introduction to the spectrum of disaster recovery and disaster restart solutions for Oracle databases on EMC Symmetrix arrays.


Definitions

In the following sections, the terms dependent-write consistency, database restart, database recovery, and roll-forward recovery are used. A clear definition of these terms is required to understand the context of this section.

Dependent-write consistency

A dependent-write I/O is one that cannot be issued until a related predecessor I/O has completed. Dependent-write consistency is a data state where data integrity is guaranteed by dependent-write I/Os embedded in application logic. Database management systems are good examples of the practice of dependent-write consistency.

Database management systems must devise protection against abnormal termination to successfully recover from one. The most common technique used is to guarantee that a dependent-write cannot be issued until a predecessor write has completed. Typically the dependent-write is a data or index write while the predecessor write is a write to the log. Because the write to the log must be completed prior to issuing the dependent-write, the application thread is synchronous to the log write (that is, it waits for that write to complete prior to continuing). The result of this strategy is a dependent-write consistent database.

Database restart

Database restart is the implicit application of database logs during the database's normal initialization process to ensure a transactionally consistent data state.

If a database is shut down normally, the process of getting to a point of consistency during restart requires minimal work. If the database abnormally terminates, then the restart process will take longer depending on the number and size of in-flight transactions at the time of termination. An image of the database created by using EMC consistency technology while it is running, without conditioning the database, will be in a dependent-write consistent data state, which is similar to that created by a local power failure. This is also known as a DBMS restartable image. The restart of this image transforms it to a


transactionally consistent data state by completing committed transactions and rolling back uncommitted transactions during the normal database initialization process.

Database recovery

Database recovery is the process of rebuilding a database from a backup image, and then explicitly applying subsequent logs to roll forward the data state to a designated point of consistency. Database recovery is only possible with databases configured with archive logging.

A recoverable Oracle database copy can be taken in one of three ways:

◆ With the database shut down, copying the database components using external tools

◆ With the database running, using the Oracle backup utility Recovery Manager (RMAN)

◆ With the database in hot backup mode, copying the database using external tools
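For the third method, a minimal sketch follows (the ALTER DATABASE BEGIN BACKUP form assumes Oracle10g or later; earlier releases require per-tablespace ALTER TABLESPACE ... BEGIN BACKUP commands):

SQL> alter database begin backup;
-- copy or split the data file volumes with external tools
SQL> alter database end backup;
SQL> alter system archive log current;

Archiving the current log after ending backup mode ensures the redo needed to recover the copy is available.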

Roll-forward recovery

With some databases, it may be possible to take a DBMS restartable image of the database and apply subsequent archive logs to roll the database forward to a point in time after the image was created. This means the image can be used in a backup strategy in combination with archive logs. At the time of printing, a DBMS restartable image of Oracle cannot use subsequent logs to roll forward transactions. In most cases, during a disaster, the storage array image at the remote site will be an Oracle DBMS restartable image and cannot have archive logs applied to it.


Design considerations for disaster restart and disaster recovery

Loss of data or loss of application availability has a varying impact from one business type to another. For instance, the loss of transactions for a bank could cost millions, whereas system downtime may not have a major fiscal impact. On the other hand, businesses that are primarily web-based may require 100 percent application availability to survive. These two factors, loss of data and loss of uptime, are the business drivers that form the baseline requirements for a DR solution. When quantified, they are more formally known as Recovery Point Objective (RPO) and Recovery Time Objective (RTO), respectively.

When evaluating a solution, the RPO and RTO requirements of the business need to be met. In addition, the solution needs to consider operational complexity, cost, and the ability to return the whole business to a point of consistency. Each aspect is discussed in the following sections.

Recovery Point Objective

The RPO is a point of consistency to which a user wants to recover or restart. It is measured in the amount of time from when the point of consistency was created or captured to the time the disaster occurred. This time equates to the acceptable amount of data loss. Zero data loss (no loss of committed transactions from the time of the disaster) is the ideal goal, but the high cost of implementing such a solution must be weighed against the business impact and cost of a controlled data loss.

Some organizations, like banks, have zero data loss requirements. The database transactions entered at one location must be replicated immediately to another location. This can have an impact on application performance when the two locations are far apart. On the other hand, keeping the two locations close to one another might not protect against a regional disaster like the Northeast power outage or the hurricanes in Florida.

Defining the required RPO is usually a compromise between the needs of the business, the cost of the solution, and the risk of a particular event happening.


Recovery Time Objective

The RTO is the maximum amount of time allowed for recovery or restart to a specified point of consistency. This time involves many factors, including the time taken to:

◆ Provision power, utilities, etc.

◆ Provision servers with the application and database software.

◆ Configure the network.

◆ Restore the data at the new site.

◆ Roll forward the data to a known point of consistency.

◆ Validate the data.

Some delays can be reduced or eliminated by choosing certain DR options, such as having a hot site where servers are preconfigured and on standby. Also, if storage-based replication is used, the time taken to restore the data to a usable state is completely eliminated.

As with RPO, each solution for RTO will have a different cost profile. Defining the RTO is usually a compromise between the cost of the solution and the cost to the business when database and applications are unavailable.

Operational complexity

The operational complexity of a DR solution may be the most critical factor in determining the success or failure of a DR activity. The complexity of a DR solution can be considered in three separate phases:

1. Initial configuration of the implementation

2. Maintenance and management of the running solution

3. Execution of the DR plan in the event of a disaster

While initial configuration complexity and running complexity can be a demand on human resources, the third phase, execution of the plan, is where automation and simplicity must be the focus. When a disaster is declared, key personnel may be unavailable in addition to the loss of servers, storage, networks, buildings, and so on. If the complexity of the DR solution is such that skilled personnel with an


intimate knowledge of all systems involved are required to restore, recover and validate application and database services, the solution has a high probability of failure.

Multiple database environments grow organically over time into complex federated database architectures. In these federated database environments, reducing the complexity of DR is absolutely critical. Validation of transactional consistency within the database architecture is time consuming, costly, and requires application and database familiarity. One reason for the complexity is the heterogeneous databases and operating systems involved in these environments. Across multiple heterogeneous platforms it is hard to establish a common clock and therefore hard to determine a business point of consistency across all platforms. This business point of consistency has to be created from intimate knowledge of the transactions and data flows.

Source server activity

DR solutions may or may not require additional processing activity on the source servers. The extent of that activity can impact both response time and throughput of the production application. This effect should be understood and quantified for any given solution to ensure the impact to the business is minimized. The effect for some solutions is continuous while the production application is running; for other solutions, the impact is sporadic, where bursts of write activity are followed by periods of inactivity.

Production impact

Some DR solutions delay host activity while taking actions to propagate the changed data to another location. This delay affects only write activity, and although it may be on the order of only a few milliseconds, it can impact response time in a write-intensive environment. Synchronous solutions introduce delay into write transactions at the source site; asynchronous solutions do not.

Target server activity

Some DR solutions require a target server at the remote location to perform DR operations. The server has both software and hardware costs and needs personnel with physical access to it for basic


operational functions like power on and off. Ideally, this server could have some usage such as running development or test databases and applications. Some DR solutions require more target server activity and some require none.

Number of copies of data

DR solutions require replication of data in one form or another. Replication of a database and associated files can be as simple as making a tape backup and shipping the tapes to a DR site, or as sophisticated as asynchronous array-based replication. Some solutions require multiple copies of the data to support DR functions. Additional copies may be required to test the DR solution, beyond those that support the DR process itself.

Distance for solution

Disasters, when they occur, have differing ranges of impact. For instance, a fire may take out a building, an earthquake may destroy a city, or a tidal wave may devastate a region. The level of protection for a DR solution should address the probable disasters for a given location. For example, when protecting against an earthquake, the DR site should not be in the same locale as the production site. For regional protection, the two sites need to be in two different regions. The distance associated with the DR solution affects the kind of DR solution that can be implemented.

Bandwidth requirements

One of the largest costs for DR is provisioning bandwidth for the solution. Bandwidth costs are an operational expense; this makes solutions with reduced bandwidth requirements very attractive to customers. It is important to recognize in advance the bandwidth consumption of a given solution in order to anticipate the running costs. Incorrect provisioning of bandwidth for DR solutions can have an adverse effect on production performance and can invalidate the overall solution.


Federated consistency

Databases are rarely isolated islands of information with no interaction or integration with other applications or databases. Most commonly, databases are loosely or tightly coupled to other databases using triggers, database links, and stored procedures. Some databases provide information downstream for other databases using information-distribution middleware; other databases receive feeds and inbound data from message queues and EDI transactions. The result can be a complex, interwoven architecture with multiple interrelationships. This is referred to as a federated database architecture.

With a federated database architecture, making a DR copy of a single database without regard to other components invites consistency issues and creates logical data integrity problems. All components in a federated architecture need to be recovered or restarted to the same dependent-write consistent point of time to avoid these problems.

It is possible, then, that point database solutions for DR, such as log shipping, do not provide the required business point of consistency in a federated database architecture. Federated consistency solutions guarantee that all components (databases, applications, middleware, flat files, and so on) are recovered or restarted to the same dependent-write consistent point in time.

Testing the solution

Tested, proven, and documented procedures are also required for a DR solution. Often, DR test procedures are operationally different from the true disaster procedures. Operational procedures need to be clearly documented. In the best-case scenario, companies should periodically execute the actual set of DR procedures. This can be costly to the business because of the application downtime required to perform such a test, but it is necessary to ensure the validity of the DR solution.


Cost

The cost of doing DR can be justified by comparing it to the cost of not doing it. What does it cost the business when the database and application systems are unavailable to users? For some companies, this is easily measurable, and revenue loss can be calculated per hour of downtime or per hour of data loss.

Whatever the business, the DR cost is going to be an extra expense item and, in many cases, with little in return. The costs include, but are not limited to:

◆ Hardware (storage, servers and maintenance)

◆ Software licenses and maintenance

◆ Facility leasing/purchase

◆ Utilities

◆ Network infrastructure

◆ Personnel


Tape-based solutions

This section discusses the following tape-based solutions:

◆ “Tape-based disaster recovery” on page 239

◆ “Tape-based disaster restart” on page 239

Tape-based disaster recovery

Traditionally, the most common form of disaster recovery was to make a copy of the database onto tape and use PTAM (Pickup Truck Access Method) to take the tapes offsite to a hardened facility. In most cases, the database and application needed to be available to users during the backup process. Taking a backup of a running database created a "fuzzy" image of the database on tape, one that required database recovery after the image had been restored. Recovery usually involved applying the logs that were active while the backup was in process. These logs had to be archived and kept with the backup image to ensure successful recovery.

The rapid growth of data over the last two decades has made this method unmanageable. Making a hot copy of the database is now the standard, but this method has its own challenges. How can a consistent copy of the database and supporting files be made while they are changing throughout the duration of the backup? What exactly is the content of the tape backup at completion? The reality is that the tape data is a "fuzzy image" of the disk data, and considerable expertise is required to restore the database to a database point of consistency.

In addition, the challenge of returning the data to a business point of consistency, where a particular database must be recovered to the same point as other databases or applications, is making this solution less viable.

Tape-based disaster restart

Tape-based disaster restart is a more recent development in disaster recovery strategies and is used to avoid the "fuzziness" of a backup taken while the database and application are running. A "restart" copy of the system data is created by locally mirroring the disks that contain the production data, and splitting off the mirrors to create a dependent-write consistent point-in-time image of the disks. This


image is a DBMS restartable image as described earlier. Thus, if this image was restored and the database brought up, the database would perform an implicit recovery to attain transactional consistency. Roll-forward recovery using archived logs from this database image is not possible with Oracle without conditioning the database prior to the consistent split. This conditioning process is described in “Copying the database with Oracle in hot backup mode” on page 125.

The restartable image on the disks can be backed up to tape and moved offsite to a secondary facility. If this image is created and shipped offsite on a daily basis, the maximum amount of data loss is 24 hours.

The time taken to restore the database is a factor to consider since reading from tape is typically slow. Consequently, this solution can be effective for customers with relaxed RTOs.


Remote replication challenges

Replicating database information over long distances for the purpose of disaster recovery is challenging. Synchronous replication over distances greater than 200 km may be infeasible because propagation delay degrades write performance; some form of asynchronous replication must be adopted instead. The considerations in this section apply to all forms of remote replication technology, whether array-based, host-based, or managed by the database.

Remote replication solutions usually start with initially copying a full database image to the remote location. This is called instantiation of the database. There are a variety of ways to perform this. After instantiation, only the changes from the source site are replicated to the target site in an effort to keep the target up to date. Some methodologies may not send all of the changes (certain log shipping techniques for instance), by omission rather than design. These methodologies may require periodic re-instantiation of the database at the remote site.

The following considerations apply to remote replication of databases:

◆ Propagation delay (latency due to distance)

◆ Bandwidth requirements

◆ Network infrastructure

◆ Method of instantiation

◆ Method of reinstantiation

◆ Change rate at the source site

◆ Locality of reference

◆ Expected data loss

◆ Failback operations

Propagation delay

Data transmission is bounded by the speed of light. The speed of light in a vacuum is 186,000 miles per second; through glass (in the case of fiber-optic media) it is lower, approximately 115,000 miles per second. In other words, in an optical network, such as


SONET for instance, it takes roughly 1 millisecond to send a data packet 125 miles, or 8 milliseconds for 1,000 miles. All remote replication solutions need to be designed with a clear understanding of the impact of propagation delay.

Bandwidth requirements

All remote replication solutions have bandwidth requirements, because the changes from the source site must be propagated to the target site. The more changes there are, the greater the bandwidth needed. It is the change rate and the replication methodology that determine the bandwidth requirement, not necessarily the size of the database.

Data compression can help reduce the quantity of data transmitted and therefore the size of the "pipe" required. Certain network devices, like switches and routers, provide native compression, some by software and some by hardware. GigE directors provide native compression in a DMX to DMX SRDF pairing. The amount of compression achieved depends on the type of data being compressed. Typical character and numeric database data compresses at about a 2-to-1 ratio. A good way to estimate how the data will compress is to assess how much tape space is required to store the database during a full-backup process. Tape drives perform hardware compression on the data prior to writing it. For instance, if a 300 GB database takes 200 GB of space on tape, the compression ratio is 1.5 to 1.
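As a sketch, the tape-based estimate described above is straightforward arithmetic. The 300 GB / 200 GB figures are simply the example from the text, and the function names are illustrative:

```python
# Estimate the compression ratio achievable on a replication link from the
# space a full backup occupies on tape (tape drives compress before writing).

def compression_ratio(db_size_gb: float, tape_size_gb: float) -> float:
    """Ratio of raw database size to compressed size: 1.5 means 1.5-to-1."""
    return db_size_gb / tape_size_gb

def compressed_rate_mb_s(raw_rate_mb_s: float, ratio: float) -> float:
    """Approximate link throughput needed after compression, in MB/s."""
    return raw_rate_mb_s / ratio

ratio = compression_ratio(300, 200)  # the example from the text
print(f"estimated compression ratio: {ratio:.1f} to 1")
print(f"10 MB/s of raw changes needs about {compressed_rate_mb_s(10, ratio):.1f} MB/s")
```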

For most customers, a major consideration in the disaster recovery design is cost. It is important to recognize that some components of the end solution represent a capital expenditure and some an operational expenditure. Bandwidth costs are operational expenses and thus any reduction in this area, even at the cost of some capital expense, is highly desirable.

Network infrastructure

The choice of channel extension equipment, network protocols, switches, routers, and so on ultimately determines the operational characteristics of the solution. EMC has a proprietary "BC Design Tool" to assist customers in analyzing the source systems and determining the network infrastructure required to support a remote replication solution.


Method of instantiation

In all remote replication solutions, a common requirement is for an initial, consistent copy of the complete database to be replicated to the remote site. The initial copy from source to target is called instantiation of the database at the remote site. Following instantiation, only the changes made at the source site are replicated. For large databases, sending only the changes after the initial copy is the only practical and cost-effective solution for remote database replication.

In some solutions, instantiation of the database at the remote site uses a process similar to the one that replicates the changes. Some solutions do not even provide for instantiation at the remote site (log shipping for instance). In all cases it is critical to understand the pros and cons of the complete solution.

Method of reinstantiation

Some methods of remote replication require periodic refreshing of the remote system with a full copy of the database. This is called reinstantiation. Technologies such as log shipping frequently require this, since not all activity on the production database may be represented in the log. In these cases, the disaster recovery plan must account for reinstantiation, and also for the possibility of a disaster occurring during the refresh. The business objectives of RPO and RTO must still be met under those circumstances.

Change rate at the source site

After instantiation of the database at the remote site, only changes to the database are replicated remotely. There are many methods of replicating the changes to the remote site, and each has differing operational characteristics; examples include log shipping and hardware- or software-based mirroring. Before designing a solution with remote replication, it is important to quantify the average change rate. It is also important to quantify the change rate during periods of burst write activity. These periods might correspond to end of month/quarter/year processing, billing, or payroll cycles. The solution needs to allow for peak write workloads.
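To translate the change-rate measurements into a link size, one can start from the peak write rate rather than the average. A minimal sketch, with made-up rates and a hypothetical headroom factor:

```python
# Size a replication link from measured change rates. The link must be able
# to carry the peak (burst) write rate, or the target will fall behind
# during bursts such as month-end processing.

def required_link_mbps(write_rate_mb_per_s: float,
                       compression_ratio: float = 1.0,
                       headroom: float = 1.3) -> float:
    """Approximate megabits per second needed, with protocol headroom."""
    megabits_per_s = write_rate_mb_per_s * 8 / compression_ratio
    return megabits_per_s * headroom

average = required_link_mbps(5)    # steady state: 5 MB/s of changes
peak    = required_link_mbps(40)   # month-end burst: 40 MB/s of changes
print(f"average workload needs ~{average:.0f} Mb/s, peak needs ~{peak:.0f} Mb/s")
```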


Locality of reference

Locality of reference is a factor that needs to be measured to understand whether there will be a reduction in bandwidth consumption when any form of asynchronous transmission is used. Locality of reference is a measurement of how skewed the write activity on the source is. For instance, a high locality of reference application may make many updates to a few tables in the database, whereas a low locality of reference application rarely updates the same rows in the same tables during a given time period. While the activity on the tables may have a low locality of reference, the write activity into an index might be clustered when inserted rows have the same or similar index column values. This gives the index components a high locality of reference.

In some asynchronous replication solutions, updates are "batched" into periods of time and sent to the remote site to be applied. In a given batch, only the last image of a given row/block is replicated to the remote site. So, for highly skewed application writes, this results in bandwidth savings. Generally, the greater the time period of batched updates, the greater the savings on bandwidth.

Log-shipping technologies do not consider locality of reference. For example, a row updated 100 times is transmitted 100 times to the remote site, whether the solution is synchronous or asynchronous.
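The bandwidth saving from batching can be illustrated with a toy write stream (the block numbers and cycle length below are made up): a batching replicator sends only the final image of each block written in a cycle, while log shipping transmits every write.

```python
# Toy model of cycle-based (batched) asynchronous replication: within one
# cycle only the last image of each written block is transmitted, so a
# workload with high locality of reference saves bandwidth.

def blocks_sent_batched(writes: list, cycle_len: int) -> int:
    """Number of block images a batching replicator transmits."""
    sent = 0
    for start in range(0, len(writes), cycle_len):
        sent += len(set(writes[start:start + cycle_len]))  # dedupe per cycle
    return sent

# Twelve writes that keep hitting the same few blocks (high locality).
stream = [7, 7, 7, 3, 7, 3, 9, 7, 7, 3, 9, 7]

log_shipping = len(stream)                  # every write image is sent
batched = blocks_sent_batched(stream, 6)    # two cycles of six writes each
print(f"log shipping sends {log_shipping} images; batching sends {batched}")
```

Lengthening the cycle increases the chance that repeated writes to the same block collapse into one transmitted image, which is the bandwidth saving described above.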

Expected data loss

Synchronous DR solutions are zero data loss solutions; there is no loss of committed transactions up to the time of the disaster. Synchronous solutions may, however, be impacted by a rolling disaster, in which case work completed at the source site after the rolling disaster started may be lost. Rolling disasters are discussed in detail in a later section.

Nonsynchronous DR solutions have the potential for data loss. How much data is lost depends on many factors, most of which are described earlier. For asynchronous replication, where updates are batched and sent to the remote site, the maximum amount of data lost is two cycles' (two batches') worth: the cycle currently being captured on the source site and the one currently being transmitted to the remote site. With inadequate network bandwidth, data loss could increase due to the increased transmission time.
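The two-cycle loss window described above can be expressed as a small formula. This is a sketch of the general rule for batched asynchronous replication, not a statement of any specific product's behavior:

```python
# Worst-case data loss (RPO) for cycle-based asynchronous replication:
# up to two cycles can be lost, the one being captured at the source
# plus the one currently in flight to the target. If the link cannot
# drain a cycle within the cycle interval, the window stretches.

def max_data_loss_seconds(cycle_seconds: float,
                          transmit_seconds: float = 0.0) -> float:
    """Loss window: the capture cycle plus the slower of the cycle
    interval and the actual transmit time."""
    return cycle_seconds + max(cycle_seconds, transmit_seconds)

print(max_data_loss_seconds(30))       # adequate bandwidth: 60 seconds
print(max_data_loss_seconds(30, 90))   # constrained link: 120 seconds
```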


Failback operations

If there is the slightest chance that failover to the DR site may be required, then there is a 100 percent chance that failback to the primary site will also be required, unless the primary site is lost permanently. The DR architecture should be designed to make failback simple, efficient, and low risk. If failback is not planned for, there may be no reasonable or acceptable way to move processing from the DR site, where the applications may be running on tier 2 servers and tier 2 networks, back to the production site.

In a perfect world, the DR process should be tested once a quarter, with database and application services fully failed over to the DR site. The integrity of the application and database must be verified at the remote site to ensure all required data was copied successfully. Ideally, production services are brought up at the DR site as the ultimate test. This means production data is maintained on the DR site, requiring a failback when the DR test completes. While this is not always possible, it is the ultimate test of a DR solution. It not only validates the DR process, but also trains the staff on managing the DR process should a catastrophic failure occur. The downside of this approach is that duplicate sets of servers and storage need to be present to make an effective and meaningful test. This tends to be an expensive proposition.


Array-based remote replication

Customers can use the capabilities of a Symmetrix storage array to replicate the database from the production location to a secondary location. No host CPU cycles are used for this, leaving the host dedicated to running the production application and database. In addition, no host I/O is required to facilitate this; the array takes care of all replication, and no hosts are required at the target location to manage the target array.

EMC provides multiple solutions for remote replication of databases:

◆ SRDF/S: Synchronous SRDF

◆ SRDF/A: Asynchronous SRDF

◆ SRDF/AR: SRDF Automated Replication

Each of these solutions is discussed in detail in the following sections. To use any of the array-based solutions, it is necessary to coordinate the disk layout of the databases with this replication in mind.


Planning for array-based replication

All Symmetrix solutions replicating data from one array to another are disk based. This allows the Symmetrix array to be neutral to the volume manager, file system, and database system. However, this does not mean that file system and volume manager concerns can be ignored. For example, it is impossible to replicate a single disk from an AIX volume group and import it to another host. Effectively, the smallest level of granularity for disk-based replication is a volume group, in the case of UNIX. On Windows, the smallest unit could be a disk, a volume set, or a disk group, depending on how the disks are set up in Disk Manager.

In addition, if a database is to be replicated independently of other databases, it should have its own dedicated disks. That is, the disks used by a database should not be shared with other applications or databases.

In many cases, when a database is restored, only the tablespace containers should be restored and not the logs. An array-based restore copies the whole host volume, so if the current logs need to be preserved, then they should be placed on separate volumes from the tablespace containers. Logically, the database can be divided into recovery structures and data. Figure 51 on page 248 shows the separation of the recovery structures and data components for an Oracle database in preparation for a TimeFinder implementation. This separation is useful for restoring the data, and then applying the log to some known point of consistency. This is usually for local replication and recovery purposes, but can be used for solutions that combine database and array-based replication solutions. Typically, this separation is less important for remote replication for restart purposes.


Figure 51 Database components for Oracle (standard devices and their BCVs within the Symmetrix array, with recovery structures such as archive logs, redo logs, and control files separated from data components such as SYSTEM, SYSAUX, DATA, INDEX, UNDO, and TEMP)

When a set of volumes is defined for a database for remote replication, care must be taken to ensure the disks hold everything needed to restart the database at the remote site. Simply replicating the tablespace containers is insufficient. The following is a list of the objects that must be replicated in addition to the tablespace container directories:

◆ Oracle binaries: Remotely replicating the Oracle binaries directory is optional. If Oracle is not installed on the remote host, this directory must be replicated in order to start Oracle there. The version and patch level of the binaries should be the same on both the source and target systems.

◆ Redo log directories: Place the redo log directories on the replicated disks. If the redo logs are not located on Symmetrix DMX storage, use the following series of commands to change the active log location:

1. Shut down the database:

sqlplus
SQL> connect / as sysdba
SQL> shutdown
SQL> exit

2. Move the redo log files using O/S commands from the old location to the new location:

mv /oracle/oldlogs/log1a.rdo /oracle/newlogs/log1a.rdo
mv /oracle/oldlogs/log1b.rdo /oracle/newlogs/log1b.rdo

3. Start the database in mount mode.

sqlplus
SQL> connect / as sysdba
SQL> startup mount
SQL> alter database rename file
       '/oracle/oldlogs/log1a.rdo', '/oracle/oldlogs/log1b.rdo'
     to
       '/oracle/newlogs/log1a.rdo', '/oracle/newlogs/log1b.rdo';

4. Open the database.

SQL> alter database open;

◆ Archive log directory: Place the archive log directory on the replicated disks. The archive log directory is identified in the init.ora startup file.

◆ Control files: Place the control files on the replicated disks. Operational procedures are required to ensure that when additional data files are added, the updated control files are copied to the DR location, or at least placed on a replicated disk, to guarantee the files are at the remote site in a disaster. If an array-based solution is used, placing these files on replicated disks solves this problem.


SRDF/S single Symmetrix array to single Symmetrix array

Synchronous SRDF, or SRDF/S, is a method of replicating production data changes between locations that are no more than 200 km apart. Synchronous replication takes writes that are inbound to the source Symmetrix array and copies them to the target Symmetrix array. The write operation is not acknowledged as complete to the host until both Symmetrix arrays have the data in cache. While the following examples involve Symmetrix arrays, the fundamentals of synchronous replication described here hold for all synchronous replication solutions. Figure 52 shows the process.

Figure 52 Synchronous replication internals

1. A write is received in the source Symmetrix cache. At this time, the host has not received acknowledgement that the write is complete.

2. The source Symmetrix array uses SRDF/S to push the write to the target Symmetrix array.

3. The target Symmetrix array sends an acknowledgement back to the source that the write was received.

4. Ending status of the write is presented to the host.

These four steps cause a delay in the processing of writes as perceived by the database on the source server. The amount of delay depends on the exact configuration of the network, the storage, the write block size, and the distance between the two locations. Note that reads to the source Symmetrix array are not affected by the replication.

The following steps outline the process of setting up synchronous replication using Solutions Enabler (SYMCLI) commands.


1. Before the synchronous mode of SRDF can be established, initial instantiation of the database is required. In other words, first create a baseline full copy of all the volumes participating in the synchronous replication. This is usually accomplished using the adaptive copy mode of SRDF. Create a group device_group as follows:

symdg create device_group -type rdf1

2. Add disks 123, 124, and 12f to the group device_group:

symld -g device_group add dev 123
symld -g device_group add dev 124
symld -g device_group add dev 12f

3. Put the group device_group into adaptive copy mode:

symrdf -g device_group set mode acp_disk -nop

4. Instruct the source Symmetrix array to send all the tracks on the source site to the target site using the current mode:

symrdf -g device_group establish -full -noprompt

The adaptive copy mode of SRDF has no impact on host application performance. It transmits tracks to the remote site that have never been sent before or that have changed since the last time they were sent. It does not preserve write order or dependent-write consistency.

5. When both sides are synchronized, put SRDF into synchronous mode. In the following command, the device group device_group is put into synchronous mode:

symrdf -g device_group set mode sync -nop

Note: There is no requirement for a host at the remote site during the synchronous replication. The target Symmetrix array itself manages the in-bound writes and updates the appropriate volumes in the array.

Dependent-write consistency is inherent in a synchronous relationship as the target R2 volumes are at all times equal to the source provided that a single RA group is used. If multiple RA groups are used or if multiple Symmetrix arrays are used on the source site, SRDF Consistency Groups (SRDF/CG) must be used to guarantee consistency. SRDF/CG is described below.


How to restart in the event of a disaster

In the event of a disaster where the primary source Symmetrix array is lost, database and application services must be run from the DR site. A host at the DR site is required for this. The first requirement is to write-enable the R2 devices. If the device_group device group is not built on the remote host, it must be created using the R2 devices that were mirrors of the R1 devices on the source Symmetrix array. Group Name Services (GNS) can be used to propagate the device group to the remote site if a host is maintained there. The Solutions Enabler Symmetrix Base Management CLI Product Guide provides more details on GNS.

To write-enable the R2s in group device_group, enter the following:

symld -g device_group rw_enable -noprompt

At this point, the host can issue the necessary commands to access the disks. For instance, on a UNIX host, import the volume group, activate the logical volumes, fsck the file systems and mount them.

Once the data is available to the host, the database can restart. The database will perform an implicit recovery when restarted. Transactions that were committed, but not completed, are rolled forward and completed using the information in the redo logs. Transactions that have updates applied to the database, but were not committed, are rolled back. The result is a transactionally consistent database.


SRDF/S and consistency groups

Zero data loss disaster recovery techniques tend to use straightforward database and application restart procedures. These procedures work well if, when a disaster happens, all processing and data mirroring at the production site stop at the same instant in time, as is the case in a site power failure.

However, in most cases it is unlikely that all data processing ceases at a single instant in time. Computing operations can be measured in nanoseconds, and even if a disaster takes only a millisecond to complete, many such computing operations could complete between the start of the disaster and the point when all data processing ceases. This gives us the notion of a rolling disaster. A rolling disaster is a series of events taking place over a period of time that together comprise a true disaster. The period of time that makes up a rolling disaster could be milliseconds (in the case of an explosion) or minutes (in the case of a fire). In both cases, the DR site must be protected against data inconsistency.

Rolling disaster

Protection against a rolling disaster is required when the data for a database resides on more than one Symmetrix array or on multiple RA groups. Figure 53 on page 254 depicts a dependent-write I/O sequence where a predecessor log write happens prior to a page flush from a database buffer pool. The log device and data device are on different Symmetrix arrays with different replication paths. Figure 53 demonstrates how rolling disasters can affect this dependent-write sequence.


Figure 53 Rolling disaster with multiple production Symmetrix arrays (X = application data, Y = DBMS data, Z = logs)

1. This example of a rolling disaster starts with a loss of the synchronous links between the bottom source Symmetrix array and the target Symmetrix array. This will prevent the remote replication of data on the bottom source Symmetrix array.

2. The Symmetrix array, which is now no longer replicating, receives a predecessor log write of a dependent-write I/O sequence. The local I/O is completed, but it is not replicated to the remote Symmetrix array, and the tracks are marked as being "owed" to the target Symmetrix array. Nothing prevents the predecessor log write from completing to the host and finishing the acknowledgement process.

3. Now that the predecessor log write is complete, the dependent data write is issued. This write is received on both the source and target Symmetrix arrays because the rolling disaster does not affect those communication links.

4. If the rolling disaster ended in a complete disaster, the condition of the data at the remote site is such that it creates a "data ahead of log" condition, which is an inconsistent state for a database. The severity of the situation is that when the database is restarted, performing an implicit recovery, it may not detect the inconsistencies. A person extremely familiar with the transactions running at the time of the rolling disaster may detect the inconsistencies, and database utilities can be run to detect some of them.

A rolling disaster can happen in such a way that the data links providing remote mirroring support are disabled in a staggered fashion while application and database processing continues at the production site. Sustained replication during the time when some Symmetrix arrays are communicating with their remote partners through their respective links while other Symmetrix arrays are not (due to link failures) can cause data integrity exposure at the recovery site. Some data integrity problems caused by the rolling disaster cannot be resolved through normal database restart processing and may require a full database recovery using appropriate backups, journals, and logs. A full database recovery elongates the overall application restart time at the recovery site.

Protection against a rolling disaster

SRDF consistency group (SRDF/CG) technology provides protection against rolling disasters. A consistency group is a set of Symmetrix volumes spanning multiple RA groups and/or multiple Symmetrix arrays that replicate as a logical group to other Symmetrix arrays using synchronous SRDF. It is not a requirement to span multiple RA groups and/or Symmetrix arrays when using consistency groups. Consistency group technology guarantees that if a single-source volume is unable to replicate to its partner for any reason, then all volumes in the group stop replicating. This ensures that the image of the data on the target Symmetrix array is consistent from a dependent-write perspective.

Figure 54 on page 256 depicts a dependent-write I/O sequence where a predecessor log write is happening prior to a page flush from a database buffer pool. The log device and data device are on different Symmetrix arrays with different replication paths. Figure 54 demonstrates how rolling disasters can be prevented using EMC consistency group technology.


Figure 54 Rolling disaster with SRDF consistency group protection (X = application data, Y = DBMS data, Z = logs)

1. Consistency group protection is defined containing volumes X, Y, and Z on the source Symmetrix array. This consistency group definition must contain all the devices required to maintain dependent-write consistency and reside on all participating hosts involved in issuing I/O to these devices. A mix of CKD (mainframe) and FBA (UNIX/Windows) devices can be logically grouped together. In some cases, the entire processing environment may be defined in a consistency group to ensure dependent-write consistency.

2. The rolling disaster just described begins preventing the replication of changes from volume Z to the remote site.

3. The predecessor log write occurs to volume Z, causing a consistency group (ConGroup) trip.

4. A ConGroup trip will hold the I/O that could not be replicated along with all of the I/O to the logically grouped devices. The I/O is held by PowerPath on the UNIX or Windows hosts, and IOS on the mainframe host. It is held long enough to issue two I/Os per Symmetrix array. The first I/O will put the devices in a suspend-pending state.


5. The second I/O performs the suspend of the R1/R2 relationship for the logically grouped devices, which immediately disables all replication to the remote site. This allows other devices outside of the group to continue replicating provided the communication links are available.

6. After the R1/R2 relationship is suspended, all deferred write I/Os are released, allowing the predecessor log write to complete to the host. The dependent data write is issued by the DBMS and arrives at X but is not replicated to the R2(X).

7. If a complete failure resulted from this rolling disaster, dependent-write consistency at the remote site is preserved. If a complete disaster did not occur and the failed links are activated again, consistency group replication can be resumed. It is recommended to create a copy of the dependent-write consistent image at the remote site while the resume occurs; once SRDF reaches synchronization, a dependent-write consistent image exists at the remote site again.

SRDF/S with multiple source Symmetrix arrays

The implications of spreading a database across multiple Symmetrix arrays or across multiple RA groups and replicating in synchronous mode were discussed in previous sections. The challenge in this type of scenario is to protect against a rolling disaster. SRDF consistency groups can be used to avoid data corruption in a rolling disaster situation.

Consider the architecture depicted in Figure 55 on page 258.


Figure 55 SRDF/S with multiple source Symmetrix arrays and ConGroup protection

To protect against a rolling disaster, a consistency group can be created that encompasses all the volumes on all Symmetrix arrays participating in replication as shown by the blue-dotted oval.

The following steps outline the process of using Solutions Enabler (SYMCLI) commands to set up synchronous replication with consistency groups:

1. Create a consistency group for the source side of the synchronous relationship (the R1 side):

symcg create device_group -type rdf1 -ppath

2. Add to the consistency group the R1 devices 121 and 12f from Symmetrix with ID 111, and R1 devices 135 and 136 from Symmetrix with ID 222:

symcg -cg device_group add dev 121 -sid 111
symcg -cg device_group add dev 12f -sid 111
symcg -cg device_group add dev 135 -sid 222
symcg -cg device_group add dev 136 -sid 222


3. Before the synchronous mode of SRDF can be established, the initial instantiation of the database is required. In other words, first create the baseline full copy of all the volumes participating in the synchronous replication. This is usually accomplished using adaptive copy mode of SRDF.

4. Put the group device_group into adaptive copy mode:

symrdf -cg device_group set mode acp_disk -noprompt

5. Instruct the source Symmetrix array to send all tracks at the source site to the target site using the current mode:

symrdf -cg device_group establish -full -noprompt

6. Adaptive copy mode has no host impact. It transmits tracks to the remote site that have never been sent before or that have changed since the last time they were sent. It does not preserve write order or consistency. When both sides are synchronized, SRDF can be put into synchronous mode. In the following command, the device group device_group is put into synchronous mode:

symrdf -cg device_group set mode sync -noprompt

7. Enable consistency protection:

symcg -cg device_group enable -noprompt

Note: There is no requirement for a host at the remote site during the synchronous replication. The target Symmetrix array manages the in-bound writes and updates the appropriate disks in the array.


SRDF/A

SRDF/A, or asynchronous SRDF, is a method of replicating production data changes from one Symmetrix array to another using delta set technology. Delta sets are collections of changed blocks grouped together by a time interval configured at the source site. The default time interval is 30 seconds. The delta sets are then transmitted from the source site to the target site in the order created. SRDF/A preserves dependent-write consistency of the database at all times at the remote site.

The distance between the source and target Symmetrix arrays is unlimited and there is no host impact. Writes are acknowledged immediately when they hit the cache of the source Symmetrix array. SRDF/A is only available on the DMX family of Symmetrix arrays. Figure 56 shows the process.

Figure 56 SRDF/A replication internals

1. Writes are received in the source Symmetrix cache. The host receives immediate acknowledgement that the write is complete. Writes are gathered into the capture delta set for 30 seconds.

2. A delta set switch occurs and the current capture delta set becomes the transmit delta set by changing a pointer in cache. A new empty capture delta set is created.

3. SRDF/A sends the changed blocks in the transmit delta set to the remote Symmetrix array. The changes collect in the receive delta set at the target site. When the replication of the transmit delta set is complete, another delta set switch occurs: a new empty capture delta set is created, the current capture delta set becomes the new transmit delta set, and the receive delta set becomes the apply delta set.

4. The apply delta set marks all the changes in the delta set against the appropriate volumes as invalid tracks and begins destaging the blocks to disk.

5. The cycle repeats continuously.

With sufficient bandwidth for the source database write activity, SRDF/A will transmit all changed data within the default 30 seconds. This means that the maximum time the target data will be behind the source is 60 seconds (two replication cycles). At times of high write activity, it may be impossible to transmit all the changes that occur during a 30-second interval. This means the target Symmetrix array will fall behind the source Symmetrix array by more than 60 seconds. Careful design of the SRDF/A infrastructure and a thorough understanding of write activity at the source site are necessary to design a solution that meets the RPO requirements of the business at all times.
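The five-step cycle above can be sketched as a toy simulation (a hypothetical function; the real delta-set management happens entirely inside the array). It shows why, with sufficient bandwidth, the target is never more than two cycles, about 60 seconds by default, behind the source:

```python
def srdf_a_cycle(write_batches):
    """Toy model of SRDF/A delta-set cycling. write_batches[i] is the
    set of blocks changed during capture interval i. Returns the
    batches applied in order at the target and the worst-case number
    of unapplied cycles (one capturing + one in transit)."""
    applied = []        # delta sets destaged at the target site
    transmit = None     # delta set currently crossing the SRDF links
    worst_unapplied = 0
    for i, capture in enumerate(write_batches):
        # While batch i is being captured, the previous batch is in flight.
        worst_unapplied = max(worst_unapplied, (i + 1) - len(applied))
        # Cycle switch: the transmit delta set completes and is applied
        # whole, in order, at the target; capture becomes the new
        # transmit delta set.
        if transmit is not None:
            applied.append(transmit)
        transmit = capture
    return applied, worst_unapplied

applied, lag = srdf_a_cycle([{"a"}, {"b"}, {"c"}, {"d"}])
# lag is 2 cycles, i.e. 60 seconds at the default 30-second interval
```

Because a delta set is only ever applied whole, the target image always sits on a dependent-write consistent delta-set boundary.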

Consistency is maintained throughout the replication process on a delta-set boundary. The Symmetrix array will not apply a partial delta set, which would invalidate consistency. Dependent-write consistency is preserved by placing a dependent write in either the same delta set as the write it depends on or a subsequent delta set.

Note: There is no requirement for a host at the remote site during asynchronous replication. The target Symmetrix array manages in-bound writes and updates the appropriate disks in the array.

Different command sets are used to enable SRDF/A depending on whether the SRDF/A group of devices is contained within a single Symmetrix array or is spread across multiple Symmetrix arrays.

SRDF/A using a single source Symmetrix array

Before the asynchronous mode of SRDF is established, initial instantiation of the database has to occur. In other words, a baseline full copy of all the volumes participating in the asynchronous replication must be executed first. This is usually accomplished using the adaptive copy mode of SRDF.


Page 262: Oracle Databases on EMC Symmetrix Storage Systems

262

Understanding Oracle Disaster Restart & Disaster Recovery

The following steps outline the process of using Solutions Enabler (SYMCLI) commands to set up asynchronous replication.

1. Create an SRDF device group for the source side of the replication relationship (the R1 side):

symdg create device_group -type rdf1

2. Add to the device group the R1 devices 121 and 12f from the Symmetrix array with ID 111, and R1 devices 135 and 136 from the Symmetrix array with ID 222:

symld -g device_group add dev 121 -sid 111
symld -g device_group add dev 12f -sid 111
symld -g device_group add dev 135 -sid 222
symld -g device_group add dev 136 -sid 222

3. Put the group device_group into adaptive copy mode:

symrdf -g device_group set mode acp_disk -noprompt

4. Instruct the source Symmetrix array to send all the tracks at the source site to the target site using the current mode:

symrdf -g device_group establish -full -noprompt

5. The adaptive copy mode of SRDF has no impact on host application performance. It transmits tracks that have never been sent to the remote site, or that have changed since they were last sent. It does not preserve write order or consistency. When both sides are synchronized, SRDF can be put into asynchronous mode. In the following command, the device group device_group is put into asynchronous mode:

symrdf -g device_group set mode async -noprompt

Note: There is no requirement for a host at the remote site during the asynchronous replication. The target Symmetrix array manages the in-bound writes and updates the appropriate disks in the array.

SRDF/A multiple source Symmetrix arrays

When a database is spread across multiple Symmetrix arrays and SRDF/A is used for long-distance replication, separate software must be used to manage the coordination of the delta-set boundaries between the participating Symmetrix arrays and to stop replication if any of the volumes in the group cannot replicate for any reason. The software must ensure that all delta-set boundaries on every participating Symmetrix array in the configuration are coordinated to give a dependent-write consistent point-in-time image of the database.

SRDF/A multisession consistency (MSC) provides consistency across multiple RA groups and/or multiple Symmetrix arrays. MSC is available on 5671 microcode and later with Solutions Enabler V6.0 and later. SRDF/A with MSC is supported by an SRDF process daemon that performs cycle-switching and cache recovery operations across all SRDF/A sessions in the group. This ensures that a dependent-write consistent R2 copy of the database exists at the remote site at all times. A composite group must be created using the SRDF consistency protection option (-rdf_consistency) and must be enabled using the symcg enable command before the RDF daemon begins monitoring and managing the MSC consistency group. The RDF process daemon must be running on all hosts that can write to the set of SRDF/A volumes being protected. At the time of an interruption (SRDF link failure, for instance), MSC analyzes the status of all SRDF/A sessions and either commits the last cycle of data to the R2 target or discards it.

The following steps outline the process of using Solutions Enabler (SYMCLI) commands to set up asynchronous replication with multisession consistency.

1. Create the replication composite group for the SRDF/A devices:

symcg create device_group -rdf_consistency -type rdf1

The -rdf_consistency option indicates the volumes in the group are to be protected by MSC.

2. Add to the composite group named device_group the R1 devices 121 and 12f from the Symmetrix array with ID 111 and R1 devices 135 and 136 from the Symmetrix array with ID 222:

symcg -cg device_group add dev 121 -sid 111
symcg -cg device_group add dev 12f -sid 111
symcg -cg device_group add dev 135 -sid 222
symcg -cg device_group add dev 136 -sid 222

3. Before the asynchronous mode of SRDF can be established, the initial instantiation of the database is required. In other words, first create the baseline full copy of all the volumes participating in the asynchronous replication. This is usually accomplished using the adaptive copy mode of SRDF. The following command puts the group device_group into adaptive copy mode:

symrdf -g device_group set mode acp_disk -noprompt

4. Instruct the source Symmetrix array to send all the tracks at the source site to the target site using the current mode:

symrdf -g device_group establish -full -noprompt

5. The adaptive copy mode of SRDF has no impact on host application performance. It transmits tracks that have never been sent to the remote site, or that have changed since they were last sent. It does not preserve write order or consistency. When both sides are synchronized, SRDF can be put into asynchronous mode. In the following command, the device group device_group is put into asynchronous mode:

symrdf -g device_group set mode async -noprompt

6. Enable multisession consistency for the group:

symcg -cg device_group enable

Note: There is no requirement for a host at the remote site during the asynchronous replication. The target Symmetrix array itself manages the in-bound writes and updates the appropriate disks in the array.
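The six steps above can be consolidated into a small script. A sketch follows (dry-run by default, since the symcg/symrdf commands require Solutions Enabler on the host; the group name, device IDs, and array IDs are the example values from the text):

```python
import subprocess

def msc_setup(cg="device_group", execute=False):
    """Consolidates steps 1-6 above. With execute=False (the default)
    it only returns the SYMCLI command lines; set execute=True on a
    host with Solutions Enabler installed to run them."""
    cmds = [f"symcg create {cg} -rdf_consistency -type rdf1"]           # step 1
    cmds += [f"symcg -cg {cg} add dev {d} -sid 111" for d in ("121", "12f")]
    cmds += [f"symcg -cg {cg} add dev {d} -sid 222" for d in ("135", "136")]
    cmds += [
        f"symrdf -g {cg} set mode acp_disk -noprompt",                  # step 3
        f"symrdf -g {cg} establish -full -noprompt",                    # step 4
        f"symrdf -g {cg} set mode async -noprompt",                     # step 5
        f"symcg -cg {cg} enable",                                       # step 6
    ]
    if execute:
        for c in cmds:
            subprocess.run(c.split(), check=True)
    return cmds

for line in msc_setup():
    print(line)
```

The dry-run default makes the sequence reviewable before it touches the arrays.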

How to restart in the event of a disaster

In the event of a disaster in which the primary source Symmetrix array is lost, database and application services are run from the DR site; a host at the DR site is required for this. If the device_group device group has not yet been built on the remote host, it must first be created using the R2 devices that were mirrors of the R1 devices on the source Symmetrix array. The first step is to write-enable the R2 devices:

R2s on a single Symmetrix array:

symld -g device_group rw_enable -noprompt

R2s on multiple Symmetrix arrays:

symcg -cg device_group rw_enable -noprompt

At this point, the host can issue the necessary commands to access the disks. For instance, on a UNIX host, import the volume group, activate the logical volumes, fsck the file systems, and mount them.


Once the data is available to the host, the database can be restarted. The database will perform crash recovery when restarted. Transactions committed, but not completed, are rolled forward and completed using the information in the redo logs. Transactions with updates applied to the database, but not committed, are rolled back. The result is a transactionally consistent database.


SRDF/AR single hop

SRDF Automated Replication, or SRDF/AR, is a continuous movement of dependent-write consistent data to a remote site using SRDF adaptive copy mode and TimeFinder consistent split technology. TimeFinder BCVs are used to create a dependent-write consistent point-in-time image of the data to be replicated. The BCVs also have an R1 personality, which means that SRDF in adaptive copy mode can be used to replicate the data from the BCVs to the target site. Since the BCVs are not changing, replication completes in a finite length of time. The length of time for replication depends on the size of the network "pipe" between the two locations, the distance between the two locations, the quantity of changed data tracks, and the locality of reference of the changed tracks. On the remote Symmetrix array, another BCV copy of the data is made using data on the R2s. This is necessary because the next SRDF/AR iteration replaces the R2 image in a nonordered fashion, and if a disaster were to occur while the R2s were synchronizing, there would not be a valid copy of the data at the DR site. The BCV copy of the data in the remote Symmetrix array is commonly called the "gold" copy of the data. The whole process then repeats.

With SRDF/AR, there is no host impact. Writes are acknowledged immediately when they hit the cache of the source Symmetrix array. Figure 57 shows the process.

Figure 57 SRDF/AR single-hop replication internals


1. Writes are received in the source Symmetrix cache and are acknowledged immediately. The BCVs are already synchronized with the STDs at this point. A consistent split is executed against the STD-BCV pairing to create a point-in-time image of the data on the BCVs.

2. SRDF transmits the data on the BCV/R1s to the R2s in the remote Symmetrix array.

3. When the BCV/R1 volumes are synchronized with the R2 volumes, they are reestablished with the standards in the source Symmetrix array. This causes the SRDF links to be suspended. At the same time, an incremental establish is performed on the target Symmetrix array to create a "gold" copy on the BCVs in that frame.

4. When the BCVs in the remote Symmetrix array are fully synchronized with the R2s, they are split and the configuration is ready to begin another cycle.

5. The cycle repeats based on configuration parameters. The parameters can specify the cycles to begin at specific times, specific intervals, or to run back to back.

Cycle times for SRDF/AR are usually in the minutes to hours range. The RPO is double the cycle time in a worst-case scenario. This may be a good fit for customers with relaxed RPOs.

The added benefit of having a longer cycle time is that the locality of reference will likely increase. This is because there is a much greater chance of a track being updated more than once in a 1-hour interval than in, for example, a 30-second interval. The increase in locality of reference shows up as reduced bandwidth requirements for the final solution.
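As a back-of-the-envelope illustration of this bandwidth and RPO trade-off (a hypothetical helper; the 64 KB track size, write rate, and locality figure are illustrative assumptions, not measurements):

```python
def srdf_ar_estimates(writes_per_sec, track_kb, cycle_secs, locality):
    """Rough SRDF/AR sizing sketch. `locality` is the fraction of writes
    that re-hit a track already changed within the cycle, so only
    (1 - locality) of the writes add new tracks to ship."""
    unique_tracks = writes_per_sec * cycle_secs * (1.0 - locality)
    link_mbps = unique_tracks * track_kb * 8 / 1024.0 / cycle_secs
    worst_case_rpo_secs = 2 * cycle_secs  # one cycle captured + one in flight
    return link_mbps, worst_case_rpo_secs

# 1,000 writes/s, 64 KB tracks, 1-hour cycles, 60% locality of reference:
mbps, rpo = srdf_ar_estimates(1000, 64, 3600, 0.60)  # 200 Mb/s, 2-hour RPO
```

Halving the cycle time halves the worst-case RPO but also halves the window in which repeat writes can be coalesced, so the link must be sized for the resulting unique-track rate.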

Before SRDF/AR starts, instantiation of the database has to occur. In other words, first create a baseline full copy of all the volumes participating in the SRDF/AR replication. This requires a full establish to the BCVs in the source array, a full SRDF establish of the BCV/R1s to the R2s, and a full establish of the R2s to the BCVs in the target array. There is an option to automate the initial setup of the relationship.

As with other SRDF solutions, SRDF/AR does not require a host at the DR site. The commands to update the R2s and manage the synchronization of the BCVs in the remote site are all managed in-band from the production site.


Note: SRDF/AR is primarily a restartable solution. While SRDF/AR may also be used as a recoverable solution, difficulties arise because of the need to split the archive logs separately from the data files after taking the tablespaces out of hot backup mode. Because of this limitation, SRDF/AR is not recommended in Oracle environments that plan on creating a recoverable database image at the target site.

How to restart in the event of a disaster

In the event of a disaster, it is necessary to determine whether the most current copy of the data is located on the BCVs or the R2s at the remote site. Depending on when in the replication cycle the disaster occurs, the most current version could be on either set of disks.


SRDF/AR multihop

SRDF/AR multihop is an architecture that allows long-distance replication with zero data loss through the use of a bunker Symmetrix array. Production data is replicated synchronously to the bunker Symmetrix array, which is within 200 km of the production Symmetrix array, allowing synchronous replication, but far enough away that potential disasters at the primary site may not affect it. Typically, the bunker Symmetrix array is placed in a hardened computing facility.

BCVs in the bunker frame are periodically synchronized to the R2s and consistently split to provide a dependent-write consistent point-in-time image of the data. These bunker BCVs also have an R1 personality, which means that SRDF in adaptive copy mode can be used to replicate the data from the bunker array to the target site. Since the BCVs are not changing, the replication can be completed in a finite length of time. The replication time depends on the size of the "pipe" between the bunker location and the DR location, the distance between the two locations, the quantity of changed data, and the locality of reference of the changed data. On the remote Symmetrix array, another BCV copy of the data is made using the R2s. This is because the next SRDF/AR iteration replaces the R2 image in a nonordered fashion, and if a disaster were to occur while the R2s were synchronizing, there would not be a valid copy of the data at the DR site. The BCV copy of the data in the remote Symmetrix array is commonly called the "gold" copy of the data. The whole process then repeats.


With SRDF/AR multihop, there is minimal host impact. Writes are only acknowledged when they hit the cache of the bunker Symmetrix array and a positive acknowledgment is returned to the source Symmetrix array. Figure 58 depicts the process.

Figure 58 SRDF/AR multihop replication internals

1. BCVs are synchronized and consistently split against the R2s in the bunker Symmetrix array. The write activity is momentarily suspended on the source Symmetrix array to get a dependent-write consistent point-in-time image on the R2s in the bunker Symmetrix array, which creates a dependent-write consistent point-in-time copy of the data on the BCVs.

2. SRDF transmits the data on the bunker BCV/R1s to the R2s in the DR Symmetrix array.

3. When the BCV/R1 volumes are synchronized with the R2 volumes in the target Symmetrix array, the bunker BCV/R1s are established again with the R2s in the bunker Symmetrix array. This causes the SRDF links to be suspended between the bunker and DR Symmetrix arrays. Simultaneously, an incremental establish is performed on the DR Symmetrix array to create a gold copy on the BCVs in that frame.

4. When the BCVs in the DR Symmetrix array are fully synchronized with the R2s, they are split and the configuration is ready to begin another cycle.


5. The cycle repeats based on configuration parameters. The parameters can specify the cycles to begin at specific times, specific intervals, or to run immediately after the previous cycle completes.

Even though cycle times for SRDF/AR multihop are usually in the minutes to hours range, the most current data is always in the bunker Symmetrix array. Unless there is a regional disaster that destroys both the primary site and the bunker site, the bunker Symmetrix array will transmit all data to the remote DR site. This means zero data loss from the point at which a rolling disaster begins, or an RPO of 0 seconds. This solution is a good fit for customers with a requirement of zero data loss and long-distance DR.

An added benefit of a longer cycle time is that the locality of reference will likely increase. This is because there is a much greater chance of a track being updated more than once in a 1-hour interval than in, say, a 30-second interval. The increase in locality of reference shows up as reduced bandwidth requirements for the network segment between the bunker Symmetrix array and the DR Symmetrix array.
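The locality-of-reference effect can be demonstrated with a toy workload (all parameters are hypothetical): skewing writes toward a small hot set of tracks makes the unique-changed-track rate, and hence the required bandwidth, fall as the collection interval grows:

```python
import random

def unique_track_rate(interval_secs, writes_per_sec=100,
                      hot_tracks=500, total_tracks=1_000_000, seed=7):
    """Unique changed tracks per second for a skewed toy workload:
    90% of writes re-hit a small hot set, 10% land anywhere."""
    rng = random.Random(seed)
    changed = set()
    for _ in range(interval_secs * writes_per_sec):
        if rng.random() < 0.9:
            changed.add(rng.randrange(hot_tracks))      # repeat write
        else:
            changed.add(rng.randrange(total_tracks))    # cold write
    return len(changed) / interval_secs

short_cycle = unique_track_rate(30)    # SRDF/A-like interval
long_cycle = unique_track_rate(3600)   # SRDF/AR-like interval
# long_cycle is far smaller: repeat writes to hot tracks are coalesced
# into a single track to ship per cycle.
```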

Before SRDF/AR can be initiated, initial instantiation of the database is required. In other words, first create a baseline full copy of all the volumes participating in the SRDF/AR replication. This requires a full establish of the R1s in the source location to the R2s in the bunker Symmetrix array. The R1s and R2s need to be synchronized continuously. The following then occur:

◆ A full establish from the R2s to the BCVs in the bunker Symmetrix array

◆ A full SRDF establish of the BCV/R1s to the R2s in the DR Symmetrix array

◆ A full establish of the R2s to the BCVs in the DR Symmetrix array

How to restart in the event of a disaster

In the event of a disaster, it is necessary to determine whether the most current copy of the data is on the R2s at the remote site or on the BCV/R1s in the bunker Symmetrix array. Depending on when the disaster occurs, the most current version could be on either set of disks.


Database log-shipping solutions

Log shipping is a strategy that some companies employ for disaster recovery. The process only works for databases using archive logging. The essence of log shipping is that changes to the database at the source site reflected in the log are propagated to the target site. These logs are then applied to a "standby" database at the target site to maintain a consistent image of the database that can be used for DR purposes.

Overview of log shipping

Change activity on the source database generates log information that is eventually copied from the redo logs to the archive logs to free up active log space. A process external to the database takes the archived logs and transmits them (usually over IP) to a remote DR location, which hosts a database in standby mode. A server at the standby location receives the archive logs and uses them to roll forward changes on the standby database.

If a disaster were to happen at the primary site, the standby database is brought online and made available to users, albeit with some loss of data.

Log-shipping considerations

When considering a log-shipping strategy, it is important to understand:

◆ What log shipping covers.
◆ What log shipping does not cover.
◆ Server requirements.
◆ How to instantiate and reinstantiate the database.
◆ How failback works.
◆ Federated consistency requirements.
◆ How much data will be lost in the event of a disaster.
◆ Manageability of the solution.
◆ Scalability of the solution.


Log-shipping limitations

Log shipping transfers only the changes to the database that are written into the redo logs and then copied to an archive log. Consequently, operations in the database that are not written to the redo logs do not get shipped to the remote site. To ensure that all transactions are written to the redo logs, run the following command:

alter database force logging;

Log shipping is a database-centric strategy: it does not address changes that occur outside of the database. Such changes include, but are not limited to, the following:

◆ Application files and binaries◆ Database configuration files◆ Database binaries◆ OS changes◆ Flat files

To sustain a working environment at the DR site, there are several procedures required to keep these objects up to date.

Server requirements

Log shipping requires a server at the remote DR site to receive and apply the logs to the standby database. It may be possible to offset this cost by using the server for other functions when it is not being used for DR. Database licensing fees for the standby database may also apply.

How to instantiate and reinstantiate the database

Log-shipping architectures need to be supported by a method of instantiating the database at the remote site. The method needs to be manageable and timely. For example, shipping 200 tapes from the primary site to the DR site may not be an adequate approach, considering the transfer time and database restore time.

Reinstantiation must also be managed. Some operations, mentioned above, do not carry over into the standby database. Periodically, it may be necessary to reinstantiate the database at the DR site. The process should be easily managed, but also should provide continuous DR protection. That is to say, there must be a contingency plan for a disaster during reinstantiation.


How failback works

An important component of any DR solution is designing a failback procedure. If the DR setup is tested with any frequency, this method should be simple and risk free. Log shipping in reverse works well when the primary site is still available. In the case of a disaster where the primary site data is lost, the database must be reinstantiated at the production site.

Federated consistency requirements

Most databases are not isolated islands of information. They frequently have upstream inputs and downstream outputs, triggers, and stored procedures that reference other databases. There may also be a workflow management system like MQ Series, Lotus Notes, or TIBCO managing queues containing work to be performed. This entire environment is a federated structure that needs to be recovered to the same point in time to get a transactionally consistent disaster restart point.

Log shipping is a single-database-centric solution and is not adequate in federated database environments.

Data loss expectations

If sufficient bandwidth is provisioned for the solution, the amount of data lost in a disaster will be approximately two logs' worth of information. In terms of time, this is approximately twice the length of time it takes to create an archive log. This time will most likely vary during the course of the day due to fluctuations in write activity.
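For example (hypothetical numbers), the exposure can be estimated directly from the redo generation rate and the redo log size:

```python
def log_shipping_rpo_minutes(redo_mb_per_min, log_size_mb):
    """Worst-case data loss for log shipping: roughly two archive
    logs, i.e. about twice the time it takes to fill one redo log."""
    return 2 * (log_size_mb / redo_mb_per_min)

# A 512 MB redo log filling at 64 MB/min switches every 8 minutes,
# so roughly 16 minutes of transactions are at risk.
loss = log_shipping_rpo_minutes(64, 512)
```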

Manageability of the solution

The manageability of a DR solution is key to its success. Log-shipping solutions have many components to manage, including servers, databases, and external objects as noted above. Some of the questions to answer to make a clear determination of the manageability of a log-shipping solution are:

◆ How much effort does it take to set up log shipping?

◆ How much effort is needed to keep it running on an on-going basis?

◆ What is the risk if something required at the target site is missed?

◆ If FTP is being used to ship the log files, what kind of monitoring is needed to guarantee success?


Scalability of the solution

The scalability of a solution is directly linked to its complexity. To successfully scale the DR solution, the following questions must be answered:

◆ How much more effort does it take to add more databases?

◆ How easy is the solution to manage when the database grows much larger?

◆ What happens if the quantity of updates increases dramatically?

Log shipping and remote standby database

A remote standby database is an Oracle database that is a physical copy of the production database and requires a standby control file. It can be created by restoring a backup of the production database or by using a storage hardware mirroring technology along with the standby control file.

Figure 59 depicts a standby database kept up to date using log shipping.

Figure 59 Log shipping and remote standby database

1. The database must be instantiated at the DR site, either from tape, by using SRDF, or, if it is small enough, by shipping it over IP.

2. As redo logs are filled, they are copied into archive logs in the archive directory.


3. A process external to the database copies the archive logs from the production site to the DR site.

4. Periodically, the archive logs are applied to the remote standby database.
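The external copy process in steps 3 and 4 is often just a scheduled script. A minimal sketch follows (the directory paths and the `.arc` naming convention are hypothetical; a production shipper would add retries, checksums, and monitoring):

```python
import pathlib
import shutil

def ship_logs(arch_dir, standby_dir):
    """Push archive logs not yet present at the standby location;
    returns the names shipped, in sequence order."""
    dst = pathlib.Path(standby_dir)
    dst.mkdir(parents=True, exist_ok=True)
    shipped = []
    for log in sorted(pathlib.Path(arch_dir).glob("*.arc")):
        target = dst / log.name
        if not target.exists():       # copy only new, completed logs
            shutil.copy2(log, target)
            shipped.append(log.name)
    return shipped
```

Rerunning the script after an interruption is safe: logs already present at the standby site are skipped.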

A standby database is kept in mount mode so that new logs shipped from the production database can be applied. As new logs arrive, use the following command to apply the logs to the standby database:

recover from archive_dir standby database;

In the event of a disaster, the standby database needs to be activated for production availability:

recover database;
alter database open;

Note: Client connections must be recataloged to point to the new location of the production database, or the IP address of the standby server must be updated to match that of the failed production server.

Log shipping and standby database with SRDF

Rather than using a file transfer protocol to ship the logs from the source to the target site, SRDF can be used as the transfer mechanism. The advantages to using SRDF to ship the logs are listed next.

Synchronous SRDF can be used to ship active and archive logs when the Symmetrix arrays are less than 200 km apart; this is a zero data loss solution. When the redo logs are replicated synchronously and a disaster occurs, all data up to the last completed transaction will be present at the remote site. When the database is restarted at the remote site, partial transactions will be rolled back and the committed transactions finished using log information.

In addition, federated DR requirements can be satisfied by synchronously replicating data external to the database.

◆ Guaranteed delivery - SRDF brings deterministic protocols into play, an important advantage over FTP. This means that SRDF guarantees what is sent is what is received. If packets are lost on the network, SRDF retransmits the packets as necessary.


◆ Restartability - SRDF also brings restartability to the log shipping functionality. Should a network outage or interruption cause data transfer to stop, SRDF can restart from where it left off after the network is returned to normal operations.

While replicating the logs to the remote site, the receiving volumes (R2s) cannot be used by the host, as they are read-only. If the business requirements are such that the standby database should be continually applying logs, BCVs can be used periodically to synchronize against the R2s and split. Then, the BCVs can be accessed by a host at the remote site and the archive logs retrieved and applied to the standby database.

Oracle Data Guard

In Oracle8i, and more significantly in the subsequent Oracle9i release, Oracle made significant changes to the standby database process described in the previous section. Rather than utilizing host- or storage-based replication methods for shipping the archive logs and manually applying them, Oracle created Data Guard, which automated the replication of redo information and the application of this data to a standby database under a single utility. Oracle9i and Oracle10g Data Guard allow DBAs to replicate transactions written to the primary database's redo logs and apply them, either synchronously or asynchronously, to a standby database running in a local or remote environment. Oracle Data Guard provides businesses with a simple and efficient disaster recovery mechanism for their enterprise database protection needs.

Data Guard overview

Oracle Data Guard ensures high availability, data protection, and disaster recovery of a database. Data Guard provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable production Oracle databases to survive disasters and data corruptions. Data Guard maintains these standby databases as transactionally consistent copies of the production database. If the production database becomes unavailable because of a planned or unplanned outage, Data Guard can switch any standby database to the production role, minimizing the downtime associated with the outage. Data Guard can be used with traditional backup, restore, and cluster techniques to provide a high level of data protection and availability.


A Data Guard configuration consists of one production database, also referred to as the primary database, and one or more (up to nine) standby databases. The databases in a Data Guard configuration are connected by Oracle Net and may be dispersed geographically. There are no restrictions on where the databases are located, provided they can communicate with each other.

A standby database is a transactionally consistent copy of the primary database. Using a backup copy of the primary database, up to nine standby databases can be created and incorporated into a Data Guard configuration. Once created, Data Guard automatically maintains each standby database by transmitting redo data from the primary database, and then applying this redo to the standby.

Data Guard protection modes

Oracle Data Guard first became available in Oracle8i. In that version of the software, synchronous replication of redo log transactions was unavailable; only asynchronous shipping of the archive logs could be configured. With Oracle8i Data Guard, the ARCH process was responsible for replicating the redo information over the network to the standby server. Once the information was received by the remote host, a remote file server (RFS) process would write it to an archive log, where it could be applied to the standby database.

The Oracle9i Data Guard release substantially improved this process. In this release, LGWR is used to replicate redo information from the primary to the standby host. Because LGWR, rather than the ARCH process, copies the redo information, both synchronous and asynchronous replication are available. Three modes of operation are configurable in Oracle9i and Oracle10g Data Guard:

◆ Maximum Protection - This is a synchronous method of replicating redo information. Data must be written on the standby side before acknowledgement on the primary database. If network connectivity to the standby host is lost (that is, the primary and standby database cannot be kept in synchronization), then the primary database will be shut down. Network latency and bandwidth significantly impact primary database performance.

◆ Maximum Availability - This is a synchronous method of replicating redo information. Data must be written on the standby host before acknowledgement on the primary database. If network connectivity to the standby host is lost, however, this mode allows primary database operations to continue. Network latency and bandwidth significantly impact primary database performance.

◆ Maximum Performance - This is an asynchronous method of replicating redo information. Replicated log information is allowed to diverge from the primary database. Small amounts of data may not be replicated from the primary to standby database, but performance impact on the primary is minimal.

While the first two options provide synchronous replication between source and target servers, latency limitations usually prevent their use except in short-distance situations.
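The protection mode is selected on the primary database. As an illustrative sketch of the standard Data Guard commands (not taken from this document), the mode can be changed with an ALTER DATABASE statement; raising the mode to Maximum Protection generally requires the database to be mounted rather than open:

```sql
-- Run on the primary database (sketch; mount first when raising the mode)
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;

-- Choose one of the three protection modes
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;
-- Alternatives: ... TO MAXIMIZE PROTECTION;  ... TO MAXIMIZE PERFORMANCE;

ALTER DATABASE OPEN;
```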

Data Guard services

Data Guard consists of three high-level services: Log Transport services, Log Apply services, and Role Management services.

◆ Log Transport - Log Transport services ensure that redo log information is successfully replicated from the primary site to the target site. Log Transport uses Oracle Net over the customer's existing LAN/WAN to replicate redo log information. Log information is transported in three forms: asynchronously as entire archive logs (using the ARCn process), asynchronously as writes to the standby redo logs (LGWR writes to async log ship buffers), or synchronously as writes to the standby redo logs (using the LGWR process).

◆ Log Apply - Log Apply services are configured on the standby host and are used to read log information replicated from the primary database and apply it to the standby. Primary database log information is read from archived logs or standby redo logs, both located on the standby host. Two host processes are used for this: MRP (Redo Apply) and LSP (SQL Apply).

◆ Role Management - Role Management services are used to control switchover and failover of the primary database. During a switchover operation, the primary database is gracefully demoted to standby status, while a standby database is promoted to a primary without data loss. Failover operations are initiated when the primary fails. In this case, all log information is applied and the standby database is configured to run as the new primary database.
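The three transport forms described above map to attributes of the LOG_ARCHIVE_DEST_n initialization parameter on the primary. The following is a hedged sketch only; the net service name stby is hypothetical:

```sql
-- Asynchronous shipping of entire archive logs via the ARCn process
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 = 'SERVICE=stby ARCH';

-- Asynchronous writes to standby redo logs via LGWR and async buffers
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 = 'SERVICE=stby LGWR ASYNC';

-- Synchronous writes to standby redo logs (no-data-loss configurations)
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 = 'SERVICE=stby LGWR SYNC AFFIRM';
```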

Figure 60 on page 280 shows a sample high-level overview of an Oracle10g Data Guard configuration. Additional details on the role of the processes shown in the diagram are found in the next section.


Figure 60 Sample Oracle10g Data Guard configuration

Data Guard processes

Oracle uses several existing and new background processes (listed in Table 11) for managing a Data Guard environment.


Table 11 Background processes for managing a Data Guard environment

LGWR (Log Writer): Sends redo log information from the primary host to the standby host via Oracle Net. LGWR can be configured to send data to standby redo logs on the standby host for synchronous operations.

ARCn (Archiver): Sends primary database archive logs to the standby host. This process is used primarily in configurations that do not use standby redo logs and are configured for asynchronous operations.

RFS (Remote File Server): Receives log data, either from the primary LGWR or ARCn processes, and writes it on the standby site to either the standby redo logs or archive logs. This process is configured on the standby host when Data Guard is implemented.


Physical and logical standby databases

A standby database may be configured as either a physical or a logical standby database.

A physical standby database is a block-for-block copy of the primary database. Redo log information is read and applied by the Redo Apply process (MRP) to individual data blocks on the standby database, similar to the way recovery would be performed if needed on the primary database. This has a number of advantages including:

◆ A physical standby is an exact block-level copy of the primary database and can be used as a backup source for recovery directly to the primary.

◆ Data Guard only ships change records from the redo logs to the target; this information can be significantly smaller than the actual block-level changes in the database.

◆ Recovery time is generally short as the database is in mount mode and redo log information is applied to the database as it is received. However, recovery times can only be reduced if logs are continuously being applied, requiring that the standby database always remain in managed recovery mode. While in managed recovery mode, the standby database may not be opened in read-only (or read/write with Oracle10g and Flashback) mode for user queries.
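The managed recovery behavior described above can be sketched with the following statements, run on the standby instance (a hedged example of the standard Data Guard commands, not text from this document):

```sql
-- Start Redo Apply in the background; the standby remains in mount mode
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

-- To open the standby read-only for queries, apply must first be stopped
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;
```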

Table 11 Background processes for managing a Data Guard environment (continued)

FAL (Fetch Archive Log): Manages the retrieval of corrupted or missing archive logs from the primary to the standby host.

MRP (Managed Recovery Process): Used by a physical standby database to apply logs retrieved from either the standby redo logs or from local copies of the archive logs.

LSP (Logical Standby Process): Used by a logical standby database to apply logs retrieved from either the standby redo logs or from local copies of the archive logs.

LNS (Network Server): Enables asynchronous writes to the standby site using the LGWR process and standby redo logs.


Some things to consider when employing a physical standby database include:

◆ A physical standby is completely dependent on redo log information. Therefore, if any type of unlogged operation is made to the primary database (such as batch loads, index creation, and so on) it will invalidate the standby.

◆ A physical standby only protects a single database source. If there are any external files that require protection, or a data dependency exists between the primary database and another application, Data Guard will not protect these configurations. Examples include environments with asynchronous messaging, other non-Oracle databases, related applications with additional data, or configuration files that require continuous protection.

◆ A physical standby database has no mechanism to coordinate Enterprise Consistency (a single protection and recovery point across multiple databases). If loosely or tightly coupled databases require a single point of recovery, achieving this by means of Data Guard is operationally complex.

◆ In an environment where multiple databases exist, each requires its own Data Guard configuration and protection. In data centers where customers have many database instances running Data Guard, management of the database systems can become complex. Oracle Data Guard Broker is a distributed management framework that simplifies management in these types of environments.

A logical standby database also reads the redo log information replicated from the primary host. However, instead of applying changes directly to the database, SQL statements based on the redo information are generated and then run against the standby. The SQL Apply process is used to keep the logical standby database in close synchronization with the primary. Some advantages with a logical standby database include:

◆ Read/write access - With a logical standby, the database can be open (replicated tables are read-only) for queries or updates while the SQL Apply process is applying data.

◆ Altering the database - Database changes can be made to a logical standby that do not prevent additional updates from the primary. For example, additional indexes can be created on tables to improve query performance and support a reporting instance. Likewise, new tables or materialized views can be created to allow read/write access or to improve performance for user queries.
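As a sketch of this capability (the index and table names are hypothetical), SQL Apply is briefly stopped, the local object is created, and apply is restarted:

```sql
-- Pause SQL Apply on the logical standby
ALTER DATABASE STOP LOGICAL STANDBY APPLY;

-- Depending on the database guard setting, local DDL may require
-- disabling the guard for the session first
ALTER SESSION DISABLE GUARD;

-- Hypothetical reporting index on a replicated table
CREATE INDEX orders_rpt_ix ON orders (order_date);

ALTER SESSION ENABLE GUARD;
ALTER DATABASE START LOGICAL STANDBY APPLY;
```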

Additional considerations for a logical standby database are:

◆ Because a logical standby is not a physical copy of the primary database and has its own database physical structure and ID, it cannot be used for media recovery of the primary database. It can only maintain similar database content.

◆ Because a logical standby is not tied to the primary database structure, with the ability to add indexes, materialized views, and other Oracle objects to improve its usage, there is an increased chance that the standby will get out of sync with the primary database. This makes a logical standby database a less than optimal solution for DR.

Oracle Data Guard Broker

Oracle Enterprise Manager (OEM) provides a web-based interface for viewing, monitoring, and administering primary and standby databases in a Data Guard configuration. The Data Guard Broker is a distributed management framework that automates and centralizes the creation, maintenance, and monitoring of a Data Guard implementation. Data Guard Broker can use either the OEM GUI or a CLI to automate and simplify:

◆ Creating and enabling Data Guard configurations, including setting up log transport and log apply services.

◆ Managing an entire Data Guard configuration from any system in the configuration.

In addition, the OEM GUI automates and simplifies:

◆ Creating a physical or logical standby database from a backup copy of the primary database.

◆ Adding new or existing standby databases to an existing Data Guard configuration.

◆ Monitoring log apply rates, capturing diagnostic information, and detecting problems quickly with centralized monitoring, testing, and performance tools.


Oracle Data Guard with EMC SRDF

SRDF and Data Guard both provide customers with the ability to create and maintain a synchronous or asynchronous copy of their database for DR purposes at a remote site. The decision to implement SRDF or Data Guard depends on the specific business needs and requirements for the environment in which they are deployed. Although they essentially perform similar tasks, there are still cases where both products may be deployed together.

Prior to the Oracle9i release of Data Guard in which standby redo logs could be configured, synchronous replication to the standby site could not be enabled. To ensure no data loss between the primary and standby databases, SRDF/S can be configured in conjunction with Data Guard to replicate only the logs as shown in Figure 61.

Figure 61 "No data loss" standby database

In Oracle9i, standby redo logs were added to the database. Standby redo logs on the target side enable Data Guard to maintain a "no data loss" standby database. These logs act like regular redo logs but are available to receive log information synchronously from the primary database. When the primary database flushes redo data from the log buffers to disk, information is also sent via Oracle Net to the target site. It is received on the target by the RFS process, which then writes it to the standby logs. This, in conjunction with Oracle's Real Time Apply technology in Oracle10g, enables Data Guard to maintain a synchronous copy of the primary database at the standby site.
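Standby redo logs are created on the standby site (and are often pre-created on the primary as well, to support role transitions). A hedged sketch follows; the file path, group number, and size are hypothetical, and the size should match that of the primary's online redo logs:

```sql
-- Add a standby redo log group on the standby database
ALTER DATABASE ADD STANDBY LOGFILE GROUP 4
  ('/u01/oradata/stby/srl04a.log') SIZE 100M;
```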



With the enhanced capabilities of Data Guard, SRDF may be overlooked in some customer environments. However, even if Data Guard is planned for a particular environment, SRDF is a useful tool for instantiating and reinstantiating an Oracle database on the standby site. Instantiation involves creating a consistent copy of the primary database at the target site. SRDF not only simplifies this process, it is also more efficient, because incremental reestablishes may be used after the initial full push of the data. SRDF provides an easy-to-use and efficient mechanism for replicating the database or outside data from the primary to the standby site whenever required, or back to the primary in the event of recovery after a failover.


Running database solutions

This section contains the following information for running database solutions:

◆ “Overview” on page 286

◆ “Advanced Replication” on page 286

◆ “Oracle Streams” on page 287

Overview

Running database solutions attempt to use DR resources in an active fashion. Instead of having the database and server sit idle waiting for a disaster to occur, the idea of having the database running and serving a useful purpose at the DR site is an attractive one. Active databases at the target site also minimize the recovery time required to make an application available in the event of a failure of the primary. The problem is that hardware-, server-, and database-replication-level solutions typically require exclusive access to the database, preventing users from accessing the target database. The solutions presented in this section perform replication at the application layer and therefore allow user access even while the database is being updated by the replication process.

In addition to an Oracle Data Guard logical standby database, which can function as a running database while log information is being applied to it, Oracle has two other methods of synchronizing data between disparate running databases. These running database solutions are Oracle's Advanced Replication and Oracle Streams, which are described at a high level in the following sections.

Advanced Replication

Advanced Replication is one method of replicating objects between Oracle databases. It is similar to Oracle's earlier Snapshot technology, in which changes to the underlying tables were tracked internally within Oracle and used to provide a list of the rows that needed to be sent to a remote location when a refresh of the remote object was requested. Instead of snapshots, Oracle now uses materialized views to track and replicate changes. A materialized view is a complete or partial copy of a target table from a single point in time.
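As an illustrative sketch (the table name, database link, and refresh interval are hypothetical), fast-refreshable replication of a table involves a materialized view log at the master site and a materialized view at the target site:

```sql
-- At the master site: track row changes to support fast refresh
CREATE MATERIALIZED VIEW LOG ON orders;

-- At the materialized view site: hourly fast refresh over a database link
CREATE MATERIALIZED VIEW orders_mv
  REFRESH FAST
  START WITH SYSDATE NEXT SYSDATE + 1/24
  AS SELECT * FROM orders@master_db;
```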


Advanced Replication has two types of replication sites: master sites and materialized view sites. A master site contains information that is used as a source for the materialized view. A materialized view site is the target site for the data to be replicated. At the materialized view site, additional data may be written to the materialized views. These views also may be updated with information sent back to the master site. Materialized views with multiple master sites for a single data object are also possible. In these situations, complexity is increased due to the need to handle conflicting data added at each of the sites for replication to the others. This type of replication is called Multimaster Replication:

◆ Advanced Replication can use either asynchronous or synchronous (two-phase commit) replication.

◆ Advanced Replication is rarely used for DR purposes. It is typically used to replicate infrequently changing table data between databases.

Note: The Oracle documentation Oracle Database Advanced Replication provides more information on Advanced Replication.

Oracle Streams

Streams is Oracle's distributed transaction solution for propagating table, schema, or entire database changes to one or many other Oracle databases. Streams uses logical change records (LCRs) captured from the source database to asynchronously distribute changes to one or more target databases. Both DML and DDL changes can be propagated between the source and target databases. Queues on the source and target databases are used to manage change propagation between the databases.

The process for distributing transactions includes the following stages:

◆ Capture - LCRs are created that capture DML or DDL changes from targeted objects in the source databases.

◆ Stage - LCRs are stored in a queue to be forwarded on to the target database(s).


◆ Propagate - LCRs are then passed via the network to queues located in the target database(s).

◆ Consume - LCRs are extracted off the queue with the corresponding DML or DDL changes being applied to the target database(s).
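The capture stage above is typically configured with the DBMS_STREAMS_ADM package. The following is a hedged sketch only; the schema, queue, and capture names are hypothetical, and exact parameters vary by release:

```sql
-- At the source: capture DML changes on a table into a Streams queue
BEGIN
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name   => 'scott.orders',
    streams_type => 'capture',
    streams_name => 'capture_orders',
    queue_name   => 'strmadmin.streams_queue',
    include_dml  => TRUE,
    include_ddl  => FALSE);
END;
/
```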

A key feature of Streams is the ability to detect and resolve conflicts between the databases participating in the replication process.

The Oracle Streams feature is rarely used for DR due to its asynchronous nature and inherent complexity.

Note: More detailed information on Streams is provided in the Oracle documentation Oracle Streams Concepts and Administration and Oracle Streams Replication Administrator's Guide.


Chapter 7 Oracle Database Layouts on EMC Symmetrix DMX

This chapter presents these topics:

◆ Introduction ...................................................................................... 290
◆ The performance stack .................................................................... 291
◆ Traditional Oracle layout recommendations ............................... 294
◆ Symmetrix DMX performance guidelines.................................... 297
◆ RAID considerations ....................................................................... 311
◆ Host- versus array-based striping ................................................. 318
◆ Data placement considerations ...................................................... 322
◆ Other layout considerations ........................................................... 328
◆ Oracle database-specific configuration settings .......................... 331
◆ The database layout process........................................................... 333


Introduction

Monitoring and managing database performance should be a continuous process in all Oracle environments. Establishing baselines and collecting database performance statistics for comparison against them is important for monitoring performance trends and maintaining a smoothly running system. The following section discusses the performance stack and how database performance should be managed in general. Subsequent sections discuss Symmetrix DMX layout and configuration issues to help ensure the database meets the required performance levels.


The performance stack

Performance tuning involves the identification and elimination of bottlenecks in the various resources that make up the system. These resources include the application, the code (SQL) that drives the application, the database, the host, and the storage. Tuning performance involves the following:

◆ Analyzing each of these individual components that make up an application

◆ Identifying bottlenecks or potential optimizations that can be made to improve performance

◆ Implementing changes that eliminate the bottlenecks or improve performance

◆ Verifying that the change has improved overall performance

Tuning performance is an iterative process and is performed until the benefits to be gained by continued tuning are outweighed by the effort required to tune the system.

Figure 62 on page 292 shows the various "layers" to be examined as a part of any performance analysis. The potential benefits achieved by analyzing and tuning a particular layer of the performance stack are not equal, however. In general, tuning the upper layers of the performance stack, such as the application and SQL statements, provides a much better return on investment than tuning the lower layers, such as the host or storage layers. For example, implementing a new index on a heavily used table that changes logical access from a full table scan to index lookup with individual row selection can vastly improve database performance if the statement is run many times (thousands or millions) a day.
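A minimal sketch of that kind of change (the table, column, and index names are hypothetical):

```sql
-- Before: a frequent lookup forces a full table scan of ORDERS
SELECT * FROM orders WHERE customer_id = :cust;

-- Adding an index converts the access path to an index lookup
CREATE INDEX orders_cust_ix ON orders (customer_id);
```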

When tuning an Oracle database application, developers, DBAs, system administrators, and storage administrators need to work together to monitor and manage the process. Efforts should begin at the top of the stack and address application and SQL statement tuning before moving down into the database and host-based tuning parameters. After all of these are addressed, storage-related tuning efforts should be performed.


Figure 62 The performance stack

[Figure 62 depicts the layers of the performance stack and typical problems at each level: application (poorly written application, inefficient code); SQL statements (SQL logic errors, missing indexes); database engine (database resource contention); operating system (file system parameter settings, kernel tuning, I/O distribution); storage system (storage allocation errors, volume contention).]

Importance of I/O avoidance

The primary goal at all levels of the performance stack is disk I/O avoidance. In theory, an ideal database environment is one in which most I/Os are satisfied from memory rather than going to disk to retrieve the required data. In practice, however, this is unrealistic, and careful consideration of the disk I/O subsystem is necessary. Optimizing performance of an Oracle database on an EMC Symmetrix DMX involves a detailed evaluation of the I/O requirements of the proposed application or environment. A thorough understanding of the performance characteristics and best practices of the Symmetrix array, including the underlying storage components (disks, directors, and so on), is also needed. Additionally, knowledge of complementary software products such as EMC SRDF, EMC TimeFinder, EMC Symmetrix Optimizer, and backup software, along with how using these products will affect the database, is important for maximizing performance. Ensuring optimal configuration for the Oracle database requires a holistic approach to application, host, and storage configuration planning. Configuration considerations for host- and application-specific parameters are beyond the scope of this document. Storage configuration considerations are covered in this chapter.

Storage-system layer considerations

What is the best way to configure Oracle on EMC Symmetrix DMX storage? This is a frequently asked question from customers. However, before recommendations are made, a detailed understanding of the configuration and requirements for the database, host(s), and storage environment is required. The principal goal for optimizing any layout on the Symmetrix DMX is to maximize the spread of I/O across the components of the array, reducing or eliminating any potential bottlenecks in the system. The following sections examine the trade-offs between optimizing storage performance and manageability for Oracle. They also discuss recommendations for laying out an Oracle database on EMC Symmetrix DMX arrays, as well as settings for storage-related Oracle configuration settings.


Traditional Oracle layout recommendations

Until recently, with the introduction of Automatic Storage Management (ASM), Oracle's best practices for optimally laying out a database focused on identifying potential sources of contention for storage-related resources. Eliminating contention involved understanding how the database managed the data flow process and ensuring that concurrent or near-concurrent storage resource requests were separated onto different physical spindles. Many of these recommendations still have value in a Symmetrix DMX environment. Before examining other storage-based optimizations, the next section presents a discussion of these recommendations.

Oracle's optimal flexible architecture

Oracle has long recommended their Optimal Flexible Architecture (OFA) for laying out databases on the storage. Although there is much disagreement as to whether OFA provides an optimal storage layout, many of the recommendations continue to make sense in Oracle environments on a Symmetrix DMX array. Some of the recommendations for performance that still generally apply include:

◆ Place redo log members (in addition to log groups) on separate hypers/spindles. This minimizes contention for the logs as new writes come in from the database and the old log is written to an archive log. It also isolates the sequential write and read activity for these members from other volumes with different access methods.

◆ Place redo logs and archive logs on separate hypers/spindles. This minimizes disk contention when writing to the archive logs while reading from the previous redo log.

◆ Separate INDEX tablespaces from their DATA counterparts. Index reads that result in table reads are better serviced from different physical devices to minimize disk contention and head movement.

◆ Isolate TEMP and UNDO tablespaces from DATA and INDEX information. TEMP and UNDO typically do long sequential writes and reads. Sequential access to data should be isolated from more random-access object types to limit head movement and improve performance.


Replication of Oracle databases also plays a critical role in the way a database should be laid out. To create a backup image of a database while it is open or hot, Oracle requires that the archive logs be replicated after the "inconsistent" data tablespaces (DATA, INDEX, and so on) have been replicated. When using replication software such as EMC TimeFinder or SRDF, log information must be copied after the data is replicated. This requires that log information reside on separate hypers from the data volumes. When configuring Oracle in a Symmetrix DMX environment, the archive logs, redo logs, and control files should be placed on separate hypervolumes from other data volumes. Also, because it is easier to re-create a TEMP tablespace than to replicate it (either locally or remotely), temporary tablespaces should also be placed on their own separate hypervolumes. A TEMP tablespace can then be re-created using a "CREATE TEMPORARY TABLESPACE TEMP. . ." statement while the database is in mount mode, before it is fully opened.
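A hedged sketch of such a statement follows; the file path and sizes are hypothetical:

```sql
-- Re-create the TEMP tablespace on the copy rather than replicating it
CREATE TEMPORARY TABLESPACE temp
  TEMPFILE '/u02/oradata/copy/temp01.dbf' SIZE 2G
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M;
```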

OFA provides some general recommendations for laying out an Oracle database on a storage array. The key point with OFA, or any layout optimization recommendation, is that it is critical to understand the type (sequential or random), size (large or small), and quantity (low, medium, or high) of I/O against the various tablespaces and other elements (logs, control files, and so on) of the database. Without a clear understanding of the data elements and the access patterns expected against them, contention issues on the back-end directors or physical spindles may arise and seriously degrade Oracle performance. Knowledge of the application, both data elements and access patterns, is critical to ensuring high performance in the database environment.

Oracle layouts and replication considerations

If array replication technologies like TimeFinder and SRDF are planned, it is prudent to organize the database in such a way as to facilitate recovery. Since array replication techniques copy volumes at the physical disk level (as seen by the host), all datafiles for a database should be created on a set of disks dedicated to the database and not shared with other applications and databases. For UNIX systems, ensure that the data files reside in a volume group dedicated to the database. Sharing with other applications can cause unnecessary work for the array and wasted space on the target volumes.


In addition to isolating the database to be copied onto its own dedicated volumes, the database should be divided into two parts: the data structures and the recovery structures. The recovery structures consist of the redo logs, the archive logs, and the control files. The database data volumes hold the data files for all tablespaces in the database and the ORACLE_HOME directory if desired.

Automatic Storage Management

A new feature related to Oracle data layouts in Oracle10g Release 1 is Oracle Automatic Storage Management (ASM). Using ASM reduces database layout complexity and management in Oracle10g environments, since the database itself determines where extents are placed and how they are managed.


Symmetrix DMX performance guidelines

Optimizing performance for Oracle in an EMC Symmetrix DMX environment is similar to optimizing performance for any application on the storage array. Maximizing performance requires a clear understanding of the I/O requirements of the applications accessing storage. The overall goal when laying out an application on disk devices in the back end of the Symmetrix DMX is to reduce or eliminate bottlenecks in the storage system by spreading the I/O across all of the array's resources. Inside a Symmetrix DMX array, there are several areas to consider:

◆ Front-end connections into the Symmetrix DMX — This includes the number of connections from the host to the Symmetrix DMX that are required, and whether front-end Fibre Channel ports will be directly connected or a SAN will be deployed to share ports between hosts.

◆ Memory cache in the Symmetrix DMX — All host I/Os pass through cache on the Symmetrix DMX. I/O can be adversely affected if insufficient cache is configured in the Symmetrix DMX for the environment. Also, writes to individual hypervolumes or to the array as a whole may be throttled when a threshold known as the "write-pending limit" is reached.

◆ Back-end considerations — There are two sources of possible contention in the back-end of the Symmetrix array: the back-end directors and the physical spindles. Proper layout of the data on the disks is needed to ensure satisfactory performance.

Front-end connectivity

Optimizing front-end connectivity requires an understanding of the number and size of I/Os, both reads and writes, that will be sent between the hosts and the Symmetrix DMX array. There are limitations to the amount of I/O that each front-end director port, each front-end director processor, and each front-end director board can handle. Additionally, SAN fan-out counts (that is, the number of hosts that can be attached through a Fibre Channel switch to a single front-end port) need to be carefully managed.

A key concern when optimizing front-end performance is determining which of the following I/O characteristics is more important in the customer's environment:


◆ Input/output operations per second (IOPS)

◆ Throughput (MB/s)

◆ A combination of IOPS and throughput

In OLTP database applications, where I/Os are typically small and random, IOPS is the more important factor. In DSS applications, where transactions in general require large sequential table or index scans, throughput is the more critical factor. In some databases, a combination of OLTP- and DSS-like I/Os are required. Optimizing performance in each type of environment requires tuning the host I/O size.

Figure 63 depicts the relationships between the block size of a random read request from the host, and both IOPS and throughput needed to fulfill that request from the Symmetrix DMX.

Figure 63 Relationship between host block size and IOPS/throughput

The figure shows that the maximum number of IOPS is achieved with smaller block sizes such as 4 KB (4096 bytes). For OLTP applications, where the typical Oracle DB_BLOCK_SIZE is 4 KB or 8 KB, the Symmetrix DMX provides higher IOPS but lower throughput. The opposite is true for DSS applications.

[Figure 63 chart: IOPS and throughput vs. blocksize, random read cache hit. The y-axis shows percent of maximum; the x-axis shows block sizes from 512 to 65536 bytes; the two series are I/Os per second and MB per second.]

Tuning the host to send larger I/O sizes for DSS applications can increase the overall throughput (MB/s) from the front-end directors on the DMX. Database block sizes are generally larger (16 KB or even 32 KB) for DSS applications. Sizing the host I/O size as a power-of-two multiple of DB_BLOCK_SIZE and tuning DB_FILE_MULTIBLOCK_READ_COUNT appropriately is important for maximizing performance in a customer's Oracle environment.
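The inverse relationship shown in Figure 63 can be sketched with a simple service-time model. This is purely illustrative: the per-I/O overhead and port rate below are assumed round numbers, not DMX specifications.

```python
# Toy model: each I/O pays a fixed overhead plus a transfer time that
# grows with block size. Larger blocks therefore lower achievable IOPS
# while raising achievable MB/s, mirroring the curves in Figure 63.
PER_IO_OVERHEAD_S = 0.0001   # assumed fixed cost per I/O, in seconds
PORT_RATE_BPS = 200e6        # nominal 200 MB/s Fibre Channel port

def port_profile(block_size):
    """Return (IOPS, MB/s) sustainable at the given block size in bytes."""
    service_time = PER_IO_OVERHEAD_S + block_size / PORT_RATE_BPS
    iops = 1.0 / service_time
    return iops, iops * block_size / 1e6

for bs in (512, 4096, 8192, 32768, 65536):
    iops, mbps = port_profile(bs)
    print(f"{bs:>6} bytes: {iops:7.0f} IOPS, {mbps:6.1f} MB/s")
```

Small blocks maximize IOPS (the OLTP case), while large blocks maximize MB/s (the DSS case).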

Currently, each Fibre Channel port on the Symmetrix DMX is theoretically capable of 200 MB/s of throughput. In practice however, the throughput available per port is significantly less and depends on the I/O size and on the shared utilization of the port and processor on the director. Increasing the size of the I/O from the host perspective decreases the number of IOPS that can be performed, but increases the overall throughput (MB/s) of the port. As such, increasing the I/O block size on the host is beneficial for overall performance in a DSS environment. Limiting total throughput to a fraction of the theoretical maximum (100 to 120 MB/s is a good "rule of thumb") will ensure that enough bandwidth is available for connectivity between the Symmetrix DMX and the host.
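As a back-of-the-envelope sizing sketch, the front-end port count for a target aggregate throughput can be derived from the rule of thumb above. The 110 MB/s planning figure is simply the midpoint of the quoted 100 to 120 MB/s range, an assumption for illustration.

```python
import math

# Plan each front-end Fibre Channel port at ~110 MB/s (midpoint of the
# 100-120 MB/s rule of thumb) rather than the 200 MB/s theoretical maximum.
PLANNED_MB_S_PER_PORT = 110

def ports_needed(required_mb_s):
    """Minimum number of front-end ports for the required throughput."""
    return math.ceil(required_mb_s / PLANNED_MB_S_PER_PORT)

print(ports_needed(400))  # a hypothetical 400 MB/s DSS workload -> 4 ports
```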

Symmetrix cache

The Symmetrix cache plays a key role in improving I/O performance in the storage subsystem. The cache improves performance by allowing write acknowledgements to be returned to a host when data is received in solid-state cache, rather than after being fully destaged to the physical disk drives. Additionally, reads benefit from cache when sequential requests from the host allow follow-on reads to be prestaged in cache. The following sections briefly describe how the Symmetrix cache is used for writes and reads, and then discuss performance considerations.

Write operations and the Symmetrix cache

All write operations on a Symmetrix array are serviced by cache. When the front-end director receives a write, a cache slot must be found to service the operation. Because cache slots are a representation of the underlying hypervolume, if a prior read or write operation already caused the required data to be loaded into cache, the existing cache slot may be used to store the write I/O. If no cache slot representing the storage area is found, a call is made to locate a free cache slot for the write. The write operation is moved into the cache slot and the slot is marked write pending. At a later point, Enginuity destages the write to physical disk. The decision of when to destage is based on overall system load, physical disk activity, read operations to the physical disk, and the availability of cache.

Cache is used to service the write operation to optimize the performance of the host system. As write operations to cache are significantly faster than physical writes to disk media, the write is reported as complete to the host operating system much earlier. Battery backup and priority destage functions within the Symmetrix ensure that no data loss occurs in the event of system power failure.

If the write operation to a given disk is delayed due to higher priority operations (read activity is one such operation), the write-pending slot remains in cache for longer time periods. Cache slots are allocated as needed to a volume for this purpose. Enginuity calculates thresholds for allocations to limit the saturation of cache by a single hypervolume. These limits are referred to as write-pending limits.

Cache allocations are made on a per-hypervolume basis. As write-pending thresholds are reached, additional allocations may occur, as well as reprioritization of write activity. As a result, write operations to the physical disks may increase in priority to ensure that excessive cache allocations do not occur. This is discussed in more detail in the next section.

Thus, the cache enables buffering of writes and allows a steady stream of write activity to service the destaging of write operations from a host. In a "bursty" write environment, this serves to even out the write activity. Should write activity constantly exceed the low-priority write rate to the physical disk, Enginuity raises the priority of write operations to attempt to meet the write demand. Ultimately, should the write load from the host exceed the physical disk's ability to write, the volume's maximum write-pending limit may be reached. In this condition, new cache slots are allocated for writes to a particular volume only after a currently allocated slot is freed by destaging it to disk. This condition, if reached, may severely impact write operations to a single hypervolume.


Read operations and the Symmetrix cache

As mentioned in the previous section, read operations typically have an elevated priority for service from the physical disks. Because user processes normally wait for an I/O operation to complete before continuing, this is generally a good practice for storage arrays, especially those able to satisfy write operations from cache.

When a read request is detected from a host system, Enginuity checks whether a corresponding cache slot representing the storage area exists. If so, the read request can be serviced immediately; this is considered a read hit. If the required data is not in cache but free slots are available, the read operation must wait for a transfer from disk; this is called a short read miss. If no cache slot exists for the read to be transferred into, a cache slot must first be allocated and the read physically transferred into the slot; this is referred to as a long read miss.

Although cache slots are 32 KB in size, a cache slot may contain only the requested data. That is, if a read request is made for an 8 KB block, then only that 8 KB block will be transferred into the cache slot, as opposed to reading the entire 32 KB track from disk. The smallest read request unit is 4 KB (a sector).

Note: In the DMX-3, the cache slot size increases from 32 KB to 64 KB. Sectors also increase from 4 KB to 8 KB.
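The three read outcomes described above can be summarized in a short sketch. The cache model here is purely illustrative, not Enginuity internals.

```python
# Illustrative classification of a read request against a toy cache:
# hit (a slot holds the data), short miss (data absent but a free slot
# exists), long miss (a slot must first be allocated before the transfer).
def classify_read(track, cache_slots, free_slots):
    """cache_slots: set of tracks currently in cache; free_slots: int."""
    if track in cache_slots:
        return "read hit"            # serviced immediately from cache
    if free_slots > 0:
        return "short read miss"     # wait only for the disk transfer
    return "long read miss"          # allocate a slot, then transfer

print(classify_read(7, {7, 9}, 0))   # read hit
print(classify_read(3, {7, 9}, 2))   # short read miss
print(classify_read(3, {7, 9}, 0))   # long read miss
```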

Symmetrix cache performance considerations

An important performance consideration is to ensure that an appropriate amount of cache is installed in the Symmetrix DMX. All I/O requests from hosts attached to the array are serviced from the Symmetrix DMX cache, which can be thought of as an extension of the database's own buffering mechanisms. As such, many database application environments can benefit from additional Symmetrix DMX cache. For newly purchased arrays, the sales team sizes the cache appropriately based on the number and size of physical spindles, the configuration (including number and type of volumes), replication requirements (SRDF, for example), and customer requirements.

Cache usage can be monitored through a number of Symmetrix DMX monitoring tools, chief among them ControlCenter Performance Manager (formerly known as WorkLoad Analyzer). Performance Manager contains a number of views that analyze Symmetrix DMX cache utilization at both the hypervolume and overall system level, and provides detailed information on specific component utilization, including disks, directors (front-end and back-end), and cache.

Symmetrix cache plays a key role in host I/O read and write performance. Read performance can be improved through prefetching by the Symmetrix DMX if the reads are sequential in nature. Enginuity algorithms detect sequential read activity and pre-stage reads from disk in cache before the data is requested. Write performance is also greatly enhanced because all writes are acknowledged back to the host when they reach cache rather than when they are written to disk. While reads from a specific hypervolume can use as much cache as is required to satisfy host requests (assuming free cache slots are available), the DMX limits the number of writes that can be written to a single volume (the write-pending limit discussed earlier). Understanding the Enginuity write-pending limits is important when planning for optimal performance.

As previously discussed, the write-pending limit is used to prevent high write rates to a single hypervolume from consuming all of the storage array cache for its use, at the expense of performance for reads or writes to other volumes. The write-pending limit for each hypervolume is determined at system startup and depends on the number and type of volumes configured and the amount of cache available. The limit is not dependent on the actual size of each volume. The more cache available, the more write requests that can be serviced in cache by each individual volume. While some sharing of unused cache may occur (although this is not guaranteed), an upper limit of three times the initial write-pending limit assigned to a volume is the maximum amount of cache any hypervolume can acquire for changed tracks. If the maximum write-pending limit is reached, destaging to disk must occur before new writes can come in. This forced destaging to disk before a new write is received into cache limits writes to that particular volume to physical disk write speeds. Forced destage of writes can significantly reduce performance to a hypervolume should the write-pending limit be reached. If performance problems with a particular volume are identified, an initial step in determining the source of the problem should include verification of the number of writes and the write-pending limit for that volume.

In addition to limits imposed at the hypervolume level, write-pending limits are imposed at the system level. Two key cache utilization points for the DMX are reached when 40 percent and 80 percent of the cache is used for pending writes. Under normal operating conditions, satisfying read requests from a host has greater priority than satisfying write requests. However, when pending writes consume 40 percent of cache, the Symmetrix DMX prioritizes reads and writes equally. This reprioritization can have a profound effect on database performance. The degradation is even more pronounced if cache utilization for writes reaches 80 percent. At that point, the DMX begins a forced destage of writes to disk, with discernible performance degradation of both writes and reads. If this threshold is reached, it is a clear indicator that both the cache size and the total I/O on the array need to be reexamined.

Write-pending limits are also established for Symmetrix metavolumes. Metavolumes are created by combining two or more individual hypervolumes into a single logical device that is then presented to a host as a single logical unit (LUN). Metavolumes can be created as concatenated or striped metavolumes. Striped metavolumes use a stripe size of 960 KB. Concatenated metavolumes write data to the first hyper in the metavolume (meta head) and fill it before beginning to write to the next member of the metavolume. Write-pending limits for a metavolume are calculated on a member by member (hypervolume) basis.

Determining the write-pending limit and current number of writes pending per hypervolume can be done simply using SYMCLI commands.

The following SYMCLI command returns the write-pending limit for hypervolumes in a Symmetrix system:

symcfg -v list | grep pending

Max # of system write pending slots : 162901
Max # of DA write pending slots     : 81450
Max # of device write pending slots : 4719

The exact number of cache slots available for writes to a hypervolume varies with the amount of cache available in the system. However, the maximum number of write-pending slots an individual hypervolume can use is up to three times the maximum number of device write-pending slots listed above (3 x 4,719 = 14,157 write-pending tracks).
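In other words, the ceiling for a single hypervolume follows directly from the symcfg figures; a trivial sketch of the arithmetic:

```python
def max_device_wp_tracks(device_wp_slots, factor=3):
    """Ceiling of write-pending tracks one hypervolume can accumulate,
    up to three times its initial device write-pending limit."""
    return factor * device_wp_slots

print(max_device_wp_tracks(4719))  # -> 14157, matching the output above
```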

The number of write-pending slots used by a host's devices is found using the SYMCLI command:

symstat -i 30

                                KB/sec      KB/sec     % Hits   %Seq  Num WP
          DEVICE              READ WRITE  READ WRITE   RD  WRT  READ  Tracks

13:09:52  035A (Not Visible )    0    0     0    0    N/A  N/A   N/A      2
          0430 (Not Visible )    0    0     0    0    100    0   100   2679
          0431 (Not Visible )    0    0     0    0    100    0   100   2527
          0432 (Not Visible )    0    0     0    0     82   28     0   2444
          0434 (Not Visible )    0    0     0    0      0  100     0  14157
          0435 (Not Visible )    0    0     0    0      0  100     0  14157
          043A (Not Visible )    0    0     0    0    N/A  N/A   N/A     49
          043C (Not Visible )    0    0     0    0    N/A  N/A   N/A     54
          043E (Not Visible )    0    0     0    0    N/A  N/A   N/A     15
          043F (Not Visible )    0    0     0    0    N/A  N/A   N/A     10
          0440 (Not Visible )    0    0     0    0    N/A  N/A   N/A    807
          0441 (Not Visible )    0    0     0    0    N/A  N/A   N/A    328
          0442 (Not Visible )   13    1    66    0      0  100     0     17
          0443 (Not Visible )    0    0     0    0    100    0   100   1597
          0444 (Not Visible )    0    0     0    0    N/A  N/A   N/A      4

From this output, we see that devices 434 and 435 have reached the device write-pending limit of 14,157 tracks. The cause of the excessive writes to these devices should be analyzed further, and methods of alleviating this performance bottleneck should be identified.
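A monitoring script along the following lines could flag devices that have hit their ceiling. This is a sketch, not EMC-supplied tooling; it assumes the device names and Num WP Tracks values have already been extracted from the symstat output.

```python
# Flag devices whose write-pending track count has reached the ceiling
# (3 x 4,719 = 14,157 on the array shown above).
WP_LIMIT = 3 * 4719

def devices_at_wp_limit(rows):
    """rows: iterable of (device_name, wp_tracks) tuples."""
    return [dev for dev, wp in rows if wp >= WP_LIMIT]

sample = [("0430", 2679), ("0434", 14157), ("0435", 14157), ("0442", 17)]
print(devices_at_wp_limit(sample))  # -> ['0434', '0435']
```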

Alternatively, Performance Manager may be used to determine the device write-pending limit and whether device limits are being reached. Figure 64 on page 305 is a Performance Manager view displaying both the write-pending limit and the write-pending count for a given device, in this example Symmetrix device 055. For the Symmetrix in this example, the number of write-pending slots per device was 9,776, and thus the maximum write-pending limit was 29,328 slots (3 x 9,776). In general, a distinct flat line in such graphs indicates that a limit has been reached.


Figure 64 Performance Manager graph of write-pending limit for a single hypervolume

Metavolumes can also be monitored using Performance Manager. Figure 65 on page 306 shows a four-member striped metavolume under the same workload as shown in the previous figure. Each of the volumes services a proportion of the workload. Due to the location of the file being created and the stripe depth used, one of the volumes incurred more write activity. However, even in this case, it did not exceed the lowest of the write-pending thresholds, let alone reach the maximum write-pending limit.

[Figure 64 chart: write-pending count and maximum write-pending threshold for Symmetrix device 055, plotted over time (16:50 to 17:14); y-axis 0 to 30,000 slots.]


Figure 65 Performance Manager graph of write-pending limit for a four-member metavolume

At the same time, the throughput and overall performance of the workload were substantially improved. Figure 66 on page 307 compares certain metrics in the two configurations. This is obviously not an equal comparison, since it matches a single hypervolume against the four hypervolumes of the metavolume. However, it does show that multiple disks can satisfy an intense workload that clearly exceeds the capability of a single device. It also demonstrates the management and dynamic allocation of cache resources for volumes.

[Figure 65 chart: write-pending counts for the four metavolume members (devices 00E, 00F, 010, and 011) and the maximum write-pending threshold for device 00F, plotted over time (12:49 to 13:25); y-axis 0 to 10,000 slots.]


Figure 66 Write workload for a single hyper and a striped metavolume

Note that the number of cache boards also has a minor effect on performance. When comparing Symmetrix DMX arrays with the same amount of cache, increasing the number of boards (for example, four cache boards with 16 GB each as opposed to two cache boards with 32 GB each) has a small positive effect on performance in DSS applications. This is due to the increased number of paths between the front-end directors and cache, which improves overall throughput. However, configuring additional boards is only helpful in high-throughput environments such as DSS applications. For OLTP workloads, where IOPS are more critical, additional cache directors provide no added performance benefit, because the number of IOPS per port or director is limited by the processing power of the CPUs on each board.

Note: In the DMX-3, the write-pending limit for individual volumes is modified. Instead of allowing writes up to three times the initial write-pending limit, up to approximately 1/20 of the cache can be used by any individual hypervolume.

[Figure 66 chart: transactions per second for read, write, and total I/O, comparing the single hypervolume ("Hyper") with the striped metavolume ("Meta").]


Back-end considerations

Back-end considerations are typically the most important part of optimizing performance on the Symmetrix DMX. Advances in disk technology have not kept up with performance increases in other parts of the storage array, such as director and bandwidth (that is, Direct Matrix versus bus) performance. Disk access speeds have increased by a factor of three to seven in the last decade, while other components have easily increased by one to three orders of magnitude. As such, most performance bottlenecks in the Symmetrix DMX are attributable to physical spindle limitations.

An important consideration for back-end performance is the number of physical spindles available to handle the anticipated I/O load. Each disk is capable of a limited number of operations. Algorithms in the Symmetrix DMX Enginuity operating environment optimize I/Os to the disks. Although this helps to reduce the number of reads and writes to disk, access to disk, particularly for random reads, is still a requirement. If an insufficient number of physical disks are available to handle the anticipated I/O workload, performance will suffer. It is critical to determine the number of spindles required for an Oracle database implementation based on I/O performance requirements, and not solely on the physical space considerations.

To reduce or eliminate back-end performance issues on the Symmetrix DMX, carefully spread access to the disks across as many back-end directors and physical spindles as possible. EMC has long recommended that application data placement "go wide before going deep": performance is improved by spreading data across the back-end directors and disks, rather than confining individual applications to specific physical spindles. Significant attention should be given to balancing the I/O on the physical spindles. Understanding the I/O characteristics of each datafile and separating high-I/O application data onto separate physical disks will minimize contention and improve performance. Implementing Symmetrix Optimizer may also help to reduce I/O contention between hypervolumes on a physical spindle. Symmetrix Optimizer identifies I/O contention on individual hypervolumes and nondisruptively moves one of the hypers to a new location on another disk. It is an invaluable tool for reducing contention on physical spindles should workload requirements change in an environment.


Placement of data on the disks is another performance consideration. Due to the rotational properties of disk platters, tracks on the outer parts of the disk perform better than inner tracks. While the Symmetrix DMX Enginuity algorithms smooth out much of this variation, small performance increases can be achieved by placing high I/O objects on the outer parts of the disk. Of more importance, however, is minimizing the seek times associated with the disk head moving between hypervolumes on a spindle. Physically locating higher I/O devices together on the disks can significantly improve performance. Disk head movement across the platters (seek time) is a large source of latency in I/O performance. By placing higher I/O devices contiguously, disk head movement may be reduced, increasing I/O performance of that physical spindle.

Additional layout considerations

A few additional factors can determine the best layout for a given hardware and software configuration. It is important to evaluate each factor to create an optimal layout for an Oracle database.

Host bus adapter

A host bus adapter (HBA) is a circuit board and/or integrated circuit adapter that provides I/O processing and physical connectivity between a server and a storage device. The connection may route through Fibre Channel switches if Fibre Channel FC-SW is used. Because the HBA relieves the host microprocessor of both data storage and retrieval tasks, it can improve the server's performance. An HBA and its associated disk subsystems are sometimes referred to as a disk channel.

HBAs can be a bottleneck if an insufficient number of them are provisioned for a given throughput environment. When configuring Oracle systems, estimate the throughput required and provision sufficient HBAs accordingly.

Host addressing limitations

There are also limitations on the number of LUNs that can be addressed on a host channel. For instance, the maximum number of LUNs that can be presented on AIX is 512, while on other operating systems the maximum is 256.


These factors must be considered when designing the database storage infrastructure. The final architecture will always be a compromise between what is ideal and what is economically feasible within the constraints of the implementation environment.

Configuration recommendations

Key recommendations for configuring the Symmetrix DMX for optimal performance include the following:

◆ Understand the I/O — It is critical to understand the characteristics of the database I/O, including the number, type (read or write), size, location (that is, data files, logs), and sequentiality of the I/Os. Empirical data or estimates are needed to assist in planning.

◆ Physical spindles — The number of disk drives in the DMX should first be determined by calculating the number of I/Os required, rather than solely based on the physical space needs. The key is to ensure that the front-end needs of the applications can be satisfied by the flow of data from the back end.

◆ Spread out the I/O — Both reads and writes should be spread across the physical resources (front-end and back-end ports, physical spindles, hypervolumes) of the DMX. This helps to prevent bottlenecks such as hitting port or spindle I/O limits, or reaching write-pending limits on a hypervolume.

◆ Bandwidth — A key consideration when configuring connectivity between a host and the Symmetrix DMX is the expected bandwidth required to support database activity. This requires an understanding of the size and number of I/Os between the host and the Symmetrix system, and consideration of both the number of HBAs and the number of Symmetrix front-end ports.


RAID considerations

For years, Oracle has recommended that all database storage be striped and mirrored; this philosophy of "stripe and mirror everything" (SAME) is well known in the Oracle technical community. While laying out databases using SAME may provide optimal performance in most circumstances, in some situations acceptable performance (IOPS or throughput) can be achieved with more economical RAID configurations such as RAID 5. Before discussing RAID recommendations for Oracle, a definition of each RAID type available on the Symmetrix DMX is required.

Types of RAID

The following RAID configurations are available on the Symmetrix DMX:

◆ Unprotected - This configuration is not typically used in a Symmetrix DMX environment for production volumes. BCVs and occasionally R2 devices (used as target devices for SRDF) can be configured as unprotected volumes.

◆ RAID 1 - These are mirrored devices and are the most common RAID type in a Symmetrix DMX. Mirrored devices require writes to both physical spindles. However, intelligent algorithms in the Enginuity operating environment can use both copies of the data to satisfy read requests not in the cache of the Symmetrix DMX. RAID 1 offers optimal availability and performance, but at an increased cost over other RAID protection options.

◆ RAID 5 - A relatively recent addition to Symmetrix data protection (Enginuity 5670+), RAID 5 stripes parity information across all volumes in the RAID group. RAID 5 offers good performance and availability, at a decreased cost. Data is striped using a stripe width of four tracks (128 KB on DMX-2 and 256 KB on DMX-3). RAID 5 is configured either as RAID 5 3+1 (75% usable) or RAID 5 7+1 (87.5 % usable) configurations. Figure 67 shows the configuration for 3+1 RAID 5 while Figure 68 on page 313 shows how a random write in a RAID 5 environment is performed.


Figure 67 3+1 RAID 5 layout detail

◆ RAID-S - Proprietary EMC RAID configuration with parity information on a single hypervolume. RAID-S functionality was optimized and renamed as parity RAID in the Symmetrix DMX.

◆ Parity RAID - Prior to the availability of RAID 5, parity RAID was implemented in storage environments that required a lower cost and did not have high performance requirements. Parity RAID uses a proprietary RAID protection scheme with parity information being written on a single hypervolume. Parity RAID is configured in 3+1 (75 percent usable) and 7+1 (87.5 percent usable) configurations. Parity RAID is not recommended in current DMX configurations, where RAID 5 should be used instead.

[Figure 67 detail: RAID 5 3+1 array; the stripe size is 4 tracks wide.]

    Disk 1         Disk 2         Disk 3         Disk 4
    Parity 1-12    Data 1-4       Data 5-8       Data 9-12
    Data 13-16     Parity 13-24   Data 17-20     Data 21-24
    Data 25-28     Data 29-32     Parity 25-36   Data 33-36
    Data 37-40     Data 41-44     Data 45-48     Parity 37-48


◆ RAID 1/0 - These are striped and mirrored devices. This configuration is only used in mainframe environments.
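The usable-capacity percentages quoted for the protection schemes above follow directly from group geometry, as a quick sketch shows:

```python
def usable_fraction(data_members, parity_members=1):
    """Usable capacity fraction of a parity-protected RAID group."""
    return data_members / (data_members + parity_members)

print(usable_fraction(3))  # RAID 5 or parity RAID 3+1 -> 0.75
print(usable_fraction(7))  # RAID 5 or parity RAID 7+1 -> 0.875
# RAID 1 mirroring yields 0.5 usable; unprotected volumes yield 1.0.
```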

Figure 68 Anatomy of a RAID 5 random write

The following describes the process of a random write to a RAID 5 volume:

1. A random write is received from the host and is placed into a data slot in cache to be destaged to disk.

2. The write is destaged from cache to the physical spindle. When the data is received, the drive calculates new parity information by reading the old data and XORing it with the new data.

3. The new parity information is written back to Symmetrix cache.

4. The new parity information is written to the appropriate parity location on another physical spindle.
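The sequence above relies on the XOR property that new parity can be computed from old parity, old data, and new data alone, without reading the remaining stripe members. A minimal illustration (byte strings stand in for tracks; this is not Enginuity code):

```python
# Parity update for a RAID 5 random (small) write:
# new_parity = old_parity XOR old_data XOR new_data.
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def update_parity(old_parity, old_data, new_data):
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)

# Parity of a 3+1 group is the XOR of its three data members.
d1, d2, d3 = b"\x0f" * 4, b"\xf0" * 4, b"\x33" * 4
parity = xor_bytes(xor_bytes(d1, d2), d3)

# Overwrite the second member and update parity via read-modify-write.
new_d2 = b"\xaa" * 4
parity = update_parity(parity, d2, new_d2)

# The updated parity equals a freshly computed full-stripe parity.
assert parity == xor_bytes(xor_bytes(d1, new_d2), d3)
print("parity update consistent")
```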

It is also important to note some of the optimizations implemented within Enginuity for large sequential batch update (write) operations. As previously discussed, when random write operations are processed, a background read operation may be required to regenerate the parity information. When large sequential writes are generated by the host application, however, Enginuity can calculate parity information directly from the data written, and then write the new parity when the data is destaged to disk. In this way the write penalty is removed. The write operation must cover at least a complete RAID 5 stripe. Since each stripe member is 4 tracks (128 KB in DMX-2 and 256 KB in DMX-3), in a RAID 5 (3+1) environment the write must be 3 x 128 KB (384 KB) in DMX-2, or 3 x 256 KB (768 KB) in DMX-3. For RAID 5 (7+1), the write must be 7 x 128 KB (896 KB) in DMX-2, or 7 x 256 KB (1792 KB) in DMX-3. Thus, large sequential write operations, which may be typical of large batch updates, may benefit from this optimization. Figure 69 shows an example of this optimization.

RAID considerations 313
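The full-stripe thresholds follow directly from track size x stripe depth x number of data members. A small sketch of the arithmetic, using the track sizes quoted in the text:

```python
TRACK_KB = {"DMX-2": 32, "DMX-3": 64}   # track size per model, per the text
STRIPE_TRACKS = 4                        # each stripe member is 4 tracks

def full_stripe_kb(model: str, data_members: int) -> int:
    """Minimum sequential write (KB) that fills a full RAID 5 stripe,
    allowing parity to be computed in cache with no read penalty."""
    return TRACK_KB[model] * STRIPE_TRACKS * data_members

assert full_stripe_kb("DMX-2", 3) == 384    # RAID 5 (3+1) on DMX-2
assert full_stripe_kb("DMX-2", 7) == 896    # RAID 5 (7+1) on DMX-2
assert full_stripe_kb("DMX-3", 3) == 768    # RAID 5 (3+1) on DMX-3
assert full_stripe_kb("DMX-3", 7) == 1792   # RAID 5 (7+1) on DMX-3
```

Writes below these sizes fall back to the read-modify-write parity path described earlier.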

Figure 69 Optimizing performance with RAID 5 sequential writes

Determining the appropriate level of RAID to configure in an environment depends on the availability and performance requirements of the applications that will use the Symmetrix DMX. Combinations of RAID types are configurable in the Symmetrix DMX with some exceptions. For example, storage may be configured as a combination of RAID 1 and 3+1 RAID 5 devices. Combinations of 3+1 and 7+1 parity or RAID 5 are currently not allowed in the same Symmetrix DMX. Likewise, mixing any types of RAID 5 and parity RAID in the same frame is not allowed.

Until recently, RAID 1 was the predominant choice for RAID protection in Symmetrix storage environments. RAID 1 provides maximum availability and enhanced performance over other available RAID protections. In addition, performance optimizations such as Symmetrix Optimizer, which reduces contention on the physical spindles by nondisruptively migrating hypervolumes, and Dynamic Mirror Service Policy (DMSP), which improves read performance by optimizing reads across both mirrors, were only available with mirrored volumes, not with parity RAID devices. While mirrored storage is still the recommended choice for RAID configurations in the Symmetrix DMX, the relatively recent addition of RAID 5 storage protection provides customers with a reliable, economical alternative for their production storage needs.

RAID 5 storage protection became available with the 5670+ release of the Enginuity operating environment. RAID 5 storage protection provides economic advantages over using RAID 1, while providing high availability and performance. RAID 5 implements the standard data striping and rotating parity across all members of the RAID group (either 3+1 or 7+1). Additionally, Symmetrix Optimizer functionality is available with RAID 5 in order to reduce spindle contention. RAID 5 provides customers with a flexible data protection option for dealing with varying workloads and service-level requirements. With the advent of RAID 5 protection, using parity RAID in Symmetrix DMX systems is not recommended.

RAID recommendations

Oracle has long recommended RAID 1 over RAID 5 for database implementations. This was largely attributed to RAID 5's historical poor performance versus RAID 1 (due to software implemented RAID schemes) and also due to high disk drive failure rates that caused RAID 5 performance degradation after failures and during rebuilds. However, disk drives and RAID 5 in general have seen significant optimizations and improvements since Oracle initially recommended avoiding RAID 5. In the Symmetrix DMX, Oracle databases can be deployed on RAID 5 protected disks for all but the highest I/O performance intensive applications. Databases used for test, development, QA, or reporting are likely candidates for using RAID 5 protected volumes.

Another potential candidate for deployment on RAID 5 storage is DSS applications. In many DSS environments, read performance greatly outweighs the need for rapid writes. This is because data warehouses typically perform loads off-hours or infrequently (once a week or month); read performance in the form of database user queries is significantly more important. Since there is no RAID penalty for RAID 5 read performance, only write performance, these types of applications are generally good candidates for RAID 5 storage deployments. Conversely, production OLTP applications typically require small random writes to the database, and as such, are generally more suited to RAID 1 storage.


An important consideration when deploying RAID 5 is disk failures. When disks containing RAID 5 members fail, two primary issues arise: performance and data availability. Performance is affected while the RAID group operates in degraded mode, as the missing data must be reconstructed using parity and data information from the other members of the RAID group. Performance is also affected when the disk rebuild process is initiated after the failed drive is replaced or a hot spare is activated. Potential data loss is the other important consideration when using RAID 5. Multiple drive failures that cause the loss of multiple members of a single RAID group result in loss of data. While the probability of such an event is small, the exposure in a 7+1 RAID 5 environment is much higher than that for RAID 1. As such, the probability of data loss due to the loss of multiple members of a RAID 5 group should be carefully weighed against the benefits of using RAID 5.
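The difference in exposure can be made concrete with a simple count of "fatal" second failures: after one drive in a protection group has failed, data is lost only if the next failure hits a surviving member of the same group. This back-of-the-envelope sketch deliberately ignores rebuild windows and hot sparing, so it is a simplification, not a failure-rate model:

```python
def fatal_second_failures(group_size: int) -> int:
    """After one member of a protection group fails, count how many of the
    surviving members would cause data loss if they failed next.

    RAID 1 is a group of 2 (only the mirror partner is fatal);
    RAID 5 loses data if any surviving member of the group fails.
    """
    return group_size - 1

raid1 = fatal_second_failures(2)      # mirrored pair
raid5_3p1 = fatal_second_failures(4)  # 3+1 group
raid5_7p1 = fatal_second_failures(8)  # 7+1 group

assert (raid1, raid5_3p1, raid5_7p1) == (1, 3, 7)
# A 7+1 group has seven times as many fatal second failures as a mirrored pair.
```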

The bottom line in choosing a RAID type is ensuring that the configuration meets the needs of the customer's environment. Considerations include the following:

◆ Read and write performance

◆ Balancing the I/O across the spindles and back-end of the Symmetrix system

◆ Tolerance for reduced application performance when a drive fails

◆ The consequences of losing data in the event of multiple disk failures

In general, EMC recommends RAID 1 for all types of customer data including Oracle databases. However, RAID 5 configurations may be beneficial for many applications and should be considered.

Symmetrix metavolumes

Individual Symmetrix hypervolumes of the same RAID type (RAID 1, RAID 5) may be combined together to form a virtualized device called a Symmetrix metavolume. Metavolumes are created for a number of reasons including:

◆ A desire to create devices that are greater than the largest hypervolume available (in 5670 and 5671 Enginuity operating environments, this is currently just under 31 GB per hypervolume).


◆ To reduce the number of volumes presented down a front-end director or to an HBA. A metavolume presented to an HBA only counts as a single LUN even though the device may consist of a large number of individual hypers.

◆ To increase performance of a LUN by spreading I/O across more physical spindles.

There are two types of metavolumes: concatenated and striped. With concatenated metavolumes, the individual hypers are combined to form a single volume such that data is written to the first hypervolume sequentially before moving to the next. Writes to the metavolume start with the metahead and proceed on that hypervolume until it is full, and then move on to the next hypervolume. Striped metavolumes, on the other hand, write data across all members of the device. The stripe size is set at two cylinders, or 960 KB.
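The difference between the two layouts can be expressed as an address-mapping function. A sketch under assumed values (equal-size members, and the 960 KB stripe unit described above):

```python
STRIPE_KB = 960  # metavolume stripe unit (two cylinders on DMX-2)

def concat_member(offset_kb: int, member_kb: int, members: int) -> int:
    """Concatenated: fill hyper 0 end-to-end, then hyper 1, and so on."""
    return min(offset_kb // member_kb, members - 1)

def striped_member(offset_kb: int, members: int) -> int:
    """Striped: rotate across members every STRIPE_KB of logical address."""
    return (offset_kb // STRIPE_KB) % members

# 4-member metavolume of 1 GB hypers: sequential I/O at 0, 960, 1920, 2880 KB
# stays on one hyper when concatenated, but rotates when striped.
assert [concat_member(o, 1_048_576, 4) for o in (0, 960, 1920, 2880)] == [0, 0, 0, 0]
assert [striped_member(o, 4) for o in (0, 960, 1920, 2880)] == [0, 1, 2, 3]
```

This is why striped metavolumes spread I/O across spindles while concatenated metavolumes concentrate it on the metahead until that hyper fills.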

In most cases, striped metavolumes are recommended over concatenated volumes in Oracle database environments. The exceptions to this general rule occur in certain DSS environments, where metavolumes may obscure the sequential nature of the I/Os from the Enginuity prefetching algorithms, or in cases where RAID 5 metavolumes are created.


Host- versus array-based striping

Another hotly disputed issue when configuring a storage environment for maximum performance is whether to use host-based or array-based striping in Oracle environments. Striping of data across the physical disks is critically important to database performance because it allows the I/O to be distributed across multiple spindles. Although disk drive size and speeds have increased dramatically in recent years, spindle technologies have not kept pace with host CPU and memory improvements. Performance bottlenecks in the disk subsystem can develop if careful attention is not paid to the data storage requirements and configuration. In general, the discussion concerns trade-offs between performance and manageability of the storage components.

Oracle has recommended the SAME (Stripe and Mirror Everything) configuration for many years. However, Oracle has never specified at which layer the striping should occur. The following presents in more depth the trade-offs of host-based and array-based striping.

Host-based striping

Host-based striping is configured through the Logical Volume Manager used on most open-systems hosts. For example, in an HP-UX environment, striping is configured when logical volumes are created in a volume group as shown below:

lvcreate -i 4 -I 64KB -L 1024 -n stripevol activevg

In this case, the striped volume is called stripevol (using the -n flag), is created on the volume group activevg, is of volume size 1 GB (-L 1024), uses a stripe size of 64 KB (-I 64KB), and is striped across four physical volumes (-i 4). The specifics of striping data at the host level are operating-system-dependent.
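The address math implied by that lvcreate example can be sketched as follows, using the values from the flags above (4 physical volumes, 64 KB stripe size); the helper name is illustrative, not an HP-UX interface:

```python
STRIPE_KB, PVS = 64, 4   # -I 64KB and -i 4 from the lvcreate example

def pv_for_offset(offset_kb: int) -> tuple[int, int]:
    """Map a logical-volume offset to (physical volume index, stripe row)."""
    chunk = offset_kb // STRIPE_KB
    return chunk % PVS, chunk // PVS

# A 256 KB write starting at offset 0 touches all four physical volumes.
touched = {pv_for_offset(off)[0] for off in range(0, 256, STRIPE_KB)}
assert touched == {0, 1, 2, 3}
```

The LVM performs this mapping on every I/O, which is where the host CPU cost of host-based striping comes from.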

Two important things to consider when creating host-based striping are the number of disks to configure in a stripe set and an appropriate stripe size. While no definitive answer can be given that optimizes these settings for any given configuration, the following are general guidelines to use when creating host-based stripes:


◆ Ensure that the stripe size used is a power-of-two multiple of the track size configured on the Symmetrix DMX (that is, a multiple of 32 KB on DMX-2 and 64 KB on DMX-3) and aligns with the database block size and host I/O size. Alignment of database blocks, Symmetrix tracks, host I/O size, and the stripe size can have considerable impact on database performance. Typical stripe sizes are 64 KB to 256 KB, although the stripe size can be as high as 512 KB or even 1 MB.

◆ Multiples of 4 physical devices for the stripe width are generally recommended, although this may be increased to 8 or 16 as required for LUN presentation or SAN configuration restrictions as needed. Care should be taken with RAID 5 metavolumes to ensure that members do not end up on the same physical spindles (a phenomenon known as vertical striping), as this may affect performance. In general, RAID 5 metavolumes are not recommended.

◆ When configuring in an SRDF environment, smaller stripe sizes (32 KB for example), particularly for the redo logs, are recommended. This is to enhance performance in synchronous SRDF environments due to the limit of having only one outstanding I/O per hypervolume on the link.

◆ Data alignment (along block boundaries) can play a significant role in performance, particularly in Windows environments. Refer to operating-system-specific documentation to learn how to align data blocks from the host along Symmetrix DMX track boundaries.

◆ Ensure that volumes used in the same stripe set are located on different physical spindles. Using volumes from the same physicals reduces the performance benefits of using striping. An exception to this rule is when RAID 5 devices are used in DSS environments.
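The first guideline above can be checked mechanically. A small helper, assuming the track sizes quoted in the text, that validates a candidate stripe size:

```python
TRACK_KB = {"DMX-2": 32, "DMX-3": 64}  # track sizes per the text

def stripe_size_ok(stripe_kb: int, model: str) -> bool:
    """True if the stripe size is a power-of-two multiple of the track size."""
    track = TRACK_KB[model]
    if stripe_kb % track:
        return False
    multiple = stripe_kb // track
    return multiple > 0 and (multiple & (multiple - 1)) == 0  # power of two

assert stripe_size_ok(64, "DMX-2") and stripe_size_ok(256, "DMX-2")
assert stripe_size_ok(128, "DMX-3") and not stripe_size_ok(96, "DMX-3")
```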

Symmetrix-based striping (metavolumes)

An alternative to using host-based striping is to stripe at the Symmetrix DMX level. Striping in the Symmetrix array is accomplished through the use of striped metavolumes, as discussed in the previous section. Individual hypervolumes are selected and striped together, forming a single LUN presented through the front-end director to the host. At the Symmetrix level, all writes to this single LUN are striped. Currently, the only stripe size available for a metavolume is 960 KB. It is possible to create metavolumes with up to 255 hypervolumes, although in practice metavolumes are usually created with 4 to 16 devices.

Host- versus array-based striping 319

Striping recommendations

Determining the appropriate striping method depends on many factors. In general, striping is a tradeoff between manageability and performance. With host-based striping, CPU cycles are used to manage the stripes; Symmetrix metavolumes require no host cycles to stripe the data. This small performance decrease in host-based striping is offset by the fact that each device in a striped volume group maintains an I/O queue, thereby increasing performance over a Symmetrix metavolume, which only has a single I/O queue on the host.

Recent tests show that striping at the host level provides somewhat better performance than comparable Symmetrix-based striping, and is generally recommended if performance is paramount. Host-based striping is also recommended with environments using synchronous SRDF, since stripe sizes in the host can be tuned to smaller increments than are currently available with Symmetrix metavolumes, thereby increasing performance.

Management considerations generally favor Symmetrix-based metavolumes over host-based stripes. In many environments, customers have achieved high-performance back-end layouts on the Symmetrix system by allocating all of the storage as four-way striped metavolumes. The advantage of this is any volume selected for host data is always striped, with reduced chances for contention on any given physical spindle. Additional storage requirements for any host volume group, since additional storage is configured as a metavolume, also are striped. Management of added storage to an existing volume group using host-based striping may be significantly more difficult, requiring in some cases a full backup, reconfiguration of the volume group, and restore of the data to successfully expand the stripe.

An alternative in Oracle environments gaining popularity recently is the combined use of both host-based and array-based striping. Known as double striping or a plaid, this configuration utilizes striped metavolumes in the Symmetrix array, which are then presented to a volume group and striped at the host level. This has many advantages in database environments where read access is small and highly random in nature. Since I/O patterns are pseudo random, access to data is spread across a large quantity of physical spindles, thereby decreasing the probability of contention on any given disk. Double striping, in some cases, can interfere with data prefetching at the Symmetrix DMX level when large, sequential data reads are predominant. This configuration may be inappropriate for DSS workloads.

Another method of double striping the data is through the use of Symmetrix metavolumes and RAID 5. A RAID 5 hypervolume stripes data across either four or eight physical disks using a stripe size of four tracks (128 KB for DMX-2 or 256 KB for DMX-3). Striped metavolumes stripe data across two or more hypers using a stripe size of two cylinders (960 KB in DMX-2 or 1920 KB in DMX-3). When using striped metavolumes with RAID 5 devices, ensure that members do not end up on the same physical spindles, as this will adversely affect performance. In many cases however, double striping using this method also may affect prefetching for long, sequential reads. As such, using striped metavolumes is generally not recommended in DSS environments. Instead, if metavolumes are needed for LUN presentation reasons, concatenated metavolumes on the same physical spindles are recommended.

The decision of whether to use host-based, array-based, or double striping in a storage environment has elicited considerable fervor on all sides of the argument. While each configuration has positive and negative factors, the important thing is to ensure that some form of striping is used for the storage layout. The appropriate layer for disk striping can have a significant impact on the overall performance and manageability of the database system. Deciding which form of striping to use depends on the specific nature and requirements of the database environment in which it is configured.

With the advent of RAID 5 data protection in the Symmetrix DMX, an additional option of triple striping data using RAID 5, host-based striping, and metavolumes combined is now available. However, triple striping increases data layout complexity, and in testing has shown no performance benefits over other forms of striping. In fact, it is shown to be detrimental to performance and as such, is not recommended in any Symmetrix DMX configuration.


Data placement considerations

Placement of the data on the physical spindles can potentially have a significant impact on Oracle database performance. Placement factors that affect database performance include:

◆ Hypervolume selection for specific database files on the physical spindles themselves.

◆ The spread of database files across the spindles to minimize contention.

◆ The placement of high I/O devices contiguously on the spindles to minimize head movement (seek time).

◆ The spread of files across the spindles and back-end directors to reduce component bottlenecks.

Each of these factors is discussed next.

Disk performance considerations

As shown in Figure 70 on page 324, there are five main considerations for spindle performance:

◆ Actuator Positioning (Seek Time) - The time it takes the actuating mechanism to move the heads from their present position to a new position. This delay averages a few milliseconds in length and depends on the type of drive. For example, a 15k drive has an average seek time of approximately 3.5 ms for reads and 4 ms for writes, with a full disk seek of 7.4 ms for reads and 7.9 ms for writes.

Note: Disk drive characteristics can be found at www.seagate.com.

◆ Rotational Speed - This delay is due to the need for the platter to rotate underneath the head to correctly position the data to be accessed. Rotational speeds for spindles in the Symmetrix DMX range from 7,200 to 15,000 rpm. The average rotational delay is the time it takes for half of a revolution of the disk. In the case of a 15,000 rpm drive, this is about 2 milliseconds.


◆ Interface Speed - A measure of the transfer rate from the drive into the Symmetrix cache. It is important to ensure that the transfer rate between the drive and cache is greater than the drive's rate to deliver data. Delay caused by this is typically a very small value, on the order of a fraction of a millisecond.

◆ Areal Density - A measure of the number of bits of data that fits on a given surface area on the disk. The greater the density, the more data per second that can be read from the disk as it passes under the disk head.

◆ Cache Capacity and Algorithms - Newer disk drives have improved read and write algorithms, as well as cache, in order to improve the transfer of data in and out of the drive and to make parity calculations for RAID 5.

Delay caused by the movement of the disk head across the platter surface is called seek time. The time associated with a data track rotating to the required location under the disk head is referred to as rotational delay. The cache capacity on the drive, disk algorithms, interface speed, and the areal density (or zoned bit recording) combine to produce a disk transfer time. Therefore, the time taken to complete an I/O (or disk latency) consists of three elements: seek time, rotational delay, and transfer time.

Data transfer times are typically on the order of fractions of a millisecond and as such, rotational delays and delays due to repositioning the actuator heads are the primary sources of latency on a physical spindle. Additionally, rotational speeds of disk drives have increased from top speeds of 7,200 rpm up to 15,000 rpm, but still average on the order of a few milliseconds. The seek time continues to be the largest source of latency in disk assemblies when using the entire disk.
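The three latency components can be combined into a simple service-time model. A sketch using the representative 15k-drive figures quoted above; the transfer-rate value is an assumed illustration, not a vendor specification:

```python
def io_latency_ms(seek_ms: float, rpm: int, transfer_kb: float, mb_per_s: float) -> float:
    """Disk I/O latency = seek time + average rotational delay + transfer time."""
    rotational_ms = (60_000 / rpm) / 2            # half a revolution, on average
    transfer_ms = transfer_kb / mb_per_s / 1.024  # KB moved at MB/s, in ms
    return seek_ms + rotational_ms + transfer_ms

# 15,000 rpm drive, 3.5 ms average read seek, 32 KB transfer at an assumed 80 MB/s.
latency = io_latency_ms(3.5, 15_000, 32, 80.0)
assert 5.0 < latency < 6.5   # seek and rotation dominate; transfer is under 0.5 ms
```

The numbers bear out the text: transfer time is a fraction of a millisecond, so seek and rotational delay are the terms worth optimizing through data placement.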

Transfer delays are lengthened in the inner parts of the drive; more data can be read per second from the outer parts of the drive than from data located on the inner regions. Therefore, performance is significantly improved on the outer parts of the disk. Performance improvements of more than 50 percent can sometimes be realized on the outer cylinders of a physical spindle. This performance differential typically leads customers to place high I/O objects on the outer portions of the drive.

While placing high I/O objects such as redo logs on the outer edges of the spindles has merit, performance differences across the drives inside the Symmetrix DMX are significantly smaller than the stand-alone disk characteristics would attest. Enginuity operating environment algorithms, particularly the algorithms that optimize ordering of I/O as the disk heads scan across the disk, greatly reduce differences in hypervolume performance across the drive. Although this smoothing of disk latency may actually increase the delay of a particular I/O, overall performance characteristics of I/Os to hypervolumes across the face of the spindle will be more uniform.

Data placement considerations 323

Figure 70 Disk performance factors

Hypervolume contention

Disk drives can receive only a limited number of read or write I/Os before performance degradation occurs. While disk improvements and cache, both on the physical drives and in disk arrays, have improved disk read and write performance, the physical devices can still become a critical bottleneck in Oracle database environments. Eliminating contention on the physical spindles is a key factor in ensuring maximum Oracle performance on Symmetrix DMX arrays.

Contention can occur on a physical spindle when I/O (read or write) to one or more hypervolumes exceeds the I/O capacity of the disk. While contention on a physical spindle is undesirable, this type of contention can be rectified by migrating high I/O data onto other devices with lower utilization. This is accomplished using a number of methods, depending on the type of contention that is found. For example, when two or more hypervolumes on the same physical spindle have excessive I/O, contention may be eliminated by migrating one of the hypervolumes to another, lower-utilized physical spindle. This is done through processes such as LVM mirroring at the host level or by using tools such as EMC Symmetrix Optimizer to nondisruptively migrate data from impacted devices. One method of reducing hypervolume contention is careful layout of the data across the physical spindles on the back end of the Symmetrix system. Another method of reducing contention is to use striping, either at the host level or inside the Symmetrix system.

Hypervolume contention can be found in several ways. Oracle-specific data collection and analysis tools such as the Automatic Workload Repository (AWR), the Automatic Database Diagnostic Monitor (ADDM), and Statspack can help to identify areas of reduced I/O performance in the database data files. Additionally, EMC tools such as Performance Manager (formerly WorkLoad Analyzer) can help to identify performance bottlenecks in the Symmetrix DMX array. Establishing baselines of the system and proactive monitoring are essential in helping to maintain an efficient, high-performance database.

Commonly, tuning database performance on the Symmetrix system is performed post-implementation. This is unfortunate because with a small amount of up-front effort and detailed planning, significant I/O contention issues could be minimized or eliminated in a new implementation. While detailed I/O patterns of a database environment are not always well known, particularly in the case of a new system implementation, careful layout consideration of a database on the back end of the Symmetrix system can save time and future effort in trying to identify and eliminate I/O contention on the disk drives.

Maximizing data spread across the back end

A long-standing data layout recommendation at EMC is "Go wide before going deep." This means that data placement on the Symmetrix DMX should be spread across the back-end directors and physical spindles before locating data on the same physical drives. By spreading the I/O across the Symmetrix back end, I/O bottlenecks in any one array component can be minimized or eliminated.


Given recent improvements in the Symmetrix DMX component technologies, such as CPU performance on the directors and the Direct Matrix architecture, the most common bottleneck in new implementations is with contention on the physical spindles and the back-end directors. To reduce these contention issues, examine the I/O requirements for each application that will use the Symmetrix storage. From this analysis, create a detailed layout that balances the anticipated I/O requirements across both back-end directors and physical spindles.

Before data is laid out on the DMX back end, it is helpful to understand the I/O requirements for each of the file systems or volumes being laid out. Several methods to optimize layout on the back-end directors and spindles are available. One time-consuming method involves creating a map of the hypervolumes on physical storage, including hypervolume presentation by director and physical spindle, based on information available in EMC ControlCenter. This involves documenting the environment using a tool such as Excel, with each hypervolume marked on its physical spindle and disk director. Using this map of the back end and volume information for the database elements, preferably categorized by I/O requirement (high/medium/low, or by anticipated reads and writes), the physical data elements and I/Os can be evenly spread across the directors and physical spindles.

This type of layout can be extremely complex and time-consuming. Additional complexity is added when RAID 5 hypers are added to the configuration. Since each hypervolume is placed on either four or eight physical volumes in RAID 5 environments, trying to uniquely map out each datafile or database element is beyond what most customers feel provides value. In these cases, one alternative is to rank each of the database elements or volumes in terms of anticipated I/O. Once ranked, each element may be assigned a hypervolume in order on the back end. Since BIN file creation tools almost always spread contiguous hypervolume numbers across different elements of the back end, this method of assigning the ranked database elements usually provides a reasonable spread of I/O across the spindles and back-end directors in the Symmetrix DMX. In combination with Symmetrix Optimizer, this method of spreading the I/O is normally effective in maximizing the spread of I/O across DMX components.
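The ranking approach described above amounts to a round-robin assignment: sort the database elements by anticipated I/O, then hand out consecutive hypervolume numbers, relying on the BIN file layout to scatter consecutive hypers across directors and spindles. A toy sketch in which all element names and I/O weights are hypothetical:

```python
# Hypothetical database elements with anticipated relative I/O load.
elements = {"redo1": 90, "redo2": 88, "data01": 70, "index01": 55,
            "temp01": 30, "archive01": 10}

def assign_hypers(elements: dict, hyper_ids: list) -> dict:
    """Rank elements by anticipated I/O and assign hypervolume numbers in order.

    Consecutive hyper numbers are assumed (per the text) to land on
    different back-end directors and spindles in the BIN file layout.
    """
    ranked = sorted(elements, key=elements.get, reverse=True)
    return {name: hyper for name, hyper in zip(ranked, hyper_ids)}

layout = assign_hypers(elements, hyper_ids=[0x10, 0x11, 0x12, 0x13, 0x14, 0x15])
assert layout["redo1"] == 0x10 and layout["archive01"] == 0x15
```

The heaviest elements receive the earliest hyper numbers, so the busiest I/O streams end up spread across different back-end components.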


Minimizing disk head movement

Perhaps the key performance consideration a customer can control when laying out a database on the Symmetrix DMX is minimizing head movement on the physical spindles. Positioning high I/O hypervolumes contiguously on the physical spindles can minimize head movement. Disk latency caused by interface or rotational speeds cannot be controlled by layout considerations. The only disk drive performance factors that can be controlled are the placement of data onto specific, higher-performing areas of the drive (discussed in a previous section) and the reduction of actuator movement, achieved by placing high I/O objects in adjacent hypervolumes on the physical spindles.

One method, described in the previous section, is to rank volumes by anticipated I/O requirements. Using a documented "map" of the back-end spindles, high-I/O objects can be placed on the physical spindles, grouping the highest-I/O objects together. Recommendations differ as to whether it is optimal to place the highest I/O objects together on the outer parts of the spindle (the highest-performing parts of a physical spindle) or in the center of the spindle. Since there is no definitive answer to this question, the historical recommendation of putting high-I/O objects together on the outer part of the spindle is still a reasonable suggestion. Placing these high-I/O objects together on the outer parts of the spindle should help reduce disk actuator movement when doing reads and writes to each hypervolume on the spindle, thereby improving a controllable parameter in any data layout exercise.


Other layout considerations

In addition to the layout considerations described in previous sections, a few additional factors may be important to DBAs or storage administrators seeking to optimize database performance. Additional configuration factors to consider include the following:

◆ Implementing SRDF/S for the database

◆ Creating database clones using TimeFinder/Mirror or TimeFinder/Clone

◆ Creating database clones using TimeFinder/Snap

These additional layout considerations are discussed in the following sections.

Database layout considerations with SRDF/S

Two primary concerns must be considered when SRDF/S is implemented:

◆ Inherent latency is added for each write to the database. Latency occurs because each write must be first written to both the local and remote Symmetrix caches before the write can be acknowledged to the host. This latency must always be considered as a part of any SRDF/S implementation. As the speed of light cannot be circumvented, there is little to be done to mitigate this latency.

◆ Each hypervolume configured in the Symmetrix is allowed only a single outstanding I/O across the SRDF link at a time. Performance degrades when multiple writes target a single hypervolume, since subsequent writes must wait for their predecessors to complete. Striping at the host level is particularly helpful in these situations. Using a smaller stripe size (32 KB to 128 KB) ensures that larger writes are spread across multiple hypervolumes, reducing the chance that SRDF serializes writes across the link.
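The effect of stripe size on write spreading can be sketched as follows. The function and figures are illustrative assumptions for this discussion, not documented EMC behavior:

```python
# Hypothetical sketch: how host-level stripe size affects the number of
# hypervolumes (and hence concurrent SRDF/S transfers) one large write touches.
import math

def hypervolumes_touched(write_bytes: int, stripe_bytes: int, stripe_width: int) -> int:
    """Distinct hypervolumes a contiguous write lands on, assuming the
    write starts on a stripe boundary (illustrative model only)."""
    chunks = math.ceil(write_bytes / stripe_bytes)
    return min(chunks, stripe_width)

KB = 1024
# A 1 MB write over an 8-way stripe set:
print(hypervolumes_touched(1024 * KB, 64 * KB, 8))    # small stripe spreads the write wide
print(hypervolumes_touched(1024 * KB, 1024 * KB, 8))  # large stripe leaves it on one volume
```

With a 64 KB stripe the write is spread over all eight hypervolumes and can move as eight concurrent SRDF transfers; with a 1 MB stripe the same write is serialized behind a single hypervolume.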

Database cloning, TimeFinder, and sharing spindles

Database cloning is useful when DBAs want to create backup or other business continuance images of a database. A common question when laying out a database is whether BCVs should share the same physical spindles as the production volumes or should be isolated on separate physical disks. There are pros and cons to each approach; the optimal solution generally depends on the anticipated workload.

The primary benefit of spreading BCVs across all physical spindles is performance. Spreading I/Os across more spindles reduces the risk of bottlenecks on the physical disks. Workloads that use BCVs, such as backups and reporting databases, may generate high I/O rates. Spreading this workload across more physical spindles may significantly improve performance in these environments.

The main drawbacks to spreading BCVs across all spindles in the Symmetrix system are:

◆ Resynchronization may cause contention on shared spindles.

◆ BCV workloads may negatively impact production database performance.

When resynchronizing the BCVs, data is read from the production hypers and copied into cache, and from there destaged to the BCVs. When production volumes and BCVs share physical disks, synchronization rates can be greatly reduced by increased seek times, as the drive alternates between reading from one part of the disk and writing to another. The other drawback to sharing physical disks is the increased workload on the spindles, which may impact performance on the production volumes: sharing spindles increases the chance of contention and thus of decreased database performance.

Determining the appropriate location for BCVs (either sharing the same physical spindles or isolated on their own disks) depends on customer preference and workload. In general, BCVs should share the same physical spindles. However, in cases where the BCV synchronization and utilization may negatively impact applications (for example, databases that run 24x7 with high I/O requirements), it may be beneficial for the BCVs to be isolated on their own physical disks.

Database clones using TimeFinder/Snap

TimeFinder/Snap provides many of the benefits of full-volume replication techniques such as TimeFinder/Mirror or TimeFinder/Clone, but at greatly reduced cost. However, there are two performance considerations when using TimeFinder/Snap to make database clones for backups or other business continuance functions: the Copy on First Write (COFW) and Copy on Access (COA) penalties. The first affects production volume performance, while the second affects access to the snap volumes.

COFW results from the need to copy data from the production hypers to the save area as writes come in; it affects the production devices. TimeFinder/Snap uses virtual devices that contain pointers to where valid data for the snap device is located. When the snap is first created, all of the pointers are directed at the production hypervolume. As changes are made to the production volume, the original data must be saved to a save device before the new write can be processed, along with an update to the snap pointer. This manifests as a small write-performance hit on the production volumes. Although generally small, this COFW penalty should be considered whenever a snap device is created against a production volume.
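The pointer mechanics described above can be sketched in a few lines of Python. This is a conceptual model only; names such as `SnapDevice` are invented for illustration and do not reflect EMC's implementation:

```python
# Conceptual model of Copy on First Write for a pointer-based snap.
# All names are illustrative; this is not EMC's implementation.

class SnapDevice:
    def __init__(self, production: dict):
        self.production = production   # track number -> data
        self.save_area = {}            # originals preserved on first write

    def write_production(self, track: int, data: str):
        # First write to a track since the snap: copy the original to the
        # save area (the COFW penalty), then apply the new write.
        if track not in self.save_area:
            self.save_area[track] = self.production[track]
        self.production[track] = data

    def read_snap(self, track: int) -> str:
        # Pointer lookup: saved original if the track has changed,
        # otherwise read through to the production volume.
        return self.save_area.get(track, self.production[track])

prod = {0: "a0", 1: "b0"}
snap = SnapDevice(prod)
snap.write_production(0, "a1")
print(snap.read_snap(0), prod[0])  # snap still sees "a0"; production holds "a1"
```

Only the first write to a track pays the copy cost; subsequent writes to the same track find the original already preserved.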

The COA penalty affects both production and snap volume performance. It has two causes: the need to determine where the snap data resides on disk, and the workload on the production volumes. The virtual device contains pointers that indicate whether a requested track is located on the production volume or on a save device. This in-memory lookup before reading the appropriate disk track adds processing cycles before a read request is returned to the host.

In addition, the COA penalty depends on the overall load and activity in the Symmetrix system. In highly utilized systems, the penalty can increase dramatically due to contention for physical resources, and using TimeFinder/Snap can result in unsatisfactory performance for all applications resident on the array.


Oracle database-specific configuration settings

Oracle provides significant flexibility when configuring a database through its initialization parameters. Although a broad range of parameters can be adjusted, relatively few have a significant impact on Oracle performance from a storage perspective.

Table 12 describes initialization parameters that can be tuned to improve I/O performance from a Symmetrix DMX storage array.

Table 12 Initialization parameters

Parameter Description

DB_BLOCK_BUFFERS

Specifies the number of data "pages" available in host memory for data pulled from disk. Typically, the more block buffers available in memory, the better the potential performance of the database.

DB_BLOCK_SIZE Determines the size of the data pages Oracle stores in memory and on disk. For DSS applications, using larger block sizes such as 16 KB (or 32 KB when available) improves data throughput, while for OLTP applications, a 4 KB or 8 KB block size may be more appropriate.

DB_FILE_MULTIBLOCK_READ_COUNT

Specifies the maximum number of blocks that can be read in a single sequential read I/O. For OLTP environments, this parameter should be set to a low value (4 or 8 for example). For DSS environments where long, sequential data scans are normal, this parameter should be increased to match the maximum host I/O size (or more) to optimize throughput.

DB_WRITER_PROCESSES

Specifies the number of DBWR processes initially started for the database. Increasing the number of DBWR processes can improve writes to disk through multiplexing if multiple CPUs are available in the host.

DBWR_IO_SLAVES Configures multiple I/O server processes for the DBW0 process. This parameter is used only on single-CPU servers where only a single DBWR process is enabled. Configuring I/O slaves can improve write performance to disk by multiplexing the writes.

DISK_ASYNCH_IO Controls whether I/O to Oracle structures such as datafiles, log files, and control files is performed asynchronously. If asynchronous I/O is available on the host platform, enabling it for the datafiles has a positive effect on I/O performance.

FAST_START_MTTR_TARGET

Specifies the desired number of seconds needed to crash-recover the database in the event of a failure. If used, setting this to low values is detrimental to performance because frequent checkpoints are needed to guarantee a quick restart of the database.


LOG_BUFFER Specifies the size of the redo log buffer. Increasing the size of this buffer can decrease the frequency of required writes to disk.

LOG_CHECKPOINT_INTERVAL

Specifies the number of redo log blocks that can be written before a checkpoint must be performed. This affects performance since a checkpoint requires that data be written to disk to ensure consistency. Frequent checkpoints reduce the amount of recovery needed if a crash occurs but can also be detrimental to Oracle performance.

LOG_CHECKPOINT_TIMEOUT

Specifies the number of seconds that can elapse before a checkpoint must be performed. This affects performance since a checkpoint requires that data be written to disk to ensure consistency. Frequent checkpoints reduce the amount of recovery needed if a crash occurs, but also can be detrimental to Oracle performance.

SORT_AREA_SIZE Specifies the maximum amount in memory that Oracle will use to perform sort operations. Increasing this parameter decreases the likelihood that a sort will be performed in a temporary tablespace on disk. However, this also increases the memory requirements on the host.
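Taken together, a few of these parameters might appear in an init.ora file as sketched below. The values are hypothetical starting points for an OLTP-leaning instance, offered for illustration only, not EMC or Oracle recommendations:

```
# Illustrative init.ora fragment; values are example starting points only.
db_block_size = 8192                  # OLTP-appropriate block size
db_file_multiblock_read_count = 8     # low for OLTP; raise for DSS scans
db_writer_processes = 2               # multiple DBWR processes on a multi-CPU host
disk_asynch_io = true                 # async I/O where the platform supports it
log_buffer = 1048576                  # larger redo buffer, fewer forced writes
```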


The database layout process

After discussing the various considerations for optimizing Oracle performance on the Symmetrix DMX, the question arises as to how these recommendations are applied when laying out a new database. There are four general phases to the layout process:

◆ Analysis and planning

◆ Initial database layout

◆ Implementation

◆ Reanalyze and refine

The following sections describe each of these phases and provide three examples of how a database might be laid out: an OLTP-like database, a DSS-like database, and a mixed-workload database.

Database layout process

As discussed previously, there are four primary phases to a database layout process. The following describes the high-level steps involved in an Oracle database layout.

Analysis and planning

The first phase is typically the most time-consuming, as it requires thoroughly analyzing and documenting the anticipated environment. Information concerning the host, storage, connectivity, I/O, availability, growth, and backups must be gathered as part of this phase. Typical questions include:

◆ What is the anticipated size of the database at deployment? After one month? After six months?

◆ What is the host environment needed for the application? Memory? CPUs? Growth?

◆ What level of operating system will be deployed? Which LVM? Raw or "cooked" file systems?

◆ How will data striping be achieved (host-based, storage-based, double striping)?

◆ What RAID will be used for the database environment (RAID 1, RAID 5)?


◆ How many data files for the database will be created? Which have the highest I/O activity?

◆ What are the availability requirements for the database?

◆ Will a cluster be deployed? How many nodes? Single instance? RAC?

◆ How many data paths are required from the host to the storage array? Will multipathing software be used?

◆ How will the host connect to the storage array (direct attach, SAN)?

◆ Which is more important: IOPS or throughput? How much of each are anticipated?

◆ What kind of database is planned (DSS, OLTP, a combination of the two)?

◆ What types of I/Os are anticipated from the database (long sequential reads, small bursts of write activity, a mix of reads and writes)?

◆ How will backups be handled? Will replication (host or storage based) be used?

Answers to these questions determine the configuration and layout of the proposed database. The key to the layout process is a complete understanding of the database characteristics and requirements to be implemented. Of particular importance when planning a database layout are the I/O characteristics of the various database objects. This information, collected and documented, is the key deliverable for the next phase of a database layout on a Symmetrix project.

In some cases, the databases to be deployed already exist in a production environment. In these cases, it is easy to understand the I/O characteristics of the various underlying database structures (tablespaces, data files, tables, and so on). Tools for gathering performance statistics include Oracle Statspack, EMC Performance Manager, host-based utilities such as sar and iostat, and third-party analyzers (such as Quest Central). Performance statistics such as reads and writes are determined for database objects. These statistics are then used to determine the required number of physical spindles, the number of I/O paths between the host and the storage array, and the Symmetrix configuration.


Sometimes the database to be deployed does not yet exist in a production environment, and it is difficult to anticipate its I/O requirements. However, EMC has analyzed empirical data from a wide variety of environments, including Oracle database implementations, and has created workload simulations based on these analyses. Given an anticipated maximum number of I/Os and a workload type (DSS or OLTP), a simulated workload can be run through an EMC utility called SymmMerge to estimate resource utilization on the Symmetrix system. This tool is available only to EMC internal performance specialists (SPEED resources), but using it helps ensure a successful Oracle implementation in a Symmetrix environment when the exact workload requirements are unknown.

Initial database layout

The next step is to create an initial layout of the database on the back-end storage of the Symmetrix. The primary concern is spreading the anticipated workload across the back-end directors and physical spindles inside the Symmetrix array. The first task is to acquire a map of the Symmetrix back end, which shows the layout of the Symmetrix hypervolumes on the physical disks in the array. This information can be acquired from EMC Ionix ControlCenter.

In addition to the back-end layout, the database configuration in terms of tablespaces and datafiles should be planned. This requires determining the number and type of datafiles needed for the implementation. Additionally, estimates of the type, size, and number of read and write requests for each volume should be determined. These estimates form the basis of the data layout process.

Once the Symmetrix back-end layout and the Oracle datafile requirements are determined, the next step is to lay out each of the datafiles across the hypervolumes in the Symmetrix system. Volumes are placed so that the workload is spread across the back-end directors and the physical spindles. Make sure that the number of reads and writes to a physical device does not exceed the maximum I/O rate a spindle can handle. If sufficient diligence is taken to balance I/Os across the drives, reaching the I/O limit on one spindle implies high rates of activity across all drives; in that situation, additional spindles are required to handle the workload.


Simpler alternatives to a detailed database layout exist. One method is to simply rank I/Os (primarily reads since writes are written to cache). In a typical Symmetrix BIN file, consecutive hypervolume numbers are spread across different physical spindles and back-end directors. By ranking volume requirements for the database and assigning them to consecutive hypervolume numbers, a DBA can reasonably spread the database I/O across back-end directors and the physical spindles.
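The consecutive-hypervolume method can be sketched as follows. The object names, I/O figures, and device numbers are invented for illustration, relying on the assumption stated above that consecutive hypervolume numbers fall on different spindles and directors:

```python
# Illustrative sketch: assign the busiest database objects to consecutive
# hypervolume numbers, relying on the BIN file to spread consecutive
# hypers across spindles and back-end directors. All figures are invented.

objects = {            # object -> anticipated reads/sec (hypothetical)
    "DATA01": 450,
    "INDEX01": 700,
    "REDO1": 300,
    "TEMP01": 120,
    "SYSTEM": 60,
}
hypervolumes = ["0A1", "0A2", "0A3", "0A4", "0A5"]  # consecutive hyper numbers

# Rank objects by I/O (descending) and map onto consecutive hypervolumes.
ranked = sorted(objects, key=objects.get, reverse=True)
layout = dict(zip(ranked, hypervolumes))
for obj in ranked:
    print(f"{obj:8s} -> hyper {layout[obj]}")
```

The busiest object lands on the first hypervolume number, the next busiest on the next, and so on, which approximates an even spread of the hot I/O across the back end without a detailed spindle map.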

Implementation

The implementation phase takes the planned database layout from the preceding step and implements it in the customer's environment. The host is presented with the documented storage, volume groups and file systems are created as required, and database elements are initialized as planned. This phase is normally short and relatively straightforward if the prior steps were performed and documented well.

Reanalyze and refine

After the database is put into production, it is important to establish performance baselines and continue to monitor the system environment. Important tools for this include Oracle Statspack and EMC Performance Manager. Baselines taken after deployment characterize the database performance of the system. Thereafter, ongoing monitoring and comparison against these baselines helps determine whether performance degrades over time and where the degradation occurs. Degradation can result from growth in the system or from changing requirements inside the database.

If degradation is detected, there are several ways to deal with it. If the source of poor performance is contention on the physical spindle (for example, multiple active hypervolumes on the same physical spindle contending for I/O), then workload on the drive must be reduced. There are several ways to do this. Striping data is effective at eliminating contention on a spindle. Another commonly used method to eliminate contention is to migrate one of the active hypervolumes on the drive to a new location. This can be done through host-based mechanisms such as copying the associated data files to new hypervolumes. However, this can cause service disruptions to the data being migrated.


An alternative to host-based migrations is to use EMC Symmetrix Optimizer. Symmetrix Optimizer proactively analyzes the Symmetrix for back-end performance issues including director and disk contention. It then determines the best way to optimize disk I/O. It detects hypervolume contention on a physical spindle and migrates the volume to a new location without disruption to active systems.

Another source of degradation occurs when activity to a single hypervolume exceeds hypervolume (write pending) or spindle (read and write) limits. In such cases, moving the hypervolume to another physical spindle will not solve the problem; the only remedy is to spread the contents of the hypervolume across multiple volumes. If multiple datafiles are located on the hyper, migrating one or more to alternate locations may fix the problem. More difficult (and more common) is when a single datafile is responsible for the activity; the only way to address this is to spread its data across multiple hypervolumes. This can be done through striping in the Symmetrix system with metavolumes, at the host through host-based striping, or in the database by using data partitioning.
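The database-level option, data partitioning, can be illustrated with a hedged Oracle SQL sketch. The table, tablespace names, and range boundaries are invented for illustration; the assumption is that each tablespace's datafiles sit on different hypervolumes:

```sql
-- Illustrative only: spread one hot table across tablespaces whose
-- datafiles reside on different hypervolumes. Names and ranges are invented.
CREATE TABLE orders (
    order_id    NUMBER,
    order_date  DATE,
    amount      NUMBER
)
PARTITION BY RANGE (order_date) (
    PARTITION p_q1 VALUES LESS THAN (TO_DATE('2008-04-01','YYYY-MM-DD'))
        TABLESPACE ts_hyper1,
    PARTITION p_q2 VALUES LESS THAN (TO_DATE('2008-07-01','YYYY-MM-DD'))
        TABLESPACE ts_hyper2,
    PARTITION p_max VALUES LESS THAN (MAXVALUE)
        TABLESPACE ts_hyper3
);
```

Each partition's I/O then lands on a different hypervolume, spreading what was a single-datafile hot spot across the back end.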


Chapter 8 Data Protection

This chapter describes data protection methods using EMC Double Checksum to minimize the impact of I/O errors on database consistency during I/O transfers between hosts and Symmetrix storage devices. Topics include:

◆ EMC Double Checksum overview
◆ Implementing EMC Double Checksum for Oracle
◆ Implementing Generic SafeWrite for generic applications
◆ Syntax and examples


EMC Double Checksum overviewThe EMC Double Checksum feature provides a method to help minimize the impact of I/O errors on database consistency during I/O transfers between hosts and Symmetrix storage devices.

For Oracle, EMC Double Checksum for Oracle contains a rich set of checks that can be performed natively by the Symmetrix array. For each Relational Database Management System (RDBMS) write in the Symmetrix array, checksum values are computed and compared to test the data for any corruption picked up along the way from the host. Although errors of this kind are infrequent, they can have a considerable effect on data availability and recovery. “Implementing EMC Double Checksum for Oracle” on page 342 provides details on this feature.

For generic RDBMS applications, EMC Double Checksum for Generic Applications provides the Generic SafeWrite feature to help protect critical applications from incurring an incomplete write, and subsequent torn page, due to a failure with a component connected to the Symmetrix Front End Channel Adapter. Generic SafeWrite is most often used to protect against corruption due to HBA and link failures, including server crashes; essentially, it helps protect against fractured writes that can occur before the data reaches the Symmetrix array. “Implementing Generic SafeWrite for generic applications” on page 346 provides details on this feature.

This chapter contains overview and concept information. The EMC Solutions Enabler Symmetrix CLI Command Reference contains a complete list of syntax and options.

Traditional methods of preventing data corruption

Data corruption checking is an integral part of most RDBMS products. For instance, Oracle computes a checksum that verifies the data within each page. If corruption occurs, the checksum will be incorrect when the data is read. However, this checking only takes place within the host system - not the storage array.

As a result, if there is corruption after the data leaves the host system, it will not be detected until that data is read back into the system, which can be some time (possibly months) later. The RDBMS will issue an alert, and then the data must be rebuilt from backups and database logs. While a corruption remains undetected, the number of database logs required for recovery increases, making the data recovery process more complex and time-consuming.

Data corruption between host and conventional storage

Although data appears to the host to travel directly to the Symmetrix array, it passes through multiple hardware and software layers. These can lead to problems such as corruption introduced by errors in the operating system or the I/O driver. Hardware can also introduce corruption, such as errors in the host adapter, cable and connector problems, static electricity, and RF noise and interference.

This means that valid data within the RDBMS might arrive corrupted at the storage device. The storage device writes the data as is because it has no way of validating the data.

Benefits of checking within Symmetrix arrays

With this feature, the Symmetrix array can perform error checks on data pages handled within a checksum extent as they are written to the disk. The check occurs before the write command is acknowledged. If an error is detected within the blocks of the extent, the I/O can be rejected and/or reported in a phone home connection or logged in the Symmetrix error log facilities. The error action can be specified by the administrator.

This checking feature minimizes the possibility of data corruption occurring between the host and the Symmetrix array. It improves the recovery time by flagging the error at the time of the "write." When this error condition is raised, and reject I/O is selected, Oracle takes an action, such as taking the tablespace offline.


Implementing EMC Double Checksum for OracleThe symchksum command allows you to perform control operations that manage the checksum I/O extents.

For example, to enable Oracle checksum checking on the extents of all the devices that define the current database instance and then to phone home on error, enter:

symchksum enable -type Oracle -phone_home

This command requires an Oracle PAK license.

The following are current restrictions for this feature:

◆ Refer to the EMC Support Matrix for supported Oracle versions and configurations.

◆ Data-block size is limited to 32 KB.

◆ Checksum data objects can only be Oracle datafiles, control files, and redo logs.

The Oracle instance being checked must be started with the following init.ora configuration parameter set to true:

db_block_checksum=true
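Assuming SQL*Plus access, the parameter can be inspected interactively. The ALTER SYSTEM form below is a sketch that assumes Oracle9i or later with an spfile in use:

```
SQL> SHOW PARAMETER db_block_checksum

SQL> ALTER SYSTEM SET db_block_checksum = TRUE SCOPE = BOTH;
```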

Other checksum operations

The following additional checks are available:

◆ MagicNumber - Verify the magic number that appears in Oracle data blocks. (Enabled by default.)

◆ NonzeroDba - Check for nonzero data block address. The dba stored in a data block is never zero. (Enabled by default.)

◆ Check_All_Blocks - Apply checks to each block in a write.

◆ Straddle - Check that the write does not straddle known Oracle areas.

◆ Check_dba - Check that the logical address embedded by Oracle is compatible with the storage address of the writes.

With the addition of these new tests, the output when you list the extents will look similar to the following:


Symmetrix ID: 000187900671

D E V I C E S W I T H C H E C K S U M E X T E N T S

                                         Action      Checks

Device Name   Dev   Num Exts   Blk Siz   Type
-------------------------------------------------------------------------------
/dev/sdi      047   16         32b       Oracle   X . X X X . . . X . .
/dev/sdj      048   16         32b       Oracle   X . X X X . . . X . .
/dev/sdk      049   16         32b       Oracle   X . X X X . . . X . .
/dev/sdl      04A   15         32b       Oracle   X . X X X . . . X . .

(In the original output, the Action and Checks columns carry vertically printed flag labels; an X indicates the corresponding flag is enabled for the device.)

Use this output to determine which features are enabled on the devices with checksum extents.

To turn off any of the automatic checksum features, use the -suppress_feature option and supply the name of the feature, for example:

symchksum -type Oracle enable -suppress_feature MagicNumber

The -suppress_feature option is only for the operations run by default. To turn off an option that was manually enabled, such as -phone_home, disable the checksum operation and begin again with a new symchksum enable command.

Enabling checksum options

When using the symchksum enable command, the user can decide to reject the I/O, or have the Symmetrix phone home when a checksum error is detected.

If an I/O is not a multiple of the object block size, the user can choose to reject the I/O. This is called a fractured I/O, and is selected with the -fractured_reject_io option. When using this option, the -reject_io option must also be used.

When extents are enabled with the -discard option, EMC Double Checksum writes blocks to disk until a failed block is detected. The -discard option divides a large I/O into smaller units of 32 KB each.


When a checksum failure is detected, all blocks in that unit and subsequent units are discarded. When using this option, the -reject_io option must also be used.
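Combining the options above, enable commands might look like the following sketches. The option names are taken from this section, but the exact ordering and combinations should be confirmed against the EMC Solutions Enabler Symmetrix CLI Command Reference:

```
# Reject fractured I/Os (requires -reject_io):
symchksum enable -type Oracle -reject_io -fractured_reject_io

# Discard the failed 32 KB unit and subsequent units (requires -reject_io):
symchksum enable -type Oracle -reject_io -discard
```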

The symchksum enable command understands the Oracle database structure. The feature can be enabled for tablespaces, control files, redo logs, or the entire database.

For Oracle9i and above, if the block size for a tablespace is altered, then the user must disable and then reenable the extents of the tablespace to ensure that the block size of the enabled extents match the block size of the tablespace.

Note: When FF or power down occurs, extents are lost. Run the symchksum enable command again.

Verifying checksum is enabled

The symchksum command also allows you to verify that the datafile's extents are currently enabled for checksum checking. This provides an easy way to determine if the specified tablespace or instance is fully protected by the Symmetrix checksum feature. The verify action will report if all, some, or none of the Oracle datafile's extents are enabled for checksum checking. This is useful in environments where the database configuration changes frequently. An example of this is if a new datafile is added, but checksum is not enabled for the new file.

The symchksum verify command understands the Oracle database structure. Verification can be performed for tablespaces, control files, redo logs, or the entire database.

Validating for checksum operations

The symchksum command also allows you to validate your Oracle tablespace or instance for checksum operations without performing any active actions. This is helpful when you want to know if your database environment is configured to support Symmetrix checksum functionality without actually making any changes.

If the validate is successful, you can enable EMC Double Checksum on the specified Oracle database or tablespace. The following items are validated:


◆ Refer to the EMC Support Matrix for supported Oracle versions and configurations.

◆ Oracle's checksum initialization parameter is set (db_block_checksum).

◆ If the Oracle datafile is created on a striped LVM, that the LVM stripe width is a multiple of the Oracle block size.

◆ Oracle datafile's block size is less than or equal to 32 KB.

◆ The Symmetrix Enginuity version supports the checksum functionality.

◆ Each Symmetrix device has the checksum flag set.

◆ Each Symmetrix device has a supportable number of extents defined.
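A hedged workflow sketch combining the validate, enable, and verify actions might look like the following. The argument syntax should be confirmed against the EMC Solutions Enabler Symmetrix CLI Command Reference:

```
# Check that the environment supports checksum operations (no changes made):
symchksum validate -type Oracle

# If validation succeeds, enable checking and confirm coverage:
symchksum enable -type Oracle -reject_io
symchksum verify -type Oracle
```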

The symchksum validate command understands the Oracle database structure. Validation can be performed for tablespaces, control files, redo logs, or the entire database.

Disabling checksum

The symchksum disable command understands the Oracle database structure. The feature can be disabled for tablespaces, control files, redo logs, or the entire database.

The symchksum disable command can also be used on a per-device basis. This capability is not normally needed, but is provided in case a tablespace was dropped before EMC Double Checksum was disabled for that object.

When the disable action is specified for a Symmetrix device, the -force flag is required. Disabling extents in this way can cause a mapped tablespace or database to be only partially protected, therefore, use this option with caution. All the extents monitored for checksum errors on the specified Symmetrix device will be disabled.


Implementing Generic SafeWrite for generic applicationsGeneric SafeWrite (GSW) is used to help protect critical applications from incurring an incomplete write, and subsequent torn page, due to a failure with a component connected to the Symmetrix Front End Channel Adapter.

Torn pages: Using Generic SafeWrite to protect applications

A Relational Database Management System (RDBMS), such as Oracle or Microsoft Exchange, structures data within database files using pages (also referred to as blocks). Pages within a database are the smallest allocation unit possible for a database object (such as a table or a row).

For example, the page size for Microsoft Exchange is 4 KB and for Oracle, though it is configurable, it is usually set to 8 KB. If an incomplete page is written to a database file, a corruption to the database will occur. The resulting corruption is commonly referred to as a torn page.

Most RDBMSs detect a torn page only after the corruption is written, when that area of the database is read, which could be long after the corruption was introduced. In general, the only recovery from a torn page is to perform a restore from a backup (some RDBMSs allow page-level restores, while others require a complete database restore). Torn pages can occur due to failures in various components that lie between the RDBMS and the storage array, including the operating system, file system, logical volume manager, I/O driver, host bus adapter, Fibre Channel or SCSI link, and storage adapter.

The EMC Double Checksum Generic SafeWrite feature protects critical applications from incurring incomplete writes, and subsequent torn pages, due to a failure with a component connected to the Symmetrix Front End Channel Adapter.

Most often, Generic SafeWrite is used to protect against corruption that occurs when the HBA or link fails (including server crashes). In this scenario, Generic SafeWrite protects against fractured writes occurring before the data reaches the Symmetrix array.


Why generic?

Generic SafeWrite is deemed generic because the checks performed to ensure complete data are application independent. For instance, Generic SafeWrite will not perform any Oracle- or Exchange-specific checksums to verify data integrity. It is important to note that for Oracle, EMC Double Checksum for Oracle provides a rich set of checks that can be performed natively by the Symmetrix array. For more information on EMC Double Checksum for Oracle, consult “Implementing EMC Double Checksum for Oracle” on page 342.

Where to enable Generic SafeWrite

Generic SafeWrite only needs to be enabled for specific devices on the Symmetrix array. For an RDBMS, Generic SafeWrite only needs to be enabled for devices that support datafiles. The list below gives examples of database files whose supporting devices should have Generic SafeWrite enabled:

Microsoft Exchange

◆ .edb files

◆ .stm files

Microsoft SQL Server

◆ .mdf files

◆ .ndf files

Oracle

◆ Data files

◆ Control files

It is recommended to enable Generic SafeWrite for database file devices, though it is unnecessary to enable it for database log devices. In general, an RDBMS writes to its log file with a 512-byte sector alignment. The RDBMS can therefore determine the last sector that was correctly written and subsequently discard or roll back any incomplete transactions.


Note: It is always a best practice to separate the location of database files and log files for a given database onto unique devices. There are cases, however, where the datafile and log file may share the same device. In this case, it is still possible to have GSW enabled; however, there will be a performance impact to the log writes that may impact application performance.

There are no restrictions regarding the size of a device or the number of devices where GSW can be enabled. All device types are supported, including devices replicated using the SRDF and TimeFinder families of products. GSW can also be enabled on file systems across all logical volume managers, as well as on raw devices, provided the OS platform is supported by the Solutions Enabler Storage Resource Management (SRM) component. When using file systems on Windows and Linux hosts, for performance reasons, it is strongly recommended to ensure the file systems are properly aligned with the storage. For more information regarding file system alignment, consult Using diskpar and diskpart to Align Partitions on Windows Basic and Dynamic Disks, available on Powerlink.

Configuring Generic SafeWrite

To use Generic SafeWrite, you must:

◆ Enable the RDB_cksum device flag on all devices targeted for Generic SafeWrite use

◆ Run a discover operation to update the physical device information in the SYMAPI database

◆ Enable Generic SafeWrite on all devices targeted for Generic SafeWrite use via the symchksum command
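The three steps above can be strung together as a simple dry-run script. This is a sketch only: SID 54, device 005, and the command-file name follow the examples used in this section, and the SYMCLI commands are collected and printed rather than executed.

```shell
# Dry-run sketch of the Generic SafeWrite configuration sequence.
# SID 54 and device 005 follow the examples in this section; the
# commands are collected into a variable and printed, not executed.
sid=54
dev=005
steps="symconfigure -sid ${sid} -f enable_cksum.txt commit
symcfg discover
symchksum enable -type generic dev ${dev} -sid ${sid}"
printf '%s\n' "$steps"
```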

Setting the RDB_cksum Symmetrix device flag

Before using Generic SafeWrite, the RDB_cksum Symmetrix device flag must be enabled on all devices targeted for Generic SafeWrite use. This change does not turn Generic SafeWrite on; it only allows it to be enabled on the specified devices.

The RDB_cksum device flag is set by using the SYMCLI symconfigure command, which will perform a Symmetrix configuration change. Chapter 1 contains more information on using symconfigure.


Note: If symconfigure cannot be used, the appropriate device flag is set on the array by an EMC Customer Support Engineer.

The following is an example command:

symconfigure -sid 54 -f c:\enable_cksum.txt commit

where the c:\enable_cksum.txt file contains the following command:

set device 0015:0019 attribute=RDB_Cksum;

Note: If metavolumes are used, this flag needs to be set for both metaheads and metamembers.
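The command file itself can be generated from a script, which helps when the device list changes between environments. A sketch, reusing the device range from the example above:

```shell
# Sketch: write the symconfigure command file that sets the RDB_cksum
# attribute on a device range. Range 0015:0019 matches the example above;
# the file name is local to this sketch.
cmd_file="enable_cksum.txt"
printf 'set device 0015:0019 attribute=RDB_Cksum;\n' > "$cmd_file"
cat "$cmd_file"
```

The file is then committed with the symconfigure command shown above.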

Enabling Generic SafeWrite

Once the device flags are set on the Symmetrix array, it is possible to use the symchksum command to enable Generic SafeWrite. Before running the symchksum command, confirm the following:

◆ The devices enabled for Generic SafeWrite are visible to the host from where the symchksum command will be run.

◆ Run a symcfg discover command after presenting devices to a host to update the SYMAPI configuration database with the correct physical drive information.

Using the symchksum command, Generic SafeWrite is enabled by specifying a specific device, a range of devices, or a device group.

Enabling for a device

To enable Generic SafeWrite for a device, use the command syntax shown in the example below:

symchksum enable -type generic dev 005 -sid 54

Note: If this is a metadevice, only the metahead needs to be specified.

Enabling for a range of devices

To enable Generic SafeWrite for a contiguous range of devices:

symchksum enable -type generic -range 005:025 -sid 54

Enabling for a device group

To enable Generic SafeWrite for a device group:


symchksum enable -type generic -g sql_data -sid 54

Note: Enabling Generic SafeWrite on a Composite Group (CG) is currently not supported.
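When the target devices are neither contiguous nor collected in a device group, the per-device form can be driven from a list. A dry-run sketch: SID 54 and device 005 follow the earlier examples, device IDs 00A and 017 are illustrative, and the echo keeps the commands from executing.

```shell
# Dry-run sketch: enable Generic SafeWrite per device from a list.
# SID 54 and device 005 follow the earlier examples; 00A and 017 are
# illustrative. Remove "echo" to execute the real commands.
sid=54
count=0
for dev in 005 00A 017; do
    echo symchksum enable -type generic dev "$dev" -sid "$sid"
    count=$((count + 1))
done
```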

The symchksum enable -type generic command automatically sets the Log, Phone Home, and Generic Double Checksum options as described below:

◆ Log - Indicates that errors will be sent to the Symmetrix error log. These events should be visible via the symevent command.

◆ Phone Home - Indicates that an error will initiate a call by the Symmetrix to EMC Customer Service.

◆ Generic - The Generic option allows for two functions to be performed by the Symmetrix array. First, when an incomplete write is detected, it will be rejected and the Symmetrix will force the I/O to be retried from the host. Then, if the host is unavailable to retry the I/O, the write will be discarded, preventing it from being written to disk.

How to disable Generic SafeWrite

Generic SafeWrite can be disabled using the symchksum disable -type generic command, as shown in the examples below.

Disabling for a device

To disable Generic SafeWrite for a device, use the command syntax shown in the example below:

symchksum disable -type generic dev 005 -sid 54

Note: If this is a metadevice, only the metahead needs to be specified.

Disabling for a range of devices

To disable Generic SafeWrite for a contiguous range of devices:

symchksum disable -type generic -range 005:025 -sid 54

Disabling for a device group

To disable Generic SafeWrite for a device group:

symchksum disable -type generic -g sql_data -sid 54


Listing Generic SafeWrite devices

To list which devices are Generic SafeWrite enabled, use the symchksum list command. Only Generic SafeWrite-enabled devices that are visible to the host running the symchksum list command are returned.

Figure 71 shows the expected output from the list command, with Generic for the type and the Log and PhoneH (short for Phone Home) options set as well.

Figure 71 symchksum list command output

The symchksum show command is used to look at a specific device. For example:

symchksum show dev 103 -type generic -mb -sid 54

Performance considerations

Performance testing was done with Microsoft Exchange, Microsoft SQL Server, and Oracle on standard devices, and in the case of Microsoft Exchange, also on SRDF/S and SRDF/A devices. For the Microsoft SQL Server and Oracle performance tests, a TPC-C workload was used. For Microsoft Exchange, the Jetstress performance tool was used. The results of these tests showed no performance degradation from an application perspective. Application performance remains unaffected because database log writes, as well as database reads, are performed normally. Since log write response times and database read response times are the main determinants of application performance with respect to storage, it is expected that client and application response times will not be greatly affected.

Outside of application performance, there may be a slight increase in the write response time to the database file devices, depending on application profile and usage. In general, this response time increase should not impact application performance. Writes to a database file are done asynchronously, so write response times to these files are less of a concern than to the log device. However, in certain environments a delay in these asynchronous writes may still impact performance.


Syntax and examples

This section contains the symchksum argument descriptions and several examples of using the SYMCLI symchksum command. Consult the EMC Solutions Enabler Symmetrix CLI Command Reference for the complete list of syntax and options.

To list the devices on Symmetrix array 3890 with extents being checked for checksum errors, enter:

symchksum list -sid 3890

To show all the extents of Symmetrix device 0A1 on Symmetrix array 3890 being checked for checksum errors, enter:

symchksum show dev 0A1 -sid 3890

To enable Checksum on the extents of all the devices that define the current database instance and then to phone home on error, enter:

symchksum enable -type Oracle -phone_home

To enable Checksum on the extents of all the devices that define the tablespace and then to log on error, enter:

symchksum enable -type Oracle -tbs SYSTEM

Table 13 symchksum command arguments

Command     Argument   Description
symchksum   list       Lists all devices that currently have checksum checking enabled.
            show       Displays the extents of a specified device that are having checksum checking performed.
            enable     Enables checksum checking on the extents of the specified devices.
            disable    Disables checksum checking on the extents of the specified devices.
            validate   Validates that a specified database or tablespace is able to have checksum checking enabled.
            verify     Verifies whether the specified database or tablespace has checksum checking enabled on all of its devices.


To verify that Oracle tablespace USER01 has Checksum enabled on all the devices that have defined it, enter:

symchksum verify -type Oracle -tbs USER01

To disable Checksum on the current database instance, enter:

symchksum disable -type Oracle

Note: Disable by device should only be used under special circumstances. For example, this option can be used to remove extents if a database or a tablespace has been dropped without first doing a normal disable. In this case, disable by device can be used to remove the extents.

To disable (with force) Checksum for all checksum extents on Symmetrix device 0A1 on Symmetrix unit 3890, enter:

symchksum disable dev 0A1 -sid 3890 -force


Chapter 9 Storage Tiering—Virtual LUN and FAST

This chapter describes storage tiers available on Symmetrix and methodologies for nondisruptive migration of Oracle data using Symmetrix technologies across available storage tiers:

◆ Overview ........................................................................................... 356
◆ Evolution of storage tiering ............................................................ 359
◆ Symmetrix Virtual Provisioning .................................................... 361
◆ Enhanced Virtual LUN migrations for Oracle databases ............ 372
◆ Fully Automated Storage Tiering for Virtual Pools .................... 381
◆ Fully Automated Storage Tiering .................................................. 404
◆ Conclusion ........................................................................................ 419


Overview

The EMC Symmetrix VMAX series with Enginuity is the newest addition to the Symmetrix product family. Built on the strategy of simple, intelligent, modular storage, it incorporates a new scalable Virtual Matrix interconnect that connects all shared resources across all VMAX Engines, allowing the storage array to grow seamlessly and cost-effectively from an entry-level configuration into the world’s largest storage system. The Symmetrix VMAX provides improved performance and scalability for demanding enterprise storage environments while maintaining support for EMC’s broad portfolio of platform software offerings.

EMC Symmetrix VMAX delivers enhanced capability and flexibility for deploying Oracle databases throughout the entire range of business applications, from mission-critical applications to test and development. In order to support this wide range of performance and reliability at minimum cost, Symmetrix VMAX arrays support multiple drive technologies that include Enterprise Flash Drives (EFDs), Fibre Channel (FC) drives, both 10k rpm and 15k rpm, and 7,200 rpm SATA drives. In addition, various RAID protection mechanisms are allowed that affect the performance, availability, and economic impact of a given Oracle system deployed on a Symmetrix VMAX array.

As companies increase deployment of multiple drive and protection types in their high-end storage arrays, storage and database administrators are challenged to select the correct storage configuration for each application. Often, a single storage tier is selected for all data in a given database, effectively placing both active and idle data portions on fast FC drives. This approach is expensive and inefficient, because infrequently accessed data will reside unnecessarily on high-performance drives.

Alternatively, making use of high-density low-cost SATA drives for the less active data, FC drives for the medium active data, and EFDs for the very active data enables efficient use of storage resources, and reduces overall cost and the number of drives necessary. This, in turn, also helps to reduce energy requirements and floor space, allowing the business to grow more rapidly.

Database systems, due to the nature of the applications that they service, tend to direct the most significant workloads to a relatively small subset of the data stored within the database, while the rest of the database is less frequently accessed. The imbalance of I/O load across the database causes much higher utilization of the LUNs holding the active objects, a phenomenon known as LUN access “skewing.” However, in most cases LUNs have some unallocated and therefore idle space, or a combination of hot and cold data due to a mix of different database objects. Such differences in the relative utilization of the space inside each LUN are referred to as sub-LUN “skewing.”

While the use of multiple storage tiers can be managed manually by DBAs placing the appropriate database objects in their right tier, this can become cumbersome given the growing complexity of applications and the fluctuations of access frequency to data over time.

Enginuity 5874 introduced Fully Automated Storage Tiering (FAST) as a method to address changes in LUN access skewing. FAST operates on standard (non-VP) Symmetrix addressable devices. It automatically and seamlessly moves the storage behind the controlled LUNs to the appropriate storage tier, based on user policy and LUN activity. Enginuity 5875 introduced FAST for Virtual Pools (FAST VP) as a method to address changes in sub-LUN access skewing. FAST VP is based on Virtual Provisioning™ and operates on thin Symmetrix devices. It automatically and seamlessly moves portions of the LUN to the appropriate storage tiers, based on user policy and the sub-LUN activity. Due to its finer granularity, FAST VP is more efficient in utilizing the capacity of the different storage tiers, and more responsive to changes in workload patterns than even the most diligent DBA. FAST VP also adapts readily to configurations in which, due to host striping, the workload is evenly distributed across many LUNs (like Oracle Automatic Storage Management, or ASM). Rather than having to move all the LUNs as a group between storage tiers, FAST VP operates appropriately on small portions in each LUN, moving them to the storage tier that best matches their workload needs.

FAST VP preserves Symmetrix device IDs, which means there is no need to change file system mount points, volume manager settings, database file locations, or scripts. It also maintains any TimeFinder or SRDF business continuity operations even as the data migration takes place.

By optimizing data placement of active LUNs and sub-LUNs to the storage tier that best answers their needs, FAST VP helps maximize utilization of Flash drives, increase performance, reduce the overall number of drives, and improve the total cost of ownership (TCO) and ROI. FAST VP enables users to achieve these objectives while simplifying storage management.

This chapter describes Symmetrix Virtual Provisioning, a tiered storage architecture approach for Oracle databases, and the ways in which devices can be moved nondisruptively, using Virtual LUN migration, FAST for traditional thick devices, or FAST VP for virtually provisioned devices, in order to put the right data on the right storage tier at the right time.


Evolution of storage tiering

Storage tiering has evolved over the past several years from a completely manual process to the automatic process that it is today.

Manual storage tiering

Manual storage tiering is the process of collecting performance information on a set of drives and then manually placing data on different drive types based on the performance requirement for that data. This process is typically very labor-intensive and does not dynamically adjust as the load on the application increases or decreases over time.

Fully Automated Storage Tiering (FAST)

FAST was introduced in 2009 and is based on virtual LUN (VLUN) migration for standard devices. FAST allows administrators to define policies and priorities that govern what data resides in each storage tier and can automatically make data placement decisions without human intervention. FAST is a major step forward in data management automation, but it is limited to moving entire LUNs from one tier to another. Even if only a small amount of the data on a LUN is active, the inactive data is also migrated, consuming valuable space in the higher-performance tier.

Fully Automated Storage Tiering for Virtual Pools (FAST VP)

FAST VP monitors the performance of a LUN at fine granularity and moves only a small number of Symmetrix tracks between storage tiers. FAST VP automates the identification of sub-LUN data for the purposes of relocating it across different performance/capacity tiers within an array.

Example of storage tiering evolution

Figure 72 on page 360 shows an example of storage tiering evolution from a single tier to sub-LUN tiering. Although the image shows FAST VP operating on two tiers alone, in most cases tiering strategy is still best optimized for cost/performance using a three-tier approach.


Figure 72 Storage tiering evolution (Traditional, FAST, FAST VP)


Symmetrix Virtual Provisioning

This section contains the following information:

◆ “Introduction” on page 361

◆ “Virtual Provisioning and Oracle databases” on page 363

◆ “Planning thin devices for Oracle databases” on page 368

Introduction

Symmetrix Virtual Provisioning, the Symmetrix implementation of what is commonly known in the industry as “thin provisioning,” enables users to simplify storage management and increase capacity utilization by sharing storage among multiple applications and only allocating storage as needed from a shared “virtual pool” of physical disks.

Symmetrix thin devices are logical devices that can be used in many of the same ways that Symmetrix standard devices have traditionally been used. Unlike traditional Symmetrix devices, thin devices do not need to have physical storage preallocated at the time the device is created and presented to a host (although in many cases customers interested only in wide striping and ease of management choose to fully preallocate the thin devices). A thin device is not usable until it has been bound to a shared storage pool known as a thin pool. Multiple thin devices may be bound to any given thin pool. The thin pool is comprised of devices called data devices that provide the actual physical storage to support the thin device allocations.

When a write is performed to a part of any thin device for which physical storage has not yet been allocated, the Symmetrix allocates physical storage from the thin pool for that portion of the thin device only. The Symmetrix operating environment, Enginuity, satisfies the requirement by providing a block of storage from the thin pool called a thin device extent. This approach reduces the amount of storage that is actually consumed.

The minimum amount of physical storage that can be reserved at a time for the dedicated use of a thin device is referred to as a data device extent. The data device extent is allocated from any one of the data devices in the associated thin pool. Allocations across the data devices are balanced to ensure that an even distribution of allocations occurs from all available data devices in the thin pool (also referred to as wide striping).

For Symmetrix, the thin device extent size is the same as the data device extent size: 12 Symmetrix tracks, or 768 KB. Note that there is no reason to match the LVM stripe depth to the thin device extent size. Oracle commonly accesses data either by random single-block read/write operations (usually 8 KB in size) or sequentially, by reading large portions of data. In either case there is no advantage or disadvantage to matching the LVM stripe depth to the thin device extent size, since single-block reads and writes operate on a data portion smaller than the LVM stripe depth anyway. For sequential operations, if the data is stored in adjacent locations on the devices, the read operation simply continues to read data on each LUN (every time the sequential read wraps to that same LUN) regardless of the stripe depth. If the LVM striping causes the data to be stored randomly on the storage devices, then the sequential read operation turns into a storage random read of large I/Os spread across all the devices.
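To make the allocation granularity concrete, the following sketch computes how many 768 KB thin extents a fully initialized data file consumes. The 100 MB file size is an illustrative value, not one taken from this document.

```shell
# Sketch: thin-extent consumption for a fully initialized Oracle data
# file. The extent size (12 tracks = 768 KB) is from the text above;
# the 100 MB file size is illustrative.
extent_kb=768
file_mb=100
file_kb=$((file_mb * 1024))
extents=$(( (file_kb + extent_kb - 1) / extent_kb ))  # round up to whole extents
echo "${extents} extents"                             # a 100 MB file consumes 134 extents
```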

When a read is performed on a thin device, the data being read is retrieved from the appropriate data device in the thin pool to which the thin device is associated. If for some reason a read is performed against an unallocated portion of the thin device, zeros are returned to the reading process.

When more physical data storage is required to service existing or future thin devices, for example, when a thin pool is approaching full storage allocations, data devices can be added to existing thin pools dynamically without causing a system outage. New thin devices can also be created and bound to an existing thin pool at any time.

When data devices are added to a thin pool they can be in an enabled or disabled state. In order for the data device to be used for thin extent allocation, it needs to be in the enabled state. For it to be removed from the thin pool, it needs to be in a disabled state. Symmetrix automatically initiates a drain operation on a disabled data device without any disruption to the application. Once all the allocated extents are drained to other data devices, a data device can be removed from the thin pool.

The following figure depicts the relationships between thin devices and their associated thin pools. Thin Pool A contains six data devices, and thin Pool B contains three data devices. There are nine thin devices associated with thin Pool A and three thin devices associated with thin Pool B. The data extents for thin devices are distributed on various data devices as shown in Figure 73.

Figure 73 Thin devices and thin pools containing data devices

The way thin extents are allocated across the data devices results in a form of striping in the thin pool. The more data devices in the thin pool (and the associated physical drives behind them), the wider striping will be, creating an even I/O distribution across the thin pool. Wide striping simplifies storage management by reducing the time required for planning and execution of data layout.

Virtual Provisioning and Oracle databases

Oracle data file initialization

Using Virtual Provisioning in conjunction with Oracle databases provides the benefits mentioned earlier, such as reducing future server impact during LUN provisioning, increasing storage utilization, native striping in the thin pool, and ease and speed of creating and working with thin devices. However, when Oracle initializes new files, such as log, data, and temp files, it fully allocates the file space by writing non-zero information (metadata) to each initialized block. This causes the thin pool to allocate the amount of space that is being initialized by the database. As database files are added, more space is allocated in the pool. Due to Oracle file initialization, and in order to get the most benefit from a Virtual Provisioning infrastructure, a strategy for sizing files, pools, and devices should be developed in accordance with application and storage management needs. Some strategy options are explained next.

Oversubscription

An oversubscription strategy is based on using thin devices with a total capacity greater than the physical storage in the pools that they are bound to. This can increase capacity utilization by sharing storage among applications, thereby reducing the amount of allocated but unused space. The thin devices each appear to be a full-size device to the application, while in fact the thin pool cannot accommodate the total capacity of the LUNs. Since Oracle database files initialize their space even though they are still empty, it is recommended that instead of creating very large data files that remain largely empty for most of their lifetime, smaller data files be considered to accommodate near-term data growth. As they fill up over time, their size can be increased, or more data files added, in conjunction with the capacity increase of the thin pool. The Oracle auto-extend feature can be used for simplicity of management, or DBAs may prefer manual file size management or addition.

An oversubscription strategy is recommended for database environments when database growth is controlled, and thin pools can be actively monitored and their size increased when necessary in a timely manner.
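The monitoring threshold can be reasoned about with simple arithmetic. A sketch of the subscription ratio follows; both capacity figures are illustrative, not values from this document.

```shell
# Sketch: thin pool subscription ratio. Both GB figures are illustrative.
# A ratio above 100% means the pool is oversubscribed, so allocations
# must be monitored and the pool grown before it fills.
thin_dev_total_gb=2000   # sum of thin-device (host-visible) capacities
pool_enabled_gb=1200     # enabled data-device capacity in the pool
ratio_pct=$(( thin_dev_total_gb * 100 / pool_enabled_gb ))
echo "subscription ratio: ${ratio_pct}%"
```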

Undersubscription

An undersubscription strategy is based on using thin devices with a total capacity smaller than the physical storage in the pools that they are bound to. This approach does not necessarily improve storage capacity utilization, but still makes use of wide striping, thin pool sharing, and other benefits of Virtual Provisioning. In this case the data files can be sized to make immediate use of the full thin device size, or alternatively, auto-extend or manual file management can be used.


Undersubscribing is recommended when data growth is unpredictable, when multiple small databases share a large thin pool to benefit from wide striping, or when an oversubscribed environment is considered unacceptable.

Thin device preallocation

A third option exists that can be used with either oversubscription or undersubscription, and has become very popular for Oracle databases. When DBAs want to guarantee that space is reserved for the databases' thin devices, they can use thin device preallocation. While this reduces the potential capacity utilization benefits for the thin pool, it still enables users to achieve easier data layout with wide striping. A thin device can preallocate space in the pool, even before data is written to it. Figure 74 shows an example of creating 10 x 29.30 GB thin devices, and preallocating 10 GB in the pool for each of them.

Figure 74 Thin device configuration


The example shows an SMC screen (a similar operation can be done using the Symmetrix CLI). When preallocation is used, Oracle database customers often preallocate the whole thin device (reducing the storage capacity optimization benefits). In effect each thin device therefore fully claims its space in the thin pool, eliminating a possible thin pool out-of-space condition. It is also possible to preallocate a portion of the thin device (like the 10 GB in the example) to match the size of the application file. For example, ASM disks can be set smaller than their actual full size, and later be resized dynamically without any disruption to the database application. In this case an ASM disk group can be created from these 10 thin devices, only using 10 GB of each disk. At a later time, additional storage on the thin device can be preallocated, and ASM disks resized to match it.

Planning thin pools for Oracle databases

Planning thin pools for Oracle environments requires some attention to detail, but the advantage of using thin pools is that the environment is flexible. By using thin devices, database performance can be better than with thick devices, because thin devices are striped evenly across all the physical drives in the pool. For typical OLTP Oracle databases this provides the maximum number of physical devices to service the workload. If a database starts on a pool of, say, 64 physical devices and the load on those devices is too heavy, the pool can be expanded dynamically, without interruption to the application, to spread the load over more physical drives.

In general, thin pools should be configured to meet at least the initial capacity requirements of all applications that will reside in the pool. The pool should also contain enough physical drives to service the expected back-end physical drive workload. Customers can work with their local EMC account team for recommendations on how to size the number of physical drives.

For RAID protection, thin pools are no different in terms of reliability and physical drive performance from standard devices. If an application is deployed on RAID 5 (3+1) today, there is no reason to change the protection for thin pools. Likewise, if an application is deployed on RAID 1 or RAID 5 (7+1), the thin pool should be configured to match. Both RAID 1 and RAID 5 protect against a single-drive failure, while RAID 6 protects against two drive failures. A RAID 1 group resides on two physical drives; a RAID 5 (3+1) group resides on four physical drives, and so on. A thin pool is always created out of similarly configured RAID groups. For example, if we create eight RAID 5 (3+1) data devices and put
them into one pool, the pool has eight RAID 5 devices of four drives each. If one of the drives in this pool fails, you are not losing one drive from a pool of 32 drives; rather, you are losing one drive from one of the eight RAID-protected data devices and that RAID group can continue to service read and write requests, in degraded mode, without data loss. Also, as with any RAID group, with a failed drive Enginuity will immediately invoke a hot sparing operation to restore the RAID group to its normal state. While this RAID group is rebuilding, any of the other RAID groups in the thin pool can have a drive failure and there is still no loss of data. In this example, with eight RAID groups in the pool there can be one failed drive in each RAID group in the pool without data loss. In this manner data stored in the thin pool is no more vulnerable to data loss than any other data stored on similarly configured RAID devices. Therefore, a protection of RAID 1 or RAID 5 for thin pools is acceptable for most applications and RAID 6 is only required in situations where additional parity protection is warranted.
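The capacity arithmetic behind these protection choices can be sketched as follows; the drive counts and sizes are illustrative only:

```python
# Approximate usable fraction of raw drive capacity for common protections
RAID_USABLE = {
    "RAID 1": 1 / 2,        # mirrored pair
    "RAID 5 (3+1)": 3 / 4,  # 3 data + 1 parity drives
    "RAID 5 (7+1)": 7 / 8,
    "RAID 6 (6+2)": 6 / 8,  # survives two drive failures
}

def pool_usable_gb(raid, drives, drive_gb):
    """Usable capacity of a thin pool built from same-type RAID groups."""
    return RAID_USABLE[raid] * drives * drive_gb

# Hypothetical: 8 x RAID 5 (3+1) groups -> 32 x 300 GB drives
print(pool_usable_gb("RAID 5 (3+1)", 32, 300))  # 7200.0
```

The fractions also show why RAID 5 (7+1) is more capacity efficient than RAID 5 (3+1), while RAID 6 trades capacity for double-failure protection.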

The number of thin pools is affected by a few factors. The first is the choice of drive type and RAID protection. Each thin pool is a group of data devices sharing the same drive type and RAID protection. For example, a thin pool consisting of multiple RAID 5 protected data devices on 15k rpm FC drives is a good capacity/performance choice for hosting Oracle data files. However, redo logs, which take relatively little capacity, are often best protected using RAID 1, and therefore another thin pool containing RAID 1 protected data devices can be used. To ensure sufficient spindles behind the redo logs, the same set of physical drives used for the RAID 5 pool can also back the RAID 1 thin pool. Such sharing at the physical drive level, with separation at the thin pool level, allows efficient use of drive capacity without compromising on the choice of RAID protection. The Oracle Fast Recovery Area (FRA), for example, can be placed in a RAID 6 protected SATA thin pool.

Therefore the choice of the appropriate drive technology and RAID protection is the first factor in determining the number of thin pools. The other factor has to do with the business owners. When applications share thin pools they are bound to the same set of data devices and spindles, and they share the same overall thin pool capacity and performance. If business owners require their own control over thin pool management they will likely need a separate set of thin pools based on their needs. In general, however, for ease of
manageability it is best to keep the overall number of thin pools low, and allow them to be spread widely across many drives for best performance.

Planning thin devices for Oracle databases

Thin device sizing

The maximum size of a thin device in a Symmetrix VMAX is 240 GB. If a larger size is needed, a metavolume composed of thin devices can be created. When host striping such as Oracle ASM is used, it is recommended that the metavolume be concatenated rather than striped, since the host provides a layer of striping and the thin pool is already striped at data device extent granularity. Concatenated metavolumes also support fast expansion, as new members can easily be appended to an existing concatenated metavolume. This is useful when a provisioned thin device has become fully allocated at the host level and needs to be grown to gain additional space. Note that it is not recommended to provision applications with a small number of very large LUNs. Each LUN provides the host with an additional I/O queue to which the host operating system can stream I/O requests and parallelize the workload. Host software and HBA drivers tend to limit the number of I/Os that can be queued to a LUN at a time; to avoid host queuing bottlenecks under heavy workloads, it is better to give the application multiple smaller LUNs rather than a few very large ones.
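The trade-off described above can be made concrete with a small sizing helper; the 4 TB requirement and 128 GB LUN size are hypothetical:

```python
import math

MAX_THIN_GB = 240  # maximum single thin device size on Symmetrix VMAX

def lun_count(total_gb, lun_gb):
    """Number of LUNs of lun_gb needed to reach total_gb. More, smaller
    LUNs give the host more I/O queues to parallelize the workload over."""
    if lun_gb > MAX_THIN_GB:
        raise ValueError("a single device this large needs a metavolume")
    return math.ceil(total_gb / lun_gb)

# Hypothetical 4 TB requirement: e.g. 32 x 128 GB rather than a few huge LUNs
print(lun_count(4096, 128))  # 32
```

The same arithmetic, run against the host's per-LUN queue depth limit, shows how many concurrent I/Os the layout can sustain.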

Striped metavolumes are supported with Virtual Provisioning and there may be workloads that will benefit from multiple levels of striping (for example, for Oracle redo logs when SRDF/S is used, and host striping is not available).

When oversubscription is used, the thin pool can be sized for near-term database capacity growth, and the thin devices for long-term LUN capacity needs. Since the thin LUNs do not take space in the pool until data is written to them, this method optimizes storage capacity utilization and reduces the database and application impact as they continue to grow. Note, however, that the larger the device the more metadata is associated with it and tracked in the Symmetrix cache. Therefore the sizing should be reasonable and realistic to limit unnecessary cache overhead, as small as it is.

Thin devices and ASM disk group planning

Thin devices are presented to the host as SCSI LUNs. Oracle recommends creating at least a single partition on each LUN to identify the device as being in use. On x86-based platforms it is important to align the LUN partition, for example by using fdisk or parted on Linux. With fdisk, after the new partition is created, type "x" to enter Expert mode, then use the "b" option to move the beginning of the partition. Either a 128-block (64 KB) or a 2,048-block (1 MB) offset is a good choice, as both align with the Symmetrix 64 KB cache track size. After Oracle permissions are assigned to the partition, it can become an ASM disk group member or be used in other ways by the Oracle database.
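A quick way to verify the suggested offsets is to check that a partition's starting sector falls on a 64 KB track boundary:

```python
SECTOR = 512            # bytes per sector
TRACK = 64 * 1024       # Symmetrix cache track size (64 KB)

def aligned(start_sector):
    """True if a partition starting at this sector aligns to the 64 KB track."""
    return (start_sector * SECTOR) % TRACK == 0

# 128 sectors = 64 KB, 2048 sectors = 1 MB; the legacy DOS start of 63 is not aligned
print(aligned(128), aligned(2048), aligned(63))  # True True False
```

The misaligned 63-sector default is exactly what the fdisk Expert-mode "b" option is used to correct.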

When using Oracle Automatic Storage Management (ASM), Oracle recommends a minimum number of ASM disk groups for ease of management. Indeed, when multiple smaller databases share the same performance and availability requirements they can also share ASM disk groups; however, larger, more critical databases may require their own ASM disk groups for better control and isolation. EMC best practice for mission-critical Oracle databases is to create a few ASM disk groups based on the following considerations:

◆ +GRID:

Starting with Oracle Database 11gR2, Oracle merged Cluster Ready Services (CRS) and ASM; they are installed together as part of the Grid installation. Therefore, when the clusterware is installed, the first ASM disk group is also created to host the quorum and cluster configuration devices. Since these devices contain local environment information such as hostnames and subnet masks, there is no reason to clone or replicate them. EMC best practice starting with Oracle Database 11.2 is to create only a very small disk group during Grid installation for the CRS devices, and not to place any database components in it. When other ASM disk groups containing database data are replicated with storage technology, they can simply be mounted to a different +GRID disk group at the target host or site, which already has Oracle CRS installed with all the local information relevant to that host and site. Note that while external redundancy (RAID protection handled by the storage array) is recommended for all other ASM disk groups, EMC recommends high redundancy only for the +GRID disk group. The reason is that Oracle determines the number of quorum devices based on the redundancy level, and high redundancy allows more quorum devices to be created. Since the capacity
requirements of the +GRID ASM disk group are tiny, very small devices can be provisioned (high redundancy implies three copies/mirrors, and therefore a minimum of three devices is required).

◆ +DATA, +LOG: While separating data and log files into two different ASM disk groups is optional, EMC recommends it in the following cases:

• When TimeFinder is used to create a clone (or snap) that is a valid backup image of the database. The TimeFinder/Clone image can serve as a source for RMAN backup to tape, and/or can be opened for reporting (read-only), and so on. The importance of such a clone image is that it is a valid full backup image of the database: if the database requires media recovery, restoring the TimeFinder/Clone back to production takes only seconds, regardless of the database size. This is a huge saving in RTO; within a few seconds, archive logs can begin to be applied as part of the media recovery roll forward. When such a clone does not exist, the initial backup set first has to be restored from tape/VTL before any archive log can be applied, which can add a significant amount of time to recovery operations. Therefore, when TimeFinder is used to create a backup image of the database, the online logs should be placed on separate devices and in a separate ASM disk group so that the restore does not overwrite them.

• Another reason for separating data from log files is performance and availability. Redo log writes are synchronous and need to complete in the least amount of time. By placing them on separate storage devices, the commit writes do not have to share the LUN I/O queue with large asynchronous buffer cache checkpoint I/Os. Having the logs on their own devices also makes it possible to use one RAID protection for data files (such as RAID 5) and another for the logs (such as RAID 1).

◆ +TEMP: When storage replication technology is used for disaster recovery, like SRDF/S, it is possible to save bandwidth by not replicating temp files. Since temp files are not part of a recovery operation and quick to add, having them on separate devices allows bandwidth saving, but adds to the operations of bringing up the database after failover. While it is not required to separate temp files, it is an option and the DBA may choose to do it anyway for performance isolation reasons if that is their best practice.

◆ +FRA: Fast Recovery Area typically hosts the archive logs and sometimes flashback logs and backup sets. Since the I/O operations to FRA are typically sequential writes, it is usually sufficient to have it located on a lower tier such as SATA drives. It is also an Oracle recommendation to have FRA as a separate disk group from the rest of the database to avoid keeping the database files and archive logs or backup sets (that protect them) together.
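The disk group considerations above can be summarized in a table-like sketch; the RAID and tier assignments are illustrative examples, not mandates:

```python
# Illustrative layout only; actual choices depend on workload and policy
asm_layout = {
    "+GRID": {"redundancy": "high (Oracle mirrored)", "contents": "CRS/quorum devices only"},
    "+DATA": {"raid": "RAID 5", "tier": "FC 15k", "contents": "data files"},
    "+LOG":  {"raid": "RAID 1", "tier": "FC 15k", "contents": "online redo logs"},
    "+TEMP": {"raid": "RAID 5", "tier": "FC 15k", "contents": "temp files (optionally not replicated)"},
    "+FRA":  {"raid": "RAID 6", "tier": "SATA",   "contents": "archive logs, backup sets"},
}

for dg, attrs in asm_layout.items():
    print(dg, "-", attrs["contents"])
```

Keeping the mapping explicit makes it easy to reason about which disk groups participate in replication, backup, and tiering decisions.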

Thin pool reclamation using ASM Storage Reclamation Utility

The Symmetrix Enginuity 5874 Q4 2009 service release introduced zero space reclamation, which returns allocated data extents containing contiguous zero blocks to a thin pool for reuse. The feature currently works at a granularity of 12 tracks (768 KB). When database and host files are dropped, they are commonly not zeroed out by the operating system or the Oracle database, and therefore their space cannot be reclaimed in the thin pool.

In general, Oracle ASM reuses free/deleted space under the high watermark very efficiently. However, when a large amount of space is released, for example after the deletion of a large tablespace or database, and the space is not anticipated to be needed soon by that ASM disk group, it is beneficial to free up that space in both the disk group and thin pool.

To simplify the storage reclamation of thin pool space no longer needed by ASM objects, Oracle has developed the ASM Storage Reclamation Utility (ASRU). ASRU rebalances (consolidates) the ASM disk group and resizes the disks. Once the ASM disks are resized, ASRU fills the remainder of the disk space with zeros to allow the reclamation of the zero space by storage Virtual Provisioning zero reclamation algorithms. The whole process is nondisruptive since users can perform ASM’s resize and rebalance operations online. ASRU along with Symmetrix Virtual Provisioning zero space reclamation can save a considerable amount of space in the thin pool, making it available for other applications.
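The effect of zero space reclamation can be estimated from the number of allocated extents that ASRU has zero-filled, at the 768 KB (12-track) granularity noted above; the extent counts below are hypothetical:

```python
EXTENT_KB = 768  # reclamation granularity: 12 x 64 KB tracks

def reclaimable_kb(extent_is_zero):
    """Given per-extent flags (True = extent is all zeros after ASRU's
    zero-fill), return how much space zero reclamation can return to the pool."""
    return sum(extent_is_zero) * EXTENT_KB

# Hypothetical: 1,000 allocated extents, of which 600 were zeroed by ASRU
print(reclaimable_kb([True] * 600 + [False] * 400))  # 460800
```

Only fully zeroed 768 KB extents are reclaimable, which is why ASRU's consolidate-resize-zero sequence matters: it concentrates the free space into contiguous zero regions.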

Enhanced Virtual LUN migrations for Oracle databases

Oracle database and storage administrators use a variety of mechanisms to place data on the right storage tier. This section describes commonly deployed manual tiering mechanisms and then extends the discussion to nondisruptive, transparent migration of Oracle data using the Enhanced Virtual LUN technology available on Symmetrix.

Manual tiering mechanics

The goal of an effective storage tiering approach in a multi-typed storage configuration is to place the right data on the right storage tier at the right time. A given Oracle device may be highly active and highly accessed when data is created on it in the first instance. But over time, its usage may drop to a level where the device could be deployed on a storage tier that has lower-cost and lower-performance characteristics.

A typical manual storage tiering approach uses the following steps:

1. Monitor database and storage performance: Oracle statspack or AWR report can be used to analyze the Oracle database I/O profile.

2. Identify and classify hot and cold database objects that are candidates for cost reduction (down-tiering) or performance improvement (up-tiering).

3. Identify space to be used as targets for tiering operations.

4. Use database, host, or storage utilities to move the candidates to the right storage tier.

5. Validate that the tiering activities had the desired effects.

6. Repeat the process at a later time.
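One pass of the manual process above (steps 1 and 2) can be sketched as a classification loop; the object names, I/O statistics, and thresholds are all hypothetical:

```python
def manual_tiering_cycle(objects, io_stats, hot_threshold, cold_threshold):
    """Classify objects by observed I/O rate (e.g. from AWR/Statspack) and
    return (up_tier, down_tier) candidate lists for steps 3 through 5."""
    up, down = [], []
    for obj in objects:
        rate = io_stats.get(obj, 0)
        if rate >= hot_threshold:
            up.append(obj)        # candidate for a faster tier (e.g. EFD)
        elif rate <= cold_threshold:
            down.append(obj)      # candidate for a cheaper tier (e.g. SATA)
    return up, down

stats = {"SALES_2010": 5, "SALES_2011": 900, "AUDIT_LOG": 40}
print(manual_tiering_cycle(stats.keys(), stats, 500, 10))  # (['SALES_2011'], ['SALES_2010'])
```

Steps 3 through 6 (finding target space, moving the data, validating, repeating) are what FAST automates, as described later in this chapter.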

Symmetrix Enhanced Virtual LUN technology

Symmetrix VMAX and Enginuity 5874 introduced an Enhanced Virtual LUN technology that enables transparent, nondisruptive data mobility of thick devices between storage tiers (a combination of disk technology and RAID protection). Virtual LUN technology provides users with the ability to move Symmetrix logical devices between
drive types, such as high-performance Enterprise Flash Drives (EFDs), Fibre Channel drives, or high-capacity low-cost SATA drives and at the same time change their RAID protection.

Virtual LUN migration occurs independent of host operating systems or applications, and during the migration the devices remain fully accessible to database transactions. While the back-end device characteristics change (RAID protection or physical drive type) the migrated devices’ identities remain the same to the host, allowing seamless online migration. Virtual LUN migration is fully integrated with Symmetrix replication technology and maintains consistency of source/target device relationships in replications such as SRDF, TimeFinder/Clone, TimeFinder/Snap, or Open Replicator.

The advantages of migrating data using storage technology are ease of use, efficiency, and simplicity. Data is migrated in the Symmetrix back end without consuming any SAN or host resources, which increases migration efficiency. The migration is a safe operation, as the target is treated internally as just another "mirror" of the logical device, although with its own RAID protection and drive type. At the end of the migration, the data on the original "mirror" is formatted to preserve security. Finally, since the identity of the source devices does not change, moving between storage tiers is easy and does not require additional host change control, backup script updates, or changes in file system mount points, volume manager configuration, and the like. The migration pace can be controlled using Symmetrix quality of service (symqos) commands.

Virtual LUN migration helps customers implement an Information Lifecycle Management (ILM) strategy for their databases, such as moving an entire database, tablespaces, partitions, or ASM disk groups between storage tiers. It also allows adjusting service levels and performance for application data. For example, application storage is often provisioned before clear performance requirements are known. At a later time, once the requirements are better understood, it is easy to make adjustments and improve user experience and ROI by using the correct storage tier.

LUN-based migrations and ASM

Automatic Storage Management (ASM) is a feature in Oracle Database 10g and higher that provides database administrators with a simple storage management interface that is consistent across all server and storage platforms. As a vertically integrated file system
and volume manager, purpose-built for Oracle database files, ASM provides the performance of async I/O with the easy management of a file system. ASM provides capability that saves DBAs time and provides flexibility to manage a dynamic database environment with increased efficiency.

Oracle Database 11g Release 2 extends ASM functionality to manage all data: Oracle database files, Oracle Clusterware files, and non-structured general-purpose data such as binaries, external files, and text files. ASM simplifies, automates, and reduces cost and overhead by providing a unified and integrated solution stack for all file management needs, eliminating the need for third-party volume managers, file systems, and clusterware platforms.

Symmetrix FAST complements the use of Oracle ASM, or other file systems and volume manager types for that matter. FAST relies on the availability of multiple storage tiers in the Symmetrix storage, and on LUN access skewing. As discussed earlier, LUN access skewing is very common in databases and often tends to simply show that the most recent data is accessed much more heavily than older data. The next few paragraphs will discuss approaches to LUN access skewing with ASM and other volume managers or file systems.

One of the many features of Oracle ASM is that it stripes all the data under ASM control evenly across all the LUNs in the ASM disk group. The effect on I/O distribution is that access to all the members of the ASM disk group is uniform. With the first release of FAST, the granularity is a full device (ASM member), and therefore the FAST Storage Group should always include all the devices of any ASM disk group that is under FAST control. So for an Oracle system using ASM, the goal is to move or swap an entire ASM disk group from one storage tier to another rather than individual members. Therefore, when planning a database for multiple storage tiers, ASM disk group creation should be designed accordingly. For example, if DBAs want only the data files, and not the redo logs, to be managed by FAST, they should place each file type in its own ASM disk group and allow FAST to manage just the devices of the +DATA ASM disk group. This approach creates skewing between ASM disk groups, based on the I/O profile of the file types included in each disk group. Another approach is to look at skewing between databases: by placing each database in its own ASM disk group, FAST can manage the performance and cost requirements of each database separately.
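Because device-level FAST must manage an ASM disk group in its entirety, a simple validation can catch partially covered disk groups before placing them under FAST control. The device and disk group names are hypothetical:

```python
def partially_covered(fast_group, asm_disk_groups):
    """Return ASM disk groups that are only partially included in the FAST
    Storage Group. With device-granular FAST, a controlled disk group must
    be included completely (all members) or not at all."""
    problems = []
    for name, members in asm_disk_groups.items():
        included = members & fast_group
        if included and included != members:
            problems.append(name)  # some members managed, some not -> uneven tiering
    return problems

asm = {"+DATA": {"dev1", "dev2", "dev3"}, "+LOG": {"dev4", "dev5"}}
print(partially_covered({"dev1", "dev2"}, asm))  # ['+DATA']
```

A partially covered disk group would end up striped across devices on different tiers in an uncontrolled way, defeating the purpose of the move.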

A similar approach can be taken with other third-party file systems and LVMs. For example, if tablespace data files are spread across multiple file systems, all the devices that pertain to these file systems should be included together in the FAST Storage Group. In that case it would be wise not to mix data files belonging to different databases, business units, or organizations on the same file systems, unless the intent is to apply the same storage tiering strategy to all of them.

With the release of Oracle Database 11gR2, Oracle introduced a feature called Intelligent Data Placement (IDP). IDP adds a template to ASM disk groups that allows the ASM extents belonging to specific Oracle files to be placed on the "hot" (outer) or "cold" (inner) locations of the ASM members (LUNs). The Symmetrix storage array never presents full physical drives to the host. Rather, each drive is carved into multiple logical devices, which are presented as LUNs to hosts. This allows greater flexibility, data protection, and sharing of resources, including data movement, virtualization, and replication. Therefore IDP is not effective with Symmetrix, since it cannot operate on the full physical drive. In addition, Symmetrix Enginuity code places the disk heads in the most optimal position based on data access patterns, not inner/outer drive locations.

Symmetrix devices can be concatenated to simulate full physical drives or RAID groups; however, keep in mind that current 10k rpm FC drives have reached 600 GB, and SATA drives have capacities of 1 and 2 TB. These are very large objects to operate on, and each time additional capacity is needed, multiples of such large units would have to be acquired and added, which is neither cost nor capacity efficient. In addition, for performance reasons it is better to present a host with multiple reasonably sized LUNs than with very few large ones: each LUN gets its own I/O queue on the host, which makes it easy to keep the storage devices busy with enough concurrent work. With a few very large LUNs, host queuing often needs to be tweaked to simulate this behavior, potentially causing performance problems. As a final note, since Flash drives have no moving parts, there is no notion of inner or outer drive location, and IDP does not apply to EFDs either.

The following section describes a scenario of migrating an I/O intensive ASM disk group from FC to EFDs using enhanced Virtual LUN.

Configuration for Virtual LUN migration

As depicted in Figure 75 on page 376, performance analysis of the ASM disk group "+Sales" identified that this disk group services mostly random I/O, and that EFDs are better suited for it. ASM operates on the paradigm of Stripe and Mirror Everything (SAME), and hence the data is widely striped across all the logical volumes that are part of the ASM disk group. Every time ASM devices are added or removed, ASM automatically rebalances the extents across the new set of devices. It is recommended to operate on an entire ASM disk group and migrate all of its LUNs to devices with similar I/O characteristics. In this case an entire ASM disk group containing 40 x 50 GB ASM devices (RAID 1) created on 40 x 300 GB FC drives will be migrated to logical volumes carved out of 4 x 400 GB EFDs (RAID 5).

Figure 75 Migration of ASM members from FC to EFDs using Enhanced Virtual LUN technology

The target devices for the migration can be chosen from configured space or new devices can be automatically configured by migrating to unconfigured space.

Migration to configured space

Migration to configured space requires preconfigured LUNs of equal or larger capacity and the desired RAID protection on the target storage, and a mapping of source to target LUNs. The target LUN will contain a complete copy of the application data on the source LUN at the end of the migration. The storage associated with
source LUNs will be released using iVTOC, an Enginuity operation that wipes logical volume information from the LUN at the end of the migration process. The entire process is transparent to the application, and no changes to the LUN's visibility or identity are made.

Example:

This example illustrates the steps to migrate the ASM volumes in the sales_dg device group to configured space on disk group 1. Prior to the migration, 40 x 50 GB RAID 5 (3+1) LUNs are configured on the 4 x 400 GB EFDs. The migration operation involves automatic selection of appropriate target devices from Symmetrix disk group 1.

The command line is:

symmigrate -name migrate_Session -g sales_dg -tgt_config -tgt_disk_grp 1 -tgt_raid_5 -tgt_prot 3+1 establish

Figure 76 depicts steps involved during the migration of Symmetrix LUN 790 that is a member of the “+SALES” ASM disk group to automatically selected target device 1FD7 on disk group 1.

Figure 76 Virtual LUN migration to configured space

Steps:

1. Migrating device 790 from a RAID 1 (FC) to RAID 5 (3+1) on EFD configured as 1FD7.

2. Configuration lock is taken.

3. The RAID 5 mirror of 1FD7 is made not_ready and attached to the source device, in one of the four available mirror positions, as the secondary mirror.

4. Configuration lock is released.

5. Secondary mirror is synchronized from the primary mirror.

6. Once synchronization is done, configuration lock is taken again.

7. Primary and secondary roles are switched and original primary mirror is detached from the source and moved to the target device 1FD7.

8. iVTOC is performed on 1FD7 for clearing the state of the original primary mirror and stopping further access to the original data on this mirror.

9. Configuration lock is released.
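The nine steps can be condensed into a rough sketch; the dictionaries model only the mirror-swap bookkeeping, not actual Enginuity internals:

```python
def migrate_configured(source, target):
    """Condensed mirror-swap flow for Virtual LUN migration to configured
    space. Locking, synchronization, and iVTOC are modeled as state changes."""
    source["secondary"] = target["mirror"]                    # steps 2-4: attach under lock
    source["secondary"]["data"] = source["primary"]["data"]   # step 5: synchronize
    # steps 6-9: swap roles, move the old primary to the target, iVTOC it
    old_primary = source["primary"]
    source["primary"] = source.pop("secondary")
    old_primary["data"] = None                                # iVTOC wipes the old copy
    target["mirror"] = old_primary

dev_790 = {"primary": {"raid": "RAID 1 (FC)", "data": "+SALES extents"}}
dev_1fd7 = {"mirror": {"raid": "RAID 5 (3+1) EFD", "data": None}}
migrate_configured(dev_790, dev_1fd7)
print(dev_790["primary"]["raid"], dev_790["primary"]["data"])  # RAID 5 (3+1) EFD +SALES extents
```

The key property the sketch preserves is that device 790 keeps its identity while its back-end protection and drive type change, and the old copy is wiped.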

Migration to unconfigured space

This type of migration only requires selection of the target storage and protection type; no LUNs have to be carved out of the target storage ahead of time. At the start of the migration process, Symmetrix Enginuity automatically creates the target LUNs with the required capacity and RAID protection in the unconfigured storage space and performs the data migration between source and target LUNs. At the end of the migration, the original source storage is added to the unconfigured capacity.

Example:

This example illustrates steps to migrate the ASM volumes in the sales_dg device group to unconfigured space on disk group 2. At the end of the migration, the target devices will be configured as RAID 5 (3+1) on 4 x 400 GB EFDs.

The command line is:

symmigrate -name migrate_Session -g sales_dg -tgt_unconfig -tgt_disk_grp 2 -tgt_raid_5 -tgt_prot 3+1 establish

Figure 77 on page 379 depicts steps involved during the migration of Symmetrix LUN 790 that is a member of the “+SALES” ASM disk group to automatically configured RAID 5 (3+1) target device on EFD disk group 2.

Figure 77 Virtual LUN migration to unconfigured space

Steps:

1. Migrating device 790 from RAID 1 (FC) to RAID 5 (EFD).

2. Configuration lock is taken.

3. The RAID 5 mirror is created from unconfigured space and added as the secondary mirror.

4. Configuration lock is released.

5. The secondary mirror is synchronized from the primary mirror.

6. Once synchronization is done, the configuration lock is taken again.

7. Primary and secondary roles are switched, and the original primary mirror is detached from the source device.

8. The original primary mirror on RAID 1 (FC) is deleted.

9. Configuration lock is released.

Symmetrix Virtual LUN VP mobility technology

Introduced in Enginuity 5875, EMC Symmetrix VMAX VLUN VP enables transparent, nondisruptive data mobility of thin devices between storage tiers or RAID protections. Virtual LUN VP Mobility (VLUN VP) benefits and usage are almost identical to those of VLUN, except that while VLUN operates on "thick" devices, VLUN VP operates only on thin devices and migrates only the allocated extents of a thin device, to a single target thin pool. As a result, at the end of the migration the thin device shares the storage tier and RAID protection of the target thin pool.

Note that when using VLUN VP on devices under FAST VP control, it is recommended to pin the thin devices to the target thin pool so that FAST VP does not move them to other tiers until the user is ready. When thin devices under FAST VP control are pinned to a thin pool, FAST VP continues to collect their statistics, but it will not issue move plans for them.

VLUN VP enables customers to move Symmetrix thin devices without disrupting user applications and with minimal impact to host I/O. Users may move thin devices between thin pools to:

◆ Change the drive media on which the thin devices are stored

◆ Change the thin device RAID protection level

◆ Move a thin device that was managed by FAST VP (and may be spread across multiple tiers, or thin pools) to a single thin pool

While VLUN VP has the ability to move all allocated thin device extents from one pool to another, it also has the ability to move specific thin device extents from one pool to another, and it is this feature that is the basis for FAST VP.


Fully Automated Storage Tiering for Virtual Pools

This section describes the Fully Automated Storage Tiering for Virtual Pools (FAST VP) configuration and illustrates the process of sub-LUN level migration of Oracle data on Virtual Provisioned storage (thin devices) to different storage tiers to meet business requirements using FAST VP.

FAST VP and Virtual Provisioning

FAST VP is based on Virtual Provisioning technology. Virtual Provisioning, as explained earlier, allows the creation and use of virtual devices (commonly referred to as thin devices) that are host-addressable, cache-only, pointer-based devices. Once the host starts using the thin devices, their data is allocated in commonly shared pools called thin pools. A thin pool is simply a collection of Symmetrix regular devices of the same drive technology and RAID protection (for example, 50 x 100 GB RAID 5 protected 15k rpm FC devices can be grouped into a thin pool called FC15k_RAID5). Because the thin pool devices store the pointer-based thin devices' data, they are also referred to as data devices. Data in the thin pool is always striped, taking advantage of all the physical drives behind the thin pool data devices. This allows both improved performance and ease of deployment and storage provisioning. In addition, as data devices are added to or removed from the thin pool, their data is rebalanced (restriped) seamlessly as well. In short, Virtual Provisioning has many deployment advantages in addition to being the base technology for FAST VP.

One can start understanding how FAST VP benefits from this structure. Since the thin device is pointer-based, and its actual data is stored in thin pools based on distinct drive type technology, when FAST VP moves data between storage tiers it simply migrates the data between the different thin pools and updates the thin device pointers accordingly. To the host, the migration is seamless as the thin device maintains the exact same LUN identity. At the Symmetrix storage, however, the data is migrated between thin pools without any application downtime.
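The wide striping of thin pool data described above can be sketched as a simple round-robin placement model. The device names are hypothetical and the round-robin policy is a simplification of the actual allocation algorithm:

```python
from itertools import cycle

def allocate_extents(n_extents, data_devices):
    """Toy model of thin pool wide striping: thin device extents are
    spread round-robin across the pool's data devices."""
    placement = {dev: 0 for dev in data_devices}
    rr = cycle(data_devices)
    for _ in range(n_extents):
        placement[next(rr)] += 1
    return placement

# 10 extents across a three-data-device pool: a near-even spread,
# which is why all physical drives behind the pool share the workload.
print(allocate_extents(10, ["data_dev_A", "data_dev_B", "data_dev_C"]))
```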


FAST VP elements

FAST VP has three main elements: storage tiers, storage groups, and FAST policies, as shown in Figure 78.

Figure 78 FAST managed objects

◆ Storage tiers are combinations of drive technology and RAID protection available in the VMAX array. Examples of storage tiers are RAID 5 EFD, RAID 1 FC, and RAID 6 SATA. Since FAST VP is based on Virtual Provisioning, the storage tiers for FAST VP contain one to three thin pools of the same drive type and RAID protection.

◆ Storage groups are collections of Symmetrix host-addressable devices. For example, all the devices provided to an Oracle database can be grouped into a storage group. While a storage group can contain both thin and thick devices, FAST VP will operate only on the thin devices in a given storage group.

◆ A FAST VP policy combines storage groups with storage tiers, and defines the configured capacity, as a percentage, that a storage group is allowed to consume on each tier. For example, a FAST VP policy can define 10 percent of its allocation to be placed on EFD_RAID 5, 40 percent on FC15k_RAID 1, and 50 percent on SATA_RAID 6, as shown in Figure 79 on page 383. Note that these allocations are the maximum allowed. For example, a policy of 100 percent on each of the storage tiers means that FAST VP has liberty to place up to 100 percent of the storage group data on any of the tiers. When combined, the policy must total at least 100 percent, but may be greater than 100 percent, as shown in Figure 82 on page 393. In addition, the FAST VP policy defines exact time windows for performance analysis, data movement, data relocation rate, and other related settings.
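The per-tier capacity limits a policy implies can be computed directly. A minimal sketch, using the 10/40/50 percentages from the text on a hypothetical 1,000 GB storage group (the tier names are illustrative):

```python
def tier_limits(sg_capacity_gb, policy):
    """Maximum capacity (GB) a storage group may consume on each tier.

    policy maps tier name -> allowed percentage. The percentages must
    total at least 100 so the group can always be fully placed.
    """
    assert sum(policy.values()) >= 100, "policy must total at least 100%"
    return {tier: sg_capacity_gb * pct / 100 for tier, pct in policy.items()}

limits = tier_limits(1000, {"EFD_RAID5": 10, "FC15k_RAID1": 40, "SATA_RAID6": 50})
print(limits)  # {'EFD_RAID5': 100.0, 'FC15k_RAID1': 400.0, 'SATA_RAID6': 500.0}
```

A policy totaling more than 100 percent simply means the limits overlap, giving FAST VP more freedom in where to place the data.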



FAST VP operates in the storage array based on the policy allocation limits for each tier ("Compliance") and in response to the application workload ("Performance"). During the Performance Time Window, FAST VP gathers performance statistics for the controlled storage groups. During the Move Time Window, FAST VP then creates move plans (every 10 minutes) that accommodate any necessary changes driven by performance or compliance. FAST VP therefore operates in reaction to changes in workload or capacity, in accordance with the policy.

Figure 79 FAST policy association

FAST VP time window considerations

There is no single Performance Time Window recommendation that is generically applicable to all customer environments. Each site needs to make the decision based on its particular requirements and SLAs. Collecting statistics 24x7 is the simplest and most comprehensive approach; however, overnight and daytime I/O profiles may differ greatly, and evening performance may not be as important as daytime performance. This difference can be addressed by simply setting the collection policy to be active only during the daytime, from 7 a.m. to 7 p.m., Monday to Friday. This policy is best suited for applications that have consistent I/O loads during traditional business hours. Another approach is to collect statistics only during peak times on specific days. This is most beneficial to customers whose I/O profile has very specific busy periods, such as the a.m. hours of Mondays. By selecting only the peak hours for statistics collection, the site can ensure that the data that is most active during peak periods gets the highest priority to move to a high-performance tier. The default Performance Time Window is 24x7, but it can be easily changed using the CLI or SMC.
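The daytime collection policy described above amounts to a simple membership test on the clock. A sketch of that check (the function name is ours; the 7 a.m. to 7 p.m., Monday to Friday defaults come from the example in the text):

```python
from datetime import datetime

def in_performance_window(ts: datetime, start_hour=7, end_hour=19,
                          weekdays=range(0, 5)) -> bool:
    """True if statistics should be collected at time ts.

    Defaults model the 7 a.m.-7 p.m., Monday-to-Friday policy
    (Monday == 0 in Python's datetime convention).
    """
    return ts.weekday() in weekdays and start_hour <= ts.hour < end_hour

print(in_performance_window(datetime(2011, 5, 2, 10, 30)))  # a Monday morning -> True
print(in_performance_window(datetime(2011, 5, 1, 10, 30)))  # a Sunday -> False
```

A peak-hours-only policy is the same idea with a narrower `weekdays`/hour range.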

FAST VP move time window considerations

Choosing a FAST VP Move Time Window allows a site to decide how quickly FAST VP responds to changes in the workload. Allowing FAST VP to move data at any time of the day lets it adapt quickly to changing I/O profiles, but may add activity to the Symmetrix back end during peak times. Alternatively, the FAST VP Move Time Window can be set to specific lower-activity hours to prevent FAST activity from interfering with online activity. One such case is when FAST is initially implemented on the array, when the amount of data being moved can be substantial. In either case FAST VP attempts to make the move operations as efficient as possible by moving only allocated extents, and with sub-LUN granularity the move operations are focused on just the datasets that need to be promoted or demoted.

The FAST VP Relocation Rate (FRR) is a quality-of-service setting for FAST VP and affects the “aggressiveness” of data movement requests generated by FAST VP. FRR can be set between 1 and 10, with 1 being the most aggressive, to allow the FAST VP migrations to complete as fast as possible, and 10 being the least aggressive. With the release of FAST VP and Enginuity 5875, the default FRR is set to 5 and can be easily changed dynamically. An FRR of 6 was chosen for the use cases described later in the chapter.

FAST VP architecture

There are two components of FAST VP: the Symmetrix microcode and the FAST controller.


The Symmetrix microcode is a part of the Enginuity storage operating environment that controls components within the array. The FAST controller is a service that runs on the Symmetrix service processor.

Figure 80 FAST VP components

When FAST VP is active, both components participate in the execution of two algorithms to determine appropriate data placement:

◆ Intelligent tiering algorithm

The intelligent tiering algorithm uses performance data collected by the microcode, as well as supporting calculations performed by the FAST controller, to issue data movement requests to the VLUN VP data movement engine.

◆ Allocation compliance

The allocation compliance algorithm enforces the upper limits of storage capacity that can be used in each tier by a given storage group by also issuing data movement requests to the VLUN VP data movement engine.


Data movements performed by the microcode are achieved by moving allocated extents between tiers. The size of data movement can be as small as 768 KB, representing a single allocated thin device extent, but will more typically be an entire extent group, which is 10 thin device extents, or 7.5 MB.
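The two movement granularities quoted above are related by a factor of ten, which is worth checking in units:

```python
# Movement granularities from the text, in KB.
extent_kb = 768                  # one allocated thin device extent
extents_per_group = 10           # a typical move is a whole extent group
group_kb = extent_kb * extents_per_group

print(group_kb)                  # 7680 KB
print(group_kb / 1024)           # 7.5 MB, matching the figure in the text
```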

FAST VP has two modes of operation, Automatic or Off. When operating in Automatic mode, data analysis and data movements will occur continuously during the defined windows. In Off mode, performance statistics will continue to be collected, but no data analysis or data movements will take place.

FAST VP and Oracle databases

FAST VP integrates very well with Oracle databases. As explained earlier, applications tend to drive most of the workload to a subset of the database, and very often just a small subset of the whole database. That subset is a candidate for performance improvement and therefore uptiering by FAST VP. Other database subsets can either remain where they are or be down-tiered if they are mostly idle (for example, unused space or historic data maintained due to regulations). Oracle ASM natively stripes data across its members, spreading the workload across all storage devices in the ASM disk group. From the host it may look as if all the LUNs are very active, but in fact, in almost all cases, just a small portion of each LUN is very active. Figure 81 on page 387 shows an example of I/O read activity, as experienced by the Symmetrix storage array, on a set of 15 ASM devices (X-axis) relative to the location on the devices (Y-axis). The color reflects the I/O activity at each logical block address (LBA) on the LUN, where blue indicates low activity and red high. It is easy to see in this example that while ASM stripes the data and spreads the workload evenly across the devices, not all areas on each LUN are "hot." FAST VP can focus on the hot areas alone and uptier them. It can also down-tier the idle areas (or leave them in place, based on the policy allocations). The result is improved performance, cost, and storage efficiency.

Even if ASM is not in use, other volume managers tend to stripe the data across multiple devices and will therefore benefit from FAST VP in a similar way. When file systems alone are used we can look at a sub-LUN skewing inside the file system rather than a set of devices.


The file system will traditionally host multiple data files, each containing database objects in which some will tend to be more active than others as discussed earlier, creating I/O access skewing at a sub-LUN level.

Figure 81 “Heat” map of ASM member devices showing sub-LUN skewing
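A heat map like the one in Figure 81 can be approximated from an I/O trace by bucketing read accesses into LBA regions. A minimal sketch, with a hypothetical synthetic trace standing in for real array statistics:

```python
from collections import Counter

def heat_map(io_lbas, lun_size_lbas, buckets=10):
    """Bucket read accesses by LBA region to expose sub-LUN skew.

    io_lbas: iterable of LBAs touched by reads on one LUN.
    Returns access counts, one per equal-sized LBA region.
    """
    counts = Counter(min(lba * buckets // lun_size_lbas, buckets - 1)
                     for lba in io_lbas)
    return [counts.get(i, 0) for i in range(buckets)]

# A skewed synthetic trace: 90 percent of reads hit the first tenth of
# the LUN, mimicking the "hot band" visible on each device in Figure 81.
trace = [5] * 90 + [500] * 7 + [900] * 3
print(heat_map(trace, lun_size_lbas=1000))  # -> [90, 0, 0, 0, 0, 7, 0, 0, 0, 3]
```

The heavily accessed buckets correspond to the extents FAST VP would uptier; near-zero buckets are down-tier candidates.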

At the same time there are certain considerations that need to be understood in relationship to FAST VP and planned for. One of them is instantaneous changes in workload characteristics and the other is changes in data placement initiated by the host such as ASM rebalance.

Instantaneous changes in workload characteristics

Instantaneous changes in workload characteristics, such as quarter-end or year-end reports, may put a heavy workload on portions of the database that are not accessed daily and may have been migrated to a lower-performance tier. Symmetrix is optimized to take advantage of a very large cache (up to 1 TB raw) and has efficient algorithms to prefetch data and optimize disk I/O access. Therefore, Symmetrix VMAX handles most workload changes effectively and no action needs to be taken by the user. On the other hand, the user can also assist by modifying the FAST VP policy ahead of such activity when it is known and expected, and by changing the Symmetrix priority controls and cache partitioning quotas if used.


Since such events are usually short term and only touch each dataset once it is unlikely (and not desirable) for FAST VP to migrate data at that same time and it is best to simply let the storage handle the workload appropriately. If the event is expected to last a longer period of time (such as hours or days), then FAST VP, being a reactive mechanism, will actively optimize the storage allocation as it does natively.

Changes in data placement initiated by the host (such as ASM rebalance)

Changes in data placement initiated by the host can be due to file system defragmentation, volume manager restriping, or simply a user moving database objects. When Oracle ASM is used, the data is automatically striped across the disk group. Certain operations cause ASM to restripe (rebalance) the data, effectively moving existing allocated ASM extents to a new location, which may cause the storage tiering optimized by FAST VP to temporarily degrade until FAST VP re-optimizes the database layout. ASM rebalance commonly takes place when devices are added to or dropped from the ASM disk group. These operations are normally known in advance (although not always) and take place during maintenance or low-activity times. Typically, new thin devices given to the database (and ASM) will be bound to a medium- or high-performance storage tier, such as FC or EFD. Therefore, when such devices are added, ASM will rebalance extents onto them, and it is unlikely that database performance will degrade much afterward (since the extents are already on a relatively fast storage tier). If such activity takes place during low-activity or maintenance time, it may be beneficial to disable FAST VP movement until it is complete and then let FAST VP initiate a move plan based on the new layout. FAST VP will respond to the changes and re-optimize the data layout. Of course, it is important that any new devices added to ASM also be added to the FAST VP controlled storage groups so that FAST VP can operate on them together with the rest of the database devices.

Which Oracle objects to place under FAST VP control

Very often storage technology is managed by a different group from the database management team, and coordination is based on need. In these cases, when devices are provisioned to the database they can be placed under FAST VP control by the storage team without clear knowledge of how the database team will use them. Since FAST VP analyzes the actual I/O workload based on the FAST policy, it will actively optimize the storage tiering of all controlled devices.


However, when more coordination takes place between the database and storage administrators it might be best to focus the FAST VP optimization on database data files, and leave other database objects such as logs and temp space outside of FAST VP control. The reason is that redo logs, archive logs, and temp space devices experience sequential read and write activity. All writes in Symmetrix go to cache and are acknowledged immediately to the host (regardless of storage tier). For sequential reads, the different disk technologies at the storage array will have minimal impact due to I/O prefetch and reduced disk head movement (in contrast to random read activity).

FAST VP algorithms place higher emphasis on improving random read I/O activity, although they also take into consideration write and sequential read activity. Placing only data files under FAST VP control reduces potential competition for the EFD tier from database objects that may have a high I/O load but are less deserving of that tier's limited capacity. However, as mentioned earlier, when all database devices are under FAST VP control, such objects may uptier, but with a lower priority than objects with random read activity (such as data files with a typical I/O profile).

A different use case for FAST VP could be to optimize the storage tiering of sequential read/write devices (such as temp files and archive logs) in a separate storage group, with a FAST VP policy that includes only the SATA and FC tiers. The goal is again to eliminate competition for EFD, while allowing dynamic cost/performance optimization of archive logs and temp files between the SATA and FC tiers (redo logs are best served by the FC tier in almost all cases).

OLTP or DSS workloads and FAST VP

As explained in the previous section, FAST VP places higher emphasis on uptiering a random read workload, although it will try to improve the performance of other devices with high I/O activity, such as sequential reads and writes. For that reason the active dataset of OLTP applications has a higher priority to be uptiered by FAST VP than that of DSS. However, DSS applications can benefit from FAST VP as well. First, data warehouse/BI systems often have large indexes that generate random read activity; these indexes can benefit greatly from being uptiered to EFD. Master Data Management (MDM) tables are another example of objects that can benefit greatly from the EFD tier. FAST VP also downtiers inactive data. This is especially important in DSS databases, which tend to be very large. FAST VP can reduce costs by downtiering aged data and partitions while keeping the active dataset in faster tiers. FAST VP does the storage tiering automatically, without the need to continuously perform complex ILM actions at the database or application tiers.

Examples of FAST VP for Oracle databases

This section covers examples of using Oracle Database 11g with FAST VP. The three use cases are:

1. FAST VP optimization of a single Oracle database OLTP workload: This use case demonstrates the basic work of FAST VP and how it optimizes the storage allocation of a single Oracle database from the initial FC tier to all three tiers—SATA, FC, and EFD.

2. FAST VP optimization of two databases sharing an ASM disk group: This use case demonstrates FAST VP optimization when multiple Oracle databases with different workloads are sharing the same ASM disk groups, storage devices, and FAST VP policy.

3. FAST VP optimization of two databases with separate ASM disk groups: This use case demonstrates FAST VP optimization when each database requires its own FAST VP policy for better isolation and control of resources.

Test environment

This section describes the hardware, software, and database configuration used for the Oracle databases and FAST VP test cases, as shown in Table 14.

Table 14 FAST VP Oracle test environment (page 1 of 2)

Configuration aspect         Description
Storage array                Symmetrix VMAX, Enginuity 5875
Oracle CRS and database      Version 11gR2
EFD                          8 x 400 GB drives
FC                           40 x 300 GB 15k rpm FC drives
SATA                         32 x 1 TB 7,200 rpm SATA drives


Test Case 1: FAST VP optimization of a single Oracle database OLTP workload

This section shows an example of the benefits of FAST VP storage tiering optimization for a single Oracle ASM-based database executing an OLTP workload. It highlights the changes in tier allocation and performance between the beginning and the end of the run. The +DATA ASM disk group resides on the FC tier at the start, and FAST VP migrates idle portions to SATA and highly active portions to EFD. At the end of the run we can see improved transaction rates and response times and very efficient usage of the three tiers.

The test configuration had two Oracle databases, FINDB (Financial) and HRDB (Human Resources), sharing ASM disk groups and therefore also a Virtual Provisioning storage group and FAST VP policy, as shown in Table 15.

One server was used for this test. Each of the Oracle databases was identical in size (about 600 GB) and designed for an industry-standard OLTP workload. However, during this test one database had high activity whereas the other database remained idle to provide a simple example of the behavior of FAST VP.

Table 14 FAST VP Oracle test environment (page 2 of 2)

Configuration aspect         Description
Linux                        Oracle Enterprise Linux 5.3
Multipathing                 EMC PowerPath 5.3 SP1
Host                         Dell R900

Table 15 Initial tier allocation for test cases with shared ASM disk group

Databases      ASM disk group  Thin devices  Storage group  Thin pool   RAID    Tier associated  Initial tier allocation
FINDB & HRDB   +DATA           12 x 100 GB   DATA_SG        FC_Pool     RAID 5  FC               100%
                                                            EFD_Pool            EFD              0%
                                                            SATA_Pool           SATA             0%
               +REDO                                        FC_Pool     RAID 1  FC               10%


Note that since an industry-standard benchmark tool was used, the I/O distribution across the database was completely even and random. This reduced sub-LUN skewing (since the whole database was highly active), and therefore the second, idle database helped simulate a more normal environment in which some objects are not highly accessed. Real customer databases are very likely to demonstrate much better locality of reference (recent data is accessed more heavily, or there is a mix of hot and cold database objects), providing FAST VP with better sub-LUN skewing to work with. With improved locality of reference (sub-LUN skewing), a smaller EFD capacity can contain the hot database objects, and the policy can therefore be set to a smaller EFD tier allocation percentage than shown in this example.

Test case execution

Objectives

Achieve a single Oracle ASM database workload storage tiering optimization by FAST VP.

Steps

1. Run a baseline workload prior to the FAST VP-enabled run

2. Run the workload with FAST VP enabled, allowing storage allocation on all three tiers

3. Review the storage tiering efficiency and performance differences

Monitoring database and storage performance

During the baseline run the database devices were 100 percent allocated on the FC tier as shown in Table 16 on page 393. Per the AWR report given in Table 17 on page 393, user I/O random read activity (“db file sequential read”) is the main database wait event, with an average I/O response time of 6 ms. For FC drives this is a good response time that reflects a combination of 15k rpm drives (typically 6 ms response time at best per I/O, regardless of storage vendor) with efficient Symmetrix cache utilization.


Defining the FAST policy

Although a 6 ms response time is very good for an FC tier with a heavy I/O workload, a FAST VP "Gold" policy was set to improve performance for this critical database as well as to tier it across the SATA, FC, and EFD thin pools. As shown in Figure 82, which is part of a Symmetrix Management Console (SMC) screen, the Gold policy allowed a maximum 40 percent allocation on the EFD tier and 50 percent allocations on each of the FC and SATA tiers.

Figure 82 Gold FAST VP policy storage group association

Table 16 FINDB initial tier allocation

ASM disk group  Database size    Storage group  Initial storage tier allocation
+DATA           FINDB (600 GB)   DATA_SG        EFD    0%    0
                HRDB (600 GB)                   FC     100%  1.2 TB
                                                SATA   0%    0

Table 17 Initial AWR report for FINDB

Event                    Waits      Time (s)  Avg wait (ms)  %DB time  Wait class
db file sequential read  3,730,770  12,490    6              88.44     User I/O
db file parallel read    85,450     1,249     14             6.74      User I/O
DB CPU                              674                      4.79
log file sync            193,448    108       1              0.56      Commit
db file scattered read   3,241      20        11             0.22      User I/O


Running the database workload after enabling the FAST VP policy

The database workload was restarted after enabling the FAST VP policy. FAST VP collected statistics, analyzed them, and performed the extent movements following the performance and compliance algorithms.

As can be seen in Figure 83, the tier allocation changed rapidly: where the FC tier was 100 percent used at the beginning of the run, by the end of the run the ASM disk group was using 35 percent of the EFD tier and the rest of the disk group was spread across the FC and SATA tiers. As the entire +DATA ASM disk group was associated with a FAST VP policy, and FINDB and HRDB were sharing the same ASM disk group, the majority of active extents of FINDB moved to the EFD tier whereas inactive extents of HRDB moved to the SATA tier. Extents that were moderately active remained on the FC storage tier. At the end of the run the ASM disk group was spread across all three storage tiers based on the workload and the FAST VP policy.

The storage tier allocations initially and after FAST VP was enabled are shown in Table 18 on page 395. The Solutions Enabler command lines for enabling FAST VP operations and monitoring tier allocations are given in Appendix E, “Solutions Enabler Command Line Interface (CLI) for FAST VP Operations and Monitoring.”

Figure 83 Storage tier allocation changes during the FAST VP test for FINDB

[Chart: tier capacity used (GB) per measurement interval for the EFD, FC, and SATA tiers during the FAST VP enabled run.]


Analyzing the performance improvements with FAST VP

As can be seen in Table 19, the average I/O response time at the end of the run improved to 3 ms, a considerable improvement over the initial test that utilized the FC tier for the entire ASM disk group. This is the result of the migration of active extents of the ASM disk group to the EFD tier and the allocation of 35 percent of capacity on that tier.

The response time improvement and the utilization of all available storage tiers (EFD, FC, and SATA) to store ASM disk group extents also resulted in a considerable improvement in FINDB transaction rates, as shown in Figure 84. The initial database transaction rate (transactions per minute) for FINDB, with the entire ASM disk group on the FC tier, was 2,079; after FAST VP initiated movements, a transaction rate of 3,760 was achieved, an improvement of 81 percent, while utilizing all available storage tiers more effectively and efficiently.
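The 81 percent figure follows directly from the two transaction rates quoted in the text:

```python
# Transaction rates (transactions per minute) before and after the
# FAST VP initiated movements, from the test results above.
initial_tpm, fast_vp_tpm = 2079, 3760
improvement_pct = (fast_vp_tpm - initial_tpm) / initial_tpm * 100

print(round(improvement_pct))  # -> 81, matching the 81 percent quoted
```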

Table 18 Oracle database tier allocations: initial and FAST VP enabled

ASM disk group  Database size    Tier  Initial        FAST VP enabled
+DATA (1.2 TB)  FINDB (600 GB)   EFD   0%    0        35%   626 GB
                HRDB (600 GB)    FC    100%  1.2 TB   50%   941 GB
                                 SATA  0%    0        12%   204 GB

Table 19 FAST VP enabled database response time from the AWR report

Event                    Waits      Time (s)  Avg wait (ms)  %DB time  Wait class
db file sequential read  3,730,770  12,490    3              86.84     User I/O
db file parallel read    85,450     1,249     14             8.68      User I/O
DB CPU                              674                      4.69
log file sync            193,448    108       1              0.75      Commit
db file scattered read   3,241      20        11             0.14      User I/O


Figure 84 Database transaction changes with FAST VP

Test Case 2: Oracle databases sharing the ASM disk group and FAST policy

Oracle ASM makes it easy to provision and share devices across multiple databases. The databases, running different workloads, can share the ASM disk group for ease of manageability and provisioning. Multiple databases can likewise share the Symmetrix thin pools for ease of provisioning, wide striping, and manageability at the storage level. This section describes a test case in which a FAST VP policy is applied to the storage group associated with the shared ASM disk group. At the end of the run we can see improved transaction rates and response times for both databases, and very efficient usage of the available tiers.



Test case execution

Objectives

Achieve storage tiering optimization for multiple databases sharing the ASM disk group using FAST VP.

Steps

1. Run performance baselines while both databases use the FC tier alone (prior to the FAST VP enabled run)

2. Run the workload again on both databases with FAST VP enabled, allowing storage allocation on all three tiers

3. Review the storage tiering efficiency and performance differences

Monitoring database and storage performance

During the baseline run the database devices were 100 percent allocated on the FC tier, as shown in Table 20. Both databases executed an OLTP-type workload (similar to the previous use case), but FINDB had more processes executing the workload than HRDB and therefore a higher workload profile.

Table 20 FINDB and HRDB initial storage tier allocation

ASM disk group  Database size    Initial storage tier allocation
+DATA (1.2 TB)  FINDB (600 GB)   EFD    0%    0
                HRDB (600 GB)    FC     100%  1.2 TB
                                 SATA   0%    0

Table 21 Initial AWR report for FINDB

Event                    Waits      Time (s)  Avg wait (ms)  %DB time  Wait class
db file sequential read  3,730,770  12,490    6              88.44     User I/O
db file parallel read    85,450     1,249     14             6.74      User I/O
DB CPU                              674                      4.79
log file sync            193,448    108       1              0.56      Commit
db file scattered read   3,241      20        11             0.22      User I/O

Fully Automated Storage Tiering for Virtual Pools 397


Defining the FAST policy

As the ASM disk group and Symmetrix storage groups are identical to those used in Test Case 1, the same FAST policy is used for this use case.

Running the database workload after enabling the FAST VP policy

At the start of the test FAST VP was enabled and workloads on both databases started, with FINDB running a higher workload than HRDB. After an initial analysis period (2 hours by default), FAST performed the movement to the available tiers.

Analyzing the performance improvements with FAST VP

Active extents from both databases were distributed to the EFD and FC tiers, with the majority of active extents on EFDs, while inactive extents migrated to the SATA tier. Figure 85 shows the performance improvements for both databases resulting from FAST VP controlled tier allocation.

Figure 85 Storage tier changes during FAST VP enabled run on two databases

The database transaction rate changes before and after FAST-based movements are shown in Table 22 on page 399. Both databases exhibited higher performance; FINDB, which was more active, achieved the higher gain as more of its extents were migrated to EFDs.

[Figure 85 chart: transactions per minute over the 14-hour FAST VP enabled run, all on FC at the start and FAST enabled thereafter; series: FINDB high workload, HRDB low workload]


Test Case 3: Oracle databases on separate ASM disk groups and FAST policies

Not all databases have the same I/O profile or SLA requirements, and they may also warrant different data protection policies. By deploying databases with different profiles on separate ASM disk groups, administrators can achieve the desired I/O performance and ease of manageability. On the storage side these ASM disk groups are placed in separate storage groups to allow the definition of FAST VP policies appropriate to the desired performance. This section describes a use case with two Oracle databases with different I/O profiles on separate ASM disk groups and independent FAST policies.

The hardware configuration of this test was the same as the previous two use cases (as shown in Table 1). This test configuration had two Oracle databases—CRMDB (CRM) and SUPCHDB (Supply Chain)—on separate ASM disk groups, storage groups, and FAST VP policies, as shown in Table 23.

Table 22 FAST VP enabled database transaction rate changes

Database   Initial transaction rate   FAST VP enabled transaction rate   % Improvement

FINDB      1144                       2497                               118%

HRDB       652                        1222                               87%
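The % Improvement column in Table 22 can be recomputed directly from the two transaction rates; a minimal Python sketch (values taken from the table):

```python
def pct_improvement(initial_tpm, enabled_tpm):
    """Percent change in transactions per minute, rounded to a whole percent."""
    return round((enabled_tpm - initial_tpm) / initial_tpm * 100)

# Transaction rates from Table 22: (initial, FAST VP enabled)
rates = {"FINDB": (1144, 2497), "HRDB": (652, 1222)}
for db, (initial, enabled) in rates.items():
    print(f"{db}: {pct_improvement(initial, enabled)}%")
# FINDB: 118%, HRDB: 87%, matching the table
```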

Table 23 Initial tier allocation for a test case with independent ASM disk groups

Database   ASM disk group   Thin devices   Storage group   Thin pools associated          RAID     Initial tier allocation

CRMDB      +DATA            6 x 100 GB     OraDevices_C1   FC_Pool, EFD_Pool, SATA_Pool   RAID 5   FC 100%, EFD 0%, SATA 0%
           +REDO            2 x 6 GB       OraRedo         REDO_Pool                      RAID 1   FC 100%

SUPCHDB    +DATA            6 x 100 GB     OraDevices_S1   FC_Pool, EFD_Pool, SATA_Pool   RAID 5   FC 100%, EFD 0%, SATA 0%
           +REDO            2 x 6 GB       OraRedo         REDO_Pool                      RAID 1   FC 100%


The Symmetrix VMAX array had a mix of storage tiers–EFD, FC, and SATA. One server was used for this test. Each of the Oracle databases was identical in size (about 600 GB) and designed for an industry-standard OLTP workload.

The Oracle databases CRMDB and SUPCHDB used independent ASM disk groups based on thin devices that were initially bound to FC_Pool (FC tier).

The CRMDB database in this configuration was part of a customer relationship management system that was critical to the business. To achieve higher performance, the FAST VP policy "GoldPolicy" was defined to make use of all three available storage tiers, and storage group OraDevices_C1 was associated with the policy.

The SUPCHDB database was important to the business and already had acceptable performance characteristics; the business would benefit if that performance level could be maintained at lower cost. To meet this goal, the FAST VP policy "SilverPolicy" was defined to use only the FC and SATA tiers, and storage group OraDevices_S1 was associated with the policy.

Test case execution

Objectives

Achieve storage tiering optimization while maintaining isolation of resources that each database is allowed to use.

Steps

1. Run a baseline workload (prior to the FAST VP-enabled run)

2. Define two separate FAST policies–Gold policy and Silver policy– and associate them with the appropriate storage groups

3. Run the workloads again with FAST VP enabled, allowing storage allocation based on the distinct FAST VP policies

4. Review the storage tiering efficiency and performance differences


Monitoring database and storage performance

Table 24 shows the baseline performance of both databases based on the initial FC tier allocation. Both databases were getting a response time of 8 ms. Our goal was to improve it for CRMDB and to maintain it for SUPCHDB at lower cost.

Defining the FAST policy

For CRMDB, our goal was to improve performance. For FC-based configurations, a response time of 8 ms is reasonable, but it can improve with better storage tiering. The FAST VP Gold policy was defined both to improve performance for this critical database and to tier it across the EFD, FC, and SATA thin pools. The Gold policy allowed a maximum 40 percent allocation on the EFD tier and 100 percent allocation on each of the FC and SATA tiers. By setting FC and SATA allocations to 100 percent in this policy, FAST VP has the liberty to leave up to 100 percent of the data on either of these tiers or move up to 40 percent of it to EFD, based on the actual workload.
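The cap semantics of such a policy can be modeled in a few lines of Python; this is an illustrative model only (the policy dictionary and tier labels are ours, not Solutions Enabler syntax): each tier carries a maximum percentage of the Storage Group's capacity, and an allocation is compliant when no tier exceeds its cap.

```python
# Gold policy caps from the text: EFD at most 40%, FC and SATA up to 100%
GOLD_POLICY = {"EFD": 40, "FC": 100, "SATA": 100}

def is_compliant(policy, allocation_pct):
    """allocation_pct: actual percent of the storage group on each tier (sums to 100)."""
    return all(allocation_pct.get(tier, 0) <= cap for tier, cap in policy.items())

# FAST VP may leave everything on FC ...
assert is_compliant(GOLD_POLICY, {"EFD": 0, "FC": 100, "SATA": 0})
# ... or promote up to 40 percent to EFD, but no more than that
assert is_compliant(GOLD_POLICY, {"EFD": 40, "FC": 50, "SATA": 10})
assert not is_compliant(GOLD_POLICY, {"EFD": 55, "FC": 45, "SATA": 0})
```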

Table 24 Initial AWR report for CRMDB and SUPCHDB

CRMDB

Event Waits Time (s) Avg wait (ms) %DB time Wait class

db file sequential read 13,566,056 104,183 8 92.87 User I/O

db file parallel read 300,053 10,738 19 6.07 User I/O

DB CPU 4,338 2.45

log file sync 1,635,001 1,157 1 0.65 Commit

db file scattered read 33,212 285 9 0.16 User I/O

SUPCHDB

Event Waits Time (s) Avg wait (ms) %DB time Wait class

db file sequential read 8,924,638 104,183 8 92.87 User I/O

db file parallel read 194,525 10,738 19 6.07 User I/O

DB CPU 4,338 2.45

log file sync 746,897 1,157 1 0.65 Commit

db file scattered read 17,860 285 9 0.16 User I/O


For SUPCHDB, our goal was to lower cost while maintaining or improving performance. The FAST VP Silver policy was defined to allocate extents across FC and SATA drives to achieve this goal. The Silver policy allows a maximum of 50 percent allocation on the FC tier and up to 100 percent allocation on the SATA tier.

Running the database workload after enabling the FAST VP policy

The database workload was repeated after enabling the FAST VP policy. FAST VP collected statistics, analyzed them, and performed the extent movements following the performance and compliance algorithms. The AWR reports for both databases were generated to review the I/O response times as shown in Table 25.

The database transaction rate changes are shown in Figure 86 on page 403.

Table 25 FAST VP enabled AWR report for CRMDB and SUPCHDB

CRMDB

Event Waits Time (s) Avg wait (ms) %DB time Wait class

db file sequential read 32,332,795 104,183 5 91.04 User I/O

db file parallel read 720,608 10,738 15 6.07 User I/O

DB CPU 4,338 2.45

log file sync 1,635,001 1,157 1 0.65 Commit

db file scattered read 33,212 285 9 0.16 User I/O

SUPCHDB

Event Waits Time (s) Avg wait (ms) %DB time Wait class

db file sequential read 15,035,122 104,183 7 92.87 User I/O

db file parallel read 328,884 10,738 18 6.07 User I/O

DB CPU 4,338 1.86

log file sync 746,897 1,157 1 0.5 Commit

db file scattered read 17,860 285 12 0.18 User I/O


Figure 86 FAST VP enabled test with different FAST policies

Analyzing the performance improvements with FAST VP

As shown in Table 26, CRMDB used the FAST Gold policy, and FAST VP migrated 40 percent of the CRMDB FC extents to the EFD tier and 10 percent to SATA; the rest of the extents remained on FC drives. This improved the response time from 8 ms to 5 ms and raised the transaction rate from 962 to 2,500 TPM, a 160 percent gain without any application change.

SUPCHDB used the FAST Silver policy, and FAST VP therefore moved its less active extents to SATA drives. Even so, the response time improved from 8 ms to 7 ms; we achieved cost savings while maintaining, and even slightly improving, performance.

[Figure 86 chart: transactions per minute for the FAST VP enabled run with different FAST policies; series: CRMDB high workload, 6 drivers (Gold policy); SUPCHDB low workload, 4 drivers (Silver policy)]

Table 26 Storage tier allocation changes during the FAST VP-enabled run

ASM disk group: +DATA (both databases)

Database   DB size   Initial TPM   FAST VP enabled TPM   % Change   FAST policy   FAST VP enabled storage tiers used (EFD / FC / SATA)

CRMDB      600 GB    962           2500                  160%       Gold          40% (240 GB) / 50% (300 GB) / 10% (60 GB)

SUPCHDB    600 GB    682           826                   21%        Silver        0 / 44% (266 GB) / 56% (334 GB)
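The per-tier capacities in Table 26 follow directly from the database size and the tier percentages; a quick Python check for CRMDB (SUPCHDB's 266 GB and 334 GB differ slightly from 44 and 56 percent of 600 GB because the reported percentages are rounded from actual extent counts):

```python
def tier_capacity_gb(db_size_gb, tier_pct):
    """Convert per-tier percentages into GB for a given database size."""
    return {tier: round(db_size_gb * pct / 100) for tier, pct in tier_pct.items()}

# CRMDB percentages from Table 26
crmdb = tier_capacity_gb(600, {"EFD": 40, "FC": 50, "SATA": 10})
print(crmdb)  # {'EFD': 240, 'FC': 300, 'SATA': 60}, matching the table
```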


Fully Automated Storage Tiering

This section describes the FAST configuration and illustrates the process of migrating Oracle databases to the correct storage tiers to meet business requirements using FAST.

Introduction

Businesses use multiple databases in environments that serve DSS and OLTP application workloads. Even though multiple levels of cache exist in the database I/O stack, including host cache, database server cache, and Symmetrix cache, disk response time is at times critical to application performance. Selecting the correct storage class for various database objects is a challenge, and the storage selection that works in one situation may not be optimal in others. Jobs executed at periodic intervals or on an ad hoc basis, such as quarter-end batch jobs, demand a high degree of performance and availability and make disk selection and data placement even more challenging. As the size and number of databases grow, analyzing the performance of the various databases, identifying the bottlenecks, and selecting the right storage tier for the multitude of databases turns into a daunting task.

Introduced in the Enginuity 5874 Q4 2009 service release, EMC Symmetrix VMAX Fully Automated Storage Tiering (FAST) is Symmetrix software that utilizes intelligent algorithms to continuously analyze device I/O activity and generate plans for moving and swapping devices for the purposes of allocating or re-allocating application data across different performance storage tiers within a Symmetrix array. FAST proactively monitors workloads at the Symmetrix device (LUN) level in order to identify “busy” devices that would benefit from being moved to higher-performing drives such as EFD. FAST will also identify less “busy” devices that could be relocated to higher-capacity, more cost-effective storage such as SATA drives without altering performance.

Time windows can be defined to specify when FAST should collect performance statistics (upon which the analysis to determine the appropriate storage tier for a device is based), and when FAST should perform the configuration changes necessary to move devices between storage tiers. Movement is based on user-defined storage tiers and FAST Policies.
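The window mechanics can be sketched as follows. This is a simplified model (real FAST time windows are defined through SMC or SYMCLI and support richer schedules), shown only to illustrate how a time-of-day window gates statistics collection or device moves:

```python
from datetime import time

def in_window(now, start, end):
    """True if time-of-day `now` falls within [start, end); handles overnight windows."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window wraps past midnight

# Hypothetical move window: perform device moves only overnight
MOVE_WINDOW = (time(22, 0), time(6, 0))

assert in_window(time(23, 30), *MOVE_WINDOW)      # inside the overnight window
assert not in_window(time(12, 0), *MOVE_WINDOW)   # midday: no moves allowed
```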


The primary benefits of FAST include:

◆ Automating the process of identifying volumes that can benefit from EFD or that can be kept on higher-capacity, less-expensive drives without impacting performance

◆ Improving application performance at the same cost, or providing the same application performance at lower cost. Cost is defined as space, energy, acquisition, management, and operational expense

◆ Optimizing and prioritizing business applications, allowing customers to dynamically allocate resources within a single array

◆ Delivering greater flexibility in meeting different price/performance ratios throughout the lifecycle of the information stored

The management and operation of FAST can be conducted using either the Symmetrix Management Console (SMC) or the Solutions Enabler Command Line Interface (SYMCLI). Additionally, detailed performance trending, forecasting, alerts, and resource utilization are provided through the Symmetrix Performance Analyzer (SPA). And if so desired, Ionix ControlCenter provides the capability for advanced reporting and analysis that can be used for chargeback and capacity planning.

FAST configuration

FAST configuration involves three components:

◆ Storage Groups

A Storage Group is a logical grouping of Symmetrix devices. Storage Groups are shared between FAST and Auto-provisioning Groups; however, a Symmetrix device may only belong to one Storage Group that is under FAST control. A Symmetrix VMAX storage array supports up to 8,192 Storage Groups associated with FAST Policies.

◆ Storage Tiers

Storage tiers are a combination of a drive technology (for example, EFD, FC 15k rpm, or SATA) and a RAID protection type (for example, RAID 1, RAID 5 (3+1), RAID 5 (7+1), or RAID 6 (6+2)). There are two types of storage tiers: static and dynamic. A static tier contains explicitly specified Symmetrix disk groups, while a dynamic tier automatically contains all Symmetrix disk groups of the same drive technology. A storage tier contains at least one Symmetrix disk group, but can contain more than one of a single drive technology type.

◆ FAST Policies

FAST Policies associate a set of Storage Groups with up to three storage tiers. A FAST Policy includes the maximum percentage that Storage Group devices can occupy in each of the storage tiers. The percentages specified for the tiers in a policy, when aggregated, must total at least 100 percent and may total more than 100 percent. For example, if the Storage Groups associated with the policy are allowed 100 percent in any of the tiers, FAST can recommend that all the storage devices reside together on any one tier (no capacity limit on the storage tier is enforced). In another example, to force the Storage Group onto one of the storage tiers, simply set the policy to 100 percent on that tier and 0 percent on all others. At the time of association, a Storage Group may also be given a priority (between 1 and 3) with a policy. If a conflict arises between multiple active FAST Policies, the FAST Policy priority helps determine which policy gets precedence. The Symmetrix VMAX supports up to 256 FAST Policies.
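These constraints (up to three tiers per policy, percentages aggregating to at least 100, priority between 1 and 3) can be captured in a small validation sketch; this is illustrative only, since Solutions Enabler enforces the same rules itself:

```python
def validate_fast_policy(tier_pcts, priority):
    """tier_pcts: {tier_name: max_percent}. Raises ValueError for an invalid policy."""
    if len(tier_pcts) > 3:
        raise ValueError("a FAST Policy may reference at most three storage tiers")
    if sum(tier_pcts.values()) < 100:
        raise ValueError("tier percentages must aggregate to at least 100")
    if not 1 <= priority <= 3:
        raise ValueError("priority must be between 1 and 3")
    return True

# A permissive policy: devices may all land on either of two tiers
assert validate_fast_policy({"EFD": 100, "FC": 100, "SATA": 0}, priority=1)
# Pinning a group to one tier: 100 percent there, 0 percent elsewhere
assert validate_fast_policy({"FC": 100, "EFD": 0, "SATA": 0}, priority=2)
```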

FAST device movement

Device relocation to another storage tier under FAST can take place in either of two ways: a move or a swap.

A move occurs when unconfigured (free) space exists in the target storage tier. Only one device is involved in a move, and a Dynamic Relocation Volume (DRV) is not required. The device migration operation is identical to a Virtual LUN migration to unconfigured space.

A swap occurs when there is no unconfigured space in the target storage tier, so a similarly sized device in the target storage tier must be moved out of that tier. Such an operation requires one DRV for each pair of devices being swapped. To facilitate swap operations, DRVs should therefore be sized to fit the largest FAST-controlled device.
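The move-versus-swap decision and the DRV sizing rule can be sketched as follows (sizes in GB; the function and its names are illustrative, not SYMCLI):

```python
def relocation_plan(device_gb, target_free_gb, largest_fast_device_gb, drv_gb):
    """Choose a move when unconfigured space exists in the target tier;
    otherwise a swap, which needs a DRV at least as large as the devices involved."""
    if target_free_gb >= device_gb:
        return "move"            # one device, no DRV required
    if drv_gb >= largest_fast_device_gb:
        return "swap"            # one DRV per pair of devices being swapped
    return "blocked: DRV too small"

assert relocation_plan(120, 500, 120, 0) == "move"
assert relocation_plan(120, 0, 120, 120) == "swap"
assert relocation_plan(120, 0, 240, 120).startswith("blocked")
```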


Moves and swaps are completely transparent to the application and can be performed nondisruptively. Symmetrix metadevices are treated as a complete entity; therefore, metadevice members cannot exist in different Symmetrix disk groups.

FAST and ASM

Symmetrix FAST complements the use of Oracle ASM, or other file systems and volume managers for that matter. FAST relies on the availability of multiple storage tiers in the Symmetrix array, and on LUN access skewing. As discussed earlier, LUN access skewing is common in most database systems and often tends to simply show that the most recent data is accessed more heavily than older data.

Similar to Virtual LUN-based migrations, the first release of FAST also works at the LUN level granularity and therefore the FAST Storage Group should include all the devices in the ASM disk group to ensure that FAST policies are applied to all those LUNs simultaneously. This ensures that FAST will choose to move or swap all ASM devices within the ASM disk group and will not break the disk group on multiple storage tiers.

Because of this, when planning to use multiple storage tiers and FAST with ASM, the ASM disk groups must be designed accordingly. For example, if DBAs want only the data files, and not the redo logs, to be managed by FAST, they should place the redo logs in their own ASM disk group and allow FAST to operate on the Storage Group containing the data disk group.
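This guideline (every LUN of a FAST-managed ASM disk group belongs to the same Storage Group) is easy to check mechanically; a sketch with hypothetical device names:

```python
def fast_group_missing_members(asm_disks, storage_group):
    """Return the ASM member devices missing from the FAST Storage Group."""
    return sorted(set(asm_disks) - set(storage_group))

asm_data = ["dev_0A0", "dev_0A1", "dev_0A2", "dev_0A3"]   # +DATA members (hypothetical)
db_sg    = ["dev_0A0", "dev_0A1", "dev_0A2"]              # FAST Storage Group

missing = fast_group_missing_members(asm_data, db_sg)
print(missing)  # ['dev_0A3']: FAST could split +DATA across storage tiers
```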

Example of FAST for Oracle databases

As described in the overview, FAST is controlled by defining Storage Groups, storage tiers, and FAST Policies. The example here illustrates three databases, their associated I/O profiles, the business goals for the databases after their initial deployment, and the FAST configurations used to meet the service level goals for all the databases.

The test configuration used Oracle 11gR2 RAC to manage three single-instance databases across two servers. All three databases were the same size, roughly 1 TB, executing an OLTP workload. The focus was on managing the storage tiers between databases (that is, placing each database on the tier that best matches its business and performance needs). Since each database had its redo logs, data files, temp, and FRA in its own ASM disk groups, only the +DATA ASM disk group of each database was moved between the storage tiers. The +REDO and +TEMP disk groups remained on FC 15k rpm drives, and the FRA on SATA drives.

The first database, DB1, started on FC 15k rpm drives but was designed to simulate a low I/O activity database that has very few users, low importance to the business, and is a candidate to move to a lower storage tier, or “down-tier.” The DB1 database could be one that was once active but is now being replaced by a new application. The second database, DB2, was designed to simulate a medium active database that was initially deployed on SATA drives, but its activity level and importance to the business are increasing and it is a candidate to be moved to a higher storage tier, or “up-tier.” The last database, DB3, started on FC 15k rpm drives and was designed to simulate the high I/O activity level of a mission-critical application with many users and is a candidate to up-tier from FC 15k rpm to EFD.

The test configuration details are provided in Table 27.

Table 27 Test configuration

Configuration aspect Description

Storage Array Symmetrix VMAX (SE 2 Engine)

Enginuity 5874 Service Release Q4 ‘09

Oracle CRS and Database Version 11gR2

EFD 8 x 400 GB Enterprise Flash Drives

HDD 120 x FC 15k rpm 300 GB Drives

SATA 32 x SATA 7,200 rpm 1 TB Drives

Linux Oracle Enterprise Linux 5.3

Multipathing EMC PowerPath 5.3 SP1


Each of the three databases was using the ASM disk group configuration as shown in Table 28.

Table 29 shows the initial storage drive types and count behind each of the +DATA ASM disk groups at the beginning of the tests. It also shows the OLTP workload and potential business goals for each database.

Figure 87 on page 410 shows the logical FAST profile we used for database 3, or DB3. In this case, while we have three drive types in the Symmetrix VMAX (EFD, FC 15k rpm, and SATA drives), we do not want DB3 to reside on SATA, so we could simply not include a SATA tier. However, including it and setting the allowable percentage to 0 percent has the same effect.

Table 28 Storage and ASM configuration for each test database

ASM disk group   Number of LUNs   Size (GB)   Total (GB)   RAID

DATA 10 120 1,200 RAID 5 (3+1)

REDO 20 5 100 RAID 1

TEMP 5 120 600 RAID 5 (3+1)

FRA 40 120 4,800 RAID 5 (3+1)

Table 29 Database storage placement (initial) and workload profile

Database   Number of physical drives   Drive type   Workload   Business goal

DB1 40 FC 15k Very low Down-tiering/cost saving

DB2 32 SATA Medium Up-tiering/preserve SLA

DB3 40 FC 15k High Up-tiering/improve SLA


Figure 87 Initial FAST policies for DB3

Monitoring database and storage performance

Based on the initial test configuration and our understanding of the relative importance of each database to the business (as shown in Table 29 on page 409), we reviewed Oracle AWR data for each of the databases (Table 30) and noted the OLTP transaction rate baseline for later comparison.

Based on these results we can see that DB1 is mainly busy waiting for random read I/O (the "db file sequential read" Oracle event corresponds to host random reads). A wait time of 5 ms is very good; however, this

[Figure 87 diagram: FAST Policy DB3_FP associates Storage Group DB3_SG with storage tiers; Tier 1: 400 GB EFD, RAID 5 (3+1), 100% allowed; Tier 2: 300 GB 15k rpm FC, RAID 5 (3+1), 100% allowed; Tier 3: 1 TB SATA, RAID 5 (3+1), 0% allowed]

Table 30 Initial Oracle AWR report inspection (db file sequential read)

Database   Event   Waits   Time (s)   Avg wait (ms)   % DB time

DB1 db file sequential read 684,271 3,367 5 84.6

DB2 db file sequential read 13,382 250 18 89.2

DB3 db file sequential read 18,786,472 163,680 9 76.2


database shows a low transaction rate and has no business justification to be on FC 15k rpm drives. As a result the decision is made to down-tier it to SATA drives.

DB2 is also spending most of its time waiting for random read I/Os, with an average wait time of 18 ms. It is currently on SATA drives, and its transaction rate will surely improve on a better performing storage tier. In this case the business goal is to address DB2's increasing importance and activity, and to do so we will move it to the FC 15k rpm tier, where both the I/O response time and the transaction rate can be improved.

DB3 is already on FC 15k rpm drives, but based on its high I/O activity it can benefit from up-tiering to EFD. In this test configuration we had 8 x 400 GB EFDs in RAID 5, which gives roughly 2.8 TB of available capacity, so the 1 TB +DATA ASM disk group of DB3 could easily fit there. This is a highly visible and critical database to the business, so it was decided to up-tier its data to EFD.
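The 2.8 TB figure follows from RAID 5 parity overhead; a quick check, assuming the eight EFDs form RAID 5 (7+1) groups:

```python
def raid5_usable_gb(drives, drive_gb, data_members=7, group_size=8):
    """Usable capacity of RAID 5 groups: one member's worth of parity per group."""
    return drives * drive_gb * data_members / group_size

usable = raid5_usable_gb(8, 400)       # 8 x 400 GB EFDs, RAID 5 (7+1)
print(f"{usable / 1000:.1f} TB usable")  # 2.8 TB: ample room for DB3's 1 TB +DATA
```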

Finally, we used an EMC internal performance analysis tool that shows a "heat map" of drive utilization at the Symmetrix back end to illustrate the changes. Each physical drive in the array is represented by a single rectangle whose position corresponds to the physical location of the drive in the array. The rectangle is color-filled to represent the utilization of the physical drive, as shown in the color legend on the left side of the figure. As shown in Figure 89 on page 412, the FC 15k rpm drives hosting DB1 are blue, showing minimal utilization; the SATA drives hosting DB2 are primarily light green to yellow, showing medium-high utilization; and the FC 15k rpm drives hosting DB3 are bright red, which indicates a potential for improvement.


Figure 88 Initial FAST policy for DB3

Figure 89 Initial performance analysis on FAST

[Figure 88 diagram: the DB3_FP FAST policy, as in Figure 87, associating DB3_SG with EFD 100%, FC 100%, and SATA 0%]

[Figure 89 heat map: back-end drive utilization, with DB1 on FC 15k rpm, DB3 on FC 15k rpm, and DB2 on SATA]

Based on the initial analysis, the FAST configuration was created. The simplest way to configure FAST for the first time is to use the SMC FAST Configuration Wizard. Before starting the wizard, however, it is recommended to create the Storage Groups that FAST will operate on. Storage Groups in Symmetrix VMAX are used both by Auto-provisioning Groups, to simplify device masking operations, and to group devices for FAST control. While Symmetrix devices can be in multiple Storage Groups, no two Storage Groups under FAST control can contain the same devices. In Figure 90 we can see how the devices of the +DATA ASM disk group of database 3 (DB3) are placed into a Storage Group that can later be assigned a FAST Policy. As shown in Figure 90, the FAST configuration parameters are specified, and the user approval mode is chosen.

Figure 90 FAST configuration wizard: Setting FAST parameters

Figure 91 shows the configuration of the performance and move time windows: when FAST collects statistical performance data, and when it may move devices.

Figure 91 FAST configuration wizard: Creating performance and move time window



Figure 92 shows the selection of the target storage tiers for the FAST policies.

Figure 92 FAST configuration wizard: Creating FAST policy

When creating FAST policies, the Storage Groups prepared earlier for FAST control are assigned the storage tiers on which they can be allocated, and the capacity percentage the Storage Group is allowed on each tier.

The last screen in the wizard is a summary and approval of the changes. Additional modifications to the FAST configuration and settings can be made using Solutions Enabler or SMC directly, without running the wizard again. Solutions Enabler uses the symfast command line syntax, and SMC uses the FAST tab.

The following example shows how FAST can be used to migrate data for DB3 to the appropriate storage tier. The DB3 Storage Group properties box has three tabs: General, Devices, and FAST Compliance. The Devices tab shows the 10 Symmetrix devices that make up the +DATA ASM disk group containing the DB3 data files and that comprise the DB3_SG Storage Group. The FAST Compliance tab shows which storage tiers this Storage Group may reside on. In this case we defined the FC storage tier as the place where the drives are now, and the EFD storage tier as where FAST may choose to move this ASM disk group. Note that there is no SATA storage tier option for the DB3 Storage Group; this prohibits FAST from ever recommending a down-tier of DB3 to SATA.


Figure 93 FAST configuration wizard: Creating a FAST storage group

The final step of the process is to associate the Storage Group with the FAST tiers and define a policy to manage FAST behavior. In our case we have one Storage Group (DB3_SG), two FAST tiers (EFD and FC), and one FAST Policy (Figure 94 on page 416). The FAST Policy allows up to 100 percent of the Storage Group to reside on the Flash storage tier, and up to 100 percent of DB3 to reside on FC. Since no SATA storage tier is defined for DB3, a third tier option does not exist. By allowing up to 100 percent of the DB3 Storage Group to reside on EFD, we expected that if FAST moved any DB3 LUNs to EFD it would move them all, because they all have the same I/O profile and there is ample capacity available on that tier to accommodate the entire FAST Storage Group.


Figure 94 DB3 FAST policy

Monitoring and executing FAST recommendations for DB3

The initial FAST test configuration was exactly the same as the initial configuration of the Virtual LUN use case described previously: DB2 placed on 10 LUNs on 32 physical SATA drives, with DB1 and DB3 each starting on forty FC 15k rpm drives. Table 31 shows the initial performance of the databases in their starting locations.

We reran the OLTP workload, and after an hour of collecting data FAST proposed the move/swap plan shown in Figure 95 on page 417. FAST proposed moving all the LUNs in the DB3 disk group from FC to EFD in a single move, which is exactly what we had expected. We approved the plan, FAST executed it, and we reran the workload. The results after the move are shown in Table 32 on page 417.


Table 31 Initial FAST performance analysis results

Database   Number of physical drives   Drive type   Avg. txn/min   % Change

DB1 40 FC 15k 349.20 0.00%

DB2 32 SATA 890.53 0.00%

DB3 40 FC 15k 11736.03 0.00%


After the FAST move of DB3 to EFD, overall system transaction performance improved by around 13 percent, and there was no degradation in the performance of the other two databases. The utilization map (Figure 96 on page 418) shows both the active SATA drives and the Flash drives in shades of yellow and green, indicating moderate usage.

Figure 95 FAST swap/move detail

[Figure 95 note: this FAST plan is based on the performance improvement algorithm]

Table 32 Results after FAST migration of DB3 to Flash

Database   Number of physical drives   Drive type   Avg. txn/min   % Change

DB1 40 FC 15k 358.12 2.55%

DB2 32 SATA 897.27 0.76%

DB3 8 Flash 13334.98 13.62%
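The % Change column can be reproduced from the transaction rates in Tables 31 and 32:

```python
def pct_change(before, after):
    """Percent change in average transactions per minute, rounded to 2 decimals."""
    return round((after - before) / before * 100, 2)

# (initial run, after FAST move of DB3 to Flash), from Tables 31 and 32
before_after = {
    "DB1": (349.20, 358.12),
    "DB2": (890.53, 897.27),
    "DB3": (11736.03, 13334.98),
}
for db, (before, after) in before_after.items():
    print(db, pct_change(before, after))
# DB1 2.55, DB2 0.76, DB3 13.62: DB3 gains the most from the move to Flash
```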


Figure 96 Disk utilization map after migration

[Figure 96 heat map: drive utilization after migration, with DB1 on FC 15k rpm, DB2 on SATA, and DB3 on EFD]


Conclusion

Symmetrix Virtual Provisioning offers great value to Oracle environments, with improved performance and ease of management due to wide striping and higher capacity utilization. Oracle ASM and Symmetrix Virtual Provisioning complement each other very well. With a broad range of data protection mechanisms and tighter integration between Symmetrix and Oracle now available even for thin devices, adoption of Virtual Provisioning for Oracle environments is very desirable.

With the Enginuity 5874 Q4 2009 service release enhancements made to Virtual LUN migration and the introduction of FAST technology, data center administrators are now able to dynamically manage data placement in a Symmetrix array to maximize performance and minimize costs.

Introduced with Symmetrix Enginuity 5875 in Q1 2011, FAST VP in Oracle environments improves storage utilization and optimizes database performance by effectively making use of multiple storage tiers at a lower overall cost of ownership when using Symmetrix Thin Provisioning.

In a multi-tiered Oracle storage configuration, moving the most heavily accessed volumes from FC drives to EFDs can help administrators maintain or improve performance and free up FC drives for other uses. Moving active drives from SATA to FC improves performance and allows for increased application activity. Moving lightly accessed volumes from FC to SATA improves utilization and drives down cost. This volume- or sub-LUN-level movement can be done nondisruptively on a Symmetrix VMAX using the Virtual LUN, FAST, and FAST VP capabilities.


Appendix A  Symmetrix VMAX with Enginuity

This appendix introduces EMC Symmetrix VMAX software and hardware capabilities, and provides a comprehensive set of best practices and procedures for high availability and business continuity when deploying Oracle Database 10g and 11g with Symmetrix VMAX, including EMC TimeFinder and Symmetrix Remote Data Facility (SRDF), which have been widely deployed with Oracle databases.

◆ Introduction to Symmetrix VMAX series with Enginuity .................. 422
◆ Leveraging TimeFinder and SRDF for business continuity solutions ...... 444
◆ Conclusion ............................................................ 462
◆ Test storage and database configuration ............................... 463



Introduction to Symmetrix VMAX series with Enginuity

As mentioned in Chapter 2, "EMC Foundation Products," the EMC Symmetrix VMAX Series with Enginuity is a new offering in the Symmetrix product line. Built on the strategy of simple, intelligent, modular storage, it incorporates a new Virtual Matrix interface that connects and shares resources across all nodes, allowing the storage array to seamlessly grow from an entry-level configuration into the world's largest storage system. Symmetrix VMAX provides improved performance and scalability for demanding enterprise database environments while maintaining support for EMC's broad portfolio of software offerings. With the release of Enginuity 5874, Symmetrix VMAX systems deliver new software capabilities that improve ease of use, business continuity, Information Lifecycle Management (ILM), virtualization of small to large environments, and security.

Symmetrix VMAX arrays are well integrated with Oracle databases and applications to support their performance needs, scalability, availability, ease of management, and future growth. This appendix introduces Symmetrix VMAX software and hardware capabilities, and provides a comprehensive set of best practices and procedures for high availability and business continuity when deploying Oracle Database 10g and 11g with EMC Symmetrix VMAX. This includes EMC TimeFinder and Symmetrix Remote Data Facility (SRDF), which have been widely deployed with Oracle databases.

New Symmetrix VMAX ease of use, scalability and virtualization features

In addition to Symmetrix VMAX enhanced performance, scalability, and availability, Enginuity 5874 introduces new ease-of-use, virtualization, and ILM functionality. With Symmetrix VMAX Auto-provisioning Groups, mapping devices to small or large Oracle database environments becomes fast and easy. Devices, HBA WWNs, or storage ports can be easily added or removed, and these changes are automatically propagated through the Auto-provisioning Group, thus improving and simplifying complex storage provisioning for any physical or virtual environment. With Symmetrix VMAX Enhanced Virtual LUN technology, Oracle application data can be migrated between storage tiers seamlessly, while the database is active, thus allowing the placement of data on the storage tier that best matches its performance and cost requirements. As database


performance requirements change, it is easy and efficient to move the appropriate LUNs to their new storage tier. Symmetrix VMAX Virtual LUN migration doesn't consume host or SAN resources; it improves return on investment (ROI) by using the correct storage tiering strategy, and it reduces complexity as there is no need to change backup or DR plans since the host devices don't change. Additional enhancements to availability, scalability, and ease of use are introduced later in the paper and are fully described in the VMAX product guide.

Oracle mission-critical applications require a protection strategy

The demand for database protection and availability increases as data grows in size and becomes more interconnected, and the organization's infrastructure expands. It is essential to have continuous access to the database and applications and efficient use of available system resources. Data centers face disasters caused by human errors, hardware and software failures, and natural disasters. When disaster strikes, the organization is measured by its ability to resume operations quickly, seamlessly, and with the minimum amount of data loss. Having a valid backup and restartable image of the entire information infrastructure greatly helps achieve the desired level of recovery point objective (RPO), recovery time objective (RTO), and service level agreement (SLA).

Enterprise protection and compliance using SRDF

Data consistency refers to the accuracy and integrity of the data and the copies of the data. Symmetrix VMAX offers several solutions for local and remote replication of Oracle databases and applications data. With SRDF software, single or multiple database mirrors can be created, together with their external data, application files and/or message queues - all sharing a consistency group. Replicating data this way creates the point of consistency across business units and applications before any disaster takes place. Failover to the DR site is merely a series of application restart operations that reduce overall complexity and downtime. SRDF provides two- or three-site solutions, and synchronous and asynchronous replication, as well as a no data loss solution over any distance using SRDF/Star, cascaded or concurrent SRDF, and the new SRDF/Extended Distance Protection (EDP). With SRDF/Star, for example, compliance requirements such as not operating the business without a disaster


recovery site can be met, even when the production array is unavailable.

Oracle database clones and snapshots with TimeFinder

Every mission-critical system has a need for multiple copies, such as for development, test, backup offload, reporting, data publishing, and more. With Symmetrix VMAX using TimeFinder software, multiple Oracle database copies can be created or restored in a matter of seconds (either full volume clones or virtual snapshots), regardless of the database size. Such operations are incremental and only changes are copied over. As soon as TimeFinder creates (or restores) a replica, the target devices (or source) will immediately show the final image as if the copy has already finished, even if data copy operations continue incrementally in the background. This functionality shortens business operation times tremendously. For example, rather than performing backup directly on production, it can be offloaded in seconds to a standalone replica. In another example, if an Oracle database restore is required, as soon as TimeFinder restore starts, database recovery operations can start, and there is no need to wait for the storage restore to complete. This ability, also referred to as parallel restore, provides a huge reduction in RTO and increases business availability.

Oracle database recovery using storage consistent replications

In some cases there is a need for extremely fast database recovery, even without failing over to a DR site (especially when only one database out of many sustained a logical or physical corruption). By implementing TimeFinder consistency technology, periodic database replicas can be taken (for example, every few hours) without placing the Oracle database in hot backup mode. Oracle now supports database recovery on a consistent storage replica, applying archive and redo logs to recover it (Oracle support is based on Metalink note 604603.1).

Best practices for local and remote Oracle database replications

This appendix provides an overview of the Symmetrix VMAX system, Auto-provisioning Groups, and Virtual LUN technology with Oracle-related samples. It also details the procedures and best practices for the following use cases:


◆ Use Case 1 — Offloading database backups from production to a local TimeFinder/Clone, then using Oracle Recovery Manager (RMAN) for further backup

◆ Use Case 2 — Facilitating parallel production database recovery by restoring a local TimeFinder/Clone backup image and applying logs to it

◆ Use Case 3 — Creating local restartable clones (or snaps) of production for database repurposing (such as creating test, development, and reporting copies)

◆ Use Case 4 — Creating remote mirrors of the production database for disaster protection (synchronous and asynchronous)

◆ Use Case 5 — Creating remote restartable and writeable database clones (or snaps) for repurposing

◆ Use Case 6 — Creating remote database valid backup and recovery clones (or snaps)

◆ Use Case 7 — Facilitating parallel production database recovery by restoring remote TimeFinder/Clone backup images simultaneously with SRDF restore, and then applying Oracle logs to the production database in parallel

◆ Use Case 8 — Demonstrating fast database recovery using a restartable TimeFinder replica

Symmetrix VMAX Auto-provisioning Groups

The Auto-provisioning Groups feature facilitates ease and simplicity of storage provisioning for standalone and clustered Oracle databases. It simplifies and shortens storage provisioning tasks for small- and large-scale environments. The storage provisioning that used to take many steps in prior releases can now be accomplished with just a few simple and intuitive operations.

The Auto-provisioning Groups feature is built on the notion of "storage groups," "initiator groups," "port groups," and the views that combine the groups together. Storage groups are populated with Symmetrix devices. Port groups are populated with the array front-end adapter (FA) port numbers. Initiator groups are populated with HBA WWN information. Then by simply combining storage, initiator, and port groups into views, the device masking operations take place automatically across the view. Any modification necessary to available storage devices, storage array ports, or HBAs would


simply require changing the appropriate group and will automatically be incorporated throughout the view. For example, if additional database devices are necessary, simply adding those devices to the appropriate storage group will automatically initiate all the necessary mapping and masking operations across the entire view (note that if the devices are already mapped, the operation will complete faster, otherwise the Symmetrix config change will first map the devices appropriately before they are masked, making the task take a little longer). Initiator groups can be cascaded as shown in the next example.

Figure 97 shows an example of using Auto-provisioning Groups to mask Oracle Real Applications Cluster (RAC) database devices. A storage group is created with the database devices and a port group with the Symmetrix ports. An initiator group is created for each host's HBAs (for long-term ease of management); however, they are then cascaded into a single initiator group for the entire cluster. The Auto-provisioning Groups view simply includes the storage group, port group, and the cascaded initiator group. If any hosts are added or removed from the cluster they will simply be added or removed from the cascaded initiator group. In a similar way, devices or Symmetrix ports can be added or removed from their groups and the view will automate the device provisioning for the cluster.

Figure 97 Oracle RAC and Auto-provisioning Groups (the diagram shows the Oracle RAC devices behind storage ports 07E:1 and 10E:1 on the storage SAN, masked to the initiator groups RAC1_HBAs and RAC2_HBAs)


The following steps demonstrate the use of Auto-provisioning Groups, based on the example in Figure 97 on page 426.

1. Create a storage group for RAC devices

symaccess -name RAC_devs -type storage devs 790:7AF create

2. Create a port group with storage ports 7E:1 and 10E:1

symaccess -name RAC_ports -type port -dirport 7E:1,10E:1 create

3. Create an initiator group for each cluster node's HBAs

symaccess -name RAC1_hbas -type initiator -file ./RAC1_hbas.txt create

The file RAC1_hbas.txt contains:

WWN:10000000c975c2e4
WWN:10000000c975c336

symaccess -name RAC2_hbas -type initiator -file ./RAC2_hbas.txt create

The file RAC2_hbas.txt contains:

WWN:10000000c975c31a
WWN:10000000c975c3ab

4. Cascade the cluster nodes' initiator groups into a single one for the entire cluster

symaccess -name RAC_hbas -type initiator create
symaccess -name RAC_hbas -type initiator add -ig RAC1_hbas
symaccess -name RAC_hbas -type initiator add -ig RAC2_hbas

5. Create the view for the entire RAC cluster storage provisioning

symaccess create view -name RAC_view -storgrp RAC_devs -portgrp RAC_ports -initgrp RAC_hbas
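Once the view is created, the resulting masking can be checked from the management host. A minimal sketch, assuming the group and view names from the example above (verify the exact argument forms against your Solutions Enabler version):

```shell
# List all masking views on the array and confirm RAC_view exists
symaccess list view

# Show the view contents: storage group, port group, and the
# cascaded initiator group with each node's HBA WWNs
symaccess show view RAC_view

# Growing the cluster's storage later: adding devices to the
# storage group propagates the masking across the whole view
# (device range 7B0:7B3 is illustrative)
symaccess -name RAC_devs -type storage add devs 7B0:7B3
```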

Symmetrix VMAX Enhanced Virtual LUN migration technology

Enginuity 5874 provides an enhanced version of Symmetrix Virtual LUN software to enable transparent, nondisruptive data mobility of devices between storage tiers and/or RAID protections. Virtual LUN migration technology provides users with the ability to move Symmetrix logical devices between disk types, such as high-performance enterprise Flash drives (EFDs), Fibre Channel drives, or high-capacity low-cost SATA drives. As devices are migrated they can change their RAID protection.


Virtual LUN migration occurs independent of host operating systems or applications, and during the migration the devices remain fully accessible to database transactions. While the back-end device characteristics change (RAID protection and/or physical disk type) the migrated devices' identities remain the same, allowing seamless online migration. Virtual LUN is fully integrated with Symmetrix replication technology and the source devices can participate in replications such as SRDF, TimeFinder/Clone, TimeFinder/Snap, or Open Replicator.

The advantages of migrating data using storage technology are ease of use, efficiency, and simplicity. Data is migrated in the Symmetrix back end without consuming any SAN or host resources, increasing migration efficiency. The migration is a safe operation as the target is treated internally as just another "mirror" of the logical device, although with its own RAID protection and storage tier. At the end of the migration the original "mirror" of the logical device is simply removed. Finally, since the identity of the source devices doesn't change, moving between storage tiers is easy and doesn't require additional change control of business operations such as remote/local replications and backup. The migration pace can be controlled using Symmetrix Quality of Service (symqos) commands.

Virtual LUN migration helps customers implement an Information Lifecycle Management (ILM) strategy for their databases, such as moving an entire database, tablespaces, partitions, or ASM diskgroups between storage tiers. It also allows adjustments to the service levels and performance requirements of application data. For example, application storage is often provisioned before clear performance requirements are known. Once the requirements are better understood, it is easy to adjust the storage tier to improve the user experience and ROI.

Figure 98 on page 429 shows an example of performing a Virtual LUN migration of an ASM diskgroup "+Sales" with 20 x 50 GB devices (ASM members). The migration source devices are spread across 40 x 300 GB hard disk drives and protected with RAID 1. The migration target devices are spread across only 4 x 400 GB EFDs and protected with RAID 5.


Figure 98 Migration example using Virtual LUN technology

The following steps demonstrate the use of Virtual LUN, based on the example in Figure 98.

1. Optional: Verify information for a migration session called Sales_mig

symmigrate -name Sales_mig -file Sales_ASM.txt validate

The file Sales_ASM.txt contains the list of source and target migration devices:

0100 0C00
...
0113 0C13

2. Perform the migration

symmigrate -name Sales_mig -file Sales_ASM.txt establish

3. Follow the migration progress and rate at 60-second intervals

symmigrate -name Sales_mig -file Sales_ASM.txt query -i 60

4. Terminate the migration session after completion

symmigrate -name Sales_mig -file Sales_ASM.txt terminate


5. Optional: Control migration pace

Create a Symmetrix DG with the source devices

symdg create Sales_dg
symld -g Sales_dg -range 0100:0113 addall

Control the copy pace using the DG

symqos -g Sales_dg set MIR pace 8

Virtual LUN can utilize configured or unconfigured disk space for the target devices. Migration to unconfigured disk space means that devices will move to occupy available free space in a target storage diskgroup. After the migration, the original storage space of the source devices will be unconfigured. In either case the source devices' identity doesn't change, making the migration seamless to the host; no changes to DR, backup, or high availability configuration aspects are necessary. When specifying configured disk space for the migration, in essence the source and target devices simply swap their storage characteristics. However, after the data was migrated to the target devices, the original source drive storage space will be reformatted, to prevent exposure of the data that once belonged to it.

With Enginuity 5874, migration of logical devices and metavolumes is supported. (Only the metahead volume needs to be specified; the metamembers are selected automatically.) Virtual LUN migration does not support migration of thin devices (or thin pool devices), virtual devices (or save pool devices), and internal Symmetrix devices such as VCM, SFS, or Vault.

Migration to configured space

This option is useful when most of the space in the target diskgroup is already configured (and therefore not enough free space is available). It is also useful when the migration is expected to be temporary and a reverse migration will take place at a later time to the same target devices. One example is migrating the SALES ASM diskgroup to a Flash drive tier before the end-of-the-month closing report; when the time comes to migrate back, the source devices return to occupy their previous storage space. When migrating to configured space, both source and target devices are specified. The target devices should match the source devices in size, must at a minimum be unmasked from any host, and can optionally be unmapped from any Symmetrix FA port. These requirements ensure that the target devices of the migration do not contain currently active customer data. Likewise, the target devices


cannot be involved in any other Symmetrix copy operation such as SRDF, Clone, Snap, or Open Replicator. After the migration, the target devices occupy the original storage location and protection of the source devices, and the original source device storage space is formatted to prevent exposure of its old data by the target.

Migration to unconfigured space

This option is useful when enough free space is available in the target storage diskgroup. When migrating to an unconfigured space only the source devices are specified. For the migration target, a storage diskgroup number is provided along with the RAID protection type of the new LUN. At the completion of this migration the old source LUN is unconfigured so no reformat of the LUN is required.
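A migration to unconfigured space can be sketched as follows. Only the source devices are listed in the device file; the session name, target disk group number, and target-protection flags shown here are illustrative assumptions, so confirm the exact symmigrate options against the Solutions Enabler documentation:

```shell
# Migrate the Sales source devices to free space in storage
# diskgroup 2, changing their protection to RAID 5 (flag names
# are assumptions based on Solutions Enabler 7.x syntax)
symmigrate -name Sales_uncfg -file Sales_src.txt -tgt_dsk_grp 2 -tgt_raid5 establish

# Monitor progress at 60-second intervals, then clean up the
# session once the migration completes
symmigrate -name Sales_uncfg -file Sales_src.txt query -i 60
symmigrate -name Sales_uncfg -file Sales_src.txt terminate
```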

Symmetrix VMAX TimeFinder product family

The EMC TimeFinder family of local replication technology allows for creating multiple, nondisruptive, read/writeable storage-based replicas of database and application data. It satisfies a broad range of customers' data replication needs with speed, scalability, efficient storage utilization, and minimal to no impact on the applications - regardless of the database size. TimeFinder provides a solution for backup, restart, and recovery of production databases and applications, even when they span Symmetrix arrays. TimeFinder is well integrated with other EMC products such as SRDF and allows the creation of replicas on a remote target without interrupting the synchronous or asynchronous replication. If a restore from a remote replica is needed, TimeFinder and SRDF will restore data incrementally and in parallel, to achieve a maximum level of availability and protection. The TimeFinder product family supports the creation of dependent write-consistent replicas using EMC consistency technology, and replicas that are valid for Oracle backup/recovery operations, as described later in the use cases.

TimeFinder/Clone and the new cascaded clones

TimeFinder/Clone provides the ability to create, refresh, or restore multiple full volume copies of the source volumes where after the first full synchronization, only incremental changes are passed between source and target devices. TimeFinder/Clone operations can have any combination of standard (STD) and/or business continuance volumes (BCV) for source and/or target devices, making it extremely flexible. TimeFinder/Clone can work in emulation mode, simulating TimeFinder/Mirror commands (symmir) for


legacy reasons; however, it is recommended to use the native TimeFinder/Clone command syntax (symclone) when creating new scripts.

TimeFinder/Clone can scale to thousands of devices and can create up to 16 targets to each source device. It also provides the flexibility of synchronizing the target volumes before the clone session (replica) is activated, also referred to as precopy, after the clone session is activated, also referred to as background copy, or let the clone devices synchronize only when data is accessed, also referred to as no-copy, which can be used, for example, for short-term gold copies.

TimeFinder always presents the final copied image immediately on its target devices (when creating a replica) or source devices (when restoring it), even if background copy operations are still in progress. This allows the application to immediately use the TimeFinder devices. For example, during TimeFinder restore of a valid database backup image, Oracle roll forward recovery can start in parallel, reducing RTO.

Cascaded clones is a new feature in Enginuity 5874 that provides the ability to perform one additional clone operation on a clone target without losing the incremental nature of the relationships. This can be useful when the first clone is a gold copy (for example, a backup image) that should not be used directly, but additional replicas are required from it for purposes such as backup, reporting, publishing, test/dev, and so on. One option is to use multiple TimeFinder/Snaps; however, when a full volume replica is required instead, starting with Enginuity 5874 it is also possible to create an additional clone from the clone target and deploy it for such purposes.
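The clone life cycle described here can be sketched with a Solutions Enabler device group. The group name is illustrative, and the commands assume standard devices paired with BCV targets in the group; the -consistent flag uses Enginuity Consistency Assist so the activation is dependent-write consistent:

```shell
# Create differential clone sessions for the device group and
# start copying data in the background before activation (precopy)
symclone -g OraDg create -differential -precopy

# Activate the sessions consistently; the targets immediately
# present the full point-in-time image while copying continues
symclone -g OraDg activate -consistent

# Later, restore the clone image back to the source devices;
# Oracle recovery can start while the restore runs in parallel,
# and the session can be ended once the restore is complete
symclone -g OraDg restore
```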

TimeFinder/Snap and the new TimeFinder/Snap Recreate

TimeFinder/Snap software allows users to create, refresh, or restore multiple read/writeable, space-saving copies of data. TimeFinder/Snap allows data to be copied from each source device to as many as 128 target devices where the source devices can be either a STD device or a BCV. The target devices are Symmetrix virtual devices (VDEV) that consume negligible physical storage through the use of pointers to track changed data.

Any update to the source or target devices after the snap session is activated causes the pre-updated data to be copied in the background to a designated shared storage pool called a save device pool. The virtual device's pointer is then updated to that location. Any subsequent updates after the first data modification won't require


any further background copy. Since copy operations happen in the background, the performance overhead of using TimeFinder/Snap is minimal; this process is known as Asynchronous Copy on First Write (ACOFW).

TimeFinder/Snap Recreate is new in Enginuity 5874. It provides the ability to very quickly refresh TimeFinder snapshots. Previously it was necessary to terminate an older snap session in order to create a new one. The TimeFinder recreate command simplifies the process to refresh old snaps without having to describe the source and target devices relationships again.
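The snap life cycle, including the new recreate operation, can be sketched as follows, assuming a device group whose standard devices are associated with VDEV targets (the group name is illustrative):

```shell
# Create snap sessions pairing the standard devices with their
# virtual device (VDEV) targets
symsnap -g OraDg create

# Activate the sessions with a dependent-write consistent image
symsnap -g OraDg activate -consistent

# Refresh the snapshot in place (Enginuity 5874 and later) without
# terminating and redefining the session, then activate it again
symsnap -g OraDg recreate
symsnap -g OraDg activate -consistent
```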

TimeFinder Consistent Split

With TimeFinder you can use the Enginuity Consistency Assist (ECA) feature to perform consistent splits between source and target device pairs across multiple, heterogeneous hosts. Consistent split (which is an implementation of instant split) helps to avoid inconsistencies and restart problems that can occur if you split database-related devices without first quiescing the database. The difference between a normal instant split and a consistent split is that when using consistent split on a group of devices, the database writes are held at the storage level momentarily while the foreground split occurs, maintaining dependent-write order consistency on the target devices comprising the group. Since the foreground instant split completes in just a few seconds, Oracle needs to be in hot backup mode only for this short time when hot backup is used. When consistent split alone is used to create a restartable replica, interference with business operations is minimal.

TimeFinder target devices, after performing a consistent split, are in a state that is equivalent to the state a database would be in after a power failure, or if all database instances were aborted simultaneously. This is a state that is well known to Oracle and it can recover easily from it by performing a crash recovery the next time the database instance is started.

TimeFinder and SRDF

TimeFinder and SRDF products are closely integrated. In fact, it is always recommended to use SRDF in conjunction with remote TimeFinder to allow remote copies utilizing the target hardware resources without interrupting the SRDF replications. Also the remote copies can serve as a gold copy whenever an SRDF target needs to be refreshed. As an example, a remote TimeFinder/Clone can be created from the SRDF R2 devices, and many additional snaps


can be created out of that clone for test, development, and reporting instances. When SRDF/A is used any remote TimeFinder operation should use the consistent split feature to coordinate the replica with SRDF/A cycle switching. The use cases in this appendix illustrate some of the basic Oracle business continuity operations that TimeFinder and SRDF can perform together.

Symmetrix VMAX SRDF product family

Symmetrix Remote Data Facility (SRDF) is a Symmetrix-based business continuance and disaster restart solution. In simplest terms, SRDF is a configuration of multiple Symmetrix units whose purpose is to maintain real-time copies of host devices in more than one location. The Symmetrix units can be in the same room, in different buildings within the same campus, or hundreds of miles apart. SRDF provides data mobility and disaster restart spanning multiple host platforms, operating systems, and applications. It can scale to thousands of devices, can replicate while maintaining write-order consistency from multiple source arrays to multiple target arrays, and can support a variety of topologies and configurations.

The local SRDF device, known as the source (R1) device, is configured in a pairing relationship with a remote target (R2) device, forming an SRDF pair. While the R2 devices are mirrored with the R1 devices, the R2 devices are write-disabled to the remote host. After the R2 devices are synchronized with their R1 devices, they can be split at any time, making the R2 devices fully accessible to their hosts. Once split, the R2 devices can be used directly by hosts, restored incrementally to the R1 devices, or used in conjunction with TimeFinder to create additional replicas. TimeFinder replicas can be taken from the R2 devices even while SRDF is replicating, without disturbing the replication.

Many other new performance and scalability features were added to SRDF with Enginuity release 5874, including a new protection mode called SRDF/Extended Distance Protection (SRDF/EDP). Please refer to the SRDF product guide for a full description.

SRDF modes of operation

SRDF/Synchronous (SRDF/S), SRDF/Asynchronous (SRDF/A), and SRDF Adaptive Copy are the basic operation modes of SRDF. The first two are valid for Oracle database protection and maintain dependent write-order consistency. The third is useful for bulk data


transfers or in combination with more complex SRDF solutions such as SRDF/Automated Replication (SRDF/AR).

SRDF/Synchronous mode

SRDF/S is used to create a no-data-loss solution for committed transactions. It provides the ability to replicate multiple databases and applications data remotely while guaranteeing the data on both the source and target devices is exactly the same. SRDF/S can protect single or multiple source Symmetrix storage arrays with synchronous replication.

With SRDF/S replication, shown in Figure 99, each I/O from the local host to the source R1 devices is first written to the local Symmetrix cache (1) and then sent over the SRDF links to the remote Symmetrix unit (2). Once the remote Symmetrix unit acknowledges that it has received the I/O in its cache successfully (3), the I/O is acknowledged to the local host (4). Synchronous mode guarantees that the remote image is an exact duplicate of the source R1 device's data.

Figure 99 SRDF/Synchronous replication

Single Roundtrip and Concurrent Write SRDF performance enhancements

Starting with the Enginuity 5772 Service Release, SRDF/S provides a few performance enhancements. The first, Single Roundtrip, allows faster SRDF/S response time when long distances increase write latency. Where previously a transfer-ready state was required from the SRDF target before sending the actual data, now both transfer-ready and data are sent in parallel and acknowledged once. The second, Concurrent Write, allows SRDF/S to send up to eight I/Os in parallel for each source device if the I/Os arrive from different FA ports. This allows SRDF/S to perform much faster, for example, during Oracle checkpoints and when host multipathing tools like EMC PowerPath® are used.

SRDF/Asynchronous replication mode

SRDF/Asynchronous (SRDF/A) provides a consistent point-in-time image on the target (R2) devices that is only slightly behind the source (R1) devices. SRDF/A allows replication over unlimited distance, with minimal to no effect on the performance of the local production database(s). SRDF/A can "ride" through workload peaks by utilizing the local Symmetrix cache and optionally spilling data to a disk pool (also called delta set extension, or DSE), thereby reducing the link bandwidth requirements.

SRDF/A session data is transferred to the remote Symmetrix array in timed cycles, also called delta sets, as illustrated in Figure 100 on page 437. Three cycles work in unison: the capture cycle receives all new I/O from the hosts; the transmit and receive cycles, on the R1 and R2 respectively, send and receive the previously captured cycle until it is fully received; and the apply cycle applies a previously fully received cycle to the R2 devices.

The SRDF/A cycle-switching process is very efficient and scalable. Within a capture cycle, if a piece of data is updated multiple times, only the most recent update is transmitted. This process is called write folding. Also, there is no need to maintain write consistency of each individual I/O; instead, consistency is maintained between cycles. If replication stops for any reason, SRDF will either apply a fully received cycle to the target R2 devices or discard the last incomplete cycle. This leaves the remote R2 devices always only one or two cycles behind the R1 devices. While the default minimum cycle-switching time is 30 seconds, it can grow during peak workload and shrink back to the default afterward.
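Write folding can be illustrated with a small self-contained sketch (hypothetical track IDs, no Symmetrix involved): within one capture cycle, three writes to the same track collapse into a single transmitted track image.

```shell
# Sketch of SRDF/A write folding within a single capture cycle.
# Input lines are "track:data" pairs in arrival order; only the latest
# image per track would be transmitted when the cycle switches.
fold_cycle() {
  awk -F: '{ latest[$1] = $2; writes++ }
           END {
             for (t in latest) tracks++
             printf "writes captured: %d, tracks transmitted: %d\n", writes, tracks
           }'
}
printf '%s\n' t1:A t2:B t1:C t3:D t1:E | fold_cycle
# -> writes captured: 5, tracks transmitted: 3
```

The heavier the rewrite activity against the same tracks, the more the transmitted data shrinks relative to the captured writes, which is why SRDF/A reduces link bandwidth requirements.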


Figure 100 SRDF/Asynchronous replication

SRDF/A Consistency Exempt

New to Enginuity 5874 is the ability to add or remove devices from an SRDF/A session without breaking the session consistency. When dynamic SRDF devices are added, the consistency exempt flag is set, allowing them to synchronize without interrupting the consistency attributes of the other devices in the SRDF/A session. After they are in sync for two cycles, the flag is automatically removed, allowing them to join the session consistency attributes. When devices are suspended, the consistency exempt flag is automatically set, allowing them to be removed without interrupting the SRDF session consistency. These flexible capabilities enhance database protection and availability.

SRDF/A Multi-Session Consistency

Like SRDF/S, SRDF/A can replicate from multiple source arrays to multiple target arrays while maintaining write-order consistency between cycles. When dependent-write consistency across multiple Symmetrix arrays is required, the SRDF/A Multi-Session Consistency (MSC) option is used, and the coordination of cycle switching across the arrays is performed with the assistance of redundant SRDF host daemons. The daemons merely wait for ready conditions on all the arrays and then send the switch-cycle command, keeping communication light and efficient. As with TimeFinder consistent split, when SRDF/A MSC is used there is a brief hold of write I/O on all the arrays simultaneously during cycle switch to preserve write-order consistency.


SRDF Adaptive Copy replication mode

SRDF Adaptive Copy replication facilitates long-distance data sharing and migration (see Figure 101). It allows the primary and secondary volumes to be more than one I/O out of synchronization. The maximum number of I/Os that can be out of synchronization is known as the maximum skew value, and can be set using SRDF monitoring and control software. There is no attempt to preserve the ordering of write I/Os when using SRDF Adaptive Copy replication.

Figure 101 SRDF Adaptive Copy mode

SRDF Adaptive Copy replication is useful as an interim step before changing to an Oracle-supported SRDF/S or SRDF/A replication. It is also used for point-in-time long-distance bulk transfer of data. For example, if the connection between the two sides is lost for a long period of time, allowing a large number of changes to accumulate, resumption of the links can cause a heavy surge in link traffic (created by the backlog of changes added to those generated by normal production traffic). By using SRDF Adaptive Copy replication, the backlog of invalid tracks is synchronized using the SRDF low-priority queue, while new writes are buffered in cache and sent across using the high-priority SRDF queue without impacting the host application. Once the backlog of changes has been transferred, or the total amount of changed tracks has reached a specified number, the mode can be changed to SRDF/S or SRDF/A replication to achieve database protection.

Note: SRDF Adaptive Copy replication is not supported for database restart or database recovery solutions with Oracle databases. Using SRDF Adaptive Copy replication by itself for disaster protection of Oracle databases will lead to a corrupt and unusable remote database.


SRDF topologies

SRDF can be set up in many topologies other than a single SRDF source and target, satisfying different needs for high availability and disaster restart. It can use a single target or two concurrent targets; it can provide a combination of synchronous and asynchronous replication; and it can provide a three-site solution that allows no data loss over very long distances, and more. Some of the basic topologies that can be used with SRDF are shown in the following sections.

Concurrent SRDF

SRDF allows simultaneous replication of single R1 source devices to up to two target devices using multiple SRDF links. All SRDF links can operate in either synchronous or asynchronous mode, or one or more links can use Adaptive Copy mode for efficient utilization of the available bandwidth on that link. This topology allows simultaneous data protection over short and long distances, as shown in Figure 102.

Figure 102 Concurrent SRDF

Cascaded SRDF

SRDF allows cascaded configurations in which data is propagated from one Symmetrix to the next. This configuration requires synchronous mode for the first SRDF leg and asynchronous or Adaptive Copy mode for the next. As shown in Figure 103, this topology provides remote replication over greater distances with varying degrees of bandwidth utilization and no to limited data loss (depending on the choice of SRDF modes and the disaster type).


Figure 103 Cascaded SRDF

SRDF/Extended Distance Protection

SRDF supports multi-site replication in a cascaded SRDF configuration. This feature is enhanced to support a more efficient two-site DR solution over extended distances with zero or near-zero data loss. In this configuration (shown in Figure 104), the storage cache alone is used on the intermediate site as a temporary pass-through data store for the modified tracks before copying them over to the tertiary site. SRDF/S and Adaptive Copy are allowed between the primary and secondary sites. SRDF/A and Adaptive Copy are available between the secondary and tertiary sites.

Figure 104 SRDF/Extended Distance Protection

The major benefits of this configuration are:

◆ New long-distance replication solution with the ability to achieve zero RPO at the target site


◆ A lower-cost alternative for achieving no data loss for target site disaster restart

SRDF/Star

SRDF/Star is a two- or three-site protection topology where data is replicated from source Site A to two other Symmetrix systems simultaneously (Site B and Site C). The data remains protected even if one target site (B or C) goes down. If Site A (the primary site) goes down, the customer can choose where to come up (Site B or C) based on SRDF/Star information. If the storage data in the other surviving site is more current, the changes are incrementally sent to the surviving site that will come up. For protection and compliance, remote replication can start immediately to the new DR site. For example, as shown in Figure 105, if database operations resume in Site C, data is sent first from Site B to create a no-data-loss solution, and then Site B becomes the new DR target. SRDF/Star offers great flexibility and can change modes and topology to achieve the best protection for each disaster scenario. For a full description of the product, refer to the SRDF product guide.

Figure 105 SRDF/Star

Leveraging TimeFinder and SRDF for data consistency

EMC TimeFinder and SRDF solutions with Enginuity Consistency Assist (ECA consistent split) allow creation of dependent write-order consistent storage-based replicas. The replicas are created by temporarily holding write I/Os to all source devices included in the replica. Since all writes are held, no dependent writes can be issued (as they depend on a previous completion of the held I/O). For example, Oracle will not write to data files (checkpoint) until the redo writes for those data changes have been fully recorded in the log files.

SRDF/S and SRDF/A modes ensure the dependent write-order consistency of the replication by synchronizing each and every dependent I/O (SRDF/S mode) or by synchronizing across cycles of transferred data (SRDF/A mode). In an actual disaster that leads to the loss of the source location, database restart operations can be completed at the remote location without the delays associated with finding and applying recovery across applications in the correct sequence, or to a coordinated time before the failure.

In addition to disaster restart benefits, SRDF significantly enhances disaster recovery operations by using fast and reliable replication technology to offload the Oracle backup operations to a remote site and later return the restored data to the local site as shown in the use cases section.

ASM rebalancing and consistency technology

ASM provides a seamless and nonintrusive mechanism to expand and shrink diskgroup storage. When disk storage is added or removed, ASM performs a redistribution (rebalancing) of the striped data. This entire rebalance operation is done while the database is online, thus providing higher availability to the database. The main objective of the rebalance operation is to always provide an even distribution of file extents, workload, and data protection across all disks in the diskgroup.

With Symmetrix arrays as the storage, it is considered a best practice to use ASM external redundancy for data protection. The Symmetrix RAID protection will be utilized to provide RAID 1, RAID 5, or RAID 6 internal disk protection.

The split operation of storage-based replicas is sensitive to the rebalancing process, which may cause ASM diskgroup inconsistencies if the diskgroup device members are split at slightly different times. These inconsistencies are a result of possible ASM metadata changes occurring while a split operation is in process. Upon startup, if ASM detects an inconsistency, metadata logs are used to perform ASM instance recovery. In addition, Oracle provides tools and procedural steps to avoid inconsistencies when splitting storage-based replicas; however, these procedures can be simplified and streamlined with the use of EMC consistency technology.


Since EMC consistent split technology suspends database I/O to preserve write ordering, it also has the side effect of preventing any ASM metadata changes during the split. Performing a consistent split therefore prevents ASM metadata inconsistencies during the replication process, eliminating the extra steps, or the possibly unusable replica, that could result if an ASM rebalance was active during a nonconsistent split.


Leveraging TimeFinder and SRDF for business continuity solutions

Table 33 shows the RAC database and Symmetrix device layout that was used in the use cases. All the devices (LUNs) were 50 GB in size and the actual database size was about 400 GB.

The database primary devices (also the TimeFinder and SRDF source devices) used Symmetrix RAID 1 protection. TimeFinder/Clone targets used RAID 5 protection to improve storage utilization. SRDF target devices also used RAID 1, matching the protection level of the primary database devices.

ASM general best practices

◆ ASM was using external redundancy (no software mirroring), in accordance with EMC's recommendation of leveraging the Symmetrix array RAID protection instead.

◆ ASM was set with three diskgroups: +REDO (redo logs), +DATA (data, control, temp files), and +FRA (archives, flashback logs). Typically EMC recommends separating logs from data for performance monitoring and backup offload reasons. When SRDF is used, temp files can go to their own "+TEMP" diskgroup if replication bandwidth is limited, as temp is not required for database restart or recovery. In these use cases, however, SRDF FC bandwidth was not an issue and temp files were included in the +DATA diskgroup. Finally, +FRA can typically use a lower-cost storage tier such as SATA drives and therefore warrants its own diskgroup.

Table 33  ASM diskgroups, and Symmetrix device and composite groups

ASM diskgroups | Database devices | Recovery Device Groups (DG) | Restart Device Groups (DG) | SRDF Consistency Group (CG)
+DATA          | 18 LUNs x 50 GB  | DATA_DG                     | DB_DG                      | ALL_CG
+REDO          | 4 LUNs x 50 GB   | REDO_DG                     | DB_DG                      | ALL_CG
+FRA           | 3 LUNs x 50 GB   | FRA_DG                      | (not included)             | ALL_CG

TimeFinder best practices

◆ Multiple Symmetrix device groups were used for TimeFinder/Clone (or snap) operations, allowing finer granularity of operations. For recovery solutions, data files (together with control files), log files, and archive logs each had their own DG, allowing the replica of each to take place at slightly different times, as shown in the recovery use cases. For example, if a valid datafile backup replica must be restored to production and the production logs are intact, separating the datafiles and logs into their own DGs and ASM diskgroups means such a restore won't compromise the logs, and full database recovery remains possible. For a restart solution, a single DG was used that includes all data (control) and log files, allowing them to be split consistently, creating a restartable and consistent replica.

◆ Note that TimeFinder operations can span Symmetrix arrays. When that is the case, a composite group (CG) should be used instead of a device group (DG), following the exact same best practices as shown for the DG in this paper.

◆ It is recommended to issue TimeFinder and SRDF commands from a management (or the target) host and not from the database production host. The reason is that in rare cases when consistent split is used under heavy write activity, Symmetrix management commands may be queued behind database writes, interfering with the completion of the replication, in which case the replica is deemed invalid.

◆ It is recommended to use Symmetrix Group Name Services (GNS) and allow the group definitions to be replicated to the SRDF targets. GNS manages all the DG and CG definitions in the array and can replicate them to the SRDF target, so the management host issuing TimeFinder and SRDF commands can operate on the same CG and DG as the source (without having to re-create them).

◆ For the sake of simplicity, the use cases assume that GNS is used and replicated remotely. When remote TimeFinder or SRDF operations are used, they are issued on the target host. It is also possible to issue remote TimeFinder and SRDF commands from the local management host using the -rdf flag; however, this requires the SRDF links to be functional.

◆ Note that remote TimeFinder replica creation from an SRDF/A target should always use the -consistent flag to coordinate SRDF/A cycle switching with the TimeFinder operation and, simply put, guarantee that the replica is consistent.


SRDF best practices

◆ SRDF, whether synchronous or asynchronous, should always use a composite group (CG) with consistency enabled (also called a consistency group). While enabling consistency is a requirement for SRDF/A, it is a common misconception that SRDF/S, being a synchronous replication, doesn't benefit from it. SRDF/S with consistency enabled guarantees that if even a single source device can't replicate to its target, all the SRDF devices in that session stop replicating, preserving the consistent target image.

◆ For SRDF replication a single CG was used that included all the database devices (data, control, and log files). As shown in Table 33, it also included the FRA devices. SRDF on its own is a restart solution, and since database crash recovery never uses archive logs, there is no need to include the FRA in the SRDF replication. However, there are two reasons why it could be included. The first is if Flashback Database functionality is required at the target: replicating the FRA (and the flashback logs) in the same consistency group as the rest of the database allows use of flashback functionality on the target. The second reason is that the archive logs are required to allow offload of backup images remotely (as shown in "Use Case 6: Remote database valid backup replicas" on page 456).

◆ It is always recommended to have a clone copy available at the SRDF target as a gold copy protection from rolling disasters. Rolling disaster is a term used when a first interruption to normal replication activities is followed by a secondary database failure on the source, leaving the database without an immediately available valid replica. For example, if SRDF replication was interrupted for any reason (planned or unplanned) and changes accumulated on the source, then once synchronization resumes, and until the target is synchronized (SRDF/S) or consistent (SRDF/A), the target is not a valid database image. For that reason it is best practice, before such a resynchronization, to take a TimeFinder gold copy replica at the target site, which preserves the last valid image of the database as protection from a rolling disaster.

◆ While the source database was clustered, since Oracle RAC is based on a shared storage architecture, by virtue of replicating all the database components (data, log, and control files) the target database has the option of being started in clustered or non-clustered mode. Regardless of the choice, it is not recommended to replicate the cluster layer (voting disks or cluster configuration devices), since these contain local host and subnet information. It is best practice that if a cluster layer is required at the target hosts, it should be configured ahead of time, based on target hostnames and subnets, and therefore be ready to bring up the database whenever the time comes.
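The gold-copy practice above can be sketched as a short dry-run script. RUN=echo only prints the commands; ALL_CG is the consistency group used in this paper, and for an SRDF/A session the final check would use verify -consistent rather than -synchronized.

```shell
#!/bin/sh
# Dry-run sketch: preserve a gold copy at the target before resuming SRDF.
RUN=${RUN:-echo}    # RUN=echo prints commands instead of executing them
gold_copy_then_resync() {
  $RUN symclone -cg ALL_CG -tgt activate -consistent  # gold copy of last valid R2 image
  $RUN symrdf -cg ALL_CG establish                    # resume replication
  $RUN symrdf -cg ALL_CG verify -synchronized -i 60   # wait until the target is valid again
}
gold_copy_then_resync
```

Only after the verify succeeds is the target once again a valid database image; until then, the gold copy is the last line of defense against a rolling disaster.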

Use Case 1: Offloading database backups from production

This use case illustrates how to offload database backups from production to a local TimeFinder/Clone, and then use Oracle RMAN to perform further backup.

While the Oracle database is in hot backup mode on the production host, a TimeFinder/Clone activate is performed to create a recoverable replica of the database. This is a valid backup image that can be used to perform quick recovery of the Oracle database. The image can also be mounted to another host for RMAN backups.

High-level steps

1. Place the database in hot backup mode.

2. Activate the DATA_DG clone (with -consistent since ASM is used).

3. End hot backup mode.

4. Archive the current log.

5. Copy two backup control files to the FRA ASM diskgroup.

6. Activate the ARCH_DG clone (with -consistent since ASM is used).

7. Optionally mount the clone devices on a backup host and perform an RMAN backup.

Device groups used
DATA_DG and ARCH_DG

Detailed steps

On the production host

1. Place the production database in hot backup mode.

# export ORACLE_SID=RACDB1
# sqlplus "/ as sysdba"
SQL> alter database begin backup;

Leveraging TimeFinder and SRDF for business continuity solutions 447

Page 448: Oracle Databases on EMC Symmetrix Storage Systems

448

Symmetrix VMAX with Enginuity

2. Activate the TimeFinder/Clone DATA_DG replica. The clone replica includes data and control files. Use -consistent with ASM or file systems.

# symclone -g DATA_DG -tgt -consistent activate

3. End hot backup mode.

SQL> alter database end backup;

4. Switch logs and archive the current log file.

SQL> alter system archive log current;

5. Create two backup control files and place them in the FRA diskgroup for convenience (RMAN syntax is shown, although SQL can be used as well). One will be used to mount the database for RMAN backup; the other will be saved with the backup set.

RMAN> run {
  allocate channel ctl_file type disk;
  copy current controlfile to '+FRA/control_file/control_start';
  copy current controlfile to '+FRA/control_file/control_bak';
  release channel ctl_file;
}

6. Activate the TimeFinder/Clone ARCH_DG replica. The clone replica includes the archive logs and the backup control files. Use -consistent with ASM or file systems. If an RMAN catalog is used, synchronize it first to register the most recent archive logs.

RMAN> resync catalog;

# symclone -g ARCH_DG -tgt -consistent activate

On the backup host

The database replica can be used as a valid disk backup or as a source for backup to tertiary media such as tape or a disk library. In this example RMAN is used to perform the backup.

Target/Backup host prerequisites:

◆ The ASM devices (or partitions) on clone volumes have correct Oracle permissions.


◆ The ASM_DISKSTRING parameter in the init.ora file for the ASM instance includes the path to clone volumes.

◆ The ASM_DISKGROUPS parameter in the init.ora file for the ASM instance contains the names of the production database diskgroups.

◆ It is not necessary to have the database mounted as RAC. Prior to mounting the database, comment out or update ASM and database instance init.ora parameters as necessary. Specifically, change CLUSTER_DATABASE to FALSE if clustered mode is not needed. If the database is to be started in clustered mode, then the cluster layer (and software) should already be installed and configured on the target host (not replicated with TimeFinder or SRDF).

7. (Continuing from step 6) Start the ASM instance. If other volume managers or file systems are used, their appropriate import and mount commands are used instead. Make sure all the diskgroups were mounted correctly by ASM.

# export ORACLE_SID=+ASM
# sqlplus "/ as sysdba"
SQL> startup

8. Mount the database instance. A database backup that was taken in hot backup mode is valid for recovery only as long as it has not been opened read-write (with the resetlogs option). For that reason, it should only be mounted, which is the minimum prerequisite for an RMAN backup. It can also be opened in read-only mode after enough archive logs are applied to resolve any data file fuzziness. Before starting the database in mount mode, change the CONTROL_FILES parameter in the init.ora file to point to the backup control file.

control_files = +FRA/control_file/control_start

# export ORACLE_SID=CLONE_DB
# sqlplus "/ as sysdba"
SQL> startup mount

9. Back up the database with RMAN from the backup host. The control file copy that was not used to mount the instance (control_bak) should be part of the backup set. The control_start file should not be backed up because the SCN will be updated when the database is mounted for backup.

RMAN> run {
  allocate channel t1 type disk;
  backup format 'ctl%d%s%p%t' controlfilecopy '+FRA/control_file/control_bak';
  backup full format 'db%d%s%p%t' database;
  backup format 'al%d%s%p%t' archivelog all;
  release channel t1;
}

Note: The format specifier %d is the database name, %t a 4-byte timestamp, %s the backup set number, and %p the backup piece number.
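The production-host sequence (steps 1 through 6) can be scripted. The following is a dry-run sketch: sql and rman are hypothetical stand-in wrappers that only print the statement each step would run, and RUN=echo keeps the SYMCLI commands from executing.

```shell
#!/bin/sh
# Dry-run sketch of the production-host portion of the backup offload.
RUN=${RUN:-echo}                 # RUN=echo prints commands only
sql()  { $RUN "SQL> $*"; }       # stand-in for: sqlplus "/ as sysdba"
rman() { $RUN "RMAN> $*"; }      # stand-in for: rman target /
backup_offload() {
  sql "alter database begin backup;"
  $RUN symclone -g DATA_DG -tgt -consistent activate    # data + control files
  sql "alter database end backup;"
  sql "alter system archive log current;"
  rman "copy current controlfile to '+FRA/control_file/control_start';"
  rman "copy current controlfile to '+FRA/control_file/control_bak';"
  $RUN symclone -g ARCH_DG -tgt -consistent activate    # archives + control copies
}
backup_offload
```

The ordering matters: the DATA_DG clone is activated while the database is still in hot backup mode, and the ARCH_DG clone is activated only after the log switch and control file copies, so the archive replica contains everything needed to recover the data replica.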

Use Case 2: Parallel database recovery

This use case illustrates how to perform parallel database recovery by restoring a local TimeFinder backup replica and applying logs to it, even while the TimeFinder restore continues in the background.

The clone copy created in Use Case 1 can be used to perform database recovery of the production database. Database recovery operations can start as soon as the TimeFinder/Clone restore operation has started, providing a much faster RTO than common solutions that require an initial restore of the backup image from the tertiary media destination, where database recovery can start only once the restore has fully completed. Recovery can be performed using the archived logs available on the production host or restored from the TimeFinder/Clone image. As in this example, if recovery takes place on production and the archive logs and online redo logs are available, a full media recovery (no data loss) can be achieved. If the production logs (or some of the archive logs) are not available, incomplete media recovery can still be performed.

High-level steps

1. Shut down the production database and ASM instances.

2. Restore the DATA_DG clone (split afterwards).

3. Start ASM.

4. Mount the database.

5. Perform database recovery and open the database.

Device group used
DATA_DG


Detailed steps

On the production host

1. Shut down any production database and ASM instances (if still running).

# export ORACLE_SID=RACDB1
# sqlplus "/ as sysdba"
SQL> shutdown abort

# export ORACLE_SID=+ASM1
# sqlplus "/ as sysdba"
SQL> shutdown abort

2. Restore the TimeFinder/Clone replica. Note that -force is required if the source devices are also part of an active SRDF session with remote R2 devices. In this case it is assumed that the production archive and redo logs are available; therefore, just the DATA_DG (with data and control files) is restored.

As soon as the restore starts it is possible to continue with the next step. However, make sure to split the clone replica at a later time, when the background restore has completed. Note that a TimeFinder restore protects the replica from changes to the source devices.

# symclone -g DATA_DG -tgt restore [-force]
# symclone -g DATA_DG -tgt split

3. Start the ASM instance (follow the same activities as in Use Case 1, step 7).

4. Mount the database (follow the same activities as in Use Case 1, step 8).

5. Recover and open the production database. Use resetlogs if incomplete recovery was performed.

# export ORACLE_SID=RACDB1
# sqlplus "/ as sysdba"
SQL> startup mount
SQL> recover automatic database using backup controlfile until cancel;
SQL> alter database open;
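The restore-and-recover flow can be sketched as a dry run as well; note how the split is deferred until after recovery, once the background restore has completed (RUN=echo prints the commands; sql is a hypothetical stand-in wrapper):

```shell
#!/bin/sh
# Dry-run sketch: database recovery starts as soon as the restore starts.
RUN=${RUN:-echo}
sql() { $RUN "SQL> $*"; }
restore_and_recover() {
  sql "shutdown abort"                                  # database, then ASM
  $RUN symclone -g DATA_DG -tgt restore -force          # -force if under SRDF
  sql "startup mount"                                   # recovery can begin now
  sql "recover automatic database using backup controlfile until cancel;"
  sql "alter database open;"                            # resetlogs if incomplete
  $RUN symclone -g DATA_DG -tgt split                   # only after restore completes
}
restore_and_recover
```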


Use Case 3: Local restartable replicas of production

This use case illustrates how to create local restartable clones (or snaps) of production for database repurposing, such as creating test, development, and reporting copies.

While the Oracle database is running transactions on the production host, and without the use of hot backup mode, activate a consistent TimeFinder/Clone session to create a restartable replica of the database. The replica can be mounted to another host for purposes such as test, development, reporting, and so on. Mounting multiple replicas of the same database on the same host is possible; however, that topic is beyond the scope of this paper.

High-level steps

1. Activate the DB_DG clone (with -consistent to create a restartable replica).

2. Start the ASM instance.

3. Start the database instance.

4. Optionally, refresh the clone replica from production at a later time.

Device group used
DB_DG

Detailed steps

On the target host

1. Activate the TimeFinder/Clone DB_DG replica. The clone replica includes all data, control, and log files. Use -consistent to make sure the replica maintains dependent-write consistency and is therefore a valid restartable replica from which Oracle can simply perform crash recovery.

# symclone -g DB_DG -tgt -consistent activate

Note: Follow the same target host prerequisites as in Use Case 1 prior to step 7.

2. Start the ASM instance (or perform import/mount if other volume managers or file systems are used). Make sure all the diskgroups were mounted correctly by ASM.


# export ORACLE_SID=+ASM
# sqlplus "/ as sysdba"
SQL> startup

3. Start the database instance. No recovery or archive logs are needed.

# export ORACLE_SID=CLONE_DB
# sqlplus "/ as sysdba"
SQL> startup

At this point the clone database is opened and available for user connections.

4. Optionally, refresh the TimeFinder replica from production. This is easy and fast because TimeFinder/Clone operations are incremental as long as the clone session is not terminated. Once the clone session is reactivated, the target devices are available immediately for use, even while the background copy is still taking place.

1. Shut down the clone database instance, since it is about to be refreshed:

SQL> shutdown abort

2. Re-create and activate the TimeFinder/Clone replica from production. This will initiate the background copy operation.

# symclone -dg DB_DG -tgt recreate
# symclone -dg DB_DG -tgt activate -consistent

3. Start the clone ASM and database instances by following steps 2 and 3 again.

Use Case 4: Remote mirroring for disaster protection (synchronous and asynchronous)

This use case illustrates how to create remote mirrors of a production database for disaster protection using SRDF/S or SRDF/A.

High-level steps
1. Perform initial synchronization of SRDF in Adaptive Copy mode.
2. Once the SRDF target is close enough to the source, change the replication mode to SRDF/S or SRDF/A.
3. Enable SRDF consistency.

Leveraging TimeFinder and SRDF for business continuity solutions 453


Device group used
ALL_CG

Detailed steps
1. Perform initial synchronization of SRDF in Adaptive Copy mode. Repeat this step, or use the skew parameter, until the SRDF target is close enough to the source.

# symrdf -cg ALL_CG set mode acp_wp [skew <number>]
# symrdf -cg ALL_CG establish
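The "close enough" check in step 1 can be automated by polling the number of invalid (not-yet-copied) tracks. The Python sketch below is illustrative only: `query_invalid_tracks` is a hypothetical callback standing in for parsing `symrdf -cg ALL_CG query` output, not a real SYMCLI API.

```python
import time

def wait_until_nearly_synced(query_invalid_tracks, threshold=1000, interval=60):
    """Poll the remaining invalid (not-yet-copied) track count until the
    SRDF target is close enough to switch out of Adaptive Copy mode."""
    while True:
        remaining = query_invalid_tracks()
        if remaining <= threshold:
            return remaining
        time.sleep(interval)

# Stubbed query that drains over three polls; a real script would parse
# 'symrdf query' output instead:
samples = iter([50000, 8000, 400])
print(wait_until_nearly_synced(lambda: next(samples), threshold=1000, interval=0))  # prints 400
```

In practice, the threshold would be chosen from the available link bandwidth and the production write rate, so the final catch-up in SRDF/S or SRDF/A mode is short.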

2. Once the SRDF target is close enough to the source, change the replication mode to SRDF/S or SRDF/A.

1. For SRDF/S, set protection mode to sync:

# symrdf -cg ALL_CG set mode sync

2. For SRDF/A, set protection mode to async:

# symrdf -cg ALL_CG set mode async

3. Establish SRDF replication if the copy is not already active and enable consistency.

# symrdf -cg ALL_CG enable
# symrdf -cg ALL_CG establish [-full]
# symrdf -cg ALL_CG verify -synchronized -i 60

Use Case 5: Remote restartable database replicas for repurposing

This use case illustrates how to create remote restartable clones (or snaps) of production for database repurposing without interrupting SRDF protection.

Once synchronized, an SRDF/S or SRDF/A session can be split at any time to create the dependent write consistent remote replica based on the R2 target devices. At that time SRDF will keep track of any changes on both source and target devices and only these changes will be copied over the next time SRDF is synchronized (refresh the target devices) or restored (refresh the source devices).

However, it is a better practice to keep SRDF synchronized to maintain remote replication and protection, and instead activate a remote TimeFinder replica such as a clone or snap (currently supported with SRDF/S only); alternatively, additional snapshots can be taken from the remote clone. These replicas of the database are dependent-write consistent and can be used for activities such as test, development, reporting, data processing, publishing, and more. A replica can also serve as gold copy protection from rolling disasters, as explained earlier in the SRDF best practices section.
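The incremental behavior described above works because the array records which tracks changed on each side after the split, and a resynchronization copies only the union of the two change sets. The following Python fragment is a simplified conceptual model with invented track numbers, not actual SRDF or TimeFinder logic:

```python
def tracks_to_copy(source_changed, target_changed):
    """Tracks an incremental establish must refresh: every track the
    source changed since the split, plus every track modified on the
    mounted target copy, so the target mirrors the source again."""
    return sorted(set(source_changed) | set(target_changed))

# Production wrote tracks 3 and 7 after the split; a test host mounted
# the target and wrote track 9:
print(tracks_to_copy({3, 7}, {9}))  # prints [3, 7, 9]
```

The same reasoning explains why a restore (refreshing the source from the target) is also incremental: only the union of changed tracks flows back.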

High-level steps
1. Activate the remote DB_DG clone (use -consistent to create a restartable replica).
2. Start the remote ASM instance.
3. Start the remote database instance.
4. Optionally, refresh the remote clone replica from production (SRDF targets) at a later time.

Device group used
DB_DG

Detailed steps

On the target host
1. Activate the TimeFinder/Clone DB_DG remote replica. The clone replica includes all data, control, and log files. Use -consistent to ensure the replica maintains dependent-write consistency and is therefore a valid restartable replica from which Oracle can simply perform crash recovery.

# symclone -dg DB_DG -tgt -consistent activate

Note: Follow the same target host prerequisites as in Use Case 1 prior to step 7.

2. Start the ASM instance. Follow the same activities as in Use Case 3 step 2.

3. Start the database instance. Follow the same activities as in Use Case 3 step 3.

At this point the clone database is open and available for user connections.

4. Optionally, to refresh the database clone follow the same activities as in Use Case 3 step 4.


Use Case 6: Remote database valid backup replicas

This use case illustrates how to create remote database clones that are a valid Oracle backup image and can be used for database recovery.

By creating TimeFinder remote replicas that are valid for database recovery, backup to tertiary media can be performed at the remote site. Also, the TimeFinder replica itself is a valid backup to disk that can be used to recover production if necessary.

Note: For SRDF/A: The SRDF checkpoint command returns control to the user only after the source device content has reached the SRDF target devices (SRDF simply waits two delta sets). This is useful, for example, when production is placed in hot backup mode before the remote clone is taken.

High-level steps
1. Place the database in hot backup mode.
2. If using SRDF/A, perform an SRDF checkpoint (no action required for SRDF/S).
3. Activate a remote DATA_DG clone (with -consistent if SRDF/A and/or ASM are used).
4. End hot backup mode.
5. Archive the current log.
6. Copy two backup control files to the FRA ASM diskgroup.
7. If using SRDF/A, perform an SRDF checkpoint (no action required for SRDF/S).
8. Activate the remote ARCHIVE_DG clone (with -consistent if SRDF/A and/or ASM are used).
9. Optionally, mount the remote clone devices on the backup host and perform an RMAN backup.

Device groups used
DATA_DG and ARCHIVE_DG for TimeFinder operations; ALL_CG for SRDF operations


Detailed steps

On the production host
1. Place production in hot backup mode. Follow the same activities as in Use Case 1 step 1.

2. If SRDF/A is used, an SRDF checkpoint command ensures that the SRDF target also has the datafiles in backup mode.

# symrdf -cg ALL_CG checkpoint

3. Activate the remote DATA_DG clone. Use -consistent if SRDF/A is used and/or ASM. Follow the same activities as in Use Case 1 step 2.

4. End hot backup mode. Follow the same activities as in Use Case 1 step 3.

5. Switch logs and archive the current log file. Follow the same activities as in Use Case 1 step 4.

6. Create two backup control files and place in the FRA diskgroup for convenience. Follow the same activities as in Use Case 1 step 5.

7. If SRDF/A is used, an SRDF checkpoint command ensures that the FRA diskgroup (with the latest archive logs and backup control files) has reached the SRDF target.

# symrdf -cg ALL_CG checkpoint

8. Activate the remote TimeFinder/Clone ARCHIVE_DG replica. Follow the same activities as in Use Case 1 step 6.

9. Optionally mount the remote clone devices on the backup host and perform RMAN backup. Follow the same activities as in the “On the backup host” section in Use Case 1.

Use Case 7: Parallel database recovery from remote backup replicas

This use case illustrates how to perform parallel production database recovery by restoring a remote TimeFinder/Clone backup image simultaneously with SRDF restore, and then applying Oracle logs to the production database in parallel. This is similar to Use Case 2, only the recovery is from a remote replica.


High-level steps
1. Shut down the production database and ASM instances.
2. Restore the remote DATA_DG clone (split afterwards). Restore SRDF in parallel.
3. Start ASM.
4. Mount the database.
5. Perform database recovery (possibly while the TimeFinder and SRDF restores are still taking place) and open the database.

Device groups used
DATA_DG for TimeFinder operations; ALL_CG for SRDF operations

Detailed steps

On the production host
1. Shut down any production database and ASM instances (if still running). Follow the same activities as in Use Case 2 step 1.

2. Restore the remote TimeFinder/Clone replica to the SRDF target devices, then restore SRDF. If SRDF is still replicating from source to target, stop the replication first. Then start the TimeFinder restore and, once it has started, start the SRDF restore in parallel.

In some cases the distance is long, the bandwidth is limited, and many changes have to be restored. In these cases it might make more sense to change SRDF mode to Adaptive Copy first until the differences are small before placing it again in SRDF/S or SRDF/A mode.

# symrdf -cg ALL_CG split
# symclone -dg DATA_DG -tgt restore [-force]
# symrdf -cg ALL_CG restore

It is not necessary to wait for the completion of the SRDF restore before moving to the next step.

3. Start ASM on the production host. Follow the same activities as in Use Case 1 step 7.

4. Mount the database. Follow the same activities as in Use Case 1 step 8.

5. Recover and open the production database. Follow the same activities as in Use Case 2 step 5.


Use Case 8: Fast database recovery from a restartable replica

This use case illustrates fast database recovery by using the most recent consistent (restartable) replica and applying logs to it.

Oracle supports various database recovery scenarios based on dependent-write consistent storage replicas created using SRDF and/or TimeFinder; Oracle support is covered in Metalink note ID 604683.1. The purpose of this use case is not to replace a backup strategy such as nightly backups based on hot backup mode. Instead, it offers a complementary approach when RTO requirements are very strict: run the database in archivelog mode and perform periodic snapshots without placing the database in hot backup mode. If recovery is required, the last snapshot is restored and, in parallel, the limited number of transactions since that snapshot was taken are applied, creating a fast database recovery solution.

Consider this scenario. The database is in archive log mode and periodic TimeFinder consistent clones or snaps are created that include only the data. At some point a database recovery is required based on the last replica (clone in this example).

High-level steps
1. Shut down the production database and ASM instances.
2. Restore the most recent DATA_DG clone (split afterwards).
3. Start ASM.
4. Mount the database.
5. Perform full or incomplete database recovery (possibly while the TimeFinder background restore is still taking place).

Device group used
DATA_DG

Detailed steps
1. Shut down any production database and ASM instances (if still running). Follow the same activities as mentioned in Use Case 2 step 1.

2. Restore the most recent DATA_DG TimeFinder replica. Follow the same activities as mentioned in Use Case 2 step 2.


3. Start the ASM instance (follow the same activities as in Use Case 1 step 7).

4. Mount the database (follow the same activities as in Use Case 1 step 8).

5. Perform database recovery based on one of the following options.

Full (complete) database recovery

When all online redo logs and archive logs are available, it is possible to perform a full media recovery of the Oracle database with no loss of committed transactions.

SQL> recover automatic database;
SQL> alter database open;

Note: It might be necessary to point to the location of the online redo logs or archive logs if the recovery process does not locate them automatically (common in RAC implementations with multiple online or archive log locations). The goal is to apply any necessary archive logs as well as the online logs fully.

Point-in-time database recovery

When a full media recovery is not desirable, or when some archive or online logs are missing, an incomplete recovery can be performed. When performing incomplete recovery, enough logs need to be applied to pass the maximum point of data file fuzziness, so that all data files are consistent. After passing that point, additional archive logs can optionally be applied. The following sample script (based on the Oracle Metalink note mentioned previously) can help identify the minimum SCN required to open the database. However, scanning data files can be a lengthy process that defeats the purpose of fast recovery and a short RTO. Therefore running the script is optional, and it is recommended to simply perform the recovery instead, for two reasons. First, the TimeFinder replica with the data and control files can be restored again if necessary, so nothing is lost by attempting the recovery directly. Second, since the replica is taken with a consistent split, the point of fuzziness of the data files cannot be later than the time of the split (it can only be earlier). Therefore, recovering this replica to a point beyond the split time will pass the maximum fuzziness in all the data files and will be sufficient.

Optional scan datafile script (not recommended to run unless RTO is not a concern):

spool scandatafile.out
set serveroutput on
declare
   scn number(12) := 0;
   scnmax number(12) := 0;
begin
   for f in (select * from v$datafile) loop
      scn := dbms_backup_restore.scandatafile(f.file#);
      dbms_output.put_line('File ' || f.file# || ' absolute fuzzy scn = ' || scn);
      if scn > scnmax then
         scnmax := scn;
      end if;
   end loop;
   dbms_output.put_line('Minimum PITR SCN = ' || scnmax);
end;
/
spool off

Sample output generated by the scan data script:

SQL> @./scandata.sql
File 1 absolute fuzzy scn = 27088040
File 2 absolute fuzzy scn = 27144475
File 3 absolute fuzzy scn = 27164171
…
File 22 absolute fuzzy scn = 0
Minimum PITR SCN = 27164171
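Once the scan output has been spooled to a file, the minimum PITR SCN can also be picked out programmatically rather than by eye. The following Python sketch assumes only the output format shown above; it is an illustration, not part of the Oracle toolset:

```python
import re

def minimum_pitr_scn(scan_output):
    """Return the highest per-file fuzzy SCN from the scan output;
    incomplete recovery must reach at least this SCN before the
    database can be opened."""
    scns = [int(m.group(1))
            for m in re.finditer(r'absolute fuzzy scn = (\d+)', scan_output)]
    return max(scns)

sample = """File 1 absolute fuzzy scn = 27088040
File 2 absolute fuzzy scn = 27144475
File 3 absolute fuzzy scn = 27164171
File 22 absolute fuzzy scn = 0"""
print(minimum_pitr_scn(sample))  # prints 27164171
```

The returned value is the SCN to use in the `recover database until change` command shown below.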

Perform incomplete database recovery (sample commands):

SQL> alter database recover database until change 27164171;

SQL> alter database open resetlogs;


Conclusion

Symmetrix VMAX is a new offering in the Symmetrix product line with enhanced scalability, performance, availability, and security features, allowing Oracle databases and applications to be deployed rapidly and with ease.

With the introduction of Enterprise Flash Drives, together with Fibre Channel and SATA drives, Symmetrix provides a consolidation platform covering the performance, capacity, and cost requirements of small and large databases. The correct use of storage tiers, together with the ability to move data seamlessly between tiers, allows customers to place their most active data on the fastest tiers and their less active data on high-density, low-cost media like SATA drives. Features such as Autoprovisioning ease storage provisioning for Oracle databases, clusters, and physical or virtual server farms.

TimeFinder and SRDF technologies simplify high availability and disaster protection of Oracle databases and applications, and provide the required level of scalability from the smallest to the largest databases. SRDF and TimeFinder are easy to deploy and very well integrated with Oracle products like Automatic Storage Management (ASM), RMAN, Grid Control, and more. The ability to offload backups from production, rapidly restore backup images, or create restartable database clones enhances the Oracle user experience and data availability.

Oracle and EMC have been investing in an engineering partnership to innovate and integrate both technologies since 1995. The integrated solutions increase database availability, enhance disaster recovery strategy, reduce backup impact on production, minimize cost, and improve storage utilization across a single database instance or RAC environments.


Test storage and database configuration

This appendix contains a description of the storage and database configurations used in the test use cases.

General test environment

It is assumed that:

◆ Oracle is installed on the target host with similar options to production and configured for ASM use (CSS, or Cluster Synchronization Service, is active).

◆ Copies of the production init.ora files for the ASM instance and the database instance were copied to the target host and modified if required to fit the target host environment.

◆ The appropriate Clone, R2, or Remote Clone (whichever is appropriate for the test) is accessible by the target host.

The SRDF and TimeFinder tests were performed while an OLTP workload was running, simulating a high number of concurrent Oracle users.

Although TimeFinder and SRDF commands can be issued from any host connected to the Symmetrix, in the following test cases, unless specified otherwise, they were issued from the production host. The term "Production host" is used to specify the primary host where the source devices are used, and "Target host" is used to specify the host where the clone, R2, or remote clone devices are used.

Test setup

Figure 106 depicts the test setup containing Oracle RAC on the production site and the associated TimeFinder/Clone and SRDF devices for local and remote replication.


Figure 106 Test configuration

Storage- and device-specific configuration:

◆ All RAC nodes share the same set of devices and have proper ownerships.

◆ PowerPath is used to support multipathing and load balancing.

◆ PowerPath device names are uniform across all RAC nodes.

◆ Symmetrix device groups are created for shared storage for RAC.

◆ ASM diskgroups were configured on Symmetrix devices.

◆ Appropriate local and remote replication relationships were created using SYMCLI commands for TimeFinder/Clone and SRDF.

Table 34 Test hardware

Host                                  Model   OS                             Oracle version
Local “Production” Host: RAC Node 1   Dell    Red Hat Enterprise Linux 5.0   11g release 1 (11.1.0.6.0)
Local “Production” Host: RAC Node 2   Dell    Red Hat Enterprise Linux 5.0   11g release 1 (11.1.0.6.0)
Remote “Target” Host                  Dell    Red Hat Enterprise Linux 5.0   11g release 1 (11.1.0.6.0)

Storage array                         Type             Enginuity version
Local array                           Symmetrix VMAX   5874
Remote array                          Symmetrix VMAX   5874


B
Sample SYMCLI Group Creation Commands

This appendix presents the following topic:

◆ Sample SYMCLI group creation commands


Sample SYMCLI group creation commands

The following shows how Symmetrix device groups and composite groups are created for the TimeFinder family of products, including TimeFinder/Mirror, TimeFinder/Clone, and TimeFinder/Snap.

This example shows how to build and populate a device group and a composite group for TimeFinder/Mirror usage:

Device group:

1. Create the device group:

symdg create device_group -type regular

2. Add the standard devices to the group. The database containers reside on five Symmetrix devices. The device numbers for these are 0CF, 0F9, 0FA, 0FB, and 101:

symld -g device_group add dev 0CF
symld -g device_group add dev 0F9
symld -g device_group add dev 0FA
symld -g device_group add dev 0FB
symld -g device_group add dev 101

3. Associate the BCV devices to the group. The number of BCV devices should be the same as the number of standard devices and the same size. The device serial numbers of the BCVs used in the example are 00C, 00D, 063, 064, and 065.

symbcv -g device_group associate dev 00C
symbcv -g device_group associate dev 00D
symbcv -g device_group associate dev 063
symbcv -g device_group associate dev 064
symbcv -g device_group associate dev 065

Composite group:

1. Create the composite group:

symcg create device_group -type regular

2. Add the standard devices to the composite group. The database containers reside on five Symmetrix devices on two different Symmetrix arrays. The device numbers for these are 0CF, 0F9 on the Symmetrix array with the last three digits of 123, and device numbers 0FA, 0FB, and 101 on the Symmetrix array with the last three digits of 456:

symcg -g device_group add dev 0CF -sid 123


symcg -g device_group add dev 0F9 -sid 123
symcg -g device_group add dev 0FA -sid 456
symcg -g device_group add dev 0FB -sid 456
symcg -g device_group add dev 101 -sid 456

3. Associate the BCV devices to the composite group. The number of BCV devices should be the same as the number of standard devices and the same size. The device serial numbers of the BCVs used in the example are 00C, 00D, 063, 064, and 065.

symbcv -cg device_group associate dev 00C -sid 123
symbcv -cg device_group associate dev 00D -sid 123
symbcv -cg device_group associate dev 063 -sid 456
symbcv -cg device_group associate dev 064 -sid 456
symbcv -cg device_group associate dev 065 -sid 456

This example shows how to build and populate a device group and a composite group for TimeFinder/Clone usage:

Device group:

1. Create the device group device_group:

symdg create device_group -type regular

2. Add the standard devices to the group. The database containers reside on five Symmetrix devices. The device numbers for these are 0CF, 0F9, 0FA, 0FB, and 101:

symld -g device_group add dev 0CF
symld -g device_group add dev 0F9
symld -g device_group add dev 0FA
symld -g device_group add dev 0FB
symld -g device_group add dev 101

3. Add the target clone devices to the group. The targets for the clones can be standard devices or BCV devices. In this example, BCV devices are used. The number of BCV devices should be the same as the number of standard devices, and the same size or larger than the paired standard device. The device serial numbers of the BCVs used in the example are 00C, 00D, 063, 064, and 065.

symbcv -g device_group associate dev 00C
symbcv -g device_group associate dev 00D
symbcv -g device_group associate dev 063
symbcv -g device_group associate dev 064
symbcv -g device_group associate dev 065


Composite group:

1. Create the composite group device_group:

symcg create device_group -type regular

2. Add the standard devices to the group. The database containers reside on five Symmetrix devices on two different Symmetrix arrays. The device numbers for these are 0CF, 0F9 on the Symmetrix array with the last three digits of 123, and device numbers 0FA, 0FB, and 101 on the Symmetrix array with the last three digits of 456:

symcg -g device_group add dev 0CF -sid 123
symcg -g device_group add dev 0F9 -sid 123
symcg -g device_group add dev 0FA -sid 456
symcg -g device_group add dev 0FB -sid 456
symcg -g device_group add dev 101 -sid 456

3. Add the target for the clones to the device group. In this example, BCV devices are added to the composite group to simplify the later symclone commands. The number of BCV devices should be the same as the number of standard devices and the same size. The device serial numbers of the BCVs used in the example are 00C, 00D, 063, 064, and 065.

symbcv -cg device_group associate dev 00C -sid 123
symbcv -cg device_group associate dev 00D -sid 123
symbcv -cg device_group associate dev 063 -sid 456
symbcv -cg device_group associate dev 064 -sid 456
symbcv -cg device_group associate dev 065 -sid 456

The following example shows how to build and populate a device group and a composite group for TimeFinder/Snap usage.

Device group:

1. Create the device group device_group:

symdg create device_group -type regular

2. Add the standard devices to the group. The database containers reside on five Symmetrix devices. The device numbers for these are 0CF, 0F9, 0FA, 0FB, and 101:

symld -g device_group add dev 0CF
symld -g device_group add dev 0F9
symld -g device_group add dev 0FA
symld -g device_group add dev 0FB
symld -g device_group add dev 101


3. Add the VDEVs to the group. The number of VDEVs should be the same as the number of standard devices and the same size. The device serial numbers of the VDEVs used in the example are 291, 292, 394, 395, and 396.

symld -g device_group add dev 291 -vdev
symld -g device_group add dev 292 -vdev
symld -g device_group add dev 394 -vdev
symld -g device_group add dev 395 -vdev
symld -g device_group add dev 396 -vdev

Composite group:

1. Create the composite group device_group:

symcg create device_group -type regular

2. Add the standard devices to the composite group. The database containers reside on five Symmetrix devices on two different Symmetrix arrays. The device numbers for these are 0CF, 0F9 on the Symmetrix array with the last three digits of 123, and device numbers 0FA, 0FB, and 101 on the Symmetrix array with the last three digits of 456:

symcg -g device_group add dev 0CF -sid 123
symcg -g device_group add dev 0F9 -sid 123
symcg -g device_group add dev 0FA -sid 456
symcg -g device_group add dev 0FB -sid 456
symcg -g device_group add dev 101 -sid 456

3. Add the VDEVs to the composite group. The number of VDEVs should be the same as the number of standard devices and the same size. The device serial numbers of the VDEVs used in the example are 291, 292, 394, 395, and 396:

symld -cg device_group add dev 291 -sid 123 -vdev
symld -cg device_group add dev 292 -sid 123 -vdev
symld -cg device_group add dev 394 -sid 456 -vdev
symld -cg device_group add dev 395 -sid 456 -vdev
symld -cg device_group add dev 396 -sid 456 -vdev


C
Related Host Operation

This appendix presents the following topic:

◆ Overview


Overview

Previous sections demonstrated methods of creating a database copy using storage-based replication techniques. While in some cases customers create one or more storage-based copies of the database as "gold" copies (copies that are left in a pristine state on the array), in most cases they want to present the copied devices to a host for backups, reporting, and other business continuity processes. Mounting storage-replicated copies of the database requires additional array-based, SAN-based (if applicable), and host-based steps, including LUN presentation and masking, host device recognition, and importing of the logical groupings of devices so that the operating system and logical volume manager recognize the data on the devices. Copies of the database can be presented to a new host or presented back to the same host that sees the source database. The following sections describe the host-specific considerations for these processes.

Whether using SRDF, TimeFinder, or Replication Manager to create a copy of the database, there are six essential requirements for presenting the replicated devices and making the copies available to a host. They include:

◆ Verifying that the devices are presented to the appropriate front-end directors in the BIN file.

◆ Verifying zoning and LUN presentation through the SAN are configured (if needed).

◆ Editing configuration information to allow the devices to be seen on the host.

◆ Scanning for the devices on the SCSI paths.

◆ Creating special files (UNIX) or assigning drive letters (Windows).

◆ Making the devices ready for use.

The following sections briefly discuss these steps at a high level.

BIN file configuration

For the data to be presented to a host, the Symmetrix BIN file must be configured. LUNs need to be assigned to the hypervolumes and then presented to front-end director ports.


This configuration can be done by the EMC Customer Engineer (CE) through the BIN file change request process, or by the customer using software utilities such as Symmetrix Configuration Manager or EMC ControlCenter.

SAN considerations

Hosts can be attached to a Symmetrix DMX either by direct connectivity (FC-AL, iSCSI, ESCON, or FICON), or through a SAN using Fibre Channel (FC-SW). When using direct-connect, all LUNs presented to a front-end port are presented to the host. In the case of a SAN, additional steps must be considered. These include zoning, which is a means of enabling security on the switch, and LUN masking, which is used to restrict hosts to see only the devices that they are meant to see. Also, there are HBA-specific SAN issues that must be configured on the hosts.

SAN zoning is a means of restricting FC devices (for example, HBAs and Symmetrix front-end FC director ports) from accessing all other devices on the fabric. It prevents FC devices from accessing unauthorized or unwanted LUNs. In essence, it establishes relationships between HBAs and FC ports using World Wide Names (WWNs). WWNs are unique hardware identifiers for FC devices. In most configurations, a one-to-one relationship (the zone) is established between an HBA and FC port, restricting other HBAs (or FC ports) from accessing the LUNs presented down the port. This simplifies configuration of shared SAN access and provides protection against other hosts gaining shared access to the LUNs.

In addition to zoning, LUN masking, which on the Symmetrix array is called Volume Logix™, can also be used to restrict hosts to see only specified devices down a shared FC director port. SANs are designed to increase connectivity to storage arrays such as the Symmetrix. Without Volume Logix, all LUNs presented down an FC port would be available to all hosts zoned to the front-end port, potentially compromising both data integrity and security.

The combination of zoning and Volume Logix, when configured correctly for a customer's environment, ensures that each host sees only the LUNs designated for it. Together they ensure data integrity and security, and also simplify the management of the SAN environment. There are many tools to configure zoning and LUN presentation; primary among them is EMC SAN Manager™. Identifying specific configuration steps for zoning and LUN presentation is beyond the scope of this document.

There are several host- and HBA-specific SAN configuration considerations. When presenting volumes through a SAN, the HBA(s) must be configured correctly for the fabric. One step in the process is updating the HBA firmware and driver levels to those validated in the EMC Support Matrix; this is dependent on the type of host and HBA implemented. Parameter files for particular HBAs may also require modification. Additional configuration requirements depend on the type of host used. Examples of additional host configuration needs include updating of the sd.conf file on Solaris systems, configuration of persistent binding on Solaris hosts, and installation of the EMC ODM Support Package for Symmetrix on IBM systems.

For additional information on host- and HBA-specific configuration issues, consult the EMC Host Connectivity guides for each operating system, as well as the HBA-specific installation and configuration guides.

Final configuration considerations for enabling LUN presentation to hosts

Once the BIN file is configured correctly and connectivity via direct-connect or the SAN is established, the final steps needed to present the storage are to probe the SCSI bus for the storage devices, create special files on UNIX systems or assign drive letters on Windows hosts, and make the devices ready for device I/O. These host-specific steps are described by operating system in the following sections.

Presenting database copies to a different host

There are host-specific requirements to mount volumes when TimeFinder or Replication Manager is used to create a database copy that is to be presented to a different host. The following sections describe the processes needed to mount these copies for each of the major host types. Each section describes the operating-system steps for the following:

◆ Gathering configuration information prior to the change

◆ Probing the SCSI bus for new devices

◆ Creating special device files (UNIX) or assigning drive letters (Windows)

◆ Importing volume/disk groups (UNIX)

◆ Activating logical volumes (UNIX)

◆ Mounting file systems (if applicable)

AIX considerations

When presenting copies of devices from an AIX environment to a different host from the one the production copy is running on, the first step is to scan the SCSI bus, which allows AIX to recognize the new devices. The following demonstrates the steps needed for the host to discover and verify the disks, bring the new devices under PowerPath control if necessary, import the volume groups, and mount the file systems (if applicable).

1. Before presenting the new devices, it is useful to run the following commands and save the information to compare to after the devices are presented:

lsdev -Cc disk
lspv
syminq

2. Another important step to complete before presenting the devices to the new host is to understand which volume groups are associated with which physical devices on the source host. The following commands list the volume groups on the host, identify for each Oracle volume group which physical devices (hdisks) are used, and show the relationship between hdisks. Run these commands on the host prior to making any device changes. This is a precaution only and is to document the environment should it later need to be restored manually.

lspv              (List all the physical volume identifiers)
lsvg              (List all volume groups, identify Oracle volume groups)
lsvg -p vol_grp   (Run for each Oracle volume group, identify hdisks)
syminq            (Find Symmetrix volume numbers for each Oracle hdisk)
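The hdisk-to-volume-group-to-Symmetrix mapping gathered above can be captured in a small script. This is a sketch only: the lspv and syminq excerpts below are hypothetical, simplified sample outputs (real column layouts vary by AIX and Solutions Enabler release), and the file names are arbitrary.

```shell
# Hypothetical, simplified lspv output: hdisk, PVID, volume group, state
cat > /tmp/lspv.out <<'EOF'
hdisk2 00c8d5f0a1b2c3d4 oradatavg active
hdisk3 00c8d5f0a1b2c3d5 oradatavg active
hdisk4 00c8d5f0a1b2c3d6 oralogvg active
EOF

# Hypothetical, simplified syminq output: raw device, Symmetrix device number
cat > /tmp/syminq.out <<'EOF'
/dev/rhdisk2 0041
/dev/rhdisk3 0042
/dev/rhdisk4 0043
EOF

# Join the two listings into an hdisk -> volume group -> Symm device map
awk '
  NR == FNR { vg[$1] = $3; next }     # first file: lspv (hdisk, PVID, VG)
  {
    d = $1; sub("^/dev/r", "", d)     # /dev/rhdisk2 -> hdisk2
    if (d in vg) print d, vg[d], $2
  }
' /tmp/lspv.out /tmp/syminq.out > /tmp/aixmap.out
cat /tmp/aixmap.out
```

Saving a map like this before replication makes it straightforward to verify afterward that every disk of each Oracle volume group was copied.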

3. Once the Symmetrix volumes on the source host are identified, determine the relationship between the source volumes and the target volumes. Finding this relationship depends on the software used. The following commands are used to determine the relationships when TimeFinder/Mirror is used as the replication method:

symmir -g device_group query

4. The next step is for the target host to recognize the new devices. The following command scans the SCSI buses and examines all adapters and devices presented to the target system:

cfgmgr -v

Alternatively, the EMC command emc_cfgmgr (found in the /usr/lpp/EMC/Symmetrix/bin directory) may be executed to probe the SCSI buses for new devices after the Symmetrix ODM drivers have been installed.

5. Confirm presentation of the new devices by running the following commands:

lsdev -Cc disk
lspv
syminq

6. If EMC PowerPath is installed, use the following commands to place the new devices under PowerPath control and verify success:

powermt config
powermt display
powermt display dev=all

Once the devices are discovered by AIX, the next step is to import the volume groups. The key is to keep track of the PVIDs on the source system. The PVID is the physical volume identifier that uniquely identifies a volume across multiple AIX systems. When the volume is first included in a volume group, the PVID is assigned based on the host serial number and the timestamp. In this way, no two volumes should ever get the same PVID. However, array-based replication technologies copy everything on the disk, including the PVID.

7. On the production host, use the lspv command to list the physical volumes. Locate the PVID of any disk in the volume group being replicated. On the secondary host, run lspv as well. Locate the hdisk that corresponds to the PVID noted in the first step. Suppose the disk has the designation hdisk33. The volume group can now be imported using the command:

importvg -y vol_grp hdisk33
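The PVID matching in step 7 can be scripted. The lspv listings below are hypothetical samples (the volume group name oradatavg and the PVIDs are assumptions), and the final line only prints the importvg command rather than executing it.

```shell
# Hypothetical lspv output from the production host
cat > /tmp/lspv.src <<'EOF'
hdisk2 00c8d5f0a1b2c3d4 oradatavg active
EOF

# Hypothetical lspv output from the secondary host; the array copy
# carries the PVID over, so it appears under a different hdisk name
cat > /tmp/lspv.tgt <<'EOF'
hdisk33 00c8d5f0a1b2c3d4 None
hdisk34 00c8d5f0deadbeef None
EOF

# Find the PVID of a disk in the source volume group, then the target
# hdisk carrying the same PVID
pvid=$(awk '$3 == "oradatavg" { print $2; exit }' /tmp/lspv.src)
tgt=$(awk -v p="$pvid" '$2 == p { print $1; exit }' /tmp/lspv.tgt)

# Dry run: print the import command instead of executing it
echo importvg -y oradatavg "$tgt"
```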

8. The volume group descriptor area (VGDA) on every disk in the volume group has a list of the PVIDs of all the other disks in the volume group. During the import, the LVM tracks down all the disks using the information in the VGDA. To ensure that the volume group imported successfully, use the following command to list the volume groups:

lsvg

9. If PowerPath is in use on the target host, the import of the volume group must be executed using a powerdisk rather than an hdisk. The procedure otherwise is the same. When performing an lspv on the target host, locate the appropriate hdiskpower## associated with the PVID previously obtained. Then, issue the importvg command:

importvg -y vol_grp hdiskpower##

Once imported, the volume groups are automatically activated by AIX. The next step is to mount the file systems. If the UIDs and GIDs are not the same between the two hosts, run the chown command to change the ownership of the logical volumes to the DBA user and group that administer the server:

chown dbaadmin:dbagroup /dev/rlvname   (Character special device file)
chown dbaadmin:dbagroup /dev/lvname    (Block special device file)

10. The first time this procedure is performed, create mount points for the file systems if raw volumes are not used. The mount points should be made the same as the mount points for the production file systems.

AIX and BCV considerations

TimeFinder/Mirror uses BCVs, which are by default in the "defined" state to AIX. To change these volumes to the "available" state, execute the following command:

/usr/lpp/EMC/Symmetrix/bin/mkbcv -a

If the devices need to be placed in the "defined" state to AIX, use the following command:

/usr/lpp/EMC/Symmetrix/bin/rmbcv -a

HP-UX considerations

When presenting clone devices in an HP-UX environment to a host different from the one the production copy is running on, initial planning and documentation of the source host environment is first required. The following demonstrates the steps needed for the target host to discover and verify the disks, bring the new devices under PowerPath control if necessary, import the volume groups, and mount the file systems (if applicable).

1. Before presenting the new devices, it is useful to run the following commands on the target host and save the information to compare to output taken after the devices are presented:

vgdisplay -v | grep "Name"   (List all volume groups)
syminq                       (Find Symmetrix volume for each c#t#d#)

2. Another important step to complete before presenting the devices to the new host is to understand the association of volume groups and physical devices (/dev/rdsk/c#t#d#) on the source host since all disks in a volume group must be replicated together. Additionally, the relationship between the physical devices and the Symmetrix logical devices (hypervolumes) must be identified. The following commands list the volume groups on the host, identify for each Oracle volume group which physical devices are used, and show the relationship between physical devices and the Symmetrix devices.

vgdisplay -v | grep "VG Name"             (List all volume groups)
vgdisplay -v <vol_grp> | grep "PV Name"   (List PVs for a volume group)
syminq                                    (Find Symmetrix volume for each c#t#d#)

3. Create map files for each volume group to replicate. The Volume Group Reserve Area (VGRA) on disk contains descriptor information about all physical and logical volumes that make up a volume group. This information is used when a volume group is imported to another host. However, logical volume names are not stored on disk. When a volume group is imported, the host assigns a default logical volume name. To ensure that the logical volume names are imported correctly, a map file generated on the source is created for each volume group and used on the target host when the group is imported.

vgexport -v -p -m /tmp/vol_grp.map vol_grp
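Since a map file is needed for every volume group being replicated, the vgexport calls can be generated in a loop. This is a dry-run sketch: the group names are assumptions, and the echo would be removed to execute the commands on a real HP-UX host.

```shell
# Emit one vgexport map-file command per Oracle volume group (dry run)
for vg in oradatavg oralogvg; do
    echo vgexport -v -p -m "/tmp/${vg}.map" "$vg"
done > /tmp/vgexport.cmds
cat /tmp/vgexport.cmds
```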

4. Identify the Symmetrix device groups on the source host. The following commands, when run on the source host, list the device groups created and show their details:

symdg list                (Lists all device groups on the host)
symdg show device_group   (Shows specifics of a device group)

5. Once the Symmetrix volumes and applicable device groups on the source host are identified, identify the relationship between the source volumes and the target volumes. Finding this relationship depends on the replication software used. The following commands are used to determine the relationships when TimeFinder/Mirror is the replication method used:

symmir -g device_group query   (TimeFinder/Mirror)

6. After identifying the Symmetrix volumes used as targets, ensure that the target host recognizes these new devices. The following command scans the SCSI buses and examines all adapters and devices presented to the target system:

ioscan -fn

7. Create device special files for the volumes presented to the host:

insf -e

8. Confirm presentation of the new devices by running the following commands and comparing them to outputs found in step 1:

symcfg discover
syminq

9. If EMC PowerPath is installed, use the following commands to place the new devices under PowerPath control and verify success:

powermt config            (Configures the devices under PowerPath)
powermt display           (Displays the number of devices per path)
powermt display dev=all   (Displays all of the device detail info)

10. Once the devices are discovered by HP-UX, they need to be identified with their associated volume groups from the source host to be imported successfully. When using the vgimport command, specify all of the devices for the volume group to be imported. Since the target and LUN designations for the target devices are different from the source volumes, the exact devices must be identified using the syminq and symmir outputs. Source volume group devices are associated with Symmetrix source devices through a syminq output. Then Symmetrix device pairings from the source to target hosts are found from the symmir device group outputs. Finally, Symmetrix target volume to target host device pairings are made through the syminq output from the target host.

11. After identifying each of the new /dev/rdsk/c#t#d# devices and their associated volume groups, create the volume group structures needed to successfully import the volume groups onto the new host. A directory and group file for each volume group must be created before the volume group can be imported. Ensure that each volume group has a unique minor number.

ls -l /dev/*/group                           (Identify used minor numbers)
mkdir /dev/vol_grp
mknod /dev/vol_grp/group c 64 0xminor#0000   (minor# must be unique)
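Picking an unused minor number can be automated by scanning the existing group files. The ls -l excerpt below is a hypothetical sample (field 6 holds the minor number in 0xNN0000 form on HP-UX), and the mkdir/mknod commands are only printed, not executed.

```shell
# Hypothetical `ls -l /dev/*/group` output from the target host
cat > /tmp/groups.out <<'EOF'
crw-r----- 1 root sys 64 0x010000 Jan 1 12:00 /dev/vg00/group
crw-r----- 1 root sys 64 0x030000 Jan 1 12:00 /dev/vgora/group
EOF

# Collect the used minor bytes (01, 03) and find the lowest free one
used=$(awk '{ print substr($6, 3, 2) }' /tmp/groups.out)
next=1
while echo "$used" | grep -q "^$(printf '%02x' "$next")$"; do
    next=$((next + 1))
done

# Dry run: print the commands that would create the group file
printf 'mkdir /dev/vol_grp\nmknod /dev/vol_grp/group c 64 0x%02x0000\n' "$next"
```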

12. Import the volume groups onto the target host. Volume group information from the source host is stored in the Volume Group Reserve Area (VGRA) on each volume presented to the target host. Import each volume group by specifying a volume group name that is not already in use on the target.

vgimport -v -m vg_map_file vol_grp /dev/rdsk/c#t#d# [/dev/rdsk/c#t#d#]

where vg_map_file is the volume group map file created in step 3, vol_grp is the volume group name being imported, and c#t#d# are the devices in the specified volume group.

13. After importing the volume group, activate the volume group.

vgchange -a y vol_grp

14. Once the volume groups are activated, mount on the target any file systems from the source host. These file systems may require a file system check using fsck as well. Add an entry to /etc/fstab for each file system.

Linux considerations

Enterprise releases of Linux from Red Hat and SuSE provide a logical volume manager for grouping and managing storage. However, it is not common to use the logical volume manager on Linux. The technique deployed to present and use a copy of an Oracle database on a different host depends on whether the logical volume manager is used on the production host. To access the copy of the database on a secondary host, follow these steps:

1. Create a mapping of the devices that contain the database to file systems. This mapping information is used on the secondary host. The mapping can be performed by using the information in the /etc/fstab file and/or the output from the df command.

In addition, if the production host does not use logical volume manager, the output from syminq and symmir/symclone/symsnap command is required to associate the operating-system device names (/dev/sd<x>) with Symmetrix device numbers on the secondary host.
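The device-to-mount-point map from step 1 can be extracted directly from /etc/fstab. The fstab fragment below is a hypothetical sample, and the /u0* filter is an assumption about where the Oracle file systems live.

```shell
# Hypothetical /etc/fstab fragment from the production host
cat > /tmp/fstab.prod <<'EOF'
/dev/sdc1 /u02/oradata ext3 defaults 1 2
/dev/sdd1 /u03/oralog ext3 defaults 1 2
EOF

# Keep only the device -> mount point pairs for the database file systems
awk '$1 ~ "^/dev/sd" && $2 ~ "^/u0" { print $1, $2 }' \
    /tmp/fstab.prod > /tmp/dbmap.out
cat /tmp/dbmap.out
```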

2. Unlike other UNIX operating systems, Linux does not have a utility to rescan the SCSI bus. Any of the following methods allow a user to discover changes to the storage environment:

• Rebooting the Linux host

• Unloading and reloading the Fibre Channel or SCSI driver module

• Making changes to the /proc pseudo-file system to initiate a scan

Although rebooting the Linux host is a viable option, for most enterprise environments this is unacceptable. Unloading and reloading the Fibre Channel or SCSI driver module is possible only if the driver is not in use, making it a highly unreliable technique.

The Linux operating system presents all resources under management by means of a pseudo file system called /proc. This file system is a representation of in-memory data structures that are used by the operating system to manage hardware and software resources. The /proc file system is used to convey resource changes to the operating system. To initiate a scan of the SCSI bus, execute the following command on the secondary host:

echo "scsi scan-new-devices" > /proc/scsi/scsi

The devices representing the copy of the database should be available for use on the secondary host.

3. Confirm presentation of the new devices by running the following commands on the secondary host:

symcfg discover
syminq

4. If PowerPath is available on the secondary host, use the following command to place the new devices under the control of PowerPath:

powermt config

Verify the status of the devices by executing the following commands:

powermt display
powermt display dev=all

5. If logical volume manager is used on the production host, import the volume group definitions back on the secondary host. To do this, use the pvscan, vgimport, and vgchange commands as follows:

pvscan -novolumegroup
vgimport volume_group_name
vgchange -a y volume_group_name

The pvscan command displays all initialized devices that do not yet belong to a volume group. The command should display all members of the volume groups that constitute the copy of the database. The vgimport command imports the new devices and creates the appropriate LVM structures needed to access the data. If LVM is not used, this step can be skipped.

6. Once the volume groups, if any, are activated, mount on the target any file systems from the source host. If logical volume manager is not being used, execute syminq on the secondary host. The output documents the relationship between the operating system device names (/dev/sd<x>) and the Symmetrix device numbers associated with the copy of the database. The output from step 1 can be then used to determine the devices and the file systems that need to be mounted on the secondary host.

These file systems may require a file system check (using fsck) before they can be mounted. If one does not exist, add an entry to /etc/fstab for each file system.

Solaris considerations

When presenting replicated devices in a Solaris environment to a different host from the one production is running on, the first step is to scan the SCSI bus which allows the secondary Solaris system to recognize the new devices. The following steps cause the host to discover and verify the disks, bring the new devices under PowerPath control if necessary, import the disk groups, start the logical volumes, and mount the file systems (if applicable). The following commands assume that VERITAS Volume Manager (VxVM) is used for logical volume management.

1. Before presenting the new devices, run the following commands and save the information to compare to, after the devices are presented:

vxdisk list
vxprint -ht
syminq

2. Another important step to complete before presenting the devices to the new host is to understand which disk groups are associated with which physical devices on the source host. The following commands list the disk groups on the host, identify the physical devices used by each Oracle disk group, and show the relationship between the physical devices and the Symmetrix devices. Run these commands on the host prior to making any device changes. This is a precaution only and is to document the environment should it require a manual restore later.

vxdg list     (List all the disk groups)
vxdisk list   (List all the disks and associated groups)
syminq        (Find Symmetrix volume numbers for each Oracle disk)
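The disk-group-to-Symmetrix mapping gathered above can be captured with a short script. The vxdisk list and syminq excerpts below are hypothetical, simplified samples; real column layouts vary by VxVM and Solutions Enabler release.

```shell
# Hypothetical `vxdisk list` output: device, type, disk name, group, status
cat > /tmp/vxdisk.out <<'EOF'
c1t0d1s2 sliced disk01 oradg online
c1t0d2s2 sliced disk02 oradg online
EOF

# Hypothetical syminq output: raw device, Symmetrix device number
cat > /tmp/syminq.sol <<'EOF'
/dev/rdsk/c1t0d1s2 0051
/dev/rdsk/c1t0d2s2 0052
EOF

# Join into a device -> disk group -> Symm device map
awk '
  NR == FNR { dg[$1] = $4; next }       # first file: vxdisk list
  {
    d = $1; sub("^/dev/rdsk/", "", d)
    if (d in dg) print d, dg[d], $2
  }
' /tmp/vxdisk.out /tmp/syminq.sol > /tmp/solmap.out
cat /tmp/solmap.out
```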

3. Once the Symmetrix volumes on the source host are identified, determine the relationship between the source volumes and the target volumes. Finding this relationship depends on the software used. The following commands are used to determine the relationships when TimeFinder or SRDF are the replication methods used:

symmir -g device_group query   (TimeFinder)
symrdf -g device_group query   (SRDF)

4. The next step is for the target host to recognize the new devices. The following command scans the SCSI buses, examines all adapters and devices presented to the target system, and builds the information into the /dev directory for all LUNs found:

drvconfig; devlinks; disks

5. Confirm presentation of the new devices by running the following commands:

format
syminq

6. VERITAS needs to discover the new devices after the OS can see them. To make VERITAS discover new devices, enter:

vxdctl enable

7. If EMC PowerPath is installed, use the following commands to place the new devices under PowerPath control and verify success:

powermt config
powermt display
powermt display dev=all

8. Once VERITAS has found the devices, import the disk groups. The disk group name is stored in the private area of the disk. To import the disk group, enter:

vxdg -C import diskgroup

Use the -C flag to override the host ownership flag on the disk. The ownership flag on the disk indicates the disk group is online to another host. When this ownership bit is not set, the vxdctl enable command actually performs the import when it finds the new disks.

9. Run the following command to verify that the disk group imported correctly:

vxdg list

10. Activate the logical volumes for the disk groups:

vxvol -g diskgroup startall

11. For every logical volume in the disk group, fsck must be run to fix any incomplete file system units of work:

fsck -F vxfs /dev/vx/dsk/diskgroup/lvolname

12. Mount the file systems. If the UIDs and GIDs are not the same between the two hosts, run the chown command to change the ownership of the logical volumes to the DBA user and group that administer the server:

chown dbaadmin:dbagroup /dev/vx/dsk/diskgroup/lvolname
chown dbaadmin:dbagroup /dev/vx/rdsk/diskgroup/lvolname

13. The first time this procedure is performed, create mount points for the file systems, if raw volumes are not used. The mount points should be made the same as the mount points for the production file systems.

Windows considerations

To facilitate the management of volumes, especially those of a transient nature such as BCVs, EMC provides the Symmetrix Integration Utility (SIU). SIU provides the necessary functions to scan for, register, mount, and unmount BCV devices.

Within the Windows Server environment, logical units (LUNs) are displayed as PHYSICALDRIVE devices. Use the sympd list command to see the currently accessible devices on the BCV host, as shown in the following example:

Symmetrix ID: 000185500028

                Device Name           Directors      Device
--------------------------- ------------- ------------------------------------
                                                                          Cap
Physical                Sym SA :P DA :IT  Config    Attribute    Sts     (MB)
--------------------------- ------------- ------------------------------------
\\.\PHYSICALDRIVE4     0000 04B:0 01A:C5  2-Way Mir N/Grp'd VCM  WD        11
\\.\PHYSICALDRIVE5     0002 04B:0 02B:C0  RDF1+Mir  Grp'd        RW      8632
\\.\PHYSICALDRIVE6     000E 04B:0 15A:D1  RDF1+Mir  Grp'd (M)    RW     34526
\\.\PHYSICALDRIVE7     004F 04B:0 01A:D3  2-Way Mir N/Grp'd      RW      8632
\\.\PHYSICALDRIVE8     00AF 04B:0 16B:C4  2-Way Mir Grp'd (M)    RW     34526
\\.\PHYSICALDRIVE9     00B3 04B:0 01A:D4  2-Way Mir Grp'd (M)    RW     34526
\\.\PHYSICALDRIVE10    00B7 04B:0 16B:C5  2-Way Mir Grp'd (M)    RW     34526
\\.\PHYSICALDRIVE11    00BB 04B:0 01A:D5  2-Way Mir Grp'd (M)    RW     34526

Additionally, view the mapping of these physical drive devices to volume mount points using the Windows Disk Management console, as shown next.

Figure 107 Windows Disk Management console

The "Disk x" value represents the similarly numbered PHYSICALDRIVE device from the sympd command. Thus, Disk 7 is the same device as PHYSICALDRIVE7.
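The PHYSICALDRIVE-to-Symmetrix mapping can be pulled out of a saved sympd list report with a short filter. This is a sketch only, run here against two rows copied from the listing above; on a Windows host the same parsing could be done in any scripting shell.

```shell
# Two device rows saved from a `sympd list` report
cat > /tmp/sympd.out <<'EOF'
\\.\PHYSICALDRIVE7 004F 04B:0 01A:D3 2-Way Mir N/Grp'd RW 8632
\\.\PHYSICALDRIVE8 00AF 04B:0 16B:C4 2-Way Mir Grp'd (M) RW 34526
EOF

# Map each disk number to its Symmetrix device number
awk '$1 ~ "PHYSICALDRIVE" {
    n = $1; sub(".*PHYSICALDRIVE", "", n)   # keep only the drive number
    print "Disk " n " -> Symm dev " $2
}' /tmp/sympd.out > /tmp/pdmap.out
cat /tmp/pdmap.out
```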

As new devices such as BCVs are presented to a Windows server, the SIU can be used to scan for and register these devices. The rescan function of the symntctl utility will rediscover disk devices including newly visible BCV devices:

symntctl rescan

It is possible to use the SIU to manage the mount operations of BCV volumes by specifying the Symmetrix volume identifier with a mount operation:

symntctl mount -drive W: -sid 028 -symdev 055 -part 1

In the previous example, the -part 1 option specifies the partition on the LUN that is to be mounted, and is only required if multiple partitions exist on the device. As such, SIU will mount the volume with Symmetrix volume ID of 055 on Symmetrix 028 to the drive letter W.

Conversely, it is possible to unmount volumes using SIU. This is recommended prior to reestablishing the BCV to its source STD volume. For the unmount, only the drive letter is required.

symntctl unmount -drive W:

This command will unmount the volume from the drive letter and discard the Windows cache that relates to the volume. If any running application maintains an open handle to the volume, SIU will fail and report an error. The administrator should ensure that no applications are using any data from the required volume; proceeding with an unmount while processes have open handles is not recommended.

The SIU can identify those processes that maintain open handles to the specified drive, using the following command:

symntctl openhandle -drive W:

Using this command, it is possible to identify running processes that must be shut down or terminated before the unmount can proceed.

Windows Dynamic Disks

In general, avoid using the Dynamic Disk functionality provided by the Windows Server environment. This limited Dynamic Disk functionality does not provide the necessary API calls to adequately manage import and deport operations for the disk groups. Refer to the release notes for SIU to identify how to resolve this situation.

However, while all functionality may be unavailable, it is possible to implement some limited functionality when using base Dynamic Disk configurations. SIU will allow for mount and unmount operations against Dynamic Disk configurations, though it will be unable to import/deport the disk groups.

To import the disk group, use Windows Disk Management. The newly presented disk group will appear as a "Foreign" group, and may then be imported using the Disk Management interface. For specific implementation details and limitations, consult the Windows Help documentation provided by Microsoft.

Presenting database copies to the same host

A copy of a database in most cases is used on a different host from the one that owns the source database. The secondary host can then perform operations on the data independent of the primary host and without any conflicts or issues. In some circumstances, it is preferred to use the copy of the database on the same host that owns the source database, or to use two copies of the database on a secondary host. In these situations, care must be taken because OS environments can experience issues when they own disks that have identical signatures/descriptor areas. Each operating system and volume manager has its own unique way of dealing with these issues. The following sections describe how to manage duplicate disks in a single OS environment.

AIX considerations

When presenting database copies back to the same host in an AIX environment, one must deal with the fact that the OS now sees the source disk and an identical copy of the source disk. This is because the replication process copies not only the data part of the disk, but also the system part, which is known as the Volume Group Descriptor Area (VGDA). The VGDA contains the physical volume identifier (PVID) of the disk, which must be unique on a given AIX system.

The issue with duplicate PVIDs prevents a successful import of the copied volume group and has the potential to corrupt the source volume group. Fortunately, AIX provides a way to circumvent this limitation. AIX 4.3.3 SP8 and later provides the recreatevg command to rebuild the volume group from a supplied set of hdisks or powerdisks. Use syminq to determine the hdisks or powerdisks that belong to the volume group copy. Then, issue either of the two commands:

recreatevg -y replicavg_name -l lvrename.cfg hdisk## hdisk## hdisk## …
recreatevg -y replicavg_name -l lvrename.cfg hdiskpower## hdiskpower## hdiskpower## …

where the ## represents the disk numbers of the disks in the volume group. The recreatevg command gives each volume in the set of volumes a new PVID, and also imports and activates the volume group.
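The recreatevg invocation can be assembled from the disk list identified with syminq. This is a dry-run sketch: the disk and volume group names are assumptions, and the echo would be dropped to execute the command on a real AIX host.

```shell
# Disks of the BCV copy, as identified from syminq output (assumed names)
bcv_disks="hdisk21 hdisk22 hdisk23"

# Dry run: print the recreatevg command that would rebuild the group
# under a new name with new PVIDs
echo recreatevg -y replicavg -l lvrename.cfg $bcv_disks > /tmp/recreatevg.cmd
cat /tmp/recreatevg.cmd
```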

The lvrename.cfg file can be used to assign new alternative names to the logical volumes. If the file is not provided, AIX provides default LV names.

A successful recreatevg command varies on the volume group and performs JFS log replay if necessary for any cooked file systems. Default mount points for each file system are created in /etc/filesystems using the format /xx/oldmountpoint, where xx is specified with the -L parameter. If desired, mount points can be changed by editing /etc/filesystems. The command mount -a can be used to mount the file systems of the replicated database.

HP-UX considerations

Presenting database copies in an HP-UX environment to the same host as the production copy is nearly identical to the process used for presenting the copy to a different host. The primary differences are the need to use a different name for the volume groups and the need to change the volume group IDs on the disks.

1. Before presenting the new devices, it is useful to run the following commands on the target host and save the information to compare to outputs taken after the devices are presented:

vgdisplay -v | grep "Name"   (List all volume groups)
syminq                       (Find Symmetrix volume for each c#t#d#)

2. Another important step to complete before presenting the devices to the new host is to understand the association of volume groups and physical devices (/dev/rdsk/c#t#d#) on the source host since all disks in a volume group must be replicated together. Additionally, the relationship between the physical devices and the Symmetrix logical devices (hypervolumes) must be identified. The following commands list the volume groups on the host, identify for each Oracle volume group which physical devices are used, and show the relationship between physical devices and the Symmetrix devices:

vgdisplay -v | grep "VG Name"             (List all volume groups)
vgdisplay -v <vol_grp> | grep "PV Name"   (List PVs for a volume group)
syminq                                    (Find Symmetrix volume for each c#t#d#)

3. Create map files for each volume group to be replicated. The Volume Group Reserve Area (VGRA) on disk contains descriptor information about all physical and logical volumes that make up a volume group. This information is used when a volume group is imported to another host. However, logical volume names are not stored on disk. When a volume group is imported, the host assigns a default logical volume name. To ensure that the logical volume names are imported correctly, a map file generated on the source is created for each volume group and used on the target host when the group is imported.

vgexport -v -p -m /tmp/vol_grp.map vol_grp

4. Identify the Symmetrix device groups on the source host. The following commands, when run on the source host, list the device groups created and show their details:

symdg list                      (Lists all device groups)
symdg show device_group         (Shows specifics of a device group)

5. Once the Symmetrix volumes and applicable device groups on the source host are identified, identify the relationship between the source volumes and the target volumes. Finding this relationship depends on the replication software used. The following commands are used to determine the relationships when TimeFinder/Mirror is the replication method used:

symmir -g device_group query    (TimeFinder)

6. After identifying the Symmetrix volumes used as targets, ensure that the target host recognizes these new devices. The following command scans the SCSI buses and examines all adapters and devices presented to the target system:

ioscan -fn

7. Create device special files for the volumes presented to the host:

insf -e

8. Confirm presentation of the new devices by running the following commands and comparing their output to the output captured in step 1:

symcfg discover
syminq

9. If EMC PowerPath is installed, use the following commands to place the new devices under PowerPath control and verify success:


powermt config              (Configures the devices under PowerPath)
powermt display             (Displays the number of devices per path)
powermt display dev=all     (Displays all of the device detail info)

10. Once the devices are found by HP-UX, identify them with their associated volume groups from the source host so that they can be imported successfully. When using the vgimport command, specify all of the devices for the volume group to be imported. Since the target and LUN designations for the target devices are different from the source volumes, the exact devices must be identified using the syminq and symmir output. Source volume group devices can be associated with Symmetrix source devices through syminq output. Then Symmetrix device pairings from the source to target hosts are found from the symmir device group output. And finally, Symmetrix target volume to target host device pairings are made through the syminq output from the target host.

11. Change the volume group identifiers (VGIDs) on each set of devices making up each volume group. For each volume group, change the VGID on each device using the following:

vgchgid /dev/rdsk/c#t#d# [/dev/rdsk/c#t#d#] . . .

12. After changing the VGIDs for the devices in each volume group, create the volume group structures needed to successfully import the volume groups onto the new host. A directory and group file for each volume group must be created before the volume group is imported. Ensure each volume group has a unique minor number and is given a new name.

ls -l /dev/*/group                              (Identify used minor numbers)
mkdir /dev/newvol_grp
mknod /dev/newvol_grp/group c 64 0xminor#0000

(minor# must be unique)
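The search for a unique minor number can be automated. The sketch below parses ls -l output for the existing group files, where the minor number appears as 0x##0000, and prints the lowest unused value. The parsing assumes that layout, so verify it against your host before relying on it.

```shell
# Sketch: compute the next free volume group minor number from
# "ls -l /dev/*/group" output read on stdin.
next_minor() {
    # Pull out the two hex digits between "0x" and "0000".
    used=`sed -n 's/.*0x\([0-9a-fA-F][0-9a-fA-F]\)0000.*/\1/p' | sort -u`
    i=0
    while [ $i -lt 256 ]; do
        hex=`printf '%02x' $i`
        if ! echo "$used" | grep -q "^${hex}\$"; then
            echo "$hex"
            return 0
        fi
        i=$((i + 1))
    done
    return 1   # all 256 minor numbers in use
}

# Usage on the target host might be:
#   minor=`ls -l /dev/*/group | next_minor`
#   mknod /dev/newvol_grp/group c 64 0x${minor}0000
```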

13. Import the volume groups onto the target host. Volume group information from the source host is stored in the VGRA on each volume presented to the target host. Import each volume group by specifying a volume group name that is not already in use on the target.

vgimport -v -m vg_map_file vol_grp /dev/rdsk/c#t#d# [/dev/rdsk/c#t#d#]


where vg_map_file is the volume group map file created in step 3, vol_grp is the volume group name being imported, and c#t#d# is the device in the specified volume group.

14. After importing the volume groups, activate them:

vgchange -a y vol_grp

15. Once the volume groups are activated, mount on the target any file systems from the source host. These file systems may require a file system check using fsck as well. An entry should be made to /etc/fstab for each file system.

Linux considerations

Presenting database copies back to the same Linux host is possible only if the production volumes are not under the control of the logical volume manager. The Linux logical volume manager does not have a utility such as vgchgid to modify the UUID (universally unique identifier) written in the private area of the disk.

For an Oracle database not under LVM management, the procedure to import and access a copy of the production data on the same host is similar to the process for presenting the copy to a different host. The following steps are required:

1. Execute syminq and symmir/symclone/symsnap to determine the relationship between the Linux device name (/dev/sd<x>), the Symmetrix device numbers that contain the production data, and the Symmetrix device numbers that hold the copy of the production data. In addition, note the mount points for the production devices as listed in /etc/fstab and the output from the command df.

2. Initiate a scan of the SCSI bus by running the following command as root:

echo "scsi scan-new-devices" > /proc/scsi/scsi

3. If PowerPath is in use on the production host, bring the new devices under PowerPath control:

powermt config

Verify the status of the devices:

powermt display
powermt display dev=all


4. Use the syminq or sympd list commands to display, in addition to the production devices, the devices that contain the database copy. Note the Linux device name associated with the newly added devices.

5. Using the mapping information collected in step 1, mount the file systems on the new devices to the appropriate mount points. Note that the copy of the data has to be mounted at a different location since the production database volumes are still mounted and accessible on the production host. For ease of management, it is recommended to create a directory structure similar to the production devices.

For example, assume that the production database consists of one Symmetrix device, 000, accessed by the operating system through device name /dev/sdb1 and mounted at /u01. Furthermore, assume that the copy of the database is available on Symmetrix device 100 with device name /dev/sdc1. In this case, /dev/sdc1 would be mounted at mount point /copy/u01.

The total number of hard disks that can be presented to a Linux host varies from 128 to 256 depending on the version of the Linux kernel. When presenting copies of a database back to the production host, ensure that the total number of devices does not exceed this limit.
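The mount step above can be wrapped in a small helper that creates the parallel /copy mount point, runs a report-only fsck (a split copy is crash-consistent, not cleanly unmounted), and then mounts the replica. This is a sketch under assumptions: the device and mount point names are illustrative, and the fsck -n flag should be checked against your filesystem type.

```shell
# Sketch: mount a replica device at a parallel mount point under /copy.
mount_copy() {
    dev=$1    # replica device, e.g. /dev/sdc1 (illustrative)
    mnt=$2    # parallel mount point, e.g. /copy/u01 (illustrative)
    mkdir -p "$mnt"      || return 1
    fsck -n "$dev"       || return 1   # report-only check of the copy
    mount "$dev" "$mnt"  || return 1
    echo "mounted $dev at $mnt"
}
```

A call such as `mount_copy /dev/sdc1 /copy/u01` then mirrors the production directory structure under /copy, as recommended above.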

Solaris considerations

Presenting database copies to a Solaris host using VERITAS Volume Manager, where the host can see the individual volumes from the source volume group, is not supported other than with Replication Manager. Replication Manager provides "production host" mount capability for VERITAS.

The problem is that the VERITAS Private Area on both the source and target volumes is identical. A vxdctl enable finds both volumes and gets confused as to which are the source and target.

To get around this problem, the copied volume needs to be processed with a vxdisk init command, which re-creates the private area. Then a vxmake operation, using a map file created on the source volume with vxprint -hvmpsQq -g dggroup, can be used to rebuild the volume group structure after all the c#t#d# numbers are changed from the source disks to the target disks. This process is risky and difficult to script and maintain, and is not recommended by EMC.

Windows considerations

The only difference for Windows when bringing back copies of volumes to the same Windows server is that duplicate volumes or volumes that appear to be duplicates are not supported in a cluster configuration.


D

Sample Database Cloning Scripts

This appendix presents the following topic.

◆ Sample script to replicate a database


Sample script to replicate a database

The following example shows a Korn shell script where the requirements are to replicate the Oracle database onto a different host than the primary database using TimeFinder functionality. In this case, BCVs on the Symmetrix array are established to the primary database volumes. After the establish is complete, the tablespace names are found and each is put into hot backup mode. The BCVs are then split from the standards. Two device groups, DATA_DG and LOG_DG, are used to split the log information separately from the rest of the Oracle data.

The main script is called main.ksh, which contains callouts to perform all the additional required tasks.

#!/bin/ksh
#############################################################
# Main script - Note: This assumes that the device groups
# DATA_DG and LOG_DG are already created.
#############################################################

#############################################################
# Define Variables
#############################################################
ORACLE_SID=oratest
export ORACLE_SID
ORACLE_HOME=/oracle/oracle10g
export ORACLE_HOME

SCR_DIR=/opt/emc/scripts
CLI_DIR=/usr/symcli/bin

DATA_DG=data_dg
LOG_DG=logs_dg
#############################################################

#############################################################
# Establish the BCVs for each device group
#############################################################

${SCR_DIR}/establish.ksh
RETURN=$?
if [ $RETURN != 0 ]; then
    exit 1
fi


#############################################################
# Get the tablespace names using sqlplus
#############################################################

su - oracle -c ${SCR_DIR}/get_tablespaces_sub.ksh
RETURN=$?
if [ $RETURN != 0 ]; then
    exit 2
fi

#############################################################
# Put the tablespaces into hot backup mode
#############################################################

su - oracle -c ${SCR_DIR}/begin_hot_backup_sub.ksh

#############################################################
# Split the DATA_DG device group
#############################################################

${SCR_DIR}/split_data.ksh
RETURN=$?
if [ $RETURN != 0 ]; then
    exit 3
fi

#############################################################
# Take the tablespaces out of hot backup mode
#############################################################

su - oracle -c ${SCR_DIR}/end_hot_backup_sub.ksh

#############################################################
# Split the LOG_DG device group
#############################################################

${SCR_DIR}/split_log.ksh
RETURN=$?
if [ $RETURN != 0 ]; then
    exit 4
fi

echo "Script appeared to work successfully"
exit 0
=================================================================

#!/bin/ksh

#############################################################
# establish.ksh
# This script initiates a BCV establish for the $DATA_DG
# and $LOG_DG device groups on the Production Host.


############################################################

#############################################################
# Define Variables
#############################################################

CLI_BIN=/usr/symcli/bin

DATA_DG=data_dg
LOG_DG=log_dg

#############################################################
# Establish the DATA_DG and LOG_DG device groups
#############################################################

${CLI_BIN}/symmir -g ${DATA_DG} -noprompt establish
RETURN=$?
if [ $RETURN != 0 ]; then
    ERROR_DATE=`date`
    echo "Establish failed for Device Group ${DATA_DG}!!!"
    echo "Script Terminating."
    echo
    echo "establish: failed"
    echo "$ERROR_DATE: establish: failed to establish ${DATA_DG}"
    exit 1
fi

${CLI_BIN}/symmir -g ${LOG_DG} -noprompt establish
RETURN=$?
if [ $RETURN != 0 ]; then
    ERROR_DATE=`date`
    echo "Establish failed for Device Group ${LOG_DG}!!!"
    echo "Script Terminating."
    echo
    echo "establish: failed"
    echo "$ERROR_DATE: establish: failed to establish ${LOG_DG}"
    exit 2
fi

#############################################################
# Cycle ${CLI_BIN}/symmir query for status
#############################################################

RETURN=0
while [ $RETURN = 0 ]; do
    ${CLI_BIN}/symmir -g ${LOG_DG} query | grep SyncInProg > /dev/null
    RETURN=$?
    REMAINING=`${CLI_BIN}/symmir -g ${LOG_DG} query | grep MB | awk '{print $3}'`
    echo "$REMAINING MBs remain to be established."
    echo
    sleep 10
done

RETURN=0
while [ $RETURN = 0 ]; do
    ${CLI_BIN}/symmir -g ${DATA_DG} query | grep SyncInProg > /dev/null
    RETURN=$?
    REMAINING=`${CLI_BIN}/symmir -g ${DATA_DG} query | grep MB | awk '{print $3}'`
    echo "$REMAINING MBs remain to be established."
    echo
    sleep 10
done

exit 0

=================================================================
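The two polling loops in establish.ksh (and the similar loops in the split scripts below) can be factored into one helper. This is a sketch only: the query command, state string, and poll interval are passed in as parameters, and it relies on the same SyncInProg/SplitInProg keyword and MB progress line that the loops above already grep for.

```shell
# Sketch: poll a query command until the given state string disappears.
wait_for_state_clear() {
    query_cmd=$1   # e.g. "${CLI_BIN}/symmir -g ${LOG_DG} query"
    state=$2       # e.g. SyncInProg or SplitInProg
    interval=$3    # seconds between polls
    while $query_cmd | grep "$state" > /dev/null; do
        remaining=`$query_cmd | grep MB | awk '{print $3}'`
        echo "$remaining MBs remain."
        sleep "$interval"
    done
}

# establish.ksh could then replace both loops with:
#   wait_for_state_clear "${CLI_BIN}/symmir -g ${LOG_DG} query" SyncInProg 10
#   wait_for_state_clear "${CLI_BIN}/symmir -g ${DATA_DG} query" SyncInProg 10
```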

#!/bin/ksh

#############################################################
# get_tablespaces_sub.ksh
# This script queries the Oracle database and returns with
# a list of tablespaces, which is then used to identify
# which tablespaces need to be placed into hot backup mode.
#############################################################

#############################################################
# Define Variables
#############################################################

SCR_DIR=/opt/emc/scripts

#############################################################
# Get the tablespace name using sqlplus
#############################################################

sqlplus internal <<EOF > /dev/null
set echo off;
spool ${SCR_DIR}/tablespaces.tmp;
select tablespace_name from dba_tablespaces;
spool off;
exit
EOF

#############################################################
# Remove extraneous text from spool file
#############################################################

cat ${SCR_DIR}/tablespaces.tmp | grep -v "TABLESPACE_NAME" | \
    grep -v "-" | grep -v "rows selected." \
    > ${SCR_DIR}/tablespaces.txt

#############################################################
# Verify the creation of the tablespace file
#############################################################

if [ ! -s ${SCR_DIR}/tablespaces.txt ]; then
    exit 1
fi

exit 0
=================================================================

#!/bin/ksh

#############################################################
# begin_hot_backup_sub.ksh
# This script places the Oracle database into hot backup
# mode.
#############################################################

#############################################################
# Define Variables
#############################################################

SCR_DIR=/opt/emc/scripts

#############################################################
# Do a log switch
#############################################################

sqlplus internal <<EOF
alter system archive log current;
exit
EOF

#############################################################
# Put all tablespaces into hot backup mode
#############################################################

TABLESPACE_LIST=`cat ${SCR_DIR}/tablespaces.txt`

for TABLESPACE in $TABLESPACE_LIST; do
sqlplus internal <<EOF
alter tablespace ${TABLESPACE} begin backup;
exit
EOF
done

exit 0

=================================================================
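The loop above opens a separate sqlplus session for every tablespace. An alternative sketch generates all of the BEGIN BACKUP statements first and runs them in a single session. Note also that connecting as internal was removed in Oracle 9i; on later releases the connection would be "/ as sysdba" instead, as shown in the hypothetical usage comment.

```shell
# Sketch: emit one ALTER TABLESPACE statement per tablespace name read
# from stdin, so all statements can run in a single sqlplus session.
make_begin_backup_sql() {
    while read ts; do
        [ -n "$ts" ] && echo "alter tablespace ${ts} begin backup;"
    done
}

# Hypothetical usage (Oracle 9i or later):
#   make_begin_backup_sql < ${SCR_DIR}/tablespaces.txt | sqlplus -s "/ as sysdba"
```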


#!/bin/ksh

#############################################################
# split_data.ksh
# This script initiates a Split for the $DATA_DG Device
# group on the Production Host.
#############################################################

#############################################################
# Define Variables
#############################################################

CLI_BIN=/usr/symcli/bin
DATA_DG=data_dg

#############################################################
# Split the DATA_DG device group
#############################################################

${CLI_BIN}/symmir -g ${DATA_DG} -noprompt -instant split
RETURN=$?
if [ $RETURN != 0 ]; then
    ERROR_DATE=`date`
    echo "Split failed for Device Group ${DATA_DG}!!!"
    echo "It is not safe to continue..."
    echo "Script Terminating."
    echo
    echo "split_data: failed"
    echo "$ERROR_DATE: split_data: failed to split ${DATA_DG}"
    exit 1
fi

#############################################################
# Cycle ${CLI_BIN}/symmir query for status
#############################################################

RETURN=0
while [ $RETURN = 0 ]; do
    ${CLI_BIN}/symmir -g ${DATA_DG} query | grep SplitInProg > /dev/null
    RETURN=$?
    REMAINING=`${CLI_BIN}/symmir -g ${DATA_DG} query | grep MB | awk '{print $3}'`
    echo "$REMAINING MBs remain to be split."
    echo
    sleep 5
done

exit 0
=================================================================


#!/bin/ksh

#############################################################
# end_hot_backup_sub.ksh
# This script ends the hot backup mode for the Oracle
# database. The script is initiated by the end_hot_backup
# scripts.
#############################################################

#############################################################
# Define Variables
#############################################################

SCR_DIR=/opt/emc/scripts

#############################################################
# Take all tablespaces out of hot backup mode
#############################################################

TABLESPACE_LIST=`cat ${SCR_DIR}/tablespaces.txt`

for TABLESPACE in $TABLESPACE_LIST; do
sqlplus internal <<EOF
alter tablespace ${TABLESPACE} end backup;
exit
EOF
done

#############################################################
# Do a log switch
#############################################################

sqlplus internal <<EOF
alter system archive log current;
exit
EOF

exit 0

=================================================================

#!/bin/ksh

#############################################################
# split_log.ksh
# This script initiates a Split for the $LOG_DG Device
# group on the Production Host.
#############################################################

#############################################################
# Define Variables
#############################################################


CLI_BIN=/usr/symcli/bin
LOG_DG=log_dg

#############################################################
# Split the LOG_DG device group
#############################################################

${CLI_BIN}/symmir -g ${LOG_DG} -noprompt -instant split
RETURN=$?
if [ $RETURN != 0 ]; then
    ERROR_DATE=`date`
    echo "Split failed for Device Group ${LOG_DG}!!!"
    echo "It is not safe to continue..."
    echo "Script Terminating."
    echo
    echo "split_log: failed"
    echo "$ERROR_DATE: split_log: failed to split ${LOG_DG}"
    exit 1
fi

#############################################################
# Cycle ${CLI_BIN}/symmir query for status
#############################################################

RETURN=0
while [ $RETURN = 0 ]; do
    ${CLI_BIN}/symmir -g ${LOG_DG} query | grep SplitInProg > /dev/null
    RETURN=$?
    REMAINING=`${CLI_BIN}/symmir -g ${LOG_DG} query | grep MB | awk '{print $3}'`
    echo "$REMAINING MBs remain to be split."
    echo
    sleep 5
done

exit 0
=================================================================


E

Solutions Enabler Command Line Interface (CLI) for FAST VP Operations and Monitoring

This appendix presents the following topic.

◆ Overview


Overview

This appendix describes the Solutions Enabler command line interface (CLI) commands that can be used to configure and monitor FAST VP operations. All such operations can also be executed using the SMC GUI. Although there are command line counterparts for the majority of the SMC-based operations, the focus here is to show only some basic tasks for which operators may want to use the CLI.

Enabling FAST

Operation: Enable or disable FAST operations.

Command:

symfast -sid <Symm ID> enable | disable

Gathering detailed information about a Symmetrix thin pool

Operation: Show the detailed information about a Symmetrix thin pool.

Command:

symcfg show -pool FC_Pool -sid <Symm ID> -detail -thin

Sample output:

Symmetrix ID      : 000192601262
Pool Name         : FC_Pool
Pool Type         : Thin
Dev Emulation     : FBA
Dev Configuration : RAID-5(3+1)
Pool State        : Enabled
...
Enabled Devices(20):  <== Number of Enabled Data Devices (TDAT) in the Thin Pool
{
 ------------------------------------------------------
 Sym        Total      Alloc       Free  Full
 Dev       Tracks     Tracks     Tracks   (%)  State
 ------------------------------------------------------
 00EA     1649988     701664     948324    42  Enabled
 00EB     1649988     692340     957648    41  Enabled
 ...


}
Pool Bound Thin Devices(20):  <== Number of Bound Thin Devices (TDEV) in the Thin Pool
{
 -----------------------------------------------------------------------
                              Pool            Pool
            Total      Subs   Allocated       Written
 Sym Dev   Tracks       (%)   Tracks    (%)   Tracks    (%)   Status
 -----------------------------------------------------------------------
 0162     1650000         5   1010940    61   1291842    78   Bound

Checking distribution of thin device tracks across FAST VP tiers

Operation: Listing the distribution of thin device extents across FAST VP tiers that are part of a FAST VP policy associated with the storage group containing the thin devices.

Command:

symcfg -sid <Symm ID> list -tdev -range 0162:0171 -detail

Sample output:

Symmetrix ID: 000192601262

Enabled Capacity (Tracks) : 363777024
Bound Capacity   (Tracks) : 26400000

               S Y M M E T R I X   T H I N   D E V I C E S
-------------------------------------------------------------------------------
                                     Pool            Pool
                  Flags     Total   Subs  Allocated       Written
Sym  Pool Name    EM       Tracks    (%)  Tracks    (%)   Tracks    (%)  Status
---- ------------ ----- --------- ------ --------- ---  --------- ---   ------
0162 FC_Pool      FX      1650000      5   1010940  61    1291842  78    Bound
     EFD_Pool     --            -      -    259212  16          -   -    -
     SATA_Pool    --            -      -     21732   1          -   -    -

This shows that Symmetrix thin device 0162 has thin device extents spread across data devices in FC_Pool, EFD_Pool, and SATA_Pool.

. . .


0171 FC_Pool      FX      1650000      5      3720   0    1505281  91    Bound
     EFD_Pool     --            -      -      2040   0          -   -    -
     SATA_Pool    --            -      -   1499184  91          -   -    -

Legend:
  Flags:
    (E)mulation : A = AS400, F = FBA, 8 = CKD3380, 9 = CKD3390
    (M)ultipool : X = multi-pool allocations, . = single pool allocation
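For monitoring, the per-pool allocated tracks in a listing like the one above can be summed with a short awk filter, for example to watch a set of thin devices' extents migrate between tiers over time. This is a sketch under assumptions: the field positions match the sample layout shown above (device rows have ten fields, pool continuation rows have nine), and the input should be pre-filtered to just those rows.

```shell
# Sketch: sum "Pool Allocated Tracks" per pool from symcfg list -tdev
# -detail style rows read on stdin.
sum_allocated_by_pool() {
    awk '
        NF == 10 { alloc[$2] += $6 }   # device row: pool in $2, allocated in $6
        NF == 9  { alloc[$1] += $5 }   # continuation row: pool in $1, allocated in $5
        END { for (p in alloc) printf "%s %d\n", p, alloc[p] }
    ' | sort
}
```

Fed the two sample devices shown above (0162 and 0171), this reports FC_Pool 1014660, EFD_Pool 261252, and SATA_Pool 1520916 allocated tracks.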

Checking the storage tiers allocation

Operation: Listing the current allocation of defined storage tiers on a Symmetrix.

Command:

symtier list -vp

Sample output:

Symmetrix ID : 000192601262

--------------------------------------------------------------------
                                       I   Logical Capacities (GB)
                                       n   -------------------------
Tier Name             Tech Protection  c    Enabled     Free     Used
--------------------- ---- ----------- -   --------  -------  -------
EFD_Tier              EFD  RAID-5(7+1) S       2566     2565        1
FC_Tier               FC   RAID-5(3+1) S       4028     2814     1214
SATA_Tier             SATA RAID-5(3+1) S       2566     1435     1131

This shows that the Symmetrix has three tiers defined (EFD_Tier, FC_Tier, and SATA_Tier), along with their enabled, free, and used capacities.

Legend:
  Inc Type : S = Static, D = Dynamic
