Page 1: 2K04HP IT-Symposium 2006 2 Page 2 Automatic Failover across sites with Data Guard Fast-Start Failover DECUS, Duesseldorf 2006 Larry M. Carpenter HP IT-Symposium 2006 3 Page 3 Agenda

HP IT-Symposium 2006

www.decus.de 1

Page 1

“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”

This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in this document remains at the sole discretion of Oracle. This document in any form, software or printed matter, contains proprietary information that is the exclusive property of Oracle. This document and information contained herein may not be disclosed, copied, reproduced or distributed to anyone outside Oracle without prior written consent of Oracle. This document is not part of your license agreement nor can it be incorporated into any contractual agreement with Oracle or its subsidiaries or affiliates.

Automatic Failover across sites with Data Guard Fast-Start Failover

DECUS, Duesseldorf 2006

Larry M. Carpenter
Principal Product Manager, High Availability & Disaster Recovery
Oracle USA

Agenda

• A quick look at Data Guard
• How do users perform Failover today?
• Fast-Start Failover – An Overview
• Fast-Start Failover – The Details
• Client Failover
• User Experiences

A quick look at Data Guard

What is Oracle Data Guard?

• Oracle’s Disaster Recovery solution for Oracle.
• A Feature of Oracle Database Enterprise Edition
• Automates the creation and maintenance of one or more transactionally consistent Standby databases.
• Provides comprehensive role management.
  • Role transitions
    • Standby to Primary and back to Standby
    • For planned and unplanned outages

A Data Guard Configuration

• Managed as a single configuration
• Primary and standby databases can be Real Application Clusters or single-instance Oracle
• Up to nine standby databases supported in a single configuration

[Diagram: a Primary Site with the primary database, Standby Site A and Standby Site B each with a standby database, and the Broker]

How do users perform Failover today?

Failover Implications

• Faster is better - Downtime is bad
  • If manual intervention is required, the time it takes to notify administrative staff can be lengthy
• Reliability is a must-have
  • Correct procedure for failover must be followed to meet data loss (recovery point) objective
• Simplicity is preferred
  • Determining if failure condition warrants failover adds time & complexity to the failover process

Best Practice: Add Standby Redo Logs

• A separate pool of log file groups on a standby site
• Used just like the online redo logs on a primary
• Requires local archiving on the standby database
• Must be the same size as the primary database online redo logs, with at least as many groups; more is better
• Cannot be assigned to a thread in 9i
• Are required for Zero Data Loss configurations as of Oracle Database 10g Release 1.
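
As a minimal sketch of the practice above, standby redo logs can be added with ALTER DATABASE ADD STANDBY LOGFILE and checked in V$STANDBY_LOG. The group numbers, file paths, and the 50M size are illustrative assumptions, not values from this presentation; the size should match the online redo logs of the primary.

```sql
-- Run on the standby database (and usually on the primary as well,
-- so that it is ready to act as a standby after a role transition).
-- Group numbers, paths, and size below are hypothetical.
ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 ('/u01/oradata/payroll/srl04.log') SIZE 50M;
ALTER DATABASE ADD STANDBY LOGFILE GROUP 5 ('/u01/oradata/payroll/srl05.log') SIZE 50M;
ALTER DATABASE ADD STANDBY LOGFILE GROUP 6 ('/u01/oradata/payroll/srl06.log') SIZE 50M;

-- Verify that the standby redo log groups exist and check their size
SELECT GROUP#, THREAD#, BYTES, STATUS
  FROM V$STANDBY_LOG;
```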

SRL Architecture

[Diagram: redo from the primary database (LGWR and ARCH transport) is received by RFS on the standby; standby redo logs, ARCH, archived redo logs, physical and logical standby databases; parts of the diagram are marked "New! 10g"]

Benefits

• Better Performance
  • Standby redo logs are pre-allocated files
  • Can reside on raw devices
• Better Protection
  • Can have multiple members
  • If primary database failure occurs, redo data written to standby redo logs can be fully recovered.

Failover

• Failover needed when switchover not possible
  • i.e. The primary is gone!
• Same basic steps as switchover but some processing might be done manually
• Remember!
  • Don’t plan for DR by expecting to be able to return to the Primary and ‘get’ something.
  • You won’t be able to ‘get’ anything.

Choose a Standby

• Choose a standby site with the most up to date redo information
  • If Primary was ‘protected’ then one site must have the ‘latest’ redo information in its Standby On-line Redo logs
  • Archive Log file ‘GAPs’ at this site must be resolved from the other standby sites
• Choose a physical standby, and the other standby databases will come along if possible.
• Choose a logical standby, and none of the other standby databases can come along.
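
One way to compare candidates is to run the same queries on each physical standby and pick the one with the highest applied sequence and no outstanding gap. This is only a sketch of that check using the standard V$ARCHIVED_LOG and V$ARCHIVE_GAP views; it does not account for redo that is still sitting in standby redo logs.

```sql
-- Highest archived log sequence already applied, per redo thread
SELECT THREAD#, MAX(SEQUENCE#) AS LAST_APPLIED_SEQUENCE
  FROM V$ARCHIVED_LOG
 WHERE APPLIED = 'YES'
 GROUP BY THREAD#;

-- Archive log gaps this standby still needs to have resolved
SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE#
  FROM V$ARCHIVE_GAP;
```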

Physical Standby Failover

Failover to a Physical Standby

[Diagram: primary database and physical standby database]

1. ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
2. ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
3. Restart the database

Step 1 Improvements

• In Oracle Database 10g Release 2 (10.2.0.1)
  • New FORCE keyword
    • RECOVER MANAGED STANDBY DATABASE FINISH FORCE;
  • The new FORCE option stops active RFS processes on the target standby database so the failover will proceed immediately, without waiting for network connections to time out, once logs have been applied.
  • SQLNET.EXPIRE_TIME no longer necessary.

Step 3 Improvements

• In Oracle Database 10g Release 2 (10.2.0.1)
  • No longer necessary to restart the Standby for it to become the Primary, just do an:
    • ALTER DATABASE OPEN;
  • Requires that the standby was not opened read only since it was last started.
  • Speeds up failover time considerably.

Failover to a Physical Standby (10.2.0.1)

[Diagram: primary database and physical standby database]

1. ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE;
2. ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
3. ALTER DATABASE OPEN;

Logical Standby Failover

Failover to a Logical Standby

[Diagram: primary database and logical standby database]

1. ALTER DATABASE STOP LOGICAL STANDBY APPLY;
2. ALTER DATABASE START LOGICAL STANDBY APPLY FINISH;
3. ALTER DATABASE STOP LOGICAL STANDBY APPLY;
4. ALTER DATABASE ACTIVATE LOGICAL STANDBY DATABASE;

Reduced Steps

• In Oracle Database 10g Release 2 (10.2.0.1)
  • Apply finish and failover now in one command.
  • No longer necessary to start or stop the Apply.

Failover to a Logical Standby (10.2.0.1)

[Diagram: primary database and logical standby database]

1. ALTER DATABASE ACTIVATE LOGICAL STANDBY DATABASE FINISH APPLY;

Using the Data Guard Broker

One Step always!

• Login to DGMGRL by connecting to any surviving database in the configuration

• And execute the failover!
  • DGMGRL> FAILOVER TO <database>;

• You still need to decide which standby to use as the target of the failover!

What about Grid Control?

Fast-Start Failover

Eliminate the Manual Steps!

Remember the Requirements?

• Faster is better - Downtime is bad?
  • Site failover time measured in seconds
    • Not minutes
  • Failover is automatic, no manual intervention
• Reliability is a must-have?
  • Eliminates human error
• Simplicity is preferred?
  • Automatically determines if failover criteria met
  • Original primary database is automatically reinstated as a new standby database.

Fast-Start Failover Architecture

• Primary Database

• Target Standby Database

• Observer Process

[Diagram: a database at the Primary Site, a database at the Standby Site, and the Observer]

Fast-Start Failover

The Details

Fast-Start Failover Requirements

• Primary and Standby are managed by the Data Guard Broker

• Primary database must be in Maximum Availability mode

• Primary and standby must have Flashback Database enabled

• Observer host must have DGMGRL utility installed and must have Oracle Net connectivity to both the primary and standby
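
Two of these requirements can be checked from SQL before Fast-Start Failover is enabled. The query below is a sketch that reads standard V$DATABASE columns on the primary and on the target standby; it does not verify Broker management or Observer connectivity.

```sql
-- Expect PROTECTION_MODE = 'MAXIMUM AVAILABILITY' and FLASHBACK_ON = 'YES'
SELECT DATABASE_ROLE,
       PROTECTION_MODE,
       PROTECTION_LEVEL,
       FLASHBACK_ON
  FROM V$DATABASE;

-- If FLASHBACK_ON is 'NO', Flashback Database can be enabled while the
-- database is mounted (a flash recovery area and ARCHIVELOG mode are needed):
-- ALTER DATABASE FLASHBACK ON;
```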

Setting it up using Grid Control

Using the Broker Directly

DGMGRL command line interface

Set the Target and Threshold

• Configure
  • “FastStartFailoverTarget” is the “DB_UNIQUE_NAME” of the target standby database. Using DGMGRL:
      DGMGRL> EDIT DATABASE 'North_Sales' SET PROPERTY FastStartFailoverTarget = 'DR_Sales';
  • “FastStartFailoverThreshold” is the number of seconds the Observer attempts to reconnect to the primary database before initiating fast-start failover
      DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 45;

Enable and Startup

• Enable
  • Can be done before or after the Observer is started
  • DGMGRL> ENABLE FAST_START FAILOVER;
• Start
  • Log in to the Observer host
  • DGMGRL> START OBSERVER;
  • Control is not returned to the user until the observer is stopped
  • Specify the -logfile parameter on the DGMGRL command line so that output generated while acting as the observer is not lost.

Post Failover

• Reinstate After Failover
  • Auto reinstatement of old primary as a new standby will happen when the original Primary database is available again.
  • Can also be performed manually
    • DGMGRL> REINSTATE DATABASE <database>;

How does it work?

Fast-Start Failover

[Diagram: Primary Site, Standby Site, and Observer]

1. Data Guard in steady state – transmitting redo
2. Observer monitoring state of the configuration

3. Disaster strikes the primary – connections lost
4. Observer <=> primary connection times out (timeout threshold configurable)
5. Observer asks target standby if it is ready to fail over
6. Observer begins Fast-Start Failover

7. Target standby automatically becomes new primary
8. After old primary is repaired, Observer re-establishes connection
9. Observer automatically reinstates old primary to be a new standby
10. Redo transmission starts from new primary to new standby

When is a Fast-Start Failover Triggered?

• Primary Site Failure
• Primary Database Conditions:
  • Instance Failure
    • Last surviving instance if RAC
  • Shutdown abort of the last available instance
  • Datafiles taken offline due to I/O errors
    • Threshold ignored when performing a failover due to offline datafiles

When is a Fast-Start Failover Triggered? (continued)

• Network Related Conditions:
  • Failover occurs only if link between primary and observer as well as primary and standby are down
  • Requires a connection between Observer and standby to enable the Observer to confirm that the configuration is in a synchronized state
  • By ensuring that at least two fast-start failover partners are present, conditions such as split-brain scenarios are avoided

Fast-Start Failover Monitoring

• Monitor current state of configuration via FS_FAILOVER_STATUS column of V$DATABASE
  • SYNCHRONIZED – Primary and Standby are in sync
  • UNSYNCHRONIZED – Standby does not have all of the primary database redo
• Monitor the Observer via the FS_FAILOVER_OBSERVER_PRESENT column of the V$DATABASE view
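
The monitoring described above comes down to a single query against V$DATABASE. As a sketch, the version below also selects the related FS_FAILOVER_* columns that 10g Release 2 exposes alongside the two named on the slide.

```sql
-- Run on the primary or on the target standby
SELECT FS_FAILOVER_STATUS,            -- e.g. SYNCHRONIZED or UNSYNCHRONIZED
       FS_FAILOVER_CURRENT_TARGET,    -- DB_UNIQUE_NAME of the target standby
       FS_FAILOVER_THRESHOLD,         -- threshold in seconds
       FS_FAILOVER_OBSERVER_PRESENT,  -- YES if the Observer is connected
       FS_FAILOVER_OBSERVER_HOST
  FROM V$DATABASE;
```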

Reinstatement after Fast-Start Failover

• Any attempt to start old primary will stop at the mount state thus preventing split brain
• Once Observer sees the old primary is at the mount state, reinstatement is begun
• The old primary is automatically reinstated as the new standby using flashback database
• Once reinstated and synchronized then a switchover can occur if desired – returning all systems to their original roles

Best Practices – Primary Database

• Maximum Availability Protection Mode
  • Redo Transport: LGWR SYNC AFFIRM
  • Synchronous Redo Shipping . . . but
    • Primary is not affected by network or standby outages
    • Set net_timeout parameter to override TCP timeout
• Configure Flashback Database
  • Set DB_FLASHBACK_RETENTION_TARGET = 10 (minutes)

Note: If Flashback Database serves additional function of protection against user error & corruption, then an extended flashback retention period should be set for an amount of time required to achieve these goals
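
A minimal sketch of the flashback retention setting above, assuming the instance uses an spfile (required for SCOPE = BOTH). The value is in minutes; the second query reads V$FLASHBACK_DATABASE_LOG to show how far back the database can currently be flashed back.

```sql
-- Retention target in minutes; raise it if Flashback Database is also
-- used to protect against user error or corruption
ALTER SYSTEM SET DB_FLASHBACK_RETENTION_TARGET = 10 SCOPE = BOTH;

-- Check the current flashback window and the space used by flashback logs
SELECT OLDEST_FLASHBACK_TIME,
       RETENTION_TARGET,
       FLASHBACK_SIZE
  FROM V$FLASHBACK_DATABASE_LOG;
```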

Best Practices – Network Transport

• Tune OS & network parameters
  • Set SDU=32K
  • Tune network parameters that affect network buffer sizes and queue lengths
• Ensure sufficient network bandwidth for maximum database redo rate + other activities

Refer to Primary Site and Network Configuration Best Practices:
http://www.oracle.com/technology/deploy/availability/pdf/MAA_DG_NetBestPrac.pdf

Impact of Network Tuning

[Chart: network throughput in Mbits/sec, Oracle MAA test result. Tuned: 937; Default: 10.8]

Best Practices – Standby Database

• Use Standby Redo Logs
• Use Real-time Apply
• Configure Flashback Database
  • Set DB_FLASHBACK_RETENTION_TARGET = 10 (minutes)
• Optimize Apply Performance using MAA Best Practices for:
  • Redo Apply (physical standby): Data Guard Redo Apply and Media Recovery
  • SQL Apply (logical standby): Oracle Database 10g Data Guard SQL Apply
• MAA Home Page on OTN: http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
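
For reference, this is what real-time apply looks like at the SQL level in Oracle Database 10g. It is shown only to illustrate the term: in a Broker-managed configuration, which Fast-Start Failover requires, the Broker normally starts and stops the apply services itself.

```sql
-- Physical standby: Redo Apply reads the current standby redo log
-- instead of waiting for it to be archived
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT FROM SESSION;

-- Logical standby: SQL Apply with real-time apply
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
```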

Best Practices – Observer

• Install in a separate location from Primary & Standby data centers

• Do not locate the Observer at or near the primary site
• Proximity to the standby site is preferred, but far enough away to be isolated from events that typically impact the standby site
• Oracle Client Administrator install is all that is required for Observer install
• If using Enterprise Manager, also install the Enterprise Manager Agent

Best Practices – Setting FastStartFailoverThreshold

• Failover occurs when the Observer and the standby both lose contact with the primary for the specified time (seconds)
• Recommended settings:
  • Single Instance primary with low latency reliable network = 10 – 15 seconds
  • Single Instance primary with high latency network over WAN = 30 – 45 seconds
  • RAC primary = (misscount + reconfiguration time) + 20 – 40 seconds

Best Practices – Multiple Standbys

• Ensure data protection at all times by maintaining a 2nd Data Guard standby at a remote location
  • When regulatory & business requirements mandate that data be protected at all times
• At failover time, the remote standby automatically becomes a standby for the new primary
  • New primary must have begun as a physical standby
• Configure the remote standby for Maximum Performance
  • Eliminates overhead of WAN network latency
  • Recommended redo transport is LGWR ASYNC

Best Practices – HA & DR

• Use RAC & Data Guard Together
  • The best possible combination of HA & DR
  • Scalable, flexible, secure
  • Foundation for MAA

Client Failover

Oracle Database 10g Release 1 vs Oracle Database 10g Release 2

Oracle Database 10g Release 1 Client Failover

[Diagram: JDBC/OCI clients, Primary Site, and Standby Site; after failover, the Failed Primary Site and the New Primary Site]

1. Primary site failure
2. Both FAN ONS (JDBC) and OCI clients wait for TCP timeout
3. Data Guard manual failover is executed, standby database transitions to primary role
4. FAN ONS (JDBC) and OCI clients are NOT notified of new primary cluster
5. Clients are redirected manually using TAF or some other mechanism
6. Old primary database is rebuilt from a new backup

A Sample Hardware Solution

[Diagram: Internet, Cisco Distributed Directory, a Cisco Local Directory and application servers at each site, and primary and standby RAC databases connected by Data Guard]

A DNS Solution

[Diagram: Internet, DNS Server, application servers at each site, and primary and standby RAC databases connected by Data Guard]

A TNS Method

• Use the ‘service_names’ parameter so that whichever database is currently the Primary is registered under the name the applications look for.

• Works with Dynamic registering of the database with the listener when the database is mounted.

• Requires that the parameter be changed accordingly.

• Requires a 2nd tnsnames entry for Data Guard transport services.

On the Primary System

• Configure the listener and start it.
• LISTENER.ORA

    LISTENER =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = Primary)(PORT = 1521))
      )

• Status

    Listener Parameter File   /private2/oracle/OraHome/network/admin/listener.ora
    Listening Endpoints Summary...
      (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=Primary)(PORT=1521)))
    Services Summary...
    Service "payroll.us.oracle.com" has 1 instance(s).
    Service "payrollDR.us.oracle.com" has 1 instance(s).
    The command completed successfully

On the Standby System

• Configure the listener and start it.
• LISTENER.ORA

    LISTENER =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = Standby)(PORT = 1521))
      )

• Status

    Listener Parameter File   /private2/oracle/OraHome92/network/admin/listener.ora
    Listening Endpoints Summary...
      (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=Standby)(PORT=1521)))
    Services Summary...
    Service "payrollDR.us.oracle.com" has 1 instance(s).
    The command completed successfully

Primary System TNS

• Primary tnsnames.ora

    PAYROLLDR =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = Standby.us.oracle.com)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = payrollDR.us.oracle.com)
        )
      )

    PAYROLL =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = Standby.us.oracle.com)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = Primary.us.oracle.com)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = payroll.us.oracle.com)
        )
      )

Standby System TNS

• Standby tnsnames.ora

    PAYROLLDR =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = Primary.us.oracle.com)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = payrollDR.us.oracle.com)
        )
      )

    PAYROLL =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = Standby.us.oracle.com)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = Primary.us.oracle.com)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = payroll.us.oracle.com)
        )
      )

After Switchover or Failover

• Reset the service_names parameter
  • New Primary (The ‘Old Standby’)
      ALTER SYSTEM SET SERVICE_NAMES='payroll,payrollDR';
  • New standby (The Old Primary)
      ALTER SYSTEM SET SERVICE_NAMES='payrollDR';
• The LOG_ARCHIVE_DEST definitions point to each other using payrollDR
• The application client systems only have the Payroll definition so they try system 1 first then system 2.

A TNS setup

[Diagram: users at the Primary Site and the Secondary Site; Trading and Operational Data Store databases, DBLINKS, and Data Guard between the two sites]

What about RAC?

• Multiple addresses are required for the Primary RAC nodes to facilitate node failover

• Cannot really have multiple Primary RAC hosts and then Standby RAC hosts in the same connect string.

• Would require too many connect timeouts to get to the standby

• Need a better, more proactive method
• Let’s talk about Oracle Database 10g Release 2

Oracle Database 10g Release 2 Improved Client Failover

[Diagram: JDBC/OCI clients, Observer, Failed Primary Site, and New Primary Site]

1. Observer detects failure, executes database failover when threshold is exceeded
2. DB_ROLE_CHANGE trigger fires: enables primary service, updates Oracle Net alias to point to new primary host, restarts JDBC mid-tier clients, calls any other application or pre-failover steps (the user writes the trigger code)
3. DB_DOWN event is sent to FAN OCI clients
4. Both FAN ONS (JDBC) and OCI clients drop connections and re-attach to the new primary
5. Upon restart, the old primary database is reinstated automatically by Fast-Start Failover

Client Failover Components

• Connect Time Failover
  • Redirects failed connection requests to a secondary listener
• Transparent Application Failover (TAF)
  • Client applications automatically reconnect to a database if the original connection fails.
• Fast Application Notification (FAN)
  • Provides quick notification when a resource (an instance, service, node, or database) fails.

Client Failover Components

• Fast Connection Failover (FCF)
  • Provides fast failover of database connections by allowing you to configure FAN-integrated clients to automatically subscribe to FAN HA events.
• DB_ROLE_CHANGE system event
  • Fired when the primary database is first opened after a Data Guard role transition has occurred.
• DB_DOWN Event
  • Fired by the Broker after a failover

DB_ROLE_CHANGE Event

• New DB_ROLE_CHANGE system event fires.
• A Trigger written around DB_ROLE_CHANGE event can automatically:
  • Enable primary service name
  • Modify LDAP or other naming methods
  • Restart JDBC mid-tier clients
  • Start user applications
• Happens at all role changes.
• All details with examples are described in the MAA Best Practices paper published on OTN

A Sample Trigger

    CREATE OR REPLACE TRIGGER set_rc_svc
    AFTER DB_ROLE_CHANGE ON DATABASE
    DECLARE
      role VARCHAR(30);
    BEGIN
      SELECT DATABASE_ROLE INTO role FROM V$DATABASE;
      IF role = 'PRIMARY' THEN
        -- This database is now the primary: offer the application service
        DBMS_SERVICE.START_SERVICE('payroll');
        -- Run the script that updates LDAP
        BEGIN
          DBMS_SCHEDULER.CREATE_JOB(
            job_name   => 'change_ldap',
            job_type   => 'executable',
            job_action => '/u01/oracle/10.2.0/bin/change_ldap.sh',
            enabled    => TRUE);
        END;
        -- Run the script that publishes the events
        BEGIN
          DBMS_SCHEDULER.CREATE_JOB(
            job_name   => 'publish_events',
            job_type   => 'executable',
            job_action => '/u01/oracle/10.2.0/bin/cfo.sh',
            enabled    => TRUE);
        END;
      ELSE
        -- This database is now a standby: stop offering the service
        DBMS_SERVICE.STOP_SERVICE('payroll');
      END IF;
    END;
    /

Data Guard Broker Client Redirection

• New DB_DOWN event is posted after the new primary is open
• Event notifies FAN OCI clients that the old primary is down
• Clients reconnect to the new primary/service
  • Done via AQ notifications
• Occurs only during a Broker Failover
  • A Fast-Start Failover
  • Manual “Failover to <database>” command

Fast-Start Failover

How Available? How Fast? How Simple? How Reliable?

Amazon.com
Fannie Mae
Thomson Legal & Regulatory
Airbus Deutschland GmbH

How Available? - Amazon.com

The capability of fast, guaranteed zero-data-loss failover with Fast-Start Failover in Oracle Data Guard takes the availability of an Oracle database platform to new levels. Our initial tests running Oracle Database 10g Release 2, show that Fast-Start Failover offers a magnitude of improvement in availability.

Rajesh Sheth
Manager, Database Engineering
Amazon.com

How Fast? - Fannie Mae

Fast-start Failover takes the DBA off the critical path. Database failover is automatic. Data Guard can now address recovery time objectives measured in seconds.

Ranjit Singh Veen
Manager, Enterprise Systems Management
Fannie Mae

How Simple? - Thomson Legal & Regulatory

Fast-start failover testing has shown great potential. The original primary database can be reinstated as a new standby in less than 5 minutes once the initial failure has been corrected.

Thomson Legal & Regulatory

How Reliable? - Airbus

Failover executed automatically without manual intervention in less than a minute. This was much faster than a cold failover using third party cluster technology. With Data Guard, Airbus can achieve continuous data protection and high levels of availability using a standard feature of the Oracle Database.

Werner Kawollek Application Management Operations Airbus Deutschland GmbH

Questions & Answers
