No Outages: Maintenance without Impact

Troy Anthony, Ian Cookson, Carol Colrain
Oracle Database Development
October, 2018

Copyright © 2018, Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Program Agenda

1. Continuous Availability
2. Are your building blocks in place?
3. Automating Maintenance
4. Patching OJVM
5. Customer Story - Epsilon

From High Availability to Continuous Availability

High Availability

• Minimizes downtime
• In-flight work is lost
• Rolling maintenance at DB
• Predictable runtime performance
• Errors may be visible
• Designed for single failure
• Basic HA building blocks

Continuous Availability

• No downtime for users
• In-flight work is preserved
• Maintenance is hidden
• Predictable performance
• Errors visible only if unrecoverable
• Designed for multiple failures
• Builds on top of HA

Continuous Availability

Continuous Availability is not Absolute Availability.

Probable outages and maintenance events at the database level are masked from the application, which continues to operate with no errors and within the specified response time objectives while processing these events.

Key points:

• Planned maintenance and likely unplanned outages are hidden from applications

• There is neither data loss nor data inconsistency, guaranteed

• Majority of work (% varies by customer) completes within recovery time SLA

• May appear as a slightly delayed execution

Many customers are achieving Continuous Availability Today

There is no reason for users to see downtime during scheduled database maintenance

• Service is unavailable
• Application owners are unable to agree on maintenance windows
• Long-running jobs see errors
• DBAs and engineers work off hours
• Application and middleware components need to be restarted

Example: the Family Holidays website is down: "Our online transaction services are currently unavailable. Our server may be temporarily down or we may be performing routine maintenance functions scheduled every Sunday from 12 a.m. to 5 a.m. (Eastern Standard Time). We apologize for any inconvenience."

How Do We Solve This?

• Move work to different instance/database with no errors reported to applications

• Transparent to applications and mid-tiers

• Support all server side maintenance

– Patches, PSUs, upgrades, repairs, changes, unplug/plug, migration, expansion, h/w replacement

• Configure once, same for all commands

Are Your Building Blocks in Place?

What Needs to be Configured?

• Which server stack for me? RAC or RAC One, Active Data Guard, GoldenGate, GDS
• Flex ASM: All databases on Flex ASM
• Services: Services for location transparency
• Continuous connections: Connections appear continuous
• Draining: FAN or 18c Database for draining
• In-flight work continues: Application Continuity
• SLAs: Drain in a timely manner

Draining and Failover Locally – Switchover between sites

• Active Data Guard: scheduled switchover; data protection, DR; query offload
• Data Guard: scheduled switchover; data protection, DR
• GoldenGate: scheduled switchover; active-active replication; heterogeneous
• Sharding: massive OLTP; scheduled switchover; active-active replication; heterogeneous
• Fast Application Notification: notify draining, failover, load balancing
• Transparent Application Continuity: application HA
• Global Data Services: cross-site placement
• RAC: online rolling maintenance; scalability; server HA
• RAC One: online rolling maintenance; server HA

Diagram labels: Production Site, Drain within RAC, Switchover ADG, Drain within RAC.

TIP: Use Services for Location Transparency

• Regardless of location, application keeps the name

• Moving, reshaping, prioritizing controls how a service is offered

• Batch and OLTP separated

• DB and PDB names for admin only

Diagram labels: Node 1 (RAC instance), Node 2 (RAC instance), OLTP service, Batch service.

Services provide a “dial in number” for your application

TIP: Drain sessions before maintenance

Drain connections where applications do not notice

• Web requests

• Transaction boundaries

• Connection tests

• Connection pools

• Custom rules

Moving Work Since Oracle Database 10.2

Repeat for each service, allowing time to drain:

• Stop service
    srvctl stop service -db .. -instance .. -service ..
• Relocate service
    srvctl relocate service -db .. -service .. -oldinst .. -newinst ..
    srvctl relocate service -db .. -service .. -currentnode .. -targetnode ..

Wait for sessions to drain…

For remaining sessions, stop transactional:
    exec DBMS_SERVICE.DISCONNECT_SESSION('… your service ..', DBMS_SERVICE.POST_TRANSACTION);

Stop the instances using your preferred tool

TIP: Use Oracle Pools – Full Lifecycle

FAN Planned: Drain and Rebalance

Applications using:
• Oracle: WebLogic Active GridLink, UCP, ODP.NET managed and unmanaged, OCI Session Pool, Tuxedo, CMAN TDM
• 3rd-party app servers using UCP: IBM WebSphere, Apache Tomcat, NEC WebOTX, Red Hat JBoss, Spring

DBA step: srvctl [relocate|stop] service (no -force)

Sessions drain:
• Immediately: new work is redirected
• Gradually: active sessions are released when returned to pools

TIP: UCP with other Java-based Application Servers

A simple data source replacement

• IBM WebSphere

• Apache Tomcat

• NEC WebOTX

• Red Hat JBoss

• Spring

TIP: Configure TNS for Continuous Connections

Everything Oracle uses FAN:

• JDBC Universal Connection Pool
• OCI/OCCI driver
• ODP.NET Unmanaged Provider (OCI)
• ODP.NET Managed Provider (C#)
• OCI Session Pool
• WebLogic Active GridLink
• Tuxedo
• JDBC Thin Driver
• CMAN in Traffic Director mode

FAN is Auto-Configured

(DESCRIPTION =
  (CONNECT_TIMEOUT=90)
  (RETRY_COUNT=20)(RETRY_DELAY=3)
  (TRANSPORT_CONNECT_TIMEOUT=3)
  (ADDRESS_LIST =
    (LOAD_BALANCE=on)
    (ADDRESS = (PROTOCOL = TCP)(HOST=primary-scan)(PORT=1521)))
  (ADDRESS_LIST =
    (LOAD_BALANCE=on)
    (ADDRESS = (PROTOCOL = TCP)(HOST=standby-scan)(PORT=1521)))
  (CONNECT_DATA=(SERVICE_NAME=gold)))
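As a rough illustration of how the retry settings in this descriptor relate to draining, the Python sketch below extracts the numeric parameters and multiplies RETRY_COUNT by RETRY_DELAY. The helper names and the flattened descriptor string are illustrative assumptions, not part of the slides. The figure ignores the time spent inside each connection attempt, so treat it as an approximation of how long a client keeps pausing between retries.

```python
import re

# Descriptor from the slide, flattened to one string for parsing.
DESCRIPTOR = (
    "(DESCRIPTION=(CONNECT_TIMEOUT=90)(RETRY_COUNT=20)(RETRY_DELAY=3)"
    "(TRANSPORT_CONNECT_TIMEOUT=3)"
    "(ADDRESS_LIST=(LOAD_BALANCE=on)"
    "(ADDRESS=(PROTOCOL=TCP)(HOST=primary-scan)(PORT=1521)))"
    "(ADDRESS_LIST=(LOAD_BALANCE=on)"
    "(ADDRESS=(PROTOCOL=TCP)(HOST=standby-scan)(PORT=1521)))"
    "(CONNECT_DATA=(SERVICE_NAME=gold)))"
)


def numeric_params(descriptor):
    """Collect every (NAME=<integer>) pair in the descriptor (hypothetical helper)."""
    pairs = re.findall(r"\(\s*(\w+)\s*=\s*(\d+)\s*\)", descriptor)
    return {name.upper(): int(value) for name, value in pairs}


def retry_pause_seconds(descriptor):
    """Total pause between retries: RETRY_COUNT * RETRY_DELAY (connect time excluded)."""
    params = numeric_params(descriptor)
    return params["RETRY_COUNT"] * params["RETRY_DELAY"]


if __name__ == "__main__":
    print(retry_pause_seconds(DESCRIPTOR))
```

With the slide's values the pauses add up to 60 seconds, which is a useful number to compare against the drain and switchover times expected for the service.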

TIP: Are You Receiving FAN Notifications?

• Create a FAN callout in GRID_HOME/racg/usrco
• Download FANwatcher from OTN: oracle.com/goto/ac

FANwatcher output:

VERSION=1.0 event_type=SERVICEMEMBER service=orcl_swing_pdb2 instance=orcl1 database=orcl db_domain= host=sun01 status=down reason=USER timestamp=2014-07-30 12:02:51 timezone=-07:00

VERSION=1.0 event_type=SERVICEMEMBER service=orcl_swing_pdb10 instance=orcl1 database=orcl db_domain= host=sun01 status=down reason=USER timestamp=2014-07-30 12:02:52 timezone=-07:00

VERSION=1.0 event_type=SERVICE service=orcl_swing_pdb10 database=orcl db_domain= host=sun01 status=down reason=USER
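To check events like these in a script, the payload can be split into key/value pairs. The following Python sketch is one way to do that, assuming the space-separated `key=value` layout shown in the sample output; `parse_fan_event` and `is_user_initiated_down` are hypothetical helper names, not part of FANwatcher or any Oracle tool.

```python
import re

# One key=value pair; a value runs until the next " key=" or the end of the line,
# which lets "timestamp=2014-07-30 12:02:51" keep its embedded space.
_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_fan_event(line):
    """Parse one FAN event payload line into a dict (hypothetical helper)."""
    return {key: value for key, value in _PAIR.findall(line.strip())}


def is_user_initiated_down(event):
    """True for a 'down' event whose reason is USER, as in the sample output."""
    return event.get("status") == "down" and event.get("reason") == "USER"


SAMPLE = (
    "VERSION=1.0 event_type=SERVICEMEMBER service=orcl_swing_pdb2 "
    "instance=orcl1 database=orcl db_domain= host=sun01 status=down "
    "reason=USER timestamp=2014-07-30 12:02:51 timezone=-07:00"
)

if __name__ == "__main__":
    event = parse_fan_event(SAMPLE)
    print(event["event_type"], event["service"], is_user_initiated_down(event))
```

A callout script could use a check like this to log or alert only on the events that matter for a maintenance run.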

Plan B: Connection Tests, All Drivers

Draining by the Oracle Database:

1. Stop or relocate services/PDBs
2. Database marks sessions to drain
3. Application or mid-tier tests connections (the application “tests” the connection)
4. Database responds that the connection is bad
5. New work continues on another connection

Tip: enable in DBMS_APP_CONT; view in DBA_CONNECTION_TESTS

TIP: Enable Connection Tests for Application Servers

Application Server | Test Name | Connection Test to DB
Oracle WebLogic – Generic & Multi data sources | TestConnectionsOnReserve, TestConnectionsOnCreate | isUsable; SQL – SELECT 1 FROM DUAL
Oracle WebLogic Active GridLink | embedded | isUsable
IBM WebSphere | PreTest Connections | SQL - SELECT 1 FROM DUAL
Red Hat JBoss | check-valid-connection-sql | SQL - SELECT COUNT(*) FROM DUAL
Apache Tomcat | TestonBorrow, TestonRelease | SQL - SELECT 1 FROM DUAL
ODP.NET Unmanaged | Connection.status | OCI_ATTR_SERVER_STATUS

TIP: Enable Connection Tests for Applications

Application | Condition | Connection Test to DB
eBusiness Suite | Connection borrowed from WebLogic | TestConnectionsOnReserve with "BEGIN NULL; END;"
Fusion Applications | Connection returned to WebLogic and C++ pools and checked | TestConnectionsOnReserve with isValid; OCIPing; OCI_ATTR_SERVER_STATUS
Siebel | Connection requested | OCI_ATTR_SERVER_STATUS
PeopleSoft | Connection requested | OCIPing
Customer | Custom pool with metadata table; checks status every 60s | OCI_ATTR_SERVER_STATUS

Plan B continued: End of Request

Draining by the Oracle Database and Drivers:

1. Stop or relocate services/PDBs
2. Database and driver mark sessions to drain
3. Application or mid-tier ends a request
4. 18c Database or the 18c drivers close the connection after end request
5. New work continues on another connection

Request boundaries are sent by all Oracle 12c pools and JDK9.

Tip: enable in DBMS_APP_CONT; view in DBA_CONNECTION_TESTS

DBA Operations Simplified

• Group operations: pdb, instance, node, or database
• New service attributes:
  – drain_timeout (seconds)
  – stopoption (immediate, transactional)

DBA commands: Group commands

Relocate all services by database/node/pdb:
    srvctl relocate service -database .. -instance .. -drain_timeout .. -stopoption [immediate|transactional]
    srvctl relocate service -node .. -drain_timeout .. -stopoption ..
    srvctl stop service -pdb .. -drain_timeout .. -stopoption ..

Start/Stop everything at a node:
    srvctl stop service -node <node_name> -drain_timeout .. -stopoption ..
    srvctl stop database -node <node_name> -drain_timeout .. -stopoption ..

Data Guard Switchover:
    Switchover to <db_resource_name> [wait [xx]];
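When these group commands are scripted, it helps to validate the drain settings before anything is sent to srvctl. The Python sketch below only builds the argument list for the `srvctl stop service -node` form shown on the slide and does not execute it. The function name, the node name `racnode1`, and the 120-second timeout are placeholders chosen for illustration, and the allowed `stopoption` values are limited to the two the slides list.

```python
import shlex

# The two stop options named on the slides.
STOP_OPTIONS = ("immediate", "transactional")


def stop_service_at_node(node_name, drain_timeout, stopoption):
    """Build 'srvctl stop service -node ...' as an argument list (not executed)."""
    if stopoption not in STOP_OPTIONS:
        raise ValueError(f"stopoption must be one of {STOP_OPTIONS}")
    if drain_timeout < 0:
        raise ValueError("drain_timeout must be a non-negative number of seconds")
    return [
        "srvctl", "stop", "service",
        "-node", node_name,
        "-drain_timeout", str(drain_timeout),
        "-stopoption", stopoption,
    ]


if __name__ == "__main__":
    # "racnode1" and 120 seconds are placeholder values, not taken from the slides.
    print(shlex.join(stop_service_at_node("racnode1", 120, "transactional")))
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` later without shell quoting issues.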

Drain… Connect… Failover

[Chart: TPM over time for Node 1 and Node 2, with the drain timeout marked]

(1) Drain Node 1
(2) Connect to Node 2
(3) Terminate in-flight requests
(4) Replay in-flight requests
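The role of the drain timeout in this sequence can be sketched with a deliberately simplified model: requests that finish within the timeout drain cleanly, and the rest are terminated and replayed on the other node. The Python function below is an illustration under that assumption only; its name and the sample durations are hypothetical and nothing here reflects how the database itself tracks sessions.

```python
def split_by_drain_timeout(remaining_seconds, drain_timeout):
    """Partition in-flight requests by whether they finish before the drain timeout.

    remaining_seconds: time each in-flight request still needs, in seconds.
    Returns (drained, to_replay): requests that complete during draining, and
    requests that would be terminated at the timeout and replayed elsewhere.
    """
    drained = [s for s in remaining_seconds if s <= drain_timeout]
    to_replay = [s for s in remaining_seconds if s > drain_timeout]
    return drained, to_replay


if __name__ == "__main__":
    # Sample durations are made up for illustration.
    drained, to_replay = split_by_drain_timeout([5, 30, 120, 400], drain_timeout=60)
    print("drained:", drained)
    print("terminated and replayed:", to_replay)
```

A longer drain timeout moves more requests into the first group at the cost of a longer maintenance step, which is the trade-off the chart illustrates.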

TIP: Use Transparent Application Continuity 18c (12c: use Application Continuity)

My application has not drained on time:

• Hides errors, timeouts, and maintenance
• No application knowledge or changes to use
• Rebuilds session state & in-flight transactions
• Adapts as applications change: protected for the future

Diagram labels: Request, Database, TAC, Errors/Timeouts hidden.

TIP: Some Applications have Built-in Read-Only Failover

Example applications: Siebel, PeopleSoft, JD Edwards, Informatica

Set once and done:
• Recommended TNS + STOPOPTION = Transactional, Drain_Timeout = n
• FAN is autoconfigured for unplanned

TIP: TAF SELECT with Transaction Guard – 12.2

If you have TAF SELECT now. For applications that do not change session state after connect.

With stopoption transactional:
• COMMIT_OUTCOME=true for Transaction Guard
• FAILOVER_RESTORE = LEVEL1

Diagram labels: Request, Database, Disconnects. TAF + TG restore a new session + common states.

Align Application Timeouts

• TIP: Nothing to do when Active/Active RAC: stop or relocate the service to drain work
• TIP: Application Timeout: Drain + Switch to DG

Diagram labels: RAC Primary, RAC Standby, Switchover (DG Broker), Stop or Relocate Service to Drain Work.

Automating Maintenance

Improving the Maintenance Timeline

Timeline phases: Preparation, then Maintenance Window. Move tasks to the left. Shrink as much as possible.

No Risk: Doing things outside the maintenance window is the preferred solution for necessary steps such as:
• real-time checks
• create gold image
• out-of-place distribution
• switch service
• drain service

Performance Risk: Doing things during this phase is acceptable for any necessary steps that cannot be done ahead of time, e.g.:
• final pre-check
• restart on new home

Eliminate Steps: Eliminating unnecessary steps is the best solution whenever possible, such as backup existing home.

Software Maintenance in the Time of Cloud

• Challenges
  – Complex, error-prone process
  – Maintaining standardization
  – Minimizing downtime
  – Takes too much time and effort

Diagram labels: OpenStack Cloud, Proprietary Cloud, Datacenter (three times).

Rapid Home Provisioning

Diagram labels: RHP Server; Gold Image Repository (11.2.0.4, 12.1.0.2, 12.2.0.1, 18c); Local target; Connected Targets (11.2, 12.1, 12.2, 18c).

RHP Support (Rapid Home Provisioning)

• Database & Grid Infrastructure: 11.2.0.3, 11.2.0.4, 12.1.0.2, 12.2.0.1, 18c
• Single Instance, Oracle Restart, Oracle RAC One, Oracle RAC
• BM and VM targets
• Non-CDB and CDB/PDB
• Multi-OS
• Generic Software
• Data Guard Aware
• Customizable

OJVM Patching

OJVM PSU – Flow chart

RAC Rolling Install Process for the "Oracle JavaVM Component Database PSU" (OJVM PSU) Patches (Doc ID 2217053.1)

TIP: OJVM Rolling

• RAC will coordinate the apply of the OJVM patch across all nodes in a rolling fashion

• Non-OJVM work not affected

• RAC tracks sessions that use OJVM in the Database

• Phased approach: prepare, execute update, post-update

• During the execute update phase OJVM work is blocked

– < 10 seconds brownout

An Automated Approach to Patching the OJVM

Diagram labels: RAC Rolling; Non-OJVM; OJVM; Prepared; Exec Update; Blocked.

©2014 Epsilon. Private & Confidential

Hiding Scheduled Maintenance at Epsilon

Epsilon at a Glance

• Epsilon comprises three divisions that deliver a global platform for customer experience marketing
• More than 3,500 associates and 37 offices worldwide
• Largest permission-based e-mailer in the world, delivering over 40 billion emails annually
• World’s leading source of data with information covering over 250 million consumers and 22 million businesses
• More than 2,000 global clients, including 26 of the Fortune 100:
  – 9 out of 10 Top Banks
  – 8 out of 10 Top Retailers
  – The Top 10 Pharmaceutical Companies
  – The Top 10 Automotive Companies

High Level Business Requirements

• New client opportunity with extreme performance and availability requirements
• Real-time POS integration with over 10,000 sites
• Decrease time to market
• Real-time monitoring and reporting of system performance and health
• Need to run OLTP, batch, and reporting workloads concurrently against real-time data without impacting user experience
• Support over 2,000 real-time transactions per second with an SLA of less than 100 ms and 99.95% availability
• Less than 8 hours RPO and RTO
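To put the 99.95% availability requirement in concrete terms, the short Python sketch below converts an availability fraction into the downtime it allows per year. The function name is hypothetical and the calculation assumes a 365-day year; it is plain arithmetic, not a statement about how Epsilon measures its SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def allowed_downtime_minutes(availability):
    """Minutes of downtime per year permitted by an availability fraction (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    minutes = allowed_downtime_minutes(0.9995)
    print(round(minutes, 1), "minutes per year")
```

At 99.95% this comes to about 262.8 minutes, a little under four and a half hours per year, which leaves little room for maintenance that is visible to users.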

Previous State and Challenges

Previous state:
• Dedicated connection model from application server to database
• Application was using ODAC 11gR5
• Database hardware platform used an HP DL980 server with PCIe Flash storage and a Hitachi storage array
• Database version 11gR2 (11.2.0.3)

Challenges:
• No draining was available with the dedicated connection model
• Every planned maintenance or unplanned outage needed an application-layer restart, a major pain point
• Every application executable needed a FAN notification port in ODP.NET
• Due to the large number of dedicated connections, application server CPU utilization was high
• No support for commit outcome in case of failure in 11g
• Running a mixed workload was not very user friendly on a non-engineered system, as there is no IO prioritization in a traditional storage array
• Premier support of 11.2.0.3 was coming to an end (Aug 2015)

Current State and Resolutions

Current state:
• Uses a connection pool instead of dedicated connections
• Application is now using ODAC 12cR3
• Database hardware platform uses an Exadata X4-2 half rack
• Database version 12cR1 (12.1.0.2 BP10)

Resolutions:
• Connection pool drains quickly after receiving FAN events
• Application server restart is no longer required for planned maintenance or an unplanned outage of the Oracle stack, a big relief
• Only one port (6200) is required for ONS remote communication in 12c, instead of many ports in 11g
• CPU utilization reduced by 40% in application servers after the connection pool implementation
• Fast Application Notification and Application Continuity provide a robust error handling mechanism
• Exadata is an ideal platform for running a mixed workload, with both CPU and IO prioritization handling
• Moved to a supported release good for the next 3 years

Technology Stack

• Exadata X4-2 half rack for both Primary and DR site
• Web servers
• Application server uses:
  – ODP.NET (ODAC 12cR3)
  – WebLogic Server using Active GridLink
• Oracle Database 12c (12.1.0.2)
  – Real Application Clusters (RAC)
  – Data Guard
  – Fast Application Notification (FAN)
  – Application Continuity (AC)
  – Transaction Guard (TG)
• Database backup uses ZFS Backup Appliance (ZS3-BA)

[Figure: Normal Operation: Application service placement. Primary Site: web server cluster, application server cluster (ODAC 12cR3), web service call, workloads labeled OLTP, Batch, and Read only, OLTP Service and Batch Service on RAC Node 1 and RAC Node 2 (12.1.0.2), Primary DB. Standby Site: web server cluster, application server cluster (ODAC 12cR3), workloads labeled OLTP, Batch, and Read only, Report Service on RAC Node 1 and RAC Node 2 (12.1.0.2), Standby DB.]

[Figure: Scheduled maintenance: Application service placement, with Fast Application Notification. Primary Site: web server cluster, application server cluster (ODAC 12cR3), web service call, workloads labeled OLTP, Batch, and Read only, OLTP Service and Batch Service on RAC Node 1 and RAC Node 2 (12.1.0.2), Primary DB. Standby Site: web server cluster, application server cluster (ODAC 12cR3), workloads labeled OLTP, Batch, and Read only, Report Service on RAC Node 1 and RAC Node 2 (12.1.0.2), Standby DB.]

Case study – Transaction response time

Case study – Overall transaction response time

Business Benefits

• Scheduled maintenance of the Oracle technology stack can be done without disrupting the business user experience.
• Application restart is no longer required for scheduled maintenance, which is a major relief.
• Usage of a connection pool reduces CPU utilization of middle-tier servers by 40%.
• Database and operating system can be patched periodically without taking any system downtime (meets security compliance as well as uptime SLA).
• We have similar success for Java-based applications using WebLogic Server Active GridLink to hide scheduled maintenance operations without any application code change.

More Successes

• Application Continuity for ODP.NET to hide unplanned outages and scheduled maintenance without any application code change
• Application Continuity for Java with WebLogic Server Active GridLink to hide unplanned outages and scheduled maintenance without any application code change
