database consolidation: ha architecture and … marketing agenda • database consolidation for...

37
Database Consolidation: HA Architecture and Consolidation Methods Kai Yu Oracle Solutions Engineering Dell Inc

Upload: danghanh

Post on 04-Apr-2018

228 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Database Consolidation: HA Architecture and Consolidation Methods

Kai Yu Oracle Solutions Engineering Dell Inc

Presenter
Presentation Notes
Good afternoon, thank OUG Finland for inviting me here to present at the this Conference. It is my great pleasure to share some of RAC experience with friends in Oracle community here. In next 60 minutes, we will have some discussions On Oracle Clusterware and RAC technology. In this presentation I will also discuss the system infrastructure for RAC as network and storage. these are the key to the stability of RAC database. As a DBA, if you plan to use RAC technology for your business, It is very important that you have some good knowledge on these areas.
Page 2: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

About Author Kai Yu, Senior Principal Architect, Dell Database Engineering

20 years Oracle DBA/Apps DBAS and Solutions Engineering Specializing in Oracle RAC/DB, Oracle VM and Oracle EBS Oracle ACE Director , Oracle papers author/presenter 2011 OAUG Innovator of Year, 2012 Oracle Excellence Award: Technologist of the Year: Cloud Architect by Oracle Magazine My Oracle Blog: http://kyuoracleblog.wordpress.com/ Co-author Apress Book “Expert Oracle RAC 12c”

Page 3: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

My Work :Dell Integrated Systems for Oracle DB/OBIEE - Ready Infrastructure Solution based on Flash

Page 4: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Agenda

• Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture Design • High Availability Infrastructure Configuration • Achieving HA through Oracle RAC • Oracle RAC Troubleshooting and Health Check

Page 5: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Database Consolidation • The Journey to Database Cloud

– Challenges to the traditional computing architecture – Consolidate multiples databases : multitenant architecture – Integrate all the resources to allow provisioning on demand:

dynamically provisioning to meet the workload needs – Provide Database as a Service (DBaaS)/ Platform as a Service (PaaS) – Provide Measuring and Charge back service

Presenter
Presentation Notes
What is database cloud, and why we want it? The traditional Corporate Computing Architecture usually has is island like architecture, one infrastructure for one database or application. The challenge is no resource sharing, low resource utilization. This ends up with a lot of systems. This is not only very costly and difficult to manage, also hard to dynamically adapt changing workload as the infrastructure is not very flexibility and you need to build everything from ground for new application. The database cloud model is designed to solve these issues to build a pool of computing resources: servers, memory storage, networking where we can consolidate many database services. This provides a pre-built platform as a service that database can be quickly provisioned and start running. Because of sharing Infrastructure, you have the flexibility to expand for higher workload and also shrink when the demand is reduced. The following diagram shows this kind of cloud infrastructure with a pool of servers, storage and network and Cluster infrastructure pre-configured. Oracle 12c Pluggable database provides the practical solution to implement such an database Cloud architectue.
Page 6: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Database Consolidation Models

– Multitenant Architecture for Database Consolidations consolidating multiple databases on a shared infrastructure – Multiple Levels of consolidation and abstraction

Multiple schemas in a database Multiple PDBs in a container database

Multiple database instance running on an OS Multiple VMs on a physical server (virtualization) Multiple databases/apps shared a storage

Database Cloud Consolidation Methods

Page 7: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Considerations for selecting consolidation models.

– Level of isolation between the databases: security, HA, manageability, etc – Efficiency: Resource utilizations and admin cost – Different models work together

Database Consolidation Models

Page 8: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Example: consolidating multiple databases on a cluster

– Multiple databases (Single/RAC/ RAC one node) on a single cluster – Possible multiple versions Oracle Homes on 11g R2 Clusterware – A example that consolidates 100 Oracle EBS Databases

Database Consolidation Models

Page 9: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Example: consolidate database with Oracle VM

– Consolation of multiple RAC databases in fewer physical servers – Each database instance runs on its own VM independently – One database instance node eviction will not impact other databases – Less impact of downtime during OS and Oracle software upgrade – Easy provisioning with Oracle VM templates – Only need to pay for the licenses based on virtual CPUs –Overhead on OS, database Instances and extra maintaince costs

Database Consolidation Models

Presenter
Presentation Notes
By combining Oracle RAC and Oracle VM, we can achieve an Cloud like architecture like this one. We can have many RAC databases , each of DB instance ran on a separate virtual machine. The advantage of this architecture is that Any issue of one database instance is isolated in its own OS and will not impact another database instance As they are not sharing OS, not sharing physical hardware. We can do rolling upgrade of OS or Oracle software One database instance at a time. The disadvantage is that each database instance has its own OS and Oracle Software, the time and effort to patch them can be significant. For example, we can put the different databases with Different SLA in different VM. So that we can pick different time to upgrade them.
Page 10: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Multitenant Architecture with Oracle 12c Pluggable Databases

Database Consolidation Models

Page 11: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Architecture Comparison: Isolation vs Efficiency

Database Consolidation Models

Page 12: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Database High Availability Requirement

– Defined by Service Level Agreement (SLAs): – HA goal is to meet SLA requirement – Balance between the availability and implementation cost – SLA: for example, 99.95%, annual 4 hrs 22 minutes downtime Downtime window: first Saturday: 8pm-10pm every quarter

High Availability Challenges in Cloud Environment – Consolidating many databases in a single cloud infrastructure – Great business impact due to the infrastructure downtime – Databases may have different SLAs for different business: • Different business requirements • Different time zones • Infrastructure downtime means downtime for all the databases • Very difficult to find the downtime for maintenance that meets every SLA

High Availability for Database Cloud

Presenter
Presentation Notes
First, let’s see what exactly high availability means. In an IT organization, Service Level Agreement Is used to define the application availability agreement between business and IT organization. Our HA goal is to meet SLA. As we can image that usually higher HA is usually associated with cost. We Need to find the balance between cost and HA. Some time We need to ask ourselves do we really need 99.999% HA? For example backend office can be down between 6pm-7am and also need some periodically Downtime for schedule maintenance such as hardware/software upgrade, or application launches. Let’s take a look at SLA requirement for Cloud infrastructure. As this cloud consolidates many databases On the same infrastructure, the availability of these databases really depends on this commonly shared Infrastructure. The availability of this shared infrastructure will have a much larger business impact. And Multiple databases on this infrastructure also increases the chance to have a downtime on this shared Infrastructure. On other hand, if we want to have some downtime for maintenance and upgrade, It is far more difficult to find downtime to meet everyone’s SLA. So by having shared cloud infrastructure for database service, it is very crucial to have high availability And at same time it is much more difficult to implement it. The rest of this session aims to discuss the technology and options for the cloud infrastructure architecture Design for HA purpose, and also discuss some best practices to implement this cloud architecture and Maintain it.
Page 13: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Causes of Impacting System Availability

– Service outage by unplanned downtime: hardware or software failure, human error, nature disaster, etc. – Service disruption by planned downtime:

hardware/software upgrade, patching and migration from old system to new system

– Service performance degradation: violate performance SLA for example, 99% transactions finished in a 2 seconds window

Architect a High Availability Cloud Infrastructure – Architect a highly available cloud architecture – Configure HA infrastructure to reduce unplanned outage – Implement methods/options to minimize planned downtime – Use configuration and implementation best practices for HA – Administration and troubleshooting tips for High Availability – Establish the pre-active real time monitoring system

High Availability for Database Cloud

Presenter
Presentation Notes
Before we go to explore the solutions , let’s do some analysis about what are the causes that may impact the system availability Of the cloud infrastructure. We can classify two kind of causes: unplanned downtime And planned downtime and the system performance degradation. The unplanned downtime occurs without System admin anticipation, unprepared. The loss of service may be usually caused hardware/software failure Human error, or even nature disaster (losing data center). The planning downtime is usually associated with System upgrade or migration. The system performance may be related to performance issue or lost partial Infrastructure. This session will focus on first two. Every phrase of the entire life cycle of the cloud may have something to do with the high availability of the cloud. It starts with the design of high available architecture of the cloud infrastructure so that we can reduce the unplanned Or planned downtime as much as possible. In the implementation Configuration phrase, we need to make sure to leverage the HA best practices for the hardware and software Configuration to reduce unplanned downtime. As an system administrator or DBA, we need to follow the troubleshooting best practices to reduce the impact of Outage in case of system downtime. Another area is how to reduce the planned downtime for upgrade and Migration as these kind of planned downtime can potentially significantly impact the availability of the cloud Infrastructure. The rest of the presentation will focus on these areas.
Page 14: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Oracle Real Application Clusters: active-active cluster database

• Based on share everything clustering architecture • Protect database availability against up to N-1 server failure • Reduce planned downtime for hardware, OS, software upgrade • Add node or remove node based on demand of capacity • Application load balancing • Provide high availability and scalability architecture

Database Cloud Architecture Design

User connectionsUser connections

RAC Instance1

RAC Database

RAC Instance2

RAC Instance3

Cluster Interconnect

Node1 Node2 Node3

Page 15: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Basic Software Components:

– Grid Infrastructure : Oracle Clusterware and ASM – Oracle RAC: Cache Fusion technology

Hardware requirements:

– Shared storage – Private Interconnect network for cluster heartbeat and RAC Cache

Database Cloud Architecture Design

Presenter
Presentation Notes
The first one we want to discuss is the Oracle real application cluster. In this architecture, the clusterware serves the foundation of node synchronation And communication and high availability such as failover. Upper level RAC provides The cache fusion that ensures the data consistency among all the nodes so that All the nodes will function like a single server database. This is An active-active architecture design and can protect database availability against N-1 Server failure. In case of a server failure, the clusterware will automatically failover The virtual IP to other survived node. When all the nodes are functionally, the Workload will be even distributed to all the nodes. This architecture also allow Us to add or remove node base on the capacity demand. It also allow us to do maintaince work on node at a time while keeping other nodes are functioning. The recorded dome shows when one node is down, the database connections On that node were failed over to other five nodes. The key to achieve the stability of this architecture is the private interconnect and Shared storage access that we will discuss a shortly.
Page 16: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Oracle 12c Flex Clusters Architecture

– Scalability limitation of the standard cluster • All nodes tightly-connected: N * (N-1)/2 interconnect paths • All nodes directly connected to storage: total N storage paths • Preventing the cluster to go beyond of 100 nodes.

– Two two-layered hub-and–spoke topology: • Hub Node: interconnected and directly connected to the shared storage • Leaf Node: connected a Hub node, no storage connection • Scalability of Oracle 12cR1 RAC 64 Hub Nodes

20 up to 2000 Hub nodes + Leaf Nodes • Each Leaf node has its dedicated Hub

node to connect Two two-layered

Database Cloud Architecture Design

Page 17: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Oracle RAC one node database: active-passive database

– Single node database on Clusterware ; no load balancing – Require the same system architecture as RAC: network/storage – Install Oracle Grid Infrastructure and RAC on all nodes – Specify RAC One node Database during the database creation

Database Cloud Architecture Design

Presenter
Presentation Notes
A variation is RAC is called RAC one node. The architecture uses the exact clusterware as RAC database. Unlike RAC, a database has multiple instances on multiple node, RAC one node database only run one database instance on one server. But it can be failed over to other node of the cluster in case of the node failure. It also can be relocated to other node if we need to do maintenance work on the original node. The failover prevents from unplanned downtime, while relocation prevents planned downtime. But failover and relocations are provided by the underneath clusterware which is same as The clusterware for RAC. Compared to RAC, RAC one node has a huge advantage on license Cost. It only costs less than half price per process and only need to license one node. It doesn’t have load balance as it only has one node . I had a presentation about One RAC node At Collaborate 12 conference.
Page 18: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

– Protect database against server failure with VIP automatic failover – Reduce planned downtime: Online relocating database to another node for hardware, OS, database software upgrade $srvctl relocate database –d <dbname> -n <nodename> -w -15 – RAC and RAC one node can be switched back and forth Convert to RAC from RAC one node $ srvctl convert database -d <dbname> -c RAC -n <nodename> $srvctl add instance -d <dbname> -i <instname> -n <nodename> Convert to RAC one node from RAC $srvctl remove instance -d <dbname> -i <instname> $ srvctl convert database -d <dbname> -c RAC -n <hostname> – Why RAC one node: Low cost alternative to RAC: $10k/p vs $23k/p (EE) two nodes 2 X8 core servers, $80K vs. $512K(EE)

Database Cloud Architecture Design

Presenter
Presentation Notes
A variation is RAC is called RAC one node. The architecture uses the exact clusterware as RAC database. Unlike RAC, a database has multiple instances on multiple node, RAC one node database only run one database instance on one server. But it can be failed over to other node of the cluster in case of the node failure. It also can be relocated to other node if we need to do maintenance work on the original node. The failover prevents from unplanned downtime, while relocation prevents planned downtime. But failover and relocations are provided by the underneath clusterware which is same as The clusterware for RAC. Compared to RAC, RAC one node has a huge advantage on license Cost. It only costs less than half price per process and only need to license one node. It doesn’t have load balance as it only has one node . I had a presentation about One RAC node At Collaborate 12 conference.
Page 19: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

– Oracle VM provides HA against physical server failure • Databases run on virtual machine • High isolation: each database runs on a dedicated OS • Lower efficiency: extra overhead on OS, database instance • Virtual machines run on a pool of VM servers (VM server pool) • Virtual machines can be failed over or migrate to different physical server to reduce planned/unplanned downtime

Database Cloud Architecture Design

Presenter
Presentation Notes
Another HA system architecture is Oracle VM. Oracle VM establishes the cluster environment among The VM servers in the same server pool to allow the virtual machines to failover or migrate from One virtual server (physical host). As shown in this diagram, if the physical machine for VM server failed, The gust VM will with the database running on the VM failover to the second VM server 2.
Page 20: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

High Availability Storage Infrastructure

– Storage HA plays a key role in the infrastructure HA –Redundant Storage Controllers for high availability – Oracle ASM diskgroup redundancy settings – Redundant IO paths from servers to storage array

High Availability Infrastructure Configuration

Presenter
Presentation Notes
After we discussed the high available architecture design of the cloud infrastructure. Let’ Discuss the some of best practices of configuration and implementation to truly this architecture can really provide the fault tolerance . Let start with the very bottom level: Hardware , specifically the storage configuration. Storage subsystem HA is the key to the system HA as these are where the data and software stored. The HA design is to ensure That the storage accessibility and data availability are protected against any component failure. One of the method is to have the redundant IO path so that any component of this IO path failure will not cause storage downtime. For example, in this diagram, the path Linking all the Components server to storage including HBAs, switches, controllers have multiple path Any component of this path failure will not break this IO path.
Page 21: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

– Redundant IO paths from servers to storage array Multiple physical paths: (DB-HBA#, Switch#, Controller-HBA#, Controller# ), DB-HBA#: =1,2; Switch# = 1,2; Controller-HBA#=1,2,3,4, Controller# =1,2 A storage access should be able to fail over to another path

High Availability Infrastructure Configuration

Presenter
Presentation Notes
For example, we have four combination of IO paths. If anyone of the component fails, for example, a controller fails, the storage volume will be accessible through another Controller. The multipath software combines the redundant IO paths and present a Single logic device data1. All the IO on this logic volume will can be failed over To a redudant path in case of any component failure. On the storage level, we configured disk Raid to ensure the disk redudancy . The storage will survive from An diskfaiure.
Page 22: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Software multipathing: to group the multipath Linux Device Mapper (DM) or from storage vendors Group them together to alias (ibion1_v_d2) using multipathing software Linux Configuration: /etc/multipath.conf #service multipathd restart

–Oracle ASM diskgroup redundancy settings –SAN Disk Array RAID for Redundancy: Raid 10/ 5 Configuration

High Availability Infrastructure Configuration

Presenter
Presentation Notes
For example, we have four combination of IO paths. If anyone of the component fails, for example, a controller fails, the storage volume will be accessible through another Controller. The multipath software combines the redundant IO paths and present a Single logic device data1. All the IO on this logic volume will can be failed over To a redudant path in case of any component failure. On the storage level, we configured disk Raid to ensure the disk redudancy . The storage will survive from An diskfaiure.
Page 23: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Network High Availability Configuration

– fully redundant interconnects for cluster configuration

– Network bonding vs Oracle Highly Available Virtual IP (HAIP) – Dedicated switches for private interconnects

Redundant Hardware Infrastructure for Cluster Database

High Availability Infrastructure Configuration

Page 24: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

VM Server Configuration RAC nodes on VMs:

High Availability Infrastructure Configuration

Page 25: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Achieving HA through Oracle RAC

Page 26: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Preventing Unplanned downtime through RAC

– Virtual IP (VIP) automatic failover by Oracle Clusterware without waiting for TCP/IP timeout – Application connections failed over to surviving nodes DML will be rolled back and started over after reconnecting. Transparent Application Failover (TAF): Client side failover: specify how to failover query – Oracle Notification Services (ONS) for notifying down event – Fast Connect Failover (FCF) for fast connection failover: database clients registered with Fast Application Notification (FAN), database clients get notified the up and down event and react accordingly. failover works for most database clients: JDBC, OCI,, UCP, etc – Application Continuity (AC) of Oracle 12: During the instance outage, automatically replay the transaction on another instance without the need for end-users and applications resubmitting the transaction.

Achieving HA through Oracle RAC

Presenter
Presentation Notes
Transparent Application Failover (TAF): Client side failover specify how to failover query, works for OCI (Oracle Call Interface) Not JDBC. When an instance fails, FCF allows all database clients that connect to the failed instance to be quickly notified about the failed instance by receiving the Down event. Then, these database clients will stop and clean up the connections and immediately establish new connections to the surviving instance. FCF is supported by JDBC and OCI clients, Oracle Universal Connection Pool(UCP), and Oracle Data Providers for .Net. Oracle FCF is more flexible than TAF.
Page 27: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Reducing planned downtime for upgrade

– Online patches can be applied to a running database. The patches contain a single shared library, and do not require shutting down the instance or relinking the Oracle binary to apply the patches $ opatch query -all online

….. "Patch is a rolling patch: true.“ Rollback all the online patch and reapply the offline version at next downtime Refer MOS note: 761111.1

– Rolling upgrade: one instance is shutdown down for upgrade while other are functioning:

• rolling upgrade of server hardware, firmware, BIOS, OS • rolling upgrade of Oracle Clusterware and ASM • rolling upgrade of RAC: database patch, PSU, CPU patches

– Rolling upgrade process in RAC: • Relocate the database connections to the other instance • Shutdown (DB/OS/Server) , upgrade, restart • Repeat on other nodes.

Achieving HA through Oracle RAC

Presenter
Presentation Notes
In order to apply the rolling upgrade for Oracle RAC software, the Oracle RAC home must be on a local file system on each RAC node in the cluster, not in a shared file system. There are several types of patch for an Oracle RAC database: interim patch, bundle patch, patch set upgrades (PSU ), critical patch update (CPU) , and diagnostic patch. Before you apply a RAC database patch, check the readme to determine whether or not that patch is certified for the rolling upgrade. You can also use the Opatch utility to check if the patch is a rolling patch: You can upgrade Oracle Clusterware in a rolling manner from Oracle Clusterware 10g and Oracle Clusterware 11g, however you can only upgrade Oracle ASM in a rolling manner from Oracle Database 11g release 1 (11.1).
Page 28: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

– Rolling upgrade requirements Oracle RAC . Local Oracle Home; Rolling upgradable patch: $opatch query –all <Patch_location> | grep rolling …. “Patch is a rolling patch: true.” – Rolling database Upgrade to reduce downtime:

• Use Data Guard SQL apply and transient Logical Standby feature • MOS note #949322.1

Achieving HA through Oracle RAC

Presenter
Presentation Notes
In order to apply the rolling upgrade for Oracle RAC software, the Oracle RAC home must be on a local file system on each RAC node in the cluster, not in a shared file system. There are several types of patch for an Oracle RAC database: interim patch, bundle patch, patch set upgrades (PSU ), critical patch update (CPU) , and diagnostic patch. Before you apply a RAC database patch, check the readme to determine whether or not that patch is certified for the rolling upgrade. You can also use the Opatch utility to check if the patch is a rolling patch:
Page 29: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Oracle RAC Troubleshooting

Page 30: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Clusterware health check and troubleshooting

– Clusterware utility: crsctl for check crs status and start/stop crsctl check cluster –all; crsctl stat –ret -t – Log files: $GIRD_Home/log/<host>/alert<host>.log and

$GIRD_Home/log/<host>/<process>/log

Oracle RAC Troubleshooting and health Check

Page 31: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Clusterware health Verification Utility : CLUFY

– Verifies Clusterware, RAC best practices, mandatory requirements: $./cluvfy comp healthcheck –collect cluster –bestpractice –html

Page 32: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Oracle RACcheck: A RAC Configuration Audit tool

– audit the configurations settings for RAC, Clusterware andASM – MOS note ID 1268927.1, download the tool – To invoke : ./raccheck ; it will produce an auditing report

CHM: detect/analyze OS and Cluster’s resource related degrdations and failure.

– A set of tools tracing OS resource consumptions – Enhanced in Oracle 12cR1 and consists three components:

• osysmond: System Monitor Service process on each node monitor and collects real time OS metric data and send to Ologgerd

• ologgerd: cluster logger service, one for each 32 node • Grid Infrastructure Management Repository (the CHM repository): central

repository to store metrics data Grid Infrasturcture Management Repository (GIMR)

– A single instance Oracle database run by grid user – Installed on one of the cluster nodes – Need to select Advanced Installation option – run on the same node that runs the ologgerd service to reduce the traffic

Oracle RAC Troubleshooting and health Check

Page 33: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

– by default the database is stored in same location of OCR/Voting disks $ oclumon manage -get repsize reppath alllogger -details CHM Repository Path = +DATA1/_MGMTDB/DATAFILE/sysmgmtdata.260.807876429 CHM Repository Size = 38940 Logger = knewracn1 Nodes = knewracn1,knewracn2,knewracn4,knewracn7,knewracn5,knewracn8,knewracn6

– Use OCLUMON to manage the size and retention of the repository diagcollecton.pl to collect the CHM data

– Get the master node: $ oclumon manage -get master – login to the master node as root – run the command: diagcollection.pl -collect -crshome <CRS_HOME> – It produces four .gz files which include various log files for diagnosis.

use OCLUMON query the CHM repository for node specific data $oclumon dumpnodeview -allnodes -v -s “begin_timestamp" -e “end_timestamp

Oracle RAC Troubleshooting and health Check

Page 34: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Node Eviction: Cluster split brain condition: a node failure partitions the cluster into multiple sub-clusters without knowledge of the existence of others Possible causes::not responding network heartbeat, disk heartbeat, a hung node or hung ocssd.bin process

• Consequence: data collision and corruption • IO fencing: fencing the failed node off from all the IOs: STOMITH (Shoot The Other Machine In The Head) algorithm • Node eviction: pick a cluster node as victim to reboot. Always keep the largest cluster possible up, evicted other nodes two nodes: keep the lowest number node up and evict other • Two CSS heartbeats and misscounts to detect node eviction

1. Network HeartBeat (NHB) over private interconnect to check node membership; misscount: 30 secs

2. Disk heartbeat : between the cluster node and voting disk: misscount: 200 secs

Oracle RAC Troubleshooting and health Check

Page 35: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

– Troubleshooting node eviction

• Common causes for OCSSD eviction: network failure latency exceeds CSS miscount 30 seconds access disk issue: CSS misscount 200 sec OCSSD failure, • Common causes of : CSSDAGENT OR CSSDMONITOR eviction: OS

scheduler problem caused by OS locked up in driver or hardware or the heavy loads; thread of CSS demon hung

• Review the log files, refer to metalink note [1050693.1] Node Eviction Diagnosis Examples – Case 1 :Node 2 was rebooted in a 2-node cluster on Linux: OCSSD log:

$CRS_HOME/log/<hostname>/cssd/ocssd.log file in Node1:

Oracle RAC Troubleshooting and health Check

Page 36: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Case 2: node 1 reboot: $CRS_HOME/log/<hostname>/cssd/ocssd.log file

Case 3: One node rebooted once a month in 11 nodes cluster: ------- /var/log/message: Jul 23 11:15:23 racdb7 logger: Oracle clsomon failed with fatal status 12. Jul 23 11:15:23 racdb7 logger: Oracle CSSD failure 134. Jul 23 11:15:23 racdb7 logger: Oracle CRS failure. Rebooting for cluster integrity ------ OCSSD log: $CRS_HOME/log/<hostname>/cssd/ocssd.log file [ CSSD]2011-07-23 11:14:49.150 [1199618400] >WARNING: clssnmPollingThread: node racdb7 (7) at 90% heartbeat fatal, eviction in 0.550 seconds … [ CSSD]2011-07-23 11:15:19.079 [1220598112] >TRACE: clssnmDoSyncUpdate: Terminating node 7, racdb7, misstime(60200) state(3)

Oracle RAC Troubleshooting and health Check

Page 37: Database Consolidation: HA Architecture and … Marketing Agenda • Database Consolidation for Database Cloud • High Availability for Database Cloud • Database Cloud Architecture

Global Marketing

Thank You and QA Contact me at [email protected] or visit my Oracle Blog at http://kyuoracleblog.wordpress.com/

37

Presenter
Presentation Notes
=