
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

CON8225 - Under the Hood

Diagnosing and Troubleshooting Oracle Enterprise Manager 12c Release 4

Andrew Bulloch, Werner De Gruyter, Courtney Llamas
Enterprise Manager Strategic Customer Programs
September 29th, 2014


Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Program Agenda

1. Architecture Overview
2. Diagnostic Methodology
3. 4 Key Processes
4. Summary
5. Appendix

Architecture Overview


Total Cloud Control

Optimized, Efficient | Agile, Automated | Scalable, Secure

• Expanded Cloud Stack Management
• Complete Cloud Lifecycle Management
• Superior Enterprise-Grade Management

Administration and Maintenance | Monitoring | Service Level Management

Enterprise Manager Infrastructure Overview

Repository (OMR): the heart
• Data storage, aggregation, rollup and purging

Management Server (OMS): the brains
• Handles management data from the Agents, and delegates administration tasks to the Agents
• Handles administration and real-time monitoring requests from the UI
• Sends out notifications

Agents (OMA): the nerve endings
• Receive and act upon task requests from the OMS
• Gather management information
• Perform administration tasks

(Diagram labels: Repository Database, Management Server, EM Users (EMCLI, Console / Reports), Agents, Firewall, Notifications, Connectors, Internet)

Requirements per tier

Repository (OMR). Focus on: CPU, IO
• Enterprise Edition database
  – Fine-grained access control
  – Partitioning
• Version 10.2.0.5 or higher*
  – Recommend dedicated version 11.2.0.4
• Use physical standby only for Data Guard
• Details on the new repository views can be found in the Extensibility Guide

Management Server (OMS). Focus on: Network, Memory
• WebLogic Server 10.3.6
• Dedicated WLS server
• Sun JDK 1.6.0_43 or higher
• OUI will install WLS if not already installed

Management Agent (OMA). Focus on: Connectivity, CPU
• Agent version must be the same as the OMS or lower
• JDK 1.6.0_43 or higher
• Deployed from the OMS (agent push) or manual install (agent pull)
• Plug-ins deployed when needed
• One Agent per monitored machine

*Check certification matrix in My Oracle Support

EM Architecture

EM is composed of many different subsystems:
• UI & Real-Time Monitoring
• Notifications
• Reporting
• Incident Management
• Data Collection
• Data Aggregation
• Data Loading
• Alerting
• Jobs

EM Architecture

(Diagram, shown in two builds: the subsystems listed above placed across the three tiers, Repository (OMR), Management Server (OMS) and Management Agent (OMA).)

EM Architecture – Focus today on 4 key Processes

Tiers: Repository (OMR), Management Server (OMS), Management Agent (OMA)

• User Interaction & Real-Time Monitoring
• Notifications
• Jobs
• Data Loading

The Takeaways for Today

• Information flows are initiated by background processes (e.g. callbacks) and as a result of user-initiated activity (e.g. Console or EMCLI)
• Efficient interaction between all the EM components is the key to good performance
• Monitor performance and throughput on each tier
• Resources have to be balanced on all tiers (Agent, OMS and Repository)
• Resource constraints on one tier can cause upstream or downstream bottlenecks

Enterprise Manager – 4 Key Processes

• User Interaction & Real-Time Monitoring
  Administrators logging into EM, performing tasks and requesting information
• Notifications
  Processing of alerts and notifying the right set of administrators and/or 3rd party help desk applications
• Jobs & Tasks
  Jobs initiated by Administrators (user jobs), as well as internal housekeeping and operational jobs (system jobs and housekeeping operations)
• Data Loading
  Agents uploading telemetry and operational data

(Diagram: Console, 3rd party applications and Agents around the Management Server and Repository, with the flows labeled Interactive or Background)

Diagnostic Methodology


Methodology – Method in the madness

Diagnostics Workflow

• The Console is your starting point. It's good (and getting better) at giving you task-based errors (often with solution guidance)
• MTM (Monitor The Monitor) is good at giving you non-task-based information, but it's all 'pull based' and asynchronous in nature
• Sometimes the only way to identify or diagnose a problem is at the component level (Repository, OMS or Agent)

(Diagram: three layers, Console, MTM and Component, along a 'Knowledge Level' axis)

Component Diagnostics Tools Available

Repository (OMR)
• The repository is just another managed database: Database Advisor information is also available for the repository database
  – ADDM (Automatic Database Diagnostic Monitor)
  – AWR (Automatic Workload Repository)
  – ASH (Active Session History)
  – Segment Advisor
  – etc.

Management Server (OMS)
• The OMS is just another managed WebLogic stack
  – JVM Diagnostics (!)
  – MDA (Middleware Diagnostic Advisor)
  – etc.
• emctl and emcli command-line utilities
• Java tools (jps, jstat, jmap, etc.)
• 'Repository' (oracle_emrep) and OMS (oracle_oms) targets

Management Agent (OMA)
• emctl and emcli command-line utilities
• Java tools (jps, jstat, jmap, etc.)
• Agent Metric Browser
• Agent (oracle_emd) targets

EMDIAG (EM Diagnostics): repvfy (Repository), omsvfy (OMS), agtvfy (Agent)

My Oracle Support notes:
421053.1: EMDIAG Master Index
1556491.1: Using Agent Metric browser for Diagnosing Agent Issues

4 Key Processes: User Interaction & Real-Time Monitoring

User Interaction & Real-Time Monitoring

What are the users doing, and which symptoms do they experience?

• Regular work in the Console (logging in, checking incidents, performing administration tasks)
  – Symptoms: slow login, slow page performance
• Real-time monitoring and debugging of managed targets (database performance pages, adding tablespaces, viewing log files)
  – Symptoms: slow response, refresh of pages takes a long time, connectivity issues (no data available)
• Reports, automated (scheduled) or interactive (ad-hoc, on-demand)
  – Symptoms: slow response, scheduled reports not running at the appointed time

User Interaction & Real-Time Monitoring: How do we monitor it? (12cR4)

• Tracking UI page performance
  Manage Cloud Control -> Health Overview -> Monitoring -> All Metrics -> Page Performance
• Beacons can be added to monitor the responsiveness of the EM application over time
  – Deploy in strategic locations on the network
  – Per URL basis (login, or start of a process)
  – Can be done from any Agent

User Interaction & Real-Time Monitoring: How to get more information? (1) - Standard EM monitoring data (12cR4)

• Aggregated UI usage (new in 12cR4)
  Setup -> Manage Cloud Control -> Health Overview -> Monitoring -> Page Performance
• Track the test performance of the beacon
  An 'EM Management Beacon' is created during install to monitor the EM application on the 1st OMS

1460408.1: Troubleshooting OMS and UI Performance Issues

User Interaction & Real-Time Monitoring: How to get more information? (2) - Tracing a UI page

• Use EMCLI to trace any UI page (identify session -> start trace -> perform actions -> stop trace -> run report)
• Run the report on the OMS with emctl (check the emctl.log file for details about the generated trace file)

1. Identify session
   $ emcli list_active_sessions -details
2. Start trace
   $ emcli trace -enable=true -user=<user to be traced>
3. Perform actions
   << user interaction here >>
4. Stop trace
   $ emcli trace -enable=false -user=<user to be traced>
5. Run report
   $ emctl genreport oms -file_name <just-the-filename>.trace

1640578.1: How to find out the SQL Statements that are run in the Repository
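The five tracing steps can be captured as an ordered command list, which is handy when the same trace has to be repeated for several users. The following is only a sketch: the function name `trace_commands` and the sample user and file names are hypothetical, and it builds the command lines shown on the slide without executing them.

```python
from typing import List


def trace_commands(user: str, trace_file: str) -> List[List[str]]:
    """Return the emcli/emctl command lines for one UI trace, in order.

    The commands mirror the slide; nothing is executed here.
    """
    return [
        # 1. Identify the session to trace
        ["emcli", "list_active_sessions", "-details"],
        # 2. Start the trace for that user
        ["emcli", "trace", "-enable=true", f"-user={user}"],
        # 3. (The user performs the slow actions in the Console here.)
        # 4. Stop the trace
        ["emcli", "trace", "-enable=false", f"-user={user}"],
        # 5. Generate the report on the OMS
        ["emctl", "genreport", "oms", "-file_name", trace_file],
    ]


if __name__ == "__main__":
    # "JDOE" and "example.trace" are placeholder values for illustration.
    for argv in trace_commands("JDOE", "example.trace"):
        print("$ " + " ".join(argv))
```

Keeping the start and stop commands in one place makes it harder to forget to switch tracing off again.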

User Interaction & Real-Time Monitoring: How to control resources?

• General slowness for all the EM pages
  > Java heap size for the OMS (to accommodate memory for each logged-in user)
    – Use the OMS_HEAP_MAX, OMS_HEAP_MIN, OMS_PERMGEN_MAX and OMS_PERMGEN_MIN parameters to change the heap settings
      $ emctl set property -name OMS_HEAP_MAX -value 2560
    – Common values for the Java heap are between 2 GB and 4 GB
    – Default values: OMS_HEAP_MAX 1740M, OMS_HEAP_MIN 56M, OMS_PERMGEN_MAX 768M, OMS_PERMGEN_MIN 128M

• Inconsistent performance
  > Asymmetric RAC database configuration
    – RAC configuration and resources not identical across the database
  > Load Balancer setup and configuration
    – Users require session affinity to a particular OMS (for new connections: Round-Robin vs. Most Requested)

• Login problems
  > Number of sessions in the database (init.ora setting)
    – Check the sessions and processes settings in the database
  > Automatic logout of inactive sessions
    – Controlled by the global 'oracle.sysman.eml.maxInactiveTime' parameter (specified in minutes)
    – Specified only once for ALL OMSs; the default is 45 minutes
      $ emctl set property -name oracle.sysman.eml.maxInactiveTime -value 45
  > Control the number of possible connections from clients (rarely changed)
    – MaxClients parameter in the httpd.conf file
    – Default is 150 (meaning 150 simultaneously connected sessions for this OMS, serviced by Oracle HTTP Server)

4 Key Processes: Tasks & Jobs

Tasks & Jobs

What are tasks and jobs? | Types of tasks and jobs | Symptoms

Jobs: EM Jobs - Job Engine of Enterprise Manager (user jobs)
• What: Maintenance and administration tasks submitted by the Administrators in the Console (RMAN, OS commands, reports, patches, etc.)
• Types:
  – Short: small synchronous requests made to the Agent (status updates, data retrieval, etc.)
  – Long: long-running asynchronous requests (file transfers, running of OS commands, etc.)
• Symptoms: jobs not executing at the appointed time, failed or suspended jobs, ...

Jobs: EM Jobs - Job Engine of Enterprise Manager (system jobs)
• What: Maintenance and administration tasks submitted by EM itself in the background, in response to monitoring or administration operations (clustered target fail-over, template apply, Administration Group synchronization, etc.)
• Types:
  – Short: small synchronous requests made to the Agent (status updates, data retrieval, etc.)
  – Long: long-running asynchronous requests (file transfers, running of OS commands, etc.)
• Symptoms: pending template apply operations, Administration Groups not updated, pending availability status for cluster members, ...

Tasks: Internal Jobs - DBMS_SCHEDULER engine (Repository)
• What: Housekeeping tasks (composite availability calculation, RCA analysis, compliance score calculation, rollup, purge, etc.)
• Types:
  – Short: quick operations taking less than 60 seconds
  – Long: long-running operations (more than 60 seconds)
• Symptoms: pending availability status for clustered targets, compliance scores not updated, EM monitoring data unavailable

Tasks & Jobs – EM Jobs: How do we monitor it?

Two parts for executing jobs in EM:
– Scheduling information for jobs (Repository-side)
  Decides which jobs need to get picked up by an OMS for dispatching. One DBMS_SCHEDULER job runs every 30 seconds to check the schedule.
– Dispatching and executing information (per OMS)
  Gets the job details, and informs the Agents of the work that has to get done. One thread per OMS picks up the scheduled work, and multiple worker threads per OMS talk to the Agents and execute the actual work.

Tasks & Jobs – EM Jobs: How to get more information? (1) - Retrieving monitoring data

Information in the Console:
• Special service created: 'EM Job Service'
  This is a rollup of the repository operations (step scheduling) and all OMS operations (job dispatching) information
• Reports for the EM Job system: 'Job System Diagnostic Report' (new for 12cR4)

Things to check:
• Growing backlog (not enough resources)
• High % processing time (>75% in general means a repository bottleneck)
• Low throughput with high % processing time (processing bottleneck)
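The three checks for the EM job system can be read as a small rule set. The sketch below is one possible encoding, not something EM provides: the function name, the inputs, and the reading of "growing" as "every sample is higher than the previous one" are assumptions made for illustration. The 75% figure is the one quoted on the slide.

```python
from typing import List, Sequence


def assess_job_system(backlog_samples: Sequence[int],
                      processing_pct: float,
                      throughput_is_low: bool) -> List[str]:
    """Apply the 'things to check' rules to sampled job-system metrics."""
    findings = []
    # Assumption: "growing" means each sample exceeds the one before it.
    growing = len(backlog_samples) >= 2 and all(
        later > earlier
        for earlier, later in zip(backlog_samples, backlog_samples[1:])
    )
    if growing:
        findings.append("growing backlog: not enough resources")
    if processing_pct > 75:
        findings.append("high % processing time: possible repository bottleneck")
        if throughput_is_low:
            findings.append("low throughput with high % processing time: "
                            "processing bottleneck")
    return findings


if __name__ == "__main__":
    for finding in assess_job_system([10, 20, 35], 80.0, True):
        print(finding)
```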

Tasks & Jobs – EM Jobs: How to get more information? (2) - Logging and tracing

• OMS log and trace files (EM application tracing)
  – Look for modules 'Job Step', 'RJob Step', 'Load Job Step' and 'JobRecv' in <GC_INST>/em/EMGC_OMS1/sysman/log
    emoms.log : English OMS trace file
    emoms.trc : Native language OMS trace file
  – Enable debug logging
    $ emctl set property -name log4j.category.oracle.sysman.emdrep.jobs -value "DEBUG" -module logging
    $ emctl set property -name log4j.category.oracle.sysman.eml.jobs -value "DEBUG" -module logging
    Always reset the debug levels when finished!

• Repository (PL/SQL tracing)
  – Enable tracing
    SQL> exec emdw_log.set_trace_level('EM.JOBS',<level>);
    or:
    $ repvfy send start_trace -name "EM.JOBS"
    $ repvfy send stop_trace -name "EM.JOBS"
  – Trace levels: 0 - Fine / Debug, 1 - Informational, 2 - Warning, 3 - Severe, 4 - No Tracing / OFF
  – To generate the PL/SQL trace report:
    $ repvfy dump trace

• EMDIAG reports
  $ repvfy dump backlog (Backlog report for all information flows)
  $ repvfy dump job_health (EM job system health details)

421053.1: EMDIAG Master Index
1670012.1: Impact of Setting Debug Mode for OMS and Steps to Enable Debug for Particular Subsystems

Tasks & Jobs – EM Jobs: How to control resources? (1) OMS thread sizing (12cR4)

• See the current pool sizes:
  $ repvfy show job_pools
• Change the job pool sizes on ALL OMSs if available threads are low (<10%) and dispatched steps are high (>60):
  $ emctl set property -name oracle.sysman.core.jobs.shortPoolSize -value 25
  $ emctl set property -name oracle.sysman.core.jobs.longPoolSize -value 12
  $ emctl set property -name oracle.sysman.core.jobs.systemPoolSize -value 25
  $ emctl set property -name oracle.sysman.core.jobs.longSystemPoolSize -value 10
  $ emctl set property -name oracle.sysman.core.jobs.waitPoolSize -value 10
• Job Activity Details page (detail for each OMS)
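The resize rule (available threads below 10% and more than 60 dispatched steps) is easy to misread, since both conditions have to hold at once. A minimal sketch of the check follows; the function and parameter names are hypothetical, and the inputs are assumed to be read off the Job Activity Details page or `repvfy show job_pools` by hand.

```python
def should_grow_job_pool(pool_size: int,
                         threads_available: int,
                         dispatched_steps: int) -> bool:
    """True when a pool meets both conditions for a larger pool size."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    available_pct = 100.0 * threads_available / pool_size
    # Both conditions from the slide must hold at the same time.
    return available_pct < 10 and dispatched_steps > 60


if __name__ == "__main__":
    # 2 of 25 threads free (8%) with 75 dispatched steps
    print(should_grow_job_pool(25, 2, 75))
```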

Tasks & Jobs – EM Jobs: How to control resources? (2) OMS connection sizing (12cR4)

• If the job pool sizes are changed, the number of connections the EM job sub-system can make to the repository database also has to be tuned
  – Minimum value: System Normal + System Critical (Default = 25+10 = 35)
  – Maximum value: Sum of all job pools (Default = 25+12+25+10+10 = 82)
  – Recommended value if user jobs are submitted frequently: Minimum value + User Short/2 (Default = 35+25/2 = ~47)
• To change the number of repository connections for the job workers on the OMS (default value out-of-box is 35):
  $ emctl set property -name oracle.sysman.core.conn.maxConnForJobWorkers -value 47
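The connection sizing arithmetic can be worked through in a few lines. This sketch assumes that "System Normal" and "System Critical" correspond to the systemPoolSize and longSystemPoolSize properties and that "User Short" is shortPoolSize; that mapping is inferred from the default values and is not stated in the deck. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class JobPools:
    # Defaults as listed in the deck; the mapping to property names is assumed.
    short: int = 25        # oracle.sysman.core.jobs.shortPoolSize
    long: int = 12         # oracle.sysman.core.jobs.longPoolSize
    system: int = 25       # oracle.sysman.core.jobs.systemPoolSize
    long_system: int = 10  # oracle.sysman.core.jobs.longSystemPoolSize
    wait: int = 10         # oracle.sysman.core.jobs.waitPoolSize


def connection_bounds(pools: JobPools) -> Tuple[int, int, int]:
    """Return (minimum, maximum, recommended) for maxConnForJobWorkers."""
    minimum = pools.system + pools.long_system
    maximum = (pools.short + pools.long + pools.system
               + pools.long_system + pools.wait)
    # Integer division: 35 + 25 // 2 gives 47, matching the "~47" in the deck.
    recommended = minimum + pools.short // 2
    return minimum, maximum, recommended


if __name__ == "__main__":
    print(connection_bounds(JobPools()))
```

Recomputing the three values after any pool change keeps the connection limit in step with the thread pools.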

Tasks & Jobs - Tasks: What kind of monitoring do we have?

• Standard database monitoring: monitor any DBMS_SCHEDULER job in the database using the Database Job Status metric
• Tasks are executed by DBMS_SCHEDULER jobs
  Use the 'DBMS Job Status' metric from the 'OMS and Repository' target to track the performance of the task workers (Repository Metrics nn)
• Performance and throughput numbers are stored per execution of a task

Tasks & Jobs - Tasks: How to get more information? (1) - Retrieving the monitoring data (12cR4)

Information in the Console:
• The scheduling information and the task backlog can be found on the Repository page
  Setup -> Manage Cloud Control -> Repository

Things to check:
• Growing backlog (possible lack of threads)
• Low throughput and constant backlog (processing bottleneck)
• High average duration (should be less than 60 seconds for short-running tasks, and up to 2 to 4 minutes for long-running tasks)

Tasks & Jobs - Tasks: How to get more information? (2) - Logging and tracing

• Tracking tasks
  $ repvfy dump errors (Overview of errors reported in the repository)
• Repository (PL/SQL tracing)
  $ repvfy send run_task -id <number>
• EMDIAG reports
  $ repvfy dump backlog (Backlog report for all information flows)
  $ repvfy dump task_health (Health report for Task sub-system)

421053.1: EMDIAG Master Index

Tasks & Jobs - Tasks: How to control resources? (12cR4)

• Two distinct classes of tasks: Short (< 1 minute elapsed time per execution) and Long (> 1 minute elapsed time per execution)
  – By default, there are 2 threads defined per class
  – Increase the number of worker threads if the time spent per hour is more than 75% and there is a constant backlog for the task class
• Change in the UI on the Repository page
  Setup -> Manage Cloud Control -> Repository -> Repository Collection Performance -> Configure
• Or use EMDIAG
  – See the current threads per class
    $ repvfy show worker_tasks
  – Change the number of worker threads in the repository to a minimum of 2 per class
    $ repvfy send set_workers

421053.1: EMDIAG Master Index
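The rule for adding task worker threads combines a utilisation figure with a backlog observation. The sketch below shows one way to evaluate it. It assumes that "time spent per hour" means the busy time of all workers in a class relative to the time those workers had available, and that a "constant backlog" means the backlog never dropped to zero during the sampled period; both readings are interpretations rather than definitions from the deck, and the names are hypothetical.

```python
from typing import Sequence


def needs_more_task_workers(busy_seconds_last_hour: float,
                            workers: int,
                            backlog_samples: Sequence[int]) -> bool:
    """True when a task class is over 75% busy and always has a backlog."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    # Each worker thread has 3600 seconds available per hour.
    utilisation = busy_seconds_last_hour / (workers * 3600)
    constant_backlog = len(backlog_samples) > 0 and all(
        sample > 0 for sample in backlog_samples
    )
    return utilisation > 0.75 and constant_backlog


if __name__ == "__main__":
    # Two workers, 6000 busy seconds (about 83%), backlog never empty
    print(needs_more_task_workers(6000, 2, [5, 4, 6]))
```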

Tasks & Jobs – DBMS_SCHEDULER: How to control resources? (12cR4)

• Controlled by the job_queue_processes database parameter (recommended value is 10)
• Check running jobs in the database on the 'Repository' page
  Setup -> Manage Cloud Control -> Repository -> Repository Scheduler Job Status region
• Edit the schedule for the daily jobs so they run in off-peak time

4 Key Processes: Target Data

Data Loading

What types of data are loaded into EM?

• Target Definition: metadata describing the target, and the monitoring of the target
• Target Telemetry: availability, performance, throughput
• Target State: target state changes (up/down), errors, alerts and threshold violations
• Configuration data: setup and configuration, properties

Data Loading: How do we monitor it? (12cR4)

• OMS loader metrics
  – Basic performance and capacity metrics
• Capacity graphs on the OMS page
  – Aggregate operational performance for all OMSs
  – Based on the metrics collected per OMS

Data Loading: How do we handle multiple simultaneous requests?

• A back-off request is generated when the OMS detects it cannot process ALL incoming requests from the Agents at a given point in time
  – The Agent is told to retry the upload request in 'x' seconds (progressive approach for repeat offenders, starting with 1 second and a maximum of 300 seconds)
• This is caused by an information flood:
  – Processing time for loading data is too slow (database resource issue, performance issue, data processing problem)
  – Too much information generated by all the Agents (metric data, alerts / state changes, metadata)
  – A single Agent generating so much information that it prevents other Agents from uploading data
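To get a feel for how a progressive back-off between 1 and 300 seconds behaves, the sketch below models it with a doubling delay. The deck only gives the starting value and the maximum; the doubling step is an assumption chosen for illustration, and the actual progression used by the OMS may differ. The function name is hypothetical.

```python
def backoff_delay(consecutive_backoffs: int,
                  start_seconds: int = 1,
                  max_seconds: int = 300) -> int:
    """Retry delay after the n-th consecutive back-off (doubling assumed)."""
    if consecutive_backoffs < 1:
        raise ValueError("consecutive_backoffs must be at least 1")
    # Double the delay for each repeat, but never exceed the maximum.
    return min(max_seconds, start_seconds * 2 ** (consecutive_backoffs - 1))


if __name__ == "__main__":
    print([backoff_delay(n) for n in range(1, 12)])
```

Under this assumption an Agent that keeps being told to back off reaches the 300 second ceiling after ten consecutive requests.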

Data Loading: How do we prioritize incoming requests?

• The 'Lifecycle Status' property of the target influences the behavior for loading data (incoming), notification processing (outgoing) and job dispatching (outgoing)
• The OMS and Repository know the lifecycle status of each target, and use it with every request to/from this target to prioritize the administration requests
• Change the property on the 'Target Properties' page
  Target homepage -> Target Setup -> Properties

Possible values:
1 – Mission Critical (highest)
2 – Production
3 – Stage (or blank/no value)
4 – Test / QA
5 – Development (lowest)
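The effect of the lifecycle status on request ordering can be illustrated with a simple sort. This is only a sketch of the ranking idea: the request tuples, the target names and the `prioritize` function are hypothetical, the "Test" key stands for the "Test / QA" value, and how EM actually schedules requests internally is not described in the deck.

```python
from typing import List, Optional, Tuple

# Rank per lifecycle status, 1 being the highest priority.
LIFECYCLE_RANK = {
    "Mission Critical": 1,
    "Production": 2,
    "Stage": 3,
    "Test": 4,
    "Development": 5,
}
# A blank or missing value is treated like "Stage", as listed on the slide.
DEFAULT_RANK = 3

Request = Tuple[str, Optional[str]]  # (target name, lifecycle status)


def prioritize(requests: List[Request]) -> List[Request]:
    """Order requests by lifecycle status; ties keep their arrival order."""
    return sorted(
        requests,
        key=lambda request: LIFECYCLE_RANK.get(request[1] or "", DEFAULT_RANK),
    )


if __name__ == "__main__":
    incoming = [("db-dev", "Development"), ("db-x", None),
                ("db-prod", "Production"), ("db-core", "Mission Critical")]
    print([name for name, _ in prioritize(incoming)])
```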

Data Loading: How to get more information? (1) - Loader performance and throughput

Information in the Console:
• Loader statistics are shown on the Health Overview page
  Setup -> Manage Cloud Control -> Health Overview
• A loader diagnostics report is available:
  Enterprise -> Reports -> Information Publisher Reports
  Report name = Loader Statistics

Things to check:
• The upload rate (Mb/sec) should be more or less constant. A fluctuating upload rate indicates some kind of performance bottleneck
• The upload backlog can temporarily spike (when several collections from multiple Agents are getting uploaded at the same time)
• Back-off requests can also spike temporarily (for a very short period of time), but should in general always be low (near zero)

Data Loading: How to get more information? (2) - Data upload volume (12cR4)

Information in the Console:
• Repository metrics page
  Setup -> Manage Cloud Control -> Repository -> Metrics tab
  Breakdown of the uploaded volume of metric data, alerts and errors

Things to check:
• Unusual amount of metric errors (bar much larger than for other target types)
• Unusual amount of alerts

Data Loading: How to get more information? (3) - Logging and tracing

• OMS log and trace files (EM application tracing)
  – Look for module 'GCLoader' in <GC_INST>/em/EMGC_OMS1/sysman/log
    emoms_pbs.log : English OMS trace file
    emoms_pbs.trc : Native language OMS trace file
  – Enable debug
    $ emctl set property -name log4j.category.oracle.sysman.emdrep.dbjava.loader -value "DEBUG" -module logging

• Repository (PL/SQL tracing)
  – Enable tracing (modules are 'LOADER' and 'METRIC_LOAD')
    SQL> exec emdw_log.set_trace_level('LOADER',<level>);
    or:
    $ repvfy send start_trace -name "LOADER"
    $ repvfy send stop_trace -name "LOADER"
  – To generate the PL/SQL trace report:
    $ repvfy dump trace

• EMDIAG reports
  $ repvfy dump backlog (Backlog report for all information flows)
  $ repvfy dump loader_health (Health report for the loader sub-system)
  $ repvfy dump metric_stats (Aggregated metric upload report)

421053.1: EMDIAG Master Index

Data Loading: How to control resources? (1) - Incoming data (12cR4)

• Drill down from the top 25 target types to get the upload volumes per target type
  Setup -> Manage Cloud Control -> Repository -> Metrics -> Drill down on a target type
• Alter the upload volume by changing the collection settings for data and configuration metrics
  – Enable or disable a metric: prevent unwanted metrics from getting collected
  – Collection frequency: scale back the frequency of non-critical metrics
  – Use 'Alerting Only' for state metrics: typically for metrics reporting a boolean-like output

Data Loading: How to control resources? (2) - Incoming state (12cR4)

• Keep the number of collection errors to a minimum!
  Fix underlying problems, to guarantee proper monitoring
• Set proper metric thresholds (warning, critical, number of occurrences)
  – What-if analysis is now available in 12cR4 to predict the number of incoming alerts based on threshold settings
  – Reduce the number of alerts generated by setting the correct warning and critical thresholds
  – Prevent unnecessary multiple alerts by setting the number of occurrences
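The number-of-occurrences setting reduces alert volume because a violation has to persist over consecutive collections before an alert is raised. The sketch below illustrates the idea on a list of collected values. It is a simplified model with a single upper threshold and hypothetical names, not the Agent's actual evaluation logic.

```python
from typing import Sequence


def alerts_raised(values: Sequence[float],
                  threshold: float,
                  occurrences: int) -> int:
    """Count alerts when `occurrences` consecutive violations are required."""
    if occurrences < 1:
        raise ValueError("occurrences must be at least 1")
    raised = 0
    streak = 0
    for value in values:
        if value > threshold:
            streak += 1
            # Raise once, at the moment the streak reaches the required length.
            if streak == occurrences:
                raised += 1
        else:
            streak = 0
    return raised


if __name__ == "__main__":
    samples = [91, 50, 92, 93, 94, 60, 95]
    print(alerts_raised(samples, 90, 1), alerts_raised(samples, 90, 3))
```

With the sample data, requiring three consecutive violations instead of one cuts the alert count from three to one, which is the kind of comparison the what-if analysis is meant to support.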

4 Key Processes: Notifications

Notifications

What is it, and which symptoms are seen?

• Alerts / Events (threshold violations as reported by the Agent or repository)
  – Symptoms: alerts not triggered, notifications/emails not sent
• Incidents / Problems (software or hardware faults, ADR and ASR incidents)
  – Symptoms: notifications/emails not sent
• Job state changes (status of a job execution, or informational state messages about the execution of EM jobs)
  – Symptoms: events not triggered, notifications/emails not sent
• Informational Messages (messages received via My Oracle Support)

Notifications: How is a Notification generated?

Alert -> Event -> Incident / Problem

1. The Agent detects a threshold violation, generates an Alert and uploads it to the OMS
2. The OMS correlates the Alerts into unique events, to track the life-cycle of a threshold violation
3. Incident rules are evaluated for each event update, to see if this event has to get turned into an Incident or a Problem
4. An Incident or Problem is created if so specified in the rule actions, and can be tracked via the Incident Manager in the UI
5. If the incident rules specify so, a notification is put in the delivery queue, and delivered to the recipient

(Diagram components: Metric Engine, Loader Sub-System, Event Processing, Incident Rules Promotion, Incident Rules Actions; tiers: Agent, OMS, Repository; incident rule processing is labeled Automated)

Notifications: How do we monitor it?

• Event processing statistics captured per OMS (processing of incoming alerts/events, conversion into incidents/problems and notification delivery checks)
• Notification delivery throughput and performance by method (EMAIL, JAVA, OS Command, PL/SQL, SNMP, SNMPv3 and Helpdesk Connector)
• A separate Agent-side availability metric for the notification system (to allow Out-Of-Band notifications)

Notifications: How to get more information? (1) - Retrieving monitoring data

• The number of notifications will be largely driven by the number of incoming alerts
  Setup -> Manage Cloud Control -> Repository -> Metrics
• The notification delivery backlog is shown on the 'Health Overview' page
  Setup -> Manage Cloud Control -> Health Overview

Notifications: How to get more information? (2) - Logging and tracing

• OMS log and trace files (EM application tracing)
  – Look for modules 'notification' and 'Delivery' in <GC_INST>/em/EMGC_OMS1/sysman/log
    emoms_pbs.log : English OMS trace file
    emoms_pbs.trc : Native language OMS trace file
  – Enable debug
    $ emctl set property -name log4j.category.oracle.sysman.em.notification -value "DEBUG" -module logging

• Repository (PL/SQL tracing)
  – Enable tracing (modules are 'EM_NOTIFY' and 'NOTIFICATION')
    SQL> exec emdw_log.set_trace_level('EM_NOTIFY',<level>);
    or:
    $ repvfy send start_trace -name "EM_NOTIFY"
    $ repvfy send stop_trace -name "EM_NOTIFY"
  – Trace levels: 0 - Fine / Debug, 1 - Informational, 2 - Warning, 3 - Severe, 4 - No Tracing / OFF
  – To generate the PL/SQL trace report:
    $ repvfy dump trace

• EMDIAG reports
  $ repvfy dump backlog (Backlog report for all information flows)
  $ repvfy dump notif_health (Health report for notification system)
  $ repvfy send notif_dump (Diagnostic dump in OMS log files for notification system)

421053.1: EMDIAG Master Index

Notifications: How to control resources?

• Number of notification threads (rarely changed):
  $ emctl set property -name oracle.sysman.core.notification.max_delivery_threads -value 24
• Number of notification thread connections (rarely changed; should always be between max threads / 2 and max threads):
  $ emctl set property -name oracle.sysman.core.conn.maxConnForNotifications -value 25
• Limit the number of notifications sent out per minute (extremely rare; global parameters, set once for all OMSs):
  $ emctl set property -name oracle.sysman.core.notification.emails_per_minute -value 5000
  $ emctl set property -name oracle.sysman.core.notification.cmds_per_minute -value 5000
  $ emctl set property -name oracle.sysman.core.notification.traps_per_minute -value 5000
  $ emctl set property -name oracle.sysman.core.notification.plsql_per_minute -value 5000
• Before changing the values though, consider these two questions:
  – Why were the alerts generated in the first place? (wrong threshold used?)
  – Do I really want to notify people about all these metrics? (incident rule change?)

Summary


To summarize

• Information flows are initiated by background processes (e.g. callbacks) and as a result of user-initiated activity (e.g. Console or EMCLI)
• Efficient interaction between all the EM components is the key to good performance
• Monitor performance and throughput on each tier
• Resources have to be balanced on all tiers (Agent, OMS and Repository)
• Resource constraints on one tier can cause upstream or downstream bottlenecks

Questions?

Appendix

EM Resources

• Focus-On Document for Enterprise Manager @ OOW 2014: https://oracleus.activeevents.com/2014/connect/focusOnDoc.do?focusID=17776

• Oracle website http://www.oracle.com/us/products/enterprise-manager/index.html

• Documentation http://www.oracle.com/pls/em121/homepage

• Best Practices:

– Best Practices Blog https://blogs.oracle.com/EMMAA/

– Operational Considerations and Troubleshooting http://www.oracle.com/technetwork/database/availability/managing-em12c-1973055.pdf

– White paper: Sizing guidelines http://www.oracle.com/technetwork/oem/framework-infra/em12c-sizing-1590739.pdf

Getting Additional Information About Enterprise Manager

Enterprise Manager Sessions – Monday, September 29th

ID Title Time Location

CON8217 Managing the Oracle Fusion Middleware Stack with Oracle Enterprise Manager 11:45 AM - 12:30 PM Moscone South - 200

CON8856 Oracle Enterprise Manager: The Complete Solution and Oracle’s Best Kept Secrets 11:45 AM - 12:30 PM Moscone South - 301

CON8449 Automatic Workload Repository Warehouse: Helping DBAs Make Sure History Never Repeats Itself 1:30 PM - 2:15 PM Moscone South - 104

CON8018 Best Practices from Oracle Cloud Delivered On-Premises with Oracle Enterprise Manager 1:30 PM - 2:15 PM Moscone South - 270

CON8225 Under the Hood: Diagnosing and Troubleshooting Oracle Enterprise Manager 12c Release 4 1:30 PM - 2:15 PM Moscone South - 302

CON8138 Beyond the Basics: Making the Most of Oracle Enterprise Manager 12c Monitoring 1:30 PM - 2:15 PM Moscone South - 304

CON8567 Best Practices for Maintaining and Supporting Oracle Enterprise Manager 2:45 PM - 3:30 PM Intercontinental - Grand Ballroom C

CON8178 Best Practices for Managing Oracle WebLogic Server with Oracle Enterprise Manager 12c 2:45 PM - 3:30 PM Moscone South - 200

CON8177 Private Database Clouds: A Standardized Service Catalog for Delivering DBaaS 2:45 PM - 3:30 PM Moscone South - 305

CON3178 Database Software Currency: Using Oracle Enterprise Manager 12c Provisioning and Patching 2:45 PM - 3:30 PM Moscone South - 301


Enterprise Manager Sessions – Monday, September 29th

ID Title Time Location

CON3111 Set Up Oracle Real User Experience Insight 12c to Monitor Oracle WebLogic Applications’ UX 4:00 PM - 4:45 PM Moscone South - 250

CON4102 SQL Tuning Without Trying 4:00 PM - 4:45 PM Moscone South - 104

CON8212 Oracle Management Pack Plus for Identity Management Best Practices and Lessons Learned 4:00 PM - 4:45 PM Moscone South - 200

CON7899 Oracle Data Integrator: Product Update and Future Strategy 4:00 PM - 4:45 PM Moscone South - 252

CON2043 Consolidating to Database as a Service with Oracle Real Application Testing 5:15 PM - 6:00 PM Moscone North - 130

CON5983 Full Visibility into Oracle WebLogic/Java Diagnostics with Oracle Enterprise Manager 12c 5:15 PM - 6:00 PM Moscone South - 200

CON2436 Why Database as a Service Will Be a Breakaway Technology at Société Générale 5:15 PM - 6:00 PM Moscone South - 301

CON7720 Advanced Management with Oracle Application Management Suite for Oracle E-Business Suite 5:15 PM - 6:00 PM Moscone West - 2018

CON8214 Maximizing Reliability of Oracle Business Intelligence Enterprise Edition and Oracle Exalytics 5:15 PM – 8:00 PM Moscone South – 262


Enterprise Manager Sessions – Tuesday, September 30th

ID Title Time Location

GEN8250 General Session: Drive the Future of Self-Service IT with Oracle Enterprise Manager Noon – 12:45 PM Moscone South - 103

CON5748 Create a DBaaS Catalog in an Hour with a PaaS-Ready Infrastructure Noon – 12:45 PM Moscone South - 301

CON2586 Best Practices for Deploying a DBaaS in a Private Cloud Model Noon – 12:45 PM Moscone South - 310

CON7830 Solving Data Skew in Oracle Business Applications with Oracle’s Flash-Optimized SAN Storage 3:45 PM - 4:30 PM Intercontinental - Intercontinental C

CON8452 Future Now: Advanced Database Management for Today’s DBA 3:45 PM - 4:30 PM Moscone South - 104

CON4045 Provision Oracle Fusion Middleware Faster with Oracle Enterprise Manager 12c 3:45 PM - 4:30 PM Moscone West - 3016

CON5875 Using Oracle Enterprise Manager to Deliver Multitenant DBaaS on Oracle Exadata: Lessons Learned 5:00 PM - 5:45 PM Moscone South - 301

CON8450 SQL (and PL/SQL) Tuning Experts Panel 5:00 PM - 5:45 PM Moscone South - 308


Enterprise Manager Sessions – Wednesday, October 1st

ID Title Time Location

CON4954 Oracle Infrastructure Systems Management with Oracle Enterprise Manager and Ops Center 10:15 AM - 11:00 AM Intercontinental - Telegraph Hill

CON7961 Streamline Utility IT Operations with Oracle Enterprise Manager 10:15 AM - 11:00 AM Marriott Marquis - Salon 14/15

CON8139 Database Time-Based Performance Tuning: From Theory to Practice 10:15 AM - 11:00 AM Moscone South - 104

CON8173 Management of Oracle SOA Suite and Oracle Service Bus with Oracle Enterprise Manager 12c 10:15 AM - 11:00 AM Moscone South - 200

CON8121 Databases to Oracle Exadata: The Saga Continues for Oracle Enterprise Manager–Based Patching 10:15 AM - 11:00 AM Moscone South - 300

CON3182 Deployment of Oracle Exadata and Oracle Exalogic Increases Business Efficiency 10:15 AM - 11:00 AM Moscone South - 310

CON8133 Behind the Scenes of Managing the Engineered Systems Showcase 11:30 AM – 12:15 PM Intercontinental - Telegraph Hill


Enterprise Manager Sessions – Wednesday, October 1st

ID Title Time Location

CON2927 Oracle Enterprise Manager 12c: Maximize ROI via a Single Pane of Glass Across a Data Center 11:30 AM - 12:15 PM Moscone South - 200

CON8247 DBA’s New Best Friend for Mistake-Free Administration: Oracle Real Application Testing 11:30 AM - 12:15 PM Moscone South - 301

CON8245 Tips for Successful Oracle Exadata Management with Oracle Enterprise Manager 12c 11:30 AM - 12:15 PM Moscone South - 303

CON8451 Next-Generation Testing with Oracle Application Testing Suite 11:30 AM - 12:15 PM Moscone West - 3002

CON8091 Middleware as a Service: Converged Solution for Administrators and DevOps 12:45 PM - 1:30 PM Moscone South - 301

CON8134 Zero to Manageability in One Hour: Build a Solid Foundation for Oracle Enterprise Manager 12c 12:45 PM - 1:30 PM Moscone South - 303

CON5489 Deploy Oracle Fusion Middleware as a Service (MWaaS) on a Shared-Services Cloud 12:45 PM - 1:30 PM Moscone South - 309


Enterprise Manager Sessions – Wednesday, October 1st

ID Title Time Location

CON8185 Use Oracle Enterprise Manager in a Box to Easily Manage the Enterprise 2:00 PM - 2:45 PM Moscone North - 131

CON8130 Deployment Best Practices for Private Cloud: Fast Track to DBaaS and MWaaS 2:00 PM - 2:45 PM Moscone South - 301

CON8248 Trouble-Free Upgrade to Oracle Database 12c with Oracle Real Application Testing 2:00 PM - 2:45 PM Moscone South - 303

CON8016 DBaaS 2.0: Rapid Provisioning, Richer Services, Integrated Testing, and More 3:30 PM – 4:15 PM Moscone South - 301

CON7726 Oracle Exadata Database Machine Administration and Monitoring Made Easy 4:45 PM – 5:30 PM Moscone South - 104

CON8260 Database as a Service (DBaaS) Cookbook: Strategies and Tips for Successful Deployment 4:45 PM – 5:30 PM Moscone South - 301


Enterprise Manager Sessions – Thursday, October 2nd

ID Title Time Location

CON2561 You’ve Got It; Flaunt It: Oracle Enterprise Manager Cloud Control Extensibility 9:30 AM - 10:15 AM Marriott Marquis - Golden Gate C3

CON8273 Management and Monitoring of Oracle Tuxedo: Integrated, Automated 9:30 AM - 10:15 AM Marriott Marquis - Salon 14/15

CON7940 Building an On-Premises Java Cloud: Oracle WebLogic Server and Oracle Enterprise Manager 9:30 AM - 10:15 AM Moscone South - 200

CON8243 Oracle Enterprise Manager 12c Security Cookbook: Best Practices for Large Data Centers 9:30 AM - 10:15 AM Moscone South - 300

CON3028 Enterprise Architecture Approach to Developing a DBaaS Private Cloud at Boeing 9:30 AM - 10:15 AM Moscone South - 301

CON8184 What’s New and Best Practices for Oracle Data Masking and Subsetting 9:30 AM - 10:15 AM Moscone South - 306

CON5451 Highly Available, Highly Scalable: Oracle Enterprise Manager 12c for Large Enterprises 10:45 AM - 11:30 AM Marriott Marquis - Golden Gate C3

CON4114 Advanced Diagnostics and Monitoring with Oracle Enterprise Manager 12c 10:45 AM - 11:30 AM Moscone South - 301


Enterprise Manager Sessions – Thursday, October 2nd

ID Title Time Location

CON2699 Oracle Exadata’s Exachk and Oracle Enterprise Manager 12c: Keeping Up with Oracle Exadata 10:45 AM - 11:30 AM Moscone South - 310

CON4448 PDBaaS with Oracle Enterprise Manager 12c 12:00 PM - 12:45 PM Marriott Marquis - Golden Gate C3

CON10038 Customer Panel: Private Cloud Consolidation, Standardization, and Automation 12:00 PM - 12:45 PM Moscone South - 301

CON8244 Manage the Manager: Tips on How to Best Manage Oracle Enterprise Manager 12c 1:15 PM - 2:00 PM Marriott Marquis - Golden Gate C3

CON8015 Security Compliance and Data Governance: Dual Problems, Single Solution 1:15 PM - 2:00 PM Moscone South - 301

CON7718 Managing and Monitoring Oracle GoldenGate 1:15 PM - 2:00 PM Moscone South - 302

CON7697 Oracle Enterprise Manager 12c Cloud Control for Managing Oracle E-Business Suite 12.2 1:15 PM - 2:00 PM Moscone West - 2018

CON6083 Real-World Operation Excellence with Oracle Enterprise Manager 12c: Taking It to the Next Level 2:30 PM - 3:15 PM Marriott Marquis - Golden Gate C3

CON8493 Odyssey of DBaaS: A UBS Story 2:30 PM - 3:15 PM Moscone South - 301


Enterprise Manager Demos

ID Title Location Area Demopod #

3943 Application and Infrastructure Testing Moscone West, Lower Left Applications WLL-020

3962 Automatic Application and SQL Tuning Moscone South, Left Database SLD-106

3946 Automatic Fault Diagnostics Moscone South, Left Database SLD-101

3963 Automatic Performance Diagnostics Moscone South, Left Database SLD-103

3944 Automatic Workload Repository Warehouse Moscone South, Left Database SLD-111

3948 Automation and Storage Savings with Database as a Service and Snap Clone Moscone South, Left Database SLD-102

3921 Complete Data Center Monitoring with Oracle Enterprise Manager 12c Moscone South, Left Database SLD-112

3947 Complete Database Lifecycle Management Moscone South, Left Database SLD-107

3881 End User Monitoring and Diagnostics with Oracle Enterprise Manager 12c Moscone South, Left Middleware SLM-109

4028 Identity Management Monitoring with Enterprise Manager 12c Moscone South, Left Middleware SLM-141


Enterprise Manager Demos

ID Title Location Area Demopod #

3928 Middleware PaaS in Private Cloud with Oracle Enterprise Manager 12c Moscone South, Left Middleware SLM-111

3925 Oracle Applications and Business Intelligence Management with Oracle Enterprise Manager 12c Moscone West, Lower Left Applications WLL-023

3966 Oracle Enterprise Manager Cloud Control 12c Overview Moscone South, Left Database SLD-105

3949 Oracle SuperCluster and Oracle VM for SPARC Management with Oracle Enterprise Manager Ops Center 12c Moscone South, Center Systems, Servers, Virtualization -SC-158

3942 Oracle WebLogic Server and Oracle Coherence Management with Oracle Enterprise Manager 12c Moscone South, Left Middleware SLM-107

3945 Risk-Free Database Administration with SQL Performance Analyzer and Database Replay Moscone South, Left Database SLD-108

3926 SOA and Service Bus Management with Oracle Enterprise Manager 12c Moscone South, Left Middleware SLM-140


Enterprise Manager One Hour Hands-On Labs Monday 9/29 at Hotel Nikko

ID Title Time Room

HOL9508 Oracle Enterprise Manager Database as a Service: Automation for Broader Cloud Services 01:15 – 02:15 Hotel Nikko - Carmel

HOL9529 Rapidly Mass-Deploy Oracle Fusion Middleware with Oracle Enterprise Manager 12c Provisioning 02:45 – 03:45 Hotel Nikko - Nikko Ballroom I

HOL9532 Achieving Standardization with Oracle Enterprise Manager Database Lifecycle Management 04:15 – 05:15 Hotel Nikko - Carmel

HOL9530 Risk-Free Database Consolidation for Private Clouds with Oracle Real Application Testing 05:45 – 06:45 Hotel Nikko - Carmel


Enterprise Manager One Hour Hands-On Labs Tuesday 9/30 at Hotel Nikko

ID Title Time Room

HOL9528 Private Cloud Self-Service, Oracle Fusion Middleware PaaS with Oracle Enterprise Manager 12c 03:45 – 04:45 Nikko Ballroom I

HOL9509 Oracle Enterprise Manager 12c: Oracle WebLogic Server and SOA Diagnostics and Administration 05:15 – 06:15 Nikko Ballroom I

HOL9508 Oracle Enterprise Manager Database as a Service: Automation for Broader Cloud Services 05:15 – 06:15 Carmel

HOL9484 Maximizing Oracle Database 12c Performance with Oracle Enterprise Manager 06:45 – 07:45 Carmel


Enterprise Manager One Hour Hands-On Labs Wednesday 10/1 at Hotel Nikko

ID Title Time Room

HOL9484 Maximizing Oracle Database 12c Performance with Oracle Enterprise Manager 02:45 – 03:45 Carmel

HOL9532 Achieving Standardization with Oracle Enterprise Manager Database Lifecycle Management 04:15 – 05:15 Carmel


Enterprise Manager One Hour Hands-On Labs Thursday 10/2 at Hotel Nikko

ID Title Time Room

HOL9484 Maximizing Oracle Database 12c Performance with Oracle Enterprise Manager 10:00 – 11:00 Carmel

HOL9509 Oracle Enterprise Manager 12c: Oracle WebLogic Server and SOA Diagnostics and Administration 11:30 – 12:30 Nikko Ballroom I

HOL9528 Private Cloud Self-Service, Oracle Fusion Middleware PaaS with Oracle Enterprise Manager 12c 01:00 – 02:00 Nikko Ballroom I


ADDM Automatic Database Diagnostic Monitor
ADR Automatic Diagnostic Repository
ASH Active Session History
ASR Automatic Service Request
AWR Automatic Workload Repository
BI Business Intelligence
BIP BI Publisher
CLI Command-line Interface
CPU Central Processing Unit
CRS Cluster Ready Services
DBMS Database Management System
EM Enterprise Manager
GC Grid Control
HTTP Hypertext Transfer Protocol
IO Input / Output
IP Information Publisher
IT Information Technology
JDK Java Development Kit
JVM Java Virtual Machine
JVMD JVM Diagnostics
MDA Middleware Diagnostic Advisor
MTM Monitor The Monitor
NAS Network Attached Storage
OMA Oracle Management Agent
OMR Oracle Management Repository
OMS Oracle Management Server
OOB Out-of-Band
OS Operating System
OUI Oracle Universal Installer
PBS Platform Background Services
PLSQL Procedural Language SQL
QA Quality Assurance
RAC Real Application Clusters
RCA Root Cause Analysis
RMAN Recovery Manager
SAN Storage Area Network
SLA Service Level Agreement
SNMP Simple Network Management Protocol
SQL Structured Query Language
UI User Interface
URL Uniform Resource Locator
WLS WebLogic Server

The TLA library…
