Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
CON8225 - Under the Hood
Diagnosing and Troubleshooting Oracle Enterprise Manager 12c Release 4
Andrew Bulloch, Werner De Gruyter, Courtney Llamas Enterprise Manager Strategic Customer Programs September 29th, 2014
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Program Agenda
1. Architecture Overview
2. Diagnostic Methodology
3. 4 Key Processes
4. Summary
5. Appendix
Architecture Overview
Total Cloud Control
Optimized, Efficient
Agile, Automated
Expanded Cloud Stack Management
Scalable, Secure
Superior Enterprise-Grade Management
Complete Cloud Lifecycle Management
Administration and Maintenance Monitoring Service Level Management
Enterprise Manager Infrastructure Overview
Repository (OMR): the heart
• Data storage, aggregation, rollup and purging
Management Server (OMS): the brains
• Handles management data from Agents, and delegates administration tasks to the Agents
• Handles administration and real-time monitoring requests from the UI
• Sends out notifications
Agents (OMA): the nerve endings
• Receive and act upon task requests from the OMS
• Gather management information
• Perform administration tasks
[Diagram labels: Repository Database; Management Server; EM Users (EMCLI, Console / Reports); three Agents; Firewall; Notifications; Connectors; Internet]
Requirements per tier
Repository (OMR) - Focus on: CPU, IO
• Enterprise Edition database
  – Fine-grained access control
  – Partitioning
• Version 10.2.0.5 or higher*
  – Recommend dedicated version 11.2.0.4
• Use physical standby only for Data Guard
• Details on the new repository views can be found in the Extensibility Guide
Management Server (OMS) - Focus on: Network, Memory
• WebLogic Server 10.3.6
• Dedicated WLS server
• Sun JDK 1.6.0_43 or higher
• OUI will install WLS if not already installed
Management Agent (OMA) - Focus on: Connectivity, CPU
• Agent version must be the same as the OMS or lower
• JDK 1.6.0_43 or higher
• Deployed from the OMS (agent push) or manual install (agent pull)
• Plug-ins deployed when needed
• One Agent per monitored machine
*Check certification matrix in My Oracle Support
EM Architecture
EM is composed of many different subsystems:
• UI & Real-Time Monitoring
• Notifications
• Reporting
• Incident Management
• Data Collection
• Data Aggregation
• Data Loading
• Alerting
• Jobs
EM Architecture
Tiers: Repository (OMR), Management Server (OMS), Management Agent (OMA)
[Diagram: subsystems placed across the tiers: Jobs, Data Collection, Data Loading, Data Aggregation, Incident Management, Alerting, Notifications, Reporting, UI & Real-Time Monitoring]
EM Architecture
Tiers: Repository (OMR), Management Server (OMS), Management Agent (OMA)
[Diagram: subsystems placed across the tiers: User Interaction & Real-Time Monitoring, Notifications, Reporting, Incident Management, Data Collection, Data Aggregation, Jobs, Data Loading, Alerting]
EM Architecture – Focus today on 4 key Processes
Repository (OMR) Management Server (OMS) Management Agent (OMA)
User Interaction & Real-Time Monitoring
Notifications
Jobs
Data Loading
The Takeaways for Today
• Information flows are initiated by background processes (e.g. callbacks) and as a result of user-initiated activity (e.g. Console or EMCLI)
• Efficient interactions between all the EM components are the key to good performance
• Monitor performance and throughput on each tier
• Resources have to be balanced on all tiers (Agent, OMS and repository)
• Resource constraints on one tier can cause up-stream or down-stream bottlenecks
Enterprise Manager – 4 Key Processes
Management Server and Repository
• User Interaction & Real-Time Monitoring: Administrators logging into EM, performing tasks and requesting information
• Notifications: Processing of alerts and notifying the right set of administrators and/or 3rd party help desk applications
• Jobs & Tasks: Jobs initiated by Administrators (user jobs), as well as internal housekeeping and operational jobs (system jobs and housekeeping operations)
• Data Loading: Agents uploading telemetry and operational data
[Diagram labels: Console, 3rd Party, Agent; Interactive, Background]
Diagnostic Methodology
Methodology – Method in the madness
Diagnostics Workflow
• The Console is your starting point. It’s good (and getting better) at giving you task based errors (often with solution guidance)
• MTM (Monitor The Monitor) is good at giving you non-task based information but it’s all ‘pull based’ and asynchronous in nature
• Sometimes the only way to identify or diagnose a problem is at the component level (Repository, OMS or Agent)
[Diagram: Console, MTM, Component; axis label: Knowledge Level]
Component Diagnostics Tools Available
Repository (OMR)
• The repository is just another managed database: Database Advisor information also available for the repository database
  – ADDM (Diagnostic Monitor)
  – AWR (Workload Repository)
  – ASH (Session History)
  – Segment Advisor
  – etc…
• EMDIAG (EM Diagnostics): repvfy
Management Server (OMS)
• The OMS is just another managed WebLogic stack
  – JVM Diagnostics (!)
  – MDA (Middleware Diagnostic Advisor)
  – etc…
• emctl and emcli command-line utilities
• JAVA Tools (jps, jstat, jmap, etc…)
• 'Repository' (oracle_emrep) and OMS (oracle_oms) targets
• EMDIAG (EM Diagnostics): omsvfy
Management Agent (OMA)
• emctl and emcli command-line utilities
• JAVA Tools (jps, jstat, jmap, etc…)
• Agent Metric Browser
• Agent (oracle_emd) targets
• EMDIAG (EM Diagnostics): agtvfy
421053.1: EMDIAG Master Index
1556491.1: Using Agent Metric browser for Diagnosing Agent Issues
4 Key Processes User Interaction & Real-Time Monitoring
User Interaction & Real-Time Monitoring
What are the users doing? / Symptoms Experienced
Regular work in the Console
• Logging in
• Checking incidents
• Performing administration tasks
Symptoms: Slow login; slow page performance
Real-time Monitoring and debugging of managed targets
• Database performance pages
• Adding tablespaces
• Viewing log files
Symptoms: Slow response; refresh of pages takes a long time; connectivity issues (no data available)
Reports
• Automated (scheduled)
• Interactive (ad-hoc, on-demand)
Symptoms: Slow response; scheduled reports not running at the appointed time
User Interaction & Real-Time Monitoring How do we monitor it?
12cR4
• Tracking UI page performance
  Manage Cloud Control -> Health Overview -> Monitoring -> All Metrics -> Page Performance
• Beacons can be added to monitor the responsiveness of the EM application over time
  – Deploy in strategic locations on the network
  – Per URL basis (login, or start of a process)
  – Can be done from any Agent
User Interaction & Real-Time Monitoring How to get more information? (1) - Standard EM monitoring data
12cR4
• Aggregated UI usage (new in 12cR4)
  Setup -> Manage Cloud Control -> Health Overview -> Monitoring -> Page Performance
• Track the test performance of the beacon
  An 'EM Management Beacon' is created during install to monitor the EM application on the 1st OMS
1460408.1: Troubleshooting OMS and UI Performance Issues
User Interaction & Real-Time Monitoring How to get more information? (2) - Tracing a UI page
• Use EMCLI to trace any UI page (identify session -> start trace -> perform actions -> stop trace -> run report)
• Run the report on the OMS with emctl (check the emctl.log file for details about the generated trace file)
1. Identify Session
   $ emcli list_active_sessions -details
2. Start Trace
   $ emcli trace -enable=true -user=<user to be traced>
3. Perform Actions
   << user interaction here >>
4. Stop Trace
   $ emcli trace -enable=false -user=<user to be traced>
5. Run Report
   $ emctl genreport oms -file_name <just-the-filename>.trace
1640578.1: How to find out the SQL Statements that are run in the Repository
User Interaction & Real-Time Monitoring How to control resources?
• General slowness for all the EM pages
  > JAVA heap size for the OMS (to accommodate memory for each logged-in user)
    – Use the OMS_HEAP_MAX, OMS_HEAP_MIN, OMS_PERMGEN_MAX, OMS_PERMGEN_MIN parameters to change the heap settings
      $ emctl set property -name OMS_HEAP_MAX -value 2560
    – Common values for the JAVA heap are between 2 GB and 4 GB
    – Default values:
      OMS_HEAP_MAX 1740M
      OMS_HEAP_MIN 56M
      OMS_PERMGEN_MAX 768M
      OMS_PERMGEN_MIN 128M
• Inconsistent performance
  > Asymmetric RAC database configuration
    – RAC configuration and resources not identical across the database
  > Load Balancer setup and configuration
    – Users require session affinity to a particular OMS (For new connections: Round-Robin vs. Most Requested)
• Login problems
  > Number of sessions in the database (init.ora setting)
    – Check the sessions and processes settings in the database
  > Automatic logout of inactive sessions
    – Controlled by the global 'oracle.sysman.eml.maxInactiveTime' parameter (specified in minutes)
      Specified only once for ALL OMSs, default is 45 minutes
      $ emctl set property -name oracle.sysman.eml.maxInactiveTime -value 45
  > Control the number of possible connections from clients (rarely changed):
    – MaxClients parameter in the httpd.conf file
      Default is 150 (means: 150 simultaneously connected sessions for this OMS serviced by Oracle HTTP Server)
4 Key Processes Tasks & Jobs
Tasks & Jobs
What are tasks and jobs? / Types of tasks and jobs / Symptoms
EM Jobs - Job Engine of Enterprise Manager (Jobs)
• What: Maintenance and Administration tasks submitted by the Administrators in the console (RMAN, OS commands, reports, patches, etc…)
• Types:
  – Short - Small synchronous requests made to the Agent (status updates, data retrieval, etc…)
  – Long - Long running asynchronous requests (file transfers, running of OS commands, etc…)
• Symptoms: Jobs not executing at the appointed time, failed or suspended jobs, …
EM Jobs - Job Engine of Enterprise Manager (Jobs)
• What: Maintenance and Administration tasks submitted by EM itself in the background in response to monitoring or administration operations (Clustered target fail-over, template apply, Administration group synchronization, etc…)
• Types:
  – Short - Small synchronous requests made to the Agent (status updates, data retrieval, etc…)
  – Long - Long running asynchronous requests (file transfers, running of OS commands, etc…)
• Symptoms: Pending template apply operations, Administration groups not updated, pending availability status for cluster members, …
Internal Jobs - DBMS_SCHEDULER engine (Repository) (Tasks)
• What: Housekeeping tasks (Composite Availability Calculation, RCA analysis, Compliance score calculation, Rollup, Purge, etc…)
• Types:
  – Short - Quick operations taking less than 60 seconds
  – Long - Long running operations (more than 60 seconds)
• Symptoms: Pending availability status for clustered targets, compliance scores not updated, EM monitoring data unavailable
Tasks & Jobs – EM Jobs
How do we monitor it?
Two parts for executing jobs in EM:
– Scheduling information for jobs (Repository-side)
  Decides which jobs need to get picked up by an OMS for dispatching. One DBMS_SCHEDULER job runs every 30 seconds to check the schedule
– Dispatching and executing information (per OMS)
  Gets the job details, and informs the Agents of the work that has to get done. One thread per OMS picks up the scheduled work, and multiple worker threads per OMS talk to the Agents and execute the actual work to be done
Tasks & Jobs – EM Jobs
How to get more information? (1) - Retrieving monitoring data
Information in the Console:
• Special service created: 'EM Job Service'
  This is a rollup of the repository operations (step scheduling) and all OMS operations (job dispatching) information
• Reports for the EM Job system
  'Job System Diagnostic Report' (New for 12cR4)
Things to check:
• Growing backlog (not enough resources)
• High % processing time (>75% in general means a repository bottleneck)
• Low throughput with high processing % time (Processing bottleneck)
Tasks & Jobs – EM Jobs
How to get more information? (2) - Logging and tracing
• OMS log and trace files (EM application tracing)
  – Look for modules 'Job Step', 'RJob Step', 'Load Job Step' and 'JobRecv' in <GC_INST>/em/EMGC_OMS1/sysman/log
      emoms.log : English OMS trace file
      emoms.trc : Native language OMS trace file
  – Enable debug logging
      $ emctl set property -name log4j.category.oracle.sysman.emdrep.jobs -value "DEBUG" -module logging
      $ emctl set property -name log4j.category.oracle.sysman.eml.jobs -value "DEBUG" -module logging
    Always reset the debug levels when finished!
• Repository (PL/SQL tracing)
  – Enable tracing
      SQL> exec emdw_log.set_trace_level('EM.JOBS',<level>);
    or:
      $ repvfy send start_trace -name "EM.JOBS"
      $ repvfy send stop_trace -name "EM.JOBS"
    Levels:
      0 - Fine / Debug
      1 - Informational
      2 - Warning
      3 - Severe
      4 - No Tracing / OFF
  – To generate the PL/SQL trace report:
      $ repvfy dump trace
• EMDIAG reports
    $ repvfy dump backlog (Backlog report for all information flows)
    $ repvfy dump job_health (EM job system health details)
421053.1: EMDIAG Master Index
1670012.1: Impact of Setting Debug Mode for OMS and Steps to Enable Debug for Particular Subsystems
Tasks & Jobs – EM Jobs
How to control resources? (1) OMS thread sizing
12cR4
• See the current pool sizes:
    $ repvfy show job_pools
• Change the job pool sizes on ALL OMSs if available threads are low (<10%), and dispatched steps are high (>60):
    $ emctl set property -name oracle.sysman.core.jobs.shortPoolSize -value 25
    $ emctl set property -name oracle.sysman.core.jobs.longPoolSize -value 12
    $ emctl set property -name oracle.sysman.core.jobs.systemPoolSize -value 25
    $ emctl set property -name oracle.sysman.core.jobs.longSystemPoolSize -value 10
    $ emctl set property -name oracle.sysman.core.jobs.waitPoolSize -value 10
• Job Activity Details Page (Detail for each OMS)
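The thread sizing rule of thumb (available threads below 10% and more than 60 dispatched steps) can be written down as a small check. This is a minimal Python sketch, not part of EM or EMDIAG: the function name and its arguments are hypothetical, and reading "<10%" as a share of the pool size is an assumption.

```python
def pool_needs_resize(available_threads: int, pool_size: int, dispatched_steps: int) -> bool:
    """Return True when a job pool matches the rule of thumb from the slide.

    Assumption: "available threads are low (<10%)" is read as a share of the
    configured pool size; "dispatched steps are high" means more than 60.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    available_ratio = available_threads / pool_size
    return available_ratio < 0.10 and dispatched_steps > 60


if __name__ == "__main__":
    # Hypothetical numbers, e.g. read manually from the job pool report.
    print(pool_needs_resize(available_threads=2, pool_size=25, dispatched_steps=75))
```

Both conditions have to hold at the same time; a busy pool with few dispatched steps does not qualify under this reading.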
Tasks & Jobs – EM Jobs
How to control resources? (2) OMS connection sizing
12cR4
• If the job pool sizes are changed, the number of connections the EM job sub-system can make to the repository database also has to be tuned
  – Minimum value: System Normal + System Critical (Default = 25+10 = 35)
  – Maximum value: Sum of all job pools (Default = 25+12+25+10+10 = 82)
  – Recommended value if user jobs are submitted frequently: Minimum value + User Short/2 (Default = 35+25/2 = ~47)
• To change the number of job worker connections on the OMS (default value out-of-box is 35):
    $ emctl set property -name oracle.sysman.core.conn.maxConnForJobWorkers -value 47
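The connection sizing arithmetic can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the defaults are the values quoted on the slide, and "User Short/2" is rounded down so that the default works out to the slide's approximate value of 47.

```python
def job_worker_connections(system_normal=25, system_critical=10,
                           user_short=25, all_pools=(25, 12, 25, 10, 10)):
    """Compute minimum, maximum and recommended job worker connections.

    minimum     = System Normal + System Critical
    maximum     = sum of all job pools
    recommended = minimum + User Short / 2 (rounded down; an assumption)
    """
    minimum = system_normal + system_critical
    maximum = sum(all_pools)
    recommended = minimum + user_short // 2
    return {"minimum": minimum, "maximum": maximum, "recommended": recommended}


if __name__ == "__main__":
    print(job_worker_connections())
```

If the pool sizes are changed, passing the new values into the function gives the matching range for the connection setting.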
Tasks & Jobs - Tasks
What kind of monitoring do we have?
• Standard database monitoring:
  Monitor any DBMS_SCHEDULER job in the database using the Database Job Status metric
• Tasks are executed by DBMS_SCHEDULER jobs
  Use the 'DBMS Job Status' metric from the 'OMS and Repository' target to track the performance of the task workers (Repository Metrics nn)
• Performance and throughput numbers are stored per execution of a task
Tasks & Jobs - Tasks
How to get more information? (1) - Retrieving the monitoring data
12cR4
Information in the Console:
• The scheduling information and the task backlog can be found on the Repository page
  Setup -> Manage Cloud Control -> Repository
Things to check:
• Growing backlog (possible lack of threads)
• Low throughput and constant backlog (processing bottleneck)
• High average duration (should be <60 seconds for short running tasks and up to 2 to 4 minutes for long running tasks)
Tasks & Jobs - Tasks
How to get more information? (2) - Logging and tracing
• Tracking tasks
    $ repvfy dump errors (Overview of errors reported in the repository)
• Repository (PL/SQL tracing)
    $ repvfy send run_task -id <number>
• EMDIAG reports
    $ repvfy dump backlog (Backlog report for all information flows)
    $ repvfy dump task_health (Health report for Task sub-system)
421053.1: EMDIAG Master Index
Tasks & Jobs - Tasks
How to control resources? - Tasks
12cR4
• Two distinct classes of tasks: Short (< 1 minute elapsed time per execution) and Long (> 1 minute elapsed time per execution)
  By default, there are 2 threads defined per class
  Increase the number of worker threads if the time spent per hour is more than 75% and there is a constant backlog for the task class
• Change in the UI on the Repository page
  Setup -> Manage Cloud Control -> Repository -> Repository Collection Performance -> Configure
• Or use EMDIAG
  – See the current threads per class
      $ repvfy show worker_tasks
  – Change the number of worker threads in the repository to a minimum of 2 per class
      $ repvfy send set_workers
421053.1: EMDIAG Master Index
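The rule for adding task worker threads (more than 75% of the hour spent busy plus a constant backlog) can be sketched as follows. The function name, its inputs, and the reading of "constant backlog" as "every backlog sample in the period is above zero" are assumptions for illustration; EM does not expose this function.

```python
from typing import Sequence


def should_add_worker(busy_seconds_last_hour: float, backlog_samples: Sequence[int]) -> bool:
    """Apply the slide's rule of thumb for one task class (Short or Long).

    busy_seconds_last_hour: time the workers of the class spent executing
    tasks during the last hour (hypothetical input).
    backlog_samples: backlog sizes sampled over the same hour (hypothetical).
    """
    busy_ratio = busy_seconds_last_hour / 3600.0
    # Assumption: a backlog is "constant" when it never drained to zero.
    constant_backlog = len(backlog_samples) > 0 and all(b > 0 for b in backlog_samples)
    return busy_ratio > 0.75 and constant_backlog


if __name__ == "__main__":
    print(should_add_worker(3000, [4, 6, 5]))
```

A backlog that drains to zero at some point in the hour is treated as a temporary spike, so no extra worker is suggested in that case.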
Tasks & Jobs – DBMS_SCHEDULER
How to control resources? - DBMS_SCHEDULER
12cR4
• Controlled by the job_queue_processes database parameter (Recommended value is 10)
• Check running jobs in the database on the 'Repository' page
  Setup -> Manage Cloud Control -> Repository -> Repository Scheduler Job Status region
• Edit the schedule for the daily jobs to run in off-peak time
4 Key Processes Target Data
Data Loading
What types of data are loaded into EM? / Examples
• Target Definition: Metadata describing the target, and the monitoring of the target
• Target Telemetry: Availability; Performance; Throughput
• Target State: Target state changes (up/down); Errors; Alerts and threshold violations
• Configuration data: Setup and configuration; Properties
Data Loading
How do we monitor it?
12cR4
• OMS loader metrics
  – Basic performance and capacity metrics
• Capacity graphs on the OMS page
  – Aggregate operational performance for all OMSs
  – Based on the metrics collected per OMS
Data Loading
How do we handle multiple simultaneous requests?
• A back-off request is generated when the OMS detects it cannot process ALL incoming requests from the Agents at a given point in time
  – The Agent is told to retry the upload request in 'x' seconds (progressive approach for repeat offenders, starting with 1 second and a maximum of 300 seconds)
• This is caused by an information flood:
  – Processing time for loading data too slow (database resource issue, performance issue, data processing problem)
  – Too much information generated by all the Agents (metric data, alerts / state changes, metadata)
  – A single Agent generating so much information that it prevents other Agents from uploading data
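The progressive back-off for repeat offenders can be illustrated with a short Python sketch. Only the starting delay (1 second) and the ceiling (300 seconds) come from the slide; the doubling between retries is an assumption made for illustration, and the function name is hypothetical.

```python
def backoff_delay(consecutive_backoffs: int, initial: int = 1, maximum: int = 300) -> int:
    """Return the retry delay in seconds for the n-th consecutive back-off.

    Assumption: the delay doubles with every repeated back-off. The slide
    only states a start of 1 second and a cap of 300 seconds.
    """
    if consecutive_backoffs < 1:
        raise ValueError("consecutive_backoffs must be >= 1")
    return min(initial * 2 ** (consecutive_backoffs - 1), maximum)


if __name__ == "__main__":
    # Delays an Agent would be given on its 1st to 10th consecutive back-off.
    print([backoff_delay(n) for n in range(1, 11)])
```

Under the doubling assumption the cap is reached on the tenth consecutive back-off, after which the delay stays at 300 seconds.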
Data Loading
How do we prioritize incoming requests?
• The 'Lifecycle Status' property of the target influences the behavior for loading data (incoming), notification processing (outgoing) and job dispatching (outgoing)
• The OMS and Repository know the lifecycle status of each target, and use it with every request to/from this target to prioritize the administration requests
• Change the property on the 'Target Properties' page
  Target homepage -> Target Setup -> Properties
Possible values:
  1 - Mission Critical (Highest)
  2 - Production
  3 - Stage (Or blank/no value)
  4 - Test / QA
  5 - Development (Lowest)
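The effect of the Lifecycle Status ranking can be shown with a small Python sketch that orders pending requests by the rank of their target. The dictionary keys, function names and request tuples are hypothetical; only the ranking itself (1 to 5, blank treated as Stage) comes from the slide.

```python
# Rank per lifecycle status as listed on the slide (1 = highest priority).
LIFECYCLE_RANK = {
    "Mission Critical": 1,
    "Production": 2,
    "Stage": 3,
    "Test": 4,
    "Development": 5,
}


def lifecycle_rank(status):
    """Return the rank for a status; blank or missing values count as Stage."""
    if not status:
        return LIFECYCLE_RANK["Stage"]
    return LIFECYCLE_RANK[status]


def prioritize(requests):
    """Sort (target_name, lifecycle_status) pairs, highest priority first."""
    return sorted(requests, key=lambda request: lifecycle_rank(request[1]))


if __name__ == "__main__":
    pending = [("dev-db", "Development"), ("web01", None),
               ("erp", "Mission Critical"), ("crm", "Production")]
    print(prioritize(pending))
```

Because Python's sort is stable, requests for targets with the same rank keep their arrival order.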
Data Loading
How to get more information? (1) - Loader performance and throughput
Information in the Console:
• Loader statistics are shown on the Health Overview page
  Setup -> Manage Cloud Control -> Health Overview
• A loader diagnostics report is available:
  Enterprise -> Reports -> Information Publisher Reports
  Report name = Loader Statistics
Things to check:
• Upload rate (Mb/sec) should be more-or-less constant
  A fluctuating upload rate indicates some kind of performance bottleneck
• The upload backlog can temporarily spike (when several collections from multiple Agents are getting uploaded at the same time)
• Back-off requests can also temporarily spike (for a very short period of time), but should always be low (near-zero) in general
Data Loading How to get more information? (2) - Data upload volume
12cR4
Information in the Console:
• Repository metrics page
  Setup -> Manage Cloud Control -> Repository -> Metrics tab
  Breakdown of the uploaded volume of metric data, alerts and errors
Things to check:
• Unusual amount of metric errors (bar much larger than for other target types)
• Unusual amount of alerts
Data Loading How to get more information? (3) - Logging and tracing
• OMS log and trace files (EM application tracing)
  – Look for module 'GCLoader' in <GC_INST>/em/EMGC_OMS1/sysman/log
      emoms_pbs.log : English OMS trace file
      emoms_pbs.trc : Native language OMS trace file
  – Enable Debug
      $ emctl set property -name log4j.category.oracle.sysman.emdrep.dbjava.loader -value "DEBUG" -module logging
• Repository (PL/SQL tracing)
  – Enable tracing (Modules are 'LOADER' and 'METRIC_LOAD')
      SQL> exec emdw_log.set_trace_level('LOADER',<level>);
    or:
      $ repvfy send start_trace -name "LOADER"
      $ repvfy send stop_trace -name "LOADER"
  – To generate the PL/SQL trace report:
      $ repvfy dump trace
• EMDIAG reports
    $ repvfy dump backlog (Backlog report for all information flows)
    $ repvfy dump loader_health (Health report for the loader system)
    $ repvfy dump metric_stats (Aggregated metric upload report)
421053.1: EMDIAG Master Index
Data Loading
How to control resources? (1) - Incoming data
12cR4
• Drill down from the top 25 target types, to get the upload volumes per target type
  Setup -> Manage Cloud Control -> Repository -> Metrics -> Drill down on a target type
• Alter the upload volume by changing the collection frequency for data and configuration metrics
  – Enable or disable a metric
    Prevent unwanted metrics from getting collected
  – Collection frequency
    Scale back frequency of non-critical metrics
  – Use 'Alerting Only' for state metrics
    Typically for metrics reporting a boolean-like output
Data Loading
How to control resources? (2) - Incoming state
12cR4
• Keep the number of collection errors to a minimum!
  Fix underlying problems, to guarantee proper monitoring
• Set proper metric thresholds (warning, critical, number of occurrences)
  What-if analysis is now available in 12cR4 to predict the number of incoming alerts based on threshold settings
  Reduce the number of alerts generated by setting the correct warning and critical thresholds
  Prevent unnecessary multiple alerts by setting the number of occurrences
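How the number of occurrences suppresses alerts for short spikes can be shown with a simplified Python model. This is not EM's actual evaluation logic: the function name is hypothetical, and the model assumes an alert is raised once per streak, at the moment a value has been above the threshold for the configured number of consecutive collections.

```python
def alert_indexes(values, threshold, occurrences):
    """Return the sample indexes at which an alert would be raised.

    Simplified model (an assumption): an alert fires when the value has been
    above the threshold for `occurrences` consecutive samples, and it fires
    only once per streak.
    """
    fired = []
    streak = 0
    for index, value in enumerate(values):
        if value > threshold:
            streak += 1
            if streak == occurrences:
                fired.append(index)
        else:
            streak = 0  # the violation cleared, start counting again
    return fired


if __name__ == "__main__":
    samples = [70, 95, 60, 96, 97, 98, 50]
    print(alert_indexes(samples, threshold=90, occurrences=1))
    print(alert_indexes(samples, threshold=90, occurrences=3))
```

With one occurrence, the single spike at the second sample raises its own alert; with three occurrences only the sustained violation does.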
4 Key Processes Notifications
Notifications
What is it? / Symptoms
• Alerts / Events (threshold violations as reported by the Agent or repository)
  Symptoms: Alerts not triggered; Notifications/Emails not sent
• Incidents / Problems (Software or Hardware faults, ADR and ASR incidents)
  Symptoms: Notifications/Emails not sent
• Job state changes (status of a job execution, or informational state messages about the execution of EM jobs)
  Symptoms: Events not triggered; Notifications/Emails not sent
• Informational Messages
  Messages received via My Oracle Support
Notifications How is a Notification generated?
• Alert: The Agent detects a threshold violation, generates an Alert and uploads it to the OMS
• Event: The OMS correlates the Alerts into unique events, to track the life-cycle of a threshold violation
• Incident / Problem: Incident rules are evaluated for each event update, to see if this event has to get turned into an Incident or a Problem. An Incident or Problem is created if so specified in the rule actions, and can be tracked via the Incident Manager in the UI
• If the incident rules specify so, a notification is put in the delivery queue, and delivered to the recipient
[Diagram labels: Agent, OMS, Repository, OMS; Metric Engine, Loader Sub-System, Event Processing, Incident Rules Promotion, Incident Rules Actions; Incident Rules; Automated]
Notifications
How do we monitor it?
• Event processing statistics captured per OMS (processing of incoming alerts/events, conversion into incidents/problems and notification delivery checks)
• Notification delivery throughput and performance by method (EMAIL, JAVA, OS Command, PL/SQL, SNMP, SNMPv3 and Helpdesk Connector)
• A separate Agent-side availability metric for the notification system (to allow Out-Of-Band notifications)
Notifications
How to get more information? (1) - Retrieving monitoring data
• Number of notifications will be largely driven by the number of incoming alerts
  Setup -> Manage Cloud Control -> Repository -> Metrics
• Notification delivery backlog shown on the 'Health Overview' page
  Setup -> Manage Cloud Control -> Health Overview
Notifications
How to get more information? (2) - Logging and tracing
• OMS log and trace files (EM application tracing)
  – Look for modules 'notification', 'Delivery' in <GC_INST>/em/EMGC_OMS1/sysman/log
      emoms_pbs.log : English OMS trace file
      emoms_pbs.trc : Native language OMS trace file
  – Enable Debug
      $ emctl set property -name log4j.category.oracle.sysman.em.notification -value "DEBUG" -module logging
• Repository (PL/SQL tracing)
  – Enable tracing (Modules are 'EM_NOTIFY' and 'NOTIFICATION')
      SQL> exec emdw_log.set_trace_level('EM_NOTIFY',<level>);
    or:
      $ repvfy send start_trace -name "EM_NOTIFY"
      $ repvfy send stop_trace -name "EM_NOTIFY"
    Levels:
      0 - Fine / Debug
      1 - Informational
      2 - Warning
      3 - Severe
      4 - No Tracing / OFF
  – To generate the PL/SQL trace report:
      $ repvfy dump trace
• EMDIAG reports
    $ repvfy dump backlog (Backlog report for all information flows)
    $ repvfy dump notif_health (Health report for notification system)
    $ repvfy send notif_dump (Diagnostic dump in OMS log files for notification system)
421053.1: EMDIAG Master Index
Notifications
How to control resources?
• Number of notification threads (rarely changed):
    $ emctl set property -name oracle.sysman.core.notification.max_delivery_threads -value 24
• Number of notification thread connections (rarely changed; should always be between max threads / 2 and max threads):
    $ emctl set property -name oracle.sysman.core.conn.maxConnForNotifications -value 25
• Limit the number of notifications sent out per minute (extremely rare; global parameters, set once for all OMSs):
    $ emctl set property -name oracle.sysman.core.notification.emails_per_minute -value 5000
    $ emctl set property -name oracle.sysman.core.notification.cmds_per_minute -value 5000
    $ emctl set property -name oracle.sysman.core.notification.traps_per_minute -value 5000
    $ emctl set property -name oracle.sysman.core.notification.plsql_per_minute -value 5000
• Before changing the values though, consider these two questions:
  – Why were the alerts generated in the first place? (wrong threshold used?)
  – Do I really want to notify people about all these metrics? (incident rule change?)
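The guideline that the number of notification connections should lie between max threads / 2 and max threads can be checked with a small Python helper. The function names are hypothetical, and rounding the lower bound up for odd thread counts is an assumption.

```python
def notification_connection_range(max_delivery_threads: int):
    """Return (lowest, highest) connection count allowed by the guideline.

    Assumption: for an odd thread count the lower bound is rounded up.
    """
    if max_delivery_threads < 1:
        raise ValueError("max_delivery_threads must be >= 1")
    lowest = (max_delivery_threads + 1) // 2
    return lowest, max_delivery_threads


def connections_within_guideline(connections: int, max_delivery_threads: int) -> bool:
    """Check a proposed connection count against the guideline."""
    lowest, highest = notification_connection_range(max_delivery_threads)
    return lowest <= connections <= highest


if __name__ == "__main__":
    print(notification_connection_range(24))
    print(connections_within_guideline(20, 24))
```

Running the check before changing either property keeps the two settings consistent with each other.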
Summary
To summarize
• Information flows are initiated by background processes (e.g. callbacks) and as a result of user-initiated activity (e.g. Console or EMCLI)
• Efficient interactions between all the EM components are the key to good performance
• Monitor performance and throughput on each tier
• Resources have to be balanced on all tiers (Agent, OMS and repository)
• Resource constraints on one tier can cause up-stream or down-stream bottlenecks
EM Resources
Getting additional information about Enterprise Manager
• Focus-On Document for Enterprise Manager @ OOW 2014: https://oracleus.activeevents.com/2014/connect/focusOnDoc.do?focusID=17776
• Oracle website: http://www.oracle.com/us/products/enterprise-manager/index.html
• Documentation: http://www.oracle.com/pls/em121/homepage
• Best Practices:
  – Best Practices Blog: https://blogs.oracle.com/EMMAA/
  – Operational Considerations and Troubleshooting: http://www.oracle.com/technetwork/database/availability/managing-em12c-1973055.pdf
  – White paper Sizing guidelines: http://www.oracle.com/technetwork/oem/framework-infra/em12c-sizing-1590739.pdf
Enterprise Manager Sessions – Monday, September 29th
ID Title Time Location
CON8217 Managing the Oracle Fusion Middleware Stack with Oracle Enterprise Manager 11:45 AM - 12:30 PM Moscone South - 200
CON8856 Oracle Enterprise Manager: The Complete Solution and Oracle’s Best Kept Secrets 11:45 AM - 12:30 PM Moscone South - 301
CON8449 Automatic Workload Repository Warehouse: Helping DBAs Make Sure History Never Repeats Itself 1:30 PM - 2:15 PM Moscone South - 104
CON8018 Best Practices from Oracle Cloud Delivered On-Premises with Oracle Enterprise Manager 1:30 PM - 2:15 PM Moscone South - 270
CON8225 Under the Hood: Diagnosing and Troubleshooting Oracle Enterprise Manager 12c Release 4 1:30 PM - 2:15 PM Moscone South - 302
CON8138 Beyond the Basics: Making the Most of Oracle Enterprise Manager 12c Monitoring 1:30 PM - 2:15 PM Moscone South - 304
CON8567 Best Practices for Maintaining and Supporting Oracle Enterprise Manager 2:45 PM - 3:30 PM Intercontinental - Grand Ballroom C
CON8178 Best Practices for Managing Oracle WebLogic Server with Oracle Enterprise Manager 12c 2:45 PM - 3:30 PM Moscone South - 200
CON8177 Private Database Clouds: A Standardized Service Catalog for Delivering DBaaS 2:45 PM - 3:30 PM Moscone South - 305
CON3178 Database Software Currency: Using Oracle Enterprise Manager 12c Provisioning and Patching 2:45 PM - 3:30 PM Moscone South - 301
Enterprise Manager Sessions – Monday, September 29th
ID Title Time Location
CON3111 Set Up Oracle Real User Experience Insight 12c to Monitor Oracle WebLogic Applications’ UX 4:00 PM - 4:45 PM Moscone South - 250
CON4102 SQL Tuning Without Trying 4:00 PM - 4:45 PM Moscone South - 104
CON8212 Oracle Management Pack Plus for Identity Management Best Practices and Lessons Learned 4:00 PM - 4:45 PM Moscone South - 200
CON7899 Oracle Data Integrator: Product Update and Future Strategy 4:00 PM - 4:45 PM Moscone South - 252
CON2043 Consolidating to Database as a Service with Oracle Real Application Testing 5:15 PM - 6:00 PM Moscone North - 130
CON5983 Full Visibility into Oracle WebLogic/Java Diagnostics with Oracle Enterprise Manager 12c 5:15 PM - 6:00 PM Moscone South - 200
CON2436 Why Database as a Service Will Be a Breakaway Technology at Société Générale 5:15 PM - 6:00 PM Moscone South - 301
CON7720 Advanced Management with Oracle Application Management Suite for Oracle E-Business Suite 5:15 PM - 6:00 PM Moscone West - 2018
CON8214 Maximizing Reliability of Oracle Business Intelligence Enterprise Edition and Oracle Exalytics 5:15 PM – 8:00 PM Moscone South – 262
Enterprise Manager Sessions – Tuesday, September 30th
ID Title Time Location
GEN8250 General Session: Drive the Future of Self-Service IT with Oracle Enterprise Manager Noon – 12:45 PM Moscone South - 103
CON5748 Create a DBaaS Catalog in an Hour with a PaaS-Ready Infrastructure Noon – 12:45 PM Moscone South - 301
CON2586 Best Practices for Deploying a DBaaS in a Private Cloud Model Noon – 12:45 PM Moscone South - 310
CON7830 Solving Data Skew in Oracle Business Applications with Oracle’s Flash-Optimized SAN Storage 3:45 PM - 4:30 PM Intercontinental - Intercontinental C
CON8452 Future Now: Advanced Database Management for Today’s DBA 3:45 PM - 4:30 PM Moscone South - 104
CON4045 Provision Oracle Fusion Middleware Faster with Oracle Enterprise Manager 12c 3:45 PM - 4:30 PM Moscone West - 3016
CON5875 Using Oracle Enterprise Manager to Deliver Multitenant DBaaS on Oracle Exadata: Lessons Learned 5:00 PM - 5:45 PM Moscone South - 301
CON8450 SQL (and PL/SQL) Tuning Experts Panel 5:00 PM - 5:45 PM Moscone South - 308
Enterprise Manager Sessions – Wednesday, October 1st
ID Title Time Location
CON4954 Oracle Infrastructure Systems Management with Oracle Enterprise Manager and Ops Center 10:15 AM - 11:00 AM Intercontinental - Telegraph Hill
CON7961 Streamline Utility IT Operations with Oracle Enterprise Manager 10:15 AM - 11:00 AM Marriott Marquis - Salon 14/15
CON8139 Database Time-Based Performance Tuning: From Theory to Practice 10:15 AM - 11:00 AM Moscone South - 104
CON8173 Management of Oracle SOA Suite and Oracle Service Bus with Oracle Enterprise Manager 12c 10:15 AM - 11:00 AM Moscone South - 200
CON8121 Databases to Oracle Exadata: The Saga Continues for Oracle Enterprise Manager–Based Patching 10:15 AM - 11:00 AM Moscone South - 300
CON3182 Deployment of Oracle Exadata and Oracle Exalogic Increases Business Efficiency 10:15 AM - 11:00 AM Moscone South - 310
CON8133 Behind the Scenes of Managing the Engineered Systems Showcase 11:30 AM – 12:15 PM Intercontinental - Telegraph Hill
Enterprise Manager Sessions – Wednesday, October 1st
ID Title Time Location
CON2927 Oracle Enterprise Manager 12c: Maximize ROI via a Single Pane of Glass Across a Data Center 11:30 AM - 12:15 PM Moscone South - 200
CON8247 DBA’s New Best Friend for Mistake-Free Administration: Oracle Real Application Testing 11:30 AM - 12:15 PM Moscone South - 301
CON8245 Tips for Successful Oracle Exadata Management with Oracle Enterprise Manager 12c 11:30 AM - 12:15 PM Moscone South - 303
CON8451 Next-Generation Testing with Oracle Application Testing Suite 11:30 AM - 12:15 PM Moscone West - 3002
CON8091 Middleware as a Service: Converged Solution for Administrators and DevOps 12:45 PM - 1:30 PM Moscone South - 301
CON8134 Zero to Manageability in One Hour: Build a Solid Foundation for Oracle Enterprise Manager 12c 12:45 PM - 1:30 PM Moscone South - 303
CON5489 Deploy Oracle Fusion Middleware as a Service (MWaaS) on a Shared-Services Cloud 12:45 PM - 1:30 PM Moscone South - 309
Enterprise Manager Sessions – Wednesday, October 1st
ID Title Time Location
CON8185 Use Oracle Enterprise Manager in a Box to Easily Manage the Enterprise 2:00 PM - 2:45 PM Moscone North - 131
CON8130 Deployment Best Practices for Private Cloud: Fast Track to DBaaS and MWaaS 2:00 PM - 2:45 PM Moscone South - 301
CON8248 Trouble-Free Upgrade to Oracle Database 12c with Oracle Real Application Testing 2:00 PM - 2:45 PM Moscone South - 303
CON8016 DBaaS 2.0: Rapid Provisioning, Richer Services, Integrated Testing, and More 3:30 PM – 4:15 PM Moscone South - 301
CON7726 Oracle Exadata Database Machine Administration and Monitoring Made Easy 4:45 PM – 5:30 PM Moscone South - 104
CON8260 Database as a Service (DBaaS) Cookbook: Strategies and Tips for Successful Deployment 4:45 PM – 5:30 PM Moscone South - 301
Enterprise Manager Sessions – Thursday, October 2nd
ID Title Time Location
CON2561 You’ve Got It; Flaunt It: Oracle Enterprise Manager Cloud Control Extensibility 9:30 AM - 10:15 AM Marriott Marquis - Golden Gate C3
CON8273 Management and Monitoring of Oracle Tuxedo: Integrated, Automated 9:30 AM - 10:15 AM Marriott Marquis - Salon 14/15
CON7940 Building an On-Premises Java Cloud: Oracle WebLogic Server and Oracle Enterprise Manager 9:30 AM - 10:15 AM Moscone South - 200
CON8243 Oracle Enterprise Manager 12c Security Cookbook: Best Practices for Large Data Centers 9:30 AM - 10:15 AM Moscone South - 300
CON3028 Enterprise Architecture Approach to Developing a DBaaS Private Cloud at Boeing 9:30 AM - 10:15 AM Moscone South - 301
CON8184 What’s New and Best Practices for Oracle Data Masking and Subsetting 9:30 AM - 10:15 AM Moscone South - 306
CON5451 Highly Available, Highly Scalable: Oracle Enterprise Manager 12c for Large Enterprises 10:45 AM - 11:30 AM Marriott Marquis - Golden Gate C3
CON4114 Advanced Diagnostics and Monitoring with Oracle Enterprise Manager 12c 10:45 AM - 11:30 AM Moscone South - 301
Enterprise Manager Sessions – Thursday, October 2nd
ID Title Time Location
CON2699 Oracle Exadata’s Exachk and Oracle Enterprise Manager 12c: Keeping Up with Oracle Exadata 10:45 AM - 11:30 AM Moscone South - 310
CON4448 PDBaaS with Oracle Enterprise Manager 12c 12:00 PM - 12:45 PM Marriott Marquis - Golden Gate C3
CON10038 Customer Panel: Private Cloud Consolidation, Standardization, and Automation 12:00 PM - 12:45 PM Moscone South - 301
CON8244 Manage the Manager: Tips on How to Best Manage Oracle Enterprise Manager 12c 1:15 PM - 2:00 PM Marriott Marquis - Golden Gate C3
CON8015 Security Compliance and Data Governance: Dual Problems, Single Solution 1:15 PM - 2:00 PM Moscone South - 301
CON7718 Managing and Monitoring Oracle GoldenGate 1:15 PM - 2:00 PM Moscone South - 302
CON7697 Oracle Enterprise Manager 12c Cloud Control for Managing Oracle E-Business Suite 12.2 1:15 PM - 2:00 PM Moscone West - 2018
CON6083 Real-World Operation Excellence with Oracle Enterprise Manager 12c: Taking It to the Next Level 2:30 PM - 3:15 PM Marriott Marquis - Golden Gate C3
CON8493 Odyssey of DBaaS: A UBS Story 2:30 PM - 3:15 PM Moscone South - 301
Enterprise Manager Demos
ID Title Location Area Demopod #
3943 Application and Infrastructure Testing Moscone West, Lower Left Applications WLL-020
3962 Automatic Application and SQL Tuning Moscone South, Left Database SLD-106
3946 Automatic Fault Diagnostics Moscone South, Left Database SLD-101
3963 Automatic Performance Diagnostics Moscone South, Left Database SLD-103
3944 Automatic Workload Repository Warehouse Moscone South, Left Database SLD-111
3948 Automation and Storage Savings with Database as a Service and Snap Clone Moscone South, Left Database SLD-102
3921 Complete Data Center Monitoring with Oracle Enterprise Manager 12c Moscone South, Left Database SLD-112
3947 Complete Database Lifecycle Management Moscone South, Left Database SLD-107
3881 End User Monitoring and Diagnostics with Oracle Enterprise Manager 12c Moscone South, Left Middleware SLM-109
4028 Identity Management Monitoring with Enterprise Manager 12c Moscone South, Left Middleware SLM-141
Enterprise Manager Demos
ID Title Location Area Demopod #
3928 Middleware PaaS in Private Cloud with Oracle Enterprise Manager 12c Moscone South, Left Middleware SLM-111
3925 Oracle Applications and Business Intelligence Management with Oracle Enterprise Manager 12c Moscone West, Lower Left Applications WLL-023
3966 Oracle Enterprise Manager Cloud Control 12c Overview Moscone South, Left Database SLD-105
3949 Oracle SuperCluster and Oracle VM for SPARC Management with Oracle Enterprise Manager Ops Center 12c Moscone South, Center Systems, Servers, Virtualization SC-158
3942 Oracle WebLogic Server and Oracle Coherence Management with Oracle Enterprise Manager 12c Moscone South, Left Middleware SLM-107
3945 Risk-Free Database Administration with SQL Performance Analyzer and Database Replay Moscone South, Left Database SLD-108
3926 SOA and Service Bus Management with Oracle Enterprise Manager 12c Moscone South, Left Middleware SLM-140
Enterprise Manager One Hour Hands-On Labs Monday 9/29 at Hotel Nikko
ID Title Time Room
HOL9508 Oracle Enterprise Manager Database as a Service: Automation for Broader Cloud Services 01:15 – 02:15 Hotel Nikko - Carmel
HOL9529 Rapidly Mass-Deploy Oracle Fusion Middleware with Oracle Enterprise Manager 12c Provisioning 02:45 – 03:45 Hotel Nikko - Nikko Ballroom I
HOL9532 Achieving Standardization with Oracle Enterprise Manager Database Lifecycle Management 04:15 – 05:15 Hotel Nikko - Carmel
HOL9530 Risk-Free Database Consolidation for Private Clouds with Oracle Real Application Testing 05:45 – 06:45 Hotel Nikko - Carmel
Enterprise Manager One Hour Hands-On Labs Tuesday 9/30 at Hotel Nikko
ID Title Time Room
HOL9528 Private Cloud Self-Service, Oracle Fusion Middleware PaaS with Oracle Enterprise Manager 12c 03:45 – 04:45 Nikko Ballroom I
HOL9509 Oracle Enterprise Manager 12c: Oracle WebLogic Server and SOA Diagnostics and Administration 05:15 – 06:15 Nikko Ballroom I
HOL9508 Oracle Enterprise Manager Database as a Service: Automation for Broader Cloud Services 05:15 – 06:15 Carmel
HOL9484 Maximizing Oracle Database 12c Performance with Oracle Enterprise Manager 06:45 – 07:45 Carmel
Enterprise Manager One Hour Hands-On Labs Wednesday 10/1 at Hotel Nikko
ID Title Time Room
HOL9484 Maximizing Oracle Database 12c Performance with Oracle Enterprise Manager 02:45 – 03:45 Carmel
HOL9532 Achieving Standardization with Oracle Enterprise Manager Database Lifecycle Management 04:15 – 05:15 Carmel
Enterprise Manager One Hour Hands-On Labs Thursday 10/2 at Hotel Nikko
ID Title Time Room
HOL9484 Maximizing Oracle Database 12c Performance with Oracle Enterprise Manager 10:00 – 11:00 Carmel
HOL9509 Oracle Enterprise Manager 12c: Oracle WebLogic Server and SOA Diagnostics and Administration 11:30 – 12:30 Nikko Ballroom I
HOL9528 Private Cloud Self-Service, Oracle Fusion Middleware PaaS with Oracle Enterprise Manager 12c 01:00 – 02:00 Nikko Ballroom I
The TLA library…
ADDM Automatic Database Diagnostic Monitor
ADR Automatic Diagnostic Repository
ASH Active Session History
ASR Automatic Service Request
AWR Automatic Workload Repository
BI Business Intelligence
BIP BI Publisher
CLI Command-line Interface
CPU Central Processing Unit
CRS Cluster Ready Services
DBMS Database Management System
EM Enterprise Manager
GC Grid Control
HTTP Hypertext Transfer Protocol
IO Input / Output
IP Information Publisher
IT Information Technology
JDK Java Development Kit
JVM Java Virtual Machine
JVMD JVM Diagnostics
MDA Middleware Diagnostic Advisor
MTM Monitor The Monitor
NAS Network Attached Storage
OMA Oracle Management Agent
OMR Oracle Management Repository
OMS Oracle Management Server
OOB Out-of-Band
OS Operating System
OUI Oracle Universal Installer
PBS Platform Background Services
PLSQL Procedural Language SQL
QA Quality Assurance
RAC Real Application Clusters
RCA Root Cause Analysis
RMAN Recovery Manager
SAN Storage Area Network
SLA Service Level Agreement
SNMP Simple Network Management Protocol
SQL Structured Query Language
UI User Interface
URL Uniform Resource Locator
WLS WebLogic Server