Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Oracle1
Debugging and Configuration Best Practices for Oracle Linux
Greg Marsden, Senior Director, Linux and Virtualization
Agenda
Key Linux Tips and Tricks
Common Issues
Diagnostic Tools and Use Cases
Do it Yourself Debugging
Ksplice in the Datacenter
Tips and Tricks: Key Points
Key Linux Tips and Tricks
Kernel Tuning: Oracle Preinstall RPM
Diagnostic Tools: kdump and oswatcher
Best Performance and Reliability
Oracle Preinstall Package and Templates
Per-Product Preconfiguration Package: oracle-rdbms-server-11gR2-preinstall-1.0.6.el6.x86_64.rpm
– Based on Validated Configurations' real-world stack testing
– Includes product release notes recommendations
– Installs necessary dependencies and kernel tuning parameters
– Individual for each Oracle product
Oracle VM Template for Oracle RDBMS Server
– Production-ready, installed virtual machine templates from eDelivery
Configure Oracle Products Automatically
System Diagnostics
oswatcher utility: Install and leave running to collect over-time information about system activity.
serial console or netconsole to remotely monitor system activity in the case of a disk, network or system outage.
kexec crash collection utilities to gather forensic information from malfunctioning systems.
Critical Diagnostics Software should run at all times
Tips and Tricks: Memory Management
Anonymous DBA, Oracle User
“Help! My system has 250 GB of RAM and I’m running out of memory! My consultants are telling me we can’t scale with a 120 GB SGA and this many connections, but I can’t fit any more RAM in this system.”
• Symptom: Out of Memory Errors, slow performance. Detected via oswatcher.
• Cause: SGA mapped with 4 KB pages instead of 2 MB hugepages
• Solution: Use Hugepages
• Hugepages are faster.
• Hugepages are “pinned” and won’t be swapped.
Issue: Not using Hugepages
Frequent Issue
I found the following:
13:09:19  57591060k free   159 client connections
13:26:01  26189944k free  1826 client connections
13:32:31  15547144k free  2024 client connections
13:57:00    467048k free  2037 client connections (here is where we begin swapping memory to disk)
I also found this:
zzz ***Fri Aug 9 13:23:22 PDT 2011
MemTotal: 250 GB
MemFree: 464 MB
PageTables: 112 GB
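To see why PageTables can reach 112 GB, here is a back-of-the-envelope sketch. It assumes (not stated in the slides) x86_64 with 8-byte page-table entries, that each connection maps and touches the entire 120 GB SGA, and it ignores upper-level page-table pages, so the multi-connection figure is a worst-case upper bound.

```python
# Back-of-the-envelope estimate of last-level page-table memory for an SGA.
# Assumptions (illustrative, not from the slides): x86_64 with 8-byte
# page-table entries, every connection maps and touches the whole SGA,
# and upper-level page-table pages are ignored.

KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB
ENTRY_BYTES = 8  # one x86_64 page-table entry


def page_table_bytes(sga_bytes: int, page_bytes: int, processes: int) -> int:
    """Page tables are per process, so each connection needs its own set."""
    entries_per_process = -(-sga_bytes // page_bytes)  # ceiling division
    return entries_per_process * ENTRY_BYTES * processes


if __name__ == "__main__":
    sga = 120 * GiB  # SGA size from the DBA quote
    per_proc_4k = page_table_bytes(sga, 4 * KiB, 1)
    per_proc_2m = page_table_bytes(sga, 2 * MiB, 1)
    worst_case = page_table_bytes(sga, 4 * KiB, 2037)
    print(f"4 KB pages: {per_proc_4k // MiB} MiB of page tables per process")
    print(f"2 MB pages: {per_proc_2m // KiB} KiB of page tables per process")
    print(f"2037 connections, 4 KB pages, worst case: {worst_case / GiB:.0f} GiB")
```

This prints 240 MiB versus 480 KiB per process, and about 477 GiB for 2037 connections in the worst case. Real connections touch only part of the SGA, which is consistent with the lower 112 GB observed, while the 2 MB figure shows how much hugepages shrink the per-connection cost.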
Performance with Hugepages
With Hugepages
• 200 connections to a 12.9 GB SGA
• Before DB startup, PageTables: 7748 kB
• After DB startup, PageTables: 21288 kB
• After 200 PQ slaves run a query, PageTables: 80564 kB
• Time to complete: 00:00:18.77
Without Hugepages
• 200 connections to a 12.9 GB SGA
• Before DB startup, PageTables: 7400 kB
• After DB startup, PageTables: 652900 kB
• After 200 PQ slaves run a query, PageTables: 6189248 kB
• Time to complete: 00:10:23.60
Hugepages and Transparent Hugepages
“Regular” Hugepages [Ref. Doc ID 749851.1]
– Reduce the footprint of individual Oracle database connections.
– Increase performance and scalability.
– Require manual tuning after SGA changes, and do not work with AMM.
Transparent Hugepages
– Do not help the RDBMS use case.
– Auto-allocate hugepages for large memory allocations. Great for Java/middleware/applications.
– New for UEK and OL6!
Performance for DB and Middleware Applications
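Because regular hugepages need manual tuning whenever the SGA changes, the arithmetic is worth scripting. This is a minimal sketch that assumes the SGA is the only consumer of hugepages; the 12 GiB SGA and the sample meminfo text are hypothetical inputs, and on a real system you would read /proc/meminfo instead. Remember that this applies only without AMM (MEMORY_TARGET).

```python
# Sketch: size vm.nr_hugepages for a given SGA.
# Assumption: the SGA is the only user of hugepages on this system.


def hugepagesize_kb(meminfo_text: str) -> int:
    """Extract the 'Hugepagesize:' value (kB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1])
    raise ValueError("Hugepagesize not found; kernel may lack hugepage support")


def nr_hugepages_for_sga(sga_bytes: int, hugepage_kb: int) -> int:
    """Smallest number of hugepages that covers the whole SGA."""
    page_bytes = hugepage_kb * 1024
    return -(-sga_bytes // page_bytes)  # ceiling division


if __name__ == "__main__":
    sample = "Hugepagesize:       2048 kB\n"  # stand-in for /proc/meminfo
    size_kb = hugepagesize_kb(sample)
    sga_bytes = 12 * 1024 ** 3  # hypothetical 12 GiB SGA
    print(f"vm.nr_hugepages = {nr_hugepages_for_sga(sga_bytes, size_kb)}")
```

With 2 MB hugepages, a 12 GiB SGA needs 6144 pages; recompute after every SGA resize.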
Issue: Slow Performance at 95% RAM
Symptoms
– System is swapping and shows low free/cached memory
– Reduced system performance
Cause: Usually the kernel is hogging CPU in try_to_free_pages, reclaiming from the pagecache and inactive lists.
Solution
– Ensure you are running a kernel with the shrink_zone patch: UEK, OL6, or OL5 with the fix for BUG 6086839.
– If the system is swapping but performance is OK, get more RAM.
OL5-specific issue with large memory allocations
Issue: System Swapping with Free Memory
Symptom: System starts to swap while reporting free RAM
– vmstat reports free memory.
– dmesg has “order 5 allocation failed” messages.
– If allocations below order 5 are failing, there are larger issues.
Cause: Memory fragmentation. On NUMA systems, caused by fragmentation of node-local memory used for kernel allocations.
Solutions:
– Disable NUMA
– Decrease MTU size if using jumbo frames
NUMA-Specific Problem
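One way to confirm the fragmentation diagnosis is /proc/buddyinfo, which lists free blocks per node and zone by order. The sketch below flags zones with no free block of order 5 or higher. The sample text is illustrative, not real data, and the order-5 threshold simply mirrors the dmesg message above.

```python
# Sketch: spot fragmentation by reading /proc/buddyinfo.
# Each row lists, per NUMA node and zone, how many free blocks exist of
# order 0, 1, 2, ... (a block of order n is 2**n contiguous pages).


def parse_buddyinfo(text: str):
    """Return a list of (node, zone, [free blocks per order]) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        left, _, right = line.partition("zone")
        node = int(left.replace("Node", "").strip().rstrip(","))
        fields = right.split()
        rows.append((node, fields[0], [int(n) for n in fields[1:]]))
    return rows


def starved_zones(rows, order: int = 5):
    """Zones with no free block of at least `order`; such allocations may fail."""
    return [(node, zone) for node, zone, counts in rows if sum(counts[order:]) == 0]


if __name__ == "__main__":
    # Illustrative sample only; on a live system read /proc/buddyinfo.
    sample = (
        "Node 0, zone   Normal   9021  3110   802    40     3     0     0     0     0     0     0\n"
        "Node 1, zone   Normal   1046   527   128    36    17     5    26    40    13    16    94\n"
    )
    print(starved_zones(parse_buddyinfo(sample)))  # node 0 has nothing >= order 5
```

A node that shows plenty of order-0 pages but nothing at order 5 and above matches the "swapping with free memory" symptom.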
Memory Accounting
Effectively free memory = Cached + Free
– All free space on Linux is used for pagecache.
– This behavior cannot be disabled.
Process shared memory is hard to find in Linux
– RSS double counts shared memory; Total includes unmapped pages.
– Use /proc/<pid>/smaps to see real process memory usage.
cgroups: new features in the latest kernels let you restrict RAM
– Useful to throttle pagecache use by backup processes
What is Linux Doing With My Memory?
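To illustrate the smaps advice, this sketch sums the Rss and Pss fields across all mappings of a process. Pss (proportional set size) divides each shared page among the processes mapping it, so it does not double count the SGA the way RSS does. Assumption: the running kernel exposes Pss in smaps; older kernels may not.

```python
# Sketch: compare Rss with Pss from /proc/<pid>/smaps.
# Assumption: the kernel exposes the Pss field (older kernels may not).


def smaps_totals(smaps_text: str) -> dict:
    """Sum the Rss and Pss fields (kB) across all mappings in an smaps file."""
    totals = {"Rss": 0, "Pss": 0}
    for line in smaps_text.splitlines():
        key, _, rest = line.partition(":")
        if key in totals:  # mapping header lines and other fields are ignored
            totals[key] += int(rest.split()[0])
    return totals


def smaps_for_pid(pid: int) -> dict:
    """Read /proc/<pid>/smaps for a live process (needs permission to read it)."""
    with open(f"/proc/{pid}/smaps") as f:
        return smaps_totals(f.read())
```

For example, smaps_for_pid(1234) on a hypothetical Oracle server process id: a large gap between Rss and Pss indicates memory that is mostly shared.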
Swap is a highly contentious topic on Linux
– Benefit: Allows “room to grow” for inadequately sized systems.
– Drawback: Much slower than memory access, often makes problems worse.
Recommendation: Use swap, but ensure IO to swap disk is kept close to zero.
Swap: What is it good for?
Tuning Swap Space
vmstat output
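To check the recommendation that swap IO stay close to zero, you can sample the cumulative pswpin/pswpout page counters in /proc/vmstat. This is similar in spirit to vmstat's si/so columns, but the result is in pages per second. The 5-second interval is an arbitrary example.

```python
import time

# Sketch: measure swap IO from the cumulative pswpin/pswpout page counters
# in /proc/vmstat (similar in spirit to vmstat's si/so, but in pages/second).


def swap_counters(vmstat_text: str) -> dict:
    """Extract cumulative pages swapped in/out from /proc/vmstat contents."""
    counters = {}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name in ("pswpin", "pswpout"):
            counters[name] = int(value)
    return counters


def swap_pages_per_second(before: dict, after: dict, seconds: float) -> dict:
    """Rate of swap-in/swap-out between two samples, in pages per second."""
    return {name: (after[name] - before[name]) / seconds for name in before}


def sample_swap_rate(interval: float = 5.0, path: str = "/proc/vmstat") -> dict:
    """Take two samples `interval` seconds apart on a live Linux system."""
    with open(path) as f:
        first = swap_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = swap_counters(f.read())
    return swap_pages_per_second(first, second, interval)
```

A sustained non-zero rate means the system is actively paging rather than just holding idle pages in swap.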
Tips and Tricks: General Recommendations
Other Configuration Trouble
Use UUID to Mount System Disks
– Symptom: System panics after upgrade
– Cause: New hardware, drivers, or kernel reorders device discovery
– Caution: May not work with LVM snapshots
NFS Locks Not Released on Reboot
– Cause: kernel and DNS have different hostnames
– Solution: ensure the kernel hostname is fully qualified. See BUG 3156942.
Cluster Reboots with OCFS2
– Cause: Network or disk outages can cause OCFS2 to fence nodes
– Solution: Ensure OCFS2 timeouts are greater than storage/network failover timeouts. Defaults may be too short for the o2cb heartbeat.
Assorted Common Configuration Issues
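A minimal sketch for the UUID tip: map device nodes to filesystem UUIDs through the udev-maintained /dev/disk/by-uuid symlinks (`blkid` reports the same information) and emit UUID-based fstab entries. The mount point and filesystem type in the usage note are hypothetical.

```python
import os

# Sketch: build UUID-based /etc/fstab entries so mounts survive changes in
# device discovery order. Uses the udev-maintained /dev/disk/by-uuid symlinks.


def uuids_by_device(by_uuid_dir: str = "/dev/disk/by-uuid") -> dict:
    """Map each resolved device node (e.g. /dev/sda1) to its filesystem UUID."""
    return {
        os.path.realpath(os.path.join(by_uuid_dir, uuid)): uuid
        for uuid in os.listdir(by_uuid_dir)
    }


def fstab_line(uuid: str, mountpoint: str, fstype: str,
               options: str = "defaults", dump: int = 1, passno: int = 2) -> str:
    """One fstab entry that names the filesystem by UUID, not by device name.

    passno defaults to 2 (non-root filesystems); use 1 for the root filesystem.
    """
    return f"UUID={uuid}\t{mountpoint}\t{fstype}\t{options}\t{dump} {passno}"
```

For example, fstab_line(uuids_by_device()["/dev/sdb1"], "/u01", "ext4"). The LVM caution applies because a snapshot is a block-level copy, so it carries the same filesystem UUID as its origin.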
System Scheduling
– vm.swappiness: 100 forces aggressive swapping; 0 is insurance against a backup process hogging all system memory.
Network Protocol Buffers
– net.core.wmem_default/max: buffer size for outgoing network packets.
– net.core.rmem_default/max: buffer size for incoming network packets. If these values are set too small, the system may discard TCP packets.
Memory Management
– vm.dirty_ratio: lower values encourage frequent pagecache writeback.
– vm.lowmem_reserve_ratio / vm.min_free_kbytes: reserve physical memory for kernel allocations.
Performance Tuning Kernel Parameters
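Before changing any of these parameters, record their current values. This sketch reads them directly from /proc/sys, which is equivalent to running `sysctl <name>`. The simple name-to-path mapping does not handle names whose components contain dots, such as some VLAN interface names.

```python
from pathlib import Path

# Sketch: read the tuning parameters listed above straight from /proc/sys.
# Simple mapping: names whose components contain dots are not handled.

PARAMS = [
    "vm.swappiness",
    "net.core.wmem_default", "net.core.wmem_max",
    "net.core.rmem_default", "net.core.rmem_max",
    "vm.dirty_ratio",
    "vm.lowmem_reserve_ratio", "vm.min_free_kbytes",
]


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """vm.swappiness -> /proc/sys/vm/swappiness"""
    return Path(root, *name.split("."))


def read_sysctls(names=PARAMS, root: str = "/proc/sys") -> dict:
    """Current value of each parameter, or None if it is not available."""
    values = {}
    for name in names:
        try:
            values[name] = sysctl_path(name, root).read_text().strip()
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_sysctls().items():
        print(f"{name} = {value}")
```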
Oracle finds and fixes critical bugs in Enterprise Linux.
– Red Hat Compatible Kernel vs. Oracle-Modified kernel
– Install the Compatible Kernel for bug-for-bug compatibility with RHEL
Patches required for correct Oracle product operation
Bugs Fixed in Enterprise Linux
Oracle Linux 5 Bug Fixes
Oracle Patches Linux
Staying up to date with your Linux distribution is very important.
Bug-Fixed Oracle Linux Kernel
UEK: Unbreakable Enterprise Kernel
– Top-performing kernel. World record TPC-C benchmark.
– Provides OL6 performance on OL5 systems.
Backporting of fixes is a temporary solution, not a permanent one.
– Always plan to update or ksplice to the latest kernel version.
Specially Tuned Linux Kernels for Customer Requirements
Get the latest in performance and features from Linux, tested by Oracle.
– All-new kernel, optimized for Oracle.
Stay closer to mainline Linux with patches to improve performance for Oracle workloads.
– All patches are open source and submitted to mainline Linux
– Patches provided via RPM and via ksplice
– World record TPC-C benchmark, March 2012.
UEK: Modern Linux for Oracle
Fast, Modern, Reliable
Tips and Tricks: Diagnostics
Issue: Post-Event Diagnostics
Hard hangs
– Panic, OOPS, nmi_watchdog
– “Spontaneous reboot”
Brownouts
– Performance degradation
Cluster scenarios
– Network or disk may have gone away, triggering the fence
– Need to maintain crash data in the event of loss of network/disk
– Ensure timeouts (like OCFS2) are set correctly
What to do after a crash or hang?
Continuous OS Logging
– oswatcher continuously collects timestamped snapshots of system commands: ps, top, slabinfo, meminfo, vmstat, mpstat, iostat.
– Other tools can be employed as well, like sar or collectl.
Two Kinds of Critical OS Logging
Panic/Hang Event Logging
– A serial console or netconsole should always be set up for any production system. No exceptions.
– Consoles also preserve sysrq data.
– kdump: system memory image collection.
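To make the idea of continuous logging concrete, here is a minimal oswatcher-style collector. It is a sketch, not oswatcher itself: it appends timestamped command output to a log so that history exists after an incident. The command list and interval are arbitrary example choices, and the "zzz ***" marker imitates the snapshot header in the oswatcher excerpt earlier in this deck.

```python
import subprocess
import time
from datetime import datetime

# Minimal oswatcher-style collector (a sketch, not oswatcher itself).
# The command list and interval are example choices.

COMMANDS = [["vmstat", "1", "3"], ["cat", "/proc/meminfo"], ["ps", "aux"]]


def snapshot(commands=COMMANDS) -> str:
    """One timestamped block containing the output of each command."""
    parts = ["zzz ***" + datetime.now().strftime("%a %b %d %H:%M:%S %Y")]
    for cmd in commands:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
            parts.append(result.stdout)
        except (OSError, subprocess.TimeoutExpired) as exc:
            parts.append(f"<{' '.join(cmd)} failed: {exc}>")
    return "\n".join(parts)


def collect(logfile: str, interval: float = 30.0, count=None) -> None:
    """Append a snapshot every `interval` seconds; run forever if count is None."""
    taken = 0
    while count is None or taken < count:
        with open(logfile, "a") as log:
            log.write(snapshot() + "\n")
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
```

In production, prefer the real oswatcher (or sar/collectl), which also handles log rotation. This sketch only shows the shape of the data they keep.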
Console Logs
Kernel messages may not be available after a crash
– Serial consoles are proven technology for preserving console output
How to capture serial output
– Reliable ways to capture serial output:
  ILOM virtual console
  Serial-over-LAN (BIOS config)
  Inexpensive DB9-USB converter or serial concentrator
– Unreliable ways to capture serial output:
  Physically attached terminal with ‘setterm -blank 0’ and system not configured to reboot
  netconsole (can be difficult to configure, and subject to network outages)
Things to check
– Ensure the baud rate is high enough (not 9600 baud)
– For a virtual console, ensure console history is set up to capture a large amount of output
Finding Faults if Disk or Networking Fail
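The baud-rate check can be automated by inspecting the console= arguments on the kernel command line (/proc/cmdline). The format is console=ttyS<n>[,<baud><parity><bits>], and when no options are given the kernel defaults to 9600n8. The 115200 threshold below is an example choice.

```python
# Sketch: find serial consoles on the kernel command line and flag slow ones.
# Without options, a ttyS console defaults to 9600n8.


def serial_consoles(cmdline: str):
    """Return (device, baud) for every console=ttyS... argument."""
    consoles = []
    for arg in cmdline.split():
        if not arg.startswith("console="):
            continue
        device, _, options = arg[len("console="):].partition(",")
        if not device.startswith("ttyS"):
            continue  # e.g. console=tty0 is the local VGA console
        digits = ""
        for ch in options:
            if not ch.isdigit():
                break
            digits += ch
        consoles.append((device, int(digits) if digits else 9600))
    return consoles


def slow_consoles(cmdline: str, minimum: int = 115200):
    """Serial consoles below `minimum` baud (115200 is an example threshold)."""
    return [c for c in serial_consoles(cmdline) if c[1] < minimum]
```

On a live system: slow_consoles(open("/proc/cmdline").read()).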
Keyboard and Console Diagnostics
SysRq Key Combinations for Diagnostics
Magic SysRq Key
– M: dump system memory statistics
– P, W: dump the stack for all processors
– T: dump the kernel stack trace for all processes
– C: immediately cause a system crash
– S … U … B: emergency sync all disks, remount disks read-only, reboot
How to Invoke Magic SysRq…
– Console: Alt + SysRq + <cmd>
– Serial console: <Break> <cmd>
– Command line: echo t > /proc/sysrq-trigger
– Oracle VM dom0: xm sysrq <domain ID> <cmd>
Ensure kernel.sysrq = 1
Some of these operations (like stack trace) dump a lot of data (1 MB or more!).
These operations take full priority in the kernel. Do not run them in your monitoring scripts; use them carefully!
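For one-off manual troubleshooting from the command line, a small wrapper can refuse the destructive keys (C, S, U, B) so that a typo cannot crash or reboot the box. This is a sketch: the allowlist is an assumption about what counts as "safe", the descriptions paraphrase the kernel's sysrq documentation, and writing the trigger file requires root. Given the warning above, do not call it from monitoring loops.

```python
# Sketch: trigger only non-destructive SysRq dumps during manual debugging.
# Destructive keys (C, S, U, B) are deliberately refused. Requires root.

SAFE_SYSRQ = {
    "m": "dump memory statistics",
    "p": "dump current registers and flags",
    "w": "dump blocked (uninterruptible) tasks",
    "t": "dump kernel stack traces for all processes",
}


def sysrq(key: str, trigger: str = "/proc/sysrq-trigger") -> None:
    """Write one non-destructive SysRq command; output goes to the kernel log."""
    key = key.lower()
    if key not in SAFE_SYSRQ:
        raise ValueError(f"refusing SysRq '{key}': not in the read-only allowlist")
    with open(trigger, "w") as f:
        f.write(key)
```

After sysrq("m"), read the result with dmesg or from the serial console log.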
Enable Keyboard- and Console-based Debugging
– kernel.sysrq = 1: always have this set to enable debug commands.
System-wide Events
– panic_on_oom: panic on an out-of-memory condition (the alternative is to kill the high-memory process).
– panic_on_oops: panic on system problems (if off, some modules may survive an oops, but system state is inconsistent).
Per-Process Events
– hung_task_timeout_secs: warn if a process is not scheduled for (timeout) seconds. Can cause a lot of log messages; not usually useful.
– hung_task_panic: cause a stack trace and system panic if the timeout is hit. Can be useful for debugging; not good to set by default.
Diagnostic/Destructive Kernel Parameters
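These settings are usually persisted in /etc/sysctl.conf and applied with `sysctl -p`. This sketch renders them as sysctl.conf lines. Only kernel.sysrq = 1 follows the slide directly; the other values are example choices to decide per system, not recommendations.

```python
# Sketch: render the diagnostic settings discussed above as sysctl.conf lines.
# kernel.sysrq = 1 follows the slide; the other values are example choices.

DIAGNOSTIC_SETTINGS = {
    "kernel.sysrq": 1,                     # slide: always set
    "kernel.panic_on_oops": 1,             # example: panic on oops so kdump can capture it
    "vm.panic_on_oom": 0,                  # example: let the OOM killer act instead
    "kernel.hung_task_timeout_secs": 120,  # example warning threshold (seconds)
    "kernel.hung_task_panic": 0,           # slide: not good to set by default
}


def render_sysctl_conf(settings: dict) -> str:
    """One 'name = value' line per setting, sorted for stable diffs."""
    return "".join(f"{name} = {value}\n" for name, value in sorted(settings.items()))


if __name__ == "__main__":
    print(render_sysctl_conf(DIAGNOSTIC_SETTINGS), end="")
```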
Crash Kernel Memory Snapshots
kdump: uses the Linux kexec function to save the crashed kernel's memory (vmcore) after a panic
– The only way to get diagnostic data if disk or network are not available.
– Boots a second kernel from a reserved, protected memory area to save the crashed kernel
Very common errors:
– Not testing kdump: it requires specific memory tuning (crashkernel=) and also specific HBA or network drivers.
– Have dedicated space for crash dumps, preferably not in your root partition. Remember, an uncompressed vmcore is as large as physical memory.
– Local disk is faster and more reliable than network dumps.
– Use gzip or `makedumpfile` to compress cores prior to upload
Set Your System to Automatically Dump Core
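The common errors listed above can be turned into pre-flight checks: is memory reserved with crashkernel=, is a capture kernel loaded, and does the dump directory have room for a vmcore as large as physical memory? This is a sketch. /sys/kernel/kexec_crash_loaded is present on kexec-enabled kernels, and /var/crash in the usage note is an assumption, because the actual dump target is configured in /etc/kdump.conf. None of these checks replaces actually testing kdump.

```python
import shutil

# Sketch of pre-flight checks for kdump. The dump directory is an
# assumption; the real target is configured in /etc/kdump.conf.


def crashkernel_arg(cmdline: str):
    """Value of crashkernel= on the kernel command line, or None."""
    for arg in cmdline.split():
        if arg.startswith("crashkernel="):
            return arg.split("=", 1)[1]
    return None


def mem_total_bytes(meminfo_text: str) -> int:
    """Physical memory from the MemTotal line of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def crash_kernel_loaded(path: str = "/sys/kernel/kexec_crash_loaded") -> bool:
    """True if a capture kernel is loaded (file reads '1'); False if unknown."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return False


def dump_space_ok(dump_dir: str, mem_bytes: int) -> bool:
    """Worst case: an uncompressed vmcore is about the size of physical RAM."""
    return shutil.disk_usage(dump_dir).free >= mem_bytes
```

Usage: pass open("/proc/cmdline").read() to crashkernel_arg, and call dump_space_ok("/var/crash", mem_total_bytes(open("/proc/meminfo").read())).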
Crash `bt` backtrace or SysRq-T stack
– Get debug symbols from oss.oracle.com
Red flags:
– Many processes in D state (uninterruptible sleep, usually waiting on IO)
– Many processes in the same kernel routine (contention?)
Caution: Stack traces can be 1 MB or greater. Don’t do this frequently.
Reading a Kernel Core
Using the Crash Utility
dmesg output after SysRq-T
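As a lighter-weight first look on a live system, before dumping every stack with SysRq-T, you can count the processes in D state and group them by /proc/<pid>/wchan, which names the kernel function each process is waiting in. This is a sketch, and reading other users' wchan may require root.

```python
import os
from collections import Counter

# Sketch: count D-state (uninterruptible) processes grouped by the kernel
# function they wait in (/proc/<pid>/wchan). Many processes in the same
# routine hints at contention.


def process_state(stat_text: str) -> str:
    """State letter from /proc/<pid>/stat; comm may contain spaces or ')'."""
    return stat_text[stat_text.rindex(")") + 2]


def d_state_wchans(proc: str = "/proc") -> Counter:
    """Counter of wchan name -> number of processes in D state."""
    counts = Counter()
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, "stat")) as f:
                if process_state(f.read()) != "D":
                    continue
            with open(os.path.join(proc, pid, "wchan")) as f:
                counts[f.read().strip() or "?"] += 1
        except (OSError, ValueError, IndexError):
            continue  # process exited or file unreadable; skip it
    return counts
```

Usage: d_state_wchans().most_common(5). If many processes share one routine, follow up with the crash utility or a single SysRq-T.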
Diagnostic Tools for Brownouts
strace -ttT: diagnose slow processes
– Timestamps each system call (-tt) and reports the time spent in it (-T)
– Useful for diagnosing specific process syscall latency
– Also helpful to determine if a problem is in kernel or user mode
Crash utility on virtual machines
– `xm dump-core` takes a noninvasive kernel snapshot of a system
– Provides memory, stack traces, and kernel logs
Getting more out of your diagnostic tools
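Long strace -ttT captures are easier to read when they are ranked. This sketch parses lines in which -tt prefixes a microsecond wall-clock timestamp and -T appends the time spent as <seconds>, then returns the slowest calls. An optional PID prefix, as produced with -f, is tolerated. Unfinished and resumed lines are skipped, so calls that straddle other output are not counted.

```python
import re

# Sketch: rank the slowest system calls in `strace -ttT` output.
# Unfinished/resumed lines do not match the pattern and are skipped.

LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"      # optional PID prefix (strace -f)
    r"(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+"  # -tt wall-clock timestamp
    r"(?P<call>\w+)\("                    # syscall name
    r".*<(?P<dur>\d+\.\d+)>\s*$"          # -T time spent in the call
)


def slowest_calls(lines, top: int = 5):
    """Return (seconds, timestamp, syscall) tuples, slowest first."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            hits.append((float(m.group("dur")), m.group("ts"), m.group("call")))
    hits.sort(reverse=True)
    return hits[:top]
```

For example, capture with `strace -ttT -p <pid> -o trace.txt`, then run slowest_calls(open("trace.txt")) to see whether the time goes to a few slow calls (kernel or IO) or is spent between calls (user mode).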
Tips and Tricks: Ksplice
Issue: Old Kernels
Symptom: Customer systems are encountering known, already-fixed issues
– Examples: tcp window_size, shrink_zone, etc.
– Use new kernels for new features: NFSv4, dtrace, btrfs.
Cause: Older kernels are not ‘stable’. New kernels fix bugs.
Solutions:
– Implement a periodic update schedule for kernel and OS packages, or…
– Use ksplice to stay up to date with patches
Newer Kernel Releases Fix Your Bugs
Ksplice: Rebootless Kernel Patching
Ksplice keeps your system up to date
– Integrated with ULN
– Now available in online and offline modes
Using Ksplice for Diagnostics and Patching
– Real-World NFS Example
Zero Downtime Patching for Bugs and Security Updates
Summary
Key Linux Tips and Tricks
– kexec and oswatcher
Common Issues
– memory management and configuration
Diagnostic Tools and Use Cases
Ksplice in the Datacenter
Visit our partners and don’t miss these events sponsored by QLogic:
– Smoothie Bar on Monday, Oct 1st, 2:30-5:30pm
– Ice Cream Social on Wednesday, Oct 3rd, 1-2pm
ORACLE LINUX PAVILION
Moscone South Booth 1033
Oracle Linux Sessions
Tuesday, Oct 2nd
Oracle Linux TRACK SESSIONS
GEN8726 General Session: Oracle Linux Strategy and Roadmap
Speakers: Wim Coekaerts and Monica Kumar, Oracle
10:15 AM, Moscone South - 103
CON8731 Top Technical Tips for Automatic and Secure Oracle Linux Deployments
Speakers: Lenz Grimmer, Oracle; Martin Breslin, SEI Global; Ed Bailey, Transunion
11:45 AM, Moscone South - 270
Oracle Linux Sessions
Wednesday, Oct 3rd
Oracle Linux TRACK SESSION
CON8729 Why Switch to Oracle Linux?
Speakers: Monica Kumar, Mike Radomski, SUNY
3:30 PM, Moscone South - 270
HANDS ON LABS
HOL9383 Oracle Linux Package Management: Configuring and Enabling Services
10:15 AM, Marriott Salon 14/15, YB level
HOL9384 Oracle Linux Storage Management with LVM and Device-Mapper
11:45 AM, Marriott Salon 14/15, YB level
Oracle Linux Sessions
Thursday, Oct 4th
NEW: Oracle Linux Curriculum Footprint
Oracle Linux System Administration: instructor-led and live virtual
Unix/Linux Essentials: instructor-led and live virtual
This Oracle Linux System Administration course teaches you all the essential system administration skills and includes key information specific to Oracle Linux: Unbreakable Enterprise Kernel, Ksplice, ULN, and other key features.
Visit: oracle.com/education/linux
Oracle Linux Training from Oracle University
@ORCL_Linux
Facebook.com/OracleLinux
Blogs.oracle.com/linux
Oracle Linux Experts Group
YouTube.com/oraclelinuxchannel
Join our communities
Resources
Visit Oracle.com/linux
Download for FREE: edelivery.oracle.com/linux
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract.It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.