Debugging and Configuration Best Practices for Oracle Linux

Copyright © 2012, Oracle and/or its affiliates. All rights reserved.

Upload: terry-wang

Post on 18-Dec-2014


Greg Marsden, Senior Director, Linux and Virtualization


Agenda

Key Linux Tips and Tricks

Common Issues

Diagnostic Tools and Use Cases

Do it Yourself Debugging

Ksplice in the Datacenter


Tips and Tricks: Key Points


Key Linux Tips and Tricks

Kernel Tuning: Oracle Preinstall RPM

Diagnostic Tools: kdump and oswatcher

Best Performance and Reliability


Oracle Preinstall Package and Templates

Configure Oracle Products Automatically

Per-Product Preconfiguration Package: oracle-rdbms-server-11gR2-preinstall-1.0.6.el6.x86_64.rpm

– Based on real-world stack testing from Oracle Validated Configurations

– Includes Product Release Notes recommendations

– Installs necessary dependencies and kernel tuning parameters

– Individual for each Oracle product

Oracle VM Template for Oracle RDBMS Server

– Production-ready, installed virtual machine templates from eDelivery


System Diagnostics

oswatcher utility: Install and leave running to collect information about system activity over time.

Serial console or netconsole: Remotely monitor system activity in the case of a disk, network or system outage.

kexec crash collection utilities: Gather forensic information from malfunctioning systems.

Critical diagnostics software should run at all times


Tips and Tricks: Memory Management


“Help! My system has 250 GB of RAM and I’m running out of memory! My consultants are telling me we can’t scale with a 120 GB SGA and this many connections, but I can’t fit any more RAM in this system.”

Anonymous DBA, Oracle User


Issue: Not using Hugepages

Frequent Issue

• Symptom: Out of Memory errors, slow performance. Detected via oswatcher.

• Cause: SGA mapped in 4 KB pages instead of 2 MB pages

• Solution: Use Hugepages

• Hugepages are faster.

• Hugepages are “pinned” and won’t be swapped.

I found the following:

13:09:19    57591060k free    159 client connections
13:26:01    26189944k free    1826 client connections
13:32:31    15547144k free    2024 client connections
13:57:00    467048k free      2037 client connections  (here is where we begin swapping memory to disk)

I also found this:

zzz ***Fri Aug 9 13:23:22 PDT 2011
MemTotal:     250 GB
MemFree:      464 MB
PageTables:   112 GB
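The PageTables figure above follows from simple arithmetic, sketched here in Python. This is a rough estimate under stated assumptions, not a measurement: it assumes 8-byte page table entries (typical for x86_64), counts only last-level entries, and assumes each connection ends up mapping the whole 120 GB SGA.

```python
# Rough page-table arithmetic for one process that maps the whole SGA.
# Assumption: 8-byte page table entries (x86_64); upper-level tables are ignored.
PTE_BYTES = 8
KIB, MIB, GIB = 1024, 1024**2, 1024**3

def page_table_bytes(mapped_bytes, page_bytes):
    """Bytes of last-level page table entries needed to map mapped_bytes."""
    entries = -(-mapped_bytes // page_bytes)  # ceiling division
    return entries * PTE_BYTES

sga = 120 * GIB  # SGA size quoted by the DBA
per_process_4k = page_table_bytes(sga, 4 * KIB)
per_process_2m = page_table_bytes(sga, 2 * MIB)

print(f"4 KB pages: {per_process_4k / MIB:.0f} MiB of page tables per process")
print(f"2 MB pages: {per_process_2m / KIB:.0f} KiB of page tables per process")
```

Under these assumptions each connection costs about 240 MiB of page tables with 4 KB pages, against about 480 KiB with 2 MB pages, so a few hundred connections that touch most of the SGA are enough to account for a PageTables value on the order of 100 GB.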


Performance with Hugepages

With Hugepages

• 200 connections to a 12.9 GB SGA
• Before DB startup, PageTables: 7748 kB
• After DB startup, PageTables: 21288 kB
• After 200 PQ slaves run query, PageTables: 80564 kB
• Time to complete: 00:00:18.77

Without Hugepages

• 200 connections to a 12.9 GB SGA
• Before DB startup, PageTables: 7400 kB
• After DB startup, PageTables: 652900 kB
• After 200 PQ slaves run query, PageTables: 6189248 kB
• Time to complete: 00:10:23.60


Hugepages and Transparent Hugepages

Performance for DB and Middleware Applications

“Regular” Hugepages [Ref. Doc ID 749851.1]

– Reduce footprint of individual Oracle database connections.

– Increase performance and scalability.

– Require manual tuning after SGA changes, and do not work with Automatic Memory Management (AMM).

Transparent Hugepages

– Auto-allocate hugepages for large memory allocations. Great for Java/middleware/applications.

– Do not help the RDBMS use case.

– New for UEK and OL6!
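Because regular hugepages need manual tuning whenever the SGA changes, a small calculation helps. The Python sketch below is illustrative only: the helper names and the sample text are hypothetical, it reads the real `Hugepagesize` and `HugePages_Total` fields of /proc/meminfo, and it sizes for the SGA alone, so treat the result as a lower bound rather than a recommended setting.

```python
import re

def meminfo_fields(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def hugepages_needed(sga_bytes, hugepage_kb):
    """Smallest number of hugepages that covers sga_bytes."""
    page_bytes = hugepage_kb * 1024
    return -(-sga_bytes // page_bytes)  # ceiling division

# Hypothetical sample; on a real system use open("/proc/meminfo").read().
SAMPLE = """\
HugePages_Total:       0
HugePages_Free:        0
Hugepagesize:       2048 kB
"""

info = meminfo_fields(SAMPLE)
need = hugepages_needed(120 * 1024**3, info["Hugepagesize"])
print(f"vm.nr_hugepages should be at least {need}; currently {info['HugePages_Total']}")
```

For a 120 GB SGA and 2 MB hugepages this gives 61440 pages. Rerun the calculation after every SGA change, as the slide notes.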


Issue: Slow Performance at 95% RAM

OL5-specific issue with large memory allocation

Symptoms
– System is swapping and shows low free/cached memory
– Reduced system performance

Cause: Usually the kernel is hogging CPU in try_to_free_pages, reclaiming from the pagecache and inactive lists.

Solution
– Ensure you are running a shrink_zone patched kernel: UEK, OL6, or OL5+BUG6086839
– If system is swapping but performance is OK, get more RAM.


Issue: System Swapping with Free Memory

NUMA Specific Problem

Symptom: System starts to swap while reporting free RAM
– vmstat reports free memory.
– dmesg has “order 5 allocation failed” messages.
– If allocations of order lower than 5 are failing, there are larger issues.

Cause: Memory fragmentation. On NUMA systems, caused by fragmentation of node-local memory for kernel applications.

Solutions:
– Disable NUMA
– Decrease MTU size if using jumbo frames
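One way to see this kind of fragmentation is /proc/buddyinfo, which lists the number of free blocks per allocation order for each node and zone. The Python sketch below is an illustration with hypothetical helper names and made-up sample numbers; an order-5 block is 32 contiguous pages, or 128 KB with 4 KB pages.

```python
def free_blocks_by_order(buddyinfo_text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in buddyinfo_text.splitlines():
        parts = line.replace(",", "").split()
        # Expected shape: Node <n> zone <name> <count order 0> <count order 1> ...
        if len(parts) > 4 and parts[0] == "Node" and parts[2] == "zone":
            zones[(int(parts[1]), parts[3])] = [int(c) for c in parts[4:]]
    return zones

def zones_short_of(buddyinfo_text, order=5):
    """List zones with no free block of the given order or larger."""
    return [key for key, counts in free_blocks_by_order(buddyinfo_text).items()
            if sum(counts[order:]) == 0]

# Made-up sample; on a real system use open("/proc/buddyinfo").read().
SAMPLE = """\
Node 0, zone   Normal   9120   3310    410     52      3      0      0      0      0      0      0
Node 1, zone   Normal    812    640    501    377    120     64     20      8      2      1      0
"""

print(zones_short_of(SAMPLE))
```

In the sample, node 0 has plenty of free small blocks but nothing at order 5 or above, which is the pattern behind "order 5 allocation failed" on a system that still reports free memory.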


Memory Accounting

What is Linux Doing With My Memory?

Free = Cached + Free
– All free space on Linux is used for pagecache.
– This behavior cannot be disabled.

Process shared memory is hard to find in Linux
– RSS double counts shared memory, Total includes unmapped pages.
– Use /proc/<pid>/smaps to see real process memory usage.

cgroups: New features in the latest kernels let you restrict RAM
– Useful to throttle pagecache use by backup processes
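As a sketch of the smaps approach, the Python below sums the `Rss` and `Pss` fields over all mappings of a process. It assumes a kernel whose smaps output includes `Pss` (proportional set size, which splits each shared page among the processes mapping it); the function name, paths and numbers in the sample are hypothetical.

```python
import re

def smaps_totals_kb(smaps_text):
    """Sum the Rss and Pss fields (in kB) over all mappings in smaps-style text."""
    totals = {"Rss": 0, "Pss": 0}
    for line in smaps_text.splitlines():
        m = re.match(r"(Rss|Pss):\s+(\d+) kB", line)
        if m:
            totals[m.group(1)] += int(m.group(2))
    return totals

# Made-up sample; on a real system read /proc/<pid>/smaps for the process of interest.
SAMPLE = """\
00400000-00500000 r-xp 00000000 08:01 123 /u01/app/oracle/bin/oracle
Size:               1024 kB
Rss:                 800 kB
Pss:                  40 kB
60000000-70000000 rw-s 00000000 00:04 0 /SYSV00000000 (deleted)
Size:             262144 kB
Rss:              200000 kB
Pss:                1000 kB
"""

totals = smaps_totals_kb(SAMPLE)
print(f"Rss {totals['Rss']} kB, Pss {totals['Pss']} kB")
```

The large gap between the two totals in the sample is the double counting the slide warns about: Rss charges the full shared segment to every process, while Pss charges each process only its share.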


Swap: What is it good for?

Tuning Swap Space

Swap is a highly contentious topic on Linux

– Benefit: Allows “room to grow” for inadequately sized systems.

– Drawback: Much slower than memory access, often makes problems worse.

Recommendation: Use swap, but ensure IO to swap disk is kept close to zero.

[Figure: vmstat output]
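To check that swap IO stays close to zero, the `si` and `so` columns of vmstat (memory swapped in from and out to disk) are the ones to watch. The Python sketch below pulls them out of captured vmstat output, for example from an oswatcher archive; the function name and the sample numbers are hypothetical.

```python
def swap_activity(vmstat_text):
    """Return (si, so) pairs from vmstat output, using the header row to find the columns."""
    rows, columns = [], None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            columns = (fields.index("si"), fields.index("so"))
        elif columns and fields and fields[0].isdigit():
            rows.append((int(fields[columns[0]]), int(fields[columns[1]])))
    return rows

# Made-up sample in the usual vmstat layout.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 575910  10240 204800    0    0     5    12  100  200  2  1 97  0  0
 3  2 812340   4670   1024  20480  950 4100  1200  5200 3000 5000 10 30 20 40  0
"""

busy = [row for row in swap_activity(SAMPLE) if any(row)]
print(f"{len(busy)} sample(s) with swap IO: {busy}")
```

Any sustained run of nonzero pairs is a sign that the system is relying on swap rather than merely having it configured.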


Tips and Tricks: General Recommendations


Other Configuration Trouble

Assorted Common Configuration Issues

Use UUID to Mount System Disks
– Symptom: System panics after upgrade
– Cause: New hardware, drivers, or kernel reorders device discovery
– Caution: May not work with LVM snapshots

NFS Locks Not Released on Reboot
– Cause: Kernel and DNS have different hostnames
– Solution: Ensure kernel hostname is fully qualified. See BUG 3156942.

Cluster Reboots with OCFS2
– Cause: Network or disk outages can cause OCFS2 to fence nodes
– Solution: Ensure OCFS2 timeouts are greater than storage/network failover timeouts. Defaults may be too short for o2cb heartbeat.


Performance Tuning Kernel Parameters

System Scheduling
vm.swappiness
– 100: Force aggressive swapping
– 0: Insurance against a backup process hogging all system memory

Network Protocol Buffers
net.core.wmem_default/max: Buffer size for outgoing network packets.
net.core.rmem_default/max: If these values are set too small, system may discard TCP packets

Memory Management
vm.dirty_ratio: encourage frequent pagecache writeback
vm.lowmem_reserve_ratio/vm.min_free_kbytes: reserve physical memory for kernel allocations
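To review what a system is currently running with, each sysctl name maps to a file under /proc/sys (dots become directory separators). The Python sketch below reads the parameters listed above; the helper names are hypothetical, and it only reports values, leaving the choice of settings to the product documentation.

```python
from pathlib import Path

# Parameters named on this slide.
PARAMETERS = [
    "vm.swappiness",
    "net.core.wmem_default", "net.core.wmem_max",
    "net.core.rmem_default", "net.core.rmem_max",
    "vm.dirty_ratio",
    "vm.lowmem_reserve_ratio", "vm.min_free_kbytes",
]

def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl name such as vm.swappiness to its file under /proc/sys."""
    return Path(root, *name.split("."))

def read_sysctl(name, root="/proc/sys"):
    """Return the current value as text, or None if the parameter is not present."""
    path = sysctl_path(name, root)
    return path.read_text().strip() if path.exists() else None

def report(root="/proc/sys"):
    """Collect the current value of every parameter in PARAMETERS."""
    return {name: read_sysctl(name, root) for name in PARAMETERS}

for name, value in report().items():
    print(f"{name} = {value}")
```

Saving this output alongside oswatcher data gives a record of the tuning in effect when a problem occurred.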


Oracle finds and fixes critical bugs in Enterprise Linux.

– Red Hat Compatible Kernel vs. Oracle-Modified kernel

– Install the Compatible Kernel for bug-for-bug compatibility with RHEL

Patches required for correct Oracle product operation

Bugs Fixed in Enterprise Linux

Oracle Linux 5 Bug Fixes


Oracle Patches Linux

Staying up-to-date with your Linux distribution is very important.

Bug-Fixed Oracle Linux Kernel

UEK: Unbreakable Enterprise Kernel

– Top Performing Kernel. World Record TPC-C Benchmark.

– Provides OL6 performance on OL5 systems.

Backporting of fixes is a temporary solution, not a permanent one.

– Always plan to update or ksplice to the latest kernel version.

Specially Tuned Linux Kernels for Customer Requirements


Get the latest in performance and features from Linux, tested by Oracle.

– All new kernel, optimized for Oracle.

Stay closer to mainline Linux with patches to improve performance for Oracle workloads.

– All patches are open source and submitted to mainline Linux

– Patches provided via RPM and via ksplice

– World Record TPC-C Benchmark, March ’12.

UEK: Modern Linux for Oracle

Fast, Modern, Reliable


Tips and Tricks: Diagnostics


Issue: Post-Event Diagnostics

What to do after a crash or hang?

Hard hangs
– Panic, OOPS, nmi_watchdog
– “Spontaneous Reboot”

Brownouts
– Performance Degradation

Cluster Scenarios
– Network or Disk may have gone away, triggering the fence
– Need to maintain crash data in the event of loss of net/disk
– Ensure timeouts (like OCFS2) are set correctly


Two Kinds of Critical OS Logging

Continuous OS Logging
– oswatcher continuous logging collects timestamped snapshots of system commands: ps, top, slabinfo, meminfo, vmstat, mpstat, iostat
– Other tools can be employed as well, like sar or collectl.

Panic/Hang Event Logging
– Serial console or netconsole should always be set up for any production system. No exceptions.
– Consoles also preserve sysrq data.
– kdump system memory image collection.


Console Logs

Finding Faults if Disk or Networking Fail

Kernel messages may not be available after a crash
– Serial consoles are proven technology for preserving console output

How to capture serial output:
– Reliable ways to capture serial output:
  ILOM virtual console
  Serial-Over-LAN BIOS config
  Inexpensive DB9-USB converter or Serial Concentrator
– Unreliable ways to capture serial output:
  Physically attached terminal with ‘setterm -blank 0’ and system not configured to reboot
  netconsole (can be difficult to configure, and subject to network outages)

Things to check:
– Ensure baud rate is high enough (not 9600 baud)
– For Virtual Console, ensure console history is set up to capture a large amount of output


Keyboard and Console Diagnostics

SysRq Key Combinations for Diagnostics

Magic SysRq Key
M: dump system memory statistics
P, W: dump the stack for all processors
T: dump the kernel stack trace for all processes
C: Immediately cause a system crash
S … U … B: Emergency Sync all disks, Unmount disks, reboot.

How to Invoke Magic SysRq…
Console: Alt + SysRq + <cmd>
Serial Console: <Break> <cmd>
Command Line: echo t > /proc/sysrq-trigger
Oracle VM dom0: xm sysrq <domain ID> <cmd>

Ensure kernel.sysrq = 1

Some of these operations (like stack trace) dump a lot of data (1 MB or more!).

These operations take full priority in the kernel. Do not run them in your monitoring scripts; use them carefully!


Diagnostic/Destructive Kernel Parameters

Enable Keyboard- and Console-based Debugging
kernel.sysrq=1
– Always have this set to enable debug commands

System-wide Events
panic_on_oom: Panic for Out of Memory condition
– (Alternative would be to kill the high memory process)
panic_on_oops: Panic for system problems
– (off: some modules may survive a panic, but system state is inconsistent)

Per-Process Events
hung_task_timeout_secs: Enable warning if process not scheduled for (timeout) seconds.
– Can cause a lot of log messages, not usually useful
hung_task_panic: Cause a stack trace and system panic if the timeout is hit
– Can be useful for debugging. Not good to set by default.


Crash Kernel Memory Snapshots

Set Your System to Automatically Dump Core

kdump: uses Linux kexec function to save kernel stacks after a panic
– Only way to get diagnostic data if disk or network are not available.
– Reboots the system into a protected memory area to save crashed kernel

Very common errors:
– Not testing kdump: Requires specific memory tuning (crashkernel=) and also requires specific HBA or network drivers
– Have dedicated space for crash dumps. Preferably not in your root partition. Remember, vmcore == physical memory size.
– Local disk is faster and more reliable than network dumps.
– Use gzip or `makedumpfile` to compress cores prior to upload
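A first sanity check for kdump is whether the running kernel was booted with a `crashkernel=` reservation at all. The Python sketch below extracts it from a kernel command line; the function name is hypothetical and the sample line, including the `128M@16M` value, is only an example rather than a recommended size.

```python
import re

def crashkernel_setting(cmdline):
    """Return the value of crashkernel= from a kernel command line, or None."""
    m = re.search(r"(?:^|\s)crashkernel=(\S+)", cmdline)
    return m.group(1) if m else None

# Example command line; on a real system use open("/proc/cmdline").read().
SAMPLE = "ro root=UUID=<uuid> rhgb quiet crashkernel=128M@16M"

setting = crashkernel_setting(SAMPLE)
if setting is None:
    print("No crashkernel= reservation: kdump cannot capture a vmcore")
else:
    print(f"crashkernel={setting}")
```

A reservation being present does not prove that a dump will succeed. The only real test is to force a crash on a non-production system (for example with the SysRq C command) and confirm that a vmcore is written.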


Reading a Kernel Core

Using the Crash Utility

Crash `bt` backtrace or SysRq-T stack
– Get debug symbols from oss.oracle.com

Red flags:
– Many processes in D state (IO)
– Many processes in same kernel routine (contention?)

Caution: Stack traces can be 1 MB or greater. Don’t do this frequently.

[Figure: dmesg output after SysRq-T]
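Both red flags can also be spotted cheaply on a live system, before resorting to a full stack dump. The Python sketch below counts D-state tasks per kernel wait channel from the output of `ps -eo state,pid,wchan:32,comm`; the function name and the sample process list are hypothetical.

```python
from collections import Counter

def blocked_tasks(ps_text):
    """Count D-state tasks per wait channel from `ps -eo state,pid,wchan:32,comm` output."""
    waits = Counter()
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) == 4 and fields[0].startswith("D"):
            waits[fields[2]] += 1
    return waits

# Made-up sample of the ps output.
SAMPLE = """\
S   PID WCHAN                            COMMAND
S     1 poll_schedule_timeout            init
D  4211 sync_page                        oracle
D  4212 sync_page                        oracle
D  4300 nfs_wait_bit_uninterruptible     dd
R  4400 -                                ps
"""

for wchan, count in blocked_tasks(SAMPLE).most_common():
    print(f"{count:4d} task(s) blocked in {wchan}")
```

Many tasks piling up in one routine points at where to look in the `bt` or SysRq-T output.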


Diagnostic Tools for Brownouts

Getting more out of your diagnostic tools

strace -ttT: Diagnose slow processes
– Automatically timestamp system calls
– Useful for diagnosing specific process syscall latency
– Also helpful to determine if a problem is in kernel or usermode

Crash utility on virtual machines
– `xm dump-core` takes a noninvasive kernel snapshot of a system
– Provides memory, stack traces, and kernel logs
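With `-tt` each line of strace output starts with a microsecond timestamp, and with `-T` it ends with the time spent in the call in angle brackets. The Python sketch below ranks calls by that duration; the function name and the sample trace are hypothetical, and the pattern only handles the simple single-process line layout shown.

```python
import re

# Timestamp, syscall name, then the <seconds> suffix that -T appends.
LINE = re.compile(r"^(\d\d:\d\d:\d\d\.\d+) (\w+)\(.*<(\d+\.\d+)>$")

def slowest_calls(strace_text, top=3):
    """Return up to `top` (seconds, syscall, timestamp) tuples, slowest first."""
    calls = []
    for line in strace_text.splitlines():
        m = LINE.match(line.strip())
        if m:
            calls.append((float(m.group(3)), m.group(2), m.group(1)))
    return sorted(calls, reverse=True)[:top]

# Made-up sample of `strace -ttT` output.
SAMPLE = r"""13:09:19.000101 open("/etc/hosts", O_RDONLY) = 3 <0.000021>
13:09:19.000150 read(3, "127.0.0.1 localhost\n", 4096) = 20 <0.000015>
13:09:19.000200 write(5, "x", 1) = 1 <2.503112>
13:09:21.503400 close(3) = 0 <0.000009>
"""

for seconds, name, stamp in slowest_calls(SAMPLE):
    print(f"{stamp} {name} took {seconds:.6f}s")
```

Long times inside a call suggest the delay is in the kernel or the device behind it, while long gaps between timestamps with short calls suggest the time is being spent in user mode.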


Tips and Tricks: Ksplice


Issue: Old Kernels

Newer Kernel Releases Fix Your Bugs

Symptom: Customer systems are encountering known/fixed issues
– Examples: tcp window_size, shrink_zone, etc.
– Use new kernels for new features: NFSv4, dtrace, btrfs.

Cause: Older kernels are not ‘stable’. New kernels fix bugs.

Solutions:
– Implement a periodic update schedule for kernel and OS packages, or…
– Use ksplice to stay up to date with patches


Ksplice: Rebootless Kernel Patching

Zero Downtime Patching for Bugs and Security Updates

Ksplice keeps your system up to date
– Integrated with ULN
– Now available in online and offline modes

Using Ksplice for Diagnostics and Patching
– Real-World NFS Example


Summary

Key Linux Tips and Tricks
– kexec and oswatcher

Common Issues
– memory management and configuration

Diagnostic Tools and Use Cases

Ksplice in the Datacenter


ORACLE LINUX PAVILION, Moscone South Booth 1033

Visit our partners and don’t miss these events sponsored by QLogic:

Smoothie Bar on Monday, Oct 1st, 2:30-5:30pm

Ice Cream Social on Wednesday, Oct 3rd, 1-2pm


Oracle Linux Sessions

Tuesday, Oct 2nd

Oracle Linux TRACK SESSIONS

GEN8726: General Session: Oracle Linux Strategy and Roadmap
Speakers: Wim Coekaerts and Monica Kumar, Oracle
10:15 AM, Moscone South - 103

CON8731: Top Technical Tips for Automatic and Secure Oracle Linux Deployments
Speakers: Lenz Grimmer, Oracle; Martin Breslin, SEI Global; Ed Bailey, Transunion
11:45 AM, Moscone South - 270


Oracle Linux Sessions

Wednesday, Oct 3rd

Oracle Linux TRACK SESSION

CON8729: Why Switch to Oracle Linux?
Speakers: Monica Kumar, Mike Radomski, SUNY
3:30 PM, Moscone South - 270

HANDS ON LABS

HOL9383: Oracle Linux Package Management: Configuring and Enabling Services
10:15 AM, Marriott Salon 14/15 YB level

HOL9384: Oracle Linux Storage Management with LVM and Device-Mapper
11:45 AM, Marriott Salon 14/15 YB level


Oracle Linux Sessions

Thursday, Oct 4th


NEW: Oracle Linux Curriculum Footprint

Oracle Linux Training from Oracle University

Oracle Linux System Administration: Instructor-led and Live virtual

Unix/Linux Essentials: Instructor-led and Live virtual

This Oracle Linux System Administration course teaches you all the essential system administration skills and includes key information specific to Oracle Linux: Unbreakable Enterprise Kernel, Ksplice, ULN, and other key features

Visit: oracle.com/education/linux


Join our communities

@ORCL_Linux
Facebook.com/OracleLinux
Blogs.oracle.com/linux
Oracle Linux Experts Group
YouTube.com/oraclelinuxchannel

Resources

Visit Oracle.com/linux
Download for FREE: edelivery.oracle.com/linux


The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
