x64 workshop linux information gathering

Post on 19-May-2015

2.344 views

Category:

Documents


5 download

TRANSCRIPT

1. X64 Workshop: Linux Information Gathering

  • Thorsten Kellermann
  • Sun Microsystems

2. Agenda

  • Linux Support Overview
    • Support model and structure
  • Data Collection
    • Red Hat sysreport
    • SuSE siga / config.sh
    • Linux explorer
  • Data Analyzing

3. Agenda (cont)

  • Advanced Troubleshooting
    • System Core Dump Capturing
    • Linux SysRq
    • Hanging System
  • Linux Analysis
    • CDA

4. Linux Support Overview

  • Linux Support from System TSC organization:
    • EMEA: System TSC VSP
      • coverage from 9am - 5pm
    • AMER/APAC: System TSC OS

5. Linux Support Overview (cont)

  • Supported Linux Versions:
    • Red Hat Enterprise Linux (RHEL)
      • Version 3 and 4
      • AS, WS, ES, DESKTOP
      • Existing contracts only; no new contracts after 30.09.2006.
    • Novell/SuSE Linux Enterprise (SLES)
      • Version 8, 9 and 10
  • Back line support from Vendor available
    • We have a path to escalate issues to Red Hat or Novell/SuSE.

6. Linux Support Overview (cont)

  • What is covered by support?
    • Bugs within the OS or with Core applications
  • What is not covered?
    • Configuration of the system
    • HowTo questions
    • 3rd-party applications
  • Other limitations
    • Self-compiled kernels, tainted modules
    • Sun does not fix bugs within any distribution; that is up to the vendor.

7. Data Collection

  • Entitlement information
    • We need the entitlement for the Linux distribution the customer has installed
  • General thoughts about data collection
    • The issue must be visible within the data.
    • Data must be current.
      • Anything changed on the system? Collect new data!
    • And it must be understandable.
      • If not, try SGRT.

8. Data Collection (cont)

  • Samples:
    • Customer has a working and a non-working system
      • Collect data from both systems
    • Customer has changed the configuration on the advice of Sun Support, but this doesn't work.
      • Collect all relevant data from the system again to see what was changed.
    • Customer applies online updates to the system, but the issue isn't fixed.
      • We need the data from the system again to see which updates were applied.

9. Data Collection (cont)

  • Red Hat:
    • sysreport
      • Mandatory for escalating to Red Hat
      • File system hierarchy
      • Lacks some interesting information.
  • SuSE:
    • siga
      • Insufficient messages etc.
    • config.sh (preferred)
      • Collects much more information than siga.
      • Encapsulates the siga report

10. Data Collection (cont)

  • Others
    • Linux Explorer
      • Most complete data collection
      • Not a Sun tool
      • We are in discussion with SuSE to also accept this data set instead of siga/config.sh
      • Not accepted by Red Hat for escalation

11. Data Analyzing

  • There is no automatic tool!
  • This presentation is by no means complete.
  • Determine the Linux version:
    • uname -a
    • /etc/*release*
  • Looking up Messages:
    • messages
    • dmesg
    • boot.log
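
The version check above can be sketched as a small shell function. This is only an illustration: the function name linux_version_report and its optional directory argument are assumptions, not part of any of the collection tools mentioned in this presentation.

```shell
#!/bin/sh
# Print the kernel version and the contents of every release file.
# The optional first argument (default /etc) is a hypothetical knob so the
# function can also be pointed at an unpacked data collection.
linux_version_report() {
    etc_dir="${1:-/etc}"
    echo "== uname -a =="
    uname -a
    for f in "$etc_dir"/*release*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] || continue
        echo "== $f =="
        cat "$f"
    done
}
```

Calling linux_version_report without arguments reads the release files of the running system. Note that uname always reports the kernel of the machine the function runs on, not of the system a saved data set came from.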

12. Data Analyzing (cont)

  • What packages are installed? Which version?
    • RPM is the package manager of RHEL and SLES.
      • rpm -qa
      • rpm -qaV (takes some time)
  • SAR report
    • Looking at the sar data (package sysstat) shows the system load at the time an issue occurred
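
When a customer has a working and a non-working system, comparing the two package lists is often a quick first step. The following is a minimal sketch, assuming both lists were saved beforehand with "rpm -qa"; the function name compare_rpm_lists and the file names are hypothetical.

```shell
#!/bin/sh
# Print packages that appear in only one of two saved "rpm -qa" listings.
# Both file arguments are hypothetical names chosen by whoever saved the lists.
compare_rpm_lists() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    # comm needs sorted input, and rpm -qa output is not sorted.
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    echo "Only in $1:"
    comm -23 "$a_sorted" "$b_sorted"
    echo "Only in $2:"
    comm -13 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}
```

A possible use: run "rpm -qa > good.txt" on the working system and "rpm -qa > bad.txt" on the failing one, then call compare_rpm_lists good.txt bad.txt. A package with different versions shows up once in each section.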

13. Data Analyzing (cont)

  • Hardware/Firmware information
    • lspci [-v[v[v[v[v]]]]]
    • lsusb
    • dmidecode
      • not part of sysreport!
      • hardware.py
        • Python script wrapping dmidecode (RH only, may be included in sysreport)
    • dmesg or /proc related
      • e.g. firmware of SCSI disk in /proc/scsi/scsi
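
Because dmidecode output is not part of sysreport, it can be useful to gather the hardware commands above separately. A minimal sketch follows; the function name collect_hw_info and the output file names are assumptions made for illustration.

```shell
#!/bin/sh
# Save hardware information into a directory, one file per command.
# Commands that are not installed are noted and skipped instead of aborting.
collect_hw_info() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    for cmd in lspci lsusb dmidecode; do
        if command -v "$cmd" >/dev/null 2>&1; then
            # lspci gets -vv for more detail; dmidecode usually needs root.
            case "$cmd" in
                lspci) "$cmd" -vv ;;
                *)     "$cmd" ;;
            esac > "$out_dir/$cmd.txt" 2>&1
        else
            echo "$cmd not installed" > "$out_dir/$cmd.txt"
        fi
    done
    # SCSI disk vendor, model and firmware revision, if the file exists.
    [ -r /proc/scsi/scsi ] && cat /proc/scsi/scsi > "$out_dir/proc-scsi.txt"
    return 0
}
```

For example, collect_hw_info /tmp/hwinfo run as root writes lspci.txt, lsusb.txt and dmidecode.txt into /tmp/hwinfo. Error messages from the commands end up in the same files, so a permission problem is visible in the collected data.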

14. Data Analyzing (cont)

  • Overview

15. System Core Dump Capturing

  • No standard at the moment
    • Kdump has found its way into the mainline kernel.
  • RHEL 3 / 4 use their own mechanisms
    • netdump (preferred)
    • diskdump
  • RHEL 5 uses Kdump
    • A separate resident kernel with a small footprint
    • highly flexible and reliable

16. System Core Dump Capturing (cont)

  • SLES 8 / 9 uses LKCD
    • Based on an IBM/SGI implementation.
  • SLES 10 uses kdump
    • A separate resident kernel with a small footprint
    • highly flexible and reliable

17. Setting up RHEL 3 & 4 Netdump

  • Install Netdump Server
    • install package netdump-server
    • normally no configuration needed.
    • start service
  • Install Netdump Client
    • install package netdump-client
    • configure /etc/sysconfig/netdump
    • "service netdump propagate"
    • start service

18. Setting up RHEL 5 Kdump

  • Installed by Default
  • Configuration Dialog
    • enable / disable kdump
    • configure dump locations
      • local: file
      • net: nfs / ssh
      • partitions: ext2 / ext3 / raw
  • Quite easy to set up with the GUI dialog

19. Setting up SLES 8/9 LKCD

  • Install required package:
    • lkcdutils
  • Edit /etc/sysconfig/dump
  • Write configuration
    • # lkcd config
  • Activate service
    • # insserv /etc/init.d/boot.lkcd

20. Setting up SLES 10 Kdump

  • Install needed packages:
    • kexec-tools
    • kernel-kdump
    • kernel-*-debuginfo
  • Edit /etc/sysconfig/kdump
  • Enable kdump init service
    • via YaST runlevel editor
    • "chkconfig kdump on"
  • Add boot option "crashkernel=64M@16M"
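
After rebooting with the new boot option, it is worth confirming that the kernel actually received it. The following is a minimal sketch; the function name check_crashkernel and its optional file argument are assumptions for illustration.

```shell
#!/bin/sh
# Report whether a crashkernel= reservation is on the kernel command line.
# The optional argument (default /proc/cmdline) is a hypothetical knob for
# checking a saved copy of the command line instead of the live system.
check_crashkernel() {
    cmdline_file="${1:-/proc/cmdline}"
    for word in $(cat "$cmdline_file"); do
        case "$word" in
            crashkernel=*)
                echo "reserved: ${word#crashkernel=}"
                return 0 ;;
        esac
    done
    echo "no crashkernel= option found"
    return 1
}
```

On a system booted with the option from this slide, check_crashkernel would print "reserved: 64M@16M". A non-zero return code means the boot loader entry was not updated or a different entry was booted.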

21. Checking dump setup

  • Check that everything fits together:
    • Enable Magic SysRq feature temporarily
      • echo "1" > /proc/sys/kernel/sysrq
    • Force the system to dump
      • echo "c" > /proc/sysrq-trigger
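
Since the second command crashes the system immediately, it may help to wrap both steps in a function that refuses to run without an explicit confirmation. This is only a sketch: the function name force_test_dump, the confirmation string and the two path arguments are hypothetical.

```shell
#!/bin/sh
# Trigger a test crash only after an explicit confirmation string.
# WARNING: with the default paths this crashes the running system at once.
# The path arguments are hypothetical and default to the real /proc files.
force_test_dump() {
    confirm="$1"
    sysrq_ctl="${2:-/proc/sys/kernel/sysrq}"
    trigger="${3:-/proc/sysrq-trigger}"
    if [ "$confirm" != "yes-crash-this-system" ]; then
        echo "refusing: pass yes-crash-this-system as first argument" >&2
        return 1
    fi
    sync                      # flush file system buffers before the crash
    echo 1 > "$sysrq_ctl"     # enable Magic SysRq until the next reboot
    echo c > "$trigger"       # force the crash dump
}
```

Run force_test_dump yes-crash-this-system as root during a maintenance window, then check the configured dump location for a new vmcore after the system comes back.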

22. Linux SysRq Feature

  • The Magic SysRq Feature is somewhat similar to Stop-A on Solaris
  • It can force the kernel to print out or dump information about the system
  • Sometimes really helpful for troubleshooting
  • May even work if the system seems to hang

23. Linux SysRq Feature (cont)

  • Disabled by default, needs to be enabled
    • temporarily until next reboot
      • echo "1" > /proc/sys/kernel/sysrq
    • permanently
      • edit /etc/sysctl.conf to add the line: kernel.sysrq = 1
  • Issue locally on the keyboard with Alt+SysRq+hotkey
  • Issue remotely by echoing the hotkey into /proc/sysrq-trigger
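
The permanent setting can be scripted so that an existing kernel.sysrq line is updated rather than duplicated. A minimal sketch, assuming the usual "key = value" layout of sysctl.conf; the function name enable_sysrq_permanently and its optional file argument are hypothetical.

```shell
#!/bin/sh
# Make "kernel.sysrq = 1" persistent in a sysctl configuration file.
# An existing kernel.sysrq line is rewritten; otherwise the line is appended.
# The optional argument (default /etc/sysctl.conf) is a hypothetical knob.
enable_sysrq_permanently() {
    conf="${1:-/etc/sysctl.conf}"
    if grep -q '^[[:space:]]*kernel\.sysrq[[:space:]]*=' "$conf" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        sed 's/^[[:space:]]*kernel\.sysrq[[:space:]]*=.*/kernel.sysrq = 1/' "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        echo "kernel.sysrq = 1" >> "$conf"
    fi
}
```

After editing the file, "sysctl -p" applies the setting without waiting for the next reboot.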

24. Linux SysRq Feature (cont)

  • Some Hotkeys:
    • k
      • Calls the Secure Attention Key (SAK) function. SAK kills every process running on the current console to clean up the terminal.
    • s
      • Synchronizes all hard disks.
    • u
      • Remounts all hard disks in read-only mode. This prevents data loss when the system is in an unstable state.
    • t
      • Shows the current task list.

25. Linux SysRq Feature (cont)

  • Some Hotkeys (cont):
    • b
      • Reboots the system immediately. You should synchronize the hard disks and remount them read-only before restarting the system.
    • p
      • Prints out the current register contents.
    • m
      • Prints out the memory information.
  • For a complete list, look up sysrq.txt in the kernel documentation
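
The advice to synchronize and remount before rebooting can be sketched as a function that sends several hotkeys in order. The function name sysrq_sequence, its arguments and the one-second pause are assumptions for illustration only.

```shell
#!/bin/sh
# Send SysRq hotkeys to the trigger file, one write per key.
# Example sequence "s u b": sync, remount read-only, then reboot.
# The trigger argument is a hypothetical knob that defaults to the real
# /proc/sysrq-trigger; the key list is mandatory to avoid accidents.
sysrq_sequence() {
    keys="$1"
    trigger="${2:-/proc/sysrq-trigger}"
    if [ -z "$keys" ]; then
        echo "usage: sysrq_sequence \"s u b\" [trigger-file]" >&2
        return 1
    fi
    for key in $keys; do
        echo "sending SysRq $key"
        echo "$key" > "$trigger" || return 1
        sleep 1   # give sync and remount a moment to finish
    done
}
```

Calling sysrq_sequence "s u b" as root reboots the machine after syncing and remounting, so it should only be used when a normal shutdown is no longer possible.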

26. Crash Dump Analyzing

  • Crash utility
    • Supports various dump formats
      • Kdump, LKCD, Net/Disk dump
    • Integrated GDB
    • Can examine the kernel of a live system
  • http://people.redhat.com/~anderson/

27. Crash Dump Analyzing (cont)

  • You need to have the debug information of the kernel
  • The crash package needs to be installed
  • Load the vmcore for analysis
    • crash System.map vmlinux vmcore
      • dmesg
      • ps list
      • stack traces
      • etc.
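
For a first look at a dump, the usual crash commands can be placed in a command file and run non-interactively. A minimal sketch, assuming crash is started with its -i option to read commands from a file; the function name write_crash_commands and the file name are hypothetical.

```shell
#!/bin/sh
# Write a crash command file that prints the kernel log, the process list
# and the stack trace of the active task on each CPU, then exits.
# The output file name is a hypothetical choice.
write_crash_commands() {
    cmd_file="$1"
    cat > "$cmd_file" <<'EOF'
log
ps
bt -a
quit
EOF
}
```

A possible use is write_crash_commands first-look.txt followed by "crash -i first-look.txt vmlinux vmcore > report.txt", which gives a text report that can be attached to an escalation.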

28. Troubleshoot a Hanging System

  • Hard to troubleshoot due to lack of information
  • If the kernel is deadlocked, the NMI watchdog may help
    • Add the kernel boot parameter nmi_watchdog=1 to the GRUB configuration.
    • When a system lockup is detected, a kernel panic is initiated.
  • There might be a chance to force a dump (SysRq) while the system is hanging
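
One way to check that the NMI watchdog is active after the reboot is to watch the NMI counters in /proc/interrupts. The sketch below sums them; the function name nmi_count is hypothetical, and the assumption that a growing sum indicates a working watchdog may not hold on every kernel version.

```shell
#!/bin/sh
# Sum the per-CPU NMI counters from an interrupts file.
# Assumption: with a working NMI watchdog the sum grows between two readings.
# The optional argument (default /proc/interrupts) is a hypothetical knob.
nmi_count() {
    awk '$1 == "NMI:" { for (i = 2; i <= NF; i++) if ($i ~ /^[0-9]+$/) n += $i }
         END { print n + 0 }' "${1:-/proc/interrupts}"
}
```

Calling nmi_count twice, a few seconds apart, and comparing the two numbers shows whether NMIs are being delivered.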

29. Links

  • External Sources:
    • Linux Explorer http://www.unix-consultants.co.uk/examples/scripts/linux/linux-explorer/
    • LKCD Setup on SLES http://www.novell.com/coolsolutions/feature/15284.html
    • Crash Utility http://people.redhat.com/~anderson/
  • Internal Sources
    • System TSC Linux pages http://systems-tsc/twiki/bin/view/Teams/LinuxDataGathering
    • PTS Linux pages (outdated) http://barentz.germany.sun.com/ptsvs/Wiki.jsp?page=LinuxHowTos

30. Links

  • Did you know http://www.google.com/linux?

31. X64 Workshop: Linux Information Gathering

  • Thorsten Kellermann
    • [email_address]