the sco group 2007 © the sco group, inc. all rights reserved 1 sco unix diagnostics and...

37
1 THE SCO GROUP 2007 The SCO Group, Inc. All Rights Reserved SCO Unix Diagnostics and Troubleshooting Alexander Sack ([email protected] ) Senior Software Engineer

Upload: derick-dennis

Post on 29-Dec-2015

223 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

1

THE SCO GROUP 2007

© The SCO Group, Inc. All Rights Reserved

SCO Unix Diagnostics and Troubleshooting

Alexander Sack ([email protected])Senior Software Engineer

Page 2: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

2

Agenda

Intro Initial System Load (ISL) Common Hardware and Driver Issues System Tuning Networking Tips Reporting Problems Q & A

Page 3: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

3

ISL: Overview

Before installing… Has the system itself been certified by the OEM? Is the motherboard in the CHWP? (Intel whitebox) Is it compatible kinda sorta maybe? Do I need a third-party HBA diskette? Network card supported? Does X support my graphic chipset? Disk layout issues, multi-boot?

Page 4: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

4

ISL: Debugging

“Alt-SysReq-H” or “Alt-Ctrl-H” to enter console mode

“Alt-SysReq-F1” or “Alt-Ctrl-F1” to go back to install screens

Acess to resmgr, ISL scripts (/isl/ui_modules), note any console messages during install

IVAR_DEBUG_ALL=1 Dumps log files in /tmp/log Transfer logs to floppy via cpio

E.g. find /tmp/log/* | cpio –oc –O /dev/dsk/f03ht cpio –ic –I /dev/dsk/f03ht

Page 5: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

5

ISL: Issues

Problem: Installation sees more processors than actually present

Reasons: Bad MPS tables Cores listed as physical CPUs in BIOS Limited ACPI support (OSR5 only)

Solution: Boot in single processor mode (ATUP) and apply latest MP/SMP

pack ACPI=Y, USE_XAPIC=Y, ENABLE_JT=Y, MULTICORE=N Flash BIOS

Page 6: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

6

ISL: Issues

Problem: Kernel hangs on boot-up Reasons:

Missing interrupts Mixed stepping processors

Solution: Boot in single processor mode (ATUP) Reverse stepped processors, make the LOWER stepping

processor in slot 1 Check BIOS settings, ACPI vs. MPS Move add-on PCI card to a different slot PnP set to OFF in BIOS

Page 7: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

7

ISL: Issues

Problem: Can not load a HBA from USB floppy Reasons:

BIOS does not support legacy mode (OSR5 only) “Device enumeration timeout” USB is disabled in the BIOS ISL CD left in tray

Solution: Check USB BIOS settings Re-plug USB floppy device, verify sdiconfig output on console Follow TA article on renaming disk nodes Remove CD before load Make sure disk was created correctly, dd image to p0 not s0 Try a different USB floppy device

Page 8: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

8

ISL: Issues

Problem: Root HBA not found after the DCU runs Reasons:

Didn’t load the right third-party HBA Software based RAID issues Valid media kit USB floppy wasn’t really picked up (ISL will use CD1 for HBA

drivers from an ATAPI drive) Solution:

Disconnect USB floppy after HBA loads Bind third-party resmgr entry to HBA driver manually via DCU Check resmgr entry BOARDID and verify that HBA really

supports the card Download a later driver from IHV website

Page 9: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

9

ISL: Issues

Problem: SATA or IDE hangs after loading or fails to recognize my devices

Reasons: Missed interrupts (polling messages) DMA incompatibility Driver in slave only configuration (OSR6/UW7) SATA/PATA card uses custom third-party driver (e.g. Adaptec,

Silicon Image, Marvell) Solution:

Check cables and jumpers Change mode in BIOS: Legacy, Compatible, Enhanced, AHCI

ATAPI_DMA_DISABLE=Y Avoid cable select (legacy PATA)

Page 10: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

10

ISL: Issues

Problem: Red screen during mount of CD Reasons:

Missed interrupts (polling messages) DMA incompatibility Driver in slave only configuration (OSR6/UW7) SATA/PATA card uses custom third-party driver (e.g. Adaptec,

Silicon Image, Marvell)

Solution: Check cables and jumpers Change mode in BIOS: Legacy,

Compatible, Enhanced, AHCI ATAPI_DMA_DISABLE=Y Avoid cable select (legacy PATA)

Page 11: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

11

ISL: Issues

Problem: NIC is not auto-detected Reasons:

Driver on ISL media is older than card Driver issues with card, driver loads but fails

Solution: Defer networking and pkgadd drivers after install After install, use SCOadmin Network to configure card Bind entry to particular NIC driver if card is within the

same family via DCU Stick in another card!

Page 12: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

12

ISL: Issues

Problem: vfs_mountroot() failure Reasons:

Driver on ISL media is older than card Driver issues with card, driver loads but fails “$static” not added to ROOT HBA sdevice file

Solution: Follow TA to mount disk from ISL Use the RECUT media Make sure you are using the latest HBA driver

Page 13: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

13

ISL: Issues

Problem: Screen goes blank after logo appears Reasons:

VESA mode is not supported by card On-board chipset uses system memory for

framebuffer

Solution: AGP Gart is now supported, install latest maintenance

pack USE_VESA_BIOS=Y Use a supported graphics chipset!

Page 14: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

14

ISL: Issues

Problem: Filesystem is left dirty after ISL and every reboot

Reasons: Aggressive BIOS Power Management RAID battery failure Target issues – CHECK CONDITIONS Older driver and the write cache

Solution: Check RAID battery levels Check HBA and target firmware revision Update to latest driver

Page 15: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

15

ISL: Issues

Problem: Installed one OS and another one won’t boot Reasons:

OSR5 8GB limit UW7/OSR6 128GB limit OSR5 on the first partition of a drive is recommended MBR rewritten

Solution: Use CD1 to boot-up and execute fdisk to rewrite MBR from

UW7/OSR6 fdisk Use a third-party boot loader like GRUB

Page 16: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

16

ISL: Issues

Problem: Failing to create large logical volumes Reasons:

VXFS technical 2TB limit OSR6/UW7 1TB physical capacity limit HTFS has issues with greater than 1TB filesystems (slow) RAID utility issues

Solution: Use VXFS and ODM Split volumes in 1TB chunks Use RAID BIOS or OEM utility if possible to always setup

volumes

Page 17: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

17

ISL: Issues

Problem: ISL load time is very slow Reasons:

ATAPI DMA is disabled Write caching is disabled Media errors Faulty hardware

Solution: Check IDE/SATA settings Some OEM disable write caching which makes install slow –

future boot parameter Check hardware and BIOS settings

Page 18: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

18

ISL: Issues

Problem: Kernel link failure at end of ISL Reasons:

IRQ conflicts in System driver file Driver configuration build error

Solution: Check BIOS settings Disable serial or legacy devices you don’t need Chroot into fresh install and check build files Update HBA drivers if available

Page 19: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

19

ISL: Issues

Problem: Kernel panics on boot-up Reasons:

Full moon out You weren’t nice to the machine that day The customer is out to get you

Solution: Boot in single processor mode Disable USB via boot parameter or BIOS Take note if possible of the stack trace to discern error Cry to the OEM Cry to SCO support

Page 20: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

20

Hardware and Driver Issues: Disk migration

Migrating OSR5 disk to OSR6 Install wd supplement before migration! Administer the disk at the source system FIRST

before migration OSR6 Divvy now works on OSR5 (wd) and OSR6

disks Limitations:

There is no conversion for UW VTOC disks to dual format OSR6

OSR6 does not support extended VTOC slices Always back your data before migration!

Page 21: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

21

Hardware and Driver Issues: Multi-core

All Intel based processors are multi-core! ACPI is required to fully support multi-core

(OSR6/UW7) OSR5 supports multi-core provided MPS tables are

sane – has some ACPI support (HT) OEMs have stopped testing MPS table!

SCO licenses per CPU package not core (industry standard)

Mixed steppings headaches

Page 22: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

22

Hardware and Driver Issues: HBAs

What driver to use? If in doubt, always use the driver diskette with the higher

IHVVERSION in it! Supported cards can be found in the Drvmap files of the

HBA driver/btld package http://pciids.sourceforge.net/ Sometimes adding a OEM branded BOARDID will work –

sometimes it will panic your system! “echo pcilong | ndcfg”

Management utilities are packaged with the driver if available

Recut media and maintenance packs include latest drivers

Read the README posted on the SCO download area!

Page 23: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

23

System Tuning: General

Migrating from OSR5 to OSR6 DO NOT BLINDLY import OSR5 tunables from OSR6 E.g. buffer cache has different use on OSR6 Identify the performance problem you are trying to

solve first! [ GOLDEN RULE ] Take measurements

/etc/conf/bin/idtune SCOadmin has wrapper for idtune

Page 24: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

24

System Tuning: Performance

Performance Tuning Identify bottleneck Rtpm, prfstat, sar, prof, lprof

CPU performance sar –u

00:00:00 %usr %sys %wio %idle %intr 00:00:01 30 10 10 46 4

high usr, investigate with truss, prof high sys, intr, investigate with prfstat high wio, storage throughput

Page 25: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

25

System Tuning: Simple Example

Page 26: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

26

System Tuning: Simple Example

Page 27: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

27

System Tuning: Simple Example

Page 28: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

28

System Tuning: Storage

Storage Performance Hardware configuration

Device topology don’t connect slow devices and fast devices on the same bus

e.g. put your slow tape drive on a separate controller Cabling

ensure your cables are up to specifications Hardware RAID

performance RAID 0 vs integrity RAID 1 RAID 5

Filesystem tuning fsadm, block size, increase logsize (@ mkfs only) mount options; tmplog

ODM dramatic performance boost for $99

Page 29: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

29

System Tuning: Memory

Memory Avoid swapping DEDICATED_MEMORY, use if using shared memory

mkdev dedicated Dedicated memory reserves physical Saves kernel virtual Reduces paging, uses large mappings (PSE)

SEGKMEM_PSE_BYTES Add more memory!

Page 30: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

30

System Tuning: Filesystem

Tuning for largefile support HDATLIM, SDATLIM, HVMMLIM, SVMMLIM,

HFSZLIM, SFSZLIM set to 0x7fffffff (unlimited) /etc/conf/bin/idbuild –B && init 6 fsadm /mountpoint or raw device

fsadm –o largefiles /

OSR6 defaults to largefiles, UW7 does not

Building large file aware applications -D_FILE_OFFSET_BITS=64

Page 31: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

31

Networking Tips: Configuration

Network configuration netconfig

drivers installed in /etc/inst/nd/ bcfg files are parsed by ndcfg /etc/confnet.d/inet/interface is configured at boot /etc/tcp (c.f. S69inet on UW) is run to link the driver into

dlpi - initialize -U STREAMS based network stack

ndcfg useful for displaying info about the system geared toward network device driver writers

Page 32: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

32

Networking Tips: Tuning and Tools

Network monitoring & tuning tools netstat ifconfig inconfig ndstat ndcfg traceroute ping Tcpdump

dlpid logging dlpid –l <logfile> /etc/inst/nd/dlpidPIPE or edit /etc/default/dlpid

LOG=<logfile> NIC failover

automatically and transparently switch to a backup NIC in the event of failure of the primary

Chains of backup NICs supported

Page 33: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

33

Networking Tips: Commons Issues

Network is UP but can’t connect to other systems is DNS configured correctly? netstat –rna do you have a default route?

Network performance is poor check cabling ndstat –l

collisions inconfig nfsstat

Page 34: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

34

Networking Tips: Common Issues

Network responds to pings but can’t login are the daemons running ? licensed ?

Multiple hosts with the same IP or MAC arp –an (-n disable name resolution) ? (132.147.103.1) at xx:xx:xx:xx:xx:xx (802.3) ? (132.147.103.9) at xx:xx:xx:xx:xx:xx (802.3)

Stopping and starting the interface ifconfig net0 down /etc/tcp stop – daemons stopped, NIC is UP /etc/tcp shutdown – everything down /etc/nd stop start

Page 35: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

35

Reporting Problems

crash Primarily used for panic analysis

/var/spool/dump dumpmemory to generate a crash dump on a live system crash –a <dumpfile>; will produce a listing suitable for SCO support provide dumpfile, /stand/unix, all of /etc/conf/mod.d, /usr/sbin/crash

Useful crash commands ps, as, trace, u, eng, od, addstruct, help walk data structures using od

od –f

ksh style history buffer

lsof, can save hours of fun on a live system

Page 36: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

36

Reporting Problems

When reporting problems to support: Establish a reproducible case (if possible) Save any crash related files Note stack trace, crash -a Save system log files

/var/adm/

Include hardware specs when filing a bug run sysinfo

Be aware of changes made to /stand/boot bootparam

Page 37: THE SCO GROUP 2007 © The SCO Group, Inc. All Rights Reserved 1 SCO Unix Diagnostics and Troubleshooting Alexander Sack (alexs@sco.com) Senior Software

37

Q & A