hpux kernel tuning guide

HP−UX Kernel Tuning Guide for

Technical Computing

Getting The Best Performance On Your Hewlett−Packard HP 9000 Systems

Version 2.0

Introduction

This document describes the underlying basics of why and how an HP−UX kernel is tuned and configured. The intent is to provide customers, developers, application designers, and HP's technical consultants the information necessary to optimize the performance of existing hardware configurations and to make intelligent decisions when running applications on HP's UNIX platforms.

Hardware Considerations

HP, and other hardware vendors, offer a broad selection of products with a wide range of CPU performance, memory and disk options, varying greatly in price. Obviously, performance of a software application will be affected by the hardware selected to run it on. The reason so many different products are available is to allow the customer to select the most cost effective solution for their particular software problem. A large, heavily configured system may not be utilized to its full potential if you only need to solve small, simple problems, while a less capable system may be overloaded trying to solve large, complex problems that exceed its capacity. Under these circumstances, neither system would be cost effective. Selecting the most cost effective system requires understanding your compute requirements as well as the hardware options.

There are five key hardware areas that directly affect the performance you will obtain from your application: CPU, Memory, Disk, Graphics, and Network. While all these hardware areas are important, it is equally important to configure a balanced system. It is counterproductive to buy the fastest CPU and then configure it with insufficient memory. You might get better performance and throughput with a slower, less expensive CPU with the difference in price invested in more memory.

There are a large number of variables to consider when deciding on the hardware for your compute infrastructure. The compute needs may vary from the very simple to the incredibly complex.

The best way to select the appropriate hardware configurations is to resolve your compute needs:

• How many users need to be served?
• What are the data server needs?
• What are the compute server needs?
• What are the application software needs?

HP Global Technical Partner − Cadence

There should be a couple of different system configurations to fully cover your environment. Maybe 1, 2 or 3 base system configurations will properly handle your desktop computing needs: one hardware configuration for one type of user, a slightly different configuration for another, and yet another configuration for the user who has major memory and swap requirements for her/his system. There may be a need for managing both small and large batch tasks under a compute server or task queuing methodology. A data server will be needed for storing the large amounts of data with a reliable backup system and revision control system. Add to this collection a software server dedicated to managing large software applications and licensing programs. The best way to select your appropriate hardware configuration(s) is to perform benchmark tests that duplicate your intended use of the system. With relevant benchmark data in hand, you will have the information you need to make intelligent tradeoff decisions on the cost/performance benefits of the available hardware options for your site.

CPU

Many operations require a large number of integer and floating point calculations. A few applications will use integer calculations, but others might rely heavily on floating point calculations. CPU performance is the single most important performance factor for executing a large number of calculations in the shortest possible time. Selecting the CPU is a tradeoff between cost, the size of the problems you will be solving, and your perception of adequate performance. If an operation takes five seconds, is it worth it to you to spend an extra $10,000 to do the operation in three seconds? However, if the operation takes five hours and the time can be reduced to one or three hours, it may be worth the added expense. If the operation is done several times a day it is almost certainly worth it. If it is only done once a month then it may be questionable. When evaluating hardware performance, you must prioritize the tasks to be performed relative to their importance, frequency, and impact on overall productivity.

Tasks that are most affected by CPU performance are those that involve more computation than disk access or graphics display. Don't forget to consider investment protection. The CPU that seems adequate today may not meet your needs in the near future. The rapid pace of hardware development makes existing systems obsolete in a very short period of time. How easy will it be for you to upgrade your systems to increase MIPS capacity or take advantage of the latest compiler or hardware technology?

One standard benchmark that you can use to gauge CPU performance is SPECint.

Memory

One of the most commonly asked questions is "How much memory do I need?". Unfortunately, the real answers to this question are "Enough" and "It depends". The amount of memory you need is directly related to the size of the applications you are working with. While 'X' amount of memory may allow you to run your application, it may not be large enough to allow for optimal performance. Memory management is a complex topic. Memory, its relationship to swap space, and its effect on performance are discussed in more detail in the section "Understanding Memory and Swap" later in this document. Again, cost must be weighed versus benefits; certainly you can spend the money to configure a system with enough memory to allow your application to be run in memory, but depending on the application, the cycle time savings may not be worth it.

Disk

Sometimes data can be quite large. Disk I/O is often a performance bottleneck. Other than the obvious effects on data loading bandwidth, disk I/O can also be the limiting factor in overall performance if a system starts paging.


HP's philosophy is to design balanced systems in which no single component becomes a performance bottleneck. HP has made significant enhancements to I/O performance in order to keep pace with the speed of our CPUs. I/O performance depends on several parts of the system working together efficiently. The I/O subsystems have been redesigned so that they now offer the industry's fastest and most functional I/O as standard equipment.

To improve disk I/O performance:

Distribute the work load across multiple disks. Disk I/O performance can be improved by splitting the work load. In many configurations, a single drive must handle operating system access, swap, and data file access simultaneously. If these different tasks can be distributed across multiple disks then the job can be shared, providing subsequent performance improvements. For example, a system might be configured with four logical volumes, spread across more than one physical volume. The HP−UX operating system could exist on one volume, the application on a second volume, swap space interleaved across all local disk drives and data files on a fourth volume.

Split swap space across two or more disk volumes. Device swap space can be distributed across disk volumes and interleaved. This will improve performance if your system starts paging. This is discussed in more detail in the section on Swap Space Configuration later in this document.

Enable Asynchronous I/O − By default, HP−UX uses synchronous disk I/O when writing file system "meta structures" (super block, directory blocks, inodes, etc.) to disk. This means that any file system activity of this type must complete to the disk before the program is allowed to continue; the process does not regain control until completion of the physical I/O. When HP−UX writes to disk asynchronously, I/O is scheduled at some later time and the process regains control immediately, without waiting.

Synchronous writes of the meta structures ensure file system integrity in case of system crash, but this kind of disk writing also impedes system performance. Run−time performance increases significantly (up to roughly ten percent) on I/O intensive applications when all disk writes occur asynchronously; little effect is seen for compute−bound processes. Benchmarks have shown that load times for large files can be improved by as much as 20% using asynchronous I/O. However, if a system using asynchronous disk writes of meta structures crashes, recovery might require system administrator intervention using fsck, and might also cause data loss. You must determine whether the improved performance is worth the slight risk of data loss in the event of a system crash. A UPS device, used in a power failure event, will help reduce the risk of lost data.

Asynchronous writing of the file system meta structures is enabled by setting the value of the kernel parameter fs_async to 1 and disabled by setting it to 0, the default. For instructions on how to configure kernel parameters, see the section Kernel Configuration Parameters later in this document.

You may want to use a RAID (Redundant Array of Inexpensive Disks) configuration for reliability. Most RAID configurations do not perform as well as non−RAID configurations, but the reliability gains may be worth it.

Graphics and Color Mapping

Many tools use 2−D graphics, and are X11 based. Thus, a platform's X11 performance is key to maximizing the graphics performance of these applications. This can be measured with the standard benchmark xmark93.


Network

Many installations are client/server networks, primarily because of the need for shared data and massive amounts of on−line storage. Therefore, the network configuration can be, and usually is, critical to the overall performance and throughput. Most current networks are ethernet−based, which, when combined with a 700 class machine, may create an unbalanced situation. For example, a single HP 735 can almost saturate a single ethernet wire under the right conditions. See the section labeled Networking later in this document for tuning and configuration guidelines for ethernet networks. You can, of course, upgrade to Fast Ethernet, FDDI, ATM, or other faster network technology if you have the money.

Understanding Memory and Swap

There is a lot of confusion regarding cache memory, configuration of swap space, swap's relationship to physical memory, kernel parameters affecting memory allocation, and performance implications. If there were a simple formula, this would be easy. However, this is not the case. It is important to understand memory in order to understand these settings and how to determine optimal settings for a given situation.

Memory Management

The HP−UX memory management system is composed of 3 basic elements: cache, memory and swap space. Swap space can be composed of two types: device swap space and file system swap space. Device swap space can be made up of primary swap space that is defined on the root file system disk drive and secondary swap space which is defined on the remaining disk volumes. All of these memory elements can be optimized through HP−UX kernel parameter tuning or application compile.

The data and instructions of any process (a program in execution) must be available to the CPU by residing in physical memory at the time of execution. RAM, the actual physical memory (also called "main memory"), is shared by all processes. To execute a process, the HP−UX kernel executes through a per−process virtual address space that has been mapped into physical memory.

The term "memory management" refers to the rules that govern physical and virtual memory and allow for efficient sharing of the system's resources by user and system processes.

Memory management allows the total size of user processes to exceed physical memory by using an approach termed demand−paged virtual memory. Demand−paged virtual memory enables you to execute a process by bringing into main memory parts of the process only as needed, that is, on demand, and pushing out to disk parts of a process that have not been recently used.

The HP−UX operating system uses paging to manage virtual memory. Paging involves moving small units (called pages) of a process between main memory and disk swap space.

One method for increasing the efficiency of memory allocation within memory management is the usage of the mallopt function before each malloc call within the EDA application code. This function is unique to HP−UX and controls the memory allocation algorithm and other optimization options within the malloc library. Usage of this option can improve application execution time up to 10X depending on the data size. It is important that the Maxfast and Numlblks options (i.e. the first two options to mallopt) be defined to reflect the data size links being accessed.


Physical Memory

Physical memory is composed of hardware known as RAM (also called SIMMs, DIMMs, etc.). For the CPU to execute a process, the relevant parts of a process must exist in the system's RAM.

The more main memory in the system, the more data it can access and the more (or larger) processes it can execute without having to page. This is because the system can retain more processes in main memory, thus requiring the kernel to page less frequently. Each time the system has to page there is a performance cost, since reading from or writing to disk is much slower than accessing memory.

Not all physical memory is available to user processes. The kernel occupies some main memory (that is, it is never paged). The amount of main memory not reserved for the kernel is termed available memory. Available memory is used by the system for executing processes.

Secondary Storage

Main memory stores computer data required for program execution. During process execution, data resides in two faster implementations of memory found in the processor subsystem, registers and cache. Program files are kept in secondary storage or secondary memory, typically disks accessible either via system buses or network. Data is also stored when no longer needed in main memory, to make room for active processes.

Swap

A temporary form of secondary data storage is termed swap, dating from early UNIX implementations that managed physical memory resources by moving, i.e. swapping, entire processes between main memory and secondary storage. HP−UX uses paging, a more efficient memory resource management mechanism. It should be noted that HP−UX does not "swap" any more; it pages and, as a "last resort", deactivates processes. The process of deactivation replaces what was formerly known as swapping entire processes out.

While executing a program, data and instructions can be paged (copied) to and from secondary storage, or disk, if the system load warrants such behavior.

Swap space is initially allocated when the system is configured. HP−UX supports two types of swap space: device swap space and file system swap space. Device swap is allocated on the disk before a file system has been created and can take the following forms:

• an entire disk
• a designated area on a disk
• a software disk−striped partition on a disk

If the entire disk hasn't been designated as swap, the remaining space on the disk can be used for a file system. File−system swap space is allocated from a mounted file system and can be added dynamically to a running system. If more swap space is required, it can be added dynamically to a running system, as either device swap or file−system swap.

Note that file−system swap has significantly lower performance than device swap, as it must use separate read/write requests for each page block and has a smaller page swapping size than used in device swap. The I/O for file system swap will contend with user I/O on that file system, which will cause performance to degrade. File system swap space usage should be avoided.


Either SAM or the swapon command can be used to enable disk space or a directory in a file system for swap.

NOTE: Once allocated, you cannot remove either type of swap without rebooting the system. HP−UX also uses an early swap space reservation method to make sure it has space available, but it only allocates the space when it actually needs to write to it.

Virtual Address Space

Virtual memory uses a structure for mapping processes termed the virtual address space. The virtual address space contains information and pointers to the memory that the process can reference.

One virtual address space (vas) exists per process and serves several purposes:

• It provides the overall description of each process.
• It contains pointers to another element in the memory management subsystem − per−process regions (pregions).
• It keeps track of pregions most recently involved in page faults.

Each HP−UX process executes within a 4 GB virtual address space (this may change in the near future). The virtual address space structure points to per−process regions, or pregions. Pregions are logical segments that point to specific segments of a process, including code (text, or process instructions), data, u_area and kernel stack, user stack, shared memory segments and shared library code and data segments.

The size of various memory segments is controlled by the values assigned to certain configurable kernel parameters. It is beyond the scope of this paper to discuss all the process virtual memory segments. The following, however, is a description of the segments most relevant to this discussion.

Text − The text segment holds a process's executable object code and may be shared by multiple processes. The maximum size of the text segment is limited by the configurable operating−system parameter maxtsiz.

Data − The data segment contains a process's initialized (data) and uninitialized (.bss) data structures, along with the heap, private "shared" data, "user" stack, etc. A process can dynamically grow its data space. The total allotment for initialized data, uninitialized data and dynamically allocated memory (heap) is governed by the configurable kernel parameter maxdsiz.

Stack − Space used for local variables, subroutine return addresses, kernel routines, etc. The u_area contains information about process characteristics. The kernel stack, which is in the u_area, contains a process's run−time stack while executing in kernel mode. Both the u_area and kernel stack are fixed in size. Space available for remaining stack use is determined by the configurable parameter maxssiz.

Shared Memory − Address space which is sharable among multiple processes.

Configurable Parameters

HP−UX configurable kernel parameters limit the size of the text, data, and stack segments for each individual process. These parameters have pre−defined defaults, but can be reconfigured in the kernel. Some may need to be adjusted when swap space is increased. This is discussed in more detail in the section on configuring the HP−UX kernel.


Parameter            Description
bufpages             Sets number of buffer pages
create_fastlinks     Stores symbolic link data in the inode
fs_async             Sets asynchronous write to disk
hpux_aes_override    Controls directory creation on automounted disk drives
maxdsiz              Limits the size of the data segment
maxfiles             Limits the soft file limit per process
maxfiles_lim         Limits the hard file limit per process
maxssiz              Limits the size of the stack segment
maxswapchunks        Limits the maximum number of swap chunks
maxtsiz              Limits the size of the text (code) segment
maxuprc              Limits the maximum number of user processes
netmemmax            Sets the network dynamic memory limit
nfile                Limits the maximum number of "opens" in the system
ninode               Limits the maximum number of open inodes in memory
nproc                Limits the maximum number of concurrent processes
npty                 Sets the maximum number of pseudo ttys

The four GB virtual address space is divided into four one−GB quadrants. Each quadrant has specific contents associated with it:

• The first quadrant always contains the process's text segment (code), and sometimes some of the data (EXEC_MAGIC).
• The second quadrant contains the data segment (static data, stack, heap, etc.).
• The third quadrant contains shared library code, shared memory−mapped files and sometimes shared memory.
• The fourth quadrant contains shared memory segments, shared memory−mapped files, shared library code, and I/O space.
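As a rough illustration of the quadrant layout, the sketch below maps a 32-bit virtual address to its quadrant number. It assumes the quadrants are numbered 1 to 4 in ascending address order, each covering one contiguous GB; the function name quadrant and the sample addresses are hypothetical and chosen only for illustration.

```python
GB = 1 << 30  # one gigabyte in bytes

# Quadrant contents, paraphrased from the list above.
QUADRANT_CONTENTS = {
    1: "text (code), sometimes some data (EXEC_MAGIC)",
    2: "data segment (static data, stack, heap)",
    3: "shared library code, shared memory-mapped files, sometimes shared memory",
    4: "shared memory segments, shared memory-mapped files, shared library code, I/O space",
}


def quadrant(address: int) -> int:
    """Return the 1-based quadrant of an address in a 4 GB address space.

    Assumption: quadrant N covers addresses (N-1) GB up to, but not
    including, N GB.
    """
    if not 0 <= address < 4 * GB:
        raise ValueError("address is outside the 4 GB virtual address space")
    return address // GB + 1


# Hypothetical sample addresses, one or two per region of interest.
for addr in (0x00001000, 0x40000000, 0x7FFFFFFF, 0xC0000000):
    q = quadrant(addr)
    print(f"0x{addr:08X} -> quadrant {q}: {QUADRANT_CONTENTS[q]}")
```

Because each quadrant is exactly one GB, the quadrant is determined by the top two bits of the 32-bit address, which is why integer division by one GB is sufficient.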

Physical Memory Versus Performance

The amount of memory available to applications is determined by the amount of swap configured plus physical memory. The size of physical memory determines how much paging will be done while applications are running. Paging imposes a performance penalty because pages are being moved between physical memory and secondary storage, or disk. The more time that is spent paging, the slower the performance. There is a critical threshold for physical memory size below which the system spends almost all its CPU time paging. This is known as thrashing and is evident by the fact that system performance virtually comes to a standstill and even simple commands, like ls, take a long time to complete.

Optimally, all operations would be done in physical memory and paging would never occur. However, memory costs money, so there is usually a tradeoff made between budgetary constraints and the minimum acceptable performance level. Understanding how memory size affects performance can help you make sure you are maximizing your expenditure on memory. One thing to keep in mind is that memory needs are always changing and the base system configuration will need to be constantly addressed. HP's Glance/GlancePlus is a good application that will help you address and resolve memory versus performance issues.

Where Is The Memory Going?

To help you understand the minimum memory configuration you should consider, it helps to understand how memory is consumed. On a system, you will minimally have the following memory consuming resources:


• HP−UX Operating System: 10−12 MB
• Windowing System: 21 MB (X11), 25 MB (VUE), 32 MB (CDS)

Any other processes or services running on the system will consume additional memory resources. As you can see, if you add these up, before you even load the first part, you are already consuming approximately 50 MB of memory. This isn't quite as straightforward as it seems, however. HP−UX uses a paging algorithm to move data in and out of physical memory. The only data that isn't subject to paging is HP−UX itself. Out of the 25 MB of executable code in VUE, you will not be using all of it at any given time. Since code will be overwritten if it isn't used, and there are many functions in VUE that you may seldom or never use, there is some percentage of the executable code that will never be paged in. This same behavior applies to applications. An example is an application that involves significant disk I/O or LAN activity, followed by intensive CPU activity.

Determining Appropriate Physical Memory Size

There are a couple of ways to determine whether the amount of physical memory in your system is adequate. The first is to run a series of timed benchmarks on systems with increasing levels of physical memory and determine the impact of additional memory on those operations. Another way is to use one of HP's performance tools to monitor the system operation. It will tell you how much paging is occurring, if any.

If you plot memory size versus time to perform a typical operation in an application, you will get a dog−leg shaped curve for most operations. This means that performance increases on a fairly steep curve as memory size is increased up to a point. Beyond that point, the curve flattens out and adding additional memory will not significantly improve performance.

The ideal memory configuration is one that falls on the breakpoint. If your memory is less than the breakpoint, you are not getting all the performance you could from your system. The performance breakpoint varies depending on the operation being performed in combination with the data set used. The only accurate way to determine the optimal memory size is to perform timed benchmarks using real data.
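One possible way to locate the breakpoint from a set of timed benchmarks is sketched below. It assumes a simple rule that is not part of this guide: the breakpoint is the smallest memory size at which the next step up in memory improves run time by less than a chosen fraction (5% here). The function name find_breakpoint, the threshold, and the sample timings are all hypothetical; substitute measurements taken with real data.

```python
def find_breakpoint(samples, min_gain=0.05):
    """Return the memory size where the time-versus-memory curve flattens.

    samples:  (memory_mb, seconds) pairs from timed benchmark runs.
    min_gain: fractional improvement below which one more memory step
              is treated as "not worth it" (an assumed cut-off).
    """
    ordered = sorted(samples)  # ascending memory size
    for (mem, secs), (_, next_secs) in zip(ordered, ordered[1:]):
        gain = (secs - next_secs) / secs
        if gain < min_gain:
            return mem
    # Run time was still improving at the largest size tested.
    return ordered[-1][0]


# Made-up timings for illustration only; they are not measurements.
timings = [(64, 900), (128, 400), (192, 210), (256, 205), (320, 204)]
print("Breakpoint at", find_breakpoint(timings), "MB")
```

If the function returns the largest size tested, the curve has not yet flattened and the benchmark series should be extended to larger memory sizes.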

HP−UX Configuration

This section explains HP−UX configurable software settings and parameters that affect system capacity and/or performance. Most of this section is common for HP−UX 9.X and HP−UX 10.X. Specific differences are noted.

Swap Configuration

How much swap do I have?

SAM, Glance/GlancePlus, top, and swapinfo all show swap information. To see how much swap space is configured on your system, and how much is in use, execute one of the following commands:

• top
• Glance/GlancePlus
• sam (requires root password)
• /etc/swapinfo -t (HP−UX 9.X systems; requires root login)
• /usr/sbin/swapinfo -t (HP−UX 10.X systems; requires root login)


Any user can execute top and Glance. The program sam and command swapinfo both require root privilege. This is because these commands must open the kernel memory file /kmem to read the swap usage information. Since this is a critical operating system file, access is usually restricted to root only.

How Much Swap Do I need?

The amount of swap available determines the maximum address space, or virtual memory, available for applications. The minimum recommendation is twice as much swap space as physical memory. If swap is too small, and you try to load something that exceeds available swap, you will get an out of memory error. If you configure more swap than you will ever need, you are wasting valuable disk space. The correct swap size will vary considerably depending on the application(s) run on a system. The optimal swap configuration may vary between individual users and/or systems. However, optimizing swap on a user to user basis is not advised. A common swap size for systems should be resolved for ease of supportability and maximum long−term design flexibility.

The correct swap space configuration for your site can only be accurately determined by monitoring swap usage while working with real data. This could be done either with the swapinfo command or using a tool like HP's GlancePlus. GlancePlus allows you to monitor system resources on a per process basis and will track high water marks over a period of time. You would configure a system with more swap than you expect to need and then run GlancePlus while running an application in a real work environment. By monitoring the high water mark, you can determine the maximum swap space used and adjust the swap configuration accordingly. Obviously, if you experience out of memory errors, swap space is too small.

Swap space should not be less than the amount of physical memory in your system.

NOTE: For best performance, swap space should be distributed evenly across all disks at the same priority. There are two types of swap space in HP−UX, device and file system. Device swap provides much better performance because it utilizes the raw disk I/O. File system defined swap space should be avoided.

Configuring Swap Space

As mentioned previously, device swap is preferred over file system swap to achieve the best performance. The ideal swap configuration is device swap interleaved on two or more disks. When device swap is interleaved on 2 or more disks, the system alternates between the disks as paging requests occur, providing better performance than a single disk.
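The sizing and interleaving guidance above can be turned into a small planning aid. The sketch below takes the minimum recommendation of twice physical memory and divides that total evenly across the disks that will hold device swap. The function name swap_plan and the disk names are hypothetical, and the result is only a starting point to be adjusted after monitoring real swap usage.

```python
def swap_plan(physical_mb, disks, factor=2):
    """Split a target swap size evenly across the given disks.

    physical_mb: physical memory in MB.
    disks:       names of the disks that will carry device swap.
    factor:      swap-to-memory ratio; 2 follows the minimum
                 recommendation quoted earlier in this section.
    Returns a dict mapping each disk name to its swap share in MB.
    """
    if not disks:
        raise ValueError("at least one disk is required")
    total_mb = physical_mb * factor
    share, remainder = divmod(total_mb, len(disks))
    # Hand any leftover megabytes to the first disks so the shares
    # add up to exactly total_mb.
    return {
        disk: share + (1 if index < remainder else 0)
        for index, disk in enumerate(disks)
    }


# Hypothetical machine: 512 MB of memory, two disks for device swap.
for disk, size_mb in swap_plan(512, ["disk_a", "disk_b"]).items():
    print(f"{disk}: {size_mb} MB of device swap")
```

Equal shares matter because the system alternates between the disks as paging requests occur; an uneven split would leave one disk carrying the load alone once the smaller area fills.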

SAM is the easiest method for adding and configuring swap space. Swap configuration is under the Disks and File System area of SAM. For more information on configuring swap, please see the on−line Help section within SAM's Swap Configuration.

Kernel Configuration Parameters

Bufpages

Bufpages specifies how many 4096−byte memory pages are allocated for the file system buffer cache. These buffers are used for all file system I/O operations, as well as all other block I/O operations in the system (exec, mount, inode reading, and some device drivers).

In HP−UX 10.X, we highly recommend this kernel parameter be set to 0. This will enable dynamic buffer cache, which has been changed in the 10.X OS.


In HP−UX 9.X, we do NOT recommend using dynamic buffer cache. A fixed buffer cache can be specified by setting bufpages to a non−zero value, for example 4096, and nbuf to 0. This will set 2048 buffer headers and allocate 16 MB of buffer pool space (4096 pages of 4096 bytes each) at system boot time. If you wish to reserve 10% of physical memory for the file system buffer cache, the value can be calculated as:

bufpages = 0.1 * (physical memory in bytes) / (page size of 4096 bytes)

Create_Fastlinks

Create_fastlinks tells the system to store HFS symbolic link data in the symbolic link's inode. This reduces disk space usage and speeds things up. By default, this feature is disabled for backward compatibility. We recommend all systems have create_fastlinks enabled by setting this kernel parameter to 1.

Dbc_Max_Pct

This parameter determines the percentage of main memory that the dynamically allocated buffer cache is allowed to grow to. As the system will use as much memory as it can for buffer cache when performing intense block I/O, this becomes the size of the buffer cache on a system that is not feeling memory pressure due to process invocations. The problem arises when memory stress due to process space requirements requires the system to start paging, at which point the system tries to reclaim buffer cache pages to allocate them to running processes. But the system is also trying to allocate as much buffer cache as it can, causing a vicious cycle of allocating and deallocating memory between buffer cache and process memory space, creating a large amount of overhead.

The idea then is to keep this number reasonably low, allowing you to have the cache space but also keep the application space large enough to avoid high levels of conflict between them. The default value is 50%, but we recommend 25% to start. We have seen systems that need buffer cache to have a max of as little as 5%, with a min at 2%. This is something that requires careful attention, with appropriate modification.

If this form of thrashing in main memory becomes an increasing problem, the only good fix is to purchase more physical memory.
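To make the buffer cache numbers from the Bufpages and Dbc_Max_Pct discussions concrete, the sketch below computes a fixed bufpages value for a given share of physical memory, and the ceiling that a given dbc_max_pct places on the dynamic buffer cache. It assumes the 4096-byte page size stated under Bufpages; the function names and the 256 MB example machine are hypothetical.

```python
PAGE_SIZE = 4096  # bytes per buffer cache page, as stated under Bufpages
MB = 1024 * 1024


def fixed_bufpages(physical_mb, percent=10):
    """bufpages value that reserves `percent` of physical memory.

    Integer arithmetic is used so the result is rounded down to a
    whole number of pages.
    """
    return physical_mb * MB * percent // (100 * PAGE_SIZE)


def dynamic_cache_ceiling_mb(physical_mb, dbc_max_pct):
    """Largest size, in MB, the dynamic buffer cache may grow to."""
    return physical_mb * dbc_max_pct / 100


# Hypothetical 256 MB system.
memory_mb = 256
print("bufpages for a fixed 10% cache:", fixed_bufpages(memory_mb))
for pct in (50, 25, 5):
    ceiling = dynamic_cache_ceiling_mb(memory_mb, pct)
    print(f"dbc_max_pct={pct}: cache may grow to {ceiling:.1f} MB")
```

On the example machine, lowering dbc_max_pct from the 50% default to the suggested 25% starting point halves the memory the buffer cache can take away from processes.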

Fs_Async

This kernel parameter controls the switch between synchronous or asynchronous writes of file system meta structures to disk. Asynchronous writes to disk can improve file system I/O performance significantly. However, synchronous writes to disk make it easier to restore file system integrity if a system crash occurs while file system meta structures are being updated on the file system. Depending on the application, you will need to decide which is more important. The decision should be based on what types of applications are going to be run. You may value file system integrity more than I/O speed. If so, fs_async should be set to 0.

HPUX_AES_Override

This value is part of the OSF/AES compliance. It controls directory creation on automounted disk drives. We recommend hpux_aes_override be set to 1. If this value is not set, you may see the following error message:

mkdir: cannot create /design/ram: Read-only file system.

This system parameter cannot be set using SAM. The kernel must be manually modified the old way. It is best to modify the other parameters with SAM first and then change this parameter second; otherwise SAM will override your 'unsupported' value with the default.

Maxdsiz

Maxdsiz defines the maximum size of the data segment of an executing process. The default value of 64 MB is too small for most applications. We recommend this value be set to the maximum value of 1.9 GB. If maxdsiz is exceeded by a process, it will be terminated, usually with a SIGSEGV (segmentation violation), and you will probably see the following message:


Memory fault(coredump)

In this case, check the values of maxdsiz, maxssiz and maxtsiz. For more information on these parameters, please see the on−line Help section within SAM's Kernel Configuration. If you need to exceed the specified maximum of 1.9 GB, there are a couple of ways (yet to be supported) to do so. Contact your Hewlett−Packard technical consultant for the details. It is important to note that the maxdsiz parameter must be modified in order for these procedures to work. Maxdsiz will need to be set to 2.75 GB or 3.6 GB depending on the method chosen and/or size required.

Maxfiles

This sets the soft limit for the number of files a process is allowed to have open. We recommend this value be set to 200.

Maxfiles_Lim

This sets the hard limit for the number of files a process is allowed to have open. This parameter is limited by ninode. The default for this kernel parameter is 2048.

Maxssiz

Maxssiz defines the maximum size of the stack of a process. The default value is 8 Mb. We recommend this value be set to 79 Mb.

Maxswapchunks

This (in conjunction with some other parameters) sets the maximum amount of swap space configurable on the system. Maxswapchunks should be set to support sufficient swap space to accommodate all anticipated swap. Also remember that swap space, once configured, is made available for paging (at boot) by specifying it in the file /etc/fstab. The maximum swap space limit in bytes is calculated as: (maxswapchunks * swchunk * DEV_BSIZE). We recommend this parameter be set to 2048.
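The swap limit formula can be checked with a few lines of shell arithmetic. This is only a sketch: the function name max_swap_bytes is hypothetical, and the swchunk value of 2048 and DEV_BSIZE value of 1024 used in the example are assumptions that should be verified on your own system.

```shell
#!/bin/sh
# Sketch: maximum configurable swap space implied by maxswapchunks.
# ASSUMPTION: swchunk=2048 and DEV_BSIZE=1024 are illustrative values that do
# not come from this guide; substitute the values from your own kernel.
max_swap_bytes() {
    # $1 = maxswapchunks, $2 = swchunk (in DEV_BSIZE blocks), $3 = DEV_BSIZE (bytes)
    echo $(( $1 * $2 * $3 ))
}

bytes=$(max_swap_bytes 2048 2048 1024)
echo "maximum swap: $bytes bytes ($(( bytes / 1024 / 1024 )) Mb)"
```

If the result is smaller than the swap you plan to configure, raise maxswapchunks before adding the swap devices.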

Maxtsiz

Maxtsiz defines the maximum size of the text segment of a process. We recommend 1024 MB.

Maxuprc

This restricts the number of concurrent processes that a user can run. A user is identified by the user ID number and not by the number of login instances. Maxuprc is used to keep a single user from monopolizing system resources. If maxuprc is too low, the system issues the following error message to the user when attempting to invoke too many processes:

no more processes

We recommend maxuprc be set to 200.

Maxusers

This kernel parameter is used in various algorithms and formulae throughout the kernel. It is used to limit system resource allocation and not the actual number of users on the system. It is also used to define the system table size. The default values of nproc, ncallout, ninode and nfile are defined in terms of maxusers. We recommend fixed values for nproc, ninode and nfile. Set maxusers to 124.

Netmemmax

This specifies how much memory can be used for holding partial Internet Protocol (IP) messages in memory. They are typically held in memory for up to 30 seconds. The default of 0 allows up to 10% of total memory to be used for IP level reassembly of packet fragments. Values for netmemmax are specified as follows:

Value   Description

-1      No limit; 100% of memory is available for IP packet reassembly.

0       netmemmax limit is 10% of real memory.

>0      Specifies that X bytes of memory can be used for IP packet reassembly. The minimum is 200 Kb and the value is rounded up to the next multiple of pages (4096 bytes).

If system network performance is poor, it might be because the system is dropping fragments due to insufficient memory for the fragmentation queue. Setting this parameter to -1 will improve network performance, but at the risk of leaving less memory available for processes. We recommend it be set to -1 for systems acting as data servers only. For all other systems, we recommend a setting of 0.
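The three cases for netmemmax can be sketched as a small shell function. The name netmemmax_limit is hypothetical. The sketch assumes that 200 Kb means 200 * 1024 bytes and that a positive value below the minimum is raised to the minimum before rounding; neither detail is stated in this guide.

```shell
#!/bin/sh
# Sketch: effective IP reassembly memory limit for a given netmemmax setting.
# ASSUMPTIONS: 200 Kb = 200*1024 bytes; positive values below the minimum are
# raised to the minimum before rounding up to a 4096-byte page boundary.
netmemmax_limit() {
    # $1 = netmemmax setting, $2 = physical memory in bytes
    value=$1
    physmem=$2
    page=4096
    min=$(( 200 * 1024 ))
    case $value in
        -1) echo unlimited ;;
        0)  echo $(( physmem / 10 )) ;;
        *)  if [ "$value" -lt "$min" ]; then value=$min; fi
            echo $(( (value + page - 1) / page * page )) ;;
    esac
}

netmemmax_limit 0 134217728        # 10% of a 128 Mb system
netmemmax_limit 300000 134217728   # rounded up to a whole number of pages
```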

Nfile

Nfile sizes the system file table, which contains an entry for each instance of an open of a file. It therefore restricts the total number of concurrent "opens" on your system. We suggest that you set this to 2800. This parameter defaults to ((16 * (nproc + 16 + maxusers) / 10) + 32 + 2 * npty). If a process attempts to open one more file than nfile allows, the following message will appear on the console:

file: table is full

When this happens, running processes may fail because they cannot open files and no new processes can be started.
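To see what the default formula yields for a given configuration, it can be evaluated directly in the shell. This is a sketch only: nfile_default is a hypothetical name, and the use of integer division (as a C expression in the kernel would use) is an assumption.

```shell
#!/bin/sh
# Sketch: evaluate the default nfile formula quoted above.
# ASSUMPTION: the division by 10 is integer division.
nfile_default() {
    # $1 = nproc, $2 = maxusers, $3 = npty
    echo $(( (16 * ($1 + 16 + $2) / 10) + 32 + 2 * $3 ))
}

# Example with nproc=1024, maxusers=124 and npty=512
nfile_default 1024 124 512
```

Comparing the result against the value you actually configure shows how far a fixed nfile departs from the computed default.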

Ninode

Ninode sizes the incore inode table, also called the inode cache. For performance, the most recently accessed inodes are kept in memory. Each open file has an inode in the table. An entry is made in the table for each "login directory", each "current directory", each mount point directory, etc. It is recommended that ninode be set to 15,000.

Nproc

Nproc sizes the process table. It restricts the total number of concurrent processes in the system. When a user or process attempts to start one more process than nproc allows, the system issues these messages:

at console window: proc: table is full
at user shell window: no more processes

Set nproc to 1024.

Npty

This parameter limits the number of master/slave pty data structures that can be opened. These are used by network programs like rlogin, telnet, xterm, etc. We recommend this parameter be set to 512.

Configuring Kernel Parameters

The following are the suggested kernel parameter values.

# Parameter          Value
#
bufpages             0                     # on HP-UX 10.X
                     4096                  # on HP-UX 9.X
create_fastlinks     1
dbc_max_pct          25
fs_async             1
maxdsiz              2063806464
maxfiles             200
maxfiles_lim         2048
maxssiz              (383*1024*1024)
maxswapchunks        4096
maxtsiz              (1024*1024*1024)
maxuprc              200
maxusers             124
netmemmax            0                     # on desktop systems
                     -1                    # on data servers
nfile                2800
ninode               15000
nproc                1024
npty                 512

Configuring Kernel Parameters in 9.X

In HP-UX 9.X we recommend manual kernel configuration. All work related to creating a new kernel in 9.X takes place in the /etc directory. You will copy the old kernel configuration file, dfile, to a new name, modify the dfile, and run make to build the new kernel. Then copy the new kernel file into place after saving the old kernel.

• cd /etc/
• cp dfile dfile.old
• vi dfile
• Modify the dfile to include the kernel parameters and values suggested above.
• config dfile
• make -f config.mk
• mv /hp-ux /hp-ux.old
• mv /etc//hp-ux /hp-ux
• cd / ; shutdown -h 0

Note: For more information on manual kernel configuration, please see the HP-UX System Administration "How To" Book.

Configuring Kernel Parameters in 10.X

In HP-UX 10.X we recommend first manually modifying the kernel parameter hpux_aes_override and then modifying the other kernel parameters in SAM by using a tuned parameter set. The hpux_aes_override kernel parameter is the only recommended parameter that must be modified manually. The other parameters can then be updated with SAM or modified manually along with hpux_aes_override. We recommend using SAM to take advantage of its built-in kernel parameter rule checker.



To configure a kernel manually, you must be root.

All work related to creating a new kernel in 10.X takes place in the /stand/build directory. You will create a new kernel configuration file, after moving the existing configuration file, system, to a new name. Run mk_kernel to build the new kernel and copy the new kernel file into place after saving the old kernel (under another name). Then reboot the system.

• cd /stand/build
• /usr/lbin/sysadm/system_prep -s system
• vi system
• Either add or modify the entries to match:
  hpux_aes_override 1
• mk_kernel -s system
• mv /stand/system /stand/system.prev
• mv /stand/build/system /stand/system
• mv /stand/vmunix /stand/vmunix.prev
• mv /stand/build/vmunix_test /stand/vmunix
• cd / ; shutdown -h 0
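The "add or modify" step in the list above can be scripted instead of done in vi. The following is a minimal sketch, not an HP-supplied tool: set_kernel_param is a hypothetical helper, and it assumes the system file holds one "name value" pair per line and that a missing tunable may simply be appended at the end of the file.

```shell
#!/bin/sh
# Sketch: add or update a "name value" entry in a kernel system file.
# ASSUMPTIONS: one tunable per line, name in the first field; appending a new
# entry at the end of the file is acceptable to mk_kernel.
set_kernel_param() {
    # $1 = system file, $2 = parameter name, $3 = value
    file=$1
    name=$2
    value=$3
    tmp="$file.tmp.$$"
    awk -v n="$name" -v v="$value" '
        $1 == n { print n "\t" v; found = 1; next }   # replace an existing entry
        { print }                                     # keep every other line
        END { if (!found) print n "\t" v }            # otherwise append it
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example (run from /stand/build after system_prep has created "system"):
# set_kernel_param system hpux_aes_override 1
```

Keep a copy of the file before editing it this way, and review the result before running mk_kernel.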

Note: For more information on manual kernel configuration, please see the HP-UX 10.X System Administration "How To" Book.

To configure the remaining kernel parameters with SAM, follow these steps:

• Login to the system as root.
• Place the list of kernel parameter values above in the file:
  /usr/sam/lib/kc/tuned/stuff.tune
  (The first line should be "STUFF Applications" in the format shown in the general "Configuring Kernel Parameters" section above.)
• Start SAM by typing the command: sam
• With the mouse, double-click on Kernel Configuration.
• On the next screen, double-click on Configurable Parameters.
• SAM will display a screen with a list of all configurable parameters and their current and pending values. Click on the Actions selection on the menu bar and select Apply Tuned Parameter Set ... on the pull-down menu. Select STUFF Applications from the list and click on the OK button.
• Click on the Actions selection on the menu bar and select Create A New Kernel. A confirmation window will be displayed warning you that a reboot is required. Click on YES to proceed.
• SAM will build the new kernel and then display a form with two options:
  ♦ Move Kernel Into Place and Reboot the System Now
  ♦ Exit Without Moving the Kernel Into Place
  ♦ If you select the first option and then click on OK, the new kernel will be moved into place and the system will be automatically rebooted.
  ♦ If you select the second option, you must move the kernel from the /stand/build directory into /stand/vmunix.

Networks



Network configuration can also have an impact on performance. Virtually all installations use some form of local area network to facilitate sharing of data files and to simplify system management. Most installations use NFS to mount remote file systems so they appear local to the user. This enables the user to access data from any disk on the network as easily as from a local disk. This imposes a performance penalty, however, because the I/O bandwidth for accessing data on an NFS mounted disk is less than that for a directly connected disk. There are a few system configuration recommendations that can be made to maximize the convenience that NFS and the local area network provide while minimizing the performance penalty.

• Patches. Always install the latest HP-UX NFS patch. HP periodically releases patches that correct problems associated with NFS, many of them performance related. If you are using NFS, you should make sure the latest patch is installed on both the client and server. See the Patches section for more details. General HP-UX patch information can be found on http://us-support.external.hp.com.

• Local vs. Remote. You will need to determine which things are located remotely and which should be local. From a system administration viewpoint, the most convenient scenario is to have applications, data, home directories, and basically anything anyone cares about on a central NFS file server which is backed up regularly. That server is then accessed by multiple clients, which are typically workstations with a minimal amount of local disk for OS and swap, and are not backed up. At the other extreme, for maximum performance it is best to have no network access whatsoever and keep everything on local disks. Between those two extremes there is a continuum of options, all of which have associated tradeoffs.

• Subnetting. In general, it is a bad idea to have too many systems on a single wire. Implementation of a switched ethernet configuration with a multi host server or a server backbone configuration can preserve existing wiring while maximizing performance. If you are doing rewiring, seriously consider using fiber for future upgradability.

• Local paging. When applications are located remotely, one trick you can use is to set the "sticky bit" on the application binaries, using the chmod +t and find commands. This forces the system to page the text segment to the local disk, improving performance. Otherwise, it is paged across the network. Of course, this only applies when there is actual paging occurring. More recently, there is a kernel parameter, remote_nfs_swap, which accomplishes the same thing when set to 1.

• Demand loading. Previous versions of this document have suggested setting the demand loading bit on binaries using the chatr command. There has been some controversy over this; empirical data has shown that it does make a difference, while some information has been found stating that there is no difference between demand loadable binaries and shared ones. The current conclusion is that there is indeed a difference and that it may be beneficial to lessen startup times by setting the demand loading bit as described.

• File locking. Make sure the revisions of statd and lockd throughout the network are compatible; if they are out of sync, it can cause mysterious file locking errors. This particularly affects user mail files and Korn shell history files.

• NFS configuration. On NFS servers, a good first order approximation is to run two nfsd processes per disk. The default is four total, which is probably not enough on a server. On 9.x systems, too many nfsd processes can cause context switching bottlenecks, because all the nfsds are awakened any time a request comes in. On 10.x systems, this is not the case and you can safely have extra nfsd processes. Start with 30 or 40 nfsd's. On NFS clients run sixteen biod processes. In general, HP-UX 10.X has much better NFS performance than previous versions.

• Design the LAN configuration to minimize inter segment traffic. To accomplish this you will have to ensure that heavily used network services (NFS, licensing, etc.) are available on the same local segment as the clients being served. Avoid heavy cross segment automounting.

• Maximize the usage of the automounter. It allows you to centralize administration of the network and gives you greater flexibility in configuring the network. Avoid the use of specific machine names, which may change over time, in your mount scheme; force mount points that make sense. /net ties you to a particular server, which may change over time.



http://us.external.hp.com:80/patches/html/patches.html

• You can watch the network performance with Glance, the netstat command, and the nfsstat command. There are other tools like NetMetrix or a LAN analyzer to watch LAN performance. Additionally, you can use the HP products PerfView Software/UX and HP MeasureWare/UX to collect data over time and analyze it. You may want to tune the timeo and retrans variables. For HP systems, small numbers are good: 4 for retrans and 7 for timeo. The default values for wsize and rsize, 8K, are almost always appropriate. Do NOT use 1024 unless talking to an Apollo system running NFS 2.3 on SR10.3. 8K is appropriate for 10.4 Apollos running NFS 4.1.

• Explore using dedicated servers for computing, file serving, and licensing. A good scenario has a group of dedicated servers connected with a fast "server backbone", which is then connected to an ethernet switch, which is itself connected to the desktop systems.
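The timeo, retrans, rsize and wsize suggestions in the list above are NFS mount options. As a small sketch, the function below (nfs_mount_opts is a hypothetical name) assembles them into an option string. It assumes 8K means 8192 bytes; how the string is handed to mount, on the command line or in the options field of the file system table, depends on the HP-UX release and is not shown here.

```shell
#!/bin/sh
# Sketch: build an NFS mount option string from the values suggested above.
# ASSUMPTION: "8K" is taken to mean 8192 bytes.
nfs_mount_opts() {
    # $1 = timeo, $2 = retrans, $3 = read/write transfer size in bytes
    echo "timeo=$1,retrans=$2,rsize=$3,wsize=$3"
}

nfs_mount_opts 7 4 8192
```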

Flexlm Licensing

Some EDA applications use FlexLM, a commonly used UNIX licensing scheme. Some things you may want to be aware of:

• Licensing can generate significant network traffic. Some EDA applications perform a "breath of life" license check periodically. This varies from application to application; some intervals are as short as 40 seconds.

• In heavy usage mission critical situations, configure three machines to be your redundant license cluster, and make licensing the only thing running on those machines. They can be small workstations, for example, but don't bog them down with NFS or other services.

• You can mix license files from many vendors and use a single server or cluster to serve them. The vendors must support Flex 2.2 or above, and you must use the LM_LICENSE_FILE variable.

• There is NO FlexLM performance benefit in node-locked licenses; the server is still contacted for license checkin and checkout.

• Use the following order in the license file: node-lock multilicense lines, node-lock single license lines, floating multilicense lines, floating single license lines.

• You must call the vendor hotline and get a new license file if you want to either change the node associated with a node lock license or change servers.

• By default the device file /lan0 is overprotected for FlexLM usage; it is set to rw-------. This must be changed for FlexLM to work. rw-r--r-- is appropriate. This has been fixed at 10.x. The symptom here is that the user root can execute applications successfully, but an ordinary user cannot.

X Terminal Configuration

Many EDA sites are moving to X terminal (or "X station" in HP talk) configurations. Here are some guidelines regarding these configurations:

• Server memory. You will need 64 Mb to start, and 24-48 Mb for each X terminal to be served, depending on the application. The more memory, the better. Swap space configuration should fall along the same lines as other systems, just all on the server.

• X terminal memory. 18 Mb minimum. This allows efficient usage of fonts.

• Server kernel configuration. Set maxusers to 64. Set npty to 512.

• Networking. Try to keep X terminal traffic away from critical NFS traffic on the network.

• Use NFS to load the server files; it's faster than TFTP.

• Font paths. You may have to hardwire the paths to the EDA vendor specific fonts in the setup screen. Or set up a font server.



Patches

Since patch numbers change frequently, it is recommended that you always check for the latest information. Here are some general recommendations:

• If you are using dynamic buffer cache on a 9.x system, load the latest kernel patch that mentions dynamic buffer cache. These patches limit the growth of the buffer cache to half of physical memory, and also modify cache management algorithms to be more efficient. These are not needed on 800 systems (in 9.X), or systems not using dynamic buffer cache.

• Always load the latest kernel megapatch, ARPA transport patch, NFS/automounter patches, statd/lockd patches, and SCSI patch. Many performance and reliability improvements can be had.

• Load the latest C compiler and linker. The linker in particular is required for 9.01 systems.

• Load HP-VUE or CDE, and X/Motif patches at your discretion. Generally these are bug fixes.

• Almost always load the latest X server. Many display issues have been solved in the past by loading the latest X server. There have been isolated instances in the past of a new X server causing problems with EDA applications, though. When in doubt, call the hotline.

How to get patches. If you have WWW access, go to http://us-support.external.hp.com and follow the links to the patch list. This is also a good way to browse the latest patch list. You can also get patches by e-mail. If you know the name of the patch you want, send a message to support@support.mayfield.hp.com with the text "send patchname". Don't forget to substitute the name of the patch you want for "patchname". You can get a current list by sending the text "send patchlist". To get a complete guide on using the mail server, send the text "send guide". If you have HP SupportLine access, patches can be requested from the HP SupportLine at (800)633-3600, and are also available for FTP access.

How to tell what patches are loaded. Scan the directory /etc/filesets (9.x systems), or use the swlist command (10.x). Patches are named PHxx_nnnn, where xx can be KL, NE, CO, or SS. nnnn refers to the patch number, which is always unique no matter what PHxx category is specified. If a patch has been loaded on a 9.x system, a file with the same name as the patch will exist in /etc/filesets. If a patch has been loaded on a 10.x system, the patch should be listed in the output of swlist.
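The naming rule and the 9.x check can be sketched in shell. Both function names below are hypothetical helpers, not HP-UX commands. The sketch assumes the patch number consists of digits only, which the PHxx_nnnn pattern suggests but this guide does not state explicitly.

```shell
#!/bin/sh
# Sketch: validate a patch name and check whether it is loaded on a 9.x system.
# ASSUMPTION: the "nnnn" part of PHxx_nnnn is made up of digits only.
is_patch_name() {
    # $1 = candidate name, e.g. PHKL_12901
    case $1 in
        PHKL_*|PHNE_*|PHCO_*|PHSS_*)
            num=${1#PH??_}                 # strip the PHxx_ prefix
            case $num in
                ''|*[!0-9]*) return 1 ;;   # empty or non-numeric patch number
                *) return 0 ;;
            esac ;;
        *) return 1 ;;
    esac
}

patch_loaded_9x() {
    # $1 = patch name, $2 = filesets directory (defaults to /etc/filesets)
    [ -f "${2:-/etc/filesets}/$1" ]
}

for p in PHKL_12901 PHXX_1; do
    if is_patch_name "$p"; then echo "$p: valid patch name"; else echo "$p: not a patch name"; fi
done
# On 10.x, the equivalent check is to look for the name in the output of swlist.
```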

How to load patches. Patches are shipped as shell archives, named after the patch. To unpack the shell archive, enter sh filename where filename is the path to the patch shell archive. You will end up with two files, a .text file and a .updt file. The .text file has detailed information about the patch. The .updt file is the actual patch source. You can install the patch with /etc/update on 9.X, either in command line mode or interactive mode. Use the following command line:

/etc/update -s/pathname-to-updt-file -S700 -r \*

You must specify either -S700 or -S800. The -r allows a kernel rebuild and reboot if you are installing a kernel patch, so be prepared to reboot the system.

Using interactive mode, point to the patch file as if it were a tape device in the "Change Source or Destination" menu, then have at it.

Make sure you are in single user mode when installing any patch.

To install a patch on a 10.X system, use the following command line:

swinstall -x autoreboot=true -x match_target=true -s /pathname-to-depot-file



You can install multiple patches at a time by creating a netdist area that contains the patches using /etc/updist, or by specifying a list of patches in a file using the -f switch.

Patch management. Patch management can be a fulltime job for a large site. HP recommends that large sites that don't want to tackle that particular task purchase the PSS support option. This service provides a consultant who, among other things, provides patch management. It's well worth the money.

How to make a patch tape. On a 9.x system, you can use dd to make a patch tape as follows:

dd if=/pathname-to-updt-file of=/rmt/0m bs=2k

On a 10.x system, use the following command:

swpackage -s /pathname-to-depot -x target_type=tape -d /rmt/0m patchname

Performance Tips

Kernel Parameters

Most, if not all, of the kernel parameter tuning has been covered in the preceding sections of this document. Any additional or future parameters will appear here.

File Systems

When using UFS (HFS) file systems, configure them with a block size of 64K and a fragment size of 8K. HFS file systems have historically preferred to perform I/O in 64K block sizes. I have improved performance by using a VxFS (JFS) file system as a "scratch" file system, that is, a file system whose contents you do not care about when the application crashes or when it completes successfully. When doing so, you need to mount this file system with three specific options in order to gain performance. They are:

• nolog
• mincache=tmpcache
• convosync=delay

The on-line (advanced) JFS product is required to use these options. In my experience, the JFS block size is of no consequence. JFS likes to perform I/O in 64K chunks, regardless of the block size. Supported block sizes are 1, 2, 4, and 8K. There is no fragment size on a JFS file system.

When striping with LVM, make sure that the file system block size and the LVM stripe size are identical. This will aid performance.

File systems should be mounted at mount points that are as close to the "root" of the tree as possible. This will help "shorten" directory search paths. It is very important that file systems containing "tools" used by the application(s) be mounted as close to the top as possible.

As of the latest revision (2.0) of this document, there is a JFS "mega patch" for performance. The patch number is PHKL_12901 for 700's and PHKL_12902 for the 800's.

Logical Volume Manager

The following are simply recommendations; you do not have to follow them. Obviously, there are pros and cons with everything. This is not the forum for that type of discussion, so here they are. Use as many physical disks as possible. Stripe them if you can. If you have followed the file system recommendation of using a 64K block size, use a 64K stripe size as well. I would suggest a 64K stripe size for LVM anyway. Hopefully, you will have identical disks (make, model, size, geometry, etc.). When you have control, place your logical volumes so that the "pieces" of a logical volume are located in the same place across the physical devices. For example, with four physical devices, you "stripe" a logical volume so that 25% of it appears on each of the four disks, and each piece appears at the "top" of the disk.

Startup Program

I have noticed many customers and ISV's using the C shell as a startup shell. This might be OK on other "variations" of UNIX, but it does not fare as well on HP-UX (due to the implementation) as the K shell or POSIX shell. When a process forks many children, the .cshrc file is "fired up" and executed for each fork. I have seen some of these files that are extremely long AND they source files that source other files, and so on. This is very time consuming and degrades performance. If possible, do not use the C shell.

The PATH Variable

This is one of the most abused areas and a common cause of performance problems. Two things are of great concern: PATH variables that are way too long, AND placing the directory that contains the tools most frequently used by the application at the end of the PATH.
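A quick way to check both concerns is to count the PATH entries and see where the tools directory falls. The function below is a sketch with a hypothetical name, path_position; it prints the position of the first match and the total number of entries, with 0 meaning the directory was not found.

```shell
#!/bin/sh
# Sketch: report where a directory sits in a PATH-style list.
# Prints "<position> of <total>"; position 0 means the directory is absent.
path_position() {
    # $1 = directory to look for, $2 = PATH-style string (defaults to $PATH)
    target=$1
    list=${2:-$PATH}
    pos=0
    found=0
    old_ifs=$IFS
    IFS=:
    for dir in $list; do            # split the list on colons
        pos=$(( pos + 1 ))
        if [ "$found" -eq 0 ] && [ "$dir" = "$target" ]; then
            found=$pos
        fi
    done
    IFS=$old_ifs
    echo "$found of $pos"
}

path_position /usr/bin
```

A result such as "23 of 24" for the application's tools directory means nearly every command lookup searches 22 other directories first.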

NFS

Check your buffer cache size. Some say 128K for each 1000 IOP's a server expects to deliver.

Check your disk and file system configurations:

• LVM configuration/layout
• Multiple disk striping?
• HFS? ...check your block/fragment sizes
• JFS? ...check your mount options

Reads and writes: server and client block sizes should match. Pay attention to the suggestions for file systems (above).

nfsd's: start with 30 or 40. Some say that 2 per spindle is adequate.
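The two rules of thumb quoted above reduce to simple arithmetic. The sketch below uses hypothetical function names and takes the figures exactly as stated in the text (128K per 1000 IOPs, 2 nfsd's per spindle); treat the results as starting points to measure against, not as tuned values.

```shell
#!/bin/sh
# Sketch: starting-point estimates from the rules of thumb quoted above.
nfs_bufcache_kb() {
    # $1 = IOPs the server is expected to deliver; 128K per 1000 IOPs
    echo $(( $1 * 128 / 1000 ))
}

nfsd_count() {
    # $1 = number of disk spindles; 2 nfsd processes per spindle
    echo $(( $1 * 2 ))
}

echo "buffer cache for 5000 IOPs: $(nfs_bufcache_kb 5000)K"
echo "nfsd's for 16 spindles: $(nfsd_count 16)"
```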

Make sure that ninode is at least 15000 (on 10.X). Some people have seen performance degradation on multiprocessor systems when ninode is greater than 4000. Check it on your system. The details of this problem are much too detailed and complicated for this document.

NFS file systems should be exported with the async option in /etc/exports.

Some items that can be investigated:

nfsd invocations
• nfsstat -a

UDP buffer size
• netstat -an | grep -e Proto -e 2049

How often the UDP buffer overflows
• netstat -s | grep overflow

NFS timeouts: are they a result of packet loss? Do they correlate to errors reported by the links? Use lanadmin or netstat -i to check this.

IP fragment reassembly timeouts?
• netstat -p ip

UDP socket buffer overflows?
• ...see above

Mounting through routers?
• check to see if routers are dropping packets

Check for transport bad checksums
• netstat -s

Is the server dropping requests as duplicates?
• nfsstat

Is the client getting duplicate replies? (badxid)
• nfsstat on CLIENT

Some people have mentioned that they have had serious problems because of too many levels of hierarchy within the netgroup file. It seems that this file is re-read a great many times, and the more hierarchy it has, the longer it takes to read.

(c) Copyright 1996 Hewlett−Packard Company.

December 1, 1997

